For loop writing same result for every page
Hi,
I'm trying to scrape some data from a site where every page has the same structure. The page URLs are also numbered sequentially, for example:
www.football.com/scores001.html
www.football.com/scores002.html
etc
I followed Tutorial 1 and set up the extractor patterns so that I could scrape what I needed from page 1. I then tried to add a loop to scrape the data for pages 001-999 and write it to a text file. As part of this, I set the URL on the PROPERTIES tab to the session variable ~#URL#~, which is changed within the loop.
At the minute I'm only scraping one value, the score, to test it. The problem is that I'm getting the same score written to the text file 999 times.
Can anyone please tell me what the problem is? The code seems simple enough:
// Output a message to the log so we know that we'll be writing the text out to a file.
session.log( "Writing data to a file." );
// Create a FileWriter object that we'll use to write out the text.
out = new FileWriter( "scores.txt" );
// Loop over all 999 pages.
for (i=1; i<1000; i++)
{
    // Put the leading 0's in the number.
    numZero = i + "";
    while(numZero.length() < 3) {
        numZero = "0" + numZero;
    }
    // URL for the page to scrape.
    nextUrl = "http://football.com/scores" + numZero + ".html";
    // Log it to check that it works.
    session.log( "Next " + nextUrl);
    // Set the session variable used for ~#URL#~ in the properties.
    session.setVariable( "URL", nextUrl);
    // Scrape the file.
    session.scrapeFile( "Scorefile" );
    // Write out the text.
    out.write( session.getVariable( "first_score" ) );
}
// Close the file.
out.close();
Any advice would be great, thanks.
I would decouple the iterator
I would decouple the iterator from the writing. The iterator would look like:
for (i = 1; i < 1000; i++)
{
    str = String.valueOf(i);
    // Pad the page number with leading zeros to three digits.
    while (str.length() < 3)
        str = "0" + str;
    url = "http://football.com/scores" + str + ".html";
    session.log("Requesting " + url);
    session.setv("URL", url);
    session.scrapeFile("Scorefile");
}
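As an aside, the padding loop can also be expressed with String.format. Here is a minimal sketch in plain Java, outside of screen-scraper; the class and method names (ScorePageUrls, pageUrl, pageUrls) are made up for illustration, and I'm assuming the pages really do run from 001 to 999:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of screen-scraper: builds the numbered page URLs.
class ScorePageUrls {
    // Returns the URL for one page, zero-padding the page number to three digits.
    static String pageUrl(int page) {
        return String.format(Locale.ROOT, "http://football.com/scores%03d.html", page);
    }

    // Returns the URLs for pages first..last, inclusive.
    static List<String> pageUrls(int first, int last) {
        List<String> urls = new ArrayList<>();
        for (int page = first; page <= last; page++) {
            urls.add(pageUrl(page));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Print the first three URLs as a quick check.
        for (String url : pageUrls(1, 3)) {
            System.out.println(url);
        }
    }
}
```

Inside a script, each of these strings would then be the value you set for URL before scraping the file.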
You also want to make sure the URL on the Scorefile is set to ~#URL#~.
Your writer would then just live on the Scorefile page; I would attach it to the extractor pattern, so that it writes each score as it is extracted.
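A rough idea of what such a writer could look like, as a minimal sketch in plain Java. The helper name (ScoreAppender) is hypothetical, and the score is passed in as a plain string here; in a real script it would presumably come from the extracted value (for example the first_score session variable). Because the writer now runs once per page, it opens the file in append mode rather than recreating it each time:

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper: appends one score per line to the output file.
class ScoreAppender {
    static void appendScore(String fileName, String score) throws IOException {
        // The second constructor argument (true) opens the file in append mode,
        // so scores written by earlier calls are kept instead of overwritten.
        try (FileWriter out = new FileWriter(fileName, true)) {
            out.write(score + "\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Two calls stand in for two pages being scraped.
        appendScore("scores.txt", "2-1");
        appendScore("scores.txt", "0-0");
    }
}
```

Writing a newline after each score also keeps the values from running together in the file.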
When, where, and how to apply?
I have a similar problem. My question is: when, where, and how do you apply this for loop?
I mean, should it run before the file is scraped, after the file is scraped, once if a pattern matches, or in some other way?
So far I can only get it to be invoked from the beginning every time, so my "i" is always 0.
I use it on my scrapeable "next page" file.
I somehow need to pass an incrementing value to the page parameter, because the last response with the page number does not include the next higher page. There you can only see the current page.
I often use a script like
I often use a script like this at the scraping session level, set to run "Before scraping session begins", and then it will invoke the scrapeable file.
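If the script has to stay on the "next page" file instead, the counter can't be a local variable, because a local variable starts over on every invocation. One option would be to keep it in a session variable. Here is a minimal sketch in plain Java, where a HashMap is only a stand-in for the session variables (in a real script this would presumably be session.getVariable and session.setVariable), and PAGE and PageCounter are made-up names:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: keeps the page counter outside the script's local scope.
class PageCounter {
    // Reads the stored page number, increments it, stores it back,
    // and returns the URL of the page to request next.
    static String nextUrl(Map<String, Object> sessionVars) {
        Object stored = sessionVars.get("PAGE");
        // No stored value yet means this is the first invocation.
        int page = (stored == null) ? 1 : ((Integer) stored) + 1;
        sessionVars.put("PAGE", page);
        return String.format(Locale.ROOT, "http://football.com/scores%03d.html", page);
    }

    public static void main(String[] args) {
        Map<String, Object> sessionVars = new HashMap<>();
        // Each call stands in for one invocation of the script.
        System.out.println(nextUrl(sessionVars));
        System.out.println(nextUrl(sessionVars));
    }
}
```

The point is only that the value survives between invocations; the loop at the scraping session level is still the simpler approach when it fits.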
thanks Jason