Continuing a scrape when using memory-conscious page iteration
Hi
I had a scrape running well, but I want to make some changes to it.
The site has added an extra level and I need to automate it.
There is now:
1) A list of auctions with a link to each. This is dynamic: auctions expire and new ones get added. This page links to the individual auction listing pages ((2) below): https://www.cva-auctions.co.uk/auctions/
2) Standard Listing page - I have the scraping of the productid/url sorted. As it uses an API, I have been using the code below for pagination. https://www.cva-auctions.co.uk/stock?saleid=47
3) Product page. No issues here. https://www.cva-auctions.co.uk/details?listingid=084f1c9f-fb56-41bf-b764-0d28ba435949
So I scrape the saleid from (1) and insert it into the URL properties for (2). This works well and, if there is more than one page, the iteration works as well.
Trouble is, the iteration code says that if there are no more pages the session should stop, rather than going back to the next sale from (1) above.
I am sure this is a one-line change to the code, but I have no idea where to start...
Many Thanks in advance
Jason
// "t" holds the total-results count extracted earlier in this script (not shown here)
int total = Integer.parseInt(t);

String u = dataRecord.get("PAGE").trim();
int page = Integer.parseInt(u);
log.log(">>>Page " + page);

int perPage = 25;
int pages = total / perPage;
log.log(">>>Pages " + pages);

if (page < pages) {
    page++;
} else {
    // This is the part in question: it stops the whole session
    // instead of handing control back to the next sale from (1)
    session.stopScraping();
}

log.log(">>>Page " + page);
session.setv("PAGE", page);
session.scrapeFile("CVASearchResults");
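For context, what I do at level (1) is roughly along these lines (just a sketch; the SALEID token name is only what I am calling it here, and the real extractor token may differ):

// Sketch only - runs "After each pattern match" on the auction list page (1)
session.setVariable("SALEID", dataRecord.get("SALEID"));  // used as ~#SALEID#~ in the URL for (2)
session.setVariable("PAGE", "1");                         // each sale starts on page 1
log.log(">>>Scraping sale " + session.getVariable("SALEID"));
session.scrapeFile("CVASearchResults");                   // the pagination script above then requests any further pages

So when (2) runs out of pages, I just want control to come back here so the next sale from (1) gets scraped, rather than the whole session stopping.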
Perfect, thank you.
I have made a couple of changes, so my code now looks like this:
// "t" holds the total-results count extracted earlier in this script (not shown here)
String p = dataRecord.get("PAGE").trim();
int total = Integer.parseInt(t);
int PAGE = Integer.parseInt(p);
int perPage = 24;
int pages = total / perPage;   // Integer division always rounds down
// In case there is a final page with fewer than perPage results listed
if (total % perPage > 0)
    pages++;

if (PAGE == 1)
{
    // Starts at page 2; page 1 has already been scraped
    for (int i = 2; i <= pages && !session.shouldStopScraping(); i++)
    {
        session.setv("PAGE", i);
        log.log(">>>Requesting page " + i + " of " + pages);
        session.scrapeFile("CVASearchResults");
        // Reset PAGE back to "1" after the request
        session.setVariable("PAGE", "1");
    }
}
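As a quick sanity check of the paging maths, with made-up numbers (illustrative only, not from the site):

// Illustrative only: suppose 50 results at 24 per page
int total = 50;
int perPage = 24;
int pages = total / perPage;   // 2 (integer division rounds down)
if (total % perPage > 0)
    pages++;                   // remainder of 2, so pages becomes 3
log.log("pages = " + pages);   // logs "pages = 3"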
Thanks again
Jason
I sort of need to see your site, but most of the time when you use a script like this, you can see the total results on page 1. So I would set page one in a script before I get the search results, and:
// "t" holds the total-results count extracted from page 1 (not shown here)
int total = Integer.parseInt(t);
int perPage = 25;
int pages = total / perPage;   // Integer division always rounds down
// In case there is a final page with fewer than perPage results listed
if (total % perPage > 0)
    pages++;

// getv returns an Object, so convert it before comparing
int PAGE = Integer.parseInt(session.getv("PAGE").toString());
if (PAGE == 1)
{
    // Starts at page 2; page 1 has already been scraped
    for (int i = 2; i <= pages && !session.shouldStopScraping(); i++)
    {
        session.setv("PAGE", i);
        log.log(">>>Requesting page " + i + " of " + pages);
        session.scrapeFile("CVASearchResults");
    }
}
This way the script runs on page 1, requests page 2 and allows it to finish and close, then calls page 3, and so on.
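The "set page one" part can be as simple as this, run once per sale just before the first request for the search results (where exactly it is attached depends on how your sale-level iteration is set up, so treat it as a sketch):

// Sketch: run once per sale, before the first request for CVASearchResults,
// so the pagination script above always sees PAGE == 1 on the first pass
session.setVariable("PAGE", "1");
session.scrapeFile("CVASearchResults");

When the for loop above finishes the last page, the script simply ends and control returns to whatever requested page 1, so the next sale from your auction list page gets scraped without ever needing session.stopScraping().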