Next Page Link
The following script is called once the first page of a site's details has been scraped. It is useful when matching the current page number in the HTML is preferable to, or simpler than, matching the next page number. Depending on how a site is coded, the number of the next page may not even appear on the current page. In that case, we match the word "Next" simply to determine whether a next page exists. A pattern consisting of the lone token ~@NEXT@~ serves this purpose.
The regular expression for the ~@NEXT@~ token would be the text that suggests a next page exists, such as Next Page or perhaps a simple >> link.
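To make the idea of a "next page" flag concrete, here is a minimal standalone Java sketch of the kind of regular expression the ~@NEXT@~ token might use. In screen-scraper the expression is set on the token itself rather than written in code; the class name, method name, and the exact pattern below are assumptions chosen for illustration, and a real site will likely need a pattern tailored to its markup.

```java
import java.util.regex.Pattern;

public class NextLinkPattern {
    // Hypothetical pattern: matches "Next" or "Next Page" as whole words,
    // or a ">>" link in raw or HTML-escaped ("&gt;&gt;") form.
    public static final Pattern NEXT =
        Pattern.compile("\\bNext\\b(\\s+Page)?|>>|&gt;&gt;", Pattern.CASE_INSENSITIVE);

    // Returns true when the HTML contains text suggesting that a next page exists.
    public static boolean hasNextPage(String html) {
        return NEXT.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNextPage("<a href=\"?page=2\">Next Page</a>"));
        System.out.println(hasNextPage("<a href=\"?page=2\">&gt;&gt;</a>"));
        System.out.println(hasNextPage("<span>Page 5 of 5</span>"));
    }
}
```

The pattern is deliberately loose. On a site where the word "Next" also appears in ordinary content, it would need to be anchored to the surrounding link markup.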
The only changes you should have to make to the script below are to adjust any variable names (if they differ in your own project) and to set the correct scrapeableFile name near the bottom.
// Check to see if we found the word or phrase that flags a "Next" page
if (session.getVariable("NEXT") != null)
{
    // Retrieve the page number of the page just scraped
    currentPage = session.getVariable("PAGE");
    if (currentPage == null)
        currentPage = 1;
    else
        // The stored value may be a String or an Integer, so convert it to an int
        currentPage = Integer.parseInt(currentPage.toString());

    // Write out the page number of the page just scraped
    session.log("Last page was: " + currentPage);

    // Increment the page number
    currentPage++;

    // Write out the page number of the next page to be scraped
    session.log("Next page is: " + currentPage);

    // Set the "PAGE" variable with the incremented page number
    session.setVariable("PAGE", currentPage);

    // Clear the "NEXT" variable so that the next page is allowed to find its own value for "NEXT"
    session.setVariable("NEXT", null);

    // Scrape the next page
    session.scrapeFile("Scraping Session Name--Next Page");
}
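The page-number handling in the script above can be checked in isolation. The following is a minimal sketch in plain Java, assuming the stored "PAGE" value may be missing, a String, or an Integer. The class and method names are hypothetical and are not part of the screen-scraper API.

```java
public class PageCounter {
    // Normalizes a stored page value (null, String, or Integer) and returns
    // the number of the next page. A missing value is treated as page 1,
    // mirroring the script above.
    public static int nextPage(Object stored) {
        int current = (stored == null) ? 1 : Integer.parseInt(stored.toString().trim());
        return current + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPage(null));               // no page stored yet
        System.out.println(nextPage("3"));                // value stored as a String
        System.out.println(nextPage(Integer.valueOf(7))); // value stored as an Integer
    }
}
```

Going through toString() before parsing lets the same code accept either type. A non-numeric value would still throw a NumberFormatException, which a production script may want to catch and log.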