Simply Set Variables
When a scraping session starts, it is often a good idea to feed certain pieces of information to the session before it begins resolving URLs. This simple version of the Initialize script demonstrates how you might start on a certain page. While the script is basic, understanding when to use one like it is pivotal in making screen-scraper work for you.
session.setVariable( "PAGE", 1 );
session.scrapeFile( "Your First Page Goes Here!" );
The code above is useful when "PAGE" is an input parameter on the first page you would like to scrape.
Occasionally a site is structured so that, instead of using page numbers, it displays records by range, such as 1-10 or 20-29. In that case your Initialize script could look something like this:
session.setVariable( "DISPLAY_RECORD_MIN", 1 );
session.setVariable( "DISPLAY_RECORD_MAX", 10 );
session.scrapeFile( "Your First Page Goes Here!" );
Once again, "DISPLAY_RECORD_MIN" and "DISPLAY_RECORD_MAX" are input parameters on the first page you would like to scrape.
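To see why both bounds are set, it can help to look at how a record range would advance from one request to the next. The following is only a sketch in plain Java: the class and method names are hypothetical, and it assumes the site uses inclusive ranges of a fixed size (1-10, then 11-20, and so on). In a real scraping session a later script would presumably store the new values with session.setVariable rather than print them.

```java
// Sketch only: plain Java mirroring the record-range arithmetic.
// RecordWindow and next() are hypothetical names, not part of screen-scraper.
final class RecordWindow {

    /** Returns {min, max} for the range that follows the inclusive range [min, max]. */
    static int[] next(int min, int max) {
        int size = max - min + 1;              // records 1-10 give a size of 10
        return new int[] { max + 1, max + size };
    }

    public static void main(String[] args) {
        int[] window = { 1, 10 };              // same starting values as the Initialize script
        for (int i = 0; i < 3; i++) {
            System.out.println(window[0] + "-" + window[1]);
            window = next(window[0], window[1]);
        }
        // prints 1-10, 11-20, 21-30 on separate lines
    }
}
```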
If you feel you understand this one, I'd encourage you to check out the other Initialize scripts in this code repository.
Comments
Concurrent scraping sessions
This approach is very useful when you have one scraping session that you want to break up into pieces that run at the same time. You can pass different values to screen-scraper either with --params in a batch file or shell script, or from the schedule feature of the Web Interface.
Because screen-scraper sets these values as session variables when they're passed in externally, you do not need to set them manually as the examples illustrate. Instead, use the setVariable lines as they appear above for testing, and comment them out when you save or export your scrapes.
Comment these out because...
// session.setVariable( "DISPLAY_RECORD_MIN", 1 );
// session.setVariable( "DISPLAY_RECORD_MAX", 10 );
...they'll automatically be set from the batch file
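An alternative to commenting the lines out might be to set a value only when nothing was passed in. The sketch below is not taken from the screen-scraper documentation: StubSession is a hypothetical stand-in for the real session object so the example can run on its own, and it assumes that getVariable returns null for a variable that has not been set.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: InitializeDefaults and StubSession are hypothetical names.
final class InitializeDefaults {

    // Stand-in for screen-scraper's session object, reduced to the two calls used here.
    static final class StubSession {
        private final Map<String, Object> variables = new HashMap<>();
        Object getVariable(String name) { return variables.get(name); }
        void setVariable(String name, Object value) { variables.put(name, value); }
    }

    // Sets testing defaults only for variables that were not supplied externally.
    static void applyDefaults(StubSession session) {
        if (session.getVariable("DISPLAY_RECORD_MIN") == null) {
            session.setVariable("DISPLAY_RECORD_MIN", 1);
        }
        if (session.getVariable("DISPLAY_RECORD_MAX") == null) {
            session.setVariable("DISPLAY_RECORD_MAX", 10);
        }
    }

    public static void main(String[] args) {
        StubSession session = new StubSession();
        session.setVariable("DISPLAY_RECORD_MIN", "11"); // as if passed in externally
        applyDefaults(session);
        System.out.println(session.getVariable("DISPLAY_RECORD_MIN")); // supplied value is kept
        System.out.println(session.getVariable("DISPLAY_RECORD_MAX")); // default is filled in
    }
}
```

In an actual Initialize script only the two if-blocks would be needed, written against the session object that screen-scraper provides.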
Make a few more batch files for yourself and you're good to go.
-Scott