Using Scripts

Overview

screen-scraper's scraping engine lets you associate custom scripts with events in the scraping process. Before continuing, we recommend reading about managing and scripting in screen-scraper.

Using the scripts

The objects available to a script depend on which event triggers it. Where you attach the trigger depends on the event. Scraping session triggers are added on the general tab of the scraping session. File request/response triggers are added on the properties tab of the scrapeable file. Extractor pattern triggers are set in the scripts section of the main tab, inside the extractor patterns tab of the scrapeable file.

A script can also run another script by calling the session.executeScript method.

Built-in objects

screen-scraper provides a few built-in objects that scripts in the scraping engine can work with. See the variable scope section below and the API documentation for details.

  • session: The running scraping session.
  • scrapeableFile: The current file interaction, including its request and response. It also holds the file's extractor patterns.
  • dataSet: All of the matches of an extractor pattern. Each match is a dataRecord.
  • dataRecord: A single match of an extractor pattern, holding the values of its tokens.
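The relationship between dataSet and dataRecord can be modeled in a few lines of Python. This is a hypothetical stand-in, not screen-scraper's API. For illustration only, it assumes that a dataRecord behaves like a mapping from token name to matched text and that a dataSet is an ordered list of such records. The collect_token helper and the PRODUCT and PRICE tokens are invented for the example.

```python
# Hypothetical model, NOT screen-scraper's API: it only illustrates how a
# dataSet (all matches) relates to its dataRecords (one match each).
from typing import Dict, List

DataRecord = Dict[str, str]  # one match: token name -> extracted text
DataSet = List[DataRecord]   # every match of one extractor pattern


def collect_token(data_set: DataSet, token: str) -> List[str]:
    """Return the value of `token` from each dataRecord that has it."""
    return [record[token] for record in data_set if token in record]


# Suppose a pattern with hypothetical tokens PRODUCT and PRICE matched twice.
data_set: DataSet = [
    {"PRODUCT": "Widget", "PRICE": "9.99"},
    {"PRODUCT": "Gadget", "PRICE": "19.99"},
]

print(len(data_set))                       # 2 matches -> 2 dataRecords
print(collect_token(data_set, "PRODUCT"))  # ['Widget', 'Gadget']
```

In a real script you would work with the dataSet and dataRecord objects that screen-scraper supplies; check the API documentation for their actual methods.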

Variable scope

Which variables are in scope depends on when a script runs. When you associate a script with an object, such as a scraping session or scrapeable file, you specify when the script should run. The following table shows which variables are in scope for each run time. A script can access only the variables that are in scope.

An X marks each variable that is in scope.

When script is run               session  scrapeableFile  dataSet  dataRecord
Before scraping session begins   X
After scraping session ends      X
Before file is scraped           X        X
After file is scraped            X        X
Before pattern is applied        X        X
After pattern is applied         X        X               X
Once if pattern matches          X        X               X        X
Once if no matches               X        X
After each pattern match         X        X               X        X
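One way to keep the scope table straight is to encode it as data. The Python sketch below is a hypothetical model, not part of screen-scraper. The event labels are copied from the table, and the sketch assumes that each row's X marks fill the columns from left to right (session, scrapeableFile, dataSet, dataRecord), so each wider scope extends the narrower one. The names in_scope and events_with are invented helpers.

```python
# Hypothetical model of the variable-scope table; not a screen-scraper API.
# Assumes each row's X marks fill the columns left to right.
from typing import Dict, FrozenSet, List

SESSION = frozenset({"session"})
FILE = SESSION | {"scrapeableFile"}
PATTERN = FILE | {"dataSet"}
MATCH = PATTERN | {"dataRecord"}

# Run time -> variables in scope (dicts keep insertion order, i.e. table order).
SCOPE: Dict[str, FrozenSet[str]] = {
    "Before scraping session begins": SESSION,
    "After scraping session ends": SESSION,
    "Before file is scraped": FILE,
    "After file is scraped": FILE,
    "Before pattern is applied": FILE,
    "After pattern is applied": PATTERN,
    "Once if pattern matches": MATCH,
    "Once if no matches": FILE,
    "After each pattern match": MATCH,
}


def in_scope(event: str, variable: str) -> bool:
    """True if `variable` is accessible to a script run at `event`."""
    return variable in SCOPE[event]


def events_with(variable: str) -> List[str]:
    """Run times, in table order, at which `variable` is in scope."""
    return [event for event, names in SCOPE.items() if variable in names]


print(in_scope("Before file is scraped", "dataSet"))  # False
print(events_with("dataRecord"))
# ['Once if pattern matches', 'After each pattern match']
```

For example, a script that reads a dataRecord is only useful when attached to one of the run times that events_with("dataRecord") returns.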

Debugging scripts

One of the best ways to find script errors is to watch the scraping session log and the error.log file (in the log directory of your screen-scraper installation). When a script fails, screen-scraper writes a series of error messages to these logs. A good debugging approach is to build a script incrementally, running it after each addition to confirm that it still runs without errors.

When screen-scraper runs as a server, it automatically writes a separate log file to the log directory for each running scraping session (this can be disabled in the settings window). It also writes an error.log file to the same directory when internal screen-scraper errors occur.

The breakpoint window can also be invaluable when debugging scripts. To open it, add the line session.breakpoint() to your script at the point you want to inspect.