1: Scrape Updates

Scrape Process

screen-scraper can be invoked from applications built with most modern languages and platforms, including Java, Active Server Pages, PHP, .NET, and anything that supports SOAP. In this tutorial we'll walk through example applications that do just that.

Our application will pass screen-scraper the login information along with a key phrase to search for. As in the third tutorial, we'll pretend, for the sake of the example, that the web site requires us to log in before we can search. Once we've passed in the parameters, we'll tell screen-scraper to start scraping, and it will run the scraping session using those parameters. When it's done, we'll ask it for the extracted information and output it for the user to see.
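The sequence above (pass parameters in, start the scrape, read the results back) can be sketched in Python. This is a minimal illustration only: StubScrapingSession, its methods, and the SEARCH variable name are hypothetical stand-ins, not screen-scraper's actual client API, and the stub returns whatever canned results you hand it rather than scraping a real site.

```python
from typing import Any, Dict, List, Optional


class StubScrapingSession:
    """Hypothetical in-memory stand-in for a remote scraping session.

    Not a real screen-scraper API: it only mimics the
    set-variables / scrape / read-results sequence.
    """

    def __init__(self, name: str, canned_results: Optional[List[Dict[str, str]]] = None) -> None:
        self.name = name
        self.canned_results = list(canned_results or [])
        self.variables: Dict[str, Any] = {}

    def set_variable(self, key: str, value: Any) -> None:
        self.variables[key] = value

    def scrape(self) -> None:
        # A real session would log in and search; the stub only checks that
        # the login parameters were supplied, then stores its canned results
        # under the (hypothetical) PRODUCTS session variable.
        for key in ("EMAIL_ADDRESS", "PASSWORD"):
            if key not in self.variables:
                raise RuntimeError(f"missing session variable {key}")
        self.variables["PRODUCTS"] = list(self.canned_results)

    def get_variable(self, key: str) -> Any:
        return self.variables.get(key)


def search_products(session, email: str, password: str, phrase: str) -> List[Dict[str, str]]:
    """Pass the parameters in, start the scrape, then read back the extracted data."""
    session.set_variable("EMAIL_ADDRESS", email)
    session.set_variable("PASSWORD", password)
    session.set_variable("SEARCH", phrase)  # hypothetical variable name for the key phrase
    session.scrape()
    return session.get_variable("PRODUCTS") or []


session = StubScrapingSession("Shopping Site", canned_results=[{"TITLE": "placeholder"}])
print(search_products(session, "user@example.com", "secret", "dvd"))  # [{'TITLE': 'placeholder'}]
```

The driver function only depends on the three calls it makes, so the stub could be swapped for a real client wrapper without changing the calling code.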

Updates

Before we begin, we need to make a couple of minor changes to the Shopping Site scraping session from the third tutorial. If you haven't already, start up screen-scraper.

Login Parameters

Under the Shopping Site scraping session, click on the Login scrapeable file, then on the Parameters tab. We're going to alter the email_address and password POST parameters so that we can pass their values in rather than hard-coding them. For the email_address parameter, change the hard-coded email address to ~#EMAIL_ADDRESS#~, and for the password parameter, change the current value, testing, to ~#PASSWORD#~.

Remember that tokens surrounded by the ~# #~ delimiters indicate that the value of a session variable should be inserted. In our case, we're going to create an EMAIL_ADDRESS session variable holding the email address we log in with, and screen-scraper will substitute its value into the corresponding POST parameter at runtime.
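To make the token mechanism concrete, here is a minimal Python sketch of ~#NAME#~ substitution. It is an illustration of the idea, not screen-scraper's implementation: the substitute_tokens helper is hypothetical, it assumes variable names are word characters, and it leaves a token unchanged when its variable is unset, which may differ from what screen-scraper actually does.

```python
import re

# Matches tokens such as ~#EMAIL_ADDRESS#~ and captures the variable name.
TOKEN = re.compile(r"~#(\w+)#~")


def substitute_tokens(template, session_vars):
    """Replace each ~#NAME#~ token with str(session_vars[NAME]).

    Tokens whose variable is not set are left as-is (an assumption of
    this sketch, not necessarily screen-scraper's behavior).
    """
    def lookup(match):
        name = match.group(1)
        if name in session_vars:
            return str(session_vars[name])
        return match.group(0)

    return TOKEN.sub(lookup, template)


# POST parameter values as configured on the Login scrapeable file.
post_params = {"email_address": "~#EMAIL_ADDRESS#~", "password": "~#PASSWORD#~"}
# Placeholder session variables an external application might set.
session_vars = {"EMAIL_ADDRESS": "user@example.com", "PASSWORD": "secret"}

resolved = {k: substitute_tokens(v, session_vars) for k, v in post_params.items()}
print(resolved)  # {'email_address': 'user@example.com', 'password': 'secret'}
```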

Products Extractor Pattern

To simplify the process of giving an external script access to the extracted product details, we will save the data set into a session variable.

Click on the Details page scrapeable file. On the PRODUCTS extractor pattern, select the Advanced tab and check the box next to Automatically save the data set generated by this extractor pattern in a session variable.

Initialization Script

The code we'll write in our external application will essentially take the place of the Shopping Site--initialize session script. We'll disable that script's association with the scraping session, since the script would otherwise overwrite the values we pass in externally.
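The reason for disabling the script can be modeled in a few lines of Python. This sketch assumes, as the paragraph above implies, that an enabled initialization script runs after the externally passed values are in place and sets the same variables; init_session_script and start_session are hypothetical helpers, and the hard-coded values are placeholders rather than the tutorial's actual script contents.

```python
from typing import Dict


def init_session_script(session_vars: Dict[str, str]) -> None:
    """Hypothetical stand-in for an initialization script that hard-codes login values."""
    session_vars["EMAIL_ADDRESS"] = "hard-coded@example.com"
    session_vars["PASSWORD"] = "hard-coded"


def start_session(external_vars: Dict[str, str], init_enabled: bool) -> Dict[str, str]:
    """Model of the assumed run order: external values first, then the init script."""
    session_vars = dict(external_vars)
    if init_enabled:
        init_session_script(session_vars)  # clobbers the values passed in
    return session_vars


passed_in = {"EMAIL_ADDRESS": "user@example.com", "PASSWORD": "secret"}
print(start_session(passed_in, init_enabled=True)["EMAIL_ADDRESS"])   # hard-coded@example.com
print(start_session(passed_in, init_enabled=False)["EMAIL_ADDRESS"])  # user@example.com
```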

To do that, click on the Shopping Site scraping session in the objects tree and uncheck the Enabled checkbox for the Shopping Site--initialize session script.

Prepare screen-scraper for Application

Save your changes and exit screen-scraper. Then, so that external applications can interact with it, start screen-scraper running as a server.