1: Initialization Script

The Extension

A significant limitation of our first Hello World was that we could only scrape the text from our first request. That is, we were always scraping the text "Hello world!", which really isn't that useful. We'll now adjust our setup so that we can designate the text to be submitted in the form.

Initialization Script

First, we're going to set a session variable that will hold the text we'd like submitted in the form.

Session variables are used by screen-scraper to transfer information between scripts, scrapeable files, and other objects. Session variables are generally set from within scripts, but can also be automatically set within extractor patterns as well as passed in from external applications.

We'll now set up a script to set a session variable before our scraping session runs. Create a new script as you've done before, and call it Initialize scraping session. Copy the code below into the Script Text field in the script:

// Put the text to be submitted in the form into a
// session variable so we can reference it later.
session.setVariable( "TEXT_TO_SUBMIT", "Hi everybody!" );

Hopefully the script seems pretty straightforward. It sets a session variable named TEXT_TO_SUBMIT, and gives it the value Hi everybody! (spoken, of course, in your best Dr. Nick voice).

Setting the session variable TEXT_TO_SUBMIT will allow us to access that value in other scripts and scrapeable files while our Hello World scraping session is running.
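To make the set-then-read flow concrete, here is a minimal sketch in plain Java that can be run outside of screen-scraper. The SessionSketch class is a hypothetical stand-in, not screen-scraper's own session object; it only assumes that session variables behave like named values stored for the life of the session, as described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the session object; it models only the
// behavior described above: a value set by one script can be read by another.
class SessionSketch {
    private final Map<String, Object> variables = new HashMap<>();

    void setVariable(String name, Object value) {
        variables.put(name, value);
    }

    Object getVariable(String name) {
        // Returns null in this sketch if the variable was never set.
        return variables.get(name);
    }

    // Mimics two scripts sharing one session: the first sets the value,
    // the second reads it back later.
    static String demo() {
        SessionSketch session = new SessionSketch();
        session.setVariable("TEXT_TO_SUBMIT", "Hi everybody!");
        return (String) session.getVariable("TEXT_TO_SUBMIT");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the order matters: the value has to be set before anything tries to read it.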

Later in this tutorial we will replace this script with a call from our external script, so it might help to think of it as a debug script. We place it in our code so that we can run the scraping session from the workbench, but remove it later so that it doesn't interfere with our external scripts.

Adding Script Association

We'll now need to associate our script with our scraping session so that it gets invoked before the scraping session begins.

To do that, click on the Hello World scraping session in the objects tree on the left, then (in the section towards the bottom of the window) click the Add Script button to add the association. In the Script Name column select Initialize scraping session. The When to Run column should show Before scraping session begins, and the Enabled checkbox should be checked. This will cause our script to get executed at the very beginning of the scraping session so that the TEXT_TO_SUBMIT session variable can get set.

Scrapeable File Updates

Just as we use special tokens in extractor patterns to designate values we'd like to extract, we use special tokens to insert values of session variables into the URLs or parameters (GET, POST, or BASIC authentication) of scrapeable files. We'll do this now by embedding it into one of the parameters of our only scrapeable file. Expand the Hello World scraping session in the objects tree, then select the Form submission scrapeable file. Click on the Parameters tab. In the Value column for our text_string parameter replace the text Hello world! with the text ~#TEXT_TO_SUBMIT#~

The ~# and #~ delimiters are used to designate a session variable whose value should be inserted into that location when the scrapeable file gets executed. When the scrapeable file gets invoked, screen-scraper will construct the URL by including the text_string parameter in it. In other words, the URL for our scrapeable file will become:

http://www.screen-scraper.com/screen-scraper/tutorial/basic_form.php?text_string=Hi+everybody%21
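The substitution and encoding steps can be sketched in plain Java. This is only an illustration of the idea, not screen-scraper's actual implementation: the TokenSketch class, its method names, and the choice to leave unknown tokens untouched are all assumptions made for the example. The URL encoding uses the standard java.net.URLEncoder, which turns the space into + and the exclamation point into %21.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics replacing ~#NAME#~ tokens with values
// and building a GET URL from the result.
class TokenSketch {
    // Matches tokens of the form ~#NAME#~ and captures NAME.
    private static final Pattern TOKEN = Pattern.compile("~#(\\w+)#~");

    static String substitute(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Assumption: unknown tokens are left as-is in this sketch.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String buildUrl(String baseUrl, String paramName,
                           String valueTemplate, Map<String, String> vars) {
        String value = substitute(valueTemplate, vars);
        try {
            // URL-encode the value so it is safe inside a query string.
            return baseUrl + "?" + paramName + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("TEXT_TO_SUBMIT", "Hi everybody!");
        System.out.println(buildUrl(
            "http://www.screen-scraper.com/screen-scraper/tutorial/basic_form.php",
            "text_string", "~#TEXT_TO_SUBMIT#~", vars));
    }
}
```

Running the sketch prints the same URL shown above, ending in text_string=Hi+everybody%21.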

Test Run

We're going to run our scraping session, but before doing that, clear out the scraping session log by selecting the Hello World scraping session in the objects tree, clicking on the Log tab, then on the Clear Log button. Start up the scraping session by clicking the Run Scraping Session button. Once the scrape has run, you should get a log similar to the one in the figure below.

If you look at the contents of the form_submitted_text.txt file (in the screen-scraper installation directory), you'll find the text Hi everybody! in it. If you still have the file from an earlier run, you might need to look for the new text.

Remember that it's a good idea to run scraping sessions often as you make changes, and watch the log and last responses to ensure that things are working as you expect them to.