2: Scrape Updates

Session Variables

We'll modify our existing scraping session a bit to get it ready to save the scraped data to our database. First, click on the Details page scrapeable file in the objects tree, then on the Extractor Patterns tab, then click the Sub-Extractor Patterns tab for our PRODUCTS extractor pattern.

We're going to update each of our extractor pattern tokens so that they save their extracted values in a session variable. Do this by double-clicking each of them (e.g., on TITLE) or right-clicking (control-clicking on Mac OS X) and selecting Edit token. In the Edit Token window click the Save in session variable check box, then close the window. Do this for each extractor token (TITLE, PRICE, etc.).

We save the values in session variables so that we can use them as POST parameters in the scrapeable file that will POST to our PHP file.

New Scrapeable File

Let's create that scrapeable file now. Click on the Shopping Site scrapeable file in the objects tree, then the Add Scrapeable File button, found on the General tab. Once the scrapeable file appears give it the name Save product. In the URL field enter:

http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php

This URL points to a completed example of the PHP file. We will discuss the file in more detail later.

Check the box labeled This scrapeable file will be invoked manually from a script so that it will not run in sequence.

Now add the required parameters. Click on the Parameters tab for the new scrapeable file, and give it five POST parameters:

Key                Value
title              ~#TITLE#~
price              ~#PRICE#~
manufactured_by    ~#MANUFACTURED_BY#~
model              ~#MODEL#~
shipping_weight    ~#SHIPPING_WEIGHT#~

Remember that the ~# #~ delimiters indicate that the value of the corresponding session variable should be substituted in. For example, in our case the value of the TITLE session variable (e.g., "A Bug's Life") will be substituted in for the TITLE token. This value will be the one that gets submitted to the PHP file so that it can be inserted into the database.
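To make the substitution concrete, here is a minimal stand-alone Java sketch of the idea. It is not screen-scraper's own implementation: the class name TokenSubstitution, the substitute method, and the choice to replace unknown variables with an empty string are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics ~#NAME#~ substitution; not part of screen-scraper.
class TokenSubstitution {
    // Matches tokens such as ~#TITLE#~ and captures the variable name.
    private static final Pattern TOKEN = Pattern.compile("~#([A-Z0-9_]+)#~");

    // Replaces each ~#NAME#~ token with the value of the matching session variable.
    // Assumption of this sketch: a variable that is not set becomes an empty string.
    static String substitute(String template, Map<String, String> sessionVariables) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = sessionVariables.getOrDefault(m.group(1), "");
            // quoteReplacement keeps characters like $ and \ in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> sessionVariables = new LinkedHashMap<>();
        sessionVariables.put("TITLE", "A Bug's Life");
        sessionVariables.put("PRICE", "35.99");

        // The parameter value ~#TITLE#~ becomes the current TITLE session variable.
        System.out.println(substitute("~#TITLE#~", sessionVariables));
        System.out.println(substitute("~#PRICE#~", sessionVariables));
    }
}
```

Because the session variables are overwritten for each product, the same five parameter definitions produce a different set of submitted values every time the scrapeable file is invoked.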

New Script

Finally, we need to create a simple script that will invoke our new scrapeable file. Click on the (Add a new script) button. Give the script the name Save product, and give it the Script Text:

session.scrapeFile( "Save product" );

The script simply tells screen-scraper to invoke the Save product scrapeable file.
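Roughly speaking, invoking the scrapeable file amounts to sending an HTTP POST whose body carries the five parameters in form-encoded form. The following stand-alone Java sketch only builds such a body so you can see what the PHP file would receive. It assumes standard application/x-www-form-urlencoded encoding; the class name FormBody and the sample values are hypothetical, and screen-scraper's exact encoding may differ in detail.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper showing what a form-encoded POST body looks like.
class FormBody {
    // Joins key=value pairs with '&', URL-encoding both keys and values.
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Sample values standing in for the session variables of one product.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("title", "A Bug's Life");
        params.put("price", "35.99");
        params.put("manufactured_by", "Warner");
        params.put("model", "DVD-ABUG");
        params.put("shipping_weight", "7.00 lbs.");

        // Spaces become '+', and the apostrophe becomes %27.
        System.out.println(encode(params));
    }
}
```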

New Script Association

We need to invoke the Save product scrapeable file for each product so that they all get saved to the database. To do that, we'll run the script after each Details page is requested. Click on the Details page scrapeable file in the objects tree, then on the Properties tab, and then on the Add Script button. In the Script Name column select Save product. Under the When to Run column select After file is scraped.

Test Run

Okay, we're done setting up screen-scraper, so we're ready to give our scraping session a run. Before we invoke it, let's make one minor tweak so that the session doesn't take quite so long to run.

In the Shopping Site--initialize session script, change the value of the SEARCH session variable from dvd to bug.

This way we'll get the two "Bug's Life" DVDs rather than every DVD in the system.

Once you've done that click on the Shopping Site scraping session in the objects tree, then on the Run Scraping Session button.

Check the Results

Once the scraping session has run its course, click on the Save product scrapeable file, then on the Last Response tab. You should see something like this for the response:

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <status>Success</status>
  <product>
    <title>A Bug\'s Life \"Multi Pak\"</title>
    <price>35.99</price>
    <manufactured_by>Warner</manufactured_by>
    <model>DVD-ABUG</model>
    <shipping_weight>7.00 lbs.</shipping_weight>
  </product>
</result>

This indicates that the last product was successfully inserted.
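If you would rather check the response programmatically than by eye, a small stand-alone Java sketch like the one below can parse the XML and read the status element. It uses only the JDK's built-in DOM parser. The class name SaveProductResponse and its helper methods are hypothetical, and the sample string simply reproduces the response shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for inspecting the XML returned by save_product.php.
class SaveProductResponse {
    // The sample response from the tutorial, written as a Java string.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<result><status>Success</status><product>"
            + "<title>A Bug\\'s Life \\\"Multi Pak\\\"</title>"
            + "<price>35.99</price>"
            + "<manufactured_by>Warner</manufactured_by>"
            + "<model>DVD-ABUG</model>"
            + "<shipping_weight>7.00 lbs.</shipping_weight>"
            + "</product></result>";

    // Parses the response text into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }

    // Returns the trimmed text of the first element with the given tag, or null if absent.
    static String textOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // True when the status element reads "Success", as in the sample response.
    static boolean isSuccess(String xml) {
        return "Success".equals(textOf(parse(xml), "status"));
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Saved: " + isSuccess(SAMPLE));
        System.out.println("Model: " + textOf(doc, "model"));
    }
}
```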