8: Saving the Data
Processing Collected Information
Once screen-scraper extracts data there are a number of things that can be done with it. For example, you might be invoking screen-scraper from an external script (e.g., ASP, PHP, Java, etc.), which, after telling screen-scraper to extract data for a given search string, might display the results to the user. In our case we'll simply write the data out to a text file. To do this, we'll once again write a script.
Create a new script, call it Write data to a file, and use the following Interpreted Java code snippit:
try
{
session.log( "Writing data to a file." );
// Open up the file to be appended to.
out = new FileWriter( "dvds.txt", true );
// Write out the data to the file.
out.write( dataRecord.get( "TITLE" ) + "\t" );
out.write( dataRecord.get( "PRICE" ) + "\t" );
out.write( dataRecord.get( "MODEL" ) + "\t" );
out.write( dataRecord.get( "SHIPPING_WEIGHT" ) + "\t" );
out.write( dataRecord.get( "MANUFACTURED_BY" ) );
out.write( "\n" );
// Close up the file.
out.close();
}
catch( Exception e )
{
session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
This script simply takes the contents of the current data record (which for us will be the data record that constitutes a single DVD) and appends it to a dvds.txt text file.
If you're familiar with Java, hopefully the scripts make sense. There is one important point worth noting, though. You'll notice that each script makes use of a DataRecord object (referenced as the dataRecord variable in the scripts). This object refers to the current DataRecord as the script is executed.
Return to the spreadsheet example. When the script gets invoked, a specific DataRecord (or row in the spreadsheet) will be current. This DataRecord automatically becomes a variable you can use in your script. The dataRecord object has a get method, which allows you to retrieve the value for a key it contains (i.e., you're referencing a specific cell in the spreadsheet). Again, you can read more about objects available to scripts and their scope in our documentation: Using Scripts and API.
Script Association
Click on the Details page scrapeable file, then on the Extractor Patterns tab. Below the extractor pattern click the Add Script button to add an association. In the Script Name column, select Write data to a file and in the When to Run column select After each pattern match (even though there will only be one match per page). For each DVD, we'll execute the script that will write the information out to a file.
To clarify a bit further, because we're invoking the script After each pattern match, the dataRecord variable will be in scope. In other words, for each row in the spreadsheet (which happens to be a single row in this case) screen-scraper will execute the Write data to a file script. Each time it gets invoked a DataRecord will be current (again, think of it walking through each row in the spreadsheet). As such, we have access to the current row in the spreadsheet by way of the dataRecord variable.
Had we indicated that the script was to be invoked After pattern is applied, the dataRecord would not be in scope. Again using the spreadsheet analogy, scripts that get invoked After pattern is applied would run after screen-scraper had walked through all of the rows in the spreadsheet, so no DataRecord would be in scope (i.e., it's at the end of the spreadsheet: after the very last row). See the Variable scope section in our documentation for more detail on which variables are in scope depending on when a given script is run.
Final Test
Run the scraping session one last time. If you check the directory where screen-scraper is installed you'll notice a dvds.txt file that will grow as the DVD detail pages get scraped.
Alternative for Professional and Enterprise Users
As an alternative to the above script you could use the following code:
The writeToFile method is an advanced feature and will write the whole of a DataSet (the spreadsheet) to a file. If you want to specify the order in which the columns are written you will have to use the first method as writeToFile saves them in the same order that is shown when testing the extractor patterns.
If you would like information on saving extracted data to a database please consult our FAQ on the topic.