Utilities API

Overview

screen-scraper ships with a number of utility classes that can help your scripts run correctly. Many were first developed in-house to speed up coding and were offered to the public once they proved stable. Unlike the built-in screen-scraper objects, these classes are not imported automatically: you must import their packages in any script that uses them.

Classes

CsvWriter (com.screenscraper.csv): For recording data into a CSV file (helpful for Excel).
DataManagerFactory (com.screenscraper.datamanager): Facilitates the creation of an SqlDataManager.
ProxyServerPool (com.screenscraper.util): For setting up anonymization using your own proxies.
RetryPolicy and RetryPolicyFactory (com.screenscraper.util.retry): Objects that tell a scrapeable file how to check for errors, and optionally what to do before retrying the download.
SqlDataManager (com.screenscraper.datamanager): Facilitates writing of data into a SQL database.
XmlWriter (com.screenscraper.xml): Oftentimes you want to write extracted data directly to an XML file. This class facilitates doing that.

Apache Lang Library

Overview

The Apache Lang library provides enhancements to Java's standard java.lang package and can be particularly useful for common tasks such as string manipulation. Because we do not maintain the library, we do not document its methods here, as they may change without notice; please consult its own API documentation instead.

CSVReader

Overview

CSVReader is not part of screen-scraper, but it is very useful and well put together, and we have used it extensively. It belongs to the opencsv package, which also provides the underpinnings of our own CsvWriter. Because we do not maintain the class, we do not document its methods here, as they may change without notice; please consult the opencsv API or its brief documentation instead.

Using CSVReader

To use the CSVReader simply import it in your script, the same as you would any other utility class. The opencsv.jar file is already included in the Professional and Enterprise Editions of screen-scraper's default installation.

//import opencsv class
import au.com.bytecode.opencsv.*;

// read file
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));

CsvWriter

Overview

This CsvWriter has been created to work particularly well with the screen-scraper objects. It is simple to use and provided to ease the task of keeping track of everything when creating a csv file.

The most used methods are documented here but if you would like more information you can read the JavaDoc for the CsvWriter.

CsvWriter

CsvWriter CsvWriter ( String filePath ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, boolean addTimeStamp ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, boolean addTimeStamp ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar, char escapechar ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar, String lineEnd ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar, char escapechar, String lineEnd ) (professional and enterprise editions only)

Description

Create a csv file writer.

Parameters

filePath File path to where the csv file should be created/saved, as a string.
addTimeStamp (optional) If true a time stamp will be added to the filename; otherwise, the filePath will remain unchanged.
separator (optional) The character that should be used to separate the fields in the csv file, the default is char 44 (comma).
quotechar (optional) The character that should be used to quote fields, the default is char 34 (straight double-quotes).
escapechar (optional) The escape character for quotes, the default is char 34 (straight double-quotes).
lineEnd (optional) The end of line character, as a string. The default is the new line character ("\n").
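
To make the roles of separator, quotechar, escapechar, and lineEnd concrete, here is a minimal sketch of how one row could be formatted with the defaults listed above. It is not CsvWriter's actual implementation: the CsvFormatSketch class and formatRow method are hypothetical, and the sketch assumes every field is quoted.

```java
final class CsvFormatSketch {
    // Hypothetical helper, not part of screen-scraper: formats one row using
    // the four formatting parameters described above.
    static String formatRow(String[] fields, char separator, char quotechar,
                            char escapechar, String lineEnd) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                row.append(separator);
            }
            row.append(quotechar);
            for (char c : fields[i].toCharArray()) {
                // Prefix quote and escape characters inside a field with the escape character.
                if (c == quotechar || c == escapechar) {
                    row.append(escapechar);
                }
                row.append(c);
            }
            row.append(quotechar);
        }
        return row.append(lineEnd).toString();
    }

    public static void main(String[] args) {
        String[] fields = {"Brand \"X\"", "a,b"};
        // Defaults from the parameter list: comma, double quote, double quote, "\n".
        System.out.print(formatRow(fields, ',', '"', '"', "\n"));
    }
}
```

With the defaults, a double quote inside a field is doubled, and a comma inside a field is protected by the surrounding quotes.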

Return Values

Returns a CsvWriter object. Throws an exception if an error is encountered.

Change Log

Version	Description
5.0	Available for Professional and Enterprise editions.
4.5.18a	Introduced in alpha version.

Class Location

com.screenscraper.csv.CsvWriter

Examples

Create CsvWriter

// Import class
import com.screenscraper.csv.*;

// Create CsvWriter with timestamp
CsvWriter writer = new CsvWriter("output.csv", true);

// Save in session variable for general access
session.setVariable( "WRITER", writer);

close

void csvWriter.close ( )

Description

Flush the buffer contents to the file and close the file.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

Examples

Close CsvWriter

// Retrieve CsvWriter from session variable
writer = session.getv( "WRITER" );

// Write buffer and close file
writer.close();

flush

void csvWriter.flush ( )

Description

Write the buffer contents to the file.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

Examples

Write Data Record to CSV

// Retrieve CsvWriter from session variable
writer = session.getv( "WRITER" );

// Write dataRecord to the file (headers already set)
writer.write(dataRecord);

// Flush record to file (write it now)
writer.flush();

setHeader

void csvWriter.setHeader ( String[ ] header )

Description

Set the header row of the csv document. If the document already exists the headers will not be written. Also creates a data record mapping to ease writing to file.

Parameters

header Headers of csv file, as a one-dimensional array of strings.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

If you want to use the data record mapping, the extractor token names should be in all caps, with all spaces replaced by underscores.

Examples

Add Headers to CSV File

// Create CsvWriter with timestamp
CsvWriter writer = new CsvWriter("output.csv", true);

// Create Headers Array
String[] header = {"Brand Name", "Product Title"};

// Set Headers
writer.setHeader(header);

// Write out to file
writer.flush();

// Save in session variable for general access
session.setVariable( "WRITER", writer);

write

void csvWriter.write ( DataRecord dataRecord )

Description

Write a data record to the csv file as a row, using the mapping created by setHeader.

Parameters

dataRecord The data record containing the mapped token matches (see setHeader). Note that the token names in the data record should be in all caps, and spaces should be replaced with underscores. For example, if one of your headers is "Product ID", the corresponding data record token should be "PRODUCT_ID". This is in keeping with the recommended naming convention for extractor pattern tokens.
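
The naming rule above can be sketched as a small function. This is only an illustration of the convention, not CsvWriter's internal mapping code; the TokenNameSketch class and toTokenName method are hypothetical names.

```java
import java.util.Locale;

final class TokenNameSketch {
    // Hypothetical helper: derives the token name that a header is expected
    // to match (all caps, spaces replaced with underscores).
    static String toTokenName(String header) {
        return header.trim().toUpperCase(Locale.ROOT).replace(' ', '_');
    }

    public static void main(String[] args) {
        String[] headers = {"Brand Name", "Product Title", "Product ID"};
        for (String header : headers) {
            System.out.println(header + " -> " + toTokenName(header));
        }
    }
}
```

A helper like this can be handy for checking that your extractor pattern tokens line up with the headers passed to setHeader.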

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

Examples

Write Data Record to CSV

DataManagerFactory

Overview

This class is used to instantiate a data manager object. This is done to simplify the process of creating a data manager of a given type. Currently it only creates SqlDataManagers. A SQL data manager can be created without the use of this class, but it is simplified greatly through its use.

This class should no longer be used. Use an org.apache.commons.dbcp.BasicDataSource or com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

This class is only available for Professional and Enterprise editions of screen-scraper.

getMsSqlDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getMsSqlDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create a MsSQL data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (URL and maybe Port), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.

Return Values

Returns a SqlDataManager object. If an error is experienced it will be thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

In order to create the MsSQL data manager you will need to make sure to install the appropriate jdbc driver. This can be done by downloading the MsSQL JDBC driver and placing it in the lib/ext folder in the screen-scraper installation directory.

Examples

Create MsSQL Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get MsSQL datamanager
dm = DataManagerFactory.getMsSqlDataManager( session, host, database, username, password, parameters);

getMySqlDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getMySqlDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create a MySQL data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (URL and maybe Port), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.
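
The host, database, and parameters arguments correspond to the pieces of a standard MySQL JDBC URL. How the factory assembles them internally is not documented here, so the following is only a sketch under that assumption; JdbcUrlSketch and mySqlUrl are hypothetical names, and the sample parameters are ordinary MySQL Connector/J options used for illustration.

```java
final class JdbcUrlSketch {
    // Hypothetical helper: combines the documented arguments into the usual
    // jdbc:mysql://host[:port]/database[?parameters] form.
    static String mySqlUrl(String host, String database, String parameters) {
        String url = "jdbc:mysql://" + host + "/" + database;
        if (parameters != null && !parameters.isEmpty()) {
            // parameters is expected to be URL encoded already.
            url += "?" + parameters;
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(mySqlUrl("127.0.0.1:3306", "mydb", null));
        System.out.println(mySqlUrl("127.0.0.1:3306", "mydb",
                "useUnicode=true&characterEncoding=UTF-8"));
    }
}
```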

Return Values

Returns a SqlDataManager object. If an error is experienced it will be thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

In order to create the MySQL data manager you will need to make sure to install the appropriate jdbc driver. This can be done by downloading the MySQL JDBC driver and placing it in the lib/ext folder in the screen-scraper installation directory.

Examples

Create MySQL Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get MySQL datamanager
dm = DataManagerFactory.getMySqlDataManager( session, host, database, username, password, parameters);

getOracleDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getOracleDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create an Oracle data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (URL and maybe Port), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.

Return Values

Returns a SqlDataManager object. If an error is experienced it will be thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

In order to create the Oracle data manager you will need to make sure to install the appropriate jdbc driver. This can be done by downloading the Oracle JDBC driver and placing it in the lib/ext folder in the screen-scraper installation directory.

Examples

Create an Oracle Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:1521";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get Oracle datamanager
dm = DataManagerFactory.getOracleDataManager( session, host, database, username, password, parameters);

getPostreSqlDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getPostreSqlDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create a PostgreSQL data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (URL and maybe Port), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.

Return Values

Returns a SqlDataManager object. If an error is experienced it will be thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

In order to create the PostgreSQL data manager you will need to make sure to install the appropriate jdbc driver. This can be done by downloading the PostgreSQL JDBC driver and placing it in the lib/ext folder in the screen-scraper installation directory.

Examples

Create a PostgreSQL Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:5432";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get PostgreSQL datamanager
dm = DataManagerFactory.getPostreSqlDataManager( session, host, database, username, password, parameters);

getSqliteDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getSqliteDataManager ( ScrapingSession session, String file, String username, String password ) (professional and enterprise editions only)

Description

Create a SQLite data manager object.

Parameters

session The scraping session that the data manager should be attached to.
file The file path of the sqlite file, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.

Return Values

Returns a SqlDataManager object. If an error is experienced it will be thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

In order to create the SQLite data manager you will need to make sure to install the appropriate jdbc driver. This can be done by downloading the SQLite JDBC driver and placing it in the lib/ext folder in the screen-scraper installation directory.

Examples

Create a SQLite Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
file = "c:/db/mydb.sqlite";
username = "user";
password = "pwrd";

// Get Sqlite datamanager
dm = DataManagerFactory.getSqliteDataManager( session, file, username, password);

ProxyServerPool

Overview

The proxy server pool object is used to aid with manual anonymization of scrapes. An example of how to set up manual proxy pools is available in the documentation. You will likely want to read that page first if you are new to the process.

Additionally, you should reference the methods available in the Anonymous API.

ProxyServerPool

ProxyServerPool ProxyServerPool ( )

Description

Initiate a ProxyServerPool object.

Parameters

This method does not receive any parameters.

Return Values

Returns a ProxyServerPool.

Change Log

Version	Description
4.5	Available for all editions.

Class Location

com.screenscraper.util.ProxyServerPool

Examples

Creating ProxyServerPool

import com.screenscraper.util.*;

// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.

proxyServerPool = new ProxyServerPool();

filter

void proxyServerPool.filter ( int timeout )

Description

Set the timeout that will render a proxy as being bad.

Parameters

timeout Number of seconds before timeout, as an integer.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Set Up Timeout for Bad Proxies

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate proxies up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// This method call tells screen-scraper to filter the list of
// proxy servers using 7 seconds as a timeout value. That is,
// if a server doesn't respond within 7 seconds, it's deemed
// to be invalid.

proxyServerPool.filter( 7 );
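
The combination of a concurrency limit and a timeout can be pictured with a small stand-alone sketch. This is not how ProxyServerPool is implemented; ProxyFilterSketch, its filter method, and the probe argument (a stand-in for an actual connectivity test) are all hypothetical, as is the bad.proxy.com entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

final class ProxyFilterSketch {
    // Hypothetical helper: keeps the proxies whose probe reports success.
    // At most `concurrency` probes run at once, and each result is awaited
    // for up to `timeoutSeconds` seconds.
    static List<String> filter(List<String> proxies, Predicate<String> probe,
                               int concurrency, int timeoutSeconds) {
        ExecutorService executor = Executors.newFixedThreadPool(concurrency);
        List<String> good = new ArrayList<>();
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String proxy : proxies) {
                Callable<Boolean> task = () -> probe.test(proxy);
                results.add(executor.submit(task));
            }
            for (int i = 0; i < proxies.size(); i++) {
                try {
                    if (results.get(i).get(timeoutSeconds, TimeUnit.SECONDS)) {
                        good.add(proxies.get(i));
                    }
                } catch (ExecutionException | TimeoutException e) {
                    // A failed or slow probe marks the proxy as bad.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } finally {
            executor.shutdownNow();
        }
        return good;
    }

    public static void main(String[] args) {
        List<String> proxies = Arrays.asList(
                "one.proxy.com:8888", "bad.proxy.com:3128", "two.proxy.com:3128");
        // Simulated probe: pretend every host starting with "bad" is unreachable.
        System.out.println(filter(proxies, p -> !p.startsWith("bad"), 25, 7));
    }
}
```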

getNumProxyServers

int proxyServerPool.getNumProxyServers ( )

Description

Retrieve the number of available proxy servers.

Parameters

This method does not receive any parameters.

Return Values

Returns the number of available proxy servers, as an integer.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Good Proxies to File

outputProxyServersToLog

void proxyServerPool.outputProxyServersToLog ( )

Description

Write list of proxies to log.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Good Proxies to File

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate proxies up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// Set timeout interval
proxyServerPool.filter( 7 );

// Write good proxies to file
proxyServerPool.writeProxyPoolToFile( "good_proxies.txt" );

// You might also want to write out the list of proxy servers
// to screen-scraper's log.

proxyServerPool.outputProxyServersToLog();

populateFromFile

void proxyServerPool.populateFromFile ( String filePath )

Description

Add proxy servers to pool using a text file.

Parameters

filePath Path to the file containing the proxy list, as a string. The file should list one proxy per line, each in host:port format.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Creating ProxyServerPool

import com.screenscraper.util.*;

// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.

proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
// containing a list of proxy servers. The format is very
// simple--you should have a proxy server on each line of
// the file, with the host separated from the port by a colon.
// For example:
// one.proxy.com:8888
// two.proxy.com:3128
// 29.283.928.10:8080
// But obviously without the slashes at the beginning.

proxyServerPool.populateFromFile( "proxies.txt" );
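
To illustrate the file format described in the comments above, here is a minimal sketch of a parser for such a list. It is not the code populateFromFile uses; ProxyListSketch and parse are hypothetical names, and skipping blank lines is an assumption of the sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ProxyListSketch {
    // Hypothetical helper: turns "host:port" lines into (host, port) pairs.
    static List<Map.Entry<String, Integer>> parse(String fileContents) {
        List<Map.Entry<String, Integer>> proxies = new ArrayList<>();
        for (String line : fileContents.split("\\r?\\n")) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            int colon = entry.lastIndexOf(':');
            if (colon < 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + entry);
            }
            String host = entry.substring(0, colon);
            Integer port = Integer.valueOf(entry.substring(colon + 1));
            proxies.add(new SimpleEntry<String, Integer>(host, port));
        }
        return proxies;
    }

    public static void main(String[] args) {
        String contents = "one.proxy.com:8888\ntwo.proxy.com:3128\n";
        for (Map.Entry<String, Integer> proxy : parse(contents)) {
            System.out.println(proxy.getKey() + " on port " + proxy.getValue());
        }
    }
}
```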

setAutomaticProxyCycling

void proxyServerPool.setAutomaticProxyCycling ( boolean cycleProxies ) (professional and enterprise editions only)

Description

Enables or disables automatic proxy cycling. When this is set to false (default is true) the current proxy that was automatically selected from the pool will be used each time the next proxy is requested. When set to true, each call to the getNextProxy method will cycle as normal between all available proxies.

Parameters

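cycleProxies Whether the pool should cycle through the available proxies (true) or keep reusing the current proxy (false), as a boolean.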

Return Value

None

Change Log

Version	Description
5.5.17a	Available in Professional and Enterprise editions.

Example

// Assuming a ProxyServerPool object was created previously, and
// stored in the PROXY_SERVER_POOL session variable.
pool = session.getv( "PROXY_SERVER_POOL" );

// This will cause the current proxy server to be reused until the
// value is set back to true.
pool.setAutomaticProxyCycling( false );

// The corresponding getter will indicate what the current value is.
session.log( "Automatically cycling proxies: " + pool.getAutomaticProxyCycling() );
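
The effect of the flag can also be shown with a small stand-alone model. The CyclingSketch class below is hypothetical and only mimics the behavior described above (round-robin selection when cycling is on, reuse of the current proxy when it is off); the real pool's selection logic may differ.

```java
import java.util.Arrays;
import java.util.List;

final class CyclingSketch {
    private final List<String> proxies;
    private int index = -1;                   // nothing selected yet
    private boolean automaticCycling = true;  // documented default

    CyclingSketch(List<String> proxies) {
        this.proxies = proxies;
    }

    void setAutomaticProxyCycling(boolean cycleProxies) {
        automaticCycling = cycleProxies;
    }

    // Advances to the next proxy only when cycling is enabled
    // (or when no proxy has been selected yet).
    String getNextProxy() {
        if (automaticCycling || index < 0) {
            index = (index + 1) % proxies.size();
        }
        return proxies.get(index);
    }

    public static void main(String[] args) {
        CyclingSketch pool = new CyclingSketch(
                Arrays.asList("one.proxy.com:8888", "two.proxy.com:3128"));
        System.out.println(pool.getNextProxy());
        pool.setAutomaticProxyCycling(false);
        System.out.println(pool.getNextProxy()); // same proxy as the line above
    }
}
```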

setNumProxiesToValidateConcurrently

void proxyServerPool.setNumProxiesToValidateConcurrently ( int numProxies )

Description

Set the number of proxies that can be tested concurrently.

Parameters

numProxies Number of proxies to be validated concurrently, as an integer.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Test Proxies in Pool in Multiple Threads

setRepopulateThreshold

void proxyServerPool.setRepopulateThreshold ( int repopulateThreshold )

Description

Set threshold to get more proxy servers.

Parameters

repopulateThreshold Lowest number of proxies before more proxies are requested.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Good Proxies to File

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate proxies up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// Set timeout interval
proxyServerPool.filter( 7 );

// Write good proxies to file
proxyServerPool.writeProxyPoolToFile( "good_proxies.txt" );

// Write Proxy Servers to log
proxyServerPool.outputProxyServersToLog();

// As a scraping session runs, screen-scraper will filter out
// proxies that become non-responsive. If the number of proxies
// gets down to a specified level, screen-scraper can repopulate
// itself. That's what this method call controls.

proxyServerPool.setRepopulateThreshold( 5 );

writeProxyPoolToFile

void proxyServerPool.writeProxyPoolToFile ( String path )

Description

Write list of proxies after invalid proxies have been removed.

Parameters

path File path to where the file should be written, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Good Proxies to File

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate proxies up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// Set timeout interval
proxyServerPool.filter( 7 );

// Once filtering is done, it's often helpful to write the good
// set of proxies out to a file. That way you may not have to
// filter again the next time.

proxyServerPool.writeProxyPoolToFile( "good_proxies.txt" );

RetryPolicy

Overview

Retry Policies are objects that tell a scrapeable file how to check for errors, and optionally what to do before retrying to download the files. Some of the things that can be done are executing scripts when the page loads incorrectly or running Runnables. Usually these things would either request a new proxy, output some helpful information, or could simply stop the scrape. RetryPolicy is an interface and can be implemented to create a custom retry policy, or there is a RetryPolicyFactory class that can be used to create some standard policies.

This policy is checked AFTER all the extractors have been run. This allows for checks on whether extractor patterns matched or not, and also allows a page to have its 'error status' based on another page (since extractor patterns could execute scripts that scrape other files, and those files could set a variable that acts as a flag to a previous retry policy). This could also cause some problems if the scrape isn't built to handle a page whose extractors shouldn't be run before the error checking occurs.
This interface is in the com.screenscraper.util.retry package.

Interface Implementation

If you need a custom retry policy, you can implement your own version of it. Be aware that you will need to ensure the references it has to the scrapeableFile are to the correct scrapeableFile. This could be tricky if you use the session.setDefaultRetryPolicy method. When using the scrapeableFile.setRetryPolicy method, the scrapeableFile will be the correct object. The interface is given below.

To help ensure you can create custom retry policies that have access to the scraping session and the scrapeable file that is currently being checked, there is an AbstractRetryPolicy class in the same package as the interface. This class defines some default behavior and adds protected fields for the session and scrapeable file that get set before the policy is run. If you extend this abstract class you can access the session and scrapeable file through this.scrapingSession and this.theScrapeableFile. Due to some oddities with the interpreter it is best to reference these variables with 'this.' to eliminate a few problems that arise in a few specific cases.

public interface RetryPolicy
{
    /**
     * Checks to see if the page loaded incorrectly
     *
     * @return True on errors, false otherwise
     * @throws Exception If something goes wrong while executing this method
     */
    public boolean isError() throws Exception;

    /**
     * Runs this code when the page had an error. This could include things such as rotating the proxy.
     *
     * @throws Exception If something goes wrong while executing this method
     */
    public void runOnError() throws Exception;

    /**
     * Returns a map that can be used to output an error message to indicate what checks failed. For instance,
     * you could set a key to the value "Status Code" and the value '200', or a key with "Valid Page" and value 'false'
     *
     * @return Map of keys, or null if no values are indicated
     *
     * @throws Exception If something goes wrong while executing this method
     */
    public Map getErrorChecksMap() throws Exception;

    /**
     * Returns true if the session variables should be reset before attempting to rescrape the file, if there was an error.
     * This can be useful especially if extractors null session variables when they don't match, but the value is needed
     * to rescrape the file.
     *
     * @return True if session variables should be reset if there was an error, false otherwise.
     */
    public boolean resetSessionVariablesBeforeRescrape();

    /**
     * Returns true if the referrer should be reset before attempting to rescrape the file,
     * if there was an error. This can be useful to reset so the referrer
     * doesn't show the page you just requested.
     *
     * @return True if the referrer should be reset if there was an error, false otherwise.
     */
    public boolean resetReferrerBeforeRescrape();

    /**
     * Returns true if errors should be logged to the log/web interface when they occur
     *
     * @return True if errors should be logged to the log/web interface when they occur
     */
    public boolean shouldLogErrors();

    /**
     * Return the maximum number of times this policy allows for a retry before terminating in an error
     *
     * @return The maximum number of times to allow the ScrapeableFile to be rescraped before resulting in an error
     */
    public int getMaxRetryAttempts();

    /**
     * This will be called if all the retry attempts for the scrapeable file failed.
     * In other words, if the policy said to retry 25 times, after 25 failures this
     * method will be called. Note that {@link #runOnError()} will be called just before this,
     * as it is called after each time the scrapeable file fails to load
     * correctly, including the last time it fails to load.
     * <p/>
     * This should only contain code that handles the final error. Any proxy rotating, cookie
     * clearing, etc... should generally be done in the {@link #runOnError()}
     * method, especially since it will still be called after the final error.
     */
    public void runOnAllAttemptsFailed();
}
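
The call order described in the interface comments can be sketched as a simple loop. This is not screen-scraper's internal code: RetryLoopSketch, MiniRetryPolicy (a cut-down stand-in for the interface), scrapeWithRetries, and trace are hypothetical names, and counting failures against getMaxRetryAttempts in exactly this way is an assumption based on the "after 25 failures" wording.

```java
import java.util.ArrayList;
import java.util.List;

final class RetryLoopSketch {
    // Cut-down stand-in for the RetryPolicy interface.
    interface MiniRetryPolicy {
        boolean isError() throws Exception;
        void runOnError() throws Exception;
        int getMaxRetryAttempts();
        void runOnAllAttemptsFailed();
    }

    // Returns true if the page eventually loaded without an error.
    static boolean scrapeWithRetries(Runnable scrape, MiniRetryPolicy policy) throws Exception {
        int failures = 0;
        while (true) {
            scrape.run();                 // request the page and run its extractors
            if (!policy.isError()) {      // the policy is checked after the extractors
                return true;
            }
            policy.runOnError();          // runs after every failure, including the last
            failures++;
            if (failures >= policy.getMaxRetryAttempts()) {
                policy.runOnAllAttemptsFailed();
                return false;
            }
        }
    }

    // Records the order of calls for a page that fails a given number of times.
    static List<String> trace(final int failuresBeforeSuccess, final int maxAttempts) {
        final List<String> events = new ArrayList<>();
        final int[] scrapes = {0};
        MiniRetryPolicy policy = new MiniRetryPolicy() {
            public boolean isError() { return scrapes[0] <= failuresBeforeSuccess; }
            public void runOnError() { events.add("runOnError"); }
            public int getMaxRetryAttempts() { return maxAttempts; }
            public void runOnAllAttemptsFailed() { events.add("runOnAllAttemptsFailed"); }
        };
        try {
            boolean ok = scrapeWithRetries(() -> { scrapes[0]++; events.add("scrape"); }, policy);
            events.add(ok ? "success" : "gave up");
        } catch (Exception e) {
            events.add("exception");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace(1, 3));
        System.out.println(trace(5, 2));
    }
}
```

The second trace shows why final-error handling belongs in runOnAllAttemptsFailed only: runOnError has already run for the last failure by the time it is called.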

getErrorChecksMap

Map getErrorChecksMap ( )

Description

Returns a map that can be used to output an error message indicating which checks failed. For instance, you could add the key "Status Code" with the value '200', or the key "Valid Page" with the value 'false'.

Parameters

This method takes no parameters

Return Value

Map of keys, or null if no values are indicated

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

import com.screenscraper.util.retry.RetryPolicy;

_log = log;
_session = session;

RetryPolicy policy = new RetryPolicy()
{
    Map errorMap = new HashMap();

    boolean isError() throws Exception
    {
        errorMap.put("Was Error On Request", scrapeableFile.wasErrorOnRequest());
        return scrapeableFile.wasErrorOnRequest();
    }

    void runOnError() throws Exception
    {
        session.executeScript("Rotate Proxy");
    }

    Map getErrorChecksMap() throws Exception
    {
        return errorMap;
    }

    boolean resetSessionVariablesBeforeRescrape()
    {
        return true;
    }

    boolean shouldLogErrors()
    {
        return true;
    }

    int getMaxRetryAttempts()
    {
        return 5;
    }

    boolean resetReferrerBeforeRescrape()
    {
        return false;
    }

    void runOnAllAttemptsFailed()
    {
        _log.logError("Failed to fix errors with the retry policy, stopping scrape");
        _session.stopScraping();
    }
};

scrapeableFile.setRetryPolicy(policy);

getMaxRetryAttempts

int getMaxRetryAttempts ( )

Description

Return the maximum number of times this policy allows for a retry before terminating in an error

Parameters

This method takes no parameters

Return Value

The maximum number of times to allow the ScrapeableFile to be rescraped before resulting in an error

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

isError

boolean isError ( )

Description

Checks to see if the page loaded incorrectly

Parameters

This method takes no parameters

Return Value

True on errors, false otherwise

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

resetReferrerBeforeRescrape

boolean resetReferrerBeforeRescrape ( )

Description

Returns true if the referrer should be reset before attempting to rescrape the file, if there was an error. This can be useful to reset so the referrer doesn't show the page you just requested.

Parameters

This method takes no parameters

Return Value

True if the referrer should be reset if there was an error, false otherwise.

Change Log

Version	Description
6.0.36a	Available in all editions.

Examples

Create a custom RetryPolicy

resetSessionVariablesBeforeRescrape

boolean resetSessionVariablesBeforeRescrape ( )

Description

Returns true if the session variables should be reset before attempting to rescrape the file, if there was an error. This can be useful especially if extractors null session variables when they don't match, but the value is needed to rescrape the file.

Parameters

This method takes no parameters

Return Value

True if session variables should be reset if there was an error, false otherwise.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

runOnAllAttemptsFailed

void runOnAllAttemptsFailed ( )

Description

This will be called if all the retry attempts for the scrapeable file failed. In other words, if the policy said to retry 25 times, after 25 failures this method will be called. Note that runOnError will be called just before this, as it is called after each time the scrapeable file fails to load correctly, including the last time it fails to load.

This should only contain code that handles the final error. Any proxy rotating, cookie clearing, and so on should generally be done in the runOnError method, especially since it will still be called after the final error.

Parameters

This method takes no parameters

Return Value

This method returns void

Change Log

Version	Description
6.0.37a	Available in all editions.

Examples

Create a custom RetryPolicy
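
The call order described for runOnError and runOnAllAttemptsFailed can be pictured with a small, self-contained sketch. The SketchPolicy interface and the RetryOrderSketch driver below are hypothetical stand-ins written for illustration; they are not screen-scraper classes, and the loop is only an assumption about ordering (for example, whether the first download counts toward the limit is not stated here).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a retry policy; not the screen-scraper RetryPolicy class. */
interface SketchPolicy {
    boolean isError();
    void runOnError();
    void runOnAllAttemptsFailed();
    int getMaxRetryAttempts();
}

final class RetryOrderSketch {
    /** Assumed ordering: runOnError after every failure, then the final handler once. */
    static void run(SketchPolicy policy) {
        int failures = 0;
        while (policy.isError()) {
            policy.runOnError();                 // called after each failed load
            failures++;
            if (failures >= policy.getMaxRetryAttempts()) {
                policy.runOnAllAttemptsFailed(); // called once, after the last runOnError
                return;
            }
            // A real implementation would download the page again at this point.
        }
    }

    /** Runs a policy that always reports an error and returns the recorded call order. */
    static List<String> traceAlwaysFailing(final int maxAttempts) {
        final List<String> calls = new ArrayList<String>();
        run(new SketchPolicy() {
            public boolean isError() { return true; }
            public void runOnError() { calls.add("runOnError"); }
            public void runOnAllAttemptsFailed() { calls.add("runOnAllAttemptsFailed"); }
            public int getMaxRetryAttempts() { return maxAttempts; }
        });
        return calls;
    }

    public static void main(String[] args) {
        // prints [runOnError, runOnError, runOnAllAttemptsFailed]
        System.out.println(traceAlwaysFailing(2));
    }
}
```

The trace shows why cleanup such as proxy rotation belongs in runOnError: it has already run for the last failure by the time the final handler is reached.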

runOnError

void runOnError ( )

Description

Runs this code when the page had an error. This could include things such as rotating the proxy. This code will be executed just before the page is downloaded again.

Parameters

This method takes no parameters

Return Value

This method returns void

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

shouldLogErrors

boolean shouldLogErrors ( )

Description

Returns true if errors should be logged to the log/web interface when they occur

Parameters

This method takes no parameters

Return Value

True if errors should be logged to the log/web interface when they occur

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

RetryPolicyFactory

Overview

Class used to create simple Retry Policies. See the RetryPolicy page for more details on what a RetryPolicy does. This class is found in the com.screenscraper.util.retry package.

getBasicPolicy

RetryPolicy RetryPolicyFactory.getBasicPolicy ( int retries, String scriptOnFail )
RetryPolicy RetryPolicyFactory.getBasicPolicy ( int retries, Runnable runnableOnFail )

Description

Policy that retries if there was an error on the request, as indicated by the status code. Runs the given script or Runnable before retrying.

Parameters

retries The maximum number of times to retry before failing
scriptOnFail/runnableOnFail What to run (script or Runnable) if the policy shows an error on the page. This will be run just before the page is downloaded again. The script or Runnable will be executed in the current thread, so the scrapeable file will not be redownloaded until this runnable or script has finished executing.

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set a basic retry policy

import com.screenscraper.util.retry.RetryPolicyFactory;
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Rotate Proxy"));

getEmptyPolicy

RetryPolicy RetryPolicyFactory.getEmptyPolicy ( )

Description

Policy that never reports an error. Useful when the session has a session-wide retry policy but a particular scrapeable file should not use it.

Parameters

This method takes no parameters

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
6.0.25a	Available in all editions.

Examples

Set an empty retry policy

import com.screenscraper.util.retry.RetryPolicyFactory;
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getEmptyPolicy());

getMatchingRegexPolicy

RetryPolicy RetryPolicyFactory.getMatchingRegexPolicy ( int retries, String regex )
RetryPolicy RetryPolicyFactory.getMatchingRegexPolicy ( int retries, String regex, String scriptOnFail )
RetryPolicy RetryPolicyFactory.getMatchingRegexPolicy ( int retries, String regex, Runnable runnableOnFail )

Description

Policy that requires a Regular Expression to match the page content (including headers) in order to be considered valid.

Parameters

retries The maximum number of times to retry before failing
regex A Regular expression that must match the page content for the page to be considered valid
scriptOnFail/runnableOnFail (optional) What to run (script or Runnable) if the policy shows an error on the page. This will be run just before the page is downloaded again. The script or Runnable will be executed in the current thread, so the scrapeable file will not be redownloaded until this runnable or script has finished executing.

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set a matching regex policy

import com.screenscraper.util.retry.RetryPolicyFactory;
// Require the response to contain the text "Google.com". Since this is a regex, the . must have a \ before it
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getMatchingRegexPolicy(5, "Google\\.com", "Rotate Proxy"));

getMissingRegexPolicy

RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex )
RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex, String scriptOnFail )
RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex, Runnable runnableOnFail )

Description

Policy that requires a Regular Expression NOT to match the page content (including headers) in order to be considered valid. In other words, if the Regular Expression matches, it means that the page should be rescraped.

Parameters

retries The maximum number of times to retry before failing
regex A Regular expression that must NOT match the page content for the page to be considered valid
scriptOnFail/runnableOnFail (optional) What to run (script or Runnable) if the policy shows an error on the page. This will be run just before the page is downloaded again. The script or Runnable will be executed in the current thread, so the scrapeable file will not be redownloaded until this runnable or script has finished executing.

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set a missing regex policy

import com.screenscraper.util.retry.RetryPolicyFactory;
// Require the response to not contain the text "Google.com". Since this is a regex, the . must have a \ before it
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getMissingRegexPolicy(5, "Google\\.com", "Rotate Proxy"));
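
To see why the dot is escaped in the examples for getMatchingRegexPolicy and getMissingRegexPolicy, here is a minimal sketch using only java.util.regex. The RegexPolicySketch class and its method names are hypothetical, and the sketch assumes the policies search for the pattern anywhere in the page content (Matcher.find) rather than requiring the whole page to match; the actual matching mode is not documented here.

```java
import java.util.regex.Pattern;

final class RegexPolicySketch {
    /** Matching-regex rule: the page is valid only if the pattern is found somewhere in it. */
    static boolean validForMatchingPolicy(String regex, String pageContent) {
        return Pattern.compile(regex).matcher(pageContent).find();
    }

    /** Missing-regex rule: the page is valid only if the pattern is NOT found. */
    static boolean validForMissingPolicy(String regex, String pageContent) {
        return !validForMatchingPolicy(regex, pageContent);
    }

    public static void main(String[] args) {
        // Escaped dot: only a literal "." matches.
        System.out.println(validForMatchingPolicy("Google\\.com", "Welcome to Google.com")); // true
        System.out.println(validForMatchingPolicy("Google\\.com", "GoogleXcom"));            // false
        // Unescaped dot: any character matches, so this page would wrongly pass.
        System.out.println(validForMatchingPolicy("Google.com", "GoogleXcom"));              // true
        // The missing-regex rule is the inverse: a match means the page should be rescraped.
        System.out.println(validForMissingPolicy("Access Denied", "Access Denied"));         // false
    }
}
```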

SqlDataManager

Overview

This object simplifies your interactions with a JDBC-compliant SQL database. It can work with various types of databases and even in a multi-threaded format to allow scrapes to continue without having to wait for the queries to process. View an example of how to use the SqlDataManager.

This feature is only available for Professional and Enterprise editions of screen-scraper.

Prefer a more traditional approach? See an example of Working with MySQL databases.

In order to use the SqlDataManager you will need to make sure to install the appropriate JDBC driver. This can be done by downloading the driver and placing it in the lib/ext folder in the screen-scraper installation directory.

Event Callbacks

Overview

Add an event callback to SqlDataManager object.

This feature is only available for Professional and Enterprise editions of screen-scraper.

Before adding an event to the SqlDataManager, you must build the schema of any tables you will use, because events are tied to table operations such as inserting data.

Parameters

schema Case insensitive schema (table) name
when The event associated with the schema that should trigger the callback
- onCreate Triggered whenever the DataManager creates a new DataNode, such as the first addData since the last commit
- onAddData Triggered after dm.addData is called
- onWrite Triggered immediately before the DataNode is written (DataWriter.write). Applies to both inserts and updates
- onInsert Triggered immediately before the data is going to be inserted as a new row in the database as opposed to updating an existing row
- onUpdate Triggered immediately before existing database values are going to be updated as opposed to inserted as a new row
- onWriteError Triggered if an exception was thrown when trying to write to the database
- afterWrite Triggered immediately after the DataNode is written. At this point any values written are in the DataNode, including autogenerated keys
listener A callback interface that must be implemented by the client. There is a single method public void handleEvent(DataManagerEvent event) that needs to be implemented. The DataManagerEvent has a method getDataNode() to retrieve the relevant DataNode.

Return Values

Returns the same DataManagerEventListener object that was passed in.

Change Log

Version	Description
5.5	Available for professional and enterprise editions.

Class Locations

com.screenscraper.datamanager.DataManager
com.screenscraper.datamanager.DataManagerEventListener
com.screenscraper.datamanager.DataManagerEventSource.EventFireTime

Examples

Register a callback that logs write errors for the 'person' table to the web interface

import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.SqlDataManager;
import org.apache.commons.dbcp.BasicDataSource;

// BasicDataSource
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( "user" );
ds.setPassword( "psswrd" );
ds.setUrl( "jdbc:mysql://127.0.0.1:3306/mydb?UTF8ENCODING" );
ds.setMaxActive( 10 );

// Create Data Manager
dm = new SqlDataManager( ds, session );
dm.buildSchemas();
_session = session;

//This will log out any write errors to the 'person' table to the screen-scraper web interface
dm.addEventListener("person", DataManagerEventSource.EventFireTime.onWriteError,
    new DataManagerEventListener() {
        public void handleEvent(DataManagerEvent event) {
            DataNode n = event.getDataNode();
            _session.webError("Database Write Error",n.getObjectMap());
        }
    }
);

addData

void sqlDataManager.addData ( String table, Map data ) (professional and enterprise editions only)
void sqlDataManager.addData ( String table, String columnName, Object value ) (professional and enterprise editions only)

Description

Add data to fields, in preparation for insertion into a database.

When adding data in a many-to-many relation, if setAutoManyToMany is set to false, a null row should be inserted into the relating table so the datamanager will link the keys correctly between related tables. For example, dm.addData("many_to_many", null);

Before adding data the first time, you must build the schema of any tables you will use, as well as add foreign keys if you are not using a database engine that natively supports them (such as InnoDB for MySQL).

Parameters

table Name of the database table that the data corresponds to, as a string.
data (this or columnName and value) Map using field names as keys to desired values to be added in the database for fields. This can be a dataRecord object.
columnName (requires value) The name of the column/field in the database table that the data is being added for, as a string.
value (requires columnName) The value being inserted into the column/field.

The SqlDataManager will attempt to convert a value that is given to the correct format for the database. For example, if the database requires an int for a column named age, dm.addData("table", "age", "32") will convert the String "32" to an int before adding it to the database. See the table below the examples for other types of java objects and how they map to SQL types.

The table and columnName parameters are not case sensitive. The same is true for the key values in the data map.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Add Data from Data Record

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add DataRecord Information into person table
dm.addData( "person", dataRecord );

// Create and add query to buffer
dm.commit( "person" );

Add Data In a Specific Field

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add DataRecord Information into person table
dm.addData( "person", dataRecord );

// Add Specific Other Data
dm.addData( "person", "date_collected", "2010-07-13" );

// Create and add query to buffer
dm.commit( "person" );

Java Object and SQL Type Mappings

Since the DataManager is designed with screen-scraper in mind, all inputs accept the String type in addition to their corresponding Java object type, but the String must be parseable into that data type. For example, if a column is defined as an Integer in the database, then the String needs to be parseable by Integer.parseInt(String value). Here is a mapping of the SQL types (based on java.sql.Types) to Java objects:

SQL Type		Java Object
java.sql.Types.CHAR		String
java.sql.Types.VARCHAR		String
java.sql.Types.LONGVARCHAR		String
java.sql.Types.LONGNVARCHAR		String
java.sql.Types.NUMERIC		BigDecimal
java.sql.Types.DECIMAL		BigDecimal
java.sql.Types.TINYINT		Integer
java.sql.Types.SMALLINT		Integer
java.sql.Types.INTEGER		Integer
java.sql.Types.BIGINT		Long
java.sql.Types.REAL		Float
java.sql.Types.FLOAT		Double
java.sql.Types.DOUBLE		Double
java.sql.Types.BIT		Boolean
java.sql.Types.BINARY		ByteArray
java.sql.Types.VARBINARY		ByteArray
java.sql.Types.LONGVARBINARY		ByteArray
java.sql.Types.DATE		SQLDate or Long
java.sql.Types.TIME		SQLTime or Long
java.sql.Types.TIMESTAMP		SQLTime or Long
java.sql.Types.ARRAY		Object
java.sql.Types.BLOB		ByteArray
java.sql.Types.CLOB		Object
java.sql.Types.JAVA_OBJECT		Object
java.sql.Types.OTHER		Object
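
As a rough illustration of the String parsing described above, the following sketch converts a String into the Java object listed in the table for a handful of java.sql.Types codes. The TypeConversionSketch class is hypothetical and is not the DataManager's own conversion code; in particular, how the DataManager parses BIT values and the date and binary types is not covered here, and the use of Boolean.parseBoolean is an assumption.

```java
import java.math.BigDecimal;
import java.sql.Types;

final class TypeConversionSketch {
    /** Parses a String into the Java object listed for the given java.sql.Types code. */
    static Object convert(String value, int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return value;
            case Types.NUMERIC:
            case Types.DECIMAL:
                return new BigDecimal(value);
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.INTEGER:
                return Integer.valueOf(Integer.parseInt(value));
            case Types.BIGINT:
                return Long.valueOf(Long.parseLong(value));
            case Types.REAL:
                return Float.valueOf(Float.parseFloat(value));
            case Types.FLOAT:
            case Types.DOUBLE:
                return Double.valueOf(Double.parseDouble(value));
            case Types.BIT:
                // Assumption: "true"/"false" text; other encodings are not handled here.
                return Boolean.valueOf(Boolean.parseBoolean(value));
            default:
                throw new IllegalArgumentException("Type not covered by this sketch: " + sqlType);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("32", Types.INTEGER));   // 32 (an Integer)
        System.out.println(convert("9.50", Types.DECIMAL)); // 9.50 (a BigDecimal)
    }
}
```

A String that cannot be parsed, such as "abc" for an INTEGER column, fails with a NumberFormatException in this sketch, which is why scraped values may need cleaning before they are added.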

addForeignKey

void sqlDataManager.addForeignKey ( String table, String columnName, String foreignTable, String foreignColumnName ) (professional and enterprise editions only)

Description

Manually set up a table connection (key matching).

If SqlDataManager.buildSchemas is called, any foreign keys manually added before that point will be overridden or erased.

Parameters

table Name of the database table that holds the foreign key column (the child table in the example below), as a string.
columnName Column/field name of the foreign key in that table, as a string.
foreignTable Name of the database table being referenced (the parent table in the example below), as a string.
foreignColumnName Column/field name of the referenced key in the parent table, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

If the database has some indication of foreign keys then these will be followed automatically. If the database does not allow for foreign key references then you will need to build the table connections using this method.

Examples

Set Up Table Connections

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas
dm.buildSchemas();

// Setup table connections
// parameter order: "child_table", "child_column", "parent_table", "parent_column"
dm.addForeignKey( "job", "person_id", "person", "id");
dm.addForeignKey( "address", "person_id", "person", "id");

addSessionVariables

void sqlDataManager.addSessionVariables ( String table ) (professional and enterprise editions only)

Description

Manually add session variable data to fields, in preparation for insertion into a database.

Parameters

table Name of the database table that the data corresponds to, as a string.

The keys from the session will be matched in a case insensitive way to the column names of the database.
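
The case-insensitive matching of session variable names to column names might look something like the sketch below. CaseInsensitiveMatchSketch and matchColumns are hypothetical names used for illustration only; the DataManager's real lookup is internal and may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CaseInsensitiveMatchSketch {
    /** Returns column -> value for every column that has a same-named variable, ignoring case. */
    static Map<String, Object> matchColumns(List<String> columns, Map<String, Object> variables) {
        // A TreeMap with a case-insensitive comparator treats "FIRST_NAME" and "first_name" as one key.
        Map<String, Object> byName = new TreeMap<String, Object>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(variables);

        Map<String, Object> row = new LinkedHashMap<String, Object>();
        for (String column : columns) {
            if (byName.containsKey(column)) {
                row.put(column, byName.get(column));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> sessionVariables = new HashMap<String, Object>();
        sessionVariables.put("FIRST_NAME", "Ann");     // matches column first_name
        sessionVariables.put("SEARCH_TERM", "shoes");  // no matching column, so it is skipped
        // prints {first_name=Ann}
        System.out.println(matchColumns(Arrays.asList("first_name", "age"), sessionVariables));
    }
}
```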

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Add Data from Session Variables

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add session variables into person table
dm.addSessionVariables( "person" );

// Create and add query to buffer
dm.commit( "person" );

addSessionVariablesOnCommit

void sqlDataManager.addSessionVariablesOnCommit ( boolean automate ) (professional and enterprise editions only)

Description

Add corresponding session variables to the tables automatically when it is committed.

Parameters

automate If true then session variables whose names match field names (case insensitive) will be automatically added to queries when the fields are committed.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Automate Session Variables

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Write Information to Database
// automatically using session variables
dm.addSessionVariablesOnCommit( true );

buildSchemas

void sqlDataManager.buildSchemas ( ) (professional and enterprise editions only)
void sqlDataManager.buildSchemas ( List tables ) (professional and enterprise editions only)

Description

Collect the database schema information, including foreign key relations between tables.

Schemas must be built for any tables that will be used by this DataManager before data can be added.

Parameters

tables (optional) A list of table names, as strings, for which to build schemas.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Build Database Schema using a BasicDataSource

Build Database Schema using an SshDataSource

import com.screenscraper.datamanager.sql.*;

// SshDataSource
ds = new SshDataSource( "[email protected]", "ssPass" );
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( "user" );
ds.setPassword( "psswrd" );

// Accepted values for the first parameter of setUrl are:
// SshDataSource.MYSQL
// SshDataSource.MSSQL
// SshDataSource.ORACLE
// SshDataSource.POSTGRESQL
ds.setUrl( SshDataSource.MYSQL, 3306, "database" );

// Create Data Manager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

clearAllData

void sqlDataManager.clearAllData ( ) (professional and enterprise editions only)

Description

Clear all data from the data manager without writing it to the database. This includes all data previously committed but not yet written.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Clear All Data

// Get data manager
dm = session.getv( "DATAMANAGER" );

// Clear information from the datamanager
dm.clearAllData();

clearSessionVariables

void sqlDataManager.clearSessionVariables ( String table ) (professional and enterprise editions only)

Description

Clear session variables corresponding to the fields of a specific table (case insensitive).

Parameters

table Name of the table whose field names will be used to clear session variables.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Clear Session Variables

// Get data manager
dm = session.getv( "DATAMANAGER" );

// Clear session variables for people table
dm.clearSessionVariables( "people" );

clearSessionVariablesOnCommit

void sqlDataManager.clearSessionVariablesOnCommit ( boolean clearVars ) (professional and enterprise editions only)

Description

Clear session variables corresponding to a committed table automatically.

Parameters

clearVars If true then session variables whose names match field names (case insensitive) will be automatically cleared when the table is committed.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Automate Session Variables

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Write Information to Database
// automatically using session variables
dm.addSessionVariablesOnCommit( true );

// Clear session variables on commit
// to avoid carry over
dm.clearSessionVariablesOnCommit( true );

close

void sqlDataManager.close ( ) (professional and enterprise editions only)

Description

Close data manager's connections.

If there is data that has not yet been written to the database when this method is called it will not be written.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Close Data Manager

// Get Data Manager
dm = session.getv( "DATAMANAGER" );

// Close Data Manager
dm.close();

commit

void sqlDataManager.commit ( String table ) (professional and enterprise editions only)

Description

Commit a prepared row of data into the queue. Once called, the data can no longer be edited. When working with multiple tables that are related by a foreign key, it is important to commit rows in the correct order: the rows in each of the child tables should be committed before the parent, or they will not be correctly linked when written to the database.

This does not write the row of data to the database; it places the row in the queue to be written at a later time.

Parameters

table Name of the database table that the data corresponds to, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Commit Database Row

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add session variables into person table
dm.addSessionVariables( "person" );

// Create and add query to buffer
dm.commit( "person" );
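
The difference between adding data, committing it to the queue, and flushing it to the database can be modeled with a toy, single-table sketch. CommitQueueSketch is a hypothetical class written for illustration; it only mirrors the behavior described for commit and flush and is not how the SqlDataManager is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CommitQueueSketch {
    private Map<String, Object> pendingRow = new LinkedHashMap<String, Object>();   // still editable
    private final List<Map<String, Object>> queue = new ArrayList<Map<String, Object>>(); // committed, not written
    final List<Map<String, Object>> written = new ArrayList<Map<String, Object>>();  // stands in for the database

    void addData(String column, Object value) {
        pendingRow.put(column, value);
    }

    /** Freezes the pending row and queues it; nothing is written yet. */
    void commit() {
        queue.add(Collections.unmodifiableMap(pendingRow));
        pendingRow = new LinkedHashMap<String, Object>();
    }

    /** Writes the queued rows; data that was added but never committed is discarded. */
    boolean flush() {
        written.addAll(queue);
        queue.clear();
        pendingRow = new LinkedHashMap<String, Object>();
        return true;
    }

    int queuedCount() {
        return queue.size();
    }

    public static void main(String[] args) {
        CommitQueueSketch dm = new CommitQueueSketch();
        dm.addData("name", "Ann");
        dm.commit();
        System.out.println(dm.written.size()); // 0: committed rows are only queued
        dm.addData("name", "Bob");             // never committed
        dm.flush();
        System.out.println(dm.written);        // [{name=Ann}]: the uncommitted row was lost
    }
}
```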

commitAll

void sqlDataManager.commitAll ( ) (professional and enterprise editions only)

Description

Commit prepared rows of data for all tables into the queue. Once called, the data can no longer be edited.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Commit Database Row

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add session variables into tables
dm.addSessionVariables( "person" );
dm.addSessionVariables( "address" );
dm.addSessionVariables( "jobs" );

// Create and add queries to buffer
dm.commitAll();

flush

boolean sqlDataManager.flush ( ) (professional and enterprise editions only)

Description

Write committed data to the database. Any data that has not been committed using either the commit or commitAll method will be lost and not written to the database.

Parameters

This method does not receive any parameters.

Return Values

Returns true if the data was successfully written to the database; otherwise, it returns false.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Write to Database

// Get data manager
dm = session.getv( "DATAMANAGER" );

// Write Information to Database
dm.flush();

getConnection

Connection sqlDataManager.getConnection ( ) (professional and enterprise editions only)

Description

Retrieve the connection object of the data manager. This can be helpful if you want to do something that the data manager cannot do easily, such as query the database.

Be sure to close the connection once it is no longer needed. Failure to do so could exhaust the connection pool used by the data manager, which will cause the scraping session to hang.

Parameters

This method does not receive any parameters.

Return Values

Returns a connection object matching the one used in the data manager.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Retrieve Database Connection

// Import SQL object
import java.sql.*;

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Retrieve connection
connection = dm.getConnection();

try {
    PreparedStatement ps = connection.prepareStatement( "UPDATE table SET status=?" );
    ps.setString( 1, session.getv("STATUS") );
    ps.executeUpdate();
} finally {
    connection.close();
}

getLastAutoIncrementKey

DataObject sqlDataManager.getLastAutoIncrementKey (String table) (professional and enterprise editions only)

Description

Retrieve the last autogenerated primary key, if any, for the given table

Parameters

table Case-insensitive table name.

Return Values

Returns a com.screenscraper.datamanager.DataObject containing the primary key.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Retrieve AutoIncrement Key

//Save some data
dm.addData("table", "column", "important data");
dm.commit("table");
dm.flush();

//Retrieve the key associated with the data we just saved as an Integer
key = dm.getLastAutoIncrementKey("table").getInt();

setAutoManyToMany

void sqlDataManager.setAutoManyToMany ( boolean enable ) (professional and enterprise editions only)

Description

Sets whether or not the data manager should automatically take care of many-to-many relationships.

Parameters

enable Whether the data manager should automatically run a commit for many-to-many tables when the connected tables are committed, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

If the many-to-many table has more information than just the keys then you will want to leave this feature turned off so that you can add more data than just the keys before committing.

Examples

Set Automatic Commits for Many-to-many Tables

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Set Automatic Commit on Many-to-many tables
dm.setAutoManyToMany( true );

setGlobalMergeEnabled

void sqlDataManager.setGlobalMergeEnabled ( boolean merge )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set global merge status. When conflicts exist in data, a merge of true will take the newer values and save them over previous null values.

When merging or updating values in a table, that table must have a Primary Key. When the Primary Key is set to autoincrement, if the value of that key was not set with the addData method the DataManager will create a new row rather than update or merge with an existing row. One solution is to use an SqlDuplicateFilter to set fields that would identify an entry as a duplicate and automatically insert the value of the autoincrement key when data is committed.

By default if the data that you are inserting has the same primary key as data already in the database it will ignore the insert. This behavior can be modified by the dm.setGlobalUpdateEnabled and dm.setGlobalMergeEnabled methods of the DataManager. This allows for four different settings to insert data:

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data
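
The four combinations in the table above can be read as a per-column rule. The sketch below is one way to express that rule; UpdateMergeSketch and resolve are hypothetical names, and the handling of a null incoming value is an assumption, since the table does not address it.

```java
final class UpdateMergeSketch {
    /**
     * Value a column ends up with when an incoming row duplicates an existing one.
     * existing is the value currently in the database (null stands for SQL NULL).
     */
    static Object resolve(Object existing, Object incoming, boolean update, boolean merge) {
        if (existing == null) {
            return merge ? incoming : null;   // merge fills columns that are currently NULL
        }
        return update ? incoming : existing;  // update overwrites columns that are currently NOT NULL
    }

    public static void main(String[] args) {
        System.out.println(resolve("old", "new", false, false)); // old  (row ignored)
        System.out.println(resolve("old", "new", true, false));  // new  (update only)
        System.out.println(resolve(null, "new", true, false));   // null (update leaves NULL columns alone)
        System.out.println(resolve(null, "new", false, true));   // new  (merge only)
        System.out.println(resolve("old", "new", false, true));  // old  (merge leaves existing values alone)
        System.out.println(resolve("old", "new", true, true));   // new  (both: all values replaced)
    }
}
```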

Parameters

merge Whether to turn on global merge or not, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Global Database Merge

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Set Global Update
dm.setGlobalUpdateEnabled( true );

// Set Global Merge
dm.setGlobalMergeEnabled( true );

setGlobalUpdateEnabled

void sqlDataManager.setGlobalUpdateEnabled ( boolean update )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set update status globally. When conflicts exist in data, an update of true will take the newer values and save them over previous non-null values.

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data

Parameters

update Whether to turn on global update or not, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Global Database Update

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Set Global Update
dm.setGlobalUpdateEnabled( true );

setLoggingLevel

void sqlDataManager.setLoggingLevel ( Level level ) (professional and enterprise editions only)

Description

Set the error logging level. Currently only DEBUG and ERROR levels are supported. At the DEBUG level, all queries and results will be output to the log.

Parameters

level log4j logging level object.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Logging Level

// Get MySQL datamanager
dm = session.getVariable( "DATAMANAGER" );

// Set Logging Level
dm.setLoggingLevel( org.apache.log4j.Level.ERROR );

// Build Schemas
dm.buildSchemas();

setMergeEnabled

void sqlDataManager.setMergeEnabled ( String table, boolean merge )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set merge status for a table. When conflicts exist in data, a merge of true will take the newer values and save them over previous null values.

By default, if the data that you are inserting has the same primary key as data already in the database, the insert is ignored. This behavior can be modified for a specific table with the dm.setUpdateEnabled and dm.setMergeEnabled methods of the DataManager. This allows for four different settings to insert data:

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data

Parameters

table Name of the database table, as a string.
merge Whether to turn on merge for the given table or not, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Database Table Merge

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Set Merge
dm.setMergeEnabled( "person", true );

setMultiThreadWrite

void sqlDataManager.setMultiThreadWrite ( int numThreads ) (professional and enterprise editions only)

Description

Set the number of threads that the data manager can have open at once. When set higher than one, the scraping session can continue to run and download pages while data is being written to the database. This can decrease the time required to run a scrape, but it also makes debugging harder, as there is no guarantee about the order in which data will be written. It is recommended to leave this setting alone while developing a scrape. Also, the flush method will always return true if more than one thread is being used to write to the database, even if the write failed.
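
The ordering caveat is a general property of handing writes to a thread pool. The sketch below illustrates it with the JDK's ExecutorService. It is not how SqlDataManager is implemented internally; the class AsyncWriteSketch, the method writeAll, and the in-memory list standing in for a database are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncWriteSketch {

    // Hypothetical stand-in for a database: records rows in the order
    // in which the writes actually complete.
    static List<String> writeAll(int numThreads, int rowCount) {
        List<String> written = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int i = 0; i < rowCount; i++) {
            final String row = "row-" + i;
            // execute() returns at once, so the caller keeps working and
            // cannot tell at this point whether the write succeeded.
            pool.execute(() -> written.add(row));
        }
        pool.shutdown();
        try {
            // Wait for the queued writes to finish before reading the result.
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }

    public static void main(String[] args) {
        // One writer thread: rows arrive in submission order.
        System.out.println(writeAll(1, 5));
        // Several writer threads: same rows, but the order may vary between runs.
        System.out.println(writeAll(4, 5));
    }
}
```

With a single writer thread the rows land in submission order; with several threads every row is still written, but the order can change from run to run, which is why a thread count of one is easier to debug.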

Parameters

numThreads The number of threads that the data manager can start and use to write data, as an integer.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Thread Count

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );
ds.setMaxActive( 100 );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Set number of threads that can be opened
// when interacting with the database
dm.setMultiThreadWrite(10);

// Build Schemas For all Tables
dm.buildSchemas();

setUpdateEnabled

void sqlDataManager.setUpdateEnabled ( String table, boolean update )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set update status for a given table. When conflicts exist in data, an update of true will take the newer values and save them over previous non-null values.

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data

Parameters

table The name of the database table, as a string.
update Whether to turn on update for the given table or not, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Database Table Update

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Set Update on person table
dm.setUpdateEnabled( "person", true );

SqlDataManager

SqlDataManager SqlDataManager ( BasicDataSource dataSource, ScrapingSession session ) (professional and enterprise editions only)

Description

Initiate a SqlDataManager object.

Before adding data to the SqlDataManager, you must build the schema of any tables you will use, as well as add foreign keys if you are not using a database engine that natively supports them (such as InnoDB for MySQL).

Parameters

dataSource A BasicDataSource object.
session The scraping session to which the data manager should be associated.

Return Values

Returns a SqlDataManager. If an error is experienced it will be thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Class Location

com.screenscraper.datamanager.sql.SqlDataManager

Examples

Create a SQL Data Manager

import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// BasicDataSource
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( "user" );
ds.setPassword( "psswrd" );
ds.setUrl( "jdbc:mysql://127.0.0.1:3306/mydb?UTF8ENCODING" );
ds.setMaxActive( 100 );

// Create Data Manager
dm = new SqlDataManager( ds, session );

Create a SQL Data Manager Over SSH Tunnel

SqlDuplicateFilter

Overview

SqlDuplicateFilters are designed to filter duplicates when more information than just a primary key might define a duplicate entry. For example, you might define a unique person by their SSN, driver's license number, or by their first name, last name, and phone number. It is also possible that a single person may have multiple phone numbers, and if any of them match then the duplicate constraint should be met. Using an SqlDuplicateFilter can check for conditions such as this and correctly recognize duplicate entries.

This feature is only available for Professional and Enterprise editions of screen-scraper.

Examples

Register a new duplicate filter

// Import classes
import com.screenscraper.datamanager.sql.*;

//Get the data manager
SqlDataManager dm = session.getVariable( "_DATAMANAGER" );

// Register a new duplicate filter
// Check for duplicate people, so register it for the people table
SqlDuplicateFilter nameFilter = SqlDuplicateFilter.register("people", dm);

//Add constraints to match when a first name, middle initial, and last name match a different row in the database
nameFilter.addConstraint( "people", "first_name" );
nameFilter.addConstraint( "people", "middle_initial" );
nameFilter.addConstraint( "people", "last_name" );

Match Duplicates across tables

Sometimes the data will need to be filtered across multiple tables, or different constraints might each indicate a duplicate. For example, a person might be a duplicate if their SSN matches OR if their driver's license number matches. Alternatively, they may be a duplicate when they have the same first name, last name, and phone number.

import com.screenscraper.datamanager.sql.SqlDuplicateFilter;

/*
Perform the setup of the SqlDataManager, as shown previously, and name the variable dm.
*/

//register an SqlDuplicateFilter with the DataManager for the social security number
SqlDuplicateFilter ssnDuplicate = SqlDuplicateFilter.register( "person", dm );
ssnDuplicate.addConstraint( "person", "ssn" );

//register an SqlDuplicateFilter with the DataManager for the drivers license
SqlDuplicateFilter licenseDuplicate = SqlDuplicateFilter.register( "person", dm );
licenseDuplicate.addConstraint( "person", "drivers_license" );

//register an SqlDuplicateFilter with the DataManager for the name/phone number
//where the person table has a child table named phone.
SqlDuplicateFilter namePhoneDuplicate = SqlDuplicateFilter.register( "person", dm );
namePhoneDuplicate.addConstraint( "person", "first_name" );
namePhoneDuplicate.addConstraint( "person", "last_name" );
namePhoneDuplicate.addConstraint( "phone", "phone_number" );

Duplicate filters are checked in the order they are added, so consider performance when creating duplicate filters. If, for instance, most duplicates will match on the social security number, create that filter before the others. Also make sure to add indexes in your database on the columns that you are selecting by, or else performance will rapidly degrade as your database gets large.

Duplicates will be filtered by any one of the filters created. If multiple fields must all match for an entry to be a duplicate, create a single filter and add each of those fields as constraints, as shown in the third filter created above. In other words, constraints added to a single filter will be ANDed together, while separate filters will be ORed.
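
The AND/OR rule can be sketched in plain Java as follows. This is only an illustration of the matching logic described above, not the SqlDuplicateFilter implementation: the class DuplicateFilterSketch, the methods isDuplicate and row, and the "table.column" key format are hypothetical, and treating a missing (null) value as a non-match is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFilterSketch {

    // Each inner list models one filter: all of its columns must match (AND).
    // Filters are tried in the order given; the first match wins (OR).
    static final List<List<String>> FILTERS = Arrays.asList(
            Arrays.asList("person.ssn"),
            Arrays.asList("person.drivers_license"),
            Arrays.asList("person.first_name", "person.last_name", "phone.phone_number"));

    static boolean isDuplicate(List<List<String>> filters,
                               Map<String, String> existing,
                               Map<String, String> incoming) {
        for (List<String> columns : filters) {
            boolean allMatch = true;
            for (String column : columns) {
                String current = existing.get(column);
                // Assumption: a missing (null) value never counts as a match.
                if (current == null || !current.equals(incoming.get(column))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    // Builds a row from alternating column/value pairs.
    static Map<String, String> row(String... pairs) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            map.put(pairs[i], pairs[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> existing = row(
                "person.ssn", "000-00-0001", "person.drivers_license", "D100",
                "person.first_name", "Jane", "person.last_name", "Doe",
                "phone.phone_number", "555-0100");
        Map<String, String> sameNameAndPhone = row(
                "person.ssn", "000-00-0002", "person.drivers_license", "D200",
                "person.first_name", "Jane", "person.last_name", "Doe",
                "phone.phone_number", "555-0100");
        Map<String, String> sameNameOnly = row(
                "person.ssn", "000-00-0003", "person.drivers_license", "D300",
                "person.first_name", "Jane", "person.last_name", "Doe",
                "phone.phone_number", "555-0199");

        System.out.println(isDuplicate(FILTERS, existing, sameNameAndPhone)); // true
        System.out.println(isDuplicate(FILTERS, existing, sameNameOnly));     // false
    }
}
```

The second row is a duplicate only because all three constraints of the third filter match at once; the third row shares the name but not the phone number, so no filter matches.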

addConstraint

void sqlDuplicateFilter.addConstraint ( String table, String column ) (professional and enterprise editions only)

Description

Add a constraint that checks the value of new entries against the value of entries already in the database for a given column and table.

Parameters

table Name of the database table, either the same table the filter is registered to or one of its children
column The column that will be checked in the table for a duplicate with new values

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Register a new duplicate filter

import com.screenscraper.datamanager.sql.SqlDuplicateFilter;

/*
Perform the setup of the SqlDataManager, as shown previously, and name the variable dm.
*/

//register an SqlDuplicateFilter with the DataManager for the social security number
SqlDuplicateFilter ssnDuplicate = SqlDuplicateFilter.register( "person", dm );
ssnDuplicate.addConstraint( "person", "ssn" );

//register an SqlDuplicateFilter with the DataManager for the drivers license
SqlDuplicateFilter licenseDuplicate = SqlDuplicateFilter.register( "person", dm );
licenseDuplicate.addConstraint( "person", "drivers_license" );

//register an SqlDuplicateFilter with the DataManager for the name/phone number
//where the person table has a child table named phone.
SqlDuplicateFilter namePhoneDuplicate = SqlDuplicateFilter.register( "person", dm );
namePhoneDuplicate.addConstraint( "person", "first_name" );
namePhoneDuplicate.addConstraint( "person", "last_name" );
namePhoneDuplicate.addConstraint( "phone", "phone_number" );

register

SqlDuplicateFilter SqlDuplicateFilter.register ( String table, SqlDataManager dataManager ) (professional and enterprise editions only)

Description

Create an SqlDuplicateFilter for a specific table and register it with the data manager.

Parameters

table Name of the database table with the primary key, as a string.
dataManager The data manager that will use this filter when adding entries to the database.

Return Values

Returns an SqlDuplicateFilter that can then be configured for duplicate entries.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Register a new duplicate filter

Match Duplicates across tables

// Import classes
import com.screenscraper.datamanager.sql.*;

//Get the data manager
SqlDataManager dm = session.getVariable( "_DATAMANAGER" );

// Register a new duplicate filter
// Check for duplicate people, so register it for the people table
SqlDuplicateFilter personFilter = SqlDuplicateFilter.register("people", dm);

// Catch duplicates when a new entry has the same first name, last name, and phone number as another entry
// Note that phone is a child table of people
personFilter.addConstraint( "people", "first_name" );
personFilter.addConstraint( "people", "last_name" );
personFilter.addConstraint( "phone", "phone_number" );

XmlWriter

Overview

Oftentimes you want to write extracted data directly to an XML file. This class facilitates doing that. Before working with the methods below, you may wish to read our documentation about writing extracted data to XML, which contains examples of scripts that utilize these methods.

This feature is only available to Enterprise editions of screen-scraper.

XmlWriter

XmlWriter XmlWriter ( String fileName, String rootElementName ) (enterprise edition only)
XmlWriter XmlWriter ( String fileName, String rootElementName, String rootElementText ) (enterprise edition only)
XmlWriter XmlWriter ( String fileName, String rootElementName, String rootElementText, Hashtable attributes ) (enterprise edition only)
XmlWriter XmlWriter ( String fileName, String rootElementName, String rootElementText, Hashtable attributes, String characterSet ) (enterprise edition only)

Description

Create an XmlWriter object.

Parameters

fileName The file path where the file will be created, as a string.
rootElementName The root element's name in the XML file, as a string.
rootElementText (optional) Any text to be added inside of the root node, as a string.
attributes (optional) Hashtable of attribute names and their associated values, for the root node.
characterSet (optional) The character set to use for the XML file, as a string.

Return Values

Returns an XmlWriter. If an error is experienced it will be thrown.

Change Log

Version	Description
4.5	Available for enterprise edition.
5.5.3a	Added the constructor that takes a character set.

Class Location

com.screenscraper.xml.XmlWriter

Examples

Create an XmlWriter

// Import package
import com.screenscraper.xml.*;

// Create XmlWriter
xmlWriter = new XmlWriter( "C:/students.xml", "students" );

addElement

Element XmlWriter.addElement ( String name ) (enterprise edition only)
Element XmlWriter.addElement ( String name, String text ) (enterprise edition only)
Element XmlWriter.addElement ( String name, String text, Hashtable attributes ) (enterprise edition only)
Element XmlWriter.addElement ( Element elementToAppendTo, String name ) (enterprise edition only)
Element XmlWriter.addElement ( Element elementToAppendTo, String name, String text ) (enterprise edition only)
Element XmlWriter.addElement ( Element elementToAppendTo, String name, String text, Hashtable attributes ) (enterprise edition only)

Description

Add a node to the XML file.

Parameters

elementToAppendTo (optional) The XmlElement to which the node is being appended.
name The element's name, as a string.
text (optional) Any text to be added inside of the node, as a string.
attributes (optional) Hashtable of attribute names and their associated values, for the node.

Return Values

Returns the added element object.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Add Nodes to XML File

// Import package
import com.screenscraper.xml.*;

// Create XmlWriter
xmlWriter = new XmlWriter( "C:/students.xml", "students" );

// Add Student Node
student = xmlWriter.addElement( "student" );

// Add Name Node Under the Student
name = xmlWriter.addElement( student, "name", "John Smith" );

// Close XmlWriter
xmlWriter.close();

addElements

Element XmlWriter.addElements ( Element elementToAppendTo, String name, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( Element elementToAppendTo, String name, String text, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( Element elementToAppendTo, String name, String text, Hashtable attributes, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( String name, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( String name, String text, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( String name, String text, Hashtable attributes, Hashtable subElements ) (enterprise edition only)
void XmlWriter.addElements ( String containingTagName, DataSet dataSet ) (enterprise edition only)
void XmlWriter.addElements ( String containingTagName, String containingTagText, DataSet dataSet ) (enterprise edition only)
void XmlWriter.addElements ( String containingTagName, String containingTagText, Hashtable attributes, DataSet dataSet ) (enterprise edition only)

Description

Add multiple nodes under a single node (new or already in existence).

Parameters

elementToAppendTo The XmlElement to which the node is being appended.
name The element's name, as a string.
text (optional--pass in null to omit) Any text to be added inside of the node, as a string.
attributes (optional--pass in null to omit) Hashtable of attribute names and their associated values, for the node.
subElements (optional--pass in null to omit) Hashtable of child nodes, with node names as keys and text as values.

name The element's name, as a string.
text (optional--pass in null to omit) Any text to be added inside of the node, as a string.
attributes (optional--pass in null to omit) Hashtable of attribute names and their associated values, for the node.
subElements (optional--pass in null to omit) Hashtable of child nodes, with node names as keys and text as values.

containingTagName The element's name, as a string.
containingTagText (optional--pass in null to omit) Any text to be added inside of the containing node, as a string.
attributes (optional--pass in null to omit) Hashtable of attribute names and their associated values, for the node.
dataSet A dataSet object.

Return Values

Returns the main added element object, if one was created. If no main element was added, it returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Add Nodes to XML File

// Import package
import com.screenscraper.xml.*;
import java.util.Hashtable;

// Create XmlWriter
xmlWriter = new XmlWriter( "C:/students.xml", "students" );

// Student Information
info = new Hashtable();
info.put("name", "John Smith");
info.put("phone", "555-0135");
info.put("gender", "male");

// Add Student Node
student = xmlWriter.addElements( "student", info );

// Close XmlWriter
xmlWriter.close();
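
To visualize the node structure that a subElements Hashtable describes (keys become child node names, values become their text), the same structure can be built with the JDK's own DOM classes. This is a hedged sketch rather than XmlWriter's implementation: the class SubElementsSketch and the method build are hypothetical, and the exact formatting and child order of the file XmlWriter produces may differ.

```java
import java.io.StringWriter;
import java.util.Hashtable;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SubElementsSketch {

    // Hypothetical helper: builds a root element containing one node named
    // "name", with one child per Hashtable entry (key = tag, value = text).
    static String build(String root, String name, Hashtable<String, String> subElements) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element rootElement = doc.createElement(root);
            doc.appendChild(rootElement);

            Element parent = doc.createElement(name);
            rootElement.appendChild(parent);

            // A Hashtable does not keep insertion order, so the children
            // may appear in any order in this sketch.
            for (Map.Entry<String, String> entry : subElements.entrySet()) {
                Element child = doc.createElement(entry.getKey());
                child.setTextContent(entry.getValue());
                parent.appendChild(child);
            }

            // Serialize the DOM tree to a string without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Hashtable<String, String> info = new Hashtable<>();
        info.put("name", "John Smith");
        info.put("phone", "555-0135");
        info.put("gender", "male");
        System.out.println(build("students", "student", info));
    }
}
```

The printed string shows a students root holding a single student node with name, phone, and gender children, which is the nesting the addElements call in the example above asks for.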

close

void XmlWriter.close ( ) (enterprise edition only)

Description

Close the XmlWriter.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.