session
Overview
This object refers to the scraping session that is currently running. To make the methods easier to find, they have been organized into groups of related methods, and each group is named for the kind of task it covers.
Anonymization
Overview
The following methods help you set up an anonymous scraping session. If you are using your own proxy server pool, you will use these methods to let screen-scraper interact with and manage your proxy pool. If you are using automatic anonymization, the only method you will need is currentProxyServerIsBad, as screen-scraper manages the servers using the anonymization settings from your setup.
See an example of Anonymization via Manual Proxy Pools.
currentProxyServerIsBad
void session.currentProxyServerIsBad ( ) (professional and enterprise editions only)
Description
Remove the current proxy server from the proxy pool. This is only used with anonymization; it indicates that the server currently in use is bad and should be removed.
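To picture what flagging a server does to a pool, here is a small plain-Java model of a pool that cycles through servers and drops the current one when it is flagged as bad. The RotatingProxyPool class and its methods are hypothetical; this is not screen-scraper's ProxyServerPool, and it does not model how screen-scraper creates replacement servers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a proxy pool for illustration only;
// it is not screen-scraper's ProxyServerPool class.
class RotatingProxyPool {
    private final List<String> servers = new ArrayList<>();
    private int current = 0; // index of the server requests currently go through

    RotatingProxyPool(List<String> initialServers) {
        servers.addAll(initialServers);
    }

    // Returns the server in use, or null once every server has been removed.
    String currentServer() {
        return servers.isEmpty() ? null : servers.get(current);
    }

    // Cycle to the next server, wrapping around at the end of the list.
    void advance() {
        if (!servers.isEmpty()) {
            current = (current + 1) % servers.size();
        }
    }

    // Analogous to flagging the current server as bad: drop it from the pool.
    void currentServerIsBad() {
        if (servers.isEmpty()) {
            return;
        }
        servers.remove(current); // removes by index, so the next server shifts into this slot
        if (current >= servers.size()) {
            current = 0; // the removed server was last in the list, so wrap to the start
        }
    }

    int size() {
        return servers.size();
    }
}
```

Removing by index means the next server in line takes over immediately, so the scrape never retries the flagged server.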
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
If you are using automatic anonymization or manual proxy pools, a new proxy server will be created as a result of the method call.
When checking whether a request you have made has failed, it is best not to rely on the HTTP status code (e.g., 404) alone, as status codes are not always accurate. It is recommended that you also scrape a known string (e.g., "Not found") from the response HTML to confirm the status code.
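As a sketch of that advice, the helper below treats a response as "not found" only when the status code and a known marker string agree, and reports when the two signals disagree. ResponseCheck and its methods are hypothetical names, and 404 and "Not found" are only the example values from the note above; substitute whatever your target site actually returns.

```java
// Hypothetical helper illustrating the "status code plus known string" check.
class ResponseCheck {
    // Returns true only when the status code and the page content both say "not found".
    static boolean confirmedNotFound(int statusCode, String responseHtml, String marker) {
        boolean statusSaysNotFound = statusCode == 404;
        boolean bodySaysNotFound = responseHtml != null && responseHtml.contains(marker);
        return statusSaysNotFound && bodySaysNotFound;
    }

    // Returns true when the two signals disagree, which is worth logging and inspecting
    // (for example, a 404 status on a page that does not contain the expected text).
    static boolean signalsDisagree(int statusCode, String responseHtml, String marker) {
        boolean statusSaysNotFound = statusCode == 404;
        boolean bodySaysNotFound = responseHtml != null && responseHtml.contains(marker);
        return statusSaysNotFound != bodySaysNotFound;
    }
}
```

When the signals disagree you might, for example, log the response for inspection, or flag the proxy with currentProxyServerIsBad if you suspect the server is being blocked.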
Examples
Flag Proxy Server
// Indicates that the current proxy server is bad.
session.currentProxyServerIsBad();
getCurrentProxyServerFromPool
ProxyServer session.getCurrentProxyServerFromPool ( )
Description
Get the current proxy server from the proxy server pool.
Parameters
This method does not receive any parameters.
Return Values
Returns the current proxy server being used.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Write Proxy Server Description to Log
// Get Proxy Server
proxyServer = session.getCurrentProxyServerFromPool();
// Log Server Description
session.log( "Proxy Server: " + proxyServer.getDescription() );
getProxyServerPool
ProxyServerPool session.getProxyServerPool ( )
Description
Get the proxy server pool object that allows proxies to be cycled through.
Parameters
This method does not receive any parameters.
Return Values
Returns the ProxyServerPool object associated with the scraping session, or null if no pool has been set.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Check if ProxyServerPool object exists.
// If ProxyServerPool does not exist
// Create a new ProxyServerPool object.
if ( session.getProxyServerPool() == null )
{
// The ProxyServerPool object will
// control how screen-scraper interacts with proxy servers.
proxyServerPool = new ProxyServerPool();
// We give the current scraping session a reference to
// the proxy pool. This step should ideally be done right
// after the object is created (as in the previous step).
session.setProxyServerPool( proxyServerPool );
}
getTerminateProxiesOnCompletion
boolean session.getTerminateProxiesOnCompletion ( )
Description
Determine whether proxies are set to be terminated when the scrape ends.
Parameters
This method does not receive any parameters.
Return Values
Returns true if a proxy will be terminated; otherwise, it returns false.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Check Termination Setting
// Log whether proxies are being terminated or not
if ( session.getTerminateProxiesOnCompletion() )
{
session.log( "Anonymous Proxies are set to be terminated with the scrape." );
}
else
{
session.log( "Anonymous Proxies are set to continue running after the scrape is finished." );
}
getUseProxyFromPool
boolean session.getUseProxyFromPool ( )
Description
Determine whether proxies from the proxy pool are being used.
Parameters
This method does not receive any parameters.
Return Values
Returns true if a proxy pool is being used; otherwise, it returns false.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Turn On Proxy Pool Usage If Not Running
// Are proxies being used from a pool
if ( !session.getUseProxyFromPool() )
{
session.setUseProxyFromPool( true );
}
See Also
- setUseProxyFromPool() [session] - Sets whether a proxy from the proxy pool should be used when making a request
setProxyServerPool
void session.setProxyServerPool ( ProxyServerPool proxyServerPool )
Description
Associate a proxy pool with a scraping session.
Parameters
- proxyServerPool A ProxyServerPool object.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Associate Proxy Pool with Scraping Session
// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.
proxyServerPool = new ProxyServerPool();
// We give the current scraping session a reference to
// the proxy pool. This step should ideally be done right
// after the object is created (as in the previous step).
session.setProxyServerPool( proxyServerPool );
setTerminateProxiesOnCompletion
void session.setTerminateProxiesOnCompletion ( boolean terminateProxies )
Description
Manually set whether proxies are terminated when the scrape ends.
Parameters
- terminateProxies Whether proxies should be terminated at the end of the session or not, as a boolean.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Make Sure Proxies are Deleted on Scrape Completion
// Check whether proxies are already set to be terminated with the scrape
if ( session.getTerminateProxiesOnCompletion() )
{
session.log( "Anonymous Proxies are set to be terminated with the scrape." );
}
else
{
// Set proxies to be terminated with the scrape
session.setTerminateProxiesOnCompletion( true );
session.log( "Anonymous Proxies updated to be terminated with the scrape." );
}
setUseProxyFromPool
void session.setUseProxyFromPool ( boolean useProxyFromPool )
Description
Set whether proxies from the proxy server pool should be used when making scrapeable file requests.
Parameters
- useProxyFromPool Whether proxies in the proxyServerPool should be used, as a boolean.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Anonymize Scrapeable Files
// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.
proxyServerPool = new ProxyServerPool();
// We give the current scraping session a reference to
// the proxy pool. This step should ideally be done right
// after the object is created (as in the previous step).
session.setProxyServerPool( proxyServerPool );
// ... Proxy Server Pool Setup ...
// This is the switch that tells the scraping session to make
// use of the proxy servers. Note that this can be turned on
// and off during the course of the scrape. You may want to
// anonymize some pages, but not others.
session.setUseProxyFromPool( true );
See Also
- getUseProxyFromPool() [session] - Returns whether or not a proxy from the proxy pool will be used upon making a request
External Proxy Settings
Overview
If your network already routes traffic through a proxy server, screen-scraper must be given that proxy's credentials in order to reach the internet. These methods let you tell screen-scraper manually how to get through your external proxy.
If you always go through the same external proxy, you will probably want to set the credentials in screen-scraper's proxy settings so that you don't have to specify them in every scrape.
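The four standard settings (host, port, username, password) correspond to the same pieces the JDK uses to describe an HTTP proxy. The sketch below is a plain-Java analogy only, assuming an HTTP proxy; the ExternalProxySettings class is hypothetical, and screen-scraper applies its own settings internally. Because setExternalProxyPort takes the port as a string, the sketch parses it before use.

```java
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;

// Hypothetical holder for the four standard external proxy settings,
// expressed with JDK networking types purely for illustration.
class ExternalProxySettings {
    final String host;
    final String port; // a string, like the argument to setExternalProxyPort
    final String username;
    final String password;

    ExternalProxySettings(String host, String port, String username, String password) {
        this.host = host;
        this.port = port;
        this.username = username;
        this.password = password;
    }

    // Build an HTTP proxy description; parseInt throws NumberFormatException for a
    // non-numeric port and createUnresolved rejects ports outside 0-65535.
    Proxy toProxy() {
        int portNumber = Integer.parseInt(port.trim());
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, portNumber));
    }

    // The credentials the proxy would challenge for.
    PasswordAuthentication toCredentials() {
        return new PasswordAuthentication(username, password.toCharArray());
    }
}
```

The address is left unresolved so that building the settings never triggers a DNS lookup.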
getExternalNTProxyDomain
String session.getExternalNTProxyDomain ( )
Description
Retrieve the external NT proxy domain.
Parameters
This method does not receive any parameters.
Return Values
Returns the external NT domain, as a string.
Change Log
Version | Description
5.0 | Added for all editions.
Examples
Log External NT Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalNTProxyUsername( ) );
session.log( "Password: " + session.getExternalNTProxyPassword( ) );
session.log( "Domain: " + session.getExternalNTProxyDomain( ) );
session.log( "Host: " + session.getExternalNTProxyHost( ) );
getExternalNTProxyHost
String session.getExternalNTProxyHost ( )
Description
Retrieve the external NT proxy host.
Parameters
This method does not receive any parameters.
Return Values
Returns the external NT host, as a string.
Change Log
Version | Description
5.0 | Added for all editions.
Examples
Log External NT Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalNTProxyUsername( ) );
session.log( "Password: " + session.getExternalNTProxyPassword( ) );
session.log( "Domain: " + session.getExternalNTProxyDomain( ) );
session.log( "Host: " + session.getExternalNTProxyHost( ) );
getExternalNTProxyPassword
String session.getExternalNTProxyPassword ( )
Description
Retrieve the external NT proxy password.
Parameters
This method does not receive any parameters.
Return Values
Returns the external NT password, as a string.
Change Log
Version | Description
5.0 | Added for all editions.
Examples
Log External NT Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalNTProxyUsername( ) );
session.log( "Password: " + session.getExternalNTProxyPassword( ) );
session.log( "Domain: " + session.getExternalNTProxyDomain( ) );
session.log( "Host: " + session.getExternalNTProxyHost( ) );
getExternalNTProxyUsername
String session.getExternalNTProxyUsername ( )
Description
Retrieve the external NT proxy username.
Parameters
This method does not receive any parameters.
Return Values
Returns the external NT username, as a string.
Change Log
Version | Description
5.0 | Added for all editions.
Examples
Log External NT Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalNTProxyUsername( ) );
session.log( "Password: " + session.getExternalNTProxyPassword( ) );
session.log( "Domain: " + session.getExternalNTProxyDomain( ) );
session.log( "Host: " + session.getExternalNTProxyHost( ) );
getExternalProxyHost
String session.getExternalProxyHost ( )
Description
Retrieve the external proxy host.
Parameters
This method does not receive any parameters.
Return Values
Returns the external host, as a string.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Log External Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalProxyUsername( ) );
session.log( "Password: " + session.getExternalProxyPassword( ) );
session.log( "Host: " + session.getExternalProxyHost( ) );
session.log( "Port: " + session.getExternalProxyPort( ) );
getExternalProxyPassword
String session.getExternalProxyPassword ( )
Description
Retrieve the external proxy password.
Parameters
This method does not receive any parameters.
Return Values
Returns the external password, as a string.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Log External Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalProxyUsername( ) );
session.log( "Password: " + session.getExternalProxyPassword( ) );
session.log( "Host: " + session.getExternalProxyHost( ) );
session.log( "Port: " + session.getExternalProxyPort( ) );
getExternalProxyPort
String session.getExternalProxyPort ( )
Description
Retrieve the external proxy port.
Parameters
This method does not receive any parameters.
Return Values
Returns the external port, as a string.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Log External Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalProxyUsername( ) );
session.log( "Password: " + session.getExternalProxyPassword( ) );
session.log( "Host: " + session.getExternalProxyHost( ) );
session.log( "Port: " + session.getExternalProxyPort( ) );
getExternalProxyUsername
String session.getExternalProxyUsername ( )
Description
Retrieve the external proxy username.
Parameters
This method does not receive any parameters.
Return Values
Returns the external username, as a string.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Log External Proxy Settings
// Log External Proxy Settings
session.log( "Username: " + session.getExternalProxyUsername( ) );
session.log( "Password: " + session.getExternalProxyPassword( ) );
session.log( "Host: " + session.getExternalProxyHost( ) );
session.log( "Port: " + session.getExternalProxyPort( ) );
setExternalNTProxyDomain
void session.setExternalNTProxyDomain ( String domain )
Description
Manually set external NT proxy domain.
Parameters
- domain Domain for the external NT proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.
If you are using NTLM (Windows NT) authentication, you'll need to specify settings for both the standard external proxy and the external NT proxy.
Examples
Manually Setup External NT Proxy
// Setup External Proxy
session.setExternalNTProxyUsername( "guest" );
session.setExternalNTProxyPassword( "guestPassword" );
session.setExternalNTProxyDomain( "Group" );
session.setExternalNTProxyHost( "proxy.domain.com" );
setExternalNTProxyHost
void session.setExternalNTProxyHost ( String host )
Description
Manually set external NT proxy host/domain.
Parameters
- host Host/domain for the external NT proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.
If you are using NTLM (Windows NT) authentication, you'll need to specify settings for both the standard external proxy and the external NT proxy.
Examples
Manually Setup External NT Proxy
// Setup External Proxy
session.setExternalNTProxyUsername( "guest" );
session.setExternalNTProxyPassword( "guestPassword" );
session.setExternalNTProxyDomain( "Group" );
session.setExternalNTProxyHost( "proxy.domain.com" );
setExternalNTProxyPassword
void session.setExternalNTProxyPassword ( String password )
Description
Manually set external NT proxy password.
Parameters
- password Password for the external NT proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.
If you are using NTLM (Windows NT) authentication, you'll need to specify settings for both the standard external proxy and the external NT proxy.
Examples
Manually Setup External NT Proxy
// Setup External Proxy
session.setExternalNTProxyUsername( "guest" );
session.setExternalNTProxyPassword( "guestPassword" );
session.setExternalNTProxyDomain( "Group" );
session.setExternalNTProxyHost( "proxy.domain.com" );
setExternalNTProxyUsername
void session.setExternalNTProxyUsername ( String username )
Description
Manually set external NT proxy username.
Parameters
- username Username for the external NT proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.
If you are using NTLM (Windows NT) authentication, you'll need to specify settings for both the standard external proxy and the external NT proxy.
Examples
Manually Setup External NT Proxy
// Setup External Proxy
session.setExternalNTProxyUsername( "guest" );
session.setExternalNTProxyPassword( "guestPassword" );
session.setExternalNTProxyDomain( "Group" );
session.setExternalNTProxyHost( "proxy.domain.com" );
setExternalProxyHost
void session.setExternalProxyHost ( String host )
Description
Manually set external proxy host/domain.
Parameters
- host Host/domain for the external proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.
Examples
Manually Setup External Proxy
// Setup External Proxy
session.setExternalProxyUsername( "guest" );
session.setExternalProxyPassword( "guestPassword" );
session.setExternalProxyHost( "proxy.domain.com" );
session.setExternalProxyPort( "80" );
setExternalProxyPassword
void session.setExternalProxyPassword ( String password )
Description
Manually set external proxy password.
Parameters
- password Password for the external proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.
Examples
Manually Setup External Proxy
// Setup External Proxy
session.setExternalProxyUsername( "guest" );
session.setExternalProxyPassword( "guestPassword" );
session.setExternalProxyHost( "proxy.domain.com" );
session.setExternalProxyPort( "80" );
setExternalProxyPort
void session.setExternalProxyPort ( String port )
Description
Manually set external proxy port.
Parameters
- port Port for the external proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.
Examples
Manually Setup External Proxy
// Setup External Proxy
session.setExternalProxyUsername( "guest" );
session.setExternalProxyPassword( "guestPassword" );
session.setExternalProxyHost( "proxy.domain.com" );
session.setExternalProxyPort( "80" );
setExternalProxyUsername
void session.setExternalProxyUsername ( String username )
Description
Manually set external proxy username.
Parameters
- username Username for the external proxy, as a string.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
If you use this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.
Examples
Manually Setup External Proxy
// Setup External Proxy
session.setExternalProxyUsername( "guest" );
session.setExternalProxyPassword( "guestPassword" );
session.setExternalProxyHost( "proxy.domain.com" );
session.setExternalProxyPort( "80" );
Logging
Overview
Logging is a great tool for confirming that your scrapes are working correctly and for troubleshooting problems that arise. Although logging large amounts of information can slow down a scrape, the best way around this is not to remove logging calls but to reduce the logging verbosity when running the scrape in a production environment. Be aware that doing so can make some problems harder to troubleshoot should they arise.
Several logging methods are provided so that you can log information according to its importance.
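To illustrate lowering verbosity instead of deleting log calls, the hypothetical logger below orders the four levels from most verbose (debug) to least verbose (error) and discards anything below a configured threshold. It is a sketch of the idea only, not screen-scraper's logging implementation; LogLevel and VerbosityFilteredLog are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Levels ordered from most verbose (DEBUG) to least verbose (ERROR).
enum LogLevel { DEBUG, INFO, WARN, ERROR }

// Hypothetical logger illustrating verbosity filtering; not screen-scraper's implementation.
class VerbosityFilteredLog {
    private final LogLevel threshold;
    private final List<String> written = new ArrayList<>();

    VerbosityFilteredLog(LogLevel threshold) {
        this.threshold = threshold;
    }

    // Calls stay in the script; messages below the threshold are simply dropped.
    void write(LogLevel level, Object message) {
        if (level.compareTo(threshold) >= 0) {
            written.add(level + ": " + String.valueOf(message));
        }
    }

    List<String> lines() {
        return written;
    }
}
```

Running the same script with a DEBUG threshold during development and a WARN threshold in production keeps every call in place while only warnings and errors are written.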
See Also
- debug() [log] - Sends a message to the log as a debug message
- info() [log] - Sends a message to the log as an info message
- warn() [log] - Sends a message to the log as a warning message
- error() [log] - Sends a message to the log as an error message
getLogFileName
String session.getLogFileName ( ) (professional and enterprise editions only)
Description
Get the name of the current log file.
Parameters
This method does not receive any parameters.
Return Values
Returns the name of the log file, as a string.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
This method can be very helpful when screen-scraper is running in server mode and you need to find the log file that contains the scrape of a particular record, or to locate errors in larger scrapes.
Examples
Get Log's File Name
// Output the name of the log file to the session log.
logName = session.getLogFileName();
session.log( "Log file name: " + logName );
log
void session.log ( Object message )
Description
Write message to the log.
Parameters
- message Message to be written to the log after being converted to a String using String.valueOf( message ).
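Because the message is converted with String.valueOf, any object can be passed, including null. This plain-Java snippet shows what a few values become after that conversion; ValueOfDemo is only a wrapper for the demonstration.

```java
import java.util.Arrays;

class ValueOfDemo {
    // The same conversion described above: any Object becomes a String.
    static String render(Object message) {
        return String.valueOf(message);
    }

    public static void main(String[] args) {
        System.out.println(render(42));                      // 42
        System.out.println(render(3.5));                     // 3.5
        System.out.println(render(Arrays.asList("a", "b"))); // [a, b]
        System.out.println(render(null));                    // null (no exception)
    }
}
```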
Return Values
Returns void.
Change Log
Version | Description
5.5 | Now accepts any Object as a message.
4.5 | Available for all editions.
When the workbench is running, the message appears under the log tab for the scraping session. When screen-scraper is running in server mode, the message is written to the corresponding .log file in screen-scraper's log folder. When screen-scraper is invoked from the command line, the message is sent to standard output.
Examples
Write to Log
// Sends the message to the log.
session.log( "Inserting extracted data into the database." );
See Also
- logDebug() [session] - Sends a message to the log as a debugging message
- logInfo() [session] - Sends a message to the log as an informative message
- logWarn() [session] - Sends a message to the log as a warning
- logError() [session] - Sends a message to the log as an error message
- log() [log] - Write message to the log
logCurrentDateAndTime
void session.logCurrentDateAndTime ( ) (professional and enterprise editions only)
Description
Write the current date and time to the log at the most verbose (debug) level, formatted to be human readable.
Parameters
This method does not receive any parameters.
Return Values
Returns void. If an error occurs, an error will be thrown.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Log Date and Time
// Output the current date and time to the log.
session.logCurrentDateAndTime();
logCurrentTime
void session.logCurrentTime ( ) (professional and enterprise editions only)
Description
Write the current time to the log at the most verbose (debug) level, formatted to be human readable.
Parameters
This method does not receive any parameters.
Return Values
Returns void. If an error occurs, an error will be thrown.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Log Formatted Time
// Output the current time to the log.
session.logCurrentTime();
logDebug
void session.logDebug ( Object message ) (professional and enterprise editions only)
Description
Write a message to the log at the debug level (most verbose).
Parameters
- message Message to be written to the log after being converted to a String using String.valueOf( message ).
Return Values
Returns void.
Change Log
Version | Description
5.5 | Now accepts any Object as a message.
4.5 | Available for professional and enterprise editions.
Examples
Write to Log at Debug level
// Sends the message to the lowest level of logging.
session.logDebug( "Index: " + session.getVariable( "INDEX" ) );
See Also
- log() [session] - Writes a message to the log
- logInfo() [session] - Sends a message to the log as an informative message
- logWarn() [session] - Sends a message to the log as a warning
- logError() [session] - Sends a message to the log as an error message
- debug() [log] - Sends a message to the log as a debug message
logElapsedRunningTime
void session.logElapsedRunningTime ( ) (professional and enterprise editions only)
Description
Write the scrape's elapsed running time to the log at the most verbose (debug) level. It is formatted to be human readable, broken down into days, hours, minutes, and seconds.
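A minimal sketch of breaking an elapsed time into days, hours, minutes, and seconds, assuming the running time is available in milliseconds. The ElapsedTime class and its wording are hypothetical; screen-scraper's actual log output may be formatted differently.

```java
// Hypothetical formatter; screen-scraper's exact wording may differ.
class ElapsedTime {
    static String format(long elapsedMillis) {
        long totalSeconds = elapsedMillis / 1_000;
        long days = totalSeconds / 86_400;             // 24 * 60 * 60 seconds per day
        long hours = (totalSeconds % 86_400) / 3_600;  // remainder after whole days
        long minutes = (totalSeconds % 3_600) / 60;    // remainder after whole hours
        long seconds = totalSeconds % 60;              // remainder after whole minutes
        return unit(days, "day") + ", " + unit(hours, "hour") + ", "
                + unit(minutes, "minute") + ", " + unit(seconds, "second");
    }

    // Pluralize the unit name unless the amount is exactly one.
    private static String unit(long amount, String name) {
        return amount + " " + name + (amount == 1 ? "" : "s");
    }
}
```

For example, 90,061,000 ms is one day plus 3,661 seconds, which the sketch renders as "1 day, 1 hour, 1 minute, 1 second".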
Parameters
This method does not receive any parameters.
Return Values
Returns void. If an error occurs, an error will be thrown.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Log Time the Scrape has been Running
// Output the running time to the log.
session.logElapsedRunningTime();
logError
void session.logError ( Object message ) (professional and enterprise editions only)
Description
Write a message to the log at the error level (least verbose).
Parameters
- message Message to be written to the log after being converted to a String using String.valueOf( message ).
Return Values
Returns void. If an error occurs, an error will be thrown.
Change Log
Version | Description
5.5 | Now accepts any Object as a message.
4.5 | Available for professional and enterprise editions.
Examples
Write to Log at Error level
// Sends the message to the highest level of logging.
session.logError( "Error parsing date: " + session.getVariable( "DATE" ) );
See Also
- log() [session] - Writes a message to the log
- logDebug() [session] - Sends a message to the log as a debugging message
- logInfo() [session] - Sends a message to the log as an informative message
- logWarn() [session] - Sends a message to the log as a warning
- error() [log] - Sends a message to the log as an error message
logInfo
void session.logInfo ( Object message ) (professional and enterprise editions only)
Description
Write a message to the log at the info level (second most verbose).
Parameters
- message Message to be written to the log after being converted to a String using String.valueOf( message ).
Return Values
Returns void. If an error occurs, an error will be thrown.
Change Log
Version | Description
5.5 | Now accepts any Object as a message.
4.5 | Available for professional and enterprise editions.
Examples
Write to Log at Info level
// Sends the message to the second lowest level of logging.
session.logInfo( "Traversing search results pages..." );
See Also
- log() [session] - Writes a message to the log
- logDebug() [session] - Sends a message to the log as a debugging message
- logWarn() [session] - Sends a message to the log as a warning
- logError() [session] - Sends a message to the log as an error message
- info() [log] - Sends a message to the log as an info message
logVariables
void session.logVariables ( ) (professional and enterprise editions only)
Description
Write all session variables to the log.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Added for all editions.
Examples
Log All Session Variables
// Write Variables to Log
session.logVariables();
See Also
- breakpoint() [dataSet] - Pause scrape and display breakpoint window.
logWarn
void session.logWarn ( Object message ) (professional and enterprise editions only)
Description
Write a message to the log at the warn level (third most verbose).
Parameters
- message Message to be written to the log after being converted to a String using String.valueOf( message ).
Return Values
Returns void. If an error occurs, an error will be thrown.
Change Log
Version | Description
5.5 | Now accepts any Object as a message.
4.5 | Available for professional and enterprise editions.
Examples
Write to Log at Warn level
// Sends the message to the third level of logging.
session.logWarn( "Warning! Received a 404 response." );
See Also
- log() [session] - Writes a message to the log
- logDebug() [session] - Sends a message to the log as a debugging message
- logInfo() [session] - Sends a message to the log as an informative message
- logError() [session] - Sends a message to the log as an error message
- warn() [log] - Sends a message to the log as a warning message
Web Interface Interactions
Overview
These methods are used with screen-scraper's web interface. Calling them gives the interface more detailed information about the state of a running scrape. If you are not running your scrapes through the web interface, these methods are of little use to you.
As the web interface is an enterprise edition feature, these methods are only available to enterprise edition users.
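The new, duplicate, and error record counters documented below are typically driven by a per-record decision like the one in their examples. This plain-Java sketch makes that decision explicit, with a HashSet standing in for the database lookup; RecordClassifier is hypothetical, and the ID and VITAL_DATUM field names are borrowed from those examples.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical classifier; the HashSet stands in for a database lookup of already-saved IDs.
class RecordClassifier {
    enum Outcome { NEW, DUPLICATE, ERROR }

    private final Set<String> savedIds = new HashSet<>();
    int newCount;
    int duplicateCount;
    int errorCount;

    Outcome classify(Map<String, String> record) {
        String id = record.get("ID");
        // Records missing an ID or a vital field count as errors and are not saved.
        if (isBlank(id) || isBlank(record.get("VITAL_DATUM"))) {
            errorCount++;
            return Outcome.ERROR;
        }
        // Set.add returns false when the ID was already present.
        if (!savedIds.add(id)) {
            duplicateCount++;
            return Outcome.DUPLICATE;
        }
        newCount++;
        return Outcome.NEW;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

In a screen-scraper script, each outcome would map to one of the calls below, such as session.addToNumNewRecordsScraped(1) for NEW.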
addToNumDuplicateRecordsScraped
void session.addToNumDuplicateRecordsScraped ( Object value ) (enterprise edition only)
Description
Add to the count of duplicate records scraped (as opposed to new or error records).
Parameters
- value Value to be added to the count. Usually an integer; if given a string (e.g., "10"), it will try to convert it to an integer before adding.
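The sketch below shows one way an Object value could be coerced before it is added to a count: numbers are used directly and strings such as "10" are parsed. RecordCounter is hypothetical, and its handling of null (ignored) and unparsable strings (NumberFormatException) is an assumption rather than documented screen-scraper behavior.

```java
// Hypothetical counter showing one way an Object value could be added to an integer count.
class RecordCounter {
    private int count = 0;

    void add(Object value) {
        if (value instanceof Number) {
            count += ((Number) value).intValue(); // e.g. 10
        } else if (value != null) {
            // e.g. "10"; a string that is not a whole number throws NumberFormatException
            count += Integer.parseInt(value.toString().trim());
        }
        // Assumption for this sketch: null is ignored.
    }

    int get() {
        return count;
    }
}
```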
Return Values
Returns void.
Change Log
Version | Description
7.0 | Available for enterprise edition.
Examples
Record Duplicate Records Scraped
// Adds 10 to the count of duplicate records scraped.
session.addToNumDuplicateRecordsScraped(10);
Have session record each time a record is already in the database
// In script called "After each pattern match"
import java.sql.PreparedStatement;
import java.sql.ResultSet;
dm = session.getv("_DM");
con = dm.getConnection();
try
{
String sql = "SELECT id FROM table WHERE did = ?";
PreparedStatement pstmt = con.prepareStatement(sql);
pstmt.setString(1, dataRecord.get("ID"));
ResultSet rs = pstmt.executeQuery();
if (rs.next())
{
log.log("---Already in DB");
session.addToNumDuplicateRecordsScraped(1);
}
else
{
session.scrapeFile("Results");
}
}
catch (Exception e)
{
log.logError(e);
session.setFatalErrorOccurred(true);
session.setErrorMessage(e.toString());
}
finally
{
con.close();
}
addToNumErrorRecordsScraped
void session.addToNumErrorRecordsScraped ( Object value ) (enterprise edition only)
Description
Add to the count of error records scraped (as opposed to duplicate or new records).
Parameters
- value Value to be added to the count. Usually an integer; if given a string (e.g., "10"), it will try to convert it to an integer before adding.
Return Values
Returns void.
Change Log
Version | Description
7.0 | Available for enterprise edition.
Examples
Record Error Records Scraped
// Adds 10 to the count of error records scraped.
session.addToNumErrorRecordsScraped(10);
Have session record each time a dataRecord is missing a vital datum
// In script called "After each pattern match"
if (sutil.isNullOrEmptyString(dataRecord.get("VITAL_DATUM")))
{
log.logError("Missing VITAL_DATUM");
session.addToNumErrorRecordsScraped(1);
}
addToNumNewRecordsScraped
void session.addToNumNewRecordsScraped ( Object value ) (enterprise edition only)
Description
Add to the count of new records scraped (as opposed to duplicate or error records).
Parameters
- value Value to be added to the count. Usually an integer; if given a string (e.g., "10"), it will try to convert it to an integer before adding.
Return Values
Returns void.
Change Log
Version | Description
7.0 | Available for enterprise edition.
Examples
Record New Records Scraped
// Adds 10 to the value of new records scraped.
session.addToNumNewRecordsScraped(10);
Have session record each time a new record is saved to the database
// In script called "After each pattern match"
dm = session.getv("_DM");
dm.addData("db_table", dataRecord);
dm.commit("db_table");
if (dm.flush())
{
session.addToNumNewRecordsScraped(1);
}
addToNumRecordsScraped
void session.addToNumRecordsScraped ( Object value ) (enterprise edition only)
Description
Add to the total count of records scraped.
Parameters
- value Value to be added to the count. Usually an integer; if given a string (e.g., "10"), it will try to convert it to an integer before adding.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for enterprise edition.
Examples
Record Number of Records Scraped
// Adds 10 to the value of the number of records scraped.
session.addToNumRecordsScraped( 10 );
Have session record the number of DataRecords in each DataSet
// In script called "After file is scraped"
// Adds number of DataRecords in DataSet
// to the value of the number of records scraped.
session.addToNumRecordsScraped( dataSet.getNumDataRecords() );
appendErrorMessage
void session.appendErrorMessage ( String errorMessage ) (enterprise edition only)
Description
Append an error message to any existing error messages.
Parameters
- errorMessage Error message that should be added, as a string.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
User Specified Error
// First set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );
// Append an error message.
session.appendErrorMessage( "An error occurred in the scraping session." );
getErrorMessage
String session.getErrorMessage ( ) (enterprise edition only)
Description
Get the current error message.
Parameters
This method does not receive any parameters.
Return Values
Returns current error message, as a string.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Write Error Message to the Log
// Output the current error message to the log.
session.log( "Error message: " + session.getErrorMessage() );
getFatalErrorOccurred
boolean session.getFatalErrorOccurred ( ) (enterprise edition only)
Description
Determine the fatal error status of the scrape.
Parameters
This method does not receive any parameters.
Return Values
Returns whether a fatal error has occurred, as a boolean.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Write Fatal Error State to Log
// Output the "fatal error" state to the log.
session.log( "Fatal error occurred: " + session.getFatalErrorOccurred() );
getNumRecordsScraped
int session.getNumRecordsScraped ( ) (enterprise edition only)
Description
Get the number of records that have been scraped.
Parameters
This method does not receive any parameters.
Return Values
Returns the number of records scraped, as an integer.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Write Number of Records Scraped to Log
// Outputs the number of records that have been scraped to the log.
session.log( "Num records scraped so far: " + session.getNumRecordsScraped() );
resetNumRecordsScraped
void session.resetNumRecordsScraped ( ) (enterprise editions only)
Description
Reset the count on the number of scraped records.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version |
Description |
5.0 |
Available for all editions. |
Examples
Reset Count
// Clear number of records scraped
session.resetNumRecordsScraped();
setErrorMessage
void session.setErrorMessage ( String errorMessage ) (enterprise edition only)
Description
Set the current error message.
Parameters
- errorMessage Desired error message, as a string.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Specify an Error Message
// First set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );
// Set the error message.
session.setErrorMessage( "An error occurred in the scraping session." );
Web Interface Feedback
// Append a status message without flagging an error. This repurposes
// the error message as a status message, so skip it if a fatal error
// occurred to avoid burying the real error.
if ( !session.getFatalErrorOccurred() )
{
session.appendErrorMessage( "Scraping Page: " + session.getv( "PAGE" ) );
}
setFatalErrorOccurred
void session.setFatalErrorOccurred ( boolean fatalErrorOccurred ) (enterprise edition only)
Description
Set the fatal error status of the scrape.
Parameters
- fatalErrorOccurred Desired fatal error status to set, as a boolean.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Set Fatal Error Flag
// Set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );
setNumRecordsScraped
void session.setNumRecordsScraped ( Object value ) (enterprise edition only)
Description
Set the number of records that have been scraped.
Parameters
- value Value to set the count of the number of records scraped.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Set the Number of Records Scraped
// Sets the value of the number of records scraped to 10.
session.setNumRecordsScraped( 10 );
addEventCallback
void session.addEventCallback ( EventFireTime eventTime, EventHandler callback ) (professional and enterprise editions only)
void session.addEventCallbackWithPriority ( EventFireTime eventTime, EventHandler callback, int priority ) (professional and enterprise editions only)
Description
Add a callback that will be executed at the given event time.
Note: callbacks added with session.addEventCallback are run at a priority of 0.
Parameters
- eventTime The time to execute a callback.
- callback The callback to execute.
- priority The priority for this callback. Lower numbers are higher priority.
Return Values
Returns void.
Change Log
Version |
Description |
6.0.55a |
Introduced for pro and enterprise editions. |
Examples
Sets a handler to do something after the scripts set to run at the end of the session have run.
// using the default callback with the priority being 0.
session.addEventCallback(SessionEventFireTime.AfterEndScripts, handler);
// if we need to set the priority to be something else (or variable) use the second option
// in this case the priority could still be set to 0 if you wanted to.
session.addEventCallbackWithPriority(SessionEventFireTime.AfterEndScripts, handler, 3);
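The priority rules above (lower numbers run first, and addEventCallback defaults to 0) work together with getLastReturnValue so that several callbacks on one event can pass a value along. The following is a self-contained model of that behavior, not screen-scraper's implementation. `CallbackChain` and its methods are hypothetical, and keeping registration order for equal priorities is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical, simplified model of priority-ordered callbacks with chained
// return values. The names here are illustrative, not screen-scraper classes.
final class CallbackChain {
    private static final class Entry {
        final int priority;
        // Receives (event input, last return value) and returns the new value.
        final BiFunction<Object, Object, Object> handler;

        Entry(int priority, BiFunction<Object, Object, Object> handler) {
            this.priority = priority;
            this.handler = handler;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Mirrors addEventCallback: the priority defaults to 0.
    void add(BiFunction<Object, Object, Object> handler) {
        addWithPriority(handler, 0);
    }

    // Mirrors addEventCallbackWithPriority: lower numbers run first.
    void addWithPriority(BiFunction<Object, Object, Object> handler, int priority) {
        entries.add(new Entry(priority, handler));
    }

    // Runs every handler in priority order. List.sort is stable, so handlers
    // with equal priority keep their registration order (an assumption about
    // how ties are broken, not documented behavior).
    Object fire(Object input) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt(e -> e.priority));
        Object last = null; // null until some callback has returned a value
        for (Entry e : sorted) {
            last = e.handler.apply(input, last);
        }
        return last;
    }
}
```

Each handler sees the previous handler's result as `last`. This mirrors the advice to read `data.getLastReturnValue()` and return it unchanged when you do not mean to modify it.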
EventFireTime
EventFireTime is an interface that defines the methods every fire time must have, which lets the addEventCallback method accept different types of fire times.
Several fire time types based on this interface are predefined, each covering a part of the scrape that you can add event handlers to. They are described below.
ExtractorPatternEventFireTime
Enum
- BeforeExtractorPattern Before an extractor is applied (including before any scripts on it run). The returned value should be a boolean and indicates whether the extractor should be run. Any non-boolean result is the same as true. Note that the AfterExtractorPattern event is fired regardless of whether the extractor runs.
- AfterExtractorPatternAppliedButBeforeScripts After an extractor is applied (but before any scripts on it run, including the after apparent match scripts).
- AfterEachExtractorMatch After each match of an extractor. This will be applied before any of the "After each pattern match" scripts are applied.
- AfterExtractorPattern After an extractor is applied (including after any scripts on it have run).
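For BeforeExtractorPattern, only a boolean result controls whether the extractor runs, and anything else counts as true. Here is a minimal sketch of that rule. It assumes `null` is treated like any other non-boolean result, and the `ExtractorGate` class is hypothetical and for illustration only.

```java
// Hypothetical helper: interprets a BeforeExtractorPattern return value
// according to the rule described above. Not screen-scraper code.
final class ExtractorGate {
    private ExtractorGate() {}

    // A Boolean decides directly; any non-boolean (including null) means "run".
    static boolean shouldRunExtractor(Object handlerResult) {
        if (handlerResult instanceof Boolean) {
            return (Boolean) handlerResult;
        }
        return true;
    }
}
```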
Change Log
Version |
Description |
6.0.55a |
Introduced for pro and enterprise editions. |
Examples
How to use an EventFireTime with the session.addEventCallback method.
session.addEventCallback(ExtractorPatternEventFireTime.AfterEachExtractorMatch, handler);
ScrapeableFileEventFireTime
Enum
- BeforeScrapeableFile Before a scrapeable file is launched (including before any scripts on it run).
- BeforeHttpRequest Fired right before the HTTP request (after any "before scrapeable file" scripts), and fired again each time the request is retried. If it returns a non-null String, that string is used as the response instead of issuing a request. This response is still passed into the AfterHttpRequest event, but it does not pass through any tidying.
- AfterHttpRequest Fired right after the HTTP response is received and tidied (if tidying is set), but before anything else happens. Returns the data that should be used as the response data.
- AfterScrapeableFile After a scrapeable file is completed (including after any scripts on it run).
- OnHttpRedirect* Called when a redirect will occur. Returns true if the redirect should occur or false if it should not (any non-boolean result leaves the behavior unchanged).
*Note: When using the Async HTTP client you will have access to the request builder from ScrapeableFileEventData.getRedirectRequestBuilder() which can be used to modify and adjust the request before it is sent. If you use the Apache HTTP client the getRedirectRequestBuilder() method will always return null.
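Two of these fire times give the handler's return value a special meaning. A non-null String from BeforeHttpRequest stands in for the response, and a boolean from OnHttpRedirect decides whether the redirect is followed. The plain-Java model below illustrates those rules. `RequestFlow`, `responseFor`, and `followRedirect` are hypothetical names, and the `Supplier` stands in for issuing the real request.

```java
import java.util.function.Supplier;

// Hypothetical model of two ScrapeableFileEventFireTime rules; not screen-scraper code.
final class RequestFlow {
    private RequestFlow() {}

    // BeforeHttpRequest: a non-null String replaces the real request.
    static String responseFor(Object beforeHttpRequestResult, Supplier<String> issueRequest) {
        if (beforeHttpRequestResult instanceof String) {
            return (String) beforeHttpRequestResult;
        }
        return issueRequest.get();
    }

    // OnHttpRedirect: a Boolean overrides the default; anything else leaves it unchanged.
    static boolean followRedirect(Object onHttpRedirectResult, boolean defaultFollow) {
        if (onHttpRedirectResult instanceof Boolean) {
            return (Boolean) onHttpRedirectResult;
        }
        return defaultFollow;
    }
}
```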
Change Log
Version |
Description |
6.0.55a |
Introduced for pro and enterprise editions. |
Examples
How to use an EventFireTime with the session.addEventCallback method.
session.addEventCallback(ScrapeableFileEventFireTime.BeforeScrapeableFile, handler);
getRedirectToURL
String scrapeableFileEventData.getRedirectToURL ( )
Description
Returns the RedirectToURL value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the RedirectToURL value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the redirect URL
public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
String url = data.getRedirectToURL();
// do something
return data.getLastReturnValue();
}
ScriptEventFireTime
Enum
- AfterScript After a script is executed
- BeforeScript Before a script is executed
- OnScriptEnd Run when the script finishes executing. The difference between AfterScript and this is that AfterScript fires after the script is done running, and this runs after all the developer code has run but the script engine is still active. The return value is an injected string to execute, or null (or the empty string) to do nothing aside from execute the script code.
- OnScriptError Executes when an error occurs in a script.
- OnScriptStart Run when the script begins to execute. The difference between BeforeScript and this is that BeforeScript fires as preparation is made to launch a script, and this runs after all the default pre-script code is executed by the script engine, but before the developer code in the script. The return value is an injected string to execute, or null (or the empty string) to do nothing aside from execute the script code.
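OnScriptStart and OnScriptEnd treat the handler's return value as code to inject, where null or an empty string means no injection. The sketch below models this as simple string assembly around the developer's script. It illustrates the return-value rule only and does not show how the script engine actually injects code. Ignoring non-String results is an assumption, and `ScriptInjection` is a hypothetical class.

```java
// Hypothetical illustration only: models OnScriptStart/OnScriptEnd return values
// as text placed before/after the developer's script code.
final class ScriptInjection {
    private ScriptInjection() {}

    // Null, the empty string, or (by assumption) any non-String injects nothing.
    static String injected(Object handlerResult) {
        if (handlerResult instanceof String && !((String) handlerResult).isEmpty()) {
            return (String) handlerResult;
        }
        return "";
    }

    // Start injection, then developer code, then end injection, one per line.
    static String effectiveScript(Object onScriptStartResult, String developerCode, Object onScriptEndResult) {
        StringBuilder sb = new StringBuilder();
        String start = injected(onScriptStartResult);
        if (!start.isEmpty()) {
            sb.append(start).append('\n');
        }
        sb.append(developerCode);
        String end = injected(onScriptEndResult);
        if (!end.isEmpty()) {
            sb.append('\n').append(end);
        }
        return sb.toString();
    }
}
```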
Change Log
Version |
Description |
6.0.55a |
Introduced for pro and enterprise editions. |
Examples
How to use an EventFireTime with the session.addEventCallback method.
session.addEventCallback(ScriptEventFireTime.OnScriptEnd, handler);
SessionEventFireTime
Enum
- AfterEndScripts After the scrape finishes and all of the scripts set to run at the end of the session have run.
- NumRecordsSavedModified Fired when ScrapingSession.addToNumRecordsScraped(Object) is called. The handler's return value is the amount that is actually added.
- StopScrapingCalled When the session is stopped, either by calling the stopScraping method or clicking the stop scraping button in the workbench.
- SessionVariableSet* Called whenever a session variable is set. This is called before the value is actually set. The variable value passed in is the new value to be set, and the handler's return value is the value that is actually set.
- SessionVariableRetrieved* Called whenever a session variable is retrieved. This is called after the value is retrieved. The variable value passed in is the current value, and the handler's return value is the value actually returned.
*Note: Calling a setVariable or getVariable method inside these handlers WILL trigger the same events again. Avoid infinite recursion.
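One way to follow the recursion warning is a re-entrancy flag: while the hook is running, nested sets skip the hook. The self-contained model below shows the idea. `GuardedVariables` is a hypothetical stand-in for the session's variable store, and the single boolean flag assumes single-threaded use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical model of a variable store that runs a "set" hook before storing.
// The hook's return value is what gets stored, mirroring SessionVariableSet.
final class GuardedVariables {
    private final Map<String, Object> values = new HashMap<>();
    private BiFunction<String, Object, Object> onSet = (name, value) -> value;
    private boolean inHook = false; // re-entrancy guard (single-threaded sketch)

    void setOnSet(BiFunction<String, Object, Object> hook) {
        this.onSet = hook;
    }

    void set(String name, Object value) {
        Object toStore = value;
        if (!inHook) {
            inHook = true;
            try {
                toStore = onSet.apply(name, value);
            } finally {
                inHook = false;
            }
        }
        // Nested sets made from inside the hook are stored without re-firing it.
        values.put(name, toStore);
    }

    Object get(String name) {
        return values.get(name);
    }
}
```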
Change Log
Version |
Description |
6.0.55a |
Introduced for pro and enterprise editions. |
Examples
How to use an EventFireTime with the session.addEventCallback method.
session.addEventCallback(SessionEventFireTime.AfterEndScripts, handler);
StringOperationEventFireTime
Enum
- HttpParameterEncodeKey Called when an http parameter key (GET or POST) is encoded. The input string will be the value that is already encoded, and the return value should be the value to actually use.
- HttpParameterEncodeValue Called when an http parameter value (GET or POST) is encoded. The input string will be the value that is already encoded, and the return value should be the value to actually use.
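Because the handler receives the value after it has been encoded, it can only post-process the encoded form. As an illustration, the sketch below rewrites form-style `+` spaces as `%20`. This is safe because `URLEncoder` already encodes a literal `+` as `%2B`. The class `ParamEncoding` is hypothetical, and whether screen-scraper encodes parameters exactly like `URLEncoder` is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical post-processing an HttpParameterEncodeValue handler might apply.
// The input is the already-encoded value; the return value is what gets sent.
final class ParamEncoding {
    private ParamEncoding() {}

    // Rewrites form-style spaces ("+") as "%20"; literal pluses are already "%2B".
    static String spacesAsPercent20(String alreadyEncoded) {
        return alreadyEncoded.replace("+", "%20");
    }

    // Encodes a raw value the way URLEncoder does, then applies the adjustment.
    static String encodeValue(String raw) {
        return spacesAsPercent20(URLEncoder.encode(raw, StandardCharsets.UTF_8));
    }
}
```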
Change Log
Version |
Description |
6.0.55a |
Introduced for pro and enterprise editions. |
Examples
How to use an EventFireTime with the session.addEventCallback method.
session.addEventCallback(StringOperationEventFireTime.HttpParameterEncodeKey, handler);
EventHandler
EventHandler EventHandler ( ) (professional and enterprise editions only)
Description
Creates an EventHandler callback object, which will be called when the event triggers.
Change Log
Version |
Description |
6.0.55a |
Introduced for pro and enterprise editions. |
Examples
Define a handler for the session.addEventCallback to use.
// Create an EventHandler object which will be called when the event triggers
EventHandler handler = new EventHandler()
{
/**
* Returns the name of the handler. This method doesn't need to be implemented
* but helps with debugging (on error executing the callback it will output this)
*/
public String getHandlerName()
{
return "A test event handler";
}
/**
* Processes the event, and potentially returns a useful value modifying something
* in the internal code
*
* @param fireTime The fire time of the event. This helps when using the same handler
* for multiple event times, to determine which was called
* @param data The actual data from the event. Based on the event time this
* will be a different type. It could be SessionEventData, ScrapeableFileEventData,
* ScriptEventData, StringEventData, etc... It will match the fire time class name
*
* @return A value indicating how to proceed (or sometimes the value is ignored)
*/
public Object handleEvent(EventFireTime fireTime, AbstractEventData data)
{
// While you can specifically grab any data from the data object,
// if this is a method that has a return value that matters,
// it's best to get it as the last return value, so that multiple
// events can be chained together. The input data object
// will always have the original values for all the other getters
Object returnValue = data.getLastReturnValue();
// Do stuff...
// The EventFireTime values describe in the documentation what the return
// value will do, or says nothing about it if the value is ignored
// If you don't intend to modify the return, always return data.getLastReturnValue();
return returnValue;
}
};
getHandlerName
String getHandlerName ( )
Description
Returns the name of the handler. This method doesn't need to be implemented but helps with debugging.
Parameters
This method does not receive any parameters.
Return Values
Returns the name of the handler, as a string.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
// Create an EventHandler object which will be called when the event triggers
EventHandler handler = new EventHandler()
{
/**
* Returns the name of the handler. This method doesn't need to be implemented
* but helps with debugging (on error executing the callback it will output this)
*/
public String getHandlerName()
{
return "A test event handler";
}
public Object handleEvent(EventFireTime fireTime, AbstractEventData data)
{
// do something
return data.getLastReturnValue();
}
};
handleEvent
Object handleEvent ( EventFireTime fireTime, AbstractEventData data )
Description
Processes the event, and potentially returns a useful value modifying something in the internal code as defined by the EventFireTime used to launch this event.
Parameters
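- fireTime The fire time that triggered the event. This is useful when the same handler is registered for several fire times.
- data The event data. Its concrete class depends on the fire time (ExtractorPatternEventData, SessionEventData, ScrapeableFileEventData, ScriptEventData, or StringEventData).
Return Values
Returns a value whose effect depends on the EventFireTime that fired the event, as described for each fire time. For some events the value is ignored.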
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
EventHandler handler = new EventHandler()
{
public String getHandlerName()
{
return "A test event handler";
}
/**
* Processes the event, and potentially returns a useful value modifying something
* in the internal code
*
* @param fireTime The fire time of the event. This helps when using the same handler
* for multiple event times, to determine which was called
* @param data The actual data from the event. Based on the event time this
* will be a different type. It could be SessionEventData, ScrapeableFileEventData,
* ScriptEventData, StringEventData, etc... It will match the fire time class name
*
* @return A value indicating how to proceed (or sometimes the value is ignored)
*/
public Object handleEvent(EventFireTime fireTime, AbstractEventData data)
{
// While you can specifically grab any data from the data object,
// if this is a method that has a return value that matters,
// it's best to get it as the last return value, so that multiple
// events can be chained together. The input data object
// will always have the original values for all the other getters
Object returnValue = data.getLastReturnValue();
// Do stuff...
// The EventFireTime values describe in the documentation what the return
// value will do, or says nothing about it if the value is ignored
// If you don't intend to modify the return, always return data.getLastReturnValue();
return returnValue;
}
};
AbstractEventData
The AbstractEventData class is an abstract class that provides access to various data values within screen-scraper.
AbstractEventData is extended by the following classes, and those classes should be used in place of AbstractEventData: ExtractorPatternEventData, ScrapeableFileEventData, ScriptEventData, SessionEventData, and StringEventData.
getLastReturnValue
Object getLastReturnValue ( )
Description
Returns the LastReturnValue for the object. This is the value previously returned by another callback. It is null if no callbacks have fired yet for this event, and null is also the default return value for the event.
Parameters
This method does not receive any parameters.
Return Values
Returns the LastReturnValue for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the Last Return Value
// In practice AbstractEventData is just the abstract class.
// You must actually use one of the classes that extend it.
public Object handleEvent(EventFireTime fireTime, AbstractEventData data) {
// While you can specifically grab any data from the data object,
// if this is a method that has a return value that matters,
// it's best to get it as the last return value, so that multiple
// events can be chained together. The input data object
// will always have the original values for all the other getters
Object returnValue = data.getLastReturnValue();
// do something
// The EventFireTime values describe in the documentation what the return
// value will do, or says nothing about it if the value is ignored
// If you don't intend to modify the return, always return data.getLastReturnValue();
return data.getLastReturnValue();
}
setLastReturnValue
void setLastReturnValue ( Object lastReturnValue )
Description
Sets the LastReturnValue for the object.
Parameters
- lastReturnValue The new value for the LastReturnValue
Return Values
Returns void.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
// In practice AbstractEventData is just the abstract class.
// You must actually use one of the classes that extend it.
public Object handleEvent(EventFireTime fireTime, AbstractEventData data) {
Object foo = null; // replace null with the value to pass on
data.setLastReturnValue(foo);
// do something
// The EventFireTime values describe in the documentation what the return
// value will do, or says nothing about it if the value is ignored
// If you don't intend to modify the return, always return data.getLastReturnValue();
return data.getLastReturnValue();
}
ExtractorPatternEventData
ExtractorPatternEventData extends AbstractEventData
This contains the data for various extractor pattern operations
Inherits the getLastReturnValue and setLastReturnValue methods from AbstractEventData.
extractorPatternTimedOut
boolean extractorPatternEventData.extractorPatternTimedOut ( )
Description
Returns the status of the extractor pattern timeout. Returns true if and only if the extractor pattern was applied and timed out while doing so. Otherwise it will return false.
Parameters
This method does not receive any parameters.
Return Values
Returns a boolean value representing the status of the extractor pattern timeout.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Determine if an extractor pattern has timed out.
public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
if (data.extractorPatternTimedOut()) {
// do something
}
return data.getLastReturnValue();
}
getDataRecord
DataRecord extractorPatternEventData.getDataRecord ( )
Description
Returns the DataRecord value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the DataRecord value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current DataRecord.
public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
DataRecord dr = data.getDataRecord();
// do something
return data.getLastReturnValue();
}
getDataSet
DataSet extractorPatternEventData.getDataSet ( )
Description
Returns the DataSet value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the DataSet value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current DataSet.
public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
DataSet ds = data.getDataSet();
// do something
return data.getLastReturnValue();
}
getExtractorPattern
ExtractorPattern extractorPatternEventData.getExtractorPattern ( )
Description
Returns the ExtractorPattern value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the ExtractorPattern value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current ExtractorPattern.
public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
ExtractorPattern pattern = data.getExtractorPattern();
// do something
return data.getLastReturnValue();
}
getScrapeableFile
ScrapeableFile extractorPatternEventData.getScrapeableFile ( )
Description
Returns the ScrapeableFile value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the ScrapeableFile value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current ScrapeableFile.
public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
ScrapeableFile sf = data.getScrapeableFile();
// do something
return data.getLastReturnValue();
}
getSession
ScrapingSession extractorPatternEventData.getSession ( )
Description
Returns the Session value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the Session value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current Session.
public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
ScrapingSession _session = data.getSession();
// do something
return data.getLastReturnValue();
}
ScrapeableFileEventData
ScrapeableFileEventData extends AbstractEventData
This contains the data for various scrapeable file operations
Inherits the getLastReturnValue and setLastReturnValue methods from AbstractEventData.
getHttpResponseData
String scrapeableFileEventData.getHttpResponseData ( )
Description
Returns the HttpResponseData for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the HttpResponseData for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the HttpResponseData
public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
String responseData = data.getHttpResponseData();
// do something
return data.getLastReturnValue();
}
getRedirectRequestBuilder
ScrapingRequest.Builder scrapeableFileEventData.getRedirectRequestBuilder ( )
Description
Returns the RedirectRequestBuilder for the object. Use this to add headers, etc. to the redirect request. It can be null depending on the HTTP client being used and whether that client supports modifying the redirect request.
Parameters
This method does not receive any parameters.
Return Values
Returns the RedirectRequestBuilder for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the Request Builder in order to modify it.
public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
ScrapingRequest.Builder builder = data.getRedirectRequestBuilder();
// do something
return data.getLastReturnValue();
}
getScrapeableFile
ScrapeableFile scrapeableFileEventData.getScrapeableFile ( )
Description
Returns the ScrapeableFile value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the ScrapeableFile value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current ScrapeableFile.
public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
ScrapeableFile sf = data.getScrapeableFile();
// do something
return data.getLastReturnValue();
}
getSession
ScrapingSession scrapeableFileEventData.getSession ( )
Description
Returns the Session value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the Session value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current Session.
public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
ScrapingSession _session = data.getSession();
// do something
return data.getLastReturnValue();
}
ScriptEventData
ScriptEventData extends AbstractEventData
This contains the data for various script operations
Inherits the getLastReturnValue and setLastReturnValue methods from AbstractEventData.
getDataRecord
DataRecord scriptEventData.getDataRecord ( )
Description
Returns the DataRecord value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the DataRecord value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current DataRecord.
public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
DataRecord dr = data.getDataRecord();
// do something
return data.getLastReturnValue();
}
getDataSet
DataSet scriptEventData.getDataSet ( )
Description
Returns the DataSet value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the DataSet value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current DataSet.
public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
DataSet ds = data.getDataSet();
// do something
return data.getLastReturnValue();
}
getScrapeableFile
ScrapeableFile scriptEventData.getScrapeableFile ( )
Description
Returns the ScrapeableFile value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the ScrapeableFile value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current ScrapeableFile.
public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
ScrapeableFile sf = data.getScrapeableFile();
// do something
return data.getLastReturnValue();
}
getScriptException
java.lang.Exception scriptEventData.getScriptException ( )
Description
Returns the ScriptException for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the ScriptException for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the script exception
public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
java.lang.Exception e = data.getScriptException();
// do something
return data.getLastReturnValue();
}
getScriptName
String scriptEventData.getScriptName ( )
Description
Returns the ScriptName value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the ScriptName value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the script name
public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
String name = data.getScriptName();
// do something
return data.getLastReturnValue();
}
getSession
ScrapingSession scriptEventData.getSession ( )
Description
Returns the Session value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the Session value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current Session.
public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
ScrapingSession _session = data.getSession();
// do something
return data.getLastReturnValue();
}
SessionEventData
SessionEventData extends AbstractEventData
This contains the data for various session operations
Inherits the getLastReturnValue and setLastReturnValue methods from AbstractEventData.
getIncrementRecordsAmount
Object sessionEventData.getIncrementRecordsAmount ( )
Description
Returns the IncrementRecordsAmount value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the IncrementRecordsAmount value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current increment records amount.
public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
Object recordsAmt = data.getIncrementRecordsAmount();
// do something
return data.getLastReturnValue();
}
getSession
ScrapingSession sessionEventData.getSession ( )
Description
Returns the Session value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the Session value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the current Session.
public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
ScrapingSession _session = data.getSession();
// do something
return data.getLastReturnValue();
}
getVariableName
String sessionEventData.getVariableName ( )
Description
Returns the VariableName value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the VariableName value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the variable name.
public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
String name = data.getVariableName();
// do something
return data.getLastReturnValue();
}
getVariableValue
Object sessionEventData.getVariableValue ( )
Description
Returns the VariableValue value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the VariableValue value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the variable value.
public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
Object value = data.getVariableValue();
// do something
return data.getLastReturnValue();
}
StringEventData
StringEventData extends AbstractEventData
This contains the data for various string operations
Inherits the getLastReturnValue and setLastReturnValue methods from AbstractEventData.
getInput
String stringEventData.getInput ( )
Description
Returns the Input value for the object.
Parameters
This method does not receive any parameters.
Return Values
Returns the Input value for the object.
Change Log
Version |
Description |
6.0.55a |
Available for all editions. |
Examples
Get the input string.
public Object handleEvent(EventFireTime fireTime, StringEventData data) {
String str = data.getInput();
// do something
return data.getLastReturnValue();
}
addToVariable
void session.addToVariable ( String variable, int value ) (professional and enterprise editions only)
Description
Add to the value of a session variable.
Parameters
- variable Key of the variable, as a string.
- value Value to be added to the variable, as an integer.
Return Values
Returns void. If the variable doesn't exist, or is not a string or integer, a message will be added to the log. If it cannot add to the variable for any other reason it will write an error to the log.
Change Log
Version |
Description |
4.5 |
Available for professional and enterprise editions. |
Examples
Increment Variable
// Increments the session variable "PAGE_NUM" by one.
session.addToVariable( "PAGE_NUM", 1 );
See Also
- getVariable() [session] - Returns the value of a session variable
- getv() [session] - Returns the value of a session variable (alias of getVariable)
- setVariable() [session] - Sets the value of a session variable
- setv() [session] - Sets the value of a session variable (alias of setVariable)
breakpoint
void session.breakpoint ( ) (professional and enterprise editions only)
Description
Pauses the scrape and displays the breakpoint window. If the scrape is running in server mode, logVariables is called instead of breakpoint so that the scrape does not pause.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for professional and enterprise editions. |
Examples
Open BreakPoint Window
// Causes the breakpoint window to be displayed.
session.breakpoint();
clearAllSessionVariables
void session.clearAllSessionVariables ( )
Description
Remove all session variables.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
Examples
Clear Session Variables
// Clear all session variables.
session.clearAllSessionVariables();
See Also
- setVariable() [session] - Sets the value of a session variable
- setv() [session] - Sets the value of a session variable (alias of setVariable)
clearCookies
void session.clearCookies ( ) (enterprise edition only)
Description
Clear stored cookies.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Clear Cookies
// Clear all current cookies.
session.clearCookies();
See Also
- getCookies() [session] - Gets all the cookies currently stored by this scraping session
- setCookie() [session] - Sets the value of a cookie
clearVariables
void session.clearVariables ( Map variables ) (professional and enterprise editions only)
void session.clearVariables ( Collection variables ) (professional and enterprise editions only)
Description
Clears the value of every session variable whose name matches a key in the Map (or an element of the Collection). A key of DATARECORD is ignored.
This method accepts a Map or Collection rather than a List or Set so that it works more easily with the setSessionVariables method.
Parameters
- Map The map to use when clearing the session variables.
- Collection The collection to use when clearing the session variables.
Return Value
This method returns void.
Change Log
Version |
Description |
5.5.29a |
Available in all editions. |
5.5.43a |
Changed from session.removeSessionVariablesInMap to session.clearVariables. |
Examples
Clear the ASPX values for a .NET site after scraping the next page
DataRecord aspx = scrapeableFile.getASPXValues();
session.setSessionVariables(aspx);
session.scrapeFile("Next Results");
session.clearVariables(aspx);
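As a rough model of clearVariables, the plain-Java sketch below removes every key named in a Map or Collection from a hypothetical `Map` of session variables, skipping DATARECORD as the description states. Whether screen-scraper removes such variables outright or clears their values is not stated here; the sketch simply removes them.

```java
import java.util.*;

// Hypothetical model of session.clearVariables; NOT screen-scraper's code.
class ClearVariablesSketch {
    // Removes each key in 'keys' from 'sessionVars', ignoring "DATARECORD".
    static void clearVariables(Map<String, Object> sessionVars, Collection<?> keys) {
        for (Object key : keys) {
            if (!"DATARECORD".equals(key)) {
                sessionVars.remove(key);
            }
        }
    }

    // Map overload: only the map's keys matter, not its values.
    static void clearVariables(Map<String, Object> sessionVars, Map<?, ?> keysFrom) {
        clearVariables(sessionVars, keysFrom.keySet());
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("__VIEWSTATE", "abc");
        vars.put("DATARECORD", "record");
        vars.put("SEARCH_TERM", "DVDs");
        Map<String, Object> aspx = new HashMap<>();
        aspx.put("__VIEWSTATE", "abc");
        aspx.put("DATARECORD", "ignored");
        clearVariables(vars, aspx);
        System.out.println(new TreeSet<>(vars.keySet())); // prints [DATARECORD, SEARCH_TERM]
    }
}
```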
convertHTMLEntitiesInVariable
void session.convertHTMLEntitiesInVariable ( String variable )
Description
Decodes HTML entities in a session variable.
Parameters
- variable Session variable whose HTML Entities will be converted to characters.
Return Values
Returns void.
Change Log
Version |
Description |
5.0 |
Added for all editions. |
Examples
Decode HTML Entities In Variable
// Set variable
session.setv( "LOCATION", "Angela&#39;s Room" );
// Convert HTML entities
session.convertHTMLEntitiesInVariable( "LOCATION" );
// Write to Log
session.log( session.getv( "LOCATION" ) ); //logs Angela's Room
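To show what converting HTML entities means, here is a small plain-Java decoder that handles decimal references (`&#39;`), hexadecimal references (`&#x27;`), and a handful of named entities. It is an illustrative subset written for this page, not the routine screen-scraper uses, which may cover many more entities.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecodeSketch {
    // Illustrative subset of named entities only.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches &name; or decimal &#39; or hexadecimal &#x27;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (!body.startsWith("#")) {
                // Unknown named entities are left untouched
                replacement = NAMED.getOrDefault(body, m.group());
            } else {
                try {
                    boolean hex = body.startsWith("#x") || body.startsWith("#X");
                    int codePoint = hex ? Integer.parseInt(body.substring(2), 16)
                                        : Integer.parseInt(body.substring(1));
                    replacement = new String(Character.toChars(codePoint));
                } catch (IllegalArgumentException e) {
                    // Out-of-range numeric reference: leave it as-is
                    replacement = m.group();
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Angela&#39;s Room")); // prints Angela's Room
    }
}
```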
See Also
downloadFile
boolean session.downloadFile ( String url, String fileName ) (professional and enterprise editions only)
boolean session.downloadFile ( String url, String fileName, int maxNumAttempts ) (professional and enterprise editions only)
boolean session.downloadFile ( String url, String fileName, int maxNumAttempts, boolean doLazy ) (enterprise edition only)
Description
Downloads the file to the local file system.
Parameters
- url URL reference to the desired file, as a string.
- fileName Local file path where the file should be saved, as a string.
- maxNumAttempts (optional) Maximum number of attempts to make to download the file, as an integer. Defaults to 3.
- doLazy (optional) Whether the file should be downloaded in a separate thread, as a boolean. Defaults to false.
Return Values
Returns true if the file was downloaded successfully; otherwise returns false.
Change Log
Version |
Description |
4.5 |
Available for professional and enterprise editions. Lazy scrape only available for enterprise edition. |
If the file can only be retrieved by sending POST data, use saveFileOnRequest with a scrapeable file instead.
Using this method in a script takes the place of requesting the target URL as a scrapeable file.
Examples
Download File in a Separate Thread
// Downloads the image pointed to by the URL to the local C: drive.
// A maximum number of 5 attempts will be made to download the file,
// and the file will be downloaded in its own thread.
session.downloadFile( "http://www.foo.com/imgs/puppy_image.gif", "C:/images/puppy.gif", 5, true );
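The maxNumAttempts parameter caps how many times the request is tried before downloadFile gives up and returns false. The generic retry loop below sketches that behaviour in plain Java; the `Callable` stands in for a single download attempt and is hypothetical, since screen-scraper's internals are not shown here.

```java
import java.util.concurrent.Callable;

class RetrySketch {
    // Calls 'attempt' up to maxNumAttempts times; returns true on the first
    // success, false if every attempt throws or returns false.
    static boolean tryUpTo(int maxNumAttempts, Callable<Boolean> attempt) {
        for (int i = 1; i <= maxNumAttempts; i++) {
            try {
                if (Boolean.TRUE.equals(attempt.call())) {
                    return true;
                }
            } catch (Exception e) {
                System.out.println("Attempt " + i + " failed: " + e.getMessage());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical download that fails twice, then succeeds.
        boolean ok = tryUpTo(3, () -> ++calls[0] >= 3);
        System.out.println(ok + " after " + calls[0] + " attempts"); // prints true after 3 attempts
    }
}
```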
executeScript
void session.executeScript ( String scriptName ) (professional and enterprise editions only)
Description
Manually starts the execution of a script.
Parameters
- scriptName Name of the script to execute, as a string. The script has to be on the same instance of screen-scraper as the scraping session.
Return Values
Returns void. If the script doesn't exist a message will be written to the log. If the called script has an error in it a warning will be written to the log.
Change Log
Version |
Description |
5.0 |
Scripts called using this method are now exported with the scraping session. |
4.5 |
Available for professional and enterprise editions. |
Examples
Execute Script
// Executes the script "My Script".
session.executeScript( "My Script" );
executeScriptWithContext
void session.executeScriptWithContext ( String scriptName ) (professional and enterprise editions only)
Description
Executes the named script, but preserves the current context (dataRecord, scrapeableFile, etc.).
Parameters
- scriptName The name of the script to execute.
Return Value
This method returns void.
Change Log
Version |
Description |
5.5.29a |
Available in professional and enterprise editions. |
Examples
Execute a script, but preserve the context
// Execute the 'Do more stuff' script, but give it access to the scrapeableFile this script has access to.
session.executeScriptWithContext("Do more stuff");
getCharacterSet
String session.getCharacterSet ( )
Description
Get the general character set being used in page response renderings.
Parameters
This method does not receive any parameters.
Return Values
Returns the character set applied to the scraping session's files, as a string. If a character set has not been specified, it defaults to the character set specified in the settings dialog box.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
If you are having trouble with characters displaying incorrectly, we encourage you to read about how to go about finding a solution using one of our FAQs.
Examples
Get Character Set
// Get the character set of the scraping session
charSetValue = session.getCharacterSet();
See Also
- setCharacterSet() [session] - Set the character set used to render all responses.
- getCharacterSet() [scrapeableFile] - Get the character set used to render responses for a specific scrapeable file.
- setCharacterSet() [scrapeableFile] - Set the character set used to render responses for a specific scrapeable file.
getConnectionTimeout
int session.getConnectionTimeout ( )
Description
Retrieve the timeout value for scrapeable files in the session.
Parameters
This method does not receive any parameters.
Return Values
Returns the timeout value in milliseconds, as an integer.
Change Log
Version |
Description |
5.0.1a |
Introduced for all editions. |
Examples
Retrieve Connection Timeout
// set variable to connection timeout
timeout = session.getConnectionTimeout( );
See Also
getCookies
Cookie[] session.getCookies ( )
Description
Get the current cookies.
Parameters
This method does not receive any parameters.
Return Values
Returns an array of the cookies in the session.
Change Log
Version |
Description |
5.0 |
Available for all editions. |
Examples
Add Cookie If Missing
// Get cookies
cookies = session.getCookies();
// Cookie Information
cookieDomain = "mydomain.com";
cookieName = "cookie_test";
cookieValue = "please_accept_for_session";
// Exists Flag
cookieExists = false;
// Loop through cookies
for (i = 0; i < cookies.length; i++) {
cookie = cookies[i];
// Check if this is the cookie
if (cookie.getName().equals(cookieName) && cookie.getValue().equals(cookieValue) && cookie.getDomain().equals(cookieDomain)) {
//if the cookie matches then it exists
cookieExists = true;
// Log search status
session.log( "+++Cookie Exists" );
// Stop searching
break;
}
}
// Add cookie, if it doesn't exist
if ( !cookieExists ) {
session.log( "+++Cookie Does NOT Exist: Setting Cookie" );
session.setCookie( cookieDomain, cookieName, cookieValue);
}
Write Cookies to Log
// Get cookies
cookies = session.getCookies();
// Loop through Cookies
for (i = 0; i < cookies.length; i++) {
cookie = cookies[i];
// Write Cookie information to the Log
session.log( "COOKIE #" + i );
session.log( "Name: " + cookie.getName() );
session.log( "Value: " + cookie.getValue() );
session.log( "Path: " + cookie.getPath() );
session.log( "Domain: " + cookie.getDomain() );
// Only log expiration if it is set
if (cookie.getExpiryDate() != null) {
session.log( "Expiration: " + cookie.getExpiryDate().toString() );
}
}
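The lookup in Add Cookie If Missing can be factored into a helper. This plain-Java sketch uses the JDK's `java.net.HttpCookie` as a stand-in; the Cookie objects returned by getCookies are a different type (they expose getExpiryDate, which HttpCookie lacks), but the sketch only relies on getName, getValue, and getDomain.

```java
import java.net.HttpCookie;
import java.util.List;

class CookieLookupSketch {
    // Returns true if a cookie with the same name, value, and domain is present.
    static boolean containsCookie(List<HttpCookie> cookies, String domain, String name, String value) {
        for (HttpCookie cookie : cookies) {
            if (name.equals(cookie.getName())
                    && value.equals(cookie.getValue())
                    && domain.equals(cookie.getDomain())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        HttpCookie c = new HttpCookie("cookie_test", "please_accept_for_session");
        c.setDomain("mydomain.com");
        System.out.println(containsCookie(List.of(c), "mydomain.com",
                "cookie_test", "please_accept_for_session")); // prints true
    }
}
```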
See Also
- clearCookies() [session] - Clears all the cookies from this scraping session
- setCookie() [session] - Sets the value of a cookie
getDebugMode
boolean session.getDebugMode ( )
Description
Checks whether the scrape is currently set to run in debug mode. This is useful when developing scrapes: enabling debug mode logs a warning message, which makes it easier to notice a scrape that still uses hard-coded development values. Debug mode also logs a warning in the web interface or log each time monitored variables are logged with the logMonitoredValues or webMessage methods.
Parameters
This method takes no parameters.
Return Value
True if debug mode is enabled, false otherwise.
Change Log
Version |
Description |
5.5.29a |
Available in all editions. |
Examples
Set some hardcoded values to use when the scrape is being developed
// Comment out the line below for production
session.setDebugMode(true);
if(session.getDebugMode())
{
session.setVariable("SEARCH_TERM", "DVDs");
session.setVariable("USERNAME", "some user");
session.setVariable("PASSWORD", "the password");
}
getDefaultRetryPolicy
RetryPolicy session.getDefaultRetryPolicy ( ) (professional and enterprise editions only)
Description
Gets the default retry policy to be used by each scrapeable file when one wasn't set for it.
Parameters
This method takes no parameters.
Return Value
The default retry policy, or null if there isn't one.
Change Log
Version |
Description |
5.5.29a |
Available in professional and enterprise editions. |
Examples
Check for a default RetryPolicy
if(session.getDefaultRetryPolicy() == null)
{
session.logWarn("No default retry policy specified");
}
getElapsedRunningTime
long session.getElapsedRunningTime ( ) (professional and enterprise editions only)
Description
Get how long the current session has been running.
Parameters
This method does not receive any parameters.
Return Values
Returns number of milliseconds the scrape has been running, as a long (8-byte integer).
Change Log
Version |
Description |
4.5 |
Available for professional and enterprise editions. |
If you would like to log the running time of the scraping session you should use logElapsedRunningTime.
Examples
Generic Scrape Timeout
// On pagination iterator
// Maximum run time, in milliseconds
timeout = 1000*60*60*24; // 1 day
// Check how long scrape has been running
if (session.getElapsedRunningTime() >= timeout )
{
session.stopScraping();
}
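getElapsedRunningTime returns raw milliseconds. If you want a readable value in your own log messages, the hypothetical plain-Java helper below turns a millisecond count into H:MM:SS; it is not the format logElapsedRunningTime necessarily uses.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

class ElapsedFormatSketch {
    // Formats a millisecond duration as H:MM:SS (hours are not capped at 24).
    static String format(long millis) {
        long hours = TimeUnit.MILLISECONDS.toHours(millis);
        long minutes = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
        long seconds = TimeUnit.MILLISECONDS.toSeconds(millis) % 60;
        // Locale.ROOT keeps the digits ASCII regardless of the default locale
        return String.format(Locale.ROOT, "%d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        long timeout = 1000L * 60 * 60 * 24; // 1 day, as in the example above
        System.out.println(format(timeout));    // prints 24:00:00
        System.out.println(format(3_723_000L)); // prints 1:02:03
    }
}
```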
See Also
getLoggingLevel
int session.getLoggingLevel ( )
Description
Get the logging level of the scrape.
Parameters
This method does not receive any parameters.
Return Values
Returns the logging level, as an integer. Currently there are four levels: 1 = Debug, 2 = Info, 3 = Warn, 4 = Error.
Change Log
Version |
Description |
5.0.1a |
Introduced for all editions. |
Examples
Set Logging Level If Low
// get logging level
logLevel = session.getLoggingLevel();
if (logLevel < Notifiable.LEVEL_WARN )
{
session.setLoggingLevel( Notifiable.LEVEL_WARN );
}
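The check above works because the logging levels are ordered integers. This plain-Java sketch mirrors the four values listed under Return Values with a hypothetical enum; in scripts, the constants on screen-scraper's Notifiable interface should be used instead, as the setLoggingLevel description advises.

```java
class LogLevelSketch {
    // Hypothetical mirror of the documented levels: 1 = Debug ... 4 = Error.
    enum Level {
        DEBUG(1), INFO(2), WARN(3), ERROR(4);

        final int code;

        Level(int code) { this.code = code; }

        static Level fromCode(int code) {
            for (Level l : values()) {
                if (l.code == code) return l;
            }
            throw new IllegalArgumentException("Unknown logging level: " + code);
        }
    }

    // Same rule as the example: raise anything below WARN up to WARN.
    static int atLeastWarn(int current) {
        return Math.max(current, Level.WARN.code);
    }

    public static void main(String[] args) {
        System.out.println(Level.fromCode(atLeastWarn(1))); // prints WARN
        System.out.println(Level.fromCode(atLeastWarn(4))); // prints ERROR
    }
}
```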
See Also
getMaxConcurrentFileDownloads
int session.getMaxConcurrentFileDownloads ( ) (professional and enterprise editions only)
Description
Retrieve the maximum number of concurrent file downloads being allowed.
Parameters
This method does not receive any parameters.
Return Values
Returns the max number of concurrent file downloads allowed, as an integer.
Change Log
Version |
Description |
5.0 |
Added for professional and enterprise editions. |
Examples
Check Max Concurrent File Downloads
// How many concurrent downloads are permitted
maxConcurrentDownloads = session.getMaxConcurrentFileDownloads();
See Also
getMaxHTTPRequests
int session.getMaxHTTPRequests ( ) (professional and enterprise editions only)
Description
Retrieve the number of attempts that scrapeable files should make to get the requested page.
Parameters
This method does not receive any parameters.
Return Values
Returns the number of attempts that will be made, as an integer.
Change Log
Version |
Description |
5.0 |
Available for all editions. |
Examples
Retrieve the Retry Value
// Write retries to log
session.log( "Retries per file: " + session.getMaxHTTPRequests() );
See Also
- setMaxHTTPRequests() [session] - Sets the number of attempts a scrapeable file will make to get the requested page
getMaxScriptsOnStack
int session.getMaxScriptsOnStack ( )
Description
Get the total number of scripts allowed on the stack before the scraping session is forcibly stopped.
Parameters
This method does not receive any parameters.
Return Values
Returns the maximum number of scripts allowed on the stack at one time, as an integer.
Change Log
Version |
Description |
5.0 |
Added for all editions. |
Examples
Check If More Scripts Can Be Run
import java.math.*;
// Get the number of scripts (currently on the stack and the maximum allowed)
BigDecimal numRunningScripts = new BigDecimal(session.getNumScriptsOnStack());
BigDecimal maxAllowedScripts = new BigDecimal(session.getMaxScriptsOnStack());
// Calculate the percentage used (0-100), rounded to two decimal places
BigDecimal percentageUsedBD = numRunningScripts.multiply(new BigDecimal(100)).divide(maxAllowedScripts, 2, RoundingMode.HALF_UP);
double percentageUsed = percentageUsedBD.doubleValue();
if (percentageUsed < 90)
{
session.log(percentageUsed + "% of max scripts used");
}
else
{
session.logWarn("90% max scripts threshold has been reached.");
}
See Also
getName
String session.getName ( )
Description
Get the name of the current scraping session.
Parameters
This method does not receive any parameters.
Return Values
Returns the name of the scraping session, as a string.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
Examples
Write Scraping Session Name to Log
// Outputs the name of the scraping session to the log.
session.log( "Current scraping session: " + session.getName() );
getNumScriptsOnStack
int session.getNumScriptsOnStack ( )
Description
Get the number of scripts currently running.
Parameters
This method does not receive any parameters.
Return Values
Returns number of running scripts, as an integer.
Change Log
Version |
Description |
5.0 |
Added for all editions. |
Examples
Check If More Scripts Can Be Run
import java.math.*;
// Get the number of scripts (currently on the stack and the maximum allowed)
BigDecimal numRunningScripts = new BigDecimal(session.getNumScriptsOnStack());
BigDecimal maxAllowedScripts = new BigDecimal(session.getMaxScriptsOnStack());
// Calculate the percentage used (0-100), rounded to two decimal places
BigDecimal percentageUsedBD = numRunningScripts.multiply(new BigDecimal(100)).divide(maxAllowedScripts, 2, RoundingMode.HALF_UP);
double percentageUsed = percentageUsedBD.doubleValue();
if (percentageUsed < 90)
{
session.log(percentageUsed + "% of max scripts used");
}
else
{
session.logWarn("90% max scripts threshold has been reached.");
}
See Also
getRetainNonTidiedHTML
boolean session.getRetainNonTidiedHTML ( ) (enterprise edition only)
Description
Determine whether or not non-tidied HTML is to be retained for all scrapeable files in this scraping session.
Parameters
This method does not receive any parameters.
Return Values
Returns whether non-tidied HTML is to be retained for all scrapeable files, as a boolean.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Determine if Non-tidied HTML is Being Retained
// Logs whether non-tidied HTML will be retained
// for all scrapeable files in the session.
if (session.getRetainNonTidiedHTML())
{
session.log( "All scrapeable files will retain non-tidied HTML" );
}
else
{
session.log( "Non-tidied HTML will not be retained." );
}
See Also
getScrapeableSessionID
int session.getScrapeableSessionID ( ) (enterprise edition only)
Description
Get the unique identifier for the scraping session.
Parameters
This method does not receive any parameters.
Return Values
Returns unique session id for the scraping session, as an integer.
Change Log
Version |
Description |
5.0 |
Added for enterprise edition. |
Examples
Retrieve Unique ID
// Get Unique ID
int i = session.getScrapeableSessionID();
getStartTime
long session.getStartTime ( )
Description
Retrieve the time at which the scrape started.
Parameters
This method does not receive any parameters.
Return Values
Returns the start time of the scrape in milliseconds, as a long.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
Examples
Get Session Start Time
// Retrieves the start time and places it
// in the variable "start".
start = session.getStartTime();
getTimeZone
TimeZone session.getTimeZone ( )
Description
Gets the current time zone of the scraping session.
Parameters
This method takes no parameters.
Return Value
The time zone this scrape is set to.
Change Log
Version |
Description |
5.5.29a |
Available in all editions. |
Examples
Get the current Time Zone in use
import java.util.TimeZone;
TimeZone currentTimeZone = session.getTimeZone();
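A common use for the returned TimeZone is formatting timestamps, such as the value from getStartTime, in the scrape's own zone. The plain-Java sketch below uses `java.text.SimpleDateFormat`; the date pattern is an arbitrary choice for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TimeZoneFormatSketch {
    // Formats epoch milliseconds as "yyyy-MM-dd HH:mm:ss" in the given time zone.
    static String formatIn(long epochMillis, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // In a script you might pass session.getStartTime() and session.getTimeZone().
        System.out.println(formatIn(0L, TimeZone.getTimeZone("UTC"))); // prints 1970-01-01 00:00:00
    }
}
```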
getVariable
Object session.getVariable ( String identifier )
Description
Retrieve the value of a saved session variable.
Parameters
- identifier The name of the variable whose value is to be retrieved, as a string.
Return Values
Returns the value of the session variable. This will be a string unless you have used setVariable to place something other than a string into a session variable.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
Examples
Retrieve Session Variable
// Places the session variable "CITY_CODE" in the local
// variable "cityCode".
cityCode = session.getVariable( "CITY_CODE" );
See Also
- addToVariable() [session] - Adds an integer to the value of a session variable.
- getv() [session] - Retrieve the value of a saved session variable (alias of getVariable).
- setv() [session] - Set the value of a session variable (alias of setVariable).
- setVariable() [session] - Set the value of a session variable.
getv
Object session.getv ( String identifier )
Description
Retrieve the value of a saved session variable (alias of getVariable).
Parameters
- identifier The name of the variable whose value is to be retrieved, as a string.
Return Values
Returns the value of the session variable. This will be a string unless you have used setVariable to place something other than a string into a session variable.
Change Log
Version |
Description |
4.5 |
Added for all editions. |
Examples
Retrieve Session Variable
// Places the session variable "CITY_CODE" in the local
// variable "cityCode".
cityCode = session.getv( "CITY_CODE" );
See Also
- addToVariable() [session] - Adds an integer to the value of a session variable.
- getVariable() [session] - Retrieve the value of a saved session variable.
- setv() [session] - Set the value of a session variable (alias of setVariable).
- setVariable() [session] - Set the value of a session variable.
isRunningFromCommandLine
boolean session.isRunningFromCommandLine ( )
Description
Returns whether the scrape is currently running from the command line. This is a convenience method for doing something different in a script when running from the command line rather than in another mode.
Parameters
This method does not receive any parameters.
Return Values
Returns true if and only if the scrape is currently running in the command line.
Change Log
Version |
Description |
6.0.37a |
Introduced for all editions. |
Examples
Run Code Only from the Command Line
if (session.isRunningFromCommandLine()) {
// do something only done in the command line
}
isRunningInServer
boolean session.isRunningInServer ( )
Description
Returns whether the scrape is currently running in the server. This is a convenience method for doing something different in a script when running in the server rather than in another mode.
Parameters
This method does not receive any parameters.
Return Values
Returns true if and only if the scrape is currently running in the server.
Change Log
Version |
Description |
6.0.37a |
Introduced for all editions. |
Examples
Run Code Only in the Server
if (session.isRunningInServer()) {
// do something only done in the server
}
isRunningInWorkbench
boolean session.isRunningInWorkbench ( )
Description
Returns whether the scrape is currently running in the workbench. This is a convenience method for doing something different in a script when running in the workbench rather than in another mode.
Parameters
This method does not receive any parameters.
Return Values
Returns true if and only if the scrape is currently running in the workbench.
Change Log
Version |
Description |
6.0.37a |
Introduced for all editions. |
Examples
Run Code Only in the Workbench
if (session.isRunningInWorkbench()) {
// do something only done in workbench
}
loadStateFromString
boolean session.loadStateFromString ( String stateXML ) (professional and enterprise editions only)
Description
Loads the state that would have been previously saved by invoking the session.saveStateToString method.
Parameters
- stateXML A string representing session state.
Change Log
Version |
Description |
5.5.30a |
Available in Professional and Enterprise editions. |
Examples
Load state in from a file
import org.apache.commons.io.FileUtils;
File f = new File( "session_state.xml" );
sessionState = FileUtils.readFileToString( f, session.getCharacterSet() );
session.loadStateFromString( sessionState );
loadVariables
void session.loadVariables ( String fileToReadFrom ) (enterprise edition only)
Description
Load session variables from a file.
Parameters
- fileToReadFrom File path of the file that contains the session variables, as a string.
Return Values
Returns void. If there is a problem retrieving the file contents an I/O error will be written to the log.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
See also: saveVariables.
If you want to create your own file of session variables, put one name=value pair on each line, as in the sample below. Both the key and the value should be URL-encoded.
Examples
Load Session Variables from File
// Reads in variables from the file located at "C:\myvars.txt".
// Note that a forward slash is used instead of a back slash
// as a folder delimiter. If back slashes were used, they
// would need to be doubled so that they're properly escaped
// out for the script interpreter.
session.loadVariables( "C:/myvars.txt" );
Sample Variables File
BIRTHDAY=12%2F25
NAME=Santa
AGE=Unknown
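Because the file is just URL-encoded name=value lines, it can be inspected outside screen-scraper. The plain-Java reader below is a sketch that assumes one pair per line, UTF-8 percent-encoding, and that the first `=` separates key from value (an encoded `=` would appear as `%3D`); blank and malformed lines are skipped.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class VariablesFileReaderSketch {
    // Parses lines such as "BIRTHDAY=12%2F25" into a name -> value map.
    static Map<String, String> parse(String fileContents) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String line : fileContents.split("\\r?\\n")) {
            if (line.isBlank()) continue;
            int eq = line.indexOf('=');
            if (eq < 0) continue; // skip malformed lines
            String key = URLDecoder.decode(line.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(line.substring(eq + 1), StandardCharsets.UTF_8);
            vars.put(key, value);
        }
        return vars;
    }

    public static void main(String[] args) {
        String sample = "BIRTHDAY=12%2F25\nNAME=Santa\nAGE=Unknown\n";
        System.out.println(parse(sample)); // prints {BIRTHDAY=12/25, NAME=Santa, AGE=Unknown}
    }
}
```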
See Also
saveStateToString
String session.saveStateToString ( boolean saveCookies, boolean saveVariables ) (professional and enterprise editions only)
Description
Saves the current state of the scraping session to a string. An example use case for this method would be a scraping session that logs in to a site, extracts some information, and then is stopped, saving its state out to a file. A second scraping session could then be run, loading the state back in from the file, which would keep the session logged in so that other information could be obtained without logging in once again. By default the scraping session will save out information such as the URL to use as a referer. More information can be saved using the boolean flags described below.
Parameters
- saveCookies Whether or not cookies should be saved.
- saveVariables Whether or not session variables should be saved.
Change Log
Version |
Description |
5.5.30a |
Available in Professional and Enterprise editions. |
Examples
Save out state to a file
// Put the current state in a local variable.
sessionState = session.saveStateToString( true, true );
// Write the state out to a file.
sutil.writeValueToFile( sessionState, "session_state.xml", session.getCharacterSet() );
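The save and load examples rely on sutil.writeValueToFile and Apache Commons IO. If neither is at hand, the JDK's `java.nio.file.Files` can do the same round trip. This sketch covers only the file I/O, uses UTF-8 for both directions (the examples use session.getCharacterSet() instead), and a placeholder string stands in for the value saveStateToString returns.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StateFileSketch {
    // Writes the state string to 'file', replacing any existing content.
    static void save(Path file, String state) throws IOException {
        Files.writeString(file, state, StandardCharsets.UTF_8);
    }

    // Reads the state string back so it can be passed to loadStateFromString.
    static String load(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("session_state", ".xml");
        save(file, "<state>placeholder</state>"); // placeholder, not real session state
        System.out.println(load(file)); // prints <state>placeholder</state>
        Files.delete(file);
    }
}
```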
saveVariables
void session.saveVariables ( String fileToSaveTo ) (enterprise edition only)
Description
Saves all current string and integer variables to a file.
Parameters
- fileToSaveTo File path where the file should be saved, as a string.
Return Values
Returns void. If there is a problem writing the file an I/O error will be written to the log.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Save Session Variables to File System
// Saves the current session variables out to C:\myvars.txt.
// Note that a forward slash is used instead of a back slash
// as a folder delimiter. If back slashes were used, they
// would need to be doubled so that they're properly escaped
// out for the script interpreter.
session.saveVariables( "C:/myvars.txt" );
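Going the other direction, the sketch below writes a map in the name=value format described under loadVariables, URL-encoding both sides and, mirroring the description, keeping only String and Integer values. It is plain Java with UTF-8 assumed as the encoding, not the code saveVariables actually runs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class VariablesFileWriterSketch {
    // Serializes variables as one URL-encoded name=value pair per line.
    static String serialize(Map<String, ?> vars) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, ?> e : vars.entrySet()) {
            Object value = e.getValue();
            // Only string and integer values are written
            if (!(value instanceof String || value instanceof Integer)) continue;
            out.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(String.valueOf(value), StandardCharsets.UTF_8))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new LinkedHashMap<>();
        vars.put("BIRTHDAY", "12/25");
        vars.put("PAGE_NUM", 3);
        System.out.print(serialize(vars)); // prints BIRTHDAY=12%2F25 and PAGE_NUM=3 on separate lines
    }
}
```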
See Also
scrapeFile
void session.scrapeFile ( String scrapeableFileIdentifier )
Description
Manually scrape a scrapeable file.
Parameters
- scrapeableFileIdentifier Name of the scrapeable file, as a string.
Return Values
Returns void. If there is a problem accessing the scrapeable file a message will be written to the log.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
Examples
Scrape File Manually
// Causes the scrapeable file "Login" to be requested.
session.scrapeFile( "Login" );
scrapeString
boolean session.scrapeString ( String scrapeableFileName, String content ) (professional and enterprise editions only)
Description
Invokes a scrapeable file using a string of content instead of a web page or local file.
Parameters
- scrapeableFileName The scrapeable file to be invoked.
- content The content to load.
Change Log
Version |
Description |
5.5.13a |
Available in all editions. |
Examples
Invoke a scrapeable file using a string
content = session.getv( "PARTIAL_PAGE_CONTENT" );
session.scrapeString( "My Scrapeable File", content );
sendDataToClient
void session.sendDataToClient ( String key, Object value ) (enterprise edition only)
Description
Sends data to the external script that initiated the scrape. This isn't currently supported by all drivers (e.g., remote scraping session); check the documentation for the language of the external script for more information.
Parameters
- key Name of the information being sent, as a string.
- value Data to be processed by external script, supported types are Strings, Integers, DataRecords, and DataSets.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
Examples
Send dataRecord to Client
// Causes the current DataRecord object to be sent to the client
// for processing.
session.sendDataToClient( "MyDataRecord", dataRecord );
setCharacterSet
void session.setCharacterSet ( String characterSet )
Description
Set the general character set used in page response renderings. This can be particularly helpful when the pages render characters incorrectly.
Parameters
- characterSet Java recognized character set, as a string. Java provides a list of supported character sets in its documentation.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
This method must be invoked before the session starts.
If you are having trouble with characters displaying incorrectly, we encourage you to read about how to go about finding a solution using one of our FAQs.
Examples
Set Character Set of All Scrapeable Files
// In script called "Before scraping session begins"
// Sets the character set applied to the responses
// of all scrapeable files in the session.
session.setCharacterSet( "ISO-8859-1" );
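To see why the character set matters, the plain-Java sketch below decodes the same bytes two ways. The UTF-8 bytes for an e-acute (0xC3 0xA9) read as ISO-8859-1 become two characters, an A-tilde followed by a copyright sign, which is the kind of garbling setCharacterSet is meant to fix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSketch {
    // Decodes raw response bytes with the given character set.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, encoded as a UTF-8 server would send it
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Correct: the e-acute survives
        System.out.println(decode(body, StandardCharsets.UTF_8));
        // Garbled: the e-acute becomes an A-tilde followed by a copyright sign
        System.out.println(decode(body, StandardCharsets.ISO_8859_1));
    }
}
```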
See Also
- getCharacterSet() [session] - Gets the character set used to render all responses.
- getCharacterSet() [scrapeableFile] - Get the character set used to render responses for a specific scrapeable file.
- setCharacterSet() [scrapeableFile] - Set the character set used to render responses for a specific scrapeable file.
setConnectionTimeout
void session.setConnectionTimeout ( int timeout )
Description
Set the timeout value for scrapeable files in the session.
Parameters
- timeout The length of the timeout in seconds, as an integer.
Return Values
Returns void.
Change Log
Version |
Description |
5.0.1a |
Introduced for all editions. |
Examples
Set Connection Timeout
// set connection timeout to 15 seconds
session.setConnectionTimeout( 15 );
See Also
setCookie
void session.setCookie ( String domain, String key, String value ) (professional and enterprise editions only)
Description
Manually set a cookie in the current session state.
Parameters
- domain The domain to which the cookie pertains, as a string.
- key The name of the cookie, as a string.
- value The value of the cookie, as a string.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for professional and enterprise editions. |
This method should rarely be needed, as screen-scraper manages cookies automatically. It may be necessary when cookies are set via JavaScript.
Examples
Manually Set Cookie
// Sets a cookie associated with "mydomain.com", using the
// key "user" and the value "John Smith".
session.setCookie( "mydomain.com", "user", "John Smith" );
See Also
- clearCookies() [session] - Clear all cookies from this scraping session
- getCookies() [session] - Gets all the cookies currently stored by this scraping session
setDebugMode
void session.setDebugMode ( boolean debugMode )
Description
Sets the debug state for the scrape. Enabling debug mode simply outputs a warning periodically while running, to help prevent running a production scrape in debug mode.
Parameters
- debugMode True to enable debug mode, false to disable it.
Return Value
This method returns void.
Change Log
Version |
Description |
5.5.29a |
Available in all editions. |
Examples
Set some hardcoded values to use when the scrape is being developed
// Comment out the line below for production
session.setDebugMode(true);
if(session.getDebugMode())
{
session.setVariable("SEARCH_TERM", "DVDs");
session.setVariable("USERNAME", "some user");
session.setVariable("PASSWORD", "the password");
}
setDefaultRetryPolicy
void session.setDefaultRetryPolicy ( RetryPolicy retryPolicy ) (professional and enterprise editions only)
Description
Sets a retry policy that will affect all files in the scrape. This policy will be used by all scrapeable files that do not have a retry policy set for them. If a retry policy was manually set for them, this one will not be used.
Parameters
- retryPolicy The retry policy to use by default, if no other retry policy is set.
Return Value
This method returns void.
Change Log
Version |
Description |
5.5.29a |
Available in professional and enterprise editions. |
Examples
Create a default RetryPolicy
import com.screenscraper.util.retry.RetryPolicyFactory;
// Use a retry policy that will rotate the proxy if there was an error on request
session.setDefaultRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Get new proxy"));
setKeyStoreFilePath
void session.setKeyStoreFilePath ( String filePath ) (professional and enterprise editions only)
Description
Sets the path to the keystore file. Some websites use a type of authentication that requires a keystore file. See our blog entry on Using Client Certificates for more detail. Calling this method is equivalent to setting the corresponding value under the "Advanced" tab for the scraping session in the workbench.
Parameters
- filePath The path to the keystore file.
Change Log
Version |
Description |
5.5.10a |
Available in all editions. |
Examples
Set the path to the keystore file
// Set the path.
session.setKeyStoreFilePath( "~/key_files/my_key.crt" );
// Output the current path.
session.log( "Keystore file path is: " + session.getKeyStoreFilePath() );
setKeyStorePassword
void session.setKeyStorePassword ( String password ) (professional and enterprise editions only)
Description
Sets the password for the keystore file. Some websites use a type of authentication that requires a keystore file. See our blog entry on Using Client Certificates for more detail. Calling this method is equivalent to setting the corresponding value under the "Advanced" tab for the scraping session in the workbench.
Parameters
- password The password for the keystore file.
Change Log
Version |
Description |
5.5.10a |
Available in all editions. |
Examples
Set the password for the keystore file
// Set the password.
session.setKeyStorePassword( "My_password" );
// Output the current password.
session.log( "Keystore password is: " + session.getKeyStorePassword() );
setLoggingLevel
void session.setLoggingLevel ( int loggingLevel )
Description
Set the logging level of the scrape.
Parameters
- loggingLevel Level of logging that should be used, as an integer. Use the constants defined in the Notifiable interface (for example, Notifiable.LEVEL_WARN) rather than raw integers, in case the level values ever change.
Return Values
Returns void.
Change Log
Version |
Description |
5.0.1a |
Introduced for all editions. |
Examples
Set Logging Level
// get logging level
logLevel = session.getLoggingLevel();
if (logLevel < Notifiable.LEVEL_WARN )
{
session.setLoggingLevel( Notifiable.LEVEL_WARN );
}
setMaxConcurrentFileDownloads
void session.setMaxConcurrentFileDownloads ( int maxConcurrentFileDownloads ) (professional and enterprise editions only)
Description
Set the maximum number of concurrent file downloads to allow.
Parameters
- maxConcurrentFileDownloads The maximum number of downloads to allow, as an integer.
Return Values
Returns void.
Change Log
Version |
Description |
5.0 |
Added for professional and enterprise editions. |
Examples
Set Max for Concurrent File Downloads
// Limit the number of concurrent file downloads to 10
session.setMaxConcurrentFileDownloads( 10 );
setMaxHTTPRequests
void session.setMaxHTTPRequests ( int maxAttempts ) (professional and enterprise editions only)
Description
Set the number of attempts that scrapeable files should make to get the requested page.
Parameters
- maxAttempts The number of attempts that will be made, as an integer.
Return Values
Returns void.
Change Log
Version |
Description |
5.0 |
Available for all editions. |
Examples
Set the Retry Value
// Set retries for files
session.setMaxHTTPRequests( 3 );
See Also
- getMaxHTTPRequests() [session] - Returns the maximum number of attempts a scrapeable file will make to retrieve the file
setMaxScriptsOnStack
void session.setMaxScriptsOnStack ( int maxScriptsOnStack ) (enterprise edition only)
Description
Set the maximum number of scripts that can be running concurrently. The default value for maxScriptsOnStack is 50.
Parameters
- maxScriptsOnStack Number of scripts to be allowed to run concurrently, as an integer.
Return Values
Returns void.
Change Log
Version |
Description |
5.0 |
Added for enterprise edition. |
Before you raise the number of scripts that can be on the stack, make sure that your scrape is not using more resources than it should. In particular, check whether you are using recursion where iteration would work instead. This is discussed in more detail on our blog and in the Tips, Tricks, and Samples section of this site.
Examples
Allocate More Resources to Scrape
// Allow for 100 scripts (instead of 50)
session.setMaxScriptsOnStack(100);
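To see why recursion fills the stack, here is a minimal Java sketch (not screen-scraper code) that counts how many simulated scripts are nested at once. It assumes a "next page" pattern in which, in the recursive style, the script for page N scrapes page N+1 before it returns. The class and its methods are hypothetical.

```java
public class ScriptDepthSketch {
    int depth = 0;        // simulated scripts currently on the stack
    int maxDepth = 0;     // deepest nesting observed
    int pagesScraped = 0;

    private void enter() {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    private void exit() {
        depth--;
    }

    // Recursive pattern: each page's script scrapes the next page before returning,
    // so nesting grows with the number of pages.
    void scrapeRecursive(int page, int lastPage) {
        enter();
        pagesScraped++;
        if (page < lastPage) {
            scrapeRecursive(page + 1, lastPage);
        }
        exit();
    }

    // Iterative pattern: one loop drives every page, and each script
    // returns before the next one starts, so nesting stays at 1.
    void scrapeIterative(int lastPage) {
        for (int page = 1; page <= lastPage; page++) {
            enter();
            pagesScraped++;
            exit();
        }
    }

    public static void main(String[] args) {
        ScriptDepthSketch rec = new ScriptDepthSketch();
        rec.scrapeRecursive(1, 200);
        ScriptDepthSketch it = new ScriptDepthSketch();
        it.scrapeIterative(200);
        System.out.println("recursive: pages=" + rec.pagesScraped + ", max depth=" + rec.maxDepth); // 200, 200
        System.out.println("iterative: pages=" + it.pagesScraped + ", max depth=" + it.maxDepth);   // 200, 1
    }
}
```

Both styles scrape the same pages, but only the recursive one needs a stack limit that grows with the page count.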
setRandomizeUserAgent
void session.setRandomizeUserAgent ( boolean randomizeUserAgent ) (professional and enterprise editions only)
Description
Causes the "User-Agent" header sent by screen-scraper to be randomized. The user agent strings from which screen-scraper will select are found in the "resource\conf\user_agents.txt" file.
Parameters
- randomizeUserAgent true or false
Change Log
Version |
Description |
5.5.34a |
Available in Professional and Enterprise editions. |
Examples
Randomize the user-agent header
session.setRandomizeUserAgent( true );
// You can also access the current value like so:
session.log( "Randomize user agent: " + session.getRandomizeUserAgent() );
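The selection described above can be sketched in plain Java. This is a minimal sketch that assumes user_agents.txt holds one user agent string per line and that blank lines are skipped. It does not read the real file, and it does not claim how often screen-scraper picks a new value. The class is hypothetical.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

public class UserAgentPicker {
    final List<String> agents;
    private final Random random;

    // Assumes one user agent per line; surrounding whitespace and blank lines are dropped.
    UserAgentPicker(String fileContents, Random random) {
        this.agents = fileContents.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
        this.random = random;
    }

    // Picks one of the loaded user agents uniformly at random.
    // Throws IllegalArgumentException if no user agents were loaded.
    String next() {
        return agents.get(random.nextInt(agents.size()));
    }

    public static void main(String[] args) {
        String contents = "Opera/9.64 (Windows NT 5.1; U; en) Presto/2.1.1\n\nMozilla/5.0 (X11; Linux x86_64)\n";
        UserAgentPicker picker = new UserAgentPicker(contents, new Random(42));
        for (int i = 0; i < 3; i++) {
            System.out.println(picker.next());
        }
    }
}
```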
setRetainNonTidiedHTML
void session.setRetainNonTidiedHTML ( boolean retainNonTidiedHTML ) (enterprise edition only)
Description
Set whether or not non-tidied HTML is to be retained for all scrapeable files.
Parameters
- retainNonTidiedHTML Whether the non-tidied HTML should be retained, as a boolean. The default is false.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for enterprise edition. |
If you want to use getNonTidiedHTML after a file has been scraped, this method must be called before that file is scraped.
Examples
Retain Non-tidied HTML
// Tell screen-scraper to retain non-tidied HTML for all
// scrapeable files.
session.setRetainNonTidiedHTML( true );
setSessionVariables
void session.setSessionVariables ( Map variables ) (professional and enterprise editions only)
void session.setSessionVariables ( Map variables, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
Description
Sets the value of each session variable whose name matches a key in the Map to the corresponding value in the Map. A key of DATARECORD is ignored.
Parameters
- variables The map to use when setting the session variables.
- ignoreLowerCaseKeys True if keys containing any lowercase characters should be ignored. For example, A_KEy would be ignored.
Return Value
This method returns void.
Change Log
Version |
Description |
5.5.29a |
Available in all editions. |
5.5.43a |
Changed from session.setSessionVariablesFromMap to session.setSessionVariables. |
Examples
Set the ASPX values for a .NET site before scraping the next page
DataRecord aspx = scrapeableFile.getASPXValues();
session.setSessionVariables(aspx);
session.scrapeFile("Next Results");
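The filtering rules above can be sketched in plain Java, with a Map standing in for the session's variables. This is a minimal illustration and not screen-scraper's implementation. It assumes that every remaining key becomes a session variable whether or not one existed before, and that DATARECORD is matched exactly. The class name and the example key PAGE_NUMBER are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SessionVariableCopy {
    // Copies entries into the session map, always skipping DATARECORD and,
    // when ignoreLowerCaseKeys is true, skipping keys with any lowercase letter.
    static void setSessionVariables(Map<String, Object> session, Map<String, ?> variables,
                                    boolean ignoreLowerCaseKeys) {
        for (Map.Entry<String, ?> e : variables.entrySet()) {
            String key = e.getKey();
            if ("DATARECORD".equals(key)) {
                continue;
            }
            if (ignoreLowerCaseKeys && hasLowerCase(key)) {
                continue;
            }
            session.put(key, e.getValue());
        }
    }

    static boolean hasLowerCase(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLowerCase(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new LinkedHashMap<>();
        vars.put("PAGE_NUMBER", "2");
        vars.put("A_KEy", "skipped when lowercase keys are ignored");
        vars.put("DATARECORD", "always skipped");
        Map<String, Object> session = new HashMap<>();
        setSessionVariables(session, vars, true);
        System.out.println(session.keySet()); // [PAGE_NUMBER]
    }
}
```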
setStatusMessage
void session.setStatusMessage ( String message ) (enterprise edition only)
Description
Sets a status message to be displayed in the web interface.
Parameters
- message The message to be set.
Change Log
Version |
Description |
5.5.32a |
Available in Enterprise edition. |
Examples
Set a status message
if( scrapeableFile.getMaxRequestAttemptsReached() )
{
session.setStatusMessage( "Maximum requests reached for scrapeable file: " + scrapeableFile.getName() );
// Output the current status message.
session.log( "Current status message: " + session.getStatusMessage() );
}
setStopScrapingOnExtractorPatternTimeout
void session.setStopScrapingOnExtractorPatternTimeout ( boolean stopScrapingOnExtractorPatternTimeout ) (professional and enterprise editions only)
Description
If this method is passed the value of true, it will cause screen-scraper to stop the current scraping session if an extractor pattern timeout occurs.
Parameters
- stopScrapingOnExtractorPatternTimeout true or false
Change Log
Version |
Description |
5.5.36a |
Available in Professional and Enterprise editions. |
Examples
Indicate that the scraping session should be stopped when an extractor pattern timeout occurs
session.setStopScrapingOnExtractorPatternTimeout( true );
// You can also access the current value like so:
session.log( "Stop scraping on extractor pattern timeout: " + session.getStopScrapingOnExtractorPatternTimeout() );
setStopScrapingOnMaxRequestAttemptsReached
void session.setStopScrapingOnMaxRequestAttemptsReached ( boolean stopScrapingOnMaxRequestAttemptsReached ) (professional and enterprise editions only)
Description
If this method is passed the value of true, it will cause screen-scraper to stop the current scraping session if the maximum number of attempts to request a file is reached.
Parameters
- stopScrapingOnMaxRequestAttemptsReached true or false
Change Log
Version |
Description |
5.5.36a |
Available in Professional and Enterprise editions. |
Examples
Indicate that the scraping session should be stopped if the maximum number of attempts to request a file is reached
session.setStopScrapingOnMaxRequestAttemptsReached( true );
// You can also access the current value like so:
session.log( "Stop scraping on max attempts reached: " + session.getStopScrapingOnMaxRequestAttemptsReached() );
setStopScrapingOnScriptError
void session.setStopScrapingOnScriptError ( boolean stopScrapingOnScriptError ) (professional and enterprise editions only)
Description
If this method is passed the value of true, it will cause screen-scraper to stop the current scraping session if a script error occurs.
Parameters
- stopScrapingOnScriptError true or false
Change Log
Version |
Description |
5.5.36a |
Available in Professional and Enterprise editions. |
Examples
Indicate that the scraping session should be stopped if a script error occurs
session.setStopScrapingOnScriptError( true );
// You can also access the current value like so:
session.log( "Stop scraping on script error: " + session.getStopScrapingOnScriptError() );
setTimeZone
void session.setTimeZone ( String timeZone )
void session.setTimeZone ( TimeZone timeZone )
Description
Sets the time zone used by methods that return a time formatted as a string.
Parameters
- timeZone The new time zone to use. If null is given, the local time zone will be used.
Return Value
This method returns void.
Change Log
Version |
Description |
5.5.29a |
Available in all editions. |
Examples
Set the time zone
session.setTimeZone("America/Denver");
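The String overload takes a zone name such as "America/Denver". The sketch below shows how standard Java interprets such IDs using java.util.TimeZone and SimpleDateFormat. It assumes, without confirmation from this page, that screen-scraper resolves the string the same way. Note that TimeZone.getTimeZone silently returns GMT for an ID it does not recognize, so a typo in the zone name will not raise an error.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimeZoneSketch {
    // Formats an instant in the given zone, the way a time-as-string method might.
    static String format(Date when, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss z", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(when);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01 00:00:00 UTC
        System.out.println(format(epoch, "America/Denver")); // 1969-12-31 17:00:00 MST
        // Unrecognized IDs fall back to GMT rather than failing.
        System.out.println(TimeZone.getTimeZone("Not/AZone").getID()); // GMT
    }
}
```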
setUseServerCharacterSet
void session.setUseServerCharacterSet ( boolean useServerCharacterSet ) (professional and enterprise editions only)
Description
If this method is passed the value of true, it will cause screen-scraper to utilize whatever character set is specified by the server in its "Content-Type" response header. If no such header exists, screen-scraper will default to either the character set indicated for the scraping session or the global character set (in that order).
Parameters
- useServerCharacterSet true or false
Change Log
Version |
Description |
5.5.11a |
Available in all editions. |
Examples
Indicate that the server character set should be used
session.setUseServerCharacterSet( true );
// You can also access the current value like so:
session.log( "Use server character set: " + session.getUseServerCharacterSet() );
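The fallback order described above (server charset, then the scraping session's charset, then the global charset) can be sketched in plain Java. The class and method names are hypothetical, and this is not screen-scraper's implementation. It also assumes that a Content-Type header with no charset parameter is treated the same as a missing header.

```java
import java.util.Locale;

public class CharsetResolution {
    // Server charset first, then the scraping session's charset, then the global charset.
    static String resolve(String contentType, String sessionCharset, String globalCharset) {
        String fromServer = charsetFromContentType(contentType);
        if (fromServer != null) {
            return fromServer;
        }
        if (sessionCharset != null && !sessionCharset.isEmpty()) {
            return sessionCharset;
        }
        return globalCharset;
    }

    // Extracts the charset parameter from a header such as "text/html; charset=ISO-8859-1".
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/html; charset=ISO-8859-1", "UTF-8", "UTF-8")); // ISO-8859-1
        System.out.println(resolve("text/html", "windows-1252", "UTF-8"));             // windows-1252
        System.out.println(resolve(null, null, "UTF-8"));                              // UTF-8
    }
}
```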
setUserAgent
void session.setUserAgent ( String userAgent ) (professional and enterprise editions only)
Description
Sets the user agent to be used for all requests.
Parameters
- userAgent The value to send in the "User-Agent" header, as a string.
Change Log
Version |
Description |
5.5.23a |
Available in Professional and Enterprise editions. |
Examples
Set the user agent
session.setUserAgent( "Opera/9.64 (Windows NT 5.1; U; en) Presto/2.1.1" );
// You can also access the current value like so:
session.log( "Session user agent: " + session.getUserAgent() );
setVariable
void session.setVariable ( String identifier, Object value )
Description
Set the value of a session variable.
Parameters
- identifier Name of the session variable, as a string.
- value Value of the session variable. This can be any Java object, including (but not limited to) a String, DataSet, or DataRecord.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
Examples
Set Session Variable
// Sets the session variable "CITY_CODE" with the value found
// in the first dataRecord (at index 0) pointed to by the
// identifier "CITY_CODE".
session.setVariable( "CITY_CODE", dataSet.get( 0, "CITY_CODE" ) );
See Also
- addToVariable() [session] - Adds an integer to the value of a session variable.
- getv() [session] - Retrieve the value of a saved session variable (alias of getVariable).
- getVariable() [session] - Retrieve the value of a saved session variable.
- setv() [session] - Set the value of a session variable (alias of setVariable).
setv
void session.setv ( String identifier, Object value )
Description
Set the value of a session variable (alias of setVariable).
Parameters
- identifier Name of the session variable, as a string.
- value Value of the session variable. This can be any Java object, including (but not limited to) a String, DataSet, or DataRecord.
Return Values
Returns void.
Change Log
Version |
Description |
5.0 |
Added for all editions. |
Examples
Set Session Variable
// Sets the session variable "CITY_CODE" with the value found
// in the first dataRecord (at index 0) pointed to by the
// identifier "CITY_CODE".
session.setv( "CITY_CODE", dataSet.get( 0, "CITY_CODE" ) );
See Also
- addToVariable() [session] - Adds an integer to the value of a session variable.
- getv() [session] - Retrieve the value of a saved session variable (alias of getVariable).
- getVariable() [session] - Retrieve the value of a saved session variable.
- setVariable() [session] - Set the value of a session variable.
shouldStopScraping
boolean session.shouldStopScraping ( )
Description
Determine whether the scrape has been asked to stop. A stop can be requested with the stop button in the workbench or the stop scraping button in the web interface (for enterprise users).
Parameters
This method does not receive any parameters.
Return Values
Returns true if the scrape has been requested to stop; otherwise, it returns false.
Change Log
Version |
Description |
5.0 |
Added for enterprise edition. |
Examples
Stop Iterator if Scrape is Stopped
for (int i = 0; i < dataSet.getNumDataRecords(); ++i)
{
// check during every iteration to see if we should exit early.
// Without this check, the iteration will continue even
// if the stop scraping button were to be pressed.
if ( session.shouldStopScraping() )
{
break;
}
session.setVariable( "URL", dataSet.get( i, "NEXT_PAGE_URL" ) );
session.scrapeFile( "NEXT_PAGE" );
}
stopScraping
void session.stopScraping ( )
Description
Stop the current scraping session.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for all editions. |
Examples
Stop Scrape on Scrapeable File Request Error
// Stops scraping if an error response was received
// from the server.
if( scrapeableFile.wasErrorOnRequest() )
{
session.stopScraping();
}
waitForFileDownloadsToComplete
void session.waitForFileDownloadsToComplete() (enterprise edition only)
Description
Waits for any pending file downloads to complete before returning. This should be used together with the version of session.downloadFile that takes the "doLazy" parameter.
Change Log
Version |
Description |
5.5.43a |
Available in Enterprise edition. |
Examples
Wait for concurrent file downloads to complete
// Download five image files concurrently.
for( i = 0; i < 5; i++ )
{
session.downloadFile( "http://www.mysite.com/images/image" + i + ".jpg", "output/image" + i + ".jpg", 5, true );
}
// Wait for all of the images to finish downloading before continuing.
session.waitForFileDownloadsToComplete();
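The "start downloads lazily, then wait for all of them" pattern resembles a bounded thread pool in standard Java. The sketch below is only an analogy, not how screen-scraper implements it. The fixed pool size plays the role of setMaxConcurrentFileDownloads, submit() returns immediately like a lazy downloadFile call, and awaitTermination() blocks like waitForFileDownloadsToComplete. The downloads are simulated with a short sleep, so no network access is involved. The class is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyDownloadSketch {
    // Runs fileCount simulated downloads with at most maxConcurrent at once.
    // Returns {completed downloads, peak concurrency observed}.
    static int[] runDownloads(int fileCount, int maxConcurrent) {
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        for (int i = 0; i < fileCount; i++) {
            // Returns immediately, like a lazy download request.
            pool.submit(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(50); // stand-in for the actual transfer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new downloads
        try {
            // Block until every queued download has finished.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { completed.get(), peak.get() };
    }

    public static void main(String[] args) {
        int[] result = runDownloads(5, 2);
        System.out.println("completed=" + result[0] + ", peak concurrency=" + result[1]);
    }
}
```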