scrapeableFile

Overview

The scrapeableFile object refers to the current file being requested from a given server. It encompasses both the request for a file and the response, and can be manipulated to meet any necessary requirements: GET and POST parameters, referer information, cookies, FILE parameters, HTTP headers, character set, and so on.

addGETHTTPParameter

void scrapeableFile.addGETHTTPParameter ( String key, String value, int sequence ) (professional and enterprise editions only)

Description

Dynamically adds a GET parameter to the URL of the current scrapeable file. If a parameter with the given sequence already exists, it will be replaced by the one created from this method call. Calling this method is the equivalent in the workbench of adding a parameter under the "Parameters" tab, and designating the type as GET. Once the scraping session is completed the original HTTP parameters (those under the "Parameters" tab in the workbench) will be restored.

Parameters

  • key The key portion of the parameter. For example, if the parameter were foo=bar, the key portion would be "foo".
  • value The value portion of the parameter. For example, if the parameter were foo=bar, the value portion would be "bar".
  • sequence The sequence of the parameter (equivalent to the value under the "Sequence" column in the workbench).

Return Values

None

Change Log

Version Description
5.5.32a Available in Professional and Enterprise editions.

Examples

Add a GET HTTP parameter to a scrapeable file

scrapeableFile.addGETHTTPParameter( "searchTerm", "LP player", 3 );

addHTTPHeader

void scrapeableFile.addHTTPHeader ( String key, String value ) (professional and enterprise editions only)

Description

Add an HTTP header to be sent along with the request.

Parameters

  • key Name of the header, as a string.
  • value Value of the header, as a string.

Return Values

Returns void. In editions other than professional and enterprise, calling this method will throw an error.

Change Log

Version Description
5.0 Available for professional and enterprise editions.
4.5 Available for enterprise edition.

In certain rare cases it may be necessary to explicitly add a custom header to the POST data of an HTTP request. This may be required where a site uses AJAX and the POST payload of a request is sent as XML (e.g., using the setRequestEntity method). This method must be invoked before the HTTP request is made (e.g., in a script run "Before file is scraped").

Examples

Add AJAX header

 // In a script called "Before file is scraped"

 // Add and set AJAX-Method header to true.
 scrapeableFile.addHTTPHeader( "AJAX-Method", "true" );

See Also

  • removeHTTPHeader() [scrapeableFile] - Removes an HTTP header from a scrapeable file

addHTTPParameter

void scrapeableFile.addHTTPParameter ( HTTPParameter parameter )

Description

Dynamically add an HTTPParameter to the current scrapeable file.

Parameters

  • parameter HTTPParameter object.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for all editions.

The HTTPParameter constructor is as follows: HTTPParameter( String key, String value, int sequence, String type ). Valid types for the constructor are GET, POST, and FILE. Calling this method will have no effect unless it's invoked before the file is scraped.

Examples

Add GET HTTP Parameter

 // This would be in a script called "Before file is scraped"

 // Create HTTP parameter "page" with a value of "3" in the first location (GET is default)
 httpParameter = new com.screenscraper.common.HTTPParameter("page", "3", 1);

 // Adds a new GET HTTP parameter to the current file.
 scrapeableFile.addHTTPParameter( httpParameter );

Add POST HTTP Parameter

 // This would be in a script called "Before file is scraped"

 // Create HTTP parameter "page" with a value of "3" in the first location
 httpParameter = new com.screenscraper.common.HTTPParameter("page", "3", 1, "POST");

 // Adds a new POST HTTP parameter to the current file.
 scrapeableFile.addHTTPParameter( httpParameter );

See Also

  • removeHTTPParameter() [scrapeableFile] - Removes an HTTP Parameter from the request that will be made by the scrapeable file
  • removeAllHTTPParameters() [scrapeableFile] - Remove all the HTTP Parameters from the request that will be made by the scrapeable file

addPOSTHTTPParameter

void scrapeableFile.addPOSTHTTPParameter ( String key, String value ) (professional and enterprise editions only)
void scrapeableFile.addPOSTHTTPParameter ( String key, String value, int sequence )(professional and enterprise editions only)

Description

Dynamically adds a POST parameter to the existing set of POST parameters. If a parameter with the given sequence already exists, it will be replaced by the one created from this method call. If the overload without a sequence is used, the new POST parameter will be given a sequence just higher than the highest existing sequence. Calling this method is the equivalent in the workbench of adding a parameter under the "Parameters" tab, and designating the type as POST. Once the scraping session is completed the original HTTP parameters (those under the "Parameters" tab in the workbench) will be restored.

Parameters

  • key The key portion of the parameter. For example, if the parameter were foo=bar, the key portion would be "foo".
  • value The value portion of the parameter. For example, if the parameter were foo=bar, the value portion would be "bar".
  • sequence The sequence of the parameter (equivalent to the value under the "Sequence" column in the workbench).

Return Values

None

Change Log

Version Description
5.5.32a Available in Professional and Enterprise editions.

Examples

Add a POST HTTP parameter to a scrapeable file

// Adds a POST parameter to the end of the existing set.
scrapeableFile.addPOSTHTTPParameter( "EVENTTARGET", session.getv( "EVENTTARGET" ) );

// Replaces the existing POST parameter with a sequence of 2 with a new one.
scrapeableFile.addPOSTHTTPParameter( "VIEWSTATE", session.getv( "VIEWSTATE" ), 2 );

extractData

DataSet scrapeableFile.extractData ( String text, String extractorPatternName ) (professional and enterprise editions only)

Description

Manually apply an extractor pattern to a string.

Parameters

  • text The string to which the extractor pattern will be applied.
  • extractorPatternName Name of extractor pattern in the scrapeable file, as a string. Optionally the scraping session and scrapeable file where the extractor pattern can be found can be specified in the form [scraping session:][scrapeable file:]extractor pattern.

Return Values

Returns DataSet on success. Failures will be written out to the log as errors.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

An example of how to manually extract data is available.

Examples

Extract DataSet

 // Applies the "PRODUCT" extractor pattern to the text found in the
 // productDescriptionText variable. The resulting DataSet from
 // extractData is stored in the variable productData.

 DataSet productData = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );

Loop Through DataRecords

 // Expanded example using the "PRODUCT" extractor pattern to the text found in the
 // productDescriptionText variable. The resulting DataSet from
 // extractData is stored in the variable myDataSet, which has multiple dataRecords.
 // Each myDataRecord has a PRICE and a PRODUCT_ID.

 myDataSet = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );
 for (i = 0; i < myDataSet.getNumDataRecords(); i++) {
     myDataRecord = myDataSet.getDataRecord(i);

     session.setVariable("PRICE", myDataRecord.get("PRICE"));
     session.setVariable("PRODUCT_ID", myDataRecord.get("PRODUCT_ID"));
 }

Extractor Pattern from another Scrapeable File

 // Apply extractor pattern "PRODUCT" from "Another scrapeable file"
 // to the variable productDescriptionText

 DataSet productData = scrapeableFile.extractData( productDescriptionText, "Another scrapeable file:PRODUCT" );

Extractor Pattern from another Scraping Session

 // Apply extractor pattern "PRODUCT" from "Another scrapeable file"
 // in "Other scraping session" to the variable productDescriptionText

 DataSet productData = scrapeableFile.extractData( productDescriptionText,
                        "Other scraping session:Another scrapeable file:PRODUCT" );

extractOneValue

String scrapeableFile.extractOneValue ( String text, String extractorPatternName ) (professional and enterprise editions only)
String scrapeableFile.extractOneValue ( String text, String extractorPatternName, String extractorTokenName ) (professional and enterprise editions only)

Description

Manually retrieve the value of a single extractor token.

Parameters

  • text The string to which the extractor pattern will be applied.
  • extractorPatternName Name of extractor pattern in the scrapeable file, as a string. Optionally the scraping session and scrapeable file where the extractor pattern can be found can be specified in the form [scraping session:][scrapeable file:]extractor pattern.
  • extractorTokenName (optional) Extractor token name, as a string, whose matched value should be returned. If left off the matched value for the first extractor token in the data set will be returned.

Return Values

Returns the match from the last data record, as a string, on success. On failure it returns null and writes an error to the log.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

If you want the value from the first data record instead, use extractData and call getDataRecord( 0 ) on the resulting DataSet.

Examples

Extract Value

 // Applies the extractor pattern "Product Name" to the data found in
 // the variable productDescriptionText. The extracted string is
 // stored in the productName variable.
 // Returns the value found in the first token found in the extractor pattern
 // or null if no token is found.

 productName = scrapeableFile.extractOneValue( productDescriptionText, "Product Name" );

Extract Value of Specified Token

 // Applies the extractor pattern "Product Name" to the data found in
 // the variable productDescriptionText. The extracted string is
 // stored in the productName variable.
 // Returns the value found in the token "NAME" found in the extractor pattern
 // or null if no token is found.

 productName = scrapeableFile.extractOneValue( productDescriptionText, "Product Name", "NAME" );
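
Extract Value from First Data Record

A sketch of one way to get the value from the first data record rather than the last, using extractData (this assumes the same "Product Name" pattern with a "NAME" token):

 // Apply the pattern manually and read the first data record.
 DataSet names = scrapeableFile.extractData( productDescriptionText, "Product Name" );
 if (names.getNumDataRecords() > 0)
 {
     productName = names.getDataRecord( 0 ).get( "NAME" );
 }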

Extractor Pattern from another Scrapeable File

 // Apply extractor pattern "Product Name" from "Another scrapeable file"
 // to the variable productDescriptionText return the first "NAME"

 String productName = scrapeableFile.extractOneValue( productDescriptionText, "Another scrapeable file:Product Name", "NAME" );

Extractor Pattern from another Scraping Session

 // Apply extractor pattern "Product Name" from "Another scrapeable file"
 // in "Other scraping session" to the variable productDescriptionText
 // return the first "NAME"

 String productName = scrapeableFile.extractOneValue( productDescriptionText,
                        "Other scraping session:Another scrapeable file:Product Name",
                        "NAME" );

getASPXValues

DataRecord scrapeableFile.getASPXValues ( boolean onlyStandard ) (professional and enterprise editions only)

Description

Gets the ASPX .NET values from the scraped page. The standard values are __VIEWSTATE, __EVENTTARGET, __EVENTVALIDATION, and __EVENTARGUMENT. Values will be stored in the returned DataRecord as ASPX_VIEWSTATE, ASPX_EVENTTARGET, etc.

Parameters

  • onlyStandard Whether to retrieve only the four standard tags (true) or any tags whose names begin with __ (false)

Return Values

A DataRecord object with each ASPX name as ASPX_[NAME] mapped to its value. Note that when onlyStandard is false, any parameter whose name starts with __ will be returned in this DataRecord.

Change Log

Version Description
5.5.26a Available in all editions.

Examples

Get the .NET values for a page

 DataRecord aspx = scrapeableFile.getASPXValues(true);
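
Copy the .NET values into session variables

A sketch of a common follow-up, assuming the standard hidden fields were present on the page: store the extracted values in session variables so they can be used as POST parameters on a subsequent request. The session variable names here are illustrative.

 DataRecord aspx = scrapeableFile.getASPXValues( true );

 // Make the extracted values available to later scrapeable files.
 session.setVariable( "VIEWSTATE", aspx.get( "ASPX_VIEWSTATE" ) );
 session.setVariable( "EVENTVALIDATION", aspx.get( "ASPX_EVENTVALIDATION" ) );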

getAuthenticationPreemptive

boolean scrapeableFile.getAuthenticationPreemptive ( )

Description

Retrieve the authentication expectation of the request.

Parameters

This method does not receive any parameters.

Return Values

Returns, as a boolean, whether the scrapeable file expects to authenticate and so will send credentials with the initial request rather than waiting for the server to request them.

Change Log

Version Description
5.0 Available for all editions.

Examples

Write Expectation Status to Log

// Log expectation of authentication
if ( scrapeableFile.getAuthenticationPreemptive() )
{
    session.log( "Expecting Authentication" );
}

See Also

  • setAuthenticationPreemptive() [scrapeableFile] - Sets the authentication expectation of the request

getCharacterSet

String scrapeableFile.getCharacterSet ( )

Description

Get the character set used to render the page response.

Parameters

This method does not receive any parameters.

Return Values

Returns the character set applied to the scraped page, as a string. If a character set has not been specified then it will default to the character set specified in the settings dialog box.

Change Log

Version Description
4.5 Available for all editions.

If you are having trouble with characters displaying incorrectly, we encourage you to read about how to go about finding a solution using one of our FAQs.

Examples

Get Character Set

 // Get the character set of the dataSet
 charSetValue = scrapeableFile.getCharacterSet();

See Also

  • setCharacterSet() [scrapeableFile] - Set the character set used to render responses to a specific scrapeable file.
  • setCharacterSet() [session] - Set the character set used to render all responses.
  • getCharacterSet() [session] - Gets the character set used to render all responses.

getContentAsString

String scrapeableFile.getContentAsString ( )

Description

Retrieve contents of the response.

Parameters

This method does not receive any parameters.

Return Values

Returns contents of the last response, as a string. If the file has not been scraped it will return an empty string.

Change Log

Version Description
4.5 Available for all editions.

Examples

Log Response

 // In a script run "After file is scraped"

 // Sends the HTML of the current file to the log.
 session.log( scrapeableFile.getContentAsString() );

getContentType

String scrapeableFile.getContentType ( )

Description

Retrieve the POST payload type being used to interpret the page. This can be important when scraping some sites' implementations of AJAX, where the payload is explicitly set as XML.

Parameters

This method does not receive any parameters.

Return Values

Returns the content type, as a string (e.g., text/html or text/xml).

Change Log

Version Description
5.0 Available for all editions.

Examples

Write Content Type to Log

// Write to log
session.log( "Content Type: " + scrapeableFile.getContentType() );

See Also

getCurrentPOSTData

String scrapeableFile.getCurrentPOSTData ( )

Description

Retrieve the POST data.

Parameters

This method does not receive any parameters.

Return Values

Returns the POST data for the scrapeable file, as a string. If called after the file has been scraped the session variable tokens will be resolved to their values; otherwise, the tokens will simply be removed from the string.

Change Log

Version Description
4.5 Available for all editions.

Examples

Collect POST data

 // In script called "After file is scraped"

 // Stores the POST data from the scrapeable file in the
 // currentPOSTData variable.

 currentPOSTData = scrapeableFile.getCurrentPOSTData();

getCurrentURL

String scrapeableFile.getCurrentURL ( )

Description

Get the URL of the file.

Parameters

This method does not receive any parameters.

Return Values

Returns the URL of the scrapeable file, as a string. If called after the file has been scraped the session variable tokens will be resolved to their values; otherwise, the tokens will simply be removed from the string.

Change Log

Version Description
4.5 Available for all editions.

Examples

Collect URL

 // In script called "After file is scraped"

 // Stores the current URL in the variable currentURL.
 currentURL = scrapeableFile.getCurrentURL();

getExtractorPatternTimedOut

boolean scrapeableFile.getExtractorPatternTimedOut () (professional and enterprise editions only)

Description

Indicates whether or not the most recent extractor pattern application timed out.

Parameters

None

Return Values

  • true or false

Change Log

Version Description
5.5.36a Available in all editions.

Examples

Find out about the last extractor pattern attempt

if( scrapeableFile.getExtractorPatternTimedOut() )
{
        session.log( "Most recent extractor pattern timed out." );
}

getForceNonBinary

boolean scrapeableFile.getForceNonBinary ( )

Description

Determine whether or not the contents of this response are being forced to be recognized as non-binary.

Parameters

This method does not receive any parameters.

Return Values

Returns true if the scrapeable file is being forced to be treated as non-binary; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

Examples

Check Binary Status of File

 // Determine if the file is being forced
 // to be recognized as non-binary

 forced = scrapeableFile.getForceNonBinary();

See Also

  • setForceNonBinary() [scrapeableFile] - Sets whether or not the contents of the file are forced to be interpreted as non-binary

getHTTPResponseHeader

String scrapeableFile.getHTTPResponseHeader ( String header ) (professional and enterprise editions only)

Description

Gets the value of the header in the response of the scrapeable file, or returns null if it couldn't be found.

Parameters

  • header The header name (case-insensitive) to get

Return Value

The value of the header, or null if not found

Change Log

Version Description
5.5.29a Available in professional and enterprise editions.

Examples

Log the Content-Type

 session.log(scrapeableFile.getHTTPResponseHeader("Content-Type"));

getHTTPResponseHeaderSection

String scrapeableFile.getHTTPResponseHeaderSection ( ) (professional and enterprise editions only)

Description

Gets the header section of the HTTP Response

Parameters

This method takes no parameters

Return Value

A String containing the HTTP Response Headers

Change Log

Version Description
5.5.29a Available in professional and enterprise editions.

Examples

Log the headers

 // Split the headers into lines
 String[] headers = scrapeableFile.getHTTPResponseHeaderSection().split("\\r?\\n");
 for(int i = 0; i < headers.length; i++)
 {
   session.log(headers[i]);
 }

getHTTPResponseHeaders

Map<String, String> scrapeableFile.getHTTPResponseHeaders ( ) (professional and enterprise editions only)

Description

Gets the headers of the HTTP Response as a map, and returns them.

Parameters

This method takes no parameters

Return Value

A Map from header name to header value for the response headers.

Change Log

Version Description
5.5.29a Available in professional and enterprise editions.

Examples

Get the Content-Type header

 Map headers = scrapeableFile.getHTTPResponseHeaders();
 Iterator it = headers.keySet().iterator();
 while(it.hasNext())
 {
   String next = it.next();
   if(next.equalsIgnoreCase("Content-Type"))
     session.log("Content-Type was: " + headers.get(next));
 }

getLastTidyAttemptFailed

boolean scrapeableFile.getLastTidyAttemptFailed ()

Description

Indicates whether or not the most recent attempt to tidy the HTML failed.

Parameters

None

Return Values

  • true or false

Change Log

Version Description
5.5.36a Available in all editions.

Examples

Find out about the last HTML tidy attempt

if( scrapeableFile.getLastTidyAttemptFailed() )
{
        session.log( "Most recent tidy attempt failed." );
}

getMaxRequestAttemptsReached

boolean scrapeableFile.getMaxRequestAttemptsReached () (professional and enterprise editions only)

Description

Indicates whether or not the maximum attempts to request a given scrapeable file were reached.

Parameters

None

Return Values

  • true or false

Change Log

Version Description
5.5.36a Available in all editions.

Examples

Find out about the last request attempt

if( scrapeableFile.getMaxRequestAttemptsReached() )
{
        session.log( "Maximum request attempts were reached." );
}

getMaxResponseLength

int scrapeableFile.getMaxResponseLength ( )

Description

Retrieve the kilobyte limit on data retrieved by the scrapeable file; any data beyond this limit will not be retrieved.

Parameters

This method does not receive any parameters.

Return Values

Returns the current kilobyte limit on the response, as an integer.

Change Log

Version Description
5.0 Added for professional and enterprise editions.

Examples

Log Response Size Limit

 // Log Limit
 session.log( "Max Response Length: " + scrapeableFile.getMaxResponseLength() + " KB" );

See Also

  • setMaxResponseLength() [scrapeableFile] - Sets the maximum number of kilobytes that will be retrieved by the scrapeable file

getName

String scrapeableFile.getName ( )

Description

Get the name of the scrapeable file.

Parameters

This method does not receive any parameters.

Return Values

Returns the name of the scrapeable file, as a string.

Change Log

Version Description
4.5 Available for all editions.

Examples

Write Scrapeable File Name to Log

 // Outputs the name of the scrapeable file to the log.

 session.log( "Current scrapeable file: " + scrapeableFile.getName() );

getNonTidiedHTML

String scrapeableFile.getNonTidiedHTML ( ) (enterprise edition only)

Description

Retrieve the non-tidied HTML of the scrapeable file.

Parameters

This method does not receive any parameters.

Return Values

Returns the non-tidied contents of the scrapeable file, as a string. On failure it returns null.

Change Log

Version Description
4.5 Available for enterprise edition.

By default non-tidied HTML is not retained. For this method to return anything other than null you must use setRetainNonTidiedHTML to force non-tidied HTML to be retained.

Examples

Write Untidied HTML to Log if Retained

 // Outputs the non-tidied HTML from the scrapeable file
 // to the log based on whether it was retained or not.

 if (scrapeableFile.getRetainNonTidiedHTML())
 {
     session.log( "Non-tidied HTML: " + scrapeableFile.getNonTidiedHTML() );
 }
 else
 {
     session.log( "The non-tidied HTML was not retained or the file has not yet been scraped." );
 }

See Also

  • getRetainNonTidiedHTML() [scrapeableFile] - Determines if the scrapeable file is set to retain non-tidied HTML
  • setRetainNonTidiedHTML() [scrapeableFile] - Sets whether or not the scrapeable file retains non-tidied HTML

getRedirectURLs

String[] scrapeableFile.getRedirectURLs ( ) (professional and enterprise editions only)

Description

Gets an array of strings containing the redirect URLs for the current scrapeable file request attempt.

Parameters

This method does not receive any parameters.

Return Values

Returns the array of strings; may be empty.

Change Log

Version Description
6.0.24a Available in Professional and Enterprise editions.
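
Examples

Log redirect URLs

A minimal sketch, run in a script called "After file is scraped":

 // Write any redirect URLs from the most recent request to the log.
 String[] redirects = scrapeableFile.getRedirectURLs();
 for (int i = 0; i < redirects.length; i++)
 {
     session.log( "Redirected through: " + redirects[i] );
 }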

getRetainNonTidiedHTML

boolean scrapeableFile.getRetainNonTidiedHTML ( ) (enterprise edition only)

Description

Determine if the scrapeable file is set to retain non-tidied HTML.

Parameters

This method does not receive any parameters.

Return Values

Returns true if non-tidied contents will be retained; otherwise, returns false.

Change Log

Version Description
4.5 Available for enterprise edition.

Examples

Write Untidied HTML to Log if Retained

 // Outputs the non-tidied HTML from the scrapeable file
 // to the log if it was retained otherwise just a message.

 if (scrapeableFile.getRetainNonTidiedHTML())
 {
     session.log( "Non-tidied HTML: " + scrapeableFile.getNonTidiedHTML() );
 }
 else
 {
     session.log( "The non-tidied HTML was not retained or the file has not yet been scraped." );
 }

See Also

  • getNonTidiedHTML() [scrapeableFile] - Retrieves the non-tidied HTML of the scrapeable file
  • setRetainNonTidiedHTML() [scrapeableFile] - Sets whether or not the scrapeable file retains non-tidied HTML

getRetryPolicy

RetryPolicy scrapeableFile.getRetryPolicy ( ) (professional and enterprise editions only)

Description

Returns the retry policy. Note that in any "After file is scraped" scripts this is null.

Parameters

This method takes no parameters.

Return Value

The Retry Policy that will be used by this scrapeable file

Change Log

Version Description
5.5.29a Available in professional and enterprise editions.

Examples

Check for a retry policy

 if(scrapeableFile.getRetryPolicy() != null)
 {
   session.log(scrapeableFile.getName() + ": Retry policy has been set for this scrapeable file.");
 }

getStatusCode

int scrapeableFile.getStatusCode ( ) (professional and enterprise editions only)

Description

Determine the HTTP status code sent by the server.

Parameters

This method does not receive any parameters.

Return Values

Returns integer corresponding to the HTTP status code of the response.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

Examples

Write warning to log on 404 error

 // Check for a 404 response (file not found).
 if( scrapeableFile.getStatusCode() == 404 )
 {
     url = scrapeableFile.getCurrentURL();
     session.log( "Warning! The server returned a 404 response for the url ( " + url + ")." );
 }

getUserAgent

String scrapeableFile.getUserAgent ( )

Description

Retrieve the name of the user agent making the request.

Parameters

This method does not receive any parameters.

Return Values

Returns the user agent, as a string.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

Examples

Write User Agent to Log

 // write to log
 session.log( scrapeableFile.getUserAgent( ) );

See Also

  • setUserAgent() [scrapeableFile] - Sets the name of the user agent that will make the request

inputOutputErrorOccurred

boolean scrapeableFile.inputOutputErrorOccurred ( )

Description

Determine if an input or output error occurred when requesting the file.

Parameters

This method does not receive any parameters.

Return Values

Returns true if an error has occurred; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

This method should be run after the scrapeable file has been scraped.

Examples

End scrape on Error

 // Check for error
 if (scrapeableFile.inputOutputErrorOccurred())
 {
     // Log error occurrence
     session.log("Input/output error occurred.");
     // End scrape
     session.stopScraping();
 }

noExtractorPatternsMatched

boolean scrapeableFile.noExtractorPatternsMatched ( )

Description

Determine whether all extractor patterns associated with the scrapeable file failed to find a match.

Parameters

This method does not receive any parameters.

Return Values

Returns true if no extractor patterns in the scrapeable file found a match; otherwise, returns false.

Change Log

Version Description
4.5 Available for all editions.

Examples

Warning if no Extractor Patterns matched

 // If no patterns matched, outputs a message indicating such
 // to the session log.

 if( scrapeableFile.noExtractorPatternsMatched() )
 {
     session.log( "Warning! No extractor patterns matched." );
 }

removeAllHTTPParameters

void scrapeableFile.removeAllHTTPParameters ( ) (professional and enterprise editions only)

Description

Remove all of the HTTP parameters from the current scrapeable file.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

Examples

Delete HTTP Parameters

 // Removes all of the HTTP parameters from the current scrapeable file.
 scrapeableFile.removeAllHTTPParameters();

See Also

  • removeHTTPParameter() [scrapeableFile] - Removes an HTTP Parameter from the request that will be made by the scrapeable file
  • addHTTPParameter() [scrapeableFile] - Add an HTTP Parameter to the request that will be made by the scrapeable file

removeHTTPHeader

void scrapeableFile.removeHTTPHeader ( String key ) (enterprise edition only)
void scrapeableFile.removeHTTPHeader ( String key, String value ) (enterprise edition only)

Description

Remove an HTTP header from a scrapeable file.

Parameters

  • key The name of the HTTP header to be removed, as a string.
  • value (optional) The value of the HTTP header that is to be removed, as a string. If this is left off then all headers of the specified key will be removed.

Return Values

Returns void.

Change Log

Version Description
5.0.5a Introduced for enterprise edition.

Examples

Remove All Values of a Header

// Delete all "User-Agent" headers for this scrapeableFile.
// (Cookies, by contrast, can be cleared on a global scale
//    using session.clearCookies.)
scrapeableFile.removeHTTPHeader( "User-Agent" );

See Also

  • addHTTPHeader() [scrapeableFile] - Adds an HTTP Header to the scrapeable file

removeHTTPParameter

void scrapeableFile.removeHTTPParameter ( int sequence )
void scrapeableFile.removeHTTPParameter ( String key ) (professional and enterprise editions only)

Description

Dynamically removes an HTTPParameter. The order of the remaining parameters is adjusted immediately.

Parameters

  • sequence The ordered location of the parameter.
  • key The key identifying the HTTP parameter to be removed.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for all editions.
5.5.32a Added method call that takes a String. Available for Professional and Enterprise editions.

If you call this method more than once in the same script, or use it in conjunction with the addHTTPParameter method, keep track of how the parameter list is resequenced before calling either method again.

Calling this method will have no effect unless it's invoked before the file is scraped.

This method can be used for both GET and POST parameters.

Examples

Remove HTTP parameter

 // In a script called "Before file is scraped"

 // Removes the eighth HTTP parameter from the current file.
 scrapeableFile.removeHTTPParameter( 8 );
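
Remove HTTP parameter by key

A sketch of the String overload (professional and enterprise editions only); the parameter name here is illustrative:

 // Removes the parameter with the key "VIEWSTATE", regardless of its sequence.
 scrapeableFile.removeHTTPParameter( "VIEWSTATE" );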

See Also

  • addHTTPParameter() [scrapeableFile] - Adds an HTTP Parameter to the request that will be made by the scrapeable file
  • removeAllHTTPParameters() [scrapeableFile] - Remove all the HTTP Parameters from the request that will be made by the scrapeable file

resequenceHTTPParameter

void scrapeableFile.resequenceHTTPParameter ( String key, int sequence ) (professional and enterprise editions only)

Description

Resequences an HTTP parameter.

Parameters

  • key The key identifying the HTTP parameter to be resequenced.
  • sequence The new sequence the parameter should have.

Return Values

None

Change Log

Version Description
5.5.32a Available in Professional and Enterprise editions.

Examples

Resequence an HTTP parameter

// Give the "VIEWSTATE" HTTP parameter a sequence of 3.
scrapeableFile.resequenceHTTPParameter( "VIEWSTATE", 3 );

resolveRelativeURL

String scrapeableFile.resolveRelativeURL ( String urlToResolve ) (professional and enterprise editions only)

Description

Resolves a relative URL to an absolute URL based on the current URL of this scrapeable file.

Parameters

  • urlToResolve Relative file path, as a string.

Return Values

Returns a string containing the complete URL of the file. On failure it will return the relative path and an error will be written to the log.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

Examples

Resolve relative URL into an absolute URL

 // Assuming the URL of the current scrapeable file is
 // "https://www.screen-scraper.com/path/to/file/"
 // the method call would result in the URL
 // "https://www.screen-scraper.com/path/to/file/thisfile.php"
 // being assigned to the "fullURL" variable.

 fullURL = scrapeableFile.resolveRelativeURL( "thisfile.php" );

saveFileBeforeTidying

void scrapeableFile.saveFileBeforeTidying ( String filePath ) (professional and enterprise editions only)

Description

Write non-tidied contents of the scrapeable file response to a text file.

Parameters

  • filePath File path, as a string, where the file should be saved.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

This method must be called before the file is scraped.

Because the response headers are also saved in the file, the saved file will not be valid unless it is a text file (e.g., images and PDFs will be corrupted).

Examples

Save Untidied Request and Response

 // In script called "Before file is scraped"

 // Causes the non-tidied HTML from the scrapeable file
 // to be output to the file path.

 scrapeableFile.saveFileBeforeTidying( "C:/non-tidied.html" );

saveFileOnRequest

void scrapeableFile.saveFileOnRequest ( String filePath ) (enterprise edition only)

Description

Save the file returned from a scrapeable file request.

Parameters

  • filePath Location where the file should be saved as a string.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for enterprise edition.

This method must be called from a scrapeable file before the file is scraped. Do not call this method from a script which is invoked by other means such as after an extractor pattern match or from within another script.

It is preferable to use downloadFile; however, at times you may have to send POST parameters in order to access a file. If that is the case, you would use this method.

This method cannot save local file requests to another location.

Examples

Save requested file

 // In script called "Before file is scraped"

 // When the current file is requested it will be saved to the
 // local file system as "sample.pdf".

 scrapeableFile.saveFileOnRequest( "C:/downloaded_files/sample.pdf" );

setAuthenticationPreemptive

void scrapeableFile.setAuthenticationPreemptive ( boolean preemptiveAuthentication )

Description

Set the authentication expectation of the request.

Parameters

  • preemptiveAuthentication Whether the scrapeable file expects to have to authenticate, and so will send the credentials with the initial request instead of waiting for the server to ask for them, as a boolean.

Return Values

Returns void.

Change Log

Version Description
5.0 Available for all editions.

Examples

Set Preemptive Authentication

// Set expectation of authentication
scrapeableFile.setAuthenticationPreemptive( false );

setCharacterSet

void scrapeableFile.setCharacterSet ( String characterSet ) (professional and enterprise editions only)

Description

Set the character set used in a specific scrapeable file's response renderings. This can be particularly helpful when the page renders characters incorrectly.

Parameters

  • characterSet Java recognized character set, as a string. Java provides a list of supported character sets in its documentation.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for all editions.

This method must be called before the file is scraped.

If you are having trouble with characters displaying incorrectly, we encourage you to consult our FAQ on character encoding issues for help finding a solution.

Examples

Set Character Set of Scrapeable File

 // In script called "Before file is scraped"

 // Sets the character set used to render this file's response.
 scrapeableFile.setCharacterSet( "ISO-8859-1" );

See Also

  • getCharacterSet() [scrapeableFile] - Gets the character set used to render responses to a specific scrapeable file.
  • setCharacterSet() [session] - Sets the character set used to render all responses.
  • getCharacterSet() [session] - Gets the character set used to render all responses.

setContentType

void scrapeableFile.setContentType ( String contentType ) (professional and enterprise editions only)

Description

Set the POST payload content type. This is particularly helpful when scraping some sites' AJAX implementations, where the payload is explicitly set as XML.

Parameters

  • contentType Desired content type of the POST payload, as a string.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

This method must be called before the file is scraped.

This method is usually used in connection with setRequestEntity as that method specifies the content of the POST data.

Examples

Set Content Type for XML payload in AJAX

 // In script called "Before file is scraped"

 // Sets the type of the POST entity to XML.
 scrapeableFile.setContentType( "text/xml" );

 // Set content of POST data
 scrapeableFile.setRequestEntity( "<person><name>John Smith</name></person>" );

setForceMultiPart

void scrapeableFile.setForceMultiPart ( boolean forceMultiPart ) (professional and enterprise editions only)

Description

Set content type header to multipart/form-data.

Parameters

  • forceMultiPart Boolean representing whether the request contains multipart data (e.g. images, files) as opposed to plain text. The default is false.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

This method must be called before the file is scraped.

Occasionally a site will expect a multipart request even when no file is being sent in the request.

If you include a file upload parameter under the parameters tab of the scrapeable file the request will automatically be multi-part.

Examples

Specify that Request contains Files

 // In script called "Before file is scraped"

 // Will cause the request to be made as a multi-part request.
 scrapeableFile.setForceMultiPart( true );

setForceNonBinary

void scrapeableFile.setForceNonBinary ( boolean forceNonBinary )

Description

Set whether or not the contents of this response should be forced to be treated as non-binary. Default forceNonBinary value is false.

Parameters

  • forceNonBinary Whether or not the scrapeable file should be forced to be non-binary.

Return Values

Returns void.

Change Log

Version Description
5.0 Added for all editions.

This is provided for cases where screen-scraper misidentifies a non-binary file as binary. It doesn't happen often, but it is possible.

Examples

Force File to be Treated as Non-binary

 // Force file to be recognized as non-binary
 scrapeableFile.setForceNonBinary( true );

See Also

  • getForceNonBinary() [scrapeableFile] - Returns whether or not this scrapeable file response will be forced to be treated as non-binary

setForcePOST

void scrapeableFile.setForcePOST ( Boolean forcePOST ) (professional and enterprise editions only)

Description

Sets whether or not a POST request should be forced.

Parameters

  • forcePOST Whether the request should be forced to be a POST, as a boolean.

Return Values

Returns void.

Change Log

Version Description
6.0.14a Available in Professional and Enterprise editions.
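
Examples

Force a POST Request

A minimal sketch of forcing the request method, following the pattern of the other examples in this reference (it assumes the request would otherwise be issued as a GET):

```java
 // In script called "Before file is scraped"

 // Forces the request for this scrapeable file to be
 // issued as a POST.
 scrapeableFile.setForcePOST( true );
```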

setForcedRequestType

void scrapeableFile.setForcedRequestType ( ScrapeableFile.RequestType type ) (professional and enterprise editions only)

Description

Sets the request type to use.

Parameters

  • type The type of request to issue, or null to let screen-scraper decide.

    ScrapeableFile.RequestType is an enum with the following options as values

    • GET
    • POST
    • HEAD
    • DELETE
    • OPTIONS


    If the method sets the request to one of those types, all parameters set as GET in the parameters tab will be appended to the URL (as normal), and all parameters set as POST parameters will be used to build the request entity. If there are POST values on a type that doesn't support a request entity, an exception will be thrown when the request is issued.

Return Values

Returns void.

Change Log

Version Description
6.0.55a Available in Professional and Enterprise editions.

Examples

Sets the request type

    // Force the request to be issued as a HEAD request.
    scrapeableFile.setForcedRequestType( ScrapeableFile.RequestType.HEAD );

setLastScrapedData

void scrapeableFile.setLastScrapedData ( String lastScrapedData ) (enterprise edition only)

Description

Overwrite the content of the "last response".

Parameters

  • lastScrapedData Desired new content of the last response, as a string.

Return Values

Returns void.

This method must be called from an extractor pattern before the pattern is run.

Examples

Replace new line characters with a space

newLastResponse = scrapeableFile.getContentAsString().replaceAll( "\\n", " " );
scrapeableFile.setLastScrapedData( newLastResponse );

setMaxResponseLength

void scrapeableFile.setMaxResponseLength ( int maxKBytes ) (professional and enterprise editions only)

Description

Limit the amount of information retrieved by the scrapeable file. This method can be useful in cases of very large responses where the desired information is found in the first portion of the response. It can also help to make the scraping process more efficient by only downloading the needed information.

Parameters

  • maxKBytes Kilobytes to be downloaded, as an integer.

Return Values

Returns void.

Change Log

Version Description
5.0 Added for professional and enterprise editions.

This method must be called before the file is scraped.

Examples

Limit Response Size

 // In script called "Before file is scraped"

 // Only download the first 50 KB
 scrapeableFile.setMaxResponseLength(50);

See Also

  • getMaxResponseLength() [scrapeableFile] - Returns the maximum response length that is read by the scrapeable file

setReferer

void scrapeableFile.setReferer ( String url ) (professional and enterprise editions only)

Description

Set referer HTTP header.

Parameters

  • url URL of the referer, as a string.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

This method must be called before the file is scraped.

Examples

Set the Referer for a Request

 // In script called "Before file is scraped"

 // Sets the value of url as the HTTP header
 // referer for the current scrapeable file.

 scrapeableFile.setReferer( "http://www.foo.com/" );

setRequestEntity

void scrapeableFile.setRequestEntity ( String requestEntity ) (professional and enterprise editions only)

Description

Set the POST payload data. This is particularly helpful when scraping some sites' AJAX implementations, where the payload is explicitly set as XML.

Parameters

  • requestEntity Desired content of the POST payload, as a string.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

This method must be called before the file is scraped.

This method is usually used in connection with setContentType, as that method specifies the type of the POST data.

Though you can set plain-text POST data using this method, it is preferable to use the addHTTPParameter method for that task.

Examples

Set POST data as XML

 // In script called "Before file is scraped"

 // Sets the type of the POST entity to XML.
 scrapeableFile.setContentType( "text/xml" );

 // Set content of POST data
 scrapeableFile.setRequestEntity( "<person><name>John Smith</name></person>" );

setRetainNonTidiedHTML

void scrapeableFile.setRetainNonTidiedHTML ( boolean retainNonTidiedHTML ) (enterprise edition only)

Description

Set whether or not non-tidied HTML is to be retained for the current scrapeable file.

Parameters

  • retainNonTidiedHTML Whether the non-tidied HTML should be retained, as a boolean. The default is false.

Return Values

Returns void.

Change Log

Version Description
4.5 Available for enterprise edition.

If, after the file is scraped, you want to be able to use getNonTidiedHTML this method has to be called before the file is scraped.

Examples

Retain Non-tidied HTML

 // In script called "Before file is scraped"

 // Tells screen-scraper to retain the non-tidied HTML for the
 // current scrapeable file.

 scrapeableFile.setRetainNonTidiedHTML( true );

setRetryPolicy

void scrapeableFile.setRetryPolicy ( RetryPolicy policy ) (professional and enterprise editions only)

Description

Sets a Retry Policy that will be run to check if a page should be re-downloaded or not. The policy will be checked after all the extractors have run, and will check for an error on the page based on a set of conditions. If the policy shows an error on the page, it can run scripts or other code to attempt to remedy the situation, and then it will rescrape the file.

The file will be re-downloaded without rerunning any of the scripts set to run before the file is downloaded, and before any of the scripts marked to run after the file is scraped are executed. If there are any changes that need to be made to session variables, headers, etc., they should be made in the script or runnable that the policy executes. The policy can also specify that session variables should be restored to their previous values before the file is rescraped; if it does, they will be reset after the error-checking portion of the policy runs, but before the policy makes its changes prior to a retry.

The retry policy should be set in a script run 'Before file is scraped', but it can also be set by a script on an extractor pattern. If it is set on an extractor pattern, session variables will not be restored when a retry is required.

Parameters

  • policy The policy that should be run. See the RetryPolicyFactory for standard policies, or create one by implementing the RetryPolicy interface.

Return Value

This method returns void.

Change Log

Version Description
5.5.29a Available in professional and enterprise editions.

Examples

Set a basic retry policy

 import com.screenscraper.util.retry.RetryPolicyFactory;

 // Use a policy that will retry up to 5 times, and on each failed attempt to load
 // the page, it will execute the "Get new Proxy" script

 scrapeableFile.setRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Get new Proxy"));

setUserAgent

void scrapeableFile.setUserAgent ( String userAgent ) (professional and enterprise editions only)

Description

Explicitly state the user agent making the request.

Parameters

  • userAgent User agent name, as a string. There are many possible user agents; a list is maintained by User-Agents.org. The default is Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322).

Return Values

Returns void.

Change Log

Version Description
4.5 Available for professional and enterprise editions.

This method must be called before the file is scraped.

Examples

Set User Agent

 // In script called "Before file is scraped"

 // Causes screen-scraper to identify itself as Firefox
 // running on Linux.

 scrapeableFile.setUserAgent( "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826" );

See Also

  • getUserAgent() [scrapeableFile] - Returns the name of the user agent that will make the request

wasErrorOnRequest

boolean scrapeableFile.wasErrorOnRequest ( )

Description

Determine if an error occurred with the request. Errors are considered to be server timeouts as well as any status code outside of the range 200-399.

Parameters

This method does not receive any parameters.

Return Values

Returns true for server timeouts as well as any status code outside of the range 200-399; otherwise, it returns false.

Change Log

Version Description
4.5 Available for all editions.

This method must be called after the file is scraped.

If you want to know what the status code was you can use getStatusCode.

Examples

Check for Request Errors

 // In script called "After file is scraped"

 // If an error occurred when the file was requested, an error
 // message indicating such gets output to the log.

 if( scrapeableFile.wasErrorOnRequest() )
 {
     session.log( "Connection error occurred." );
 }
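
Log the Status Code on Error

Since wasErrorOnRequest() flags server timeouts and any status code outside the range 200-399, it can be paired with getStatusCode (mentioned above) to record which code was actually returned. A minimal sketch in the style of the example above:

```java
 // In script called "After file is scraped"

 // Logs the specific HTTP status code when the request fails.
 if( scrapeableFile.wasErrorOnRequest() )
 {
     session.log( "Request failed with status code: " + scrapeableFile.getStatusCode() );
 }
```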