scrapeableFile
Overview
The scrapeableFile object refers to the current file being requested from a given server. It encapsulates both the request for a file and the response, and can be manipulated to meet any necessary requirements: GET and POST parameters, referer information, cookies, FILE parameters, HTTP headers, character set, and so on.
addGETHTTPParameter
void scrapeableFile.addGETHTTPParameter ( String key, String value, int sequence ) (professional and enterprise editions only)
Description
Dynamically adds a GET parameter to the URL of the current scrapeable file. If a parameter with the given sequence already exists, it will be replaced by the one created from this method call. Calling this method is the equivalent in the workbench of adding a parameter under the "Parameters" tab, and designating the type as GET. Once the scraping session is completed the original HTTP parameters (those under the "Parameters" tab in the workbench) will be restored.
Parameters
- key The key portion of the parameter. For example, if the parameter were foo=bar, the key portion would be "foo".
- value The value portion of the parameter. For example, if the parameter were foo=bar, the value portion would be "bar".
- sequence The sequence of the parameter (equivalent to the value under the "Sequence" column in the workbench).
Change Log
Version | Description
5.5.32a | Available in Professional and Enterprise editions.
Examples
Add a GET HTTP parameter to a scrapeable file
scrapeableFile.addGETHTTPParameter( "searchTerm", "LP player", 3 );
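The sequence argument controls where the parameter lands in the query string. As a rough, hypothetical sketch (plain Java with made-up helper names, not the scrapeableFile API or screen-scraper's actual implementation), sequenced parameters can be modeled as a map ordered by sequence number:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class QuerySketch {
    // Builds a query string from parameters keyed by sequence number;
    // the TreeMap iterates in ascending sequence order.
    static String buildQuery(TreeMap<Integer, Map.Entry<String, String>> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> p : params.values()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeMap<Integer, Map.Entry<String, String>> params = new TreeMap<>();
        params.put(3, Map.entry("searchTerm", "LP player"));
        params.put(1, Map.entry("category", "audio"));
        // Prints category=audio&searchTerm=LP+player
        System.out.println(buildQuery(params));
    }
}
```

Here the parameter with sequence 1 is emitted before the one with sequence 3, mirroring how the "Sequence" column orders parameters in the workbench.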
addHTTPHeader
void scrapeableFile.addHTTPHeader ( String key, String value ) (professional and enterprise editions only)
Description
Add an HTTP header to be sent along with the request.
Parameters
- key Name of the variable, as a string.
- value Value of the variable, as a string.
Return Values
Returns void. If you are not using professional or enterprise edition an error will be thrown.
Change Log
Version | Description
5.0 | Available for professional and enterprise editions.
4.5 | Available for enterprise edition.
In certain rare cases it may be necessary to explicitly add a custom header to an HTTP request. This may be required when a site uses AJAX and the POST payload of a request is sent as XML (e.g., using the setRequestEntity method). This method must be invoked before the HTTP request is made (e.g., in a script run "Before file is scraped" for a scrapeable file).
Examples
Add AJAX header
// In a script called "Before file is scraped"
// Add and set AJAX-Method header to true.
scrapeableFile.addHTTPHeader( "AJAX-Method", "true" );
addHTTPParameter
void scrapeableFile.addHTTPParameter ( HTTPParameter parameter )
Description
Dynamically add an HTTPParameter to the current scrapeable file.
Parameters
- parameter HTTPParameter object.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for all editions.
The HTTPParameter constructor is as follows: HTTPParameter( String key, String value, int sequence, String type ). Valid types for the constructor are GET, POST, and FILE. Calling this method will have no effect unless it's invoked before the file is scraped.
Examples
Add GET HTTP Parameter
// This would be in a script called "Before file is scraped"
// Create HTTP parameter "page" with a value of "3" in the first location (GET is default)
httpParameter = new com.screenscraper.common.HTTPParameter("page", "3", 1);
// Adds a new GET HTTP parameter to the current file.
scrapeableFile.addHTTPParameter( httpParameter );
Add POST HTTP Parameter
// This would be in a script called "Before file is scraped"
// Create HTTP parameter "page" with a value of "3" in the first location
httpParameter = new com.screenscraper.common.HTTPParameter("page", "3", 1, "POST");
// Adds a new POST HTTP parameter to the current file.
scrapeableFile.addHTTPParameter( httpParameter );
See Also
- removeHTTPParameter() [scrapeableFile] - Removes an HTTP Parameter from the request that will be made by the scrapeable file
- removeAllHTTPParameters() [scrapeableFile] - Remove all the HTTP Parameters from the request that will be made by the scrapeable file
addPOSTHTTPParameter
void scrapeableFile.addPOSTHTTPParameter ( String key, String value ) (professional and enterprise editions only)
void scrapeableFile.addPOSTHTTPParameter ( String key, String value, int sequence ) (professional and enterprise editions only)
Description
Dynamically adds a POST parameter to the existing set of POST parameters. If a parameter with the given sequence already exists, it will be replaced by the one created from this method call. If the overload that doesn't take a sequence is used, the new POST parameter will be given a sequence just higher than the highest existing sequence. Calling this method is the equivalent in the workbench of adding a parameter under the "Parameters" tab, and designating the type as POST. Once the scraping session is completed the original HTTP parameters (those under the "Parameters" tab in the workbench) will be restored.
Parameters
- key The key portion of the parameter. For example, if the parameter were foo=bar, the key portion would be "foo".
- value The value portion of the parameter. For example, if the parameter were foo=bar, the value portion would be "bar".
- sequence The sequence of the parameter (equivalent to the value under the "Sequence" column in the workbench).
Change Log
Version | Description
5.5.32a | Available in Professional and Enterprise editions.
Examples
Add a POST HTTP parameter to a scrapeable file
// Adds a POST parameter to the end of the existing set.
scrapeableFile.addPOSTHTTPParameter( "EVENTTARGET", session.getv( "EVENTTARGET" ) );
// Replaces the existing POST parameter with a sequence of 2 with a new one.
scrapeableFile.addPOSTHTTPParameter( "VIEWSTATE", session.getv( "VIEWSTATE" ), 2 );
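The replace-by-sequence behavior described above can be sketched in plain Java (hypothetical helper names, not the scrapeableFile API): a parameter added with an existing sequence overwrites that slot, while an add without a sequence lands just past the highest one.

```java
import java.util.TreeMap;

public class PostParamsSketch {
    // Parameters keyed by sequence, mimicking the "Parameters" tab ordering.
    static TreeMap<Integer, String> params = new TreeMap<>();

    // Adding with an explicit sequence replaces any parameter in that slot.
    static void add(String keyValue, int sequence) {
        params.put(sequence, keyValue);
    }

    // Adding without a sequence lands just past the highest existing one.
    static void add(String keyValue) {
        int next = params.isEmpty() ? 1 : params.lastKey() + 1;
        params.put(next, keyValue);
    }
}
```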
extractData
DataSet scrapeableFile.extractData ( String text, String extractorPatternName ) (professional and enterprise editions only)
Description
Manually apply an extractor pattern to a string.
Parameters
- text The string to which the extractor pattern will be applied.
- extractorPatternName Name of extractor pattern in the scrapeable file, as a string. Optionally the scraping session and scrapeable file where the extractor pattern can be found can be specified in the form [scraping session:][scrapeable file:]extractor pattern.
Return Values
Returns DataSet on success. Failures will be written out to the log as errors.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Extract DataSet
// Applies the "PRODUCT" extractor pattern to the text found in the
// productDescriptionText variable. The resulting DataSet from
// extractData is stored in the variable productData.
DataSet productData = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );
Loop Through DataRecords
// Expanded example using the "PRODUCT" extractor pattern to the text found in the
// productDescriptionText variable. The resulting DataSet from
// extractData is stored in the variable myDataSet, which has multiple dataRecords.
// Each myDataRecord has a PRICE and a PRODUCT_ID.
myDataSet = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );
for (i = 0; i < myDataSet.getNumDataRecords(); i++) {
myDataRecord = myDataSet.getDataRecord(i);
session.setVariable("PRICE", myDataRecord.get("PRICE"));
session.setVariable("PRODUCT_ID", myDataRecord.get("PRODUCT_ID"));
}
Extractor Pattern from another Scrapeable File
// Apply extractor pattern "PRODUCT" from "Another scrapeable file"
// to the variable productDescriptionText
DataSet productData = scrapeableFile.extractData( productDescriptionText, "Another scrapeable file:PRODUCT" );
Extractor Pattern from another Scraping Session
// Apply extractor pattern "PRODUCT" from "Another scrapeable file"
// in "Other scraping session" to the variable productDescriptionText
DataSet productData = scrapeableFile.extractData( productDescriptionText,
"Other scraping session:Another scrapeable file:PRODUCT" );
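Extractor patterns are conceptually close to regular expressions with named groups: each token corresponds to a group, and each match becomes a record. A rough plain-Java analogue (hypothetical names, not screen-scraper's actual implementation) of applying a pattern and collecting the matches:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractSketch {
    // Applies a regex with named groups to text and returns one map per
    // match, loosely analogous to a DataSet holding DataRecords.
    static List<Map<String, String>> extractAll(String regex, String[] tokens, String text) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            Map<String, String> record = new HashMap<>();
            for (String token : tokens) {
                record.put(token, m.group(token));
            }
            records.add(record);
        }
        return records;
    }
}
```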
extractOneValue
String scrapeableFile.extractOneValue ( String text, String extractorPatternName ) (professional and enterprise editions only)
String scrapeableFile.extractOneValue ( String text, String extractorPatternName, String extractorTokenName ) (professional and enterprise editions only)
Description
Manually retrieve the value of a single extractor token.
Parameters
- text The string to which the extractor pattern will be applied.
- extractorPatternName Name of extractor pattern in the scrapeable file, as a string. Optionally the scraping session and scrapeable file where the extractor pattern can be found can be specified in the form [scraping session:][scrapeable file:]extractor pattern.
- extractorTokenName (optional) Extractor token name, as a string, whose matched value should be returned. If left off, the matched value for the first extractor token in the data set will be returned.
Return Values
Returns the match from the last data record, as a string, on success. On failure it returns null and writes an error to the log.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
If you want the value from the first data record instead, use extractData and call getDataRecord on the resulting DataSet.
Examples
Extract Value
// Applies the extractor pattern "Product Name" to the data found in
// the variable productDescriptionText. The extracted string is
// stored in the productName variable.
// Returns the value found in the first token found in the extractor pattern
// or null if no token is found.
productName = scrapeableFile.extractOneValue( productDescriptionText, "Product Name" );
Extract Value of Specified Token
// Applies the extractor pattern "Product Name" to the data found in
// the variable productDescriptionText. The extracted string is
// stored in the productName variable.
// Returns the value found in the token "NAME" found in the extractor pattern
// or null if no token is found.
productName = scrapeableFile.extractOneValue( productDescriptionText, "Product Name", "NAME" );
Extractor Pattern from another Scrapeable File
// Apply extractor pattern "Product Name" from "Another scrapeable file"
// to the variable productDescriptionText return the first "NAME"
String productName = scrapeableFile.extractOneValue( productDescriptionText, "Another scrapeable file:Product Name", "NAME" );
Extractor Pattern from another Scraping Session
// Apply extractor pattern "Product Name" from "Another scrapeable file"
// in "Other scraping session" to the variable productDescriptionText
// return the first "NAME"
String productName = scrapeableFile.extractOneValue( productDescriptionText,
"Other scraping session:Another scrapeable file:Product Name",
"NAME" );
getASPXValues
DataRecord scrapeableFile.getASPXValues ( boolean onlyStandard ) (professional and enterprise editions only)
Description
Gets the ASPX .NET values from the response. The standard values are __VIEWSTATE, __EVENTTARGET, __EVENTVALIDATION, and __EVENTARGUMENT. Values will be stored in the returned DataRecord as ASPX_VIEWSTATE, ASPX_EVENTTARGET, etc.
Parameters
- onlyStandard Whether to retrieve only the four standard values, or to look for any parameters whose names begin with __.
Return Values
A DataRecord object with each ASPX name stored as ASPX_[NAME] mapped to its value. Note that when onlyStandard is false, any parameter whose name starts with __ will be returned in this DataRecord.
Change Log
Version | Description
5.5.26a | Available in all editions.
Examples
Get the .NET values for a page
DataRecord aspx = scrapeableFile.getASPXValues(true);
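Conceptually this amounts to pulling hidden-input values whose names begin with __ out of the page and renaming them. A minimal, hypothetical plain-Java sketch (not the actual implementation, and deliberately naive about HTML parsing):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AspxSketch {
    // Pulls hidden-input values whose names start with "__" out of HTML,
    // renaming e.g. __VIEWSTATE to ASPX_VIEWSTATE as described above.
    static Map<String, String> aspxValues(String html) {
        Map<String, String> values = new HashMap<>();
        Matcher m = Pattern
            .compile("name=\"__(\\w+)\"[^>]*value=\"([^\"]*)\"")
            .matcher(html);
        while (m.find()) {
            values.put("ASPX_" + m.group(1), m.group(2));
        }
        return values;
    }
}
```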
getAuthenticationPreemptive
boolean scrapeableFile.getAuthenticationPreemptive ( )
Description
Retrieve the authentication expectation of the request.
Parameters
This method does not receive any parameters.
Return Values
Returns whether the scrapeable file expects to have to authenticate and so will send the information initially instead of waiting for the request for it, as a boolean.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Write Expectation Status to Log
// Log expectation of authentication
if ( scrapeableFile.getAuthenticationPreemptive() )
{
session.log( "Expecting Authentication" );
}
getCharacterSet
String scrapeableFile.getCharacterSet ( )
Description
Get the character set being used in the page response rendering.
Parameters
This method does not receive any parameters.
Return Values
Returns the character set applied to the scraped page, as a string. If a character set has not been specified then it will default to the character set specified in the settings dialog box.
Change Log
Version | Description
4.5 | Available for all editions.
If you are having trouble with characters displaying incorrectly, we encourage you to read one of our FAQs on finding a solution.
Examples
Get Character Set
// Get the character set of the dataSet
charSetValue = scrapeableFile.getCharacterSet();
See Also
- setCharacterSet() [scrapeableFile] - Set the character set used to render responses to a specific scrapeable file.
- setCharacterSet() [session] - Set the character set used to render all responses.
- getCharacterSet() [session] - Gets the character set used to render all responses.
getContentAsString
String scrapeableFile.getContentAsString ( )
Description
Retrieve contents of the response.
Parameters
This method does not receive any parameters.
Return Values
Returns contents of the last response, as a string. If the file has not been scraped it will return an empty string.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Log Response
// In a script run "After file is scraped"
// Sends the HTML of the current file to the log.
session.log( scrapeableFile.getContentAsString() );
getContentType
String scrapeableFile.getContentType ( )
Description
Retrieve the content type of the request. This can be important when scraping some sites' implementations of AJAX, where the POST payload is explicitly set as XML.
Parameters
This method does not receive any parameters.
Return Values
Returns the content type, as a string (e.g., text/html or text/xml).
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Write Content Type to Log
// Write to log
session.log( "Content Type: " + scrapeableFile.getContentType() );
getCurrentPOSTData
String scrapeableFile.getCurrentPOSTData ( )
Description
Retrieve the POST data.
Parameters
This method does not receive any parameters.
Return Values
Returns the POST data for the scrapeable file, as a string. If called after the file has been scraped, the session variable tokens will be resolved to their values; otherwise, the tokens will simply be removed from the string.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Collect POST data
// In script called "After file is scraped"
// Stores the POST data from the scrapeable file in the
// currentPOSTData variable.
currentPOSTData = scrapeableFile.getCurrentPOSTData();
getCurrentURL
String scrapeableFile.getCurrentURL ( )
Description
Get the URL of the file.
Parameters
This method does not receive any parameters.
Return Values
Returns the URL of the scrapeable file, as a string. If called after the file has been scraped the session variable tokens will be resolved to their values; otherwise, the tokens will simply be removed from the string.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Collect URL
// In script called "After file is scraped"
// Stores the current URL in the variable currentURL.
currentURL = scrapeableFile.getCurrentURL();
getExtractorPatternTimedOut
boolean scrapeableFile.getExtractorPatternTimedOut () (professional and enterprise editions only)
Description
Indicates whether or not the most recent extractor pattern application timed out.
Change Log
Version | Description
5.5.36a | Available in all editions.
Examples
Find out about the last extractor pattern attempt
if( scrapeableFile.getExtractorPatternTimedOut() )
{
session.log( "Most recent extractor pattern timed out." );
}
getForceNonBinary
boolean scrapeableFile.getForceNonBinary ( )
Description
Determine whether or not the contents of this response are being forced to be recognized as non-binary.
Parameters
This method does not receive any parameters.
Return Values
Returns true if the scrapeable file is being forced to be treated as non-binary; otherwise, it returns false.
Change Log
Version | Description
5.0 | Added for all editions.
Examples
Check Binary Status of File
// Determine if the file is being forced
// to be recognized as non-binary
forced = scrapeableFile.getForceNonBinary();
See Also
- setForceNonBinary() [scrapeableFile] - Sets whether or not the contents of the file are forced to be interpreted as non-binary
getHTTPResponseHeader
String scrapeableFile.getHTTPResponseHeader ( String header ) (professional and enterprise editions only)
Description
Gets the value of the given header in the response of the scrapeable file, or returns null if it couldn't be found.
Parameters
- header The header name (case-insensitive) to get
Return Value
The value of the header, or null if not found
Change Log
Version | Description
5.5.29a | Available in professional and enterprise editions.
Examples
Log the Content-Type
session.log(scrapeableFile.getHTTPResponseHeader("Content-Type"));
getHTTPResponseHeaderSection
String scrapeableFile.getHTTPResponseHeaderSection ( ) (professional and enterprise editions only)
Description
Gets the header section of the HTTP Response
Parameters
This method takes no parameters
Return Value
A String containing the HTTP Response Headers
Change Log
Version | Description
5.5.29a | Available in professional and enterprise editions.
Examples
Log the headers
// Split the headers into lines
String[] headers = scrapeableFile.getHTTPResponseHeaderSection().split("[\\r\\n]");
for(int i = 0; i < headers.length; i++)
{
session.log(headers[i]);
}
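The raw header section can also be parsed into name/value pairs. A minimal plain-Java sketch (a hypothetical helper, not part of the scrapeableFile API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderSketch {
    // Parses a raw header section (one "Name: value" per line) into a map.
    // The status line and blank lines contain no colon at a valid
    // position, so they are skipped.
    static Map<String, String> parseHeaders(String section) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : section.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }
}
```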
getHTTPResponseHeaders
Map<String, String> scrapeableFile.getHTTPResponseHeaders ( ) (professional and enterprise editions only)
Description
Gets the headers of the HTTP Response as a map, and returns them.
Parameters
This method takes no parameters
Return Value
A Map from header name to header value for the response headers.
Change Log
Version | Description
5.5.29a | Available in professional and enterprise editions.
Examples
Get the Content-Type header
Map headers = scrapeableFile.getHTTPResponseHeaders();
Iterator it = headers.keySet().iterator();
while(it.hasNext())
{
String next = (String) it.next();
if(next.equalsIgnoreCase("Content-Type"))
session.log("Content-Type was: " + headers.get(next));
}
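The loop above scans for a header name case-insensitively. A generic Java idiom that gives the same behavior directly is to copy the map into a TreeMap with a case-insensitive comparator (this is standard-library behavior, not part of the scrapeableFile API):

```java
import java.util.Map;
import java.util.TreeMap;

public class CaseInsensitiveHeaders {
    // Copies a header map into a TreeMap with a case-insensitive
    // comparator, so "content-type" and "Content-Type" hit the same entry.
    static Map<String, String> caseInsensitive(Map<String, String> headers) {
        Map<String, String> ci = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        ci.putAll(headers);
        return ci;
    }
}
```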
getLastTidyAttemptFailed
boolean scrapeableFile.getLastTidyAttemptFailed ()
Description
Indicates whether or not the most recent attempt to tidy the HTML failed.
Change Log
Version | Description
5.5.36a | Available in all editions.
Examples
Find out about the last HTML tidy attempt
if( scrapeableFile.getLastTidyAttemptFailed() )
{
session.log( "Most recent tidy attempt failed." );
}
getMaxRequestAttemptsReached
boolean scrapeableFile.getMaxRequestAttemptsReached () (professional and enterprise editions only)
Description
Indicates whether or not the maximum attempts to request a given scrapeable file were reached.
Change Log
Version | Description
5.5.36a | Available in all editions.
Examples
Find out about the last request attempt
if( scrapeableFile.getMaxRequestAttemptsReached() )
{
session.log( "Maximum request attempts were reached." );
}
getMaxResponseLength
int scrapeableFile.getMaxResponseLength ( )
Description
Retrieve the kilobyte limit for data retrieved by the scrapeable file; any content beyond that limit will not be retrieved.
Parameters
This method does not receive any parameters.
Return Values
Returns the current kilobyte limit on the response, as an integer.
Change Log
Version | Description
5.0 | Added for professional and enterprise editions.
Examples
Log Response Size Limit
// Log Limit
session.log( "Max Response Length: " + scrapeableFile.getMaxResponseLength() + " KB" );
See Also
- setMaxResponseLength() [scrapeableFile] - Sets the maximum number of kilobytes that will be retrieved by the scrapeable file
getName
String scrapeableFile.getName ( )
Description
Get the name of the scrapeable file.
Parameters
This method does not receive any parameters.
Return Values
Returns the name of the scrapeable file, as a string.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Write Scrapeable File Name to Log
// Outputs the name of the scrapeable file to the log.
session.log( "Current scrapeable file: " + scrapeableFile.getName() );
getNonTidiedHTML
String scrapeableFile.getNonTidiedHTML ( ) (enterprise edition only)
Description
Retrieve the non-tidied HTML of the scrapeable file.
Parameters
This method does not receive any parameters.
Return Values
Returns the non-tidied contents of the scrapeable file, as a string. On failure it returns null.
Change Log
Version | Description
4.5 | Available for enterprise edition.
By default non-tidied HTML is not retained. For this method to return anything other than null you must use setRetainNonTidiedHTML to force the non-tidied HTML to be retained.
Examples
Write Untidied HTML to Log if Retained
// Outputs the non-tidied HTML from the scrapeable file
// to the log based on whether it was retained or not.
if (scrapeableFile.getRetainNonTidiedHTML())
{
session.log( "Non-tidied HTML: " + scrapeableFile.getNonTidiedHTML() );
}
else
{
session.log( "The non-tidied HTML was not retained or the file has not yet been scraped." );
}
getRedirectURLs
String[] scrapeableFile.getRedirectURLs ( ) (professional and enterprise editions only)
Description
Gets an array of strings containing the redirect URLs for the current scrapeable file request attempt.
Parameters
This method does not receive any parameters.
Return Values
Returns the array of strings; may be empty.
Change Log
Version | Description
6.0.24a | Available in Professional and Enterprise editions.
getRetainNonTidiedHTML
boolean scrapeableFile.getRetainNonTidiedHTML ( ) (enterprise edition only)
Description
Determine if the scrapeable file is set to retain non-tidied html.
Parameters
This method does not receive any parameters.
Return Values
Returns a boolean flag indicating whether non-tidied contents will be retained.
Change Log
Version | Description
4.5 | Available for enterprise edition.
Examples
Write Untidied HTML to Log if Retained
// Outputs the non-tidied HTML from the scrapeable file
// to the log if it was retained otherwise just a message.
if (scrapeableFile.getRetainNonTidiedHTML())
{
session.log( "Non-tidied HTML: " + scrapeableFile.getNonTidiedHTML() );
}
else
{
session.log( "The non-tidied HTML was not retained or the file has not yet been scraped." );
}
getRetryPolicy
RetryPolicy scrapeableFile.getRetryPolicy ( ) (professional and enterprise editions only)
Description
Returns the retry policy. Note that in any "After file is scraped" scripts this is null.
Parameters
This method takes no parameters.
Return Value
The Retry Policy that will be used by this scrapeable file
Change Log
Version | Description
5.5.29a | Available in professional and enterprise editions.
Examples
Check for a retry policy
if(scrapeableFile.getRetryPolicy() != null)
{
 session.log(scrapeableFile.getName() + ": A retry policy has been set for this scrapeable file.");
}
getStatusCode
int scrapeableFile.getStatusCode ( ) (professional and enterprise editions only)
Description
Determine the HTTP status code sent by the server.
Parameters
This method does not receive any parameters.
Return Values
Returns integer corresponding to the HTTP status code of the response.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Write warning to log on 404 error
// Check for a 404 response (file not found).
if( scrapeableFile.getStatusCode() == 404 )
{
url = scrapeableFile.getCurrentURL();
session.log( "Warning! The server returned a 404 response for the url ( " + url + ")." );
}
getUserAgent
String scrapeableFile.getUserAgent ( )
Description
Retrieve the name of the user agent making the request.
Parameters
This method does not receive any parameters.
Return Values
Returns the user agent, as a string.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Write User Agent to Log
// write to log
session.log( scrapeableFile.getUserAgent( ) );
See Also
- setUserAgent() [scrapeableFile] - Sets the name of the user agent that will make the request
inputOutputErrorOccurred
boolean scrapeableFile.inputOutputErrorOccurred ( )
Description
Determine if an input or output error occurred when requesting file.
Parameters
This method does not receive any parameters.
Return Values
Returns true if an error has occurred; otherwise, it returns false.
Change Log
Version | Description
5.0 | Added for all editions.
This method should be run after the scrapeable file has been scraped.
Examples
End scrape on Error
// Check for error
if (scrapeableFile.inputOutputErrorOccurred())
{
// Log error occurrence
session.log("Input/output error occurred.");
// End scrape
session.stopScraping();
}
noExtractorPatternsMatched
boolean scrapeableFile.noExtractorPatternsMatched ( )
Description
Determine whether none of the extractor patterns associated with the scrapeable file found a match.
Parameters
This method does not receive any parameters.
Return Values
Returns true if no extractor patterns in the scrapeable file found a match; otherwise, returns false.
Change Log
Version | Description
4.5 | Available for all editions.
Examples
Warning if no Extractor Patterns matched
// If no patterns matched, outputs a message indicating such
// to the session log.
if( scrapeableFile.noExtractorPatternsMatched() )
{
session.log( "Warning! No extractor patterns matched." );
}
removeAllHTTPParameters
void scrapeableFile.removeAllHTTPParameters ( ) (professional and enterprise editions only)
Description
Remove all of the HTTP parameters from the current scrapeable file.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Delete HTTP Parameters
// Removes all of the HTTP parameters from the current scrapeable file.
scrapeableFile.removeAllHTTPParameters();
See Also
- removeHTTPParameter() [scrapeableFile] - Removes an HTTP Parameter from the request that will be made by the scrapeable file
- addHTTPParameter() [scrapeableFile] - Add an HTTP Parameter to the request that will be made by the scrapeable file
removeHTTPHeader
void scrapeableFile.removeHTTPHeader ( String key ) (enterprise edition only)
void scrapeableFile.removeHTTPHeader ( String key, String value ) (enterprise edition only)
Description
Remove an HTTP header from a scrapeable file.
Parameters
- key The name of the HTTP header to be removed, as a string.
- value (optional) The value of the HTTP header that is to be removed, as a string. If this is left off then all headers of the specified key will be removed.
Return Values
Returns void.
Change Log
Version | Description
5.0.5a | Introduced for enterprise edition.
Examples
Remove All Values of a Header
// Delete all "Cookie" headers for this scrapeable file.
// The same can be done on a global scale
// using session.clearCookies.
scrapeableFile.removeHTTPHeader( "Cookie" );
See Also
- addHTTPHeader() [scrapeableFile] - Adds an HTTP Header to the scrapeable file
removeHTTPParameter
void scrapeableFile.removeHTTPParameter ( int sequence )
void scrapeableFile.removeHTTPParameter ( String key ) (professional and enterprise editions only)
Description
Dynamically removes an HTTP parameter. The order of the remaining parameters is adjusted immediately.
Parameters
- sequence The ordered location of the parameter.
- key The key identifying the HTTP parameter to be removed.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for all editions.
5.5.32a | Added method call that takes a String. Available for Professional and Enterprise editions.
If calling this method more than once in the same script, when used in conjunction with the addHTTPParameter method, it is important to keep track of how the list is reordered before calling either method again.
Calling this method will have no effect unless it's invoked before the file is scraped.
This method can be used for both GET and POST parameters.
Examples
Remove HTTP parameter
// In a script called "Before file is scraped"
// Removes the eighth HTTP parameter from the current file.
scrapeableFile.removeHTTPParameter( 8 );
See Also
- addHTTPParameter() [scrapeableFile] - Adds an HTTP Parameter to the request that will be made by the scrapeable file
- removeAllHTTPParameters() [scrapeableFile] - Remove all the HTTP Parameters from the request that will be made by the scrapeable file
resequenceHTTPParameter
void scrapeableFile.resequenceHTTPParameter ( String key, int sequence ) (professional and enterprise editions only)
Description
Resequences an HTTP parameter.
Parameters
- key The key identifying the HTTP parameter to be resequenced.
- sequence The new sequence the parameter should have.
Change Log
Version | Description
5.5.32a | Available in Professional and Enterprise editions.
Examples
Resequence an HTTP parameter
// Give the "VIEWSTATE" HTTP parameter a sequence of 3.
scrapeableFile.resequenceHTTPParameter( "VIEWSTATE", 3 );
resolveRelativeURL
String scrapeableFile.resolveRelativeURL ( String urlToResolve ) (professional and enterprise editions only)
Description
Resolves a relative URL to an absolute URL based on the current URL of this scrapeable file.
Parameters
- urlToResolve Relative file path, as a string.
Return Values
Returns a string containing the complete URL to the file. On failure it will return the relative path, and an error will be written to the log.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
Examples
Resolve relative URL into an absolute URL
// Assuming the URL of the current scrapeable file is
// "https://www.screen-scraper.com/path/to/file/"
// the method call would result in the URL
// "https://www.screen-scraper.com/path/to/file/thisfile.php"
// being assigned to the "fullURL" variable.
fullURL = scrapeableFile.resolveRelativeURL( "thisfile.php" );
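The same RFC 3986 resolution rules are available in the Java standard library via java.net.URI, which can be handy for checking what a resolved URL should look like (a generic sketch, not the scrapeableFile implementation):

```java
import java.net.URI;

public class ResolveSketch {
    // Resolves a relative URL against a base URL using the standard
    // library's RFC 3986 resolution rules.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Prints https://www.screen-scraper.com/path/to/file/thisfile.php
        System.out.println(resolve("https://www.screen-scraper.com/path/to/file/", "thisfile.php"));
    }
}
```

Note that an absolute path such as "/other.php" resolves against the site root rather than the current directory, which matches standard browser behavior.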
saveFileBeforeTidying
void scrapeableFile.saveFileBeforeTidying ( String filePath ) (professional and enterprise editions only)
Description
Write non-tidied contents of the scrapeable file response to a text file.
Parameters
- filePath File path, as a string, where the file should be saved.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
This method must be called before the file is scraped.
Because the response headers are also saved in the file, the saved file will not be valid unless it is a text file (e.g., images and PDFs will be corrupted).
Examples
Save Untidied Request and Response
// In script called "Before file is scraped"
// Causes the non-tidied HTML from the scrapeable file
// to be output to the file path.
scrapeableFile.saveFileBeforeTidying( "C:/non-tidied.html" );
saveFileOnRequest
void scrapeableFile.saveFileOnRequest ( String filePath ) (enterprise edition only)
Description
Save the file returned from a scrapeable file request.
Parameters
- filePath Location where the file should be saved as a string.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for enterprise edition.
This method must be called from a scrapeable file before the file is scraped. Do not call this method from a script which is invoked by other means such as after an extractor pattern match or from within another script.
It is preferable to use downloadFile; however, at times you may have to send POST parameters in order to access a file. If that is the case, you would use this method.
This method cannot save local file requests to another location.
Examples
Save requested file
// In script called "Before file is scraped"
// When the current file is requested it will be saved to the
// local file system as "sample.pdf".
scrapeableFile.saveFileOnRequest( "C:/downloaded_files/sample.pdf" );
setAuthenticationPreemptive
void scrapeableFile.setAuthenticationPreemptive ( boolean preemptiveAuthentication )
Description
Set the authentication expectation of the request.
Parameters
- preemptiveAuthentication Whether the scrapeable file expects to authenticate and so sends the credentials with the initial request rather than waiting for the server to ask for them, as a boolean.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Available for all editions.
Examples
Set Preemptive Authentication
// Set expectation of authentication
scrapeableFile.setAuthenticationPreemptive( false );
setCharacterSet
void scrapeableFile.setCharacterSet ( String characterSet ) (professional and enterprise editions only)
Description
Set the character set used in a specific scrapeable file's response renderings. This can be particularly helpful when the page renders characters incorrectly.
Parameters
- characterSet Java recognized character set, as a string. Java provides a list of supported character sets in its documentation.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for all editions.
This method must be called before the file is scraped.
If you are having trouble with characters displaying incorrectly, we encourage you to consult our FAQ on resolving character-set issues.
Examples
Set Character Set of Scrapeable File
// In script called "Before file is scraped"
// Sets the character set to be applied to the last response.
scrapeableFile.setCharacterSet( "ISO-8859-1" );
See Also
- getCharacterSet() [scrapeableFile] - Gets the character set used to render responses to a specific scrapeable file.
- setCharacterSet() [session] - Set the character set used to render all responses.
- getCharacterSet() [session] - Gets the character set used to render all responses.
setContentType
void scrapeableFile.setContentType ( String contentType ) (professional and enterprise editions only)
Description
Set the POST payload type. This is particularly helpful when scraping some sites' implementations of AJAX, where the payload is explicitly set as XML.
Parameters
- contentType Desired content type of the POST payload, as a string.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
This method must be called before the file is scraped.
This method is usually used in conjunction with setRequestEntity, which specifies the content of the POST data.
Examples
Set Content Type for XML payload in AJAX
// In script called "Before file is scraped"
// Sets the type of the POST entity to XML.
scrapeableFile.setContentType( "text/xml" );
// Set content of POST data
scrapeableFile.setRequestEntity( "<person><name>John Smith</name></person>" );
setForceMultiPart
void scrapeableFile.setForceMultiPart ( boolean forceMultiPart ) (professional and enterprise editions only)
Description
Set the Content-Type header to multipart/form-data.
Parameters
- forceMultiPart Boolean representing whether the request contains multipart data (e.g. images, files) as opposed to plain text. The default is false.
Return Values
Returns void.
Change Log
Version |
Description |
4.5 |
Available for professional and enterprise editions. |
This method must be called before the file is scraped.
Occasionally a site will expect a multi-part request even when a file is not being sent in the request.
If you include a file upload parameter under the parameters tab of the scrapeable file, the request will automatically be multi-part.
Examples
Specify that Request contains Files
// In script called "Before file is scraped"
// Will cause the request to be made as a multi-part request.
scrapeableFile.setForceMultiPart( true );
setForceNonBinary
void scrapeableFile.setForceNonBinary ( boolean forceNonBinary )
Description
Set whether or not the contents of this response should be forced to be treated as non-binary. The default is false.
Parameters
- forceNonBinary Whether or not the scrapeable file should be forced to be non-binary.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Available for all editions.
This is provided for cases where screen-scraper misidentifies a non-binary file as binary. It doesn't happen often, but it is possible.
Examples
Force Non-binary Treatment of File
// Force file to be recognized as non-binary
scrapeableFile.setForceNonBinary( true );
See Also
- getForceNonBinary() [scrapeableFile] - Returns whether or not this scrapeable file response will be forced to be treated as non-binary
setForcePOST
void scrapeableFile.setForcePOST ( Boolean forcePOST ) (professional and enterprise editions only)
Description
Sets whether or not a POST request should be forced.
Parameters
- forcePOST Whether the request should be forced to be a POST, as a boolean.
Return Values
Returns void.
Change Log
Version | Description
6.0.14a | Available in Professional and Enterprise editions.
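Examples
Force a POST request
A minimal sketch mirroring the other boolean setters on this object; like them, it would be called before the file is scraped.
// In script called "Before file is scraped"
// Forces the request to be issued as a POST.
scrapeableFile.setForcePOST( true );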
setForcedRequestType
void scrapeableFile.setForcedRequestType ( ScrapeableFile.RequestType type ) (professional and enterprise editions only)
Description
Sets the request type to use.
Parameters
- type The type of request to issue, or null to let screen-scraper decide.
ScrapeableFile.RequestType is an enum with the following values:
- GET
- POST
- PUT
- HEAD
- DELETE
- OPTIONS
When the request type is forced to one of these, all parameters set as GET in the parameters tab will be appended to the URL (as normal) and all parameters set as POST parameters will be used to build the request entity. If there are POST values on a type that doesn't support a request entity, an exception will be thrown when the request is issued.
Return Values
Returns void.
Change Log
Version | Description
6.0.55a | Available in Professional and Enterprise editions.
Examples
Sets the request type
scrapeableFile.setForcedRequestType( ScrapeableFile.RequestType.PUT );
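Passing null reverts to the default behavior, letting screen-scraper decide the request type:
// Clear any forced request type.
scrapeableFile.setForcedRequestType( null );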
setLastScrapedData
void scrapeableFile.setLastScrapedData ( String lastScrapedData ) (enterprise edition only)
Description
Overwrite the content of the "last response".
Parameters
- lastScrapedData Desired new content of the last response, as a string.
Return Values
Returns void.
This method must be called from an extractor pattern before the pattern is run.
Examples
Replace new line characters with a space
newLastResponse = scrapeableFile.getContentAsString().replaceAll( "\\n", " " );
scrapeableFile.setLastScrapedData( newLastResponse );
setMaxResponseLength
void scrapeableFile.setMaxResponseLength ( int maxKBytes ) (professional and enterprise editions only)
Description
Limit the amount of information retrieved by the scrapeable file. This method can be useful in cases of very large responses where the desired information is found in the first portion of the response. It can also help to make the scraping process more efficient by only downloading the needed information.
Parameters
- maxKBytes Kilobytes to be downloaded, as an integer.
Return Values
Returns void.
Change Log
Version | Description
5.0 | Available for professional and enterprise editions.
This method must be called before the file is scraped.
Examples
Limit Response Size
// In script called "Before file is scraped"
// Only download the first 50 KB
scrapeableFile.setMaxResponseLength(50);
See Also
- getMaxResponseLength() [scrapeableFile] - Returns the maximum response length that is read by the scrapeable file
setReferer
void scrapeableFile.setReferer ( String url ) (professional and enterprise editions only)
Description
Set the referer HTTP header.
Parameters
- url URL of the referer, as a string.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
This method must be called before the file is scraped.
Examples
Set Referer Header
// In script called "Before file is scraped"
// Sets the value of url as the HTTP header
// referer for the current scrapeable file.
scrapeableFile.setReferer( "http://www.foo.com/" );
setRequestEntity
void scrapeableFile.setRequestEntity ( String requestEntity ) (professional and enterprise editions only)
Description
Set the POST payload data. This is particularly helpful when scraping some sites' implementations of AJAX, where the payload is explicitly set as XML.
Parameters
- requestEntity Desired content of the POST payload, as a string.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
This method must be called before the file is scraped.
This method is usually used in conjunction with setContentType, which specifies the type of the POST data.
Though you can set plain-text POST data using this method, it is preferable to use the addHTTPParameter method for this task.
Examples
Set POST data as XML
// In script called "Before file is scraped"
// Sets the type of the POST entity to XML.
scrapeableFile.setContentType( "text/xml" );
// Set content of POST data
scrapeableFile.setRequestEntity( "<person><name>John Smith</name></person>" );
setRetainNonTidiedHTML
void scrapeableFile.setRetainNonTidiedHTML ( boolean retainNonTidiedHTML ) (enterprise edition only)
Description
Set whether or not non-tidied HTML is to be retained for the current scrapeable file.
Parameters
- retainNonTidiedHTML Whether the non-tidied HTML should be retained, as a boolean. The default is false.
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for enterprise edition.
If you want to be able to use getNonTidiedHTML after the file is scraped, this method has to be called before the file is scraped.
Examples
Retain Non-tidied HTML
// In script called "Before file is scraped"
// Tells screen-scraper to retain non-tidied HTML for the current
// scrapeable file.
scrapeableFile.setRetainNonTidiedHTML( true );
setRetryPolicy
void scrapeableFile.setRetryPolicy ( RetryPolicy policy ) (professional and enterprise editions only)
Description
Sets a retry policy that determines whether a page should be re-downloaded. The policy is checked after all the extractors have run, and looks for an error on the page based on a set of conditions. If the policy detects an error, it can run scripts or other code to attempt to remedy the situation, and it will then rescrape the file.
The file will be re-downloaded without rerunning any of the scripts that run before the file is scraped or any of the scripts marked to run after it. Any changes that need to be made to session variables, headers, etc. should therefore be made in the script or runnable that the policy executes. The policy can also specify that session variables should be restored to their previous values before the file is rescraped; if it does, they are reset after the error-checking portion of the policy but before the policy runs the code that makes changes before a retry.
The retry policy should be set in a script run "Before file is scraped", but can also be set by a script on an extractor pattern. If it is set on an extractor pattern, session variables will not be restored when a retry is required.
Parameters
- policy The policy that should be run. See the RetryPolicyFactory for standard policies, or create one by implementing the RetryPolicy interface.
Return Value
This method returns void.
Change Log
Version | Description
5.5.29a | Available in professional and enterprise editions.
Examples
Set a basic retry policy
import com.screenscraper.util.retry.RetryPolicyFactory;
// Use a policy that will retry up to 5 times, and on each failed attempt to load
// the page, it will execute the "Get new Proxy" script
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Get new Proxy"));
setUserAgent
void scrapeableFile.setUserAgent ( String userAgent ) (professional and enterprise editions only)
Description
Explicitly state the user agent making the request.
Parameters
- userAgent User agent name, as a string. There are many possible user agents; a list is maintained by User-Agents.org. The default is Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322).
Return Values
Returns void.
Change Log
Version | Description
4.5 | Available for professional and enterprise editions.
This method must be called before the file is scraped.
Examples
Set User Agent
// In script called "Before file is scraped"
// Causes screen-scraper to identify itself as Firefox
// running on Linux.
scrapeableFile.setUserAgent( "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826" );
See Also
- getUserAgent() [scrapeableFile] - Returns the name of the user agent that will make the request
wasErrorOnRequest
boolean scrapeableFile.wasErrorOnRequest ( )
Description
Determine if an error occurred with the request. Errors are considered to be server timeouts as well as any status code outside of the range 200-399.
Parameters
This method does not receive any parameters.
Return Values
Returns true for server timeouts as well as any status code outside of the range 200-399; otherwise, it returns false.
Change Log
Version | Description
4.5 | Available for all editions.
This method must be called after the file is scraped.
If you want to know what the status code was, you can use getStatusCode.
Examples
Check for Request Errors
// In script called "After file is scraped"
// If an error occurred when the file was requested, an error
// message indicating such gets output to the log.
if( scrapeableFile.wasErrorOnRequest() )
{
session.log( "Connection error occurred." );
}
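To see the actual status code when a request fails, the check can be combined with getStatusCode (mentioned above); a minimal sketch:
// In script called "After file is scraped"
// Logs the HTTP status code when the request failed.
if( scrapeableFile.wasErrorOnRequest() )
{
session.log( "Request failed with status code: " + scrapeableFile.getStatusCode() );
}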