scrapeableFile
setRequestEntity
Description
Set the POST payload data. This is particularly helpful when scraping sites whose AJAX implementations explicitly send the payload as XML.
setReferer
Description
Set the Referer HTTP header.
Parameters
- url URL of the referer, as a string.
Return Values
Returns void.
setContentType
Description
Set the POST payload type. This is particularly helpful when scraping sites whose AJAX implementations explicitly send the payload as XML.
saveFileBeforeTidying
Description
Write non-tidied contents of the scrapeable file response to a text file.
Parameters
- filePath File path, as a string, where the file should be saved.
Return Values
Returns void.
wasErrorOnRequest
Description
Determine whether an error occurred with the request. A request counts as an error if the server timed out or if it returned a status code outside the range 200-399.
Parameters
This method does not receive any parameters.
Return Values
Returns true for server timeouts as well as any status code outside of the range 200-399; otherwise, it returns false.
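The rule described above can be sketched as a small standalone predicate. This only illustrates the documented rule in Python and is not screen-scraper's implementation; the function name `is_request_error` and its `status_code` and `timed_out` parameters are hypothetical.

```python
def is_request_error(status_code, timed_out=False):
    """Mirror the documented rule: timeouts and statuses outside 200-399 are errors."""
    # A timeout is an error regardless of any status code (which may be absent).
    if timed_out:
        return True
    # Any status code outside the inclusive range 200-399 is an error.
    return not (200 <= status_code <= 399)
```

Under this sketch, a 404 or 500 response counts as an error, while a 200 or a 302 redirect does not.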
saveFileOnRequest
Description
Save the file returned from a scrapeable file request.
Parameters
- filePath File path, as a string, where the file should be saved.
Return Values
Returns void.
removeAllHTTPParameters
Description
Remove all of the HTTP parameters from the current scrapeable file.
Parameters
This method does not receive any parameters.
Return Values
Returns void.
noExtractorPatternsMatched
Description
Determine whether any extractor patterns associated with the scrapeable file found a match.
Parameters
This method does not receive any parameters.
Return Values
Returns a boolean indicating whether any extractor pattern matched in the scrapeable file. As the method name suggests, true means that no pattern matched.
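A minimal sketch of how a script might branch on this method, assuming (as the method name suggests) that it returns true when nothing matched. `FakeScrapeableFile` and `describe_matches` are hypothetical stand-ins written for this example; in a real script the `scrapeableFile` object is supplied by the application.

```python
class FakeScrapeableFile:
    """Hypothetical stand-in for the scrapeableFile object, for illustration only."""

    def __init__(self, matched):
        self._matched = matched

    def noExtractorPatternsMatched(self):
        # Assumed semantics: True when no extractor pattern found a match.
        return not self._matched


def describe_matches(scrapeable_file):
    """Return a short message describing whether any extractor pattern matched."""
    if scrapeable_file.noExtractorPatternsMatched():
        return "no extractor patterns matched"
    return "at least one extractor pattern matched"
```

A check like this is typically useful for flagging pages whose layout has changed, since extractor patterns that used to match will then come back empty.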
getStatusCode
Description
Determine the HTTP status code sent by the server.
Parameters
This method does not receive any parameters.
Return Values
Returns an integer corresponding to the HTTP status code of the response.
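Once the status code is available as an integer, it can be grouped into the standard HTTP status classes. The helper below is a sketch in Python; the name `status_class` is hypothetical and not part of the scrapeableFile API.

```python
def status_class(status_code):
    """Map an HTTP status code to the name of its standard class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The hundreds digit determines the class; anything else is unknown.
    return classes.get(status_code // 100, "unknown")
```

For instance, the value returned by getStatusCode could be passed to such a helper to decide whether a request is worth retrying.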
getNonTidiedHTML
Description
Retrieve the non-tidied HTML of the scrapeable file.
Parameters
This method does not receive any parameters.
Return Values
Returns the non-tidied contents of the scrapeable file, as a string. On failure it returns null.
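Because the method can return null on failure, callers may want to guard against that before using the result. The sketch below shows one way to do so in Python, where `None` stands in for null; `FakeScrapeableFile` and `raw_html_or_default` are hypothetical names introduced for illustration, not part of the documented API.

```python
class FakeScrapeableFile:
    """Hypothetical stand-in for the scrapeableFile object, for illustration only."""

    def __init__(self, html):
        self._html = html

    def getNonTidiedHTML(self):
        # Returns the stored raw HTML, or None to simulate a failure.
        return self._html


def raw_html_or_default(scrapeable_file, default=""):
    """Return the non-tidied HTML, or a default value if none is available."""
    html = scrapeable_file.getNonTidiedHTML()
    return default if html is None else html
```

Falling back to an empty string keeps later string operations from failing, though a script could equally log the failure and stop.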