setRetryPolicy
Description
Sets a retry policy that determines whether a page should be re-downloaded. The policy is checked after all the extractors have run, and it tests the page for an error against a set of conditions. If the policy detects an error on the page, it can run scripts or other code to try to remedy the situation, and then the file is rescraped.
The file is re-downloaded without rerunning any of the scripts that run before the file is downloaded, and the re-download happens before any of the scripts marked to run after the file is scraped. Any changes needed to session variables, headers, and so on should be made in the script or runnable that the policy executes. The policy can also specify that session variables should be restored to their previous values before the file is rescraped. If it does, they are restored after the error-checking portion of the policy but before the policy runs the code that makes changes ahead of a retry.
The retry policy should be set in a script run 'Before file is scraped', but it can also be set by a script on an extractor pattern. If it is set on an extractor pattern, session variables will not be restored when a retry is required.
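The order of operations described above can be hard to picture, so the following is a minimal, self-contained sketch of it in plain Java. It does not use the screen-scraper API: `SketchPolicy`, `RetryFlowSketch`, and all of their methods are hypothetical names invented for illustration, and the real RetryPolicy interface may look quite different. What happens once the retries are exhausted is not covered by the text above, so the sketch simply stops at that point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a retry policy. NOT the screen-scraper RetryPolicy interface.
interface SketchPolicy {
    boolean isError(String page);              // error-checking portion of the policy
    boolean restoreSessionVariables();         // should session variables be rolled back?
    void remedy(Map<String, String> session);  // code run before the retry (e.g. pick a new proxy)
    int maxRetries();
}

class RetryFlowSketch {
    // Simulates scraping one file; pages.get(i) is the content returned on download i
    // (the last entry is reused if there are more downloads than entries).
    // Returns the ordered list of steps so the sequence can be inspected.
    static List<String> scrape(List<String> pages, SketchPolicy policy, Map<String, String> session) {
        List<String> steps = new ArrayList<>();
        steps.add("run before-file scripts");   // runs once, never repeated on a retry
        Map<String, String> snapshot = new HashMap<>(session);  // values to restore later

        int attempt = 0;
        while (true) {
            String page = pages.get(Math.min(attempt, pages.size() - 1));
            steps.add("download");
            steps.add("run extractors");
            session.put("EXTRACTED", page);     // extractors may change session variables

            if (!policy.isError(page)) {
                break;                          // page is fine, no retry needed
            }
            if (attempt >= policy.maxRetries()) {
                steps.add("retries exhausted"); // assumption: the sketch just stops here
                return steps;
            }
            if (policy.restoreSessionVariables()) {
                // Restore happens after the error check but before the remedy code.
                session.clear();
                session.putAll(snapshot);
                steps.add("restore session variables");
            }
            policy.remedy(session);             // changes made here survive into the retry
            steps.add("run remedy");
            attempt++;
        }
        steps.add("run after-file scripts");    // only after the final download
        return steps;
    }
}
```

Because the restore step comes before the remedy step, anything the remedy code writes to the session (a new proxy, a changed header value) is still in place when the file is downloaded again, while changes made by the failed extraction are discarded.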
Parameters
- policy The policy that should be run. See the RetryPolicyFactory for standard policies, or create one by implementing the RetryPolicy interface.
Return Value
This method returns void.
Change Log
Version | Description |
---|---|
5.5.29a | Available in professional and enterprise editions. |
Examples
Set a basic retry policy
// Use a policy that will retry up to 5 times, and on each failed attempt to load
// the page, it will execute the "Get new Proxy" script
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Get new Proxy"));