getMissingRegexPolicy
RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex )
RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex, String scriptOnFail )
RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex, Runnable runnableOnFail )
Description
Returns a policy under which a page is considered valid only if the regular expression does NOT match the page content (headers included). In other words, if the regular expression matches, the page is scraped again.
Parameters
- retries The maximum number of times to retry before failing
- regex A regular expression that must NOT match the page content for the page to be considered valid
- scriptOnFail/runnableOnFail (optional) What to run (a script or a Runnable) when the policy finds the page invalid. It is run just before the page is downloaded again. The script or Runnable is executed in the current thread, so the scrapeable file will not be redownloaded until it has finished executing.
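The way these parameters interact can be sketched in plain Java. This is only an illustration of the behavior described above, not screen-scraper's implementation: the class `MissingRegexRetrySketch`, the method `fetchWithPolicy`, and the `Supplier` standing in for a page download are all hypothetical, and the sketch assumes the regex is applied as a "find anywhere in the content" search and that `retries` counts attempts after the first download.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.regex.Pattern;

public class MissingRegexRetrySketch {

    // Hypothetical helper (not a screen-scraper API). Downloads a page and,
    // while the regex is found in it, runs the failure action and downloads
    // again, up to `retries` extra attempts. Returns the first valid content,
    // or null if every attempt still matched the regex.
    static String fetchWithPolicy(Supplier<String> download, int retries,
                                  String regex, Runnable runnableOnFail) {
        Pattern invalid = Pattern.compile(regex);
        String content = download.get(); // initial download
        for (int retry = 0; invalid.matcher(content).find(); retry++) {
            if (retry == retries) {
                return null; // retries exhausted, page is still invalid
            }
            if (runnableOnFail != null) {
                // Runs in the current thread, so the next download
                // cannot start until this has finished.
                runnableOnFail.run();
            }
            content = download.get();
        }
        return content;
    }

    public static void main(String[] args) {
        // Simulated responses: two invalid pages, then a valid one.
        Iterator<String> pages = List.of(
                "Redirected to Google.com",
                "Redirected to Google.com",
                "Real search results").iterator();
        AtomicInteger rotations = new AtomicInteger();

        String result = fetchWithPolicy(pages::next, 5, "Google\\.com",
                rotations::incrementAndGet);

        System.out.println(result);
        System.out.println("Failure action ran " + rotations.get() + " times");
    }
}
```

In this sketch the failure action stands in for something like a proxy rotation; because it runs synchronously, any state it changes is in place before the next download begins.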
Return Value
The RetryPolicy to set on the ScrapeableFile
Change Log
Version | Description |
---|---|
5.5.29a | Available in all editions. |
Examples
Set a missing regex policy
import com.screenscraper.util.retry.RetryPolicyFactory;
// Require the response to not contain the text "Google.com". Since this is a regex, the . must be escaped with a \ (written as \\ inside a Java string literal)
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getMissingRegexPolicy(5, "Google\\.com", "Rotate Proxy"));
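The note about escaping the dot can be checked with a small standalone program. This sketch uses only `java.util.regex` and assumes the policy's regex follows standard Java regex syntax; the class name `RegexDotEscapeDemo` and the helper `found` are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class RegexDotEscapeDemo {

    // Returns true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unescaped dot matches any character, so "Googlexcom" also matches.
        System.out.println(found("Google.com", "Visit Googlexcom today"));   // true
        // Escaped dot matches only a literal period.
        System.out.println(found("Google\\.com", "Visit Googlexcom today")); // false
        System.out.println(found("Google\\.com", "Visit Google.com today")); // true
    }
}
```

Without the escape, a page containing unrelated text could be treated as invalid and retried unnecessarily.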
mikes on 11/21/2011 at 3:52 pm