RetryPolicy

Overview

A retry policy tells a scrapeable file how to check for errors and, optionally, what to do before it tries to download the file again. For example, a policy can execute a script or run a Runnable when the page loads incorrectly. Typically these actions request a new proxy, output some helpful information, or simply stop the scrape. RetryPolicy is an interface that you can implement to create a custom retry policy. Alternatively, the RetryPolicyFactory class can be used to create some standard policies.

This policy is checked AFTER all the extractors have been run. This allows the check to depend on whether extractor patterns matched, and also allows a page to base its 'error status' on another page (since extractor patterns could execute scripts that scrape other files, and those files could set a variable that acts as a flag for an earlier retry policy). It can also cause problems for a page whose extractors should not run before error checking occurs, unless the scrape is built to handle that case.

This interface is in the com.screenscraper.util.retry package.

Interface Implementation

If you need a custom retry policy, you can implement the interface yourself. Be aware that you will need to ensure that any references it holds to the scrapeableFile point to the correct scrapeableFile. This can be tricky if you use the session.setDefaultRetryPolicy method. When you use the scrapeableFile.setRetryPolicy method, the scrapeableFile will be the correct object. The interface is given below.

To help you create custom retry policies that have access to the scraping session and the scrapeable file currently being checked, there is an AbstractRetryPolicy class in the same package as the interface. This class defines some default behavior and adds protected fields for the session and scrapeable file, which are set before the policy is run. If you extend this abstract class, you can access the session and scrapeable file through this.scrapingSession and this.theScrapeableFile. Because of some oddities in the interpreter, it is best to reference these fields with the 'this.' prefix, which avoids problems that arise in a few specific cases.

public interface RetryPolicy
{
        /**
         * Checks to see if the page loaded incorrectly
         *
         * @return True on errors, false otherwise
         * @throws Exception If something goes wrong while executing this method
         */

        public boolean isError() throws Exception;

        /**
         * Runs this code when the page had an error.  This could include things such as rotating the proxy.
         *
         * @throws Exception If something goes wrong while executing this method
         */

        public void runOnError() throws Exception;

        /**
         * Returns a map that can be used to output an error message to indicate what checks failed.  For instance,
         * you could set the key "Status Code" to the value '200', or the key "Valid Page" to the value 'false'.
         *
         * @return Map of keys, or null if no values are indicated
         *
         * @throws Exception If something goes wrong while executing this method
         */

        public Map getErrorChecksMap() throws Exception;

        /**
         * Returns true if the session variables should be reset before attempting to rescrape the file, if there was an error.
         * This can be useful especially if extractors null session variables when they don't match, but the value is needed
         * to rescrape the file.
         *
         * @return True if session variables should be reset if there was an error, false otherwise.
         */

        public boolean resetSessionVariablesBeforeRescrape();

        /**
         * Returns true if the referrer should be reset before attempting to rescrape the file,
         * if there was an error. This can be useful to reset so the referrer
         * doesn't show the page you just requested.
         *
         * @return True if the referrer should be reset if there was an error, false otherwise.
         */

        public boolean resetReferrerBeforeRescrape();

        /**
         * Returns true if errors should be logged to the log/web interface when they occur
         *
         * @return True if errors should be logged to the log/web interface when they occur
         */

        public boolean shouldLogErrors();

        /**
         * Returns the maximum number of retries this policy allows before terminating in an error
         *
         * @return The maximum number of times to allow the ScrapeableFile to be rescraped before resulting in an error
         */

        public int getMaxRetryAttempts();

        /**
         * This will be called if all the retry attempts for the scrapeable file failed.
         * In other words, if the policy said to retry 25 times, after 25 failures this
         * method will be called.  Note that {@link #runOnError()} will be called just before this,
         * as it is called after each time the scrapeable file fails to load
         * correctly, including the last time it fails to load.
         * <p/>
         * This should only contain code that handles the final error.  Any proxy rotation, cookie
         * clearing, and similar recovery work should generally be done in the {@link #runOnError()}
         * method, especially since it will still be called after the final error.
         */

        public void runOnAllAttemptsFailed();
}

getErrorChecksMap

Map getErrorChecksMap ( )

Description

Returns a map that can be used to output an error message indicating which checks failed. For instance, you could set the key "Status Code" to the value '200', or the key "Valid Page" to the value 'false'.

Parameters

This method takes no parameters

Return Value

Map of check names to their values, or null if no values are indicated

Change Log

Version Description
5.5.29a Available in all editions.

Examples

Create a custom RetryPolicy

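 import com.screenscraper.util.retry.RetryPolicy;
 import java.util.HashMap;
 import java.util.Map;
 
 // Aliases for the log and session objects, used inside the anonymous class below.
 _log = log;
 _session = session;
 
 RetryPolicy policy = new RetryPolicy()
 {
   // Records the result of each check so it can be reported when an error occurs.
   Map errorMap = new HashMap();

   boolean isError() throws Exception
   {
     // Treat a failed request as an error and remember the outcome of the check.
     boolean wasError = scrapeableFile.wasErrorOnRequest();
     errorMap.put("Was Error On Request", wasError);
     return wasError;
   }

   void runOnError() throws Exception
   {
     // Called after every failed attempt; this scrape has a script named "Rotate Proxy".
     _session.executeScript("Rotate Proxy");
   }

   Map getErrorChecksMap() throws Exception
   {
     return errorMap;
   }

   boolean resetSessionVariablesBeforeRescrape()
   {
     return true;
   }

   boolean shouldLogErrors()
   {
     return true;
   }

   int getMaxRetryAttempts()
   {
     return 5;
   }

   boolean resetReferrerBeforeRescrape()
   {
     return false;
   }

   void runOnAllAttemptsFailed()
   {
     // Called once, after the last failed attempt.
     _log.logError("Failed to fix errors with the retry policy, stopping scrape");
     _session.stopScraping();
   }
 };

 // Attach the policy to the current scrapeable file.
 scrapeableFile.setRetryPolicy(policy);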
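
Report several checks in the error map

The custom policy above records a single check. As a rough illustration of the key/value convention described for getErrorChecksMap, the following standalone sketch builds a map holding a status code and a page-validity flag and renders it as one line. ErrorChecksMapSketch, buildChecks, and describe are hypothetical names invented for this example and are not part of screen-scraper. How screen-scraper itself formats the map in its log is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of screen-scraper: builds a checks map and renders it.
class ErrorChecksMapSketch
{
    /** Records each check under a readable key, preserving insertion order. */
    static Map<String, Object> buildChecks(int statusCode, boolean validPage)
    {
        Map<String, Object> checks = new LinkedHashMap<>();
        checks.put("Status Code", statusCode);
        checks.put("Valid Page", validPage);
        return checks;
    }

    /** Renders the map as one line; a null or empty map means no checks were reported. */
    static String describe(Map<String, Object> checks)
    {
        if (checks == null || checks.isEmpty())
        {
            return "No error checks reported";
        }
        StringBuilder message = new StringBuilder("Error checks:");
        for (Map.Entry<String, Object> entry : checks.entrySet())
        {
            message.append(' ').append(entry.getKey())
                   .append('=').append(entry.getValue()).append(';');
        }
        return message.toString();
    }

    public static void main(String[] args)
    {
        // prints "Error checks: Status Code=200; Valid Page=false;"
        System.out.println(describe(buildChecks(200, false)));
    }
}
```

A LinkedHashMap is used here only so the checks print in the order they were recorded. A plain HashMap, as in the policy above, works equally well when order does not matter.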

getMaxRetryAttempts

int getMaxRetryAttempts ( )

Description

Returns the maximum number of retries this policy allows before terminating in an error.

Parameters

This method takes no parameters

Return Value

The maximum number of times to allow the ScrapeableFile to be rescraped before resulting in an error

Change Log

Version Description
5.5.29a Available in all editions.

isError

boolean isError ( )

Description

Checks to see if the page loaded incorrectly

Parameters

This method takes no parameters

Return Value

True on errors, false otherwise

Change Log

Version Description
5.5.29a Available in all editions.

resetReferrerBeforeRescrape

boolean resetReferrerBeforeRescrape ( )

Description

Returns true if the referrer should be reset before the file is rescraped after an error. Resetting it is useful when the referrer should not show the page you just requested.

Parameters

This method takes no parameters

Return Value

True if the referrer should be reset if there was an error, false otherwise.

Change Log

Version Description
6.0.36a Available in all editions.

resetSessionVariablesBeforeRescrape

boolean resetSessionVariablesBeforeRescrape ( )

Description

Returns true if the session variables should be reset before the file is rescraped after an error. This is especially useful when extractors set session variables to null when they don't match, but the original value is needed to rescrape the file.
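
The situation described above can be pictured with a small standalone sketch. It assumes that "reset" means restoring the values the variables held before the file was scraped, and it uses a plain map in place of the session. SessionVariableResetSketch and its methods are hypothetical names for illustration only and do not show how screen-scraper implements the reset.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a plain map stands in for the session's variables.
class SessionVariableResetSketch
{
    /** Copies the variables so later changes to the live map cannot affect the snapshot. */
    static Map<String, Object> snapshot(Map<String, Object> variables)
    {
        return new HashMap<>(variables);
    }

    /** Replaces the live variables with the values saved in the snapshot. */
    static void restore(Map<String, Object> variables, Map<String, Object> saved)
    {
        variables.clear();
        variables.putAll(saved);
    }

    /** Simulates one failed scrape of a file whose request needs the PAGE variable. */
    static Object pageAfterFailedScrape(boolean resetBeforeRescrape)
    {
        Map<String, Object> variables = new HashMap<>();
        variables.put("PAGE", "3");
        Map<String, Object> saved = snapshot(variables);

        // An extractor that did not match sets the variable to null.
        variables.put("PAGE", null);

        if (resetBeforeRescrape)
        {
            restore(variables, saved);
        }
        return variables.get("PAGE");
    }

    public static void main(String[] args)
    {
        System.out.println(pageAfterFailedScrape(true));  // prints 3
        System.out.println(pageAfterFailedScrape(false)); // prints null
    }
}
```

Without the reset, the retry would be attempted with PAGE set to null, which is the case that returning true from this method is meant to avoid.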

Parameters

This method takes no parameters

Return Value

True if session variables should be reset if there was an error, false otherwise.

Change Log

Version Description
5.5.29a Available in all editions.

runOnAllAttemptsFailed

void runOnAllAttemptsFailed ( )

Description

This will be called if all the retry attempts for the scrapeable file failed. In other words, if the policy said to retry 25 times, after 25 failures this method will be called. Note that runOnError will be called just before this, as it is called after each time the scrapeable file fails to load correctly, including the last time it fails to load.

This should only contain code that handles the final error. Any proxy rotation, cookie clearing, and similar recovery work should generally be done in the runOnError method, especially since runOnError is still called after the final error.
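
The call order described above can be modeled with a short standalone sketch. SketchPolicy is a stand-in for the four RetryPolicy methods involved, so the code runs without screen-scraper, and RetryOrderSketch is a hypothetical driver written for this example. It is not screen-scraper's implementation, and the exact counting of attempts is an assumption based on the description.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the four RetryPolicy methods this sketch needs (hypothetical name).
interface SketchPolicy
{
    boolean isError() throws Exception;
    void runOnError() throws Exception;
    int getMaxRetryAttempts();
    void runOnAllAttemptsFailed();
}

class RetryOrderSketch
{
    /** Hypothetical driver: models only the order in which the policy methods are called. */
    static void scrapeWithRetries(SketchPolicy policy) throws Exception
    {
        int failures = 0;
        while (true)
        {
            // A real scrape would download the file and run its extractors here.
            if (!policy.isError())
            {
                return; // the page loaded correctly
            }
            failures++;
            policy.runOnError(); // runs after every failure, including the last one
            if (failures >= policy.getMaxRetryAttempts())
            {
                policy.runOnAllAttemptsFailed(); // runs once, after the final runOnError
                return;
            }
        }
    }

    /** Drives a policy that always fails and returns the calls it received, in order. */
    static List<String> recordCalls(final int maxAttempts)
    {
        final List<String> calls = new ArrayList<>();
        try
        {
            scrapeWithRetries(new SketchPolicy()
            {
                public boolean isError() { return true; }
                public void runOnError() { calls.add("runOnError"); }
                public int getMaxRetryAttempts() { return maxAttempts; }
                public void runOnAllAttemptsFailed() { calls.add("runOnAllAttemptsFailed"); }
            });
        }
        catch (Exception e)
        {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args)
    {
        // prints [runOnError, runOnError, runOnAllAttemptsFailed]
        System.out.println(recordCalls(2));
    }
}
```

Because runOnError has already run by the time runOnAllAttemptsFailed is called, repeating recovery work such as proxy rotation in the final handler would do that work twice.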

Parameters

This method takes no parameters

Return Value

This method returns void

Change Log

Version Description
6.0.37a Available in all editions.

runOnError

void runOnError ( )

Description

Runs this code when the page had an error. This could include things such as rotating the proxy. This code will be executed just before the page is downloaded again.

Parameters

This method takes no parameters

Return Value

This method returns void

Change Log

Version Description
5.5.29a Available in all editions.

shouldLogErrors

boolean shouldLogErrors ( )

Description

Returns true if errors should be logged to the log/web interface when they occur

Parameters

This method takes no parameters

Return Value

True if errors should be logged to the log/web interface when they occur

Change Log

Version Description
5.5.29a Available in all editions.
