Is there more info/documentation on the Advanced setting for "Max retries per file"?

A common situation I'm seeing in our scraper scripts is that we hit the "Max retries per file" limit while working through a set of proxy entries.

When the limit is reached, the page is dropped, the next search term is picked up, and the cycle starts over. This causes us to skip an entire search term.

I'm looking for more documentation on this setting so I can figure out how to detect the max-retries event and restart that search term.

Upping the limit doesn't fix the issue when there is a string of bad proxy connections. I've raised it from 5 to 20 and we still keep dropping search terms. Obviously we're hitting a bad set of proxies.

I need better control of the flow so a search term isn't dropped just because an internal screen-scraper limit has been reached.

Any suggestions?

In the case of proxies, I usually use a script instead of the max retries. Here's a sample of one I use with Tor:

import java.io.*;
import com.ryanjustus.sstorcontrol.SSTorController;

SSTorController tor = (SSTorController) session.getVariable( "TOR_CONTROLLER" );
if (tor == null)
{
        session.logError( "NO TOR CONTROLLER INITIALIZED" );
        return;
}
tor.touchFiles();

if (scrapeableFile.getStatusCode() == 403 || Boolean.TRUE.equals( session.getVariable( "TOR_RETRY_ON_PATTERN_MATCH" ) ))
{
        // Increase the retry count
        if (session.getVariable( "TOR_RETRY_COUNT" ) == null)
                session.setVariable( "TOR_RETRY_COUNT", 0 );
        session.setVariable( "TOR_RETRY_COUNT", session.getVariable( "TOR_RETRY_COUNT" ) + 1 );

        // If we're at 20 retries or fewer, request a new identity and try again
        if (session.getVariable( "TOR_RETRY_COUNT" ) <= 20)
        {
                session.log( "ON TRY: " + session.getVariable( "TOR_RETRY_COUNT" ) );

                session.logWarn( "@@@ There was an error requesting this page. @@@" );
                session.logWarn( "@@@ Requesting new Tor identity              @@@" );
                tor.requestNewIdentity();

                session.logWarn( "@@@ Retrying scrapeable file in ten seconds. @@@" );
                session.pause( 10000 );
                session.scrapeFile( scrapeableFile.getName() );
        }
        // Otherwise, reset the retry count, report the error in the log, and move on
        else
        {
                session.setVariable( "TOR_RETRY_COUNT", 0 );
                session.setVariable( "TOR_RETRY_ON_PATTERN_MATCH", false );
                session.logError( "ERROR - ERROR ON REQUEST FOR [" + scrapeableFile.getCurrentURL() + "]" );
        }
}
else
{
        // Request succeeded: clear the retry state
        session.setVariable( "TOR_RETRY_COUNT", 0 );
        session.setVariable( "TOR_RETRY_ON_PATTERN_MATCH", false );
}

Notice that if I get an error indicating a blocked proxy, I get a new exit node, and retry with the same parameters. This way a search has to be blocked many times before it's skipped.

Will something like this help?

I'm using the built-in proxy pool

We set up the proxy pool object before scraping the files.
ex:
import com.screenscraper.util.*;

// Create the pool and attach it to the session
proxyServerPool = new ProxyServerPool();
session.setProxyServerPool( proxyServerPool );

// Load proxies from a file and write the list to the log
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/good-proxies_A-M.txt" );
proxyServerPool.outputProxyServersToLog();

// Route requests through the pool and repopulate it when it runs low (threshold of 4)
session.setUseProxyFromPool( true );
proxyServerPool.setRepopulateThreshold( 4 );

It appears session.scrapeFile ends up terminating once the max retries value is reached.
I can see the proxy changing for each of the "max retries" attempts and then the file failing without apparently throwing an error.

The call to "session.scrapeFile( "Search results" );" is in a while loop that cycles through a DB resultSet of terms to search for.
So when session.scrapeFile terminates without error the loop just moves on to the next term.
Since session.scrapeFile returns void, there is nothing to capture to check whether the scrape succeeded.
Does session.scrapeFile post an error condition I could check?
What about the code that retries proxies from the pool, which "max retries" controls?

If either did, or if session.scrapeFile returned a condition code, I'd be able to check the result and take appropriate action.
Perhaps jump to a new VPN tunnel, or go fetch a new set of proxies as this set appears to be stale.
Then retry the search term.
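
To make the flow concrete, here's roughly the shape of our loop (the resultSet variable and column name are simplified placeholders for what's in our actual script):

// rs is a java.sql.ResultSet of search terms pulled from our DB
while (rs.next())
{
        session.setVariable( "SEARCH_TERM", rs.getString( "search_term" ) );

        // If this hits the "max retries per file" limit it just returns quietly,
        // so the loop moves on and the term is silently skipped.
        session.scrapeFile( "Search results" );
}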

The max retries setting is meant to be drop-dead easy. It's for use on dodgy servers that sometimes just get too bogged down to respond. Cases like this one are more complex, and need a more specialized solution.

If you're using the proxy pool, the script you run after each scrapeable file would be more like:

maxTries = 25;

if (session.getv("TRIES")==null)
        session.setv("TRIES", 0);

sf = scrapeableFile.getName();
failed = scrapeableFile.wasErrorOnRequest() || scrapeableFile.noExtractorPatternsMatched();

if (failed && session.getv("TRIES")<maxTries)
{
        session.logWarn("~~~Error on " + sf);
        session.addToVariable("TRIES", 1);
        session.log("~~~Retrying");

        // Mark the current proxy as bad so the pool rotates to another one
        session.currentProxyServerIsBad();
        session.scrapeFile(sf);
}
else if (failed)
{
        session.logError("---Error on " + sf);
        session.logError("---Already tried " + session.getv("TRIES") + " times, so aborting");
        session.stopScraping();
}
else
{
        // Request worked, so reset the counter
        session.setv("TRIES", 0);
}
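
This runs as the script after each scrapeable file. If stopping the whole session is too drastic for your case, one variation (just a sketch; the SEARCH_FAILED name is made up, and your term loop would need to check it) is to set a flag instead of calling stopScraping(), so the loop that walks your search terms can retry the term or swap in a fresh proxy list:

// In the "gave up" branch, instead of session.stopScraping():
session.setv("TRIES", 0);
session.setv("SEARCH_FAILED", true);   // hypothetical flag for the term loop to check

// ...and in the loop that iterates over your search terms:
session.setv("SEARCH_FAILED", false);
session.scrapeFile("Search results");
if (Boolean.TRUE.equals(session.getv("SEARCH_FAILED")))
{
        // e.g. re-populate the pool or switch VPN tunnels, then scrape the same term again
}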