Controlling (pausing/restarting) one scraping session from another: is it possible?
Hello,
I have 2 scraping sessions:
Session A scrapes website A and runs every night from 1 AM to approximately 7:30 AM.
Session B scrapes website B and runs once per week, taking approximately 3 days to complete, running non-stop day and night.
Website A has a very strict request limit, so the only way I have worked around it is by rebooting the router from SS using Telnet (and pausing the session for the 2 minutes it takes my router to be assigned a new IP address). I have tried doing it at fixed times (e.g. when getminute() is 0, 15, 30 or 45), but it works best to simply reboot the router after a certain number of requests.
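For reference, the reboot script is roughly along these lines. This is only a rough sketch: the host, login, and reboot command below are placeholders, and the exact sequence depends on the router's firmware.

import java.io.PrintWriter;
import java.net.Socket;

// Placeholders -- adjust for your own router.
String routerHost = "192.168.1.1";
int telnetPort = 23;

Socket socket = new Socket(routerHost, telnetPort);
PrintWriter out = new PrintWriter(socket.getOutputStream(), true);

// Crude but workable: send the login and the reboot command blindly,
// with short pauses instead of parsing the router's prompts.
out.println("admin");   // username (placeholder)
sutil.pause(1000);
out.println("secret");  // password (placeholder)
sutil.pause(1000);
out.println("reboot");  // the actual command varies by firmware
sutil.pause(1000);

out.close();
socket.close();

// Pause the session while the router restarts and picks up a new IP.
session.log("Rebooting router; pausing 2 minutes...");
sutil.pause(120000);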
Because session B must run day and night for 3 days, whenever session A reboots the router, session B loses its connection but keeps on running, and therefore loses important information.
Is there any way I can make session B pause as well every time session A reboots the router? Or any alternative idea?
Thank you!
Boga
There isn't a dynamic pause, but you could add a script to each scrapeableFile that does something like this:
// Initialise the retry counter the first time through.
if (session.getv("_TRIES") == null)
	session.setv("_TRIES", 0);

maxTries = 10;

if (scrapeableFile.noExtractorPatternsMatched() && (Integer)session.getv("_TRIES") < maxTries)
{
	log.logError("Error: no extractor patterns matched! Retrying");
	session.addToVariable("_TRIES", 1); // note "_TRIES", not "_Tries" -- variable names are case-sensitive
	sutil.pause(120000); // 2 minutes -- long enough for the router to come back up
	session.scrapeFile(scrapeableFile.getName());
}
else if ((Integer)session.getv("_TRIES") >= maxTries)
{
	log.logError("Error: tried " + session.getv("_TRIES") + " times, no response. Halting.");
	session.stopScraping();
}
else
{
	// Patterns matched, so reset the counter for the next file.
	session.setv("_TRIES", 0);
}
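The idea is that while the router is rebooting, the request gets no usable response back, so no extractor patterns match; the script then waits out the reboot window and requests the same file again.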
Many thanks, I'll try it!
So this script would be run "After file is scraped" on each of the files scraped by scraping session B, right?
cheers,
Boga
I suppose that since the router takes about 2 minutes to reboot, maxTries = 2 with a 2-minute pause on each try should keep me safe.
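That is, with the constants in the script above set to:

maxTries = 2;        // give up after two failed retries
sutil.pause(120000); // 2 minutes per retry
// worst case the file waits maxTries * 2 min = 4 minutes,
// comfortably longer than the ~2 minutes the router needs to come back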