Start scraping where you left off

Hi,

If the scraper crashes or stops responding, is it possible to restart the scraping session and have the program continue where it left off?

Thanks,

Brendan

Start scraping where you left off

Thanks Todd,

This is exactly what we were going to do. I just wanted to know what other people did or if there was a call to do that automatically. It would be nice if there was a session.saveCurrentState("/some/file/path") call that would save the entire session state to disk. It could be invoked once for every product that we scrape, or maybe once per product listing. Then, on startup, the scraper would look for that file and, if it existed, load it. Once the end of the scrape was reached, the session state file would be deleted.

Thanks,

Brendan

Start scraping where you left off

Hi Brendan,

Unfortunately, there isn't an automated way to do this, but it's an excellent suggestion, so I've added it to our list of future features.

Fortunately, there is a more manual way to address this that we use frequently. As your scraping session progresses, you can write out to a file any significant values that indicate where it is in the process. For example, if you were iterating through a series of zip codes and scraping the related search results, you would write each zip code out to a file before beginning the search for it. When you start up your scraping session, you would then look for this file and, if it is found, load its value into a session variable so that the session begins at that point.
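To make the idea concrete, here is a minimal sketch of the checkpoint approach in plain Python. It is not screen-scraper's own API: the function names (`load_checkpoint`, `save_checkpoint`, `run`), the `scrape_one` callback, and the checkpoint file path are all hypothetical and chosen only for illustration. It also removes the file once the run completes, as suggested earlier in the thread.

```python
import os
from pathlib import Path


def load_checkpoint(path):
    """Return the saved value, or None if there is no checkpoint file."""
    p = Path(path)
    if p.exists():
        text = p.read_text().strip()
        return text or None
    return None


def save_checkpoint(path, value):
    """Record the item that is about to be processed.

    The value is written to a temporary file and then renamed, so a crash
    in the middle of a write does not leave a half-written checkpoint.
    """
    p = Path(path)
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(str(value))
    os.replace(tmp, p)


def run(zip_codes, scrape_one, checkpoint_path):
    """Scrape each zip code in order, resuming from the checkpoint if any.

    `scrape_one` is a hypothetical callback that performs the search for a
    single zip code.
    """
    resume_from = load_checkpoint(checkpoint_path)
    # Start at the saved zip code if it is in the list, otherwise at the top.
    start = zip_codes.index(resume_from) if resume_from in zip_codes else 0

    for zip_code in zip_codes[start:]:
        # Write the checkpoint before the search begins.
        save_checkpoint(checkpoint_path, zip_code)
        scrape_one(zip_code)

    # The whole list was processed: delete the checkpoint so that the
    # next run starts from the beginning.
    Path(checkpoint_path).unlink(missing_ok=True)
```

Because the checkpoint is written before each search, the zip code that was in progress when the crash happened is scraped again on restart, so any results already saved for it may need to be de-duplicated.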

Best,

Todd Wilson