Scraping a site multiple times - remotely

Hi -

I am running a scrape that must retrieve several pages of data, one page at a time. Each page includes links to articles or PDF files hosted on the site being scraped, and the user must be able to access them. The links often require a session ID. How can I keep the session ID active? Every time I run a scrape it starts from scratch and a new session ID is created. I am using RemoteScrapingSession from Java.

Robin

robind on 11/21/2011 at 6:05 pm



When you start a scrape it will always have a new state, so ideally you would get the documents from within the running scrape. If you need to run a scrape first to see which documents to get, you might need two scrapes that both log in: one identifies the documents, and the other just downloads them. Because they both log in, each gets a working session.
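A rough sketch of that two-scrape flow from the Java side might look like the following. Everything named here is an assumption for illustration: the `ScrapeRunner` hook, the scrape names ("Find documents", "Download document"), and the variable names (`USERNAME`, `PASSWORD`, `DOC_URLS`, `DOC_URL`, `SAVED_PATH`) are all hypothetical, and the wiring to RemoteScrapingSession is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPassScrape {

    /** Hypothetical hook: runs one named scrape with the given input variables
     *  and returns the variables that scrape set. A real implementation might
     *  wrap RemoteScrapingSession; that part is not shown here. */
    interface ScrapeRunner {
        Map<String, String> run(String scrapeName, Map<String, String> inputs);
    }

    /** Pass 1 logs in and lists document URLs. Pass 2 logs in again for each
     *  URL and downloads it, so every download runs in a freshly logged-in session. */
    static List<String> listThenDownload(ScrapeRunner runner, String user, String password) {
        Map<String, String> login = new LinkedHashMap<>();
        login.put("USERNAME", user);
        login.put("PASSWORD", password);

        // Pass 1: assumed to return newline-separated URLs in DOC_URLS.
        Map<String, String> found = runner.run("Find documents", login);
        String urls = found.getOrDefault("DOC_URLS", "");

        List<String> savedPaths = new ArrayList<>();
        for (String url : urls.split("\n")) {
            if (url.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Pass 2: a separate scrape that logs in itself, then fetches one document.
            Map<String, String> inputs = new LinkedHashMap<>(login);
            inputs.put("DOC_URL", url.trim());
            Map<String, String> result = runner.run("Download document", inputs);
            String path = result.get("SAVED_PATH");
            if (path != null) {
                savedPaths.add(path);
            }
        }
        return savedPaths;
    }
}
```

The point of the design is that the session ID is never passed between scrapes. Only the login credentials and the document URL are, and each scrape gets its own working session by logging in again.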

jason on 11/22/2011 at 10:18 am
