Scrape needs to be periodically restarted

I've built a scrape that runs fine for a while, but after about 1000 queries the site starts returning 404 errors. It isn't blocking my IP address, because all I have to do is stop the scrape and restart it where I left off, and it runs for a while again. I've run into this before, and usually some combination of fiddling with the cookies or referrers gets it working again, but this time no combination of those is successful. The proxy transactions look the same, so I don't think it's the headers. It is a site with a username/password, but we re-login as part of the error recovery.

So my question is: is something happening behind the scenes when you stop and restart a scrape that I don't know about? And if so, is there a way to simulate it inside a script without having to manually stop and restart the scrape? Maybe it's something to do with the HTTP Connection or HTTPsession object? I'm stumped.
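
For reference, the manual "restart" amounts to throwing away every open connection and cookie and logging in from scratch; roughly this, sketched with Python's requests purely for illustration (our scrape isn't Python, and the URL and field names are made up):

    import requests

    def restart_scrape(login_url, username, password):
        # What a manual stop/start effectively does: discard every open
        # connection and cookie, then authenticate from scratch.
        session = requests.Session()  # empty cookie jar, fresh connection pool
        session.post(login_url, data={"username": username, "password": password})
        return session  # resume scraping with this new session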

Chris,

When you log out and log back in, you are renewing your session with the site. You can simulate this within your scraping session by clearing your cookies and logging in again.
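
In other words, something like this, if your scrape were written with Python's requests (a rough sketch; the login URL and form fields are placeholders you'd swap for the site's real ones):

    import requests

    def renew_session(session, login_url, username, password):
        # Simulate a logout/login without stopping the scrape: drop the old
        # cookies so the site issues a fresh session, then authenticate again.
        session.cookies.clear()
        session.get(login_url)  # pick up a new session cookie
        session.post(login_url, data={"username": username, "password": password})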

See if that works.

-Scott

Yeah, that was one of the first things we tried, to no avail. When we hit the error, we first clear the cookies, then set the referrer to google.com so the site thinks we're coming in clean. Then we call the first scrapeable file, grab the VIEWSTATE parameter, call the Login page, and continue where we left off, but it just keeps giving the error. We've tried most combinations of these steps as well.
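
For the record, our recovery step is roughly equivalent to this if it were written with Python's requests (URLs and form-field names are guesses, since I can't post the real scrape):

    import re
    import requests

    START_URL = "https://example.com/Default.aspx"  # hypothetical first page
    LOGIN_URL = "https://example.com/Login.aspx"    # hypothetical login page

    def recover(session, username, password):
        # Re-run the login sequence after a 404: clear cookies, fake a clean
        # entry via the Referer, pick up a fresh VIEWSTATE, then log in.
        session.cookies.clear()
        session.headers["Referer"] = "https://www.google.com/"

        # First page: the server issues a new session cookie and VIEWSTATE.
        first = session.get(START_URL)
        viewstate = re.search(r'id="__VIEWSTATE" value="([^"]*)"', first.text).group(1)

        # The login POST has to echo the VIEWSTATE back to the server.
        session.post(
            LOGIN_URL,
            data={
                "__VIEWSTATE": viewstate,
                "txtUser": username,       # field names are guesses
                "txtPassword": password,
            },
            headers={"Referer": START_URL},
        )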

Never mind, I figured it out (that always happens right after I ask for help :) ). There was one little page I wasn't calling in the right order, so it was a referrer issue again. Thanks for the help anyway.
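
In case it helps anyone else: the fix boils down to requesting the intermediate page first so the Referer chain the site checks stays intact. Roughly this (again a Python sketch with made-up URLs):

    import requests

    session = requests.Session()

    # The intermediate page the site expects you to have visited first.
    session.get("https://example.com/Menu.aspx",
                headers={"Referer": "https://example.com/Default.aspx"})

    # Only then request the page that was 404ing, with that page as the Referer.
    report = session.get("https://example.com/Report.aspx",
                         headers={"Referer": "https://example.com/Menu.aspx"})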

Pesky .NET sites want things done just so, and in the right order. Glad you found the fix.