How to handle periodic changes of session state values and cookies that relate to denial of access
I set up a proxy session and a scraping session, then added scrapeable files. Then I added session variables that allow me to insert a userID and password into a scrapeable file. It all works. However, I did the same thing previously and it worked too, but then it stopped working after a few hours: the scraping session began returning a 404 code, which I took to be the status code that means the web site denied access. I logged on to the web site manually and that worked, so the userID and password were fine. Something else must have changed. My guess is that a value in session state or a cookie changes periodically, perhaps as a security measure, but I really don't know.
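For reference, the session-variable part looks roughly like this, in an Interpreted Java script that runs before the scraping session starts (the names and values are placeholders, not my real ones):

    // Initialization script, run before the scraping session starts.
    // The login scrapeable file references these as ~#USERNAME#~ and ~#PASSWORD#~.
    session.setVariable("USERNAME", "myUserID");
    session.setVariable("PASSWORD", "myPassword");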
What do you think happened? How can I determine what happened? Is there anything I can do to overcome this problem, or are there some sites you just can't scrape because of the way the site is set up?
I see a "compare" feature in "Last Request". I see that some cookie values have changed, there are new cookies, and there are "Headers". The headers also have some new values, and some headers were added or deleted. Can screen-scraper somehow capture the new cookies and headers and their values, and set them on downstream scrapeable files, so that it doesn't get a 404?
Actually HTTP 404 is "file not found", but the problem could still be a cookie or a session value. Most of the time screen-scraper will handle those automatically, but if either is set in JavaScript, screen-scraper doesn't run JavaScript, so you may need to review the site's JavaScript and see if there is anything set there that you need to set manually.
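If you do find something set in JavaScript, you can usually replicate it yourself: extract the value with an extractor pattern that saves it to a session variable, then replay it with a script that runs "Before file is scraped" on the downstream file. A rough sketch in Interpreted Java; the token, cookie, header, and domain names here are placeholders, and you should double-check session.setCookie and scrapeableFile.addHTTPHeader against the API documentation for your version:

    // Runs "Before file is scraped" on the downstream scrapeable file.
    // SESSION_TOKEN was saved earlier by an extractor pattern.
    token = session.getVariable("SESSION_TOKEN");

    if (token != null) {
        // Re-create the cookie the site's JavaScript would normally write.
        session.setCookie("www.example.com", "auth_token", token);

        // Mirror any custom request header the site expects.
        scrapeableFile.addHTTPHeader("X-Auth-Token", token);
    } else {
        session.log("SESSION_TOKEN not found -- may need to re-run the login file");
    }

If the value expires after a few hours, you could also watch the status code in an "After file is scraped" script (scrapeableFile.getStatusCode(), if your version has it) and re-run your login file whenever it stops coming back 200.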