redirect leads to 404 error
Hi,
When I try to scrape the URL http://www.arkaden.at, screen-scraper is redirected to http://flowfact.net/arkaden-immobilien. The response from there is:
HTTP/1.1 404 Not Found
Server: Apache
Connection: Keep-Alive
Content-Type: text/html
Date: Mon, 17 Dec 2012 08:32:56 GMT
Keep-Alive: timeout=2, max=200
Transfer-Encoding: chunked
X-Powered-By: PHP/4.4.9
When I do the same in a browser, everything works fine.
How can I change the way screen-scraper handles redirects? I cannot use the redirected URL up front, because these URLs are only discovered during the scrape.
best regards
Christian Pieler
There isn't a way to change the redirect, but you don't need to.
OK, if I understand you correctly, there is nothing I can do to avoid initially running into the 404 error. The URL http://www.arkaden.at/ is just one of many links that screen-scraper is supposed to follow in this scrape. Every page it encounters has, of course, its own structure, so finding http://www.flowfact.net/ffkunden/125787/index.php, or whatever other address might be there, is not trivial.
Any suggestions for that?
best regards
Christian Pieler
P.S.: Why does screen-scraper's redirect lead to the 404, while a browser just shows the page?
The server is sending a 404 status with custom content. The browser simply renders that content, even though as far as it is concerned it is displaying an error page. I've seen this before; I don't know why some servers do it.
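To illustrate what this means for a scraper, here is a minimal Python sketch using only the standard library (not screen-scraper's own API; the function name `fetch_keep_error_body` is hypothetical). Python's `urllib.request.urlopen` follows redirects automatically and then raises `HTTPError` for a 404, but the exception still carries the final URL and the response body, so the custom content can be read like any other page.

```python
import urllib.error
import urllib.request


def fetch_keep_error_body(url, timeout=10):
    """Follow redirects and return (final_url, status, body), even for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Success path: urlopen has already followed any 3xx redirects.
            return response.geturl(), response.status, response.read()
    except urllib.error.HTTPError as err:
        # An error status such as 404 raises HTTPError, but the exception also
        # carries the response body, which may still be the page a browser shows.
        try:
            return err.geturl(), err.code, err.read()
        finally:
            err.close()
```

If the server behaves as described above, calling `fetch_keep_error_body("http://www.arkaden.at/")` would return status 404 together with whatever content the server attached, and that body could then be searched for the next address to follow.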