redirect leads to 404 error

Hi,

When I try to scrape the URL http://www.arkaden.at, screen-scraper is redirected to http://flowfact.net/arkaden-immobilien. The response from there is:

HTTP/1.1 404 Not Found
Server: Apache
Connection: Keep-Alive
Content-Type: text/html
Date: Mon, 17 Dec 2012 08:32:56 GMT
Keep-Alive: timeout=2, max=200
Transfer-Encoding: chunked
X-Powered-By: PHP/4.4.9

When the above is replicated in a browser, all works fine.

How can I modify the way screen scraper handles redirection? I cannot use the "redirected URL" up-front, because these URLs are retrieved during a scrape.

best regards

Christian Pieler

There isn't a way to change

There isn't a way to change the redirect, but you don't need to.

  1. http://www.arkaden.at/ responds with an HTTP 302 that is automatically followed to
  2. http://flowfact.net/arkaden-immobilien responds with an HTTP 404, but it still has content that is a frameset, and it loads
  3. http://www.flowfact.net/ffkunden/125787/index.php which responds with HTTP 200 and sets a session cookie, and should get you to what you want.
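For pages whose structure you don't know in advance, the chain above could be followed generically: keep the body even when the status is 404, then look for frame sources in it. Here is a minimal sketch in plain Python, not screen-scraper's own scripting API. The names `fetch_keeping_error_body`, `extract_frame_urls` and `FrameCollector` are made up for illustration, and it assumes the target is referenced through a `<frame>` or `<iframe>` tag with a `src` attribute.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_frame_urls(html, base_url):
    """Return absolute URLs of all frames found in the given HTML."""
    collector = FrameCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


def fetch_keeping_error_body(url):
    """Fetch url (redirects are followed by urlopen) and return
    (status, final_url, body), keeping the body of error responses."""
    try:
        with urlopen(url, timeout=30) as response:
            body = response.read().decode("utf-8", "replace")
            return response.status, response.geturl(), body
    except HTTPError as error:
        # An HTTPError is also a response object, so a 404 body can be read.
        body = error.read().decode("utf-8", "replace")
        return error.code, error.geturl(), body
```

A scrape could then call `fetch_keeping_error_body(url)`, pass the returned body and final URL to `extract_frame_urls`, and request each frame URL in turn. Pages that reach their content some other way (a meta refresh or JavaScript, say) would need extra handling that this sketch doesn't cover.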

Ok, if I understand you

Ok, if I understand you correctly, I cannot do anything to avoid initially running into the 404 error. The URL http://www.arkaden.at/ is just one among many links that screen-scraper is supposed to follow in this scrape. Every single page that it encounters has, of course, its own structure, so finding http://www.flowfact.net/ffkunden/125787/index.php, or any other address that might be there, is not trivial.

Any suggestions for that?

best regards

Christian Pieler

P.S.: Why does screen-scraper's redirect lead to the 404, while a browser just shows the page?

The server is sending a 404

The server is sending a 404 with custom content ... the browser thinks it's displaying an error. I've seen it before. I don't know why.