Host redirect

When I try to scrape the page "https://www.bedandbreakfast.nl/bed-and-breakfast-nl/utrecht/t-singelhuis/7392/", the page source that ss retrieves is different from the page source in my browser. While trying to solve this, I noticed that the "last request" tab in ss shows the Host as classic.bedandbreakfast.nl, whereas in the browser request the Host is www.bedandbreakfast.nl. I think the problem is related to this difference, but setting the Host to www.bedandbreakfast.nl does not work.


When I request that page, I get redirected to https://classic.bedandbreakfast.nl/bed-and-breakfast-nl/utrecht/t-singelhuis/7392/, which looks like a good version to scrape from.

There is a group of link tags with rel="alternate" in the page.

The site is detecting the language setting and redirecting you to the page for that language. Screen-scraper obeys those link tags.
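To illustrate how such alternate links map languages to URLs, here is a minimal Java sketch that collects them from a page source into a language-to-URL map. It assumes each tag carries hreflang and href attributes with double-quoted values, which is common practice but not confirmed for this site, and the example.nl/example.com URLs are hypothetical. A regex is only good enough for a quick check; a real HTML parser would be more robust.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternateLinks {
    // Matches one whole <link ...> tag, case-insensitively.
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the double-quoted value of attribute `name` in `tag`, or null if absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\b" + name + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Maps hreflang -> href for every <link rel="alternate"> tag, in document order.
    static Map<String, String> alternatesByLanguage(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = LINK_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String rel = attr(tag, "rel");
            String lang = attr(tag, "hreflang");
            String href = attr(tag, "href");
            if ("alternate".equalsIgnoreCase(rel) && lang != null && href != null) {
                result.put(lang, href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical page source; the real site's tags may differ.
        String html = "<link rel=\"alternate\" hreflang=\"nl\" href=\"https://example.nl/page\">"
                + "<link rel=\"alternate\" hreflang=\"en\" href=\"https://example.com/page\">";
        // Prints {nl=https://example.nl/page, en=https://example.com/page}
        System.out.println(alternatesByLanguage(html));
    }
}
```

Looking up the "nl" key in the resulting map would give the Dutch version's URL, if the site publishes one.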


Hi Jason,

This alternate version lacks some information that is present on the original page, and I need that information. Can I instruct ss not to follow this redirect, or set the correct language (Dutch, nl)?


Screen-scraper identifies itself as IE 6, and since that browser is ancient, the site is redirecting you.

Create a script, set it to run before the session runs, and have it call:

session.setDefaultUserAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36");
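To check outside screen-scraper whether the redirect really depends on the User-Agent, here is a minimal sketch using Java 11's java.net.http.HttpClient with redirects disabled, so the status code and Location header are shown instead of being followed. The IE 6 string is an assumed representative value, not necessarily what screen-scraper sends verbatim, and the output depends on how the site behaves when you run it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RedirectCheck {
    // Same Chrome user-agent string as in the screen-scraper script above.
    static final String CHROME_UA = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36";
    // Assumed old-IE string for comparison (hypothetical, not taken from screen-scraper).
    static final String IE6_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

    // Builds a GET request that identifies itself with the given User-Agent.
    static HttpRequest buildRequest(URI uri, String userAgent) {
        return HttpRequest.newBuilder(uri)
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    // Sends the request and reports "status -> Location" without following any redirect.
    static String check(HttpClient client, URI uri, String userAgent)
            throws IOException, InterruptedException {
        HttpResponse<Void> response =
                client.send(buildRequest(uri, userAgent), HttpResponse.BodyHandlers.discarding());
        String location = response.headers().firstValue("Location").orElse("(none)");
        return response.statusCode() + " -> " + location;
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // expose the redirect instead of following it
                .build();
        URI page = URI.create(
                "https://www.bedandbreakfast.nl/bed-and-breakfast-nl/utrecht/t-singelhuis/7392/");
        System.out.println("IE 6:   " + check(client, page, IE6_UA));
        System.out.println("Chrome: " + check(client, page, CHROME_UA));
    }
}
```

If the IE 6 request comes back as a 3xx pointing at classic.bedandbreakfast.nl while the Chrome one returns 200, that supports the user-agent explanation.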


Thanks, Jason, this works. I did try to change the user agent with the addHTTPHeader method before running the scrapeable file, but that did not work, so I am a little puzzled why this one does.