Can ScreenScraper crawl this?
Hi,
I keep having issues scraping sites that rely heavily on JavaScript. I'm wondering whether ScreenScraper could scrape this site, specifically the paging on it:
http://www.thenorthface.com/catalog/ca_ecom/en/gear/mens-jackets-vests
The issue is paging through the product lists. Since the site does its paging in JavaScript, the HTML doesn't change. As far as I know, there is no way to scrape such sites with ScreenScraper.
Unfortunately, many sites use similar technology for paging, for example http://jysk.ca/category.aspx?catalog=RETAIL&id=10396. This site is built on Microsoft's .NET platform, and it doesn't seem to accept URL parameters for paging.
Am I doing something wrong, and can ScreenScraper actually handle these kinds of sites? If I'm not doing anything wrong, is ScreenScraper working on this issue? This is a common problem, and as more and more sites adopt this type of technology it is becoming more acute.
Thank you for your help,
Edgar
You can indeed scrape this
You can indeed scrape this sort of thing; it's just tricky. You'll find that when you click the next-page link, there is still a corresponding HTTP request. The response, however, doesn't always look like a web page. It could be JSON, XML, or any of a myriad of other formats.
You would need to build a scrapeable file that emulates that HTTP request, and then parse the result. It takes some work to figure out, but once you get the hang of it, the response is usually easier to parse than a full HTML page.
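To make the idea concrete outside of ScreenScraper, here is a rough Python sketch of emulating a paging request and parsing a JSON response. Everything specific in it is an assumption for illustration: the endpoint URL, the `page` and `pageSize` parameter names, and the `products`, `name`, and `price` fields are all made up. You would replace them with whatever you actually see in a proxy log or your browser's network tab when you click next page on the real site.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; the real one is whatever URL the page's
# JavaScript requests when you click "next page".
PAGING_URL = "http://www.example.com/catalog/products.json"


def build_page_request(page, page_size=20):
    """Build the GET request that the page's JavaScript would send.

    The parameter names "page" and "pageSize" are assumptions.
    """
    query = urlencode({"page": page, "pageSize": page_size})
    return Request(
        PAGING_URL + "?" + query,
        # Some sites only answer paging calls that look like AJAX requests.
        headers={"X-Requested-With": "XMLHttpRequest"},
    )


def parse_products(body):
    """Pull (name, price) pairs out of a JSON response body.

    The "products", "name", and "price" keys are assumptions.
    """
    data = json.loads(body)
    return [(item["name"], item["price"]) for item in data.get("products", [])]


def fetch_page(page):
    """Send the request and parse the response (needs network access)."""
    with urlopen(build_page_request(page)) as response:
        return parse_products(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up response body, used here so the sketch runs offline.
    sample = '{"products": [{"name": "Jacket A", "price": "99.00"}]}'
    print(parse_products(sample))
    print(build_page_request(2).full_url)
```

In ScreenScraper terms, `build_page_request` corresponds to the scrapeable file that emulates the request, and `parse_products` corresponds to the parsing you would do on the result. If the site returns XML or an HTML fragment instead of JSON, only the parsing step changes.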