Pausing SS to wait for results
I am scraping a site that shows a temporary screen with the word "Processing..." flashing on it and then loads the results that I want. Right now screen-scraper is scraping the processing screen and, of course, failing to match the extractor patterns written for the final results screen.
My question is: how can I tell screen-scraper to wait for the final page to load before scraping? Is there a way to specify a pause?
Thanks,
Joel
Pausing SS to wait for results
Joel,
Not to ask too much of your time, but could you share a brief synopsis of what you had to do with the JavaScript AJAX calls to get screen-scraper to work with the site? Did you have to work out what the JS was doing because the HTTP proxy transactions didn't reveal enough?
This is the future, I'm afraid.
-Scott
Pausing SS to wait for results
Scott,
In this case, unfortunately, the final data is being retrieved with an AJAX call. I fiddled with replicating that call as another scrapeable file, but then found another way into the data I was looking for that usually bypasses the loading delay.
This should work for now.
Thanks for the response,
Joel
Pausing SS to wait for results
Joel,
One correction. screen-scraper will not follow a redirect made via a meta tag in the HTML header (which is different from a redirect in the HTTP header). In the case that the browser is being redirected by code that looks like this:
<meta http-equiv="refresh" content="0;url=/newWebPage.html" />
You would need to extract the "newWebPage.html" portion and use it in the URL of a subsequent scrapeable file.
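To illustrate the idea outside of screen-scraper, here is a minimal Python sketch of pulling the target out of a meta refresh tag and resolving it against the page URL. The function name, the regular expression, and the example.com base URL are all assumptions made for the example; inside screen-scraper you would do the equivalent with an extractor pattern.

```python
import re
from urllib.parse import urljoin

# Assumes http-equiv appears before content inside the tag, as in the
# example above. A real page may order the attributes differently.
META_REFRESH = re.compile(
    r'<meta[^>]*http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def meta_refresh_target(html, base_url):
    """Return the absolute URL a meta refresh points to, or None if absent."""
    match = META_REFRESH.search(html)
    if match is None:
        return None
    # The tag often holds a relative path, so resolve it against the page URL.
    return urljoin(base_url, match.group(1))


page = '<meta http-equiv="refresh" content="0;url=/newWebPage.html" />'
next_url = meta_refresh_target(page, "http://example.com/search")
```

The resolved URL is what you would then request as the next scrapeable file.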
-Scott
Pausing SS to wait for results
Joel,
If you're running the professional or enterprise editions you have the session.pause() method available. However, I doubt pausing will result in what you want.
When a website pauses like this it displays the content after pausing in a few possible ways.
1. It redirects the browser using a meta tag redirect. screen-scraper should follow this.
2. It redirects using an HTTP Header 302 "object has moved". Again, screen-scraper should follow.
3. It does not redirect; rather, it calls in the new content using AJAX. This is when client-side JavaScript makes an HTTP request to the server from within the already-loaded page. screen-scraper does not always follow the request the way you would want it to (something we're working on). You need to go through the HTTP transactions you recorded with the screen-scraper proxy server and read through the site's JavaScript to find out what the server needs you to send in order for it to give you back what you want.
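For the AJAX case, one possible approach once you know which request returns the data is to repeat that request until the placeholder is gone. The Python sketch below is not screen-scraper code; it only shows the retry logic. It assumes the response still contains the text "Processing..." until the results are ready, and `fetch` is a hypothetical function you supply that performs the request and returns the response body as text.

```python
import time


def wait_for_results(fetch, marker="Processing...", attempts=10,
                     delay_seconds=2.0, sleep=time.sleep):
    """Call fetch() until the body no longer contains marker.

    fetch is a caller-supplied (hypothetical) function that makes the
    request and returns the response body as a string.
    Returns the first body without the marker, or None if every attempt
    still showed the placeholder.
    """
    for attempt in range(attempts):
        body = fetch()
        if marker not in body:
            return body
        # Pause between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None


# Usage with canned responses standing in for the real server.
responses = iter(["Processing...", "Processing...", "<table>results</table>"])
body = wait_for_results(lambda: next(responses), sleep=lambda seconds: None)
```

Capping the number of attempts keeps the scrape from hanging if the server never finishes.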
I hope this helps.
-Scott