Using 'Next' link that fires off Javascript
Anyone tried grabbing info from 118.com?
This link http://www.118.co.uk/SearchResults.aspx?query=taxi&type=BusinessType is a list of Taxi companies in the UK. Clicking the Next link fires off Javascript which gets the next page.
The page number is not held in the querystring of the link or anything that simple.
However, I can't get Screen-Scraper to effectively click the Next link. After checking the source code, it seems the page does a postback, but I can't see where the arguments are that specify which page is to be viewed.
(Sorry for not being very lucid. Javascript isn't my strong point :))
Any ideas?
Using 'Next' link that fires off Javascript
Doesn't viewstate in asp.net hold all the info on the form? Knew I shoulda played around with .net more :) (I'm an ASP coder by trade).
I think I got a little daunted and assumed they were doing something clever, but since requests and responses are just text (even the headers), presumably screen-scraper can grab it.
Great product btw :)
I use Firefox, serial-scrapist, but hadn't thought of using the debugger in it. I'll check it out.
And thanks, Todd. That made sense. Didn't think to capture the viewstate parameter.
I'll let you know how I get on.
Using 'Next' link that fires off Javascript
Hi,
Most web apps that run on .NET use this "__VIEWSTATE" parameter. Fortunately, it's not quite as scary as it may appear.
First, I would recommend using screen-scraper's proxy server to record what occurs when you click the "Next" link. Once you have that page captured, add it to a scraping session. You'll notice that there are five parameters being passed in the POST request:
__EVENTTARGET
__EVENTARGUMENT
__VIEWSTATE
usrSearchBox:Text1
hidFormHolder
To handle this, you simply need to extract each of those elements from the first search results page, then embed the values as session variables in the POST parameters list. For example, to extract the "__VIEWSTATE" parameter, your extractor pattern would look like this:
name="__VIEWSTATE" value="~@VIEWSTATE@~"
You'll want to be sure that the "Save in session variable?" box is checked for the "~@VIEWSTATE@~" extractor pattern token. After that, you could then embed the extracted VIEWSTATE value in your POST parameter using this token:
~#VIEWSTATE#~
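For anyone who wants to see the same extract-then-resubmit idea outside of screen-scraper, here is a rough Python sketch using only the standard library. It is not how screen-scraper does it internally. The sample markup, the placeholder __VIEWSTATE value, and the event target name "hypotheticalNextLink" are all made up for illustration, and which of the five fields are hidden inputs versus text inputs on the real page is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Made-up stand-in for the first search results page (assumption, not the real markup).
SAMPLE_PAGE = """
<form name="Form1" method="post" action="SearchResults.aspx">
<input type="hidden" name="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2Nzg5Ozs+" />
<input type="text" name="usrSearchBox:Text1" value="taxi" />
<input type="hidden" name="hidFormHolder" value="" />
</form>
"""


class FormFieldParser(HTMLParser):
    """Collects name/value pairs from hidden and text <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        field_type = (attr.get("type") or "text").lower()
        if field_type in ("hidden", "text") and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_body(html, event_target, event_argument=""):
    """Return a URL-encoded POST body that replays the page's form fields.

    event_target is whatever the Next link passes to the postback;
    the caller has to find it by recording the real request.
    """
    parser = FormFieldParser()
    parser.feed(html)
    params = dict(parser.fields)
    params["__EVENTTARGET"] = event_target
    params["__EVENTARGUMENT"] = event_argument
    return urlencode(params)


body = build_postback_body(SAMPLE_PAGE, "hypotheticalNextLink")
```

The resulting string would be sent as the body of the POST to SearchResults.aspx. Note that the __VIEWSTATE value has to come from the page you just received, since it changes from one response to the next.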
Hopefully that helps. Just let me know if I can clarify anything.
Kind regards,
Todd Wilson
Using 'Next' link that fires off Javascript
Try constructing a URL of this form (it looks like it may need some URL-encoding):
SearchResults.aspx?__EVENTTARGET=dunno&__EVENTARGUMENT=dunnoeither&__VIEWSTATE=dunnoagain
That should do the trick, I think. The Venkman JS debugger in Firefox may help you find out what the values should be.
The code looks a little sloppy, so I'd be surprised if a GET request were filtered.
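In case it helps with the URL-encoding part, here is a small Python sketch of building that kind of GET URL. The function name and the sample values are placeholders I made up; the real values still have to come from the page or a debugger, and whether the server accepts a GET here is untested.

```python
from urllib.parse import urlencode


def build_get_url(base_url, event_target, event_argument, viewstate):
    """Append the three postback parameters to base_url as an encoded query string."""
    query = urlencode({
        "__EVENTTARGET": event_target,
        "__EVENTARGUMENT": event_argument,
        "__VIEWSTATE": viewstate,
    })
    return f"{base_url}?{query}"


# Placeholder values only: "a+b/c=" stands in for a real base64 viewstate.
url = build_get_url(
    "http://www.118.co.uk/SearchResults.aspx",
    "dunno",
    "dunnoeither",
    "a+b/c=",
)
```

The encoding matters because viewstate values are base64 and usually contain "+", "/" and "=", which would otherwise be misread in a query string. A real viewstate can also be long enough to hit URL length limits, in which case POST is the safer route.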
G (SS)