Having trouble trying to scrape an ASPX site. Looking for ideas...

I'm trying to scrape an ASPX site, and I'm looking for direction on what to try next. I have already successfully scraped the first and second pages of the site, capturing and passing along the Event Validation and View State variables. I now have a page of 500 results, but they are displayed 20 at a time, so I'm trying to scrape the page that is displayed when the "Next 20 results" button is clicked. It keeps returning "11|pageRedirect||/Error.aspx|"

Does anyone have a suggestion on the next area I should focus on in my debugging efforts? Could it be the cookies? Any help is much appreciated! Thanks.
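
For reference, the string above follows the ASP.NET AJAX partial-postback ("delta") layout of length|type|id|content| segments, so it is a redirect instruction to /Error.aspx rather than a page of results. Here is a minimal Python sketch for splitting such a response. The function name parse_delta is made up for illustration, and the sketch assumes the length field counts characters:

```python
def parse_delta(body):
    """Split a delta response into (type, id, content) tuples."""
    parts = []
    pos = 0
    while pos < len(body):
        # Each segment looks like: length|type|id|content|
        bar = body.index("|", pos)
        length = int(body[pos:bar])
        type_end = body.index("|", bar + 1)
        id_end = body.index("|", type_end + 1)
        content = body[id_end + 1:id_end + 1 + length]
        parts.append((body[bar + 1:type_end], body[type_end + 1:id_end], content))
        pos = id_end + 1 + length + 1  # step over the trailing "|"
    return parts

print(parse_delta("11|pageRedirect||/Error.aspx|"))
```

A segment of type pageRedirect means the server rejected the request and wants the browser sent elsewhere, so the useful debugging target is the request, not the response.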

-Joe

Here is some additional info:

From the "Display Raw Request" of the HTTP transaction which works (captured during the Proxy session):

POST http://www.showcase.com/AppRoot.aspx?cc=0 HTTP/1.1
Proxy-Connection: Keep-Alive
Referer: http://www.showcase.com/AppRoot.aspx?cc=0#&&/wEXAQURV29ya2Zsb3dIaXN0b3J5...
Host: www.showcase.com
Pragma: no-cache
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept: */*
x-microsoftajax: Delta=true
Cache-Control: no-cache
Cookie: s_cc=true; s_evar13=Data%20Not%20Available; s_sq=cgproduction%3D%2526pid%253DSearch%252520Results%2526pidt%253D1%2526oid%253Dfunctiononclick%252528%252529%25257BShowLoader%252528%252529%25253B%25257D%2526oidt%253D2%2526ot%253DSUBMIT%2526oi%253D117; s_nr=1263260528001; ASP.NET_SessionId=nyxvxo55vx4pv3exqwviaj3p; IPLocationCookie=Lat=33.9799995422363&Lon=-118.443000793457&Ctry=UNITED STATES&Rgn=CA&City=MARINA DEL REY&Zip=90291&Addy=&ZoomLevel=14; ShowcaseAnonymousGUID=3974b6df-13f7-41e1-a956-648985c19835; LSL=DC7210ABC584838AD34C99E6A2C3C541
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Content-Length: 70483
Accept-Encoding: gzip, deflate

ctl00%24sm=ctl00%24sm%7Cctl00%24cphMain%24wfMain%24ctl00%24ctl00%24Search1%24ctl01%24ctl00%24SearchResults1%24ctl01%24ctl00%24summaryResults%24moduleSummaryResults%24pageNavigation%24btn ...

From the "Last Request" of the scrapeable file which returns the error:

POST /AppRoot.aspx?cc=0 HTTP/1.1
Referer: http://www.showcase.com/AppRoot.aspx?cc=0#&&/wEXAQURV29ya2Zsb3dIaXN0b3J5...
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Content-Type: application/x-www-form-urlencoded
Host: www.showcase.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Cookie: ASP.NET_SessionId=2qelv055dql2al55hp5j2ar4; ShowcaseAnonymousGUID=845f5fdd-0fc1-46a4-bc6d-8d41823c5e5b; IPLocationCookie=Lat=33.9799995422363&Lon=-118.443000793457&Ctry=UNITED STATES&Rgn=CA&City=MARINA DEL REY&Zip=90291&Addy=&ZoomLevel=14; LSL=DC7210ABC584838AD34C99E6A2C3C541
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Encoding: gzip
Content-Length: 67986

ctl00%24sm=ctl00%24sm%7Cctl00%24cphMain%24wfMain%24ctl00%24ctl00%24Search1%24ctl01%24ctl00%24SearchResults1%24ctl01%24ctl00%24summaryResults%24moduleSummaryResults%24pageNavigation% ...

Looks like you have the right idea: you want that last request to look like the one in the proxy as much as humanly possible.

If you don't already have the latest alpha version, there is a tool in there that can help you examine the differences: http://community.screen-scraper.com/node/1329

Then you just need some of the tools in here to hammer your request into matching: http://community.screen-scraper.com/API/ScrapeableFile
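
If the compare tool is not available, the same check can be roughed out by hand. The following Python sketch is not part of screen-scraper; the helper names parse_headers and diff_headers are hypothetical, and only a few of the headers from the two requests posted above are included to keep it short:

```python
def parse_headers(raw):
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def diff_headers(working, failing):
    """Return (name, working_value, failing_value) for every header that differs."""
    names = sorted(set(working) | set(failing))
    return [(n, working.get(n), failing.get(n))
            for n in names if working.get(n) != failing.get(n)]

# Headers copied from the proxy transaction that works.
proxy = parse_headers("""
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept: */*
x-microsoftajax: Delta=true
Cache-Control: no-cache
Host: www.showcase.com
""")

# Headers copied from the scrapeable file's last request.
scraped = parse_headers("""
Content-Type: application/x-www-form-urlencoded
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Host: www.showcase.com
""")

for name, want, got in diff_headers(proxy, scraped):
    print(f"{name}: proxy={want!r} scraped={got!r}")
```

On the two requests in this thread, this flags Accept, Cache-Control, Content-Type and x-microsoftajax as differing or missing, which gives a short list of headers to try setting on the scrapeable file.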

I'm trying the "Compare with proxy transaction..." button

Thanks for your suggestion. I've upgraded to the latest alpha version and followed the instructions for "Comparing with proxy...", but after I'm prompted to select the proxy transaction and I select it, no compare window pops up. In fact, nothing seems to happen. Am I doing something wrong?

Joe,

The "compare with proxy transaction" feature is a little buggy (and thus still in alpha). It sometimes takes repeating those steps a second time to get it to work. Despite the bugs, it is very helpful for this kind of thing, especially when you can't view the entire raw last request ;(.

When I proxied the site, my referers were always just the following, nothing more:

http://www.showcase.com/AppRoot.aspx?cc=0

In your example you're appending what looks like the values from some of the post parameters. Try just using the above.
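
The extra text in your referer is everything from the "#" onward. As a small illustration in Python (not screen-scraper code), the standard library's urldefrag drops that part; the trailing "..." is the truncation from your post, left as it was:

```python
from urllib.parse import urldefrag

# Referer as it appears in the failing request (truncated in the original post).
referer = "http://www.showcase.com/AppRoot.aspx?cc=0#&&/wEXAQURV29ya2Zsb3dIaXN0b3J5..."

# urldefrag splits the URL at the first "#" into (url, fragment).
clean, fragment = urldefrag(referer)
print(clean)
```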

It's been my experience that .NET sites are very, very picky about you sending exactly the right information in your request. For pages 2 and 3 of the results, I'm seeing what looks like two parameters for every property from the previous page being posted to the subsequent page (crazy).

For example:


ctl00$cphMain$wfMain$ctl00$ctl00$Search1$ctl01$ctl00$SearchResults1$ctl01$ctl00$summaryResults$moduleSummaryResults$moduleWorkflow$ctl00$ctl00$propertiesListTab$propertyListResult$ctl00$ctl00$propertyList$showcaseResultsGrid$rptProperties$ctl00$FlexLease$address$hdnPropertyId = 1058627

ctl00$cphMain$wfMain$ctl00$ctl00$Search1$ctl01$ctl00$SearchResults1$ctl01$ctl00$summaryResults$moduleSummaryResults$moduleWorkflow$ctl00$ctl00$propertiesListTab$propertyListResult$ctl00$ctl00$propertyList$showcaseResultsGrid$rptProperties$ctl00$FlexLease$address$hdnPropertyName = 134 S 400 E

Just make sure you're passing everything. Note, too, that the names of the parameters (keys) will sometimes change dynamically. It may be helpful to set your POST parameters in a loop inside a script, using the addHTTPParameter method.
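
To show the general idea of that loop, here is a rough Python sketch that collects every hidden input from the previous page and encodes the pairs as a POST body, so that renamed keys are picked up automatically. It is a generic illustration, not screen-scraper's API: the class name HiddenInputCollector and the shortened field names in the sample page are made up, and it assumes the property fields are hidden inputs. In screen-scraper each pair would be handed to addHTTPParameter instead:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class HiddenInputCollector(HTMLParser):
    """Collect (name, value) pairs from every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.params = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.params.append((attrs["name"], attrs.get("value") or ""))

# Hypothetical, shortened page fragment; the real names are far longer.
page = '''
<input type="hidden" name="__VIEWSTATE" value="abc" />
<input type="hidden" name="ctl00$rptProperties$ctl00$hdnPropertyId" value="1058627" />
<input type="hidden" name="ctl00$rptProperties$ctl00$hdnPropertyName" value="134 S 400 E" />
'''

collector = HiddenInputCollector()
collector.feed(page)

# Loop over whatever names the page actually used and encode them in page order.
body = urlencode(collector.params)
print(body)
```

Because the names are read from the page rather than hard-coded, the request keeps working when the server renames a key between pages.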

-Scott