Scraping .Net Websites

Hi there,

I'm looking to scrape an ASPX website that uses postbacks. Basically it's like a normal website with an index page and a "next" button for the next page, but the paging is done with ASPX postbacks / JavaScript.

Any ideas how to get it to return the pages I need?

Cheers

Scraping .Net Websites

jonno,

Please have a look at a blog entry I recently completed on the topic. Hopefully, it will give you some tools to work with.

http://blog.screen-scraper.com/2008/06/04/scraping-aspnet-sites/

-Scott

Scraping .Net Websites

I'm of the opinion that anything written in .NET is rough to scrape, mainly because of its magical POST variable, called "__VIEWSTATE" (note the two leading underscores). It's giant and ugly and completely unreadable, and it usually changes every time you *click* on a link.

Without seeing the webpage code myself, it's hard to say what you'd have to do to make it work. It's always possible though. Screen-scraper does nothing different from your web browser, except that you have to go through the steps manually.
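
To give an idea of what "the steps" are: on a typical ASP.NET page the "next" link is a javascript:__doPostBack('target','argument') call, which just copies those two values into the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the form. Here is a rough Python sketch of pulling the two values out of such a link. The control name ctl00$NextLink and the helper name parse_postback are made up for illustration, and your page's links may be quoted or encoded differently.

```python
import re

# Matches __doPostBack('target','argument') with single-quoted arguments.
# Assumption: the href uses plain single quotes, not HTML entities.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument), or None if href is not a postback link."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical "next page" link as it might appear in the page source.
target, argument = parse_postback("javascript:__doPostBack('ctl00$NextLink','')")
```

Those two values are what you would then send back in the POST, along with the rest of the form's hidden fields.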

One thing to note: sometimes pages use redirects, and if it's a nasty page with a big __VIEWSTATE variable, then you might just want to let screen-scraper take care of the work, rather than trying to trace the pages exactly as you see them in your proxy session.

Generally, pay close attention to your POST data when you proxy a page. You may find it necessary to have a couple of scrapeableFiles in sequence that all point to the same base URL, yet pass different POST parameters, like the ugly __VIEWSTATE variable.
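
Outside of screen-scraper, the same idea looks roughly like this in Python: grab every hidden field from the page you just received (including the current __VIEWSTATE), add the postback target, and POST the lot back to the same URL. This is only a sketch under assumptions. The sample HTML, the field values, and the control name ctl00$NextLink are invented, and a real page may need extra fields that you will only spot in your proxy session.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Build the POST parameters for one postback from the page just received."""
    parser = HiddenFieldParser()
    parser.feed(html)
    data = dict(parser.fields)  # carries the page's current __VIEWSTATE
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data


# Invented sample page, standing in for the response to the previous request.
SAMPLE_PAGE = """
<form method="post" action="index.aspx">
<input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2Nzg5Ow==" />
<input type="hidden" name="__EVENTVALIDATION" value="AbCdEf" />
<a href="javascript:__doPostBack('ctl00$NextLink','')">Next</a>
</form>
"""

post_data = build_postback(SAMPLE_PAGE, "ctl00$NextLink")
body = urlencode(post_data)  # form-encoded body to POST back to the same URL
```

Each response comes with a fresh __VIEWSTATE, so you would rebuild the POST data from the latest page before every "next" request instead of reusing the old values.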

If you need more help, try posting some specific code snippets and we'll see what we can do ;)

Tim