Malformed Request
I'm trying to scrape the following site: https://www.geoaccess.com/wellmark/po/default.asp?SelectedNet=BlueDental....
I'm entering a single zip code, and I keep getting a page with the following message: "The page you have requested cannot be displayed because the request for the page was malformed."
The parameters are pretty basic, so I can't figure out why the request is considered malformed.
Any ideas?
Malformed Request
John,
Please have a look at a blog entry I recently completed on the topic. Hopefully, it will give you some tools to work with.
http://blog.screen-scraper.com/2008/06/04/scraping-aspnet-sites/
-Scott
Malformed Request
John,
Wow, sorry I missed that. So, they're geocoding the zip you enter. You may need two different scrapeable files, then: one for RefineGeneral.asp and another, using DisplayResults.asp, for all subsequent pages.
Without the ability to test this easily I'm relying on your feedback for my suggestions. Am I sending you in the right direction, do you think?
-Scott
Malformed Request
At least on this site, the xcoord and ycoord parameters change based on the zip code you submit. There are two other parameters (btnResults.x and btnResults.y) that record the actual x and y coordinates of the button click.
Malformed Request
John,
In response to your question about the x and y coordinate parameters that are passed to DisplayResults.asp but not to RefineGeneral.asp: do those parameters actually make a difference, whether they're included or not? I wish I had an easy way to test this... but my IP is bad, ya know.
The x and y parameters are sent by default when the user clicks an image-type submit button. They record the coordinates at which you clicked on the button. I've never seen them matter, so they're probably OK to ignore.
Please give it a try without them and see if it works.
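For anyone who wants to try that outside of screen-scraper, here is a minimal sketch in Python of dropping the click-coordinate parameters from a query string. The helper name drop_click_coords is hypothetical, and the sample string is a shortened version of the POST data from the log. It only removes keys ending in ".x" or ".y", so xcoord and ycoord are left alone.

```python
from urllib.parse import parse_qsl, urlencode


def drop_click_coords(query: str) -> str:
    """Return the query string without button click-coordinate keys.

    Hypothetical helper: keys ending in ".x" or ".y" (for example
    btnResults.x) are removed; everything else, including empty
    values, is kept in its original order.
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not k.lower().endswith((".x", ".y"))]
    return urlencode(kept)


# Shortened sample based on the POST data in the log.
sample = "zip=50317&mileage=10&state=&btnResults.x=30&btnResults.y=8"
trimmed = drop_click_coords(sample)
print(trimmed)
```

Whether the server accepts the request without those two keys is exactly what needs testing; the sketch only shows how to build the trimmed parameter list.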
Thanks,
Scott
Malformed Request
jclerie,
I'm not sure why, but the site seems to prefer "DisplayResults.asp", the page it redirects you to, over "RefineGeneral.asp".
Try posting each page's data to "DisplayResults.asp" and removing "RefineGeneral.asp" altogether.
-Scott
Malformed Request
I've scraped a number of sites that used geoaccess and never had a problem before now. Yes, they like lots of state variables passed along, but I've never had to set the referrer. (I tried it in this case with no luck).
The site redirects to the results page but then immediately redirects again to an error page. If I submit the URL from the first redirect directly into a browser, I get results, so I can't figure out why the second redirect happens.
Here's the relevant log entry--
Results: Preliminary URL: https://www.geoaccess.com/wellmark/po/RefineGeneral.asp
Results: Using strict mode.
Results: POST data: address1=&city=&state=&zip=50317&mileage=10&usermiles=&speccode=&guid=D0C1AEDE-B0B3-4D49-BCDD-D0F28C83C7B0&vguid=AC782ED4-2DC8-4486-AFC6-EB3B32FA706F&primcolor=white&seccolor=white&tertcolor=white&selectednet=BlueDental&srchtype=SEARCH&mode=&returnurl=&membergender=&prodcode=&transid=&planid=&btnResults.x=30&btnResults.y=8
Results: Resolved URL: https://www.geoaccess.com/wellmark/po/RefineGeneral.asp
Results: Sending request.
Results: Redirecting to: https://www.geoaccess.com/wellmark/po/DisplayResults.asp?returnurl=&planid=&btnresults.x=30&btnresults.y=8&state=&vguid=AC782ED4%2D2DC8%2D4486%2DAFC6%2DEB3B32FA706F&membergender=&usermiles=&address1=&tertcolor=white&city=&mode=&transid=&zip=50317&speccode=&guid=D0C1AEDE%2DB0B3%2D4D49%2DBCDD%2DD0F28C83C7B0&primcolor=white&seccolor=white&selectednet=BlueDental&srchtype=SEARCH&xcoord=93552702&ycoord=41614443&usrclass=S&quality=ZCP&mileage=10&prodcode=
Results: Redirecting to: https://www.geoaccess.com/wellmark/po/Warning.asp?guid=&vguid=&selectednet=BlueDental&error=IV
Results: Processing scripts after a file is scraped.
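One way to see what the server adds during the first redirect is to compare the POST data with the query string of the DisplayResults.asp URL. The sketch below does that in Python with the two strings copied from the log above; key names are lowercased before comparing because the redirect lowercases btnResults.x and btnResults.y. The variable names are my own.

```python
from urllib.parse import parse_qsl, urlsplit

# POST data and first redirect URL, copied from the log entry.
post_data = (
    "address1=&city=&state=&zip=50317&mileage=10&usermiles=&speccode="
    "&guid=D0C1AEDE-B0B3-4D49-BCDD-D0F28C83C7B0"
    "&vguid=AC782ED4-2DC8-4486-AFC6-EB3B32FA706F"
    "&primcolor=white&seccolor=white&tertcolor=white&selectednet=BlueDental"
    "&srchtype=SEARCH&mode=&returnurl=&membergender=&prodcode=&transid="
    "&planid=&btnResults.x=30&btnResults.y=8"
)
redirect_url = (
    "https://www.geoaccess.com/wellmark/po/DisplayResults.asp"
    "?returnurl=&planid=&btnresults.x=30&btnresults.y=8&state="
    "&vguid=AC782ED4%2D2DC8%2D4486%2DAFC6%2DEB3B32FA706F&membergender="
    "&usermiles=&address1=&tertcolor=white&city=&mode=&transid=&zip=50317"
    "&speccode=&guid=D0C1AEDE%2DB0B3%2D4D49%2DBCDD%2DD0F28C83C7B0"
    "&primcolor=white&seccolor=white&selectednet=BlueDental&srchtype=SEARCH"
    "&xcoord=93552702&ycoord=41614443&usrclass=S&quality=ZCP&mileage=10"
    "&prodcode="
)


def to_params(query: str) -> dict:
    """Parse a query string into a dict with lowercased keys."""
    return {k.lower(): v for k, v in parse_qsl(query, keep_blank_values=True)}


posted = to_params(post_data)
redirected = to_params(urlsplit(redirect_url).query)

# Keys that appear only after the server-side redirect.
added = {k: redirected[k] for k in sorted(redirected.keys() - posted.keys())}
print(added)
```

For this log entry the difference is the four parameters xcoord, ycoord, usrclass and quality; every posted key is carried through to the redirect unchanged.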
Malformed Request
jclerie,
About a year ago I did a scrape for a client that passed through the geoaccess.com domain. I hit it hard enough that my IP is still blocked. A thorough test would take more setup than I have time for right now, but that site really likes all of the relevant hidden POST parameters (VIEWSTATE, etc.) passed properly, and it sometimes needs the referer to be set manually.
http://www.screen-scraper.com/support/docs/api_documentation.php#setReferer
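The link above covers setting the referer inside screen-scraper. As a rough illustration of the same idea outside that tool, here is a sketch using Python's standard library that builds (but does not send) a POST request with a Referer header. The function name build_search_request, the shortened field list, and the referer URL are all assumptions for illustration, not values confirmed to work against this site.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(url: str, fields: dict, referer: str) -> Request:
    """Build a POST request carrying the form fields and a Referer header.

    Hypothetical helper: nothing is sent here, the caller decides
    when (and whether) to open the request.
    """
    body = urlencode(fields).encode("ascii")
    req = Request(url, data=body)  # supplying data makes this a POST
    req.add_header("Referer", referer)
    return req


req = build_search_request(
    "https://www.geoaccess.com/wellmark/po/DisplayResults.asp",
    # Shortened field list; a real request would carry every hidden field.
    {"zip": "50317", "mileage": "10", "selectednet": "BlueDental"},
    # Assumed referer: the search form page the browser would come from.
    referer="https://www.geoaccess.com/wellmark/po/default.asp",
)
print(req.get_method(), req.get_header("Referer"))
```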
They've been one of the pickiest I've scraped.
Let us know how it goes and good luck.
-Scott