I have no clue...

I'm scraping another site that has a problem similar to the last one, and I have no clue how to solve it. The URL contains a number that I don't know how to get. For example, when I look up the value of 3352 Maple Ave 90000, the URL becomes http://www.website.com/home-values-3352-maple-ave-los-angeles-ca-90000-1... but I have no idea how to get the "183917120" portion of the URL. I have thousands of properties to search, so creating an individual scraping session for each property would be impractical. Any help would be appreciated.

"Let go, Luke."

Adrian,

It looks like it's redirecting properly on its own. If you don't need that number, my suggestion is to just allow screen-scraper to follow the redirect. If you do need the number, then I may need to call in the big guns on this one.

I've encountered situations like this in the past. screen-scraper uses Apache's HTTP Client internally to handle much of the communication between the client and the server. Some interactions are handled automatically by HTTP Client and are not actually exposed to screen-scraper, making accessing the origins of this type of data difficult.

Is there any reason why you couldn't just read in your list of addresses and zip codes, pass them as GET parameters, and let screen-scraper follow the redirect? Does the resulting page not contain the data you need to take the next step?

-Scott

Need Big Guns...

Hi Scott,

I tried as you suggested and received the following error: Warning! Received a status code of: 404.

Any suggestions? Thanks for your help!!!

Please post your log.

Forgot to mention...

...that I had proxied the site prior to posting my first response. When proxying the site, I noticed that the initial submission URL looks like this:

http://www.website.com/propertyinfo.aspx?a=3352+Madera+Ave&z=90039

So, if you set up your scrapeable file to look like the above and press go, you'll notice screen-scraper follows the 301 redirect and takes you to...

http://www.website.com/home-values-3352-madera-ave-los-angeles-ca-90039-183917120.mvc
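If you ever do need that trailing number outside of screen-scraper, here is a rough Python sketch of the same idea. It is only an illustration, not how screen-scraper does it internally. The function names are made up for this example, the `a` and `z` parameter names come from the proxied URL above, and the pattern assumes the number always sits right before ".mvc", as it does in the one redirect shown here.

```python
import re
from urllib.parse import urlencode, urlparse

# Placeholder host and page taken from the proxied request in this thread.
BASE = "http://www.website.com/propertyinfo.aspx"


def build_search_url(address, zip_code):
    """Build the initial submission URL (hypothetical helper)."""
    # urlencode writes spaces as '+', matching "3352+Madera+Ave" above.
    return BASE + "?" + urlencode({"a": address, "z": zip_code})


def extract_property_id(final_url):
    """Return the digits just before '.mvc', or None if the pattern is absent.

    Assumes the redirected URL ends in '-<digits>.mvc', as in the example above.
    """
    match = re.search(r"-(\d+)\.mvc$", urlparse(final_url).path)
    return match.group(1) if match else None


# One row per property; in practice this list would be read from a file.
properties = [("3352 Madera Ave", "90039")]
for address, zip_code in properties:
    print(build_search_url(address, zip_code))
```

Request each generated URL with any client that follows the 301, then pass the final URL to `extract_property_id` to pull the number out.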

The first rule of screen-scraping: As closely as you can, imitate the requests to the server that your browser makes. Study the raw contents of a successful request from your proxy session while constructing your scrapeable files.

Remember to let the information in your proxy session guide you towards a perfect scraping session.

-Scott