scraping help

I am trying to scrape a website. I am able to scrape the first page of results, but when I go to the next page it gives me an error.
I have checked the cookies and they are being set properly, so I'm not sure why the error happens. The site says that there is a cookie issue.
The URL for the next page is also being set correctly.

Any answers?

Thanks,

it worked~

todd,

Thanks for the support. Now I am able to scrape the site for properties.
I think I got confused when I tried the same URL in Firefox and IE: IE rejected the URL, but Firefox accepted it.
Anyway, good to see it working.

cheers

scraping help

Ah. I see the issue. Take a look at this URL snippet from your request:

/2/pf/property/searchResults.do?pageNumber=2&amp;atn=ATN_GOTO_PAGE&amp;=

You'll notice that instead of the usual "&" delimiters you have "&amp;". When screen-scraper tidies a page, it replaces every "&" with "&amp;", which is likely the root of the issue. There are a few possible solutions:

1. Don't tidy the page. If you're using the Professional Edition of screen-scraper, you can do this under the "Advanced" tab for your scrapeable file.
2. Extract only the dynamic pieces of the URL, then embed them individually. The resulting URL might look something like this:

http://www.propertyfinder.com/~#DIR_NUM#~/pf/property/searchResults.do?pageNumber=~#PAGE_NUM#~&atn=ATN_GOTO_PAGE&=

3. After extracting the full URL, write a script that replaces any "&amp;" strings with "&".
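As a rough illustration outside screen-scraper (plain Python, not screen-scraper's own scripting API), here is a minimal sketch of why a tidied link breaks and how options 2 and 3 above repair it. `param_names`, `fix_ampersands` and `build_next_page_url` are hypothetical helpers, and `param_names` assumes the site's server splits the query string on a plain "&", as many servers do.

```python
from urllib.parse import urlsplit

# A next-page link as it might look after the page has been tidied:
# the "&" separators have become the HTML entity "&amp;" (shortened example).
TIDIED = "/2/pf/property/searchResults.do?pageNumber=2&amp;atn=ATN_GOTO_PAGE"


def param_names(url):
    """List the parameter names a server that splits on "&" would see."""
    query = urlsplit(url).query
    return [pair.split("=", 1)[0] for pair in query.split("&") if pair]


def fix_ampersands(url):
    """Option 3: turn every "&amp;" back into a plain "&"."""
    return url.replace("&amp;", "&")


def build_next_page_url(dir_num, page_num):
    """Option 2: rebuild the URL from only the dynamic pieces.

    dir_num and page_num are hypothetical stand-ins for the values
    extracted into ~#DIR_NUM#~ and ~#PAGE_NUM#~.
    """
    return (
        f"http://www.propertyfinder.com/{dir_num}/pf/property/searchResults.do"
        f"?pageNumber={page_num}&atn=ATN_GOTO_PAGE&="
    )


if __name__ == "__main__":
    print(param_names(TIDIED))                  # ['pageNumber', 'amp;atn']
    print(param_names(fix_ampersands(TIDIED)))  # ['pageNumber', 'atn']
    print(build_next_page_url("2", 3))
```

A plain string replace is used instead of `html.unescape` on purpose: `html.unescape` also decodes legacy entity names that need no trailing semicolon (for example `&copy` turns into a copyright sign), which could mangle an unrelated query parameter.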

Kind regards,

Todd

scraping help

I tried all the combinations of the cookie spec as well, and tried setting the cookie in the session and executing the script before the scraping session began, but I'm still having trouble. I tried playing with the cookies set by the site, and it is looking for the session ID cookie, jsessionid.
But have a look at the request we are sending across:
------------------------------------------
GET /2/pf/property/searchResults.do?pageNumber=2&amp;atn=ATN_GOTO_PAGE&amp;= HTTP/1.1
Referer: http://www.propertyfinder.com/2/pf/property/searchByArea.do?region=&rentalPeriod=monthly&atn=ATN_NEW_SEARCH&browseArea=&tenureType=buy&minRentPrice=&minBuyPrice=&minRentPriceWeekly=&minRentPriceMonthly=&maxRentPrice=&maxBuyPrice=&maxRentPriceWeekly=&maxRentPriceMonthly=&minBedrooms=&searchString=Clerkenwell
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Host: www.propertyfinder.com
Cookie: jsessionid=B8D4B8BEE6AF6FC92CA0E082F3087138
Keep-Alive: 300
Cookie: ARPT=IIIIIIS10.1.1.43CKIMQ
Accept-Charset: ISO-8859-1, ISO-10646-1, utf-8;q=0.66, *;q=0.66
Cookie: JSESSIONID=B8D4B8BEE6AF6FC92CA0E082F3087138
Accept-Language: en-us, en;q=0.50
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
-----------------------------------------------------------
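The dump above sends three separate Cookie headers, one of them with a lower-case `jsessionid` name. RFC 6265 says a client must not attach more than one Cookie header to a request, and cookie names are generally matched case-sensitively, so a server expecting `JSESSIONID` may not treat `jsessionid` as the same cookie. As a hedged sketch in plain Python (standard library only, not screen-scraper's API; `cookie_headers` and `merged_cookie_header` are hypothetical helper names), this pulls the Cookie lines out of a logged request and folds them into one header value so duplicates are easy to spot:

```python
from http.cookies import SimpleCookie

# Header lines copied from the logged request (blank lines removed,
# Referer/Accept lines left out for brevity).
RAW_HEADERS = """\
Host: www.propertyfinder.com
Cookie: jsessionid=B8D4B8BEE6AF6FC92CA0E082F3087138
Keep-Alive: 300
Cookie: ARPT=IIIIIIS10.1.1.43CKIMQ
Cookie: JSESSIONID=B8D4B8BEE6AF6FC92CA0E082F3087138
"""


def cookie_headers(raw):
    """Return the value of every Cookie header line, in order."""
    values = []
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "cookie":
            values.append(value.strip())
    return values


def merged_cookie_header(raw):
    """Fold all Cookie lines into a single header value.

    Names are kept exactly as sent, so "jsessionid" and "JSESSIONID"
    remain two separate entries.
    """
    jar = SimpleCookie()
    for value in cookie_headers(raw):
        jar.load(value)
    return "; ".join(f"{m.key}={m.value}" for m in jar.values())


if __name__ == "__main__":
    print(len(cookie_headers(RAW_HEADERS)))  # 3
    print(merged_cookie_header(RAW_HEADERS))
```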

We seem to have all the info there...

cheers

scraping help

Hi,

I just posted a FAQ for that issue here. Would you mind checking it out and posting back if it doesn't seem to help?

Thanks,

Todd Wilson