How to get to this data
I stumbled upon a site where the search results are somehow loaded outside the main page. If I look at the source code or save the page as HTML, the search-results section isn't there.
I was wondering whether the professional version of the software can handle this type of page, and what the solution to it is.
Here is an example URL:
http://looplink.natl.kwcommercial.com/looplink/kwc/SearchResults.aspx?SearchType=FS&VIEWSTATEID=79868016&PgCxtGuid=58e1fd4a-b13d-44ee-bbc2-5d51ac185b0f&PgCxtCurFLKey=LooplinkSearchPage&name=kwc&LooplinkRadioButton=FS&QryRadioCountry=US&QryRadioStateList=AL&QryRadioLooplinkSubmit=Search&ReturnTargetUrl=%2fxNet%2fLoopLink%2fLoopLinks%2fkwc%2fqryradio.aspx&R_LL_RB=FS&R_QR_Country=US&R_QR_SL=AL
I already got that far, but I'm getting an "Unable to Parse" error message when I try to generate a scrapeable file from http://looplink.natl.kwcommercial.com/xNet/ExternalServices/Listing/ListingSearchLL.svc/PerformSearch
pipe.jack,
After you get the "Unable to Parse" error, you should notice that a scrapeable file is still created in your scraping session.
If you go back to the proxy transaction and look under its Request tab, you will see the POST data that was submitted with the request.
In order to request this page in screen-scraper you will need to use scrapeableFile.setRequestEntity.
Now, looking at the content of the POST data, you'll notice the "pg", "sg", "sid", "ai", "si", and "c" values look like they could be dynamic; that is, their values are likely generated by the server for each session. So, in order to retrieve the correct values for your session, you will need to scrape them from a previous page.
Search your proxy transactions for part of one of the values. For example, I searched for "9b1625a3" and found that it was available for scraping from the SearchResults.aspx page. Incidentally, the "pg", "sg", "sid", "ai", "si", and "c" values all appear there together, so a single extractor pattern can capture all of them.
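For illustration only, a screen-scraper extractor pattern is literal text copied from the page with ~@TOKEN@~ placeholders marking what to capture. Assuming the values appear in the response as JSON-style name/value pairs (an assumption; copy the exact surrounding text from your own page source), the pattern might look something like:

```
"pg":"~@PG@~","sg":"~@SG@~","sid":"~@SID@~","ai":"~@AI@~","si":"~@SI@~","c":"~@C@~"
```

Each ~@TOKEN@~ captures whatever appears between the literal text on either side of it.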
Save each token as a session variable, then construct your call to scrapeableFile.setRequestEntity.
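As a sketch only (the session-variable names, values, and parameter order below are assumptions; match them to the POST data in your own proxy transaction), the script boils down to concatenating the scraped tokens into a URL-encoded string. In screen-scraper each value would come from session.getVariable(...) and the finished string would be passed to scrapeableFile.setRequestEntity(...); the self-contained helper below shows the string-building step:

```java
public class BuildEntity {
    // Joins alternating name/value arguments into a request entity such as
    // "pg=...&sg=...&sid=...". In a screen-scraper script, each value would
    // instead be read with session.getVariable("PG"), session.getVariable("SG"),
    // etc., and the result passed to scrapeableFile.setRequestEntity(entity).
    static String buildEntity(String... nameValuePairs) {
        StringBuilder entity = new StringBuilder();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            if (entity.length() > 0) entity.append('&');
            entity.append(nameValuePairs[i]).append('=').append(nameValuePairs[i + 1]);
        }
        return entity.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only -- scrape the real ones for each session.
        String entity = buildEntity("pg", "9b1625a3", "sid", "abc123");
        System.out.println(entity);  // pg=9b1625a3&sid=abc123
    }
}
```

In the actual script you would replace the hard-coded values with the session variables you saved (for example session.getVariable("PG")) before calling scrapeableFile.setRequestEntity(entity).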
Call your script before running the PerformSearch scrapeable file.
This will solve one of the issues you'll encounter when scraping this page. The other issues involve cookies, which you can inspect with the "Compare with proxy transaction..." button at the top of your Last Response tab.
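If the comparison shows a cookie the server expects that screen-scraper isn't sending, the cookie value can often be scraped the same way as the tokens above and set manually with session.setCookie (the domain, cookie name, and session-variable name below are illustrative assumptions; check the API docs for your version):

```
session.setCookie("looplink.natl.kwcommercial.com", "SOME_COOKIE", session.getVariable("SOME_COOKIE_VALUE"));
```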
Take a look and post any questions you have.
-Scott
pipe.jack,
Yes, we currently scrape many, many sites that are powered by Loopnet. And, yes, the site can be scraped by any of our editions.
In order to scrape any site with screen-scraper, we strongly recommend taking the time to walk through our online tutorials. In the tutorials you'll learn how to proxy a site before you start building your scraping session.
When you proxy the site, you can search the proxy transactions for a unique phrase, such as part of one of the addresses in the results. Doing so will reveal that the results page is actually located here:
http://looplink.natl.kwcommercial.com/xNet/ExternalServices/Listing/ListingSearchLL.svc/PerformSearch
After going through our tutorials everything should make a lot more sense.
Good luck,
Scott