Scraping the data from the Page

Hi All,

I have created a session to scrape the PropertyRoom site. To extract the details I created four scrapeable files. The first three are:
1] All Categories : URL http://www.propertyroom.com/all-categories.aspx
2] CategorySearch : URL http://www.propertyroom.com/c/bikes_beach-bikes
3] ItemDetails : http://www.propertyroom.com/l/panama-jack-beach-bike/8104884

While recording these pages through a proxy session I noticed that the site was repeatedly calling the URL http://www.propertyroom.com/ajax/ajax.svc/GetClientListings, so I created a fourth scrapeable file for that URL:

4] GetItemListings : http://www.propertyroom.com/ajax/ajax.svc/GetClientListings

After creating these four scrapeable files I added extractor patterns and session variables to pass the extracted values from one scrapeable file to the next.

But when I run the session, the third scrapeable file (the one for the item details) does not return the details. From that page I need to scrape values such as "Current Price" and "Ends". The last response of that scrapeable file shows "Loading..." instead of the actual details.

Can anyone please help me scrape these details?

Data exists on the page, but it is not in the page source

Hi There,

The data is visible on the HTML page, but if you look for it in the page's source code you will only find the text "Loading..." instead of the details. For an example, see the following link:
http://www.propertyroom.com/l/specialized-hardrock-mountain-bike/8108451

On this page you can see the "Current Price", "Ends", and other details. If you check the source code of the page, however, it shows the text "Loading..." instead of the actual values.

I want to create a session that scrapes details such as "Current Price" and "Ends". Please help me with this task.

If the source says 'loading'

If the source says 'loading' it means that the results aren't in the current response. If you look at that page, there is most likely a META refresh tag. Screen-scraper won't automatically run that, so you just need to insert a pause, and make the same request again until the response has the data instead of the "loading" message.

If it's not a META refresh, there might be a JavaScript redirect to the results, and if that's the case you just need to make that request ... though I think this option is less likely.
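For what it's worth, here is a minimal sketch of that retry approach in Interpreted Java. It assumes the script is attached to your item-details scrapeable file (called "ItemDetails" here only for illustration) and is set to run "After file is scraped"; the pause length and retry limit are arbitrary:

Object v = session.getVariable("RETRY_COUNT");
int tries = (v == null) ? 0 : Integer.parseInt(v.toString());

if (scrapeableFile.getContentAsString().indexOf("Loading") != -1 && tries < 5) {
    // Details aren't rendered yet; wait a few seconds and request the page again.
    session.setVariable("RETRY_COUNT", String.valueOf(tries + 1));
    session.pause(3000);
    session.scrapeFile("ItemDetails");
} else {
    session.setVariable("RETRY_COUNT", "0");
}

Because the script runs again after each re-request, it keeps retrying until the "Loading" text disappears or the retry limit is reached.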

Sajid, In order to determine

Sajid,

In order to determine the time remaining and the ending date/time for each auction you will need to reverse engineer the JavaScript in this file:

http://www.propertyroom.com/scripts/js-bundle.min.js?3.0.1.30

This is a very unusual situation which we have very little experience with. Over the past 10 years we have only needed to handle a situation like this once before.

Proxy the three pages of your target website. Click the Find button above your proxy transactions and search for the word "Ends". Be sure to check the box "Case sensitive".

You should see two results from the same URL that look something like this.

http://www.propertyroom.com/l/huffy-panama-jack/8105262

If you examine the surrounding HTML you will notice an element with the id "uxTimeLeft". Perform another search of your proxy transactions for this string ("uxTimeLeft").

You should see two results come back. Click on the one result that is not the URL you just looked at.

http://www.propertyroom.com/scripts/js-bundle.min.js?3.0.1.30

The content of this response is very large and it is being truncated by screen-scraper. Click the "Display Raw Response" button to see the entire response. Copy and paste the entire response into your favorite text editor.

Perform a search in your text editor for the string "uxTimeLeft". Note how this id is referenced inside a function called CountdownListingDetail. Do a search for that function in the same document.

function CountdownListingDetail(){
var seconds=0;seconds=$('#uxTimeLeft').attr('tlv');if(seconds>0){seconds--;
$('#uxTimeLeft').html(seconds.formatTimeLeft());
$('#uxTimeLeft').attr('tlv',seconds);}
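Notice that the raw number of seconds remaining is carried in the "tlv" attribute of the uxTimeLeft element; the script just counts it down and formats it in the browser. If that attribute shows up in the listing-page response you proxied, you should be able to capture it with an ordinary extractor pattern along these lines (the token name SECONDS_LEFT is only an example, and you may need to adjust the surrounding text to match the actual HTML):

tlv="~@SECONDS_LEFT@~"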

Scan across that function and you will notice a reference to the function formatTimeLeft. Search for this function and you will find this code.

This is the block of code you will need to reverse engineer.

Number.prototype.formatTimeLeft=function(){if(this<1)return"Closing";var seconds=this;var days=Math.floor(seconds/86400);seconds-=days*86400;var hours=Math.floor(seconds/3600);seconds-=hours*3600;var minutes=Math.floor(seconds/60);seconds-=minutes*60;if(days>=10){return days+"d";}
else if(days<10&&days>=1){return days+"d "+hours.leadingZero()+"h";}
else if(days==0&&hours>=1){return hours.leadingZero()+"h "+minutes.leadingZero()+"m";}
else if(days==0&&hours==0&&minutes<60){return minutes.leadingZero()+":"+seconds.leadingZero();}}

Here is a snippet of code one of our developers created when we were scraping Groupon.com for another client. The language is Interpreted Java, not Javascript.

String timeLeft(year, month, day, hour, min, sec){
// Milliseconds between the auction's ending date/time (UTC) and now
duration = Date.UTC(year,month,day,hour,min,sec)-new Date().getTime();

// Break the remaining duration into whole days, hours, minutes and seconds
days = duration/(1000*60*60*24);
hours = (duration/(1000*60*60))-days*24;
minutes = (duration/(1000*60))-(days*24*60)-(hours*60);
seconds = duration/1000-days*24*60*60-hours*60*60-minutes*60;
session.log("long >>>>"+ duration);
session.log("Days >>>>" + days);
time = days+" Days " + hours + " Hours " + minutes + " Minutes " + seconds + " Seconds";
return time;
}

String[] months = dataRecord.get("MONTH").split(" ");
// Date.UTC() expects the year as an offset from 1900 and a zero-based month
int year = Integer.parseInt(dataRecord.get("YEAR"))-1900;
int month = Integer.parseInt(sutil.reformatDate( months[1], "MMM", "M" ).toUpperCase())-1;
int day = Integer.parseInt(dataRecord.get("DAY"));
int hour = Integer.parseInt(dataRecord.get("HOUR"));
int min = Integer.parseInt(dataRecord.get("MIN"));
int sec = Integer.parseInt(dataRecord.get("SEC"));

timeLeft = timeLeft(year, month, day, hour, min, sec);

This code won't work for propertyroom.com. You will need to modify it for your situation.
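For propertyroom.com the modification might look roughly like this (again Interpreted Java). This is only a sketch: it assumes you have extracted the value of the "tlv" attribute into a session variable named SECONDS_LEFT (a made-up name), and it reproduces the branches of formatTimeLeft without the leading-zero formatting:

long totalSeconds = Long.parseLong(session.getVariable("SECONDS_LEFT").toString());

// The ending date/time is simply "now" plus the seconds remaining.
java.util.Date ends = new java.util.Date(System.currentTimeMillis() + totalSeconds * 1000);

// Break the remaining seconds down the same way formatTimeLeft does.
long seconds = totalSeconds;
long days = seconds / 86400;   seconds -= days * 86400;
long hours = seconds / 3600;   seconds -= hours * 3600;
long minutes = seconds / 60;   seconds -= minutes * 60;

String timeLeft = "";
if (totalSeconds < 1)      timeLeft = "Closing";
else if (days >= 10)       timeLeft = days + "d";
else if (days >= 1)        timeLeft = days + "d " + hours + "h";
else if (hours >= 1)       timeLeft = hours + "h " + minutes + "m";
else                       timeLeft = minutes + ":" + seconds;

session.setVariable("ENDS", ends.toString());
session.setVariable("TIME_LEFT", timeLeft);
session.log("Ends >>>> " + ends + " (" + timeLeft + ")");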

I hope this helps,

Scott

Most of the time those

Most of the time those "loading" pages mean to just wait a few seconds and try the same HTTP request again.