Ability to Scrape this Site

I am having difficulty obtaining the URL required to identify the products, as part of the URL is built using JavaScript in the HTML code. I have identified the unique ID of the product, but can't see how to use that to identify all the products and display the results page to get the further information.

Is it still possible to scrape the following site, or is there something I am missing?

http://atsearch.autotrader.co.uk/www/ds_search.asp?did=23482&pageid=5&or...

This may be obvious to some but I am very new to this scraping utility and would appreciate any help offered.

Many Thanks

Adrian Cutler

Thanks

Hi Tim,

I have just come back and seen your response. I will try this, thank you.

Adrian

Ooo, actually...

I was just looking at the site, and it appears to be a simplified version of the AutoTrader system.

If you look at the page source, there's a JavaScript function defined about 2/3 of the way down the document:
function more_info_ds(adID,dist,cat,pos,popName,Width,Height) {
popURL="/www/bikes_popup.jsp?did=23482&pageid=5&originalid=&gid=nogroup&tid=7&start=" + pos + "&distance=" + dist + "&adcategory=" + cat + "&channel=DEALERPAGE&id=" + adID;
pop_up(popURL,popName,Width,Height);
}

Basically, you'll only have to reconstruct this function in a script of your own:
// Using Java...
// "did", "pageid", "originalid", "gid", and "tid" are all variables needed to access the page, and are available as GET parameters in the URL.
// You should already have access to these variables through whatever page you're using to get to this results page.

String newURL = "/www/bikes_popup.jsp?";
newURL += "did=" + session.getVariable("DID");
newURL += "&pageid=" + session.getVariable("PAGEID");
newURL += "&originalid=" + session.getVariable("ORIGINALID");
newURL += "&gid=" + session.getVariable("GID");
newURL += "&tid=" + session.getVariable("TID");

// At this point, you need to start inserting the variables that you'll be scraping off of each of those results in the results page. You need all of the variables, not just the ID, because the site is going to call a script, and you'll likely cause a script error if you don't pass it everything.
// All of these variables I'm going to pull should be in your extractor pattern to match the "

// The 4th parameter goes in as "start". It seems to always be... 1 <= start <= numberOfResultsOnThePage. In this case, 1 through 10
newURL += "&start=" + dataRecord.get("START"); // 4th parameter
newURL += "&distance=" + dataRecord.get("DIST"); // 2nd
newURL += "&adcategory=" + dataRecord.get("CAT"); // 3rd
newURL += "&channel=DEALERPAGE"; // this one was hardcoded in
newURL += "&id=" + dataRecord.get("ID"); // 1st parameter

All that really remains is to make sure that your extractor pattern is catching those 4 parameters inside the JavaScript call. It looks like you already extracted the 1st one, the ID, so just add to the pattern to get the other three, and then call a script like the one just above, set to run "After each pattern application".
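If it helps to see the whole idea outside of the scraping tool, here is a rough standalone sketch in plain Java. It pulls the first four arguments out of each more_info_ds(...) call with a regular expression and rebuilds the popup URL the same way the site's function does. The class name, the method name, and the sample link are made up for illustration, and the pattern assumes the arguments appear as single-quoted strings in the page source, which I haven't verified, so adjust it to match what the page really contains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any scraping tool's API.
class MoreInfoLinks {
    // ASSUMPTION: the results page calls more_info_ds('adID','dist','cat','pos', ...)
    // with single-quoted arguments. Only the first four arguments are captured.
    private static final Pattern CALL = Pattern.compile(
        "more_info_ds\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*,\\s*'([^']*)'");

    // Fixed part of the query string, copied from the site's own function.
    private static final String BASE =
        "/www/bikes_popup.jsp?did=23482&pageid=5&originalid=&gid=nogroup&tid=7";

    // Returns one popup URL per more_info_ds(...) call found in the HTML.
    static List<String> buildUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = CALL.matcher(html);
        while (m.find()) {
            String adID = m.group(1); // 1st parameter
            String dist = m.group(2); // 2nd parameter
            String cat = m.group(3);  // 3rd parameter
            String pos = m.group(4);  // 4th parameter, sent as "start"
            urls.add(BASE + "&start=" + pos + "&distance=" + dist
                + "&adcategory=" + cat + "&channel=DEALERPAGE&id=" + adID);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample link; the real markup may differ.
        String sample =
            "<a href=\"javascript:more_info_ds('1001','12','BIKES','3','pop',600,400)\">More</a>";
        for (String url : buildUrls(sample)) {
            System.out.println(url);
        }
    }
}
```

The URLs come out relative, just like in the site's function, so they still need to be resolved against the site's host before being requested.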

Let me know how that one works out for you!