Extracting data from current URL

Scrapers,

I've been trying to find a way to scrape data out of the URL currently being processed, such as http://www.thesite.com/~@GET_THIS_DATA@~/detailspage/detail.html

Is there a way?

/Johan

Johan on 06/21/2009 at 6:31 pm

Yes, though you will have to do it in a script:

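// Interpreted Java
import java.util.regex.*;

// URL of the file currently being scraped
String url = scrapeableFile.getCurrentURL();

// The parentheses capture the first path segment after the host name
Matcher m = Pattern.compile("http://www\\.thesite\\.com/([^/]*)/").matcher(url);

// group(1) is only valid after a successful find()
String get_this_data = null;
if (m.find()) {
    get_this_data = m.group(1);
}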

The key to the pattern match is the parentheses, which define the group later read with "m.group(1)". What sits inside the parentheses determines how far the captured text extends. In this case, it's just "[^/]*", meaning anything up to the next "/" character.
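
If you want to try the capture group outside of screen-scraper first, here is a minimal standalone sketch in plain Java. The class name UrlSegment, the method firstSegment, and the sample value "abc123" are made up for illustration; only the pattern itself comes from the script above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlSegment {
    // Group 1 is everything between the host's trailing "/" and the next "/".
    private static final Pattern FIRST_SEGMENT =
        Pattern.compile("http://www\\.thesite\\.com/([^/]*)/");

    // Returns the captured segment, or null when the URL does not match.
    public static String firstSegment(String url) {
        Matcher m = FIRST_SEGMENT.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints abc123
        System.out.println(firstSegment("http://www.thesite.com/abc123/detailspage/detail.html"));
        // prints null (there is no second "/", so the pattern does not match)
        System.out.println(firstSegment("http://www.thesite.com/"));
    }
}
```

Returning null on a failed match is just one possible choice; inside a screen-scraper script you may prefer to log the URL instead.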

Tim

timv on 06/22/2009 at 3:10 pm

Beautiful solution, but how do you get the result into a session variable?

Johan on 06/22/2009 at 5:56 pm

Once you're in a script, you can store pretty much any value in a session variable:

//...
String get_this_data = m.group(1);
session.setVariable("GET_THIS_DATA", get_this_data);

timv on 07/01/2009 at 10:59 am

No need to reply to this one, I figured out an alternative solution. Very much in need of expert advice on this one though: http://community.screen-scraper.com/node/1275

Johan on 06/26/2009 at 10:12 am
