scrape page without URL parameters
Hi everyone,
I'm trying to scan a site where the same pattern I want to extract occurs on different subpages, but those subpages can't be accessed via a parameter. For example, the pattern occurs on pages
www.mysite.com/a
www.mysite.com/b
www.mysite.com/c
...
So whereas normally you'd access the "a", "b" and "c" via a parameter (mysite.com/search?itemid=a), this would involve changing the end of the URL string following the forward slash after the top level domain.
Is there any way to do that in screen-scraper? I can't see one, since 1) the scraping process seems to require a variable parameter and 2) I can't seem to manually set part of the URL string as a parameter.
Failing that, is there some way to 'parameterise' URLs such as the one above so I get to page 'a' by calling an alternate URL with the usual ? or & parameter operator (e.g. mysite.com/&pageid=a)? I'm guessing not but there might be some convention I'm not aware of.
Any help is greatly appreciated.
Thanks
Dan
I want to do the same thing but do it dynamically
In my scrape I get back a search page that contains the URL of each detail page I would like to scrape. The problem is that I cannot seem to pass those dynamic results on to the script that should trigger the scrape of each detail page. I want the logic to work as follows:
Scrape Search Page
Receive Back URLs for each Detail Page
Iterate Through Each Detail Page and get needed data
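The three steps above can be sketched in plain Java, outside screen-scraper, to show the control flow. This is only an illustration under assumptions: the class name DetailPageLinks, the method extractDetailUrls, the href regex and the sample HTML are all hypothetical, and a real search page would need a pattern tailored to its own markup.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DetailPageLinks {
    // Hypothetical pattern: captures the href value of each anchor tag.
    private static final Pattern HREF =
        Pattern.compile("<a\\s+[^>]*href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Step 2: pull every detail-page URL out of the search page HTML,
    // resolving relative links against the site's base URL.
    static List<String> extractDetailUrls(String searchHtml, String baseUrl) {
        List<String> urls = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(searchHtml);
        while (m.find()) {
            urls.add(base.resolve(m.group(1)).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Step 1: stand-in for the HTML returned by the search page.
        String searchHtml = "<a href=\"/a\">Item A</a> <a href=\"/b\">Item B</a>";

        // Step 3: visit each detail page in turn. In screen-scraper this is
        // where a script would store the URL in a session variable and call
        // the detail scrapeable file.
        for (String url : extractDetailUrls(searchHtml, "http://www.mysite.com/")) {
            System.out.println("Would scrape detail page: " + url);
        }
    }
}
```

The point of the sketch is the shape of the loop: collect all the links first, then hand them one at a time to whatever does the detail scrape.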
Any help you could provide would be greatly appreciated!!
Thanks
JC
Sorry to Hijack but
Hi all, and sorry to hijack this thread.
Is it then possible to use the INDEX in your example as part of your extractor pattern text?
Regards
Shaun
You cannot use a variable in an extractor pattern. The pattern is matched against the HTML of the last response, and there is no way to substitute your value into it.
On your scrapeable file, just set the URL to:
And set the INDEX in a script or extractor. An example:
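// Letters used as the last URL path segment (www.mysite.com/A, /B, ...).
String[] alphabet = { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z" };

// Scrape the file once per letter; i stops at alphabet.length - 1
// so the loop stays within the array bounds.
for (int i = 0; i < alphabet.length; i++)
{
    session.log("***Current letter is " + alphabet[i]);
    // INDEX is the session variable referenced in the scrapeable file's URL.
    session.setVariable("INDEX", alphabet[i]);
    session.scrapeFile("File name");
}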
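To make the effect of the loop concrete, here is a rough standalone Java sketch of the substitution that happens on each pass. It is not screen-scraper code: the {INDEX} placeholder, the class PathSuffixUrls and the method buildUrls are invented for illustration, and the template URL is a guess based on the example site in the first post, since the URL is missing from the post above.

```java
import java.util.ArrayList;
import java.util.List;

class PathSuffixUrls {
    // Build one URL per suffix by replacing the placeholder in the template.
    // The placeholder syntax is hypothetical, chosen only for this example.
    static List<String> buildUrls(String template, String placeholder, String[] suffixes) {
        List<String> urls = new ArrayList<>();
        for (String suffix : suffixes) {
            urls.add(template.replace(placeholder, suffix));
        }
        return urls;
    }

    public static void main(String[] args) {
        String[] pages = { "a", "b", "c" };
        // Assumed template: the variable part is the last path segment,
        // not a ?name=value query parameter.
        for (String url : buildUrls("http://www.mysite.com/{INDEX}", "{INDEX}", pages)) {
            System.out.println(url);
        }
    }
}
```

The variable does not have to sit behind a ? or & operator; any part of the URL string can be the piece that changes between requests.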
thanks
Thanks Jason, I didn't know I could add a parameter like that; it works now. Cheers