How to loop through URLs with Interpreted Java? (again)
I'll start with a disclaimer that I recognize that I'm not a programmer (writing simple PHP code and Excel macros is about my speed) and therefore probably out of my depth; and that I'm probably posing a question that amounts to asking you to teach me how to program in Java.
However, I have a question which I've seen asked in several other forum posts, which seems to have confused more than a few potential users, and which I still haven't been able to piece together after going through all the support materials: how do you write a script (or scripts) that loops through incrementing values of a URL token in order to scrape multiple pages?
I'm talking about a scenario like scraping a page that is called with a URL like http://www.example.com/display.php?item=123 (or, for that matter, like http://www.screen-scraper.com/forum/phpBB2/viewtopic.php?t=324)
I can successfully follow the tutorial directions to generate a proxy session, define a scraping session and add a scrapeable file. I even get how to set up extractor patterns and write out the results of extraction to a file.
I have inserted a token into the URL as a placeholder for the item id (so the URL in the scrapeable file properties page becomes, for example, http://www.example.com/display.php?~#ITEM_ID#~).
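To make the token idea concrete, here is a minimal plain-Java sketch of what the substitution amounts to. It is not screen-scraper's own code: the class name TokenDemo and the method expand are made up for illustration, and the template assumes the query string keeps its item= parameter name, as in the first example URL above.

```java
import java.util.Map;

public class TokenDemo {
    // Replaces every ~#NAME#~ token in the template with the matching value.
    // Tokens that have no matching variable are left untouched.
    static String expand(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            result = result.replace("~#" + entry.getKey() + "#~", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed template: the item= parameter name is kept in front of the token.
        String template = "http://www.example.com/display.php?item=~#ITEM_ID#~";
        // Prints http://www.example.com/display.php?item=600
        System.out.println(expand(template, Map.of("ITEM_ID", "600")));
    }
}
```

If the token is not preceded by the parameter name, the expanded URL would end in ?600 rather than ?item=600, which may or may not be what the target site expects.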
Then, based upon one of the examples, I specified this "start session" script, which successfully generates a scrape only for the initial ITEM_ID = 600 and then terminates:
runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "item description" );
for(i=600; i<=650; i++)
{
    runnableScrapingSession.setVariable("ITEM_ID", String.valueOf(i));
    runnableScrapingSession.scrape();
}
Clearly I don't know what I'm doing... Help!
What I'd like is to set up a loop so that the ITEM_ID token in the URL is replaced during each successive iteration with the next number in sequence, and the scraping session is re-run with that next ITEM_ID (in my example, for all the IDs between 600 and 650).
The general flow, I believe, is something like this:
1. Initiate the scraping session (I'm doing this via a "start session" style script from the GUI as in the Tutorials)
2. Define a variable in the session to hold the ITEM_ID value
3. Define an initial value for ITEM_ID and a stopping value
4. Call the scrapeable file using the initial value of ITEM_ID
5. Scrape the scrapeable file
6. After the file is scraped, write the extraction results out to a file
7. Increment the value of ITEM_ID
8. Call scrapeable file again with new (i.e. next) value of ITEM_ID in the URL
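The flow above can be sketched in plain Java as follows. This is only an illustration of the control flow, not screen-scraper code: the class ItemLoopSketch, the method scrapeRange and the scrapePage function are hypothetical stand-ins for what the scrapeable file and its extractor patterns would do, and the results are collected in a list rather than written to a real file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ItemLoopSketch {
    // scrapePage is a hypothetical stand-in for "request the URL and extract a result".
    static List<String> scrapeRange(int firstId, int lastId, Function<String, String> scrapePage) {
        List<String> results = new ArrayList<>();
        // Steps 3 and 7: start at firstId, stop after lastId, increment by one.
        for (int id = firstId; id <= lastId; id++) {
            // Steps 4 and 8: build the URL for the current item ID.
            String url = "http://www.example.com/display.php?item=" + id;
            // Step 5: scrape the page (stubbed out here).
            String extracted = scrapePage.apply(url);
            // Step 6: record the result; a real script would write it to a file.
            results.add(id + "\t" + extracted);
        }
        return results;
    }

    public static void main(String[] args) {
        // The lambda is a stub; it does not perform any network request.
        List<String> rows = scrapeRange(600, 602, url -> "stub result for " + url);
        rows.forEach(System.out::println);
    }
}
```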
I've seen the post about how it is possible to set up an external batch file to invoke SS and feed it ITEM_ID values via "params" - but I'd like to learn how to do it with a loop using Interpreted Java.
As an aside, I (obviously!) think this would be a good topic for a dedicated tutorial - and sorry for the really long post and for revisiting a question that seems to have been asked before. I'd appreciate it if someone would take one more shot at explaining it step-by-step to someone who really wants to understand it but has met his match.
How to loop through URLs with Interpreted Java? (again)
Hi Gwilym,
Your best bet would probably be to carefully go through our seventh tutorial (here), which describes this technique in detail.
If you're looking to pay someone to do some of the work for you, feel free to send along a service request to us.
Kind regards,
Todd
Help with this sort of thing...
I'm trying to do the same sort of thing and am a little confused....
Is it possible for you to send me your log or a copy of each of your scripts? I'm unsure whether this...
for(i=600; i<=650; i++)
{
    runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "item description" );
    runnableScrapingSession.setVariable("ITEM_ID", String.valueOf(i));
    runnableScrapingSession.scrape();
}
...needs to be in a script that runs before or after the scrape, and whether the variable used in the URL needs to be declared somewhere else as well.
Alternatively, is there anybody out there I could pay to help me get set up?
How to loop through URLs with Interpreted Java? (again)
Good enough. It's not the ideal solution, but if it works for you then I'd go with it.
Best,
Todd
How to loop through URLs with Interpreted Java? (again)
Brilliant! Works like a charm. Thank you. I do see what I suppose is evidence of the "multiple independent scraping threads" phenomenon, as the results written to the log (and my file) are out of order - but I can easily live with that.
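If the ordering ever did matter, one option would be to sort the output afterwards. The sketch below assumes, purely for illustration, that each output line starts with the numeric item ID followed by a tab; the class name SortResultsSketch and that line format are hypothetical and not something screen-scraper produces by itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortResultsSketch {
    // Returns a copy of the lines ordered by the numeric item ID before the first tab.
    // Assumes every line has the form "<id><TAB><rest>".
    static List<String> sortByItemId(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingInt(
                (String line) -> Integer.parseInt(line.substring(0, line.indexOf('\t')))));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample lines, deliberately out of order.
        List<String> outOfOrder = Arrays.asList("602\tthird", "600\tfirst", "601\tsecond");
        sortByItemId(outOfOrder).forEach(System.out::println);
    }
}
```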
How to loop through URLs with Interpreted Java? (again)
Hi,
In this case the solution may be relatively simple. Try adjusting your script like so:
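for(i=600; i<=650; i++)
{
    // Create a fresh scraping session object on every pass through the loop.
    runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "item description" );
    // Supply the current item ID for the ITEM_ID token in the URL.
    runnableScrapingSession.setVariable("ITEM_ID", String.valueOf(i));
    // Run the scraping session for this item.
    runnableScrapingSession.scrape();
}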
That is, you need to recreate your runnableScrapingSession object each time in the loop.
Generally we don't recommend this approach of creating multiple scraping sessions in a script, though. It's usually better in cases like this to either iterate over the various item IDs within your scraping session, or invoke the scraping sessions from a remote application. Nonetheless, go ahead and give this a shot to see how it works for you. Feel free to post back if you run into any snags.
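As a rough illustration of the first option (iterating within a single scraping session), the loop might look something like the sketch below. To keep it runnable as ordinary Java, the Session interface here is a hypothetical stand-in that assumes the session object offers setVariable and scrapeFile; the scrapeable file name "item page" and the RecordingSession class are also made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class InSessionLoopSketch {
    // Hypothetical stand-in for the parts of the session object this sketch relies on.
    interface Session {
        void setVariable(String name, Object value);
        void scrapeFile(String scrapeableFileName);
    }

    // Sets ITEM_ID and scrapes the same scrapeable file once per item ID,
    // all within one session rather than one new session per item.
    static void scrapeItems(Session session, int firstId, int lastId) {
        for (int id = firstId; id <= lastId; id++) {
            session.setVariable("ITEM_ID", String.valueOf(id));
            session.scrapeFile("item page"); // hypothetical scrapeable file name
        }
    }

    // Fake session that only records which file was requested with which ITEM_ID.
    static class RecordingSession implements Session {
        final List<String> log = new ArrayList<>();
        private Object itemId;

        public void setVariable(String name, Object value) {
            if ("ITEM_ID".equals(name)) {
                itemId = value;
            }
        }

        public void scrapeFile(String scrapeableFileName) {
            log.add(scrapeableFileName + " with ITEM_ID=" + itemId);
        }
    }

    public static void main(String[] args) {
        RecordingSession session = new RecordingSession();
        scrapeItems(session, 600, 602);
        session.log.forEach(System.out::println);
    }
}
```

Because everything runs in one session, the pages are requested one after another in ID order in this sketch.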
I think you make a good point about this particular issue justifying a tutorial of its own. I'll add that to our list.
Kind regards,
Todd Wilson