automatically generating URLs

I worked my way through the tutorials (especially 2 and 7) and the forum, but I still don't get it. I have to admit that I have never programmed anything, so I am a complete newbie.
Nevertheless, here is my question: I would like a script that automatically generates the URLs I want to scrape data from. More specifically, the URL is something like www.******id=12, the next one would be www.******id=13, and so on. So the script should scrape all URLs from, let's say, 10 to 100.

I know this is quite simple, but tutorials 2 and 7 deal with slightly different problems, so I can't get my script to run properly.

Thanks in advance,
Matthias

sennierer on 01/21/2009 at 4:38 am

screen-scraper public support

I am trying this method, but

I am trying this method, but when I run it and check the log, the generated URL always ends with the startPage number and the script keeps requesting that same page forever; the number never goes up.

What have I overlooked?

renamecor on 11/24/2009 at 11:48 am

If you know how many pages

If you know how many pages there are in total, then this can be pretty easy. We can adapt it further if you need to, but let's start with this:

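// Interpreted Java
// Range of id values to scrape; both ends are included.
int startPage = 12;
int endPage = 15;

for (int currentPage = startPage; currentPage <= endPage; currentPage++)
{
	// Store the current id so the scrapeableFile's ~#ID#~ parameter can read it
	session.setVariable("ID", currentPage);
	// Request the page for this id (use your scrapeableFile's exact name)
	session.scrapeFile("The page");
}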

It shouldn't matter what you named the script itself. Just set the first two variables, startPage and endPage, to the number range you want. I used 12 and 15 only as an arbitrary example, so you can change them to, say, 1 and 58 (or whatever your endPage should be). These numbers correspond to the "id=12" part of your desired URL. Also, replace the "The page" text with the actual name of your scrapeableFile in screen-scraper, i.e. the one that shows whatever is on page id=X (where X is some page/id number).

Now that you have the script, you need the scrape to start it for you. There are a few ways to do this, but either way, make sure your scrapeableFile is set to run only "manually" (i.e., not automatically in sequence).

If this is the only scrapeableFile you need on the website (or at least the first one you want to hit), switch to your scraping session (blue gear icon) and open the "Scripts" tab. Add a new script to the list, select the script we made above, and set it to run "Before scraping session". It says "Before", but since your scrapeableFile is set to run manually, it doesn't actually matter whether you choose "Before" or "After". But whatever.
On the other hand, if this is not the first scrapeableFile that should run against the website, I would try this instead: open the "Scripts" tab of the last scrapeableFile that runs *before* your target one, add your new script there, and set it to run "After scrapeableFile".

The difference in the above two approaches is simply where you're putting the script: on the scraping session, or on the scrapeableFile. It just depends on how you want the scrape to flow.

Finally, check your scrapeableFile (which is set to run manually, so your script will invoke it once for each ID/page). Make sure the URL on its "General" tab is correct, and check the list of parameters on its "Parameters" tab. The only thing you really have to check is the "id" parameter (from your example ***id=12, ***id=13, etc.). Don't put a number there; instead, put the text ~#ID#~. This pulls in the current value of the "ID" variable that the script sets, and that value changes each time the script calls the scrapeableFile.
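To make the ~#ID#~ idea concrete, here is a small stand-alone Java sketch of token substitution. It only imitates the effect described above and is not screen-scraper's own implementation; the helper name substitute and the template http://www.example.com/page?id=~#ID#~ are hypothetical stand-ins (the real site address was elided in the question).

```java
import java.util.HashMap;
import java.util.Map;

public class TokenSubstitution {
    // Hypothetical helper: replaces each ~#NAME#~ token in the template with the
    // value stored under NAME. Tokens with no matching variable are left as they are.
    static String substitute(String template, Map<String, ?> vars) {
        String result = template;
        for (Map.Entry<String, ?> entry : vars.entrySet()) {
            String token = "~#" + entry.getKey() + "#~";
            result = result.replace(token, String.valueOf(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical URL template; the real address was elided in the question.
        String template = "http://www.example.com/page?id=~#ID#~";
        Map<String, Object> vars = new HashMap<>();
        for (int id = 12; id <= 15; id++) {
            vars.put("ID", id); // stands in for session.setVariable("ID", id)
            System.out.println(substitute(template, vars));
        }
    }
}
```

Running it prints the four URLs ending in id=12 through id=15, one per line, because the "ID" value is replaced before each URL is built.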

If you need more clarification, ask away; I'm not sure about the specifics of your setup. Anyway, the script itself isn't very complex. All it does is create a variable called "currentPage" that cycles through every number between (and including) startPage and endPage. For each value currentPage takes (hence the programming keyword "for" in my code above), it runs the code between the { and } braces, which updates the "ID" session variable and runs your target scrapeableFile.
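If you want to check which IDs a given startPage/endPage pair will cover before running the scrape, the same loop can be tried outside screen-scraper in plain Java. This is only a sketch: pagesVisited is a hypothetical helper that collects the numbers instead of calling session.setVariable and session.scrapeFile.

```java
import java.util.ArrayList;
import java.util.List;

public class PageRange {
    // Hypothetical helper: returns the values that
    // for (int currentPage = startPage; currentPage <= endPage; currentPage++)
    // visits. Because the condition is <=, endPage itself is included.
    static List<Integer> pagesVisited(int startPage, int endPage) {
        List<Integer> pages = new ArrayList<>();
        for (int currentPage = startPage; currentPage <= endPage; currentPage++) {
            // In the real script, session.setVariable("ID", currentPage)
            // and session.scrapeFile(...) would run here instead.
            pages.add(currentPage);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(pagesVisited(12, 15));         // prints [12, 13, 14, 15]
        System.out.println(pagesVisited(10, 100).size()); // prints 91
    }
}
```

With the 10 to 100 range from the original question, the loop body runs 91 times (100 - 10 + 1), since both ends are included.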

Hope that helps!

timv on 01/22/2009 at 11:25 am

thank you

The script worked perfectly.

I am facing two other problems, maybe you can help me with them as well?

1. Is it really true that I can't write the data I scrape to a CSV, TXT or XLS file with the basic version of screen-scraper? I tried, but it only worked with the professional trial version.

2. I can't get screen-scraper to write the data into columns. If I use two extractor patterns, the program writes all the data into one column. I tried sub-extractor patterns, but that didn't work either. Do I really need sub-extractor patterns to get more columns? I just want the program to write all the data from one URL into one row, with one column for each extractor pattern.

Thank you for the help and your very detailed explanation!

sennierer on 01/24/2009 at 9:38 am

write to csv

Sennierer,

We've developed an example "Write to CSV" script that works for those using screen-scraper basic. It will require a little assembly on your part, but all the ideas are there. Give it a look; it should be pretty self-explanatory.
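As a rough illustration of what question 2 asks for (one row per URL, one column per extractor pattern), here is a plain-Java sketch that builds properly quoted CSV lines and writes them to a file. It is not the Write_to_CSV script from the repository; the helper names escape and toCsvLine, the file name output.csv, the column names and the sample values are all hypothetical, and how you collect each pattern's value inside screen-scraper is left out.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

public class CsvRows {
    // Hypothetical helper: quotes a field if it contains a comma, quote or line
    // break, doubling any embedded quotes. A null field becomes an empty column.
    static String escape(String field) {
        if (field == null) return "";
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Hypothetical helper: joins one page's values into a single CSV line,
    // one column per value, in a fixed order.
    static String toCsvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; try-with-resources closes (and flushes) the file.
        try (PrintWriter out = new PrintWriter(new FileWriter("output.csv"))) {
            // Header row: one column per (hypothetical) extractor pattern.
            out.println(toCsvLine(Arrays.asList("ID", "TITLE", "PRICE")));
            // Made-up sample values standing in for what the patterns found on page id=12.
            out.println(toCsvLine(Arrays.asList("12", "Widget, large", "9.99")));
        }
    }
}
```

The design choice that keeps each page on its own row is to gather all of a page's values first and write a single line afterwards, rather than writing each pattern's value on its own line as soon as it is found.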

here: http://community.screen-scraper.com/script_repository/Write_to_CSV

Thanks
Scraper

scraper on 01/26/2009 at 1:07 pm
