Looping the scrape, changing the URL each time
OK, so this is part 2 of my previous post, "current date scrape". I would like to scrape the same URL over and over, changing part of the URL on each pass of a loop. That way I only capture one proxy server URL and create one scraping session and one scrapeable file.
Here is a sample of what I am trying to do. I would like to change the three letters in CAPS before the date. The captured URL stays the same, but I'm hoping for a script similar to the current-date one from the previous post, only one that loops so it scrapes once for each change.
Captured URL
http://www.thesite.com/static/entry/xyz052814test.html
URLs I would like to create from the captured one, with the changes in CAPS:
http://www.thesite.com/static/entry/ABC052814test.html
http://www.thesite.com/static/entry/DEF052814test.html
http://www.thesite.com/static/entry/GHI052814test.html
http://www.thesite.com/static/entry/JKL052814test.html
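If you would rather derive the variants from the captured URL itself instead of rebuilding it from pieces, one option is to swap out the letters sitting in front of the six-digit date. A minimal sketch in plain Java, assuming the last path segment is always letters, then a six-digit date, then the rest of the file name; the names `UrlVariants`, `withCode` and `variants` are illustrative only, not part of screen-scraper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;

class UrlVariants {
    // Replaces the letter code that precedes the 6-digit date in the last
    // path segment, e.g. ".../xyz052814test.html" -> ".../ABC052814test.html".
    // Assumes the segment looks like <letters><MMddyy><anything without '/'>;
    // if it does not match, the URL is returned unchanged.
    static String withCode(String capturedUrl, String code) {
        return capturedUrl.replaceFirst(
            "/[A-Za-z]+(\\d{6}[^/]*)$",
            "/" + Matcher.quoteReplacement(code) + "$1");
    }

    // Builds one URL per code, in the order given.
    static List<String> variants(String capturedUrl, String[] codes) {
        List<String> result = new ArrayList<>();
        for (String code : codes) {
            result.add(withCode(capturedUrl, code));
        }
        return result;
    }

    public static void main(String[] args) {
        String captured = "http://www.thesite.com/static/entry/xyz052814test.html";
        for (String url : variants(captured, new String[] {"ABC", "DEF", "GHI", "JKL"})) {
            System.out.println(url);
        }
    }
}
```

The regex anchors on the end of the URL so that only the final path segment is touched, which keeps the host and directories intact.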
String[] urls = {"ABC", "DEF", "GHI", "JKL"};
// Build, log and store one URL per code
for (urlPart : urls)
{
    url = "http://www.thesite.com/static/entry/";
    url += urlPart;
    url += sutil.getCurrentDate("MMddyy");  // today's date, e.g. 052814 for May 28, 2014
    url += "test.html";
    session.log("==>New URL :: " + url);
    session.setv("URL", url);  // stores the URL in the session variable URL (~#URL#~)
}
This assumes you're running a version newer than 6.0. If you're on 6.0, you can't use the for-each loop and would need a regular indexed for loop instead.
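The date part comes from `sutil.getCurrentDate("MMddyy")`, which yields a six-digit month-day-year string; the log further down shows values like `061114`. To check the URL pattern outside screen-scraper, here is the same concatenation sketched in plain Java with `java.time`. The class and method names are illustrative, and passing the date in (rather than always using today) is an assumption made so the output can be checked:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class EntryUrl {
    // Same pattern string the script passes to sutil.getCurrentDate:
    // two-digit month, two-digit day, two-digit year.
    private static final DateTimeFormatter MMDDYY = DateTimeFormatter.ofPattern("MMddyy");

    // Mirrors the concatenation in the loop: base + code + date + suffix.
    static String build(String code, LocalDate date) {
        return "http://www.thesite.com/static/entry/"
            + code
            + date.format(MMDDYY)
            + "test.html";
    }

    public static void main(String[] args) {
        // Use today's date, as the original script does.
        System.out.println(build("ABC", LocalDate.now()));
    }
}
```

For May 28, 2014 this produces `http://www.thesite.com/static/entry/ABC052814test.html`, matching the first target URL in the question.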
Using the new code
Jason, I am getting the new URLs when I run this, but I would like screen-scraper to actually scrape each URL. Can that be done? Here is what I get now:
Processing script: "Entries"
==>New URL :: http://www.equibase.com/static/entry/AP061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/AQU061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/BEL061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/CBY061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/CD061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/CRC061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/CT061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/DEL061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/DMR061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/ELP061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/EMD061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/EVD061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/FG061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/GG061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/GP061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/HOU061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/IND061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/KEE061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/LAD061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/LRL061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/LS061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/MNR061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/MTH061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/OP061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/PEN061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/PIM061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/PRM061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/PRX061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/RET061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/SA061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/SAR061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/SUF061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/TAM061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/TP061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/TUP061114USA-EQB.html
==>New URL :: http://www.equibase.com/static/entry/WO061114USA-EQB.html
Entries: Untitled Extractor Pattern: Processing scripts once if pattern matches.
Entries: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
I have generated each URL using Java; now I would like to process each of these URLs so I can scrape every one. Is that possible? Thanks.
ANOTHER LINE OF CODE
Do I need a call like session.scrapeFile("filename") inside the loop to start scraping?
JAVA CODE
I think this code needs a session.scrapeFile line.
// Runs inside the for loop; urls[i] is the current code, e.g. AQU
url = "http://www.equibase.com/static/entry/";
url += urls[i];
url += sutil.getCurrentDate("MMddyy");
url += "USA-EQB.html";
session.log("==>New URL :: " + url);
session.setv("URL", url);
UPDATE
OK, what I did was replace the URL in the scrapeable file with the session variable ~#URL#~. Now the code goes through all the URLs in the loop, but only the last one gets processed. All I need now is to make it scrape every URL in the for loop.
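Only the last URL gets scraped because `session.setv("URL", url)` overwrites the same variable on every pass, so whatever runs after the loop sees just the final value. The plain Java sketch below imitates that: a map stands in for the session variables and a hypothetical `scrape` method stands in for running the scrapeable file. Neither is screen-scraper's API, and the codes stand in for full URLs. Calling `scrape` after the loop records one value; calling it inside the loop records all of them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LastValueDemo {
    // Stand-in for session variables: one slot per name.
    private final Map<String, String> vars = new HashMap<>();
    // Records each value the stand-in "scrape" step would request.
    final List<String> scraped = new ArrayList<>();

    void setv(String name, String value) {
        vars.put(name, value);
    }

    // Hypothetical stand-in for running the scrapeable file:
    // it reads whatever ~#URL#~ currently holds.
    void scrape() {
        scraped.add(vars.get("URL"));
    }

    static List<String> scrapeAfterLoop(String[] urls) {
        LastValueDemo s = new LastValueDemo();
        for (String url : urls) {
            s.setv("URL", url);   // each pass overwrites the previous value
        }
        s.scrape();               // runs once and sees only the last value
        return s.scraped;
    }

    static List<String> scrapeInsideLoop(String[] urls) {
        LastValueDemo s = new LastValueDemo();
        for (String url : urls) {
            s.setv("URL", url);
            s.scrape();           // runs once per value, right after it is set
        }
        return s.scraped;
    }

    public static void main(String[] args) {
        String[] urls = {"AP", "AQU", "BEL"};
        System.out.println(scrapeAfterLoop(urls));   // [BEL]
        System.out.println(scrapeInsideLoop(urls));  // [AP, AQU, BEL]
    }
}
```

In screen-scraper terms, the inside-the-loop version corresponds to invoking the scrapeable file from the script on each pass, with the file set to be invoked manually so it presumably does not also run once on its own.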
Right, the scrapeable file has a dynamic URL, so you probably need to check the box "this scrapeable file will be invoked manually from a script" and, inside the for loop of your script, invoke
WORKS!!!
Thanks Jason for all your help!
Sweet!
Thanks Jason, I'll have fun trying this out. I currently scrape about 40 URLs, so this is a big improvement.
I only have the Basic edition, so I'm on 6.0. When you say a regular for iterator, do you mean a simple counter variable that increments by 1 each time through the loop?
Right, so you'd need to do
And instead of "urlPart" you'd use "urls[i]".
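For 6.0, where the for-each form isn't available, the loop can be written with a counter that starts at 0 and increases by 1 until it reaches the array length. Below is that shape in plain Java, reusing the codes AP, AQU and BEL from the log above. The class name and the `dateMMddyy` parameter are illustrative; in the actual script the date would come from `sutil.getCurrentDate("MMddyy")`, and each URL would be passed to `session.setv` followed by the scrapeable-file call inside the loop:

```java
import java.util.ArrayList;
import java.util.List;

class IndexedLoop {
    // Builds one Equibase entry URL per code using a plain counter,
    // the form to use when the for-each loop is not available.
    static List<String> buildUrls(String[] urls, String dateMMddyy) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < urls.length; i++) {
            String url = "http://www.equibase.com/static/entry/";
            url += urls[i];          // urls[i] takes the place of urlPart
            url += dateMMddyy;
            url += "USA-EQB.html";
            result.add(url);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] urls = {"AP", "AQU", "BEL"};
        for (String url : buildUrls(urls, "061114")) {
            System.out.println(url);
        }
    }
}
```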