Scrape within a scrape: SCRAP-CEPTION!
Hi,
Sorry, I am not much of a programmer, so I am working from the "Manual Data Extraction" page and explaining with apples =).
From 100 apples (level 1 categories), I am trying to pick out just 5 apples (5 x level 1 categories).
But instead of gathering all 5 apples, is it possible to get Screen-Scraper to:
grab 1 of 5 apples,
go to 1 of 5 apples: level 2 sub-category,
grab 2 of 5 apples
go to 2 of 5 apples: level 2 sub-category,
etc.
Using the example scripts provided on the "Manual Data Extraction" page, it currently does this:
DATARECORD 1:
grab 1 of 5 apples
storing as session variable
DATARECORD 2:
grab 2 of 5 apples
storing as session variable
DATARECORD 3:
grab 3 of 5 apples
storing as session variable
go to 3 of 5 apples: level 2 sub-category (via session.executescript())
Thanks for any input in advance!
When you run extractData you get a DataSet over which you must iterate. If you wanted to process every record, you would:
for (i=0; i<ds.getNumDataRecords(); i++)
{
// Stuff
}
If instead you wanted only the first 5:
for (i=0; i<5 && i<ds.getNumDataRecords(); i++)
{
// Stuff
}
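Since the loop index starts at 0, visiting at most five records needs the condition i < 5, and it is safer to also stop at the end of a shorter data set. Here is a minimal sketch in plain Java, using an ordinary list as a stand-in for the DataSet. The list contents and the helper name boundedCount are made up for illustration and are not part of screen-scraper.

```java
import java.util.List;

class FirstFive {
    // How many records to visit: at most `limit`, and never more than exist.
    static int boundedCount(int available, int limit) {
        return Math.min(available, limit);
    }

    public static void main(String[] args) {
        // Stand-in for the records that extractData would return.
        List<String> records = List.of(
            "apple1", "apple2", "apple3", "apple4",
            "apple5", "apple6", "apple7", "apple8");

        int n = boundedCount(records.size(), 5);
        for (int i = 0; i < n; i++) {
            System.out.println(records.get(i));
        }
    }
}
```

With eight records this visits the first five; with only three records it visits all three instead of failing on a missing index.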
Ah, sorry, I didn't make myself clear...
What I need is for Screen-scraper to scrape a level 1 category URL and then proceed to scrape that category's sub-category URLs. I am not much of a programmer, so I am just using the example script from the "Manual Data Extraction" page. As it stands, the script only scrapes the level 1 category URLs.
What I meant was that, using the example script with extractData, it grabs all the level 1 categories in one go:
Category 1 (Level 1)
category 2 (Level 1)
category 3 (Level 1)
etc...
But what I want it to do is:
Category 1 (Level 1)
Category 1 sub-category 1 (Level 2)
category 1 sub-category 2 (Level 2)
category 2 (Level 1)
Category 2 sub-category 1 (Level 2)
etc...
Is this possible? Thank you for your help in advance!
That is possible; I do something similar all the time. Sites vary so much in how they implement it, however, that I cannot devise a general way to show you.
Or, as another example: there are 3 paragraphs with an identical text pattern. I just need to scrape the URLs from the 2nd paragraph, go to each URL to scrape some data, and then go back to the 2nd paragraph to scrape the next URL (like using "after each pattern match"). Would this be possible? If so, what would the script look like? Would it be just:
ds = scrapeableFile.extractData(text, extractorPattern);
without the looping:
{
for (i=0; i
// Stuff
}?
Thanks for any help in advance!
I'm not sure how to answer. I would need to see an example.
Below is an example of what I am trying to scrape:
There are 3 paragraphs: "Games & Puzzles", "Kids' Clothes" and "Learning & Educational Toys". I just need to grab the URLs under the sub category block "Kids' Clothes" (eg: Accessories, Boy T-Shirts, Girl T-Shirts). Can I use extractData in this situation? The thing is when I tried extractData, it would loop even if I deleted the looping code.
========================================================
<div class="subCatBlockTRU">
<h2><a href="/category/index.jsp?categoryId=2255966"> Games & Puzzles </a></h2>
<ul>
<li><a href="/category/index.jsp?categoryId=3252390">Board Games</a></li>
<li><a href="/category/index.jsp?categoryId=3252397">Card Games</a></li>
<li><a href="/category/index.jsp?categoryId=3252399">Electronic & Interactive Games</a></li>
</ul>
</div>
<div class="subCatBlockTRU">
<h2><a href="/category/index.jsp?categoryId=4174594"> Kids' Clothes </a></h2>
<ul>
<li><a href="/category/index.jsp?categoryId=11906884">Accessories</a></li>
<li><a href="/category/index.jsp?categoryId=4009438">Boy T-Shirts</a></li>
<li><a href="/category/index.jsp?categoryId=4009477">Girl T-Shirts</a></li>
</ul>
</div>
<div class="subCatBlockTRU">
<h2><a href="/category/index.jsp?categoryId=2255959"> Learning & Educational Toys </a></h2>
<ul>
<li><a href="/category/index.jsp?categoryId=2256390">Electronic Learning</a></li>
<li><a href="/category/index.jsp?categoryId=2256398">Back to School Supplies</a></li>
<li><a href="/category/index.jsp?categoryId=2256399">Science & Discovery</a></li>
</ul>
</div>
So you would use extractData for that. You'd first want an extractor pattern that captures only the "Kids' Clothes" block:
~@TO_EXTRACT@~</ul>
And it would have a script like:
for (i=0; i<ds.getNumDataRecords(); i++)
{
dr = ds.getDataRecord(i);
name = dr.get("NAME");
url = dr.get("URL");
// Other stuff
}
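As an aside, the two-step idea (first isolate the block under one heading, then pull each link out of that block) can be tried outside screen-scraper. The sketch below uses plain java.util.regex, not screen-scraper extractor patterns, on a trimmed copy of the sample markup. The class name, the linksUnder helper, and both regular expressions are assumptions written for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubCategoryLinks {
    // Trimmed copy of the sample markup posted in this thread.
    static final String SAMPLE =
        "<h2><a href=\"/category/index.jsp?categoryId=2255966\"> Games & Puzzles </a></h2>\n"
      + "<ul>\n"
      + "<li><a href=\"/category/index.jsp?categoryId=3252390\">Board Games</a></li>\n"
      + "</ul>\n"
      + "<h2><a href=\"/category/index.jsp?categoryId=4174594\"> Kids' Clothes </a></h2>\n"
      + "<ul>\n"
      + "<li><a href=\"/category/index.jsp?categoryId=11906884\">Accessories</a></li>\n"
      + "<li><a href=\"/category/index.jsp?categoryId=4009438\">Boy T-Shirts</a></li>\n"
      + "<li><a href=\"/category/index.jsp?categoryId=4009477\">Girl T-Shirts</a></li>\n"
      + "</ul>\n";

    // Returns {name, url} pairs for the links listed under the given heading.
    static List<String[]> linksUnder(String html, String heading) {
        List<String[]> links = new ArrayList<>();

        // Step 1: capture only the <ul> body that follows the wanted heading.
        Pattern block = Pattern.compile(
            "<h2><a href=\"[^\"]*\">\\s*" + Pattern.quote(heading)
                + "\\s*</a></h2>\\s*<ul>(.*?)</ul>",
            Pattern.DOTALL);
        Matcher blockMatch = block.matcher(html);
        if (!blockMatch.find()) {
            return links; // heading not present
        }

        // Step 2: within that block, capture each link's URL and name.
        Pattern link = Pattern.compile("<li><a href=\"([^\"]+)\">([^<]+)</a></li>");
        Matcher linkMatch = link.matcher(blockMatch.group(1));
        while (linkMatch.find()) {
            links.add(new String[] { linkMatch.group(2), linkMatch.group(1) });
        }
        return links;
    }

    public static void main(String[] args) {
        for (String[] link : linksUnder(SAMPLE, "Kids' Clothes")) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

Restricting the first match to a single heading is what keeps the links under "Games & Puzzles" out of the result.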
And "Extractor_2" would look like:
Thanks for the tip! Unfortunately, it still loops and then stops.
(Actually it says "(i=0; ds.getNumDataRecords(); i++)" has to be a boolean, so I set it to (i=0; i==ds.getNumDataRecords(); i++) and it seems to work.)
Below is the result:
NAME=NAME0
URL=URL0
Storing this value in a session variable.
EXTRACT--DataRecord 1:
NAME=NAME1
URL=URL1
Storing this value in a session variable.
EXTRACT--DataRecord 2:
NAME=NAME2
URL=URL2
Storing this value in a session variable.
Is there a way it can:
NAME=NAME0
URL=URL0
Storing this value in a session variable.
(Processes a script: "Goto Scrape_URL" and then completes Scrape_URL)
EXTRACT--DataRecord 1:
NAME=NAME1
URL=URL1
Storing this value in a session variable.
(Processes a script: "Goto Scrape_URL" and then completes Scrape_URL)
I've tried:
for (i=0; i==ds.getNumDataRecords(); i++)
{
dr = ds.getDataRecord(i);
url=dr.get("URL");
session.setVariable("URL", url);
session.scrapeFile("Scrape_URL");
}
But it still loops and then stops without going to the scrape file.
Again, thanks for any help in advance!
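A likely explanation for the scrapeable file never being requested is the loop condition. With i starting at 0, the test i==ds.getNumDataRecords() is false on the first check whenever there is at least one record, so the loop body never runs, and the DataRecord lines in the log probably come from the extractor rather than from the loop. The plain-Java sketch below only counts iterations for the two conditions. The method names are made up for illustration and nothing here calls screen-scraper.

```java
class LoopCondition {
    // Counts how often the body runs with the condition i == n.
    static int iterationsWithEquals(int n) {
        int count = 0;
        for (int i = 0; i == n; i++) {
            count++;
        }
        return count;
    }

    // Counts how often the body runs with the condition i < n.
    static int iterationsWithLessThan(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int records = 3; // e.g. the three "Kids' Clothes" links
        System.out.println("i == n runs " + iterationsWithEquals(records) + " times");
        System.out.println("i <  n runs " + iterationsWithLessThan(records) + " times");
    }
}
```

For three records the first loop runs zero times and the second runs three times, which is why the condition needs to be i < ds.getNumDataRecords().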
I edited the script above, and it should work like this now.
But the result is that it still loops. Is there a way for it not to loop, or to loop only after the scrape file has run?
NAME=NAME0
URL=URL0
Storing this value in a session variable.
(Processes a script: "Goto Scrape_URL" and then completes Scrape_URL)
EXTRACT--DataRecord 1:
NAME=NAME1
URL=URL1
Storing this value in a session variable.
(Processes a script: "Goto Scrape_URL" and then completes Scrape_URL)
I don't understand what you want. I verified that you asked for the sub-categories of "kids clothes", and we have those.
Apologies. The provided extractData script does this:
Store in session variable
Extract data
Store in session variable
Extract data
Store in session variable
Process script
I was wondering whether extractData can do this:
Store in session variable
Process script
Extract data
Store in session variable
Process script
Extract data
Store in session variable
Process script
ds =
for (i=0; i<ds.getNumDataRecords(); i++)
{
dr = ds.getDataRecord(i);
name = dr.get("NAME");
url = dr.get("URL");
session.setv("URL", url);
session.executeScript("Script name");
}
Any help would be appreciated...
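For what it is worth, the order being asked for (store, process, store, process) is what falls out when the per-record work sits inside the loop body. The sketch below shows that ordering in plain Java. The map standing in for session variables, the log list, and the "process" step standing in for the follow-up script are all hypothetical and are not the screen-scraper API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InterleavedFlow {
    // Records the order of steps when each URL is handled inside the loop.
    static List<String> run(List<String> urls) {
        Map<String, String> vars = new HashMap<>(); // stand-in for session variables
        List<String> log = new ArrayList<>();

        for (String url : urls) {
            // Store the current record's URL.
            vars.put("URL", url);
            log.add("store " + url);

            // Stand-in for the follow-up script: it reads the stored value
            // before the loop moves on to the next record.
            log.add("process " + vars.get("URL"));
        }
        return log;
    }

    public static void main(String[] args) {
        for (String step : run(List.of("URL0", "URL1", "URL2"))) {
            System.out.println(step);
        }
    }
}
```

Because the follow-up step runs before the next iteration overwrites "URL", each record is fully handled before the next one is stored.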