Sub-extractor not working the same as a normal pattern

I'm trying to extract the following snippet from a phpBB forum. I've seen this structure in a few different places on the site...

I've created a main extractor pattern:

<select name="g">~@DATARECORD@~</select>

Then the following sub-extractor:

<option value="~@GROUP_ID@~">~@GROUP_NAME@~</option>

It only grabs the first line of each data record. Yet if I use the sub-extractor pattern above as the main pattern, it finds them all (along with some other stuff I don't want).

I've tried shortening the beginning and end of the sub-extractor pattern in case line feeds were getting in the way, but it never finds more than the first line. Help!
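The difference described above can be illustrated with plain Java regular expressions. This is an analogy, not screen-scraper's implementation: it assumes each ~@TOKEN@~ behaves roughly like a non-greedy capture group, and the sample markup and group names are invented because the original snippet was not included in the post. A sub-extractor behaves like a single find(), while a main pattern keeps calling find() until the text runs out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnceVsAll {
    // Invented sample markup; the forum's real snippet was not included in the post.
    static final String SAMPLE =
        "<select name=\"g\">\n"
        + "<option value=\"1\">Admins</option>\n"
        + "<option value=\"2\">Moderators</option>\n"
        + "<option value=\"3\">Members</option>\n"
        + "</select>";

    // Rough regex analogue of <option value="~@GROUP_ID@~">~@GROUP_NAME@~</option>,
    // treating each token as a non-greedy capture group (an assumption).
    static final Pattern OPTION =
        Pattern.compile("<option value=\"(.*?)\">(.*?)</option>");

    // Sub-extractor style: stop at the first match.
    static List<String> matchOnce(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = OPTION.matcher(text);
        if (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    // Main-extractor style: keep matching until the text is exhausted.
    static List<String> matchAll(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = OPTION.matcher(text);
        while (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("once: " + matchOnce(SAMPLE)); // once: [Admins]
        System.out.println("all:  " + matchAll(SAMPLE));  // all:  [Admins, Moderators, Members]
    }
}
```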

shadders on 03/06/2009 at 9:18 pm

screen-scraper public support


Actually, your observations are 100% accurate: sub-extractor patterns only match once. The intent is not so much "only match once" as "look until you find this data, then save it." We would like to implement something to address this, but for now the workaround is something like this:

1. Main extractor: keep it as you already have it.
2. On that main extractor, call a script "After each pattern application".
3. Make that script do the following:

    import com.screenscraper.common.*;
    // Text captured by ~@DATARECORD@~ in the main pattern
    String text = (String) dataRecord.get("DATARECORD");
    // Apply the manual pattern to that text; each match becomes a record
    DataSet myDataSet = scrapeableFile.extractData(text, "Give a new pattern name here");

4. Create a new "main" extractor pattern and put your original sub-extractor text in it. Give it the same name you used in the script you just made.
5. On the new pattern's Advanced tab, set it to be a manual-execution pattern only.
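The workaround boils down to two passes: capture the contents of the select element first, then run the option pattern over just that text as many times as it matches. Below is a minimal sketch of that idea in plain Java, assuming regex stand-ins for the extractor patterns. `extractAll` is a hypothetical helper that loosely mirrors the idea of extractData returning one record per match; it is not screen-scraper's API, and the page string is invented sample markup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassExtract {
    // Outer pattern, like <select name="g">~@DATARECORD@~</select>.
    // DOTALL lets the captured block span several lines.
    static final Pattern SELECT =
        Pattern.compile("<select name=\"g\">(.*?)</select>", Pattern.DOTALL);

    // Inner pattern, like the former sub-extractor now used as its own "main" pattern.
    static final Pattern OPTION =
        Pattern.compile("<option value=\"(.*?)\">(.*?)</option>");

    // Hypothetical helper: every match becomes one record (token name -> captured value).
    static List<Map<String, String>> extractAll(Pattern p, String text, String... tokens) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < tokens.length; i++) {
                record.put(tokens[i], m.group(i + 1));
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String page = "<p>intro</p>\n<select name=\"g\">\n"
            + "<option value=\"1\">Admins</option>\n"
            + "<option value=\"2\">Moderators</option>\n"
            + "</select>\n<option value=\"9\">Elsewhere</option>";

        // Pass 1: one record per select block, holding its inner text.
        for (Map<String, String> outer : extractAll(SELECT, page, "DATARECORD")) {
            // Pass 2: run the option pattern over that text only, matching repeatedly.
            List<Map<String, String>> groups =
                extractAll(OPTION, outer.get("DATARECORD"), "GROUP_ID", "GROUP_NAME");
            System.out.println(groups);
        }
    }
}
```

Running it prints the two groups inside the select; the stray option outside it is never seen by the second pass, which is the scoping the original poster wanted.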

Now you should be grabbing that info, but you'll have to process it in a script. See the notes in the API documentation for the scrapeableFile.extractData method; they show how to loop through the results effectively.
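Processing the results then comes down to an ordinary loop over the records, with the per-record work done in the calling script. As a later reply in this thread points out, scripts attached to a manually invoked pattern may not run after each application, so keeping the handling in the calling script avoids depending on them. In this sketch a List of Maps stands in for the returned data set; the index-based loop only mirrors the shape of iterating a DataSet, so check the API page for the real method names. `groupsById` is a hypothetical example of per-record work.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcessRecords {
    // Builds GROUP_ID -> GROUP_NAME from the extracted records.
    // Each Map stands in for one data record; the List stands in for the data set.
    static Map<String, String> groupsById(List<Map<String, String>> records) {
        Map<String, String> byId = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            Map<String, String> record = records.get(i);
            // Per-record work happens here, in the calling script, rather than in a
            // script attached to the manually invoked pattern.
            byId.put(record.get("GROUP_ID"), record.get("GROUP_NAME"));
        }
        return byId;
    }

    public static void main(String[] args) {
        // Invented sample records.
        List<Map<String, String>> records = List.of(
            Map.of("GROUP_ID", "1", "GROUP_NAME", "Admins"),
            Map.of("GROUP_ID", "2", "GROUP_NAME", "Moderators"));
        System.out.println(groupsById(records)); // {1=Admins, 2=Moderators}
    }
}
```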

This has become a common dilemma with certain site structures, and we'd like to implement something to make it easier.

Hope that helps.
Tim

timv on 03/09/2009 at 11:49 am

still a problem

Thanks, Tim; that solution is a good start until a fix is implemented. There is, however, one rather large problem: if the new "main" extractor pattern (which contains the former sub-extractor pattern) is set to run manually, it won't execute scripts after each application. That means the extracted values cannot be used further, and we're back to square one. (The usability flaw in http://community.screen-scraper.com/node/691 appears to be related.) Is there a solution? Perhaps a different way of running those scripts?

Thanks!

ac4000 on 11/16/2009 at 5:03 pm


When I am faced with a similar situation, I generally use this:

http://community.screen-scraper.com/API/extractData

jason on 11/17/2009 at 9:33 am

Brilliant!

That worked perfectly. Thanks!

ac4000 on 11/18/2009 at 11:08 am
