Multiple fields in a data record

Hi,

I've been using SS for a while now (thanks for a great tool), but I'm having a bit of difficulty extracting lists of varying length. For example, I've got a page with multiple rows, each row containing a table (examples below) which I extract into a data record. In the first example table, I need to extract the names of the comics from the table cell below the one labelled Comics. In this instance we have Adam Bloom, Ben Norris and John Fothergill (MC), but the next table may have 1, 2, 3, 4 or 5 comics, or, as in the second example, 6. Any ideas?

=== Example 1 ===

When: 23:55 - Friday 22nd Aug, '08
Prices: £13
Info: Plus Sugar Sammy
Comics: Adam Bloom, Ben Norris, John Fothergill (MC)

Book now

=== Example 2 ===

When: 20:00 - Tuesday 26th Aug, '08
Prices: £15
Info: Cutting Edge topical comedy games
Comics: Andy Parsons, John Fothergill, Martin Coyote, Roger Monkhouse, Sean Meo, Steve Gribbin

Book now

calling an extractor pattern from a script

As you probably realized, sub-extractors only match once per extractor pattern, so they won't work when you are trying to pull more than one item out of the table. If you have the professional or enterprise edition, the solution is to use the extractData method. The API documentation has example code showing how to do this. Essentially, you save the table, or at least the chunk of it with the comic info, to a session variable, then call an extractor pattern that works on that session variable. The extractor pattern called from the script then looks for the specific data you are trying to extract, in your case each comic's name.

If you only have the basic edition, the solution is a little trickier. You will have to do the same thing the extractData method does, but program it yourself in Java, using regular expressions to parse out the data.
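As a rough illustration of that do-it-yourself route, here is a minimal sketch in plain Java. It assumes the text of the comics cell has already been isolated as a string and that the names are separated by commas, as in the two examples in the question. The class and method names (ComicNameSplitter, splitNames) are made up for this example and are not part of screen-scraper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the screen-scraper API.
class ComicNameSplitter {
    // Assumption: names are separated by a comma with optional whitespace around it.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*,\\s*");

    // Splits the text of the comics cell into individual names, skipping empty pieces.
    static List<String> splitNames(String cellText) {
        List<String> names = new ArrayList<>();
        if (cellText == null) {
            return names;
        }
        for (String part : SEPARATOR.split(cellText.trim())) {
            if (!part.isEmpty()) {
                names.add(part);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String cell = "Adam Bloom, Ben Norris, John Fothergill (MC)";
        for (String name : splitNames(cell)) {
            System.out.println(name);
        }
    }
}
```

If the saved cell still contains HTML tags, they would need to be stripped before splitting. Inside a screen-scraper script the same logic would probably have to be written in the looser Interpreted Java style, without the class wrapper or generics.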

Thanks but not quite getting it yet

ryanj,

Thanks and I assume the link should have pointed at:

http://community.screen-scraper.com/API/extractData

Being rather new to this, I'm not really understanding the API example. Could you elaborate a little (in Java, as I'm on a Mac)? I've got the professional edition.

So I run an extractor pattern as usual and save the table as a session variable. I then run a script (after each pattern application) and run the extractData function on the variable and save the results in a second variable (the one I then use to write out to a file).

thanks

Alex

We'll see if I can explain it

We'll see if I can explain it more clearly. For your particular site this is what I would do. First, make an extractor pattern that matches the table and saves the table data into ~@DATARECORD@~. I am assuming you also want other information such as the time and price, so use sub-extractors to save these, along with the td tag that holds all the comics' names, into session variables. Let's say you saved the td with the comics' names as the session variable COMICNAMES. We'll name this first extractor pattern "Comics table".

You then make a separate extractor pattern and, under the advanced tab, check the "This extractor pattern will be invoked manually from a script" checkbox. Make this extractor pattern match the names in the session variable COMICNAMES. Your extractor pattern might look like the following: ~@COMICNAME@~. We'll name this extractor pattern "Comics names".

Now make a script that runs after each pattern application of your "Comics table" extractor pattern. This is where you use the extractData method. In this example I write the contents out to a CSV file in which each line is Date, Price, [comma-separated comic names]. The contents of your script will look like the following:

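import java.io.*;
import com.screenscraper.common.*;

// Placeholder path for the output file; set this to wherever you want the CSV written.
outputFile = "comics.csv";

// Open the file in append mode so each pattern application adds a line.
out = new FileWriter( outputFile, true );

out.write( session.getVariable( "DATE" ) + "," );
out.write( session.getVariable( "PRICE" ) + "," );

// Apply the manually invoked "Comics names" pattern to the saved td contents.
comicNames = session.getVariable( "COMICNAMES" );
myDataSet = scrapeableFile.extractData( comicNames, "Comics names" );
for ( i = 0; i < myDataSet.getNumDataRecords(); i++ ) {
    myDataRecord = myDataSet.getDataRecord( i );
    out.write( myDataRecord.get( "COMICNAME" ) + "," );
}
out.write( "\n" );

// Close the writer so the line is flushed to disk.
out.close();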
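
One thing to watch with CSV output like this: the date values in the example tables contain a comma ("Aug, '08"), so writing them unquoted would split the date across two columns. Below is a minimal sketch in plain Java of one way to build the line with quoting and without a trailing comma. The CsvLine class and its methods are hypothetical names for illustration, not part of screen-scraper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the screen-scraper API.
class CsvLine {
    // Wraps a field in double quotes if it contains a comma, quote or line break,
    // doubling any embedded quotes (the usual CSV convention).
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Joins the fields into a single CSV line with no trailing comma.
    static String join(List<String> fields) {
        List<String> quoted = new ArrayList<>();
        for (String field : fields) {
            quoted.add(quote(field));
        }
        return String.join(",", quoted);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
                "23:55 - Friday 22nd Aug, '08", "13",
                "Adam Bloom", "Ben Norris", "John Fothergill (MC)");
        System.out.println(join(fields));
    }
}
```

In the script, the same idea would mean collecting the date, price and each COMICNAME value into a list and writing the joined line once, instead of writing each value followed by a comma.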