Header and Footer Bracketing

Hi,

Still getting great use from this tool, so thanks for writing it.

I want to extract poll data from a sheet on which there are multiple polls, each representing the same candidates for different questions:

Poll 1
Candidate 1 - 2 votes
Candidate 2 - 4 votes

Poll 2
Candidate 1 - 294 votes
Candidate 2 - 624 votes

My question is: how can I extract these as different datasets? I have a pattern which will pull them both, but I get multiple rows per candidate and I can't easily distinguish between the polls.

I've tried using a sub-extractor pattern to match the whole of Poll 1, but it seems to want each candidate in a separate parent record, so I'm guessing that won't do it.

Essentially I want to be able to define a header and footer to bracket each section of the scrapable file.

Any ideas?

Simon

Header and Footer Bracketing

Maybe I'm being dim, but I really don't get this.

What is the first Extractor Pattern supposed to be matching? If it's the first poll, then how do you get the scripted Extractor Patterns to scope to that poll and that poll alone?

I'm flummoxed, please help.

Header and Footer Bracketing

Hi, back on topic now that Todd's got the scripts working for me.

I'm not sure I understand the solution.

If I have some HTML which looks like



Poll number 1

Candidate Number 3

Header and Footer Bracketing

Hi,

I've dropped back to 2.0, and Scripts still don't work.

I get a series of 5 of the following errors in error.log:
An exception was thrown when generating a pop-up item for:

I guess there is some sort of JRE issue. I'm using
Java Runtime Environment Standard Edition 1.3.1_07
Default Virtual Machine Version 1.3.1_07-b02

Any ideas?

Simon

Header and Footer Bracketing

Hi,

I think I've gone up too many versions, to an alpha version. I just clicked the Check for Upgrades menu option, and I don't have "allow upgrading to unstable versions" set, so there seems to be a problem there.

Let me drop to the last known stable version and I'll come back.

Header and Footer Bracketing

Hi,

Thanks for the response, but I have some problems with the new version (2.0.5.39a - professional edition).

Proxy authentication is now scoped to the Scraping Session, and will not inherit from the application-scoped settings. This means I have to add time-sensitive proxy authentication data for every scraping session. Why can't the session inherit from the app by default, with an override if required?

The logging is not as verbose. The data records are missing from the logging, and there is no longer any information when a scraping session cannot retrieve the scrapable file. At present the logs tell me everything is fine - but there is no Last Response, and no Last Request in the extractor.

Scripts don't work. I can click the Add Script button all I like, but nothing happens.

Simon

script extracting

This functionality is available, but you need to have the most recent release of the professional version. To do this, you will use extractor patterns within scripts. It is rather complicated, so feel free to post any questions about the solution.

First, create all the extractor patterns you will use for the data in the scrapeable file, and make sure you name them.

You should have one main extractor pattern that gets each set of data that you want to split up further. In this example, it would be an extractor pattern that returns all the polls in a data set.

For all of your other extractor patterns:
Go to the 'Advanced' tab, then click on 'This extractor pattern will be invoked manually from a script'. This will stop the patterns from executing normally and filling up your log with unneeded information.

Now we can actually write the script to extract the information we want.

import com.screenscraper.common.DataSet;
import com.screenscraper.common.DataRecord;

// Apply the "CANDIDATE" extractor pattern to the text of the poll that was just
// extracted ("POLL"), creating a new DataSet scoped to that poll alone
pollDataSet = scrapeableFile.extractData( dataRecord.get( "POLL" ), "CANDIDATE" );

This will create a new dataset for each individual poll, and its data records will contain the candidates. The new dataset can be manipulated like any other dataset, e.g. with getAllDataRecords. You will need some further scripting to access and manipulate the data, but that will be specific to your case.

The script works by extracting data after each poll is matched, so on the extractor pattern page, when you add the script to the pattern, make sure 'When to Run' is set to 'After each pattern application' or you won't get all the results. Hope this helps.
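
If the two-level idea is still hard to picture, here is a rough standalone sketch of it in plain Java, outside of screen-scraper. It is only an illustration, not how screen-scraper implements extractor patterns: the class name PollBracketing, the method extractPolls and both regular expressions are made up for this example, and it assumes the page text looks like the plain-text polls in the original question. The outer pattern plays the role of the main "POLL" extractor pattern (the header is the "Poll N" line, the footer is the next header or the end of the text), and the inner pattern plays the role of the "CANDIDATE" pattern, applied only to the text of one poll at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PollBracketing {
    // Outer pattern (hypothetical): captures the "Poll N" header line, then lazily
    // captures everything up to the next header line or the end of the input.
    private static final Pattern POLL = Pattern.compile(
            "^(Poll \\d+)$(.*?)(?=^Poll \\d+$|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Inner pattern (hypothetical): one candidate line inside a single poll.
    private static final Pattern CANDIDATE =
            Pattern.compile("(Candidate \\d+) - (\\d+) votes");

    // Returns one map of candidate -> votes per poll, keyed by the poll header.
    static Map<String, Map<String, Integer>> extractPolls(String text) {
        Map<String, Map<String, Integer>> polls = new LinkedHashMap<>();
        Matcher poll = POLL.matcher(text);
        while (poll.find()) {
            Map<String, Integer> votes = new LinkedHashMap<>();
            // Run the inner pattern against this poll's text only, so the
            // candidates can never leak across polls.
            Matcher candidate = CANDIDATE.matcher(poll.group(2));
            while (candidate.find()) {
                votes.put(candidate.group(1), Integer.parseInt(candidate.group(2)));
            }
            polls.put(poll.group(1), votes);
        }
        return polls;
    }

    public static void main(String[] args) {
        String page = "Poll 1\nCandidate 1 - 2 votes\nCandidate 2 - 4 votes\n\n"
                + "Poll 2\nCandidate 1 - 294 votes\nCandidate 2 - 624 votes\n";
        System.out.println(extractPolls(page));
    }
}
```

The point of the sketch is the scoping: because the inner match only ever sees the text captured for one poll, each poll ends up with its own set of candidate records, which is the same effect as calling extractData on the "POLL" text after each pattern application.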