Error handling examples

I'm having trouble trapping errors and am wondering if I could get some examples.

I have a scraping session that has 3 scraping files:


-
-
-

Each page may return an error page or, if scraped successfully, yield extracted variables that are used for the next page.

If I encounter an error, I want to capture the error and terminate the scraping session.

Currently I kick off my session via a script:

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "Session" );
runnableScrapingSession.scrape();

I then run a script after each extractor pattern to determine whether the file returned was the error page or the "good" file, using a combination of:

wasErrorOnRequest()
noExtractorPatternsMatched()
getVariable()

If there is an error, I use

runnableScrapingSession.stopScraping()

to terminate the script.

If the page is good, I use

runnableScrapingSession.getVariable()

to grab bits of data and scrape the next page.

This seems a bit clumsy: I have a script to kick things off, all these little scripts after each extractor pattern, and yet another script to save the results at the end.

I then tried writing a script that calls each scraping file individually and applies the error logic:

runnableScrapingSession.scrapeFile( "Page 1" )
if (scrapeableFile.noExtractorPatternsMatched() ) {logic}
....
if (runnableScrapingSession.getVariable()) {logic}
... etc.

but I get an error on

runnableScrapingSession.scrapeFile( "Page 1" )

Before getting into the exact syntax of each of the scripts, I'm just wondering what the most efficient way of going about this is.

Ideally I'd prefer just one script to launch and manage my entire scraping session.
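To make the "one script manages everything" idea concrete, here is a minimal sketch of the control flow in plain Java. It is not screen-scraper code: PageResult, PageScraper and SessionController are hypothetical stand-ins, and the two boolean flags only play the role of the wasErrorOnRequest() and noExtractorPatternsMatched() checks mentioned above. The point is just the shape: scrape the pages in order, stop at the first failure, and carry variables forward otherwise.

```java
import java.util.Map;

/** Hypothetical outcome of scraping one page; not a screen-scraper class. */
final class PageResult {
    final boolean requestFailed;   // plays the role of wasErrorOnRequest()
    final boolean nothingMatched;  // plays the role of noExtractorPatternsMatched()
    final Map<String, String> variables; // values extracted from this page

    PageResult(boolean requestFailed, boolean nothingMatched, Map<String, String> variables) {
        this.requestFailed = requestFailed;
        this.nothingMatched = nothingMatched;
        this.variables = variables;
    }
}

/** Hypothetical stand-in for "scrape this file with the variables gathered so far". */
interface PageScraper {
    PageResult scrape(String pageName, Map<String, String> variablesSoFar);
}

final class SessionController {
    /**
     * Scrapes the pages in order and stops at the first failure.
     * Returns null on success, or a message naming the page that failed.
     */
    static String run(Iterable<String> pages, PageScraper scraper, Map<String, String> variables) {
        for (String page : pages) {
            PageResult result = scraper.scrape(page, variables);
            if (result.requestFailed) {
                return page + ": request error";
            }
            if (result.nothingMatched) {
                return page + ": no extractor patterns matched";
            }
            // Page was good: make its variables available to the next page.
            variables.putAll(result.variables);
        }
        return null;
    }
}
```

A caller would treat a null return as a pass and anything else as the captured error, which is also a simple value to hand back to an outside caller for a pass/fail test.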

I'm using Interpreted Java as my scripting language, and I'm ultimately calling the session from a PHP script. Am I best off having all the logic in a screen-scraper script and using it to trigger a pass/fail test in my PHP script, or building the error logic in PHP after checking each step?

What is more efficient?

Does anyone have examples of what they have done in this situation?

thx

Error handling examples

ozmex,

Currently it is impossible to disable patterns, whether manually or in a script. The main idea behind the current setup is that scripts will do something with the data returned or will control the logic of going through different URLs. You can, however, have a script that looks at the results of both patterns and decides what to do from there.
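As a rough illustration of a script that looks at the results of both patterns, the decision can be reduced to a small function. This is a plain Java sketch, not screen-scraper API: PageOutcome and OutcomeClassifier are made-up names, and the two arguments are assumed to be whatever values the "good" pattern and the "error" pattern extracted (null or empty when a pattern did not match).

```java
/** Hypothetical labels for the three possible cases; not part of screen-scraper. */
enum PageOutcome { GOOD, KNOWN_ERROR, UNRECOGNISED }

final class OutcomeClassifier {
    /**
     * dataValue:  value extracted by the "good page" pattern, or null/empty if it did not match.
     * errorValue: value extracted by the "error page" pattern, or null/empty if it did not match.
     */
    static PageOutcome classify(String dataValue, String errorValue) {
        if (dataValue != null && !dataValue.isEmpty()) {
            return PageOutcome.GOOD;          // the data pattern matched, so ignore the error pattern
        }
        if (errorValue != null && !errorValue.isEmpty()) {
            return PageOutcome.KNOWN_ERROR;   // the error pattern picked up the error details
        }
        return PageOutcome.UNRECOGNISED;      // neither matched: an unexpected page
    }
}
```

Both patterns still run, but only one decision point acts on them, and the third case catches pages that are neither the expected result nor the known error page.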

As for calling another script from within a script, I have not found a way to do that yet. An alternative that has come in very handy is to write some Java code, jar it up, and place it in the lib/ext directory. You can then call its methods, which are now in scope, from inside your scripts. I hope that makes sense. That is how I access a database from within screen-scraper.
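For a sense of what such shared code might look like, here is a small sketch of a helper class that several scripts could call instead of repeating the same check. ScrapeChecks and missingVariables are hypothetical names, and the variables are modelled as an ordinary Map rather than read from a session. If it were packaged in a jar for use from scripts, the class would normally also be declared public and given a package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical shared helper; the names here are illustrative only. */
final class ScrapeChecks {
    private ScrapeChecks() {
        // Static utility class, not meant to be instantiated.
    }

    /** Returns the names of required variables that are absent or blank, in the order given. */
    public static List<String> missingVariables(Map<String, ?> variables, String... required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            Object value = variables.get(name);
            if (value == null || value.toString().trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

A script could then treat a non-empty result as the signal to record the error and stop, rather than checking each variable by hand.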

Brent

Error handling examples

Brent, thanks for the info, it's helped -- sort of (more info, more testing, more problems!)

If I have multiple extractor patterns within a scrape, can I call an individual extractor pattern, or exclude one from being called?

For example, my scrape returns a web page, and I have 2 extractor patterns to evaluate this page. The page is either:

1) good, and my first extractor pattern gets the info I need, or
2) bad, in which case the first extractor pattern returns noExtractorPatternsMatched=true, and I use my next extractor pattern to detect the error details.

The issue is that if I don't get the error page (i.e. the page returned is good), the second extractor pattern runs anyway.

This doesn't break anything, but it seems like unnecessary processing ...

Also, can I call a script from within a script?

Any suggestions?