Extracting Data for Pattern

Thanks for this great tool. I am very interested in purchasing it. However, after a three-day effort to pull data from one web site, I have yet to be successful. Everything appears to be functioning as it should, except that the main pattern on the details page reports that it did not find any matches. I simply don't understand why it isn't working. In fact, I manually went to one of the pages that had that error and downloaded it via the proxy server. I copied the cleaned-up HTML into a text editor and searched for my patterns, and they are on the page once, exactly where they should be. So, what do I do to get this working? I just can't think of a reason it wouldn't work.
Thanks,
t

Thanks, Tim. I think that is working now. The "Write data to a file" script seems to be my hang-up now.

The script:

FileWriter out = null;

try
{
    session.log( "Writing data to a file." );

    // Open up the file to be appended to.
    out = new FileWriter( "dvds.txt", true );

    // Write out the data to the file.
    out.write( dataRecord.get( "PRODUCTNAME" ) + "\t" );
    out.write( dataRecord.get( "PRODUCTID" ) + "\t" );
    out.write( dataRecord.get( "PRICE" ) + "\t" );
    out.write( dataRecord.get( "IMAGE" ) + "\t" );
    out.write( dataRecord.get( "DESCRIPTION" ) + "\t" );
    out.write( dataRecord.get( "CATEGORY" ) );
    out.write( "\n" );

    // Close up the file.
    out.close();
}
catch( Exception e )
{
    session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}

The error:
Details page: Processing scripts after a pattern application.
Processing script: "Write data to a file"
Writing data to a file.
An error occurred while writing the data to a file: null

Your help is appreciated.

Holy shoot. I wrote a reply to this, but it didn't work for some reason..... :( Crap.

Uh.... what did I tell you...? Man I hate writing things twice. I'll make this condensed and to the point :P

So, I copied your script and ran numerous test cases against it. I couldn't make it throw any errors at all. I intentionally made sure that "PRODUCTNAME" and other variables were null, but that still didn't throw an error.

(ung.. I still can't believe my reply is gone. I looked at it after posting, I swear :P )

*ahem*. I think I have a better reply this time though. I digress:

I'm thinking that it's got something to do with the FileWriter. The only reason I could see it messing up, though, is if the file already exists on your hard drive, and it's being used by another process on your computer. Consequently, the file can't be opened by the FileWriter, which makes your "out" variable null, which then throws ugly errors when you try to do "out.write".

Personally, I think it's kind of a design flaw in the way that Windows works. Given the scenario, I'm guessing that you're using Windows, since Linux doesn't encounter this problem. When a program like Notepad or Word or Excel is using a file, it locks it down, telling the rest of the computer that nobody else can alter it. This is great for flagging a file as "in use", but it's a horrible, horrible problem when you've got something like screen-scraper trying to write more data into the file.

Your FileWriter can't open the file while it's being used by some other program.
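
One way to check whether the file really is the culprit is to log the exception's class along with its message. The sketch below is plain Java rather than a screen-scraper script, and the names AppendProbe and tryAppend are made up for illustration. As far as I know, when a FileWriter cannot open its target, the constructor itself throws an IOException, so a lock should show up at that line rather than later at "out.write". A directory is used here as a stand-in for a file that cannot be opened, since a real Windows lock is hard to reproduce portably.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper, not part of screen-scraper.
class AppendProbe {
    // Tries to append one line to the target.
    // Returns "ok", or the exception's class name and message on failure.
    static String tryAppend(File target, String line) {
        // try-with-resources closes the writer even if write() fails.
        try (FileWriter out = new FileWriter(target, true)) {
            out.write(line + "\n");
            return "ok";
        } catch (IOException e) {
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        File file = new File(tmpDir, "append_probe_demo.txt");

        // A normal file can be appended to.
        System.out.println(tryAppend(file, "row"));

        // A directory cannot be opened for writing, so the constructor throws.
        System.out.println(tryAppend(tmpDir, "row"));

        file.delete();
    }
}
```

If the second call reports an IOException subclass with a real message, then a file that cannot be opened would not normally produce a bare "null" in the log, and the cause may lie elsewhere in the script.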

If you're not using your "dvds.txt" in another program, then perhaps something unaccounted for is using it... an on-access virus scanner... the Windows Vista searching indexer... something.

Another issue that stems from this dealio with Windows is that even after closing a program like Excel, it won't let go of the file that you had open. You may have encountered this before, when you close a program or video file, and then try to delete it, and Windows tells you that it can't, because it's being used by another process. Same problem.

Try running this without having any contact with the "dvds.txt" file. If you're just trying to monitor the file as it's being updated, then you have a good idea, but it won't work on Windows.

Sometimes it requires an entire restart of Windows in order to make it let go of a file. Simply logging out and back in won't always fix the problem.

Tell me if this leads to any new developments. If it doesn't, I'll just stick a sock in my mouth and try to figure out what else might be wrong. In fact, if you just can't figure it out, put this at the top of your write-to-file script:

int step = 0;

and then put a few of these throughout your script:

step++; session.log( String.valueOf( step ) );

This way, you'll see numbers pop up in your log when you run the scrape. If you see it get to the number 2, but not 3, then you know the problem is somewhere between the 2nd and 3rd spots where you put the "step++; session.log( String.valueOf( step ) );" line. Move the 3rd one farther up in the script and try to narrow it down to just a single line that is causing the "null" error.
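
Put together, the numbered-checkpoint idea looks roughly like the sketch below. It is plain Java, not a screen-scraper script: StepTrace and run are hypothetical names, and a plain list stands in for session.log. It also records the exception's class name, because getMessage() can return null (a NullPointerException often carries no message), which is one way a log line can end in "null".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a list stands in for screen-scraper's session.log().
class StepTrace {
    // Runs each stage in order, logging a step number before it starts.
    // On the first exception, logs its class and message, then stops.
    static List<String> run(Runnable... stages) {
        List<String> log = new ArrayList<>();
        int step = 0;
        try {
            for (Runnable stage : stages) {
                step++;
                log.add("step " + step);
                stage.run();
            }
        } catch (Exception e) {
            log.add("error: " + e.getClass().getName() + ": " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        String missing = null; // simulates a value that was never set

        List<String> log = run(
            () -> "PRODUCTNAME".trim(),                  // stage 1: succeeds
            () -> missing.length(),                      // stage 2: throws
            () -> System.out.println("never reached")    // stage 3: skipped
        );

        // The log stops after "step 2", which points at the second stage.
        log.forEach(System.out::println);
    }
}
```

The last step number in the log marks the stage that failed, and the class name says what kind of failure it was even when the message is empty.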

Again, let me know if this helps at all!

Tim

Hmm.. there must be *something* not right. Perhaps it's just the token's pattern itself that is trying to match too far out. If that happens, then the rest of the pattern is not going to match properly.

Without any specifics about which website you're scraping, or the patterns you're using, all I can offer you is this:

  1. Copy and paste your tidy'd HTML into the extractor pattern again. No variables. Just text.
  2. Test the pattern out by applying it to the HTML. For this to work, you'll obviously need to be copying your tidy'd HTML out of the "Last Response" tab. The pattern should match just one single result.
  3. Add a single variable to the mix, replacing whatever text it is supposed to be matching.
  4. Test again.
  5. If the test failed, examine your extractor pattern carefully. Start with something simple, not complex or specific to the data you're matching. If the test succeeded, add another variable and repeat the process of testing.

Somewhere down the line, you're bound to find that a variable isn't doing what you're expecting it to do.
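
The same build-it-up-and-retest routine can be sketched with ordinary Java regular expressions. This is only an analogy: it does not use screen-scraper's extractor pattern syntax, and the class PatternSteps, the method countMatches, and the sample HTML are all invented for illustration. It starts with literal text, swaps one piece for a tightly bounded capture, and then shows how a loose capture matches too far.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for testing a pattern one change at a time.
class PatternSteps {
    // Counts how many separate matches the regex finds in the HTML.
    static int countMatches(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String html = "<b>Alpha</b><b>Beta</b>"; // made-up sample markup

        // Steps 1-2: literal text only, no variables. Expect a single match.
        String literal = Pattern.quote("<b>Alpha</b>");
        System.out.println("literal: " + countMatches(literal, html));

        // Steps 3-4: replace the value with a capture that cannot cross a tag.
        String bounded = "<b>([^<>]*)</b>";
        System.out.println("bounded: " + countMatches(bounded, html));

        // A capture that is too loose swallows both items as one match.
        String greedy = "<b>(.*)</b>";
        System.out.println("greedy: " + countMatches(greedy, html));
    }
}
```

Comparing the match count after each change shows which substitution made the pattern stop behaving, which is the point of adding one variable at a time.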

Hope that helps for now. If you need more help, provide a specific example of what's not matching, and then I can get specific, too.

Tim