Is this doing any good, or not?

OK, I finally got around to rewriting the VBScript I was using to write data out to .txt files. I used Interpreted Java instead of VBScript for the new code.

While perusing some Java docs, I hit upon an idea that (I hoped) would make writing to the files faster. I went looking because one of the pages I scrape once a day has over 1,000 extracted items, and it took a good bit of time (and CPU cycles) to write the data out to the .txt file.

So, here's the code I came up with:

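import java.io.*;

PrintWriter out = null;
try
{
    // Open up the file to be appended to ("true" = append rather than overwrite).
    // Assign to the variable declared above instead of declaring a second "out".
    out = new PrintWriter(new BufferedWriter(new FileWriter( "URIBLns.txt", true )));

    // Write out the data to the file.
    out.write( dataRecord.get( "URIBLNS" ) + "\r\n" );
}
catch( Exception e )
{
    session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
finally
{
    // Close up the file (this also flushes the buffer), even if something went wrong.
    if( out != null )
    {
        out.close();
    }
}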
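For reference, here is roughly what that chain of writers does, as a standalone plain-Java sketch (Java 7+ for try-with-resources; the class and method names are hypothetical and this runs outside screen-scraper). FileWriter(path, true) opens the file in append mode, BufferedWriter holds characters in memory until its buffer fills or the writer is flushed or closed, and PrintWriter adds convenience methods. Two things worth knowing: PrintWriter's write methods never throw IOException (failures are recorded and can be checked with checkError()), so write errors will not reach the catch block; and when the file is opened, written once, and closed for every record, the buffer only ever holds one line, so it probably saves little. The repeated open/close is likely the bigger cost.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class AppendOneLine {

    // Appends a single line to the file at 'path', creating the file if needed.
    // FileWriter(..., true) opens in append mode; BufferedWriter collects characters
    // in memory; PrintWriter adds convenience methods such as print/println.
    static void appendLine(String path, String line) throws IOException {
        // try-with-resources closes (and therefore flushes) the writer even on error.
        try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(path, true)))) {
            out.write(line + "\r\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("uriblns", ".txt");
        appendLine(tmp.toString(), "first");
        appendLine(tmp.toString(), "second");
        // Each call reopened the file in append mode, so both lines are present.
        List<String> lines = Files.readAllLines(tmp, StandardCharsets.UTF_8);
        System.out.println(lines);
        Files.delete(tmp);
    }
}
```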

My question is this:

Does the following line:

PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter( "URIBLns.txt", true )));

do anything to speed up writing to the file, since this script is called each time a pattern is matched?

Or would it be better to save all the extracted data to an ArrayList, then write that ArrayList out to the .txt file?
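If you went the ArrayList route, the write step could look something like this plain-Java sketch (the class name, method name and sample values are hypothetical; Java 7+). The key point is that the file is opened and closed once for the whole batch, which is also what Todd's loop over the data set later in this thread achieves without a separate list.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class BatchWrite {

    // Writes every record in 'records' to 'path' (append mode) using ONE writer,
    // so the file is opened and closed once no matter how many records there are.
    static void writeAll(List<String> records, String path) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(path, true)))) {
            for (String record : records) {
                out.write(record + "\r\n");
            }
            // PrintWriter never throws on write; checkError() flushes and reports failures.
            if (out.checkError()) {
                throw new IOException("Error while writing to " + path);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical extracted values standing in for the URIBLNS field.
        List<String> collected = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            collected.add("record-" + i);
        }
        writeAll(collected, "URIBLns.txt");
    }
}
```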

Is this doing any good, or not?

Hi,

No, you won't have to null them out, and you shouldn't have needed to with your previous approach either. You'd only potentially need to null out values if you were working with session variables, which persist across extractions.

Kind regards,

Todd
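To illustrate the scoping difference described above, here is a plain-Java analogy (not screen-scraper's API; the class and method names are hypothetical): a list held at "session" scope keeps growing across pages unless it is cleared, while a list created fresh for each extraction starts empty every time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy only: this is NOT screen-scraper's API.
public class ScopeDemo {

    // Stands in for a session variable: one list that lives as long as the session object.
    private final List<String> sessionScoped = new ArrayList<>();

    // Simulates one page's extraction. 'pageRecords' are that page's (hypothetical) values.
    // Returns the per-extraction list, which is created fresh on every call.
    List<String> scrapePage(List<String> pageRecords) {
        List<String> perExtraction = new ArrayList<>(pageRecords);
        sessionScoped.addAll(perExtraction);   // session scope keeps accumulating
        return perExtraction;
    }

    // Returns a copy of everything accumulated at session scope so far.
    List<String> sessionContents() {
        return new ArrayList<>(sessionScoped);
    }

    // The analog of "nulling out" a session variable between pages.
    void resetSession() {
        sessionScoped.clear();
    }

    public static void main(String[] args) {
        ScopeDemo demo = new ScopeDemo();
        demo.scrapePage(Arrays.asList("a1", "a2"));
        List<String> page2 = demo.scrapePage(Arrays.asList("b1"));
        System.out.println("per-extraction: " + page2);                   // [b1]
        System.out.println("session-scoped: " + demo.sessionContents());  // [a1, a2, b1]
    }
}
```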

Is this doing any good, or not?

With that code, will I have to null out the DataRecords on subsequent page requests?

In other words, do the DataRecords persist from one scraped page to the next? If so, it'll write everything it found on previously scraped pages to the .txt file, along with everything it finds on the current page.

Is this doing any good, or not?

Hi,

You're correct that the script re-creates those writer objects (and reopens the file) every time it runs, so it may be more efficient to write all of the extracted data out in a single script.

Try changing your script to the following:

import java.io.*;

PrintWriter out = null;
try
{
    // Open up the file to be appended to (assign to "out"; don't redeclare it).
    out = new PrintWriter(new BufferedWriter(new FileWriter( "URIBLns.txt", true )));

    // Loop over every record extracted by the pattern and write one line per record.
    for( int i = 0; i < dataSet.getNumDataRecords(); i++ )
    {
        tmpDataRecord = dataSet.getDataRecord( i );

        out.write( tmpDataRecord.get( "URIBLNS" ) + "\r\n" );
    }
}
catch( Exception e )
{
    session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
finally
{
    // Close up the file once, after all records are written (this also flushes the buffer).
    if( out != null )
    {
        out.close();
    }
}

The one change is that you'll want to invoke this script "After pattern is applied" rather than "After each pattern application". That way it runs only once for the entire extracted data set, rather than once for every record.

Kind regards,

Todd Wilson
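A rough way to see the difference between opening the file per record and opening it once is to time both side by side. This is a standalone Java 7+ sketch; the class name, method names and the 1,000 sample records (standing in for the URIBLNS values) are hypothetical, and it does not use screen-scraper's dataSet object. Absolute timings depend on the machine, OS and disk caching, so treat the numbers as indicative only.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class OpenOnceVsPerRecord {

    // Mimics the per-match script: open, write one line, close, for every record.
    static void openPerRecord(List<String> records, Path file) throws IOException {
        for (String record : records) {
            try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(file.toFile(), true)))) {
                out.write(record + "\r\n");
            }
        }
    }

    // Mimics the "After pattern is applied" script: open once, loop, close once.
    static void openOnce(List<String> records, Path file) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(file.toFile(), true)))) {
            for (String record : records) {
                out.write(record + "\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);   // hypothetical stand-ins for URIBLNS values
        }
        Path a = Files.createTempFile("perRecord", ".txt");
        Path b = Files.createTempFile("once", ".txt");

        long t0 = System.nanoTime();
        openPerRecord(records, a);
        long t1 = System.nanoTime();
        openOnce(records, b);
        long t2 = System.nanoTime();

        // Timings vary by machine; only the relative difference is of interest.
        System.out.printf("open per record: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("open once:       %d ms%n", (t2 - t1) / 1_000_000);
        // Both approaches should produce the same lines.
        System.out.println(Files.readAllLines(a).equals(Files.readAllLines(b)));
        Files.delete(a);
        Files.delete(b);
    }
}
```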