Help with nulls and appending files
I adapted the scrape from tutorial 3 to scrape a site for me. It mostly works, but a couple of things are not quite right.
Some of the extractor patterns do not find data (it's not there on all of the details pages to be scraped) and the tokens end up being null. The nulls somehow mess up my write-data-to-a-file script, and I end up missing the data from about 1 in 6 scrapes.
How can I change the content of the tokens to "NA" if they are null?
Also, I don't want my session/scrape to append the data to the same file every time it is run. I want it to start a new file (overwrite) each time I run the scrape/session. How do I do that?
I tried searching the documentation for answers, but it's pretty thin when it comes to the interpreted Java syntax. Where should I look for reference material on the interpreted Java?
Here's my initialize script:
try
{
    session.log( "Writing data to a file." );
    // Open up the file to be appended to.
    out = new FileWriter( "cars.tsv", true );
    // Write out the data to the file.
    out.write( "Build Number" + "\t" );
    out.write( "Name" + "\t" );
    out.write( "Body Style" + "\t" );
    out.write( "Ext. Color" + "\t" );
    out.write( "Int. Color" + "\t" );
    out.write( "VIN" + "\t" );
    out.write( "Price" + "\t" );
    out.write( "Mileage" + "\t" );
    out.write( "Engine" + "\t" );
    out.write( "Transmission" );
    out.write( "\n" );
    // Close up the file.
    out.close();
}
catch( Exception e )
{
    session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
Here's my Write-data-to-file script:
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
FileWriter out = null;
try
{
    // Open up the file and append data to it.
    out = new FileWriter( "cars.tsv", true );
    // Write out the data to the file.
    out.write( dataRecord.get( "BUILDNUM" ) + "\t" );
    out.write( dataRecord.get( "YEAR" ) + "\t" );
    out.write( dataRecord.get( "MAKE_LONG" ) + "\t" );
    out.write( dataRecord.get( "MODEL_LONG" ) + "\t" );
    out.write( dataRecord.get( "BODYSTYLE" ) + "\t" );
    out.write( dataRecord.get( "XCOLOR" ) + "\t" );
    out.write( dataRecord.get( "ICOLOR" ) + "\t" );
    out.write( dataRecord.get( "VIN" ) + dataRecord.get( "BUILDNUM" ) + "\t" );
    out.write( dataRecord.get( "PRICE" ) + "\t" );
    out.write( dataRecord.get( "MILEAGE" ) + "\t" );
    out.write( dataRecord.get( "ENGINE" ) + "\t" );
    out.write( dataRecord.get( "TRANSMISSION" ) + "\t" );
    out.write( dataRecord.get( "DRIVETYPE" ) + "\t" );
    out.write( dataRecord.get( "FUELTYPE" ) + "\t" );
    out.write( dataRecord.get( "DOORS" ) );
    out.write( "\n" );
    // Close up the file.
    out.close();
}
catch( Exception e )
{
    session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
Thanks in advance,
Tom
Help with nulls and appending files
tomtombombadil,
If your extractor pattern does not match and you call a script "After pattern is applied", any attempt to reference the expected variables (extractor tokens) using dataRecord.get() won't work, because there are no dataRecords to, um, get.
Instead, within your script you'll need to check to see if there are dataRecords available by using this if statement which asks if there are any dataRecords in the dataSet...
if ( dataSet.getNumDataRecords() > 0 )
You can enclose the entire script in this statement, or you can put just the dataRecord.get() calls inside it and assign the results to local variables that you'll reference later in the script.
myVar = dataRecord.get("myVar");
This way if local variable "myVar" is null, it won't throw an exception.
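For the "NA" part of the question, here is a minimal sketch in plain Java of one way to do the substitution. The names NullSafeTokens, orNA and toTsvLine are made up for illustration, and a java.util.Map stands in for the dataRecord so the example runs on its own; it is not screen-scraper's own API.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeTokens {
    // Returns the token's text, or "NA" when the token is null (not extracted).
    public static String orNA(Object value) {
        return value == null ? "NA" : value.toString();
    }

    // Builds one tab-separated line from the given keys.
    // A null record (no match at all) yields "NA" in every column.
    public static String toTsvLine(Map<String, Object> record, String[] keys) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < keys.length; i++) {
            if (i > 0) {
                line.append('\t');
            }
            Object value = (record == null) ? null : record.get(keys[i]);
            line.append(orNA(value));
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // Hypothetical record: VIN was extracted, PRICE was not.
        Map<String, Object> record = new HashMap<>();
        record.put("VIN", "1ABC");
        System.out.print(toTsvLine(record, new String[] { "VIN", "PRICE" }));
    }
}
```

In the write-data-to-file script the same idea would mean wrapping each value, for example orNA( dataRecord.get( "PRICE" ) ) + "\t", assuming the helper is defined earlier in that script.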
Hope this helps,
Scott
Got part of it
Well, I figured out part of my question above.
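If you want the file to be overwritten rather than appended to, change the second argument to FileWriter (the append flag) from true to false. Do this in the initialize script only, since it runs once per session; the write-data-to-file script should keep true, or each record would overwrite the one before it. Like this: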
out = new FileWriter( "cars.tsv", false );
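To see the effect of that flag in isolation, here is a small sketch in plain Java that opens the same file twice. The class and method names (AppendFlagDemo, writeTwice) are hypothetical, and it writes to a temporary file rather than cars.tsv.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppendFlagDemo {
    // Writes "first run", then reopens the file with the given append flag,
    // writes "second run", and returns the final file contents.
    public static String writeTwice(boolean appendOnSecondOpen) {
        try {
            Path file = Files.createTempFile("cars", ".tsv");
            try {
                try (FileWriter out = new FileWriter(file.toString(), false)) {
                    out.write("first run\n");
                }
                try (FileWriter out = new FileWriter(file.toString(), appendOnSecondOpen)) {
                    out.write("second run\n");
                }
                return new String(Files.readAllBytes(file));
            } finally {
                // Clean up the temporary file.
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print("append = true:\n" + writeTwice(true));
        System.out.print("append = false:\n" + writeTwice(false));
    }
}
```

With true the second write lands after the first; with false the file is truncated when it is opened, so only the second write survives.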