Multiple Rows for Data Record
Each data record is occupying three rows in the CSV file (please see below). For example, for a given record the address is displayed in one row; the city, neighborhood, county, state, and zip are displayed in the next row; and the price, type, built, bed, bath, gla, and lot are displayed in the row after that. How do I correct the problem so that each data record is displayed in one row?
Please see the screenshot of the CSV file.
If it helps, I have included the "Write data to a file" script and a portion of the log below:
outputFile = "Trulia.csv";
//Error catching.
try
{
    //Set-up file to be written.
    File file = new File( outputFile );
    fileExists = file.exists();
    //Open up the file to be appended.
    out = new FileWriter( file, true );
    session.log( "Writing data to a file." );
    //Write headers only one time.
    if (!fileExists)
    {
        // Write out the headers.
        out.write("\"" + "ADDRESS" + "\"" + ",");
        out.write("\"" + "CITY" + "\"" + ",");
        out.write("\"" + "NEIGHBORHOOD" + "\"" + ",");
        out.write("\"" + "COUNTY" + "\"" + ",");
        out.write("\"" + "STATE" + "\"" + ",");
        out.write("\"" + "ZIP" + "\"" + ",");
        out.write("\"" + "PRICE" + "\"" + ",");
        out.write("\"" + "TYPE" + "\"" + ",");
        out.write("\"" + "BUILT" + "\"" + ",");
        out.write("\"" + "BED" + "\"" + ",");
        out.write("\"" + "BATH" + "\"" + ",");
        out.write("\"" + "GLA" + "\"" + ",");
        out.write("\"" + "LOT" + "\"" + ",");
        out.write( "\n" );
    }
    String [] variables = {"PRICE","GLA"};
    i = 0;
    // Iterate through each variable in the array above
    while (i < variables.length)
    {
        //Get the variables to be fixed
        value = session.getVariable(variables[i]);
        //Log the UNFIXED values
        session.log("UNFIXED: " + variables[i] + " = " + value);
        if (value != null)
        {
            //Remove non-numerical elements from number
            value = value.replaceAll("\\D","");
            // Set variables with new values
            dataRecord.put(variables[i], value);
            session.setVariable(variables[i], value);
            //Log the FIXED values
            session.log("FIXED " + variables[i] + " = " + session.getVariable(variables[i]));
        }
        i++;
    }
    //Write columns.
    out.write( session.getVariable( "ADDRESS" ) + "," );
    out.write( session.getVariable( "CITY" ) + "," );
    out.write( session.getVariable( "NEIGHBORHOOD" ) + "," );
    out.write( session.getVariable( "COUNTY" ) + "," );
    out.write( session.getVariable( "STATE" ) + "," );
    out.write( session.getVariable( "ZIP" ) + "," );
    out.write( session.getVariable( "PRICE" ) + "," );
    out.write( session.getVariable( "TYPE" ) + "," );
    out.write( session.getVariable( "BUILT" ) + "," );
    out.write( session.getVariable( "BED" ) + "," );
    out.write( session.getVariable( "BATH" ) + "," );
    out.write( session.getVariable( "GLA" ) + "," );
    out.write( session.getVariable( "LOT" ) + "," );
    out.write( "\n" );
    //Close up the file.
    out.close();
    //Clear variables.
    session.setVariable("ADDRESS","");
    session.setVariable("CITY","");
    session.setVariable("NEIGHBORHOOD","");
    session.setVariable("COUNTY","");
    session.setVariable("STATE","");
    session.setVariable("ZIP","");
    session.setVariable("PRICE","");
    session.setVariable("TYPE","");
    session.setVariable("BUILT","");
    session.setVariable("BED","");
    session.setVariable("BATH","");
    session.setVariable("GLA","");
    session.setVariable("LOT","");
}
catch( Exception e )
{
    session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
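Separately from the three-rows problem, the script above writes the values unquoted and ends every line with a trailing comma. The trailing comma adds an empty 14th column, and any value that still contains a comma (for example PRICE = 58,000 before the cleanup step) would spill into the next column. Below is a minimal sketch in plain Java of a row writer that quotes each field and omits the trailing comma. It is an illustration under assumptions, not screen-scraper's API: the `Map` stands in for session variables so the code runs on its own, and the class and method names are hypothetical. Try-with-resources closes the file even if a write fails, which the original `out.close()` inside the `try` does not guarantee.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; the Map passed in stands in for screen-scraper session variables.
class CsvRowWriter {
    // Column order taken from the "Write data to a file" script.
    static final List<String> COLUMNS = Arrays.asList(
            "ADDRESS", "CITY", "NEIGHBORHOOD", "COUNTY", "STATE", "ZIP",
            "PRICE", "TYPE", "BUILT", "BED", "BATH", "GLA", "LOT");

    // Wrap a field in double quotes and double any embedded quotes; null becomes an empty field.
    static String quote(String value) {
        String v = (value == null) ? "" : value;
        return "\"" + v.replace("\"", "\"\"") + "\"";
    }

    // Join the quoted fields with commas (no trailing comma) and end the line.
    static String formatRow(List<String> fields) {
        return fields.stream().map(CsvRowWriter::quote).collect(Collectors.joining(",")) + "\n";
    }

    // Pull the fields in column order from the stand-in variables.
    static String formatRecord(Map<String, String> vars) {
        List<String> fields = new ArrayList<>();
        for (String column : COLUMNS) {
            fields.add(vars.get(column));
        }
        return formatRow(fields);
    }

    // Append one record, writing the header only when the file does not exist yet.
    static void append(File file, Map<String, String> vars) throws IOException {
        boolean isNew = !file.exists();
        try (Writer out = new FileWriter(file, true)) { // closed even if a write fails
            if (isNew) {
                out.write(formatRow(COLUMNS));
            }
            out.write(formatRecord(vars));
        }
    }
}
```

Inside an actual screen-scraper script you would more likely just apply the same quoting to each `session.getVariable(...)` value before writing it.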
Log:
LATITUDE=28.133923
LONGITUDE=-82.75858
PROPERTY_ID=1040860447
Storing this value in a session variable.
FULL_ADDRESS=1250-S-Pinellas-Ave-901-Tarpon-Springs-FL-34689
Storing this value in a session variable.
Search Results: Processing scripts after a pattern application.
Processing script: "Scrape Details Page"
Scraping file: "Details Page"
Details Page: Preliminary URL: http://www.website.com/property/~#PROPERTY_ID#~-~#FULL_ADDRESS#~
Details Page: Using strict mode.
Details Page: Resolved URL: http://www.website.com/property/1250-S-Pinellas-Ave-901-Tarpon-Springs-F...
Details Page: Sending request.
Details Page: Processing scripts before all pattern applications.
Details Page: Extracting data for pattern "ADDRESS"
Details Page: The following data elements were found:
ADDRESS--DataRecord 0:
DATARECORD=1250 S Pinellas Avenue #901, Tarpon Springs FL 34689'
ADDRESS=1250 S Pinellas Avenue #901
Storing this value in a session variable.
Details Page: Processing scripts after a pattern application.
Processing script: "Write data to a file"
Writing data to a file.
UNFIXED: PRICE = null
UNFIXED: GLA = null
Details Page: Processing scripts after all pattern applications.
Details Page: Processing scripts before all pattern applications.
Details Page: Extracting data for pattern "NEIGHBORHOOD_CITY_COUNTY_STATE_ZIP"
Details Page: The following data elements were found:
NEIGHBORHOOD_CITY_COUNTY_STATE_ZIP--DataRecord 0:
DATARECORD={"city":["Tarpon Springs"],"state":["FL"],"state_name":["Florida"],"zip":["34689"],"county":["Pinellas"],"dma":["54"],"neighborhood":[null],"p_status":"For Sale","p_beds":"2","p_baths":"2.0","p_type":"Condo","page":"listing-detail"}
CITY=Tarpon Springs
Storing this value in a session variable.
COUNTY=Pinellas
Storing this value in a session variable.
STATE=Florida
Storing this value in a session variable.
ZIP=34689
Storing this value in a session variable.
Details Page: Processing scripts after a pattern application.
Processing script: "Write data to a file"
Writing data to a file.
UNFIXED: PRICE =
FIXED PRICE =
UNFIXED: GLA =
FIXED GLA =
Details Page: Processing scripts after all pattern applications.
Details Page: Processing scripts before all pattern applications.
Details Page: Extracting data for pattern "LIST_TYPE_BUILT_BED_BATH_GLA_LOT"
Details Page: The following data elements were found:
LIST_TYPE_BUILT_BED_BATH_GLA_LOT--DataRecord 0:
DATARECORD="1040860447" /> <input type="hidden" id="flag_site_id" value="98774" /> <input type="hidden" id="property_detail_type_org" value="Condo" /> <input type="hidden" id="property_detail_listing_type_org" value="Resale" /> <input type="hidden" id="property_detail_status_org" value="For Sale" /> <input type="hidden" id="property_detail_price_org" value="58,000" /> <input type="hidden" id="property_detail_beds_org" value="2" /> <input type="hidden" id="property_detail_baths_org" value="2.0" /> <input type="hidden" id="property_detail_sqft_org" value="871" /> <input type="hidden" id="property_detail_lot_size_org" value="n/a" /> <input type="hidden" id="property_detail_year_built_org" value="1974" /> <input type="hidden" id="istp" value="" /> <input type="hidden" id="user_type" value="" />
PRICE=58,000
Storing this value in a session variable.
TYPE=Condo
Storing this value in a session variable.
BUILT=1974
Storing this value in a session variable.
BED=2
Storing this value in a session variable.
BATH=2.0
Storing this value in a session variable.
GLA=871
Storing this value in a session variable.
LOT=n/a
Storing this value in a session variable.
Details Page: Processing scripts after a pattern application.
Processing script: "Write data to a file"
Writing data to a file.
UNFIXED: PRICE = 58,000
FIXED PRICE = 58000
UNFIXED: GLA = 871
FIXED GLA = 871
Details Page: Processing scripts after all pattern applications.
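The last part of the log shows the cleanup loop at work: PRICE goes from 58,000 to 58000. The sketch below reproduces that step in plain Java with the same `replaceAll("\\D", "")` call. It also shows why the regex is only safe for whole-number fields such as PRICE and GLA: it deletes the decimal point too, so a value like BATH = 2.0 would become 20. The second method is a hypothetical variant, not part of the original script.

```java
class NumberCleanup {
    // Same regular expression as the script: delete every character that is not a digit.
    static String stripNonDigits(String value) {
        return (value == null) ? null : value.replaceAll("\\D", "");
    }

    // Hypothetical variant that also keeps '.', for decimal fields such as BATH.
    static String stripKeepingDecimalPoint(String value) {
        return (value == null) ? null : value.replaceAll("[^0-9.]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonDigits("58,000"));        // 58000
        System.out.println(stripNonDigits("2.0"));           // 20 (decimal point lost)
        System.out.println(stripKeepingDecimalPoint("2.0")); // 2.0
    }
}
```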
It looks like you're calling the "Write data to a file" script after each extractor pattern, that is, every time you get a piece of data.
If you save all of your information to session variables and then call the script just once at the end (perhaps after the scrapeable file is scraped), each record should come out on a single row.
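To make the "store first, write once" idea concrete, here is a minimal sketch in plain Java. It only simulates the flow: the `Map` inside the hypothetical `RecordCollector` class stands in for screen-scraper session variables, `store` plays the role of an extractor pattern saving its matches, and `flush` plays the role of a single run of the write script. The sample values come from the log, with PRICE assumed to be already cleaned to 58000.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one scrapeable file's session variables.
class RecordCollector {
    static final List<String> COLUMNS = Arrays.asList(
            "ADDRESS", "CITY", "NEIGHBORHOOD", "COUNTY", "STATE", "ZIP",
            "PRICE", "TYPE", "BUILT", "BED", "BATH", "GLA", "LOT");

    private final Map<String, String> vars = new HashMap<>();

    // Run after each extractor pattern: only stores the matched values, never writes.
    void store(Map<String, String> matches) {
        vars.putAll(matches);
    }

    // Run once, after the last pattern: builds the single CSV line for the record,
    // then clears the values (like the script's setVariable(..., "") lines).
    String flush() {
        List<String> fields = new ArrayList<>();
        for (String column : COLUMNS) {
            fields.add(vars.getOrDefault(column, ""));
        }
        vars.clear();
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        RecordCollector record = new RecordCollector();
        record.store(Map.of("ADDRESS", "1250 S Pinellas Avenue #901"));
        record.store(Map.of("CITY", "Tarpon Springs", "COUNTY", "Pinellas",
                "STATE", "Florida", "ZIP", "34689"));
        record.store(Map.of("PRICE", "58000", "TYPE", "Condo", "BUILT", "1974",
                "BED", "2", "BATH", "2.0", "GLA", "871", "LOT", "n/a"));
        // Prints one line:
        // 1250 S Pinellas Avenue #901,Tarpon Springs,,Pinellas,Florida,34689,58000,Condo,1974,2,2.0,871,n/a
        System.out.println(record.flush());
    }
}
```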
The underlying problem is that you're telling it to write before you've gathered everything that you want. For instance (don't do this, I'm just giving you a hypothetical situation), if you were to remove the last few lines of your "write data" script, the ones that set each variable to an empty string (""), you would see that your rows in the CSV would be more like the following:
ADDRESS,CITY,NEIGHBORHOOD,COUNTY,STATE,ZIP,PRICE,TYPE,BUILT,BED,BATH,GLA,LOT
1250 S...
1250 S...,Tarpon Springs,,Pinellas,Florida,34689
1250 S...,Tarpon Springs,,Pinellas,Florida,34689,58000,Condo,1974,2,2.0,871,n/a
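The effect described here can be reproduced with a small simulation in plain Java. It is a sketch under assumptions, not screen-scraper code: a `Map` stands in for session variables, a `List` stands in for the lines appended to Trulia.csv, and unset variables are written as empty fields for simplicity. The hypothetical `simulate` method runs the "write" step after every pattern, as in the original setup; `clearAfterWrite` toggles the reset lines at the end of the script.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WritePerPattern {
    static final List<String> COLUMNS = Arrays.asList(
            "ADDRESS", "CITY", "NEIGHBORHOOD", "COUNTY", "STATE", "ZIP",
            "PRICE", "TYPE", "BUILT", "BED", "BATH", "GLA", "LOT");

    // Simulates the original setup, where the write script runs after EVERY pattern.
    // clearAfterWrite = true mirrors the real script (values reset after each write);
    // false mirrors the hypothetical with those reset lines removed.
    // Unset variables become empty fields here (a simplification).
    static List<String> simulate(List<Map<String, String>> patterns, boolean clearAfterWrite) {
        Map<String, String> session = new HashMap<>(); // stand-in for session variables
        List<String> rows = new ArrayList<>();         // stand-in for lines in Trulia.csv
        for (Map<String, String> matches : patterns) {
            session.putAll(matches);                   // the pattern stores its values
            List<String> fields = new ArrayList<>();
            for (String column : COLUMNS) {
                fields.add(session.getOrDefault(column, ""));
            }
            rows.add(String.join(",", fields));        // the write script runs right away
            if (clearAfterWrite) {
                session.clear();
            }
        }
        return rows;
    }
}
```

Fed the three patterns from the log, `simulate(patterns, true)` yields three partial rows (address only, then city through zip, then price through lot), which matches the symptom in the question, while `simulate(patterns, false)` yields the cumulative rows in the table above.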
But anyway, just call the "write data" script once, at the end of the scrapeable file, after you've finished getting everything that you need. The script writes out all of the values every time it runs, but the first two times you call it, you haven't finished collecting them yet.
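If you would rather keep the script attached to every pattern, a different technique (not the one suggested here) is to have the script skip writing until the record looks complete. The plain-Java sketch below shows such a check; the `REQUIRED` list and the class name are hypothetical choices, and the `Map` again stands in for session variables. In the real script, the equivalent would be an `if` around the writing and clearing sections that tests the same variables with `session.getVariable`. One caveat: a listing that genuinely lacks one of the required fields would never be written, so only require fields every record is certain to have.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class CompleteRecordGuard {
    // Hypothetical choice: ADDRESS (first pattern) plus fields set only by the last pattern.
    static final List<String> REQUIRED = Arrays.asList("ADDRESS", "PRICE", "TYPE");

    // True only when every required variable has a non-empty value.
    static boolean isComplete(Map<String, String> vars) {
        for (String name : REQUIRED) {
            String value = vars.get(name);
            if (value == null || value.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```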
Let me know if that works :)
Tim
I removed the "Write data to a file" script from the first and second extractor patterns, kept it on the third extractor pattern, and it worked. Thanks!!!