Text File Column Headers

Sorry if this has already been discussed....but I couldn't find it anywhere.

Is there a way to get the tokens used in the "write info to file" script to act as headers for the .txt file?

For example....in the script used to write a scrape to a .txt file I have the following

// Write out the data to the file.
out.write( dataRecord.get( "INFO_COL1" ) + "\t" );
out.write( dataRecord.get( "INFO_COL2" ) + "\t" );

Is there a way to get the token "INFO_COL1" and "INFO_COL2" to act as the first info saved in the .txt file....so when ported into a database they act as the column headers for the database file?

Also...if I am writing the scraped data to the file "after each pattern application" ....(as a "when to run" script under my main extractor pattern)...will the headers still stay in place across a large number of scraped pages?

Thanks in advance for your help.

Text File Column Headers

Hi Scott,

It may be that the answer to your question is this:


http://www.mysite/similar_folder/unique_page~#PAGE#~.html

That is, you can embed session variables anywhere in the URL, which, depending on what's stored in the session variables, could yield URLs like these:


http://www.mysite/similar_folder/unique_pageA.html
http://www.mysite/similar_folder/unique_pageBA.html

That may only be the first part of your question. The second part may deal with getting the "words" that you'd like to embed in the URL. The easiest way would be to extract them from a page. For example, you may have all of those URLs embedded in a web page. You might then apply an extractor pattern that grabs the unique portion of each URL (e.g., "A" or "BA"), and saves that unique portion in a session variable (e.g., "PAGE"). You could then embed them in your URL like so:


http://www.mysite/similar_folder/unique_page~#PAGE#~.html

If you can't extract the unique portions of the URL from the page, you could also read them in from a file or URL. We provide an example of doing this here.
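As a rough illustration of the file-based approach, here is a minimal sketch in plain Java. It is not screen-scraper's own API: the class name `PageUrlBuilder`, the method `buildUrls`, and the input file (one page name per line) are all assumptions made for the example. It simply substitutes each name for the `~#PAGE#~` token in the URL template.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PageUrlBuilder {
    // Same token syntax as the URL template shown above.
    static final String TOKEN = "~#PAGE#~";

    // Replace the token with each non-blank page name read from the file.
    // The file layout (one name per line) is an assumption for this sketch.
    static List<String> buildUrls(String template, Path namesFile) throws IOException {
        List<String> urls = new ArrayList<>();
        for (String line : Files.readAllLines(namesFile)) {
            String name = line.trim();
            if (!name.isEmpty()) {
                urls.add(template.replace(TOKEN, name));
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input file holding the unique page names.
        Path namesFile = Files.createTempFile("page_names", ".txt");
        Files.write(namesFile, List.of("A", "BA"));

        String template = "http://www.mysite/similar_folder/unique_page~#PAGE#~.html";
        for (String url : buildUrls(template, namesFile)) {
            System.out.println(url);
        }
    }
}
```

Inside a scraping session you would store each name in the "PAGE" session variable rather than building the string yourself; the sketch only shows what the substitution amounts to.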

Hopefully that answers your questions. Feel free to post back, if not.

Best wishes,

Todd

Text File Column Headers

Todd,
Wow....missed that extra parentheses....silly mistake on my part (Dohh).. Thanks for the patience in dealing with a newbie like me.

I do have another question. Similar and along the lines of the post that was made here
http://www.screen-scraper.com/forum/phpBB2/viewtopic.php?t=68

I have a pure HTML site that has a hundred or so files that I need to scrape. Each URL is contained in a similar folder.....but has a unique page extension (ex: www.mysite.com/similar_folder/unique_page.html ) - somehow I need to access all the pages within the folder each containing a different and unique extension

I understand how you embed a session variable within the scraping session script to alter the requested URL when the different pages are based on numbers (Call RunnableScrapingSession.SetVariable( "PAGE", "1" )) BUT how can I do the same thing when each HTML page is a different word?

www.mysite/similar_folder/unique_pageA.html
www.mysite/similar_folder/unique_pageBA.html

I can manually collect all the unique page names in a file if necessary....or can one program SS to search for every page within a folder?

Anyhow....hopefully this makes sense. I really do appreciate your time in helping me with an answer. THANKS

Scott

Text File Column Headers

Hi Scott,

You've got some extra parentheses. Try changing your script to this:

FileWriter out = null;

try
{
session.log( "Write SSS Headers to file" );

// Open up the file, overwriting any existing contents (append flag is false).
out = new FileWriter( "SSS.txt", false );

// Write out the data to the file.
out.write( "4pc_Place" + "\t" );
out.write( "5pc_Place" + "\t" );
out.write( "4pc_Dinner" + "\t" );
out.write( "5pc_Dinner" + "\t" );
out.write( "46pc_Place" + "\t" );
out.write( "46pc_Dinner" + "\t" );
out.write( "66pc_Place" + "\t" );
out.write( "66pc_Dinner" );
out.write( "\n" );

// Close up the file.
out.close();
}
catch( Exception e )
{
session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}

Best,

Todd

Text File Column Headers

Thanks Todd - you are right on....this is exactly what I am trying to do....

However - I created a script using the "writing headers to a file" code given below....then added the script to the Extractor Pattern, sequenced it first ("1"), and asked that it be invoked "before pattern is applied"....and I get the following error message.

Product Page: Processing scripts before all pattern applications.
Processing script: "Write SSS Headers to file"
Product Page: An error occurred while processing the script: Write SSS Headers to file
Product Page: The error message was: BeanShell script error: In file:

try
{
session.log( "Write SSS Headers to file" );

// Open up the file to be appended to.
out = new FileWriter( "SSS.txt", false );

// Write out the data to the file.
out.write( "4pc_Place" ) + "\t" );
out.write( "5pc_Place" ) + "\t" );
out.write( "4pc_Dinner" ) + "\t" );
out.write( "5pc_Dinner" ) + "\t" );
out.write( "46pc_Place" ) + "\t" );
out.write( "46pc_Dinner" ) + "\t" );
out.write( "66pc_Place" ) + "\t" );
out.write( "66pc_Dinner" ) );
out.write( "\n" );

// Close up the file.
out.close();
}
catch( Exception e )
{
session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}; > Encountered "+" at line 11, column 28.

BSF info: null at line: 0 column: columnNo

Am I invoking the code correctly? I get the same error message pop-up if I click the run button on the script page.

Thanks again,
Scott

Text File Column Headers

Hi,

I believe I understand what you're asking, but feel free to correct me, if not. The key here is the constructor for the FileWriter object. To start a fresh file (overwriting an existing file) you would create the object like so:

out = new FileWriter( "dvds.txt", false );

To append to an existing file you would instead do the following:

out = new FileWriter( "dvds.txt", true );
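To make the difference concrete, here is a small standalone Java sketch of the two modes. The class name `AppendFlagDemo`, the helper `write`, and the temporary file are assumptions for illustration only; the one real piece is the `FileWriter(File, boolean)` constructor from `java.io`.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppendFlagDemo {
    // append=false truncates the file before writing; append=true adds to the end.
    static void write(Path file, String text, boolean append) throws IOException {
        try (FileWriter out = new FileWriter(file.toFile(), append)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch file standing in for dvds.txt.
        Path file = Files.createTempFile("dvds", ".txt");

        write(file, "old contents\n", false);
        write(file, "TITLE\tPRICE\n", false);     // overwrite: "old contents" is discarded
        write(file, "Some Title\t9.99\n", true);  // append: the header line is kept

        System.out.print(Files.readString(file));
    }
}
```

After the three calls the file holds only the header line followed by the one data row, which is the pattern the scripts in this post rely on.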

Oftentimes we'll write out column headers by running a script before the scraping session begins, which might look something like this:

FileWriter out = null;

try
{
session.log( "Writing headers to a file." );

// Open up the file to be created.
out = new FileWriter( "dvds.txt", false );

// Write out the data to the file.
out.write( "TITLE" + "\t" );
out.write( "PRICE" + "\t" );
out.write( "MODEL" + "\t" );
out.write( "SHIPPING_WEIGHT" + "\t" );
out.write( "MANUFACTURED_BY" );
out.write( "\n" );

// Close up the file.
out.close();
}
catch( Exception e )
{
session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}

If you invoke that script before the scraping session begins it will start out your file with only the column headers. After that, you'll likely call a script more like this one:

FileWriter out = null;

try
{
session.log( "Writing data to a file." );

// Open up the file to be appended to.
out = new FileWriter( "dvds.txt", true );

// Write out the data to the file.
out.write( dataRecord.get( "TITLE" ) + "\t" );
out.write( dataRecord.get( "PRICE" ) + "\t" );
out.write( dataRecord.get( "MODEL" ) + "\t" );
out.write( dataRecord.get( "SHIPPING_WEIGHT" ) + "\t" );
out.write( dataRecord.get( "MANUFACTURED_BY" ) );
out.write( "\n" );

// Close up the file.
out.close();
}
catch( Exception e )
{
session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}

This will simply append data to the file as it gets scraped. The headers will remain intact throughout the process.
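If you'd rather not type the token names twice, one option is to keep them in a single list and use it for both the header line and the data rows. The following is only a sketch in plain Java under a few assumptions: the class `TabFileWriter` and its methods are made up for the example, and a `Map<String, String>` stands in for the `dataRecord` object that a script would normally receive.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TabFileWriter {
    // Single source of truth for the column names and their order.
    static final String[] COLUMNS = {
        "TITLE", "PRICE", "MODEL", "SHIPPING_WEIGHT", "MANUFACTURED_BY"
    };

    // Start a fresh file that contains only the tab-separated header line.
    static void writeHeader(Path file) throws IOException {
        try (FileWriter out = new FileWriter(file.toFile(), false)) {
            out.write(String.join("\t", COLUMNS) + "\n");
        }
    }

    // Append one row, reading each value by the same names used in the header.
    // Missing values are written as empty strings so the columns stay aligned.
    static void appendRow(Path file, Map<String, String> record) throws IOException {
        List<String> values = new ArrayList<>();
        for (String column : COLUMNS) {
            values.add(record.getOrDefault(column, ""));
        }
        try (FileWriter out = new FileWriter(file.toFile(), true)) {
            out.write(String.join("\t", values) + "\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("dvds", ".txt"); // hypothetical output file
        writeHeader(file);
        appendRow(file, Map.of("TITLE", "Example DVD", "PRICE", "9.99"));
        System.out.print(Files.readString(file));
    }
}
```

Because the header is written once with the overwrite flag and every row is written with the append flag, the header stays as the first line no matter how many pages are scraped.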

Just let me know if I can clarify any of that.

Kind regards,

Todd Wilson