writing directly to database

Hello,

I am a newbie in the forum here but would like to say that I absolutely love your product.

/// Question ///

I would like to write my scraping sessions directly into a SQL database, is this possible? Something to the effect of using ADO with VBScript. If this is not feasible, what about writing directly to Access locally? Ultimately I want to cron-job scheduled daily scraping sessions then write/update database.

thanks~

writing directly to database

Thank you for such an informative post DS, that really helped a lot! :D

christiandude~

writing directly to database

Here's the Timer event:

private void PollScraperTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
Screenscraper.DataSet dataSet;
foreach (string sessionName in _sessionNames)
{
try
{
dataSet = GetScreenScrape( sessionName );
ParseScreenScrape( dataSet, sessionName );
}
catch(Exception ex)
{
EventLog.WriteEntry(ex.Source,
ex.Message + "\n" +
ex.StackTrace + "\n" +
sessionName,
EventLogEntryType.Error);
}
}
}

_sessionNames is just a list of the Session Names I have in screen-scraper. I store them in a config file so I can add more in later.

This is the function that gets the dataset from the scraper session
private Screenscraper.DataSet GetScreenScrape( string sessionName )
{
RemoteScrapingSession remoteSession = new RemoteScrapingSession( sessionName );

remoteSession.Scrape();
Thread.Sleep( 5000 ); //Give the scraper time to get the results
Screenscraper.DataSet dataSet =
( Screenscraper.DataSet ) remoteSession.GetVariable( sessionName );

remoteSession.Disconnect();

voteInsert.Parameters["@dateTimeStamp"].Value = DateTime.Now;

return dataSet;
}

Obviously I don't like the thread.sleep call, but I haven't put enough thought into how the thing should interact, and performance isn't much of an issue for me.

Parsing the dataset is as follows. voteInsert is a stored procedure added into the [Design] session by just dragging it off the Server Explorer. This function chunters through the screen-scraper dataset populating the stored proc parameters with the values. I always name my screen-scraper tokens the same, and I try to keep the cleansing on the screen-scraper side to make everything as data-driven as possible.

private void ParseScreenScrape( Screenscraper.DataSet dataSet, string sessionName )
{
for( IEnumerator i = dataSet.AllDataRecords.GetEnumerator(); i.MoveNext(); )
{
DataRecord tmpDataRecord = ( DataRecord )i.Current;
// Enumerate through the data record, outputting each of the fields.
for( IDictionaryEnumerator j = tmpDataRecord.GetEnumerator(); j.MoveNext(); )
{
switch (j.Key.ToString())
{
case "Name":
voteInsert.Parameters["@candidateName"].Value = j.Value.ToString();
break;
case "Poll":
voteInsert.Parameters["@pollName"].Value = sessionName;
break;
case "Votes":
voteInsert.Parameters["@votes"].Value = j.Value;
break;
default:
throw new Exception
("Screen-Scraper returned invalid field '" + j.Key + "'");
}
}
WritePollToDB();
}
}

We then write the data to the database. The stuff at the top handles exceptions in the data, so if your scraping session is good, you shouldn't need it. This should be filtered in the scraper, but I've got a few questions outstanding myself on how to do this effectively.

private void WritePollToDB()
{
try
{
connectionRoosevelt.Open();

//exceptions
bool[,] exceptions =
{
{
voteInsert.Parameters["@candidateName"].Value.ToString().Equals("Candidate 1"),
voteInsert.Parameters["@votes"].Value.ToString().Equals("384")
},
{
voteInsert.Parameters["@candidateName"].Value.ToString().Equals("Candidate 2"),
voteInsert.Parameters["@votes"].Value.ToString().Equals("464")
}
};

//TODO: Remove this hard coding by modifying the Scraping Sessions
if ( !(exceptions[0,0] & exceptions[0,1])
&
!(exceptions[1,0] & exceptions[1,1])
)
{
voteInsert.ExecuteNonQuery();

if( (int) voteInsert.Parameters["@RETURN_VALUE"].Value < 0 )
{
EventLog.WriteEntry(this.ToString(),
"Stored Procedure voteCumulativeInsert returned " +
voteInsert.Parameters["@RETURN_VALUE"].Value.ToString() +
" for " + voteInsert.Parameters["@candidateName"].Value.ToString() +
" in " + voteInsert.Parameters["@pollName"].Value.ToString(),
EventLogEntryType.Information);
}
}
}
catch( Exception ex )
{
EventLog.WriteEntry(ex.Source,
ex.Message + "\n" + ex.StackTrace + "\n" +
"\t" + voteInsert.Parameters["@pollName"].Value.ToString() +
"\t" + voteInsert.Parameters["@candidateName"].Value.ToString() +
"\t" + voteInsert.Parameters["@votes"].Value.ToString(),
EventLogEntryType.Error);
}
finally {
connectionRoosevelt.Close();
}
}

writing directly to database

Would you be willing to share how you did this? I love this screen scraping tool but need some direction in this area. You can reach me via PM. Thank you so much!

christiandude~