Input Filtering

I'm parsing scraped data to insert into a mysql database, I'm using a java.sql.PreparedStatement. All the incoming scraped data are strings or null and a prepared statement requires typed data based on the column types, so I can't directly insert strings as numeric or date types.

My java experience/knowledge is amateur at best, so there is bound to be a better way, please with a cherry on top show me a better way.

I've been doing it like the following so far, the script is executed after each pattern match. In my actual script there are about 50 scrape variables so its much more cumbersome. This script works fine if all the patterns matched something, but if not, my conversions raise errors since I'm trying to convert nulls.

// <snip> imports, connection manager setup </snip>

PreparedStatement prepStmt = null;
DataRecord userStatData = null;
DataSet scrapeDataSet = null;


String sqlAdd = "INSERT INTO `db`.`stat_history` "
+ "(`user_id`,`string_name`,`large_number`,`float_number`) VALUES (?,?,?,?)";

try {
 prepStmt = conn.prepareStatement(sqlAdd);

 // main pattern
  scrapeDataSet = session.getv("USERSTATS");

  // lastest datarecord in dataset
  userStatData = scrapeDataSet.getDataRecord(
    scrapeDataSet.getNumDataRecords() - 1 );

  // regex to strip commas in large numbers (ex. 123,456,789)
  String regex = "(?<=\\d),(?=\\d)";

  // user_id bigint()
  prepStmt.setLong(1,
    Long.parseLong( userStatData.get( "USERID" ) ) );
  // string_name varchar(150)
  prepStmt.setString(2, userStatData.get( "USERNAME" ) );
  // large_number bigint()
  prepStmt.setLong(3,
    Long.parseLong(
      userStatData.get( "LARGENUMBER" ).replaceAll(regex, "") )
      );
  // float_number decimal(10,4)
  prepStmt.setFloat(4,
    Float.valueOf( userStatData.get( "FLOATNUMBER" ) ).floatValue() );

  prepStmt.executeUpdate();

} catch (SQLException e) {
  session.logError("SQLException: " + e.getMessage() );
  session.logError("SQLState: " + e.getSQLState() );
  session.logError("VendorError: " + e.getErrorCode() );
} finally {
  if (prepStmt != null){
    prepStmt.close();
  }
}

Yeah, so the previous works when my tokens/patterns return data but when they return nulls it fails, am I to write individual checks/conversions on all of my data?
Long largeNumToPassDB = ( userStatData.get("LARGENUMBER") == null)
        ? null
        : userStatData.get( "LARGENUMBER" ).replaceAll(regex, "");

prepStmt.setLong(3, largeNumToPassDB);

That just doesn't seem right, or efficient and its ugly. I think what I'm looking for some kind of input filtering or sanitization for my tokens / variables that'll return a type of my want.
myLongFilter = new LongFilter();
myLongFilter.setAllowNull();
myLongFilter.replaceAll(matchRegex, replaceValue);

prepStmt.setLong(3, myLongFilter.parse( userStatData.get("LARGENUMBER") ) );

I could use the filter over and over for different data. I searched the javadoc's for some kind of 'filter' solution, but nothing comes up, does anyone know of an existing solution? Maybe I'm off on calling it a 'filter' and that's why I can't find anything like it. A set of filters for java primitives, date, time, timestamp with mostly sql types in mind.

Thanks for any advice in advance.

There are a couple of ways to

There are a couple of ways to do this.

  • Token mappings are available in enterprise edition, and the documentation is here.
  • If you look in our script repository you will find a small collection of scripts which each contain a function to help with these sorts of things. They are all labeled with "prepare for output", and you can put many functions in one script.