SqlDuplicateFilter
Overview
SqlDuplicateFilters are designed to filter duplicates when more information than just a primary key might define a duplicate entry. For example, you might define a unique person by their SSN, driver's license number, or by their first name, last name, and phone number. It is also possible that a single person may have multiple phone numbers, and if any of them match then the duplicate constraint should be met. Using an SqlDuplicateFilter can check for conditions such as this and correctly recognize duplicate entries.
This feature is only available for Professional and Enterprise editions of screen-scraper.
Examples
Register a new duplicate filter
import com.screenscraper.datamanager.sql.*;
//Get the data manager
SqlDataManager dm = session.getVariable( "_DATAMANAGER" );
// Register a new duplicate filter
// Check for duplicate people, so register it for the people table
SqlDuplicateFilter nameFilter = SqlDuplicateFilter.register("people", dm);
//Add constraints to match when a first name, middle initial, and last name match a different row in the database
nameFilter.addConstraint( "people", "first_name" );
nameFilter.addConstraint( "people", "middle_initial" );
nameFilter.addConstraint( "people", "last_name" );
Match Duplicates across tables
Sometimes the data will need to be filtered across multiple tables, or possibly different constaints might indicate a duplicate. An example of this is a person might be a duplicate if their SSN matches OR if their driver's license number matches. Alternatively, they may be a duplicate when they have the same first name, last name, and phone number.
/*
Perform the setup of the SqlDataManager, as shown previously, and name the variable dm.
*/
//register an SqlDuplicateFilter with the DataManager for the social security number
SqlDuplicateFilter ssnDuplicate = SqlDuplicateFilter.register( "person", dm );
ssnDuplicate.addConstraint( "person", "ssn" );
//register an SqlDuplicateFilter with the DataManager for the drivers license
SqlDuplicateFilter licenseDuplicate = SqlDuplicateFilter.register( "person", dm );
licenseDuplicate.addConstraint( "person", "drivers_license" );
//register an SqlDuplicateFilter with the DataManager for the name/phone number
//where the person table has a child table named phone.
SqlDuplicateFilter namePhoneDuplicate = SqlDuplicateFilter.register( "person", dm );
namePhoneDuplicate.addConstraint( "person", "first_name" );
namePhoneDuplicate.addConstraint( "person", "last_name" );
namePhoneDuplicate.addConstraint( "phone", "phone_number" );
Duplicate filters are checked in the order they are added, so consider perfomance when creating duplicate filters. If, for instance, most duplicates will match on the social security number, create that filter before the others. Also make sure to add indexes into your database on those columns that you are selecting by or else performance will rapidly degrade as your database gets large.
Duplicates will be filtered by any one of the filters created. If multiple fields must all match for an entry to be a duplicate, create a single filter and add each of those fields as constraints, as shown in the third filter created above. In other words, constraints added to a single filter will be ANDed together, while seperate filters will be ORed.
- Printer-friendly version
- Login or register to post comments