This object simplifies your interactions with a JDBC-compliant SQL database. It can work with various types of databases and even in a multi-threaded format to allow scrapes to continue without having to wait for the queries to process. View an example of how to use the SqlDataManager.
This feature is only available for Professional and Enterprise editions of screen-scraper.
Prefer a more traditional approach? See an example of Working with MySQL databases.
In order to use the SqlDataManager you will need to make sure to install the appropriate JDBC driver. This can be done by downloading the driver and placing it in the lib/ext folder in the screen-scraper installation directory.
Add an event callback to SqlDataManager object.
This feature is only available for Professional and Enterprise editions of screen-scraper.
Before adding an event to the SqlDataManager, you must build the schema of any tables you will use because events are related to table operations such as inserting data
public void handleEvent(DataManagerEvent event)
that needs to be implemented. The DataManagerEvent has a method getDataNode() to retrieve the relevant DataNode.Returns a DataManagerEventListener. The same DataManagerEventListener object that was passed in
Version | Description |
---|---|
5.5 | Available for professional and enterprise editions. |
Add data to fields, in preparation for insertion into a database.
When adding data in a many-to-many relation, if setAutoManyToMany is set to false, a null row should be inserted into the relating table so the datamanager will link the keys correctly between related tables. For example, dm.addData("many_to_many", null);
Before adding data the first time, you must build the schema of any tables you will use, as well as add foreign keys if you are not using a database engine that natively supports them (such as InnoDB for MySQL).
The SqlDataManager will attempt to convert a value that is given to the correct format for the database. For example, if the database requires an int for a column named age, dm.addData("table", "age", "32") will convert the String "32" to an int before adding it to the database. See the table below the examples for other types of java objects and how they map to SQL types.
The table and columnName parameters are not case sensitive. The same is true for the key values in the data map.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Since the DataManager is designed with screen-scraper in mind all inputs support using the String type in addition to their corresponding Java object type, but the String needs to be parseable into the corresponding data type. For example if there is a column that is defined as an Integer in the database then the String needs to be parseable by Integer.parseInt(String value). Here is a mapping of the sql types (based on java.sql.Types) to Java objects:
SQL Type | Java Object | |
---|---|---|
java.sql.Types.CHAR | String | |
java.sql.Types.VARCHAR | String | |
java.sql.Types.LONGVARCHAR | String | |
java.sql.Types.LONGNVARCHAR | String | |
java.sql.Types.NUMERIC | BigDecimal | |
java.sql.Types.DECIMAL | BigDecimal | |
java.sql.Types.TINYINT | Integer | |
java.sql.Types.SMALLINT | Integer | |
java.sql.Types.INTEGER | Integer | |
java.sql.Types.BIGINT | Long | |
java.sql.Types.REAL | Float | |
java.sql.Types.FLOAT | Double | |
java.sql.Types.DOUBLE | Double | |
java.sql.Types.BIT | Boolean | |
java.sql.Types.BINARY | ByteArray | |
java.sql.Types.VARBINARY | ByteArray | |
java.sql.Types.LONGVARBINARY | ByteArray | |
java.sql.Types.DATE | SQLDate or Long | |
java.sql.Types.TIME | SQLTime or Long | |
java.sql.Types.TIMESTAMP | SQLTime or Long | |
java.sql.Types.ARRAY | Object | |
java.sql.Types.BLOB | ByteArray | |
java.sql.Types.CLOB | Object | |
java.sql.Types.JAVA_OBJECT | Object | |
java.sql.Types.OTHER | Object |
Manually setup table connection (key matching).
If SqlDataManager.buildSchemas is called, any foreign keys manually added before that point will be overridden or erased.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
If the database has some indication of foreign keys then these will be followed automatically. If the database does not allow for foreign key references then you will need to build the table connections using this method.
Manually add session variable data to fields, in preparation for insertion into a database.
The keys from the session will be matched in a case insensitive way to the column names of the database.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Add corresponding session variables to the tables automatically when it is committed.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Collect the database schema information, including foreign key relations between tables.
Schemas must be built for any tables that will be used by this DataManager before data can be added.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Clear all data from the data manager without writing it to the database. This includes all data previously committed but not yet written.
This method does not receive any parameters.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Clear session variables corresponding to the fields of a specific table (case insensitive).
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Clear session variables corresponding to a committed table automatically.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Close data manager's connections.
If there is data that has not yet been written to the database when this method is called it will not be written.
This method does not receive any parameters.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Commit a prepared row of data into queue. Once called the data can no longer be edited. When working with multiple tables that relate by a foreign key, it is important to commit rows in the correct order. The rows in each of the child tables should be committed before the parent, or they will not be correctly linked when written to the database.
This does not write the row of data to the database, but rather puts it in queue to be written at a later time.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Commit prepared rows of data for all tables into queue. Once called the data can no longer be edited.
This method does not receive any parameters.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Write committed data to the database. Any data that has not been committed using either the commit or commitAll method will be lost and not written to the database.
This method does not receive any parameters.
Returns true data was successfully written to the database; otherwise, it returns false.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Retrieve the connection object of the data manager. This can be helpful if you want to do something that the data manager cannot do easily, such as query the database.
Be sure to close the connection once it is no longer needed. Failure to do so could exhaust the connection pool used by the datamanger, which will cause the scraping session to hang.
This method does not receive and parameters.
Returns a connection object matching the one used in the data manager.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Retrieve the last autogenerated primary key, if any, for the given table
case insensitve table name
Returns a com.screenscraper.datamanager.DataObject containing the primary key.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Sets whether or not the data manager should automatically take care of many-to-many relationships.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
If the many-to-many table has more information than just the keys then you will want to leave this feature turned off so that you can add more data than just the keys before committing.
This feature is only available for Professional and Enterprise editions of screen-scraper.
Set global merge status. When conflicts exist in data, a merge of true will take the newer values and save them over previous null values.
When merging or updating values in a table, that table must have a Primary Key. When the Primary Key is set to autoincrement, if the value of that key was not set with the addData method the DataManager will create a new row rather than update or merge with an existing row. One solution is to use an SqlDuplicateFilter to set fields that would identify an entry as a duplicate and automatically insert the value of the autoincrement key when data is committed.
Update | Merge | Resulting Action |
---|---|---|
false | false | Ignore row on duplicate |
true | false | Update only values whose corresponding columns are currently NOT NULL in the database |
false | true | Update only values whose corresponding columns are currently NULL in the database |
true | true | Update all values to new data |
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
This feature is only available for Professional and Enterprise editions of screen-scraper.
Set update status globally. When conflicts exist in data, an update of true will take the newer values and save them over previous non-null values.
When merging or updating values in a table, that table must have a Primary Key. When the Primary Key is set to autoincrement, if the value of that key was not set with the addData method the DataManager will create a new row rather than update or merge with an existing row. One solution is to use an SqlDuplicateFilter to set fields that would identify an entry as a duplicate and automatically insert the value of the autoincrement key when data is committed.
Update | Merge | Resulting Action |
---|---|---|
false | false | Ignore row on duplicate |
true | false | Update only values whose corresponding columns are currently NOT NULL in the database |
false | true | Update only values whose corresponding columns are currently NULL in the database |
true | true | Update all values to new data |
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Set the error logging level. Currently only DEBUG and ERROR levels are supported. At the DEBUG level, all queries and results will be output to the log.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
This feature is only available for Professional and Enterprise editions of screen-scraper.
Set merge status for a table. When conflicts exists in data, a merge of true will take the newer values and save them over previous null values.
When merging or updating values in a table, that table must have a Primary Key. When the Primary Key is set to autoincrement, if the value of that key was not set with the addData method the DataManager will create a new row rather than update or merge with an existing row. One solution is to use an SqlDuplicateFilter to set fields that would identify an entry as a duplicate and automatically insert the value of the autoincrement key when data is committed.
Update | Merge | Resulting Action |
---|---|---|
false | false | Ignore row on duplicate |
true | false | Update only values whose corresponding columns are currently NOT NULL in the database |
false | true | Update only values whose corresponding columns are currently NULL in the database |
true | true | Update all values to new data |
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Set number of threads that the data manager can have open at once. When set higher than one, the scraping session can continue to run and download pages while the database is being written. This can decrease the time required to run a scrape, but also makes debugging harder as there is no guarantee about the order in which data will be written. It is recommended to leave this setting alone while developing a scrape. Also, the flush method will always return true if more than one thread is being used to write to the database, even if the write failed.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
This feature is only available for Professional and Enterprise editions of screen-scraper.
Set update status for a given table. When conflicts exists in data, an update of true will take the newer values and save them over previous non-null values.
When merging or updating values in a table, that table must have a Primary Key. When the Primary Key is set to autoincrement, if the value of that key was not set with the addData method the DataManager will create a new row rather than update or merge with an existing row. One solution is to use an SqlDuplicateFilter to set fields that would identify an entry as a duplicate and automatically insert the value of the autoincrement key when data is committed.
Update | Merge | Resulting Action |
---|---|---|
false | false | Ignore row on duplicate |
true | false | Update only values whose corresponding columns are currently NOT NULL in the database |
false | true | Update only values whose corresponding columns are currently NULL in the database |
true | true | Update all values to new data |
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Initiate a SqlDataManager object.
Before adding data to the SqlDataManager, you must build the schema of any tables you will use, as well as add foreign keys if you are not using a database engine that natively supports them (such as InnoDB for MySQL).
Returns a SqlDataManager. If an error is experienced it will be thrown.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
com.screenscraper.datamanager.sql.SqlDataManager
SqlDuplicateFilters are designed to filter duplicates when more information than just a primary key might define a duplicate entry. For example, you might define a unique person by their SSN, driver's license number, or by their first name, last name, and phone number. It is also possible that a single person may have multiple phone numbers, and if any of them match then the duplicate constraint should be met. Using an SqlDuplicateFilter can check for conditions such as this and correctly recognize duplicate entries.
This feature is only available for Professional and Enterprise editions of screen-scraper.
Sometimes the data will need to be filtered across multiple tables, or possibly different constaints might indicate a duplicate. An example of this is a person might be a duplicate if their SSN matches OR if their driver's license number matches. Alternatively, they may be a duplicate when they have the same first name, last name, and phone number.
Duplicate filters are checked in the order they are added, so consider perfomance when creating duplicate filters. If, for instance, most duplicates will match on the social security number, create that filter before the others. Also make sure to add indexes into your database on those columns that you are selecting by or else performance will rapidly degrade as your database gets large.
Duplicates will be filtered by any one of the filters created. If multiple fields must all match for an entry to be a duplicate, create a single filter and add each of those fields as constraints, as shown in the third filter created above. In other words, constraints added to a single filter will be ANDed together, while seperate filters will be ORed.
Add a constraint that checks the value of new entries against the value of entries already in the database for a given column and table.
Returns void.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |
Sometimes the data will need to be filtered across multiple tables, or possibly different constaints might indicate a duplicate. An example of this is a person might be a duplicate if their SSN matches OR if their driver's license number matches. Alternatively, they may be a duplicate when they have the same first name, last name, and phone number.
Duplicate filters are checked in the order they are added, so consider perfomance when creating duplicate filters. If, for instance, most duplicates will match on the social security number, create that filter before the others. Also make sure to add indexes into your database on those columns that you are selecting by or else performance will rapidly degrade as your database gets large.
Duplicates will be filtered by any one of the filters created. If multiple fields must all match for an entry to be a duplicate, create a single filter and add each of those fields as constraints, as shown in the third filter created above. In other words, constraints added to a single filter will be ANDed together, while seperate filters will be ORed.
Create an SqlDuplicateFilter for a specific table and register it with the data manager.
Returns an SqlDuplicateFilter that can then be configured for duplicate entries.
Version | Description |
---|---|
5.0 | Available for professional and enterprise editions. |