Interacting with screen-scraper Externally

Overview

This feature is only available to Professional and Enterprise editions of screen-scraper.

screen-scraper was designed from the beginning to interact with external systems. This means that you can invoke it from other applications and it can send data to yet other systems. We've tried to design screen-scraper such that it can interact with code written in virtually any modern programming language or platform. This section of our documentation will familiarize you with how this process occurs, as well as specifics on languages that screen-scraper can work with.

screen-scraper must be running in server mode before you can interact with it using any of these methods.


Manage Server with Scripts

Overview

To invoke screen-scraper from something like a Visual Basic application or a PHP script, screen-scraper needs to be running in server mode. In server mode it acts, in many respects, like a database server. Interacting with screen-scraper via one of the remote scraping sessions is more or less analogous to querying a database via a database driver. Objects like scrapeable files and scripts set up in screen-scraper provide access to information much like tables and columns in a database would. Setting session variables and telling screen-scraper to initiate a named scraping session is analogous to issuing a SQL query to a database.

Log Files

When running as a server, screen-scraper writes any error messages it produces to the error.log file, found in the log folder of screen-scraper's install directory.

Each time you run a scraping session externally, screen-scraper generates a log file for that scraping session in the same log folder. This can be invaluable for debugging, so take a look at it if you run into trouble.

You can turn server logging off by unchecking the Generate log files check box in the Servers section of the settings dialog box. Within a scraping session script, the name of the log file can be found in the session variable SS_LOG_FILE_NAME.
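When debugging a server-mode problem it can help to pull the most recent entries from error.log programmatically. The following is a minimal sketch, not part of screen-scraper itself: the function name tail_error_log and the install_dir argument are hypothetical, and it assumes only that error.log lives in the log folder of the install directory as described above and can be read as text.

```python
from collections import deque
from pathlib import Path


def tail_error_log(install_dir, max_lines=20):
    """Return the last max_lines lines of <install_dir>/log/error.log.

    install_dir is the screen-scraper install directory (hypothetical
    argument; pass whatever path applies on your machine).
    """
    log_path = Path(install_dir) / "log" / "error.log"
    if not log_path.exists():
        # No error log has been written yet.
        return []
    # The file encoding is an assumption; undecodable bytes are replaced.
    with log_path.open("r", encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]
```

Calling tail_error_log with the path to your own installation returns the newest lines as a list of strings, oldest first.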

Connecting to Server

Once the server is running it will be listening for connections. The default port the server will listen on is 8778, which can be changed by altering the Port in the Servers section of the settings dialog box.

While the server is running, scraping sessions and scripts can be imported into screen-scraper without stopping it. See the documentation on importing and exporting objects for more information.

If you have trouble connecting to screen-scraper, check the connection restrictions.
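A quick first check when a client cannot connect is whether anything is accepting TCP connections on the server port at all. The sketch below uses only Python's standard socket module; the function name server_is_listening is hypothetical, and the defaults mirror the default host and port mentioned above. A True result shows only that the port is open, not that the listener is screen-scraper or that its connection restrictions will accept your client.

```python
import socket


def server_is_listening(host="localhost", port=8778, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out,
        # unreachable) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False for the host and port you configured, the server is probably not running or is listening on a different port.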

Invoking screen-scraper from ColdFusion

Overview

To interact with screen-scraper from ColdFusion we will make use of ColdFusion's ability to implement Java classes. The ColdFusion server can execute Java code but needs to be configured first.

screen-scraper needs to be running as a server before invoking it externally.

Configuring ColdFusion for Java

These steps assume that the ColdFusion server is running on the local machine with the default setup. If you have changed these settings, adjust the steps accordingly.

  1. Access the ColdFusion administration interface at http://127.0.0.1:8500/CFIDE/administrator/index.cfm and enter the administrator password.
  2. Under Server Settings on the left, select the Java and JVM link.
  3. Add the screen-scraper.jar to the class path.
    You can find the jar where screen-scraper was installed. If it was installed on the local Windows machine then the classpath might look like this:

  4. Make sure to click the Submit changes button and restart the ColdFusion service.
  5. After the service has been restarted check to make sure that the screen-scraper.jar is now in the server's classpath by selecting the Settings Summary link under the Server Settings section.

Interacting with screen-scraper

Now that the ColdFusion server can load the Java class and interact with the screen-scraper server through it, you need to learn the available methods and objects. Because you are really interacting through Java, read the section on Invoking screen-scraper from Java to learn about the methods and classes.

Example

The following is an example of a .cfm file that creates a RemoteScrapingSession object, calls the scrape method and processes the results. The scraping session to be invoked is called test, and we will be outputting a session variable, TEST, that we have saved in screen-scraper.

<html>
<head>
<title>A Cold Fusion Page</title>
</head>
<body>
<cfobject
  action = "create"
  type = "java"
  class = "com.screenscraper.scraper.RemoteScrapingSession"
  name = "RemoteScrapingSession">
<cfset remoteSession = RemoteScrapingSession.init("test","localhost",8778)>
<cfset remoteSession.scrape()>
<cfset test = remoteSession.getVariable("TEST")>
<cfoutput>
    #test#
</cfoutput>
<cfset remoteSession.disconnect()>
</cfobject>
</body>
</html>

For another example of interacting with screen-scraper via ColdFusion please see Tutorial 4: Scraping a Shopping Site from External Programs.

Invoking screen-scraper from Java

Overview

A Java application or servlet interacts with screen-scraper via the RemoteScrapingSession class (com.screenscraper.scraper.RemoteScrapingSession). You can utilize the class by including the screen-scraper.jar and lib\log4j.jar files in your CLASSPATH.

screen-scraper needs to be running as a server before invoking it from a Java class.

You can also reference your own Java code from within screen-scraper.

Methods

The following is a reference for all of the methods found in the RemoteScrapingSession class.

  • RemoteScrapingSession( String identifier ). Instantiates a RemoteScrapingSession identified by identifier. If this constructor is called the default host (localhost) and port (8778) will be used.

    import com.screenscraper.scraper.*;
    RemoteScrapingSession remoteSession = new RemoteScrapingSession( "Hello World" );
  • RemoteScrapingSession( String identifier, String host, int port ). Instantiates a RemoteScrapingSession identified by identifier, and connecting to the server found at host listening on port.

    import com.screenscraper.scraper.*;
    RemoteScrapingSession remoteSession = new RemoteScrapingSession( "Hello World", "localhost", 8080 );
  • RemoteScrapingSession( String identifier, String host, int port, String characterSet ). Instantiates a RemoteScrapingSession identified by identifier, connecting to the server found at host listening on port, and utilizing the given characterSet.

    import com.screenscraper.scraper.*;
    RemoteScrapingSession remoteSession = new RemoteScrapingSession( "Hello World", "localhost", 8080, "UTF-8" );
  • int getNumDataRecordsInDataSet( String dataSetName ) throws RemoteScrapingSessionException. Gets the number of records found in the DataSet named by dataSetName.

    remoteSession.getNumDataRecordsInDataSet( "PRODUCTS" );

  • disconnect() throws IOException. Disconnects from the screen-scraper server and closes up the network socket.

    remoteSession.disconnect();
  • DataRecord getDataRecordFromDataSet( String dataSetName, int index ) throws RemoteScrapingSessionException. Gets the DataRecord specified by the index found in the DataSet named dataSetName.

    remoteSession.getDataRecordFromDataSet( "PRODUCTS", 2 );

  • getVariable( String varName ) throws RemoteScrapingSessionException. Gets the value of a session variable, varName, that was set during the course of the scraping session.

    Currently only Strings, DataRecords, and DataSets can be accessed by this method.

    remoteSession.getVariable( "FORM_SUBMITTED_TEXT" );

  • loadVariables( String fileToReadFrom ) throws RemoteScrapingSessionException. This method will cause screen-scraper to load variables in from the file fileToReadFrom. More details on this method can be found with the loadVariables method.

    remoteSession.loadVariables( "variables.txt" );

  • scrape() throws RemoteScrapingSessionException. Causes the session to scrape. This is equivalent to clicking the Run Scraping Session button from within screen-scraper on the General tab of the scraping session.

    remoteSession.scrape();

  • boolean sessionTimedOut() throws RemoteScrapingSessionException. For non-lazy scrapes, this method can be called after the scrape method returns to determine whether or not a scraping session timed out. This method may only return true if the setTimeout method was called prior to calling scrape.

    remoteSession.sessionTimedOut();

  • setDoLazyScrape( boolean doLazyScrape ) throws RemoteScrapingSessionException. If set to true, screen-scraper will execute the scraping session in a separate thread, returning execution flow to the calling application immediately after the scrape method is called. This is false by default.

    remoteSession.setDoLazyScrape( true );

  • setOutputLogFiles( boolean outputLogFiles ) throws RemoteScrapingSessionException. Indicates whether or not screen-scraper should output a log file to the log folder when running this scraping session. This is true by default.

    remoteSession.setOutputLogFiles( false );

  • setTimeout( int timeout ) throws RemoteScrapingSessionException. Sets the number of minutes a scraping session is allowed to run before it automatically stops itself.

    remoteSession.setTimeout( 60 );

  • setVariable( String varName, String value ) throws RemoteScrapingSessionException. Sets a session variable, varName, in the session that will be accessible from within a screen-scraper script.

    remoteSession.setVariable( "TEXT_TO_SUBMIT", "Hi everybody!" );

  • stopServer() throws RemoteScrapingSessionException. Tells the server to stop.

    The server cannot be started remotely.

    remoteSession.stopServer();

  • DataRecord getNextCachedDataRecord( String dataSetName ) throws RemoteScrapingSessionException and
    DataSet getNextCachedDataRecord( String dataSetName, int numRecordsToRetrive ) throws RemoteScrapingSessionException. In the case of a data set that's been cached, this allows for individual DataRecord objects to be retrieved in piecemeal fashion. This is desirable in cases where a large amount of data is to be extracted throughout the life of the scraping session, and retaining it all in memory could cause problems. DataSet objects are cached by checking the Cache the data set check box under the Advanced tab for an extractor pattern.

    remoteSession.getNextCachedDataRecord( "PRODUCTS" );

Built-In Objects/Classes

It is also possible to store data sets and data records in session variables, which can then be accessed via the RemoteScrapingSession class. Data set objects are analogous to database result sets and data records are analogous to individual records within a result set. When an extractor pattern is applied a data set is generated. Storing the resulting data set in a session variable (within a screen-scraper script) will allow for it to be accessed via a RemoteScrapingSession.getVariable call. More information on these classes can be found in the DataRecord and DataSet API documentation pages.

Receiving Data in Real Time

This feature is only available to Enterprise editions of screen-scraper.

DataReceiver Interface

The DataReceiver (com.screenscraper.scraper.DataReceiver) interface allows your code to handle extracted data as it is being scraped. That is, you need not wait until the scraping session has finished before getting access to the extracted data. This interface contains a single method:

  • receiveData( String key, Object value ) throws RemoteScrapingSessionException. The key portion is simply a string you'll designate in a screen-scraper script. The value parameter holds the value you pass from screen-scraper to your code.

Real Time RemoteScrapingSession Methods

Once you have implemented the DataReceiver interface on any of your own classes, then pass an instance of the class to the RemoteScrapingSession via the setDataReceiver method. Here are other methods that allow you to control the flow of real time information.

  • setDataReceiver( DataReceiver dataReceiver ) throws RemoteScrapingSessionException

    remoteSession.setDataReceiver( dataReceiver );

  • DataReceiver getDataReceiver() throws RemoteScrapingSessionException. Use this to see if a DataReceiver has already been set.

    remoteSession.getDataReceiver( );

  • setPollFrequency( int pollFrequency ) throws RemoteScrapingSessionException. Sets the frequency in seconds with which screen-scraper should be polled for data to be sent. The default is five seconds.

    remoteSession.setPollFrequency( 2 );

  • int getPollFrequency() throws RemoteScrapingSessionException. Gets the current poll frequency, in seconds.

    remoteSession.getPollFrequency( );

Passing Information in Real Time

On the screen-scraper side, whenever you'd like to send data from screen-scraper back to your code, you simply invoke the session.sendDataToClient method. Data sent through this method will show up through the receiveData method.

Examples

In screen-scraper

As a specific example, let's suppose you've created a scraping session that extracts product records from a shopping web site. As each product record is being scraped, you might simply output them to a CSV file, but you decide instead that you'd like to insert them into your database, and determine that it would be best for you to write your own code to perform the database insertion. In your scraping session, you might have a script that contains the following:

 session.sendDataToClient( "ProductRecord", dataRecord );

You set up this script to be invoked After each pattern match for the extractor pattern that pulls the product information. For example, the extractor pattern might get the price, title, and weight of the product. Because the script is being invoked After each pattern match, the current dataRecord object will hold all of that information. You invoke session.sendDataToClient so that each record can be processed by your code as it gets extracted.

In Java Code

In your Java code you create a class that implements the DataReceiver interface. You create an instance of this class and pass it to your RemoteScrapingSession object so that you can process each of the product records as they get extracted. Your receiveData method implementation might look something like this:

public void receiveData( String key, Object value ) throws RemoteScrapingSessionException
{
    if( key.equals( "ProductRecord" ) && value instanceof DataRecord )
    {
        // Here you would include code that might make
        // use of an existing JDBC connection to insert
        // or update the record in your database.
    }
}

Each time you invoke session.sendDataToClient in screen-scraper, there will be a corresponding method call made to your receiveData method, which will allow you to handle each of the data pieces individually.

For other examples of using the Java driver please see Tutorial 3: Extending Hello World and Tutorial 4: Scraping a Shopping Site from External Programs.

Invoking screen-scraper from PHP

Overview

A PHP script interacts with screen-scraper via a PHP class called RemoteScrapingSession. You can utilize this class by including the file remote_scraping_session.php (found in the misc/php directory of your screen-scraper installation) within your PHP script.

screen-scraper needs to be running as a server before invoking screen-scraper from a PHP script.

Methods

The following is a reference for all of the methods found in the RemoteScrapingSession class.

  • initialize( $name ). Initializes a RemoteScrapingSession identified by name. If this constructor is called the default host (localhost) and port (8778) will be used.

    $session->initialize("Hello World");

  • initialize( $name, $host, $port ). Instantiates a RemoteScrapingSession identified by name, and connecting to the server found at host listening on port.

    $session->initialize("Hello World", "127.0.0.1", "8080");

  • setVariable( $var_name, $value ). Sets a session variable using the given var_name and value.

    $session->setVariable("TEXT_TO_SUBMIT", "Hi everybody!" );

  • scrape(). Causes the session to start. This is equivalent to clicking the Run Scraping Session button from within screen-scraper on the General tab for a scraping session.

    $session->scrape();

  • getVariable( $var_name ). Gets the value of a session variable that was set during the course of the scraping session. If the object identified by $var_name is a data record an associative array will be returned. If the object identified by $var_name is a data set a two-dimensional ordinal array of associative arrays will be returned.

    Currently only Strings, DataRecords, and DataSets can be accessed by this method.

    $session->getVariable("FORM_SUBMITTED_TEXT");

  • isError(). Indicates whether or not an error has occurred in the scraping process.

    $session->isError();

  • getErrorMessage(). Returns the last error message returned from the server, if one was returned.

    $session->getErrorMessage();

  • disconnect(). Disconnects from the remote server. This should be called once a scraping session is complete so that system resources can be freed up.

    $session->disconnect();

  • getNumDataRecordsInDataSet( $data_set_name ). Returns the number of data records found in the data set named by data_set_name.

    $session->getNumDataRecordsInDataSet( "PRODUCTS" );

  • getDataRecordFromDataSet( $data_set_name, $index ). Returns a single data record (a hash array) from the data set named by data_set_name at the given index.

    $session->getDataRecordFromDataSet( "PRODUCTS", 2 );

  • setDoLazyScrape( $doLazyScrape ). Indicates whether or not a scraping session should be run in a separate thread. By default this value is false.

    Calling this method will only have an effect if it's done before calling the scrape method. If this value is set to true, after the scrape method is called, program flow will return immediately, but the scraping session will still be running in screen-scraper.

    $session->setDoLazyScrape( true );

Receiving Data in Real Time

This feature is only available to Enterprise editions of screen-scraper.

By creating a special PHP class, your code can handle extracted data as it is being scraped instead of after the scrape is finished. That is, you need not wait until the scraping session has finished before getting access to the extracted data.

We recommend calling the class DataReceiver.

DataReceiver Class

The DataReceiver class needs to contain the following method. You can add other methods as needed to process the data, but this one is required.

  • function receiveData( $key, $value ). The key portion is simply a string you'll designate in a screen-scraper script. The value parameter holds the value you pass from screen-scraper to your code.

Real Time RemoteScrapingSession Methods

Once you have created the DataReceiver class containing the receiveData method it must be incorporated into the RemoteScrapingSession using the setDataReceiver method. Here are other methods that allow you to control the flow of real time information.

  • setDataReceiver( $data_receiver ). Adds the DataReceiver object specified by data_receiver to the RemoteScrapingSession object.

    $session->setDataReceiver( $my_data_receiver );

  • getDataReceiver(). Use this to see if a DataReceiver has already been set.

    $session->getDataReceiver( );

  • setPollFrequency( $poll_frequency ). Sets the frequency in seconds with which screen-scraper should be polled for data to be sent. The default is five seconds.

    $session->setPollFrequency( 1 );

  • getPollFrequency(). Gets the current poll frequency, in seconds.

    $session->getPollFrequency( );

Passing Information in Real Time

On the screen-scraper side, whenever you'd like to send data from screen-scraper back to your code, you simply invoke the session.sendDataToClient method. Data sent through this method will be processed through the receiveData method.

Examples

In screen-scraper

As a specific example, let's suppose you've created a scraping session that extracts product records from a shopping web site. As each product record is being scraped, you might simply output them to a CSV file, but you decide instead that you'd like to insert them into your database, and determine that it would be best for you to write your own code to perform the database insertion. On the screen-scraper side, in your scraping session, you might have a script that contains the following:

 session.sendDataToClient( "ProductRecord", dataRecord );

You set up this script to be invoked After each pattern match for the extractor pattern that pulls the product information. For example, the extractor pattern might get the price, title, and weight of the product. Because the script is being invoked After each pattern match, the current dataRecord object will hold all of that information. You invoke session.sendDataToClient so that each record can be processed by your code as it gets extracted.

In PHP Script

In your PHP code you create a class that implements the receiveData( $key, $value ) method. You create an instance of this class and pass it to your RemoteScrapingSession object so that you can process each of the product records as they get extracted. Your DataReceiver class implementation might look something like this:

class DataReceiver
{
   function receiveData( $key, $value )
   {
      echo "Received data from ss:\n";
      echo "Key: $key\n";
      echo "Value: $value\n";
      flush();
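      // writeRow() stands for your own function, e.g. one that
      // inserts the record into your database.
      writeRow( $value );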
   }
}

You would instantiate the class and set it on your session like so:

$data_receiver = new DataReceiver;
$session->setDataReceiver( $data_receiver );

Each time you invoke session.sendDataToClient in screen-scraper, there will be a corresponding method call made to your receiveData method, which will allow you to handle each of the data pieces individually.

For other examples of using the PHP driver please see Tutorial 3: Extending Hello World and Tutorial 4: Scraping a Shopping Site from External Programs.

Invoking screen-scraper from Python

Overview

A Python script interacts with screen-scraper via a Python class called RemoteScrapingSession. You can utilize this class by importing the module remote_scraping_session.py (found in the misc/python directory of your screen-scraper installation) within your Python script.

screen-scraper needs to be running as a server before invoking screen-scraper from a Python script.

Methods

The following is a reference for all of the methods found in the RemoteScrapingSession class.

  • initialize( name ). Initializes a RemoteScrapingSession identified by name. If this form is called the default host (localhost) and port (8778) will be used.

    session.initialize( "Shopping Site" )

  • initialize( name, host, port ). Instantiates a RemoteScrapingSession identified by name, and connecting to the server found at host listening on port.

    session.initialize( "Shopping Site", "192.168.0.5", 8778 )

  • setVariable( var_name, value ). Sets a session variable using the given var_name and value.

    session.setVariable( "SEARCH", search_term )

  • scrape(). Causes the session to scrape. This is equivalent to clicking the Run Scraping Session button from within screen-scraper on the General tab of a scraping session.

    session.scrape()

  • getVariable( var_name ). Gets the value of a session variable that was set during the course of the scraping session. If the object identified by var_name is a data record an associative array will be returned. If the object identified by var_name is a data set a two-dimensional ordinal array of associative arrays will be returned (see our fourth tutorial for an illustration of this).

    Currently only Strings, DataRecords, and DataSets can be accessed by this method.

    data_set = session.getVariable( "PRODUCTS" )

  • setBufferSize( buffer_size ). Explicitly sets the size of the buffer (in bytes) that will be used when reading data from screen-scraper. The default buffer size is 1024 bytes, so if you're anticipating a large amount of data (such as when receiving a full data set) you'll want to increase this value.

    session.setBufferSize( 64000 )

  • resetBufferSize(). Resets the size of the buffer back to its default size of 1024 bytes.

    session.resetBufferSize( )

  • isError(). Indicates whether or not an error has occurred in the scraping process.

    session.isError()

  • getErrorMessage(). Returns the last error message returned from the server, if one was returned.

    session.getErrorMessage()

  • disconnect(). Disconnects from the remote server. This should be called once a scraping session is complete so that system resources can be freed up.

    session.disconnect()

  • getNumDataRecordsInDataSet( data_set_name ). Returns the number of data records found in the data set named by data_set_name.

    session.getNumDataRecordsInDataSet( "PRODUCTS" )

  • getDataRecordFromDataSet( data_set_name, index ). Returns a single data record (a hash array) from the data set named by data_set_name at the given index.

    session.getDataRecordFromDataSet( "PRODUCTS", 2 )

  • setDoLazyScrape( doLazyScrape ). Indicates whether or not a scraping session should be run in a separate thread. By default this value is false.

    Calling this method will only have an effect if it's done before calling the scrape method. If this value is set to true, after the scrape method is called, program flow will return immediately, but the scraping session will still be run by screen-scraper.

    session.setDoLazyScrape( True )
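The methods above are typically called in a fixed order: initialize, set variables, scrape, check for errors, read results, disconnect. The following is a minimal sketch of that sequence, assuming session is a RemoteScrapingSession object you have already constructed from remote_scraping_session.py (construction is not shown because it is not covered above). The function name run_scraping_session and its parameters are hypothetical.

```python
def run_scraping_session(session, name, variables, result_var):
    """Run the named scraping session and return one session variable.

    session    -- an already constructed RemoteScrapingSession (assumed)
    name       -- name of the scraping session to run
    variables  -- dict of session variables to set before scraping
    result_var -- name of the session variable to read afterwards
    """
    session.initialize(name)
    try:
        for var_name, value in variables.items():
            session.setVariable(var_name, value)
        session.scrape()
        if session.isError():
            # Surface the server's last error message to the caller.
            raise RuntimeError(session.getErrorMessage())
        return session.getVariable(result_var)
    finally:
        # Always free the connection, even if the scrape failed.
        session.disconnect()
```

For example, run_scraping_session( session, "Shopping Site", { "SEARCH": search_term }, "PRODUCTS" ) would set SEARCH, run the session, and return the PRODUCTS data set. Placing disconnect in a finally block is a design choice of this sketch so that resources are released on every path.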

Examples

For an example of using the Python driver please see Tutorial 4: Scraping a Shopping Site from External Programs.
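As a smaller illustration than the tutorial, the sketch below reads a data set one record at a time using getNumDataRecordsInDataSet and getDataRecordFromDataSet, raising the read buffer first as suggested for setBufferSize. It assumes session is an initialized RemoteScrapingSession whose scrape has already completed, and that record indexes start at 0 (documented for the COM driver, assumed here for Python). The function name fetch_data_records is hypothetical.

```python
def fetch_data_records(session, data_set_name, buffer_size=64000):
    """Return every data record in the named data set as a list.

    Assumes session has already been initialized and scraped, and that
    record indexes are zero-based (an assumption for the Python driver).
    """
    # Enlarge the read buffer while data records are being transferred.
    session.setBufferSize(buffer_size)
    try:
        # int() guards against the count arriving as a string.
        count = int(session.getNumDataRecordsInDataSet(data_set_name))
        return [
            session.getDataRecordFromDataSet(data_set_name, index)
            for index in range(count)
        ]
    finally:
        # Restore the default buffer size for later calls.
        session.resetBufferSize()
```

Each returned record can then be handled like the associative arrays that getVariable returns for data records.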

Invoking screen-scraper from Ruby

Overview

A Ruby script interacts with screen-scraper via a Ruby class called RemoteScrapingSession. You can utilize this class by importing the module remote_scraping_session.rb (found in the misc/ruby directory of your screen-scraper installation) within your Ruby script.

screen-scraper needs to be running as a server before invoking screen-scraper from a Ruby script.

Methods

The following is a reference for all of the methods found in the RemoteScrapingSession class.

  • initialize( name ). Initializes a RemoteScrapingSession identified by name. If this constructor is called the default host (localhost) and port (8778) will be used.

    session.initialize( "Shopping Site" )

  • initialize( name, host, port ). Instantiates a RemoteScrapingSession identified by name, and connecting to the server found at host listening on port.

    session.initialize( "Shopping Site", "192.168.0.5", 8778 )

  • setVariable( var_name, value ). Sets a session variable using the given var_name and value.

    session.setVariable( "PAGE", "1" )

  • scrape. Causes the session to start. This is equivalent to clicking the Run Scraping Session button from within screen-scraper on the General tab for a scraping session.

    session.scrape()

  • getVariable( var_name ). Gets the value of a session variable that was set during the course of the scraping session. If the object identified by var_name is a data record an associative array will be returned. If the object identified by var_name is a data set a two-dimensional ordinal array of associative arrays will be returned.

    Currently only Strings, DataRecords, and DataSets can be accessed by this method.

    session.getVariable( "PRODUCTS" )

  • setBufferSize( buffer_size ). Explicitly sets the size of the buffer (in bytes) that will be used when reading data from screen-scraper. The default buffer size is 1024 bytes, so if you're anticipating a large amount of data (such as when receiving a full data set) you'll want to increase this value.

    session.setBufferSize( 64000 )

  • resetBufferSize. Resets the size of the buffer back to its default size of 1024 bytes.

    session.resetBufferSize

  • isError. Indicates whether or not an error has occurred in the scraping process.

    session.isError

  • getErrorMessage. Returns the last error message returned from the server, if one was returned.

    session.getErrorMessage

  • disconnect. Disconnects from the remote server. This should be called once a scraping session is complete so that system resources can be freed up.

    session.disconnect

  • getNumDataRecordsInDataSet( data_set_name ). Returns the number of data records found in the data set named by data_set_name.

    session.getNumDataRecordsInDataSet( "PRODUCTS" )

  • getDataRecordFromDataSet( data_set_name, index ). Returns a single data record (a hash array) from the data set named by data_set_name at the given index.

    session.getDataRecordFromDataSet( "PRODUCTS", 2 )

  • setDoLazyScrape( doLazyScrape ). Indicates whether or not a scraping session should be run in a separate thread. By default this value is false.

    Calling this method will only have an effect if it's done before calling the scrape method. If this value is set to true, after the scrape method is called, program flow will return immediately, but the scraping session will still be running in screen-scraper.

    session.setDoLazyScrape( true )

Examples

For an example of using the Ruby driver please see Tutorial 4: Scraping a Shopping Site from External Programs.

Invoking screen-scraper from a COM-based Application

Overview

When running as a server, screen-scraper can be invoked from any Windows application that supports COM, such as Visual Basic, Active Server Pages, or Visual C++. For examples of using the COM driver please see Tutorial 3: Extending Hello World and Tutorial 4: Scraping a Shopping Site from External Programs.

Software Requirements

In order to use the COM driver with screen-scraper you'll need the Microsoft Virtual Machine installed on your system (not Sun's Java Runtime Environment). It's likely you've already got it on your computer, but if you experience problems you should probably try installing the most recent version of the Microsoft Virtual Machine, which can be downloaded from java or jheroen.

Installation

When screen-scraper was installed it registered the COM driver on your system in the form of a DLL.

It's very likely that you need not do anything further in order to make use of the DLL.

Should you run into trouble, though, you might try re-registering the DLL:

  1. Optionally move all of the files with a .class extension and Screenscraper.dll (located in misc\COM\Screenscraper) to a different directory. It is often helpful to put them in the directory where most of the other DLLs on your system are kept, such as winnt\system32 on Windows NT, 2000, or XP, or \windows\system on Windows 95/98, but this is not required.
  2. Open a DOS prompt.
  3. Navigate to the directory where the .class and Screenscraper.dll files are located.
  4. Type regsvr32 Screenscraper.dll (if this doesn't work you may need to find the file regsvr32.exe on your computer, copy it to the directory where Screenscraper.dll is located, then type regsvr32 Screenscraper.dll).

Details

A Windows-based application interacts with screen-scraper via the DLL mentioned previously (think of it as a database driver).

screen-scraper needs to be running as a server before being invoked via the COM driver.

Methods

The following is a reference for all of the methods of the RemoteScrapingSession COM object.

  • Initialize( ScrapingSessionName ): Initializes the remote scraping session for the scraping session identified by ScrapingSessionName.

    Call objRemoteScrapingSession.Initialize( "Weather" )

  • Initialize( ScrapingSessionName, Host, Port ): Initializes the remote scraping session for the scraping session identified by ScrapingSessionName. This method also allows you to explicitly designate a host (default is localhost) and a port (default is 8778).

    Call objRemoteScrapingSession.Initialize( "Weather", "www.mydomain.com", 8799 )

  • Scrape: Causes the scraping session to begin processing.

    Call objRemoteScrapingSession.Scrape

  • GetVariable( VariableName ): Gets the value of a session variable currently being stored by screen-scraper. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.

    temperature = objRemoteScrapingSession.GetVariable( "TEMPERATURE" )

  • StoreVariable( VariableName ): This method is to be used only for data set and data record objects stored in session variables. It causes the value of VariableName to be retrieved from screen-scraper and stored in the RemoteScrapingSession DLL. The values found within the data set and data record objects can then be retrieved using the methods below.

    Call objRemoteScrapingSession.StoreVariable( "DATASET" )

  • GetDataRecordValue( DataRecordName, FieldName ): Gets the value of the field identified by FieldName found within the data record identified by DataRecordName. Note that a pre-condition to using this method is that the data record was previously retrieved and stored using the StoreVariable method.

    story_title = objRemoteScrapingSession.GetDataRecordValue( "DATARECORD", "STORY_TITLE" )

  • GetNumDataRecordsForDataSet( DataSetName ): Gets the number of data record objects held by the data set identified by DataSetName.

    num_records = objRemoteScrapingSession.GetNumDataRecordsForDataSet( "DATASET" )

  • GetDataSetValue( DataSetName, DataRecordNumber, FieldName ): Gets the value of the field identified by FieldName found on row DataRecordNumber within the data set identified by DataSetName. Note that a pre-condition to using this method is that the data set was previously retrieved and stored using the StoreVariable method. Note also that rows are zero-based, so the first row in the data set is found at 0.

    story_title = objRemoteScrapingSession.GetDataSetValue( "DATASET", 0, "STORY_TITLE" )

  • RemoveObjectFromStore( VariableName ): This method is to be used only for data set and data record objects that were previously stored in the RemoteScrapingSession DLL using the StoreVariable method. Calling this method will cause the DLL to release from memory the data set or data record identified by VariableName.

    Call objRemoteScrapingSession.RemoveObjectFromStore( "DATASET" )

  • RemoveAllObjectsFromStore: This method is to be used only for data set and data record objects that were previously stored in the RemoteScrapingSession DLL using the StoreVariable method. Calling this method will cause the DLL to release all objects from memory.

    Call objRemoteScrapingSession.RemoveAllObjectsFromStore

  • SetVariable( VariableName, Value ): This method should be called before the Scrape method is called, and will cause screen-scraper to set a session variable using the identifier VariableName and the value Value.

    Call objRemoteScrapingSession.SetVariable( "ZIP_CODE", "90001" )

  • IsError: Indicates whether or not an error has occurred since the last method call.

    If objRemoteScrapingSession.IsError Then ...

  • GetErrorMessage: In the event that an error has occurred (i.e. IsError returns true) this method will return the message associated with the error.

    error_message = objRemoteScrapingSession.GetErrorMessage

  • Disconnect: This method closes up the connection to screen-scraper. It should be called once a RemoteScrapingSession is no longer needed so that resources can be freed up.

    objRemoteScrapingSession.Disconnect

  • GetVersion: Gets the version of the RemoteScrapingSession.

    version = objRemoteScrapingSession.GetVersion

Invoking screen-scraper from the Command Line

Overview

Scraping sessions created within screen-scraper can be invoked by running screen-scraper from a Unix terminal or a DOS command prompt. This allows for possibilities such as scraping information at regular intervals via something like cron or a scheduled task. The basic syntax is as follows:

jre\bin\java -jar screen-scraper.jar -s "scraping_session_name" [-p "url-encoded_variable_string"]

If you installed a version of screen-scraper that includes a Java Virtual Machine (currently Windows and Linux), you'll want to preface the command with "jre\bin\" on Windows or "jre/bin/" on Linux.

Windows Version

{screen-scraper-install-folder}\jre\bin\java -jar screen-scraper.jar -s "scraping_session_name" [-p "url-encoded_variable_string"]

You could also do it in two steps, using the two commands below.

cd {screen-scraper-install-folder}

jre\bin\java -jar screen-scraper.jar -s "scraping_session_name" [-p "url-encoded_variable_string"]

{screen-scraper-install-folder} is the location where you installed screen-scraper, such as "C:\Program Files\screen-scraper professional edition\".
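If you want to launch the same command from another Java program rather than from a prompt, the pieces above can be assembled with the standard ProcessBuilder class. The following is only a sketch: the class name ScrapeCommand and its build method are made up for illustration, and it assumes the bundled JVM lives under jre/bin inside the install folder as described above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of screen-scraper.
final class ScrapeCommand {

    // Assembles: {install}/jre/bin/java -jar screen-scraper.jar -s name [-p params]
    static ProcessBuilder build(File installDir, String sessionName, String encodedParams) {
        List<String> cmd = new ArrayList<>();
        // Assumption: the bundled JVM is at jre/bin/java under the install folder.
        cmd.add(new File(installDir, "jre/bin/java").getAbsolutePath());
        cmd.add("-jar");
        cmd.add("screen-scraper.jar");
        cmd.add("-s");
        cmd.add(sessionName);
        if (encodedParams != null && !encodedParams.isEmpty()) {
            cmd.add("-p");
            cmd.add(encodedParams);
        }
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Equivalent of the "cd {screen-scraper-install-folder}" step.
        pb.directory(installDir);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = build(new File("."), "scraping_session_name", null);
        // Only prints the command; call pb.start() to actually run it.
        System.out.println(pb.command());
    }
}
```

Because ProcessBuilder hands each argument to the process directly, without going through a shell, the session name does not need extra quoting here.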

Examples

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Google search" -p "search_string=screen+scraper"

This would invoke the Google search scraping session and pass in a parameter named search_string containing the value screen scraper. This will cause a session variable named search_string to be created, which would hold the value screen scraper.

Passed-in parameters need to be URL-encoded strings, just like the query string in a URL.

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass"

This one would invoke the Hotmail mail retrieval scraping session and pass in two parameters: user_name containing the value uname and password containing the value mypass. These parameters will become session variables.
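If the parameter values come from a program rather than being typed by hand, the URL-encoded string can be produced with the standard java.net.URLEncoder class. This is a minimal sketch: the class name ParamStringBuilder is hypothetical, and using UTF-8 as the encoding is an assumption rather than something stated in this documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of screen-scraper.
final class ParamStringBuilder {

    // Joins name=value pairs with '&', URL-encoding each name and value.
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                // Assumption: UTF-8 is an acceptable encoding for the values.
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("search_string", "screen scraper");
        System.out.println(encode(params)); // prints search_string=screen+scraper
    }
}
```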

Piping Log to a File

While running screen-scraper from the command line, you can have the log written to a file by redirecting the output. To do that, append a redirection to the commands from the examples above.

For the first example, let's say you want to write a log file named google_search.log. The command would change to:

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Google search" -p "search_string=screen+scraper" > "log\google_search.log"

The only difference is at the end of the command: > "log\google_search.log". This redirects the log of the scrape to the log\google_search.log file.

A bat file containing the above command needs to be inside the folder where screen-scraper is installed, since screen-scraper.jar and the log folder are referenced by relative paths. If you want to keep your bat file somewhere other than the screen-scraper install directory, first cd to the directory where screen-scraper is installed. The commands will look like this:

cd "C:\Program Files\screen-scraper professional edition"

"jre\bin\java" -jar screen-scraper.jar -s "Google search" -p "search_string=screen+scraper" > "log\google_search.log"

Similarly, for the second example the code to write a log file will be:

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass" > "log/hotmail_mail_retrieval.log"

The above code will write a log file hotmail_mail_retrieval.log inside the log directory.

If your bat file is not inside the screen-scraper installed directory, the code should be like this:

cd "C:\Program Files\screen-scraper professional edition"

"jre\bin\java" -jar screen-scraper.jar -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass" > "log/hotmail_mail_retrieval.log"

When running on Windows, any % character needs to be doubled because this character is treated in a special way in DOS. For example, the parameter "string=hello%21world" would need to be passed in as "string=hello%%21world".
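If a program writes the command line out for you, the doubling can be applied after the parameters have been URL-encoded. A small sketch follows; the class name BatchEscaper is hypothetical and it handles only the % character described above, nothing else.

```java
// Hypothetical helper, not part of screen-scraper.
final class BatchEscaper {

    // Doubles every '%' in an already URL-encoded parameter string.
    static String escapePercent(String encodedParams) {
        return encodedParams.replace("%", "%%");
    }

    public static void main(String[] args) {
        // prints string=hello%%21world
        System.out.println(escapePercent("string=hello%21world"));
    }
}
```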

Xmx-flag

While running screen-scraper from the command line, there is one thing to consider: memory size. Java runs with a fixed maximum amount of heap memory, which happens to be 64 MB by default. If you get an error message saying that Java is out of memory, screen-scraper has consumed all of the heap memory and requires more in order to continue its job.

You can increase the heap memory with the -Xmx flag. To set the heap memory size to 1024 megabytes, use the flag below.

-Xmx1024M

Let's say we got an out-of-memory error while running the Hotmail mail retrieval scraping session (from the examples above). The command with an increased heap memory size would be:

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -Xmx1024M -jar "screen-scraper.jar" -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass" > "log/hotmail_mail_retrieval.log"

This code will increase the heap memory size of java to 1024 megabytes.

Remember not to set the heap memory size larger than the physical memory of the machine you are running on.
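To see what heap limit a given Java installation actually applies, you can ask the JVM itself. The sketch below is a standalone program (the class name HeapCheck is made up) and is unrelated to screen-scraper's own code. Running it with and without -Xmx1024M shows the effect of the flag. The reported figure can be slightly lower than the value passed to -Xmx.

```java
// Hypothetical standalone check, not part of screen-scraper.
final class HeapCheck {

    // Maximum heap the running JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```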

Invoking screen-scraper through SOAP

Overview

This feature is only available to Enterprise editions of screen-scraper.

SOAP is a common XML-based protocol for accessing web services. Most popular programming languages have several libraries that allow for the rapid development of SOAP clients.

SOAP API Specification

Many of the libraries available include some method of generating the code necessary to interact with a specific SOAP interface when given a WSDL file. We have also provided two examples using a SOAP client for screen-scraper: Java and .NET.

Method Summary

Logging Methods

string getLog(string filename) - Return the content of a given log file.

string getLog(string filename, boolean start, int lines) - Returns a portion of the content of a given log file.

string[] getLogNames() - Returns the names of all the files in the log directory of the remote server.

long getLogSize(string filename) - Return the size of the given logfile in bytes.

int removeLog(string filename) - Remove a log file from the log directory on the remote server.

Scraping Methods

string[] getCompletedScrapingSessions() - Returns the IDs of the completed scraping sessions.

string[] getDataRecord(string id, string var) - Get a data record for the given variable.

string[][] getDataSet(string id, string var) - Get the data set contained in a variable in a scraping session.

string[] getRunningScrapingSessions() - Return the IDs of the currently running scraping sessions.

string getScrapingSessionName(string id) - Returns the name of the scraping session whose key is id.

string[] getScrapingSessionNames() - Returns an array of names of scraping sessions which this server currently has.

long getScrapingSessionStartTime(string id) - Returns the starting time of a particular scraping session as a long.

string[] getScriptNames() - Returns the names of scripts in this server.

string getVariable(string id, string var) - Get the value of a certain variable in a scraping session.

string initializeScrapingSession(string name) - Initialize this scraping session to allow it to be scraped.

int isFinished(string id) - Returns whether the session with key=id is finished.

int removeCompletedScrapingSession(string id) - Remove the scraping session given by id from the list of completed scraping sessions.

int removeScrapingSession(string name) - Remove a scraping session from the remote server and from its database.

int removeScript(string name) - Remove a script from the remote server and its database.

int scrape(string id) - Scrape the session given by this ID.

int setTimeout(string id, int minutes) - Set the timeout, in minutes, of a scraping session.

int setVariable(string id, string var, string value) - Set a variable within a scraping session.

int stopScrapingSession(string id) - Stop a scraping session in progress.

int update(string xml) - Update the remote server with an exported scraping session or script.

Server Methods

boolean isAcceptingConnections() - Returns the value of acceptingConnections, which dictates whether the server is handling remote requests to scrape.

int setAcceptingConnections(boolean accepting) - Sets the value for acceptingConnections, which will either stop the server from handling requests for remote scrapes or allow them.

Method Detail

isAcceptingConnections

public static boolean isAcceptingConnections()

Returns the value of acceptingConnections, which dictates whether the server is handling remote requests to scrape.

Returns: true if the server will accept requests to scrape.

setAcceptingConnections

public static int setAcceptingConnections(boolean accepting)

Sets the value for acceptingConnections, which will either stop the server from handling requests for remote scrapes or allow them.

Parameters:

  • accepting - value to change acceptingConnections to.

Returns: int which represents success or a specific error code.

getScrapingSessionNames

public string[] getScrapingSessionNames()

Returns an array of names of scraping sessions which this server currently has.

Returns: names of scraping sessions.

getScriptNames

public string[] getScriptNames()

Returns the names of scripts in this server.

Returns: names of scripts.

getRunningScrapingSessions

public string[] getRunningScrapingSessions()

Return the IDs of the currently running scraping sessions.

Returns: An array of Strings, which are the IDs.

getCompletedScrapingSessions

public string[] getCompletedScrapingSessions()

Returns the IDs of the completed scraping sessions. (Also, updates the list.)

Returns: the IDs of completed scraping sessions.

removeCompletedScrapingSession

public int removeCompletedScrapingSession(string id)

Remove the scraping session given by id from the list of completed scraping sessions.

Parameters:

  • id - the ID of the scraping session to be removed.

Returns: an int representing success or a failure code.

isFinished

public int isFinished(string id)

Returns whether the session with key=id is finished.

Parameters:

  • id - the ID of the scraping session to check status.

Returns: an int representing finished (1), not finished (0) or error (0)

getScrapingSessionName

public string getScrapingSessionName(string id)

Returns the name of the scraping session whose key is id.

Parameters:

  • id - the ID of a scraping session.

Returns: the name of a scraping session, or "-1" if not found.

getScrapingSessionStartTime

public long getScrapingSessionStartTime(string id)

Returns the starting time of a particular scraping session as a long.

Parameters:

  • id - the ID of a scraping session.

Returns: the starting time of the scraping session, -1 if not yet started, or 0 if session not found.

initializeScrapingSession

public string initializeScrapingSession(string name)

Initialize this scraping session to allow it to be scraped.

Parameters:

  • name - the name of the scraping session to initialize.

Returns: if success then the ID of this scraping session is returned, otherwise "-1".

scrape

public int scrape(string id)

Scrape the session given by this ID.

Parameters:

  • id - the ID of a scraping session.

Returns: 0 if an error occurred or 1 if successfully started.

setVariable

public int setVariable(string id, string var, string value)

Set a variable within a scraping session. Disallowed if acceptingConnections is false.

Parameters:

  • id - the ID of a scraping session that has been initialized.
  • var - the name of the variable to set.
  • value - the value to set the variable to.

Returns: 1 if successfully set, 0 otherwise.

setTimeout

public int setTimeout(string id, int minutes)

Set the timeout, in minutes, of a scraping session.

Parameters:

  • id - the ID of a scraping session.
  • minutes - the number of minutes before this session will timeout.

Returns: 1 if successful, 0 otherwise.

stopScrapingSession

public int stopScrapingSession(string id)

Stop a scraping session in progress.

Parameters:

  • id - the ID of a scraping session.

Returns: 1 if successful, 0 otherwise.

getVariable

public string getVariable(string id, string var)

Get the value of a certain variable in a scraping session. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.

Parameters:

  • id - the ID of a scraping session.
  • var - the name of the variable to get the value of.

Returns: if this is a valid scraping session and the value of this variable is a string, then the value is returned; "NULL" if the value is null; and "-1" otherwise.

getDataRecord

public string[] getDataRecord(string id, string var)

Get a data record for the given variable.

Parameters:

  • id - the ID of a scraping session.
  • var - the name of a variable in this scraping session.

Returns: an array of String objects of the form key=value, or an empty array if an error happened or the variable is empty.
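Since each element comes back as a single key=value string, a client will usually want to split the entries into a map before using them. The following is a sketch under stated assumptions: the class name DataRecordParser is hypothetical, the sample data is invented, and it assumes field names never contain an = character, so the split happens at the first one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the generated SOAP client.
final class DataRecordParser {

    // Splits each "key=value" entry at the first '='.
    // Null entries and entries without '=' are skipped.
    static Map<String, String> toMap(String[] record) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (record == null) {
            return fields;
        }
        for (String entry : record) {
            if (entry == null) {
                continue;
            }
            int eq = entry.indexOf('=');
            if (eq < 0) {
                continue;
            }
            fields.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Invented sample standing in for a getDataRecord result.
        String[] record = { "NAME=Some DVD", "PRICE=9.99" };
        System.out.println(toMap(record).get("NAME")); // prints Some DVD
    }
}
```

The same helper can be applied to each row of the string[][] returned by getDataSet, since each row is described as a data record translated to an array of strings.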

getDataSet

public string[][] getDataSet(string id, string var)

Get the data set contained in a variable in a scraping session.

Parameters:

  • id - the ID of a scraping session.
  • var - the name of a variable.

Returns: an array of data records as translated to arrays of String objects.

update

public int update(string xml)

Update the remote server with an exported scraping session or script. As a warning, if the version of screen-scraper this xml was exported from is different from the version of screen-scraper which is running as a server, then the update may not work.

Parameters:

  • xml - the XML contained within an exported scraping session file.

Returns: 0 for failure, 1 for success.

removeScrapingSession

public int removeScrapingSession(string name)

Remove a scraping session from the remote server and from its database.

Parameters:

  • name - the name of the scraping session to be removed.

Returns: 0 for failure, 1 for success.

removeScript

public int removeScript(string name)

Remove a script from the remote server and its database.

Parameters:

  • name - the name of a script to be removed.

Returns: 0 for failure, 1 for success.

getLogNames

public string[] getLogNames()

Returns the names of all the files in the log directory of the remote server.

Returns: an array of the names of the log files, or null if there is no log directory.

getLogSize

public long getLogSize(string filename)

Return the size of the given logfile in bytes.

Parameters:

  • filename - the name of a file in the log directory.

Returns: a long representing the length in bytes of this file, or 0 if the file does not exist or is empty.

getLog

public string getLog(string filename)

Return the content of a given log file.

Parameters:

  • filename - the name of the file to get the contents of.

Returns: a String of the contents of the file, or "" if not possible.

getLog

public string getLog(string filename, boolean start, int lines)

Returns a portion of the content of a given log file.

Parameters:

  • filename - the name of a log file.
  • start - true to return content from the beginning of a file, false to start counting lines from the end.
  • lines - the number of lines from the log file to return.

Returns: a portion of the content of the given log file, or "" if anything goes wrong.

removeLog

public int removeLog(string filename)

Remove a log file from the log directory on the remote server.

Parameters:

  • filename - the name of the file to remove.

Returns: 0 for failure, 1 for success.

.NET SOAP Example

Overview

The .NET SDK includes an executable that can automatically generate the files necessary to access screen-scraper's SOAP interface as an object.

The first step in the process is getting screen-scraper running as a server.

Next we will generate the service class that handles the actual SOAP communication for us. The Bin directory of the .NET SDK contains a wsdl.exe; find it on your computer. Using v1.1, type this command:

"C:\Program Files\Microsoft.NET\SDK\v1.1\Bin\wsdl.exe" http://localhost:8779/axis/services/SOAPInterface?wsdl

To see the options available with wsdl.exe, such as setting the output language to Visual Basic, try the /? flag.

After creating the SOAPInterfaceService class, check the generated code for a possible mistake. Find the getDataSet method. If the method returns String[], change the return type to String[][] and update the cast of the returned object to match.

C# Code Example

The following is an example class which uses the generated class from above to call on the scraping session created in Tutorial 2.

Be sure that the newly created class is part of the compilation process.

using System;
using System.Threading;

/*
 * This class calls screen-scraper through SOAP to run Tutorial 2's
 * scraping session and return the results.
 */

public class Tutorial2
{
    public static void Main()
    {
        // This is the object used to call the remote API.
        SOAPInterfaceService soap = new SOAPInterfaceService();

        // First, initialize the scraping session and remember
        // the ID returned.
        string id = soap.initializeScrapingSession("Shopping Site");

        // Set the initial variables before running.
        soap.setVariable(id, "SEARCH", "dvd");
        soap.setVariable(id, "PAGE", "1");

        // Start the scrape. This method returns immediately,
        // though the scraping has not completed.
        soap.scrape(id);

        // One way to do things is to wait until the scraping
        // session completes.
        while (soap.isFinished(id) != 1)
        {
            Thread.Sleep(1000);
        }

        // Get the data set of all the products scraped.
        string[][] dataSet = soap.getDataSet(id, "PRODUCTS");

        // Return the memory used for storing session
        // variables to the virtual machine.
        soap.removeCompletedScrapingSession(id);

        // Loop through all the data records.
        foreach (string[] datarecord in dataSet)
        {
            // Loop through all the key, value pairs.
            foreach (string record in datarecord)
            {
                Console.Write(record);
                Console.Write("\t");
            }
            Console.WriteLine();
        }
    }
}

Java SOAP Example

Overview

The Axis Library included with screen-scraper makes creating a SOAP client quite easy. Using the WSDL file created from the remote procedures on the screen-scraper server, Axis can create the stubs to make calling the methods in SOAP a matter of just using an object.

The first step in the process is getting screen-scraper running as a server.
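Before generating stubs, it can be handy to confirm that something is actually listening on the SOAP port. The sketch below only tests whether a TCP connection can be opened. It does not verify that the listener is screen-scraper. The class name PortCheck is hypothetical, and port 8779 is simply the one that appears in the WSDL URL used in this documentation.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of screen-scraper.
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the SOAP interface is on localhost:8779, as in the WSDL URL.
        System.out.println(isListening("localhost", 8779, 2000));
    }
}
```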

The next step is to call the WSDL2Java class in the Axis library. To do so there are six jar files which need to be in the class path to run the class. All of these files are included in the lib directory of screen-scraper.

  • axis.jar
  • commons-discovery.jar
  • commons-logging.jar
  • jaxrpc.jar
  • saaj.jar
  • wsdl4j.jar

You also need to call the correct class org.apache.axis.wsdl.WSDL2Java. We also recommend changing the output package that the created Java files go to so that it is com.screenscraper.soapclient. This can be done using the --package option.

Here is an example command line that uses the options described above, run from inside the screen-scraper directory on Windows.

jre\bin\java.exe -cp "lib\axis.jar;lib\jaxrpc.jar;lib\saaj.jar;lib\commons-logging.jar;lib\commons-discovery.jar;lib\wsdl4j.jar" org.apache.axis.wsdl.WSDL2Java --package com.screenscraper.soapclient http://localhost:8779/axis/services/SOAPInterface?wsdl

This will create a new directory com where the command was issued containing the necessary stubs.

Example

The following is an example class which uses the generated classes from above to call on the scraping session created in Tutorial 2.

Be sure that the newly created files compile with this one and that the above mentioned jars are in your CLASSPATH. Also, make sure that screen-scraper is running as a server.

package com.screenscraper.tutorial2;

import com.screenscraper.soapclient.SOAPInterface;
import com.screenscraper.soapclient.SOAPInterfaceService;
import com.screenscraper.soapclient.SOAPInterfaceServiceLocator;

import java.rmi.RemoteException;
import javax.xml.rpc.ServiceException;

/**
 * An example program demonstrating how to use the auto-generated
 * classes from axis to create a Java program to interact with
 * screen-scraper.
 */

public class Main {

    /** Creates a new instance of Main. */
    public Main() { }

    /**
     * Scrapes the scraping session created by tutorial 2 writing out
     * the results to the console.
     */

    public void scrapeTutorial2() {
        try {
            // Necessary calls to auto-generated classes.
            SOAPInterfaceService service = new SOAPInterfaceServiceLocator();
            SOAPInterface soap = service.getSOAPInterface();

            // Initialize the scraping session.
            String id = soap.initializeScrapingSession("Shopping Site");

            // Set the variables needed before scraping.
            soap.setVariable(id, "SEARCH", "dvd");
            soap.setVariable(id, "PAGE", "1");

            // Start scraping.  This call returns immediately,
            // though the scraping session is not completed.
            soap.scrape(id);

            // Wait until scraping completes.
            while (soap.isFinished(id) != 1) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) { }
            }

            // Retrieve the data set generated.
            String[][] dataSet = soap.getDataSet(id, "DETAILS");

            // Clean up memory usage by screen-scraper.
            soap.removeCompletedScrapingSession(id);

            // Print out key, value pairs.
            for (int i = 0; i < dataSet.length; i++) {
                for (int j = 0; j < dataSet[i].length; j++) {
                    System.out.print(dataSet[i][j] + '\t');
                }
                System.out.println();
            }
        } catch (RemoteException re) {
            re.printStackTrace();
        } catch (ServiceException se) {
            se.printStackTrace();
        }
    }

    /**
     * Starts the example program.
     * @param args the command line arguments
     */

    public static void main(String[] args) {
        Main testSoap = new Main();
        testSoap.scrapeTutorial2();
    }
}
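The wait loop in the example above keeps polling for as long as isFinished returns something other than 1. One possible variation is to give up after a deadline. The sketch below is not tied to the generated classes: FinishPoller is a hypothetical name, and the IntSupplier argument stands in for a call such as soap.isFinished(id), which in real code would also need its RemoteException handled inside the lambda.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of the generated SOAP client.
final class FinishPoller {

    // Polls status until it returns 1 or timeoutMillis has elapsed.
    // Returns true if the finished value was seen, false on timeout or interrupt.
    static boolean waitUntilFinished(IntSupplier status, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (status.getAsInt() == 1) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for soap.isFinished(id): reports finished on the third poll.
        int[] calls = { 0 };
        boolean done = waitUntilFinished(() -> ++calls[0] >= 3 ? 1 : 0, 10, 1000);
        System.out.println(done); // prints true
    }
}
```

If the wait times out, a client could then call stopScrapingSession with the same ID.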

Invoking screen-scraper via .NET

Overview

If you're using Visual Studio 2008 or later, the project 'Target Framework' will need to be set to .NET 3.5 or later. However, do not use any .NET client frameworks since they do not have the required libraries for your project to compile.

A C# application interacts with screen-scraper via the Screenscraper.RemoteScrapingSession class. You can use this class by compiling with a reference to the misc/dotNET folder of your screen-scraper distribution.

screen-scraper needs to be running as a server before invoking it from a .NET class.

RemoteScrapingSession Methods

The following is a reference for all of the methods found in the RemoteScrapingSession class.

  • RemoteScrapingSession( string identifier ). Instantiates a RemoteScrapingSession identified by identifier. If this constructor is called the default host (localhost) and port (8778) will be used.

    RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession("Shopping Site");

  • RemoteScrapingSession( string identifier, string host, int port ). Instantiates a RemoteScrapingSession identified by identifier, and connecting to the server found at host listening on port.

    RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession("Shopping Site", "192.168.0.5", 8778 );

  • RemoteScrapingSession( string host, int port ). Instantiates a RemoteScrapingSession which is connected to the server found at host listening on port. A RemoteScrapingSession object instantiated with this constructor is mainly used for stopping a running scraping session.

    RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession("192.168.0.5", 8778 );

  • Disconnect(). Should be called once you're done interacting with the RemoteScrapingSession object so that screen-scraper can clean up.

    remoteScrapingSession.Disconnect();

  • SetVariable( string varName, string value ). Sets a session variable in the session that will be accessible from within a screen-scraper script.

    remoteScrapingSession.SetVariable("PAGE", "1");

  • Scrape(). Causes the session to scrape. This is equivalent to clicking the "Run Scraping Session" button from within screen-scraper on the "General" tab for a scraping session.

    remoteScrapingSession.Scrape();

  • GetVariable( string varName ). Gets the value of a session variable that was set during the course of the scraping session. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.

    remoteScrapingSession.GetVariable("PRODUCTS");

  • StopServer(). Causes the server connected to by this instance of RemoteScrapingSession to stop scraping, if a scrape is currently running.

    remoteScrapingSession.StopServer();

  • Timeout. A property which allows one to set the timeout of this scraping session. The value passed in here will be an int representing the number of minutes before timing out.

    remoteScrapingSession.Timeout = 60;

  • SessionTimedOut. A property which returns a bool indicating whether the scraping session timed out when it ended.

    bool timedOut = remoteScrapingSession.SessionTimedOut;

  • LazyScrape. A property which can be set to true or false. The default value is false. When this property is set to true, then requests for pages will be done in different threads, or simultaneously. If one page does not rely on the scraping of another page, this could significantly increase the speed of the scrape.

    remoteScrapingSession.LazyScrape = true;

It is also possible to store data sets and data records in session variables, which can then be accessed via the RemoteScrapingSession class. Data set objects are analogous to database result sets and data records are analogous to individual records within a result set. When an extractor pattern is applied a data set of data record objects is generated. Storing the resulting data set in a session variable (within a screen-scraper script) will allow for it to be accessed via a RemoteScrapingSession.GetVariable call.

The data record class (Screenscraper.DataRecord) simply extends Microsoft's Hashtable.

DataSet Methods

The following is a reference for all of the methods found in the DataSet class (Screenscraper.DataSet).

  • AllDataRecords. A property which returns all of the DataRecord objects in an ArrayList.

    ArrayList records = dataSet.AllDataRecords;

  • DataRecord this[ int dataRecordNumber ]. An indexer which returns the DataRecord at position dataRecordNumber, containing data extracted from a single application of an ExtractorPattern.

    DataRecord product = products[i];

  • int NumDataRecords. A property which gets the number of DataRecord objects held by this DataSet object.

    int numRecords = dataSet.NumDataRecords;

  • string this[ int dataRecordNumber, string identifier ]. Another indexer which returns a single item of data identified by identifier from the DataRecord at dataRecordNumber.

    String productName = products[i, "NAME"];

Examples

For an example of using the .NET driver please see Tutorial 4: Scraping an E-commerce Site from External Programs.