Invoking screen-scraper from a COM-based Application

Overview

When running as a server, screen-scraper can be invoked from any Windows application that supports COM, such as Visual Basic, Active Server Pages, or Visual C++. For examples of using the COM driver, see Tutorial 3: Extending Hello World and Tutorial 4: Scraping a Shopping Site from External Programs.

Software Requirements

To use the COM driver with screen-scraper, you'll need the Microsoft Virtual Machine installed on your system (not Sun's Java Runtime Environment). It's likely already on your computer, but if you experience problems, try installing the most recent version of the Microsoft Virtual Machine, which can be downloaded from java or jheroen.

Installation

When screen-scraper was installed it registered the COM driver on your system in the form of a DLL.

It's very likely that you need not do anything further in order to make use of the DLL.

Should you run into trouble, though, you might try re-registering the DLL:

  1. Optionally, move all of the files with a .class extension, along with Screenscraper.dll (located in misc\COM\Screenscraper), to a different directory. This is not required, but it is often helpful to put them in the directory that holds most of the other DLLs on your system, such as winnt\system32 on Windows NT, 2000, or XP, or \windows\system on Windows 95/98.
  2. Open a DOS prompt.
  3. Navigate to the directory where the .class and Screenscraper.dll files are located.
  4. Type regsvr32 Screenscraper.dll (if this doesn't work you may need to find the file regsvr32.exe on your computer, copy it to the directory where Screenscraper.dll is located, then type regsvr32 Screenscraper.dll).

Details

A Windows-based application interacts with screen-scraper via the DLL mentioned previously (think of it as a database driver).

screen-scraper needs to be running as a server before being invoked via the COM driver.
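
Because the driver only relays calls to a running server, a quick connectivity check can save debugging time. The following is a minimal VBScript sketch, not a definitive implementation. It assumes the COM object is registered under the ProgID Screenscraper.RemoteScrapingSession (an assumption not stated in this document; adjust it if CreateObject fails), that a scraping session named "Weather" exists, and that a failed connection is reported through IsError. Initialize, IsError, GetErrorMessage, GetVersion, and Disconnect are described under Methods below.

```vbscript
Option Explicit

Dim objRemoteScrapingSession

' ASSUMPTION: the ProgID is not given in this document; adjust as needed.
Set objRemoteScrapingSession = CreateObject("Screenscraper.RemoteScrapingSession")

' Connect to the server on the default host (localhost) and port (8778).
Call objRemoteScrapingSession.Initialize("Weather")

If objRemoteScrapingSession.IsError Then
    ' Most likely screen-scraper is not running as a server.
    WScript.Echo "Could not reach screen-scraper: " & objRemoteScrapingSession.GetErrorMessage
Else
    WScript.Echo "Connected. Driver version: " & objRemoteScrapingSession.GetVersion
    ' Release the connection once the check is done.
    Call objRemoteScrapingSession.Disconnect
End If
```

Saved under a hypothetical name such as check.vbs, the script could be run from a DOS prompt with cscript check.vbs.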

Methods

The following is a reference for all of the methods of the RemoteScrapingSession COM object.

  • Initialize( ScrapingSessionName ): Initializes the remote scraping session for the scraping session identified by ScrapingSessionName.

    Call objRemoteScrapingSession.Initialize( "Weather" )

  • Initialize( ScrapingSessionName, Host, Port ): Initializes the remote scraping session for the scraping session identified by ScrapingSessionName. This method also allows you to explicitly designate a host (default is localhost) and a port (default is 8778).

    Call objRemoteScrapingSession.Initialize( "Weather", "www.mydomain.com", 8799 )

  • Scrape: Causes the scraping session to begin processing.

    Call objRemoteScrapingSession.Scrape

  • GetVariable( VariableName ): Gets the value of a session variable stored by screen-scraper. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.

    temperature = objRemoteScrapingSession.GetVariable( "TEMPERATURE" )

  • StoreVariable( VariableName ): This method is to be used only for data set and data record objects stored in session variables. It causes the value of VariableName to be retrieved from screen-scraper and stored in the RemoteScrapingSession DLL. The values found within the data set and data record objects can then be retrieved using the methods below.

    Call objRemoteScrapingSession.StoreVariable( "DATASET" )

  • GetDataRecordValue( DataRecordName, FieldName ): Gets the value of the field identified by FieldName found within the data record identified by DataRecordName. Note that a pre-condition to using this method is that the data record was previously retrieved and stored using the StoreVariable method.

    story_title = objRemoteScrapingSession.GetDataRecordValue( "DATARECORD", "STORY_TITLE" )

  • GetNumDataRecordsForDataSet( DataSetName ): Gets the number of data record objects held by the data set identified by DataSetName.

    num_records = objRemoteScrapingSession.GetNumDataRecordsForDataSet( "DATASET" )

  • GetDataSetValue( DataSetName, DataRecordNumber, FieldName ): Gets the value of the field identified by FieldName found on row DataRecordNumber within the data set identified by DataSetName. Note that a pre-condition to using this method is that the data set was previously retrieved and stored using the StoreVariable method. Note also that rows are numbered from 0, so the first row in the data set is row 0.

    story_title = objRemoteScrapingSession.GetDataSetValue( "DATASET", 0, "STORY_TITLE" )

  • RemoveObjectFromStore( VariableName ): This method is to be used only for data set and data record objects that were previously stored in the RemoteScrapingSession DLL using the StoreVariable method. Calling this method will cause the DLL to release from memory the data set or data record identified by VariableName.

    Call objRemoteScrapingSession.RemoveObjectFromStore( "DATASET" )

  • RemoveAllObjectsFromStore: This method is to be used only for data set and data record objects that were previously stored in the RemoteScrapingSession DLL using the StoreVariable method. Calling this method will cause the DLL to release all objects from memory.

    Call objRemoteScrapingSession.RemoveAllObjectsFromStore

  • SetVariable( VariableName, Value ): Causes screen-scraper to set a session variable named VariableName to the value Value. This method should be called before the Scrape method is called.

    Call objRemoteScrapingSession.SetVariable( "ZIP_CODE", "90001" )

  • IsError: Indicates whether or not an error has occurred since the last method call.

    If objRemoteScrapingSession.IsError Then ...

  • GetErrorMessage: In the event that an error has occurred (i.e., IsError returns true), this method returns the message associated with the error.

    error_message = objRemoteScrapingSession.GetErrorMessage

  • Disconnect: Closes the connection to screen-scraper. It should be called once a RemoteScrapingSession is no longer needed so that resources can be freed.

    Call objRemoteScrapingSession.Disconnect

  • GetVersion: Gets the version of the RemoteScrapingSession.

    version = objRemoteScrapingSession.GetVersion
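
The methods above are typically used together in a fixed order: initialize, set inputs, scrape, store and read results, then clean up. The following VBScript is a rough sketch of that sequence under several assumptions, none of which come from this reference: the ProgID Screenscraper.RemoteScrapingSession, a scraping session named "News" that reads a ZIP_CODE variable, and a session variable DATASET whose records carry a STORY_TITLE field. Treat all of these names as placeholders for your own.

```vbscript
Option Explicit

Dim objRemoteScrapingSession, numRecords, i

' ASSUMPTION: the ProgID is not given in this document; adjust as needed.
Set objRemoteScrapingSession = CreateObject("Screenscraper.RemoteScrapingSession")

' Hypothetical scraping session name.
Call objRemoteScrapingSession.Initialize("News")

If objRemoteScrapingSession.IsError Then
    WScript.Echo "Initialize failed: " & objRemoteScrapingSession.GetErrorMessage
Else
    ' Inputs must be set before Scrape is called.
    Call objRemoteScrapingSession.SetVariable("ZIP_CODE", "90001")
    Call objRemoteScrapingSession.Scrape

    If objRemoteScrapingSession.IsError Then
        WScript.Echo "Scrape failed: " & objRemoteScrapingSession.GetErrorMessage
    Else
        ' Pull the data set into the DLL so its fields can be read.
        Call objRemoteScrapingSession.StoreVariable("DATASET")
        numRecords = CLng(objRemoteScrapingSession.GetNumDataRecordsForDataSet("DATASET"))

        ' Rows are numbered from 0.
        For i = 0 To numRecords - 1
            WScript.Echo objRemoteScrapingSession.GetDataSetValue("DATASET", i, "STORY_TITLE")
        Next

        ' Free the stored data set once it has been read.
        Call objRemoteScrapingSession.RemoveObjectFromStore("DATASET")
    End If

    ' Always release the connection after a successful Initialize.
    Call objRemoteScrapingSession.Disconnect
End If
```

Checking IsError after Initialize and again after Scrape keeps the two failure modes apart (server unreachable versus a problem during the scrape), and placing Disconnect outside the inner branch means the connection is released either way.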