Invoking screen-scraper through SOAP

Overview

This feature is only available to Enterprise editions of screen-scraper.

SOAP is a common XML-based protocol for accessing web services. Most popular programming languages offer several libraries that allow rapid development of SOAP clients.

SOAP API Specification

Many of the available libraries can generate the code needed to interact with a specific SOAP interface when given a WSDL file. We have also provided two examples of SOAP clients for screen-scraper, one in Java and one in .NET.

Method Summary

Logging Methods

string getLog(string filename) - Return the content of a given log file.

string getLog(string filename, boolean start, int lines) - Returns a portion of the content of a given log file.

string[] getLogNames() - Returns the names of all the files in the log directory of the remote server.

long getLogSize(string filename) - Return the size of the given logfile in bytes.

int removeLog(string filename) - Remove a log file from the log directory on the remote server.

Scraping Methods

string[] getCompletedScrapingSessions() - Returns the IDs of the completed scraping sessions.

string[] getDataRecord(string id, string var) - Get a data record for the given variable.

string[][] getDataSet(string id, string var) - Get the data set contained in a variable in a scraping session.

string[] getRunningScrapingSessions() - Returns the IDs of the currently running scraping sessions.

string getScrapingSessionName(string id) - Returns the name of the scraping session where its key is id.

string[] getScrapingSessionNames() - Returns an array of names of scraping sessions which this server currently has.

long getScrapingSessionStartTime(string id) - Returns the starting time of a particular scraping session as a long.

string[] getScriptNames() - Returns the names of scripts in this server.

string getVariable(string id, string var) - Get the value of a certain variable in a scraping session.

string initializeScrapingSession(string name) - Initialize this scraping session to allow it to be scraped.

int isFinished(string id) - Returns whether the session with key=id is finished.

int removeCompletedScrapingSession(string id) - Remove the scraping session given by id from the list of completed scraping sessions.

int removeScrapingSession(string name) - Remove a scraping session from the remote server and from its database.

int removeScript(string name) - Remove a script from the remote server and its database.

int scrape(string id) - Scrape the session given by this ID.

int setTimeout(string id, int minutes) - Set the timeout, in minutes, of a scraping session.

int setVariable(string id, string var, string value) - Set a variable within a scraping session.

int stopScrapingSession(string id) - Stop a scraping session in progress.

int update(string xml) - Update the remote server with an exported scraping session or script.

Server Methods

boolean isAcceptingConnections() - Returns the value of acceptingConnections, which dictates whether the server is handling remote requests to scrape.

int setAcceptingConnections(boolean accepting) - Sets the value for acceptingConnections, which will either stop the server from handling requests for remote scrapes or allow them.

Method Detail

isAcceptingConnections

public static boolean isAcceptingConnections()

Returns the value of acceptingConnections, which dictates whether the server is handling remote requests to scrape.

Returns: true if the server will accept requests to scrape.

setAcceptingConnections

public static int setAcceptingConnections(boolean accepting)

Sets the value for acceptingConnections, which will either stop the server from handling requests for remote scrapes or allow them.

Parameters:

  • accepting - value to change acceptingConnections to.

Returns: int which represents success or a specific error code.

getScrapingSessionNames

public string[] getScrapingSessionNames()

Returns an array of names of scraping sessions which this server currently has.

Returns: names of scraping sessions.

getScriptNames

public string[] getScriptNames()

Returns the names of scripts in this server.

Returns: names of scripts.

getRunningScrapingSessions

public string[] getRunningScrapingSessions()

Returns the IDs of the currently running scraping sessions.

Returns: an array of Strings, which are the IDs.

getCompletedScrapingSessions

public string[] getCompletedScrapingSessions()

Returns the IDs of the completed scraping sessions. (Also updates the list.)

Returns: the IDs of completed scraping sessions.

removeCompletedScrapingSession

public int removeCompletedScrapingSession(string id)

Remove the scraping session given by id from the list of completed scraping sessions.

Parameters:

  • id - the ID of the scraping session to be removed.

Returns: an int representing success or a failure code.
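A long-running client may want to clear out every completed session in one pass. The following is a minimal Java sketch of that cleanup. The SessionApi interface is a hypothetical stand-in for the generated SOAP stub (whose methods would also declare remote exceptions) and mirrors only the two methods documented above. It also assumes that 1 means success, as it does for the other int-returning methods; the description above does not state the exact success value.

```java
import java.util.ArrayList;
import java.util.List;

final class CompletedSessionCleaner {

    /** Hypothetical stand-in for the generated SOAP stub. */
    interface SessionApi {
        String[] getCompletedScrapingSessions();
        int removeCompletedScrapingSession(String id);
    }

    /**
     * Tries to remove every completed session.
     * Returns the IDs that could not be removed.
     */
    static List<String> removeAllCompleted(SessionApi api) {
        List<String> failed = new ArrayList<>();
        String[] ids = api.getCompletedScrapingSessions();
        if (ids == null) {
            return failed;
        }
        for (String id : ids) {
            // Assumption: 1 signals success, anything else is a failure code.
            if (api.removeCompletedScrapingSession(id) != 1) {
                failed.add(id);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // In-memory fake used only to exercise the helper.
        SessionApi fake = new SessionApi() {
            public String[] getCompletedScrapingSessions() {
                return new String[] {"s1", "s2"};
            }
            public int removeCompletedScrapingSession(String id) {
                return 1;
            }
        };
        System.out.println("not removed: " + removeAllCompleted(fake));
    }
}
```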

isFinished

public int isFinished(string id)

Returns whether the session with key=id is finished.

Parameters:

  • id - the ID of the scraping session to check status.

Returns: an int representing finished (1), not finished (0) or error (0)
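Because isFinished reports 0 both for "not finished" and for an error, a loop that simply waits for 1 could spin forever on a bad ID. One possible way around this is to poll with a deadline, sketched below in Java. The IntSupplier is an assumed wrapper around a call such as soap.isFinished(id); a real stub call would need its remote exception handled inside that wrapper. The class and method names are illustrative only.

```java
import java.util.function.IntSupplier;

final class SessionPoller {

    /**
     * Polls statusCheck until it returns 1 or timeoutMillis elapses.
     * Returns true if the session finished, false on timeout or interruption.
     */
    static boolean waitUntilFinished(IntSupplier statusCheck,
                                     long timeoutMillis,
                                     long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (statusCheck.getAsInt() == 1) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for callers and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake status source that reports "finished" on the third check.
        boolean done = waitUntilFinished(() -> ++calls[0] >= 3 ? 1 : 0, 5_000, 10);
        System.out.println("finished=" + done + ", checks=" + calls[0]);
    }
}
```

When the helper returns false, the caller cannot tell a slow session from an invalid one, so it may be worth checking the ID against getRunningScrapingSessions before retrying.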

getScrapingSessionName

public string getScrapingSessionName(string id)

Returns the name of the scraping session where its key is id.

Parameters:

  • id - the ID of a scraping session.

Returns: the name of a scraping session, or "-1" if not found.

getScrapingSessionStartTime

public long getScrapingSessionStartTime(string id)

Returns the starting time of a particular scraping session as a long.

Parameters:

  • id - the ID of a scraping session.

Returns: the starting time of the scraping session, -1 if not yet started, or 0 if session not found.
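The two sentinel values are easy to mistake for real timestamps, so it can help to translate the result before displaying it. The Java sketch below does that. Note the loud assumption: the unit of the returned long is not stated here, and the code treats it as milliseconds since the Unix epoch (the usual Java convention). Verify this against your server before relying on it. StartTimes and describe are illustrative names.

```java
import java.time.Instant;

final class StartTimes {

    /** Turns the raw return value into a human-readable description. */
    static String describe(long raw) {
        if (raw == 0L) {
            return "session not found";
        }
        if (raw == -1L) {
            return "not yet started";
        }
        // Assumption: the value is milliseconds since the Unix epoch.
        return "started at " + Instant.ofEpochMilli(raw);
    }

    public static void main(String[] args) {
        System.out.println(describe(0L));
        System.out.println(describe(-1L));
        System.out.println(describe(System.currentTimeMillis()));
    }
}
```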

initializeScrapingSession

public string initializeScrapingSession(string name)

Initialize this scraping session to allow it to be scraped.

Parameters:

  • name - the name of the scraping session to initialize.

Returns: if success then the ID of this scraping session is returned, otherwise "-1".

scrape

public int scrape(string id)

Scrape the session given by this ID.

Parameters:

  • id - the ID of a scraping session.

Returns: 0 if an error occurred or 1 if successfully started.

setVariable

public int setVariable(string id, string var, string value)

Set a variable within a scraping session. Disallowed if acceptingConnections is false.

Parameters:

  • id - the ID of a scraping session that has been initialized.
  • var - the name of the variable to set.
  • value - the value to set the variable to.

Returns: 1 if successfully set, 0 otherwise.
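The methods above signal failure through return values ("-1" for initializeScrapingSession, 0 for scrape and setVariable) rather than exceptions, so a failed call is easy to miss. As a rough sketch, a client might wrap those checks in small helpers like the ones below. ResultChecks and its method names are hypothetical and not part of the SOAP interface.

```java
final class ResultChecks {

    /** Returns the ID unchanged, or throws if initialization reported "-1". */
    static String requireSessionId(String id, String sessionName) {
        if (id == null || "-1".equals(id)) {
            throw new IllegalStateException(
                "Could not initialize scraping session: " + sessionName);
        }
        return id;
    }

    /** Throws unless the int result is 1 (the documented success value). */
    static void requireSuccess(int code, String operation) {
        if (code != 1) {
            throw new IllegalStateException(operation + " failed with code " + code);
        }
    }

    public static void main(String[] args) {
        // "12345" is a made-up ID standing in for a real return value.
        String id = requireSessionId("12345", "Shopping Site");
        requireSuccess(1, "setVariable");
        System.out.println("session " + id + " ready");
    }
}
```

With a generated stub, the calls would look like requireSuccess(soap.setVariable(id, "SEARCH", "dvd"), "setVariable").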

setTimeout

public int setTimeout(string id, int minutes)

Set the timeout, in minutes, of a scraping session.

Parameters:

  • id - the ID of a scraping session.
  • minutes - the number of minutes before this session will timeout.

Returns: 1 if successful, 0 otherwise.

stopScrapingSession

public int stopScrapingSession(string id)

Stop a scraping session in progress.

Parameters:

  • id - the ID of a scraping session.

Returns: 1 if successful, 0 otherwise.

getVariable

public string getVariable(string id, string var)

Get the value of a certain variable in a scraping session. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.

Parameters:

  • id - the ID of a scraping session.
  • var - the name of the variable to get the value of.

Returns: the value of the variable if this is a valid scraping session and the value is a string, "NULL" if the value is null, and "-1" otherwise.
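Since getVariable encodes "null" and "error" as ordinary strings, callers may find it convenient to translate them once at the boundary. The sketch below is one way to do that in Java; VariableValues and interpret are illustrative names, not part of the API. Be aware that a variable whose real value happens to be the text "NULL" or "-1" cannot be told apart from the sentinels.

```java
import java.util.Optional;

final class VariableValues {

    /**
     * Maps the raw getVariable result to an Optional:
     * "NULL" becomes empty, "-1" is treated as an error.
     */
    static Optional<String> interpret(String raw) {
        if (raw == null || "-1".equals(raw)) {
            throw new IllegalStateException(
                "variable unavailable (invalid session or non-string value)");
        }
        if ("NULL".equals(raw)) {
            return Optional.empty();
        }
        return Optional.of(raw);
    }

    public static void main(String[] args) {
        System.out.println(interpret("dvd").orElse("(no value)"));
        System.out.println(interpret("NULL").orElse("(no value)"));
    }
}
```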

getDataRecord

public string[] getDataRecord(string id, string var)

Get a data record for the given variable.

Parameters:

  • id - the ID of a scraping session.
  • var - the name of a variable in this scraping session.

Returns: an array of String objects of the form key=value, or an empty array if an error occurred or the variable is empty.
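Working with key=value strings directly is awkward, so a client will often convert them into a map first. A minimal Java sketch follows. It assumes the key ends at the first "=" (so values may themselves contain "=") and silently skips entries with no "=" at all; both choices are assumptions, and DataRecords is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DataRecords {

    /** Converts key=value strings into a map, preserving their order. */
    static Map<String, String> toMap(String[] record) {
        Map<String, String> map = new LinkedHashMap<>();
        if (record == null) {
            return map;
        }
        for (String entry : record) {
            if (entry == null) {
                continue;
            }
            int eq = entry.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value entry
            }
            map.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample data in the documented key=value form.
        String[] record = {"TITLE=Example", "PRICE=9.99"};
        System.out.println(toMap(record));
    }
}
```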

getDataSet

public string[][] getDataSet(string id, string var)

Get the data set contained in a variable in a scraping session.

Parameters:

  • id - the ID of a scraping session.
  • var - the name of a variable.

Returns: an array of data records as translated to arrays of String objects.
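A common need is to pull a single field out of every record in the returned data set. The Java sketch below does so under the assumption that each inner array holds key=value strings, as getDataRecord returns them; if your data set is encoded differently, the prefix match will need adjusting. DataSets and column are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class DataSets {

    /** Collects the value stored under key in each record that has one. */
    static List<String> column(String[][] dataSet, String key) {
        String prefix = key + "=";
        List<String> values = new ArrayList<>();
        if (dataSet == null) {
            return values;
        }
        for (String[] record : dataSet) {
            if (record == null) {
                continue;
            }
            for (String entry : record) {
                if (entry != null && entry.startsWith(prefix)) {
                    values.add(entry.substring(prefix.length()));
                    break; // take the first match per record
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample data, assuming key=value entries.
        String[][] dataSet = {
            {"TITLE=First", "PRICE=1.00"},
            {"TITLE=Second", "PRICE=2.00"}
        };
        System.out.println(column(dataSet, "TITLE"));
    }
}
```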

update

public int update(string xml)

Update the remote server with an exported scraping session or script. As a warning, if the version of screen-scraper this xml was exported from is different from the version of screen-scraper which is running as a server, then the update may not work.

Parameters:

  • xml - the XML contained within an exported scraping session file.

Returns: 0 for failure, 1 for success.
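Since update takes the XML as a string, the client has to read the exported file itself. The following Java sketch shows one way to do that. The ToIntFunction is an assumed wrapper around the stub call (for example xml -> soap.update(xml), with remote exceptions handled inside), and reading the file as UTF-8 is also an assumption; use whatever encoding your export actually has.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.ToIntFunction;

final class ExportUploader {

    /** Reads the exported file and passes its text to update; true on success. */
    static boolean upload(Path exportFile, ToIntFunction<String> update) {
        String xml;
        try {
            // Assumption: the exported file is UTF-8 encoded.
            xml = new String(Files.readAllBytes(exportFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return update.applyAsInt(xml) == 1;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: ExportUploader <exported-file>");
            return;
        }
        // Fake update call that accepts any non-empty document.
        boolean ok = upload(Paths.get(args[0]), xml -> xml.isEmpty() ? 0 : 1);
        System.out.println("uploaded=" + ok);
    }
}
```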

removeScrapingSession

public int removeScrapingSession(string name)

Remove a scraping session from the remote server and from its database.

Parameters:

  • name - the name of the scraping session to be removed.

Returns: 0 for failure, 1 for success.

removeScript

public int removeScript(string name)

Remove a script from the remote server and its database.

Parameters:

  • name - the name of a script to be removed.

Returns: 0 for failure, 1 for success.

getLogNames

public string[] getLogNames()

Returns the names of all the files in the log directory of the remote server.

Returns: an array of the names of the log files, or null if there is no log directory.

getLogSize

public long getLogSize(string filename)

Return the size of the given logfile in bytes.

Parameters:

  • filename - the name of a file in the log directory.

Returns: a long representing the length in bytes of this file, or 0 if the file does not exist or is empty.

getLog

public string getLog(string filename)

Return the content of a given log file.

Parameters:

  • filename - the name of the file to get the contents of.

Returns: a String of the contents of the file, or "" if not possible.

getLog

public string getLog(string filename, boolean start, int lines)

Returns a portion of the content of a given log file.

Parameters:

  • filename - the name of a log file.
  • start - true to return content from the beginning of a file, false to start counting lines from the end.
  • lines - the number of lines from the log file to return.

Returns: a portion of the content of the given log file, or "" if anything goes wrong.
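Log files can grow large, so a client might check getLogSize first and fall back to the last few lines when the file exceeds some limit. The Java sketch below illustrates that choice. LogApi is a hypothetical stand-in for the generated stub that mirrors the three log methods documented above; the size limit and line count are caller-chosen values.

```java
final class LogFetcher {

    /** Hypothetical stand-in for the generated SOAP stub. */
    interface LogApi {
        long getLogSize(String filename);
        String getLog(String filename);
        String getLog(String filename, boolean start, int lines);
    }

    /**
     * Returns the whole log if it is at most maxBytes long,
     * otherwise only its last tailLines lines.
     */
    static String fetch(LogApi api, String filename, long maxBytes, int tailLines) {
        if (api.getLogSize(filename) <= maxBytes) {
            return api.getLog(filename);
        }
        // start=false counts lines from the end of the file.
        return api.getLog(filename, false, tailLines);
    }

    public static void main(String[] args) {
        // In-memory fake used only to exercise the helper.
        LogApi fake = new LogApi() {
            public long getLogSize(String filename) { return 10_000L; }
            public String getLog(String filename) { return "full log"; }
            public String getLog(String filename, boolean start, int lines) {
                return "last " + lines + " lines";
            }
        };
        System.out.println(fetch(fake, "example.log", 1_024L, 50));
    }
}
```

Keep in mind that getLogSize also returns 0 for a missing file, in which case getLog will return an empty string.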

removeLog

public int removeLog(string filename)

Remove a log file from the log directory on the remote server.

Parameters:

  • filename - the name of the file to remove.

Returns: 0 for failure, 1 for success.

.NET SOAP Example

Overview

The .NET SDK includes an executable that can automatically generate the files necessary to access screen-scraper's SOAP interface as an object.

The first step in the process is getting screen-scraper running as a server.

Next, generate the service class that handles the SOAP communication. The Bin directory of the .NET SDK contains wsdl.exe; locate it on your computer. With v1.1 of the SDK, type this command:

"C:\Program Files\Microsoft.NET\SDK\v1.1\Bin\wsdl.exe" http://localhost:8779/axis/services/SOAPInterface?wsdl

To see the options available for wsdl.exe, such as setting the output language to Visual Basic, run it with the /? flag.

The generated SOAPInterfaceService class may contain a mistake. Find the getDataSet method; if it returns String[], change the return type to String[][] and update the cast of the returned object to match.

C# Code Example

The following is an example class which uses the generated class from above to call on the scraping session created in Tutorial 2.

Be sure that the newly created class is part of the compilation process.

using System;
using System.Threading;

/*
 * This class calls screen-scraper through SOAP to run Tutorial 2's
 * scraping session and return the results.
 */

public class Tutorial2
{
    public static void Main()
    {
        // This is the object used to call the remote API.
        SOAPInterfaceService soap = new SOAPInterfaceService();

        // First, initialize the scraping session and remember
        // the ID returned.
        string id = soap.initializeScrapingSession("Shopping Site");

        // Set the initial variables before running.
        soap.setVariable(id, "SEARCH", "dvd");
        soap.setVariable(id, "PAGE", "1");

        // Start the scrape. This method returns immediately,
        // before the scraping has completed.
        soap.scrape(id);

        // One way to do things is to wait until the scraping
        // session completes.
        while (soap.isFinished(id) != 1)
        {
            Thread.Sleep(1000);
        }

        // Get the data set of all the products scraped.
        string[][] dataSet = soap.getDataSet(id, "PRODUCTS");

        // Return the memory used for storing session
        // variables to the virtual machine.
        soap.removeCompletedScrapingSession(id);

        // Loop through all the data records.
        foreach (string[] datarecord in dataSet)
        {
            // Loop through all the key, value pairs.
            foreach (string record in datarecord)
            {
                Console.Write(record);
                Console.Write("\t");
            }
            Console.WriteLine();
        }
    }
}

Java SOAP Example

Overview

The Axis library included with screen-scraper makes creating a SOAP client quite easy. From the WSDL file that describes the remote procedures on the screen-scraper server, Axis can generate stubs that make calling the SOAP methods as simple as calling methods on an object.

The first step in the process is getting screen-scraper running as a server.

The next step is to run the WSDL2Java class in the Axis library. Doing so requires six jar files on the class path, all of which are included in the lib directory of screen-scraper.

  • axis.jar
  • commons-discovery.jar
  • commons-logging.jar
  • jaxrpc.jar
  • saaj.jar
  • wsdl4j.jar

The class to run is org.apache.axis.wsdl.WSDL2Java. We also recommend setting the output package of the generated Java files to com.screenscraper.soapclient, which can be done with the --package option.

Here is an example command line using the options described above, run from inside the screen-scraper directory on Windows.

jre\bin\java.exe -cp "lib\axis.jar;lib\jaxrpc.jar;lib\saaj.jar;lib\commons-logging.jar;lib\commons-discovery.jar;lib\wsdl4j.jar" org.apache.axis.wsdl.WSDL2Java --package com.screenscraper.soapclient http://localhost:8779/axis/services/SOAPInterface?wsdl

This creates a new directory named com, containing the necessary stubs, in the directory where the command was issued.

Example

The following is an example class which uses the generated classes from above to call on the scraping session created in Tutorial 2.

Be sure that the newly created files compile with this one and that the above mentioned jars are in your CLASSPATH. Also, make sure that screen-scraper is running as a server.

package com.screenscraper.tutorial2;

import com.screenscraper.soapclient.SOAPInterface;
import com.screenscraper.soapclient.SOAPInterfaceService;
import com.screenscraper.soapclient.SOAPInterfaceServiceLocator;

import java.rmi.RemoteException;
import javax.xml.rpc.ServiceException;

/**
 * An example program demonstrating how to use the auto-generated
 * classes from axis to create a Java program to interact with
 * screen-scraper.
 */

public class Main {

    /** Creates a new instance of Main. */
    public Main() { }

    /**
     * Runs the scraping session created in Tutorial 2 and writes
     * the results to the console.
     */

    public void scrapeTutorial2() {
        try {
            // Necessary calls to auto-generated classes.
            SOAPInterfaceService service = new SOAPInterfaceServiceLocator();
            SOAPInterface soap = service.getSOAPInterface();

            // Initialize the scraping session.
            String id = soap.initializeScrapingSession("Shopping Site");

            // Set the variables needed before scraping.
            soap.setVariable(id, "SEARCH", "dvd");
            soap.setVariable(id, "PAGE", "1");

            // Start scraping.  This call returns immediately,
            // though the scraping session is not completed.
            soap.scrape(id);

            // Wait until scraping completes.
            while (soap.isFinished(id) != 1) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) { }
            }

            // Retrieve the data set generated.
            String[][] dataSet = soap.getDataSet(id, "DETAILS");

            // Clean up memory usage by screen-scraper.
            soap.removeCompletedScrapingSession(id);

            // Print out key, value pairs.
            for (int i = 0; i < dataSet.length; i++) {
                for (int j = 0; j < dataSet[i].length; j++) {
                    System.out.print(dataSet[i][j] + '\t');
                }
                System.out.println();
            }
        } catch (RemoteException re) {
            re.printStackTrace();
        } catch (ServiceException se) {
            se.printStackTrace();
        }
    }

    /**
     * Starts the example program.
     * @param args the command line arguments
     */

    public static void main(String[] args) {
        Main testSoap = new Main();
        testSoap.scrapeTutorial2();
    }
}