Invoking screen-scraper via .NET
Overview
If you're using Visual Studio 2008 or later, set the project's 'Target Framework' to .NET 3.5 or later. Do not target any of the .NET client frameworks, however, since they lack the libraries your project needs in order to compile.
A C# application interacts with screen-scraper via the Screenscraper.RemoteScrapingSession class. You can use this class by compiling with a reference to the misc/dotNET folder of your screen-scraper distribution.
screen-scraper needs to be running as a server before invoking it from a .NET class.
RemoteScrapingSession Methods
The following is a reference for all of the methods found in the RemoteScrapingSession class.
- RemoteScrapingSession( string identifier ). Instantiates a RemoteScrapingSession identified by identifier. When this constructor is used, the default host (localhost) and port (8778) are used.
RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession("Shopping Site");
- RemoteScrapingSession( string identifier, string host, int port ). Instantiates a RemoteScrapingSession identified by identifier, and connecting to the server found at host listening on port.
RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession("Shopping Site", "192.168.0.5", 8778 );
- RemoteScrapingSession( string host, int port ). Instantiates a RemoteScrapingSession which is connected to the server found at host listening on port. A RemoteScrapingSession object instantiated with this constructor is mainly used for stopping a running scraping session.
RemoteScrapingSession remoteScrapingSession = new RemoteScrapingSession("192.168.0.5", 8778 );
- Disconnect(). Should be called once you're done interacting with the RemoteScrapingSession object so that screen-scraper can clean up.
remoteScrapingSession.Disconnect();
- SetVariable( string varName, string value ). Sets a session variable in the session that will be accessible from within a screen-scraper script.
remoteScrapingSession.SetVariable("PAGE", "1");
- Scrape(). Causes the session to scrape. This is equivalent to clicking the "Run Scraping Session" button from within screen-scraper on the "General" tab for a scraping session.
remoteScrapingSession.Scrape();
- GetVariable( string varName ). Gets the value of a session variable that was set during the course of the scraping session. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.
remoteScrapingSession.GetVariable("PRODUCTS");
- StopServer(). Causes the server this RemoteScrapingSession instance is connected to to stop scraping, if it is currently doing so.
remoteScrapingSession.StopServer();
- Timeout. A property that sets the timeout of this scraping session. The value is an int giving the number of minutes before the session times out.
remoteScrapingSession.Timeout = 60;
- SessionTimedOut. A bool property that indicates whether the scraping session timed out when it ended.
bool timedOut = remoteScrapingSession.SessionTimedOut;
- LazyScrape. A property that can be set to true or false. The default value is false. When it is set to true, pages are requested simultaneously in separate threads. If the scraping of one page does not depend on the scraping of another, this can significantly speed up the scrape.
remoteScrapingSession.LazyScrape = true;
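Taken together, the methods above might be combined as in the following sketch. It assumes that screen-scraper is running as a server on the default host and port, that the project references the screen-scraper .NET library, that a scraping session named "Shopping Site" exists, and that GetVariable returns a value that can be converted with "as string"; none of these is guaranteed by the reference above.

```csharp
using System;
using Screenscraper;

class ScrapeExample
{
    static void Main()
    {
        // Uses the default host (localhost) and port (8778).
        RemoteScrapingSession session = new RemoteScrapingSession("Shopping Site");
        try
        {
            // Give up after 60 minutes.
            session.Timeout = 60;

            // Make a value available to scripts inside the scraping session.
            session.SetVariable("PAGE", "1");

            // Run the scraping session on the server.
            session.Scrape();

            if (session.SessionTimedOut)
            {
                Console.WriteLine("The scraping session timed out.");
            }
            else
            {
                // Assumption: the return value needs converting to string.
                string page = session.GetVariable("PAGE") as string;
                Console.WriteLine("PAGE is now: " + page);
            }
        }
        finally
        {
            // Always disconnect so screen-scraper can clean up.
            session.Disconnect();
        }
    }
}
```

Calling Disconnect in a finally block is a design choice of this sketch, made so that the connection is released even if the scrape throws an exception.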
It is also possible to store data sets and data records in session variables, which can then be accessed via the RemoteScrapingSession class. Data set objects are analogous to database result sets, and data records are analogous to individual records within a result set. When an extractor pattern is applied, a data set of data record objects is generated. Storing the resulting data set in a session variable (within a screen-scraper script) allows it to be accessed via a RemoteScrapingSession.GetVariable call.
The data record class (Screenscraper.DataRecord) simply extends the .NET Hashtable class (System.Collections.Hashtable).
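Because a DataRecord is a Hashtable, its fields can be read with the ordinary Hashtable members. The sketch below illustrates this under several assumptions: a script in the scraping session has stored a single DataRecord in a session variable named "PRODUCT" (a hypothetical name), the record has a "NAME" field, and the value returned by GetVariable can be converted with "as DataRecord".

```csharp
using System;
using System.Collections;
using Screenscraper;

class DataRecordExample
{
    static void Main()
    {
        RemoteScrapingSession session = new RemoteScrapingSession("Shopping Site");
        try
        {
            session.Scrape();

            // "PRODUCT" is a hypothetical variable name; a script in the
            // scraping session would have to store a DataRecord under it.
            DataRecord product = session.GetVariable("PRODUCT") as DataRecord;
            if (product == null)
            {
                Console.WriteLine("No DataRecord was stored in PRODUCT.");
                return;
            }

            // Treat the record as its Hashtable base class.
            Hashtable fields = product;

            // Look up a single field by key.
            Console.WriteLine("Name: " + fields["NAME"]);

            // Enumerate every field in the record.
            foreach (DictionaryEntry field in fields)
            {
                Console.WriteLine("{0} = {1}", field.Key, field.Value);
            }
        }
        finally
        {
            session.Disconnect();
        }
    }
}
```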
DataSet Methods
The following is a reference for all of the methods found in the DataSet class (Screenscraper.DataSet).
- AllDataRecords. A property that returns all of the DataRecord objects in an ArrayList.
- DataRecord this[ int dataRecordNumber ]. An indexer that returns the DataRecord at position dataRecordNumber, which contains the data extracted by a single application of an extractor pattern.
DataRecord product = products[i];
- int NumDataRecords. A property that gets the number of DataRecord objects held by this DataSet object.
- string this[ int dataRecordNumber, string identifier ]. A second indexer that returns the single item of data identified by identifier from the DataRecord at position dataRecordNumber.
String productName = products[i, "NAME"];
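The DataSet members above could be used to walk through the results of a scrape, as sketched here. The sketch assumes that a script in the scraping session has stored a DataSet in the "PRODUCTS" session variable, that each record has a "NAME" field, that record positions start at 0, and that the value returned by GetVariable can be converted with "as DataSet".

```csharp
using System;
using Screenscraper;

class DataSetExample
{
    static void Main()
    {
        RemoteScrapingSession session = new RemoteScrapingSession("Shopping Site");
        try
        {
            session.Scrape();

            // Assumption: a script saved the extracted DataSet as PRODUCTS.
            DataSet products = session.GetVariable("PRODUCTS") as DataSet;
            if (products == null)
            {
                Console.WriteLine("No DataSet was stored in PRODUCTS.");
                return;
            }

            // Assumption: record positions are zero-based.
            for (int i = 0; i < products.NumDataRecords; i++)
            {
                // Two-argument indexer: one field from one record.
                string productName = products[i, "NAME"];

                // One-argument indexer: the whole record.
                DataRecord product = products[i];

                Console.WriteLine("{0}: {1} ({2} fields)", i, productName, product.Count);
            }
        }
        finally
        {
            session.Disconnect();
        }
    }
}
```

Count comes from the Hashtable base class of DataRecord and is used here only to show that the full record is available.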
Examples
For an example of using the .NET driver, please see Tutorial 4: Scraping an E-commerce Site from External Programs.