Invoking screen-scraper from Ruby
Overview
A Ruby script interacts with screen-scraper via a Ruby class called RemoteScrapingSession. You can use this class by requiring the file remote_scraping_session.rb (found in the misc/ruby directory of your screen-scraper installation) in your Ruby script.
screen-scraper must be running as a server before a Ruby script can invoke it.
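Because a script can do nothing useful if the server is not up, it may help to check for it before creating a session. The sketch below is a hypothetical helper, screen_scraper_running?, which is not part of remote_scraping_session.rb. It only tests whether something accepts TCP connections at the given host and port; it assumes the server listens for plain TCP connections there and does not verify that the listener is actually screen-scraper. Its defaults mirror the RemoteScrapingSession defaults (localhost, port 8778).

```ruby
require "socket"

# Hypothetical helper (not part of remote_scraping_session.rb).
# Returns true if a TCP connection to host:port succeeds within `timeout`
# seconds. Defaults match RemoteScrapingSession's defaults.
def screen_scraper_running?(host = "localhost", port = 8778, timeout: 2)
  # With a block, Socket.tcp yields the connected socket, closes it
  # afterwards, and returns the block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, host unreachable, or name not resolved.
  false
end
```

A script could call this first and abort with a clear message when it returns false, rather than failing later inside the session.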
Methods
The following is a reference for all of the methods found in the RemoteScrapingSession class.
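- initialize( name ). Initializes a RemoteScrapingSession identified by name. If this constructor is called, the default host (localhost) and port (8778) will be used. As with any Ruby class, the constructor is invoked through new rather than by calling initialize directly.
session = RemoteScrapingSession.new( "Shopping Site" )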
- initialize( name, host, port ). Initializes a RemoteScrapingSession identified by name that connects to the server found at host listening on port.
session = RemoteScrapingSession.new( "Shopping Site", "192.168.0.5", 8778 )
- setVariable( var_name, value ). Sets a session variable using the given var_name and value.
session.setVariable( "PAGE", "1" )
- scrape. Starts the scraping session. This is equivalent to clicking the Run Scraping Session button on the scraping session's General tab in screen-scraper.
session.scrape()
- getVariable( var_name ). Gets the value of a session variable that was set during the course of the scraping session. If the object identified by var_name is a data record, a hash (associative array) will be returned. If it is a data set, an array of hashes, one per data record, will be returned.
Currently only Strings, DataRecords, and DataSets can be accessed by this method.
session.getVariable( "PRODUCTS" )
- setBufferSize( buffer_size ). Explicitly sets the size of the buffer (in bytes) that will be used when reading data from screen-scraper. The default buffer size is 1024 bytes, so if you're anticipating a large amount of data (such as when receiving a full data set) you'll want to increase this value.
session.setBufferSize( 64000 )
- resetBufferSize. Resets the size of the buffer back to its default size of 1024 bytes.
session.resetBufferSize
- isError. Indicates whether or not an error has occurred in the scraping process.
session.isError
- getErrorMessage. Returns the last error message returned from the server, if one was returned.
session.getErrorMessage
- disconnect. Disconnects from the remote server. This should be called once a scraping session is complete so that system resources can be freed up.
session.disconnect
- getNumDataRecordsInDataSet( data_set_name ). Returns the number of data records found in the data set named by data_set_name.
session.getNumDataRecordsInDataSet( "PRODUCTS" )
- getDataRecordFromDataSet( data_set_name, index ). Returns a single data record (a hash) from the data set named by data_set_name at the given index.
session.getDataRecordFromDataSet( "PRODUCTS", 2 )
- setDoLazyScrape( doLazyScrape ). Indicates whether or not a scraping session should be run in a separate thread. By default this value is false.
Calling this method has an effect only if it is done before the scrape method is called. If this value is set to true, control returns to the script as soon as scrape is called, while the scraping session continues to run in screen-scraper.
session.setDoLazyScrape( true )
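Since getVariable can hand back different kinds of value, code that consumes it may want to branch on the Ruby class of the result. The following is a minimal sketch, assuming, as the getVariable entry describes, that a data record arrives as a Ruby Hash (associative array) and a data set as an Array of them. The helper name records_from is hypothetical. It normalizes a record or data set into an array of records so the caller can iterate uniformly, and rejects anything else, such as a plain String.

```ruby
# Hypothetical helper (not part of RemoteScrapingSession).
# Converts a getVariable result into an Array of Hashes:
#   Hash  (data record) -> wrapped in a one-element Array
#   Array (data set)    -> returned as-is, after checking each element
# Anything else (e.g. a String) raises, because it is not record data.
def records_from(value)
  case value
  when Hash
    [value]
  when Array
    unless value.all? { |record| record.is_a?(Hash) }
      raise ArgumentError, "data set contains an element that is not a Hash"
    end
    value
  else
    raise ArgumentError, "expected a data record or data set, got #{value.class}"
  end
end
```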
Examples
For an example of using the Ruby driver please see Tutorial 4: Scraping a Shopping Site from External Programs.
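The tutorial remains the reference example. As a shorter illustration, the sketch below strings the methods above together in the usual order: set variables, scrape, check for errors, read a data set, and always disconnect. It is written against any object that responds to those method names, so it can be tried without a server; in a real script the session argument would be a connected RemoteScrapingSession. The helper name scrape_data_set and the 64000-byte buffer are illustrative assumptions, and so is the zero-based indexing of getDataRecordFromDataSet, which the method reference does not specify. It deliberately leaves lazy scraping off, because with setDoLazyScrape( true ) the results may not be ready when scrape returns.

```ruby
# Hypothetical helper: runs a scraping session and returns the data records
# of one data set as an Array of Hashes. `session` must respond to the
# RemoteScrapingSession methods called below.
def scrape_data_set(session, data_set_name, variables = {}, buffer_size: 64_000)
  # Enlarge the read buffer before fetching a full data set (default is 1024 bytes).
  session.setBufferSize(buffer_size)
  variables.each { |name, value| session.setVariable(name, value) }

  session.scrape
  raise "screen-scraper error: #{session.getErrorMessage}" if session.isError

  # Integer() accepts either an Integer or a numeric String.
  count = Integer(session.getNumDataRecordsInDataSet(data_set_name))
  # Assumes zero-based indices; adjust if the server expects 1..count.
  (0...count).map { |i| session.getDataRecordFromDataSet(data_set_name, i) }
ensure
  # Free server-side resources whether or not the scrape succeeded.
  session.disconnect
end
```

Putting disconnect in an ensure clause means the connection is released even when the error check raises.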