This section is provided to give additional information about the software, how it works, and the technologies behind screen-scraper. Many of these pages contain links that are not under our company's control. We chose them for their quality at the time of writing. If a link breaks or its content changes, we would appreciate it if you contacted us so that we can keep the links relevant.
As of screen-scraper 5.0, simple code completion is available when writing scripts. It is meant to help you remember method names and their parameters: it shows this information along with a link to the documentation for each method.
To activate the dialog, type the name of a built-in object followed by a period, just as you would when coding. If you pause after the period, the dialog pops up and lets you browse the object's methods. As you keep typing, the list narrows until it shows the method you are looking for. Double-click a method, or press Tab while it is selected, to insert the rest of the method call into your script with placeholders for its parameters. Type a value for each parameter and press Tab to jump to the next one. After the last parameter, pressing Tab moves the cursor to the end of the method call.
In addition to code completion, there are a number of built-in macros for common tasks. To activate a macro, type its code and then press the spacebar while holding down the Ctrl key.
Overview
This feature is only available in the Enterprise edition of screen-scraper.
screen-scraper can automatically generate RSS and Atom feeds from extracted data. If you're unfamiliar with RSS and Atom feeds, you may want to take a minute to read up on the topic first.
The documentation on this page is fairly abstract. If you're interested in building RSS/Atom feeds with screen-scraper, we recommend working through our Sixth Tutorial, which walks you through the process in detail.
How it Works
A small web server runs within screen-scraper and interacts with the scraping engine. Requesting a URL from a browser or an RSS/Atom reader causes screen-scraper to run a scraping session and then return an RSS or Atom feed.
The basic syntax for the URL you'll use to generate a feed looks like this:
For example, if you were running screen-scraper on your local machine and wanted to generate a feed for the "Shopping Site" example used in our tutorials with the search term "bug", the URL would look like this:
As with any other URL, each of the parameters must be properly URL-encoded. Key/value pairs can also be passed in as POST parameters.
The only required parameter is "scraping_session". screen-scraper will create session variables out of any other parameters that get passed in.
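When feed URLs are generated by another program rather than typed by hand, the URL-encoding rule above is easy to get wrong. The Python sketch below builds such a URL with the standard library's urllib.parse.urlencode. It assumes a placeholder base URL (host, port, and path are not taken from this page; substitute the feed URL syntax for your installation), and the helper names and the SEARCH session variable are hypothetical.

```python
from urllib.parse import urlencode


def build_feed_url(base_url, scraping_session, **session_vars):
    # base_url is a hypothetical placeholder for your installation's feed URL.
    if not scraping_session:
        raise ValueError("scraping_session is required")
    # "scraping_session" is the only required parameter; screen-scraper turns
    # every other key/value pair into a session variable.
    params = {"scraping_session": scraping_session, **session_vars}
    # urlencode percent-encodes keys and values (spaces become "+").
    query = urlencode(params)
    # Append with "&" if base_url already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


def build_feed_post_body(scraping_session, **session_vars):
    # The same key/value pairs, encoded as an
    # application/x-www-form-urlencoded body for a POST request.
    params = {"scraping_session": scraping_session, **session_vars}
    return urlencode(params).encode("ascii")
```

For instance, `build_feed_url("http://localhost:8080/feed", "Shopping Site", SEARCH="bug")` returns `http://localhost:8080/feed?scraping_session=Shopping+Site&SEARCH=bug`, where the host, port, path, and variable name are only illustrative.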
Setting Up the Scraping Session
The scraping session must have certain named elements present in order to generate the feed. They are as follows:
When the XML feed is requested through your browser or reader, screen-scraper runs the scraping session named by the "scraping_session" parameter. Once the scraping session completes, screen-scraper looks for a DataSet called "XML_FEED" and iterates over its DataRecord objects, building the feed from them.
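screen-scraper builds the feed itself, so you don't need to write this code. The Python sketch below only illustrates the general idea of turning extracted records into RSS 2.0 items: each dictionary stands in for one DataRecord in the "XML_FEED" DataSet, and the field names title, link, and description are assumptions, not necessarily the names screen-scraper requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; screen-scraper defines the ones it actually uses.
ITEM_FIELDS = ("title", "link", "description")


def build_rss(records, title, link, description):
    """Return an RSS 2.0 document built from a list of record dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel-level metadata required by RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # One <item> per record, in the order the records were extracted.
    for record in records:
        item = ET.SubElement(channel, "item")
        for field in ITEM_FIELDS:
            if field in record:
                # ElementTree escapes characters such as "&" and "<".
                ET.SubElement(item, field).text = str(record[field])
    return ET.tostring(rss, encoding="unicode")
```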
The Hypertext Transfer Protocol (HTTP) provides a way for clients such as web browsers to communicate with web servers. Much has been written about it on the web, so for now we'll simply point you to some good links:
Scraping sessions and scripts can be exported from screen-scraper to external files. You might do this to back up your work or to commit it to a version control system such as CVS or Subversion.
To export a scraping session or script to an external file, select the object you wish to export, then click the corresponding Export button (Export Session or Export Script). You'll be asked to save the file to a location of your choice. You may name the file whatever you like, though we recommend leaving the (scraping session) or (script) portion of the name intact so that you can identify the object's type later on. When you export a scraping session, all scripts directly associated with it are exported in the same file.
When a scraping session is exported, the time of export is included in the resulting file, which can help you track versions of the scraping session. To view the date, open the .sss file in a text editor and search for the
To import a scraping session or script into screen-scraper select the Import... option from the . Locate the ".sss" file corresponding to the object you wish to import, and select Open. If you've selected a valid file the objects contained within that file will be imported into the application.
You can also import exported scraping sessions and scripts by copying them into the import folder in the directory where screen-scraper was installed. This is especially useful when screen-scraper is running as a server, because the objects can be imported on the fly (that is, without stopping the server). screen-scraper checks this folder just before running a scraping session and imports any files it finds there. Note that files are removed from the import folder once screen-scraper has imported them.
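On a headless server, this copy step is easy to script. A minimal Python sketch, assuming the import folder sits directly inside the installation directory as described above; the helper name queue_for_import and the file names are hypothetical.

```python
import shutil
from pathlib import Path


def queue_for_import(export_file, install_dir):
    """Copy an exported .sss file into screen-scraper's import folder."""
    # The "import" folder lives in the screen-scraper installation directory.
    import_dir = Path(install_dir) / "import"
    if not import_dir.is_dir():
        raise FileNotFoundError(f"No import folder found at {import_dir}")
    # copy2 keeps the file name and timestamps; screen-scraper removes the
    # copy from the folder after importing it.
    return Path(shutil.copy2(export_file, import_dir))
```

For example, `queue_for_import("Shopping Site (Scraping Session).sss", "/opt/screen-scraper")` would place the export where a running server picks it up before its next scraping session; the installation path here is only illustrative.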
If you want to package scraping sessions and scripts together with other files needed to run a scrape, you can compress them all into an update.zip file. This file should replicate the directory structure of screen-scraper. For example, you might have a folder called import that contains a scraping session. You might also have a CSV file in the root of the zip file that contains parameters needed to run the scraping session. You can zip all of these up into an update.zip file, then place that file inside the update folder found in screen-scraper's install directory. When screen-scraper starts up, it unzips the file, copies its contents to the corresponding locations, then deletes the update.zip file.
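A minimal sketch of building such an archive with Python's standard zipfile module. Each key is a path inside the archive that mirrors the install directory (for example import/... for a scraping session, or a bare file name for the root); the helper name build_update_zip and all file names are hypothetical.

```python
import zipfile
from pathlib import Path


def build_update_zip(zip_path, files):
    """Create an update.zip whose layout mirrors screen-scraper's install directory.

    files maps a path inside the archive (relative to the install directory)
    to the local file that should be stored there.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, local_path in files.items():
            # Store archive paths with forward slashes so the layout is portable.
            zf.write(local_path, arcname=Path(arcname).as_posix())
    return Path(zip_path)
```

A call such as `build_update_zip("update.zip", {"import/Shopping Site (Scraping Session).sss": "exports/shopping.sss", "search_terms.csv": "data/search_terms.csv"})` produces an archive you would then copy into the update folder before starting screen-scraper.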
If you've unchecked the Overwrite on import checkbox for a script and would like to import that script into an instance of screen-scraper running in a GUI-less environment, follow the instructions on script overwriting.
The memory usage indicator was introduced in screen-scraper 4.5 and shows how much of the memory currently allocated to screen-scraper is in use. As screen-scraper needs more, the underlying Java Virtual Machine may allocate it additional memory, up to the amount specified in the settings dialog box.
In the workbench, the indicator is on the far right of the main window's status bar. In the Enterprise Edition's web interface, the indicator is at the top of the page under the Import button.
The current memory usage can also be queried in a script via the getMemoryUsage method.
screen-scraper often runs on a server that has no graphical interface. Updating to the latest version in such an environment used to require multiple steps, but can now be done with a simple Python script.
You can download the script from our site.
Most Unix-based systems already have Python installed. To use the updater, open a terminal and navigate to the screen-scraper install directory. Make sure that screen-scraper is not currently running (you can check with ./server status). Then issue this command to update to the latest version:
If you want to force screen-scraper to upgrade to the latest unstable version, use this command: