One of the most powerful features of screen-scraper is its built-in scripting engine. Through scripting, web sites can be crawled in a very dynamic way. Scripting also allows you to insert business logic, clean and normalize data, and write data out to external repositories, such as files and databases. This section of the documentation will familiarize you with scripting in screen-scraper and cover specifics on the various scripting languages that screen-scraper supports.
Before reading through this section, you might find it helpful to first read the section on using scripts with the scraping engine. Also, if you haven't done so already, we'd highly recommend going through our first few tutorials, which provide several examples of scripting in screen-scraper.
Because screen-scraper runs on Java, file paths in scripts must follow Java's conventions, which in practice means the Unix/Linux style (e.g., /usr/local/file.txt). On a machine that already uses these conventions, paths will look no different to you; on a Windows machine, however, this is an important difference to keep in mind.
Windows uses the backslash (\) as its path separator, but Java treats the backslash as an escape character in strings. That means that on a Windows machine you need to pay closer attention to file paths, as they will look a little different.
On Windows, separate the parts of a file path with either a forward slash (/) or two backslashes (\\).
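The two accepted forms can be compared in a short sketch. It is written as a standalone Java class so that it can be run outside screen-scraper; the class name WindowsPathDemo, the helper toForwardSlashes, and the path C:/data/output.txt are hypothetical and used only for illustration.

```java
final class WindowsPathDemo {
    // Convert backslash separators to forward slashes (hypothetical helper).
    static String toForwardSlashes(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        // Forward slashes need no escaping in a Java string literal.
        String forward = "C:/data/output.txt";
        // Each backslash is doubled in the source; the string itself
        // contains a single backslash at each separator.
        String escaped = "C:\\data\\output.txt";

        System.out.println(forward);  // prints C:/data/output.txt
        System.out.println(escaped);  // prints C:\data\output.txt
        // Both literals name the same location once the separators match.
        System.out.println(toForwardSlashes(escaped).equals(forward));  // prints true
    }
}
```

The forward-slash form is usually the easier one to read and to copy between platforms, since it requires no escaping.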
screen-scraper uses the BeanShell library to allow for scripting in Java. If you've done some programming in C or JavaScript you'll probably find BeanShell's syntax familiar. Documentation for BeanShell is excellent, and we'd recommend referring to it as you program.
"Interpreted Java" simply means Java that is run by an interpreter and so does not need to be compiled first.
See the using scripts and API pages for details on objects and methods that you can make use of in a script. We also use Interpreted Java in all of our tutorials, which should get you familiar with how it's used in screen-scraper.
It is possible to access Java libraries in screen-scraper. See adding Java libraries for more details.
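As a rough illustration of the kind of Java library code a script can draw on, the sketch below cleans a scraped price string using java.math.BigDecimal from the standard library. The class name PriceCleaner, the method parsePrice, and the sample value are hypothetical, and the code is written as a standalone Java class so that it can be run outside screen-scraper.

```java
import java.math.BigDecimal;

final class PriceCleaner {
    // Remove everything except digits, the decimal point, and a minus sign,
    // then parse what is left. Throws NumberFormatException if nothing
    // numeric remains. Assumes "." is the decimal separator.
    static BigDecimal parsePrice(String raw) {
        String digits = raw.replaceAll("[^0-9.\\-]", "");
        return new BigDecimal(digits);
    }

    public static void main(String[] args) {
        // A value as it might appear in scraped HTML, with padding,
        // a currency symbol, and a thousands separator.
        System.out.println(parsePrice("  $1,299.50 "));  // prints 1299.50
    }
}
```

Using BigDecimal rather than double keeps the scale of the original value, which tends to matter when prices are later written to a file or database.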
We use Java in the screen-scraper tutorials but if you would like to learn more about Java you can look for tutorials online. The following are some good Java resources:
This scripting language is no longer available by default. To use it, you will need to edit the AllowUnstableWindowsFeatures property in the screen-scraper.properties file.
Writing scripts in JScript gives you the familiarity of a widely used language, while still providing access to commonly used Windows libraries. Using JScript within screen-scraper can only be done on a Windows platform, and requires that the JScript runtime be installed. The chances are good that you've already got the JScript runtime on your system.
screen-scraper will automatically detect if the JScript runtime is installed, which you can see by selecting a script from the objects tree in the workbench and clicking on the Language drop-down list. If you don't see JScript in the list then the runtime needs to be installed.
If you do not have the JScript runtime on your system, you can download it from Microsoft's script downloads page.
Please be aware that, because of a bug in the third-party library that allows screen-scraper to integrate with the Microsoft Scripting Engine, problems can occur if multiple JScript scripts are run simultaneously. If you're using the professional edition of screen-scraper and plan on running multiple scraping sessions simultaneously, you should use Interpreted Java, JavaScript, or Python as your scripting language.
Because screen-scraper uses the native JScript engine, all ActiveX objects installed on the computer (such as ADO or the FileSystemObject) can be accessed. All of the objects mentioned on the using scripts and API pages are available as well.
Java classes can also be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession and run it:
screen-scraper uses Mozilla's Rhino scripting engine to allow scripts to be written in JavaScript. Documentation for Rhino is sparse, but the interpreter adheres strictly to the established ECMAScript standard, so just about any JavaScript reference will apply. If you try writing scripts in JavaScript and run into difficulties because of the lack of documentation, you may want to consider using Interpreted Java instead, which has very similar syntax and is significantly better documented. If you've worked with client-side JavaScript in web programming, you'll probably be comfortable using JavaScript in screen-scraper.
Java classes referenced from a JavaScript script must be prefaced with the Packages keyword.
This scripting language is no longer available by default. To use it, you will need to edit the AllowUnstableWindowsFeatures property in the screen-scraper.properties file.
screen-scraper uses ActiveState's ActivePerl library for scripts written in Perl. Using Perl within screen-scraper can only be done on a Windows platform, and requires that the ActivePerl runtime be installed.
screen-scraper will automatically detect if the ActivePerl runtime is installed, which you can see by selecting a script from the objects tree in the workbench and clicking on the Language drop-down list. If you don't see Perl in the list then the runtime needs to be installed.
The ActivePerl runtime can be downloaded from ActiveState's download page for free.
Java classes can be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession for the "Weather" scraping session (which is found in the default screen-scraper installation) and run it:
screen-scraper uses the Jython interpreter for scripting in Python. Jython is a very fast interpreter, and we'd recommend using it if you're familiar with the Python programming language.
Importing your externally-compiled classes is as easy as placing them in the ./lib/ext folder of your installation. The Jython interpreter will automatically include that folder on your PythonPath.
The generator objects are implemented in Jython, and the folders lib/ext, lib/jython-lib, and lib/jython-lib/site-packages are included in Python's system path.
When scripting in Python all of the standard Java classes can be used. Classes must be imported using the Java package hierarchy of screen-scraper, which is also required if you'd like to create one of screen-scraper's RunnableScrapingSession objects. Here's an example that will run a scraping session called "Weather":
Notice that before the RunnableScrapingSession class can be used it first must be imported.
This scripting language is no longer available by default. To use it, you will need to edit the AllowUnstableWindowsFeatures property in the screen-scraper.properties file.
If you've programmed in Visual Basic or Active Server Pages you should find scripting in screen-scraper to be similar. Using VBScript within screen-scraper can only be done on a Windows platform, and requires that the VBScript runtime be installed. The chances are good that you've already got the VBScript runtime on your system.
screen-scraper will automatically detect if the VBScript runtime is installed, which you can see by selecting a script from the objects tree in the workbench and clicking on the Language drop-down list. If you don't see VBScript in the list then the runtime needs to be installed.
If you do not have the VBScript runtime on your system, you can download it from Microsoft's script downloads page.
Please be aware that, because of a bug in the third-party library that allows screen-scraper to integrate with the Microsoft Scripting Engine, problems can occur if multiple VBScript scripts are run simultaneously. If you're using the professional edition of screen-scraper and plan on running multiple scraping sessions simultaneously, you should use Interpreted Java, JavaScript, or Python as your scripting language.
Because screen-scraper uses the native VBScript engine, all ActiveX objects installed on the computer (such as ADO or the FileSystemObject) can be accessed. All of the objects mentioned on the using scripts and API pages are available as well.
Java classes can also be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession and run it: