If you've decided to use the Basic Edition of screen-scraper, your only option for invoking screen-scraper externally is the command line (command-line invocation is also available in the Professional and Enterprise Editions). If you are using a Professional or Enterprise Edition of screen-scraper and have access to a server that supports ASP, PHP, ColdFusion, or Java, you can continue the tutorial by selecting the language you want to use at the bottom of the page.
The rest of this page is specific to completing the tutorial using a server language. If you are using the Basic Edition of screen-scraper, you are welcome to read on, but you will not be able to complete the tasks.
Oftentimes you'll want to use a language or platform external to screen-scraper to scrape data. screen-scraper can be controlled externally using Java, PHP, Ruby, Python, .NET, ColdFusion, any COM-friendly language (such as Active Server Pages or Visual Basic), or any language that supports SOAP. In this next part of the tutorial we'll give examples in PHP, Java, ColdFusion, and Active Server Pages.
Before you can interact with screen-scraper externally, it needs to be running as a server. When running as a server, screen-scraper acts much like a database server does: it listens for requests from external sources, services those requests, and sends back responses. For example, when you issue a SQL statement to a database from an ASP script, your script opens a socket to the database, sends the request over it, then receives the database's response back over the socket. Once this transaction has completed the socket is closed, but the database continues to listen for other requests. screen-scraper works in a similar way.
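The listen, service, respond cycle described above can be sketched in plain Java. This is only an illustration of the general pattern and not screen-scraper's actual protocol: the class name ListenerSketch, the one-line text exchange, and the "handled: " reply are all assumptions made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT screen-scraper's real protocol.
final class ListenerSketch {

    // Server side: accept one connection at a time, answer its single
    // request, close that connection, then go back to listening.
    static void serve(ServerSocket server, int connections) throws IOException {
        for (int i = 0; i < connections; i++) {
            try (Socket client = server.accept();
                 BufferedReader in = reader(client);
                 PrintWriter out = writer(client)) {
                out.println("handled: " + in.readLine());
            } // only this client's socket closes; the server socket stays open
        }
    }

    // Client side: open a socket, send one request, read the reply, close.
    static String request(int port, String message) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = writer(socket);
             BufferedReader in = reader(socket)) {
            out.println(message);
            return in.readLine();
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so println() sends the line immediately
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Starts a listener on a free local port and sends each message
    // over its own short-lived connection, collecting the replies.
    static List<String> demo(String... messages) {
        try (ServerSocket server =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread listener = new Thread(() -> {
                try {
                    serve(server, messages.length);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            listener.start();
            List<String> replies = new ArrayList<>();
            for (String message : messages) {
                replies.add(request(server.getLocalPort(), message));
            }
            listener.join();
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        demo("first request", "second request").forEach(System.out::println);
    }
}
```

Each call to request opens and closes its own socket, while the single ServerSocket outlives both exchanges. That is the same relationship your external script will have with screen-scraper running as a server.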
At this point we'd recommend reading the documentation page that discusses running screen-scraper as a server and gives details on how to start and stop it on your platform. Follow the link below, then return to this page when you're finished:
Before we start writing code to interact with screen-scraper externally, we need to configure a few things. Depending on the language you'd like to program in, please follow one of the links below, which will give you an overview of interacting with screen-scraper using that language and guide you through any configuration that needs to take place. Once you're finished, return to this page.
The ASP script we'll be using will invoke our scraping session remotely, passing in a value for the TEXT_TO_SUBMIT session variable. Create a new ASP script on your computer, and paste the following code into it:
After creating our RemoteScrapingSession object we make a separate call to initialize it. This is required for ASP. Also, you'll notice that before calling the Scrape method we check for any errors that may have occurred up to this point.
If for some reason your ASP script can't connect to the server you'd want to know before you tried to tell it to scrape.
Finally, the script explicitly disconnects from the server so that it knows we're done.
OK, we're ready to give our script a try. Make sure that screen-scraper is running in server mode. If you've succeeded in starting up the server, go ahead and load your ASP script in a browser. After a short pause you should see "Scraped text: Hi everybody!" output to your browser.
If there was an error then a message indicating the problem that occurred will be displayed.
We'll be creating two different scripts to interact with screen-scraper via ColdFusion. The first will be using ColdFusion tags, and the second will be using ColdFusion script. Each of these scripts will invoke our scraping session remotely and pass in a value for the TEXT_TO_SUBMIT session variable.
If you have not already configured ColdFusion to run with screen-scraper, now is a good time to set it up.
Create a new ColdFusion script on your computer, and paste the following code into it:
If you prefer using ColdFusion script to program, you can use the following code instead of the code we give above:
You can probably follow the logic but for clarity let's take a moment to look at it. This script creates a RemoteScrapingSession, initializes it to be connected to the Hello World scraping session, sets the TEXT_TO_SUBMIT session variable, then scrapes the page and explicitly disconnects.
OK, we're ready to give our ColdFusion script a try. Start screen-scraper running in server mode. If you've succeeded in starting up the server go ahead and access your ColdFusion script from your browser. After a short pause you should see textReturned: Hi everybody! appear.
The Java class we'll be writing will simply substitute for the Initialize scraping session script we wrote previously. That is, our Java class will invoke our scraping session remotely and pass in a value for the TEXT_TO_SUBMIT session variable. Create a new Java class on your computer, and paste the following code into it:
This Java code is virtually identical to our Initialize scraping session script. The one notable difference is that we need to explicitly disconnect from the server so that it knows we're done.
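The flow just described (set the session variable, scrape, read the result, then disconnect) can be sketched as follows. So that it compiles without screen-scraper.jar, this sketch uses a hypothetical stand-in interface named ScrapingClient and an in-memory FakeSession instead of the real RemoteScrapingSession class; the method names mirror the steps this page describes but are assumptions here, as is the FORM_SUBMITTED_TEXT variable name.

```java
import java.util.HashMap;
import java.util.Map;

final class HelloWorldSketch {

    // Hypothetical stand-in for the remote session calls described on this
    // page; the real class ships in screen-scraper.jar and may differ.
    interface ScrapingClient {
        void setVariable(String name, String value);
        void scrape();
        String getVariable(String name);
        void disconnect();
    }

    // Sets the input variable, scrapes, reads the result, and always
    // disconnects so the server knows we're done.
    static String run(ScrapingClient session, String text) {
        try {
            session.setVariable("TEXT_TO_SUBMIT", text);
            session.scrape();
            // FORM_SUBMITTED_TEXT is an assumed name for the extracted value.
            return session.getVariable("FORM_SUBMITTED_TEXT");
        } finally {
            session.disconnect(); // runs even if scraping throws
        }
    }

    // In-memory fake so the sketch runs without a server:
    // "scraping" simply echoes back the submitted text.
    static final class FakeSession implements ScrapingClient {
        private final Map<String, String> variables = new HashMap<>();
        boolean connected = true;

        public void setVariable(String name, String value) {
            variables.put(name, value);
        }

        public void scrape() {
            variables.put("FORM_SUBMITTED_TEXT", variables.get("TEXT_TO_SUBMIT"));
        }

        public String getVariable(String name) {
            return variables.get(name);
        }

        public void disconnect() {
            connected = false;
        }
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        System.out.println("Scraped text: " + run(session, "Hi everybody!"));
        System.out.println("Still connected: " + session.connected);
    }
}
```

Placing the disconnect call in a finally block is one way to make sure the server is told we're done even when the scrape fails partway through.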
OK, we're ready to give our Java class a try. After you've successfully compiled the class (remember to include the "screen-scraper.jar" file in your classpath), start screen-scraper running as a server. If you've succeeded in starting up the server go ahead and run the Java class from a command prompt or console. After a short pause you should see the "Hi everybody!" message output.
The PHP script we'll be writing will invoke our scraping session remotely, passing in a value for the TEXT_TO_SUBMIT session variable. Create a new PHP script on your computer, and paste the following code into it:
After creating our RemoteScrapingSession object we make a separate call to initialize it for our specific scraping session. Before calling the Scrape method we check for any errors that may have occurred up to this point.
If for some reason your PHP script can't connect to the server you'd want to know before you tried to tell it to scrape.
Finally, we explicitly disconnect from the server so that it knows we're done.
OK, we're ready to give our script a try. Start screen-scraper running as a server.
Make sure that the remote_scraping_session.php file has been copied to the same directory as your PHP script (the file can be found in the misc/php folder of screen-scraper's installation directory).
If you've succeeded in starting up the server go ahead and load your PHP script in a browser. After a short pause you should see the following in the browser output:
If you've decided to use the Basic Edition of screen-scraper, this is your only option for invoking screen-scraper externally (invoking screen-scraper from the command line is also available in the Professional and Enterprise Editions).
You can find full documentation and examples on using the command line in our Invoking screen-scraper from the command line documentation.
In order to invoke screen-scraper from the command line, you'll need to create a batch file (in Windows) or a shell script (in Linux or Mac OS X) to invoke the scraping session.
If you have not disabled the Initialize scraping session script then please do so now. Instructions on how to do this can be found on the previous page.
If you're using Windows open a text editor (e.g., Notepad) and enter the following:
Save the batch file (call it hello_world.bat) in the folder where screen-scraper is installed (e.g., C:\Program Files\screen-scraper professional edition\).
If the version of screen-scraper you're running is prior to 4.5, and you're running Windows Vista, you will need to save your batch file to a location such as your Documents folder or your Desktop. Then, within Windows Explorer, manually transfer the file to the directory where screen-scraper is installed.
If you're running Linux, the shell script would look like this:
Save the shell script (call it hello_world.sh) in the folder where screen-scraper is installed (e.g., /usr/local/screen-scraper professional edition/).
For Mac OS X, you'd use this for the script:
Save the shell script (call it hello_world.sh) in the folder where screen-scraper is installed (e.g., /Users/username/screen-scraper professional edition/).
Open a DOS prompt (cmd) by clicking on the Start menu, selecting the Run option (on the right), and typing cmd.
On Windows 7, just type command into the Start menu search box, then click on cmd.
Navigate to the screen-scraper installation directory using the cd command. It should resemble:
Once you are in the correct directory, run the file by simply typing its name into cmd:
You should see the text from screen-scraper's log appear in the DOS window.
If you're running Linux or Mac OS X, you'll need to close the workbench before invoking your shell script.
Open Terminal and navigate to the screen-scraper installation directory using the cd command. One example is shown below:
Once you are in the correct directory, run the file:
As with the first tutorial and our test run, you can confirm that the script has run by opening the form_submitted_text.txt file in the screen-scraper installation directory. You can also try editing your batch file or shell script and running it again to have it say something else. Have some fun!