This tutorial illustrates invoking screen-scraper from other programs in ways more complex than those presented in Tutorial 3. From our external program we'll pass search parameters to screen-scraper, invoke the scraping process, retrieve the scraped data, then iterate over it and output it within our application.
Before proceeding it would be a good idea to go through Tutorial 3, if you haven't done so already.
If you haven't gone through Tutorial 3, or no longer have the scraping session you created in it, you can download it and import it into screen-scraper.
This tutorial requires the Professional or Enterprise edition of screen-scraper. It also requires access to a server (remote or local) that can run one of the external scripting languages that screen-scraper has drivers for: ASP, C#.NET, ColdFusion, Java, PHP, Python, Ruby, or VB.NET.
If you'd like to see the final version of the scraping session you'll be creating in this tutorial you can download it below.
| Attachment | Size |
| --- | --- |
| Shopping Site (Scraping Session).sss | 11.63 KB |
screen-scraper can be invoked from software applications written in most modern programming languages, including Java, Active Server Pages, PHP, .NET, and anything that supports SOAP. In this tutorial we'll give some examples of applications that do just that.
Our application will pass screen-scraper parameters corresponding to login information, as well as a key phrase to search for. As in the third tutorial, for the sake of the example we're going to pretend that the web site requires us to log in before we can search. Once we pass the parameters to screen-scraper we'll tell it to start scraping, and it will run the scraping session using the parameters we gave it. Once it's done, we'll ask it for the extracted information and output it for the user to see.
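To make that flow concrete, here is a minimal sketch of the calling side in Java (the other supported languages follow the same pattern). This is not the downloadable example: the RemoteScrapingSession class and its methods come from screen-scraper's Java driver, while the port number, credential values, and the SEARCH variable name are placeholders/assumptions. Retrieving and printing the extracted products is sketched a little further down, once we've set up the session variable that holds them.

```java
// Minimal sketch: pass parameters to screen-scraper and run the scrape.
// Assumes screen-scraper is running as a server on localhost (8778 is assumed
// to be the default port) and that the Java driver is on the classpath.
import com.screenscraper.scraper.RemoteScrapingSession;

public class ShoppingSketch
{
    public static void main( String[] args ) throws Exception
    {
        RemoteScrapingSession session = new RemoteScrapingSession( "Shopping Site", "localhost", 8778 );

        // These values get substituted for the ~#EMAIL_ADDRESS#~ and ~#PASSWORD#~
        // tokens set up in the next step; SEARCH is the keyword to search for
        // (the names and values here are placeholders).
        session.setVariable( "EMAIL_ADDRESS", "user@example.com" );
        session.setVariable( "PASSWORD", "secret" );
        session.setVariable( "SEARCH", "bug" );

        // Blocks until the scraping session finishes on the server.
        session.scrape();

        // Retrieving the extracted products is sketched after the next step.
        session.disconnect();
    }
}
```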
Before we begin we'll first need to make a couple of minor changes to the Shopping Site scraping session from the third tutorial. If you haven't already, start up screen-scraper.
Under the Shopping Site scraping session click on the Login scrapeable file, then on the Parameters tab. We're going to alter the email_address and password POST parameters so that we can pass those values in rather than hard-coding them. For the email_address parameter change the value [email protected] to ~#EMAIL_ADDRESS#~, and change the value of the password parameter (testing) to ~#PASSWORD#~.
Remember that tokens surrounded by the ~# #~ delimiters indicate that the value of a session variable should be inserted. For example, in our case we're going to create an EMAIL_ADDRESS session variable and give it the value [email protected] so that screen-scraper substitutes it for the corresponding POST parameter at runtime.
To simplify the process of giving an external script access to the extracted product details, we will save the data set into a session variable.
Click on the Details page scrapeable file. On the PRODUCTS extractor pattern, select the Advanced tab and check the box next to Automatically save the data set generated by this extractor pattern in a session variable.
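With that box checked, the external program can pull the entire PRODUCTS data set back out of the session once the scrape completes. Continuing the Java sketch from above, the retrieval might look roughly like this; the DataSet and DataRecord classes come from the Java driver, while the TITLE and PRICE field names are assumptions about what the extractor pattern captures.

```java
// Sketch of reading back the data set that the PRODUCTS extractor pattern
// saved into a session variable (the field names are assumptions).
import com.screenscraper.common.DataRecord;
import com.screenscraper.common.DataSet;
import com.screenscraper.scraper.RemoteScrapingSession;

public class ProductPrinter
{
    // Call this after session.scrape() and before session.disconnect().
    static void printProducts( RemoteScrapingSession session ) throws Exception
    {
        DataSet products = (DataSet)session.getVariable( "PRODUCTS" );
        for( int i = 0; i < products.getNumDataRecords(); i++ )
        {
            DataRecord record = products.getDataRecord( i );
            System.out.println( record.get( "TITLE" ) + "\t" + record.get( "PRICE" ) );
        }
    }
}
```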
The code that we'll be writing in our external application will essentially take the place of the Shopping Site--initialize session script. Let's disable the association since it would otherwise overwrite the values we'll be passing in externally.
To do that click on the Shopping Site scraping session in the objects tree and un-check the Enabled checkbox for the Shopping Site--initialize session script.
Save your changes and exit screen-scraper. Then, so that the external scripts will be able to interact with screen-scraper, start it running as a server.
Where you go next depends on which programming language you're interested in. Select the link below that corresponds to the language you will be using.
In order to invoke screen-scraper from ASP, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow that link, then come back here.
Right-click and download the shopping.asp file, then save it to a directory where it will be web-accessible (i.e., within your IIS web dir).
Open up your web browser and go to the URL corresponding to the shopping.asp file (e.g., "http://localhost/screen-scraper/shopping.asp"). You'll see a simple search form. Type in a product keyword, such as bug, then hit the Go button. If all goes well, the page will take a little while to load (it's waiting as screen-scraper extracts the data), then it will output the corresponding products.
If that didn't go quite as you expected here are some things to check:
Assuming the test worked, fire up your favorite ASP editor and open the shopping.asp file in it. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing our COM documentation or posting to our forum.
When you invoke screen-scraper as a server it creates a log file in its log folder for each run of your scraping sessions. Find your Shopping Site log in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
In order to invoke screen-scraper from C#.NET, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow the link, then return here.
Right-click and download the shopping.cs file. Move it into the desired directory.
From your .NET environment compile and execute the shopping.cs file.
If that didn't go quite as you expected here are some things to check:
Assuming that test worked, take a closer look over the shopping.cs class. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing our .NET documentation or posting to our forum.
When you invoke screen-scraper as a server it creates log files for your scraping sessions in its log folder. Find your Shopping Site log file in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
In order to invoke screen-scraper from ColdFusion, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow the link, then return here.
Download the shopping.cfm file, then save it in a directory that will be accessible from your web server. Rename the file from shopping.cfm.txt to shopping.cfm.
Open up your web browser and go to the URL corresponding to the shopping.cfm file (e.g., "http://localhost/screen-scraper/shopping.cfm"). You'll see a simple search form. Type in a product keyword, such as bug, then hit the Go button. If all goes well, the page will take a little while to load (it's waiting as screen-scraper extracts the data), then it will output the corresponding products.
If that didn't go quite as you expected here are some things to check:
Assuming that test worked, fire up your favorite ColdFusion editor and open the shopping.cfm file in it. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing ColdFusion documentation or posting to our forum.
When you invoke screen-scraper as a server it creates log files for your scraping sessions in its log folder. Find your Shopping Site log file in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
In order to invoke screen-scraper from Java, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow the link, then return here.
Before we dig into the code let's review a few things related to invoking screen-scraper via Java. First, your Java code will need to have two jars in its classpath: screen-scraper.jar (found in the root screen-scraper install folder) and log4j.jar (found in screen-scraper's lib folder). For convenience we've packaged all of the files you'll need. Download the file and unzip it. You'll notice that we also include an Ant build file that you can use to compile and run the sample class.
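Once those two jars are on the classpath, the driver classes used in the earlier sketches resolve. The package paths below are an assumption based on the bundled Java driver; Shopping.java itself is the authoritative reference.

```java
// screen-scraper.jar supplies the remote-session classes (log4j.jar is used
// internally for logging); the package paths shown here are assumed, so
// check the imports in Shopping.java if they don't resolve.
import com.screenscraper.scraper.RemoteScrapingSession;
import com.screenscraper.common.DataSet;
import com.screenscraper.common.DataRecord;
```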
If you're using Ant, simply type ant run at a command prompt in the folder where the build.xml file is found.
If that didn't go quite as you expected here are some things to check:
Assuming that test worked, fire up your favorite Java editor and open the Shopping.java file in it. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing our Java documentation or posting to our forum.
When you invoke screen-scraper as a server it creates log files for your scraping sessions in its log folder. Find your Shopping Site log file in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
In order to invoke screen-scraper from PHP, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow that link, then come back here.
Your PHP code will need to refer to screen-scraper's PHP driver, called remote_scraping_session.php. You can find this file in the misc\php\ folder of your screen-scraper installation. You'll want to copy the file into the directory where you plan on putting the PHP file that will invoke screen-scraper.
Download the shopping.php file and then save it in the same directory where you copied the remote_scraping_session.php file. Rename the file from shopping.php.txt to shopping.php.
Open up your web browser and go to the URL corresponding to the shopping.php file (e.g., "http://localhost/screen-scraper/shopping.php"). You'll see a simple search form. Type in a product keyword, such as bug, then hit the Go button. If all goes well, the page will take a little while to load (it's waiting as screen-scraper extracts the data), then it will output the corresponding products.
If that didn't go quite as you expected here are some things to check:
Assuming that test worked, fire up your favorite PHP editor and open the shopping.php file in it. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing the PHP documentation or posting to our forum.
When you invoke screen-scraper as a server it creates log files for your scraping sessions in its log folder. Find your Shopping Site log file in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
In order to invoke screen-scraper from Python, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow the link, then return here.
Your Python code will need to refer to screen-scraper's Python driver, called remote_scraping_session.py. You can find this file in the misc\python\ folder of your screen-scraper installation. You'll want to put a copy of the file into the directory where you plan on putting the Python file that will invoke screen-scraper.
Download the shopping.py file, then save it in the same directory where you copied the remote_scraping_session.py file. Rename the file from shopping.py.txt to shopping.py.
Run the command python shopping.py in your console. You'll be asked for a keyword to search for. Type in a product keyword, such as bug, then press the Enter key. If all goes well, the program will pause for a little while (it's waiting as screen-scraper extracts the data), then it will output the corresponding products.
If that didn't go quite as you expected here are some things to check:
Assuming that test worked, fire up your favorite Python editor and open the shopping.py file in it. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing the Python documentation or posting to our forum.
When you invoke screen-scraper as a server it creates log files for your scraping sessions in its log folder. Find your Shopping Site log file in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
In order to invoke screen-scraper from Ruby, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow that link, then come back here.
Your Ruby code will need to refer to screen-scraper's Ruby driver, called remote_scraping_session.rb. You can find this file in the misc\ruby\ folder of your screen-scraper installation. You'll want to copy that file into the directory where you plan on putting the Ruby file that will invoke screen-scraper.
Download the shopping.rb.txt file then save it in the same directory where you copied the remote_scraping_session.rb file. Rename the file from shopping.rb.txt to shopping.rb.
Run the command ruby shopping.rb in your console. You'll be asked for a keyword to search for. Type in a product keyword, such as bug, then press the Enter key. If all goes well, the program will pause for a little while (it's waiting as screen-scraper extracts the data), then it will output the corresponding products.
If that didn't go quite as you expected here are some things to check:
Assuming that test worked, fire up your favorite Ruby editor and open the shopping.rb file in it. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing the Ruby documentation, or posting to our forum.
When you invoke screen-scraper as a server it creates log files for your scraping sessions in its log folder. Find your Shopping Site log file in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
In order to invoke screen-scraper from VB.NET, screen-scraper needs to be running in server mode. If you'd like a refresher on how to start up screen-scraper in server mode go ahead and follow the link, then return here.
Download the shopping.vb file. Rename the file from shopping.vb.txt to shopping.vb. From your .NET environment compile and execute the file.
If that didn't go quite as you expected here are some things to check:
Assuming that test worked, take a closer look over the shopping.vb class. The file is pretty heavily commented, so hopefully it makes sense what's going on. If not, try reviewing our .NET documentation or posting to our forum.
When you invoke screen-scraper as a server it creates log files for your scraping sessions in its log folder. Find your Shopping Site log file in that folder and look through it. It should look similar to what you see when you run scraping sessions in the workbench.
First off, congratulations! You've made it through another tutorial and are progressing in your ability to extract information from the web. The approach outlined in this tutorial works well for relatively small sets of data. When we extract records from the shopping site we're probably not going to get more than 25 or so. Because we checked the Automatically save the data set generated by this extractor pattern in a session variable checkbox for the PRODUCTS extractor pattern, screen-scraper holds the extracted data in memory, which is fine when there are only a handful of products.
Where to next? Well, what would happen if we needed to extract and save large numbers of records? The simple answer is that you need to save them out as they're extracted rather than having screen-scraper keep them in memory. Usually this means either inserting the scraped records into a database or writing them out to a text file.
Tutorial 2 already illustrated how to write the data out to a file, but Tutorial 5 will walk you through saving scraped data to a database (if you're interested in this you might also find this FAQ helpful).
Just remember that if you're writing the data out to a file you'll want to uncheck the box labeled Automatically save the data set generated by this extractor pattern in a session variable for the extractor pattern that pulls out the data you want to save. If it's left checked, screen-scraper will store all of the data in memory, which could exhaust available memory during a long run.
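As a rough illustration, a screen-scraper script written in Interpreted Java, associated with the PRODUCTS extractor pattern and set to run "After each pattern application", could append each record to a file as it's extracted (this is the approach Tutorial 2 walks through). The output path and the TITLE/PRICE field names here are assumptions for the sake of the sketch.

```java
// Sketch of an "After each pattern application" script: write each record out
// as it's extracted instead of accumulating everything in memory.
// dataRecord is the object screen-scraper makes available to scripts run from
// an extractor pattern; the file path and field names are assumptions.
import java.io.FileWriter;

FileWriter out = new FileWriter( "products.txt", true );   // open in append mode
out.write( dataRecord.get( "TITLE" ) + "\t" + dataRecord.get( "PRICE" ) + "\n" );
out.close();
```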
Tutorial 6 will use screen-scraper to create an XML feed from the e-commerce site, while Tutorial 7 will walk through using a file of search terms to run the search scrape multiple times and write the results to a file.
If you don't feel comfortable with the process, we invite you to recreate the scrape using the tutorial only for reference, perhaps working from just the screen shots. If you're still struggling, you can search our forums for others in the same situation or ask the screen-scraper community specific questions.