General Technical

Questions regarding how screen-scraper works or how to get it to do something.

How do I upgrade from version 6.x to 7.0?

screen-scraper 7.0 requires a newer JRE than the previous stable release, so upgrading requires some additional steps.

If you don’t already have all your scrapes exported, or just want to preserve the current configuration, first upgrade your current screen-scraper installation to the latest alpha version, 6.0.64a (instructions). Once done, back up the contents of the screen-scraper/resource/db directory.

Linux/OSX

The new installer does not include the JRE
You need to have JRE 1.8 installed

Why do I get "HTML Truncated" on the Last Response tab?

Some web pages are large enough to make the "Last Response" tab non-responsive. To prevent performance issues, screen-scraper truncates the HTML shown there. You can still see the full response, however, if you:

  1. Click "display response in a browser"
  2. Right click and view the source for that page

You may edit the screen-scraper.properties file to allow more to be displayed, but in so doing you may run afoul of the aforementioned performance issues. To do so, either edit or add a line:
 

Can I put a session variable in an extractor pattern to limit the results?

Extractor patterns can't accept variables. The extractor pattern works on the last response HTML and doesn't have the means to snip some of that HTML out and replace it with a token.

In cases where you would want to do this, the extractor pattern might look like:

name="ProductID" value="~@PRODUCTID@~">~#NAME#~<

The hope would be to get only the match for ~#NAME#~

The correct way to do this is to use a pattern like:

name="ProductID" value="~@PRODUCTID@~">~@NAME@~<

You would then invoke a script that compares the name you scraped to the one you want:
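The comparison itself can be sketched in plain Java. This is only an illustration under stated assumptions, not screen-scraper's own script: a `Map` stands in for the data record, the token names `NAME` and `PRODUCTID` are assumed, and `isWantedName` is a hypothetical helper. In a real script the wanted name would typically come from a session variable.

```java
import java.util.HashMap;
import java.util.Map;

public class NameFilter {
    // Hypothetical helper: returns true when the scraped NAME value equals
    // the wanted name, ignoring case and surrounding whitespace.
    public static boolean isWantedName(Map<String, String> dataRecord, String wantedName) {
        String scraped = dataRecord.get("NAME");
        if (scraped == null || wantedName == null) {
            return false;
        }
        return scraped.trim().equalsIgnoreCase(wantedName.trim());
    }

    public static void main(String[] args) {
        // Stand-in for one extracted record (assumed token names).
        Map<String, String> dataRecord = new HashMap<>();
        dataRecord.put("PRODUCTID", "1234");
        dataRecord.put("NAME", " Widget ");

        // Keep the record only when the scraped name is the one we want.
        if (isWantedName(dataRecord, "widget")) {
            System.out.println("Keep product " + dataRecord.get("PRODUCTID"));
        }
    }
}
```

Records that fail the check are simply skipped, which gives the filtering effect the session variable was meant to provide inside the pattern.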
 

My sub-extractor pattern only gets one instance of my data. How can I get all of the data?

A sub-extractor pattern will, by design, match only once per dataRecord.

If you need to match a datum that appears more than once, you need to use: scrapeableFile.extractData()
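The underlying idea is to take the text matched for one record and apply a second pattern to it repeatedly, collecting every hit. The sketch below shows that idea with `java.util.regex` as a stand-in for extractor patterns; it does not use screen-scraper's API, and the class name, helper, and sample HTML are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedMatches {
    // Collects the first capture group of every match of `pattern` in `text`.
    public static List<String> extractAll(String text, Pattern pattern) {
        List<String> values = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Text for a single record in which the same datum appears twice.
        String recordHtml = "<div class=\"product\"><span class=\"color\">red</span>"
                + "<span class=\"color\">blue</span></div>";
        Pattern color = Pattern.compile("<span class=\"color\">(.*?)</span>");

        System.out.println(extractAll(recordHtml, color)); // prints [red, blue]
    }
}
```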

I'd like to scrape data from a mainframe/tn3270 application. Can screen-scraper handle this?

No. screen-scraper is designed only to scrape data from web sites. If you're looking for a solution that can extract data from older mainframe-type applications, we'd recommend looking at Jagacy.

My web site is hosted on a shared server (virtual hosting). Can I use screen-scraper with it?

In order to install screen-scraper on a machine, you'll likely need administrative or root access. Generally this is not the case with virtual hosting, so you likely will not be able to run screen-scraper on your server.

How do I set up screen-scraper on Linux/Unix/BSD?

screen-scraper version 7.0 does not come with a bundled JRE, so you need to install Java Runtime Environment version 1.8 (version 1.7 will work, but is not recommended).

Download the installer from our download page.

For Linux, cd to the directory where you downloaded the installer. Give the current user execute permissions to the installer with:

Can screen-scraper extract information from PDF files?

Sort of, yes. See this blog posting.

Can screen-scraper be scheduled to scrape sites on a periodic basis?

If you're using the Enterprise Edition of screen-scraper, this can be done via the web interface.