Web Interface

The web interface is only available for enterprise edition users of screen-scraper.

Overview

The screen-scraper web interface allows you to administer aspects of the scraping process. This includes monitoring running scraping sessions, importing and exporting scraping sessions, and scheduling scraping sessions to be run on a periodic basis.

When screen-scraper is running in server mode, you can access the web interface on your local machine at the following URL: http://localhost:8779/.

If you've changed the Web/SOAP Server port in the workbench or the SOAPPort in the screen-scraper.properties file, you'll need to use the port you designated.

Depending on the operating system you're running, instead of localhost, you may need to use 127.0.0.1 or the IP address of the machine.
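
As a rough illustration of how the address is put together, the sketch below builds the URL from a host and a port. The helper name web_interface_url is made up for this example; the default port 8779 and the localhost/127.0.0.1 alternatives come from the text above, and 9000 is just an arbitrary stand-in for a custom port.

```python
def web_interface_url(host="localhost", port=8779):
    """Build the web interface URL (hypothetical helper, not part of screen-scraper)."""
    return f"http://{host}:{port}/"


# Default address when screen-scraper runs in server mode on this machine.
print(web_interface_url())

# Same server addressed by IP, for systems where "localhost" does not resolve.
print(web_interface_url(host="127.0.0.1"))

# A custom port (9000 is only an example value).
print(web_interface_url(port=9000))
```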

Managing Scraping Sessions

Importing

Exporting

Web Interface: Settings

Overview

The web interface settings can be opened by clicking on the settings button in the upper-right corner of the screen.

Settings

  • Timeout: The default number of minutes the scraping session is allowed to run before a request to stop is inserted.

    If this value is blank, 0, or negative, the scraping session will not time out.

  • Time: The default percentage by which the running times of two runs of a scraping session may differ without being flagged as a possible error.
  • Record Count: The default percentage by which the number of records scraped in two runs of a scraping session may differ without being flagged as a possible error.
  • Repeat Every: How often the scrape should be rerun by default.
  • Reload From File: Reloads the settings from the screen-scraper.properties file. Use this after you have edited the file directly.
  • Save: Save settings and close dialog.
  • Cancel: Close without saving changes to settings.

Flagged scrapes are highlighted in red in the run/running tab.

Web Interface: Runnable tab

Overview

This tab displays all scraping sessions loaded into the current instance of screen-scraper. It will display basic information on scraping sessions that are currently running, as well as scraping sessions that have run in the past. It also allows you to start and schedule scraping sessions.

The runnable tab will display all of the scraping sessions listed alphabetically by name, and the messages from the most recently started instance of the scrape.

  • View as: Change from list to folders view using this drop-down menu.
  • Refresh: Update the contents of the scraping session list.
  • (List of available scraping sessions):
    • Name: The name of the scraping session.
    • Start Time: The date and time the scraping session was last started.
    • Running Time: The amount of time the scraping session has been running (the number will update each time you click the Refresh button at the top right of the table).

      If the scraping session is not currently running, this shows how long it took to run the last time it was run.

    • Previous Running Time: The amount of time the scraping session took the last time it ran.

      If the scraping session is not currently running, this shows the amount of time it took to run two times ago.

    • Num Records: The number of records the scraping session has extracted as recorded by the session.addToNumRecordsScraped method. If the method is never called then this number will always be zero.
    • Previous Num Records: The number of records the scraping session extracted the last time it ran.
    • Status: Indicates the current status of the scraping session. Possibilities include "In Process", "Completed", "Interrupted", and "Error".
    • Export: Exports the scraping session, just as you would from the workbench.
    • Run Now: Runs the scraping session.
    • Schedule: Allows you to schedule the scraping session to be run. See schedule scraping sessions for more information.
    • Remove: Deletes the scraping session from screen-scraper.
    • Notes: Allows you to view the notes specified in the scraping session.
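
Because Num Records stays at zero unless a script reports it, a script typically calls session.addToNumRecordsScraped each time it saves a record. The following is a minimal sketch of that pattern, not screen-scraper's own code. StubSession is a hypothetical stand-in so the example runs outside screen-scraper, and it assumes the method adds the given amount to a running total. Inside a real script, the session object is supplied for you.

```python
class StubSession:
    """Hypothetical stand-in for the `session` object available to scripts."""

    def __init__(self):
        self.num_records = 0

    def addToNumRecordsScraped(self, value):
        # Assumed behavior: add the given amount to the running total.
        self.num_records += int(value)


def save_records(session, records):
    """Save each record and report it so the Num Records column is updated."""
    for record in records:
        # ... write the record to a file or database here ...
        session.addToNumRecordsScraped(1)


session = StubSession()
save_records(session, [{"name": "a"}, {"name": "b"}, {"name": "c"}])
print(session.num_records)
```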

Web Interface: Run/Running tab

Overview

This tab displays information on scraping sessions that are either currently running or have run in the past. You can use this table to compare run times and the number of records scraped, and to monitor scraping session logs. If a scraping session has timed out (see settings), its Stop button will be grayed out and its status will change to "Interrupted". If a script has flagged a fatal error (see setFatalErrorOccurred), the error cell will display in red for that scrape.

Scrapes can be ordered in ascending and descending order using any of the fields. This is done by clicking on the column header that you want to sort by.

Run/Running tab

  • Stop Marked Scraping Sessions: Stops the scraping sessions whose rows are checked on the far left.
  • Remove Completed Scraping Sessions: Removes the scraping sessions which have a status of "Completed".

    Removing records for scraping sessions that have run doesn't remove the scraping sessions themselves, just the records related to the time when they were run.

  • Remove Marked Scraping Sessions: Removes the scraping sessions whose rows are checked on the far left from the run/running tab.

    Removing records for scraping sessions that have run doesn't remove the scraping sessions themselves, just the records related to the time when they were run.

  • Auto-refresh: Refreshes the table of scraping session runs at regular intervals.
  • Refresh: Refreshes the table of scraping session runs.
  • (List of running and completed scraping session runs):
    • Name: The name of the scraping session.
    • Start Time: The date and time the scraping session was last started.
    • Running Time: The amount of time the scraping session has been running (the number will update each time you click the Refresh button at the top right of the table).

      If the scraping session is not currently running, this shows how long it took to run the last time it was run.

    • Previous Running Time: The amount of time the scraping session took the last time it ran.

      If the scraping session is not currently running, this shows the amount of time it took to run two times ago.

    • Num Records: The number of records the scraping session has extracted as recorded by the session.addToNumRecordsScraped method. If the method is never called then this number will always be zero.
    • Previous Num Records: The number of records the scraping session extracted the last time it ran.
    • Status: Indicates the current status of the scraping session. Possibilities include "In Process", "Completed", "Interrupted", and "Error".
    • Error: Indicates whether or not a fatal error has been flagged in the scraping session (see setFatalErrorOccurred).
    • Error Message: In the event of a flagged error, displays the provided message (see setErrorMessage).
    • Peek: Pops up a box that allows you to view the most recent section of the log.
    • Stop: Stops the scraping session.
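
The Error and Error Message columns are filled in only when a script flags a problem. The sketch below shows one way a script might do so, using the setFatalErrorOccurred and setErrorMessage methods named above. StubSession is a hypothetical stand-in so the example runs on its own, the argument types (a boolean and a string) are assumptions, and the "no records" rule and check_results helper are only illustrations.

```python
class StubSession:
    """Hypothetical stand-in for the `session` object available to scripts."""

    def __init__(self):
        self.fatal_error = False
        self.error_message = None

    def setFatalErrorOccurred(self, flag):
        # Assumed signature: a single boolean argument.
        self.fatal_error = bool(flag)

    def setErrorMessage(self, message):
        # Assumed signature: a single string argument.
        self.error_message = message


def check_results(session, records_found):
    """Flag a fatal error when a run produced nothing (illustrative rule only)."""
    if records_found == 0:
        session.setFatalErrorOccurred(True)
        session.setErrorMessage("No records were extracted")


session = StubSession()
check_results(session, records_found=0)
print(session.fatal_error, session.error_message)
```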

Web Interface: Scheduled tab

Overview

On this tab you can manage scraping sessions that have been scheduled to be run. The columns can be sorted by clicking on the column headers.

Scheduled Tab

  • Refresh: Reloads the table of scheduled scraping sessions.
  • (List of scheduled runs for scraping sessions):
    • Scraping Session: The name of the scheduled scraping session.
    • Timeout: The amount of time in minutes the scraping session should be allowed to run.

      If this value is 0 or a negative number, the scraping session will not time out.

    • Date/Time: The date and time the scraping session is next scheduled to be run.
    • Session Variables: Any session variables that are to be passed to the scraping session when it runs.
    • Disable/Enable: Allows you to temporarily enable or disable the scheduled run of the scraping session.

      If the run of the scraping session is disabled, it will not run even if it's scheduled to do so.

    • Edit: Pops up a dialog box that allows you to manage the scheduled run of the scraping session.
    • Remove: Removes the scheduled run of the scraping session.

Web Interface: Schedule Scraping Session

Overview

It can be very helpful to have scraping sessions run automatically or on an ongoing basis. The web interface makes this simple, allowing you to schedule and manage multiple scrapes in a single location.

Managing Scheduled Scrapes

Scheduling Run

Editing Scheduled Run

  • You can alter the settings for an already scheduled scraping session by clicking on the Edit button on the scheduled tab.

Removing Scheduled Run

  • You can remove an already scheduled scraping session by clicking on the Remove button on the scheduled tab.

Schedule Scraping Session: General tab

General Tab

  • Scraping Session: The name of the scheduled scraping session.
  • Timeout: The number of minutes the scraping session is allowed to run before a request to stop is inserted.

    If this value is blank, 0, or negative, the scraping session will not time out.

  • Session Variables: This is a list of session variables that will be passed to the scraping session when it is run.
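
The timeout rule described above (blank, 0, or negative means no timeout) can be sketched as a small function. This is an illustration of the rule only, not screen-scraper's implementation. The name effective_timeout is hypothetical, and the sketch assumes the value is entered as whole minutes.

```python
def effective_timeout(raw_value):
    """Return the timeout in minutes, or None if the session should never time out."""
    text = "" if raw_value is None else str(raw_value).strip()
    if text == "":
        # A blank field means no timeout.
        return None
    minutes = int(text)  # assumes whole minutes
    # Zero and negative values also mean no timeout.
    return minutes if minutes > 0 else None


for value in ["", 0, -5, "30"]:
    print(repr(value), "->", effective_timeout(value))
```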

Schedule Scraping Session: Schedule tab

Schedule Tab

  • Date: The calendar date when the scraping session is to run next. Click the box to bring up a graphical calendar from which you can select the desired date.
  • Time: The time of day when the scraping session is to run next. This should be a 24-hour (military) time.
  • Repeat Every: Use this to set the frequency with which the scraping session is to run. For example, if you enter 2 into the Hours box, the scraping session will run when it is scheduled, then be re-scheduled to run once again two hours from the time it started.

    If these boxes are left blank, the scraping session will run once and not be re-scheduled.
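
The rescheduling rule above (next run = start time plus the Repeat Every interval, and no next run when the boxes are blank) can be sketched as follows. The function name next_run is hypothetical, the date format in the example is an assumption, and only the Hours box is mentioned in the text above; the days and minutes parameters are included here as assumed units.

```python
from datetime import datetime, timedelta


def next_run(started_at, days=0, hours=0, minutes=0):
    """Return when the session would run again, or None if it is not repeated."""
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    if interval <= timedelta(0):
        # All boxes left blank: run once and do not reschedule.
        return None
    # The interval is counted from the time the run started.
    return started_at + interval


# 24-hour time, as the Time field expects (the date format is an assumption).
start = datetime.strptime("2024-03-01 14:30", "%Y-%m-%d %H:%M")
print(next_run(start, hours=2))  # 2024-03-01 16:30:00
print(next_run(start))           # None
```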

Schedule Scraping Session: Thresholds tab

Thresholds Tab

  • Time: The percentage by which the running times of two runs of a scraping session may differ without being flagged as a possible error.
  • Record Count: The percentage by which the number of records scraped in two runs of a scraping session may differ without being flagged as a possible error.

Flagged scrapes are highlighted in red in the run/running tab.
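
To make the threshold idea concrete, here is a minimal sketch of a percentage comparison between two runs. How screen-scraper computes the difference is not stated here, so this assumes the change is measured relative to the previous run. The name exceeds_threshold and the sample numbers are illustrative only.

```python
def exceeds_threshold(previous, current, threshold_percent):
    """Return True if two runs differ by more than the allowed percentage.

    Assumption: the difference is taken relative to the previous run.
    """
    if previous == 0:
        # No baseline to compare against; treat any change as exceeding.
        return current != 0
    change_percent = abs(current - previous) / previous * 100
    return change_percent > threshold_percent


# Record Count threshold of 20%: 1000 records last run, 700 this run.
print(exceeds_threshold(1000, 700, 20))  # True (a 30% drop)

# Time threshold of 20%: 50 minutes last run, 55 minutes this run.
print(exceeds_threshold(50, 55, 20))     # False (a 10% increase)
```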