Web Interface
The web interface is only available for enterprise edition users of screen-scraper.
Overview
The screen-scraper web interface allows you to administer aspects of the scraping process. This includes monitoring running scraping sessions, importing and exporting scraping sessions, and scheduling scraping sessions to be run on a periodic basis.
When screen-scraper is running in server mode, you can access the web interface on your local machine at the following URL: http://localhost:8779/.
If you've changed the Web/SOAP Server port in the workbench or the SOAPPort in the screen-scraper.properties file, you'll need to use the port you designated.
Depending on the operating system you're running, instead of localhost, you may need to use 127.0.0.1 or the IP address of the machine.
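To check from code whether the web interface is reachable, the following Java sketch builds the URL described above and sends a plain GET request with the standard `java.net` classes. It assumes the default port 8779. The class and method names (`WebInterfaceCheck`, `webInterfaceUrl`, `isReachable`) are illustrative only and are not part of screen-scraper.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class WebInterfaceCheck {
    // Default Web/SOAP Server port. Change this if you set a different port in the
    // workbench or via SOAPPort in screen-scraper.properties.
    static final int DEFAULT_PORT = 8779;

    // Builds the web interface base URL for a host name or IP address and a port.
    static String webInterfaceUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    // Returns true if an HTTP server answers at the URL within the timeout.
    static boolean isReachable(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode(); // -1 if the reply is not valid HTTP
            conn.disconnect();
            return status > 0;
        } catch (IOException e) {
            return false; // connection refused, timed out, etc.
        }
    }

    public static void main(String[] args) {
        // Some operating systems need 127.0.0.1 instead of localhost, so try both.
        for (String host : new String[] {"localhost", "127.0.0.1"}) {
            String url = webInterfaceUrl(host, DEFAULT_PORT);
            System.out.println(url + " -> " + (isReachable(url, 2000) ? "reachable" : "not reachable"));
        }
    }
}
```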
Managing Scraping Sessions
Importing
- Click the Import button in the upper-left corner of the page. This opens a dialog that prompts you to navigate to the scraping session you want to import.
If a scraping session or script with the same name already exists in this instance of screen-scraper, the script overwriting properties determine which one is discarded. If a script cannot be overwritten, a warning message informs you that the script you were trying to import was discarded.
- Copy the scraping session into the import folder in screen-scraper's install directory.
If screen-scraper is running when you copy the files into the import folder, they will be imported and hot-swapped in the next time a scraping session is invoked. They will also be imported when you start or stop screen-scraper.
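The import-folder route can also be scripted. This Java sketch copies an exported scraping session file into the `import` folder of a given install directory using `java.nio.file`. The install path and file name in `main` are hypothetical placeholders, and the helper name `copyToImportFolder` is illustrative, not a screen-scraper function.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ImportFolderCopy {
    // Copies an exported scraping session file into <installDir>/import.
    // An existing file with the same name in the import folder is replaced.
    static Path copyToImportFolder(Path exportedSession, Path installDir) throws IOException {
        Path importDir = installDir.resolve("import");
        if (!Files.isDirectory(importDir)) {
            // A missing import folder usually means installDir is wrong.
            throw new NoSuchFileException(importDir.toString());
        }
        Path target = importDir.resolve(exportedSession.getFileName());
        return Files.copy(exportedSession, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: pass your install directory and exported file as arguments.
        Path installDir = Paths.get(args.length > 0 ? args[0] : "/opt/screen-scraper");
        Path session = Paths.get(args.length > 1 ? args[1] : "my_session.sss");
        System.out.println("Copied to " + copyToImportFolder(session, installDir));
    }
}
```

Failing fast when the `import` folder is missing, rather than creating it, avoids silently dropping files into a directory screen-scraper never reads.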
Exporting
- Click the Export button in the Runnable tab that corresponds to the scraping session.
Web Interface: Settings
Overview
The web interface settings can be opened by clicking on the settings button in the upper-right corner of the screen.
Settings
Flagged scrapes are highlighted in red in the run/running tab.
Web Interface: Runnable tab
Overview
This tab displays all scraping sessions loaded into the current instance of screen-scraper. It will display basic information on scraping sessions that are currently running, as well as scraping sessions that have run in the past. It also allows you to start and schedule scraping sessions.
The Runnable tab lists all scraping sessions alphabetically by name, along with information from the most recently started run of each one.
- View as: Change from list to folders view using this drop-down menu.
- Refresh: Update the contents of the scraping session list.
- (List of available scraping sessions):
- Name: The name of the scraping session.
- Start Time: The date and time the scraping session was last started.
- Running Time: The amount of time the scraping session has been running (the number will update each time you click the Refresh button at the top right of the table).
If the scraping session is not currently running, it shows how long the most recent run took.
- Previous Running Time: The amount of time the scraping session took the last time it ran.
If the scraping session is not currently running, it shows how long the run before the most recent one took.
- Num Records: The number of records the scraping session has extracted as recorded by the session.addToNumRecordsScraped method. If the method is never called then this number will always be zero.
- Previous Num Records: The number of records the scraping session extracted the last time it ran.
- Status: Indicates the current status of the scraping session. Possibilities include "In Process", "Completed", "Interrupted", and "Error".
- Export: Exports the scraping session, just as you would from the workbench.
- Run Now: Runs the scraping session.
- Schedule: Allows you to schedule the scraping session to be run. See Schedule Scraping Session for more information.
- Remove: Deletes the scraping session from screen-scraper.
- Notes: Allows you to view the notes specified in the scraping session.
Web Interface: Run/Running tab
Overview
This tab displays information on scraping sessions that are either currently running or have run in the past. You can use this table to compare run times and the number of records scraped, and to monitor scraping session logs. If a scraping session has timed out (see settings), its Stop button is grayed out and its status changes to "Interrupted". If a script has flagged a fatal error (see setFatalErrorOccurred), the Error cell for that scrape is displayed in red.
Scrapes can be sorted in ascending or descending order by any field. To sort, click the header of the column you want to sort by.
Run/Running tab
- Stop Marked Scraping Sessions: Stops the scraping sessions whose rows are checked on the far left.
- Remove Completed Scraping Sessions: Removes the scraping sessions that have a status of "Completed".
Removing records for scraping sessions that have run doesn't remove the scraping sessions themselves, just the records related to the time when they were run.
- Remove Marked Scraping Sessions: Removes the scraping sessions whose rows are checked on the far left from the Run/Running tab.
Removing records for scraping sessions that have run doesn't remove the scraping sessions themselves, just the records related to the time when they were run.
- Auto-refresh: Refreshes the table of scraping session runs automatically at regular intervals.
- Refresh: Refreshes the table of scraping session runs.
- (List of running and completed scraping session runs):
- Name: The name of the scraping session.
- Start Time: The date and time the scraping session was last started.
- Running Time: The amount of time the scraping session has been running (the number will update each time you click the Refresh button at the top right of the table).
If the scraping session is not currently running, it shows how long the most recent run took.
- Previous Running Time: The amount of time the scraping session took the last time it ran.
If the scraping session is not currently running, it shows how long the run before the most recent one took.
- Num Records: The number of records the scraping session has extracted as recorded by the session.addToNumRecordsScraped method. If the method is never called then this number will always be zero.
- Previous Num Records: The number of records the scraping session extracted the last time it ran.
- Status: Indicates the current status of the scraping session. Possibilities include "In Process", "Completed", "Interrupted", and "Error".
- Error: Indicates whether or not a fatal error has been flagged in the scraping session (see setFatalErrorOccurred).
- Error Message: In the event of a flagged error, displays the provided message (see setErrorMessage).
- Peek: Pops up a box that allows you to view the most recent section of the log.
- Stop: Stops the scraping session.
Web Interface: Scheduled tab
Overview
On this tab you can manage scraping sessions that have been scheduled to be run. The columns can be sorted by clicking on the column headers.
Scheduled Tab
- Refresh: Reloads the table of scheduled scraping sessions.
- (List of scheduled runs for scraping sessions):
- Scraping Session: The name of the scheduled scraping session.
- Timeout: The amount of time in minutes the scraping session should be allowed to run.
If this value is 0 or a negative number, the scraping session will not time out.
- Date/Time: The date and time the scraping session is next scheduled to be run.
- Session Variables: Any session variables that are to be passed to the scraping session when it runs.
- Disable/Enable: Allows you to temporarily enable or disable the scheduled run of the scraping session.
If the run of the scraping session is disabled, it will not run even if it's scheduled to do so.
- Edit: Pops up a dialog box that allows you to manage the scheduled run of the scraping session.
- Remove: Removes the scheduled run of the scraping session.
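The Timeout rule above (a number of minutes, where 0 or a negative value means no timeout) can be expressed as a small check. This Java sketch uses `java.time`; the class and method names are hypothetical. Treating a run as timed out only once the elapsed time strictly exceeds the limit is an assumption, since this page does not say how the boundary case is handled.

```java
import java.time.Duration;
import java.time.Instant;

public class ScheduleTimeout {
    // True once a run that started at `start` has run longer than `timeoutMinutes`.
    // A timeout of 0 or a negative number disables the timeout entirely.
    static boolean hasTimedOut(Instant start, Instant now, long timeoutMinutes) {
        if (timeoutMinutes <= 0) {
            return false;
        }
        Duration elapsed = Duration.between(start, now);
        return elapsed.compareTo(Duration.ofMinutes(timeoutMinutes)) > 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T08:00:00Z");
        Instant now = start.plus(Duration.ofMinutes(90));
        System.out.println(hasTimedOut(start, now, 60)); // 90 min elapsed > 60 min limit
        System.out.println(hasTimedOut(start, now, 0));  // timeout disabled
    }
}
```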
Web Interface: Schedule Scraping Session
Overview
It can be very helpful to have scraping sessions run automatically or on an ongoing basis. The web interface makes this simple by letting you schedule and manage multiple scrapes in a single location.
Managing Scheduled Scrapes
Scheduling Run
Editing Scheduled Run
- You can alter the settings for an already scheduled scraping session by clicking on the Edit button on the Scheduled tab.
Removing Scheduled Run
- You can remove an already scheduled scraping session by clicking on the Remove button on the scheduled tab.
Schedule Scraping Session: General tab
General Tab
Schedule Scraping Session: Schedule tab
Schedule Tab
Schedule Scraping Session: Thresholds tab
Thresholds Tab
- Time: The percentage by which the running times of two runs of a scraping session may differ without the run being flagged as a possible error.
- Record Count: The percentage by which the number of records scraped in two runs of a scraping session may differ without the run being flagged as a possible error.
Flagged scrapes are highlighted in red in the run/running tab.
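This page does not say exactly how the percentage difference between two runs is computed. The Java sketch below assumes it is measured relative to the previous run, and that a run is flagged when either its running time or its record count exceeds the corresponding threshold. The names `RunThresholds`, `percentDifference`, and `shouldFlag` are hypothetical and are not screen-scraper API.

```java
public class RunThresholds {
    // Percentage difference of `current` relative to `previous` (assumed definition).
    static double percentDifference(double previous, double current) {
        if (previous == 0) {
            // Any change from zero counts as an unbounded difference.
            return current == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return Math.abs(current - previous) / previous * 100.0;
    }

    // Flags the run if either the running time or the record count differs
    // from the previous run by more than its threshold percentage.
    static boolean shouldFlag(double prevSeconds, double currSeconds, double timeThresholdPct,
                              long prevRecords, long currRecords, double recordThresholdPct) {
        return percentDifference(prevSeconds, currSeconds) > timeThresholdPct
            || percentDifference(prevRecords, currRecords) > recordThresholdPct;
    }

    public static void main(String[] args) {
        // Previous run: 600 s, 1000 records. Current run: 660 s, 700 records.
        boolean flagged = shouldFlag(600, 660, 20, 1000, 700, 25);
        System.out.println(flagged ? "flag as possible error" : "within thresholds");
    }
}
```

With a previous run of 600 seconds and 1,000 records, a current run of 660 seconds and 700 records differs by 10% in time and 30% in records, so a 25% Record Count threshold flags it even though the 20% Time threshold is not exceeded.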