REST API

Overview

The REST API was first released in stable version 5.0 (alpha 4.5.18a). It is not a true REST API but rather an API accessed through GET requests; for the sake of naming, we call it the screen-scraper REST API. It lets you issue web interface commands through GET requests.

Using the REST API

Every REST API request names what you want to do in the action GET parameter. Some actions require other parameters to be set as well. Some of the available actions and their parameters are listed below.

For any of this to work, screen-scraper has to be running in server mode.

This feature is only available to Enterprise editions of screen-scraper.
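
As a rough illustration of the request structure described above, the following Python sketch builds a request URL from an action name and its parameters. The helper names build_rest_url and call_rest are hypothetical, the base URL is copied from the examples in this document, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL, taken from the example URLs in this document.
BASE_URL = "http://localhost:8779/ss/rest"


def build_rest_url(action, params=None, base_url=BASE_URL):
    """Return the GET URL for an action plus its optional parameters."""
    query = {"action": action}
    query.update(params or {})
    # urlencode escapes spaces, slashes, and other reserved characters.
    return base_url + "?" + urlencode(query)


def call_rest(action, params=None, timeout=30):
    """Issue the GET request and return the raw response body as text."""
    with urlopen(build_rest_url(action, params), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(build_rest_url("get_memory_usage"))
```

Calling call_rest("get_memory_usage") would send the request, provided a screen-scraper server is listening at the assumed address.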

Change General Settings for Scraping Sessions

http://localhost:8779/ss/rest?action=save_settings&default_timeout=89&default_repeat_days=9&default_repeat_hours=8&default_repeat_minutes=7&default_repeat_seconds=6&default_threshold_time=4&default_threshold_record_count=3

  • default_timeout The number of minutes the scraping session is allowed to run before a request to stop is inserted.
  • default_repeat_days The number of days that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • default_repeat_hours The number of hours that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • default_repeat_minutes The number of minutes that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • default_repeat_seconds The number of seconds that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • default_threshold_time The percentage by which the run times of two runs of a scraping session may differ without being flagged as a possible error.
  • default_threshold_record_count The percentage by which the number of records scraped in two runs of a scraping session may differ without being flagged as a possible error.
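
Because the four repeat settings are added together, a single interval can be split across them. The sketch below is one possible way to do that in Python; repeat_params is a hypothetical helper, and the values simply reproduce the example URL above.

```python
from datetime import timedelta
from urllib.parse import urlencode


def repeat_params(interval, prefix="default_repeat_"):
    """Split a timedelta into days/hours/minutes/seconds parameters."""
    total_seconds = int(interval.total_seconds())
    days, remainder = divmod(total_seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    return {
        prefix + "days": days,
        prefix + "hours": hours,
        prefix + "minutes": minutes,
        prefix + "seconds": seconds,
    }


query = {"action": "save_settings", "default_timeout": 89}
query.update(repeat_params(timedelta(days=9, hours=8, minutes=7, seconds=6)))
query.update({"default_threshold_time": 4, "default_threshold_record_count": 3})

# Base URL assumed from the examples in this document.
url = "http://localhost:8779/ss/rest?" + urlencode(query)
print(url)
```

The same helper could be reused with prefix="repeat_" for the per-schedule repeat parameters.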

Disable/Enable Scheduled Scraping Session

http://localhost:8779/ss/rest?action=disable_enable_scheduled_scraping_session&scheduled_scraping_session_id=110&enable=false

  • scheduled_scraping_session_id The id of the scheduled scraping session. Omit this parameter or leave it blank if you want to generate a new scheduled scraping session.
  • enable Whether the scheduled scrape should be enabled (true) or disabled (false).
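
When building this request from Python, note that str(False) yields "False" with a capital F, whereas the example above uses lowercase. Whether the server accepts other spellings is not stated here, so this sketch (with a hypothetical helper name) plays it safe and always emits the lowercase form.

```python
from urllib.parse import urlencode


def enable_scheduled_url(scheduled_id, enable):
    """Build the URL that enables or disables a scheduled scraping session."""
    query = {
        "action": "disable_enable_scheduled_scraping_session",
        "scheduled_scraping_session_id": scheduled_id,
        # Emit lowercase "true"/"false" to match the documented example.
        "enable": "true" if enable else "false",
    }
    # Base URL assumed from the examples in this document.
    return "http://localhost:8779/ss/rest?" + urlencode(query)


print(enable_scheduled_url(110, False))
```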

Get Memory Usage

http://localhost:8779/ss/rest?action=get_memory_usage

Get Runnable Scraping Sessions

http://localhost:8779/ss/rest?action=get_runnable_scraping_sessions

Get Scheduled Scrapes

http://localhost:8779/ss/rest?action=get_scheduled_scraping_sessions

Get Scrapeable Sessions

http://localhost:8779/ss/rest?action=get_scrapeable_sessions

Get Session Variable on Scraping Session

http://localhost:8779/ss/rest?action=get_session_variable_from_scrapeable_session&scrapeable_session_id=3&key=foo

  • scrapeable_session_id The id of the scrapeable session.
  • key The name of the session variable.

Import a File

http://localhost:8779/ss/importFile

  • This call differs from the others in that it must be a multipart POST request (i.e., a file upload) to the above URL, with a single parameter named fileToImport that contains the file. The uploaded file can be either an exported scraping session or script (i.e., a ".sss" file).
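
A multipart upload of this kind can be assembled with the Python standard library alone. The following is a minimal sketch, not a tested client: the helper names are hypothetical, the part's content type is an assumption, and the response is returned as raw text because its format is not described here.

```python
import os
import uuid
from urllib.request import Request, urlopen

# URL taken from the example above.
IMPORT_URL = "http://localhost:8779/ss/importFile"


def build_import_request(filename, file_bytes, url=IMPORT_URL):
    """Build a multipart/form-data POST with one file field, fileToImport."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="fileToImport"; '
        f'filename="{filename}"\r\n'
        # Assumption: a generic binary content type is acceptable.
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + file_bytes + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return Request(url, data=body, headers=headers, method="POST")


def import_file(path):
    """Upload the exported file at path and return the raw response text."""
    with open(path, "rb") as handle:
        request = build_import_request(os.path.basename(path), handle.read())
    with urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

Keeping the request construction separate from the network call makes it possible to inspect the body and headers without a running server.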

Peek at a Scraping Session Log

http://localhost:8779/ss/rest?action=peek_scrapeable_session_log&scrapeable_session_id=42&num_lines=50

  • scrapeable_session_id The id of the scraping session.
  • num_lines The number of log lines to return in the peek.

Reload Settings

http://localhost:8779/ss/rest?action=reload_settings

Delete a Scraping Session

http://localhost:8779/ss/rest?action=remove_scraping_session&scraping_session_name=ScrapeName

  • scraping_session_name The name of the scraping session to delete from the server.

Remove a Completed or Running Scraping Session

http://localhost:8779/ss/rest?action=remove_scrapeable_session&scrapeable_session_id=29

  • scrapeable_session_id The id of the scraping session as returned when the scrape was launched.

Remove Scheduled Scraping Session

http://localhost:8779/ss/rest?action=remove_scheduled_scraping_session&scheduled_scraping_session_id=0

  • scheduled_scraping_session_id The id of the scheduled scraping session.

Run Scraping Session

http://localhost:8779/ss/rest?action=run_scraping_session&scraping_session_name=Shopping+Site&settable_session_variables=this%3Dthat%26foo%3Dbar

The returned file contains the scrapeable_session_id of the scrape, which makes it easier to manipulate the scrape with other REST API actions.

  • scraping_session_name The name of the scraping session to run.
  • settable_session_variables A URL-encoded parameter string of session variables (e.g., this=that&foo=bar encoded as this%3Dthat%26foo%3Dbar).
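
The settable_session_variables value is itself a query string, so it ends up encoded twice: once to form this=that&foo=bar, and again so that the "=" and "&" characters survive as part of a single parameter value. A short Python sketch of that nesting, using a hypothetical helper name and the base URL from the example above:

```python
from urllib.parse import urlencode


def run_scraping_session_url(name, session_variables=None):
    """Build the URL that launches a scraping session by name."""
    query = {"action": "run_scraping_session", "scraping_session_name": name}
    if session_variables:
        # First pass: turn the variables into "this=that&foo=bar".
        query["settable_session_variables"] = urlencode(session_variables)
    # Second pass: escape "=" and "&" inside that string so it travels
    # as the value of one parameter.
    return "http://localhost:8779/ss/rest?" + urlencode(query)


url = run_scraping_session_url("Shopping Site", {"this": "that", "foo": "bar"})
print(url)
```

With these inputs the result is the same URL as the example above.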

Set Scheduled Scraping Session Settings

http://localhost:8779/ss/rest?action=set_scheduled_scraping_session&scheduled_scraping_session_id=3&scraping_session_name=Shopping+Site&timeout=123&schedule_date=08%2F20%2F2009&schedule_time=11:22:33&repeat_days=4&repeat_hours=3&repeat_minutes=2&repeat_seconds=1&threshold_time=21&threshold_record_count=43&settable_session_variables=this%3Dthat%26foo%3Dbar

  • scheduled_scraping_session_id The id of the scheduled scraping session. If this parameter is empty or omitted a new scheduled scraping session will be created.
  • scraping_session_name Name of the scraping session to be scheduled.
  • timeout The number of minutes the scraping session is allowed to run before a request to stop is inserted.
  • schedule_date The calendar date when the scraping session is to run next. It is in the format month/day/year (MM/DD/YYYY) and should be URL encoded (%2F instead of /).
  • schedule_time The time of day when the scraping session is to run, in 24-hour (military) time (e.g., 11:22:33).
  • repeat_days The number of days that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • repeat_hours The number of hours that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • repeat_minutes The number of minutes that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • repeat_seconds The number of seconds that should pass from the start of the scraping session until it runs again (added to the other repeat time settings).
  • threshold_time The percentage by which the run times of two runs of a scraping session may differ without being flagged as a possible error.
  • threshold_record_count The percentage by which the number of records scraped in two runs of a scraping session may differ without being flagged as a possible error.
  • settable_session_variables A URL-encoded parameter string of session variables (e.g., this=that&foo=bar encoded as this%3Dthat%26foo%3Dbar).
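
The date and time formats above can be produced from a Python datetime. This sketch uses a hypothetical helper name and sets only a subset of the parameters; whether the remaining ones may be omitted is not stated here, and they can be added to the dictionary in the same way. Passing safe=":" keeps the colons unescaped, as in the example URL.

```python
from datetime import datetime
from urllib.parse import urlencode


def schedule_url(name, run_at, scheduled_id=""):
    """Build a set_scheduled_scraping_session URL for the given run time."""
    query = {
        "action": "set_scheduled_scraping_session",
        # A blank id asks for a new scheduled scraping session.
        "scheduled_scraping_session_id": scheduled_id,
        "scraping_session_name": name,
        "schedule_date": run_at.strftime("%m/%d/%Y"),  # MM/DD/YYYY
        "schedule_time": run_at.strftime("%H:%M:%S"),  # 24-hour time
    }
    # urlencode turns "/" into %2F; safe=":" leaves the time's colons alone.
    return "http://localhost:8779/ss/rest?" + urlencode(query, safe=":")


print(schedule_url("Shopping Site", datetime(2009, 8, 20, 11, 22, 33), 3))
```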

Set Session Variable on Scraping Session

http://localhost:8779/ss/rest?action=set_session_variable_on_scrapeable_session&scrapeable_session_id=3&key=foo&value=bap

  • scrapeable_session_id The id of the scrapeable session.
  • key The name of the session variable.
  • value The value to associate with the session variable.

Stop a Scraping Session

http://localhost:8779/ss/rest?action=stop_running_scraping_session&scrapeable_session_id=43

  • scrapeable_session_id The id of the running scraping session.
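
Several actions take the scrapeable_session_id that is returned when a scrape is launched. As an illustration, this Python sketch builds the peek, stop, and remove URLs for one id; session_action_urls is a hypothetical helper, and the base URL is copied from the examples in this document.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the example URLs in this document.
BASE_URL = "http://localhost:8779/ss/rest"


def session_action_urls(scrapeable_session_id, num_lines=50):
    """Return the peek, stop, and remove URLs for one launched scrape."""

    def url(action, **extra):
        query = {
            "action": action,
            "scrapeable_session_id": scrapeable_session_id,
        }
        query.update(extra)
        return BASE_URL + "?" + urlencode(query)

    return {
        "peek": url("peek_scrapeable_session_log", num_lines=num_lines),
        "stop": url("stop_running_scraping_session"),
        "remove": url("remove_scrapeable_session"),
    }


for label, link in session_action_urls(43).items():
    print(label, link)
```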

Stop All Scraping Sessions

http://localhost:8779/ss/rest?action=stop_all_running_scraping_session