Other Windows

Overview

So far we have explained each of the windows in the screen-scraper workbench. Here we would like to make you aware of a few other windows that you are likely to come across in your work with screen-scraper.

Breakpoint Window

Overview

The breakpoint window opens when the scraping session reaches a call to the session.breakpoint method in a script. It is a very effective tool for troubleshooting your scrapes.


  • (run): Instructs the scrape to continue from the stop point.
  • (stop): Ends the scrape (as soon as it can).
  • Session variables: Lists all of the session variables that are currently available.

    The value of any variable can be edited here by double-clicking it, changing the value, and then clicking elsewhere or pressing Enter.

  • Current script: The script that initiated the breakpoint as well as a count of currently active scripts.
  • Current scrapeable file: The scrapeable file that called the script that initiated the breakpoint.
  • Current data set: Opens a DataSet window with the contents of the active data set.
  • Current data record: Lists all of the data record variables that are currently available.

    The value of any variable can be edited here by double-clicking it, changing the value, and then clicking elsewhere or pressing Enter.
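As an illustration of how a breakpoint gets triggered, here is a minimal sketch of a script written in Python (Jython), assuming Python is the language selected for the script. Only session.breakpoint comes from the text above; session.getVariable is used on the assumption that your screen-scraper version provides it, and the helper name pause_if_missing and the variable name SEARCH_TERM are hypothetical. The session object is passed in as a parameter only so the logic can be exercised outside screen-scraper.

```python
# Sketch of a screen-scraper script in Python (Jython). In a real script the
# `session` object is supplied by screen-scraper; it is a parameter here only
# so the logic can be tried outside screen-scraper.


def pause_if_missing(session, variable_name):
    """Hypothetical helper: open the breakpoint window when a session variable
    we expected to be set is missing, so it can be inspected or edited."""
    if session.getVariable(variable_name) is None:
        # The scrape pauses here until (run) or (stop) is clicked.
        session.breakpoint()
        return True
    return False
```

Inside a script attached to a scrapeable file you might call pause_if_missing(session, "SEARCH_TERM"), or simply put session.breakpoint() on its own line wherever you want the scrape to pause unconditionally.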

Compare Last Request and Proxy Transaction

Overview

This feature is only available in the Professional and Enterprise editions of screen-scraper.

At times while developing a scraping session, a particular scrapeable file may not give you the results you're expecting. Even if you generated it from a proxy session, parameters or cookies may differ enough that the server's response is very different from what you anticipated, possibly even an error. In cases like this, the best approach is generally to compare the request produced by the scrapeable file in the running scraping session with the request produced by your browser in the proxy session. Ideally, your scraping session should mimic what your web browser does as closely as possible.

The Compare Last Request and Proxy Transaction window facilitates just such a comparison. It can be accessed from the Last Request tab of the scrapeable file. After you click the Compare Last Request and Proxy Transaction button, you will be prompted to select the proxy transaction to which the request should be compared. Navigate to the proxy session the scrapeable file is connected to, select the desired transaction, and the window will open.

The window has four tabs to aid in comparing the transaction and the request: URL, POST data, Cookies, and Headers. Parameters in any of these areas can be controlled using the scrapeableFile object and its methods.
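To make the comparison concrete, the sketch below diffs two sets of name/value pairs, such as the headers from the proxy transaction and the headers from the last request, and reports what is missing, extra, or changed. It is a hypothetical illustration of the idea behind the window, not how screen-scraper implements it; the dictionary representation, the compare_fields function, and the sample header values are all assumptions.

```python
# Hypothetical sketch of the comparison behind the Compare Last Request and
# Proxy Transaction window: diff two name -> value dicts (e.g. headers,
# cookies, or POST parameters) from the proxy transaction and the last request.


def compare_fields(proxy, last_request):
    """Return (missing, extra, changed) relative to the proxy transaction."""
    # Sent by the browser but not by the scrapeable file.
    missing = {k: proxy[k] for k in proxy if k not in last_request}
    # Sent by the scrapeable file but not by the browser.
    extra = {k: last_request[k] for k in last_request if k not in proxy}
    # Sent by both, with different values: (browser value, scraper value).
    changed = {
        k: (proxy[k], last_request[k])
        for k in proxy
        if k in last_request and proxy[k] != last_request[k]
    }
    return missing, extra, changed


if __name__ == "__main__":
    # Made-up header values, not captured from a real session.
    proxy_headers = {
        "User-Agent": "Mozilla/5.0",
        "Referer": "http://www.example.com/search",
        "Accept-Language": "en-US",
    }
    request_headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept-Language": "fr-FR",
        "X-Debug": "1",
    }
    missing, extra, changed = compare_fields(proxy_headers, request_headers)
    print("missing:", missing)
    print("extra:", extra)
    print("changed:", changed)
```

Each reported difference is a candidate for adjustment on the scrapeable file. HTTP header names are case-insensitive, so a fuller comparison of the Headers tab would normalize names before comparing.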

DataSet Window

Overview

The DataSet window displays the values matched by the extractor tokens. It can be viewed in two basic ways:

  1. Clicking the Apply Pattern to Last Scraped Data button on an extractor pattern or sub-extractor pattern.
  2. Selecting a DataSet or clicking the Current data set button in a breakpoint window.

The DataSet window has two rendering styles. The default is grid view, but you can switch between views using the button at the top of the window (following the "view as:" label).

Grid View

The names of the columns correspond to the tokens that matched data in the most recent scrapeable file's response. The one addition is the Sequence column, which screen-scraper uses to record the order in which the matches occurred on the page.

If no column appears for an extractor token, it is because that token did not match anything in any of the data records.
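The column rule above can be modeled in a few lines. This is a hypothetical sketch: each data record is represented as a Python dictionary mapping token names to matched values, with None for a token that matched nothing, and numbering Sequence from 0 is an arbitrary choice here, not necessarily what screen-scraper displays.

```python
# Hypothetical model of the DataSet grid view. A data record is a dict of
# extractor token name -> matched value (None when the token matched nothing).
# This illustrates the column rule only; it is not screen-scraper's own code.


def grid_columns(records):
    """Sequence first, then every token that matched data in at least one
    record, in the order the tokens are first seen."""
    columns = ["Sequence"]
    for record in records:
        for token, value in record.items():
            if value is not None and token not in columns:
                columns.append(token)
    return columns


def grid_rows(records):
    """Build one row per record; Sequence records the match order (0-based
    here by assumption) and unmatched cells are left blank."""
    columns = grid_columns(records)
    rows = []
    for sequence, record in enumerate(records):
        row = {"Sequence": sequence}
        for token in columns[1:]:
            value = record.get(token)
            row[token] = "" if value is None else value
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    # Two records from a hypothetical product listing; SKU never matched.
    records = [
        {"TITLE": "Widget", "PRICE": "9.99", "SKU": None},
        {"TITLE": "Gadget", "PRICE": None, "SKU": None},
    ]
    columns, rows = grid_rows(records)
    print(columns)  # ['Sequence', 'TITLE', 'PRICE']
    for row in rows:
        print(row)
```

SKU gets no column because it is None in every record, mirroring the rule that a token which matches nothing in any data record does not appear in the grid.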

List View

This view can make it a little easier to read the matched data grouped by data record.

Regular Expressions Editor

Overview

The regular expressions that you can select for extractor tokens are stored in screen-scraper and can be edited in the Regular Expressions Editor window. The window is accessed by selecting Edit Regular Expressions from the Options menu.

This is helpful if there is a regular expression you use often. You can also edit the provided regular expressions, though we encourage you not to do so without good reason. These regular expressions have been tested over time and updated when required, so they are very stable.

  • Add Regular Expression: Adds a new regular expression to the list.
  • (list of regular expressions)
    • Identifier: Name for the pattern. This is what you will select when adding a regular expression to an extractor token.
    • Expression: The regular expression.
    • Description: A brief description of the regular expression. This is primarily to help you remember when you come back to it later.

Listed regular expressions can be edited by double-clicking the field you would like to edit.
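To show how a stored expression constrains an extractor token, the sketch below keeps a small library of identifier, expression, and description entries and substitutes the selected expression into an extractor pattern written with screen-scraper's ~@TOKEN@~ token delimiters. It is a rough, hypothetical model: the library entries, the function names, and the lazy .*? fallback for tokens with no regular expression selected are assumptions, not screen-scraper's internals.

```python
import re

# Hypothetical library: identifier -> (expression, description).
# These entries are illustrative, not the expressions bundled with screen-scraper.
REGEX_LIBRARY = {
    "Digits": (r"\d+", "One or more digits"),
    "Non-whitespace": (r"\S+", "A run of non-whitespace characters"),
}

# Matches extractor tokens written as ~@NAME@~ (names assumed to be word characters).
TOKEN = re.compile(r"~@(\w+)@~")


def add_regular_expression(identifier, expression, description):
    """Store a new entry, compiling it first so a malformed pattern is rejected."""
    re.compile(expression)  # raises re.error if the expression is invalid
    REGEX_LIBRARY[identifier] = (expression, description)


def build_pattern(extractor_pattern, token_regexes):
    """Convert an extractor pattern into a Python regex with one named group per
    token. token_regexes maps token name -> library identifier; tokens without
    an identifier fall back to a lazy '.*?' (an assumption)."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(extractor_pattern):
        # Literal text between tokens must match exactly, so escape it.
        parts.append(re.escape(extractor_pattern[pos:match.start()]))
        name = match.group(1)
        identifier = token_regexes.get(name)
        expression = REGEX_LIBRARY[identifier][0] if identifier else r".*?"
        parts.append(f"(?P<{name}>{expression})")
        pos = match.end()
    parts.append(re.escape(extractor_pattern[pos:]))
    return re.compile("".join(parts), re.DOTALL)


if __name__ == "__main__":
    add_regular_expression("Price", r"\d+\.\d{2}", "A price with two decimal places")
    pattern = build_pattern("<td>~@PRICE@~</td>", {"PRICE": "Price"})
    html = "<td>9.99</td><td>n/a</td>"
    print([m.group("PRICE") for m in pattern.finditer(html)])  # ['9.99']
```

Because the Price expression requires digits, the n/a cell is skipped; with no expression selected, the lazy fallback would have matched it as well.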