Extractor Tokens
Overview
Extractor tokens select the pieces of a file that you want to be able to access. The extractor pattern that contains a token gives the token context, so that it returns only the information you want. Without extractor tokens, no information is gathered from the site.
Extractor tokens become available to dataRecord, dataSet, and session objects depending on their settings and the scope of the scripts invoked. Every extractor token is enclosed in the delimiters ~@ and @~ (one on each side of the token). The name/identifier of the token goes between the two delimiters.
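To make the role of the delimiters concrete, here is a minimal Python sketch of how pattern text containing ~@ and @~ tokens could be turned into a matcher. It is not screen-scraper's implementation: the function names, the sample HTML, and the token names NAME and PRICE are made up for illustration, and each token is assumed to match as little text as possible.

```python
import re

# Matches one token such as ~@PRICE@~ (identifier: letters, digits, underscores).
TOKEN = re.compile(r"~@(\w+)@~")

def compile_pattern(pattern_text):
    """Split pattern text into literal text and tokens, and build a regex.

    Hypothetical helper: literal text is matched exactly, each token
    becomes a non-greedy capture group.
    """
    names, parts, pos = [], [], 0
    for m in TOKEN.finditer(pattern_text):
        parts.append(re.escape(pattern_text[pos:m.start()]))  # literal context
        parts.append("(.*?)")                                 # the token itself
        names.append(m.group(1))
        pos = m.end()
    parts.append(re.escape(pattern_text[pos:]))
    return names, re.compile("".join(parts), re.DOTALL)

def extract(pattern_text, text):
    """Return one dict per match, keyed by token identifier."""
    names, regex = compile_pattern(pattern_text)
    return [dict(zip(names, m.groups())) for m in regex.finditer(text)]

# Made-up sample page and pattern.
page = "<li><b>Widget</b> $4.50</li><li><b>Gadget</b> $12.00</li>"
records = extract("<li><b>~@NAME@~</b> $~@PRICE@~</li>", page)
print(records)
```

Each dict in the result plays the role of one data record: the text surrounding the tokens only provides context, and only the token values are kept.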
Managing Extractor Tokens
Adding
Removing
- Remove the token and its delimiters from the Pattern text of the extractor pattern, as you would in any text editor
Editing
- Double-click on the desired extractor token's name
- Select the extractor token's name, right-click, and choose Edit token
General Tab
- Identifier: This is a string that will be used to identify the piece of data that gets extracted as a result of this token. You can use only alphanumeric characters and underscores here.
- Save in session variable: Checking this box causes the value extracted by the token to be saved in a session variable using the token's identifier.
- Null session variable if no match (enterprise edition only): When checked, if the token matched previously but does not match this time, the session variable is set to null. If unchecked, an unmatched token does nothing to the session variable, so the old value persists.
- Regular Expression: Here you can designate a regular expression that will be used to match the text covered by this token. In most cases you should designate a regular expression for tokens. This makes the extraction more efficient and helps to guard against future changes that might be made to the target web site.
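The effect of a token's regular expression, and of the two session-variable options, can be sketched in Python as follows. This is an illustration under assumptions, not screen-scraper's code: `extract_token` and `update_session` are hypothetical helpers, the sample page is made up, and a token without a regular expression is assumed to behave like the non-greedy wildcard `.*?`.

```python
import re

def extract_token(before, token_regex, after, text):
    """Return the text matched by the token, or None if nothing matches.

    `before` and `after` are the literal pattern text around the token.
    """
    regex = re.escape(before) + "(" + token_regex + ")" + re.escape(after)
    m = re.search(regex, text)
    return m.group(1) if m else None

def update_session(session, identifier, value, null_if_no_match):
    """Mimic "Save in session variable" with the null-if-no-match option."""
    if value is not None:
        session[identifier] = value      # token matched: store the new value
    elif null_if_no_match:
        session[identifier] = None       # no match: clear the old value
    # otherwise: no match and option unchecked, so the old value persists

# Made-up page where the same context appears twice.
page = "Price: call us<br>Price: 42<br>"

loose = extract_token("Price: ", r".*?", "<", page)   # wildcard token
strict = extract_token("Price: ", r"\d+", "<", page)  # digits-only token

session = {"PRICE": strict}
missing = extract_token("Cost: ", r"\d+", "<", page)  # pattern not on the page
update_session(session, "PRICE", missing, null_if_no_match=False)
kept = session["PRICE"]
update_session(session, "PRICE", missing, null_if_no_match=True)
cleared = session["PRICE"]
print(loose, strict, kept, cleared)
```

The wildcard token accepts the first text that fits the surrounding context, while the digits-only token skips it and returns the value that was actually wanted. That is the sense in which a regular expression guards against unexpected content on the target site.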
Mapping Tab (enterprise edition only)
We encourage you to read our documentation on mapping extracted data before you start using mappings.
Mappings can be deleted by pressing the Delete key on your keyboard after selecting them.
Advanced Tab (enterprise edition only)
- Strip HTML (enterprise edition only): Check this box if you'd like screen-scraper to remove HTML tags from the extracted value.
- Resolve relative URL to absolute URL (enterprise edition only): If checked, this will resolve a relative URL (e.g., /myimage.gif) into an absolute URL (e.g., http://www.mysite.com/myimage.gif).
- Convert HTML entities (enterprise edition only): This will cause any HTML entities to be converted into plain text (e.g., it will convert &amp; into &).
- Trim white space (enterprise edition only): This will cause any white space characters (e.g., space, tab, return) to be removed from the start and end of the matched string.
- Exclude from DataSet/DataRecord (enterprise edition only): This prevents the token from being saved in the DataRecord for each match of the extractor pattern.
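As a rough illustration of what the first four options do to an extracted value, the Python sketch below applies equivalent transformations using the standard library. The function names, the sample value, and the page URL are assumptions made for the example, and the order in which the steps are applied here (tags, then entities, then white space) is a choice of this sketch, not documented screen-scraper behavior.

```python
import html
import re
from urllib.parse import urljoin

def strip_html(value):
    """Strip HTML: remove anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", value)

def convert_entities(value):
    """Convert HTML entities: e.g. &amp; becomes &."""
    return html.unescape(value)

def trim_white_space(value):
    """Trim white space: drop spaces, tabs, and returns at both ends."""
    return value.strip()

def resolve_url(value, page_url):
    """Resolve a relative URL against the URL of the page it came from."""
    return urljoin(page_url, value)

# Made-up extracted value and page URL.
raw = "  <b>Fish &amp; Chips</b>\t\n"
clean = trim_white_space(convert_entities(strip_html(raw)))
absolute = resolve_url("/myimage.gif", "http://www.mysite.com/products/index.html")
print(repr(clean), absolute)
```

Stripping tags before converting entities means that an escaped tag such as `&lt;b&gt;` in the source survives as visible text instead of being removed.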