btw, is it possible to get a more powerful regex parser in the next update? This one doesn't seem to support lookbehinds, which I think would be VERY useful.
I realize that this thread is very, very old, but for future viewers, I thought I'd add to the discussion, despite any lack of replies.
One primary reason lookbehind doesn't work well is that screen-scraper itself uses parentheses behind the scenes, which is what gives an extractor pattern its flexibility.
Consequently, using parentheses inside a token within an extractor pattern throws off the way screen-scraper parses its way through the tokens. Even though lookahead/lookbehind constructs use a plain "(" and ")" (i.e., they don't create "reference groups"), screen-scraper still sees them for what they literally are (actual parentheses) and has trouble digesting the token.
Expressions in a token that cause problems include:
// Lookahead/behind
A(?=police) // matching the string "A" only when followed by "police"
(?<=car)A // matching the string "A" only when appearing after "car"
(car)(?=police) // like the first example, but matching "car" only when followed by "police"
// Trying to do wildcard match on a group
(someText)?
(someText)*
(someText)+
// Creating reference groups
(left)(right)\1\2 // matching "leftrightleftright"
Note that reference groups (and consequently back references such as "\1") may be used on the "Mapping" tab of the token editor.
You're still free to use lookahead/lookbehind in your scripts, though. For instance, Java ships with the java.util.regex package to handle such things, and you have complete access to regex power through Java once you read your values from the dataRecord/sessionVariables.
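As a rough illustration of the script route, here is a minimal sketch in plain Java using java.util.regex. The class name, the helper firstMatch, and the sample strings are made up for the example; in a real script the input would be a value you had already pulled out of the dataRecord or a session variable, which is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundSketch {
    // Hypothetical helper: returns the first match of regex in input,
    // or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Placeholder strings standing in for extracted values.

        // Lookbehind: "A" only when it appears right after "car".
        System.out.println(firstMatch("(?<=car)A", "carA"));      // A
        System.out.println(firstMatch("(?<=car)A", "busA"));      // null

        // Lookahead: "car" only when followed by "police".
        System.out.println(firstMatch("car(?=police)", "carpolice")); // car
        System.out.println(firstMatch("car(?=police)", "carpark"));   // null

        // Reference groups with back references.
        System.out.println(firstMatch("(left)(right)\\1\\2", "leftrightleftright"));
    }
}
```

The lookaround parts are zero-width, so only the "A" or "car" itself ends up in the match, and that result can then be stored back wherever your script needs it.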
MORE REGEX PLEASE IM BRITISH
That would be nice.