Extractor pattern returns empty records

I have an extractor pattern that fails to display text. When I test the pattern, it shows one record, but the field is empty. I've tried extracting a few other items from the page, and each returns the correct number of records, but all the data is empty. I'll send in the script...

=================== Log Variables with Message ===============
screen-scraper Instance Information
=================== Static Values ================
Java Vendor: Oracle Corporation
Java Version: 1.8.0_66
OS Architecture: amd64
OS Name: Windows 8
OS Version: 6.2
Scrape HTTP Client: ApacheScrapingHttpClient
SS Connection Timeout: 180 seconds
SS Edition: Professional
SS Extractor Timeout: 120000 milliseconds
SS Max Concurrent Scraping Sessions: 5
SS Maximum Memory: 1024 MB
SS Run Mode: Workbench
SS Version: 7.0
======== Message logged at: 12/09/2016 16:26:22.892 CST ========

Upper case worked

I made the change and it works. It's odd because I have used lowercase in the past without issue.

Thanks for the help, Jeremy

I think the issue is that you used lowercase letters for the token names. Our general convention is to use all uppercase for tokens. For tokens like your "skip", where we need to account for dynamic data that we don't want to keep, we use lowercase, and lowercase tokens are excluded from the results by default. If you open the token properties and go to the Advanced tab, you'll see that "Exclude from DataSet/DataRecord" is checked. Uncheck the box and you will see the data. To avoid this in the future, use uppercase token names.

Alternatively, you can go into screen-scraper/resource/conf and edit default_token_config.xml to change this default. I have done so because I don't like anything being excluded automatically.
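To illustrate why the test shows the right number of records but empty fields, here is a minimal Java sketch built on an assumption: that any token name that is not all uppercase gets "Exclude from DataSet/DataRecord" by default. The real rule comes from default_token_config.xml and may differ, and `isExcludedByDefault` and `buildRecord` are hypothetical helpers, not part of screen-scraper's API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class TokenExclusionSketch {

    // Hypothetical rule (an assumption, not screen-scraper's code): a token whose
    // name is not entirely uppercase is treated as "exclude from DataSet/DataRecord".
    static boolean isExcludedByDefault(String tokenName) {
        return !tokenName.equals(tokenName.toUpperCase(Locale.ROOT));
    }

    // Builds one data record from the matched token values, dropping excluded tokens.
    static Map<String, String> buildRecord(Map<String, String> matchedTokens) {
        Map<String, String> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : matchedTokens.entrySet()) {
            if (!isExcludedByDefault(e.getKey())) {
                record.put(e.getKey(), e.getValue());
            }
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> lowercase = new LinkedHashMap<>();
        lowercase.put("title", "Beach House");
        lowercase.put("price", "$200");

        Map<String, String> uppercase = new LinkedHashMap<>();
        uppercase.put("TITLE", "Beach House");
        uppercase.put("PRICE", "$200");

        // A record is produced in both cases, but with lowercase names it has no fields.
        System.out.println(buildRecord(lowercase)); // {}
        System.out.println(buildRecord(uppercase)); // {TITLE=Beach House, PRICE=$200}
    }
}
```

The match itself still succeeds, which is consistent with the record count looking correct while every field appears empty.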

Empty data set

I've found a similar issue, but this time it's not related to the case of the token names. It's specific to one page, which returns "The data set is empty."

The link is: https://www.homeaway.com/vacation-rental/p4202512

The pattern is: <[email protected]@~</html

NOTE: the token doesn't have a regular expression. It appears to be an issue with the "<" character: removing it solves the problem, but sub-patterns containing other characters fail as well. They may all point back to the same problem.

I've also tried DATARECORD and other names for the token, and nothing works. Other pages on the site work, so the problem is specific to this page.

Let me know if I should send an example file.

Thanks, Jeremy

Jeremy,

When you don't set a RegEx, the back end still applies a default of ".*?", and some Unicode characters don't match the dot. If you set the token's RegEx to (?>\P{M}\p{M}*)+, which matches any run of characters (each optionally followed by combining marks), it will work.
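Below is a minimal Java sketch of the difference, assuming the token expression ends up compiled by java.util.regex without the DOTALL flag (an assumption about how screen-scraper builds the pattern). In that mode the dot does not match line terminators such as U+2028 LINE SEPARATOR, while `\P{M}` matches any character that is not a combining mark. The `extract` helper and the sample page are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenRegexSketch {

    // Returns the text captured between "<" and "</html" using the given token regex,
    // or null if nothing matches. Assumes a plain compile with no flags (no DOTALL).
    static String extract(String page, String tokenRegex) {
        Pattern p = Pattern.compile("<(" + tokenRegex + ")</html");
        Matcher m = p.matcher(page);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical page containing a U+2028 LINE SEPARATOR.
        String page = "<html>Cozy cabin\u2028sleeps 4</html>";

        String defaultRegex = ".*?";                // default used when no RegEx is set
        String anyCharRegex = "(?>\\P{M}\\p{M}*)+"; // the suggested replacement

        // Without DOTALL, "." cannot cross U+2028, so the default finds no match.
        System.out.println(extract(page, defaultRegex)); // null
        // \P{M} has no line-terminator exception, so this one matches.
        System.out.println(extract(page, anyCharRegex) != null); // true
    }
}
```

Note that the outer `+` in the replacement is greedy, so on a page with several `</html` occurrences it would extend to the last one rather than the first.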