1 issues possible bug?
I've used the Shopping .sss as a base to learn and create my own .sss.
When I try to create the pattern match, my '~@DataRecord@~' token isn't generated from all of the text I've selected. Some of the selected text is left outside of the expression.
Example:
Hi, I'm Bob and I would like to be friends.
Becomes:
~@DataRecord@~ o be friends.
I thought this was due to the length of the expression, but after creating a significantly longer one and running it, I don't think that is the case.
Edit:
Second issue was a data type issue. I resolved it.
I have a high level of confidence in the integrity of the extractors.
I whipped up a quick scrape to grab this. What you need to do is:
You should then be able to see my extractor. Note the RegEx used in each token.
<scraping-session use-strict-mode="true"><script-instances><owner-type>ScrapingSession</owner-type><owner-name>Test</owner-name></script-instances><name>Test</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>October 06, 2011 08:58:46</date_exported><character_set>UTF-8</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jericho"><last-scraped-data></last-scraped-data><URL>http://na.leagueoflegends.com/champions/84/akali_the_fist_of_shadow</URL><BASICAuthenticationUsername></BASICAuthenticationUsername><last-request></last-request><name>New Scrapeable File</name><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text><tr>
~@ws@~<td class="stats_name">~@NAME@~</td>
~@ws@~<td class="stats_value">~@VALUE@~</td>
~@ws@~<td class="stats_modifier">~@ws@~(~@ws@~<span class="ability_per_level_stat">~@MODIFIER@~~@ws@~</span>~@ws@~)~@ws@~</td>
~@ws@~</tr></pattern-text><identifier>Stats</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="true" exclude-from-data="false" null-session-variable="false" sequence="4"><regular-expression>[^<>]*</regular-expression><identifier>VALUE</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="true" exclude-from-data="false" null-session-variable="false" sequence="2"><regular-expression>[^<>]*</regular-expression><identifier>NAME</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="false" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="true" exclude-from-data="false" null-session-variable="false" sequence="8"><regular-expression>[^<>]*</regular-expression><identifier>MODIFIER</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="true" null-session-variable="false" sequence="12"><regular-expression>[\n\t\s]*</regular-expression><identifier>ws</identifier></extractor-pattern-tokens><script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Stats</owner-name></script-instances></extractor-patterns><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>New Scrapeable File</owner-name></script-instances></scrapeable-files></scraping-session>
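
To make the role of the per-token RegEx easier to see outside of screen-scraper, here is a rough Python sketch of the same idea. It is not how screen-scraper itself is implemented; it only assumes that each `~@TOKEN@~` marker can be read as "substitute this token's regular expression here" and that the rest of the pattern text matches literally. The function names (`pattern_to_regex`, `extract`) and the sample HTML row are made up for illustration. The pattern text and token expressions are copied from the export above.

```python
import re

# Token regexes copied from the exported session above.
TOKEN_REGEX = {
    "NAME": r"[^<>]*",
    "VALUE": r"[^<>]*",
    "MODIFIER": r"[^<>]*",
    "ws": r"[\n\t\s]*",
}

# Pattern text copied from the "Stats" extractor pattern.
PATTERN_TEXT = """<tr>
~@ws@~<td class="stats_name">~@NAME@~</td>
~@ws@~<td class="stats_value">~@VALUE@~</td>
~@ws@~<td class="stats_modifier">~@ws@~(~@ws@~<span class="ability_per_level_stat">~@MODIFIER@~~@ws@~</span>~@ws@~)~@ws@~</td>
~@ws@~</tr>"""

# Hypothetical sample row, invented here; it is not taken from the real page.
SAMPLE_HTML = """<table>
<tr>
    <td class="stats_name">Health</td>
    <td class="stats_value">445</td>
    <td class="stats_modifier"> (<span class="ability_per_level_stat">+85 </span>) </td>
</tr>
</table>"""


def pattern_to_regex(pattern_text, token_regex, excluded=()):
    """Build one compiled regex from pattern text with ~@TOKEN@~ markers.

    Assumption: literal text is matched exactly, and each marker is
    replaced by that token's regex. Tokens listed in `excluded` (like
    `ws`, which has exclude-from-data="true") become non-capturing.
    """
    # With a capture group, re.split alternates: literal, token, literal, ...
    parts = re.split(r"~@(\w+)@~", pattern_text)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))  # literal HTML between tokens
        elif part in excluded:
            pieces.append("(?:" + token_regex[part] + ")")
        else:
            pieces.append("(?P<" + part + ">" + token_regex[part] + ")")
    return re.compile("".join(pieces))


def extract(html, pattern_text, token_regex, excluded=()):
    """Return one dict per match, trimming whitespace from each value."""
    regex = pattern_to_regex(pattern_text, token_regex, excluded)
    return [
        {name: value.strip() for name, value in match.groupdict().items()}
        for match in regex.finditer(html)
    ]


if __name__ == "__main__":
    for record in extract(SAMPLE_HTML, PATTERN_TEXT, TOKEN_REGEX, excluded={"ws"}):
        print(record)
```

The point of `[^<>]*` is that a token can never run past the tag it sits in, so each match stays inside one table cell. The `ws` token soaks up whatever indentation or line breaks sit between the tags.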