Ignoring HTML codes
I'm using an evaluation copy of screen-scraper professional (Version 2.7.2).
I would like to set up an extraction pattern that extracts the URL from the sample below (without the extra HTML code). I reviewed other entries in this forum and saw a reference to a "Strip HTML" checkbox under the Advanced tab for a given extraction pattern. I do not see that checkbox listed.
Is there also a way to do this with a regular expression? Please describe.
Appreciatively,
Peter
_____________
First try accessing the PHP directly here: http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php.
Hi Todd,
Wonderful explanation! Thanks for presenting different options.
I would never have figured out the token editing procedure without your help.
Best regards,
Peter
Hi Peter,
The simplest way to do this would be to create a targeted extractor pattern that pulls only the URL from the HTML. Something like this:
" target="_new">~@URL@~</a>
or perhaps this:
~@JUNK_TEXT@~ <a href="~@RELATIVE_URL@~" target="_new">~@URL@~</a>.
</p>
If you use the "Strip HTML" option on that entire string of text, you'd get something like this:
First try accessing the PHP directly here: http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php
To address your question on the "Strip HTML" option: if you edit an extractor pattern token (double-click it, or select it, right-click, and select "Edit token"), you'll see the option under the "Advanced" tab.
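On the regular expression question, here is a rough sketch in Python of what the two approaches above amount to. It is not how screen-scraper implements extractor patterns or "Strip HTML" internally. The sample markup is an assumption: the href value is made up for illustration, and the names `SAMPLE`, `extract_url`, and `strip_html` are hypothetical.

```python
import re

# Hypothetical sample markup. The href value is assumed for illustration,
# since the original HTML is not shown in full.
SAMPLE = (
    'First try accessing the PHP directly here: '
    '<a href="/support/tutorials/tutorial5/db/save_product.php" target="_new">'
    'http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php</a>.'
)

# Rough regex counterpart of the pattern: " target="_new">~@URL@~</a>
# The capture group stands in for the ~@URL@~ token.
URL_PATTERN = re.compile(r'" target="_new">([^<]+)</a>')


def extract_url(html):
    """Return the link text of the first matching anchor, or None."""
    match = URL_PATTERN.search(html)
    return match.group(1) if match else None


def strip_html(html):
    """Naively remove anything that looks like a tag. Fine for simple
    snippets like this one, not a general HTML parser."""
    return re.sub(r'<[^>]+>', '', html)


if __name__ == '__main__':
    print(extract_url(SAMPLE))
    print(strip_html(SAMPLE))
```

The first function mirrors the targeted extractor pattern, and the second mirrors stripping the HTML from the whole string. The targeted version is usually the safer choice, because it does not depend on how the surrounding text is worded.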
Kind regards,
Todd Wilson