Strip HTML

Scott,

I'm having trouble with comments. Any ideas?

e.g. trying to remove the likes of

or

Cantankerous on 07/02/2008 at 5:32 am
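
The examples in the post above did not survive, so what follows is only a guess at the problem. Assuming "comments" means HTML comments of the form `<!-- ... -->`, one way to remove them is a reluctant, dot-matches-newline regex. The class and method names here are made up for illustration; in a screen-scraper script only the method body would be needed.

```java
import java.util.regex.Pattern;

class CommentStripper {
    // (?s) lets '.' match line breaks so multi-line comments are caught;
    // .*? is reluctant, so each match stops at the first "-->" it meets.
    private static final Pattern HTML_COMMENT = Pattern.compile("(?s)<!--.*?-->");

    // Hypothetical helper: returns the input with every HTML comment removed.
    static String stripComments(String value) {
        if (value == null) {
            return null;
        }
        return HTML_COMMENT.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripComments("a<!-- note\nmore -->b<!--x-->c"));
    }
}
```

A greedy `.*` would swallow everything between the first `<!--` and the last `-->` on the page, which is a common reason a comment filter appears to eat real content.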

Strip HTML

Works wonderfully, thank you.

Cantankerous on 06/29/2008 at 1:56 pm

Strip HTML

Thanks, I'll give this a go.

I've only got the pro version at the moment, as most of the scraping I'm doing is one-off or infrequent, to generate XML data files.

I now know how to add new regex to the library, but a larger initial library of filters would be a nice addition.

I've found the tool to be very useful and your help even more so.

Thanks again

Alex

Cantankerous on 06/27/2008 at 6:22 pm

Strip HTML

Alex,

Both ways will work to call a function.
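
A minimal sketch of the two call styles, for anyone following along. `FakeSession` is a hypothetical stand-in for screen-scraper's `session` object so the example runs on its own, and `fixString` here is a cut-down placeholder, not the full function from this thread. In plain Java the value from `getVariable` needs a cast to `String`.

```java
import java.util.HashMap;
import java.util.Map;

class CallStyles {
    // Hypothetical stand-in for screen-scraper's session object (assumption for this sketch).
    static class FakeSession {
        private final Map<String, Object> vars = new HashMap<>();
        Object getVariable(String name) { return vars.get(name); }
        void setVariable(String name, Object value) { vars.put(name, value); }
    }

    // Cut-down placeholder: only swaps double quotes for single quotes.
    static String fixString(String value) {
        if (value == null) {
            return null;
        }
        return value.replace("\"", "'");
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        session.setVariable("myVariable", "say \"hi\"");

        // Style 1: the value is already in a local variable.
        String myVariable = (String) session.getVariable("myVariable");
        myVariable = fixString(myVariable);

        // Style 2: pass the session variable straight in.
        String direct = fixString((String) session.getVariable("myVariable"));

        // Either way, store the cleaned value back if later steps read it from the session.
        session.setVariable("myVariable", direct);
        System.out.println(myVariable.equals(direct));
    }
}
```

Note that neither style changes the session variable by itself; the cleaned string only persists if it is written back with `setVariable`.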

Again, if you're running enterprise edition you can easily convert HTML entities into ASCII by checking the Convert HTML entities box under the advanced tab for any given token.

Otherwise, give this a try.

String prepareStringForOutput( String value )
{
    if (value != null)
    {
        // Strip all HTML tags except for formatting tags
        // Escape commas; four backslashes in the replacement emit one literal backslash
        value = value.replaceAll(",", "\\\\,");
        value = value.replaceAll("\"", "\'");
        // Swap the tags to keep for placeholder tokens
        value = value.replaceAll("<ol[^<>]*>", "ol_open_!HOLD!");
        value = value.replaceAll("<li[^<>]*>", "li_!HOLD!");
        value = value.replaceAll("</ol>", "ol_close_!HOLD!");
        value = value.replaceAll("<ul[^<>]*>", "ul_open_!HOLD!");
        value = value.replaceAll("</ul>", "ul_close_!HOLD!");
        value = value.replaceAll("<p[^<>]*>", "p_open_!HOLD!");
        value = value.replaceAll("</p>", "p_close_!HOLD!");
        value = value.replaceAll("<br/>", "br_!HOLD!");
        value = value.replaceAll("
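
The rest of the function above is cut off, so here is a self-contained sketch of the same placeholder idea rather than a reconstruction of the original: swap the tags you want to keep for tokens without angle brackets, strip every remaining tag, then turn the tokens back into tags. The class name and the choice of restored tags are assumptions; only `<p>` and `<br/>` are handled to keep it short.

```java
class KeepFormattingTags {
    // Sketch of "hold, strip, restore" for a small set of formatting tags.
    static String prepareStringForOutput(String value) {
        if (value == null) {
            return null;
        }
        // 1. Hold: replace the tags to keep with bracket-free tokens.
        //    (\\s[^<>]*)? allows attributes but stops "<p" from also matching "<pre>".
        value = value.replaceAll("<p(\\s[^<>]*)?>", "p_open_!HOLD!");
        value = value.replaceAll("</p>", "p_close_!HOLD!");
        value = value.replaceAll("<br\\s*/?>", "br_!HOLD!");

        // 2. Strip: remove every tag that is still present.
        value = value.replaceAll("<[^<>]*>", "");

        // 3. Restore: literal replace (no regex) turns tokens back into bare tags.
        value = value.replace("p_open_!HOLD!", "<p>");
        value = value.replace("p_close_!HOLD!", "</p>");
        value = value.replace("br_!HOLD!", "<br/>");
        return value;
    }

    public static void main(String[] args) {
        System.out.println(prepareStringForOutput(
            "<div><p class=\"x\">Hi <b>there</b><br/>you</p></div>"));
    }
}
```

Attributes on the kept tags are dropped on the way back (`<p class="x">` becomes `<p>`), which is usually what you want for clean output but worth knowing.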

swilsonmc on 06/27/2008 at 8:31 am

Hidden characters

Scott,

Do you also have a helpful script for stripping non-visible characters?

Alex

Cantankerous on 06/27/2008 at 3:10 am
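
One possible approach to the hidden-character question, offered as a sketch only. It assumes "non-visible" means control characters and Unicode format characters such as the zero-width space, and that tabs and line breaks should be kept. The class and method names are hypothetical.

```java
class HiddenChars {
    // Removes control (Cc) and format (Cf) characters, keeping tab, LF and CR.
    static String stripHidden(String value) {
        if (value == null) {
            return null;
        }
        // Non-breaking spaces look like spaces, so turn them into ordinary ones.
        value = value.replace('\u00A0', ' ');
        // && is character-class intersection: "Cc or Cf, but not tab, LF or CR".
        return value.replaceAll("[\\p{Cc}\\p{Cf}&&[^\\t\\n\\r]]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripHidden("a\u200Bb\u0007c\td\u00A0e"));
    }
}
```

If line breaks should go too, dropping the `&&[^\\t\\n\\r]` part removes them along with the other control characters.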

Strip HTML

Scott,

Thank you very much for this, but can I ask you one more question? I'm starting from a very low level with the scripting. How do I pass the content I scrape through this function?

Is it

myVariable = fixString(myVariable)

or

myVariable = fixString(session.getVariable("myVariable"))

Alex

Cantankerous on 06/27/2008 at 1:47 am

Strip HTML

Alex,

If you want to remove the tags completely and you're using the enterprise edition, you can make short work of it by enabling the "Strip HTML" feature for each token under its advanced tab.

For basic and professional, you'll need to write a function in one of your scripts that gets called (typically) just prior to writing out the data.

String fixString(String value)
{
    if (value != null)
    {
        // Swap double quotes for single quotes
        value = value.replaceAll("\"", "\'");
        // Decode the ampersand entity
        value = value.replaceAll("&amp;", "&");
        value = value.replaceAll("
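
The function above is cut off in the post, so the following is a stand-alone sketch of the general idea and not the original code: swap quotes, remove every tag with a single regex, and decode `&amp;`. The class name is made up, and the tag pattern is an assumption about what the missing lines were meant to do.

```java
class StripAllTags {
    // Sketch: remove every HTML tag and tidy a couple of characters.
    static String fixString(String value) {
        if (value == null) {
            return null;
        }
        // Swap double quotes for single quotes (literal replace, no regex needed).
        value = value.replace("\"", "'");
        // Drop anything that looks like a tag: '<', then non-bracket characters, then '>'.
        value = value.replaceAll("<[^<>]*>", "");
        // Decode the ampersand entity last.
        value = value.replace("&amp;", "&");
        return value;
    }

    public static void main(String[] args) {
        System.out.println(fixString("<td class=\"a\">Tom &amp; \"Jerry\"</td>"));
    }
}
```

A regex like this is fine for tidying scraped fields, but it does not understand HTML, so a stray `<` in ordinary text can confuse it.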

swilsonmc on 06/25/2008 at 11:53 am
