Strip HTML leaves script text behind when JavaScript is involved
I have a piece of data that is scraped as:
<font color="#000000"><SCRIPT language="JavaScript"> var scolor='#fc050b'; var shimmercount=shimmercount+3; eval('var shimmercolor' +shimmercount+ '="' +scolor+ '"'); document.write("<span id='" + shimmercount + "animate'><b>"); </SCRIPT>Brian2112</b></span></font>
I am expecting to see only:
Brian2112
but I end up with:
var scolor='#fc050b'; var shimmercount=shimmercount+3; eval('var shimmercolor' +shimmercount+ '="' +scolor+ '"'); document.write(""); Brian2112
So essentially the strip HTML regular expression does not take the above scenario into account.
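For illustration, the behaviour described above can be reproduced outside the tool with a rough Python sketch. It assumes that tag stripping amounts to deleting every run from "<" to the next ">"; that is only a guess at how the option works, not its actual implementation, and the names TAG and stripped are made up for this example.

```python
import re

# The scraped fragment from the post, copied verbatim.
html = r'''<font color="#000000"><SCRIPT language="JavaScript"> var scolor='#fc050b'; var shimmercount=shimmercount+3; eval('var shimmercolor' +shimmercount+ '="' +scolor+ '"'); document.write("<span id='" + shimmercount + "animate'><b>"); </SCRIPT>Brian2112</b></span></font>'''

# Assumption: stripping HTML is modelled as removing each "<...>" run.
# The text between <SCRIPT> and </SCRIPT> is not inside a tag, so it survives.
TAG = re.compile(r"<[^>]*>")

stripped = TAG.sub("", html).strip()
print(stripped)
```

Under this assumption the script body is kept and the name only appears at the very end of the result, which is consistent with the output shown above.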
I tested it and it works fine.
I scraped your link using the token
~@NAME@~
and it returned Brian2112 to me.
I used this regular expression:
[^<>]*
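A rough equivalent in Python, as a sketch only: the token becomes a named group NAME that uses the same [^<>]* expression. The surrounding text (</SCRIPT> before the token and </b> after it) is an assumption made for this example, since the full extractor pattern is not shown here.

```python
import re

# The scraped fragment from the original post, copied verbatim.
html = r'''<font color="#000000"><SCRIPT language="JavaScript"> var scolor='#fc050b'; var shimmercount=shimmercount+3; eval('var shimmercolor' +shimmercount+ '="' +scolor+ '"'); document.write("<span id='" + shimmercount + "animate'><b>"); </SCRIPT>Brian2112</b></span></font>'''

# Hypothetical anchors around the token: "</SCRIPT>" and "</b>".
# The named group plays the role of ~@NAME@~ and only accepts
# characters that are not angle brackets.
pattern = re.compile(r"</SCRIPT>(?P<NAME>[^<>]*)</b>")

match = pattern.search(html)
name = match.group("NAME") if match else None
print(name)
```

Because the group cannot cross a "<" or ">", it cannot swallow the script block or any tag, so no stripping step is needed afterwards.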
That's a good suggestion, angeloalves.
Dean, the "Strip HTML" option finds and removes anything between an opening and a closing angle bracket (< and >), that is, the tags themselves. The JavaScript text between the <SCRIPT> and </SCRIPT> tags is not inside a tag, so it is left behind. In your case I would recommend not using that feature. Instead, do as angeloalves suggests: place a token over "Brian2112" and set its regular expression to "non HTML tags". When your extractor pattern is applied, it will match only the name.
-Scott