Avoid case sensitivity
Hi
I have a problem scraping a page. On some of the pages the HTML tags are lower case and on others they are upper case. In both cases the letters are the same. My problem is that my pattern will only match one of them. One solution would be to make two patterns, but isn't there another solution?
Hans
In those cases, I have to replace the text in the tags with a token. For example, if the tag can show up as either:
<b>Desired data</b>
or:
<B>Desired data</B>
My extractor will look like:
<~@B@~>~@DESIRED_DATA@~</~@B@~>
and in the "B" token I will set a RegEx of "B|b" to match either.
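For anyone who wants to try the idea outside the scraping tool, here is a rough Python sketch of the same alternation. It is only an analogue of the extractor pattern above, not the tool's own syntax; the names TAG, pattern, extract and desired_data are made up for illustration.

```python
import re

# Hypothetical stand-in for the "B" token: the tag name is matched by the
# alternation B|b, so either <b>...</b> or <B>...</B> is accepted.
TAG = r"(?:B|b)"

# The named group plays the role of the DESIRED_DATA token.
pattern = re.compile(rf"<{TAG}>(?P<desired_data>.*?)</{TAG}>")


def extract(html):
    """Return the text found between bold tags, whatever the tag's case."""
    return [m.group("desired_data") for m in pattern.finditer(html)]
```

Calling extract on a string containing both <b>Desired data</b> and <B>Desired data</B> should return both pieces of text. The alternation gets long quickly for tags with more letters, since every case combination has to be spelled out.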
Same solution
Hi Jason
Thanks for your reply. I came to the same solution myself. Instead of using B|b I used (?i)b, which makes the RegEx case insensitive (easier if it is a longer tag).
Hans
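A small Python sketch of the (?i) approach, for comparison. This assumes a regex engine that supports the inline (?i) flag, which Python's re module does when the flag is placed at the start of the pattern; tag_pattern and extract are hypothetical helper names, and the pattern is an analogue rather than the scraping tool's extractor syntax.

```python
import re


def tag_pattern(tag_name):
    """Build a pattern for <tag>...</tag> that ignores letter case."""
    name = re.escape(tag_name)
    # (?i) at the very start turns on case-insensitive matching for the
    # whole pattern, so no per-letter alternation is needed.
    return re.compile(rf"(?i)<{name}>(.*?)</{name}>")


def extract(tag_name, html):
    """Return the text inside every matching tag, regardless of case."""
    return tag_pattern(tag_name).findall(html)
```

With a longer tag such as blockquote, the same one-line pattern covers BLOCKQUOTE, blockquote and any mixed-case spelling, which is where this is easier than writing out an alternation.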