Problem with case sensitivity in html tags
A website I am scraping occasionally outputs an HTML file whose source tags are capitalized.
For example, a pattern would be created for
and it would work most of the time. Occasionally a page would have
and the pattern would no longer match.
Is there a way to turn off case sensitivity in SS?
I can think of two options:
1. Create a second extractor pattern for instances where the tags are capitalized
2. Download my searches, create a script to change everything to lowercase, and then scrape.
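Option 2 would not need much code. Below is a minimal sketch in Python, assuming the downloaded pages are available as strings. The function name `lowercase_tag_names` is hypothetical. It lowercases only element names, so attribute values and page text keep their original case. It is a rough regex approach: it does not lowercase attribute names, and it could misfire on `<` followed by a letter inside scripts or comments. A real HTML parser (or HTML Tidy) is more robust.

```python
import re

# Matches the "<" or "</" that opens a tag, followed by the element name.
# Only the element name is captured and changed; attributes and text are untouched.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tag_names(html: str) -> str:
    """Return html with every element name lowercased (hypothetical helper)."""
    return TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), html)


if __name__ == "__main__":
    page = '<TD class="Price">$10</TD><td>OK</td>'
    print(lowercase_tag_names(page))  # <td class="Price">$10</td><td>OK</td>
```

With every page normalized this way, a single lowercase extractor pattern would cover both variants.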
I would prefer not having to do either.
Thanks in advance.
dliu,
Before HTML is processed by a scrapeable file, it is first parsed using HTML Tidy (http://www.w3.org/People/Raggett/tidy/). Among other things, HTML Tidy converts all HTML tags to lowercase.
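To illustrate why a normalizing parse step makes a single pattern sufficient, here is a small sketch that uses Python's standard-library `html.parser`, not HTML Tidy and not screen-scraper itself. `HTMLParser` reports tag names already converted to lowercase, so `<TD>` and `<td>` produce identical output. The `TagCollector` class and `tag_sequence` function are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start and end tags; HTMLParser passes tag names in lowercase."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html: str) -> list[str]:
    """Parse html and return the sequence of tags it contains."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(tag_sequence("<TD>10</TD>"))  # ['td', '/td']
    print(tag_sequence("<td>10</td>"))  # ['td', '/td']
```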
Is it possible that you unchecked the box that says "Tidy HTML after scraping? (recommended)" on the Advanced tab of the scrapeable file where you're having this issue?
If that isn't the case, please let us know and we'll investigate further.
Thanks,
Scott