UTF-8 in Linux
I moved my machine over to Linux and found that with UTF-8 I'm now getting a squiggly A character in front of a £. I'm using the same settings as I did in Windows, where I never had this issue. The characters keep appearing in the debug log. Are there any settings that need to be changed specifically for Linux?
I'm scraping a mixture of prices and French and German characters. I get: menère and £5 appearing. Is there anything else I can try?
Anyone get UTF-8 to work correctly in Linux?
Still not having any joy. Has anyone got screen-scraper to work correctly in Linux with UTF-8? Squiggly As are still appearing.
I've done lots. What's the language you're dealing with?
I imagine that you'll need to set the default font to one that supports your characters. In Windows we often use MS Arial Unicode ... I can't remember a good one for Linux at the moment, but I found a list of candidates here.
Tried.....
I installed Arial on the Linux machine (copied from my Windows machine) and the same issue is happening. Any other ideas?
griffen,
I have sometimes found that even when a site declares UTF-8 in a meta tag, a different character set works better in screen-scraper.
If you haven't yet, try other character sets. You can make short work of the process by setting up a test scrape which iterates over a list of character sets and calls the same scrapeable file for each. Write to the log a snippet of the last response where you're seeing the unprintable characters.
Use session.setCharacterSet and try some of the following.
CP1252
GB2312
ISO-8859-1
US-ASCII
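To see why the choice of character set matters, here is a minimal standalone sketch in plain Python (not a screen-scraper script) that decodes the same bytes with each candidate character set and prints the result. The sample string and the `try_charsets` helper are made up for illustration. It assumes the page really sends UTF-8 bytes; in that case, reading them as CP1252 or ISO-8859-1 is what puts an Â in front of the £.

```python
# Hypothetical sample: the bytes a UTF-8 page would send for this text.
raw = "menère and £5".encode("utf-8")

# UTF-8 plus the character sets suggested above.
candidates = ["UTF-8", "CP1252", "GB2312", "ISO-8859-1", "US-ASCII"]


def try_charsets(data, charsets):
    """Decode the same bytes with each charset and return {charset: text}."""
    results = {}
    for name in charsets:
        # errors="replace" substitutes U+FFFD for undecodable bytes
        # instead of raising, so every charset yields something to compare.
        results[name] = data.decode(name, errors="replace")
    return results


results = try_charsets(raw, candidates)
for name, text in results.items():
    print(f"{name:<10} -> {text!r}")
```

Inside screen-scraper the equivalent loop would call session.setCharacterSet with each name before scraping the same file, then log a snippet of the response, as described above. Whichever setting prints the text cleanly is the one to keep.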
For more ideas, read our FAQ on the topic.
-Scott