UTF-8

I'm working with international characters and seem to be experiencing the same issues that have befallen many a screen scraper!

Have set the default character set to UTF-8 and the default font to Arial Unicode MS.

The information is being posted to a PHP file which is encoded in UTF-8.

The content type is set to charset=utf-8, and the Accept-Charset header is set to: ISO-8859-1,utf-8;q=0.7,*;q=0.7

Have also tried Tidy HTML after scraping, no joy.

URL is: http://www.oddbins.com/products/productdetail.asp?ProductCode=32882

I get lots of � characters appearing for £ and French characters.

See script: http://www.blue-curve.com/local.sss

There must be something that I'm missing here! Any suggestions?

International character sets

International character sets are a pain. Most of the time you need to turn off HTML tidy. Since the site declares the character set it is using, the site's setting should override your default character set. The font you indicate is appropriate.
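
To illustrate the idea of a declared character set overriding a default, here is a rough Python sketch. It is not how screen-scraper works internally. The function names, the sample text and the assumption that the page is served as ISO-8859-1 are all hypothetical; the actual encoding of the page in question may differ.

```python
import re


def declared_charset(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, else the default."""
    match = re.search(
        r"charset\s*=\s*[\"']?([\w.:-]+)", content_type or "", re.IGNORECASE
    )
    return match.group(1).lower() if match else default


def decode_body(raw, content_type, default="utf-8"):
    """Decode with the declared charset; undecodable bytes become U+FFFD."""
    return raw.decode(declared_charset(content_type, default), errors="replace")


# Hypothetical response body: sample text stored as ISO-8859-1 bytes.
raw = "Château £5".encode("iso-8859-1")

# No declaration, so the UTF-8 default is used and the accented
# letter and the pound sign come out as replacement characters.
wrong = decode_body(raw, "text/html")

# The declared charset wins over the default and the text decodes cleanly.
right = decode_body(raw, "text/html; charset=ISO-8859-1")
```

If the bytes really are ISO-8859-1 and they get read as UTF-8, every £ or accented letter is an invalid byte sequence, which would explain the � characters.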

Character set order of priority

griffen,

I've updated one of our FAQs on this topic. Please have a look and let us know if you have any questions.

Scott