CsvWriter encoding issue
I am scraping product names from a website, and one of the products is listed on the site as "Convoy™ 2 u660".
With the default setting (tidying the HTML with JTidy), the product name was scraped as "Convoyª 2 u660".
I disabled HTML tidying for that page, and the product name now shows correctly on the console as "Convoy™ 2 u660".
However, when I write it to a file using the CsvWriter, it is written as "Convoyª 2 u660".
I have the character set for the scraping session set to UTF-8.
Is there an encoding bug in the CsvWriter?
- Vivek
Is your original text in UTF-8? If not, setting the CSV writer to output the same encoding as the original should fix it.
Yes, the original text is in UTF-8.
How do I set the encoding on the CsvWriter? It seems to use whatever the default is, and that is causing the weird characters in the CSV output file, whereas the session log shows everything correctly.
The CsvWriter doesn't have a way to change the character set. If you need a specific encoding, you may need to use an alternative means of writing the output.
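For anyone who lands here with the same problem, one possible "alternative means" is to write the file with plain Java I/O and name the charset explicitly instead of relying on the platform default. The following is only a minimal sketch under some assumptions: the class name Utf8CsvExample, the helper methods escape and writeCsv, and the file name products.csv are all made up for illustration and are not part of screen-scraper or its CsvWriter.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of screen-scraper.
public class Utf8CsvExample {

    // Quote a field only when it contains a comma, a double quote, or a
    // line break; embedded double quotes are doubled. Null becomes empty.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Write all rows to the given file, always encoding the text as UTF-8
    // regardless of the JVM's default charset.
    static void writeCsv(Path file, List<String[]> rows) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8))) {
            for (String[] row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) {
                        line.append(',');
                    }
                    line.append(escape(row[i]));
                }
                out.write(line.append("\r\n").toString());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed output location; adjust to wherever the CSV should go.
        Path file = Paths.get("products.csv");
        writeCsv(file, Arrays.asList(
                new String[] {"Product"},
                new String[] {"Convoy\u2122 2 u660"}));
    }
}
```

The important part is the OutputStreamWriter constructed with StandardCharsets.UTF_8; a writer created without a charset typically falls back to the platform default, which would match the symptom described above. It is also worth opening the resulting file in an editor that lets you choose the encoding, since some spreadsheet programs guess the encoding of a CSV file and can display correct UTF-8 data incorrectly.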