Unable to scrape website with French characters
Hi,
I am starting to evaluate Screen Scraper Professional, which looks good, but it seems impossible to scrape French websites.
I want to invoke screen-scraper from Java.
I turned off "Tidy HTML" and used the Arial Unicode MS font and the UTF-8 character set, as explained in the FAQ and on the forum.
All the French characters are displayed as ?, both in the text file and in my Java client.
The only thing that works is to set the character set to ISO-8859-1 and write the output to a file. In Java, the French characters are never displayed correctly, even if I try to convert them to UTF-8.
Please help me!
Gils
PS: I can send you my projects if that helps.
I also deleted another copy of this thread so things don't get confused.
Gils,
From your notes, it looks like you've taken the correct steps. Can you share the URL to the site you're scraping, and maybe by looking I can come up with something to help.
Cheers,
~Jason
Here is the URL: http://www.novaplanet.com/bons-plans/?ville=1&ddj=2009-08-16
When I set ISO-8859-1 as the character set, the characters are OK in the extractor pattern and also when I write them to a file.
But when I try to invoke my screen-scraper project from Java (with RemoteScrapingSession), I get bad characters like cin?. I have tried a lot of things, but nothing works. Maybe it has something to do with RemoteScrapingSession and the character encoding?
Hope you can help me!
Cheers,
Gils
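For readers hitting the same symptom, here is a minimal sketch of how a result like cin? can arise in plain Java when text is encoded with one character set and decoded with another. It uses only the JDK and does not call the screen-scraper API. The class name CharsetMismatchDemo and its helper methods are hypothetical, and it is an assumption that a mismatch of this kind is what happens on the RemoteScrapingSession path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of screen-scraper.
class CharsetMismatchDemo {

    // Encode text with one charset, then decode the bytes with another.
    // This mimics a sender and a receiver that disagree on the encoding.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    // Render each char as U+XXXX so the output does not depend on the
    // console's own encoding.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("U+%04X ", (int) text.charAt(i)));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "cinema" with an e-acute, written as an escape to avoid
        // source-file encoding issues.
        String original = "cin\u00e9ma";

        // Both sides agree on ISO-8859-1: the text survives unchanged.
        String matched = transcode(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(codePoints(matched));

        // The e-acute cannot be represented in US-ASCII, so the encoder
        // substitutes a literal question mark: the result is "cin?ma".
        String ascii = transcode(original,
                StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);
        System.out.println(ascii);

        // ISO-8859-1 bytes decoded as UTF-8: the lone 0xE9 byte is not
        // valid UTF-8, so the decoder emits the replacement character
        // U+FFFD, which many consoles and fonts also show as "?".
        String mismatched = transcode(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(codePoints(mismatched));
    }
}
```

Printing code points rather than the raw string helps separate two different problems: text that is already damaged in memory, and text that is intact but shown wrongly by a console that uses another encoding.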
Hi Gils,
We actually made a fix for this in the very latest alpha version of screen-scraper. Would you mind upgrading to see if this resolves the issue for you? Here's a FAQ on updating if you need help:
http://community.screen-scraper.com/FAQ/NoUpdates
Kind regards,
Todd