Unable to scrape website with French characters (again)!!

I'm sorry to bother you again, but I still have my problem with French accents when I try to invoke screen-scraper from Java. It does not seem to be a hard problem, but I have had no response.
I would like to buy screen-scraper Professional, but I want to evaluate it first.

Here is the URL: http://www.novaplanet.com/bons-plans/?ville=1&ddj=2009-08-16
When I define ISO-8859-1 as the character set, the characters are OK in the extractor pattern and also when I write them to a file.
But when I try to invoke my screen-scraper project from Java (with RemoteScrapingSession), I get bad characters like "?". I have tried a lot of things but nothing works. Maybe it has something to do with RemoteScrapingSession and the character encoding?
Hope you can help me!
Cheers,
Gils
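
For reference, a literal "?" in place of an accented letter is what Java produces when text is encoded with a charset that cannot represent that letter. The sketch below only illustrates that general behaviour with the JDK; it does not use the screen-scraper API, and it is an assumption (not a confirmed diagnosis) that something similar happens between the server and RemoteScrapingSession. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of screen-scraper.
final class QuestionMarkDemo {

    // Encode the text to bytes and decode it again with the same charset,
    // as two ends of a connection would if they agreed on the encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] wire = text.getBytes(charset); // unmappable characters become '?'
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        String title = "th\u00e9matique"; // "thématique", written with an escape

        // US-ASCII has no 'é', so the encoder substitutes '?'.
        System.out.println(roundTrip(title, StandardCharsets.US_ASCII));

        // ISO-8859-1 can represent 'é', so the text survives unchanged.
        System.out.println(roundTrip(title, StandardCharsets.ISO_8859_1).equals(title));
    }
}
```

If the accented letters survive inside screen-scraper but arrive as "?" in the client, comparing the charset used on each side of the connection is a reasonable first check.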

I am on a Windows XP machine, and sometimes the OS can make a difference, so if this doesn't help, let me know your OS.

  1. In screen-scraper settings, "Default character set" is "ISO-8859-1". I found it in the pulldown.
  2. Also in settings, "Default Font" is "Arial Unicode MS".
  3. I create a scraping session with 1 scrapeable file, the URL is "http://www.novaplanet.com/bons-plans/?ville=1&ddj=2009-08-16"
  4. On the advanced tab for that scrapeable file, uncheck "tidy HTML after scraping"
  5. One extractor pattern:

class="titre">~@TEST@~

There is a RegEx in the TEST token for non-HTML.

That works for me, and I get the correct characters.

Did you try from Java?

As I said in my last mail, when I define "ISO-8859-1" as the default character set, it works fine in screen-scraper (extractor pattern) and when I write the session to a file (see the script "WriteNOVADATA" in the mail I sent you).
But when I invoke my scraping session from Java (with RemoteScrapingSession), I get bad characters like "?".
Did you try with a Java client? (I can send you my full project again.)
I tried on Mac and PC: same problem ☹

Regards
Gilles

That part works in my test too. My script is just:

// Define new session
myScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession("Nova");

// Run
myScrapingSession.scrape();

I suspect you may be passing the "session" when you define the new session. If so, I don't think you need it.

I'm using RemoteScrapingSession in my Java client, not RunnableScrapingSession! So screen-scraper is running as a server.
Maybe my problem is similar to this one:
http://community.screen-scraper.com/node/1254
I can send you my project and my Java client if you need them!

Thanks for your help
Gilles

Giles,

If you update to the latest version of screen-scraper, there is an updated driver you can use.

Instructions to do so:

http://community.screen-scraper.com/FAQ/NoUpdates

Then make sure you're using the drivers that come with that version.

Still a problem with French characters from Java

Hi,
I upgraded screen-scraper to version 4.5.14a.
But nothing has changed! I still get bad characters in my Java client, like:
Le festival de cin� en plein aime les grandes plaines et le prouve avec sa th�matique

I feel desperate!
I can send you my whole project if that helps.
Thanks
Giles
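
The "�" in the output above is the Unicode replacement character (U+FFFD). One common way to get it in Java is to decode ISO-8859-1 bytes as if they were UTF-8. The sketch below reproduces that with the JDK only; whether this is what actually happens between the screen-scraper server and the Java client is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of screen-scraper.
final class ReplacementCharDemo {

    // Encode as ISO-8859-1 (what the page uses), then wrongly decode as UTF-8.
    static String decodeLatin1AsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Encode and decode with the same charset, which preserves the text.
    static String decodeLatin1AsLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "th\u00e9matique"; // "thématique", written with an escape

        // The lone byte 0xE9 is not valid UTF-8, so the decoder emits U+FFFD.
        String wrong = decodeLatin1AsUtf8(word);
        System.out.println(wrong.indexOf('\uFFFD') >= 0);

        // Matching charsets on both sides keep the accent intact.
        System.out.println(decodeLatin1AsLatin1(word).equals(word));
    }
}
```

The booleans are printed instead of the strings themselves so that the console's own encoding does not add a second layer of confusion.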

Okay, do you want to email it over to me?

[email protected]