Inconsistent characters in XML output
My XML parser says I have a bad character but it doesn't know me that well.
Let me see if I can explain:
I am having the same issue on every version of screen-scraper I have tried (3.0.67a, 3.0.70a, and currently 4.0) running on Windows XP Pro. I have several scrapes that worked fine until a few days ago; they are still working in production but not on my development box.
The output from the scrapes is an XML file with standard UTF-8 formatting. Screen-scraper has always automatically converted special characters (&, >, foreign characters, etc.) to their ASCII equivalents so that they do not break the XML. The problem is that a few days ago scrapes produced on my development box stopped converting some of those characters, leaving the original scraped value. Below are two simplified examples of the same scrape: the first run on our production server, the second on my development PC. Look at the French word for "technology" in both scrapes. The first example is what I want. Hopefully the differences will show in this posting.
Example production version scrape results (Win Server 2003):
tecnologí
313446
Example of the same scrape run on my dev pc (dell laptop XP Pro):
tecnologí¡
313446
Have you ever seen this before? I welcome any ideas of things to try.
thanks,
Joel
Inconsistent characters in XML output
Just a final note for anyone experiencing a similar problem with foreign-language characters not being translated to their ASCII equivalents by screen-scraper.
The problem is being fixed permanently, but for now my solution was to downgrade to the version before the change went in: version 3.0.67a.
Instructions for moving to this or any other version manually can be found here - http://www.screen-scraper.com/support/faq/faq.php#GUILessUpdate
Good luck and thanks
-Joel
Inconsistent characters in XML output
Joel,
Could you let me know (pm me if you prefer) which scraping session this pertains to?
Thanks,
Scott
Inconsistent characters in XML output
Scott,
The server is running 3.0.67a
My dev box was on 3.0.67a, but the install got corrupted a few days ago and I had to upgrade to 3.0.70a. When I found I was having issues, I updated the dev box to 4.0.
I am not sure whether the issue existed before I started to upgrade.
I am not using the setting on the advanced tab. I am adding the data to the XML file using:
xmlWriter.addElement(job, "description", prepareStringForOutput(description));
FYI: this particular script was originally written by a contractor and is now being finished by Jason at SS.
I am having the same issue with other older scripts that are also still working fine in production.
thanks for the help.
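
For anyone reading along: the body of prepareStringForOutput is not shown in this thread, so the following is only a rough sketch of the kind of conversion being described, not screen-scraper's own code. The class and method names (XmlEscapeSketch, escapeForXml) are made up for illustration. It replaces XML-significant characters with entities and every non-ASCII character with a numeric character reference, so the output stays pure ASCII no matter which encoding the file is written in.

```java
// Hypothetical helper, NOT screen-scraper's prepareStringForOutput.
class XmlEscapeSketch {

    // Escape XML-significant characters and turn anything outside
    // 7-bit ASCII into a decimal numeric character reference.
    static String escapeForXml(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 127) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00ed is i-acute, written as an escape so the source file
        // itself does not depend on any particular encoding.
        System.out.println(escapeForXml("tecnolog\u00eda & m\u00e1s"));
        // prints: tecnolog&#237;a &amp; m&#225;s
    }
}
```

A script could run the scraped value through a helper like this just before the xmlWriter.addElement call, which would take the platform's default encoding out of the picture for that field.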
Inconsistent characters in XML output
Joel,
What version of screen-scraper is running on the two instances you're referencing?
Win 2003 = 3.0.67a?
XP Pro = 4.0?
Did this coincide with you upgrading to 3.0.70a or to 4.0, by chance? We made some changes to the way that screen-scraper handles character encoding, and they happen to have a small relationship with HTML entities (it was an odd bug to fix). The change took place roughly around 3.0.67a, and 3.0.70a and newer contain this fix.
How are you converting the HTML entities? Are you using the setting under the advanced tab for the extractor pattern token to "convert HTML entities" or are you doing it in a script?
Have you changed your default font and/or character encoding on either box recently?
-Scott
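
As a quick way to check the character-encoding question on each box, here is a small standalone Java sketch. It is plain JDK code, not part of screen-scraper, and the class and method names are invented for the example. It prints the JVM's default charset and shows what happens when UTF-8 bytes are decoded as ISO-8859-1, which is one common way an accented character turns into two junk characters. Whether that is what is happening in this case is only a guess.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic, not screen-scraper code.
class CharsetMismatchSketch {

    // Encode text as UTF-8, then decode the bytes as ISO-8859-1,
    // simulating a reader that assumes the wrong encoding.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Compare this value between the two machines.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // i-acute (U+00ED) is two bytes in UTF-8 (0xC3 0xAD), so the
        // wrongly decoded string comes out one character longer.
        String garbled = misdecode("tecnolog\u00eda");
        System.out.println(garbled + " (length " + garbled.length() + ")");
    }
}
```

If the two machines report different default charsets, that would be worth mentioning in your reply.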