XML file with Shift_JIS character set

Hi Guys,
I'm scraping a website that uses the Shift_JIS character set, and I have what is probably a simple problem to fix with the data I receive. In the screen-scraper log the results look correct:
//_LINK_: KN2300060600394539
//ENTITY_NAME: KOKO・TOYOTA
//ENTITY_ADDRESS: 〒471-0034 愛知県豊田市小坂本町4丁目1−4
//PHONE: 0838-26-5200

//_LINK_: KN3500060700059393
//ENTITY_NAME: ファミリーtoyota
//ENTITY_ADDRESS: 〒758-0011 山口県萩市大字椿東無田ケ原2884−1
//PHONE: 0120-060861

//_LINK_: KN2307011300001766
//ENTITY_NAME: トヨタすまいるライフ株式会社/レジデンス・THE・TOYOTAマンションパビリオン
//ENTITY_ADDRESS: 〒471-0878 愛知県豊田市下林町1丁目3−3−1501
//PHONE: 0565-37-8567

But the resulting XML file looks like this:


<_LINK_>KN2300060600394539
???????????
?471-0034?????????????????
0838-26-5200


<_LINK_>KN3500060700059393
???????????
?758-0011????????????????????
0120-060861


<_LINK_>KN2307011300001766
?????????????????????????????????????????
?471-0878?????????????????????
0565-37-8567

Can you please help me sort out this problem? How should I set up the xmlWriter?

Best Regards,

Radek

We checked, and as of right now the XmlWriter doesn't allow the character set to be specified. It should, though, so we're going to add it. Watch the blog for a note when we release an alpha version with this feature in the next day or two.

We have added this in version 5.5.3a. If you upgrade to this version, you can use the scraping session below to see how it works.

First, although the site serves its pages in Shift_JIS, the writer needs to be set to UTF-8 to output the characters correctly.
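The question marks in the XML file are what Java typically produces when text is encoded with a character set that cannot represent it. Here is a minimal sketch of that effect using only the standard library, not screen-scraper's XmlWriter. The class name `EncodingDemo` is made up for illustration, and ISO-8859-1 is only an assumed stand-in for whatever default the writer used before the fix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode the text to bytes in the given charset, then decode it back.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "愛知県" written as Unicode escapes so the source file encoding does not matter.
        String prefecture = "\u611B\u77E5\u770C";

        // ISO-8859-1 (assumed stand-in) has no Japanese characters: each becomes '?'.
        System.out.println(roundTrip(prefecture, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every Unicode character, so the text survives intact.
        System.out.println(roundTrip(prefecture, StandardCharsets.UTF_8).equals(prefecture));
    }
}
```

Once the text has been replaced with question marks the original characters are gone, so the encoding has to be set on the writer itself rather than repaired afterwards.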

Copy the text below into your editor, save the file as "XML Writer.sss", and import it into screen-scraper.

<?xml version="1.0" encoding="Shift_JIS"?>
<scraping-session use-strict-mode="true"><script-instances><script-instances when-to-run="20" sequence="1" enabled="true"><script><script-text>xmlWriter =
        new com.screenscraper.xml.XmlWriter
        (
                "output/test.xml",
                "root_element",
                "This is the root element",
                null,
                //"Shift_JIS"
                "UTF-8"
        );

xmlWriter.addElement( "foo", session.getv( "SAMPLE" ) );

xmlWriter.close();</script-text><name>XML Writer--go</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapingSession</owner-type><owner-name>XML Writer</owner-name></script-instances><name>XML Writer</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>April 28, 2011 10:09:33</date_exported><character_set>Shift_JIS</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jericho"><last-scraped-data></last-scraped-data><URL>http://www.phdcc.com/fiscd/japan.htm</URL><BASICAuthenticationUsername></BASICAuthenticationUsername><last-request></last-request><name>Sample</name><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>&lt;TITLE&gt;&#xd;
   ~@SAMPLE@~&#xd;
  &lt;/TITLE&gt;&#xd;
</pattern-text><identifier>Sample</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="true" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[^&lt;&gt;]*</regular-expression><identifier>SAMPLE</identifier></extractor-pattern-tokens><script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Sample</owner-name></script-instances></extractor-patterns><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>Sample</owner-name></script-instances></scrapeable-files></scraping-session>
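For comparison, the same idea can be sketched outside screen-scraper with plain Java: the bytes are written as UTF-8 and the XML declaration names the same encoding. This is only an illustration of the principle, not how XmlWriter is implemented. The class `Utf8XmlSketch`, its methods, and the temporary file are hypothetical; the element names are borrowed from the session and log output above.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8XmlSketch {
    // Escape the characters that are not allowed in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Write a tiny XML document, encoding the bytes as UTF-8 and
    // declaring the same encoding in the XML prolog.
    static void writeXml(Path file, String entityName) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            out.write("<root_element><ENTITY_NAME>" + escape(entityName)
                    + "</ENTITY_NAME></root_element>\n");
        }
    }

    // Read the file back as UTF-8 text.
    static String readBack(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("test", ".xml");
        // "トヨタ" written as Unicode escapes so the source file encoding does not matter.
        writeXml(file, "\u30C8\u30E8\u30BF");
        System.out.println(readBack(file));
        Files.delete(file);
    }
}
```

The important point is that the declared encoding and the encoding actually used for the bytes agree; a parser that trusts the prolog will otherwise misread the Japanese text.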

Thank you very much, Jason.

Radek

P.S. Thank you for the very quick response. If you don't mind, could you show me a syntax example of how to use it now?

Best Regards