Scraping foreign language site

The sites I'm trying to scrape are Korean (and so use Korean fonts). I have set 'Default character set' to 'euc_kr' and 'Default font' to 'Arial Unicode MS'. I do receive token results from the scrapeable file, but they are unreadable text (symbols and squares). When I transfer the token results to a database, I can see them there in readable Korean. But this isn't good enough. I really need to see the results in the scraper program before they are transferred to the database, so that I know exactly what is being scraped.

I can solve this problem if I un-check 'Tidy HTML after scraping' in the Advanced Tab menu. However, my new problem is that no results are being found. The error message reads 'Warning! No matches were made by any of the extractor patterns associated with this scrapeable file.'

Does anyone know how I can see the results in readable Korean font in the scraper program without having it transferred to database?

Your help is very much appreciated.
Thank you.

Brian Kim on 05/26/2007 at 5:54 am
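
The symptom described above (readable Korean in the database, symbols and squares in the scraper) is what you would expect when the bytes are intact but are displayed using the wrong character set. A minimal sketch in Python, outside of screen-scraper, showing the same effect; the sample text is a made-up stand-in for a scraped token, and latin-1 is only an assumed example of a wrong charset:

```python
# Hypothetical sample text standing in for a token scraped from a Korean page.
original = "한국어"

# The bytes as a EUC-KR encoded site would send them (2 bytes per syllable).
raw = original.encode("euc_kr")

# Decoding with the wrong charset (latin-1 is just an assumed example)
# never fails, but yields unreadable symbols.
garbled = raw.decode("latin-1")

# Decoding with the charset the site actually uses restores the text.
readable = raw.decode("euc_kr")

print(repr(garbled))
print(readable)
```

If the text only becomes readable when decoded as euc_kr, the scraped bytes themselves are fine and the issue is in how they are decoded or rendered for display.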

screen-scraper public support

Helpful foreign language tool

When scraping sites in foreign languages, there are a few tools available to you. One that we have recently come across is a translation addon for Firefox that allows you to convert Chinese pages to English. The addon can be found at https://addons.mozilla.org/en-US/firefox/addon/3349

scraper on 07/10/2008 at 10:14 am
