funny characters

When I scrape a site, I sometimes get a funny character, like: Ã'’ replacing: '

which is the character that appears on the website. The only way I can remove it is by doing a find and replace in my text editor. This is not such a big deal, but I'd rather not get these, since PHP doesn't seem to want to find and replace these characters.

funny characters

Oh, the "smart quote". Wrapping our phrases with properly-curved punctuation.

funny characters

It turns out these funny characters are a special type of Windows character called a 'smart quote'. I found the most effective way of dealing with them is to issue a find-and-replace UPDATE command directly to the MySQL database:

funny characters

rubing,

If you're using the professional or enterprise edition, you can automatically convert HTML entities by checking the appropriate box in the extractor pattern token's "Edit Token" properties window (double-click your token or highlight your token text, right-click and choose "Edit token").

If you're using the basic edition, you'll need to accomplish this within a script by calling a function that uses the replaceAll() method. It would look something like this...

String prepareStringForOutput( String value )
{
        if (value != null)
        {
                // Replace each HTML entity with the character it stands for.
                value = value.replaceAll("&#32;", " ");
                value = value.replaceAll("&#33;", "!");
                value = value.replaceAll("&#34;", "\"");
                value = value.replaceAll("&#35;", "#");
                // ...etc., etc.
        }

        return value;
}

funny characters

I used my browser to go to the original page and looked at the source. It turns out that this funny character is being converted from the character reference: &#146;

Is there any way to stop this? It is wreaking havoc with my database.
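
One way to catch &#146; and its siblings before they reach the database is to extend the replace approach above to the Windows-1252 "smart quote" codes (145 through 148) and swap them for plain ASCII quotes. What follows is only a minimal sketch: the class name SmartQuoteCleaner and the method clean() are made up for illustration, and it assumes you want plain ' and " in your data rather than the curly versions. It also handles the already-decoded curly characters, in case the scraper converts the references first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any product API.
public class SmartQuoteCleaner {
    // Smart-quote references and characters mapped to plain ASCII quotes.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("&#145;", "'");   // left single quote reference
        REPLACEMENTS.put("&#146;", "'");   // right single quote / apostrophe reference
        REPLACEMENTS.put("&#147;", "\"");  // left double quote reference
        REPLACEMENTS.put("&#148;", "\"");  // right double quote reference
        REPLACEMENTS.put("\u2018", "'");   // already-decoded left single quote
        REPLACEMENTS.put("\u2019", "'");   // already-decoded right single quote
        REPLACEMENTS.put("\u201C", "\"");  // already-decoded left double quote
        REPLACEMENTS.put("\u201D", "\"");  // already-decoded right double quote
    }

    // Returns the input with every smart quote replaced; null stays null.
    public static String clean(String value) {
        if (value == null) {
            return null;
        }
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // String.replace does literal (non-regex) replacement of all occurrences.
            value = value.replace(entry.getKey(), entry.getValue());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(clean("It&#146;s a &#147;test&#148;"));
    }
}
```

Running main prints: It's a "test". If you'd rather keep the curly quotes, you could map the four references to the \u2018 through \u201D characters instead and make sure the database column and connection both use UTF-8.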