UTF8 Character Conversion (MSWord Characters too)

  • -Oz-
    Senior Member
    • Mar 2004
    • 545

    #1

    I wrote a function that converts all the "problem" characters into their &blah; entity text. This makes XML feeds valid and eliminates problems with MSWord's smart characters. Feel free to use the function as you see fit (you can even remove the OzTheory.com part).

    PHP Code:
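    function utf8encode($text = "") {
        // Compiled by OzTheory.com
        // Each "problem" character maps to its HTML entity. str_replace()
        // applies the pairs in order, so the entries at the end run last.
        $chars = array(
            'Ò' => '&Ograve;',
            'Ó' => '&Oacute;',
            'Ô' => '&Ocirc;',
            'Õ' => '&Otilde;',
            'Ø' => '&Oslash;',
            'Ù' => '&Ugrave;',
            'Ú' => '&Uacute;',
            'Û' => '&Ucirc;',
            'Ü' => '&Uuml;',
            'ß' => '&szlig;',
            'à' => '&agrave;',
            'á' => '&aacute;',
            'â' => '&acirc;',
            'ã' => '&atilde;',
            'ä' => '&auml;',
            'å' => '&aring;',
            'æ' => '&aelig;',
            'ç' => '&ccedil;',
            'è' => '&egrave;',
            'é' => '&eacute;',
            'ê' => '&ecirc;',
            'ë' => '&euml;',
            'ì' => '&igrave;',
            'í' => '&iacute;',
            'î' => '&icirc;',
            'ï' => '&iuml;',
            'ñ' => '&ntilde;',
            'ò' => '&ograve;',
            'ó' => '&oacute;',
            'ô' => '&ocirc;',
            'õ' => '&otilde;',
            'ö' => '&ouml;',
            '÷' => '&divide;',
            'ø' => '&oslash;',
            'ù' => '&ugrave;',
            'ú' => '&uacute;',
            'û' => '&ucirc;',
            'ü' => '&uuml;',
            'ÿ' => '&yuml;',
            '‚' => '&sbquo;',
            'ƒ' => '&fnof;',
            '„' => '&bdquo;',
            '†' => '&dagger;',
            '‡' => '&Dagger;',
            'ˆ' => '&circ;',
            '‰' => '&permil;',
            'Œ' => '&OElig;',
            '–' => '&ndash;',
            '—' => '&mdash;',
            '˜' => '&tilde;',
            '™' => '&trade;',
            'œ' => '&oelig;',
            'Ÿ' => '&Yuml;',
            'Ñ' => '&Ntilde;',
            'Ï' => '&Iuml;',
            'Î' => '&Icirc;',
            'Í' => '&Iacute;',
            'Ì' => '&Igrave;',
            'Ë' => '&Euml;',
            'Ê' => '&Ecirc;',
            'É' => '&Eacute;',
            'È' => '&Egrave;',
            'Ç' => '&Ccedil;',
            'Æ' => '&AElig;',
            'Å' => '&Aring;',
            'Ä' => '&Auml;',
            'Ã' => '&Atilde;',
            'Â' => '&Acirc;',
            'Á' => '&Aacute;',
            'À' => '&Agrave;',
            '¿' => '&iquest;',
            'µ' => '&micro;',
            '±' => '&plusmn;',
            '°' => '&deg;',
            '®' => '&reg;',
            '©' => '&copy;',
            '¨' => '&uml;',
            '§' => '&sect;',
            '¥' => '&yen;',
            '£' => '&pound;',
            '€' => '&euro;',
            '¢' => '&cent;',
            '¡' => '&iexcl;',
            // MSWord smart characters are flattened to plain ASCII first...
            '’' => "'",
            '‘' => "'",
            '“' => '"',
            '”' => '"',
            '…' => '...',
            // ...and every apostrophe, straight or flattened, ends up as an entity.
            "'" => '&rsquo;'
        );
        $text = str_replace(array_keys($chars), array_values($chars), $text);
        return $text;
    }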
    I remember people talking about things like this a while ago. Hopefully people find it useful.
    Dan Blomberg
  • Jonathan
    Senior Member
    • Mar 2004
    • 1229

    #2
    Nice function, Oz/Dan. I may use this on a project I'm working on...
    "How can someone be so distracted yet so focused?"
    - C

    • timg
      Member
      • Feb 2005
      • 84

      #3
      So...

      How would you use this? Would it be for including a Word or XML file somehow?
      ~ Tim Gallant ~ http://www.pactumweb.com

      • -Oz-
        Senior Member
        • Mar 2004
        • 545

        #4
        As you take the data out of the MySQL database, or put it in, just call:
        PHP Code:
        $string = utf8encode($string);
        Dan Blomberg
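
For comparison, here is a minimal sketch of the same "encode on the way out" step using PHP's built-in htmlentities() rather than the function posted above. The `$rows` array and the `encode_row()` helper are hypothetical stand-ins for a real MySQL result set, the stored text is assumed to be UTF-8, and the `\u{...}` string escapes need PHP 7 or later.

```php
<?php
// Hypothetical rows standing in for the result of a MySQL query.
$rows = array(
    array(
        'title' => "Caf\u{00E9} menu",
        'body'  => "Cr\u{00E8}me br\u{00FB}l\u{00E9}e, 5 \u{20AC}",
    ),
);

// Encode every field of one row; assumes the stored text is UTF-8.
function encode_row(array $row)
{
    return array_map(function ($value) {
        // ENT_QUOTES also converts single and double quotes.
        return htmlentities($value, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    }, $row);
}

foreach ($rows as $row) {
    $safe = encode_row($row);
    echo $safe['title'], ': ', $safe['body'], "\n";
}
// Caf&eacute; menu: Cr&egrave;me br&ucirc;l&eacute;e, 5 &euro;
```

Encoding on output leaves the database holding the original text, which keeps the option of producing a different format later.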

        • Frank Hagan
          Senior Member
          • Mar 2004
          • 724

          #5
          You often need to do this for forms that ask for more than a word or two, because people sometimes write their text in Word and then cut and paste it into the web form.
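
The pasted-from-Word case can be sketched on its own. This is only an illustration, assuming the form submits UTF-8: the function name `normalize_word_punctuation()` is made up for the example, and it flattens just the curly quotes and the ellipsis, the same smart characters the function in post #1 maps to plain ASCII.

```php
<?php
// Replace Word's "smart" punctuation with plain ASCII equivalents.
// Assumes $text is UTF-8; the \u{...} escapes need PHP 7 or later.
function normalize_word_punctuation($text)
{
    $map = array(
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2026}" => '...', // horizontal ellipsis
    );
    // strtr() makes a single pass, so replaced text is never re-scanned.
    return strtr($text, $map);
}

// Stand-in for a value such as $_POST['comment'].
$pasted = "\u{201C}It\u{2019}s fine\u{2026}\u{201D}";
echo normalize_word_punctuation($pasted), "\n";
// "It's fine..."
```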

          • sdjl
            Senior Member
            • Mar 2004
            • 502

            #6
            XML is a little picky when it comes to data inside its fields.
            Even if you use CDATA it still fails if certain characters are present.

            Therefore, it's necessary to format all characters to make sure that they are usable within an XML file.
            -----
            Do you fear the obsolescence of the metanarrative apparatus of legitimation?
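
One way to cover the XML case is with numeric character references. XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so names such as &eacute; are undefined unless the feed declares them, while numeric references like &#233; are always allowed. The sketch below is an assumption-laden example, not anything posted in this thread: `xml_escape()` is a made-up name, the input is assumed to be valid UTF-8, and the mbstring extension must be loaded.

```php
<?php
// Make a string safe for an XML text node or attribute value.
// Assumes valid UTF-8 input and the mbstring extension.
function xml_escape($text)
{
    // Escape the XML markup characters: & < > " '
    $text = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

    // Turn every non-ASCII character into a decimal numeric reference.
    // convmap = [first code point, last code point, offset, mask]
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

echo xml_escape("Caf\u{00E9} \u{201C}R&D\u{201D}"), "\n";
// Caf&#233; &#8220;R&amp;D&#8221;
```

This does not deal with control characters or invalid byte sequences, which XML rejects even inside CDATA.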
