From: Peter Dyballa <Peter_Dyballa@Web.DE>
To: Tech Stuff <techstuff1971@yahoo.com>
Cc: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
Sent: Tuesday, March 12, 2013 3:50 AM
Subject: Re: File Encoding Issue on Windows
On 12.03.2013 at 04:08, Tech Stuff wrote:
> Â¿En quÃ© fecha llegaron
>
> when I should see:
>
> ¿En qué fecha llegaron
The first line is the text of the last line encoded in UTF-8, but it is displayed to you in a different, 8-bit encoding. UTF-8 uses more than one byte (more than 8 bits) to encode most characters. Only the characters of the US-ASCII range (U+0000 - U+007F), i.e. the digits, unaccented Latin letters and basic punctuation, are encoded in a single byte.
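The byte counts are easy to check. This short Python sketch (the sample characters are my own choice, not from the original message) encodes a few characters as UTF-8 and prints how many bytes each one takes:

```python
# How many bytes UTF-8 needs for a few sample characters.
samples = ["a", "7", "?", "¿", "é", "€"]
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    # Print the character, its byte count, and the bytes in hex.
    print(repr(ch), sizes[ch], ch.encode("utf-8").hex(" "))
```

The ASCII characters take one byte each, ¿ and é take two (c2 bf and c3 a9), and € takes three.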
The character ¿, INVERTED QUESTION MARK, U+00BF, is encoded in UTF-8 as two bytes: C2 BF. Notepad interprets these two bytes in some Latin or MS Windows encoding, i.e. as two separate characters, Â and ¿, and displays them as such.
The character é, LATIN SMALL LETTER E WITH ACUTE, U+00E9, is encoded in UTF-8 as two bytes: C3 A9. Notepad likewise interprets these two bytes as two separate characters and displays them as Ã and ©.
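The misreading of both characters can be reproduced in Python, using its built-in "cp1252" codec to stand in for what Notepad does (an approximation of Notepad's behaviour, not its actual code):

```python
# Encode each character as UTF-8, then misread the bytes as CP1252.
shown = {}
for ch in ["¿", "é"]:
    raw = ch.encode("utf-8")          # bytes actually stored in the file
    shown[ch] = raw.decode("cp1252")  # what a CP1252 reader displays
    print(f"U+{ord(ch):04X} {ch} -> {raw.hex(' ')} -> {shown[ch]}")

# U+00BF ¿ -> c2 bf -> Â¿
# U+00E9 é -> c3 a9 -> Ã©
```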
The MS Windows code page CP1252 maps these bytes as follows:
A9 = ©, COPYRIGHT SIGN
BF = ¿, INVERTED QUESTION MARK
C2 = Â, LATIN CAPITAL LETTER A WITH CIRCUMFLEX
C3 = Ã, LATIN CAPITAL LETTER A WITH TILDE
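The CP1252 mappings can be double-checked with Python's "cp1252" codec and the standard unicodedata module. This is a verification sketch that assumes Python's codec table matches Microsoft's code page for these four bytes:

```python
import unicodedata

# Decode single bytes as CP1252 and look up each character's Unicode name.
table = {}
for byte in (0xA9, 0xBF, 0xC2, 0xC3):
    ch = bytes([byte]).decode("cp1252")
    table[byte] = (ch, unicodedata.name(ch))
    print(f"{byte:02X} = {ch}, {unicodedata.name(ch)}")

# A9 = ©, COPYRIGHT SIGN
# BF = ¿, INVERTED QUESTION MARK
# C2 = Â, LATIN CAPITAL LETTER A WITH CIRCUMFLEX
# C3 = Ã, LATIN CAPITAL LETTER A WITH TILDE
```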
So Notepad is displaying the UTF-8 encoded file using this code page, CP1252. What you need to do is tell Notepad to read the file as UTF-8.
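The same fix can be sketched outside Notepad. This Python example writes the text to a temporary UTF-8 file (the temporary file is purely for illustration) and reads it back once with the wrong codec and once with the right one. It also shows that already-garbled text can often be repaired by reversing the mistake. That repair only works when every byte has a CP1252 mapping. Python's cp1252 codec rejects a few undefined bytes such as 0x81.

```python
import os
import tempfile

text = "¿En qué fecha llegaron"

# Save the text as UTF-8, like the original file.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as f:
    f.write(text)
    path = f.name

try:
    # Reading with the wrong codec reproduces what Notepad showed.
    with open(path, encoding="cp1252") as f:
        wrong = f.read()
    # Reading with the right codec returns the original text.
    with open(path, encoding="utf-8") as f:
        right = f.read()
finally:
    os.remove(path)

# Undo the mistake: turn the misread characters back into bytes, then decode as UTF-8.
repaired = wrong.encode("cp1252").decode("utf-8")

print(wrong)     # Â¿En quÃ© fecha llegaron
print(right)     # ¿En qué fecha llegaron
print(repaired)  # ¿En qué fecha llegaron
```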
--
Greetings
Pete
Give a man a fish, and you've fed him for a day. Teach him to fish, and you've depleted the lake.