* Character sets and encodings confusion
From: Otto Maddox @ 2008-01-11 14:26 UTC
To: help-gnu-emacs
When I type `C-u C-x =' on the character `£', I get
something like this:
character: £ (2211, #o4243, #x8a3, U+00A3)
charset: latin-iso8859-1
(Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
code point: #x23
syntax: w which means: word
category: l:Latin
buffer code: #x81 #xA3
file code: #xA3 (encoded by coding system iso-latin-1)
display: by this font (glyph code)
-apple-monaco-medium-r-normal--13-130-72-72-m-130-iso10646-1 (#xA3)
Why is the code point #x23? Should it not be #xA3 in Latin Alphabet 1?
I ask because when I click on the #x23, the character list I get shows
the code point as #xA3, which is confusing.
Also, what are the first three numbers in parentheses on the
`character:' line? Are they code points of some charset? (I
understand that the fourth number is a Unicode code point.)
--
Otto Maddox
ottomaddox@fastmail.fm
* Re: Character sets and encodings confusion
From: Jason Rumney @ 2008-01-11 16:28 UTC
To: help-gnu-emacs
On 11 Jan, 14:26, "Otto Maddox" <ottomad...@fastmail.fm> wrote:
> When I type `C-u C-x =' on the character `£', ...
> Why is the code point #x23? Should it not be #xA3 in Latin Alphabet 1?
The clue is in the following:
> charset: latin-iso8859-1
> (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
Note that the latin-iso8859-1 charset only includes the right-hand
part (0xA0-0xFF) of ISO 8859-1.
> Because when you click on the #x23, the character list you get shows
> the code point as being #xA3, which is confusing.
It is confusing, but the table displayed is listed as the *coded*
charset, so it has the +0x80 transformation applied.
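To make the +0x80 relationship concrete, here is a small Python sketch. It assumes, as in Emacs 22, that latin-iso8859-1 is a 96-character set whose positions 0x20-0x7F stand for the ISO 8859-1 bytes 0xA0-0xFF. The helper names are hypothetical and are not Emacs functions; Python's own latin-1 codec supplies the `file code' byte.

```python
# Sketch, ASSUMING Emacs 22's latin-iso8859-1 is a 96-character set whose
# positions 0x20-0x7F stand for ISO 8859-1 bytes 0xA0-0xFF.
# The helper names are hypothetical; they are not Emacs functions.

GR_OFFSET = 0x80  # the "+0x80 transformation" applied in the coded table

def charset_to_latin1_byte(position: int) -> int:
    """Map a latin-iso8859-1 position (0x20-0x7F) to its ISO 8859-1 byte."""
    if not 0x20 <= position <= 0x7F:
        raise ValueError(f"{position:#x} is outside the 96-character set")
    return position + GR_OFFSET

def latin1_byte_to_charset(byte: int) -> int:
    """Map an ISO 8859-1 byte (0xA0-0xFF) back to its charset position."""
    if not 0xA0 <= byte <= 0xFF:
        raise ValueError(f"{byte:#x} is not in the right-hand graphic part")
    return byte - GR_OFFSET

file_byte = "£".encode("latin-1")[0]           # what the `file code' line shows
print(hex(file_byte))                           # 0xa3
print(hex(latin1_byte_to_charset(file_byte)))   # 0x23, the `code point' line
print(hex(charset_to_latin1_byte(0x23)))        # 0xa3, the coded-charset table
```

Both numbers name the same character: #x23 is its position inside the charset, and #xA3 is that position with the offset added back.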
> Also, what are the first three numbers in parenthesis on the
> `character:' line?
They are the code-point in the internal encoding (emacs-mule in the
current version) in decimal, octal and hexadecimal.
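The three numbers can be reproduced by hand. The Python sketch below assumes the character-code layout that Emacs 21/22 uses for official one-byte (dimension-1) charsets, char = ((leading code - 0x70) << 7) | position, with leading code 0x81 for latin-iso8859-1 (the first byte on the `buffer code' line). The function names are hypothetical helpers, not Emacs functions. Emacs 23 and later switched to a Unicode-based internal representation, so these numbers differ there.

```python
# Sketch, ASSUMING the Emacs 21/22 layout for official dimension-1 charsets:
#     char = ((leading_code - 0x70) << 7) | position
# Dimension-2 charsets use a different layout, which is not modelled here.

LEADING_CODE_LATIN1 = 0x81  # latin-iso8859-1; also the first `buffer code' byte

def make_char(leading_code: int, position: int) -> int:
    """Build an Emacs 22 character code for a dimension-1 charset."""
    return ((leading_code - 0x70) << 7) | position

def split_char(char: int) -> tuple[int, int]:
    """Recover (leading_code, position) from a dimension-1 character code."""
    return (char >> 7) + 0x70, char & 0x7F

def emacs_mule_bytes(char: int) -> bytes:
    """Buffer bytes: the leading code, then the position with its high bit set."""
    leading_code, position = split_char(char)
    return bytes([leading_code, position | 0x80])

char = make_char(LEADING_CODE_LATIN1, 0x23)
# Same three formats as the `character:' line: decimal, octal, hex
print(f"{char}, #o{char:o}, #x{char:x}")                        # 2211, #o4243, #x8a3
print(split_char(char))                                          # (129, 35)
print(" ".join(f"#x{b:02X}" for b in emacs_mule_bytes(char)))   # #x81 #xA3
```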
* Re: Character sets and encodings confusion
From: Eli Zaretskii @ 2008-01-11 16:34 UTC
To: help-gnu-emacs
> From: "Otto Maddox" <ottomaddox@fastmail.fm>
> Date: Fri, 11 Jan 2008 14:26:29 +0000
>
> When I type `C-u C-x =' on the character `£', I get
> something like this:
>
> character: £ (2211, #o4243, #x8a3, U+00A3)
> charset: latin-iso8859-1
> (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
> code point: #x23
> syntax: w which means: word
> category: l:Latin
> buffer code: #x81 #xA3
> file code: #xA3 (encoded by coding system iso-latin-1)
> display: by this font (glyph code)
> -apple-monaco-medium-r-normal--13-130-72-72-m-130-iso10646-1 (#xA3)
>
> Why is the code point #x23?
This is the code point of `£' in the latin-iso8859-1 charset.
> Should it not be #xA3 in Latin Alphabet 1?
No. The latin-iso8859-1 charset does not include ASCII, so it starts
from what you are used to calling "code point 160".
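One way to see that the charset excludes ASCII is to enumerate it. This Python sketch assumes the 96 positions 0x20-0x7F and the +0x80 offset described earlier in the thread, and it relies on Python's own latin-1 codec and `unicodedata` module rather than on anything in Emacs.

```python
# Sketch, ASSUMING latin-iso8859-1 positions 0x20-0x7F map to ISO 8859-1
# bytes 0xA0-0xFF (position + 0x80). Uses only the Python standard library.
import unicodedata

def latin1_charset_chars() -> list[str]:
    """Characters of latin-iso8859-1, in charset-position order 0x20-0x7F."""
    return [bytes([pos + 0x80]).decode("latin-1") for pos in range(0x20, 0x80)]

chars = latin1_charset_chars()
print(len(chars))                                    # 96
print(ord(chars[0]), unicodedata.name(chars[0]))     # 160 NO-BREAK SPACE
print(ord(chars[-1]), unicodedata.name(chars[-1]))   # 255 LATIN SMALL LETTER Y WITH DIAERESIS
print(chars[0x23 - 0x20])                            # £, at charset position #x23
print(any(ord(c) < 0x80 for c in chars))             # False: no ASCII characters
```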
> Also, what are the first three numbers in parenthesis on the
> `character:' line? Are they code points of some charset?
They are the internal Emacs representation of this character, in decimal,
octal, and hex.
This is all explained in the Emacs manual, btw; see the node "Position
Info" there (I got to that node by typing "i C-x =" in Info).