Character sets and encodings confusion

unofficial mirror of help-gnu-emacs@gnu.org

* Character sets and encodings confusion
@ 2008-01-11 14:26 Otto Maddox
  2008-01-11 16:34 ` Eli Zaretskii
From: Otto Maddox @ 2008-01-11 14:26 UTC
  To: help-gnu-emacs

When I type `C-u C-x =' on the character `£', I get
something like this:

  character: £ (2211, #o4243, #x8a3, U+00A3)
    charset: latin-iso8859-1
             (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
 code point: #x23
     syntax: w 	which means: word
   category: l:Latin
buffer code: #x81 #xA3
  file code: #xA3 (encoded by coding system iso-latin-1)
    display: by this font (glyph code)
     -apple-monaco-medium-r-normal--13-130-72-72-m-130-iso10646-1 (#xA3)

Why is the code point #x23?  Should it not be #xA3 in Latin Alphabet 1?
Because when you click on the #x23, the character list you get shows
the code point as being #xA3, which is confusing.

Also, what are the first three numbers in parentheses on the
`character:' line?  Are they code points of some charset?  (I
understand that the fourth number is a Unicode code point.)

-- 
  Otto Maddox
  ottomaddox@fastmail.fm


* Re: Character sets and encodings confusion
       [not found] <mailman.6033.1200065717.18990.help-gnu-emacs@gnu.org>
@ 2008-01-11 16:28 ` Jason Rumney
From: Jason Rumney @ 2008-01-11 16:28 UTC
  To: help-gnu-emacs

On 11 Jan, 14:26, "Otto Maddox" <ottomad...@fastmail.fm> wrote:
> When I type `C-u C-x =' on the character `£', ...

> Why is the code point #x23?  Should it not be #xA3 in Latin Alphabet 1?

The clue is in the following:

>     charset: latin-iso8859-1
>              (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)

Note that the latin-iso8859-1 charset only includes the Right-Hand
part (0xA0-0xFF in ISO 8859-1), and it numbers its characters without
the high bit, so 0xA3 there becomes 0x23.

> Because when you click on the #x23, the character list you get shows
> the code point as being #xA3, which is confusing.

It is confusing, but the table displayed is listed as the *coded*
charset, so it has the +0x80 transformation applied.
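
To make the +0x80 relationship concrete, here is a minimal Python
sketch (not Emacs code; the helper name is made up for illustration).
It assumes the charset code point is simply the ISO 8859-1 byte with
the high bit cleared, which matches the values in the output quoted
above.

```python
# Illustrative only: relates the charset code point reported by `C-u C-x ='
# to the byte stored in an ISO 8859-1 file. The helper name is hypothetical.

def charset_code_to_file_byte(code_point: int) -> int:
    """Apply the +0x80 transformation to a latin-iso8859-1 code point."""
    if not 0x20 <= code_point <= 0x7F:
        raise ValueError("expected a code point in the range 0x20-0x7F")
    return code_point + 0x80

file_byte = charset_code_to_file_byte(0x23)
print(hex(file_byte))                        # 0xa3
print(bytes([file_byte]).decode("latin-1"))  # £
```

So #x23 and #xA3 name the same character; they differ only in whether
the high bit is counted.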

> Also, what are the first three numbers in parentheses on the
> `character:' line?

They are the code-point in the internal encoding (emacs-mule in the
current version) in decimal, octal and hexadecimal.
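
As a quick check that the three numbers are one value written in three
bases, here is a small Python sketch (the variable names are just for
illustration):

```python
# The three numbers from the `character:' line, parsed in their stated bases.
decimal_value = int("2211", 10)
octal_value = int("4243", 8)    # Emacs writes this as #o4243
hex_value = int("8a3", 16)      # Emacs writes this as #x8a3

print(decimal_value == octal_value == hex_value)  # True

# Observation about this particular value only: its low seven bits equal
# the charset code point #x23 shown further down in the same output.
print(hex(decimal_value & 0x7F))  # 0x23
```

This says nothing about how Emacs lays out its internal codes in
general; it only confirms the arithmetic for the numbers quoted in
this thread.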


* Re: Character sets and encodings confusion
  2008-01-11 14:26 Character sets and encodings confusion Otto Maddox
@ 2008-01-11 16:34 ` Eli Zaretskii
From: Eli Zaretskii @ 2008-01-11 16:34 UTC
  To: help-gnu-emacs

> From: "Otto Maddox" <ottomaddox@fastmail.fm>
> Date: Fri, 11 Jan 2008 14:26:29 +0000
> 
> When I type `C-u C-x =' on the character `£', I get
> something like this:
> 
>   character: £ (2211, #o4243, #x8a3, U+00A3)
>     charset: latin-iso8859-1
>              (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
>  code point: #x23
>      syntax: w 	which means: word
>    category: l:Latin
> buffer code: #x81 #xA3
>   file code: #xA3 (encoded by coding system iso-latin-1)
>     display: by this font (glyph code)
>      -apple-monaco-medium-r-normal--13-130-72-72-m-130-iso10646-1 (#xA3)
> 
> Why is the code point #x23?

This is the code point of `£' in the latin-iso8859-1 charset.

> Should it not be #xA3 in Latin Alphabet 1?

No.  The latin-iso8859-1 charset does not include ASCII, so it starts
from what you are used to calling "codepoint 160"; `£' is therefore
numbered #x23 in that charset rather than #xA3.

> Also, what are the first three numbers in parentheses on the
> `character:' line?  Are they code points of some charset?

They are the internal Emacs representation of this character, in
decimal, octal, and hex.

This is all explained in the Emacs manual, btw; see the node "Position
Info" there (I got to that node by typing "i C-x =" in Info).
