From: "B.T. Raven" <ejmn@cpinternet.com>
Subject: Re: coding system
Date: Sun, 27 Mar 2005 00:56:37 -0600
Message-ID: <114cmcu2661njb8@corp.supernews.com>
In-Reply-To: 877jju9dog.fsf-monnier+gnu.emacs.help@gnu.org
"Stefan Monnier" <monnier@iro.umontreal.ca> wrote in message
news:877jju9dog.fsf-monnier+gnu.emacs.help@gnu.org...
> > However it seems that the coding system for keyboard input is latin-1.
> > This is a unibyte coding system; why does emacs see a multibyte character
> > when I press é? To what corresponds this 2281?
>
> Inside Emacs, there's no such thing as unibyte characters and
> multibyte characters. There are just characters, which are represented
> by integers. When loading/saving a file, characters are decoded/encoded
> into sequences of bytes which can be unibyte or multibyte. This same "é"
> can be represented in some files with a single byte (e.g. if it's a latin-1
> file) or as two bytes (e.g. if it's a utf-8 file), or ...
>
>
> Stefan
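To illustrate the quoted point outside Emacs, here is a minimal Python sketch (not Emacs code; the helper name `file_bytes` is made up for this example). It treats "é" as one abstract character and shows that only the choice of coding system decides whether it becomes one byte or two in a file.

```python
# One abstract character, identified by an integer (its code point).
E_ACUTE = "\u00e9"  # é


def file_bytes(char: str, codec: str) -> bytes:
    """Return the bytes `char` occupies in a file written with `codec`."""
    return char.encode(codec)


latin1_bytes = file_bytes(E_ACUTE, "latin-1")  # unibyte encoding
utf8_bytes = file_bytes(E_ACUTE, "utf-8")      # multibyte encoding

print(len(latin1_bytes), latin1_bytes.hex())
print(len(utf8_bytes), utf8_bytes.hex())

# Decoding either byte sequence with the matching codec gives back
# the same character.
assert latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8") == E_ACUTE
```

The two byte sequences correspond to the "file code" values E9 and 0xC3 0xA9 that appear in the report further down.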
That "or ..." is pregnant with meaning. It seems that the same
character can be represented within a single buffer by three or more
different byte sequences. Here is the C-u C-x = report for three e with
acute and two e with macron.
(Sorry about the munged characters. I don't know how to use Gnus under
w32, so I have to copy and paste from Emacs to Outlook.)
Notice that the e with macron expands from a 2-byte to a 4-byte
representation in the buffer after being saved and then reloaded. Also,
the part of the font it uses seems to be different. Even if unification
on decoding were working, could it overcome so great a difference in
the representation of the characters?
Ed.
éééēē
character: é (04351, 2281, 0x8e9)
charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100)
code point: 105
syntax: word
category: l:Latin
buffer code: 0x81 0xE9
file code: E9 (encoded by coding system iso-latin-1-dos)
font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-1
character: é (04551, 2409, 0x969)
charset: latin-iso8859-2 (Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2): ISO-IR-101)
code point: 105
syntax: word
category: l:Latin
buffer code: 0x82 0xE9
file code: 0xC3 0xA9 (encoded by coding system mule-utf-8-dos)
font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-2
character: é (05151, 2665, 0xa69)
charset: latin-iso8859-4 (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
code point: 105
syntax: word
category: l:Latin
buffer code: 0x84 0xE9
file code: E9 (encoded by coding system iso-latin-1-dos)
font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-4
character: ē (05072, 2618, 0xa3a)
charset: latin-iso8859-4 (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
code point: 58
syntax: word
category: l:Latin
buffer code: 0x84 0xBA
file code: 0xC4 0x93 (encoded by coding system utf-8-dos)
font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-4
character: ē (01210063, 331827, 0x51033)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 32 51
syntax: word
category: l:Latin
buffer code: 0x9C 0xF4 0xA0 0xB3
file code: 0xC4 0x93 (encoded by coding system mule-utf-8-dos)
font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso10646-1
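The numbers in the five reports above appear to follow a simple pattern, which the following Python sketch tries to reproduce. The formulas are assumptions inferred from the reported values (charset leading byte, code point, character code, buffer code); they were not checked against the Emacs source, and all function names are made up for this example. Under these assumptions, the character code is built from a charset identifier plus the position inside that charset, which would explain why the "same" letter in latin-1, latin-2 and latin-4 gets three different integers.

```python
def char_code_dim1(charset_id: int, position: int) -> int:
    """Assumed formula for one-position charsets such as latin-iso8859-N."""
    return ((charset_id - 0x70) << 7) | position


def char_code_private_dim2(charset_id: int, pos1: int, pos2: int) -> int:
    """Assumed formula for two-position charsets such as mule-unicode-0100-24ff."""
    return ((charset_id - 0xE0) << 14) | (pos1 << 7) | pos2


def buffer_bytes_dim1(charset_id: int, position: int) -> bytes:
    """Charset byte followed by the position with its high bit set."""
    return bytes([charset_id, position | 0x80])


def unicode_from_mule_0100_24ff(pos1: int, pos2: int) -> int:
    """Assumed mapping: 96 positions per row, starting at U+0100."""
    return 0x100 + (pos1 - 32) * 96 + (pos2 - 32)


# (charset byte, position) pairs read off the "buffer code" and
# "code point" lines of the first four reports.
for charset_id, position in [(0x81, 105), (0x82, 105), (0x84, 105), (0x84, 58)]:
    code = char_code_dim1(charset_id, position)
    print(hex(charset_id), position, code, buffer_bytes_dim1(charset_id, position).hex())

# Fifth report: charset byte 0xF4, positions 32 and 51.
print(char_code_private_dim2(0xF4, 32, 51))
print(hex(unicode_from_mule_0100_24ff(32, 51)))
```

If the assumed formulas are right, the latin-4 e with macron (2618) and the mule-unicode one (331827) differ only in which charset the decoder assigned; both are written to a UTF-8 file as 0xC4 0x93, which is the encoding of U+0113.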