From: "B.T. Raven"
Newsgroups: gmane.emacs.help
Subject: Re: coding system
Date: Sun, 27 Mar 2005 00:56:37 -0600
Message-ID: <114cmcu2661njb8@corp.supernews.com>
References: <4240134e$1_3@x-privat.org> <877jju9dog.fsf-monnier+gnu.emacs.help@gnu.org>
"Stefan Monnier" wrote in message news:877jju9dog.fsf-monnier+gnu.emacs.help@gnu.org...
> > However it seems that the coding system for keyboard input is latin-1.
> > This is a unibyte coding system; why does Emacs see a multibyte character
> > when I press é? What does this 2281 correspond to?
>
> Inside Emacs, there's no such thing as unibyte characters and
> multibyte characters. There are just characters, which are represented
> by integers. When loading/saving a file, characters are decoded/encoded
> into sequences of bytes which can be unibyte or multibyte. This same "é"
> can be represented in some files with a single byte (e.g. if it's a latin-1
> file) or as two bytes (e.g. if it's a utf-8 file), or ...
>
>
> Stefan

That "or ..." is pregnant with meaning. It seems that the same character can be represented in the same buffer by three or more different byte sequences. Here is the C-u C-x = report for three e-with-acute (é) and two e-with-macron (ē) characters.

(Sorry about the munged characters. I don't know how to use Gnus under w32, so I have to copy and paste from Emacs into Outlook.)

Notice that the e with macron expands from a 2-byte to a 4-byte representation in the buffer (0x84 0xBA becomes 0x9C 0xF4 0xA0 0xB3) after being saved and then reloaded. It also seems to use a different part of the font (the iso8859-4 registry instead of iso10646-1). Even if unification on decoding were working, could it overcome so great a difference in the representation of the characters?

Ed.
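Monnier's point, that one character can be stored on disk as one byte or as two, is easy to check outside Emacs. The minimal Python sketch below is only an illustration: the helper `encodings` is hypothetical, and the codec names ("latin-1", "iso8859-4", "utf-8") are Python's, not Emacs coding-system names. The resulting bytes agree with the "file code" lines in the report (E9 and 0xC3 0xA9 for é, 0xC4 0x93 for ē in UTF-8), and ē turns out to have no Latin-1 encoding at all.

```python
# One abstract character, several possible on-disk byte sequences.
# Codec names are Python's, not Emacs coding-system names.
E_ACUTE = "\u00e9"   # é
E_MACRON = "\u0113"  # ē

def encodings(ch, codecs=("latin-1", "iso8859-4", "utf-8")):
    """Hypothetical helper: map each codec to ch's bytes, or None if unencodable."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = ch.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None  # this codec has no byte sequence for ch
    return result

for ch in (E_ACUTE, E_MACRON):
    shown = {k: (v.hex(" ") if v is not None else None)
             for k, v in encodings(ch).items()}
    print(f"U+{ord(ch):04X}", shown)
# U+00E9 {'latin-1': 'e9', 'iso8859-4': 'e9', 'utf-8': 'c3 a9'}
# U+0113 {'latin-1': None, 'iso8859-4': 'ba', 'utf-8': 'c4 93'}
```

The integer (the code point) stays the same in every row; only the byte encoding chosen at save time differs.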
éééēē

character: é (04351, 2281, 0x8e9)
  charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100)
  code point: 105
  syntax: word
  category: l:Latin
  buffer code: 0x81 0xE9
  file code: E9 (encoded by coding system iso-latin-1-dos)
  font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-1

character: é (04551, 2409, 0x969)
  charset: latin-iso8859-2 (Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2): ISO-IR-101)
  code point: 105
  syntax: word
  category: l:Latin
  buffer code: 0x82 0xE9
  file code: 0xC3 0xA9 (encoded by coding system mule-utf-8-dos)
  font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-2

character: é (05151, 2665, 0xa69)
  charset: latin-iso8859-4 (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
  code point: 105
  syntax: word
  category: l:Latin
  buffer code: 0x84 0xE9
  file code: E9 (encoded by coding system iso-latin-1-dos)
  font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-4

character: ē (05072, 2618, 0xa3a)
  charset: latin-iso8859-4 (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
  code point: 58
  syntax: word
  category: l:Latin
  buffer code: 0x84 0xBA
  file code: 0xC4 0x93 (encoded by coding system utf-8-dos)
  font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso8859-4

character: ē (01210063, 331827, 0x51033)
  charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
  code point: 32 51
  syntax: word
  category: l:Latin
  buffer code: 0x9C 0xF4 0xA0 0xB3
  file code: 0xC4 0x93 (encoded by coding system mule-utf-8-dos)
  font: -outline-Arial Unicode MS-normal-r-normal-normal-14-105-96-96-p-60-iso10646-1
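The integers and buffer codes in the report follow a regular pattern, which also answers the "what is 2281?" question. The Python sketch below is a reconstruction fitted to the five entries above, assuming the Emacs 21-era MULE layout they suggest. The helper names and formulas are inferred here from the report's numbers and are not taken from the Emacs sources. It shows why the three é's are three different integers (2281, 2409, 2665) and why the reloaded ē needs four buffer bytes instead of two.

```python
# Hypothetical helpers: formulas fitted to the C-u C-x = report above
# (Emacs 21-era MULE internals); NOT copied from the Emacs source.

def dim1_char(leading_byte, code_point):
    """Character code for a 96-character (dimension-1) ISO charset."""
    return ((leading_byte - 0x70) << 7) | code_point

def dim1_buffer_code(leading_byte, code_point):
    """Buffer bytes: the charset's leading byte, then the code point with bit 7 set."""
    return bytes([leading_byte, code_point | 0x80])

def mule_unicode_char(c1, c2, charset_byte=0xF4):
    """Character code for mule-unicode-0100-24ff (two position codes c1, c2)."""
    return ((charset_byte - 0xE0) << 14) | (c1 << 7) | c2

def mule_unicode_buffer_code(c1, c2, prefix=0x9C, charset_byte=0xF4):
    """As seen in the report: an extra 0x9C prefix byte, so 4 bytes in all."""
    return bytes([prefix, charset_byte, c1 | 0x80, c2 | 0x80])

def mule_unicode_to_ucs(c1, c2):
    """Assumes a 96x96 table filled row by row, starting at U+0100."""
    return 0x100 + (c1 - 32) * 96 + (c2 - 32)

# (label, leading byte, code point) for the four dimension-1 entries above.
DIM1 = [
    ("é latin-iso8859-1", 0x81, 105),
    ("é latin-iso8859-2", 0x82, 105),
    ("é latin-iso8859-4", 0x84, 105),
    ("ē latin-iso8859-4", 0x84, 58),
]

for label, lead, cp in DIM1:
    ch = dim1_char(lead, cp)
    print(f"{label}: {ch} (0o{ch:o}, {ch:#x}) "
          f"buffer {dim1_buffer_code(lead, cp).hex(' ')}")

ch = mule_unicode_char(32, 51)
print(f"ē mule-unicode-0100-24ff: {ch} (0o{ch:o}, {ch:#x}) "
      f"buffer {mule_unicode_buffer_code(32, 51).hex(' ')} "
      f"= U+{mule_unicode_to_ucs(32, 51):04X}")
```

Running it reproduces the report's character codes (2281, 2409, 2665, 2618, 331827) and buffer codes. Because the charset is folded into the integer, "unification" would have to map these distinct integers onto one, which is a separate step from decoding the file's bytes.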