From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: adding a new encoding
Date: Tue, 22 Jun 2004 08:44:41 +0900 (JST)
Message-ID: <200406212344.IAA11732@etlken.m17n.org>
References: <20040619150301.GC3467@ata.cs.hun.edu.tr> <20040620163514.GA3576@ata.cs.hun.edu.tr> <200406202318.IAA09165@etlken.m17n.org> <20040621200633.GF1361@ata.cs.hun.edu.tr>
In-reply-to: <20040621200633.GF1361@ata.cs.hun.edu.tr> (message from Baurjan Ismagulov on Mon, 21 Jun 2004 22:06:33 +0200)
To: ibr@ata.cs.hun.edu.tr
Cc: emacs-devel@gnu.org
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <20040621200633.GF1361@ata.cs.hun.edu.tr>, Baurjan Ismagulov writes:

> > > * Emacs uses ISO 2022 internally.

> > Not correct.  Emacs uses character codes of each ISO-2022
> > conforming charsets.

> And each character is prepended with a charset code?

Yes, prepended with a charset code of range 0x80..0x9D plus
an optional extended charset code of range 0xA0..0xFF, and the
code points (0x20..0x7F) are `logior'ed with 0x80.

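To make the byte layout just described concrete, here is a minimal sketch in Python (not Emacs code). It applies only the ranges stated above. The function name `encode_multibyte` and the sample values 0x81, 0x9C and 0xF0 are illustrative assumptions, not values taken from the Emacs sources.

```python
def encode_multibyte(leading, positions, extended=None):
    """Build one character's byte sequence from the ranges given above.

    leading   -- charset code, must be in 0x80..0x9D
    extended  -- optional extended charset code, must be in 0xA0..0xFF
    positions -- code points in 0x20..0x7F; each is OR'ed with 0x80
    """
    if not 0x80 <= leading <= 0x9D:
        raise ValueError("charset code must be in 0x80..0x9D")
    out = [leading]
    if extended is not None:
        if not 0xA0 <= extended <= 0xFF:
            raise ValueError("extended charset code must be in 0xA0..0xFF")
        out.append(extended)
    for point in positions:
        if not 0x20 <= point <= 0x7F:
            raise ValueError("code point must be in 0x20..0x7F")
        # Setting the high bit moves 0x20..0x7F into 0xA0..0xFF.
        out.append(point | 0x80)
    return bytes(out)


# Hypothetical charset code 0x81 with the single code point 0x5F.
print(encode_multibyte(0x81, [0x5F]).hex())  # prints "81df"
```

Because every code point ends up in 0xA0..0xFF and every charset code lies in 0x80..0x9D, a byte in the lower range always marks the start of a non-ASCII character.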
So a multibyte representation of a character is classified into
one of these:

  0x00..0x7F
  0x80..0x9D 0xA0..0xFF
  0x80..0x9D 0xA0..0xFF 0xA0..0xFF
  0x80..0x9D 0xA0..0xFF 0xA0..0xFF 0xA0..0xFF

> How do raw bytes
> look like in Emacs memory for, say, \U+0410\U+00DF\U+0534? Is there an
> easy way to see that (princ somewhere in
> lisp/international/mule-cmds.el)?

Try this:

(string-as-unibyte (string (decode-char 'ucs #x0410)
                           (decode-char 'ucs #x00DF)
                           (decode-char 'ucs #x0534)))

M-x list-character-sets also gives some information.

But you'd better not write code that depends on it.
Unicode-based Emacs (which will come after the release of the
current CVS HEAD) uses UTF-8 as its multibyte representation.

---
Ken'ichi HANDA
handa@m17n.org