From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Fri, 21 Nov 2003 15:27:37 +0900 (JST)
Message-ID: <200311210627.PAA18757@etlken.m17n.org>
Original-To: monnier@IRO.UMontreal.CA
Cc: jas@extundo.com, emacs-devel@gnu.org
In-reply-to: (message from Stefan Monnier on 21 Nov 2003 00:27:42 -0500)

In article , Stefan Monnier writes:

>> Yes, but it doesn't mean it is conceptually the same as
>> encode-coding-string.  The result of string-make-unibyte
>> should still be regarded as a sequence of characters, but the
>> result of encode-coding-string is a sequence of bytes.

> Why/when is the distinction meaningful (given the fact that it
> can only be used meaningfully with 8bit coding-systems where the
> distinction seems more philosophical than anything else) ?

It is perfectly possible to live in an environment where the
only charset used is iso-8859-1 but the only coding system
used is utf-8.
In this environment, the results of encode-coding-string and
string-make-unibyte are of course not the same, but both
operations are still meaningful.

>> Here there is an ambiguity in a unibyte string.
>> The number 192 can be regarded as:
>>   (1) just a number, a byte
>>   (2) a code point of some character set.
>>   (3) a character code

> But the second case is only possible for 8bit character sets, right?

Yes.  But, as I wrote above, it doesn't mean that we are
restricted to simple 8bit-oriented coding-systems.

> Until now, I always thought that Emacs only dealt with
> - byte streams representing encoded sequences of code points: case 1.
> - sequences of internal character codes (internally encoded in emacs-mule
>   or unicode depending on the branch you use): case 3.
> Is there any place where we deal with sequences of code points of external
> charsets really (other than in the degenerate case where such a sequence
> is indistinguishable from case 1, maybe).

I'd like to repeat that although we don't have such an
environment now, that doesn't mean it is impossible to assume
such an environment.

>> A unibyte string can contain (1) and (2) without
>> distinguishing them, but a multibyte string can contain (1)
>> and (3) while distinguishing them.

> Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
> character `a' ?

Good point.  Of course not.  I deliberately left that out to
keep the discussion simpler.

---
Ken'ichi HANDA
handa@m17n.org
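
For illustration, the iso-8859-1 / utf-8 environment described above
can be tried outside Emacs.  The following is a minimal Python sketch,
not Emacs code: it assumes Python's latin-1 and utf-8 codecs as
stand-ins for the charset and the coding system, and all variable
names are made up for the example.  It only shows the three readings
of the number 192, and why 97 and `a' cannot be told apart.

```python
# Illustration only: Python codecs stand in for Emacs's charset and
# coding-system machinery. This is not how Emacs implements
# string-make-unibyte or encode-coding-string.

ch = "\u00c0"  # LATIN CAPITAL LETTER A WITH GRAVE

# Reading (3), a character code: in a Unicode-based representation
# the character's code is 192.
char_code = ord(ch)

# Reading (2), a code point of the charset iso-8859-1: one unit, 192.
# Loosely analogous to the string-make-unibyte result in this environment.
latin1_code_points = list(ch.encode("latin-1"))

# Reading (1), plain bytes: encoding with utf-8 gives two bytes.
# Loosely analogous to encode-coding-string with utf-8.
utf8_bytes = list(ch.encode("utf-8"))

# Going the other way, the lone byte 192 is a character under latin-1
# but is not a valid utf-8 sequence by itself.
as_latin1 = bytes([192]).decode("latin-1")
try:
    bytes([192]).decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# For ASCII all three readings coincide, so the integer 97 and the
# character "a" are indistinguishable from the values alone.
ascii_views = (ord("a"), list("a".encode("latin-1")), list("a".encode("utf-8")))

print(char_code, latin1_code_points, utf8_bytes)
print(as_latin1 == ch, utf8_ok, ascii_views)
```

The two operations give different results for the same character
(one unit versus two bytes), yet each result is meaningful under its
own reading, which is the distinction drawn in the message.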