From: Stefan Monnier
To: Kenichi Handa
Cc: jas@extundo.com, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: 21 Nov 2003 00:27:42 -0500
In-Reply-To: <200311210041.JAA18324@etlken.m17n.org>

>> I thought that string-make-unibyte only behaves meaningfully for
>> "normal 8bit coding-systems" such as latin-1.

> Yes, but it doesn't mean it is conceptually the same as
> encode-coding-string. The result of string-make-unibyte
> should still be regarded as a sequence of character, but the
> result of encode-coding-string is a sequence of byte.

Why/when is the distinction meaningful (given the fact that it can only
be used meaningfully with 8bit coding-systems where the distinction seems
more philosophical than anything else)?

> Here exists an ambiguity of a unibyte string.
> The number 192 can be regarded as:
> (1) just a number, a byte
> (2) a code point of some character set.
> (3) a character code

But the second case is only possible for 8bit character sets, right?
Until now, I always thought that Emacs only dealt with
- byte streams representing encoded sequences of code points: case 1.
- sequences of internal character codes (internally encoded in emacs-mule
  or unicode depending on the branch you use): case 3.
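
To make the three readings of 192 quoted above concrete, here is a minimal
sketch in Python. It is only an analogy: Python's bytes/str split stands in
for the byte/character distinction, Unicode code points stand in for
"internal character codes", and the choice of latin-1 and koi8-r as the
8bit charsets is an assumption made for illustration. It says nothing about
how Emacs unibyte or multibyte strings behave.

```python
# Analogy only: Python bytes/str, not Emacs unibyte/multibyte strings.

raw = bytes([192])                  # reading (1): just a number, a byte

# Reading (2): the same number taken as a code point of an 8bit charset.
# Which character it names depends entirely on the charset assumed.
as_latin1 = raw.decode("latin-1")   # LATIN CAPITAL LETTER A WITH GRAVE
as_koi8r = raw.decode("koi8-r")     # CYRILLIC SMALL LETTER YU

# Reading (3): the internal character code (here a Unicode code point).
latin1_code = ord(as_latin1)        # equals the byte value for latin-1
koi8r_code = ord(as_koi8r)          # differs from the byte value

# For ASCII values the byte and the character code coincide numerically.
ascii_byte = bytes([97])
ascii_code = ord(ascii_byte.decode("latin-1"))

print(raw[0], latin1_code, koi8r_code, ascii_byte[0], ascii_code)
```

With latin-1 the code point and the character code happen to be the same
number, which is the degenerate case where the readings cannot be told
apart by value alone; with koi8-r they are not.
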
Is there any place where we really deal with sequences of code points of
external charsets (other than, maybe, the degenerate case where such a
sequence is indistinguishable from case 1)?

> A unibyte string can contain (1) and (2) without
> distinguishing them, but a multibyte string can contain (1)
> and (3) while distinguishing them.

Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
character `a'?


        Stefan