From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Sat, 22 Nov 2003 10:25:36 +0900 (JST)
Message-ID: <200311220125.KAA20128@etlken.m17n.org>
Cc: jas@extundo.com, emacs-devel@gnu.org

In article , Stefan Monnier writes:

>> It is perfectly possible to live in such an environment
>> where only the charset iso-8859-1 is used but only the
>> coding system utf-8 is used. In this environment, the
>> results of encode-coding-string and string-make-unibyte are
>> of course not the same, but still both operations are
>> meaningful.

> I see that encode-coding-string does the utf-8 encoding, but what
> does string-make-unibyte do in such a case and what is it used for ?

It takes the iso-8859-1 code points of all the characters in a
multibyte string and concatenates them (the same thing it does in
the latin-1 language environment).
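The Python analogy below makes that difference concrete. It is a sketch, not Emacs code. It assumes the emacs-unicode branch, where internal character codes are Unicode code points, so for U+0000..U+00FF a character's iso-8859-1 code point equals its Unicode code point. The helper names make_unibyte_latin1 and encode_utf8 are hypothetical. Under that assumption they only mimic what string-make-unibyte and (encode-coding-string STRING 'utf-8) would produce in such an environment.

```python
# Hypothetical Python analogy, not Emacs code: a multibyte string is
# modeled as a Python str, a unibyte string as bytes.

def make_unibyte_latin1(s: str) -> bytes:
    """Mimic string-make-unibyte when iso-8859-1 is the only charset:
    take each character's iso-8859-1 code point and concatenate them."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)  # equals the iso-8859-1 code point for U+0000..U+00FF
        if cp > 0xFF:
            # Assumption: such characters never occur in this environment,
            # so reject them rather than guess at a mapping.
            raise ValueError(f"{ch!r} is not in iso-8859-1")
        out.append(cp)
    return bytes(out)


def encode_utf8(s: str) -> bytes:
    """Mimic (encode-coding-string s 'utf-8)."""
    return s.encode("utf-8")


text = "café"
print(make_unibyte_latin1(text))  # b'caf\xe9'
print(encode_utf8(text))          # b'caf\xc3\xa9'
```

Both results are unibyte, but only the first has exactly one byte per character. That is why both operations remain meaningful even when utf-8 is the only coding system in use.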
In such an environment, the user has no problem using a unibyte
buffer, because it can represent every character he wants.

>>> Until now, I always thought that Emacs only dealt with
>>> - byte streams representing encoded sequences of code points: case 1.
>>> - sequences of internal character codes (internally encoded in emacs-mule
>>>   or unicode depending on the branch you use): case 3.
>>> Is there any place where we deal with sequences of code points of external
>>> charsets really (other than in the degenerate case where such a sequence
>>> is indistinguishable from case 1, maybe).

>> I'd like to repeat that although we don't have such an
>> environment now,

Ah, no: we do have the UTF-8 language environment now.

>> it doesn't mean it is impossible to assume such
>> environment.

> I guess I don't understand how that is possible (and useful) and what that
> would look like.

Please try C-x C-m L utf-8 RET and see how string-make-unibyte
and string-make-multibyte work.

---
Ken'ichi HANDA
handa@m17n.org