From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
Date: Wed, 29 Jan 2003 20:23:23 +0900 (JST)
Message-ID: <200301291123.UAA17563@etlken.m17n.org>
In-reply-to: <200301271412.h0REClJ30624@rum.cs.yale.edu>

In article <200301271412.h0REClJ30624@rum.cs.yale.edu>, "Stefan Monnier" writes:

>> In one sense, it seems clean to use the concept of decoding
>> and encoding for all unibyte<->multibyte conversions
>> coherently.  But, that hides what Emacs actually does.

> You mean that string-FOO-multibyte uses special-cased code
> and that there is thus a difference of efficiency ?

Yes.  The string-FOO-multibyte functions are more efficient than
decode-coding-string.  But that is not the point.

>> > unibyte strings are sequences of bytes while multibyte
>> > strings are sequences of chars.

>> Unfortunately no.

> I don't think there is any "truth" here.  There are simply different
> ways to look at the same thing.

I don't understand why you think my explanation is not true.

You wrote:

>> Converting between bytes and chars is the purpose of
>> coding-systems.

OK, then the region resulting from encode-coding-region is a
sequence of bytes, not chars, even if it is in a multibyte buffer.
Thus, the string returned by buffer-substring on that region (let's
name it MULTI) is also a byte sequence.

Using (string-to-unibyte MULTI) to get the same byte sequence in
unibyte form is OK as long as we adopt my interpretation of that
function.  But doing (encode-coding-string MULTI 'raw-text) is
conceptually broken, because MULTI is already a byte sequence.

---
Ken'ichi HANDA
handa@m17n.org