From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
Date: Mon, 27 Jan 2003 16:38:39 +0900 (JST)
Message-ID: <200301270738.QAA14597@etlken.m17n.org>
References: <3405-Sat18Jan2003154003+0200-eliz@is.elta.co.il>
 <200301200229.LAA16287@etlken.m17n.org>
 <6480-Mon20Jan2003214849+0200-eliz@is.elta.co.il>
 <200301202055.h0KKtun11691@rum.cs.yale.edu>
 <200301221412.h0MECoA01024@rum.cs.yale.edu>
 <200301260130.h0Q1Uo518101@rum.cs.yale.edu>
In-reply-to: <200301260130.h0Q1Uo518101@rum.cs.yale.edu> (monnier+gnu/emacs@rum.cs.yale.edu)
Original-To: monnier+gnu/emacs@rum.cs.yale.edu
Cc: monnier+gnu/emacs@rum.cs.yale.edu
Original-cc: rms@gnu.org
Original-cc: eliz@is.elta.co.il
Original-cc: emacs-devel@gnu.org
List-Id: Emacs development discussions.

In article <200301260130.h0Q1Uo518101@rum.cs.yale.edu>, "Stefan Monnier" writes:
> I don't understand your question. When people use string-FOO-multibyte
> it's generally because they don't understand what's going on and they
> think "a char is a char is a char and I don't get this multibyte madness":
> using decode-coding-string would force them to better understand what's
> going on.

But I suspect that such people won't use the correct coding system
anyway. To use the correct coding system, they must clearly understand
what kind of multibyte string they want. And if they understand that,
there should be no difficulty in using the correct string-FOO-multibyte
function.

In one sense, it seems clean to use the concept of decoding and
encoding for all unibyte<->multibyte conversions coherently. But, that
hides what Emacs actually does.

You wrote:
> I find it more helpful to think in terms of bytes and chars:

Definitely. But,

> unibyte strings are sequences of bytes while multibyte
> strings are sequences of chars.

Unfortunately no. Emacs can represent a character sequence in either a
unibyte or a multibyte string. Emacs can also represent a raw-byte
sequence in either a unibyte or a multibyte string.

For a multibyte string, which of the two it represents (char-seq or
byte-seq) can be detected from the kind of characters it contains. But
for a unibyte string that is impossible; only the context in which it
is used decides.

For string-make-multibyte, the input is a char-seq, and the result of
conversion is also a char-seq. So, the concept of decoding is not
applicable here.

For string-to-multibyte, the input is a byte-seq, and the result of
conversion is also a byte-seq. So, again, the concept of decoding is
not applicable.

For string-as-multibyte, the input is a byte-seq, and the result of
conversion is a char-seq. So, only here is the concept of decoding
applicable.

I hope this explains why I insist on string-FOO-multibyte functions.

By the way, it may be good to introduce coding system aliases
`internal' and `default', and write, for instance, in the docstring of
string-as-multibyte that the effect is the same as
(decode-coding-string UNIBYTE-STRING 'internal).

---
Ken'ichi HANDA
handa@m17n.org
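
The byte-seq versus char-seq distinction in the message above can be
illustrated with a minimal Emacs Lisp sketch. It assumes an Emacs that
provides string-to-multibyte and string-to-unibyte, and it uses the
real `utf-8' coding system in place of the proposed `internal' alias,
which is only a suggestion in the message. The function name
demo-byte-seq-vs-char-seq is hypothetical. string-make-multibyte and
string-as-multibyte are left out because their results depend on the
Emacs version and language environment.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun demo-byte-seq-vs-char-seq ()
  "Contrast a byte-preserving conversion with a real decoding step."
  (let* (;; A unibyte string holding the two bytes #xC3 #xA9.
         (bytes "\303\251")
         ;; byte-seq in, byte-seq out: each byte becomes one raw-byte
         ;; character in a multibyte string.
         (kept (string-to-multibyte bytes))
         ;; byte-seq in, char-seq out: the two bytes are interpreted
         ;; as UTF-8 and become a single character.
         (decoded (decode-coding-string bytes 'utf-8)))
    (list (multibyte-string-p bytes)                ; unibyte input
          (multibyte-string-p kept)                 ; multibyte result
          (length kept)                             ; still two elements
          (length decoded)                          ; one character
          ;; The byte-preserving conversion round-trips exactly.
          (equal (string-to-unibyte kept) bytes))))

(demo-byte-seq-vs-char-seq)
```

On a reasonably recent Emacs the call should return (nil t 2 1 t):
string-to-multibyte changes only the representation, while
decode-coding-string changes what the string means.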