From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2,
 Issue 28]
Date: Mon, 27 Jan 2003 16:38:39 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200301270738.QAA14597@etlken.m17n.org>
References: <E18ZDQC-0003mt-02@monty-python.gnu.org>
	<E18Zh9W-00012L-00@fencepost.gnu.org>
	<3405-Sat18Jan2003154003+0200-eliz@is.elta.co.il>
	<200301200229.LAA16287@etlken.m17n.org>
	<6480-Mon20Jan2003214849+0200-eliz@is.elta.co.il>
	<200301202055.h0KKtun11691@rum.cs.yale.edu>
	<E18bHfj-0002Rd-00@fencepost.gnu.org>
	<200301221412.h0MECoA01024@rum.cs.yale.edu>
	<E18bwcy-000569-00@fencepost.gnu.org>
	<200301260130.h0Q1Uo518101@rum.cs.yale.edu>
NNTP-Posting-Host: main.gmane.org
X-Trace: main.gmane.org 1043653672 2331 80.91.224.249 (27 Jan 2003 07:47:52 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 27 Jan 2003 07:47:52 +0000 (UTC)
Cc: monnier+gnu/emacs@rum.cs.yale.edu
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 18d3zi-0000bT-00
	for <emacs-devel@main.gmane.org>; Mon, 27 Jan 2003 08:47:50 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian))
	id 18d43o-0000zJ-00
	for <emacs-devel@quimby.gnus.org>; Mon, 27 Jan 2003 08:52:04 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18d3uV-0006kW-04
	for emacs-devel@quimby.gnus.org; Mon, 27 Jan 2003 02:42:27 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13)
	id 18d3tb-0006OE-00
	for emacs-devel@gnu.org; Mon, 27 Jan 2003 02:41:31 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13)
	id 18d3sb-0005xS-00
	for emacs-devel@gnu.org; Mon, 27 Jan 2003 02:40:55 -0500
Original-Received: from tsukuba.m17n.org ([192.47.44.130])
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18d3rV-0005Fs-00; Mon, 27 Jan 2003 02:39:21 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2])h0R7cek14599;
	Mon, 27 Jan 2003 16:38:40 +0900 (JST)	(envelope-from handa@m17n.org)
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	h0R7cdR17970;	Mon, 27 Jan 2003 16:38:39 +0900 (JST)
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id QAA14597;
	Mon, 27 Jan 2003 16:38:39 +0900 (JST)
Original-To: monnier+gnu/emacs@rum.cs.yale.edu
In-reply-to: <200301260130.h0Q1Uo518101@rum.cs.yale.edu>
	(monnier+gnu/emacs@rum.cs.yale.edu)
Original-cc: rms@gnu.org
Original-cc: eliz@is.elta.co.il
Original-cc: emacs-devel@gnu.org
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1b5
Precedence: list
List-Id: Emacs development discussions. <emacs-devel.gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Post: <mailto:emacs-devel@gnu.org>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:11118
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:11118

In article <200301260130.h0Q1Uo518101@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> I don't understand your question.  When people use string-FOO-multibyte
> it's generally because they don't understand what's going on and they
> think "a char is a char is a char and I don't get this multibyte madness":
> using decode-coding-string would force them to better understand what's
> going on.

But I suspect that such people won't use the correct coding
system anyway.  To use the correct coding system, they must
clearly understand what kind of multibyte string they want.
And if they understand that, there should be no difficulty
in using the correct string-FOO-multibyte function.

In one sense, it seems clean to apply the concept of decoding
and encoding consistently to all unibyte<->multibyte
conversions.  But that hides what Emacs actually does.

You wrote:
> I find it more helpful to think in terms of bytes and chars:

Definitely.  But,

> unibyte strings are sequences of bytes while multibyte
> strings are sequences of chars.

Unfortunately no.

Emacs can represent a character sequence in either a unibyte
or a multibyte string.  Emacs can also represent a raw-byte
sequence in either a unibyte or a multibyte string.  For a
multibyte string, which of the two it represents (char-seq
or byte-seq) can be detected from the kind of characters it
contains.  But for a unibyte string that is impossible; only
the context in which it is used decides that.
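
The asymmetry above can be sketched in Lisp.  This is only an
illustration, not anything Emacs ships: the helper name
my-raw-bytes-p is hypothetical, and the check assumes a recent
Emacs (23 or later) in which raw bytes inside a multibyte
string belong to the `eight-bit' charset.

```elisp
;; Sketch only.  Assumes Emacs 23 or later, where a raw byte stored
;; in a multibyte string is a character of the `eight-bit' charset.
(defun my-raw-bytes-p (string)
  "Hypothetical helper: non-nil if multibyte STRING holds a raw byte.
For a unibyte STRING the answer is always nil, because nothing in
the string itself says whether it is a char-seq or a byte-seq."
  (let ((found nil))
    (when (multibyte-string-p string)
      ;; `mapc' walks the characters of a string one by one.
      (mapc (lambda (ch)
              (when (eq (char-charset ch) 'eight-bit)
                (setq found t)))
            string))
    found))

;; "\351" is a unibyte string holding the single byte #xE9.
(my-raw-bytes-p (string-to-multibyte "\351"))  ; multibyte: inspectable
(my-raw-bytes-p "\351")                        ; unibyte: nothing to inspect
```

For the unibyte argument the helper has nothing to examine,
which is the point: only the caller knows what those bytes
mean.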

For string-make-multibyte, the input is a char-seq, and the
result of conversion is also a char-seq.  So, the concept of
decoding is not applicable here.

For string-to-multibyte, the input is a byte-seq, and the
result of conversion is also a byte-seq.  So, again, the
concept of decoding is not applicable either.

For string-as-multibyte, the input is a byte-seq, and the
result of conversion is a char-seq.  So, only here is the
concept of decoding applicable.
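
The three cases can be put side by side in a small sketch.
The wrapper name my-three-conversions is hypothetical, and the
comments only restate the descriptions given above.  No output
is shown because the exact results differ between Emacs
versions.  Also note that recent Emacs releases mark
string-make-multibyte and string-as-multibyte as obsolete.

```elisp
;; Sketch only: apply the three conversions to the same unibyte input.
(defun my-three-conversions (unibyte)
  "Hypothetical helper: return a list of the three conversions of UNIBYTE."
  (list
   ;; Input is taken as a char-seq; the result is a char-seq.
   (string-make-multibyte unibyte)
   ;; Input is taken as a byte-seq; the result is still a byte-seq.
   (string-to-multibyte unibyte)
   ;; Input is taken as a byte-seq; the result is a char-seq.
   (string-as-multibyte unibyte)))

;; "\351" is a unibyte string holding the single byte #xE9.
(my-three-conversions "\351")
```

The same input bytes go in each time; what differs is the
interpretation the caller chooses by picking the function.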

I hope this explains why I insist on string-FOO-multibyte
functions.

By the way, it may be good to introduce coding system
aliases `internal' and `default', and write, for instance,
in the docstring of string-as-multibyte that the effect is
the same as (decode-coding-string UNIBYTE-STRING 'internal).

---
Ken'ichi HANDA
handa@m17n.org