From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
Newsgroups: gmane.emacs.devel
Subject: Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2,
 Issue 28]
Date: Wed, 22 Jan 2003 09:12:49 -0500
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200301221412.h0MECoA01024@rum.cs.yale.edu>
References: <E18ZDQC-0003mt-02@monty-python.gnu.org>
	<E18Zh9W-00012L-00@fencepost.gnu.org>
	<3405-Sat18Jan2003154003+0200-eliz@is.elta.co.il>
	<200301200229.LAA16287@etlken.m17n.org>
	<6480-Mon20Jan2003214849+0200-eliz@is.elta.co.il>
	<200301202055.h0KKtun11691@rum.cs.yale.edu>
	<E18bHfj-0002Rd-00@fencepost.gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1043244859 27567 80.91.224.249 (22 Jan 2003 14:14:19 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Wed, 22 Jan 2003 14:14:19 +0000 (UTC)
Cc: Stefan Monnier <monnier+gnu/emacs@rum.cs.yale.edu>
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 18bLdv-0007A6-00
	for <emacs-devel@main.gmane.org>; Wed, 22 Jan 2003 15:14:15 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian))
	id 18bLfj-0001vl-00
	for <emacs-devel@quimby.gnus.org>; Wed, 22 Jan 2003 15:16:07 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18bLei-0002Ip-08
	for emacs-devel@quimby.gnus.org; Wed, 22 Jan 2003 09:15:04 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13)
	id 18bLeE-0001yK-00
	for emacs-devel@gnu.org; Wed, 22 Jan 2003 09:14:34 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13)
	id 18bLcZ-0000iO-00
	for emacs-devel@gnu.org; Wed, 22 Jan 2003 09:12:53 -0500
Original-Received: from rum.cs.yale.edu ([128.36.229.169])
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18bLcY-0000hp-00; Wed, 22 Jan 2003 09:12:50 -0500
Original-Received: (from monnier@localhost)
	by rum.cs.yale.edu (8.11.6/8.11.6) id h0MECoA01024;
	Wed, 22 Jan 2003 09:12:50 -0500
X-Mailer: exmh version 2.4 06/23/2000 with nmh-1.0.4
Original-To: Richard Stallman <rms@gnu.org>
Original-cc: eliz@is.elta.co.il
Original-cc: handa@m17n.org
Original-cc: emacs-devel@gnu.org
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1b5
Precedence: list
List-Id: Emacs development discussions. <emacs-devel.gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Post: <mailto:emacs-devel@gnu.org>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:10975
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:10975

>     While we're at it, how about making string-as-multibyte obsolete ?
> 
> It is not obsolete--there are reasons to use it.

But it can be replaced by a call to decode-coding-string, so it is
not indispensable.

>     I think avoiding string-FOO-multibyte and using decode-coding-string
>     instead would make things a lot more clear.
> 
> I don't see any advantage in the change.

Here is why we should discourage the use of unibyte<->multibyte
conversions and recommend encoding/decoding instead:

There is a lot of confusion among Emacs hackers about "what's this MULE
stuff" and "why does Emacs do conversions instead of keeping things as
they are".  This is typical of users of latin-1 locales (and, more
generally, of any 8-bit locale), who don't understand the difference
between bytes and chars.
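The following is a small Python analogy of why this confusion arises. It is not Emacs Lisp and not how Emacs stores text internally. In Latin-1 every character happens to be exactly one byte, so the byte/char distinction stays invisible until some other encoding is involved.

```python
# Python analogy (not Emacs code): bytes and characters only coincide
# in 8-bit encodings such as Latin-1.
text = "café"                         # 4 characters
latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # "é" needs two bytes in UTF-8

print(len(text), len(latin1_bytes), len(utf8_bytes))  # 4 4 5
```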

This is of course why we introduced unibyte buffers in the first place:
a lot of code was not properly updated to MULE and was not doing
conversions where they were necessary.

So where does the unibyte<->multibyte stuff come in?  I think it
simply promotes the illusion that it is possible to "switch between
the two equivalent representations", although there is clearly no
unambiguous equivalence.  So people end up thinking "oh, I have a
unibyte thing here and Emacs wants a multibyte thing instead, so I'll
just make it multibyte", using some kind of default encoding which
"should work most of the time".
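Here is a Python analogy (again not Emacs code) of why no single "equivalent" multibyte form exists. The same bytes yield different characters depending on which encoding is assumed. The byte string is an illustrative assumption.

```python
# Python analogy: one byte sequence, two different "multibyte" readings.
raw = b"caf\xc3\xa9"  # bytes as they might arrive from a file or process

as_utf8 = raw.decode("utf-8")      # 'café'  (4 characters)
as_latin1 = raw.decode("latin-1")  # 'cafÃ©' (5 characters)

print(as_utf8, len(as_utf8))
print(as_latin1, len(as_latin1))
```

Both decodings succeed without error. Only knowledge of where the bytes came from says which reading is correct.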

If coders such as Eli and I don't fully understand the semantics of
string-as-multibyte and string-make-multibyte (and the various ways in
which they are implicitly called), it's clear that those functions
should basically not be used by anyone.

Using decode-coding-string is just as easy and makes things much
clearer, so we should encourage it.
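What follows is a rough Python analogy of the two styles. It is a sketch under stated assumptions, not Emacs code. `decode_explicitly` plays the role of decode-coding-string: the caller must name the coding system. `make_multibyte_guess` is a hypothetical helper that stands in for a conversion silently assuming a default encoding. Neither name is an Emacs function.

```python
# Hypothetical helpers illustrating explicit decoding vs. guessing.

def decode_explicitly(raw: bytes, coding: str) -> str:
    """Analogue of decode-coding-string: the coding is a required argument."""
    return raw.decode(coding)

def make_multibyte_guess(raw: bytes, default_coding: str = "latin-1") -> str:
    """Hypothetical 'just make it multibyte' helper with a silent default."""
    return raw.decode(default_coding)

raw = "naïve".encode("utf-8")              # b'na\xc3\xafve'
print(decode_explicitly(raw, "utf-8"))     # naïve
print(make_multibyte_guess(raw))           # naÃ¯ve (wrong, but no error)
```

Latin-1 decoding accepts every byte sequence. That is exactly why a guessed default "should work most of the time" while quietly producing the wrong characters.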


	Stefan