From: "Stefan Monnier"
Newsgroups: gmane.emacs.devel
Subject: Re: unibyte<->multibyte conversion [Re: Emacs-diffs Digest, Vol 2, Issue 28]
Date: Wed, 22 Jan 2003 09:12:49 -0500
Message-ID: <200301221412.h0MECoA01024@rum.cs.yale.edu>
To: Richard Stallman
Cc: eliz@is.elta.co.il, handa@m17n.org, emacs-devel@gnu.org

> > While we're at it, how about making string-as-multibyte obsolete?
>
> It is not obsolete--there are reasons to use it.

But it can be replaced by a call to decode-coding-string, so it is not
indispensable.

> > I think avoiding string-FOO-multibyte and using decode-coding-string
> > instead would make things a lot more clear.
>
> I don't see any advantage in the change.

Here is why we should discourage unibyte<->multibyte conversions and
recommend encoding/decoding instead:

There is a lot of confusion among Emacs hackers about "what's this MULE
stuff" and "why does Emacs do conversions instead of keeping things as
they are". This is typical of users of latin-1 locales (but more
generally of any 8-bit locale), who don't understand the difference
between bytes and chars. This is of course why we introduced unibyte
buffers in the first place: a lot of code had not been properly updated
for MULE and was not doing conversions where they were necessary.

So where does the unibyte<->multibyte stuff come in? I think it simply
promotes the illusion that it is possible to "switch between the two
equivalent representations", although there is clearly no unambiguous
equivalence. So people end up thinking "oh, I have a unibyte thing here
and Emacs wants a multibyte thing instead, so I'll just make it
multibyte", using some kind of default encoding which "should work most
of the time".
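To make the "no unambiguous equivalence" point concrete, here is a small sketch. It is written in Python purely as a language-neutral analogy. It is not Emacs Lisp, and `decode_as` is a hypothetical helper. The sketch shows that a single byte maps to different characters depending on which encoding you assume, and under UTF-8 it is not a valid character at all. An explicit decode with a named encoding plays roughly the role of decode-coding-string. Quietly "making it multibyte" with a default amounts to picking one of these interpretations without saying which.

```python
# One raw byte, e.g. read from a file or a subprocess.
raw = bytes([0xE8])


def decode_as(data: bytes, encoding: str) -> str:
    """Hypothetical helper: decode bytes with an explicitly named encoding."""
    return data.decode(encoding)


# The same byte yields a different character under each 8-bit encoding.
interpretations = {
    enc: decode_as(raw, enc) for enc in ("latin-1", "iso8859_2", "iso8859_5")
}
for enc, text in interpretations.items():
    print(f"{enc:10} -> {text!r} (U+{ord(text):04X})")

# Under UTF-8, 0xE8 only starts a 3-byte sequence, so alone it is invalid.
try:
    decode_as(raw, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("utf-8      ->", "decodes" if utf8_ok else "invalid on its own")
```

The choice of encoding is the real information here. Writing it out at the call site, as an explicit decode does, makes that choice visible instead of leaving it to a default.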
If coders such as Eli and me don't fully understand the semantics of
string-as-multibyte and string-make-multibyte (and the various ways in
which they are called implicitly), it's clear that those functions
should basically not be used by anyone. Using decode-coding-string is
just as easy and makes things much clearer, so we should encourage it.

        Stefan