From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Tue, 25 Feb 2003 21:52:18 -0500
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200302260252.h1Q2qIK08490@rum.cs.yale.edu>
References: <m3lm05ciwr.fsf@loiso.podval.org>
	<200302250634.PAA27478@etlken.m17n.org>
	<buo65r8ua9i.fsf@mcspd15.ucom.lsi.nec.co.jp>
	<200302260058.JAA28973@etlken.m17n.org>
	<200302260211.h1Q2BJl08373@rum.cs.yale.edu>
	<200302260234.LAA29082@etlken.m17n.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1046228050 10017 80.91.224.249 (26 Feb 2003 02:54:10 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Wed, 26 Feb 2003 02:54:10 +0000 (UTC)
Cc: monnier+gnu/emacs@rum.cs.yale.edu
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 18nrhw-0002bE-00
	for <emacs-devel@main.gmane.org>; Wed, 26 Feb 2003 03:54:08 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian))
	id 18nryA-0001R3-00
	for <emacs-devel@quimby.gnus.org>; Wed, 26 Feb 2003 04:10:54 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18nrgf-0003WT-01
	for emacs-devel@quimby.gnus.org; Tue, 25 Feb 2003 21:52:49 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13)
	id 18nrgG-0003Tc-00
	for emacs-devel@gnu.org; Tue, 25 Feb 2003 21:52:24 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13)
	id 18nrgE-0003TD-00
	for emacs-devel@gnu.org; Tue, 25 Feb 2003 21:52:23 -0500
Original-Received: from rum.cs.yale.edu ([128.36.229.169])
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18nrgD-0003Sj-00; Tue, 25 Feb 2003 21:52:21 -0500
Original-Received: (from monnier@localhost)
	by rum.cs.yale.edu (8.11.6/8.11.6) id h1Q2qIK08490;
	Tue, 25 Feb 2003 21:52:18 -0500
X-Mailer: exmh version 2.4 06/23/2000 with nmh-1.0.4
Original-To: Kenichi Handa <handa@m17n.org>
Original-cc: d.love@dl.ac.uk
Original-cc: miles@gnu.org
Original-cc: emacs-devel@gnu.org
Original-cc: sds@gnu.org
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1b5
Precedence: list
List-Id: Emacs development discussions. <emacs-devel.gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Post: <mailto:emacs-devel@gnu.org>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:11957
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:11957

> >>    (if (multibyte-string-p variable)
> >>        (setq variable (encode-coding-string variable locale-coding-system)))
> >>  
> >>  multibyte-string-p is mandatory because encode-coding-string
> >>  will change the byte-sequence of `variable' even if it is
> >>  unibyte.
> >>  Ex. (encode-coding-string "\201\300" 'iso-latin-1) => "\300"
> 
> > I find this behavior annoying because it makes the emacs-mule
> > encoding appear in a situation where it is not mentioned.
> > I wish that
> 
> >     (encode-coding-string "\201\300" 'iso-latin-1)
> > and
> >     (encode-coding-string (string-to-multibyte "\201\300") 'iso-latin-1)
> 
> > returned the same value.
> 
> Why?  As I wrote before, what the bytes of a unibyte string
> mean depends on the context.

I consider this context-dependent meaning of unibyte strings
to be a problem.  I understand why text in a unibyte buffer
has such an ambiguous meaning and agree that it's difficult
to avoid, but it's not a reason to carry over this difficulty
to strings where it is not needed.

> In the former case, as it is given to encode-coding-string,
> it is a multibyte form by which emacs represents
> character(s), not a sequence of characters representing raw
> bytes.

The problem is that the multibyteness of strings is not
always easy to guess or control.  For example: what is the
multibyteness of

	(concat "\201" (format "%s" "hello"))
and
	(concat "\201" (format "%s" 1))
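
The simplest way to find out for a particular Emacs is to ask it.  The
sketch below uses only `concat', `format' and `multibyte-string-p'.  It
deliberately states no expected answer, because the result depends on
how the Emacs version at hand propagates multibyteness through `format'
and `concat'.  The helper name `my-multibyteness-report' is hypothetical
and not part of Emacs:

```elisp
(defun my-multibyteness-report ()
  "Return a string saying whether each example string is multibyte.
Hypothetical helper for illustration; not part of Emacs."
  ;; Build the two strings exactly as in the question above.
  (let ((s1 (concat "\201" (format "%s" "hello")))
        (s2 (concat "\201" (format "%s" 1))))
    ;; Report each string's representation without assuming an answer.
    (format "first: %s, second: %s"
            (if (multibyte-string-p s1) "multibyte" "unibyte")
            (if (multibyte-string-p s2) "multibyte" "unibyte"))))
```

Evaluate `(my-multibyteness-report)' with M-: in the Emacs you want to
check.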

> In the latter case, as it is given to string-to-multibyte,
> it should be regarded as a sequence of characters representing
> raw bytes, thus the result of (string-to-multibyte
> "\201\300") is still a sequence of raw bytes.  Encoding
> raw bytes should yield the same raw bytes.

Indeed, that's what I and `setenv' would want.
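
Under that reading, a caller such as `setenv' could get a result that
does not depend on the argument's multibyteness by converting to
multibyte before encoding.  The following is a minimal sketch of that
idea, not what `setenv' actually does.  The name
`my-encode-as-raw-bytes' is hypothetical, and the sketch assumes, as
described above, that `string-to-multibyte' keeps the bytes of a
unibyte string as raw bytes:

```elisp
(defun my-encode-as-raw-bytes (string coding-system)
  "Encode STRING with CODING-SYSTEM, treating a unibyte STRING as raw bytes.
Hypothetical helper for illustration; not part of Emacs."
  ;; `string-to-multibyte' turns each byte of a unibyte STRING into a
  ;; raw-byte character and returns a multibyte STRING unchanged, so
  ;; the encoded result no longer depends on how STRING was produced.
  (encode-coding-string (string-to-multibyte string) coding-system))
```

For example, (my-encode-as-raw-bytes value locale-coding-system)
gives the same bytes whether VALUE arrives as a unibyte string or as
its `string-to-multibyte' counterpart.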

> And this behaviour of encode-coding-string on a unibyte
> string follows naturally from how encode-coding-region works
> in a unibyte buffer.

As mentioned above, I understand why it works that way in buffers,
but I don't think it has to work the same way for strings.


	Stefan