From: "Stefan Monnier"
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Tue, 25 Feb 2003 21:52:18 -0500
Message-ID: <200302260252.h1Q2qIK08490@rum.cs.yale.edu>
References: <200302250634.PAA27478@etlken.m17n.org> <200302260058.JAA28973@etlken.m17n.org> <200302260211.h1Q2BJl08373@rum.cs.yale.edu> <200302260234.LAA29082@etlken.m17n.org>
Original-To: Kenichi Handa
Cc: monnier+gnu/emacs@rum.cs.yale.edu
Original-cc: d.love@dl.ac.uk
Original-cc: miles@gnu.org
Original-cc: emacs-devel@gnu.org
Original-cc: sds@gnu.org
List-Id: Emacs development discussions.
Xref: main.gmane.org gmane.emacs.devel:11957

> >> (if (multibyte-string-p variable)
> >>     (setq variable (encode-coding-string variable locale-coding-system)))
> >>
> >> multibyte-string-p is mandatory because encode-coding-string
> >> will change the byte-sequence of `variable' even if it is
> >> unibyte.
> >> Ex. (encode-coding-string "\201\300" 'iso-latin-1) => "\300"
>
> > I find this behavior annoying because it makes the emacs-mule
> > encoding appear in a situation where it is not mentioned.
> > I wish that
>
> >    (encode-coding-string "\201\300" 'iso-latin-1)
> > and
> >    (encode-coding-string (string-to-multibyte "\201\300") 'iso-latin-1)
>
> > returned the same value.
>
> Why?  As I wrote before, what does bytes of unibyte string
> means depends on a context.

I consider this context-dependent meaning of unibyte strings to be
a problem.  I understand why text in a unibyte buffer has such
an ambiguous meaning and agree that it's difficult to avoid, but
it's not a reason to carry over this difficulty to strings where
it is not needed.

> In the former case, as it is given to encode-coding-string,
> it is a multibyte form by which emacs represents
> character(s), not a sequence of characters representing raw
> bytes.

The problem is that the multibyteness of strings is not always
as easy to guess/control.
For example: what is the multibyteness of

   (concat "\201" (format "%s" "hello"))
and
   (concat "\201" (format "%s" 1))

> In the latter case, as it is given to string-to-multibyte,
> it should be regard as a sequence of characters representing
> raw bytes, thus the result of (string-to-multibyte
> "\201\300") is still a sequence of raw-bytes.  Encoding
> raw-bytes should yield the same raw-bytes.

Indeed, that's what I and `setenv' would want.

> And, this behaviour of encode-coding-string on a unibyte
> string is a natural consequence of encode-coding-region in a
> unibyte buffer.

As mentioned above, I understand why it works that way in buffers,
but I don't think it has to work the same way for strings.


	Stefan
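
A minimal sketch of the guard quoted at the top of this message, plus a
way to inspect the two `concat' examples.  The helper name
`my-env-encode' is hypothetical and is not part of Emacs or of
`setenv'; only `multibyte-string-p', `encode-coding-string' and
`locale-coding-system' come from the discussion.  No results are
asserted, since they may differ between Emacs versions.

```elisp
;; Hypothetical helper (name invented for illustration): encode VALUE
;; for the environment only when it is a multibyte string.
(defun my-env-encode (value)
  "Return VALUE encoded with `locale-coding-system' if it is multibyte.
Unibyte strings are assumed to hold raw bytes already and are
returned unchanged."
  (if (multibyte-string-p value)
      (encode-coding-string value locale-coding-system)
    value))

;; Report the multibyteness of the two expressions from the message.
;; Evaluate this to see what your Emacs answers; nothing is assumed here.
(dolist (form '((concat "\201" (format "%s" "hello"))
                (concat "\201" (format "%s" 1))))
  (message "%S => multibyte: %s" form (multibyte-string-p (eval form))))
```

The guard keeps a unibyte argument untouched, which sidesteps the
question of what `encode-coding-string' should do with it.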