From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 14:32:16 +0900 (JST)
Message-ID: <200302260532.OAA29294@etlken.m17n.org>
References: <200302250634.PAA27478@etlken.m17n.org>
 <200302260058.JAA28973@etlken.m17n.org>
 <200302260211.h1Q2BJl08373@rum.cs.yale.edu>
 <200302260234.LAA29082@etlken.m17n.org>
 <200302260252.h1Q2qIK08490@rum.cs.yale.edu>
In-reply-to: <200302260252.h1Q2qIK08490@rum.cs.yale.edu> (monnier+gnu/emacs@rum.cs.yale.edu)
Original-To: monnier+gnu/emacs@rum.cs.yale.edu
Cc: miles@gnu.org
Original-cc: d.love@dl.ac.uk
Original-cc: sds@gnu.org
Original-cc: emacs-devel@gnu.org
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <200302260252.h1Q2qIK08490@rum.cs.yale.edu>, "Stefan Monnier" writes:

> I consider this context-dependent meaning of unibyte strings
> to be a problem. I understand why text in a unibyte buffer
> has such an ambiguous meaning and agree that it's difficult
> to avoid, but it's not a reason to carry over this difficulty
> to strings where it is not needed.

Why is it not needed? Strings and buffers are not that
different; both are containers of characters. If we get a
unibyte string from a unibyte buffer by buffer-substring, how
should we treat that string?

>> In the former case, as it is given to encode-coding-string,
>> it is a multibyte form by which Emacs represents
>> character(s), not a sequence of characters representing raw
>> bytes.

> The problem is that the multibyteness of strings is not
> always as easy to guess/control.

I agree.

> For example: what is the multibyteness of
>   (concat "\201" (format "%s" "hello"))
> and
>   (concat "\201" (format "%s" 1))

The latter yields multibyte, but I think it's a bug. I found
that (format "%s" 1) is implemented by using prin1-to-string,
and prin1-to-string prints an object to a temporary buffer
and returns that buffer's contents as a string. So, in a
multibyte session, (format "%s" 1) yields a multibyte
string. :-(

>> In the latter case, as it is given to string-to-multibyte,
>> it should be regarded as a sequence of characters representing
>> raw bytes, thus the result of (string-to-multibyte
>> "\201\300") is still a sequence of raw bytes. Encoding
>> raw bytes should yield the same raw bytes.

> Indeed, that's what I and `setenv' would want.

>> And, this behaviour of encode-coding-string on a unibyte
>> string is a natural consequence of encode-coding-region in a
>> unibyte buffer.

> As mentioned above, I understand why it works that way in buffers,
> but I don't think it has to work the same way for strings.

So, do you mean that you want this?

If a unibyte buffer has \201\300 in the region between FROM and TO,
  (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
    => "\201\300"
  (encode-coding-region FROM TO 'iso-latin-1)
    changes the region to \300.

Isn't that more confusing?

By the way, I also really really hate this unibyte/multibyte
problem. Sometimes I think I should have opposed the
introduction of such a concept more strongly.

  imagine there's no unibyte
  it's easy if you try
  no bytes below us
  above us only chars
  imagine all the people
  living in multibyte :-)

---
Ken'ichi HANDA
handa@m17n.org
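The two readings of the bytes \201\300 contrasted in this thread can be sketched outside Emacs as a small toy model in Python. This is not Emacs code, and the function names are hypothetical. The model assumes, as a simplification, that Emacs 21 stored a Latin-1 character in the range 0xA0-0xFF internally as the leading byte 0x81 (charset latin-iso8859-1) followed by the character code. Under that assumption, \201\300 in a unibyte buffer reads as the single character \300, which is consistent with the encode-coding-region result above. No other charset is modelled.

```python
# Toy model (not Emacs code) of two ways to read the bytes \201\300.
# ASSUMPTION: Emacs 21 stored a Latin-1 character 0xA0-0xFF internally as
# the leading byte 0x81 (charset latin-iso8859-1) followed by the character
# code. No other charset is modelled; the function names are hypothetical.

LEADING_LATIN1 = 0x81  # assumed leading code for latin-iso8859-1


def encode_as_internal_form(data: bytes) -> bytes:
    """Read DATA as Emacs's internal multibyte form (the way
    encode-coding-region reads a unibyte buffer), then encode the
    resulting characters as ISO-8859-1."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if (b == LEADING_LATIN1 and i + 1 < len(data)
                and 0xA0 <= data[i + 1] <= 0xFF):
            out.append(data[i + 1])  # 0x81 0xC0 -> the character 0xC0
            i += 2
        else:
            # ASCII, and (as a simplification) any other byte, is
            # passed through unchanged.
            out.append(b)
            i += 1
    return bytes(out)


def encode_as_raw_bytes(data: bytes) -> bytes:
    """Read DATA as raw bytes: encoding raw bytes yields the same bytes."""
    return bytes(data)


if __name__ == "__main__":
    region = b"\x81\xc0"  # the bytes \201\300
    print(encode_as_internal_form(region))  # b'\xc0', i.e. \300
    print(encode_as_raw_bytes(region))      # b'\x81\xc0', i.e. \201\300
```

Under these assumptions, the first function mirrors what encode-coding-region does to the region. The second corresponds to the "\201\300" result for encode-coding-string in the question above.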