From: "Stefan Monnier"
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 00:50:27 -0500
Message-ID: <200302260550.h1Q5oSc08967@rum.cs.yale.edu>
References: <200302250634.PAA27478@etlken.m17n.org>
	<200302260058.JAA28973@etlken.m17n.org>
	<200302260211.h1Q2BJl08373@rum.cs.yale.edu>
	<200302260234.LAA29082@etlken.m17n.org>
	<200302260252.h1Q2qIK08490@rum.cs.yale.edu>
	<200302260532.OAA29294@etlken.m17n.org>
Cc: monnier+gnu/emacs@rum.cs.yale.edu
Original-To: Kenichi Handa
Original-cc: d.love@dl.ac.uk
Original-cc: miles@gnu.org
Original-cc: emacs-devel@gnu.org
Original-cc: sds@gnu.org

> > I consider this context-dependent meaning of unibyte strings
> > to be a problem. I understand why text in a unibyte buffer
> > has such an ambiguous meaning and agree that it's difficult
> > to avoid, but it's not a reason to carry over this difficulty
> > to strings where it is not needed.
>
> Why is it not needed? Strings and buffers are not that
> different, both are containers of characters.

They are used differently. Operations on strings generally apply
to the whole string: you can only encode/decode a whole string at a time.

> If we get a unibyte string from a unibyte buffer by buffer-substring,
> how should we treat that string?

Like any other unibyte string: as a sequence of raw bytes.
If you want to treat it as a sequence of characters, then you need
to pass it through `string-as-multibyte'.

In buffers, there is sometimes a need to represent multibyte chars
inside a unibyte buffer because only part of the buffer is decoded.
For a string, that can be avoided. You can make sure that if it is
decoded it's a multibyte string and if it's not then it's a unibyte string.

> > For example: what is the multibyteness of
>
> >   (concat "\201" (format "%s" "hello"))
> > and
> >   (concat "\201" (format "%s" 1))
>
> The latter yields multibyte, but I think it's a bug.
> I found that "(format "%s" 1)" is implemented by using
> prin1-to-string, and prin1-to-string prints an object to a
> temporary buffer and gets that buffer string. So, in a
> multibyte session "(format "%s" 1)" yields a multibyte
> string. :-(

I know: I bumped into it yesterday while playing around with tar-mode.
How about the attached patch?

> So, do you mean that you want this?
>
> If a unibyte buffer has \201\300 in the region FROM and TO,
>
> (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
>   => "\201\300"
>
> (encode-coding-region FROM TO 'iso-latin-1) changes the
> region to \300.

Yes, I guess I'd be happy with it.

> Isn't it more confusing?

Not to me.

> By the way, I also really really hate this unibyte/multibyte
> problem. Sometimes I think I should have opposed the
> introduction of such a concept more strongly.

But it's pretty damn handy for binary data.


	Stefan


PS: I wish there was a way to swap two buffers' contents so that
tar-mode could swap the (potentially very large) data to a helper
buffer (without needing to copy this large data) and then use multibyte
for the display and unibyte for the helper buffer.


Index: print.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/print.c,v
retrieving revision 1.184
diff -u -r1.184 print.c
--- print.c	4 Feb 2003 14:03:13 -0000	1.184
+++ print.c	26 Feb 2003 05:43:26 -0000
@@ -774,9 +774,12 @@
   /* Make Vprin1_to_string_buffer be the default buffer after PRINTFINSH */
   PRINTFINISH;
   set_buffer_internal (XBUFFER (Vprin1_to_string_buffer));
+  if (ZV == ZV_BYTE)
+    Fset_buffer_multibyte (Qnil);
   object = Fbuffer_string ();
 
   Ferase_buffer ();
+  Fset_buffer_multibyte (Qt);
   set_buffer_internal (old);
 
   Vdeactivate_mark = tem;