From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 00:50:27 -0500
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200302260550.h1Q5oSc08967@rum.cs.yale.edu>
References: <m3lm05ciwr.fsf@loiso.podval.org>
	<200302250634.PAA27478@etlken.m17n.org>
	<buo65r8ua9i.fsf@mcspd15.ucom.lsi.nec.co.jp>
	<200302260058.JAA28973@etlken.m17n.org>
	<200302260211.h1Q2BJl08373@rum.cs.yale.edu>
	<200302260234.LAA29082@etlken.m17n.org>
	<200302260252.h1Q2qIK08490@rum.cs.yale.edu>
	<200302260532.OAA29294@etlken.m17n.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1046238740 7631 80.91.224.249 (26 Feb 2003 05:52:20 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Wed, 26 Feb 2003 05:52:20 +0000 (UTC)
Cc: monnier+gnu/emacs@rum.cs.yale.edu
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 18nuUM-0001ym-00
	for <emacs-devel@main.gmane.org>; Wed, 26 Feb 2003 06:52:18 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian))
	id 18nukd-00036x-00
	for <emacs-devel@quimby.gnus.org>; Wed, 26 Feb 2003 07:09:07 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18nuTy-00077l-04
	for emacs-devel@quimby.gnus.org; Wed, 26 Feb 2003 00:51:54 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13)
	id 18nuTe-00075X-00
	for emacs-devel@gnu.org; Wed, 26 Feb 2003 00:51:34 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13)
	id 18nuSx-0006fF-00
	for emacs-devel@gnu.org; Wed, 26 Feb 2003 00:50:52 -0500
Original-Received: from rum.cs.yale.edu ([128.36.229.169])
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18nuSd-0006LH-00; Wed, 26 Feb 2003 00:50:32 -0500
Original-Received: (from monnier@localhost)
	by rum.cs.yale.edu (8.11.6/8.11.6) id h1Q5oSc08967;
	Wed, 26 Feb 2003 00:50:28 -0500
X-Mailer: exmh version 2.4 06/23/2000 with nmh-1.0.4
Original-To: Kenichi Handa <handa@m17n.org>
Original-cc: d.love@dl.ac.uk
Original-cc: miles@gnu.org
Original-cc: emacs-devel@gnu.org
Original-cc: sds@gnu.org
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1b5
Precedence: list
List-Id: Emacs development discussions. <emacs-devel.gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Post: <mailto:emacs-devel@gnu.org>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:11960
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:11960

> > I consider this context-dependent meaning of unibyte strings
> > to be a problem.  I understand why text in a unibyte buffer
> > has such an ambiguous meaning and agree that it's difficult
> > to avoid, but it's not a reason to carry over this difficulty
> > to strings where it is not needed.
> 
> Why is it not needed?  Strings and buffers are not that
> different, both are containers of characters.

They are used differently.  Operations on strings generally apply to the
whole string: you can only encode/decode a whole string at a time.

> If we get a unibyte string from a unibyte buffer by buffer-substring,
> how should we treat that string?

Like any other unibyte string: as a sequence of raw bytes.
If you want to treat it as a sequence of characters, then
you need to pass it through `string-as-multibyte'.

In buffers, there is sometimes a need to represent multibyte chars
inside a unibyte buffer because only part of the buffer is
decoded.  For a string, that can be avoided.  You can make sure
that if it is decoded it's a multibyte string and if it's not
then it's a unibyte string.

> > For example: what is the multibyteness of
> 
> > 	(concat "\201" (format "%s" "hello"))
> > and
> > 	(concat "\201" (format "%s" 1))
> 
> The latter yields multibyte, but I think it's a bug.  I found
> that "(format "%s" 1)" is implemented by using
> prin1-to-string, and prin1-to-string prints an object to a
> temporary buffer and gets that buffer string.  So, in a
> multibyte session "(format "%s" 1)" yields a multibyte
> string.  :-(

I know: I bumped into it yesterday while playing around with tar-mode.
How about the attached patch?

> So, do you mean that you want this?
> 
>     If a unibyte buffer has \201\300 in the region FROM and TO,
> 
>     (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
> 	=> "\201\300"
> 
>     (encode-coding-region FROM TO 'iso-latin-1) changes the
>     region to \300.

Yes, I guess I'd be happy with it.

> Isn't it more confusing?

Not to me.

> By the way, I also really really hate this unibyte/multibyte
> problem.  Sometimes I think I should have opposed the
> introduction of such a concept more strongly.

But it's pretty damn handy for binary data.


	Stefan


PS: I wish there was a way to swap two buffers' contents so that
    tar-mode could swap the (potentially very large) data to
    a helper buffer (without needing to copy this large data)
    and then use multibyte for the display and unibyte for
    the helper buffer.


Index: print.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/print.c,v
retrieving revision 1.184
diff -u -r1.184 print.c
--- print.c	4 Feb 2003 14:03:13 -0000	1.184
+++ print.c	26 Feb 2003 05:43:26 -0000
@@ -774,9 +774,12 @@
   /* Make Vprin1_to_string_buffer be the default buffer after PRINTFINSH */
   PRINTFINISH;
   set_buffer_internal (XBUFFER (Vprin1_to_string_buffer));
+  if (ZV == ZV_BYTE)
+    Fset_buffer_multibyte (Qnil);
   object = Fbuffer_string ();
 
   Ferase_buffer ();
+  Fset_buffer_multibyte (Qt);
   set_buffer_internal (old);
 
   Vdeactivate_mark = tem;
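
The `ZV == ZV_BYTE' test in the patch relies on a simple property: the
character count of the buffer text equals its byte count only when every
character occupies exactly one byte, so switching the buffer to unibyte
before calling Fbuffer_string loses nothing.  Below is a minimal
standalone sketch of that property.  It assumes UTF-8 as a stand-in for
Emacs's internal multibyte representation, and the names `count_chars'
and `is_single_byte_only' are hypothetical helpers for illustration, not
Emacs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the characters in a UTF-8 byte sequence (assumed encoding, not
   Emacs's internal one) by counting every byte that is not a
   continuation byte of the form 10xxxxxx.  */
static size_t
count_chars (const unsigned char *s, size_t nbytes)
{
  size_t n = 0;

  assert (s != NULL);
  for (size_t i = 0; i < nbytes; i++)
    if ((s[i] & 0xC0) != 0x80)
      n++;
  return n;
}

/* Hypothetical analogue of the patch's `ZV == ZV_BYTE' check: nonzero
   when the character count equals the byte count, i.e. when no
   character needs more than one byte.  */
static int
is_single_byte_only (const char *s)
{
  size_t nbytes = strlen (s);

  return count_chars ((const unsigned char *) s, nbytes) == nbytes;
}
```

Under these assumptions, `is_single_byte_only ("1")' is nonzero, which
mirrors the case where the text printed for (format "%s" 1) can safely
come back as a unibyte string, while any text containing a multi-byte
character fails the check and would stay multibyte.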