From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 16:49:15 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200302260749.QAA29494@etlken.m17n.org>
References: <m3lm05ciwr.fsf@loiso.podval.org>
	<200302250634.PAA27478@etlken.m17n.org>
	<buo65r8ua9i.fsf@mcspd15.ucom.lsi.nec.co.jp>
	<200302260058.JAA28973@etlken.m17n.org>
	<200302260211.h1Q2BJl08373@rum.cs.yale.edu>
	<200302260234.LAA29082@etlken.m17n.org>
	<200302260252.h1Q2qIK08490@rum.cs.yale.edu>
	<200302260532.OAA29294@etlken.m17n.org>
	<200302260550.h1Q5oSc08967@rum.cs.yale.edu>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: main.gmane.org 1046245782 24742 80.91.224.249 (26 Feb 2003 07:49:42 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Wed, 26 Feb 2003 07:49:42 +0000 (UTC)
Cc: miles@gnu.org
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 18nwJw-0006Qv-00
	for <emacs-devel@main.gmane.org>; Wed, 26 Feb 2003 08:49:40 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian))
	id 18nwaG-0004BQ-00
	for <emacs-devel@quimby.gnus.org>; Wed, 26 Feb 2003 09:06:32 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18nwK0-0001l1-01
	for emacs-devel@quimby.gnus.org; Wed, 26 Feb 2003 02:49:44 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10.13)
	id 18nwJh-0001kN-00
	for emacs-devel@gnu.org; Wed, 26 Feb 2003 02:49:25 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10.13)
	id 18nwJf-0001jr-00
	for emacs-devel@gnu.org; Wed, 26 Feb 2003 02:49:24 -0500
Original-Received: from tsukuba.m17n.org ([192.47.44.130])
	by monty-python.gnu.org with esmtp (Exim 4.10.13)
	id 18nwJd-0001iz-00; Wed, 26 Feb 2003 02:49:22 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2])h1Q7nGk16247;
	Wed, 26 Feb 2003 16:49:16 +0900 (JST)	(envelope-from handa@m17n.org)
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	h1Q7nGR07664;	Wed, 26 Feb 2003 16:49:16 +0900 (JST)
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id QAA29494;
	Wed, 26 Feb 2003 16:49:15 +0900 (JST)
Original-To: monnier+gnu/emacs@rum.cs.yale.edu
In-reply-to: <200302260550.h1Q5oSc08967@rum.cs.yale.edu>
	(monnier+gnu/emacs@rum.cs.yale.edu)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
Original-cc: d.love@dl.ac.uk
Original-cc: sds@gnu.org
Original-cc: emacs-devel@gnu.org
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1b5
Precedence: list
List-Id: Emacs development discussions. <emacs-devel.gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Post: <mailto:emacs-devel@gnu.org>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:11963
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:11963

In article <200302260550.h1Q5oSc08967@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>  Why is it not needed?  Strings and buffers are not that
>>  different, both are containers of characters.

> They are used differently.  Operations on strings generally apply to the
> whole string: you can only encode/decode a whole string at a time.

That's due to a limitation of the current
implementation, not to the nature of strings.  There's no
reason to keep that limitation.  In fact, since we changed
the Lisp_String type in 21.1, it would not be difficult to
make strings change length.

>>  If we get a unibyte string from a unibyte buffer by buffer-substring,
>>  how should we treat that string?

> Like any other unibyte string: as a sequence of raw bytes.
> If you want to treat it as a sequence of characters, then
> you need to pass it through `string-as-multibyte'.

If we regard that limitation as inherent to strings, your
idea is worth considering.  It seems that we can at least
construct a consistent explanation of the behaviour based
on your idea too.
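A small sketch of the model Stefan describes.  The helper name is
hypothetical, not an existing Emacs function, and it assumes a modern
Emacs: the internal multibyte representation changed after the 21.x
releases discussed here, so the characters produced by
`string-as-multibyte' may differ from what this thread had in mind.  It
shows that a string copied out of a unibyte buffer is itself unibyte,
and that `string-as-multibyte' reinterprets the same bytes as a
multibyte string.

```elisp
;; Illustrative helper (hypothetical name, not part of Emacs).
(defun my-unibyte-substring-demo ()
  "Copy two raw bytes out of a unibyte buffer.
Return a cons (UNIBYTE . MULTIBYTE) of the copied string and the
result of passing it through `string-as-multibyte'."
  (with-temp-buffer
    (set-buffer-multibyte nil)   ; make this buffer unibyte
    (insert "\201\300")          ; unibyte literal: two raw bytes
    (let ((raw (buffer-substring (point-min) (point-max))))
      ;; RAW is a unibyte string, i.e. a sequence of bytes.
      (cons raw (string-as-multibyte raw)))))
```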

------------------------------------------------------------
What a character in a unibyte buffer represents depends on
the context.  It may be a character represented by a single
byte, a raw byte not yet decoded, or a byte that is part of
the multibyte form of some other character.

On the other hand, a character in a unibyte string always
represents a raw byte.  Emacs coerces it into the character
represented by that single byte when the unibyte string is
concatenated with a multibyte string or inserted into a
multibyte buffer.
------------------------------------------------------------
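The coercion described in the second paragraph can be observed
directly.  A minimal sketch with a hypothetical helper name, assuming a
modern Emacs; which character a raw byte is coerced into differs
between Emacs versions, so only multibyteness and lengths are reported.

```elisp
;; Illustrative helper (hypothetical name, not part of Emacs).
(defun my-unibyte-coercion-demo ()
  "Put the unibyte string \"\\300\" into two multibyte contexts.
Return (RAW-MULTIBYTE-P MIXED-MULTIBYTE-P MIXED-LENGTH INSERTED-SIZE)."
  (let* ((raw "\300")                       ; unibyte string: one raw byte
         (mixed (concat raw (string 233)))  ; 233 is a non-ASCII character code
         (inserted-size
          (with-temp-buffer                 ; temporary buffers are multibyte
            (insert raw)
            (buffer-size))))
    (list (multibyte-string-p raw)
          (multibyte-string-p mixed)
          (length mixed)
          inserted-size)))
```

In a current Emacs this returns (nil t 2 1): the byte remains a single
character in both contexts, but the string or buffer containing it is
multibyte.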

But I'm not sure such a change is really necessary.  Are
you sure that the change doesn't break current uses of
unibyte strings?

>>  The latter yields multibyte, but I think it's a bug.  I found
>>  that "(format "%s" 1)" is implemented by using
>>  prin1-to-string, and prin1-to-string prints an object to a
>>  temporary buffer and returns that buffer's contents.  So, in a
>>  multibyte session "(format "%s" 1)" yields a multibyte
>>  string.  :-(

> I know: I bumped into it yesterday while playing around with tar-mode.
> How about the attached patch?

Please see the comments below.
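Whether `format' yields a multibyte string can be checked in any given
session.  A hedged sketch with a hypothetical helper name: it reports
whether a few ways of turning the integer 1 into a string produce a
multibyte string.  No particular result is assumed, since it depends on
the Emacs version.

```elisp
;; Illustrative helper (hypothetical name, not part of Emacs).
(defun my-number-string-multibyte-report ()
  "Report whether some integer-to-string paths give multibyte strings.
Return an alist of (FUNCTION . MULTIBYTE-P) entries for the integer 1."
  (list (cons 'format (multibyte-string-p (format "%s" 1)))
        (cons 'prin1-to-string (multibyte-string-p (prin1-to-string 1)))
        (cons 'number-to-string (multibyte-string-p (number-to-string 1)))))
```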

>>  So, do you mean that you want this?
>>  
>>      If a unibyte buffer has \201\300 in the region between FROM and TO,
>>  
>>      (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
>>  	=> "\201\300"
>>  
>>      (encode-coding-region FROM TO 'iso-latin-1) changes the
>>      region to \300.

> Yes, I guess I'd be happy with it.

>>  Isn't it more confusing?

> Not to me.

What do the other people think about it?
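For readers who want to compare the two behaviours in their own Emacs,
here is a sketch (hypothetical helper name) that runs both operations
on the bytes \201\300 in a unibyte buffer.  Results are deliberately
not shown: the internal multibyte representation changed after Emacs
21, so a current Emacs need not reproduce the outputs quoted above.

```elisp
;; Illustrative helper (hypothetical name, not part of Emacs).
(defun my-encode-string-vs-region (coding)
  "Encode the bytes \\201\\300 with CODING in two ways.
Both operate on the same unibyte buffer text.  Return a cons
\(FROM-STRING . FROM-REGION) of the two encoded results."
  (with-temp-buffer
    (set-buffer-multibyte nil)   ; unibyte buffer, as in the example
    (insert "\201\300")
    (let ((from-string
           ;; Copy the text out, then encode the copy.
           (encode-coding-string (buffer-substring (point-min) (point-max))
                                 coding)))
      ;; Encode the same text in place; this rewrites the buffer.
      (encode-coding-region (point-min) (point-max) coding)
      (cons from-string (buffer-string)))))
```

Call it as (my-encode-string-vs-region 'iso-latin-1) and compare the
car with the cdr.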

> PS: I wish there were a way to swap two buffers' contents so that
>     tar-mode could swap the (potentially very large) data to
>     a helper buffer (without needing to copy this large data)
>     and then use multibyte for the display and unibyte for
>     the helper buffer.

I don't understand what you mean, especially the usage of
the helper buffer.

I think tar-mode should use multiple buffers: one unibyte
buffer for the tar file itself, one multibyte buffer for the
table of contents, and other multibyte buffers (created on
demand) for viewing/editing the files contained in the tar
file.  Then tar mode would work almost the same way as dired,
and we could view multibyte files in separate buffers.  We
could use the same method in arc-mode and also in RMAIL.

Is that different from what you mean?
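A rough sketch of the buffer layout proposed above.  Everything here is
hypothetical: the function names are made up, tar header parsing is
left out (member names and byte offsets are passed in), and this is not
how tar-mode.el is actually written.  It uses one unibyte buffer for
the archive bytes, a multibyte buffer for the table of contents, and a
multibyte buffer created on demand for one member, decoded from the raw
bytes.

```elisp
;; All names below are hypothetical, for illustration only.
(defun my-tar-make-raw-buffer (file)
  "Return a new unibyte buffer holding the bytes of FILE."
  (let ((buf (generate-new-buffer
              (concat " *tar-data " (file-name-nondirectory file) "*"))))
    (with-current-buffer buf
      (set-buffer-multibyte nil)              ; the archive is raw bytes
      (insert-file-contents-literally file))  ; no decoding, no EOL conversion
    buf))

(defun my-tar-make-toc-buffer (names)
  "Return a multibyte buffer listing member NAMES, one per line.
NAMES would come from parsing the tar headers (not shown here)."
  (let ((buf (generate-new-buffer "*tar contents*")))
    (with-current-buffer buf                  ; new buffers are multibyte
      (dolist (name names)
        (insert name "\n")))
    buf))

(defun my-tar-view-member (raw start end coding)
  "Decode bytes START..END of unibyte buffer RAW into a new buffer.
The bytes are decoded with CODING, so the new buffer holds characters."
  (let ((bytes (with-current-buffer raw (buffer-substring start end)))
        (buf (generate-new-buffer "*tar member*")))
    (with-current-buffer buf
      (insert (decode-coding-string bytes coding)))
    buf))
```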

---
Ken'ichi HANDA
handa@m17n.org