From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 14:32:16 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200302260532.OAA29294@etlken.m17n.org>
References: <m3lm05ciwr.fsf@loiso.podval.org>
	<200302250634.PAA27478@etlken.m17n.org>
	<buo65r8ua9i.fsf@mcspd15.ucom.lsi.nec.co.jp>
	<200302260058.JAA28973@etlken.m17n.org>
	<200302260211.h1Q2BJl08373@rum.cs.yale.edu>
	<200302260234.LAA29082@etlken.m17n.org>
	<200302260252.h1Q2qIK08490@rum.cs.yale.edu>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Cc: miles@gnu.org
Original-To: monnier+gnu/emacs@rum.cs.yale.edu
In-reply-to: <200302260252.h1Q2qIK08490@rum.cs.yale.edu>
	(monnier+gnu/emacs@rum.cs.yale.edu)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
Original-cc: d.love@dl.ac.uk
Original-cc: sds@gnu.org
Original-cc: emacs-devel@gnu.org
List-Id: Emacs development discussions. <emacs-devel.gnu.org>
Xref: main.gmane.org gmane.emacs.devel:11959
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:11959

In article <200302260252.h1Q2qIK08490@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> I consider this context-dependent meaning of unibyte strings
> to be a problem.  I understand why text in a unibyte buffer
> has such an ambiguous meaning and agree that it's difficult
> to avoid, but it's not a reason to carry over this difficulty
> to strings where it is not needed.

Why is it not needed?  Strings and buffers are not that
different; both are containers of characters.  If we get a
unibyte string from a unibyte buffer by buffer-substring,
how should we treat that string?

>>  In the former case, as it is given to encode-coding-string,
>>  it is a multibyte form by which emacs represents
>>  character(s), not a sequence of characters representing raw
>>  bytes.

> The problem is that the multibyteness of strings is not
> always as easy to guess/control.

I agree.

> For example: what is the multibyteness of

> 	(concat "\201" (format "%s" "hello"))
> and
> 	(concat "\201" (format "%s" 1))

The latter yields multibyte, but I think it's a bug.  I found
that "(format "%s" 1)" is implemented by using
prin1-to-string, and prin1-to-string prints an object to a
temporary buffer and gets that buffer string.  So, in a
multibyte session "(format "%s" 1)" yields a multibyte
string.  :-(

>>  In the latter case, as it is given to string-to-multibyte,
>>  it should be regard as a sequence of characters representing
>>  raw bytes, thus the result of (string-to-multibyte
>>  "\201\300") is still a sequence of raw-bytes.  Encoding
>>  raw-bytes should yield the same raw-bytes.

> Indeed, that's what I and `setenv' would want.

>>  And, this behaviour of encode-coding-string on a unibyte
>>  string is a natural consequence of encode-coding-region in a
>>  unibyte buffer.

> As mentioned above, I understand why it works that way in buffers,
> but I don't think it has to work the same way for strings.

So, do you mean that you want this?

    If a unibyte buffer has \201\300 in the region between FROM and TO,

    (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
	=> "\201\300"

    (encode-coding-region FROM TO 'iso-latin-1) changes the
    region to \300.

Isn't it more confusing?

By the way, I also really really hate this unibyte/multibyte
problem.  Sometimes I think I should have opposed the
introduction of such a concept more strongly.

    imagine there's no unibyte 
    it's easy if you try
    no bytes below us
    above us only chars
    imagine all the people living in multibyte

:-)

---
Ken'ichi HANDA
handa@m17n.org