From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Wed, 26 Nov 2003 09:07:47 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200311260007.JAA26617@etlken.m17n.org>
References: <200311250107.KAA24646@etlken.m17n.org>
	<jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1069807043 24248 80.91.224.253 (26 Nov 2003 00:37:23 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Wed, 26 Nov 2003 00:37:23 +0000 (UTC)
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Wed Nov 26 01:37:19 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AOngF-0003zT-00
	for <emacs-devel@deer.gmane.org>; Wed, 26 Nov 2003 01:37:19 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AOngF-0008SI-00
	for <emacs-devel@quimby.gnus.org>; Wed, 26 Nov 2003 01:37:19 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1AOodY-0000r7-DY
	for emacs-devel@quimby.gnus.org; Tue, 25 Nov 2003 20:38:36 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1AOoBu-0004TZ-En
	for emacs-devel@gnu.org; Tue, 25 Nov 2003 20:10:02 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1AOoBN-0004Hd-Gx
	for emacs-devel@gnu.org; Tue, 25 Nov 2003 20:10:00 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org)
	by monty-python.gnu.org with esmtp (Exim 4.24) id 1AOoBL-0004GU-TR
	for emacs-devel@gnu.org; Tue, 25 Nov 2003 20:09:28 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2])
	by tsukuba.m17n.org (8.11.6p2/3.7W-20010518204228) with ESMTP id
	hAQ07mh00040; Wed, 26 Nov 2003 09:07:48 +0900 (JST)
	(envelope-from handa@m17n.org)
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	by fs.m17n.org (8.11.6/3.7W-20010823150639) with ESMTP id hAQ07ls19715; 
	Wed, 26 Nov 2003 09:07:47 +0900 (JST)
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id JAA26617;
	Wed, 26 Nov 2003 09:07:47 +0900 (JST)
Original-To: monnier@IRO.UMontreal.CA
In-reply-to: <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	(message from Stefan Monnier on 25 Nov 2003 10:43:05 -0500)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:18126
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:18126

In article <jwvfzgcsbuv.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>  It seems that you keep of saying that "A does B, thus it's
>>  nonsense".  But, I'm arguing that "A does C".

> Well, the thing is: I still don't understand what is C.
> From what I understand, you say that C is "a conversion from multibyte
> to a sequence of code-points",

Yes, that's what I said.

> but since the output is a unibyte string,
> that restrict it to cases where the code-points can be encoded in 8 bits,
> thus it doesn't sound very generic

Yes.  But I thought whether it is generic or not is not the
point here.

> and I don't see any application for it
> (nor do I see any practical difference with using encode-coding-string
> since the output AFAIK would be the same).

My examples show that we can't use encode-coding-string.
How can we use encode-coding-string without knowing what
coding system to use?  I haven't heard your answer yet.

>>  It doesn't make sense because you treat the result as "a
>>  unibyte string encoded in Latin-1".

>>  It makes sense if you treat the result as "a unibyte string
>>  in which each byte represents a sequence of Unicode
>>  code-points", doesn't it?

> But each byte can only represent the 0-255 subset of unicode code-points, in
> which case this is equivalent (practically speaking) to latin-1, isn't it ?

Yes.  And that covers all characters the user uses in this
case.

>>>  It'd make sense if the environment said "latin-1 when you can,
>>>  utf-8 otherwise" or something like that, but then we would use
>>>  encode-coding-string anyway.

>>  It's itself nonsense to have such a coding system.

> I was not thinking of a coding-system, but just some encoding job,
> such as what is done when saving a buffer (where my .emacs does exactly
> that: try latin-1 first and utf-8 if that fails).

Ah, I see.  But my understanding is that
string-make-unibyte/multibyte are designed not to change the
number of characters, so that the difference between unibyte
and multibyte stays transparent in Lisp.  That restriction
leads to cases where unsupported characters are handled
incorrectly.  But I think Richard's design policy was that
incorrect handling of unsupported characters is better than
a possibly more disastrous error caused by a change in the
number of characters.
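
As a rough illustration of the length-preserving property, here is
a Python analogy (not Emacs code).  The helper name to_unibyte and
its strict flag are hypothetical, and replacing an unsupported
character with "?" is only an assumption made for the sketch, not
a description of what Emacs does.  The point is that one character
always becomes one byte, whereas a UTF-8 encoding of the same
string may produce more bytes than characters.

```python
def to_unibyte(text: str, strict: bool = False) -> bytes:
    """Map each character to exactly one byte (hypothetical helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 256:
            # Code point fits in 8 bits: keep it as a single byte.
            out.append(cp)
        elif strict:
            # Variant discussed in the thread: signal an error instead.
            raise ValueError(f"character U+{cp:04X} does not fit in one byte")
        else:
            # Assumed placeholder: wrong content, but the count is preserved.
            out.append(ord("?"))
    return bytes(out)


sample = "caf\u00e9"
# Characters, one-byte-per-character length, UTF-8 length.
print(len(sample), len(to_unibyte(sample)), len(sample.encode("utf-8")))
```

With strict=True the same helper rejects unsupported characters
rather than mishandling them, which changes only the error case
and not the one-character-to-one-byte idea.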

>>  Do you agree with having string-make-unibyte if it signals an error on
>>  non-Latin-1 characters?

> Of course: that's pretty much what I suggested: make-string-unibyte only
> accepts multibyte chars that correspond to "bytes".

I agree with that.  But it just changes the behaviour of
the function in the error case.  It doesn't change the concept
of what it does.

>>>  I just don't know of a concrete case where it makes sense to use
>>>  string-make-unibyte.

>>  I'll paraphrase my previous example as this:

>>    It is perfectly possible to live in such an environment
>>    where only the characters U+0000..U+00FF of Unicode is
>>    used but only the coding system utf-8 is used.

>>  But, I don't claim that the above is a realistic case.

>>  Another non-realistic but concrete case is:

>>    Use only the charset iso-8859-5 and the encoding CTEXT.

> I don't see any use of string-make-unibyte in your two examples.

Again, I'd like to ask how to use encode-coding-string
without knowing the proper coding-system in each case.

> And "having string-make-unibyte if it signals an error on non-Latin-1
> characters" means that the second example can't be used any more.

In the second case, of course, the "supported characters" are
those included in the charset iso-8859-5, and
string-make-unibyte should accept them.  Again, the result
is the same as encoding with the coding system iso-8859-5, but
we only know about the coding system CTEXT here.
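
A small Python sketch of the second case (again an analogy, not
Emacs code; the helper name cyrillic_to_unibyte is hypothetical
and it covers only ASCII plus the main Cyrillic letter block
U+0410..U+044F of iso-8859-5).  It converts one character to one
byte using knowledge of the charset alone, and the output matches
what Python's iso8859_5 codec produces for the same text.

```python
def cyrillic_to_unibyte(text: str) -> bytes:
    """One byte per character for a subset of iso-8859-5 (hypothetical)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII is unchanged.
            out.append(cp)
        elif 0x0410 <= cp <= 0x044F:
            # In iso-8859-5 these letters occupy 0xB0..0xEF.
            out.append(cp - 0x0360)
        else:
            # Outside the subset handled by this sketch.
            raise ValueError(f"character U+{cp:04X} is not supported here")
    return bytes(out)


word = "\u041f\u0440\u0438\u0432\u0435\u0442"
print(cyrillic_to_unibyte(word) == word.encode("iso8859_5"))
```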

---
Ken'ichi HANDA
handa@m17n.org