From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Fri, 21 Nov 2003 09:41:47 +0900 (JST)
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <200311210041.JAA18324@etlken.m17n.org>
References: <ilubrrha7oc.fsf@latte.josefsson.org>	<200311130153.KAA04615@etlken.m17n.org>	<ilur80c50uj.fsf@latte.josefsson.org>	<200311130610.PAA04983@etlken.m17n.org>	<iluekwcwyl8.fsf@latte.josefsson.org>	<200311130901.SAA05204@etlken.m17n.org>	<ilun0b08by1.fsf@latte.josefsson.org>	<200311140047.JAA06414@etlken.m17n.org>	<jwvhe12emr3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311180733.QAA13703@etlken.m17n.org>	<jwvn0atd38w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>	<200311190006.JAA14847@etlken.m17n.org>
	<jwvptfp139w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1069376401 11106 80.91.224.253 (21 Nov 2003 01:00:01 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 21 Nov 2003 01:00:01 +0000 (UTC)
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Fri Nov 21 01:59:56 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AMzeN-0005g3-00
	for <emacs-devel@deer.gmane.org>; Fri, 21 Nov 2003 01:59:55 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AMzeN-0004J8-00
	for <emacs-devel@quimby.gnus.org>; Fri, 21 Nov 2003 01:59:55 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1AN0SF-0000WV-Ce
	for emacs-devel@quimby.gnus.org; Thu, 20 Nov 2003 20:51:27 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1AN0QK-0008JF-R2
	for emacs-devel@gnu.org; Thu, 20 Nov 2003 20:49:28 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1AN0KW-0006LU-Fk
	for emacs-devel@gnu.org; Thu, 20 Nov 2003 20:43:59 -0500
Original-Received: from [192.47.44.130] (helo=tsukuba.m17n.org)
	by monty-python.gnu.org with esmtp (Exim 4.24) id 1AN0KV-0006K6-Dk
	for emacs-devel@gnu.org; Thu, 20 Nov 2003 20:43:27 -0500
Original-Received: from fs.m17n.org (fs.m17n.org [192.47.44.2])
	by tsukuba.m17n.org (8.11.6p2/3.7W-20010518204228) with ESMTP id
	hAL0fmh06858; Fri, 21 Nov 2003 09:41:48 +0900 (JST)
	(envelope-from handa@m17n.org)
Original-Received: from etlken.m17n.org (etlken.m17n.org [192.47.44.125])
	by fs.m17n.org (8.11.6/3.7W-20010823150639) with ESMTP id hAL0fls11944; 
	Fri, 21 Nov 2003 09:41:47 +0900 (JST)
Original-Received: (from handa@localhost)
	by etlken.m17n.org (8.8.8+Sun/3.7W-2001040620) id JAA18324;
	Fri, 21 Nov 2003 09:41:47 +0900 (JST)
Original-To: monnier@IRO.UMontreal.CA
In-reply-to: <jwvptfp139w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	(message from Stefan Monnier on 18 Nov 2003 22:05:39 -0500)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:17995
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:17995

In article <jwvptfp139w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>>  I see.  Apart from the design itself, I agree that it's difficult to
>>  introduce a new type.  But, when I discussed with Richard about the
>>  Character type object a few years ago, he was not that negative provided
>>  that it gives a sure improvement.

> Sounds about right to me: we have one free tag that we could use for chars

Yes, and as that is the last free tag, I still hesitate to
consume it for the Character object.

>>  Then, we can't use string-make-unibyte for the current case
>>  because, in emacs-unicode, (concat '(?a 192)) returns a
>>  multibyte string whose second element is A-grave, not an
>>  eight-bit-char.  Am I missing something?

> Well, obviously we need to make it accept this case (i.e. accept both the
> latin-1 192 and the eight-bit-char 192).

Then, I see your intention.  But aren't the semantics of
such a function very weird?

>>>  To do what your string-make-unibyte does you should use
>>>  `encode-coding-string' where the coding system is passed explicitly.

>>  Those are conceptually different things (I remember the
>>  similar discussion we had a while ago).

>>  encode-coding-string does:
>>  char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
>>    --CES-->  encoded-byte-sequence

>>  string-make-unibyte does:
>>  char-sequence --CCS--> code-point-sequence
>>    --concat-->  code-point-sequence

>>  These two yield the same result only when the CCS supports all
>>  chars in "char-sequence" and the CES is stateless
>>  (e.g. iso-latin-1).

> You lost me here (I'm a poor soul who doesn't know much outside of the
> latin-1 world).

CCS: Coded Character Set
CES: Character Encoding Scheme
coding-system of Emacs: Set of CCSs and CES.
   iso-latin-1: CCSs are ascii and latin-iso8859-1, 
		CES is 8-bit version of ISO-2022
   iso-2022-jp:	CCSs are ascii, japanese-jisx0208, ...
		CES is 7-bit version of ISO-2022
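
As a rough illustration outside Emacs, the difference between a
stateless and a stateful CES can be sketched with Python's codecs.
This is an analogy only: the codec names "latin-1" and "iso2022_jp"
are Python's, not Emacs coding systems, and the helper names
encode_per_char and is_concat_safe are made up for this example.

```python
# Illustration only: Python codecs stand in for Emacs coding systems.

def encode_per_char(text: str, codec: str) -> bytes:
    """Encode each character on its own and concatenate the results."""
    return b"".join(ch.encode(codec) for ch in text)


def is_concat_safe(text: str, codec: str) -> bool:
    """True if per-character encoding equals whole-string encoding."""
    return encode_per_char(text, codec) == text.encode(codec)


latin = "a\u00c0"          # 'a' followed by A-grave (code point 192)
japanese = "\u3042\u3044"  # two hiragana covered by JIS X 0208

# Stateless 8-bit case: each code point maps directly to one byte.
latin1_bytes = latin.encode("latin-1")

# Stateful 7-bit case: escape sequences switch between character sets,
# so the bytes for a character depend on what was emitted before it.
jis_bytes = japanese.encode("iso2022_jp")

print(is_concat_safe(latin, "latin-1"))        # stateless codec
print(is_concat_safe(japanese, "iso2022_jp"))  # stateful codec
```

With a stateless CES, mapping each character to its code point and
concatenating gives the same bytes as encoding the whole string; with
a stateful CES it does not, because each separately encoded piece
carries its own escape sequences.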

> I thought that string-make-unibyte only behaves meaningfully for
> "normal 8bit coding-systems" such as latin-1.

Yes, but that doesn't mean it is conceptually the same as
encode-coding-string.  The result of string-make-unibyte
should still be regarded as a sequence of characters, but the
result of encode-coding-string is a sequence of bytes.
This is where the ambiguity of a unibyte string lies.

The number 192 can be regarded as:
(1) just a number, a byte
(2) a code point of some character set
(3) a character code

A unibyte string can contain (1) and (2) without
distinguishing them, but a multibyte string can contain (1)
and (3) while distinguishing them.
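
The three readings of 192 can be sketched by analogy in Python, where
bytes plays the role of a unibyte string and str the role of a
multibyte string.  This is only an analogy, not Emacs behavior: the
"surrogateescape" error handler is assumed here as a stand-in for an
eight-bit-char, and the charset choices are arbitrary.

```python
# Analogy only: Python bytes/str stand in for Emacs unibyte/multibyte strings.

raw = bytes([192])                   # (1) just a number, a byte

# (2) a code point of some character set: which character it denotes
# depends on the character set used to interpret it.
as_latin1 = raw.decode("latin-1")    # A-grave
as_greek = raw.decode("iso8859_7")   # a Greek letter, not A-grave

# (3) a character code: 192 names exactly one character.
char = chr(192)

# A str can hold a raw byte and a character side by side and still tell
# them apart: surrogateescape maps the undecodable byte 192 to a
# reserved code that differs from the character code 192.
escaped = raw.decode("ascii", errors="surrogateescape")
mixed = escaped + char

print(mixed[0] == mixed[1])  # raw byte vs. character
```

The bytes object alone cannot say whether 192 is reading (1) or (2),
while the str keeps the raw byte and the character as distinct
elements.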

---
Ken'ichi HANDA
handa@m17n.org