From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: 21 Nov 2003 00:27:42 -0500
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <jwvzneqwbo3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
References: <ilubrrha7oc.fsf@latte.josefsson.org>
	<200311130153.KAA04615@etlken.m17n.org>
	<ilur80c50uj.fsf@latte.josefsson.org>
	<200311130610.PAA04983@etlken.m17n.org>
	<iluekwcwyl8.fsf@latte.josefsson.org>
	<200311130901.SAA05204@etlken.m17n.org>
	<ilun0b08by1.fsf@latte.josefsson.org>
	<200311140047.JAA06414@etlken.m17n.org>
	<jwvhe12emr3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200311180733.QAA13703@etlken.m17n.org>
	<jwvn0atd38w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200311190006.JAA14847@etlken.m17n.org>
	<jwvptfp139w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200311210041.JAA18324@etlken.m17n.org>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1069392882 2733 80.91.224.253 (21 Nov 2003 05:34:42 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 21 Nov 2003 05:34:42 +0000 (UTC)
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Fri Nov 21 06:34:39 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AN3wF-0006dh-00
	for <emacs-devel@deer.gmane.org>; Fri, 21 Nov 2003 06:34:39 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AN3wD-00070b-00
	for <emacs-devel@quimby.gnus.org>; Fri, 21 Nov 2003 06:34:37 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1AN4np-0002ib-1k
	for emacs-devel@quimby.gnus.org; Fri, 21 Nov 2003 01:30:01 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1AN4ng-0002eg-0W
	for emacs-devel@gnu.org; Fri, 21 Nov 2003 01:29:52 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1AN4n8-0002MA-Qc
	for emacs-devel@gnu.org; Fri, 21 Nov 2003 01:29:49 -0500
Original-Received: from [132.204.24.67] (helo=mercure.iro.umontreal.ca)
	by monty-python.gnu.org with esmtp (Exim 4.24) id 1AN4n8-0002Le-8o
	for emacs-devel@gnu.org; Fri, 21 Nov 2003 01:29:18 -0500
Original-Received: from vor.iro.umontreal.ca (vor.iro.umontreal.ca [132.204.24.42])
	by mercure.iro.umontreal.ca (8.12.9/8.12.9) with ESMTP id
	hAL5Rgbj032203; Fri, 21 Nov 2003 00:27:43 -0500
Original-Received: by vor.iro.umontreal.ca (Postfix, from userid 20848)
	id 91BAB3C63F; Fri, 21 Nov 2003 00:27:42 -0500 (EST)
Original-To: Kenichi Handa <handa@m17n.org>
In-Reply-To: <200311210041.JAA18324@etlken.m17n.org>
Original-Lines: 38
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
X-DIRO-MailScanner: Found to be clean
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:18009
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:18009

>> I thought that string-make-unibyte only behaves meaningfully for
>> "normal 8bit coding-systems" such as latin-1.

> Yes, but it doesn't mean it is conceptually the same as
> encode-coding-string.  The result of string-make-unibyte
> should still be regarded as a sequence of character, but the
> result of encode-coding-string is a sequence of byte.

Why/when is the distinction meaningful, given that string-make-unibyte
can only be used meaningfully with 8bit coding-systems, where the
distinction seems more philosophical than anything else?

> Here exists an ambiguity of a unibyte string.

> The number 192 can be regarded as:
> (1) just a number, a byte
> (2) a code point of some character set.
> (3) a character code

But the second case is only possible for 8bit character sets, right?
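
To make the three readings concrete, here is a rough illustration in
Python rather than Emacs Lisp.  It is only an analogy: Python's bytes/str
types say nothing about how Emacs stores strings, and the variable names
are made up for the example.

```python
# Illustration only: Python's bytes/str model, not Emacs internals.
n = 192

# (1) Just a number, a byte: no character set is implied.
raw = bytes([n])

# (2) A code point of some 8-bit character set: the same position
#     means a different character in each set.
as_latin1 = raw.decode("iso-8859-1")  # LATIN CAPITAL LETTER A WITH GRAVE
as_greek = raw.decode("iso-8859-7")   # a Greek letter at the same position

# (3) A character code in the internal representation (Unicode here).
as_char = chr(n)

# For latin-1, readings (2) and (3) happen to coincide; for the Greek
# set they do not.
latin1_coincides = (as_char == as_latin1)
greek_coincides = (as_char == as_greek)

# A code point of a multibyte charset does not fit in a single byte,
# so reading (2) of a lone byte only makes sense for 8-bit sets.
euc_jp_bytes = "\u3042".encode("euc_jp")  # HIRAGANA LETTER A
```

With an 8-bit set, one byte per character makes readings (1) and (2)
look identical in memory; only the interpretation differs.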

Until now, I always thought that Emacs only dealt with
- byte streams representing encoded sequences of code points: case 1.
- sequences of internal character codes (internally encoded in emacs-mule
  or unicode depending on the branch you use): case 3.
Is there any place where we really deal with sequences of code points of
external charsets (other than, maybe, the degenerate case where such a
sequence is indistinguishable from case 1)?

> A unibyte string can contain (1) and (2) without
> distinguishing them, but a multibyte string can contain (1)
> and (3) while distinguishing them.

Can multibyte strings distinguish cases (1) and (3) for the integer 97 and
the character `a'?


        Stefan