From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: 18 Nov 2003 12:12:10 -0500
Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Message-ID: <jwvn0atd38w.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
References: <ilubrrha7oc.fsf@latte.josefsson.org>
	<200311130153.KAA04615@etlken.m17n.org>
	<ilur80c50uj.fsf@latte.josefsson.org>
	<200311130610.PAA04983@etlken.m17n.org>
	<iluekwcwyl8.fsf@latte.josefsson.org>
	<200311130901.SAA05204@etlken.m17n.org>
	<ilun0b08by1.fsf@latte.josefsson.org>
	<200311140047.JAA06414@etlken.m17n.org>
	<jwvhe12emr3.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>
	<200311180733.QAA13703@etlken.m17n.org>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1069175865 15058 80.91.224.253 (18 Nov 2003 17:17:45 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 18 Nov 2003 17:17:45 +0000 (UTC)
Cc: emacs-devel@gnu.org, jas@extundo.com
Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Tue Nov 18 18:17:42 2003
Return-path: <emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org>
Original-Received: from quimby.gnus.org ([80.91.224.244])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AM9Ty-0005bg-00
	for <emacs-devel@deer.gmane.org>; Tue, 18 Nov 2003 18:17:42 +0100
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1AM9Ty-0002NK-00
	for <emacs-devel@quimby.gnus.org>; Tue, 18 Nov 2003 18:17:42 +0100
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.24)
	id 1AMAOn-0004hX-9S
	for emacs-devel@quimby.gnus.org; Tue, 18 Nov 2003 13:16:25 -0500
Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.24)
	id 1AMAN8-0004UL-0J
	for emacs-devel@gnu.org; Tue, 18 Nov 2003 13:14:42 -0500
Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.24)
	id 1AMAMH-0003H9-7S
	for emacs-devel@gnu.org; Tue, 18 Nov 2003 13:14:20 -0500
Original-Received: from [132.204.24.67] (helo=mercure.iro.umontreal.ca)
	by monty-python.gnu.org with esmtp (Exim 4.24) id 1AMAMG-0003Df-E6
	for emacs-devel@gnu.org; Tue, 18 Nov 2003 13:13:48 -0500
Original-Received: from vor.iro.umontreal.ca (vor.iro.umontreal.ca [132.204.24.42])
	by mercure.iro.umontreal.ca (8.12.9/8.12.9) with ESMTP id
	hAIHCAbj000674; Tue, 18 Nov 2003 12:12:11 -0500
Original-Received: by vor.iro.umontreal.ca (Postfix, from userid 20848)
	id 4FE403C63E; Tue, 18 Nov 2003 12:12:10 -0500 (EST)
Original-To: Kenichi Handa <handa@m17n.org>
In-Reply-To: <200311180733.QAA13703@etlken.m17n.org>
Original-Lines: 58
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
X-DIRO-MailScanner: Found to be clean
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Emacs development discussions.  <emacs-devel.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org
Xref: main.gmane.org gmane.emacs.devel:17888
X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:17888

>>> The basic problem is that we don't distinguish a character
>>> (code) and a number.  So, we introduce a character object

>> That's one way to look at the problem.
>> Another is to say that the problem is instead that we do not distinguish
>> between arrays of chars and arrays of bytes.

> I agree that it's possible to grasp the problem in that way,
> but I'm not sure which is the better way.  Could you explain
> WHY yours is better?

I'm not sure whether it's better or worse.  My problem with introducing a
new type for chars is that the change has far-reaching consequences, and
I'm not sure it would solve all our problems, since many of them come from
bad elisp code.

>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>> `make-string-unibyte' and `make-string-multibyte'.

> I think you mean string-make-unibyte/multibyte, but, for the
> current problem, we can't use it because string-make-unibyte
> may behave differently in different language environment.
> Such a lang. env. that makes iso-8859-1 or Unicode the
> highest priority for the character `À' is ok.

> (string-make-unibyte (concat '(?a 192))) = "a\300"

> But, if some lang. env. prefers such a charset for `À' that
> encodes it not to 192 (e.g. Vietnamese VSCII), we fail.

No.  My `make-string-unibyte' should only convert "bytes in a multibyte
string" to "bytes in a unibyte string": no char is involved, and thus no
coding-system.  If the multibyte string argument contains a char that's
not an eight-bit-char, then it's an error.
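
A minimal sketch of that strict conversion, for illustration only.  The
name `my-make-string-unibyte' is hypothetical, and the sketch assumes a
current Emacs in which the built-in `string-to-unibyte' converts raw
bytes (and ASCII) and signals an error for any other character; it is not
the patch mentioned in this message.

```elisp
;; Hypothetical helper: strict "bytes in multibyte" -> "bytes in unibyte".
;; Assumption: `string-to-unibyte' (a built-in of current Emacs) signals
;; an error when STRING holds a non-ASCII char that is not a raw byte.
(defun my-make-string-unibyte (string)
  "Return STRING as a unibyte string without consulting any coding system."
  (string-to-unibyte string))

;; A multibyte string that carries only ASCII and a raw byte converts.
(my-make-string-unibyte (string-to-multibyte "a\300"))

;; A real character such as U+00C0 is rejected instead of being encoded.
(condition-case err
    (my-make-string-unibyte (string ?a #xC0))
  (error (format "rejected: %s" (error-message-string err))))
```

Because no charset priority is consulted, the outcome does not depend on
the language environment.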

To do what your string-make-unibyte does you should use
`encode-coding-string' where the coding system is passed explicitly.
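
As a small illustration of passing the coding system explicitly (the
helper name `my-encode-a-grave' is made up for this example; the two
coding systems are just convenient choices):

```elisp
;; Hypothetical helper: encode the two-character string "a" + U+00C0
;; with an explicitly named coding system.
(defun my-encode-a-grave (coding-system)
  "Encode \"a\" followed by U+00C0 using CODING-SYSTEM."
  (encode-coding-string (string ?a #xC0) coding-system))

(my-encode-a-grave 'iso-8859-1)   ; => "a\300"
(my-encode-a-grave 'utf-8)        ; => "a\303\200"
```

The resulting bytes depend only on the coding system named in the call,
not on the current language environment.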

I've changed my Emacs so that string-make-unibyte does the above
(i.e. signals an error if it encounters a non-byte char) and it works fairly
well, except for the few places where the elisp code is sloppy and needs to
be fixed.

>> Note that 1-3 are not mutually exclusive so we can use
>> them all.

> Yes, but, at least, I really want to avoid "(3) Make a
> series of new functions".

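;; Relies on the strict `make-string-unibyte' described above.
(defun concat-unibyte (&rest x)
  "Concatenate the sequences X and return the result as a unibyte string."
  (make-string-unibyte (apply 'concat x)))
...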

so we don't need this series of new functions, but if some of them are used
often enough, we can add them of course.


        Stefan