From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: 18 Nov 2003 12:12:10 -0500
References: <200311130153.KAA04615@etlken.m17n.org> <200311130610.PAA04983@etlken.m17n.org> <200311130901.SAA05204@etlken.m17n.org> <200311140047.JAA06414@etlken.m17n.org> <200311180733.QAA13703@etlken.m17n.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: emacs-devel@gnu.org, jas@extundo.com
Original-To: Kenichi Handa
In-Reply-To: <200311180733.QAA13703@etlken.m17n.org>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

>>> The basic problem is that we don't distinguish a character
>>> (code) and a number. So, we introduce a character object

>> That's one way to look at the problem.
>> Another is to say that the problem is instead that we do not distinguish
>> between arrays of chars and arrays of bytes.

> I agree that it's possible to grasp the problem in that way,
> but I'm not sure which is the better way. Could you explain
> WHY yours is better?

I'm not sure whether it's better or worse. The problem I have with the
introduction of a new type for chars is that it is a change that has far
reaching consequences and I'm not sure it would solve all our problems since
many of the problems have to do with bad elisp code.

>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>> `make-string-unibyte' and `make-string-multibyte'.

> I think you mean string-make-unibyte/multibyte, but, for the
> current problem, we can't use it because string-make-unibyte
> may behave differently in different language environment.
> Such a lang. env.
> that makes iso-8859-1 or Unicode the
> highest priority for the character `À' is ok.
>   (string-make-unibyte (concat '(?a 192))) = "a\300"
> But, if some lang. env. prefers such a charset for `À' that
> encodes it not to 192 (e.g. Vietnamese VSCII), we fail.

No. My `make-string-unibyte' should only work to convert "bytes in
multibyte string" to "bytes in unibyte string": there's no char, thus no
coding-system. If the multibyte string argument contains a char that's not
an eight-bit-char, then it's an error. To do what your string-make-unibyte
does you should use `encode-coding-string' where the coding system is
passed explicitly.

I've changed my Emacs so that string-make-unibyte does the above (i.e.
signals an error if it encounters a non-byte char) and it works fairly well,
except for the few places where the elisp code is sloppy and needs to
be fixed.

>> Note that 1-3 are not mutually exclusive so we can use
>> them all.

> Yes, but, at least, I really want to avoid "(3) Make a
> series of new functions".

(defun concat-unibyte (&rest x)
  (make-string-unibyte (apply 'concat x)))

... so we don't need this series of new functions, but if some of them are
used often enough, we can add them of course.


        Stefan
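
The distinction drawn in this message (bytes carried in a multibyte string
versus characters that must be encoded) can be sketched as follows. This is
only an illustration, not the code from the message: the `my-' names are
hypothetical, and it assumes an Emacs that provides `string-to-unibyte' and
`string-to-multibyte', with the former signaling an error on any character
that is neither ASCII nor an eight-bit (raw byte) character.

```elisp
;; Hypothetical helper standing in for the `make-string-unibyte'
;; discussed in the message.  Assumption: `string-to-unibyte' exists
;; and signals an error for non-ASCII, non-eight-bit characters.
(defun my-make-string-unibyte (string)
  "Convert the bytes held in multibyte STRING to a unibyte string.
No coding system is involved; a real character is an error."
  (string-to-unibyte string))

;; Hypothetical counterpart of the `concat-unibyte' one-liner.
(defun my-concat-unibyte (&rest sequences)
  "Concatenate SEQUENCES and return the result as a unibyte string."
  (my-make-string-unibyte (apply #'concat sequences)))

;; Bytes in, bytes out: the raw byte 0300 (192) survives unchanged,
;; whatever the language environment.
(my-concat-unibyte "a" (string-to-multibyte "\300"))

;; A real character is rejected instead of being silently encoded.
(condition-case err
    (my-make-string-unibyte "aÀ")
  (error (message "rejected: %S" err)))

;; Turning characters into bytes names the coding system explicitly.
(encode-coding-string "aÀ" 'iso-8859-1)
```

Keeping the two operations separate is the point of the sketch: the first
never consults the language environment, and the second states its coding
system at the call site.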