From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Wed, 19 Nov 2003 09:06:55 +0900 (JST)
Message-ID: <200311190006.JAA14847@etlken.m17n.org>
References: <200311130153.KAA04615@etlken.m17n.org>
 <200311130610.PAA04983@etlken.m17n.org>
 <200311130901.SAA05204@etlken.m17n.org>
 <200311140047.JAA06414@etlken.m17n.org>
 <200311180733.QAA13703@etlken.m17n.org>
Original-To: monnier@IRO.UMontreal.CA
Cc: emacs-devel@gnu.org, jas@extundo.com
In-reply-to: (message from Stefan Monnier on 18 Nov 2003 12:12:10 -0500)

In article , Stefan Monnier writes:

> I'm not sure whether it's better or worse.  The problem I have with the
> introduction of a new type for chars is that it is a change that has far
> reaching consequences and I'm not sure it would solve all our problems
> since many of the problems have to do with bad elisp code.

I see.  Apart from the design itself, I agree that it's difficult to
introduce a new type.  But when I discussed the Character type object
with Richard a few years ago, he was not that negative, provided that it
gives a sure improvement.

>>> Which of 1 to 3 is the best is not clear, and maybe we can just live with
>>> `make-string-unibyte' and `make-string-multibyte'.

>> I think you mean string-make-unibyte/multibyte, but, for the

> No.
> My `make-string-unibyte' should only work to convert "bytes in
> multibyte string" to "bytes in unibyte string": there's no char, thus no
> coding-system.

I see.  In emacs-unicode, I have already introduced string-to-multibyte
which, I think, is the same as your make-string-multibyte.  But,

> If the multibyte string argument contains a char that's
> not an eight-bit-char, then it's an error.

Then we can't use make-string-unibyte for the current case because, in
emacs-unicode, (concat '(?a 192)) returns a multibyte string whose
second element is A-grave, not an eight-bit-char.  Am I missing
something?

> To do what your string-make-unibyte does you should use
> `encode-coding-string' where the coding system is passed explicitly.

Those are conceptually different things (I remember a similar
discussion we had a while ago).

encode-coding-string does:
  char-sequence
    --CCS-set--> (CCS/codepoint-pair)-sequence
      --CES--> encoded-byte-sequence

string-make-unibyte does:
  char-sequence
    --CCS--> code-point-sequence
      --concat--> code-point-sequence

These two yield the same result only when the CCS supports all chars in
"char-sequence" and the CES is stateless (e.g. iso-latin-1) and .

> I've changed my Emacs so that string-make-unibyte does the above
> (i.e. signals an error if it encounters a non-byte char) and it works fairly
> well, except for the few places where the elisp code is sloppy and needs to
> be fixed.

How did you change it?  string-make-unibyte internally uses the function
copy_text.  Did you change that?  But then, each time you copy a
multibyte string into a unibyte buffer, you should get an error.

>>> Note that 1-3 are not mutually exclusive so we can use
>>> them all.

>> Yes, but, at least, I really want to avoid "(3) Make a
>> series of new functions".

> (defun concat-unibyte (&rest x)
>   (make-string-unibyte (apply 'concat x)))
> ...
As I wrote above, this should signal an error on:
  (concat-unibyte '(?a 192))

> so we don't need this series of new functions, but if some of them are used
> often enough, we can add them of course.

---
Ken'ichi HANDA
handa@m17n.org