From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: eight-bit char handling in emacs-unicode
Date: Tue, 25 Nov 2003 10:07:18 +0900 (JST)
Message-ID: <200311250107.KAA24646@etlken.m17n.org>
References: <200311130153.KAA04615@etlken.m17n.org>
	<200311130610.PAA04983@etlken.m17n.org>
	<200311130901.SAA05204@etlken.m17n.org>
	<200311140047.JAA06414@etlken.m17n.org>
	<200311180733.QAA13703@etlken.m17n.org>
	<200311190006.JAA14847@etlken.m17n.org>
	<200311210041.JAA18324@etlken.m17n.org>
	<200311210627.PAA18757@etlken.m17n.org>
	<200311220125.KAA20128@etlken.m17n.org>
	<200311230730.QAA21903@etlken.m17n.org>
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Cc: jas@extundo.com, emacs-devel@gnu.org
Original-To: monnier@IRO.UMontreal.CA
In-reply-to: (message from Stefan Monnier on 23 Nov 2003 18:48:08 -0500)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
List-Id: Emacs development discussions.
Xref: main.gmane.org gmane.emacs.devel:18096

Stefan Monnier writes:
>> But, the concept of unibyte<->multibyte conversion itself is
>> not ad-hoc. Don't you think their meaning is very clear
>> when you grasp them as my way? Do you see any inconsistency
>> in my explanation about them?

> No, as a matter of fact I don't see why in a utf-8 environment,
> it makes any sense to have a function that turns a multibyte string
> into a unibyte string encoded in latin-1

It seems that you keep saying that "A does B, thus it's
nonsense". But, I'm arguing that "A does C".
It doesn't make sense because you treat the result as "a
unibyte string encoded in Latin-1". It makes sense if you
treat the result as "a unibyte string whose bytes represent
a sequence of Unicode code-points", doesn't it?

> (without even complaining when it encounters other
> characters).

I think it's ok (or better) that string-make-unibyte
complains in such a case.

> It'd make sense if the environment said "latin-1 when you can,
> utf-8 otherwise" or something like that, but then we would use
> encode-coding-string anyway.

Having such a coding system is itself nonsense. Do you
agree with having string-make-unibyte if it signals an error
on non-Latin-1 characters?

> Besides, if any non-latin-1 char is encountered by string-make-unibyte, then
> we end up with a unibyte string that has an unknown meaning because some
> chars might have been encoded in latin-1, and others in some other encoding.
> I just don't know of a concrete case where it makes sense to use
> string-make-unibyte.

I'll paraphrase my previous example as this:

  It is perfectly possible to live in an environment where
  only the characters U+0000..U+00FF of Unicode are used
  and the only coding system used is utf-8.

But, I don't claim that the above is a realistic case.
Another non-realistic but concrete case is:

  Use only the charset iso-8859-5 and the encoding CTEXT.

---
Ken'ichi HANDA
handa@m17n.org
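
The interpretation argued for above can be illustrated with a small
model. This is only a sketch in Python, not Emacs code and not how
string-make-unibyte is implemented: the names make_unibyte_strict and
make_multibyte are hypothetical, and the strict error behaviour is the
variant proposed in this message, not what Emacs necessarily does. Each
byte of the "unibyte" result holds one code point in U+0000..U+00FF,
independently of the coding system (here utf-8) used for external text.

```python
def make_unibyte_strict(text: str) -> bytes:
    """Hypothetical model: one byte per character, holding its code point.

    Signals an error for any character above U+00FF, as proposed in the
    message, instead of silently producing bytes of unknown meaning.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            raise ValueError(f"U+{cp:04X} does not fit in a single byte")
        out.append(cp)
    return bytes(out)


def make_multibyte(data: bytes) -> str:
    """Hypothetical inverse: read each byte as a code point U+0000..U+00FF."""
    return "".join(chr(b) for b in data)


sample = "caf\u00e9"                    # only characters in U+0000..U+00FF
unibyte = make_unibyte_strict(sample)   # one byte per code point
external = sample.encode("utf-8")       # what the utf-8 coding system writes

# The two byte sequences differ: the unibyte form is an internal
# code-point view, while utf-8 is the external encoding.
print(unibyte, external)
```

The unibyte bytes coincide numerically with a Latin-1 encoding only
because the first 256 Unicode code points match ISO-8859-1; the model
itself never refers to Latin-1 as a coding system.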