From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: please consider emacs-unicode for pervasive changes
Date: Thu, 5 Sep 2002 14:48:22 +0900 (JST)
Message-ID: <200209050548.OAA13998@etlken.m17n.org>
References: <200208090754.g797s6s11972@rum.cs.yale.edu> <200208130030.JAA26246@etlken.m17n.org> <200209030615.PAA10378@etlken.m17n.org>
Original-To: rms@gnu.org
Cc: d.love@dl.ac.uk, raeburn@raeburn.org, monnier+gnu/emacs@rum.cs.yale.edu, emacs-devel@gnu.org
In-Reply-To: (message from Richard Stallman on Wed, 04 Sep 2002 10:20:35 -0400)

Richard Stallman writes:
>> I think that depends. For one thing, it will typically clobber
>> iso-2022 files.

> Yes. On reading and writing iso-2022 files, Emacs-unicode
> may designate different charsets. I still can't find a time
> to fix it.

> What exactly is the nature of the problem? "Clobber" usually means
> "destroy or ruin"; however, your statement seems to say it would alter
> the file but the altered file would still represent the correct
> characters. Which one is it?

Emacs-unicode unifies, for instance, character C1 of charset CS1 with
character C2 of charset CS2. So even if an original iso-2022-7bit file
uses different byte sequences to represent them, when Emacs reads the
file and writes it back, C2 will be encoded by the same byte sequence
as C1. This doesn't matter to Emacs, because when Emacs reads that file
again, there's no difference.

It's difficult to answer the question "Are C1 and C2 the same
character?" To Emacs-unicode (and also to Unicoders), they are the same.
To some other applications, they may be different.

But this kind of thing happens only with coding systems such as
iso-2022-7bit, iso-latin-1-with-esc, etc. (those invented by Emacs),
because they can support multiple charsets that contain characters
unified by Emacs. Other applications usually don't use such coding
systems; they use iso-2022-jp, iso-2022-kr, etc. As the charsets
supported by each of those don't overlap (and thus are not unified),
reading and writing them with Emacs-unicode causes no problem.

---
Ken'ichi HANDA
handa@etl.go.jp
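
To illustrate the round trip described above, here is a minimal Python
sketch. It is a toy model, not Emacs code: the escape sequences, the
byte values, and the code-point tables are all hypothetical, and only
the names CS1/CS2 and C1/C2 come from the message. It shows that a file
using both charsets is rewritten with different bytes that decode to
the same characters, while a coding system limited to one charset is
written back unchanged.

```python
# Toy model of charset unification in an ISO-2022-style byte stream.
# All escape sequences and tables below are hypothetical illustrations.

ESC = 0x1B

# Hypothetical three-byte designation sequences for the two charsets.
DESIGNATE = {"CS1": b"\x1b(1", "CS2": b"\x1b(2"}

# Byte -> unified code point. Byte 0x41 of CS1 (C1) and byte 0x61 of
# CS2 (C2) are unified to the same code point.
TO_UNIFIED = {
    "CS1": {0x41: 0x00E9, 0x42: 0x0100},
    "CS2": {0x61: 0x00E9, 0x62: 0x0200},
}


def decode(data):
    """Decode bytes into a list of unified code points."""
    by_escape = {seq: name for name, seq in DESIGNATE.items()}
    current = None
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            # A designation switches the active charset.
            current = by_escape[data[i:i + 3]]
            i += 3
        else:
            out.append(TO_UNIFIED[current][data[i]])
            i += 1
    return out


def encode(code_points, charsets=("CS1", "CS2")):
    """Encode code points, using the first listed charset that has each one."""
    current = None
    out = bytearray()
    for cp in code_points:
        for name in charsets:
            reverse = {u: b for b, u in TO_UNIFIED[name].items()}
            if cp in reverse:
                break
        else:
            raise ValueError("no charset can encode %#x" % cp)
        if name != current:
            out += DESIGNATE[name]
            current = name
        out.append(reverse[cp])
    return bytes(out)


# A file that writes C1 via CS1 and C2 via CS2.
original = DESIGNATE["CS1"] + b"\x41" + DESIGNATE["CS2"] + b"\x61"
rewritten = encode(decode(original))
print(original != rewritten)                   # the bytes differ
print(decode(original) == decode(rewritten))   # the characters do not
```

Passing charsets=("CS2",) to encode models a coding system that
supports only one of the overlapping charsets; in that case the encoder
has no alternative designation to choose, so the bytes survive the
round trip.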