From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Several serious problems
Date: Thu, 29 Aug 2002 22:25:25 +0900 (JST)
Message-ID: <200208291325.WAA03596@etlken.m17n.org>
References: <200208190748.QAA14278@etlken.m17n.org>
In-Reply-To: (message from Dave Love on 22 Aug 2002 18:08:43 +0100)
Original-To: d.love@dl.ac.uk
Cc: monnier+gnu/emacs@rum.cs.yale.edu, keichwa@gmx.net, rms@gnu.org,
    emacs-devel@gnu.org

Dave Love writes:
> As far as I know, what's installed in the trunk behaves correctly, but
> I'm not using that code

Why aren't you using that code?  Does that mean you changed
some of it locally?

> and I don't know if I'd hear about real
> problems with it (as opposed to imagined problems).  It should all be
> things you have said are OK or I'm sure you will think are OK, but I
> may have overlooked something.  However, it could use work for CJK, in
> particular; there's a fixme in utf-8, and there could be additional
> interconversion tables for CJK charsets as well as a way of
> customizing the character preferences in utf-8-subst.el, and probably
> other things.

I noticed those `fixme's.  Yes, it would be better to resolve
all of them, but for the moment I want to concentrate on
fixing the problem in RC.

>> I've thought that the current codes were
>> the same one as what Dave had, but the above statement of
>> Dave's tells that it's not.

> Well, now I check, utf-8.el in the RC branch seems to be as I left it,
> which is what rms (I think) told me to do.
> As far as I can tell, its
> safe-charsets property is correct,

The safe-charsets property of utf-8 in RC is this:

    ascii eight-bit-control eight-bit-graphic latin-iso8859-1
    mule-unicode-0100-24ff mule-unicode-2500-33ff
    mule-unicode-e000-ffff ethiopic tibetan thai-tis620
    katakana-jisx0201 ipa chinese-sisheng lao
    vietnamese-viscii-lower vietnamese-viscii-upper

It doesn't contain latin-iso8859-[23...].

> and I don't understand what the complaint is about.  When
> I couldn't check, I assumed someone had modified it
> incorrectly, but there's no sign of that in CVS.

The complaint is that the coding-system utf-8 can't encode
latin-2 characters in RC even if loadup.el has these lines:

    (load "international/ucs-tables")
    (ucs-unify-8859 'encode-only)

The reason is, as far as I can see, that the ccl program
`ccl-encode-mule-utf-8' doesn't have this line near its
head:

    (translate-character ucs-mule-to-mule-unicode r0 r1)

So, even if we set up the translation table
`ucs-mule-to-mule-unicode' at loadup time, it is not used in
utf-8.

>> Could someone tell me why are they different in HEAD and RC,
>> and why are they different from what Dave have written?

> Most changes aren't in RC since I was only allowed to add (a version
> of) ucs-tables, not changing the default behaviour, so people could
> turn on (partial) character translation themselves.  It doesn't affect
> utf-8 or any other ccl coding systems because they don't use the
> translation table (although the useful extra coding systems in
> code-pages.el aren't included either, so I think only koi,
> alternativnyj and mac-roman are affected).

Hmmm, I think I now understand the situation in RC.  It can
unify charsets between iso-8859-X, but utf-8 can't encode
iso-8859-X (intentionally), correct?

Richard, is that what you asked Dave to install for RC?

I think RC should also allow utf-8 to encode 8859-X
correctly, as in HEAD.  I see no harm in it.

> I think I unilaterally added some other things (a utf-8 language
> environment and utf-16.el?)
> since they addressed somewhat misleading
> entries in PROBLEMS and the arguments against the Unicode support are
> either demonstrably wrong or spurious IMNSHO.

I don't object to that.

I found one problem with utf-16.  It seems that utf-16-le/be
can handle 8859-X correctly because of this line in
ccl-encode-mule-utf-16-le/be:

    (translate-character ucs-mule-to-mule-unicode r0 r1)

but the safe-charsets property lists only these:

    ascii eight-bit-control latin-iso8859-1
    mule-unicode-0100-24ff mule-unicode-2500-33ff
    mule-unicode-e000-ffff

so they can't be regarded as safe coding systems for the
8859-X charsets.

> I'm afraid I've had enough of all this,

Yes, you have done an excellent hack!  When I implemented the
translation table stuff, I didn't expect that it could be
used this thoroughly.

> and I doubt it's worth more effort anyhow.  Especially
> after all the FUD about them, the Mule additions probably
> won't get used much unless they're the default, even by
> i18n people, unfortunately.

I thought that including ucs-tables etc. in RC was at least
meant to make unify-on-encoding the default, INCLUDING utf-8.

---
Ken'ichi HANDA
handa@etl.go.jp
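
P.S. For anyone who wants to check the safe-charsets point on their own
build, here is a minimal sketch.  It assumes an Emacs 21-era Mule in
which a coding system carries a `safe-charsets' property readable with
`coding-system-get'; later versions may store this information
differently and simply return nil here.  The helper name
`my-coding-system-lists-charset-p' is hypothetical, not something in
the Emacs sources.

```elisp
;; Sketch only: assumes the `safe-charsets' coding-system property of
;; Emacs 21-era Mule.  The function name is made up for illustration.
(defun my-coding-system-lists-charset-p (coding-system charset)
  "Return t if CHARSET is listed in CODING-SYSTEM's `safe-charsets'.
A property value of t is treated as meaning every charset is safe."
  (let ((safe (coding-system-get coding-system 'safe-charsets)))
    (or (eq safe t)
        ;; `memq' returns a list tail; normalize it to t.
        (and (memq charset safe) t))))

;; Ask whether utf-8 advertises latin-2 as safe.
(my-coding-system-lists-charset-p 'utf-8 'latin-iso8859-2)
```

With the RC property list quoted above, the last form would be expected
to return nil, since latin-iso8859-2 is not among the listed charsets.
Note that this only inspects what the coding system advertises; whether
the ccl encoder actually translates the characters is a separate
question, as the utf-16 case shows.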