From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Possible UTF-8 CJK Regressions in Terminal Emulators
Date: Wed, 9 Jun 2004 16:24:13 +0900 (JST)
Message-ID: <200406090724.QAA11057@etlken.m17n.org>
Original-To: d.love@dl.ac.uk
Cc: mariano@gnome.org, alexander.winston@comcast.net, emacs-devel@gnu.org, danilo@gnome.org, monnier@iro.umontreal.ca, miles@gnu.org
In-reply-to: (message from Dave Love on Tue, 08 Jun 2004 18:56:37 +0100)

In article , Dave Love writes:

> Kenichi Handa writes:
> > I think I succeeded in loading subst-*.el not at the time of
> > customizing utf-translate-cjk-mode to t but only when it is
> > found that loading them is necessary on decoding or encoding
> > utf-8, or on running decode/encode-char.  This means that we
> > can make the default value of utf-translate-cjk-mode to t
> > without loading subst-*.el at building time.
> It doesn't fix the potential effects on non-CJK users if decoding a
> bit of Unicode text containing such a character will load the large
> tables even if they're useless to the user.  Maybe there aren't many
> people now with 48MB P133s or old SPARCs like me, in which case it's a
> reasonable default, but I suggest an entry in NEWS/PROBLEMS about it.

I'm going to modify the current entry in NEWS as below.

** The utf-8/16 coding systems have been enhanced.
By default, untranslatable utf-8 sequences are simply composed into
single quasi-characters.  User option `utf-translate-cjk-mode' (it is
turned on by default) arranges to translate many utf-8 CJK character
sequences into real Emacs characters in a similar way to the Mule-UCS
system.  As this loads a fairly large amount of data on demand, people
who are not interested in CJK characters may want to customize it to
nil.  You can augment/amend the CJK translation via hash tables
`ucs-mule-cjk-to-unicode' and `ucs-unicode-to-mule-cjk'.  The utf-8
coding system now also encodes characters from most of Emacs's
one-dimensional internal charsets, specifically the ISO-8859 ones.
The utf-16 coding system is affected similarly.

> > I think it's a big improvement especially for CJK users,

> I agree it should be on for CJK users anyway.  (I thought it was now
> conditional on the language environment.)

It's not.  I think we had better avoid turning a user option on or off
depending on the language environment.

---
Ken'ichi HANDA
handa@m17n.org