From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Possible UTF-8 CJK Regressions in Terminal Emulators
Date: Wed, 7 Apr 2004 21:30:41 +0900 (JST)
Message-ID: <200404071230.VAA25159@etlken.m17n.org>
References: <1077557604.1632.26.camel@duende> <1077643915.12919.2.camel@duende> <1077682436.28482.9.camel@duende> <200403010815.RAA14365@etlken.m17n.org>
Original-To: d.love@dl.ac.uk
Cc: mariano@gnome.org, alexander.winston@comcast.net, emacs-devel@gnu.org, danilo@gnome.org, monnier@iro.umontreal.ca, miles@gnu.org
In-reply-to: (message from Dave Love on Thu, 18 Mar 2004 15:34:07 +0000)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article , Dave Love writes:

>> Change utf-translate-cjk-mode to a customizable variable
>> utf-translate-cjk which is nil, t, or auto (default).  The
>> values nil and t mean the same thing as the current value of
>> utf-translate-cjk-mode.  The value `auto' means setting up
>> tables for translating CJK characters automatically if
>> necessary.
>>
>> By adding pre-write-conversion function, we can make the
>> above work also on writing.  But, in that case, it seems
>> difficult to make find-coding-systems-region/string work
>> consistently.  To check if a text is encodable by utf-8, we
>> must load translation tables.

> As far as I remember, that's why I didn't implement that sort of
> thing.

Wait!
If utf-translate-cjk-mode can encode all of jis, ksc, big5, and gb
to utf-8, we can tell that they can be encoded by utf-8
without loading tables.  What we have to do is simply to
include those charsets in `safe-charsets' when defining utf-8.

> post-read-conversion machinery is already there, I think.

Yes, utf-8 already has utf-8-post-read-conversion, which
composes unencoded raw-bytes into Unicode U+FFFD.

> [Is this code base ever going to be released so that most users
> actually can use it?]

I'd like to ask that too.

---
Ken'ichi HANDA
handa@m17n.org
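
The `safe-charsets' idea in the message above can be pictured with a small model. This is only a sketch in Python, not Emacs code: the `CodingSystem` class, its `can_encode` and `load_tables` methods, and the charset labels are all hypothetical names chosen for illustration. It assumes that an encodability check only needs the list of charsets a coding system declares as safe, so the expensive translation tables can stay unloaded until text is actually encoded.

```python
from dataclasses import dataclass


@dataclass
class CodingSystem:
    """Hypothetical stand-in for a coding system definition."""
    name: str
    safe_charsets: frozenset
    tables_loaded: bool = False

    def can_encode(self, charsets):
        # Answer from the declared charset list alone; no tables are touched.
        return frozenset(charsets) <= self.safe_charsets

    def load_tables(self):
        # Stand-in for the expensive translation-table load,
        # deferred until text really has to be encoded.
        self.tables_loaded = True


# Informal labels for illustration; these are not real Emacs charset names.
BASE = frozenset({"ascii", "unicode"})
CJK = frozenset({"jis", "ksc", "big5", "gb"})

# Declaring the CJK charsets as safe when the coding system is defined.
utf8 = CodingSystem("utf-8", BASE | CJK)
# A definition without them, for comparison.
utf8_plain = CodingSystem("utf-8-plain", BASE)

print(utf8.can_encode({"ascii", "big5"}))   # check uses only the charset set
print(utf8_plain.can_encode({"jis"}))       # jis is not declared safe here
print(utf8.tables_loaded)                   # the check did not load tables
```

In this model the check is a plain subset test, which is why it can be answered consistently whether or not the tables have been loaded yet.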