From: Dave Love
Newsgroups: gmane.emacs.devel
Subject: Re: utf-8 cjk translation bug?
Date: Wed, 01 Oct 2003 13:44:28 +0100
References: <200309301259.VAA01304@etlken.m17n.org>
To: Kenichi Handa
Cc: emacs-devel@gnu.org, miles@gnu.org
User-Agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.2 (gnu/linux)

Kenichi Handa writes:

> So, #xFF?? are excluded from ucs-unicode-to-mule-cjk, thus
> they are not translated to japanese-jisx0208 on decoding.
> If you have an ISO10646-1 font that contains full width
> glyphs for those characters, you can see correct glyphs.

Or you can display them with a jisx font, for instance.

> I think the reason why they are excluded from the
> translation is that they are representable by the charset
> mule-unicode-e000-ffff, thus there's no need of translation.

That was part of the reason for it -- the hash-based translation code
is only relevant because we more-or-less used up the code space for
the BMP. I also chose the boundaries to avoid breaking the region
between the mule-unicode and CJK charsets.

> It seems to be a reasonable decision, but considering that
> most users don't have an ISO10646-1 font containing those
> glyphs,

I thought they typically did if they had 10646 fonts at all. Is the
problem that in recent XFree86, for instance, the double-width
characters are in different fonts which have `adstyl' `ja' or `ko'?
As far as I remember, the fontset code doesn't deal with that yet.
(So many special cases, sigh.)

> and that those characters can also be regarded as
> CJK components (only CJK users use them), I think we had
> better not exclude them from the translation.

I'm not really convinced, but I don't feel strongly about it. (If the
extra charsets hadn't been added before mule-unicode, we'd just have
covered the BMP with more mule-unicode ones.)
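A minimal Emacs Lisp sketch of the exclusion Handa describes, for readers following along. It is not the subst-*.el code: the table `my-unicode-to-cjk` is a hypothetical stand-in for `ucs-unicode-to-mule-cjk`, the values are placeholder symbols rather than real CJK characters, and treating the whole #xFF00-#xFFFF block as excluded is an assumption read off Handa's "#xFF??", since the actual condition is not quoted in this message.

```elisp
;; Hypothetical stand-in for `ucs-unicode-to-mule-cjk'.
(defvar my-unicode-to-cjk (make-hash-table :test 'eq)
  "Maps a Unicode code point to a (placeholder) CJK character.")

(defun my-excluded-p (unicode)
  "Non-nil if UNICODE is in the #xFF00-#xFFFF block.
Assumed here to be left to the mule-unicode-e000-ffff charset."
  (and (>= unicode #xff00) (<= unicode #xffff)))

(defun my-register (unicode char)
  "Record UNICODE -> CHAR in the table unless UNICODE is excluded."
  (unless (my-excluded-p unicode)
    (puthash unicode char my-unicode-to-cjk)))

(defun my-decode (unicode)
  "Return the translated char for UNICODE, or the symbol
`mule-unicode' when decoding would fall back to mule-unicode."
  (or (gethash unicode my-unicode-to-cjk) 'mule-unicode))
```

Registering #x3042 and #xFF21 and then decoding both shows the effect: only the first lands in the table, so a fullwidth letter never reaches the CJK charset on decoding.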
> So, I suggest changing the above line (and similar lines in
> the other subst-XXX.el) to:
>
>     (if (>= unicode #x2e80)
>         (puthash unicode char ucs-unicode-to-mule-cjk))
>
> and modifying ccl-decode-mule-utf-8 to check translation also
> for those characters.
>
> Dave, what do you think? Does such a change lead to any
> problems?

As far as I remember, it includes too much, and you end up displaying
some characters double width that probably shouldn't be, but I don't
remember which. How about including the ranges of the double-width
Western characters and the high CJK stuff explicitly? I guess it
doesn't expand the tables greatly.
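A rough sketch of the explicit-ranges alternative, contrasted with the broad `(>= unicode #x2e80)` test. The range list is an assumption: the high entries follow Unicode block boundaries (CJK Compatibility Ideographs, CJK Compatibility Forms, and the fullwidth parts of Halfwidth and Fullwidth Forms), while the first entry only guesses at the lower CJK region; neither message gives exact bounds. All `my-` names are hypothetical.

```elisp
(require 'cl-lib)

;; Hypothetical range list; bounds are assumptions, not from subst-*.el.
(defconst my-cjk-translation-ranges
  '((#x2e80 . #xd7a3)   ; assumed lower CJK region (radicals .. Hangul)
    (#xf900 . #xfaff)   ; CJK Compatibility Ideographs
    (#xfe30 . #xfe4f)   ; CJK Compatibility Forms
    (#xff01 . #xff60)   ; fullwidth ASCII variants and brackets
    (#xffe0 . #xffe6))  ; fullwidth signs (cent, pound, yen, ...)
  "Inclusive (FROM . TO) ranges of code points to translate.")

(defun my-translate-p (unicode)
  "Non-nil if UNICODE falls in one of `my-cjk-translation-ranges'."
  (cl-some (lambda (range)
             (and (>= unicode (car range)) (<= unicode (cdr range))))
           my-cjk-translation-ranges))

(defun my-broad-translate-p (unicode)
  "The broader test from the quoted proposal: everything from #x2e80 up."
  (>= unicode #x2e80))

;; At table-build time one would then write something like:
;;   (if (my-translate-p unicode)
;;       (puthash unicode char ucs-unicode-to-mule-cjk))
```

Under these assumed ranges, the halfwidth forms (#xFF61-#xFFDC) and the private use area starting at #xE000 pass the broad test but not the explicit one; the thread does not say whether those are the over-inclusions meant above.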