From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Possible UTF-8 CJK Regressions in Terminal Emulators
Date: Wed, 9 Jun 2004 16:37:12 +0900 (JST)
Message-ID: <200406090737.QAA11090@etlken.m17n.org>
References: <1077643915.12919.2.camel@duende>
	<1077682436.28482.9.camel@duende>
	<200403010815.RAA14365@etlken.m17n.org>
	<200404071230.VAA25159@etlken.m17n.org>
	<200404091128.UAA02120@etlken.m17n.org>
	<200406071227.VAA06216@etlken.m17n.org>
	<20040607123615.GA29450@fencepost>
	<200406071300.WAA06332@etlken.m17n.org>
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Cc: mariano@gnome.org, alexander.winston@comcast.net, emacs-devel@gnu.org,
	danilo@gnome.org, monnier@iro.umontreal.ca, miles@gnu.org
Original-To: d.love@dl.ac.uk
In-reply-to: (message from Dave Love on Tue, 08 Jun 2004 19:02:07 +0100)
List-Id: "Emacs development discussions."

In article , Dave Love writes:

> Kenichi Handa writes:
> >> Absolutely!  Then we can say "utf-8 is (almost) completely
> >> supported"...  I think this is a very important thing.
> >
> > I think "completely" is still too strong even with preceding
> > "(almost)".

> I know what you mean, but I think that's the sort of thing that
> encourages the established user confusion over encoding issues.
> UTF-8 per se is fully supported up to some limit on the code point.
> (I hope that's as large as the Emacs 22 maximum codepoint, but I don't
> remember.)

No, the current UTF-8 support is limited to U+10FFFF (the
maximum Unicode code point).

> Whether or not valid unicodes can be decoded into a
> character Emacs can actually encode/display/input properly is a
> different matter,

Ah, yes.  In that sense, we can say utf-8 encoding/decoding
is completely supported.

> and the feature should affect all relevant CCL
> coding systems, especially UTF-16.

As surrogate pairs were not handled well by the UTF-16
converter, I've just fixed that too (not yet installed; I'm
now adding comments to the code).  Untranslatable characters
are decoded into their UTF-8 form, represented by a sequence
of eight-bit-graphic/control characters (the same way as in
UTF-8 decoding, so we can use utf-8-post-read-conversion).
The UTF-16 encoder encodes such a sequence back to the
original UTF-16 form.

So, now the UTF-16 support is at the same level as UTF-8.

---
Ken'ichi HANDA
handa@m17n.org
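
For readers unfamiliar with the mechanism discussed above, here is a
minimal Python sketch of it.  It is not the Emacs CCL code, and every
function name in it is hypothetical.  It only illustrates how a UTF-16
surrogate pair maps to a code point no larger than U+10FFFF, and how
such a character can be held as raw UTF-8 bytes and later converted
back to the original UTF-16 form.

```python
# Illustrative sketch only; this is NOT the Emacs CCL implementation.
# All function names below are hypothetical.

MAX_CODE_POINT = 0x10FFFF  # the upper limit mentioned in the message


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %#x" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %#x" % low)
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_into_surrogates(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= MAX_CODE_POINT:
        raise ValueError("outside the surrogate-pair range: %#x" % code_point)
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def pair_to_utf8_bytes(high: int, low: int) -> bytes:
    """Hold the character as raw UTF-8 bytes (loosely analogous to the
    eight-bit character sequence described in the message)."""
    return chr(combine_surrogates(high, low)).encode("utf-8")


def utf8_bytes_to_pair(raw: bytes) -> tuple:
    """Convert the raw UTF-8 bytes back to the original surrogate pair."""
    return split_into_surrogates(ord(raw.decode("utf-8")))
```

For example, the pair (0xD83D, 0xDE00) combines to U+1F600, whose
UTF-8 form is the four bytes F0 9F 98 80; converting those bytes back
yields the same pair, which is the round-trip property the message
describes for the UTF-16 encoder.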