From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Possible UTF-8 CJK Regressions in Terminal Emulators
Date: 09 Jun 2004 05:38:30 -0400
Cc: mariano@gnome.org, alexander.winston@comcast.net, d.love@dl.ac.uk, emacs-devel@gnu.org, danilo@gnome.org, miles@gnu.org
To: Kenichi Handa
In-Reply-To: <200406090737.QAA11090@etlken.m17n.org>

> As surrogate pairs were not handled well by the UTF-16 converter,
> I've just fixed that too (not yet installed; I'm now adding
> comments to the code). Untranslatable characters are decoded
> into UTF-8 form, represented by a sequence of
> eight-bit-graphic/control characters (the same way as UTF-8
> decoding, thus we can use utf-8-post-read-conversion). The
> UTF-16 encoder encodes such a sequence back to the original
> UTF-16 form. So now the UTF-16 support is at the same
> level as UTF-8.
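A minimal Python sketch of the decoding scheme described in the quote, to make the mechanism concrete. It is not Handa's code and rests on stated assumptions: decoded text is modeled as a list of ("char", code point) and ("raw", byte) items, with "raw" standing in for eight-bit-graphic/control characters; `is_translatable` is a hypothetical stand-in for whatever Emacs can actually represent (here, arbitrarily, the BMP); lone surrogates are treated as untranslatable and a trailing odd byte is kept as a single raw byte.

```python
# Hypothetical model, not Emacs internals: decoded text is a list of
# ("char", code_point) items and ("raw", byte) items, where "raw" stands
# in for Emacs's eight-bit-graphic/eight-bit-control characters.


def is_translatable(cp: int) -> bool:
    # Assumption for illustration: pretend the editor can represent every
    # BMP code point but nothing above U+FFFF.
    return cp <= 0xFFFF


def decode_utf16be(data: bytes) -> list:
    """Decode big-endian UTF-16, combining surrogate pairs."""
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data) - 1, 2)]
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # High surrogate followed by a low surrogate: one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            cp = u
            i += 1
        if is_translatable(cp) and not 0xD800 <= cp <= 0xDFFF:
            out.append(("char", cp))
        else:
            # Untranslatable (or a lone surrogate): keep its UTF-8 form as
            # a run of raw bytes, mirroring the UTF-8 decoder per the quote.
            utf8 = chr(cp).encode("utf-8", "surrogatepass")
            out.extend(("raw", b) for b in utf8)
    if len(data) % 2:
        # A trailing odd byte cannot form a code unit; keep it raw.
        out.append(("raw", data[-1]))
    return out
```

For example, `decode_utf16be(b"\x00A\xd8\x3d\xde\x00")` yields `("char", 0x41)` followed by the four raw bytes F0 9F 98 80, the UTF-8 form of U+1F600.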
Does that mean that some sequences of eight-bit-graphic/control characters
are not encoded back into the corresponding raw bytes? If so, that makes
me a bit uneasy: those special characters were introduced specifically to
handle things like binary input or invalid byte sequences, and to make sure
that we at least preserve the raw bytes in those cases.

        Stefan
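The concern can be illustrated with a companion Python sketch. It assumes, hypothetically, that the encoder re-encodes any run of eight-bit characters forming valid UTF-8 as UTF-16 and writes every other byte out unchanged; the ("char", ...)/("raw", ...) model and `encode_utf16be` are illustrative names, not Emacs functions. Under that assumption, four raw bytes that came from binary input but happen to spell valid UTF-8 come out as a UTF-16 surrogate pair instead of the original bytes.

```python
# Hypothetical model: ("char", code_point) or ("raw", byte), where "raw"
# stands in for an eight-bit-graphic/control character.


def encode_utf16be(items: list) -> bytes:
    """Encode to big-endian UTF-16, assuming (hypothetically) that any run
    of raw bytes forming valid UTF-8 is turned back into UTF-16."""
    out = bytearray()
    i = 0
    while i < len(items):
        kind, value = items[i]
        if kind == "char":
            out += chr(value).encode("utf-16-be")
            i += 1
            continue
        # Gather a maximal run of consecutive raw bytes.
        j = i
        while j < len(items) and items[j][0] == "raw":
            j += 1
        run = bytes(v for _, v in items[i:j])
        # "surrogateescape" maps each byte that is not part of valid UTF-8
        # to U+DC80..U+DCFF, so those bytes can be written back unchanged.
        for ch in run.decode("utf-8", "surrogateescape"):
            if 0xDC80 <= ord(ch) <= 0xDCFF:
                out.append(ord(ch) - 0xDC00)
            else:
                out += ch.encode("utf-16-be")
        i = j
    return bytes(out)


# Four bytes from binary input that happen to be valid UTF-8 (U+1F600).
binary = [("raw", b) for b in b"\xf0\x9f\x98\x80"]
print(encode_utf16be(binary).hex())  # d83dde00, not f09f9880
```

Bytes that are not valid UTF-8, such as FF 80, still round-trip unchanged; only raw runs that look like valid UTF-8 lose their original form, which is the case the question above is about.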