From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: Problem with non-bmp unicode
Date: Sun, 12 Nov 2006 11:32:58 +0900
References: <200611101127.47456.jerome@marant.org>
To: Jérôme Marant
Cc: emacs-devel@gnu.org

In article <200611101127.47456.jerome@marant.org>, Jérôme Marant writes:

> Do you have any clue about this?

Sorry for the late response on this thread.

> Subject: Problem with non-bmp unicode
> Date: Wednesday 08 November 2006 09:26
[...]
> An UTF-8 file (attached) with these three characters:
> U+0022 U+00010380 U+0022
> shows with "emacs -nw":
> "\360\220\216\200"
> which is not usable at all. The file displays correctly if I cat it.
> I tried a bunch of other characters outside the BMP, all of which
> fail in the same way. Characters in the BMP work nicely.

Emacs 22 still doesn't support Unicode characters outside the BMP.
If you really need to handle them, please use the CVS branch
emacs-unicode-2.

> Apparently, emacs 22 shows a question mark instead of "\360\220\216\200"
> but trying to delete the question mark character with backspace turn it into
> "\360\220\216".

This behavior is described in a comment in utf-8.el:

;; We compose the untranslatable sequences into a single character,
;; and move point to the next character.
;; This is infelicitous for editing, because there's currently no
;; mechanism for treating compositions as atomic, but is OK for
;; display.  They are composed to U+FFFD with help-echo which
;; indicates the unicodes they represent.  This function GCs too much.

I tried to fix this editing problem by using the modification-hooks
text property, but couldn't get a good result.

---
Kenichi Handa
handa@m17n.org
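
For reference, the octal escapes quoted in the report are the four UTF-8 bytes of U+10380, and the three escapes left after pressing backspace are an incomplete UTF-8 sequence. The following is a minimal Python sketch of that arithmetic, not anything Emacs itself runs; the variable names are made up for illustration.

```python
# Illustrative only: decode the bytes that "emacs -nw" displayed as
# "\360\220\216\200" and check where the character falls.
raw = bytes([0o360, 0o220, 0o216, 0o200])  # F0 90 8E 80 in hex

ch = raw.decode("utf-8")      # one character, encoded in four bytes
code_point = ord(ch)
is_bmp = code_point <= 0xFFFF  # the BMP ends at U+FFFF

# After backspace the report shows "\360\220\216": the last byte is gone,
# so what remains is no longer a complete UTF-8 sequence.
truncated = raw[:3]
try:
    truncated.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

print(f"U+{code_point:08X}", is_bmp, truncated_ok)
# prints: U+00010380 False False
```

So the file itself is well formed; the character simply lies outside the BMP, which is the range Emacs 22 does not handle.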