From: Kenichi Handa
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; describe-char gives wrong information
Date: Tue, 08 Jan 2008 14:55:23 +0900
To: Peter Dyballa
Cc: emacs-pretest-bug@gnu.org, handa@ni.aist.go.jp
In-reply-to: (message from Peter Dyballa on Mon, 31 Dec 2007 14:16:04 +0100)

Sorry for the late response.

Peter Dyballa writes:

> Hello!

> When inquiring information for Ὀ (i.e. a capital Omicron and a
> psili), maybe not correctly "composed" coming from a XeTeX document,
> GNU Emacs 23.0.60 tells me:

>   character: Ο (927, #o1637, #x39f)
>   preferred charset: gb18030 (GB18030)
>   code point: 0xA6AF
>   syntax: w which means: word
>   category: G:Greek characters of 2-byte character sets
>             c:Chinese g:Greek h:Korean
>             j:Japanese
>   buffer code: #xCE #x9F
>   file code: #xCE #x9F (encoded by coding system utf-8-unix)
>   display: composed to form "Ὀ" (see below)
>
> Composed with the following character(s) "̓ " by the rule:
>   (?Ο (tc . bc) ?̓ )
> The component character(s) are displayed by these fonts (glyph codes):
>   Ο : -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-80-ISO8859-7 (#xCF)
>   ̓ : -monotype-arial unicode ms-medium-r-normal--13-127-74-74-p-129-gb18030.2000-0 (#xBE35)
> See the variable `reference-point-alist' for the meaning of the rule.
>
> Character code properties are not shown: customize what to show
>
> There are text properties here:
>   auto-composed t
>   composition [Show]
>   fontified t

> Character U+039F can't hardly belong to a Chinese encoding. It's a
> Greek character, taken off an ISO 8859-7 font.

Actually, many CJK charsets contain Greek letters.  Since you are in
the de_DE locale, the order of iso-8859-7 and gb18030 in the charset
list is arbitrary.  Try `C-x C-m l greek RET' and then `C-u C-x =';
iso-8859-7 should then be preferred.

> Its psili modifier or
> COMBINING COMMA ABOVE is at U+0313, outside any Chinese encoding, too
> (although GB18030-2000 defines both as 0xA6AF and as 0x8130BE35).
> Isn't Unicode, as in the name "Unicode Emacs," more
> appropriate?

For the moment, I don't have a good idea about how to order
character sets that are outside the user's locale.  Perhaps, if the
character doesn't belong to any of the charsets returned by:

  (get-language-info current-language-environment 'charset)

the "preferred charset" line should not be shown.

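The check suggested above could be sketched roughly as follows.  This
is only an illustration, not the code installed in Emacs:
`my-char-in-language-charsets-p' is a hypothetical helper name, and
the sketch assumes that `encode-char' returns nil when a charset has
no code point for the character.

```elisp
;; Hypothetical helper (not part of Emacs): return the first charset of
;; the current language environment that can encode CHAR, or nil.
(defun my-char-in-language-charsets-p (char)
  "Return a charset of the current language environment containing CHAR.
Return nil if none of those charsets contains CHAR."
  (let ((charsets (get-language-info current-language-environment
                                     'charset))
        (found nil))
    ;; Assumption: `encode-char' returns nil when the charset has no
    ;; code point for CHAR.
    (while (and charsets (not found))
      (when (encode-char char (car charsets))
        (setq found (car charsets)))
      (setq charsets (cdr charsets)))
    found))

;; Usage: test U+039F GREEK CAPITAL LETTER OMICRON against the
;; charsets of the current language environment.
(my-char-in-language-charsets-p #x39f)
```

With such a predicate, describe-char could omit the "preferred
charset" line whenever the result is nil.
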
> And then there is no sense in using a non-existing character from an
> inappropriate font when the default font, Lucida Sans Typewriter, has
> this character COMBINING COMMA ABOVE. And this font also has GREEK
> CAPITAL LETTER OMICRON at U+039F.

Oops, Emacs assumed that GB18030 fonts contain all GB18030
characters.  I've just installed a fix that checks the contents of a
GB18030 font before using it.

By the way, in emacs-unicode-2, the default fontset is not yet tuned
well for Unicode.  For instance, for Latin, currently only these
fonts are registered:

  "ISO8859-1" "ISO8859-2" "ISO8859-3" "ISO8859-4" "ISO8859-9"
  "ISO8859-10" "ISO8859-13" "ISO8859-14" "ISO8859-15" "VISCII1.1-1"

As none of them has U+0313, Emacs tries these fallback fonts:

  "gb2312.1980" "gbk-0" "gb18030" "jisx0208" "ksc5601.1987"
  "CNS11643.1992-1" "CNS11643.1992-2" "CNS11643.1992-3"
  "CNS11643.1992-4" "CNS11643.1992-5" "CNS11643.1992-6"
  "CNS11643.1992-7" "big5" "jisx0213.2000-1" "jisx0213.2004-1"
  "jisx0212" "ISO10646-1"

I agree that this doesn't yield a good result, but the strategy of
"use the default font if it contains the character" is also not good
for non-Latin users.  Perhaps whether to use the default font should
depend on the language environment.

> Similarly GNU Emacs 23.0.60 handles Ὀ (i.e. one letter Omicron with
> psili):

>   character: Ὀ (8008, #o17510, #x1f48)
>   preferred charset: gb18030 (GB18030)
>   code point: 0x81369132
>   syntax: w which means: word
>   category: g:Greek
>   buffer code: #xE1 #xBD #x88
>   file code: #xE1 #xBD #x88 (encoded by coding system utf-8-unix)
>   display: by this font (glyph code)
>     -monotype-arial unicode ms-medium-r-normal--10-98-74-74-p-99-gb18030.2000-0 (#x9132)
[...]
> And although it claims taking GREEK CAPITAL LETTER OMICRON WITH PSILI
> at U+1F48 off Arial Unicode MS, which has this glyph, it uses an open
> box to display it. Because U+1F48 is not defined in GB18030? The byte
> sequence (code point) 0x81369132 is not defined in GB18030-2000.

If that font doesn't contain that character, then with the above
change it won't be used.

---
Kenichi Handa
handa@ni.aist.go.jp