From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@ni.aist.go.jp>
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; describe-char gives wrong information
Date: Tue, 08 Jan 2008 14:55:23 +0900
Message-ID: <E1JC7Qp-0006hS-J9@etlken.m17n.org>
References: <C2305CE0-0122-48D7-8B3A-96C3D223F6E5@Freenet.DE>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1199771760 32230 80.91.229.12 (8 Jan 2008 05:56:00 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 8 Jan 2008 05:56:00 +0000 (UTC)
Cc: emacs-pretest-bug@gnu.org, handa@ni.aist.go.jp
To: Peter Dyballa <Peter_Dyballa@Freenet.DE>
In-reply-to: <C2305CE0-0122-48D7-8B3A-96C3D223F6E5@Freenet.DE> (message from
	Peter Dyballa on Mon, 31 Dec 2007 14:16:04 +0100)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:86553 gmane.emacs.pretest.bugs:20546
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/86553>

Sorry for the late response.

In article <C2305CE0-0122-48D7-8B3A-96C3D223F6E5@Freenet.DE>,
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:

> Hello!
> When inquiring information for Ὀ (i.e. a capital Omicron and a
> psili), maybe not correctly "composed" coming from a XeTeX document,
> GNU Emacs 23.0.60 tells me:

> 	        character: Ο  (927, #o1637, #x39f)
> 	preferred charset: gb18030 (GB18030)
> 	       code point: 0xA6AF
> 	           syntax: w 	which means: word
> 	         category: G:Greek characters of 2-byte character sets c:Chinese g:Greek h:Korean
> 			   j:Japanese
> 	      buffer code: #xCE #x9F
> 	        file code: #xCE #x9F (encoded by coding system utf-8-unix)
> 	          display: composed to form "Ὀ" (see below)

> 	Composed with the following character(s) "̓ " by the rule:
> 		(?Ο  (tc . bc) ?̓ )
> 	The component character(s) are displayed by these fonts (glyph codes):
> 	 Ο : -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-80-ISO8859-7 (#xCF)
> 	 ̓ : -monotype-arial unicode ms-medium-r-normal--13-127-74-74-p-129-gb18030.2000-0 (#xBE35)
> 	See the variable `reference-point-alist' for the meaning of the rule.

> 	Character code properties are not shown: customize what to show

> 	There are text properties here:
> 	  auto-composed        t
> 	  composition          [Show]
> 	  fontified            t

> Character U+039F can't hardly belong to a Chinese encoding. It's a
> Greek character, taken off an ISO 8859-7 font.

Actually, many CJK charsets contain Greek letters.  As you
are in the de_DE locale, the order of iso-8859-7 and gb18030
in the charset list is arbitrary.  Try C-x C-m l greek RET
and then C-u C-x =; iso-8859-7 should then be preferred.
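
The overlap is easy to see outside Emacs as well.  The sketch
below uses Python's standard codecs as rough stand-ins for the
Emacs charsets; the codec names and the helper
charsets_containing are chosen for illustration only and are
not anything Emacs provides.

```python
# Python codec names, used as rough stand-ins for Emacs charsets (assumption).
CANDIDATES = ["iso8859_7", "gb2312", "gb18030", "euc_jp", "euc_kr", "big5"]


def charsets_containing(ch, codecs=CANDIDATES):
    """Return the codecs from `codecs` that can encode the character `ch`."""
    found = []
    for name in codecs:
        try:
            ch.encode(name)
        except UnicodeEncodeError:
            continue  # this charset has no code point for `ch`
        found.append(name)
    return found


omicron = "\u039f"  # GREEK CAPITAL LETTER OMICRON
psili = "\u0313"    # COMBINING COMMA ABOVE

print(charsets_containing(omicron))
print(charsets_containing(psili))
# The code points reported by describe-char and the ISO 8859-7 glyph code:
print(omicron.encode("gb18030").hex(), omicron.encode("iso8859_7").hex())
```

Since U+039F belongs to several of these charsets at once,
something has to decide which one is reported first, and that
is what the language environment does.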

> Its psili modifier or
> COMBINING COMMA ABOVE is at U+0313, outside any Chinese encoding, too
> (although GB18030-2000 defines both as 0xA6AF and as 0x8130BE35).
> Isn't Unicode, as in the name "Unicode Emacs," more
> appropriate?

For the moment, I don't have a good idea about how to order
character sets that are outside of the user's locale.  Perhaps,
if the character doesn't belong to any of:
 (get-language-info current-language-environment 'charset)
the "preferred charset" line should not be shown.
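
To make the proposed rule concrete, here is a minimal sketch in
Python, not the Emacs implementation.  Charsets are modelled by
Python codec names, and the functions in_charset and
preferred_charset as well as the example priority lists are
hypothetical.

```python
def in_charset(ch, codec):
    """True if the Python codec `codec` can encode the character `ch`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def preferred_charset(ch, priority, env_charsets):
    """Return the first charset in `priority` that contains `ch`.

    Return None (meaning: do not show a "preferred charset" line) when
    `ch` belongs to none of the language environment's charsets.
    """
    if not any(in_charset(ch, cs) for cs in env_charsets):
        return None
    for cs in priority:
        if in_charset(ch, cs):
            return cs
    return None


omicron = "\u039f"
# German-like environment (assumed to list only Latin-1): nothing is shown.
print(preferred_charset(omicron, ["gb18030", "iso8859_7"], ["latin_1"]))
# Greek environment: iso-8859-7 is in the environment and leads the priority list.
print(preferred_charset(omicron, ["iso8859_7", "gb18030"], ["iso8859_7"]))
```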

> And then there is no sense in using a non-existing character from an
> inappropriate font when the default font, Lucida Sans Typewriter, has
> this character COMBINING COMMA ABOVE. And this font also has GREEK
> CAPITAL LETTER OMICRON at U+039F.

Oops, Emacs assumed that GB18030 fonts contain all GB18030
characters.  I've just installed a fix to check the contents
of a GB18030 font before using it.
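
The difference between the old and the new behaviour can be
sketched as follows.  This is a toy model in Python, not the
actual font code: the Font class, the can_display function, and
the sample font with its glyph set are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    name: str
    registry: str                             # e.g. "gb18030.2000-0"
    glyphs: set = field(default_factory=set)  # code points the font really has


def can_display(font, ch, check_contents=True):
    """Decide whether `font` may be used to display the character `ch`."""
    # GB18030 assigns a code to every Unicode character, so the registry
    # alone says nothing about whether a particular glyph exists.
    claims_charset = font.registry.startswith("gb18030")
    if not check_contents:
        return claims_charset  # old behaviour: trust the registry
    return claims_charset and ord(ch) in font.glyphs  # look inside the font


# Hypothetical font that has Omicron but no COMBINING COMMA ABOVE.
font = Font("some-gb18030-font", "gb18030.2000-0", {0x039F})

print(can_display(font, "\u0313", check_contents=False))
print(can_display(font, "\u0313", check_contents=True))
```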

By the way, in emacs-unicode-2, the default fontset is not
yet tuned well for Unicode.  For instance, for Latin,
currently only these fonts are registered:

"ISO8859-1" "ISO8859-2" "ISO8859-3" "ISO8859-4" "ISO8859-9"
"ISO8859-10" "ISO8859-13" "ISO8859-14" "ISO8859-15"
"VISCII1.1-1"

As none of them have U+0313, Emacs tries these fallback fonts:

"gb2312.1980" "gbk-0" "gb18030" "jisx0208" "ksc5601.1987"
"CNS11643.1992-1" "CNS11643.1992-2" "CNS11643.1992-3"
"CNS11643.1992-4" "CNS11643.1992-5" "CNS11643.1992-6"
"CNS11643.1992-7" "big5" "jisx0213.2000-1" "jisx0213.2004-1"
"jisx0212" "ISO10646-1"

I agree that this doesn't yield a good result, but the
strategy of "use the default font if it contains the
character" is also not good for non-Latin users.  Perhaps
whether to use the default font should depend on the
language environment.
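
As a rough illustration of the lookup order described above,
here is a small Python model.  The two registry lists are the
ones quoted in this message; everything else (the pick_font
function, the default_first switch, and the sample coverage
table) is hypothetical and much simpler than the real fontset
code.

```python
LATIN = ["ISO8859-1", "ISO8859-2", "ISO8859-3", "ISO8859-4", "ISO8859-9",
         "ISO8859-10", "ISO8859-13", "ISO8859-14", "ISO8859-15",
         "VISCII1.1-1"]

FALLBACK = ["gb2312.1980", "gbk-0", "gb18030", "jisx0208", "ksc5601.1987",
            "CNS11643.1992-1", "CNS11643.1992-2", "CNS11643.1992-3",
            "CNS11643.1992-4", "CNS11643.1992-5", "CNS11643.1992-6",
            "CNS11643.1992-7", "big5", "jisx0213.2000-1", "jisx0213.2004-1",
            "jisx0212", "ISO10646-1"]


def pick_font(ch, coverage, default_first=False):
    """Return the first registry whose font covers `ch`, or None.

    `coverage` maps a registry name (or the key "default" for the frame's
    default font) to the set of code points an installed font provides.
    `default_first` stands for a choice that could be derived from the
    language environment.
    """
    order = LATIN + FALLBACK
    if default_first:
        order = ["default"] + order
    for registry in order:
        if ord(ch) in coverage.get(registry, set()):
            return registry
    return None


# Hypothetical installation: a Latin-1 font, a GB18030 font, and a
# default font that happens to contain U+0313.
coverage = {
    "ISO8859-1": set(range(0x20, 0x100)),
    "gb18030": {0x0313, 0x039F},
    "ISO10646-1": {0x0313},
    "default": {0x0313, 0x039F},
}

print(pick_font("\u0313", coverage))                      # current order
print(pick_font("\u0313", coverage, default_first=True))  # default font first
```

In this model the GB18030 font wins under the current order only
because it precedes ISO10646-1 in the fallback list.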

> Similarly GNU Emacs 23.0.60 handles Ὀ  (i.e. one letter Omicron with
> psili):

> 	        character: Ὀ  (8008, #o17510, #x1f48)
> 	preferred charset: gb18030 (GB18030)
> 	       code point: 0x81369132
> 	           syntax: w 	which means: word
> 	         category: g:Greek
> 	      buffer code: #xE1 #xBD #x88
> 	        file code: #xE1 #xBD #x88 (encoded by coding system utf-8-unix)
> 	          display: by this font (glyph code)
> 	     -monotype-arial unicode ms-medium-r-normal--10-98-74-74-p-99-gb18030.2000-0 (#x9132)
[...]
> And although it claims taking GREEK CAPITAL LETTER OMICRON WITH PSILI
> at U+1F48 off Arial Unicode MS, which has this glyph, it uses an open
> box to display it. Because U+1F48 is not defined in GB18030? The byte
> sequence (code point) 0x81369132 is not defined in GB18030-2000.

If that font doesn't contain that character, with the above
change, that font won't be used.
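
As a cross-check of the code points quoted in this thread, the
following Python sketch decodes them with Python's gb18030
codec.  Treating that codec as a fair proxy for the GB18030
mapping Emacs uses is an assumption, and the helper
gb18030_code_to_char is named here only for illustration.

```python
def gb18030_code_to_char(code):
    """Decode a GB18030 code point written as one integer, e.g. 0xA6AF."""
    nbytes = (code.bit_length() + 7) // 8  # 2 or 4 bytes for the codes below
    return code.to_bytes(nbytes, "big").decode("gb18030")


for code in (0xA6AF, 0x8130BE35, 0x81369132):
    ch = gb18030_code_to_char(code)
    print(f"0x{code:X} -> U+{ord(ch):04X}")
```

If the proxy holds, 0x81369132 does correspond to U+1F48, which
would make the open box a matter of what the font contains
rather than of the charset.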

---
Kenichi Handa
handa@ni.aist.go.jp