From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: How to get the script name symbols of a specific character?
Date: Mon, 11 Feb 2013 22:08:56 +0200
Message-ID: <837gme62av.fsf@gnu.org>
References: <5117C3FC.5020608@gmail.com> <87bobr13v8.fsf@gmail.com> <51190927.7070807@gmail.com> <878v6ur5cn.fsf@gmail.com>
In-reply-to: <878v6ur5cn.fsf@gmail.com>
To: help-gnu-emacs@gnu.org

> From: Jambunathan K
> Date: Tue, 12 Feb 2013 01:27:28 +0530
> Cc: help-gnu-emacs@gnu.org
>
> YE Qianchuan writes:
>
> > On 02/11/2013 07:34 PM, Jambunathan K wrote:
> >> Put your cursor on the box and type
> >> C-u C-x =
> > In fact, it's the same as `describe-char'. This command invokes
> > `what-cursor-position', which invokes `describe-char' eventually.
> >>
> >> It will give more useful pointers. The codepoint of a particular
> >> character. The name of the character, in the example below is prefixed
> >> by the script it comes from etc.
> > Cool, I didn't notice its name may be prefixed by its script. It does
> > make a lot of sense.
> >
> > However sadly, not all characters do so. For example, a CJK character
> > has prefix CJK.
> > But cjk is not a script name (though there's a script called cjk-misc)
> > and it should belong
> > to `han'.
> >
> > What's worse is, some characters don't show their names at all, even
> > if I assign a font to it.
> >
> > For example:
> > position: 806 of 1031 (78%), column: 1
> > character: 😀 (displayed as 😀) (codepoint 128512, #o373000,
> > #x1f600)
> > preferred charset: unicode (Unicode (ISO10646))
> > code point in charset: 0x1F600
> > syntax: w which means: word
> > category: L:Left-to-right (strong)
> > buffer code: #xF0 #x9F #x98 #x80
> > file code: #xF0 #x9F #x98 #x80 (encoded by coding system
> > utf-8-unix)
> > display: no font available
> >
> > Character code properties: customize what to show
> > general-category: Cn (Other, Not Assigned)
> > decomposition: (128512) ('😀')
>
> This is what I get. Emacs reports that it is a GRINNING FACE.
>
> I run Emacs from trunk though. I am not sure this makes any actual
> difference.

The names come from the Unicode character database (UCD) that is
processed into a bunch of Emacs Lisp files and then preloaded into
Emacs. The version of the Unicode database built into Emacs
determines which codepoints have names and which don't.

> I think it would be useful to have one browse different Unicode Blocks
> or have C-u C-x = report the block name of a character.

If that data is not in the UCD, Emacs cannot know it, unless someone
adds it to Emacs.
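
To see the same dependence on the database version outside Emacs, here
is a minimal sketch in Python, whose standard `unicodedata` module also
ships its own bundled copy of the UCD. This is only an analogy, not how
Emacs stores or looks up the data, and the helper name `describe` is
made up for illustration.

```python
import unicodedata


def describe(cp):
    """Return (code point label, character name or None, general category)."""
    ch = chr(cp)
    # name() raises ValueError for characters without a name unless a
    # default is supplied; None stands for "no name in this UCD version".
    return (f"U+{cp:04X}", unicodedata.name(ch, None), unicodedata.category(ch))


if __name__ == "__main__":
    # The UCD version compiled into this Python build.
    print("UCD version:", unicodedata.unidata_version)
    # A Latin letter, a CJK ideograph, and the character from the report.
    for cp in (0x0041, 0x4E2D, 0x1F600):
        print(describe(cp))
```

A database that predates the assignment of U+1F600 would yield no name
and general category Cn for it, which is what the `describe-char'
output quoted above shows.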