How to get the script name symbols of a specific character?

unofficial mirror of help-gnu-emacs@gnu.org

* How to get the script name symbols of a specific character?
@ 2013-02-10 15:59 YE Qianchuan
  2013-02-11  2:55 ` Jambunathan K
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: YE Qianchuan @ 2013-02-10 15:59 UTC
  To: help-gnu-emacs

Hi, all.

According to the documentation of `set-fontset-font', its argument TARGET
can be a charset or a script name symbol.  But I have failed to find any
documentation about script name symbols.  The only things I found that
seem relevant are the variables `charset-script-alist',
`script-representative-chars' and `char-script-table'.  However, none of
them tells me the details of those scripts; I can only guess from their
names.

My case is this: a set of Unicode characters is displayed as hex boxes,
and I want to assign a proper font to display them.  Setting TARGET to
`unicode' is not a good idea IMHO.  I would rather use their script name
symbol as TARGET, like `han' for CJK characters.

In practice, by calling `describe-char', I can see which charset a
character belongs to, so I can pass that charset as TARGET to change its
font.  However, I can't find a way to get a character's script name
symbol.

How can I achieve this?  Am I missing something?
Thanks for your help.


* Re: How to get the script name symbols of a specific character?
  2013-02-10 15:59 How to get the script name symbols of a specific character? YE Qianchuan
@ 2013-02-11  2:55 ` Jambunathan K
  2013-02-11 10:48   ` YE Qianchuan
  2013-02-11 11:34 ` Jambunathan K
  2013-02-12 15:12 ` YE Qianchuan
  2 siblings, 1 reply; 15+ messages in thread
From: Jambunathan K @ 2013-02-11  2:55 UTC
  To: YE Qianchuan; +Cc: help-gnu-emacs

YE Qianchuan <stool.ye@gmail.com> writes:

> Hi, all.
>
> According to the document of `set-fontset-font', its argument TARGET can
> be a charset or a script name symbol.  But I failed to find any
> documents about script name symbols. What I found that seem relevant
> are variables `charset-script-alist', `script-representative-chars'
> and `char-script-table'.  However none of them tells me the details of
> those scripts, I can only guess by their names.

These two suggestions are from a Stack Overflow thread
(http://stackoverflow.com/questions/7176276/what-is-script-name-symbol-means-for-emacs-set-fontset-font-function)

        M-: (char-table-extra-slot char-script-table 0)
        M-x list-character-sets

A long time ago, I was trying to assign a font to Tamil/Indic scripts.  I
was hoping that there would be a command like `describe-scripts' or some
such thing.  I was disappointed.

Maybe there should be one.
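
A rough sketch of what such a command might look like, for illustration
only.  The names `my-script-summary' and `my-describe-scripts' are
hypothetical; nothing like them ships with Emacs.  The sketch assumes only
the extra slot of `char-script-table' shown above and the
`script-representative-chars' variable, which may be unbound in a build
without window-system support.

```elisp
;; Hypothetical helpers; these names are made up for this sketch.
(defun my-script-summary ()
  "Return an alist of (SCRIPT . REPRESENTATIVE-CHARS) for every known script.
REPRESENTATIVE-CHARS is nil when `script-representative-chars' has no
entry for SCRIPT or is not available in this Emacs."
  (let ((reps (bound-and-true-p script-representative-chars)))
    (mapcar (lambda (script)
              (cons script (cdr (assq script reps))))
            (char-table-extra-slot char-script-table 0))))

(defun my-describe-scripts ()
  "Show every script name symbol with its representative characters."
  (interactive)
  (with-help-window "*Scripts*"
    (dolist (entry (my-script-summary))
      (princ (format "%-36s %S\n" (car entry) (cdr entry))))))
```

The representative characters are printed as codepoints, which at least
hints at which Unicode range a script name refers to.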

>
> My case is, for example, a set of unicode characters are displayed as
> hex boxes. I want to assign a proper font to display them. Specifying
> TARGET to unicode is not a good idea IMHO. I'd better find their
> script name symbol as TARGET, like `Han' for CJK characters.
>
> In practice, by calling `describe-char', I get which charset is
> corresponding to this character. So I can specify it to modify its
> font.  However, I can't find a method to get a character's script name
> symbols.
>
> How can I achieve this? Do I miss something?
> Thanks for your help.

* Re: How to get the script name symbols of a specific character?
  2013-02-11  2:55 ` Jambunathan K
@ 2013-02-11 10:48   ` YE Qianchuan
  2013-02-11 11:00     ` Jambunathan K
  0 siblings, 1 reply; 15+ messages in thread
From: YE Qianchuan @ 2013-02-11 10:48 UTC
  To: Jambunathan K; +Cc: help-gnu-emacs

On 02/11/2013 10:55 AM, Jambunathan K wrote:
> These two suggestions are from stackoverflow thread
> (http://stackoverflow.com/questions/7176276/what-is-script-name-symbol-means-for-emacs-set-fontset-font-function)
>
>          M-: (char-table-extra-slot char-script-table 0)
>          M-x list-character-sets
>
> Long time ago, I was trying to assign font to tamil/indic scripts.  I
> was hoping that there would be a command like `describe-scripts' or some
> such thing.  I was disappointed.
>
> May be there should be one.
>
Thanks.  I had already read that thread.  It says something, but not
enough.  Maybe I should look at the source.




* Re: How to get the script name symbols of a specific character?
  2013-02-11 10:48   ` YE Qianchuan
@ 2013-02-11 11:00     ` Jambunathan K
  2013-02-11 14:50       ` YE Qianchuan
  0 siblings, 1 reply; 15+ messages in thread
From: Jambunathan K @ 2013-02-11 11:00 UTC
  To: YE Qianchuan; +Cc: help-gnu-emacs

YE Qianchuan <stool.ye@gmail.com> writes:

> On 02/11/2013 10:55 AM, Jambunathan K wrote:
>> These two suggestions are from stackoverflow thread
>> (http://stackoverflow.com/questions/7176276/what-is-script-name-symbol-means-for-emacs-set-fontset-font-function)
>>
>>          M-: (char-table-extra-slot char-script-table 0)
>>          M-x list-character-sets
>>
>> Long time ago, I was trying to assign font to tamil/indic scripts.  I
>> was hoping that there would be a command like `describe-scripts' or some
>> such thing.  I was disappointed.
>>
>> May be there should be one.
>>
> Thanks. I had read this thread. It does say something but not
> enough. Maybe I should look at the source.

You were looking for symbols for scripts and I have included a dump of
what the above sexp returns.  I see hangul, hanunoo and han.

As for Tamil script (which I use) I have the following in my .emacs.
"Lohit Tamil" is the font to use for 'tamil script.

,----
| ;; Use the predefined fontset "fontset-default"
| (set-face-font 'default "fontset-default")
| (set-fontset-font "fontset-default" 'tamil "Lohit Tamil")
`----
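
A slightly more defensive variant of the same idea, as a sketch only.  The
helper name `my-set-script-font' is made up here, and the check with
`find-font' is an addition that is not part of the .emacs snippet above.
It assumes a graphical frame; on a text terminal it does nothing.

```elisp
;; Hypothetical helper; the name is made up for this sketch.
(defun my-set-script-font (script family &optional fontset)
  "Use font FAMILY for SCRIPT in FONTSET if the font is installed.
FONTSET defaults to \"fontset-default\".  Return t if the font was set,
nil otherwise (no graphical display, or no such font)."
  (when (and (display-graphic-p)
             (find-font (font-spec :family family)))
    (set-fontset-font (or fontset "fontset-default")
                      script
                      (font-spec :family family))
    t))

;; Same effect as the snippet above, but silently skipped when the
;; font is missing.
(my-set-script-font 'tamil "Lohit Tamil")
```

Checking for the font first keeps the same init file usable on machines
where "Lohit Tamil" is not installed.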

,---- (char-table-extra-slot char-script-table 0)
| (latin phonetic greek coptic cyrillic armenian hebrew arabic
| syriac thaana nko samaritan mandaic devanagari bengali gurmukhi
| gujarati oriya tamil telugu kannada malayalam sinhala thai lao
| tibetan burmese georgian hangul ethiopic cherokee
| canadian-aboriginal ogham runic tagalog hanunoo buhid tagbanwa
| khmer mongolian limbu tai-le tai-lue buginese tai-tham balinese
| sundanese batak lepcha ol-chiki vedic symbol braille glagolitic
| tifinagh han ideographic-description cjk-misc kana bopomofo
| kanbun yi lisu vai bamum syloti-nagri north-indic-number phags-pa
| saurashtra kayah-li rejang javanese cham tai-viet meetei-mayek
| linear-b aegean-number ancient-greek-number ancient-symbol
| phaistos-disc lycian carian olt-italic gothic ugaritic
| old-persian deseret shavian osmanya cypriot-syllabary aramaic
| phoenician lydian meroitic kharoshthi old-south-arabian avestan
| inscriptional-parthian inscriptional-pahlavi old-turkic
| rumi-number brahmi kaithi sora-sompeng chakma sharada takri
| cuneiform cuneiform-numbers-and-punctuation egyptian miao
| byzantine-musical-symbol musical-symbol
| ancient-greek-musical-notation tai-xuan-jing-symbol
| counting-rod-numeral mathematical mahjong-tile domino-tile
| playing-cards)
| 
`----





* Re: How to get the script name symbols of a specific character?
  2013-02-10 15:59 How to get the script name symbols of a specific character? YE Qianchuan
  2013-02-11  2:55 ` Jambunathan K
@ 2013-02-11 11:34 ` Jambunathan K
  2013-02-11 15:07   ` YE Qianchuan
                     ` (2 more replies)
  2013-02-12 15:12 ` YE Qianchuan
  2 siblings, 3 replies; 15+ messages in thread
From: Jambunathan K @ 2013-02-11 11:34 UTC
  To: YE Qianchuan; +Cc: help-gnu-emacs

YE Qianchuan <stool.ye@gmail.com> writes:

> Hi, all.
>
> According to the document of `set-fontset-font', its argument TARGET can
> be a charset or a script name symbol.  But I failed to find any
> documents about script name symbols. What I found that seem relevant
> are variables `charset-script-alist', `script-representative-chars'
> and `char-script-table'.  However none of them tells me the details of
> those scripts, I can only guess by their names.
>
> My case is, for example, a set of unicode characters are displayed as
> hex boxes. I want to assign a proper font to display them. Specifying
> TARGET to unicode is not a good idea IMHO. I'd better find their
> script name symbol as TARGET, like `Han' for CJK characters.
>
> In practice, by calling `describe-char', I get which charset is
> corresponding to this character. So I can specify it to modify its
> font.  However, I can't find a method to get a character's script name
> symbols.

Put your cursor on the box and type 
        C-u C-x =

It will give more useful pointers, such as the codepoint of the
character.  The name of the character, in the example below, is prefixed
by the script it comes from.

,----
|              position: 192 of 196 (97%), column: 0
|             character: ஜ (displayed as ஜ) (codepoint 2972, #o5634, #xb9c)
|     preferred charset: unicode (Unicode (ISO10646))
| code point in charset: 0x0B9C
|                syntax: w 	which means: word
|              category: .:Base, L:Left-to-right (strong)
|              to input: type "ja" with tamil-itrans input method
|           buffer code: #xE0 #xAE #x9C
|             file code: #xE0 #xAE #x9C (encoded by coding system utf-8)
|               display: by this font (glyph code)
|     xft:-unknown-Lohit Tamil-normal-normal-normal-*-24-*-*-*-*-0-iso10646-1 (#x44)
| 
| Character code properties: customize what to show
|   name: TAMIL LETTER JA
|   general-category: Lo (Letter, Other)
|   decomposition: (2972) ('ஜ')
| 
| There are text properties here:
|   fontified            t
`----

Also you may want to look at this page:
        http://en.wikipedia.org/wiki/Unicode_block

>
> How can I achieve this? Do I miss something?
> Thanks for your help.

* Re: How to get the script name symbols of a specific character?
  2013-02-11 11:00     ` Jambunathan K
@ 2013-02-11 14:50       ` YE Qianchuan
  0 siblings, 0 replies; 15+ messages in thread
From: YE Qianchuan @ 2013-02-11 14:50 UTC
  To: Jambunathan K; +Cc: help-gnu-emacs

On 02/11/2013 07:00 PM, Jambunathan K wrote:
> You were looking for symbols for scripts and I have included a dump of
> what the above sexp returns.  I see hangul, hanunoo and han.
Right, and the variables I mentioned also give me these symbols.  However,
my problem is that, even with this list of symbols, I have no idea
which one I should use.  I know `han' is for CJK, and I can guess that
`tamil' is for Tamil.  But some of them are difficult to guess.  For
example, which characters does `cjk-misc' cover?  How is it distinct
from `han'?
Moreover, sometimes characters fail to display, and I don't know which
script I should use.  I will expand on this in my reply to your other
message.
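
One way to see what a script symbol covers is to walk `char-script-table'
and collect the ranges mapped to it.  This is only a sketch:
`my-script-ranges' is a hypothetical name, and the ranges it returns
depend on the Emacs version in use.

```elisp
;; Hypothetical helper; the name is made up for this sketch.
(defun my-script-ranges (script)
  "Return a list of (FROM . TO) codepoint ranges that map to SCRIPT."
  (let (ranges)
    (map-char-table
     (lambda (key value)
       (when (eq value script)
         ;; KEY is a single character or a cons (FROM . TO).  Copy the
         ;; cons, since `map-char-table' may reuse it between calls.
         (push (if (consp key)
                   (cons (car key) (cdr key))
                 (cons key key))
               ranges)))
     char-script-table)
    (nreverse ranges)))
```

Evaluating (my-script-ranges 'cjk-misc) and (my-script-ranges 'han) side
by side should show how the two differ in the running Emacs.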
> As for Tamil script (which I use) I have the following in my .emacs.
> "Lohit Tamil" is the font to use for 'tamil script.
>
> ,----
> | ;; Use the predefined fontset "fontset-default"
> | (set-face-font 'default "fontset-default")
> | (set-fontset-font "fontset-default" 'tamil "Lohit Tamil")
> `----
>
> ,---- (char-table-extra-slot char-script-table 0)
> | (latin phonetic greek coptic cyrillic armenian hebrew arabic
> | syriac thaana nko samaritan mandaic devanagari bengali gurmukhi
> | gujarati oriya tamil telugu kannada malayalam sinhala thai lao
> | tibetan burmese georgian hangul ethiopic cherokee
> | canadian-aboriginal ogham runic tagalog hanunoo buhid tagbanwa
> | khmer mongolian limbu tai-le tai-lue buginese tai-tham balinese
> | sundanese batak lepcha ol-chiki vedic symbol braille glagolitic
> | tifinagh han ideographic-description cjk-misc kana bopomofo
> | kanbun yi lisu vai bamum syloti-nagri north-indic-number phags-pa
> | saurashtra kayah-li rejang javanese cham tai-viet meetei-mayek
> | linear-b aegean-number ancient-greek-number ancient-symbol
> | phaistos-disc lycian carian olt-italic gothic ugaritic
> | old-persian deseret shavian osmanya cypriot-syllabary aramaic
> | phoenician lydian meroitic kharoshthi old-south-arabian avestan
> | inscriptional-parthian inscriptional-pahlavi old-turkic
> | rumi-number brahmi kaithi sora-sompeng chakma sharada takri
> | cuneiform cuneiform-numbers-and-punctuation egyptian miao
> | byzantine-musical-symbol musical-symbol
> | ancient-greek-musical-notation tai-xuan-jing-symbol
> | counting-rod-numeral mathematical mahjong-tile domino-tile
> | playing-cards)
> |
> `----
>





* Re: How to get the script name symbols of a specific character?
  2013-02-11 11:34 ` Jambunathan K
@ 2013-02-11 15:07   ` YE Qianchuan
  2013-02-11 15:17     ` YE Qianchuan
  2013-02-11 19:57     ` Jambunathan K
  2013-02-11 15:57   ` Stefan Monnier
  2013-02-11 20:11   ` T.F. Torrey
  2 siblings, 2 replies; 15+ messages in thread
From: YE Qianchuan @ 2013-02-11 15:07 UTC
  To: Jambunathan K; +Cc: help-gnu-emacs

On 02/11/2013 07:34 PM, Jambunathan K wrote:
> Put your cursor on the box and type
>          C-u C-x =
In fact, this is the same as `describe-char': the key sequence runs
`what-cursor-position', which with a prefix argument eventually calls
`describe-char'.
>
> It will give more useful pointers.  The codepoint of a particular
> character.  The name of the character, in the example below is prefixed
> by the script it comes from etc.
Cool, I didn't notice that its name may be prefixed by its script.  That
makes a lot of sense.

However, sadly, not all characters do so.  For example, the name of a CJK
character has the prefix CJK, but `cjk' is not a script name (though
there is a script called `cjk-misc'); the character should belong to
`han'.

What's worse, some characters don't show their names at all, even if
I assign a font to them.

For example:
             position: 806 of 1031 (78%), column: 1
            character: 😀 (displayed as 😀) (codepoint 128512, #o373000, #x1f600)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F600
               syntax: w     which means: word
             category: L:Left-to-right (strong)
          buffer code: #xF0 #x9F #x98 #x80
            file code: #xF0 #x9F #x98 #x80 (encoded by coding system utf-8-unix)
              display: no font available

Character code properties: customize what to show
  general-category: Cn (Other, Not Assigned)
  decomposition: (128512) ('😀')

> ,----
> |              position: 192 of 196 (97%), column: 0
> |             character: ஜ (displayed as ஜ) (codepoint 2972, #o5634, #xb9c)
> |     preferred charset: unicode (Unicode (ISO10646))
> | code point in charset: 0x0B9C
> |                syntax: w 	which means: word
> |              category: .:Base, L:Left-to-right (strong)
> |              to input: type "ja" with tamil-itrans input method
> |           buffer code: #xE0 #xAE #x9C
> |             file code: #xE0 #xAE #x9C (encoded by coding system utf-8)
> |               display: by this font (glyph code)
> |     xft:-unknown-Lohit Tamil-normal-normal-normal-*-24-*-*-*-*-0-iso10646-1 (#x44)
> |
> | Character code properties: customize what to show
> |   name: TAMIL LETTER JA
> |   general-category: Lo (Letter, Other)
> |   decomposition: (2972) ('ஜ')
> |
> | There are text properties here:
> |   fontified            t
> `----
>
> Also you may want to look at this page:
>          http://en.wikipedia.org/wiki/Unicode_block
>
>> How can I achieve this? Do I miss something?
>> Thanks for your help.
>>
>>
>>





* Re: How to get the script name symbols of a specific character?
  2013-02-11 15:07   ` YE Qianchuan
@ 2013-02-11 15:17     ` YE Qianchuan
  2013-02-11 19:57     ` Jambunathan K
  1 sibling, 0 replies; 15+ messages in thread
From: YE Qianchuan @ 2013-02-11 15:17 UTC
  To: Jambunathan K; +Cc: help-gnu-emacs

On 02/11/2013 11:07 PM, YE Qianchuan wrote:
> On 02/11/2013 07:34 PM, Jambunathan K wrote:
>> Put your cursor on the box and type
>>          C-u C-x =
> In fact, it's the same as `describe-char'. This command invokes
> `what-cursor-position', which invokes `describe-char' eventually.
>>
>> It will give more useful pointers.  The codepoint of a particular
>> character.  The name of the character, in the example below is prefixed
>> by the script it comes from etc.
> Cool, I didn't notice its name may be prefixed by its script. It does 
> make a lot sense.
>
> However sadly, not all characters do so. For example, a CJK character 
> has prefix CJK.
> But cjk is not a script name (though there's a script called cjk-misc) 
> and it should belong
> to `han'.
>
> What's worse is, some characters don't show their names at all, even 
> if I assign a font to it.
>
> For example:
>              position: 806 of 1031 (78%), column: 1
>             character: 😀 (displayed as 😀) (codepoint 128512, 
> #o373000, #x1f600)
>     preferred charset: unicode (Unicode (ISO10646))
> code point in charset: 0x1F600
>                syntax: w     which means: word
>              category: L:Left-to-right (strong)
>           buffer code: #xF0 #x9F #x98 #x80
>             file code: #xF0 #x9F #x98 #x80 (encoded by coding system 
> utf-8-unix)
>               display: no font available
>
> Character code properties: customize what to show
>   general-category: Cn (Other, Not Assigned)
>   decomposition: (128512) ('😀')

An additional example: this character's name doesn't show any connection
to its script.  How would you get its script symbol?

             position: 870 of 1031 (84%), column: 65
            character: 😠 (displayed as 😠) (codepoint 128544, #o373040, #x1f620)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F620
               syntax: w     which means: word
             category: .:Base
          buffer code: #xF0 #x9F #x98 #xA0
            file code: #xF0 #x9F #x98 #xA0 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Symbola-normal-normal-semi-condensed-*-15-*-*-*-*-0-iso10646-1 (#x1ADE)

Character code properties: customize what to show
  name: ANGRY FACE
  general-category: So (Symbol, Other)
  decomposition: (128544) ('😠')





* Re: How to get the script name symbols of a specific character?
  2013-02-11 11:34 ` Jambunathan K
  2013-02-11 15:07   ` YE Qianchuan
@ 2013-02-11 15:57   ` Stefan Monnier
  2013-02-12 15:22     ` YE Qianchuan
  2013-02-11 20:11   ` T.F. Torrey
  2 siblings, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2013-02-11 15:57 UTC
  To: help-gnu-emacs

> Put your cursor on the box and type
>         C-u C-x =
> It will give more useful pointers.  The codepoint of a particular
> character.  The name of the character, in the example below is prefixed
> by the script it comes from etc.

Actually, the "name" field is just the official Unicode name of that
char, which is only indirectly linked to Emacs's (and fonts') notion of
a script, from what I understand.

I suggest you M-x report-emacs-bug requesting a new feature that
displays (in C-x =) the charsets (and/or scripts) that the
current char belongs to.
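
Until such a feature exists, something along these lines could serve as a
stopgap.  It is only a sketch: `my-what-script' is a hypothetical command
name, and it assumes that looking the character up in `char-script-table'
is an acceptable way to get its script.

```elisp
;; Hypothetical command; the name is made up for this sketch.
(defun my-what-script (&optional pos)
  "Echo the script name symbol and charset of the character at POS.
POS defaults to point.  Return the message string."
  (interactive)
  (let ((char (char-after (or pos (point)))))
    (if (null char)
        (message "No character at point")
      (message "U+%04X  script: %s  charset: %s"
               char
               (aref char-script-table char)
               (char-charset char)))))
```

With point on a problem character, M-x my-what-script then shows the
symbol to try as TARGET in `set-fontset-font'.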

        Stefan


* Re: How to get the script name symbols of a specific character?
  2013-02-11 15:07   ` YE Qianchuan
  2013-02-11 15:17     ` YE Qianchuan
@ 2013-02-11 19:57     ` Jambunathan K
  2013-02-11 20:08       ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Jambunathan K @ 2013-02-11 19:57 UTC
  To: YE Qianchuan; +Cc: help-gnu-emacs

YE Qianchuan <stool.ye@gmail.com> writes:

> On 02/11/2013 07:34 PM, Jambunathan K wrote:
>> Put your cursor on the box and type
>>          C-u C-x =
> In fact, it's the same as `describe-char'. This command invokes
> `what-cursor-position', which invokes `describe-char' eventually.
>>
>> It will give more useful pointers.  The codepoint of a particular
>> character.  The name of the character, in the example below is prefixed
>> by the script it comes from etc.
> Cool, I didn't notice its name may be prefixed by its script. It does
> make a lot sense.
>
> However sadly, not all characters do so. For example, a CJK character
> has prefix CJK.
> But cjk is not a script name (though there's a script called cjk-misc)
> and it should belong
> to `han'.
>
> What's worse is, some characters don't show their names at all, even
> if I assign a font to it.
>
> For example:
>              position: 806 of 1031 (78%), column: 1
>             character: 😀 (displayed as 😀) (codepoint 128512, #o373000,
> #x1f600)
>     preferred charset: unicode (Unicode (ISO10646))
> code point in charset: 0x1F600
>                syntax: w     which means: word
>              category: L:Left-to-right (strong)
>           buffer code: #xF0 #x9F #x98 #x80
>             file code: #xF0 #x9F #x98 #x80 (encoded by coding system
> utf-8-unix)
>               display: no font available
>
> Character code properties: customize what to show
>   general-category: Cn (Other, Not Assigned)
>   decomposition: (128512) ('😀')

Here is what I get (output below): Emacs reports that it is a GRINNING
FACE.

I run Emacs from trunk, though; I am not sure whether this makes any
actual difference.

I think it would be useful to be able to browse the different Unicode
blocks, or to have C-u C-x = report the block name of a character.  I am
just going by what the Wikipedia article mentioned below suggests.

,---- http://en.wikipedia.org/wiki/Unicode_block
| U+1F600..U+1F64F 	Emoticons 	80 	1 SMP 	Common
`----

             position: 1706 of 2799 (61%), column: 28
            character: 😀 (displayed as 😀) (codepoint 128512, #o373000, #x1f600)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F600
               syntax: w 	which means: word
             category: .:Base
             to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
          buffer code: #xF0 #x9F #x98 #x80
            file code: not encodable by coding system undecided-unix
              display: no font available

Character code properties: customize what to show
  name: GRINNING FACE
  general-category: So (Symbol, Other)
  decomposition: (128512) ('😀')



* Re: How to get the script name symbols of a specific character?
  2013-02-11 19:57     ` Jambunathan K
@ 2013-02-11 20:08       ` Eli Zaretskii
  2013-02-11 21:46         ` Jambunathan K
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-02-11 20:08 UTC
  To: help-gnu-emacs

> From: Jambunathan K <kjambunathan@gmail.com>
> Date: Tue, 12 Feb 2013 01:27:28 +0530
> Cc: help-gnu-emacs@gnu.org
> 
> YE Qianchuan <stool.ye@gmail.com> writes:
> 
> > On 02/11/2013 07:34 PM, Jambunathan K wrote:
> >> Put your cursor on the box and type
> >>          C-u C-x =
> > In fact, it's the same as `describe-char'. This command invokes
> > `what-cursor-position', which invokes `describe-char' eventually.
> >>
> >> It will give more useful pointers.  The codepoint of a particular
> >> character.  The name of the character, in the example below is prefixed
> >> by the script it comes from etc.
> > Cool, I didn't notice its name may be prefixed by its script. It does
> > make a lot sense.
> >
> > However sadly, not all characters do so. For example, a CJK character
> > has prefix CJK.
> > But cjk is not a script name (though there's a script called cjk-misc)
> > and it should belong
> > to `han'.
> >
> > What's worse is, some characters don't show their names at all, even
> > if I assign a font to it.
> >
> > For example:
> >              position: 806 of 1031 (78%), column: 1
> >             character: 😀 (displayed as 😀) (codepoint 128512, #o373000,
> > #x1f600)
> >     preferred charset: unicode (Unicode (ISO10646))
> > code point in charset: 0x1F600
> >                syntax: w     which means: word
> >              category: L:Left-to-right (strong)
> >           buffer code: #xF0 #x9F #x98 #x80
> >             file code: #xF0 #x9F #x98 #x80 (encoded by coding system
> > utf-8-unix)
> >               display: no font available
> >
> > Character code properties: customize what to show
> >   general-category: Cn (Other, Not Assigned)
> >   decomposition: (128512) ('😀')
> 
> This is what I get.  Emacs reports that it is a GRINNING FACE.  
> 
> I run Emacs from trunk though.  I am not sure this makes any actual
> difference.

The names come from the Unicode character database (UCD) that is
processed into a bunch of Emacs Lisp files and then preloaded into
Emacs.  The version of the Unicode database built into Emacs
determines which codepoints have names and which don't.
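
The same data can be queried from Lisp with `get-char-code-property',
which makes it easy to compare two Emacs versions.  A small sketch; the
wrapper name `my-char-unicode-info' is made up for illustration.

```elisp
;; Hypothetical wrapper; the name is made up for this sketch.
(defun my-char-unicode-info (char)
  "Return a plist with the Unicode name and general category of CHAR.
The values come from the Unicode data built into the running Emacs, so
the name is nil for codepoints that this version of the data lacks."
  (list :name (get-char-code-property char 'name)
        :general-category (get-char-code-property char 'general-category)))

(my-char-unicode-info #x0B9C)   ; the Tamil letter from the earlier example
(my-char-unicode-info #x1F600)  ; result depends on the Emacs version
```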

> I think it would be useful to have one browse different Unicode Blocks
> or have C-u C-x = report the block name of a character.

If that data is not in the UCD, Emacs cannot know it, unless someone
adds it to Emacs.





* Re: How to get the script name symbols of a specific character?
  2013-02-11 11:34 ` Jambunathan K
  2013-02-11 15:07   ` YE Qianchuan
  2013-02-11 15:57   ` Stefan Monnier
@ 2013-02-11 20:11   ` T.F. Torrey
  2 siblings, 0 replies; 15+ messages in thread
From: T.F. Torrey @ 2013-02-11 20:11 UTC
  To: Jambunathan K; +Cc: stool.ye, help-gnu-emacs

Jambunathan K <kjambunathan@gmail.com> writes:

> Put your cursor on the box and type 
>         C-u C-x =
>
> It will give more useful pointers.  The codepoint of a particular
> character.  The name of the character, in the example below is prefixed
> by the script it comes from etc.

Wow. This is a great tip. Thanks for this (and your other
contributions)!

Terry
-- 
T.F. Torrey




* Re: How to get the script name symbols of a specific character?
  2013-02-11 20:08       ` Eli Zaretskii
@ 2013-02-11 21:46         ` Jambunathan K
  0 siblings, 0 replies; 15+ messages in thread
From: Jambunathan K @ 2013-02-11 21:46 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> writes:

> The names come from the Unicode character database (UCD) that is
> processed into a bunch of Emacs Lisp files and then preloaded into
> Emacs.  The version of the Unicode database built into Emacs
> determines which codepoints have names and which don't.

In admin/unidata, I see only the following *.txt files

    /home/kjambunathan/src/emacs/trunk/admin/unidata:

    .
    ..
    BidiMirroring.txt
    UnicodeData.txt

There are a lot more files under 

    http://www.unicode.org/Public/UNIDATA/


>> I think it would be useful to have one browse different Unicode Blocks
>> or have C-u C-x = report the block name of a character.
>
> If that data is not in the UCD, Emacs cannot know it, unless someone
> adds it to Emacs.

So this would involve massaging Blocks.txt from the above URL.  Hm ...
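
A minimal sketch of that massaging, assuming the usual Blocks.txt layout
of "START..END; Block Name" lines with "#" comments.  The function names
are hypothetical, and the two sample lines are typed in by hand rather
than read from the real file.

```elisp
;; Hypothetical helpers; the names are made up for this sketch.
(defun my-parse-unicode-blocks (text)
  "Parse TEXT in Blocks.txt format into a list of (FROM TO NAME) entries."
  (let (blocks)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; Data lines look like "0B80..0BFF; Tamil"; comment lines start
      ;; with "#" and therefore never match.
      (while (re-search-forward
              "^\\([0-9A-F]+\\)\\.\\.\\([0-9A-F]+\\);[ \t]*\\(.*?\\)[ \t]*$"
              nil t)
        (push (list (string-to-number (match-string 1) 16)
                    (string-to-number (match-string 2) 16)
                    (match-string 3))
              blocks)))
    (nreverse blocks)))

(defun my-char-block (char blocks)
  "Return the name of the block in BLOCKS that contains CHAR, or nil."
  (let (name)
    (dolist (b blocks name)
      (when (<= (nth 0 b) char (nth 1 b))
        (setq name (nth 2 b))))))

(defvar my-sample-blocks
  (my-parse-unicode-blocks
   "# Sample lines in Blocks.txt format\n0B80..0BFF; Tamil\n1F600..1F64F; Emoticons\n")
  "Two hand-typed sample entries, not the full Blocks.txt.")

(my-char-block #x1F600 my-sample-blocks)   ; finds the Emoticons entry
```

A real implementation would more likely build a char-table from the file
at build time, the way the other Unicode data is handled.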

* Re: How to get the script name symbols of a specific character?
  2013-02-10 15:59 How to get the script name symbols of a specific character? YE Qianchuan
  2013-02-11  2:55 ` Jambunathan K
  2013-02-11 11:34 ` Jambunathan K
@ 2013-02-12 15:12 ` YE Qianchuan
  2 siblings, 0 replies; 15+ messages in thread
From: YE Qianchuan @ 2013-02-12 15:12 UTC
  To: help-gnu-emacs

Hi, all.  I have worked it out.

After reading the char-table part of the manual and re-checking the
relevant variables, I eventually found the solution, which is much
simpler than I thought.

When you have a character c and want to get its script name symbol,
simply call `char-table-range':
(char-table-range char-script-table c).

For example, (char-table-range char-script-table ?a) returns the
symbol 'latin, and (char-table-range char-script-table #x1f600) returns
the symbol 'symbol, which means the script name symbol of #x1f600 is
'symbol.

See the documentation of `char-table-range' for more details.
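
Putting the lookup together with `set-fontset-font' gives a rough sketch
of the whole workflow.  The function names are hypothetical, and the font
"Symbola" is just the one from the earlier `describe-char' output.  Newer
Emacs versions may report a different script than 'symbol for some
characters.

```elisp
;; Hypothetical helpers; the names are made up for this sketch.
(defun my-char-script (char)
  "Return the script name symbol Emacs assigns to CHAR, or nil."
  (char-table-range char-script-table char))

(defun my-set-font-for-char-script (char family)
  "Use font FAMILY for every character in the same script as CHAR.
Return the script symbol, or nil if CHAR has no script or there is no
graphical display."
  (let ((script (my-char-script char)))
    (when (and script (display-graphic-p))
      (set-fontset-font "fontset-default" script
                        (font-spec :family family))
      script)))

(my-char-script ?a)                              ; `latin', as noted above
(my-set-font-for-char-script #x1F620 "Symbola")  ; script depends on version
```

Note that this changes the font for the whole script, which for a broad
one like 'symbol affects many unrelated characters.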


* Re: How to get the script name symbols of a specific character?
  2013-02-11 15:57   ` Stefan Monnier
@ 2013-02-12 15:22     ` YE Qianchuan
  0 siblings, 0 replies; 15+ messages in thread
From: YE Qianchuan @ 2013-02-12 15:22 UTC
  To: help-gnu-emacs

On 02/11/2013 11:57 PM, Stefan Monnier wrote:
>> Put your cursor on the box and type
>>          C-u C-x =
>> It will give more useful pointers.  The codepoint of a particular
>> character.  The name of the character, in the example below is prefixed
>> by the script it comes from etc.
> Actually, the "name" field is just the official Unicode name of that
> char, which is only indirectly linked to Emacs's (and fonts') notion of
> a script, from what I understand.
>
> I suggest you M-x report-emacs-bug requesting a new feature that
> displays (in C-x =) the charsets (and/or scripts) that the
> current char belongs to.
>
>
>          Stefan
>
>
Thank you for your explanation.
I will file a bug report requesting that `describe-char' show a
character's script.


