* How to get the script name symbols of a specific character?
From: YE Qianchuan @ 2013-02-10 15:59 UTC (permalink / raw)
To: help-gnu-emacs
Hi, all.
According to the documentation of `set-fontset-font', its TARGET argument
can be a charset or a script name symbol. But I could not find any
documentation about script name symbols. The variables that seem relevant
are `charset-script-alist', `script-representative-chars' and
`char-script-table'. However, none of them explains what those scripts
cover; I can only guess from their names.
My case is this: some Unicode characters are displayed as hex boxes, and
I want to assign a proper font to display them. Setting TARGET to
`unicode' is not a good idea IMHO. I would rather use their script name
symbol as TARGET, like `han' for CJK characters.
In practice, `describe-char' tells me which charset a character belongs
to, so I can use that charset to set its font. However, I can't find a
way to get a character's script name symbol.
How can I do this? Am I missing something?
Thanks for your help.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: How to get the script name symbols of a specific character?
From: Jambunathan K @ 2013-02-11 2:55 UTC (permalink / raw)
To: YE Qianchuan; +Cc: help-gnu-emacs
YE Qianchuan <stool.ye@gmail.com> writes:
> Hi, all.
>
> According to the document of `set-fontset-font', its argument TARGET can
> be a charset or a script name symbol. But I failed to find any
> documents about script name symbols. What I found that seem relevant
> are variables `charset-script-alist', `script-representative-chars'
> and `char-script-table'. However none of them tells me the details of
> those scripts, I can only guess by their names.
These two suggestions are from a Stack Overflow thread
(http://stackoverflow.com/questions/7176276/what-is-script-name-symbol-means-for-emacs-set-fontset-font-function)
M-: (char-table-extra-slot char-script-table 0)
M-x list-character-sets
A long time ago, I was trying to assign a font to Tamil/Indic scripts. I
was hoping that there would be a command like `describe-scripts' or some
such thing. I was disappointed.
Maybe there should be one.
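As a rough sketch of what such a command could look like (this is not an
existing Emacs command; the `my-' names are hypothetical), the script list
from `char-script-table' can be combined with the sample characters in
`script-representative-chars'. The sketch assumes that variable may be
unbound in builds without window-system support and simply shows no
samples in that case.

```elisp
;; Hypothetical helpers; the `my-' names are not part of Emacs.
(defun my-script-samples ()
  "Return an alist of (SCRIPT . SAMPLE-CHARS) for every known script.
SAMPLE-CHARS comes from `script-representative-chars' and may be nil."
  (let ((samples (and (boundp 'script-representative-chars)
                      (symbol-value 'script-representative-chars))))
    (mapcar (lambda (script)
              ;; The cdr may be a list or a vector; normalize it to a list.
              (cons script (append (cdr (assq script samples)) nil)))
            (char-table-extra-slot char-script-table 0))))

(defun my-describe-scripts ()
  "Show every script symbol with its representative characters, if any."
  (interactive)
  (with-output-to-temp-buffer "*Scripts*"
    (dolist (entry (my-script-samples))
      (princ (format "%-30s %s\n"
                     (car entry)
                     (mapconcat #'identity
                                (delq nil
                                      (mapcar (lambda (c)
                                                (and (characterp c)
                                                     (char-to-string c)))
                                              (cdr entry)))
                                " "))))))
```

M-x my-describe-scripts then lists each script symbol next to a few
characters from it, which makes the less obvious names easier to guess.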
* Re: How to get the script name symbols of a specific character?
From: YE Qianchuan @ 2013-02-11 10:48 UTC (permalink / raw)
To: Jambunathan K; +Cc: help-gnu-emacs
On 02/11/2013 10:55 AM, Jambunathan K wrote:
> These two suggestions or from stackoverflow thread
> (http://stackoverflow.com/questions/7176276/what-is-script-name-symbol-means-for-emacs-set-fontset-font-function)
>
> M-: (char-table-extra-slot char-script-table 0)
> M-x list-character-sets
>
> Long time ago, I was trying to assign font to tamil/indic scripts. I
> was hoping that there would be a command like `describe-scripts' or some
> such thing. I was disappointed.
>
> May be there should be one.
>
Thanks. I had already read that thread. It helps, but not enough.
Maybe I should look at the source.
* Re: How to get the script name symbols of a specific character?
From: Jambunathan K @ 2013-02-11 11:00 UTC (permalink / raw)
To: YE Qianchuan; +Cc: help-gnu-emacs
YE Qianchuan <stool.ye@gmail.com> writes:
> On 02/11/2013 10:55 AM, Jambunathan K wrote:
>> These two suggestions or from stackoverflow thread
>> (http://stackoverflow.com/questions/7176276/what-is-script-name-symbol-means-for-emacs-set-fontset-font-function)
>>
>> M-: (char-table-extra-slot char-script-table 0)
>> M-x list-character-sets
>>
>> Long time ago, I was trying to assign font to tamil/indic scripts. I
>> was hoping that there would be a command like `describe-scripts' or some
>> such thing. I was disappointed.
>>
>> May be there should be one.
>>
> Thanks. I had read this thread. It does say something but not
> enough. Maybe I should look at the source.
You were looking for symbols for scripts and I have included a dump of
what the above sexp returns. I see hangul, hanunoo and han.
As for the Tamil script (which I use), I have the following in my .emacs.
"Lohit Tamil" is the font I use for the `tamil' script.
,----
| ;; Use the predefined fontset "fontset-default"
| (set-face-font 'default "fontset-default")
| (set-fontset-font "fontset-default" 'tamil "Lohit Tamil")
`----
,---- (char-table-extra-slot char-script-table 0)
| (latin phonetic greek coptic cyrillic armenian hebrew arabic
| syriac thaana nko samaritan mandaic devanagari bengali gurmukhi
| gujarati oriya tamil telugu kannada malayalam sinhala thai lao
| tibetan burmese georgian hangul ethiopic cherokee
| canadian-aboriginal ogham runic tagalog hanunoo buhid tagbanwa
| khmer mongolian limbu tai-le tai-lue buginese tai-tham balinese
| sundanese batak lepcha ol-chiki vedic symbol braille glagolitic
| tifinagh han ideographic-description cjk-misc kana bopomofo
| kanbun yi lisu vai bamum syloti-nagri north-indic-number phags-pa
| saurashtra kayah-li rejang javanese cham tai-viet meetei-mayek
| linear-b aegean-number ancient-greek-number ancient-symbol
| phaistos-disc lycian carian olt-italic gothic ugaritic
| old-persian deseret shavian osmanya cypriot-syllabary aramaic
| phoenician lydian meroitic kharoshthi old-south-arabian avestan
| inscriptional-parthian inscriptional-pahlavi old-turkic
| rumi-number brahmi kaithi sora-sompeng chakma sharada takri
| cuneiform cuneiform-numbers-and-punctuation egyptian miao
| byzantine-musical-symbol musical-symbol
| ancient-greek-musical-notation tai-xuan-jing-symbol
| counting-rod-numeral mathematical mahjong-tile domino-tile
| playing-cards)
|
`----
* Re: How to get the script name symbols of a specific character?
From: Jambunathan K @ 2013-02-11 11:34 UTC (permalink / raw)
To: YE Qianchuan; +Cc: help-gnu-emacs
YE Qianchuan <stool.ye@gmail.com> writes:
> Hi, all.
>
> According to the document of `set-fontset-font', its argument TARGET can
> be a charset or a script name symbol. But I failed to find any
> documents about script name symbols. What I found that seem relevant
> are variables `charset-script-alist', `script-representative-chars'
> and `char-script-table'. However none of them tells me the details of
> those scripts, I can only guess by their names.
>
> My case is, for example, a set of unicode characters are displayed as
> hex boxes. I want to assign a proper font to display them. Specifying
> TARGET to unicode is not a good idea IMHO. I'd better find their
> script name symbol as TARGET, like `Han' for CJK characters.
>
> In practice, by calling `describe-char', I get which charset is
> corresponding to this character. So I can specify it to modify its
> font. However, I can't find a method to get a character's script name
> symbols.
Put your cursor on the box and type
C-u C-x =
It will give you more useful pointers, such as the codepoint of the
character. The name of the character, in the example below, is prefixed
by the script it comes from.
,----
| position: 192 of 196 (97%), column: 0
| character: ஜ (displayed as ஜ) (codepoint 2972, #o5634, #xb9c)
| preferred charset: unicode (Unicode (ISO10646))
| code point in charset: 0x0B9C
| syntax: w which means: word
| category: .:Base, L:Left-to-right (strong)
| to input: type "ja" with tamil-itrans input method
| buffer code: #xE0 #xAE #x9C
| file code: #xE0 #xAE #x9C (encoded by coding system utf-8)
| display: by this font (glyph code)
| xft:-unknown-Lohit Tamil-normal-normal-normal-*-24-*-*-*-*-0-iso10646-1 (#x44)
|
| Character code properties: customize what to show
| name: TAMIL LETTER JA
| general-category: Lo (Letter, Other)
| decomposition: (2972) ('ஜ')
|
| There are text properties here:
| fontified t
`----
Also you may want to look at this page:
http://en.wikipedia.org/wiki/Unicode_block
* Re: How to get the script name symbols of a specific character?
From: YE Qianchuan @ 2013-02-11 14:50 UTC (permalink / raw)
To: Jambunathan K; +Cc: help-gnu-emacs
On 02/11/2013 07:00 PM, Jambunathan K wrote:
> You were looking for symbols for scripts and I have included a dump of
> what the above sexp returns. I see hangul, hanunoo and han.
Right, and the variables I mentioned also give me these symbols. However,
my problem is that even with this list of symbols, I have no idea which
one I should use. I know `han' is for CJK, and I can guess that `tamil'
is for Tamil. But some of them are difficult to guess. For example, which
characters does `cjk-misc' represent? How is it distinct from `han'?
Moreover, sometimes characters fail to display and I don't know which
script I should use. I will expand on this in my reply to your other
message.
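One way to see which characters a script symbol such as `cjk-misc' covers
is to walk `char-script-table' and collect the code point ranges mapped to
it. This is only a sketch; `my-script-ranges' is a hypothetical name, and
the ranges it reports depend on the Emacs version.

```elisp
;; Hypothetical helper; not part of Emacs.
(defun my-script-ranges (script)
  "Return a sorted list of (FROM . TO) ranges that map to SCRIPT.
The ranges are taken from `char-script-table'."
  (let (ranges)
    (map-char-table
     (lambda (key value)
       (when (eq value script)
         ;; KEY is a single character or a (FROM . TO) cons.  Build a
         ;; fresh cons instead of keeping the one passed in, in case
         ;; the caller reuses it between calls.
         (push (if (consp key)
                   (cons (car key) (cdr key))
                 (cons key key))
               ranges)))
     char-script-table)
    (sort ranges (lambda (a b) (< (car a) (car b))))))
```

Evaluating (my-script-ranges 'cjk-misc) and (my-script-ranges 'han) with
M-: shows the two sets of ranges side by side, so they can be compared
against the Unicode block list.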
* Re: How to get the script name symbols of a specific character?
From: YE Qianchuan @ 2013-02-11 15:07 UTC (permalink / raw)
To: Jambunathan K; +Cc: help-gnu-emacs
On 02/11/2013 07:34 PM, Jambunathan K wrote:
> Put your cursor on the box and type
> C-u C-x =
In fact, it's the same as `describe-char'. This key sequence invokes
`what-cursor-position', which eventually invokes `describe-char'.
>
> It will give more useful pointers. The codepoint of a particular
> character. The name of the character, in the example below is prefixed
> by the script it comes from etc.
Cool, I hadn't noticed that a name may be prefixed by its script. That
makes a lot of sense.
However, sadly, not all characters follow this pattern. For example, a CJK
character's name has the prefix CJK. But cjk is not a script name (though
there is a script called cjk-misc); such a character should belong to
`han'.
What's worse, some characters don't show a name at all, even if I assign
a font to them.
For example:
position: 806 of 1031 (78%), column: 1
character: 😀 (displayed as 😀) (codepoint 128512, #o373000, #x1f600)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F600
syntax: w which means: word
category: L:Left-to-right (strong)
buffer code: #xF0 #x9F #x98 #x80
file code: #xF0 #x9F #x98 #x80 (encoded by coding system utf-8-unix)
display: no font available
Character code properties: customize what to show
general-category: Cn (Other, Not Assigned)
decomposition: (128512) ('😀')
* Re: How to get the script name symbols of a specific character?
From: YE Qianchuan @ 2013-02-11 15:17 UTC (permalink / raw)
To: Jambunathan K; +Cc: help-gnu-emacs
Here is another example: this character's name doesn't show any
connection to its script.
How would you get its script symbol?
position: 870 of 1031 (84%), column: 65
character: 😠 (displayed as 😠) (codepoint 128544, #o373040, #x1f620)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F620
syntax: w which means: word
category: .:Base
buffer code: #xF0 #x9F #x98 #xA0
file code: #xF0 #x9F #x98 #xA0 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-unknown-Symbola-normal-normal-semi-condensed-*-15-*-*-*-*-0-iso10646-1 (#x1ADE)
Character code properties: customize what to show
name: ANGRY FACE
general-category: So (Symbol, Other)
decomposition: (128544) ('😠')
* Re: How to get the script name symbols of a specific character?
From: Stefan Monnier @ 2013-02-11 15:57 UTC (permalink / raw)
To: help-gnu-emacs
> Put your cursor on the box and type
> C-u C-x =
> It will give more useful pointers. The codepoint of a particular
> character. The name of the character, in the example below is prefixed
> by the script it comes from etc.
Actually, the "name" field is just the official Unicode name of that
char, which is only indirectly linked to Emacs's (and fonts') notion of
a script, from what I understand.
I suggest you M-x report-emacs-bug requesting a new feature that
displays (in C-x =) the charsets (and/or scripts) that the
current char belongs to.
Stefan
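Until something like that is built into C-x =, a small user-level command
can serve as a stopgap. The sketch below is not the feature proposed
above, and `my-script-at-point' is a hypothetical name; it only looks up
the character after point in `char-script-table' and reports the result
in the echo area.

```elisp
;; Hypothetical stopgap command; not part of Emacs.
(defun my-script-at-point ()
  "Show the script symbol of the character after point in the echo area."
  (interactive)
  (let ((char (char-after)))
    (if char
        (message "U+%04X: script %s"
                 char
                 (char-table-range char-script-table char))
      (message "No character after point"))))
```

With point on a hex box, M-x my-script-at-point names the script symbol
that could then be passed as TARGET to `set-fontset-font'.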
* Re: How to get the script name symbols of a specific character?
From: Jambunathan K @ 2013-02-11 19:57 UTC (permalink / raw)
To: YE Qianchuan; +Cc: help-gnu-emacs
YE Qianchuan <stool.ye@gmail.com> writes:
> On 02/11/2013 07:34 PM, Jambunathan K wrote:
>> Put your cursor on the box and type
>> C-u C-x =
> In fact, it's the same as `describe-char'. This command invokes
> `what-cursor-position', which invokes `describe-char' eventually.
>>
>> It will give more useful pointers. The codepoint of a particular
>> character. The name of the character, in the example below is prefixed
>> by the script it comes from etc.
> Cool, I didn't notice its name may be prefixed by its script. It does
> make a lot sense.
>
> However sadly, not all characters do so. For example, a CJK character
> has prefix CJK.
> But cjk is not a script name (though there's a script called cjk-misc)
> and it should belong
> to `han'.
>
> What's worse is, some characters don't show their names at all, even
> if I assign a font to it.
>
> For example:
> position: 806 of 1031 (78%), column: 1
> character: 😀 (displayed as 😀) (codepoint 128512, #o373000,
> #x1f600)
> preferred charset: unicode (Unicode (ISO10646))
> code point in charset: 0x1F600
> syntax: w which means: word
> category: L:Left-to-right (strong)
> buffer code: #xF0 #x9F #x98 #x80
> file code: #xF0 #x9F #x98 #x80 (encoded by coding system
> utf-8-unix)
> display: no font available
>
> Character code properties: customize what to show
> general-category: Cn (Other, Not Assigned)
> decomposition: (128512) ('😀')
This is what I get. Emacs reports that it is a GRINNING FACE.
I run Emacs from trunk, though. I am not sure whether this makes any
actual difference.
I think it would be useful to be able to browse the different Unicode
blocks, or to have C-u C-x = report the block name of a character. I am
just going by what the Wikipedia article mentioned below suggests.
,---- http://en.wikipedia.org/wiki/Unicode_block
| U+1F600..U+1F64F Emoticons 80 1 SMP Common
`----
position: 1706 of 2799 (61%), column: 28
character: 😀 (displayed as 😀) (codepoint 128512, #o373000, #x1f600)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F600
syntax: w which means: word
category: .:Base
to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
buffer code: #xF0 #x9F #x98 #x80
file code: not encodable by coding system undecided-unix
display: no font available
Character code properties: customize what to show
name: GRINNING FACE
general-category: So (Symbol, Other)
decomposition: (128512) ('😀')
* Re: How to get the script name symbols of a specific character?
From: Eli Zaretskii @ 2013-02-11 20:08 UTC (permalink / raw)
To: help-gnu-emacs
> From: Jambunathan K <kjambunathan@gmail.com>
> Date: Tue, 12 Feb 2013 01:27:28 +0530
> Cc: help-gnu-emacs@gnu.org
>
> This is what I get. Emacs reports that it is a GRINNING FACE.
>
> I run Emacs from trunk though. I am not sure this makes any actuall
> difference.
The names come from the Unicode character database (UCD) that is
processed into a bunch of Emacs Lisp files and then preloaded into
Emacs. The version of the Unicode database built into Emacs
determines which codepoints have names and which don't.
> I think it would be useful to have one browse different Unicode Blocks
> or have C-u C-x = report the block name of a character.
If that data is not in the UCD, Emacs cannot know it, unless someone
adds it to Emacs.
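The UCD-derived properties mentioned above can be queried directly from
Lisp with `get-char-code-property', which makes it easy to check what a
given Emacs build knows about a codepoint. The sketch below is only an
illustration; `my-char-unicode-info' is a hypothetical name, and the
values it returns depend on the Unicode version built into Emacs.

```elisp
;; Hypothetical helper; not part of Emacs.
(defun my-char-unicode-info (char)
  "Return a plist with CHAR's Unicode name, general category and script.
The name and category come from the Unicode data built into Emacs;
the script comes from `char-script-table'."
  (list :name (get-char-code-property char 'name)
        :general-category (get-char-code-property char 'general-category)
        :script (char-table-range char-script-table char)))
```

For example, (my-char-unicode-info #x1f600) has a nil :name on a build
whose Unicode data predates that codepoint, which matches the difference
between the two `describe-char' outputs earlier in the thread.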
* Re: How to get the script name symbols of a specific character?
From: T.F. Torrey @ 2013-02-11 20:11 UTC (permalink / raw)
To: Jambunathan K; +Cc: stool.ye, help-gnu-emacs
Jambunathan K <kjambunathan@gmail.com> writes:
> Put your cursor on the box and type
> C-u C-x =
>
> It will give more useful pointers. The codepoint of a particular
> character. The name of the character, in the example below is prefixed
> by the script it comes from etc.
Wow. This is a great tip. Thanks for this (and your other
contributions)!
Terry
--
T.F. Torrey
* Re: How to get the script name symbols of a specific character?
From: Jambunathan K @ 2013-02-11 21:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: help-gnu-emacs
Eli Zaretskii <eliz@gnu.org> writes:
> The names come from the Unicode character database (UCD) that is
> processed into a bunch of Emacs Lisp files and then preloaded into
> Emacs. The version of the Unicode database built into Emacs
> determines which codepoints have names and which don't.
In admin/unidata, I see only the following *.txt files
/home/kjambunathan/src/emacs/trunk/admin/unidata:
.
..
BidiMirroring.txt
UnicodeData.txt
There are a lot more files under
http://www.unicode.org/Public/UNIDATA/
>> I think it would be useful to have one browse different Unicode Blocks
>> or have C-u C-x = report the block name of a character.
>
> If that data is not in the UCD, Emacs cannot know it, unless someone
> adds it to Emacs.
So this would involve massaging Blocks.txt from the above URL. Hm ...
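A minimal sketch of that massaging step, assuming the usual Blocks.txt
line format "START..END; Block Name" with "#" comment lines. The function
names are hypothetical, and this is a user-level illustration rather than
how Emacs processes the UCD in admin/unidata.

```elisp
;; Hypothetical helpers; not part of Emacs.
(defun my-parse-unicode-blocks (text)
  "Parse TEXT in Blocks.txt format into a list of (FROM TO NAME)."
  (let (blocks)
    (dolist (line (split-string text "\n" t))
      ;; Data lines look like "0B80..0BFF; Tamil"; comment lines and
      ;; anything else that does not match are skipped.
      (when (string-match
             "\\`\\([0-9A-Fa-f]+\\)\\.\\.\\([0-9A-Fa-f]+\\);[ \t]*\\(.*?\\)[ \t\r]*\\'"
             line)
        (push (list (string-to-number (match-string 1 line) 16)
                    (string-to-number (match-string 2 line) 16)
                    (match-string 3 line))
              blocks)))
    (nreverse blocks)))

(defun my-block-name (char blocks)
  "Return the name of the block in BLOCKS that contains CHAR, or nil."
  (catch 'found
    (dolist (entry blocks)
      (when (<= (nth 0 entry) char (nth 1 entry))
        (throw 'found (nth 2 entry))))
    nil))
```

Given the contents of a downloaded Blocks.txt as a string, for instance
read with `insert-file-contents' and `buffer-string', the parsed list can
be passed to `my-block-name' together with (char-after) to report the
block of the character at point.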
* Re: How to get the script name symbols of a specific character?
From: YE Qianchuan @ 2013-02-12 15:12 UTC (permalink / raw)
To: help-gnu-emacs
Hi, all. I've figured it out.
After reading the char-table part of the manual and re-checking the
relevant variables, I eventually found the solution, which is much
simpler than I thought.
When you have a character c and want to get its script name symbol,
simply call `char-table-range', like
(char-table-range char-script-table c).
For example, (char-table-range char-script-table ?a) returns the
symbol `latin', and (char-table-range char-script-table #x1f600) returns
the symbol `symbol', which means the script name symbol of #x1f600 is
`symbol'.
See the documentation of `char-table-range' for more details.
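Putting the lookup together with `set-fontset-font' might look like the
sketch below. The `my-' function names are hypothetical, the font name is
whatever is installed on your system, and the second function assumes a
graphical Emacs where "fontset-default" exists.

```elisp
;; Hypothetical helpers; not part of Emacs.
(defun my-char-script (char)
  "Return the script symbol that `char-script-table' records for CHAR."
  (char-table-range char-script-table char))

(defun my-use-font-for-script-of (char font-name)
  "Use FONT-NAME in the default fontset for the whole script of CHAR.
Return the script symbol that was used as TARGET."
  (let ((script (my-char-script char)))
    (unless script
      (error "No script recorded for character #x%X" char))
    ;; Same call shape as the Tamil example earlier in the thread.
    (set-fontset-font "fontset-default" script font-name)
    script))
```

For instance, (my-use-font-for-script-of #x1f620 "Symbola") would assign
Symbola to whatever script the angry-face character belongs to, without
having to guess the symbol by hand.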
* Re: How to get the script name symbols of a specific character?
From: YE Qianchuan @ 2013-02-12 15:22 UTC (permalink / raw)
To: help-gnu-emacs
On 02/11/2013 11:57 PM, Stefan Monnier wrote:
>> Put your cursor on the box and type
>> C-u C-x =
>> It will give more useful pointers. The codepoint of a particular
>> character. The name of the character, in the example below is prefixed
>> by the script it comes from etc.
> Actually, the "name" first is just the official Unicode name of that
> char, which is only indirectly linked to Emacs's (and fonts's) notion of
> a script, from what I understand.
>
> I suggest you M-x report-emacs-bug requesting a new feature that
> displays (in C-x =) the charsets (and/or scripts) that the
> current char belongs to.
>
>
> Stefan
>
>
Thank you for your explanation.
I will file a bug report requesting that `describe-char' show the
character's script.