unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#36070: 27; feature request '(Describe Char Unidata List) to include 'kDefinition' value
From: Van L @ 2019-06-03 12:00 UTC (permalink / raw)
  To: 36070

Hello Emacs,

The details retrieved by 'M-x describe-char' on '入' show the following:

--8<---------------cut here---------------start------------->8---
Character code properties: customize what to show
  name: CJK IDEOGRAPH-5165
  general-category: Lo (Letter, Other)
  decomposition: (20837) ('入')
--8<---------------cut here---------------end--------------->8---

Following the customize link to 'Describe Char Unidata List',
I find that more information is available from [1].

The Readings table, in particular, would be nice to have for its 'kDefinition' value.

--8<---------------cut here---------------start------------->8---
| Data type   | Value                    |
|-------------+--------------------------|
| kDefinition | enter, come in(to), join |
|             |                          |
--8<---------------cut here---------------end--------------->8---

WDYT? Thanks in advance.

[1] https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E5%85%A5







* bug#36070: 27; feature request '(Describe Char Unidata List) to include 'kDefinition' value
From: Eli Zaretskii @ 2019-06-03 15:06 UTC (permalink / raw)
  To: Van L; +Cc: 36070

> From: Van L <van@scratch.space>
> Date: Mon, 3 Jun 2019 22:00:30 +1000
> 
> The details retrieved by 'M-x describe-char' on '入' show the following
> 
> --8<---------------cut here---------------start------------->8---
> Character code properties: customize what to show
>   name: CJK IDEOGRAPH-5165
>   general-category: Lo (Letter, Other)
>   decomposition: (20837) ('入')
> --8<---------------cut here---------------end--------------->8---

This comes from UnicodeData.txt, our source for the Unicode properties
of all the characters.  We parse it into uni-*.el files as part of the
build.
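
For illustration only, the same kind of lookup can be tried outside Emacs.
This is a minimal sketch that assumes Python's standard `unicodedata`
module, which is built from the Unicode Character Database rather than
from Emacs's uni-*.el files; the helper name `describe` is hypothetical.
Note that Python reports the formal Unicode name, which differs slightly
from the string describe-char displays, and it reports "no decomposition"
as an empty string.

```python
import unicodedata


def describe(ch):
    """Return a few UnicodeData-style properties for one character.

    Hypothetical helper for illustration; it is not how Emacs does it.
    """
    return {
        "code": ord(ch),                                  # decimal code point
        "name": unicodedata.name(ch),                     # formal Unicode name
        "general-category": unicodedata.category(ch),     # e.g. "Lo"
        "decomposition": unicodedata.decomposition(ch),   # "" when there is none
    }


info = describe("入")
for key, value in info.items():
    print(f"{key}: {value!r}")
```

The decimal code 20837 is the same number describe-char prints in its
decomposition line, and it equals hexadecimal 5165.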

> Following the customize link to 'Describe Char Unidata List' 
> I find more information can be had from [1] .
> 
> The Readings table, in particular, is nice to have for the 'kDefinition'.
> 
> --8<---------------cut here---------------start------------->8---
> | Data type   | Value                    |
> |-------------+--------------------------|
> | kDefinition | enter, come in(to), join |
> |             |                          |
> --8<---------------cut here---------------end--------------->8---

This comes from Unihan_Readings.txt, a different file that is part of
the Unihan database.

We don't currently have a property in which to store this value, so we
first need to extend the set of properties.  Then we will need to parse
the above file and populate the property.  Patches welcome.  Bonus points
for reviewing other properties of the Unihan DB and adding whatever is
useful.  See UAX#38 (http://www.unicode.org/reports/tr38/) for the
description of the properties.
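
To give a rough idea of the parsing step, here is a minimal sketch in
Python, not a proposed Emacs patch.  It assumes the layout described in
UAX#38: one record per line with three tab-separated fields (code point
as U+XXXX, property name, value), and comment lines starting with '#'.
The function name `parse_unihan_property` and the sample lines are
hypothetical; the kDefinition value is the one quoted in this report.

```python
def parse_unihan_property(lines, prop="kDefinition"):
    """Collect one Unihan property into a {code point: value} dict.

    Hypothetical helper: `lines` is any iterable of text lines in the
    tab-separated Unihan layout (code point, property name, value).
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split("\t", 2)      # the value itself may contain spaces
        if len(fields) != 3:
            continue                      # ignore malformed records
        codepoint, name, value = fields
        if name == prop and codepoint.startswith("U+"):
            table[int(codepoint[2:], 16)] = value
    return table


# Illustrative sample in the assumed layout, not copied from the real file.
sample = [
    "# sample records",
    "U+5165\tkDefinition\tenter, come in(to), join",
    "U+5165\tkMandarin\trù",
]

definitions = parse_unihan_property(sample)
print(definitions.get(ord("入")))
```

With a real copy of the file, the same function could be fed an open
file object instead of the sample list.  Keying the table by integer
code point keeps the lookup independent of how the character is written
in the source.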

Thanks.






* bug#36070: 27; feature request '(Describe Char Unidata List) to include 'kDefinition' value
From: Van L @ 2019-06-10  6:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 36070


> On 4 Jun 2019, at 01:06, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> --8<---------------cut here---------------start------------->8---
>> Character code properties: customize what to show
>>  name: CJK IDEOGRAPH-5165
>>  general-category: Lo (Letter, Other)
>>  decomposition: (20837) ('入')
>> --8<---------------cut here---------------end--------------->8---
> 
> This comes from UnicodeData.txt, our source for the Unicode properties
> of all the characters.  We parse it into uni-*.el files as part of the
> build.
> 
>> The Readings table, in particular, is nice to have for the 'kDefinition'.
>> 
>> --8<---------------cut here---------------start------------->8---
>> | Data type   | Value                    |
>> |-------------+--------------------------|
>> | kDefinition | enter, come in(to), join |
>> |             |                          |
>> --8<---------------cut here---------------end--------------->8---
> 
> This comes from Unihan_Readings.txt, a different file that is part of
> the Unihan database.
> 
> We don't currently have a property where to put this value, so we need
> first to extend the properties.  And then we will need to parse the
> above file and populate the property.  Patches welcome.  Bonus points
> for reviewing other properties of the Unihan DB and adding whatever is
> useful.  See UAX#38 (http://www.unicode.org/reports/tr38/), for the
> description of the properties.

Thanks for pointing this out. I definitely want to know more about the Unihan DB and extend the handling of this information.

-- Van





