unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
@ 2021-12-31 17:55 Van Ly
  2022-01-03 13:54 ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Van Ly @ 2021-12-31 17:55 UTC (permalink / raw)
  To: 52918

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]


Hello,

I was looking in the admin/notes subdirectory of the master branch
and found the file 'unicode'.  It lists the files taken from the UCD,
but leaves out:

   . Unihan_Readings.txt

Just as quail-show-key shows in the echo area the key sequence needed
to type a character with the current input method, could there be a
function called quail-show-unihan that shows in the echo area the
kDefinition entry for an East Asian character from
ucd/Unihan_Readings.txt?

-- 
vl

[-- Attachment #2: Type: text/plain, Size: 3929 bytes --]

In GNU Emacs 29.0.50 (build 1, aarch64-unknown-linux-gnu, X toolkit, cairo version 1.16.0)
 of 2021-12-28 built on charlie
Repository revision: 208ae993bac6f011f178befbeeb8104c0f63499f
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured using:
 'configure
 --prefix=/b/b/Blah/emacs-2021-12-28
 --with-x-toolkit=lucid --without-toolkit-scroll-bars --without-xft
 --with-native-compilation --without-compress-install
 --without-mailutils --without-xaw3d --without-selinux'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF X11 XDBE XIM XPM
LUCID ZLIB

Important settings:
  value of $LC_ALL: en_AU.UTF-8
  value of $LANG: en_AU.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Info

Minor modes in effect:
  shell-dirtrack-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow mail-extr emacsbug message yank-media rmc rfc822 mml mml-sec epa
derived epg rfc6068 epg-config mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums ind-util quail cl-print ffap sort find-dired rect kmacro
facemenu two-column cl-extra shortdoc help-fns radix-tree help-mode
timeclock whitespace tab-line hl-line display-line-numbers apropos
mule-util info misearch multi-isearch shell pcomplete comint ansi-color
ring eww xdg url-queue thingatpt shr pixel-fill kinsoku svg xml dom
browse-url url url-proxy url-privacy url-expand url-methods url-history
url-cookie url-domsuf url-util url-parse auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map url-vars
mailcap puny mm-url gnus nnheader gnus-util time-date mail-utils mm-util
mail-prsvr cus-edit cus-start cus-load wid-edit bug-reference noutline
outline view dired-aux dired dired-loaddefs bookmark seq gv subr-x
byte-opt bytecomp byte-compile cconv text-property-search pp vc-git
diff-mode easy-mmode vc-dispatcher cl-loaddefs cl-lib wombat-theme
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget keymap hashtable-print-readable backquote threads
dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting cairo x-toolkit x multi-tty make-network-process
native-compile emacs)

Memory information:
((conses 16 243241 13740)
 (symbols 48 13426 1)
 (strings 32 76179 7118)
 (string-bytes 1 2010021)
 (vectors 16 46033)
 (vector-slots 8 1404983 181934)
 (floats 8 175 208)
 (intervals 56 6466 5425)
 (buffers 992 39))



* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2021-12-31 17:55 bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry Van Ly
@ 2022-01-03 13:54 ` Eli Zaretskii
  2022-01-04 15:13   ` Van Ly
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Eli Zaretskii @ 2022-01-03 13:54 UTC (permalink / raw)
  To: Van Ly; +Cc: 52918

> Date: Fri, 31 Dec 2021 17:55:01 +0000 (UTC)
> From: Van Ly <van.ly@sdf.org>
> 
> I was looking in the admin/notes subdirectory of the master branch
> and found the file 'unicode'.  It lists the files taken from the UCD,
> but leaves out:
> 
>    . Unihan_Readings.txt
> 
> Just as quail-show-key shows in the echo area the key sequence needed
> to type a character with the current input method, could there be a
> function called quail-show-unihan that shows in the echo area the
> kDefinition entry for an East Asian character from
> ucd/Unihan_Readings.txt?

Yes, this could be added to Emacs, and IMO would be a useful feature.

Suggested implementation:

  . import the Unihan_Readings.txt file into Emacs
  . add Makefile rules to produce a uni-unihan-readings.el file from
    Unihan_Readings.txt, which defines a char-table where each
    character has its kDefinition property value
  . code a minor mode which will show in the echo area the value of
    the kDefinition property, if any, of the character at point

Patches welcome.






* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2022-01-03 13:54 ` Eli Zaretskii
@ 2022-01-04 15:13   ` Van Ly
  2022-01-17 18:25   ` Van Ly
       [not found]   ` <Pine.NEB.4.64.2201230208400.10119@faeroes.freeshell.org>
  2 siblings, 0 replies; 10+ messages in thread
From: Van Ly @ 2022-01-04 15:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52918

[-- Attachment #1: Type: text/plain, Size: 475 bytes --]

On Mon, 3 Jan 2022, Eli Zaretskii wrote:

>
> Suggested implementation:
>
>  . import the Unihan_Readings.txt file into Emacs
>
> Patches welcome.
>

Attached is a diff for admin/unidata/README that records the source of
Unihan_Readings.txt as

=> https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip

The version-specific alternatives are

=> https://www.unicode.org/Public/14.0.0/ucd/Unihan.zip
=> https://www.unicode.org/Public/15.0.0/ucd/Unihan-15.0.0d1.zip

-- 
vl

[-- Attachment #2: Type: text/plain, Size: 410 bytes --]

diff --git a/admin/unidata/README b/admin/unidata/README
index 4b8444b0fe..d56677f90c 100644
--- a/admin/unidata/README
+++ b/admin/unidata/README
@@ -48,3 +48,8 @@ https://www.unicode.org/Public/emoji/14.0/emoji-sequences.txt
 emoji-test.txt
 https://unicode.org/Public/emoji/14.0/emoji-test.txt
 2021-10-28
+
+Unihan.zip
+https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip
+2022-01-05
+



* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2022-01-03 13:54 ` Eli Zaretskii
  2022-01-04 15:13   ` Van Ly
@ 2022-01-17 18:25   ` Van Ly
  2022-01-18 11:30     ` Van Ly
       [not found]   ` <Pine.NEB.4.64.2201230208400.10119@faeroes.freeshell.org>
  2 siblings, 1 reply; 10+ messages in thread
From: Van Ly @ 2022-01-17 18:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52918

[-- Attachment #1: Type: text/plain, Size: 1118 bytes --]

On Mon, 3 Jan 2022, Eli Zaretskii wrote:

>
> Suggested implementation:
>
>  . add Makefile rules to produce a uni-unihan-readings.el file from
>    Unihan_Readings.txt, which defines a char-table where each
>    character has its kDefinition property value
>

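A candidate for the Makefile rule that produces uni-unihan-readings.el
is the following script (head -n 3 limits the output to the first
three entries for demonstration):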

'''
#!/bin/sh
X='/usr/X/Projects/emacs-28.0.91/admin/unidata/Unihan_Readings.txt'
grep -F 'kDefinition' "$X" | sed -e '/^#/d' -e 's/^../#x/' | head -n 3 |
awk -F '\t' 'BEGIN {printf("(defvar readings-table\n\t(make-char-table '\''readings-table nil)\n\t\"Char table of definitions for East Asian characters.\")\n")}
{printf("(aset readings-table %s \"%s\")\n", $1, $3)}'
'''

The result is

'''
(defvar readings-table
	(make-char-table 'readings-table nil)
	"Char table of definitions for East Asian characters.")
(aset readings-table #x3400 "(same as U+4E18 丘) hillock or mound")
(aset readings-table #x3401 "to lick; to taste, a mat, bamboo bark")
(aset readings-table #x3402 "(J) non-standard form of U+559C 喜, to like, love, enjoy; a joyful thing")
'''
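
The pipeline above prints only three entries and copies each value
into a Lisp string unchanged, so a value containing a double quote or
a backslash would produce invalid Lisp.  Below is a minimal sketch of a
variant that processes the whole file and escapes those characters,
assuming they may occur (whether they do in current Unihan data is not
checked here).  The function name make_readings_table is hypothetical.
It matches the field name exactly instead of searching the whole line,
and also writes the no-byte-compile cookie proposed later in this
thread:

```shell
#!/bin/sh
# make_readings_table FILE
# Hypothetical generator: write Emacs Lisp to stdout that fills
# readings-table, a char-table, with the kDefinition value of every
# character listed in FILE (a tab-separated Unihan_Readings.txt file).
make_readings_table () {
    awk -F '\t' '
        # Escape backslashes and double quotes so each value can be
        # emitted inside a Lisp string literal.
        function lisp_escape(s,    out, i, c) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                if (c == "\\" || c == "\"")
                    out = out "\\"
                out = out c
            }
            return out
        }
        BEGIN {
            print ";; -*- no-byte-compile: t; -*-"
            print "(defvar readings-table"
            print "  (make-char-table '\''readings-table nil)"
            print "  \"Char table of definitions for East Asian characters.\")"
        }
        /^#/ { next }
        $2 == "kDefinition" {
            cp = $1
            sub(/^U[+]/, "#x", cp)   # U+3400 becomes #x3400
            printf("(aset readings-table %s \"%s\")\n", cp, lisp_escape($3))
        }
    ' "$1"
}
```

Usage would be "make_readings_table Unihan_Readings.txt >
uni-unihan-readings.el".  The escaping uses a character loop rather
than gsub because awk implementations differ in how they treat
backslashes in gsub replacement strings.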

-- 
vl


* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2022-01-17 18:25   ` Van Ly
@ 2022-01-18 11:30     ` Van Ly
  0 siblings, 0 replies; 10+ messages in thread
From: Van Ly @ 2022-01-18 11:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52918


Add an entry to etc/TODO for the part of the suggested implementation
that remains to be done.

'''
diff -u --label /usr/X/Projects/emacs-28.0.91/etc/TODO --label \#\<buffer\ TODO\> /usr/X/Projects/emacs-28.0.91/etc/TODO /dev/shm/buffer-content-Q1ArDD
--- /usr/X/Projects/emacs-28.0.91/etc/TODO
+++ #<buffer TODO>
@@ -747,6 +747,9 @@

 ** Add definitions for symbol properties, for documentation purposes

+** Make use of char-table for reading definitions from ucd/Unihan_Readings.txt
+See bug#52918.
+
 ** Temporarily remove scroll bars when they are not needed
 Typically when a buffer can be fully displayed in its window.


Diff finished.  Tue Jan 18 22:22:52 2022

'''

-- 
vl







* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
       [not found]   ` <Pine.NEB.4.64.2201230208400.10119@faeroes.freeshell.org>
@ 2022-01-23  6:00     ` Eli Zaretskii
  2022-01-23 11:22       ` Van Ly
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2022-01-23  6:00 UTC (permalink / raw)
  To: Van Ly; +Cc: 52918

> Date: Sun, 23 Jan 2022 02:15:06 +0000 (UTC)
> From: Van Ly <van.ly@sdf.org>
> cc: 52918@debbugs.gnu.org
> 
> On Mon, 3 Jan 2022, Eli Zaretskii wrote:
> 
> >>    . Unihan_Readings.txt
> >>
> >> Just as quail-show-key shows in the echo area the key sequence needed
> >> to type a character with the current input method, could there be a
> >> function called quail-show-unihan that shows in the echo area the
> >> kDefinition entry for an East Asian character from
> >> ucd/Unihan_Readings.txt?
> >
> > Yes, this could be added to Emacs, and IMO would be a useful feature.
> >
> > Suggested implementation:
> >
> >  . import the Unihan_Readings.txt file into Emacs
> >  . add Makefile rules to produce a uni-unihan-readings.el file from
> >    Unihan_Readings.txt, which defines a char-table where each
> >    character has its kDefinition property value
> >  . code a minor mode which will show in the echo area the value of
> >    the kDefinition property, if any, of the character at point
> >
> > Patches welcome.
> >
> 
> See patch attached.
> 
> Two of the three suggested implementation steps are done.

Thanks.

You don't seem to have copyright assignment on file, and without that
we cannot accept such large contributions.  Would you like to start
your legal paperwork now?  If so, I will send you the form and the
instructions.






* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2022-01-23  6:00     ` Eli Zaretskii
@ 2022-01-23 11:22       ` Van Ly
  2022-01-23 11:40         ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Van Ly @ 2022-01-23 11:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52918

On Sun, 23 Jan 2022, Eli Zaretskii wrote:

>>>
>>> Patches welcome.
>>>
>>
>> See patch attached.
>>
>> Two of the three suggested implementation steps are done.
>
> Thanks.
>
> You don't seem to have copyright assignment on file, and without that
> we cannot accept such large contributions.  Would you like to start
> your legal paperwork now?  If so, I will send you the form and the
> instructions.
>

I sent an email to assign@gnu.org in the 24 hours before this patch
was submitted.  I was hoping this patch would fall below the 15-line
limit and not need the formality of the legal paperwork.  The minor
mode contribution would go over the limit, which is why I sent the
request to assign copyright.  The best case is a two-week wait.

The generated uni-unihan-readings.el will need a line like the
following, so that it is not byte-compiled:

diff --git a/admin/unidata/Unihan_Readings.awk b/admin/unidata/Unihan_Readings.awk
index cf319449e59..f01c75b88f9 100644
--- a/admin/unidata/Unihan_Readings.awk
+++ b/admin/unidata/Unihan_Readings.awk
@@ -1,5 +1,6 @@
 BEGIN {
     FS="	"
+    printf(";; -*-no-byte-compile: t; -*-\n")
     printf("(defvar readings-table\n\
 	(make-char-table 'readings-table nil)\n\
 	\"Char table of definitions for East Asian characters.\")\n")


-- 
vl







* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2022-01-23 11:22       ` Van Ly
@ 2022-01-23 11:40         ` Eli Zaretskii
  2023-07-25 15:44           ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2022-01-23 11:40 UTC (permalink / raw)
  To: Van Ly; +Cc: 52918

> Date: Sun, 23 Jan 2022 11:22:03 +0000 (UTC)
> From: Van Ly <van.ly@sdf.org>
> cc: 52918@debbugs.gnu.org
> 
> > You don't seem to have copyright assignment on file, and without that
> > we cannot accept such large contributions.  Would you like to start
> > your legal paperwork now?  If so, I will send you the form and the
> > instructions.
> >
> 
> I sent an email to assign@gnu.org in the 24 hours before this patch
> was submitted.  I was hoping this patch would fall below the 15-line
> limit and not need the formality of the legal paperwork.  The minor
> mode contribution would go over the limit, which is why I sent the
> request to assign copyright.  The best case is a two-week wait.
> 
> The generated uni-unihan-readings.el will need a line like the
> following, so that it is not byte-compiled:

Thanks, I prefer to wait until your assignment is in place, and you
can then submit the final pieces to make this feature complete.






* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2022-01-23 11:40         ` Eli Zaretskii
@ 2023-07-25 15:44           ` Eli Zaretskii
  2023-07-25 18:18             ` Van Ly via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2023-07-25 15:44 UTC (permalink / raw)
  To: van.ly; +Cc: 52918

[-- Attachment #1: Type: text/plain, Size: 2144 bytes --]

> Date: Sun, 23 Jan 2022 13:40:23 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 52918@debbugs.gnu.org
> 
> > Date: Sun, 23 Jan 2022 11:22:03 +0000 (UTC)
> > From: Van Ly <van.ly@sdf.org>
> > cc: 52918@debbugs.gnu.org
> > 
> > > You don't seem to have copyright assignment on file, and without that
> > > we cannot accept such large contributions.  Would you like to start
> > > your legal paperwork now?  If so, I will send you the form and the
> > > instructions.
> > >
> > 
> > I sent an email to assign@gnu.org in the 24 hours before this patch
> > was submitted.  I was hoping this patch would fall below the 15-line
> > limit and not need the formality of the legal paperwork.  The minor
> > mode contribution would go over the limit, which is why I sent the
> > request to assign copyright.  The best case is a two-week wait.
> > 
> > The generated uni-unihan-readings.el will need a line like the
> > following, so that it is not byte-compiled:
> 
> Thanks, I prefer to wait until your assignment is in place, and you
> can then submit the final pieces to make this feature complete.

<Time passes...>

> Date: Tue, 25 Jul 2023 14:47:52 GMT
> From: Van Ly <van.ly@sdf.org>
> 
> More than 18 months ago I left hanging, in one of the bug report
> threads, the suggestion to include a readings table for CJKV
> characters from Unicode.
> 
> At the time I hadn't done the paperwork, and I posted the Awk
> transformer script, which was fewer than 16 lines long and generated
> the 21346-line readings table.  See attached.
> 
> I have since done the paperwork, and I was prompted to either get this
> done or close the bug report on seeing that line 2761 of the configure
> script for 29.1 has an option to generate a smaller Japanese
> dictionary.
> 
> I have since misplaced the Awk script, but it should be somewhere in
> the bug report, unless details older than 12 months have been purged.

Details were not purged, but please look at the past discussions of
this bug and tell us where in them we should look for the Awk script.

I forward below the attachments you sent to me in private email;
please continue discussing this issue in this thread, not separately
and not in private email to me.

Thanks.


[-- Attachment #2: Unihan_Readings.el --]
[-- Type: application/emacs-lisp, Size: 1436342 bytes --]

[-- Attachment #3: create-readings-table.sh --]
[-- Type: application/x-sh, Size: 199 bytes --]

[-- Attachment #4: example-configuration.el --]
[-- Type: application/emacs-lisp, Size: 1178 bytes --]


* bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
  2023-07-25 15:44           ` Eli Zaretskii
@ 2023-07-25 18:18             ` Van Ly via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 10+ messages in thread
From: Van Ly via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-07-25 18:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52918


> Date: Tue, 25 Jul 2023 18:44:01 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > Date: Sun, 23 Jan 2022 13:40:23 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> 

 <Time passes...>

> 
> > Date: Tue, 25 Jul 2023 14:47:52 GMT
> > From: Van Ly <van.ly@sdf.org>
> > 
> > I have since done the paperwork, and I was prompted to either get this
> > done or close the bug report on seeing that line 2761 of the configure
> > script for 29.1 has an option to generate a smaller Japanese
> > dictionary.
> > 
> > I have since misplaced the Awk script, but it should be somewhere in
> > the bug report, unless details older than 12 months have been purged.
> 
> Details were not purged, but please look at the past discussions of
> this bug and tell where in it should we look for the Awk script.
> 

The patch is at X (URL below), and the Awk script in it is as follows:

'''
BEGIN {
    FS = "\t"
    printf("(defvar readings-table\n\
	(make-char-table 'readings-table nil)\n\
	\"Char table of definitions for East Asian characters.\")\n")
}
/^#/ { next }
/kDefinition/ {
    sub(/^../, "#x", $1)
    printf("(aset readings-table %s \"%s\")\n", $1, $3)
}

# Local Variables:
# indent-tabs-mode: t
# tab-width: 8
# End:
'''

 X
    https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-01/msg01393.html
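
Because the generated table is large (the 21346 lines mentioned
above), a quick consistency check before committing it may be useful.
A minimal sketch, assuming the generated file has one
(aset readings-table ...) form per line, as the script above produces;
check_readings_table is a hypothetical name:

```shell
#!/bin/sh
# check_readings_table TXT EL
# Hypothetical check: succeed if EL contains exactly one
# "(aset readings-table ..." line per kDefinition entry in TXT
# (comment lines in TXT are ignored).  Prints both counts.
check_readings_table () {
    want=$(awk -F '\t' '!/^#/ && $2 == "kDefinition" { n++ } END { print n + 0 }' "$1")
    got=$(grep -c '^(aset readings-table ' "$2")
    echo "kDefinition entries: $want, aset forms: $got"
    [ "$want" -eq "$got" ]
}
```

A mismatch would point to entries dropped or duplicated during
generation, for example through a line that matches /kDefinition/
outside the field-name column.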







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
