* OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-29 20:18 UTC
To: emacs-devel
The following takes 15 seconds on my laptop:
(skkdic-convert "~/src/emacs/trunk/leim/SKK-DIC/SKK-JISYO.L" "/tmp/")
This generates the lisp/leim/ja-dic/ja-dic.el file in a normal (or
bootstrap) build.
I wonder whether anybody's taken a look at speeding that up lately?
Because it's a bit annoying (especially since make isn't doing much else
while parsing/generating the dictionary).
We could, alternatively, just include the generated file in git (because
it seldom changes), but... On the one hand, it would save a lot of
electricity, but on the other hand, including generated files in git is
a bit tedious.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-29 20:22 UTC
To: emacs-devel
Lars Ingebrigtsen <larsi@gnus.org> writes:
> Because it's a bit annoying (especially since make isn't doing much else
> while parsing/generating the dictionary).
Or perhaps we could just move generating that file until after we've
dumped Emacs for the final time? It would make things more parallel,
and we don't need ja-dic.el before we dump? Or do we?
* Re: OKURI-NASI
From: Eli Zaretskii @ 2022-05-30 2:28 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 29 May 2022 22:22:25 +0200
>
> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
> > Because it's a bit annoying (especially since make isn't doing much else
> > while parsing/generating the dictionary).
>
> Or perhaps we could just move generating that file until after we've
> dumped Emacs for the final time? It would make things more parallel,
> and we don't need ja-dic.el before we dump? Or do we?
I don't know if we need it that early. The only way to know is to
try. We seem to need it before loaddefs.el is generated, at least.
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-30 9:47 UTC
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> I don't know if we need it that early. The only way to know is to
> try. We seem to need it before loaddefs.el is generated, at least.
Do we need the ja-dic.el file before loaddefs.el? We need leim-list.el,
I think, and leim/Makefile generates both of those files, so I think if
we just decouple those things, then we can move the ja-dic.el
compilation later.
But the dependencies here aren't quite clear, and I may be misreading
the code and makefiles here.
* Re: OKURI-NASI
From: Eli Zaretskii @ 2022-05-30 11:39 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 30 May 2022 11:47:47 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I don't know if we need it that early. The only way to know is to
> > try. We seem to need it before loaddefs.el is generated, at least.
>
> Do we need the ja-dic.el file before loaddefs.el?
Sorry, we don't. I was confused/coffee-challenged.
> We need leim-list.el, I think, and leim/Makefile generates both of
> those files, so I think if we just decouple those things, then we
> can move the ja-dic.el compilation later.
Does moving it later speed up its compilation considerably?
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-30 11:44 UTC
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> We need leim-list.el, I think, and leim/Makefile generates both of
>> those files, so I think if we just decouple those things, then we
>> can move the ja-dic.el compilation later.
>
> Does moving it later speed up its compilation considerably?
Yes -- since lisp depends on leim, we're stalling in a single-threaded
job after finishing bootstrap, but before doing lisp, I think? So we're
only using one CPU instead of all of them for 15 seconds.
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-30 11:49 UTC
To: Eli Zaretskii; +Cc: emacs-devel
Lars Ingebrigtsen <larsi@gnus.org> writes:
> Yes -- since lisp depends on leim, we're stalling in a single-threaded
> job after finishing bootstrap, but before doing lisp, I think? So we're
> only using one CPU instead of all of them for 15 seconds.
(Or before bootstrap -- but we don't have anything else to schedule then
either.)
* Re: OKURI-NASI
From: Eli Zaretskii @ 2022-05-30 12:08 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 30 May 2022 13:44:33 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> We need leim-list.el, I think, and leim/Makefile generates both of
> >> those files, so I think if we just decouple those things, then we
> >> can move the ja-dic.el compilation later.
> >
> > Does moving it later speed up its compilation considerably?
>
> Yes -- since lisp depends on leim, we're stalling in a single-threaded
> job after finishing bootstrap, but before doing lisp, I think? So we're
> only using one CPU instead of all of them for 15 seconds.
Why is that job single-threaded? leim has more than one file to
build, so maybe we should rearrange the Makefile to allow better
parallelism? IOW, perhaps the single-threaded job you see is simply
the result of all the other sub-jobs in leim being completed, and this
is the only one that's left.
Or maybe remove ja-dic.el from the 'all:' target in leim/Makefile and
move it to lisp/Makefile, where it will compete for CPU units with
more jobs?
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-30 15:22 UTC
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> Why is that job single-threaded? leim has more than one file to
> build, so maybe we should rearrange the Makefile to allow better
> parallelism? IOW, perhaps the single-threaded job you see is simply
> the result of all the other sub-jobs in leim being completed, and this
> is the only one that's left.
I think the other ones take so little time that the ja-dic.el job runs
by itself most of the time.
> Or maybe remove ja-dic.el from the 'all:' target in leim/Makefile and
> move it to lisp/Makefile, where it will compete for CPU units with
> more jobs?
Yes, that's what I was thinking, but I wasn't sure how to express that
in Makefilese in the most idiomatic way, so if somebody else could do
that, that'd be nice.
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-06-01 2:54 UTC
To: Eli Zaretskii; +Cc: emacs-devel
Lars Ingebrigtsen <larsi@gnus.org> writes:
> Yes, that's what I was thinking, but I wasn't sure how to express that
> in Makefilese in the most idiomatic way, so if somebody else could do
> that, that'd be nice.
I've now poked at this, but my Makefile-fu is low, so if somebody wants
to fix this in a different way, please be my guest.
The change (making the OKURI-NASI stuff happen concurrently with the
main .elc build) takes "make -j32 bootstrap" down from 1m52s to 1m34s.
Hm... oh! There's a problem there now -- it generates the ja-dic.el
file fine, but it doesn't byte-compile it (and I guess it should)? So
there needs to be an extra rule for that... uhm... probably in
leim/Makefile, I guess.
* Re: OKURI-NASI
From: Stefan Kangas @ 2022-05-30 7:31 UTC
To: Lars Ingebrigtsen; +Cc: Emacs developers
Lars Ingebrigtsen <larsi@gnus.org> writes:
> We could, alternatively, just include the generated file in git (because
> it seldom changes), but... On the one hand, it would save a lot of
> electricity, but on the other hand, including generated files in git is
> a bit tedious.
I currently sync SKK-JISYO.L against upstream once a month [but there
have been no changes to this file since December last year]. It should
be easy to include a step recreating the generated file and committing
that too, if that's what we want.
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-30 9:53 UTC
To: Stefan Kangas; +Cc: Emacs developers
Stefan Kangas <stefan@marxist.se> writes:
> I currently sync SKK-JISYO.L against upstream once a month [but there
> have been no changes to this file since December last year]. It should
> be easy to include a step recreating the generated file and committing
> that too, if that's what we want.
It is extra work, though, and any work that can be avoided is nice. 😀
Doing some very light profiling here, a lot of the time is taken up by
skkdic-get-entry, which is just lookup-nested-alist. My guess is that
if somebody took a look at ja-dic-cnv.el, this algorithm could be made
substantially more efficient by using other data structures than an
extremely long nested alist.
But I really have no idea what it's doing, so it's a bit daunting
to start poking at the code. And my guess is that's why nobody else
has, either, since not many of the people currently hacking on Emacs
have the required domain knowledge.
;;; Commentary:
;; SKK is a Japanese input method running on Mule created by Masahiko
;; Sato <masahiko@sato.riec.tohoku.ac.jp>. Here we provide utilities
;; to handle a dictionary distributed with SKK so that a different
;; input method (e.g. quail-japanese) can utilize the dictionary.
;; The format of SKK dictionary is quite simple. Each line has the
;; form "KANASTRING /CONV1/CONV2/.../" which means KANASTRING (仮名文
;; 字列) can be converted to one of CONVi. CONVi is a Kanji (漢字)
;; and Kana (仮名) mixed string.
;;
;; KANASTRING may have a trailing ASCII letter for Okurigana (送り仮名)
;; information. For instance, the trailing letter `k' means that one
;; of the following Okurigana is allowed: かきくけこ. So, in that
;; case, the string "KANASTRINGく" can be converted to one of "CONV1く",
;; CONV2く, ...
Well, that doesn't sound all that complicated, eh?
(I'm hoping to entice somebody to see this as a fun challenge. 😀)
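For readers without the domain background, here is a minimal Python sketch of the line format that commentary describes. It assumes every entry line looks exactly like "KANASTRING /CONV1/CONV2/.../" and that lines starting with ";" are comments; the function names and the one-row OKURI_TABLE are hypothetical, and this is not how ja-dic-cnv.el itself is written.

```python
# Minimal sketch of the SKK dictionary line format quoted above.
# Assumptions (not taken from ja-dic-cnv.el): lines are already-decoded
# strings, comment lines start with ";", and OKURI_TABLE only covers the
# `k' row mentioned in the commentary.

OKURI_TABLE = {"k": "かきくけこ"}  # hypothetical and deliberately incomplete


def parse_skk_line(line):
    """Split "KANASTRING /CONV1/CONV2/.../" into (kana, [conv, ...]).

    Returns None for blank lines and comment lines."""
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    kana, _, rest = line.partition(" ")
    # Candidates are delimited by "/"; drop the empty leading/trailing pieces.
    candidates = [c for c in rest.split("/") if c]
    return kana, candidates


def expand_okuri(kana, candidates):
    """Yield (reading, conversion) pairs for one dictionary entry.

    If KANA ends in an ASCII letter (okuri-ari), each allowed okurigana is
    appended to both the stem and every candidate; otherwise (okuri-nasi)
    the pairs are yielded unchanged."""
    letter = kana[-1]
    if not ("a" <= letter <= "z"):
        for conv in candidates:
            yield kana, conv
        return
    stem = kana[:-1]
    for okuri in OKURI_TABLE.get(letter, ""):
        for conv in candidates:
            yield stem + okuri, conv + okuri
```

For example, parse_skk_line("かんじ /漢字/幹事/") gives ("かんじ", ["漢字", "幹事"]), and an invented okuri-ari entry "おおk" with the single candidate "大" expands to five pairs, from ("おおか", "大か") through ("おおこ", "大こ").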
* Re: OKURI-NASI
From: Stefan Monnier @ 2022-05-30 13:06 UTC
To: Lars Ingebrigtsen; +Cc: Stefan Kangas, Emacs developers
> Doing some very light profiling here, a lot of the time is taken up by
> skkdic-get-entry, which is just lookup-nested-alist.
Odd: `skkdic-get-entry` didn't even appear in the profile I got (and
`lookup-nested-alist` was dwarfed by other things):
5412 79% - normal-top-level
5346 78% - command-line
5345 78% - command-line-1
5343 78% - skkdic-convert
2770 40% - skkdic-convert-okuri-nasi
2751 40% - skkdic-reduced-candidates
2699 39% - skkdic-breakup-string
707 10% - skkdic-breakup-string
7 0% - skkdic-breakup-string
2 0% - skkdic-breakup-string
1 0% skkdic-breakup-string
1915 28% - skkdic-collect-okuri-nasi
81 1% skkdic-get-candidate-list
3 0% lookup-nested-alist
47 0% skkdic-convert-okuri-ari
36 0% skkdic-convert-prefix
34 0% - save-buffer
34 0% - basic-save-buffer
33 0% - basic-save-buffer-1
33 0% - basic-save-buffer-2
26 0% - write-region
26 0% - select-safe-coding-system
1 0% find-auto-coding
1 0% - find-coding-systems-region
1 0% - sort-coding-systems
1 0% #<compiled -0x6cdb28b2d51a5ea>
1 0% - vc-before-save
1 0% - vc-backend
1 0% - vc-registered
1 0% - mapc
1 0% - #<compiled 0x1c10f50111ddf347>
1 0% - vc-call-backend
1 0% - vc-svn-registered
1 0% - let
1 0% - if
1 0% - vc-find-root
1 0% locate-dominating-file
31 0% - set-visited-file-name
26 0% - set-auto-mode
25 0% - set-auto-mode--apply-alist
25 0% - set-auto-mode-0
25 0% - emacs-lisp-mode
25 0% - run-mode-hooks
19 0% - hack-local-variables
19 0% - hack-local-variables-apply
19 0% - hack-one-local-variable
19 0% - bug-reference-prog-mode
18 0% - jit-lock-register
18 0% jit-lock-mode
1 0% defalias
6 0% - run-hooks
6 0% - global-font-lock-mode-enable-in-buffers
6 0% - turn-on-font-lock-if-desired
6 0% - turn-on-font-lock
6 0% - font-lock-mode
6 0% - font-lock-default-function
6 0% - font-lock-mode-internal
6 0% - font-lock-turn-on-thing-lock
6 0% - jit-lock-register
6 0% jit-lock-mode
1 0% hack-local-variables
5 0% - hack-local-variables
5 0% - hack-local-variables-apply
4 0% - hack-one-local-variable
4 0% - bug-reference-prog-mode
4 0% - jit-lock-register
4 0% jit-lock-mode
10 0% skkdic-convert-postfix
65 0% - startup--honor-delayed-native-compilations
61 0% - startup--require-comp-safely
44 0% - byte-code
42 0% - require
30 0% - do-after-load-evaluation
29 0% - elisp--font-lock-flush-elisp-buffers
29 0% font-lock-flush
6 0% - byte-code
6 0% - require
5 0% - do-after-load-evaluation
4 0% - elisp--font-lock-flush-elisp-buffers
4 0% font-lock-flush
1 0% defalias
6 0% - do-after-load-evaluation
6 0% - elisp--font-lock-flush-elisp-buffers
6 0% font-lock-flush
1 0% - native--compile-async
1 0% - comp-run-async-workers
1 0% - write-region
1 0% - select-safe-coding-system
1 0% find-auto-coding
1 0% #<compiled -0x114e6bbdb5ca73f0>
1364 20% - ...
1364 20% Automatic GC
8 0% + redisplay_internal (C function)
6 0% + command-execute
Of that profile I mostly see:
2770 40% - skkdic-convert-okuri-nasi
1915 28% - skkdic-collect-okuri-nasi
1364 20% Automatic GC
and AFAICT there's not much that can be optimized in
`skkdic-collect-okuri-nasi` (assuming my profile is mostly accurate)
since it spends most of its time just converting the source file into
a usable Lisp data structure. Also, I suspect that most of the GC time
comes from the "convert" part (based on the mem profile which shows it
allocates about 70% vs 30%), so if we factor GC time into it, it's
probably more like
3770 - skkdic-convert-okuri-nasi
2279 - skkdic-collect-okuri-nasi
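The adjusted figures are consistent with charging 1000 of the 1364 GC samples to skkdic-convert-okuri-nasi and the remaining 364 to skkdic-collect-okuri-nasi, i.e. roughly the 70/30 allocation split. The snippet below just redoes that arithmetic; the exact 1000/364 split is inferred from the adjusted totals, not stated in the message.

```python
# Re-attribute the "Automatic GC" samples from the profile above.
# The 1000/364 split is inferred from the adjusted totals (3770 and 2279);
# the message itself only says allocation is "about 70% vs 30%".
profile = {
    "skkdic-convert-okuri-nasi": 2770,
    "skkdic-collect-okuri-nasi": 1915,
}
gc_samples = 1364
gc_share = {
    "skkdic-convert-okuri-nasi": 1000,
    "skkdic-collect-okuri-nasi": gc_samples - 1000,  # 364
}

# Add each function's share of GC time to its own sample count.
adjusted = {name: profile[name] + gc_share[name] for name in profile}
for name, samples in adjusted.items():
    print(f"{samples:5d} - {name}")

# Fraction of the GC time charged to the "convert" part (about 73%).
convert_fraction = gc_share["skkdic-convert-okuri-nasi"] / gc_samples
print(f"convert share of GC: {convert_fraction:.0%}")
```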
> My guess is that if somebody took a look at ja-dic-cnv.el, this algorithm
> could be made substantially more efficient by using other data
> structures than an extremely long nested alist.
I believe those (nested) alists shouldn't be that long (IIUC it's
a trie-like data-structure, a bit like keymaps, so even with many
entries in total, the total depth of the tree should be fairly short and
the length of each list (i.e. the out degree of each node) shouldn't be
very large either).
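To illustrate the trie-like structure described above, here is a small Python sketch using nested dicts with one character per level, so a lookup takes one step per character of the key instead of a scan over all entries. It is only an analogy: ja-dic-cnv.el uses nested alists through lookup-nested-alist, and the Trie class and its method names are hypothetical.

```python
# Trie sketch: one level per character, like a keymap or nested alist.
# Hypothetical illustration; not the data structure ja-dic-cnv.el builds.

class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, key, value):
        node = self.root
        for ch in key:
            # Each node maps a character to its child node.
            node = node.setdefault(ch, {})
        # Values live under the sentinel None, which can never be a
        # character of a string key.
        node.setdefault(None, []).append(value)

    def lookup(self, key):
        """Return the values stored exactly at KEY, or [] if none."""
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return []
        return list(node.get(None, []))

    def longest_prefix(self, text):
        """Length of the longest prefix of TEXT that has stored values."""
        node, best = self.root, 0
        for i, ch in enumerate(text):
            node = node.get(ch)
            if node is None:
                break
            if None in node:
                best = i + 1
        return best
```

With nested alists instead of dicts, each step becomes a linear search through one node's children, which stays cheap as long as the out degree of each node is small. A longest-prefix query like the one above is the kind of operation that splitting a string into known dictionary pieces needs.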
Stefan
* Re: OKURI-NASI
From: Lars Ingebrigtsen @ 2022-05-30 13:19 UTC
To: Stefan Monnier; +Cc: Stefan Kangas, Emacs developers
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> Doing some very light profiling here, a lot of the time is taken up by
>> skkdic-get-entry, which is just lookup-nested-alist.
>
> Odd: `skkdic-get-entry` didn't even appear in the profile I got (and
> `lookup-nested-alist` was dwarfed by other things):
It's a defsubst, so it gets inlined when the code is byte-compiled; I
was profiling uncompiled code:
6278 48% - skkdic-breakup-string
6262 48% - let
6258 48% - or
6242 48% - and
5871 45% - let
5763 44% - while
5735 44% - let
3995 30% + skkdic-get-entry
>> My guess is that if somebody took a look at ja-dic-cnv.el, this algorithm
>> could be made substantially more efficient by using other data
>> structures than an extremely long nested alist.
>
> I believe those (nested) alists shouldn't be that long (IIUC it's
> a trie-like data-structure, a bit like keymaps, so even with many
> entries in total, the total depth of the tree should be fairly short and
> the length of each list (i.e. the out degree of each node) shouldn't be
> very large either).
Hm, right...