unofficial mirror of emacs-devel@gnu.org 
* OKURI-NASI
@ 2022-05-29 20:18 Lars Ingebrigtsen
  2022-05-29 20:22 ` OKURI-NASI Lars Ingebrigtsen
  2022-05-30  7:31 ` OKURI-NASI Stefan Kangas
  0 siblings, 2 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-29 20:18 UTC
  To: emacs-devel

The following takes 15 seconds on my laptop:

(skkdic-convert "~/src/emacs/trunk/leim/SKK-DIC/SKK-JISYO.L" "/tmp/")

This generates the lisp/leim/ja-dic/ja-dic.el file in a normal (or
bootstrap) build.

I wonder whether anybody's taken a look at speeding that up lately?
Because it's a bit annoying (especially since make isn't doing much else
while parsing/generating the dictionary).

We could, alternatively, just include the generated file in git (because
it seldom changes), but...  On the one hand, it would save a lot of
electricity, but on the other hand, including generated files in git is
a bit tedious.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





* Re: OKURI-NASI
  2022-05-29 20:18 OKURI-NASI Lars Ingebrigtsen
@ 2022-05-29 20:22 ` Lars Ingebrigtsen
  2022-05-30  2:28   ` OKURI-NASI Eli Zaretskii
  2022-05-30  7:31 ` OKURI-NASI Stefan Kangas
  1 sibling, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-29 20:22 UTC
  To: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Because it's a bit annoying (especially since make isn't doing much else
> while parsing/generating the dictionary).

Or perhaps we could just postpone generating that file until after we've
dumped Emacs for the final time?  It would make things more parallel,
and we don't need ja-dic.el before we dump?  Or do we?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: OKURI-NASI
  2022-05-29 20:22 ` OKURI-NASI Lars Ingebrigtsen
@ 2022-05-30  2:28   ` Eli Zaretskii
  2022-05-30  9:47     ` OKURI-NASI Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-30  2:28 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 29 May 2022 22:22:25 +0200
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> > Because it's a bit annoying (especially since make isn't doing much else
> > while parsing/generating the dictionary).
> 
> Or perhaps we could just postpone generating that file until after we've
> dumped Emacs for the final time?  It would make things more parallel,
> and we don't need ja-dic.el before we dump?  Or do we?

I don't know if we need it that early.  The only way to know is to
try.  We seem to need it before loaddefs.el is generated, at least.




* Re: OKURI-NASI
  2022-05-29 20:18 OKURI-NASI Lars Ingebrigtsen
  2022-05-29 20:22 ` OKURI-NASI Lars Ingebrigtsen
@ 2022-05-30  7:31 ` Stefan Kangas
  2022-05-30  9:53   ` OKURI-NASI Lars Ingebrigtsen
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Kangas @ 2022-05-30  7:31 UTC
  To: Lars Ingebrigtsen; +Cc: Emacs developers

Lars Ingebrigtsen <larsi@gnus.org> writes:

> We could, alternatively, just include the generated file in git (because
> it seldom changes), but...  On the one hand, it would save a lot of
> electricity, but on the other hand, including generated files in git is
> a bit tedious.

I currently synch SKK-JISYO.L against upstream once a month [but there
have been no changes to this file since December last year].  It should
be easy to include a step recreating the generated file and committing
that too, if that's what we want.




* Re: OKURI-NASI
  2022-05-30  2:28   ` OKURI-NASI Eli Zaretskii
@ 2022-05-30  9:47     ` Lars Ingebrigtsen
  2022-05-30 11:39       ` OKURI-NASI Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-30  9:47 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I don't know if we need it that early.  The only way to know is to
> try.  We seem to need it before loaddefs.el is generated, at least.

Do we need the ja-dic.el file before loaddefs.el?  We need leim-list.el,
I think, and leim/Makefile generates both of those files, so I think if
we just decouple those things, then we move the ja-dic.el compilation
later.

But the dependencies here aren't quite clear, and I may be misreading
the code and makefiles here.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: OKURI-NASI
  2022-05-30  7:31 ` OKURI-NASI Stefan Kangas
@ 2022-05-30  9:53   ` Lars Ingebrigtsen
  2022-05-30 13:06     ` OKURI-NASI Stefan Monnier
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-30  9:53 UTC
  To: Stefan Kangas; +Cc: Emacs developers

Stefan Kangas <stefan@marxist.se> writes:

> I currently synch SKK-JISYO.L against upstream once a month [but there
> have been no changes to this file since December last year].  It should
> be easy to include a step recreating the generated file and committing
> that too, if that's what we want.

It is extra work, though, and any work that can be avoided is nice.  😀

Doing some very light profiling here, a lot of the time is taken up by
skkdic-get-entry, which is just lookup-nested-alist.  My guess is that
if somebody took a look at ja-dic-cnv.el, this algorithm could be made
substantially more efficient by using other data structures than an
extremely long nested alist.

But I really have no idea what it's doing, so it's a bit daunting
to start poking at the code.  And my guess is that's why nobody else
has, either, since not many people currently hacking at Emacs have the
required domain knowledge.

;;; Commentary:

;; SKK is a Japanese input method running on Mule created by Masahiko
;; Sato <masahiko@sato.riec.tohoku.ac.jp>.  Here we provide utilities
;; to handle a dictionary distributed with SKK so that a different
;; input method (e.g. quail-japanese) can utilize the dictionary.

;; The format of SKK dictionary is quite simple.  Each line has the
;; form "KANASTRING /CONV1/CONV2/.../" which means KANASTRING (仮名文
;; 字列) can be converted to one of CONVi.  CONVi is a Kanji (漢字)
;; and Kana (仮名) mixed string.
;;
;; KANASTRING may have a trailing ASCII letter for Okurigana (送り仮名)
;; information.  For instance, the trailing letter `k' means that one
;; of the following Okurigana is allowed: かきくけこ.  So, in that
;; case, the string "KANASTRINGく" can be converted to one of "CONV1く",
;; CONV2く, ...

Well, that doesn't sound all that complicated, eh?

(I'm hoping to entice somebody to see this as a fun challenge.  😀)
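
For concreteness, the one-line format described in that commentary
could be split apart roughly as below.  This is only a minimal sketch:
`my-skk-parse-line' is a hypothetical helper, not something in
ja-dic-cnv.el, the sample entry is illustrative, and nothing beyond the
"KANASTRING /CONV1/CONV2/.../" shape quoted above is handled.

```elisp
;; Hypothetical helper for illustration; not part of ja-dic-cnv.el.
(defun my-skk-parse-line (line)
  "Split LINE of the form \"KANASTRING /CONV1/CONV2/.../\".
Return (KANASTRING CONV1 CONV2 ...), or nil if LINE does not match."
  (when (string-match "\\`\\([^ ]+\\) /\\(.*\\)/\\'" line)
    ;; Group 1 is the kana key; group 2 is the slash-separated
    ;; candidate list without its leading and trailing slash.
    (cons (match-string 1 line)
          (split-string (match-string 2 line) "/" t))))

;; Illustrative entry, not copied from SKK-JISYO.L:
(my-skk-parse-line "かんじ /漢字/感じ/")
;; => ("かんじ" "漢字" "感じ")
```

A key with a trailing ASCII letter (the okurigana case) comes back
unchanged as the first element, so the caller would still have to
interpret that letter itself.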

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: OKURI-NASI
  2022-05-30  9:47     ` OKURI-NASI Lars Ingebrigtsen
@ 2022-05-30 11:39       ` Eli Zaretskii
  2022-05-30 11:44         ` OKURI-NASI Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-30 11:39 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 30 May 2022 11:47:47 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I don't know if we need it that early.  The only way to know is to
> > try.  We seem to need it before loaddefs.el is generated, at least.
> 
> Do we need the ja-dic.el file before loaddefs.el?

Sorry, we don't.  I was confused/coffee-challenged.

> We need leim-list.el, I think, and leim/Makefile generates both of
> those files, so I think if we just decouple those things, then we
> move the ja-dic.el compilation later.

Does moving it later speed up its compilation considerably?




* Re: OKURI-NASI
  2022-05-30 11:39       ` OKURI-NASI Eli Zaretskii
@ 2022-05-30 11:44         ` Lars Ingebrigtsen
  2022-05-30 11:49           ` OKURI-NASI Lars Ingebrigtsen
  2022-05-30 12:08           ` OKURI-NASI Eli Zaretskii
  0 siblings, 2 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-30 11:44 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> We need leim-list.el, I think, and leim/Makefile generates both of
>> those files, so I think if we just decouple those things, then we
>> move the ja-dic.el compilation later.
>
> Does moving it later speed up its compilation considerably?

Yes -- since lisp depends on leim, we're stalling in a single-threaded
job after finishing bootstrap, but before doing lisp, I think?  So we're
only using one CPU instead of all of them for 15 seconds.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: OKURI-NASI
  2022-05-30 11:44         ` OKURI-NASI Lars Ingebrigtsen
@ 2022-05-30 11:49           ` Lars Ingebrigtsen
  2022-05-30 12:08           ` OKURI-NASI Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-30 11:49 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Yes -- since lisp depends on leim, we're stalling in a single-threaded
> job after finishing bootstrap, but before doing lisp, I think?  So we're
> only using one CPU instead of all of them for 15 seconds.

(Or before bootstrap -- but we don't have anything else to schedule then
either.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: OKURI-NASI
  2022-05-30 11:44         ` OKURI-NASI Lars Ingebrigtsen
  2022-05-30 11:49           ` OKURI-NASI Lars Ingebrigtsen
@ 2022-05-30 12:08           ` Eli Zaretskii
  2022-05-30 15:22             ` OKURI-NASI Lars Ingebrigtsen
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2022-05-30 12:08 UTC
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 30 May 2022 13:44:33 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> We need leim-list.el, I think, and leim/Makefile generates both of
> >> those files, so I think if we just decouple those things, then we
> >> move the ja-dic.el compilation later.
> >
> > Does moving it later speed up its compilation considerably?
> 
> Yes -- since lisp depends on leim, we're stalling in a single-threaded
> job after finishing bootstrap, but before doing lisp, I think?  So we're
> only using one CPU instead of all of them for 15 seconds.

Why is that job single-threaded?  leim has more than one file to
build, so maybe we should rearrange the Makefile to allow better
parallelism?  IOW, perhaps the single-threaded job you see is simply
the result of all the other sub-jobs in leim being completed, and this
is the only one that's left.

Or maybe remove ja-dic.el from the 'all:' target in leim/Makefile and
move it to lisp/Makefile, where it will compete for CPU units with
more jobs?




* Re: OKURI-NASI
  2022-05-30  9:53   ` OKURI-NASI Lars Ingebrigtsen
@ 2022-05-30 13:06     ` Stefan Monnier
  2022-05-30 13:19       ` OKURI-NASI Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Monnier @ 2022-05-30 13:06 UTC
  To: Lars Ingebrigtsen; +Cc: Stefan Kangas, Emacs developers

> Doing some very light profiling here, a lot of the time is taken up by
> skkdic-get-entry, which is just lookup-nested-alist.

Odd: `skkdic-get-entry` didn't even appear in the profile I got (and
`lookup-nested-alist` was dwarfed by other things):

        5412  79% - normal-top-level
        5346  78%  - command-line
        5345  78%   - command-line-1
        5343  78%    - skkdic-convert
        2770  40%     - skkdic-convert-okuri-nasi
        2751  40%      - skkdic-reduced-candidates
        2699  39%       - skkdic-breakup-string
         707  10%        - skkdic-breakup-string
           7   0%         - skkdic-breakup-string
           2   0%          - skkdic-breakup-string
           1   0%             skkdic-breakup-string
        1915  28%     - skkdic-collect-okuri-nasi
          81   1%        skkdic-get-candidate-list
           3   0%        lookup-nested-alist
          47   0%       skkdic-convert-okuri-ari
          36   0%       skkdic-convert-prefix
          34   0%     - save-buffer
          34   0%      - basic-save-buffer
          33   0%       - basic-save-buffer-1
          33   0%        - basic-save-buffer-2
          26   0%         - write-region
          26   0%          - select-safe-coding-system
           1   0%             find-auto-coding
           1   0%           - find-coding-systems-region
           1   0%            - sort-coding-systems
           1   0%               #<compiled -0x6cdb28b2d51a5ea>
           1   0%       - vc-before-save
           1   0%        - vc-backend
           1   0%         - vc-registered
           1   0%          - mapc
           1   0%           - #<compiled 0x1c10f50111ddf347>
           1   0%            - vc-call-backend
           1   0%             - vc-svn-registered
           1   0%              - let
           1   0%               - if
           1   0%                - vc-find-root
           1   0%                   locate-dominating-file
          31   0%     - set-visited-file-name
          26   0%      - set-auto-mode
          25   0%       - set-auto-mode--apply-alist
          25   0%        - set-auto-mode-0
          25   0%         - emacs-lisp-mode
          25   0%          - run-mode-hooks
          19   0%           - hack-local-variables
          19   0%            - hack-local-variables-apply
          19   0%             - hack-one-local-variable
          19   0%              - bug-reference-prog-mode
          18   0%               - jit-lock-register
          18   0%                  jit-lock-mode
           1   0%                 defalias
           6   0%           - run-hooks
           6   0%            - global-font-lock-mode-enable-in-buffers
           6   0%             - turn-on-font-lock-if-desired
           6   0%              - turn-on-font-lock
           6   0%               - font-lock-mode
           6   0%                - font-lock-default-function
           6   0%                 - font-lock-mode-internal
           6   0%                  - font-lock-turn-on-thing-lock
           6   0%                   - jit-lock-register
           6   0%                      jit-lock-mode
           1   0%         hack-local-variables
           5   0%      - hack-local-variables
           5   0%       - hack-local-variables-apply
           4   0%        - hack-one-local-variable
           4   0%         - bug-reference-prog-mode
           4   0%          - jit-lock-register
           4   0%             jit-lock-mode
          10   0%       skkdic-convert-postfix
          65   0%  - startup--honor-delayed-native-compilations
          61   0%   - startup--require-comp-safely
          44   0%    - byte-code
          42   0%     - require
          30   0%      - do-after-load-evaluation
          29   0%       - elisp--font-lock-flush-elisp-buffers
          29   0%          font-lock-flush
           6   0%      - byte-code
           6   0%       - require
           5   0%        - do-after-load-evaluation
           4   0%         - elisp--font-lock-flush-elisp-buffers
           4   0%            font-lock-flush
           1   0%        defalias
           6   0%    - do-after-load-evaluation
           6   0%     - elisp--font-lock-flush-elisp-buffers
           6   0%        font-lock-flush
           1   0%    - native--compile-async
           1   0%     - comp-run-async-workers
           1   0%      - write-region
           1   0%       - select-safe-coding-system
           1   0%          find-auto-coding
           1   0%    #<compiled -0x114e6bbdb5ca73f0>
        1364  20% - ...
        1364  20%    Automatic GC
           8   0% + redisplay_internal (C function)
           6   0% + command-execute

Of that profile I mostly see:

        2770  40%     - skkdic-convert-okuri-nasi
        1915  28%     - skkdic-collect-okuri-nasi
        1364  20%    Automatic GC

and AFAICT there's not much that can be optimized in
`skkdic-collect-okuri-nasi` (assuming my profile is mostly accurate)
since it spends most of its time just converting the source file into
a usable Lisp data structure.  Also, I suspect that most of the GC time
comes from the "convert" part (based on the mem profile which shows it
allocates about 70% vs 30%), so if we factor GC time into it, it's
probably more like

        3770       - skkdic-convert-okuri-nasi
        2279       - skkdic-collect-okuri-nasi
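
A profile of this kind can be collected with the built-in profiler.
The following is a sketch under assumptions, not the exact procedure
used for the numbers above: it assumes an interactive session, that
ja-dic-cnv is on `load-path', and that the dictionary lives at the
path given in the first message (adjust as needed).

```elisp
;; Sketch only: evaluate in an interactive Emacs session.
;; The dictionary path is the one from the first message of the thread.
(let ((dic "~/src/emacs/trunk/leim/SKK-DIC/SKK-JISYO.L"))
  (when (file-readable-p dic)      ; do nothing if the file is absent
    (load "ja-dic-cnv")            ; defines `skkdic-convert'
    (profiler-start 'cpu+mem)      ; sample both CPU time and allocations
    (skkdic-convert dic "/tmp/")   ; writes the generated file under /tmp/
    (profiler-report)              ; show the collected profiles
    (profiler-stop)))
```

The memory profile is what lets the "Automatic GC" samples be
attributed, approximately, to whichever phase allocates the most.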

> My guess is that if somebody took a look at ja-dic-cnv.el, this algorithm
> could be made substantially more efficient by using other data
> structures than an extremely long nested alist.

I believe those (nested) alists shouldn't be that long (IIUC it's
a trie-like data-structure, a bit like keymaps, so even with many
entries in total, the total depth of the tree should be fairly short and
the length of each list (i.e. the out degree of each node) shouldn't be
very large either).
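
To illustrate what "trie-like" means here, below is a toy trie built
from plain alists.  It is a self-contained sketch with hypothetical
names (`my-trie-insert', `my-trie-lookup'); it is not the nested-alist
code that ja-dic-cnv.el actually uses, only the same general shape:
one alist per node, keyed by the next character.

```elisp
;; Toy trie for illustration.  A node is a cons (VALUE . CHILDREN),
;; where CHILDREN is an alist mapping a character to a child node.
(defun my-trie-insert (trie key value)
  "Store VALUE under the string KEY in TRIE and return TRIE."
  (let ((node trie))
    (dolist (ch (string-to-list key))
      (let ((child (assq ch (cdr node))))
        (unless child
          ;; No branch for CH yet: add an empty child node.
          (setq child (cons ch (cons nil nil)))
          (setcdr node (cons child (cdr node))))
        (setq node (cdr child))))
    (setcar node value)
    trie))

(defun my-trie-lookup (trie key)
  "Return the value stored under the string KEY in TRIE, or nil."
  (let ((node trie))
    (catch 'missing
      (dolist (ch (string-to-list key))
        (let ((child (assq ch (cdr node))))
          (unless child (throw 'missing nil))
          (setq node (cdr child))))
      (car node))))

(let ((trie (list nil)))                 ; empty root node
  (my-trie-insert trie "かんじ" '("漢字" "感じ"))
  (my-trie-insert trie "かな" '("仮名"))
  (my-trie-lookup trie "かな"))
;; => ("仮名")
```

A lookup walks one alist per character of the key, so its cost depends
on the key length and on the number of branches at each node, not on
the total number of entries.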


        Stefan





* Re: OKURI-NASI
  2022-05-30 13:06     ` OKURI-NASI Stefan Monnier
@ 2022-05-30 13:19       ` Lars Ingebrigtsen
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-30 13:19 UTC
  To: Stefan Monnier; +Cc: Stefan Kangas, Emacs developers

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Doing some very light profiling here, a lot of the time is taken up by
>> skkdic-get-entry, which is just lookup-nested-alist.
>
> Odd: `skkdic-get-entry` didn't even appear in the profile I got (and
> `lookup-nested-alist` was dwarfed by other things):

It's a defsubst -- I was profiling uncompiled code:

        6278  48%                           - skkdic-breakup-string
        6262  48%                            - let
        6258  48%                             - or
        6242  48%                              - and
        5871  45%                               - let
        5763  44%                                - while
        5735  44%                                 - let
        3995  30%                                  + skkdic-get-entry


>> My guess is that if somebody took a look at ja-dic-cnv.el, this algorithm
>> could be made substantially more efficient by using other data
>> structures than an extremely long nested alist.
>
> I believe those (nested) alists shouldn't be that long (IIUC it's
> a trie-like data-structure, a bit like keymaps, so even with many
> entries in total, the total depth of the tree should be fairly short and
> the length of each list (i.e. the out degree of each node) shouldn't be
> very large either).

Hm, right...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: OKURI-NASI
  2022-05-30 12:08           ` OKURI-NASI Eli Zaretskii
@ 2022-05-30 15:22             ` Lars Ingebrigtsen
  2022-06-01  2:54               ` OKURI-NASI Lars Ingebrigtsen
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-30 15:22 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Why is that job single-threaded?  leim has more than one file to
> build, so maybe we should rearrange the Makefile to allow better
> parallelism?  IOW, perhaps the single-threaded job you see is simply
> the result of all the other sub-jobs in leim being completed, and this
> is the only one that's left.

I think the other ones take so little time that it runs by itself for
most of the time.

> Or maybe remove ja-dic.el from the 'all:' target in leim/Makefile and
> move it to lisp/Makefile, where it will compete for CPU units with
> more jobs?

Yes, that's what I was thinking, but I wasn't sure how to express that
in Makefilese in the most idiomatic way, so if somebody else could do
that, that'd be nice.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: OKURI-NASI
  2022-05-30 15:22             ` OKURI-NASI Lars Ingebrigtsen
@ 2022-06-01  2:54               ` Lars Ingebrigtsen
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Ingebrigtsen @ 2022-06-01  2:54 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Yes, that's what I was thinking, but I wasn't sure how to express that
> in Makefilese in the most idiomatic way, so if somebody else could do
> that, that'd be nice.

I've now poked at this, but my Makefile-fu is low, so if somebody wants
to fix this in a different way, please be my guest.

The change (making the OKURI-NASI stuff happen concurrently with the
main .elc build) takes "make -j32 bootstrap" down from 1m52s to 1m34s.

Hm...  oh!  There's a problem there now -- it generates the ja-dic.el
file fine, but it doesn't byte-compile it (and I guess it should)?  So
there needs to be an extra rule for that...  uhm...  probably in
leim/Makefile, I guess.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



