* bug#28043: [PATCH] Optimize skkdic conversion
@ 2017-08-11 3:25 npostavs
2017-08-11 6:39 ` Eli Zaretskii
0 siblings, 1 reply; 3+ messages in thread
From: npostavs @ 2017-08-11 3:25 UTC (permalink / raw)
To: 28043
[-- Attachment #1: Type: text/plain, Size: 647 bytes --]
Tags: patch
Severity: wishlist
While waiting for a bootstrap, I thought the skkdic conversion was
looking a bit slow. The following patches reduce the time needed to
less than a third of the original (2m30 to 45s measured using a
non-optimized Emacs). The main speedup comes from optimizing
lookup-nest-alist and set-nested-alist for the case where the key is a
string. I suspect it might be possible to optimize even more by
reorganizing things, but my ignorance of Japanese makes that difficult.
Each commit message has the time I measured after applying. I checked
that the generated ja-dic.el is identical to what was generated before.
[-- Attachment #2: patch --]
[-- Type: text/x-diff, Size: 3736 bytes --]
From b160b3a499983c998a8858a353e78bab5dfa49ad Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sat, 1 Jul 2017 22:39:16 -0400
Subject: [PATCH v1 1/6] Use progress-reporter functions in ja-dic-cnv
Baseline timing:(using non-optimized Emacs)
time make -C leim -B ../../leim/../lisp/leim/ja-dic/ja-dic.el
real 2m30.720s
user 2m29.693s
sys 0m0.567s
* lisp/international/ja-dic-cnv.el (skkdic-collect-okuri-nasi)
(skkdic-convert-okuri-nasi): Use progress-reporter functions instead
of calculating ratio of work done inline.
---
lisp/international/ja-dic-cnv.el | 34 ++++++++++++++--------------------
1 file changed, 14 insertions(+), 20 deletions(-)
diff --git a/lisp/international/ja-dic-cnv.el b/lisp/international/ja-dic-cnv.el
index e80b1b2881..ac097837f4 100644
--- a/lisp/international/ja-dic-cnv.el
+++ b/lisp/international/ja-dic-cnv.el
@@ -267,10 +267,10 @@ skkdic-okuri-nasi-entries
(defvar skkdic-okuri-nasi-entries-count 0)
(defun skkdic-collect-okuri-nasi ()
- (message "Collecting OKURI-NASI entries ...")
(save-excursion
- (let ((prev-ratio 0)
- ratio)
+ (let ((progress (make-progress-reporter "Collecting OKURI-NASI entries"
+ (point) (point-max)
+ nil 10)))
(while (re-search-forward "^\\(\\(\\cH\\|ー\\)+\\) \\(/\\cj.*\\)/$"
nil t)
(let ((kana (match-string 1))
@@ -280,11 +280,7 @@ skkdic-collect-okuri-nasi
(cons (cons kana candidates) skkdic-okuri-nasi-entries)
skkdic-okuri-nasi-entries-count
(1+ skkdic-okuri-nasi-entries-count))
- (setq ratio (floor (* (point) 100.0) (point-max)))
- (if (/= (/ prev-ratio 10) (/ ratio 10))
- (progn
- (message "collected %2d%% ..." ratio)
- (setq prev-ratio ratio)))
+ (progress-reporter-update progress (point))
(while candidates
(let ((entry (lookup-nested-alist (car candidates)
skkdic-word-list nil nil t)))
@@ -292,26 +288,23 @@ skkdic-collect-okuri-nasi
(setcar entry (cons kana (car entry)))
(set-nested-alist (car candidates) (list kana)
skkdic-word-list)))
- (setq candidates (cdr candidates))))))))
+ (setq candidates (cdr candidates)))))
+ (progress-reporter-done progress))))
(defun skkdic-convert-okuri-nasi (skkbuf buf)
- (message "Processing OKURI-NASI entries ...")
(with-current-buffer buf
(insert ";; Setting okuri-nasi entries.\n"
"(skkdic-set-okuri-nasi\n")
(let ((l (nreverse skkdic-okuri-nasi-entries))
- (count 0)
- (prev-ratio 0)
- ratio)
+ (progress (make-progress-reporter "Processing OKURI-NASI entries"
+ 0 skkdic-okuri-nasi-entries-count
+ nil 10))
+ (count 0))
(while l
(let ((kana (car (car l)))
(candidates (cdr (car l))))
- (setq ratio (floor (* count 100.0) skkdic-okuri-nasi-entries-count)
- count (1+ count))
- (if (/= (/ prev-ratio 10) (/ ratio 10))
- (progn
- (message "processed %2d%% ..." ratio)
- (setq prev-ratio ratio)))
+ (setq count (1+ count))
+ (progress-reporter-update progress count)
(if (setq candidates
(skkdic-reduced-candidates skkbuf kana candidates))
(progn
@@ -320,7 +313,8 @@ skkdic-convert-okuri-nasi
(insert " " (car candidates))
(setq candidates (cdr candidates)))
(insert "\"\n"))))
- (setq l (cdr l))))
+ (setq l (cdr l)))
+ (progress-reporter-done progress))
(insert ")\n\n")))
(defun skkdic-convert (filename &optional dirname)
--
2.11.1
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch --]
[-- Type: text/x-diff, Size: 2216 bytes --]
From 4fa69ec127558ed7b5412958ce1fa24bd1770889 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 2 Jul 2017 09:34:58 -0400
Subject: [PATCH v1 2/6] Optimize lookup-nested-alist for string KEYSEQ
time make -C leim -B ../../leim/../lisp/leim/ja-dic/ja-dic.el
real 1m23.716s
user 1m23.010s
sys 0m0.587s
* lisp/international/mule-util.el (lookup-nested-alist): Use `assq'
instead of `assoc' when KEYSEQ is a string.
---
lisp/international/mule-util.el | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/lisp/international/mule-util.el b/lisp/international/mule-util.el
index e34b01c306..3ae87eb1dd 100644
--- a/lisp/international/mule-util.el
+++ b/lisp/international/mule-util.el
@@ -179,15 +179,23 @@ lookup-nested-alist
(setq len (length keyseq)))
(let ((i (or start 0)))
(if (catch 'lookup-nested-alist-tag
- (if (listp keyseq)
- (while (< i len)
- (if (setq alist (cdr (assoc (nth i keyseq) (cdr alist))))
- (setq i (1+ i))
- (throw 'lookup-nested-alist-tag t))))
- (while (< i len)
- (if (setq alist (cdr (assoc (aref keyseq i) (cdr alist))))
- (setq i (1+ i))
- (throw 'lookup-nested-alist-tag t))))
+ (cond ((stringp keyseq) ; We can use `assq' for characters.
+ (while (< i len)
+ (if (setq alist (cdr (assq (aref keyseq i) (cdr alist))))
+ (setq i (1+ i))
+ (throw 'lookup-nested-alist-tag t))))
+ ((arrayp keyseq)
+ (while (< i len)
+ (if (setq alist (cdr (assoc (aref keyseq i) (cdr alist))))
+ (setq i (1+ i))
+ (throw 'lookup-nested-alist-tag t))))
+ ((listp keyseq)
+ (setq keyseq (nthcdr i keyseq))
+ (while (< i len)
+ (if (setq alist (cdr (assoc (pop keyseq) (cdr alist))))
+ (setq i (1+ i))
+ (throw 'lookup-nested-alist-tag t))))
+ (t (signal 'wrong-type-argument (list keyseq)))))
;; KEYSEQ is too long.
(if nil-for-too-long nil i)
alist)))
--
2.11.1
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: patch --]
[-- Type: text/x-diff, Size: 2944 bytes --]
From 524246088450c1ced3314890397c2700fb49124f Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Mon, 3 Jul 2017 13:01:46 -0400
Subject: [PATCH v1 3/6] Optimize set-nested-alist for string KEYSEQ
time make -C leim -B ../../leim/../lisp/leim/ja-dic/ja-dic.el
real 1m7.218s
user 1m6.470s
sys 0m0.627s
* lisp/international/mule-util.el (set-nested-alist): Use `assq' when
KEYSEQ is string.
---
lisp/international/mule-util.el | 51 ++++++++++++++++++++++++++++++-----------
1 file changed, 37 insertions(+), 14 deletions(-)
diff --git a/lisp/international/mule-util.el b/lisp/international/mule-util.el
index 3ae87eb1dd..257f8854c3 100644
--- a/lisp/international/mule-util.el
+++ b/lisp/international/mule-util.el
@@ -143,20 +143,43 @@ set-nested-alist
See the documentation of `nested-alist-p' for more detail."
(or (nested-alist-p alist)
(error "Invalid argument %s" alist))
- (let ((islist (listp keyseq))
- (len (or len (length keyseq)))
- (i 0)
- key-elt slot)
- (while (< i len)
- (if (null (nested-alist-p alist))
- (error "Keyseq %s is too long for this nested alist" keyseq))
- (setq key-elt (if islist (nth i keyseq) (aref keyseq i)))
- (setq slot (assoc key-elt (cdr alist)))
- (unless slot
- (setq slot (cons key-elt (list t)))
- (setcdr alist (cons slot (cdr alist))))
- (setq alist (cdr slot))
- (setq i (1+ i)))
+ (let ((len (or len (length keyseq)))
+ (i 0))
+ (cond
+ ((stringp keyseq) ; We can use `assq' for characters.
+ (while (< i len)
+ (if (null (nested-alist-p alist))
+ (error "Keyseq %s is too long for this nested alist" keyseq))
+ (let* ((key-elt (aref keyseq i))
+ (slot (assq key-elt (cdr alist))))
+ (unless slot
+ (setq slot (list key-elt t))
+ (push slot (cdr alist)))
+ (setq alist (cdr slot)))
+ (setq i (1+ i))))
+ ((arrayp keyseq)
+ (while (< i len)
+ (if (null (nested-alist-p alist))
+ (error "Keyseq %s is too long for this nested alist" keyseq))
+ (let* ((key-elt (aref keyseq i))
+ (slot (assoc key-elt (cdr alist))))
+ (unless slot
+ (setq slot (list key-elt t))
+ (push slot (cdr alist)))
+ (setq alist (cdr slot)))
+ (setq i (1+ i))))
+ ((listp keyseq)
+ (while (< i len)
+ (if (null (nested-alist-p alist))
+ (error "Keyseq %s is too long for this nested alist" keyseq))
+ (let* ((key-elt (pop keyseq))
+ (slot (assoc key-elt (cdr alist))))
+ (unless slot
+ (setq slot (list key-elt t))
+ (push slot (cdr alist)))
+ (setq alist (cdr slot)))
+ (setq i (1+ i))))
+ (t (signal 'wrong-type-argument (list keyseq))))
(setcar alist entry)
(if branches
(setcdr (last alist) branches))))
--
2.11.1
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #5: patch --]
[-- Type: text/x-diff, Size: 2585 bytes --]
From 14eb38e18b83a67114f31de99d1bdcd8811bbd37 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 2 Jul 2017 02:00:02 -0400
Subject: [PATCH v1 4/6] Optimize skkdic by avoiding regexp matching
time make -C leim -B ../../leim/../lisp/leim/ja-dic/ja-dic.el
real 0m57.091s
user 0m56.370s
sys 0m0.607s
* lisp/international/ja-dic-cnv.el (skkdic-reduced-candidates): Call
`char-category-set' on the first character of the string directly,
instead of using a regexp for the character category.
(skkdic--japanese-category-set): New constant.
(skkdic-collect-okuri-nasi): Just set
`skkdic-okuri-nasi-entries-count' at once at the end rather than
updating it throughout the loop.
---
lisp/international/ja-dic-cnv.el | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/lisp/international/ja-dic-cnv.el b/lisp/international/ja-dic-cnv.el
index ac097837f4..8878bbd358 100644
--- a/lisp/international/ja-dic-cnv.el
+++ b/lisp/international/ja-dic-cnv.el
@@ -251,12 +251,16 @@ skkdic-breakup-string
;; Return list of candidates which excludes some from CANDIDATES.
;; Excluded candidates can be derived from another entry.
+(defconst skkdic--japanese-category-set (make-category-set "j"))
+
(defun skkdic-reduced-candidates (skkbuf kana candidates)
(let (elt l)
(while candidates
(setq elt (car candidates))
(if (or (= (length elt) 1)
- (and (string-match "^\\cj" elt)
+ (and (bool-vector-subsetp
+ skkdic--japanese-category-set
+ (char-category-set (aref elt 0)))
(not (skkdic-breakup-string skkbuf kana elt 0 (length elt)
'first))))
(setq l (cons elt l)))
@@ -277,9 +281,7 @@ skkdic-collect-okuri-nasi
(candidates (skkdic-get-candidate-list (match-beginning 3)
(match-end 3))))
(setq skkdic-okuri-nasi-entries
- (cons (cons kana candidates) skkdic-okuri-nasi-entries)
- skkdic-okuri-nasi-entries-count
- (1+ skkdic-okuri-nasi-entries-count))
+ (cons (cons kana candidates) skkdic-okuri-nasi-entries))
(progress-reporter-update progress (point))
(while candidates
(let ((entry (lookup-nested-alist (car candidates)
@@ -289,6 +291,7 @@ skkdic-collect-okuri-nasi
(set-nested-alist (car candidates) (list kana)
skkdic-word-list)))
(setq candidates (cdr candidates)))))
+ (setq skkdic-okuri-nasi-entries-count (length skkdic-okuri-nasi-entries))
(progress-reporter-done progress))))
(defun skkdic-convert-okuri-nasi (skkbuf buf)
--
2.11.1
[-- Attachment #6: patch --]
[-- Type: text/x-diff, Size: 3180 bytes --]
From fec9671820e344cdbb8c06aeb889c0f2df67f87c Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 2 Jul 2017 02:10:23 -0400
Subject: [PATCH v1 5/6] Optimize skkdic: don't extract text properties
time make -C leim -B ../../leim/../lisp/leim/ja-dic/ja-dic.el
real 0m47.962s
user 0m47.487s
sys 0m0.390s
* lisp/international/ja-dic-cnv.el (skkdic-convert-postfix)
(skkdic-convert-prefix, skkdic-get-candidate-list)
(skkdic-collect-okuri-nasi, skkdic-extract-conversion-data): Use
`match-string-no-properties' instead of `match-string'.
---
lisp/international/ja-dic-cnv.el | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/lisp/international/ja-dic-cnv.el b/lisp/international/ja-dic-cnv.el
index 8878bbd358..63eede093d 100644
--- a/lisp/international/ja-dic-cnv.el
+++ b/lisp/international/ja-dic-cnv.el
@@ -125,10 +125,10 @@ skkdic-convert-postfix
;; Search postfix entries.
(while (re-search-forward "^[#<>?]\\(\\(\\cH\\|ー\\)+\\) " nil t)
- (let ((kana (match-string 1))
+ (let ((kana (match-string-no-properties 1))
str candidates)
(while (looking-at "/[#0-9 ]*\\([^/\n]*\\)/")
- (setq str (match-string 1))
+ (setq str (match-string-no-properties 1))
(if (not (member str candidates))
(setq candidates (cons str candidates)))
(goto-char (match-end 1)))
@@ -158,10 +158,10 @@ skkdic-convert-prefix
"(skkdic-set-prefix\n"))
(save-excursion
(while (re-search-forward "^\\(\\(\\cH\\|ー\\)+\\)[<>?] " nil t)
- (let ((kana (match-string 1))
+ (let ((kana (match-string-no-properties 1))
str candidates)
(while (looking-at "/\\([^/\n]+\\)/")
- (setq str (match-string 1))
+ (setq str (match-string-no-properties 1))
(if (not (member str candidates))
(setq candidates (cons str candidates)))
(goto-char (match-end 1)))
@@ -180,8 +180,8 @@ skkdic-get-candidate-list
(let (candidates)
(goto-char from)
(while (re-search-forward "/[^/ \n]+" to t)
- (setq candidates (cons (buffer-substring (1+ (match-beginning 0))
- (match-end 0))
+ (setq candidates (cons (buffer-substring-no-properties
+ (1+ (match-beginning 0)) (match-end 0))
candidates)))
candidates))
@@ -277,7 +277,7 @@ skkdic-collect-okuri-nasi
nil 10)))
(while (re-search-forward "^\\(\\(\\cH\\|ー\\)+\\) \\(/\\cj.*\\)/$"
nil t)
- (let ((kana (match-string 1))
+ (let ((kana (match-string-no-properties 1))
(candidates (skkdic-get-candidate-list (match-beginning 3)
(match-end 3))))
(setq skkdic-okuri-nasi-entries
@@ -464,7 +464,7 @@ skkdic-extract-conversion-data
(i (match-end 0))
candidates)
(while (string-match "[^ ]+" entry i)
- (setq candidates (cons (match-string 0 entry) candidates))
+ (setq candidates (cons (match-string-no-properties 0 entry) candidates))
(setq i (match-end 0)))
(cons (skkdic-get-kana-compact-codes kana) candidates)))
--
2.11.1
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #7: patch --]
[-- Type: text/x-diff, Size: 2125 bytes --]
From 99a545062367a2f80d69482df0bb266b358ffa54 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 2 Jul 2017 10:04:02 -0400
Subject: [PATCH v1 6/6] * lisp/international/ja-dic-cnv.el
(skkdic-convert-okuri-nasi): Cleanup.
time make -C leim -B ../../leim/../lisp/leim/ja-dic/ja-dic.el
real 0m46.145s
user 0m45.607s
sys 0m0.460s
---
lisp/international/ja-dic-cnv.el | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/lisp/international/ja-dic-cnv.el b/lisp/international/ja-dic-cnv.el
index 63eede093d..6d2753bbf5 100644
--- a/lisp/international/ja-dic-cnv.el
+++ b/lisp/international/ja-dic-cnv.el
@@ -298,25 +298,19 @@ skkdic-convert-okuri-nasi
(with-current-buffer buf
(insert ";; Setting okuri-nasi entries.\n"
"(skkdic-set-okuri-nasi\n")
- (let ((l (nreverse skkdic-okuri-nasi-entries))
- (progress (make-progress-reporter "Processing OKURI-NASI entries"
+ (let ((progress (make-progress-reporter "Processing OKURI-NASI entries"
0 skkdic-okuri-nasi-entries-count
nil 10))
(count 0))
- (while l
- (let ((kana (car (car l)))
- (candidates (cdr (car l))))
- (setq count (1+ count))
- (progress-reporter-update progress count)
- (if (setq candidates
- (skkdic-reduced-candidates skkbuf kana candidates))
- (progn
- (insert "\"" kana)
- (while candidates
- (insert " " (car candidates))
- (setq candidates (cdr candidates)))
- (insert "\"\n"))))
- (setq l (cdr l)))
+ (pcase-dolist (`(,kana . ,candidates)
+ (nreverse skkdic-okuri-nasi-entries))
+ (setq count (1+ count))
+ (progress-reporter-update progress count)
+ (when (setq candidates
+ (skkdic-reduced-candidates skkbuf kana candidates))
+ (insert "\"" kana " "
+ (mapconcat #'identity candidates " ")
+ "\"\n")))
(progress-reporter-done progress))
(insert ")\n\n")))
--
2.11.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* bug#28043: [PATCH] Optimize skkdic conversion
2017-08-11 3:25 bug#28043: [PATCH] Optimize skkdic conversion npostavs
@ 2017-08-11 6:39 ` Eli Zaretskii
2017-08-22 0:54 ` npostavs
0 siblings, 1 reply; 3+ messages in thread
From: Eli Zaretskii @ 2017-08-11 6:39 UTC (permalink / raw)
To: npostavs, Kenichi Handa; +Cc: 28043
> From: npostavs@users.sourceforge.net
> Date: Thu, 10 Aug 2017 23:25:17 -0400
>
> While waiting for a bootstrap, I thought the skkdic conversion was
> looking a bit slow. The following patches reduce the time needed to
> less than a third of the original (2m30 to 45s measured using a
> non-optimized Emacs). The main speedup comes from optimizing
> lookup-nest-alist and set-nested-alist for the case where the key is a
> string. I suspect it might be possible to optimize even more by
> reorganizing things, but my ignorance of Japanese makes that difficult.
>
> Each commit message has the time I measured after applying. I checked
> that the generated ja-dic.el is identical to what was generated before.
Thanks. CC'ing Handa-san, who could perhaps suggest more
optimizations.
I guess you've verified that the results are identical after these
optimizations? If so, I see no reason not to push in a few days.
^ permalink raw reply [flat|nested] 3+ messages in thread
* bug#28043: [PATCH] Optimize skkdic conversion
2017-08-11 6:39 ` Eli Zaretskii
@ 2017-08-22 0:54 ` npostavs
0 siblings, 0 replies; 3+ messages in thread
From: npostavs @ 2017-08-22 0:54 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 28043
tags 28043 fixed
close 28043 26.1
quit
Eli Zaretskii <eliz@gnu.org> writes:
>> From: npostavs@users.sourceforge.net
>> Date: Thu, 10 Aug 2017 23:25:17 -0400
>>
>> While waiting for a bootstrap, I thought the skkdic conversion was
>> looking a bit slow. The following patches reduce the time needed to
>> less than a third of the original (2m30 to 45s measured using a
>> non-optimized Emacs). The main speedup comes from optimizing
>> lookup-nest-alist and set-nested-alist for the case where the key is a
>> string. I suspect it might be possible to optimize even more by
>> reorganizing things, but my ignorance of Japanese makes that difficult.
>>
>> Each commit message has the time I measured after applying. I checked
>> that the generated ja-dic.el is identical to what was generated before.
>
> Thanks. CC'ing Handa-san, who could perhaps suggest more
> optimizations.
>
> I guess you've verified that the results are identical after these
> optimizations? If so, I see no reason not to push in a few days.
Okay, I've pushed to master.
[1: 9d7973530f]: 2017-08-21 20:52:25 -0400
Optimize skkdic conversion (Bug#28043)
http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=9d7973530f912c6001445ba9b83b7893b466aee8
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-08-22 0:54 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-11 3:25 bug#28043: [PATCH] Optimize skkdic conversion npostavs
2017-08-11 6:39 ` Eli Zaretskii
2017-08-22 0:54 ` npostavs
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).