* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
@ 2017-08-31 5:04 Mark Oteiza
2017-08-31 10:05 ` Robert Pluim
2017-08-31 14:00 ` Eli Zaretskii
0 siblings, 2 replies; 9+ messages in thread
From: Mark Oteiza @ 2017-08-31 5:04 UTC (permalink / raw)
To: 28302
Hi,
I seem to remember there having been complaints about ucs-names preview
being slow. I was curious about how much of that time was spent
assoc'ing every element of a roughly n = 42k element long alist, and so
tried making it a hash table instead. The result is a drastic speedup
of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index 338ca6a6e3..8c5fcf319b 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -2923,10 +2923,10 @@ nonascii-translation-table
(make-obsolete-variable 'nonascii-translation-table "do not use it." "23.1")
(defvar ucs-names nil
- "Alist of cached (CHAR-NAME . CHAR-CODE) pairs.")
+ "Hash table of cached (CHAR-NAME . CHAR-CODE) pairs.")
(defun ucs-names ()
- "Return alist of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
+ "Return hash table of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
(or ucs-names
(let ((ranges
'((#x0000 . #x33FF)
@@ -2954,7 +2954,7 @@ ucs-names
;; (#x20000 . #xDFFFF) CJK Ideograph Extension A, B, etc, unused
(#xE0000 . #xE01FF)))
(gc-cons-threshold 10000000)
- names)
+ (names (make-hash-table :size 42943 :test #'equal)))
(dolist (range ranges)
(let ((c (car range))
(end (cdr range)))
@@ -2965,27 +2965,28 @@ ucs-names
;; shadows a "new-name" but in practice every time an
;; `old-name' conflicts with a `new-name', the newer one has a
;; higher code, so it gets pushed later!
- (if new-name (push (cons new-name c) names))
- (if old-name (push (cons old-name c) names))
+ (if new-name (puthash new-name c names))
+ (if old-name (puthash old-name c names))
(setq c (1+ c))))))
;; Special case for "BELL" which is apparently the only char which
;; doesn't have a new name and whose old-name is shadowed by a newer
;; char with that name.
- (setq ucs-names `(("BELL (BEL)" . 7) ,@names)))))
+ (puthash "BELL (BEL)" ?\a names)
+ (setq ucs-names names))))
(defun mule--ucs-names-annotation (name)
;; FIXME: It would be much better to add this annotation before rather than
;; after the char name, so the annotations are aligned.
;; FIXME: The default behavior of displaying annotations in italics
;; doesn't work well here.
- (let ((char (assoc name ucs-names)))
- (when char (format " (%c)" (cdr char)))))
+ (let ((char (gethash name ucs-names)))
+ (when char (format " (%c)" char))))
(defun char-from-name (string &optional ignore-case)
"Return a character as a number from its Unicode name STRING.
If optional IGNORE-CASE is non-nil, ignore case in STRING.
Return nil if STRING does not name a character."
- (or (cdr (assoc-string string (ucs-names) ignore-case))
+ (or (gethash (if ignore-case (upcase string) string) (ucs-names))
(let ((minus (string-match-p "-[0-9A-F]+\\'" string)))
(when minus
;; Parse names like "VARIATION SELECTOR-17" and "CJK
^ permalink raw reply related [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-08-31 5:04 bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table Mark Oteiza
@ 2017-08-31 10:05 ` Robert Pluim
2017-08-31 14:01 ` Eli Zaretskii
2017-08-31 14:00 ` Eli Zaretskii
1 sibling, 1 reply; 9+ messages in thread
From: Robert Pluim @ 2017-08-31 10:05 UTC (permalink / raw)
To: Mark Oteiza; +Cc: 28302
Mark Oteiza <mvoteiza@udel.edu> writes:
> Hi,
>
> I seem to remember there having been complaints about ucs-names preview
> being slow. I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead. The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
I haven't timed it exactly, but it makes a *very* noticeable
difference here. Thanks for this.
Regards
Robert
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-08-31 5:04 bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table Mark Oteiza
2017-08-31 10:05 ` Robert Pluim
@ 2017-08-31 14:00 ` Eli Zaretskii
2017-08-31 14:46 ` Mark Oteiza
1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2017-08-31 14:00 UTC (permalink / raw)
To: Mark Oteiza; +Cc: 28302
> From: Mark Oteiza <mvoteiza@udel.edu>
> Date: Thu, 31 Aug 2017 01:04:15 -0400
>
> I seem to remember there having been complaints about ucs-names preview
> being slow. I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead. The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
Thanks, this is a very good change. Please make sure (if you haven't
already) that it survives bootstrap.
Also, there are other places which assume that ucs-names is an alist,
so I guess this is not the full final patch?
And this should be mentioned in NEWS under incompatible Lisp changes,
as ucs-names debuted in Emacs 23.1, and there could be some uses of it
outside Emacs proper.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-08-31 10:05 ` Robert Pluim
@ 2017-08-31 14:01 ` Eli Zaretskii
0 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2017-08-31 14:01 UTC (permalink / raw)
To: Robert Pluim; +Cc: mvoteiza, 28302
> From: Robert Pluim <rpluim@gmail.com>
> Date: Thu, 31 Aug 2017 12:05:46 +0200
> Cc: 28302@debbugs.gnu.org
>
> I haven't timed it exactly, but it makes a *very* noticeable
> difference here.
About ten-fold, according to my measurements.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-08-31 14:00 ` Eli Zaretskii
@ 2017-08-31 14:46 ` Mark Oteiza
2017-08-31 21:30 ` Mark Oteiza
0 siblings, 1 reply; 9+ messages in thread
From: Mark Oteiza @ 2017-08-31 14:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 28302
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Mark Oteiza <mvoteiza@udel.edu>
>> Date: Thu, 31 Aug 2017 01:04:15 -0400
>>
>> I seem to remember there having been complaints about ucs-names preview
>> being slow. I was curious about how much of that time was spent
>> assoc'ing every element of a roughly n = 42k element long alist, and so
>> tried making it a hash table instead. The result is a drastic speedup
>> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
>
> Thanks, this is a very good change. Please make sure (if you haven't
> already) that it survives bootstrap.
>
> Also, there are other places which assume that ucs-names is an alist,
> so I guess this is not the full final patch?
Yes, since posting I found the other two places where ucs-names is used
and have made changes there.
> And this should be mentioned in NEWS under incompatible Lisp changes,
> as ucs-names debuted in Emacs 23.1, and there could be some uses of it
> outside Emacs proper.
Thanks, patch with additional changes attached. I don't really like
repeating the string in descr-text.el, but BEL is the only outlier and I
think it would be overkill to write a rassoc analog.
diff --git a/etc/NEWS b/etc/NEWS
index e8d6ea9c6d..e8f6aec9ef 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1153,6 +1153,9 @@ table implementation. This uses a new bytecode op 'switch', which isn't
compatible with previous Emacs versions. This functionality can be disabled
by setting 'byte-compile-cond-use-jump-table' to nil.
+---
+** The variable 'ucs-names' is now a hash table.
+
** 'C-up', 'C-down', 'C-left' and 'C-right' are now defined in term
mode to send the same escape sequences that xterm does. This makes
things like forward-word in readline work.
diff --git a/lisp/descr-text.el b/lisp/descr-text.el
index 6f36bbed68..b3c96988dd 100644
--- a/lisp/descr-text.el
+++ b/lisp/descr-text.el
@@ -617,16 +617,16 @@ describe-char
(list
(let* ((names (ucs-names))
(name
- (or (when (= char 7)
+ (or (when (= char ?\a)
;; Special case for "BELL" which is
;; apparently the only char which
;; doesn't have a new name and whose
;; old-name is shadowed by a newer char
;; with that name (bug#25641).
- (car (rassoc char names)))
+ "BELL (BEL)")
(get-char-code-property char 'name)
(get-char-code-property char 'old-name))))
- (if (and name (assoc-string name names))
+ (if (and name (gethash name names))
(format
"type \"C-x 8 RET %x\" or \"C-x 8 RET %s\""
char name)
diff --git a/lisp/leim/quail/latin-ltx.el b/lisp/leim/quail/latin-ltx.el
index 6c5afcd4f9..778706a451 100644
--- a/lisp/leim/quail/latin-ltx.el
+++ b/lisp/leim/quail/latin-ltx.el
@@ -75,20 +75,20 @@
(`(,seq ,re)
(let ((count 0)
(re (eval re t)))
- (dolist (pair (ucs-names))
- (let ((name (car pair))
- (char (cdr pair)))
- (when (and (characterp char) ;; Ignore char-ranges.
- (string-match re name))
- (let ((keys (if (stringp seq)
- (replace-match seq nil nil name)
- (funcall seq name char))))
- (if (listp keys)
- (dolist (x keys)
- (setq count (1+ count))
- (push (list x char) newrules))
- (setq count (1+ count))
- (push (list keys char) newrules))))))
+ (maphash
+ (lambda (name char)
+ (when (and (characterp char) ;; Ignore char-ranges.
+ (string-match re name))
+ (let ((keys (if (stringp seq)
+ (replace-match seq nil nil name)
+ (funcall seq name char))))
+ (if (listp keys)
+ (dolist (x keys)
+ (setq count (1+ count))
+ (push (list x char) newrules))
+ (setq count (1+ count))
+ (push (list keys char) newrules)))))
+ (ucs-names))
;; (message "latin-ltx: %d mappings for %S" count re)
))))
(setq newrules (delete-dups newrules))
@@ -206,7 +206,7 @@
((lambda (name char)
(let* ((base (concat (match-string 1 name) (match-string 3 name)))
- (basechar (cdr (assoc base (ucs-names)))))
+ (basechar (gethash base (ucs-names))))
(when (latin-ltx--ascii-p basechar)
(string (if (match-end 2) ?^ ?_) basechar))))
"\\(.*\\)SU\\(?:B\\|\\(PER\\)\\)SCRIPT \\(.*\\)")
^ permalink raw reply related [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-08-31 14:46 ` Mark Oteiza
@ 2017-08-31 21:30 ` Mark Oteiza
2017-08-31 22:15 ` Drew Adams
0 siblings, 1 reply; 9+ messages in thread
From: Mark Oteiza @ 2017-08-31 21:30 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 28302-done
Pushed as 96c2c09.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-08-31 21:30 ` Mark Oteiza
@ 2017-08-31 22:15 ` Drew Adams
2017-09-01 6:23 ` Eli Zaretskii
0 siblings, 1 reply; 9+ messages in thread
From: Drew Adams @ 2017-08-31 22:15 UTC (permalink / raw)
To: Mark Oteiza, Eli Zaretskii; +Cc: 28302-done
> Pushed as 96c2c09.
Judging only by the Subject line, without looking at the code,
I'd guess that this will break user code that counts on `ucs-names'
being an alist.
I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
and Icicles.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-08-31 22:15 ` Drew Adams
@ 2017-09-01 6:23 ` Eli Zaretskii
2017-09-01 10:43 ` Thien-Thi Nguyen
0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2017-09-01 6:23 UTC (permalink / raw)
To: Drew Adams; +Cc: mvoteiza, 28302
> Date: Thu, 31 Aug 2017 15:15:07 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: 28302-done@debbugs.gnu.org
>
> Judging only by the Subject line, without looking at the code,
> I'd guess that this will break user code that counts on `ucs-names'
> being an alist.
Yes, which is why it is mentioned as an incompatible change in NEWS.
> I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
> and Icicles.
I suggest to update them to support a hash table as well as an alist.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
2017-09-01 6:23 ` Eli Zaretskii
@ 2017-09-01 10:43 ` Thien-Thi Nguyen
0 siblings, 0 replies; 9+ messages in thread
From: Thien-Thi Nguyen @ 2017-09-01 10:43 UTC (permalink / raw)
To: 28302
[-- Attachment #1: Type: text/plain, Size: 1722 bytes --]
() Eli Zaretskii <eliz@gnu.org>
() Fri, 01 Sep 2017 09:23:05 +0300
I suggest to update them to support a hash table as well as an
alist.
Tangent: Maybe we could extend ‘ucs-names’ (the function) to
take args and DTRT accordingly. Something like:
;; backward compatible
(ucs-names) => ALIST
;; lookup
(ucs-names 'forward-lookup CHAR-NAME) => CHAR-CODE
(ucs-names 'reverse-lookup CHAR-CODE) => CHAR-NAME
;; bonus: reflection
(ucs-names 'as-alist) => ALIST
(ucs-names 'as-hash-table) => HASH-TABLE
Admittedly, this is not very idiomatic Emacs Lisp. OTOH, i
would guess the vast majority of callers use it for lookup, so
centralizing that functionality would be a net win, despite the
style drift.
Really, given the rise of lexical binding and w/ niceties like
‘apply-partially’ already in the mix, i expect that sooner or
later, someone will put into place something like:
(defun callable-hash-table (source)
(let ((ht (ELABORATE source)))
;; rv
(lambda (&optional cmd)
(case cmd
(as-hash-table ht)
(as-alist ...)
(forward-lookup ...)
(reverse-lookup ...)
...))))
(fset ucs-names (callable-hash-table ucs-names))
IOW: Style drift be damned! Up the idioms! HOP things now!
(Insert more FP slogans here. :-D)
Or maybe this is already done? What am i missing? More coffee!
--
Thien-Thi Nguyen -----------------------------------------------
(defun responsep (query)
(pcase (context query)
(`(technical ,ml) (correctp ml))
...)) 748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-09-01 10:43 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-31 5:04 bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table Mark Oteiza
2017-08-31 10:05 ` Robert Pluim
2017-08-31 14:01 ` Eli Zaretskii
2017-08-31 14:00 ` Eli Zaretskii
2017-08-31 14:46 ` Mark Oteiza
2017-08-31 21:30 ` Mark Oteiza
2017-08-31 22:15 ` Drew Adams
2017-09-01 6:23 ` Eli Zaretskii
2017-09-01 10:43 ` Thien-Thi Nguyen
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).