unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
@ 2017-08-31  5:04 Mark Oteiza
  2017-08-31 10:05 ` Robert Pluim
  2017-08-31 14:00 ` Eli Zaretskii
  0 siblings, 2 replies; 9+ messages in thread
From: Mark Oteiza @ 2017-08-31  5:04 UTC (permalink / raw)
  To: 28302


Hi,

I seem to remember there having been complaints about ucs-names preview
being slow.  I was curious about how much of that time was spent
assoc'ing every element of a roughly n = 42k element long alist, and so
tried making it a hash table instead.  The result is a drastic speedup
of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index 338ca6a6e3..8c5fcf319b 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -2923,10 +2923,10 @@ nonascii-translation-table
 (make-obsolete-variable 'nonascii-translation-table "do not use it." "23.1")
 
 (defvar ucs-names nil
-  "Alist of cached (CHAR-NAME . CHAR-CODE) pairs.")
+  "Hash table of cached (CHAR-NAME . CHAR-CODE) pairs.")
 
 (defun ucs-names ()
-  "Return alist of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
+  "Return hash table of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
   (or ucs-names
       (let ((ranges
 	     '((#x0000 . #x33FF)
@@ -2954,7 +2954,7 @@ ucs-names
 	       ;; (#x20000 . #xDFFFF) CJK Ideograph Extension A, B, etc, unused
 	       (#xE0000 . #xE01FF)))
 	    (gc-cons-threshold 10000000)
-	    names)
+	    (names (make-hash-table :size 42943 :test #'equal)))
 	(dolist (range ranges)
 	  (let ((c (car range))
 		(end (cdr range)))
@@ -2965,27 +2965,28 @@ ucs-names
 		;; shadows a "new-name" but in practice every time an
 		;; `old-name' conflicts with a `new-name', the newer one has a
 		;; higher code, so it gets pushed later!
-		(if new-name (push (cons new-name c) names))
-		(if old-name (push (cons old-name c) names))
+		(if new-name (puthash new-name c names))
+		(if old-name (puthash old-name c names))
 		(setq c (1+ c))))))
 	;; Special case for "BELL" which is apparently the only char which
 	;; doesn't have a new name and whose old-name is shadowed by a newer
 	;; char with that name.
-	(setq ucs-names `(("BELL (BEL)" . 7) ,@names)))))
+	(puthash "BELL (BEL)" ?\a names)
+        (setq ucs-names names))))
 
 (defun mule--ucs-names-annotation (name)
   ;; FIXME: It would be much better to add this annotation before rather than
   ;; after the char name, so the annotations are aligned.
   ;; FIXME: The default behavior of displaying annotations in italics
   ;; doesn't work well here.
-  (let ((char (assoc name ucs-names)))
-    (when char (format " (%c)" (cdr char)))))
+  (let ((char (gethash name ucs-names)))
+    (when char (format " (%c)" char))))
 
 (defun char-from-name (string &optional ignore-case)
   "Return a character as a number from its Unicode name STRING.
 If optional IGNORE-CASE is non-nil, ignore case in STRING.
 Return nil if STRING does not name a character."
-  (or (cdr (assoc-string string (ucs-names) ignore-case))
+  (or (gethash (if ignore-case (upcase string) string) (ucs-names))
       (let ((minus (string-match-p "-[0-9A-F]+\\'" string)))
         (when minus
           ;; Parse names like "VARIATION SELECTOR-17" and "CJK





^ permalink raw reply related	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-08-31  5:04 bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table Mark Oteiza
@ 2017-08-31 10:05 ` Robert Pluim
  2017-08-31 14:01   ` Eli Zaretskii
  2017-08-31 14:00 ` Eli Zaretskii
  1 sibling, 1 reply; 9+ messages in thread
From: Robert Pluim @ 2017-08-31 10:05 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 28302

Mark Oteiza <mvoteiza@udel.edu> writes:

> Hi,
>
> I seem to remember there having been complaints about ucs-names preview
> being slow.  I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead.  The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).

I haven't timed it exactly, but it makes a *very* noticeable
difference here. Thanks for this.

Regards

Robert





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-08-31  5:04 bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table Mark Oteiza
  2017-08-31 10:05 ` Robert Pluim
@ 2017-08-31 14:00 ` Eli Zaretskii
  2017-08-31 14:46   ` Mark Oteiza
  1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2017-08-31 14:00 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: 28302

> From: Mark Oteiza <mvoteiza@udel.edu>
> Date: Thu, 31 Aug 2017 01:04:15 -0400
> 
> I seem to remember there having been complaints about ucs-names preview
> being slow.  I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead.  The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).

Thanks, this is a very good change.  Please make sure (if you haven't
already) that it survives bootstrap.

Also, there are other places which assume that ucs-names is an alist,
so I guess this is not the full final patch?

And this should be mentioned in NEWS under incompatible Lisp changes,
as ucs-names debuted in Emacs 23.1, and there could be some uses of it
outside Emacs proper.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-08-31 10:05 ` Robert Pluim
@ 2017-08-31 14:01   ` Eli Zaretskii
  0 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2017-08-31 14:01 UTC (permalink / raw)
  To: Robert Pluim; +Cc: mvoteiza, 28302

> From: Robert Pluim <rpluim@gmail.com>
> Date: Thu, 31 Aug 2017 12:05:46 +0200
> Cc: 28302@debbugs.gnu.org
> 
> I haven't timed it exactly, but it makes a *very* noticeable
> difference here.

About ten-fold, according to my measurements.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-08-31 14:00 ` Eli Zaretskii
@ 2017-08-31 14:46   ` Mark Oteiza
  2017-08-31 21:30     ` Mark Oteiza
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Oteiza @ 2017-08-31 14:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 28302

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Mark Oteiza <mvoteiza@udel.edu>
>> Date: Thu, 31 Aug 2017 01:04:15 -0400
>> 
>> I seem to remember there having been complaints about ucs-names preview
>> being slow.  I was curious about how much of that time was spent
>> assoc'ing every element of a roughly n = 42k element long alist, and so
>> tried making it a hash table instead.  The result is a drastic speedup
>> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
>
> Thanks, this is a very good change.  Please make sure (if you haven't
> already) that it survives bootstrap.
>
> Also, there are other places which assume that ucs-names is an alist,
> so I guess this is not the full final patch?

Yes, since posting I found the other two places where ucs-names is used
and have made changes there.

> And this should be mentioned in NEWS under incompatible Lisp changes,
> as ucs-names debuted in Emacs 23.1, and there could be some uses of it
> outside Emacs proper.

Thanks, patch with additional changes attached.  I don't really like
repeating the string in descr-text.el, but BEL is the only outlier and I
think it would be overkill to write a rassoc analog.

diff --git a/etc/NEWS b/etc/NEWS
index e8d6ea9c6d..e8f6aec9ef 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1153,6 +1153,9 @@ table implementation. This uses a new bytecode op 'switch', which isn't
 compatible with previous Emacs versions. This functionality can be disabled
 by setting 'byte-compile-cond-use-jump-table' to nil.
 
+---
+** The variable 'ucs-names' is now a hash table.
+
 ** 'C-up', 'C-down', 'C-left' and 'C-right' are now defined in term
 mode to send the same escape sequences that xterm does.  This makes
 things like forward-word in readline work.
diff --git a/lisp/descr-text.el b/lisp/descr-text.el
index 6f36bbed68..b3c96988dd 100644
--- a/lisp/descr-text.el
+++ b/lisp/descr-text.el
@@ -617,16 +617,16 @@ describe-char
 			 (list
                           (let* ((names (ucs-names))
                                  (name
-                                  (or (when (= char 7)
+                                  (or (when (= char ?\a)
 				       ;; Special case for "BELL" which is
 				       ;; apparently the only char which
 				       ;; doesn't have a new name and whose
 				       ;; old-name is shadowed by a newer char
 				       ;; with that name (bug#25641).
-				       (car (rassoc char names)))
+				       "BELL (BEL)")
                                       (get-char-code-property char 'name)
                                       (get-char-code-property char 'old-name))))
-                            (if (and name (assoc-string name names))
+                            (if (and name (gethash name names))
                                 (format
                                  "type \"C-x 8 RET %x\" or \"C-x 8 RET %s\""
                                  char name)
diff --git a/lisp/leim/quail/latin-ltx.el b/lisp/leim/quail/latin-ltx.el
index 6c5afcd4f9..778706a451 100644
--- a/lisp/leim/quail/latin-ltx.el
+++ b/lisp/leim/quail/latin-ltx.el
@@ -75,20 +75,20 @@
           (`(,seq ,re)
            (let ((count 0)
                  (re (eval re t)))
-             (dolist (pair (ucs-names))
-               (let ((name (car pair))
-                     (char (cdr pair)))
-                 (when (and (characterp char) ;; Ignore char-ranges.
-                            (string-match re name))
-                   (let ((keys (if (stringp seq)
-                                   (replace-match seq nil nil name)
-                                 (funcall seq name char))))
-                     (if (listp keys)
-                         (dolist (x keys)
-                           (setq count (1+ count))
-                           (push (list x char) newrules))
-                       (setq count (1+ count))
-                       (push (list keys char) newrules))))))
+             (maphash
+              (lambda (name char) 
+                (when (and (characterp char) ;; Ignore char-ranges.
+                           (string-match re name))
+                  (let ((keys (if (stringp seq)
+                                  (replace-match seq nil nil name)
+                                (funcall seq name char))))
+                    (if (listp keys)
+                        (dolist (x keys)
+                          (setq count (1+ count))
+                          (push (list x char) newrules))
+                      (setq count (1+ count))
+                      (push (list keys char) newrules)))))
+               (ucs-names))
              ;; (message "latin-ltx: %d mappings for %S" count re)
 	     ))))
       (setq newrules (delete-dups newrules))
@@ -206,7 +206,7 @@
 
  ((lambda (name char)
     (let* ((base (concat (match-string 1 name) (match-string 3 name)))
-           (basechar (cdr (assoc base (ucs-names)))))
+           (basechar (gethash base (ucs-names))))
       (when (latin-ltx--ascii-p basechar)
         (string (if (match-end 2) ?^ ?_) basechar))))
   "\\(.*\\)SU\\(?:B\\|\\(PER\\)\\)SCRIPT \\(.*\\)")





^ permalink raw reply related	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-08-31 14:46   ` Mark Oteiza
@ 2017-08-31 21:30     ` Mark Oteiza
  2017-08-31 22:15       ` Drew Adams
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Oteiza @ 2017-08-31 21:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 28302-done

Pushed as 96c2c09.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-08-31 21:30     ` Mark Oteiza
@ 2017-08-31 22:15       ` Drew Adams
  2017-09-01  6:23         ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Drew Adams @ 2017-08-31 22:15 UTC (permalink / raw)
  To: Mark Oteiza, Eli Zaretskii; +Cc: 28302-done

> Pushed as 96c2c09.

Judging only by the Subject line, without looking at the code,
I'd guess that this will break user code that counts on `ucs-names'
being an alist.

I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
and Icicles.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-08-31 22:15       ` Drew Adams
@ 2017-09-01  6:23         ` Eli Zaretskii
  2017-09-01 10:43           ` Thien-Thi Nguyen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2017-09-01  6:23 UTC (permalink / raw)
  To: Drew Adams; +Cc: mvoteiza, 28302

> Date: Thu, 31 Aug 2017 15:15:07 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: 28302-done@debbugs.gnu.org
> 
> Judging only by the Subject line, without looking at the code,
> I'd guess that this will break user code that counts on `ucs-names'
> being an alist.

Yes, which is why it is mentioned as an incompatible change in NEWS.

> I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
> and Icicles.

I suggest to update them to support a hash table as well as an alist.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
  2017-09-01  6:23         ` Eli Zaretskii
@ 2017-09-01 10:43           ` Thien-Thi Nguyen
  0 siblings, 0 replies; 9+ messages in thread
From: Thien-Thi Nguyen @ 2017-09-01 10:43 UTC (permalink / raw)
  To: 28302

[-- Attachment #1: Type: text/plain, Size: 1722 bytes --]


() Eli Zaretskii <eliz@gnu.org>
() Fri, 01 Sep 2017 09:23:05 +0300

   I suggest to update them to support a hash table as well as an
   alist.

Tangent: Maybe we could extend ‘ucs-names’ (the function) to
take args and DTRT accordingly.  Something like:

 ;; backward compatible
 (ucs-names) => ALIST

 ;; lookup
 (ucs-names 'forward-lookup CHAR-NAME) => CHAR-CODE
 (ucs-names 'reverse-lookup CHAR-CODE) => CHAR-NAME

 ;; bonus: reflection
 (ucs-names 'as-alist) => ALIST
 (ucs-names 'as-hash-table) => HASH-TABLE

Admittedly, this is not very idiomatic Emacs Lisp.  OTOH, i
would guess the vast majority of callers use it for lookup, so
centralizing that functionality would be a net win, despite the
style drift.

Really, given the rise of lexical binding and w/ niceties like
‘apply-partially’ already in the mix, i expect that sooner or
later, someone will put into place something like:

 (defun callable-hash-table (source)
   (let ((ht (ELABORATE source)))
     ;; rv
     (lambda (&optional cmd)
       (case cmd
         (as-hash-table ht)
         (as-alist ...)
         (forward-lookup ...)
         (reverse-lookup ...)
         ...))))

 (fset ucs-names (callable-hash-table ucs-names))

IOW:  Style drift be damned!  Up the idioms!  HOP things now!
(Insert more FP slogans here. :-D)

Or maybe this is already done?  What am i missing?  More coffee!

-- 
Thien-Thi Nguyen -----------------------------------------------
 (defun responsep (query)
   (pcase (context query)
     (`(technical ,ml) (correctp ml))
     ...))                              748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-09-01 10:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-31  5:04 bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table Mark Oteiza
2017-08-31 10:05 ` Robert Pluim
2017-08-31 14:01   ` Eli Zaretskii
2017-08-31 14:00 ` Eli Zaretskii
2017-08-31 14:46   ` Mark Oteiza
2017-08-31 21:30     ` Mark Oteiza
2017-08-31 22:15       ` Drew Adams
2017-09-01  6:23         ` Eli Zaretskii
2017-09-01 10:43           ` Thien-Thi Nguyen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).