all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Oleh Krehel <ohwoeowho@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-devel@gnu.org
Subject: Re: What to do for faster `remove-duplicates'?
Date: Wed, 06 May 2015 15:30:45 +0200	[thread overview]
Message-ID: <873839ltoa.fsf@gmail.com> (raw)
In-Reply-To: <jwv3839c1am.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message of "Wed, 06 May 2015 08:59:33 -0400")

[-- Attachment #1: Type: text/plain, Size: 639 bytes --]

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> So should `cl-remove-duplicates' be a 2-in-1 (lists for small input,
>
> Don't know/don't care about cl-remove-duplicates, but for delete-dups, I'd
> welcome a patch which switches to a hash-table after the Nth element.

I attach the patch. I did a bunch of `benchmark-run' and it seems that
100 elements is the breaking point.

> PS: BTW, helm-fast-remove-dups's docstring says:
>
>     This is same as `remove-duplicates' but with memoisation.
>
>   but it doesn't actually use memoization, AFAICT.

It does in a way, for an optimized-out "is-element-in-collection"
function.

Oleh


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-lisp-subr.el-delete-dups-Use-a-hash-table.patch --]
[-- Type: text/x-diff, Size: 1512 bytes --]

From c845e6384283c17b61c3a0401213e4a1837807c5 Mon Sep 17 00:00:00 2001
From: Oleh Krehel <ohwoeowho@gmail.com>
Date: Wed, 6 May 2015 15:21:23 +0200
Subject: [PATCH] lisp/subr.el (delete-dups): Use a hash table

* lisp/subr.el (delete-dups): When there are more than 100 candidates,
  use a hash table. This can result in ~500 times speed-up for typical
  collections of size 5000, like that of `load-library'.
---
 lisp/subr.el | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index 0fec29c..591980d 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -417,11 +417,19 @@ If N is omitted or nil, remove the last element."
 Store the result in LIST and return it.  LIST must be a proper list.
 Of several `equal' occurrences of an element in LIST, the first
 one is kept."
-  (let ((tail list))
-    (while tail
-      (setcdr tail (delete (car tail) (cdr tail)))
-      (setq tail (cdr tail))))
-  list)
+  (if (> (length list) 100)
+      (let ((hash (make-hash-table :test #'equal))
+            res)
+        (dolist (elt list)
+          (unless (gethash elt hash)
+            (puthash elt elt hash)
+            (push elt res)))
+        (nreverse res))
+    (let ((tail list))
+      (while tail
+        (setcdr tail (delete (car tail) (cdr tail)))
+        (setq tail (cdr tail))))
+    list))
 
 ;; See http://lists.gnu.org/archive/html/emacs-devel/2013-05/msg00204.html
 (defun delete-consecutive-dups (list &optional circular)
-- 
1.8.4


  reply	other threads:[~2015-05-06 13:30 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-06  7:33 What to do for faster `remove-duplicates'? Oleh Krehel
2015-05-06 12:59 ` Stefan Monnier
2015-05-06 13:30   ` Oleh Krehel [this message]
2015-05-06 13:50     ` Stefan Monnier
2015-05-06 13:51       ` Oleh Krehel
2015-05-06 17:43       ` Thierry Volpiatto
2015-05-06 17:54         ` Oleh Krehel
2015-05-06 18:31         ` Artur Malabarba
2015-05-06 18:33           ` Artur Malabarba
2015-05-06 18:41             ` Oleh Krehel
2015-05-06 19:24               ` Artur Malabarba
2015-05-06 19:32                 ` Oleh Krehel
2015-05-06 21:22                   ` Artur Malabarba
2015-05-06 18:48           ` Thierry Volpiatto
2015-05-06 20:54             ` Stefan Monnier
2015-05-06 20:54           ` Stefan Monnier
2015-05-06 21:18             ` Artur Malabarba
2015-05-06 14:04     ` Artur Malabarba
2015-05-06 14:13       ` Oleh Krehel
2015-05-06 14:49         ` Artur Malabarba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=873839ltoa.fsf@gmail.com \
    --to=ohwoeowho@gmail.com \
    --cc=emacs-devel@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.