From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Oleh Krehel Newsgroups: gmane.emacs.devel Subject: Re: What to do for faster `remove-duplicates'? Date: Wed, 06 May 2015 15:30:45 +0200 Message-ID: <873839ltoa.fsf@gmail.com> References: <87383atb2p.fsf@gmail.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: ger.gmane.org 1430919434 29155 80.91.229.3 (6 May 2015 13:37:14 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 6 May 2015 13:37:14 +0000 (UTC) Cc: emacs-devel@gnu.org To: Stefan Monnier Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed May 06 15:37:10 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YpzVb-0002hq-A6 for ged-emacs-devel@m.gmane.org; Wed, 06 May 2015 15:37:07 +0200 Original-Received: from localhost ([::1]:45130 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YpzVa-0007hB-Bl for ged-emacs-devel@m.gmane.org; Wed, 06 May 2015 09:37:06 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39123) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YpzVS-0007cU-Dy for emacs-devel@gnu.org; Wed, 06 May 2015 09:37:02 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YpzVO-0008Fa-7x for emacs-devel@gnu.org; Wed, 06 May 2015 09:36:58 -0400 Original-Received: from mail-wg0-x22d.google.com ([2a00:1450:400c:c00::22d]:33717) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YpzVO-0008FM-0O for emacs-devel@gnu.org; Wed, 06 May 2015 09:36:54 -0400 Original-Received: by wgin8 with SMTP id n8so11965536wgi.0 for ; Wed, 06 May 2015 06:36:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version:content-type; bh=7VYRt5Qgc2Bc9k6lAyHH8bJ7bRju79bUCa9jgmdSIFw=; b=j9TfMTBHX4afoe83244tZLST0C8AY/2ssdYphigr2JaDfNW28cgM26Ydx/otH4qTs8 WPbYRsQIN+Zm8gLDYw3DonJNmokZxqwt29L/cV5dKgSvjfJc8xO+ND5qaHI1rWVFl0zL ecZG9Du1eIIFnoIERjM+MxdLcuGBo4aQ7OAm7ou7QKC37rrxMQuTzqW8cTNAVUqpA4Zb aD3fZgu2Y6Lz9uWg2TKGCgh+tmVK8+hd2gTiBkJ2sc9WB8uGurJChtULQD4ITrtEUKJt vgFHjVF5OFrnONMxUCNaDsq2ev4ECDaVoMA+0yTZtg7iJXWyVzny/M0tWRtK/MexAiiI 8/Xg== X-Received: by 10.180.92.198 with SMTP id co6mr13939447wib.34.1430919413219; Wed, 06 May 2015 06:36:53 -0700 (PDT) Original-Received: from firefly (dyn069045.nbw.tue.nl. [131.155.69.45]) by mx.google.com with ESMTPSA id r9sm2806818wjo.26.2015.05.06.06.36.52 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 06 May 2015 06:36:52 -0700 (PDT) In-Reply-To: (Stefan Monnier's message of "Wed, 06 May 2015 08:59:33 -0400") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux) X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2a00:1450:400c:c00::22d X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:186279 Archived-At: --=-=-= Content-Type: text/plain Stefan Monnier writes: >> So should `cl-remove-duplicates' be a 2-in-1 (lists for small input, > > Don't know/don't care about cl-remove-duplicates, but for delete-dups, I'd > welcome a patch which switches to a hash-table after the Nth element. I attach the patch. I did a bunch of `benchmark-run' and it seems that 100 elements is the breaking point. > PS: BTW, helm-fast-remove-dups's docstring says: > > This is same as `remove-duplicates' but with memoisation. > > but it doesn't actually use memoization, AFAICT. It does in a way, for an optimized-out "is-element-in-collection" function. Oleh --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-lisp-subr.el-delete-dups-Use-a-hash-table.patch >From c845e6384283c17b61c3a0401213e4a1837807c5 Mon Sep 17 00:00:00 2001 From: Oleh Krehel Date: Wed, 6 May 2015 15:21:23 +0200 Subject: [PATCH] lisp/subr.el (delete-dups): Use a hash table * lisp/subr.el (delete-dups): When there are more than 100 candidates, use a hash table. This can result in ~500 times speed-up for typical collections of size 5000, like that of `load-library'. --- lisp/subr.el | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/lisp/subr.el b/lisp/subr.el index 0fec29c..591980d 100644 --- a/lisp/subr.el +++ b/lisp/subr.el @@ -417,11 +417,19 @@ If N is omitted or nil, remove the last element." Store the result in LIST and return it. LIST must be a proper list. Of several `equal' occurrences of an element in LIST, the first one is kept." - (let ((tail list)) - (while tail - (setcdr tail (delete (car tail) (cdr tail))) - (setq tail (cdr tail)))) - list) + (if (> (length list) 100) + (let ((hash (make-hash-table :test #'equal)) + res) + (dolist (elt list) + (unless (gethash elt hash) + (puthash elt elt hash) + (push elt res))) + (nreverse res)) + (let ((tail list)) + (while tail + (setcdr tail (delete (car tail) (cdr tail))) + (setq tail (cdr tail)))) + list)) ;; See http://lists.gnu.org/archive/html/emacs-devel/2013-05/msg00204.html (defun delete-consecutive-dups (list &optional circular) -- 1.8.4 --=-=-=--