From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: remove-duplicates performances
Date: Fri, 20 May 2011 16:28:45 +0200
Message-ID: <878vu1qwde.fsf@fencepost.gnu.org>
References: <877h9lv5tl.fsf@gmail.com>
To: emacs-devel@gnu.org

Thierry Volpiatto writes:

> i just noticed that `remove-duplicates' is very slow.
>
> Something like below seem much faster:
>
> (defun* remove-dups (seq &key (test 'eq))
>   (let ((cont (make-hash-table :test test)))
>     (loop for elm in seq
>        unless (gethash elm cont)
>        do (puthash elm elm cont)
>        finally return (loop for i being the hash-values in cont collect i))))
>
> Test:
>
> (setq A (let ((seq (loop for i from 1 to 10000 collect i)))
>           (append seq seq)))
> (1 2 3 4 5 6 7 8 9 10 1 2 ...)
>
> (remove-dups A)
> (1 2 3 4 5 6 7 8 9 10 11 12 ...)
> elp-results: remove-dups 1 0.013707 0.013707
>
> (remove-duplicates A)
> (1 2 3 4 5 6 7 8 9 10 11 12 ...)
> elp-results: remove-duplicates 1 66.971619 66.971619
>
> Would be nice to improve performances of `remove-duplicates'.

There is little point in the overhead of a hashtable for a one-shot
operation.  Hashtables are best for _maintaining_ data, not for
processing other data structures.

I've found the following in some file of mine:

(defun uniquify (list predicate)
  (let* ((p list) lst (x1 (make-symbol "x1")) (x2 (make-symbol "x2")))
    (while p
      (push p lst)
      (setq p (cdr p)))
;;;  (princ lst)(princ "\n")
    (setq lst
          (sort lst `(lambda(,x1 ,x2)
                       (funcall ',predicate (car ,x1) (car ,x2)))))
;;; lst now contains all sorted sublists, with equal cars being
;;; sorted in order of increasing length (from end of list to start).
;;
    (while (cdr lst)
      (unless (funcall predicate (car (car lst)) (car (cadr lst)))
        (setcar (car lst) x1))
      (setq lst (cdr lst)))
    (delq x1 list)))

(uniquify '(2 1 2 1 2) '<)
(uniquify '(4 7 3 26 4 2 6 24 4 5 2 3 2 4 6) '<)

Obviously, this should make use of lexical binding "nowadays" instead of
creating its own unique symbols for function parameters.  And instead of
using x1 as a sentinel value some other unique thing might be used, like
a one-shot '(nil).

Basically, for a one-shot O(n^2) operation on a list, `sort' will
usually do the trick.  But if you are doing that O(n^2) operation
regularly, you should probably maintain the whole data set in a
hashtable instead.

-- 
David Kastrup
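
The two modernizations suggested in the message (lexical binding, and a
fresh cons as the sentinel) might look roughly like the sketch below.
It is not code from the original post: the name `uniquify-lexical' and
the variable names are hypothetical, and the sketch relies on `sort'
being stable, as the original comment about "increasing length" also
does.  Like the original, it modifies its argument destructively.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: `uniquify-lexical' is a hypothetical name, not part of
;; Emacs and not from the original message.
(defun uniquify-lexical (list predicate)
  "Destructively remove duplicates from LIST, keeping first occurrences.
PREDICATE is a strict ordering such as `<'.  Two elements count as
duplicates when neither of them sorts before the other."
  (let ((sentinel (list nil))           ; fresh cons, `eq' to nothing else
        (tails nil)
        (p list))
    ;; Collect every tail of LIST; `push' leaves the shortest tail first.
    (while p
      (push p tails)
      (setq p (cdr p)))
    ;; `sort' is stable, so tails with equal cars stay shortest-first.
    (setq tails (sort tails (lambda (a b)
                              (funcall predicate (car a) (car b)))))
    ;; In each run of equal cars, overwrite all but the longest tail,
    ;; which is the earliest occurrence in LIST.
    (while (cdr tails)
      (unless (funcall predicate (car (car tails)) (car (cadr tails)))
        (setcar (car tails) sentinel))
      (setq tails (cdr tails)))
    (delq sentinel list)))
```

Because the argument is modified in place, it seems safer to pass a
freshly built list than a quoted constant, for example
(uniquify-lexical (list 2 1 2 1 2) #'<).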