From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: "Drew Adams" Newsgroups: gmane.emacs.bugs Subject: bug#10386: CODE wishlist: =?UTF-8?Q?=C3=ADncludedelete-duplicates.el?= Date: Wed, 28 Dec 2011 08:56:48 -0800 Message-ID: <97F263C286E14A3A9D8656356D98ECD1@us.oracle.com> References: <87zked2kqt.fsf@cante.cante.net> <87aa6dvz8l.fsf@gmail.com> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Trace: dough.gmane.org 1325091450 14533 80.91.229.12 (28 Dec 2011 16:57:30 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Wed, 28 Dec 2011 16:57:30 +0000 (UTC) Cc: 'Jari Aalto' To: "'Thierry Volpiatto'" , <10386@debbugs.gnu.org> Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Wed Dec 28 17:57:25 2011 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([140.186.70.17]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Rfwod-00006m-TH for geb-bug-gnu-emacs@m.gmane.org; Wed, 28 Dec 2011 17:57:24 +0100 Original-Received: from localhost ([::1]:52310 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Rfwoc-0008KX-Q7 for geb-bug-gnu-emacs@m.gmane.org; Wed, 28 Dec 2011 11:57:22 -0500 Original-Received: from eggs.gnu.org ([140.186.70.92]:48317) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Rfwoa-0008KF-1n for bug-gnu-emacs@gnu.org; Wed, 28 Dec 2011 11:57:20 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RfwoY-0007nQ-HW for bug-gnu-emacs@gnu.org; Wed, 28 Dec 2011 11:57:19 -0500 Original-Received: from debbugs.gnu.org ([140.186.70.43]:56692) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RfwoY-0007nL-F3 for bug-gnu-emacs@gnu.org; Wed, 28 Dec 2011 11:57:18 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.69) (envelope-from ) id 1RfwrC-0002pS-Jf for bug-gnu-emacs@gnu.org; Wed, 28 Dec 2011 12:00:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: "Drew Adams" Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 28 Dec 2011 17:00:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10386 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 10386-submit@debbugs.gnu.org id=B10386.132509160010838 (code B ref 10386); Wed, 28 Dec 2011 17:00:02 +0000 Original-Received: (at 10386) by debbugs.gnu.org; 28 Dec 2011 17:00:00 +0000 Original-Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Rfwr9-0002ol-SB for submit@debbugs.gnu.org; Wed, 28 Dec 2011 12:00:00 -0500 Original-Received: from acsinet15.oracle.com ([141.146.126.227]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Rfwr5-0002oZ-IB for 10386@debbugs.gnu.org; Wed, 28 Dec 2011 11:59:58 -0500 Original-Received: from ucsinet22.oracle.com (ucsinet22.oracle.com [156.151.31.94]) by acsinet15.oracle.com (Switch-3.4.4/Switch-3.4.4) with ESMTP id pBSGv7WC026777 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 28 Dec 2011 16:57:08 GMT Original-Received: from acsmt356.oracle.com (acsmt356.oracle.com [141.146.40.156]) by ucsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id pBSGv5QC004890 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 28 Dec 2011 16:57:06 GMT Original-Received: from abhmt104.oracle.com (abhmt104.oracle.com [141.146.116.56]) by acsmt356.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id pBSGv5BH005841; Wed, 28 Dec 2011 10:57:05 -0600 Original-Received: from dradamslap1 (/10.159.49.247) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 28 Dec 2011 08:57:05 -0800 X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <87aa6dvz8l.fsf@gmail.com> Thread-Index: AczFQK24oKjCRKpURdKe0XwmO2kMNQAHF2BA X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-Source-IP: ucsinet22.oracle.com [156.151.31.94] X-CT-RefId: str=0001.0A02020A.4EFB4A64.00EB,ss=1,re=0.000,fgs=0 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list Resent-Date: Wed, 28 Dec 2011 12:00:02 -0500 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:55252 Archived-At: > > http://mwolson.org/static/dist/elisp/delete-duplicates.el > > This is already provided in Emacs24+. `delete-dups' has been in Emacs since Emacs 22, actually. > What is not provided is a fast version of remove-duplicates. > http://article.gmane.org/gmane.emacs.devel/139546/match=remove+dups +1 And the cl-seq.el version of `remove-duplicates' is *particularly* slow. Even a classic list dups removal algorithm is much faster. Here's a comparison using `equal' and a list of strings (`elp-results'): hash-remove-dups 1 0.031 0.031 list-remove-dups 1 5.813 5.813 remove-duplicates 1 122.875 122.875 Where: (defun list-remove-dups (list) (let ((tail list) new) (while tail (unless (member (car tail) new) (push (car tail) new)) (pop tail)) (nreverse new))) (defun* hash-remove-dups (seq &key (test 'equal)) (let ((cont (make-hash-table :test test))) (loop for elm in seq unless (gethash elm cont) do (puthash elm elm cont) finally return (loop for i being the hash-values in cont collect i)))) With all 3 functions byte-compiled, using these calls: (hash-remove-dups B :test 'equal) (list-remove-dups B) ; uses `equal' (remove-duplicates B :test 'equal) And with this list B (initialized anew each time): (let ((seq (loop for i from 1 to 10000 collect (format "%s" (random most-positive-fixnum))))) (append seq seq)) With B 10 times smaller (1000): hash-remove-dups 1 0.0 0.0 list-remove-dups 1 0.047 0.047 remove-duplicates 1 1.172 1.172 With B 10 times bigger (100000), the difference between hash and classic list is even greater (fuggedabowt cl-seq's `remove-duplicates' in this case): hash-remove-dups 1 0.359 0.359 list-remove-dups 1 1209.578 1209.578 remove-duplicates la-la...