From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: remove-duplicates performances
Date: Fri, 20 May 2011 19:01:16 +0200
Message-ID: <871uztqpb7.fsf@fencepost.gnu.org>
References: <877h9lv5tl.fsf@gmail.com> <878vu1qwde.fsf@fencepost.gnu.org> <87pqndcqfr.fsf@gmail.com>
To: emacs-devel@gnu.org

Thierry Volpiatto writes:

> David Kastrup writes:
>
>> I've found the following in some file of mine:
>>
>> (defun uniquify (list predicate)
>>   (let* ((p list) lst (x1 (make-symbol "x1"))
>>          (x2 (make-symbol "x2")))
>>     (while p
>>       (push p lst)
>>       (setq p (cdr p)))
>> ;;;    (princ lst)(princ "\n")
>>     (setq lst
>>           (sort lst `(lambda(,x1 ,x2)
>>                        (funcall ',predicate (car ,x1) (car ,x2)))))
>> ;;; lst now contains all sorted sublists, with equal cars being
>> ;;; sorted in order of increasing length (from end of list to start).
>> ;;
>>
>>     (while (cdr lst)
>>       (unless (funcall predicate (car (car lst)) (car (cadr lst)))
>>         (setcar (car lst) x1))
>>       (setq lst (cdr lst)))
>>     (delq x1 list)))
>>
>> (uniquify '(2 1 2 1 2) '<)
>> (uniquify '(4 7 3 26 4 2 6 24 4 5 2 3 2 4 6) '<)
>
> This is nice and very instructive (at least for me) thanks.
> It is not as performant as the version with hash-table,

Well, the sorting function is a mess due to not being compiled and
fearing dynamic binding.  If you byte-compile something like

(defun uniquify (list predicate)
  (let* ((p list) lst (sentinel (list nil)))
    (while p
      (push p lst)
      (setq p (cdr p)))
    (setq lst
          (sort lst (lambda(x1 x2)
                      (funcall predicate (car x1) (car x2)))))
;;; lst now contains all sorted sublists, with equal cars being
;;; sorted in order of increasing length (from end of list to start).
;;
    (while (cdr lst)
      (unless (funcall predicate (car (car lst)) (car (cadr lst)))
        (setcar (car lst) sentinel))
      (setq lst (cdr lst)))
    (delq sentinel list)))

the behavior is likely better.

> but very usable: 0.3 <=> 0.13 with same test on list with 20000
> elements.  However, isn't it a problem when we want to remove
> duplicate in a list type alist e.g ((a . 1) (b . 2) (a . 1) (c . 3) (b
> . 2)...)

Why?  You need a predicate < both for sorting and for telling
inequality.  As long as you define a suitable predicate for that
purpose, what should go wrong?  Any elements for which
(or (predicate a b) (predicate b a)) is nil will be considered
duplicate.

-- 
David Kastrup
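
To illustrate the point about alists, here is a minimal sketch, assuming
every entry has the form (SYMBOL . NUMBER).  The predicate my-entry< is
hypothetical and not from the thread; uniquify is the second definition
from this message, copied unchanged.  The input is built with list and
cons rather than quoted, since uniquify alters its argument through
setcar and delq.

```elisp
;; -*- lexical-binding: t -*-
;; The cookie above only takes effect as the first line of a file; the
;; code also runs under dynamic binding.

;; Second definition from the message, copied unchanged.
(defun uniquify (list predicate)
  (let* ((p list) lst (sentinel (list nil)))
    ;; Collect every tail of LIST, shortest tail first.
    (while p
      (push p lst)
      (setq p (cdr p)))
    ;; Sort the tails by their first element.
    (setq lst
          (sort lst (lambda (x1 x2)
                      (funcall predicate (car x1) (car x2)))))
    ;; Mark an element when it is not less than its successor,
    ;; i.e. when the two are equivalent under PREDICATE.
    (while (cdr lst)
      (unless (funcall predicate (car (car lst)) (car (cadr lst)))
        (setcar (car lst) sentinel))
      (setq lst (cdr lst)))
    (delq sentinel list)))

;; Hypothetical predicate (an assumption, not from the thread): order
;; entries by key name, then by numeric value.  Two entries count as
;; duplicates exactly when neither is less than the other.
(defun my-entry< (a b)
  (let ((ka (symbol-name (car a)))
        (kb (symbol-name (car b))))
    (or (string< ka kb)
        (and (string= ka kb)
             (< (cdr a) (cdr b))))))

(uniquify (list (cons 'a 1) (cons 'b 2) (cons 'a 1)
                (cons 'c 3) (cons 'b 2))
          #'my-entry<)
;; => ((a . 1) (b . 2) (c . 3))
```

Because sort is stable and the tails start out ordered from shortest to
longest, the first occurrence of each entry is the one that survives.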