From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Oleh Krehel Newsgroups: gmane.emacs.devel Subject: Re: What to do for faster `remove-duplicates'? Date: Wed, 06 May 2015 21:32:52 +0200 Message-ID: <87fv79zel7.fsf@gmail.com> References: <87383atb2p.fsf@gmail.com> <873839ltoa.fsf@gmail.com> <87bnhxeh4j.fsf@gmail.com> <87lhh1zgyn.fsf@gmail.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: ger.gmane.org 1430941161 18296 80.91.229.3 (6 May 2015 19:39:21 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 6 May 2015 19:39:21 +0000 (UTC) Cc: Thierry Volpiatto , emacs-devel To: Artur Malabarba Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed May 06 21:39:20 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Yq5A6-0001Fh-Ql for ged-emacs-devel@m.gmane.org; Wed, 06 May 2015 21:39:19 +0200 Original-Received: from localhost ([::1]:46913 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yq5A6-00009F-2R for ged-emacs-devel@m.gmane.org; Wed, 06 May 2015 15:39:18 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:52839) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yq59s-000093-OO for emacs-devel@gnu.org; Wed, 06 May 2015 15:39:05 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yq59p-0007HU-G5 for emacs-devel@gnu.org; Wed, 06 May 2015 15:39:04 -0400 Original-Received: from mail-wi0-x231.google.com ([2a00:1450:400c:c05::231]:38475) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yq59p-0007HQ-9P for emacs-devel@gnu.org; Wed, 06 May 2015 15:39:01 -0400 Original-Received: by wiun10 with SMTP id n10so34683422wiu.1 for ; Wed, 06 May 2015 12:39:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version:content-type; bh=gPOor+nJZjKAav4oKnd149883N2NqpT6gnQRisFy0pk=; b=xIQQ/Jeor7yMjTK8pm9kdt77viEy3EMERoH8frnH668tBMSrmnE3fTI6H/zdShUCBw 0/W58nyCTAK27w9t2KMxs/qFj6ONhI3xqlJhuFAd4GC2QXU1qi+3ll1FIVN/4iMUnhsd qcTQpM7+taAVAFNPqvz/eBfWJp9RN24eVruFZYl5Gsq6bh3MfLqpOEgk4xggIoxYEo62 AlqaHvaFEsebQtj6GHDjLNlIc+3CFrQhWIgdwBd8A1170xQkCRM50B9xHKEwgFi5/rgT YNd1HDDJrOvJuRrsduAV71p1zfXv6uUesYm1x7PhoQG4kP3kqCzsz3i3b3bNjMNtZ3Qe fnaA== X-Received: by 10.194.97.196 with SMTP id ec4mr601797wjb.3.1430941140547; Wed, 06 May 2015 12:39:00 -0700 (PDT) Original-Received: from firefly (dyn069045.nbw.tue.nl. [131.155.69.45]) by mx.google.com with ESMTPSA id hu1sm207234wib.6.2015.05.06.12.38.59 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 06 May 2015 12:39:00 -0700 (PDT) In-Reply-To: (Artur Malabarba's message of "Wed, 6 May 2015 20:24:42 +0100") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2a00:1450:400c:c05::231 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:186299 Archived-At: Artur Malabarba writes: > 2015-05-06 19:41 GMT+01:00 Oleh Krehel : >> Artur Malabarba writes: >> >>> But that nreverse could be optimized out if the first loop followed a >>> `while'+`setcdr' strategy like the second. >> >> Please check the optimized version: >> >> (defun delete-dups (list) >> "Destructively remove `equal' duplicates from LIST. >> Store the result in LIST and return it. LIST must be a proper list. >> Of several `equal' occurrences of an element in LIST, the first >> one is kept." >> (if (> (length list) 100) >> (let ((hash (make-hash-table :test #'equal))) >> (let ((tail list) >> elt retail) >> (while (setq retail (cdr tail)) >> (setq elt (car retail)) >> (if (gethash elt hash) >> (setcdr tail (cdr retail)) >> (puthash elt t hash)) >> (setq tail retail)))) >> (let ((tail list)) >> (while tail >> (setcdr tail (delete (car tail) (cdr tail))) >> (setq tail (cdr tail))))) >> list) > > I may be wrong, but it looks like it's completely skipping the car of the list. You're right. I've pushed the corrected version. I tried testing with benchmark-run and the same large collection. The time fluctuates too much to judge which method is better. They're roughly equal, and the intuition is that not using `nreverse` is better. Oleh