From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Philip Kaludercic Newsgroups: gmane.emacs.devel Subject: Re: [ELPA] New package: dired-duplicates Date: Wed, 01 Nov 2023 21:29:29 +0000 Message-ID: <875y2lyxfq.fsf@posteo.net> References: <875y2nugm8.fsf@posteo.net> <87zfzx1zfz.fsf@posteo.net> <9bcb9fe9-37b7-4c6e-8292-772ebbe379d9@gmx.at> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="35125"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel@gnu.org, Stefan Kangas , Eli Zaretskii , Visuwesh To: Harald Judt Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed Nov 01 22:30:27 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qyInC-0008vn-Pn for ged-emacs-devel@m.gmane-mx.org; Wed, 01 Nov 2023 22:30:26 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qyImT-00052N-Eg; Wed, 01 Nov 2023 17:29:42 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qyImP-00052D-PV for emacs-devel@gnu.org; Wed, 01 Nov 2023 17:29:38 -0400 Original-Received: from mout01.posteo.de ([185.67.36.65]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qyImM-0002qo-0S for emacs-devel@gnu.org; Wed, 01 Nov 2023 17:29:37 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 0F754240029 for ; Wed, 1 Nov 2023 22:29:30 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1698874171; bh=JNW7JzxFYsYadmZ/+uelkkC3A6SrAlSLj/MEdI6oGf8=; h=From:To:Cc:Subject:Autocrypt:Date:Message-ID:MIME-Version:From; b=RoYZf1UK9b3j61qvfB3qdQRjeNBfzWltOrpkeyMZtu/oh6haPGgQwAca0YUCRyeVb 421wKISoGdKdu5u5WnRnF5FTsRmA6Hbbm+qMTO6hlk7qE/hPmUxTGIVYPIygb03djE zQeIOPqhf2k04kQfeFJzGMvdbGNB3bARc0fduiI49CihoZEM9KDpapywYNiWu9lafh iUh/Xfj0O6++R4U+3/KK6hyltxZrYRoyR827PabZ7cnkFF/WWbNSaWUPLlsF7rLlya D2HBH6Qa0cWy9tNXNbMl9SHTgmKOx75JOCDTqPENt6z79BbQY9Q5t9bUF3jF2Fj9xI rXgOYFe9v8SdQ== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4SLKs61pKxz6tvq; Wed, 1 Nov 2023 22:29:30 +0100 (CET) In-Reply-To: <9bcb9fe9-37b7-4c6e-8292-772ebbe379d9@gmx.at> (Harald Judt's message of "Wed, 1 Nov 2023 21:04:29 +0100") Autocrypt: addr=philipk@posteo.net; keydata= mDMEZBBQQhYJKwYBBAHaRw8BAQdAHJuofBrfqFh12uQu0Yi7mrl525F28eTmwUDflFNmdui0QlBo aWxpcCBLYWx1ZGVyY2ljIChnZW5lcmF0ZWQgYnkgYXV0b2NyeXB0LmVsKSA8cGhpbGlwa0Bwb3N0 ZW8ubmV0PoiWBBMWCAA+FiEEDg7HY17ghYlni8XN8xYDWXahwukFAmQQUEICGwMFCQHhM4AFCwkI BwIGFQoJCAsCBBYCAwECHgECF4AACgkQ8xYDWXahwulikAEA77hloUiSrXgFkUVJhlKBpLCHUjA0 mWZ9j9w5d08+jVwBAK6c4iGP7j+/PhbkxaEKa4V3MzIl7zJkcNNjHCXmvFcEuDgEZBBQQhIKKwYB BAGXVQEFAQEHQI5NLiLRjZy3OfSt1dhCmFyn+fN/QKELUYQetiaoe+MMAwEIB4h+BBgWCAAmFiEE Dg7HY17ghYlni8XN8xYDWXahwukFAmQQUEICGwwFCQHhM4AACgkQ8xYDWXahwukm+wEA8cml4JpK NeAu65rg+auKrPOP6TP/4YWRCTIvuYDm0joBALw98AMz7/qMHvSCeU/hw9PL6u6R2EScxtpKnWof z4oM Received-SPF: pass client-ip=185.67.36.65; envelope-from=philipk@posteo.net; helo=mout01.posteo.de X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_FILL_THIS_FORM_SHORT=0.01, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:312102 Archived-At: Harald Judt writes: > Hi, > > On 2023-11-01 12:32, Philip Kaludercic wrote: > >> You forgot to CC me in the response, I would have almost missed your response. > > I am very sorry for this. No worries, happens to everyone :) >> Harald Judt writes: >> >>> On 2023-10-31 13:21, Philip Kaludercic wrote: >>> >>> [...] >>> >>>> @@ -225,7 +220,7 @@ This is the same as `dired-do-flagged-delete', but calls >>>> (defvar dired-duplicates-map >>>> (let ((map (make-sparse-keymap))) >>>> - ;; workaround for Emacs bug #57565 >>>> + ;; workaround for Emacs bug#57565 >>>> (when (< emacs-major-version 29) >>>> (define-key map (kbd "x") 'dired-duplicates--do-flagged-delete) >>>> (define-key map (kbd "D") 'dired-duplicates--do-delete)) >>> >>> BTW: What's the reasoning behind this suggested change? >> I thought that this would be necessary for bug-reference-mode to >> work, >> but apparently it can detect both of these references. > > Yes, whether space or not, it doesn't matter, at least in > Emacs-29.1. But that aside, bug reference mode wants to take me to a > page on my project (https://codeberg.org/bug/issues/57565) which does > not exist because it is an *Emacs* bug that is referenced in the > comment, not a bug in dired-duplicates. I am afraid I don't understand. Did you modify `bug-reference-url-format'? >> Eli Zaretskii writes: >> >>>> From: Philip Kaludercic >>>> Cc: emacs-devel@gnu.org >>>> Date: Tue, 31 Oct 2023 12:21:51 +0000 >>>> >>>> In the meantime, here is some quick and superficial feedback on the code: >>> >>> What I would like to ask is whether Harald tried to calculate SHA256 >>> using Emacs's own primitives, instead of relying on an external >>> program, which may or may not be installed. >> That is a good point, but I can imagine that one reason for using an >> external tool is that if the user has a specific program they wish to >> use that is not available as an Emacs primitive? > > I have not tried to calculate SHA256 using Emacs's own primitives, but > not because I did not want to. The reasoning behind my decision is the > following: > > dired-duplicates does support local files and *remote* files - via > TRAMP. For local files using Emacs's own primitives could be OK > performance-wise, I have not tested this. But if one starts to use > them for fetching files via TRAMP which means over network, then > things could slow down to a crawl pretty fast (it even takes quite a > while collecting size stats over the network). So it is not Emacs's > SHA256 calculation that would be a problem here, but instead slow > transport over network. If the host does the calculation, this should > be a lot faster (though that of course depends on whether the files > are local files on the host of course and not some mounted network > resource). I think it is reasonable to assume that one would *not* > want to download large video files over the network to check if they > are duplicates. > > The second thing is customizability of course, and probably necessary > because of my reasoning explained in the previous paragraph. The > remote host might have md5sum installed, but not sha256sum. So I > wanted to provide a way for the user to configure the checksum > command. > > I do not object against implementing a fallback as Eli suggested; I > simply have not implemented it because I did not know about Emacs's > primitives, and at that same time my reasoning about remote files > started and so that is the way it is now. > > Regards, > Harald Harald Judt writes: > Hi, > > On 2023-11-01 20:24, Eli Zaretskii wrote: >>> From: Philip Kaludercic >>> Cc: Visuwesh , h.judt@gmx.at, emacs-devel@gnu.org >>> Date: Wed, 01 Nov 2023 17:57:40 +0000 >>> >>> Eli Zaretskii writes: >>> >>>> I'd like to see the numbers which led to the conclusion that >>>> performance was prohibitive. >>>> >>>> And even if the performance is indeed much worse, it could be a >>>> fallback in case the program is not available -- which would IMO be >>>> much better than simply failing to provide the functionality in that >>>> case. >>> >>> This is a cheap test on a 1.4GB ISO I had lying around: >> Thanks. But this is just a single large file, not a frequent use >> case >> for this package. >> >>> --8<---------------cut here---------------start------------->8--- >>> (benchmark-run 1 >>> (with-temp-buffer >>> (insert-file-contents-literally "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso") >>> (secure-hash 'sha512 (current-buffer)))) >>> ;; (44.389091035 1 1.5836082630000021) >>> >>> (benchmark-run 1 >>> (with-temp-buffer >>> (call-process "sha512sum" nil t nil (expand-file-name "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso")) >>> (goto-char (point-min)) >>> (and (looking-at (rx bos (+ alnum))) >>> (match-string 0)))) >>> ;; (5.155846791 0 0.0) >>> --8<---------------cut here---------------end--------------->8--- >> And this is not the package doing its job, this is just a single >> task >> the package does when looking for duplicates. The numbers when >> running the package with call-process replaced by secure-hash will >> probably be different. > > Thanks for the numbers. What troubles me a bit more than the speed is > which effects inserting such a large file into a buffer has > memory-wise? It appears to depend on the size of the file. I haven't determined the exact tipping point where loading a file into memory becomes more expensive than spawning a process, but it seems to be relatively low. Another advantage of using a subprocess, is that the files can be processed in parallel, assuming you start the processes asynchronously. > Regards, > Harald Harald Judt writes: > Hi Philip, > > Another more organisational question regarding your review: If I apply > your patch and create commits from it in the project, should I then > use your name and email address as author of the commit (I guess I can > assume you also assigned copyright to the FSF ;-))? You can attribute it to me, but I don't really care much either way. I didn't really send you a patch to apply, but just the output of C-x v = after leaving a few comments in the file. > Harald