From: Philip Kaludercic <philipk@posteo.net>
To: Harald Judt <h.judt@gmx.at>
Cc: emacs-devel@gnu.org, Stefan Kangas <stefankangas@gmail.com>,
Eli Zaretskii <eliz@gnu.org>, Visuwesh <visuweshm@gmail.com>
Subject: Re: [ELPA] New package: dired-duplicates
Date: Wed, 01 Nov 2023 21:29:29 +0000 [thread overview]
Message-ID: <875y2lyxfq.fsf@posteo.net> (raw)
In-Reply-To: <9bcb9fe9-37b7-4c6e-8292-772ebbe379d9@gmx.at> (Harald Judt's message of "Wed, 1 Nov 2023 21:04:29 +0100")
Harald Judt <h.judt@gmx.at> writes:
> Hi,
>
> On 2023-11-01 12:32, Philip Kaludercic wrote:
>
>> You forgot to CC me in the response, I would have almost missed your response.
>
> I am very sorry for this.
No worries, happens to everyone :)
>> Harald Judt <h.judt@gmx.at> writes:
>>
>>> On 2023-10-31 13:21, Philip Kaludercic wrote:
>>>
>>> [...]
>>>
>>>> @@ -225,7 +220,7 @@ This is the same as `dired-do-flagged-delete', but calls
>>>> (defvar dired-duplicates-map
>>>> (let ((map (make-sparse-keymap)))
>>>> - ;; workaround for Emacs bug #57565
>>>> + ;; workaround for Emacs bug#57565
>>>> (when (< emacs-major-version 29)
>>>> (define-key map (kbd "x") 'dired-duplicates--do-flagged-delete)
>>>> (define-key map (kbd "D") 'dired-duplicates--do-delete))
>>>
>>> BTW: What's the reasoning behind this suggested change?
>> I thought that this would be necessary for bug-reference-mode to
>> work,
>> but apparently it can detect both of these references.
>
> Yes, whether space or not, it doesn't matter, at least in
> Emacs-29.1. But that aside, bug reference mode wants to take me to a
> page on my project (https://codeberg.org/bug/issues/57565) which does
> not exist because it is an *Emacs* bug that is referenced in the
> comment, not a bug in dired-duplicates.
I am afraid I don't understand. Did you modify `bug-reference-url-format'?
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>>>> From: Philip Kaludercic <philipk@posteo.net>
>>>> Cc: emacs-devel@gnu.org
>>>> Date: Tue, 31 Oct 2023 12:21:51 +0000
>>>>
>>>> In the meantime, here is some quick and superficial feedback on the code:
>>>
>>> What I would like to ask is whether Harald tried to calculate SHA256
>>> using Emacs's own primitives, instead of relying on an external
>>> program, which may or may not be installed.
>> That is a good point, but I can imagine that one reason for using an
>> external tool is that if the user has a specific program they wish to
>> use that is not available as an Emacs primitive?
>
> I have not tried to calculate SHA256 using Emacs's own primitives, but
> not because I did not want to. The reasoning behind my decision is the
> following:
>
> dired-duplicates does support local files and *remote* files - via
> TRAMP. For local files using Emacs's own primitives could be OK
> performance-wise, I have not tested this. But if one starts to use
> them for fetching files via TRAMP which means over network, then
> things could slow down to a crawl pretty fast (it even takes quite a
> while collecting size stats over the network). So it is not Emacs's
> SHA256 calculation that would be a problem here, but instead slow
> transport over network. If the host does the calculation, this should
> be a lot faster (though that of course depends on whether the files
> are local files on the host of course and not some mounted network
> resource). I think it is reasonable to assume that one would *not*
> want to download large video files over the network to check if they
> are duplicates.
>
> The second thing is customizability of course, and probably necessary
> because of my reasoning explained in the previous paragraph. The
> remote host might have md5sum installed, but not sha256sum. So I
> wanted to provide a way for the user to configure the checksum
> command.
>
> I do not object against implementing a fallback as Eli suggested; I
> simply have not implemented it because I did not know about Emacs's
> primitives, and at that same time my reasoning about remote files
> started and so that is the way it is now.
>
> Regards,
> Harald
Harald Judt <h.judt@gmx.at> writes:
> Hi,
>
> On 2023-11-01 20:24, Eli Zaretskii wrote:
>>> From: Philip Kaludercic <philipk@posteo.net>
>>> Cc: Visuwesh <visuweshm@gmail.com>, h.judt@gmx.at, emacs-devel@gnu.org
>>> Date: Wed, 01 Nov 2023 17:57:40 +0000
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>> I'd like to see the numbers which led to the conclusion that
>>>> performance was prohibitive.
>>>>
>>>> And even if the performance is indeed much worse, it could be a
>>>> fallback in case the program is not available -- which would IMO be
>>>> much better than simply failing to provide the functionality in that
>>>> case.
>>>
>>> This is a cheap test on a 1.4GB ISO I had lying around:
>> Thanks. But this is just a single large file, not a frequent use
>> case
>> for this package.
>>
>>> --8<---------------cut here---------------start------------->8---
>>> (benchmark-run 1
>>> (with-temp-buffer
>>> (insert-file-contents-literally "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso")
>>> (secure-hash 'sha512 (current-buffer))))
>>> ;; (44.389091035 1 1.5836082630000021)
>>>
>>> (benchmark-run 1
>>> (with-temp-buffer
>>> (call-process "sha512sum" nil t nil (expand-file-name "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso"))
>>> (goto-char (point-min))
>>> (and (looking-at (rx bos (+ alnum)))
>>> (match-string 0))))
>>> ;; (5.155846791 0 0.0)
>>> --8<---------------cut here---------------end--------------->8---
>> And this is not the package doing its job, this is just a single
>> task
>> the package does when looking for duplicates. The numbers when
>> running the package with call-process replaced by secure-hash will
>> probably be different.
>
> Thanks for the numbers. What troubles me a bit more than the speed is
> which effects inserting such a large file into a buffer has
> memory-wise?
It appears to depend on the size of the file. I haven't determined the
exact tipping point where loading a file into memory becomes more
expensive than spawning a process, but it seems to be relatively low.
Another advantage of using a subprocess, is that the files can be
processed in parallel, assuming you start the processes asynchronously.
> Regards,
> Harald
Harald Judt <h.judt@gmx.at> writes:
> Hi Philip,
>
> Another more organisational question regarding your review: If I apply
> your patch and create commits from it in the project, should I then
> use your name and email address as author of the commit (I guess I can
> assume you also assigned copyright to the FSF ;-))?
You can attribute it to me, but I don't really care much either way. I
didn't really send you a patch to apply, but just the output of C-x v =
after leaving a few comments in the file.
> Harald
next prev parent reply other threads:[~2023-11-01 21:29 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-31 10:34 [ELPA] New package: dired-duplicates Harald Judt
2023-10-31 12:21 ` Philip Kaludercic
2023-10-31 21:05 ` Harald Judt
2023-11-01 2:14 ` Michael Heerdegen
2023-11-01 15:16 ` [External] : " Drew Adams
2023-11-01 15:45 ` Eli Zaretskii
2023-11-01 16:28 ` Drew Adams
2023-11-01 16:47 ` Eli Zaretskii
2023-11-01 16:33 ` Philip Kaludercic
2023-11-01 11:32 ` Philip Kaludercic
2023-11-01 20:04 ` Harald Judt
2023-11-01 21:29 ` Philip Kaludercic [this message]
2023-11-02 8:44 ` Harald Judt
2023-11-03 8:19 ` Philip Kaludercic
2023-11-03 20:19 ` Harald Judt
2023-11-04 15:27 ` Philip Kaludercic
2023-11-06 9:33 ` Harald Judt
2023-11-10 8:37 ` Philip Kaludercic
2023-11-10 10:02 ` Harald Judt
2023-11-23 6:12 ` Philip Kaludercic
2023-10-31 21:24 ` Harald Judt
2023-11-01 17:40 ` Stefan Kangas
2023-11-01 3:30 ` Eli Zaretskii
2023-11-01 13:53 ` Visuwesh
2023-11-01 14:12 ` Eli Zaretskii
2023-11-01 17:57 ` Philip Kaludercic
2023-11-01 19:24 ` Eli Zaretskii
2023-11-01 20:09 ` Harald Judt
2023-11-02 5:55 ` Eli Zaretskii
2023-11-08 20:29 ` Harald Judt
2023-11-09 5:52 ` Eli Zaretskii
2023-11-09 8:00 ` Harald Judt
2023-11-09 8:38 ` tomas
2023-11-09 8:48 ` Eli Zaretskii
2023-11-09 8:43 ` Eli Zaretskii
2023-11-09 8:53 ` tomas
2023-11-09 9:18 ` Harald Judt
2023-11-09 14:52 ` Harald Judt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875y2lyxfq.fsf@posteo.net \
--to=philipk@posteo.net \
--cc=eliz@gnu.org \
--cc=emacs-devel@gnu.org \
--cc=h.judt@gmx.at \
--cc=stefankangas@gmail.com \
--cc=visuweshm@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.