unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Philip Kaludercic <philipk@posteo.net>
To: Harald Judt <h.judt@gmx.at>
Cc: emacs-devel@gnu.org,  Stefan Kangas <stefankangas@gmail.com>,
	 Eli Zaretskii <eliz@gnu.org>,  Visuwesh <visuweshm@gmail.com>
Subject: Re: [ELPA] New package: dired-duplicates
Date: Wed, 01 Nov 2023 21:29:29 +0000	[thread overview]
Message-ID: <875y2lyxfq.fsf@posteo.net> (raw)
In-Reply-To: <9bcb9fe9-37b7-4c6e-8292-772ebbe379d9@gmx.at> (Harald Judt's message of "Wed, 1 Nov 2023 21:04:29 +0100")

Harald Judt <h.judt@gmx.at> writes:

> Hi,
>
> On 2023-11-01 12:32, Philip Kaludercic wrote:
>
>> You forgot to CC me in the response, I would have almost missed your response.
>
> I am very sorry for this.

No worries, happens to everyone :)

>> Harald Judt <h.judt@gmx.at> writes:
>> 
>>> On 2023-10-31 13:21, Philip Kaludercic wrote:
>>>
>>> [...]
>>>
>>>> @@ -225,7 +220,7 @@ This is the same as `dired-do-flagged-delete', but calls
>>>>      (defvar dired-duplicates-map
>>>>      (let ((map (make-sparse-keymap)))
>>>> -    ;; workaround for Emacs bug #57565
>>>> +    ;; workaround for Emacs bug#57565
>>>>        (when (< emacs-major-version 29)
>>>>          (define-key map (kbd "x") 'dired-duplicates--do-flagged-delete)
>>>>          (define-key map (kbd "D") 'dired-duplicates--do-delete))
>>>
>>> BTW: What's the reasoning behind this suggested change?
>> I thought that this would be necessary for bug-reference-mode to
>> work,
>> but apparently it can detect both of these references.
>
> Yes, whether space or not, it doesn't matter, at least in
> Emacs-29.1. But that aside, bug reference mode wants to take me to a
> page on my project (https://codeberg.org/bug/issues/57565) which does
> not exist because it is an *Emacs* bug that is referenced in the
> comment, not a bug in dired-duplicates.

I am afraid I don't understand.  Did you modify `bug-reference-url-format'?

>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>>>> From: Philip Kaludercic <philipk@posteo.net>
>>>> Cc: emacs-devel@gnu.org
>>>> Date: Tue, 31 Oct 2023 12:21:51 +0000
>>>>
>>>> In the meantime, here is some quick and superficial feedback on the code:
>>>
>>> What I would like to ask is whether Harald tried to calculate SHA256
>>> using Emacs's own primitives, instead of relying on an external
>>> program, which may or may not be installed.
>> That is a good point, but I can imagine that one reason for using an
>> external tool is that if the user has a specific program they wish to
>> use that is not available as an Emacs primitive?
>
> I have not tried to calculate SHA256 using Emacs's own primitives, but
> not because I did not want to. The reasoning behind my decision is the
> following:
>
> dired-duplicates does support local files and *remote* files - via
> TRAMP. For local files using Emacs's own primitives could be OK
> performance-wise, I have not tested this. But if one starts to use
> them for fetching files via TRAMP which means over network, then
> things could slow down to a crawl pretty fast (it even takes quite a
> while collecting size stats over the network). So it is not Emacs's
> SHA256 calculation that would be a problem here, but instead slow
> transport over network. If the host does the calculation, this should
> be a lot faster (though that of course depends on whether the files
> are local files on the host of course and not some mounted network
> resource). I think it is reasonable to assume that one would *not*
> want to download large video files over the network to check if they
> are duplicates.
>
> The second thing is customizability of course, and probably necessary
> because of my reasoning explained in the previous paragraph. The
> remote host might have md5sum installed, but not sha256sum. So I
> wanted to provide a way for the user to configure the checksum
> command.
>
> I do not object against implementing a fallback as Eli suggested; I
> simply have not implemented it because I did not know about Emacs's
> primitives, and at that same time my reasoning about remote files
> started and so that is the way it is now.
>
> Regards,
> Harald

Harald Judt <h.judt@gmx.at> writes:

> Hi,
>
> On 2023-11-01 20:24, Eli Zaretskii wrote:
>>> From: Philip Kaludercic <philipk@posteo.net>
>>> Cc: Visuwesh <visuweshm@gmail.com>,  h.judt@gmx.at,  emacs-devel@gnu.org
>>> Date: Wed, 01 Nov 2023 17:57:40 +0000
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>> I'd like to see the numbers which led to the conclusion that
>>>> performance was prohibitive.
>>>>
>>>> And even if the performance is indeed much worse, it could be a
>>>> fallback in case the program is not available -- which would IMO be
>>>> much better than simply failing to provide the functionality in that
>>>> case.
>>>
>>> This is a cheap test on a 1.4GB ISO I had lying around:
>> Thanks.  But this is just a single large file, not a frequent use
>> case
>> for this package.
>> 
>>> --8<---------------cut here---------------start------------->8---
>>> (benchmark-run 1
>>>    (with-temp-buffer
>>>      (insert-file-contents-literally "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso")
>>>      (secure-hash 'sha512 (current-buffer))))
>>> ;; (44.389091035 1 1.5836082630000021)
>>>
>>> (benchmark-run 1
>>>    (with-temp-buffer
>>>      (call-process "sha512sum" nil t nil (expand-file-name "~/Downloads/haiku-r1beta4-x86_64-anyboot.iso"))
>>>      (goto-char (point-min))
>>>      (and (looking-at (rx bos (+ alnum)))
>>> 	 (match-string 0))))
>>> ;; (5.155846791 0 0.0)
>>> --8<---------------cut here---------------end--------------->8---
>> And this is not the package doing its job, this is just a single
>> task
>> the package does when looking for duplicates.  The numbers when
>> running the package with call-process replaced by secure-hash will
>> probably be different.
>
> Thanks for the numbers. What troubles me a bit more than the speed is
> which effects inserting such a large file into a buffer has
> memory-wise?

It appears to depend on the size of the file.  I haven't determined the
exact tipping point where loading a file into memory becomes more
expensive than spawning a process, but it seems to be relatively low.

Another advantage of using a subprocess, is that the files can be
processed in parallel, assuming you start the processes asynchronously. 

> Regards,
> Harald

Harald Judt <h.judt@gmx.at> writes:

> Hi Philip,
>
> Another more organisational question regarding your review: If I apply
> your patch and create commits from it in the project, should I then
> use your name and email address as author of the commit (I guess I can
> assume you also assigned copyright to the FSF ;-))?

You can attribute it to me, but I don't really care much either way.  I
didn't really send you a patch to apply, but just the output of C-x v =
after leaving a few comments in the file.

> Harald





  reply	other threads:[~2023-11-01 21:29 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-31 10:34 [ELPA] New package: dired-duplicates Harald Judt
2023-10-31 12:21 ` Philip Kaludercic
2023-10-31 21:05   ` Harald Judt
2023-11-01  2:14     ` Michael Heerdegen
2023-11-01 15:16       ` [External] : " Drew Adams
2023-11-01 15:45         ` Eli Zaretskii
2023-11-01 16:28           ` Drew Adams
2023-11-01 16:47             ` Eli Zaretskii
2023-11-01 16:33         ` Philip Kaludercic
2023-11-01 11:32     ` Philip Kaludercic
2023-11-01 20:04       ` Harald Judt
2023-11-01 21:29         ` Philip Kaludercic [this message]
2023-11-02  8:44           ` Harald Judt
2023-11-03  8:19             ` Philip Kaludercic
2023-11-03 20:19               ` Harald Judt
2023-11-04 15:27                 ` Philip Kaludercic
2023-11-06  9:33                   ` Harald Judt
2023-11-10  8:37                     ` Philip Kaludercic
2023-11-10 10:02                       ` Harald Judt
2023-11-23  6:12                         ` Philip Kaludercic
2023-10-31 21:24   ` Harald Judt
2023-11-01 17:40     ` Stefan Kangas
2023-11-01  3:30   ` Eli Zaretskii
2023-11-01 13:53     ` Visuwesh
2023-11-01 14:12       ` Eli Zaretskii
2023-11-01 17:57         ` Philip Kaludercic
2023-11-01 19:24           ` Eli Zaretskii
2023-11-01 20:09             ` Harald Judt
2023-11-02  5:55               ` Eli Zaretskii
2023-11-08 20:29                 ` Harald Judt
2023-11-09  5:52                   ` Eli Zaretskii
2023-11-09  8:00                     ` Harald Judt
2023-11-09  8:38                       ` tomas
2023-11-09  8:48                         ` Eli Zaretskii
2023-11-09  8:43                       ` Eli Zaretskii
2023-11-09  8:53                         ` tomas
2023-11-09  9:18                         ` Harald Judt
2023-11-09 14:52                           ` Harald Judt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875y2lyxfq.fsf@posteo.net \
    --to=philipk@posteo.net \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=h.judt@gmx.at \
    --cc=stefankangas@gmail.com \
    --cc=visuweshm@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).