From: Eli Zaretskii <eliz@gnu.org>
To: Michael Heerdegen <michael_heerdegen@web.de>
Cc: rswgnu@gmail.com, emacs-devel@gnu.org, rsw@gnu.org
Subject: Re: [ELPA] New package: find-dups
Date: Wed, 11 Oct 2017 21:56:43 +0300
Message-ID: <83fuapo6ms.fsf@gnu.org>
In-Reply-To: <87bmldefg5.fsf@web.de> (message from Michael Heerdegen on Wed, 11 Oct 2017 19:56:26 +0200)

> From: Michael Heerdegen <michael_heerdegen@web.de>
> Date: Wed, 11 Oct 2017 19:56:26 +0200
> Cc: rswgnu@gmail.com, Emacs Development <emacs-devel@gnu.org>
> 
> #+begin_src emacs-lisp
> (find-dups my-sequence-of-file-names
>            (list (list (lambda (file)
>                          (file-attribute-size (file-attributes file)))
>                        #'eq)
>                  (list (lambda (file)
>                          (shell-command-to-string
>                           (format "head %s"
>                                   (shell-quote-argument file))))
>                        #'equal)
>                  (list (lambda (file)
>                          (shell-command-to-string
>                           (format "md5sum %s | awk '{print $1;}'"
>                                   (shell-quote-argument file))))
>                        #'equal)))
> #+end_src

Apologies for barging into the middle of a discussion, but starting
processes and making strings out of their output to process just a
portion of a file is sub-optimal, because process creation is not
cheap.  It is easier to simply read a predefined number of bytes into
a buffer; insert-file-contents-literally supports that.  Likewise with
md5sum: we have the md5 primitive for that.
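
A minimal sketch of that suggestion, reading raw bytes into a
temporary buffer instead of starting "head" and "md5sum".  The helper
names and the 1024-byte default are assumptions made for
illustration; they are not part of find-dups:

```emacs-lisp
;; Hypothetical helpers; the names and the 1024-byte default are
;; illustrative assumptions.

(defun my-file-head-bytes (file &optional nbytes)
  "Return the first NBYTES (default 1024) raw bytes of FILE.
The result is a unibyte string; no decoding is performed."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    ;; BEG = 0 and END = NBYTES limit how much of FILE is read.
    (insert-file-contents-literally file nil 0 (or nbytes 1024))
    (buffer-string)))

(defun my-file-md5 (file)
  "Return the MD5 digest of FILE's raw bytes as a hex string."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    ;; Hash the buffer directly, without making a string first.
    (md5 (current-buffer))))
```

Under these assumptions, #'my-file-head-bytes and #'my-file-md5 could
take the place of the two shell-command-to-string lambdas in the
quoted example, still compared with #'equal.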

In general, working with buffers is much more efficient in Emacs than
working with strings, so avoid strings, let alone large strings, as
much as you can.

One other comment is that shell-command-to-string decodes the output
from the shell command, which is not something you want here, because
AFAIU you are looking for files whose contents are identical on the
byte-stream level.  That is, 2 files which have the same characters,
but are encoded differently on disk (like one UTF-8, the other
Latin-1), should be considered different in this context, whereas
shell-command-to-string will/might produce identical strings for them.
(Decoding is also expensive run-time wise.)
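
To illustrate the kind of case meant here, the following sketch
encodes the same text in two ways and compares it before and after
decoding.  The function name is hypothetical, and the coding systems
are passed explicitly, whereas shell-command-to-string picks its
coding system from the process settings, so this only shows how
decoding can hide a byte-level difference:

```emacs-lisp
;; Hypothetical demo function; not part of find-dups.
(defun my-encoding-demo ()
  "Return (BYTES-EQUAL DECODED-EQUAL) for one text in two encodings."
  (let* ((text "caf\u00e9")
         ;; Byte sequences as they would be stored on disk.
         (utf8 (encode-coding-string text 'utf-8))
         (latin1 (encode-coding-string text 'latin-1)))
    (list
     ;; Byte-level comparison: the encodings differ.
     (equal utf8 latin1)
     ;; Character-level comparison after decoding: the text matches.
     (equal (decode-coding-string utf8 'utf-8)
            (decode-coding-string latin1 'latin-1)))))

(my-encoding-demo) ; => (nil t)
```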


