unofficial mirror of notmuch@notmuchmail.org
From: Kevin McCarthy <kevin@8t8.us>
To: notmuch@notmuchmail.org
Cc: zack@upsilon.cc
Subject: [PATCH 1/2] notmuch-mutt: use notmuch --duplicate flag
Date: Wed,  4 Sep 2013 19:05:50 -0700
Message-ID: <1378346751-25548-2-git-send-email-kevin@8t8.us>
In-Reply-To: <1378346751-25548-1-git-send-email-kevin@8t8.us>

Change notmuch-mutt to use the new --duplicate=1 flag for duplicate
removal.  This removes duplicates based on message-id at the notmuch
level.  Previously we ran fdupes, or computed SHA-256 sums ourselves,
after the search.

This version is faster, but it can hide search results when distinct
messages share a message-id, whether accidentally or maliciously.
---
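To make the trade-off described above concrete, here is a minimal
sketch (Python, not part of this patch) contrasting the two strategies.
The MESSAGES data and the helper names dedup_by_content and
dedup_by_message_id are hypothetical.  The real selection for
--duplicate=1 happens inside notmuch; this only models "keep one file
per message-id" versus "keep one file per content hash".

```python
import hashlib

# Hypothetical stand-ins for mail files: (path, message_id, content).
MESSAGES = [
    ("inbox/1", "<a@example.org>", b"hello"),
    ("archive/1", "<a@example.org>", b"hello"),    # true duplicate of inbox/1
    ("inbox/2", "<a@example.org>", b"different"),  # same id, different content
    ("inbox/3", "<b@example.org>", b"other"),
]


def dedup_by_content(messages):
    """Model of the old approach: keep the first file per SHA-256 digest."""
    seen, kept = set(), []
    for path, _msgid, content in messages:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept


def dedup_by_message_id(messages):
    """Model of the new approach: keep the first file per message-id."""
    seen, kept = set(), []
    for path, msgid, _content in messages:
        if msgid not in seen:
            seen.add(msgid)
            kept.append(path)
    return kept
```

In this model the content-based version keeps inbox/2, while the
message-id version drops it because it shares an id with inbox/1.
That is the "hidden search result" case the commit message warns
about.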
 contrib/notmuch-mutt/README       |  5 ---
 contrib/notmuch-mutt/notmuch-mutt | 64 +++++++--------------------------------
 2 files changed, 11 insertions(+), 58 deletions(-)

diff --git a/contrib/notmuch-mutt/README b/contrib/notmuch-mutt/README
index e00035c..382ac91 100644
--- a/contrib/notmuch-mutt/README
+++ b/contrib/notmuch-mutt/README
@@ -41,11 +41,6 @@ To *run* notmuch-mutt you will need Perl with the following libraries:
   (Debian package: libstring-shellquote-perl)
 - Term::ReadLine <http://search.cpan.org/~hayashi/Term-ReadLine-Gnu/>
   (Debian package: libterm-readline-gnu-perl)
-- File::Which <http://search.cpan.org/dist/File-Which/>
-  (Debian package: libfile-which-perl)
-
-The --remove-dups option will use fdupes <https://code.google.com/p/fdupes/>
-if it is installed.  Version fdupes-1.50-PR2 or higher is required.
 
 To *build* notmuch-mutt documentation you will need:
 
diff --git a/contrib/notmuch-mutt/notmuch-mutt b/contrib/notmuch-mutt/notmuch-mutt
index 00c5ef8..c69b35c 100755
--- a/contrib/notmuch-mutt/notmuch-mutt
+++ b/contrib/notmuch-mutt/notmuch-mutt
@@ -18,8 +18,6 @@ use Mail::Box::Maildir;
 use Pod::Usage;
 use String::ShellQuote;
 use Term::ReadLine;
-use Digest::SHA;
-use File::Which;
 
 
 my $xdg_cache_dir = "$ENV{HOME}/.cache";
@@ -36,65 +34,22 @@ sub empty_maildir($) {
     $folder->close();
 }
 
-# Match files by size and SHA-256; then delete duplicates
-sub builtin_remove_dups($) {
-    my ($maildir) = @_;
-    my (%size_to_files, %sha_to_files);
-
-    # Group files by matching sizes
-    foreach my $file (glob("$maildir/cur/*")) {
-        my $size = -s $file;
-        push(@{$size_to_files{$size}}, $file) if $size;
-    }
-
-    foreach my $same_size_files (values %size_to_files) {
-        # Don't run sha unless there is another file of the same size
-        next if scalar(@$same_size_files) < 2;
-        %sha_to_files = ();
-
-        # Group files with matching sizes by SHA-256
-        foreach my $file (@$same_size_files) {
-            open(my $fh, '<', $file) or next;
-            binmode($fh);
-            my $sha256hash = Digest::SHA->new(256)->addfile($fh)->hexdigest;
-            close($fh);
-
-            push(@{$sha_to_files{$sha256hash}}, $file);
-        }
-
-        # Remove duplicates
-        foreach my $same_sha_files (values %sha_to_files) {
-            next if scalar(@$same_sha_files) < 2;
-            unlink(@{$same_sha_files}[1..$#$same_sha_files]);
-        }
-    }
-}
-
-# Use either fdupes or the built-in scanner to detect and remove duplicate
-# search results in the maildir
-sub remove_duplicates($) {
-    my ($maildir) = @_;
-
-    my $fdupes = which("fdupes");
-    if ($fdupes) {
-      system("$fdupes --hardlinks --symlinks --delete --noprompt"
-             . " --quiet $maildir/cur/ > /dev/null");
-    } else {
-        builtin_remove_dups($maildir);
-    }
-}
-
 # search($maildir, $remove_dups, $query)
 # search mails according to $query with notmuch; store results in $maildir
 sub search($$$) {
     my ($maildir, $remove_dups, $query) = @_;
+    my $dup_option = "";
+
     $query = shell_quote($query);
 
+    if ($remove_dups) {
+        $dup_option = "--duplicate=1";
+    }
+
     empty_maildir($maildir);
-    system("notmuch search --output=files $query"
+    system("notmuch search --output=files $dup_option $query"
 	   . " | sed -e 's: :\\\\ :g'"
 	   . " | xargs --no-run-if-empty ln -s -t $maildir/cur/");
-    remove_duplicates($maildir) if ($remove_dups);
 }
 
 sub prompt($$) {
@@ -252,7 +207,10 @@ Instead of using command line search terms, prompt the user for them (only for
 
 =item --remove-dups
 
-Remove duplicates from search results.
+Remove emails with duplicate message-ids from search results.  (Passes
+--duplicate=1 to the notmuch search command.)  Note that this can hide
+search results if an email accidentally or maliciously uses the same
+message-id as a different email.
 
 =item -h
 
-- 
1.8.4.rc3

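For readers following the search() pipeline in the patch: the sed step
escapes spaces so that xargs does not split paths on them before
passing them to "ln -s -t".  As a rough illustration only (Python, not
proposed for notmuch-mutt), the linking step could be done in-process,
which sidesteps the quoting.  The helper name link_results is
hypothetical, and its input is assumed to be the list of paths that
"notmuch search --output=files" would print.

```python
import os


def link_results(paths, cur_dir):
    """Symlink each path into cur_dir, roughly like `ln -s -t cur_dir`.

    Returns the list of symlinks that were created.
    """
    linked = []
    for path in paths:
        target = os.path.join(cur_dir, os.path.basename(path))
        if os.path.lexists(target):
            # Basename already present: skip it (ln would report an error).
            continue
        os.symlink(path, target)
        linked.append(target)
    return linked
```

Paths containing spaces need no escaping here because they are never
handed to a shell.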

Thread overview: 5+ messages
2013-09-05  2:05 notmuch-mutt: switch to using the new --duplicate flag Kevin McCarthy
2013-09-05  2:05 ` Kevin McCarthy [this message]
2013-09-09  1:54   ` [PATCH 1/2] notmuch-mutt: use notmuch " David Bremner
2013-09-05  2:05 ` [PATCH 2/2] debian: remove unneeded notmuch-mutt dependencies Kevin McCarthy
2013-09-05  7:13 ` notmuch-mutt: switch to using the new --duplicate flag Stefano Zacchiroli
