unofficial mirror of notmuch@notmuchmail.org
* notmuch-mutt: switch to using the new --duplicate flag
@ 2013-09-05  2:05 Kevin McCarthy
  2013-09-05  2:05 ` [PATCH 1/2] notmuch-mutt: use notmuch " Kevin McCarthy
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Kevin McCarthy @ 2013-09-05  2:05 UTC (permalink / raw)
  To: notmuch; +Cc: zack

This patch series changes notmuch-mutt to use the new built-in
--duplicate flag for removing duplicates by message-id.

This should speed up duplicate removal, since we no longer need to scan
the results and compute SHA sums.  However, it now allows search results
to be hidden if a message-id is reused, accidentally or maliciously.

The duplicate removal is optional, but is currently enabled by default
in the distributed /etc/Muttrc.d/notmuch-mutt.rc file.

Please don't hesitate to comment if this change is too controversial!

Thank you,

-Kevin
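
The trade-off described in the cover letter above can be illustrated with a small, self-contained sketch. It does not call notmuch and is not how notmuch implements --duplicate. It only counts distinct messages two ways over three fake files in a temporary directory. The file names, the example message-id, and the use of GNU sha256sum are assumptions made for the demo.

```shell
#!/bin/sh
# Hypothetical demo: three fake messages in a temp dir (not a real maildir).
set -eu
demo=$(mktemp -d)

# a and b are byte-identical copies of one message.
printf 'Message-ID: <1@example.org>\n\nhello\n' > "$demo/a"
cp "$demo/a" "$demo/b"
# c reuses the same Message-ID but has a different body.
printf 'Message-ID: <1@example.org>\n\nsomething else\n' > "$demo/c"

# Content-based view (old approach): count distinct SHA-256 sums.
by_content=$(sha256sum "$demo"/* | awk '{print $1}' | sort -u | wc -l | tr -d ' ')

# Message-id view (new approach): count distinct Message-ID header lines.
by_id=$(grep -h -i '^Message-ID:' "$demo"/* | sort -u | wc -l | tr -d ' ')

echo "distinct by content:    $by_content"
echo "distinct by message-id: $by_id"

rm -rf "$demo"
```

In this sketch the content-based count is 2 and the message-id count is 1: file c, which differs in content but shares a message-id, would be the kind of result that deduplication by message-id hides.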


* [PATCH 1/2] notmuch-mutt: use notmuch --duplicate flag
  2013-09-05  2:05 notmuch-mutt: switch to using the new --duplicate flag Kevin McCarthy
@ 2013-09-05  2:05 ` Kevin McCarthy
  2013-09-09  1:54   ` David Bremner
  2013-09-05  2:05 ` [PATCH 2/2] debian: remove unneeded notmuch-mutt dependencies Kevin McCarthy
  2013-09-05  7:13 ` notmuch-mutt: switch to using the new --duplicate flag Stefano Zacchiroli
  2 siblings, 1 reply; 5+ messages in thread
From: Kevin McCarthy @ 2013-09-05  2:05 UTC (permalink / raw)
  To: notmuch; +Cc: zack

Change notmuch-mutt to use the new --duplicate=1 flag for duplicate
removal.  This will remove duplicates based on message-id at the
notmuch level.  Previously we were using fdupes or generating sha sums
after the search.

This version will be faster, but will enable the possibility of hiding
search results due to accidental/malicious duplicate message-ids.
---
 contrib/notmuch-mutt/README       |  5 ---
 contrib/notmuch-mutt/notmuch-mutt | 64 +++++++--------------------------------
 2 files changed, 11 insertions(+), 58 deletions(-)

diff --git a/contrib/notmuch-mutt/README b/contrib/notmuch-mutt/README
index e00035c..382ac91 100644
--- a/contrib/notmuch-mutt/README
+++ b/contrib/notmuch-mutt/README
@@ -41,11 +41,6 @@ To *run* notmuch-mutt you will need Perl with the following libraries:
   (Debian package: libstring-shellquote-perl)
 - Term::ReadLine <http://search.cpan.org/~hayashi/Term-ReadLine-Gnu/>
   (Debian package: libterm-readline-gnu-perl)
-- File::Which <http://search.cpan.org/dist/File-Which/>
-  (Debian package: libfile-which-perl)
-
-The --remove-dups option will use fdupes <https://code.google.com/p/fdupes/>
-if it is installed.  Version fdupes-1.50-PR2 or higher is required.
 
 To *build* notmuch-mutt documentation you will need:
 
diff --git a/contrib/notmuch-mutt/notmuch-mutt b/contrib/notmuch-mutt/notmuch-mutt
index 00c5ef8..c69b35c 100755
--- a/contrib/notmuch-mutt/notmuch-mutt
+++ b/contrib/notmuch-mutt/notmuch-mutt
@@ -18,8 +18,6 @@ use Mail::Box::Maildir;
 use Pod::Usage;
 use String::ShellQuote;
 use Term::ReadLine;
-use Digest::SHA;
-use File::Which;
 
 
 my $xdg_cache_dir = "$ENV{HOME}/.cache";
@@ -36,65 +34,22 @@ sub empty_maildir($) {
     $folder->close();
 }
 
-# Match files by size and SHA-256; then delete duplicates
-sub builtin_remove_dups($) {
-    my ($maildir) = @_;
-    my (%size_to_files, %sha_to_files);
-
-    # Group files by matching sizes
-    foreach my $file (glob("$maildir/cur/*")) {
-        my $size = -s $file;
-        push(@{$size_to_files{$size}}, $file) if $size;
-    }
-
-    foreach my $same_size_files (values %size_to_files) {
-        # Don't run sha unless there is another file of the same size
-        next if scalar(@$same_size_files) < 2;
-        %sha_to_files = ();
-
-        # Group files with matching sizes by SHA-256
-        foreach my $file (@$same_size_files) {
-            open(my $fh, '<', $file) or next;
-            binmode($fh);
-            my $sha256hash = Digest::SHA->new(256)->addfile($fh)->hexdigest;
-            close($fh);
-
-            push(@{$sha_to_files{$sha256hash}}, $file);
-        }
-
-        # Remove duplicates
-        foreach my $same_sha_files (values %sha_to_files) {
-            next if scalar(@$same_sha_files) < 2;
-            unlink(@{$same_sha_files}[1..$#$same_sha_files]);
-        }
-    }
-}
-
-# Use either fdupes or the built-in scanner to detect and remove duplicate
-# search results in the maildir
-sub remove_duplicates($) {
-    my ($maildir) = @_;
-
-    my $fdupes = which("fdupes");
-    if ($fdupes) {
-      system("$fdupes --hardlinks --symlinks --delete --noprompt"
-             . " --quiet $maildir/cur/ > /dev/null");
-    } else {
-        builtin_remove_dups($maildir);
-    }
-}
-
 # search($maildir, $remove_dups, $query)
 # search mails according to $query with notmuch; store results in $maildir
 sub search($$$) {
     my ($maildir, $remove_dups, $query) = @_;
+    my $dup_option = "";
+
     $query = shell_quote($query);
 
+    if ($remove_dups) {
+      $dup_option = "--duplicate=1";
+    }
+
     empty_maildir($maildir);
-    system("notmuch search --output=files $query"
+    system("notmuch search --output=files $dup_option $query"
 	   . " | sed -e 's: :\\\\ :g'"
 	   . " | xargs --no-run-if-empty ln -s -t $maildir/cur/");
-    remove_duplicates($maildir) if ($remove_dups);
 }
 
 sub prompt($$) {
@@ -252,7 +207,10 @@ Instead of using command line search terms, prompt the user for them (only for
 
 =item --remove-dups
 
-Remove duplicates from search results.
+Remove emails with duplicate message-ids from search results.  (Passes
+--duplicate=1 to notmuch search command.)  Note this can hide search
+results if an email accidentally or maliciously uses the same message-id
+as a different email.
 
 =item -h
 
-- 
1.8.4.rc3
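
As a rough sketch of the pipeline that search() runs after this patch, the following replaces the real `notmuch search --output=files` call with a hypothetical `fake_search` function so that it can run without a notmuch database. The directory layout and file names are made up for illustration, and GNU xargs and GNU ln are assumed (for `--no-run-if-empty` and `-t`).

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
mkdir -p "$work/mail store" "$work/results/cur"

# Two fake message files; the directory name contains a space on purpose.
: > "$work/mail store/msg1"
: > "$work/mail store/msg2"

# Hypothetical stand-in for
#   notmuch search --output=files [--duplicate=1] QUERY
# which prints one file path per line.
fake_search() {
    printf '%s\n' "$work/mail store/msg1" "$work/mail store/msg2"
}

# Same shape as the patched pipeline: backslash-escape spaces so xargs
# keeps each path as a single argument, then symlink every path into
# the results maildir.
fake_search \
    | sed -e 's: :\\ :g' \
    | xargs --no-run-if-empty ln -s -t "$work/results/cur/"

linked=$(ls "$work/results/cur" | wc -l | tr -d ' ')
echo "linked: $linked"

rm -rf "$work"
```

Because deduplication now happens inside the search command, nothing has to run after the symlinks are created. In the real script, the only difference between the two modes is whether `--duplicate=1` appears on the command line.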


* [PATCH 2/2] debian: remove unneeded notmuch-mutt dependencies
  2013-09-05  2:05 notmuch-mutt: switch to using the new --duplicate flag Kevin McCarthy
  2013-09-05  2:05 ` [PATCH 1/2] notmuch-mutt: use notmuch " Kevin McCarthy
@ 2013-09-05  2:05 ` Kevin McCarthy
  2013-09-05  7:13 ` notmuch-mutt: switch to using the new --duplicate flag Stefano Zacchiroli
  2 siblings, 0 replies; 5+ messages in thread
From: Kevin McCarthy @ 2013-09-05  2:05 UTC (permalink / raw)
  To: notmuch; +Cc: zack

Switching away from fdupes removes the dependency on libfile-which-perl
and the need to recommend fdupes.
---
 debian/control | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/debian/control b/debian/control
index 74a5161..81b9b12 100644
--- a/debian/control
+++ b/debian/control
@@ -142,9 +142,8 @@ Depends:
  notmuch (>= 0.4),
  libmail-box-perl, libmailtools-perl,
  libstring-shellquote-perl, libterm-readline-gnu-perl,
- libfile-which-perl,
  ${misc:Depends}
-Recommends: mutt, fdupes
+Recommends: mutt
 Enhances: notmuch, mutt
 Description: thread-based email index, search and tagging (Mutt interface)
  notmuch-mutt provides integration among the Mutt mail user agent and
-- 
1.8.4.rc3


* Re: notmuch-mutt: switch to using the new --duplicate flag
  2013-09-05  2:05 notmuch-mutt: switch to using the new --duplicate flag Kevin McCarthy
  2013-09-05  2:05 ` [PATCH 1/2] notmuch-mutt: use notmuch " Kevin McCarthy
  2013-09-05  2:05 ` [PATCH 2/2] debian: remove unneeded notmuch-mutt dependencies Kevin McCarthy
@ 2013-09-05  7:13 ` Stefano Zacchiroli
  2 siblings, 0 replies; 5+ messages in thread
From: Stefano Zacchiroli @ 2013-09-05  7:13 UTC (permalink / raw)
  To: Kevin McCarthy; +Cc: notmuch

On Wed, Sep 04, 2013 at 07:05:49PM -0700, Kevin McCarthy wrote:
> This patch series changes notmuch-mutt to use the new built-in
> --duplicate flag for removing duplicates by message-id.

Thanks, Kevin! The patch looks good to me, and I very much welcome the
deduplication of the deduplication feature :). It is pointless to have
an ad-hoc implementation of it in notmuch-mutt now that there is support
in notmuch itself.

> The duplicate removal is optional, but is currently enabled by default
> in the distributed /etc/Muttrc.d/notmuch-mutt.rc file.
> 
> Please don't hesitate to comment if this change is too controversial!

FWIW I'm fine with either default.

Cheers.
-- 
Stefano Zacchiroli  . . . . . . .  zack@upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader  . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »


* Re: [PATCH 1/2] notmuch-mutt: use notmuch --duplicate flag
  2013-09-05  2:05 ` [PATCH 1/2] notmuch-mutt: use notmuch " Kevin McCarthy
@ 2013-09-09  1:54   ` David Bremner
  0 siblings, 0 replies; 5+ messages in thread
From: David Bremner @ 2013-09-09  1:54 UTC (permalink / raw)
  To: Kevin McCarthy, notmuch; +Cc: zack

Kevin McCarthy <kevin@8t8.us> writes:

> Change notmuch-mutt to use the new --duplicate=1 flag for duplicate
> removal.  This will remove duplicates based on message-id at the
> notmuch level.  Previously we were using fdupes or generating sha sums
> after the search.

Pushed both.

d

