From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <cworth@cworth.org>
Received: from localhost (localhost [127.0.0.1])
 by arlo.cworth.org (Postfix) with ESMTP id 62AE16DE01F5
 for <notmuch@notmuchmail.org>; Wed,  7 Nov 2018 13:14:43 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: -8.93
X-Spam-Level: 
X-Spam-Status: No, score=-8.93 tagged_above=-999 required=5 tests=[AM.WBL=-8, 
 ALL_TRUSTED=-1, AWL=0.070] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1])
 by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id Aw23iBjd7b9Z; Wed,  7 Nov 2018 13:14:42 -0800 (PST)
Received: from wondoo.home.cworth.org (unknown [10.0.0.11])
 (Authenticated sender: cworth)
 by arlo.cworth.org (Postfix) with ESMTPSA id CB8826DE014A;
 Wed,  7 Nov 2018 13:14:42 -0800 (PST)
Received: from wondoo (localhost [IPv6:::1])
 by wondoo.home.cworth.org (Postfix) with ESMTPS id 971671FE4349;
 Wed,  7 Nov 2018 13:14:42 -0800 (PST)
To: Ralph Seichter <m16+notmuch@monksofcool.net>, notmuch@notmuchmail.org
Subject: Re: finding file by size
In-Reply-To: <m2k1lq2asm.fsf@argon.seichter.de>
References: <87ftwec5mq.fsf@pigpen.jf.intel.com>
 <m2k1lq2asm.fsf@argon.seichter.de>
Sender: cworth@cworth.org
From: Carl Worth <cworth@cworth.org>
Date: Wed, 07 Nov 2018 13:14:37 -0800
Message-ID: <87h8gs1tfm.fsf@wondoo.home.cworth.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha512; protocol="application/pgp-signature"
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Use and development of the notmuch mail system."
 <notmuch.notmuchmail.org>
List-Unsubscribe: <https://notmuchmail.org/mailman/options/notmuch>,
 <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch/>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <https://notmuchmail.org/mailman/listinfo/notmuch>,
 <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Nov 2018 21:14:43 -0000

--=-=-=
Content-Type: text/plain

On Tue, Nov 06 2018, Ralph Seichter wrote:
> I'm not sure about using Notmuch itself,

Right. Notmuch doesn't currently index (as far as I'm aware) anything
that would be useful for sorting by size.

> but this should work:
>
>   find /path/to/maildir -type f -size +50M | xargs rm

Hmm... I imagine that Mark would be more interested in viewing these
files to ensure they are what he thinks they are before deleting them.

So, capturing the results of that with a notmuch tag would be a
reasonable thing to do. The only trick there is that I don't see any
existing search term to find the message associated with a particular
file name (we have "path:" and "folder:" to find messages in a specific
directory, but nothing I see for finding the message corresponding to a
specific file).

So, then we could extract the message-id from each file and do a search
based on that I guess?

Here's a (bash) command I just ran on my mail store of over a million
messages that tagged the 8 messages larger than 50MB. It took about a
minute to run (with a warm cache):

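  # Collect the Message-Id of each file over 50MB, then tag that message.
  for msg in $(
               for file in $(find . -type f -size +50M);
               do
                 # -m 1: keep only the first matching line in each file
                 grep -i -m 1 '^Message-Id:' "$file" | sed -e 's/^.*<\(.*\)>.*/\1/';
               done
             );
  do
    notmuch tag +large "id:$msg";
  done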

With that, I'm able to go through the list from:

  notmuch search tag:large

to investigate whether these large emails are worth keeping.

So, that's obviously not extremely elegant, but it's at least possible.
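
For file names containing whitespace, or messages whose body happens to
contain a line starting with "Message-Id", a more defensive variant might
look like the sketch below. The function names (extract_message_id,
tag_large_messages) are made up for illustration, and it assumes bash
plus a find that supports -print0 and the M size suffix. A Message-Id
header whose value is folded onto a continuation line is not handled.

```shell
# Sketch only; function names are illustrative, not part of notmuch.
# Requires bash (read -d '') and a find(1) with -print0.

# Print the first Message-Id found in the header block of mail file $1,
# without the angle brackets. Reading stops at the first blank line.
extract_message_id() {
  awk '
    /^$/ { exit }
    tolower($0) ~ /^message-id:/ {
      if (match($0, /<[^>]*>/)) {
        print substr($0, RSTART + 1, RLENGTH - 2)
        exit
      }
    }
  ' "$1"
}

# Tag every message under directory $1 whose file matches find's -size
# test $2 (for example +50M) with tag $3.
tag_large_messages() {
  find "$1" -type f -size "$2" -print0 |
    while IFS= read -r -d '' file; do
      id=$(extract_message_id "$file")
      if [ -n "$id" ]; then
        notmuch tag "+$3" -- "id:$id"
      fi
    done
}
```

From the top of the mail store it would be invoked as:

  tag_large_messages . +50M large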

-Carl


--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEzvjdyAGcpm1OMJTzYAIzup5U3GEFAlvjVb0ACgkQYAIzup5U
3GFV/w/8CcwkkxZ27Sud7dwWv4FohVvc7V9uNMOgh3ulBNVS+mhD+D0iefPe8+12
/EI2pDsl6WMXFjg53jqbwV0Sh3LZ4ng7peKGJiL+Rc2FMSJomosURZ6O5/H0qQlV
VUEaLydw7mc6KmJjZbVFCYfCMb32DUi9oer8njg0KvYPAzsBEcQJ5wUP2uRN4FKZ
lfVRLRx2PtQUIBsJbc1IxD7dXQcbBHWXnAsEDlgviOA4WqR+l1Jlx3Shodq7Pfds
oyhUuoqt5cFWvVzOhwii43b3LVt4r521clf2YmMnFkxRYG9qDN/O8JJxOqDTM1jc
NUvpbdXtv7nB4FA+D5VH5+yxAZoJgRXGeLS2NEkgGgfmXiSs7UVGnRTOSP2SsQ/A
6olAEmK9LizjMERBQqGKPscHvfYLby8pdR+pB4XUweoZ5omg5p896Tj5FH853i6+
KzJ8q11lXYglGN7s3TzZCmsAzYqpeU2kkxtmJVxw6NkmCpLtoPVlbn84o6DJcPE5
5O9ZiGnwcFz4zIWcVZ3kBTEP020fACFbigDGbGxmhr2YpG1DqX3esCinJP8rjwuX
zi2qSCX2Y6bjeu83JNQlbZrdfCCcEC9lTluY7VxRtZXch4bJSL5vrDRMOT/bCzV+
Af9lDsmqq9f4oDpXuDohYH/aPvukwgms8z2mAy2R0s4ZhPkND2I=
=g2GP
-----END PGP SIGNATURE-----
--=-=-=--