From: Carl Worth
To: Ralph Seichter, notmuch@notmuchmail.org
Subject: Re: finding file by size
Date: Wed, 07 Nov 2018 13:14:37 -0800
Message-ID: <87h8gs1tfm.fsf@wondoo.home.cworth.org>
References: <87ftwec5mq.fsf@pigpen.jf.intel.com>

On Tue, Nov 06 2018, Ralph Seichter wrote:
> I'm not sure about using Notmuch itself,

Right. Notmuch doesn't currently index (as far as I'm aware) anything
that would be useful for sorting by size.

> but this should work:
>
> find /path/to/maildir -type f -size +50M | xargs rm

Hmm...
I imagine that Mark would be more interested in viewing these files to
ensure they are what he thinks they are before deleting them. So,
capturing the results of that with a notmuch tag would be a reasonable
thing to do.

The only trick there is that I don't see any existing search term to
find a message associated with a particular file name. (We have "path:"
and "folder:" to find messages in a specific directory, but nothing I
see for finding the message corresponding to a specific file.) So, then
we could extract the message-id from each file and do a search based on
that, I guess?

Here's a (bash) command I just ran on my mail store of over a million
messages that tagged the 8 messages larger than 50MB. It took about a
minute to run (with a warm cache):

    for msg in $(
        for file in $(find . -type f -size +50M); do
            grep -i ^Message-Id $file | sed -e 's/^.*<\(.*\)>.*/\1/';
        done
    ); do
        notmuch tag +large id:$msg;
    done

With that, I'm able to go through the list from:

    notmuch search tag:large

to investigate whether these large emails are worth keeping.

So, that's obviously not extremely elegant, but it's at least possible.
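A slightly more defensive variant of that loop, offered only as an untested sketch. The names `message_id_of` and `tag_large_messages` are hypothetical helpers, and it assumes bash, a find(1) that supports `-print0` (GNU or BSD), and a notmuch database in the default `.notmuch` directory at the top of the mail store, which is pruned so its large Xapian files are skipped. Unlike a grep over the whole file, it reads only the message header (stopping at the first blank line) and joins folded continuation lines, so a Message-Id line belonging to a forwarded or attached message in the body is not picked up.

```shell
#!/usr/bin/env bash
# Untested sketch. Assumptions: bash, a find(1) with -print0 (GNU or BSD),
# and the notmuch database in the default .notmuch directory at the top
# of the mail store.

# Hypothetical helper: print the Message-ID (without angle brackets) of
# one message file. Only the header block is read, up to the first blank
# line, and folded continuation lines are joined before matching.
message_id_of() {
  awk '
    function check(h) {
      if (tolower(h) ~ /^message-id:/ && match(h, /<[^>]+>/)) {
        print substr(h, RSTART + 1, RLENGTH - 2)
        done = 1
        exit
      }
    }
    /^\r?$/  { check(hdr); done = 1; exit }  # blank line: end of header
    /^[ \t]/ { hdr = hdr $0; next }          # folded continuation line
             { check(hdr); hdr = $0 }        # a new header field starts
    END      { if (!done) check(hdr) }       # file with no blank line
  ' "$1"
}

# Hypothetical wrapper: tag every message file larger than 50 MiB as
# "large", skipping the .notmuch directory (assumed database location).
tag_large_messages() {
  find . -path ./.notmuch -prune -o -type f -size +50M -print0 |
    while IFS= read -r -d '' file; do
      id=$(message_id_of "$file")
      if [ -n "$id" ]; then
        notmuch tag +large -- "id:$id"
      fi
    done
}
```

Run `tag_large_messages` from the top of the mail store; afterwards `notmuch search --output=files tag:large` lists the corresponding file names, which can be checked (for example with `ls -lh`) before anything is deleted.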
-Carl