From: Mark Janes
To: Carl Worth, Ralph Seichter, notmuch@notmuchmail.org
Subject: Re: finding file by size
In-Reply-To: <87h8gs1tfm.fsf@wondoo.home.cworth.org>
References: <87ftwec5mq.fsf@pigpen.jf.intel.com> <87h8gs1tfm.fsf@wondoo.home.cworth.org>
Date: Wed, 07 Nov 2018 13:37:08 -0800
Message-ID: <87y3a4po1n.fsf@pigpen.jf.intel.com>
List-Id: "Use and development of the notmuch mail system."

Carl Worth writes:

> On Tue, Nov 06 2018, Ralph Seichter wrote:
>> I'm not sure about using Notmuch itself,
>
> Right. Notmuch doesn't currently index (as far as I'm aware) anything
> that would be useful for sorting by size.
>
>> but this should work:
>>
>>   find /path/to/maildir -type f -size +50M | xargs rm
>
> Hmm... I imagine that Mark would be more interested in viewing these
> files to ensure they are what he thinks they are before deleting them.

Yes, exactly this. It's quite nice to see the messages in question.
Looking at the raw message files is not helpful, as the contents are
usually encoded.

> So, capturing the results of that with a notmuch tag would be
> a reasonable thing to do. The only trick there is that I don't see any
> existing search term to find a message associated with a particular file
> name, (we have "path:" and "folder:" to find messages in a specific
> directory, but nothing I see for finding the message corresponding to a
> specific file).

It would be better to have a separate numeric field in xapian similar
to how the date is handled. Then the user could search for files of
specific sizes.

A more usable feature would be to sort the currently selected messages
by size in emacs or any other front-end.

In the past I've used Thunderbird to archive old/large messages from
my imap server. It has a column in the UI that allows you to sort
files by size. I'm just now realizing that notmuch *only* sorts
messages by date (at least in emacs) -- and I haven't missed any other
sort functionality until now...

> So, then we could extract the message-id from each file and do a search
> based on that I guess?
>
> Here's a (bash) command I just ran on my mail store of over a million
> messages that tagged the 8 messages larger than 50MB. It took about a
> minute to run (with a warm cache):
>
>   for msg in $(
>       for file in $(find . -type f -size +50M);
>       do
>           grep -i ^Message-Id $file | sed -e 's/^.*<\(.*\)>.*/\1/';
>       done
>   );
>   do
>       notmuch tag +large id:$msg;
>   done
>
> With that, I'm able to go through the list from:
>
>   notmuch search tag:large
>
> to investigate whether these large emails are worth keeping.
>
> So, that's obviously not extremely elegant, but it's at least possible.

Thanks for this, I will use your command for my current needs.

Even if notmuch supported a more elegant feature, I would still need
to somehow locate and remove the messages from my maildir. I can see
how the separation between the xapian database and the maildir makes
it impossible to support cleaning up the maildir through notmuch.
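
For reference, a minimal sketch of a more defensive variant of the quoted
loop, assuming bash and a find that supports -print0 and size suffixes
(GNU or BSD). The helper names large_message_ids and tag_large are
hypothetical, not part of notmuch. Unlike the quoted command, it copes
with file names containing spaces, reads only the header block of each
file, and takes only the first Message-Id it finds. It does not handle
folded headers where the id sits on a continuation line.

```shell
#!/usr/bin/env bash

# Print the Message-Id (without angle brackets) of every file under DIR
# whose size matches SIZE, e.g. "+50M". Hypothetical helper name.
large_message_ids() {
    local dir=$1 size=$2 file
    # -print0 with read -d '' keeps unusual file names intact.
    find "$dir" -type f -size "$size" -print0 |
        while IFS= read -r -d '' file; do
            # Stop at the first blank line (end of headers) so that
            # Message-Id strings inside the body are never matched.
            sed -n -e '/^$/q' \
                -e 's/^[Mm]essage-[Ii][Dd]:.*<\(.*\)>.*/\1/p' "$file" |
                head -n 1
        done
}

# Tag each match. Requires notmuch; nothing runs until this is called.
tag_large() {
    local id
    large_message_ids "$1" "$2" |
        while IFS= read -r id; do
            notmuch tag +large -- "id:$id"
        done
}

# Example (run from the top of the mail store):
#   tag_large . +50M
#   notmuch search tag:large
```

Once the tagged messages have been reviewed, `notmuch search
--output=files tag:large` prints the file paths behind them, which may be
one way to locate the files in the maildir. After removing files by hand,
running `notmuch new` should bring the database back in line with the
maildir.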