unofficial mirror of notmuch@notmuchmail.org
* finding file by size
From: Mark Janes @ 2018-11-06 20:28 UTC
  To: notmuch

Is there a way to use notmuch to optimize space in my maildir?  I'd like
to find and delete the dozens of messages that were sent to me years ago
with 50MB attachments.

I didn't see a search term that returns results by size.

I tried to ask this on #notmuch, but got:
  *** #notmuch: Cannot send to channel

thanks,

Mark

* Re: finding file by size
From: Ralph Seichter @ 2018-11-06 20:47 UTC
  To: notmuch

* Mark Janes:

> I'd like to find and delete the dozens of messages that were sent to
> me years ago with 50MB attachments.

I'm not sure about using Notmuch itself, but this should work:

  find /path/to/maildir -type f -size +50M | xargs rm

Each message in Maildir is stored in a single file, after all.

-Ralph
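
Since Maildir keeps each message in its own file, the same find(1) test can
also produce a list to review before anything is removed. The following is
a minimal sketch, not part of the original reply: the function name
list_large_files is made up for illustration, and -printf assumes GNU find.

```shell
# Sketch: list regular files over a size threshold, largest first,
# without deleting anything. Assumes GNU find (for -printf).
list_large_files() {
    # $1: maildir root
    # $2: find(1) size expression, for example +50M
    # Output: one "<size in bytes><TAB><path>" line per matching file.
    find "$1" -type f -size "$2" -printf '%s\t%p\n' | sort -rn
}

# Example (path is a placeholder):
#   list_large_files /path/to/maildir +50M
```

Once the list looks right, a form of the deletion that also copes with
unusual file names would be
`find /path/to/maildir -type f -size +50M -print0 | xargs -0 rm --`.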

* Re: finding file by size
From: Carl Worth @ 2018-11-07 21:14 UTC
  To: Ralph Seichter, notmuch

On Tue, Nov 06 2018, Ralph Seichter wrote:
> I'm not sure about using Notmuch itself,

Right. Notmuch doesn't currently index (as far as I'm aware) anything
that would be useful for sorting by size.

> but this should work:
>
>   find /path/to/maildir -type f -size +50M | xargs rm

Hmm... I imagine that Mark would be more interested in viewing these
files to ensure they are what he thinks they are before deleting them.

So, capturing the results of that with a notmuch tag would be
a reasonable thing to do. The only trick there is that I don't see any
existing search term to find a message associated with a particular file
name, (we have "path:" and "folder:" to find messages in a specific
directory, but nothing I see for finding the message corresponding to a
specific file).

So, then we could extract the message-id from each file and do a search
based on that I guess?

Here's a (bash) command I just ran on my mail store of over a million
messages that tagged the 8 messages larger than 50MB. It took about a
minute to run (with a warm cache):

  for msg in $(
               for file in $(find . -type f -size +50M);
               do
                 grep -i ^Message-Id $file | sed -e 's/^.*<\(.*\)>.*/\1/';
               done
             );
  do
    notmuch tag +large id:$msg;
  done

With that, I'm able to go through the list from:

  notmuch search tag:large

to investigate whether these large emails are worth keeping.

So, that's obviously not extremely elegant, but it's at least possible.

-Carl
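
A variation on the loop above, offered as a sketch under assumptions rather
than as the command that was actually run. It reads only the header block
of each file, so a Message-Id line inside the body or a forwarded attachment
is not picked up, and it hands file names to awk through find -exec so they
are never word-split by the shell. The function names large_message_ids and
tag_large_messages are hypothetical. It assumes the Message-ID value sits on
the same line as the header name (a folded header would be missed) and that
the id needs no extra quoting inside a notmuch query.

```shell
# Print the Message-ID (without angle brackets) of every file under $1
# that matches the find(1) size expression $2.
large_message_ids() {
    find "$1" -type f -size "$2" -exec awk '
        /^$/ { exit }    # first blank line ends the header block
        tolower($0) ~ /^message-id:/ && match($0, /<[^>]+>/) {
            print substr($0, RSTART + 1, RLENGTH - 2)
            exit         # one id per file is enough
        }
    ' {} \;
}

# Add tag $3 to every message found by large_message_ids.
tag_large_messages() {
    large_message_ids "$1" "$2" |
    while IFS= read -r id; do
        notmuch tag "+$3" -- "id:$id"
    done
}

# Example (run from the top of the mail store):
#   tag_large_messages . +50M large
```

awk stops reading each file at the end of the headers, so even very large
messages are only scanned for their first few lines.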


* Re: finding file by size
From: Mark Janes @ 2018-11-07 21:37 UTC
  To: Carl Worth, Ralph Seichter, notmuch

Carl Worth <cworth@cworth.org> writes:

> On Tue, Nov 06 2018, Ralph Seichter wrote:
>> I'm not sure about using Notmuch itself,
>
> Right. Notmuch doesn't currently index (as far as I'm aware) anything
> that would be useful for sorting by size.
>
>> but this should work:
>>
>>   find /path/to/maildir -type f -size +50M | xargs rm
>
> Hmm... I imagine that Mark would be more interested in viewing these
> files to ensure they are what he thinks they are before deleting them.

Yes, exactly this.  It's quite nice to see the messages in question.
Looking at the raw message files is not helpful, as the contents are
usually encoded.

> So, capturing the results of that with a notmuch tag would be
> a reasonable thing to do. The only trick there is that I don't see any
> existing search term to find a message associated with a particular file
> name, (we have "path:" and "folder:" to find messages in a specific
> directory, but nothing I see for finding the message corresponding to a
> specific file).

It would be better to have a separate numeric field in Xapian, similar to
how the date is handled.  Then the user could search for files of
specific sizes.  A more usable feature would be to sort the currently
selected messages by size in Emacs or any other front end.

In the past I've used Thunderbird to archive old/large messages from my
IMAP server.  It has a column in the UI that lets you sort messages by
size.  I'm just now realizing that notmuch *only* sorts messages by date
(at least in Emacs), and I haven't missed any other sort functionality
until now...
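
Until notmuch can sort by size itself, one possible workaround is to ask
notmuch for the files behind a search and sort them outside notmuch. A
hedged sketch: files_by_size is a hypothetical helper name, `stat -c` and
`xargs -r` assume GNU coreutils and findutils, and --output=files with
--format=text0 are the options described in the notmuch-search man page.

```shell
# Print "<size in bytes> <path>" for every file belonging to the
# messages matched by the given notmuch search terms, largest first.
files_by_size() {
    # "$@": notmuch search terms, for example tag:inbox
    notmuch search --output=files --format=text0 "$@" |
    xargs -0 -r stat -c '%s %n' |
    sort -rn
}

# Example:
#   files_by_size tag:inbox | head -n 20
```

This only orders files on disk; it does not change how a front end such as
Emacs displays the search results.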

> So, then we could extract the message-id from each file and do a search
> based on that I guess?
>
> Here's a (bash) command I just ran on my mail store of over a million
> messages that tagged the 8 messages larger than 50MB. It took about a
> minute to run (with a warm cache):
>
>   for msg in $(
>                for file in $(find . -type f -size +50M);
>                do
>                  grep -i ^Message-Id $file | sed -e 's/^.*<\(.*\)>.*/\1/';
>                done
>              );
>   do
>     notmuch tag +large id:$msg;
>   done
>
> With that, I'm able to go through the list from:
>
>   notmuch search tag:large
>
> to investigate whether these large emails are worth keeping.
>
> So, that's obviously not extremely elegant, but it's at least possible.

Thanks for this, I will use your command for my current needs.  Even if
notmuch supported a more elegant feature, I would still need to somehow
locate and remove the messages from my maildir.  I can see how the
separation between the xapian database and the maildir makes it
impossible to support cleaning up the maildir through notmuch.
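
The index and the maildir are separate, but a tag can still drive the
cleanup: a commonly used pattern is to ask notmuch for the file names behind
a tag, delete those files, and then run `notmuch new` so the database drops
them. A minimal sketch, assuming GNU xargs for -r; delete_tagged is a
hypothetical name, and the deletion is irreversible, so it is worth running
the search on its own first and checking its output.

```shell
# Remove the files of every message carrying tag $1, then let notmuch
# rescan the mail store so the removed messages leave the index.
delete_tagged() {
    notmuch search --output=files --format=text0 "tag:$1" |
    xargs -0 -r rm --
    notmuch new
}

# Example, after reviewing `notmuch search tag:large` and retagging
# the messages to discard with a tag of your choice:
#   delete_tagged discard
```

If a message exists as several files (for example duplicates in different
folders), the search should list all of them, so every copy is removed.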

* Re: finding file by size
From: David Bremner @ 2018-11-12 15:28 UTC
  To: Mark Janes, Carl Worth, notmuch

Mark Janes <mark.a.janes@intel.com> writes:

>
> Thanks for this, I will use your command for my current needs.  Even if
> notmuch supported a more elegant feature, I would still need to somehow
> locate and remove the messages from my maildir.  I can see how the
> separation between the xapian database and the maildir makes it
> impossible to support cleaning up the maildir through notmuch.

There is a patch series in progress to add filesize searching, if
someone wanted to work through the review issues:

        id:20170518222708.30032-1-adi@adirat.com

        https://nmbug.notmuchmail.org/nmweb/search/id%3A20170518222708.30032-1-adi%40adirat.com

d

P.S. #notmuch requires nicks to be registered with services, in an
attempt to reduce the number of active spambots.
