unofficial mirror of notmuch@notmuchmail.org
* 'notmuch new' leaking memory and getting slower over time?
@ 2011-11-21 22:35 Petter Reinholdtsen
  2011-11-22 21:50 ` David Bremner
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Petter Reinholdtsen @ 2011-11-21 22:35 UTC (permalink / raw)
  To: notmuch


This weekend I updated my notmuch version to the 0.10 rc2 version
available in Debian/unstable, and made a few observations I would like
to share.  I rebuilt the source on Debian/Squeeze to get it working for
the machine I use to read email, and patched it to get the unsorted dump
before building it.  The reason is that my mail store contains slightly
more than 1.2 million emails at the moment, and just dumping the tags
takes hours without the unsorted dump patch.  With the patch it took 40
minutes.  After dumping the tags, I moved away the .notmuch index and
started reindexing using 'notmuch new'.

The indexing took 36 hours.  At the start it claimed it would take 10
hours, and it continued to underestimate the amount of time left until
the very end.  It claimed to have 1 hour left when I checked before I
went to bed, and claimed to have 15 minutes left when I woke up 6-7
hours later.

Shortly before the indexing finished, the notmuch process was using 1.2
GiB of resident memory according to top.  Is the process leaking memory?

Running 'notmuch restore' to get my tags back took 106 minutes, and I
was very surprised that the restore could not load all the tags stored
by 'notmuch dump'.  The restore complained about this line:

  NO*TELEMAX**NORWAYII  M0018001012307699038 (unread usit year-2002)

The message in question is a bounce from some X400 mail system, and its
message id looks like this:

  Message-Id: <NO*TELEMAX**NORWAYII    M0018001012307699038>

I would like 'notmuch new' to use less memory and be better at
estimating the time left, and would also like 'notmuch restore' to
always be able to load the output from 'notmuch dump'.
-- 
Happy hacking
Petter Reinholdtsen


* Re: 'notmuch new' leaking memory and getting slower over time?
  2011-11-21 22:35 'notmuch new' leaking memory and getting slower over time? Petter Reinholdtsen
@ 2011-11-22 21:50 ` David Bremner
  2011-11-23  2:48 ` Austin Clements
  2012-12-10 11:50 ` David Bremner
  2 siblings, 0 replies; 5+ messages in thread
From: David Bremner @ 2011-11-22 21:50 UTC (permalink / raw)
  To: Petter Reinholdtsen, notmuch

On Mon, 21 Nov 2011 23:35:34 +0100, Petter Reinholdtsen <pere@hungry.com> wrote:
> 
>   NO*TELEMAX**NORWAYII  M0018001012307699038 (unread usit year-2002)
> 
> The message in question is a bounce from some X400 mail system, and its
> message id looks like this:
> 
>   Message-Id: <NO*TELEMAX**NORWAYII    M0018001012307699038>

Maybe try the patch 

      id:"1319884807-7206-1-git-send-email-thomas@schwinge.name"

d


* Re: 'notmuch new' leaking memory and getting slower over time?
  2011-11-21 22:35 'notmuch new' leaking memory and getting slower over time? Petter Reinholdtsen
  2011-11-22 21:50 ` David Bremner
@ 2011-11-23  2:48 ` Austin Clements
  2012-12-10 11:50 ` David Bremner
  2 siblings, 0 replies; 5+ messages in thread
From: Austin Clements @ 2011-11-23  2:48 UTC (permalink / raw)
  To: Petter Reinholdtsen; +Cc: notmuch

Quoth Petter Reinholdtsen on Nov 21 at 11:35 pm:
> The indexing took 36 hours.  At the start it claimed it would take 10
> hours, and it continued to underestimate the amount of time left until
> the very end.  It claimed to have 1 hour left when I checked before I
> went to bed, and claimed to have 15 minutes left when I woke up 6-7
> hours later.

notmuch new does a simple linear extrapolation based on how many files
it's examined and how many there are total.  This is doomed to
undershoot at least because indexing becomes slower as the database
grows (B-tree insertion is O(log N), fragmentation will increase over
time, posting lists will get longer...).

I'm not sure much can be done about the estimate at the beginning,
short of throwing in some fudge factor, but the estimates later in the
process would be much more accurate if it used a sliding window,
rather than measuring from the beginning.
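
To make the difference concrete, here is a rough sketch in Python (not
notmuch's actual C code; the names eta_from_start, SlidingWindowEta and
simulate, and the synthetic per-file cost, are made up for illustration).
It compares extrapolating from the start of the run with extrapolating
from only the most recent progress samples:

```python
from collections import deque


def eta_from_start(elapsed, done, total):
    """Linear extrapolation using the average rate since the start."""
    if done == 0:
        return float("inf")
    return elapsed / done * (total - done)


class SlidingWindowEta:
    """Extrapolate from the rate over the last `window` sample intervals."""

    def __init__(self, total, window=5):
        self.total = total
        # window intervals need window + 1 (elapsed, done) samples
        self.samples = deque(maxlen=window + 1)

    def update(self, elapsed, done):
        self.samples.append((elapsed, done))

    def remaining(self):
        if len(self.samples) < 2:
            return float("inf")
        t0, n0 = self.samples[0]
        t1, n1 = self.samples[-1]
        if n1 == n0:
            return float("inf")
        return (t1 - t0) / (n1 - n0) * (self.total - n1)


def simulate(total=1000, step=100, checkpoint=900):
    """Index `total` files whose cost grows as the database grows.

    Returns (estimate from start, windowed estimate, true remaining time),
    all taken at the moment `checkpoint` files are done.
    """
    est = SlidingWindowEta(total, window=1)
    elapsed = 0.0
    elapsed_at_checkpoint = 0.0
    start_est = window_est = None
    for i in range(total):
        elapsed += 1.0 + i / total  # synthetic cost: later files are slower
        done = i + 1
        if done % step == 0:
            est.update(elapsed, done)
        if done == checkpoint:
            start_est = eta_from_start(elapsed, done, total)
            window_est = est.remaining()
            elapsed_at_checkpoint = elapsed
    return start_est, window_est, elapsed - elapsed_at_checkpoint


if __name__ == "__main__":
    print(simulate())
```

With this synthetic cost, the from-the-start estimate at 90% done comes
out lower than the windowed one, and both are still below the true
remaining time.  The window narrows the gap; it does not remove it.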

> Shortly before the indexing finished, the notmuch process was using 1.2
> GiB of resident memory according to top.  Is the process leaking memory?

It's possible this is just memory fragmentation, but it definitely
sounds like a leak.  talloc has some tools for tracking down leaks and
it would be good to heap profile notmuch new, but to my knowledge
nobody's applied these tools.


* Re: 'notmuch new' leaking memory and getting slower over time?
  2011-11-21 22:35 'notmuch new' leaking memory and getting slower over time? Petter Reinholdtsen
  2011-11-22 21:50 ` David Bremner
  2011-11-23  2:48 ` Austin Clements
@ 2012-12-10 11:50 ` David Bremner
  2012-12-10 12:39   ` David Bremner
  2 siblings, 1 reply; 5+ messages in thread
From: David Bremner @ 2012-12-10 11:50 UTC (permalink / raw)
  To: Petter Reinholdtsen, notmuch

Petter Reinholdtsen <pere@hungry.com> writes:
> Running 'notmuch restore' to get my tags back took 106 minutes, and I
> was very surprised that the restore could not load all the tags stored
> by 'notmuch dump'.  The restore complained about this line:
>
>   NO*TELEMAX**NORWAYII  M0018001012307699038 (unread usit year-2002)
>
> The message in question is a bounce from some X400 mail system, and its
> message id looks like this:
>
>   Message-Id: <NO*TELEMAX**NORWAYII    M0018001012307699038>
>
> I would like 'notmuch new' to use less memory and be better at
> estimating the time left, and would also like 'notmuch restore' to
> always be able to load the output from 'notmuch dump'.

As of git commit 0.14-162-g5c7990f, notmuch supports a new dump/restore
format, 'batch-tag', that can handle these message IDs.  See the man
pages for more details.
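
For illustration only, a small Python sketch of why the old
one-line-per-message format could not round-trip this ID, and of the
batch-tag style of quoting.  The regex for the old format and the
quoting and hex-encoding rules below are assumptions based on my reading
of the man page, not code taken from notmuch, and every function name
here is hypothetical:

```python
import re
import string

# Assumed shape of an old-format line: "<id> (<tag> <tag> ...)".
# The ID may not contain a space, which is what breaks the X.400 ID.
OLD_LINE = re.compile(r"^([^ ]+) \(([^)]*)\)$")

# Characters assumed to be left unencoded in batch-tag tags.
SAFE = set(string.ascii_letters + string.digits + "@=.,_+-")


def parse_old(line):
    """Return (msgid, tags) for an old-format line, or None if it fails."""
    m = OLD_LINE.match(line)
    return (m.group(1), m.group(2).split()) if m else None


def hex_encode_tag(tag):
    """Replace every byte outside SAFE with %nn (two hex digits)."""
    return "".join(
        chr(b) if chr(b) in SAFE else "%%%02x" % b
        for b in tag.encode("utf-8")
    )


def quote_id(msgid):
    """Wrap the ID in double quotes when it contains whitespace or ')',
    or starts with a double quote; inner double quotes are doubled."""
    if re.search(r"\s|\)", msgid) or msgid.startswith('"'):
        return '"' + msgid.replace('"', '""') + '"'
    return msgid


def batch_tag_line(msgid, tags):
    """Build one line of the form '+tag +tag -- id:<quoted id>'."""
    encoded = " ".join("+" + hex_encode_tag(t) for t in tags)
    return encoded + " -- id:" + quote_id(msgid)


if __name__ == "__main__":
    bad = "NO*TELEMAX**NORWAYII  M0018001012307699038 (unread usit year-2002)"
    print(parse_old(bad))
    print(batch_tag_line("NO*TELEMAX**NORWAYII  M0018001012307699038",
                         ["unread", "usit", "year-2002"]))
```

In this sketch the old-style parse of the problem line fails outright,
while the quoted form keeps the spaces inside the ID unambiguous.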

I'm marking this bug as fixed; feel free to let us know of any problems
with the new batch-tag format.

d


* Re: 'notmuch new' leaking memory and getting slower over time?
  2012-12-10 11:50 ` David Bremner
@ 2012-12-10 12:39   ` David Bremner
  0 siblings, 0 replies; 5+ messages in thread
From: David Bremner @ 2012-12-10 12:39 UTC (permalink / raw)
  To: Petter Reinholdtsen, notmuch

David Bremner <bremner@unb.ca> writes:

>
> I'm marking this bug as fixed; feel free to let us know of any problems
> with the new batch-tag format.

I should say I mean the bug with message-ids with spaces in
them. Performance issues with notmuch new (and particularly, bad
prediction of completion times) remain, presumably.
