unofficial mirror of notmuch@notmuchmail.org
* Notmuch new speed degradation
@ 2014-07-24  8:19 Dmitry Bogatov
  2014-07-24 14:32 ` Austin Clements
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Bogatov @ 2014-07-24  8:19 UTC (permalink / raw)
  To: notmuch

Hello!

I have ~ 3 000 000 mails that I want to index.
The first 1 000 000 took several hours; the next 200 000 took several days.

Now, even with libeatmydata, it takes ~ 4 sec per file.

Is there any way I can improve performance?

P.S. Please keep me in CC.

--
Best regards, Dmitry Bogatov <KAction@gnu.org>,
Free Software supporter, esperantisto and netiquette guardian.
GPG: 54B7F00D

* Re: Notmuch new speed degradation
  2014-07-24  8:19 Notmuch new speed degradation Dmitry Bogatov
@ 2014-07-24 14:32 ` Austin Clements
  2014-07-24 19:49   ` Dmitry Bogatov
  0 siblings, 1 reply; 4+ messages in thread
From: Austin Clements @ 2014-07-24 14:32 UTC (permalink / raw)
  To: Dmitry Bogatov; +Cc: notmuch

Quoth Dmitry Bogatov on Jul 24 at 12:19 pm:
> Hello!
> 
> I have ~ 3 000 000 mails that I want to index.
> The first 1 000 000 took several hours; the next 200 000 took several days.
>
> Now, even with libeatmydata, it takes ~ 4 sec per file.
>
> Is there any way I can improve performance?
>
> P.S. Please keep me in CC.

Hi Dmitry.  My guess is that you've exceeded your OS buffer cache
size by enough that most B-tree reads are going to disk at least once.
How big is your database (du -h $MAIL/.notmuch/xapian) and what does
free -h report on that computer?  Also, is this on an SSD or an HDD?
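
A minimal sketch of the same check in Python, assuming a Linux system
with /proc/meminfo.  The helper names are made up for illustration and
are not part of notmuch or Xapian.  Comparing against total RAM is an
optimistic bound, since only part of RAM is ever available as cache.

```python
import os


def directory_size_bytes(path):
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def meminfo_kib(text, field):
    """Return one /proc/meminfo field in KiB, or None if it is absent."""
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == field:
            return int(rest.split()[0])
    return None


def fits_in_memory(db_bytes, mem_total_kib):
    """True if the database is no larger than total RAM (optimistic bound)."""
    return db_bytes <= mem_total_kib * 1024
```

To use it, pass the size of $MAIL/.notmuch/xapian from
directory_size_bytes() and the MemTotal value parsed from the contents
of /proc/meminfo to fits_in_memory().  A False result is consistent
with the guess above, though it does not prove it.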

You could try running notmuch compact.  That should shrink the
database, and, more importantly, pack more into the active page set
and, I think, also linearize the database.

* Re: Notmuch new speed degradation
  2014-07-24 14:32 ` Austin Clements
@ 2014-07-24 19:49   ` Dmitry Bogatov
  2014-07-24 22:31     ` Austin Clements
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Bogatov @ 2014-07-24 19:49 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

* Austin Clements <amdragon@MIT.EDU> [2014-07-24 10:32:14-0400]
> Hi Dmitry.  My guess is that you've exceeded your OS buffer cache
> size by enough that most B-tree reads are going to disk at least once.
> How big is your database (du -h $MAIL/.notmuch/xapian) and what does
> free -h report on that computer?  Also, is this on an SSD or an HDD?

13 GB on an HDD, 9 GB after compact. Compact did not improve indexing speed,
unfortunately. Maybe it is possible to somehow merge databases?

             total       used       free     shared    buffers     cached
Mem:          7,7G       6,5G       1,2G       240M       826M       3,6G
-/+ buffers/cache:       2,1G       5,6G
Swap:         1,9G        66M       1,8G

--
Best regards, Dmitry Bogatov <KAction@gnu.org>,
Free Software supporter, esperantisto and netiquette guardian.
GPG: 54B7F00D

* Re: Notmuch new speed degradation
  2014-07-24 19:49   ` Dmitry Bogatov
@ 2014-07-24 22:31     ` Austin Clements
  0 siblings, 0 replies; 4+ messages in thread
From: Austin Clements @ 2014-07-24 22:31 UTC (permalink / raw)
  To: Dmitry Bogatov; +Cc: notmuch

Quoth Dmitry Bogatov on Jul 24 at 11:49 pm:
> * Austin Clements <amdragon@MIT.EDU> [2014-07-24 10:32:14-0400]
> > Hi Dmitry.  My guess is that you've exceeded your OS buffer cache
> > size by enough that most B-tree reads are going to disk at least once.
> > How big is your database (du -h $MAIL/.notmuch/xapian) and what does
> > free -h report on that computer?  Also, is this on an SSD or an HDD?
> 
> 13 GB on an HDD, 9 GB after compact. Compact did not improve indexing speed,
> unfortunately. Maybe it is possible to somehow merge databases?

Unfortunately, there's no support for merging databases.  Other than
technical difficulties like identifying messages that should belong to
the same thread during merge, the schema wasn't designed with this in
mind and uses various features that are incompatible with merging.

There are some known problems with Xapian slowing down as the database
gets larger, but four seconds per message still sounds extreme.
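
For a sense of scale, here is a rough estimate using the figures
reported earlier in the thread.  It assumes about 1 200 000 of the
3 000 000 messages are already indexed and that the rate stays at 4
seconds per message.  Both are approximations, and the function name
is only illustrative.

```python
SECONDS_PER_DAY = 86400


def remaining_days(total_messages, indexed_messages, seconds_per_message):
    """Days needed to index the remaining messages at a constant rate."""
    remaining = total_messages - indexed_messages
    return remaining * seconds_per_message / SECONDS_PER_DAY


# Figures taken from the thread; the constant rate is an assumption.
estimate = remaining_days(3_000_000, 1_200_000, 4)
```

At that rate the remaining 1 800 000 messages would take roughly 83
days.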

Another thing to try is to raise Xapian's flush threshold by setting
the environment variable XAPIAN_FLUSH_THRESHOLD.  The default is
10000.  Try increasing it by, say, an order of magnitude (you can
probably go much higher than that, though you don't want to go too
high or you'll start eating into the memory for your page cache).
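
A minimal sketch of setting the variable for a single run from Python.
The value 100000 is simply ten times the default mentioned above, not
a tuned recommendation, and the helper names are hypothetical.

```python
import os
import subprocess


def env_with_flush_threshold(threshold, base_env=None):
    """Copy the environment and set XAPIAN_FLUSH_THRESHOLD in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(int(threshold))
    return env


def run_notmuch_new(threshold=100000):
    """Run `notmuch new` with the raised threshold; return its exit code."""
    env = env_with_flush_threshold(threshold)
    return subprocess.run(["notmuch", "new"], env=env, check=False).returncode
```

Setting the variable only in the child's environment leaves the rest
of the session on the default, which makes it easy to compare runs.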

>              total       used       free     shared    buffers     cached
> Mem:          7,7G       6,5G       1,2G       240M       826M       3,6G
> -/+ buffers/cache:       2,1G       5,6G
> Swap:         1,9G        66M       1,8G

Hmm.  Was this after the compact or after notmuch new had run for a
while?  1.2GB of free memory suggests that it's not a page cache
problem, but that would only apply if you took this snapshot after
notmuch new, not after compact.

We should confirm that this is an IO problem.  If you run
/usr/bin/time notmuch new for a few minutes, is the %CPU significantly
below 100%?  If it's above 90%ish, then this is a CPU problem and we
might be able to track it down using CPU profiling.  If it is an IO
problem (which it almost certainly is), I'm afraid it's much harder to
track down.
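
For reference, the %CPU figure is CPU time (user plus system) divided
by elapsed wall-clock time.  Below is a sketch of the same measurement
in Python, assuming a Unix system where the resource module is
available.  The function names are illustrative, and the 90% cutoff is
just the rough figure from the paragraph above.

```python
import resource
import subprocess
import time


def cpu_percent(user_s, sys_s, elapsed_s):
    """CPU utilisation as a percentage of elapsed wall-clock time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * (user_s + sys_s) / elapsed_s


def looks_cpu_bound(percent, threshold=90.0):
    """True when utilisation is at or above the rough 90% cutoff."""
    return percent >= threshold


def measure(cmd):
    """Run cmd to completion and return its CPU utilisation in percent."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=False)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return cpu_percent(user, system, elapsed)
```

Note that measure(["notmuch", "new"]) waits for the command to finish,
unlike the few-minute sample suggested above.  A result well below 100
points at IO rather than CPU.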

Also, what file system are you using?
