unofficial mirror of notmuch@notmuchmail.org
From: Olly Betts <olly@survex.com>
To: David Bremner <david@tethera.net>
Cc: Don Zickus <dzickus@redhat.com>,
	notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
Subject: Re: performance problems with notmuch new
Date: Thu, 23 Apr 2020 00:21:30 +0100
Message-ID: <20200422232130.GH28897@survex.com>
In-Reply-To: <87imhup6kr.fsf@tethera.net>

On Mon, Apr 20, 2020 at 11:36:36AM -0300, David Bremner wrote:
> Franz Fellner <alpine.art.de@gmail.com> writes:
> 
> > I also suffer from bad performance of notmuch new.  I used notmuch
> > some years ago and notmuch new always felt instantaneous.  Had to stop
> > using it because internet was too slow to sync my mails :/ Now (with
> > better internet and a completely new setup using mbsync) indexing one
> > mail takes at least 10 seconds, sometimes even more.  It can go into
> > minutes when I get lots of mail (~30...).

First question: what version of Xapian are you using?

Second, are you committing each message separately?

The commit operation tries to ensure that the data has actually been
written out to disk, so the time to index a single message isn't
representative: most of it will often be spent waiting for fdatasync()
or similar to return.

If you index 30 messages but commit each separately (i.e. run "notmuch
new" 30 times picking up one new message each time) that'll probably
scale something like linearly, but indexing a batch of 30 messages
should be much quicker per message.
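To make the batching point concrete, here is a rough Python model. It is only a sketch, assuming the dominant per-commit cost is a single forced flush to disk: it appends "messages" to a scratch file and calls os.fsync() every `batch_size` messages as a stand-in for a commit. It doesn't use Xapian or notmuch at all (index_messages is a hypothetical name), and a real Xapian commit does more work than one fsync.

```python
import os
import tempfile
from typing import List


def index_messages(messages: List[bytes], batch_size: int) -> int:
    """Append messages to a scratch file, forcing them to disk every
    batch_size messages (a stand-in for a commit). Returns the number
    of fsync() calls made."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    syncs = 0
    pending = 0
    with tempfile.TemporaryFile() as f:
        for msg in messages:
            f.write(msg)
            pending += 1
            if pending == batch_size:
                # "Commit": push Python's buffer to the OS, then wait for the disk.
                f.flush()
                os.fsync(f.fileno())
                syncs += 1
                pending = 0
        if pending:
            # Commit whatever is left over from a partial final batch.
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs
```

With 30 messages, batch_size=1 forces 30 waits for the disk while batch_size=30 forces one; timing both with time.perf_counter() on a spinning drive should show the difference.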

> > When I run it after a
> > reboot I can have breakfast while notmuch starts up...  This is all on
> > spinning rust. I thought of getting an SSD but not in the near future.

After a reboot the disk cache won't contain any of the database, so the
first operation will typically be slower, especially with a spinning
drive where seeks are relatively slow.

> > What I observe during that time: notmuch doesn't really need much CPU.
> > iotop shows constant read and write with extremely low rates, under
> > 1MB/sec.  So I think it might be an issue in xapian?
> 
> Just in case one of the xapian experts can suggest some kind of test for
> why you might be seeing this behaviour, I've included the xapian list in
> CC.

It sounds like you're seek-limited in this "cold cache" phase.  That is
not necessarily related to the slow indexing, but it could be.
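A back-of-the-envelope calculation shows why a seek-limited workload can look like this. Assuming (these figures are assumptions, not measurements from this thread) 8 KiB blocks, which is Xapian's default block size, and roughly 10 ms per random seek on a spinning drive, each seek yields one block, so reads top out at around 0.8 MB/s, in the same ballpark as the "under 1MB/sec" seen in iotop. The function name is hypothetical.

```python
def seek_limited_throughput(block_size_bytes: int, seek_time_ms: float) -> float:
    """Upper bound on read throughput (bytes/s) when every block read
    needs its own random seek."""
    if seek_time_ms <= 0:
        raise ValueError("seek_time_ms must be positive")
    seeks_per_second = 1000.0 / seek_time_ms
    return block_size_bytes * seeks_per_second


if __name__ == "__main__":
    # Assumed figures: 8 KiB blocks, 10 ms average random seek.
    rate = seek_limited_throughput(8192, 10.0)
    print(f"{rate:.0f} bytes/s = {rate / 1e6:.2f} MB/s")  # prints: 819200 bytes/s = 0.82 MB/s
```

Readahead and partially warm caches will make real numbers better than this worst case, but it shows how seeks alone can cap throughput well below what the drive manages sequentially.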

I'd check the SMART diagnostics for the drive first (e.g. with
smartctl).  It's not the most likely cause, but it's quick to check and
if the drive is starting to fail it's better to find out sooner rather
than later.

Then I'd try compacting the database with the "notmuch compact"
subcommand.
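If you want to script that step, a minimal Python sketch follows. It assumes the `--backup=<directory>` and `--quiet` options described in notmuch-compact(1); check the man page for your notmuch version. compact_command and run_compact are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional


def compact_command(backup_dir: Optional[str] = None, quiet: bool = False) -> List[str]:
    """Build the argv for `notmuch compact` (options as in notmuch-compact(1))."""
    cmd = ["notmuch", "compact"]
    if backup_dir is not None:
        # Save the uncompacted database here before it is replaced.
        cmd.append(f"--backup={backup_dir}")
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_compact(backup_dir: Optional[str] = None) -> Optional[int]:
    """Run the compaction and return its exit status, or None if
    notmuch isn't on PATH."""
    if shutil.which("notmuch") is None:
        return None
    return subprocess.run(compact_command(backup_dir)).returncode
```

Keeping a backup costs disk space but gives you something to fall back on if the compaction is interrupted.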
 
If that doesn't help, profiling the I/O would probably be my next
suggestion - there are some tools in the xapian git repo to help with
this (in xapian-maintainer-tools/profiling).  Under Linux I'd suggest
the strace ones (there's also an LD_PRELOAD library but it may need
tweaking for 32 vs 64 bit).
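The profiling tools in the xapian repo aren't reproduced here. As a much cruder, hypothetical stand-in, the sketch below summarizes an strace log, e.g. one captured with `strace -f -e trace=pread64,pwrite64,fdatasync -o trace.log notmuch new`. It assumes the database is accessed via pread64/pwrite64 (as strace typically reports on 64-bit Linux) and strace's usual one-call-per-line output. It counts reads that don't continue from where the previous read on the same file descriptor stopped, which on a spinning disk roughly approximates seeks. summarize_strace is a made-up name.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# pread64/pwrite64 lines: fd is the first argument, byte count and file
# offset are the last two, and the return value follows "=".
_RW = re.compile(r"(pread64|pwrite64)\((\d+),.*,\s*(\d+),\s*(\d+)\)\s*=\s*(-?\d+)")
# fsync/fdatasync calls, i.e. points where data is forced to disk.
_SYNC = re.compile(r"\b(?:fsync|fdatasync)\(")


def summarize_strace(lines: Iterable[str]) -> Dict[str, int]:
    """Count reads, writes, syncs and non-sequential reads in strace output."""
    stats: Counter = Counter()
    next_offset: Dict[int, int] = {}  # fd -> offset just past the last read
    for line in lines:
        m = _RW.search(line)
        if m:
            call = m.group(1)
            fd, offset, result = int(m.group(2)), int(m.group(4)), int(m.group(5))
            stats[call] += 1
            if call == "pread64" and result > 0:
                # A read that doesn't continue where the previous read on
                # this fd stopped probably costs a seek on a spinning disk.
                if fd in next_offset and next_offset[fd] != offset:
                    stats["non_sequential_reads"] += 1
                next_offset[fd] = offset + result
        elif _SYNC.search(line):
            stats["syncs"] += 1
    return dict(stats)
```

For example, `with open("trace.log") as f: print(summarize_strace(f))`. A high ratio of non_sequential_reads to pread64 would be consistent with the seek-limited picture.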

Cheers,
    Olly

