unofficial mirror of notmuch@notmuchmail.org
From: Eric Wong <e@80x24.org>
To: Franz Fellner <alpine.art.de@gmail.com>, Don Zickus <dzickus@redhat.com>
Cc: notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
Subject: Re: performance problems with notmuch new
Date: Wed, 29 Apr 2020 09:30:46 +0000
Message-ID: <20200429093046.GA11038@dcvr>
In-Reply-To: <87imhup6kr.fsf@tethera.net>

David Bremner <david@tethera.net> wrote:
> Franz Fellner <alpine.art.de@gmail.com> writes:
> > mail takes at least 10 seconds, sometimes even more.  It can go into
> > minutes when I get lots of mail (~30...).  When I run it after a
> > reboot I can have breakfast while notmuch starts up...  This is all on
> > spinning rust. I thought of getting an SSD but not in the near future.
> 
> I do have at least one spinning rust configuration with about 300k
> messages, and notmuch new is still fast there.

I've yet to figure out how spinning rust can work well with
giant public-inboxes (git + Xapian + SQLite); but I have
a fair bit of experience with SSDs + Xapian.

But some of my recommendations below come from my experience
with HDDs in the old days, before I used Xapian.

> > What I observe during that time: notmuch doesn't really need much CPU.
> > iotop shows constant read and write with extremely low rates, under
> > 1MB/sec.  So I think it might be an issue in xapian?

Seek times, probably.  `iostat -x 1' can give you very useful
information about I/O queue sizes and wait times for reads and
writes (`-x' turns on the extended statistics, which are the good
stuff :), and `1' makes it print a new report every second.
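If you'd rather script the check, here is a rough Python sketch of where the r_await/w_await numbers in `iostat -x' come from: the cumulative per-device counters in /proc/diskstats (field layout as in the kernel's iostats documentation).  The device name "sda" and the 1-second interval are assumptions; this only covers the two await columns, not the rest of the -x output.

```python
import time

def parse_diskstats_line(line):
    """Pull (device, reads, ms_reading, writes, ms_writing) out of one
    /proc/diskstats line.  Layout: major minor name, then reads completed,
    reads merged, sectors read, ms spent reading, writes completed,
    writes merged, sectors written, ms spent writing, ..."""
    f = line.split()
    return f[2], int(f[3]), int(f[6]), int(f[7]), int(f[10])

def await_ms(before, after):
    """Average ms per completed read and per completed write between two
    samples of the same device: roughly iostat -x's r_await / w_await."""
    _, r0, rms0, w0, wms0 = parse_diskstats_line(before)
    _, r1, rms1, w1, wms1 = parse_diskstats_line(after)
    dr, dw = r1 - r0, w1 - w0
    r_await = (rms1 - rms0) / dr if dr else 0.0
    w_await = (wms1 - wms0) / dw if dw else 0.0
    return r_await, w_await

def sample(device="sda", interval=1.0):
    """Read the device's line twice, `interval` seconds apart (Linux only)."""
    def read():
        with open("/proc/diskstats") as fh:
            for line in fh:
                if line.split()[2] == device:
                    return line
        raise ValueError(f"device {device!r} not in /proc/diskstats")
    before = read()
    time.sleep(interval)
    return await_ms(before, read())
```

Consistently high await values combined with low MB/sec would point at seeks rather than bandwidth as the bottleneck.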

> Just in case one of the xapian experts can suggest some kind of test for
> why you might be seeing this behaviour, I've included the xapian list in
> CC.

Newer Xapian has a DB_NO_SYNC flag, which notmuch could expose as
an option.  Users of older Xapian (or of the Perl XS bindings) can
use the libeatmydata LD_PRELOAD library instead, which I end up
using all the time:

	https://www.flamingspork.com/projects/libeatmydata/
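The libeatmydata package usually ships an `eatmydata' wrapper script (`eatmydata notmuch new'), which is the simplest route.  If you launch the indexer from your own script instead, a minimal Python sketch is to preload the library through the child's environment.  The library name "libeatmydata.so" is an assumption; distros install it under different names and paths.

```python
import os

def eatmydata_env(base=None, lib="libeatmydata.so"):
    """Return a copy of `base` (default: os.environ) with libeatmydata preloaded.

    The library name is an assumption: check where your distro installs it.
    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD", "")
    # ld.so accepts space- or colon-separated LD_PRELOAD entries
    env["LD_PRELOAD"] = f"{lib} {existing}" if existing else lib
    return env

# Usage (commented out so nothing is executed here):
# import subprocess
# subprocess.run(["notmuch", "new"], env=eatmydata_env(), check=True)
```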

I run `sync' if I have anything important, but I usually
don't ;)   I do set the kernel to flush dirty data in the
background fairly aggressively, though (more below).

For public-inbox v2 hacking in 2018 (indexing LKML archives, ~3M
messages), I found working on a freshly TRIM-ed SSD with plenty
of free space made the SSD firmware happier.  SSDs can get a LOT
slower as they get fuller (so xapian-compact helps there, too).

SSD quality matters a lot, but even low-end QLC drives beat
high-end HDDs at random I/O; they will slow down more as they
fill up, though.

For writes, I set /proc/sys/vm/dirty_background_bytes to 100M or
something reasonably close to what the SSD can write quickly.
Linux tended to hit I/O stalls with lots of dirty data, so
making the kernel flush it sooner tends to help IME.  Maybe newer
kernels do better *shrug*; but it's basically the local storage
version of the network "Bufferbloat" problem.

Flushing dirty data more frequently also frees up more memory
for the kernel to make better caching decisions about
future/current data it needs to read.
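As root, `sysctl -w vm.dirty_background_bytes=104857600' does the 100M setting.  A small Python sketch of the same, assuming "100M" means 100 MiB (the kernel knob itself only takes a plain byte count).  Note that dirty_background_bytes and dirty_background_ratio are counterparts: writing one makes the other read back as 0.

```python
DIRTY_BG = "/proc/sys/vm/dirty_background_bytes"

# Binary units are an assumption for human-friendly input like "100M".
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(text):
    """'100M' -> 104857600, '4096' -> 4096."""
    text = text.strip().upper()
    suffix = text[-1] if text and text[-1] in "KMG" else ""
    number = text[:-1] if suffix else text
    return int(number) * _UNITS[suffix]

def set_dirty_background(size, path=DIRTY_BG):
    """Write the background-flush threshold in bytes (needs root for /proc).

    Once this is set, the kernel reports dirty_background_ratio as 0,
    since only one of the two settings is in effect at a time."""
    value = parse_size(size)
    with open(path, "w") as fh:
        fh.write(f"{value}\n")
    return value
```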

notmuch could probably run a background thread (or use liburing)
to do POSIX_FADV_DONTNEED once it's done with a message, too (and
POSIX_FADV_WILLNEED for messages about to be indexed).  Uncompressed
Maildir messages eat cache space really quickly, which leaves less
cache for Xapian.
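A minimal Python sketch of that hinting pattern (notmuch itself is C/C++, so this only illustrates the syscall usage): one helper thread applies os.posix_fadvise, prefetching the next message with POSIX_FADV_WILLNEED and dropping each finished one with POSIX_FADV_DONTNEED.  index_all() and the index_one() callback are hypothetical stand-ins for the real indexing loop.

```python
import os
import queue
import threading

def fadvise_worker(q):
    """Apply posix_fadvise hints from a queue of (path, advice) pairs.

    Hints are best-effort, so files that vanished (e.g. a Maildir rename)
    are skipped.  A None item stops the worker."""
    while True:
        item = q.get()
        if item is None:
            return
        path, advice = item
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue
        try:
            # offset 0, len 0 = the whole file
            os.posix_fadvise(fd, 0, 0, advice)
        except OSError:
            pass  # hints are advisory; ignore failures
        finally:
            os.close(fd)

def index_all(paths, index_one):
    """Hypothetical indexing loop: prefetch one message ahead, and drop
    each message from the page cache once index_one() is done with it."""
    q = queue.Queue()
    worker = threading.Thread(target=fadvise_worker, args=(q,), daemon=True)
    worker.start()
    for i, path in enumerate(paths):
        if i + 1 < len(paths):
            q.put((paths[i + 1], os.POSIX_FADV_WILLNEED))
        index_one(path)
        q.put((path, os.POSIX_FADV_DONTNEED))
    q.put(None)  # stop the worker after all queued hints are applied
    worker.join()
```

Keeping the fadvise calls on a separate thread means the indexer never waits on them; liburing would be the non-threaded alternative mentioned above.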

public-inbox indexes the v2 inbox format in parallel; but
excessive parallelism still causes I/O contention with SSDs (at
least upper-mid-range ones).  So right now the default limit is
3 indexing processes regardless of CPU count.  Reading from git
is still synchronous atm, but will probably be async in a few
months.  git itself tends to generate decent I/O patterns with
its pack format (but makes posix_fadvise hinting impractical).
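A rough Python sketch of a fixed worker cap that does not scale with CPU count.  The round-robin split and the index_shard() placeholder are assumptions for illustration, not necessarily how public-inbox assigns messages to its indexing processes.

```python
from concurrent.futures import ProcessPoolExecutor

DEFAULT_JOBS = 3  # the cap from the text; deliberately not os.cpu_count()

def shard(items, jobs=DEFAULT_JOBS):
    """Round-robin split: item n goes to worker n % jobs."""
    return [items[i::jobs] for i in range(jobs)]

def index_shard(batch):
    """Placeholder for real indexing work; returns how many items it saw."""
    return len(batch)

def index_parallel(items, jobs=DEFAULT_JOBS):
    # More workers than the disk can serve just adds I/O contention,
    # so the pool size stays fixed even on machines with many CPUs.
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        return sum(pool.map(index_shard, shard(items, jobs)))
```

Calling index_parallel(list(range(10))) from a script's main module would run three worker processes and return 10.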

Anyways, indexing just under 3 million LKML messages took ~4
hours on a 4-core system built in 2010 with a SATA SSD from 2014.
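For a rough sense of scale, the back-of-the-envelope throughput (approximate, since 3M is rounded up and the duration is "~4 hours"):

```python
messages = 3_000_000      # "just under 3 million" LKML messages, rounded up
seconds = 4 * 60 * 60     # "~4 hours"
rate = messages / seconds
print(f"~{rate:.0f} messages/s, ~{1000 / rate:.1f} ms per message")
# prints "~208 messages/s, ~4.8 ms per message"
```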


Thread overview: 20+ messages
2020-04-15 15:08 performance problems with notmuch new Don Zickus
2020-04-15 16:01 ` David Bremner
2020-04-15 17:31   ` Don Zickus
2020-04-15 20:29     ` David Bremner
2020-04-20 15:25       ` Don Zickus
2020-04-24 10:36         ` David Bremner
2020-04-24 19:05           ` Don Zickus
2020-04-24 23:07             ` David Bremner
2020-04-27 12:48               ` Don Zickus
2020-04-27 12:56                 ` David Bremner
2020-04-20 14:31     ` David Bremner
2020-04-20 16:26       ` Don Zickus
2020-04-18 11:59 ` Franz Fellner
2020-04-20 14:36   ` David Bremner
2020-04-22 10:46     ` Franz Fellner
2020-04-22 23:21     ` Olly Betts
2020-04-24 18:35       ` Franz Fellner
2020-04-24 23:13         ` David Bremner
2020-04-29  9:30     ` Eric Wong [this message]
2020-04-20 15:04   ` Kim Minh Kaplan
