unofficial mirror of notmuch@notmuchmail.org
From: Thomas Schwinge <thomas@schwinge.name>
To: notmuch@notmuchmail.org
Cc: Petter Reinholdtsen <pere@hungry.com>
Subject: Re: [PATCH] dump: Don't sort.
Date: Mon, 28 Nov 2011 22:04:14 +0100
Message-ID: <87hb1ormb5.fsf@boole.schwinge.homeip.net>
In-Reply-To: <2flr514171q.fsf@login1.uio.no>

Hi!

First, thanks to David, Tomi, Tom for moving this forward.


On Sat, 19 Nov 2011 16:11:13 +0100, Petter Reinholdtsen <pere@hungry.com> wrote:
> [Thomas Schwinge]
> > +    /* This used to use NOTMUCH_SORT_MESSAGE_ID.  On 2011-10-29, a measurement
> > +     * on a 372981 messages instance showed that wall time can be reduced from
> > +     * 28 minutes (sorted by Message-ID) to 15 minutes (unsorted), the latter
> > +     * being much more ``database-disk-layout-friendly''.  Subsequently sorting
> > +     * the 25 MiB of data is a no-brainer, if required.  */

Here is the measurement redone -- I discovered that during the earlier
run, parallel work had been going on in another Xen domU on that system,
disturbing the measurement.

Discard caches, every time before dumping:

    $ sync; sleep 3; echo -n 3 | sudo dd of=/proc/sys/vm/drop_caches
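
For repeated runs, the cache-discarding step can be wrapped in a small helper so that every measurement starts cold.  This is only a sketch: the function names `drop_caches` and `cold_run` and the three override variables are hypothetical and not part of notmuch or of the measurements in this message.

```shell
#!/bin/sh
# Hypothetical knobs (not from the original message); the defaults mirror
# the one-liner above, the overrides exist only to allow a dry run.
DROP_CACHES_FILE=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}
SUDO=${SUDO-sudo}
SETTLE_SECONDS=${SETTLE_SECONDS:-3}

# Flush dirty pages to disk, wait briefly, then ask the kernel to drop
# the page cache as well as dentries and inodes (that is what "3" selects).
drop_caches() {
    sync
    sleep "$SETTLE_SECONDS"
    printf 3 | $SUDO dd of="$DROP_CACHES_FILE"
}

# Run the given command with cold caches; skip it if dropping failed.
cold_run() {
    drop_caches || return 1
    "$@"
}
```

Assuming a `time` binary is on the PATH, something like `cold_run time notmuch dump > dump` should then reproduce the procedure used for the figures below.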

Original (sorted by Message-ID):

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    26.41user 16.56system 14:34.81elapsed 4%CPU (0avgtext+0avgdata 167152maxresident)k
    2994440inputs+55896outputs (41major+11627minor)pagefaults 0swaps

Unsorted:

    $ \time notmuch dump | sort > ~/tmp/Mail-notmuch_dump/dump
    24.79user 3.86system 12:00.22elapsed 3%CPU (0avgtext+0avgdata 57216maxresident)k
    2929192inputs+0outputs (40major+4942minor)pagefaults 0swaps

The difference is no longer as big as before (wall time drops from
14:35 to 12:00, about 18%), but still better than nothing.
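
To put a number on the difference, the two "elapsed" fields can be converted to seconds and compared.  A minimal sketch in POSIX sh with awk; the helper names `elapsed_to_seconds` and `percent_reduction` are made up for illustration.

```shell
#!/bin/sh
# Convert an "elapsed" field as printed by time (M:SS.ss or H:MM:SS)
# to seconds by folding the colon-separated parts in base 60.
elapsed_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}

# Percentage by which wall time shrinks going from $1 to $2 (seconds).
percent_reduction() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

sorted=$(elapsed_to_seconds 14:34.81)     # sorted by Message-ID
unsorted=$(elapsed_to_seconds 12:00.22)   # unsorted
percent_reduction "$sorted" "$unsorted"
```

The same helpers can be applied to any other pair of `time` lines in this message.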

> This sound like a great idea for my use case.  Doing 'notmuch dump'
> with my 1.2 million emails take hours at the moment (not very fast
> encrypted file system), and result in a 90 MiB dump file.

... and you will gain most by putting the .notmuch directory onto an SSD,
as I have done by now:

Original (sorted by Message-ID), with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    24.86user 13.40system 1:06.01elapsed 57%CPU (0avgtext+0avgdata 167200maxresident)k
    2992184inputs+55920outputs (49major+11622minor)pagefaults 0swaps

Unsorted, with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    21.90user 2.68system 0:51.70elapsed 47%CPU (0avgtext+0avgdata 57248maxresident)k
    2926912inputs+55920outputs (50major+4934minor)pagefaults 0swaps

Compared with the HDD runs, user and system time remain roughly the same,
but the wall time drops considerably -- an SSD at its best, obviously.
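
One way to read the four `time` lines: user plus system time divided by elapsed time gives the share of wall time actually spent on the CPU, which shows how much of each run was spent waiting for the disk.  The sketch below recomputes that share; `cpu_percent` and `speedup` are hypothetical helper names, and truncating to a whole percent is an assumption chosen because it reproduces the %CPU figures printed above.

```shell
#!/bin/sh
# cpu_percent USER SYS ELAPSED: share of wall time spent on the CPU,
# truncated to a whole percent (hypothetical helper).
cpu_percent() {
    awk -v u="$1" -v s="$2" -v e="$3" \
        'BEGIN { printf "%d\n", int((u + s) / e * 100) }'
}

# speedup OLD NEW: ratio of two wall times given in seconds
# (hypothetical helper).
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Elapsed times converted to seconds by hand:
# 14:34.81 = 874.81 (sorted, HDD), 1:06.01 = 66.01 (sorted, SSD).
cpu_percent 26.41 16.56 874.81   # sorted, HDD
cpu_percent 24.86 13.40 66.01    # sorted, SSD
speedup 874.81 66.01             # sorted: HDD versus SSD wall time
```

On the HDD the process is on the CPU for only a few percent of the run, so nearly all of the wall time is I/O wait; on the SSD that wait mostly disappears while the CPU work stays about the same.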


Generally speaking, I decided it was enough to just put the .notmuch
directory onto the SSD, and not the whole mail store.  If new messages are
added (notmuch new), they're still in the page cache anyway (having been
retrieved via POP3 or whatever just before).  For regular message read
access, an HDD's seek time shouldn't matter too much (and I've taken
notice of Austin's patches, which even retrieve Subject: etc. from the
DB).  So what remains to be optimized is random access to the DB.


Grüße,
 Thomas


Thread overview: 7+ messages
2011-10-29 10:37 [PATCH] dump: Don't sort Thomas Schwinge
2011-11-15  1:10 ` David Bremner
2011-11-21 11:04   ` Tomi Ollila
2011-11-19 15:11 ` Petter Reinholdtsen
2011-11-28 21:04   ` Thomas Schwinge [this message]
2011-11-27 18:40 ` [PATCH] dump: Don't sort the output by message id Tom Prince
2011-11-29  7:10   ` David Bremner
