From: Thomas Schwinge <thomas@schwinge.name>
To: notmuch@notmuchmail.org
Cc: Petter Reinholdtsen <pere@hungry.com>
Subject: Re: [PATCH] dump: Don't sort.
Date: Mon, 28 Nov 2011 22:04:14 +0100
Message-ID: <87hb1ormb5.fsf@boole.schwinge.homeip.net>
In-Reply-To: <2flr514171q.fsf@login1.uio.no>
Hi!
First, thanks to David, Tomi, and Tom for moving this forward.
On Sat, 19 Nov 2011 16:11:13 +0100, Petter Reinholdtsen <pere@hungry.com> wrote:
> [Thomas Schwinge]
> > + /* This used to use NOTMUCH_SORT_MESSAGE_ID. On 2011-10-29, a measurement
> > + * on a 372981 messages instance showed that wall time can be reduced from
> > + * 28 minutes (sorted by Message-ID) to 15 minutes (unsorted), the latter
> > + * being much more ``database-disk-layout-friendly''. Subsequently sorting
> > + * the 25 MiB of data is a no-brainer, if required. */
Here is the measurement redone -- I discovered that during the earlier
run, parallel work had been going on in another Xen domU on that
system, disturbing the measurement.
Discard caches, every time before dumping:
$ sync; sleep 3; echo -n 3 | sudo dd of=/proc/sys/vm/drop_caches
Original (sorted by Message-ID):
$ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
26.41user 16.56system 14:34.81elapsed 4%CPU (0avgtext+0avgdata 167152maxresident)k
2994440inputs+55896outputs (41major+11627minor)pagefaults 0swaps
Unsorted:
$ \time notmuch dump | sort > ~/tmp/Mail-notmuch_dump/dump
24.79user 3.86system 12:00.22elapsed 3%CPU (0avgtext+0avgdata 57216maxresident)k
2929192inputs+0outputs (40major+4942minor)pagefaults 0swaps
The difference is no longer as big as before, but still better than
nothing.
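To put a number on that, here is a small sketch (not part of the original
measurement; the helper names are made up for illustration) that converts
the elapsed fields reported by \time above into seconds and computes the
relative reduction in wall time:

```python
def elapsed_to_seconds(elapsed: str) -> float:
    """Convert a \\time elapsed field ("m:ss.ss" or "h:mm:ss") to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        # Each colon-separated field is worth 60x the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


def wall_time_reduction(before: str, after: str) -> float:
    """Fraction of wall time saved when going from `before` to `after`."""
    b = elapsed_to_seconds(before)
    a = elapsed_to_seconds(after)
    return (b - a) / b


# Elapsed values copied from the two runs above.
sorted_elapsed = "14:34.81"
unsorted_elapsed = "12:00.22"

reduction = wall_time_reduction(sorted_elapsed, unsorted_elapsed)
print(f"{reduction:.1%} less wall time without sorting")
```

For these two runs that comes out at a bit under 18%, and the unsorted
figure already includes the time spent in sort.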
> This sound like a great idea for my use case. Doing 'notmuch dump'
> with my 1.2 million emails take hours at the moment (not very fast
> encrypted file system), and result in a 90 MiB dump file.
... and you will gain most by putting the .notmuch directory onto an SSD,
as I have done by now:
Original (sorted by Message-ID), with .notmuch on SSD:
$ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
24.86user 13.40system 1:06.01elapsed 57%CPU (0avgtext+0avgdata 167200maxresident)k
2992184inputs+55920outputs (49major+11622minor)pagefaults 0swaps
Unsorted, with .notmuch on SSD:
$ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
21.90user 2.68system 0:51.70elapsed 47%CPU (0avgtext+0avgdata 57248maxresident)k
2926912inputs+55920outputs (50major+4934minor)pagefaults 0swaps
User and system time (roughly) remain the same, but the wall time drops
considerably -- an SSD at its best, obviously.
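The %CPU column makes the same point: GNU time computes it as
(user + system) / elapsed, so with CPU time nearly constant, a higher
percentage means less time spent waiting for the disk. A sketch that
recomputes it from the figures above (the function name is made up here,
and truncating to a whole percent is an assumption that happens to match
all four runs):

```python
def cpu_percent(user: float, system: float, elapsed: float) -> int:
    """Share of wall time spent on the CPU, truncated to a whole percent."""
    return int(100 * (user + system) / elapsed)


# (user seconds, system seconds, elapsed seconds), copied from the runs above.
runs = {
    "sorted, HDD": (26.41, 16.56, 14 * 60 + 34.81),
    "unsorted, HDD": (24.79, 3.86, 12 * 60 + 0.22),
    "sorted, SSD": (24.86, 13.40, 1 * 60 + 6.01),
    "unsorted, SSD": (21.90, 2.68, 0 * 60 + 51.70),
}

for name, (user, system, elapsed) in runs.items():
    print(f"{name:14s} {cpu_percent(user, system, elapsed):3d}% CPU")
```

On the HDD, the process sits idle for more than 95% of the wall time; on
the SSD it is on the CPU for roughly half of it.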
Generally speaking, I decided it was enough to put just the .notmuch
directory onto the SSD, and not the whole mail store: when new messages are
added (notmuch new), they're still in the page cache anyway (having been
retrieved via POP3 or whatever just before), and for regular message read
access, an HDD's seek time shouldn't matter too much (and I've taken
notice of Austin's patches, which even retrieve Subject: etc. from the
DB), so what remains to be optimized is random access to the DB.
Grüße,
Thomas