Hi!

First, thanks to David, Tomi, Tom for moving this forward.

On Sat, 19 Nov 2011 16:11:13 +0100, Petter Reinholdtsen wrote:
> [Thomas Schwinge]
> > +    /* This used to use NOTMUCH_SORT_MESSAGE_ID.  On 2011-10-29, a measurement
> > +     * on a 372981 messages instance showed that wall time can be reduced from
> > +     * 28 minutes (sorted by Message-ID) to 15 minutes (unsorted), the latter
> > +     * being much more ``database-disk-layout-friendly''.  Subsequently sorting
> > +     * the 25 MiB of data is a no-brainer, if required.  */

Here is the measurement redone: I discovered that during the earlier run,
parallel work was going on in another Xen domU on that system, disturbing
the measurement.

Discard caches, every time before dumping:

    $ sync; sleep 3; echo -n 3 | sudo dd of=/proc/sys/vm/drop_caches

Original (sorted by Message-ID):

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    26.41user 16.56system 14:34.81elapsed 4%CPU (0avgtext+0avgdata 167152maxresident)k
    2994440inputs+55896outputs (41major+11627minor)pagefaults 0swaps

Unsorted:

    $ \time notmuch dump | sort > ~/tmp/Mail-notmuch_dump/dump
    24.79user 3.86system 12:00.22elapsed 3%CPU (0avgtext+0avgdata 57216maxresident)k
    2929192inputs+0outputs (40major+4942minor)pagefaults 0swaps

The difference is no longer as big as before, but still better than
nothing.

> This sound like a great idea for my use case.  Doing 'notmuch dump'
> with my 1.2 million emails take hours at the moment (not very fast
> encrypted file system), and result in a 90 MiB dump file.

...
and you will gain most by putting the .notmuch directory onto an SSD, as I
have done by now:

Original (sorted by Message-ID), with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    24.86user 13.40system 1:06.01elapsed 57%CPU (0avgtext+0avgdata 167200maxresident)k
    2992184inputs+55920outputs (49major+11622minor)pagefaults 0swaps

Unsorted, with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    21.90user 2.68system 0:51.70elapsed 47%CPU (0avgtext+0avgdata 57248maxresident)k
    2926912inputs+55920outputs (50major+4934minor)pagefaults 0swaps

Compared to the HDD runs, user and system time remain roughly the same,
but the wall time drops considerably -- an SSD at its best, obviously.

Generally speaking, I decided it was enough to put just the .notmuch
directory onto the SSD, and not the whole mail store: when new messages
are added (notmuch new), they're still in the page cache anyway (having
been retrieved via POP3 or whatever just before), and for regular message
read access, an HDD's seek time shouldn't matter too much (and I've taken
notice of Austin's patches which even retrieve Subject: etc. from the DB),
so what remains to be optimized is random access to the DB.


Grüße,
 Thomas
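
PS: To put the elapsed times above in relation to each other, here is a
minimal Python sketch.  The helper names (elapsed_to_seconds,
wall_time_saving) are made up for illustration; only the four elapsed
values are taken from the measurements in this mail.  Note that the
unsorted HDD run includes the `| sort' stage, so it slightly understates
the gain of the unsorted dump itself.

```python
def elapsed_to_seconds(elapsed: str) -> float:
    """Convert an elapsed field such as '14:34.81' or '1:02:03.5' to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        # Each further field is a finer unit: hours -> minutes -> seconds.
        seconds = seconds * 60 + float(part)
    return seconds


def wall_time_saving(before: str, after: str) -> float:
    """Fraction of wall time saved when going from `before` to `after`."""
    b = elapsed_to_seconds(before)
    a = elapsed_to_seconds(after)
    return (b - a) / b


# Elapsed wall times copied from the four runs quoted in this mail.
RUNS = {
    ("HDD", "sorted"): "14:34.81",
    ("HDD", "unsorted"): "12:00.22",
    ("SSD", "sorted"): "1:06.01",
    ("SSD", "unsorted"): "0:51.70",
}

if __name__ == "__main__":
    for disk in ("HDD", "SSD"):
        saving = wall_time_saving(RUNS[(disk, "sorted")], RUNS[(disk, "unsorted")])
        print(f"{disk}: unsorted saves {saving:.1%} of the wall time")
    for order in ("sorted", "unsorted"):
        factor = (elapsed_to_seconds(RUNS[("HDD", order)])
                  / elapsed_to_seconds(RUNS[("SSD", order)]))
        print(f"{order}: SSD is {factor:.1f}x faster than HDD")
```

The percentages only compare single runs, so they should be read as rough
indications, not as statistics.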