* Xapian commits unexpectedly slow
@ 2019-12-23 2:54 Matthew Schauer
2019-12-29 0:21 ` David Bremner
0 siblings, 1 reply; 5+ messages in thread
From: Matthew Schauer @ 2019-12-23 2:54 UTC
To: notmuch
Greetings,
I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing
some frustrating performance characteristics that don't seem to match
with the experience others report. I'm dumping messages from
Thunderbird in batches and then running `notmuch new` to add each batch
to the database. The indexing performance remains okay, at more than
200 per second, but after Notmuch has reported it's finished indexing,
it hangs for as much as several minutes before exiting. A stack trace
confirms that this is Xapian committing the database, with most of the
time seemingly spent in `fdatasync`. The time spent grows with the size
of the database, not the number of e-mails being imported, which means
this will remain a problem during day-to-day usage.
Has nobody else had a problem like this? Is my setup just weird? I'm
using Notmuch 0.29.3 from the Arch community repository, with Xapian
1.4.14 also from Arch repositories. I am using a spinning-platter hard
disk, but I find it hard to believe that an SSD is required to get
Xapian to perform well at this scale.
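One way to test the spinning-disk hypothesis is to time `fdatasync` directly on the filesystem that holds the database. The Python sketch below is not part of notmuch or Xapian. The function name `median_sync_latency` and its defaults (20 rounds of 4 KiB writes) are arbitrary choices for illustration. It writes a small block, forces it to stable storage, and reports the median time per flush.

```python
import os
import statistics
import tempfile
import time


def median_sync_latency(directory, rounds=20, block=4096):
    """Median seconds spent in one fdatasync after a small write in `directory`."""
    # os.fdatasync is Unix-only and missing on some platforms; fall back to fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    payload = os.urandom(block)
    samples = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(rounds):
            os.write(fd, payload)        # dirty one block in the page cache
            start = time.perf_counter()
            sync(fd)                     # block until the data is flushed to the device
            samples.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)                  # leave no scratch file behind
    return statistics.median(samples)
```

Pass it the directory that contains `.notmuch`. If a commit triggers many such flushes, a much higher per-flush latency on the rotating disk than on an SSD would go some way toward explaining the stall; this is only a rough probe, not a reproduction of Xapian's actual write pattern.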
Please let me know if you have any performance pointers or can help me
investigate this further. Many thanks!
Matthew Schauer
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Xapian commits unexpectedly slow
From: David Bremner @ 2019-12-29 0:21 UTC
To: Matthew Schauer, notmuch
Matthew Schauer <matthew.schauer@e10x.net> writes:
> Greetings,
>
> I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing
> some frustrating performance characteristics that don't seem to match
> with the experience others report.
25,000 messages should really not be a strain, spinning rust or no.
> I'm dumping messages from
> Thunderbird in batches and then running `notmuch new` to add each batch
> to the database. The indexing performance remains okay, at more than
> 200 per second, but after Notmuch has reported it's finished indexing,
> it hangs for as much as several minutes before exiting. A stack trace
> confirms that this is Xapian committing the database, with most of the
> time seemingly spent in `fdatasync`. The time spent grows with the size
> of the database, not the number of e-mails being imported, which means
> this will remain a problem during day-to-day usage.
It would be interesting if you could report the results of running the
notmuch performance test suite (under performance-test/ in the source).
The other thing I'm curious about is the actual size of the
database. This varies a lot, but in the past pathological performance
has sometimes been linked to indexing things that should not be,
bloating the database.
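A quick way to put the database size in context is to compare it with the size of the mail it indexes. The Python sketch below assumes notmuch's default layout, with the database in a `.notmuch` directory inside the top-level maildir; the helper names `tree_size_bytes` and `db_to_mail_ratio` are hypothetical. It sums apparent file sizes, so it will not match `du` output exactly.

```python
import os


def tree_size_bytes(root, exclude=()):
    """Sum the apparent sizes of regular files under `root`.

    Directories whose name is in `exclude` are skipped at any depth.
    Symlinks are neither followed nor counted.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded dirs.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def db_to_mail_ratio(maildir, db_name=".notmuch"):
    """Size of <maildir>/<db_name> divided by the size of everything else."""
    db = tree_size_bytes(os.path.join(maildir, db_name))
    mail = tree_size_bytes(maildir, exclude={db_name})
    return db / mail if mail else float("nan")
```

With the figures reported later in this thread (a 727M `.notmuch` directory for a 4.5 GiB maildir, assuming the latter excludes the database), the ratio would be roughly 0.16.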
d
* Re: Xapian commits unexpectedly slow
From: Matthew Schauer @ 2020-01-02 22:46 UTC
To: David Bremner, notmuch
On 12/28/19 5:21 PM, David Bremner wrote:
> Matthew Schauer <matthew.schauer@e10x.net> writes:
>
>> Greetings,
>>
>> I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing
>> some frustrating performance characteristics that don't seem to match
>> with the experience others report.
>
> 25,000 messages should really not be a strain, spinning rust or no.
>
>> I'm dumping messages from
>> Thunderbird in batches and then running `notmuch new` to add each batch
>> to the database. The indexing performance remains okay, at more than
>> 200 per second, but after Notmuch has reported it's finished indexing,
>> it hangs for as much as several minutes before exiting. A stack trace
>> confirms that this is Xapian committing the database, with most of the
>> time seemingly spent in `fdatasync`. The time spent grows with the size
>> of the database, not the number of e-mails being imported, which means
>> this will remain a problem during day-to-day usage.
>
> It would be interesting if you could report the results of running the
> notmuch performance test suite (under performance-test/ in the source).
Nifty! Here are the results -- I assume you know how to interpret them
better than I do:
T00-new.sh: Testing notmuch new [0.4 large]
                       Wall(s)  Usr(s)  Sys(s)  Res(K)  In/Out(512B)
Initial notmuch new    1163.05  854.26   45.97  444304  2343120/13645200
notmuch new #2            2.23    0.02    0.03    9384  2144/8
notmuch new #3            0.01    0.01    0.00    9460  0/8
notmuch new #4            0.01    0.01    0.00    9428  0/8
notmuch new #5            0.01    0.00    0.00    9468  0/8
notmuch new #6            0.01    0.01    0.00    9692  0/8
new (52374 mv)         1351.01  537.75  235.45  959524  1027288/8531616
new (52374 mv back)     834.15  489.27  213.97  967040  184/4754016
new (52374 cp)          747.23  284.03  105.51  941992  0/4007120
T01-dump-restore.sh: Testing dump and restore [0.4 large]
                       Wall(s)  Usr(s)  Sys(s)  Res(K)  In/Out(512B)
load nmbug tags          32.64    4.16    3.97   12744  776/38968
dump *                    5.02    4.81    0.18   26256  8/27928
restore *                 5.94    5.43    0.48    9728  0/0
T02-tag.sh: Testing tagging [0.4 large]
                       Wall(s)  Usr(s)  Sys(s)  Res(K)  In/Out(512B)
tag * +new_tag          611.53  305.67  229.54  111304  0/1840208
tag * +existing_tag       0.05    0.01    0.00    9340  96/0
tag * -existing_tag     513.58  242.77  215.88   36252  0/1937792
tag * -missing_tag        0.02    0.00    0.01    9332  0/0
T03-reindex.sh: Testing reindexing [0.4 large]
                       Wall(s)  Usr(s)  Sys(s)  Res(K)  In/Out(512B)
reindex *              1893.02  590.11  150.22  392180  7572912/4620792
reindex *               853.85  440.58  115.60  337320  3072648/2512376
reindex *               629.36  415.50  107.50  337188  1501448/2507848
T04-thread-subquery.sh: Testing thread subqueries [0.4 large]
                       Wall(s)  Usr(s)  Sys(s)  Res(K)  In/Out(512B)
search thread:{} ...     28.38    8.25    1.49   94304  278064/144
search thread:{} ...     11.25    5.26    0.63   94300  81520/144
search thread:{} ...      3.24    2.94    0.29   94284  0/144
> The other thing I'm curious about is the actual size of the
> database. This varies a lot, but in the past pathological performance
> has sometimes been linked to indexing things that should not be,
> bloating the database.
Here are some relevant lines from my import process, showing how long
Notmuch thinks it's taking, how long it's actually taking, and the size
of the database after each import. The sizes seem reasonable to me.
For comparison, the maildir itself is 4.5 GiB after all of the imports.
Processed 1057 total files in 1s (759 files/sec.).
notmuch new 1.48s user 0.08s system 52% cpu 2.982 total
6.0M .notmuch
Processed 1669 total files in 3s (438 files/sec.).
notmuch new 3.95s user 0.19s system 63% cpu 6.516 total
27M .notmuch
Processed 3338 total files in 9s (359 files/sec.).
notmuch new 9.73s user 0.44s system 71% cpu 14.288 total
71M .notmuch
Processed 7547 total files in 24s (304 files/sec.).
notmuch new 23.82s user 0.97s system 83% cpu 29.521 total
167M .notmuch
Processed 8224 total files in 39s (210 files/sec.).
notmuch new 35.72s user 1.96s system 52% cpu 1:12.08 total
334M .notmuch
Processed 9530 total files in 39s (239 files/sec.).
notmuch new 35.10s user 1.88s system 74% cpu 49.630 total
519M .notmuch
Processed 6029 total files in 46s (129 files/sec.).
notmuch new 26.42s user 1.88s system 24% cpu 1:54.35 total
641M .notmuch
Processed 6387 total files in 38s (167 files/sec.).
notmuch new 24.29s user 1.69s system 13% cpu 3:10.35 total
706M .notmuch
Processed 3113 total files in 10s (308 files/sec.).
notmuch new 10.65s user 0.82s system 6% cpu 2:53.25 total
725M .notmuch
Processed 410 total files in 1s (344 files/sec.).
notmuch new 1.21s user 0.32s system 1% cpu 1:57.68 total
726M .notmuch
Processed 317 total files in almost no time.
notmuch new 1.09s user 0.34s system 1% cpu 2:22.09 total
727M .notmuch
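One way to read this log is to subtract the CPU time from the elapsed time for each batch, leaving the time `notmuch new` spent off-CPU (most plausibly waiting on the final commit, given the stack trace mentioned earlier). The Python sketch below parses lines in the zsh-style `time` format shown above; the helper names are hypothetical and the regex only covers that format.

```python
import re

# Matches zsh-style `time` reports such as
# "notmuch new 1.48s user 0.08s system 52% cpu 2.982 total"
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system "
    r"\d+% cpu (?P<total>[\d:.]+) total"
)


def to_seconds(stamp):
    """Convert an elapsed time such as '2.982' or '1:12.08' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def non_cpu_seconds(line):
    """Elapsed time not accounted for by user + system CPU time."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"not a zsh time line: {line!r}")
    cpu = float(m["user"]) + float(m["system"])
    return to_seconds(m["total"]) - cpu


for line in [
    "notmuch new 1.48s user 0.08s system 52% cpu 2.982 total",
    "notmuch new 1.21s user 0.32s system 1% cpu 1:57.68 total",
    "notmuch new 1.09s user 0.34s system 1% cpu 2:22.09 total",
]:
    print(f"{non_cpu_seconds(line):8.1f} s off-CPU")
```

Applied to the log, the first batch (1057 files, 6.0M database) has about 1.4 s of off-CPU time, while the last two batches (410 and 317 files, 726M and 727M database) have roughly 116 s and 141 s. The waiting time tracks database size rather than batch size, which matches the original report.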
* Re: Xapian commits unexpectedly slow
From: David Bremner @ 2024-05-26 12:12 UTC
To: Matthew Schauer, notmuch
Matthew Schauer <matthew.schauer@e10x.net> writes:
>
> Nifty! Here are the results -- I assume you know how to interpret them
> better than I do:
>
> T00-new.sh: Testing notmuch new [0.4 large]
>                        Wall(s)  Usr(s)  Sys(s)  Res(K)  In/Out(512B)
> Initial notmuch new    1163.05  854.26   45.97  444304  2343120/13645200
> notmuch new #2            2.23    0.02    0.03    9384  2144/8
> notmuch new #3            0.01    0.01    0.00    9460  0/8
> notmuch new #4            0.01    0.01    0.00    9428  0/8
> notmuch new #5            0.01    0.00    0.00    9468  0/8
> notmuch new #6            0.01    0.01    0.00    9692  0/8
> new (52374 mv)         1351.01  537.75  235.45  959524  1027288/8531616
> new (52374 mv back)     834.15  489.27  213.97  967040  184/4754016
> new (52374 cp)          747.23  284.03  105.51  941992  0/4007120
>
Apologies, it looks like I never replied to this thread. You are probably
no longer interested, but I can make a few observations, mainly that
later versions of notmuch include a few relevant improvements.
1) This is about 3x slower than my current benchmark machine [1]. My
current machine is probably 4 years newer, so I would expect some
improvement in performance.
2) I don't know if this is typical for spinning rust, but about 25% of
the time is (apparently) IO wait, since it does not show up in CPU
time. I do have access to a machine with both an SSD and spinning rust,
but the latter is in a complicated RAID configuration, so I don't know
how representative the results would be.
3) Some time after you reported these issues, I implemented an
"autocommit" parameter, which should help avoid large Xapian commits.
4) Your results show that notmuch new could be extra slow when dealing
with files moved on disk. This should be somewhat improved by changes
in notmuch 0.32. (I also see fairly dramatic improvements in
notmuch-reindex relative to notmuch new, but the underlying cause is
less clear.)
[1] e.g. https://notmuchmail.org/perf-test-results/2024-05-26-minkowski/
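Observation 2 can be checked against the quoted T00 table by computing the share of wall-clock time that is neither user nor system CPU time. This Python snippet uses only numbers from those results. Off-CPU time is an upper bound on IO wait rather than a direct measurement of it, since it also includes any other time the process spent blocked or waiting to be scheduled.

```python
# (label, wall, user, system) in seconds, copied from the T00-new.sh results.
ROWS = [
    ("Initial notmuch new", 1163.05, 854.26, 45.97),
    ("new (52374 mv)", 1351.01, 537.75, 235.45),
    ("new (52374 mv back)", 834.15, 489.27, 213.97),
    ("new (52374 cp)", 747.23, 284.03, 105.51),
]


def off_cpu_fraction(wall, user, system):
    """Share of wall-clock time not spent in user or system CPU."""
    return (wall - user - system) / wall


for label, wall, user, system in ROWS:
    print(f"{label:<22}{off_cpu_fraction(wall, user, system):7.1%}")
```

This gives roughly 23% for the initial run, and between about 16% and 48% for the three 52374-file rows.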
* Re: Xapian commits unexpectedly slow
From: Matthew Schauer @ 2024-05-27 0:52 UTC
To: David Bremner; +Cc: notmuch
On Sun, May 26, 2024 at 09:12:57AM -0300, David Bremner wrote:
> Apologies, it looks like I never replied to this thread. Probably you
> are no longer interested, but I can make a few observations, mainly
> that there are a few relevant improvements in later notmuch.
Wow! What reminded you of this after all this time?
A lot certainly has changed since then. I don't remember whether my
problem was ever resolved on that machine, but a few months later I
moved to a new laptop with an SSD, and I've now been a happy Notmuch
user for several years. I think we can move on from this.
Thanks for continuing to work on improving this great tool!
--
Matthew Schauer