* Notmuch indexing 21 million emails
@ 2011-11-22 3:02 Tom Bulli
2011-11-23 3:20 ` Austin Clements
2011-11-23 15:40 ` Felipe Contreras
0 siblings, 2 replies; 4+ messages in thread
From: Tom Bulli @ 2011-11-22 3:02 UTC (permalink / raw)
To: notmuch@notmuchmail.org
I have a project where I need to search about 21 million emails, and I decided to use "notmuch" for it. The system is Debian Squeeze; the notmuch version is "0.8-1~bpo60+1" from "kyria's" private repository.
I have been running "notmuch new" for approx. 4 days now, and according to "notmuch count" it has indexed about 4.5 million emails.
Is this expected performance? Is there any way to speed that up?
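For a rough sense of scale, the figures above can be turned into a projection. This is only a back-of-the-envelope sketch: it assumes the indexing rate stays constant, which is probably optimistic since the index keeps growing, and the function name `projected_hours` is made up for illustration.

```shell
# Project total indexing time from progress so far, assuming a constant rate.
# Arguments: total messages, messages indexed so far, hours elapsed so far.
projected_hours() {
    total=$1
    indexed=$2
    elapsed_hours=$3
    # Integer arithmetic; multiply before dividing to limit rounding.
    echo $(( total * elapsed_hours / indexed ))
}

# 21 million messages, 4.5 million indexed after 4 days (96 hours)
total_hours=$(projected_hours 21000000 4500000 96)
echo "projected total: ${total_hours} hours"
echo "remaining: $(( total_hours - 96 )) hours"
```

With these numbers the projection comes to 448 hours in total, so roughly two more weeks at the current rate.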
* Re: Notmuch indexing 21 million emails
From: Austin Clements @ 2011-11-23 3:20 UTC (permalink / raw)
To: Tom Bulli; +Cc: notmuch@notmuchmail.org
Quoth Tom Bulli on Nov 21 at 7:02 pm:
> I have a project where I need to search about 21 million emails - and
> decided to use "notmuch" for it. The system is a Debian Squeeze,
> the notmuch version is "0.8-1~bpo60+1" from "kyria's" private
> repository.
>
> I am running the "notmuch new" for approx. 4 days now - and
> according to "notmuch count" it has indexed about 4.5 million
> emails.
>
> Is this expected performance? Is there any way to speed that up?
Currently, notmuch is much more optimized for search than it is for
indexing. This is unfortunate for the initial indexing process and
seems to be becoming increasingly unfortunate.
There are some things you can try. One is to use an SSD if you aren't
already, since constructing the index requires a lot of random IO.
You can also try libeatmydata to disable fsyncs, which may improve
your IO performance, with the obvious crash-safety caveats. However,
unless you have a lot of RAM, I suspect your index has long outgrown
your buffer cache, so this may have limited impact.
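A minimal sketch of the libeatmydata suggestion, assuming the `eatmydata` wrapper command that ships with libeatmydata is installed. The helper name `index_command` is hypothetical; it only builds the command line and falls back to plain indexing when the wrapper is missing.

```shell
# Build the indexing command, prefixing eatmydata only when it is installed.
# eatmydata turns fsync and friends into no-ops for the wrapped process,
# so a crash or power loss during indexing can corrupt the database.
index_command() {
    if command -v eatmydata >/dev/null 2>&1; then
        echo "eatmydata notmuch new"
    else
        echo "notmuch new"
    fi
}

# Show what would be run; to actually run it, execute: $(index_command)
echo "would run: $(index_command)"
```

Printing the command first makes it easy to check whether the wrapper was found before committing to a multi-day run.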
Since you're going to the trouble of indexing 21 million emails, you
might want to try 0.10 (under freeze right now, to be released very,
very soon). It won't improve your indexing time, but if you're doing
searches with non-trivial numbers of results, emails indexed with 0.10
will search much faster.
Sorry I don't have better news, but I hope this helps.
* Re: Notmuch indexing 21 million emails
From: Felipe Contreras @ 2011-11-23 15:40 UTC (permalink / raw)
To: Tom Bulli; +Cc: notmuch@notmuchmail.org
On Tue, Nov 22, 2011 at 5:02 AM, Tom Bulli <mrbulli@yahoo.com> wrote:
> I have a project where I need to search about 21 million emails - and decided to use "notmuch" for it. The system is a Debian Squeeze, the notmuch version is "0.8-1~bpo60+1" from "kyria's" private repository.
>
> I am running the "notmuch new" for approx. 4 days now - and according to "notmuch count" it has indexed about 4.5 million emails.
>
> Is this expected performance? Is there any way to speed that up?
It would be nice to run something like this with OProfile (or perf)
and see if there are any obvious fixes.
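A sketch of what the perf variant might look like, assuming perf is installed and notmuch was built with debug symbols. The helper names `run` and `profile_indexing`, the `DRY_RUN` switch, and the output file name are all made up for illustration. By default the commands are only printed.

```shell
# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

profile_indexing() {
    # -g records call graphs; -o names the output file (arbitrary choice).
    run perf record -g -o notmuch-new.perf.data -- notmuch new
    # Summarise where the time went as plain text.
    run perf report -i notmuch-new.perf.data --stdio
}

# Usage: DRY_RUN=0 profile_indexing
```

On a run this long it is probably enough to profile a few minutes of indexing and interrupt it, rather than recording the whole thing.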
--
Felipe Contreras
* Re: Notmuch indexing 21 million emails
From: Tom Bulli @ 2011-11-23 17:20 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
I have been able to speed it up with the code below. Basically, it raises "XAPIAN_FLUSH_THRESHOLD" to the total virtual memory divided by twice the average size of an email (the factor of 2 is just to be safe). It seems to be faster since Xapian flushes to disk less often. However, I have a nagging feeling that "XAPIAN_FLUSH_THRESHOLD" could be even higher, since I don't see any increase in used memory (via "top -d 1").
The server in question has eight CPU cores and 8GB RAM, running Debian squeeze on a 32-bit architecture (I know - but it is what it is :) ).
# Assume an average size of 120KB per email
# and use at most half the virtual memory
# (total KB / 120 KB per email / 2 = total KB / 240)
XFT=$(($(free -otk | awk '/^Total/ {print $2}') / 240))
# Never go below Xapian's default threshold of 10000
[ "$XFT" -lt 10000 ] && XFT=10000
# Keep more index info in memory before flushing to disk
su - archive -c "export XAPIAN_FLUSH_THRESHOLD=$XFT; notmuch new --verbose"
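The same calculation can be sketched as a small function, which makes it easier to try other average sizes. This is an illustration, not a tested tuning recipe: the name `flush_threshold` is made up, and reading RAM plus swap from /proc/meminfo is an assumption standing in for the "free" call above.

```shell
# Compute a flush threshold from total memory (in KB) and an assumed
# average message size (in KB, default 120), using at most half the memory.
flush_threshold() {
    total_kb=$1
    avg_kb=${2:-120}
    xft=$(( total_kb / (avg_kb * 2) ))
    # Never go below 10000, the floor used in the script above.
    if [ "$xft" -lt 10000 ]; then
        xft=10000
    fi
    echo "$xft"
}

# Total virtual memory = RAM + swap, read from /proc/meminfo (Linux only).
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^(MemTotal|SwapTotal):/ {s += $2} END {print s + 0}' /proc/meminfo)
    echo "XAPIAN_FLUSH_THRESHOLD=$(flush_threshold "$total_kb")"
fi
```

For example, 8388608 KB of memory with the default 120KB average gives a threshold of 34952, while anything under about 2.4 million KB falls back to the 10000 floor.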
----- Original Message -----
> From: Felipe Contreras <felipe.contreras@gmail.com>
> To: Tom Bulli <mrbulli@yahoo.com>
> Cc: "notmuch@notmuchmail.org" <notmuch@notmuchmail.org>
> Sent: Wednesday, November 23, 2011 10:40 AM
> Subject: Re: Notmuch indexing 21 million emails
>
> On Tue, Nov 22, 2011 at 5:02 AM, Tom Bulli <mrbulli@yahoo.com> wrote:
>> I have a project where I need to search about 21 million emails - and
>> decided to use "notmuch" for it. The system is a Debian Squeeze, the
>> notmuch version is "0.8-1~bpo60+1" from "kyria's" private repository.
>>
>> I am running the "notmuch new" for approx. 4 days now - and according
>> to "notmuch count" it has indexed about 4.5 million emails.
>>
>> Is this expected performance? Is there any way to speed that up?
>
> It would be nice to run something like this with OProfile (or perf)
> and see if there are any obvious fixes.
>
> --
> Felipe Contreras
>