unofficial mirror of notmuch@notmuchmail.org
* 25 minutes load time with emacs -f notmuch
@ 2009-11-21 14:51 Stefan Schmidt
  2009-11-21 15:12 ` Bdale Garbee
  2009-11-21 17:07 ` Carl Worth
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Schmidt @ 2009-11-21 14:51 UTC
  To: notmuch

Hello.

Disclaimer: I've been using vim, in combination with mutt for email, for years,
but have never dealt with emacs. Please keep this in mind and point out any
emacs user errors in this report. :)

I first saw notmuch several weeks ago, when it still seemed to be a silent
project. I'm more than happy now that it is evolving quickly and a real
developer community is building around it.

But now to my problem. Getting my mail indexed was easy enough:

stefan@excalibur:~$ du -chs not-much-mail/
1.5G    not-much-mail/
1.5G    total
stefan@excalibur:~$ time notmuch new
Found 103677 total files.
Processed 103677 total files in 42m 30s (40 files/sec.).
Added 100899 new messages to the database (not much, really).

Tip: If you have any sub-directories that are archives (that is,
they will never receive new mail), marking these directories as
read-only (chmod u-w /path/to/dir) will make "notmuch new"
much more efficient (it won't even look in those directories).

real    43m0.943s
user    22m46.513s
sys     0m39.418s
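The tip in that output relies on the owner-write bit of a directory. Below is a minimal sketch of such a check in C. `dir_is_readonly_archive` is a hypothetical helper written for illustration, not notmuch's own code, and it only inspects the mode bits that "chmod u-w" clears.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: true when path is a directory whose owner-write
 * bit is cleared, the state left behind by "chmod u-w /path/to/dir". */
static bool dir_is_readonly_archive(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;              /* cannot stat it: do not skip it */
    if (!S_ISDIR(st.st_mode))
        return false;              /* only directories are candidates */
    return (st.st_mode & S_IWUSR) == 0;
}
```

An indexer that walks the mail store could call this before descending into each directory and skip the ones that return true.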


I put (require 'notmuch) in my ~/.emacs and start emacs with the -f notmuch
option to enter the notmuch mode. What happens then is that a notmuch process
gets started and emacs waits for it to return.

23649 pts/1    SN+    0:00      |       \_ emacs -f notmuch
23651 ?        RNs    0:03      |           \_ /usr/local/bin/notmuch search --sort=oldest-first tag:inbox

Sadly that takes around 25 minutes here on an Intel Core2Duo notebook (Thinkpad
X200s). I have tried this several times now. CPU load was low (~10%) during this
time, so it is mostly I/O bound.

I checked that I don't have any big files like mutt header caches left, and all
my mail is stored in maildir format directly from offlineimap. I'm more than
happy to test any patches for this issue or do some debugging myself if I get
some hints on where to look.

regards
Stefan Schmidt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 14:51 25 minutes load time with emacs -f notmuch Stefan Schmidt
@ 2009-11-21 15:12 ` Bdale Garbee
  2009-11-21 15:36   ` Stefan Schmidt
  2009-11-21 17:16   ` Carl Worth
  2009-11-21 17:07 ` Carl Worth
  1 sibling, 2 replies; 17+ messages in thread
From: Bdale Garbee @ 2009-11-21 15:12 UTC
  To: Stefan Schmidt; +Cc: notmuch

On Sat, 2009-11-21 at 15:51 +0100, Stefan Schmidt wrote:

> Sadly that takes around 25 minutes here on an Intel Core2Duo notebook (Thinkpad
> X200s). I have tried this several times now. CPU load was low (~10%) during this
> time, so it is mostly I/O bound.

I see the same behavior on my notebook.  

I gather from talking to keithp that things like the 'state of already
being read' aren't being picked up from the file names in the local
Maildir yet.  Thus I suspect it's a fairly unusual / worst case scenario
trying to start up with 178k (in my case) supposedly-unread messages
tagged inbox.

I haven't figured out how to quickly tag everything as already read or
archived or whatever .. can someone who knows more about what's going on
confirm my hypothesis and if so, suggest the best approach to getting to
a happier state?

Bdale

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 15:12 ` Bdale Garbee
@ 2009-11-21 15:36   ` Stefan Schmidt
  2009-11-21 17:26     ` Carl Worth
  2009-11-21 17:16   ` Carl Worth
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Schmidt @ 2009-11-21 15:36 UTC
  To: Bdale Garbee; +Cc: notmuch

Hello.

On Sat, 2009-11-21 at 08:12, Bdale Garbee wrote:
> On Sat, 2009-11-21 at 15:51 +0100, Stefan Schmidt wrote:
> 
> > Sadly that takes around 25 minutes here on an Intel Core2Duo notebook (Thinkpad
> > X200s). I have tried this several times now. CPU load was low (~10%) during this
> > time, so it is mostly I/O bound.
> 
> I see the same behavior on my notebook.  
> 
> I gather from talking to keithp that things like the 'state of already
> being read' aren't being picked up from the file names in the local
> Maildir yet.  Thus I suspect it's a fairly unusual / worst case scenario
> trying to start up with 178k (in my case) supposedly-unread messages
> tagged inbox.

Using the read flag during "notmuch new" would indeed be nice. But some further
testing makes me doubt that the problem is an overload due to too many unread
messages.

I ran "/usr/local/bin/notmuch search --sort=oldest-first tag:inbox" by hand.
Of the 21 minutes it took, it spent around 20 in a state where no new messages
were printed, and then suddenly all the rest came up.

In my case only 80 messages were printed before the gap. All of them had a wrong
year in the timestamp: 1900 or 1970. Maybe notmuch just gets into a bad state
with these dates?

Bdale, can you confirm this for your case?

I will remove these mails and re-generate the notmuch index to test this out
after dinner later today.

regards
Stefan Schmidt

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 14:51 25 minutes load time with emacs -f notmuch Stefan Schmidt
  2009-11-21 15:12 ` Bdale Garbee
@ 2009-11-21 17:07 ` Carl Worth
  2009-11-21 19:36   ` Stefan Schmidt
  2009-11-21 22:36   ` Brett Viren
  1 sibling, 2 replies; 17+ messages in thread
From: Carl Worth @ 2009-11-21 17:07 UTC
  To: Stefan Schmidt, notmuch

On Sat, 21 Nov 2009 15:51:11 +0100, Stefan Schmidt <stefan@datenfreihafen.org> wrote:
> Disclaimer: I've been using vim, in combination with mutt for email, for years,
> but have never dealt with emacs. Please keep this in mind and point out any
> emacs user errors in this report. :)

Hi Stefan, welcome to Notmuch! And don't worry, we don't discriminate
(too much) against non-emacs users around here.

> I first saw notmuch several weeks ago, when it still seemed to be a silent
> project. I'm more than happy now that it is evolving quickly and a real
> developer community is building around it.

Yes. Notmuch was a silent project since it was just something that I was
doing for myself. I was always writing it as free software, and even had
a public git repository available, but hadn't advertised it at all yet.

And Keith did rather catch me off guard by announcing it. But I can't
complain as we have gotten a nice community started already, and it's
great to have other people writing the code that I intended to
write. :-)

But it's also true that some obvious problems just aren't taken care of
yet.

> But now to my problem. Getting my mail indexed was easy enough:
> 
> stefan@excalibur:~$ du -chs not-much-mail/
> 1.5G    not-much-mail/
> 1.5G    total
> stefan@excalibur:~$ time notmuch new
> Found 103677 total files.
> Processed 103677 total files in 42m 30s (40 files/sec.).
> Added 100899 new messages to the database (not much, really).

Good. I'm glad that went fairly smoothly for you.

Though, frankly, I think we need to fix "notmuch new" to do much better
than 40 files/sec. One plan I have for this is to not use the database
to search for message IDs when adding many messages---but to instead
just use a hash-table (seeded from any messages already in the
database). This would allow us to do all thread resolution before
indexing messages, without having to do the N different searches, and
also means we'd avoid continually rewriting documents when merging
thread IDs.
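A minimal sketch of the kind of in-memory map described above, assuming plain C strings for message-IDs and thread-IDs and a fixed bucket count. The names (`id_map`, `id_map_set`, `id_map_lookup`) are hypothetical; seeding the map from the existing database and freeing it are left out.

```c
#include <stdlib.h>
#include <string.h>

#define ID_MAP_BUCKETS 4096

struct id_entry {
    char *message_id;
    char *thread_id;
    struct id_entry *next;          /* chaining for colliding IDs */
};

/* Must start zero-initialized (e.g. declared static). */
struct id_map {
    struct id_entry *buckets[ID_MAP_BUCKETS];
};

static size_t id_hash(const char *s)
{
    size_t h = 5381;                /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h % ID_MAP_BUCKETS;
}

static char *id_copy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Thread-ID recorded for message_id, or NULL if unknown. */
static const char *id_map_lookup(const struct id_map *map, const char *message_id)
{
    const struct id_entry *e;

    for (e = map->buckets[id_hash(message_id)]; e != NULL; e = e->next)
        if (strcmp(e->message_id, message_id) == 0)
            return e->thread_id;
    return NULL;
}

/* Insert or overwrite the thread-ID for message_id. 0 on success. */
static int id_map_set(struct id_map *map, const char *message_id, const char *thread_id)
{
    size_t h = id_hash(message_id);
    char *tid = id_copy(thread_id);
    struct id_entry *e;

    if (tid == NULL)
        return -1;
    for (e = map->buckets[h]; e != NULL; e = e->next) {
        if (strcmp(e->message_id, message_id) == 0) {
            free(e->thread_id);     /* thread merge: replace the old ID */
            e->thread_id = tid;
            return 0;
        }
    }
    e = malloc(sizeof *e);
    if (e == NULL || (e->message_id = id_copy(message_id)) == NULL) {
        free(e);
        free(tid);
        return -1;
    }
    e->thread_id = tid;
    e->next = map->buckets[h];
    map->buckets[h] = e;
    return 0;
}
```

Seeding would mean one `id_map_set` call per message already in the database; after that, each parent or child lookup during an import would be a hash probe rather than a Xapian search.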

> I put (require 'notmuch) in my ~/.emacs and start emacs with the -f notmuch
> option to enter the notmuch mode.

I'm glad you've figured that much out. I feel bad that that's not even
in the documentation anywhere yet.

> What happens then is that a notmuch process gets started and emacs
> waits for it to return.

OK. This is a known shortcoming. As Bdale supposes, this problem is from
notmuch trying to load and construct every thread in your
database. There are actually several different bugs/missing features
here that should be addressed:

  * "notmuch new" should look at the R flag in maildir files to
    determine that they are read and do not need to be marked as "inbox"
    and "unread"

  * "notmuch setup" should prompt for some date range, ("last 2 months"
    by default?) before which no messages will be considered unread.
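A sketch of how the two ideas in the list above could combine when deciding whether a newly found message gets the "unread" tag. It assumes the standard maildir filename convention, where flags follow ":2," at the end of the name. Note that in that convention the read state is the S ("seen") flag, while R means "replied", so the sketch tests S. The function names and the day-based cutoff are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* True if a maildir filename carries the given flag. Flags are the
 * letters after ":2," at the end of the name, e.g. "...:2,RS". */
static bool maildir_has_flag(const char *filename, char flag)
{
    const char *base = strrchr(filename, '/');
    const char *info;

    if (flag == '\0')
        return false;
    base = base ? base + 1 : filename;   /* ignore directory components */
    info = strstr(base, ":2,");
    if (info == NULL)
        return false;                    /* no info part means no flags */
    return strchr(info + 3, flag) != NULL;
}

/* Hypothetical policy: tag a message "unread" only if the MUA has not
 * marked it seen (S) and it is no older than `days` days. */
static bool should_tag_unread(const char *filename, time_t msg_date,
                              time_t now, int days)
{
    time_t cutoff = now - (time_t) days * 24 * 60 * 60;

    if (maildir_has_flag(filename, 'S'))
        return false;
    return msg_date >= cutoff;
}
```

With `days` set to about 60, this approximates the "last 2 months" default suggested above.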

Either of those two fixes would have prevented your particular
problem. But it's still easy to generate searches that return large
numbers of results. So there's some more to do:

  * The emacs code needs to call "notmuch search" with the --first and
    --max-threads options to get a limited set of results, (one or two
    screenfuls). You should be able to test this at the command line and
    see that it returns results quickly. Then, of course, we'd like the
    emacs code to fill in subsequent screenfuls as you page.
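The paging arithmetic is simple. Here is a sketch assuming zero-based thread offsets. The option names come from the text above, but spelling them as "--first=N --max-threads=N" is an assumption about their syntax, and `page_first`/`page_command` are hypothetical helpers.

```c
#include <stdio.h>

/* Zero-based index of the first thread shown on a given page. */
static int page_first(int page, int per_page)
{
    return page * per_page;
}

/* Hypothetical: format the command for one screenful of results.
 * The "--first=N --max-threads=N" spelling is an assumption. */
static int page_command(char *buf, size_t size, int page, int per_page,
                        const char *query)
{
    return snprintf(buf, size, "notmuch search --first=%d --max-threads=%d %s",
                    page_first(page, per_page), per_page, query);
}
```

With 40 threads per screenful, the third page (page 2) would start at thread 80.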

But none of that helps you right now. What you need is to retroactively
remove all of the "inbox" and "unread" tags from messages older than
some time period. So then there's another missing feature:

  * We need to support date-range-based searches. If we had that you
    could just do:

	notmuch tag -inbox -unread until:"2 months ago"

    But we don't quite have this yet. Xapian does have support for a
    slightly less convenient date range specification:

	1970-01-01..2009-09-21

    but it turns out that we can't even use that just yet, since to make
    that work we would have to have dates saved as YYYYMMDD strings for
    each message, (where instead we have time_t values stored serialized
    into a string that will sort correctly). So we need a new
    ValueRangeProcessor class to map to timestamps, and then we'll need
    some fancy parsing to do things like "2 months ago".
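To illustrate the two encodings mentioned above: the first function renders a time_t as the YYYYMMDD form that the simple date-range syntax compares, and the second shows why a fixed-width, zero-padded encoding sorts correctly as a string. Both are hypothetical sketches that assume UTC and non-negative timestamps. Neither is the serialization notmuch actually stores.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Render t (UTC) as "YYYYMMDD". buf must hold at least 9 bytes.
 * Returns 0 on success, -1 on failure. */
static int time_to_yyyymmdd(time_t t, char *buf)
{
    struct tm *tm = gmtime(&t);

    if (tm == NULL)
        return -1;
    return strftime(buf, 9, "%Y%m%d", tm) == 8 ? 0 : -1;
}

/* Hypothetical sortable encoding for non-negative t: fixed width and
 * zero padded, so string order matches numeric order. buf >= 21 bytes. */
static void time_to_sortable(time_t t, char *buf)
{
    snprintf(buf, 21, "%020lld", (long long) t);
}

/* Compare encoded values as strings, the way a string index would. */
static int sortable_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}
```

Without the zero padding, "9" would sort after "10" as a string, which is exactly what a sortable encoding has to avoid.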

So, what's the best thing to do today if you want to start playing with
notmuch? I think you could pick one of the above to work on, (a quick
hack to "notmuch new" and a re-import might do the trick). Or you might
just remove the inbox and unread tags from all messages and then just
let messages that are actually *new* in the future get tagged into the
inbox by "notmuch new". Oh, but then there's another missing feature:

  * We need a syntax to specify a search string that should match all
    messages. Then you could do:

	notmuch tag -inbox -unread <whatever-magic-we-came-up-with>

Yikes! So many bugs and missing features. How is anyone actually using
this system? Well, Keith and I were able to get past all this by simply
doing a "notmuch restore" based on tags we got from sup-dump. So here
is another attempt:

  1. Run "notmuch dump <some-file>" to get the list of message IDs, (all
     with their "inbox" and "unread" tags).

  2. Edit that file to remove the tags you want.

  3. Run "notmuch restore <some-file>" to cause the tags to be removed.
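Step 2 could be automated with a small filter. This sketch assumes each dump line has the shape `<message-id> (<tag> <tag> ...)`, that the message-ID itself contains no "(", and that tags are space-separated. The function names are hypothetical, and the trailing newline is not copied.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the n-byte tag at p should be dropped. */
static int is_dropped_tag(const char *p, size_t n)
{
    return (n == 5 && strncmp(p, "inbox", 5) == 0) ||
           (n == 6 && strncmp(p, "unread", 6) == 0);
}

/* Copy one dump line to out with "inbox" and "unread" removed.
 * Returns 0 on success, -1 on a malformed line or a too-small buffer. */
static int strip_inbox_unread(const char *line, char *out, size_t outsz)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    const char *p;
    size_t prefix, len;

    if (open == NULL || close == NULL || close < open)
        return -1;
    prefix = (size_t) (open - line) + 1;     /* "<message-id> (" */
    if (prefix + 2 > outsz)
        return -1;
    memcpy(out, line, prefix);
    len = prefix;

    for (p = open + 1; p < close; ) {
        size_t n = 0;

        while (p < close && *p == ' ')
            p++;                             /* skip separators */
        while (p + n < close && p[n] != ' ')
            n++;                             /* measure this tag */
        if (n == 0)
            break;
        if (!is_dropped_tag(p, n)) {
            size_t sep = len > prefix;       /* space before all but first */
            if (len + sep + n + 2 > outsz)   /* keep room for ")" and NUL */
                return -1;
            if (sep)
                out[len++] = ' ';
            memcpy(out + len, p, n);
            len += n;
        }
        p += n;
    }
    out[len++] = ')';
    out[len] = '\0';
    return 0;
}
```

For example, the line `abc@example.com (inbox signed unread)` becomes `abc@example.com (signed)`.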

But, (*sigh*), that's not good either, because "notmuch dump" is
currently hard-coded to dump messages in message-ID order rather than
date order, (so you can't easily do something like "just remove the tags
from messages older than two months").
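If each dumped record did carry a date, selecting "messages older than two months" would reduce to a sort plus a scan. Here is a sketch with a hypothetical `struct dated_id` record. The current dump output has no date field, so this record layout is an assumption.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical record pairing a message-ID with its date. */
struct dated_id {
    const char *message_id;
    time_t date;
};

/* qsort comparator: oldest first, without subtracting time_t values. */
static int by_date(const void *a, const void *b)
{
    time_t da = ((const struct dated_id *) a)->date;
    time_t db = ((const struct dated_id *) b)->date;
    return (da > db) - (da < db);
}

/* Sort records by date and return how many are older than cutoff;
 * after sorting, those records form a prefix of the array. */
static size_t sort_and_count_older(struct dated_id *recs, size_t n, time_t cutoff)
{
    size_t i = 0;

    qsort(recs, n, sizeof *recs, by_date);
    while (i < n && recs[i].date < cutoff)
        i++;
    return i;
}
```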

So, there's sadly no easy way to get what you want with the tools in
their current form. I guess that's the pain that you get for being an
early adopter. :-}

But if hacking a little C code doesn't scare you away, a lot of the
things listed above are actually really easy to fix. (Like, fixing
"notmuch dump" to just run in date order is a one-line change. Adding a
--sort command-line option to it wouldn't be much harder, etc.)

So hopefully the above serves as a nice TODO list.

Thanks everyone for your interest in this software even in its current,
can-be-painful-to-use state.

-Carl

PS. Expect the mass-re-tag operations to be about as slow as the
original "notmuch new" import of the messages. That's a known bug in
Xapian that's one of the highest priority things that I'd like to fix,
(along with all of the above and all the other things I want to do...)

At least we're not running out of things to work on here.

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 15:12 ` Bdale Garbee
  2009-11-21 15:36   ` Stefan Schmidt
@ 2009-11-21 17:16   ` Carl Worth
  2009-11-21 19:22     ` Stefan Schmidt
  1 sibling, 1 reply; 17+ messages in thread
From: Carl Worth @ 2009-11-21 17:16 UTC
  To: Bdale Garbee, Stefan Schmidt; +Cc: notmuch

On Sat, 21 Nov 2009 08:12:52 -0700, Bdale Garbee <bdale@gag.com> wrote:
> I haven't figured out how to quickly tag everything as already read or
> archived or whatever .. can someone who knows more about what's going on
> confirm my hypothesis and if so, suggest the best approach to getting to
> a happier state?

See my message up-thread. The only reasonable ways all really do involve
at least a little bit of C-code hacking to either prevent those tags
from getting put there by "notmuch new" or to make it easier to get them
off afterwards.

I'm hoping everyone with this problem will happen to choose a different
solution and we'll get a nice flood of patches to improve things. :-)

And I can't help but apologize. I've known about all these issues, and
wouldn't have invited people to try things out in the current state. But
it was nice of Keith to share this with everyone. And it's nice of all
you to come take a look at things.

So, I'll just ask for a little patience, and we'll hopefully have a nice
system soon.

-Carl

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 15:36   ` Stefan Schmidt
@ 2009-11-21 17:26     ` Carl Worth
  2009-11-21 19:23       ` Stefan Schmidt
  0 siblings, 1 reply; 17+ messages in thread
From: Carl Worth @ 2009-11-21 17:26 UTC
  To: Stefan Schmidt, Bdale Garbee; +Cc: notmuch

On Sat, 21 Nov 2009 16:36:55 +0100, Stefan Schmidt <stefan@datenfreihafen.org> wrote:
> I ran "/usr/local/bin/notmuch search --sort=oldest-first tag:inbox" by hand.
> Of the 21 minutes it took, it spent around 20 in a state where no new messages
> were printed, and then suddenly all the rest came up.

That's actually the expected behavior currently.

It used to be that "notmuch search" on the command line wouldn't present
any results until everything was available.

I recently threw in a hack to present the first 100 thread results
quickly and only then does it sit and spin before all the results are
available. I suppose it wouldn't be any harder for it to keep returning
chunks of 100 threads at a time, (though this will slow down the final
result a bit---perhaps not significantly).

And I wouldn't really mind any slowdown there anyway, since any *real*
interface should be calling "notmuch search" in small chunks anyway.

So I'll go ahead and do that.
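A minimal sketch of the chunked output described above, assuming the results are already available as strings. When stdout is a pipe it is normally fully buffered, so the explicit fflush after each chunk is what lets a reader on the other end see partial results early. The function is hypothetical, not notmuch's code.

```c
#include <stdio.h>

/* Hypothetical: print one result per line, flushing after every `chunk`
 * lines and after the last one. Returns the number of flushes. */
static size_t print_in_chunks(FILE *out, const char *const *lines,
                              size_t n, size_t chunk)
{
    size_t flushes = 0;

    if (chunk == 0)
        chunk = 1;                       /* avoid division by zero */
    for (size_t i = 0; i < n; i++) {
        fprintf(out, "%s\n", lines[i]);
        if ((i + 1) % chunk == 0 || i + 1 == n) {
            fflush(out);                 /* make this chunk visible now */
            flushes++;
        }
    }
    return flushes;
}
```

With `chunk` set to 100, 250 results go out in three flushes: after threads 100, 200 and 250.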

> In my case only 80 messages were printed before the gap. All of them had a wrong
> year in the timestamp: 1900 or 1970. Maybe notmuch just gets into a bad state
> with these dates?

I don't think the bogus dates are throwing anything off. It's more
likely that you just have a number of messages with no Date header on
them at all. And for such messages, notmuch just chooses a time_t value
of 0 so you'll see whatever that 0 maps to on your system---a date of
1970 there is not surprising. :-)
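The 1970 dates follow directly from that fallback. Here is a sketch assuming a hypothetical `effective_date` helper that returns 0 when no Date header was parsed, with formatting done in UTC. With localtime() instead, time zones west of UTC would show the same value as 31 December 1969.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical fallback: without a parsed Date header, use time_t 0. */
static time_t effective_date(bool has_date, time_t parsed)
{
    return has_date ? parsed : (time_t) 0;
}

/* Format t in UTC as "YYYY-MM-DD HH:MM:SS". Returns 0 on success. */
static int format_utc(time_t t, char *buf, size_t size)
{
    struct tm *tm = gmtime(&t);

    if (tm == NULL)
        return -1;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm) > 0 ? 0 : -1;
}
```

Formatting the fallback value produces "1970-01-01 00:00:00".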

> I will remove these mails and re-generate the notmuch index to test this out
> after dinner later today.

See my other mail. You may want to tweak the behavior of "notmuch new"
before running it again. (I would not expect the results to be any
different from running it again with no change.)

-Carl

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 17:16   ` Carl Worth
@ 2009-11-21 19:22     ` Stefan Schmidt
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Schmidt @ 2009-11-21 19:22 UTC
  To: Carl Worth; +Cc: notmuch

Hello.

On Sat, 2009-11-21 at 18:16, Carl Worth wrote:
> On Sat, 21 Nov 2009 08:12:52 -0700, Bdale Garbee <bdale@gag.com> wrote:
> > I haven't figured out how to quickly tag everything as already read or
> > archived or whatever .. can someone who knows more about what's going on
> > confirm my hypothesis and if so, suggest the best approach to getting to
> > a happier state?
> 
> See my message up-thread. The only reasonable ways all really do involve
> at least a little bit of C-code hacking to either prevent those tags
> from getting put there by "notmuch new" or to make it easier to get them
> off afterwards.

Let's see if I can come up with something here.

> And I can't help but apologize. I've known about all these issues, and
> wouldn't have invited people to try things out in the current state. But
> it was nice of Keith to share this with everyone. And it's nice of all
> you to come take a look at things.

Getting it out now was a good move. It had enough code to actually do something
useful, and many people were waiting for something like this. The increasing
number of contributors in such a short time shows it very well. :)

regards
Stefan Schmidt

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 17:26     ` Carl Worth
@ 2009-11-21 19:23       ` Stefan Schmidt
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Schmidt @ 2009-11-21 19:23 UTC
  To: Carl Worth; +Cc: notmuch

Hello.

On Sat, 2009-11-21 at 18:26, Carl Worth wrote:
> On Sat, 21 Nov 2009 16:36:55 +0100, Stefan Schmidt <stefan@datenfreihafen.org> wrote:
> 
> > In my case only 80 messages were printed before the gap. All of them had a wrong
> > year in the timestamp: 1900 or 1970. Maybe notmuch just gets into a bad state
> > with these dates?
> 
> I don't think the bogus dates are throwing anything off. It's more
> likely that you just have a number of messages with no Date header on
> them at all. And for such messages, notmuch just chooses a time_t value
> of 0 so you'll see whatever that 0 maps to on your system---a date of
> 1970 there is not surprising. :-)

Yeah, I found that removing the offending messages and re-running it changed
nothing. Time to look at the source. :)

regards
Stefan Schmidt

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 17:07 ` Carl Worth
@ 2009-11-21 19:36   ` Stefan Schmidt
  2009-11-21 20:47     ` Carl Worth
  2009-11-21 22:36   ` Brett Viren
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Schmidt @ 2009-11-21 19:36 UTC
  To: Carl Worth; +Cc: notmuch

Hello.

On Sat, 2009-11-21 at 18:07, Carl Worth wrote:
> On Sat, 21 Nov 2009 15:51:11 +0100, Stefan Schmidt <stefan@datenfreihafen.org> wrote:
> > Disclaimer: I've been using vim, in combination with mutt for email, for years,
> > but have never dealt with emacs. Please keep this in mind and point out any
> > emacs user errors in this report. :)
> 
> Hi Stefan, welcome to Notmuch! And don't worry, we don't discriminate
> (too much) against non-emacs users around here.

:)

> > I first saw notmuch several weeks ago, when it still seemed to be a silent
> > project. I'm more than happy now that it is evolving quickly and a real
> > developer community is building around it.
> 
> Yes. Notmuch was a silent project since it was just something that I was
> doing for myself. I was always writing it as free software, and even had
> a public git repository available, but hadn't advertised it at all yet.

Yup, I had the repo on my disk a week before Keith blogged about it. It's just
nice that it took off that fast and that people started using it and
contributing to it.

> > But now to my problem. Getting my mail indexed was easy enough:
> > 
> > stefan@excalibur:~$ du -chs not-much-mail/
> > 1.5G    not-much-mail/
> > 1.5G    total
> > stefan@excalibur:~$ time notmuch new
> > Found 103677 total files.
> > Processed 103677 total files in 42m 30s (40 files/sec.).
> > Added 100899 new messages to the database (not much, really).
> 
> Good. I'm glad that went fairly smoothly for you.
> 
> Though, frankly, I think we need to fix "notmuch new" to do much better
> than 40 files/sec.

As a side note: that was on a notebook with a slow 5400 rpm disk, with crypt +
lvm + ext3 on top. Perhaps I should put some money aside for an X25 SSD. ;)

> > I put (require 'notmuch) in my ~/.emacs and start emacs with the -f notmuch
> > option to enter the notmuch mode.
> 
> I'm glad you've figured that much out. I feel bad that that's not even
> in the documentation anywhere yet.

I have to admit it took me some time. Would something like the patch below help?

> > What happens then is that a notmuch process gets started and emacs
> > waits for it to return.
> 
> OK. This is a known shortcoming. As Bdale supposes, this problem is from
> notmuch trying to load and construct every thread in your
> database. There are actually several different bugs/missing features
> here that should be addressed:
> 
>   * "notmuch new" should look at the R flag in maildir files to
>     determine that they are read and do not need to be marked as "inbox"
>     and "unread"

I think that's what I will try to get working here. It sounds like the closest
solution to my problem. That, in combination with the just-merged
tags-based-on-folders patch, should make me a lot happier. :)


From 8f95e039e98addd0f4be7c31e41e534f1b519a5d Mon Sep 17 00:00:00 2001
From: Stefan Schmidt <stefan@datenfreihafen.org>
Date: Sat, 21 Nov 2009 20:31:55 +0100
Subject: [PATCH] INSTALL: emacs install documentation.

Write down the steps needed to install and actually use notmuch in emacs. Should
help emacs newbies.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
---
 INSTALL |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/INSTALL b/INSTALL
index de268b6..64b8e36 100644
--- a/INSTALL
+++ b/INSTALL
@@ -14,6 +14,14 @@ Notmuch are satisfied. If they are not, the configure script will
 notice that and provide instructions on where to obtain the necessary
 dependencies.
 
+notmuch.el installation
+-----------------------
+Installing the notmuch.el emacs lisp function systemwide:
+
+	sudo make install-emacs
+
+Each user needs to add (require 'notmuch) in his ~/.emacs to activate it.
+
 Dependencies
 ------------
 Notmuch depends on three libraries: Xapian, GMime 2.4, and Talloc
-- 
1.6.5.3

regards
Stefan Schmidt

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 19:36   ` Stefan Schmidt
@ 2009-11-21 20:47     ` Carl Worth
  0 siblings, 0 replies; 17+ messages in thread
From: Carl Worth @ 2009-11-21 20:47 UTC
  To: Stefan Schmidt; +Cc: notmuch

On Sat, 21 Nov 2009 20:36:06 +0100, Stefan Schmidt <stefan@datenfreihafen.org> wrote:
> Yup, I had the repo on my disk a week before Keith blogged about it. It's just
> nice that it took off that fast and that people started using it and
> contributing to it.

Yes, it's quite fun.

> > Though, frankly, I think we need to fix "notmuch new" to do much better
> > than 40 files/sec.
> 
> As a side note: that was on a notebook with a slow 5400 rpm disk, with crypt +
> lvm + ext3 on top. Perhaps I should put some money aside for an X25 SSD. ;)

Sure. But I think we can still do a lot better even on your machine. :-)

> I have to admit it took me some time. Would something like the patch below help?

Thanks so much! I committed this, (and then added a bit more
documentation on top of it).

> I think that's what I will try to get working here. It sounds like the closest
> solution to my problem. That, in combination with the just-merged
> tags-based-on-folders patch, should make me a lot happier. :)

Well, do note that I just reverted that patch too. :-/

So you might want to cherry-pick it back (or even add the configuration
option that will let us push it back out again).

-Carl

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 17:07 ` Carl Worth
  2009-11-21 19:36   ` Stefan Schmidt
@ 2009-11-21 22:36   ` Brett Viren
  2009-11-21 22:40     ` Jed Brown
                       ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Brett Viren @ 2009-11-21 22:36 UTC
  To: Carl Worth; +Cc: notmuch

On Sat, Nov 21, 2009 at 12:07 PM, Carl Worth <cworth@cworth.org> wrote:

> Though, frankly, I think we need to fix "notmuch new" to do much better
> than 40 files/sec.

Just a "me too".

Processed 130871 total files in 38m 7s (57 files/sec.).
Added 102723 new messages to the database (not much, really).

This was ~2GB of mail on a 2.5GHz CPU.  That seems pretty reasonable
to me but I'd like to rerun the "notmuch new" under google perftools
to see if there are any obvious bottlenecks that might be cleaned up.
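The reported rates are simply files divided by elapsed seconds. Here is a sketch of that arithmetic, assuming integer truncation: Stefan's 103677 files in 42m 30s is 40.66 files/sec and was reported as 40.

```c
/* Throughput as reported, assuming truncation: files / elapsed seconds. */
static long files_per_sec(long files, long minutes, long seconds)
{
    long elapsed = minutes * 60 + seconds;

    return elapsed > 0 ? files / elapsed : 0;   /* guard against 0 s */
}
```

By the same formula, a 90k-message import at 130 files/sec takes about 90000 / 130 ≈ 692 seconds, roughly 11.5 minutes.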

How can I purge the index?  I can't locate it.

-Brett.

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 22:36   ` Brett Viren
@ 2009-11-21 22:40     ` Jed Brown
  2009-11-21 22:44       ` Brett Viren
  2009-11-22  3:28     ` Carl Worth
  2009-11-22  8:36     ` Mike Hommey
  2 siblings, 1 reply; 17+ messages in thread
From: Jed Brown @ 2009-11-21 22:40 UTC
  To: Brett Viren, Carl Worth; +Cc: notmuch

On Sat, 21 Nov 2009 17:36:18 -0500, Brett Viren <brett.viren@gmail.com> wrote:
> How can I purge the index?  I can't locate it.

I believe you can just remove /path/to/maildir/.notmuch

Jed

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 22:40     ` Jed Brown
@ 2009-11-21 22:44       ` Brett Viren
  0 siblings, 0 replies; 17+ messages in thread
From: Brett Viren @ 2009-11-21 22:44 UTC
  To: Jed Brown; +Cc: notmuch

On Sat, Nov 21, 2009 at 5:40 PM, Jed Brown <jed@59a2.org> wrote:
> On Sat, 21 Nov 2009 17:36:18 -0500, Brett Viren <brett.viren@gmail.com> wrote:
>> How can I purge the index?  I can't locate it.
>
> I believe you can just remove /path/to/maildir/.notmuch

Doh!  Thanks.  I have a dovecot controlled Maildir so this
dot-directory got lost in the forest of all the others.

-Brett.

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 22:36   ` Brett Viren
  2009-11-21 22:40     ` Jed Brown
@ 2009-11-22  3:28     ` Carl Worth
  2009-11-22  8:36     ` Mike Hommey
  2 siblings, 0 replies; 17+ messages in thread
From: Carl Worth @ 2009-11-22  3:28 UTC
  To: Brett Viren; +Cc: notmuch

On Sat, 21 Nov 2009 17:36:18 -0500, Brett Viren <brett.viren@gmail.com> wrote:
> Processed 130871 total files in 38m 7s (57 files/sec.).
> Added 102723 new messages to the database (not much, really).

Just be glad that you have so little mail. ;-)

> This was ~2GB of mail on a 2.5GHz CPU.  That seems pretty reasonable
> to me but I'd like to rerun the "notmuch new" under google perftools
> to see if there are any obvious bottlenecks that might be cleaned up.

To me, here are the obvious things to fix after looking at a profile:

  1. We're spending a *lot* of time searching in the Xapian database.

But our initial indexing operation should only be *writing* data into
the database, so what's this searching about?

Well, at each new message, we're looking up the ID from its In-Reply-To
header to find a thread-ID to link to, and then we're looking up all of
the IDs from its References header to find thread IDs that need to be
merged with ours. So both parent and child lookups.
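Those lookups start from the header text itself. Here is a minimal sketch of extracting the IDs, assuming each ID is enclosed in angle brackets. Real headers can also contain comments and folded whitespace, which this sketch does not interpret. The callback type and function name are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef void (*id_callback)(const char *id, size_t len, void *ctx);

/* Call cb(id, len, ctx) for each "<...>" ID in header, brackets excluded.
 * cb may be NULL to only count. Returns the number of IDs found. */
static size_t for_each_reference(const char *header, id_callback cb, void *ctx)
{
    size_t count = 0;
    const char *p = header;

    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p + 1, '>');

        if (end == NULL)
            break;                      /* unterminated ID: stop */
        if (end > p + 1) {              /* skip empty "<>" */
            if (cb != NULL)
                cb(p + 1, (size_t) (end - p - 1), ctx);
            count++;
        }
        p = end + 1;
    }
    return count;
}
```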

And since those are taking a bunch of time, I think it might make sense
to just keep a hashtable mapping message-ID -> thread-ID and do lookups
in that, (should have plenty of memory on current machines even with
lots of mail).

  2. We're hitting the slow Xapian document updates for thread-ID
  merging.

Whenever we find a child that was already in the database with one
thread ID that should have ours, we simply want to set its thread ID to
ours. But as we've talked about recently, Xapian has a bug (defect 250)
that makes it much more expensive than it should be to update a single
term.

So, we could do a first pass over the messages to find all their thread
IDs and get them to settle down before doing any indexing in a separate,
second pass.
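One way to realize that first pass is a union-find (disjoint-set) structure over message numbers. Every In-Reply-To or References link between two known messages is a union, and once all links have been seen, each message's root identifies its final thread. This is a sketch of the technique, not notmuch's implementation; mapping message-IDs to indices is assumed to happen elsewhere.

```c
#include <stddef.h>

/* Every message starts out as its own thread. */
static void uf_init(size_t *parent, size_t n)
{
    for (size_t i = 0; i < n; i++)
        parent[i] = i;
}

/* Root of x's set; path halving keeps later lookups short. */
static size_t uf_find(size_t *parent, size_t x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Record that messages a and b belong to the same thread. */
static void uf_union(size_t *parent, size_t a, size_t b)
{
    size_t ra = uf_find(parent, a);
    size_t rb = uf_find(parent, b);

    if (ra != rb)
        parent[rb] = ra;
}
```

In the second, indexing pass, `uf_find` gives each message a thread that never changes, so no stored thread-ID has to be rewritten.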

Step (2) should help even if we don't do step (1), but clearly we can do
both.

It would be great if anyone wants to take a look at either or both of
these, otherwise I will when I can.

-Carl

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-21 22:36   ` Brett Viren
  2009-11-21 22:40     ` Jed Brown
  2009-11-22  3:28     ` Carl Worth
@ 2009-11-22  8:36     ` Mike Hommey
  2009-11-22 15:15       ` Brett Viren
  2 siblings, 1 reply; 17+ messages in thread
From: Mike Hommey @ 2009-11-22  8:36 UTC
  To: Brett Viren; +Cc: notmuch

On Sat, Nov 21, 2009 at 05:36:18PM -0500, Brett Viren wrote:
> On Sat, Nov 21, 2009 at 12:07 PM, Carl Worth <cworth@cworth.org> wrote:
> 
> > Though, frankly, I think we need to fix "notmuch new" to do much better
> > than 40 files/sec.
> 
> Just a "me too".
> 
> Processed 130871 total files in 38m 7s (57 files/sec.).
> Added 102723 new messages to the database (not much, really).
> 
> This was ~2GB of mail on a 2.5GHz CPU.  That seems pretty reasonable
> to me but I'd like to rerun the "notmuch new" under google perftools
> to see if there are any obvious bottlenecks that might be cleaned up.

FWIW, my 90k+ messages mailbox was imported at a pace of 130 files/sec,
and my CPU is "only" 2.2GHz, but I have a SSD. A good share of the
bottlenecks is "simply" I/O. Don't forget having a lot of small files
sucks I/O wise, as files are most likely spread all over the disk.

A good test, if you have enough memory, would be to put your mailbox in
a tmpfs, and see how fast that imports.

Mike

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-22  8:36     ` Mike Hommey
@ 2009-11-22 15:15       ` Brett Viren
  2009-11-23  3:28         ` Carl Worth
  0 siblings, 1 reply; 17+ messages in thread
From: Brett Viren @ 2009-11-22 15:15 UTC
  To: Mike Hommey; +Cc: notmuch

On Sun, Nov 22, 2009 at 3:36 AM, Mike Hommey <mh+notmuch@glandium.org> wrote:

> A good test, if you have enough memory, would be to put your mailbox in
> a tmpfs, and see how fast that imports.

(Oops, forgot to reply to the list.)

I don't see any function calls related to I/O on the call graph.

But, here is one that looks I/O bound:

 notmuch tag -unread tag:inbox

I have my home directory on an encfs volume and I see it and notmuch
competing for CPU when viewing "top".

-Brett.

* Re: 25 minutes load time with emacs -f notmuch
  2009-11-22 15:15       ` Brett Viren
@ 2009-11-23  3:28         ` Carl Worth
  0 siblings, 0 replies; 17+ messages in thread
From: Carl Worth @ 2009-11-23  3:28 UTC
  To: Brett Viren, Mike Hommey; +Cc: notmuch

On Sun, 22 Nov 2009 10:15:39 -0500, Brett Viren <brett.viren@gmail.com> wrote:
> On Sun, Nov 22, 2009 at 3:36 AM, Mike Hommey <mh+notmuch@glandium.org> wrote:
> But, here is one that looks I/O bound:
> 
>  notmuch tag -unread tag:inbox
> 
> I have my home directory on an encfs volume and I see it and notmuch
> competing for CPU when viewing "top".

Yes. The "notmuch tag" command currently does much more IO than it
really should.

This is Xapian bug 250. Please see:

	id:874oon4pgv.fsf@yoom.home.cworth.org 

for some details and thoughts on the bug from me and some pointers on
how one could go about fixing it.

-Carl

end of thread

Thread overview: 17+ messages
2009-11-21 14:51 25 minutes load time with emacs -f notmuch Stefan Schmidt
2009-11-21 15:12 ` Bdale Garbee
2009-11-21 15:36   ` Stefan Schmidt
2009-11-21 17:26     ` Carl Worth
2009-11-21 19:23       ` Stefan Schmidt
2009-11-21 17:16   ` Carl Worth
2009-11-21 19:22     ` Stefan Schmidt
2009-11-21 17:07 ` Carl Worth
2009-11-21 19:36   ` Stefan Schmidt
2009-11-21 20:47     ` Carl Worth
2009-11-21 22:36   ` Brett Viren
2009-11-21 22:40     ` Jed Brown
2009-11-21 22:44       ` Brett Viren
2009-11-22  3:28     ` Carl Worth
2009-11-22  8:36     ` Mike Hommey
2009-11-22 15:15       ` Brett Viren
2009-11-23  3:28         ` Carl Worth

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
