unofficial mirror of notmuch@notmuchmail.org
From: Austin Clements <amdragon@mit.edu>
To: Istvan Marko <notmuch@kismala.com>
Cc: notmuch@notmuchmail.org
Subject: Re: storing From and Subject in xapian
Date: Sat, 14 May 2011 21:37:25 -0400
Message-ID: <BANLkTinVzQL2qRDbt4WhcPdL1D7D3N=aQg@mail.gmail.com>
In-Reply-To: <m3sjsv2kw2.fsf@zsu.kismala.com>

I wonder if a better approach would be to use
notmuch_message_get_header everywhere, rather than introducing
_notmuch_message_get_header_value, and have it simply recognize
headers that can be retrieved directly from the database.  Then
library callers could take advantage of this optimization and it could
be trivially extended to other headers in the future.
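
To make the idea concrete, here is a rough sketch of the dispatch I have
in mind. It is not notmuch's code: the sketch_* names, the struct layout,
and the parse counter are all hypothetical stand-ins so the example
compiles on its own. A real version would read the stored values from
the database and parse the message file on fallback.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair standing in for a parsed file header. */
typedef struct {
    const char *name;
    const char *value;
} sketch_header_t;

/* Hypothetical stand-in for a message; NOT notmuch's notmuch_message_t. */
typedef struct {
    const char *stored_from;             /* kept in the database, or NULL */
    const char *stored_subject;          /* kept in the database, or NULL */
    const sketch_header_t *file_headers; /* terminated by a NULL name */
    int file_parse_count;                /* times we fell back to the file */
} sketch_message_t;

/* Header names compare case-insensitively. */
static int
header_name_equal (const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the database copy of a header, or NULL if none is stored. */
static const char *
stored_header (const sketch_message_t *message, const char *name)
{
    if (header_name_equal (name, "from"))
        return message->stored_from;
    if (header_name_equal (name, "subject"))
        return message->stored_subject;
    return NULL;
}

/* Slow path: pretend to parse the message file (counted for the demo). */
static const char *
file_header (sketch_message_t *message, const char *name)
{
    const sketch_header_t *h;

    message->file_parse_count++;
    for (h = message->file_headers; h && h->name; h++) {
        if (header_name_equal (h->name, name))
            return h->value;
    }
    return ""; /* this sketch returns "" when the header is absent */
}

/* Single entry point: try the database first, then the file. */
const char *
sketch_message_get_header (sketch_message_t *message, const char *name)
{
    const char *value;

    assert (message != NULL && name != NULL);
    value = stored_header (message, name);
    if (value != NULL)
        return value;
    return file_header (message, name);
}
```

With this shape, supporting another stored header means adding one more
case to stored_header; callers keep calling the same function and never
need to know which path served the value.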

On Tue, May 3, 2011 at 11:40 PM, Istvan Marko <notmuch@kismala.com> wrote:
> I have been looking at the I/O patterns of "notmuch search" with the
> default output format and noticed that it has to parse the maildir file
> of every matched message to get the From and Subject headers. I figured
> that this must be slowing things down, especially when the files are not
> in the filesystem cache.
>
> So I wanted to see how much difference it would make to have the From
> and Subject stored in xapian to avoid this parsing.
>
> With the attached patch I get a speedup of 2x with cached and almost 10x
> with uncached files for searches with many matches.
>
> The attached patch is only intended as proof of concept. I am not
> familiar with xapian so I wasn't sure if this kind of data should be
> stored as terms, values or data. I went with values simply because I saw
> that message-id and timestamp were already stored that way. Perhaps
> document data would be more appropriate since the fields are not used
> for searching or sorting. Oh, and for some reason I get a blank Subject
> for about 1% of the matches.
>
>
> Is there a downside to this approach? The only one I see is that the
> xapian db size increases by about 1% but to me the speed increase would
> be well worth it.
