unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
From: Carl Worth <cworth@cworth.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Subject: Re: [PATCH 3/4] Optimize thread search using matched docid sets.
Date: Tue, 07 Dec 2010 17:20:22 -0800	[thread overview]
Message-ID: <8739q95acp.fsf@yoom.home.cworth.org> (raw)
In-Reply-To: <20101118073828.GD2439@mit.edu>

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

On Thu, 18 Nov 2010 02:38:29 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> Currently this code uses a bitmap indexed by docid as a simple, fast
> set structure.  This is quite memory-efficient if the docid space is
> dense, even if the largest docid is quite large.  Is there a danger
> that the docid space will be large and sparse?  Is it worth replacing
> this with a smarter bit set structure?

When simply adding messages, the docid space is optimally dense, (see
_notmuch_database_generate_doc_id which generates sequential docid
values).

As messages are removed, the space will become less dense as there is
currently never any reuse of docid values. We could imagine adding some
sort of packing operation to get back to dense packing, (but forcing
that kind of thing on the user isn't so kind, of course).

Meanwhile, Xapian can very efficiently tell us how packed the space is,
(since we can query document count and the last docid allocated). So we
definitely have the ability to tune things automatically if needed.

We're currently using GHashTable in several places, but I've thought of
incorporating a nice, little hash-table implementation that Eric Anholt
wrote some time ago. If we did that, we could intelligently choose
whichever data structure is more memory-efficient depending on how
packed the docid space is.

Personally, though, I'm not that much of a micro-optimizer. As can be
seen in the current thread, I generally leave the optimization to
others. Thanks again, Austin!

-Carl

-- 
carl.d.worth@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

  reply	other threads:[~2010-12-08  1:20 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-17 19:28 [PATCH 3/4] Optimize thread search using matched docid sets Austin Clements
2010-11-18  7:38 ` Austin Clements
2010-12-08  1:20   ` Carl Worth [this message]
2010-12-08  1:19 ` Carl Worth
2010-12-08 21:58   ` Austin Clements
2010-12-08 22:01     ` [PATCH] Various small clean-ups to doc ID set code Austin Clements
2011-01-28 21:36       ` Carl Worth
2011-01-31  4:22         ` [PATCH v2 0/2] Small clean-ups to the " Austin Clements
2011-03-09 23:21           ` Carl Worth
2011-03-10  1:31             ` Austin Clements
2011-01-31  4:22         ` [PATCH 1/2] Remove code repetition in the doc ID bitmap code Austin Clements
2011-01-31  4:22         ` [PATCH 2/2] Simplify _notmuch_doc_id_set_init interface Austin Clements
2011-01-28 21:26     ` [PATCH 3/4] Optimize thread search using matched docid sets Carl Worth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://notmuchmail.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8739q95acp.fsf@yoom.home.cworth.org \
    --to=cworth@cworth.org \
    --cc=amdragon@MIT.EDU \
    --cc=notmuch@notmuchmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).