unofficial mirror of notmuch@notmuchmail.org
From: Mark Anderson <MarkR.Anderson@amd.com>
To: Olly Betts <olly@survex.com>, notmuch@notmuchmail.org
Subject: Re: Rather simple optimization for notmuch tag
Date: Wed, 23 Dec 2009 11:18:45 -0700
Message-ID: <3wd637xo8oq.fsf@testarossa.amd.com>
In-Reply-To: <loom.20091223T043223-941@post.gmane.org>

On Wed, 23 Dec 2009 03:45:14 +0000, Olly Betts <olly@survex.com> wrote:
> Carl Worth writes:
> > On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson wrote:
> > > I was updating my poll script that tags messages, and a common idiom is
> > > to put
> > >  tag +mytag <search_terms> and not tag:mytag
> > > 
> > > I don't know anything about efficiency, but for the simple single-tag
> > > case, couldn't we imply the "and not tag:mytag" from the +mytag action
> > > list for the tag command?
> > 
> > On one level, it really shouldn't be a performance issue to tag messages
> > that already have a particular tag. (And in fact, the recently proposed
> > patches to fix Xapian defect 250 even address this I think.)
> 
> Applying a filter up-front like this is likely to still help I think as it
> avoids Xapian having to reverse-engineer this information internally.

That's good to hear.

> Actually, you could do this with multiple tags - you just need to build
> a filter for documents which might be affected.
> 
> So if you're adding tags a1 and a2, you want: <query> AND_NOT (a1 AND a2)
> since documents which already have tags a1 and a2 can be ignored.
> 
> If you're removing d1 and d2, then the filter is: <query> AND (d1 OR d2)
> since documents have to be tagged d1 or d2 in order for the removal to
> do anything.
> 
> Handling a combination of removals and additions is trickier, but probably
> possible, although the more tags you are dealing with, the less profitable
> the filtering is likely to be (as the filter is likely to cull fewer
> documents yet be more expensive to evaluate).
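
The two single-direction filters above can be sketched as query-string builders. This is a minimal illustration, not notmuch code. It assumes the tag: prefix and the lowercase and/or/not operators used in the original "and not tag:mytag" idiom, and it assumes tag names need no quoting. The function names are hypothetical.

```python
def addition_filter(query, added):
    """Adding tags: <query> AND_NOT (a1 AND a2 ...).

    Messages that already carry every tag being added are skipped.
    An empty tag list leaves the query unfiltered.
    """
    if not added:
        return query
    all_present = " and ".join(f"tag:{t}" for t in added)
    return f"({query}) and not ({all_present})"


def removal_filter(query, removed):
    """Removing tags: <query> AND (d1 OR d2 ...).

    Only messages carrying at least one tag being removed are kept.
    An empty tag list leaves the query unfiltered.
    """
    if not removed:
        return query
    any_present = " or ".join(f"tag:{t}" for t in removed)
    return f"({query}) and ({any_present})"


print(addition_filter("from:alice", ["a1", "a2"]))
print(removal_filter("from:alice", ["d1", "d2"]))
```

These print "(from:alice) and not (tag:a1 and tag:a2)" and "(from:alice) and (tag:d1 or tag:d2)". The original query is wrapped in parentheses so that any top-level "or" inside it cannot escape the filter.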

But the transform is pretty simple: I think any combination of
additions and removals could be transformed according to the following
formula.

notmuch tag +a1 +a2 +a3 -d1 -d2 -d3 <search-terms>

would transform to something like:

<search-terms> and (not(a1) or not(a2) or not(a3) or d1 or d2 or d3)
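
The combined transform above can be checked by brute force. The sketch below builds the query string and then enumerates every possible tag set over the tags involved. For each set, it confirms that the filter matches exactly the messages whose tags the command would actually change. It assumes the added and removed tag sets are disjoint, because the thread does not say what should happen if a tag appears in both. It also does not verify that notmuch's query parser accepts a leading "not" inside an "or" group; the string simply mirrors the formula. The helper names are illustrative only.

```python
from itertools import chain, combinations


def combined_filter(query, added, removed):
    """<query> and (not a1 or ... or d1 or ...), as a notmuch-style string."""
    terms = [f"not tag:{t}" for t in added] + [f"tag:{t}" for t in removed]
    if not terms:
        return query
    return f"({query}) and ({' or '.join(terms)})"


def filter_matches(tags, added, removed):
    """Evaluate the same filter directly on one message's tag set."""
    return any(t not in tags for t in added) or any(t in tags for t in removed)


def tag_op_changes(tags, added, removed):
    """Would applying +added -removed change this message's tags?"""
    new_tags = (set(tags) | set(added)) - set(removed)
    return new_tags != set(tags)


def powerset(items):
    """All subsets of items, from the empty set up to the full set."""
    return chain.from_iterable(
        combinations(items, n) for n in range(len(items) + 1))


added, removed = ["a1", "a2", "a3"], ["d1", "d2", "d3"]

# Tags outside added + removed are untouched, so these 64 sets cover every case.
for subset in powerset(added + removed):
    tags = set(subset)
    assert filter_matches(tags, added, removed) == tag_op_changes(tags, added, removed)

print(combined_filter("<search-terms>", added, removed))
```

The final line prints "(<search-terms>) and (not tag:a1 or not tag:a2 or not tag:a3 or tag:d1 or tag:d2 or tag:d3)". Enumerating only the tags involved in the command is enough, because every other tag a message carries has no effect on whether the operation changes it.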

There may well be more efficient ways to do it, depending on the
specific corpus in the database: for example, whether tags a1, a2 and
a3 are usually added together or individually. If I know that a3
implies a1 and a2, the first three terms could be combined into
not(a1 and a2 and a3), or I could just exclude a3-tagged messages for
nearly the same effect, with the expected performance improvement.
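
The simplification described above can be checked the same way. This sketch assumes a hypothetical rule that a3 never appears without a1 and a2. It enumerates only the tag combinations consistent with that rule and confirms that the expanded terms, the combined form not(a1 and a2 and a3), and plain not(a3) all select the same messages. The combined form follows from De Morgan's law and holds for any tags. The reduction to not(a3) depends on the rule, which is presumably why excluding a3 is only "nearly" equivalent when the rule is usual rather than guaranteed.

```python
from itertools import product


def consistent(a1, a2, a3):
    """Hypothetical rule for illustration: a3 implies a1 and a2."""
    return (not a3) or (a1 and a2)


# Every (a1, a2, a3) presence pattern allowed by the rule.
states = [s for s in product([False, True], repeat=3) if consistent(*s)]

for a1, a2, a3 in states:
    expanded = (not a1) or (not a2) or (not a3)
    combined = not (a1 and a2 and a3)  # De Morgan: valid with or without the rule
    only_a3 = not a3                   # valid here only because of the rule
    assert expanded == combined == only_a3

# If the rule is broken, not(a3) is too narrow: a message tagged a2 and a3
# but missing a1 would be skipped, even though "+a1" would change it.
a1, a2, a3 = False, True, True
assert ((not a1) or (not a2) or (not a3)) and a3
```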

Unfortunately, this requires knowing more about how the tags are used
than I ever want notmuch to deal with. That is perhaps a job for a
follow-on or parallel project with less emphasis on minimalism.


This looks pretty good to me: easy to implement and not likely to break
things. I've been wondering whether a corpus of test mail should be
added to the notmuch git repository, so that we can start testing these
kinds of features on a consistent body of mail.

I doubt that I'll be the one to write this, since I don't have any time
set aside for real coding, but if it takes long enough, I'll probably
pick it up eventually.

-Mark

Thread overview: 5+ messages
2009-12-18  7:49 Rather simple optimization for notmuch tag Mark Anderson
2009-12-18 17:39 ` Carl Worth
2009-12-23  3:45   ` Olly Betts
2009-12-23 18:18     ` Mark Anderson [this message]
2009-12-24  0:27       ` Olly Betts