* Rather simple optimization for notmuch tag @ 2009-12-18 7:49 Mark Anderson 2009-12-18 17:39 ` Carl Worth 0 siblings, 1 reply; 5+ messages in thread From: Mark Anderson @ 2009-12-18 7:49 UTC (permalink / raw To: cworth, notmuch I was updating my poll script that tags messages, and a common idiom is to put tag +mytag <search_terms> and not tag:mytag I don't know anything about efficiency, but for the simple single-tag case, couldn't we imply the "and not tag:mytag" from the +mytag action list for the tag command? The similar (dual?, rusty math terminology, beware of Math-tetanus) case of "tag -mytag <search-terms> and tag:mytag" could be similarly optimized, since the tag removal action ought to be a null action in the case that the search terms matched on a thread or message, but the tag to be removed isn't attached to the message/thread returned. Any thoughts on the subject? -Mark ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Rather simple optimization for notmuch tag 2009-12-18 7:49 Rather simple optimization for notmuch tag Mark Anderson @ 2009-12-18 17:39 ` Carl Worth 2009-12-23 3:45 ` Olly Betts 0 siblings, 1 reply; 5+ messages in thread From: Carl Worth @ 2009-12-18 17:39 UTC (permalink / raw To: Mark Anderson, notmuch [-- Attachment #1: Type: text/plain, Size: 1545 bytes --] On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote: > I was updating my poll script that tags messages, and a common idiom is > to put > tag +mytag <search_terms> and not tag:mytag > > I don't know anything about efficiency, but for the simple single-tag > case, couldn't we imply the "and not tag:mytag" from the +mytag action > list for the tag command? On one level, it really shouldn't be a performance issue to tag messages that already have a particular tag. (And in fact, the recently proposed patches to fix Xapian defect 250 even address this I think.) In the meantime, it is fairly annoying to have to type this, and yes, the tag command could infer that and append it to the search string automatically. That's a good idea, really. > The similar (dual?, rusty math terminology, beware of Math-tetanus) case > of "tag -mytag <search-terms> and tag:mytag" could be similarly optimized, > since the tag removal action ought to be a null action in the case that > the search terms matched on a thread or message, but the tag to be > removed isn't attached to the message/thread returned. Yes, that would work too. One potential snag with both ideas is that the "notmuch tag" command-line as currently implemented allows for multiple tag additions and removals with a single search. So the optimization here couldn't be used unless there was just a single tag action. So that's another reason to really just want the lower-level optimization to be in place. -Carl [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Rather simple optimization for notmuch tag 2009-12-18 17:39 ` Carl Worth @ 2009-12-23 3:45 ` Olly Betts 2009-12-23 18:18 ` Mark Anderson 0 siblings, 1 reply; 5+ messages in thread From: Olly Betts @ 2009-12-23 3:45 UTC (permalink / raw To: notmuch Carl Worth writes: > On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson wrote: > > I was updating my poll script that tags messages, and a common idiom is > > to put > > tag +mytag <search_terms> and not tag:mytag > > > > I don't know anything about efficiency, but for the simple single-tag > > case, couldn't we imply the "and not tag:mytag" from the +mytag action > > list for the tag command? > > On one level, it really shouldn't be a performance issue to tag messages > that already have a particular tag. (And in fact, the recently proposed > patches to fix Xapian defect 250 even address this I think.) Applying a filter up-front like this is likely to still help I think as it avoids Xapian having to reverse-engineer this information internally. > One potential snag with both ideas is that the "notmuch tag" > command-line as currently implemented allows for multiple tag additions > and removals with a single search. So the optimization here couldn't be > used unless there was just a single tag action. Actually, you could do this with multiple tags - you just need to build a filter for documents which might be affected. So if you're adding tags a1 and a2, you want: <query> AND_NOT (a1 AND a2) since documents which already have tags a1 and a2 can be ignored. If you're removing d1 and d2, then the filter is: <query> AND (d1 OR d2) since documents have to be tagged d1 or d2 in order for the removal to do anything. Handling a combination of removals and additions is trickier, but probably possible, although the more tags you are dealing with, the less profitable the filtering is likely to be (as the filter is likely to cull fewer documents yet be more expensive to evaluate). Cheers, Olly ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Rather simple optimization for notmuch tag 2009-12-23 3:45 ` Olly Betts @ 2009-12-23 18:18 ` Mark Anderson 2009-12-24 0:27 ` Olly Betts 0 siblings, 1 reply; 5+ messages in thread From: Mark Anderson @ 2009-12-23 18:18 UTC (permalink / raw To: Olly Betts, notmuch On Wed, 23 Dec 2009 03:45:14 +0000, Olly Betts <olly@survex.com> wrote: > Carl Worth writes: > > On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson wrote: > > > I was updating my poll script that tags messages, and a common idiom is > > > to put > > > tag +mytag <search_terms> and not tag:mytag > > > > > > I don't know anything about efficiency, but for the simple single-tag > > > case, couldn't we imply the "and not tag:mytag" from the +mytag action > > > list for the tag command? > > > > On one level, it really shouldn't be a performance issue to tag messages > > that already have a particular tag. (And in fact, the recently proposed > > patches to fix Xapian defect 250 even address this I think.) > > Applying a filter up-front like this is likely to still help I think as it > avoids Xapian having to reverse-engineer this information internally. That's good to hear. > Actually, you could do this with multiple tags - you just need to build > a filter for documents which might be affected. > > So if you're adding tags a1 and a2, you want: <query> AND_NOT (a1 AND a2) > since documents which already have tags a1 and a2 can be ignored. > > If you're removing d1 and d2, then the filter is: <query> AND (d1 OR d2) > since documents have to be tagged d1 or d2 in order for the removal to > do anything. > > Handling a combination of removals and additions is trickier, but probably > possible, although the more tags you are dealing with, the less profitable > the filtering is likely to be (as the filter is likely to cull fewer > documents yet be more expensive to evaluate). But the transform is pretty simple, I think that any combination of additions and removals could be transformed according to the following formula. notmuch tag +a1 +a2 +a3 -d1 -d2 -d3 <search-terms> would transform to something like: <search-terms> and ( not(a1) or not(a2) or not(a3) or d1 or d2 or d3) There are certainly may be much more optimal ways to do it depending on the specific corpus of the database, considering if the tags a1 and a2 and a3 are usually added as one tag, or if the addition is done individually, because if I know that a3 implies a1 and a2, the first 3 terms could be combined to not(a1 and a2 and a3), or I could just exclude a3 tagged messages for nearly the same effect, with expected performance improvements. Unfortunately this requires that I know more about how the tags are used than I ever want notmuch to deal with. Perhaps a follow-on or parallel project with less emphasis on minimalism. This looks pretty good to me. Easy to implement and not likely to break things. I've been wondering about whether there should be a repository of mail added to the notmuch git so that we can start testing these kinds of features on a consistent body of mail. I doubt that I'll be the one to write this, since I don't have any time set aside for real coding, but if it takes long enough, I'll probably pick this one up eventually. -Mark ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Rather simple optimization for notmuch tag 2009-12-23 18:18 ` Mark Anderson @ 2009-12-24 0:27 ` Olly Betts 0 siblings, 0 replies; 5+ messages in thread From: Olly Betts @ 2009-12-24 0:27 UTC (permalink / raw To: notmuch Mark Anderson writes: > On Wed, 23 Dec 2009 03:45:14 +0000, Olly Betts wrote: > > Handling a combination of removals and additions is trickier, but probably > > possible, although the more tags you are dealing with, the less profitable > > the filtering is likely to be (as the filter is likely to cull fewer > > documents yet be more expensive to evaluate). > > But the transform is pretty simple, I think that any combination of > additions and removals could be transformed according to the following > formula. > > notmuch tag +a1 +a2 +a3 -d1 -d2 -d3 <search-terms> > > would transform to something like: > > <search-terms> and ( not(a1) or not(a2) or not(a3) or d1 or d2 or d3) Note that Xapian doesn't really have a "not" operator (because of how it works - by storing the documents indexing each term - rather than because nobody's implemented it), so it isn't quite as simple as the above. There is a posting list for "all documents" (which is very efficient if the document ids form a contiguous range; if they don't, it's as efficient as a term which matches all those documents for the chert backend, but not so great for the default flint backend in 1.0.x), and you can combine this with the "AND_NOT" operator to give the equivalent of a "NOT" operator. So I think the example above is probably best expressed as: ( <search-terms> AND ( ( ALL AND_NOT (a1 AND a2 AND a3) ) OR d1 OR d2 OR d3 ) But my point wasn't that I doubted it could be handled, but that it becomes less worthwhile as the number of tags increases (and at some point will become slower). > There are certainly may be much more optimal ways to do it depending on > the specific corpus of the database, considering if the tags a1 and a2 > and a3 are usually added as one tag, or if the addition is done > individually, because if I know that a3 implies a1 and a2, the first 3 > terms could be combined to not(a1 and a2 and a3), or I could just > exclude a3 tagged messages for nearly the same effect, with expected > performance improvements. I think you always can combine them like that. The documents that don't need looking at are precisely those which already have all three tags (i.e. a1 AND a2 AND a3), so those that do are "NOT" that expression. Cheers, Olly ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-12-24 0:27 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-12-18 7:49 Rather simple optimization for notmuch tag Mark Anderson 2009-12-18 17:39 ` Carl Worth 2009-12-23 3:45 ` Olly Betts 2009-12-23 18:18 ` Mark Anderson 2009-12-24 0:27 ` Olly Betts
Code repositories for project(s) associated with this public inbox https://yhetil.org/notmuch.git/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).