From: Austin Clements <amdragon@MIT.EDU>
To: Jani Nikula <jani@nikula.org>
Cc: notmuch@notmuchmail.org
Subject: Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
Date: Wed, 9 Nov 2011 08:40:13 -0500
Message-ID: <20111109134013.GK2658@mit.edu>
In-Reply-To: <87ty6d1y5x.fsf@nikula.org>
Quoth Jani Nikula on Nov 09 at 8:46 am:
>
> FWIW, I reviewed this and didn't find any obvious problems. A few
> nitpicks below, though.
>
> BR,
> Jani.
>
> On Mon, 7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> > This optimizes the user's tagging query to exclude messages that won't
> > be affected by the tagging operation, saving computation and IO for
> > redundant tagging operations.
> >
> > For example,
> > notmuch tag +notmuch to:notmuch@notmuchmail.org
> > will now use the query
> > ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
> >
> > In the past, we've often suggested that people do this exact
> > transformation by hand for slow tagging operations. This makes that
> > unnecessary.
> > ---
> > I was about to implement this optimization in my initial tagging
> > script, but then I figured, why not just do it in notmuch so we can
> > stop telling people to do this by hand?
> >
> > NEWS | 9 ++++++
> > notmuch-tag.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 85 insertions(+), 0 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index e00452a..9ca5e0c 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -16,6 +16,15 @@ Add search terms to "notmuch dump"
> > search/show/tag. The output file argument of dump is deprecated in
> > favour of using stdout.
> >
> > +Optimizations
> > +-------------
> > +
> > +Automatic tag query optimization
> > +
> > + "notmuch tag" now automatically optimizes the user's query to
> > + exclude messages whose tags won't change. In the past, we've
> > + suggested that people do this by hand; this is no longer necessary.
> > +
> > Notmuch 0.9 (2011-10-01)
> > ========================
> >
> > diff --git a/notmuch-tag.c b/notmuch-tag.c
> > index dded39e..62c4bf1 100644
> > --- a/notmuch-tag.c
> > +++ b/notmuch-tag.c
> > @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
> > interrupted = 1;
> > }
> >
> > +static char *
> > +_escape_tag (char *buf, const char *tag)
> > +{
> > + const char *in = tag;
> > + char *out = buf;
> > + /* Boolean terms surrounded by double quotes can contain any
> > + * character. Double quotes are quoted by doubling them. */
> > + *(out++) = '"';
> > + while (*in) {
> > + if (*in == '"')
> > + *(out++) = '"';
> > + *(out++) = *(in++);
> > + }
> > + *(out++) = '"';
>
> The parentheses are unnecessary for *p++.
Removed. I put these in out of paranoia, but I suppose it wouldn't be
an lvalue if it parsed differently.
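
For reference, a minimal sketch of the unparenthesized idiom. The helper name copy_string is hypothetical and not part of the patch. It relies only on the C rule that postfix ++ binds tighter than unary *, so *out++ and *(out++) are the same expression:

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): copy src into dst using
 * the unparenthesized idiom. Postfix ++ binds tighter than unary *,
 * so *out++ parses as *(out++): store through the old pointer value,
 * then advance the pointer. Returns the number of bytes copied, not
 * counting the terminating NUL. */
static size_t
copy_string (char *dst, const char *src)
{
    char *out = dst;
    const char *in = src;

    while (*in)
	*out++ = *in++;
    *out = '\0';
    return (size_t) (out - dst);
}
```

The other conceivable parse, (*out)++ = ..., would try to assign to the result of an increment, which is not an lvalue in C, so it could not compile.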
> > + *out = 0;
> > + return buf;
> > +}
> > +
> > +static char *
> > +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> > + int *add_tags, int add_tags_count,
> > + int *remove_tags, int remove_tags_count)
> > +{
> > + /* This is subtler than it looks. Xapian ignores the '-' operator
> > + * at the beginning of both queries and parenthesized groups and,
> > + * furthermore, the presence of a '-' operator at the beginning of
> > + * a group can inhibit parsing of the previous operator. Hence,
> > + * the user-provided query MUST appear first, but it is safe to
> > + * parenthesize and the exclusion part of the query must not use
> > + * the '-' operator (though the NOT operator is fine). */
> > +
> > + char *escaped, *query_string;
> > + const char *join = "";
> > + int i;
> > + unsigned int max_tag_len = 0;
> > +
> > + /* Allocate a buffer for escaping tags. */
> > + for (i = 0; i < add_tags_count; i++)
> > + if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> > + max_tag_len = strlen (argv[add_tags[i]] + 1);
> > + for (i = 0; i < remove_tags_count; i++)
> > + if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> > + max_tag_len = strlen (argv[remove_tags[i]] + 1);
> > + escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
>
> Perhaps a comment here or above _escape_tag() explaining the worst case
> memory consumption of strlen(tag) * 2 + 3 for a tag of "s would be in
> order.
Definitely. Done.
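
To spell out the arithmetic, here is a standalone sketch of the same escaping with the buffer sized per tag. It uses malloc instead of talloc so that it compiles on its own, and the name escape_tag_alloc is an assumption made for this illustration, not something in the patch:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return a malloc'd copy of
 * tag wrapped in double quotes, with each embedded double quote
 * doubled. Returns NULL if the allocation fails. */
static char *
escape_tag_alloc (const char *tag)
{
    /* Worst case: every character is a '"' and gets doubled
     * (2 * len), plus the opening quote, the closing quote, and the
     * terminating NUL (+ 3). */
    size_t len = strlen (tag);
    char *buf = malloc (len * 2 + 3);
    char *out = buf;

    if (buf == NULL)
	return NULL;

    *out++ = '"';
    for (; *tag; tag++) {
	if (*tag == '"')
	    *out++ = '"';
	*out++ = *tag;
    }
    *out++ = '"';
    *out = '\0';
    return buf;
}
```

A tag made of two double quotes escapes to six double quotes, which with the NUL fills exactly 2 * 2 + 3 = 7 bytes.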
> It's unrelated, but looking at the above also made me check something
> I've suspected before: notmuch allows you to have empty or zero length
> tags "", which is probably not intentional.
>
> There's no check for talloc failures here or below. But then there are
> few checks for that in the cli in general. *shrug*.
It's unfortunate that error handling obscures C code so much. But
there's no sense in not handling errors, so I fixed this.
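
As a rough illustration of what checking every allocation costs in readability, here is a sketch of the query construction for a single added tag. It is not the v2 code: append_fmt is a hypothetical realloc-based stand-in for talloc_asprintf_append_buffer, and build_query is likewise an assumed name.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for talloc_asprintf_append_buffer: append
 * formatted text to s (which may be NULL) and return the grown
 * string. On failure, free s and return NULL, so a caller needs only
 * one check per call and never leaks the partial string. */
static char *
append_fmt (char *s, const char *fmt, ...)
{
    va_list ap;
    size_t old_len = s ? strlen (s) : 0;
    int add_len;
    char *grown;

    /* First pass: measure how many bytes the formatted text needs. */
    va_start (ap, fmt);
    add_len = vsnprintf (NULL, 0, fmt, ap);
    va_end (ap);
    if (add_len < 0) {
	free (s);
	return NULL;
    }

    grown = realloc (s, old_len + (size_t) add_len + 1);
    if (grown == NULL) {
	free (s);
	return NULL;
    }

    /* Second pass: format directly after the existing text. */
    va_start (ap, fmt);
    vsnprintf (grown + old_len, (size_t) add_len + 1, fmt, ap);
    va_end (ap);
    return grown;
}

/* Build "( QUERY ) and (not tag:TAG)" for one already-escaped tag.
 * Returns NULL if any allocation fails. */
static char *
build_query (const char *orig_query, const char *escaped_tag)
{
    char *q = append_fmt (NULL, "( %s ) and (", orig_query);

    if (q == NULL)
	return NULL;
    q = append_fmt (q, "not tag:%s", escaped_tag);
    if (q == NULL)
	return NULL;
    return append_fmt (q, ")");
}
```

Having the append helper release the partial string on failure keeps each call site down to a single NULL check, which limits how much the error handling obscures the logic.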
> > +
> > + /* Build the new query string */
> > + if (strcmp (orig_query_string, "*") == 0)
> > + query_string = talloc_strdup (ctx, "(");
> > + else
> > + query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> > +
> > + for (i = 0; i < add_tags_count; i++) {
> > + query_string = talloc_asprintf_append_buffer (
> > + query_string, "%snot tag:%s", join,
> > + _escape_tag (escaped, argv[add_tags[i]] + 1));
> > + join = " or ";
> > + }
> > + for (i = 0; i < remove_tags_count; i++) {
> > + query_string = talloc_asprintf_append_buffer (
> > + query_string, "%stag:%s", join,
> > + _escape_tag (escaped, argv[remove_tags[i]] + 1));
> > + join = " or ";
> > + }
> > +
> > + query_string = talloc_strdup_append_buffer (query_string, ")");
> > +
> > + talloc_free (escaped);
> > + return query_string;
> > +}
> > +
> > int
> > notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> > {
> > @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> > return 1;
> > }
> >
> > + /* Optimize the query so it excludes messages that already have
> > + * the specified set of tags. */
> > + query_string = _optimize_tag_query (ctx, query_string, argv,
> > + add_tags, add_tags_count,
> > + remove_tags, remove_tags_count);
> > +
> > config = notmuch_config_open (ctx, NULL, NULL);
> > if (config == NULL)
> > return 1;
Thread overview: 11+ messages
2011-11-08 3:55 [PATCH] tag: Automatically limit to messages whose tags will actually change Austin Clements
2011-11-08 4:34 ` Dmitry Kurochkin
2011-11-08 16:10 ` Austin Clements
2011-11-08 10:41 ` Sebastian Spaeth
2011-11-09 8:46 ` Jani Nikula
2011-11-09 13:40 ` Austin Clements [this message]
2011-11-10 13:28 ` Sebastian Spaeth
2012-11-06 1:56 ` David Bremner
2011-11-09 13:44 ` [PATCH v2] " Austin Clements
2011-11-16 17:41 ` Tomi Ollila
2011-11-28 15:46 ` David Bremner