unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH] tag: Automatically limit to messages whose tags will actually change.
@ 2011-11-08  3:55 Austin Clements
  2011-11-08  4:34 ` Dmitry Kurochkin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Austin Clements @ 2011-11-08  3:55 UTC (permalink / raw)
  To: notmuch

This optimizes the user's tagging query to exclude messages that won't
be affected by the tagging operation, saving computation and IO for
redundant tagging operations.

For example,
  notmuch tag +notmuch to:notmuch@notmuchmail.org
will now use the query
  ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")

In the past, we've often suggested that people do this exact
transformation by hand for slow tagging operations.  This makes that
unnecessary.
---
I was about to implement this optimization in my initial tagging
script, but then I figured, why not just do it in notmuch so we can
stop telling people to do this by hand?

 NEWS          |    9 ++++++
 notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index e00452a..9ca5e0c 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,15 @@ Add search terms to  "notmuch dump"
   search/show/tag. The output file argument of dump is deprecated in
   favour of using stdout.
 
+Optimizations
+-------------
+
+Automatic tag query optimization
+
+  "notmuch tag" now automatically optimizes the user's query to
+  exclude messages whose tags won't change.  In the past, we've
+  suggested that people do this by hand; this is no longer necessary.
+
 Notmuch 0.9 (2011-10-01)
 ========================
 
diff --git a/notmuch-tag.c b/notmuch-tag.c
index dded39e..62c4bf1 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
+static char *
+_escape_tag (char *buf, const char *tag)
+{
+    const char *in = tag;
+    char *out = buf;
+    /* Boolean terms surrounded by double quotes can contain any
+     * character.  Double quotes are quoted by doubling them. */
+    *(out++) = '"';
+    while (*in) {
+	if (*in == '"')
+	    *(out++) = '"';
+	*(out++) = *(in++);
+    }
+    *(out++) = '"';
+    *out = 0;
+    return buf;
+}
+
+static char *
+_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
+		     int *add_tags, int add_tags_count,
+		     int *remove_tags, int remove_tags_count)
+{
+    /* This is subtler than it looks.  Xapian ignores the '-' operator
+     * at the beginning both queries and parenthesized groups and,
+     * furthermore, the presence of a '-' operator at the beginning of
+     * a group can inhibit parsing of the previous operator.  Hence,
+     * the user-provided query MUST appear first, but it is safe to
+     * parenthesize and the exclusion part of the query must not use
+     * the '-' operator (though the NOT operator is fine). */
+
+    char *escaped, *query_string;
+    const char *join = "";
+    int i;
+    unsigned int max_tag_len = 0;
+
+    /* Allocate a buffer for escaping tags. */
+    for (i = 0; i < add_tags_count; i++)
+	if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
+	    max_tag_len = strlen (argv[add_tags[i]] + 1);
+    for (i = 0; i < remove_tags_count; i++)
+	if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
+	    max_tag_len = strlen (argv[remove_tags[i]] + 1);
+    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
+
+    /* Build the new query string */
+    if (strcmp (orig_query_string, "*") == 0)
+	query_string = talloc_strdup (ctx, "(");
+    else
+	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
+
+    for (i = 0; i < add_tags_count; i++) {
+	query_string = talloc_asprintf_append_buffer (
+	    query_string, "%snot tag:%s", join,
+	    _escape_tag (escaped, argv[add_tags[i]] + 1));
+	join = " or ";
+    }
+    for (i = 0; i < remove_tags_count; i++) {
+	query_string = talloc_asprintf_append_buffer (
+	    query_string, "%stag:%s", join,
+	    _escape_tag (escaped, argv[remove_tags[i]] + 1));
+	join = " or ";
+    }
+
+    query_string = talloc_strdup_append_buffer (query_string, ")");
+
+    talloc_free (escaped);
+    return query_string;
+}
+
 int
 notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 {
@@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	return 1;
     }
 
+    /* Optimize the query so it excludes messages that already have
+     * the specified set of tags. */
+    query_string = _optimize_tag_query (ctx, query_string, argv,
+					add_tags, add_tags_count,
+					remove_tags, remove_tags_count);
+
     config = notmuch_config_open (ctx, NULL, NULL);
     if (config == NULL)
 	return 1;
-- 
1.7.7.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
  2011-11-08  3:55 [PATCH] tag: Automatically limit to messages whose tags will actually change Austin Clements
@ 2011-11-08  4:34 ` Dmitry Kurochkin
  2011-11-08 16:10   ` Austin Clements
  2011-11-08 10:41 ` Sebastian Spaeth
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Dmitry Kurochkin @ 2011-11-08  4:34 UTC (permalink / raw)
  To: Austin Clements, notmuch

Hi Austin.

On Mon,  7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.
> 
> For example,
>   notmuch tag +notmuch to:notmuch@notmuchmail.org
> will now use the query
>   ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
> 
> In the past, we've often suggested that people do this exact
> transformation by hand for slow tagging operations.  This makes that
> unnecessary.

Thanks!  This is a very useful optimization.

Does it work for multiple tags and tag removal?  I.e.:

  notmuch tag -inbox -unread +sent from:dmitry.kurochkin@gmail.com

can be converted to:

  notmuch tag -inbox -unread +sent from:dmitry.kurochkin@gmail.com and (tag:inbox or tag:unread or (not tag:sent))

Regards,
  Dmitry

> ---
> I was about to implement this optimization in my initial tagging
> script, but then I figured, why not just do it in notmuch so we can
> stop telling people to do this by hand?
> 
>  NEWS          |    9 ++++++
>  notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 0 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index e00452a..9ca5e0c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,15 @@ Add search terms to  "notmuch dump"
>    search/show/tag. The output file argument of dump is deprecated in
>    favour of using stdout.
>  
> +Optimizations
> +-------------
> +
> +Automatic tag query optimization
> +
> +  "notmuch tag" now automatically optimizes the user's query to
> +  exclude messages whose tags won't change.  In the past, we've
> +  suggested that people do this by hand; this is no longer necessary.
> +
>  Notmuch 0.9 (2011-10-01)
>  ========================
>  
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index dded39e..62c4bf1 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
> @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
>      interrupted = 1;
>  }
>  
> +static char *
> +_escape_tag (char *buf, const char *tag)
> +{
> +    const char *in = tag;
> +    char *out = buf;
> +    /* Boolean terms surrounded by double quotes can contain any
> +     * character.  Double quotes are quoted by doubling them. */
> +    *(out++) = '"';
> +    while (*in) {
> +	if (*in == '"')
> +	    *(out++) = '"';
> +	*(out++) = *(in++);
> +    }
> +    *(out++) = '"';
> +    *out = 0;
> +    return buf;
> +}
> +
> +static char *
> +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> +		     int *add_tags, int add_tags_count,
> +		     int *remove_tags, int remove_tags_count)
> +{
> +    /* This is subtler than it looks.  Xapian ignores the '-' operator
> +     * at the beginning both queries and parenthesized groups and,
> +     * furthermore, the presence of a '-' operator at the beginning of
> +     * a group can inhibit parsing of the previous operator.  Hence,
> +     * the user-provided query MUST appear first, but it is safe to
> +     * parenthesize and the exclusion part of the query must not use
> +     * the '-' operator (though the NOT operator is fine). */
> +
> +    char *escaped, *query_string;
> +    const char *join = "";
> +    int i;
> +    unsigned int max_tag_len = 0;
> +
> +    /* Allocate a buffer for escaping tags. */
> +    for (i = 0; i < add_tags_count; i++)
> +	if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> +	    max_tag_len = strlen (argv[add_tags[i]] + 1);
> +    for (i = 0; i < remove_tags_count; i++)
> +	if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> +	    max_tag_len = strlen (argv[remove_tags[i]] + 1);
> +    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
> +
> +    /* Build the new query string */
> +    if (strcmp (orig_query_string, "*") == 0)
> +	query_string = talloc_strdup (ctx, "(");
> +    else
> +	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> +
> +    for (i = 0; i < add_tags_count; i++) {
> +	query_string = talloc_asprintf_append_buffer (
> +	    query_string, "%snot tag:%s", join,
> +	    _escape_tag (escaped, argv[add_tags[i]] + 1));
> +	join = " or ";
> +    }
> +    for (i = 0; i < remove_tags_count; i++) {
> +	query_string = talloc_asprintf_append_buffer (
> +	    query_string, "%stag:%s", join,
> +	    _escape_tag (escaped, argv[remove_tags[i]] + 1));
> +	join = " or ";
> +    }
> +
> +    query_string = talloc_strdup_append_buffer (query_string, ")");
> +
> +    talloc_free (escaped);
> +    return query_string;
> +}
> +
>  int
>  notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>  {
> @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>  	return 1;
>      }
>  
> +    /* Optimize the query so it excludes messages that already have
> +     * the specified set of tags. */
> +    query_string = _optimize_tag_query (ctx, query_string, argv,
> +					add_tags, add_tags_count,
> +					remove_tags, remove_tags_count);
> +
>      config = notmuch_config_open (ctx, NULL, NULL);
>      if (config == NULL)
>  	return 1;
> -- 
> 1.7.7.1
> 
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
  2011-11-08  3:55 [PATCH] tag: Automatically limit to messages whose tags will actually change Austin Clements
  2011-11-08  4:34 ` Dmitry Kurochkin
@ 2011-11-08 10:41 ` Sebastian Spaeth
  2011-11-09  8:46 ` Jani Nikula
  2011-11-09 13:44 ` [PATCH v2] " Austin Clements
  3 siblings, 0 replies; 11+ messages in thread
From: Sebastian Spaeth @ 2011-11-08 10:41 UTC (permalink / raw)
  To: Austin Clements, notmuch

[-- Attachment #1: Type: text/plain, Size: 267 bytes --]

On Mon,  7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.

+1 for this!

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
  2011-11-08  4:34 ` Dmitry Kurochkin
@ 2011-11-08 16:10   ` Austin Clements
  0 siblings, 0 replies; 11+ messages in thread
From: Austin Clements @ 2011-11-08 16:10 UTC (permalink / raw)
  To: Dmitry Kurochkin; +Cc: notmuch

Quoth Dmitry Kurochkin on Nov 08 at  8:34 am:
> Hi Austin.
> 
> On Mon,  7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> > This optimizes the user's tagging query to exclude messages that won't
> > be affected by the tagging operation, saving computation and IO for
> > redundant tagging operations.
> > 
> > For example,
> >   notmuch tag +notmuch to:notmuch@notmuchmail.org
> > will now use the query
> >   ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
> > 
> > In the past, we've often suggested that people do this exact
> > transformation by hand for slow tagging operations.  This makes that
> > unnecessary.
> 
> Thanks!  This is a very useful optimization.
> 
> Does it work for multiple tags and tag removal?  I.e.:
> 
>   notmuch tag -inbox -unread +sent from:dmitry.kurochkin@gmail.com
> 
> can be converted to:
> 
>   notmuch tag -inbox -unread +sent from:dmitry.kurochkin@gmail.com and (tag:inbox or tag:unread or (not tag:sent))

Yep.  This is pretty much exactly what it does.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
  2011-11-08  3:55 [PATCH] tag: Automatically limit to messages whose tags will actually change Austin Clements
  2011-11-08  4:34 ` Dmitry Kurochkin
  2011-11-08 10:41 ` Sebastian Spaeth
@ 2011-11-09  8:46 ` Jani Nikula
  2011-11-09 13:40   ` Austin Clements
  2011-11-10 13:28   ` Sebastian Spaeth
  2011-11-09 13:44 ` [PATCH v2] " Austin Clements
  3 siblings, 2 replies; 11+ messages in thread
From: Jani Nikula @ 2011-11-09  8:46 UTC (permalink / raw)
  To: Austin Clements, notmuch


FWIW, I reviewed this and didn't find any obvious problems. A few
nitpicks below, though.

BR,
Jani.

On Mon,  7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.
> 
> For example,
>   notmuch tag +notmuch to:notmuch@notmuchmail.org
> will now use the query
>   ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
> 
> In the past, we've often suggested that people do this exact
> transformation by hand for slow tagging operations.  This makes that
> unnecessary.
> ---
> I was about to implement this optimization in my initial tagging
> script, but then I figured, why not just do it in notmuch so we can
> stop telling people to do this by hand?
> 
>  NEWS          |    9 ++++++
>  notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 0 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index e00452a..9ca5e0c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,15 @@ Add search terms to  "notmuch dump"
>    search/show/tag. The output file argument of dump is deprecated in
>    favour of using stdout.
>  
> +Optimizations
> +-------------
> +
> +Automatic tag query optimization
> +
> +  "notmuch tag" now automatically optimizes the user's query to
> +  exclude messages whose tags won't change.  In the past, we've
> +  suggested that people do this by hand; this is no longer necessary.
> +
>  Notmuch 0.9 (2011-10-01)
>  ========================
>  
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index dded39e..62c4bf1 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
> @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
>      interrupted = 1;
>  }
>  
> +static char *
> +_escape_tag (char *buf, const char *tag)
> +{
> +    const char *in = tag;
> +    char *out = buf;
> +    /* Boolean terms surrounded by double quotes can contain any
> +     * character.  Double quotes are quoted by doubling them. */
> +    *(out++) = '"';
> +    while (*in) {
> +	if (*in == '"')
> +	    *(out++) = '"';
> +	*(out++) = *(in++);
> +    }
> +    *(out++) = '"';

The parenthesis are unnecessary for *p++.

> +    *out = 0;
> +    return buf;
> +}
> +
> +static char *
> +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> +		     int *add_tags, int add_tags_count,
> +		     int *remove_tags, int remove_tags_count)
> +{
> +    /* This is subtler than it looks.  Xapian ignores the '-' operator
> +     * at the beginning both queries and parenthesized groups and,
> +     * furthermore, the presence of a '-' operator at the beginning of
> +     * a group can inhibit parsing of the previous operator.  Hence,
> +     * the user-provided query MUST appear first, but it is safe to
> +     * parenthesize and the exclusion part of the query must not use
> +     * the '-' operator (though the NOT operator is fine). */
> +
> +    char *escaped, *query_string;
> +    const char *join = "";
> +    int i;
> +    unsigned int max_tag_len = 0;
> +
> +    /* Allocate a buffer for escaping tags. */
> +    for (i = 0; i < add_tags_count; i++)
> +	if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> +	    max_tag_len = strlen (argv[add_tags[i]] + 1);
> +    for (i = 0; i < remove_tags_count; i++)
> +	if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> +	    max_tag_len = strlen (argv[remove_tags[i]] + 1);
> +    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);

Perhaps a comment here or above _escape_tag() explaining the worst case
memory consumption of strlen(tag) * 2 + 3 for a tag of "s would be in
order.

It's unrelated, but looking at the above also made me check something
I've suspected before: notmuch allows you to have empty or zero length
tags "", which is probably not intentional.

There's no check for talloc failures here or below. But then there are
few checks for that in the cli in general. *shrug*.

> +
> +    /* Build the new query string */
> +    if (strcmp (orig_query_string, "*") == 0)
> +	query_string = talloc_strdup (ctx, "(");
> +    else
> +	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> +
> +    for (i = 0; i < add_tags_count; i++) {
> +	query_string = talloc_asprintf_append_buffer (
> +	    query_string, "%snot tag:%s", join,
> +	    _escape_tag (escaped, argv[add_tags[i]] + 1));
> +	join = " or ";
> +    }
> +    for (i = 0; i < remove_tags_count; i++) {
> +	query_string = talloc_asprintf_append_buffer (
> +	    query_string, "%stag:%s", join,
> +	    _escape_tag (escaped, argv[remove_tags[i]] + 1));
> +	join = " or ";
> +    }
> +
> +    query_string = talloc_strdup_append_buffer (query_string, ")");
> +
> +    talloc_free (escaped);
> +    return query_string;
> +}
> +
>  int
>  notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>  {
> @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>  	return 1;
>      }
>  
> +    /* Optimize the query so it excludes messages that already have
> +     * the specified set of tags. */
> +    query_string = _optimize_tag_query (ctx, query_string, argv,
> +					add_tags, add_tags_count,
> +					remove_tags, remove_tags_count);
> +
>      config = notmuch_config_open (ctx, NULL, NULL);
>      if (config == NULL)
>  	return 1;
> -- 
> 1.7.7.1
> 
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
  2011-11-09  8:46 ` Jani Nikula
@ 2011-11-09 13:40   ` Austin Clements
  2011-11-10 13:28   ` Sebastian Spaeth
  1 sibling, 0 replies; 11+ messages in thread
From: Austin Clements @ 2011-11-09 13:40 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

Quoth Jani Nikula on Nov 09 at  8:46 am:
> 
> FWIW, I reviewed this and didn't find any obvious problems. A few
> nitpicks below, though.
> 
> BR,
> Jani.
> 
> On Mon,  7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> > This optimizes the user's tagging query to exclude messages that won't
> > be affected by the tagging operation, saving computation and IO for
> > redundant tagging operations.
> > 
> > For example,
> >   notmuch tag +notmuch to:notmuch@notmuchmail.org
> > will now use the query
> >   ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
> > 
> > In the past, we've often suggested that people do this exact
> > transformation by hand for slow tagging operations.  This makes that
> > unnecessary.
> > ---
> > I was about to implement this optimization in my initial tagging
> > script, but then I figured, why not just do it in notmuch so we can
> > stop telling people to do this by hand?
> > 
> >  NEWS          |    9 ++++++
> >  notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+), 0 deletions(-)
> > 
> > diff --git a/NEWS b/NEWS
> > index e00452a..9ca5e0c 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -16,6 +16,15 @@ Add search terms to  "notmuch dump"
> >    search/show/tag. The output file argument of dump is deprecated in
> >    favour of using stdout.
> >  
> > +Optimizations
> > +-------------
> > +
> > +Automatic tag query optimization
> > +
> > +  "notmuch tag" now automatically optimizes the user's query to
> > +  exclude messages whose tags won't change.  In the past, we've
> > +  suggested that people do this by hand; this is no longer necessary.
> > +
> >  Notmuch 0.9 (2011-10-01)
> >  ========================
> >  
> > diff --git a/notmuch-tag.c b/notmuch-tag.c
> > index dded39e..62c4bf1 100644
> > --- a/notmuch-tag.c
> > +++ b/notmuch-tag.c
> > @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
> >      interrupted = 1;
> >  }
> >  
> > +static char *
> > +_escape_tag (char *buf, const char *tag)
> > +{
> > +    const char *in = tag;
> > +    char *out = buf;
> > +    /* Boolean terms surrounded by double quotes can contain any
> > +     * character.  Double quotes are quoted by doubling them. */
> > +    *(out++) = '"';
> > +    while (*in) {
> > +	if (*in == '"')
> > +	    *(out++) = '"';
> > +	*(out++) = *(in++);
> > +    }
> > +    *(out++) = '"';
> 
> The parenthesis are unnecessary for *p++.

Removed.  I put these in out of paranoia, but I suppose it wouldn't be
an lvalue if it parsed differently.

> > +    *out = 0;
> > +    return buf;
> > +}
> > +
> > +static char *
> > +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> > +		     int *add_tags, int add_tags_count,
> > +		     int *remove_tags, int remove_tags_count)
> > +{
> > +    /* This is subtler than it looks.  Xapian ignores the '-' operator
> > +     * at the beginning both queries and parenthesized groups and,
> > +     * furthermore, the presence of a '-' operator at the beginning of
> > +     * a group can inhibit parsing of the previous operator.  Hence,
> > +     * the user-provided query MUST appear first, but it is safe to
> > +     * parenthesize and the exclusion part of the query must not use
> > +     * the '-' operator (though the NOT operator is fine). */
> > +
> > +    char *escaped, *query_string;
> > +    const char *join = "";
> > +    int i;
> > +    unsigned int max_tag_len = 0;
> > +
> > +    /* Allocate a buffer for escaping tags. */
> > +    for (i = 0; i < add_tags_count; i++)
> > +	if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> > +	    max_tag_len = strlen (argv[add_tags[i]] + 1);
> > +    for (i = 0; i < remove_tags_count; i++)
> > +	if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> > +	    max_tag_len = strlen (argv[remove_tags[i]] + 1);
> > +    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
> 
> Perhaps a comment here or above _escape_tag() explaining the worst case
> memory consumption of strlen(tag) * 2 + 3 for a tag of "s would be in
> order.

Definitely.  Done.

> It's unrelated, but looking at the above also made me check something
> I've suspected before: notmuch allows you to have empty or zero length
> tags "", which is probably not intentional.
> 
> There's no check for talloc failures here or below. But then there are
> few checks for that in the cli in general. *shrug*.

It's unfortunate that error handling obscures C code so much.  But
there's no sense in not handling errors, so I fixed this.

> > +
> > +    /* Build the new query string */
> > +    if (strcmp (orig_query_string, "*") == 0)
> > +	query_string = talloc_strdup (ctx, "(");
> > +    else
> > +	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> > +
> > +    for (i = 0; i < add_tags_count; i++) {
> > +	query_string = talloc_asprintf_append_buffer (
> > +	    query_string, "%snot tag:%s", join,
> > +	    _escape_tag (escaped, argv[add_tags[i]] + 1));
> > +	join = " or ";
> > +    }
> > +    for (i = 0; i < remove_tags_count; i++) {
> > +	query_string = talloc_asprintf_append_buffer (
> > +	    query_string, "%stag:%s", join,
> > +	    _escape_tag (escaped, argv[remove_tags[i]] + 1));
> > +	join = " or ";
> > +    }
> > +
> > +    query_string = talloc_strdup_append_buffer (query_string, ")");
> > +
> > +    talloc_free (escaped);
> > +    return query_string;
> > +}
> > +
> >  int
> >  notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> >  {
> > @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> >  	return 1;
> >      }
> >  
> > +    /* Optimize the query so it excludes messages that already have
> > +     * the specified set of tags. */
> > +    query_string = _optimize_tag_query (ctx, query_string, argv,
> > +					add_tags, add_tags_count,
> > +					remove_tags, remove_tags_count);
> > +
> >      config = notmuch_config_open (ctx, NULL, NULL);
> >      if (config == NULL)
> >  	return 1;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] tag: Automatically limit to messages whose tags will actually change.
  2011-11-08  3:55 [PATCH] tag: Automatically limit to messages whose tags will actually change Austin Clements
                   ` (2 preceding siblings ...)
  2011-11-09  8:46 ` Jani Nikula
@ 2011-11-09 13:44 ` Austin Clements
  2011-11-16 17:41   ` Tomi Ollila
  2011-11-28 15:46   ` David Bremner
  3 siblings, 2 replies; 11+ messages in thread
From: Austin Clements @ 2011-11-09 13:44 UTC (permalink / raw)
  To: notmuch

This optimizes the user's tagging query to exclude messages that won't
be affected by the tagging operation, saving computation and IO for
redundant tagging operations.

For example,
  notmuch tag +notmuch to:notmuch@notmuchmail.org
will now use the query
  ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")

In the past, we've often suggested that people do this exact
transformation by hand for slow tagging operations.  This makes that
unnecessary.
---
This version addresses Jani's comments.

 NEWS          |    9 ++++++
 notmuch-tag.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index e00452a..9ca5e0c 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,15 @@ Add search terms to  "notmuch dump"
   search/show/tag. The output file argument of dump is deprecated in
   favour of using stdout.
 
+Optimizations
+-------------
+
+Automatic tag query optimization
+
+  "notmuch tag" now automatically optimizes the user's query to
+  exclude messages whose tags won't change.  In the past, we've
+  suggested that people do this by hand; this is no longer necessary.
+
 Notmuch 0.9 (2011-10-01)
 ========================
 
diff --git a/notmuch-tag.c b/notmuch-tag.c
index dded39e..537d5a4 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -30,6 +30,81 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
+static char *
+_escape_tag (char *buf, const char *tag)
+{
+    const char *in = tag;
+    char *out = buf;
+    /* Boolean terms surrounded by double quotes can contain any
+     * character.  Double quotes are quoted by doubling them. */
+    *out++ = '"';
+    while (*in) {
+	if (*in == '"')
+	    *out++ = '"';
+	*out++ = *in++;
+    }
+    *out++ = '"';
+    *out = 0;
+    return buf;
+}
+
+static char *
+_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
+		     int *add_tags, int add_tags_count,
+		     int *remove_tags, int remove_tags_count)
+{
+    /* This is subtler than it looks.  Xapian ignores the '-' operator
+     * at the beginning both queries and parenthesized groups and,
+     * furthermore, the presence of a '-' operator at the beginning of
+     * a group can inhibit parsing of the previous operator.  Hence,
+     * the user-provided query MUST appear first, but it is safe to
+     * parenthesize and the exclusion part of the query must not use
+     * the '-' operator (though the NOT operator is fine). */
+
+    char *escaped, *query_string;
+    const char *join = "";
+    int i;
+    unsigned int max_tag_len = 0;
+
+    /* Allocate a buffer for escaping tags.  This is large enough to
+     * hold a fully escaped tag with every character doubled plus
+     * enclosing quotes and a NUL. */
+    for (i = 0; i < add_tags_count; i++)
+	if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
+	    max_tag_len = strlen (argv[add_tags[i]] + 1);
+    for (i = 0; i < remove_tags_count; i++)
+	if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
+	    max_tag_len = strlen (argv[remove_tags[i]] + 1);
+    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
+    if (!escaped)
+	return NULL;
+
+    /* Build the new query string */
+    if (strcmp (orig_query_string, "*") == 0)
+	query_string = talloc_strdup (ctx, "(");
+    else
+	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
+
+    for (i = 0; i < add_tags_count && query_string; i++) {
+	query_string = talloc_asprintf_append_buffer (
+	    query_string, "%snot tag:%s", join,
+	    _escape_tag (escaped, argv[add_tags[i]] + 1));
+	join = " or ";
+    }
+    for (i = 0; i < remove_tags_count && query_string; i++) {
+	query_string = talloc_asprintf_append_buffer (
+	    query_string, "%stag:%s", join,
+	    _escape_tag (escaped, argv[remove_tags[i]] + 1));
+	join = " or ";
+    }
+
+    if (query_string)
+	query_string = talloc_strdup_append_buffer (query_string, ")");
+
+    talloc_free (escaped);
+    return query_string;
+}
+
 int
 notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 {
@@ -93,6 +168,16 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	return 1;
     }
 
+    /* Optimize the query so it excludes messages that already have
+     * the specified set of tags. */
+    query_string = _optimize_tag_query (ctx, query_string, argv,
+					add_tags, add_tags_count,
+					remove_tags, remove_tags_count);
+    if (query_string == NULL) {
+	fprintf (stderr, "Out of memory.\n");
+	return 1;
+    }
+
     config = notmuch_config_open (ctx, NULL, NULL);
     if (config == NULL)
 	return 1;
-- 
1.7.7.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
  2011-11-09  8:46 ` Jani Nikula
  2011-11-09 13:40   ` Austin Clements
@ 2011-11-10 13:28   ` Sebastian Spaeth
  2012-11-06  1:56     ` David Bremner
  1 sibling, 1 reply; 11+ messages in thread
From: Sebastian Spaeth @ 2011-11-10 13:28 UTC (permalink / raw)
  To: Jani Nikula, Austin Clements, notmuch

[-- Attachment #1: Type: text/plain, Size: 486 bytes --]

On Wed, 09 Nov 2011 08:46:02 +0000, Jani Nikula <jani@nikula.org> wrote:
> It's unrelated, but looking at the above also made me check something
> I've suspected before: notmuch allows you to have empty or zero length
> tags "", which is probably not intentional.

I had reported already that it is possible to add either an empty or a
"\n" tag (not sure what gets added) via '+<RET>' in the emacs UI. See
the newline in the tag list. Remove it again with '-<RET>' in emacs.

Sebastian

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] tag: Automatically limit to messages whose tags will actually change.
  2011-11-09 13:44 ` [PATCH v2] " Austin Clements
@ 2011-11-16 17:41   ` Tomi Ollila
  2011-11-28 15:46   ` David Bremner
  1 sibling, 0 replies; 11+ messages in thread
From: Tomi Ollila @ 2011-11-16 17:41 UTC (permalink / raw)
  To: Austin Clements, notmuch

On Wed,  9 Nov 2011 08:44:35 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.
> 
> For example,
>   notmuch tag +notmuch to:notmuch@notmuchmail.org
> will now use the query
>   ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")
> 
> In the past, we've often suggested that people do this exact
> transformation by hand for slow tagging operations.  This makes that
> unnecessary.
> ---
> This version addresses Jani's comments.
> 
>  NEWS          |    9 ++++++
>  notmuch-tag.c |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 94 insertions(+), 0 deletions(-)
> 

Reviewed code, looks good

* Empty query string checked before entering _optimize_tag_query
  (if that matters)
* All allocations checked
* The logic look sound and creation of that query string is ok, too.
* I trust escaping is done the way it is done (quotes around, doubling
  any quotes (") in string).

* orig_query_string could be freed in _optimize_query_string()
  the data becomes garbage pointer to it lost after that call.
  However, _optimize_query_string() doesn't know that so it is
  better to leave talloc do the freeing (a bit later).

Tomi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] tag: Automatically limit to messages whose tags will actually change.
  2011-11-09 13:44 ` [PATCH v2] " Austin Clements
  2011-11-16 17:41   ` Tomi Ollila
@ 2011-11-28 15:46   ` David Bremner
  1 sibling, 0 replies; 11+ messages in thread
From: David Bremner @ 2011-11-28 15:46 UTC (permalink / raw)
  To: Austin Clements, notmuch

On Wed,  9 Nov 2011 08:44:35 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.

Pushed, 

d

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tag: Automatically limit to messages whose tags will actually change.
  2011-11-10 13:28   ` Sebastian Spaeth
@ 2012-11-06  1:56     ` David Bremner
  0 siblings, 0 replies; 11+ messages in thread
From: David Bremner @ 2012-11-06  1:56 UTC (permalink / raw)
  To: Sebastian Spaeth, Jani Nikula, Austin Clements, notmuch

Sebastian Spaeth <Sebastian@SSpaeth.de> writes:

> On Wed, 09 Nov 2011 08:46:02 +0000, Jani Nikula <jani@nikula.org> wrote:
>> It's unrelated, but looking at the above also made me check something
>> I've suspected before: notmuch allows you to have empty or zero length
>> tags "", which is probably not intentional.
>
> I had reported already that it is possible to add either an empty or a
> "\n" tag (not sure what gets added) via '+<RET>' in the emacs UI. See
> the newline in the tag list. Remove it again with '-<RET>' in emacs.

Both of these problems should be fixed in current git master
(0.14-76-g84a0c52) so I am untagging this bug.

d

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-11-06  1:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-08  3:55 [PATCH] tag: Automatically limit to messages whose tags will actually change Austin Clements
2011-11-08  4:34 ` Dmitry Kurochkin
2011-11-08 16:10   ` Austin Clements
2011-11-08 10:41 ` Sebastian Spaeth
2011-11-09  8:46 ` Jani Nikula
2011-11-09 13:40   ` Austin Clements
2011-11-10 13:28   ` Sebastian Spaeth
2012-11-06  1:56     ` David Bremner
2011-11-09 13:44 ` [PATCH v2] " Austin Clements
2011-11-16 17:41   ` Tomi Ollila
2011-11-28 15:46   ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).