unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH] tag: Automatically limit to messages whose tags will actually change.
@ 2011-11-08  3:55 Austin Clements
  2011-11-08  4:34 ` Dmitry Kurochkin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Austin Clements @ 2011-11-08  3:55 UTC (permalink / raw)
  To: notmuch

This optimizes the user's tagging query to exclude messages that won't
be affected by the tagging operation, saving computation and IO for
redundant tagging operations.

For example,
  notmuch tag +notmuch to:notmuch@notmuchmail.org
will now use the query
  ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")

In the past, we've often suggested that people do this exact
transformation by hand for slow tagging operations.  This makes that
unnecessary.
---
I was about to implement this optimization in my initial tagging
script, but then I figured, why not just do it in notmuch so we can
stop telling people to do this by hand?

 NEWS          |    9 ++++++
 notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index e00452a..9ca5e0c 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,15 @@ Add search terms to  "notmuch dump"
   search/show/tag. The output file argument of dump is deprecated in
   favour of using stdout.
 
+Optimizations
+-------------
+
+Automatic tag query optimization
+
+  "notmuch tag" now automatically optimizes the user's query to
+  exclude messages whose tags won't change.  In the past, we've
+  suggested that people do this by hand; this is no longer necessary.
+
 Notmuch 0.9 (2011-10-01)
 ========================
 
diff --git a/notmuch-tag.c b/notmuch-tag.c
index dded39e..62c4bf1 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
+static char *
+_escape_tag (char *buf, const char *tag)
+{
+    const char *in = tag;
+    char *out = buf;
+    /* Boolean terms surrounded by double quotes can contain any
+     * character.  Double quotes are quoted by doubling them. */
+    *(out++) = '"';
+    while (*in) {
+	if (*in == '"')
+	    *(out++) = '"';
+	*(out++) = *(in++);
+    }
+    *(out++) = '"';
+    *out = 0;
+    return buf;
+}
+
+static char *
+_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
+		     int *add_tags, int add_tags_count,
+		     int *remove_tags, int remove_tags_count)
+{
+    /* This is subtler than it looks.  Xapian ignores the '-' operator
+     * at the beginning both queries and parenthesized groups and,
+     * furthermore, the presence of a '-' operator at the beginning of
+     * a group can inhibit parsing of the previous operator.  Hence,
+     * the user-provided query MUST appear first, but it is safe to
+     * parenthesize and the exclusion part of the query must not use
+     * the '-' operator (though the NOT operator is fine). */
+
+    char *escaped, *query_string;
+    const char *join = "";
+    int i;
+    unsigned int max_tag_len = 0;
+
+    /* Allocate a buffer for escaping tags. */
+    for (i = 0; i < add_tags_count; i++)
+	if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
+	    max_tag_len = strlen (argv[add_tags[i]] + 1);
+    for (i = 0; i < remove_tags_count; i++)
+	if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
+	    max_tag_len = strlen (argv[remove_tags[i]] + 1);
+    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
+
+    /* Build the new query string */
+    if (strcmp (orig_query_string, "*") == 0)
+	query_string = talloc_strdup (ctx, "(");
+    else
+	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
+
+    for (i = 0; i < add_tags_count; i++) {
+	query_string = talloc_asprintf_append_buffer (
+	    query_string, "%snot tag:%s", join,
+	    _escape_tag (escaped, argv[add_tags[i]] + 1));
+	join = " or ";
+    }
+    for (i = 0; i < remove_tags_count; i++) {
+	query_string = talloc_asprintf_append_buffer (
+	    query_string, "%stag:%s", join,
+	    _escape_tag (escaped, argv[remove_tags[i]] + 1));
+	join = " or ";
+    }
+
+    query_string = talloc_strdup_append_buffer (query_string, ")");
+
+    talloc_free (escaped);
+    return query_string;
+}
+
 int
 notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 {
@@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 	return 1;
     }
 
+    /* Optimize the query so it excludes messages that already have
+     * the specified set of tags. */
+    query_string = _optimize_tag_query (ctx, query_string, argv,
+					add_tags, add_tags_count,
+					remove_tags, remove_tags_count);
+
     config = notmuch_config_open (ctx, NULL, NULL);
     if (config == NULL)
 	return 1;
-- 
1.7.7.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-11-06  1:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-08  3:55 [PATCH] tag: Automatically limit to messages whose tags will actually change Austin Clements
2011-11-08  4:34 ` Dmitry Kurochkin
2011-11-08 16:10   ` Austin Clements
2011-11-08 10:41 ` Sebastian Spaeth
2011-11-09  8:46 ` Jani Nikula
2011-11-09 13:40   ` Austin Clements
2011-11-10 13:28   ` Sebastian Spaeth
2012-11-06  1:56     ` David Bremner
2011-11-09 13:44 ` [PATCH v2] " Austin Clements
2011-11-16 17:41   ` Tomi Ollila
2011-11-28 15:46   ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).