From mboxrd@z Thu Jan 1 00:00:00 1970
From: Austin Clements
To: notmuch@notmuchmail.org
Subject: [PATCH] tag: Automatically limit to messages whose tags will actually change.
Date: Mon, 7 Nov 2011 22:55:23 -0500
Message-Id: <1320724523-23568-1-git-send-email-amdragon@mit.edu>
X-Mailer: git-send-email 1.7.7.1
List-Id: "Use and development of the notmuch mail system."

This optimizes the user's tagging query to exclude messages that won't
be affected by the tagging operation, saving computation and IO for
redundant tagging operations.

For example,
  notmuch tag +notmuch to:notmuch@notmuchmail.org
will now use the query
  ( to:notmuch@notmuchmail.org ) and (not tag:"notmuch")

In the past, we've often suggested that people do this exact
transformation by hand for slow tagging operations.  This makes that
unnecessary.
---
I was about to implement this optimization in my initial tagging
script, but then I figured, why not just do it in notmuch so we can
stop telling people to do this by hand?
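
For readers who want to see the transformation described in the commit
message outside of notmuch, here is a rough standalone sketch.  It is not
the code from this patch: build_query and its parameters are hypothetical
names, it uses plain malloc instead of talloc, and it assumes the tags
contain no double quotes (so it does no escaping).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of notmuch: build the narrowed query for
 * a set of tags to add and a set of tags to remove.  Assumes the tags
 * contain no double quotes.  The caller frees the result. */
static char *
build_query (const char *orig, const char **add, int n_add,
             const char **rem, int n_rem)
{
    size_t len;
    const char *join = "";
    char *q;
    int i;

    assert (orig != NULL);

    /* Room for "( ", " ) and (", ")" and the NUL, then at most 16 bytes
     * of fixed text (" or not tag:\"" plus closing quote) per tag. */
    len = strlen (orig) + 16;
    for (i = 0; i < n_add; i++)
        len += strlen (add[i]) + 16;
    for (i = 0; i < n_rem; i++)
        len += strlen (rem[i]) + 16;

    q = malloc (len);
    if (q == NULL)
        return NULL;

    /* "*" matches everything, so only the tag clause is needed. */
    if (strcmp (orig, "*") == 0)
        strcpy (q, "(");
    else
        sprintf (q, "( %s ) and (", orig);

    /* A message changes if it lacks any tag being added ... */
    for (i = 0; i < n_add; i++) {
        sprintf (q + strlen (q), "%snot tag:\"%s\"", join, add[i]);
        join = " or ";
    }
    /* ... or if it has any tag being removed. */
    for (i = 0; i < n_rem; i++) {
        sprintf (q + strlen (q), "%stag:\"%s\"", join, rem[i]);
        join = " or ";
    }
    strcat (q, ")");
    return q;
}
```

Called with the original query to:notmuch@notmuchmail.org and the single
added tag "notmuch", this sketch produces the same string as the example
in the commit message.  The user's query comes first and the exclusion
uses "not" rather than '-', matching the constraints noted in the patch.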
 NEWS          |    9 ++++++
 notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index e00452a..9ca5e0c 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,15 @@ Add search terms to "notmuch dump"
   search/show/tag. The output file argument of dump is deprecated in
   favour of using stdout.
 
+Optimizations
+-------------
+
+Automatic tag query optimization
+
+  "notmuch tag" now automatically optimizes the user's query to
+  exclude messages whose tags won't change.  In the past, we've
+  suggested that people do this by hand; this is no longer necessary.
+
 Notmuch 0.9 (2011-10-01)
 ========================
 
diff --git a/notmuch-tag.c b/notmuch-tag.c
index dded39e..62c4bf1 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
+static char *
+_escape_tag (char *buf, const char *tag)
+{
+    const char *in = tag;
+    char *out = buf;
+    /* Boolean terms surrounded by double quotes can contain any
+     * character.  Double quotes are quoted by doubling them. */
+    *(out++) = '"';
+    while (*in) {
+        if (*in == '"')
+            *(out++) = '"';
+        *(out++) = *(in++);
+    }
+    *(out++) = '"';
+    *out = 0;
+    return buf;
+}
+
+static char *
+_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
+                     int *add_tags, int add_tags_count,
+                     int *remove_tags, int remove_tags_count)
+{
+    /* This is subtler than it looks.  Xapian ignores the '-' operator
+     * at the beginning of both queries and parenthesized groups and,
+     * furthermore, the presence of a '-' operator at the beginning of
+     * a group can inhibit parsing of the previous operator.  Hence,
+     * the user-provided query MUST appear first, but it is safe to
+     * parenthesize and the exclusion part of the query must not use
+     * the '-' operator (though the NOT operator is fine). */
+
+    char *escaped, *query_string;
+    const char *join = "";
+    int i;
+    unsigned int max_tag_len = 0;
+
+    /* Allocate a buffer for escaping tags. */
+    for (i = 0; i < add_tags_count; i++)
+        if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
+            max_tag_len = strlen (argv[add_tags[i]] + 1);
+    for (i = 0; i < remove_tags_count; i++)
+        if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
+            max_tag_len = strlen (argv[remove_tags[i]] + 1);
+    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
+
+    /* Build the new query string */
+    if (strcmp (orig_query_string, "*") == 0)
+        query_string = talloc_strdup (ctx, "(");
+    else
+        query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
+
+    for (i = 0; i < add_tags_count; i++) {
+        query_string = talloc_asprintf_append_buffer (
+            query_string, "%snot tag:%s", join,
+            _escape_tag (escaped, argv[add_tags[i]] + 1));
+        join = " or ";
+    }
+    for (i = 0; i < remove_tags_count; i++) {
+        query_string = talloc_asprintf_append_buffer (
+            query_string, "%stag:%s", join,
+            _escape_tag (escaped, argv[remove_tags[i]] + 1));
+        join = " or ";
+    }
+
+    query_string = talloc_strdup_append_buffer (query_string, ")");
+
+    talloc_free (escaped);
+    return query_string;
+}
+
 int
 notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
 {
@@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
         return 1;
     }
 
+    /* Optimize the query so it excludes messages that already have
+     * the specified set of tags. */
+    query_string = _optimize_tag_query (ctx, query_string, argv,
+                                        add_tags, add_tags_count,
+                                        remove_tags, remove_tags_count);
+
     config = notmuch_config_open (ctx, NULL, NULL);
     if (config == NULL)
         return 1;
-- 
1.7.7.1
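
The quoting rule used by _escape_tag, and the reason the scratch buffer is
sized max_tag_len * 2 + 3, may be easier to see in isolation.  The sketch
below is only an illustration under my own assumptions: quote_term is a
hypothetical name, and it allocates with malloc rather than writing into a
caller-supplied talloc buffer as the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone variant of the quoting rule in the patch:
 * wrap the term in double quotes and double any embedded double quote.
 * Returns a malloc'd string the caller must free, or NULL on
 * allocation failure. */
static char *
quote_term (const char *term)
{
    /* Worst case every character is a quote and gets doubled (2 * len),
     * plus the two surrounding quotes and the terminating NUL. */
    size_t len = strlen (term);
    char *buf = malloc (len * 2 + 3);
    char *out = buf;

    if (buf == NULL)
        return NULL;

    *out++ = '"';
    for (; *term; term++) {
        if (*term == '"')
            *out++ = '"';       /* emit the extra quote first */
        *out++ = *term;
    }
    *out++ = '"';
    *out = '\0';

    /* The written length never exceeds the worst-case bound. */
    assert ((size_t) (out - buf) <= len * 2 + 2);
    return buf;
}
```

A tag with no quotes only gains the surrounding pair, while a tag made up
entirely of quotes fills the buffer exactly, which is why the bound is
2 * len + 3 and not something smaller.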