From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 53235431FBF
	for ; Sun, 23 Dec 2012 17:40:14 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: 0
X-Spam-Level:
X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[none]
	autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id KrKXEzMdKkt9 for ; Sun, 23 Dec 2012 17:40:12 -0800 (PST)
Received: from tesseract.cs.unb.ca (tesseract.cs.unb.ca [131.202.240.238])
	(using TLSv1 with cipher AES256-SHA (256/256 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id 919E0431FCB
	for ; Sun, 23 Dec 2012 17:40:02 -0800 (PST)
Received: from fctnnbsc30w-156034082078.dhcp-dynamic.fibreop.nb.bellaliant.net
	([156.34.82.78] helo=zancas.localnet)
	by tesseract.cs.unb.ca with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16)
	(Exim 4.72) (envelope-from ) id 1Tmx1K-0008Kj-SZ
	for notmuch@notmuchmail.org; Sun, 23 Dec 2012 21:40:01 -0400
Received: from bremner by zancas.localnet with local (Exim 4.80)
	(envelope-from ) id 1Tmx1F-0002nD-C6
	for notmuch@notmuchmail.org; Sun, 23 Dec 2012 21:39:53 -0400
From: david@tethera.net
To: notmuch@notmuchmail.org
Subject: v9 of batch tagging
Date: Sun, 23 Dec 2012 21:39:26 -0400
Message-Id: <1356313183-9266-1-git-send-email-david@tethera.net>
X-Mailer: git-send-email 1.7.10.4
X-Spam_bar: -
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 24 Dec 2012 01:40:14 -0000

This obsoletes id:1356095307-22895-1-git-send-email-david@tethera.net

The main changes since v8 are the rebasing against the notmuch-restore
fixes in master, and the rewrite of the query (pre)-processing
unhex_and_quote. This incorporates the changes of
id:1356231570-28232-1-git-send-email-david@tethera.net and now handles
'()' (cf. id:87a9t5p4dz.fsf@qmul.ac.uk)

With respect to

,----
| Finally, I don't know if a query can contain a : without being a
| prefix query. If it can that could end up being misquoted.
`----

This is pretty easy to work around by encoding that :. Unless it turns
out to be a problem in practice, I prefer not to keep an explicit list
of prefixes here; recognizing prefixes should really be a service from
libnotmuch.

I dropped two patches (strnspn and hex_invariant), but picked up a new
strtok variation. Probably the name strtok_len2 could be improved (and
I see there is a typo in the patch subject).

[Patch v9 05/17] util/string-util: add a new string tokenized

Finally I added a test for the new parenthesis handling.

[Patch v9 17/17] test/tagging: add test for handling of parens

Fixup wise, the tests needed to be adjusted a bit for () being
delimiters, and the man page as well.

I added the fclose in id:87wqw9hf9a.fsf@oiva.home.nikula.org

And I modified the return value per id:87zk15hi7f.fsf@oiva.home.nikula.org

Here is the interdiff for unhex_and_quote:

commit 67c6aee87db5c7da25529e1c0feb64e422abb4b7
Author: David Bremner
Date:   Sat Dec 22 22:49:02 2012 -0400

    simplify unhex_and_quote, support parens

    the overgeneral definition of a prefix can be replaced by lower case
    alphabetic, and still work fine with current notmuch query syntax.

    use () as delimiters in unhex_and_quote, preserve delimiters

diff --git a/tag-util.c b/tag-util.c
index 6f62fe6..91f3603 100644
--- a/tag-util.c
+++ b/tag-util.c
@@ -56,6 +56,21 @@ illegal_tag (const char *tag, notmuch_bool_t remove)
     return NULL;
 }
 
+/* Factor out the boilerplate to append a token to the query string.
+ * For use in unhex_and_quote */
+
+static tag_parse_status_t
+append_tok (const char *tok, size_t tok_len,
+            const char *line_for_error, char **query_string)
+{
+
+    *query_string = talloc_strndup_append_buffer (*query_string, tok, tok_len);
+    if (*query_string == NULL)
+        return line_error (TAG_PARSE_OUT_OF_MEMORY, line_for_error, "aborting");
+
+    return TAG_PARSE_SUCCESS;
+}
+
 /* Input is a hex encoded string, presumed to be a query for Xapian.
  *
  * Space delimited tokens are decoded and quoted, with '*' and prefixes
@@ -67,45 +82,41 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
 {
     char *tok = encoded;
     size_t tok_len = 0;
+    size_t delim_len = 0;
     char *buf = NULL;
     size_t buf_len = 0;
     tag_parse_status_t ret = TAG_PARSE_SUCCESS;
 
     *query_string = talloc_strdup (ctx, "");
 
-    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
+    while ((tok = strtok_len2 (tok + tok_len + delim_len, " ()",
+                               &tok_len, &delim_len)) != NULL) {
 
         size_t prefix_len;
         char delim = *(tok + tok_len);
 
-        *(tok + tok_len++) = '\0';
+        *(tok + tok_len) = '\0';
 
-        prefix_len = hex_invariant (tok, tok_len);
+        /* The following matches a superset of prefixes currently
+         * used by notmuch */
+        prefix_len = strspn (tok, "abcdefghijklmnopqrstuvwxyz");
 
-        if ((strcmp (tok, "*") == 0) || prefix_len >= tok_len - 1) {
+        if ((strcmp (tok, "*") == 0) || prefix_len == tok_len) {
 
             /* pass some things through without quoting or decoding.
              * Note for '*' this is mandatory.
              */
 
-            if (! (*query_string = talloc_asprintf_append_buffer (
-                       *query_string, "%s%c", tok, delim))) {
-
-                ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
-                                  line_for_error, "aborting");
-                goto DONE;
-            }
+            ret = append_tok (tok, tok_len, line_for_error, query_string);
+            if (ret) goto DONE;
 
         } else {
             /* potential prefix: one for ':', then something after */
-            if ((tok_len - prefix_len > 2) && *(tok + prefix_len) == ':') {
-                if (! (*query_string = talloc_strndup_append (*query_string,
-                                                              tok,
-                                                              prefix_len + 1))) {
-                    ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
-                                      line_for_error, "aborting");
-                    goto DONE;
-                }
+            if ((tok_len - prefix_len >= 2) && *(tok + prefix_len) == ':') {
+                ret = append_tok (tok, prefix_len + 1,
+                                  line_for_error, query_string);
+                if (ret) goto DONE;
+
                 tok += prefix_len + 1;
                 tok_len -= prefix_len + 1;
             }
@@ -122,13 +133,15 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
                 goto DONE;
             }
 
-            if (! (*query_string = talloc_asprintf_append_buffer (
-                       *query_string, "%s%c", buf, delim))) {
-                ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
-                                  line_for_error, "aborting");
-                goto DONE;
-            }
+            ret = append_tok (buf, buf_len, line_for_error, query_string);
+            if (ret) goto DONE;
         }
+        /* restore the string */
+        *(tok + tok_len) = delim;
+
+        /* copy any delimiters */
+        ret = append_tok (tok + tok_len, delim_len, line_for_error, query_string);
+        if (ret) goto DONE;
     }
 
   DONE:
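
For anyone reading the interdiff without patch 05/17 at hand, here is a
rough sketch of the kind of tokenizer the loop above relies on. This is
not the strtok_len2 from the series. The names sketch_strtok_len2,
sketch_prefix_len and sketch_roundtrip are made up for illustration, and
the behaviour (an empty token when the input starts with a delimiter,
NULL only at end of input) is an assumption inferred from how the
interdiff calls the function and then copies the trailing delimiters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strtok_len2 (assumed semantics, see above).
 * Reports the token starting at s and the run of delimiters after it.
 * Returns s, or NULL once both lengths are zero (end of input). */
static char *
sketch_strtok_len2 (char *s, const char *delim,
                    size_t *tok_len, size_t *delim_len)
{
    assert (s != NULL && delim != NULL);
    assert (tok_len != NULL && delim_len != NULL);

    /* Longest run of non-delimiter characters; may be empty. */
    *tok_len = strcspn (s, delim);
    /* Run of delimiter characters that follows the token. */
    *delim_len = strspn (s + *tok_len, delim);

    if (*tok_len == 0 && *delim_len == 0)
        return NULL;
    return s;
}

/* Same test as in the interdiff: length of the leading run of
 * lower case letters, a superset of the current notmuch prefixes. */
static size_t
sketch_prefix_len (const char *tok)
{
    return strspn (tok, "abcdefghijklmnopqrstuvwxyz");
}

/* Walk the input the way the patched loop does, copying each token
 * and its trailing delimiters unchanged into out. Returns the number
 * of non-empty tokens copied; stops early if out is too small. */
static size_t
sketch_roundtrip (char *in, char *out, size_t out_size)
{
    char *tok = in;
    size_t tok_len = 0, delim_len = 0, used = 0, count = 0;

    assert (out != NULL && out_size > 0);
    out[0] = '\0';

    while ((tok = sketch_strtok_len2 (tok + tok_len + delim_len, " ()",
                                      &tok_len, &delim_len)) != NULL) {
        if (used + tok_len + delim_len + 1 > out_size)
            break;
        memcpy (out + used, tok, tok_len + delim_len);
        used += tok_len + delim_len;
        out[used] = '\0';
        if (tok_len > 0)
            count++;
    }
    return count;
}
```

Under these assumptions, running sketch_roundtrip on
"(tag:inbox and from:alice) *" reproduces the input exactly, parens and
spaces included, and counts four non-empty tokens. sketch_prefix_len
gives 3 for "tag:inbox" (so the ':' sits right after the prefix) and 3
for "and", which equals the token length, so a bare word like that would
take the pass-through branch rather than being quoted.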