From: Austin Clements
To: notmuch@notmuchmail.org
Subject: [PATCH 2/5] util: Function to parse boolean term queries
Date: Tue, 25 Dec 2012 00:57:53 -0500
Message-Id: <1356415076-5692-3-git-send-email-amdragon@mit.edu>
In-Reply-To: <1356415076-5692-1-git-send-email-amdragon@mit.edu>
References: <1356415076-5692-1-git-send-email-amdragon@mit.edu>

This reproduces Xapian's parsing rules for boolean term queries.  It
is provided as a generic string utility, but will shortly be used in
notmuch restore to parse ID queries and optimize for them.
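The quoted form of these rules can be sketched on its own.  The sketch
is illustrative only and makes some assumptions:

- It uses malloc rather than talloc.
- It accepts only the ASCII double quote.  The patch's
  consume_double_quote also accepts the UTF-8 encodings of U+201C and
  U+201D.
- dequote_boolean_term is a hypothetical name.  It is not part of
  notmuch or Xapian.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: de-quote a boolean term written as "...".
 * `in` must point at the opening '"'.  Inside the quotes, "" stands
 * for a literal '"'; the first lone '"' closes the term.  Returns a
 * malloc'd string, or NULL on a missing closing quote, on trailing
 * text after it, or on allocation failure.  Curly quotes are not
 * handled in this sketch. */
static char *
dequote_boolean_term (const char *in)
{
    if (*in != '"')
        return NULL;
    ++in;

    /* The de-quoted term is never longer than the remaining input. */
    char *term = malloc (strlen (in) + 1);
    if (! term)
        return NULL;

    char *out = term;
    for (;;) {
        if (*in == '\0') {
            /* Premature end of string: no closing quote. */
            free (term);
            return NULL;
        }
        if (*in == '"') {
            ++in;
            if (*in != '"')
                break;          /* lone quote: end of the term */
            /* Doubled quote: fall through and copy a single '"'. */
        }
        *out++ = *in++;
    }

    if (*in != '\0') {
        /* Trailing data after the closing quote. */
        free (term);
        return NULL;
    }
    *out = '\0';
    return term;
}
```

For example, the input `"a""b"` de-quotes to `a"b`.  The input `"abc`
has no closing quote and is rejected.  The input `"abc"x` has trailing
text and is also rejected.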
---
 util/string-util.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 util/string-util.h |   11 +++++++++
 2 files changed, 74 insertions(+)

diff --git a/util/string-util.c b/util/string-util.c
index 161a4dd..eaa6c99 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -94,3 +94,66 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
 
     return 0;
 }
+
+static int
+consume_double_quote (const char **str)
+{
+    if (**str == '"') {
+        ++*str;
+        return 1;
+    } else if (strncmp (*str, "\xe2\x80\x9c", 3) == 0 || /* UTF8 0x201c */
+               strncmp (*str, "\xe2\x80\x9d", 3) == 0) { /* UTF8 0x201d */
+        *str += 3;
+        return 3;
+    } else {
+        return 0;
+    }
+}
+
+int
+parse_boolean_term (void *ctx, const char *str,
+                    char **prefix_out, char **term_out)
+{
+    *prefix_out = *term_out = NULL;
+
+    /* Parse prefix */
+    const char *pos = strchr (str, ':');
+    if (! pos)
+        goto FAIL;
+    *prefix_out = talloc_strndup (ctx, str, pos - str);
+    ++pos;
+
+    /* Implement Xapian's boolean term de-quoting.  This is a nearly
+     * direct translation of QueryParser::Internal::parse_query. */
+    pos = *term_out = talloc_strdup (ctx, pos);
+    if (consume_double_quote (&pos)) {
+        char *out = talloc_strdup (ctx, pos);
+        pos = *term_out = out;
+        while (1) {
+            if (! *pos) {
+                /* Premature end of string */
+                goto FAIL;
+            } else if (*pos == '"') {
+                if (*++pos != '"')
+                    break;
+            } else if (consume_double_quote (&pos)) {
+                break;
+            }
+            *out++ = *pos++;
+        }
+        if (*pos)
+            goto FAIL;
+        *out = '\0';
+    } else {
+        while ((unsigned char) *pos > ' ' && *pos != ')')
+            ++pos;
+        if (*pos)
+            goto FAIL;
+    }
+    return 0;
+
+ FAIL:
+    talloc_free (*prefix_out);
+    talloc_free (*term_out);
+    return 1;
+}
diff --git a/util/string-util.h b/util/string-util.h
index 7475e2c..e4e4c42 100644
--- a/util/string-util.h
+++ b/util/string-util.h
@@ -28,4 +28,15 @@ char *strtok_len (char *s, const char *delim, size_t *len);
 int
 make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
                    char **buf, size_t *len);
+/* Parse a boolean term query, returning the prefix in *prefix_out and
+ * the term in *term_out.  *prefix_out and *term_out will be talloc'd
+ * with context ctx.
+ *
+ * Return: 0 on success, non-zero on parse error (including trailing
+ * data in str).
+ */
+int
+parse_boolean_term (void *ctx, const char *str,
+                    char **prefix_out, char **term_out);
+
 #endif
-- 
1.7.10.4
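When scanning an unquoted term, comparing a plain char against ' '
depends on whether char is signed on the platform.  Where char is
signed, every byte of a non-ASCII UTF-8 character (0x80 to 0xff)
compares as negative.  Such a byte ends the scan early, so the term
looks as if it has trailing data and is rejected.

The following sketch of the unquoted case compares bytes as unsigned
char instead.  It is a hypothetical standalone helper:

- split_unquoted_term and copy_span are illustrative names.
- It uses malloc rather than talloc.
- It ignores the quoted form.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd, NUL-terminated copy of the
 * first n bytes of s, or NULL if allocation fails. */
static char *
copy_span (const char *s, size_t n)
{
    char *p = malloc (n + 1);
    if (p) {
        memcpy (p, s, n);
        p[n] = '\0';
    }
    return p;
}

/* Hypothetical helper: split "prefix:term" where the term is unquoted.
 * The term runs until a byte <= ' ' or a ')', and nothing may follow
 * it.  Bytes are compared as unsigned char so that UTF-8 bytes
 * (0x80..0xff) count as term characters even where char is signed.
 * Returns 0 on success (the caller frees both outputs), 1 on error. */
static int
split_unquoted_term (const char *str, char **prefix_out, char **term_out)
{
    *prefix_out = *term_out = NULL;

    const char *colon = strchr (str, ':');
    if (! colon)
        return 1;

    const char *pos = colon + 1;
    while ((unsigned char) *pos > ' ' && *pos != ')')
        ++pos;
    if (*pos)
        return 1;               /* trailing data, e.g. " bar" or ")" */

    *prefix_out = copy_span (str, colon - str);
    *term_out = copy_span (colon + 1, pos - (colon + 1));
    if (! *prefix_out || ! *term_out) {
        free (*prefix_out);
        free (*term_out);
        *prefix_out = *term_out = NULL;
        return 1;
    }
    return 0;
}
```

With this comparison, tag:café splits into the prefix tag and the term
café.  By contrast, tag:a b and tag:a) are rejected because text
follows the term.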