From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jani@nikula.org>
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 4EE11431FAF
	for <notmuch@notmuchmail.org>; Thu,  3 Jan 2013 08:49:07 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: 1.151
X-Spam-Level: *
X-Spam-Status: No, score=1.151 tagged_above=-999 required=5
	tests=[FUZZY_AMBIEN=1.851, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id yDceBnu96li7 for <notmuch@notmuchmail.org>;
	Thu,  3 Jan 2013 08:49:06 -0800 (PST)
Received: from mail-bk0-f53.google.com (mail-bk0-f53.google.com
	[209.85.214.53]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
	(No client certificate requested)
	by olra.theworths.org (Postfix) with ESMTPS id 3409C431FAE
	for <notmuch@notmuchmail.org>; Thu,  3 Jan 2013 08:49:06 -0800 (PST)
Received: by mail-bk0-f53.google.com with SMTP id j5so6793968bkw.12
	for <notmuch@notmuchmail.org>; Thu, 03 Jan 2013 08:49:03 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=x-received:from:to:cc:subject:in-reply-to:references:user-agent
	:date:message-id:mime-version:content-type:x-gm-message-state;
	bh=JT+aeu6kHcmbBMCWAjct13X/5hbEsPFm4KVnD83yHpo=;
	b=fnvnRcQm4q6rI7sPXgfoKJU69tQqGMeWWHCVpFkCSCtiMeEcjAG8Lyl9My/ZMwhdvf
	taIocJJ24meA2X86tYhHfOquDA3pQUh2J91xGpO30h8SJfiCFIEppgFH0bgGAS+JwGAe
	6U9BewuJRFWnXpGgNNYYCfxpscU68WnYYnoyH71SXNke6FBTuC768A8QdGZyRQrqUtPe
	FIyoARDD+8uBrVDCiXUl8jOhul3r5yktqK0N4d4641x9i1c2IbLeNhvZekM2dDqJLh4v
	i0vw0pyqyaxOE8B+Q0JGrHAMfSR0zIsLQaSJovUVBjcWhpj5yJKaSFy6TtZvlIL9Fi3e
	9+bA==
X-Received: by 10.204.3.220 with SMTP id 28mr23780642bko.50.1357231743438;
	Thu, 03 Jan 2013 08:49:03 -0800 (PST)
Received: from localhost ([2001:4b98:dc0:43:216:3eff:fe1b:25f3])
	by mx.google.com with ESMTPS id o7sm34593411bkv.13.2013.01.03.08.49.00
	(version=SSLv3 cipher=OTHER); Thu, 03 Jan 2013 08:49:02 -0800 (PST)
From: Jani Nikula <jani@nikula.org>
To: Austin Clements <amdragon@MIT.EDU>, notmuch@notmuchmail.org
Subject: Re: [PATCH v4 1/5] util: Factor out boolean term quoting routine
In-Reply-To: <1356936162-2589-2-git-send-email-amdragon@mit.edu>
References: <1356936162-2589-1-git-send-email-amdragon@mit.edu>
	<1356936162-2589-2-git-send-email-amdragon@mit.edu>
User-Agent: Notmuch/0.14+235~gdaf492b (http://notmuchmail.org) Emacs/23.2.1
	(x86_64-pc-linux-gnu)
Date: Thu, 03 Jan 2013 17:48:54 +0100
Message-ID: <87y5gagqkp.fsf@nikula.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Gm-Message-State: ALoCoQns7ri9as/bFmXiFxhT85GCq5nfW3PAwWJgOze5h1Lcy31e/6iRV9pgzjdMmCNys/Fm8UL+
Cc: tomi.ollila@iki.fi
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
	<notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
	<mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Jan 2013 16:49:07 -0000

On Mon, 31 Dec 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> From: Austin Clements <amdragon@MIT.EDU>
>
> This is now a generic boolean term quoting function.  It performs
> minimal quoting to produce user-friendly queries.
>
> This could live in tag-util as well, but it is really nothing specific
> to tags (although the conventions are specific to Xapian).
>
> The API is changed from "caller-allocates" to "readline-like".  The
> scan for max tag length is pushed down into the quoting routine.
> Furthermore, this now combines the term prefix with the quoted term;
> arguably this is just as easy to do in the caller, but this will
> nicely parallel the boolean term parsing function to be introduced
> shortly.
>
> This is an amalgamation of code written by David Bremner and myself.
> ---
>  notmuch-tag.c      |   48 ++++++++++++---------------------------
>  util/string-util.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util/string-util.h |   14 ++++++++++++
>  3 files changed, 92 insertions(+), 34 deletions(-)
>
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index 88d559b..fc9d43a 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
> @@ -19,6 +19,7 @@
>   */
>  
>  #include "notmuch-client.h"
> +#include "string-util.h"
>  
>  static volatile sig_atomic_t interrupted;
>  
> @@ -35,25 +36,6 @@ handle_sigint (unused (int sig))
>      interrupted = 1;
>  }
>  
> -static char *
> -_escape_tag (char *buf, const char *tag)
> -{
> -    const char *in = tag;
> -    char *out = buf;
> -
> -    /* Boolean terms surrounded by double quotes can contain any
> -     * character.  Double quotes are quoted by doubling them. */
> -    *out++ = '"';
> -    while (*in) {
> -	if (*in == '"')
> -	    *out++ = '"';
> -	*out++ = *in++;
> -    }
> -    *out++ = '"';
> -    *out = 0;
> -    return buf;
> -}
> -
>  typedef struct {
>      const char *tag;
>      notmuch_bool_t remove;
> @@ -71,25 +53,16 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
>       * parenthesize and the exclusion part of the query must not use
>       * the '-' operator (though the NOT operator is fine). */
>  
> -    char *escaped, *query_string;
> +    char *escaped = NULL;
> +    size_t escaped_len = 0;
> +    char *query_string;
>      const char *join = "";
> -    int i;
> -    unsigned int max_tag_len = 0;
> +    size_t i;
>  
>      /* Don't optimize if there are no tag changes. */
>      if (tag_ops[0].tag == NULL)
>  	return talloc_strdup (ctx, orig_query_string);
>  
> -    /* Allocate a buffer for escaping tags.  This is large enough to
> -     * hold a fully escaped tag with every character doubled plus
> -     * enclosing quotes and a NUL. */
> -    for (i = 0; tag_ops[i].tag; i++)
> -	if (strlen (tag_ops[i].tag) > max_tag_len)
> -	    max_tag_len = strlen (tag_ops[i].tag);
> -    escaped = talloc_array (ctx, char, max_tag_len * 2 + 3);
> -    if (! escaped)
> -	return NULL;
> -
>      /* Build the new query string */
>      if (strcmp (orig_query_string, "*") == 0)
>  	query_string = talloc_strdup (ctx, "(");
> @@ -97,10 +70,17 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
>  	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
>  
>      for (i = 0; tag_ops[i].tag && query_string; i++) {
> +	/* XXX in case of OOM, query_string will be deallocated when
> +	 * ctx is, which might be at shutdown */
> +	if (make_boolean_term (ctx,
> +			       "tag", tag_ops[i].tag,
> +			       &escaped, &escaped_len))
> +	    return NULL;
> +
>  	query_string = talloc_asprintf_append_buffer (
> -	    query_string, "%s%stag:%s", join,
> +	    query_string, "%s%s%s", join,
>  	    tag_ops[i].remove ? "" : "not ",
> -	    _escape_tag (escaped, tag_ops[i].tag));
> +	    escaped);
>  	join = " or ";
>      }
>  
> diff --git a/util/string-util.c b/util/string-util.c
> index 44f8cd3..e4bea21 100644
> --- a/util/string-util.c
> +++ b/util/string-util.c
> @@ -20,6 +20,7 @@
>  
>  
>  #include "string-util.h"
> +#include "talloc.h"
>  
>  char *
>  strtok_len (char *s, const char *delim, size_t *len)
> @@ -32,3 +33,66 @@ strtok_len (char *s, const char *delim, size_t *len)
>  
>      return *len ? s : NULL;
>  }
> +
> +int
> +make_boolean_term (void *ctx, const char *prefix, const char *term,
> +		   char **buf, size_t *len)
> +{
> +    const char *in;
> +    char *out;
> +    size_t needed = 3;
> +    int need_quoting = 0;
> +
> +    /* Do we need quoting?  To be paranoid, we quote anything
> +     * containing a quote, even though it only matters at the
> +     * beginning, and anything containing non-ASCII text. */
> +    for (in = term; *in && !need_quoting; in++)
> +	if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127)

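Should that be (unsigned char)*in >= 127? As written, DEL (0x7f) is a
non-printable ASCII character but does not trigger quoting, which does
not match the comment in string-util.h.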
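
A minimal standalone sketch of the difference, in case it helps. The
names needs_quoting_gt, needs_quoting_ge and compare_quoting_tests are
made up for this mail and are not part of the patch; only the condition
itself is copied from it:

```c
#include <assert.h>
#include <stdio.h>

/* Quoting test as written in the patch: bytes above 127 trigger quoting. */
int
needs_quoting_gt (const char *term)
{
    const char *in;

    for (in = term; *in; in++)
        if (*in <= ' ' || *in == ')' || *in == '"' ||
            (unsigned char) *in > 127)
            return 1;
    return 0;
}

/* Same test with >= 127, so that DEL (0x7f) triggers quoting as well. */
int
needs_quoting_ge (const char *term)
{
    const char *in;

    for (in = term; *in; in++)
        if (*in <= ' ' || *in == ')' || *in == '"' ||
            (unsigned char) *in >= 127)
            return 1;
    return 0;
}

/* Print both results for a few sample terms. */
void
compare_quoting_tests (void)
{
    /* "a\177b" contains DEL; "caf\303\251" is a UTF-8 encoded
     * non-ASCII term. */
    static const char *terms[] = {
        "inbox", "a b", "a\"b", "a\177b", "caf\303\251"
    };
    size_t i;

    for (i = 0; i < sizeof (terms) / sizeof (terms[0]); i++) {
        int gt = needs_quoting_gt (terms[i]);
        int ge = needs_quoting_ge (terms[i]);

        /* The >= variant only ever quotes more terms, never fewer. */
        assert (ge >= gt);
        printf ("term %u: > 127 -> %d, >= 127 -> %d\n", (unsigned) i, gt, ge);
    }
}
```

Calling compare_quoting_tests () from a main () should show the two
variants disagreeing only on the term containing DEL.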

Otherwise LGTM.

Jani.

> +	    need_quoting = 1;
> +
> +    if (need_quoting)
> +	for (in = term; *in; in++)
> +	    needed += (*in == '"') ? 2 : 1;
> +    else
> +	needed = strlen (term) + 1;
> +
> +    /* Reserve space for the prefix */
> +    if (prefix)
> +	needed += strlen (prefix) + 1;
> +
> +    if ((*buf == NULL) || (needed > *len)) {
> +	*len = 2 * needed;
> +	*buf = talloc_realloc (ctx, *buf, char, *len);
> +    }
> +
> +    if (! *buf)
> +	return 1;
> +
> +    out = *buf;
> +
> +    /* Copy in the prefix */
> +    if (prefix) {
> +	strcpy (out, prefix);
> +	out += strlen (prefix);
> +	*out++ = ':';
> +    }
> +
> +    if (! need_quoting) {
> +	strcpy (out, term);
> +	return 0;
> +    }
> +
> +    /* Quote term by enclosing it in double quotes and doubling any
> +     * internal double quotes. */
> +    *out++ = '"';
> +    in = term;
> +    while (*in) {
> +	if (*in == '"')
> +	    *out++ = '"';
> +	*out++ = *in++;
> +    }
> +    *out++ = '"';
> +    *out = '\0';
> +
> +    return 0;
> +}
> diff --git a/util/string-util.h b/util/string-util.h
> index ac7676c..b8844a3 100644
> --- a/util/string-util.h
> +++ b/util/string-util.h
> @@ -19,4 +19,18 @@
>  
>  char *strtok_len (char *s, const char *delim, size_t *len);
>  
> +/* Construct a boolean term query with the specified prefix (e.g.,
> + * "id") and search term, quoting term as necessary.  Specifically, if
> + * term contains any non-printable ASCII characters, non-ASCII
> + * characters, close parenthesis or double quotes, it will be enclosed
> + * in double quotes and any internal double quotes will be doubled
> + * (e.g. a"b -> "a""b").  The result will be a valid notmuch query and
> + * can be parsed by parse_boolean_term.
> + *
> + * Output is into buf; it may be talloc_realloced.
> + * Return: 0 on success, non-zero on memory allocation failure.
> + */
> +int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
> +		       char **buf, size_t *len);
> +
>  #endif
> -- 
> 1.7.10.4