unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore
@ 2012-12-26  3:48 Austin Clements
  2012-12-26  3:48 ` [PATCH v2 1/5] util: Factor out boolean term quoting routine Austin Clements
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Austin Clements @ 2012-12-26  3:48 UTC (permalink / raw)
  To: notmuch

This obsoletes

  id:1356415076-5692-1-git-send-email-amdragon@mit.edu

In addition to incorporating all of David's suggestions, this reworks
the boolean term parsing so it only handles the subset of quoting
syntax used by make_boolean_term (which also happens to be all that we
described in the man page for the format).  The diff from v1 is below.

diff --git a/man/man1/notmuch-restore.1 b/man/man1/notmuch-restore.1
index 6bba628..78fef52 100644
--- a/man/man1/notmuch-restore.1
+++ b/man/man1/notmuch-restore.1
@@ -57,10 +57,8 @@ sup calls them).
 The
 .B batch-tag
 dump format is intended to more robust against malformed message-ids
-and tags containing whitespace or non-\fBascii\fR(7) characters.  This
-format hex-escapes all characters those outside of a small character
-set, intended to be suitable for e.g. pathnames in most UNIX-like
-systems.
+and tags containing whitespace or non-\fBascii\fR(7) characters.  See
+\fBnotmuch-dump\fR(1) for details on this format.
 
 .B "notmuch restore"
 updates the maildir flags according to tag changes if the
diff --git a/test/dump-restore b/test/dump-restore
index aecc393..f9ae5b3 100755
--- a/test/dump-restore
+++ b/test/dump-restore
@@ -200,6 +200,8 @@ a
 # the next non-comment line should report an an empty tag error for
 # batch tagging, but not for restore
 + +e -- id:20091117232137.GA7669@griffis1.net
+# valid id, but warning about missing message
++e id:missing_message_id
 EOF
 
 cat <<EOF > EXPECTED
@@ -211,6 +213,7 @@ Warning: no query string after -- [+c +d --]
 Warning: hex decoding of tag %zz failed [+%zz -- id:whatever]
 Warning: cannot parse query: id:"
 Warning: not an id query: tag:abc
+Warning: cannot apply tags to missing message: missing_message_id
 EOF
 
 test_expect_equal_file EXPECTED OUTPUT
diff --git a/test/random-corpus.c b/test/random-corpus.c
index d0e3e8f..8b7748e 100644
--- a/test/random-corpus.c
+++ b/test/random-corpus.c
@@ -96,9 +96,9 @@ random_utf8_string (void *ctx, size_t char_count)
 	    buf = talloc_realloc (ctx, buf, gchar, buf_size);
 	}
 
-	randomchar = random_unichar ();
-	if (randomchar == '\n')
-	    randomchar = 'x';
+	do {
+	    randomchar = random_unichar ();
+	} while (randomchar == '\n');
 
 	written = g_unichar_to_utf8 (randomchar, buf + offset);
 
diff --git a/util/string-util.c b/util/string-util.c
index eaa6c99..db01b4b 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -43,9 +43,11 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
     size_t needed = 3;
     int need_quoting = 0;
 
-    /* Do we need quoting? */
+    /* Do we need quoting?  To be paranoid, we quote anything
+     * containing a quote, even though it only matters at the
+     * beginning, and anything containing non-ASCII text. */
     for (in = term; *in && !need_quoting; in++)
-	if (*in <= ' ' || *in == ')' || *in == '"')
+	if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127)
 	    need_quoting = 1;
 
     if (need_quoting)
@@ -95,21 +97,6 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
     return 0;
 }
 
-static int
-consume_double_quote (const char **str)
-{
-    if (**str == '"') {
-	++*str;
-	return 1;
-    } else if (strncmp(*str, "\xe2\x80\x9c", 3) == 0 || /* UTF8 0x201c */
-	       strncmp(*str, "\xe2\x80\x9d", 3) == 0) { /* UTF8 0x201d */
-	*str += 3;
-	return 3;
-    } else {
-	return 0;
-    }
-}
-
 int
 parse_boolean_term (void *ctx, const char *str,
 		    char **prefix_out, char **term_out)
@@ -123,28 +110,31 @@ parse_boolean_term (void *ctx, const char *str,
     *prefix_out = talloc_strndup (ctx, str, pos - str);
     ++pos;
 
-    /* Implement Xapian's boolean term de-quoting.  This is a nearly
-     * direct translation of QueryParser::Internal::parse_query. */
-    pos = *term_out = talloc_strdup (ctx, pos);
-    if (consume_double_quote (&pos)) {
-	char *out = talloc_strdup (ctx, pos);
-	pos = *term_out = out;
-	while (1) {
-	    if (! *pos) {
-		/* Premature end of string */
-		goto FAIL;
-	    } else if (*pos == '"') {
-		if (*++pos != '"')
+    /* Implement de-quoting compatible with make_boolean_term. */
+    if (*pos == '"') {
+	char *out = talloc_strdup (ctx, pos + 1);
+	int closed = 0;
+	/* Find the closing quote and un-double doubled internal
+	 * quotes. */
+	for (pos = *term_out = out; *pos; ) {
+	    if (*pos == '"') {
+		++pos;
+		if (*pos != '"') {
+		    /* Found the closing quote. */
+		    closed = 1;
 		    break;
-	    } else if (consume_double_quote (&pos)) {
-		break;
+		}
 	    }
 	    *out++ = *pos++;
 	}
-	if (*pos)
+	/* Did the term terminate without a closing quote or is there
+	 * trailing text after the closing quote? */
+	if (!closed || *pos)
 	    goto FAIL;
 	*out = '\0';
     } else {
+	*term_out = talloc_strdup (ctx, pos);
+	/* Check for text after the boolean term. */
 	while (*pos > ' ' && *pos != ')')
 	    ++pos;
 	if (*pos)
diff --git a/util/string-util.h b/util/string-util.h
index e4e4c42..aff2d65 100644
--- a/util/string-util.h
+++ b/util/string-util.h
@@ -28,9 +28,9 @@ char *strtok_len (char *s, const char *delim, size_t *len);
 int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
 		       char **buf, size_t *len);
 
-/* Parse a boolean term query, returning the prefix in *prefix_out and
- * the term in *term_out.  *prefix_out and *term_out will be talloc'd
- * with context ctx.
+/* Parse a boolean term query produced by make_boolean_term, returning
+ * the prefix in *prefix_out and the term in *term_out.  *prefix_out
+ * and *term_out will be talloc'd with context ctx.
  *
  * Return: 0 on success, non-zero on parse error (including trailing
  * data in str).

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 1/5] util: Factor out boolean term quoting routine
  2012-12-26  3:48 [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore Austin Clements
@ 2012-12-26  3:48 ` Austin Clements
  2012-12-26 14:24   ` David Bremner
  2012-12-26  3:48 ` [PATCH v2 2/5] util: Function to parse boolean term queries Austin Clements
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Austin Clements @ 2012-12-26  3:48 UTC (permalink / raw)
  To: notmuch

From: Austin Clements <amdragon@MIT.EDU>

This is now a generic boolean term quoting function.  It performs
minimal quoting to produce user-friendly queries.

This could live in tag-util as well, but it is really nothing specific
to tags (although the conventions are specific to Xapian).

The API is changed from "caller-allocates" to "readline-like".  The
scan for max tag length is pushed down into the quoting routine.
Furthermore, this now combines the term prefix with the quoted term;
arguably this is just as easy to do in the caller, but this will
nicely parallel the boolean term parsing function to be introduced
shortly.

This is an amalgamation of code written by David Bremner and myself.
---
 notmuch-tag.c      |   48 ++++++++++++---------------------------
 util/string-util.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 util/string-util.h |    9 ++++++++
 3 files changed, 87 insertions(+), 34 deletions(-)

diff --git a/notmuch-tag.c b/notmuch-tag.c
index 88d559b..fc9d43a 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -19,6 +19,7 @@
  */
 
 #include "notmuch-client.h"
+#include "string-util.h"
 
 static volatile sig_atomic_t interrupted;
 
@@ -35,25 +36,6 @@ handle_sigint (unused (int sig))
     interrupted = 1;
 }
 
-static char *
-_escape_tag (char *buf, const char *tag)
-{
-    const char *in = tag;
-    char *out = buf;
-
-    /* Boolean terms surrounded by double quotes can contain any
-     * character.  Double quotes are quoted by doubling them. */
-    *out++ = '"';
-    while (*in) {
-	if (*in == '"')
-	    *out++ = '"';
-	*out++ = *in++;
-    }
-    *out++ = '"';
-    *out = 0;
-    return buf;
-}
-
 typedef struct {
     const char *tag;
     notmuch_bool_t remove;
@@ -71,25 +53,16 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
      * parenthesize and the exclusion part of the query must not use
      * the '-' operator (though the NOT operator is fine). */
 
-    char *escaped, *query_string;
+    char *escaped = NULL;
+    size_t escaped_len = 0;
+    char *query_string;
     const char *join = "";
-    int i;
-    unsigned int max_tag_len = 0;
+    size_t i;
 
     /* Don't optimize if there are no tag changes. */
     if (tag_ops[0].tag == NULL)
 	return talloc_strdup (ctx, orig_query_string);
 
-    /* Allocate a buffer for escaping tags.  This is large enough to
-     * hold a fully escaped tag with every character doubled plus
-     * enclosing quotes and a NUL. */
-    for (i = 0; tag_ops[i].tag; i++)
-	if (strlen (tag_ops[i].tag) > max_tag_len)
-	    max_tag_len = strlen (tag_ops[i].tag);
-    escaped = talloc_array (ctx, char, max_tag_len * 2 + 3);
-    if (! escaped)
-	return NULL;
-
     /* Build the new query string */
     if (strcmp (orig_query_string, "*") == 0)
 	query_string = talloc_strdup (ctx, "(");
@@ -97,10 +70,17 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
 	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
 
     for (i = 0; tag_ops[i].tag && query_string; i++) {
+	/* XXX in case of OOM, query_string will be deallocated when
+	 * ctx is, which might be at shutdown */
+	if (make_boolean_term (ctx,
+			       "tag", tag_ops[i].tag,
+			       &escaped, &escaped_len))
+	    return NULL;
+
 	query_string = talloc_asprintf_append_buffer (
-	    query_string, "%s%stag:%s", join,
+	    query_string, "%s%s%s", join,
 	    tag_ops[i].remove ? "" : "not ",
-	    _escape_tag (escaped, tag_ops[i].tag));
+	    escaped);
 	join = " or ";
     }
 
diff --git a/util/string-util.c b/util/string-util.c
index 44f8cd3..e4bea21 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -20,6 +20,7 @@
 
 
 #include "string-util.h"
+#include "talloc.h"
 
 char *
 strtok_len (char *s, const char *delim, size_t *len)
@@ -32,3 +33,66 @@ strtok_len (char *s, const char *delim, size_t *len)
 
     return *len ? s : NULL;
 }
+
+int
+make_boolean_term (void *ctx, const char *prefix, const char *term,
+		   char **buf, size_t *len)
+{
+    const char *in;
+    char *out;
+    size_t needed = 3;
+    int need_quoting = 0;
+
+    /* Do we need quoting?  To be paranoid, we quote anything
+     * containing a quote, even though it only matters at the
+     * beginning, and anything containing non-ASCII text. */
+    for (in = term; *in && !need_quoting; in++)
+	if (*in <= ' ' || *in == ')' || *in == '"' || (unsigned char)*in > 127)
+	    need_quoting = 1;
+
+    if (need_quoting)
+	for (in = term; *in; in++)
+	    needed += (*in == '"') ? 2 : 1;
+    else
+	needed = strlen (term) + 1;
+
+    /* Reserve space for the prefix */
+    if (prefix)
+	needed += strlen (prefix) + 1;
+
+    if ((*buf == NULL) || (needed > *len)) {
+	*len = 2 * needed;
+	*buf = talloc_realloc (ctx, *buf, char, *len);
+    }
+
+    if (! *buf)
+	return 1;
+
+    out = *buf;
+
+    /* Copy in the prefix */
+    if (prefix) {
+	strcpy (out, prefix);
+	out += strlen (prefix);
+	*out++ = ':';
+    }
+
+    if (! need_quoting) {
+	strcpy (out, term);
+	return 0;
+    }
+
+    /* Quote term by enclosing it in double quotes and doubling any
+     * internal double quotes. */
+    *out++ = '"';
+    in = term;
+    while (*in) {
+	if (*in == '"')
+	    *out++ = '"';
+	*out++ = *in++;
+    }
+    *out++ = '"';
+    *out = '\0';
+
+    return 0;
+}
diff --git a/util/string-util.h b/util/string-util.h
index ac7676c..7475e2c 100644
--- a/util/string-util.h
+++ b/util/string-util.h
@@ -19,4 +19,13 @@
 
 char *strtok_len (char *s, const char *delim, size_t *len);
 
+/* Construct a boolean term query with the specified prefix (e.g.,
+ * "id") and search term, quoting term as necessary.
+ *
+ * Output is into buf; it may be talloc_realloced.
+ * Return: 0 on success, non-zero on memory allocation failure.
+ */
+int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
+		       char **buf, size_t *len);
+
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/5] util: Function to parse boolean term queries
  2012-12-26  3:48 [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore Austin Clements
  2012-12-26  3:48 ` [PATCH v2 1/5] util: Factor out boolean term quoting routine Austin Clements
@ 2012-12-26  3:48 ` Austin Clements
  2012-12-26 14:31   ` David Bremner
  2012-12-26  3:48 ` [PATCH v2 3/5] dump: Disallow \n in message IDs Austin Clements
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Austin Clements @ 2012-12-26  3:48 UTC (permalink / raw)
  To: notmuch

This parses the subset of Xapian's boolean term quoting rules that are
used by make_boolean_term.  This is provided as a generic string
utility, but will be used shortly in notmuch restore to parse and
optimize for ID queries.
---
 util/string-util.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 util/string-util.h |   11 +++++++++++
 2 files changed, 62 insertions(+)

diff --git a/util/string-util.c b/util/string-util.c
index e4bea21..db01b4b 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -96,3 +96,54 @@ make_boolean_term (void *ctx, const char *prefix, const char *term,
 
     return 0;
 }
+
+int
+parse_boolean_term (void *ctx, const char *str,
+		    char **prefix_out, char **term_out)
+{
+    *prefix_out = *term_out = NULL;
+
+    /* Parse prefix */
+    const char *pos = strchr (str, ':');
+    if (! pos)
+	goto FAIL;
+    *prefix_out = talloc_strndup (ctx, str, pos - str);
+    ++pos;
+
+    /* Implement de-quoting compatible with make_boolean_term. */
+    if (*pos == '"') {
+	char *out = talloc_strdup (ctx, pos + 1);
+	int closed = 0;
+	/* Find the closing quote and un-double doubled internal
+	 * quotes. */
+	for (pos = *term_out = out; *pos; ) {
+	    if (*pos == '"') {
+		++pos;
+		if (*pos != '"') {
+		    /* Found the closing quote. */
+		    closed = 1;
+		    break;
+		}
+	    }
+	    *out++ = *pos++;
+	}
+	/* Did the term terminate without a closing quote or is there
+	 * trailing text after the closing quote? */
+	if (!closed || *pos)
+	    goto FAIL;
+	*out = '\0';
+    } else {
+	*term_out = talloc_strdup (ctx, pos);
+	/* Check for text after the boolean term. */
+	while (*pos > ' ' && *pos != ')')
+	    ++pos;
+	if (*pos)
+	    goto FAIL;
+    }
+    return 0;
+
+ FAIL:
+    talloc_free (*prefix_out);
+    talloc_free (*term_out);
+    return 1;
+}
diff --git a/util/string-util.h b/util/string-util.h
index 7475e2c..aff2d65 100644
--- a/util/string-util.h
+++ b/util/string-util.h
@@ -28,4 +28,15 @@ char *strtok_len (char *s, const char *delim, size_t *len);
 int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
 		       char **buf, size_t *len);
 
+/* Parse a boolean term query produced by make_boolean_term, returning
+ * the prefix in *prefix_out and the term in *term_out.  *prefix_out
+ * and *term_out will be talloc'd with context ctx.
+ *
+ * Return: 0 on success, non-zero on parse error (including trailing
+ * data in str).
+ */
+int
+parse_boolean_term (void *ctx, const char *str,
+		    char **prefix_out, char **term_out);
+
 #endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/5] dump: Disallow \n in message IDs
  2012-12-26  3:48 [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore Austin Clements
  2012-12-26  3:48 ` [PATCH v2 1/5] util: Factor out boolean term quoting routine Austin Clements
  2012-12-26  3:48 ` [PATCH v2 2/5] util: Function to parse boolean term queries Austin Clements
@ 2012-12-26  3:48 ` Austin Clements
  2012-12-26  3:48 ` [PATCH v2 4/5] dump/restore: Use Xapian queries for batch-tag format Austin Clements
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Austin Clements @ 2012-12-26  3:48 UTC (permalink / raw)
  To: notmuch

When we switch to using regular Xapian queries in the dump format, \n
will cause problems, so we disallow it.  Specially, while Xapian can
quote and parse queries containing \n without difficultly, quoted
queries containing \n still span multiple lines, which breaks the
line-orientedness of the dump format.  Strictly speaking, we could
still round-trip these, but it would significantly complicate restore
as well as scripts that deal with tag dumps.  This complexity would
come at absolutely no benefit: because of the RFC 2822 unfolding
rules, no amount of standards negligence can produce a message with a
message ID containing a line break (not even Outlook can do it!).

Hence, we simply disallow it.
---
 notmuch-dump.c       |    9 +++++++++
 test/random-corpus.c |    4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/notmuch-dump.c b/notmuch-dump.c
index d2dad40..29d79da 100644
--- a/notmuch-dump.c
+++ b/notmuch-dump.c
@@ -132,6 +132,15 @@ notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
 	if (output_format == DUMP_FORMAT_SUP) {
 	    fputs (")\n", output);
 	} else {
+	    if (strchr (message_id, '\n')) {
+		/* This will produce a line break in the output, which
+		 * would be difficult to handle in tools.  However,
+		 * it's also impossible to produce an email containing
+		 * a line break in a message ID because of unfolding,
+		 * so we can safely disallow it. */
+		fprintf (stderr, "Error: cannot dump message id containing line break: %s\n", message_id);
+		return 1;
+	    }
 	    if (hex_encode (notmuch, message_id,
 			    &buffer, &buffer_size) != HEX_SUCCESS) {
 		    fprintf (stderr, "Error: failed to hex-encode msg-id %s\n",
diff --git a/test/random-corpus.c b/test/random-corpus.c
index f354d4b..8b7748e 100644
--- a/test/random-corpus.c
+++ b/test/random-corpus.c
@@ -96,7 +96,9 @@ random_utf8_string (void *ctx, size_t char_count)
 	    buf = talloc_realloc (ctx, buf, gchar, buf_size);
 	}
 
-	randomchar = random_unichar ();
+	do {
+	    randomchar = random_unichar ();
+	} while (randomchar == '\n');
 
 	written = g_unichar_to_utf8 (randomchar, buf + offset);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 4/5] dump/restore: Use Xapian queries for batch-tag format
  2012-12-26  3:48 [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore Austin Clements
                   ` (2 preceding siblings ...)
  2012-12-26  3:48 ` [PATCH v2 3/5] dump: Disallow \n in message IDs Austin Clements
@ 2012-12-26  3:48 ` Austin Clements
  2012-12-26  3:48 ` [PATCH v2 5/5] man: Update notmuch-dump(1) and notmuch-restore(1) Austin Clements
  2012-12-26 14:55 ` [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore David Bremner
  5 siblings, 0 replies; 9+ messages in thread
From: Austin Clements @ 2012-12-26  3:48 UTC (permalink / raw)
  To: notmuch

This switches the new batch-tag format away from using a home-grown
hex-encoding scheme for message IDs in the dump to simply using Xapian
queries with Xapian quoting syntax.

This has a variety of advantages beyond presenting a cleaner and more
consistent interface.  Foremost is that it will dramatically simplify
the quoting for batch tagging, which shares the same input format.
While the hex-encoding is no better or worse for the simple ID queries
used by dump/restore, it becomes onerous for general-purpose queries
used in batch tagging.  It also better handles strange cases like
"id:foo and bar", since this is no longer syntactically valid.
---
 notmuch-dump.c    |    9 +++++----
 notmuch-restore.c |   22 ++++++++++------------
 tag-util.c        |    6 ------
 test/dump-restore |   14 ++++++++------
 4 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/notmuch-dump.c b/notmuch-dump.c
index 29d79da..bf01a39 100644
--- a/notmuch-dump.c
+++ b/notmuch-dump.c
@@ -20,6 +20,7 @@
 
 #include "notmuch-client.h"
 #include "dump-restore-private.h"
+#include "string-util.h"
 
 int
 notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
@@ -141,13 +142,13 @@ notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
 		fprintf (stderr, "Error: cannot dump message id containing line break: %s\n", message_id);
 		return 1;
 	    }
-	    if (hex_encode (notmuch, message_id,
-			    &buffer, &buffer_size) != HEX_SUCCESS) {
-		    fprintf (stderr, "Error: failed to hex-encode msg-id %s\n",
+	    if (make_boolean_term (notmuch, "id", message_id,
+				   &buffer, &buffer_size)) {
+		    fprintf (stderr, "Error: failed to quote message id %s\n",
 			     message_id);
 		    return 1;
 	    }
-	    fprintf (output, " -- id:%s\n", buffer);
+	    fprintf (output, " -- %s\n", buffer);
 	}
 
 	notmuch_message_destroy (message);
diff --git a/notmuch-restore.c b/notmuch-restore.c
index 9ed9b51..77a4c27 100644
--- a/notmuch-restore.c
+++ b/notmuch-restore.c
@@ -207,7 +207,7 @@ notmuch_restore_command (unused (void *ctx), int argc, char *argv[])
 	    INTERNAL_ERROR ("compile time constant regex failed.");
 
     do {
-	char *query_string;
+	char *query_string, *prefix, *term;
 
 	if (line_ctx != NULL)
 	    talloc_free (line_ctx);
@@ -220,19 +220,17 @@ notmuch_restore_command (unused (void *ctx), int argc, char *argv[])
 				  &query_string, tag_ops);
 
 	    if (ret == 0) {
-		if (strncmp ("id:", query_string, 3) != 0) {
-		    fprintf (stderr, "Warning: unsupported query: %s\n", query_string);
+		ret = parse_boolean_term (line_ctx, query_string,
+					  &prefix, &term);
+		if (ret) {
+		    fprintf (stderr, "Warning: cannot parse query: %s\n",
+			     query_string);
+		    continue;
+		} else if (strcmp ("id", prefix) != 0) {
+		    fprintf (stderr, "Warning: not an id query: %s\n", query_string);
 		    continue;
 		}
-		/* delete id: from front of string; tag_message
-		 * expects a raw message-id.
-		 *
-		 * XXX: Note that query string id:foo and bar will be
-		 * interpreted as a message id "foo and bar". This
-		 * should eventually be fixed to give a better error
-		 * message.
-		 */
-		query_string = query_string + 3;
+		query_string = term;
 	    }
 	}
 
diff --git a/tag-util.c b/tag-util.c
index 705b7ba..e4e5dda 100644
--- a/tag-util.c
+++ b/tag-util.c
@@ -124,12 +124,6 @@ parse_tag_line (void *ctx, char *line,
     }
 
     /* tok now points to the query string */
-    if (hex_decode_inplace (tok) != HEX_SUCCESS) {
-	ret = line_error (TAG_PARSE_INVALID, line_for_error,
-			  "hex decoding of query %s failed", tok);
-	goto DONE;
-    }
-
     *query_string = tok;
 
   DONE:
diff --git a/test/dump-restore b/test/dump-restore
index 6a989b6..f9ae5b3 100755
--- a/test/dump-restore
+++ b/test/dump-restore
@@ -195,23 +195,25 @@ a
 
 # the previous line was blank; also no yelling please
 +%zz -- id:whatever
-+e +f id:%yy
++e +f id:"
++e +f tag:abc
 # the next non-comment line should report an an empty tag error for
 # batch tagging, but not for restore
 + +e -- id:20091117232137.GA7669@griffis1.net
-# highlight the sketchy id parsing; this should be last
-+g -- id:foo and bar
+# valid id, but warning about missing message
++e id:missing_message_id
 EOF
 
 cat <<EOF > EXPECTED
-Warning: unsupported query: a
+Warning: cannot parse query: a
 Warning: no query string [+0]
 Warning: no query string [+a +b]
 Warning: missing query string [+a +b ]
 Warning: no query string after -- [+c +d --]
 Warning: hex decoding of tag %zz failed [+%zz -- id:whatever]
-Warning: hex decoding of query id:%yy failed [+e +f id:%yy]
-Warning: cannot apply tags to missing message: foo and bar
+Warning: cannot parse query: id:"
+Warning: not an id query: tag:abc
+Warning: cannot apply tags to missing message: missing_message_id
 EOF
 
 test_expect_equal_file EXPECTED OUTPUT
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 5/5] man: Update notmuch-dump(1) and notmuch-restore(1)
  2012-12-26  3:48 [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore Austin Clements
                   ` (3 preceding siblings ...)
  2012-12-26  3:48 ` [PATCH v2 4/5] dump/restore: Use Xapian queries for batch-tag format Austin Clements
@ 2012-12-26  3:48 ` Austin Clements
  2012-12-26 14:55 ` [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore David Bremner
  5 siblings, 0 replies; 9+ messages in thread
From: Austin Clements @ 2012-12-26  3:48 UTC (permalink / raw)
  To: notmuch

Describe the new batch-tag format.  For notmuch-restore, rather than
half-heartedly duplicating the description, we now cite notmuch-dump.
---
 man/man1/notmuch-dump.1    |   11 +++++++----
 man/man1/notmuch-restore.1 |    6 ++----
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/man/man1/notmuch-dump.1 b/man/man1/notmuch-dump.1
index 770b00f..7bd6def 100644
--- a/man/man1/notmuch-dump.1
+++ b/man/man1/notmuch-dump.1
@@ -64,13 +64,16 @@ and tags containing whitespace or non-\fBascii\fR(7) characters.
 Each line has the form
 
 .RS 4
-.RI "+<" "encoded-tag" "> " "" "+<" "encoded-tag" "> ... -- " "" " id:<" encoded-message-id >
+.RI "+<" "encoded-tag" "> " "" "+<" "encoded-tag" "> ... -- " "" " id:<" quoted-message-id >
 
-where encoded means that every byte not matching the regex
+Tags are hex-encoded by replacing every byte not matching the regex
 .B [A-Za-z0-9@=.,_+-]
-is replace by
+with
 .B %nn
-where nn is the two digit hex encoding.
+where nn is the two digit hex encoding.  The message ID is a valid Xapian
+query, quoted using Xapian boolean term quoting rules: if the ID contains
+whitespace or a close paren or starts with a double quote, it must be
+enclosed in double quotes and double quotes inside the ID must be doubled.
 The astute reader will notice this is a special case of the batch input
 format for \fBnotmuch-tag\fR(1); note that the single message-id query is
 mandatory for \fBnotmuch-restore\fR(1).
diff --git a/man/man1/notmuch-restore.1 b/man/man1/notmuch-restore.1
index 6bba628..78fef52 100644
--- a/man/man1/notmuch-restore.1
+++ b/man/man1/notmuch-restore.1
@@ -57,10 +57,8 @@ sup calls them).
 The
 .B batch-tag
 dump format is intended to more robust against malformed message-ids
-and tags containing whitespace or non-\fBascii\fR(7) characters.  This
-format hex-escapes all characters those outside of a small character
-set, intended to be suitable for e.g. pathnames in most UNIX-like
-systems.
+and tags containing whitespace or non-\fBascii\fR(7) characters.  See
+\fBnotmuch-dump\fR(1) for details on this format.
 
 .B "notmuch restore"
 updates the maildir flags according to tag changes if the
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/5] util: Factor out boolean term quoting routine
  2012-12-26  3:48 ` [PATCH v2 1/5] util: Factor out boolean term quoting routine Austin Clements
@ 2012-12-26 14:24   ` David Bremner
  0 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2012-12-26 14:24 UTC (permalink / raw)
  To: Austin Clements, notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> +/* Construct a boolean term query with the specified prefix (e.g.,
> + * "id") and search term, quoting term as necessary.
> + *
> + * Output is into buf; it may be talloc_realloced.
> + * Return: 0 on success, non-zero on memory allocation failure.
> + */
> +int make_boolean_term (void *talloc_ctx, const char *prefix, const char *term,
> +		       char **buf, size_t *len);
> +

I think the quoting should be described a bit more precisely here,
especially since you are going to refer to it in the comment above
parse_boolean_term.

d

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/5] util: Function to parse boolean term queries
  2012-12-26  3:48 ` [PATCH v2 2/5] util: Function to parse boolean term queries Austin Clements
@ 2012-12-26 14:31   ` David Bremner
  0 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2012-12-26 14:31 UTC (permalink / raw)
  To: Austin Clements, notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> +	char *out = talloc_strdup (ctx, pos + 1);
> +	int closed = 0;
> +	/* Find the closing quote and un-double doubled internal
> +	 * quotes. */
> +	for (pos = *term_out = out; *pos; ) {

Since you strdup anyway, wouldn't it be easier to understand if pos read
from the input string and out wrote to term_out? Something like
(untested) 

index db01b4b..e4157d0 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -112,11 +112,12 @@ parse_boolean_term (void *ctx, const char *str,
 
     /* Implement de-quoting compatible with make_boolean_term. */
     if (*pos == '"') {
-       char *out = talloc_strdup (ctx, pos + 1);
+       char *out;
        int closed = 0;
+       *term_out= talloc_strdup (ctx, pos + 1);
        /* Find the closing quote and un-double doubled internal
         * quotes. */
-       for (pos = *term_out = out; *pos; ) {
+       for (out = *term_out; *pos; ) {
            if (*pos == '"') {
                ++pos;


Perhaps the two talloc_strdups can even be unified, but I wouldn't worry
too much about that.

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore
  2012-12-26  3:48 [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore Austin Clements
                   ` (4 preceding siblings ...)
  2012-12-26  3:48 ` [PATCH v2 5/5] man: Update notmuch-dump(1) and notmuch-restore(1) Austin Clements
@ 2012-12-26 14:55 ` David Bremner
  5 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2012-12-26 14:55 UTC (permalink / raw)
  To: Austin Clements, notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> This obsoletes
>
>   id:1356415076-5692-1-git-send-email-amdragon@mit.edu
>
> In addition to incorporating all of David's suggestions, this reworks
> the boolean term parsing so it only handles the subset of quoting
> syntax used by make_boolean_term (which also happens to be all that we
> described in the man page for the format).  The diff from v1 is below.

Implementation wise this series now looks OK to me; I think the dequote
routine could be slightly clearer, but it's a bit subjective at this
point.

IMHO the real question is if we like (or can live with) the resulting UI
for batch tagging.  Mark ( id:87ehid5h64.fsf@qmul.ac.uk ) brings up the
question of whether tags with newlines should be somehow supported in a
batch-tagging context. The other UI issue I'm aware of is the fact that
tags with spaces needed to be quoted in two different ways for tag
operations and for queries.  A relatively smooth upgrade path would be
to support Xapian quoting for tag operations in the future.

d

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-12-26 14:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-26  3:48 [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore Austin Clements
2012-12-26  3:48 ` [PATCH v2 1/5] util: Factor out boolean term quoting routine Austin Clements
2012-12-26 14:24   ` David Bremner
2012-12-26  3:48 ` [PATCH v2 2/5] util: Function to parse boolean term queries Austin Clements
2012-12-26 14:31   ` David Bremner
2012-12-26  3:48 ` [PATCH v2 3/5] dump: Disallow \n in message IDs Austin Clements
2012-12-26  3:48 ` [PATCH v2 4/5] dump/restore: Use Xapian queries for batch-tag format Austin Clements
2012-12-26  3:48 ` [PATCH v2 5/5] man: Update notmuch-dump(1) and notmuch-restore(1) Austin Clements
2012-12-26 14:55 ` [PATCH v2 0/5] Use Xapian query syntax for batch-tag dump/restore David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).