unofficial mirror of notmuch@notmuchmail.org
* [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more
@ 2011-01-16  8:10 Austin Clements
  2011-01-16  8:10 ` [PATCH 1/8] Implement a custom query parser with a mostly Xapian-compatible grammar Austin Clements
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

This is version 2 of the custom query parser.  It now supports date
searches with sane syntax, folder searches (without any additions or
changes to the database, unlike cworth's recent commit), and "tag:*"
and "-tag:*" queries for finding tagged and untagged messages.  I used
these features to guide changes to the original design and to validate
the approach.  This is still RFC, but it's much less raw now.

In addition to the new features, the core query parser has a bunch of
cleanups and changes, including completely redone NEAR and ADJ
operators that now behave essentially the same as they do in Xapian's
query parser.  I also split the implementation of these out into a
separate patch for ease of review.

There's a notable lack of tests in this current series.  I do have a
pile of tests for the lexer, parser, and generator, but the
infrastructure for testing them needs cleanup before I send that out.


* [PATCH 1/8] Implement a custom query parser with a mostly Xapian-compatible grammar.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-21  6:37   ` [PATCH 1.5/8] Query parser testing framework and basic tests Austin Clements
  2011-01-16  8:10 ` [PATCH 2/8] Parse NEAR and ADJ operators Austin Clements
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

This parser takes an extra step through an intermediate representation
that is convenient to manipulate via pluggable transformation passes.
These are used to implement regular Xapian-style query prefixes, but
are flexible enough to accomplish far more.
---
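
The idea of an IR rewritten by pluggable transformation passes can be
sketched independently of notmuch.  The block below is only an
illustration under assumptions: Node, Kind, make, db_prefix_pass and
run_transforms are hypothetical stand-ins, not the types or functions
added by this patch, and "K" is a placeholder database prefix.  It
shows one pass that maps a syntactic prefix onto a database prefix by
marking the terms beneath the matching PREFIX node.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

/* Hypothetical, much-reduced stand-in for the patch's token types. */
enum class Kind { And, Prefix, Terms };

struct Node {
    Kind kind;
    std::string text;		/* prefix name or term text */
    std::string db_prefix;	/* filled in by a transform pass */
    std::unique_ptr<Node> left, right;
};

using NodePtr = std::unique_ptr<Node>;
/* A pass takes the AST root and returns the (possibly new) root. */
using Transform = std::function<NodePtr (NodePtr)>;

NodePtr
make (Kind kind, std::string text, NodePtr left = nullptr,
      NodePtr right = nullptr)
{
    auto n = std::make_unique<Node> ();
    n->kind = kind;
    n->text = std::move (text);
    n->left = std::move (left);
    n->right = std::move (right);
    return n;
}

/* Mark every Terms node in the subtree with the database prefix. */
static void
set_db_prefix (Node *n, const std::string &db_prefix)
{
    if (!n)
	return;
    if (n->kind == Kind::Terms)
	n->db_prefix = db_prefix;
    set_db_prefix (n->left.get (), db_prefix);
    set_db_prefix (n->right.get (), db_prefix);
}

/* Find Prefix nodes named 'field' and mark the terms below them. */
static void
rewrite (Node *n, const std::string &field, const std::string &db_prefix)
{
    if (!n)
	return;
    if (n->kind == Kind::Prefix && n->text == field)
	set_db_prefix (n->left.get (), db_prefix);
    rewrite (n->left.get (), field, db_prefix);
    rewrite (n->right.get (), field, db_prefix);
}

/* Build a pass mapping syntactic prefix 'field' to 'db_prefix'. */
Transform
db_prefix_pass (std::string field, std::string db_prefix)
{
    return [field, db_prefix] (NodePtr root) {
	rewrite (root.get (), field, db_prefix);
	return root;
    };
}

/* Run the passes in the order given, threading the root through. */
NodePtr
run_transforms (const std::vector<Transform> &passes, NodePtr root)
{
    for (const auto &pass : passes) {
	assert (pass);		/* every pass must be callable */
	root = pass (std::move (root));
    }
    return root;
}
```

Given a tree shaped like (AND (PREFIX/tag "inbox") "hello"), running
db_prefix_pass ("tag", "K") through run_transforms leaves "inbox"
carrying the database prefix "K" and "hello" with none.  Keeping the
pass a plain function from root to root is what lets passes be
registered in any number and run in a fixed order.
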
 lib/Makefile.local     |    1 +
 lib/database-private.h |    9 +
 lib/notmuch-private.h  |  134 +++++++
 lib/qparser.cc         |  920 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 1064 insertions(+), 0 deletions(-)
 create mode 100644 lib/qparser.cc

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 37d1c0d..37d3735 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -64,6 +64,7 @@ libnotmuch_cxx_srcs =		\
 	$(dir)/index.cc		\
 	$(dir)/message.cc	\
 	$(dir)/query.cc		\
+	$(dir)/qparser.cc	\
 	$(dir)/thread.cc
 
 libnotmuch_modules = $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
diff --git a/lib/database-private.h b/lib/database-private.h
index 9f83407..5d2fa02 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -64,6 +64,15 @@ _notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i,
 				Xapian::TermIterator &end,
 				const char *prefix);
 
+/* qparser.cc */
+
+/* Generate a Xapian query from a query AST.  If an error occurs,
+ * *error_out will be set to the text of that error.  Otherwise,
+ * *error_out will be set to NULL. */
+Xapian::Query
+_notmuch_qparser_generate (const void *ctx, _notmuch_qparser_t *qparser,
+			   _notmuch_token_t *root, char **error_out);
+
 #pragma GCC visibility pop
 
 #endif
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index b6f1095..06239b9 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -508,6 +508,140 @@ notmuch_filenames_t *
 _notmuch_filenames_create (const void *ctx,
 			   notmuch_string_list_t *list);
 
+/* qparser.cc */
+
+typedef struct _notmuch_qparser _notmuch_qparser_t;
+
+enum _notmuch_token_type {
+    /* These first four token types appear only in the lexer output
+     * and never in the parse tree. */
+    TOK_LOVE, TOK_HATE, TOK_BRA, TOK_KET,
+    /* Binary operators.  These should have left and right children. */
+    TOK_AND, TOK_OR, TOK_XOR,
+    /* Unary operators.  These have only a left child.  Xapian::Query
+     * has no pure NOT operator, so the generator treats NOT as the
+     * child of an AND specially, and otherwise represents it as
+     * "<all> AND_NOT x".  FILTER ignores the weights of the subquery
+     * and generates Xapian::Query::OP_FILTER if the left child of an
+     * AND or Xapian::Query::OP_SCALE_WEIGHT otherwise.  The text
+     * field of a PREFIX operator specifies the prefix.  PREFIX
+     * operators specify only syntactic prefixes, not database
+     * prefixes, and thus have no effect on the generated query. */
+    TOK_NOT, TOK_FILTER, TOK_PREFIX,
+    /* A single TOK_TERMS token can represent a single term, a quoted
+     * phrase, or an implicit phrase.  An implicit phrase is something
+     * like "foo/bar", for which the database contains two separate
+     * terms, but you want to treat them as a phrase, even though it's
+     * not quoted.  Xapian calls characters that implicitly connect
+     * terms into phrases "phrase generators."  We take a simpler
+     * approach and treat almost any non-whitespace character as a
+     * phrase generator. */
+    TOK_TERMS,
+    /* Like TOK_TERMS, but the term text should be taken literally,
+     * with no phrase splitting or whitespace removal.  The lexer
+     * only generates TOK_TERMS; the parser creates TOK_LIT. */
+    TOK_LIT,
+    /* An error token.  An error token anywhere in the parse tree will
+     * be propagated up by the generator and returned to the caller.
+     * The error message should be in the text. */
+    TOK_ERROR,
+    /* TOK_END indicates the end of the token list.  Such tokens loop
+     * back on themselves so it's always safe to follow "next".
+     * These appear only in the lexer output. */
+    TOK_END
+};
+
+typedef struct _notmuch_token {
+    enum _notmuch_token_type type;
+    const char *text;
+
+    /* For TOK_PREFIX, the flags of this prefix. */
+    int prefixFlags;
+
+    /* For TOK_TERMS and TOK_LIT, the database prefix to use when
+     * generating database terms.  This must be filled in a
+     * transformation pass. */
+    const char *prefix;
+
+    /* Link in the lexer token list. */
+    struct _notmuch_token *next;
+
+    /* Links in the intermediate AST. */
+    struct _notmuch_token *left, *right;
+} _notmuch_token_t;
+
+_notmuch_token_t *
+_notmuch_token_create_op (const void *ctx, enum _notmuch_token_type type,
+			  _notmuch_token_t *left, _notmuch_token_t *right);
+
+_notmuch_token_t *
+_notmuch_token_create_term (const void *ctx, enum _notmuch_token_type type,
+			    const char *text);
+
+char *
+_notmuch_token_show (const void *ctx, _notmuch_token_t *tok);
+
+char *
+_notmuch_token_show_list (const void *ctx, _notmuch_token_t *tok);
+
+char *
+_notmuch_token_show_tree (const void *ctx, _notmuch_token_t *root);
+
+_notmuch_qparser_t *
+_notmuch_qparser_create (const void *ctx, notmuch_database_t *notmuch);
+
+/* Add a syntactic prefix.  This will appear as a TOK_PREFIX in the
+ * AST, but does not alone affect the final query.
+ *
+ * The literal flag affects lexing.  If true, this prefix must be
+ * followed by a regular term or quoted literal, which will not be
+ * stripped of whitespace or split into a phrase.  The boolean flag
+ * affects parsing.  If true, then terms with this prefix will be
+ * combined into the query using the FILTER operator, so they must
+ * appear in the result and will not contribute to weights.  Xapian's
+ * "boolean prefixes" are both literal and boolean.
+ */
+void
+_notmuch_qparser_add_prefix (_notmuch_qparser_t *qparser,
+			     const char *prefix, notmuch_bool_t literal,
+			     notmuch_bool_t boolean);
+
+/* Add a transform pass to a query parser.  The transform function
+ * will be called with the root of the AST and should return a new AST
+ * root (which may be the same as the old root).
+ */
+void
+_notmuch_qparser_add_transform (_notmuch_qparser_t *qparser,
+				_notmuch_token_t *(*transform) (
+				    _notmuch_token_t *ast, void *opaque),
+				void *opaque);
+
+/* Add a syntactic prefix (field) and a transform pass to transform
+ * that syntactic prefix into a database prefix (prefix).  This
+ * corresponds to Xapian's add_prefix and add_boolean_prefix
+ * functions. */
+void
+_notmuch_qparser_add_db_prefix (_notmuch_qparser_t *qparser,
+				const char *field, const char *prefix,
+				notmuch_bool_t boolean);
+
+/* Lex a query string, returning the first token in the token list.
+ * This is only meant for testing. */
+_notmuch_token_t *
+_notmuch_qparser_lex (const void *ctx, _notmuch_qparser_t *qparser,
+		      const char *query);
+
+/* Parse a query string, returning the root of the AST. */
+_notmuch_token_t *
+_notmuch_qparser_parse (const void *ctx, _notmuch_qparser_t *qparser,
+			const char *query);
+
+/* Transform a parsed query, running the transforms in the order they
+ * were added to the query parser.  Return the root of the transformed
+ * AST. */
+_notmuch_token_t *
+_notmuch_qparser_transform (_notmuch_qparser_t *qparser, _notmuch_token_t *root);
+
 #pragma GCC visibility pop
 
 NOTMUCH_END_DECLS
diff --git a/lib/qparser.cc b/lib/qparser.cc
new file mode 100644
index 0000000..b86a445
--- /dev/null
+++ b/lib/qparser.cc
@@ -0,0 +1,920 @@
+/* qparser.cc - Notmuch query parser
+ *
+ * Copyright © 2010 Austin Clements
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: Austin Clements <amdragon@mit.edu>
+ */
+
+/*
+ * Query parsing is performed in a series of phases similar to those
+ * of a traditional compiler.
+ *
+ * 1) We tokenize the query, identifying operators and term phrases.
+ *    Note that a phrase (quoted or implicit) generates a single token
+ *    at this point, which we split up later (unlike in the Xapian
+ *    lexer).
+ * 2) We parse the token stream, generating an intermediate
+ *    representation in the form of a binary AST.  This IR is similar
+ *    to the Xapian::Query operators, but is designed to be more
+ *    easily manipulated.
+ * 3) We transform the parse tree, running a sequence of
+ *    caller-provided transformation functions over the tree.
+ * 4) We generate the Xapian::Query from the transformed IR.  This
+ *    step also splits phrase tokens into multiple query terms.
+ *
+ * To use the query parser, call _notmuch_qparser_parse to perform
+ * steps 1 and 2, _notmuch_qparser_transform to perform step 3, and
+ * _notmuch_qparser_generate to perform step 4.
+ *
+ * Still missing from this implementation:
+ * * NEAR/ADJ operators
+ * * Stemming - The stemming should probably be marked on TOK_TERMS
+ *   tokens.  Ideally, we can just pass this to the term generator.
+ * * Wildcard queries - This should be available in the IR so it's
+ *   easy to generate wildcard queries in a transformer.
+ * * Value ranges in the IR
+ * * Queries "" and "*"
+ */
+
+/* XXX notmuch currently registers "tag" as an exclusive boolean
+ * prefix, which means queries like "tag:x tag:y" will return messages
+ * with tag x OR tag y.  Is this intentional? */
+
+#include "notmuch-private.h"
+#include "database-private.h"
+
+#include <glib.h>		/* GHashTable */
+
+struct _notmuch_qparser {
+    notmuch_database_t *notmuch;
+
+    /* Maps from prefix strings to PREFIX_* flags */
+    GHashTable *prefixes;
+
+    struct _notmuch_qparser_transform *transforms;
+    struct _notmuch_qparser_transform **transformsTail;
+};
+
+enum {
+    PREFIX_LITERAL = 1<<1,
+    PREFIX_PROB    = 1<<2,
+    PREFIX_BOOL    = 1<<3,
+};
+
+struct _notmuch_qparser_transform {
+    struct _notmuch_qparser_transform *next;
+    _notmuch_token_t *(*transform) (_notmuch_token_t *ast, void *opaque);
+    void *opaque;
+};
+
+struct _notmuch_lex_state {
+    const void *ctx;
+    _notmuch_qparser_t *qparser;
+    _notmuch_token_t *head;
+    _notmuch_token_t **tail;
+};
+
+struct _notmuch_parse_state {
+    const void *ctx;
+    _notmuch_qparser_t *qparser;
+};
+
+struct _notmuch_generate_state {
+    const void *ctx;
+    _notmuch_qparser_t *qparser;
+    unsigned int termpos;
+    char *error;
+};
+
+static const char *token_types[] = {
+    "LOVE", "HATE", "BRA", "KET",
+    "AND", "OR", "XOR",
+    "NOT", "FILTER", "PREFIX",
+    "TERMS", "LIT", "ERROR", "END"
+};
+
+/* The distinguished end token.  This simplifies the parser since it
+ * never has to worry about dereferencing next. */
+static _notmuch_token_t tok_end = {TOK_END, NULL, FALSE, NULL,
+				   &tok_end, NULL, NULL};
+
+_notmuch_token_t *
+_notmuch_token_create_op (const void *ctx, enum _notmuch_token_type type,
+			  _notmuch_token_t *left, _notmuch_token_t *right)
+{
+    _notmuch_token_t *tok = talloc (ctx, struct _notmuch_token);
+    memset (tok, 0, sizeof (*tok));
+    tok->type = type;
+    tok->left = left;
+    tok->right = right;
+    return tok;
+}
+
+_notmuch_token_t *
+_notmuch_token_create_term (const void *ctx, enum _notmuch_token_type type,
+			    const char *text)
+{
+    _notmuch_token_t *tok = _notmuch_token_create_op (ctx, type, NULL, NULL);
+    tok->text = text;
+    return tok;
+}
+
+char *
+_notmuch_token_show (const void *ctx, _notmuch_token_t *tok)
+{
+    int ispre = tok->type == TOK_PREFIX;
+
+    if ((unsigned)tok->type > TOK_END)
+	return talloc_asprintf (ctx, "<bad type %d>", tok->type);
+
+    if (tok->type == TOK_TERMS)
+	return talloc_asprintf (ctx, "\"%s\"", tok->text);
+    else if (tok->type == TOK_LIT)
+	return talloc_asprintf (ctx, "'%s'", tok->text);
+    else if (tok->type == TOK_ERROR)
+	return talloc_asprintf (ctx, "ERROR/\"%s\"", tok->text);
+
+    return talloc_asprintf (ctx, "%s%s%s",
+			    token_types[tok->type],
+			    ispre ? "/" : "",
+			    ispre ? tok->text : "");
+}
+
+char *
+_notmuch_token_show_list (const void *ctx, _notmuch_token_t *tok)
+{
+    char *out = talloc_strdup (ctx, "");
+
+    for (; tok->type != TOK_END; tok = tok->next) {
+	char *t = _notmuch_token_show (ctx, tok);
+	out = talloc_asprintf_append (out, "%s%s", *out == 0 ? "" : " ", t);
+	talloc_free (t);
+    }
+
+    return out;
+}
+
+char *
+_notmuch_token_show_tree (const void *ctx, _notmuch_token_t *root)
+{
+    if (!root) {
+	return talloc_strdup (ctx, "<nil>");
+    } else if (!root->left && !root->right) {
+	return _notmuch_token_show (ctx, root);
+    } else {
+	void *local = talloc_new (ctx);
+	char *out = talloc_asprintf
+	    (ctx, "(%s%s%s%s%s)", _notmuch_token_show (local, root),
+	     root->left ? " " : "",
+	     root->left ? _notmuch_token_show_tree (local, root->left) : "",
+	     root->right ? " " : "",
+	     root->right ? _notmuch_token_show_tree (local, root->right) : "");
+	talloc_free (local);
+	return out;
+    }
+}
+
+using Xapian::Unicode::is_whitespace;
+using Xapian::Unicode::is_wordchar;
+
+static Xapian::Utf8Iterator
+lex_skip_ws (Xapian::Utf8Iterator it)
+{
+    while (is_whitespace (*it))
+	it++;
+    return it;
+}
+
+static struct _notmuch_token *
+lex_emit (struct _notmuch_lex_state *s, enum _notmuch_token_type type,
+	  char *text)
+{
+    _notmuch_token_t *tok = _notmuch_token_create_term (s->ctx, type, text);
+    tok->next = &tok_end;
+    *(s->tail) = tok;
+    s->tail = &tok->next;
+    return tok;
+}
+
+/* Lex a quoted phrase, returning an iterator pointing to the
+ * character following the closing quote.  If escaped, then accept two
+ * quotes as an escaped version of a single quote.
+ */
+static Xapian::Utf8Iterator
+lex_quoted_phrase (struct _notmuch_lex_state *s,
+		   Xapian::Utf8Iterator it, notmuch_bool_t escaped)
+{
+    Xapian::Utf8Iterator next, orig, end;
+    char *term, *src, *dst;
+
+    /* Find the end of the phrase */
+    assert (*it == '"'); ++it;
+    for (next = it; next != end; ++next) {
+	if (*next == '"') {
+	    if (!escaped)
+		break;
+
+	    orig = next++;
+	    if (next == end || *next != '"') {
+		next = orig;
+		break;
+	    }
+	}
+    }
+
+    /* Xapian still lexes +/-/( in quotes mode and simply doesn't
+     * generate tokens for them.  For us, the term generator will
+     * discard them. */
+    term = talloc_strndup (s->ctx, it.raw (), next.raw () - it.raw ());
+    if (escaped) {
+	/* Replace doubled quotes with a single quote. */
+	for (src = dst = term; *src; ++src, ++dst) {
+	    *dst = *src;
+	    if (*src == '"')
+		++src;
+	}
+	*dst = '\0';
+    }
+    lex_emit (s, TOK_TERMS, term);
+
+    if (next != end)
+	++next;
+    return next;
+}
+
+static Xapian::Utf8Iterator
+lex_try_consume_prefix (struct _notmuch_lex_state *s, Xapian::Utf8Iterator it,
+			char **prefixOut, int *flagsOut)
+{
+    Xapian::Utf8Iterator next (it), end;
+    char *prefix;
+    gpointer value = 0, orig;
+    int flags;
+
+    *prefixOut = NULL;
+    while (next != end && *next != ':' && !is_whitespace (*next))
+	++next;
+    if (*next != ':')
+	return it;
+    /* Ignore if followed by <= ' ' or ')' */
+    ++next;
+    if (*next <= ' ' || *next == ')')
+	return it;
+
+    prefix = talloc_strndup (s->ctx, it.raw (), next.raw () - it.raw() - 1);
+    g_hash_table_lookup_extended (s->qparser->prefixes, prefix, &orig, &value);
+    flags = GPOINTER_TO_INT (value);
+    talloc_free (prefix);
+
+    if (!flags)
+	/* Not a known prefix */
+	return it;
+    *prefixOut = (char*)orig;
+    *flagsOut = flags;
+    return next;
+}
+
+static Xapian::Utf8Iterator
+lex_consume_term (struct _notmuch_lex_state *s, Xapian::Utf8Iterator it,
+		  char **termOut)
+{
+    Xapian::Utf8Iterator next (it), end;
+    /* Xapian permits other characters to separate term phrases.  For
+     * example, "x#y" is parsed as two separate (non-phrase) terms.
+     * However, because the characters allowed in a term are
+     * context-sensitive, replicating this is very hard.  Here we take
+     * a simpler approach where only whitespace and a few operator
+     * characters that are never term characters separate terms. */
+    while (next != end && !strchr ("()\"", *next) && !is_whitespace (*next))
+	++next;
+    *termOut = talloc_strndup (s->ctx, it.raw (), next.raw () - it.raw ());
+    return next;
+}
+
+static notmuch_bool_t
+lex_operator (struct _notmuch_lex_state *s, char *term,
+	      const char *op, enum _notmuch_token_type type)
+{
+    if (strcasecmp (term, op) == 0) {
+	lex_emit (s, type, term);
+	return true;
+    }
+
+    return false;
+}
+
+static _notmuch_token_t *
+lex (const void *ctx, _notmuch_qparser_t *qparser, const char *query)
+{
+    Xapian::Utf8Iterator it (query), next, end;
+    struct _notmuch_lex_state state = {ctx, qparser, &tok_end, &state.head};
+    struct _notmuch_lex_state *s = &state;
+    struct _notmuch_token *tok;
+    char *term;
+    int prefixFlags = 0, literal;
+
+    while (it != end) {
+	unsigned ch;
+	if ((it = lex_skip_ws (it)) == end)
+	    break;
+
+	ch = *it;
+	switch (ch) {
+	case '+':
+	case '-':
+	    ++it;
+	    /* Xapian ignores these unless preceded by whitespace or
+	     * an open paren, which has the effect of ignoring all
+	     * +'s in "x +++y", "x#+y", and "(x)+y".  We don't
+	     * bother. */
+
+	    /* Ignore if followed by a space or another + or - */
+	    if (is_whitespace (*it) || *it == '+' || *it == '-')
+		continue;
+	    lex_emit (s, ch == '+' ? TOK_LOVE : TOK_HATE, NULL);
+	    continue;
+
+	case '"':
+	    it = lex_quoted_phrase(s, it, false);
+	    continue;
+
+	case '(':
+	    ++it;
+	    /* Xapian ignores this unless preceded by whitespace,
+	     * parens, +, or -.  We don't bother. */
+	    lex_emit (s, TOK_BRA, NULL);
+	    continue;
+
+	case ')':
+	    ++it;
+	    lex_emit (s, TOK_KET, NULL);
+	    continue;
+	}
+
+	/* Scan for a prefix */
+	next = lex_try_consume_prefix (s, it, &term, &prefixFlags);
+	literal = prefixFlags & PREFIX_LITERAL;
+	if (term && next != end && *next > ' ' && *next != ')' &&
+	    /* Non-literal prefixes are picky about the next character. */
+	    (literal || *next == '"' || *next == '(' || is_wordchar (*next))) {
+	    tok = lex_emit (s, TOK_PREFIX, term);
+	    tok->prefixFlags = prefixFlags;
+
+	    it = next;
+	    if (literal && *it == '"') {
+		/* Literal quoted strings keep everything and allow
+		 * quote escaping, unlike regular quoted phrases. */
+		it = lex_quoted_phrase (s, it, true);
+	    } else if (literal) {
+		/* Xapian uses anything up to the next space or ')'
+		 * (because literal prefixes can't be applied to
+		 * subqueries).  I disagree with Xapian here, since
+		 * Xapian will keep the open paren but not the close
+		 * paren.  Better would be to balance them. */
+		if (*next == '(')
+		    ++next;
+		while (next != end && *next > ' ' && *next != ')')
+		    ++next;
+		term = talloc_strndup (s->ctx, it.raw (),
+				       next.raw () - it.raw ());
+		lex_emit (s, TOK_TERMS, term);
+		it = next;
+	    }
+	    continue;
+	}
+
+	/* Scan for a term phrase or operator */
+	it = lex_consume_term (s, it, &term);
+
+	/* Check operators */
+	if (lex_operator (s, term, "and",  TOK_AND) ||
+	    lex_operator (s, term, "not",  TOK_NOT) ||
+	    lex_operator (s, term, "xor",  TOK_XOR) ||
+	    lex_operator (s, term, "or",   TOK_OR))
+	    continue;
+
+	/* Must be a term */
+	lex_emit (s, TOK_TERMS, term);
+    }
+
+    return s->head;
+}
+
+static void
+add_to_query (const void *ctx, _notmuch_token_t **query,
+	      _notmuch_token_type op, _notmuch_token_t *right)
+{
+    if (!*query)
+	*query = right;
+    else if (right)
+	*query = _notmuch_token_create_op (ctx, op, *query, right);
+}
+
+static _notmuch_token_t *
+parse_expr (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok);
+
+static _notmuch_token_t *
+parse_prob (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
+{
+    /* A prob is a sequence of three types of subqueries.  Because the
+     * default query operator is AND, loved terms are not treated
+     * specially.
+     * 1) Probabilistic terms (prefixed or not).  These are combined
+     *    with the default query operator, AND.
+     * 2) Terms with a boolean prefix.  All of the terms with the same
+     *    prefix are combined with OR.  Different prefixes are
+     *    combined with AND.
+     * 3) Hate terms.  These are combined with OR.
+     * The final IR looks like
+     *   (probs AND (FILTER bools)) AND (NOT hates)
+     */
+
+    _notmuch_token_t *probs, *hates, *sub, *q;
+    GHashTable *bools;
+    int done = 0;
+
+    probs = hates = NULL;
+    bools = g_hash_table_new (g_str_hash, g_str_equal);
+    while (!done) {
+	switch ((*tok)->type) {
+	case TOK_KET:
+	    if (prec < 10) {
+		/* Unmatched close paren.  Ignore it. */
+		*tok = (*tok)->next;
+		break;
+	    }
+	    /* Fall through */
+	case TOK_AND: case TOK_OR: case TOK_XOR: case TOK_NOT:
+	case TOK_END:
+	    /* End of the prob.  Might be empty. */
+	    done = 1;
+	    break;
+
+	case TOK_HATE:
+	    *tok = (*tok)->next;
+	    sub = parse_expr (s, prec + 1, tok);
+	    add_to_query (s->ctx, &hates, TOK_OR, sub);
+	    break;
+
+	case TOK_PREFIX:
+	    sub = parse_expr (s, prec + 1, tok);
+	    if (!sub)
+		break;
+	    if (sub->prefixFlags & PREFIX_PROB) {
+		add_to_query (s->ctx, &probs, TOK_AND, sub);
+	    } else {
+		_notmuch_token_t *newb, *pre = (_notmuch_token_t*)
+		    g_hash_table_lookup (bools, sub->text);
+		if (!pre)
+		    newb = sub;
+		else
+		    /* OR subqueries with same prefix */
+		    newb = _notmuch_token_create_op (s->ctx, TOK_OR, pre, sub);
+		g_hash_table_insert (bools, (void*)sub->text, newb);
+	    }
+	    break;
+
+	case TOK_LOVE:
+	    /* Join into the query like any other term, since the
+	     * default operator is AND anyway. */
+	    *tok = (*tok)->next;
+	    /* Fall through */
+	case TOK_BRA:
+	case TOK_TERMS:
+	case TOK_LIT:
+	    sub = parse_expr (s, prec + 1, tok);
+	    add_to_query (s->ctx, &probs, TOK_AND, sub);
+	    break;
+
+	case TOK_FILTER:
+	case TOK_ERROR:
+	    INTERNAL_ERROR ("Unexpected token %s",
+			    _notmuch_token_show (s->ctx, *tok));
+	}
+    }
+
+    q = probs;
+    if (g_hash_table_size (bools)) {
+	/* Merge boolean filters */
+	_notmuch_token_t *filter;
+	GList *vals = g_hash_table_get_values (bools), *l;
+	sub = NULL;
+	for (l = vals; l; l = l->next)
+	    add_to_query (s->ctx, &sub, TOK_AND, (_notmuch_token_t *) l->data);
+	g_list_free (vals);
+
+	/* Create filter */
+	filter = _notmuch_token_create_op (s->ctx, TOK_FILTER, sub, NULL);
+	add_to_query (s->ctx, &q, TOK_AND, filter);
+    }
+    if (hates) {
+	sub = _notmuch_token_create_op (s->ctx, TOK_NOT, hates, NULL);
+	add_to_query (s->ctx, &q, TOK_AND, sub);
+    }
+    g_hash_table_unref (bools);
+    return q;
+}
+
+static _notmuch_token_t *
+parse_term (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
+{
+    _notmuch_token_t *sub;
+
+    if ((*tok)->type == TOK_END) {
+	/* Arises from things like "x -".  Ignore. */
+	return NULL;
+    } else if ((*tok)->type == TOK_PREFIX) {
+	sub = *tok;
+	*tok = (*tok)->next;
+	sub->left = parse_term (s, prec, tok);
+	if (!sub->left)
+	    return NULL;
+	if (sub->prefixFlags & PREFIX_LITERAL) {
+	    /* Convert TOK_TERMS to TOK_LIT */
+	    assert (sub->left->type == TOK_TERMS);
+	    sub->left->type = TOK_LIT;
+	} else if (sub->left->type == TOK_PREFIX) {
+	    sub->left = sub->left->left;
+	}
+	return sub;
+    } else if ((*tok)->type == TOK_BRA) {
+	*tok = (*tok)->next;
+	sub = parse_expr (s, prec + 10 - (prec%10), tok);
+	if ((*tok)->type == TOK_KET)
+	    *tok = (*tok)->next;
+	return sub;
+    }
+
+    if ((*tok)->type != TOK_TERMS && (*tok)->type != TOK_LIT) {
+	/* Arises from "+AND", "-AND", "prob:AND".  We could give up
+	 * and return nothing, but it seems nicer to treat the
+	 * operator as a term if it came from the original query. */
+	if (!(*tok)->text)
+	    return NULL;
+	(*tok)->type = TOK_TERMS;
+    }
+
+    sub = *tok;
+    *tok = (*tok)->next;
+    return sub;
+}
+
+static _notmuch_token_t *
+parse_expr (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
+{
+    /* If you squint at the Xapian grammar, it's a precedence grammar
+     * with one strange "prob" level.  This implements all but the
+     * prob level and the leaf "term" level.
+     *
+     * prec is (nesting level * 10 + precedence level).  Normally we
+     * only care about the precedence level, but the nesting level is
+     * important for recovering from unbalanced parens.
+     */
+    int bprec = prec % 10;
+    if (bprec == 3) {
+	if ((*tok)->type == TOK_NOT) {
+	    /* Unary NOT */
+	    _notmuch_token_t *root = *tok;
+	    *tok = (*tok)->next;
+	    root->left = parse_expr (s, prec, tok);
+	    return root;
+	}
+
+	return parse_prob (s, prec, tok);
+    }
+    if (bprec == 4)
+	return parse_term (s, prec, tok);
+
+    _notmuch_token_t *left = parse_expr (s, prec + 1, tok);
+    while ((bprec == 0 && (*tok)->type == TOK_OR) ||
+	   (bprec == 1 && (*tok)->type == TOK_XOR) ||
+	   (bprec == 2 && ((*tok)->type == TOK_AND ||
+			   (*tok)->type == TOK_NOT))) {
+	_notmuch_token_t *root = *tok;
+	if (root->type == TOK_NOT) {
+	    /* Replace "x NOT y" with (AND x (NOT y)) by inserting an
+	     * AND operator and leaning on the unary NOT rule. */
+	    root = _notmuch_token_create_term (s->ctx, TOK_AND, NULL);
+	} else {
+	    *tok = (*tok)->next;
+	}
+
+	/* Xapian treats x AND -y as x AND NOT y, which affects
+	 * precedence. */
+	if (root->type == TOK_AND && (*tok)->type == TOK_HATE)
+	    (*tok)->type = TOK_NOT;
+
+	_notmuch_token_t *right = parse_expr (s, prec + 1, tok);
+	if (left && right) {
+	    root->left = left;
+	    root->right = right;
+	} else {
+	    /* Left or right was empty.  This may be a syntax error
+	     * like an omitted expression, or an empty expression. */
+	    root = left ? left : right;
+	}
+	left = root;
+    }
+    return left;
+}
+
+static _notmuch_token_t *
+parse (struct _notmuch_parse_state *s, _notmuch_token_t *toks)
+{
+    _notmuch_token_t *root = parse_expr (s, 0, &toks);
+    if (toks->type != TOK_END)
+	INTERNAL_ERROR ("Token stream not fully consumed: %s",
+			_notmuch_token_show_list (s->ctx, toks));
+    return root;
+}
+
+static char *
+generate_term (struct _notmuch_generate_state *s, const char *term,
+	       const char *prefix)
+{
+    notmuch_bool_t colon = FALSE;
+    if (isupper ((unsigned char) term[0]) && strlen (prefix) > 1)
+	colon = TRUE;
+    return talloc_asprintf (s->ctx, "%s%s%s", prefix, colon ? ":" : "", term);
+}
+
+static Xapian::Query
+generate_terms (struct _notmuch_generate_state *s, const char *text,
+		const char *prefix, int distance, Xapian::Query::op op)
+{
+    Xapian::TermGenerator tg;
+    Xapian::Document doc;
+    Xapian::TermIterator it, end;
+    Xapian::PositionIterator pit, pend;
+    Xapian::Query *qs, q;
+    unsigned int nterms = 0;
+
+    if (prefix)
+	tg.index_text (text, 1, prefix);
+    else
+	tg.index_text (text);
+    doc = tg.get_document ();
+
+    /* Find the highest positioned term.  Positions are 1-based. */
+    end = doc.termlist_end ();
+    for (it = doc.termlist_begin (); it != end; ++it) {
+	pend = it.positionlist_end ();
+	for (pit = it.positionlist_begin (); pit != pend; ++pit) {
+	    if (*pit > nterms)
+		nterms = *pit;
+	}
+    }
+    if (nterms == 0)
+	return Xapian::Query ();
+
+    /* Extract terms */
+    qs = new Xapian::Query[nterms];
+    for (it = doc.termlist_begin (); it != end; ++it) {
+	pend = it.positionlist_end ();
+	for (pit = it.positionlist_begin (); pit != pend; ++pit)
+	    qs[*pit - 1] = Xapian::Query (*it, 1, s->termpos + *pit - 1);
+    }
+    s->termpos += nterms;
+
+    /* Build query */
+    q = Xapian::Query (op, qs, qs + nterms, distance + nterms);
+    delete [] qs;
+    return q;
+}
+
+static Xapian::Query
+generate (struct _notmuch_generate_state *s, _notmuch_token_t *root)
+{
+    using Xapian::Query;
+    Query l, r;
+    Query::op op;
+
+    if (!root)
+	return Query ();
+
+    /* The tricky part here is that generate is allowed to return an
+     * empty query, indicating that the user's query cannot be
+     * expressed.  For example, the term "#" is an empty query. */
+
+    switch (root->type) {
+    case TOK_AND:
+	if (root->left->type == TOK_NOT && root->right->type != TOK_NOT) {
+	    _notmuch_token_t *tmp = root->left;
+	    root->left = root->right;
+	    root->right = tmp;
+	}
+	l = generate (s, root->left);
+	if (l.empty()) {
+	    return generate (s, root->right);
+	} else if (root->right->type == TOK_NOT) {
+	    r = generate (s, root->right->left);
+	    op = Query::OP_AND_NOT;
+	} else if (root->right->type == TOK_FILTER) {
+	    r = generate (s, root->right->left);
+	    op = Query::OP_FILTER;
+	} else {
+	    r = generate (s, root->right);
+	    op = Query::OP_AND;
+	}
+	if (r.empty())
+	    return l;
+	return Query (op, l, r);
+
+    case TOK_NOT:
+	l = generate (s, root->left);
+	if (l.empty())
+	    return l;
+	return Query (Query::OP_AND_NOT, Query ("", 1, 0), l);
+
+    case TOK_FILTER:
+	l = generate(s, root->left);
+	if (l.empty())
+	    return l;
+	return Query (Query::OP_SCALE_WEIGHT, l, 0.0);
+
+    case TOK_OR:
+    case TOK_XOR:
+	l = generate (s, root->left);
+	r = generate (s, root->right);
+	if (l.empty())
+	    return r;
+	if (r.empty())
+	    return l;
+	return Query (root->type == TOK_OR ? Query::OP_OR : Query::OP_XOR,
+		      l, r);
+
+    case TOK_PREFIX:
+	return generate (s, root->left);
+
+    case TOK_TERMS:
+	return generate_terms (s, root->text, root->prefix, 0,
+			       Query::OP_PHRASE);
+
+    case TOK_LIT:
+	return Query (generate_term (s, root->text, root->prefix));
+
+    case TOK_ERROR:
+	if (!s->error)
+	    s->error = talloc_strdup (s->ctx, root->text);
+	return Query ();
+
+    case TOK_LOVE:
+    case TOK_HATE:
+    case TOK_BRA:
+    case TOK_KET:
+    case TOK_END:
+	/* Fall through to the error after the switch */
+	break;
+    }
+    /* We leave this outside the switch so the compiler will warn us
+     * if we missed a token type. */
+    INTERNAL_ERROR ("Illegal token %s in IR",
+		    _notmuch_token_show (s->ctx, root));
+    return Xapian::Query ();
+}
+
+static int
+_notmuch_qparser_destructor (_notmuch_qparser_t *qparser)
+{
+    g_hash_table_unref (qparser->prefixes);
+    return 0;
+}
+
+_notmuch_qparser_t *
+_notmuch_qparser_create (const void *ctx, notmuch_database_t *notmuch)
+{
+    _notmuch_qparser_t *qparser = talloc (ctx, _notmuch_qparser_t);
+    if (!qparser)
+	return NULL;
+    qparser->prefixes = NULL;
+    talloc_set_destructor (qparser, _notmuch_qparser_destructor);
+
+    qparser->notmuch = notmuch;
+    qparser->prefixes = g_hash_table_new (g_str_hash, g_str_equal);
+    qparser->transforms = NULL;
+    qparser->transformsTail = &qparser->transforms;
+
+    return qparser;
+}
+
+void
+_notmuch_qparser_add_prefix (_notmuch_qparser_t *qparser,
+			     const char *prefix, notmuch_bool_t literal,
+			     notmuch_bool_t boolean)
+{
+    int flags = ((literal ? PREFIX_LITERAL : 0) |
+		 (boolean ? PREFIX_BOOL : PREFIX_PROB));
+    g_hash_table_insert (qparser->prefixes, talloc_strdup (qparser, prefix),
+			 GINT_TO_POINTER (flags));
+}
+
+void
+_notmuch_qparser_add_transform (_notmuch_qparser_t *qparser,
+				_notmuch_token_t *(*transform) (
+				    _notmuch_token_t *ast, void *opaque),
+				void *opaque)
+{
+    struct _notmuch_qparser_transform *t;
+    t = talloc (qparser, struct _notmuch_qparser_transform);
+    t->next = NULL;
+    t->transform = transform;
+    t->opaque = opaque;
+    *qparser->transformsTail = t;
+    qparser->transformsTail = &t->next;
+}
+
+struct _notmuch_transform_prefix_info {
+    char *field, *prefix;
+};
+
+static _notmuch_token_t *
+transform_prefix_rec (struct _notmuch_transform_prefix_info *info,
+		      _notmuch_token_t *root, notmuch_bool_t active)
+{
+    if (!root)
+	return NULL;
+    if (root->type == TOK_PREFIX) {
+	active = (strcmp (info->field, root->text) == 0);
+    } else if (active && (root->type == TOK_TERMS || root->type == TOK_LIT)) {
+	root->prefix = info->prefix;
+    }
+    transform_prefix_rec (info, root->left, active);
+    transform_prefix_rec (info, root->right, active);
+    return root;
+}
+
+static _notmuch_token_t *
+transform_prefix (_notmuch_token_t *root, void *opaque)
+{
+    struct _notmuch_transform_prefix_info *info =
+	(struct _notmuch_transform_prefix_info*)opaque;
+    return transform_prefix_rec (info, root, FALSE);
+}
+
+void
+_notmuch_qparser_add_db_prefix (_notmuch_qparser_t *qparser,
+				const char *field, const char *prefix,
+				notmuch_bool_t boolean)
+{
+    struct _notmuch_transform_prefix_info *info;
+    info = talloc (qparser, struct _notmuch_transform_prefix_info);
+    info->field = talloc_strdup (info, field);
+    info->prefix = talloc_strdup (info, prefix);
+    _notmuch_qparser_add_prefix (qparser, field, boolean, boolean);
+    _notmuch_qparser_add_transform (qparser, transform_prefix, info);
+}
+
+_notmuch_token_t *
+_notmuch_qparser_lex (const void *ctx, _notmuch_qparser_t *qparser,
+		      const char *query)
+{
+    return lex (ctx, qparser, query);
+}
+
+_notmuch_token_t *
+_notmuch_qparser_parse (const void *ctx, _notmuch_qparser_t *qparser,
+			const char *query)
+{
+    struct _notmuch_parse_state state = {ctx, qparser};
+    _notmuch_token_t *toks = lex (ctx, qparser, query);
+    return parse (&state, toks);
+}
+
+_notmuch_token_t *
+_notmuch_qparser_transform (_notmuch_qparser_t *qparser, _notmuch_token_t *root)
+{
+    struct _notmuch_qparser_transform *t;
+    for (t = qparser->transforms; t; t = t->next)
+	root = t->transform (root, t->opaque);
+    return root;
+}
+
+Xapian::Query
+_notmuch_qparser_generate (const void *ctx, _notmuch_qparser_t *qparser,
+			   _notmuch_token_t *root, char **error_out)
+{
+    struct _notmuch_generate_state state = {ctx, qparser, 1, NULL};
+    Xapian::Query query = generate (&state, root);
+    if (state.error) {
+	*error_out = state.error;
+	return Xapian::Query ();
+    }
+    *error_out = NULL;
+    if (query.empty())
+	/* Return all documents */
+	return Xapian::Query ("", 1, 0);
+    return query;
+}
-- 
1.7.2.3


* [PATCH 2/8] Parse NEAR and ADJ operators.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
  2011-01-16  8:10 ` [PATCH 1/8] Implement a custom query parser with a mostly Xapian-compatible grammar Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-21  6:39   ` [PATCH 2.5/8] Query parser tests for " Austin Clements
  2011-01-16  8:10 ` [PATCH 3/8] Parse wildcard queries Austin Clements
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

NEAR and ADJ are treated as n-ary operators where all operands must be
terms, which fits with Xapian's own restrictions on near/adj queries.
This implementation is slightly more lenient than Xapian's in that it
allows phrases (both quoted and implicit) as operands and folds the
phrase terms in as operands to the near/adj operator.
---
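For reviewers who want to see the operand folding in isolation, here is a
minimal sketch of how the generator walks a NEAR/ADJ chain.  It is not the
patch's code: Node, Folded and fold_chain are hypothetical stand-ins for
_notmuch_token_t and the TOK_ADJ/TOK_NEAR case in generate, with Xapian and
talloc left out.  The behavior it mirrors is taken from the patch: operands
are concatenated with spaces, the highest distance in the chain wins, and a
chain with no distance argument defaults to 10.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for _notmuch_token_t, reduced to the fields the
// NEAR/ADJ chain uses.
struct Node {
    std::string text;   // operand text (only meaningful on operand nodes)
    int distance;       // -1 when no "/N" argument was given
    Node *left;         // operand (a TERMS token in the patch)
    Node *right;        // next operator in the chain, or nullptr
};

struct Folded {
    std::string terms;  // all operands, each followed by a space
    int distance;       // effective distance for the whole chain
};

// Walk the right-linked chain, concatenating the operands and keeping
// the highest distance seen.
static Folded
fold_chain (const Node *root)
{
    Folded out = {"", -1};
    for (const Node *n = root; n; n = n->right) {
        assert (n->left);   // every operator node carries an operand
        out.terms += n->left->text + " ";
        if (n->distance > out.distance)
            out.distance = n->distance;
    }
    // Same default as the patch when no operator carried a distance.
    if (out.distance == -1)
        out.distance = 10;
    return out;
}
```

For a query like: a NEAR "b c" NEAR/5 d, the chain has three operator nodes
and folds to the operand string "a b c d " with distance 5.  In the patch the
folded string is then handed to generate_terms with dist - 1, which is where
an implicit phrase such as "b c" gets split into separate operands.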
 lib/notmuch-private.h |   10 +++++
 lib/qparser.cc        |  103 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 06239b9..a42afd6 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -518,6 +518,12 @@ enum _notmuch_token_type {
     TOK_LOVE, TOK_HATE, TOK_BRA, TOK_KET,
     /* Binary operators.  These should have left and right children. */
     TOK_AND, TOK_OR, TOK_XOR,
+    /* n-ary operators.  In the AST, these are represented like lists
+     * of TOK_TERMS, with the left child being a TOK_TERMS and the
+     * right being another TOK_ADJ/TOK_NEAR.  The final right must be
+     * NULL.  Both tokens can also carry distances; the highest
+     * distance in the chain will be used. */
+    TOK_ADJ, TOK_NEAR,
     /* Unary operators.  These have only a left child.  Xapian::Query
      * has no pure NOT operator, so the generator treats NOT as the
      * child of an AND specially, and otherwise represents it as
@@ -555,6 +561,10 @@ typedef struct _notmuch_token {
     enum _notmuch_token_type type;
     const char *text;
 
+    /* For TOK_ADJ and TOK_NEAR, this specifies the distance
+     * argument. */
+    int distance;
+
     /* For TOK_PREFIX, the flags of this prefix. */
     int prefixFlags;
 
diff --git a/lib/qparser.cc b/lib/qparser.cc
index b86a445..5a6d39b 100644
--- a/lib/qparser.cc
+++ b/lib/qparser.cc
@@ -40,7 +40,6 @@
  * _notmuch_qparser_generate to perform step 4.
  *
  * Still missing from this implementation:
- * * NEAR/ADJ operators
  * * Stemming - The stemming should probably be marked on TOK_TERMS
  *   tokens.  Ideally, we can just pass this to the term generator.
  * * Wildcard queries - This should be available in the IR so it's
@@ -101,14 +100,14 @@ struct _notmuch_generate_state {
 
 static const char *token_types[] = {
     "LOVE", "HATE", "BRA", "KET",
-    "AND", "OR", "XOR",
+    "AND", "OR", "XOR", "ADJ", "NEAR",
     "NOT", "FILTER", "PREFIX",
     "TERMS", "LIT", "ERROR", "END"
 };
 
 /* The distinguished end token.  This simplifies the parser since it
  * never has to worry about dereferencing next. */
-static _notmuch_token_t tok_end = {TOK_END, NULL, FALSE, NULL,
+static _notmuch_token_t tok_end = {TOK_END, NULL, -1, FALSE, NULL,
 				   &tok_end, NULL, NULL};
 
 _notmuch_token_t *
@@ -118,6 +117,7 @@ _notmuch_token_create_op (const void *ctx, enum _notmuch_token_type type,
     _notmuch_token_t *tok = talloc (ctx, struct _notmuch_token);
     memset (tok, 0, sizeof (*tok));
     tok->type = type;
+    tok->distance = -1;
     tok->left = left;
     tok->right = right;
     return tok;
@@ -135,6 +135,7 @@ _notmuch_token_create_term (const void *ctx, enum _notmuch_token_type type,
 char *
 _notmuch_token_show (const void *ctx, _notmuch_token_t *tok)
 {
+    char dist[32] = "";
     int ispre = tok->type == TOK_PREFIX;
 
     if ((unsigned)tok->type > TOK_END)
@@ -147,9 +148,11 @@ _notmuch_token_show (const void *ctx, _notmuch_token_t *tok)
     else if (tok->type == TOK_ERROR)
 	return talloc_asprintf (ctx, "ERROR/\"%s\"", tok->text);
 
-    return talloc_asprintf (ctx, "%s%s%s",
+    if (tok->distance != -1)
+	sprintf(dist, "/%d", tok->distance);
+    return talloc_asprintf (ctx, "%s%s%s%s",
 			    token_types[tok->type],
-			    ispre ? "/" : "",
+			    dist, ispre ? "/" : "",
 			    ispre ? tok->text : "");
 }
 
@@ -308,10 +311,31 @@ static notmuch_bool_t
 lex_operator (struct _notmuch_lex_state *s, char *term,
 	      const char *op, enum _notmuch_token_type type)
 {
+    size_t oplen = strlen (op);
+
     if (strcasecmp (term, op) == 0) {
 	lex_emit (s, type, term);
 	return true;
     }
+
+    /* Check for ADJ or NEAR with argument.  Our parsing of this is
+     * slightly incompatible with Xapian, but I believe this to be a
+     * bug in Xapian.  Xapian parses "x NEAR/y z" as three term
+     * phrases, "x", "near y", and "z", like we do.  However, it
+     * behaves differently if the bad NEAR operator is at the end of
+     * the query, parsing "x NEAR/y" like "x NEAR y". */
+    if ((type == TOK_ADJ || type == TOK_NEAR) &&
+	strncasecmp (term, op, oplen) == 0 &&
+	term[oplen] == '/') {
+	/* Try to parse the distance argument */
+	char *end;
+	int distance = strtol (&term[oplen + 1], &end, 10);
+	if (distance && !*end) {
+	    struct _notmuch_token *tok = lex_emit (s, type, term);
+	    tok->distance = distance;
+	    return true;
+	}
+    }
     
     return false;
 }
@@ -403,7 +427,9 @@ lex (const void *ctx, _notmuch_qparser_t *qparser, const char *query)
 	if (lex_operator (s, term, "and",  TOK_AND) ||
 	    lex_operator (s, term, "not",  TOK_NOT) ||
 	    lex_operator (s, term, "xor",  TOK_XOR) ||
-	    lex_operator (s, term, "or",   TOK_OR))
+	    lex_operator (s, term, "or",   TOK_OR)  ||
+	    lex_operator (s, term, "adj",  TOK_ADJ) ||
+	    lex_operator (s, term, "near", TOK_NEAR))
 	    continue;
 
 	/* Must be a term */
@@ -495,6 +521,8 @@ parse_prob (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
 	case TOK_BRA:
 	case TOK_TERMS:
 	case TOK_LIT:
+	case TOK_ADJ:
+	case TOK_NEAR:
 	    sub = parse_expr (s, prec + 1, tok);
 	    add_to_query (s->ctx, &probs, TOK_AND, sub);
 	    break;
@@ -529,6 +557,37 @@ parse_prob (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
 }
 
 static _notmuch_token_t *
+parse_near (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
+{
+    _notmuch_token_type first = (*tok)->type, conj = (*tok)->next->type;
+    _notmuch_token_t *root = parse_expr (s, prec + 1, tok);
+    _notmuch_token_t **tail = NULL;
+
+    /* XXX Xapian allows prefixed terms in near/adj. */
+    if (first != TOK_TERMS || !(conj == TOK_NEAR || conj == TOK_ADJ))
+	return root;
+
+    while ((*tok)->type == conj && (*tok)->next->type == TOK_TERMS) {
+	if (!tail) {
+	    /* First operator.  Create the list root. */
+	    _notmuch_token_t *nroot =
+		_notmuch_token_create_op (s->ctx, conj, root, NULL);
+	    root = nroot;
+	    tail = &nroot->right;
+	}
+
+	/* Append the operator and term token to the list */
+	*tail = *tok;
+	*tok = (*tok)->next;
+	(*tail)->left = *tok;
+	*tok = (*tok)->next;
+	tail = &(*tail)->right;
+    }
+
+    return root;
+}
+
+static _notmuch_token_t *
 parse_term (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
 {
     _notmuch_token_t *sub;
@@ -596,6 +655,8 @@ parse_expr (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
 	return parse_prob (s, prec, tok);
     }
     if (bprec == 4)
+	return parse_near (s, prec, tok);
+    if (bprec == 5)
 	return parse_term (s, prec, tok);
 
     _notmuch_token_t *left = parse_expr (s, prec + 1, tok);
@@ -756,6 +817,36 @@ generate (struct _notmuch_generate_state *s, _notmuch_token_t *root)
 	return Query (root->type == TOK_OR ? Query::OP_OR : Query::OP_XOR,
 		      l, r);
 
+    case TOK_ADJ:
+    case TOK_NEAR:
+    {
+	_notmuch_token_t *node;
+	int dist = -1;
+	char *terms = talloc_strdup (root, "");
+	/* Concatenate the operands and get the highest distance */
+	for (node = root; node; node = node->right) {
+	    if (node->left->type != TOK_TERMS)
+		INTERNAL_ERROR ("Illegal token in NEAR/ADJ: %s",
+				_notmuch_token_show (s->ctx, node->left));
+	    if (node->left->prefix)
+		INTERNAL_ERROR ("Prefixes not supported in NEAR/ADJ");
+
+	    terms = talloc_asprintf_append (terms, "%s ", node->left->text);
+	    if (node->distance > dist)
+		dist = node->distance;
+	}
+	/* The default distance is 10. */
+	if (dist == -1)
+	    dist = 10;
+	/* Generate a PHRASE or NEAR query.  If there are implicit
+	 * phrases, they will be split out and treated like any other
+	 * term in the operand list. */
+	op = root->type == TOK_ADJ ? Query::OP_PHRASE : Query::OP_NEAR;
+	l = generate_terms (s, terms, NULL, dist - 1, op);
+	talloc_free (terms);
+	return l;
+    }
+
     case TOK_PREFIX:
 	return generate (s, root->left);
 
-- 
1.7.2.3


* [PATCH 3/8] Parse wildcard queries.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
  2011-01-16  8:10 ` [PATCH 1/8] Implement a custom query parser with a mostly Xapian-compatible grammar Austin Clements
  2011-01-16  8:10 ` [PATCH 2/8] Parse NEAR and ADJ operators Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-21  6:40   ` [PATCH 3.5/8] Query parser tests for " Austin Clements
  2011-01-16  8:10 ` [PATCH 4/8] Replace Xapian query parser with custom query parser Austin Clements
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

This implements support in the lexer and generator for wildcard terms,
expanding them into synonym queries the way Xapian does.  Since this
expansion is performed by the generator, it's easy to take advantage
of in query transforms.

With this, * works anywhere in the query, so we'll no longer need
special case code for '*' queries in query.cc.
---
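The two halves of the wildcard handling can be sketched without Xapian as
follows.  This is an illustration under assumptions, not the patch's code:
strip_wildcard and expand_wildcard are hypothetical names, and a sorted
std::set stands in for the database's term list (the patch iterates
allterms_begin (term) to allterms_end (term) instead).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Lexer side: strip a trailing '*' from a term and report whether one
// was present, as the lexer hunk does before emitting TOK_TERMS.
static bool
strip_wildcard (std::string &term)
{
    if (!term.empty () && term.back () == '*') {
        term.pop_back ();
        return true;
    }
    return false;
}

// Generator side: collect every indexed term that begins with prefix.
// The set is sorted, so all matches form one contiguous run starting
// at lower_bound (prefix).
static std::vector<std::string>
expand_wildcard (const std::set<std::string> &all_terms,
                 const std::string &prefix)
{
    std::vector<std::string> out;
    for (auto it = all_terms.lower_bound (prefix);
         it != all_terms.end ()
             && it->compare (0, prefix.size (), prefix) == 0;
         ++it)
        out.push_back (*it);
    return out;
}
```

An empty result corresponds to the case the patch handles by returning a
query over the unexpanded term rather than an empty query.  A non-empty
result is what the patch combines under OP_SYNONYM.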
 TODO                  |    3 --
 lib/notmuch-private.h |    6 +++-
 lib/qparser.cc        |   75 +++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/TODO b/TODO
index 438f7aa..10c8c12 100644
--- a/TODO
+++ b/TODO
@@ -228,9 +228,6 @@ for all messages with the word "to". If we don't provide the first
 behavior, perhaps we should exit on an error when a configured prefix
 is provided with no value?
 
-Support "*" in all cases and not just as a special case. That is, "* "
-should also work, as well as "* and tag:inbox".
-
 Implement a syntax for requesting set-theoertic operations on results
 of multiple searches. For example, I would like to do:
 
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index a42afd6..eb346ea 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -522,7 +522,8 @@ enum _notmuch_token_type {
      * of TOK_TERMS, with the left child being a TOK_TERMS and the
      * right being another TOK_ADJ/TOK_NEAR.  The final right must be
      * NULL.  Both tokens can also carry distances; the highest
-     * distance in the chain will be used. */
+     * distance in the chain will be used.  The operand terms may not
+     * be prefixed or wildcards. */
     TOK_ADJ, TOK_NEAR,
     /* Unary operators.  These have only a left child.  Xapian::Query
      * has no pure NOT operator, so the generator treats NOT as the
@@ -572,6 +573,9 @@ typedef struct _notmuch_token {
      * generating database terms.  This must be filled in a
      * transformation pass. */
     const char *prefix;
+    /* For TOK_TERMS and TOK_LIT, indicates that this token should
+     * match any terms prefixed with text. */
+    notmuch_bool_t wildcard;
 
     /* Link in the lexer token list. */
     struct _notmuch_token *next;
diff --git a/lib/qparser.cc b/lib/qparser.cc
index 5a6d39b..bd0296a 100644
--- a/lib/qparser.cc
+++ b/lib/qparser.cc
@@ -33,7 +33,8 @@
  * 3) We transform the parse tree, running a sequence of
  *    caller-provided transformation functions over the tree.
  * 4) We generate the Xapian::Query from the transformed IR.  This
- *    step also splits phrase tokens into multiple query terms.
+ *    step also splits phrase tokens into multiple query terms and
+ *    expands wildcard terms.
  *
  * To use the query parser, call _notmuch_qparser_parse to perform
  * steps 1 and 2, _notmuch_qparser_transform to perform step 3, and
@@ -42,10 +43,7 @@
  * Still missing from this implementation:
  * * Stemming - The stemming should probably be marked on TOK_TERMS
  *   tokens.  Ideally, we can just pass this to the term generator.
- * * Wildcard queries - This should be available in the IR so it's
- *   easy to generate wildcard queries in a transformer.
  * * Value ranges in the IR
- * * Queries "" and "*"
  */
 
 /* XXX notmuch currently registers "tag" as an exclusive boolean
@@ -55,7 +53,7 @@
 #include "notmuch-private.h"
 #include "database-private.h"
 
-#include <glib.h>		/* GHashTable */
+#include <glib.h>		/* GHashTable, GPtrArray */
 
 struct _notmuch_qparser {
     notmuch_database_t *notmuch;
@@ -107,7 +105,7 @@ static const char *token_types[] = {
 
 /* The distinguished end token.  This simplifies the parser since it
  * never has to worry about dereferencing next. */
-static _notmuch_token_t tok_end = {TOK_END, NULL, -1, FALSE, NULL,
+static _notmuch_token_t tok_end = {TOK_END, NULL, -1, FALSE, NULL, FALSE,
 				   &tok_end, NULL, NULL};
 
 _notmuch_token_t *
@@ -142,9 +140,11 @@ _notmuch_token_show (const void *ctx, _notmuch_token_t *tok)
 	return talloc_asprintf (ctx, "<bad type %d>", tok->type);
 
     if (tok->type == TOK_TERMS)
-	return talloc_asprintf (ctx, "\"%s\"", tok->text);
+	return talloc_asprintf (ctx, "\"%s\"%s", tok->text,
+				tok->wildcard ? "*" : "");
     else if (tok->type == TOK_LIT)
-	return talloc_asprintf (ctx, "'%s'", tok->text);
+	return talloc_asprintf (ctx, "'%s'%s", tok->text,
+				tok->wildcard ? "*" : "");
     else if (tok->type == TOK_ERROR)
 	return talloc_asprintf (ctx, "ERROR/\"%s\"", tok->text);
 
@@ -348,7 +348,7 @@ lex (const void *ctx, _notmuch_qparser_t *qparser, const char *query)
     struct _notmuch_lex_state *s = &state;
     struct _notmuch_token *tok;
     char *term;
-    int prefixFlags, literal;
+    int prefixFlags, literal, wildcard, n;
 
     while (it != end) {
 	unsigned ch;
@@ -433,7 +433,15 @@ lex (const void *ctx, _notmuch_qparser_t *qparser, const char *query)
 	    continue;
 
 	/* Must be a term */
-	lex_emit (s, TOK_TERMS, term);
+	wildcard = 0;
+	n = strlen(term);
+	if (n && term[n-1] == '*') {
+	    /* Wildcard */
+	    wildcard = 1;
+	    term[n-1] = 0;
+	}
+	tok = lex_emit (s, TOK_TERMS, term);
+	tok->wildcard = wildcard;
     }
 
     return s->head;
@@ -713,8 +721,39 @@ generate_term (struct _notmuch_generate_state *s, const char *term,
 }
 
 static Xapian::Query
+generate_wildcard (struct _notmuch_generate_state *s, const char *term)
+{
+    GPtrArray *subarr;
+    Xapian::Query query, **qs;
+    Xapian::Database *db = s->qparser->notmuch->xapian_db;
+    Xapian::TermIterator i = db->allterms_begin (term),
+	end = db->allterms_end (term);
+
+    subarr = g_ptr_array_new ();
+    for (; i != end; i++)
+	g_ptr_array_add (subarr, new Xapian::Query (*i, 1, s->termpos));
+    /* If the term didn't expand, then return a query over the
+     * unexpanded term, which is guaranteed not to match anything.
+     * We can't simply return an empty query because Xapian treats
+     * those specially. */
+    if (!subarr->len) {
+	g_ptr_array_free (subarr, TRUE);
+	return Xapian::Query (term);
+    }
+
+    s->termpos++;
+    qs = (Xapian::Query**)subarr->pdata;
+    query = Xapian::Query (Xapian::Query::OP_SYNONYM, qs, qs + subarr->len);
+    for (unsigned int i = 0; i < subarr->len; ++i)
+	delete qs[i];
+    g_ptr_array_free (subarr, TRUE);
+    return query;
+}
+
+static Xapian::Query
 generate_terms (struct _notmuch_generate_state *s, const char *text,
-		const char *prefix, int distance, Xapian::Query::op op)
+		const char *prefix, notmuch_bool_t wildcard, int distance,
+		Xapian::Query::op op)
 {
     Xapian::TermGenerator tg;
     Xapian::Document doc;
@@ -740,6 +779,8 @@ generate_terms (struct _notmuch_generate_state *s, const char *text,
     }
     if (nterms == 0)
 	return Xapian::Query ();
+    if (nterms == 1 && wildcard)
+	return generate_wildcard (s, text);
 
     /* Extract terms */
     qs = new Xapian::Query[nterms];
@@ -762,6 +803,7 @@ generate (struct _notmuch_generate_state *s, _notmuch_token_t *root)
     using Xapian::Query;
     Query l, r;
     Query::op op;
+    char *term;
 
     if (!root)
 	return Query ();
@@ -842,7 +884,7 @@ generate (struct _notmuch_generate_state *s, _notmuch_token_t *root)
 	 * phrases, they will be split out and treated like any other
 	 * term in the operand list. */
 	op = root->type == TOK_ADJ ? Query::OP_PHRASE : Query::OP_NEAR;
-	l = generate_terms (s, terms, NULL, dist - 1, op);
+	l = generate_terms (s, terms, NULL, FALSE, dist - 1, op);
 	talloc_free (terms);
 	return l;
     }
@@ -851,11 +893,14 @@ generate (struct _notmuch_generate_state *s, _notmuch_token_t *root)
 	return generate (s, root->left);
 
     case TOK_TERMS:
-	return generate_terms (s, root->text, root->prefix, 0,
-			       Query::OP_PHRASE);
+	return generate_terms (s, root->text, root->prefix, root->wildcard,
+			       0, Query::OP_PHRASE);
 
     case TOK_LIT:
-	return Query (generate_term (s, root->text, root->prefix));
+	term = generate_term (s, root->text, root->prefix);
+	if (root->wildcard)
+	    return generate_wildcard (s, term);
+	return Query (term);
 
     case TOK_ERROR:
 	if (!s->error)
-- 
1.7.2.3


* [PATCH 4/8] Replace Xapian query parser with custom query parser.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
                   ` (2 preceding siblings ...)
  2011-01-16  8:10 ` [PATCH 3/8] Parse wildcard queries Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-16  8:10 ` [PATCH 5/8] Support "tag:*" as well as "NOT tag:*" queries Austin Clements
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

Note that the type:mail filter is implemented as a transform pass, so
it no longer has to be done everywhere queries are parsed.
Furthermore, this filter now depends on the prefixing logic in the
query parser instead of implementing this itself.  Likewise, we don't
need to special-case the queries "" and "*" in multiple places.
---
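To make the "filter as a transform pass" idea concrete, here is a small
standalone model of the pass pipeline.  Ast, type_mail and run_transforms
are hypothetical names chosen for this sketch.  The real code uses
_notmuch_token_t, talloc contexts and a linked list of passes, and gives
the literal a database prefix looked up with _find_prefix ("type").

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, much reduced AST node.
struct Ast {
    std::string op;     // "AND", "LIT", ...
    std::string text;   // literal text for "LIT" nodes
    std::shared_ptr<Ast> left, right;
};
using AstPtr = std::shared_ptr<Ast>;
using Transform = std::function<AstPtr (AstPtr)>;

// Model of transform_type_mail: AND the user's query with a "mail"
// literal, or return just the literal when the query is empty.
static AstPtr
type_mail (AstPtr root)
{
    AstPtr mail = std::make_shared<Ast> (Ast{"LIT", "mail", nullptr, nullptr});
    if (!root)
        return mail;
    return std::make_shared<Ast> (Ast{"AND", "", mail, root});
}

// Model of _notmuch_qparser_transform: run each registered pass in
// order, feeding the result of one into the next.
static AstPtr
run_transforms (const std::vector<Transform> &passes, AstPtr root)
{
    for (const Transform &t : passes)
        root = t (root);
    return root;
}
```

Because the empty query comes out of the pass as the bare "mail" literal,
callers need no separate branch for "", which is the simplification the
query.cc hunks below rely on.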
 lib/database-private.h |    3 +-
 lib/database.cc        |   33 ++++++++++++++-----------
 lib/query.cc           |   62 ++++++++++++++++++-----------------------------
 3 files changed, 44 insertions(+), 54 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index 5d2fa02..a6eea87 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -48,9 +48,8 @@ struct _notmuch_database {
     unsigned int last_doc_id;
     uint64_t last_thread_id;
 
-    Xapian::QueryParser *query_parser;
     Xapian::TermGenerator *term_gen;
-    Xapian::ValueRangeProcessor *value_range_processor;
+    _notmuch_qparser_t *query_parser;
 };
 
 /* Return the list of terms from the given iterator matching a prefix.
diff --git a/lib/database.cc b/lib/database.cc
index f8245ab..a3df0ae 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -571,6 +571,17 @@ _notmuch_database_ensure_writable (notmuch_database_t *notmuch)
     return NOTMUCH_STATUS_SUCCESS;
 }
 
+static _notmuch_token_t *
+transform_type_mail (_notmuch_token_t *root, unused (void *opaque))
+{
+    _notmuch_token_t *mail_ast =
+	_notmuch_token_create_term (root, TOK_LIT, "mail");
+    mail_ast->prefix = talloc_strdup (root, _find_prefix ("type"));
+    if (!root)
+	return mail_ast;
+    return _notmuch_token_create_op (root, TOK_AND, mail_ast, root);
+}
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
 		       notmuch_database_mode_t mode)
@@ -661,27 +672,24 @@ notmuch_database_open (const char *path,
 		INTERNAL_ERROR ("Malformed database last_thread_id: %s", str);
 	}
 
-	notmuch->query_parser = new Xapian::QueryParser;
 	notmuch->term_gen = new Xapian::TermGenerator;
 	notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
-	notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
-
-	notmuch->query_parser->set_default_op (Xapian::Query::OP_AND);
-	notmuch->query_parser->set_database (*notmuch->xapian_db);
-	notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
-	notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
-	notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
+	notmuch->query_parser = _notmuch_qparser_create (notmuch, notmuch);
 
 	for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
 	    prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
-	    notmuch->query_parser->add_boolean_prefix (prefix->name,
-						       prefix->prefix);
+	    _notmuch_qparser_add_db_prefix (notmuch->query_parser, prefix->name,
+					    prefix->prefix, TRUE);
 	}
 
 	for (i = 0; i < ARRAY_SIZE (PROBABILISTIC_PREFIX); i++) {
 	    prefix_t *prefix = &PROBABILISTIC_PREFIX[i];
-	    notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
+	    _notmuch_qparser_add_db_prefix (notmuch->query_parser, prefix->name,
+					    prefix->prefix, FALSE);
 	}
+
+	_notmuch_qparser_add_transform (notmuch->query_parser,
+					transform_type_mail, NULL);
     } catch (const Xapian::Error &error) {
 	fprintf (stderr, "A Xapian exception occurred opening database: %s\n",
 		 error.get_msg().c_str());
@@ -711,9 +719,6 @@ notmuch_database_close (notmuch_database_t *notmuch)
     }
 
     delete notmuch->term_gen;
-    delete notmuch->query_parser;
-    delete notmuch->xapian_db;
-    delete notmuch->value_range_processor;
     talloc_free (notmuch);
 }
 
diff --git a/lib/query.cc b/lib/query.cc
index c7ae4ee..993f498 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -131,27 +131,21 @@ notmuch_query_search_messages (notmuch_query_t *query)
 	talloc_set_destructor (messages, _notmuch_messages_destructor);
 
 	Xapian::Enquire enquire (*notmuch->xapian_db);
-	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
-						   _find_prefix ("type"),
-						   "mail"));
-	Xapian::Query string_query, final_query;
+	_notmuch_token_t *ast;
+	Xapian::Query final_query;
 	Xapian::MSet mset;
-	unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
-			      Xapian::QueryParser::FLAG_PHRASE |
-			      Xapian::QueryParser::FLAG_LOVEHATE |
-			      Xapian::QueryParser::FLAG_BOOLEAN_ANY_CASE |
-			      Xapian::QueryParser::FLAG_WILDCARD |
-			      Xapian::QueryParser::FLAG_PURE_NOT);
+	char *error;
 
-	if (strcmp (query_string, "") == 0 ||
-	    strcmp (query_string, "*") == 0)
-	{
-	    final_query = mail_query;
-	} else {
-	    string_query = notmuch->query_parser->
-		parse_query (query_string, flags);
-	    final_query = Xapian::Query (Xapian::Query::OP_AND,
-					 mail_query, string_query);
+	ast = _notmuch_qparser_parse (query, notmuch->query_parser,
+				      query_string);
+	ast = _notmuch_qparser_transform (notmuch->query_parser, ast);
+	final_query = _notmuch_qparser_generate (query, notmuch->query_parser,
+						 ast, &error);
+	if (error) {
+	    fprintf (stderr, "Query error: %s\n", error);
+	    notmuch->exception_reported = TRUE;
+	    talloc_free (messages);
+	    return NULL;
 	}
 
 	enquire.set_weighting_scheme (Xapian::BoolWeight());
@@ -412,27 +406,19 @@ notmuch_query_count_messages (notmuch_query_t *query)
 
     try {
 	Xapian::Enquire enquire (*notmuch->xapian_db);
-	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
-						   _find_prefix ("type"),
-						   "mail"));
-	Xapian::Query string_query, final_query;
+	_notmuch_token_t *ast;
+	Xapian::Query final_query;
 	Xapian::MSet mset;
-	unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
-			      Xapian::QueryParser::FLAG_PHRASE |
-			      Xapian::QueryParser::FLAG_LOVEHATE |
-			      Xapian::QueryParser::FLAG_BOOLEAN_ANY_CASE |
-			      Xapian::QueryParser::FLAG_WILDCARD |
-			      Xapian::QueryParser::FLAG_PURE_NOT);
+	char *error;
 
-	if (strcmp (query_string, "") == 0 ||
-	    strcmp (query_string, "*") == 0)
-	{
-	    final_query = mail_query;
-	} else {
-	    string_query = notmuch->query_parser->
-		parse_query (query_string, flags);
-	    final_query = Xapian::Query (Xapian::Query::OP_AND,
-					 mail_query, string_query);
+	ast = _notmuch_qparser_parse (query, notmuch->query_parser,
+				      query_string);
+	ast = _notmuch_qparser_transform (notmuch->query_parser, ast);
+	final_query = _notmuch_qparser_generate (query, notmuch->query_parser,
+						 ast, &error);
+	if (error) {
+	    fprintf (stderr, "Query error: %s\n", error);
+	    return count;
 	}
 
 	enquire.set_weighting_scheme(Xapian::BoolWeight());
-- 
1.7.2.3


* [PATCH 5/8] Support "tag:*" as well as "NOT tag:*" queries.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
                   ` (3 preceding siblings ...)
  2011-01-16  8:10 ` [PATCH 4/8] Replace Xapian query parser with custom query parser Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-24 17:15   ` [PATCH 5.5/8] test: Wildcard tag search and untagged search Austin Clements
  2011-01-16  8:10 ` [PATCH 6/8] Support maildir folder search Austin Clements
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

This extends the syntactic-to-database prefix query transform to
optionally expand wildcards for boolean prefixes.  Support of "NOT
tag:*" queries to find all untagged messages falls out as a convenient
side-effect.
---
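The reason "NOT tag:*" falls out for free can be seen in a toy set model.
This is only a sketch under assumptions: the in-memory index, the helper
names match_prefix and and_not, and the term prefixes "K" and "T" used in
the example are illustrative and are not taken from this patch.  A pure
NOT is generated as "all documents AND_NOT x", so once "tag:*" expands to
every tag term, the difference is exactly the untagged messages.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using DocSet = std::set<int>;

// Union of the documents for every term that starts with prefix;
// models a wildcard expansion over a boolean prefix.
static DocSet
match_prefix (const std::map<std::string, DocSet> &index,
              const std::string &prefix)
{
    DocSet out;
    for (auto it = index.lower_bound (prefix);
         it != index.end ()
             && it->first.compare (0, prefix.size (), prefix) == 0;
         ++it)
        out.insert (it->second.begin (), it->second.end ());
    return out;
}

// Set difference; models a match-all query combined with AND_NOT.
static DocSet
and_not (const DocSet &all, const DocSet &exclude)
{
    DocSet out;
    for (int doc : all)
        if (!exclude.count (doc))
            out.insert (doc);
    return out;
}
```

With documents 1 and 2 carrying tag terms and document 3 carrying none,
match_prefix gives {1, 2} for the tag prefix and and_not leaves {3}.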
 TODO                  |    2 --
 lib/database.cc       |    4 ++--
 lib/notmuch-private.h |   10 ++++++----
 lib/qparser.cc        |   12 +++++++++++-
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/TODO b/TODO
index 10c8c12..15606d1 100644
--- a/TODO
+++ b/TODO
@@ -220,8 +220,6 @@ Fix the "count" functionality to be exact as Olly explained in IRC:
 
 Search syntax
 -------------
-Implement support for "tag:*" to expand to all tags.
-
 Fix "notmuch search to:" to be less confusing. Many users expect this
 to search for all messages with a To: header, but it instead searches
 for all messages with the word "to". If we don't provide the first
diff --git a/lib/database.cc b/lib/database.cc
index a3df0ae..3af82b0 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -679,13 +679,13 @@ notmuch_database_open (const char *path,
 	for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
 	    prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
 	    _notmuch_qparser_add_db_prefix (notmuch->query_parser, prefix->name,
-					    prefix->prefix, TRUE);
+					    prefix->prefix, TRUE, TRUE);
 	}
 
 	for (i = 0; i < ARRAY_SIZE (PROBABILISTIC_PREFIX); i++) {
 	    prefix_t *prefix = &PROBABILISTIC_PREFIX[i];
 	    _notmuch_qparser_add_db_prefix (notmuch->query_parser, prefix->name,
-					    prefix->prefix, FALSE);
+					    prefix->prefix, FALSE, FALSE);
 	}
 
 	_notmuch_qparser_add_transform (notmuch->query_parser,
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index eb346ea..5fc54de 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -631,13 +631,15 @@ _notmuch_qparser_add_transform (_notmuch_qparser_t *qparser,
 				void *opaque);
 
 /* Add a syntactic prefix (field) and a transform pass to transform
- * that syntactic prefix into a database prefix (prefix).  This
- * corresponds to Xapian's add_prefix and add_boolean_prefix
- * functions. */
+ * that syntactic prefix into a database prefix (prefix).  For boolean
+ * prefixes, wildcard indicates whether the term should allow wildcard
+ * expansion.  This corresponds to Xapian's add_prefix and
+ * add_boolean_prefix functions. */
 void
 _notmuch_qparser_add_db_prefix (_notmuch_qparser_t *qparser,
 				const char *field, const char *prefix,
-				notmuch_bool_t boolean);
+				notmuch_bool_t boolean,
+				notmuch_bool_t wildcard);
 
 /* Lex a query string, returning the first token in the token list.
  * This is only meant for testing. */
diff --git a/lib/qparser.cc b/lib/qparser.cc
index bd0296a..0ff240c 100644
--- a/lib/qparser.cc
+++ b/lib/qparser.cc
@@ -974,6 +974,7 @@ _notmuch_qparser_add_transform (_notmuch_qparser_t *qparser,
 
 struct _notmuch_transform_prefix_info {
     char *field, *prefix;
+    notmuch_bool_t wildcard;
 };
 
 static _notmuch_token_t *
@@ -986,6 +987,13 @@ transform_prefix_rec (struct _notmuch_transform_prefix_info *info,
 	active = (strcmp (info->field, root->text) == 0);
     } else if (active && (root->type == TOK_TERMS || root->type == TOK_LIT)) {
 	root->prefix = info->prefix;
+	if (info->wildcard) {
+	    int n = strlen (root->text);
+	    if (n && root->text[n - 1] == '*') {
+		root->text = talloc_strndup (root, root->text, n - 1);
+		root->wildcard = TRUE;
+	    }
+	}
     }
     transform_prefix_rec (info, root->left, active);
     transform_prefix_rec (info, root->right, active);
@@ -1003,12 +1011,14 @@ transform_prefix (_notmuch_token_t *root, void *opaque)
 void
 _notmuch_qparser_add_db_prefix (_notmuch_qparser_t *qparser,
 				const char *field, const char *prefix,
-				notmuch_bool_t boolean)
+				notmuch_bool_t boolean,
+				notmuch_bool_t wildcard)
 {
     struct _notmuch_transform_prefix_info *info;
     info = talloc (qparser, struct _notmuch_transform_prefix_info);
     info->field = talloc_strdup (info, field);
     info->prefix = talloc_strdup (info, prefix);
+    info->wildcard = boolean && wildcard;
     _notmuch_qparser_add_prefix (qparser, field, boolean, boolean);
     _notmuch_qparser_add_transform (qparser, transform_prefix, info);
 }
-- 
1.7.2.3


* [PATCH 6/8] Support maildir folder search.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
                   ` (4 preceding siblings ...)
  2011-01-16  8:10 ` [PATCH 5/8] Support "tag:*" as well as "NOT tag:*" queries Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-24 17:13   ` [PATCH 6/8 v2] " Austin Clements
  2011-01-24 17:18   ` [PATCH 6.5/8] test: Add tests for custom query parser-based folder searches Austin Clements
  2011-01-16  8:10 ` [PATCH 7/8] Implement value range queries Austin Clements
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

This implements a folder: query prefix by constructing a wildcard
query that matches all files within the specified folder, folder/new,
or folder/cur.  This works with hierarchical folder names, and accepts
both absolute and relative paths.
---
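
As a rough illustration of the expansion described above, the sketch
below builds the three directory paths that a folder: query considers
and the term text used for a directory's document ID.  The function
names are hypothetical, and the database lookup that maps a path to a
document ID is left out entirely.

```cpp
#include <array>
#include <string>

// The three directories considered for folder:NAME: NAME itself plus
// its maildir new/ and cur/ subdirectories.
static std::array<std::string, 3>
sketch_folder_candidates (const std::string &relpath)
{
    return {{ relpath, relpath + "/new", relpath + "/cur" }};
}

// Term text built from a directory's document ID, in the same
// "<docid>:" form the patch passes to talloc_asprintf.
static std::string
sketch_direntry_term (unsigned int doc_id)
{
    return std::to_string (doc_id) + ":";
}
```

In the patch, each of the three terms is given the file-direntry prefix
and the results are combined with OR.  A directory that is not found
falls back to document ID 0.
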
 lib/database.cc |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 3af82b0..20fd412 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -582,6 +582,57 @@ transform_type_mail (_notmuch_token_t *root, unused (void *opaque))
     return _notmuch_token_create_op (root, TOK_AND, mail_ast, root);
 }
 
+static _notmuch_token_t *
+transform_folder_one (const void *ctx, notmuch_database_t *notmuch,
+		      const char *relpath, const char *rest)
+{
+    const char *db_path;
+    notmuch_private_status_t status;
+    unsigned int doc_id;
+    _notmuch_token_t *tok;
+
+    /* Get the docid for (relpath + rest). */
+    if (*rest)
+	relpath = talloc_asprintf (ctx, "%s%s", relpath, rest);
+    db_path = _notmuch_database_get_directory_db_path (relpath);
+    status = _notmuch_database_find_unique_doc_id (notmuch, "directory",
+						   db_path, &doc_id);
+    if (db_path != relpath)
+	free ((char*) db_path);
+
+    /* Construct a wildcard query that matches files in this directory. */
+    if (status)
+	/* Directory doesn't exist.  Perhaps this should be an error? */
+	doc_id = 0;
+    tok = _notmuch_token_create_term (ctx, TOK_LIT,
+				      talloc_asprintf (ctx, "%u:", doc_id));
+    tok->prefix = _find_prefix ("file-direntry");
+    return tok;
+}
+
+static _notmuch_token_t *
+transform_folder (_notmuch_token_t *root, void *opaque)
+{
+    if (!root)
+	return NULL;
+    if (root->type == TOK_PREFIX && strcmp (root->text, "folder") == 0) {
+	notmuch_database_t *notmuch = (notmuch_database_t *)opaque;
+	_notmuch_token_t *lit = root->left, *subs[3], *tok;
+	const char *relpath;
+	assert (lit && lit->type == TOK_LIT);
+	relpath = _notmuch_database_relative_path (notmuch, lit->text);
+	subs[0] = transform_folder_one (root, notmuch, relpath, "");
+	subs[1] = transform_folder_one (root, notmuch, relpath, "/new");
+	subs[2] = transform_folder_one (root, notmuch, relpath, "/cur");
+	tok = _notmuch_token_create_op (root, TOK_OR, subs[1], subs[2]);
+	return _notmuch_token_create_op (root, TOK_OR, subs[0], tok);
+    }
+
+    root->left = transform_folder (root->left, opaque);
+    root->right = transform_folder (root->right, opaque);
+    return root;
+}
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
 		       notmuch_database_mode_t mode)
@@ -690,6 +741,11 @@ notmuch_database_open (const char *path,
 
 	_notmuch_qparser_add_transform (notmuch->query_parser,
 					transform_type_mail, NULL);
+
+	_notmuch_qparser_add_prefix (notmuch->query_parser, "folder",
+				     TRUE, TRUE);
+	_notmuch_qparser_add_transform (notmuch->query_parser,
+					transform_folder, notmuch);
     } catch (const Xapian::Error &error) {
 	fprintf (stderr, "A Xapian exception occurred opening database: %s\n",
 		 error.get_msg().c_str());
-- 
1.7.2.3


* [PATCH 7/8] Implement value range queries.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
                   ` (5 preceding siblings ...)
  2011-01-16  8:10 ` [PATCH 6/8] Support maildir folder search Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-16  8:10 ` [PATCH 8/8] Support before: and after: date search with sane date syntax Austin Clements
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

Unlike in Xapian, there's no specific syntax that generates value
ranges.  Instead, it's up to query transforms to generate them.
---
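
The generator picks one of three Xapian operators depending on which
endpoints of the range are set.  The sketch below isolates that
decision; SketchRangeOp and sketch_pick_range_op are hypothetical names
used only for illustration, and the enumerators stand for
Query::OP_VALUE_RANGE, Query::OP_VALUE_GE, and Query::OP_VALUE_LE.

```cpp
// Hypothetical labels for the operators chosen by the generator.
enum class SketchRangeOp { ValueRange, ValueGe, ValueLe, Invalid };

// A NULL endpoint means the range is open on that side.  A range with
// no endpoints at all is a caller bug (an INTERNAL_ERROR in the patch).
static SketchRangeOp
sketch_pick_range_op (const char *begin, const char *end)
{
    if (begin && end)
        return SketchRangeOp::ValueRange;
    if (begin)
        return SketchRangeOp::ValueGe;
    if (end)
        return SketchRangeOp::ValueLe;
    return SketchRangeOp::Invalid;
}
```
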
 lib/notmuch-private.h |    8 ++++++++
 lib/qparser.cc        |   20 +++++++++++++++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 5fc54de..9c16f56 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -548,6 +548,8 @@ enum _notmuch_token_type {
      * with no phrase splitting or whitespace removal.  The lexer
      * only generates TOK_TERMS; the parser creates TOK_LIT. */
     TOK_LIT,
+    /* A value range operand. */
+    TOK_RANGE,
     /* An error token.  An error token anywhere in the parse tree will
      * be propagated up by the generator and returned to the caller.
      * The error message should be in the text. */
@@ -577,6 +579,12 @@ typedef struct _notmuch_token {
      * match any terms prefixed with text. */
     notmuch_bool_t wildcard;
 
+    /* For TOK_RANGE, the value number to filter on, and the
+     * (inclusive) range to match lexicographically.  Either endpoint
+     * may be NULL, indicating an open-ended range. */
+    unsigned valueno;
+    const char *rangeBegin, *rangeEnd;
+
     /* Link in the lexer token list. */
     struct _notmuch_token *next;
 
diff --git a/lib/qparser.cc b/lib/qparser.cc
index 0ff240c..2c63062 100644
--- a/lib/qparser.cc
+++ b/lib/qparser.cc
@@ -43,7 +43,6 @@
  * Still missing from this implementation:
  * * Stemming - The stemming should probably be marked on TOK_TERMS
  *   tokens.  Ideally, we can just pass this to the term generator.
- * * Value ranges in the IR
  */
 
 /* XXX notmuch currently registers "tag" as an exclusive boolean
@@ -100,13 +99,13 @@ static const char *token_types[] = {
     "LOVE", "HATE", "BRA", "KET",
     "AND", "OR", "XOR", "ADJ", "NEAR",
     "NOT", "FILTER", "PREFIX",
-    "TERMS", "LIT", "ERROR", "END"
+    "TERMS", "LIT", "RANGE", "ERROR", "END"
 };
 
 /* The distinguished end token.  This simplifies the parser since it
  * never has to worry about dereferencing next. */
 static _notmuch_token_t tok_end = {TOK_END, NULL, -1, FALSE, NULL, FALSE,
-				   &tok_end, NULL, NULL};
+				   0, NULL, NULL, &tok_end, NULL, NULL};
 
 _notmuch_token_t *
 _notmuch_token_create_op (const void *ctx, enum _notmuch_token_type type,
@@ -145,6 +144,9 @@ _notmuch_token_show (const void *ctx, _notmuch_token_t *tok)
     else if (tok->type == TOK_LIT)
 	return talloc_asprintf (ctx, "'%s'%s", tok->text,
 				tok->wildcard ? "*" : "");
+    else if (tok->type == TOK_RANGE)
+	return talloc_asprintf (ctx, "RANGE/%u:%s..%s",
+				tok->valueno, tok->rangeBegin, tok->rangeEnd);
     else if (tok->type == TOK_ERROR)
 	return talloc_asprintf (ctx, "ERROR/\"%s\"", tok->text);
 
@@ -536,6 +538,7 @@ parse_prob (struct _notmuch_parse_state *s, int prec, _notmuch_token_t **tok)
 	    break;
 
 	case TOK_FILTER:
+	case TOK_RANGE:
 	case TOK_ERROR:
 	    INTERNAL_ERROR ("Unexpected token %s",
 			    _notmuch_token_show (s->ctx, *tok));
@@ -902,6 +905,17 @@ generate (struct _notmuch_generate_state *s, _notmuch_token_t *root)
 	    return generate_wildcard (s, term);
 	return Query (term);
 
+    case TOK_RANGE:
+	if (root->rangeBegin && root->rangeEnd)
+	    return Query (Query::OP_VALUE_RANGE, root->valueno,
+			  root->rangeBegin, root->rangeEnd);
+	else if (root->rangeBegin)
+	    return Query (Query::OP_VALUE_GE, root->valueno, root->rangeBegin);
+	else if (root->rangeEnd)
+	    return Query (Query::OP_VALUE_LE, root->valueno, root->rangeEnd);
+	else
+	    INTERNAL_ERROR ("TOK_RANGE must have an endpoint");
+
     case TOK_ERROR:
 	if (!s->error)
 	    s->error = talloc_strdup (s->ctx, root->text);
-- 
1.7.2.3


* [PATCH 8/8] Support before: and after: date search with sane date syntax.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
                   ` (6 preceding siblings ...)
  2011-01-16  8:10 ` [PATCH 7/8] Implement value range queries Austin Clements
@ 2011-01-16  8:10 ` Austin Clements
  2011-01-24 17:20   ` [PATCH 8.5/8] test: Add tests for search by date Austin Clements
  2011-01-31  4:33 ` [PATCH 9/8] qparser: Delete (and thus close) the Xapian database Austin Clements
  2011-02-02  5:03 ` [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
  9 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-01-16  8:10 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

This adds Gmail-compatible before: and after: date search operators,
where the date must be specified as yyyy/mm/dd.  This is just a start;
it would be great to support fancier date syntax and, in particular,
relative date specifications.
---
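
The date handling can be tried in isolation with a sketch like the one
below.  It covers only the parsing and validation step, not the
serialisation into a range endpoint.  sketch_parse_date is a
hypothetical name, and setting tm_isdst to -1 is a deviation from the
patch, which leaves it at zero.

```cpp
#include <cstring>
#include <time.h>

// Parse a yyyy/mm/dd date as local midnight.  Returns true and stores
// the timestamp in *out on success; returns false for malformed input
// or trailing characters, matching the checks in the patch.
static bool
sketch_parse_date (const char *text, time_t *out)
{
    struct tm t;
    std::memset (&t, 0, sizeof (t));
    t.tm_isdst = -1;    // assumption: let mktime decide whether DST applies

    const char *end = strptime (text, "%Y/%m/%d", &t);
    if (!end || *end)
        return false;

    time_t value = mktime (&t);
    if (value == (time_t) -1)
        return false;

    *out = value;
    return true;
}
```

Because mktime interprets the fields in the local time zone, the same
query string yields different timestamps on machines in different time
zones.
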
 lib/database.cc |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 20fd412..cb02220 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -633,6 +633,49 @@ transform_folder (_notmuch_token_t *root, void *opaque)
     return root;
 }
 
+static _notmuch_token_t *
+transform_date (_notmuch_token_t *root, void *opaque)
+{
+    if (!root)
+	return NULL;
+    if (root->type == TOK_PREFIX) {
+	int after = (strcmp (root->text, "after") == 0);
+	int before = (strcmp (root->text, "before") == 0);
+	if (after || before) {
+	    _notmuch_token_t *tok = root->left;
+	    struct tm t;
+	    time_t time_value;
+	    char *end, *serialized;
+
+	    /* Parse the date */
+	    assert (tok && tok->type == TOK_LIT);
+	    memset(&t, 0, sizeof(t));
+	    end = strptime (tok->text, "%Y/%m/%d", &t);
+	    time_value = mktime(&t);
+	    if (!end || *end || time_value == -1) {
+		char *msg = talloc_asprintf (root, "Invalid date \"%s\"",
+					     tok->text);
+		return _notmuch_token_create_term (root, TOK_ERROR, msg);
+	    }
+
+	    /* Construct the date query */
+	    tok = _notmuch_token_create_term (root, TOK_RANGE, NULL);
+	    tok->valueno = NOTMUCH_VALUE_TIMESTAMP;
+	    serialized = talloc_strdup
+		(tok, Xapian::sortable_serialise (time_value).c_str ());
+	    if (after)
+		tok->rangeBegin = serialized;
+	    else
+		tok->rangeEnd = serialized;
+	    return tok;
+	}
+    }
+
+    root->left = transform_date (root->left, opaque);
+    root->right = transform_date (root->right, opaque);
+    return root;
+}
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
 		       notmuch_database_mode_t mode)
@@ -746,6 +789,13 @@ notmuch_database_open (const char *path,
 				     TRUE, TRUE);
 	_notmuch_qparser_add_transform (notmuch->query_parser,
 					transform_folder, notmuch);
+
+	_notmuch_qparser_add_prefix (notmuch->query_parser, "after",
+				     TRUE, TRUE);
+	_notmuch_qparser_add_prefix (notmuch->query_parser, "before",
+				     TRUE, TRUE);
+	_notmuch_qparser_add_transform (notmuch->query_parser,
+					transform_date, NULL);
     } catch (const Xapian::Error &error) {
 	fprintf (stderr, "A Xapian exception occurred opening database: %s\n",
 		 error.get_msg().c_str());
-- 
1.7.2.3


* [PATCH 1.5/8] Query parser testing framework and basic tests.
  2011-01-16  8:10 ` [PATCH 1/8] Implement a custom query parser with a mostly Xapian-compatible grammar Austin Clements
@ 2011-01-21  6:37   ` Austin Clements
  0 siblings, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-21  6:37 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

The query parser test is implemented as a separate binary that calls
directly into the lexer, parser, and generator to make it easy to
isolate test failures.
---

Sorry for the patch ordering.  This is intended to be applied after
patch 1/8 in this series,
id:1295165458-9573-2-git-send-email-amdragon@mit.edu
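
The expected-output files double as input because of how echo mode
treats each line.  The sketch below restates that classification on its
own.  SketchLineKind and sketch_classify_line are hypothetical names
rather than part of qparser-test, and treating a zero-length string as
blank is an assumption added for safety.

```cpp
#include <string>

// What echo mode does with one input line.
enum class SketchLineKind { Echo, Ignore, Query };

// Blank lines and '#' comments are echoed, '[' result lines from a
// previous run are dropped, and anything else is parsed as a query.
static SketchLineKind
sketch_classify_line (const std::string &line)
{
    if (line.empty () || line[0] == '\n' || line[0] == '#')
        return SketchLineKind::Echo;
    if (line[0] == '[')
        return SketchLineKind::Ignore;
    return SketchLineKind::Query;
}
```

Dropping the '[' lines is what lets the test feed an expected-output
file to qparser-test and compare the result against the same file.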

 test/Makefile.local                         |    7 +-
 test/basic                                  |    2 +-
 test/notmuch-test                           |    2 +-
 test/qparser                                |   32 ++++++
 test/qparser-test.cc                        |  153 +++++++++++++++++++++++++++
 test/qparser.expected-output/operators      |  145 +++++++++++++++++++++++++
 test/qparser.expected-output/prefixes       |   33 ++++++
 test/qparser.expected-output/probs          |   46 ++++++++
 test/qparser.expected-output/quoted-phrases |   29 +++++
 test/qparser.expected-output/terms          |  136 ++++++++++++++++++++++++
 10 files changed, 581 insertions(+), 4 deletions(-)
 create mode 100755 test/qparser
 create mode 100644 test/qparser-test.cc
 create mode 100644 test/qparser.expected-output/operators
 create mode 100644 test/qparser.expected-output/prefixes
 create mode 100644 test/qparser.expected-output/probs
 create mode 100644 test/qparser.expected-output/quoted-phrases
 create mode 100644 test/qparser.expected-output/terms

diff --git a/test/Makefile.local b/test/Makefile.local
index 7b602bc..302482b 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -5,10 +5,13 @@ dir := test
 $(dir)/smtp-dummy: $(dir)/smtp-dummy.c
 	$(call quiet,CC) $^ -o $@
 
+$(dir)/qparser-test: $(dir)/qparser-test.o notmuch-config.o query-string.o lib/libnotmuch.a
+	$(call quiet,CXX $(CXXFLAGS)) $^ $(FINAL_LIBNOTMUCH_LDFLAGS) -o $@
+
 .PHONY: test check
-test:	all $(dir)/smtp-dummy
+test:	all $(dir)/smtp-dummy $(dir)/qparser-test
 	@${dir}/notmuch-test $(OPTIONS)
 
 check: test
 
-CLEAN := $(CLEAN) $(dir)/smtp-dummy
+CLEAN := $(CLEAN) $(dir)/smtp-dummy $(dir)/qparser-test.o $(dir)/qparser-test
diff --git a/test/basic b/test/basic
index b4410f2..3191bcc 100755
--- a/test/basic
+++ b/test/basic
@@ -52,7 +52,7 @@ test_expect_code 2 'failure to clean up causes the test to fail' '
 # Ensure that all tests are being run
 test_begin_subtest 'Ensure that all available tests will be run by notmuch-test'
 tests_in_suite=$(grep TESTS= ../notmuch-test | sed -e "s/TESTS=\"\(.*\)\"/\1/" | tr " " "\n" | sort)
-available=$(ls -1 ../ | grep -v -E "^(aggregate-results.sh|Makefile|Makefile.local|notmuch-test|README|test-lib.sh|test-results|tmp.*|valgrind|corpus*|emacs.expected-output|smtp-dummy|smtp-dummy.c|test-verbose|test.expected-output)" | sort)
+available=$(ls -1 ../ | grep -v -E "^(aggregate-results.sh|Makefile|Makefile.local|notmuch-test|README|test-lib.sh|test-results|tmp.*|valgrind|corpus*|emacs.expected-output|smtp-dummy|smtp-dummy.c|test-verbose|test.expected-output|qparser-test.*|qparser-test|qparser.expected-output)" | sort)
 test_expect_equal "$tests_in_suite" "$available"
 
 EXPECTED=../test.expected-output
diff --git a/test/notmuch-test b/test/notmuch-test
index 4889e49..1e331b3 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -16,7 +16,7 @@ fi
 
 cd $(dirname "$0")
 
-TESTS="basic new search search-output json thread-naming raw reply dump-restore uuencode thread-order author-order from-guessing long-id encoding emacs maildir-sync"
+TESTS="basic new qparser search search-output json thread-naming raw reply dump-restore uuencode thread-order author-order from-guessing long-id encoding emacs maildir-sync"
 
 # Clean up any results from a previous run
 rm -r test-results >/dev/null 2>/dev/null
diff --git a/test/qparser b/test/qparser
new file mode 100755
index 0000000..0e7b022
--- /dev/null
+++ b/test/qparser
@@ -0,0 +1,32 @@
+#!/bin/bash
+test_description="query parser"
+. ./test-lib.sh
+
+EXPECTED=../qparser.expected-output
+
+test_begin_subtest "Quoted phrases"
+output=$(../qparser-test < $EXPECTED/quoted-phrases)
+expected=$(cat $EXPECTED/quoted-phrases)
+test_expect_equal "$output" "$expected"
+
+test_begin_subtest "Prefixes"
+output=$(../qparser-test < $EXPECTED/prefixes)
+expected=$(cat $EXPECTED/prefixes)
+test_expect_equal "$output" "$expected"
+
+test_begin_subtest "Terms"
+output=$(../qparser-test < $EXPECTED/terms)
+expected=$(cat $EXPECTED/terms)
+test_expect_equal "$output" "$expected"
+
+test_begin_subtest "Operators"
+output=$(../qparser-test < $EXPECTED/operators)
+expected=$(cat $EXPECTED/operators)
+test_expect_equal "$output" "$expected"
+
+test_begin_subtest "Probs"
+output=$(../qparser-test < $EXPECTED/probs)
+expected=$(cat $EXPECTED/probs)
+test_expect_equal "$output" "$expected"
+
+test_done
diff --git a/test/qparser-test.cc b/test/qparser-test.cc
new file mode 100644
index 0000000..01d6bae
--- /dev/null
+++ b/test/qparser-test.cc
@@ -0,0 +1,153 @@
+/* qparser-test - Display the lex, parse, and query tree for a query
+ *
+ * Copyright © 2011 Austin Clements
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Authors: Austin Clements <amdragon@mit.edu>
+ */
+
+/* If command-line arguments are given, they are used as the query
+ * string.  Otherwise, qparser-test enters "echo mode", in which it
+ * accepts queries from stdin.  In echo mode, lines beginning with '['
+ * are ignored and lines consisting of whitespace or comments are
+ * echoed back to stdout.  All other lines are treated as queries and
+ * are echoed back, followed by the results of parsing the query.
+ * This allows the output of qparser-test to be fed back in as input.
+ *
+ * For each query, qparser-test displays the lex list of that query,
+ * the parse tree of that query, and the generated query tree.
+ * Finally, if the generated query tree differs from that generated by
+ * Xapian's query parser, it also displays what Xapian's query parser
+ * generated.
+ */
+
+#include "../lib/notmuch-private.h"
+#include "../lib/database-private.h"
+
+extern "C" {
+/* notmuch-client.h also defines INTERNAL_ERROR */
+#undef INTERNAL_ERROR
+#include "../notmuch-client.h"
+}
+
+static _notmuch_qparser_t *qparser;
+static Xapian::QueryParser xqparser;
+
+static char *
+query_desc (void *ctx, Xapian::Query q)
+{
+    char *desc = talloc_strdup (ctx, q.get_description ().c_str ());
+    desc += strlen ("Xapian::Query(");
+    desc[strlen(desc) - 1] = 0;
+    return desc;
+}
+
+static void
+test_one (void *ctx, const char *query_str)
+{
+    void *local = talloc_new (ctx);
+    Xapian::Query q;
+    _notmuch_token_t *toks, *root;
+    char *error, *qparser_desc, *xqparser_desc;
+
+    toks = _notmuch_qparser_lex (local, qparser, query_str);
+    printf("[lex]    %s\n", _notmuch_token_show_list (local, toks));
+
+    root = _notmuch_qparser_parse (local, qparser, query_str);
+    printf("[parse]  %s\n", _notmuch_token_show_tree (local, root));
+
+    root = _notmuch_qparser_transform (qparser, root);
+    q = _notmuch_qparser_generate (local, qparser, root, &error);
+    /* Always set qparser_desc; it is compared against Xapian's below. */
+    qparser_desc = error ? error : query_desc (local, q);
+    if (error)
+	printf("[gen]    error %s\n", error);
+    else
+	printf("[gen]    %s\n", qparser_desc);
+
+    try {
+	unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
+			      Xapian::QueryParser::FLAG_PHRASE |
+			      Xapian::QueryParser::FLAG_LOVEHATE |
+			      Xapian::QueryParser::FLAG_BOOLEAN_ANY_CASE |
+			      Xapian::QueryParser::FLAG_WILDCARD |
+			      Xapian::QueryParser::FLAG_PURE_NOT);
+	q = xqparser.parse_query (query_str, flags);
+	xqparser_desc = query_desc (local, q);
+	if (strcmp (qparser_desc, xqparser_desc) != 0)
+	    printf("[xapian] %s\n", xqparser_desc);
+    } catch (const Xapian::QueryParserError & e) {
+	printf("[xapian] error %s\n", e.get_msg ().c_str ());
+    }
+
+    talloc_free (local);
+}
+
+static _notmuch_qparser_t *
+create_qparser (void *ctx)
+{
+    _notmuch_qparser_t *qparser = _notmuch_qparser_create (ctx, NULL);
+    _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE);
+    _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE);
+    _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE);
+    return qparser;
+}
+
+static Xapian::QueryParser
+create_xapian_qparser (void)
+{
+    Xapian::QueryParser xq;
+    xq.set_default_op (Xapian::Query::OP_AND);
+    xq.add_prefix ("prob", "P");
+    xq.add_boolean_prefix ("lit", "L");
+    xq.add_boolean_prefix ("tag", "K");
+    return xq;
+}
+
+int
+main (int argc, char **argv)
+{
+    void *ctx;
+
+    ctx = talloc_new (NULL);
+
+    qparser = create_qparser (ctx);
+    xqparser = create_xapian_qparser ();
+
+    if (argc > 1) {
+	char *query_str;
+	query_str = query_string_from_args (ctx, argc - 1, argv + 1);
+	test_one (ctx, query_str);
+    } else {
+	/* Echo mode */
+	char line[512];
+	while (fgets (line, sizeof (line), stdin)) {
+	    if (line[0] == '\n' || line[0] == '#') {
+		/* Comment or whitespace.  Echo it */
+		printf("%s", line);
+	    } else if (line[0] == '[') {
+		/* Ignore line */
+	    } else {
+		/* Query */
+		if (line[strlen (line) - 1] == '\n')
+		    line[strlen (line) - 1] = 0;
+		printf("%s\n", line);
+		test_one (ctx, line);
+	    }
+	}
+    }
+
+    return 0;
+}
diff --git a/test/qparser.expected-output/operators b/test/qparser.expected-output/operators
new file mode 100644
index 0000000..788f007
--- /dev/null
+++ b/test/qparser.expected-output/operators
@@ -0,0 +1,145 @@
+# Boolean operators
+
+x and y
+[lex]    "x" AND "y"
+[parse]  (AND "x" "y")
+[gen]    (x:(pos=1) AND y:(pos=2))
+
+x or y
+[lex]    "x" OR "y"
+[parse]  (OR "x" "y")
+[gen]    (x:(pos=1) OR y:(pos=2))
+
+x xor y
+[lex]    "x" XOR "y"
+[parse]  (XOR "x" "y")
+[gen]    (x:(pos=1) XOR y:(pos=2))
+
+x and y or x and w
+[lex]    "x" AND "y" OR "x" AND "w"
+[parse]  (OR (AND "x" "y") (AND "x" "w"))
+[gen]    ((x:(pos=1) AND y:(pos=2)) OR (x:(pos=3) AND w:(pos=4)))
+
+x and -y
+[lex]    "x" AND HATE "y"
+[parse]  (AND "x" (NOT "y"))
+[gen]    (x:(pos=1) AND_NOT y:(pos=2))
+
+x or not y
+[lex]    "x" OR NOT "y"
+[parse]  (OR "x" (NOT "y"))
+[gen]    (x:(pos=1) OR (<alldocuments> AND_NOT y:(pos=2)))
+
+# The following three are Xapian-incompatible because they're syntax errors.
+x and
+[lex]    "x" AND
+[parse]  "x"
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> AND <expression>
+
+and x
+[lex]    AND "x"
+[parse]  "x"
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> AND <expression>
+
+and
+[lex]    AND
+[parse]  <nil>
+[gen]    <alldocuments>
+[xapian] error Syntax: <expression> AND <expression>
+
+# Unary NOT
+
+x not y
+[lex]    "x" NOT "y"
+[parse]  (AND "x" (NOT "y"))
+[gen]    (x:(pos=1) AND_NOT y:(pos=2))
+
+x not y or z
+[lex]    "x" NOT "y" OR "z"
+[parse]  (OR (AND "x" (NOT "y")) "z")
+[gen]    ((x:(pos=1) AND_NOT y:(pos=2)) OR z:(pos=3))
+
+x not y and z
+[lex]    "x" NOT "y" AND "z"
+[parse]  (AND (AND "x" (NOT "y")) "z")
+[gen]    ((x:(pos=1) AND_NOT y:(pos=2)) AND z:(pos=3))
+
+not not x
+[lex]    NOT NOT "x"
+[parse]  (NOT (NOT "x"))
+[gen]    (<alldocuments> AND_NOT (<alldocuments> AND_NOT x:(pos=1)))
+[xapian] error Syntax: <expression> NOT <expression>
+
+# Empty subexpressions
+# These are all Xapian-incompatible because they're syntax errors.
+
+x and ()
+[lex]    "x" AND BRA KET
+[parse]  "x"
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> AND <expression>
+
+() and x
+[lex]    BRA KET AND "x"
+[parse]  "x"
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> AND <expression>
+
+# NULL phrases
+# These are all Xapian-incompatible because they're syntax errors.
+
+and
+[lex]    AND
+[parse]  <nil>
+[gen]    <alldocuments>
+[xapian] error Syntax: <expression> AND <expression>
+
+@
+[lex]    "@"
+[parse]  "@"
+[gen]    <alldocuments>
+[xapian] 
+
+@ AND x
+[lex]    "@" AND "x"
+[parse]  (AND "@" "x")
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> AND <expression>
+
+x AND @
+[lex]    "x" AND "@"
+[parse]  (AND "x" "@")
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> AND <expression>
+
+@ AND NOT x
+[lex]    "@" AND NOT "x"
+[parse]  (AND "@" (NOT "x"))
+[gen]    (<alldocuments> AND_NOT x:(pos=1))
+[xapian] error Syntax: <expression> AND NOT <expression>
+
+x AND NOT @
+[lex]    "x" AND NOT "@"
+[parse]  (AND "x" (NOT "@"))
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> AND NOT <expression>
+
+NOT @
+[lex]    NOT "@"
+[parse]  (NOT "@")
+[gen]    <alldocuments>
+[xapian] error Syntax: <expression> NOT <expression>
+
+@ OR x
+[lex]    "@" OR "x"
+[parse]  (OR "@" "x")
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> OR <expression>
+
+x OR @
+[lex]    "x" OR "@"
+[parse]  (OR "x" "@")
+[gen]    x:(pos=1)
+[xapian] error Syntax: <expression> OR <expression>
diff --git a/test/qparser.expected-output/prefixes b/test/qparser.expected-output/prefixes
new file mode 100644
index 0000000..04c4f90
--- /dev/null
+++ b/test/qparser.expected-output/prefixes
@@ -0,0 +1,33 @@
+prob:x lit:y none:z
+[lex]    PREFIX/prob "x" PREFIX/lit "y" "none:z"
+[parse]  (AND (AND (PREFIX/prob "x") "none:z") (FILTER (PREFIX/lit 'y')))
+[gen]    ((Px:(pos=1) AND (none:(pos=2) PHRASE 2 z:(pos=3))) FILTER Ly)
+
+prob:"x y" lit:"x y" none:"x y"
+[lex]    PREFIX/prob "x y" PREFIX/lit "x y" "none:" "x y"
+[parse]  (AND (AND (AND (PREFIX/prob "x y") "none:") "x y") (FILTER (PREFIX/lit 'x y')))
+[gen]    (((Px:(pos=1) PHRASE 2 Py:(pos=2)) AND none:(pos=3) AND (x:(pos=4) PHRASE 2 y:(pos=5))) FILTER Lx y)
+
+# Incompatible; Xapian bails and re-parses everything with no flags
+prob:(x y) lit:(x y) none:(x y)
+[lex]    PREFIX/prob BRA "x" "y" KET PREFIX/lit "(x" "y" KET "none:" BRA "x" "y" KET
+[parse]  (AND (AND (AND (AND (PREFIX/prob (AND "x" "y")) "y") "none:") (AND "x" "y")) (FILTER (PREFIX/lit '(x')))
+[gen]    ((Px:(pos=1) AND Py:(pos=2) AND y:(pos=3) AND none:(pos=4) AND x:(pos=5) AND y:(pos=6)) FILTER L(x)
+[xapian] ((prob:(pos=1) AND x:(pos=2) AND y:(pos=3) AND y:(pos=4) AND none:(pos=5) AND x:(pos=6) AND y:(pos=7)) FILTER L(x)
+
+# This is Xapian-compatible, but seems ridiculous
+lit:(x)
+[lex]    PREFIX/lit "(x" KET
+[parse]  (FILTER (PREFIX/lit '(x'))
+[gen]    0 * L(x
+
+# Test characters accepted after the prefix colon
+lit:#
+[lex]    PREFIX/lit "#"
+[parse]  (FILTER (PREFIX/lit '#'))
+[gen]    0 * L#
+
+prob:#
+[lex]    "prob:#"
+[parse]  "prob:#"
+[gen]    prob:(pos=1)
diff --git a/test/qparser.expected-output/probs b/test/qparser.expected-output/probs
new file mode 100644
index 0000000..3c166f7
--- /dev/null
+++ b/test/qparser.expected-output/probs
@@ -0,0 +1,46 @@
+(x OR y) AND z
+[lex]    BRA "x" OR "y" KET AND "z"
+[parse]  (AND (OR "x" "y") "z")
+[gen]    ((x:(pos=1) OR y:(pos=2)) AND z:(pos=3))
+
+# Incompatible; Xapian bails on the syntax error, we forge ahead.
+(x OR y)) AND z
+[lex]    BRA "x" OR "y" KET KET AND "z"
+[parse]  (AND (OR "x" "y") "z")
+[gen]    ((x:(pos=1) OR y:(pos=2)) AND z:(pos=3))
+[xapian] (x:(pos=1) AND or:(pos=2) AND y:(pos=3) AND and:(pos=4) AND z:(pos=5))
+
+# Empty subexpression after prefix
+# Incompatible; Xapian treats as a syntax error.
+prob:() AND x
+[lex]    PREFIX/prob BRA KET AND "x"
+[parse]  "x"
+[gen]    x:(pos=1)
+[xapian] (prob:(pos=1) AND and:(pos=2) AND x:(pos=3))
+
+# Subqueries with same boolean prefix
+lit:x lit:y
+[lex]    PREFIX/lit "x" PREFIX/lit "y"
+[parse]  (FILTER (OR (PREFIX/lit 'x') (PREFIX/lit 'y')))
+[gen]    0 * (Lx OR Ly)
+
+# Combining prob components
+x -y lit:z
+[lex]    "x" HATE "y" PREFIX/lit "z"
+[parse]  (AND (AND "x" (FILTER (PREFIX/lit 'z'))) (NOT "y"))
+[gen]    ((x:(pos=1) FILTER Lz) AND_NOT y:(pos=2))
+
+x lit:z
+[lex]    "x" PREFIX/lit "z"
+[parse]  (AND "x" (FILTER (PREFIX/lit 'z')))
+[gen]    (x:(pos=1) FILTER Lz)
+
+-y lit:z
+[lex]    HATE "y" PREFIX/lit "z"
+[parse]  (AND (FILTER (PREFIX/lit 'z')) (NOT "y"))
+[gen]    (0 * Lz AND_NOT y:(pos=1))
+
+x -y
+[lex]    "x" HATE "y"
+[parse]  (AND "x" (NOT "y"))
+[gen]    (x:(pos=1) AND_NOT y:(pos=2))
diff --git a/test/qparser.expected-output/quoted-phrases b/test/qparser.expected-output/quoted-phrases
new file mode 100644
index 0000000..e223366
--- /dev/null
+++ b/test/qparser.expected-output/quoted-phrases
@@ -0,0 +1,29 @@
+x "y z" w
+[lex]    "x" "y z" "w"
+[parse]  (AND (AND "x" "y z") "w")
+[gen]    (x:(pos=1) AND (y:(pos=2) PHRASE 2 z:(pos=3)) AND w:(pos=4))
+
+x "y z
+[lex]    "x" "y z"
+[parse]  (AND "x" "y z")
+[gen]    (x:(pos=1) AND (y:(pos=2) PHRASE 2 z:(pos=3)))
+
+x "" y
+[lex]    "x" "" "y"
+[parse]  (AND (AND "x" "") "y")
+[gen]    (x:(pos=1) AND y:(pos=2))
+
+x "  " y
+[lex]    "x" "  " "y"
+[parse]  (AND (AND "x" "  ") "y")
+[gen]    (x:(pos=1) AND y:(pos=2))
+
+lit:" x y"
+[lex]    PREFIX/lit " x y"
+[parse]  (FILTER (PREFIX/lit ' x y'))
+[gen]    0 * L x y
+
+lit:"x""y"
+[lex]    PREFIX/lit "x"y"
+[parse]  (FILTER (PREFIX/lit 'x"y'))
+[gen]    0 * Lx"y
diff --git a/test/qparser.expected-output/terms b/test/qparser.expected-output/terms
new file mode 100644
index 0000000..9316c54
--- /dev/null
+++ b/test/qparser.expected-output/terms
@@ -0,0 +1,136 @@
+# Term lexing
+
+x y z
+[lex]    "x" "y" "z"
+[parse]  (AND (AND "x" "y") "z")
+[gen]    (x:(pos=1) AND y:(pos=2) AND z:(pos=3))
+
+x"y z"w
+[lex]    "x" "y z" "w"
+[parse]  (AND (AND "x" "y z") "w")
+[gen]    (x:(pos=1) AND (y:(pos=2) PHRASE 2 z:(pos=3)) AND w:(pos=4))
+
+x(y z)w
+[lex]    "x" BRA "y" "z" KET "w"
+[parse]  (AND (AND "x" (AND "y" "z")) "w")
+[gen]    (x:(pos=1) AND y:(pos=2) AND z:(pos=3) AND w:(pos=4))
+
+# The first query below is Xapian-compatible, while the second one
+# isn't.  We use much simpler term lexing rules than Xapian.
+x/y
+[lex]    "x/y"
+[parse]  "x/y"
+[gen]    (x:(pos=1) PHRASE 2 y:(pos=2))
+
+x!y
+[lex]    "x!y"
+[parse]  "x!y"
+[gen]    (x:(pos=1) PHRASE 2 y:(pos=2))
+[xapian] (x:(pos=1) AND y:(pos=2))
+
+# Incompatible; our simpler term parsing sees ! as a term
+x -! y
+[lex]    "x" HATE "!" "y"
+[parse]  (AND (AND "x" "y") (NOT "!"))
+[gen]    (x:(pos=1) AND y:(pos=2))
+[xapian] (x:(pos=1) AND_NOT y:(pos=2))
+
+# Term parsing
+
+x -
+[lex]    "x" HATE
+[parse]  "x"
+[gen]    x:(pos=1)
+
+x +
+[lex]    "x" LOVE
+[parse]  "x"
+[gen]    x:(pos=1)
+
+(x)
+[lex]    BRA "x" KET
+[parse]  "x"
+[gen]    x:(pos=1)
+
+# Prefixed operators get demoted to terms
+prob:AND
+[lex]    PREFIX/prob AND
+[parse]  (PREFIX/prob "AND")
+[gen]    Pand:(pos=1)
+
+# The first query below is Xapian-compatible, but the second isn't
+# because Xapian handles hate very differently from love.
++AND
+[lex]    LOVE AND
+[parse]  "AND"
+[gen]    and:(pos=1)
+
+-AND
+[lex]    HATE AND
+[parse]  (NOT "AND")
+[gen]    (<alldocuments> AND_NOT and:(pos=1))
+[xapian] and:(pos=1)
+
+# Incompatible; Xapian sees this as prob:"prob:x"
+prob:prob:x
+[lex]    PREFIX/prob PREFIX/prob "x"
+[parse]  (PREFIX/prob "x")
+[gen]    Px:(pos=1)
+[xapian] (Pprob:(pos=1) PHRASE 2 Px:(pos=2))
+
+# The rest are Xapian-incompatible because they're all considered
+# syntax errors
+(
+[lex]    BRA
+[parse]  <nil>
+[gen]    <alldocuments>
+[xapian] 
+
+()
+[lex]    BRA KET
+[parse]  <nil>
+[gen]    <alldocuments>
+[xapian] 
+
+)
+[lex]    KET
+[parse]  <nil>
+[gen]    <alldocuments>
+[xapian] 
+
+(x)) OR y
+[lex]    BRA "x" KET KET OR "y"
+[parse]  (OR "x" "y")
+[gen]    (x:(pos=1) OR y:(pos=2))
+[xapian] (x:(pos=1) AND or:(pos=2) AND y:(pos=3))
+
+# This one's only Xapian-compatible by chance.
+(x))
+[lex]    BRA "x" KET KET
+[parse]  "x"
+[gen]    x:(pos=1)
+
+# Term generating
+
+c++ x
+[lex]    "c++" "x"
+[parse]  (AND "c++" "x")
+[gen]    (c++:(pos=1) AND x:(pos=2))
+
+# Incompatible; + is not a "phrase generator" in Xapian.
+c+x
+[lex]    "c+x"
+[parse]  "c+x"
+[gen]    (c:(pos=1) PHRASE 2 x:(pos=2))
+[xapian] (c:(pos=1) AND x:(pos=2))
+
+c-x
+[lex]    "c-x"
+[parse]  "c-x"
+[gen]    (c:(pos=1) PHRASE 2 x:(pos=2))
+
+w "x y z"
+[lex]    "w" "x y z"
+[parse]  (AND "w" "x y z")
+[gen]    (w:(pos=1) AND (x:(pos=2) PHRASE 3 y:(pos=3) PHRASE 3 z:(pos=4)))
+
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2.5/8] Query parser tests for NEAR and ADJ operators.
  2011-01-16  8:10 ` [PATCH 2/8] Parse NEAR and ADJ operators Austin Clements
@ 2011-01-21  6:39   ` Austin Clements
  0 siblings, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-21  6:39 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

---
This is intended to be applied after patch 2/8 in this series,
id:1295165458-9573-3-git-send-email-amdragon@mit.edu
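
For anyone checking the [gen] lines in near-and-adj by hand: the window
sizes look like they follow one rule, namely the smallest distance in the
chain (10 when an operator gives no /N) plus the number of terms minus
one.  This is inferred from the expected outputs in this patch, not taken
from the parser source, so treat it as a sketch.  near_window and
kDefaultDistance are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed default distance for a bare NEAR/ADJ, inferred from
// "x near y" generating "NEAR 11".
static const unsigned kDefaultDistance = 10;

// distances holds one entry per operator in the chain, with 0 meaning
// "no /N was given".  A chain of k operators joins k + 1 terms.
unsigned
near_window (const std::vector<unsigned> &distances)
{
    if (distances.empty ())
        return 0;   // a lone term has no window

    unsigned min_dist = distances[0] ? distances[0] : kDefaultDistance;
    for (std::size_t i = 1; i < distances.size (); i++) {
        unsigned effective = distances[i] ? distances[i] : kDefaultDistance;
        min_dist = std::min (min_dist, effective);
    }
    // window = distance + (number of terms - 1) = distance + number of operators
    return min_dist + static_cast<unsigned> (distances.size ());
}
```

With this rule, "x near/2 y near z" has distances {2, 0}, so the window is
2 + 2, which agrees with the "NEAR 4" in the expected output.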

 test/qparser                              |    5 ++
 test/qparser.expected-output/near-and-adj |   83 +++++++++++++++++++++++++++++
 test/qparser.expected-output/operators    |   12 ++++
 test/qparser.expected-output/probs        |    6 ++
 4 files changed, 106 insertions(+), 0 deletions(-)
 create mode 100644 test/qparser.expected-output/near-and-adj

diff --git a/test/qparser b/test/qparser
index 0e7b022..7ed5c97 100755
--- a/test/qparser
+++ b/test/qparser
@@ -24,6 +24,11 @@ output=$(../qparser-test < $EXPECTED/operators)
 expected=$(cat $EXPECTED/operators)
 test_expect_equal "$output" "$expected"
 
+test_begin_subtest "Near and adj"
+output=$(../qparser-test < $EXPECTED/near-and-adj)
+expected=$(cat $EXPECTED/near-and-adj)
+test_expect_equal "$output" "$expected"
+
 test_begin_subtest "Probs"
 output=$(../qparser-test < $EXPECTED/probs)
 expected=$(cat $EXPECTED/probs)
diff --git a/test/qparser.expected-output/near-and-adj b/test/qparser.expected-output/near-and-adj
new file mode 100644
index 0000000..0da4689
--- /dev/null
+++ b/test/qparser.expected-output/near-and-adj
@@ -0,0 +1,83 @@
+x near y
+[lex]    "x" NEAR "y"
+[parse]  (NEAR "x" (NEAR "y"))
+[gen]    (x:(pos=1) NEAR 11 y:(pos=2))
+
+x near y near z
+[lex]    "x" NEAR "y" NEAR "z"
+[parse]  (NEAR "x" (NEAR "y" (NEAR "z")))
+[gen]    (x:(pos=1) NEAR 12 y:(pos=2) NEAR 12 z:(pos=3))
+
+x near/2 y
+[lex]    "x" NEAR/2 "y"
+[parse]  (NEAR "x" (NEAR/2 "y"))
+[gen]    (x:(pos=1) NEAR 3 y:(pos=2))
+
+x near/2 y near z
+[lex]    "x" NEAR/2 "y" NEAR "z"
+[parse]  (NEAR "x" (NEAR/2 "y" (NEAR "z")))
+[gen]    (x:(pos=1) NEAR 4 y:(pos=2) NEAR 4 z:(pos=3))
+
+x near y near/2 z
+[lex]    "x" NEAR "y" NEAR/2 "z"
+[parse]  (NEAR "x" (NEAR "y" (NEAR/2 "z")))
+[gen]    (x:(pos=1) NEAR 4 y:(pos=2) NEAR 4 z:(pos=3))
+
+x near/0 y
+[lex]    "x" "near/0" "y"
+[parse]  (AND (AND "x" "near/0") "y")
+[gen]    (x:(pos=1) AND (near:(pos=2) PHRASE 2 0:(pos=3)) AND y:(pos=4))
+
+x near/2z y
+[lex]    "x" "near/2z" "y"
+[parse]  (AND (AND "x" "near/2z") "y")
+[gen]    (x:(pos=1) AND (near:(pos=2) PHRASE 2 2z:(pos=3)) AND y:(pos=4))
+
+# The first query below is Xapian-compatible, while the second one
+# isn't.  In both cases, Xapian initially sees "near" as a NEAR
+# operator, but in the first case the trailing y is a syntax error,
+# which causes Xapian to reparse with no flags.
+x near/z y
+[lex]    "x" "near/z" "y"
+[parse]  (AND (AND "x" "near/z") "y")
+[gen]    (x:(pos=1) AND (near:(pos=2) PHRASE 2 z:(pos=3)) AND y:(pos=4))
+
+x near/z
+[lex]    "x" "near/z"
+[parse]  (AND "x" "near/z")
+[gen]    (x:(pos=1) AND (near:(pos=2) PHRASE 2 z:(pos=3)))
+[xapian] (x:(pos=1) NEAR 11 z:(pos=2))
+
+x adj y adj z
+[lex]    "x" ADJ "y" ADJ "z"
+[parse]  (ADJ "x" (ADJ "y" (ADJ "z")))
+[gen]    (x:(pos=1) PHRASE 12 y:(pos=2) PHRASE 12 z:(pos=3))
+
+# Syntax errors
+# These are all Xapian-incompatible because they're syntax errors.
+(x) NEAR y
+[lex]    BRA "x" KET NEAR "y"
+[parse]  (AND (AND "x" "NEAR") "y")
+[gen]    (x:(pos=1) AND near:(pos=2) AND y:(pos=3))
+
+x NEAR (y)
+[lex]    "x" NEAR BRA "y" KET
+[parse]  (AND (AND "x" "NEAR") "y")
+[gen]    (x:(pos=1) AND near:(pos=2) AND y:(pos=3))
+
+x NEAR y NEAR (z)
+[lex]    "x" NEAR "y" NEAR BRA "z" KET
+[parse]  (AND (AND (NEAR "x" (NEAR "y")) "NEAR") "z")
+[gen]    ((x:(pos=1) NEAR 11 y:(pos=2)) AND near:(pos=3) AND z:(pos=4))
+[xapian] (x:(pos=1) AND near:(pos=2) AND y:(pos=3) AND near:(pos=4) AND z:(pos=5))
+
+x NEAR y ADJ z
+[lex]    "x" NEAR "y" ADJ "z"
+[parse]  (AND (AND (NEAR "x" (NEAR "y")) "ADJ") "z")
+[gen]    ((x:(pos=1) NEAR 11 y:(pos=2)) AND adj:(pos=3) AND z:(pos=4))
+[xapian] (x:(pos=1) AND near:(pos=2) AND y:(pos=3) AND adj:(pos=4) AND z:(pos=5))
+
+NEAR x
+[lex]    NEAR "x"
+[parse]  (AND "NEAR" "x")
+[gen]    (near:(pos=1) AND x:(pos=2))
diff --git a/test/qparser.expected-output/operators b/test/qparser.expected-output/operators
index 788f007..3d69f9b 100644
--- a/test/qparser.expected-output/operators
+++ b/test/qparser.expected-output/operators
@@ -143,3 +143,15 @@ x OR @
 [parse]  (OR "x" "@")
 [gen]    x:(pos=1)
 [xapian] error Syntax: <expression> OR <expression>
+
+@ NEAR x
+[lex]    "@" NEAR "x"
+[parse]  (NEAR "@" (NEAR "x"))
+[gen]    x:(pos=1)
+[xapian] (near:(pos=1) AND x:(pos=2))
+
+x NEAR @
+[lex]    "x" NEAR "@"
+[parse]  (NEAR "x" (NEAR "@"))
+[gen]    x:(pos=1)
+[xapian] (x:(pos=1) AND near:(pos=2))
diff --git a/test/qparser.expected-output/probs b/test/qparser.expected-output/probs
index 3c166f7..8f50cdc 100644
--- a/test/qparser.expected-output/probs
+++ b/test/qparser.expected-output/probs
@@ -10,6 +10,12 @@
 [gen]    ((x:(pos=1) OR y:(pos=2)) AND z:(pos=3))
 [xapian] (x:(pos=1) AND or:(pos=2) AND y:(pos=3) AND and:(pos=4) AND z:(pos=5))
 
+# Near binds tighter than hate
+x -y NEAR z
+[lex]    "x" HATE "y" NEAR "z"
+[parse]  (AND "x" (NOT (NEAR "y" (NEAR "z"))))
+[gen]    (x:(pos=1) AND_NOT (y:(pos=2) NEAR 11 z:(pos=3)))
+
 # Empty subexpression after prefix
 # Incompatible; Xapian treats as a syntax error.
 prob:() AND x
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3.5/8] Query parser tests for wildcard queries.
  2011-01-16  8:10 ` [PATCH 3/8] Parse wildcard queries Austin Clements
@ 2011-01-21  6:40   ` Austin Clements
  2011-01-22 16:47     ` Michal Sojka
  0 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-01-21  6:40 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

Since wildcard queries require a database, qparser-test can now open a
database.
---
This is intended to be applied after patch 3/8 in this series,
id:1295165458-9573-4-git-send-email-amdragon@mit.edu
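
The reason a database is needed can be sketched roughly as follows: a
wildcard like p* can only be expanded by walking the terms that actually
exist and keeping those that start with the stem.  The std::set below is a
stand-in for the database's term list and expand_wildcard is a hypothetical
name; the real parser's expansion code is not shown in this message.

```cpp
#include <set>
#include <string>
#include <vector>

// Return every term in the (sorted) term list that starts with prefix,
// in sorted order.
std::vector<std::string>
expand_wildcard (const std::set<std::string> &terms, const std::string &prefix)
{
    std::vector<std::string> matches;
    // lower_bound jumps to the first term >= prefix; all terms sharing
    // the prefix are contiguous from that point on.
    for (std::set<std::string>::const_iterator it = terms.lower_bound (prefix);
         it != terms.end (); ++it) {
        if (it->compare (0, prefix.size (), prefix) != 0)
            break;   // walked past the last term with this prefix
        matches.push_back (*it);
    }
    return matches;
}
```

Run over the words of the "Peter Piper" test message, the stem "p" yields
peck, peppers, peter, picked, pickled, piper, the same order as the SYNONYM
list in the wildcards expectations, while "nosuchterm" yields nothing.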

 test/qparser                           |    7 +++++++
 test/qparser-test.cc                   |   25 ++++++++++++++++++++++++-
 test/qparser.expected-output/wildcards |   20 ++++++++++++++++++++
 3 files changed, 51 insertions(+), 1 deletions(-)
 create mode 100644 test/qparser.expected-output/wildcards

diff --git a/test/qparser b/test/qparser
index 7ed5c97..d77e2b2 100755
--- a/test/qparser
+++ b/test/qparser
@@ -34,4 +34,11 @@ output=$(../qparser-test < $EXPECTED/probs)
 expected=$(cat $EXPECTED/probs)
 test_expect_equal "$output" "$expected"
 
+add_message '[body]="Peter Piper picked a peck of pickled peppers"'
+
+test_begin_subtest "Wildcards"
+output=$(../qparser-test -d < $EXPECTED/wildcards)
+expected=$(cat $EXPECTED/wildcards)
+test_expect_equal "$output" "$expected"
+
 test_done
diff --git a/test/qparser-test.cc b/test/qparser-test.cc
index 01d6bae..ae6c8b9 100644
--- a/test/qparser-test.cc
+++ b/test/qparser-test.cc
@@ -42,6 +42,7 @@ extern "C" {
 #include "../notmuch-client.h"
 }
 
+static notmuch_database_t *notmuch;
 static _notmuch_qparser_t *qparser;
 static Xapian::QueryParser xqparser;
 
@@ -98,7 +99,7 @@ test_one (void *ctx, const char *query_str)
 static _notmuch_qparser_t *
 create_qparser (void *ctx)
 {
-    _notmuch_qparser_t *qparser = _notmuch_qparser_create (ctx, NULL);
+    _notmuch_qparser_t *qparser = _notmuch_qparser_create (ctx, notmuch);
     _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE);
     _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE);
     _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE);
@@ -109,6 +110,8 @@ static Xapian::QueryParser
 create_xapian_qparser (void)
 {
     Xapian::QueryParser xq;
+    if (notmuch)
+	xq.set_database (*notmuch->xapian_db);
     xq.set_default_op (Xapian::Query::OP_AND);
     xq.add_prefix ("prob", "P");
     xq.add_boolean_prefix ("lit", "L");
@@ -120,9 +123,27 @@ int
 main (int argc, char **argv)
 {
     void *ctx;
+    notmuch_config_t *config;
 
     ctx = talloc_new (NULL);
 
+    if (argc > 1 && strcmp(argv[1], "-d") == 0) {
+	argc--;
+	argv++;
+
+	/* Open the database */
+	config = notmuch_config_open (ctx, NULL, NULL);
+	if (config == NULL)
+	    return 1;
+
+	notmuch = notmuch_database_open (notmuch_config_get_database_path (config),
+					 NOTMUCH_DATABASE_MODE_READ_ONLY);
+	if (notmuch == NULL)
+	    return 1;
+    } else {
+	notmuch = NULL;
+    }
+
     qparser = create_qparser (ctx);
     xqparser = create_xapian_qparser ();
 
@@ -149,5 +170,7 @@ main (int argc, char **argv)
 	}
     }
 
+    if (notmuch)
+	notmuch_database_close (notmuch);
     return 0;
 }
diff --git a/test/qparser.expected-output/wildcards b/test/qparser.expected-output/wildcards
new file mode 100644
index 0000000..6f62829
--- /dev/null
+++ b/test/qparser.expected-output/wildcards
@@ -0,0 +1,20 @@
+# Basic wildcard expansion
+p* AND x
+[lex]    "p"* AND "x"
+[parse]  (AND "p"* "x")
+[gen]    ((peck:(pos=1) SYNONYM peppers:(pos=1) SYNONYM peter:(pos=1) SYNONYM picked:(pos=1) SYNONYM pickled:(pos=1) SYNONYM piper:(pos=1)) AND x:(pos=2))
+
+# Incompatible; Xapian considers this a syntax error
+*
+[lex]    ""*
+[parse]  ""*
+[gen]    <alldocuments>
+[xapian] 
+
+# Wildcard that matches nothing.  Xapian handles this differently
+# but equivalently.
+nosuchterm* AND x
+[lex]    "nosuchterm"* AND "x"
+[parse]  (AND "nosuchterm"* "x")
+[gen]    (nosuchterm AND x:(pos=1))
+[xapian] 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 3.5/8] Query parser tests for wildcard queries.
  2011-01-21  6:40   ` [PATCH 3.5/8] Query parser tests for " Austin Clements
@ 2011-01-22 16:47     ` Michal Sojka
  2011-01-23 22:02       ` Austin Clements
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Sojka @ 2011-01-22 16:47 UTC (permalink / raw)
  To: Austin Clements, notmuch; +Cc: amdragon

On Fri, 21 Jan 2011, Austin Clements wrote:
> Since wildcard queries require a database, qparser-test can now open a
> database.

Hi Austin,

I had to apply the following changes in order to be able to compile the
tests (make test).

I'm going to test the parser in my daily use, but so far it looks really
nice. I especially enjoy the before and after searches. Thanks.

-Michal

diff --git a/test/qparser-test.cc b/test/qparser-test.cc
index 18318aa..5be6220 100644
--- a/test/qparser-test.cc
+++ b/test/qparser-test.cc
@@ -61,7 +61,7 @@ test_one (void *ctx, const char *query_str)
     void *local = talloc_new (ctx);
     Xapian::Query q;
     _notmuch_token_t *toks, *root;
-    char *error, *qparser_desc, *xqparser_desc;
+    char *error, *qparser_desc = NULL, *xqparser_desc;
 
     toks = _notmuch_qparser_lex (local, qparser, query_str);
     printf("[lex]    %s\n", _notmuch_token_show_list (local, toks));
@@ -100,9 +100,9 @@ static _notmuch_qparser_t *
 create_qparser (void *ctx)
 {
     _notmuch_qparser_t *qparser = _notmuch_qparser_create (ctx, notmuch);
-    _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE);
-    _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE);
-    _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE);
+    _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE, FALSE);
+    _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE, FALSE);
+    _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE, FALSE);
     return qparser;
 }
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 3.5/8] Query parser tests for wildcard queries.
  2011-01-22 16:47     ` Michal Sojka
@ 2011-01-23 22:02       ` Austin Clements
  2011-01-24 12:24         ` Michal Sojka
  0 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-01-23 22:02 UTC (permalink / raw)
  To: Michal Sojka; +Cc: notmuch

Oops, yes.  I'm not sure why you had to initialize qparser_desc (are you
sure it doesn't compile if you omit that?), but a change in the later patch
5/8 requires the extra argument to _notmuch_qparser_add_db_prefix.  I've got
another patch with tests for patch 5/8 that adds and tests the argument that
I'll send out shortly (along with tests for the remaining patches).

Glad to see you're taking advantage of the query parser!

On Sat, Jan 22, 2011 at 11:47 AM, Michal Sojka <sojkam1@fel.cvut.cz> wrote:

> On Fri, 21 Jan 2011, Austin Clements wrote:
> > Since wildcard queries require a database, qparser-test can now open a
> > database.
>
> Hi Austin,
>
> I had to apply the following changes in order to be able to compile the
> tests (make test).
>
> I'm going to test the parser in my daily use, but so far it looks really
> nice. I especially enjoy the before and after searches. Thanks.
>
> -Michal
>
> diff --git a/test/qparser-test.cc b/test/qparser-test.cc
> index 18318aa..5be6220 100644
> --- a/test/qparser-test.cc
> +++ b/test/qparser-test.cc
> @@ -61,7 +61,7 @@ test_one (void *ctx, const char *query_str)
>     void *local = talloc_new (ctx);
>     Xapian::Query q;
>     _notmuch_token_t *toks, *root;
> -    char *error, *qparser_desc, *xqparser_desc;
> +    char *error, *qparser_desc = NULL, *xqparser_desc;
>
>     toks = _notmuch_qparser_lex (local, qparser, query_str);
>     printf("[lex]    %s\n", _notmuch_token_show_list (local, toks));
> @@ -100,9 +100,9 @@ static _notmuch_qparser_t *
>  create_qparser (void *ctx)
>  {
>      _notmuch_qparser_t *qparser = _notmuch_qparser_create (ctx, notmuch);
> -    _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE);
> -    _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE);
> -    _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE);
> +    _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE, FALSE);
> +    _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE, FALSE);
> +    _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE, FALSE);
>     return qparser;
>  }
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3.5/8] Query parser tests for wildcard queries.
  2011-01-23 22:02       ` Austin Clements
@ 2011-01-24 12:24         ` Michal Sojka
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Sojka @ 2011-01-24 12:24 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

On Sun, 23 Jan 2011, Austin Clements wrote:
> Oops, yes.  I'm not sure why you had to initialize qparser_desc (are you
> sure it doesn't compile if you omit that?), 

This was another problem - I got a warning that this variable might be
uninitialized.
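
A minimal sketch of the pattern that tends to produce this kind of
warning, assuming qparser_desc is only assigned on some paths through
test_one (the full function body is not quoted in this thread, so that is
a guess).  describe is a hypothetical stand-in, not the real test code.

```cpp
#include <string>

// Hypothetical stand-in for test_one: desc is assigned only when
// parsing succeeds, then read afterwards on every path.
std::string
describe (bool parse_ok)
{
    // Without the "= nullptr" initializer, a compiler may warn that desc
    // "may be used uninitialized" at the read below.
    const char *desc = nullptr;

    if (parse_ok)
        desc = "x:(pos=1)";

    return desc ? desc : "<no query>";
}
```

Initializing at the declaration, as in the change above, gives every path
a defined value and silences the warning.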

-Michal

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/8 v2] Support maildir folder search.
  2011-01-16  8:10 ` [PATCH 6/8] Support maildir folder search Austin Clements
@ 2011-01-24 17:13   ` Austin Clements
  2011-01-24 17:18   ` [PATCH 6.5/8] test: Add tests for custom query parser-based folder searches Austin Clements
  1 sibling, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-24 17:13 UTC (permalink / raw)
  To: notmuch

This implements a folder: query prefix by constructing a wildcard
query that matches all files within the specified folder, folder/new,
or folder/cur.  This works with hierarchical folder names, and accepts
both absolute and relative paths.
---

Well, that's embarrassing.  I somehow lost the critical "tok->wildcard
= TRUE;" line below somewhere between testing and posting this patch.
This is identical to version 1 of patch 6/8 except for this one addition.
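
To make the expansion concrete, here is a rough standalone sketch of what
the transform computes for one folder: term.  It follows the "%u:" stem
and the doc id 0 fallback from transform_folder_one in the diff below, but
the std::map is a stand-in for the directory lookup, and directory_index
and folder_term_stems are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for the database's directory lookup:
// relative path -> directory document id.
typedef std::map<std::string, unsigned> directory_index;

// Build the three wildcard stems for folder:relpath, covering the folder
// itself plus its maildir "new" and "cur" subdirectories.  A directory
// that is not in the index gets doc id 0, as in the patch.
std::vector<std::string>
folder_term_stems (const directory_index &dirs, const std::string &relpath)
{
    static const char *const suffixes[] = { "", "/new", "/cur" };
    std::vector<std::string> stems;

    for (std::size_t i = 0; i < 3; i++) {
        directory_index::const_iterator it = dirs.find (relpath + suffixes[i]);
        unsigned doc_id = (it == dirs.end ()) ? 0 : it->second;
        // Each stem is later matched as a wildcard, i.e. "<doc_id>:*".
        stems.push_back (std::to_string (doc_id) + ":");
    }
    return stems;
}
```

The patch then ORs the three resulting tokens together, so a message
matches if its file lives directly in the folder, in folder/new, or in
folder/cur.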

 lib/database.cc |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 3af82b0..d42039c 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -582,6 +582,58 @@ transform_type_mail (_notmuch_token_t *root, unused (void *opaque))
     return _notmuch_token_create_op (root, TOK_AND, mail_ast, root);
 }
 
+static _notmuch_token_t *
+transform_folder_one (const void *ctx, notmuch_database_t *notmuch,
+		      const char *relpath, const char *rest)
+{
+    const char *db_path;
+    notmuch_private_status_t status;
+    unsigned int doc_id;
+    _notmuch_token_t *tok;
+
+    /* Get the docid for (relpath + rest). */
+    if (*rest)
+	relpath = talloc_asprintf (ctx, "%s%s", relpath, rest);
+    db_path = _notmuch_database_get_directory_db_path (relpath);
+    status = _notmuch_database_find_unique_doc_id (notmuch, "directory",
+						   db_path, &doc_id);
+    if (db_path != relpath)
+	free ((char*) db_path);
+
+    /* Construct a wildcard query that matches files in this directory. */
+    if (status)
+	/* Directory doesn't exist.  Perhaps this should be an error? */
+	doc_id = 0;
+    tok = _notmuch_token_create_term (ctx, TOK_LIT,
+				      talloc_asprintf (ctx, "%u:", doc_id));
+    tok->prefix = _find_prefix ("file-direntry");
+    tok->wildcard = TRUE;
+    return tok;
+}
+
+static _notmuch_token_t *
+transform_folder (_notmuch_token_t *root, void *opaque)
+{
+    if (!root)
+	return NULL;
+    if (root->type == TOK_PREFIX && strcmp (root->text, "folder") == 0) {
+	notmuch_database_t *notmuch = (notmuch_database_t *)opaque;
+	_notmuch_token_t *lit = root->left, *subs[3], *tok;
+	const char *relpath;
+	assert (lit && lit->type == TOK_LIT);
+	relpath = _notmuch_database_relative_path (notmuch, lit->text);
+	subs[0] = transform_folder_one (root, notmuch, relpath, "");
+	subs[1] = transform_folder_one (root, notmuch, relpath, "/new");
+	subs[2] = transform_folder_one (root, notmuch, relpath, "/cur");
+	tok = _notmuch_token_create_op (root, TOK_OR, subs[1], subs[2]);
+	return _notmuch_token_create_op (root, TOK_OR, subs[0], tok);
+    }
+
+    root->left = transform_folder (root->left, opaque);
+    root->right = transform_folder (root->right, opaque);
+    return root;
+}
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
 		       notmuch_database_mode_t mode)
@@ -690,6 +742,11 @@ notmuch_database_open (const char *path,
 
 	_notmuch_qparser_add_transform (notmuch->query_parser,
 					transform_type_mail, NULL);
+
+	_notmuch_qparser_add_prefix (notmuch->query_parser, "folder",
+				     TRUE, TRUE);
+	_notmuch_qparser_add_transform (notmuch->query_parser,
+					transform_folder, notmuch);
     } catch (const Xapian::Error &error) {
 	fprintf (stderr, "A Xapian exception occurred opening database: %s\n",
 		 error.get_msg().c_str());
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5.5/8] test: Wildcard tag search and untagged search.
  2011-01-16  8:10 ` [PATCH 5/8] Support "tag:*" as well as "NOT tag:*" queries Austin Clements
@ 2011-01-24 17:15   ` Austin Clements
  0 siblings, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-24 17:15 UTC (permalink / raw)
  To: notmuch

---
 test/qparser-test.cc                   |    6 +++---
 test/qparser.expected-output/wildcards |   13 +++++++++++++
 test/search                            |   12 ++++++++++++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/test/qparser-test.cc b/test/qparser-test.cc
index ae6c8b9..7b145cc 100644
--- a/test/qparser-test.cc
+++ b/test/qparser-test.cc
@@ -100,9 +100,9 @@ static _notmuch_qparser_t *
 create_qparser (void *ctx)
 {
     _notmuch_qparser_t *qparser = _notmuch_qparser_create (ctx, notmuch);
-    _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE);
-    _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE);
-    _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE);
+    _notmuch_qparser_add_db_prefix (qparser, "prob", "P", FALSE, FALSE);
+    _notmuch_qparser_add_db_prefix (qparser, "lit", "L", TRUE, FALSE);
+    _notmuch_qparser_add_db_prefix (qparser, "tag", "K", TRUE, TRUE);
     return qparser;
 }
 
diff --git a/test/qparser.expected-output/wildcards b/test/qparser.expected-output/wildcards
index 6f62829..0558732 100644
--- a/test/qparser.expected-output/wildcards
+++ b/test/qparser.expected-output/wildcards
@@ -18,3 +18,16 @@ nosuchterm* AND x
 [parse]  (AND "nosuchterm"* "x")
 [gen]    (nosuchterm AND x:(pos=1))
 [xapian] 
+
+# Incompatible; Xapian doesn't accept wildcards in boolean prefixes
+tag:*
+[lex]    PREFIX/tag "*"
+[parse]  (FILTER (PREFIX/tag '*'))
+[gen]    0 * (Kinbox:(pos=1) SYNONYM Kunread:(pos=1))
+[xapian] 0 * K*
+
+tag:i*
+[lex]    PREFIX/tag "i*"
+[parse]  (FILTER (PREFIX/tag 'i*'))
+[gen]    0 * Kinbox:(pos=1)
+[xapian] 0 * Ki*
diff --git a/test/search b/test/search
index b180c7f..7d1dedb 100755
--- a/test/search
+++ b/test/search
@@ -113,6 +113,18 @@ thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unr
 thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; subject search test (phrase) (inbox unread)
 thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; this phrase should not match the subject search test (inbox unread)"
 
+test_begin_subtest 'Search by wildcard tag ("at*")'
+output=$(notmuch search 'tag:at*' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2009-11-18 [2/2] Lars Kellogg-Stedman; [notmuch] \"notmuch help\" outputs to stderr? (attachment inbox unread)
+thread:XXX   2009-11-18 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)
+thread:XXX   2009-11-17 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] preliminary FreeBSD support (attachment inbox unread)"
+
+test_begin_subtest 'Search for untagged messages'
+add_message '[subject]="untagged message"'
+notmuch tag -inbox -unread id:$gen_msg_id
+output=$(notmuch search 'NOT tag:*' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; untagged message ()"
+
 test_begin_subtest "Search body (utf-8):"
 add_message '[subject]="utf8-message-body-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="message body utf8: bödý"'
 output=$(notmuch search "bödý" | notmuch_search_sanitize)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6.5/8] test: Add tests for custom query parser-based folder searches.
  2011-01-16  8:10 ` [PATCH 6/8] Support maildir folder search Austin Clements
  2011-01-24 17:13   ` [PATCH 6/8 v2] " Austin Clements
@ 2011-01-24 17:18   ` Austin Clements
  1 sibling, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-24 17:18 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

---
 test/notmuch-test     |    2 +-
 test/search-by-folder |   52 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 1 deletions(-)
 create mode 100755 test/search-by-folder

diff --git a/test/notmuch-test b/test/notmuch-test
index 1e331b3..2aa3489 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -16,7 +16,7 @@ fi
 
 cd $(dirname "$0")
 
-TESTS="basic new qparser search search-output json thread-naming raw reply dump-restore uuencode thread-order author-order from-guessing long-id encoding emacs maildir-sync"
+TESTS="basic new qparser search search-output search-by-folder json thread-naming raw reply dump-restore uuencode thread-order author-order from-guessing long-id encoding emacs maildir-sync"
 
 # Clean up any results from a previous run
 rm -r test-results >/dev/null 2>/dev/null
diff --git a/test/search-by-folder b/test/search-by-folder
new file mode 100755
index 0000000..ebf724f
--- /dev/null
+++ b/test/search-by-folder
@@ -0,0 +1,52 @@
+#!/bin/bash
+test_description='"notmuch search" by folder: (with variations)'
+. ./test-lib.sh
+
+add_message '[dir]=bad' '[subject]="To the bone"'
+add_message '[dir]=bad/news' '[subject]="Bears"'
+mkdir -p "${MAIL_DIR}/duplicate/bad/news"
+cp "$gen_msg_filename" "${MAIL_DIR}/duplicate/bad/news"
+
+add_message '[dir]=things' '[subject]="These are a few"'
+add_message '[dir]=things/favorite' '[subject]="Raindrops, whiskers, kettles"'
+add_message '[dir]=things/bad' '[subject]="Bites, stings, sad feelings"'
+
+test_begin_subtest "Top-level folder"
+output=$(notmuch search folder:bad | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)"
+
+test_begin_subtest "Nested folder"
+output=$(notmuch search folder:bad/news | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)"
+
+test_begin_subtest "Search is rooted"
+output=$(notmuch search folder:news | notmuch_search_sanitize)
+test_expect_equal "$output" ""
+
+test_begin_subtest "Absolute path"
+output=$(notmuch search folder:$MAIL_DIR/bad | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)"
+
+test_begin_subtest "Duplicate path"
+output=$(notmuch search folder:duplicate/bad/news | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)"
+
+test_begin_subtest "After removing duplicate instance of matching path"
+rm -r "${MAIL_DIR}/bad/news"
+increment_mtime "${MAIL_DIR}/bad"
+notmuch new
+output=$(notmuch search folder:duplicate/bad/news | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)"
+
+test_begin_subtest "After rename, old path returns nothing"
+mv "${MAIL_DIR}/duplicate/bad/news" "${MAIL_DIR}/duplicate/bad/olds"
+increment_mtime "${MAIL_DIR}/duplicate/bad"
+notmuch new
+output=$(notmuch search folder:duplicate/bad/news | notmuch_search_sanitize)
+test_expect_equal "$output" ""
+
+test_begin_subtest "After rename, new path returns result"
+output=$(notmuch search folder:duplicate/bad/olds | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)"
+
+test_done
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 8.5/8] test: Add tests for search by date.
  2011-01-16  8:10 ` [PATCH 8/8] Support before: and after: date search with sane date syntax Austin Clements
@ 2011-01-24 17:20   ` Austin Clements
  0 siblings, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-24 17:20 UTC (permalink / raw)
  To: notmuch; +Cc: amdragon

---
 test/search |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/test/search b/test/search
index 7d1dedb..6425359 100755
--- a/test/search
+++ b/test/search
@@ -73,6 +73,73 @@ add_message '[subject]="this phrase should not match the subject search test"' '
 output=$(notmuch search 'subject:"subject search test (phrase)"' | notmuch_search_sanitize)
 test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; subject search test (phrase) (inbox unread)"
 
+test_begin_subtest "Search by after:"
+output=$(notmuch search 'after:2009/11/18' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2009-11-18 [1/1] Chris Wilson; [notmuch] [PATCH 1/2] Makefile: evaluate pkg-config once (inbox unread)
+thread:XXX   2009-11-18 [2/2] Alex Botero-Lowry, Carl Worth; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)
+thread:XXX   2009-11-18 [1/2] Carl Worth| Ingmar Vanhassel; [notmuch] [PATCH] Typsos (inbox unread)
+thread:XXX   2009-11-18 [2/3] Keith Packard, Carl Worth| Adrian Perez de Castro; [notmuch] Introducing myself (inbox unread)
+thread:XXX   2009-11-18 [2/3] Keith Packard, Carl Worth| Israel Herraiz; [notmuch] New to the list (inbox unread)
+thread:XXX   2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread)
+thread:XXX   2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
+thread:XXX   2009-11-18 [2/3] Keith Packard, Carl Worth| Aron Griffis; [notmuch] archive (inbox unread)
+thread:XXX   2009-11-18 [1/2] Carl Worth| Keith Packard; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
+thread:XXX   2009-11-18 [2/7] Lars Kellogg-Stedman, Carl Worth| Mikhail Gusarov, Keith Packard; [notmuch] Working with Maildir storage? (inbox unread)
+thread:XXX   2009-11-18 [1/5] Carl Worth| Mikhail Gusarov, Keith Packard; [notmuch] [PATCH 1/2] Close message file after parsing message headers (inbox unread)
+thread:XXX   2009-11-18 [2/2] Keith Packard, Alexander Botero-Lowry; [notmuch] [PATCH] Create a default notmuch-show-hook that highlights URLs and uses word-wrap (inbox unread)
+thread:XXX   2009-11-18 [1/1] Alexander Botero-Lowry; [notmuch] request for pull (inbox unread)
+thread:XXX   2009-11-18 [4/4] Jjgod Jiang, Alexander Botero-Lowry; [notmuch] Mac OS X/Darwin compatibility issues (inbox unread)
+thread:XXX   2009-11-18 [1/1] Rolland Santimano; [notmuch] Link to mailing list archives ? (inbox unread)
+thread:XXX   2009-11-18 [1/1] Jan Janak; [notmuch] [PATCH] notmuch new: Support for conversion of spool subdirectories into tags (inbox unread)
+thread:XXX   2009-11-18 [1/1] Stewart Smith; [notmuch] [PATCH] count_files: sort directory in inode order before statting (inbox unread)
+thread:XXX   2009-11-18 [1/1] Stewart Smith; [notmuch] [PATCH 2/2] Read mail directory in inode number order (inbox unread)
+thread:XXX   2009-11-18 [1/1] Stewart Smith; [notmuch] [PATCH] Fix linking with gcc to use g++ to link in C++ libs. (inbox unread)
+thread:XXX   2009-11-18 [2/2] Lars Kellogg-Stedman; [notmuch] \"notmuch help\" outputs to stderr? (attachment inbox unread)"
+
+test_begin_subtest "Search by after: (no results)"
+output=$(notmuch search 'after:2009/11/19' | notmuch_search_sanitize)
+test_expect_equal "$output" ""
+
+test_begin_subtest "Search by before:"
+output=$(notmuch search 'before:2000/01/02' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; body search (inbox unread)
+thread:XXX   2000-01-01 [1/1] searchbyfrom; search by from (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by to (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; subjectsearchtest (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by id (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by tag (inbox searchbytag unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by thread (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; body search (phrase) (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; negative result (inbox unread)
+thread:XXX   2000-01-01 [1/1] searchbyfrom@example.com; search by from (address) (inbox unread)
+thread:XXX   2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by to (address) (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; subject search test (phrase) (inbox unread)
+thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; this phrase should not match the subject search test (inbox unread)"
+
+test_begin_subtest "Search by before: (no results)"
+output=$(notmuch search 'before:2000/01/01' | notmuch_search_sanitize)
+test_expect_equal "$output" ""
+
+test_begin_subtest "Search by after: and before:"
+output=$(notmuch search 'after:2009/11/17 before:2009/11/18' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2009-11-17 [1/2] Ingmar Vanhassel| Carl Worth; [notmuch] [PATCH] Typsos (inbox unread)
+thread:XXX   2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [notmuch] archive (inbox unread)
+thread:XXX   2009-11-17 [1/3] Adrian Perez de Castro| Keith Packard, Carl Worth; [notmuch] Introducing myself (inbox unread)
+thread:XXX   2009-11-17 [1/3] Israel Herraiz| Keith Packard, Carl Worth; [notmuch] New to the list (inbox unread)
+thread:XXX   2009-11-17 [2/3] Jan Janak| Carl Worth; [notmuch] What a great idea! (inbox unread)
+thread:XXX   2009-11-17 [1/2] Jan Janak| Carl Worth; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
+thread:XXX   2009-11-17 [1/2] Keith Packard| Carl Worth; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
+thread:XXX   2009-11-17 [5/7] Lars Kellogg-Stedman, Mikhail Gusarov, Keith Packard| Carl Worth; [notmuch] Working with Maildir storage? (inbox unread)
+thread:XXX   2009-11-17 [4/5] Mikhail Gusarov, Carl Worth, Keith Packard; [notmuch] [PATCH 1/2] Close message file after parsing message headers (inbox unread)
+thread:XXX   2009-11-17 [1/1] Mikhail Gusarov; [notmuch] [PATCH] Handle rename of message file (inbox unread)
+thread:XXX   2009-11-17 [2/2] Alex Botero-Lowry, Carl Worth; [notmuch] preliminary FreeBSD support (attachment inbox unread)"
+
+test_begin_subtest "Search by before: (syntax error)"
+output=$(notmuch search 'before:x' 2>&1 | notmuch_search_sanitize)
+test_expect_equal "$output" 'Query error: Invalid date "x"'
+
 test_begin_subtest 'Search for all messages ("*")'
 output=$(notmuch search '*' | notmuch_search_sanitize)
 test_expect_equal "$output" "thread:XXX   2009-11-18 [1/1] Chris Wilson; [notmuch] [PATCH 1/2] Makefile: evaluate pkg-config once (inbox unread)
-- 
1.7.2.3


* [PATCH 9/8] qparser: Delete (and thus close) the Xapian database.
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
                   ` (7 preceding siblings ...)
  2011-01-16  8:10 ` [PATCH 8/8] Support before: and after: date search with sane date syntax Austin Clements
@ 2011-01-31  4:33 ` Austin Clements
  2011-02-02  5:03 ` [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
  9 siblings, 0 replies; 25+ messages in thread
From: Austin Clements @ 2011-01-31  4:33 UTC (permalink / raw)
  To: notmuch

I removed this line in a bout of overzealous deletions.  The visible
consequence was that the database was being unlocked lazily, leaving
a brief window after notmuch had exited during which the database was
still locked.
---
I admit this patch series numbering is getting a little out of hand.
As usual, this can be found on the qparser branch at
  http://awakening.csail.mit.edu/git/notmuch.git

 lib/database.cc |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 7253bdf..3c488a9 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -826,6 +826,7 @@ notmuch_database_close (notmuch_database_t *notmuch)
     }
 
     delete notmuch->term_gen;
+    delete notmuch->xapian_db;
     talloc_free (notmuch);
 }
 
-- 
1.7.2.3
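
The one-line fix above relies on the destructor of the database object
doing the closing and unlocking.  A minimal sketch of that idea follows,
assuming a stand-in type whose destructor releases a lock flag.
FakeDatabase, Handle, and close_handle are hypothetical names for
illustration only; this is not Xapian's or notmuch's API.

```cpp
#include <cassert>

// Stand-in for a database object that holds a write lock for as long
// as it is alive.  Hypothetical; it only models "destructor unlocks".
struct FakeDatabase {
    bool &locked;
    explicit FakeDatabase (bool &flag) : locked (flag) { locked = true; }
    ~FakeDatabase () { locked = false; }
};

// Stand-in for a handle that owns the database through a raw pointer.
struct Handle {
    FakeDatabase *db;
};

// Close the handle.  With delete_db == false the object is merely
// forgotten (leaked), so its destructor never runs and the lock
// lingers until something else cleans it up.
void close_handle (Handle &h, bool delete_db)
{
    if (delete_db)
        delete h.db;    // destructor runs here, lock released promptly
    h.db = nullptr;
}

void demo ()
{
    bool locked = false;

    Handle leaky = { new FakeDatabase (locked) };
    close_handle (leaky, false);
    assert (locked);    // still locked after "closing"

    Handle fixed = { new FakeDatabase (locked) };
    close_handle (fixed, true);
    assert (!locked);   // released as soon as the object is deleted
}
```

Holding the object in something like std::unique_ptr would make the
missing delete impossible to forget, at the cost of a larger change than
this patch is aiming for.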


* Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more
  2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
                   ` (8 preceding siblings ...)
  2011-01-31  4:33 ` [PATCH 9/8] qparser: Delete (and thus close) the Xapian database Austin Clements
@ 2011-02-02  5:03 ` Austin Clements
  2011-02-02 22:48   ` Carl Worth
  9 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-02-02  5:03 UTC (permalink / raw)
  To: notmuch

I rebased the query parser against current master.  It's on the
qparser-3 branch at
  http://awakening.csail.mit.edu/git/notmuch.git

At cworth's request, I've folded the database closing bug fix into
the appropriate patch.  I also stripped out my implementation of
folder searching, since it obviously conflicts with cworth's [1].  I'm
not planning to resend the patches unless asked because there were no
actual code changes.

[1] I still assert the "correct" folder solution is somewhere between
mine and cworth's: rooted and non-recursive by default (non-recursive
is the only thing that makes sense for Maildir++ folders and it would
be silly for the default to depend on folder type), but leveraging the
flexibility of custom query transforms to support Maildir++ folders
and recursive (and maybe non-rooted) searches as well.  Non-rooted and
recursive searches even map to natural syntaxes that align with
Xapian's existing wildcard syntax.

-- 
Austin Clements                                      MIT/'06/PhD/CSAIL
amdragon@mit.edu                           http://web.mit.edu/amdragon
       Somewhere in the dream we call reality you will find me,
              searching for the reality we call dreams.


* Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more
  2011-02-02  5:03 ` [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
@ 2011-02-02 22:48   ` Carl Worth
  2011-02-03  6:14     ` Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more) Austin Clements
  0 siblings, 1 reply; 25+ messages in thread
From: Carl Worth @ 2011-02-02 22:48 UTC (permalink / raw)
  To: Austin Clements, notmuch

Restricting my reply to one tiny bit of your mail:

You wrote:
> non-recursive is the only thing that makes sense for Maildir++ folders

Either I'm not understanding Maildir++ folders, or I don't agree with
you.

I might have an email archive that looks like this:

  Maildir
    .work
      .project1
      .project2
      .etc...
    .family
      .dad
      .mom
      .brother
      .etc...

With the above setup, what would be unreasonable about wanting to search
for all work-related messages (across all projects, say) with a string
like "folder:work" ?

Now, a person might definitely want to search for messages in the
".work" folder directly, (not including the sub-folders), so we should
provide support for users to get at that behavior as well, (such as a
proposed "folder:work$" or so).

To me, both cases are perfectly legitimate, and I don't understand an
argument that claims that only one makes sense. (Or again, I may be
misunderstanding something.)

-Carl

-- 
carl.d.worth@intel.com



* Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more)
  2011-02-02 22:48   ` Carl Worth
@ 2011-02-03  6:14     ` Austin Clements
  2011-02-20 19:52       ` Folder search semantics Rob Browning
  0 siblings, 1 reply; 25+ messages in thread
From: Austin Clements @ 2011-02-03  6:14 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch

Quoth Carl Worth on Feb 02 at  2:48 pm:
> Restricting my reply to one tiny bit of your mail:
> 
> You wrote:
> > non-recursive is the only thing that makes sense for Maildir++ folders
> 
> Either I'm not understanding Maildir++ folders, or I don't agree with
> you.
> 
> I might have an email archive that looks like this:
> 
>   Maildir
>     .work
>       .project1
>       .project2
>       .etc...
>     .family
>       .dad
>       .mom
>       .brother
>       .etc...
> 
> With the above setup, what would be unreasonable about wanting to search
> for all work-related messages (across all projects, say) with a string
> like "folder:work" ?
> 
> Now, a person might definitely want to search for messages in the
> ".work" folder directly, (not including the sub-folders), so we should
> provide support for users to get at that behavior as well, (such as a
> proposed "folder:work$" or so).
> 
> To me, both cases are perfectly legitimate, and I don't understand an
> argument that claims that only one makes sense. (Or again, I may be
> misunderstanding something.)

(Somebody with more first-hand Maildir++ experience should jump in here.
I stopped using Maildir++ a long time ago, so I may have no idea what
I'm talking about.)

Both cases are perfectly legitimate.

However, the issue with Maildir++ is that the inbox is stored in the
top-level directory:

  Maildir
    cur
    new
    tmp
    .work
    .work.project1

As a consequence, all folders are subfolders of the inbox.  With
recursive search, a search for your inbox folder returns *all* of your
messages.  I wasn't trying to say that we shouldn't support recursive
search (I'm all for flexibility), but it's a confusing default for
Maildir++ because of this.
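
To make the problem concrete, here is a minimal sketch, assuming folders
are identified by directory path and that a recursive search covers
every directory at or below the queried one.  covered_dirs and the path
strings are made up for illustration and are not notmuch code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of notmuch: return the directories
// that a folder search for `query` would cover.
std::vector<std::string>
covered_dirs (const std::vector<std::string> &dirs,
              const std::string &query, bool recursive)
{
    std::vector<std::string> out;
    const std::string below = query + "/";
    for (const std::string &dir : dirs) {
        bool exact = (dir == query);
        // Recursive: anything whose path starts with "<query>/".
        bool under = recursive
            && dir.compare (0, below.size (), below) == 0;
        if (exact || under)
            out.push_back (dir);
    }
    return out;
}

void demo ()
{
    // The Maildir++ layout from above: every folder is a directory
    // directly inside the inbox directory.
    const std::vector<std::string> dirs = {
        "Maildir", "Maildir/.work", "Maildir/.work.project1" };

    // Non-recursive: the inbox is just the inbox.
    assert (covered_dirs (dirs, "Maildir", false).size () == 1);
    // Recursive: the inbox swallows every folder.
    assert (covered_dirs (dirs, "Maildir", true).size () == 3);
    // Recursive on .work finds no children, because .work.project1
    // is a sibling directory, not a subdirectory.
    assert (covered_dirs (dirs, "Maildir/.work", true).size () == 1);
}
```

Under these assumptions a directory-based recursive search is wrong in
both directions for Maildir++: too broad for the inbox and too narrow
for .work.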

Maildir++ has the added twist that the inbox folder has no name.  As a
result, notmuch currently can't search for a Maildir++ inbox folder,
which needs to be addressed somehow.  The least surprising approach
would be compatibility with the Maildir++ convention of calling the
top-level folder INBOX, the subfolder INBOX.work, etc.


Maildir++ issues aside, I submit that rooted, non-recursive folder
searches are a more natural default with a more conventional syntactic
extension to non-rooted/recursive searches.  In
id:87aaiy3u65.fsf@yoom.home.cworth.org, you mentioned that you
implemented non-rooted folder search to mimic subject search.  But file
system paths are not natural language like subject lines.  File system
paths are hierarchical and rooted.

Of course, special query operators like ^ and $ can mitigate this, but
these queries *aren't* regexps and, furthermore, people don't usually
apply regexps to file names.  They apply globs.  Glob syntax has the
added benefit of congruity with Xapian wildcard syntax.  This naturally
leads to a rooted, non-recursive syntax by default (like globs), where a
* at the end means recursive and a * at the beginning means non-rooted.
In fact, we could easily generalize this to arbitrary shell globs.


Here's a proposal that, I think, addresses Maildir++ inboxes and
subfolders; rooted, non-rooted, recursive, and non-recursive queries;
and then some.  Plus, it wouldn't require many code changes; you've
already done the hard work.

Switch XFOLDER from a probabilistic prefix with word-splitting to a
boolean prefix without word-splitting.  When indexing, strip off the cur
or new and examine the resulting directory name.  If it's the mail root,
this is a Maildir++ inbox, so add the term XFOLDERINBOX.  If it starts
with a dot, it's a Maildir++ subfolder, so add the term
XFOLDERINBOX<.dirname>.  Otherwise, add the term XFOLDER<dirname>.
Then, using a custom query transform for the "folder:" prefix, enumerate
XFOLDER terms and form a synonym query out of those that fnmatch the
user's folder query.
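
A rough sketch of both halves of this proposal, under some assumptions:
directories are given relative to the mail root, the set of existing
terms is passed in as a plain vector rather than enumerated from Xapian,
and fnmatch is called with no flags (so * also crosses "/").
folder_term and matching_folder_terms are hypothetical names, not
functions in notmuch.

```cpp
#include <cassert>
#include <fnmatch.h>    // POSIX fnmatch(3)
#include <string>
#include <vector>

static const std::string FOLDER_PREFIX = "XFOLDER";

// Strip a trailing "cur" or "new" component, if there is one.
static std::string strip_leaf (const std::string &dir)
{
    std::size_t slash = dir.rfind ('/');
    std::string leaf =
        (slash == std::string::npos) ? dir : dir.substr (slash + 1);
    if (leaf != "cur" && leaf != "new")
        return dir;
    return (slash == std::string::npos) ? std::string ()
                                        : dir.substr (0, slash);
}

// Indexing side: the term to add for a message found in `dir`.
std::string folder_term (const std::string &dir)
{
    const std::string folder = strip_leaf (dir);
    if (folder.empty ())                    // mail root: Maildir++ inbox
        return FOLDER_PREFIX + "INBOX";
    if (folder[0] == '.')                   // Maildir++ subfolder
        return FOLDER_PREFIX + "INBOX" + folder;
    return FOLDER_PREFIX + folder;          // ordinary directory
}

// Query side: pick the folder terms whose name matches the user's
// glob.  The result would be OR'd together as a synonym query.
std::vector<std::string>
matching_folder_terms (const std::vector<std::string> &all_terms,
                       const std::string &pattern)
{
    std::vector<std::string> out;
    for (const std::string &term : all_terms) {
        if (term.compare (0, FOLDER_PREFIX.size (), FOLDER_PREFIX) != 0)
            continue;                       // not a folder term
        const std::string name = term.substr (FOLDER_PREFIX.size ());
        if (fnmatch (pattern.c_str (), name.c_str (), 0) == 0)
            out.push_back (term);
    }
    return out;
}

void demo ()
{
    assert (folder_term ("cur") == "XFOLDERINBOX");
    assert (folder_term (".work/new") == "XFOLDERINBOX.work");
    assert (folder_term ("lists/notmuch/cur") == "XFOLDERlists/notmuch");
}
```

With terms for INBOX, INBOX.work, and lists/notmuch, the query
folder:INBOX selects only the inbox, folder:INBOX* also selects
INBOX.work, and folder:*notmuch selects lists/notmuch.  A trailing *
therefore also matches siblings that merely share a name prefix, which
may or may not be what "recursive" should mean.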


* Re: Folder search semantics
  2011-02-03  6:14     ` Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more) Austin Clements
@ 2011-02-20 19:52       ` Rob Browning
  2011-02-20 20:00         ` Rob Browning
  0 siblings, 1 reply; 25+ messages in thread
From: Rob Browning @ 2011-02-20 19:52 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

Austin Clements <amdragon@MIT.EDU> writes:

> As a consequence, all folders are subfolders of the inbox.  With
> recursive search, a search for your inbox folder returns *all* of your
> messages.  I wasn't trying to say that we shouldn't support recursive
> search (I'm all for flexibility), but it's a confusing default for
> Maildir++ because of this.
>
> Maildir++ has the added twist that the inbox folder has no name.  As a
> result, currently notmuch can't search for a Maildir++ inbox folder,
> which needs to be addressed somehow.  The least surprising approach
> would be compatibility with the Maildir++ convention of calling the
> top-level folder INBOX, the subfolder INBOX.work, etc.

Just adding my agreement here.  With recursion and no anchors, "folder:"
really won't work for the inbox for Maildir++.

> Maildir++ issues aside, I submit that rooted, non-recursive folder
> searches are a more natural default with a more conventional syntactic
> extension to non-rooted/recursive searches.  In
> id:87aaiy3u65.fsf@yoom.home.cworth.org, you mentioned that you
> implemented non-rooted folder search to mimic subject search.  But file
> system paths are not natural language like subject lines.  File system
> paths are hierarchical and rooted.
>
> Of course, special query operators like ^ and $ can mitigate this, but
> these queries *aren't* regexps and, furthermore, people don't usually
> apply regexps to file names.  They apply globs.  Glob syntax has the
> added benefit of congruity with Xapian wildcard syntax.  This naturally
> leads to a rooted, non-recursive syntax by default (like globs), where a
> * at the end means recursive and a * at the beginning means non-rooted.
> In fact, we could easily generalize this to arbitrary shell globs.

I agree with all of this.  Something like fnmatch() sounds appropriate
to me.  In fact, I'd suggest that we implement this very much like
fnmatch() with folder references like paths -- where "/" is always the
separator, regardless of how things are handled in the underlying
storage.  So depending on the backend, foo/bar could refer to
"Maildir/foo/bar" or "Maildir/.foo.bar".
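
A small sketch of that mapping, assuming just two layouts (plain nested
directories and Maildir++) and a user-visible folder name that always
uses "/".  Layout and storage_dir are hypothetical names for
illustration; nothing like them exists in notmuch as far as I know.

```cpp
#include <cassert>
#include <string>

// The two storage layouts considered here (hypothetical enum).
enum class Layout { Nested, MaildirPP };

// Map a "/"-separated folder name to the directory that stores it.
// An empty folder name means the top-level folder (mail root).
std::string storage_dir (const std::string &root,
                         const std::string &folder, Layout layout)
{
    if (folder.empty ())
        return root;
    if (layout == Layout::Nested)
        return root + "/" + folder;

    // Maildir++: one flat directory per folder, named with a leading
    // dot and with "." between the components.
    std::string dir = ".";
    for (char c : folder)
        dir += (c == '/') ? '.' : c;
    return root + "/" + dir;
}

void demo ()
{
    assert (storage_dir ("Maildir", "foo/bar", Layout::Nested)
            == "Maildir/foo/bar");
    assert (storage_dir ("Maildir", "foo/bar", Layout::MaildirPP)
            == "Maildir/.foo.bar");
}
```

The sketch ignores folder names that themselves contain a "." under
Maildir++, which would need some escaping rule.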

And personally, I think I'd prefer that folder: be anchored by default,
so that folder:work means "the top-level folder named work", but it's
not a big deal to me as long as there's a fairly easy way to specify
exactly what I want.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4


* Re: Folder search semantics
  2011-02-20 19:52       ` Folder search semantics Rob Browning
@ 2011-02-20 20:00         ` Rob Browning
  0 siblings, 0 replies; 25+ messages in thread
From: Rob Browning @ 2011-02-20 20:00 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

Rob Browning <rlb@defaultvalue.org> writes:

> And personally, I think I'd prefer that folder: be anchored by default,
> so that folder:work means "the top-level folder named work", but it's
> not a big deal to me as long as there's a fairly easy way to specify
> exactly what I want.

Oh, and in part, the reason why I suspect anchored might be a better
default is that it's more conservative.

For example, someone who mistakenly assumes that folder: is anchored by
default doesn't do any unexpected damage with something like this:

  $ notmuch tag +deleted folder:misc
  $ notmuch search tag:deleted ... | xargs rm

FWIW
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4


Thread overview: 25+ messages
2011-01-16  8:10 [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
2011-01-16  8:10 ` [PATCH 1/8] Implement a custom query parser with a mostly Xapian-compatible grammar Austin Clements
2011-01-21  6:37   ` [PATCH 1.5/8] Query parser testing framework and basic tests Austin Clements
2011-01-16  8:10 ` [PATCH 2/8] Parse NEAR and ADJ operators Austin Clements
2011-01-21  6:39   ` [PATCH 2.5/8] Query parser tests for " Austin Clements
2011-01-16  8:10 ` [PATCH 3/8] Parse wildcard queries Austin Clements
2011-01-21  6:40   ` [PATCH 3.5/8] Query parser tests for " Austin Clements
2011-01-22 16:47     ` Michal Sojka
2011-01-23 22:02       ` Austin Clements
2011-01-24 12:24         ` Michal Sojka
2011-01-16  8:10 ` [PATCH 4/8] Replace Xapian query parser with custom query parser Austin Clements
2011-01-16  8:10 ` [PATCH 5/8] Support "tag:*" as well as "NOT tag:*" queries Austin Clements
2011-01-24 17:15   ` [PATCH 5.5/8] test: Wildcard tag search and untagged search Austin Clements
2011-01-16  8:10 ` [PATCH 6/8] Support maildir folder search Austin Clements
2011-01-24 17:13   ` [PATCH 6/8 v2] " Austin Clements
2011-01-24 17:18   ` [PATCH 6.5/8] test: Add tests for custom query parser-based folder searches Austin Clements
2011-01-16  8:10 ` [PATCH 7/8] Implement value range queries Austin Clements
2011-01-16  8:10 ` [PATCH 8/8] Support before: and after: date search with sane date syntax Austin Clements
2011-01-24 17:20   ` [PATCH 8.5/8] test: Add tests for search by date Austin Clements
2011-01-31  4:33 ` [PATCH 9/8] qparser: Delete (and thus close) the Xapian database Austin Clements
2011-02-02  5:03 ` [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more Austin Clements
2011-02-02 22:48   ` Carl Worth
2011-02-03  6:14     ` Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more) Austin Clements
2011-02-20 19:52       ` Folder search semantics Rob Browning
2011-02-20 20:00         ` Rob Browning