unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Cc: David Bremner <david@tethera.net>
Subject: [PATCH 06/31] lib/parse-sexp: parse single terms and the empty list.
Date: Thu, 12 Aug 2021 10:07:03 -0700
Message-ID: <20210812170728.1348333-7-david@tethera.net>
In-Reply-To: <20210812170728.1348333-1-david@tethera.net>

There is not much of a parser here yet, but it already does some
useful error reporting. Most functionality sketched in the
documentation is not implemented yet; detailed documentation will
follow with the implementation.
---
 doc/conf.py                       |  4 ++
 doc/index.rst                     |  1 +
 doc/man7/notmuch-sexp-queries.rst | 81 +++++++++++++++++++++++++++++++
 lib/Makefile.local                |  3 +-
 lib/database-private.h            |  7 +++
 lib/parse-sexp.cc                 | 54 +++++++++++++++++++++
 lib/query.cc                      |  6 +--
 test/T080-search.sh               |  5 --
 test/T081-sexpr-search.sh         | 65 +++++++++++++++++++++++++
 9 files changed, 216 insertions(+), 10 deletions(-)
 create mode 100644 doc/man7/notmuch-sexp-queries.rst
 create mode 100644 lib/parse-sexp.cc
 create mode 100755 test/T081-sexpr-search.sh

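The S-EXPRESSIONS section of doc/man7/notmuch-sexp-queries.rst defines three kinds of input: basic values, double-quoted strings with backslash escapes, and parenthesized lists. A small stand-alone reader makes that grammar concrete. This is a minimal sketch, not sfsexp's parser (the patch hands the string to sfsexp's parse_sexp); the Sexp type and the read_one/read_sexp functions are hypothetical names used only for illustration. Unbalanced parentheses yield std::nullopt, the situation the "Unbalanced parens" tests cover.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: an atom (basic value or quoted string) or a list.
struct Sexp {
    bool is_list = false;
    bool quoted = false;      // true if the atom came from a "quoted string"
    std::string value;        // atom text, with backslash escapes removed
    std::vector<Sexp> items;  // list elements (empty for atoms)
};

static void skip_space(const std::string &s, std::size_t &pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Read one s-expression starting at s[pos]; std::nullopt means a syntax error.
static std::optional<Sexp> read_one(const std::string &s, std::size_t &pos) {
    skip_space(s, pos);
    if (pos >= s.size() || s[pos] == ')')
        return std::nullopt;                    // missing expression or stray ')'
    Sexp node;
    if (s[pos] == '(') {
        node.is_list = true;
        ++pos;
        for (;;) {
            skip_space(s, pos);
            if (pos >= s.size())
                return std::nullopt;            // unbalanced: no closing ')'
            if (s[pos] == ')') {
                ++pos;
                return node;
            }
            std::optional<Sexp> child = read_one(s, pos);
            if (!child)
                return std::nullopt;
            node.items.push_back(std::move(*child));
        }
    }
    if (s[pos] == '"') {
        node.quoted = true;
        ++pos;
        while (pos < s.size() && s[pos] != '"') {
            if (s[pos] == '\\' && pos + 1 < s.size())
                ++pos;                          // keep the escaped character
            node.value += s[pos++];
        }
        if (pos >= s.size())
            return std::nullopt;                // unterminated quoted string
        ++pos;                                  // skip the closing quote
        return node;
    }
    // Basic value: a run containing no whitespace, double quotes, or parentheses.
    while (pos < s.size() && !std::isspace(static_cast<unsigned char>(s[pos])) &&
           s[pos] != '"' && s[pos] != '(' && s[pos] != ')')
        node.value += s[pos++];
    return node;
}

// Parse a complete query string; trailing input after one expression is rejected.
std::optional<Sexp> read_sexp(const std::string &s) {
    std::size_t pos = 0;
    std::optional<Sexp> result = read_one(s, pos);
    skip_space(s, pos);
    if (!result || pos != s.size())
        return std::nullopt;
    return result;
}
```

With this sketch, read_sexp("(and (to santa) (date december))") yields a three-element list whose head is the atom "and", and read_sexp("(") returns std::nullopt, which corresponds to the patch's "invalid s-expression" error. Rejecting trailing input after the first expression is a choice made by this sketch, not necessarily sfsexp's behaviour.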
diff --git a/doc/conf.py b/doc/conf.py
index 4a4a3421..53becb00 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -159,6 +159,10 @@ man_pages = [
      u'syntax for notmuch queries',
      [notmuch_authors], 7),
 
+    ('man7/notmuch-sexp-queries', 'notmuch-sexp-queries',
+     u's-expression syntax for notmuch queries',
+     [notmuch_authors], 7),
+
     ('man1/notmuch-show', 'notmuch-show',
      u'show messages matching the given search terms',
      [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index a3bf3480..fbdcf779 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -24,6 +24,7 @@ Contents:
    man1/notmuch-restore
    man1/notmuch-search
    man7/notmuch-search-terms
+   man7/notmuch-sexp-queries
    man1/notmuch-show
    man1/notmuch-tag
    python-bindings
diff --git a/doc/man7/notmuch-sexp-queries.rst b/doc/man7/notmuch-sexp-queries.rst
new file mode 100644
index 00000000..223b1bd8
--- /dev/null
+++ b/doc/man7/notmuch-sexp-queries.rst
@@ -0,0 +1,81 @@
+.. _notmuch-sexp-query(7):
+
+====================
+notmuch-sexp-queries
+====================
+
+SYNOPSIS
+========
+
+**notmuch** **search** ``--query=sexp`` '(and (to santa) (date december))'
+
+DESCRIPTION
+===========
+
+
+S-EXPRESSIONS
+-------------
+
+An *s-expression* is either an atom or a list of whitespace-delimited
+s-expressions enclosed in parentheses. Atoms are either
+
+*basic value*
+    A basic value is an unquoted string containing no whitespace, double quotes, or
+    parentheses.
+
+*quoted string*
+    Double quotes (") delimit strings that may contain whitespace
+    or parentheses. A double quote character can be included by
+    escaping it with a backslash, for example ``"this is a quote \""``.
+
+S-EXPRESSION QUERIES
+--------------------
+
+An s-expression query is either an atom, the empty list, or a
+*compound query* consisting of a prefix atom (first element) defining
+a *field*, *logical operation*, or *modifier*, and zero or more
+subqueries.
+
+``*`` or ``()``
+    The atom ``*`` and the empty list ``()`` both match all
+    messages.
+
+*term*
+    Match all messages containing *term*, possibly after stemming
+    or phrase splitting.
+
+``(`` *field* |q1| |q2| ... |qn| ``)``
+    Restrict the queries |q1| to |qn| to *field*, and combine with *and*
+    (for most fields) or *or*. See :any:`fields` for more information.
+
+``(`` *operator* |q1| |q2| ... |qn| ``)``
+    Combine queries |q1| to |qn|. See :any:`operators` for more information.
+
+``(`` *modifier* |q1| |q2| ... |qn| ``)``
+    Combine queries |q1| to |qn|, and reinterpret the result (e.g. as a regular expression).
+    See :any:`modifiers` for more information.
+
+.. _fields:
+
+FIELDS
+``````
+
+.. _operators:
+
+OPERATORS
+`````````
+
+.. _modifiers:
+
+MODIFIERS
+`````````
+
+EXAMPLES
+========
+
+``Wizard``
+    Match all messages containing the word "wizard", ignoring case.
+
+.. |q1| replace:: :math:`q_1`
+.. |q2| replace:: :math:`q_2`
+.. |qn| replace:: :math:`q_n`
diff --git a/lib/Makefile.local b/lib/Makefile.local
index e2d4b91d..1378a74b 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -63,7 +63,8 @@ libnotmuch_cxx_srcs =		\
 	$(dir)/features.cc	\
 	$(dir)/prefix.cc	\
 	$(dir)/open.cc		\
-	$(dir)/init.cc
+	$(dir)/init.cc		\
+	$(dir)/parse-sexp.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
 
diff --git a/lib/database-private.h b/lib/database-private.h
index 9706c17e..f206efaf 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -300,4 +300,11 @@ _notmuch_database_setup_standard_query_fields (notmuch_database_t *notmuch);
 notmuch_status_t
 _notmuch_database_setup_user_query_fields (notmuch_database_t *notmuch);
 
+#if __cplusplus
+/* parse-sexp.cc */
+notmuch_status_t
+_notmuch_sexp_string_to_xapian_query (notmuch_database_t *notmuch, const char *querystr,
+				      Xapian::Query &output);
+#endif
+
 #endif
diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
new file mode 100644
index 00000000..1ce3c9d4
--- /dev/null
+++ b/lib/parse-sexp.cc
@@ -0,0 +1,54 @@
+#include <xapian.h>
+#include "notmuch-private.h"
+#include "sexp.h"
+
+#if HAVE_SFSEXP
+
+/* The _sexp prefix is used for file-scope symbols to avoid clashing
+ * with definitions from sexp.h */
+
+/* Here we expect the s-expression to be an atom, or a proper list
+ * whose first element defines an operation, or, as a special case,
+ * the empty list. */
+
+static notmuch_status_t
+_sexp_to_xapian_query (notmuch_database_t *notmuch, const sexp_t *sx,
+		       Xapian::Query &output)
+{
+
+    if (sx->ty == SEXP_VALUE) {
+	output = Xapian::Query (Xapian::Unicode::tolower (sx->val));
+	return NOTMUCH_STATUS_SUCCESS;
+    }
+
+    /* Empty list */
+    if (! sx->list) {
+	output = Xapian::Query::MatchAll;
+	return NOTMUCH_STATUS_SUCCESS;
+    }
+
+    if (sx->list->ty == SEXP_VALUE)
+	_notmuch_database_log (notmuch, "unknown prefix '%s'\n", sx->list->val);
+    else
+	_notmuch_database_log (notmuch,
+			       "unexpected list in field/operation position\n");
+
+    return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+}
+
+notmuch_status_t
+_notmuch_sexp_string_to_xapian_query (notmuch_database_t *notmuch, const char *querystr,
+				      Xapian::Query &output)
+{
+    const sexp_t *sx = NULL;
+    char *buf = talloc_strdup (notmuch, querystr);
+
+    sx = parse_sexp (buf, strlen (querystr));
+    if (! sx) {
+	_notmuch_database_log (notmuch, "invalid s-expression: '%s'\n", querystr);
+	return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+    }
+
+    return _sexp_to_xapian_query (notmuch, sx, output);
+}
+#endif
diff --git a/lib/query.cc b/lib/query.cc
index 12fd9482..435f7229 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -23,8 +23,6 @@
 
 #include <glib.h> /* GHashTable, GPtrArray */
 
-#include "sexp.h"
-
 struct _notmuch_query {
     notmuch_database_t *notmuch;
     const char *query_string;
@@ -208,8 +206,8 @@ _notmuch_query_ensure_parsed_sexpr (notmuch_query_t *query)
     if (query->parsed)
 	return NOTMUCH_STATUS_SUCCESS;
 
-    query->xapian_query = Xapian::Query::MatchAll;
-    return NOTMUCH_STATUS_SUCCESS;
+    return _notmuch_sexp_string_to_xapian_query (query->notmuch, query->query_string,
+						 query->xapian_query);
 }
 
 static notmuch_status_t
diff --git a/test/T080-search.sh b/test/T080-search.sh
index 5f6c1456..a3f0dead 100755
--- a/test/T080-search.sh
+++ b/test/T080-search.sh
@@ -189,9 +189,4 @@ test_begin_subtest "parts do not have adjacent term positions"
 output=$(notmuch search id:termpos and '"c x"')
 test_expect_equal "$output" ""
 
-test_begin_subtest "sexpr query: all messages"
-notmuch search '*' > EXPECTED
-notmuch search --query=sexp '()' > OUTPUT
-test_expect_equal_file EXPECTED OUTPUT
-
 test_done
diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh
new file mode 100755
index 00000000..46cc712c
--- /dev/null
+++ b/test/T081-sexpr-search.sh
@@ -0,0 +1,65 @@
+#!/usr/bin/env bash
+test_description='"notmuch search" in several variations'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+if [ $NOTMUCH_HAVE_SFSEXP -ne 1 ]; then
+    printf "Skipping due to missing sfsexp library\n"
+    test_done
+fi
+
+add_email_corpus
+
+test_begin_subtest "all messages: ()"
+notmuch search '*' > EXPECTED
+notmuch search --query=sexp "()" > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "single term in body"
+notmuch search --query=sexp 'wizard' | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "single term in body (case insensitive)"
+notmuch search --query=sexp 'Wizard' | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "single term in body, stemmed version"
+test_subtest_known_broken
+notmuch search arriv > EXPECTED
+notmuch search --query=sexp arriv > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Unbalanced parens"
+# A code 1 indicates the error was handled (a crash will return e.g. 139).
+test_expect_code 1 "notmuch search --query=sexp '('"
+
+test_begin_subtest "Unbalanced parens, error message"
+notmuch search --query=sexp '(' >OUTPUT 2>&1
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+invalid s-expression: '('
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "unknown prefix"
+notmuch search --query=sexp '(foo)' >OUTPUT 2>&1
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+unknown prefix 'foo'
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "list as prefix"
+notmuch search --query=sexp '((foo))' >OUTPUT 2>&1
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+unexpected list in field/operation position
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.30.2

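The decision logic of _sexp_to_xapian_query in lib/parse-sexp.cc can be illustrated without Xapian or sfsexp. In this hypothetical sketch, the Node type stands in for sfsexp's sexp_t, a descriptive string stands in for Xapian::Query, and std::tolower stands in for Xapian::Unicode::tolower (so only ASCII letters are lower-cased, unlike the Unicode-aware Xapian function). It covers the four cases the patch handles: an atom becomes a lower-cased term, the empty list matches everything, and a list headed by an atom or by another list is rejected with the messages that T081-sexpr-search.sh expects.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for sfsexp's sexp_t: an atom or a list of nodes.
struct Node {
    bool is_list = false;
    std::string val;          // atom text (unused for lists)
    std::vector<Node> items;  // list elements (unused for atoms)
};

enum class Status { Success, BadQuerySyntax };

// Translate one node into a textual "query". On failure, store in error the
// message that the patch passes to _notmuch_database_log.
Status to_query(const Node &sx, std::string &output, std::string &error) {
    if (!sx.is_list) {
        // Single term: lower-case it (ASCII only in this sketch), no stemming.
        std::string term = sx.val;
        std::transform(term.begin(), term.end(), term.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        output = "Term(" + term + ")";
        return Status::Success;
    }
    if (sx.items.empty()) {
        // The empty list matches every message.
        output = "MatchAll";
        return Status::Success;
    }
    // No fields, operators, or modifiers exist yet, so any head is an error.
    const Node &head = sx.items.front();
    if (!head.is_list)
        error = "unknown prefix '" + head.val + "'";
    else
        error = "unexpected list in field/operation position";
    return Status::BadQuerySyntax;
}
```

Checking for an atom before checking for the empty list mirrors the order in the patch. In the real code the error message goes to the database log and the caller receives NOTMUCH_STATUS_BAD_QUERY_SYNTAX, which the CLI reports as "Syntax error in query".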
Thread overview: 32+ messages
2021-08-12 17:06 v4 sexp query parser David Bremner
2021-08-12 17:06 ` [PATCH 01/31] configure: optional library sfsexp David Bremner
2021-08-12 17:06 ` [PATCH 02/31] lib: split notmuch_query_create David Bremner
2021-08-12 17:07 ` [PATCH 03/31] lib: define notmuch_query_create_with_syntax David Bremner
2021-08-12 17:07 ` [PATCH 04/31] CLI/search+address: support sexpr queries David Bremner
2021-08-12 17:07 ` [PATCH 05/31] lib: add new status code for query syntax errors David Bremner
2021-08-12 17:07 ` David Bremner [this message]
2021-08-12 17:07 ` [PATCH 07/31] lib: leave stemmer object accessible David Bremner
2021-08-12 17:07 ` [PATCH 08/31] lib/parse-sexp: stem unquoted atoms David Bremner
2021-08-12 17:07 ` [PATCH 09/31] lib/parse-sexp: support and, not, and or David Bremner
2021-08-12 17:07 ` [PATCH 10/31] lib/parse-sexp: support subject field David Bremner
2021-08-12 17:07 ` [PATCH 11/31] util/unicode: allow calling from C++ David Bremner
2021-08-12 17:07 ` [PATCH 12/31] lib/parse-sexp: support phrase queries David Bremner
2021-08-12 17:07 ` [PATCH 13/31] lib/parse-sexp: add term prefix backed fields David Bremner
2021-08-12 17:07 ` [PATCH 14/31] lib/parse-sexp: 'starts-with' wildcard searches David Bremner
2021-08-12 17:07 ` [PATCH 15/31] lib/parse-sexp: add '*' as syntactic sugar for '(starts-with "")' David Bremner
2021-08-12 17:07 ` [PATCH 16/31] lib/parse-sexp: handle unprefixed terms David Bremner
2021-08-12 17:07 ` [PATCH 17/31] lib/query: generalize exclude handling to s-expression queries David Bremner
2021-08-12 17:07 ` [PATCH 18/31] lib: factor out query construction from regexp David Bremner
2021-08-12 17:07 ` [PATCH 19/31] lib/parse-sexp: support regular expressions David Bremner
2021-08-12 17:07 ` [PATCH 20/31] lib: generate actual Xapian query for "*" and "" David Bremner
2021-08-12 17:07 ` [PATCH 21/31] lib/query: factor out _notmuch_query_string_to_xapian_query David Bremner
2021-08-12 17:07 ` [PATCH 22/31] lib/thread-fp: factor out query expansion, rewrite in Xapian David Bremner
2021-08-12 17:07 ` [PATCH 23/31] lib/parse-sexp: expand queries David Bremner
2021-08-12 17:07 ` [PATCH 24/31] lib/parse-sexp: support infix subqueries David Bremner
2021-08-12 17:07 ` [PATCH 25/31] lib/parse-sexp: parse user headers David Bremner
2021-08-12 17:07 ` [PATCH 26/31] lib: factor out expansion of saved queries David Bremner
2021-08-12 17:07 ` [PATCH 27/31] lib/parse-sexp: handle " David Bremner
2021-08-12 17:07 ` [PATCH 28/31] CLI/config support saving s-expression queries David Bremner
2021-08-12 17:07 ` [PATCH 29/31] lib/parse-sexp: support saved " David Bremner
2021-08-12 17:07 ` [PATCH 30/31] lib/parse-sexp: thread environment argument through parser David Bremner
2021-08-12 17:07 ` [PATCH 31/31] lib/parse-sexp: apply macros David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
