From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Bremner
To: notmuch@notmuchmail.org
Cc: David Bremner
Subject: [PATCH 06/31] lib/parse-sexp: parse single terms and the empty list.
Date: Thu, 12 Aug 2021 10:07:03 -0700
Message-Id: <20210812170728.1348333-7-david@tethera.net>
In-Reply-To: <20210812170728.1348333-1-david@tethera.net>
References: <20210812170728.1348333-1-david@tethera.net>
X-Mailer: git-send-email 2.30.2

There is not much of a parser here yet, but it already does some
useful error reporting. Most functionality sketched in the
documentation is not implemented yet; detailed documentation will
follow with the implementation.
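To make the error reporting easier to review, here is a minimal standalone sketch of the case analysis in the new `_sexp_to_xapian_query`: an atom becomes a lowercased term, `()` matches every message, and any other list is rejected with one of two messages because no prefixes are defined yet. It is not part of the patch. The `Sexp` and `Result` types and the `make_atom`, `make_list` and `to_query` helpers are hypothetical stand-ins for sfsexp's `sexp_t`, `Xapian::Query` and `notmuch_status_t`, and lowercasing here is ASCII-only, whereas the patch calls `Xapian::Unicode::tolower`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for sfsexp's sexp_t: either an atom (VALUE)
// or a list (LIST) holding its elements in order.
struct Sexp {
    enum Type { VALUE, LIST } ty;
    std::string val;          // atom text, used when ty == VALUE
    std::vector<Sexp> items;  // list elements, used when ty == LIST
};

Sexp make_atom(const std::string &v) { return Sexp{Sexp::VALUE, v, {}}; }
Sexp make_list(std::vector<Sexp> items = {}) {
    return Sexp{Sexp::LIST, "", std::move(items)};
}

// Hypothetical result type: on success `text` describes the query that
// would be built; on failure it holds the message that would be logged.
struct Result {
    bool ok;
    std::string text;
};

// Same case analysis as _sexp_to_xapian_query in this patch.
Result to_query(const Sexp &sx) {
    if (sx.ty == Sexp::VALUE) {
        // Single term: lowercase it (ASCII only in this sketch).
        std::string term = sx.val;
        std::transform(term.begin(), term.end(), term.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return {true, "term:" + term};
    }
    if (sx.items.empty())  // "()" matches every message
        return {true, "MatchAll"};
    const Sexp &head = sx.items.front();
    if (head.ty == Sexp::VALUE)  // no field/operation prefixes exist yet
        return {false, "unknown prefix '" + head.val + "'"};
    return {false, "unexpected list in field/operation position"};
}
```

For example, `to_query(make_atom("Wizard"))` yields `term:wizard`, mirroring the case-insensitive test in T081, and `to_query(make_list({make_atom("foo")}))` fails with `unknown prefix 'foo'`.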
---
 doc/conf.py                       |  4 ++
 doc/index.rst                     |  1 +
 doc/man7/notmuch-sexp-queries.rst | 81 +++++++++++++++++++++++++++++++
 lib/Makefile.local                |  3 +-
 lib/database-private.h            |  7 +++
 lib/parse-sexp.cc                 | 54 +++++++++++++++++++++
 lib/query.cc                      |  6 +--
 test/T080-search.sh               |  5 --
 test/T081-sexpr-search.sh         | 65 +++++++++++++++++++++++++
 9 files changed, 216 insertions(+), 10 deletions(-)
 create mode 100644 doc/man7/notmuch-sexp-queries.rst
 create mode 100644 lib/parse-sexp.cc
 create mode 100755 test/T081-sexpr-search.sh

diff --git a/doc/conf.py b/doc/conf.py
index 4a4a3421..53becb00 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -159,6 +159,10 @@ man_pages = [
      u'syntax for notmuch queries',
      [notmuch_authors], 7),
 
+    ('man7/notmuch-sexp-queries', 'notmuch-sexp-queries',
+     u's-expression syntax for notmuch queries',
+     [notmuch_authors], 7),
+
     ('man1/notmuch-show', 'notmuch-show',
      u'show messages matching the given search terms',
      [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index a3bf3480..fbdcf779 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -24,6 +24,7 @@ Contents:
    man1/notmuch-restore
    man1/notmuch-search
    man7/notmuch-search-terms
+   man7/notmuch-sexp-queries
    man1/notmuch-show
    man1/notmuch-tag
    python-bindings
diff --git a/doc/man7/notmuch-sexp-queries.rst b/doc/man7/notmuch-sexp-queries.rst
new file mode 100644
index 00000000..223b1bd8
--- /dev/null
+++ b/doc/man7/notmuch-sexp-queries.rst
@@ -0,0 +1,81 @@
+.. _notmuch-sexp-query(7):
+
+====================
+notmuch-sexp-queries
+====================
+
+SYNOPSIS
+========
+
+**notmuch** **search** ``--query=sexp`` '(and (to santa) (date december))'
+
+DESCRIPTION
+===========
+
+
+S-EXPRESSIONS
+-------------
+
+An *s-expression* is either an atom, or a list of whitespace-delimited
+s-expressions inside parentheses. Atoms are either
+
+*basic value*
+    A basic value is an unquoted string containing no whitespace, double quotes, or
+ +*quoted string* + Double quotes (") delimit strings possibly containing whitespace + or parentheses. These can contain double quote characters by + escaping with backslash. E.g. ``"this is a quote \""``. + +S-EXPRESSION QUERIES +-------------------- + +An s-expression query is either an atom, the empty list, or a +*compound query* consisting of a prefix atom (first element) defining +a *field*, *logical operation*, or *modifier*, and 0 or more +subqueries. + +``*`` +``()`` + The empty list matches all messages + +*term* + Match all messages containing *term*, possibly after stemming + or phase splitting. + +``(`` *field* |q1| |q2| ... |qn| ``)`` + Restrict the queries |q1| to |qn| to *field*, and combine with *and* + (for most fields) or *or*. See :any:`fields` for more information. + +``(`` *operator* |q1| |q2| ... |qn| ``)`` + Combine queries |q1| to |qn|. See :any:`operators` for more information. + +``(`` *modifier* |q1| |q2| ... |qn| ``)`` + Combine queries |q1| to |qn|, and reinterpret the result (e.g. as a regular expression). + See :any:`modifiers` for more information. + +.. _fields: + +FIELDS +`````` + +.. _operators: + +OPERATORS +````````` + +.. _modifiers: + +MODIFIERS +````````` + +EXAMPLES +======== + +``Wizard`` + Match all messages containing the word "wizard", ignoring case. + +.. |q1| replace:: :math:`q_1` +.. |q2| replace:: :math:`q_2` +.. 
|qn| replace:: :math:`q_n` diff --git a/lib/Makefile.local b/lib/Makefile.local index e2d4b91d..1378a74b 100644 --- a/lib/Makefile.local +++ b/lib/Makefile.local @@ -63,7 +63,8 @@ libnotmuch_cxx_srcs = \ $(dir)/features.cc \ $(dir)/prefix.cc \ $(dir)/open.cc \ - $(dir)/init.cc + $(dir)/init.cc \ + $(dir)/parse-sexp.cc libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o) diff --git a/lib/database-private.h b/lib/database-private.h index 9706c17e..f206efaf 100644 --- a/lib/database-private.h +++ b/lib/database-private.h @@ -300,4 +300,11 @@ _notmuch_database_setup_standard_query_fields (notmuch_database_t *notmuch); notmuch_status_t _notmuch_database_setup_user_query_fields (notmuch_database_t *notmuch); +#if __cplusplus +/* parse-sexp.cc */ +notmuch_status_t +_notmuch_sexp_string_to_xapian_query (notmuch_database_t *notmuch, const char *querystr, + Xapian::Query &output); +#endif + #endif diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc new file mode 100644 index 00000000..1ce3c9d4 --- /dev/null +++ b/lib/parse-sexp.cc @@ -0,0 +1,54 @@ +#include +#include "notmuch-private.h" +#include "sexp.h" + +#if HAVE_SFSEXP + +/* _sexp is used for file scope symbols to avoid clashing with + * definitions from sexp.h */ + +/* Here we expect the s-expression to be a proper list, with first + * element defining and operation, or as a special case the empty + * list */ + +static notmuch_status_t +_sexp_to_xapian_query (notmuch_database_t *notmuch, const sexp_t *sx, + Xapian::Query &output) +{ + + if (sx->ty == SEXP_VALUE) { + output = Xapian::Query (Xapian::Unicode::tolower (sx->val)); + return NOTMUCH_STATUS_SUCCESS; + } + + /* Empty list */ + if (! 
sx->list) { + output = Xapian::Query::MatchAll; + return NOTMUCH_STATUS_SUCCESS; + } + + if (sx->list->ty == SEXP_VALUE) + _notmuch_database_log (notmuch, "unknown prefix '%s'\n", sx->list->val); + else + _notmuch_database_log (notmuch, "unexpected list in field/operation position\n", + sx->list->val); + + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; +} + +notmuch_status_t +_notmuch_sexp_string_to_xapian_query (notmuch_database_t *notmuch, const char *querystr, + Xapian::Query &output) +{ + const sexp_t *sx = NULL; + char *buf = talloc_strdup (notmuch, querystr); + + sx = parse_sexp (buf, strlen (querystr)); + if (! sx) { + _notmuch_database_log (notmuch, "invalid s-expression: '%s'\n", querystr); + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; + } + + return _sexp_to_xapian_query (notmuch, sx, output); +} +#endif diff --git a/lib/query.cc b/lib/query.cc index 12fd9482..435f7229 100644 --- a/lib/query.cc +++ b/lib/query.cc @@ -23,8 +23,6 @@ #include /* GHashTable, GPtrArray */ -#include "sexp.h" - struct _notmuch_query { notmuch_database_t *notmuch; const char *query_string; @@ -208,8 +206,8 @@ _notmuch_query_ensure_parsed_sexpr (notmuch_query_t *query) if (query->parsed) return NOTMUCH_STATUS_SUCCESS; - query->xapian_query = Xapian::Query::MatchAll; - return NOTMUCH_STATUS_SUCCESS; + return _notmuch_sexp_string_to_xapian_query (query->notmuch, query->query_string, + query->xapian_query); } static notmuch_status_t diff --git a/test/T080-search.sh b/test/T080-search.sh index 5f6c1456..a3f0dead 100755 --- a/test/T080-search.sh +++ b/test/T080-search.sh @@ -189,9 +189,4 @@ test_begin_subtest "parts do not have adjacent term positions" output=$(notmuch search id:termpos and '"c x"') test_expect_equal "$output" "" -test_begin_subtest "sexpr query: all messages" -notmuch search '*' > EXPECTED -notmuch search --query=sexp '()' > OUTPUT -test_expect_equal_file EXPECTED OUTPUT - test_done diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh new file mode 100755 index 
00000000..46cc712c --- /dev/null +++ b/test/T081-sexpr-search.sh @@ -0,0 +1,65 @@ +#!/usr/bin/env bash +test_description='"notmuch search" in several variations' +. $(dirname "$0")/test-lib.sh || exit 1 + +if [ $NOTMUCH_HAVE_SFSEXP -ne 1 ]; then + printf "Skipping due to missing sfsexp library\n" + test_done +fi + +add_email_corpus + +test_begin_subtest "all messages: ()" +notmuch search '*' > EXPECTED +notmuch search --query=sexp "()" > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "single term in body" +notmuch search --query=sexp 'wizard' | notmuch_search_sanitize>OUTPUT +cat < EXPECTED +thread:XXX 2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread) +EOF +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "single term in body (case insensitive)" +notmuch search --query=sexp 'Wizard' | notmuch_search_sanitize>OUTPUT +cat < EXPECTED +thread:XXX 2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread) +EOF +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "single term in body, stemmed version" +test_subtest_known_broken +notmuch search arriv > EXPECTED +notmuch search --query=sexp arriv > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "Unbalanced parens" +# A code 1 indicates the error was handled (a crash will return e.g. 139). 
+test_expect_code 1 "notmuch search --query=sexp '('"
+
+test_begin_subtest "Unbalanced parens, error message"
+notmuch search --query=sexp '(' >OUTPUT 2>&1
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+invalid s-expression: '('
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "unknown prefix"
+notmuch search --query=sexp '(foo)' >OUTPUT 2>&1
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+unknown prefix 'foo'
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "list as prefix"
+notmuch search --query=sexp '((foo))' >OUTPUT 2>&1
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+unexpected list in field/operation position
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.30.2
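The man page's S-EXPRESSIONS section defines three lexical forms: basic values, double-quoted strings with backslash escapes, and parenthesized lists. The patch delegates reading them to sfsexp's `parse_sexp`. The C++17 recursive-descent reader below is only a hypothetical illustration of those rules; `Node`, `Reader` and `parse` are invented names, and the sketch does not reflect how sfsexp is implemented. Like the "Unbalanced parens" tests, it rejects `(`; rejecting trailing input after the first expression is this sketch's own choice.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// An atom (is_list == false) or a list of child nodes.
struct Node {
    bool is_list;
    std::string val;          // atom text with quotes and escapes removed
    std::vector<Node> items;  // children when is_list is true
};

class Reader {
public:
    explicit Reader(std::string s) : s_(std::move(s)) {}

    // Reads exactly one s-expression; std::nullopt signals a syntax error
    // (unbalanced parentheses, unterminated string, empty or trailing input).
    std::optional<Node> read_all() {
        std::optional<Node> n = read();
        skip_ws();
        if (!n || pos_ != s_.size())
            return std::nullopt;
        return n;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_])))
            ++pos_;
    }

    std::optional<Node> read() {
        skip_ws();
        if (pos_ >= s_.size())
            return std::nullopt;             // nothing left to read
        char c = s_[pos_];
        if (c == ')')
            return std::nullopt;             // stray closing parenthesis
        if (c == '(') {
            ++pos_;
            Node list{true, "", {}};
            for (;;) {
                skip_ws();
                if (pos_ >= s_.size())
                    return std::nullopt;     // missing ')'
                if (s_[pos_] == ')') {
                    ++pos_;
                    return list;
                }
                std::optional<Node> child = read();
                if (!child)
                    return std::nullopt;
                list.items.push_back(std::move(*child));
            }
        }
        if (c == '"') {
            ++pos_;
            std::string out;
            while (pos_ < s_.size() && s_[pos_] != '"') {
                if (s_[pos_] == '\\' && pos_ + 1 < s_.size())
                    ++pos_;                  // keep the escaped character as-is
                out += s_[pos_++];
            }
            if (pos_ >= s_.size())
                return std::nullopt;         // unterminated string
            ++pos_;                          // skip the closing quote
            return Node{false, out, {}};
        }
        // Basic value: ends at whitespace, a double quote, or a parenthesis.
        std::size_t start = pos_;
        while (pos_ < s_.size() && !std::isspace(static_cast<unsigned char>(s_[pos_])) &&
               s_[pos_] != '(' && s_[pos_] != ')' && s_[pos_] != '"')
            ++pos_;
        return Node{false, s_.substr(start, pos_ - start), {}};
    }
};

// Hypothetical convenience wrapper.
std::optional<Node> parse(const std::string &input) { return Reader(input).read_all(); }
```

For instance, `parse("(and (to santa) (date december))")` yields a three-element list whose first element is the atom `and`, while `parse("(")` returns `std::nullopt`, the same input the patch reports as `invalid s-expression: '('`.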