From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id mImyMLJVFWGPOAEAgWs5BA (envelope-from ) for ; Thu, 12 Aug 2021 19:09:06 +0200 Received: from aspmx1.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id T8J0LLJVFWEfdwAA1q6Kng (envelope-from ) for ; Thu, 12 Aug 2021 17:09:06 +0000 Received: from mail.notmuchmail.org (nmbug.tethera.net [144.217.243.247]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 47437B9C8 for ; Thu, 12 Aug 2021 19:09:06 +0200 (CEST) Received: from nmbug.tethera.net (localhost [127.0.0.1]) by mail.notmuchmail.org (Postfix) with ESMTP id 110EA295C4; Thu, 12 Aug 2021 13:08:29 -0400 (EDT) Received: from fethera.tethera.net (fethera.tethera.net [IPv6:2607:5300:60:c5::1]) by mail.notmuchmail.org (Postfix) with ESMTP id 191D22931A for ; Thu, 12 Aug 2021 13:08:21 -0400 (EDT) Received: by fethera.tethera.net (Postfix, from userid 1001) id 10FA85FD5C; Thu, 12 Aug 2021 13:08:21 -0400 (EDT) Received: (nullmailer pid 1348761 invoked by uid 1000); Thu, 12 Aug 2021 17:07:43 -0000 From: David Bremner To: notmuch@notmuchmail.org Cc: David Bremner Subject: [PATCH 19/31] lib/parse-sexp: support regular expressions Date: Thu, 12 Aug 2021 10:07:16 -0700 Message-Id: <20210812170728.1348333-20-david@tethera.net> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210812170728.1348333-1-david@tethera.net> References: <20210812170728.1348333-1-david@tethera.net> MIME-Version: 1.0 Message-ID-Hash: AM7ZVRMKODIATGI5BQG7GAXG7EXOUTTP X-Message-ID-Hash: AM7ZVRMKODIATGI5BQG7GAXG7EXOUTTP X-MailFrom: bremner@tethera.net X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-notmuch.notmuchmail.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.2.1 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Help: List-Post: List-Subscribe: List-Unsubscribe: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_IN ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1628788146; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:list-id:list-help: list-unsubscribe:list-subscribe:list-post; bh=aIx7hTD3dpLciMYx6VUIYp91zeLFPXrtHAcaPj8q3tU=; b=MMzALKOl6R4/EE8kI3wEdSo64JsbZ9EYoWCNjn19ANZM/HIhxfugHAvLxuQeNYyZv6QLvv 2r6boj4bNIhVg3u6SipuEjUy+dxML9mpctBqCAB1fsx4pxHByg8JIKSt36Ibob+I2jPOiW YTMQLC4PzMEu8BqRPJYMMGHeDYLsa0OJRqPPP5ytt9yISdZDzIBAEqxUgNWEhsr2JgBry/ 2POaEbD/fZBhXnZG0eQwIeL8IebQxx4cO/XCWGXMUO73gwnzqU6Kq8rVqhRIdoqg6XJLLB bN8H3NMmCO+M1vgTDhR6Yud37h0cnHUaeslnsfZ4PC3t6EAFlgtOuT6N481DVg== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1628788146; a=rsa-sha256; cv=none; b=H7VVs1olvfWSJX0YIwgkqMaokAbkHEXjS2cHoWil8+DWepr3gPJZ9YwuQDqK7377HB7GKK 6qma2d8IbjmLd0KF3o5eCNCXI2f3rRc3vqmmr2rxz8CDwq+ZIAQbUSihBhK9q/Mc76y7Dq 3ZLIsSEbR/t6U6P4kUgY8/vtStJds5gsL0eqxSfdoL9CFDtEk4U2tTn4M5Mnag463W5tD1 rY/2SGL5qZEwQuHMf3VYRzFROXjD1K4n2y85pKi9cn52gjVZ4CPPG2SwiRqu7N57Laled+ 4xxwSwTPWr8UuFVZv4ytOeyG670P56rPaMTeBOpwzy6fQqJQ065M7B5KrkQSdQ== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of notmuch-bounces@notmuchmail.org designates 144.217.243.247 as permitted sender) smtp.mailfrom=notmuch-bounces@notmuchmail.org X-Migadu-Spam-Score: 0.48 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of notmuch-bounces@notmuchmail.org designates 144.217.243.247 as permitted sender) smtp.mailfrom=notmuch-bounces@notmuchmail.org X-Migadu-Queue-Id: 47437B9C8 X-Spam-Score: 0.48 X-Migadu-Scanner: scn1.migadu.com X-TUID: Xp3hHwJr6dlV At least to the degree that the Xapian QueryParser based parser also supports them. Support short alias 'rx' as it seems to make more complex queries nicer to read. --- doc/man7/notmuch-sexp-queries.rst | 8 ++++ lib/parse-sexp.cc | 54 ++++++++++++++++++----- test/T081-sexpr-search.sh | 72 +++++++++++++++++++++++++++++++ 3 files changed, 124 insertions(+), 10 deletions(-) diff --git a/doc/man7/notmuch-sexp-queries.rst b/doc/man7/notmuch-sexp-queries.rst index e163991c..7fe4a1ac 100644 --- a/doc/man7/notmuch-sexp-queries.rst +++ b/doc/man7/notmuch-sexp-queries.rst @@ -144,6 +144,11 @@ MODIFIERS *Modifiers* refer to any prefixes (first elements of compound queries) that are neither operators nor fields. +``(regex`` *atom* ``)`` ``(rx`` *atom* ``)`` + Interpret *atom* as a POSIX.2 regular expression (see + :manpage:`regex(7)`). This applies in term fields and a subset [#not-phrase]_ of + phrase fields (see :any:`field-table`). + ``(starts-with`` *subword* ``)`` Matches any term starting with *subword*. This applies in either phrase or term :any:`fields `, or outside of fields [#not-body]_. Note that @@ -205,6 +210,9 @@ NOTES .. [#aka-bool] a.k.a. boolean prefixes +.. [#not-phrase] Due to the implemention of phrase fields in Xapian, + regex queries could only match individual words. + .. [#not-body] Due the the way ``body`` is implemented in notmuch, this modifier is not supported in the ``body`` field. diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc index 56bd7e4b..48728edb 100644 --- a/lib/parse-sexp.cc +++ b/lib/parse-sexp.cc @@ -13,6 +13,8 @@ typedef enum { SEXP_FLAG_BOOLEAN = 1 << 1, SEXP_FLAG_SINGLE = 1 << 2, SEXP_FLAG_WILDCARD = 1 << 3, + SEXP_FLAG_REGEX = 1 << 4, + SEXP_FLAG_DO_REGEX = 1 << 5, } _sexp_flag_t; /* @@ -48,15 +50,15 @@ static _sexp_prefix_t prefixes[] = { "body", Xapian::Query::OP_AND, Xapian::Query::MatchAll, SEXP_FLAG_FIELD }, { "from", Xapian::Query::OP_AND, Xapian::Query::MatchAll, - SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "folder", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "id", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "is", Xapian::Query::OP_AND, Xapian::Query::MatchAll, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "mid", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "mimetype", Xapian::Query::OP_AND, Xapian::Query::MatchAll, SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD }, { "not", Xapian::Query::OP_AND_NOT, Xapian::Query::MatchAll, @@ -64,17 +66,21 @@ static _sexp_prefix_t prefixes[] = { "or", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, SEXP_FLAG_NONE }, { "path", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "property", Xapian::Query::OP_AND, Xapian::Query::MatchAll, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, + { "regex", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll, + SEXP_FLAG_SINGLE | SEXP_FLAG_DO_REGEX }, + { "rx", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll, + SEXP_FLAG_SINGLE | SEXP_FLAG_DO_REGEX }, { "starts-with", Xapian::Query::OP_WILDCARD, Xapian::Query::MatchAll, SEXP_FLAG_SINGLE }, { "subject", Xapian::Query::OP_AND, Xapian::Query::MatchAll, - SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "tag", Xapian::Query::OP_AND, Xapian::Query::MatchAll, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "thread", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX }, { "to", Xapian::Query::OP_AND, Xapian::Query::MatchAll, SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD }, { } @@ -180,6 +186,30 @@ _sexp_parse_one_term (notmuch_database_t *notmuch, std::string term_prefix, cons } } + +notmuch_status_t +_sexp_parse_regex (notmuch_database_t *notmuch, + const _sexp_prefix_t *prefix, const _sexp_prefix_t *parent, + std::string val, Xapian::Query &output) +{ + if (! parent) { + _notmuch_database_log (notmuch, "illegal '%s' outside field\n", + prefix->name); + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; + } + + if (! (parent->flags & SEXP_FLAG_REGEX)) { + _notmuch_database_log (notmuch, "'%s' not supported in field '%s'\n", + prefix->name, parent->name); + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; + } + + std::string msg; /* ignored */ + + return _notmuch_regexp_to_query (notmuch, Xapian::BAD_VALUENO, parent->name, + val, output, msg); +} + /* Here we expect the s-expression to be a proper list, with first * element defining and operation, or as a special case the empty * list */ @@ -254,6 +284,10 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent if (prefix->xapian_op == Xapian::Query::OP_WILDCARD) return _sexp_parse_wildcard (notmuch, parent, sx->list->next->val, output); + if (prefix->flags & SEXP_FLAG_DO_REGEX) { + return _sexp_parse_regex (notmuch, prefix, parent, sx->list->next->val, output); + } + return _sexp_combine_query (notmuch, parent, prefix->xapian_op, prefix->initial, sx->list->next, output); } diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh index be243fc0..6cfd59a8 100755 --- a/test/T081-sexpr-search.sh +++ b/test/T081-sexpr-search.sh @@ -565,4 +565,76 @@ output=$(notmuch search --query=sexp '(subject deleted)' | notmuch_search_saniti test_expect_equal "$output" "thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; Not deleted (inbox unread) thread:XXX 2001-01-05 [2/2] Notmuch Test Suite; Deleted (deleted inbox unread)" +test_begin_subtest "regex at top level" +notmuch search --query=sexp '(rx foo)' >& OUTPUT +cat < EXPECTED +notmuch search: Syntax error in query +illegal 'rx' outside field +EOF +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "regex in illegal field" +notmuch search --query=sexp '(body (regex foo))' >& OUTPUT +cat < EXPECTED +notmuch search: Syntax error in query +'regex' not supported in field 'body' +EOF +test_expect_equal_file EXPECTED OUTPUT + +notmuch search --output=messages from:cworth > cworth.msg-ids + +test_begin_subtest "regexp 'from' search" +notmuch search --output=messages --query=sexp '(from (rx cworth))' > OUTPUT +test_expect_equal_file cworth.msg-ids OUTPUT + +test_begin_subtest "regexp search for 'from' 2" +notmuch search from:/cworth@cworth.org/ and subject:patch | notmuch_search_sanitize > EXPECTED +notmuch search --query=sexp '(and (from (rx cworth@cworth.org)) (subject patch))' \ + | notmuch_search_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "regexp 'folder' search" +notmuch search 'folder:/^bar$/' | notmuch_search_sanitize > EXPECTED +notmuch search --query=sexp '(folder (rx ^bar$))' | notmuch_search_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "regexp 'id' search" +notmuch search --output=messages --query=sexp '(id (rx yoom))' > OUTPUT +test_expect_equal_file cworth.msg-ids OUTPUT + +test_begin_subtest "unanchored 'is' search" +notmuch search tag:signed or tag:inbox > EXPECTED +notmuch search --query=sexp '(is (rx i))' > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "anchored 'is' search" +notmuch search tag:signed > EXPECTED +notmuch search --query=sexp '(is (rx ^si))' > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "combine regexp mid and subject" +notmuch search subject:/-C/ and mid:/y..m/ | notmuch_search_sanitize > EXPECTED +notmuch search --query=sexp '(and (subject (rx -C)) (mid (rx y..m)))' | notmuch_search_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "regexp 'path' search" +notmuch search 'path:/^bar$/' | notmuch_search_sanitize > EXPECTED +notmuch search --query=sexp '(path (rx ^bar$))' | notmuch_search_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "regexp 'property' search" +notmuch search property:foo=bar > EXPECTED +notmuch search --query=sexp '(property (rx foo=.*))' > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "anchored 'tag' search" +notmuch search tag:signed > EXPECTED +notmuch search --query=sexp '(tag (rx ^si))' > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest "regexp 'thread' search" +notmuch search --output=threads '*' | grep '7$' > EXPECTED +notmuch search --output=threads --query=sexp '(thread (rx 7$))' > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + test_done -- 2.30.2