From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Bremner
To: notmuch@notmuchmail.org
Cc: David Bremner
Subject: [PATCH 20/36] lib/parse-sexp: support regular expressions
Date: Tue, 24 Aug 2021 08:17:29 -0700
Message-Id: <20210824151745.2941868-21-david@tethera.net>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210824151745.2941868-1-david@tethera.net>
References: <20210824151745.2941868-1-david@tethera.net>
MIME-Version: 1.0
List-Id: "Use and development of the notmuch mail system."
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Regular expressions are supported at least to the degree that the
Xapian QueryParser based parser also supports them. Also support the
short alias 'rx', as it seems to make more complex queries nicer to
read.
---
 doc/man7/notmuch-sexp-queries.rst |  8 ++++
 lib/parse-sexp.cc                 | 54 ++++++++++++++++++-----
 test/T081-sexpr-search.sh         | 72 +++++++++++++++++++++++++++++++
 3 files changed, 124 insertions(+), 10 deletions(-)

diff --git a/doc/man7/notmuch-sexp-queries.rst b/doc/man7/notmuch-sexp-queries.rst
index f32bab9c..7eaffe56 100644
--- a/doc/man7/notmuch-sexp-queries.rst
+++ b/doc/man7/notmuch-sexp-queries.rst
@@ -144,6 +144,11 @@ MODIFIERS
 *Modifiers* refer to any prefixes (first elements of compound queries)
 that are neither operators nor fields.
 
+``(regex`` *atom* ``)`` ``(rx`` *atom* ``)``
+    Interpret *atom* as a POSIX.2 regular expression (see
+    :manpage:`regex(7)`). This applies in term fields and a subset [#not-phrase]_ of
+    phrase fields (see :any:`field-table`).
+
 ``(starts-with`` *subword* ``)``
     Matches any term starting with *subword*. This applies in either
     phrase or term :any:`fields `, or outside of fields [#not-body]_. Note that
@@ -205,6 +210,9 @@ NOTES
 
 .. [#aka-bool] a.k.a. boolean prefixes
 
+.. [#not-phrase] Due to the implementation of phrase fields in Xapian,
+   regex queries could only match individual words.
+
 .. [#not-body] Due the the way ``body`` is implemented in notmuch,
    this modifier is not supported in the ``body`` field.
 
diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
index 0192bda9..84914296 100644
--- a/lib/parse-sexp.cc
+++ b/lib/parse-sexp.cc
@@ -13,6 +13,8 @@ typedef enum {
     SEXP_FLAG_BOOLEAN = 1 << 1,
     SEXP_FLAG_SINGLE = 1 << 2,
     SEXP_FLAG_WILDCARD = 1 << 3,
+    SEXP_FLAG_REGEX = 1 << 4,
+    SEXP_FLAG_DO_REGEX = 1 << 5,
 } _sexp_flag_t;
 
 /*
@@ -48,15 +50,15 @@ static _sexp_prefix_t prefixes[] =
     { "body", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
       SEXP_FLAG_FIELD },
     { "from", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
-      SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "folder", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "id", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "is", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "mid", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "mimetype", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
       SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD },
     { "not", Xapian::Query::OP_AND_NOT, Xapian::Query::MatchAll,
@@ -64,17 +66,21 @@ static _sexp_prefix_t prefixes[] =
     { "or", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
       SEXP_FLAG_NONE },
     { "path", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "property", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
+    { "regex", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll,
+      SEXP_FLAG_SINGLE | SEXP_FLAG_DO_REGEX },
+    { "rx", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll,
+      SEXP_FLAG_SINGLE | SEXP_FLAG_DO_REGEX },
     { "starts-with", Xapian::Query::OP_WILDCARD, Xapian::Query::MatchAll,
       SEXP_FLAG_SINGLE },
     { "subject", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
-      SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "tag", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "thread", Xapian::Query::OP_OR, Xapian::Query::MatchNothing,
-      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD },
+      SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX },
     { "to", Xapian::Query::OP_AND, Xapian::Query::MatchAll,
       SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD },
     { }
@@ -180,6 +186,30 @@ _sexp_parse_one_term (notmuch_database_t *notmuch, std::string term_prefix, cons
     }
 }
 
+
+notmuch_status_t
+_sexp_parse_regex (notmuch_database_t *notmuch,
+                   const _sexp_prefix_t *prefix, const _sexp_prefix_t *parent,
+                   std::string val, Xapian::Query &output)
+{
+    if (! parent) {
+        _notmuch_database_log (notmuch, "illegal '%s' outside field\n",
+                               prefix->name);
+        return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+    }
+
+    if (! (parent->flags & SEXP_FLAG_REGEX)) {
+        _notmuch_database_log (notmuch, "'%s' not supported in field '%s'\n",
+                               prefix->name, parent->name);
+        return NOTMUCH_STATUS_BAD_QUERY_SYNTAX;
+    }
+
+    std::string msg; /* ignored */
+
+    return _notmuch_regexp_to_query (notmuch, Xapian::BAD_VALUENO, parent->name,
+                                     val, output, msg);
+}
+
 /* Here we expect the s-expression to be a proper list, with first
  * element defining and operation, or as a special case the empty
  * list */
@@ -254,6 +284,10 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent
         if (prefix->xapian_op == Xapian::Query::OP_WILDCARD)
             return _sexp_parse_wildcard (notmuch, parent, sx->list->next->val, output);
 
+        if (prefix->flags & SEXP_FLAG_DO_REGEX) {
+            return _sexp_parse_regex (notmuch, prefix, parent, sx->list->next->val, output);
+        }
+
         return _sexp_combine_query (notmuch, parent, prefix->xapian_op, prefix->initial, sx->list->next,
                                     output);
     }
diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh
index be243fc0..6cfd59a8 100755
--- a/test/T081-sexpr-search.sh
+++ b/test/T081-sexpr-search.sh
@@ -565,4 +565,76 @@ output=$(notmuch search --query=sexp '(subject deleted)' | notmuch_search_saniti
 test_expect_equal "$output" "thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; Not deleted (inbox unread)
 thread:XXX 2001-01-05 [2/2] Notmuch Test Suite; Deleted (deleted inbox unread)"
 
+test_begin_subtest "regex at top level"
+notmuch search --query=sexp '(rx foo)' >& OUTPUT
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+illegal 'rx' outside field
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "regex in illegal field"
+notmuch search --query=sexp '(body (regex foo))' >& OUTPUT
+cat <<EOF > EXPECTED
+notmuch search: Syntax error in query
+'regex' not supported in field 'body'
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+notmuch search --output=messages from:cworth > cworth.msg-ids
+
+test_begin_subtest "regexp 'from' search"
+notmuch search --output=messages --query=sexp '(from (rx cworth))' > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "regexp search for 'from' 2"
+notmuch search from:/cworth@cworth.org/ and subject:patch | notmuch_search_sanitize > EXPECTED
+notmuch search --query=sexp '(and (from (rx cworth@cworth.org)) (subject patch))' \
+    | notmuch_search_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "regexp 'folder' search"
+notmuch search 'folder:/^bar$/' | notmuch_search_sanitize > EXPECTED
+notmuch search --query=sexp '(folder (rx ^bar$))' | notmuch_search_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "regexp 'id' search"
+notmuch search --output=messages --query=sexp '(id (rx yoom))' > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "unanchored 'is' search"
+notmuch search tag:signed or tag:inbox > EXPECTED
+notmuch search --query=sexp '(is (rx i))' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "anchored 'is' search"
+notmuch search tag:signed > EXPECTED
+notmuch search --query=sexp '(is (rx ^si))' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "combine regexp mid and subject"
+notmuch search subject:/-C/ and mid:/y..m/ | notmuch_search_sanitize > EXPECTED
+notmuch search --query=sexp '(and (subject (rx -C)) (mid (rx y..m)))' | notmuch_search_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "regexp 'path' search"
+notmuch search 'path:/^bar$/' | notmuch_search_sanitize > EXPECTED
+notmuch search --query=sexp '(path (rx ^bar$))' | notmuch_search_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "regexp 'property' search"
+notmuch search property:foo=bar > EXPECTED
+notmuch search --query=sexp '(property (rx foo=.*))' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "anchored 'tag' search"
+notmuch search tag:signed > EXPECTED
+notmuch search --query=sexp '(tag (rx ^si))' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "regexp 'thread' search"
+notmuch search --output=threads '*' | grep '7$' > EXPECTED
+notmuch search --output=threads --query=sexp '(thread (rx 7$))' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
-- 
2.32.0
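
Reviewer's note on the field gating in _sexp_parse_regex: the check can be sketched in isolation from Xapian and notmuch. What follows is a minimal standalone illustration under stated assumptions, not the code from this patch. The `prefix_t` struct, the `find_prefix` and `check_regex` helpers, and the three-entry table are hypothetical stand-ins; the value of `FLAG_FIELD` is assumed; and the real function goes on to call `_notmuch_regexp_to_query` instead of returning a string.

```cpp
#include <cstring>
#include <string>

// REGEX and DO_REGEX values mirror the patch; FLAG_FIELD's value is an assumption.
enum sexp_flag {
    FLAG_NONE = 0,
    FLAG_FIELD = 1 << 0,
    FLAG_REGEX = 1 << 4,    // set on fields that accept a regex
    FLAG_DO_REGEX = 1 << 5, // set on the regex/rx modifiers themselves
};

// Hypothetical, cut-down stand-in for _sexp_prefix_t.
struct prefix_t {
    const char *name;
    unsigned flags;
};

// Tiny illustrative table: one field without regex support, one with, and the modifier.
static const prefix_t prefixes[] = {
    { "body", FLAG_FIELD },
    { "from", FLAG_FIELD | FLAG_REGEX },
    { "rx", FLAG_DO_REGEX },
};

// Look a prefix up by name; returns nullptr when it is not in the table.
static const prefix_t *find_prefix (const char *name)
{
    for (const prefix_t &p : prefixes)
        if (std::strcmp (p.name, name) == 0)
            return &p;
    return nullptr;
}

// Returns an empty string when the modifier is accepted, otherwise the
// error text (same wording as the log messages in the patch).
static std::string check_regex (const prefix_t *prefix, const prefix_t *parent)
{
    if (! parent)
        return std::string ("illegal '") + prefix->name + "' outside field";

    if (! (parent->flags & FLAG_REGEX))
        return std::string ("'") + prefix->name + "' not supported in field '"
               + parent->name + "'";

    return "";
}
```

With this table, passing no parent yields the "outside field" message, passing `body` as the parent yields the "not supported in field" message, and passing `from` is accepted. The first two cases correspond to the "regex at top level" and "regex in illegal field" subtests, with `rx` standing in for `regex` in the second.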