From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Bremner
To: notmuch@notmuchmail.org
Cc: David Bremner
Subject: [PATCH 08/27] lib/parse-sexp: stem unquoted atoms
Date: Fri, 30 Jul 2021 09:55:48 -0300
Message-Id: <20210730125607.2165433-9-david@tethera.net>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210730125607.2165433-1-david@tethera.net>
References: <20210730125607.2165433-1-david@tethera.net>
MIME-Version: 1.0
List-Id: "Use and development of the notmuch mail system."
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This is somewhat less DWIM than the Xapian query parser, but it has
the advantage of simplicity.
---
 doc/man7/notmuch-sexp-queries.rst | 10 ++++++++--
 lib/parse-sexp.cc                 | 10 +++++++---
 test/T081-sexpr-search.sh         |  7 +++++--
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/doc/man7/notmuch-sexp-queries.rst b/doc/man7/notmuch-sexp-queries.rst
index e530912c..8a3bcd8b 100644
--- a/doc/man7/notmuch-sexp-queries.rst
+++ b/doc/man7/notmuch-sexp-queries.rst
@@ -41,8 +41,10 @@ subqueries.
     The empty list matches all messages
 
 *term*
-    Match all messages containing *term*, possibly after stemming
-    or phase splitting.
+    Match all messages containing *term*, possibly after
+    stemming or phase splitting. For discussion of stemming in
+    notmuch see :any:`notmuch-search-terms(7)`. Stemming only applies
+    to unquoted terms (basic values) in s-expression queries.
 
 ``(`` *field* |q1| |q2| ... |qn| ``)``
     Restrict the queries |q1| to |qn| to *field*, and combine with *and*
@@ -76,6 +78,10 @@ EXAMPLES
 ``Wizard``
     Match all messages containing the word "wizard", ignoring case.
 
+``added``
+    Match all messages containing "added", but also those containing "add", "additional",
+    "Additional", "adds", etc... via stemming.
+
 .. |q1| replace:: :math:`q_1`
 .. |q2| replace:: :math:`q_2`
 .. |qn| replace:: :math:`q_n`
diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
index 1ce3c9d4..1be5e209 100644
--- a/lib/parse-sexp.cc
+++ b/lib/parse-sexp.cc
@@ -1,5 +1,4 @@
-#include
-#include "notmuch-private.h"
+#include "database-private.h"
 #include "sexp.h"
 
 #if HAVE_SFSEXP
@@ -17,7 +16,12 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const sexp_t *sx,
 {
 
     if (sx->ty == SEXP_VALUE) {
-        output = Xapian::Query (Xapian::Unicode::tolower (sx->val));
+        std::string term = Xapian::Unicode::tolower (sx->val);
+        Xapian::Stem stem = *(notmuch->stemmer);
+        if (sx->aty == SEXP_BASIC)
+            term = "Z" + stem (term);
+
+        output = Xapian::Query (term);
         return NOTMUCH_STATUS_SUCCESS;
     }
 
diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh
index 3ee9f71d..c5c3cf6b 100755
--- a/test/T081-sexpr-search.sh
+++ b/test/T081-sexpr-search.sh
@@ -22,18 +22,21 @@ EOF
 test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest "single term in body (case insensitive)"
-notmuch search --query-syntax=sexp 'Wizard' | notmuch_search_sanitize>OUTPUT
+notmuch search --query-syntax=sexp '"Wizard"' | notmuch_search_sanitize>OUTPUT
 cat < EXPECTED
 thread:XXX 2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread)
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest "single term in body, stemmed version"
-test_subtest_known_broken
 notmuch search arriv > EXPECTED
 notmuch search --query-syntax=sexp arriv > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest "single term in body, unstemmed version"
+notmuch search --query-syntax=sexp '"arriv"' > OUTPUT
+test_expect_equal_file /dev/null OUTPUT
+
 test_begin_subtest "Unbalanced parens"
 # A code 1 indicates the error was handled (a crash will return e.g. 139).
 test_expect_code 1 "notmuch search --query-syntax=sexp '('"
-- 
2.30.2
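
The term-building rule in the lib/parse-sexp.cc hunk can be read in isolation as: lowercase the atom, then, only if the atom was unquoted, stem it and prepend "Z". The sketch below illustrates just that rule without linking against Xapian. Everything in it is an assumption made for illustration: AtomType, toy_stem, ascii_lower and make_term are hypothetical names that do not exist in notmuch, toy_stem is a crude suffix stripper standing in for Xapian::Stem (it is not the Snowball algorithm), and ascii_lower handles ASCII only, unlike Xapian::Unicode::tolower.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>

// Hypothetical stand-in for the atom types the patch tests via sx->aty.
enum class AtomType { Basic, DoubleQuoted };

// Toy stemmer, assumed for illustration only: strips one common English
// suffix. Xapian::Stem in the real code uses a proper stemming algorithm.
std::string toy_stem (const std::string &word)
{
    static const std::string suffixes[] = { "ing", "ed", "s" };
    for (const std::string &suffix : suffixes) {
        if (word.size () > suffix.size () + 2 &&
            word.compare (word.size () - suffix.size (), suffix.size (), suffix) == 0)
            return word.substr (0, word.size () - suffix.size ());
    }
    return word;
}

// ASCII-only lowercasing, standing in for Xapian::Unicode::tolower.
std::string ascii_lower (std::string s)
{
    for (char &c : s)
        c = static_cast<char> (std::tolower (static_cast<unsigned char> (c)));
    return s;
}

// Mirrors the shape of the patched branch: quoted atoms are searched
// verbatim (after lowercasing); unquoted atoms become "Z" + stem.
std::string make_term (const std::string &value, AtomType type,
                       const std::function<std::string (const std::string &)> &stem)
{
    std::string term = ascii_lower (value);
    if (type == AtomType::Basic)
        term = "Z" + stem (term);
    return term;
}

// Small self-check of the two cases the patch distinguishes.
void check_examples ()
{
    assert (make_term ("Added", AtomType::Basic, toy_stem) == "Zadd");
    assert (make_term ("Wizard", AtomType::DoubleQuoted, toy_stem) == "wizard");
}
```

Under these assumptions an unquoted "Added" and an unquoted "adds" map to the same stemmed term, while a quoted atom is only lowercased. That is consistent with the test change above, where '"arriv"' is expected to match nothing although the unquoted arriv matches.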