unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Cc: David Bremner <david@tethera.net>
Subject: [PATCH 08/25] lib/parse-sexp: split terms in phrase mode
Date: Sat, 17 Jul 2021 23:40:04 -0300
Message-ID: <20210718024021.3850340-9-david@tethera.net>
In-Reply-To: <20210718024021.3850340-1-david@tethera.net>

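The goal is to have (subject foo-bar) match the same messages as
subject:foo-bar. In phrase mode, split each atom into words at
non-word characters, then lowercase and prefix each word.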
---
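Note: the word splitting done by _sexp_find_words in this patch can be
sketched without Xapian roughly as follows. This is only an
approximation under stated assumptions: is_wordchar_approx and
find_words_approx are hypothetical names, the ASCII test stands in for
Xapian::Unicode::is_wordchar, only ASCII letters are lowercased, and
"XSUBJECT" below is a placeholder prefix chosen for illustration.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Assumption: a rough stand-in for Xapian::Unicode::is_wordchar.
// ASCII letters, digits and '_' are treated as word characters, and so
// is every byte of a multi-byte UTF-8 sequence, so a word such as
// "sübjéct" is kept in one piece.
static bool
is_wordchar_approx (unsigned char c)
{
    return c >= 0x80 || std::isalnum (c) || c == '_';
}

// Split str into runs of word characters, lowercase the ASCII letters
// of each run, and append prefix + word to terms.
static void
find_words_approx (const std::string &str, const std::string &prefix,
                   std::vector<std::string> &terms)
{
    std::size_t i = 0;

    while (i < str.size ()) {
        // Skip separators such as '-' or ' '.
        while (i < str.size () && ! is_wordchar_approx (str[i]))
            i++;

        std::size_t start = i;

        // Consume one run of word characters.
        while (i < str.size () && is_wordchar_approx (str[i]))
            i++;

        if (i > start) {
            std::string word = str.substr (start, i - start);
            for (char &ch : word) {
                unsigned char u = static_cast<unsigned char> (ch);
                // Non-ASCII bytes are left untouched in this sketch.
                if (u < 0x80)
                    ch = static_cast<char> (std::tolower (u));
            }
            terms.push_back (prefix + word);
        }
    }
}
```

With these definitions, calling find_words_approx ("Foo-Bar",
"XSUBJECT", terms) appends "XSUBJECTfoo" and "XSUBJECTbar" to terms,
which is the shape of term list the patch hands to OP_PHRASE.
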
 lib/parse-sexp.cc         | 38 +++++++++++++++++++++++++++++++++-----
 test/T081-sexpr-search.sh |  8 ++++++++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
index 898cfdd0..fc6eb2d7 100644
--- a/lib/parse-sexp.cc
+++ b/lib/parse-sexp.cc
@@ -72,6 +72,34 @@ _notmuch_sexp_string_to_xapian_query (notmuch_database_t *notmuch, const char *q
     return _sexp_to_xapian_query (notmuch, sx, output);
 }
 
+static void
+_sexp_find_words (const char *str, std::string pref_str, std::vector<std::string> &terms)
+{
+    Xapian::Utf8Iterator p (str);
+    Xapian::Utf8Iterator end;
+
+    while (p != end) {
+	Xapian::Utf8Iterator start;
+	while (p != end && ! Xapian::Unicode::is_wordchar (*p))
+	    p++;
+
+	if (p == end)
+	    break;
+
+	start = p;
+
+	while (p != end && Xapian::Unicode::is_wordchar (*p))
+	    p++;
+
+	if (p != start) {
+	    std::string word (start, p);
+	    word = Xapian::Unicode::tolower (word);
+	    terms.push_back (pref_str + word);
+	}
+    }
+
+}
+
 static notmuch_status_t
 _sexp_combine_field (const char *prefix,
 		     Xapian::Query::op operation,
@@ -82,12 +110,12 @@ _sexp_combine_field (const char *prefix,
 
     for (const sexp_t *cur = sx; cur; cur = cur->next) {
 	std::string pref_str = prefix;
-	std::string word = cur->val;
 
-	if (operation == Xapian::Query::OP_PHRASE)
-	    word = Xapian::Unicode::tolower (word);
-
-	terms.push_back (pref_str + word);
+	if (operation == Xapian::Query::OP_PHRASE) {
+	    _sexp_find_words (cur->val, pref_str, terms);
+	} else {
+	    terms.push_back (pref_str + cur->val);
+	}
     }
     output = Xapian::Query (operation, terms.begin (), terms.end ());
     return NOTMUCH_STATUS_SUCCESS;
diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh
index 872f2603..8e042f88 100755
--- a/test/T081-sexpr-search.sh
+++ b/test/T081-sexpr-search.sh
@@ -34,6 +34,14 @@ add_message [subject]=utf8-sübjéct '[date]="Sat, 01 Jan 2000 12:00:00 -0000"'
 output=$(notmuch search --query-syntax=sexp '(subject utf8 sübjéct)' | notmuch_search_sanitize)
 test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)"
 
+test_begin_subtest "Search by 'subject' (utf-8, phrase-token):"
+output=$(notmuch search --query-syntax=sexp '(subject utf8-sübjéct)' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)"
+
+test_begin_subtest "Search by 'subject' (utf-8, quoted string):"
+output=$(notmuch search --query-syntax=sexp '(subject "utf8 sübjéct")' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)"
+
 test_begin_subtest "Unbalanced parens"
 # A code 1 indicates the error was handled (a crash will return e.g. 139).
 test_expect_code 1 "notmuch search --query-syntax=sexp '('"
-- 
2.30.2
