From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Cc: David Bremner <david@tethera.net>
Subject: [PATCH 07/11] lib/parse-sexp: split terms in phrase mode
Date: Tue, 13 Jul 2021 21:02:35 -0300
Message-ID: <20210714000239.804384-8-david@tethera.net>
In-Reply-To: <20210714000239.804384-1-david@tethera.net>
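
In phrase mode, split each atom into words at non-word characters and
lowercase each word before adding the prefix. The goal is to have
(subject foo-bar) match the same messages as subject:foo-bar.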
---
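Not part of the commit: a minimal standalone sketch of the splitting
loop, for readers who want to try the idea without linking Xapian. It
is an approximation under stated assumptions. The names
is_wordchar_ascii and split_phrase_terms and the prefix "P" are
hypothetical, and the ASCII-only test via std::isalnum stands in for
Xapian::Unicode::is_wordchar, so non-ASCII bytes are treated as
separators here, unlike in the patch.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical ASCII-only stand-in for Xapian::Unicode::is_wordchar.
// Assumption: only ASCII letters and digits count as word characters.
static bool
is_wordchar_ascii (unsigned char c)
{
    return std::isalnum (c) != 0;
}

// Split `text` into maximal runs of word characters, lowercase each
// run, and prepend `prefix`, mirroring the structure of the loop in
// the patch (skip separators, mark start, consume word, emit term).
static std::vector<std::string>
split_phrase_terms (const std::string &prefix, const std::string &text)
{
    std::vector<std::string> terms;
    const std::size_t n = text.size ();
    std::size_t i = 0;

    while (i < n) {
        // Skip any run of non-word characters.
        while (i < n && ! is_wordchar_ascii (text[i]))
            i++;

        if (i == n)
            break;

        const std::size_t start = i;

        // Consume one word.
        while (i < n && is_wordchar_ascii (text[i]))
            i++;

        std::string word = text.substr (start, i - start);
        for (char &ch : word)
            ch = static_cast<char> (std::tolower (static_cast<unsigned char> (ch)));

        terms.push_back (prefix + word);
    }

    return terms;
}
```

With this sketch, split_phrase_terms ("P", "foo-bar") and
split_phrase_terms ("P", "foo bar") yield the same two terms, which
is the property the new tests check for the real parser.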
lib/parse-sexp.cc | 28 ++++++++++++++++++++++++----
test/T081-sexpr-search.sh | 8 ++++++++
2 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc
index 4a2fac8b..26d4ee1f 100644
--- a/lib/parse-sexp.cc
+++ b/lib/parse-sexp.cc
@@ -66,13 +66,33 @@ _sexp_combine_field (const char *prefix,
for (sexp_t *cur = sx; cur; cur = cur->next) {
std::string pref_str = prefix;
- std::string word = cur->val;
- if (operation == Xapian::Query::OP_PHRASE)
- word = Xapian::Unicode::tolower (word);
+ if (operation == Xapian::Query::OP_PHRASE) {
+ Xapian::Utf8Iterator p (cur->val);
+ Xapian::Utf8Iterator end;
+ while (p != end) {
+ Xapian::Utf8Iterator start;
+ while (p != end && ! Xapian::Unicode::is_wordchar (*p))
+ p++;
- terms.push_back (pref_str + word);
+ if (p == end)
+ break;
+
+ start = p;
+
+ while (p != end && Xapian::Unicode::is_wordchar (*p))
+ p++;
+
+ if (p != start) {
+ std::string word (start, p);
+ word = Xapian::Unicode::tolower (word);
+ terms.push_back (pref_str + word);
+ }
+ }
+ } else {
+ terms.push_back (pref_str + cur->val);
+ }
}
return Xapian::Query (operation, terms.begin (), terms.end ());
}
diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh
index 1a80a133..6369e483 100755
--- a/test/T081-sexpr-search.sh
+++ b/test/T081-sexpr-search.sh
@@ -34,4 +34,12 @@ add_message [subject]=utf8-sübjéct '[date]="Sat, 01 Jan 2000 12:00:00 -0000"'
output=$(notmuch search --query-syntax=sexp '(subject utf8 sübjéct)' | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)"
+test_begin_subtest "Search by 'subject' (utf-8, phrase-token):"
+output=$(notmuch search --query-syntax=sexp '(subject utf8-sübjéct)' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)"
+
+test_begin_subtest "Search by 'subject' (utf-8, quoted string):"
+output=$(notmuch search --query-syntax=sexp '(subject "utf8 sübjéct")' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; utf8-sübjéct (inbox unread)"
+
test_done
--
2.30.2

Thread overview: 14+ messages
2021-07-14 0:02 Early preview of s-expression based query parser David Bremner
2021-07-14 0:02 ` [PATCH 01/11] configure: optional library sfsexp David Bremner
2021-07-14 0:02 ` [PATCH 02/11] lib: split notmuch_query_create David Bremner
2021-07-14 0:02 ` [PATCH 03/11] lib: define notmuch_query_create_sexpr David Bremner
2021-07-14 0:02 ` [PATCH 04/11] CLI/search+address: support sexpr queries David Bremner
2021-07-14 0:02 ` [PATCH 05/11] lib/parse-sexp: parse 'and', 'not', 'or' David Bremner
2021-07-14 0:02 ` [PATCH 06/11] lib/parse-sexp: parse 'subject' David Bremner
2021-07-14 0:02 ` David Bremner [this message]
2021-07-14 0:02 ` [PATCH 08/11] lib/parse-sexp: handle most fields David Bremner
2021-07-14 0:02 ` [PATCH 09/11] lib/parse-sexp: add error handling to internal API David Bremner
2021-07-14 0:02 ` [PATCH 10/11] lib/parse-sexp: add keyword arguments for fields David Bremner
2021-07-14 0:02 ` [PATCH 11/11] lib/parse-sexp: initial support for wildcard queries David Bremner
2021-07-16 14:00 ` Early preview of s-expression based query parser Hannu Hartikainen
2021-07-18 19:43 ` David Bremner