unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* fixes for regexp field processors
@ 2017-03-18  3:02 David Bremner
  2017-03-18  3:03 ` [PATCH 1/4] test: add known broken test for empty from: and subject: query David Bremner
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: David Bremner @ 2017-03-18  3:02 UTC (permalink / raw)
  To: notmuch

This series fixes two bugs introduces in notmuch 0.24

1) queries of the form  'from:foo*' and 'subject:bar*' no longer work (the * is just silently dropped)
2) queries of the form 'from:""' and 'subject:""' crashes with an uncaught exception.

Unless there objections, I'll probably roll these patches (along with
perhaps the two memory fixes recently posted) into a bugfix release
0.24.1. Tentative release date early next week.

d

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] test: add known broken test for empty from: and subject: query
  2017-03-18  3:02 fixes for regexp field processors David Bremner
@ 2017-03-18  3:03 ` David Bremner
  2017-03-18  3:03 ` [PATCH 2/4] lib: handle empty string in regexp field processors David Bremner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: David Bremner @ 2017-03-18  3:03 UTC (permalink / raw)
  To: notmuch

These queries currently fail with field processors enabled because the
code expects a non-empty string.
---
 test/T650-regexp-query.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index df48ab82..b1d84439 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -11,6 +11,15 @@ fi
 
 notmuch search --output=messages from:cworth > cworth.msg-ids
 
+test_begin_subtest "empty from: search"
+test_subtest_known_broken
+notmuch search --output=messages 'from:""' and from:cworth > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "empty subject: search"
+test_subtest_known_broken
+notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
 test_begin_subtest "regexp from search, case sensitive"
 notmuch search --output=messages from:/carl/ > OUTPUT
 test_expect_equal_file /dev/null OUTPUT
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] lib: handle empty string in regexp field processors
  2017-03-18  3:02 fixes for regexp field processors David Bremner
  2017-03-18  3:03 ` [PATCH 1/4] test: add known broken test for empty from: and subject: query David Bremner
@ 2017-03-18  3:03 ` David Bremner
  2017-03-24  2:07   ` David Bremner
  2017-03-18  3:03 ` [PATCH 3/4] test: add known broken tests wildcard search in from and subject David Bremner
  2017-03-18  3:03 ` [PATCH 4/4] lib: only trigger phrase processing for regexp fields when needed David Bremner
  3 siblings, 1 reply; 7+ messages in thread
From: David Bremner @ 2017-03-18  3:03 UTC (permalink / raw)
  To: notmuch

The non-field processor behaviour is is convert the corresponding
queries into a search for the unprefixed terms. This yields pretty
surprising results so decided to match any message.
---
 lib/regexp-fields.cc      | 3 +++
 test/T650-regexp-query.sh | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 8e740a81..42239a66 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -148,6 +148,9 @@ RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix, Xapian::QueryPar
 Xapian::Query
 RegexpFieldProcessor::operator() (const std::string & str)
 {
+    if (str.size () == 0)
+	return Xapian::Query::MatchAll;
+
     if (str.at (0) == '/') {
 	if (str.at (str.size () - 1) == '/'){
 	    RegexpPostingSource *postings = new RegexpPostingSource (slot, str.substr(1,str.size () - 2));
diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index b1d84439..049477b4 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -12,12 +12,10 @@ fi
 notmuch search --output=messages from:cworth > cworth.msg-ids
 
 test_begin_subtest "empty from: search"
-test_subtest_known_broken
 notmuch search --output=messages 'from:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 
 test_begin_subtest "empty subject: search"
-test_subtest_known_broken
 notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 test_begin_subtest "regexp from search, case sensitive"
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] test: add known broken tests wildcard search in from and subject
  2017-03-18  3:02 fixes for regexp field processors David Bremner
  2017-03-18  3:03 ` [PATCH 1/4] test: add known broken test for empty from: and subject: query David Bremner
  2017-03-18  3:03 ` [PATCH 2/4] lib: handle empty string in regexp field processors David Bremner
@ 2017-03-18  3:03 ` David Bremner
  2017-03-25 14:57   ` David Bremner
  2017-03-18  3:03 ` [PATCH 4/4] lib: only trigger phrase processing for regexp fields when needed David Bremner
  3 siblings, 1 reply; 7+ messages in thread
From: David Bremner @ 2017-03-18  3:03 UTC (permalink / raw)
  To: notmuch

This was broken by the addition of regexp searching. The detection of
wildcards is not currently done in the recursive call to parse_query,
because of quoting issues.
---
 test/T650-regexp-query.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index 049477b4..ba4a64e0 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -18,6 +18,16 @@ test_expect_equal_file cworth.msg-ids OUTPUT
 test_begin_subtest "empty subject: search"
 notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "xapian wildcard search for from:"
+test_subtest_known_broken
+notmuch search --output=messages 'from:cwo*' > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "xapian wildcard search for subject:"
+test_subtest_known_broken
+test_expect_equal $(notmuch count 'subject:count*') 1
+
 test_begin_subtest "regexp from search, case sensitive"
 notmuch search --output=messages from:/carl/ > OUTPUT
 test_expect_equal_file /dev/null OUTPUT
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] lib: only trigger phrase processing for regexp fields when needed
  2017-03-18  3:02 fixes for regexp field processors David Bremner
                   ` (2 preceding siblings ...)
  2017-03-18  3:03 ` [PATCH 3/4] test: add known broken tests wildcard search in from and subject David Bremner
@ 2017-03-18  3:03 ` David Bremner
  3 siblings, 0 replies; 7+ messages in thread
From: David Bremner @ 2017-03-18  3:03 UTC (permalink / raw)
  To: notmuch

The argument is that if the string passed to the field processor has
no spaces, then the added quotes won't have any benefit except for
disabling wildcards. But disabling wildcards doesn't seem very useful
in the normal Xapian query parser, since they're stripped before
generating terms anyway. It does mean that the query 'from:"foo*"' will
not be precisely equivalent to 'from:foo' as it is for the non
field-processor version.
---
 lib/regexp-fields.cc      | 10 ++++++++--
 test/T650-regexp-query.sh |  2 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 42239a66..08c6ccb5 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -161,8 +161,14 @@ RegexpFieldProcessor::operator() (const std::string & str)
     } else {
 	/* TODO replace this with a nicer API level triggering of
 	 * phrase parsing, when possible */
-	std::string quoted='"' + str + '"';
-	return parser.parse_query (quoted, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
+	std::string query_str;
+
+	if (str.find (' ') != std::string::npos)
+	    query_str = '"' + str + '"';
+	else
+	    query_str = str;
+
+	return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
     }
 }
 #endif
diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index ba4a64e0..c2b99d1b 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -20,12 +20,10 @@ notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 
 test_begin_subtest "xapian wildcard search for from:"
-test_subtest_known_broken
 notmuch search --output=messages 'from:cwo*' > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 
 test_begin_subtest "xapian wildcard search for subject:"
-test_subtest_known_broken
 test_expect_equal $(notmuch count 'subject:count*') 1
 
 test_begin_subtest "regexp from search, case sensitive"
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/4] lib: handle empty string in regexp field processors
  2017-03-18  3:03 ` [PATCH 2/4] lib: handle empty string in regexp field processors David Bremner
@ 2017-03-24  2:07   ` David Bremner
  0 siblings, 0 replies; 7+ messages in thread
From: David Bremner @ 2017-03-24  2:07 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> The non-field processor behaviour is is convert the corresponding
> queries into a search for the unprefixed terms. This yields pretty
> surprising results so decided to match any message.
> ---
>  lib/regexp-fields.cc      | 3 +++
>  test/T650-regexp-query.sh | 2 --
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
> index 8e740a81..42239a66 100644
> --- a/lib/regexp-fields.cc
> +++ b/lib/regexp-fields.cc
> @@ -148,6 +148,9 @@ RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix, Xapian::QueryPar
>  Xapian::Query
>  RegexpFieldProcessor::operator() (const std::string & str)
>  {
> +    if (str.size () == 0)
> +	return Xapian::Query::MatchAll;
> +

For things like file:, it actually makes more sense to return
Xapian::Query(term_prefix). I'm leaning towards something like the following

    if (str.size () == 0) {
       if (options & NOTMUCH_FIELD_PROBABILISTIC)
           return Xapian::Query::MatchAll;
       else
           return Xapian::Query(term_prefix);
    }
 

but this patch can stay as is for now, as there are not yet any boolean
fields being matched (mid would be the first).

I thought a bit about unconditionalling returning Xapian::Query
(term_prefix), but I think for the probabilistic (term based) fields,
we'll never add terms for only the prefix, i.e. and empty subject would
just not add any XSUBJECT terms. So that would effectively match no
messages.

d

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/4] test: add known broken tests wildcard search in from and subject
  2017-03-18  3:03 ` [PATCH 3/4] test: add known broken tests wildcard search in from and subject David Bremner
@ 2017-03-25 14:57   ` David Bremner
  0 siblings, 0 replies; 7+ messages in thread
From: David Bremner @ 2017-03-25 14:57 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> This was broken by the addition of regexp searching. The detection of
> wildcards is not currently done in the recursive call to parse_query,
> because of quoting issues.

Patches 3 and 4 pushed to release and master.

d

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-25 14:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-18  3:02 fixes for regexp field processors David Bremner
2017-03-18  3:03 ` [PATCH 1/4] test: add known broken test for empty from: and subject: query David Bremner
2017-03-18  3:03 ` [PATCH 2/4] lib: handle empty string in regexp field processors David Bremner
2017-03-24  2:07   ` David Bremner
2017-03-18  3:03 ` [PATCH 3/4] test: add known broken tests wildcard search in from and subject David Bremner
2017-03-25 14:57   ` David Bremner
2017-03-18  3:03 ` [PATCH 4/4] lib: only trigger phrase processing for regexp fields when needed David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).