unofficial mirror of notmuch@notmuchmail.org
* xapian parser bug?
@ 2018-09-29 22:09 David Bremner
  2018-09-30  8:50 ` James Aylett
  0 siblings, 1 reply; 7+ messages in thread
From: David Bremner @ 2018-09-29 22:09 UTC (permalink / raw)
  To: xapian-discuss; +Cc: notmuch


Today we noticed that query operator keywords can't be searched for as
prefixed terms. Or that's what it looks like anyway. I tested "and",
"or", and "not".

╰─% NOTMUCH_DEBUG_QUERY=y notmuch search 'subject:"and"'
Query string is:
subject:"and"
notmuch search: A Xapian exception occurred
A Xapian exception occurred parsing query: Syntax: <expression> AND <expression>
Query string was: subject:"and"

╰─% NOTMUCH_DEBUG_QUERY=y notmuch search 'subject:"or"' 
Query string is:
subject:"or"
notmuch search: A Xapian exception occurred
A Xapian exception occurred parsing query: Syntax: <expression> OR <expression>
Query string was: subject:"or"

╰─% NOTMUCH_DEBUG_QUERY=y notmuch search 'subject:"not"'
Query string is:
subject:"not"
notmuch search: A Xapian exception occurred
A Xapian exception occurred parsing query: Syntax: <expression> NOT <expression>
Query string was: subject:"not"

Interestingly, putting spaces around the operator seems to be a
workaround. Perhaps that turns on phrase parsing?

╰─% NOTMUCH_DEBUG_QUERY=y notmuch count 'subject:" not "'
Query string is:
subject:" not "
Exclude query is:
Query((((Kspam OR Kdeleted) OR Kmuted) OR Kbad-address))
Final query is:
Query(((Tmail AND 0 * XSUBJECTnot@1) AND_NOT (((Kspam OR Kdeleted) OR Kmuted) OR Kbad-address)))
9927


* Re: xapian parser bug?
  2018-09-29 22:09 xapian parser bug? David Bremner
@ 2018-09-30  8:50 ` James Aylett
  2018-09-30  9:20   ` Olly Betts
  0 siblings, 1 reply; 7+ messages in thread
From: James Aylett @ 2018-09-30  8:50 UTC (permalink / raw)
  To: David Bremner; +Cc: xapian-discuss, notmuch

David, this isn't the behaviour I see with QueryParser alone, unless you're driving it in a way I don't expect. In Python:

>>> import xapian
>>> qp = xapian.QueryParser()
>>> qp.add_prefix('subject', 'S')
>>> str(qp.parse_query('subject:"and"', qp.FLAG_DEFAULT|qp.FLAG_BOOLEAN_ANY_CASE))
'Query(Sand@1)'
>>> str(qp.parse_query('subject:"or"', qp.FLAG_DEFAULT|qp.FLAG_BOOLEAN_ANY_CASE))
'Query(Sor@1)'
>>> str(qp.parse_query('subject:"not"', qp.FLAG_DEFAULT|qp.FLAG_BOOLEAN_ANY_CASE))
'Query(Snot@1)'
>>> str(qp.parse_query('subject:" not "', qp.FLAG_DEFAULT|qp.FLAG_BOOLEAN_ANY_CASE))
'Query(Snot@1)'

Note that I'm using 1.4.7, and from your output I believe you're not (the * in the query description I believe doesn't happen in those situations any more).

J


-- 
 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/


* Re: xapian parser bug?
  2018-09-30  8:50 ` James Aylett
@ 2018-09-30  9:20   ` Olly Betts
  2018-09-30 12:05     ` David Bremner
  0 siblings, 1 reply; 7+ messages in thread
From: Olly Betts @ 2018-09-30  9:20 UTC (permalink / raw)
  To: James Aylett; +Cc: David Bremner, notmuch, xapian-discuss

On Sun, Sep 30, 2018 at 09:50:30AM +0100, James Aylett wrote:
> Note that I'm using 1.4.7, and from your output I believe you're not
> (the * in the query description I believe doesn't happen in those
> situations any more).

1.4.4 and later eliminate redundant 0 scaling factors, but this one
isn't actually redundant:

> > Query(((Tmail AND 0 * XSUBJECTnot@1) AND_NOT (((Kspam OR Kdeleted) OR Kmuted) OR Kbad-address)))

If it was on the right-hand side of AND_NOT it would be eliminated
(because the right-hand side doesn't contribute any weight anyway).

FWIW, I also couldn't reproduce this (I tried with quest and 1.4.7):

$ quest -psubject:S -fdefault,boolean_any_case 'subject:"and"'
Parsed Query: Query(Sand@1)

Cheers,
    Olly


* Re: xapian parser bug?
  2018-09-30  9:20   ` Olly Betts
@ 2018-09-30 12:05     ` David Bremner
  2018-09-30 17:49       ` David Bremner
  2018-09-30 20:43       ` Olly Betts
  0 siblings, 2 replies; 7+ messages in thread
From: David Bremner @ 2018-09-30 12:05 UTC (permalink / raw)
  To: Olly Betts, James Aylett; +Cc: notmuch, xapian-discuss

Olly Betts <olly@survex.com> writes:

>
> FWIW, I also couldn't reproduce this (I tried with quest and 1.4.7):
>
> $ quest -psubject:S -fdefault,boolean_any_case 'subject:"and"'
> Parsed Query: Query(Sand@1)
>

Ah, OK, it must have something to do with the way that notmuch is using
field processors. I see now that the following code (from
lib/regexp-fields.cc) is probably related; at least it explains why
subject:" not " works:

    if (str.find (' ') != std::string::npos)
        query_str = '"' + str + '"';
    else
        query_str = str;

    return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);

The motivation for not always triggering phrase processing is that it
breaks/disables wildcards. In particular this change was to fix the
query 'subject:foo*'.  The difficulty here is that the field processor
doesn't know if its string argument was originally quoted.
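
To make the failure mode concrete, here is a minimal Python sketch of the
quoting decision in the excerpt above. It models only the string handling
and makes no Xapian calls; the function name build_query_str is
hypothetical and not part of notmuch.

```python
def build_query_str(s):
    """Mirror the excerpt's rule: phrase-quote only when the value contains a space."""
    if " " in s:
        return '"' + s + '"'
    return s


# Values taken from the queries discussed in this thread.
for value in ["and", " not ", "foo*", "christmas party"]:
    print(repr(value), "->", build_query_str(value))
```

Under this rule a bare and, or, or not is handed to the recursive
parse_query call unquoted, which is consistent with the syntax errors
reported at the start of the thread, while " not " is quoted because it
contains spaces.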


* Re: xapian parser bug?
  2018-09-30 12:05     ` David Bremner
@ 2018-09-30 17:49       ` David Bremner
  2018-09-30 20:43       ` Olly Betts
  1 sibling, 0 replies; 7+ messages in thread
From: David Bremner @ 2018-09-30 17:49 UTC (permalink / raw)
  To: Olly Betts, James Aylett; +Cc: notmuch, xapian-discuss

David Bremner <david@tethera.net> writes:

> Olly Betts <olly@survex.com> writes:
>
>>
>> FWIW, I also couldn't reproduce this (I tried with quest and 1.4.7):
>>
>> $ quest -psubject:S -fdefault,boolean_any_case 'subject:"and"'
>> Parsed Query: Query(Sand@1)
>>
>
> Ah, OK, it must have something to do with the way that notmuch is using
> field processors. I see now that the following code (from
> lib/regexp-fields.cc) is probably related; at least it explains why
> subject:" not " works:
>
>             if (str.find (' ') != std::string::npos)
> 		query_str = '"' + str + '"';
> 	    else
> 		query_str = str;
>
> 	    return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);

For the record, I have proposed a fix for notmuch (str is known to be
non-empty there). This will phrase quote by default, unless the string
looks like a wildcard query (without spaces).

diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 084bc8c0..52f30d82 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -194,7 +194,7 @@ RegexpFieldProcessor::operator() (const std::string & str)
             * phrase parsing, when possible */
            std::string query_str;
 
-           if (str.find (' ') != std::string::npos)
+           if (*str.rbegin () != '*' || str.find (' ') != std::string::npos)
                query_str = '"' + str + '"';
            else
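
The changed condition can be sketched in Python as follows. This is an
illustration of the string logic only, assuming (as stated above) that the
value is never empty; build_query_str_patched is a hypothetical name.

```python
def build_query_str_patched(s):
    """Phrase-quote unless the value ends in '*' and contains no space.

    Assumes s is non-empty, matching the precondition of *str.rbegin ()
    in the patch.
    """
    if s[-1] != "*" or " " in s:
        return '"' + s + '"'
    return s


for value in ["and", "not", "foo*", "foo bar*"]:
    print(repr(value), "->", build_query_str_patched(value))
```

With this rule the operator names are always quoted, and only a
space-free value ending in '*' is passed through unquoted so that wildcard
expansion still has a chance to apply.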


* Re: xapian parser bug?
  2018-09-30 12:05     ` David Bremner
  2018-09-30 17:49       ` David Bremner
@ 2018-09-30 20:43       ` Olly Betts
  2018-10-01  1:25         ` David Bremner
  1 sibling, 1 reply; 7+ messages in thread
From: Olly Betts @ 2018-09-30 20:43 UTC (permalink / raw)
  To: David Bremner; +Cc: James Aylett, notmuch, xapian-discuss

On Sun, Sep 30, 2018 at 09:05:25AM -0300, David Bremner wrote:
>             if (str.find (' ') != std::string::npos)
> 		query_str = '"' + str + '"';
> 	    else
> 		query_str = str;
> 
> 	    return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);

I wouldn't recommend trying to generate strings to feed to QueryParser
like this code seems to be doing.  QueryParser aims to parse input from
humans not machines.

As well as the case where str is an operation name, the code above looks
like it will mishandle cases where str contains a tab or double quotes.
There are likely other problem cases too.
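
The two problem inputs mentioned above can be illustrated with a small
Python sketch. It looks only at the generated string, not at how
QueryParser would actually interpret it; wrap_as_phrase and
first_quoted_span are hypothetical helpers, and the "naive reader" that
pairs quotes left to right is an assumption made for illustration.

```python
def wrap_as_phrase(s):
    """Wrap a value in double quotes, as the quoted code does."""
    return '"' + s + '"'


def first_quoted_span(query_str):
    """Text between the first two double quotes, pairing quotes left to right."""
    return query_str.split('"')[1]


# A value that already contains double quotes.
embedded = 'say "hi" now'
generated = wrap_as_phrase(embedded)
print(generated)                           # "say "hi" now"
print(repr(first_quoted_span(generated)))  # 'say '

# A value separated by a tab has no space, so the original rule
# (quote only if the value contains ' ') would leave it unquoted.
tabbed = "christmas\tparty"
print(" " in tabbed)                       # False
```

In the first case the wrapped string no longer delimits the original
value; in the second the space test never fires.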

Cheers,
    Olly


* Re: xapian parser bug?
  2018-09-30 20:43       ` Olly Betts
@ 2018-10-01  1:25         ` David Bremner
  0 siblings, 0 replies; 7+ messages in thread
From: David Bremner @ 2018-10-01  1:25 UTC (permalink / raw)
  To: Olly Betts; +Cc: notmuch, xapian-discuss

Olly Betts <olly@survex.com> writes:

> On Sun, Sep 30, 2018 at 09:05:25AM -0300, David Bremner wrote:
>>             if (str.find (' ') != std::string::npos)
>> 		query_str = '"' + str + '"';
>> 	    else
>> 		query_str = str;
>> 
>> 	    return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
>
> I wouldn't recommend trying to generate strings to feed to QueryParser
> like this code seems to be doing.  QueryParser aims to parse input from
> humans not machines.

str is the parameter to the FieldProcessor () operator.  The field
processor needs a way to approximate the standard probabilistic prefix
parsing in the fallback case. The addition of quotes is to force the
generation of a phrase query, otherwise e.g. subject:"christmas party"
doesn't work out well.

I tried using OP_PHRASE as the default operator, but it doesn't
handle some cases I need.

% quest -o phrase 'bob jones <bob@example.com>'       
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

If I don't recursively call parse_query, then I guess I need to generate
terms in a compatible way before turning them into a phrase query. Maybe
that's not as hard as I originally thought, since being in a phrase turns
off the stemmer anyway, iiuc.  Is there a Xapian API I can use to extract
"bob", "jones", "bob", "example", "com" from the example above? I guess
I could use a throwaway Xapian::Document and a TermGenerator
(basically aping xapian_core/tests/api_termgen.cc).
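
As a rough illustration of the term list being asked for, the following
Python sketch splits the example on runs of ASCII letters and digits. It
is only a stand-in: it does not use Xapian, and Xapian's TermGenerator
applies its own word-splitting rules that this regular expression does
not reproduce. rough_terms is a hypothetical helper.

```python
import re


def rough_terms(text):
    """Lowercase the text and return runs of ASCII letters/digits, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


print(rough_terms("bob jones <bob@example.com>"))
# ['bob', 'jones', 'bob', 'example', 'com']
```

The ordered list is the kind of input a phrase query would need, since
each term has to be a leaf for OP_PHRASE to accept it.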

d
