From: David Bremner
To: "Jorge P. de Morais Neto", notmuch@notmuchmail.org
Subject: Re: Strange, incoherent query parsing
In-Reply-To: <874l21wc9w.fsf@disroot.org>
References: <877e6ywlt7.fsf@disroot.org> <87d0gp8uw3.fsf@tethera.net> <874l21wc9w.fsf@disroot.org>
Date: Thu, 29 Aug 2019 21:13:17 -0300
Message-ID: <871rx38q8y.fsf@tethera.net>

Jorge P. de Morais Neto writes:

> Hello,
>
> On 2019-08-28T07:08:28-0300, David Bremner wrote:
>
>> I don't know about the other cases but here there is an extra closing
>> ')', which confuses the query parser.
>> I agree that an error report
>> would be nicer, but we don't currently have that option from Xapian.
>
> Oops.  Sorry about that.  But still there seem to be real errors.  The
> first query[1] should match messages containing both "Saúde" and "Geap"
> (sans the double quotes) in the subject, but does not.
>
> 1: 'subject:("Saúde" "Geap") OR subject:xplitz'

There are two problems here. The first is that, because of the way regexp
fields are implemented, subject: doesn't tolerate the unescaped/unquoted
space. So you'd have to write 'subject:"(Saúde Geap)"' to get anything
sensible. The other problem is that we are treating the part within () as
a quoted phrase, which is not correct.

If you can build from source, try the patch below [1]. With it, I think

notmuch count 'subject:"(Saúde Geap)" OR subject:xplitz'

should behave more like what you want. Notice that my use of quotes is a
bit different from your original query.

>
> Also note that the fourth query[2] should have AND as the implicit
> boolean operator, because 'subject:' is not a boolean prefix.  You
> explained in the other thread the implicit OR operator only applies for
> search terms with the same /boolean/ prefix.

Yes, this is the same problem as in your other report. I posted some
patches that should fix it, but they need more testing.

[1]:
diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 198eb32f..858b1e24 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -189,6 +189,13 @@ RegexpFieldProcessor::operator() (const std::string & str)
         } else {
             throw Xapian::QueryParserError ("unmatched regex delimiter in '" + str + "'");
         }
+    } else if (str.at (0) == '(') {
+        if (str.length () > 1 && str.at (str.size () - 1) == ')') {
+            std::string subexp_str = str.substr (1, str.size () - 2);
+            return parser.parse_query (subexp_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
+        } else {
+            throw Xapian::QueryParserError ("unmatched '(' in '" + str + "'");
+        }
     } else {
         if (options & NOTMUCH_FIELD_PROBABILISTIC) {
             /* TODO replace this with a nicer API level triggering of
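
To try the delimiter handling from the patch without building notmuch,
here is a minimal standalone sketch of just the string manipulation. It
is an illustration under assumptions, not the notmuch code:
strip_outer_parens and check_examples are hypothetical names,
std::invalid_argument stands in for Xapian::QueryParserError, and the
sketch returns the inner text where the patch hands it to
parser.parse_query.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of notmuch): mirrors the patch's check that
// a field value starting with '(' must also end with ')', and returns the
// text between the two delimiters.
std::string strip_outer_parens (const std::string &str)
{
    if (str.empty () || str.at (0) != '(')
        return str; // not a parenthesised value; leave it untouched
    if (str.length () > 1 && str.at (str.size () - 1) == ')')
        return str.substr (1, str.size () - 2);
    // Stand-in for Xapian::QueryParserError in the real patch.
    throw std::invalid_argument ("unmatched '(' in '" + str + "'");
}

// Small self-check covering the balanced and unbalanced cases.
void check_examples ()
{
    assert (strip_outer_parens ("(Saúde Geap)") == "Saúde Geap");
    assert (strip_outer_parens ("xplitz") == "xplitz");

    bool threw = false;
    try {
        strip_outer_parens ("(Saúde Geap");
    } catch (const std::invalid_argument &) {
        threw = true;
    }
    assert (threw);
}
```

Calling check_examples () from a main function exercises all three
cases. Only the outermost pair of parentheses is removed; anything more
is left to the query parser, as in the patch.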