From: David Bremner
To: Olly Betts
Cc: notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
Subject: Re: xapian parser bug?
In-Reply-To: <20180930204327.a4dwzh6jdqcqvk2e@survex.com>
References: <87a7o02bya.fsf@tethera.net> <20180930092039.7imrsrjyctpel2sp@survex.com> <87y3bj198a.fsf@tethera.net> <20180930204327.a4dwzh6jdqcqvk2e@survex.com>
Date: Sun, 30 Sep 2018 22:25:33 -0300
Message-ID: <87sh1q1mr6.fsf@tethera.net>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."

Olly Betts writes:

> On Sun, Sep 30, 2018 at 09:05:25AM -0300, David Bremner wrote:
>>     if (str.find (' ') != std::string::npos)
>>         query_str = '"' + str + '"';
>>     else
>>         query_str = str;
>>
>>     return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
>
> I wouldn't recommend trying to generate strings to feed to QueryParser
> like this code seems to be doing.  QueryParser aims to parse input from
> humans not machines.

str is the parameter to the FieldProcessor operator ().  The field
processor needs a way to approximate the standard probabilistic prefix
parsing in the fallback case.  The quotes are added to force the
generation of a phrase query; otherwise e.g. subject:"christmas party"
doesn't work out well.  I tried using OP_PHRASE as the default
operator, but it doesn't handle some cases I need.

    % quest -o phrase 'bob jones '
    UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

If I don't recursively call parse_query, then I guess I need to
generate terms in a compatible way before turning them into a phrase
query.  Maybe that's not as hard as I originally thought, since being
in a phrase turns off the stemmer anyway, iiuc.

Is there a Xapian API I can use to extract "bob", "jones", "bob",
"example", "com" from the example above?  I guess I could use a
throwaway Xapian::Document and a TermGenerator (basically aping
xapian_core/tests/api_termgen.cc).

d
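
To make the two options concrete, here is a rough standard-library-only sketch. It is not Xapian code and makes no claim about how TermGenerator tokenizes. The names split_terms and quote_if_spaced are hypothetical, and the address in the usage note is an assumed input chosen to yield the five terms listed above. split_terms simply emits lowercased alphanumeric runs in order, with no stemming or prefixes; quote_if_spaced mirrors the quoting fallback from the quoted snippet.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: return the lowercased alphanumeric runs of `text`,
// in the order they appear. This only approximates word splitting; it does
// no stemming and knows nothing about Xapian's own tokenizing rules.
std::vector<std::string> split_terms (const std::string &text)
{
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum (c)) {
            current += static_cast<char> (std::tolower (c));
        } else if (! current.empty ()) {
            // A non-alphanumeric character ends the current run.
            terms.push_back (current);
            current.clear ();
        }
    }
    if (! current.empty ())
        terms.push_back (current);
    return terms;
}

// Hypothetical helper mirroring the quoted fallback: wrap the string in
// double quotes when it contains a space, so a parser that honours quotes
// would treat it as a phrase.
std::string quote_if_spaced (const std::string &str)
{
    if (str.find (' ') != std::string::npos)
        return '"' + str + '"';
    return str;
}
```

With the assumed input "bob jones <bob@example.com>", split_terms returns bob, jones, bob, example, com, in that order. Turning such a list into a phrase query, and checking that it matches what was indexed, would still have to be done against the real Xapian API.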