From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1]) by arlo.cworth.org (Postfix) with ESMTP id 6EB466DE01F7 for ; Mon, 6 Jun 2016 13:09:26 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: -0.823
X-Spam-Level:
X-Spam-Status: No, score=-0.823 tagged_above=-999 required=5 tests=[AWL=-0.813, HTML_MESSAGE=0.001, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1]) by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Ti-kiAZDuIZc for ; Mon, 6 Jun 2016 13:09:17 -0700 (PDT)
X-Greylist: delayed 2913 seconds by postgrey-1.35 at arlo; Mon, 06 Jun 2016 13:08:57 PDT
Received: from outgoing-tmp.csail.mit.edu (outgoing-tmp.csail.mit.edu [128.30.2.206]) by arlo.cworth.org (Postfix) with ESMTP id 063316DE0217 for ; Mon, 6 Jun 2016 13:08:56 -0700 (PDT)
Received: from mail-yw0-f173.google.com ([209.85.161.173]) by outgoing-tmp.csail.mit.edu with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bA04T-0001og-80 for notmuch@notmuchmail.org; Mon, 06 Jun 2016 15:20:21 -0400
Received: by mail-yw0-f173.google.com with SMTP id c127so149455948ywb.1 for ; Mon, 06 Jun 2016 12:20:21 -0700 (PDT)
X-Gm-Message-State: ALyK8tJzsZjhatT/stPriuVDauiazZsgnRt3IN/TeSV3HzCqPCyvNkrLFljgQTSX0n/Vnw05iAGS63zbY6Cfqg==
X-Received: by 10.129.45.196 with SMTP id t187mr13435296ywt.153.1465240820424; Mon, 06 Jun 2016 12:20:20 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.200.7 with HTTP; Mon, 6 Jun 2016 12:20:19 -0700 (PDT)
In-Reply-To: <878tyins3j.fsf@tesseract.cs.unb.ca>
References: <1465196150-astroid-3-33kf2otxir-16915@strange> <87lh2ijxor.fsf@tesseract.cs.unb.ca> <1465217156-astroid-4-8l08w9cils-2318@strange> <877fe2tiy8.fsf@uwaterloo.ca> <878tyins3j.fsf@tesseract.cs.unb.ca>
From: Austin Clements
Date: Mon, 6 Jun 2016 15:20:19 -0400
X-Gmail-Original-Message-ID:
Message-ID:
Subject: Re: searching: '*analysis' vs 'reanalysis'
To: David Bremner
Cc: sfischme@uwaterloo.ca, Gaute Hope , notmuch
Content-Type: multipart/alternative; boundary=001a1141df549cccb00534a0f6b3
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 06 Jun 2016 20:09:26 -0000

--001a1141df549cccb00534a0f6b3
Content-Type: text/plain; charset=UTF-8

On Mon, Jun 6, 2016 at 1:29 PM, David Bremner wrote:

> Sebastian Fischmeister writes:
>
> >
> > I ran into this problem before as well. Storage is cheap. Notmuch could
> > index all emails with reversed text to get around some of this
> > problem. It doesn't solve the problem of *analysis*, but it's still an
> > improvement.
>
> It would probably be more useful to have brute force regexp searches on
> headers. Austin did some experiments that sounded promising, where you
> basically postprocess the result of a xapian query with a regexp. OTOH,
> I don't know what kept him from proposing this for mainline. If it was
> just parser issues, those are probably more or less solved now, at least
> for people using xapian 1.3+
>

The experiment was specifically for regexp matching on the subject, but
it should work for any header we store a literal copy of in the
database. The code is here, though in its current form it builds on my
custom query parser:
https://github.com/aclements/notmuch/commit/ce41b29aba4d9b84e2f1eb6ed8df67065196c960.
Based on my understanding of Xapian 1.3+ field processors, these days it
should be quite easy to hook the PostingSource in that commit into the
Xapian QueryParser.

--001a1141df549cccb00534a0f6b3
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a1141df549cccb00534a0f6b3--
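
For readers following the thread, the idea discussed in this message (run an ordinary indexed query first, then keep only the results whose stored header matches a regexp) can be illustrated with a small standalone sketch. This is not the code from the linked commit, which works through a Xapian PostingSource; it assumes a hypothetical in-memory message list, and the names MESSAGES, indexed_query and regexp_postfilter are invented for illustration only.

```python
import re

# Hypothetical stand-in for the database: each message has a set of indexed
# word terms and a literal copy of its headers (an assumption for this sketch).
MESSAGES = [
    {"id": "m1", "terms": {"climate", "reanalysis"},
     "headers": {"subject": "Climate reanalysis results"}},
    {"id": "m2", "terms": {"climate", "analysis"},
     "headers": {"subject": "Climate analysis draft"}},
    {"id": "m3", "terms": {"lunch"},
     "headers": {"subject": "Lunch on Friday"}},
]


def indexed_query(messages, term):
    """Stand-in for the indexed query: exact whole-term lookup only."""
    return [m for m in messages if term in m["terms"]]


def regexp_postfilter(candidates, pattern, header="subject"):
    """Keep candidates whose stored literal header matches the regexp."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [m for m in candidates if rx.search(m["headers"].get(header, ""))]


# Narrow the set with the cheap indexed lookup, then brute-force the regexp
# over the stored subject of each remaining candidate.
candidates = indexed_query(MESSAGES, "climate")
hits = regexp_postfilter(candidates, r"\w+analysis\b")
print([m["id"] for m in hits])
```

The indexed lookup cannot express a leading wildcard such as '*analysis', so the sketch leaves that part to the regexp and uses the index only to shrink the candidate set that the regexp has to scan.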