From: David Bremner
To: Daniel Kahn Gillmor, notmuch@freelists.org, notmuch@notmuchmail.org
Subject: Re: [patch v3 06/12] lib: index message files with duplicate message-ids
In-Reply-To: <87ink9gbw9.fsf@tethera.net>
References: <20170604123235.24466-1-david@tethera.net> <20170604123235.24466-7-david@tethera.net> <87k24rebkx.fsf@fifthhorseman.net> <87ink9gbw9.fsf@tethera.net>
Date: Fri, 09 Jun 2017 07:57:46 -0300
Message-ID: <87lgp1e8d1.fsf@tethera.net>
List-Id: "Use and development of the notmuch mail system."

David Bremner writes:

> Daniel Kahn Gillmor writes:
>> for example, i could follow up on the current message with another
>> message with Message-Id: 20170604123235.24466-7-david@tethera.net and
>> give it a subject "Re: [patch v3 06/12] lib: do *not* index message
>> files with duplicate message-ids". that's a bit odd, no?
>
> Yes, I agree that's a bit strange. We should make some effort to
> display the subject that belongs with a given message body. I think it's
> not too hard [1] to preserve the old behaviour of keeping the first
> subject, date, and from. This leaves us with a version of the original
> hiding message attack, but only for the special case of regex searches,
> since those rely exclusively on the value slots.

I had a slightly radical idea for how to deal with that. The subject
and from values from extra files could be appended to the value slot
(e.g. separated by newlines). Then regexp searches would behave
similarly to term based searches, in that matching any file would match
the message. We'd have to be slightly careful about what anchors mean.
A further enhancement would be to expose the search result as an array.

This kind of approach doesn't really make sense for dates, as we
essentially search for those as numbers, and such a hack would break
the built-in Xapian range search.
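
To make the value slot idea concrete, here is a rough sketch in Python.
It is not notmuch code: the names pack_slot, slot_matches and
unpack_slot are made up for illustration, and Python's re module stands
in for whatever regexp engine the library actually uses. It assumes
header values are already unfolded, so a newline is safe as a separator.

```python
import re

def pack_slot(values):
    """Join the header value from each file into one slot string (hypothetical helper)."""
    # Assumption: values are unfolded headers; replace stray newlines to be safe.
    return "\n".join(v.replace("\n", " ") for v in values)

def slot_matches(pattern, slot):
    """Return True if the pattern matches the value from any file."""
    # re.MULTILINE makes ^ and $ anchor at each line, i.e. at each file's
    # value, instead of only at the start and end of the whole slot.
    return re.search(pattern, slot, re.MULTILINE) is not None

def unpack_slot(slot):
    """Expose the packed values as a list, one entry per file."""
    return slot.split("\n")

subjects = [
    "Re: [patch v3 06/12] lib: index message files with duplicate message-ids",
    "Re: [patch v3 06/12] lib: do *not* index message files with duplicate message-ids",
]
slot = pack_slot(subjects)
```

With per-line anchors, a pattern such as `^Re: .*do \*not\*` matches the
second file's subject. With plain anchors it would only be tried against
the start of the whole slot, i.e. the first file's subject, and fail,
which is the anchor question mentioned above.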