From: David Bremner
To: Daniel Kahn Gillmor, notmuch@freelists.org, notmuch@notmuchmail.org
Subject: Re: [patch v3 06/12] lib: index message files with duplicate message-ids
In-Reply-To: <87k24rebkx.fsf@fifthhorseman.net>
References: <20170604123235.24466-1-david@tethera.net> <20170604123235.24466-7-david@tethera.net> <87k24rebkx.fsf@fifthhorseman.net>
Date: Mon, 05 Jun 2017 22:09:26 -0300
Message-ID: <87ink9gbw9.fsf@tethera.net>

Daniel Kahn Gillmor writes:

> On Sun 2017-06-04 09:32:29 -0300, David Bremner wrote:
>> The corresponding xapian document just gets more terms added to it,
>> but this doesn't seem to break anything.
>> Values on the other hand get
>> overwritten, which is a bit annoying, but arguably it is not worse to
>> take the values (from, subject, date) from the last file indexed
>> rather than the first.

[snip]

> for example, i could follow up on the current message with another
> message with Message-Id: 20170604123235.24466-7-david@tethera.net and
> give it a subject "Re: [patch v3 06/12] lib: do *not* index message
> files with duplicate message-ids". that's a bit odd, no?

Yes, I agree that's a bit strange. We should make some effort to
display the subject that belongs with a given message body. I think
it's not too hard [1] to preserve the old behaviour of keeping the
subject, date, and From values of the first file indexed.

This still leaves us with a version of the original message-hiding
attack, but only for the special case of regex searches, since those
rely exclusively on the value slots.

[1] It should just be a matter of guarding the call to
_notmuch_message_set_header_values() with "if (is_new || is_ghost)",
but that needs testing.