From: David Bremner
To: notmuch@notmuchmail.org
Subject: a first step for the duplicate message-id dilemma
Date: Wed, 15 Mar 2017 22:57:26 -0300
Message-Id: <20170316015728.29325-1-david@tethera.net>

These patches are mainly RFC because I'm not 100% sure about the performance impact. It seems OK for me: indexing my 500k messages, about 35k of them duplicates, is about 3% slower. I didn't see a noticeable increase in database size (in both cases it's 5.8G before and 3.5G after notmuch compact).

There are also tons of UI issues: for example, in the test case here,

    notmuch search subject:'"message 2"'

will happily print

    thread:0000000000000001   2001-01-05 [1/1] Notmuch Test Suite; message 1 (inbox unread)

I claim it's still an improvement over the current code, where that second message is not findable by any terms unique to it.
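
To make the UI issue above concrete, here is a toy model in Python. It is only a sketch of the behaviour described, not notmuch's actual code: ToyIndex, add_file and search are hypothetical names, the message-id is made up, and matching is plain all-words rather than a real phrase search. The assumption is that every file sharing a Message-Id contributes terms to one document, while the displayed subject still comes from the first file seen.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical model: one document per Message-Id, terms from every file."""

    def __init__(self):
        self.subject = {}              # message-id -> subject of the first file seen
        self.terms = defaultdict(set)  # term -> set of message-ids containing it

    def add_file(self, message_id, subject, body=""):
        # Only the first file with a given Message-Id sets the displayed subject.
        self.subject.setdefault(message_id, subject)
        # Terms from every file, duplicates included, are added to the index.
        for word in (subject + " " + body).lower().split():
            self.terms[word].add(message_id)

    def search(self, *words):
        # Return (message-id, displayed subject) for documents matching all words.
        if not words:
            return []
        matches = [self.terms.get(w.lower(), set()) for w in words]
        hits = set.intersection(*matches)
        return sorted((mid, self.subject[mid]) for mid in hits)


index = ToyIndex()
index.add_file("dup@example", "message 1")
index.add_file("dup@example", "message 2")  # same Message-Id, different subject

# The second file is now findable, but the summary shows the first subject.
print(index.search("message", "2"))
```

In this model the search succeeds because the duplicate's terms were indexed, and the mismatch between the query and the printed subject is the same kind of oddity as in the notmuch search output above.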