From: David Bremner
To: "Naveen N. Rao", notmuch@notmuchmail.org
Subject: Re: 'notmuch search thread:<>' lists multiple threads
Date: Sun, 08 Apr 2018 00:04:35 -0300
Message-ID: <87sh86v1oc.fsf@tethera.net>
In-Reply-To: <1523007700.l8xm6nm6af.naveen@linux.ibm.com>

"Naveen N. Rao" writes:

> Greetings--
> If I search for threads matching a specific thread-id, I am seeing
> multiple results:
>
> $ notmuch search --output=threads thread:00000000000c4d20
> thread:00000000000c4d1e
> thread:00000000000c4d20

This looks like a bug to me.
I was able to replicate it in my own mail store with the script at the end of
this message. I haven't completely analyzed the situation yet, but one thing I
noticed is that every "bad thread" contains files with duplicate message-ids.

Typical output looks like

╭─ zancas:software/upstream/notmuch/test
╰─ (git)-[master]-% notmuch search thread:000000000001760a
thread:00000000000175e5   November 03 [1/2(3)] 128@gmx.us; Bug#846042: VTK 8 (unread)
thread:000000000001760a    2016-11-27 [1/2(3)] 128@gmx.us; Bug#846042: virtual/meta package for python-vtk (unread)

At least some of this mail data is public, but I'm not sure whether the bad
threading is reproducible; I want to run a complete census overnight before I
reindex. Even if the bug is non-deterministic, it probably lives in
lib/add-message.cc.

----------------------------------------------------------------------
count=0
success=0
# Walk every thread id in the database and check that searching for it
# returns exactly one thread.
for id in $(notmuch search --output=threads '*'); do
    count=$((count + 1))
    # $(( )) strips any padding that wc -l puts around the number
    matches=$(( $(notmuch search --output=threads "$id" | wc -l) ))
    if [ "$matches" -eq 1 ]; then
        success=$((success + 1))
    else
        echo "bad thread: $id"
    fi
    # progress report every 1000 threads
    if [ $((count % 1000)) -eq 0 ]; then echo $count; fi
done
echo "count=$count success=$success"
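To check the duplicate message-id observation against a particular bad thread, something like the following might help. It is only a sketch and not part of notmuch: `dup_files_in_thread` is a hypothetical helper name. It assumes that `notmuch search --output=messages` prints one `id:` query per message, and that `notmuch search --output=files` prints every filename indexed for a message, duplicates included.

```shell
# Hypothetical helper (not part of notmuch): for each message in the given
# thread query, print the message's id: query and the number of files notmuch
# has indexed for it, but only when there is more than one file (a duplicate).
dup_files_in_thread() {
    thread_query=$1   # e.g. thread:000000000001760a
    notmuch search --output=messages "$thread_query" | while read -r mid; do
        # --output=files lists every filename of the matching message;
        # $(( )) strips the padding some wc implementations add.
        # </dev/null keeps notmuch from reading the loop's input.
        nfiles=$(( $(notmuch search --output=files "$mid" </dev/null | wc -l) ))
        if [ "$nfiles" -gt 1 ]; then
            echo "$mid $nfiles"
        fi
    done
}
```

For example, `dup_files_in_thread thread:000000000001760a` would print one `id:<message-id> N` line per message backed by N > 1 files. Empty output for a thread the census reports as bad would be a counterexample to the duplicate message-id theory.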