From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 20 Apr 2014 13:46:01 -0400
From: Austin Clements
To: notmuch@notmuchmail.org
Subject: Re:
 excessive thread fusing
Message-ID: <20140420174601.GC25817@mit.edu>
References: <87ioq5mrbz.fsf@maritornes.cs.unb.ca>
 <20140419210439.GC1797@sid.nuvreauspam>
 <20140420164812.GB25817@mit.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140420164812.GB25817@mit.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
List-Id: "Use and development of the notmuch mail system."

Quoth myself on Apr 20 at 12:48 pm:
> Quoth Andrei POPESCU on Apr 20 at 12:04 am:
> > On Sb, 19 apr 14, 18:52:02, Eric wrote:
> > >
> > > This may not actually be any help, but both hypermail and mhonarc agree
> > > that two messages form a separate thread from the rest. I believe that
> > > the latter, at least, is the JWZ algorithm.
> >
> > mutt concurs.
>
> Can anyone explain why JWZ *doesn't* have the same problem? I don't
> see how this heuristic doesn't doom it to the same fate:
>
>   The References field is populated from the ``References'' and/or
>   ``In-Reply-To'' headers. If both headers exist, take the first thing
>   in the In-Reply-To header that looks like a Message-ID, and append
>   it to the References header.
>
> Given this, even considering only messages 18 and 52 (which "should"
> be in different threads), JWZ should find the common "parent"
> e.fraga@ucl.ac.uk and link them in to the same thread:
>
> Add 18 (step 1)
> - The combined "references" list is
> - Creates and links containers 17 <- e.fraga@ucl.ac.uk <- 18 where the
>   first two are empty
>
> Add 52 (step 1)
> - The combined "references" list is
>
> - Creates and links containers 31 <- 32 <- 39
> - Also considers container e.fraga@ucl.ac.uk, but this is already
>   linked, so it doesn't change it
> - Creates container 52 and links e.fraga@ucl.ac.uk <- 52 (step 1C)
>
> 18 and 52 will later get promoted over their empty parent (step 4),
> but will remain in the same thread.
>
> What am I missing? Or are these other MUAs not using pure JWZ?

I dug into mutt's mutt_sort_threads a bit. It's not using JWZ, though
it's something similar. The most salient thing may be how it handles
in-reply-to and references:

1. If a message has both in-reply-to and references, the parent chain
   is the *last* in-reply-to ID and then the references from right to
   left (skipping the last reference ID if it's the same as the last
   in-reply-to ID). (See also mutt_parse_references.)

2. If a message has only in-reply-to, the parent chain is *all* of the
   IDs in in-reply-to *from right to left* (i.e., the right-most one
   is the immediate parent).

3. If a message has only references, the parent chain is that, from
   right to left.

Like JWZ, mutt creates and links together "empty containers" as it
scans the parent chain towards the root, though unlike JWZ it stops
when it finds a non-empty container or a container that already has a
parent.
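
To make the difference concrete, here is a rough Python sketch of the
two ways of building the parent list, as I read them from the text
above. It is not mutt's or JWZ's actual code: the function names and
the plain-list representation of the headers are made up for
illustration, and the handling of a message with only in-reply-to in
the JWZ function is my assumption, since the quoted text only covers
the case where both headers exist.

```python
def jwz_references(references, in_reply_to):
    """Combined list per the quoted JWZ text, ordered root-first.

    Both arguments are lists of message IDs in header order.
    Assumption: with only In-Reply-To present, its first ID is used alone.
    """
    combined = list(references)
    if in_reply_to:
        # Append the *first* In-Reply-To ID; it becomes the immediate parent.
        combined.append(in_reply_to[0])
    return combined


def mutt_parent_chain(references, in_reply_to):
    """Parent chain per the three cases above, immediate parent first."""
    if in_reply_to and references:
        # Case 1: last In-Reply-To ID, then References right to left,
        # skipping the last reference if it repeats that ID.
        parent = in_reply_to[-1]
        rest = references[:-1] if references[-1] == parent else references
        return [parent] + rest[::-1]
    if in_reply_to:
        # Case 2: every In-Reply-To ID, right to left.
        return in_reply_to[::-1]
    # Case 3: References alone, right to left.
    return references[::-1]


# Hypothetical IDs, not taken from the thread under discussion.
refs = ["r1", "r2", "r3"]
irt = ["p1", "r3"]
print(jwz_references(refs, irt))     # root-first; the last entry is the parent
print(mutt_parent_chain(refs, irt))  # parent-first
```

With these made-up headers the JWZ-style list ends in p1, so p1 is
taken as the immediate parent, while the mutt-style chain starts at r3
and never mentions p1 at all. That is the kind of stray in-reply-to ID
that can pull otherwise unrelated messages under a common parent.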