From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Walters
To: David Bremner, notmuch
Subject: [RFC PATCH] Re: excessive thread fusing
In-Reply-To: <87ioq5mrbz.fsf@maritornes.cs.unb.ca>
References: <87ioq5mrbz.fsf@maritornes.cs.unb.ca>
User-Agent: Notmuch/0.15.2+615~g78e3a93 (http://notmuchmail.org) Emacs/23.4.1 (x86_64-pc-linux-gnu)
Date: Sun, 20 Apr 2014 08:14:56 +0100
Message-ID: <87fvl8mpzj.fsf@qmul.ac.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "Use and development of the notmuch mail system."

On Sat, 19 Apr 2014, David Bremner wrote:
> Gregor Zattler mentioned some problems with threading at
>
>   id:20120126004024.GA13704@shi.workgroup
>
> After some off list discussions, I believe I have a smaller test case.
>
> The attached maildir contains 24 messages from the org mode list.
>
> According to notmuch, these form one thread, but I can't figure out
> exactly why. It seems like the chronologically first two messages should
> be a separate thread. There are several of the infamous malformed ME-E
> In-reply-to headers, but each of these messages also has a References
> header; this seems to indicate a case missed by commit cf8aaafbad68.

Hi

I have done some debugging of this. There is a patch below which fixes
this test case, but who knows what it breaks! Please DO NOT apply unless
someone who knows this code says it's OK.

First, the bug is quite sensitive. The attached 24 messages are numbered
and I will use the last two digits to refer to them (i.e. the two digits
are the ?? in 1397885606.0002??.mbox:2,). The numbers range from 17 to
52; 17 and 18 should be one thread and the rest a different thread.

1) If you add all messages you get one thread.
2) If you add all apart from 52 you get 2 threads. However, then adding
   52 still gives two threads.
3) If you add 18 and then 52 you get 1 thread.
4) If you add 17 and 18 then 52 you get 2 threads.

I think notmuch will use inode sort, and since the tar file contains
these three files in the order 18 52 17 we get cases 1 and 2 above.

I put some debug output in _notmuch_database_link_message_to_parents and
I think that the problem comes from the call to parse_references on line
1767, which parses the malformed in-reply-to header and adds what it
finds there to the hash table, so the malformed value gets added as a
potential parent.

As a clear sign that I don't understand this code: I don't know why
this no longer causes a problem if message 17 gets added too.

Best wishes

Mark

---
 lib/database.cc | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 1efb14d..373a255 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1763,20 +1763,23 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
                                      this_message_id,
                                      parents, refs);
 
-    in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
-    in_reply_to_message_id = parse_references (message,
-                                               this_message_id,
-                                               parents, in_reply_to);
-
     /* For the parent of this message, use the last message ID of the
      * References header, if available. If not, fall back to the
-     * first message ID in the In-Reply-To header. */
+     * first message ID in the In-Reply-To header. We only parse the
+     * In-Reply-To header if we need to as otherwise we might
+     * contaminate the hash table if it is malformed. */
     if (last_ref_message_id) {
         _notmuch_message_add_term (message, "replyto",
                                    last_ref_message_id);
-    } else if (in_reply_to_message_id) {
-        _notmuch_message_add_term (message, "replyto",
-                                   in_reply_to_message_id);
+    } else {
+        in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
+        in_reply_to_message_id = parse_references (message,
+                                                   this_message_id,
+                                                   parents, in_reply_to);
+        if (in_reply_to_message_id) {
+            _notmuch_message_add_term (message, "replyto",
+                                       in_reply_to_message_id);
+        }
     }
 
     keys = g_hash_table_get_keys (parents);
-- 
1.7.10.4
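
For anyone who wants to see the suspected hash-table contamination in
isolation, here is a minimal C++ sketch. It is a toy model, not notmuch's
code: collect_ids, link_always and link_fallback are hypothetical names,
a std::set stands in for the parents hash table, and the header strings
used with it are made up. It assumes that a malformed In-Reply-To
carries some non-message-ID token in angle brackets (for example a
sender address), which a References-style parser would pick up as a
parent.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a References-style parser (NOT notmuch's
// parse_references): every <...> token in `header` is recorded in
// `parents`, and the last token found is returned ("" if none).
static std::string collect_ids(const std::string &header,
                               std::set<std::string> &parents)
{
    std::string last;
    std::string::size_type pos = 0;
    while ((pos = header.find('<', pos)) != std::string::npos) {
        std::string::size_type end = header.find('>', pos);
        if (end == std::string::npos)
            break;                      // unterminated token: stop
        last = header.substr(pos + 1, end - pos - 1);
        parents.insert(last);
        pos = end + 1;
    }
    return last;
}

// Model of the flow before the patch: In-Reply-To is always parsed,
// so anything bracketed in it lands in `parents`.
static std::set<std::string> link_always(const std::string &refs,
                                         const std::string &in_reply_to)
{
    std::set<std::string> parents;
    collect_ids(refs, parents);
    collect_ids(in_reply_to, parents);
    return parents;
}

// Model of the flow after the patch: In-Reply-To is parsed only when
// References yields no message ID at all.
static std::set<std::string> link_fallback(const std::string &refs,
                                           const std::string &in_reply_to)
{
    std::set<std::string> parents;
    if (collect_ids(refs, parents).empty())
        collect_ids(in_reply_to, parents);
    return parents;
}
```

With a References value of "<a@example.org> <b@example.org>" and a
made-up In-Reply-To of the form "Message from Some One
<someone@example.org> ... <b@example.org>", link_always ends up with
someone@example.org among the parents while link_fallback does not. Any
two messages sharing that bogus parent would then be candidates for
fusing, which is the effect the patch tries to avoid. Whether this is
the whole story for the test maildir is not something the sketch can
show.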