unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
From: Mark Walters <markwalters1009@gmail.com>
To: David Bremner <david@tethera.net>, notmuch <notmuch@notmuchmail.org>
Subject: [RFC PATCH] Re: excessive thread fusing
Date: Sun, 20 Apr 2014 08:14:56 +0100	[thread overview]
Message-ID: <87fvl8mpzj.fsf@qmul.ac.uk> (raw)
In-Reply-To: <87ioq5mrbz.fsf@maritornes.cs.unb.ca>


On Sat, 19 Apr 2014, David Bremner <david@tethera.net> wrote:
> Gregor Zattler mentioned some problems with threading at 
>
>        id:20120126004024.GA13704@shi.workgroup
>
> After some off list discussions, I believe I have a smaller test case.
>
> The attached maildir contains 24 messages from the org mode list.
>
> According to notmuch, these form one thread, but I can't figure out
> exactly why. It seems like the chronologically first two messages should
> be a seperate thread. There are several of the infamous malformed ME-E
> In-reply-to headers, but each of these messages also has a References
> header; this seems to indicate a case missed by commit cf8aaafbad68.

Hi 

I have done dome debugging of this. There is a patch below which fixes
this test case but who knows what it breaks! Please DO NOT apply unless
someone who knows this code says it's OK.

First, the bug is quite sensitive. The attached 24 messages are numbered
and i will use the last two digits to refer to them (ie the 2 digits are
the ?? in 1397885606.0002??.mbox:2,). The number range from 17-52; 17
and 18 should be one thread and the rest a different thread.

1) If you add all messages you get one thread. 
2) If you add all apart from 52 you get 2 threads. However, then adding
52 still gives two threads.
3) If you add 18 and then 52 you get 1 thread.
4) If you add 17 and 18 then 52 you get 2 threads.

I think notmuch will use inode sort and since the tar file contains
these three files in the order 18 52 17 we get cases 1 and 2 above.

I put some debug stuff in _notmuch_database_link_message_to_parents and
I think that the problem comes from the call to parse_references on line
1767 which adds the malformed in-reply-to header to the hash table, so
this malformed line gets added as a potential parent. 

As a clear example that I don't understand this code I don't know why
this no longer causes a problem if message 17 gets added too.

Best wishes

Mark

---
 lib/database.cc |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 1efb14d..373a255 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1763,20 +1763,23 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
 					    this_message_id,
 					    parents, refs);
 
-    in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
-    in_reply_to_message_id = parse_references (message,
-					       this_message_id,
-					       parents, in_reply_to);
-
     /* For the parent of this message, use the last message ID of the
      * References header, if available.  If not, fall back to the
-     * first message ID in the In-Reply-To header. */
+     * first message ID in the In-Reply-To header. We only parse the
+     * In-Reply-To header if we need to as otherwise we might
+     * contanimate the hash table if it is malformed. */
     if (last_ref_message_id) {
         _notmuch_message_add_term (message, "replyto",
                                    last_ref_message_id);
-    } else if (in_reply_to_message_id) {
-	_notmuch_message_add_term (message, "replyto",
-			     in_reply_to_message_id);
+    } else {
+	in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
+	in_reply_to_message_id = parse_references (message,
+						   this_message_id,
+						   parents, in_reply_to);
+	if (in_reply_to_message_id) {
+	    _notmuch_message_add_term (message, "replyto",
+				       in_reply_to_message_id);
+	}
     }
 
     keys = g_hash_table_get_keys (parents);
-- 
1.7.10.4

  parent reply	other threads:[~2014-04-20  7:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-19 12:33 excessive thread fusing David Bremner
2014-04-19 17:52 ` Eric
2014-04-19 21:04   ` Andrei POPESCU
2014-04-20 16:48     ` Austin Clements
2014-04-20 17:46       ` Austin Clements
2014-04-20  7:14 ` Mark Walters [this message]
     [not found]   ` <87oazwjq1e.fsf@yoom.home.cworth.org>
2014-04-20 12:03     ` [RFC PATCH] " Mark Walters
2014-04-21  7:20       ` Mark Walters
2014-04-21 16:20         ` Austin Clements
2022-01-01  0:26         ` David Bremner
2014-04-20 12:59     ` David Bremner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://notmuchmail.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fvl8mpzj.fsf@qmul.ac.uk \
    --to=markwalters1009@gmail.com \
    --cc=david@tethera.net \
    --cc=notmuch@notmuchmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).