unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated
@ 2013-02-25 23:50 Aaron Ecay
  2013-02-26  9:19 ` Jani Nikula
  2013-02-28 19:41 ` Jani Nikula
  0 siblings, 2 replies; 17+ messages in thread
From: Aaron Ecay @ 2013-02-25 23:50 UTC (permalink / raw)
  To: notmuch

Presently, the code which finds the parent of a message as it is being
added to the database assumes that the first Message-ID-like substring
of the In-Reply-To header is the parent Message ID.  Some mail clients,
however, put stuff other than the Message-ID of the parent in the
In-Reply-To header, such as the email address of the sender of the
parent.  This can fool notmuch.

The updated algorithm prefers the last Message ID in the References
header.  The References header lists messages oldest-first, so the last
Message ID is the parent (RFC2822, p. 24).  The References header is
also less likely to be in a non-standard
syntax (http://cr.yp.to/immhf/thread.html,
http://www.jwz.org/doc/threading.html).  In case the References header
is not to be found, fall back to the old behavior.
---

I especially notice this problem on public mailing lists, where
certain people's messages always cause an "out-dent" of the threading,
instead of being nested under whichever message they are replies to.

Technically, putting non-Message-ID crud in the In-Reply-To field is a
violation of RFC2822, but it appears that in practice the References
header is respected more often than the In-Reply-To one.

 lib/database.cc | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 91d4329..cbf33ae 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next)
  * 'message_id' in the result (to avoid mass confusion when a single
  * message references itself cyclically---and yes, mail messages are
  * not infrequent in the wild that do this---don't ask me why).
+ *
+ * Return the last reference parsed.
 */
-static void
+static char *
 parse_references (void *ctx,
 		  const char *message_id,
 		  GHashTable *hash,
@@ -511,7 +513,7 @@ parse_references (void *ctx,
     char *ref;

     if (refs == NULL || *refs == '\0')
-	return;
+	return NULL;

     while (*refs) {
 	ref = _parse_message_id (ctx, refs, &refs);
@@ -519,6 +521,8 @@ parse_references (void *ctx,
 	if (ref && strcmp (ref, message_id))
 	    g_hash_table_insert (hash, ref, NULL);
     }
+
+    return ref;
 }

 notmuch_status_t
@@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch)
     notmuch->last_doc_id++;

     if (notmuch->last_doc_id == 0)
-	INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");
+	INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");

     return notmuch->last_doc_id;
 }
@@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
 					   const char **thread_id)
 {
     GHashTable *parents = NULL;
-    const char *refs, *in_reply_to, *in_reply_to_message_id;
+    const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id;
     GList *l, *keys = NULL;
     notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;

@@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
 				     _my_talloc_free_for_g_hash, NULL);

     refs = notmuch_message_file_get_header (message_file, "references");
-    parse_references (message, notmuch_message_get_message_id (message),
-		      parents, refs);
+    last_ref_message_id = parse_references (message,
+					    notmuch_message_get_message_id (message),
+					    parents, refs);

     in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
     parse_references (message, notmuch_message_get_message_id (message),
 		      parents, in_reply_to);

-    /* Carefully avoid adding any self-referential in-reply-to term. */
     in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL);
+    /* If the parent message ID from the Reply-To and References
+     * headers are different, use the References one.  This is because
+     * the Reply-To header is more likely to be in an non-standard
+     * format. */
+    if (in_reply_to_message_id &&
+	last_ref_message_id &&
+	strcmp (last_ref_message_id, in_reply_to_message_id)) {
+	in_reply_to_message_id = last_ref_message_id;
+    }
+    /* Carefully avoid adding any self-referential in-reply-to term. */
     if (in_reply_to_message_id &&
 	strcmp (in_reply_to_message_id,
 		notmuch_message_get_message_id (message)))
     {
 	_notmuch_message_add_term (message, "replyto",
-			     _parse_message_id (message, in_reply_to, NULL));
+			     in_reply_to_message_id);
     }

     keys = g_hash_table_get_keys (parents);
--
1.8.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-05-14  0:43 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-25 23:50 [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Aaron Ecay
2013-02-26  9:19 ` Jani Nikula
2013-02-26 19:40   ` Aaron Ecay
2013-02-28 19:41 ` Jani Nikula
2013-03-01 17:06   ` Jani Nikula
2013-03-03 23:46     ` Aaron Ecay
2013-03-03 23:55       ` Aaron Ecay
2013-03-04  2:45         ` David Bremner
2013-03-06  3:31           ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay
2013-03-06  3:31             ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay
2013-05-04 14:10               ` David Bremner
2013-05-04 16:24               ` Jani Nikula
2013-05-14  0:43               ` David Bremner
2013-05-04 14:01             ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers David Bremner
2013-05-04 14:03             ` David Bremner
2013-05-04 16:23             ` Jani Nikula
2013-03-04  5:59         ` [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Tomi Ollila

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).