From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id CBE10431FAE for ; Tue, 5 Mar 2013 19:32:01 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.799 X-Spam-Level: X-Spam-Status: No, score=-0.799 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oNltIgGJgyHI for ; Tue, 5 Mar 2013 19:32:00 -0800 (PST) Received: from mail-qa0-f49.google.com (mail-qa0-f49.google.com [209.85.216.49]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id D5D75431FB6 for ; Tue, 5 Mar 2013 19:31:59 -0800 (PST) Received: by mail-qa0-f49.google.com with SMTP id o13so2299693qaj.15 for ; Tue, 05 Mar 2013 19:31:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:from:to:subject:date:message-id:x-mailer:in-reply-to :references; bh=I7SQero4dXbgF6qFSxlWsUGDj3/BMJ8eRDxPC+Q+CsE=; b=G5mMAVUqWubcywuFIEOnq1mhVw36rS5mfnAgEJ0C4HG1O/kNvaR8dV7n/wybChWky5 X7cY+HcQXw8fUIlWKshAZVTxLYSH06V3WRqKG7FmHzmKf7sxdJhS8ssrtO6/YO0qXzdk xMwW7uVCC1hEZnnsDT4+8I2QEDC9ciFIpW6JmgLo/SVTPM82snxklLrO5fi1qIiom4Kg 5h1FefyIYeTj65MIuh1gqw5SyIjMrKv01HR5F/LbfrhRLEfx+ALiTT0Xf7sYnQp3oSt0 FZK9oThaLMe127S2AUZPsR74jbTFbghISRR4ruPpbrAK2h++AQiiYQziaAochxRiV6FS C6CA== X-Received: by 10.229.198.22 with SMTP id em22mr9334247qcb.2.1362540718664; Tue, 05 Mar 2013 19:31:58 -0800 (PST) Received: from haize.hsd1.pa.comcast.net (c-68-80-94-73.hsd1.pa.comcast.net. [68.80.94.73]) by mx.google.com with ESMTPS id q5sm46570133qaz.2.2013.03.05.19.31.57 (version=TLSv1.2 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 05 Mar 2013 19:31:58 -0800 (PST) From: Aaron Ecay To: notmuch@notmuchmail.org Subject: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Date: Tue, 5 Mar 2013 22:31:49 -0500 Message-Id: <1362540709-28765-2-git-send-email-aaronecay@gmail.com> X-Mailer: git-send-email 1.8.1.5 In-Reply-To: <1362540709-28765-1-git-send-email-aaronecay@gmail.com> References: <87ppzfzxuk.fsf@zancas.localnet> <1362540709-28765-1-git-send-email-aaronecay@gmail.com> X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 03:32:01 -0000 Presently, the code which finds the parent of a message as it is being added to the database assumes that the first Message-ID-like substring of the In-Reply-To header is the parent Message ID. Some mail clients, however, put stuff other than the Message-ID of the parent in the In-Reply-To header, such as the email address of the sender of the parent. This can fool notmuch. The updated algorithm prefers the last Message ID in the References header. The References header lists messages oldest-first, so the last Message ID is the parent (RFC2822, p. 24). The References header is also less likely to be in a non-standard syntax (http://cr.yp.to/immhf/thread.html, http://www.jwz.org/doc/threading.html). In case the References header is not to be found, fall back to the old behavior. V2 of this patch, incorporating feedback from Jani and (indirectly) Austin. --- lib/database.cc | 48 +++++++++++++++++++++++++++++++++--------------- test/thread-replies | 4 ---- 2 files changed, 33 insertions(+), 19 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 91d4329..52ed618 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) * 'message_id' in the result (to avoid mass confusion when a single * message references itself cyclically---and yes, mail messages are * not infrequent in the wild that do this---don't ask me why). -*/ -static void + * + * Return the last reference parsed, if it is not equal to message_id. + */ +static char * parse_references (void *ctx, const char *message_id, GHashTable *hash, @@ -511,7 +513,7 @@ parse_references (void *ctx, char *ref; if (refs == NULL || *refs == '\0') - return; + return NULL; while (*refs) { ref = _parse_message_id (ctx, refs, &refs); @@ -519,6 +521,17 @@ parse_references (void *ctx, if (ref && strcmp (ref, message_id)) g_hash_table_insert (hash, ref, NULL); } + + /* The return value of this function is used to add a parent + * reference to the database. We should avoid making a message + * its own parent, thus the following check. + */ + + if (ref && strcmp(ref, message_id)) { + return ref; + } else { + return NULL; + } } notmuch_status_t @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, { GHashTable *parents = NULL; const char *refs, *in_reply_to, *in_reply_to_message_id; + const char *last_ref_message_id, *this_message_id; GList *l, *keys = NULL; notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; parents = g_hash_table_new_full (g_str_hash, g_str_equal, _my_talloc_free_for_g_hash, NULL); + this_message_id = notmuch_message_get_message_id (message); refs = notmuch_message_file_get_header (message_file, "references"); - parse_references (message, notmuch_message_get_message_id (message), - parents, refs); + last_ref_message_id = parse_references (message, + this_message_id, + parents, refs); in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); - parse_references (message, notmuch_message_get_message_id (message), - parents, in_reply_to); - - /* Carefully avoid adding any self-referential in-reply-to term. */ - in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); - if (in_reply_to_message_id && - strcmp (in_reply_to_message_id, - notmuch_message_get_message_id (message))) - { + in_reply_to_message_id = parse_references (message, + this_message_id, + parents, in_reply_to); + + /* For the parent of this message, use the last message ID of the + * References header, if available. If not, fall back to the + * first message ID in the In-Reply-To header. */ + if (last_ref_message_id) { + _notmuch_message_add_term (message, "replyto", + last_ref_message_id); + } else if (in_reply_to_message_id) { _notmuch_message_add_term (message, "replyto", - _parse_message_id (message, in_reply_to, NULL)); + in_reply_to_message_id); } keys = g_hash_table_get_keys (parents); diff --git a/test/thread-replies b/test/thread-replies index a902691..28c2b1f 100755 --- a/test/thread-replies +++ b/test/thread-replies @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-compliant headers' . ./test-lib.sh test_begin_subtest "Use References when In-Reply-To is broken" -test_subtest_known_broken add_message '[id]="foo@one.com"' \ '[subject]=one' add_message '[in-reply-to]="mumble"' \ @@ -46,7 +45,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` test_expect_equal_json "$output" "$expected" test_begin_subtest "Prefer References to In-Reply-To" -test_subtest_known_broken add_message '[id]="foo@two.com"' \ '[subject]=two' add_message '[in-reply-to]=""' \ @@ -77,7 +75,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` test_expect_equal_json "$output" "$expected" test_begin_subtest "Use In-Reply-To when no References" -test_subtest_known_broken add_message '[id]="foo@three.com"' \ '[subject]="three"' add_message '[in-reply-to]=""' \ @@ -104,7 +101,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` test_expect_equal_json "$output" "$expected" test_begin_subtest "Use last Reference" -test_subtest_known_broken add_message '[id]="foo@four.com"' \ '[subject]="four"' add_message '[id]="bar@four.com"' \ -- 1.8.1.5