From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 37BBC431FAE for ; Sun, 3 Mar 2013 15:56:05 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.799 X-Spam-Level: X-Spam-Status: No, score=-0.799 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id V1DnH4WHTqq1 for ; Sun, 3 Mar 2013 15:56:01 -0800 (PST) Received: from mail-qe0-f43.google.com (mail-qe0-f43.google.com [209.85.128.43]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id 95EFB431FAF for ; Sun, 3 Mar 2013 15:56:01 -0800 (PST) Received: by mail-qe0-f43.google.com with SMTP id 1so3362385qee.16 for ; Sun, 03 Mar 2013 15:56:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:from:to:cc:subject:in-reply-to:references:user-agent :date:message-id:mime-version:content-type; bh=35z6Zyv+gYqaZZOgfJ0ZKm7sQt0qSjkJLHQIq+Sw1po=; b=BLlyMlDCOvZww549uMory2/3a8Re1qe8fq06U7U/4EpMrbZJJcCVZFgqw/aU/PZENj cR8y/MTb7hrKcNiHsiBmyM4iN1I9EcIq3JQyrQWSbE9XmBYHCnauh1jYwZmW9Ao8GR4o AxfKUKU0Z37RfCJtsrQ3VYhIQbc745fV4pkuSQQDWLvNbSnNQeUQL3cQ1sWvFVX5Ysgo 1UuUJECs2AQpUymxB4ITzBuo1zLZtVMkxgJoQ2XOPAvDHaTc0dviwxnZvYbqWUeUHNye izoZPRY1vAS2DivKhcYOXhtUSkc2/qecwDptftaQFRf0xJDzxOi6h32LrkQC0oiRJ4cP rZBA== X-Received: by 10.229.196.138 with SMTP id eg10mr6324192qcb.93.1362354961024; Sun, 03 Mar 2013 15:56:01 -0800 (PST) Received: from localhost (c-68-80-94-73.hsd1.pa.comcast.net. [68.80.94.73]) by mx.google.com with ESMTPS id m6sm3499808qed.7.2013.03.03.15.55.59 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sun, 03 Mar 2013 15:56:00 -0800 (PST) From: Aaron Ecay To: Jani Nikula , notmuch@notmuchmail.org Subject: Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated In-Reply-To: <87wqtovygl.fsf@gmail.com> References: <1361836225-17279-1-git-send-email-aaronecay@gmail.com> <87621cteeb.fsf@nikula.org> <871ubzt5gr.fsf@nikula.org> <87wqtovygl.fsf@gmail.com> User-Agent: Notmuch/0.15.2+33~g0c0a530 (http://notmuchmail.org) Emacs/24.3.50.2 (x86_64-unknown-linux-gnu) Date: Sun, 03 Mar 2013 18:55:58 -0500 Message-ID: <87sj4cvy0h.fsf@gmail.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2013 23:56:05 -0000 --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable git send-email is mad about lines >998 characters in the test patch, so I=E2=80=99m sending the patches as attachments to this email. (Is there a better way to include the expected output of a notmuch command which outputs long lines in a test script?) --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-test-add-tests-for-the-handling-of-References-and-In.patch Content-Transfer-Encoding: quoted-printable >From 23836241dd304b98f2a05803fbb5a5a94f563050 Mon Sep 17 00:00:00 2001 From: Aaron Ecay Date: Sun, 3 Mar 2013 18:14:07 -0500 Subject: [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers These tests are known_broken, the following commit fixes them. --- test/thread-replies | 55 +++++++++++++++++++++++++++++++++++++++++++++++++= ++++ 1 file changed, 55 insertions(+) create mode 100755 test/thread-replies diff --git a/test/thread-replies b/test/thread-replies new file mode 100755 index 0000000..fd11a09 --- /dev/null +++ b/test/thread-replies @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# +# Copyright (c) 2013 Aaron Ecay +# + +test_description=3D'test of proper handling of in-reply-to and references = headers + +This test makes sure that the thread structure in the notmuch database is +constructed properly, even in the presence of non-RFC-compliant headers' + +. ./test-lib.sh + +test_begin_subtest "Use References when In-Reply-To is broken" +test_subtest_known_broken +add_message '[id]=3D"foo@one.com"' \ + '[subject]=3Done' +add_message '[in-reply-to]=3D"mumble"' \ + '[references]=3D""' \ + '[subject]=3D"Re: one"' +output=3D$(notmuch show --format=3Djson 'subject:one') +test_expect_equal "$output" '[[[{"id": "foo@one.com", "match": true, "excl= uded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/= notmuch/test/tmp.thread-replies/mail/msg-001", "timestamp": 978709437, "dat= e_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subjec= t": "one", "From": "Notmuch Test Suite ", "To":= "Notmuch Test Suite ", "Date": "Fri, 05 Jan 20= 01 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "cont= ent": "This is just a test message (#1)\n"}]}, [[{"id": "msg-002@notmuch-te= st-suite", "match": true, "excluded": false, "filename": "/home/aecay/devel= opment/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-002= ", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox",= "unread"], "headers": {"Subject": "Re: one", "From": "Notmuch Test Suite <= test_suite@notmuchmail.org>", "To": "Notmuch Test Suite ", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, = "content-type": "text/plain", "content": "This is just a test message (#2)\= n"}]}, []]]]]]' + +test_begin_subtest "Prefer References to In-Reply-To" +test_subtest_known_broken +add_message '[id]=3D"foo@two.com"' \ + '[subject]=3Dtwo' +add_message '[in-reply-to]=3D""' \ + '[references]=3D""' \ + '[subject]=3D"Re: two"' +output=3D$(notmuch show --format=3Djson 'subject:two') +test_expect_equal "$output" '[[[{"id": "foo@two.com", "match": true, "excl= uded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/= notmuch/test/tmp.thread-replies/mail/msg-003", "timestamp": 978709437, "dat= e_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subjec= t": "two", "From": "Notmuch Test Suite ", "To":= "Notmuch Test Suite ", "Date": "Fri, 05 Jan 20= 01 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "cont= ent": "This is just a test message (#3)\n"}]}, [[{"id": "msg-004@notmuch-te= st-suite", "match": true, "excluded": false, "filename": "/home/aecay/devel= opment/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-004= ", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox",= "unread"], "headers": {"Subject": "Re: two", "From": "Notmuch Test Suite <= test_suite@notmuchmail.org>", "To": "Notmuch Test Suite ", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, = "content-type": "text/plain", "content": "This is just a test message (#4)\= n"}]}, []]]]]]' + +test_begin_subtest "Use In-Reply-To when no References" +test_subtest_known_broken +add_message '[id]=3D"foo@three.com"' \ + '[subject]=3D"three"' +add_message '[in-reply-to]=3D""' \ + '[subject]=3D"Re: three"' +output=3D$(notmuch show --format=3Djson 'subject:three') +test_expect_equal "$output" '[[[{"id": "foo@three.com", "match": true, "ex= cluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/sr= c/notmuch/test/tmp.thread-replies/mail/msg-005", "timestamp": 978709437, "d= ate_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subj= ect": "three", "From": "Notmuch Test Suite ", "= To": "Notmuch Test Suite ", "Date": "Fri, 05 Ja= n 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "= content": "This is just a test message (#5)\n"}]}, [[{"id": "msg-006@notmuc= h-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/d= evelopment/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg= -006", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inb= ox", "unread"], "headers": {"Subject": "Re: three", "From": "Notmuch Test S= uite ", "To": "Notmuch Test Suite ", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id= ": 1, "content-type": "text/plain", "content": "This is just a test message= (#6)\n"}]}, []]]]]]' + +test_begin_subtest "Use last Reference" +test_subtest_known_broken +add_message '[id]=3D"foo@four.com"' \ + '[subject]=3D"four"' +add_message '[id]=3D"bar@four.com"' \ + '[subject]=3D"not-four"' +add_message '[in-reply-to]=3D""' \ + '[references]=3D" "' \ + '[subject]=3D"neither"' +output=3D$(notmuch show --format=3Djson 'subject:four') +test_expect_equal "$output" '[[[{"id": "foo@four.com", "match": true, "exc= luded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src= /notmuch/test/tmp.thread-replies/mail/msg-007", "timestamp": 978709437, "da= te_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subje= ct": "four", "From": "Notmuch Test Suite ", "To= ": "Notmuch Test Suite ", "Date": "Fri, 05 Jan = 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "co= ntent": "This is just a test message (#7)\n"}]}, [[{"id": "msg-009@notmuch-= test-suite", "match": false, "excluded": false, "filename": "/home/aecay/de= velopment/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-= 009", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbo= x", "unread"], "headers": {"Subject": "neither", "From": "Notmuch Test Suit= e ", "To": "Notmuch Test Suite ", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": = 1, "content-type": "text/plain", "content": "This is just a test message (#= 9)\n"}]}, []]]]], [[{"id": "bar@four.com", "match": true, "excluded": false= , "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test= /tmp.thread-replies/mail/msg-008", "timestamp": 978709437, "date_relative":= "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "not-fou= r", "From": "Notmuch Test Suite ", "To": "Notmu= ch Test Suite ", "Date": "Fri, 05 Jan 2001 15:4= 3:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "= This is just a test message (#8)\n"}]}, []]]]' + + +test_done --=20 1.8.1.5 --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0002-lib-database.cc-change-how-the-parent-of-a-message-i.patch Content-Transfer-Encoding: quoted-printable >From 57739b9722a86ba50ef97ad7d5d21b3e5bc1a977 Mon Sep 17 00:00:00 2001 From: Aaron Ecay Date: Mon, 25 Feb 2013 18:46:41 -0500 Subject: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Presently, the code which finds the parent of a message as it is being added to the database assumes that the first Message-ID-like substring of the In-Reply-To header is the parent Message ID. Some mail clients, however, put stuff other than the Message-ID of the parent in the In-Reply-To header, such as the email address of the sender of the parent. This can fool notmuch. The updated algorithm prefers the last Message ID in the References header. The References header lists messages oldest-first, so the last Message ID is the parent (RFC2822, p. 24). The References header is also less likely to be in a non-standard syntax (http://cr.yp.to/immhf/thread.html, http://www.jwz.org/doc/threading.html). In case the References header is not to be found, fall back to the old behavior. V2 of this patch, incorporating feedback from Jani and (indirectly) Austin. --- lib/database.cc | 48 +++++++++++++++++++++++++++++++++--------------- test/thread-replies | 4 ---- 2 files changed, 33 insertions(+), 19 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 91d4329..52ed618 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, = const char **next) * 'message_id' in the result (to avoid mass confusion when a single * message references itself cyclically---and yes, mail messages are * not infrequent in the wild that do this---don't ask me why). -*/ -static void + * + * Return the last reference parsed, if it is not equal to message_id. + */ +static char * parse_references (void *ctx, const char *message_id, GHashTable *hash, @@ -511,7 +513,7 @@ parse_references (void *ctx, char *ref; =20 if (refs =3D=3D NULL || *refs =3D=3D '\0') - return; + return NULL; =20 while (*refs) { ref =3D _parse_message_id (ctx, refs, &refs); @@ -519,6 +521,17 @@ parse_references (void *ctx, if (ref && strcmp (ref, message_id)) g_hash_table_insert (hash, ref, NULL); } + + /* The return value of this function is used to add a parent + * reference to the database. We should avoid making a message + * its own parent, thus the following check. + */ + + if (ref && strcmp(ref, message_id)) { + return ref; + } else { + return NULL; + } } =20 notmuch_status_t @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_= database_t *notmuch, { GHashTable *parents =3D NULL; const char *refs, *in_reply_to, *in_reply_to_message_id; + const char *last_ref_message_id, *this_message_id; GList *l, *keys =3D NULL; notmuch_status_t ret =3D NOTMUCH_STATUS_SUCCESS; =20 parents =3D g_hash_table_new_full (g_str_hash, g_str_equal, _my_talloc_free_for_g_hash, NULL); + this_message_id =3D notmuch_message_get_message_id (message); =20 refs =3D notmuch_message_file_get_header (message_file, "references"); - parse_references (message, notmuch_message_get_message_id (message), - parents, refs); + last_ref_message_id =3D parse_references (message, + this_message_id, + parents, refs); =20 in_reply_to =3D notmuch_message_file_get_header (message_file, "in-rep= ly-to"); - parse_references (message, notmuch_message_get_message_id (message), - parents, in_reply_to); - - /* Carefully avoid adding any self-referential in-reply-to term. */ - in_reply_to_message_id =3D _parse_message_id (message, in_reply_to, NU= LL); - if (in_reply_to_message_id && - strcmp (in_reply_to_message_id, - notmuch_message_get_message_id (message))) - { + in_reply_to_message_id =3D parse_references (message, + this_message_id, + parents, in_reply_to); + + /* For the parent of this message, use the last message ID of the + * References header, if available. If not, fall back to the + * first message ID in the In-Reply-To header. */ + if (last_ref_message_id) { + _notmuch_message_add_term (message, "replyto", + last_ref_message_id); + } else if (in_reply_to_message_id) { _notmuch_message_add_term (message, "replyto", - _parse_message_id (message, in_reply_to, NULL)); + in_reply_to_message_id); } =20 keys =3D g_hash_table_get_keys (parents); diff --git a/test/thread-replies b/test/thread-replies index fd11a09..6dc6143 100755 --- a/test/thread-replies +++ b/test/thread-replies @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-com= pliant headers' . ./test-lib.sh =20 test_begin_subtest "Use References when In-Reply-To is broken" -test_subtest_known_broken add_message '[id]=3D"foo@one.com"' \ '[subject]=3Done' add_message '[in-reply-to]=3D"mumble"' \ @@ -21,7 +20,6 @@ output=3D$(notmuch show --format=3Djson 'subject:one') test_expect_equal "$output" '[[[{"id": "foo@one.com", "match": true, "excl= uded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/= notmuch/test/tmp.thread-replies/mail/msg-001", "timestamp": 978709437, "dat= e_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subjec= t": "one", "From": "Notmuch Test Suite ", "To":= "Notmuch Test Suite ", "Date": "Fri, 05 Jan 20= 01 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "cont= ent": "This is just a test message (#1)\n"}]}, [[{"id": "msg-002@notmuch-te= st-suite", "match": true, "excluded": false, "filename": "/home/aecay/devel= opment/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-002= ", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox",= "unread"], "headers": {"Subject": "Re: one", "From": "Notmuch Test Suite <= test_suite@notmuchmail.org>", "To": "Notmuch Test Suite ", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, = "content-type": "text/plain", "content": "This is just a test message (#2)\= n"}]}, []]]]]]' =20 test_begin_subtest "Prefer References to In-Reply-To" -test_subtest_known_broken add_message '[id]=3D"foo@two.com"' \ '[subject]=3Dtwo' add_message '[in-reply-to]=3D""' \ @@ -31,7 +29,6 @@ output=3D$(notmuch show --format=3Djson 'subject:two') test_expect_equal "$output" '[[[{"id": "foo@two.com", "match": true, "excl= uded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/= notmuch/test/tmp.thread-replies/mail/msg-003", "timestamp": 978709437, "dat= e_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subjec= t": "two", "From": "Notmuch Test Suite ", "To":= "Notmuch Test Suite ", "Date": "Fri, 05 Jan 20= 01 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "cont= ent": "This is just a test message (#3)\n"}]}, [[{"id": "msg-004@notmuch-te= st-suite", "match": true, "excluded": false, "filename": "/home/aecay/devel= opment/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-004= ", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox",= "unread"], "headers": {"Subject": "Re: two", "From": "Notmuch Test Suite <= test_suite@notmuchmail.org>", "To": "Notmuch Test Suite ", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, = "content-type": "text/plain", "content": "This is just a test message (#4)\= n"}]}, []]]]]]' =20 test_begin_subtest "Use In-Reply-To when no References" -test_subtest_known_broken add_message '[id]=3D"foo@three.com"' \ '[subject]=3D"three"' add_message '[in-reply-to]=3D""' \ @@ -40,7 +37,6 @@ output=3D$(notmuch show --format=3Djson 'subject:three') test_expect_equal "$output" '[[[{"id": "foo@three.com", "match": true, "ex= cluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/sr= c/notmuch/test/tmp.thread-replies/mail/msg-005", "timestamp": 978709437, "d= ate_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subj= ect": "three", "From": "Notmuch Test Suite ", "= To": "Notmuch Test Suite ", "Date": "Fri, 05 Ja= n 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "= content": "This is just a test message (#5)\n"}]}, [[{"id": "msg-006@notmuc= h-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/d= evelopment/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg= -006", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inb= ox", "unread"], "headers": {"Subject": "Re: three", "From": "Notmuch Test S= uite ", "To": "Notmuch Test Suite ", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id= ": 1, "content-type": "text/plain", "content": "This is just a test message= (#6)\n"}]}, []]]]]]' =20 test_begin_subtest "Use last Reference" -test_subtest_known_broken add_message '[id]=3D"foo@four.com"' \ '[subject]=3D"four"' add_message '[id]=3D"bar@four.com"' \ --=20 1.8.1.5 --=-=-= Content-Type: text/plain -- Aaron Ecay --=-=-=--