* [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated @ 2013-02-25 23:50 Aaron Ecay 2013-02-26 9:19 ` Jani Nikula 2013-02-28 19:41 ` Jani Nikula 0 siblings, 2 replies; 17+ messages in thread From: Aaron Ecay @ 2013-02-25 23:50 UTC (permalink / raw) To: notmuch Presently, the code which finds the parent of a message as it is being added to the database assumes that the first Message-ID-like substring of the In-Reply-To header is the parent Message ID. Some mail clients, however, put stuff other than the Message-ID of the parent in the In-Reply-To header, such as the email address of the sender of the parent. This can fool notmuch. The updated algorithm prefers the last Message ID in the References header. The References header lists messages oldest-first, so the last Message ID is the parent (RFC2822, p. 24). The References header is also less likely to be in a non-standard syntax (http://cr.yp.to/immhf/thread.html, http://www.jwz.org/doc/threading.html). In case the References header is not to be found, fall back to the old behavior. --- I especially notice this problem on public mailing lists, where certain people's messages always cause an "out-dent" of the threading, instead of being nested under whichever message they are replies to. Technically, putting non-Message-ID crud in the In-Reply-To field is a violation of RFC2822, but it appears that in practice the References header is respected more often than the In-Reply-To one. lib/database.cc | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 91d4329..cbf33ae 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) * 'message_id' in the result (to avoid mass confusion when a single * message references itself cyclically---and yes, mail messages are * not infrequent in the wild that do this---don't ask me why). + * + * Return the last reference parsed. */ -static void +static char * parse_references (void *ctx, const char *message_id, GHashTable *hash, @@ -511,7 +513,7 @@ parse_references (void *ctx, char *ref; if (refs == NULL || *refs == '\0') - return; + return NULL; while (*refs) { ref = _parse_message_id (ctx, refs, &refs); @@ -519,6 +521,8 @@ parse_references (void *ctx, if (ref && strcmp (ref, message_id)) g_hash_table_insert (hash, ref, NULL); } + + return ref; } notmuch_status_t @@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch) notmuch->last_doc_id++; if (notmuch->last_doc_id == 0) - INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); + INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); return notmuch->last_doc_id; } @@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, const char **thread_id) { GHashTable *parents = NULL; - const char *refs, *in_reply_to, *in_reply_to_message_id; + const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id; GList *l, *keys = NULL; notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; @@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, _my_talloc_free_for_g_hash, NULL); refs = notmuch_message_file_get_header (message_file, "references"); - parse_references (message, notmuch_message_get_message_id (message), - parents, refs); + last_ref_message_id = parse_references (message, + notmuch_message_get_message_id (message), + parents, refs); in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); parse_references (message, notmuch_message_get_message_id (message), parents, in_reply_to); - /* Carefully avoid adding any self-referential in-reply-to term. */ in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); + /* If the parent message ID from the Reply-To and References + * headers are different, use the References one. This is because + * the Reply-To header is more likely to be in an non-standard + * format. */ + if (in_reply_to_message_id && + last_ref_message_id && + strcmp (last_ref_message_id, in_reply_to_message_id)) { + in_reply_to_message_id = last_ref_message_id; + } + /* Carefully avoid adding any self-referential in-reply-to term. */ if (in_reply_to_message_id && strcmp (in_reply_to_message_id, notmuch_message_get_message_id (message))) { _notmuch_message_add_term (message, "replyto", - _parse_message_id (message, in_reply_to, NULL)); + in_reply_to_message_id); } keys = g_hash_table_get_keys (parents); -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-02-25 23:50 [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Aaron Ecay @ 2013-02-26 9:19 ` Jani Nikula 2013-02-26 19:40 ` Aaron Ecay 2013-02-28 19:41 ` Jani Nikula 1 sibling, 1 reply; 17+ messages in thread From: Jani Nikula @ 2013-02-26 9:19 UTC (permalink / raw) To: Aaron Ecay, notmuch On Tue, 26 Feb 2013, Aaron Ecay <aaronecay@gmail.com> wrote: > Presently, the code which finds the parent of a message as it is being > added to the database assumes that the first Message-ID-like substring > of the In-Reply-To header is the parent Message ID. Some mail clients, > however, put stuff other than the Message-ID of the parent in the > In-Reply-To header, such as the email address of the sender of the > parent. This can fool notmuch. Hi Aaron, please provide references to a few messages like this. If available on the notmuch list an id: reference would be best, but otherwise some archive that allows viewing full message headers or downloading the full message would be best. Thanks, Jani. > > The updated algorithm prefers the last Message ID in the References > header. The References header lists messages oldest-first, so the last > Message ID is the parent (RFC2822, p. 24). The References header is > also less likely to be in a non-standard > syntax (http://cr.yp.to/immhf/thread.html, > http://www.jwz.org/doc/threading.html). In case the References header > is not to be found, fall back to the old behavior. > --- > > I especially notice this problem on public mailing lists, where > certain people's messages always cause an "out-dent" of the threading, > instead of being nested under whichever message they are replies to. > > Technically, putting non-Message-ID crud in the In-Reply-To field is a > violation of RFC2822, but it appears that in practice the References > header is respected more often than the In-Reply-To one. > > lib/database.cc | 30 ++++++++++++++++++++++-------- > 1 file changed, 22 insertions(+), 8 deletions(-) > > diff --git a/lib/database.cc b/lib/database.cc > index 91d4329..cbf33ae 100644 > --- a/lib/database.cc > +++ b/lib/database.cc > @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) > * 'message_id' in the result (to avoid mass confusion when a single > * message references itself cyclically---and yes, mail messages are > * not infrequent in the wild that do this---don't ask me why). > + * > + * Return the last reference parsed. > */ > -static void > +static char * > parse_references (void *ctx, > const char *message_id, > GHashTable *hash, > @@ -511,7 +513,7 @@ parse_references (void *ctx, > char *ref; > > if (refs == NULL || *refs == '\0') > - return; > + return NULL; > > while (*refs) { > ref = _parse_message_id (ctx, refs, &refs); > @@ -519,6 +521,8 @@ parse_references (void *ctx, > if (ref && strcmp (ref, message_id)) > g_hash_table_insert (hash, ref, NULL); > } > + > + return ref; > } > > notmuch_status_t > @@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch) > notmuch->last_doc_id++; > > if (notmuch->last_doc_id == 0) > - INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); > + INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); > > return notmuch->last_doc_id; > } > @@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, > const char **thread_id) > { > GHashTable *parents = NULL; > - const char *refs, *in_reply_to, *in_reply_to_message_id; > + const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id; > GList *l, *keys = NULL; > notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; > > @@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, > _my_talloc_free_for_g_hash, NULL); > > refs = notmuch_message_file_get_header (message_file, "references"); > - parse_references (message, notmuch_message_get_message_id (message), > - parents, refs); > + last_ref_message_id = parse_references (message, > + notmuch_message_get_message_id (message), > + parents, refs); > > in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); > parse_references (message, notmuch_message_get_message_id (message), > parents, in_reply_to); > > - /* Carefully avoid adding any self-referential in-reply-to term. */ > in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); > + /* If the parent message ID from the Reply-To and References > + * headers are different, use the References one. This is because > + * the Reply-To header is more likely to be in an non-standard > + * format. */ > + if (in_reply_to_message_id && > + last_ref_message_id && > + strcmp (last_ref_message_id, in_reply_to_message_id)) { > + in_reply_to_message_id = last_ref_message_id; > + } > + /* Carefully avoid adding any self-referential in-reply-to term. */ > if (in_reply_to_message_id && > strcmp (in_reply_to_message_id, > notmuch_message_get_message_id (message))) > { > _notmuch_message_add_term (message, "replyto", > - _parse_message_id (message, in_reply_to, NULL)); > + in_reply_to_message_id); > } > > keys = g_hash_table_get_keys (parents); > -- > 1.8.1.4 > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-02-26 9:19 ` Jani Nikula @ 2013-02-26 19:40 ` Aaron Ecay 0 siblings, 0 replies; 17+ messages in thread From: Aaron Ecay @ 2013-02-26 19:40 UTC (permalink / raw) To: Jani Nikula, notmuch 2013ko otsailak 26an, Jani Nikula-ek idatzi zuen: > Hi Aaron, please provide references to a few messages like this. If > available on the notmuch list an id: reference would be best, but > otherwise some archive that allows viewing full message headers or > downloading the full message would be best. Here’s a message from the notmuch list that has passed through gmane, which inserts References but not In-Reply-To headers: id:877h0sa207.fsf@fester.com . Here’s a link to one with a borked In-Reply-To header: http://article.gmane.org/gmane.emacs.orgmode/66616/raw -- Aaron Ecay ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-02-25 23:50 [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Aaron Ecay 2013-02-26 9:19 ` Jani Nikula @ 2013-02-28 19:41 ` Jani Nikula 2013-03-01 17:06 ` Jani Nikula 1 sibling, 1 reply; 17+ messages in thread From: Jani Nikula @ 2013-02-28 19:41 UTC (permalink / raw) To: Aaron Ecay, notmuch Hi Aaron - On Tue, 26 Feb 2013, Aaron Ecay <aaronecay@gmail.com> wrote: > Presently, the code which finds the parent of a message as it is being > added to the database assumes that the first Message-ID-like substring > of the In-Reply-To header is the parent Message ID. Some mail clients, > however, put stuff other than the Message-ID of the parent in the > In-Reply-To header, such as the email address of the sender of the > parent. This can fool notmuch. I think the background is that RFC 822 defines In-Reply-To (and References too for that matter) as *(phrase / msg-id), while RFC 2822 defines them as 1*msg-id. I'd like something about RFC 822 being mentioned in the commit message. The problem in the gmane message you link to in id:87liaa3luc.fsf@gmail.com is likely related to the FAQ item 05.26 "How do I fix a bogus In-Reply-To or missing References field?" in the MH FAQ http://www.newt.com/faq/mh.html. > The updated algorithm prefers the last Message ID in the References > header. The References header lists messages oldest-first, so the last > Message ID is the parent (RFC2822, p. 24). The References header is > also less likely to be in a non-standard > syntax (http://cr.yp.to/immhf/thread.html, > http://www.jwz.org/doc/threading.html). In case the References header > is not to be found, fall back to the old behavior. > --- > > I especially notice this problem on public mailing lists, where > certain people's messages always cause an "out-dent" of the threading, > instead of being nested under whichever message they are replies to. > > Technically, putting non-Message-ID crud in the In-Reply-To field is a > violation of RFC2822, but it appears that in practice the References > header is respected more often than the In-Reply-To one. > > lib/database.cc | 30 ++++++++++++++++++++++-------- > 1 file changed, 22 insertions(+), 8 deletions(-) > > diff --git a/lib/database.cc b/lib/database.cc > index 91d4329..cbf33ae 100644 > --- a/lib/database.cc > +++ b/lib/database.cc > @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) > * 'message_id' in the result (to avoid mass confusion when a single > * message references itself cyclically---and yes, mail messages are > * not infrequent in the wild that do this---don't ask me why). > + * > + * Return the last reference parsed. > */ > -static void > +static char * > parse_references (void *ctx, > const char *message_id, > GHashTable *hash, > @@ -511,7 +513,7 @@ parse_references (void *ctx, > char *ref; > > if (refs == NULL || *refs == '\0') > - return; > + return NULL; > > while (*refs) { > ref = _parse_message_id (ctx, refs, &refs); > @@ -519,6 +521,8 @@ parse_references (void *ctx, > if (ref && strcmp (ref, message_id)) > g_hash_table_insert (hash, ref, NULL); > } > + > + return ref; As the comment for the function says, we explicitly avoid including self-references. I think I'd err on the safe side and return NULL if the last ref equals message-id. > } > > notmuch_status_t > @@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch) > notmuch->last_doc_id++; > > if (notmuch->last_doc_id == 0) > - INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); > + INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); I don't know how you got this non-change hunk here, but please remove it. :) > > return notmuch->last_doc_id; > } > @@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, > const char **thread_id) > { > GHashTable *parents = NULL; > - const char *refs, *in_reply_to, *in_reply_to_message_id; > + const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id; > GList *l, *keys = NULL; > notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; > > @@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, > _my_talloc_free_for_g_hash, NULL); > > refs = notmuch_message_file_get_header (message_file, "references"); > - parse_references (message, notmuch_message_get_message_id (message), > - parents, refs); > + last_ref_message_id = parse_references (message, > + notmuch_message_get_message_id (message), > + parents, refs); > > in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); > parse_references (message, notmuch_message_get_message_id (message), > parents, in_reply_to); > > - /* Carefully avoid adding any self-referential in-reply-to term. */ > in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); I wonder if you should reuse your parse_references() change here, so you'd set in_reply_to_message_id to the last message-id in In-Reply-To. This might tackle some of the problematic cases directly, but should still be all right per RFC 2822. I didn't verify how the parser handles an RFC 2822 violating free form header though. > + /* If the parent message ID from the Reply-To and References > + * headers are different, use the References one. This is because > + * the Reply-To header is more likely to be in an non-standard > + * format. */ > + if (in_reply_to_message_id && > + last_ref_message_id && > + strcmp (last_ref_message_id, in_reply_to_message_id)) { > + in_reply_to_message_id = last_ref_message_id; > + } I suggest adding an else if branch (or revamp the above if condition) to tackle the missing In-Reply-To header: else if (!in_reply_to_message_id && last_ref_message_id) { in_reply_to_message_id = last_ref_message_id; } > + /* Carefully avoid adding any self-referential in-reply-to term. */ > if (in_reply_to_message_id && > strcmp (in_reply_to_message_id, > notmuch_message_get_message_id (message))) If you change parse_references() to be careful about never returning a self-reference, and set in_reply_to_message_id from there, I think you can drop the strcmp here. And move the comment to an appropriate place. Thanks for the patch, I think we should do this. But this is an area where I think we need to be careful, so another reviewer wouldn't harm. Some tests for this would be good too, obviously. BR, Jani. > { > _notmuch_message_add_term (message, "replyto", > - _parse_message_id (message, in_reply_to, NULL)); > + in_reply_to_message_id); > } > > keys = g_hash_table_get_keys (parents); > -- > 1.8.1.4 > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-02-28 19:41 ` Jani Nikula @ 2013-03-01 17:06 ` Jani Nikula 2013-03-03 23:46 ` Aaron Ecay 0 siblings, 1 reply; 17+ messages in thread From: Jani Nikula @ 2013-03-01 17:06 UTC (permalink / raw) To: Aaron Ecay, notmuch On Thu, 28 Feb 2013, Jani Nikula <jani@nikula.org> wrote: > Hi Aaron - > > On Tue, 26 Feb 2013, Aaron Ecay <aaronecay@gmail.com> wrote: >> Presently, the code which finds the parent of a message as it is being >> added to the database assumes that the first Message-ID-like substring >> of the In-Reply-To header is the parent Message ID. Some mail clients, >> however, put stuff other than the Message-ID of the parent in the >> In-Reply-To header, such as the email address of the sender of the >> parent. This can fool notmuch. > > I think the background is that RFC 822 defines In-Reply-To (and > References too for that matter) as *(phrase / msg-id), while RFC 2822 > defines them as 1*msg-id. I'd like something about RFC 822 being > mentioned in the commit message. > > The problem in the gmane message you link to in > id:87liaa3luc.fsf@gmail.com is likely related to the FAQ item 05.26 "How > do I fix a bogus In-Reply-To or missing References field?" in the MH FAQ > http://www.newt.com/faq/mh.html. > >> The updated algorithm prefers the last Message ID in the References >> header. The References header lists messages oldest-first, so the last >> Message ID is the parent (RFC2822, p. 24). The References header is >> also less likely to be in a non-standard >> syntax (http://cr.yp.to/immhf/thread.html, >> http://www.jwz.org/doc/threading.html). In case the References header >> is not to be found, fall back to the old behavior. >> --- >> >> I especially notice this problem on public mailing lists, where >> certain people's messages always cause an "out-dent" of the threading, >> instead of being nested under whichever message they are replies to. >> >> Technically, putting non-Message-ID crud in the In-Reply-To field is a >> violation of RFC2822, but it appears that in practice the References >> header is respected more often than the In-Reply-To one. >> >> lib/database.cc | 30 ++++++++++++++++++++++-------- >> 1 file changed, 22 insertions(+), 8 deletions(-) >> >> diff --git a/lib/database.cc b/lib/database.cc >> index 91d4329..cbf33ae 100644 >> --- a/lib/database.cc >> +++ b/lib/database.cc >> @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) >> * 'message_id' in the result (to avoid mass confusion when a single >> * message references itself cyclically---and yes, mail messages are >> * not infrequent in the wild that do this---don't ask me why). >> + * >> + * Return the last reference parsed. >> */ >> -static void >> +static char * >> parse_references (void *ctx, >> const char *message_id, >> GHashTable *hash, >> @@ -511,7 +513,7 @@ parse_references (void *ctx, >> char *ref; >> >> if (refs == NULL || *refs == '\0') >> - return; >> + return NULL; >> >> while (*refs) { >> ref = _parse_message_id (ctx, refs, &refs); >> @@ -519,6 +521,8 @@ parse_references (void *ctx, >> if (ref && strcmp (ref, message_id)) >> g_hash_table_insert (hash, ref, NULL); >> } >> + >> + return ref; > > As the comment for the function says, we explicitly avoid including > self-references. I think I'd err on the safe side and return NULL if the > last ref equals message-id. > >> } >> >> notmuch_status_t >> @@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch) >> notmuch->last_doc_id++; >> >> if (notmuch->last_doc_id == 0) >> - INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); >> + INTERNAL_ERROR ("Xapian document IDs are exhausted.\n"); > > I don't know how you got this non-change hunk here, but please remove > it. :) > >> >> return notmuch->last_doc_id; >> } >> @@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, >> const char **thread_id) >> { >> GHashTable *parents = NULL; >> - const char *refs, *in_reply_to, *in_reply_to_message_id; >> + const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id; >> GList *l, *keys = NULL; >> notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; >> >> @@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, >> _my_talloc_free_for_g_hash, NULL); >> >> refs = notmuch_message_file_get_header (message_file, "references"); >> - parse_references (message, notmuch_message_get_message_id (message), >> - parents, refs); >> + last_ref_message_id = parse_references (message, >> + notmuch_message_get_message_id (message), >> + parents, refs); >> >> in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); >> parse_references (message, notmuch_message_get_message_id (message), >> parents, in_reply_to); >> >> - /* Carefully avoid adding any self-referential in-reply-to term. */ >> in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); > > I wonder if you should reuse your parse_references() change here, so > you'd set in_reply_to_message_id to the last message-id in > In-Reply-To. This might tackle some of the problematic cases directly, > but should still be all right per RFC 2822. I didn't verify how the > parser handles an RFC 2822 violating free form header though. Strike that based on http://www.jwz.org/doc/threading.html: "If there are multiple things in In-Reply-To that look like Message-IDs, only use the first one of them: odds are that the later ones are actually email addresses, not IDs." >> + /* If the parent message ID from the Reply-To and References >> + * headers are different, use the References one. This is because >> + * the Reply-To header is more likely to be in an non-standard >> + * format. */ >> + if (in_reply_to_message_id && >> + last_ref_message_id && >> + strcmp (last_ref_message_id, in_reply_to_message_id)) { >> + in_reply_to_message_id = last_ref_message_id; >> + } I talked to Austin (CC) about the patch on IRC, and his comment was, perceptive as always: 23:38 amdragon Is the logic in that patch equivalent to always using the last message ID in references unless there is no references header? Seems like it is, but in a convoluted way. And that's actually the case, isn't it? To make the code reflect that, you should use last_ref_message_id, and if that's NULL, fallback to in_reply_to_message_id. > I suggest adding an else if branch (or revamp the above if condition) to > tackle the missing In-Reply-To header: > > else if (!in_reply_to_message_id && last_ref_message_id) { > in_reply_to_message_id = last_ref_message_id; > } Strike that, it should be the other way round. > >> + /* Carefully avoid adding any self-referential in-reply-to term. */ >> if (in_reply_to_message_id && >> strcmp (in_reply_to_message_id, >> notmuch_message_get_message_id (message))) > > If you change parse_references() to be careful about never returning a > self-reference, and set in_reply_to_message_id from there, I think you > can drop the strcmp here. And move the comment to an appropriate place. Strike that. > Thanks for the patch, I think we should do this. But this is an area > where I think we need to be careful, so another reviewer wouldn't > harm. Some tests for this would be good too, obviously. I got this part right. :) Jani. > > > BR, > Jani. > >> { >> _notmuch_message_add_term (message, "replyto", >> - _parse_message_id (message, in_reply_to, NULL)); >> + in_reply_to_message_id); >> } >> >> keys = g_hash_table_get_keys (parents); >> -- >> 1.8.1.4 >> _______________________________________________ >> notmuch mailing list >> notmuch@notmuchmail.org >> http://notmuchmail.org/mailman/listinfo/notmuch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-03-01 17:06 ` Jani Nikula @ 2013-03-03 23:46 ` Aaron Ecay 2013-03-03 23:55 ` Aaron Ecay 0 siblings, 1 reply; 17+ messages in thread From: Aaron Ecay @ 2013-03-03 23:46 UTC (permalink / raw) To: Jani Nikula, notmuch Hi Jani, Thanks to you and Austin for the comments. 2013ko martxoak 1an, Jani Nikula-ek idatzi zuen: >> I think the background is that RFC 822 defines In-Reply-To (and >> References too for that matter) as *(phrase / msg-id), while RFC 2822 >> defines them as 1*msg-id. I'd like something about RFC 822 being >> mentioned in the commit message. >> >> The problem in the gmane message you link to in >> id:87liaa3luc.fsf@gmail.com is likely related to the FAQ item 05.26 >> "How do I fix a bogus In-Reply-To or missing References field?" in >> the MH FAQ http://www.newt.com/faq/mh.html. Likely yes. But I think notmuch should handle these messages, since they are seen in the wild (and I don’t think you disagree with me on this point?) >> >> As the comment for the function says, we explicitly avoid including >> self-references. I think I'd err on the safe side and return NULL if >> the last ref equals message-id. Done. >> >> I don't know how you got this non-change hunk here, but please remove >> it. :) That’s what I get for setting my editor to delete trailing whitespace on save (then not reading outgoing patches carefully). Fixed. >> I wonder if you should reuse your parse_references() change here, so >> you'd set in_reply_to_message_id to the last message-id in >> In-Reply-To. This might tackle some of the problematic cases >> directly, but should still be all right per RFC 2822. I didn't verify >> how the parser handles an RFC 2822 violating free form header though. > > Strike that based on http://www.jwz.org/doc/threading.html: > > "If there are multiple things in In-Reply-To that look like > Message-IDs, only use the first one of them: odds are that the later > ones are actually email addresses, not IDs." Hmm. I think it’s a toss-up which of multiple quasi-message-ids is the real one. In the email message example I linked upthread, it was the last one that was real. I decided to use the last one, because it allows the self-reference checking to be pushed entirely into parse_references. If you feel strongly that we should use the first one, I can change it back. > I talked to Austin (CC) about the patch on IRC, and his comment was, > perceptive as always: > > 23:38 amdragon Is the logic in that patch equivalent to always using > the last message ID in references unless there is no references > header? Seems like it is, but in a convoluted way. > > And that's actually the case, isn't it? To make the code reflect that, > you should use last_ref_message_id, and if that's NULL, fallback to > in_reply_to_message_id. Yes. Fixed. > >> I suggest adding an else if branch (or revamp the above if condition) >> to tackle the missing In-Reply-To header: >> >> else if (!in_reply_to_message_id && last_ref_message_id) { >> in_reply_to_message_id = last_ref_message_id; } > > Strike that, it should be the other way round. Now that the self-reference check is in parse_references, the conditional is much simpler. One additional change I made in this version was to factor out 3 calls to “notmuch_message_get_message_id (message)” into a variable inside the _notmuch_database_link_message_to_parents function, for a small boost to readability (and perhaps speed, depending on how clever the compiler is I guess). I also added tests – those are the first of two patches that will follow this email, the second being the code to make them pass. -- Aaron Ecay ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-03-03 23:46 ` Aaron Ecay @ 2013-03-03 23:55 ` Aaron Ecay 2013-03-04 2:45 ` David Bremner 2013-03-04 5:59 ` [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Tomi Ollila 0 siblings, 2 replies; 17+ messages in thread From: Aaron Ecay @ 2013-03-03 23:55 UTC (permalink / raw) To: Jani Nikula, notmuch [-- Attachment #1: Type: text/plain, Size: 254 bytes --] git send-email is mad about lines >998 characters in the test patch, so I’m sending the patches as attachments to this email. (Is there a better way to include the expected output of a notmuch command which outputs long lines in a test script?) [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-test-add-tests-for-the-handling-of-References-and-In.patch --] [-- Type: text/x-patch, Size: 7433 bytes --] From 23836241dd304b98f2a05803fbb5a5a94f563050 Mon Sep 17 00:00:00 2001 From: Aaron Ecay <aaronecay@gmail.com> Date: Sun, 3 Mar 2013 18:14:07 -0500 Subject: [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers These tests are known_broken, the following commit fixes them. --- test/thread-replies | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100755 test/thread-replies diff --git a/test/thread-replies b/test/thread-replies new file mode 100755 index 0000000..fd11a09 --- /dev/null +++ b/test/thread-replies @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# +# Copyright (c) 2013 Aaron Ecay +# + +test_description='test of proper handling of in-reply-to and references headers + +This test makes sure that the thread structure in the notmuch database is +constructed properly, even in the presence of non-RFC-compliant headers' + +. ./test-lib.sh + +test_begin_subtest "Use References when In-Reply-To is broken" +test_subtest_known_broken +add_message '[id]="foo@one.com"' \ + '[subject]=one' +add_message '[in-reply-to]="mumble"' \ + '[references]="<foo@one.com>"' \ + '[subject]="Re: one"' +output=$(notmuch show --format=json 'subject:one') +test_expect_equal "$output" '[[[{"id": "foo@one.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-001", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "one", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#1)\n"}]}, [[{"id": "msg-002@notmuch-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-002", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "Re: one", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#2)\n"}]}, []]]]]]' + +test_begin_subtest "Prefer References to In-Reply-To" +test_subtest_known_broken +add_message '[id]="foo@two.com"' \ + '[subject]=two' +add_message '[in-reply-to]="<bar@baz.com>"' \ + '[references]="<foo@two.com>"' \ + '[subject]="Re: two"' +output=$(notmuch show --format=json 'subject:two') +test_expect_equal "$output" '[[[{"id": "foo@two.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-003", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "two", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#3)\n"}]}, [[{"id": "msg-004@notmuch-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-004", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "Re: two", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#4)\n"}]}, []]]]]]' + +test_begin_subtest "Use In-Reply-To when no References" +test_subtest_known_broken +add_message '[id]="foo@three.com"' \ + '[subject]="three"' +add_message '[in-reply-to]="<foo@three.com>"' \ + '[subject]="Re: three"' +output=$(notmuch show --format=json 'subject:three') +test_expect_equal "$output" '[[[{"id": "foo@three.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-005", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "three", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#5)\n"}]}, [[{"id": "msg-006@notmuch-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-006", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "Re: three", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#6)\n"}]}, []]]]]]' + +test_begin_subtest "Use last Reference" +test_subtest_known_broken +add_message '[id]="foo@four.com"' \ + '[subject]="four"' +add_message '[id]="bar@four.com"' \ + '[subject]="not-four"' +add_message '[in-reply-to]="<baz@four.com>"' \ + '[references]="<baz@four.com> <foo@four.com>"' \ + '[subject]="neither"' +output=$(notmuch show --format=json 'subject:four') +test_expect_equal "$output" '[[[{"id": "foo@four.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-007", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "four", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#7)\n"}]}, [[{"id": "msg-009@notmuch-test-suite", "match": false, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-009", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "neither", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#9)\n"}]}, []]]]], [[{"id": "bar@four.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-008", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "not-four", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#8)\n"}]}, []]]]' + + +test_done -- 1.8.1.5 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-lib-database.cc-change-how-the-parent-of-a-message-i.patch --] [-- Type: text/x-patch, Size: 9294 bytes --] From 57739b9722a86ba50ef97ad7d5d21b3e5bc1a977 Mon Sep 17 00:00:00 2001 From: Aaron Ecay <aaronecay@gmail.com> Date: Mon, 25 Feb 2013 18:46:41 -0500 Subject: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Presently, the code which finds the parent of a message as it is being added to the database assumes that the first Message-ID-like substring of the In-Reply-To header is the parent Message ID. Some mail clients, however, put stuff other than the Message-ID of the parent in the In-Reply-To header, such as the email address of the sender of the parent. This can fool notmuch. The updated algorithm prefers the last Message ID in the References header. The References header lists messages oldest-first, so the last Message ID is the parent (RFC2822, p. 24). The References header is also less likely to be in a non-standard syntax (http://cr.yp.to/immhf/thread.html, http://www.jwz.org/doc/threading.html). In case the References header is not to be found, fall back to the old behavior. V2 of this patch, incorporating feedback from Jani and (indirectly) Austin. --- lib/database.cc | 48 +++++++++++++++++++++++++++++++++--------------- test/thread-replies | 4 ---- 2 files changed, 33 insertions(+), 19 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 91d4329..52ed618 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) * 'message_id' in the result (to avoid mass confusion when a single * message references itself cyclically---and yes, mail messages are * not infrequent in the wild that do this---don't ask me why). -*/ -static void + * + * Return the last reference parsed, if it is not equal to message_id. + */ +static char * parse_references (void *ctx, const char *message_id, GHashTable *hash, @@ -511,7 +513,7 @@ parse_references (void *ctx, char *ref; if (refs == NULL || *refs == '\0') - return; + return NULL; while (*refs) { ref = _parse_message_id (ctx, refs, &refs); @@ -519,6 +521,17 @@ parse_references (void *ctx, if (ref && strcmp (ref, message_id)) g_hash_table_insert (hash, ref, NULL); } + + /* The return value of this function is used to add a parent + * reference to the database. We should avoid making a message + * its own parent, thus the following check. + */ + + if (ref && strcmp(ref, message_id)) { + return ref; + } else { + return NULL; + } } notmuch_status_t @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, { GHashTable *parents = NULL; const char *refs, *in_reply_to, *in_reply_to_message_id; + const char *last_ref_message_id, *this_message_id; GList *l, *keys = NULL; notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; parents = g_hash_table_new_full (g_str_hash, g_str_equal, _my_talloc_free_for_g_hash, NULL); + this_message_id = notmuch_message_get_message_id (message); refs = notmuch_message_file_get_header (message_file, "references"); - parse_references (message, notmuch_message_get_message_id (message), - parents, refs); + last_ref_message_id = parse_references (message, + this_message_id, + parents, refs); in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); - parse_references (message, notmuch_message_get_message_id (message), - parents, in_reply_to); - - /* Carefully avoid adding any self-referential in-reply-to term. */ - in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); - if (in_reply_to_message_id && - strcmp (in_reply_to_message_id, - notmuch_message_get_message_id (message))) - { + in_reply_to_message_id = parse_references (message, + this_message_id, + parents, in_reply_to); + + /* For the parent of this message, use the last message ID of the + * References header, if available. If not, fall back to the + * first message ID in the In-Reply-To header. */ + if (last_ref_message_id) { + _notmuch_message_add_term (message, "replyto", + last_ref_message_id); + } else if (in_reply_to_message_id) { _notmuch_message_add_term (message, "replyto", - _parse_message_id (message, in_reply_to, NULL)); + in_reply_to_message_id); } keys = g_hash_table_get_keys (parents); diff --git a/test/thread-replies b/test/thread-replies index fd11a09..6dc6143 100755 --- a/test/thread-replies +++ b/test/thread-replies @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-compliant headers' . ./test-lib.sh test_begin_subtest "Use References when In-Reply-To is broken" -test_subtest_known_broken add_message '[id]="foo@one.com"' \ '[subject]=one' add_message '[in-reply-to]="mumble"' \ @@ -21,7 +20,6 @@ output=$(notmuch show --format=json 'subject:one') test_expect_equal "$output" '[[[{"id": "foo@one.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-001", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "one", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#1)\n"}]}, [[{"id": "msg-002@notmuch-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-002", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "Re: one", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#2)\n"}]}, []]]]]]' test_begin_subtest "Prefer References to In-Reply-To" -test_subtest_known_broken add_message '[id]="foo@two.com"' \ '[subject]=two' add_message '[in-reply-to]="<bar@baz.com>"' \ @@ -31,7 +29,6 @@ output=$(notmuch show --format=json 'subject:two') test_expect_equal "$output" '[[[{"id": "foo@two.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-003", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "two", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#3)\n"}]}, [[{"id": "msg-004@notmuch-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-004", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "Re: two", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#4)\n"}]}, []]]]]]' test_begin_subtest "Use In-Reply-To when no References" -test_subtest_known_broken add_message '[id]="foo@three.com"' \ '[subject]="three"' add_message '[in-reply-to]="<foo@three.com>"' \ @@ -40,7 +37,6 @@ output=$(notmuch show --format=json 'subject:three') test_expect_equal "$output" '[[[{"id": "foo@three.com", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-005", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "three", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#5)\n"}]}, [[{"id": "msg-006@notmuch-test-suite", "match": true, "excluded": false, "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-006", "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], "headers": {"Subject": "Re: three", "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, "content-type": "text/plain", "content": "This is just a test message (#6)\n"}]}, []]]]]]' test_begin_subtest "Use last Reference" -test_subtest_known_broken add_message '[id]="foo@four.com"' \ '[subject]="four"' add_message '[id]="bar@four.com"' \ -- 1.8.1.5 [-- Attachment #4: Type: text/plain, Size: 17 bytes --] -- Aaron Ecay ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-03-03 23:55 ` Aaron Ecay @ 2013-03-04 2:45 ` David Bremner 2013-03-06 3:31 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay 2013-03-04 5:59 ` [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Tomi Ollila 1 sibling, 1 reply; 17+ messages in thread From: David Bremner @ 2013-03-04 2:45 UTC (permalink / raw) To: Aaron Ecay, Jani Nikula, notmuch Aaron Ecay <aaronecay@gmail.com> writes: > git send-email is mad about lines >998 characters in the test patch, so > I’m sending the patches as attachments to this email. (Is there a > better way to include the expected output of a notmuch command which > outputs long lines in a test script?) What about pretty printing the json in the source? test_expect_equal_json canonicalizes both arguments before comparing, so it should be ok. d ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers 2013-03-04 2:45 ` David Bremner @ 2013-03-06 3:31 ` Aaron Ecay 2013-03-06 3:31 ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay ` (3 more replies) 0 siblings, 4 replies; 17+ messages in thread From: Aaron Ecay @ 2013-03-06 3:31 UTC (permalink / raw) To: notmuch These tests are known_broken, the following commit fixes them. --- Thanks to David and Tomi for pointing out test_expect_equal_json. In the process of implementing that, I discovered notmuch_json_show_sanitize, which I had also not been using. This patch fixes the test to use both these conveniences. The second patch is not changed substantively, but I am resending it for tidiness. test/thread-replies | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 144 insertions(+) create mode 100755 test/thread-replies diff --git a/test/thread-replies b/test/thread-replies new file mode 100755 index 0000000..a902691 --- /dev/null +++ b/test/thread-replies @@ -0,0 +1,144 @@ +#!/usr/bin/env bash +# +# Copyright (c) 2013 Aaron Ecay +# + +test_description='test of proper handling of in-reply-to and references headers + +This test makes sure that the thread structure in the notmuch database is +constructed properly, even in the presence of non-RFC-compliant headers' + +. ./test-lib.sh + +test_begin_subtest "Use References when In-Reply-To is broken" +test_subtest_known_broken +add_message '[id]="foo@one.com"' \ + '[subject]=one' +add_message '[in-reply-to]="mumble"' \ + '[references]="<foo@one.com>"' \ + '[subject]="Re: one"' +output=$(notmuch show --format=json 'subject:one' | notmuch_json_show_sanitize) +expected='[[[{"id": "foo@one.com", + "match": true, + "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-001", + "timestamp": 978709437, + "date_relative": "2001-01-05", + "tags": ["inbox", "unread"], + "headers": {"Subject": "one", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, + "body": [{"id": 1, + "content-type": "text/plain", + "content": "This is just a test message (#1)\n"}]}, + [[{"id": "msg-002@notmuch-test-suite", + "match": true, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-002", + "timestamp": 978709437, "date_relative": "2001-01-05", + "tags": ["inbox", "unread"], "headers": {"Subject": "Re: one", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, + "body": [{"id": 1, "content-type": "text/plain", + "content": "This is just a test message (#2)\n"}]}, []]]]]]' +expected=`echo "$expected" | notmuch_json_show_sanitize` +test_expect_equal_json "$output" "$expected" + +test_begin_subtest "Prefer References to In-Reply-To" +test_subtest_known_broken +add_message '[id]="foo@two.com"' \ + '[subject]=two' +add_message '[in-reply-to]="<bar@baz.com>"' \ + '[references]="<foo@two.com>"' \ + '[subject]="Re: two"' +output=$(notmuch show --format=json 'subject:two' | notmuch_json_show_sanitize) +expected='[[[{"id": "foo@two.com", + "match": true, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-003", + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], + "headers": {"Subject": "two", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, + "body": [{"id": 1, "content-type": "text/plain", + "content": "This is just a test message (#3)\n"}]}, + [[{"id": "msg-004@notmuch-test-suite", "match": true, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-004", + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], + "headers": {"Subject": "Re: two", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, + "body": [{"id": 1, + "content-type": "text/plain", "content": "This is just a test message (#4)\n"}]}, + []]]]]]' +expected=`echo "$expected" | notmuch_json_show_sanitize` +test_expect_equal_json "$output" "$expected" + +test_begin_subtest "Use In-Reply-To when no References" +test_subtest_known_broken +add_message '[id]="foo@three.com"' \ + '[subject]="three"' +add_message '[in-reply-to]="<foo@three.com>"' \ + '[subject]="Re: three"' +output=$(notmuch show --format=json 'subject:three' | notmuch_json_show_sanitize) +expected='[[[{"id": "foo@three.com", "match": true, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-005", + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], + "headers": {"Subject": "three", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, + "content-type": "text/plain", "content": "This is just a test message (#5)\n"}]}, + [[{"id": "msg-006@notmuch-test-suite", "match": true, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-006", + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], + "headers": {"Subject": "Re: three", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, + "content-type": "text/plain", "content": "This is just a test message (#6)\n"}]}, + []]]]]]' +expected=`echo "$expected" | notmuch_json_show_sanitize` +test_expect_equal_json "$output" "$expected" + +test_begin_subtest "Use last Reference" +test_subtest_known_broken +add_message '[id]="foo@four.com"' \ + '[subject]="four"' +add_message '[id]="bar@four.com"' \ + '[subject]="not-four"' +add_message '[in-reply-to]="<baz@four.com>"' \ + '[references]="<baz@four.com> <foo@four.com>"' \ + '[subject]="neither"' +output=$(notmuch show --format=json 'subject:four' | notmuch_json_show_sanitize) +expected='[[[{"id": "foo@four.com", "match": true, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-007", + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], + "headers": {"Subject": "four", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, + "content-type": "text/plain", "content": "This is just a test message (#7)\n"}]}, + [[{"id": "msg-009@notmuch-test-suite", "match": false, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-009", + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], + "headers": {"Subject": "neither", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, + "content-type": "text/plain", "content": "This is just a test message (#9)\n"}]}, + []]]]], [[{"id": "bar@four.com", "match": true, "excluded": false, + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-008", + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], + "headers": {"Subject": "not-four", + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, + "content-type": "text/plain", "content": "This is just a test message (#8)\n"}]}, []]]]' +expected=`echo "$expected" | notmuch_json_show_sanitize` +test_expect_equal_json "$output" "$expected" + + +test_done -- 1.8.1.5 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated 2013-03-06 3:31 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay @ 2013-03-06 3:31 ` Aaron Ecay 2013-05-04 14:10 ` David Bremner ` (2 more replies) 2013-05-04 14:01 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers David Bremner ` (2 subsequent siblings) 3 siblings, 3 replies; 17+ messages in thread From: Aaron Ecay @ 2013-03-06 3:31 UTC (permalink / raw) To: notmuch Presently, the code which finds the parent of a message as it is being added to the database assumes that the first Message-ID-like substring of the In-Reply-To header is the parent Message ID. Some mail clients, however, put stuff other than the Message-ID of the parent in the In-Reply-To header, such as the email address of the sender of the parent. This can fool notmuch. The updated algorithm prefers the last Message ID in the References header. The References header lists messages oldest-first, so the last Message ID is the parent (RFC2822, p. 24). The References header is also less likely to be in a non-standard syntax (http://cr.yp.to/immhf/thread.html, http://www.jwz.org/doc/threading.html). In case the References header is not to be found, fall back to the old behavior. V2 of this patch, incorporating feedback from Jani and (indirectly) Austin. --- lib/database.cc | 48 +++++++++++++++++++++++++++++++++--------------- test/thread-replies | 4 ---- 2 files changed, 33 insertions(+), 19 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 91d4329..52ed618 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) * 'message_id' in the result (to avoid mass confusion when a single * message references itself cyclically---and yes, mail messages are * not infrequent in the wild that do this---don't ask me why). -*/ -static void + * + * Return the last reference parsed, if it is not equal to message_id. + */ +static char * parse_references (void *ctx, const char *message_id, GHashTable *hash, @@ -511,7 +513,7 @@ parse_references (void *ctx, char *ref; if (refs == NULL || *refs == '\0') - return; + return NULL; while (*refs) { ref = _parse_message_id (ctx, refs, &refs); @@ -519,6 +521,17 @@ parse_references (void *ctx, if (ref && strcmp (ref, message_id)) g_hash_table_insert (hash, ref, NULL); } + + /* The return value of this function is used to add a parent + * reference to the database. We should avoid making a message + * its own parent, thus the following check. + */ + + if (ref && strcmp(ref, message_id)) { + return ref; + } else { + return NULL; + } } notmuch_status_t @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, { GHashTable *parents = NULL; const char *refs, *in_reply_to, *in_reply_to_message_id; + const char *last_ref_message_id, *this_message_id; GList *l, *keys = NULL; notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; parents = g_hash_table_new_full (g_str_hash, g_str_equal, _my_talloc_free_for_g_hash, NULL); + this_message_id = notmuch_message_get_message_id (message); refs = notmuch_message_file_get_header (message_file, "references"); - parse_references (message, notmuch_message_get_message_id (message), - parents, refs); + last_ref_message_id = parse_references (message, + this_message_id, + parents, refs); in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); - parse_references (message, notmuch_message_get_message_id (message), - parents, in_reply_to); - - /* Carefully avoid adding any self-referential in-reply-to term. */ - in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); - if (in_reply_to_message_id && - strcmp (in_reply_to_message_id, - notmuch_message_get_message_id (message))) - { + in_reply_to_message_id = parse_references (message, + this_message_id, + parents, in_reply_to); + + /* For the parent of this message, use the last message ID of the + * References header, if available. If not, fall back to the + * first message ID in the In-Reply-To header. */ + if (last_ref_message_id) { + _notmuch_message_add_term (message, "replyto", + last_ref_message_id); + } else if (in_reply_to_message_id) { _notmuch_message_add_term (message, "replyto", - _parse_message_id (message, in_reply_to, NULL)); + in_reply_to_message_id); } keys = g_hash_table_get_keys (parents); diff --git a/test/thread-replies b/test/thread-replies index a902691..28c2b1f 100755 --- a/test/thread-replies +++ b/test/thread-replies @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-compliant headers' . ./test-lib.sh test_begin_subtest "Use References when In-Reply-To is broken" -test_subtest_known_broken add_message '[id]="foo@one.com"' \ '[subject]=one' add_message '[in-reply-to]="mumble"' \ @@ -46,7 +45,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` test_expect_equal_json "$output" "$expected" test_begin_subtest "Prefer References to In-Reply-To" -test_subtest_known_broken add_message '[id]="foo@two.com"' \ '[subject]=two' add_message '[in-reply-to]="<bar@baz.com>"' \ @@ -77,7 +75,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` test_expect_equal_json "$output" "$expected" test_begin_subtest "Use In-Reply-To when no References" -test_subtest_known_broken add_message '[id]="foo@three.com"' \ '[subject]="three"' add_message '[in-reply-to]="<foo@three.com>"' \ @@ -104,7 +101,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` test_expect_equal_json "$output" "$expected" test_begin_subtest "Use last Reference" -test_subtest_known_broken add_message '[id]="foo@four.com"' \ '[subject]="four"' add_message '[id]="bar@four.com"' \ -- 1.8.1.5 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated 2013-03-06 3:31 ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay @ 2013-05-04 14:10 ` David Bremner 2013-05-04 16:24 ` Jani Nikula 2013-05-14 0:43 ` David Bremner 2 siblings, 0 replies; 17+ messages in thread From: David Bremner @ 2013-05-04 14:10 UTC (permalink / raw) To: Aaron Ecay, notmuch Aaron Ecay <aaronecay@gmail.com> writes: > + if (ref && strcmp(ref, message_id)) { > + return ref; > + } else { > + return NULL; > + } As a nit, there should be a space after strcmp. But if nobody has any other comments, I could amend that. d ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated 2013-03-06 3:31 ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay 2013-05-04 14:10 ` David Bremner @ 2013-05-04 16:24 ` Jani Nikula 2013-05-14 0:43 ` David Bremner 2 siblings, 0 replies; 17+ messages in thread From: Jani Nikula @ 2013-05-04 16:24 UTC (permalink / raw) To: Aaron Ecay, notmuch LGTM On Wed, 06 Mar 2013, Aaron Ecay <aaronecay@gmail.com> wrote: > Presently, the code which finds the parent of a message as it is being > added to the database assumes that the first Message-ID-like substring > of the In-Reply-To header is the parent Message ID. Some mail clients, > however, put stuff other than the Message-ID of the parent in the > In-Reply-To header, such as the email address of the sender of the > parent. This can fool notmuch. > > The updated algorithm prefers the last Message ID in the References > header. The References header lists messages oldest-first, so the last > Message ID is the parent (RFC2822, p. 24). The References header is > also less likely to be in a non-standard > syntax (http://cr.yp.to/immhf/thread.html, > http://www.jwz.org/doc/threading.html). In case the References header > is not to be found, fall back to the old behavior. > > V2 of this patch, incorporating feedback from Jani and (indirectly) > Austin. > --- > lib/database.cc | 48 +++++++++++++++++++++++++++++++++--------------- > test/thread-replies | 4 ---- > 2 files changed, 33 insertions(+), 19 deletions(-) > > diff --git a/lib/database.cc b/lib/database.cc > index 91d4329..52ed618 100644 > --- a/lib/database.cc > +++ b/lib/database.cc > @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next) > * 'message_id' in the result (to avoid mass confusion when a single > * message references itself cyclically---and yes, mail messages are > * not infrequent in the wild that do this---don't ask me why). > -*/ > -static void > + * > + * Return the last reference parsed, if it is not equal to message_id. > + */ > +static char * > parse_references (void *ctx, > const char *message_id, > GHashTable *hash, > @@ -511,7 +513,7 @@ parse_references (void *ctx, > char *ref; > > if (refs == NULL || *refs == '\0') > - return; > + return NULL; > > while (*refs) { > ref = _parse_message_id (ctx, refs, &refs); > @@ -519,6 +521,17 @@ parse_references (void *ctx, > if (ref && strcmp (ref, message_id)) > g_hash_table_insert (hash, ref, NULL); > } > + > + /* The return value of this function is used to add a parent > + * reference to the database. We should avoid making a message > + * its own parent, thus the following check. > + */ > + > + if (ref && strcmp(ref, message_id)) { > + return ref; > + } else { > + return NULL; > + } > } > > notmuch_status_t > @@ -1510,28 +1523,33 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch, > { > GHashTable *parents = NULL; > const char *refs, *in_reply_to, *in_reply_to_message_id; > + const char *last_ref_message_id, *this_message_id; > GList *l, *keys = NULL; > notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; > > parents = g_hash_table_new_full (g_str_hash, g_str_equal, > _my_talloc_free_for_g_hash, NULL); > + this_message_id = notmuch_message_get_message_id (message); > > refs = notmuch_message_file_get_header (message_file, "references"); > - parse_references (message, notmuch_message_get_message_id (message), > - parents, refs); > + last_ref_message_id = parse_references (message, > + this_message_id, > + parents, refs); > > in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to"); > - parse_references (message, notmuch_message_get_message_id (message), > - parents, in_reply_to); > - > - /* Carefully avoid adding any self-referential in-reply-to term. */ > - in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL); > - if (in_reply_to_message_id && > - strcmp (in_reply_to_message_id, > - notmuch_message_get_message_id (message))) > - { > + in_reply_to_message_id = parse_references (message, > + this_message_id, > + parents, in_reply_to); > + > + /* For the parent of this message, use the last message ID of the > + * References header, if available. If not, fall back to the > + * first message ID in the In-Reply-To header. */ > + if (last_ref_message_id) { > + _notmuch_message_add_term (message, "replyto", > + last_ref_message_id); > + } else if (in_reply_to_message_id) { > _notmuch_message_add_term (message, "replyto", > - _parse_message_id (message, in_reply_to, NULL)); > + in_reply_to_message_id); > } > > keys = g_hash_table_get_keys (parents); > diff --git a/test/thread-replies b/test/thread-replies > index a902691..28c2b1f 100755 > --- a/test/thread-replies > +++ b/test/thread-replies > @@ -11,7 +11,6 @@ constructed properly, even in the presence of non-RFC-compliant headers' > . ./test-lib.sh > > test_begin_subtest "Use References when In-Reply-To is broken" > -test_subtest_known_broken > add_message '[id]="foo@one.com"' \ > '[subject]=one' > add_message '[in-reply-to]="mumble"' \ > @@ -46,7 +45,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` > test_expect_equal_json "$output" "$expected" > > test_begin_subtest "Prefer References to In-Reply-To" > -test_subtest_known_broken > add_message '[id]="foo@two.com"' \ > '[subject]=two' > add_message '[in-reply-to]="<bar@baz.com>"' \ > @@ -77,7 +75,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` > test_expect_equal_json "$output" "$expected" > > test_begin_subtest "Use In-Reply-To when no References" > -test_subtest_known_broken > add_message '[id]="foo@three.com"' \ > '[subject]="three"' > add_message '[in-reply-to]="<foo@three.com>"' \ > @@ -104,7 +101,6 @@ expected=`echo "$expected" | notmuch_json_show_sanitize` > test_expect_equal_json "$output" "$expected" > > test_begin_subtest "Use last Reference" > -test_subtest_known_broken > add_message '[id]="foo@four.com"' \ > '[subject]="four"' > add_message '[id]="bar@four.com"' \ > -- > 1.8.1.5 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated 2013-03-06 3:31 ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay 2013-05-04 14:10 ` David Bremner 2013-05-04 16:24 ` Jani Nikula @ 2013-05-14 0:43 ` David Bremner 2 siblings, 0 replies; 17+ messages in thread From: David Bremner @ 2013-05-14 0:43 UTC (permalink / raw) To: Aaron Ecay, notmuch Aaron Ecay <aaronecay@gmail.com> writes: > Presently, the code which finds the parent of a message as it is being > added to the database assumes that the first Message-ID-like substring > of the In-Reply-To header is the parent Message ID. Some mail > clients, however, put stuff other than the Message-ID of the parent in > the In-Reply-To header, such as the email address of the sender of the > parent. This can fool notmuch. > Pushed both, with a certain amount of hacking on the test patch. d ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers 2013-03-06 3:31 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay 2013-03-06 3:31 ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay @ 2013-05-04 14:01 ` David Bremner 2013-05-04 14:03 ` David Bremner 2013-05-04 16:23 ` Jani Nikula 3 siblings, 0 replies; 17+ messages in thread From: David Bremner @ 2013-05-04 14:01 UTC (permalink / raw) To: Aaron Ecay, notmuch Aaron Ecay <aaronecay@gmail.com> writes: > These tests are known_broken, the following commit fixes them. > --- On current master, I get FIXED Use In-Reply-To when no References d ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers 2013-03-06 3:31 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay 2013-03-06 3:31 ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay 2013-05-04 14:01 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers David Bremner @ 2013-05-04 14:03 ` David Bremner 2013-05-04 16:23 ` Jani Nikula 3 siblings, 0 replies; 17+ messages in thread From: David Bremner @ 2013-05-04 14:03 UTC (permalink / raw) To: Aaron Ecay, notmuch Aaron Ecay <aaronecay@gmail.com> writes: > These tests are known_broken, the following commit fixes them. > --- Sorry for the extra mails, but this patch also makes test/basic fail because thread-replies is not added to test/notmuch-test d ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers 2013-03-06 3:31 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay ` (2 preceding siblings ...) 2013-05-04 14:03 ` David Bremner @ 2013-05-04 16:23 ` Jani Nikula 3 siblings, 0 replies; 17+ messages in thread From: Jani Nikula @ 2013-05-04 16:23 UTC (permalink / raw) To: Aaron Ecay, notmuch On Wed, 06 Mar 2013, Aaron Ecay <aaronecay@gmail.com> wrote: > These tests are known_broken, the following commit fixes them. > --- > > Thanks to David and Tomi for pointing out test_expect_equal_json. In > the process of implementing that, I discovered > notmuch_json_show_sanitize, which I had also not been using. This > patch fixes the test to use both these conveniences. The second patch > is not changed substantively, but I am resending it for tidiness. > > test/thread-replies | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 144 insertions(+) > create mode 100755 test/thread-replies > > diff --git a/test/thread-replies b/test/thread-replies > new file mode 100755 > index 0000000..a902691 > --- /dev/null > +++ b/test/thread-replies > @@ -0,0 +1,144 @@ > +#!/usr/bin/env bash > +# > +# Copyright (c) 2013 Aaron Ecay > +# > + > +test_description='test of proper handling of in-reply-to and references headers > + > +This test makes sure that the thread structure in the notmuch database is > +constructed properly, even in the presence of non-RFC-compliant headers' Nitpick, most other tests have one line test descriptions. The second paragraph could be left as comment. > + > +. ./test-lib.sh > + > +test_begin_subtest "Use References when In-Reply-To is broken" > +test_subtest_known_broken > +add_message '[id]="foo@one.com"' \ > + '[subject]=one' > +add_message '[in-reply-to]="mumble"' \ > + '[references]="<foo@one.com>"' \ > + '[subject]="Re: one"' > +output=$(notmuch show --format=json 'subject:one' | notmuch_json_show_sanitize) > +expected='[[[{"id": "foo@one.com", > + "match": true, > + "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-001", Please replace the path with: "filename": "YYYYY", Ditto below for all of them. > + "timestamp": 978709437, > + "date_relative": "2001-01-05", > + "tags": ["inbox", "unread"], > + "headers": {"Subject": "one", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, > + "body": [{"id": 1, > + "content-type": "text/plain", > + "content": "This is just a test message (#1)\n"}]}, > + [[{"id": "msg-002@notmuch-test-suite", > + "match": true, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-002", > + "timestamp": 978709437, "date_relative": "2001-01-05", > + "tags": ["inbox", "unread"], "headers": {"Subject": "Re: one", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, > + "body": [{"id": 1, "content-type": "text/plain", > + "content": "This is just a test message (#2)\n"}]}, []]]]]]' > +expected=`echo "$expected" | notmuch_json_show_sanitize` > +test_expect_equal_json "$output" "$expected" > + > +test_begin_subtest "Prefer References to In-Reply-To" > +test_subtest_known_broken > +add_message '[id]="foo@two.com"' \ > + '[subject]=two' > +add_message '[in-reply-to]="<bar@baz.com>"' \ > + '[references]="<foo@two.com>"' \ > + '[subject]="Re: two"' > +output=$(notmuch show --format=json 'subject:two' | notmuch_json_show_sanitize) > +expected='[[[{"id": "foo@two.com", > + "match": true, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-003", > + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], > + "headers": {"Subject": "two", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, > + "body": [{"id": 1, "content-type": "text/plain", > + "content": "This is just a test message (#3)\n"}]}, > + [[{"id": "msg-004@notmuch-test-suite", "match": true, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-004", > + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], > + "headers": {"Subject": "Re: two", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, > + "body": [{"id": 1, > + "content-type": "text/plain", "content": "This is just a test message (#4)\n"}]}, > + []]]]]]' > +expected=`echo "$expected" | notmuch_json_show_sanitize` > +test_expect_equal_json "$output" "$expected" > + > +test_begin_subtest "Use In-Reply-To when no References" > +test_subtest_known_broken As David said, this is not broken currently, as In-Reply-To is used unconditionally. Just drop the broken annotation, the test itself is useful. > +add_message '[id]="foo@three.com"' \ > + '[subject]="three"' > +add_message '[in-reply-to]="<foo@three.com>"' \ > + '[subject]="Re: three"' > +output=$(notmuch show --format=json 'subject:three' | notmuch_json_show_sanitize) > +expected='[[[{"id": "foo@three.com", "match": true, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-005", > + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], > + "headers": {"Subject": "three", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, > + "content-type": "text/plain", "content": "This is just a test message (#5)\n"}]}, > + [[{"id": "msg-006@notmuch-test-suite", "match": true, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-006", > + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], > + "headers": {"Subject": "Re: three", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, > + "content-type": "text/plain", "content": "This is just a test message (#6)\n"}]}, > + []]]]]]' > +expected=`echo "$expected" | notmuch_json_show_sanitize` > +test_expect_equal_json "$output" "$expected" > + > +test_begin_subtest "Use last Reference" > +test_subtest_known_broken > +add_message '[id]="foo@four.com"' \ > + '[subject]="four"' > +add_message '[id]="bar@four.com"' \ > + '[subject]="not-four"' > +add_message '[in-reply-to]="<baz@four.com>"' \ > + '[references]="<baz@four.com> <foo@four.com>"' \ > + '[subject]="neither"' > +output=$(notmuch show --format=json 'subject:four' | notmuch_json_show_sanitize) > +expected='[[[{"id": "foo@four.com", "match": true, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-007", > + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], > + "headers": {"Subject": "four", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, > + "content-type": "text/plain", "content": "This is just a test message (#7)\n"}]}, > + [[{"id": "msg-009@notmuch-test-suite", "match": false, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-009", > + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], > + "headers": {"Subject": "neither", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, > + "content-type": "text/plain", "content": "This is just a test message (#9)\n"}]}, > + []]]]], [[{"id": "bar@four.com", "match": true, "excluded": false, > + "filename": "/home/aecay/development/notmuch/notmuch-git/src/notmuch/test/tmp.thread-replies/mail/msg-008", > + "timestamp": 978709437, "date_relative": "2001-01-05", "tags": ["inbox", "unread"], > + "headers": {"Subject": "not-four", > + "From": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "To": "Notmuch Test Suite <test_suite@notmuchmail.org>", > + "Date": "Fri, 05 Jan 2001 15:43:57 +0000"}, "body": [{"id": 1, > + "content-type": "text/plain", "content": "This is just a test message (#8)\n"}]}, []]]]' > +expected=`echo "$expected" | notmuch_json_show_sanitize` > +test_expect_equal_json "$output" "$expected" > + > + > +test_done > -- > 1.8.1.5 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated 2013-03-03 23:55 ` Aaron Ecay 2013-03-04 2:45 ` David Bremner @ 2013-03-04 5:59 ` Tomi Ollila 1 sibling, 0 replies; 17+ messages in thread From: Tomi Ollila @ 2013-03-04 5:59 UTC (permalink / raw) To: Aaron Ecay, Jani Nikula, notmuch On Mon, Mar 04 2013, Aaron Ecay <aaronecay@gmail.com> wrote: > git send-email is mad about lines >998 characters in the test patch, so > I’m sending the patches as attachments to this email. (Is there a > better way to include the expected output of a notmuch command which > outputs long lines in a test script?) Simplest thing is to split the expected string to multiple lines and use test_expect_equal_json() for checking. Tomi ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2013-05-14 0:43 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-02-25 23:50 [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Aaron Ecay 2013-02-26 9:19 ` Jani Nikula 2013-02-26 19:40 ` Aaron Ecay 2013-02-28 19:41 ` Jani Nikula 2013-03-01 17:06 ` Jani Nikula 2013-03-03 23:46 ` Aaron Ecay 2013-03-03 23:55 ` Aaron Ecay 2013-03-04 2:45 ` David Bremner 2013-03-06 3:31 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay 2013-03-06 3:31 ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay 2013-05-04 14:10 ` David Bremner 2013-05-04 16:24 ` Jani Nikula 2013-05-14 0:43 ` David Bremner 2013-05-04 14:01 ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers David Bremner 2013-05-04 14:03 ` David Bremner 2013-05-04 16:23 ` Jani Nikula 2013-03-04 5:59 ` [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Tomi Ollila
Code repositories for project(s) associated with this public inbox https://yhetil.org/notmuch.git/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).