unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
From: Jani Nikula <jani@nikula.org>
To: Aaron Ecay <aaronecay@gmail.com>, notmuch@notmuchmail.org
Subject: Re: [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated
Date: Tue, 26 Feb 2013 10:19:06 +0100	[thread overview]
Message-ID: <878v6bjuut.fsf@nikula.org> (raw)
In-Reply-To: <1361836225-17279-1-git-send-email-aaronecay@gmail.com>

On Tue, 26 Feb 2013, Aaron Ecay <aaronecay@gmail.com> wrote:
> Presently, the code which finds the parent of a message as it is being
> added to the database assumes that the first Message-ID-like substring
> of the In-Reply-To header is the parent Message ID.  Some mail clients,
> however, put stuff other than the Message-ID of the parent in the
> In-Reply-To header, such as the email address of the sender of the
> parent.  This can fool notmuch.

Hi Aaron, please provide references to a few messages like this. If
available on the notmuch list an id: reference would be best, but
otherwise some archive that allows viewing full message headers or
downloading the full message would be best.

Thanks,
Jani.

>
> The updated algorithm prefers the last Message ID in the References
> header.  The References header lists messages oldest-first, so the last
> Message ID is the parent (RFC2822, p. 24).  The References header is
> also less likely to be in a non-standard
> syntax (http://cr.yp.to/immhf/thread.html,
> http://www.jwz.org/doc/threading.html).  In case the References header
> is not to be found, fall back to the old behavior.
> ---
>
> I especially notice this problem on public mailing lists, where
> certain people's messages always cause an "out-dent" of the threading,
> instead of being nested under whichever message they are replies to.
>
> Technically, putting non-Message-ID crud in the In-Reply-To field is a
> violation of RFC2822, but it appears that in practice the References
> header is respected more often than the In-Reply-To one.
>
>  lib/database.cc | 30 ++++++++++++++++++++++--------
>  1 file changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index 91d4329..cbf33ae 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -501,8 +501,10 @@ _parse_message_id (void *ctx, const char *message_id, const char **next)
>   * 'message_id' in the result (to avoid mass confusion when a single
>   * message references itself cyclically---and yes, mail messages are
>   * not infrequent in the wild that do this---don't ask me why).
> + *
> + * Return the last reference parsed.
>  */
> -static void
> +static char *
>  parse_references (void *ctx,
>  		  const char *message_id,
>  		  GHashTable *hash,
> @@ -511,7 +513,7 @@ parse_references (void *ctx,
>      char *ref;
>
>      if (refs == NULL || *refs == '\0')
> -	return;
> +	return NULL;
>
>      while (*refs) {
>  	ref = _parse_message_id (ctx, refs, &refs);
> @@ -519,6 +521,8 @@ parse_references (void *ctx,
>  	if (ref && strcmp (ref, message_id))
>  	    g_hash_table_insert (hash, ref, NULL);
>      }
> +
> +    return ref;
>  }
>
>  notmuch_status_t
> @@ -1365,7 +1369,7 @@ _notmuch_database_generate_doc_id (notmuch_database_t *notmuch)
>      notmuch->last_doc_id++;
>
>      if (notmuch->last_doc_id == 0)
> -	INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");
> +	INTERNAL_ERROR ("Xapian document IDs are exhausted.\n");
>
>      return notmuch->last_doc_id;
>  }
> @@ -1509,7 +1513,7 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
>  					   const char **thread_id)
>  {
>      GHashTable *parents = NULL;
> -    const char *refs, *in_reply_to, *in_reply_to_message_id;
> +    const char *refs, *in_reply_to, *in_reply_to_message_id, *last_ref_message_id;
>      GList *l, *keys = NULL;
>      notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
>
> @@ -1517,21 +1521,31 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
>  				     _my_talloc_free_for_g_hash, NULL);
>
>      refs = notmuch_message_file_get_header (message_file, "references");
> -    parse_references (message, notmuch_message_get_message_id (message),
> -		      parents, refs);
> +    last_ref_message_id = parse_references (message,
> +					    notmuch_message_get_message_id (message),
> +					    parents, refs);
>
>      in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
>      parse_references (message, notmuch_message_get_message_id (message),
>  		      parents, in_reply_to);
>
> -    /* Carefully avoid adding any self-referential in-reply-to term. */
>      in_reply_to_message_id = _parse_message_id (message, in_reply_to, NULL);
> +    /* If the parent message ID from the Reply-To and References
> +     * headers are different, use the References one.  This is because
> +     * the Reply-To header is more likely to be in an non-standard
> +     * format. */
> +    if (in_reply_to_message_id &&
> +	last_ref_message_id &&
> +	strcmp (last_ref_message_id, in_reply_to_message_id)) {
> +	in_reply_to_message_id = last_ref_message_id;
> +    }
> +    /* Carefully avoid adding any self-referential in-reply-to term. */
>      if (in_reply_to_message_id &&
>  	strcmp (in_reply_to_message_id,
>  		notmuch_message_get_message_id (message)))
>      {
>  	_notmuch_message_add_term (message, "replyto",
> -			     _parse_message_id (message, in_reply_to, NULL));
> +			     in_reply_to_message_id);
>      }
>
>      keys = g_hash_table_get_keys (parents);
> --
> 1.8.1.4
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

  reply	other threads:[~2013-02-26  9:19 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-25 23:50 [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Aaron Ecay
2013-02-26  9:19 ` Jani Nikula [this message]
2013-02-26 19:40   ` Aaron Ecay
2013-02-28 19:41 ` Jani Nikula
2013-03-01 17:06   ` Jani Nikula
2013-03-03 23:46     ` Aaron Ecay
2013-03-03 23:55       ` Aaron Ecay
2013-03-04  2:45         ` David Bremner
2013-03-06  3:31           ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers Aaron Ecay
2013-03-06  3:31             ` [PATCH 2/2] lib/database.cc: change how the parent of a message is calculated Aaron Ecay
2013-05-04 14:10               ` David Bremner
2013-05-04 16:24               ` Jani Nikula
2013-05-14  0:43               ` David Bremner
2013-05-04 14:01             ` [PATCH 1/2] test: add tests for the handling of References and In-Reply-To headers David Bremner
2013-05-04 14:03             ` David Bremner
2013-05-04 16:23             ` Jani Nikula
2013-03-04  5:59         ` [RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated Tomi Ollila

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://notmuchmail.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878v6bjuut.fsf@nikula.org \
    --to=jani@nikula.org \
    --cc=aaronecay@gmail.com \
    --cc=notmuch@notmuchmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).