unofficial mirror of notmuch@notmuchmail.org
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org
Subject: Re: encoding of message-ids
Date: Tue, 16 Feb 2016 14:02:02 -0500
Message-ID: <87ziv0iimt.fsf@alice.fifthhorseman.net>
In-Reply-To: <87si0svnim.fsf@zancas.localnet>

On Tue 2016-02-16 07:38:09 -0500, David Bremner wrote:
> I spent a little time this morning staring at the code, and it seems
> that all of the message-ids are parsed via g_mime_decode_text, which
> deals with RFC2047 encodings and makes guesses at decoding 8bit
> characters. In practice this means that in the notmuch database all
> headers are UTF-8. Since message-id's are supposed to be printable ascii
> [at least in rfc5322], this seems like not such a terrible decision, but
> I wonder if we should document this potential conversion somewhere?

i think you mean g_mime_utils_header_decode_text, not
g_mime_decode_text, right?

What do you think are the potential risks here?

 * if all incoming message-ids are standards-compliant (lower-case
   ascii, with an @ sign in the middle and surrounded by angle-brackets
   [0]), then they cannot be interpreted as RFC 2047 text because they
   do not have the leading =? or the trailing ?=, so gmime shouldn't
   translate them.

 * if some incoming message-ids are not standards-compliant, then it's
   possible that they will be transformed into other,
   non-standards-compliant message IDs.  Some of them might even be
   transformed into standards-compliant message-IDs.  for example,
   '=?UTF-8?q?<abc@example.net>?=' will be transformed into
   '<abc@example.net>'.
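
To make the two cases above concrete, here is a minimal sketch.  It
uses Python's standard email.header.decode_header as a stand-in for
g_mime_utils_header_decode_text (that substitution is an assumption;
GMime's decoder may differ in edge cases), and decode_rfc2047 is a
hypothetical helper name, not anything in notmuch.

```python
from email.header import decode_header


def decode_rfc2047(raw: str) -> str:
    """Decode any RFC 2047 encoded-words in raw; other text passes through."""
    parts = []
    for value, charset in decode_header(raw):
        # Values are bytes when an encoded-word was found in the header,
        # and str when the header contained none at all.
        if isinstance(value, bytes):
            value = value.decode(charset or "ascii", errors="replace")
        parts.append(value)
    return "".join(parts)


# A standards-compliant ID has no =? ... ?= wrapper, so it is untouched.
compliant = decode_rfc2047("<abc@example.net>")

# The non-compliant example from the message is rewritten by decoding.
wrapped = decode_rfc2047("=?UTF-8?q?<abc@example.net>?=")

print(compliant)
print(wrapped)
```

Both calls end up with the same string, which is exactly the
transformation described in the second bullet.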

the main risk, i suppose, is that someone could craft a message with a
different literal Message-ID than an existing message, and could trigger
an otherwise undetectable message ID collision.  This seems not much
worse than the existing (detectable) message ID collision problems
notmuch already has.
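
A rough sketch of that collision, assuming an index keyed on the decoded
Message-ID.  The dict-based index, normalize and add_message are
hypothetical names for illustration only, and Python's decode_header
stands in for the GMime call; this is not how notmuch's database is
actually structured.

```python
from email.header import decode_header


def normalize(raw_id: str) -> str:
    """Apply RFC 2047 decoding to a raw Message-ID (stand-in for GMime)."""
    parts = []
    for value, charset in decode_header(raw_id):
        if isinstance(value, bytes):
            value = value.decode(charset or "ascii", errors="replace")
        parts.append(value)
    return "".join(parts)


# Hypothetical index: decoded Message-ID -> list of files carrying it.
index = {}


def add_message(raw_id: str, path: str) -> bool:
    """Store path under the decoded ID; return True if the ID already existed."""
    key = normalize(raw_id)
    collided = key in index
    index.setdefault(key, []).append(path)
    return collided


first = add_message("<abc@example.net>", "mail/1")
# Different literal header value, same key once decoded.
second = add_message("=?UTF-8?q?<abc@example.net>?=", "mail/2")

print(first, second, index)
```

Comparing the raw header values would not reveal the clash; it only
shows up after decoding.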

That said, RFC 2047 suggests [1] that its encodings are only relevant in
places where a "text" token would be used.  Message-ID (and References
and In-Reply-To) are intended to only contain dot-atom-text tokens.  So
probably it would be more correct to avoid applying RFC 2047 decoding to
these specific fields.
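
One way this could look, as a sketch only: decode by field name and pass
the ID-bearing fields through untouched.  ID_FIELDS and header_value are
hypothetical names, and Python's decode_header again stands in for the
GMime decoder; none of this is taken from the notmuch source.

```python
from email.header import decode_header

# Fields that should only carry message IDs, per the paragraph above.
ID_FIELDS = {"message-id", "references", "in-reply-to"}


def header_value(name: str, raw: str) -> str:
    """Return the value to store for a header, decoding only text-like fields."""
    if name.lower() in ID_FIELDS:
        # Skip RFC 2047 decoding entirely; keep the literal value.
        return raw.strip()
    parts = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            value = value.decode(charset or "ascii", errors="replace")
        parts.append(value)
    return "".join(parts)


msgid = header_value("Message-ID", "=?UTF-8?q?<abc@example.net>?=")
subject = header_value("Subject", "=?UTF-8?q?hello_world?=")

print(msgid)
print(subject)
```

With this split, the crafted Message-ID keeps its literal form and so
stays distinct from '<abc@example.net>', while Subject is still decoded.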

i dunno that it's a big deal though, given the analysis above.

        --dkg

[0] https://tools.ietf.org/html/rfc5322#section-3.6.4
[1] https://tools.ietf.org/html/rfc2047#section-5


Thread overview: 4+ messages
2016-02-16 12:38 encoding of message-ids David Bremner
2016-02-16 19:02 ` Daniel Kahn Gillmor [this message]
2016-02-17 13:34   ` David Bremner
2016-02-24 17:15     ` W. Trevor King
