Date: Wed, 24 Feb 2016 09:15:24 -0800
From: "W. Trevor King"
To: David Bremner
Cc: Daniel Kahn Gillmor, notmuch@notmuchmail.org
Subject: Re: encoding of message-ids
Message-ID: <20160224171524.GS4265@odin.tremily.us>
References: <87si0svnim.fsf@zancas.localnet>
 <87ziv0iimt.fsf@alice.fifthhorseman.net>
 <877fi3v4t6.fsf@zancas.localnet>
In-Reply-To: <877fi3v4t6.fsf@zancas.localnet>

On Wed, Feb 17, 2016 at 09:34:29AM -0400, David Bremner wrote:
> Daniel Kahn Gillmor writes:
> > That said, RFC 2047 suggests that its encodings are only relevant
> > in places where a "text" token would be used.  Message-ID (and
> > References and In-Reply-To) are intended to only contain
> > dot-atom-text tokens.
> > So probably it would be more correct to
> > avoid applying to these specific fields.
> >
> > i dunno that it's a big deal though, given the analysis above.
>
> I guess there are two separate issues.  One is the (mildly bogus)
> application of RFC2047 decoding to message-ids.  The other is
> the coercion into utf8 from whatever wacky 8bit encoding some
> creative person might use in a message-id.

It looks like there's already an “implicit encodings are complicated”
RFC discussing this issue [1].  RFC 6532 overrides (among other things)
the atext behind message-id [2,3] for message/global messages.  Other
related RFCs cover internationalized domain names [4] and
internationalized email addresses [5].  I think we should:

* Store message IDs as NFKC UTF-8 in notmuch (do we already do this?).
* For message/global messages:
  * Convert headers to Unicode using UTF-8 (per RFC 6532).
* For non-message/global messages:
  * Ignore (i.e. do not decode) any RFC 2047 =? encoding or RFC 5890
    xn-- encoding that may be present.
  * Convert to Unicode by percent-encoding [6].  For example, ‘ü%’,
    stored as the three UTF-8 bytes ‘\xc3\xbc\x25’, would be
    represented by the Unicode string ‘%C3%BC%25’.

Cheers,
Trevor

[1]: https://tools.ietf.org/html/rfc6055
[2]: https://tools.ietf.org/html/rfc5322#section-3.6.4
[3]: https://tools.ietf.org/html/rfc5322#section-3.2.3
[4]: https://tools.ietf.org/html/rfc5890
[5]: https://tools.ietf.org/html/rfc6530
[6]: https://tools.ietf.org/html/rfc3986#section-2
[7]: https://tools.ietf.org/html/rfc2606#section-2

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
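The message-ID proposal above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not notmuch's implementation.

- The function names `percent_encode_bytes` and `message_id_to_unicode` are hypothetical.
- The caller is assumed to already know whether the message is message/global.
- Only non-ASCII bytes and `%` itself are assumed to be percent-encoded; all other ASCII bytes pass through unchanged. This reproduces the ‘ü%’ → ‘%C3%BC%25’ example.

```python
import unicodedata


def percent_encode_bytes(raw: bytes) -> str:
    """Hypothetical helper: map raw header bytes to an ASCII-only string.

    Assumption: only non-ASCII bytes (>= 0x80) and '%' itself are
    percent-encoded, using uppercase hex as RFC 3986 section 2.1
    recommends; every other ASCII byte (including '=', '?', '-') is kept.
    """
    parts = []
    for byte in raw:
        if byte >= 0x80 or byte == ord("%"):
            parts.append(f"%{byte:02X}")
        else:
            parts.append(chr(byte))
    return "".join(parts)


def message_id_to_unicode(raw: bytes, is_message_global: bool) -> str:
    """Hypothetical helper: turn a raw Message-ID value into its stored form.

    Assumption: `is_message_global` is computed by the caller from the
    message's Content-Type (message/global or not).
    """
    if is_message_global:
        # RFC 6532: header bytes are UTF-8.  Strict decoding raises
        # UnicodeDecodeError on invalid input instead of guessing.
        text = raw.decode("utf-8")
    else:
        # No RFC 2047 (=?...?=) or xn-- decoding: those sequences are
        # plain ASCII here, so they are kept verbatim.
        text = percent_encode_bytes(raw)
    # Store everything as NFKC (a no-op for the ASCII-only branch).
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(percent_encode_bytes("ü%".encode("utf-8")))  # %C3%BC%25
    print(message_id_to_unicode(b"<=?utf-8?q?x?=@example.org>", False))
```

Encoding `%` itself as `%25` keeps the mapping unambiguous. A literal `%C3` in an ID becomes `%25C3`, so it cannot collide with an encoded `\xc3` byte.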