* header continuation issue in notmuch frontend/alot/pythons email module
@ 2013-06-23 13:11 Justus Winter
  2013-06-23 16:59 ` Austin Clements
  0 siblings, 1 reply; 5+ messages in thread
From: Justus Winter @ 2013-06-23 13:11 UTC (permalink / raw)
  To: notmuch mailing list

Hi,

I recently had a problem replying to a mail written by Thomas Schwinge
using an oldish notmuch. Not sure if it has been fixed in more recent
versions, but I think notmuch could improve upon its header
generation (see below). Problematic part of the mail:

~~~ snip ~~~
[...]
To: someone@example.org, "line
 break" <linebreak@example.org>, someoneelse@example.org
User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
[...]
~~~ snap ~~~

http://tools.ietf.org/html/rfc2822#section-2.2.3 says:

   Note: Though structured field bodies are defined in such a way that
   folding can take place between many of the lexical tokens (and even
   within some of the lexical tokens), folding SHOULD be limited to
   placing the CRLF at higher-level syntactic breaks. For instance, if
   a field body is defined as comma-separated values, it is recommended
   that folding occur after the comma separating the structured items in
   preference to other places where the field could be folded, even if
   it is allowed elsewhere.

So notmuch "rfc-SHOULD" place the newlines after the comma.

The RFC goes on:

   The process of moving from this folded multiple-line representation
   of a header field to its single line representation is called
   "unfolding". Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP. Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.

My interpretation is that unfolding simply removes any line breaks
first, so the value does not contain any newlines. But Python's email
module treats quoted and unquoted parts of the value differently:

~~~ snip ~~~
from __future__ import print_function
import email
from email.utils import getaddresses

m = email.message_from_string('''To: "line
 break" <linebreak@example.org>, line
 break <linebreak@example.org>''')
print("m['To'] = ", m['To'])
print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
~~~ snap ~~~

% python3 test.py
m['To'] = "line
 break" <linebreak@example.org>, line
 break <linebreak@example.org>
getaddresses(m.get_all('To')) = [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]

I believe that is what's preventing me from replying to the message
using alot without sanitizing the To header first. Not really sure who
is wrong or right here... any thoughts?

Justus

^ permalink raw reply	[flat|nested] 5+ messages in thread
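A possible way to do the "sanitizing" mentioned above, sketched here
purely as an illustration and not taken from alot's source: unfold each
header value the way RFC 2822 section 2.2.3 describes (drop any line
break that is immediately followed by WSP) before handing it to
getaddresses. The helper name unfold_header and the sample message are
hypothetical.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# A line break (CRLF, or the bare LF that Python strings usually carry)
# that is immediately followed by a space or a tab.
_FOLD = re.compile(r'\r?\n(?=[ \t])')


def unfold_header(value):
    """Return value with RFC 2822 folding removed (hypothetical helper)."""
    return _FOLD.sub('', value)


# Same shape as the problematic header: one fold inside a quoted
# display name, one fold inside an unquoted display name.
raw = ('To: "line\n'
       ' break" <linebreak@example.org>, line\n'
       ' break <linebreak@example.org>\n'
       '\n')
msg = message_from_string(raw)

# Unfold first, then parse the address list.
unfolded = [unfold_header(v) for v in msg.get_all('To', [])]
addresses = getaddresses(unfolded)
print(addresses)
```

Only the line break is removed; the whitespace that follows it is kept,
so the display name still contains a single space where the fold was.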
* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-23 13:11 header continuation issue in notmuch frontend/alot/pythons email module Justus Winter
@ 2013-06-23 16:59 ` Austin Clements
  2013-06-23 17:27   ` Austin Clements
  2013-06-24  8:57   ` Justus Winter
  0 siblings, 2 replies; 5+ messages in thread
From: Austin Clements @ 2013-06-23 16:59 UTC (permalink / raw)
  To: Justus Winter; +Cc: notmuch mailing list

Quoth Justus Winter on Jun 23 at 3:11 pm:
> Hi,
>
> I recently had a problem replying to a mail written by Thomas Schwinge
> using an oldish notmuch. Not sure if it has been fixed in more recent
> versions, but I think notmuch could improve upon its header
> generation (see below). Problematic part of the mail:
>
> ~~~ snip ~~~
> [...]
> To: someone@example.org, "line
>  break" <linebreak@example.org>, someoneelse@example.org
> User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> [...]
> ~~~ snap ~~~
>
> http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
>
>    Note: Though structured field bodies are defined in such a way that
>    folding can take place between many of the lexical tokens (and even
>    within some of the lexical tokens), folding SHOULD be limited to
>    placing the CRLF at higher-level syntactic breaks. For instance, if
>    a field body is defined as comma-separated values, it is recommended
>    that folding occur after the comma separating the structured items in
>    preference to other places where the field could be folded, even if
>    it is allowed elsewhere.
>
> So notmuch "rfc-SHOULD" place the newlines after the comma.
>
> The RFC goes on:
>
>    The process of moving from this folded multiple-line representation
>    of a header field to its single line representation is called
>    "unfolding". Unfolding is accomplished by simply removing any CRLF
>    that is immediately followed by WSP. Each header field should be
>    treated in its unfolded form for further syntactic and semantic
>    evaluation.
>
> My interpretation is that unfolding simply removes any line breaks
> first, so the value does not contain any newlines. But Python's email
> module treats quoted and unquoted parts of the value differently:
>
> ~~~ snip ~~~
> from __future__ import print_function
> import email
> from email.utils import getaddresses
>
> m = email.message_from_string('''To: "line
>  break" <linebreak@example.org>, line
>  break <linebreak@example.org>''')
> print("m['To'] = ", m['To'])
> print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> ~~~ snap ~~~
>
> % python3 test.py
> m['To'] = "line
>  break" <linebreak@example.org>, line
>  break <linebreak@example.org>
> getaddresses(m.get_all('To')) = [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]
>
> I believe that is what's preventing me from replying to the message
> using alot without sanitizing the To header first. Not really sure who
> is wrong or right here... any thoughts?

There are at least two bugs here.

Regardless of what we RFC-should do, that folding *is* permitted by
RFC2822, since quoted strings can contain folding whitespace:

  http://tools.ietf.org/html/rfc2822#section-3.2.5

For completeness, the full derivation for this "To" header is:

  to = "To:" address-list CRLF
  address-list = (address *("," address)) / obs-addr-list
  address = mailbox / group
  mailbox = name-addr / addr-spec
  name-addr = [display-name] angle-addr
  display-name = phrase
  phrase = 1*word / obs-phrase
  word = atom / quoted-string
  quoted-string = [CFWS]
                  DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                  [CFWS]

Do you happen to know how the strangely folded "to" header was
produced for this message? In notmuch-emacs, a user can put whatever
they want in a message-mode buffer's headers and mm will dutifully
pass it on to their MTA. We could validate it, but that's a slippery
slope and I would hope that the MTA itself is validating it (and
probably more thoroughly than we could).

That said, the first bug here is in Python. As I mentioned above,
foldable whitespace is allowed in quoted strings. In fact, though the
standard is rather long-winded about whitespace, if you dig into the
grammar, you'll find that *all whitespace can be folded* (except in
the obsolete grammar, which allowed whitespace between the header name
and the colon, which obviously can't be folded). I'm not sure what
Python is doing, but I bet it's going to a lot of effort to
mis-implement something very simple.

There also appears to be a bug in the notmuch CLI's reply command
where it omits addresses that were folded in the original message. I
don't know if alot uses the CLI's reply command, so this may or may
not be related to your specific issue. I haven't dug into this yet,
other than to confirm that it's the CLI's fault and not
notmuch-emacs's.

> Justus

^ permalink raw reply	[flat|nested] 5+ messages in thread
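For comparison, current Python 3 releases ship a second header parser
that is selected with policy=email.policy.default. As far as I
understand it, that policy removes the line breaks from a header value
before parsing it, which matches the unfolding rule quoted from the
RFC. The following is only a sketch against a modern Python 3 (this
policy was not available in this form when the thread was written), and
the sample message is made up.

```python
from email import message_from_string
from email import policy

# One fold inside a quoted display name, one inside an unquoted one.
raw = ('To: "line\n'
       ' break" <linebreak@example.org>, line\n'
       ' break <linebreak@example.org>\n'
       '\n')

# Default (compat32) policy: the header value keeps its folding.
legacy = message_from_string(raw)
legacy_value = legacy['To']

# policy.default: the value is unfolded and parsed into address objects.
modern = message_from_string(raw, policy=policy.default)
modern_value = str(modern['To'])
display_names = [a.display_name for a in modern['To'].addresses]
addr_specs = [a.addr_spec for a in modern['To'].addresses]

print(repr(legacy_value))
print(repr(modern_value))
print(display_names, addr_specs)
```

With the legacy policy the caller still has to unfold the value; with
policy.default the display names come back without embedded newlines.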
* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-23 16:59 ` Austin Clements
@ 2013-06-23 17:27   ` Austin Clements
  2013-06-24  8:57   ` Justus Winter
  1 sibling, 0 replies; 5+ messages in thread
From: Austin Clements @ 2013-06-23 17:27 UTC (permalink / raw)
  To: Justus Winter; +Cc: notmuch mailing list

On Sun, 23 Jun 2013, Austin Clements <amdragon@MIT.EDU> wrote:
> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message. I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue. I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

I take back what I said about there being a bug in the reply command.
It was a problem with my test case.

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-23 16:59 ` Austin Clements
  2013-06-23 17:27   ` Austin Clements
@ 2013-06-24  8:57   ` Justus Winter
  2013-06-24  9:19     ` Thomas Schwinge
  1 sibling, 1 reply; 5+ messages in thread
From: Justus Winter @ 2013-06-24 8:57 UTC (permalink / raw)
  To: Austin Clements, thomas schwinge; +Cc: notmuch mailing list

Quoting Austin Clements (2013-06-23 18:59:39)
> Quoth Justus Winter on Jun 23 at 3:11 pm:
> > Hi,
> >
> > I recently had a problem replying to a mail written by Thomas Schwinge
> > using an oldish notmuch. Not sure if it has been fixed in more recent
> > versions, but I think notmuch could improve upon its header
> > generation (see below). Problematic part of the mail:
> >
> > ~~~ snip ~~~
> > [...]
> > To: someone@example.org, "line
> >  break" <linebreak@example.org>, someoneelse@example.org
> > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> > [...]
> > ~~~ snap ~~~
> >
> > http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
> >
> >    Note: Though structured field bodies are defined in such a way that
> >    folding can take place between many of the lexical tokens (and even
> >    within some of the lexical tokens), folding SHOULD be limited to
> >    placing the CRLF at higher-level syntactic breaks. For instance, if
> >    a field body is defined as comma-separated values, it is recommended
> >    that folding occur after the comma separating the structured items in
> >    preference to other places where the field could be folded, even if
> >    it is allowed elsewhere.
> >
> > So notmuch "rfc-SHOULD" place the newlines after the comma.
> >
> > The RFC goes on:
> >
> >    The process of moving from this folded multiple-line representation
> >    of a header field to its single line representation is called
> >    "unfolding". Unfolding is accomplished by simply removing any CRLF
> >    that is immediately followed by WSP. Each header field should be
> >    treated in its unfolded form for further syntactic and semantic
> >    evaluation.
> >
> > My interpretation is that unfolding simply removes any line breaks
> > first, so the value does not contain any newlines. But Python's email
> > module treats quoted and unquoted parts of the value differently:
> >
> > ~~~ snip ~~~
> > from __future__ import print_function
> > import email
> > from email.utils import getaddresses
> >
> > m = email.message_from_string('''To: "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>''')
> > print("m['To'] = ", m['To'])
> > print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> > ~~~ snap ~~~
> >
> > % python3 test.py
> > m['To'] = "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>
> > getaddresses(m.get_all('To')) = [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]
> >
> > I believe that is what's preventing me from replying to the message
> > using alot without sanitizing the To header first. Not really sure who
> > is wrong or right here... any thoughts?
>
> There are at least two bugs here.
>
> Regardless of what we RFC-should do, that folding *is* permitted by
> RFC2822, since quoted strings can contain folding whitespace:
>
>   http://tools.ietf.org/html/rfc2822#section-3.2.5
>
> For completeness, the full derivation for this "To" header is:
>
>   to = "To:" address-list CRLF
>   address-list = (address *("," address)) / obs-addr-list
>   address = mailbox / group
>   mailbox = name-addr / addr-spec
>   name-addr = [display-name] angle-addr
>   display-name = phrase
>   phrase = 1*word / obs-phrase
>   word = atom / quoted-string
>   quoted-string = [CFWS]
>                   DQUOTE *([FWS] qcontent) [FWS] DQUOTE
>                   [CFWS]
>
> Do you happen to know how the strangely folded "to" header was
> produced for this message?

No, but Thomas might. Thomas, the problematic message is
id:877ghpqckb.fsf@kepler.schwinge.homeip.net

> In notmuch-emacs, a user can put whatever
> they want in a message-mode buffer's headers and mm will dutifully
> pass it on to their MTA. We could validate it, but that's a slippery
> slope and I would hope that the MTA itself is validating it (and
> probably more thoroughly than we could).
>
> That said, the first bug here is in Python. As I mentioned above,
> foldable whitespace is allowed in quoted strings. In fact, though the
> standard is rather long-winded about whitespace, if you dig into the
> grammar, you'll find that *all whitespace can be folded* (except in
> the obsolete grammar, which allowed whitespace between the header name
> and the colon, which obviously can't be folded). I'm not sure what
> Python is doing, but I bet it's going to a lot of effort to
> mis-implement something very simple.

Yes, I'm glad you came to the same conclusion.

> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message. I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue. I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

No, alot does not use notmuch's reply command.

Thanks,
Justus

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-24  8:57   ` Justus Winter
@ 2013-06-24  9:19     ` Thomas Schwinge
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Schwinge @ 2013-06-24 9:19 UTC (permalink / raw)
  To: Justus Winter, Austin Clements; +Cc: notmuch mailing list

[-- Attachment #1: Type: text/plain, Size: 2216 bytes --]

Hi!

On Mon, 24 Jun 2013 10:57:10 +0200, Justus Winter <4winter@informatik.uni-hamburg.de> wrote:
> Quoting Austin Clements (2013-06-23 18:59:39)
> > Quoth Justus Winter on Jun 23 at 3:11 pm:
> > > I recently had a problem replying to a mail written by Thomas Schwinge
> > > using an oldish notmuch. Not sure if it has been fixed in more recent

"Oldish", yeah, yeah, I know... (Mumbles something about long TODO
list.)

> > > versions, but I think notmuch could improve upon its header
> > > generation (see below). Problematic part of the mail:
> > >
> > > ~~~ snip ~~~
> > > [...]
> > > To: someone@example.org, "line
> > >  break" <linebreak@example.org>, someoneelse@example.org
> > > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> > > [...]
> > > ~~~ snap ~~~

> > Do you happen to know how the strangely folded "to" header was
> > produced for this message?

I just entered/copied all the addresses into one long To: line, and
then let message-mode do its thing.

> No, but Thomas might. Thomas, the problematic message is
> id:877ghpqckb.fsf@kepler.schwinge.homeip.net

Here is the header from the message as I sent it:

To: Samuel Thibault <samuel.thibault@gnu.org>, Justus Winter <4winter@informatik.uni-hamburg.de>, fotis.koutoulakis@gmail.com, Ian Lance Taylor <iant@google.com>, toscano.pino@tiscali.it, Luis Machado <lgustavo@codesourcery.com>, =?utf-8?B?6ZmG5bKz?= <hacklu.newborn@gmail.com>

And this is what I received from the bug-hurd mailing list:

To: Samuel Thibault <samuel.thibault@gnu.org>, Justus Winter <4winter@informatik.uni-hamburg.de>, <fotis.koutoulakis@gmail.com>, "Ian Lance Taylor" <iant@google.com>, <toscano.pino@tiscali.it>, Luis Machado <lgustavo@codesourcery.com>, =?utf-8?B?6ZmG5bKz?= <hacklu.newborn@gmail.com>

So the "corruption" (if it is declared as one; I don't have time right
now to follow your RFC interpretation) must have happened after
sending it off -- perhaps my company's Microsoft Exchange server (as
Justus received a direct copy from that one), or even msmtp used as
the local MTA.

Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-06-24 9:20 UTC | newest]

Thread overview: 5+ messages
2013-06-23 13:11 header continuation issue in notmuch frontend/alot/pythons email module Justus Winter
2013-06-23 16:59 ` Austin Clements
2013-06-23 17:27   ` Austin Clements
2013-06-24  8:57   ` Justus Winter
2013-06-24  9:19     ` Thomas Schwinge