unofficial mirror of notmuch@notmuchmail.org
* header continuation issue in notmuch frontend/alot/pythons email module
@ 2013-06-23 13:11 Justus Winter
  2013-06-23 16:59 ` Austin Clements
  0 siblings, 1 reply; 5+ messages in thread
From: Justus Winter @ 2013-06-23 13:11 UTC
  To: notmuch mailing list

Hi,

I recently had a problem replying to a mail written by Thomas Schwinge
using an oldish notmuch. Not sure if it has been fixed in more recent
versions, but I think notmuch could improve upon its header
generation (see below). Problematic part of the mail:

~~~ snip ~~~
[...]
To: someone@example.org, "line
 break" <linebreak@example.org>, someoneelse@example.org
User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
[...]
~~~ snap ~~~

http://tools.ietf.org/html/rfc2822#section-2.2.3 says:

   Note: Though structured field bodies are defined in such a way that
   folding can take place between many of the lexical tokens (and even
   within some of the lexical tokens), folding SHOULD be limited to
   placing the CRLF at higher-level syntactic breaks.  For instance, if
   a field body is defined as comma-separated values, it is recommended
   that folding occur after the comma separating the structured items in
   preference to other places where the field could be folded, even if
   it is allowed elsewhere.

So notmuch "rfc-SHOULD" place the newlines after the comma.
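
To illustrate that recommendation, here is a minimal sketch that folds
an address list only after the separating commas. The helper name
fold_after_commas and its width parameter are made up for this example;
it is not how notmuch or message-mode generate headers, and it makes no
attempt to split a single over-long address.

```python
def fold_after_commas(name, addresses, width=78):
    """Fold a header so that every line break follows a comma.

    Hypothetical helper for illustration only. `addresses` is a list of
    already formatted mailboxes, so commas inside quoted display names
    are never treated as separators.
    """
    lines = []
    current = name + ":"
    last = len(addresses) - 1
    for i, addr in enumerate(addresses):
        piece = addr + ("," if i < last else "")
        # Start a continuation line when the piece would not fit, but
        # always keep the first address on the line with the field name.
        if len(current) + 1 + len(piece) > width and current != name + ":":
            lines.append(current)
            current = " " + piece  # continuation lines start with WSP
        else:
            current += " " + piece
    lines.append(current)
    return "\r\n".join(lines)


folded = fold_after_commas(
    "To",
    ["someone@example.org",
     '"line break" <linebreak@example.org>',
     "someoneelse@example.org"],
    width=40,
)
print(folded)
```

With a fold like this, the quoted display name always stays on one
line, so a parser never sees a line break inside the quotes.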

The rfc goes on:

   The process of moving from this folded multiple-line representation
   of a header field to its single line representation is called
   "unfolding". Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP.  Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.

My interpretation is that unfolding simply removes any linebreaks
first, so the value does not contain any newlines. But Python's email
module discriminates quoted and unquoted parts of the value:

~~~ snip ~~~
from __future__ import print_function
import email
from email.utils import getaddresses

m = email.message_from_string('''To: "line
 break" <linebreak@example.org>, line
 break <linebreak@example.org>''')
print("m['To'] = ", m['To'])
print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
~~~ snap ~~~

% python3 test.py
m['To'] =  "line
 break" <linebreak@example.org>, line
 break <linebreak@example.org>
getaddresses(m.get_all('To')) =  [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]

I believe that is what's preventing me from replying to the message
using alot without sanitizing the To header first. Not really sure who
is wrong or right here... any thoughts?
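
For reference, the sanitizing step mentioned above can be sketched as
follows. This is only a workaround under my reading of the RFC, not a
fix in alot or in the email module; the function name unfold is an
assumption, and it also accepts a bare LF because Python strings
usually carry "\n" rather than CRLF.

```python
import re
from email.utils import getaddresses


def unfold(value):
    # RFC 2822 section 2.2.3: unfolding removes any CRLF that is
    # immediately followed by WSP. The WSP itself is kept.
    return re.sub(r"\r?\n(?=[ \t])", "", value)


raw = ('"line\n break" <linebreak@example.org>, '
       'line\n break <linebreak@example.org>')

# Unfold first, then hand the single-line value to getaddresses().
addresses = getaddresses([unfold(raw)])
print(addresses)
```

After unfolding, neither the quoted nor the unquoted display name
contains a newline any more.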

Justus


* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-23 13:11 header continuation issue in notmuch frontend/alot/pythons email module Justus Winter
@ 2013-06-23 16:59 ` Austin Clements
  2013-06-23 17:27   ` Austin Clements
  2013-06-24  8:57   ` Justus Winter
  0 siblings, 2 replies; 5+ messages in thread
From: Austin Clements @ 2013-06-23 16:59 UTC
  To: Justus Winter; +Cc: notmuch mailing list

Quoth Justus Winter on Jun 23 at  3:11 pm:
> Hi,
> 
> I recently had a problem replying to a mail written by Thomas Schwinge
> using an oldish notmuch. Not sure if it has been fixed in more recent
> versions, but I think notmuch could improve upon its header
> generation (see below). Problematic part of the mail:
> 
> ~~~ snip ~~~
> [...]
> To: someone@example.org, "line
>  break" <linebreak@example.org>, someoneelse@example.org
> User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> [...]
> ~~~ snap ~~~
> 
> http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
> 
>    Note: Though structured field bodies are defined in such a way that
>    folding can take place between many of the lexical tokens (and even
>    within some of the lexical tokens), folding SHOULD be limited to
>    placing the CRLF at higher-level syntactic breaks.  For instance, if
>    a field body is defined as comma-separated values, it is recommended
>    that folding occur after the comma separating the structured items in
>    preference to other places where the field could be folded, even if
>    it is allowed elsewhere.
> 
> So notmuch "rfc-SHOULD" place the newlines after the comma.
> 
> The rfc goes on:
> 
>    The process of moving from this folded multiple-line representation
>    of a header field to its single line representation is called
>    "unfolding". Unfolding is accomplished by simply removing any CRLF
>    that is immediately followed by WSP.  Each header field should be
>    treated in its unfolded form for further syntactic and semantic
>    evaluation.
> 
> My interpretation is that unfolding simply removes any linebreaks
> first, so the value does not contain any newlines. But Python's email
> module discriminates quoted and unquoted parts of the value:
> 
> ~~~ snip ~~~
> from __future__ import print_function
> import email
> from email.utils import getaddresses
> 
> m = email.message_from_string('''To: "line
>  break" <linebreak@example.org>, line
>  break <linebreak@example.org>''')
> print("m['To'] = ", m['To'])
> print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> ~~~ snap ~~~
> 
> % python3 test.py
> m['To'] =  "line
>  break" <linebreak@example.org>, line
>  break <linebreak@example.org>
> getaddresses(m.get_all('To')) =  [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]
> 
> I believe that is what's preventing me from replying to the message
> using alot without sanitizing the To header first. Not really sure who
> is wrong or right here... any thoughts?

There are at least two bugs here.  Regardless of what we RFC-should
do, that folding *is* permitted by RFC2822, since quoted
strings can contain folding whitespace:

  http://tools.ietf.org/html/rfc2822#section-3.2.5

For completeness, the full derivation for this "To" header is:

to              =       "To:" address-list CRLF
address-list    =       (address *("," address)) / obs-addr-list
address         =       mailbox / group
mailbox         =       name-addr / addr-spec
name-addr       =       [display-name] angle-addr
display-name    =       phrase
phrase          =       1*word / obs-phrase
word            =       atom / quoted-string
quoted-string   =       [CFWS]
                        DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                        [CFWS]

Do you happen to know how the strangely folded "to" header was
produced for this message?  In notmuch-emacs, a user can put whatever
they want in a message-mode buffer's headers and mm will dutifully
pass it on to their MTA.  We could validate it, but that's a slippery
slope and I would hope that the MTA itself is validating it (and
probably more thoroughly than we could).

That said, the first bug here is in Python.  As I mentioned above,
foldable whitespace is allowed in quoted strings.  In fact, though the
standard is rather long-winded about whitespace, if you dig into the
grammar, you'll find that *all whitespace can be folded* (except in
the obsolete grammar, which allowed whitespace between the header name
and the colon, which obviously can't be folded).  I'm not sure what
Python is doing, but I bet it's going to a lot of effort to
mis-implement something very simple.
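
For comparison, a sketch that assumes a Python 3 release whose email
package ships email.policy (this has not been checked against the
interpreter used in the test above). Parsing with
policy=email.policy.default is meant to return header values in their
unfolded form, so the line break inside the quoted string should not
survive into the parsed addresses.

```python
import email
import email.policy

source = ('To: "line\n break" <linebreak@example.org>, line\n'
          ' break <linebreak@example.org>')

# Assumption: the modern policy is available; the default compat32
# policy keeps the raw, still-folded header value instead.
m = email.message_from_string(source, policy=email.policy.default)

to = m["To"]
print(repr(str(to)))
for address in to.addresses:
    print(repr(address.display_name), address.addr_spec)
```

If that holds, both mailboxes come back with a display name free of
newlines, which is the behaviour the grammar above calls for.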

There also appears to be a bug in the notmuch CLI's reply command
where it omits addresses that were folded in the original message.  I
don't know if alot uses the CLI's reply command, so this may or may
not be related to your specific issue.  I haven't dug into this yet,
other than to confirm that it's the CLI's fault and not
notmuch-emacs's.

> Justus


* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-23 16:59 ` Austin Clements
@ 2013-06-23 17:27   ` Austin Clements
  2013-06-24  8:57   ` Justus Winter
  1 sibling, 0 replies; 5+ messages in thread
From: Austin Clements @ 2013-06-23 17:27 UTC
  To: Justus Winter; +Cc: notmuch mailing list

On Sun, 23 Jun 2013, Austin Clements <amdragon@MIT.EDU> wrote:
> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message.  I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue.  I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

I take back what I said about there being a bug in the reply command.
It was a problem with my test case.


* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-23 16:59 ` Austin Clements
  2013-06-23 17:27   ` Austin Clements
@ 2013-06-24  8:57   ` Justus Winter
  2013-06-24  9:19     ` Thomas Schwinge
  1 sibling, 1 reply; 5+ messages in thread
From: Justus Winter @ 2013-06-24  8:57 UTC
  To: Austin Clements, thomas schwinge; +Cc: notmuch mailing list

Quoting Austin Clements (2013-06-23 18:59:39)
> Quoth Justus Winter on Jun 23 at  3:11 pm:
> > Hi,
> > 
> > I recently had a problem replying to a mail written by Thomas Schwinge
> > using an oldish notmuch. Not sure if it has been fixed in more recent
> > versions, but I think notmuch could improve upon its header
> > generation (see below). Problematic part of the mail:
> > 
> > ~~~ snip ~~~
> > [...]
> > To: someone@example.org, "line
> >  break" <linebreak@example.org>, someoneelse@example.org
> > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> > [...]
> > ~~~ snap ~~~
> > 
> > http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
> > 
> >    Note: Though structured field bodies are defined in such a way that
> >    folding can take place between many of the lexical tokens (and even
> >    within some of the lexical tokens), folding SHOULD be limited to
> >    placing the CRLF at higher-level syntactic breaks.  For instance, if
> >    a field body is defined as comma-separated values, it is recommended
> >    that folding occur after the comma separating the structured items in
> >    preference to other places where the field could be folded, even if
> >    it is allowed elsewhere.
> > 
> > So notmuch "rfc-SHOULD" place the newlines after the comma.
> > 
> > The rfc goes on:
> > 
> >    The process of moving from this folded multiple-line representation
> >    of a header field to its single line representation is called
> >    "unfolding". Unfolding is accomplished by simply removing any CRLF
> >    that is immediately followed by WSP.  Each header field should be
> >    treated in its unfolded form for further syntactic and semantic
> >    evaluation.
> > 
> > My interpretation is that unfolding simply removes any linebreaks
> > first, so the value does not contain any newlines. But Python's email
> > module discriminates quoted and unquoted parts of the value:
> > 
> > ~~~ snip ~~~
> > from __future__ import print_function
> > import email
> > from email.utils import getaddresses
> > 
> > m = email.message_from_string('''To: "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>''')
> > print("m['To'] = ", m['To'])
> > print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> > ~~~ snap ~~~
> > 
> > % python3 test.py
> > m['To'] =  "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>
> > getaddresses(m.get_all('To')) =  [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]
> > 
> > I believe that is what's preventing me from replying to the message
> > using alot without sanitizing the To header first. Not really sure who
> > is wrong or right here... any thoughts?
> 
> There are at least two bugs here.  Regardless of what we RFC-should
> do, that folding *is* permitted by RFC2822, since quoted
> strings can contain folding whitespace:
> 
>   http://tools.ietf.org/html/rfc2822#section-3.2.5
> 
> For completeness, the full derivation for this "To" header is:
> 
> to              =       "To:" address-list CRLF
> address-list    =       (address *("," address)) / obs-addr-list
> address         =       mailbox / group
> mailbox         =       name-addr / addr-spec
> name-addr       =       [display-name] angle-addr
> display-name    =       phrase
> phrase          =       1*word / obs-phrase
> word            =       atom / quoted-string
> quoted-string   =       [CFWS]
>                         DQUOTE *([FWS] qcontent) [FWS] DQUOTE
>                         [CFWS]
> 
> Do you happen to know how the strangely folded "to" header was
> produced for this message?

No, but Thomas might. Thomas, the problematic message is
id:877ghpqckb.fsf@kepler.schwinge.homeip.net

>  In notmuch-emacs, a user can put whatever
> they want in a message-mode buffer's headers and mm will dutifully
> pass it on to their MTA.  We could validate it, but that's a slippery
> slope and I would hope that the MTA itself is validating it (and
> probably more thoroughly than we could).
> 
> That said, the first bug here is in Python.  As I mentioned above,
> foldable whitespace is allowed in quoted strings.  In fact, though the
> standard is rather long-winded about whitespace, if you dig into the
> grammar, you'll find that *all whitespace can be folded* (except in
> the obsolete grammar, which allowed whitespace between the header name
> and the colon, which obviously can't be folded).  I'm not sure what
> Python is doing, but I bet it's going to a lot of effort to
> mis-implement something very simple.

Yes, I'm glad you came to the same conclusion.

> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message.  I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue.  I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

No, alot does not use notmuch's reply command.

Thanks,
Justus


* Re: header continuation issue in notmuch frontend/alot/pythons email module
  2013-06-24  8:57   ` Justus Winter
@ 2013-06-24  9:19     ` Thomas Schwinge
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Schwinge @ 2013-06-24  9:19 UTC
  To: Justus Winter, Austin Clements; +Cc: notmuch mailing list


Hi!

On Mon, 24 Jun 2013 10:57:10 +0200, Justus Winter <4winter@informatik.uni-hamburg.de> wrote:
> Quoting Austin Clements (2013-06-23 18:59:39)
> > Quoth Justus Winter on Jun 23 at  3:11 pm:
> > > I recently had a problem replying to a mail written by Thomas Schwinge
> > > using an oldish notmuch. Not sure if it has been fixed in more recent

"Oldish", yeah, yeah, I know...  (Mumbles something about long TODO list.)

> > > versions, but I think notmuch could improve upon its header
> > > generation (see below). Problematic part of the mail:
> > > 
> > > ~~~ snip ~~~
> > > [...]
> > > To: someone@example.org, "line
> > >  break" <linebreak@example.org>, someoneelse@example.org
> > > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> > > [...]
> > > ~~~ snap ~~~

> > Do you happen to know how the strangely folded "to" header was
> > produced for this message?

I just entered/copied all the addresses into one long To: line, and then
let message-mode do its thing.

> No, but Thomas might. Thomas, the problematic message is
> id:877ghpqckb.fsf@kepler.schwinge.homeip.net

Here is the header from the message as I sent it:

    To: Samuel Thibault <samuel.thibault@gnu.org>, Justus Winter
     <4winter@informatik.uni-hamburg.de>, fotis.koutoulakis@gmail.com, Ian
     Lance Taylor <iant@google.com>, toscano.pino@tiscali.it, Luis Machado
     <lgustavo@codesourcery.com>, =?utf-8?B?6ZmG5bKz?=
     <hacklu.newborn@gmail.com>

And this is what I received from the bug-hurd mailing list:

    To: Samuel Thibault <samuel.thibault@gnu.org>, Justus Winter
    	<4winter@informatik.uni-hamburg.de>, <fotis.koutoulakis@gmail.com>, "Ian
    	Lance Taylor" <iant@google.com>, <toscano.pino@tiscali.it>, Luis Machado
    	<lgustavo@codesourcery.com>,
    	=?utf-8?B?6ZmG5bKz?= <hacklu.newborn@gmail.com>

So the "corruption" (if it is declared as one; I don't have time right
now to follow your RFC interpretation) must have happened after sending
it off -- perhaps my company's Microsoft Exchange server (as Justus
received a direct copy from that one), or even msmtp used as the local
MTA.
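
To check whether the rewrite changed anything beyond folding and
quoting, the two headers can be unfolded and compared. The following is
a rough sketch on an excerpt of the two headers above (the first four
mailboxes only); the unfold helper is an assumption following RFC 2822
section 2.2.3, not part of the email module.

```python
import re
from email.utils import getaddresses


def unfold(value):
    # Remove a line break that is immediately followed by WSP.
    return re.sub(r"\r?\n(?=[ \t])", "", value)


# Excerpt of the header as sent.
sent = ("Samuel Thibault <samuel.thibault@gnu.org>, Justus Winter\n"
        " <4winter@informatik.uni-hamburg.de>, fotis.koutoulakis@gmail.com, Ian\n"
        " Lance Taylor <iant@google.com>")

# Excerpt of the header as received from the list.
received = ("Samuel Thibault <samuel.thibault@gnu.org>, Justus Winter\n"
            "\t<4winter@informatik.uni-hamburg.de>, <fotis.koutoulakis@gmail.com>, \"Ian\n"
            "\tLance Taylor\" <iant@google.com>")

sent_addrs = getaddresses([unfold(sent)])
received_addrs = getaddresses([unfold(received)])

# Compare only the addr-spec part; display names may differ in
# quoting and in the whitespace left behind by the fold.
same_recipients = ([addr for _, addr in sent_addrs]
                   == [addr for _, addr in received_addrs])
print(same_recipients)
print(received_addrs)
```

On this excerpt the recipients should come out identical, with the
difference confined to the display names: the received copy adds quotes
and folds with a tab inside them.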


Grüße,
 Thomas

