unofficial mirror of notmuch@notmuchmail.org
* Tabulation in multiline headers
From: Sergei Shilovsky @ 2014-10-17 23:45 UTC (permalink / raw)
  To: notmuch

Let's consider this message:

    id:87r5aucoeg.fsf@servo.finestructure.net

Its subject spreads over 2 lines and the 2nd line is indented with
<Tab> in the file:

    Subject: running the crypto branch [was: Re: Hiding HTML mime-parts and/or
    <Tab--->scrubbing (gmail's) HTML-based citation]

The issue is that notmuch_message_get_header() returns this whole line
with the Tab character (though I guess it should not):

    running the crypto branch [was: Re: Hiding HTML mime-parts and/or<Tab>scrubbing (gmail's) HTML-based citation]

This file may have been imported from gmane with mb2md, though. My own
test message with a long subject (sent via gmail) did not get any tab.

No idea where this tab could have come from, but would it be correct to
replace <Tab> with a space in libnotmuch itself?

-- 
Best regards,
Сергей Шиловский
Sergei Shilovsky


* Re: Tabulation in multiline headers
From: Jameson Graef Rollins @ 2014-10-18  0:08 UTC (permalink / raw)
  To: sshilovsky, notmuch

[-- Attachment #1: Type: text/plain, Size: 1506 bytes --]

On Fri, Oct 17 2014, Sergei Shilovsky <sshilovsky@gmail.com> wrote:
> Let's consider this message:
>
>     id:87r5aucoeg.fsf@servo.finestructure.net
>
> Its subject spreads over 2 lines and the 2nd line is indented with
> <Tab> in the file:
>
>     Subject: running the crypto branch [was: Re: Hiding HTML mime-parts and/or
>     <Tab--->scrubbing (gmail's) HTML-based citation]
>
> The issue is that notmuch_message_get_header() returns this whole line
> with the Tab character (though I guess it should not):
>
>     running the crypto branch [was: Re: Hiding HTML mime-parts and/or<Tab>scrubbing (gmail's) HTML-based citation]
>
> This file may have been imported from gmane with mb2md, though. My own
> test message with a long subject (sent via gmail) did not get any tab.

Hi, Sergei.  I'm not clear on where exactly you are seeing a problem
with this tab in the subject line.  Is it showing up somewhere you think
it shouldn't?

Headers that are broken across multiple lines must be indented, so I
think it is fairly standard for MUAs to insert either a space or a tab
at that point.

> No idea where this tab could have come from, but would it be correct to
> replace <Tab> with a space in libnotmuch itself?

User-Agent: Notmuch/0.5-102-ge86ac1d (http://notmuchmail.org) Emacs/23.2.1
	(i486-pc-linux-gnu)

I'm not sure libnotmuch should be doing any scrubbing of the message
contents.  The emacs UI does seem to replace the tab with a space,
though.  Maybe other MUAs should be doing the same?

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 818 bytes --]


* Re: Tabulation in multiline headers
From: Sergei Shilovsky @ 2014-10-18  1:37 UTC (permalink / raw)
  To: Jameson Graef Rollins; +Cc: notmuch

> Hi, Sergei.  I'm not clear on where exactly you are seeing a problem
> with this tab in the subject line.  Is it showing up somewhere you think
> it shouldn't?

It shows up in e.g. `notmuch show` as well as in
`notmuch_message_get_header(m, "subject")`.

> I'm not sure libnotmuch should be doing any scrubbing of the message
> contents.  The emacs UI does seem to replace the tab with a space,
> though.  Maybe other MUAs should be doing the same?

My point is that this tab character is not part of the contents of the
header (this might be arguable, though), and libnotmuch should return the
contents, not their representation on the file system.

-- 
Best regards,
Сергей Шиловский
Sergei Shilovsky


* Re: Tabulation in multiline headers
From: Jani Nikula @ 2014-10-18  9:11 UTC (permalink / raw)
  To: sshilovsky, Jameson Graef Rollins; +Cc: notmuch

On Sat, 18 Oct 2014, Sergei Shilovsky <sshilovsky@gmail.com> wrote:
>> Hi, Sergei.  I'm not clear on where exactly you are seeing a problem
>> with this tab in the subject line.  Is it showing up somewhere you think
>> it shouldn't?
>
> It shows up in e.g. `notmuch show` as well as in
> `notmuch_message_get_header(m, "subject")`.
>
>> I'm not sure libnotmuch should be doing any scrubbing of the message
>> contents.  The emacs UI does seem to replace the tab with a space,
>> though.  Maybe other MUAs should be doing the same?
>
> My point is that this tab character is not part of the contents of the
> header (this might be arguable, though), and libnotmuch should return the
> contents, not their representation on the file system.

This is folding and unfolding of long header fields in action, described
in [1]. In short, folding happens by inserting CRLF before any WSP, and
unfolding happens by removing any CRLF immediately followed by WSP. The
WSP is preserved unchanged through folding and unfolding. The TAB is not
part of the multiple line representation, it's part of the unfolded
content.
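A minimal Python sketch of the folding and unfolding rules summarized above (RFC 5322, section 2.2.3). The function names `fold` and `unfold` and the 78-character line limit are illustrative choices, not code from notmuch or gmime. Applied to the Subject from this thread, folding reproduces the on-disk form, and unfolding gives back the original field with the TAB still in it.

```python
import re

# A CRLF that is immediately followed by WSP (space or TAB).
_FOLD_BREAK = re.compile(r"\r\n(?=[ \t])")


def fold(field: str, limit: int = 78) -> str:
    """Fold a header line by inserting CRLF before existing WSP.

    The WSP itself is kept and becomes the first character of the
    continuation line.
    """
    lines = []
    rest = field
    while len(rest) > limit:
        # Last WSP within the limit; search from index 1 so a continuation
        # line is never split right before its own leading WSP.
        cut = max(rest.rfind(" ", 1, limit + 1), rest.rfind("\t", 1, limit + 1))
        if cut <= 0:
            break  # no usable WSP: leave the rest of the line long
        lines.append(rest[:cut])
        rest = rest[cut:]
    lines.append(rest)
    return "\r\n".join(lines)


def unfold(folded: str) -> str:
    """Unfold by removing each CRLF that is immediately followed by WSP."""
    return _FOLD_BREAK.sub("", folded)


subject = ("Subject: running the crypto branch [was: Re: Hiding HTML mime-parts"
           " and/or\tscrubbing (gmail's) HTML-based citation]")
folded = fold(subject)
print(repr(folded))
assert unfold(folded) == subject  # the TAB survives the round trip
```

Because folding only inserts CRLF and unfolding only removes it, whatever WSP was in the field (space or TAB) is part of the content on both sides.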

If my memory serves me right, many problems lead back to an
interpretation of [2] that you could insert extra WSP while folding. Due
to this interpretation, many agents replace the WSP following a CRLF
with a single space while unfolding. And presumably because of this,
buggy folding in a Python email package that replaces WSP with a TAB while
folding went unnoticed. This problem, in turn, has literally been spread
far and wide by Mailman 2 through its use of said email package. In practice it
follows that a perfectly good message will have folding WSP replaced by
TAB when it gets transmitted through Mailman 2. Again, this is all from
memory, [citation needed] etc.
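To make the described failure concrete, here is a small Python model of it. `fold_with_tab` is a hypothetical function that only imitates the behaviour recalled above (the WSP at a fold point is replaced by a TAB); it is not the actual code of Python's email package or Mailman 2. A correct RFC 5322 unfold of its output then carries the TAB into the header content.

```python
import re

_FOLD_BREAK = re.compile(r"\r\n(?=[ \t])")


def unfold(folded: str) -> str:
    """RFC 5322 unfolding: drop CRLF before WSP, keep the WSP."""
    return _FOLD_BREAK.sub("", folded)


def fold_with_tab(field: str, limit: int = 78) -> str:
    """Hypothetical model of the buggy folding: the space at each fold
    point is *replaced* by a TAB instead of being kept."""
    lines = []
    rest = field
    while len(rest) > limit:
        cut = rest.rfind(" ", 1, limit + 1)
        if cut <= 0:
            break
        lines.append(rest[:cut])
        rest = "\t" + rest[cut + 1:]  # the original space is lost here
    lines.append(rest)
    return "\r\n".join(lines)


original = "Subject: " + " ".join(["word"] * 20)
relayed = unfold(fold_with_tab(original))
print(relayed == original)  # False: a space became a TAB
print("\t" in relayed)      # True
```

The message is still syntactically valid after this, which is why a lenient unfolder that collapses folding WSP to one space would hide the damage.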

Notmuch is not free of a history of its own when it comes to header
unfolding. For historical reasons, we used two header parsers until
recently. One from gmime, and one of our own. After all of the above, it
shouldn't surprise the reader that the parsers treated folding WSP
differently! Our own parser replaced folding WSP with a single space,
while gmime respects the RFC. Starting from 0.18 we only use gmime to
parse headers, which means we're at least consistent, but, by the GIGO
principle, we may see more folding TABs.
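The difference between the two parsers can be sketched as two unfolding strategies applied to the Subject from this thread. Both functions are hypothetical Python illustrations of the behaviours described above, not notmuch's old parser or gmime's code: `unfold_strict` follows RFC 5322, while `unfold_collapse` replaces the folding WSP with a single space.

```python
import re


def unfold_strict(folded: str) -> str:
    """RFC 5322: remove CRLF immediately followed by WSP; keep the WSP."""
    return re.sub(r"\r\n(?=[ \t])", "", folded)


def unfold_collapse(folded: str) -> str:
    """Lenient variant: replace CRLF plus the following WSP run with one space."""
    return re.sub(r"\r\n[ \t]+", " ", folded)


raw = ("running the crypto branch [was: Re: Hiding HTML mime-parts and/or"
       "\r\n\tscrubbing (gmail's) HTML-based citation]")
print(repr(unfold_strict(raw)))    # keeps '\t' between "and/or" and "scrubbing"
print(repr(unfold_collapse(raw)))  # a single space at the same spot
```

With only the strict variant in use, the output is consistent across code paths, but any TAB that a buggy folder put into the message now shows up in the result.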

I do not think we should work around header folding problems in the lib,
and I'm not sure about the cli either. We should consider replacing TABs
with spaces in notmuch-emacs though (I personally use a
notmuch-show-markup-headers-hook that does that).
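The approach suggested here keeps the library output untouched and normalizes only at display time. The actual hook mentioned is Emacs Lisp and is not shown in this thread; the following is a hypothetical Python sketch of the same idea, with `display_header` as an illustrative name.

```python
def display_header(name: str, value: str) -> str:
    """Format an already-unfolded header value for on-screen display.

    Only the displayed string changes: each TAB becomes one space.
    The value passed in (e.g. what the library returned) is not modified.
    """
    shown = value.replace("\t", " ")
    return f"{name}: {shown}"


stored = ("running the crypto branch [was: Re: Hiding HTML mime-parts and/or"
          "\tscrubbing (gmail's) HTML-based citation]")
print(display_header("Subject", stored))
```

Keeping this in the UI means that search, reply, and other consumers still see exactly what the message contains.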

HTH,
Jani.


[1] https://tools.ietf.org/html/rfc5322#section-2.2.3
[2] https://tools.ietf.org/html/rfc822#section-3.1



Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
