* Tabulation in multiline headers
From: Sergei Shilovsky @ 2014-10-17 23:45 UTC
To: notmuch

Let's consider this message:

id:87r5aucoeg.fsf@servo.finestructure.net

Its subject spreads over 2 lines and the 2nd line is indented with
<Tab> in the file:

Subject: running the crypto branch [was: Re: Hiding HTML mime-parts and/or
<Tab--->scrubbing (gmail's) HTML-based citation]

The issue is that notmuch_message_get_header() returns this whole line
with the Tab character (though I guess it should not):

running the crypto branch [was: Re: Hiding HTML mime-parts and/or<Tab>scrubbing (gmail's) HTML-based citation]

This file could have been imported from gmane with mb2md, though. My
test message with a long subject (sent via gmail) didn't get any
tabulation.

No idea where this tabulation could have come from, but would it be
correct to replace <tab> with a space in libnotmuch itself?

-- 
Best regards,
Sergei Shilovsky

* Re: Tabulation in multiline headers
From: Jameson Graef Rollins @ 2014-10-18 0:08 UTC
To: sshilovsky, notmuch

On Fri, Oct 17 2014, Sergei Shilovsky <sshilovsky@gmail.com> wrote:
> Let's consider this message:
>
> id:87r5aucoeg.fsf@servo.finestructure.net
>
> Its subject spreads over 2 lines and the 2nd line is indented with
> <Tab> in the file:
>
> Subject: running the crypto branch [was: Re: Hiding HTML mime-parts and/or
> <Tab--->scrubbing (gmail's) HTML-based citation]
>
> The issue is that notmuch_message_get_header() returns this whole line
> with the Tab character (though I guess it should not):
>
> running the crypto branch [was: Re: Hiding HTML mime-parts
> and/or<Tab>scrubbing (gmail's) HTML-based citation]
>
> This file could have been imported from gmane with mb2md, though. My
> test message with a long subject (sent via gmail) didn't get any
> tabulation.

Hi, Sergei. I'm not clear on where exactly you are seeing a problem
with this tab in the subject line. Is it showing up somewhere you think
it shouldn't? Headers that are broken across multiple lines must be
indented, so I think it is fairly standard for MUAs to insert either a
space or a tab at that point.

> No idea where this tabulation could have come from, but would it be
> correct to replace <tab> with a space in libnotmuch itself?

User-Agent: Notmuch/0.5-102-ge86ac1d (http://notmuchmail.org) Emacs/23.2.1 (i486-pc-linux-gnu)

I'm not sure libnotmuch should be doing any scrubbing of the message
contents. The emacs UI does seem to replace the tab with a space,
though. Maybe other MUAs should be doing the same?

jamie.

* Re: Tabulation in multiline headers
From: Sergei Shilovsky @ 2014-10-18 1:37 UTC
To: Jameson Graef Rollins
Cc: notmuch

> Hi, Sergei. I'm not clear on where exactly you are seeing a problem
> with this tab in the subject line. Is it showing up somewhere you think
> it shouldn't?

It shows up in e.g. `notmuch show` as well as in the return value of
`notmuch_message_get_header(m, "subject")`.

> I'm not sure libnotmuch should be doing any scrubbing of the message
> contents. The emacs UI does seem to replace the tab with a space,
> though. Maybe other MUAs should be doing the same?

My point is that this tabulation character is not part of the contents
of the header (this might be arguable, though), and libnotmuch should
return the contents, not their representation on the file system.

-- 
Best regards,
Sergei Shilovsky

* Re: Tabulation in multiline headers
From: Jani Nikula @ 2014-10-18 9:11 UTC
To: sshilovsky, Jameson Graef Rollins
Cc: notmuch

On Sat, 18 Oct 2014, Sergei Shilovsky <sshilovsky@gmail.com> wrote:
>> Hi, Sergei. I'm not clear on where exactly you are seeing a problem
>> with this tab in the subject line. Is it showing up somewhere you think
>> it shouldn't?
>
> It is shown in e.g. `notmuch show` as well as
> `notmuch_message_get_header(m, "subject")`
>
>> I'm not sure libnotmuch should be doing any scrubbing of the message
>> contents. The emacs UI does seem to replace the tab with a space,
>> though. Maybe other MUAs should be doing the same?
>
> My point is that this tabulation character does not relate to the
> contents of the header (this might be arguable though) and libnotmuch
> should return the contents, not its representation on file system.

This is folding and unfolding of long header fields in action, as
described in [1]. In short, folding happens by inserting CRLF before
any WSP, and unfolding happens by removing any CRLF immediately
followed by WSP. The WSP is preserved unchanged through folding and
unfolding. The TAB is not part of the multiple-line representation;
it's part of the unfolded content.

If my memory serves me right, many problems lead back to an
interpretation of [2] that you could insert extra WSP while folding.
Due to this interpretation, many agents replace the WSP following a
CRLF with a single space while unfolding. And presumably because of
this, buggy folding in a Python email package, which replaces WSP with
a TAB while folding, went unnoticed. This problem, in turn, has been
spread widely by Mailman 2 through its use of said email package. In
practice it follows that a perfectly good message will have its folding
WSP replaced by a TAB when it gets transmitted through Mailman 2.

Again, this is all from memory, [citation needed] etc.

Notmuch is not free of a history of its own when it comes to header
unfolding. For historical reasons, we used two header parsers until
recently: one from gmime, and one of our own. After all of the above,
it shouldn't surprise the reader that the parsers treated folding WSP
differently! Our own parser replaced folding WSP with a single space,
while gmime respects the RFC. Starting from 0.18 we only use gmime to
parse headers, which means we're at least consistent, but, by the GIGO
principle, we may see more folding TABs.

I do not think we should work around header folding problems in the
lib, and I'm not sure about the cli either. We should consider
replacing TABs with spaces in notmuch-emacs, though (I personally use a
notmuch-show-markup-headers-hook that does that).

HTH,
Jani.

[1] https://tools.ietf.org/html/rfc5322#section-2.2.3
[2] https://tools.ietf.org/html/rfc822#section-3.1
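
For readers who want to see the two unfolding behaviours from the
message above side by side, here is a minimal Python sketch. It is not
the code of gmime, notmuch, or any other parser; the function names
unfold_strict and unfold_lenient are made up for illustration, and
accepting a bare LF as well as CRLF is an assumption made because
locally stored mail files often use LF line endings.

```python
import re

# A line break (CRLF, or a bare LF as an assumption for local mail files)
# that is immediately followed by WSP, i.e. a fold point.
_FOLD_BREAK = re.compile(r"\r?\n(?=[ \t])")

# The same line break together with the run of WSP that follows it.
_FOLD_BREAK_AND_WSP = re.compile(r"\r?\n[ \t]+")


def unfold_strict(raw: str) -> str:
    """Unfold as in RFC 5322 section 2.2.3: remove the line break only.

    The WSP after the break is left untouched, so a folding TAB survives.
    """
    return _FOLD_BREAK.sub("", raw)


def unfold_lenient(raw: str) -> str:
    """Unfold the lenient way: line break plus folding WSP become one space."""
    return _FOLD_BREAK_AND_WSP.sub(" ", raw)


# The subject from this thread, folded with a TAB on the continuation line.
folded = (
    "running the crypto branch [was: Re: Hiding HTML mime-parts and/or\r\n"
    "\tscrubbing (gmail's) HTML-based citation]"
)

print(repr(unfold_strict(folded)))   # the TAB is still in the value
print(repr(unfold_lenient(folded)))  # the TAB has become a single space
```

The strict variant corresponds to what the message describes as
respecting the RFC, and the lenient one to replacing folding WSP with a
single space. A front end that prefers spaces could apply something
like the lenient variant at display time and leave the stored value
alone.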