RFC 822 specifies that headers are one-liners of ASCII:

> The field-body may be composed of any ASCII characters, except CR or
> LF. (While CR and/or LF may be present in the actual text, they are
> removed by the action of unfolding the field.)

RFC 5335 allows UTF-8 in header field bodies, but as I read the docs,
the RFC 822 specification that they end up as one-liners still applies.

RFC 5322 describes folding and unfolding as follows:

> Each header field is logically a single line of characters comprising
> the field name, the colon, and the field body. For convenience
> however, and to deal with the 998/78 character limitations per line,
> the field body portion of a header field can be split into a
> multiple-line representation; this is called "folding". The general
> rule is that wherever this specification allows for folding white
> space (not simply WSP characters), a CRLF may be inserted before any
> WSP.
> ...
> The process of moving from this folded multiple-line representation of
> a header field to its single line representation is called
> "unfolding". Unfolding is accomplished by simply removing any CRLF
> that is immediately followed by WSP.

Again, unfolded subjects should be one-liners.

An email sent to me from pingg.com (I think it's a pretentious version
of evite) came with a subject of
"=?utf-8?Q?bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A?=", which
"notmuch search" displays as "Subject: bring small items for a
piñata!!!!" with a \n at the end. This befuddles the emacs UI ("Error:
Unexpected output from notmuch search:"). I've attached an email that
reproduces the error.

I don't think ending the subject with a utf-8-encoded 0x0A followed by
the usual CRLF is RFC-compliant. Still, notmuch should surely follow
the deplorable "accept liberally/emit conservatively" doctrine.

Here is a patch that trims leading and trailing whitespace from
subjects and replaces internal non-space, non-horizontal-tab whitespace
with spaces.
It fixes the problem described in this message.

---
 lib/thread.cc |   36 ++++++++++++++++++++++++++++++++----
 1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/lib/thread.cc b/lib/thread.cc
index 5190a66..7a816ea 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -266,6 +266,34 @@ _thread_add_message (notmuch_thread_t *thread,
     }
 }
 
+/* Remove leading/trailing whitespace and replace internal vertical
+ * whitespace with spaces.
+ */
+static char *
+rectify_whitespace (char *str)
+{
+    char *last;
+    char *curr;
+
+    while (isspace (*str))
+        str++;
+
+    if (*str == 0)
+        return str;
+
+    last = str + strlen(str) - 1;
+    while (last > str && isspace (*last))
+        last--;
+
+    curr = str;
+    do
+        if ((*curr >= 10) && (*curr <= 13))
+            *curr = 32; //space
+    while (curr++ < last);
+
+    return str;
+}
+
 static void
 _thread_set_subject_from_message (notmuch_thread_t *thread,
                                   notmuch_message_t *message)
@@ -282,11 +310,11 @@ _thread_set_subject_from_message (notmuch_thread_t *thread,
         (strncasecmp (subject, "Vs: ", 4) == 0) ||
         (strncasecmp (subject, "Sv: ", 4) == 0))
     {
-        cleaned_subject = talloc_strndup (thread,
-                                          subject + 4,
-                                          strlen(subject) - 4);
+        cleaned_subject = rectify_whitespace(talloc_strndup (thread,
+                                          subject + 4,
+                                          strlen(subject) - 4));
     } else {
-        cleaned_subject = talloc_strdup (thread, subject);
+        cleaned_subject = rectify_whitespace(talloc_strdup (thread, subject));
     }
 
     if (thread->subject)
-- 
1.7.2.3
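
For anyone who wants to try the whitespace handling outside of notmuch,
here is a minimal standalone sketch of the same idea. The name
normalize_subject is hypothetical and not part of notmuch. Unlike the
patch, which only locates the last non-whitespace character, this
sketch also writes a NUL after it so that trailing whitespace is
actually cut off, and it casts to unsigned char before calling isspace
so that UTF-8 bytes are handled safely.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of notmuch: normalize a subject in
 * place. Leading and trailing whitespace is dropped, and LF, VT, FF
 * and CR inside the text become plain spaces. Returns a pointer into
 * the same buffer (possibly past its start). */
static char *
normalize_subject (char *str)
{
    char *last;
    char *curr;

    /* Skip leading whitespace. The cast avoids undefined behavior for
     * bytes >= 0x80, such as the bytes of a UTF-8 sequence. */
    while (isspace ((unsigned char) *str))
        str++;

    /* Empty or all-whitespace input: nothing left to do. */
    if (*str == '\0')
        return str;

    /* Find the last non-whitespace character and terminate after it. */
    last = str + strlen (str) - 1;
    while (last > str && isspace ((unsigned char) *last))
        last--;
    last[1] = '\0';

    /* Replace internal vertical whitespace (codes 10 through 13). */
    for (curr = str; curr <= last; curr++)
        if (*curr >= '\n' && *curr <= '\r')
            *curr = ' ';

    return str;
}
```

Called on a writable buffer holding the decoded subject from the
pingg.com message, this returns the text without the final \n. Because
the returned pointer may be past the start of the buffer, a caller that
later frees the buffer should keep the original pointer around.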