unofficial mirror of notmuch@notmuchmail.org
* [PATCH] Remove/replace vertical whitespace in subject header field body.
@ 2011-03-17  1:44 James Vasile
  2011-03-17  1:55 ` [PATCH] replace null terminator in string James Vasile
  0 siblings, 1 reply; 2+ messages in thread
From: James Vasile @ 2011-03-17  1:44 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 3642 bytes --]

RFC 822 specifies that headers are one-liners of ASCII:

> The field-body may be composed of any ASCII characters, except CR or
> LF.  (While CR and/or LF may be present in the actual text, they are
> removed by the action of unfolding the field.)

RFC 5335 allows UTF-8 in header field bodies, but as I read it, the RFC
822 requirement that unfolded field bodies are one-liners still applies.

RFC 5322 describes folding and unfolding as follows:

> Each header field is logically a single line of characters comprising
> the field name, the colon, and the field body. For convenience
> however, and to deal with the 998/78 character limitations per line,
> the field body portion of a header field can be split into a
> multiple-line representation; this is called "folding". The general
> rule is that wherever this specification allows for folding white
> space (not simply WSP characters), a CRLF may be inserted before any
> WSP.
...
> The process of moving from this folded multiple-line representation of
> a header field to its single line representation is called
> "unfolding". Unfolding is accomplished by simply removing any CRLF
> that is immediately followed by WSP.

Again, unfolded subjects should be one-liners.

An email sent to me from pingg.com (I think it's a pretentious version
of evite) came with a subject of
"=?utf-8?Q?bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A?=", which
"notmuch search" displays as "Subject: bring small items for a
piñata!!!!" with a \n at the end.  This befuddles the Emacs UI ("Error:
Unexpected output from notmuch search:").  I've attached an email that
reproduces the error.

I don't think ending the subject with an encoded 0x0A (the "=0A" inside
the UTF-8 encoded-word) followed by the usual CRLF is RFC-compliant.
Still, notmuch should surely follow the deplorable "accept
liberally/emit conservatively" doctrine.

Here is a patch that trims leading and trailing whitespace from subjects
and replaces internal whitespace other than space and horizontal tab
(that is, LF, VT, FF and CR) with spaces.  It fixes the problem
described in this message.
---
 lib/thread.cc |   36 ++++++++++++++++++++++++++++++++----
 1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/lib/thread.cc b/lib/thread.cc
index 5190a66..7a816ea 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -266,6 +266,34 @@ _thread_add_message (notmuch_thread_t *thread,
     }
 }
 
+/* Remove leading/trailing whitespace and replace internal vertical
+ * whitespace with spaces.
+ */
+static char *
+rectify_whitespace (char *str)
+{
+  char *last;
+  char *curr;
+
+  while (isspace ((unsigned char) *str))
+    str++;
+
+  if (*str == 0)
+    return str;
+
+  last = str + strlen(str) - 1;
+  while (last > str && isspace ((unsigned char) *last))
+    last--;
+
+  curr = str;
+  do
+    if ((*curr >= 10) && (*curr <= 13))
+      *curr = 32; //space
+  while (curr++ < last);
+
+  return str;
+}
+
 static void
 _thread_set_subject_from_message (notmuch_thread_t *thread,
 				  notmuch_message_t *message)
@@ -282,11 +310,11 @@ _thread_set_subject_from_message (notmuch_thread_t *thread,
 	(strncasecmp (subject, "Vs: ", 4) == 0) ||
 	(strncasecmp (subject, "Sv: ", 4) == 0)) {
 
-	cleaned_subject = talloc_strndup (thread,
-					  subject + 4,
-					  strlen(subject) - 4);
+	cleaned_subject = rectify_whitespace (talloc_strndup (thread,
+							      subject + 4,
+							      strlen(subject) - 4));
     } else {
-	cleaned_subject = talloc_strdup (thread, subject);
+	cleaned_subject = rectify_whitespace (talloc_strdup (thread, subject));
     }
 
     if (thread->subject)
-- 
1.7.2.3




[-- Attachment #2: malformed_subject --]
[-- Type: application/octet-stream, Size: 352 bytes --]

Date: Fri, 11 Mar 2011 18:40:00 +0000
From: "redacted" <host@invite.pingg.com>
To: redacted@example.com
Message-Id: <20110311183749.526771.31453.9841841@sender.pingg.com>
Subject: =?utf-8?Q?bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A?=
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Ignore this.


* [PATCH] replace null terminator in string
  2011-03-17  1:44 [PATCH] Remove/replace vertical whitespace in subject header field body James Vasile
@ 2011-03-17  1:55 ` James Vasile
  0 siblings, 0 replies; 2+ messages in thread
From: James Vasile @ 2011-03-17  1:55 UTC (permalink / raw)
  To: notmuch

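The prior patch finds the last non-whitespace character but never
terminates the string there, so trailing whitespace is still returned.
This follow-up writes a NUL terminator just after that character.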
---
 lib/thread.cc |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/lib/thread.cc b/lib/thread.cc
index 7a816ea..54fde2b 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -291,6 +291,8 @@ rectify_whitespace (char *str)
       *curr = 32; //space
   while (curr++ < last);
 
+  *(last+1) = 0;
+
   return str;
 }
 
-- 
1.7.2.3



Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
