unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH 0/6] lib: replace the message header parser with gmime
@ 2013-10-16 19:00 Jani Nikula
  2013-10-16 19:00 ` [PATCH 1/6] emacs: Sanitize authors and subjects in search and show Jani Nikula
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-16 19:00 UTC (permalink / raw)
  To: notmuch

Hi all, here's something to debate. ;)

We have a homebrew message header parser in the lib, and we also parse
messages, including headers, using gime during indexing. This means for
messages that get indexed we parse the headers twice. (Duplicates and
non-emails only get parsed using our own parser.)

The two parsers handle some things differently, which may cause
confusion (tab handling in header folding for example).

In the interest of reducing somewhat complicated code to maintain, just
nuke the homebrew parser in favor of gmime. I did not look into the
history of why we have our own parser to begin with; it was more fun to
just do some coding. ;)

Patches 1-3 do prep work to fix some of the differences in the parsers
in advance. Arguably they are not that bad regardless of the parser
change.

Patches 4-5 actually make the change. Having two patches is a somewhat
artificial division, but perhaps makes it easier to review.

Patch 6 is just a hack to make perf tests not ignore so many mails... we
have quite a bit of non-emails in the corpus by gmime parser
standards. And this illlustrates one of the differences in the parsers.


BR,
Jani.


Austin Clements (1):
  emacs: Sanitize authors and subjects in search and show

Jani Nikula (5):
  cli: sanitize tabs to spaces in notmuch search
  cli: make the hacky from guessing more liberal
  lib: replace the header parser with gmime
  lib: parse messages only once
  HACK: fix broken messages in the perf test corpus

 emacs/notmuch-lib.el              |   6 +
 emacs/notmuch-show.el             |   7 +-
 emacs/notmuch.el                  |   6 +-
 lib/database.cc                   |   6 +-
 lib/index.cc                      |  70 +-------
 lib/message-file.c                | 351 +++++++++++++-------------------------
 lib/message.cc                    |   6 +
 lib/notmuch-private.h             |  19 ++-
 notmuch-reply.c                   |   4 +-
 notmuch-search.c                  |   4 +-
 performance-test/perf-test-lib.sh |   4 +
 11 files changed, 172 insertions(+), 311 deletions(-)

-- 
1.8.4.rc3

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] emacs: Sanitize authors and subjects in search and show
  2013-10-16 19:00 [PATCH 0/6] lib: replace the message header parser with gmime Jani Nikula
@ 2013-10-16 19:00 ` Jani Nikula
  2013-10-16 19:00 ` [PATCH 2/6] cli: sanitize tabs to spaces in notmuch search Jani Nikula
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-16 19:00 UTC (permalink / raw)
  To: notmuch

From: Austin Clements <amdragon@MIT.EDU>

Authors and subjects can contain embedded, encoded control characters
like "\n" and "\t" that mess up display.  Transform control characters
into spaces everywhere we display them in search and show.
---
 emacs/notmuch-lib.el  | 6 ++++++
 emacs/notmuch-show.el | 7 ++++---
 emacs/notmuch.el      | 6 ++++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 58f3313..6541282 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -243,6 +243,12 @@ depending on the value of `notmuch-poll-script'."
 	"[No Subject]"
       subject)))
 
+(defun notmuch-sanitize (str)
+  "Sanitize control character in STR.
+
+This includes newlines, tabs, and other funny characters."
+  (replace-regexp-in-string "[[:cntrl:]\x7f\u2028\u2029]+" " " str))
+
 (defun notmuch-escape-boolean-term (term)
   "Escape a boolean term for use in a query.
 
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 7325792..fa11d98 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -407,7 +407,8 @@ unchanged ADDRESS if parsing fails."
 message at DEPTH in the current thread."
   (let ((start (point)))
     (insert (notmuch-show-spaces-n (* notmuch-show-indent-messages-width depth))
-	    (notmuch-show-clean-address (plist-get headers :From))
+	    (notmuch-sanitize
+	     (notmuch-show-clean-address (plist-get headers :From)))
 	    " ("
 	    date
 	    ") ("
@@ -417,7 +418,7 @@ message at DEPTH in the current thread."
 
 (defun notmuch-show-insert-header (header header-value)
   "Insert a single header."
-  (insert header ": " header-value "\n"))
+  (insert header ": " (notmuch-sanitize header-value) "\n"))
 
 (defun notmuch-show-insert-headers (headers)
   "Insert the headers of the current message."
@@ -1154,7 +1155,7 @@ function is used."
       (jit-lock-register #'notmuch-show-buttonise-links)
 
       ;; Set the header line to the subject of the first message.
-      (setq header-line-format (notmuch-show-strip-re (notmuch-show-get-subject)))
+      (setq header-line-format (notmuch-sanitize (notmuch-show-strip-re (notmuch-show-get-subject))))
 
       (run-hooks 'notmuch-show-hook))))
 
diff --git a/emacs/notmuch.el b/emacs/notmuch.el
index c47c6b5..44cd2fd 100644
--- a/emacs/notmuch.el
+++ b/emacs/notmuch.el
@@ -791,11 +791,13 @@ non-authors is found, assume that all of the authors match."
 					(plist-get result :total)))
 			'face 'notmuch-search-count)))
    ((string-equal field "subject")
-    (insert (propertize (format format-string (plist-get result :subject))
+    (insert (propertize (format format-string
+				(notmuch-sanitize (plist-get result :subject)))
 			'face 'notmuch-search-subject)))
 
    ((string-equal field "authors")
-    (notmuch-search-insert-authors format-string (plist-get result :authors)))
+    (notmuch-search-insert-authors
+     format-string (notmuch-sanitize (plist-get result :authors))))
 
    ((string-equal field "tags")
     (let ((tags (plist-get result :tags)))
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] cli: sanitize tabs to spaces in notmuch search
  2013-10-16 19:00 [PATCH 0/6] lib: replace the message header parser with gmime Jani Nikula
  2013-10-16 19:00 ` [PATCH 1/6] emacs: Sanitize authors and subjects in search and show Jani Nikula
@ 2013-10-16 19:00 ` Jani Nikula
  2013-10-17  8:07   ` Mark Walters
  2013-10-16 19:00 ` [PATCH 3/6] cli: make the hacky from guessing more liberal Jani Nikula
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Jani Nikula @ 2013-10-16 19:00 UTC (permalink / raw)
  To: notmuch

This is in preparation of switching to gmime header parsing, but
arguably converting tabs to spaces rather than question marks is the
right thing to do anyway.
---
 notmuch-search.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index d9d39ec..eab314f 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -40,7 +40,9 @@ sanitize_string (const void *ctx, const char *str)
     loop = out = talloc_strdup (ctx, str);
 
     for (; *loop; loop++) {
-	if ((unsigned char)(*loop) < 32)
+	if (*loop == '\t')
+	    *loop = ' ';
+	else if ((unsigned char)(*loop) < 32)
 	    *loop = '?';
     }
     return out;
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] cli: make the hacky from guessing more liberal
  2013-10-16 19:00 [PATCH 0/6] lib: replace the message header parser with gmime Jani Nikula
  2013-10-16 19:00 ` [PATCH 1/6] emacs: Sanitize authors and subjects in search and show Jani Nikula
  2013-10-16 19:00 ` [PATCH 2/6] cli: sanitize tabs to spaces in notmuch search Jani Nikula
@ 2013-10-16 19:00 ` Jani Nikula
  2013-10-17  8:11   ` Mark Walters
  2013-10-17 13:58   ` Moritz Wilhelmy
  2013-10-16 19:00 ` [PATCH 4/6] lib: replace the header parser with gmime Jani Nikula
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-16 19:00 UTC (permalink / raw)
  To: notmuch

This is in preparation of switching to gmime header parsing. Accept
"for" and "by" preceded by tabs in the received header. This is a bit
flaky, but so is the whole guessing code.
---
 notmuch-reply.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/notmuch-reply.c b/notmuch-reply.c
index 9d6f843..4b67e66 100644
--- a/notmuch-reply.c
+++ b/notmuch-reply.c
@@ -423,7 +423,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
     /* First we look for a " for <email@add.res>" in the received
      * header
      */
-    ptr = strstr (received, " for ");
+    ptr = strstr (received, "for ");
 
     /* Note: ptr potentially contains a list of email addresses. */
     addr = user_address_in_string (ptr, config);
@@ -440,7 +440,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
      * system in this step of the receive chain
      */
     by = received;
-    while((by = strstr (by, " by ")) != NULL) {
+    while((by = strstr (by, "by ")) != NULL) {
 	by += 4;
 	if (*by == '\0')
 	    break;
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] lib: replace the header parser with gmime
  2013-10-16 19:00 [PATCH 0/6] lib: replace the message header parser with gmime Jani Nikula
                   ` (2 preceding siblings ...)
  2013-10-16 19:00 ` [PATCH 3/6] cli: make the hacky from guessing more liberal Jani Nikula
@ 2013-10-16 19:00 ` Jani Nikula
  2013-10-16 19:00 ` [PATCH 5/6] lib: parse messages only once Jani Nikula
  2013-10-16 19:00 ` [PATCH 6/6] HACK: fix broken messages in the perf test corpus Jani Nikula
  5 siblings, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-16 19:00 UTC (permalink / raw)
  To: notmuch

The notmuch library includes a full blown message header parser. Yet
the same message headers are parsed by gmime during indexing. Switch
to gmime parsing completely.

These are the main differences between the parsers:

* Gmime stops header parsing at the first invalid header, and presumes
  the message body starts from there. The current parser is quite
  liberal in accepting broken headers. The change means we will be
  much pickier about accepting invalid messages. For example the
  performance test suite contains plenty of email that does not pass
  as valid email.

* The current parser converts tabs used in header folding to
  spaces. Gmime preserve the tabs. Due to a broken python library used
  in mailman, there are plenty of mailing lists that produce headers
  with tabs in header folding, we'll see plenty of tabs.

At this step, we only switch the header parsing to gmime.
---
 lib/database.cc       |   4 +
 lib/message-file.c    | 346 ++++++++++++++++----------------------------------
 lib/message.cc        |   6 +
 lib/notmuch-private.h |   4 +
 4 files changed, 122 insertions(+), 238 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 06f1c0a..45a3987 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1907,6 +1907,10 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
 					   "to",
 					   (char *) NULL);
 
+    ret = notmuch_message_file_parse (message_file);
+    if (ret)
+	goto DONE;
+
     try {
 	/* Before we do any real work, (especially before doing a
 	 * potential SHA-1 computation on the entire file's contents),
diff --git a/lib/message-file.c b/lib/message-file.c
index a2850c2..9d5a3b9 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -26,30 +26,19 @@
 
 #include <glib.h> /* GHashTable */
 
-typedef struct {
-    char *str;
-    size_t size;
-    size_t len;
-} header_value_closure_t;
-
 struct _notmuch_message_file {
     /* File object */
     FILE *file;
+    char *filename;
 
     /* Header storage */
     int restrict_headers;
     GHashTable *headers;
-    int broken_headers;
-    int good_headers;
-    size_t header_size; /* Length of full message header in bytes. */
-
-    /* Parsing state */
-    char *line;
-    size_t line_size;
-    header_value_closure_t value;
 
-    int parsing_started;
-    int parsing_finished;
+    GMimeStream *stream;
+    GMimeParser *parser;
+    GMimeMessage *message;
+    notmuch_bool_t parsed;
 };
 
 static int
@@ -76,16 +65,18 @@ strcase_hash (const void *ptr)
 static int
 _notmuch_message_file_destructor (notmuch_message_file_t *message)
 {
-    if (message->line)
-	free (message->line);
-
-    if (message->value.size)
-	free (message->value.str);
-
     if (message->headers)
 	g_hash_table_destroy (message->headers);
 
-    if (message->file)
+    if (message->message)
+	g_object_unref (message->message);
+
+    if (message->parser)
+	g_object_unref (message->parser);
+
+    if (message->stream)
+	g_object_unref (message->stream);
+    else if (message->file) /* GMimeStream takes over the FILE* */
 	fclose (message->file);
 
     return 0;
@@ -102,19 +93,21 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename)
     if (unlikely (message == NULL))
 	return NULL;
 
+    /* only needed for error messages during parsing */
+    message->filename = talloc_strdup (message, filename);
+    if (message->filename == NULL)
+	goto FAIL;
+
     talloc_set_destructor (message, _notmuch_message_file_destructor);
 
     message->file = fopen (filename, "r");
     if (message->file == NULL)
 	goto FAIL;
 
-    message->headers = g_hash_table_new_full (strcase_hash,
-					      strcase_equal,
-					      free,
-					      g_free);
-
-    message->parsing_started = 0;
-    message->parsing_finished = 0;
+    message->headers = g_hash_table_new_full (strcase_hash, strcase_equal,
+					      free, g_free);
+    if (message->headers == NULL)
+	goto FAIL;
 
     return message;
 
@@ -143,7 +136,7 @@ notmuch_message_file_restrict_headersv (notmuch_message_file_t *message,
 {
     char *header;
 
-    if (message->parsing_started)
+    if (message->parsed)
 	INTERNAL_ERROR ("notmuch_message_file_restrict_headers called after parsing has started");
 
     while (1) {
@@ -167,234 +160,111 @@ notmuch_message_file_restrict_headers (notmuch_message_file_t *message, ...)
     notmuch_message_file_restrict_headersv (message, va_headers);
 }
 
-static void
-copy_header_unfolding (header_value_closure_t *value,
-		       const char *chunk)
+/*
+ * gmime does not provide access to all headers, so we need to use the
+ * parser header callback to gather them into a hash table.
+ */
+static void header_cb (unused(GMimeParser *parser), const char *header,
+		       const char *value, unused(gint64 offset),
+		       gpointer user_data)
 {
-    char *last;
+    notmuch_message_file_t *message = (notmuch_message_file_t *) user_data;
+    char *existing = NULL;
+    notmuch_bool_t found;
 
-    if (chunk == NULL)
+    found = g_hash_table_lookup_extended (message->headers, header,
+					  NULL, (gpointer *) &existing);
+    if (! found && message->restrict_headers)
 	return;
 
-    while (*chunk == ' ' || *chunk == '\t')
-	chunk++;
-
-    if (value->len + 1 + strlen (chunk) + 1 > value->size) {
-	unsigned int new_size = value->size;
-	if (value->size == 0)
-	    new_size = strlen (chunk) + 1;
-	else
-	    while (value->len + 1 + strlen (chunk) + 1 > new_size)
-		new_size *= 2;
-	value->str = xrealloc (value->str, new_size);
-	value->size = new_size;
-    }
-
-    last = value->str + value->len;
-    if (value->len) {
-	*last = ' ';
-	last++;
-	value->len++;
-    }
-
-    strcpy (last, chunk);
-    value->len += strlen (chunk);
-
-    last = value->str + value->len - 1;
-    if (*last == '\n') {
-	*last = '\0';
-	value->len--;
+    if (existing == NULL) {
+	/* No value, add one */
+	char *decoded = g_mime_utils_header_decode_text (value);
+	g_hash_table_insert (message->headers, xstrdup (header), decoded);
+    } else if (strcasecmp (header, "received") == 0) {
+	/* Concat all of the Received: headers we encounter. */
+	char *combined, *decoded;
+	size_t combined_size;
+
+	decoded = g_mime_utils_header_decode_text (value);
+
+	combined_size = strlen(existing) + strlen(decoded) + 2;
+	combined = g_malloc (combined_size);
+	snprintf (combined, combined_size, "%s %s", existing, decoded);
+	g_free (decoded);
+	g_hash_table_insert (message->headers, xstrdup (header), combined);
     }
 }
 
-/* As a special-case, a value of NULL for header_desired will force
- * the entire header to be parsed if it is not parsed already. This is
- * used by the _notmuch_message_file_get_headers_end function.
- * Another special case is the Received: header. For this header we
- * want to concatenate all instances of the header instead of just
- * hashing the first instance as we use this when analyzing the path
- * the mail has taken from sender to recipient.
- */
-const char *
-notmuch_message_file_get_header (notmuch_message_file_t *message,
-				 const char *header_desired)
+notmuch_status_t
+notmuch_message_file_parse (notmuch_message_file_t *message)
 {
-    int contains;
-    char *header, *decoded_value, *header_sofar, *combined_header;
-    const char *s, *colon;
-    int match, newhdr, hdrsofar, is_received;
     static int initialized = 0;
-
-    is_received = (strcmp(header_desired,"received") == 0);
+    char from_buf[5];
+    notmuch_bool_t is_mbox = FALSE;
+    static notmuch_bool_t mbox_warning = FALSE;
 
     if (! initialized) {
 	g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);
 	initialized = 1;
     }
 
-    message->parsing_started = 1;
-
-    if (header_desired == NULL)
-	contains = 0;
-    else
-	contains = g_hash_table_lookup_extended (message->headers,
-						 header_desired, NULL,
-						 (gpointer *) &decoded_value);
-
-    if (contains && decoded_value)
-	return decoded_value;
-
-    if (message->parsing_finished)
-	return "";
-
-#define NEXT_HEADER_LINE(closure)				\
-    while (1) {							\
-	ssize_t bytes_read = getline (&message->line,		\
-				      &message->line_size,	\
-				      message->file);		\
-	if (bytes_read == -1) {					\
-	    message->parsing_finished = 1;			\
-	    break;						\
-	}							\
-	if (*message->line == '\n') {				\
-	    message->parsing_finished = 1;			\
-	    break;						\
-	}							\
-	if (closure &&						\
-	    (*message->line == ' ' || *message->line == '\t'))	\
-	{							\
-	    copy_header_unfolding ((closure), message->line);	\
-	}							\
-	if (*message->line == ' ' || *message->line == '\t')	\
-	    message->header_size += strlen (message->line);	\
-	else							\
-	    break;						\
-    }
-
-    if (message->line == NULL)
-	NEXT_HEADER_LINE (NULL);
-
-    while (1) {
-
-	if (message->parsing_finished)
-	    break;
-
-	colon = strchr (message->line, ':');
-
-	if (colon == NULL) {
-	    message->broken_headers++;
-	    /* A simple heuristic for giving up on things that just
-	     * don't look like mail messages. */
-	    if (message->broken_headers >= 10 &&
-		message->good_headers < 5)
-	    {
-		message->parsing_finished = 1;
-		break;
-	    }
-	    NEXT_HEADER_LINE (NULL);
-	    continue;
+    /* Is this mbox? */
+    if (fread (from_buf, sizeof (from_buf), 1, message->file) == 1 &&
+	strncmp (from_buf, "From ", 5) == 0)
+	is_mbox = TRUE;
+    rewind (message->file);
+
+    message->stream = g_mime_stream_file_new (message->file);
+    message->parser = g_mime_parser_new_with_stream (message->stream);
+    g_mime_parser_set_scan_from (message->parser, is_mbox);
+    g_mime_parser_set_header_regex (message->parser, ".*", header_cb,
+				    (gpointer) message);
+
+    message->message = g_mime_parser_construct_message (message->parser);
+    if (! message->message)
+	return NOTMUCH_STATUS_FILE_NOT_EMAIL;
+
+    if (is_mbox) {
+	if (!g_mime_parser_eos (message->parser)) {
+	    /* This is a multi-message mbox. */
+	    return NOTMUCH_STATUS_FILE_NOT_EMAIL;
 	}
-
-	message->header_size += strlen (message->line);
-
-	message->good_headers++;
-
-	header = xstrndup (message->line, colon - message->line);
-
-	if (message->restrict_headers &&
-	    ! g_hash_table_lookup_extended (message->headers,
-					    header, NULL, NULL))
-	{
-	    free (header);
-	    NEXT_HEADER_LINE (NULL);
-	    continue;
-	}
-
-	s = colon + 1;
-	while (*s == ' ' || *s == '\t')
-	    s++;
-
-	message->value.len = 0;
-	copy_header_unfolding (&message->value, s);
-
-	NEXT_HEADER_LINE (&message->value);
-
-	if (header_desired == NULL)
-	    match = 0;
-	else
-	    match = (strcasecmp (header, header_desired) == 0);
-
-	decoded_value = g_mime_utils_header_decode_text (message->value.str);
-	header_sofar = (char *)g_hash_table_lookup (message->headers, header);
-	/* we treat the Received: header special - we want to concat ALL of 
-	 * the Received: headers we encounter.
-	 * for everything else we return the first instance of a header */
-	if (strcasecmp(header, "received") == 0) {
-	    if (header_sofar == NULL) {
-		/* first Received: header we encountered; just add it */
-		g_hash_table_insert (message->headers, header, decoded_value);
-	    } else {
-		/* we need to add the header to those we already collected */
-		newhdr = strlen(decoded_value);
-		hdrsofar = strlen(header_sofar);
-		combined_header = g_malloc(hdrsofar + newhdr + 2);
-		strncpy(combined_header,header_sofar,hdrsofar);
-		*(combined_header+hdrsofar) = ' ';
-		strncpy(combined_header+hdrsofar+1,decoded_value,newhdr+1);
-		g_free (decoded_value);
-		g_hash_table_insert (message->headers, header, combined_header);
-	    }
-	} else {
-	    if (header_sofar == NULL) {
-		/* Only insert if we don't have a value for this header, yet. */
-		g_hash_table_insert (message->headers, header, decoded_value);
-	    } else {
-		free (header);
-		g_free (decoded_value);
-		decoded_value = header_sofar;
-	    }
+	/* For historical reasons, we support single-message mboxes,
+	 * but this behavior is likely to change in the future, so
+	 * warn. */
+	if (!mbox_warning) {
+	    mbox_warning = TRUE;
+	    fprintf (stderr, "\
+Warning: %s is an mbox containing a single message,\n\
+likely caused by misconfigured mail delivery.  Support for single-message\n\
+mboxes is deprecated and may be removed in the future.\n", message->filename);
 	}
-	/* if we found a match we can bail - unless of course we are
-	 * collecting all the Received: headers */
-	if (match && !is_received)
-	    return decoded_value;
     }
 
-    if (message->parsing_finished) {
-        fclose (message->file);
-        message->file = NULL;
-    }
+    message->parsed = TRUE;
 
-    if (message->line)
-	free (message->line);
-    message->line = NULL;
+    return NOTMUCH_STATUS_SUCCESS;
+}
 
-    if (message->value.size) {
-	free (message->value.str);
-	message->value.str = NULL;
-	message->value.size = 0;
-	message->value.len = 0;
-    }
+/* return NULL on errors, empty string for non-existing headers */
+const char *
+notmuch_message_file_get_header (notmuch_message_file_t *message,
+				 const char *header)
+{
+    const char *value = NULL;
+    notmuch_bool_t found;
 
-    /* For the Received: header we actually might end up here even
-     * though we found the header (as we force continued parsing
-     * in that case). So let's check if that's the header we were
-     * looking for and return the value that we found (if any)
-     */
-    if (is_received)
-	return (char *)g_hash_table_lookup (message->headers, "received");
-
-    /* We've parsed all headers and never found the one we're looking
-     * for. It's probably just not there, but let's check that we
-     * didn't make a mistake preventing us from seeing it. */
-    if (message->restrict_headers && header_desired &&
-	! g_hash_table_lookup_extended (message->headers,
-					header_desired, NULL, NULL))
-    {
-	INTERNAL_ERROR ("Attempt to get header \"%s\" which was not\n"
-			"included in call to notmuch_message_file_restrict_headers\n",
-			header_desired);
-    }
+    /* the caller shouldn't ask for headers before parsing */
+    if (! message->parsed)
+	return NULL;
+
+    found = g_hash_table_lookup_extended (message->headers, header,
+					  NULL, (gpointer *) &value);
+
+    /* the caller shouldn't ask for non-restricted headers */
+    if (! found && message->restrict_headers)
+	return NULL;
 
-    return "";
+    return value ? value : "";
 }
diff --git a/lib/message.cc b/lib/message.cc
index 1b46379..1c7b91f 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -407,6 +407,12 @@ _notmuch_message_ensure_message_file (notmuch_message_t *message)
 	return;
 
     message->message_file = _notmuch_message_file_open_ctx (message, filename);
+
+    /* XXX: better return value handling */
+    if (notmuch_message_file_parse (message->message_file)) {
+	notmuch_message_file_close (message->message_file);
+	message->message_file = NULL;
+    }
 }
 
 const char *
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index af185c7..7277df1 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -345,6 +345,10 @@ notmuch_message_file_open (const char *filename);
 notmuch_message_file_t *
 _notmuch_message_file_open_ctx (void *ctx, const char *filename);
 
+/* Parse the message */
+notmuch_status_t
+notmuch_message_file_parse (notmuch_message_file_t *message);
+
 /* Close a notmuch message previously opened with notmuch_message_open. */
 void
 notmuch_message_file_close (notmuch_message_file_t *message);
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] lib: parse messages only once
  2013-10-16 19:00 [PATCH 0/6] lib: replace the message header parser with gmime Jani Nikula
                   ` (3 preceding siblings ...)
  2013-10-16 19:00 ` [PATCH 4/6] lib: replace the header parser with gmime Jani Nikula
@ 2013-10-16 19:00 ` Jani Nikula
  2013-10-16 19:00 ` [PATCH 6/6] HACK: fix broken messages in the perf test corpus Jani Nikula
  5 siblings, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-16 19:00 UTC (permalink / raw)
  To: notmuch

Make the necessary changes to only do one gmime parse pass during
indexing.
---
 lib/database.cc       |  2 +-
 lib/index.cc          | 70 +++++----------------------------------------------
 lib/message-file.c    |  9 +++++++
 lib/notmuch-private.h | 15 +++++++++--
 4 files changed, 29 insertions(+), 67 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 45a3987..d097dda 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1996,7 +1996,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
 	    date = notmuch_message_file_get_header (message_file, "date");
 	    _notmuch_message_set_header_values (message, date, from, subject);
 
-	    ret = _notmuch_message_index_file (message, filename);
+	    ret = _notmuch_message_index_file (message, message_file);
 	    if (ret)
 		goto DONE;
 	} else {
diff --git a/lib/index.cc b/lib/index.cc
index 78c18cf..71397da 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -425,63 +425,15 @@ _index_mime_part (notmuch_message_t *message,
 
 notmuch_status_t
 _notmuch_message_index_file (notmuch_message_t *message,
-			     const char *filename)
+			     notmuch_message_file_t *message_file)
 {
-    GMimeStream *stream = NULL;
-    GMimeParser *parser = NULL;
-    GMimeMessage *mime_message = NULL;
+    GMimeMessage *mime_message;
     InternetAddressList *addresses;
-    FILE *file = NULL;
     const char *from, *subject;
-    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
-    static int initialized = 0;
-    char from_buf[5];
-    bool is_mbox = false;
-    static bool mbox_warning = false;
-
-    if (! initialized) {
-	g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);
-	initialized = 1;
-    }
-
-    file = fopen (filename, "r");
-    if (! file) {
-	fprintf (stderr, "Error opening %s: %s\n", filename, strerror (errno));
-	ret = NOTMUCH_STATUS_FILE_ERROR;
-	goto DONE;
-    }
-
-    /* Is this mbox? */
-    if (fread (from_buf, sizeof (from_buf), 1, file) == 1 &&
-	strncmp (from_buf, "From ", 5) == 0)
-	is_mbox = true;
-    rewind (file);
 
-    /* Evil GMime steals my FILE* here so I won't fclose it. */
-    stream = g_mime_stream_file_new (file);
-
-    parser = g_mime_parser_new_with_stream (stream);
-    g_mime_parser_set_scan_from (parser, is_mbox);
-
-    mime_message = g_mime_parser_construct_message (parser);
-
-    if (is_mbox) {
-	if (!g_mime_parser_eos (parser)) {
-	    /* This is a multi-message mbox. */
-	    ret = NOTMUCH_STATUS_FILE_NOT_EMAIL;
-	    goto DONE;
-	}
-	/* For historical reasons, we support single-message mboxes,
-	 * but this behavior is likely to change in the future, so
-	 * warn. */
-	if (!mbox_warning) {
-	    mbox_warning = true;
-	    fprintf (stderr, "\
-Warning: %s is an mbox containing a single message,\n\
-likely caused by misconfigured mail delivery.  Support for single-message\n\
-mboxes is deprecated and may be removed in the future.\n", filename);
-	}
-    }
+    mime_message = notmuch_message_file_get_mime_message (message_file);
+    if (! mime_message)
+	return NOTMUCH_STATUS_FILE_NOT_EMAIL; /* more like internal error */
 
     from = g_mime_message_get_sender (mime_message);
 
@@ -502,15 +454,5 @@ mboxes is deprecated and may be removed in the future.\n", filename);
 
     _index_mime_part (message, g_mime_message_get_mime_part (mime_message));
 
-  DONE:
-    if (mime_message)
-	g_object_unref (mime_message);
-
-    if (parser)
-	g_object_unref (parser);
-
-    if (stream)
-	g_object_unref (stream);
-
-    return ret;
+    return NOTMUCH_STATUS_SUCCESS;
 }
diff --git a/lib/message-file.c b/lib/message-file.c
index 9d5a3b9..7ab9e9d 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -247,6 +247,15 @@ mboxes is deprecated and may be removed in the future.\n", message->filename);
     return NOTMUCH_STATUS_SUCCESS;
 }
 
+GMimeMessage *
+notmuch_message_file_get_mime_message (notmuch_message_file_t *message)
+{
+    if (! message->parsed)
+	return NULL;
+
+    return message->message;
+}
+
 /* return NULL on errors, empty string for non-existing headers */
 const char *
 notmuch_message_file_get_header (notmuch_message_file_t *message,
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 7277df1..048dd6c 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -46,6 +46,8 @@ NOTMUCH_BEGIN_DECLS
 
 #include <talloc.h>
 
+#include <gmime/gmime.h>
+
 #include "xutil.h"
 #include "error_util.h"
 
@@ -320,9 +322,11 @@ notmuch_message_get_author (notmuch_message_t *message);
 
 /* index.cc */
 
+typedef struct _notmuch_message_file notmuch_message_file_t;
+
 notmuch_status_t
 _notmuch_message_index_file (notmuch_message_t *message,
-			     const char *filename);
+			     notmuch_message_file_t *message_file);
 
 /* message-file.c */
 
@@ -330,7 +334,6 @@ _notmuch_message_index_file (notmuch_message_t *message,
  * into the public interface in notmuch.h
  */
 
-typedef struct _notmuch_message_file notmuch_message_file_t;
 
 /* Open a file containing a single email message.
  *
@@ -377,6 +380,14 @@ void
 notmuch_message_file_restrict_headersv (notmuch_message_file_t *message,
 					va_list va_headers);
 
+/*
+ * get mime message. this is an ugly interface; maybe join index.cc
+ * and message-file.c, or move the top level indexing call to
+ * message-file.c with helpers in index.cc
+ */
+GMimeMessage *
+notmuch_message_file_get_mime_message (notmuch_message_file_t *message);
+
 /* Get the value of the specified header from the message as a UTF-8 string.
  *
  * The header name is case insensitive.
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] HACK: fix broken messages in the perf test corpus
  2013-10-16 19:00 [PATCH 0/6] lib: replace the message header parser with gmime Jani Nikula
                   ` (4 preceding siblings ...)
  2013-10-16 19:00 ` [PATCH 5/6] lib: parse messages only once Jani Nikula
@ 2013-10-16 19:00 ` Jani Nikula
  5 siblings, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-16 19:00 UTC (permalink / raw)
  To: notmuch

The gmime header parser rejects a lot of messages in the perf test
corpus which have this in the middle of headers:

Microsoft Mail Internet Headers Version 2.0

The header parsing stops right there. This illustrates a change in the
parsing. The message is clearly broken, but previously notmuch
accepted it anyway.

This patch "fixes" the messages in the perf test corpus to be able to
do fair comparisons of the parsers.

NOT TO BE MERGED, if that isn't obvious. This is just a quick hack.
---
 performance-test/perf-test-lib.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/performance-test/perf-test-lib.sh b/performance-test/perf-test-lib.sh
index 9ee7661..caec0d0 100644
--- a/performance-test/perf-test-lib.sh
+++ b/performance-test/perf-test-lib.sh
@@ -84,7 +84,11 @@ add_email_corpus ()
 	    "${args[@]}"
 
 	printf "\n"
+	printf "Fix broken messages in corpus..."
 
+	find "${TEST_DIRECTORY}/corpus" -type f -print0 | xargs -0 sed -i -e 's/^Microsoft Mail Internet Headers Version 2\.0/X-Crap: &/'
+
+	printf "\n"
     fi
 
     cp -lr $TAG_CORPUS $TMP_DIRECTORY/corpus.tags
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/6] cli: sanitize tabs to spaces in notmuch search
  2013-10-16 19:00 ` [PATCH 2/6] cli: sanitize tabs to spaces in notmuch search Jani Nikula
@ 2013-10-17  8:07   ` Mark Walters
  2013-10-17 11:47     ` Jani Nikula
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Walters @ 2013-10-17  8:07 UTC (permalink / raw)
  To: Jani Nikula, notmuch


Hi

I have looked at the whole series and broadly it looks good. However, I
don't know this code so this is not a full review. I do have a few
comments: some of these may be plain wrong in which case my apologies!

On Wed, 16 Oct 2013, Jani Nikula <jani@nikula.org> wrote:

> This is in preparation of switching to gmime header parsing, but
> arguably converting tabs to spaces rather than question marks is the
> right thing to do anyway.
> ---

I think it would be worth saying in the commit message that this is only
for text summary output.

Also why only tabs to spaces but \n to a '?'

Best wishes

Mark




>  notmuch-search.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/notmuch-search.c b/notmuch-search.c
> index d9d39ec..eab314f 100644
> --- a/notmuch-search.c
> +++ b/notmuch-search.c
> @@ -40,7 +40,9 @@ sanitize_string (const void *ctx, const char *str)
>      loop = out = talloc_strdup (ctx, str);
>  
>      for (; *loop; loop++) {
> -	if ((unsigned char)(*loop) < 32)
> +	if (*loop == '\t')
> +	    *loop = ' ';
> +	else if ((unsigned char)(*loop) < 32)
>  	    *loop = '?';
>      }
>      return out;
> -- 
> 1.8.4.rc3
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] cli: make the hacky from guessing more liberal
  2013-10-16 19:00 ` [PATCH 3/6] cli: make the hacky from guessing more liberal Jani Nikula
@ 2013-10-17  8:11   ` Mark Walters
  2013-10-17 11:52     ` Jani Nikula
  2013-10-17 13:58   ` Moritz Wilhelmy
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Walters @ 2013-10-17  8:11 UTC (permalink / raw)
  To: Jani Nikula, notmuch

On Wed, 16 Oct 2013, Jani Nikula <jani@nikula.org> wrote:
> This is in preparation of switching to gmime header parsing. Accept
> "for" and "by" preceded by tabs in the received header. This is a bit
> flaky, but so is the whole guessing code.

I am happy with the change but I think a little more explanation of the
problem it fixes would be helpful. Is it that there could be a \n \t
before the "for"/"by" or something else?

> ---
>  notmuch-reply.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/notmuch-reply.c b/notmuch-reply.c
> index 9d6f843..4b67e66 100644
> --- a/notmuch-reply.c
> +++ b/notmuch-reply.c
> @@ -423,7 +423,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>      /* First we look for a " for <email@add.res>" in the received
>       * header
>       */
> -    ptr = strstr (received, " for ");
> +    ptr = strstr (received, "for ");

The comment should be updated to match the code (and depending on the
answer to the above maybe explain that too)

Best wishes

Mark
>  
>      /* Note: ptr potentially contains a list of email addresses. */
>      addr = user_address_in_string (ptr, config);
> @@ -440,7 +440,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>       * system in this step of the receive chain
>       */
>      by = received;
> -    while((by = strstr (by, " by ")) != NULL) {
> +    while((by = strstr (by, "by ")) != NULL) {
>  	by += 4;
>  	if (*by == '\0')
>  	    break;
> -- 
> 1.8.4.rc3
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/6] cli: sanitize tabs to spaces in notmuch search
  2013-10-17  8:07   ` Mark Walters
@ 2013-10-17 11:47     ` Jani Nikula
  0 siblings, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-17 11:47 UTC (permalink / raw)
  To: Mark Walters, notmuch

On Thu, 17 Oct 2013, Mark Walters <markwalters1009@gmail.com> wrote:
> Hi
>
> I have looked at the whole series and broadly it looks good. However, I
> don't know this code so this is not a full review. I do have a few
> comments: some of these may be plain wrong in which case my apologies!

Thanks for the review!

> On Wed, 16 Oct 2013, Jani Nikula <jani@nikula.org> wrote:
>
>> This is in preparation of switching to gmime header parsing, but
>> arguably converting tabs to spaces rather than question marks is the
>> right thing to do anyway.
>> ---
>
> I think it would be worth saying in the commit message that this is only
> for text summary output.

Agreed.

> Also why only tabs to spaces but \n to a '?'

The notmuch header parser converts the tabs in header folding to spaces,
gmime keeps them as tabs. AFAICT that is the only difference between
headers indexed by notmuch and gmime, and causes tests to fail. So we do
the tabs to spaces conversion here to produce same output from cli. This
patch just does the minimal change required for tests to pass; other
changes might be good too, but they were not relevant for this series.

BR,
Jani.

>
> Best wishes
>
> Mark
>
>
>
>
>>  notmuch-search.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/notmuch-search.c b/notmuch-search.c
>> index d9d39ec..eab314f 100644
>> --- a/notmuch-search.c
>> +++ b/notmuch-search.c
>> @@ -40,7 +40,9 @@ sanitize_string (const void *ctx, const char *str)
>>      loop = out = talloc_strdup (ctx, str);
>>  
>>      for (; *loop; loop++) {
>> -	if ((unsigned char)(*loop) < 32)
>> +	if (*loop == '\t')
>> +	    *loop = ' ';
>> +	else if ((unsigned char)(*loop) < 32)
>>  	    *loop = '?';
>>      }
>>      return out;
>> -- 
>> 1.8.4.rc3
>>
>> _______________________________________________
>> notmuch mailing list
>> notmuch@notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] cli: make the hacky from guessing more liberal
  2013-10-17  8:11   ` Mark Walters
@ 2013-10-17 11:52     ` Jani Nikula
  0 siblings, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-17 11:52 UTC (permalink / raw)
  To: Mark Walters, notmuch

On Thu, 17 Oct 2013, Mark Walters <markwalters1009@gmail.com> wrote:
> On Wed, 16 Oct 2013, Jani Nikula <jani@nikula.org> wrote:
>> This is in preparation of switching to gmime header parsing. Accept
>> "for" and "by" preceded by tabs in the received header. This is a bit
>> flaky, but so is the whole guessing code.
>
> I am happy with the change but I think a little more explanation of the
> problem it fixes would be helpful. Is it that there could be a \n \t
> before the "for"/"by" or something else?

It's the same header folding by tabs here too. In your mail that I'm
replying to, the received headers are folded with tabs. (This may be due
to mailman doing bad things, but that's irrelevant.) There may be a tab
preceding the "for"/"by" instead of a space. Obviously this could be
made more robust, but if you look at what the from guessing does, you
realize it's quite hacky anyway, and a last resort. So I didn't bother,
at least not for now, for this series. Minimal change to make tests
pass.

>> ---
>>  notmuch-reply.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/notmuch-reply.c b/notmuch-reply.c
>> index 9d6f843..4b67e66 100644
>> --- a/notmuch-reply.c
>> +++ b/notmuch-reply.c
>> @@ -423,7 +423,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>>      /* First we look for a " for <email@add.res>" in the received
>>       * header
>>       */
>> -    ptr = strstr (received, " for ");
>> +    ptr = strstr (received, "for ");
>
> The comment should be updated to match the code (and depending on the
> answer to the above maybe explain that too)

Agreed.

BR,
Jani.

>
> Best wishes
>
> Mark
>>  
>>      /* Note: ptr potentially contains a list of email addresses. */
>>      addr = user_address_in_string (ptr, config);
>> @@ -440,7 +440,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>>       * system in this step of the receive chain
>>       */
>>      by = received;
>> -    while((by = strstr (by, " by ")) != NULL) {
>> +    while((by = strstr (by, "by ")) != NULL) {
>>  	by += 4;
>>  	if (*by == '\0')
>>  	    break;
>> -- 
>> 1.8.4.rc3
>>
>> _______________________________________________
>> notmuch mailing list
>> notmuch@notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] cli: make the hacky from guessing more liberal
  2013-10-16 19:00 ` [PATCH 3/6] cli: make the hacky from guessing more liberal Jani Nikula
  2013-10-17  8:11   ` Mark Walters
@ 2013-10-17 13:58   ` Moritz Wilhelmy
  2013-10-17 14:03     ` Jani Nikula
  1 sibling, 1 reply; 13+ messages in thread
From: Moritz Wilhelmy @ 2013-10-17 13:58 UTC (permalink / raw)
  To: notmuch

Hello,

On Wed, Oct 16, 2013 at 22:00:10 +0300, Jani Nikula wrote:
> This is in preparation of switching to gmime header parsing. Accept
> "for" and "by" preceded by tabs in the received header. This is a bit
> flaky, but so is the whole guessing code.
> ---
>  notmuch-reply.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/notmuch-reply.c b/notmuch-reply.c
> index 9d6f843..4b67e66 100644
> --- a/notmuch-reply.c
> +++ b/notmuch-reply.c
> @@ -423,7 +423,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>      /* First we look for a " for <email@add.res>" in the received
>       * header
>       */
> -    ptr = strstr (received, " for ");
> +    ptr = strstr (received, "for ");
>  
>      /* Note: ptr potentially contains a list of email addresses. */
>      addr = user_address_in_string (ptr, config);
> @@ -440,7 +440,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>       * system in this step of the receive chain
>       */
>      by = received;
> -    while((by = strstr (by, " by ")) != NULL) {
> +    while((by = strstr (by, "by ")) != NULL) {
>  	by += 4;

FWIW, I didn't read the rest of the code, but shouldn't the last line
be changed to "by += 3" when you're dropping a space from the strstr?


Best,

Moritz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] cli: make the hacky from guessing more liberal
  2013-10-17 13:58   ` Moritz Wilhelmy
@ 2013-10-17 14:03     ` Jani Nikula
  0 siblings, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2013-10-17 14:03 UTC (permalink / raw)
  To: Moritz Wilhelmy, notmuch

On Thu, 17 Oct 2013, Moritz Wilhelmy <mw+notmuch@barfooze.de> wrote:
> Hello,
>
> On Wed, Oct 16, 2013 at 22:00:10 +0300, Jani Nikula wrote:
>> This is in preparation of switching to gmime header parsing. Accept
>> "for" and "by" preceded by tabs in the received header. This is a bit
>> flaky, but so is the whole guessing code.
>> ---
>>  notmuch-reply.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/notmuch-reply.c b/notmuch-reply.c
>> index 9d6f843..4b67e66 100644
>> --- a/notmuch-reply.c
>> +++ b/notmuch-reply.c
>> @@ -423,7 +423,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>>      /* First we look for a " for <email@add.res>" in the received
>>       * header
>>       */
>> -    ptr = strstr (received, " for ");
>> +    ptr = strstr (received, "for ");
>>  
>>      /* Note: ptr potentially contains a list of email addresses. */
>>      addr = user_address_in_string (ptr, config);
>> @@ -440,7 +440,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
>>       * system in this step of the receive chain
>>       */
>>      by = received;
>> -    while((by = strstr (by, " by ")) != NULL) {
>> +    while((by = strstr (by, "by ")) != NULL) {
>>  	by += 4;
>
> FWIW, I didn't read the rest of the code, but shouldn't the last line
> be changed to "by += 3" when you're dropping a space from the strstr?

Absolutely, same in the "for" case. Thanks.

Jani.


>
>
> Best,
>
> Moritz
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-10-17 14:10 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-16 19:00 [PATCH 0/6] lib: replace the message header parser with gmime Jani Nikula
2013-10-16 19:00 ` [PATCH 1/6] emacs: Sanitize authors and subjects in search and show Jani Nikula
2013-10-16 19:00 ` [PATCH 2/6] cli: sanitize tabs to spaces in notmuch search Jani Nikula
2013-10-17  8:07   ` Mark Walters
2013-10-17 11:47     ` Jani Nikula
2013-10-16 19:00 ` [PATCH 3/6] cli: make the hacky from guessing more liberal Jani Nikula
2013-10-17  8:11   ` Mark Walters
2013-10-17 11:52     ` Jani Nikula
2013-10-17 13:58   ` Moritz Wilhelmy
2013-10-17 14:03     ` Jani Nikula
2013-10-16 19:00 ` [PATCH 4/6] lib: replace the header parser with gmime Jani Nikula
2013-10-16 19:00 ` [PATCH 5/6] lib: parse messages only once Jani Nikula
2013-10-16 19:00 ` [PATCH 6/6] HACK: fix broken messages in the perf test corpus Jani Nikula

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).