unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* Internal error on line 296 of mime-node.c
@ 2012-03-01 21:39 David Bremner
  2012-03-01 21:57 ` Austin Clements
  2012-03-06 18:26 ` [PATCH] Handle errors in mime_node_open Austin Clements
  0 siblings, 2 replies; 23+ messages in thread
From: David Bremner @ 2012-03-01 21:39 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 1617 bytes --]


In the current master (4fa77d031) I get a sort-of double crash on the 
attached message.

The internal error on line 296 is reached, indicating some kind of bug,
but then the G_OBJECT_TYPE crashes becase parent->part is null.

The message is probably crap, I created it from another "real" message
from some business that causes the same problem.

(gdb) bt
#0  0x0000000000416850 in mime_node_child (parent=0x8425fa0, child=0)
    at mime-node.c:296
#1  0x00000000004137f1 in format_part_json (ctx=0x8425f10, node=0x8425fa0, 
    first=1) at notmuch-show.c:669
#2  0x0000000000413e54 in format_part_json_entry (ctx=0x8425f10, 
    node=0x8425fa0, indent=0, params=0x7ff000470) at notmuch-show.c:758
#3  0x0000000000413f0f in show_message (ctx=0x8367680, format=0x6327e0, message=
    0x83e9830, indent=0, params=0x7ff000470) at notmuch-show.c:776
#4  0x0000000000414165 in show_messages (ctx=0x8367680, format=0x6327e0, 
    messages=0x8425e70, indent=0, params=0x7ff000470) at notmuch-show.c:835
#5  0x00000000004145f5 in do_show (ctx=0x8367680, query=0x83b36b0, 
    format=0x6327e0, params=0x7ff000470) at notmuch-show.c:956
#6  0x0000000000414ba7 in notmuch_show_command (ctx=0x8367680, argc=3, 
    argv=0x7ff000660) at notmuch-show.c:1089
#7  0x000000000040a9fe in main (argc=4, argv=0x7ff000658) at notmuch.c:294
(gdb) print *parent
$3 = {part = 0x0, envelope_file = 0x83e9830, envelope_part = 0x0, 
  nchildren = 1, parent = 0x0, part_num = 0, decrypt_attempted = 0, 
  decrypt_success = 0, verify_attempted = 0, sig_list = 0x0, ctx = 0x8426090, 
  decrypted_child = 0x0, next_child = 0, next_part_num = 1}


[-- Attachment #2: test.eml --]
[-- Type: message/rfc822, Size: 2720 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 27 bytes --]

quoted printable stuff.


[-- Attachment #2.1.2: Type: text/html, Size: 35 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Internal error on line 296 of mime-node.c
  2012-03-01 21:39 Internal error on line 296 of mime-node.c David Bremner
@ 2012-03-01 21:57 ` Austin Clements
  2012-03-02  2:35   ` David Bremner
  2012-03-06 18:26 ` [PATCH] Handle errors in mime_node_open Austin Clements
  1 sibling, 1 reply; 23+ messages in thread
From: Austin Clements @ 2012-03-01 21:57 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

Quoth David Bremner on Mar 01 at  5:39 pm:
> 
> In the current master (4fa77d031) I get a sort-of double crash on the 
> attached message.
> 
> The internal error on line 296 is reached, indicating some kind of bug,
> but then the G_OBJECT_TYPE crashes becase parent->part is null.
> 
> The message is probably crap, I created it from another "real" message
> from some business that causes the same problem.
> 
> (gdb) bt
> #0  0x0000000000416850 in mime_node_child (parent=0x8425fa0, child=0)
>     at mime-node.c:296
> #1  0x00000000004137f1 in format_part_json (ctx=0x8425f10, node=0x8425fa0, 
>     first=1) at notmuch-show.c:669
> #2  0x0000000000413e54 in format_part_json_entry (ctx=0x8425f10, 
>     node=0x8425fa0, indent=0, params=0x7ff000470) at notmuch-show.c:758
> #3  0x0000000000413f0f in show_message (ctx=0x8367680, format=0x6327e0, message=
>     0x83e9830, indent=0, params=0x7ff000470) at notmuch-show.c:776
> #4  0x0000000000414165 in show_messages (ctx=0x8367680, format=0x6327e0, 
>     messages=0x8425e70, indent=0, params=0x7ff000470) at notmuch-show.c:835
> #5  0x00000000004145f5 in do_show (ctx=0x8367680, query=0x83b36b0, 
>     format=0x6327e0, params=0x7ff000470) at notmuch-show.c:956
> #6  0x0000000000414ba7 in notmuch_show_command (ctx=0x8367680, argc=3, 
>     argv=0x7ff000660) at notmuch-show.c:1089
> #7  0x000000000040a9fe in main (argc=4, argv=0x7ff000658) at notmuch.c:294
> (gdb) print *parent
> $3 = {part = 0x0, envelope_file = 0x83e9830, envelope_part = 0x0, 
>   nchildren = 1, parent = 0x0, part_num = 0, decrypt_attempted = 0, 
>   decrypt_success = 0, verify_attempted = 0, sig_list = 0x0, ctx = 0x8426090, 
>   decrypted_child = 0x0, next_child = 0, next_part_num = 1}

For the record, this is because g_mime_parser_construct_message in
mime_node_open is returning NULL.  mime_node_open should be checking
for this as well as errors from the other GMime functions it calls.
(Unfortunately, there appears to be no way to ask GMime *what* went
wrong; it just returns NULL.)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* (no subject)
  2012-03-01 21:57 ` Austin Clements
@ 2012-03-02  2:35   ` David Bremner
  2012-03-02  2:35     ` [PATCH 1/2] test: utility function to add a pre-generated message to the database David Bremner
  2012-03-02  2:35     ` [PATCH 2/2] test: add new test file for mime parsing David Bremner
  0 siblings, 2 replies; 23+ messages in thread
From: David Bremner @ 2012-03-02  2:35 UTC (permalink / raw)
  To: notmuch

Probably both of these patches could use some polishing; Austin asked
for a test to help debug the crash I found today.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/2] test: utility function to add a pre-generated message to the database.
  2012-03-02  2:35   ` David Bremner
@ 2012-03-02  2:35     ` David Bremner
  2012-03-02  2:35     ` [PATCH 2/2] test: add new test file for mime parsing David Bremner
  1 sibling, 0 replies; 23+ messages in thread
From: David Bremner @ 2012-03-02  2:35 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

For various "nasty" messages, it is easier to ship a message rather
than try to generate a message which causes a failure.
---
 test/test-lib.sh |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/test/test-lib.sh b/test/test-lib.sh
index 2781506..17d0e48 100644
--- a/test/test-lib.sh
+++ b/test/test-lib.sh
@@ -376,6 +376,21 @@ ${template[body]}
 EOF
 }
 
+# Add an existing message to the notmuch database.
+#
+# expects a message on stdin.
+#
+# uses the variable gen_msg_cnt from generate_message.
+
+add_rfc822_message ()
+{
+    gen_msg_cnt=$((gen_msg_cnt + 1))
+    gen_msg_name="msg-$(printf "%03d" $gen_msg_cnt)"
+    gen_msg_filename="${MAIL_DIR}/$gen_msg_name"
+    cat >$gen_msg_filename
+    notmuch new > /dev/null
+}
+
 # Generate a new message and add it to the database.
 #
 # All of the arguments and return values supported by generate_message
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/2] test: add new test file for mime parsing.
  2012-03-02  2:35   ` David Bremner
  2012-03-02  2:35     ` [PATCH 1/2] test: utility function to add a pre-generated message to the database David Bremner
@ 2012-03-02  2:35     ` David Bremner
  1 sibling, 0 replies; 23+ messages in thread
From: David Bremner @ 2012-03-02  2:35 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

For now, just tests one message currently causing notmuch to segfault.
---
 test/mime         |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/notmuch-test |    1 +
 2 files changed, 68 insertions(+), 0 deletions(-)
 create mode 100755 test/mime

diff --git a/test/mime b/test/mime
new file mode 100755
index 0000000..5b2d77f
--- /dev/null
+++ b/test/mime
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+
+test_description="mime parsing"
+. test-lib.sh
+
+add_rfc822_message <<EOF
+From bob.smith@fbi.gov Wed Feb 09 10:06:54 2011
+Return-path: <bob.smith@fbi.gov>
+Envelope-to: bobafett@archangel.wmdcantina.org
+Delivery-date: Wed, 09 Feb 2011 10:06:54 -0400
+Received: from fiero.its.cantina.org ([131.202.1.10])
+	by archangel.wmdcantina.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
+	(Exim 4.69)
+	(envelope-from <bob.smith@fbi.gov>)
+	id 1PnAh4-0000AX-DH
+	for bobafett@archangel.wmdcantina.org; Wed, 09 Feb 2011 10:06:54 -0400
+Received: from mx3.nbpei-ecn.ca (mx3.nbpei-ecn.ca [198.164.163.196])
+	by fiero.its.cantina.org (8.13.8/8.13.8) with ESMTP id p19E6lst014693
+	for <bobafett@cantina.org>; Wed, 9 Feb 2011 10:06:47 -0400
+Received: from mx3.nbpei-ecn.ca (localhost.localdomain [127.0.0.1])
+	by localhost (Postfix) with SMTP id EC0C04B8002
+	for <bobafett@cantina.org>; Wed,  9 Feb 2011 10:06:47 -0400 (AST)
+Received: from cibc.ca (mail4.cibc.ca [199.198.251.34])
+	by mx3.nbpei-ecn.ca (Postfix) with ESMTP id A80824B8003
+	for <bobafett@cantina.org>; Wed,  9 Feb 2011 10:06:47 -0400 (AST)
+From: "Smith, Bob" <Bob.Smith@fbi.gov>
+To: "'BOBAFETT@CANTINA.ORG'" <BOBAFETT@cantina.org>
+Disposition-Notification-To: "Smith, Bob" <Bob.Smith@fbi.gov>
+Return-Receipt-To: <Bob.Smith@FBI.GOV>
+Date: Wed, 9 Feb 2011 09:06:43 -0500
+Subject: 
+Thread-Index: AcvIYpRcEJX82QtpQSacs5hsY+i4SQ==
+Message-ID: <5BB75198A4300643A295D4678B10F0503BDDF7CEB1@CBMCC-X7-MBX09.ad.fbi.gov>
+Accept-Language: en-US
+Content-Language: en-US
+X-MS-Has-Attach: 
+X-MS-TNEF-Correlator: 
+acceptlanguage: en-US
+Content-Type: multipart/alternative;
+	boundary="_000_5BB75198A4300643A295D4678B10F0503BDDF7CEB1CBMCCX7MBX09a_"
+MIME-Version: 1.0
+X-PMX-Version: 5.6.1.2065439, Antispam-Engine: 2.7.2.376379, Antispam-Data: 2011.2.9.135719
+X-PerlMx-Spam: Gauge=IIIIIIIII, Probability=9%, Report='
+ BLANK_SUBJECT 0.1, HTML_NO_HTTP 0.1, SUPERLONG_LINE 0.05, BODYTEXTH_SIZE_10000_LESS 0, BODYTEXTP_SIZE_3000_LESS 0, BODY_SIZE_4000_4999 0, BODY_SIZE_5000_LESS 0, BODY_SIZE_7000_LESS 0, DATE_TZ_NA 0, __C230066_P5 0, __CT 0, __CTYPE_HAS_BOUNDARY 0, __CTYPE_MULTIPART 0, __CTYPE_MULTIPART_ALT 0, __FRAUD_CONTACT_NUM 0, __HAS_HTML 0, __HAS_MSGID 0, __MIME_HTML 0, __MIME_VERSION 0, __PHISH_FROM 0, __PHISH_FROM1 0, __PHISH_FROM_C 0, __SANE_MSGID 0, __STOCK_PHRASE_8 0, __TAG_EXISTS_HTML 0, __TO_MALFORMED_2 0, __TO_NO_NAME 0'
+X-Sender-Verified: bob.smith@fbi.gov
+
+--_000_5BB75198A4300643A295D4678B10F0503BDDF7CEB1CBMCCX7MBX09a_
+Content-Type: text/plain; charset="iso-8859-1"
+Content-Transfer-Encoding: quoted-printable
+
+quoted printable stuff.
+
+
+--_000_5BB75198A4300643A295D4678B10F0503BDDF7CEB1CBMCCX7MBX09a_
+Content-Type: text/html; charset="iso-8859-1"
+Content-Transfer-Encoding: quoted-printable
+
+<html>
+some other stuff
+</html>
+
+--_000_5BB75198A4300643A295D4678B10F0503BDDF7CEB1CBMCCX7MBX09a_--
+
+
+EOF
+test_expect_success 'output message as json' "notmuch show --format=json id:5BB75198A4300643A295D4678B10F0503BDDF7CEB1@CBMCC-X7-MBX09.ad.fbi.gov > /dev/null"
+test_done
diff --git a/test/notmuch-test b/test/notmuch-test
index f03b594..b572f1c 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -56,6 +56,7 @@ TESTS="
   emacs-address-cleaning
   emacs-hello
   emacs-show
+  mime
 "
 TESTS=${NOTMUCH_TESTS:=$TESTS}
 
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH] Handle errors in mime_node_open
  2012-03-01 21:39 Internal error on line 296 of mime-node.c David Bremner
  2012-03-01 21:57 ` Austin Clements
@ 2012-03-06 18:26 ` Austin Clements
  2012-03-06 22:04   ` Parsing regression with gmime-2.6? David Bremner
  2012-03-11  1:51   ` [PATCH] Handle errors in mime_node_open David Bremner
  1 sibling, 2 replies; 23+ messages in thread
From: Austin Clements @ 2012-03-06 18:26 UTC (permalink / raw)
  To: notmuch

---
 mime-node.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/mime-node.c b/mime-node.c
index d6b4506..a95bdab 100644
--- a/mime-node.c
+++ b/mime-node.c
@@ -97,11 +97,26 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
     }
 
     mctx->stream = g_mime_stream_file_new (mctx->file);
+    if (!mctx->stream) {
+	fprintf (stderr, "Out of memory.\n");
+	status = NOTMUCH_STATUS_OUT_OF_MEMORY;
+	goto DONE;
+    }
     g_mime_stream_file_set_owner (GMIME_STREAM_FILE (mctx->stream), FALSE);
 
     mctx->parser = g_mime_parser_new_with_stream (mctx->stream);
+    if (!mctx->parser) {
+	fprintf (stderr, "Out of memory.\n");
+	status = NOTMUCH_STATUS_OUT_OF_MEMORY;
+	goto DONE;
+    }
 
     mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
+    if (!mctx->mime_message) {
+	fprintf (stderr, "Failed to parse %s\n", filename);
+	status = NOTMUCH_STATUS_FILE_ERROR;
+	goto DONE;
+    }
 
     mctx->cryptoctx = cryptoctx;
     mctx->decrypt = decrypt;
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Parsing regression with gmime-2.6?
  2012-03-06 18:26 ` [PATCH] Handle errors in mime_node_open Austin Clements
@ 2012-03-06 22:04   ` David Bremner
  2012-03-08 15:35     ` [WIP PATCH] debugging gmime-2.6 fail David Bremner
                       ` (2 more replies)
  2012-03-11  1:51   ` [PATCH] Handle errors in mime_node_open David Bremner
  1 sibling, 3 replies; 23+ messages in thread
From: David Bremner @ 2012-03-06 22:04 UTC (permalink / raw)
  To: Austin Clements, notmuch; +Cc: Daniel Kahn Gillmor

On Tue,  6 Mar 2012 18:26:57 +0000, Austin Clements <amdragon@MIT.EDU> wrote:
> ---
>  mime-node.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)

There seems to be something weird going on with gmime-2.6; maybe we
didn't catch some api change? 

I noticed a surprising number of messages failing to parse, so I wrote a
script to take a random sample of 1000 messages and run notmuch show on
them. A shocking 650 to 700 of them fail to parse with gmime-2.6. No
failures are reported with gmime-2.4.

I tried with your patch, and then took a couple of the id's it reported
and tried them on the release branch, and the failed (segfaulted) there.

I don't know at this point if it is something specific to my setup, or
my testing methedology is just hare-brained, but I'd appreciate it if
people could try austin's patch, gmime-2.6, and my script on their
mailstore and let me know how many id's it spits out (if any).

#!/usr/bin/bash

NOTMUCH=${NOTMUCH-notmuch}

if [ ! -f test-ids.txt ]; then
    notmuch dump | shuf -n 1000 | cut -f1 -d' ' > test-ids.txt
else
    echo "Re-using test-ids.txt" 1>&2
fi

while read id; do
    output=$(${NOTMUCH} show --format=json id:$id 2>&1 1>/dev/null)
    case "$output" in
	"Failed to parse"*)
	    echo $id
	    ;;
	*)
    esac
done <test-ids.txt

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [WIP PATCH] debugging gmime-2.6 fail.
  2012-03-06 22:04   ` Parsing regression with gmime-2.6? David Bremner
@ 2012-03-08 15:35     ` David Bremner
  2012-03-08 18:08       ` David Bremner
  2012-03-08 18:27     ` Parsing regression with gmime-2.6? Jameson Graef Rollins
  2012-03-08 20:30     ` Daniel Kahn Gillmor
  2 siblings, 1 reply; 23+ messages in thread
From: David Bremner @ 2012-03-08 15:35 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

Try parsing again as mbox if the first one failed.
---
I ran out of time for the moment, but the following patch gets me down from 
4196 failures on the notmuch mailing list to 3422.

I'm leaning to reverting back to gmime-2.4 for the Debian 0.12 package
if I can't sort this out.

 mime-node.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/mime-node.c b/mime-node.c
index a95bdab..1d3a184 100644
--- a/mime-node.c
+++ b/mime-node.c
@@ -111,8 +111,17 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+
     mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
     if (!mctx->mime_message) {
+	/*
+	 * Parsing failed, let's try again as an mbox
+	 */
+	mctx->parser = g_mime_parser_new_with_stream (mctx->stream);
+	mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
+    }
+
+    if (!mctx->mime_message) {
 	fprintf (stderr, "Failed to parse %s\n", filename);
 	status = NOTMUCH_STATUS_FILE_ERROR;
 	goto DONE;
-- 
1.7.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [WIP PATCH] debugging gmime-2.6 fail.
  2012-03-08 15:35     ` [WIP PATCH] debugging gmime-2.6 fail David Bremner
@ 2012-03-08 18:08       ` David Bremner
  0 siblings, 0 replies; 23+ messages in thread
From: David Bremner @ 2012-03-08 18:08 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 526 bytes --]

On Thu,  8 Mar 2012 11:35:35 -0400, David Bremner <david@tethera.net> wrote:


> I ran out of time for the moment, but the following patch gets me down from 
> 4196 failures on the notmuch mailing list to 3422.

That patch is of course complete nonsense. Attached is another silly
patch, which at least demonstrates the problem. With the attached patch,
notmuch (+ gmime-2.6) only fails for non-mbox messages. So I guess we
need a way to detect if a file is mbox before parsing? or a way to get
gmime to be less strict here?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-WIP-debugging-gmime-2.6-problems.patch --]
[-- Type: text/x-diff, Size: 846 bytes --]

From 7fb942049ae68e09ebb9fbca40048f95543ab4b8 Mon Sep 17 00:00:00 2001
From: David Bremner <bremner@debian.org>
Date: Thu, 8 Mar 2012 11:11:21 -0400
Subject: [PATCH] WIP debugging gmime-2.6 problems

Unconditionally tell gmime to look for an mbox. This of course makes
it fail for non-mboxes.
---
 mime-node.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mime-node.c b/mime-node.c
index a95bdab..3e07fbf 100644
--- a/mime-node.c
+++ b/mime-node.c
@@ -111,7 +111,9 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    g_mime_parser_set_scan_from(mctx->parser, TRUE);
     mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
+
     if (!mctx->mime_message) {
 	fprintf (stderr, "Failed to parse %s\n", filename);
 	status = NOTMUCH_STATUS_FILE_ERROR;
-- 
1.7.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Parsing regression with gmime-2.6?
  2012-03-06 22:04   ` Parsing regression with gmime-2.6? David Bremner
  2012-03-08 15:35     ` [WIP PATCH] debugging gmime-2.6 fail David Bremner
@ 2012-03-08 18:27     ` Jameson Graef Rollins
  2012-03-08 20:30     ` Daniel Kahn Gillmor
  2 siblings, 0 replies; 23+ messages in thread
From: Jameson Graef Rollins @ 2012-03-08 18:27 UTC (permalink / raw)
  To: David Bremner, Austin Clements, notmuch; +Cc: Daniel Kahn Gillmor

[-- Attachment #1: Type: text/plain, Size: 1558 bytes --]

On Tue, 06 Mar 2012 18:04:50 -0400, David Bremner <david@tethera.net> wrote:
> There seems to be something weird going on with gmime-2.6; maybe we
> didn't catch some api change? 
> 
> I noticed a surprising number of messages failing to parse, so I wrote a
> script to take a random sample of 1000 messages and run notmuch show on
> them. A shocking 650 to 700 of them fail to parse with gmime-2.6. No
> failures are reported with gmime-2.4.

This *is* shockingly high, and very confusing.  Somehow all of the tests
pass?  That's very strange, particularly since there are enough real
messages used in the tests that we should experience at least one of
these failures, right?  Is there really nothing special about the
messages that are failing to parse?  Your later patches seem to indicate
that this has something to do with mbox, although you don't mention that
hear.

I thought I had been using gmime-2.6 for the last month, so I was very
skeptical of this issue.  Unfortunately I just found a bug in the
configure script [0] that meant I had actually been using gmime-2.4.
However, since fixing things and actually using gmime-2.6 now, I still
don't see any problems.

If the problem is specific to mbox, which we explicitly don't support, I
don't think this is tragic enough to warrant removing gmime-2.6 as a
dependency satisfier for 0.12 in Debian.  In an event, a more explicit
description of the problem you're seeing would be helpful.

jamie.

[0] id:"1331225101-24385-1-git-send-email-jrollins@finestructure.net"

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Parsing regression with gmime-2.6?
  2012-03-06 22:04   ` Parsing regression with gmime-2.6? David Bremner
  2012-03-08 15:35     ` [WIP PATCH] debugging gmime-2.6 fail David Bremner
  2012-03-08 18:27     ` Parsing regression with gmime-2.6? Jameson Graef Rollins
@ 2012-03-08 20:30     ` Daniel Kahn Gillmor
  2012-03-08 21:32       ` Jameson Graef Rollins
  2012-03-08 21:48       ` [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime David Bremner
  2 siblings, 2 replies; 23+ messages in thread
From: Daniel Kahn Gillmor @ 2012-03-08 20:30 UTC (permalink / raw)
  To: David Bremner, Austin Clements, notmuch

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

On Tue, 06 Mar 2012 18:04:50 -0400, David Bremner <david@tethera.net> wrote:
> There seems to be something weird going on with gmime-2.6; maybe we
> didn't catch some api change? 

This does seem to be a regression in gmime 2.6.  I've reported the bug
upstream, along with a simplified (non-notmuch) demonstration:

 https://bugzilla.gnome.org/show_bug.cgi?id=671680

We'll see what gmime's upstream has to say about it.

As a devil's advocate, i could argue that a message in a maildir that
starts with a "From " line isn't a proper e-mail message in the first
place, and therefore gmime 2.6 is being more rigorously correct about
what it accepts.  In particular, if a user were to place a multi-message
mbox file in their notmuch message store, i think that notmuch linked
against 2.4 would happily index only the first message of it, and the
rest of the message would be "hidden", whereas gmime 2.6 allows us to
detect these failures and avoid indexing them directly.

That said, i understand that this is probably not an entirely rare
situation, and i lean toward the idea that gmime 2.4's behavior was
actually the Right Thing.  Also, I haven't been able to find any
explicit documentation to indicate that the behavior change was
a deliberate one.

I'm happy to relay any helpful or constructive suggestions to the gmime
upstream ticket, if folks don't want to deal with bugzilla directly.

        --dkg

[-- Attachment #2: Type: application/pgp-signature, Size: 965 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Parsing regression with gmime-2.6?
  2012-03-08 20:30     ` Daniel Kahn Gillmor
@ 2012-03-08 21:32       ` Jameson Graef Rollins
  2012-03-08 21:40         ` Daniel Kahn Gillmor
  2012-03-08 21:48       ` [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime David Bremner
  1 sibling, 1 reply; 23+ messages in thread
From: Jameson Graef Rollins @ 2012-03-08 21:32 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, David Bremner, Austin Clements, notmuch

[-- Attachment #1: Type: text/plain, Size: 2230 bytes --]

On Thu, 08 Mar 2012 15:30:21 -0500, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> This does seem to be a regression in gmime 2.6.  I've reported the bug
> upstream, along with a simplified (non-notmuch) demonstration:
> 
>  https://bugzilla.gnome.org/show_bug.cgi?id=671680
> 
> We'll see what gmime's upstream has to say about it.

Thanks so much for following up on this, Daniel.

> As a devil's advocate, i could argue that a message in a maildir that
> starts with a "From " line isn't a proper e-mail message in the first
> place, and therefore gmime 2.6 is being more rigorously correct about
> what it accepts.  In particular, if a user were to place a multi-message
> mbox file in their notmuch message store, i think that notmuch linked
> against 2.4 would happily index only the first message of it, and the
> rest of the message would be "hidden", whereas gmime 2.6 allows us to
> detect these failures and avoid indexing them directly.

I actually agree with this position.  mbox files are not proper email
messages, so if gmime does not explicitly support parsing them then we
really can't expect it to parse them.  We *can* expect gmime to return
some sort of proper error message/return code/etc and not fail in a bad
way, but beyond that I think the burden is on us.

> That said, i understand that this is probably not an entirely rare
> situation, and i lean toward the idea that gmime 2.4's behavior was
> actually the Right Thing.  Also, I haven't been able to find any
> explicit documentation to indicate that the behavior change was
> a deliberate one.

And I think I don't agree with this one: I not think that handling mbox
files in the way you've outlined above is the Right Thing.  Only
partially parsing an mbox file, by indexing only the first message for
instance, seems like a bad solution to me, and one that's likely to lead
to a lot of confusion (e.g. "where are the rest of the messages that
were in my mbox?").

If notmuch encounters an mbox file in the store it should just skip the
message and continue, while indicating that it is skipping the message
because it's not a proper email message, as it does with other non-email
files.

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Parsing regression with gmime-2.6?
  2012-03-08 21:32       ` Jameson Graef Rollins
@ 2012-03-08 21:40         ` Daniel Kahn Gillmor
  2012-03-08 21:59           ` Jameson Graef Rollins
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Kahn Gillmor @ 2012-03-08 21:40 UTC (permalink / raw)
  To: notmuch

On 03/08/2012 04:32 PM, Jameson Graef Rollins wrote:

> I actually agree with this position.  mbox files are not proper email
> messages, so if gmime does not explicitly support parsing them then we
> really can't expect it to parse them.

i think the point here is that gmime used to support parsing them and 
now it doesn't.

So if you've got a pre-existing notmuch index (built from when gmime 
*did* parse them) and you're now running notmuch linked against a gmime 
that *doesn't* parse them, then notmuch search will produce references 
to messages that notmuch show can't produce output for.

While i appreciate wanting to be sure that we're working with 
well-formed files, I don't have a good answer for what the right thing 
to do is, if gmime upstream agrees with your assessment of the situation.

yuck,

	--dkg

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime.
  2012-03-08 20:30     ` Daniel Kahn Gillmor
  2012-03-08 21:32       ` Jameson Graef Rollins
@ 2012-03-08 21:48       ` David Bremner
  2012-03-08 22:05         ` Jameson Graef Rollins
  2012-03-09 10:50         ` Tomi Ollila
  1 sibling, 2 replies; 23+ messages in thread
From: David Bremner @ 2012-03-08 21:48 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

It seems that it has always been an error to try to parse an mbox
format file with gmime without calling g_mime_parser_set_scan_from.

This change reads the first 5 bytes of the file, and if they are "From ",
declares the file to be an mbox.
---

This patch seems to fix the problem for me. I don't think the
performance impact should be too bad, but I didn't really test it.

 mime-node.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/mime-node.c b/mime-node.c
index a95bdab..8939147 100644
--- a/mime-node.c
+++ b/mime-node.c
@@ -72,6 +72,8 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
     mime_node_context_t *mctx;
     mime_node_t *root;
     notmuch_status_t status;
+    char from_buf[6]; /* From space NULL */
+    int is_mbox = 0;
 
     root = talloc_zero (ctx, mime_node_t);
     if (root == NULL) {
@@ -96,6 +98,23 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    if (fread (from_buf, 1, 5, mctx->file) != 5) {
+	fprintf (stderr, "Failed to read 5 bytes from %s: %s\n",
+		 filename, strerror (errno));
+	status = NOTMUCH_STATUS_FILE_ERROR;
+	goto DONE;
+    }
+    from_buf[5] = '\0';
+
+    if (fseek (mctx->file, 0L, SEEK_SET)) {
+	fprintf (stderr, "Failed to rewind %s: %s\n",
+		 filename, strerror (errno));
+	status = NOTMUCH_STATUS_FILE_ERROR;
+	goto DONE;
+    }
+
+    is_mbox = (strcmp (from_buf, "From ") == 0);
+
     mctx->stream = g_mime_stream_file_new (mctx->file);
     if (!mctx->stream) {
 	fprintf (stderr, "Out of memory.\n");
@@ -111,7 +130,9 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    g_mime_parser_set_scan_from (mctx->parser, is_mbox);
     mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
+
     if (!mctx->mime_message) {
 	fprintf (stderr, "Failed to parse %s\n", filename);
 	status = NOTMUCH_STATUS_FILE_ERROR;
-- 
1.7.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Parsing regression with gmime-2.6?
  2012-03-08 21:40         ` Daniel Kahn Gillmor
@ 2012-03-08 21:59           ` Jameson Graef Rollins
  0 siblings, 0 replies; 23+ messages in thread
From: Jameson Graef Rollins @ 2012-03-08 21:59 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, notmuch

[-- Attachment #1: Type: text/plain, Size: 1138 bytes --]

On Thu, 08 Mar 2012 16:40:00 -0500, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> i think the point here is that gmime used to support parsing them and 
> now it doesn't.
> 
> So if you've got a pre-existing notmuch index (built from when gmime 
> *did* parse them) and you're now running notmuch linked against a gmime 
> that *doesn't* parse them, then notmuch search will produce references 
> to messages that notmuch show can't produce output for.

This is unfortunate for those that suffer from this but again I don't
think the solution is a partial parsing of mbox files.

If this really is the situation, it seems like the *previous* behavior
was actually the problematic one, which has now been fixed, even though
it appears to be a regression.

I think I would argue that users who do fall in to this situation should
just simply convert all of their mbox files into proper messages.
That's the position that we've actually held all along: notmuch does not
support mbox and those with mbox files should just convert them to
individual messages with something like mbox2maildir or the like.

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime.
  2012-03-08 21:48       ` [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime David Bremner
@ 2012-03-08 22:05         ` Jameson Graef Rollins
  2012-03-09 10:50         ` Tomi Ollila
  1 sibling, 0 replies; 23+ messages in thread
From: Jameson Graef Rollins @ 2012-03-08 22:05 UTC (permalink / raw)
  To: David Bremner, notmuch; +Cc: David Bremner

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

On Thu,  8 Mar 2012 17:48:15 -0400, David Bremner <david@tethera.net> wrote:
> From: David Bremner <bremner@debian.org>
> 
> It seems that it has always been an error to try to parse an mbox
> format file with gmime without calling g_mime_parser_set_scan_from.
> 
> This change reads the first 5 bytes of the file, and if they are "From ",
> declares the file to be an mbox.
> ---
> 
> This patch seems to fix the problem for me. I don't think the
> performance impact should be too bad, but I didn't really test it.

As I've stated previously in this thread, I think this behavior is a
mistake.  This will not result in a proper parsing of an mbox file, and
improper or incomplete parsing of mbox files will lead to bad/confusing
behavior.

We should either completely support mbox files or not support them.
Partial support is, imho, a recipe for disaster.  We don't currently
support them, and it would take a lot of extra work to do so completely.
I don't see any harm in just telling users to convert their mbox files
into proper message files.

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime.
  2012-03-08 21:48       ` [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime David Bremner
  2012-03-08 22:05         ` Jameson Graef Rollins
@ 2012-03-09 10:50         ` Tomi Ollila
  2012-03-09 13:31           ` [PATCH] mime_node_open: skip envelope from lines at the start of messages David Bremner
  2012-03-09 13:56           ` David Bremner
  1 sibling, 2 replies; 23+ messages in thread
From: Tomi Ollila @ 2012-03-09 10:50 UTC (permalink / raw)
  To: David Bremner, notmuch; +Cc: David Bremner

On Thu,  8 Mar 2012 17:48:15 -0400, David Bremner <david@tethera.net> wrote:
> From: David Bremner <bremner@debian.org>
> 
> It seems that it has always been an error to try to parse an mbox
> format file with gmime without calling g_mime_parser_set_scan_from.

At least for the time being I think we should apply
http://en.wikipedia.org/wiki/Robustness_principle to this case
and accept email files that start with 'From '...

> This change reads the first 5 bytes of the file, and if they are "From ",
> declares the file to be an mbox.

an alternative to this is not to declare file as an mbox one but
if first line starts with 'From ' skip that line.

Whether this is a good idea or not I've already thought an implementation
how to do this which I'll post as an RFC patch in next 12 hours.


Tomi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] mime_node_open: skip envelope from lines at the start of messages
  2012-03-09 10:50         ` Tomi Ollila
@ 2012-03-09 13:31           ` David Bremner
  2012-03-09 13:56           ` David Bremner
  1 sibling, 0 replies; 23+ messages in thread
From: David Bremner @ 2012-03-09 13:31 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

Some MDAs such as procmail (in MH mode), and exim (doing local
delivery in some configurations of the appendfile transport) add a
line to the front of a message with "From " followed by envelope
sender.  Since this is not a proper RFC822 header field, gmime (at
least since version 2.6) refuses to parse it, unless in mbox mode.

This change reads the line of the file, and if they start with
"From ", pass the stream to gmime starting from the second line.

This makes mime_node_open more consistent with (but still stricter
than) the permissive behaviour of notmuch_file_get_header
(message-file.c), which allows a certain number of "broken_headers".

We avoid putting gmime into mbox mode in case of side effects; this
leaves the situation of mboxes accidentally indexed by notmuch the
same as before, namely "undefined behaviour".  Ideally they should at
least be warned by notmuch-new.  Although strict rfc822 adherence
would be one way to detect mboxes, it doesn't seem to fit with the
spirit or code of message-file.c.
---
 mime-node.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/mime-node.c b/mime-node.c
index a95bdab..8939147 100644
--- a/mime-node.c
+++ b/mime-node.c
@@ -72,6 +72,8 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
     mime_node_context_t *mctx;
     mime_node_t *root;
     notmuch_status_t status;
+    char from_buf[6]; /* From space NULL */
+    int is_mbox = 0;
 
     root = talloc_zero (ctx, mime_node_t);
     if (root == NULL) {
@@ -96,6 +98,23 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    if (fread (from_buf, 1, 5, mctx->file) != 5) {
+	fprintf (stderr, "Failed to read 5 bytes from %s: %s\n",
+		 filename, strerror (errno));
+	status = NOTMUCH_STATUS_FILE_ERROR;
+	goto DONE;
+    }
+    from_buf[5] = '\0';
+
+    if (fseek (mctx->file, 0L, SEEK_SET)) {
+	fprintf (stderr, "Failed to rewind %s: %s\n",
+		 filename, strerror (errno));
+	status = NOTMUCH_STATUS_FILE_ERROR;
+	goto DONE;
+    }
+
+    is_mbox = (strcmp (from_buf, "From ") == 0);
+
     mctx->stream = g_mime_stream_file_new (mctx->file);
     if (!mctx->stream) {
 	fprintf (stderr, "Out of memory.\n");
@@ -111,7 +130,9 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    g_mime_parser_set_scan_from (mctx->parser, is_mbox);
     mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
+
     if (!mctx->mime_message) {
 	fprintf (stderr, "Failed to parse %s\n", filename);
 	status = NOTMUCH_STATUS_FILE_ERROR;
-- 
1.7.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH] mime_node_open: skip envelope from lines at the start of messages
  2012-03-09 10:50         ` Tomi Ollila
  2012-03-09 13:31           ` [PATCH] mime_node_open: skip envelope from lines at the start of messages David Bremner
@ 2012-03-09 13:56           ` David Bremner
  2012-03-09 16:20             ` Daniel Kahn Gillmor
  1 sibling, 1 reply; 23+ messages in thread
From: David Bremner @ 2012-03-09 13:56 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

Some MDAs such as procmail (in MH mode), and exim (doing local
delivery in some configurations of the appendfile transport) add a
line to the front of a message with "From " followed by envelope
sender.  Since this is not a proper RFC822 header field, gmime (at
least since version 2.6) refuses to parse it, unless in mbox mode.

This change reads the line of the file, and if they start with
"From ", pass the stream to gmime starting from the second line.

This makes mime_node_open more consistent with (but still stricter
than) the permissive behaviour of notmuch_file_get_header
(message-file.c), which allows a certain number of "broken_headers".

We avoid putting gmime into mbox mode in case of side effects; this
leaves the situation of mboxes accidentally indexed by notmuch the
same as before, namely "undefined behaviour".  Ideally they should at
least be warned by notmuch-new.  Although strict rfc822 adherence
would be one way to detect mboxes, it doesn't seem to fit with the
spirit or code of message-file.c.
---
 mime-node.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/mime-node.c b/mime-node.c
index a95bdab..cf84423 100644
--- a/mime-node.c
+++ b/mime-node.c
@@ -72,6 +72,8 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
     mime_node_context_t *mctx;
     mime_node_t *root;
     notmuch_status_t status;
+    char *first_line=NULL;
+    size_t first_line_size=0;
 
     root = talloc_zero (ctx, mime_node_t);
     if (root == NULL) {
@@ -96,6 +98,23 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    if (getline(&first_line, &first_line_size, mctx->file) < 0) {
+	fprintf (stderr, "Failed to read first line from %s: %s\n",
+		 filename, strerror (errno));
+	status = NOTMUCH_STATUS_FILE_ERROR;
+	goto DONE;
+    }
+
+    if (strcmp (first_line, "From ") != 0) {
+	/* No envelope from line */
+	if (fseek (mctx->file, 0L, SEEK_SET)) {
+	    fprintf (stderr, "Failed to rewind %s: %s\n",
+		     filename, strerror (errno));
+	    status = NOTMUCH_STATUS_FILE_ERROR;
+	    goto DONE;
+	}
+    }
+
     mctx->stream = g_mime_stream_file_new (mctx->file);
     if (!mctx->stream) {
 	fprintf (stderr, "Out of memory.\n");
@@ -111,7 +130,9 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    g_mime_parser_set_scan_from (mctx->parser, FALSE);
     mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
+
     if (!mctx->mime_message) {
 	fprintf (stderr, "Failed to parse %s\n", filename);
 	status = NOTMUCH_STATUS_FILE_ERROR;
@@ -136,6 +157,9 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
     return NOTMUCH_STATUS_SUCCESS;
 
 DONE:
+    if (first_line != NULL)
+	free (first_line);
+
     talloc_free (root);
     return status;
 }
-- 
1.7.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] mime_node_open: skip envelope from lines at the start of messages
  2012-03-09 13:56           ` David Bremner
@ 2012-03-09 16:20             ` Daniel Kahn Gillmor
  2012-03-10 13:25               ` David Bremner
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Kahn Gillmor @ 2012-03-09 16:20 UTC (permalink / raw)
  To: David Bremner, notmuch

On 03/09/2012 08:56 AM, David Bremner wrote:
> Some MDAs such as procmail (in MH mode), and exim (doing local
> delivery in some configurations of the appendfile transport) add a
> line to the front of a message with "From " followed by envelope
> sender.  Since this is not a proper RFC822 header field, gmime (at
> least since version 2.6) refuses to parse it, unless in mbox mode.
>
> This change reads the line of the file, and if they start with
> "From ", pass the stream to gmime starting from the second line.
>
> This makes mime_node_open more consistent with (but still stricter
> than) the permissive behaviour of notmuch_file_get_header
> (message-file.c), which allows a certain number of "broken_headers".
>
> We avoid putting gmime into mbox mode in case of side effects; this
> leaves the situation of mboxes accidentally indexed by notmuch the
> same as before, namely "undefined behaviour".  Ideally they should at
> least be warned by notmuch-new.  Although strict rfc822 adherence
> would be one way to detect mboxes, it doesn't seem to fit with the
> spirit or code of message-file.c.

The above justification (and the version of the associated patch without 
the memory leak and using strncmp instead of strcmp) seems good to me.

While I'd prefer to have nothing but spic-and-span, perfectly clean 
RFC2822 messages, we have (perhaps accidentally) traditionally supported 
message files with leading "From " lines, so they will be 
already-indexed by previous versions of notmuch.

This patch defines the non-MIME variance we're willing to accept quite 
narrowly (just a single leading line that starts with "From ", no 
escaping of the rest of the text), avoids breaking compatibility with 
existing indexes, and satisfies indexing some plausible MTA delivery 
configurations.

The only way it would be better is if it were to auto-detect that a file 
is actually a multi-message mbox, and alert the user to the fact that 
all but the first message in the mbox is unindexed.  But we don't 
currently do that anyway, so it's not a regression (and that additional 
cleanup should probably be a separate patch anyway).

so: LGTM.

	--dkg

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] mime_node_open: skip envelope from lines at the start of messages
  2012-03-09 16:20             ` Daniel Kahn Gillmor
@ 2012-03-10 13:25               ` David Bremner
  2012-03-10 14:45                 ` Tomi Ollila
  0 siblings, 1 reply; 23+ messages in thread
From: David Bremner @ 2012-03-10 13:25 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

Some MDAs such as procmail (in MH mode), and exim (doing local
delivery in some configurations of the appendfile transport) add a
line to the front of a message with "From " followed by envelope
sender.  Since this is not a proper RFC822 header field, gmime (at
least since version 2.6) refuses to parse it, unless in mbox mode.

This change reads the line of the file, and if they start with
"From ", pass the stream to gmime starting from the second line.

This makes mime_node_open more consistent with (but still stricter
than) the permissive behaviour of notmuch_file_get_header
(message-file.c), which allows a certain number of "broken_headers".

We avoid putting gmime into mbox mode in case of side effects; this
leaves the situation of mboxes accidentally indexed by notmuch the
same as before, namely "undefined behaviour".  Ideally they should at
least be warned by notmuch-new.  Although strict rfc822 adherence
would be one way to detect mboxes, it doesn't seem to fit with the
spirit or code of message-file.c.
---
This version fixes a few formatting issues, and one bug (the strncmp).

 mime-node.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/mime-node.c b/mime-node.c
index a95bdab..25e9d11 100644
--- a/mime-node.c
+++ b/mime-node.c
@@ -72,6 +72,8 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
     mime_node_context_t *mctx;
     mime_node_t *root;
     notmuch_status_t status;
+    char *first_line = NULL;
+    size_t first_line_size = 0;
 
     root = talloc_zero (ctx, mime_node_t);
     if (root == NULL) {
@@ -96,6 +98,23 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    if (getline (&first_line, &first_line_size, mctx->file) < 0) {
+	fprintf (stderr, "Failed to read first line from %s: %s\n",
+		 filename, strerror (errno));
+	status = NOTMUCH_STATUS_FILE_ERROR;
+	goto DONE;
+    }
+
+    if (strncmp (first_line, "From ", 5) != 0) {
+	/* No envelope from line */
+	if (fseek (mctx->file, 0L, SEEK_SET)) {
+	    fprintf (stderr, "Failed to rewind %s: %s\n",
+		     filename, strerror (errno));
+	    status = NOTMUCH_STATUS_FILE_ERROR;
+	    goto DONE;
+	}
+    }
+
     mctx->stream = g_mime_stream_file_new (mctx->file);
     if (!mctx->stream) {
 	fprintf (stderr, "Out of memory.\n");
@@ -111,7 +130,9 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
 	goto DONE;
     }
 
+    g_mime_parser_set_scan_from (mctx->parser, FALSE);
     mctx->mime_message = g_mime_parser_construct_message (mctx->parser);
+
     if (!mctx->mime_message) {
 	fprintf (stderr, "Failed to parse %s\n", filename);
 	status = NOTMUCH_STATUS_FILE_ERROR;
@@ -136,6 +157,9 @@ mime_node_open (const void *ctx, notmuch_message_t *message,
     return NOTMUCH_STATUS_SUCCESS;
 
 DONE:
+    if (first_line != NULL)
+	free (first_line);
+
     talloc_free (root);
     return status;
 }
-- 
1.7.9.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] mime_node_open: skip envelope from lines at the start of messages
  2012-03-10 13:25               ` David Bremner
@ 2012-03-10 14:45                 ` Tomi Ollila
  0 siblings, 0 replies; 23+ messages in thread
From: Tomi Ollila @ 2012-03-10 14:45 UTC (permalink / raw)
  To: David Bremner, notmuch; +Cc: David Bremner

[-- Attachment #1: Type: text/plain, Size: 1304 bytes --]

On Sat, 10 Mar 2012 09:25:31 -0400, David Bremner <david@tethera.net> wrote:
> From: David Bremner <bremner@debian.org>
> 
> Some MDAs such as procmail (in MH mode), and exim (doing local
> delivery in some configurations of the appendfile transport) add a
> line to the front of a message with "From " followed by envelope
> sender.  Since this is not a proper RFC822 header field, gmime (at
> least since version 2.6) refuses to parse it, unless in mbox mode.
> 
> This change reads the line of the file, and if they start with
> "From ", pass the stream to gmime starting from the second line.
> 
> This makes mime_node_open more consistent with (but still stricter
> than) the permissive behaviour of notmuch_file_get_header
> (message-file.c), which allows a certain number of "broken_headers".
> 
> We avoid putting gmime into mbox mode in case of side effects; this
> leaves the situation of mboxes accidentally indexed by notmuch the
> same as before, namely "undefined behaviour".  Ideally they should at
> least be warned by notmuch-new.  Although strict rfc822 adherence
> would be one way to detect mboxes, it doesn't seem to fit with the
> spirit or code of message-file.c.
> ---
> This version fixes a few formatting issues, and one bug (the strncmp).

+1

Tomi

[-- Attachment #2: Type: application/pgp-signature, Size: 188 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Handle errors in mime_node_open
  2012-03-06 18:26 ` [PATCH] Handle errors in mime_node_open Austin Clements
  2012-03-06 22:04   ` Parsing regression with gmime-2.6? David Bremner
@ 2012-03-11  1:51   ` David Bremner
  1 sibling, 0 replies; 23+ messages in thread
From: David Bremner @ 2012-03-11  1:51 UTC (permalink / raw)
  To: Austin Clements, notmuch

On Tue,  6 Mar 2012 18:26:57 +0000, Austin Clements <amdragon@MIT.EDU> wrote:
> ---
>  mime-node.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)

Pushed, 

d

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-03-11  1:51 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-01 21:39 Internal error on line 296 of mime-node.c David Bremner
2012-03-01 21:57 ` Austin Clements
2012-03-02  2:35   ` David Bremner
2012-03-02  2:35     ` [PATCH 1/2] test: utility function to add a pre-generated message to the database David Bremner
2012-03-02  2:35     ` [PATCH 2/2] test: add new test file for mime parsing David Bremner
2012-03-06 18:26 ` [PATCH] Handle errors in mime_node_open Austin Clements
2012-03-06 22:04   ` Parsing regression with gmime-2.6? David Bremner
2012-03-08 15:35     ` [WIP PATCH] debugging gmime-2.6 fail David Bremner
2012-03-08 18:08       ` David Bremner
2012-03-08 18:27     ` Parsing regression with gmime-2.6? Jameson Graef Rollins
2012-03-08 20:30     ` Daniel Kahn Gillmor
2012-03-08 21:32       ` Jameson Graef Rollins
2012-03-08 21:40         ` Daniel Kahn Gillmor
2012-03-08 21:59           ` Jameson Graef Rollins
2012-03-08 21:48       ` [PATCH] mime_node_open: check if the file is in mbox format, and inform gmime David Bremner
2012-03-08 22:05         ` Jameson Graef Rollins
2012-03-09 10:50         ` Tomi Ollila
2012-03-09 13:31           ` [PATCH] mime_node_open: skip envelope from lines at the start of messages David Bremner
2012-03-09 13:56           ` David Bremner
2012-03-09 16:20             ` Daniel Kahn Gillmor
2012-03-10 13:25               ` David Bremner
2012-03-10 14:45                 ` Tomi Ollila
2012-03-11  1:51   ` [PATCH] Handle errors in mime_node_open David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).