unofficial mirror of notmuch@notmuchmail.org
* [PATCH] Don't bother checking for mbox files
@ 2016-03-10  0:08 Keith Packard
  2016-03-13 10:57 ` Edward Betts
  0 siblings, 1 reply; 4+ messages in thread
From: Keith Packard @ 2016-03-10  0:08 UTC (permalink / raw)
  To: notmuch



Postfix adds mbox-style From lines when used in combination with
maildrop or .forward files. If they have another line starting with
'From ' in them, notmuch complains about them not being mail files.

If we assume the user hasn't screwed up and misconfigured their mail
system, then we can safely ignore whether the file started with an
mbox header and just parse it as a single-message file.

I can't see any harm in doing this; in the worst case, you'll have a
single 'message' which contains multiple emails now, while the
alternative is to miss mail entirely.

I could be convinced to add a config option to enable this behaviour,
but I'd really rather not.

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 lib/message-file.c | 29 +----------------------------
 1 file changed, 1 insertion(+), 28 deletions(-)

diff --git a/lib/message-file.c b/lib/message-file.c
index ee30520..f03e1db 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -98,22 +98,6 @@ _notmuch_message_file_close (notmuch_message_file_t *message)
     talloc_free (message);
 }
 
-static notmuch_bool_t
-_is_mbox (FILE *file)
-{
-    char from_buf[5];
-    notmuch_bool_t ret = FALSE;
-
-    /* Is this mbox? */
-    if (fread (from_buf, sizeof (from_buf), 1, file) == 1 &&
-	strncmp (from_buf, "From ", 5) == 0)
-	ret = TRUE;
-
-    rewind (file);
-
-    return ret;
-}
-
 notmuch_status_t
 _notmuch_message_file_parse (notmuch_message_file_t *message)
 {
@@ -121,13 +105,10 @@ _notmuch_message_file_parse (notmuch_message_file_t *message)
     GMimeParser *parser;
     notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;
     static int initialized = 0;
-    notmuch_bool_t is_mbox;
 
     if (message->message)
 	return NOTMUCH_STATUS_SUCCESS;
 
-    is_mbox = _is_mbox (message->file);
-
     if (! initialized) {
 	g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);
 	initialized = 1;
@@ -144,7 +125,7 @@ _notmuch_message_file_parse (notmuch_message_file_t *message)
     g_mime_stream_file_set_owner (GMIME_STREAM_FILE (stream), FALSE);
 
     parser = g_mime_parser_new_with_stream (stream);
-    g_mime_parser_set_scan_from (parser, is_mbox);
+    g_mime_parser_set_scan_from (parser, FALSE);
 
     message->message = g_mime_parser_construct_message (parser);
     if (! message->message) {
@@ -152,14 +133,6 @@ _notmuch_message_file_parse (notmuch_message_file_t *message)
 	goto DONE;
     }
 
-    if (is_mbox && ! g_mime_parser_eos (parser)) {
-	/*
-	 * This is a multi-message mbox. (For historical reasons, we
-	 * do support single-message mboxes.)
-	 */
-	status = NOTMUCH_STATUS_FILE_NOT_EMAIL;
-    }
-
   DONE:
     g_object_unref (stream);
     g_object_unref (parser);
-- 
2.7.0


-- 
-keith


* Re: [PATCH] Don't bother checking for mbox files
  2016-03-10  0:08 [PATCH] Don't bother checking for mbox files Keith Packard
@ 2016-03-13 10:57 ` Edward Betts
  2016-03-13 12:48   ` Jani Nikula
  0 siblings, 1 reply; 4+ messages in thread
From: Edward Betts @ 2016-03-13 10:57 UTC (permalink / raw)
  To: notmuch

Keith Packard <keithp@keithp.com> wrote:
> Postfix adds mbox-style From lines when used in combination with
> maildrop or .forward files. If they have another line starting with
> 'From ' in them, notmuch complains about them not being mail files.
> 
> If we assume the user hasn't screwed up and misconfigured their mail
> system, then we can safely ignore whether the file started with an
> mbox header and just parse it as a single-message file.

I think it is fine to go ahead with this change. At the same time the
behaviour of Postfix should be corrected so it doesn't add mbox-style From
lines to mails in maildir format.

The same problem existed in the Debian exim4 config. I filed a bug, it was
fixed: https://bugs.debian.org/769396

Here is a bug in maildrop suggesting that it should strip the mbox-style From
line from the top of mails: https://bugs.debian.org/737383

-- 
Edward.


* Re: [PATCH] Don't bother checking for mbox files
  2016-03-13 10:57 ` Edward Betts
@ 2016-03-13 12:48   ` Jani Nikula
  2016-03-14  7:23     ` Tomi Ollila
  0 siblings, 1 reply; 4+ messages in thread
From: Jani Nikula @ 2016-03-13 12:48 UTC (permalink / raw)
  To: Edward Betts, notmuch

On Sun, 13 Mar 2016, Edward Betts <edward@4angle.com> wrote:
> Keith Packard <keithp@keithp.com> wrote:
>> Postfix adds mbox-style From lines when used in combination with
>> maildrop or .forward files. If they have another line starting with
>> 'From ' in them, notmuch complains about them not being mail files.
>> 
>> If we assume the user hasn't screwed up and misconfigured their mail
>> system, then we can safely ignore whether the file started with an
>> mbox header and just parse it as a single-message file.
>
> I think it is fine to go ahead with this change. At the same time the
> behaviour of Postfix should be corrected so it doesn't add mbox-style From
> lines to mails in maildir format.

I disagree with making the change (as-is, at least).

In general, Notmuch does not support mboxes. We expect maildir style one
message per file mail storage. We support single-message mboxes as a
special case, in part because, as you note, there's plenty of other
software that adds the mbox "From " line even though delivering to
maildir.

I think it's misleading and confusing to the users to accept and index
the first message of mboxes, and silently ignore the rest (or worse,
index all of the mbox and associate the text with the first message). I
think we should reject multi-message mboxes, because we have no code to
handle them. This patch throws away that check.

Now, IIUC, the problem here is not that the files actually are
multi-message mboxes. We could use a sample message (even a crafted one)
that exhibits the problem, so we could add a test case, and fix Notmuch
to deal with it gracefully (if we decide catering to potentially broken
other software is the way to go), while retaining the code to reject
multi-message mboxes. With the test case, we'd also avoid accidentally
breaking this in the future.

BR,
Jani.


* Re: [PATCH] Don't bother checking for mbox files
  2016-03-13 12:48   ` Jani Nikula
@ 2016-03-14  7:23     ` Tomi Ollila
  0 siblings, 0 replies; 4+ messages in thread
From: Tomi Ollila @ 2016-03-14  7:23 UTC (permalink / raw)
  To: Jani Nikula, Edward Betts, notmuch

On Sun, Mar 13 2016, Jani Nikula <jani@nikula.org> wrote:

> On Sun, 13 Mar 2016, Edward Betts <edward@4angle.com> wrote:
>> Keith Packard <keithp@keithp.com> wrote:
>>> Postfix adds mbox-style From lines when used in combination with
>>> maildrop or .forward files. If they have another line starting with
>>> 'From ' in them, notmuch complains about them not being mail files.
>>> 
>>> If we assume the user hasn't screwed up and misconfigured their mail
>>> system, then we can safely ignore whether the file started with an
>>> mbox header and just parse it as a single-message file.
>>
>> I think it is fine to go ahead with this change. At the same time the
>> behaviour of Postfix should be corrected so it doesn't add mbox-style From
>> lines to mails in maildir format.
>
> I disagree with making the change (as-is, at least).
>
> In general, Notmuch does not support mboxes. We expect maildir style one
> message per file mail storage. We support single-message mboxes as a
> special case, in part because, as you note, there's plenty of other
> software that adds the mbox "From " line even though delivering to
> maildir.
>
> I think it's misleading and confusing to the users to accept and index
> the first message of mboxes, and silently ignore the rest (or worse,
> index all of the mbox and associate the text with the first message). I
> think we should reject multi-message mboxes, because we have no code to
> handle them. This patch throws away that check.
>
> Now, IIUC, the problem here is not that the files actually are
> multi-message mboxes. We could use a sample message (even a crafted one)
> that exhibits the problem, so we could add a test case, and fix Notmuch
> to deal with it gracefully (if we decide catering to potentially broken
> other software is the way to go), while retaining the code to reject
> multi-message mboxes. With the test case, we'd also avoid accidentally
> breaking this in the future.

I agree with Jani; a user might accidentally index an mbox with multiple
messages as a single message if this were merged...

We currently have a very simple check: any line starting with 'From '
separates messages (provided the first line also starts with 'From '). After
a quick look at the various 'mbox*' "specs", this may just be within the
"standard".
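
A minimal sketch of the simple check described above, written against an
in-memory buffer so it can be tried on a crafted message. It only models the
rule as stated in this thread (first line starts with 'From ', and any later
line starting with 'From ' counts as a separator); it is not GMime's or
Notmuch's actual code, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the bytes from 'line' up to 'end' begin with "From ". */
static int
starts_with_from (const char *line, const char *end)
{
    return (size_t) (end - line) >= 5 && strncmp (line, "From ", 5) == 0;
}

/*
 * Hypothetical helper: return 1 if the buffer would be treated as a
 * multi-message mbox under the simple rule, i.e. the first line starts
 * with "From " and at least one later line does too.  Assumes LF line
 * endings.
 */
int
naive_is_multi_message_mbox (const char *buf, size_t len)
{
    const char *end = buf + len;
    const char *nl;

    assert (buf != NULL);

    /* Not an mbox at all unless the very first line is a From line. */
    if (! starts_with_from (buf, end))
	return 0;

    /* Look at the start of every subsequent line. */
    for (nl = memchr (buf, '\n', len);
	 nl != NULL;
	 nl = memchr (nl + 1, '\n', (size_t) (end - (nl + 1)))) {
	if (starts_with_from (nl + 1, end))
	    return 1;
    }

    return 0;
}
```

A single message that was delivered with a leading From line and happens to
have a body line such as "From now on ..." is flagged by this rule, which is
the false positive the original patch is trying to get rid of.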

In mboxviewfs I checked whether there is at least one empty line before
'^From' (might not be required by the standard, but whatever ;/) and that
there is at least a 'Date:' header following (needed for the file "time")...
but even this heuristic may not be enough if we wanted to go deep into
this (e.g. there are emails which quote the beginning of an mbox file; OK, no
heuristic can match this unless there is a human-level AI working on it ;)).
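
The stricter heuristic can be sketched roughly as follows. This is not the
mboxviewfs code; it is an illustration that assumes LF line endings, an
in-memory buffer, and an exact-case 'Date:' match, and the function names are
hypothetical. A 'From ' line counts as a separator only if it is at the start
of the buffer or preceded by an empty line, and the header block that follows
it contains a 'Date:' header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the header block starting at 'p' (running up to the first
 * empty line or the end of the buffer) has a line beginning with "Date:".
 */
static int
headers_have_date (const char *p, const char *end)
{
    while (p < end && *p != '\n') {	/* an empty line ends the headers */
	const char *eol = memchr (p, '\n', (size_t) (end - p));
	size_t line_len = (size_t) ((eol ? eol : end) - p);

	if (line_len >= 5 && strncmp (p, "Date:", 5) == 0)
	    return 1;
	if (! eol)
	    break;
	p = eol + 1;
    }

    return 0;
}

/* Count the "From " lines that pass the stricter heuristic. */
size_t
count_mbox_separators (const char *buf, size_t len)
{
    const char *end = buf + len;
    const char *line = buf;
    int prev_blank = 1;	/* start of buffer counts as "after a blank line" */
    size_t count = 0;

    assert (buf != NULL);

    while (line < end) {
	const char *eol = memchr (line, '\n', (size_t) (end - line));
	const char *next = eol ? eol + 1 : end;
	size_t line_len = (size_t) ((eol ? eol : end) - line);

	if (prev_blank && line_len >= 5 &&
	    strncmp (line, "From ", 5) == 0 &&
	    headers_have_date (next, end))
	    count++;

	prev_blank = (line_len == 0);
	line = next;
    }

    return count;
}
```

A count greater than one would suggest a multi-message mbox. As noted above,
a mail that quotes the beginning of an mbox file (blank line, 'From ' line,
'Date:' header) would still be miscounted.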

OTOH, presumably

https://github.com/GNOME/gmime/blob/master/tests/data/mbox/input/substring.mbox

contains 3 messages (or what??!!11)

...

Perhaps the simplest is to give users the possibility to use a 'footgun'
option in notmuch new (notmuch insert probably doesn't need it ???) which can
be used to skip the 'mbox' check (I was going to suggest a configuration
option, but as we don't support that in bindings, ...). But of course some of
the simplicity is gone when one forgets to give the --footgun option -- the
next notmuch new with the footgun probably will not pick up the mail file
again (or we have to hold off updating the directory mtime indefinitely -- or
make other, more complicated changes, which no-one reviews(*) anyway >;/)


> BR,
> Jani.

Tomi

(*) Although when someone sends a less-trivial-than-usual patch that
provides significant progress in functionality, it is reviewed promptly
by a relatively good number of reviewers...

One 'other change' could be, e.g., to keep a list of files that have been
failing due to this and retry them if the footgun option is given.
