unofficial mirror of notmuch@notmuchmail.org
* BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: Alvaro Herrera @ 2019-01-18 16:07 UTC
  To: notmuch

Hello

I've been using notmuch successfully for a couple of years now (mostly
via neomutt).  Thanks for developing it.

Not long ago I switched my mail setup to use notmuch insert via
mailfilter instead of good old procmail.  However, since then a number
of emails are reported by notmuch as "non-mail", and appear to not be
indexed.  (I use --keep, so they're still in my maildir).

In my reading of the code, the error ultimately comes from
g_mime_parser_construct_message rejecting the message.
I reported this to GMime, and they said that the problem is that notmuch
insert is using the mbox mode:
https://github.com/jstedfast/gmime/issues/58
(Sample email is attached there).

As far as I can tell, this is all coming from
_notmuch_message_file_parse() which sets the is_mbox flag when it sees
the "^From " line at the start of the file ... which kinda makes sense
in general terms, but for notmuch-insert I think that's the wrong thing
to do.  Maybe a solution is to pass a flag down from notmuch-insert.c's
add_file all the way down to _notmuch_message_file_parse telling it not
to treat the file as an mbox.
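A minimal C sketch of the two conditions at play may help. It assumes, as described above, that the mbox decision keys on a leading "From " line; the helper names starts_with_from_line and count_inner_from_lines are hypothetical and are not notmuch or GMime API. A file that starts with "From " and also contains a later, unescaped "From " line is the kind of file an mbox-mode parser could treat as more than one message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the buffer begins with an mbox-style
 * "From " separator line (the heuristic described in the thread). */
static bool starts_with_from_line(const char *buf)
{
    return strncmp(buf, "From ", 5) == 0;
}

/* Hypothetical helper: count lines after the first one that begin with
 * an unescaped "From ". In mbox mode each such line may be taken as the
 * start of a new message, so a count > 0 means a single message could
 * be split. */
static size_t count_inner_from_lines(const char *buf)
{
    size_t count = 0;
    const char *p = strchr(buf, '\n');   /* skip the first line */

    while (p != NULL) {
        p++;                             /* start of the next line */
        if (strncmp(p, "From ", 5) == 0)
            count++;
        p = strchr(p, '\n');
    }
    return count;
}
```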

I *think* that not all of the messages that fail parsing contain an
email attachment, so maybe I'll come back with further issues later on.
This is the first one I debugged.

Thanks

-- 
Álvaro Herrera


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: David Bremner @ 2019-01-19 18:17 UTC
  To: Alvaro Herrera, notmuch

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

> In my reading of the code, the error ultimately comes from
> g_mime_parser_construct_message rejecting the message.
> I reported this to GMime, and they said that the problem is that notmuch
> insert is using the mbox mode:
> https://github.com/jstedfast/gmime/issues/58
> (Sample email is attached there).

This issue (or a related one) has come up before

     https://nmbug.notmuchmail.org/nmweb/search/postfix+mbox

Generally it seems to be caused by tools that add mbox 'From ' headers,
without actually mbox escaping the file. We haven't yet reached
consensus on a good solution (generally people just want to fix their
own mail, which is understandable). A workaround discussed in the
messages I reference above is to strip the 'From ' header before passing
to notmuch-insert. Perhaps some scholar of the RFCs can convince us that
that is "always" the right thing for notmuch insert to do.
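The workaround can be sketched in C as a function that skips a leading "From " line before the text is handed to notmuch insert. This is a minimal sketch assuming the whole message is already in memory as a NUL-terminated string; skip_leading_from_line is a hypothetical name, not part of notmuch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if msg starts with an mbox "From " line, return
 * a pointer to the line after it; otherwise return msg unchanged. */
static const char *skip_leading_from_line(const char *msg)
{
    if (strncmp(msg, "From ", 5) != 0)
        return msg;

    const char *nl = strchr(msg, '\n');
    /* A "From " line with no newline leaves nothing to deliver. */
    return nl != NULL ? nl + 1 : msg + strlen(msg);
}
```

A delivery filter could read stdin into a buffer, call this, and write the result to the standard input of notmuch insert. Only the first line is touched; the rest of the message is passed through as-is.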

> As far as I can tell, this is all coming from
> _notmuch_message_file_parse() which sets the is_mbox flag when it sees
> the "^From " line at the start of the file ... which kinda makes sense
> in general terms, but for notmuch-insert I think that's the wrong thing
> to do.  Maybe a solution is to pass a flag down from notmuch-insert.c's
> add_file all the way down to _notmuch_message_file_parse telling it not
> to treat the file as an mbox.
>

I'd be worried about letting notmuch-insert deliver messages that
notmuch-new would not be able to parse. In particular we'd like to keep
the property that a Maildir + the output of notmuch-dump should be
enough to completely recover the notmuch database.


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: Alvaro Herrera @ 2019-01-21 19:53 UTC
  To: David Bremner; +Cc: notmuch

Hi David, thanks for replying.

On 2019-Jan-19, David Bremner wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> 
> > In my reading of the code, the error ultimately comes from
> > g_mime_parser_construct_message rejecting the message.
> > I reported this to GMime, and they said that the problem is that notmuch
> > insert is using the mbox mode:
> > https://github.com/jstedfast/gmime/issues/58
> > (Sample email is attached there).
> 
> This issue (or a related one) has come up before
> 
>      https://nmbug.notmuchmail.org/nmweb/search/postfix+mbox
> 
> Generally it seems to be caused by tools that add mbox 'From ' headers,
> without actually mbox escaping the file. We haven't yet reached
> consensus on a good solution (generally people just want to fix their
> own mail, which is understandable). A workaround discussed in the
> messages I reference above is to strip the 'From ' header before passing
> to notmuch-insert. Perhaps some scholar of the RFCs can convince us that
> that is "always" the right thing for notmuch insert to do.

I'm not sure I follow.  As I understand it, notmuch does not work with
mboxes, only with maildirs, so the behavior of splitting emails at
"From " is not strictly necessary, since one file always equals one message.

As for RFC scholarship, I spent some time looking at
https://tools.ietf.org/html/rfc5322 to see if it defined any sort of
message separator ... but as far as I can tell, it only defines what
a valid message looks like.  It doesn't say where one message
ends.

On the other hand, in my world, it's been quite a while since 'From '
was considered a useful message separator.  This stopped being true in a
pretty extensive way when git format-patch messages started being
posted as attachments.  But even before that, MUAs stopped adding the
">" at the start of a "From " line in human-written text.  Nowadays what
really governs the split is the Content-Length header, from the MIME
definitions.  Most tools do not escape lines starting with 'From '
anymore.  As far as I can tell, this is defined by RFC 2046,
https://tools.ietf.org/html/rfc2046#section-5.1.1, which states that the
implementation must look for the "boundary delimiter line".  Stopping at
a "From " line before finding the boundary delimiter line would be a
mistake, in my reading.
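For comparison, the RFC 2046 section 5.1.1 rule can be sketched in C. Per that section, a delimiter line is "--" followed by the boundary parameter, optionally followed by "--" for the close delimiter, then optional transport padding (spaces or tabs) before the CRLF. The function name is hypothetical, the sketch assumes the CRLF has already been stripped from the line, and it is not how GMime implements the check.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch: does `line` (without its trailing CRLF) delimit a body part
 * for the given boundary parameter?  On a match, *is_close is set to
 * true for the close delimiter ("--boundary--"), false otherwise.
 * A line beginning with "From " can never match. */
static bool is_boundary_line(const char *line, const char *boundary,
                             bool *is_close)
{
    size_t blen = strlen(boundary);

    if (strncmp(line, "--", 2) != 0 || strncmp(line + 2, boundary, blen) != 0)
        return false;

    const char *p = line + 2 + blen;
    *is_close = (strncmp(p, "--", 2) == 0);
    if (*is_close)
        p += 2;

    /* Only transport padding (linear whitespace) may follow. */
    while (*p == ' ' || *p == '\t')
        p++;
    return *p == '\0';
}
```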

> > As far as I can tell, this is all coming from
> > _notmuch_message_file_parse() which sets the is_mbox flag when it sees
> > the "^From " line at the start of the file ... which kinda makes sense
> > in general terms, but for notmuch-insert I think that's the wrong thing
> > to do.  Maybe a solution is to pass a flag down from notmuch-insert.c's
> > add_file all the way down to _notmuch_message_file_parse telling it not
> > to treat the file as an mbox.
> 
> I'd be worried about letting notmuch-insert deliver messages that
> notmuch-new would not be able to parse. In particular we'd like to keep
> the property that a Maildir + the output of notmuch-dump should be
> enough to completely recover the notmuch database.

Hmm, that's a good point -- I assume that notmuch-new should be patched
similarly so that those messages are valid there too.

So maybe the solution (given that, as I said above, Notmuch does not
appear to handle mboxes at all) is to just set the mbox flag to false
completely ...

-- 
Álvaro Herrera                PostgreSQL Expert, https://www.2ndQuadrant.com/


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: David Bremner @ 2019-02-01 19:33 UTC
  To: Alvaro Herrera; +Cc: notmuch

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

> I'm not sure I follow.  As I understand it, notmuch does not work with
> mboxes, only with maildirs, so the behavior of splitting emails at
> "From " is not strictly necessary, since one file always equals one message.

Checking for mboxes was added as a safety feature since people found
indexing large mboxes led to bad results (bloated index, crashing
indexer, etc...).

> On the other hand, in my world, it's been quite a while since 'From '
> was considered a useful message separator.  This stopped being true in a
> pretty extensive way when git-format-patches messages started being
> posted as attachments.

Sure. Things on disk should either be mboxes, or not. If they start with
'From ', they are mboxes.  We attempted to take away support for single
message mboxes, but people complained even more about that. So
generally, if tools / users don't want to escape 'From ' after the first
line, the first line should not be 'From '.
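What escaping "From " after the first line usually means in practice can be sketched with the mboxrd convention: every line matching "^>*From " gets one more leading '>', which a reader undoes by removing one '>'. The choice of mboxrd (rather than the older mboxo, which quotes only lines beginning exactly with "From " and so cannot be reversed unambiguously) is an assumption for illustration; the thread does not say which variant notmuch expects. mboxrd_escape is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of mboxrd-style quoting: each line matching ^>*From gets one
 * extra leading '>'.  Returns a malloc'd string the caller must free,
 * or NULL if allocation fails. */
static char *mboxrd_escape(const char *body)
{
    /* Worst case: one extra '>' per line. */
    size_t lines = 1;
    for (const char *p = body; *p; p++)
        if (*p == '\n')
            lines++;

    char *out = malloc(strlen(body) + lines + 1);
    if (out == NULL)
        return NULL;

    char *o = out;
    const char *line = body;
    while (*line) {
        const char *q = line;
        while (*q == '>')
            q++;
        if (strncmp(q, "From ", 5) == 0)
            *o++ = '>';

        /* Copy the line, including its newline if it has one. */
        const char *nl = strchr(line, '\n');
        size_t n = nl ? (size_t)(nl - line) + 1 : strlen(line);
        memcpy(o, line, n);
        o += n;
        line += n;
    }
    *o = '\0';
    return out;
}
```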

My original question was whether notmuch-insert should strip the 'From '
(and presumably save as a normal header) before delivery.
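One way to read "strip the 'From ' and save it as a normal header" is sketched below in C: the leading "From " is replaced by a header name, so the envelope sender and date survive as an ordinary header and the file no longer starts with "From ". The header name X-Envelope-From and the function name are hypothetical choices; this is not existing notmuch behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header name; the thread does not specify one. */
#define ENVELOPE_HEADER "X-Envelope-From: "

/* Sketch: if msg begins with an mbox "From " line, return a malloc'd
 * copy in which that line becomes an ordinary header; otherwise return
 * a plain copy.  The caller frees the result; NULL means allocation
 * failed. */
static char *from_line_to_header(const char *msg)
{
    const char *prefix = "";
    const char *rest = msg;

    if (strncmp(msg, "From ", 5) == 0) {
        prefix = ENVELOPE_HEADER;
        rest = msg + 5;              /* keep "sender date\n..." */
    }

    size_t plen = strlen(prefix), rlen = strlen(rest);
    char *out = malloc(plen + rlen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, rest, rlen + 1);   /* includes the NUL */
    return out;
}
```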

d


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: Leo L. Schwab @ 2019-03-07  6:57 UTC
  To: notmuch

On Fri, Jan 18, 2019 at 01:07:35PM -0300, Alvaro Herrera wrote:
> Not long ago I switched my mail setup to use notmuch insert via
> mailfilter instead of good old procmail.  However, since then a number
> of emails are reported by notmuch as "non-mail", and appear to not be
> indexed.  (I use --keep, so they're still in my maildir).
> 
	I've been bumping into the same problem.  I converted 20+ years
worth of mail to maildir format expressly so I could use notmuch.  I almost
didn't do it because the setup was so daunting (reconfigure system MTA/MDA
to deliver in maildir instead of mbox; install, learn, and set up procmail
and/or fetchmail to update the index; modify muttrc; blah blah blah...).

	And then I hit on the idea of creating a .forward file containing:
"|/usr/bin/notmuch insert"  Poof!  Delivery and indexing in one step.

	The downside to this is that, if notmuch-insert fails with the above
error, the MTA tries to bounce the message (so thanks *very* much for making
me aware of the '--keep' option).

	As a result, I've been thinking how this might be addressed.  The
thought I've had is to create a new option to notmuch-insert that
essentially means, "Skip all validation, just index and deliver."  In other
words, the input is presumed to have already been validated by an external
entity, so assume it's good and index and deliver it.  '--keep' effectively
does this already, but it quashes *all* errors.  I just want to skip the
validator.

	I could probably kluge up a prototype if anyone thinks that's a
reasonable idea.

					Thanks,
					Schwab


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: David Bremner @ 2019-03-07 21:05 UTC
  To: Leo L. Schwab, notmuch

"Leo L. Schwab" <ewhac@ewhac.org> writes:

> 	As a result, I've been thinking how this might be addressed.  The
> thought I've had is to create a new option to notmuch-insert that
> essentially means, "Skip all validation, just index and deliver."  In other
> words, the input is presumed to have already been validated by an external
> entity, so assume it's good and index and deliver it.  '--keep' effectively
> does this already, but it quashes *all* errors.  I just want to skip the
> validator.

If you move your database out of the way and run notmuch-new, are the
messages delivered by your modified notmuch-insert indexed?  I think that's a
property I'd require for anything we were going to carry upstream.

Also, I'm not sure about turning off _all_ validation vs. just not
checking for mboxes.

d


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: Alvaro Herrera @ 2019-03-07 22:03 UTC
  To: David Bremner; +Cc: Leo L. Schwab, notmuch

On 2019-Mar-07, David Bremner wrote:

> "Leo L. Schwab" <ewhac@ewhac.org> writes:
> 
> > 	As a result, I've been thinking how this might be addressed.  The
> > thought I've had is to create a new option to notmuch-insert that
> > essentially means, "Skip all validation, just index and deliver."  In other
> > words, the input is presumed to have already been validated by an external
> > entity, so assume it's good and index and deliver it.  '--keep' effectively
> > does this already, but it quashes *all* errors.  I just want to skip the
> > validator.
> 
> If you move your database out of the way and run notmuch-new, are the
> messages delivered by your modified notmuch-insert indexed?  I think that's a
> property I'd require for anything we were going to carry upstream.
> 
> Also, I'm not sure about turning off _all_ validation vs. just not
> checking for mboxes.

By the way, did you not have a problem with message
id:878szcwd8c.fsf@swing.csc.kth.se delivered to this very list?  That
one includes an unescaped "^From " line in the body, which is sure to
confuse the message parser ...

-- 
Álvaro Herrera                            39°50'S 73°21'W
"Ah, spring... when a young penguin's fancy lightly turns to thoughts of ...
Beta testing!"                                  (Fedora 9 beta announcement)


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: David Bremner @ 2019-03-07 22:34 UTC
  To: Alvaro Herrera; +Cc: Leo L. Schwab, notmuch

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

> By the way, did you not have a problem with message
> id:878szcwd8c.fsf@swing.csc.kth.se delivered to this very list?  That
> one includes an unescaped "^From " line in the body, which is sure to
> confuse the message parser ...

That particular message is base64 encoded, so that would be unlikely to
cause a problem.

d


* Re: BUG: "notmuch insert" fails with "Delivery of non-mail file"
From: Alvaro Herrera @ 2019-03-08  0:51 UTC
  To: David Bremner; +Cc: Leo L. Schwab, notmuch

On 2019-Mar-07, David Bremner wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> 
> > By the way, did you not have a problem with message
> > id:878szcwd8c.fsf@swing.csc.kth.se delivered to this very list?  That
> > one includes an unescaped "^From " line in the body, which is sure to
> > confuse the message parser ...
> 
> That particular message is base64 encoded, so that would be unlikely to
> cause a problem.

Bah, you're right, it is.  I swear I saw an "!Err" message from maildrop
because of an unadorned "From " in a message recently ... can't find it
now.

-- 
Álvaro Herrera                            http://www.linkedin.com/in/alvherre


end of thread

Thread overview: 9+ messages
2019-01-18 16:07 BUG: "notmuch insert" fails with "Delivery of non-mail file" Alvaro Herrera
2019-01-19 18:17 ` David Bremner
2019-01-21 19:53   ` Alvaro Herrera
2019-02-01 19:33     ` David Bremner
2019-03-07  6:57 ` Leo L. Schwab
2019-03-07 21:05   ` David Bremner
2019-03-07 22:03     ` Alvaro Herrera
2019-03-07 22:34       ` David Bremner
2019-03-08  0:51         ` Alvaro Herrera

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).