unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* WIP: filter out envelope headers in notmuch-insert.
@ 2022-02-12  2:55 David Bremner
  2022-02-12  2:55 ` [RFC PATCH 1/3] test: start new corpus of test messages for indexing code David Bremner
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: David Bremner @ 2022-02-12  2:55 UTC (permalink / raw)
  To: notmuch

This is a bit rough and ready, but before I fine tune it, I want to
make sure the overall idea of stripping envelope headers in
notmuch-insert makes sense.

The first patch is duplicated from id:20220205195215.166213-2-david@tethera.net

I thought about a more ambitious version that would replace any
existing "Return-Path" headers, but it seems like significantly more
work (the current code is not line based), and not obviously
better. Or maybe I missed the wording in the RFCs that talks about how
MDAs should behave here.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 1/3] test: start new corpus of test messages for indexing code
  2022-02-12  2:55 WIP: filter out envelope headers in notmuch-insert David Bremner
@ 2022-02-12  2:55 ` David Bremner
  2022-02-12  2:55 ` [RFC PATCH 2/3] test: add known broken test for insert with mbox as input David Bremner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: David Bremner @ 2022-02-12  2:55 UTC (permalink / raw)
  To: notmuch

This particular message is not recognized by notmuch as mail, but is
fine according to e.g. mutt. The trigger for this bad behaviour seems
to be a second "From " ocurring at the beginning of the line but
inside an attachment.
---
 test/corpora/indexing/mbox-attachment.eml | 83 +++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 test/corpora/indexing/mbox-attachment.eml

diff --git a/test/corpora/indexing/mbox-attachment.eml b/test/corpora/indexing/mbox-attachment.eml
new file mode 100644
index 00000000..98a8fc91
--- /dev/null
+++ b/test/corpora/indexing/mbox-attachment.eml
@@ -0,0 +1,83 @@
+From david@tethera.net  Sat Feb  5 09:19:10 2022
+From: David Bremner <david@tethera.net>
+To: David Bremner <david@tethera.net>
+Subject: Re: [RFC PATCH v2 12/12] emacs: whitespace cleanup for keybindings
+Date: Sat, 05 Feb 2022 10:19:09 -0400
+Message-ID: <87k0e9o0pu.fsf@tethera.net>
+MIME-Version: 1.0
+Content-Type: multipart/mixed; boundary="=-=-="
+
+--=-=-=
+Content-Type: text/plain
+Content-Disposition: inline
+
+
+I figured out the race condition in the tests. The previous test was
+still running when the failing test started, the joys of using a shared
+emacs for running all of the tests in one file.
+
+The attached diff is split into the the commits that introduce the tests
+in question in my working series, but you should be able to just apply
+it on top of the posted series if you want.
+
+
+--=-=-=
+Content-Type: text/x-diff
+Content-Disposition: inline; filename=0001-test-fixups.patch
+
+From fc88cba7f1f37b9cf3b296eace2422dd0e173502 Mon Sep 17 00:00:00 2001
+From: David Bremner <david@tethera.net>
+Date: Thu, 3 Feb 2022 21:05:05 -0400
+Subject: [PATCH] test fixups
+
+---
+ test/T315-emacs-tagging.sh | 9 ++++-----
+ 1 file changed, 4 insertions(+), 5 deletions(-)
+
+diff --git a/test/T315-emacs-tagging.sh b/test/T315-emacs-tagging.sh
+index c9e3e53a..c26413ce 100755
+--- a/test/T315-emacs-tagging.sh
++++ b/test/T315-emacs-tagging.sh
+@@ -119,7 +119,8 @@ for mode in search show tree unthreaded; do
+       (notmuch-$mode \"$os_x_darwin_thread\")
+       (notmuch-test-wait)
+       (execute-kbd-macro \"+tag-to-be-undone-$mode\")
+-      (notmuch-tag-undo))"
++      (notmuch-tag-undo)
++      (notmuch-test-wait))"
+     count=$(notmuch count "tag:tag-to-be-undone-$mode")
+     test_expect_equal "$count" "0"
+ 
+@@ -128,9 +129,7 @@ for mode in search show tree unthreaded; do
+       (notmuch-$mode \"$os_x_darwin_thread\")
+       (notmuch-test-wait)
+       (execute-kbd-macro \"+one-$mode\")
+-      (notmuch-test-wait)
+       (execute-kbd-macro \"+two-$mode\")
+-      (notmuch-test-wait)
+       (notmuch-tag-undo)
+       (notmuch-test-wait)
+       (execute-kbd-macro \"+three-$mode\"))"
+@@ -143,7 +142,6 @@ for mode in search show tree unthreaded; do
+       (notmuch-$mode \"$os_x_darwin_thread\")
+       (notmuch-test-wait)
+       (execute-kbd-macro \"+one-$mode\")
+-      (notmuch-test-wait)
+       (execute-kbd-macro \"+two-$mode\")
+       (notmuch-tag-undo)
+       (notmuch-test-wait)
+@@ -159,7 +157,8 @@ for mode in search show tree unthreaded; do
+       (notmuch-$mode \"$os_x_darwin_thread\")
+       (notmuch-test-wait)
+       (execute-kbd-macro \"+tag-to-be-undone-$mode\")
+-      (execute-kbd-macro (kbd \"C-x u\")))"
++      (execute-kbd-macro (kbd \"C-x u\"))
++      (notmuch-test-wait))"
+     count=$(notmuch count "tag:tag-to-be-undone-$mode")
+     test_expect_equal "$count" "0"
+ done
+-- 
+2.30.2
+
+
+--=-=-=--
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 2/3] test: add known broken test for insert with mbox as input
  2022-02-12  2:55 WIP: filter out envelope headers in notmuch-insert David Bremner
  2022-02-12  2:55 ` [RFC PATCH 1/3] test: start new corpus of test messages for indexing code David Bremner
@ 2022-02-12  2:55 ` David Bremner
  2022-02-12  2:55 ` [RFC PATCH 3/3] WIP: escape envelope from David Bremner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: David Bremner @ 2022-02-12  2:55 UTC (permalink / raw)
  To: notmuch

It seems reasonable that notmuch should try to avoid delivering
messages in formats it cannot index.
---
 test/T070-insert.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/test/T070-insert.sh b/test/T070-insert.sh
index ec170b30..a297fa73 100755
--- a/test/T070-insert.sh
+++ b/test/T070-insert.sh
@@ -292,4 +292,10 @@ for code in OUT_OF_MEMORY XAPIAN_EXCEPTION ; do
     test_expect_code 0 "notmuch_with_shim shim-$code insert --keep < \"$gen_msg_filename\""
 done
 
+test_begin_subtest "insert converts mboxes on delivery"
+test_subtest_known_broken
+notmuch insert +unmboxed < "${TEST_DIRECTORY}"/corpora/indexing/mbox-attachment.eml
+output=$(notmuch count tag:unmboxed)
+test_expect_equal "${output}" 1
+
 test_done
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 3/3] WIP: escape envelope from
  2022-02-12  2:55 WIP: filter out envelope headers in notmuch-insert David Bremner
  2022-02-12  2:55 ` [RFC PATCH 1/3] test: start new corpus of test messages for indexing code David Bremner
  2022-02-12  2:55 ` [RFC PATCH 2/3] test: add known broken test for insert with mbox as input David Bremner
@ 2022-02-12  2:55 ` David Bremner
  2022-02-12 11:34 ` WIP: filter out envelope headers in notmuch-insert David Bremner
       [not found] ` <164481580222.17471.7090984749734305531@noble.neil.brown.name>
  4 siblings, 0 replies; 11+ messages in thread
From: David Bremner @ 2022-02-12  2:55 UTC (permalink / raw)
  To: notmuch

This needs duplicate code elimination.
---
 notmuch-insert.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/notmuch-insert.c b/notmuch-insert.c
index 214d4d03..dfeb6322 100644
--- a/notmuch-insert.c
+++ b/notmuch-insert.c
@@ -249,6 +249,8 @@ static bool
 copy_fd (int fdout, int fdin)
 {
     bool empty = true;
+    bool first = true;
+    const char *header="X-Envelope-From: ";
 
     while (! interrupted) {
 	ssize_t remain;
@@ -267,6 +269,23 @@ copy_fd (int fdout, int fdin)
 	}
 
 	p = buf;
+
+	if (first && remain >= 5 && 0 == strncmp (buf, "From ", 5)) {
+	    ssize_t written = write (fdout, header, strlen(header));
+	    if (written < 0 && errno == EINTR)
+		continue;
+	    if (written <= 0) {
+		fprintf (stderr, "Error: writing to temporary file: %s",
+			 strerror (errno));
+		return false;
+	    }
+
+	    p += 5;
+	    remain -= 5;
+	    empty = false;
+	}
+
+	first=false;
 	do {
 	    ssize_t written = write (fdout, p, remain);
 	    if (written < 0 && errno == EINTR)
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: WIP: filter out envelope headers in notmuch-insert.
  2022-02-12  2:55 WIP: filter out envelope headers in notmuch-insert David Bremner
                   ` (2 preceding siblings ...)
  2022-02-12  2:55 ` [RFC PATCH 3/3] WIP: escape envelope from David Bremner
@ 2022-02-12 11:34 ` David Bremner
  2022-02-13 14:06   ` David Bremner
       [not found] ` <164481580222.17471.7090984749734305531@noble.neil.brown.name>
  4 siblings, 1 reply; 11+ messages in thread
From: David Bremner @ 2022-02-12 11:34 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> I thought about a more ambitious version that would replace any
> existing "Return-Path" headers, but it seems like significantly more
> work (the current code is not line based), and not obviously
> better. Or maybe I missed the wording in the RFCs that talks about how
> MDAs should behave here.

The RFC wording I _did_ find is as follows.

RFC5598 says that (§4.3.3)

      The MDA records the RFC5321.MailFrom address into the
      RFC5321.Return-Path field.

where the first is the (address in) envelope from and the second is the
Return-Path: header. I'm not sure how authoritative RFC5598 really is;
it is "Category: Informational", whatever that means.

On the other hand RFC5321 §4.4 says

   When the delivery SMTP server makes the "final delivery" of a
   message, it inserts a return-path line at the beginning of the mail
   data.  This use of return-path is required; mail systems MUST support
   it.  The return-path line preserves the information in the <reverse-
   path> from the MAIL command.  Here, final delivery means the message
   has left the SMTP environment.  Normally, this would mean it had been
   delivered to the destination user or an associated mail drop, but in
   some cases it may be further processed and transmitted by another
   mail system.

At least postfix seems to follow the latter, but also includes the
envelope header when passing a message to a program via .forward.

This seems to imply that if we can trust the MTA to DTRT, the
Return-Path should already be present. OTOH, that is obviously not the
most robust assumption possible.
\r

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WIP: filter out envelope headers in notmuch-insert.
  2022-02-12 11:34 ` WIP: filter out envelope headers in notmuch-insert David Bremner
@ 2022-02-13 14:06   ` David Bremner
  0 siblings, 0 replies; 11+ messages in thread
From: David Bremner @ 2022-02-13 14:06 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> David Bremner <david@tethera.net> writes:
>
>> I thought about a more ambitious version that would replace any
>> existing "Return-Path" headers, but it seems like significantly more
>> work (the current code is not line based), and not obviously
>> better. Or maybe I missed the wording in the RFCs that talks about how
>> MDAs should behave here.
>
> The RFC wording I _did_ find is as follows.
>
> RFC5598 says that (§4.3.3)
>
>       The MDA records the RFC5321.MailFrom address into the
>       RFC5321.Return-Path field.
>
> where the first is the (address in) envelope from and the second is the
> Return-Path: header. I'm not sure how authoritative RFC5598 really is;
> it is "Category: Informational", whatever that means.

Starting at RFCs some more (as one does), I think that the "envelope
header" added by e.g. postfix's local(8) does not count as a
RFC5321.MailFrom (although it should contain the same
information). These mbox style envelope headers do not seem to super
well defined. RFC 4155, which is _about_ mbox, seems to go out of its
way not to define the syntax.

The following (from RFC5598) seems to indicate that
MDAs speak SMTP.

   Transfer into the MDA is accomplished by a normal MTA transfer
   mechanism.

If correct, that means that notmuch-insert is not an MDA in the sense
meant by RFC5598

As an experiment I converted copy_fd to line oriented form (which I
guess I'll post seperately), but I realized that further progress would
require parsing the "envelope header" line.\r

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WIP: filter out envelope headers in notmuch-insert.
       [not found] ` <164481580222.17471.7090984749734305531@noble.neil.brown.name>
@ 2022-02-14 11:16   ` David Bremner
  2022-02-14 22:23     ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: David Bremner @ 2022-02-14 11:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: notmuch

"NeilBrown" <neilb@suse.de> writes:

> On Sat, 12 Feb 2022, David Bremner wrote:
>> This is a bit rough and ready, but before I fine tune it, I want to
>> make sure the overall idea of stripping envelope headers in
>> notmuch-insert makes sense.
>
> I think that it would be highly inappropriate to modify the message
> passed to notmuch-insert AT ALL.
> It *might* be appropriate to check that it looks generally like an email
> message (e.g.  headers followed by a blank line), but changing it would
> be wrong.
>

Here we're just changing "From  foo" into "X-Envelope-From: foo",
so no information is lost, unless you consider being an mbox file
information.

> Why do you think it might be helpful?

Currently we throw an error from notmuch-insert when trying to index
those mbox files (as passed to notmuch-insert by postfix). This means
the message is either bounced or (with --keep), invisible in the notmuch
index.

d

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WIP: filter out envelope headers in notmuch-insert.
  2022-02-14 11:16   ` David Bremner
@ 2022-02-14 22:23     ` NeilBrown
  2022-02-14 23:52       ` David Bremner
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2022-02-14 22:23 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Mon, 14 Feb 2022, David Bremner wrote:
> "NeilBrown" <neilb@suse.de> writes:
> 
> > On Sat, 12 Feb 2022, David Bremner wrote:
> >> This is a bit rough and ready, but before I fine tune it, I want to
> >> make sure the overall idea of stripping envelope headers in
> >> notmuch-insert makes sense.
> >
> > I think that it would be highly inappropriate to modify the message
> > passed to notmuch-insert AT ALL.
> > It *might* be appropriate to check that it looks generally like an email
> > message (e.g.  headers followed by a blank line), but changing it would
> > be wrong.
> >
> 
> Here we're just changing "From  foo" into "X-Envelope-From: foo",
> so no information is lost, unless you consider being an mbox file
> information.

That one small change is probably defensible - though I would probably
prefer that such messages were either completely rejected or completely
supported.

Your broader discussion in the sequence of emails seemed to suggest you
were considering a lot more than that - sorry if I misunderstood.

> 
> > Why do you think it might be helpful?
> 
> Currently we throw an error from notmuch-insert when trying to index
> those mbox files (as passed to notmuch-insert by postfix). This means
> the message is either bounced or (with --keep), invisible in the notmuch
> index.

If notmuch-insert is given an 'mbox', shouldn't it either reject it, or
split it up into individual messages and insert each one individually
(discarding the "From" line because it shouldn't add any new
information)?

I don't suppose there is some config option to postfix to tell it to
provide an RFC-2822 email message, not an mbox ???

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WIP: filter out envelope headers in notmuch-insert.
  2022-02-14 22:23     ` NeilBrown
@ 2022-02-14 23:52       ` David Bremner
  2022-02-15  1:13         ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: David Bremner @ 2022-02-14 23:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: notmuch

"NeilBrown" <neilb@suse.de> writes:

> On Mon, 14 Feb 2022, David Bremner wrote:
>
> If notmuch-insert is given an 'mbox', shouldn't it either reject it, or
> split it up into individual messages and insert each one individually
> (discarding the "From" line because it shouldn't add any new
> information)?

I guess you could think of what the patch does as a conservative version
of that.  I'm not aware of any case where a "real" mbox with multiple
messages is passed to notmuch-insert. So the modification effectively
discards the From line, although keeping the contents, because who knows
what people rely on.

> I don't suppose there is some config option to postfix to tell it to
> provide an RFC-2822 email message, not an mbox ???

It seems not, at least for the case of invoking notmuch-insert via a
.forward file, which I would guess is the most common case (nothing
specific to postfix, it's just the obvious way to invoke
notmuch-insert).

d

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WIP: filter out envelope headers in notmuch-insert.
  2022-02-14 23:52       ` David Bremner
@ 2022-02-15  1:13         ` NeilBrown
  2022-02-15  2:15           ` David Bremner
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2022-02-15  1:13 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Tue, 15 Feb 2022, David Bremner wrote:
> "NeilBrown" <neilb@suse.de> writes:
> 
> > On Mon, 14 Feb 2022, David Bremner wrote:
> >
> > If notmuch-insert is given an 'mbox', shouldn't it either reject it, or
> > split it up into individual messages and insert each one individually
> > (discarding the "From" line because it shouldn't add any new
> > information)?
> 
> I guess you could think of what the patch does as a conservative version
> of that.  I'm not aware of any case where a "real" mbox with multiple
> messages is passed to notmuch-insert. So the modification effectively
> discards the From line, although keeping the contents, because who knows
> what people rely on.

And on digging into postfix a bit, it seem that when forwarding to a
program, it *doesn't* quote other "From " lines in the email, so
splitting up the input would break things.  I wonder if that is a bug...
the documentation is clear on the details of delivering to a program

> 
> > I don't suppose there is some config option to postfix to tell it to
> > provide an RFC-2822 email message, not an mbox ???
> 
> It seems not, at least for the case of invoking notmuch-insert via a
> .forward file, which I would guess is the most common case (nothing
> specific to postfix, it's just the obvious way to invoke
> notmuch-insert).

So it seems... In the case of postfix the sender information is provided
both in "From " and "Return-Path:", so just stripping "From " would be
safe.  It is hard to argue against the extra caution of retaining the
from line in an "X-Envolope-from" though.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WIP: filter out envelope headers in notmuch-insert.
  2022-02-15  1:13         ` NeilBrown
@ 2022-02-15  2:15           ` David Bremner
  0 siblings, 0 replies; 11+ messages in thread
From: David Bremner @ 2022-02-15  2:15 UTC (permalink / raw)
  To: NeilBrown; +Cc: notmuch

NeilBrown <neil@brown.name> writes:

> And on digging into postfix a bit, it seem that when forwarding to a
> program, it *doesn't* quote other "From " lines in the email, so
> splitting up the input would break things.  I wonder if that is a bug...
> the documentation is clear on the details of delivering to a program

It's a least partly "mbox is trainwreck, and they probably don't really
want to do mbox". Imagine a "From " line inside a signed part. But I
really don't know the intent of including a "From " line.

> So it seems... In the case of postfix the sender information is provided
> both in "From " and "Return-Path:", so just stripping "From " would be
> safe.  It is hard to argue against the extra caution of retaining the
> from line in an "X-Envolope-from" though.
>
> Thanks,
> NeilBrown

Thanks for thinking about these design issues with me.

d

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-02-15 12:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-02-12  2:55 WIP: filter out envelope headers in notmuch-insert David Bremner
2022-02-12  2:55 ` [RFC PATCH 1/3] test: start new corpus of test messages for indexing code David Bremner
2022-02-12  2:55 ` [RFC PATCH 2/3] test: add known broken test for insert with mbox as input David Bremner
2022-02-12  2:55 ` [RFC PATCH 3/3] WIP: escape envelope from David Bremner
2022-02-12 11:34 ` WIP: filter out envelope headers in notmuch-insert David Bremner
2022-02-13 14:06   ` David Bremner
     [not found] ` <164481580222.17471.7090984749734305531@noble.neil.brown.name>
2022-02-14 11:16   ` David Bremner
2022-02-14 22:23     ` NeilBrown
2022-02-14 23:52       ` David Bremner
2022-02-15  1:13         ` NeilBrown
2022-02-15  2:15           ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).