* notmuch ignoring alot of emails
@ 2019-03-23 6:45 Alexei Gilchrist
2019-03-30 11:29 ` David Bremner
2019-06-28 17:16 ` Alvaro Herrera
0 siblings, 2 replies; 19+ messages in thread
From: Alexei Gilchrist @ 2019-03-23 6:45 UTC (permalink / raw)
To: notmuch
Hi
When I run notmuch I get a bunch (hundreds) of emails that are ignored
with:
Note: Ignoring non-mail file: ...
The files are valid maildir files but have a paragraph somewhere in the
body where someone has written "From ".
Is there a fix to force the recognition of maildir files in this case? I
thought this was a solved problem with gmime since 2.6.7.
Sorry for the pun in the subject but I am using alot and I only see the
messages notmuch sees, neomutt has no issues seeing these messages but I
want a tighter integration with notmuch.
I'm on a mac and compiled notmuch-0.28.3; installed gmime 3.2.3 with
brew, and verified notmuch was linking against it:
≻ otool -L /usr/local/bin/notmuch
/usr/local/bin/notmuch:
    /usr/local/lib/libnotmuch.5.dylib (compatibility version 5.2.0, current version 5.2.0)
    /usr/local/opt/gmime/lib/libgmime-3.0.0.dylib (compatibility version 202.0.0, current version 202.2.0)
    /usr/local/opt/glib/lib/libgio-2.0.0.dylib (compatibility version 6001.0.0, current version 6001.0.0)
    /usr/local/opt/glib/lib/libgobject-2.0.0.dylib (compatibility version 6001.0.0, current version 6001.0.0)
    /usr/local/opt/glib/lib/libglib-2.0.0.dylib (compatibility version 6001.0.0, current version 6001.0.0)
    /usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 10.0.0, current version 10.5.0)
    /usr/local/opt/talloc/lib/libtalloc.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /usr/local/opt/xapian/lib/libxapian.30.dylib (compatibility version 36.0.0, current version 36.1.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.4)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.200.5)
≻ ls -l /usr/local/opt/gmime
lrwxr-xr-x 1 alexei admin 21 23 Mar 12:09 /usr/local/opt/gmime -> ../Cellar/gmime/3.2.3
≻ ls -l /usr/local/Cellar/gmime/3.2.3/lib/
total 2280
drwxr-xr-x 3 alexei staff 96 27 Nov 11:09 girepository-1.0
-rw-r--r-- 1 alexei staff 444500 23 Mar 12:09 libgmime-3.0.0.dylib
-r--r--r-- 1 alexei staff 720504 27 Nov 11:09 libgmime-3.0.a
lrwxr-xr-x 1 alexei staff 20 27 Nov 11:09 libgmime-3.0.dylib -> libgmime-3.0.0.dylib
drwxr-xr-x 3 alexei staff 96 23 Mar 12:09 pkgconfig
Any ideas for a fix?
cheers,
Alexei
* Re: notmuch ignoring alot of emails
2019-03-23 6:45 notmuch ignoring alot of emails Alexei Gilchrist
@ 2019-03-30 11:29 ` David Bremner
2019-03-30 23:53 ` Alexei Gilchrist
2019-06-28 17:16 ` Alvaro Herrera
1 sibling, 1 reply; 19+ messages in thread
From: David Bremner @ 2019-03-30 11:29 UTC (permalink / raw)
To: Alexei Gilchrist, notmuch
"Alexei Gilchrist" <te100@runbox.com> writes:
> Hi
>
> When I run notmuch I get a bunch (hundreds) of emails that are ignored
> with:
>
> Note: Ignoring non-mail file: ...
>
> The files are valid maildir files but have a paragraph somewhere in the
> body where someone has written "From ".
>
And do they also have a line starting with "From " as the first
line? That makes them mbox files. A second "From " line makes them mbox
files with multiple messages. Notmuch thinks your MDA (the thing that
made those files) is misconfigured, assuming my guess about the format
is correct.
> Is there a fix to force the recognition of maildir files in this case? I
> thought this was a solved problem with gmime since 2.6.7.
There is currently no way to do that. It's not a GMime problem; it's
a design choice of notmuch to avoid parsing multi-message
mboxes. That was originally added as a safety feature, and I think it
should probably stay the default. If someone wants to work on adding a
configuration switch I can point them in the right direction.
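For readers following along, the rule described in this message can be sketched in a few lines of Python. This is only an illustration of the rule as stated here (a first line starting with "From " means mbox, and any further "From " line is taken as a second message); it is not notmuch's actual code, and the function name and return values are made up for the example.

```python
def classify_mail_file(data: bytes) -> str:
    """Classify raw file content using the strict rule described above.

    Hypothetical helper, for illustration only:
      'message'     - first line does not start with "From "
      'mbox-single' - first line starts with "From ", no later "From " line
      'mbox-multi'  - first line starts with "From " and so does a later line
    """
    lines = data.splitlines()
    if not lines or not lines[0].startswith(b"From "):
        return "message"
    # Naive scan: any later line starting with "From " counts as a separator,
    # even when it is ordinary body text.
    extra = sum(1 for line in lines[1:] if line.startswith(b"From "))
    return "mbox-multi" if extra else "mbox-single"
```

Under this rule a maildir file that happens to carry a leading "From " line, and whose body contains a sentence starting with "From ", is indistinguishable from a two-message mbox, which matches the symptom reported at the top of the thread.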
* Re: notmuch ignoring alot of emails
2019-03-30 11:29 ` David Bremner
@ 2019-03-30 23:53 ` Alexei Gilchrist
2019-03-31 4:06 ` David Bremner
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Alexei Gilchrist @ 2019-03-30 23:53 UTC (permalink / raw)
To: David Bremner; +Cc: notmuch
>> When I run notmuch I get a bunch (hundreds) of emails that are
>> ignored
>> with:
>>
>> Note: Ignoring non-mail file: ...
>>
>> The files are valid maildir files but have a paragraph somewhere in
>> the
>> body where someone has written "From ".
>>
>
> And do they also have a line starting with "From " as the first
> line? This makes them mbox files. The second "From " makes them mbox
> files with multiple messages. Notmuch thinks your MDA (the thing that
> made those files) is misconfigured, assuming my guess about the format
> is correct.
Every message file begins with “From ”. This is true of all messages
downloaded by both offlineimap (with type = Maildir) and mbsync.
neomutt has no issues dealing with these files as maildir, and mu has no
issues indexing them either. I’m assuming that starting with “From ”
is part of the maildir spec.
The problem occurs specifically with notmuch. If someone sends a message
with a line that begins with “From ” in the *body*, then it confuses
notmuch.
mu can correctly index these messages, but my mu is linked against
libgmime-2.6, while my notmuch (0.28.3) is linked against libgmime-3.0.
>> Is there a fix to force the recognition of maildir files in this
>> case? I
>> thought this was a solved problem with gmime since 2.6.7.
>
> There is not currently a way to do that. It's not a GMime problem,
> it's
> a design choice of notmuch to avoid parsing multiple message
> mbox's. That was originally added as a safety feature, and I think it
> should probably stay the default. If someone wants to work on adding a
> configuration switch I can point them in the right direction.
This is a poor design decision. It means anyone on the internet can
break your mail setup simply by sending a message with a line starting
with “From ” (and using the usual quoted-printable
Content-Transfer-Encoding).
Try it. Send yourself a message with the line “From bad parsing comes
chaos” and see if your notmuch can find it. My version can’t.
* Re: notmuch ignoring alot of emails
2019-03-30 23:53 ` Alexei Gilchrist
@ 2019-03-31 4:06 ` David Bremner
2019-03-31 8:52 ` Tomi Ollila
2019-03-31 11:00 ` Tomas Nordin
2 siblings, 0 replies; 19+ messages in thread
From: David Bremner @ 2019-03-31 4:06 UTC (permalink / raw)
To: Alexei Gilchrist; +Cc: notmuch
"Alexei Gilchrist" <te100@runbox.com> writes:
>
> Try it. Send yourself a message with the line “From bad parsing comes
> chaos” and see if your notmuch can find it. My version can’t.
It's not that simple. My MDA is configured not to add the initial mbox
"From " line to files in maildirs.
d
* Re: notmuch ignoring alot of emails
2019-03-30 23:53 ` Alexei Gilchrist
2019-03-31 4:06 ` David Bremner
@ 2019-03-31 8:52 ` Tomi Ollila
2019-03-31 11:00 ` Tomas Nordin
2 siblings, 0 replies; 19+ messages in thread
From: Tomi Ollila @ 2019-03-31 8:52 UTC (permalink / raw)
To: Alexei Gilchrist, David Bremner; +Cc: notmuch
On Sun, Mar 31 2019, Alexei Gilchrist wrote:
>>> When I run notmuch I get a bunch (hundreds) of emails that are
>>> ignored
>>> with:
>>>
>>> Note: Ignoring non-mail file: ...
>>>
>>> The files are valid maildir files but have a paragraph somewhere in
>>> the
>>> body where someone has written "From ".
>>>
>>
>> And do they also have a line starting with "From " as the first
>> line? This makes them mbox files. The second "From " makes them mbox
>> files with multiple messages. Notmuch thinks your MDA (the thing that
>> made those files) is misconfigured, assuming my guess about the format
>> is correct.
>
> Every message file begins with “From “. This is true of all messages
> downloaded by both offlineimap (with type = Maildir) and mbsync.
> neomutt has no issues dealing with these files as maildir and mu has no
> issues indexing them either. I’m assuming that starting with “From
> “ is part of the maildir spec.
>
> The problem occurs specifically with notmuch. If someone sends a message
> with a line that begins with “From “ in the *body* then it confuses
> notmuch.
>
> mu can correctly index these messages but my mu is linked against
> libgmime-2.6, my notmuch (0.28.3) is linked against libgmime-3.0.
>
>
>>> Is there a fix to force the recognition of maildir files in this
>>> case? I
>>> thought this was a solved problem with gmime since 2.6.7.
>>
>> There is not currently a way to do that. It's not a GMime problem,
>> it's
>> a design choice of notmuch to avoid parsing multiple message
>> mbox's. That was originally added as a safety feature, and I think it
>> should probably stay the default. If someone wants to work on adding a
>> configuration switch I can point them in the right direction.
>
> This is a poor design decision. It means anyone on the internet can
> break your mail setup simply by sending a message with a line starting
> with “From “.
> (and using usual quoted-printable Content-Transfer-Encoding).
There are a few things to remember about notmuch development:
- notmuch is more a product of evolution than of intelligent design. It is
hard to do any long-planned design when writing email software...
- we all welcome people doing a SMOP with notmuch, and tolerate patches with
good commit messages and elegant content.
- it may take some time to get changes reviewed...
In this particular case it would be nice if someone(tm) investigated how
mu and neomutt handle these emails -- and how badly (if at all) they break
if they are given a large mbox file... wasn't it so that both of them can
read mbox files
(which notmuch can't (but one can always use mboxvievfs! >;)))?
> Try it. Send yourself a message with the line “From bad parsing comes
> chaos” and see if your notmuch can find it. My version can’t.
My MDA (md5mda.sh) does not add 'From ' at the beginning of the first
line of my delivered emails (i.e. in this respect it works the same way as
David's MDA).
Tomi
* Re: notmuch ignoring alot of emails
2019-03-30 23:53 ` Alexei Gilchrist
2019-03-31 4:06 ` David Bremner
2019-03-31 8:52 ` Tomi Ollila
@ 2019-03-31 11:00 ` Tomas Nordin
2019-03-31 22:02 ` Alexei Gilchrist
2 siblings, 1 reply; 19+ messages in thread
From: Tomas Nordin @ 2019-03-31 11:00 UTC (permalink / raw)
To: Alexei Gilchrist, David Bremner; +Cc: notmuch
Alexei Gilchrist <te100@runbox.com> writes:
> Every message file begins with “From “. This is true of all messages
> downloaded by both offlineimap (with type = Maildir) and mbsync.
> neomutt has no issues dealing with these files as maildir and mu has no
> issues indexing them either. I’m assuming that starting with “From
> “ is part of the maildir spec.
FWIW, I use Offlineimap, and the files retrieved with it here do not begin
with "From". I see things like "Received: from..." or "Return-Path:..."
as the beginning of the first line.
> Try it. Send yourself a message with the line “From bad parsing comes
> chaos” and see if your notmuch can find it. My version can’t.
I tried that, and notmuch finds the messages as expected: both the
message I sent and this thread.
Best regards
--
Tomas
* Re: notmuch ignoring alot of emails
2019-03-31 11:00 ` Tomas Nordin
@ 2019-03-31 22:02 ` Alexei Gilchrist
2019-03-31 23:27 ` David Bremner
0 siblings, 1 reply; 19+ messages in thread
From: Alexei Gilchrist @ 2019-03-31 22:02 UTC (permalink / raw)
To: Tomas Nordin; +Cc: David Bremner, notmuch
That’s interesting. Do you know of a link to the spec for maildir
file content? All I can find is information about the directory
structure and file naming, not the file content.
mbsync, which specialises in maildir, also gave me an initial “From ” line,
and the two are independently configured. I’ll try out a couple
of different mail hosts to see if that’s the cause.
I can imagine that mutt just assumes they are maildir files once
configured that way, but mu also assumes the files are maildir and also
uses gmime to parse them. However, the current version on Homebrew (Mac) is
linked to a version of gmime which, I believe, was fixed to accommodate
multiple “From ” lines, though I haven’t dug through the source
yet.
Cheers,
Alexei
On 31 Mar 2019, at 22:00, Tomas Nordin wrote:
> Alexei Gilchrist <te100@runbox.com> writes:
>
>> Every message file begins with “From “. This is true of all
>> messages
>> downloaded by both offlineimap (with type = Maildir) and mbsync.
>> neomutt has no issues dealing with these files as maildir and mu has
>> no
>> issues indexing them either. I’m assuming that starting with “From
>> “ is part of the maildir spec.
>
> FWIW, I use Offlineimap and files retrieved with it here do not
> begin
> with "From". I see things like "Received: from..." or
> "Return-Path:..."
> as the beginning of the first line.
>
>> Try it. Send yourself a message with the line “From bad parsing
>> comes
>> chaos” and see if your notmuch can find it. My version can’t.
>
> I tried that and find messages as expected. I mean, the message I sent
> and this thread.
>
> Best regards
> --
> Tomas
* Re: notmuch ignoring alot of emails
2019-03-31 22:02 ` Alexei Gilchrist
@ 2019-03-31 23:27 ` David Bremner
0 siblings, 0 replies; 19+ messages in thread
From: David Bremner @ 2019-03-31 23:27 UTC (permalink / raw)
To: Alexei Gilchrist, Tomas Nordin; +Cc: notmuch
"Alexei Gilchrist" <te100@runbox.com> writes:
> That’s interesting. Do you know a link to the file spec for maildir
> file content? All I can find is information about the directory
> structure and file naming, not the file content.
As far as I know, this is specified by RFC 5322.
> mbsync which specialises in maildir also had an initial “From “ line
> for me, and they are independently configured. I’ll try out a couple
> of different mail hosts to see if it’s that.
Yes, it could well be determined by how the messages are delivered on the
server.
> I can imagine that mutt just assumes they are maildir files once
> configured that way, but mu also assumes the files are maildir and also
> uses gmime to parse. However the current version on home-brew (Mac) is
> linked to a version of gmime which was fixed to accommodate multiple
> “From “ lines I believe, though I haven’t dug through the source
> yet.
As I mentioned above, it's not really related to the version of GMime;
it's about how GMime is called, and whether the client wishes to parse
mbox files containing more than one message, or to ignore the "From "
line at the beginning.
* Re: notmuch ignoring alot of emails
2019-03-23 6:45 notmuch ignoring alot of emails Alexei Gilchrist
2019-03-30 11:29 ` David Bremner
@ 2019-06-28 17:16 ` Alvaro Herrera
2019-06-28 20:11 ` Alvaro Herrera
1 sibling, 1 reply; 19+ messages in thread
From: Alvaro Herrera @ 2019-06-28 17:16 UTC (permalink / raw)
To: Alexei Gilchrist; +Cc: notmuch
On 2019-Mar-23, Alexei Gilchrist wrote:
> When I run notmuch I get a bunch (hundreds) of emails that are ignored with:
>
> Note: Ignoring non-mail file: ...
>
> The files are valid maildir files but have a paragraph somewhere in the body
> where someone has written "From ".
Yeah, that happens too when you attach patches generated with git
format-patch as plain text; this is extremely common in the
pgsql-hackers@lists.postgresql.org mailing list (you can download an
mbox from there for any month, convert it to a maildir, and give the
resulting maildir to notmuch -- you'll likely find a few dozen emails
that fail parsing). This is a very annoying problem for me, see
201901181607.4rba4c5uyimv@alvherre.pgsql in this list earlier this year.
I worked around it by patching _notmuch_message_file_parse in
lib/message-file.c to set is_mbox = false unconditionally; but that's
not a real solution (hence I didn't post it as a patch here), and it
explodes real good if you have an actual mbox in the directory where the
mail is (since after the hack notmuch won't skip it anymore).
I think a real solution is to parse the message header, look for the
Content-Length, and determine mbox-ness by looking for "From" only past
that many bytes; that seems to match what other mail parsing tools do.
However, I haven't gotten around to doing that.
--
Álvaro Herrera 39°49'30"S 73°17'W
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)
* Re: notmuch ignoring alot of emails
2019-06-28 17:16 ` Alvaro Herrera
@ 2019-06-28 20:11 ` Alvaro Herrera
2019-06-29 19:03 ` David Bremner
2019-06-30 17:29 ` Tomi Ollila
0 siblings, 2 replies; 19+ messages in thread
From: Alvaro Herrera @ 2019-06-28 20:11 UTC (permalink / raw)
To: Alexei Gilchrist; +Cc: notmuch
On 2019-Jun-28, Alvaro Herrera wrote:
> I think a real solution is to parse the message header, look for the
> Content-Length, and determine mbox-ness by looking for "From" only past
> that many bytes; that seems to match what other mail parsing tools do.
Sorry, I misspoke: there's no such thing as Content-Length.
It's Content-Type/boundary that needs to be watched for. Only consider
that the file is an mbox if a "^From " line appears after the boundary
end marker (which seems to be defined as "the boundary string followed
by two dashes --").
Here's a sample message, BTW:
https://www.postgresql.org/message-id/raw/3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
(username "archives", password "antispam").
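A rough Python sketch of the boundary check proposed in this message, for illustration only. It reads the top-level boundary from the Content-Type header using the standard library email parser, then reports a "From " line only if it occurs after the closing delimiter ("--" + boundary + "--"). The function name is hypothetical, this is not notmuch or GMime code, and a non-multipart message simply yields False because the check has nothing to go on.

```python
from email import policy
from email.parser import BytesParser


def from_line_after_closing_boundary(data: bytes) -> bool:
    """True if a line starting with "From " follows the top-level closing
    MIME delimiter. Hypothetical helper sketching the idea above."""
    # Parse only the headers; the body is scanned by hand below.
    headers = BytesParser(policy=policy.default).parsebytes(data, headersonly=True)
    boundary = headers.get_boundary()
    if boundary is None:
        return False  # not multipart: this check cannot decide anything
    closing = b"--" + boundary.encode("ascii", "replace") + b"--"
    seen_closing = False
    for line in data.splitlines():
        if not seen_closing:
            # "From " lines before the closing delimiter belong to the message.
            if line.rstrip() == closing:
                seen_closing = True
        elif line.startswith(b"From "):
            return True
    return False
```

A "From " line inside a MIME part (a quoted sentence, or a git format-patch attachment) is ignored, while a genuine second message appended after the closing delimiter is still detected.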
--
Álvaro Herrera Valdivia, Chile
* Re: notmuch ignoring alot of emails
2019-06-28 20:11 ` Alvaro Herrera
@ 2019-06-29 19:03 ` David Bremner
2019-06-29 19:09 ` David Bremner
2019-06-30 17:29 ` Tomi Ollila
1 sibling, 1 reply; 19+ messages in thread
From: David Bremner @ 2019-06-29 19:03 UTC (permalink / raw)
To: Alvaro Herrera, Alexei Gilchrist; +Cc: notmuch
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2019-Jun-28, Alvaro Herrera wrote:
>
>> I think a real solution is to parse the message header, look for the
>> Content-Length, and determine mbox-ness by looking for "From" only past
>> that many bytes; that seems to match what other mail parsing tools do.
>
> Sorry, I misspoke: there's no such thing as Content-Length.
> It's Content-Type/boundary that needs to be watched for. Only consider
> that the file is an mbox if a "^From " line appears after the boundary
> end marker (which seems to be defined as "the boundary string followed
> by two dashes --").
>
> Here's a sample message, BTW:
> https://www.postgresql.org/message-id/raw/3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
> (username "archives", password "antispam").
I'm not keen on writing (more) ad hoc MIME parsing code, so if you can
phrase this in terms of GMime API (or at least MIME parts) it would be
great.
d
* Re: notmuch ignoring alot of emails
2019-06-29 19:03 ` David Bremner
@ 2019-06-29 19:09 ` David Bremner
2019-07-01 15:26 ` Alvaro Herrera
0 siblings, 1 reply; 19+ messages in thread
From: David Bremner @ 2019-06-29 19:09 UTC (permalink / raw)
To: Alvaro Herrera, Alexei Gilchrist; +Cc: notmuch
David Bremner <david@tethera.net> writes:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>
>> On 2019-Jun-28, Alvaro Herrera wrote:
>>
>>> I think a real solution is to parse the message header, look for the
>>> Content-Length, and determine mbox-ness by looking for "From" only past
>>> that many bytes; that seems to match what other mail parsing tools do.
>>
>> Sorry, I misspoke: there's no such thing as Content-Length.
>> It's Content-Type/boundary that needs to be watched for. Only consider
>> that the file is an mbox if a "^From " line appears after the boundary
>> end marker (which seems to be defined as "the boundary string followed
>> by two dashes --").
>>
>> Here's a sample message, BTW:
>> https://www.postgresql.org/message-id/raw/3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
>> (username "archives", password "antispam").
>
> I'm not keen on writing (more) ad hoc MIME parsing code, so if you can
> phrase this in terms of GMime API (or at least MIME parts) it would be
> great.
>
> d
On second thought, I guess it might not be practical to use GMime to parse
the file, since that might perform badly on large mboxes.
d
* Re: notmuch ignoring alot of emails
2019-06-29 19:09 ` David Bremner
@ 2019-07-01 15:26 ` Alvaro Herrera
0 siblings, 0 replies; 19+ messages in thread
From: Alvaro Herrera @ 2019-07-01 15:26 UTC (permalink / raw)
To: David Bremner; +Cc: Alexei Gilchrist, notmuch
On 2019-Jun-29, David Bremner wrote:
> David Bremner <david@tethera.net> writes:
>
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> >> It's Content-Type/boundary that needs to be watched for. Only consider
> >> that the file is an mbox if a "^From " line appears after the boundary
> >> end marker (which seems to be defined as "the boundary string followed
> >> by two dashes --").
> > I'm not keen on writing (more) ad hoc MIME parsing code, so if you can
> > phrase this in terms of GMime API (or at least MIME parts) it would be
> > great.
Yeah, I was having a look at the GMime API last week to have a think
about how to do it with that.
> On second thought, I guess it might not be practical to use GMime to parse
> the file, since that might perform badly on large mboxes.
I think we only need to search for the first end boundary; if there's
anything beyond that, return is_mbox true. So we only need to fully
process the first email, and we can stop searching at that point.
--
Álvaro Herrera http://www.twitter.com/alvherre
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
* Re: notmuch ignoring alot of emails
2019-06-28 20:11 ` Alvaro Herrera
2019-06-29 19:03 ` David Bremner
@ 2019-06-30 17:29 ` Tomi Ollila
2019-07-01 15:36 ` Alvaro Herrera
1 sibling, 1 reply; 19+ messages in thread
From: Tomi Ollila @ 2019-06-30 17:29 UTC (permalink / raw)
Cc: notmuch
On Fri, Jun 28 2019, Alvaro Herrera wrote:
> On 2019-Jun-28, Alvaro Herrera wrote:
>
>> I think a real solution is to parse the message header, look for the
>> Content-Length, and determine mbox-ness by looking for "From" only past
>> that many bytes; that seems to match what other mail parsing tools do.
>
> Sorry, I misspoke: there's no such thing as Content-Length.
> It's Content-Type/boundary that needs to be watched for. Only consider
> that the file is an mbox if a "^From " line appears after the boundary
> end marker (which seems to be defined as "the boundary string followed
> by two dashes --").
Just checking for a line starting with 'From ' would be pretty naïve, since
"From" may be the first word of any line in the text body.
If we had to do content scanning, then at least an empty line before
the 'From ' line would be required, followed by lines starting like
Received: someone@not.an.example
Date: a date
From: someone
(and then an empty line... ;)
All this checking would be required, and still it could fail (perhaps
this content could get modified on the fly, but then the signature check, if
this mail had one, could fail...)
If there were a header that tells the length of the body, then things
could be easier...
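The stricter heuristic outlined above could look roughly like this in Python. It is a sketch under assumptions: the function name is invented, header lines are recognised with a simple "field-name:" pattern, and folded continuation lines are accepted. It is not taken from notmuch.

```python
import re

# A header field name: printable ASCII except ':' followed by a colon.
HEADER_LINE = re.compile(rb"^[!-9;-~]+:")


def looks_like_mbox_separator(lines, i):
    """Hypothetical check: does lines[i] look like a real mbox separator?

    Requires a "From " line, preceded by an empty line (or at the start of
    the file), followed by header-like lines and then an empty line.
    """
    if not lines[i].startswith(b"From "):
        return False
    if i > 0 and lines[i - 1].strip() != b"":
        return False
    j = i + 1
    saw_header = False
    while j < len(lines) and lines[j].strip() != b"":
        if HEADER_LINE.match(lines[j]):
            saw_header = True
        elif not (saw_header and lines[j][:1] in (b" ", b"\t")):
            return False  # neither a header nor a folded continuation line
        j += 1
    # Need at least one header and the terminating empty line.
    return saw_header and j < len(lines)
```

An ordinary sentence such as "From bad parsing comes chaos" is rejected, but mail-like content embedded in a body still passes the test, so the heuristic remains fallible.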
Tomi
>
> --
> Álvaro Herrera Valdivia, Chile
* Re: notmuch ignoring alot of emails
2019-06-30 17:29 ` Tomi Ollila
@ 2019-07-01 15:36 ` Alvaro Herrera
2019-11-16 17:40 ` David Bremner
0 siblings, 1 reply; 19+ messages in thread
From: Alvaro Herrera @ 2019-07-01 15:36 UTC (permalink / raw)
To: Tomi Ollila; +Cc: notmuch
On 2019-Jun-30, Tomi Ollila wrote:
> Just checking line starting with 'From ' would be pretty naïve since
> From may be first word in any line in text body.
Even so, early mail systems relied on there not being any such lines,
and they escaped those lines to be ">From" or to use quoted-printable
encoding. GMime has bespoke code to do this, in fact. Mail systems
stopped doing this escaping after MIME boundaries got more widely used,
I suppose.
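As an illustration of the ">From" escaping mentioned above, here is a minimal Python sketch. The function name is made up, and this is the simplest variant of the scheme, not GMime's implementation.

```python
def escape_from_lines(body: bytes) -> bytes:
    """Prefix '>' to every body line that starts with "From ".

    Hypothetical helper showing the traditional mbox escaping; it keeps
    the original line endings.
    """
    out = []
    for line in body.splitlines(keepends=True):
        if line.startswith(b"From "):
            line = b">" + line
        out.append(line)
    return b"".join(out)
```

With this simple variant a reader cannot tell an escaped line from one that was originally written as ">From ", so the transformation is not exactly reversible.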
I think NNTP used content length much more extensively than email. Of
course, NNTP has almost disappeared now ...
> If we'd have to do content scanning then at least empty line before
> From would be required, and next lines starting like
> Received: someone@not.an.example
> Date: a date
> From: someone
>
> (and then empty line... ;)
>
> all this checking would be required and still it could fail (perhaps
> this content get modified on the fly, but then signature check, if
> this mail had one, could fail...)
This logic still fails if you have mail-like content in the mail, such
as attachments produced by "git format-patch". Many open source lists
don't have this problem because they use "git send-email" instead, but
this is not universal.
> If there is header that tells the length of the body, then things
> could be easier...
Early emails had Content-Length as a header, but it was not universal,
and nowadays it seems to have been abandoned as a practice; the MIME
content boundary is used universally (or at least I cannot find any
recent divergence from this practice.)
--
Álvaro Herrera http://www.twitter.com/alvherre
* Re: notmuch ignoring alot of emails
2019-07-01 15:36 ` Alvaro Herrera
@ 2019-11-16 17:40 ` David Bremner
[not found] ` <87eey7szz6.fsf@eirikba.org>
0 siblings, 1 reply; 19+ messages in thread
From: David Bremner @ 2019-11-16 17:40 UTC (permalink / raw)
To: Alvaro Herrera, Tomi Ollila; +Cc: notmuch
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2019-Jun-30, Tomi Ollila wrote:
>
>> Just checking line starting with 'From ' would be pretty naïve since
>> From may be first word in any line in text body.
>
> Even so, early mail systems relied on there not being any such lines,
> and they escaped those lines to be ">From" or to use quoted-printable
> encoding. GMime has bespoke code to do this, in fact. Mail systems
> stopped doing this escaping after MIME boundaries got more widely used,
> I suppose.
As far as I know this is still correct for mbox files in general.
In general confusion arises because notmuch has a strict idea of what an
mbox is (file starts with "From "), while other software takes a more
relaxed approach.