unofficial mirror of notmuch@notmuchmail.org
* How to debug 'ignoring non-mail file' issues
@ 2014-08-31  7:40 Perttu Luukko
  2014-08-31 14:41 ` David Bremner
  2014-08-31 16:46 ` David Bremner
  0 siblings, 2 replies; 12+ messages in thread
From: Perttu Luukko @ 2014-08-31  7:40 UTC
  To: notmuch

Hi,

I indexed my archive of emails from recent years with notmuch (about 10k
messages so not much). I have quite a lot of messages 'notmuch new'
ignores as non-mail files, about 1000 of them. They are not obviously
malformed, meaning that the files certainly look like emails when opened
in a text editor. I'd like to find out why these files are ignored, and
if there is something I can do to fix them. Of course I'd like to have a
complete database of my old emails, with nothing falling through the
cracks like this.

The vast majority of these ignored mails are not ignored after I
transfer them with offlineimap to another computer. I can non-ignore
these files probably by copying the renamed file back to the mail
server, so this is fixable. Offlineimap shouldn't mess with the file's
contents, so is there something that can cause notmuch to ignore a file
based on its name?

Looking at the rest of the ignored messages most of them seem to have
very large attachments, but there are possibly others. There are only
maybe 20 of these emails, so I can try to fix them manually.
Still, it would help if I knew what exactly caused notmuch to ignore the
file. I understand most of the message parsing is done with gmime. Does
gmime give any diagnostics on parse errors that could be used to give a
reason for thinking a file is not mail?

I understand that the list of non-mail files is stored in the notmuch
database and the files are completely ignored from there on. This
actually makes it harder to debug these kinds of issues since the list of
ignored mails is only visible on the first invocation of 'notmuch new',
unless the files are moved around. Is there some way to extract the list
of ignored files from the database for inspection? Maybe 'notmuch new'
could have some kind of --unignore-non-mail switch that would reconsider
previously ignored files.

-- 
Perttu Luukko

* Re: How to debug 'ignoring non-mail file' issues
  2014-08-31  7:40 How to debug 'ignoring non-mail file' issues Perttu Luukko
@ 2014-08-31 14:41 ` David Bremner
  2014-09-01  6:52   ` Perttu Luukko
  2014-08-31 16:46 ` David Bremner
  1 sibling, 1 reply; 12+ messages in thread
From: David Bremner @ 2014-08-31 14:41 UTC
  To: Perttu Luukko, notmuch

Perttu Luukko <perttu.luukko@iki.fi> writes:


> The vast majority of these ignored mails are not ignored after I
> transfer them with offlineimap to another computer. I can non-ignore
> these files probably by copying the renamed file back to the mail
> server, so this is fixable. Offlineimap shouldn't mess with the file's
> contents, so is there something that can cause notmuch to ignore a file
> based on its name?

The most likely cause is that the files are mboxes, whether intentional
or not.  In particular if they start with a "From " (note the lack of :)
and contain a second "From " at the beginning of a line later in the
file. In this case something like sed can replace the initial 
"From " with "X-Envelope-From: ".

I agree that the error message could be more informative in this case.
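
The check and the sed rewrite described above can be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up for this example and are not part of notmuch, and a POSIX shell with `mktemp` is assumed. Only the first line of the file is rewritten.

```shell
# Hypothetical helpers; the names are illustrative, not part of notmuch.

# Succeeds if line 1 starts with "From " (an mbox separator, no colon).
starts_with_from_line() {
    sed -n '1p' "$1" | grep -q '^From '
}

# Succeeds if some later line also starts with "From ".
has_second_from_line() {
    sed '1d' "$1" | grep -q '^From '
}

# Rewrites only line 1, turning the separator into an ordinary header.
# The result is written back with cat so the file keeps its name and mode.
fix_from_line() {
    tmp=$(mktemp) || return 1
    sed '1s/^From /X-Envelope-From: /' "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Running `starts_with_from_line` over a copy of the ignored files first shows how many of them are affected before anything is modified.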

d

* Re: How to debug 'ignoring non-mail file' issues
  2014-08-31  7:40 How to debug 'ignoring non-mail file' issues Perttu Luukko
  2014-08-31 14:41 ` David Bremner
@ 2014-08-31 16:46 ` David Bremner
  2014-09-01  6:41   ` Perttu Luukko
  1 sibling, 1 reply; 12+ messages in thread
From: David Bremner @ 2014-08-31 16:46 UTC
  To: Perttu Luukko, notmuch

Perttu Luukko <perttu.luukko@iki.fi> writes:


> I understand that the list of non-mail files is stored in the notmuch
> database and the files are completely ignored from there on. This
> actually makes it harder to debug these kinds of issues since the list of
> ignored mails is only visible on the first invocation of 'notmuch new',
> unless the files are moved around. Is there some way to extract the list
> of ignored files from the database for inspection? Maybe 'notmuch new'
> could have some kind of --unignore-non-mail switch that would reconsider
> previously ignored files.

I _think_ it should suffice to do something like

   find Maildir -type d -exec touch {} \;

to force a rescan
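
Since the notes are printed again after such a rescan, the list of ignored files could be captured at that point. A minimal sketch, assuming each note ends in "non-mail file: PATH" (the exact wording may differ between notmuch versions) and using a hypothetical helper name:

```shell
# Hypothetical helper: reads 'notmuch new' output on stdin and prints only
# the paths reported as non-mail.
# Assumption: each note ends with "non-mail file: PATH".
list_ignored() {
    grep -i 'ignoring non-mail file' | sed 's/^.*non-mail file: //'
}

# Usage (not run here): force a rescan, then keep the list for inspection.
#   find Maildir -type d -exec touch {} \;
#   notmuch new 2>&1 | list_ignored > ignored-files.txt
```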

d

* Re: How to debug 'ignoring non-mail file' issues
  2014-08-31 16:46 ` David Bremner
@ 2014-09-01  6:41   ` Perttu Luukko
  2014-09-04 11:01     ` Perttu Luukko
  0 siblings, 1 reply; 12+ messages in thread
From: Perttu Luukko @ 2014-09-01  6:41 UTC
  To: David Bremner; +Cc: notmuch

On 2014-08-31 09:46:12, David Bremner wrote:
> Perttu Luukko <perttu.luukko@iki.fi> writes:
> 
> > I understand that the list of non-mail files is stored in the
> > notmuch database and the files are completely ignored from there on.
> > This actually makes it harder to debug these kinds of issues since
> > the list of ignored mails is only visible on the first invocation of
> > 'notmuch new', unless the files are moved around. Is there some way
> > to extract the list of ignored files from the database for
> > inspection? Maybe 'notmuch new' could have some kind of
> > --unignore-non-mail switch that would reconsider previously ignored
> > files.
> 
> I _think_ it should suffice to do something like
> 
>    find Maildir -type d -exec touch {} \;
> 
> to force a rescan

Yes, that indeed works. I'll probably move these ignored files to a
separate folder for inspection.

-- 
Perttu

* Re: How to debug 'ignoring non-mail file' issues
  2014-08-31 14:41 ` David Bremner
@ 2014-09-01  6:52   ` Perttu Luukko
  2014-09-01  7:15     ` Perttu Luukko
  2014-09-01  7:48     ` Tomi Ollila
  0 siblings, 2 replies; 12+ messages in thread
From: Perttu Luukko @ 2014-09-01  6:52 UTC
  To: David Bremner; +Cc: notmuch

On 2014-08-31 07:41:42, David Bremner wrote:
> Perttu Luukko <perttu.luukko@iki.fi> writes:
> > The vast majority of these ignored mails are not ignored after I
> > transfer them with offlineimap to another computer. I can non-ignore
> > these files probably by copying the renamed file back to the mail
> > server, so this is fixable. Offlineimap shouldn't mess with the file's
> > contents, so is there something that can cause notmuch to ignore a file
> > based on its name?
> 
> The most likely cause is that the files are mboxes, whether intentional
> or not.  In particular if they start with a "From " (note the lack of :)
> and contain a second "From " at the beginning of a line later in the
> file. In this case something like sed can replace the initial 
> "From " with "X-Envelope-From: ".
> 
> I agree that the error message could be more informative in this case.

No, the mails do contain "From: " with the appropriate colon. If I
understood correctly notmuch returns the same "not mail" return code
both when the essential headers are missing (so the file probably really
isn't mail) and when GMime fails to parse the message. I think it would
be a good idea to give a different warning in the latter case.

If the files really are ignored because of GMime it also explains why so
many more files are ignored on my mail provider's server than on my
laptop. The server probably has an older version of GMime. I'll upgrade
and see if that makes a difference.

-- 
Perttu

* Re: How to debug 'ignoring non-mail file' issues
  2014-09-01  6:52   ` Perttu Luukko
@ 2014-09-01  7:15     ` Perttu Luukko
  2014-09-02 20:37       ` Jani Nikula
  2014-09-01  7:48     ` Tomi Ollila
  1 sibling, 1 reply; 12+ messages in thread
From: Perttu Luukko @ 2014-09-01  7:15 UTC
  To: David Bremner; +Cc: notmuch

On 2014-09-01 09:52:20, Perttu Luukko wrote:
> If the files really are ignored because of GMime it also explains why so
> many more files are ignored on my mail provider's server than on my
> laptop. The server probably has an older version of GMime. I'll upgrade
> and see if that makes a difference.

Yes, upgrading to GMime 2.6.20 caused all the messages on my server
to be classified as mail. Even more reason to give a separate warning for
GMime parse errors. I'll see if my archive of older emails still
contains some ignored files.

-- 
Perttu

* Re: How to debug 'ignoring non-mail file' issues
  2014-09-01  6:52   ` Perttu Luukko
  2014-09-01  7:15     ` Perttu Luukko
@ 2014-09-01  7:48     ` Tomi Ollila
  1 sibling, 0 replies; 12+ messages in thread
From: Tomi Ollila @ 2014-09-01  7:48 UTC
  To: Perttu Luukko, David Bremner; +Cc: notmuch

On Mon, Sep 01 2014, Perttu Luukko <perttu.luukko@iki.fi> wrote:

> On 2014-08-31 07:41:42, David Bremner wrote:
>> Perttu Luukko <perttu.luukko@iki.fi> writes:
>> > The vast majority of these ignored mails are not ignored after I
>> > transfer them with offlineimap to another computer. I can non-ignore
>> > these files probably by copying the renamed file back to the mail
>> > server, so this is fixable. Offlineimap shouldn't mess with the file's
>> > contents, so is there something that can cause notmuch to ignore a file
>> > based on its name?
>> 
>> The most likely cause is that the files are mboxes, whether intentional
>> or not.  In particular if they start with a "From " (note the lack of :)
>> and contain a second "From " at the beginning of a line later in the
>> file. In this case something like sed can replace the initial 
>> "From " with "X-Envelope-From: ".
>> 
>> I agree that the error message could be more informative in this case.
>
> No, the mails do contain "From: " with the appropriate colon. If I
> understood correctly notmuch returns the same "not mail" return code

The question here is whether the very first line of the mail file begins
with 'From ', not whether *any* of the actual header lines starts with 'From: '.
IIRC the mails get accepted even if the 'From:' header is missing...

> both when the essential headers are missing (so the file probably really
> isn't mail) and when GMime fails to parse the message. I think it would
> be a good idea to give a different warning in the latter case.

Sure... :D

>
> If the files really are ignored because of GMime it also explains why so
> many more files are ignored on my mail provider's server than on my
> laptop. The server probably has an older version of GMime. I'll upgrade
> and see if that makes a difference.
>
> -- 
> Perttu


Tomi

* Re: How to debug 'ignoring non-mail file' issues
  2014-09-01  7:15     ` Perttu Luukko
@ 2014-09-02 20:37       ` Jani Nikula
  2014-09-03  4:54         ` Perttu Luukko
  0 siblings, 1 reply; 12+ messages in thread
From: Jani Nikula @ 2014-09-02 20:37 UTC
  To: Perttu Luukko, David Bremner; +Cc: notmuch

On Mon, 01 Sep 2014, Perttu Luukko <perttu.luukko@iki.fi> wrote:
> Yes, upgrading to GMime 2.6.20 caused all the messages on my server
> to be classified as mail.

What was the old version? If it was 2.4 we should probably consider
dropping support for that in future notmuch.

> Even more reason to give a separate warning for GMime parse errors.

Not sure. We only get a binary success/fail from GMime, and that gets
printed for all non-email files. I'm not sure it's helpful.

BR,
Jani.

* Re: How to debug 'ignoring non-mail file' issues
  2014-09-02 20:37       ` Jani Nikula
@ 2014-09-03  4:54         ` Perttu Luukko
  2014-09-03 16:03           ` Jani Nikula
  0 siblings, 1 reply; 12+ messages in thread
From: Perttu Luukko @ 2014-09-03  4:54 UTC
  To: Jani Nikula; +Cc: notmuch

On 2014-09-02 23:37:12, Jani Nikula wrote:
> On Mon, 01 Sep 2014, Perttu Luukko <perttu.luukko@iki.fi> wrote:
> > Yes, upgrading to GMime 2.6.20 caused all the messages on my server
> > to be classified as mail.
> 
> What was the old version? If it was 2.4 we should probably consider
> dropping support for that in future notmuch.

It was 2.4.33. It might still work for other people, I don't know. I
still have some ignored mails. If I can nail down why they are ignored
we might know more about why GMime 2.4 ignored even more mail. They were
from around the same time period, so it might have something to do with
the email setup I had at that time.

> > Even more reason to give a separate warning for GMime parse errors.
> 
> Not sure. We only get a binary success/fail from GMime, and that gets
> printed for all non-email files. I'm not sure it's helpful.

What I mean is that there would be a separate error for cases "Does not
resemble an email message at all", i.e., some control file your mail
server happens to store in the mailbox, and "Looks like mail but we
can't parse it", i.e., better find out why it can't be parsed to avoid
potentially important messages going missing from the database.

-- 
Perttu

* Re: How to debug 'ignoring non-mail file' issues
  2014-09-03  4:54         ` Perttu Luukko
@ 2014-09-03 16:03           ` Jani Nikula
  2014-09-04  4:34             ` Perttu Luukko
  0 siblings, 1 reply; 12+ messages in thread
From: Jani Nikula @ 2014-09-03 16:03 UTC
  To: Perttu Luukko; +Cc: notmuch

On Wed, 03 Sep 2014, Perttu Luukko <perttu.luukko@iki.fi> wrote:
> What I mean is that there would be a separate error for cases "Does not
> resemble an email message at all", i.e., some control file your mail
> server happens to store in the mailbox, and "Looks like mail but we
> can't parse it", i.e., better find out why it can't be parsed to avoid
> potentially important messages going missing from the database.

As I said, GMime does not tell us the difference between the two.

BR,
Jani.

* Re: How to debug 'ignoring non-mail file' issues
  2014-09-03 16:03           ` Jani Nikula
@ 2014-09-04  4:34             ` Perttu Luukko
  0 siblings, 0 replies; 12+ messages in thread
From: Perttu Luukko @ 2014-09-04  4:34 UTC
  To: Jani Nikula; +Cc: notmuch

On 2014-09-03 19:03:40, Jani Nikula wrote:
> On Wed, 03 Sep 2014, Perttu Luukko <perttu.luukko@iki.fi> wrote:
> > What I mean is that there would be a separate error for cases "Does not
> > resemble an email message at all", i.e., some control file your mail
> > server happens to store in the mailbox, and "Looks like mail but we
> > can't parse it", i.e., better find out why it can't be parsed to avoid
> > potentially important messages going missing from the database.
> 
> As I said, GMime does not tell us the difference between the two.

There could be a separate parsing step that reads the first kilobyte or
so and checks whether it is text, and whether there is a line starting
with "From: " and possibly other headers. This could be run if GMime
thinks the file is not mail so there would be negligible overhead.
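
A rough sketch of such a pre-check, written as a standalone shell helper rather than as notmuch code. The function name, the NUL-byte test for "is it text", and the 1024-byte window are all assumptions made for this illustration; nothing like this exists in notmuch as far as this thread shows.

```shell
# Hypothetical pre-check, not part of notmuch.
# Succeeds if the first kilobyte of FILE contains no NUL bytes and has a
# line starting with "From: " (matched case-sensitively here).
looks_like_mail() {
    chunk=$(mktemp) || return 2
    # Copy at most the first 1024 bytes of the file.
    dd if="$1" of="$chunk" bs=1024 count=1 2>/dev/null
    # Compare the byte count with and without NUL bytes.
    total=$(( $(wc -c < "$chunk") ))
    text=$(( $(tr -d '\000' < "$chunk" | wc -c) ))
    status=1
    if [ "$total" -gt 0 ] && [ "$total" -eq "$text" ] &&
       grep -q '^From: ' "$chunk"; then
        status=0
    fi
    rm -f "$chunk"
    return "$status"
}
```

A file that passes this check but is still rejected by the parser would fall into the "looks like mail but we can't parse it" category and could be reported separately.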

This is just a suggestion. Notmuch users are probably quite experienced
so they can always investigate on their own why their emails are being
ignored. But there could be more warning about ignored messages.
Something like, at the end of each 'notmuch new' output: "Note: some
files were ignored as non-mail. Check the list at
~/mail/.notmuch/ignored-files and adjust your ~/.notmuch-config".

-- 
Perttu

* Re: How to debug 'ignoring non-mail file' issues
  2014-09-01  6:41   ` Perttu Luukko
@ 2014-09-04 11:01     ` Perttu Luukko
  0 siblings, 0 replies; 12+ messages in thread
From: Perttu Luukko @ 2014-09-04 11:01 UTC
  To: notmuch

On 2014-09-01 09:41:06, Perttu Luukko wrote:
> Yes, that indeed works. I'll probably move these ignored files to a
> separate folder for inspection.

I looked at the mails that are still ignored after upgrading GMime to
the latest version, and I think I have found what they have in common. All
of my ignored emails are from 2010-2011, and for some reason these mails
contain a line like this:

>From username  Wed Sep 28 16:43:49 2011

somewhere among the headers. Note the '>' at the beginning of the line.
The mails that are still ignored after upgrading GMime are those where
this line happens to be the first line. Also, all of them have
attachments for some reason. That line certainly doesn't look right, and
I don't know where it came from. It might be some byproduct of mail
redirection, since it shows my username, but the mails are not sent by
me.

I moved these problematic lines to the second line of each message, and
now they are imported without problems. I probably won't file a bug for
GMime because I have no idea whether this is just some oddity caused by
my mail setup. Let this information reside here in case someone else has
a similar problem.
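
For anyone repeating this workaround on many files, the manual edit could be scripted along these lines. This is a sketch under assumptions: the helper names are made up, a POSIX shell with `awk` and `mktemp` is assumed, and it blindly swaps the first two lines, so it should only be run on files already confirmed to start with the odd '>From ' line.

```shell
# Hypothetical helpers; the names are illustrative only.

# Succeeds if line 1 starts with ">From ".
has_escaped_from_first() {
    sed -n '1p' "$1" | grep -q '^>From '
}

# Moves line 1 to line 2, leaving the rest of the file untouched.
swap_first_two_lines() {
    tmp=$(mktemp) || return 1
    awk 'NR == 1 { first = $0; next }
         NR == 2 { print; print first; next }
         { print }
         END { if (NR == 1) print first }' "$1" > "$tmp" &&
    cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Working on a copy of the mail folder first is advisable, since the files are rewritten in place.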

-- 
Perttu
