unofficial mirror of notmuch@notmuchmail.org
* Added messages / total files count difference.
@ 2011-08-09 11:02 Tomi Ollila
  2011-08-10  8:41 ` Tomi Ollila
  0 siblings, 1 reply; 2+ messages in thread
From: Tomi Ollila @ 2011-08-09 11:02 UTC (permalink / raw)
  To: notmuch

Hi

I get this output:

$ notmuch new --verbose
Found 15559 total files (that's not much mail).
Processed 15559 total files in 5m 53s (43 files/sec.).
Added 15546 new messages to the database.

$ find * -type f | wc
  15559   15559  529027

How can I determine which 13 files were dropped? All of those
15559 files should be mails. I checked mail files that have no
'Subject:' header, but those (at least one of them) were indexed.
Could it be about duplicate Message-IDs or something?

$ notmuch --version
notmuch 0.7-7-g68e8560

Tomi


* Re: Added messages / total files count difference.
  2011-08-09 11:02 Added messages / total files count difference Tomi Ollila
@ 2011-08-10  8:41 ` Tomi Ollila
  0 siblings, 0 replies; 2+ messages in thread
From: Tomi Ollila @ 2011-08-10  8:41 UTC (permalink / raw)
  To: notmuch

On Tue 09 Aug 2011 14:02, Tomi Ollila <tomi.ollila@nixu.com> writes:

> Hi
>
> I get this output:
>
> $ notmuch new --verbose
> Found 15559 total files (that's not much mail).
> Processed 15559 total files in 5m 53s (43 files/sec.).
> Added 15546 new messages to the database.
>
> $ find * -type f | wc
>   15559   15559  529027
>
> How can I determine which 13 files were dropped? All of those
> 15559 files should be mails. I checked mail files that have no
> 'Subject:' header, but those (at least one of them) were indexed.
> Could it be about duplicate Message-IDs or something?
>
> $ notmuch --version
> notmuch 0.7-7-g68e8560

It is about duplicate Message-IDs.

It would be nice if 'notmuch new' printed information about this
when it happens (as I recall it does when a new file is not
considered a mail file).

The steps I took to figure this out are at the end of this email
(not all iterations with and without 'wc' are shown).

>
> Tomi

Tomi

--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--

$ find ~/mail/mails/* -type f | sort >! filenames-fs
$ wc filenames-fs 
 15559  15559 855766 filenames-fs

$ cd /path/to/notmuch-git/bindings/python
$ cat > foo.py
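import notmuch
db = notmuch.Database()
# An empty query string matches every message in the database.
msgs = notmuch.Query(db, '').search_messages()

for f in msgs:
    # get_filename() returns a single filename per message.
    print(f.get_filename())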

$ PYTHONPATH=/path/to/python-json:`pwd` python foo.py | sort > filenames-db
$ wc filenames-db
 15546  15546 855037 filenames-db

$ diff filenames-db filenames-fs | grep mails | wc
     13      26     755

$ cd ~/mail
$ cat >midcheck.pl
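use strict;
use warnings;

# Maps each Message-ID to the first file seen carrying it.
my %msgids;

foreach my $fn (glob 'mails/*/*') {
    my $mid;
    open my $in, '<', $fn or die "$fn: $!";
    while (<$in>) {
        # Header names are case-insensitive; only a value on the same
        # line as the header name is captured.
        $mid = $1, next if /^Message-ID:\s*(.*)/i;
        last if /^$/;    # a blank line ends the header block
    }
    close $in;
    unless ($mid) {
        print "$fn: no Message-ID (in same line with header tag?)\n";
        next;
    }
    my $fn0 = $msgids{$mid};
    if (defined $fn0) {
        print "Files '$fn0' and '$fn' have same msg id: $mid\n";
    }
    else {
        $msgids{$mid} = $fn;
    }
}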

$ perl midcheck.pl | wc
     13     117    2098
$ perl midcheck.pl | grep \^Files | wc
     13     117    2098
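
The header scan above only sees a Message-ID whose value is on the same
line as the header name. A minimal sketch of the same duplicate check in
Python follows, assuming Python 3. It uses the standard library's email
parser, so folded headers are handled too. The function names and the
default 'mails' directory are illustrative and not part of notmuch.

```python
import os
import sys
from collections import defaultdict
from email.parser import BytesParser


def message_id(path):
    """Return the stripped Message-ID header of the file at path, or None."""
    with open(path, "rb") as fp:
        # headersonly=True: only the header block is interpreted as headers;
        # the parser also unfolds continuation lines for us.
        msg = BytesParser().parse(fp, headersonly=True)
    mid = msg.get("Message-ID")  # header lookup is case-insensitive
    if mid is None:
        return None
    mid = str(mid).strip()
    return mid or None


def duplicate_message_ids(root):
    """Map each Message-ID found in more than one file under root to its files."""
    by_mid = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            mid = message_id(path)
            if mid is not None:
                by_mid[mid].append(path)
    return {mid: paths for mid, paths in by_mid.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical default: the 'mails' directory used in the steps above.
    root = sys.argv[1] if len(sys.argv) > 1 else "mails"
    for mid, paths in sorted(duplicate_message_ids(root).items()):
        print("%s: %s" % (mid, ", ".join(paths)))
```

Run from ~/mail, this would list every Message-ID that occurs in more
than one file, together with the files that carry it.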
