unofficial mirror of notmuch@notmuchmail.org
From: Gregor Zattler <telegraph@gmx.net>
To: notmuch <notmuch@notmuchmail.org>
Subject: Re: Bug: fatal error with notmuch new, second run starts indexing all over again
Date: Thu, 13 Jul 2017 09:04:44 +0200
Message-ID: <20170713070444.tr4n2yfot5t4j663@len.workgroup>
In-Reply-To: <87k23h3cgg.fsf@tethera.net>

Hi David, notmuch developers,
* David Bremner <david@tethera.net> [09. Jul. 2017]:
> Gregor Zattler <telegraph@gmx.net> writes:
>> Short version: Some of the messages on this mailing list had very
>> weird References: headers, most probably causing notmuch to
>> misbehave while threading the messages.  But then there was no
>> xapian exception involved.
>>
> 
> Right. It looks like there are indeed many message-ids in the resulting
> database that look like valid email addresses. So that problem persists,
> despite an attempted fix discussed in that thread.
> 
> I was wondering if the exception could result from overflowing some
> internal limit due to threads many thousands of messages long. It will be
> hard to know until I can replicate the exception. Perhaps a more
> complete org-mode list archive would do it.

Quite possible.  When I run
notmuch show  --entire-thread=true --format=mbox path:Mail/~ml/emacs-orgmode@gnu.org/**   date 64 bit 32 > /tmp/emo.mbox
and open the resulting mbox with mutt, it shows 10199 messages.
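
The message count can also be cross-checked without mutt.  A minimal
sketch, assuming the mbox escapes body lines that start with "From "
(otherwise the count comes out too high); count_mbox_messages is a
hypothetical helper name, not part of notmuch:

```shell
# Count messages in an mbox by counting "From " separator lines.
# Assumes body lines beginning with "From " are escaped as ">From ".
count_mbox_messages() {
    grep -c '^From ' "$1"
}
```

Usage would be something like: count_mbox_messages /tmp/emo.mbox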

There is some fun to be had with message ids:

0 grfz@len:/tmp$ grep -c "^Message-ID:[[:space:]]" eom.mbox 
8758
0 grfz@len:/tmp$ grep -ci "^Message-ID:[[:space:]]" eom.mbox 
10174
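
The gap between the two counts comes from header names being
case-insensitive, so "Message-Id:" and "Message-ID:" are both valid.  A
small sketch that tallies each spelling; message_id_spellings is a
hypothetical helper name, and unlike the grep above it does not require
whitespace after the colon:

```shell
# Tally each spelling of the Message-ID header name found at line start.
# Every line is scanned, so a body line that happens to begin with
# "Message-ID:" is counted too.
message_id_spellings() {
    grep -io '^message-id:' "$1" | sort | uniq -c | sort -rn
}
```

Usage would be something like: message_id_spellings eom.mbox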

I then split the mbox with mutt into a maildir folder.  This gave
10199 individual files.  I then extracted the message ids from these
files via formail and piped them through sort -u.  This left
10105 unique message ids.
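
For readers without formail, the extraction step can be approximated
with awk.  This is a stand-in sketch, not the formail invocation used
above: extract_message_ids is a hypothetical helper, it looks only at
the header block of each file, and it handles just one folded
continuation line.

```shell
# Print the Message-ID of each file given as an argument, one per line.
# Only the header block (up to the first empty line) is examined.
extract_message_ids() {
    for f in "$@"; do
        awk '
            /^$/ { exit }                         # end of headers
            grabbing && /^[ \t]/ {                # folded continuation line
                sub(/^[ \t]+/, ""); print; exit
            }
            { grabbing = 0 }
            tolower($0) ~ /^message-id:/ {
                sub(/^[^:]*:[ \t]*/, "")          # drop the header name
                if ($0 != "") { print; exit }
                grabbing = 1                      # value is on the next line
            }
        ' "$f"
    done
}
```

Usage would be something like: extract_message_ids cur/* | sort -u > ids.txt
where ids.txt is an assumed output file name.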

The maildir with the emacs-orgmode@gnu.org emails contained 114563
individual files.  I removed every file that contained one of those
message ids anywhere.  That left only 102164 individual files
in the maildir.
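
A sketch of how such a selection could be made with grep, assuming the
ids sit one per line in a file with no empty lines (an empty pattern
would match every file); files_mentioning_ids is a hypothetical helper
and it only lists candidates, it deletes nothing:

```shell
# List every file under a maildir that contains any of the IDs in an
# ID file (one ID per line), matched as fixed strings anywhere in the file.
# The ID file must not contain empty lines.
files_mentioning_ids() {
    ids_file=$1
    maildir=$2
    grep -rlF -f "$ids_file" "$maildir"
}
```

Because the match is anywhere in the file, messages that merely cite
one of the ids in References: or In-Reply-To: are listed too.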

After that I indexed this cleaned-up maildir.  If the
message threading problem somehow affects the indexing, notmuch should
now index the files without a xapian exception:


This is with the newest notmuch:
0 grfz@len:/tmp$ /home/grfz/src/notmuch/notmuch --version
notmuch 0.24.2+112~g37d1fa5


0 grfz@len:/tmp$ gdb --args /home/grfz/src/notmuch/notmuch new
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/grfz/src/notmuch/notmuch...done.
(gdb) b _notmuch_database_log
Breakpoint 1 at 0x1f6e0: file lib/database.cc, line 426.
(gdb) run
Starting program: /home/grfz/src/notmuch/notmuch new
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Found 102164 total files (that's not much mail).
Processed 102164 total files in 5m 58s (284 files/sec.).
Added 102151 new messages to the database.
[Inferior 1 (process 26339) exited normally]
(gdb) 


Indexing the original emacs-orgmode mailing list maildir
with its 114563 files, using the very same binary, results in a
xapian exception:


0 grfz@len:/tmp$ gdb --args /home/grfz/src/notmuch/notmuch new
Reading symbols from /home/grfz/src/notmuch/notmuch...done.
(gdb) b _notmuch_database_log
Breakpoint 1 at 0x1f6e0: file lib/database.cc, line 426.
(gdb) run
Starting program: /home/grfz/src/notmuch/notmuch new
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Found 114563 total files (that's not much mail).

Breakpoint 1, _notmuch_database_log (notmuch=0x5555557d2780, format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n") at lib/database.cc:426
426     {
(gdb) bt
#0  _notmuch_database_log (notmuch=0x5555557d2780, format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n") at lib/database.cc:426
#1  0x0000555555577f58 in notmuch_database_add_message (notmuch=notmuch@entry=0x5555557d2780, filename=filename@entry=0x555557ae2070 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org/cur/1488666090.R8081408151026683680.len:2,",
    message_ret=message_ret@entry=0x7fffffffd2c8) at lib/database.cc:2597
#2  0x000055555556802f in add_file (state=0x7fffffffd540, filename=0x555557ae2070 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org/cur/1488666090.R8081408151026683680.len:2,", notmuch=0x5555557d2780) at notmuch-new.c:264
#3  add_files (notmuch=notmuch@entry=0x5555557d2780, path=path@entry=0x5555557d25f0 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org/cur", state=state@entry=0x7fffffffd540) at notmuch-new.c:599
#4  0x0000555555567b44 in add_files (notmuch=0x5555557d2780, path=path@entry=0x5555557d4f90 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org", state=state@entry=0x7fffffffd540) at notmuch-new.c:483
#5  0x00005555555689ed in notmuch_new_command (config=0x5555557ce1d0, argc=<optimized out>, argv=<optimized out>) at notmuch-new.c:1099
#6  0x0000555555561a27 in main (argc=<optimized out>, argv=0x7fffffffda68) at notmuch.c:456
(gdb) continue
Continuing.
Error: A Xapian exception occurred. Halting processing.
Processed 69364 total files in 25m 59s (44 files/sec.).
Added 69350 new messages to the database.
Note: A fatal error was encountered: A Xapian exception occurred
[Inferior 1 (process 27880) exited with code 01]
(gdb) continue
The program is not being run.
(gdb)

The referenced file is the second one I attached in this thread.




So your (David's) intuition was right: the exception is related to
this threading problem.


I also ran extensive memtest runs and every available smartctl test,
to be sure this is not a hardware problem.



I then downloaded parts of the archive, merged and sliced them, and
now I have a sample of 25001 total files (that's not much mail)
on which notmuch new produces a xapian exception:

0 (master *) grfz@len:~$ gdb --args /home/grfz/src/notmuch/notmuch new
Reading symbols from /home/grfz/src/notmuch/notmuch...done.
(gdb) b _notmuch_database_log
Breakpoint 1 at 0x1f6e0: file lib/database.cc, line 426.
(gdb) run
Starting program: /home/grfz/src/notmuch/notmuch new
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Found 25001 total files (that's not much mail).

Breakpoint 1, _notmuch_database_log (notmuch=0x5555557d27f0, format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n") at lib/database.cc:426
426     {
(gdb) bt
#0  _notmuch_database_log (notmuch=0x5555557d27f0, format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n") at lib/database.cc:426
#1  0x0000555555577f58 in notmuch_database_add_message (notmuch=notmuch@entry=0x5555557d27f0, filename=filename@entry=0x555558dc96c0 "/tmp/reduced-sample/cur/1499897912.R1843131171398763530.len:2,",
    message_ret=message_ret@entry=0x7fffffffd2e8) at lib/database.cc:2597
#2  0x000055555556802f in add_file (state=0x7fffffffd560, filename=0x555558dc96c0 "/tmp/reduced-sample/cur/1499897912.R1843131171398763530.len:2,", notmuch=0x5555557d27f0) at notmuch-new.c:264
#3  add_files (notmuch=notmuch@entry=0x5555557d27f0, path=path@entry=0x5555557e7950 "/tmp/reduced-sample/cur", state=state@entry=0x7fffffffd560) at notmuch-new.c:599
#4  0x0000555555567b44 in add_files (notmuch=0x5555557d27f0, path=path@entry=0x5555557cf710 "/tmp/reduced-sample", state=state@entry=0x7fffffffd560) at notmuch-new.c:483
#5  0x00005555555689ed in notmuch_new_command (config=0x5555557ce1d0, argc=<optimized out>, argv=<optimized out>) at notmuch-new.c:1099
#6  0x0000555555561a27 in main (argc=<optimized out>, argv=0x7fffffffda88) at notmuch.c:456
(gdb)


As a tar.xz it weighs 28 MB.  I could provide this for download
if someone is interested.


Ciao, Gregor
