From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 13 Jul 2017 09:04:44 +0200
From: Gregor Zattler
To: notmuch
Subject: Re: Bug: fatal error with notmuch new, second run starts indexing all over again
Message-ID: <20170713070444.tr4n2yfot5t4j663@len.workgroup>
Mail-Followup-To: notmuch
References: <20170703100958.5yidjhsyrnglaxum@len.workgroup>
 <877ezpbxxu.fsf@tethera.net>
 <20170703220750.dkornyh4ho7b2azy@len.workgroup>
 <87inj9w3nu.fsf@tethera.net>
 <20170706093101.cdjxdgvfy57a3kkb@len.workgroup>
 <87o9st3hz0.fsf@tethera.net>
 <20170709142147.czeett6stskxrnkp@len.workgroup>
 <87k23h3cgg.fsf@tethera.net>
In-Reply-To: <87k23h3cgg.fsf@tethera.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id: "Use and development of the notmuch mail system."

Hi David, notmuch developers,

* David Bremner [09. Jul. 2017]:
> Gregor Zattler writes:
>> Short version: Some of the messages on this mailing list had very
>> weird References: headers, most probably causing notmuch to
>> misbehave while threading the messages. But then there was no
>> Xapian exception involved.
>>
>
> Right. It looks like there are indeed many message-ids in the resulting
> database that look like valid email addresses. So that problem persists,
> despite an attempted fix discussed in that thread.
>
> I was wondering if the exception could result from overflowing some
> internal limit due to threads many thousands of messages long. It will be
> hard to know until I can replicate the exception.
> Perhaps a more
> complete org-mode list archive would do it.

Quite possible. When I do

notmuch show --entire-thread=true --format=mbox path:Mail/~ml/emacs-orgmode@gnu.org/** date 64 bit 32 > /tmp/emo.mbox

and open the resulting mbox with mutt, it shows 10199 messages.

There is fun to be had with the message ids:

0 grfz@len:/tmp$ grep -c "^Message-ID:[[:space:]]" eom.mbox
8758
0 grfz@len:/tmp$ grep -ci "^Message-ID:[[:space:]]" eom.mbox
10174

I then split the mbox with mutt into a maildir folder, which gave
10199 individual files. I extracted the message ids from these
files via formail and piped them through sort -u. That left 10105
unique message ids.

The maildir with the emacs-orgmode@gnu.org emails contained
114563 individual files. I removed every file that contained one
of those message ids anywhere. After that there were only 102164
individual files in the maildir.

Then I indexed this cleaned-up maildir. If the message threading
problem somehow affects the indexing, notmuch should now index the
files without a Xapian exception.

This is with the newest notmuch:

0 grfz@len:/tmp$ /home/grfz/src/notmuch/notmuch --version
notmuch 0.24.2+112~g37d1fa5
0 grfz@len:/tmp$ gdb --args /home/grfz/src/notmuch/notmuch new
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/grfz/src/notmuch/notmuch...done.
(gdb) b _notmuch_database_log
Breakpoint 1 at 0x1f6e0: file lib/database.cc, line 426.
(gdb) run
Starting program: /home/grfz/src/notmuch/notmuch new
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Found 102164 total files (that's not much mail).
Processed 102164 total files in 5m 58s (284 files/sec.).
Added 102151 new messages to the database.
[Inferior 1 (process 26339) exited normally]
(gdb)

Indexing the original emacs-orgmode mailing list maildir, with its
114563 files, with the very same binary results in a Xapian
exception:

0 grfz@len:/tmp$ gdb --args /home/grfz/src/notmuch/notmuch new
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/grfz/src/notmuch/notmuch...done.
(gdb) b _notmuch_database_log
Breakpoint 1 at 0x1f6e0: file lib/database.cc, line 426.
(gdb) run
Starting program: /home/grfz/src/notmuch/notmuch new
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Found 114563 total files (that's not much mail).
Breakpoint 1, _notmuch_database_log (notmuch=0x5555557d2780,
    format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n")
    at lib/database.cc:426
426 {
(gdb) bt
#0 _notmuch_database_log (notmuch=0x5555557d2780,
    format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n")
    at lib/database.cc:426
#1 0x0000555555577f58 in notmuch_database_add_message (notmuch=notmuch@entry=0x5555557d2780,
    filename=filename@entry=0x555557ae2070 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org/cur/1488666090.R8081408151026683680.len:2,",
    message_ret=message_ret@entry=0x7fffffffd2c8) at lib/database.cc:2597
#2 0x000055555556802f in add_file (state=0x7fffffffd540,
    filename=0x555557ae2070 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org/cur/1488666090.R8081408151026683680.len:2,",
    notmuch=0x5555557d2780) at notmuch-new.c:264
#3 add_files (notmuch=notmuch@entry=0x5555557d2780,
    path=path@entry=0x5555557d25f0 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org/cur",
    state=state@entry=0x7fffffffd540) at notmuch-new.c:599
#4 0x0000555555567b44 in add_files (notmuch=0x5555557d2780,
    path=path@entry=0x5555557d4f90 "/home/grfz/Mail/~ml/emacs-orgmode@gnu.org",
    state=state@entry=0x7fffffffd540) at notmuch-new.c:483
#5 0x00005555555689ed in notmuch_new_command (config=0x5555557ce1d0, argc=, argv=)
    at notmuch-new.c:1099
#6 0x0000555555561a27 in main (argc=, argv=0x7fffffffda68) at notmuch.c:456
(gdb) continue
Continuing.
Error: A Xapian exception occurred. Halting processing.
Processed 69364 total files in 25m 59s (44 files/sec.).
Added 69350 new messages to the database.
Note: A fatal error was encountered: A Xapian exception occurred
[Inferior 1 (process 27880) exited with code 01]
(gdb) continue
The program is not being run.
(gdb)

The referenced file is the second one I attached in this thread.

So your (David's) intuition was right: the exception has to do
with this threading problem.

I also ran an extensive memtest and every kind of smartctl test
possible, to be sure this is not a hardware problem.
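
For anyone who wants to repeat the clean-up step described earlier (collect the message ids of the huge thread, then drop every maildir file that mentions one of them), here is a minimal sketch in Python. It is not the procedure actually used, which was mutt, formail and sort -u. The function names thread_message_ids and files_mentioning are made up for illustration, and the sketch assumes that message ids show up in the files as <...> tokens without whitespace, which may miss oddly formatted headers.

```python
import mailbox
import re
from pathlib import Path

# Assumption: a message id appears as a <...> token without whitespace.
ID_TOKEN = re.compile(rb"<[^<>\s]+>")


def thread_message_ids(mbox_path):
    """Return the set of Message-ID values (as bytes) found in an mbox file."""
    ids = set()
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for msg in box:
            # Header lookup ignores case, so "Message-Id:" is found as well.
            value = msg.get("Message-ID")
            if value:
                ids.add(str(value).strip().encode("utf-8", "replace"))
    finally:
        box.close()
    return ids


def files_mentioning(maildir_path, ids):
    """Yield files in cur/ and new/ that contain any of the ids anywhere."""
    for sub in ("cur", "new"):
        folder = Path(maildir_path) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if not path.is_file():
                continue
            # Compare all <...> tokens in the raw file against the id set.
            tokens = set(ID_TOKEN.findall(path.read_bytes()))
            if tokens & ids:
                yield path
```

A cautious way to use it is to print the paths from files_mentioning first and move or delete them only after checking the list, since a match in References: or In-Reply-To: counts just like a match in Message-ID:.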

I then downloaded parts of the archive, merged and sliced them,
and now have a sample of 25001 total files (that's not much
mail) on which notmuch new produces a Xapian exception:

0 (master *) grfz@len:~$ gdb --args /home/grfz/src/notmuch/notmuch new
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/grfz/src/notmuch/notmuch...done.
(gdb) b _notmuch_database_log
Breakpoint 1 at 0x1f6e0: file lib/database.cc, line 426.
(gdb) run
Starting program: /home/grfz/src/notmuch/notmuch new
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Found 25001 total files (that's not much mail).
Breakpoint 1, _notmuch_database_log (notmuch=0x5555557d27f0,
    format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n")
    at lib/database.cc:426
426 {
(gdb) bt
#0 _notmuch_database_log (notmuch=0x5555557d27f0,
    format=0x55555558d258 "A Xapian exception occurred adding message: %s.\n")
    at lib/database.cc:426
#1 0x0000555555577f58 in notmuch_database_add_message (notmuch=notmuch@entry=0x5555557d27f0,
    filename=filename@entry=0x555558dc96c0 "/tmp/reduced-sample/cur/1499897912.R1843131171398763530.len:2,",
    message_ret=message_ret@entry=0x7fffffffd2e8) at lib/database.cc:2597
#2 0x000055555556802f in add_file (state=0x7fffffffd560,
    filename=0x555558dc96c0 "/tmp/reduced-sample/cur/1499897912.R1843131171398763530.len:2,",
    notmuch=0x5555557d27f0) at notmuch-new.c:264
#3 add_files (notmuch=notmuch@entry=0x5555557d27f0,
    path=path@entry=0x5555557e7950 "/tmp/reduced-sample/cur",
    state=state@entry=0x7fffffffd560) at notmuch-new.c:599
#4 0x0000555555567b44 in add_files (notmuch=0x5555557d27f0,
    path=path@entry=0x5555557cf710 "/tmp/reduced-sample",
    state=state@entry=0x7fffffffd560) at notmuch-new.c:483
#5 0x00005555555689ed in notmuch_new_command (config=0x5555557ce1d0, argc=, argv=)
    at notmuch-new.c:1099
#6 0x0000555555561a27 in main (argc=, argv=0x7fffffffda88) at notmuch.c:456
(gdb)

As a tar.xz it weighs 28 MB. I could provide this for download if
someone is interested.

Ciao, Gregor