* Duplicate In-reply-to line 326 lib/message.cc
@ 2009-11-28  9:40 David Bremner
From: David Bremner @ 2009-11-28  9:40 UTC
  To: notmuch


While tracking down a search problem, I enabled debugging with:
   make CFLAGS="-g -DDEBUG" CXXFLAGS="-g -DDEBUG"

Now it seems that any search that is non-empty (i.e. matches
something) crashes with a duplicate In-Reply-To ID. This is in git
revision 92c4dcc (although it was the same yesterday).  The oddest
thing is that the second message-id is a common English word.

Here is a trace:

dulcinea:~/tmp % ~/projects/notmuch/notmuch search spam
Query string is:
spam
Final query is:
Xapian::Query((Tmail AND Zspam:(pos=1)))
Query string is:
thread:13c033781712e92541a5591320ac0ff4
Query string is:
thread:13c033781712e92541a5591320ac0ff4 AND (spam)
Final query is:
Xapian::Query((Tmail AND 0 * G13c033781712e92541a5591320ac0ff4))
Final query is:
Xapian::Query((Tmail AND 0 * G13c033781712e92541a5591320ac0ff4 AND Zspam:(pos=1)))
Internal error: Message 877htzhn9e.wl%jemarch@gnu.org has duplicate In-Reply-To IDs: 1e5bcefd0911081424p12eb6fa9te57ff4cfeb83fcdd@mail.gmail.com and data
 (lib/message.cc:326).

At the moment I don't have any really good ideas for how to debug this
(or any real familiarity with notmuch internals).  I put a test corpus
of messages (all from public mailing lists) at

   http://pivot.cs.unb.ca/scratch/mailtest.tgz

The current tarball is about 5 MB.  The machine has plenty of bandwidth
(not meant as a challenge to DDOS hobbyists :) ).

d


* Re: Duplicate In-reply-to line 326 lib/message.cc
@ 2009-11-28 18:15 Carl Worth
From: Carl Worth @ 2009-11-28 18:15 UTC
  To: David Bremner, notmuch


On Sat, 28 Nov 2009 05:40:13 -0400, David Bremner <bremner@pivot.cs.unb.ca> wrote:
> Now it seems that any search that is non-empty (i.e. matches
> something) crashes with a duplicate In-Reply-To ID. This is in git
> revision 92c4dcc (although it was the same yesterday).  The oddest
> thing is that the second message-id is a common English word.
...
> Internal error: Message 877htzhn9e.wl%jemarch@gnu.org has duplicate In-Reply-To IDs: 1e5bcefd0911081424p12eb6fa9te57ff4cfeb83fcdd@mail.gmail.com and data
>  (lib/message.cc:326).

Thanks David,

I replicated this without any difficulty, and the fix was simply to
correct a stupid mistake on my part. The only reason I hadn't noticed
this myself earlier is that I've been doing debug builds with:

        make CFLAGS="-g -DDEBUG"

instead of:

        make CFLAGS="-g -DDEBUG" CXXFLAGS="-g -DDEBUG"

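Since lib/message.cc is C++, it takes its flags from CXXFLAGS, so the
former never defined DEBUG there. If we can, I'd like to make the former
work as well, to avoid hiding things like this in the future.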
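
Why the CFLAGS-only build masked the bug can be sketched in a few lines. This assumes the duplicate check is compiled only when the DEBUG macro is defined for the C++ file, which the thread implies but does not show; run_debug_only_check is a hypothetical stand-in, not notmuch code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a consistency check that exists only in debug
// builds. Whether this branch is compiled depends only on the flags used to
// compile *this* C++ file (in this build, CXXFLAGS), not on CFLAGS.
static bool run_debug_only_check() {
#ifdef DEBUG
    std::fputs("debug-only check ran\n", stderr);
    return true;
#else
    return false;  // without -DDEBUG the check is compiled out entirely
#endif
}
```

Built with `g++ -DDEBUG`, the check runs; built without it, the function just returns false, so a mistake in the check itself never produces its bogus error.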

> At the moment I don't have any really good ideas for how to debug this
> (or any real familiarity with notmuch internals).  I put a test corpus
> of messages (all from public mailing lists) at

Before I realized how easy the bug was to replicate and fix, I was going
to give a couple of debugging ideas here. I'll briefly mention them
anyway.

The core of what we store in the database for each message is a single
list of "terms" (each a string of text). Terms that serve different
purposes are distinguished by prefixing them with particular
substrings. See the large comment at the top of lib/database.cc for
details.
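
A minimal sketch of that layout, for illustration only. It models one document's terms as a sorted std::set (Xapian likewise returns a document's terms in sorted order, which the lookup in the patch further down relies on) and pulls out the values carrying one prefix. values_with_prefix is a hypothetical helper, not notmuch code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return the values (prefix stripped) of every term that starts with `prefix`.
// Because the terms are sorted, all terms sharing a prefix are contiguous and
// the first of them is at lower_bound(prefix).
static std::vector<std::string> values_with_prefix(const std::set<std::string>& terms,
                                                   const std::string& prefix) {
    std::vector<std::string> values;
    for (auto it = terms.lower_bound(prefix);
         it != terms.end() && it->compare(0, prefix.size(), prefix) == 0; ++it)
        values.push_back(it->substr(prefix.size()));
    return values;
}
```

For example, given the terms Tmail, XREPLYTOa@example.org and Zspam, asking for the prefix XREPLYTO yields the single value a@example.org. T matches the Tmail term visible in the trace above, but XREPLYTO is an assumed name for the In-Reply-To prefix, not one taken from lib/database.cc.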

So if there *were* an actual case of a duplicate In-Reply-To term here,
the first thing to do would be to inspect the terms stored in the
database for that message's document. Until now, I've been using a
little utility I wrote called xapian-dump for this. It no longer exists
in the current tree, only deep in notmuch's code history, so to get it
one would use git log to find the commit that removed it and then check
out that commit's parent.

But xapian-dump is pretty dumb: all it does is dump every term from
every document in the database (it also dumps each document's data and
values, but those aren't relevant here). That's a *lot* of output. More
useful would be a tool that dumps just the terms of the message you want
to debug, which is why I want to introduce a new "notmuch search
--for=terms" or something like it as a much more useful debugging tool.
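
A per-message dump is mostly a matter of locating one document and listing its terms. The sketch below uses an in-memory map as a stand-in for the database and assumes a Q prefix for message-ID terms; with Xapian itself one would walk the document's terms with termlist_begin()/termlist_end(), the same iterator the patch below uses. TermDb and terms_for_message are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the database: document id -> that document's terms.
using TermDb = std::map<unsigned, std::set<std::string>>;

// Return the terms of the first document containing id_term (a message-ID
// term; the "Q" prefix used with it is an assumption), or an empty set if no
// document has it. Unlike a full dump, only one message's terms come back.
static std::set<std::string> terms_for_message(const TermDb& db,
                                               const std::string& id_term) {
    for (const auto& entry : db) {
        if (entry.second.count(id_term))
            return entry.second;
    }
    return {};
}
```

Printing the returned set one term per line gives the kind of focused output described above, instead of every term in the database.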

Anyway, I hope that was informative.

Thanks for reporting the bug!

-Carl

commit 64c8d6227a90ea6c37ea112ee20b14f16b9b46e7
Author: Carl Worth <cworth@cworth.org>
Date:   Sat Nov 28 10:01:22 2009 -0800

    Avoid bogus internal error reporting duplicate In-Reply-To IDs.
    
    This error was triggered with a debugging build via:
    
        make CXXFLAGS="-DDEBUG"
    
    and reported by David Bremner. The actual error is that I'm an
    idiot who doesn't know how to use strcmp's return value. Of
    course, the strcmp interface scores a negative 7 on Rusty Russell's
    ranking of bad interfaces:
    
    http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

diff --git a/lib/message.cc b/lib/message.cc
index 03b8c81..49519f1 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -318,7 +318,7 @@ _notmuch_message_get_in_reply_to (notmuch_message_t *message
     in_reply_to = *i;
 
     if (i != message->doc.termlist_end () &&
-       strncmp ((*i).c_str (), prefix, prefix_len))
+       strncmp ((*i).c_str (), prefix, prefix_len) == 0)
     {
        INTERNAL_ERROR ("Message %s has duplicate In-Reply-To IDs: %s and %s\n",
                        notmuch_message_get_message_id (message),
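
To see why the old condition fired on essentially every message with an In-Reply-To term, here is a minimal sketch of the lookup the patch touches. It assumes sorted terms and uses a std::set in place of the Xapian termlist (lower_bound plays the role of skipping ahead to the prefix); has_prefix, reports_duplicate and the XREPLYTO prefix are hypothetical, and `buggy` switches between the old and fixed conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>

// Boolean wrapper: unlike raw strncmp, a true result means "matches".
static bool has_prefix(const std::string& term, const char* prefix, std::size_t prefix_len) {
    // strncmp returns 0 when the first prefix_len bytes are equal.
    return std::strncmp(term.c_str(), prefix, prefix_len) == 0;
}

// Find the first term carrying `prefix`, then inspect the next term in sorted
// order. Returns true if the chosen condition reports a duplicate.
static bool reports_duplicate(const std::set<std::string>& terms,
                              const char* prefix, bool buggy) {
    std::size_t prefix_len = std::strlen(prefix);
    auto i = terms.lower_bound(prefix);
    if (i == terms.end() || !has_prefix(*i, prefix, prefix_len))
        return false;  // no In-Reply-To term at all
    ++i;               // the term right after the first In-Reply-To term
    if (i == terms.end())
        return false;
    if (buggy)
        // Old test: strncmp is nonzero (true) when the prefix does NOT match.
        return std::strncmp(i->c_str(), prefix, prefix_len) != 0;
    // Fixed test: a second term with the same prefix is a real duplicate.
    return has_prefix(*i, prefix, prefix_len);
}
```

The old test is exactly inverted: it complains whenever an unrelated term follows the In-Reply-To term, and stays silent for a genuine duplicate.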
