Michael J Gruber writes:

> Yes, the extra ones all are ghosts, and I slowly remember that they
> scared me in the past already ...
>
> These ghosts appear to be pretty common. It happens all the time that
> I am joined to an existing discussion thread where I do not have all
> references.

I have about 8% ghost messages in my 730k messages. I don't think I
have any situation as extreme as yours, with hundreds of ghost messages
for a small number of actual messages in a thread. If you would like to
calculate the ratio for your mail store, you can run

% xapian-delve -v -A Tghost ~/.local/share/notmuch/default/xapian
% xapian-delve -v -A Tmail ~/.local/share/notmuch/default/xapian

> I'd go as far as to say that counting ghosts as thread
> members makes this useless for me. On the other hand, notmuch's own
> count gets this right. And getting different counts is even more
> confusing.

The count shown in e.g. notmuch search is calculated after the query
has been run, so it isn't easily usable as part of a query. Maybe there
is a way to trade off some performance for fewer false positives. In
principle we could do a query for each thread found by the current
technique to postprocess the results. I can see that getting pretty
slow if there are many results though.

At least for the original motivation of looking for messages without
replies, counting ghost messages makes some sense. In general it also
makes sense for finding large threads. I did the query
'(thread (count 200 *))' on my mail store and most matches are
genuinely large threads. A few are false positives like the one you
describe. In my case it is easy to see where the ghosts come from, as
the (spam) messages have hundreds of (presumably fictional) references.

>> 2) Do they have more than one G term? That suggests a bug somewhere. We
>> actually have a test in the test suite [1] for that, but of course that is
>> with a simple artificial database.
>
> No, they all have one.
> But their sheer number looks suspicious: those
> 5 "real" e-mails have maybe 20 reference headers in total, and some of
> them refer to some of those 5. Grepping the account store for those
> references gives me around that number. Where do the 110 ghosts (90
> extra) come from which this thread points to? Still scared by them ...
> we need ghost busters!

The only information attached to a ghost message is the thread-id and
the message-id. You can get a visual picture of the thread with the
attached script. But that will probably just confirm what you did with
grep. To see what is in the database, you can run

% quest -btype:T -bthread:G -d mail/.notmuch/xapian "type:ghost and thread:0000000000000002"

That gives you record numbers, that you can examine with xapian-delve -r.
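
For what it's worth, the post-processing idea mentioned above (one
extra query per thread found) could look roughly like the sketch below.
This is only an illustration, not anything that exists in notmuch: the
function names are made up, it assumes the notmuch CLI is on PATH, and
it relies on `notmuch count --output=messages` not counting ghosts.

```python
import subprocess
from typing import Callable, Iterable, List


def count_real_messages(thread_id: str) -> int:
    """Ask the notmuch CLI how many messages a thread contains.

    Hypothetical helper. `thread_id` is a query term such as
    "thread:0000000000000002". Assumes `notmuch` is installed.
    """
    result = subprocess.run(
        ["notmuch", "count", "--output=messages", thread_id],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def drop_ghost_inflated(
    thread_ids: Iterable[str],
    min_count: int,
    counter: Callable[[str], int] = count_real_messages,
) -> List[str]:
    """Keep only threads whose recounted size still reaches min_count.

    `thread_ids` would be the candidates found by the ghost-counting
    query; `counter` is injectable so the filter can be tried without
    a mail store.
    """
    return [t for t in thread_ids if counter(t) >= min_count]
```

Since this runs one notmuch process per candidate thread, it would
presumably get slow for queries with many matches, which is the
trade-off described above.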