* Proof of concept for counting messages in thread @ 2023-02-13 12:26 David Bremner 2023-02-13 12:26 ` [PATCH 1/2] WIP/lib: add count query backend David Bremner ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: David Bremner @ 2023-02-13 12:26 UTC (permalink / raw) To: notmuch; +Cc: pabs

So far this only supports counting messages in threads, and only in the
sexp-based query parser. It seems useful to expand it to other fields
(from, e.g.). I'm not sure how motivated I am to shim this into the
infix query parser, but we will see how it goes.

^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] WIP/lib: add count query backend 2023-02-13 12:26 Proof of concept for counting messages in thread David Bremner @ 2023-02-13 12:26 ` David Bremner 2023-02-13 12:26 ` [PATCH 2/2] WIP: support thread count queries David Bremner 2023-02-13 15:39 ` Proof of concept for counting messages in thread Michael J Gruber 2 siblings, 0 replies; 12+ messages in thread From: David Bremner @ 2023-02-13 12:26 UTC (permalink / raw) To: notmuch; +Cc: pabs --- lib/Makefile.local | 3 +- lib/count-query.cc | 62 ++++++++++++++++++++++++++++++++++++++++++ lib/database-private.h | 6 ++++ 3 files changed, 70 insertions(+), 1 deletion(-) create mode 100644 lib/count-query.cc diff --git a/lib/Makefile.local b/lib/Makefile.local index 4e766305..cc646946 100644 --- a/lib/Makefile.local +++ b/lib/Makefile.local @@ -66,7 +66,8 @@ libnotmuch_cxx_srcs = \ $(dir)/init.cc \ $(dir)/parse-sexp.cc \ $(dir)/sexp-fp.cc \ - $(dir)/lastmod-fp.cc + $(dir)/lastmod-fp.cc \ + $(dir)/count-query.cc libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o) diff --git a/lib/count-query.cc b/lib/count-query.cc new file mode 100644 index 00000000..5d258880 --- /dev/null +++ b/lib/count-query.cc @@ -0,0 +1,62 @@ +/* count-query.cc - generate queries for terms on few / many messages. + * + * This file is part of notmuch. + * + * Copyright © 2023 David Bremner + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. 
If not, see https://www.gnu.org/licenses/ . + * + * Author: David Bremner <david@tethera.net> + */ + +#include "database-private.h" + +notmuch_status_t +_notmuch_count_strings_to_query (notmuch_database_t *notmuch, std::string field, + const std::string &from, const std::string &to, + Xapian::Query &output, std::string &msg) +{ + + long from_idx = 0, to_idx = LONG_MAX; + std::string term_prefix = _find_prefix (field.c_str ()); + std::vector<std::string> terms; + + if (! from.empty ()) { + try { + from_idx = std::stol(from); + } catch (std::logic_error &e) { + msg = "bad 'from' count: '" + from + "'"; + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; + } + } + + if (! to.empty ()) { + try { + to_idx = std::stol(to); + } catch (std::logic_error &e) { + msg = "bad 'to' count: '" + to + "'"; + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; + } + } + + for (Xapian::TermIterator it = notmuch->xapian_db->allterms_begin (term_prefix); + it != notmuch->xapian_db->allterms_end (); ++it) { + Xapian::doccount freq = it.get_termfreq(); + if (from_idx <= freq && freq <= to_idx) + terms.push_back (*it); + } + + output = Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ()); + return NOTMUCH_STATUS_SUCCESS; +} diff --git a/lib/database-private.h b/lib/database-private.h index b9be4e22..ba96a93c 100644 --- a/lib/database-private.h +++ b/lib/database-private.h @@ -387,5 +387,11 @@ notmuch_status_t _notmuch_lastmod_strings_to_query (notmuch_database_t *notmuch, const std::string &from, const std::string &to, Xapian::Query &output, std::string &msg); + +/* count-query.cc */ +notmuch_status_t +_notmuch_count_strings_to_query (notmuch_database_t *notmuch, std::string field, + const std::string &from, const std::string &to, + Xapian::Query &output, std::string &msg); #endif #endif -- 2.39.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
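The backend in the patch above parses two optional bounds and then keeps every term whose frequency lies between them. The following is a minimal, self-contained sketch of that idea, not the notmuch implementation: a `std::map` stands in for the Xapian term list, and the names `parse_count_bound`, `term_freqs` and `terms_in_count_range` are hypothetical. It also assumes that a stricter parse is wanted, one that rejects trailing characters such as "1.5" or "5x".

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parse a bound strictly as a non-negative integer.
// An empty string means "unbounded" and yields the fallback value.
bool
parse_count_bound (const std::string &text, long fallback, long &out)
{
    if (text.empty ()) {
        out = fallback;
        return true;
    }
    try {
        std::size_t used = 0;
        long value = std::stol (text, &used);
        // Reject trailing junk such as "5x" or "1.5", and negative counts.
        if (used != text.size () || value < 0)
            return false;
        out = value;
        return true;
    } catch (const std::logic_error &) {
        // std::invalid_argument and std::out_of_range both derive from this.
        return false;
    }
}

// Stand-in for the Xapian term list: term -> number of documents carrying it.
using term_freqs = std::map<std::string, long>;

// Collect the terms whose frequency lies in [from, to]; empty bounds are open.
// Returns false if either bound cannot be parsed.
bool
terms_in_count_range (const term_freqs &freqs, const std::string &from,
                      const std::string &to, std::vector<std::string> &terms)
{
    long lo, hi;
    if (! parse_count_bound (from, 0, lo) || ! parse_count_bound (to, LONG_MAX, hi))
        return false;
    for (const auto &entry : freqs)
        if (lo <= entry.second && entry.second <= hi)
            terms.push_back (entry.first);
    return true;
}
```

In the real backend the collected terms would be combined into a single OR query; here they are simply returned so the filtering step can be checked on its own.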
* [PATCH 2/2] WIP: support thread count queries 2023-02-13 12:26 Proof of concept for counting messages in thread David Bremner 2023-02-13 12:26 ` [PATCH 1/2] WIP/lib: add count query backend David Bremner @ 2023-02-13 12:26 ` David Bremner 2023-02-13 15:39 ` Proof of concept for counting messages in thread Michael J Gruber 2 siblings, 0 replies; 12+ messages in thread From: David Bremner @ 2023-02-13 12:26 UTC (permalink / raw) To: notmuch; +Cc: pabs --- lib/parse-sexp.cc | 35 ++++++++++++++++++++++++++++++++--- test/T081-sexpr-search.sh | 6 ++++++ 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/lib/parse-sexp.cc b/lib/parse-sexp.cc index 9cadbc13..1faa9023 100644 --- a/lib/parse-sexp.cc +++ b/lib/parse-sexp.cc @@ -34,6 +34,8 @@ typedef enum { SEXP_FLAG_ORPHAN = 1 << 8, SEXP_FLAG_RANGE = 1 << 9, SEXP_FLAG_PATHNAME = 1 << 10, + SEXP_FLAG_COUNT = 1 << 11, + SEXP_FLAG_MODIFIER = 1 << 12, } _sexp_flag_t; /* @@ -70,6 +72,8 @@ static _sexp_prefix_t prefixes[] = SEXP_FLAG_FIELD }, { "date", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll, SEXP_FLAG_RANGE }, + { "count", Xapian::Query::OP_INVALID, Xapian::Query::MatchAll, + SEXP_FLAG_RANGE | SEXP_FLAG_MODIFIER }, { "from", Xapian::Query::OP_AND, Xapian::Query::MatchAll, SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND }, { "folder", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, @@ -113,7 +117,8 @@ static _sexp_prefix_t prefixes[] = { "tag", Xapian::Query::OP_AND, Xapian::Query::MatchAll, SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND }, { "thread", Xapian::Query::OP_OR, Xapian::Query::MatchNothing, - SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | SEXP_FLAG_EXPAND }, + SEXP_FLAG_FIELD | SEXP_FLAG_BOOLEAN | SEXP_FLAG_WILDCARD | SEXP_FLAG_REGEX | + SEXP_FLAG_EXPAND | SEXP_FLAG_COUNT }, { "to", Xapian::Query::OP_AND, Xapian::Query::MatchAll, SEXP_FLAG_FIELD | SEXP_FLAG_WILDCARD | SEXP_FLAG_EXPAND }, { } 
@@ -513,6 +518,7 @@ _sexp_expand_param (notmuch_database_t *notmuch, const _sexp_prefix_t *parent, static notmuch_status_t _sexp_parse_range (notmuch_database_t *notmuch, const _sexp_prefix_t *prefix, + const _sexp_prefix_t *parent, const sexp_t *sx, Xapian::Query &output) { const char *from, *to; @@ -552,6 +558,27 @@ _sexp_parse_range (notmuch_database_t *notmuch, const _sexp_prefix_t *prefix, to = ""; } + if (strcmp (prefix->name, "count") == 0) { + notmuch_status_t status; + if (! parent) { + _notmuch_database_log (notmuch, "illegal '%s' outside field\n", + prefix->name); + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; + } + if (! (parent->flags & SEXP_FLAG_COUNT)) { + _notmuch_database_log (notmuch, "'%s' not supported in field '%s'\n", + prefix->name, parent->name); + return NOTMUCH_STATUS_BAD_QUERY_SYNTAX; + } + + status = _notmuch_count_strings_to_query (notmuch, parent->name, from, to, output, msg); + if (status) { + if (! msg.empty ()) + _notmuch_database_log (notmuch, "%s\n", msg.c_str ()); + } + return status; + } + if (strcmp (prefix->name, "date") == 0) { notmuch_status_t status; status = _notmuch_date_strings_to_query (NOTMUCH_VALUE_TIMESTAMP, from, to, output, msg); @@ -654,7 +681,9 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent for (_sexp_prefix_t *prefix = prefixes; prefix && prefix->name; prefix++) { if (strcmp (prefix->name, sx->list->val) == 0) { - if (prefix->flags & (SEXP_FLAG_FIELD | SEXP_FLAG_RANGE)) { + if ((prefix->flags & (SEXP_FLAG_FIELD)) || + ((prefix->flags & SEXP_FLAG_RANGE) && + ! 
(prefix->flags & SEXP_FLAG_MODIFIER))) { if (parent) { _notmuch_database_log (notmuch, "nested field: '%s' inside '%s'\n", prefix->name, parent->name); @@ -677,7 +706,7 @@ _sexp_to_xapian_query (notmuch_database_t *notmuch, const _sexp_prefix_t *parent } if (prefix->flags & SEXP_FLAG_RANGE) - return _sexp_parse_range (notmuch, prefix, sx->list->next, output); + return _sexp_parse_range (notmuch, prefix, parent, sx->list->next, output); if (strcmp (prefix->name, "infix") == 0) { return _sexp_parse_infix (notmuch, sx->list->next, output); diff --git a/test/T081-sexpr-search.sh b/test/T081-sexpr-search.sh index 0c7db9c2..2013fa5c 100755 --- a/test/T081-sexpr-search.sh +++ b/test/T081-sexpr-search.sh @@ -1318,5 +1318,11 @@ notmuch search subject:notmuch or List:notmuch | notmuch_search_sanitize > EXPEC notmuch search --query=sexp '(About notmuch)' | notmuch_search_sanitize > OUTPUT test_expect_equal_file EXPECTED OUTPUT +test_begin_subtest "threads with one message" +notmuch search --query=sexp '(and (from gusarov) (thread (count 1)))' | notmuch_search_sanitize > OUTPUT +cat <<EOF >EXPECTED +thread:XXX 2009-11-17 [1/1] Mikhail Gusarov; [notmuch] [PATCH] Handle rename of message file (inbox unread) +EOF +test_expect_equal_file EXPECTED OUTPUT test_done -- 2.39.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: Proof of concept for counting messages in thread 2023-02-13 12:26 Proof of concept for counting messages in thread David Bremner 2023-02-13 12:26 ` [PATCH 1/2] WIP/lib: add count query backend David Bremner 2023-02-13 12:26 ` [PATCH 2/2] WIP: support thread count queries David Bremner @ 2023-02-13 15:39 ` Michael J Gruber 2023-02-13 16:32 ` David Bremner 2 siblings, 1 reply; 12+ messages in thread From: Michael J Gruber @ 2023-02-13 15:39 UTC (permalink / raw) To: David Bremner, notmuch; +Cc: pabs

On Mon, 13 Feb 2023 at 13:26, David Bremner <david@tethera.net> wrote:
>
> So far this only supports counting messages in threads, and the sexp
> based query parser. It seems useful to expand it to other fields
> (from, e.g.). I'm not sure how motivated I am to shim this into the
> infix query parser, but we will see how it goes.

This certainly looks interesting, and not easy to get by scripting
around the existing commands. It is kinda special, so having it in
sexp only seems okay.

I am getting a few surprising matches, e.g.
```
notmuch search --query=sexp '(thread (count 115)))'
thread:0000000000021229 2021-05-17 [5/5] Michael J Gruber ... redacted
notmuch count --exclude=false thread:0000000000021229
5
```
It could be some database issues, of course. Or me misunderstanding something :)

Patch 1/2 is crlf garbled, by the way. Applies cleanly after removing
the extra ^Ms.

Michael

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Proof of concept for counting messages in thread 2023-02-13 15:39 ` Proof of concept for counting messages in thread Michael J Gruber @ 2023-02-13 16:32 ` David Bremner 2023-02-13 17:03 ` Michael J Gruber 0 siblings, 1 reply; 12+ messages in thread From: David Bremner @ 2023-02-13 16:32 UTC (permalink / raw) To: Michael J Gruber, notmuch; +Cc: pabs

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

> I am getting a few surprising matches, e.g.
> ```
> notmuch search --query=sexp '(thread (count 115)))'
> thread:0000000000021229 2021-05-17 [5/5] Michael J Gruber ... redacted
> notmuch count --exclude=false thread:0000000000021229
> 5
> ```
> It could be some database issues, of course. Or me misunderstanding something :)

Hmm. I don't see any strange matches for that particular query, just a
thread that actually has 115 messages. But there could also be bugs of
course. Does xapian-check complain about your database?

>
> Patch 1/2 is crlf garbled, by the way. Applies cleanly after removing
> the extra ^Ms.

Hmm. Probably because of Content-Transfer-Encoding: 8bit. I have a
directly mailed copy that didn't go through mailman, and that looks OK.

>
> Michael

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Proof of concept for counting messages in thread 2023-02-13 16:32 ` David Bremner @ 2023-02-13 17:03 ` Michael J Gruber 2023-02-13 20:23 ` David Bremner 0 siblings, 1 reply; 12+ messages in thread From: Michael J Gruber @ 2023-02-13 17:03 UTC (permalink / raw) To: David Bremner; +Cc: notmuch, pabs

On Mon, 13 Feb 2023 at 17:32, David Bremner <david@tethera.net> wrote:
>
> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
> > I am getting a few surprising matches, e.g.
> > ```
> > notmuch search --query=sexp '(thread (count 115)))'
> > thread:0000000000021229 2021-05-17 [5/5] Michael J Gruber ... redacted
> > notmuch count --exclude=false thread:0000000000021229
> > 5
> > ```
> > It could be some database issues, of course. Or me misunderstanding something :)
>
> Hmm. I don't see any strange matches for that particular query, just a
> thread that actually has 115 messages. But there could also be bugs of
> course. Does xapian-check complain about your database?

It has 5, as confirmed by the search output and that of `notmuch
count`. But it is matched by `count 115`.
`xapian-check` is happy. (There used to be some issue with additional
thread entries at some point.)

Michael

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Proof of concept for counting messages in thread 2023-02-13 17:03 ` Michael J Gruber @ 2023-02-13 20:23 ` David Bremner 2023-02-13 22:36 ` Michael J Gruber 0 siblings, 1 reply; 12+ messages in thread From: David Bremner @ 2023-02-13 20:23 UTC (permalink / raw) To: Michael J Gruber; +Cc: notmuch

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

>
> It has 5, as confirmed by the search output and that of `notmuch
> count`. But it is matched by `count 115`.
> `xapian-check` is happy. (There used to be some issue with additional
> thread entries at some point.)
>
> Michael

A simple test to try is

% xapian-delve -t G0000000000021229 \
    ~/.local/share/notmuch/default/xapian

adjusting your database path as needed.

If that says "termfreq 115", then something is broken (or at least
confusing) about your database (possibly related to the previous issues
with threading). In that case I'm curious if there are 115 distinct
record numbers. You can find all of the thread-ids attached to a given
message with

% xapian-delve -1r 267585 ~/.local/share/notmuch/default/xapian | grep ^G

where 267585 is an example record number in my database.

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Proof of concept for counting messages in thread 2023-02-13 20:23 ` David Bremner @ 2023-02-13 22:36 ` Michael J Gruber 2023-02-14 1:47 ` David Bremner 0 siblings, 1 reply; 12+ messages in thread From: Michael J Gruber @ 2023-02-13 22:36 UTC (permalink / raw) To: David Bremner; +Cc: notmuch

On Mon, 13 Feb 2023 at 21:23, David Bremner <david@tethera.net> wrote:
>
> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
> >
> > It has 5, as confirmed by the search output and that of `notmuch
> > count`. But it is matched by `count 115`.
> > `xapian-check` is happy. (There used to be some issue with additional
> > thread entries at some point.)
> >
> > Michael
>
> A simple test to try is
>
> % xapian-delve -t G0000000000021229 \
>     ~/.local/share/notmuch/default/xapian
>
> adjusting your database path as needed.
>
> If that says "termfreq 115", then something is broken (or at least
> confusing) about your database (possibly related to the previous issues
> with threading). In that case I'm curious if there are 115 distinct
> record numbers. You can find all of the thread-ids attached to a given
> message with
>
> % xapian-delve -1r 267585 ~/.local/share/notmuch/default/xapian | grep ^G
>
> where 267585 is an example record number in my database.

That is really weird:
```
xapian-delve -t G0000000000021229 .
Posting List for term 'G0000000000021229' (termfreq 115, collfreq 0,
wdf_max 0): 146259 ...
```
with 115 record numbers, all different.
Doing `xapian-delve -1r` for each of them and grepping for the G-lines
gives 115 times that correct thread id.
Grepping for the Q-lines and notmuch-searching for the message ids
gives only 5 results (the expected ones). Apparently, there are bogus
mail records which that thread points to.

I guess I should recreate the db, if I only knew how lieer deals with
a reindexed mail store ... (The thread and the 5 messages sit in an
mbsynced folder, but lieer syncs other folders with that same db).

Michael

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Proof of concept for counting messages in thread 2023-02-13 22:36 ` Michael J Gruber @ 2023-02-14 1:47 ` David Bremner 2023-02-18 17:47 ` Michael J Gruber 0 siblings, 1 reply; 12+ messages in thread From: David Bremner @ 2023-02-14 1:47 UTC (permalink / raw) To: Michael J Gruber; +Cc: notmuch

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

> That is really weird:
> ```
> xapian-delve -t G0000000000021229 .
> Posting List for term 'G0000000000021229' (termfreq 115, collfreq 0,
> wdf_max 0): 146259 ...
> ```
> with 115 record numbers, all different.
> Doing `xapian-delve -1r` for each of them and grepping for the G-lines
> gives 115 times that correct thread id.
> Grepping for the Q-lines and notmuch-searching for the message ids
> gives only 5 results (the expected ones). Apparently, there are bogus
> mail records which that thread points to.

1) Do those "bogus" records have a "Tghost" term? That would be for
messages that are known via references, but not actually in the local
database. This is a bug / feature of the current implementation: it
counts all messages known, whether or not local copies exist.

2) Do they have more than one G term? That suggests a bug somewhere. We
actually have a test in the test suite [1] for that, but of course that is
with a simple artificial database.

[1]: in T670-duplicate-mid.sh:

db=$HOME/.local/share/notmuch/default/xapian
for doc in $(xapian-delve -1 -t '' "$db" | grep '^[1-9]'); do
    xapian-delve -1 -r "$doc" "$db" | grep -c '^G'
done > OUTPUT.raw
sort -u < OUTPUT.raw > OUTPUT

^ permalink raw reply [flat|nested] 12+ messages in thread
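Point 1 above explains the 115-versus-5 discrepancy: the frequency of a thread's G term counts every record attached to the thread, ghosts included, while the count shown by `notmuch search` only covers messages that exist locally. A toy model of that difference is sketched below. The `record` struct is a simplified stand-in with hypothetical field names and does not reflect notmuch's actual schema.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of a database record; not notmuch's actual schema.
struct record {
    std::string thread_id;  // plays the role of the G term
    bool ghost;             // true when the record is a ghost (Tghost, not Tmail)
};

struct thread_counts {
    long termfreq;  // every record attached to the thread, ghosts included
    long mail;      // only records backed by a local message
};

// Count the records of one thread both ways.
thread_counts
count_thread (const std::vector<record> &records, const std::string &thread_id)
{
    thread_counts counts = { 0, 0 };
    for (const record &r : records) {
        if (r.thread_id != thread_id)
            continue;
        counts.termfreq++;
        if (! r.ghost)
            counts.mail++;
    }
    return counts;
}
```

With 5 mail records and 110 ghost records in one thread, `termfreq` comes out as 115 and `mail` as 5, which matches the numbers reported in this thread.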
* Re: Proof of concept for counting messages in thread 2023-02-14 1:47 ` David Bremner @ 2023-02-18 17:47 ` Michael J Gruber 2023-02-19 13:04 ` David Bremner 0 siblings, 1 reply; 12+ messages in thread From: Michael J Gruber @ 2023-02-18 17:47 UTC (permalink / raw) To: David Bremner; +Cc: notmuch

On Tue, 14 Feb 2023 at 02:47, David Bremner <david@tethera.net> wrote:
>
> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
> > That is really weird:
> > ```
> > xapian-delve -t G0000000000021229 .
> > Posting List for term 'G0000000000021229' (termfreq 115, collfreq 0,
> > wdf_max 0): 146259 ...
> > ```
> > with 115 record numbers, all different.
> > Doing `xapian-delve -1r` for each of them and grepping for the G-lines
> > gives 115 times that correct thread id.
> > Grepping for the Q-lines and notmuch-searching for the message ids
> > gives only 5 results (the expected ones). Apparently, there are bogus
> > mail records which that thread points to.
>
> 1) Do those "bogus" records have a "Tghost" term? That would be for
> messages that are known via references, but not actually in the local
> database. This is a bug / feature of the current implementation, it
> counts all messages known, whether or not local copies exist.

Yes, the extra ones all are ghosts, and I slowly remember that they
scared me in the past already ...

These ghosts appear to be pretty common. It happens all the time that
I am joined to an existing discussion thread where I do not have all
references. I'd go as far as to say that counting ghosts as thread
members makes this useless for me. On the other hand, notmuch's own
count gets this right. And getting different counts is even more
confusing.

> 2) Do they have more than one G term? That suggests a bug somewhere. We
> actually have a test in the test suite [1] for that, but of course that is
> with a simple artificial database.

No, they all have one. But their sheer number looks suspicious: those
5 "real" e-mails have maybe 20 reference headers in total, and some of
them refer to some of those 5. Grepping the account store for those
references gives me around that number. Where do the 110 ghosts (90
extra) come from which this thread points to? Still scared by them ...
we need ghost busters!

Michael

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Proof of concept for counting messages in thread 2023-02-18 17:47 ` Michael J Gruber @ 2023-02-19 13:04 ` David Bremner 2023-02-19 13:56 ` David Bremner 0 siblings, 1 reply; 12+ messages in thread From: David Bremner @ 2023-02-19 13:04 UTC (permalink / raw) To: Michael J Gruber; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 2798 bytes --]

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:

>
> Yes, the extra ones all are ghosts, and I slowly remember that they
> scared me in the past already ...
>
> These ghosts appear to be pretty common. It happens all the time that
> I am joined to an existing discussion thread where I do not have all
> references.

I have about 8% ghost messages in my 730k messages. I don't think I have
any situation as extreme as you do with hundreds of ghost messages for a
small number of actual messages in thread. If you would like to
calculate the ratio for your mail store, you can run

% xapian-delve -v -A Tghost ~/.local/share/notmuch/default/xapian
% xapian-delve -v -A Tmail ~/.local/share/notmuch/default/xapian

> I'd go as far as to say that counting ghosts as thread
> members makes this useless for me. On the other hand, notmuch's own
> count gets this right. And getting different counts is even more
> confusing.

The count shown in e.g. notmuch search is calculated after the query has
been run, so it isn't easily usable as part of a query. Maybe there is a
way to trade off some performance for fewer false positives. In
principle we could do a query for each thread found by the current
technique to postprocess the results. I can see that getting pretty slow
if there are many results though.

At least for the original motivation of looking for messages without
replies, counting ghost messages makes some sense. In general it also
makes sense for finding large threads. I did the query
'(thread (count 200 *))' on my mail store and most matches are genuinely
large threads. A few are false positives like the one you describe. In my
case it is easy to see where the ghosts come from, as the (spam)
messages have hundreds of (presumably fictional) references.

>
>> 2) Do they have more than one G term? That suggests a bug somewhere. We
>> actually have a test in the test suite [1] for that, but of course that is
>> with a simple artificial database.
>
> No, they all have one. But their sheer number looks suspicious: those
> 5 "real" e-mails have maybe 20 reference headers in total, and some of
> them refer to some of those 5. Grepping the account store for those
> references gives me around that number. Where do the 110 ghosts (90
> extra) come from which this thread points to? Still scared by them ...
> we need ghost busters!

The only information attached to a ghost message is the thread-id and
the message-id. You can get a visual picture of the thread with the
attached script. But that will probably just confirm what you did with
grep.

To see what is in the database, you can run

% quest -btype:T -bthread:G -d mail/.notmuch/xapian "type:ghost and thread:0000000000000002"

That gives you record numbers that you can examine with xapian-delve -r.

[-- Attachment #2: draw-thread --]
[-- Type: application/octet-stream, Size: 912 bytes --]

#!/bin/bash

# This script can be used like
# NOTMUCH_CONFIG=test/tmp.T580-thread-search/notmuch-config \
#     devel/draw-thread thread:0000000000000002 | dot -Tpdf > thread2.pdf

# In addition to notmuch, you will need the following tools installed
# - graphviz
# - formail (part of procmail)

threadid=$1
declare -a edges
declare -a dest

echo "digraph \"$threadid\" {"
for messageid in $(notmuch search --output=messages $threadid); do
    echo "subgraph \"cluster_$messageid\" {"
    printf "\"%s\" [shape=folder];\n" ${messageid#id:}
    for file in $(notmuch search --output=files $messageid); do
        node=$(basename $file)
        printf "\"%s\" [shape=note];\n" $node
        mapfile -t dest < <(formail -x references < $file | tr '<>,' '"" ')
        edge="\"$node\" -> { ${dest[*]} }"
        edges+=($edge)
    done
    echo "}"
done

for edge in "${edges[*]}"; do
    echo $edge
done
echo "}"

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 12+ messages in thread
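The message above floats the idea of post-processing: run the term-frequency query first, then do one follow-up query per candidate thread to drop false positives caused by ghosts. A rough sketch of that second pass follows. It is only an illustration of the trade-off, not code from notmuch: `filter_by_local_count` is a hypothetical name, and the `local_count` callback stands in for whatever per-thread query would return the number of locally present messages.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical post-processing pass: keep only the candidate threads whose
// count of locally present messages also falls inside [from, to].
std::vector<std::string>
filter_by_local_count (const std::vector<std::string> &candidates, long from, long to,
                       const std::function<long (const std::string &)> &local_count)
{
    std::vector<std::string> kept;
    for (const std::string &thread_id : candidates) {
        // One extra lookup per candidate; this is the cost of the second pass.
        long n = local_count (thread_id);
        if (from <= n && n <= to)
            kept.push_back (thread_id);
    }
    return kept;
}
```

The cost grows linearly with the number of candidates, which is the slowdown the message anticipates when a query matches many threads.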
* Re: Proof of concept for counting messages in thread 2023-02-19 13:04 ` David Bremner @ 2023-02-19 13:56 ` David Bremner 0 siblings, 0 replies; 12+ messages in thread From: David Bremner @ 2023-02-19 13:56 UTC (permalink / raw) To: Michael J Gruber; +Cc: notmuch

David Bremner <david@tethera.net> writes:

> Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:
>
>>
>> Yes, the extra ones all are ghosts, and I slowly remember that they
>> scared me in the past already ...
>>
>> These ghosts appear to be pretty common. It happens all the time that
>> I am joined to an existing discussion thread where I do not have all
>> references.
>
> I have about 8% ghost messages in my 730k messages. I don't think I have
> any situation as extreme as you do with hundreds of ghost messages for a
> small number of actual messages in thread.

That turns out to be a lie, as I wrote below.

^ permalink raw reply [flat|nested] 12+ messages in thread