Hi there After updating to notmuch 0.32 I noticed a few strange threadings that went away after downgrading to 0.31.3 and reindexing. I can't pin-point that yet but wanted to give an early warning so you can look out for that, too. My setup: mail is fetched with mbsync and indexed with "notmuch new" (with afew run from pre and post hooks in move and tag mode, using the notmuch bindings). "Strange threadings": Unrelated mails without relevant headers (references, in-reply-to) from different folders end up in the same thread with nm 0.32; reindexing the thread with nm 0.31 puts them in different threads as expected. I am using a mix of file and db config as is usual pre 0.32 but that should not affect threading, I think. Michael
Michael J Gruber <git@grubix.eu> writes:
> After updating to notmuch 0.32 I noticed a few strange threadings that
> went away after downgrading to 0.31.3 and reindexing. I can't pin-point
> that yet but wanted to give an early warning so you can look out for
> that, too.
I'm not aware of any threading relevant changes in 0.32, but there was a
lot of churn in the library. If you could bisect or give a reproducer
that would be great.
David Bremner venit, vidit, dixit 2021-05-05 14:29:38:
> Michael J Gruber <git@grubix.eu> writes:
>
> > After updating to notmuch 0.32 I noticed a few strange threadings that
> > went away after downgrading to 0.31.3 and reindexing. I can't pin-point
> > that yet but wanted to give an early warning so you can look out for
> > that, too.
>
> I'm not aware of any threading relevant changes in 0.32, but there was a
> lot of churn in the library. If you could bisect or give a reproducer
> that would be great.
I could not reproduce by reindexing with nm 0.32, so I started using nm
0.32 again. Indeed, I got another mis-threading. I reindexed with nm 0.32
(without downgrading) and the thread got separated correctly again. That
explains why I was not able to create a reproducer. So it seems:
- The mis-threading happens during `notmuch new`, not with `notmuch
reindex`.
- In this new case (and if I remember correctly also in the others),
it's always a new message getting worngly put into an existing thread,
and if I'm not mistaken, the existing thread was tagged as trash
before in all cases.
Michael
Michael J Gruber <git@grubix.eu> writes:
> [...]
> So it seems:
>
> - The mis-threading happens during `notmuch new`, not with `notmuch
> reindex`.
> - In this new case (and if I remember correctly also in the others),
> it's always a new message getting worngly put into an existing thread,
> and if I'm not mistaken, the existing thread was tagged as trash
> before in all cases.
> [...]
I can confirm both observations.
--alexander
Alexander Adolf <alexander.adolf@condition-alpha.com> writes:
> Michael J Gruber <git@grubix.eu> writes:
>
>> [...]
>> So it seems:
>>
>> - The mis-threading happens during `notmuch new`, not with `notmuch
>> reindex`.
>> - In this new case (and if I remember correctly also in the others),
>> it's always a new message getting worngly put into an existing thread,
>> and if I'm not mistaken, the existing thread was tagged as trash
>> before in all cases.
>> [...]
>
> I can confirm both observations.
> [...]
p.s.: Just got the weird threading with `notmuch reindex`, too.
Alexander Adolf venit, vidit, dixit 2021-05-11 16:32:22:
> Alexander Adolf <alexander.adolf@condition-alpha.com> writes:
>
> > Michael J Gruber <git@grubix.eu> writes:
> >
> >> [...]
> >> So it seems:
> >>
> >> - The mis-threading happens during `notmuch new`, not with `notmuch
> >> reindex`.
> >> - In this new case (and if I remember correctly also in the others),
> >> it's always a new message getting worngly put into an existing thread,
> >> and if I'm not mistaken, the existing thread was tagged as trash
> >> before in all cases.
> >> [...]
> >
> > I can confirm both observations.
> > [...]
>
> p.s.: Just got the weird threading with `notmuch reindex`, too.
Oh my gosh... This is getting interesting.
I'm delving (literally) into the xapian db now. "Put into an existing
thread" (what I had wiritten) was not correct in terms of thread IDs.
What's happening is the following:
I have an existing message A which is tagged as trash.
A is the only message in thread 0000000000021144.
A new message B is put in the db by "notmuch new".
Notmuch correctly creates a new thread 0000000000021148 (the next
avalaible id) and puts B in this new thread.
G0000000000021148 is the only thread term in the db for the document
belonging to message B.
So far so good, but: The document for message A has three thread terms
now:
G0000000000021144 G0000000000021148 G0000000000021149
Note that neither A nor B have any in-reply-to or references header.
AFAIK multiple thread terms on a single message document are a complete
no-go and indicate a problem, especially when an unrelated existing message's
entry is touched.
notmuch search --exclude=false thread:0000000000021148 lists both A and
B now, of course.
The third one, G0000000000021149, is completely weird. It leads to yet
another message document with multiple thread entries.
Looking at a few of the most recent messages, my suspicion is:
- A new message with in-reply-to/references get's a single (existing)
thread term correctly.
- A new message without in-reply-to/references get's the correct new
thread term; in addition, this get's assigned to some random existing
message by *adding* it to the list of terms, thereby making that
message part of multiple threads.
I have not checked systematically yet whether it (the multi-G-terms)
indeed affects Ktrash ones only.
Michael
... just a guess: Could it be that a9f74aee ("CLI/new: drop the write lock to run the pre-new hook.", 2021-03-18) was not enough? notmuch_database_reopen() only reopens the xapian db but does not update other members in notmuch_database_t *notmuch, such as the last doc id and thread id. If a pre-merge hook changes the database then values in that struct will be out of date. Before the config changes in nm 0.32, there was no such struct to begin with. After that, notmuch holds the struct just to be able to run the hook (from the proper dir). So I would think that reopen()ing to MODE_READ_ONLY is not a problem. But after the hook run, are full close() then open() to MODE_READ_WRITE() is necessary so that the values in the struct are correct (or change reopen() to do that). Indeed, if you open in MODE_READ_ONLY and don't hold a write lock you cannot trust any cached values such as those stored in the struct, can you? Michael
Michael J Gruber <git@grubix.eu> writes:
> ... just a guess: Could it be that
>
> a9f74aee ("CLI/new: drop the write lock to run the pre-new hook.", 2021-03-18)
>
> was not enough?
>
> notmuch_database_reopen() only reopens the xapian db but does not update
> other members in notmuch_database_t *notmuch, such as the last doc id
> and thread id.
That sounds plausible. By happy coincidence I prepared a patch for this
on the weekend as part of a different series. I will send the patch as a
followup. It would be nice to have a regression test for this, but I'm
not sure how easy it is to reproduce deterministically.
d
In some uses of reopen, new documents and threads maybe have been added, and e.g. compaction may have changed the uuid. --- lib/open.cc | 58 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 33 insertions(+), 25 deletions(-) diff --git a/lib/open.cc b/lib/open.cc index bdb695fe..0ceca921 100644 --- a/lib/open.cc +++ b/lib/open.cc @@ -325,6 +325,36 @@ _init_libs () } } +static void +_load_database_state (notmuch_database_t *notmuch) { + std::string last_thread_id; + std::string last_mod; + + notmuch->last_doc_id = notmuch->xapian_db->get_lastdocid (); + last_thread_id = notmuch->xapian_db->get_metadata ("last_thread_id"); + if (last_thread_id.empty ()) { + notmuch->last_thread_id = 0; + } else { + const char *str; + char *end; + + str = last_thread_id.c_str (); + notmuch->last_thread_id = strtoull (str, &end, 16); + if (*end != '\0') + INTERNAL_ERROR ("Malformed database last_thread_id: %s", str); + } + + /* Get current highest revision number. */ + last_mod = notmuch->xapian_db->get_value_upper_bound ( + NOTMUCH_VALUE_LAST_MOD); + if (last_mod.empty ()) + notmuch->revision = 0; + else + notmuch->revision = Xapian::sortable_unserialise (last_mod); + notmuch->uuid = talloc_strdup ( + notmuch, notmuch->xapian_db->get_uuid ().c_str ()); +} + static notmuch_status_t _finish_open (notmuch_database_t *notmuch, const char *profile, @@ -339,8 +369,6 @@ _finish_open (notmuch_database_t *notmuch, const char *database_path = notmuch_database_get_path (notmuch); try { - std::string last_thread_id; - std::string last_mod; if (mode == NOTMUCH_DATABASE_MODE_READ_WRITE) { notmuch->writable_xapian_db = new Xapian::WritableDatabase (notmuch->xapian_path, @@ -384,29 +412,7 @@ _finish_open (notmuch_database_t *notmuch, goto DONE; } - notmuch->last_doc_id = notmuch->xapian_db->get_lastdocid (); - last_thread_id = notmuch->xapian_db->get_metadata ("last_thread_id"); - if (last_thread_id.empty ()) { - notmuch->last_thread_id = 0; - } else { - const char *str; - char *end; - - str = last_thread_id.c_str (); - notmuch->last_thread_id = strtoull (str, &end, 16); - if (*end != '\0') - INTERNAL_ERROR ("Malformed database last_thread_id: %s", str); - } - - /* Get current highest revision number. */ - last_mod = notmuch->xapian_db->get_value_upper_bound ( - NOTMUCH_VALUE_LAST_MOD); - if (last_mod.empty ()) - notmuch->revision = 0; - else - notmuch->revision = Xapian::sortable_unserialise (last_mod); - notmuch->uuid = talloc_strdup ( - notmuch, notmuch->xapian_db->get_uuid ().c_str ()); + _load_database_state (notmuch); notmuch->query_parser = new Xapian::QueryParser; notmuch->term_gen = new Xapian::TermGenerator; @@ -733,6 +739,8 @@ notmuch_database_reopen (notmuch_database_t *notmuch, DB_ACTION); } } + + _load_database_state (notmuch); } catch (const Xapian::Error &error) { if (! notmuch->exception_reported) { _notmuch_database_log (notmuch, "Error: A Xapian exception reopening database: %s\n", -- 2.30.2
David Bremner <david@tethera.net> writes:
> In some uses of reopen, new documents and threads maybe have been
> added, and e.g. compaction may have changed the uuid.
It's quite possible there is more cached data that should be discarded,
but I'll have to think about that a bit more.
d
David Bremner venit, vidit, dixit 2021-05-11 19:47:15:
> David Bremner <david@tethera.net> writes:
>
> > In some uses of reopen, new documents and threads maybe have been
> > added, and e.g. compaction may have changed the uuid.
>
> It's quite possible there is more cached data that should be discarded,
> but I'll have to think about that a bit more.
>
> d
Haven't tried your patch yet, but worked on a test which even produces
notmuch: lib/database.cc:1415: unsigned int
_notmuch_database_generate_doc_id(notmuch_database_t*): Assertion
`notmuch->last_doc_id >= notmuch->xapian_db->get_lastdocid ()' failed.
by calling notmuch insert --no-hooks inside the pre-new hook of notmuch
new ... (T400). I have to test with nm 0.31 and nm 0.32+plus patch
before I send this out.
Michael
Due to the change in the config system, notmuch keeps a notmuch database open when it would not do so before. Consequently, it can miss changes to the database which are done from a hook (while notmuch holds the databse in read only mode). When notmuch itself writes to the database after that it uses wrong assumptions about the last used doc id etc. Demonstrate this by triggering an assertion. (This new test succeeds with notmuch 0.31.4.) Signed-off-by: Michael J Gruber <git@grubix.eu> --- test/T400-hooks.sh | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/test/T400-hooks.sh b/test/T400-hooks.sh index 00c99337..58e2b1dd 100755 --- a/test/T400-hooks.sh +++ b/test/T400-hooks.sh @@ -28,6 +28,16 @@ EOF echo "${TOKEN}" > ${2} } +create_change_hook () { + mkdir -p ${HOOK_DIR} + cat <<EOF >"${HOOK_DIR}/${1}" +#!/bin/sh +notmuch insert --no-hooks < ${2} > /dev/null +rm -f ${2} +EOF + chmod +x "${HOOK_DIR}/${1}" +} + create_failing_hook () { local HOOK_DIR=${2} mkdir -p ${HOOK_DIR} @@ -176,6 +186,17 @@ EOF NOTMUCH_NEW test_expect_equal_file write.expected write.output + test_begin_subtest "pre-new with change in database [${config}]" + test_subtest_known_broken + rm -rf ${HOOK_DIR} + notmuch search '*' > change.expected + generate_message '[subject]="Inserted"' + create_change_hook "pre-new" $gen_msg_filename $HOOK_DIR + generate_message + NOTMUCH_NEW + output=$(notmuch search id:$gen_msg_id | notmuch_search_sanitize) + test_expect_equal "$output" "thread:XXX 2001-01-05 [1/1] Notmuch Test Suite; pre-new with change in database [${config}] (inbox unread)" + rm -rf ${HOOK_DIR} done test_done -- 2.31.1.708.gc9a0ac0934
Michael J Gruber <git@grubix.eu> writes: > Due to the change in the config system, notmuch keeps a notmuch database > open when it would not do so before. Consequently, it can miss changes > to the database which are done from a hook (while notmuch holds the > databse in read only mode). When notmuch itself writes to the database > after that it uses wrong assumptions about the last used doc id etc. > > Demonstrate this by triggering an assertion. (This new test succeeds > with notmuch 0.31.4.) > > Signed-off-by: Michael J Gruber <git@grubix.eu> > --- > test/T400-hooks.sh | 21 +++++++++++++++++++++ > 1 file changed, 21 insertions(+) > + notmuch search '*' > change.expected That line looks like leftover debugging to me, so I deleted it. Otherwise it works fine, thanks for tracking down the bug and providing a test. d
David Bremner venit, vidit, dixit 2021-05-12 00:34:33:
> Michael J Gruber <git@grubix.eu> writes:
>
> > Due to the change in the config system, notmuch keeps a notmuch database
> > open when it would not do so before. Consequently, it can miss changes
> > to the database which are done from a hook (while notmuch holds the
> > databse in read only mode). When notmuch itself writes to the database
> > after that it uses wrong assumptions about the last used doc id etc.
> >
> > Demonstrate this by triggering an assertion. (This new test succeeds
> > with notmuch 0.31.4.)
> >
> > Signed-off-by: Michael J Gruber <git@grubix.eu>
> > ---
> > test/T400-hooks.sh | 21 +++++++++++++++++++++
> > 1 file changed, 21 insertions(+)
>
> > + notmuch search '*' > change.expected
>
> That line looks like leftover debugging to me, so I deleted
> it. Otherwise it works fine, thanks for tracking down the bug and
> providing a test.
Thanks for the fix! I didn't put it in the commit message (because
that commit might go before for your patch), but the test succeeds with
your fix.
Sorry for the debug line - typical case of re-editing without
re-format-patching...
I was wondering if it's worthwhile to test also:
- that the change from pre hook is persistent (test for that msg)
- test the same for post (even though the db is closed before)
Cheers
Michael
Michael J Gruber <git@grubix.eu> writes: > > I was wondering if it's worthwhile to test also: > - that the change from pre hook is persistent (test for that msg) I added this to the version of test that I just pushed. > - test the same for post (even though the db is closed before) > It wouldn't hurt, but I think it can wait until we get this point release out the door.
David Bremner <david@tethera.net> writes:
> David Bremner <david@tethera.net> writes:
>
>> In some uses of reopen, new documents and threads maybe have been
>> added, and e.g. compaction may have changed the uuid.
>
> It's quite possible there is more cached data that should be discarded,
> but I'll have to think about that a bit more.
>
The following are at least potential problems, although I think none of
them are as urgent as the doc_id / thread_id issue.
exception_reported should probably be reset to false
atomic_nesting, atomic_dirty. Re-opening while in an open atomic
section is currently undefined
behaviour, should be checked.
features Theoretically if the user runs notmuch-new in a pre-new hook,
the database could be upgraded. For now, I suggest not doing
that. This should be fixabable, but I prefer to take more time.
David Bremner venit, vidit, dixit 2021-05-12 14:07:33:
> David Bremner <david@tethera.net> writes:
>
> > David Bremner <david@tethera.net> writes:
> >
> >> In some uses of reopen, new documents and threads maybe have been
> >> added, and e.g. compaction may have changed the uuid.
> >
> > It's quite possible there is more cached data that should be discarded,
> > but I'll have to think about that a bit more.
> >
>
> The following are at least potential problems, although I think none of
> them are as urgent as the doc_id / thread_id issue.
>
> exception_reported should probably be reset to false
>
> atomic_nesting, atomic_dirty. Re-opening while in an open atomic
> section is currently undefined
> behaviour, should be checked.
>
> features Theoretically if the user runs notmuch-new in a pre-new hook,
> the database could be upgraded. For now, I suggest not doing
> that. This should be fixabable, but I prefer to take more time.
Out of curiosity:
What does reopening gain compared to a full close/open cycle?
Michael
Michael J Gruber <git@grubix.eu> writes:
> Out of curiosity:
>
> What does reopening gain compared to a full close/open cycle?
>
> Michael
It's a fair question. Reopen does not require the extra arguments for
configuration that notmuch_database_open_with_config takes, so it can be
used in places where those arguments are not available. In the
read-only => read-only case there is also a performance advantage, but
that might not be relevant anymore now that we are doing several
database reads.
d