* missing messages in thread overview @ 2020-12-03 4:59 Kyle Meyer 2020-12-03 20:19 ` Eric Wong 0 siblings, 1 reply; 5+ messages in thread From: Kyle Meyer @ 2020-12-03 4:59 UTC (permalink / raw) To: meta I noticed a thread today that is missing messages in the thread overview. In the top-level inbox, all the messages are there: [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled 2020-12-02 11:17 UTC (11+ messages) - mbox.gz / Atom ` [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test ` [bug#45000] [PATCH 2/9] gnu: Add ghc-wai-app-static ` [bug#45000] [PATCH 3/9] gnu: Add ghc-yesod-static ` [bug#45000] [PATCH 4/9] gnu: Add ghc-cryptonite-conduit ` [bug#45000] [PATCH 5/9] gnu: ghc-yesod-core: Update to 1.6.18.6 ` [bug#45000] [PATCH 6/9] gnu: Add ghc-hjsmin ` [bug#45000] [PATCH 7/9] gnu: Add ghc-template-haskell ` [bug#45000] [PATCH 8/9] gnu: Add ghc-boot-th ` [bug#45000] [PATCH 9/9] gnu: git-annex: Build with git-annex assistant webapp enabled However, when I click through, the thread overview is missing five messages: Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-12-02 4:53 [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 2/9] gnu: Add ghc-wai-app-static Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 7/9] gnu: Add ghc-template-haskell Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 9/9] gnu: git-annex: Build with git-annex assistant webapp enabled Jonathan Frederickson 2020-12-02 11:17 ` [bug#45000] [PATCH 0/9] Build git-annex with " zimoun https://yhetil.org/guix-patches/86sg8o1mou.fsf@gmail.com/T/#t The same subset of messages are in the thread's mbox: $ curl -fSsL https://yhetil.org/guix-patches/86sg8o1mou.fsf@gmail.com/t.mbox.gz \ | gzip -d | grep -i ^subject Subject: [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled Subject: [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test. Subject: [bug#45000] [PATCH 7/9] gnu: Add ghc-template-haskell. Subject: [bug#45000] [PATCH 9/9] gnu: git-annex: Build with git-annex Subject: [bug#45000] [PATCH 2/9] gnu: Add ghc-wai-app-static. Subject: [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled I collected the complete thread (currently eleven messages) and used scripts/import_vger_from_mbox to create a separate inbox at <https://yhetil.org/debug-gap/>. As before, the top-level listing shows all the patches, and the thread overview is still missing some, though different ones: 2020-12-02 4:53 [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled Jonathan Frederickson 2020-12-02 11:17 ` zimoun 2020-12-02 4:55 [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 3/9] gnu: Add ghc-yesod-static Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 4/9] gnu: Add ghc-cryptonite-conduit Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 5/9] gnu: ghc-yesod-core: Update to 1.6.18.6 Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 6/9] gnu: Add ghc-hjsmin Jonathan Frederickson 2020-12-02 4:55 ` [bug#45000] [PATCH 8/9] gnu: Add ghc-boot-th Jonathan Frederickson On rethreading with `public-inbox-index -c --reindex --rethread', the thread overview looks like this: 2020-12-02 4:53 [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled Jonathan Frederickson 2020-12-02 11:17 ` zimoun All of the above is with a recent commit on master, v1.6.0-149-g4cc4ad59. Any ideas what might be going on here? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: missing messages in thread overview 2020-12-03 4:59 missing messages in thread overview Kyle Meyer @ 2020-12-03 20:19 ` Eric Wong 2020-12-04 2:12 ` [WIP] over: ensure old, merged {tid} is really gone Eric Wong 0 siblings, 1 reply; 5+ messages in thread From: Eric Wong @ 2020-12-03 20:19 UTC (permalink / raw) To: Kyle Meyer; +Cc: meta Kyle Meyer <kyle@kyleam.com> wrote: > <https://yhetil.org/debug-gap/>. Thanks, I'm able to reproduce it. <snip> > Any ideas what might be going on here? It seems like the presence of two ghosts in the histories fails to get merged together; so it's order-dependency bug in OverIdx.pm rethread seems to be exacerbating the condition (less sure about rethread, though). Basically, there should be only one distinct `tid' column in the `over' table, but there's two tids and that's causing over->get_thread to only see one tid or the other. The top-level inbox view doesn't care about `tid' column, it just fetches the most recent messages; so it's able to get a complete view to render all the messages in the thread. Will try to work on it more in a bit (struggling to decide between starvation or possible COVID-19 exposure... :<) ^ permalink raw reply [flat|nested] 5+ messages in thread
* [WIP] over: ensure old, merged {tid} is really gone. 2020-12-03 20:19 ` Eric Wong @ 2020-12-04 2:12 ` Eric Wong 2020-12-04 3:35 ` Kyle Meyer 0 siblings, 1 reply; 5+ messages in thread From: Eric Wong @ 2020-12-04 2:12 UTC (permalink / raw) To: Kyle Meyer; +Cc: meta Eric Wong <e@80x24.org> wrote: > It seems like the presence of two ghosts in the histories fails > to get merged together; so it's order-dependency bug in > OverIdx.pm rethread seems to be exacerbating the condition (less > sure about rethread, though). Yes, the fix is quite small (I think the below test case can be made smaller). --rethread seems to be a separate bug, will fix when more awake. -----------------8<-------------- Subject: [PATCH] over: ensure old, merged {tid} is really gone. We must use the result of link_refs() since it can trigger merge_threads() and invalidate $old_tid. In case merge_threads() isn't triggered, link_refs() will return $old_tid anyways. --- MANIFEST | 1 + lib/PublicInbox/OverIdx.pm | 2 +- t/thread-index-gap.t | 94 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 96 insertions(+), 1 deletion(-) create mode 100644 t/thread-index-gap.t diff --git a/MANIFEST b/MANIFEST index 544ec5f9..946e4b8a 100644 --- a/MANIFEST +++ b/MANIFEST @@ -366,6 +366,7 @@ t/solver_git.t t/spamcheck_spamc.t t/spawn.t t/thread-cycle.t +t/thread-index-gap.t t/time.t t/uri_imap.t t/utf8.eml diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm index 07cca4e5..ac53518c 100644 --- a/lib/PublicInbox/OverIdx.pm +++ b/lib/PublicInbox/OverIdx.pm @@ -298,7 +298,7 @@ sub _add_over { } } elsif ($n < 0) { # ghost $$old_tid //= $cur_valid ? $cur_tid : next_tid($self); - link_refs($self, $refs, $$old_tid); + $$old_tid = link_refs($self, $refs, $$old_tid); delete_by_num($self, $n); $$v++; } diff --git a/t/thread-index-gap.t b/t/thread-index-gap.t new file mode 100644 index 00000000..1772ce22 --- /dev/null +++ b/t/thread-index-gap.t @@ -0,0 +1,94 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use v5.10.1; +use Test::More; +use PublicInbox::TestCommon; +use PublicInbox::Eml; +use PublicInbox::InboxWritable; +use PublicInbox::Config; +require_mods(qw(DBD::SQLite)); +require_git(2.6); +my ($home, $for_destroy) = tmpdir(); +local $ENV{HOME} = $home; +ok(run_script([qw(-init -V2 index-gap), "$home/index-gap", + qw(http://example.com/v2test index-gap@example.com)]), 'init'); +my $pi_cfg = PublicInbox::Config->new; +my $ibx = $pi_cfg->lookup_name('index-gap'); +PublicInbox::InboxWritable->new($ibx); + +chomp(my @msgs = reverse(split(/\n\n/, <<'EOF'))); +Subject: [bug#45000] [PATCH 2/9] +Message-Id: <20201202045540.31248-2-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 9/9] +Message-Id: <20201202045540.31248-9-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 7/9] +Message-Id: <20201202045540.31248-7-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 1/9] +References: <20201202045335.31096-1-j@example.com> +In-Reply-To: <20201202045335.31096-1-j@example.com> +Message-Id: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 0/9] +Message-Id: <20201202045335.31096-1-j@example.com> + +Subject: [bug#45000] [PATCH 0/9] +In-Reply-To: <20201202045335.31096-1-j@example.com> +References: <20201202045335.31096-1-j@example.com> +Message-ID: <86sg8o1mou.fsf@example.com> + +Subject: [bug#45000] [PATCH 8/9] +Message-Id: <20201202045540.31248-8-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 6/9] +Message-Id: <20201202045540.31248-6-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 5/9] +Message-Id: <20201202045540.31248-5-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 4/9] +Message-Id: <20201202045540.31248-4-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 3/9] +Message-Id: <20201202045540.31248-3-j@example.com> +In-Reply-To: <20201202045540.31248-1-j@example.com> +References: <20201202045540.31248-1-j@example.com> +EOF + +my $im = $ibx->importer(0); +for my $msg (@msgs) { + $im->add(PublicInbox::Eml->new("$msg\nFrom: x\@example.com\n\n")); +} +$im->done; + +my @tid = $ibx->over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over'); +is(scalar(@tid), 1, 'only one thread'); + +if (0 && 'FIXME') { + $ibx->over->dbh_close; + ok(run_script([qw(-index --reindex --rethread -v), $ibx->{inboxdir}]), + 'rethread'); + @tid = $ibx->over->dbh->selectall_array( + 'SELECT DISTINCT(tid) FROM over'); + is(scalar(@tid), 1, 'only one thread after rethread'); +} + +done_testing; ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [WIP] over: ensure old, merged {tid} is really gone. 2020-12-04 2:12 ` [WIP] over: ensure old, merged {tid} is really gone Eric Wong @ 2020-12-04 3:35 ` Kyle Meyer 2020-12-04 12:09 ` [PATCH] " Eric Wong 0 siblings, 1 reply; 5+ messages in thread From: Kyle Meyer @ 2020-12-04 3:35 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong writes: > Yes, the fix is quite small (I think the below test case can be > made smaller). > > --rethread seems to be a separate bug, will fix when more awake. Great, thank you for the explanation and fix. By the way, I've been enjoying playing with the extindex feature. I haven't been doing anything fancy and it's just a few inboxes, but it's been _really_ nice to be able to group the inboxes for a project. Anyway, off topic for this thread, but thanks for that too! ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] over: ensure old, merged {tid} is really gone 2020-12-04 3:35 ` Kyle Meyer @ 2020-12-04 12:09 ` Eric Wong 0 siblings, 0 replies; 5+ messages in thread From: Eric Wong @ 2020-12-04 12:09 UTC (permalink / raw) To: Kyle Meyer; +Cc: meta Kyle Meyer <kyle@kyleam.com> wrote: > Eric Wong writes: > > > Yes, the fix is quite small (I think the below test case can be > > made smaller). > > > > --rethread seems to be a separate bug, will fix when more awake. > > Great, thank you for the explanation and fix. No problem, thanks for the excellent bug report. Below patch fixes the rethread case, too, which has the same conceptual problem in it was leaving a stale {tid} in the DB. Also made the test data smaller and a shuffle test to ensure it's truly order-independent. > By the way, I've been enjoying playing with the extindex feature. I > haven't been doing anything fancy and it's just a few inboxes, but it's > been _really_ nice to be able to group the inboxes for a project. > Anyway, off topic for this thread, but thanks for that too! Great to know. Working on some other stuff in that area to allow selecting inbox subsets and (hopefully) reduce SSD wear. -------------------8<--------------- Subject: [PATCH] over: ensure old, merged {tid} is really gone We must use the result of link_refs() since it can trigger merge_threads() and invalidate $old_tid. In case merge_threads() isn't triggered, link_refs() will return $old_tid anyways. When rethreading and allocating new {tid}, we also must update the row where the now-expired {tid} came from to ensure only the new {tid} is seen when reindexing subsequent messages in history. Otherwise, every subsequently reindexed+rethreaded message could end up getting a new {tid}. Reported-by: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/87360nlc44.fsf@kyleam.com/ --- MANIFEST | 1 + lib/PublicInbox/OverIdx.pm | 12 ++++++-- t/thread-index-gap.t | 57 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 67 insertions(+), 3 deletions(-) create mode 100644 t/thread-index-gap.t diff --git a/MANIFEST b/MANIFEST index 544ec5f9..946e4b8a 100644 --- a/MANIFEST +++ b/MANIFEST @@ -366,6 +366,7 @@ t/solver_git.t t/spamcheck_spamc.t t/spawn.t t/thread-cycle.t +t/thread-index-gap.t t/time.t t/uri_imap.t t/utf8.eml diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm index 07cca4e5..88daa64f 100644 --- a/lib/PublicInbox/OverIdx.pm +++ b/lib/PublicInbox/OverIdx.pm @@ -170,8 +170,14 @@ sub _resolve_mid_to_tid { $$tid = $cur_tid; } else { # rethreading, queue up dead ghosts $$tid = next_tid($self); - my $num = $smsg->{num}; - push(@{$self->{-ghosts_to_delete}}, $num) if $num < 0; + my $n = $smsg->{num}; + if ($n > 0) { + $self->{dbh}->prepare_cached(<<'')->execute($$tid, $n); +UPDATE over SET tid = ? WHERE num = ? + + } elsif ($n < 0) { + push(@{$self->{-ghosts_to_delete}}, $n); + } } 1; } @@ -298,7 +304,7 @@ sub _add_over { } } elsif ($n < 0) { # ghost $$old_tid //= $cur_valid ? $cur_tid : next_tid($self); - link_refs($self, $refs, $$old_tid); + $$old_tid = link_refs($self, $refs, $$old_tid); delete_by_num($self, $n); $$v++; } diff --git a/t/thread-index-gap.t b/t/thread-index-gap.t new file mode 100644 index 00000000..49f254e9 --- /dev/null +++ b/t/thread-index-gap.t @@ -0,0 +1,57 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use v5.10.1; +use Test::More; +use PublicInbox::TestCommon; +use PublicInbox::Eml; +use PublicInbox::InboxWritable; +use PublicInbox::Config; +use List::Util qw(shuffle); +require_mods(qw(DBD::SQLite)); +require_git(2.6); + +chomp(my @msgs = split(/\n\n/, <<'EOF')); # "git log" order +Subject: [bug#45000] [PATCH 1/9] +References: <20201202045335.31096-1-j@example.com> +Message-Id: <20201202045540.31248-1-j@example.com> + +Subject: [bug#45000] [PATCH 0/9] +Message-Id: <20201202045335.31096-1-j@example.com> + +Subject: [bug#45000] [PATCH 0/9] +References: <20201202045335.31096-1-j@example.com> +Message-ID: <86sg8o1mou.fsf@example.com> + +Subject: [bug#45000] [PATCH 8/9] +Message-Id: <20201202045540.31248-8-j@example.com> +References: <20201202045540.31248-1-j@example.com> + +EOF + +my ($home, $for_destroy) = tmpdir(); +local $ENV{HOME} = $home; +for my $msgs (['orig', reverse @msgs], ['shuffle', shuffle(@msgs)]) { + my $desc = shift @$msgs; + my $n = "index-cap-$desc"; + run_script([qw(-init -L basic -V2), $n, "$home/$n", + "http://example.com/$n", "$n\@example.com"]) or + BAIL_OUT 'init'; + my $ibx = PublicInbox::Config->new->lookup_name($n); + my $im = PublicInbox::InboxWritable->new($ibx)->importer(0); + for my $m (@$msgs) { + $im->add(PublicInbox::Eml->new("$m\nFrom: x\@example.com\n\n")); + } + $im->done; + my $over = $ibx->over; + my @tid = $over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over'); + is(scalar(@tid), 1, "only one thread initially ($desc)"); + $over->dbh_close; + run_script([qw(-index --reindex --rethread), $ibx->{inboxdir}]) or + BAIL_OUT 'rethread'; + @tid = $over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over'); + is(scalar(@tid), 1, "only one thread after rethread ($desc)"); +} + +done_testing; ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-12-04 12:09 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-12-03 4:59 missing messages in thread overview Kyle Meyer 2020-12-03 20:19 ` Eric Wong 2020-12-04 2:12 ` [WIP] over: ensure old, merged {tid} is really gone Eric Wong 2020-12-04 3:35 ` Kyle Meyer 2020-12-04 12:09 ` [PATCH] " Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).