unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* missing messages in thread overview
@ 2020-12-03  4:59 Kyle Meyer
  2020-12-03 20:19 ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Kyle Meyer @ 2020-12-03  4:59 UTC (permalink / raw)
  To: meta


I noticed a thread today that is missing messages in the thread
overview.  In the top-level inbox, all the messages are there:

  [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled
   2020-12-02 11:17 UTC  (11+ messages) - mbox.gz / Atom
  ` [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test
    ` [bug#45000] [PATCH 2/9] gnu: Add ghc-wai-app-static
    ` [bug#45000] [PATCH 3/9] gnu: Add ghc-yesod-static
    ` [bug#45000] [PATCH 4/9] gnu: Add ghc-cryptonite-conduit
    ` [bug#45000] [PATCH 5/9] gnu: ghc-yesod-core: Update to 1.6.18.6
    ` [bug#45000] [PATCH 6/9] gnu: Add ghc-hjsmin
    ` [bug#45000] [PATCH 7/9] gnu: Add ghc-template-haskell
    ` [bug#45000] [PATCH 8/9] gnu: Add ghc-boot-th
    ` [bug#45000] [PATCH 9/9] gnu: git-annex: Build with git-annex assistant webapp enabled

However, when I click through, the thread overview is missing five
messages:

  Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
  -- links below jump to the message on this page --
  2020-12-02  4:53 [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled Jonathan Frederickson
  2020-12-02  4:55 ` [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test Jonathan Frederickson
  2020-12-02  4:55   ` [bug#45000] [PATCH 2/9] gnu: Add ghc-wai-app-static Jonathan Frederickson
  2020-12-02  4:55   ` [bug#45000] [PATCH 7/9] gnu: Add ghc-template-haskell Jonathan Frederickson
  2020-12-02  4:55   ` [bug#45000] [PATCH 9/9] gnu: git-annex: Build with git-annex assistant webapp enabled Jonathan Frederickson
  2020-12-02 11:17 ` [bug#45000] [PATCH 0/9] Build git-annex with " zimoun

https://yhetil.org/guix-patches/86sg8o1mou.fsf@gmail.com/T/#t

The same subset of messages are in the thread's mbox:

  $ curl -fSsL https://yhetil.org/guix-patches/86sg8o1mou.fsf@gmail.com/t.mbox.gz \
    | gzip -d | grep -i ^subject
  Subject: [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled
  Subject: [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test.
  Subject: [bug#45000] [PATCH 7/9] gnu: Add ghc-template-haskell.
  Subject: [bug#45000] [PATCH 9/9] gnu: git-annex: Build with git-annex
  Subject: [bug#45000] [PATCH 2/9] gnu: Add ghc-wai-app-static.
  Subject: [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled

I collected the complete thread (currently eleven messages) and used
scripts/import_vger_from_mbox to create a separate inbox at
<https://yhetil.org/debug-gap/>.  As before, the top-level listing shows
all the patches, and the thread overview is still missing some, though
different ones:

  2020-12-02  4:53 [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled Jonathan Frederickson
  2020-12-02 11:17 ` zimoun
  2020-12-02  4:55 [bug#45000] [PATCH 1/9] gnu: Add ghc-yesod-test Jonathan Frederickson
  2020-12-02  4:55 ` [bug#45000] [PATCH 3/9] gnu: Add ghc-yesod-static Jonathan Frederickson
  2020-12-02  4:55 ` [bug#45000] [PATCH 4/9] gnu: Add ghc-cryptonite-conduit Jonathan Frederickson
  2020-12-02  4:55 ` [bug#45000] [PATCH 5/9] gnu: ghc-yesod-core: Update to 1.6.18.6 Jonathan Frederickson
  2020-12-02  4:55 ` [bug#45000] [PATCH 6/9] gnu: Add ghc-hjsmin Jonathan Frederickson
  2020-12-02  4:55 ` [bug#45000] [PATCH 8/9] gnu: Add ghc-boot-th Jonathan Frederickson

On rethreading with `public-inbox-index -c --reindex --rethread', the
thread overview looks like this:

  2020-12-02  4:53 [bug#45000] [PATCH 0/9] Build git-annex with assistant webapp enabled Jonathan Frederickson
  2020-12-02 11:17 ` zimoun

All of the above is with a recent commit on master,
v1.6.0-149-g4cc4ad59.

Any ideas what might be going on here?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: missing messages in thread overview
  2020-12-03  4:59 missing messages in thread overview Kyle Meyer
@ 2020-12-03 20:19 ` Eric Wong
  2020-12-04  2:12   ` [WIP] over: ensure old, merged {tid} is really gone Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2020-12-03 20:19 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> <https://yhetil.org/debug-gap/>.

Thanks, I'm able to reproduce it.

<snip>

> Any ideas what might be going on here?

It seems like the presence of two ghosts in the histories fails
to get merged together; so it's order-dependency bug in
OverIdx.pm rethread seems to be exacerbating the condition (less
sure about rethread, though).

Basically, there should be only one distinct `tid' column
in the `over' table, but there's two tids and that's
causing over->get_thread to only see one tid or the other.

The top-level inbox view doesn't care about `tid' column, it
just fetches the most recent messages; so it's able to get
a complete view to render all the messages in the thread.

Will try to work on it more in a bit (struggling to decide
between starvation or possible COVID-19 exposure... :<)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [WIP] over: ensure old, merged {tid} is really gone.
  2020-12-03 20:19 ` Eric Wong
@ 2020-12-04  2:12   ` Eric Wong
  2020-12-04  3:35     ` Kyle Meyer
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2020-12-04  2:12 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> It seems like the presence of two ghosts in the histories fails
> to get merged together; so it's order-dependency bug in
> OverIdx.pm rethread seems to be exacerbating the condition (less
> sure about rethread, though).

Yes, the fix is quite small (I think the below test case can be
made smaller).

--rethread seems to be a separate bug, will fix when more awake.

-----------------8<--------------
Subject: [PATCH] over: ensure old, merged {tid} is really gone.

We must use the result of link_refs() since it can trigger
merge_threads() and invalidate $old_tid.  In case
merge_threads() isn't triggered,  link_refs() will return
$old_tid anyways.
---
 MANIFEST                   |  1 +
 lib/PublicInbox/OverIdx.pm |  2 +-
 t/thread-index-gap.t       | 94 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 t/thread-index-gap.t

diff --git a/MANIFEST b/MANIFEST
index 544ec5f9..946e4b8a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -366,6 +366,7 @@ t/solver_git.t
 t/spamcheck_spamc.t
 t/spawn.t
 t/thread-cycle.t
+t/thread-index-gap.t
 t/time.t
 t/uri_imap.t
 t/utf8.eml
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 07cca4e5..ac53518c 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -298,7 +298,7 @@ sub _add_over {
 		}
 	} elsif ($n < 0) { # ghost
 		$$old_tid //= $cur_valid ? $cur_tid : next_tid($self);
-		link_refs($self, $refs, $$old_tid);
+		$$old_tid = link_refs($self, $refs, $$old_tid);
 		delete_by_num($self, $n);
 		$$v++;
 	}
diff --git a/t/thread-index-gap.t b/t/thread-index-gap.t
new file mode 100644
index 00000000..1772ce22
--- /dev/null
+++ b/t/thread-index-gap.t
@@ -0,0 +1,94 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use PublicInbox::InboxWritable;
+use PublicInbox::Config;
+require_mods(qw(DBD::SQLite));
+require_git(2.6);
+my ($home, $for_destroy) = tmpdir();
+local $ENV{HOME} = $home;
+ok(run_script([qw(-init -V2 index-gap), "$home/index-gap",
+	qw(http://example.com/v2test index-gap@example.com)]), 'init');
+my $pi_cfg = PublicInbox::Config->new;
+my $ibx = $pi_cfg->lookup_name('index-gap');
+PublicInbox::InboxWritable->new($ibx);
+
+chomp(my @msgs = reverse(split(/\n\n/, <<'EOF')));
+Subject: [bug#45000] [PATCH 2/9]
+Message-Id: <20201202045540.31248-2-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 9/9]
+Message-Id: <20201202045540.31248-9-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 7/9]
+Message-Id: <20201202045540.31248-7-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 1/9]
+References: <20201202045335.31096-1-j@example.com>
+In-Reply-To: <20201202045335.31096-1-j@example.com>
+Message-Id: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 0/9]
+Message-Id: <20201202045335.31096-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 0/9]
+In-Reply-To: <20201202045335.31096-1-j@example.com>
+References: <20201202045335.31096-1-j@example.com>
+Message-ID: <86sg8o1mou.fsf@example.com>
+
+Subject: [bug#45000] [PATCH 8/9]
+Message-Id: <20201202045540.31248-8-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 6/9]
+Message-Id: <20201202045540.31248-6-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 5/9]
+Message-Id: <20201202045540.31248-5-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 4/9]
+Message-Id: <20201202045540.31248-4-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 3/9]
+Message-Id: <20201202045540.31248-3-j@example.com>
+In-Reply-To: <20201202045540.31248-1-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+EOF
+
+my $im = $ibx->importer(0);
+for my $msg (@msgs) {
+	$im->add(PublicInbox::Eml->new("$msg\nFrom: x\@example.com\n\n"));
+}
+$im->done;
+
+my @tid = $ibx->over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over');
+is(scalar(@tid), 1, 'only one thread');
+
+if (0 && 'FIXME') {
+	$ibx->over->dbh_close;
+	ok(run_script([qw(-index --reindex --rethread -v), $ibx->{inboxdir}]),
+		'rethread');
+	@tid = $ibx->over->dbh->selectall_array(
+				'SELECT DISTINCT(tid) FROM over');
+	is(scalar(@tid), 1, 'only one thread after rethread');
+}
+
+done_testing;

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [WIP] over: ensure old, merged {tid} is really gone.
  2020-12-04  2:12   ` [WIP] over: ensure old, merged {tid} is really gone Eric Wong
@ 2020-12-04  3:35     ` Kyle Meyer
  2020-12-04 12:09       ` [PATCH] " Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Kyle Meyer @ 2020-12-04  3:35 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Eric Wong writes:

> Yes, the fix is quite small (I think the below test case can be
> made smaller).
>
> --rethread seems to be a separate bug, will fix when more awake.

Great, thank you for the explanation and fix.

By the way, I've been enjoying playing with the extindex feature.  I
haven't been doing anything fancy and it's just a few inboxes, but it's
been _really_ nice to be able to group the inboxes for a project.
Anyway, off topic for this thread, but thanks for that too!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] over: ensure old, merged {tid} is really gone
  2020-12-04  3:35     ` Kyle Meyer
@ 2020-12-04 12:09       ` Eric Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2020-12-04 12:09 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> Eric Wong writes:
> 
> > Yes, the fix is quite small (I think the below test case can be
> > made smaller).
> >
> > --rethread seems to be a separate bug, will fix when more awake.
> 
> Great, thank you for the explanation and fix.

No problem, thanks for the excellent bug report.  Below patch
fixes the rethread case, too, which has the same conceptual
problem in it was leaving a stale {tid} in the DB.

Also made the test data smaller and a shuffle test to ensure
it's truly order-independent.

> By the way, I've been enjoying playing with the extindex feature.  I
> haven't been doing anything fancy and it's just a few inboxes, but it's
> been _really_ nice to be able to group the inboxes for a project.
> Anyway, off topic for this thread, but thanks for that too!

Great to know.  Working on some other stuff in that area to
allow selecting inbox subsets and (hopefully) reduce SSD wear.

-------------------8<---------------

Subject: [PATCH] over: ensure old, merged {tid} is really gone

We must use the result of link_refs() since it can trigger
merge_threads() and invalidate $old_tid.  In case
merge_threads() isn't triggered, link_refs() will return
$old_tid anyways.

When rethreading and allocating new {tid}, we also must update
the row where the now-expired {tid} came from to ensure only the
new {tid} is seen when reindexing subsequent messages in
history.  Otherwise, every subsequently reindexed+rethreaded
message could end up getting a new {tid}.

Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87360nlc44.fsf@kyleam.com/
---
 MANIFEST                   |  1 +
 lib/PublicInbox/OverIdx.pm | 12 ++++++--
 t/thread-index-gap.t       | 57 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 t/thread-index-gap.t

diff --git a/MANIFEST b/MANIFEST
index 544ec5f9..946e4b8a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -366,6 +366,7 @@ t/solver_git.t
 t/spamcheck_spamc.t
 t/spawn.t
 t/thread-cycle.t
+t/thread-index-gap.t
 t/time.t
 t/uri_imap.t
 t/utf8.eml
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 07cca4e5..88daa64f 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -170,8 +170,14 @@ sub _resolve_mid_to_tid {
 		$$tid = $cur_tid;
 	} else { # rethreading, queue up dead ghosts
 		$$tid = next_tid($self);
-		my $num = $smsg->{num};
-		push(@{$self->{-ghosts_to_delete}}, $num) if $num < 0;
+		my $n = $smsg->{num};
+		if ($n > 0) {
+			$self->{dbh}->prepare_cached(<<'')->execute($$tid, $n);
+UPDATE over SET tid = ? WHERE num = ?
+
+		} elsif ($n < 0) {
+			push(@{$self->{-ghosts_to_delete}}, $n);
+		}
 	}
 	1;
 }
@@ -298,7 +304,7 @@ sub _add_over {
 		}
 	} elsif ($n < 0) { # ghost
 		$$old_tid //= $cur_valid ? $cur_tid : next_tid($self);
-		link_refs($self, $refs, $$old_tid);
+		$$old_tid = link_refs($self, $refs, $$old_tid);
 		delete_by_num($self, $n);
 		$$v++;
 	}
diff --git a/t/thread-index-gap.t b/t/thread-index-gap.t
new file mode 100644
index 00000000..49f254e9
--- /dev/null
+++ b/t/thread-index-gap.t
@@ -0,0 +1,57 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use PublicInbox::InboxWritable;
+use PublicInbox::Config;
+use List::Util qw(shuffle);
+require_mods(qw(DBD::SQLite));
+require_git(2.6);
+
+chomp(my @msgs = split(/\n\n/, <<'EOF')); # "git log" order
+Subject: [bug#45000] [PATCH 1/9]
+References: <20201202045335.31096-1-j@example.com>
+Message-Id: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 0/9]
+Message-Id: <20201202045335.31096-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 0/9]
+References: <20201202045335.31096-1-j@example.com>
+Message-ID: <86sg8o1mou.fsf@example.com>
+
+Subject: [bug#45000] [PATCH 8/9]
+Message-Id: <20201202045540.31248-8-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+EOF
+
+my ($home, $for_destroy) = tmpdir();
+local $ENV{HOME} = $home;
+for my $msgs (['orig', reverse @msgs], ['shuffle', shuffle(@msgs)]) {
+	my $desc = shift @$msgs;
+	my $n = "index-cap-$desc";
+	run_script([qw(-init -L basic -V2), $n, "$home/$n",
+		"http://example.com/$n", "$n\@example.com"]) or
+		BAIL_OUT 'init';
+	my $ibx = PublicInbox::Config->new->lookup_name($n);
+	my $im = PublicInbox::InboxWritable->new($ibx)->importer(0);
+	for my $m (@$msgs) {
+		$im->add(PublicInbox::Eml->new("$m\nFrom: x\@example.com\n\n"));
+	}
+	$im->done;
+	my $over = $ibx->over;
+	my @tid = $over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over');
+	is(scalar(@tid), 1, "only one thread initially ($desc)");
+	$over->dbh_close;
+	run_script([qw(-index --reindex --rethread), $ibx->{inboxdir}]) or
+		BAIL_OUT 'rethread';
+	@tid = $over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over');
+	is(scalar(@tid), 1, "only one thread after rethread ($desc)");
+}
+
+done_testing;

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-12-04 12:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03  4:59 missing messages in thread overview Kyle Meyer
2020-12-03 20:19 ` Eric Wong
2020-12-04  2:12   ` [WIP] over: ensure old, merged {tid} is really gone Eric Wong
2020-12-04  3:35     ` Kyle Meyer
2020-12-04 12:09       ` [PATCH] " Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).