From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Subject: [RFC] overidx: preserve `tid' column on re-indexing
Date: Sun, 5 Aug 2018 08:19:25 +0000
Message-ID: <20180805081925.ypej6lcxtswdtdow@dcvr>
In-Reply-To: <20180805060440.fhl7zvyis246e3ym@dcvr>

Eric Wong <e@80x24.org> wrote:
> While working on this, I noticed the backwards --reindex walk
> breaks `tid' on v1 repositories, at least. That bug was hidden
> by the Subject: match logic and not discovered until now. It
> will be fixed separately.

Lightly tested, but seems to make sense...
Reindexing http://czquwvybam4bgbro.onion/git/ now...

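For readers unfamiliar with OverIdx.pm, here is a small model of the idea in plain Perl, using core modules only. It is an illustration, not the real code. A hash stands in for the SQLite `over' table, and thread IDs come from a simple counter. The simplified delete_by_num/add_over helpers only borrow their names from the patch. The real add_over also links threads through Message-ID and References lookups, which this sketch omits.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010; # for the defined-or operator (//)

# Hypothetical, simplified model: %over stands in for the SQLite `over'
# table (article num => row) and $next_tid for thread-ID allocation.
my %over;
my $next_tid = 1;

# Like the patched delete_by_num: when a scalar ref is passed, copy the
# row's old `tid' into it (undef if no row exists) before deleting.
sub delete_by_num {
	my ($num, $tid_ref) = @_;
	if ($tid_ref) {
		$$tid_ref = exists $over{$num} ? $over{$num}{tid} : undef;
	}
	delete $over{$num};
}

# Very reduced add_over: an explicit tid (a reply joining a thread) wins,
# then the preserved old tid, else a fresh one.  No Message-ID or
# References lookups are modelled here.
sub add_over {
	my ($num, $mid, %opt) = @_;
	my $old_tid;
	delete_by_num($num, $opt{preserve} ? \$old_tid : undef);
	my $tid = $opt{tid} // $old_tid // $next_tid++;
	$over{$num} = { mid => $mid, tid => $tid };
	return $tid;
}

# Index a root and a reply in one thread, then re-index the root.
sub demo {
	my ($preserve) = @_;
	%over = ();
	$next_tid = 1;
	my $tid = add_over(1, 'root@example');
	add_over(2, 'reply@example', tid => $tid);
	add_over(1, 'root@example', preserve => $preserve); # re-index root
	return ($over{1}{tid}, $over{2}{tid});
}

my ($r0, $c0) = demo(0); # old behaviour: tid is discarded
my ($r1, $c1) = demo(1); # patched behaviour: tid is carried over
print "without preserve: root=$r0 reply=$c0\n"; # root=2 reply=1
print "with preserve:    root=$r1 reply=$c1\n"; # root=1 reply=1
```

Without preservation, the re-indexed root lands in a new thread while the reply keeps the old `tid'. Passing the old value out through an optional scalar ref keeps the extra SELECT opt-in, so other delete_by_num callers are unaffected.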
-------8<-------
Subject: [RFC] overidx: preserve `tid' column on re-indexing

Otherwise, walking backwards through history (as --reindex does)
could cause the root message of a thread to lose its `tid', which
prevents messages in that thread from being looked up by that `tid'.

This bug was hidden by the fact that `sid' matches were often
good enough to link threads together.

---
 lib/PublicInbox/OverIdx.pm | 11 +++++++++--
 t/search-thr-index.t       | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 62fec0d..cc9bd7d 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -79,8 +79,15 @@ sub mid2id {
 }
 
 sub delete_by_num {
-	my ($self, $num) = @_;
+	my ($self, $num, $tid_ref) = @_;
 	my $dbh = $self->{dbh};
+	if ($tid_ref) {
+		my $sth = $dbh->prepare_cached(<<'', undef, 1);
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+		$sth->execute($num);
+		$$tid_ref = $sth->fetchrow_array; # may be undef
+	}
 	foreach (qw(over id2num)) {
 		$dbh->prepare_cached(<<"")->execute($num);
 DELETE FROM $_ WHERE num = ?
@@ -262,7 +269,7 @@ sub add_over {
 	my $vivified = 0;
 
 	$self->begin_lazy;
-	$self->delete_by_num($num);
+	$self->delete_by_num($num, \$old_tid);
 	foreach my $mid (@$mids) {
 		my $v = 0;
 		each_by_mid($self, $mid, ['tid'], sub {
diff --git a/t/search-thr-index.t b/t/search-thr-index.t
index 2aa97bf..ab6d1b0 100644
--- a/t/search-thr-index.t
+++ b/t/search-thr-index.t
@@ -48,9 +48,49 @@ foreach (reverse split(/\n\n/, $data)) {
 }
 
 my $prev;
+my %tids;
+my $dbh = $rw->{over}->connect;
 foreach my $mid (@mids) {
 	my $msgs = $rw->{over}->get_thread($mid);
 	is(3, scalar(@$msgs), "got all messages from $mid");
+	foreach my $m (@$msgs) {
+		my $tid = $dbh->selectrow_array(<<'', undef, $m->{num});
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+		$tids{$tid}++;
+	}
+}
+
+is(scalar keys %tids, 1, 'all messages have the same tid');
+
+$rw->commit_txn_lazy;
+
+$xdb = $rw->begin_txn_lazy;
+{
+	my $mime = Email::MIME->new(<<'');
+Subject: [RFC 00/14]
+Message-Id: <1-bw@g>
+From: bw@g
+To: git@vger.kernel.org
+
+	my $dbh = $rw->{over}->connect;
+	my ($id, $prev);
+	my $reidx = $rw->{over}->next_by_mid('1-bw@g', \$id, \$prev);
+	ok(defined $reidx);
+	my $num = $reidx->{num};
+	my $tid0 = $dbh->selectrow_array(<<'', undef, $num);
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+	my $bytes = bytes::length($mime->as_string);
+	my $mid = mids($mime->header_obj)->[0];
+	my $doc_id = $rw->add_message($mime, $bytes, $num, 'ignored', $mid);
+	ok($doc_id, "message reindexed: $mid");
+	is($doc_id, $num, "article number unchanged: $num");
+
+	my $tid1 = $dbh->selectrow_array(<<'', undef, $num);
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+	is($tid1, $tid0, 'tid unchanged on reindex');
 }
 
 $rw->commit_txn_lazy;
--
EW