* --reindex buggy wrt NNTP numbers, fixes coming
@ 2019-05-25 19:57 Eric Wong
  2019-05-27 18:45 ` [PATCH 0/3] fix --reindex skipping NNTP article numbers Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2019-05-25 19:57 UTC
  To: meta

I was doing some cleanups and adding progress reporting to
V2Writable for public-inbox-index when I noticed NNTP article
numbers were off (too high, gaps)

And the v1 index_sync path seems buggy in this regard, too.

* [PATCH 0/3] fix --reindex skipping NNTP article numbers
  2019-05-25 19:57 --reindex buggy wrt NNTP numbers, fixes coming Eric Wong
@ 2019-05-27 18:45 ` Eric Wong
  2019-05-27 18:45   ` [PATCH 1/3] t/v1reindex.t: fix typo in setting `indexlevel' Eric Wong
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Wong @ 2019-05-27 18:45 UTC
  To: meta

I was working on v2 cleanups and progress-reporting, and
speeding up no-op 'public-inbox-index' invocations when I
noticed --reindex showing higher-than-expected NNTP article
numbers with the verbose output I'm working on.

There's some (apparently) inconsequential typo fixes, too.

I think the v1 --reindex code is alright, after all,
but maybe I haven't been paying enough attention to it...

Also, I've noticed the v1 indexing code works around a Xapian
v1.2.20..v1.2.24 bug wrt OFD locks which the newer v2 code
doesn't (since v2 was developed mainly on Xapian 1.4.3+).
Not sure who uses Xapian 1.2, still; but it might still be
worth fixing for folks on older systems.

Eric Wong (3):
  t/v1reindex.t: fix typo in setting `indexlevel'
  searchidx: fix obvious typo
  v2: fix reindex skipping NNTP article numbers

 lib/PublicInbox/SearchIdx.pm  |  2 +-
 lib/PublicInbox/V2Writable.pm | 25 ++++++++++++++++++++++++-
 t/indexlevels-mirror.t        | 25 +++++++++++++++++++++++++
 t/v1reindex.t                 |  2 +-
 4 files changed, 51 insertions(+), 3 deletions(-)

-- 
EW

* [PATCH 1/3] t/v1reindex.t: fix typo in setting `indexlevel'
  2019-05-27 18:45 ` [PATCH 0/3] fix --reindex skipping NNTP article numbers Eric Wong
@ 2019-05-27 18:45   ` Eric Wong
  2019-05-27 18:45   ` [PATCH 2/3] searchidx: fix obvious typo Eric Wong
  2019-05-27 18:45   ` [PATCH 3/3] v2: fix reindex skipping NNTP article numbers Eric Wong
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2019-05-27 18:45 UTC
  To: meta

It did not cause a test failure because the default
fallback is `indexlevel=full'
---
 t/v1reindex.t | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 402ecd7..35275fb 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -223,7 +223,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
 	my %config = %$ibx_config;
-	$config{indexleve} = 'medium';
+	$config{indexlevel} = 'medium';
 	my $ibx = PublicInbox::Inbox->new(\%config);
 	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync };
-- 
EW

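The silent fallback mentioned in [PATCH 1/3] can be reproduced in isolation. This is only a minimal sketch: `effective_indexlevel' is a hypothetical helper written for illustration, not the lookup code public-inbox actually uses. It assumes the level defaults to `full' whenever the `indexlevel' key is absent, which is what the commit message describes.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of public-inbox: returns the configured
# index level, or 'full' when no `indexlevel' key is present.
sub effective_indexlevel {
	my ($config) = @_;
	defined $config->{indexlevel} ? $config->{indexlevel} : 'full';
}

my %typo  = (indexleve  => 'medium'); # misspelled key is never consulted
my %fixed = (indexlevel => 'medium');

printf "typo:  %s\n", effective_indexlevel(\%typo);
printf "fixed: %s\n", effective_indexlevel(\%fixed);
```

With the misspelled key the helper quietly reports `full', so a test that only checks for success would still pass, just at a different level than the one it meant to exercise.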
* [PATCH 2/3] searchidx: fix obvious typo
  2019-05-27 18:45 ` [PATCH 0/3] fix --reindex skipping NNTP article numbers Eric Wong
  2019-05-27 18:45   ` [PATCH 1/3] t/v1reindex.t: fix typo in setting `indexlevel' Eric Wong
@ 2019-05-27 18:45   ` Eric Wong
  2019-05-27 18:45   ` [PATCH 3/3] v2: fix reindex skipping NNTP article numbers Eric Wong
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2019-05-27 18:45 UTC
  To: meta

We can't pass an empty string to `git merge-base --is-ancestor'

AFAIK, this did NOT present issues in the current test suite.
---
 lib/PublicInbox/SearchIdx.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 9c29106..b963805 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -694,7 +694,7 @@ sub _last_x_commit {
 		$lx = $lm;
 	}
 	# Use last_commit from msgmap if it is older or unset
-	if (!$lm || ($lx && $lx && is_ancestor($self->{git}, $lm, $lx))) {
+	if (!$lm || ($lx && $lm && is_ancestor($self->{git}, $lm, $lx))) {
 		$lx = $lm;
 	}
 	$lx;
-- 
EW

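The guard in [PATCH 2/3] can be exercised on its own with a stub. The sketch below is an illustration under assumptions: `is_ancestor_stub' and `pick_last_commit' are hypothetical names, commits are modelled as positive integers where a smaller number is an ancestor of a larger one, and the stub dies on an empty argument to stand in for `git merge-base --is-ancestor' rejecting an empty string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for is_ancestor(); the real function asks git.
# Commits are positive integers here; a smaller one is the ancestor.
sub is_ancestor_stub {
	my ($cur, $tip) = @_;
	die "BUG: empty commit passed to ancestor check\n" if !$cur || !$tip;
	$cur <= $tip;
}

# Same shape as the corrected condition: both $lx and $lm must be
# set before the ancestor check is attempted.
sub pick_last_commit {
	my ($lm, $lx) = @_;
	if (!$lm || ($lx && $lm && is_ancestor_stub($lm, $lx))) {
		$lx = $lm;
	}
	$lx;
}

# none of these calls reaches the stub with an empty value
for my $pair ([3, 5], [5, 3], [undef, 5], [3, undef], [3, '']) {
	my $picked = pick_last_commit(@$pair);
	print defined $picked && $picked ne '' ? $picked : '(unset)', "\n";
}
```

Within this single expression the `!$lm ||' short-circuit already guarantees `$lm' is true by the time the right-hand side is evaluated, so `$lx && $lx' and `$lx && $lm' pick the same result here. That is consistent with the typo not showing up in the test suite; the corrected spelling states the intent explicitly.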
* [PATCH 3/3] v2: fix reindex skipping NNTP article numbers
  2019-05-27 18:45 ` [PATCH 0/3] fix --reindex skipping NNTP article numbers Eric Wong
  2019-05-27 18:45   ` [PATCH 1/3] t/v1reindex.t: fix typo in setting `indexlevel' Eric Wong
  2019-05-27 18:45   ` [PATCH 2/3] searchidx: fix obvious typo Eric Wong
@ 2019-05-27 18:45   ` Eric Wong
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2019-05-27 18:45 UTC
  To: meta

`public-inbox-index --reindex' could cause NNTP article number
gaps to form when it also has to deal with new, never-before-seen
commits in mirrors running off `git fetch'.

Fix this by running two distinct invocations of ->index_sync;
once to only reindex old commits, and a second time to index
new commits.

This does not appear to be a problem on v1 at the moment,
but I'll need more time to analyze this.
---
 lib/PublicInbox/V2Writable.pm | 25 ++++++++++++++++++++++++-
 t/indexlevels-mirror.t        | 25 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index cd08acd..331c4f4 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -850,11 +850,19 @@ sub index_prepare {
 	my $pr = $opts->{-progress};
 	my $regen_max = 0;
 	my $head = $self->{-inbox}->{ref_head} || 'refs/heads/master';
+
+	# reindex stops at the current heads and we later rerun index_sync
+	# without {reindex}
+	my $reindex_heads = last_commits($self, $epoch_max) if $opts->{reindex};
+
 	for (my $i = $epoch_max; $i >= 0; $i--) {
 		die 'BUG: already indexing!' if $self->{reindex_pipe};
 		my $git_dir = git_dir_n($self, $i);
 		-d $git_dir or next; # missing parts are fine
 		my $git = PublicInbox::Git->new($git_dir);
+		if ($reindex_heads) {
+			$head = $reindex_heads->[$i] or next;
+		}
 		chomp(my $tip = $git->qx(qw(rev-parse -q --verify), $head));
 
 		next if $?; # new repo
@@ -959,7 +967,14 @@ sub index_sync {
 
 	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
-	$$regen += $high if $high;
+	if ($opts->{reindex}) {
+		# reindex should NOT see new commits anymore, if we do,
+		# it's a problem and we need to notice it via die()
+		$$regen = -1;
+	} else {
+		$$regen += $high;
+	}
+
 	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
@@ -1001,6 +1016,14 @@ sub index_sync {
 		$git->cleanup;
 	}
 	$self->done;
+
+	# reindex does not pick up new changes, so we rerun w/o it:
+	if ($opts->{reindex}) {
+		my %again = %$opts;
+		$mm_tmp = undef;
+		delete @again{qw(reindex -skip_lock)};
+		index_sync($self, \%again);
+	}
 }
 
 1;
diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t
index ce138fe..1251136 100644
--- a/t/indexlevels-mirror.t
+++ b/t/indexlevels-mirror.t
@@ -105,9 +105,17 @@ sub import_index_incremental {
 	is_deeply([sort { $a cmp $b } map { $_->{mid} } @$msgs],
 		['m@1','m@2'], 'got both messages in master');
 
+	my @rw_nums = map { $_->{num} } @{$ibx->over->query_ts(0, 0)};
+	is_deeply(\@rw_nums, [1, 2], 'master has expected NNTP articles');
+
+	my @ro_nums = map { $_->{num} } @{$ro_mirror->over->query_ts(0, 0)};
+	is_deeply(\@ro_nums, [1, 2], 'mirror has expected NNTP articles');
+
 	# remove message from master
 	ok($im->remove($mime), '2nd message removed');
 	$im->done;
+	@rw_nums = map { $_->{num} } @{$ibx->over->query_ts(0, 0)};
+	is_deeply(\@rw_nums, [1], 'unindex NNTP article'.$v.$level);
 
 	if ($level ne 'basic') {
 		is(system(@xcpdb, $mirror), 0, "v$v xcpdb OK");
@@ -132,6 +140,23 @@ sub import_index_incremental {
 		($nr, $msgs) = $ro_mirror->search->reopen->query('m:m@2');
 		is($nr, 0, "v$v m\@2 gone from Xapian in mirror on $level");
 	}
+
+	# add another message to master and have the mirror
+	# sync and reindex it
+	my @expect = map { $_->{num} } @{$ibx->over->query_ts(0, 0)};
+	foreach my $i (3..5) {
+		$mime->header_set('Message-ID', "<m\@$i>");
+		ok($im->add($mime), "#$i message added");
+		push @expect, $i;
+	}
+	$im->done;
+	is(system('git', "--git-dir=$fetch_dir", qw(fetch -q)), 0, 'fetch OK');
+	is(system($index, '--reindex', $mirror), 0,
+		"v$v index --reindex mirror OK");
+	@ro_nums = map { $_->{num} } @{$ro_mirror->over->query_ts(0, 0)};
+	@rw_nums = map { $_->{num} } @{$ibx->over->query_ts(0, 0)};
+	is_deeply(\@rw_nums, \@expect, "v$v master has expected NNTP articles");
+	is_deeply(\@ro_nums, \@expect, "v$v mirror matches master articles");
 }
 
 # we can probably cull some other tests and put full/medium tests, here
-- 
EW

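The two-invocation approach described in [PATCH 3/3] can be pictured with a toy model. Everything below is an assumption made for illustration and none of it is public-inbox code: `toy_index_sync' is a hypothetical function, the "repository" is a plain list of Message-IDs in commit order, and `$state' tracks which commits were indexed before the run and the highest article number handed out. The first pass revisits only commits up to the old head and dies if it meets a message without a number; the second pass reruns without `reindex' and numbers the newly fetched commits.

```perl
use strict;
use warnings;

# Toy model, NOT public-inbox code.  $state holds:
#   num  => { Message-ID => article number }
#   high => highest article number handed out so far
#   seen => number of commits indexed before this run
sub toy_index_sync {
	my ($state, $commits, $opts) = @_;
	if ($opts->{reindex}) {
		# pass 1: stop at the previously recorded head
		for my $mid (@$commits[0 .. $state->{seen} - 1]) {
			defined $state->{num}->{$mid} or
				die "BUG: reindex saw new message $mid\n";
		}
		# pass 2: rerun without {reindex} for the new commits
		my %again = %$opts;
		delete $again{reindex};
		return toy_index_sync($state, $commits, \%again);
	}
	# plain incremental sync: number only never-before-seen commits
	for my $mid (@$commits[$state->{seen} .. $#$commits]) {
		$state->{num}->{$mid} = ++$state->{high};
	}
	$state->{seen} = @$commits;
	$state;
}

# mirror already indexed m@1 and m@2, then fetched m@3..m@5
my $state = { num => { 'm@1' => 1, 'm@2' => 2 }, high => 2, seen => 2 };
my @commits = map { "m\@$_" } 1..5;
toy_index_sync($state, \@commits, { reindex => 1 });
print join(',', map { $state->{num}->{$_} } @commits), "\n";
```

In this model the old messages keep their numbers and the new ones continue from the previous high-water mark without gaps. The model leaves out message removal, epochs and locking, all of which the real code has to handle.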
end of thread, other threads:[~2019-05-27 18:45 UTC | newest]

Thread overview: 5+ messages
2019-05-25 19:57 --reindex buggy wrt NNTP numbers, fixes coming Eric Wong
2019-05-27 18:45 ` [PATCH 0/3] fix --reindex skipping NNTP article numbers Eric Wong
2019-05-27 18:45   ` [PATCH 1/3] t/v1reindex.t: fix typo in setting `indexlevel' Eric Wong
2019-05-27 18:45   ` [PATCH 2/3] searchidx: fix obvious typo Eric Wong
2019-05-27 18:45   ` [PATCH 3/3] v2: fix reindex skipping NNTP article numbers Eric Wong