* [PATCH 0/4] lei reindex-related stuff
@ 2022-08-19 9:07 Eric Wong
2022-08-19 9:07 ` [PATCH 1/4] lei reindex: account for parallel lei/store users Eric Wong
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2022-08-19 9:07 UTC
To: meta
1/4 is an important fix.

I'm still unusually stressed out about how to deal with
ancient, pre-release stuff from lei. --rethread may happen
another time since AFAIK it's not necessary right now, as there
were no threading fixes required since lei...
Eric Wong (4):
lei reindex: account for parallel lei/store users
tests: add some basic "lei reindex" tests
smsg: ->populate falls back to old {ds}/{ts} values
lei/store: reindex culls over-indexed messages
MANIFEST | 1 +
lib/PublicInbox/LeiStore.pm | 14 ++++++++++++--
lib/PublicInbox/Smsg.pm | 6 ++++--
t/lei-index.t | 12 +++++++++++-
t/lei-reindex.t | 12 ++++++++++++
5 files changed, 40 insertions(+), 5 deletions(-)
create mode 100644 t/lei-reindex.t
* [PATCH 1/4] lei reindex: account for parallel lei/store users
From: Eric Wong @ 2022-08-19 9:07 UTC
To: meta
We need to call eidx_init in each git->cat_async callback
since another requestor may've stopped the shard processes.
---
lib/PublicInbox/LeiStore.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index 277ed6bd..8e710540 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -337,7 +337,8 @@ sub _docids_and_maybe_kw ($$) {
sub _reindex_1 { # git->cat_async callback
my ($bref, $hex, $type, $size, $smsg) = @_;
- my ($self, $eidx, $tl) = delete @$smsg{qw(-self -eidx -tl)};
+ my $self = delete $smsg->{-sto};
+ my ($eidx, $tl) = eidx_init($self);
$bref //= _lms_rw($self)->local_blob($hex, 1);
if ($bref) {
my $eml = PublicInbox::Eml->new($bref);
@@ -353,7 +354,7 @@ sub reindex_art {
my ($eidx, $tl) = eidx_init($self);
my $smsg = $eidx->{oidx}->get_art($art) // return;
return if $smsg->{bytes} == 0; # external-only message
- @$smsg{qw(-self -eidx -tl)} = ($self, $eidx, $tl);
+ $smsg->{-sto} = $self;
$eidx->git->cat_async($smsg->{blob} // die("no blob (#$art)"),
\&_reindex_1, $smsg);
}
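The reasoning behind this change can be sketched outside of lei. What follows is a minimal, self-contained illustration, not the lei/store code: `workers_init`, `workers_stop`, `enqueue` and `on_item` are hypothetical names, and the `%store` hash merely stands in for a store whose worker processes another user may stop between the time a request is queued and the time its callback runs. The callback is handed only the store and asks for a current handle itself, instead of reusing one captured at queue time.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a store with stoppable worker processes.
my %store = (generation => 0, workers => undef);

# Hypothetical analogue of an init call: (re)start the workers if
# needed and return a handle that is valid right now.
sub workers_init {
	my ($sto) = @_;
	$sto->{workers} //= { gen => ++$sto->{generation} };
	return $sto->{workers};
}

# Simulates a parallel user shutting the workers down.
sub workers_stop { $_[0]->{workers} = undef }

my @queue; # pending [ callback, argument ] pairs
my @seen;  # worker generation observed by each callback

sub on_item {
	my ($arg) = @_;
	# look up the workers at callback time, never at queue time
	my $w = workers_init($arg->{sto});
	push @seen, sprintf('%s:gen%d', $arg->{item}, $w->{gen});
}

sub enqueue {
	my ($sto, $item) = @_;
	workers_init($sto); # workers exist while queuing
	# pass only the store, not the worker handle
	push @queue, [ \&on_item, { sto => $sto, item => $item } ];
}

enqueue(\%store, 'a');
enqueue(\%store, 'b');
workers_stop(\%store);         # another requestor stops the workers
$_->[0]->($_->[1]) for @queue; # callbacks run later
print join(' ', @seen), "\n";  # prints "a:gen2 b:gen2"
```

Both callbacks see generation 2, the workers restarted after the stop. Had the generation-1 handle been stored alongside each queued item, the callbacks would have used workers that no longer exist.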
* [PATCH 2/4] tests: add some basic "lei reindex" tests
From: Eric Wong @ 2022-08-19 9:07 UTC
To: meta
This is a bit hard to test, but we must at least ensure
volatile metadata is preserved.
---
MANIFEST | 1 +
t/lei-index.t | 12 +++++++++++-
t/lei-reindex.t | 12 ++++++++++++
3 files changed, 24 insertions(+), 1 deletion(-)
create mode 100644 t/lei-reindex.t
diff --git a/MANIFEST b/MANIFEST
index 27e4c4e0..43128382 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -477,6 +477,7 @@ t/lei-q-remote-import.t
t/lei-q-save.t
t/lei-q-thread.t
t/lei-refresh-mail-sync.t
+t/lei-reindex.t
t/lei-sigpipe.t
t/lei-tag.t
t/lei-up.t
diff --git a/t/lei-index.t b/t/lei-index.t
index aab8f7e6..c31b1c3c 100644
--- a/t/lei-index.t
+++ b/t/lei-index.t
@@ -1,5 +1,5 @@
#!perl -w
-# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
use strict; use v5.10.1; use PublicInbox::TestCommon;
use File::Spec;
@@ -85,6 +85,10 @@ test_lei({ tmpdir => $tmpdir }, sub {
lei_ok qw(q m:multipart-html-sucks@11);
is_deeply(json_utf8->decode($lei_out)->[0]->{'kw'},
['seen'], 'keyword set');
+ lei_ok 'reindex';
+ lei_ok qw(q m:multipart-html-sucks@11);
+ is_deeply(json_utf8->decode($lei_out)->[0]->{'kw'},
+ ['seen'], 'keyword still set after reindex');
$srv->{nntpd} and
lei_ok('index', "nntp://$srv->{nntp_host_port}/t.v2");
@@ -104,6 +108,12 @@ test_lei({ tmpdir => $tmpdir }, sub {
my $t = xqx(['git', "--git-dir=$store_path/ALL.git",
qw(cat-file -t), $res_b->{blob}]);
is($t, "blob\n", 'got blob');
+
+ lei_ok('reindex');
+ lei_ok qw(q m:multipart-html-sucks@11);
+ $res_a = json_utf8->decode($lei_out)->[0];
+ is_deeply($res_a->{'kw'}, ['seen'],
+ 'keywords still set after reindex');
});
done_testing;
diff --git a/t/lei-reindex.t b/t/lei-reindex.t
new file mode 100644
index 00000000..73346ee8
--- /dev/null
+++ b/t/lei-reindex.t
@@ -0,0 +1,12 @@
+#!perl -w
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use v5.12; use PublicInbox::TestCommon;
+require_mods(qw(lei));
+my ($tmpdir, $for_destroy) = tmpdir;
+test_lei(sub {
+ ok(!lei('reindex'), 'reindex fails w/o store');
+ like $lei_err, qr/nothing indexed/, "`nothing indexed' noted";
+});
+
+done_testing;
* [PATCH 3/4] smsg: ->populate falls back to old {ds}/{ts} values
From: Eric Wong @ 2022-08-19 9:07 UTC
To: meta
This will be useful for re-indexing external messages which
were over-indexed in lei/store.
---
lib/PublicInbox/Smsg.pm | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 260ce6bb..2026c7d9 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -115,8 +115,10 @@ sub populate {
$self->{$f} = $val if $val ne '';
}
$sync //= {};
- $self->{-ds} = [ my @ds = msg_datestamp($hdr, $sync->{autime}) ];
- $self->{-ts} = [ my @ts = msg_timestamp($hdr, $sync->{cotime}) ];
+ my @ds = msg_datestamp($hdr, $sync->{autime} // $self->{ds});
+ my @ts = msg_timestamp($hdr, $sync->{cotime} // $self->{ts});
+ $self->{-ds} = \@ds;
+ $self->{-ts} = \@ts;
$self->{ds} //= $ds[0]; # no zone
$self->{ts} //= $ts[0];
$self->{mid} //= mids($hdr)->[0];
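The fallback order introduced here can be illustrated in isolation. This is only a sketch under assumptions: `datestamp_or` is a hypothetical stand-in that returns the header-derived time when one is usable and the supplied fallback otherwise (the real msg_datestamp is not reproduced here), and `pick_ds` is an invented wrapper. It shows the defined-or chain: an explicit sync hint wins, then the value stored by an earlier indexing run.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a header-based date parser: returns the
# header time when it is usable, otherwise the given fallback.
sub datestamp_or {
	my ($hdr_date, $fallback) = @_;
	return $hdr_date + 0
		if defined($hdr_date) && $hdr_date =~ /\A[0-9]+\z/;
	return $fallback;
}

sub pick_ds {
	my ($smsg, $hdr_date, $sync) = @_;
	$sync //= {};
	# sync hint first, then the previously stored value; `//' only
	# skips undef, so a stored 0 would still be honored
	my $fallback = $sync->{autime} // $smsg->{ds};
	my $ds = datestamp_or($hdr_date, $fallback);
	$smsg->{ds} //= $ds; # never clobber an existing value
	return $ds;
}

my $old = { ds => 1600000000 }; # record from an earlier index run
print pick_ds($old, undef, undef), "\n";                    # 1600000000
print pick_ds($old, undef, { autime => 1650000000 }), "\n"; # 1650000000
print pick_ds({}, '1660900020', undef), "\n";               # 1660900020
```

With no usable header and no sync hint, the first call falls back to the stored value, which is the case that matters when re-indexing a message whose contents are no longer available.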
* [PATCH 4/4] lei/store: reindex culls over-indexed messages
From: Eric Wong @ 2022-08-19 9:07 UTC
To: meta
I may be the only lei user who has redundantly-indexed messages
needing this, though...
---
lib/PublicInbox/LeiStore.pm | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index 8e710540..57f0e013 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -344,6 +344,15 @@ sub _reindex_1 { # git->cat_async callback
my $eml = PublicInbox::Eml->new($bref);
$smsg->{-merge_vmd} = 1; # preserve existing keywords
$eidx->idx_shard($smsg->{num})->index_eml($eml, $smsg);
+ } elsif ($type eq 'missing') {
+ # pre-release/buggy lei may've indexed external-only msgs,
+ # try to correct that, here
+ warn("E: missing $hex, culling (ancient lei artifact?)\n");
+ $smsg->{to} = $smsg->{cc} = $smsg->{from} = '';
+ $smsg->{bytes} = 0;
+ $eidx->{oidx}->update_blob($smsg, '');
+ my $eml = PublicInbox::Eml->new("\r\n\r\n");
+ $eidx->idx_shard($smsg->{num})->index_eml($eml, $smsg);
} else {
warn("E: $type $hex\n");
}
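The three-way dispatch above can be sketched with plain hashes. This is an illustration under assumptions, not the lei/store API: `handle_result` and the record layout are hypothetical, and only the field names (`to`, `cc`, `from`, `bytes`, `blob`) mirror the patch. A missing blob blanks the address fields and sets `bytes` to 0, which is the value reindex_art already treats as "external-only"; other metadata on the record is left alone.

```perl
use strict;
use warnings;

# Hypothetical in-memory record handler; not the lei/store API.
sub handle_result {
	my ($bref, $hex, $type, $rec) = @_;
	if ($bref) {
		# blob is available: (re)index its contents
		$rec->{indexed} = length($$bref);
		return 'reindexed';
	} elsif ($type eq 'missing') {
		# blob is gone: keep the record but strip what was
		# over-indexed from the message itself
		@$rec{qw(to cc from)} = ('', '', '');
		$rec->{bytes} = 0; # marks the record as external-only
		$rec->{blob} = '';
		return 'culled';
	}
	warn "E: $type $hex\n";
	return 'error';
}

my $rec = { from => 'a@example.com', to => 'b@example.com', cc => '',
	bytes => 123, blob => 'deadbeef', kw => ['seen'] };
print handle_result(undef, 'deadbeef', 'missing', $rec), "\n"; # culled
print "bytes=$rec->{bytes} kw=@{$rec->{kw}}\n"; # bytes=0 kw=seen
```

The `kw` entry stands in for keywords and survives the cull untouched.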