unofficial mirror of meta@public-inbox.org
* [PATCH 0/4] lei reindex-related stuff
@ 2022-08-19  9:07 Eric Wong
  2022-08-19  9:07 ` [PATCH 1/4] lei reindex: account for parallel lei/store users Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2022-08-19  9:07 UTC
  To: meta

1/4 is an important fix.

And I'm still unusually stressed out about how to deal with
ancient, pre-release stuff from lei.  --rethread may happen
another time since, AFAIK, it's not necessary right now: there
were no threading fixes required since lei...

Eric Wong (4):
  lei reindex: account for parallel lei/store users
  tests: add some basic "lei reindex" tests
  smsg: ->populate falls back to old {ds}/{ts} values
  lei/store: reindex culls over-indexed messages

 MANIFEST                    |  1 +
 lib/PublicInbox/LeiStore.pm | 14 ++++++++++++--
 lib/PublicInbox/Smsg.pm     |  6 ++++--
 t/lei-index.t               | 12 +++++++++++-
 t/lei-reindex.t             | 12 ++++++++++++
 5 files changed, 40 insertions(+), 5 deletions(-)
 create mode 100644 t/lei-reindex.t


* [PATCH 1/4] lei reindex: account for parallel lei/store users
  2022-08-19  9:07 [PATCH 0/4] lei reindex-related stuff Eric Wong
@ 2022-08-19  9:07 ` Eric Wong
  2022-08-19  9:07 ` [PATCH 2/4] tests: add some basic "lei reindex" tests Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2022-08-19  9:07 UTC
  To: meta

We need to call eidx_init in each git->cat_async callback,
since another requester may have stopped the shard processes
by the time the callback runs.
---
 lib/PublicInbox/LeiStore.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index 277ed6bd..8e710540 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -337,7 +337,8 @@ sub _docids_and_maybe_kw ($$) {
 
 sub _reindex_1 { # git->cat_async callback
 	my ($bref, $hex, $type, $size, $smsg) = @_;
-	my ($self, $eidx, $tl) = delete @$smsg{qw(-self -eidx -tl)};
+	my $self = delete $smsg->{-sto};
+	my ($eidx, $tl) = eidx_init($self);
 	$bref //= _lms_rw($self)->local_blob($hex, 1);
 	if ($bref) {
 		my $eml = PublicInbox::Eml->new($bref);
@@ -353,7 +354,7 @@ sub reindex_art {
 	my ($eidx, $tl) = eidx_init($self);
 	my $smsg = $eidx->{oidx}->get_art($art) // return;
 	return if $smsg->{bytes} == 0; # external-only message
-	@$smsg{qw(-self -eidx -tl)} = ($self, $eidx, $tl);
+	$smsg->{-sto} = $self;
 	$eidx->git->cat_async($smsg->{blob} // die("no blob (#$art)"),
 				\&_reindex_1, $smsg);
 }
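
The reasoning in the commit message can be illustrated with a small,
self-contained sketch.  It is not public-inbox code: DemoStore,
init_state and stop_workers are hypothetical stand-ins for the store,
eidx_init and "another requester stopping the shards".  It only
shows why a handle captured when a request is queued can go stale,
while re-initializing inside the callback does not.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a store whose worker state may be torn
# down by another user before a queued callback gets to run.
package DemoStore;

sub new {
	my ($class) = @_;
	return bless({ gen => 0, workers => undef }, $class);
}

# Idempotent initializer: starts workers only if none are running.
sub init_state {
	my ($self) = @_;
	$self->{workers} //= { gen => ++$self->{gen}, alive => 1 };
	return $self->{workers};
}

# Simulates a parallel user stopping the workers.
sub stop_workers {
	my ($self) = @_;
	$self->{workers}->{alive} = 0 if $self->{workers};
	$self->{workers} = undef;
}

package main;

my $store = DemoStore->new;
my @queue; # pending "async" callbacks

# Fragile: captures the worker handle when the request is queued.
my $stale = $store->init_state;
push @queue, sub { $stale->{alive} ? 'ok' : 'stale handle' };

# Robust: keeps only the owner and re-initializes in the callback.
push @queue, sub { $store->init_state->{alive} ? 'ok' : 'stale handle' };

# Another requester intervenes before the callbacks run.
$store->stop_workers;

my @results = map { $_->() } @queue;
print "$_\n" for @results;
```

The first callback reports a stale handle and the second reports ok,
which is the same trade the patch makes: pass only the owner through
the callback argument and look everything else up when it fires.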


* [PATCH 2/4] tests: add some basic "lei reindex" tests
  2022-08-19  9:07 [PATCH 0/4] lei reindex-related stuff Eric Wong
  2022-08-19  9:07 ` [PATCH 1/4] lei reindex: account for parallel lei/store users Eric Wong
@ 2022-08-19  9:07 ` Eric Wong
  2022-08-19  9:07 ` [PATCH 3/4] smsg: ->populate falls back to old {ds}/{ts} values Eric Wong
  2022-08-19  9:07 ` [PATCH 4/4] lei/store: reindex culls over-indexed messages Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2022-08-19  9:07 UTC
  To: meta

This is a bit hard to test, but we must at least ensure
volatile metadata (keywords, in these tests) is preserved.
---
 MANIFEST        |  1 +
 t/lei-index.t   | 12 +++++++++++-
 t/lei-reindex.t | 12 ++++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 t/lei-reindex.t

diff --git a/MANIFEST b/MANIFEST
index 27e4c4e0..43128382 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -477,6 +477,7 @@ t/lei-q-remote-import.t
 t/lei-q-save.t
 t/lei-q-thread.t
 t/lei-refresh-mail-sync.t
+t/lei-reindex.t
 t/lei-sigpipe.t
 t/lei-tag.t
 t/lei-up.t
diff --git a/t/lei-index.t b/t/lei-index.t
index aab8f7e6..c31b1c3c 100644
--- a/t/lei-index.t
+++ b/t/lei-index.t
@@ -1,5 +1,5 @@
 #!perl -w
-# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict; use v5.10.1; use PublicInbox::TestCommon;
 use File::Spec;
@@ -85,6 +85,10 @@ test_lei({ tmpdir => $tmpdir }, sub {
 	lei_ok qw(q m:multipart-html-sucks@11);
 	is_deeply(json_utf8->decode($lei_out)->[0]->{'kw'},
 		['seen'], 'keyword set');
+	lei_ok 'reindex';
+	lei_ok qw(q m:multipart-html-sucks@11);
+	is_deeply(json_utf8->decode($lei_out)->[0]->{'kw'},
+		['seen'], 'keyword still set after reindex');
 
 	$srv->{nntpd} and
 		lei_ok('index', "nntp://$srv->{nntp_host_port}/t.v2");
@@ -104,6 +108,12 @@ test_lei({ tmpdir => $tmpdir }, sub {
 	my $t = xqx(['git', "--git-dir=$store_path/ALL.git",
 			qw(cat-file -t), $res_b->{blob}]);
 	is($t, "blob\n", 'got blob');
+
+	lei_ok('reindex');
+	lei_ok qw(q m:multipart-html-sucks@11);
+	$res_a = json_utf8->decode($lei_out)->[0];
+	is_deeply($res_a->{'kw'}, ['seen'],
+		'keywords still set after reindex');
 });
 
 done_testing;
diff --git a/t/lei-reindex.t b/t/lei-reindex.t
new file mode 100644
index 00000000..73346ee8
--- /dev/null
+++ b/t/lei-reindex.t
@@ -0,0 +1,12 @@
+#!perl -w
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use v5.12; use PublicInbox::TestCommon;
+require_mods(qw(lei));
+my ($tmpdir, $for_destroy) = tmpdir;
+test_lei(sub {
+	ok(!lei('reindex'), 'reindex fails w/o store');
+	like $lei_err, qr/nothing indexed/, "`nothing indexed' noted";
+});
+
+done_testing;


* [PATCH 3/4] smsg: ->populate falls back to old {ds}/{ts} values
  2022-08-19  9:07 [PATCH 0/4] lei reindex-related stuff Eric Wong
  2022-08-19  9:07 ` [PATCH 1/4] lei reindex: account for parallel lei/store users Eric Wong
  2022-08-19  9:07 ` [PATCH 2/4] tests: add some basic "lei reindex" tests Eric Wong
@ 2022-08-19  9:07 ` Eric Wong
  2022-08-19  9:07 ` [PATCH 4/4] lei/store: reindex culls over-indexed messages Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2022-08-19  9:07 UTC
  To: meta

This will be useful for re-indexing external messages which
were over-indexed in lei/store.
---
 lib/PublicInbox/Smsg.pm | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 260ce6bb..2026c7d9 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -115,8 +115,10 @@ sub populate {
 		$self->{$f} = $val if $val ne '';
 	}
 	$sync //= {};
-	$self->{-ds} = [ my @ds = msg_datestamp($hdr, $sync->{autime}) ];
-	$self->{-ts} = [ my @ts = msg_timestamp($hdr, $sync->{cotime}) ];
+	my @ds = msg_datestamp($hdr, $sync->{autime} // $self->{ds});
+	my @ts = msg_timestamp($hdr, $sync->{cotime} // $self->{ts});
+	$self->{-ds} = \@ds;
+	$self->{-ts} = \@ts;
 	$self->{ds} //= $ds[0]; # no zone
 	$self->{ts} //= $ts[0];
 	$self->{mid} //= mids($hdr)->[0];
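
The fallback above relies on Perl's defined-or operator.  A minimal
sketch of just that operator follows; fallback_time is a hypothetical
helper, not part of Smsg.pm, and how msg_datestamp/msg_timestamp use
the value they are handed is outside what this sketch claims.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring `$sync->{autime} // $self->{ds}`:
# prefer the sync-supplied time, else the previously stored one.
sub fallback_time {
	my ($sync_time, $stored_time) = @_;
	return $sync_time // $stored_time;
}

# No sync-supplied time: the stored value is used.
print fallback_time(undef, 1660900020), "\n";

# A sync-supplied time wins over the stored value.
print fallback_time(1660900000, 1660900020), "\n";

# `//` tests definedness, not truth, so an epoch of 0 is kept,
# whereas `||` would discard it in favor of the stored value.
print fallback_time(0, 1660900020), "\n";
print((0 || 1660900020), "\n");
```

Using `//` rather than `||` matters whenever 0 is a legitimate
value, which is the usual reason to prefer it for timestamps.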


* [PATCH 4/4] lei/store: reindex culls over-indexed messages
  2022-08-19  9:07 [PATCH 0/4] lei reindex-related stuff Eric Wong
                   ` (2 preceding siblings ...)
  2022-08-19  9:07 ` [PATCH 3/4] smsg: ->populate falls back to old {ds}/{ts} values Eric Wong
@ 2022-08-19  9:07 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2022-08-19  9:07 UTC
  To: meta

I may be the only lei user who has redundantly-indexed messages
needing this, though...
---
 lib/PublicInbox/LeiStore.pm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index 8e710540..57f0e013 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -344,6 +344,15 @@ sub _reindex_1 { # git->cat_async callback
 		my $eml = PublicInbox::Eml->new($bref);
 		$smsg->{-merge_vmd} = 1; # preserve existing keywords
 		$eidx->idx_shard($smsg->{num})->index_eml($eml, $smsg);
+	} elsif ($type eq 'missing') {
+		# pre-release/buggy lei may've indexed external-only msgs,
+		# try to correct that, here
+		warn("E: missing $hex, culling (ancient lei artifact?)\n");
+		$smsg->{to} = $smsg->{cc} = $smsg->{from} = '';
+		$smsg->{bytes} = 0;
+		$eidx->{oidx}->update_blob($smsg, '');
+		my $eml = PublicInbox::Eml->new("\r\n\r\n");
+		$eidx->idx_shard($smsg->{num})->index_eml($eml, $smsg);
 	} else {
 		warn("E: $type $hex\n");
 	}
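
A rough sketch of what culling amounts to, using a plain hash in
place of a real smsg/over.sqlite3 row.  cull_record and needs_blob
are hypothetical names; the field names come from the patch, and
the assumption that update_blob($smsg, '') leaves the record with an
empty blob is mine.  The bytes == 0 check mirrors the existing
"external-only message" guard in reindex_art.

```perl
use strict;
use warnings;

# Hypothetical: blank out everything that implied a local blob.
sub cull_record {
	my ($rec) = @_;
	$rec->{$_} = '' for qw(to cc from blob); # blob: assumed effect
	$rec->{bytes} = 0; # marks the record as external-only
	return $rec;
}

# Mirrors `return if $smsg->{bytes} == 0` in reindex_art.
sub needs_blob {
	my ($rec) = @_;
	return $rec->{bytes} != 0 ? 1 : 0;
}

my $rec = {
	num => 1, blob => 'deadbeef', bytes => 1234,
	to => 'a@example.com', cc => '', from => 'b@example.com',
};
my $before = needs_blob($rec);
cull_record($rec);
my $after = needs_blob($rec);
printf "before=%d after=%d\n", $before, $after;
```

Once a record has been culled this way, a later reindex skips it at
the bytes == 0 guard instead of asking git for a blob that is not
there, so the warning should only appear once per affected message.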

