* [PATCH 00/31] (ext)index: eliminate $sync structure
From: Eric Wong @ 2025-01-10 23:17 UTC
To: meta
A giant set of changes to simplify indexing internals and make
data lifetimes easier to reason about. Testing took a few weeks
since it was interrupted by several power outages, but my mirror
of lore <https://yhbt.net/lore/> seems OK upon reindex.
When I originally wrote this code, I didn't know to (ab)use
`local' as much as I do now.
Hopefully this giant pile makes future changes easier!
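For readers less familiar with the idiom: `local' works on hash
elements as well as package variables, so an entry point can set a
field on $self for the duration of one call and have the previous
value restored on return. A minimal sketch with a hypothetical object
(sync_once and step are made-up names, not public-inbox subs):

```perl
use strict;
use warnings;

# Hypothetical stand-in for an indexer object; the field names mirror
# the series ({nrec}, {-opt}) but none of this is public-inbox code.
my $self = { nrec => 42 };
my @seen;

sub step {	# stands in for a deeply nested callee
	my ($self) = @_;
	++$self->{nrec};
	push @seen, $self->{nrec};
}

sub sync_once {	# stands in for a top-level entry point
	my ($self, $opt) = @_;
	local $self->{nrec} = 0;	# fresh counter for this call only
	local $self->{-opt} = $opt;	# reachable by callees through $self
	step($self) for 1..3;
	my $n = $self->{nrec};	# copy before the `local' scope ends
	return $n;
}

my $done = sync_once($self, { verbose => 1 });
print "indexed $done, seen @seen\n";
print "nrec restored to $self->{nrec}\n";
```

The callee never receives a second per-run hash: everything it needs
hangs off $self, and the lifetime of each field is bounded by the sub
that localized it.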
Eric Wong (31):
(ext)index: ${$sync->{nr}} to $self->{nrec}
(ext)index: move {-opt} from $sync to $self
(ext)index: move {-regen_fmt} from $sync to $self
(ext)index: move {latest_cmt} to $self (from $sync)
smsg->populate: rename $sync to $cmt_info
searchidx: prefix v1 code with `v1_'
(ext)index: avoid needless {git} ref with --max-size
(ext)index: move {quit} from $sync to $self
extindex: {boost_in_use} field to $self
extindex: move {id2pos} to $self
searchidx: move {ntodo} to $self
searchidx: rename {sidx} to {self}
(ext)index: move {max_size} and related bits to $self
index: move {D} (delete state) to $self
index: move {reindex} to $self
searchidx: eliminate $sync from subroutines
v2writable: move {mm_tmp} to $self
(ext)index: eliminate most uses of `$sync->{ibx}'
extindex: eliminate repeated ->eidx_key method call
v2writable: eliminate $sync->{art_end}
(ext)index: $sync->{unit} => $self->{unit}
(ext)index: eliminate redundant $sync->{epoch_max}
extindex: move {dedupe_cull} to self
extindex: simplify data structures used for dedupe
(ext)index: move {todo} into $self
v2writable: {in_unindex} moved to self
v2writable: hoist out process_todo sub for extindex
(ext)index: move {unindexed} to $self
(ext)index: move {ranges} to $self
(ext)index: eliminate $sync->{ibx}
(ext)index: eliminate $sync entirely
lib/PublicInbox/ExtSearchIdx.pm | 358 +++++++++++++++-----------------
lib/PublicInbox/IdxStack.pm | 2 +-
lib/PublicInbox/LeiInput.pm | 4 +-
lib/PublicInbox/SearchIdx.pm | 223 ++++++++++----------
lib/PublicInbox/Smsg.pm | 7 +-
lib/PublicInbox/V2Writable.pm | 265 ++++++++++++-----------
t/search.t | 7 +-
7 files changed, 425 insertions(+), 441 deletions(-)
* [PATCH 00/31] (ext)index: eliminate $sync structure
From: Eric Wong @ 2025-01-10 23:18 UTC
To: meta
* [PATCH 01/31] (ext)index: ${$sync->{nr}} to $self->{nrec}
From: Eric Wong @ 2025-01-10 23:18 UTC
To: meta
{nrec} is probably less confusing as a name in a long-lived
context and we can get rid of the awkward scalar dereferencing
by using $self.
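To illustrate the difference in isolation, here is a small sketch of
the two shapes. It is not taken from the patch; the hashes below are
hypothetical stand-ins for $sync and the indexer object:

```perl
use strict;
use warnings;

# Old shape: the counter lives in a lexical and is shared through a
# scalar reference stored in a per-run hash (called $sync in the series).
my $sync = { nr => \(my $nr = 0) };
++${$sync->{nr}} for 1..2;	# every access needs ${...}
my $old_msg = "indexed ${$sync->{nr}}";

# New shape: a plain field on the long-lived object.
my $self = { nrec => 0 };
++$self->{nrec} for 1..2;
my $new_msg = "indexed $self->{nrec}";

print "$old_msg ($nr via the lexical)\n$new_msg\n";
```

Both variants count the same way; the second just drops one level of
indirection at every read and write.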
---
lib/PublicInbox/ExtSearchIdx.pm | 25 ++++++++++++-------------
lib/PublicInbox/SearchIdx.pm | 11 +++++------
lib/PublicInbox/V2Writable.pm | 14 +++++++-------
3 files changed, 24 insertions(+), 26 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 970d5eb3..ebbb2af1 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -349,7 +349,7 @@ sub index_oid { # git->cat_async callback for 'm'
blob => $oid,
}, 'PublicInbox::Smsg';
$new_smsg->set_bytes($$bref, $size);
- ++${$req->{nr}};
+ ++$self->{nrec};
my $mismatch = [];
$req->{xnum} = cur_ibx_xnum($req, $bref, $mismatch) // do {
warn "# deleted\n";
@@ -397,7 +397,7 @@ sub _sync_inbox ($$$) {
return "W: skipping $ekey ($err)";
}
$sync->{ibx} = $ibx;
- $sync->{nr} = \(my $nr = 0);
+ $self->{nrec} = 0;
my $v = $ibx->version;
if ($v == 2) {
$sync->{epoch_max} = $ibx->max_git_epoch // return;
@@ -528,10 +528,8 @@ sub eidx_gc { # top-level entry point
$opt->{-idx_gc} = 1;
local $self->{checkpoint_unlocks} = 1;
local $self->{need_checkpoint} = 0;
- my $sync = {
- -opt => $opt,
- self => $self,
- };
+ local $self->{nrec};
+ my $sync = { -opt => $opt, self => $self };
$self->idx_init($opt); # acquire lock via V2Writable::_idx_init
eidx_gc_scan_inboxes($self, $sync);
eidx_gc_scan_shards($self, $sync);
@@ -793,7 +791,7 @@ sub eidxq_process ($$) { # for reindexing
return unless ($self->{cfg} && eidxq_lock_acquire($self));
my $dbh = $self->{oidx}->dbh;
my $tot = $dbh->selectrow_array('SELECT COUNT(*) FROM eidxq') or return;
- ${$sync->{nr}} = 0;
+ $self->{nrec} = 0;
local $sync->{-regen_fmt} = "%u/$tot\n";
my $pr = $sync->{-opt}->{-progress};
if ($pr) {
@@ -815,7 +813,7 @@ restart:
warn "E: #$docid does not exist in over\n";
}
$del->execute($docid);
- ++${$sync->{nr}};
+ ++$self->{nrec};
if (update_checkpoint $self) {
$dbh = $del = $iter = undef;
@@ -825,7 +823,7 @@ restart:
}
}
$self->git->async_wait_all;
- $pr->("reindexed ${$sync->{nr}}/$tot\n") if $pr;
+ $pr->("reindexed $self->{nrec}/$tot\n") if $pr;
}
sub _reindex_unseen { # git->cat_async callback
@@ -900,12 +898,12 @@ sub _reindex_check_ibx ($$$) {
my $msgs;
my $pr = $sync->{-opt}->{-progress};
local $sync->{-regen_fmt} = "$ekey checking %u/$max\n";
- ${$sync->{nr}} = 0;
+ $self->{nrec} = 0;
my $fast = $sync->{-opt}->{fast};
my $usr; # _unref_stale_range (< $lo) called
my ($lo, $hi);
while (scalar(@{$msgs = $ibx->over->query_xover($beg, $end, $opt)})) {
- ${$sync->{nr}} = $beg;
+ $self->{nrec} = $beg;
$beg = $msgs->[-1]->{num} + 1;
$end = $beg + $slice;
$end = $max if $end > $max;
@@ -1067,7 +1065,7 @@ EOS
while (my ($mid, $id) = $iter->fetchrow_array) {
last if $sync->{quit};
$self->{current_info} = "dedupe $mid";
- ${$sync->{nr}} = $min_id = $id;
+ $self->{nrec} = $min_id = $id;
my ($prv, @smsg);
while (my $x = $self->{oidx}->next_by_mid($mid, \$id, \$prv)) {
push @smsg, $x;
@@ -1105,7 +1103,7 @@ EOS
if (my $pr = $sync->{-opt}->{-progress}) {
$pr->("culled $n/$candidates candidates ($nr_mid msgids)\n");
}
- ${$sync->{nr}} = 0;
+ $self->{nrec} = 0;
}
sub eidx_sync { # main entry point
@@ -1116,6 +1114,7 @@ sub eidx_sync { # main entry point
$self->idx_init($opt); # acquire lock via V2Writable::_idx_init
$self->{oidx}->rethread_prepare($opt);
local $self->{need_checkpoint} = 0;
+ local $self->{nrec} = 0;
my $sync = {
-opt => $opt,
# DO NOT SET {reindex} here, it's incompatible with reused
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 8ac8cac3..34df5c90 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -801,7 +801,6 @@ sub update_checkpoint ($;$) {
sub index_both { # git->cat_async callback
my ($bref, $oid, $type, $size, $sync) = @_;
return if is_bad_blob($oid, $type, $size, $sync->{oid});
- ++${$sync->{nr}};
my $smsg = bless { blob => $oid }, 'PublicInbox::Smsg';
$smsg->set_bytes($$bref, $size);
my $self = $sync->{sidx};
@@ -812,6 +811,7 @@ sub index_both { # git->cat_async callback
die "E: could not generate NNTP article number for $oid";
add_message($self, $eml, $smsg, $sync);
++$self->{nidx};
+ ++$self->{nrec};
my $cur_cmt = $sync->{cur_cmt} // die 'BUG: {cur_cmt} missing';
${$sync->{latest_cmt}} = $cur_cmt;
}
@@ -891,11 +891,11 @@ sub v1_checkpoint ($$;$) {
}
commit_txn_lazy($self);
$sync->{ibx}->git->cleanup;
- my $nr = ${$sync->{nr}};
- idx_release($self, $nr);
+ my $nrec = $self->{nrec};
+ idx_release($self, $nrec);
# let another process do some work...
if (my $pr = $sync->{-opt}->{-progress}) {
- $pr->("indexed $nr/$sync->{ntodo}\n") if $nr;
+ $pr->("indexed $nrec/$sync->{ntodo}\n") if $nrec;
}
if (!$stk && !$sync->{quit}) { # more to come
begin_txn_lazy($self);
@@ -909,8 +909,7 @@ sub v1_checkpoint ($$;$) {
sub process_stack {
my ($self, $sync, $stk) = @_;
my $git = $sync->{ibx}->git;
- my $nr = 0;
- $sync->{nr} = \$nr;
+ $self->{nrec} = 0;
$sync->{sidx} = $self;
local $self->{need_checkpoint} = 0;
$sync->{latest_cmt} = \(my $latest_cmt);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 61c41b60..dd3258f3 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -711,7 +711,7 @@ sub reindex_checkpoint ($$) {
}
if (my $pr = $sync->{-regen_fmt} ? $sync->{-opt}->{-progress} : undef) {
- $pr->(sprintf($sync->{-regen_fmt}, ${$sync->{nr}}));
+ $pr->(sprintf $sync->{-regen_fmt}, $self->{nrec});
}
# allow -watch or -mda to write...
@@ -803,7 +803,7 @@ sub index_oid { # cat_async callback
warn "E: $oid <", join('> <', @$mids), "> is a duplicate\n";
return;
}
- ++${$arg->{nr}};
+ ++$self->{nrec};
my $smsg = bless {
num => $num,
blob => $oid,
@@ -994,7 +994,7 @@ sub sync_prepare ($$) {
# it's a problem and we need to notice it via die()
my $pad = length($regen_max) + 1;
$sync->{-regen_fmt} = "% ${pad}u/$regen_max\n";
- $sync->{nr} = \(my $nr = 0);
+ $self->{nrec} = 0;
return -1 if $sync->{reindex};
$regen_max + $self->artnum_max || 0;
}
@@ -1115,7 +1115,7 @@ sub index_xap_step ($$$;$) {
# have timeout problems like SQLite
my $n = $self->{transact_bytes} += $smsg->{bytes};
if ($n >= $self->{batch_bytes}) {
- ${$sync->{nr}} = $num;
+ $self->{nrec} = $num;
reindex_checkpoint($self, $sync);
}
}
@@ -1179,7 +1179,6 @@ sub xapian_only {
$sync //= {
-opt => $opt,
self => $self,
- nr => \(my $nr = 0),
-regen_fmt => "%u/?\n",
};
$sync->{art_end} = $art_end;
@@ -1206,6 +1205,7 @@ sub index_sync {
my ($self, $opt) = @_;
$opt //= {};
local $self->{need_checkpoint} = 0;
+ local $self->{nrec} = 0;
return xapian_only($self, $opt) if $opt->{xapian_only};
my $epoch_max = $self->{ibx}->max_git_epoch // return;
@@ -1262,9 +1262,9 @@ sub index_sync {
$self->{oidx}->rethread_done($opt) unless $sync->{quit};
$self->done;
- if (my $nr = $sync->{nr}) {
+ if (my $nrec = $self->{nrec}) {
my $pr = $sync->{-opt}->{-progress};
- $pr->('all.git '.sprintf($sync->{-regen_fmt}, $$nr)) if $pr;
+ $pr->('all.git '.sprintf($sync->{-regen_fmt}, $nrec)) if $pr;
}
my $quit_warn;
* [PATCH 02/31] (ext)index: move {-opt} from $sync to $self
From: Eric Wong @ 2025-01-10 23:18 UTC
To: meta
Another step towards eradicating the $sync structure.
This intermediate step adds a $self arg to log2stack() and
prepare_stack() alongside $sync, but $self will eventually
replace $sync entirely.
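As a rough sketch of what reading options through $self looks like,
the hypothetical helper below (date_args is an invented name, not a
public-inbox sub) builds git-log(1) date arguments in the same spirit
as the since/until loop in log2stack(), and uses the same hash-slice
test that need_update() applies to {-opt}:

```perl
use strict;
use warnings;

# Hypothetical helper: options are read from the object itself rather
# than from a separate per-run hash.  --since/--until are only honored
# when reindexing, mirroring the check in the patched loop.
sub date_args {
	my ($self) = @_;
	my $opt = $self->{-opt} // {};
	return () unless $opt->{reindex};
	my @cmd;
	for my $k (qw(since until)) {
		my $v = $opt->{$k} // next;
		push @cmd, "--$k=$v";
	}
	@cmd;
}

my $self = { -opt => { reindex => 1, since => '2024-01-01' } };
my @args = date_args($self);

# hash slice over {-opt}: counts how many date bounds are defined
my $bounded = grep(defined, @{$self->{-opt}}{qw(since until)});
print "@args (bounded: $bounded)\n";
```

A caller that has no indexer object still needs to pass something
hash-like as the first argument, which is why the intermediate
signature carries both $self and $sync.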
---
lib/PublicInbox/ExtSearchIdx.pm | 24 ++++++++++++----------
lib/PublicInbox/LeiInput.pm | 3 ++-
lib/PublicInbox/SearchIdx.pm | 29 ++++++++++++++-------------
lib/PublicInbox/V2Writable.pm | 35 ++++++++++++++++-----------------
4 files changed, 48 insertions(+), 43 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index ebbb2af1..25f2d8e7 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -407,7 +407,8 @@ sub _sync_inbox ($$$) {
my $lc = $self->{oidx}->eidx_meta("lc-v1:$ekey//$uv");
my $head = $ibx->mm->last_commit //
return "E: $ibx->{inboxdir} is not indexed";
- my $stk = prepare_stack($sync, $lc ? "$lc..$head" : $head);
+ my $stk = prepare_stack($self, $sync,
+ $lc ? "$lc..$head" : $head);
my $unit = { stack => $stk, git => $ibx->git };
push @{$sync->{todo}}, $unit;
} else {
@@ -529,7 +530,8 @@ sub eidx_gc { # top-level entry point
local $self->{checkpoint_unlocks} = 1;
local $self->{need_checkpoint} = 0;
local $self->{nrec};
- my $sync = { -opt => $opt, self => $self };
+ local $self->{-opt} = $opt;
+ my $sync = { self => $self };
$self->idx_init($opt); # acquire lock via V2Writable::_idx_init
eidx_gc_scan_inboxes($self, $sync);
eidx_gc_scan_shards($self, $sync);
@@ -793,7 +795,7 @@ sub eidxq_process ($$) { # for reindexing
my $tot = $dbh->selectrow_array('SELECT COUNT(*) FROM eidxq') or return;
$self->{nrec} = 0;
local $sync->{-regen_fmt} = "%u/$tot\n";
- my $pr = $sync->{-opt}->{-progress};
+ my $pr = $self->{-opt}->{-progress};
if ($pr) {
my $min = $dbh->selectrow_array('SELECT MIN(docid) FROM eidxq');
my $max = $dbh->selectrow_array('SELECT MAX(docid) FROM eidxq');
@@ -896,10 +898,10 @@ sub _reindex_check_ibx ($$$) {
# first, check if we missed any messages in target $ibx
my $msgs;
- my $pr = $sync->{-opt}->{-progress};
+ my $pr = $self->{-opt}->{-progress};
local $sync->{-regen_fmt} = "$ekey checking %u/$max\n";
$self->{nrec} = 0;
- my $fast = $sync->{-opt}->{fast};
+ my $fast = $self->{-opt}->{fast};
my $usr; # _unref_stale_range (< $lo) called
my ($lo, $hi);
while (scalar(@{$msgs = $ibx->over->query_xover($beg, $end, $opt)})) {
@@ -1023,7 +1025,7 @@ sub dd_smsg { # git->cat_async callback
print STDERR
"# <$keep->{mid}> keeping #$keep->{num}, dropping ",
join(', ', map { "#$_->{num}" } @$ary),"\n";
- next if $per_mid->{sync}->{-opt}->{'dry-run'};
+ next if $self->{-opt}->{'dry-run'};
my $oidx = $self->{oidx};
for my $smsg (@$ary) {
my $gone = $smsg->{num};
@@ -1100,7 +1102,7 @@ EOS
goto dedupe_restart if defined($msgids->[++$idx]);
my $n = delete $sync->{dedupe_cull};
- if (my $pr = $sync->{-opt}->{-progress}) {
+ if (my $pr = $self->{-opt}->{-progress}) {
$pr->("culled $n/$candidates candidates ($nr_mid msgids)\n");
}
$self->{nrec} = 0;
@@ -1115,8 +1117,8 @@ sub eidx_sync { # main entry point
$self->{oidx}->rethread_prepare($opt);
local $self->{need_checkpoint} = 0;
local $self->{nrec} = 0;
+ local $self->{-opt} = $opt;
my $sync = {
- -opt => $opt,
# DO NOT SET {reindex} here, it's incompatible with reused
# V2Writable code, reindex is totally different here
# compared to v1/v2 inboxes because we have multiple histories
@@ -1306,7 +1308,7 @@ sub _watch_commit { # PublicInbox::DS::add_timer callback
sub on_inbox_unlock { # called by PublicInbox::InboxIdle
my ($self, $ibx) = @_;
- my $opt = $self->{-watch_sync}->{-opt};
+ my $opt = $self->{-opt};
my $pr = $opt->{-progress};
my $ekey = $ibx->eidx_key;
local $0 = "sync $ekey";
@@ -1320,7 +1322,7 @@ sub on_inbox_unlock { # called by PublicInbox::InboxIdle
sub eidx_reload { # -extindex --watch SIGHUP handler
my ($self, $idler) = @_;
if ($self->{cfg}) {
- my $pr = $self->{-watch_sync}->{-opt}->{-progress};
+ my $pr = $self->{-opt}->{-progress};
$pr->('reloading ...') if $pr;
delete $self->{-resync_queue};
delete $self->{-ibx_ary_known};
@@ -1359,6 +1361,7 @@ sub event_step { # PublicInbox::DS::requeue callback
}
}
+# FIXME: totally untested and undocumented
sub eidx_watch { # public-inbox-extindex --watch main loop
my ($self, $opt) = @_;
local @SIG{keys %SIG} = values %SIG;
@@ -1378,6 +1381,7 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
}
my $pr = $opt->{-progress};
$pr->("performing initial scan ...\n") if $pr;
+ local $self->{-opt} = $opt;
my $sync = eidx_sync($self, $opt); # initial sync
return if $sync->{quit};
my $oldset = PublicInbox::DS::block_signals();
diff --git a/lib/PublicInbox/LeiInput.pm b/lib/PublicInbox/LeiInput.pm
index 0a6aba82..618829ef 100644
--- a/lib/PublicInbox/LeiInput.pm
+++ b/lib/PublicInbox/LeiInput.pm
@@ -146,7 +146,8 @@ EOM
my $sync = { D => {}, ibx => $ibx }; # D => {} filters out deletes
my ($f, $at, $ct, $oid, $cmt);
for my $git (grep defined, @g) {
- my $s = PublicInbox::SearchIdx::log2stack($sync, $git, 'HEAD');
+ my $s = PublicInbox::SearchIdx::log2stack($sync, $sync,
+ $git, 'HEAD');
while (($f, $at, $ct, $oid, $cmt) = $s->pop_rec) {
$git->cat_async($oid, \&oid2eml, $self) if $f eq 'm';
}
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 34df5c90..628a1469 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -887,14 +887,14 @@ sub v1_checkpoint ($$;$) {
my $n = $xdb->get_metadata('has_threadid');
$xdb->set_metadata('has_threadid', '1') if $n ne '1';
}
- $self->{oidx}->rethread_done($sync->{-opt}); # all done
+ $self->{oidx}->rethread_done($self->{-opt}); # all done
}
commit_txn_lazy($self);
$sync->{ibx}->git->cleanup;
my $nrec = $self->{nrec};
idx_release($self, $nrec);
# let another process do some work...
- if (my $pr = $sync->{-opt}->{-progress}) {
+ if (my $pr = $self->{-opt}->{-progress}) {
$pr->("indexed $nrec/$sync->{ntodo}\n") if $nrec;
}
if (!$stk && !$sync->{quit}) { # more to come
@@ -923,7 +923,7 @@ sub process_stack {
$git->cat_async($oid, \&unindex_both, $sync);
}
}
- if ($sync->{max_size} = $sync->{-opt}->{max_size}) {
+ if ($sync->{max_size} = $self->{-opt}->{max_size}) {
$sync->{index_oid} = \&index_both;
}
while (my ($f, $at, $ct, $oid, $cur_cmt) = $stk->pop_rec) {
@@ -946,8 +946,8 @@ sub process_stack {
v1_checkpoint($self, $sync, $sync->{quit} ? undef : $stk);
}
-sub log2stack ($$$) {
- my ($sync, $git, $range) = @_;
+sub log2stack ($$$$) {
+ my ($self, $sync, $git, $range) = @_;
my $D = $sync->{D}; # OID_BIN => NR (if reindexing, undef otherwise)
my ($add, $del);
if ($sync->{ibx}->version == 1) {
@@ -964,8 +964,8 @@ sub log2stack ($$$) {
my @cmd = qw(log --raw -r --pretty=tformat:%at-%ct-%H
--no-notes --no-color --no-renames --no-abbrev);
for my $k (qw(since until)) {
- my $v = $sync->{-opt}->{$k} // next;
- next if !$sync->{-opt}->{reindex};
+ my $v = $self->{-opt}->{$k} // next;
+ next if !$self->{-opt}->{reindex};
push @cmd, "--$k=$v";
}
my $fh = $git->popen(@cmd, $range);
@@ -999,8 +999,8 @@ sub log2stack ($$$) {
$stk->read_prepare;
}
-sub prepare_stack ($$) {
- my ($sync, $range) = @_;
+sub prepare_stack ($$$) {
+ my ($self, $sync, $range) = @_;
my $git = $sync->{ibx}->git;
if (index($range, '..') < 0) {
@@ -1010,7 +1010,7 @@ sub prepare_stack ($$) {
return PublicInbox::IdxStack->new->read_prepare if $?;
}
$sync->{D} = $sync->{reindex} ? {} : undef; # OID_BIN => NR
- log2stack($sync, $git, $range);
+ log2stack($self, $sync, $git, $range);
}
# --is-ancestor requires git 1.8.0+
@@ -1028,7 +1028,7 @@ sub need_update ($$$$) {
# don't rewind if --{since,until,before,after} are in use
return if $cur ne '' &&
- grep(defined, @{$sync->{-opt}}{qw(since until)}) &&
+ grep(defined, @{$self->{-opt}}{qw(since until)}) &&
is_ancestor($git, $new, $cur);
return 1 if $cur ne '' && !is_ancestor($git, $cur, $new);
@@ -1067,7 +1067,7 @@ sub quit_cb ($) {
sub {
# we set {-opt}->{quit} too, so ->index_sync callers
# can abort multi-inbox loops this way
- $sync->{quit} = $sync->{-opt}->{quit} = 1;
+ $sync->{quit} = $sync->{self}->{-opt}->{quit} = 1;
warn "gracefully quitting\n";
}
}
@@ -1086,7 +1086,8 @@ sub _index_sync {
}
local $self->{transact_bytes} = 0;
my $pr = $opt->{-progress};
- my $sync = { reindex => $opt->{reindex}, -opt => $opt, ibx => $ibx };
+ local $self->{-opt} = $opt;
+ my $sync = { reindex => $opt->{reindex}, ibx => $ibx };
my $quit = quit_cb($sync);
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
@@ -1108,7 +1109,7 @@ sub _index_sync {
my $lx = reindex_from($sync->{reindex}, $last_commit);
my $range = $lx eq '' ? $tip : "$lx..$tip";
$pr->("counting changes\n\t$range ... ") if $pr;
- my $stk = prepare_stack($sync, $range);
+ my $stk = prepare_stack($self, $sync, $range);
$sync->{ntodo} = $stk ? $stk->num_records : 0;
$pr->("$sync->{ntodo}\n") if $pr; # continue previous line
process_stack($self, $sync, $stk) if !$sync->{quit};
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index dd3258f3..74281fed 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -710,12 +710,12 @@ sub reindex_checkpoint ($$) {
$self->done; # release lock
}
- if (my $pr = $sync->{-regen_fmt} ? $sync->{-opt}->{-progress} : undef) {
+ if (my $pr = $sync->{-regen_fmt} ? $self->{-opt}->{-progress} : undef) {
$pr->(sprintf $sync->{-regen_fmt}, $self->{nrec});
}
# allow -watch or -mda to write...
- $self->idx_init($sync->{-opt}); # reacquire lock
+ $self->idx_init($self->{-opt}); # reacquire lock
$mm_tmp->atfork_parent if $mm_tmp;
}
@@ -829,7 +829,7 @@ sub update_last_commit {
}
# don't rewind if --{since,until,before,after} are in use
return if (defined($last) &&
- grep(defined, @{$sync->{-opt}}{qw(since until)}) &&
+ grep(defined, @{$self->{-opt}}{qw(since until)}) &&
is_ancestor($self->git, $latest_cmt, $last));
last_epoch_commit($self, $unit->{epoch}, $latest_cmt);
@@ -847,7 +847,7 @@ sub last_commits {
# returns a revision range for git-log(1)
sub log_range ($$$) {
my ($sync, $unit, $tip) = @_;
- my $opt = $sync->{-opt};
+ my $opt = $sync->{self}->{-opt};
my $pr = $opt->{-progress} if (($opt->{verbose} || 0) > 1);
my $i = $unit->{epoch};
my $cur = $sync->{ranges}->[$i] or do {
@@ -910,7 +910,7 @@ sub artnum_max { $_[0]->{mm}->num_highwater }
sub sync_prepare ($$) {
my ($self, $sync) = @_;
$sync->{ranges} = sync_ranges($self, $sync);
- my $pr = $sync->{-opt}->{-progress};
+ my $pr = $self->{-opt}->{-progress};
my $regen_max = 0;
my $head = $sync->{ibx}->{ref_head} || 'HEAD';
my $pfx;
@@ -935,7 +935,7 @@ sub sync_prepare ($$) {
# rerun index_sync without {reindex}
$reindex_heads = $self->last_commits($sync);
}
- if ($sync->{max_size} = $sync->{-opt}->{max_size}) {
+ if ($sync->{max_size} = $self->{-opt}->{max_size}) {
$sync->{index_oid} = $self->can('index_oid');
}
my $git_pfx = "$sync->{ibx}->{inboxdir}/git";
@@ -960,7 +960,7 @@ sub sync_prepare ($$) {
# because we want NNTP article number gaps from unindexed
# messages to show up in mirrors, too.
$sync->{D} //= $sync->{reindex} ? {} : undef; # OID_BIN => NR
- my $stk = log2stack($sync, $git, $range);
+ my $stk = log2stack($self, $sync, $git, $range);
return 0 if $sync->{quit};
my $nr = $stk ? $stk->num_records : 0;
$pr->("$nr\n") if $pr;
@@ -1066,7 +1066,7 @@ sub unindex_todo ($$$) {
$fh->close or die "git log failed: \$?=$?";
$self->git->async_wait_all;
- return unless $sync->{-opt}->{prune};
+ return unless $self->{-opt}->{prune};
my $after = scalar keys %$unindexed;
return if $before == $after;
@@ -1102,7 +1102,7 @@ sub index_xap_step ($$$;$) {
$step //= $self->{shards};
my $ibx = $self->{ibx};
- if (my $pr = $sync->{-opt}->{-progress}) {
+ if (my $pr = $self->{-opt}->{-progress}) {
$pr->("Xapian indexlevel=$ibx->{indexlevel} ".
"$beg..$end (% $step)\n");
}
@@ -1169,15 +1169,14 @@ sub index_todo ($$$) {
$self->update_last_commit($sync, $stk);
}
-sub xapian_only {
- my ($self, $opt, $sync, $art_beg) = @_;
- my $seq = $opt->{'sequential-shard'};
+sub xapian_only ($;$$) {
+ my ($self, $sync, $art_beg) = @_;
+ my $seq = $self->{-opt}->{'sequential-shard'};
$art_beg //= 0;
local $self->{parallel} = 0 if $seq;
- $self->idx_init($opt); # acquire lock
+ $self->idx_init($self->{-opt}); # acquire lock
if (my $art_end = $self->{ibx}->mm->max) {
$sync //= {
- -opt => $opt,
self => $self,
-regen_fmt => "%u/?\n",
};
@@ -1206,7 +1205,8 @@ sub index_sync {
$opt //= {};
local $self->{need_checkpoint} = 0;
local $self->{nrec} = 0;
- return xapian_only($self, $opt) if $opt->{xapian_only};
+ local $self->{-opt} = $opt;
+ return xapian_only($self) if $opt->{xapian_only};
my $epoch_max = $self->{ibx}->max_git_epoch // return;
my $latest = $self->{mg}->epoch_dir."/$epoch_max.git";
@@ -1231,7 +1231,6 @@ sub index_sync {
$self->{oidx}->rethread_prepare($opt);
my $sync = {
reindex => $opt->{reindex},
- -opt => $opt,
self => $self,
ibx => $self->{ibx},
epoch_max => $epoch_max,
@@ -1263,7 +1262,7 @@ sub index_sync {
$self->done;
if (my $nrec = $self->{nrec}) {
- my $pr = $sync->{-opt}->{-progress};
+ my $pr = $self->{-opt}->{-progress};
$pr->('all.git '.sprintf($sync->{-regen_fmt}, $nrec)) if $pr;
}
@@ -1274,7 +1273,7 @@ sub index_sync {
$quit_warn = 1;
} else {
$self->{ibx}->{indexlevel} = $idxlevel;
- xapian_only($self, $opt, $sync, $art_beg);
+ xapian_only($self, $sync, $art_beg);
$quit_warn = 1 if $sync->{quit};
}
}
* [PATCH 03/31] (ext)index: move {-regen_fmt} from $sync to $self
From: Eric Wong @ 2025-01-10 23:18 UTC
To: meta
{-regen_fmt} is already set via `local' in some cases, so it
fits naturally on $self. This is yet another step towards
eliminating the $sync structure.
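The interplay between a default format, a temporary override and the
checkpoint reporter can be sketched as follows. This is only an
illustration under assumed names (report and dedupe_pass are
hypothetical, and @out replaces a real -progress callback):

```perl
use strict;
use warnings;

my @out;	# collected progress lines, in place of a -progress callback

# Hypothetical checkpoint reporter: emits a line only if a format is set.
sub report {
	my ($self) = @_;
	my $fmt = $self->{-regen_fmt} // return;
	push @out, sprintf($fmt, $self->{nrec});
}

sub dedupe_pass {	# temporarily overrides the caller's format
	my ($self, $max) = @_;
	local $self->{-regen_fmt} = "dedupe %u/$max\n";
	report($self);
}

my $self = { nrec => 7 };
$self->{-regen_fmt} //= "%u/?\n";	# fallback when the total is unknown
report($self);
dedupe_pass($self, 20);
report($self);	# the outer format is back in effect
print @out;
```

Because the override is localized, the nested pass does not have to
save and restore the previous format by hand.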
---
lib/PublicInbox/ExtSearchIdx.pm | 12 ++++++------
lib/PublicInbox/V2Writable.pm | 17 ++++++++---------
2 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 25f2d8e7..21f6b33a 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -794,7 +794,7 @@ sub eidxq_process ($$) { # for reindexing
my $dbh = $self->{oidx}->dbh;
my $tot = $dbh->selectrow_array('SELECT COUNT(*) FROM eidxq') or return;
$self->{nrec} = 0;
- local $sync->{-regen_fmt} = "%u/$tot\n";
+ local $self->{-regen_fmt} = "%u/$tot\n";
my $pr = $self->{-opt}->{-progress};
if ($pr) {
my $min = $dbh->selectrow_array('SELECT MIN(docid) FROM eidxq');
@@ -899,7 +899,7 @@ sub _reindex_check_ibx ($$$) {
# first, check if we missed any messages in target $ibx
my $msgs;
my $pr = $self->{-opt}->{-progress};
- local $sync->{-regen_fmt} = "$ekey checking %u/$max\n";
+ local $self->{-regen_fmt} = "$ekey checking %u/$max\n";
$self->{nrec} = 0;
my $fast = $self->{-opt}->{fast};
my $usr; # _unref_stale_range (< $lo) called
@@ -1047,7 +1047,7 @@ sub eidx_dedupe ($$$) {
my ($max_id) = $self->{oidx}->dbh->selectrow_array(<<EOS);
SELECT MAX(id) FROM msgid
EOS
- local $sync->{-regen_fmt} = "dedupe %u/$max_id\n";
+ local $self->{-regen_fmt} = "dedupe %u/$max_id\n";
# note: we could write this query more intelligently,
# but that causes lock contention with read-only processes
@@ -1118,12 +1118,12 @@ sub eidx_sync { # main entry point
local $self->{need_checkpoint} = 0;
local $self->{nrec} = 0;
local $self->{-opt} = $opt;
+ local $self->{-regen_fmt} = "%u/?\n";
my $sync = {
# DO NOT SET {reindex} here, it's incompatible with reused
# V2Writable code, reindex is totally different here
# compared to v1/v2 inboxes because we have multiple histories
self => $self,
- -regen_fmt => "%u/?\n",
};
local $SIG{USR1} = sub { $self->{need_checkpoint} = 1 };
my $quit = PublicInbox::SearchIdx::quit_cb($sync);
@@ -1298,9 +1298,9 @@ sub _watch_commit { # PublicInbox::DS::add_timer callback
delete $self->{-commit_timer};
eidxq_process($self, $self->{-watch_sync});
eidxq_release($self);
- my $fmt = delete $self->{-watch_sync}->{-regen_fmt};
+ my $fmt = delete $self->{-regen_fmt};
reindex_checkpoint($self, $self->{-watch_sync});
- $self->{-watch_sync}->{-regen_fmt} = $fmt;
+ $self->{-regen_fmt} = $fmt;
# call event_step => done unless commit_timer is armed
PublicInbox::DS::requeue($self);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 74281fed..ca231e0c 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -710,8 +710,8 @@ sub reindex_checkpoint ($$) {
$self->done; # release lock
}
- if (my $pr = $sync->{-regen_fmt} ? $self->{-opt}->{-progress} : undef) {
- $pr->(sprintf $sync->{-regen_fmt}, $self->{nrec});
+ if (my $pr = $self->{-regen_fmt} ? $self->{-opt}->{-progress} : undef) {
+ $pr->(sprintf $self->{-regen_fmt}, $self->{nrec});
}
# allow -watch or -mda to write...
@@ -986,14 +986,14 @@ sub sync_prepare ($$) {
}
return 0 if $sync->{quit};
if (!$regen_max) {
- $sync->{-regen_fmt} = "%u/?\n";
+ $self->{-regen_fmt} = "%u/?\n";
return 0;
}
# reindex should NOT see new commits anymore, if we do,
# it's a problem and we need to notice it via die()
my $pad = length($regen_max) + 1;
- $sync->{-regen_fmt} = "% ${pad}u/$regen_max\n";
+ $self->{-regen_fmt} = "% ${pad}u/$regen_max\n";
$self->{nrec} = 0;
return -1 if $sync->{reindex};
$regen_max + $self->artnum_max || 0;
@@ -1176,10 +1176,8 @@ sub xapian_only ($;$$) {
local $self->{parallel} = 0 if $seq;
$self->idx_init($self->{-opt}); # acquire lock
if (my $art_end = $self->{ibx}->mm->max) {
- $sync //= {
- self => $self,
- -regen_fmt => "%u/?\n",
- };
+ $self->{-regen_fmt} //= "%u/?\n";
+ $sync //= { self => $self };
$sync->{art_end} = $art_end;
if ($seq || !$self->{parallel}) {
my $shard_end = $self->{shards} - 1;
@@ -1206,6 +1204,7 @@ sub index_sync {
local $self->{need_checkpoint} = 0;
local $self->{nrec} = 0;
local $self->{-opt} = $opt;
+ local $self->{-regen_fmt};
return xapian_only($self) if $opt->{xapian_only};
my $epoch_max = $self->{ibx}->max_git_epoch // return;
@@ -1263,7 +1262,7 @@ sub index_sync {
if (my $nrec = $self->{nrec}) {
my $pr = $self->{-opt}->{-progress};
- $pr->('all.git '.sprintf($sync->{-regen_fmt}, $nrec)) if $pr;
+ $pr->('all.git '.sprintf($self->{-regen_fmt}, $nrec)) if $pr;
}
my $quit_warn;
* [PATCH 04/31] (ext)index: move {latest_cmt} to $self (from $sync)
From: Eric Wong @ 2025-01-10 23:18 UTC
To: meta
No more sigil and brace overload from ${$sync->{latest_cmt}}
and another step towards $sync elimination.
---
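A minimal sketch of the pattern described above, not the actual
public-inbox code: `on_blob_demo' and `index_todo_demo' are hypothetical
names and the commit IDs are made up. It assumes per-request hashes are
shallow copies carrying a {self} ref, as the `{ %$sync, ... }' copies in
the diff suggest. The callback writes $req->{self}->{latest_cmt} directly,
and `local' restores the previous value once the enclosing sub returns:

```perl
use strict;
use warnings;

# Hypothetical callback: records the newest commit on the shared object
# instead of writing through a scalar ref stored in the request hash.
sub on_blob_demo {
	my ($req) = @_;
	$req->{self}->{latest_cmt} = $req->{cur_cmt};
}

# Hypothetical stand-in for a sub such as index_todo
sub index_todo_demo {
	my ($self, @commits) = @_;
	local $self->{latest_cmt}; # undef in here, old value restored on return
	for my $cmt (@commits) {
		my $req = { self => $self, cur_cmt => $cmt }; # per-blob hash
		on_blob_demo($req);
	}
	my $newest = $self->{latest_cmt}; # what checkpoint code would see
	return $newest;
}

my $self = { latest_cmt => 'outer' };
my $seen = index_todo_demo($self, qw(c1 c2 c3));
print "inside: $seen, after: $self->{latest_cmt}\n";
```

With the old scalar-ref approach every copied request hash had to carry the
same ref; here the copies only need {self}, which they already have.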
lib/PublicInbox/ExtSearchIdx.pm | 10 ++++++----
lib/PublicInbox/IdxStack.pm | 2 +-
lib/PublicInbox/SearchIdx.pm | 8 ++++----
lib/PublicInbox/V2Writable.pm | 6 +++---
4 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 21f6b33a..441bb7b0 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -271,7 +271,7 @@ sub do_finalize ($) {
}
# cur_cmt may be undef for unindex_oid, set by V2Writable::index_todo
if (defined(my $cur_cmt = $req->{cur_cmt})) {
- ${$req->{latest_cmt}} = $cur_cmt;
+ $req->{self}->{latest_cmt} = $cur_cmt;
}
}
@@ -342,7 +342,7 @@ sub cur_ibx_xnum ($$;$) {
sub index_oid { # git->cat_async callback for 'm'
my ($bref, $oid, $type, $size, $req) = @_;
- my $self = $req->{self};
+ my $self = $req->{self} // die 'BUG: {self} missing';
local $self->{current_info} = "$self->{current_info} $oid";
return if is_bad_blob($oid, $type, $size, $req->{oid});
my $new_smsg = $req->{new_smsg} = bless {
@@ -354,7 +354,7 @@ sub index_oid { # git->cat_async callback for 'm'
$req->{xnum} = cur_ibx_xnum($req, $bref, $mismatch) // do {
warn "# deleted\n";
warn "# mismatch $_->{blob}\n" for @$mismatch;
- ${$req->{latest_cmt}} = $req->{cur_cmt} //
+ $self->{latest_cmt} = $req->{cur_cmt} //
die "BUG: {cur_cmt} unset ($oid)\n";
return;
};
@@ -531,6 +531,7 @@ sub eidx_gc { # top-level entry point
local $self->{need_checkpoint} = 0;
local $self->{nrec};
local $self->{-opt} = $opt;
+ local $self->{latest_cmt};
my $sync = { self => $self };
$self->idx_init($opt); # acquire lock via V2Writable::_idx_init
eidx_gc_scan_inboxes($self, $sync);
@@ -1119,6 +1120,7 @@ sub eidx_sync { # main entry point
local $self->{nrec} = 0;
local $self->{-opt} = $opt;
local $self->{-regen_fmt} = "%u/?\n";
+ local $self->{latest_cmt};
my $sync = {
# DO NOT SET {reindex} here, it's incompatible with reused
# V2Writable code, reindex is totally different here
@@ -1166,7 +1168,7 @@ sub eidx_sync { # main entry point
sub update_last_commit { # overrides V2Writable
my ($self, $sync, $stk) = @_;
my $unit = $sync->{unit} // return;
- my $latest_cmt = $stk ? $stk->{latest_cmt} : ${$sync->{latest_cmt}};
+ my $latest_cmt = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
defined($latest_cmt) or return;
my $ibx = $sync->{ibx} or die 'BUG: {ibx} missing';
my $ekey = $ibx->eidx_key;
diff --git a/lib/PublicInbox/IdxStack.pm b/lib/PublicInbox/IdxStack.pm
index 7681ee6f..8816d96b 100644
--- a/lib/PublicInbox/IdxStack.pm
+++ b/lib/PublicInbox/IdxStack.pm
@@ -14,7 +14,7 @@ use PublicInbox::IO qw(read_all);
sub new {
open(my $io, '+>', undef);
# latest_cmt is still useful when the newest revision is a `d'(elete),
- # otherwise we favor $sync->{latest_cmt} for checkpoints and {quit}
+ # otherwise we favor $self->{latest_cmt} for checkpoints and {quit}
bless { wr => $io, latest_cmt => $_[1] }, __PACKAGE__
}
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 628a1469..6a876963 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -813,7 +813,7 @@ sub index_both { # git->cat_async callback
++$self->{nidx};
++$self->{nrec};
my $cur_cmt = $sync->{cur_cmt} // die 'BUG: {cur_cmt} missing';
- ${$sync->{latest_cmt}} = $cur_cmt;
+ $self->{latest_cmt} = $cur_cmt;
}
sub unindex_both { # git->cat_async callback
@@ -824,7 +824,7 @@ sub unindex_both { # git->cat_async callback
unindex_eml($self, $oid, PublicInbox::Eml->new($bref));
# may be undef if leftover
if (defined(my $cur_cmt = $sync->{cur_cmt})) {
- ${$sync->{latest_cmt}} = $cur_cmt;
+ $self->{latest_cmt} = $cur_cmt;
}
++$self->{nidx};
}
@@ -865,7 +865,7 @@ sub v1_checkpoint ($$;$) {
$self->{need_checkpoint} = 0;
# $newest may be undef
- my $newest = $stk ? $stk->{latest_cmt} : ${$sync->{latest_cmt}};
+ my $newest = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
if (defined($newest)) {
my $cur = $self->{mm}->last_commit;
if (need_update($self, $sync, $cur, $newest)) {
@@ -912,7 +912,7 @@ sub process_stack {
$self->{nrec} = 0;
$sync->{sidx} = $self;
local $self->{need_checkpoint} = 0;
- $sync->{latest_cmt} = \(my $latest_cmt);
+ local $self->{latest_cmt};
$self->{mm}->{dbh}->begin_work;
if (my @leftovers = keys %{delete($sync->{D}) // {}}) {
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index ca231e0c..e5dae548 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -723,7 +723,7 @@ sub index_finalize ($$) {
my ($arg, $index) = @_;
++$arg->{self}->{nidx};
if (defined(my $cur = $arg->{cur_cmt})) {
- ${$arg->{latest_cmt}} = $cur;
+ $arg->{self}->{latest_cmt} = $cur;
} elsif ($index) {
die 'BUG: {cur_cmt} missing';
} # else { unindexing @leftovers doesn't set {cur_cmt}
@@ -819,7 +819,7 @@ sub index_oid { # cat_async callback
sub update_last_commit {
my ($self, $sync, $stk) = @_;
my $unit = $sync->{unit} // return;
- my $latest_cmt = $stk ? $stk->{latest_cmt} : ${$sync->{latest_cmt}};
+ my $latest_cmt = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
defined($latest_cmt) or return;
my $last = last_epoch_commit($self, $unit->{epoch});
if (defined $last && is_ancestor($self->git, $last, $latest_cmt)) {
@@ -1137,7 +1137,7 @@ sub index_todo ($$$) {
$pfx //= $unit->{git}->{git_dir};
}
local $self->{current_info} = "$pfx ";
- local $sync->{latest_cmt} = \(my $latest_cmt);
+ local $self->{latest_cmt};
local $sync->{unit} = $unit;
while (my ($f, $at, $ct, $oid, $cmt) = $stk->pop_rec) {
if ($sync->{quit}) {
* [PATCH 05/31] smsg->populate: rename $sync to $cmt_info
2025-01-10 23:18 ` Eric Wong
` (3 preceding siblings ...)
2025-01-10 23:18 ` [PATCH 04/31] (ext)index: move {latest_cmt} to $self (from $sync) Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 06/31] searchidx: prefix v1 code with `v1_' Eric Wong
` (3 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
We only use this hashref to propagate author and commit time
from the commit, so give it a more descriptive name rather than
overloading the to-be-eliminated `$sync' struct. There's no
need to explicitly vivify into a hashref, either.
---
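A small sketch of why the explicit `$sync //= {}' can go away, as far as I
understand Perl's behavior: dereferencing an undefined lexical with ->{...}
autovivifies it into an empty hashref, even when only reading. `pick_times'
is a hypothetical, simplified stand-in for the relevant part of ->populate,
and the timestamps are made up:

```perl
use strict;
use warnings;

# Hypothetical reduction of a populate-like sub: $cmt_info is optional
# and is only ever read from.
sub pick_times {
	my ($smsg, $cmt_info) = @_;
	# if $cmt_info is undef, this deref turns the lexical copy into an
	# empty hashref, so both lookups simply yield undef
	my $ds = $cmt_info->{autime} // $smsg->{ds};
	my $ts = $cmt_info->{cotime} // $smsg->{ts};
	($ds, $ts);
}

my $smsg = { ds => 100, ts => 200 };
my @from_smsg = pick_times($smsg); # no $cmt_info at all
my @from_cmt = pick_times($smsg, { autime => 1, cotime => 2 });
print "@from_smsg | @from_cmt\n";
```

Only the callee's own copy is vivified, so callers passing nothing are
unaffected.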
lib/PublicInbox/Smsg.pm | 7 +++----
lib/PublicInbox/V2Writable.pm | 4 ++--
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index b132381b..a430f72b 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -90,7 +90,7 @@ sub parse_references ($$$) {
# used for v2, Import and v1 non-SQLite WWW code paths
sub populate {
- my ($self, $hdr, $sync) = @_;
+ my ($self, $hdr, $cmt_info) = @_;
for my $f (qw(From To Cc Subject)) {
my @all = $hdr->header($f);
my $val = join(', ', @all);
@@ -111,9 +111,8 @@ sub populate {
}
$self->{$f} = $val if $val ne '';
}
- $sync //= {};
- my @ds = msg_datestamp($hdr, $sync->{autime} // $self->{ds});
- my @ts = msg_timestamp($hdr, $sync->{cotime} // $self->{ts});
+ my @ds = msg_datestamp($hdr, $cmt_info->{autime} // $self->{ds});
+ my @ts = msg_timestamp($hdr, $cmt_info->{cotime} // $self->{ts});
$self->{-ds} = \@ds;
$self->{-ts} = \@ts;
$self->{ds} //= $ds[0]; # no zone
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index e5dae548..5d8a5484 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -487,8 +487,8 @@ W: $list
num => $smsg->{num},
mid => $smsg->{mid},
}, 'PublicInbox::Smsg';
- my $sync = { autime => $smsg->{ds}, cotime => $smsg->{ts} };
- $new_smsg->populate($new_mime, $sync);
+ my $cmt_info = { autime => $smsg->{ds}, cotime => $smsg->{ts} };
+ $new_smsg->populate($new_mime, $cmt_info);
$new_smsg->set_bytes($raw, $bytes);
do_idx($self, $new_mime, $new_smsg);
}
* [PATCH 06/31] searchidx: prefix v1 code with `v1_'
2025-01-10 23:18 ` Eric Wong
` (4 preceding siblings ...)
2025-01-10 23:18 ` [PATCH 05/31] smsg->populate: rename $sync to $cmt_info Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 07/31] (ext)index: avoid needless {git} ref with --max-size Eric Wong
` (2 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
Since there are no v1-specific modules such as `V1Writable',
prefixing v1-specific subs seems like a prudent way to demarcate
subs used for dealing with the old v1 storage format.
We'll also take this opportunity to rearrange the order of subs
to take advantage of prototypes and compile-time checking with
`perl -c'.
---
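To illustrate the point about sub ordering: a prototype only helps when the
parser has already seen the sub at the time a call site is compiled. The
sketch below uses a hypothetical `v1_demo_sum' helper (not part of
SearchIdx.pm), and a string eval stands in for a separate `perl -c' run:

```perl
use strict;
use warnings;

# Hypothetical helper, defined before its first use so the ($$) prototype
# is known when the call sites below are compiled.
sub v1_demo_sum ($$) {
	my ($x, $y) = @_;
	$x + $y;
}

# parentheses become optional once the sub is declared
my $sum = v1_demo_sum 2, 3;

# too few arguments is a compile-time error; the string eval lets this
# script observe the failure instead of refusing to compile as a whole
my $ok = eval q{ v1_demo_sum(1); 1 };
my $err = $@;
print "sum=$sum compile_ok=", ($ok ? 1 : 0), "\n";
print "error: $err" if $err;
```

Had the sub been defined after its callers, the same mistake would only
surface at run time, if at all.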
lib/PublicInbox/SearchIdx.pm | 82 ++++++++++++++++++------------------
t/search.t | 7 ++-
2 files changed, 45 insertions(+), 44 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 6a876963..d2954ed7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -516,15 +516,32 @@ sub add_xapian ($$$$) {
$self->{xdb}->replace_document($smsg->{num}, $doc);
}
-sub _msgmap_init ($) {
+sub v1_mm_init ($) {
my ($self) = @_;
- die "BUG: _msgmap_init is only for v1\n" if $self->{ibx}->version != 1;
+ die "BUG: v1_mm_init is only for v1\n" if $self->{ibx}->version != 1;
$self->{mm} //= do {
require PublicInbox::Msgmap;
PublicInbox::Msgmap->new_file($self->{ibx}, 1);
};
}
+sub v1_index_mm ($$$$) {
+ my ($self, $eml, $oid, $sync) = @_;
+ my $mids = mids($eml);
+ my $mm = $self->{mm};
+ if ($sync->{reindex}) {
+ my $oidx = $self->{oidx};
+ for my $mid (@$mids) {
+ my ($num, undef) = $oidx->num_mid0_for_oid($oid, $mid);
+ return $num if defined $num;
+ }
+ $mm->num_for($mids->[0]) // $mm->mid_insert($mids->[0]);
+ } else {
+ # fallback to num_for since filters like RubyLang set the number
+ $mm->mid_insert($mids->[0]) // $mm->num_for($mids->[0]);
+ }
+}
+
sub add_message {
# mime = PublicInbox::Eml or Email::MIME object
my ($self, $mime, $smsg, $sync) = @_;
@@ -533,8 +550,8 @@ sub add_message {
$smsg //= bless { blob => '' }, 'PublicInbox::Smsg'; # test-only compat
$smsg->{mid} //= $mids->[0]; # v1 compatibility
$smsg->{num} //= do { # v1
- _msgmap_init($self);
- index_mm($self, $mime, $smsg->{blob}, $sync);
+ v1_mm_init $self;
+ v1_index_mm $self, $mime, $smsg->{blob}, $sync;
};
# v1 and tests only:
@@ -734,8 +751,7 @@ sub index_git_blob_id {
}
}
-# v1 only
-sub unindex_eml {
+sub v1_unindex_eml ($$$) {
my ($self, $oid, $eml) = @_;
my $mids = mids($eml);
my $nr = 0;
@@ -760,23 +776,6 @@ sub unindex_eml {
xdb_remove($self, keys %tmp) if need_xapian($self);
}
-sub index_mm {
- my ($self, $mime, $oid, $sync) = @_;
- my $mids = mids($mime);
- my $mm = $self->{mm};
- if ($sync->{reindex}) {
- my $oidx = $self->{oidx};
- for my $mid (@$mids) {
- my ($num, undef) = $oidx->num_mid0_for_oid($oid, $mid);
- return $num if defined $num;
- }
- $mm->num_for($mids->[0]) // $mm->mid_insert($mids->[0]);
- } else {
- # fallback to num_for since filters like RubyLang set the number
- $mm->mid_insert($mids->[0]) // $mm->num_for($mids->[0]);
- }
-}
-
sub is_bad_blob ($$$$) {
my ($oid, $type, $size, $expect_oid) = @_;
if ($type ne 'blob') {
@@ -798,7 +797,7 @@ sub update_checkpoint ($;$) {
$self->{need_checkpoint} += ($now > $next ? 1 : 0);
}
-sub index_both { # git->cat_async callback
+sub v1_index_both { # git->cat_async callback
my ($bref, $oid, $type, $size, $sync) = @_;
return if is_bad_blob($oid, $type, $size, $sync->{oid});
my $smsg = bless { blob => $oid }, 'PublicInbox::Smsg';
@@ -807,7 +806,7 @@ sub index_both { # git->cat_async callback
update_checkpoint $self, $smsg->{bytes};
local $self->{current_info} = "$self->{current_info}: $oid";
my $eml = PublicInbox::Eml->new($bref);
- $smsg->{num} = index_mm($self, $eml, $oid, $sync) or
+ $smsg->{num} = v1_index_mm $self, $eml, $oid, $sync or
die "E: could not generate NNTP article number for $oid";
add_message($self, $eml, $smsg, $sync);
++$self->{nidx};
@@ -816,12 +815,12 @@ sub index_both { # git->cat_async callback
$self->{latest_cmt} = $cur_cmt;
}
-sub unindex_both { # git->cat_async callback
+sub v1_unindex_both { # git->cat_async callback
my ($bref, $oid, $type, $size, $sync) = @_;
return if is_bad_blob($oid, $type, $size, $sync->{oid});
my $self = $sync->{sidx};
local $self->{current_info} = "$self->{current_info}: $oid";
- unindex_eml($self, $oid, PublicInbox::Eml->new($bref));
+ v1_unindex_eml $self, $oid, PublicInbox::Eml->new($bref);
# may be undef if leftover
if (defined(my $cur_cmt = $sync->{cur_cmt})) {
$self->{latest_cmt} = $cur_cmt;
@@ -868,7 +867,7 @@ sub v1_checkpoint ($$;$) {
my $newest = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
if (defined($newest)) {
my $cur = $self->{mm}->last_commit;
- if (need_update($self, $sync, $cur, $newest)) {
+ if (v1_need_update($self, $sync, $cur, $newest)) {
$self->{mm}->last_commit($newest);
}
}
@@ -876,7 +875,7 @@ sub v1_checkpoint ($$;$) {
my $xdb = $self->{xdb};
if ($newest && $xdb) {
my $cur = $xdb->get_metadata('last_commit');
- if (need_update($self, $sync, $cur, $newest)) {
+ if (v1_need_update($self, $sync, $cur, $newest)) {
$xdb->set_metadata('last_commit', $newest);
}
}
@@ -905,8 +904,7 @@ sub v1_checkpoint ($$;$) {
delete $self->{next_checkpoint};
}
-# only for v1
-sub process_stack {
+sub v1_process_stack ($$$) {
my ($self, $sync, $stk) = @_;
my $git = $sync->{ibx}->git;
$self->{nrec} = 0;
@@ -920,11 +918,11 @@ sub process_stack {
for my $oid (@leftovers) {
last if $sync->{quit};
$oid = unpack('H*', $oid);
- $git->cat_async($oid, \&unindex_both, $sync);
+ $git->cat_async($oid, \&v1_unindex_both, $sync);
}
}
if ($sync->{max_size} = $self->{-opt}->{max_size}) {
- $sync->{index_oid} = \&index_both;
+ $sync->{index_oid} = \&v1_index_both;
}
while (my ($f, $at, $ct, $oid, $cur_cmt) = $stk->pop_rec) {
my $arg = { %$sync, cur_cmt => $cur_cmt, oid => $oid };
@@ -936,10 +934,10 @@ sub process_stack {
$arg->{git} = $git;
$git->check_async($oid, \&check_size, $arg);
} else {
- $git->cat_async($oid, \&index_both, $arg);
+ $git->cat_async($oid, \&v1_index_both, $arg);
}
} elsif ($f eq 'd') {
- $git->cat_async($oid, \&unindex_both, $arg);
+ $git->cat_async($oid, \&v1_unindex_both, $arg);
}
v1_checkpoint $self, $sync if $self->{need_checkpoint};
}
@@ -1021,7 +1019,7 @@ sub is_ancestor ($$$) {
run_wait($cmd) == 0;
}
-sub need_update ($$$$) {
+sub v1_need_update ($$$$) {
my ($self, $sync, $cur, $new) = @_;
my $git = $self->{ibx}->git;
$cur //= ''; # XS Search::Xapian ->get_metadata doesn't give undef
@@ -1040,7 +1038,7 @@ sub need_update ($$$$) {
# The last git commit we indexed with Xapian or SQLite (msgmap)
# This needs to account for cases where Xapian or SQLite is
# out-of-date with respect to the other.
-sub _last_x_commit {
+sub v1_last_x_commit ($$) {
my ($self, $mm) = @_;
my $lm = $mm->last_commit || '';
my $lx = '';
@@ -1056,7 +1054,7 @@ sub _last_x_commit {
$lx;
}
-sub reindex_from ($$) {
+sub v1_reindex_from ($$) {
my ($reindex, $last_commit) = @_;
return $last_commit unless $reindex;
ref($reindex) eq 'HASH' ? $reindex->{from} : '';
@@ -1094,7 +1092,7 @@ sub _index_sync {
local $SIG{TERM} = $quit;
my $xdb = $self->begin_txn_lazy;
$self->{oidx}->rethread_prepare($opt);
- my $mm = _msgmap_init($self);
+ my $mm = v1_mm_init $self;
if ($sync->{reindex}) {
my $last = $mm->last_commit;
if ($last) {
@@ -1105,14 +1103,14 @@ sub _index_sync {
undef $sync->{reindex};
}
}
- my $last_commit = _last_x_commit($self, $mm);
- my $lx = reindex_from($sync->{reindex}, $last_commit);
+ my $last_commit = v1_last_x_commit $self, $mm;
+ my $lx = v1_reindex_from $sync->{reindex}, $last_commit;
my $range = $lx eq '' ? $tip : "$lx..$tip";
$pr->("counting changes\n\t$range ... ") if $pr;
my $stk = prepare_stack($self, $sync, $range);
$sync->{ntodo} = $stk ? $stk->num_records : 0;
$pr->("$sync->{ntodo}\n") if $pr; # continue previous line
- process_stack($self, $sync, $stk) if !$sync->{quit};
+ v1_process_stack($self, $sync, $stk) if !$sync->{quit};
}
sub DESTROY {
diff --git a/t/search.t b/t/search.t
index 9fda6694..8938e6c6 100644
--- a/t/search.t
+++ b/t/search.t
@@ -1,6 +1,9 @@
#!perl -w
# Copyright (C) all contributors <meta@public-inbox.org>
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# FIXME: some is probably redundant with more round-trip PSGI tests
+# nowadays with run_script and TEST_RUN_MODE=2 spawn-avoidance speedups
use strict;
use v5.10;
use PublicInbox::TestCommon;
@@ -419,8 +422,8 @@ $ibx->with_umask(sub {
$art = $ibx->over->next_by_mid($mid, \$id, \$prev);
ok($art, 'article exists in OVER DB');
}
- $rw->_msgmap_init;
- $rw->unindex_eml($oid, $amsg);
+ $rw->v1_mm_init;
+ $rw->v1_unindex_eml($oid, $amsg);
$rw->commit_txn_lazy;
SKIP: {
skip('$art not defined', 1) unless defined $art;
* [PATCH 07/31] (ext)index: avoid needless {git} ref with --max-size
2025-01-10 23:18 ` Eric Wong
` (5 preceding siblings ...)
2025-01-10 23:18 ` [PATCH 06/31] searchidx: prefix v1 code with `v1_' Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 08/31] (ext)index: move {quit} from $sync to $self Eric Wong
2025-01-10 23:18 ` [PATCH 09/31] extindex: {boost_in_use} field " Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
We can access the PublicInbox::Git object via {ibx} directly to
avoid redundant refs and cut down the $sync struct further.
---
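A rough sketch of the access path this relies on, using plain hashes as
hypothetical stand-ins for the PublicInbox::Inbox and PublicInbox::Git
objects; `check_size_demo', the path and the OID are made up for
illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real inbox and git objects
my $git = { git_dir => '/path/to/all.git' };
my $ibx = { git => $git };

# per-request hash: no separate {git} key, only the {ibx} it belongs to
my $req = { ibx => $ibx, oid => 'deadbeef', max_size => 1024 };

# hypothetical size check loosely modeled on the check_size callback
sub check_size_demo {
	my ($size, $arg) = @_;
	my $dir = $arg->{ibx}->{git}->{git_dir}; # reached through {ibx}
	$size <= $arg->{max_size} ? "index $arg->{oid} from $dir"
		: "skip $arg->{oid} ($size > $arg->{max_size})";
}

print check_size_demo(512, $req), "\n";
print check_size_demo(4096, $req), "\n";
```

Direct hash access like this presumably depends on {git} having been
initialized beforehand, which is what the new comment next to `$self->git'
in index_todo hints at.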
lib/PublicInbox/SearchIdx.pm | 6 +++---
lib/PublicInbox/V2Writable.pm | 3 +--
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index d2954ed7..77416e61 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -850,9 +850,10 @@ sub index_sync {
sub check_size { # check_async cb for -index --max-size=...
my (undef, $oid, $type, $size, $arg) = @_;
- ($type // '') eq 'blob' or die "E: bad $oid in $arg->{git}->{git_dir}";
+ ($type // '') eq 'blob' or
+ die "E: bad $oid in $arg->{ibx}->{git}->{git_dir}";
if ($size <= $arg->{max_size}) {
- $arg->{git}->cat_async($oid, $arg->{index_oid}, $arg);
+ $arg->{ibx}->{git}->cat_async($oid, $arg->{index_oid}, $arg);
} else {
warn "W: skipping $oid ($size > $arg->{max_size})\n";
}
@@ -931,7 +932,6 @@ sub v1_process_stack ($$$) {
$arg->{autime} = $at;
$arg->{cotime} = $ct;
if ($sync->{max_size}) {
- $arg->{git} = $git;
$git->check_async($oid, \&check_size, $arg);
} else {
$git->cat_async($oid, \&v1_index_both, $arg);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 5d8a5484..27be8c39 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1126,7 +1126,7 @@ sub index_todo ($$$) {
return if $sync->{quit};
unindex_todo($self, $sync, $unit);
my $stk = delete($unit->{stack}) or return;
- my $all = $self->git;
+ my $all = $self->git; # initialize self->{ibx}->{git}
my $index_oid = $self->can('index_oid');
my $unindex_oid = $self->can('unindex_oid');
my $pfx;
@@ -1155,7 +1155,6 @@ sub index_todo ($$$) {
};
if ($f eq 'm') {
if ($sync->{max_size}) {
- $req->{git} = $all;
$all->check_async($oid, \&check_size, $req);
} else {
$all->cat_async($oid, $index_oid, $req);
* [PATCH 08/31] (ext)index: move {quit} from $sync to $self
2025-01-10 23:18 ` Eric Wong
` (6 preceding siblings ...)
2025-01-10 23:18 ` [PATCH 07/31] (ext)index: avoid needless {git} ref with --max-size Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 09/31] extindex: {boost_in_use} field " Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
No need to localize it, either, since we don't expect to
use the $self instance after {quit} is set.
---
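A minimal sketch of the resulting shape, with hypothetical names
(`quit_cb_demo', `process_demo') rather than the real subs: the signal
callback closes over the indexer object itself, and every loop checks the
flag on that same object. The handler is invoked directly here instead of
via a real signal, purely to keep the sketch deterministic:

```perl
use strict;
use warnings;

# Hypothetical reduction of a quit_cb-style helper: the closure captures
# the indexer object rather than a separate $sync hash.
sub quit_cb_demo {
	my ($self) = @_;
	sub { $self->{quit} = 1 };
}

# Hypothetical work loop; $on_item lets the caller simulate a signal
# arriving between items.
sub process_demo {
	my ($self, $items, $on_item) = @_;
	my @done;
	for my $item (@$items) {
		last if $self->{quit}; # checked on $self, not $sync
		push @done, $item;
		$on_item->($item);
	}
	\@done;
}

my $self = {};
my $quit = quit_cb_demo($self);
# "signal" arrives while the 2nd item is being handled
my $done = process_demo($self, [ 1 .. 5 ], sub { $quit->() if $_[0] == 2 });
print "processed: @$done\n";
```

Since the flag is never reset, the object is only good for winding down
after this point, which matches the note above about not localizing it.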
lib/PublicInbox/ExtSearchIdx.pm | 72 ++++++++++++++++-----------------
lib/PublicInbox/SearchIdx.pm | 27 ++++++-------
lib/PublicInbox/V2Writable.pm | 29 +++++++------
3 files changed, 63 insertions(+), 65 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 441bb7b0..e63d917f 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -168,7 +168,7 @@ sub remove_doc ($$) {
}
sub _unref_doc ($$$$$;$) {
- my ($sync, $docid, $ibx, $xnum, $oidbin, $eml) = @_;
+ my ($self, $docid, $ibx, $xnum, $oidbin, $eml) = @_;
my $smsg;
if (ref($docid)) {
$smsg = $docid;
@@ -185,34 +185,34 @@ sub _unref_doc ($$$$$;$) {
my $s = 'DELETE FROM xref3 WHERE oidbin = ?';
$s .= ' AND ibx_id = ?' if defined($ibx);
$s .= ' AND xnum = ?' if defined($xnum);
- my $del = $sync->{self}->{oidx}->dbh->prepare_cached($s);
+ my $del = $self->{oidx}->dbh->prepare_cached($s);
my $col = 0;
$del->bind_param(++$col, $oidbin, SQL_BLOB);
$del->bind_param(++$col, $ibx->{-ibx_id}) if $ibx;
$del->bind_param(++$col, $xnum) if defined($xnum);
$del->execute;
- my $xr3 = $sync->{self}->{oidx}->get_xref3($docid);
+ my $xr3 = $self->{oidx}->get_xref3($docid);
if (scalar(@$xr3) == 0) { # all gone
- remove_doc($sync->{self}, $docid);
+ remove_doc($self, $docid);
} else { # enqueue for reindex of remaining messages
if ($ibx) {
my $ekey = $ibx->{-gc_eidx_key} // $ibx->eidx_key;
- my $idx = $sync->{self}->idx_shard($docid);
+ my $idx = $self->idx_shard($docid);
$idx->ipc_do('remove_eidx_info', $docid, $ekey, $eml);
} # else: we can't remove_eidx_info in reindex-only path
# replace invalidated blob ASAP with something which should be
# readable since we may commit the transaction on checkpoint.
# eidxq processing will re-apply boost
- $smsg //= $sync->{self}->{oidx}->get_art($docid);
+ $smsg //= $self->{oidx}->get_art($docid);
my $hex = unpack('H*', $oidbin);
if ($smsg && $smsg->{blob} eq $hex) {
$xr3->[0] =~ /:([a-f0-9]{40,}+)\z/ or
die "BUG: xref $xr3->[0] has no OID";
- $sync->{self}->{oidx}->update_blob($smsg, $1);
+ $self->{oidx}->update_blob($smsg, $1);
}
# yes, add, we'll need to re-apply boost
- $sync->{self}->{oidx}->eidxq_add($docid);
+ $self->{oidx}->eidxq_add($docid);
}
@$xr3
}
@@ -234,7 +234,7 @@ sub do_xpost ($$) {
} else { # 'd' no {xnum}
$self->git->async_wait_all;
$oid = pack('H*', $oid);
- _unref_doc($req, $docid, $xibx, undef, $oid, $eml);
+ _unref_doc $self, $docid, $xibx, undef, $oid, $eml;
}
}
@@ -302,7 +302,7 @@ sub _blob_missing ($$) { # called when a known $smsg->{blob} is gone
# xnum and ibx are unknown, we only call this when an entry from
# /ei*/over.sqlite3 is bad, not on entries from xap*/over.sqlite3
$req->{self}->git->async_wait_all;
- _unref_doc($req, $smsg, undef, undef, $smsg->oidbin);
+ _unref_doc $req->{self}, $smsg, undef, undef, $smsg->oidbin;
}
sub ck_existing { # git->cat_async callback
@@ -415,10 +415,10 @@ sub _sync_inbox ($$$) {
return "E: $ekey unsupported inbox version (v$v)";
}
for my $unit (@{delete($sync->{todo}) // []}) {
- last if $sync->{quit};
+ last if $self->{quit};
index_todo($self, $sync, $unit);
}
- $self->{midx}->index_ibx($ibx) unless $sync->{quit};
+ $self->{midx}->index_ibx($ibx) unless $self->{quit};
$ibx->git->cleanup; # done with this inbox, now
undef;
}
@@ -441,7 +441,7 @@ EOM
$x3_doc->execute($ibx_id);
my $ibx = { -ibx_id => $ibx_id, -gc_eidx_key => $eidx_key };
while (my ($docid, $xnum, $oid) = $x3_doc->fetchrow_array) {
- my $r = _unref_doc($sync, $docid, $ibx, $xnum, $oid);
+ my $r = _unref_doc $self, $docid, $ibx, $xnum, $oid;
$oid = unpack('H*', $oid);
$r = $r ? 'unref' : 'remove';
warn "# $r #$docid $eidx_key $oid\n";
@@ -602,7 +602,7 @@ sub _reindex_finalize ($$$) {
for my $x (reverse @$ary) {
warn "removing #$docid xref3 $x->{blob}\n";
my $bin = $x->oidbin;
- my $n = _unref_doc($sync, $docid, undef, undef, $bin);
+ my $n = _unref_doc $self, $docid, undef, undef, $bin;
die "BUG: $x->{blob} invalidated #$docid" if $n == 0;
}
my $x = pop(@$ary) // die "BUG: #$docid {by_chash} empty";
@@ -634,7 +634,7 @@ sub _reindex_oid { # git->cat_async callback
my $docid = $orig_smsg->{num};
if (is_bad_blob($oid, $type, $size, $expect_oid)) {
my $oidbin = pack('H*', $expect_oid);
- my $remain = _unref_doc($sync, $docid, undef, undef, $oidbin);
+ my $remain = _unref_doc $self, $docid, undef, undef, $oidbin;
if ($remain == 0) {
warn "W: #$docid ($oid) gone or corrupt\n";
} elsif (my $next_oid = $req->{xr3r}->[++$req->{ix}]->[2]) {
@@ -809,7 +809,7 @@ restart:
$iter = $dbh->prepare('SELECT docid FROM eidxq ORDER BY docid ASC');
$iter->execute;
while (defined(my $docid = $iter->fetchrow_array)) {
- last if $sync->{quit};
+ last if $self->{quit};
if (my $smsg = $self->{oidx}->get_art($docid)) {
_reindex_smsg($self, $sync, $smsg);
} else {
@@ -860,21 +860,21 @@ sub reindex_unseen ($$$$) {
}
sub _unref_stale_range ($$$) {
- my ($sync, $ibx, $lt_or_gt) = @_;
+ my ($self, $ibx, $lt_or_gt) = @_;
my $r;
my $lim = 10000;
do {
- $r = $sync->{self}->{oidx}->dbh->selectall_arrayref(
+ $r = $self->{oidx}->dbh->selectall_arrayref(
<<EOS, undef, $ibx->{-ibx_id});
SELECT docid,xnum,oidbin FROM xref3
WHERE ibx_id = ? AND $lt_or_gt LIMIT $lim
EOS
- return if $sync->{quit};
+ return if $self->{quit};
for (@$r) { # hopefully rare, not worth optimizing:
my ($docid, $xnum, $oidbin) = @$_;
my $hex = unpack('H*', $oidbin);
warn("# $xnum:$hex (#$docid): stale\n");
- _unref_doc($sync, $docid, $ibx, $xnum, $oidbin);
+ _unref_doc $self, $docid, $ibx, $xnum, $oidbin;
}
} while (scalar(@$r) == $lim);
1;
@@ -892,7 +892,7 @@ sub _reindex_check_ibx ($$$) {
$max0 = $ibx->mm->num_highwater;
sync_inbox($self, $sync, $ibx) and return; # warned
$max = $ibx->mm->num_highwater;
- return if $sync->{quit};
+ return if $self->{quit};
} while ($max > $max0 &&
warn("# $ekey moved $max0..$max, resyncing..\n"));
$end = $max if $end > $max;
@@ -913,7 +913,7 @@ sub _reindex_check_ibx ($$$) {
update_checkpoint $self and
reindex_checkpoint($self, $sync); # release lock
($lo, $hi) = ($msgs->[0]->{num}, $msgs->[-1]->{num});
- $usr //= _unref_stale_range($sync, $ibx, "xnum < $lo");
+ $usr //= _unref_stale_range($self, $ibx, "xnum < $lo");
my $x3a = $self->{oidx}->dbh->selectall_arrayref(
<<"", undef, $ibx_id, $lo, $hi);
SELECT xnum,oidbin,docid FROM xref3 WHERE
@@ -935,7 +935,7 @@ ibx_id = ? AND xnum >= ? AND xnum <= ?
$self->{oidx}->eidxq_add($num);
}
}
- return if $sync->{quit};
+ return if $self->{quit};
}
next unless scalar keys %x3m;
$self->git->async_wait_all; # wait for reindex_unseen
@@ -959,13 +959,13 @@ BUG: (non-fatal) $ekey #$xnum $smsg->{blob} still matches (old exp: $exp)
my $m = defined($exp) ? "mismatch (!= $exp)" : 'stale';
warn("# $xnum:$hex (#@$docids): $m\n");
for my $i (@$docids) {
- _unref_doc($sync, $i, $ibx, $xnum, $bin);
+ _unref_doc $self, $i, $ibx, $xnum, $bin;
}
- return if $sync->{quit};
+ return if $self->{quit};
}
}
defined($hi) and ($hi < $max) and
- _unref_stale_range($sync, $ibx, "xnum > $hi AND xnum <= $max");
+ _unref_stale_range($self, $ibx, "xnum > $hi AND xnum <= $max");
}
sub _reindex_inbox ($$$) {
@@ -992,10 +992,10 @@ sub eidx_reindex {
}
for my $ibx (@{ibx_sorted($self, 'active')}) {
_reindex_inbox($self, $sync, $ibx);
- last if $sync->{quit};
+ last if $self->{quit};
}
$self->git->async_wait_all; # ensure eidxq gets filled completely
- eidxq_process($self, $sync) unless $sync->{quit};
+ eidxq_process($self, $sync) unless $self->{quit};
}
sub sync_inbox {
@@ -1066,7 +1066,7 @@ EOS
$iter->execute($cur_mid, $min_id);
}
while (my ($mid, $id) = $iter->fetchrow_array) {
- last if $sync->{quit};
+ last if $self->{quit};
$self->{current_info} = "dedupe $mid";
$self->{nrec} = $min_id = $id;
my ($prv, @smsg);
@@ -1128,7 +1128,7 @@ sub eidx_sync { # main entry point
self => $self,
};
local $SIG{USR1} = sub { $self->{need_checkpoint} = 1 };
- my $quit = PublicInbox::SearchIdx::quit_cb($sync);
+ my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
local $SIG{TERM} = $quit;
@@ -1153,12 +1153,12 @@ sub eidx_sync { # main entry point
# don't use $_ here, it'll get clobbered by reindex_checkpoint
if ($opt->{scan} // 1) {
for my $ibx (@{ibx_sorted($self, 'active')}) {
- last if $sync->{quit};
+ last if $self->{quit};
sync_inbox($self, $sync, $ibx);
}
}
- $self->{oidx}->rethread_done($opt) unless $sync->{quit};
- eidxq_process($self, $sync) unless $sync->{quit};
+ $self->{oidx}->rethread_done($opt) unless $self->{quit};
+ eidxq_process($self, $sync) unless $self->{quit};
eidxq_release($self);
done($self);
@@ -1385,7 +1385,7 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
$pr->("performing initial scan ...\n") if $pr;
local $self->{-opt} = $opt;
my $sync = eidx_sync($self, $opt); # initial sync
- return if $sync->{quit};
+ return if $self->{quit};
my $oldset = PublicInbox::DS::block_signals();
local $self->{current_info} = '';
local $SIG{__WARN__} = PublicInbox::Admin::warn_cb $self;
@@ -1394,10 +1394,10 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
USR1 => sub { eidx_resync_start($self) },
TSTP => sub { kill('STOP', $$) },
};
- my $quit = PublicInbox::SearchIdx::quit_cb($sync);
+ my $quit = PublicInbox::SearchIdx::quit_cb $self;
$sig->{QUIT} = $sig->{INT} = $sig->{TERM} = $quit;
local $self->{-watch_sync} = $sync; # for ->on_inbox_unlock
- local @PublicInbox::DS::post_loop_do = (sub { !$sync->{quit} });
+ local @PublicInbox::DS::post_loop_do = (sub { !$self->{quit} });
$pr->("initial scan complete, entering event loop\n") if $pr;
# calls InboxIdle->event_step:
PublicInbox::DS::event_loop($sig, $oldset);
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 77416e61..dbbd4323 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -839,12 +839,11 @@ sub index_sync {
my ($self, $opt) = @_;
delete $self->{lock_path} if $opt->{-skip_lock};
$self->with_umask(\&_index_sync, $self, $opt);
- if ($opt->{reindex} && !$opt->{quit} &&
+ if ($opt->{reindex} && !$self->{quit} &&
!grep(defined, @$opt{qw(since until)})) {
my %again = %$opt;
delete @again{qw(rethread reindex)};
index_sync($self, \%again);
- $opt->{quit} = $again{quit}; # propagate to caller
}
}
@@ -897,7 +896,7 @@ sub v1_checkpoint ($$;$) {
if (my $pr = $self->{-opt}->{-progress}) {
$pr->("indexed $nrec/$sync->{ntodo}\n") if $nrec;
}
- if (!$stk && !$sync->{quit}) { # more to come
+ if (!$stk && !$self->{quit}) { # more to come
begin_txn_lazy($self);
$self->{mm}->{dbh}->begin_work;
}
@@ -917,7 +916,7 @@ sub v1_process_stack ($$$) {
if (my @leftovers = keys %{delete($sync->{D}) // {}}) {
warn('W: unindexing '.scalar(@leftovers)." leftovers\n");
for my $oid (@leftovers) {
- last if $sync->{quit};
+ last if $self->{quit};
$oid = unpack('H*', $oid);
$git->cat_async($oid, \&v1_unindex_both, $sync);
}
@@ -927,7 +926,7 @@ sub v1_process_stack ($$$) {
}
while (my ($f, $at, $ct, $oid, $cur_cmt) = $stk->pop_rec) {
my $arg = { %$sync, cur_cmt => $cur_cmt, oid => $oid };
- last if $sync->{quit};
+ last if $self->{quit};
if ($f eq 'm') {
$arg->{autime} = $at;
$arg->{cotime} = $ct;
@@ -941,7 +940,7 @@ sub v1_process_stack ($$$) {
}
v1_checkpoint $self, $sync if $self->{need_checkpoint};
}
- v1_checkpoint($self, $sync, $sync->{quit} ? undef : $stk);
+ v1_checkpoint($self, $sync, $self->{quit} ? undef : $stk);
}
sub log2stack ($$$$) {
@@ -969,7 +968,7 @@ sub log2stack ($$$$) {
my $fh = $git->popen(@cmd, $range);
my ($at, $ct, $stk, $cmt, $l);
while (defined($l = <$fh>)) {
- return if $sync->{quit};
+ return if $self->{quit};
if ($l =~ /\A([0-9]+)-([0-9]+)-($OID)$/o) {
($at, $ct, $cmt) = ($1 + 0, $2 + 0, $3);
$stk //= PublicInbox::IdxStack->new($cmt);
@@ -1061,12 +1060,12 @@ sub v1_reindex_from ($$) {
}
sub quit_cb ($) {
- my ($sync) = @_;
+ my ($self) = @_;
sub {
- # we set {-opt}->{quit} too, so ->index_sync callers
- # can abort multi-inbox loops this way
- $sync->{quit} = $sync->{self}->{-opt}->{quit} = 1;
- warn "gracefully quitting\n";
+ # we set {-opt}->{quit} for public-inbox-index so it
+ # can abort multi-inbox loops this way (for now...)
+ $self->{quit} = $self->{-opt}->{quit} = 1;
+ warn "# gracefully quitting\n";
}
}
@@ -1086,7 +1085,7 @@ sub _index_sync {
my $pr = $opt->{-progress};
local $self->{-opt} = $opt;
my $sync = { reindex => $opt->{reindex}, ibx => $ibx };
- my $quit = quit_cb($sync);
+ my $quit = quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
local $SIG{TERM} = $quit;
@@ -1110,7 +1109,7 @@ sub _index_sync {
my $stk = prepare_stack($self, $sync, $range);
$sync->{ntodo} = $stk ? $stk->num_records : 0;
$pr->("$sync->{ntodo}\n") if $pr; # continue previous line
- v1_process_stack($self, $sync, $stk) if !$sync->{quit};
+ v1_process_stack($self, $sync, $stk) if !$self->{quit};
}
sub DESTROY {
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 27be8c39..886f59b1 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -961,14 +961,14 @@ sub sync_prepare ($$) {
# messages to show up in mirrors, too.
$sync->{D} //= $sync->{reindex} ? {} : undef; # OID_BIN => NR
my $stk = log2stack($self, $sync, $git, $range);
- return 0 if $sync->{quit};
+ return 0 if $self->{quit};
my $nr = $stk ? $stk->num_records : 0;
$pr->("$nr\n") if $pr;
$unit->{stack} = $stk; # may be undef
unshift @{$sync->{todo}}, $unit;
$regen_max += $nr;
}
- return 0 if $sync->{quit};
+ return 0 if $self->{quit};
# XXX this should not happen unless somebody bypasses checks in
# our code and blindly injects "d" file history into git repos
@@ -977,14 +977,14 @@ sub sync_prepare ($$) {
local $self->{current_info} = 'leftover ';
my $unindex_oid = $self->can('unindex_oid');
for my $oid (@leftovers) {
- last if $sync->{quit};
+ last if $self->{quit};
$oid = unpack('H*', $oid);
my $req = { %$sync, oid => $oid };
$self->git->cat_async($oid, $unindex_oid, $req);
}
$self->git->async_wait_all;
}
- return 0 if $sync->{quit};
+ return 0 if $self->{quit};
if (!$regen_max) {
$self->{-regen_fmt} = "%u/?\n";
return 0;
@@ -1107,7 +1107,7 @@ sub index_xap_step ($$$;$) {
"$beg..$end (% $step)\n");
}
for (my $num = $beg; $num <= $end; $num += $step) {
- last if $sync->{quit};
+ last if $self->{quit};
my $smsg = $ibx->over->get_art($num) or next;
$smsg->{self} = $self;
$ibx->git->cat_async($smsg->{blob}, \&index_xap_only, $smsg);
@@ -1123,7 +1123,7 @@ sub index_xap_step ($$$;$) {
sub index_todo ($$$) {
my ($self, $sync, $unit) = @_;
- return if $sync->{quit};
+ return if $self->{quit};
unindex_todo($self, $sync, $unit);
my $stk = delete($unit->{stack}) or return;
my $all = $self->git; # initialize self->{ibx}->{git}
@@ -1140,7 +1140,7 @@ sub index_todo ($$$) {
local $self->{latest_cmt};
local $sync->{unit} = $unit;
while (my ($f, $at, $ct, $oid, $cmt) = $stk->pop_rec) {
- if ($sync->{quit}) {
+ if ($self->{quit}) {
warn "waiting to quit...\n";
$all->async_wait_all;
$self->update_last_commit($sync);
@@ -1181,7 +1181,7 @@ sub xapian_only ($;$$) {
if ($seq || !$self->{parallel}) {
my $shard_end = $self->{shards} - 1;
for my $i (0..$shard_end) {
- last if $sync->{quit};
+ last if $self->{quit};
index_xap_step($self, $sync, $art_beg + $i);
if ($i != $shard_end) {
reindex_checkpoint($self, $sync);
@@ -1233,7 +1233,7 @@ sub index_sync {
ibx => $self->{ibx},
epoch_max => $epoch_max,
};
- my $quit = PublicInbox::SearchIdx::quit_cb($sync);
+ my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
local $SIG{TERM} = $quit;
@@ -1256,7 +1256,7 @@ sub index_sync {
}
# work forwards through history
index_todo($self, $sync, $_) for @{delete($sync->{todo}) // []};
- $self->{oidx}->rethread_done($opt) unless $sync->{quit};
+ $self->{oidx}->rethread_done($opt) unless $self->{quit};
$self->done;
if (my $nrec = $self->{nrec}) {
@@ -1267,17 +1267,17 @@ sub index_sync {
my $quit_warn;
# deal with Xapian shards sequentially
if ($seq && delete($sync->{mm_tmp})) {
- if ($sync->{quit}) {
+ if ($self->{quit}) {
$quit_warn = 1;
} else {
$self->{ibx}->{indexlevel} = $idxlevel;
xapian_only($self, $sync, $art_beg);
- $quit_warn = 1 if $sync->{quit};
+ $quit_warn = 1 if $self->{quit};
}
}
# --reindex on the command-line
- if (!$sync->{quit} && $opt->{reindex} &&
+ if (!$self->{quit} && $opt->{reindex} &&
!ref($opt->{reindex}) && $idxlevel ne 'basic') {
$self->lock_acquire;
my $s0 = PublicInbox::SearchIdx->new($self->{ibx}, 0, 0);
@@ -1290,13 +1290,12 @@ sub index_sync {
}
# reindex does not pick up new changes, so we rerun w/o it:
- if ($opt->{reindex} && !$sync->{quit} &&
+ if ($opt->{reindex} && !$self->{quit} &&
!grep(defined, @$opt{qw(since until)})) {
my %again = %$opt;
$sync = undef;
delete @again{qw(rethread reindex -skip_lock)};
index_sync($self, \%again);
- $opt->{quit} = $again{quit}; # propagate to caller
}
warn <<EOF if $quit_warn;
W: interrupted, --xapian-only --reindex required upon restart
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 09/31] extindex: {boost_in_use} field to $self
2025-01-10 23:18 ` Eric Wong
` (7 preceding siblings ...)
2025-01-10 23:18 ` [PATCH 08/31] (ext)index: move {quit} from $sync to $self Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
There's no need to repeatedly propagate this field to every
cat-file blob request since it is tied to the attached config
and the inboxes associated with an extindex.
---
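Not part of the patch: a minimal sketch of the idea above, assuming
a plain hash stands in for the ExtSearchIdx object.  Only the
{boost_in_use} field name comes from the patch; `on_blob' and
`apply_boost_sketch' are hypothetical helpers for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the long-lived indexer object.
my $eidx = { boost_in_use => 1 };
my @applied;

sub apply_boost_sketch { # hypothetical, not the real apply_boost
    my ($req) = @_;
    push @applied, $req->{oid};
}

sub on_blob { # models a per-blob callback
    my ($req) = @_;
    my $self = $req->{self};
    # the flag is read from the long-lived object, not from $req
    apply_boost_sketch($req) if $self->{boost_in_use};
}

# each request carries only what is unique to it, plus a ref to $self
on_blob({ self => $eidx, oid => $_ }) for qw(aaa bbb);

delete $eidx->{boost_in_use}; # e.g. on reload
on_blob({ self => $eidx, oid => 'ccc' });
print "boosted: @applied\n";
```

Since every request already holds a reference to the object, clearing
the flag in one place affects all later requests.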
lib/PublicInbox/ExtSearchIdx.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index e63d917f..242f4319 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -230,7 +230,7 @@ sub do_xpost ($$) {
$self->{oidx}->add_xref3($docid, $xnum, $oid, $eidx_key);
my $idx = $self->idx_shard($docid);
$idx->ipc_do('add_eidx_info', $docid, $eidx_key, $eml);
- apply_boost($req, $smsg) if $req->{boost_in_use};
+ apply_boost($req, $smsg) if $self->{boost_in_use};
} else { # 'd' no {xnum}
$self->git->async_wait_all;
$oid = pack('H*', $oid);
@@ -1138,7 +1138,7 @@ sub eidx_sync { # main entry point
if (scalar(grep { defined($_->{boost}) } @{$self->{ibx_known}})) {
$sync->{id2pos} //= prep_id2pos($self);
- $sync->{boost_in_use} = 1;
+ $self->{boost_in_use} = 1;
}
if (my $msgids = delete($opt->{dedupe})) {
@@ -1329,6 +1329,7 @@ sub eidx_reload { # -extindex --watch SIGHUP handler
delete $self->{-resync_queue};
delete $self->{-ibx_ary_known};
delete $self->{-ibx_ary_active};
+ delete $self->{boost_in_use};
$self->{ibx_known} = [];
$self->{ibx_active} = [];
%{$self->{ibx_map}} = ();
* [PATCH 10/31] extindex: move {id2pos} to $self
2025-01-10 23:17 [PATCH 00/31] (ext)index: eliminate $sync structure Eric Wong
2025-01-10 23:18 ` Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 11/31] searchidx: move {ntodo} " Eric Wong
` (8 more replies)
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
2025-01-10 23:20 ` [PATCH 30/31] (ext)index: eliminate $sync->{ibx} Eric Wong
3 siblings, 9 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
{id2pos} doesn't need a different hash table entry for every
request.  Instead, it is tied to the inboxes associated with the
ExtSearchIdx ($self) object, so we can just store it there and
give it the same lifetime as the associated inbox references.
---
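Not part of the patch: a rough sketch of the caching pattern, assuming
a plain hash for $self.  `prep_id2pos_sketch' and
`attach_inbox_sketch' are hypothetical stand-ins, and the {id} key on
each inbox is an assumption made for the example.

```perl
use strict;
use warnings;

my $builds = 0;

# Hypothetical builder: maps an inbox ID to its position in the list.
sub prep_id2pos_sketch {
    my ($self) = @_;
    ++$builds;
    my (%id2pos, $pos);
    $pos = 0;
    $id2pos{$_->{id}} = $pos++ for @{$self->{ibx_known}};
    \%id2pos;
}

sub attach_inbox_sketch {
    my ($self, $ibx) = @_;
    delete $self->{id2pos}; # positions are stale once the list changes
    push @{$self->{ibx_known}}, $ibx;
}

my $self = { ibx_known => [] };
attach_inbox_sketch($self, { id => $_ }) for (7, 3);

# `//=' only calls the builder when the slot is undef
$self->{id2pos} //= prep_id2pos_sketch($self);
$self->{id2pos} //= prep_id2pos_sketch($self); # cached, no rebuild

attach_inbox_sketch($self, { id => 9 }); # invalidates the cache
$self->{id2pos} //= prep_id2pos_sketch($self); # rebuilt
print "builds=$builds pos(9)=$self->{id2pos}->{9}\n";
```

Keeping the cache next to the inbox list means one `delete' in the
attach path is enough to keep both in sync.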
lib/PublicInbox/ExtSearchIdx.pm | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 242f4319..2f6de2bb 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -75,6 +75,7 @@ sub attach_inbox {
$self->{ibx_map}->{$ibx->eidx_key} //= do {
delete $self->{-ibx_ary_known}; # invalidate cache
delete $self->{-ibx_ary_active}; # invalidate cache
+ delete $self->{id2pos};
$types //= [ qw(active known) ];
for my $t (@$types) {
push @{$self->{"ibx_$t"}}, $ibx;
@@ -143,7 +144,7 @@ sub check_xr3 ($$$) {
sub apply_boost ($$) {
my ($req, $smsg) = @_;
- my $id2pos = $req->{id2pos}; # index in ibx_sorted
+ my $id2pos = $req->{self}->{id2pos}; # index in ibx_sorted
my $xr3 = $req->{self}->{oidx}->get_xref3($smsg->{num}, 1);
check_xr3($req->{self}, $id2pos, $xr3);
@$xr3 = sort { # sort ascending
@@ -542,7 +543,7 @@ sub eidx_gc { # top-level entry point
sub _ibx_for ($$$) {
my ($self, $sync, $smsg) = @_;
my $ibx_id = delete($smsg->{ibx_id}) // die 'BUG: {ibx_id} unset';
- my $pos = $sync->{id2pos}->{$ibx_id} //
+ my $pos = $self->{id2pos}->{$ibx_id} //
bad_ibx_id($self, $ibx_id, \&croak);
$self->{-ibx_ary_known}->[$pos] //
die "BUG: ibx for $smsg->{blob} not mapped"
@@ -680,7 +681,7 @@ BUG? #$docid $smsg->{blob} is not referenced by inboxes during reindex
# we sort {xr3r} in the reverse order of ibx_sorted so we can
# hit the common case in _reindex_finalize without rereading
# from git (or holding multiple messages in memory).
- my $id2pos = $sync->{id2pos}; # index in ibx_sorted
+ my $id2pos = $self->{id2pos}; # index in ibx_sorted
check_xr3($self, $id2pos, $xr3);
@$xr3 = sort { # sort descending
$id2pos->{$b->[0]} <=> $id2pos->{$a->[0]}
@@ -802,7 +803,7 @@ sub eidxq_process ($$) { # for reindexing
my $max = $dbh->selectrow_array('SELECT MAX(docid) FROM eidxq');
$pr->("Xapian indexing $min..$max (total=$tot)\n");
}
- $sync->{id2pos} //= prep_id2pos($self);
+ $self->{id2pos} //= prep_id2pos $self;
my ($del, $iter);
restart:
$del = $dbh->prepare('DELETE FROM eidxq WHERE docid = ?');
@@ -1135,9 +1136,8 @@ sub eidx_sync { # main entry point
for my $ibx (@{ibx_sorted($self, 'known')}) {
$ibx->{-ibx_id} //= $self->{oidx}->ibx_id($ibx->eidx_key);
}
-
- if (scalar(grep { defined($_->{boost}) } @{$self->{ibx_known}})) {
- $sync->{id2pos} //= prep_id2pos($self);
+ if (grep { defined($_->{boost}) } @{$self->{ibx_known}}) {
+ $self->{id2pos} //= prep_id2pos $self;
$self->{boost_in_use} = 1;
}
@@ -1329,11 +1329,11 @@ sub eidx_reload { # -extindex --watch SIGHUP handler
delete $self->{-resync_queue};
delete $self->{-ibx_ary_known};
delete $self->{-ibx_ary_active};
+ delete $self->{id2pos};
delete $self->{boost_in_use};
$self->{ibx_known} = [];
$self->{ibx_active} = [];
%{$self->{ibx_map}} = ();
- delete $self->{-watch_sync}->{id2pos};
my $cfg = PublicInbox::Config->new;
attach_config($self, $cfg);
$idler->refresh($cfg);
* [PATCH 11/31] searchidx: move {ntodo} to $self
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 12/31] searchidx: rename {sidx} to {self} Eric Wong
` (7 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
Another step towards eliminating the temporary $sync structure
to flatten the internal data structures.
---
lib/PublicInbox/SearchIdx.pm | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index dbbd4323..daea0bab 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -894,7 +894,7 @@ sub v1_checkpoint ($$;$) {
idx_release($self, $nrec);
# let another process do some work...
if (my $pr = $self->{-opt}->{-progress}) {
- $pr->("indexed $nrec/$sync->{ntodo}\n") if $nrec;
+ $pr->("indexed $nrec/$self->{ntodo}\n") if $nrec;
}
if (!$stk && !$self->{quit}) { # more to come
begin_txn_lazy($self);
@@ -1107,8 +1107,8 @@ sub _index_sync {
my $range = $lx eq '' ? $tip : "$lx..$tip";
$pr->("counting changes\n\t$range ... ") if $pr;
my $stk = prepare_stack($self, $sync, $range);
- $sync->{ntodo} = $stk ? $stk->num_records : 0;
- $pr->("$sync->{ntodo}\n") if $pr; # continue previous line
+ local $self->{ntodo} = $stk ? $stk->num_records : 0;
+ $pr->("$self->{ntodo}\n") if $pr; # continue previous line
v1_process_stack($self, $sync, $stk) if !$self->{quit};
}
* [PATCH 12/31] searchidx: rename {sidx} to {self}
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
2025-01-10 23:18 ` [PATCH 11/31] searchidx: move {ntodo} " Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 13/31] (ext)index: move {max_size} and related bits to $self Eric Wong
` (6 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
While this is v1-only code, {self} is more consistent with the
rest of the code and the eventual goal is to eliminate $sync
entirely in favor of $self for temporary state.
---
lib/PublicInbox/SearchIdx.pm | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index daea0bab..24f76c32 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -802,7 +802,7 @@ sub v1_index_both { # git->cat_async callback
return if is_bad_blob($oid, $type, $size, $sync->{oid});
my $smsg = bless { blob => $oid }, 'PublicInbox::Smsg';
$smsg->set_bytes($$bref, $size);
- my $self = $sync->{sidx};
+ my $self = $sync->{self};
update_checkpoint $self, $smsg->{bytes};
local $self->{current_info} = "$self->{current_info}: $oid";
my $eml = PublicInbox::Eml->new($bref);
@@ -818,7 +818,7 @@ sub v1_index_both { # git->cat_async callback
sub v1_unindex_both { # git->cat_async callback
my ($bref, $oid, $type, $size, $sync) = @_;
return if is_bad_blob($oid, $type, $size, $sync->{oid});
- my $self = $sync->{sidx};
+ my $self = $sync->{self};
local $self->{current_info} = "$self->{current_info}: $oid";
v1_unindex_eml $self, $oid, PublicInbox::Eml->new($bref);
# may be undef if leftover
@@ -908,7 +908,7 @@ sub v1_process_stack ($$$) {
my ($self, $sync, $stk) = @_;
my $git = $sync->{ibx}->git;
$self->{nrec} = 0;
- $sync->{sidx} = $self;
+ $sync->{self} = $self;
local $self->{need_checkpoint} = 0;
local $self->{latest_cmt};
* [PATCH 13/31] (ext)index: move {max_size} and related bits to $self
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
2025-01-10 23:18 ` [PATCH 11/31] searchidx: move {ntodo} " Eric Wong
2025-01-10 23:18 ` [PATCH 12/31] searchidx: rename {sidx} to {self} Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 14/31] index: move {D} (delete state) " Eric Wong
` (5 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
`--max-size' is persistent for a process, so keeping it in $self
is more appropriate than in the shorter-lived $sync.  The
{index_oid} function pointer is tied to max-size handling, so
we'll move that over to $self as well.
---
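Not part of the patch: a small sketch of the `X = ... and Y = ...'
form used in the diff below, assuming plain hashes for $self and the
options.  `index_oid_sketch' and `setup' are hypothetical names.

```perl
use strict;
use warnings;

sub index_oid_sketch { 'indexed' } # hypothetical callback

sub setup {
    my ($self, $opt) = @_;
    # `=' binds tighter than `and', so this parses as:
    #   ($self->{max_size} = $opt->{max_size}) and
    #       ($self->{index_oid} = \&index_oid_sketch);
    $self->{max_size} = $opt->{max_size} and
        $self->{index_oid} = \&index_oid_sketch;
    $self;
}

my $limited = setup({}, { max_size => 1024 });
my $unlimited = setup({}, {});
printf "limited: max_size=%d cb=%s\n", $limited->{max_size},
    defined($limited->{index_oid}) ? 'set' : 'unset';
printf "unlimited: max_size=%s cb=%s\n",
    $unlimited->{max_size} // 'undef',
    defined($unlimited->{index_oid}) ? 'set' : 'unset';
```

The callback slot is only assigned when a true max size was given;
in this sketch nothing resets it otherwise.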
lib/PublicInbox/ExtSearchIdx.pm | 3 ++-
lib/PublicInbox/SearchIdx.pm | 16 ++++++++--------
lib/PublicInbox/V2Writable.pm | 7 +++----
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 2f6de2bb..ebcc8770 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -397,7 +397,8 @@ sub _sync_inbox ($$$) {
if (defined(my $err = _ibx_index_reject($ibx))) {
return "W: skipping $ekey ($err)";
}
- $sync->{ibx} = $ibx;
+ $sync->{ibx} = $ibx; # FIXME: eliminate
+ local $self->{ibx} = $ibx;
$self->{nrec} = 0;
my $v = $ibx->version;
if ($v == 2) {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 24f76c32..9771633d 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -849,12 +849,13 @@ sub index_sync {
sub check_size { # check_async cb for -index --max-size=...
my (undef, $oid, $type, $size, $arg) = @_;
+ my $self = $arg->{self};
($type // '') eq 'blob' or
- die "E: bad $oid in $arg->{ibx}->{git}->{git_dir}";
- if ($size <= $arg->{max_size}) {
- $arg->{ibx}->{git}->cat_async($oid, $arg->{index_oid}, $arg);
+ die "E: bad $oid in $self->{ibx}->{git}->{git_dir}";
+ if ($size <= $self->{max_size}) {
+ $self->{ibx}->{git}->cat_async($oid, $self->{index_oid}, $arg);
} else {
- warn "W: skipping $oid ($size > $arg->{max_size})\n";
+ warn "W: skipping $oid ($size > $self->{max_size})\n";
}
}
@@ -921,16 +922,15 @@ sub v1_process_stack ($$$) {
$git->cat_async($oid, \&v1_unindex_both, $sync);
}
}
- if ($sync->{max_size} = $self->{-opt}->{max_size}) {
- $sync->{index_oid} = \&v1_index_both;
- }
+ $self->{max_size} = $self->{-opt}->{max_size} and
+ $self->{index_oid} = \&v1_index_both;
while (my ($f, $at, $ct, $oid, $cur_cmt) = $stk->pop_rec) {
my $arg = { %$sync, cur_cmt => $cur_cmt, oid => $oid };
last if $self->{quit};
if ($f eq 'm') {
$arg->{autime} = $at;
$arg->{cotime} = $ct;
- if ($sync->{max_size}) {
+ if ($self->{max_size}) {
$git->check_async($oid, \&check_size, $arg);
} else {
$git->cat_async($oid, \&v1_index_both, $arg);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 886f59b1..ff4e973a 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -935,9 +935,8 @@ sub sync_prepare ($$) {
# rerun index_sync without {reindex}
$reindex_heads = $self->last_commits($sync);
}
- if ($sync->{max_size} = $self->{-opt}->{max_size}) {
- $sync->{index_oid} = $self->can('index_oid');
- }
+ $self->{max_size} = $self->{-opt}->{max_size} and
+ $self->{index_oid} = $self->can('index_oid');
my $git_pfx = "$sync->{ibx}->{inboxdir}/git";
for (my $i = $sync->{epoch_max}; $i >= 0; $i--) {
my $git_dir = "$git_pfx/$i.git";
@@ -1154,7 +1153,7 @@ sub index_todo ($$$) {
cur_cmt => $cmt
};
if ($f eq 'm') {
- if ($sync->{max_size}) {
+ if ($self->{max_size}) {
$all->check_async($oid, \&check_size, $req);
} else {
$all->cat_async($oid, $index_oid, $req);
* [PATCH 14/31] index: move {D} (delete state) to $self
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
` (2 preceding siblings ...)
2025-01-10 23:18 ` [PATCH 13/31] (ext)index: move {max_size} and related bits to $self Eric Wong
@ 2025-01-10 23:18 ` Eric Wong
2025-01-10 23:19 ` [PATCH 15/31] index: move {reindex} " Eric Wong
` (4 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:18 UTC (permalink / raw)
To: meta
The {D} delete state is short-lived and may have its scope
limited via `local', so take another step towards reducing
internal data structures.
---
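Not part of the patch: a minimal sketch of limiting a field's lifetime
with `local', assuming a plain hash for $self.  The two subs are
hypothetical stand-ins and the `deadbeef' key is made up.

```perl
use strict;
use warnings;

my @seen;

sub log2stack_sketch { # hypothetical: records whether {D} is active
    my ($self) = @_;
    push @seen, ref($self->{D}) ? 'hash' : 'undef';
    $self->{D}->{deadbeef} = 1 if $self->{D};
}

sub prepare_stack_sketch {
    my ($self, $reindex) = @_;
    # {D} is set only for the dynamic extent of this sub, which
    # includes any subs called from here
    local $self->{D} = $reindex ? {} : undef;
    log2stack_sketch($self);
}

my $self = {};
prepare_stack_sketch($self, 1);
prepare_stack_sketch($self, 0);
print "seen: @seen; D exists after: ",
    (exists $self->{D} ? 1 : 0), "\n";
```

Because the element did not exist before `local', Perl removes it
again at scope exit, so no delete state leaks between calls.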
lib/PublicInbox/SearchIdx.pm | 6 +++---
lib/PublicInbox/V2Writable.pm | 5 +++--
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 9771633d..0e84a6a6 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -914,7 +914,7 @@ sub v1_process_stack ($$$) {
local $self->{latest_cmt};
$self->{mm}->{dbh}->begin_work;
- if (my @leftovers = keys %{delete($sync->{D}) // {}}) {
+ if (my @leftovers = keys %{delete($self->{D}) // {}}) {
warn('W: unindexing '.scalar(@leftovers)." leftovers\n");
for my $oid (@leftovers) {
last if $self->{quit};
@@ -945,7 +945,7 @@ sub v1_process_stack ($$$) {
sub log2stack ($$$$) {
my ($self, $sync, $git, $range) = @_;
- my $D = $sync->{D}; # OID_BIN => NR (if reindexing, undef otherwise)
+ my $D = $self->{D}; # OID_BIN => NR (if reindexing, undef otherwise)
my ($add, $del);
if ($sync->{ibx}->version == 1) {
my $path = $hex.'{2}/'.$hex.'{38}';
@@ -1006,7 +1006,7 @@ sub prepare_stack ($$$) {
$git->qx(qw(rev-parse -q --verify), "$range^0");
return PublicInbox::IdxStack->new->read_prepare if $?;
}
- $sync->{D} = $sync->{reindex} ? {} : undef; # OID_BIN => NR
+ local $self->{D} = $sync->{reindex} ? {} : undef; # OID_BIN => NR
log2stack($self, $sync, $git, $range);
}
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index ff4e973a..6ab9cbaf 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -914,6 +914,7 @@ sub sync_prepare ($$) {
my $regen_max = 0;
my $head = $sync->{ibx}->{ref_head} || 'HEAD';
my $pfx;
+ local $self->{D}; # delete state
if ($pr) {
($pfx) = ($sync->{ibx}->{inboxdir} =~ m!([^/]+)\z!g);
$pfx //= $sync->{ibx}->{inboxdir};
@@ -958,7 +959,7 @@ sub sync_prepare ($$) {
# We intentionally do NOT use {D} in the non-reindex case
# because we want NNTP article number gaps from unindexed
# messages to show up in mirrors, too.
- $sync->{D} //= $sync->{reindex} ? {} : undef; # OID_BIN => NR
+ $self->{D} //= $sync->{reindex} ? {} : undef; # OID_BIN => NR
my $stk = log2stack($self, $sync, $git, $range);
return 0 if $self->{quit};
my $nr = $stk ? $stk->num_records : 0;
@@ -971,7 +972,7 @@ sub sync_prepare ($$) {
# XXX this should not happen unless somebody bypasses checks in
# our code and blindly injects "d" file history into git repos
- if (my @leftovers = keys %{delete($sync->{D}) // {}}) {
+ if (my @leftovers = keys %{delete($self->{D}) // {}}) {
warn('W: unindexing '.scalar(@leftovers)." leftovers\n");
local $self->{current_info} = 'leftover ';
my $unindex_oid = $self->can('unindex_oid');
* [PATCH 15/31] index: move {reindex} to $self
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
` (3 preceding siblings ...)
2025-01-10 23:18 ` [PATCH 14/31] index: move {D} (delete state) " Eric Wong
@ 2025-01-10 23:19 ` Eric Wong
2025-01-10 23:19 ` [PATCH 16/31] searchidx: eliminate $sync from subroutines Eric Wong
` (3 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:19 UTC (permalink / raw)
To: meta
Again, we can easily contain the scope of {reindex} to $self via
`local', so there's no need to rely on another temporary structure.
---
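Not part of the patch: a sketch of how `local' behaves across the
rerun-without-reindex recursion that index_sync performs, assuming a
plain hash for $self.  `index_sync_sketch' is a hypothetical,
heavily reduced stand-in.

```perl
use strict;
use warnings;

my @log;

sub index_sync_sketch {
    my ($self, $opt) = @_;
    local $self->{reindex} = $opt->{reindex};
    push @log, $self->{reindex} ? 'reindex' : 'plain';
    if ($opt->{reindex}) {
        # rerun without reindex to pick up new changes
        my %again = %$opt;
        delete $again{reindex};
        index_sync_sketch($self, \%again);
        # the nested `local' has been undone by now
        push @log, $self->{reindex} ? 'restored' : 'lost';
    }
}

my $self = {};
index_sync_sketch($self, { reindex => 1 });
print "@log; leaked=", (exists $self->{reindex} ? 1 : 0), "\n";
```

Each call level sees its own value of {reindex}, and the outer value
comes back on return without any explicit save and restore code.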
lib/PublicInbox/SearchIdx.pm | 15 ++++++++-------
lib/PublicInbox/V2Writable.pm | 10 +++++-----
2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 0e84a6a6..a9e4df62 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -529,7 +529,7 @@ sub v1_index_mm ($$$$) {
my ($self, $eml, $oid, $sync) = @_;
my $mids = mids($eml);
my $mm = $self->{mm};
- if ($sync->{reindex}) {
+ if ($self->{reindex}) {
my $oidx = $self->{oidx};
for my $mid (@$mids) {
my ($num, undef) = $oidx->num_mid0_for_oid($oid, $mid);
@@ -883,7 +883,7 @@ sub v1_checkpoint ($$;$) {
if ($stk) { # all done if $stk is passed
# let SearchView know a full --reindex was done so it can
# generate ->has_threadid-dependent links
- if ($xdb && $sync->{reindex} && !ref($sync->{reindex})) {
+ if ($xdb && $self->{reindex} && !ref($self->{reindex})) {
my $n = $xdb->get_metadata('has_threadid');
$xdb->set_metadata('has_threadid', '1') if $n ne '1';
}
@@ -1006,7 +1006,7 @@ sub prepare_stack ($$$) {
$git->qx(qw(rev-parse -q --verify), "$range^0");
return PublicInbox::IdxStack->new->read_prepare if $?;
}
- local $self->{D} = $sync->{reindex} ? {} : undef; # OID_BIN => NR
+ local $self->{D} = $self->{reindex} ? {} : undef; # OID_BIN => NR
log2stack($self, $sync, $git, $range);
}
@@ -1084,7 +1084,8 @@ sub _index_sync {
local $self->{transact_bytes} = 0;
my $pr = $opt->{-progress};
local $self->{-opt} = $opt;
- my $sync = { reindex => $opt->{reindex}, ibx => $ibx };
+ local $self->{reindex} = $opt->{reindex};
+ my $sync = { ibx => $ibx };
my $quit = quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
@@ -1092,18 +1093,18 @@ sub _index_sync {
my $xdb = $self->begin_txn_lazy;
$self->{oidx}->rethread_prepare($opt);
my $mm = v1_mm_init $self;
- if ($sync->{reindex}) {
+ if ($self->{reindex}) {
my $last = $mm->last_commit;
if ($last) {
$tip = $last;
} else {
# somebody just blindly added --reindex when indexing
# for the first time, allow it:
- undef $sync->{reindex};
+ delete $self->{reindex};
}
}
my $last_commit = v1_last_x_commit $self, $mm;
- my $lx = v1_reindex_from $sync->{reindex}, $last_commit;
+ my $lx = v1_reindex_from $self->{reindex}, $last_commit;
my $range = $lx eq '' ? $tip : "$lx..$tip";
$pr->("counting changes\n\t$range ... ") if $pr;
my $stk = prepare_stack($self, $sync, $range);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 6ab9cbaf..31d8dcda 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -931,7 +931,7 @@ sub sync_prepare ($$) {
for my $i (0..$sync->{epoch_max}) {
$reindex_heads->[$i] = $mm->last_commit_xap($v, $i);
}
- } elsif ($sync->{reindex}) { # V2 inbox
+ } elsif ($self->{reindex}) { # V2 inbox
# reindex stops at the current heads and we later
# rerun index_sync without {reindex}
$reindex_heads = $self->last_commits($sync);
@@ -959,7 +959,7 @@ sub sync_prepare ($$) {
# We intentionally do NOT use {D} in the non-reindex case
# because we want NNTP article number gaps from unindexed
# messages to show up in mirrors, too.
- $self->{D} //= $sync->{reindex} ? {} : undef; # OID_BIN => NR
+ $self->{D} //= $self->{reindex} ? {} : undef; # OID_BIN => NR
my $stk = log2stack($self, $sync, $git, $range);
return 0 if $self->{quit};
my $nr = $stk ? $stk->num_records : 0;
@@ -995,7 +995,7 @@ sub sync_prepare ($$) {
my $pad = length($regen_max) + 1;
$self->{-regen_fmt} = "% ${pad}u/$regen_max\n";
$self->{nrec} = 0;
- return -1 if $sync->{reindex};
+ return -1 if $self->{reindex};
$regen_max + $self->artnum_max || 0;
}
@@ -1077,7 +1077,7 @@ sub unindex_todo ($$$) {
sub sync_ranges ($$) {
my ($self, $sync) = @_;
- my $reindex = $sync->{reindex};
+ my $reindex = $self->{reindex};
return $self->last_commits($sync) unless $reindex;
return [] if ref($reindex) ne 'HASH';
@@ -1227,8 +1227,8 @@ sub index_sync {
$self->idx_init($opt); # acquire lock
$self->{mg}->fill_alternates;
$self->{oidx}->rethread_prepare($opt);
+ local $self->{reindex} = $opt->{reindex};
my $sync = {
- reindex => $opt->{reindex},
self => $self,
ibx => $self->{ibx},
epoch_max => $epoch_max,
* [PATCH 16/31] searchidx: eliminate $sync from subroutines
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
` (4 preceding siblings ...)
2025-01-10 23:19 ` [PATCH 15/31] index: move {reindex} " Eric Wong
@ 2025-01-10 23:19 ` Eric Wong
2025-01-10 23:19 ` [PATCH 17/31] v2writable: move {mm_tmp} to $self Eric Wong
` (2 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:19 UTC (permalink / raw)
To: meta
Eliminating this variable will hopefully reduce cognitive
overhead as well as overhead from argument passing.
---
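Not part of the patch: a sketch of the callback shape after this
change, where each request gets a small hash that points back at
$self instead of a copy of a shared $sync hash.  `cat_sketch' is a
hypothetical synchronous stand-in for the async cat-file call; it
only mirrors the callback argument order shown in the diff.

```perl
use strict;
use warnings;

# Hypothetical stand-in: invokes $cb with
# ($bref, $oid, $type, $size, $arg), like the callbacks in the diff.
sub cat_sketch {
    my ($oid, $cb, $arg) = @_;
    my $blob = "blob for $oid";
    $cb->(\$blob, $oid, 'blob', length($blob), $arg);
}

sub index_both_sketch { # callback
    my ($bref, $oid, $type, $size, $arg) = @_;
    my $self = $arg->{self}; # long-lived state
    ++$self->{nrec};
    $self->{latest_cmt} = $arg->{cur_cmt}; # per-request state
}

my $self = { nrec => 0 };
for my $rec ([ 'c1', 'aaa' ], [ 'c2', 'bbb' ]) {
    my ($cmt, $oid) = @$rec;
    # only what this request needs, no copy of a shared hash
    my $arg = { self => $self, cur_cmt => $cmt, oid => $oid };
    cat_sketch($oid, \&index_both_sketch, $arg);
}
print "nrec=$self->{nrec} latest=$self->{latest_cmt}\n";
```

The per-request hash stays small and constant in size no matter how
many fields the indexer object grows.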
lib/PublicInbox/ExtSearchIdx.pm | 3 +-
lib/PublicInbox/LeiInput.pm | 5 +--
lib/PublicInbox/SearchIdx.pm | 79 ++++++++++++++++-----------------
lib/PublicInbox/V2Writable.pm | 2 +-
4 files changed, 43 insertions(+), 46 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index ebcc8770..32cdd090 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -409,8 +409,7 @@ sub _sync_inbox ($$$) {
my $lc = $self->{oidx}->eidx_meta("lc-v1:$ekey//$uv");
my $head = $ibx->mm->last_commit //
return "E: $ibx->{inboxdir} is not indexed";
- my $stk = prepare_stack($self, $sync,
- $lc ? "$lc..$head" : $head);
+ my $stk = prepare_stack $self, $lc ? "$lc..$head" : $head;
my $unit = { stack => $stk, git => $ibx->git };
push @{$sync->{todo}}, $unit;
} else {
diff --git a/lib/PublicInbox/LeiInput.pm b/lib/PublicInbox/LeiInput.pm
index 618829ef..a9443f3a 100644
--- a/lib/PublicInbox/LeiInput.pm
+++ b/lib/PublicInbox/LeiInput.pm
@@ -143,11 +143,10 @@ EOM
require PublicInbox::SearchIdx;
my $n = $ibx->max_git_epoch;
my @g = defined($n) ? map { $ibx->git_epoch($_) } (0..$n) : ($ibx->git);
- my $sync = { D => {}, ibx => $ibx }; # D => {} filters out deletes
+ my $sidx = { D => {}, ibx => $ibx }; # D => {} filters out deletes
my ($f, $at, $ct, $oid, $cmt);
for my $git (grep defined, @g) {
- my $s = PublicInbox::SearchIdx::log2stack($sync, $sync,
- $git, 'HEAD');
+ my $s = PublicInbox::SearchIdx::log2stack($sidx, $git, 'HEAD');
while (($f, $at, $ct, $oid, $cmt) = $s->pop_rec) {
$git->cat_async($oid, \&oid2eml, $self) if $f eq 'm';
}
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index a9e4df62..51c8b9c5 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -525,8 +525,8 @@ sub v1_mm_init ($) {
};
}
-sub v1_index_mm ($$$$) {
- my ($self, $eml, $oid, $sync) = @_;
+sub v1_index_mm ($$$) {
+ my ($self, $eml, $oid) = @_;
my $mids = mids($eml);
my $mm = $self->{mm};
if ($self->{reindex}) {
@@ -544,18 +544,18 @@ sub v1_index_mm ($$$$) {
sub add_message {
# mime = PublicInbox::Eml or Email::MIME object
- my ($self, $mime, $smsg, $sync) = @_;
+ my ($self, $mime, $smsg, $cmt_info) = @_;
begin_txn_lazy($self);
my $mids = mids_for_index($mime);
$smsg //= bless { blob => '' }, 'PublicInbox::Smsg'; # test-only compat
$smsg->{mid} //= $mids->[0]; # v1 compatibility
$smsg->{num} //= do { # v1
v1_mm_init $self;
- v1_index_mm $self, $mime, $smsg->{blob}, $sync;
+ v1_index_mm $self, $mime, $smsg->{blob};
};
# v1 and tests only:
- $smsg->populate($mime, $sync);
+ $smsg->populate($mime, $cmt_info);
$smsg->{bytes} //= length($mime->as_string);
eval {
@@ -798,31 +798,31 @@ sub update_checkpoint ($;$) {
}
sub v1_index_both { # git->cat_async callback
- my ($bref, $oid, $type, $size, $sync) = @_;
- return if is_bad_blob($oid, $type, $size, $sync->{oid});
+ my ($bref, $oid, $type, $size, $arg) = @_;
+ return if is_bad_blob($oid, $type, $size, $arg->{oid});
my $smsg = bless { blob => $oid }, 'PublicInbox::Smsg';
$smsg->set_bytes($$bref, $size);
- my $self = $sync->{self};
+ my $self = $arg->{self};
update_checkpoint $self, $smsg->{bytes};
local $self->{current_info} = "$self->{current_info}: $oid";
my $eml = PublicInbox::Eml->new($bref);
- $smsg->{num} = v1_index_mm $self, $eml, $oid, $sync or
+ $smsg->{num} = v1_index_mm $self, $eml, $oid or
die "E: could not generate NNTP article number for $oid";
- add_message($self, $eml, $smsg, $sync);
+ add_message($self, $eml, $smsg, $arg);
++$self->{nidx};
++$self->{nrec};
- my $cur_cmt = $sync->{cur_cmt} // die 'BUG: {cur_cmt} missing';
+ my $cur_cmt = $arg->{cur_cmt} // die 'BUG: {cur_cmt} missing';
$self->{latest_cmt} = $cur_cmt;
}
sub v1_unindex_both { # git->cat_async callback
- my ($bref, $oid, $type, $size, $sync) = @_;
- return if is_bad_blob($oid, $type, $size, $sync->{oid});
- my $self = $sync->{self};
+ my ($bref, $oid, $type, $size, $arg) = @_;
+ return if is_bad_blob($oid, $type, $size, $arg->{oid});
+ my $self = $arg->{self};
local $self->{current_info} = "$self->{current_info}: $oid";
v1_unindex_eml $self, $oid, PublicInbox::Eml->new($bref);
# may be undef if leftover
- if (defined(my $cur_cmt = $sync->{cur_cmt})) {
+ if (defined(my $cur_cmt = $arg->{cur_cmt})) {
$self->{latest_cmt} = $cur_cmt;
}
++$self->{nidx};
@@ -859,8 +859,8 @@ sub check_size { # check_async cb for -index --max-size=...
}
}
-sub v1_checkpoint ($$;$) {
- my ($self, $sync, $stk) = @_;
+sub v1_checkpoint ($;$) {
+ my ($self, $stk) = @_;
$self->{ibx}->git->async_wait_all;
$self->{need_checkpoint} = 0;
@@ -868,7 +868,7 @@ sub v1_checkpoint ($$;$) {
my $newest = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
if (defined($newest)) {
my $cur = $self->{mm}->last_commit;
- if (v1_need_update($self, $sync, $cur, $newest)) {
+ if (v1_need_update($self, $cur, $newest)) {
$self->{mm}->last_commit($newest);
}
}
@@ -876,7 +876,7 @@ sub v1_checkpoint ($$;$) {
my $xdb = $self->{xdb};
if ($newest && $xdb) {
my $cur = $xdb->get_metadata('last_commit');
- if (v1_need_update($self, $sync, $cur, $newest)) {
+ if (v1_need_update($self, $cur, $newest)) {
$xdb->set_metadata('last_commit', $newest);
}
}
@@ -890,7 +890,7 @@ sub v1_checkpoint ($$;$) {
$self->{oidx}->rethread_done($self->{-opt}); # all done
}
commit_txn_lazy($self);
- $sync->{ibx}->git->cleanup;
+ $self->{ibx}->git->cleanup;
my $nrec = $self->{nrec};
idx_release($self, $nrec);
# let another process do some work...
@@ -905,11 +905,10 @@ sub v1_checkpoint ($$;$) {
delete $self->{next_checkpoint};
}
-sub v1_process_stack ($$$) {
- my ($self, $sync, $stk) = @_;
- my $git = $sync->{ibx}->git;
+sub v1_process_stack ($$) {
+ my ($self, $stk) = @_;
+ my $git = $self->{ibx}->git;
$self->{nrec} = 0;
- $sync->{self} = $self;
local $self->{need_checkpoint} = 0;
local $self->{latest_cmt};
@@ -919,13 +918,14 @@ sub v1_process_stack ($$$) {
for my $oid (@leftovers) {
last if $self->{quit};
$oid = unpack('H*', $oid);
- $git->cat_async($oid, \&v1_unindex_both, $sync);
+ my $arg = { oid => $oid, self => $self };
+ $git->cat_async($oid, \&v1_unindex_both, $arg);
}
}
$self->{max_size} = $self->{-opt}->{max_size} and
$self->{index_oid} = \&v1_index_both;
while (my ($f, $at, $ct, $oid, $cur_cmt) = $stk->pop_rec) {
- my $arg = { %$sync, cur_cmt => $cur_cmt, oid => $oid };
+ my $arg = { self => $self, cur_cmt => $cur_cmt, oid => $oid };
last if $self->{quit};
if ($f eq 'm') {
$arg->{autime} = $at;
@@ -938,16 +938,16 @@ sub v1_process_stack ($$$) {
} elsif ($f eq 'd') {
$git->cat_async($oid, \&v1_unindex_both, $arg);
}
- v1_checkpoint $self, $sync if $self->{need_checkpoint};
+ v1_checkpoint $self if $self->{need_checkpoint};
}
- v1_checkpoint($self, $sync, $self->{quit} ? undef : $stk);
+ v1_checkpoint($self, $self->{quit} ? undef : $stk);
}
-sub log2stack ($$$$) {
- my ($self, $sync, $git, $range) = @_;
+sub log2stack ($$$) {
+ my ($self, $git, $range) = @_;
my $D = $self->{D}; # OID_BIN => NR (if reindexing, undef otherwise)
my ($add, $del);
- if ($sync->{ibx}->version == 1) {
+ if ($self->{ibx}->version == 1) {
my $path = $hex.'{2}/'.$hex.'{38}';
$add = qr!\A:000000 100644 \S+ ($OID) A\t$path$!;
$del = qr!\A:100644 000000 ($OID) \S+ D\t$path$!;
@@ -996,9 +996,9 @@ sub log2stack ($$$$) {
$stk->read_prepare;
}
-sub prepare_stack ($$$) {
- my ($self, $sync, $range) = @_;
- my $git = $sync->{ibx}->git;
+sub prepare_stack ($$) {
+ my ($self, $range) = @_;
+ my $git = $self->{ibx}->git;
if (index($range, '..') < 0) {
# don't show annoying git errors to users who run -index
@@ -1007,7 +1007,7 @@ sub prepare_stack ($$$) {
return PublicInbox::IdxStack->new->read_prepare if $?;
}
local $self->{D} = $self->{reindex} ? {} : undef; # OID_BIN => NR
- log2stack($self, $sync, $git, $range);
+ log2stack $self, $git, $range;
}
# --is-ancestor requires git 1.8.0+
@@ -1018,8 +1018,8 @@ sub is_ancestor ($$$) {
run_wait($cmd) == 0;
}
-sub v1_need_update ($$$$) {
- my ($self, $sync, $cur, $new) = @_;
+sub v1_need_update ($$$) {
+ my ($self, $cur, $new) = @_;
my $git = $self->{ibx}->git;
$cur //= ''; # XS Search::Xapian ->get_metadata doesn't give undef
@@ -1085,7 +1085,6 @@ sub _index_sync {
my $pr = $opt->{-progress};
local $self->{-opt} = $opt;
local $self->{reindex} = $opt->{reindex};
- my $sync = { ibx => $ibx };
my $quit = quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
@@ -1107,10 +1106,10 @@ sub _index_sync {
my $lx = v1_reindex_from $self->{reindex}, $last_commit;
my $range = $lx eq '' ? $tip : "$lx..$tip";
$pr->("counting changes\n\t$range ... ") if $pr;
- my $stk = prepare_stack($self, $sync, $range);
+ my $stk = prepare_stack $self, $range;
local $self->{ntodo} = $stk ? $stk->num_records : 0;
$pr->("$self->{ntodo}\n") if $pr; # continue previous line
- v1_process_stack($self, $sync, $stk) if !$self->{quit};
+ v1_process_stack($self, $stk) if !$self->{quit};
}
sub DESTROY {
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 31d8dcda..a80a3fb8 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -960,7 +960,7 @@ sub sync_prepare ($$) {
# because we want NNTP article number gaps from unindexed
# messages to show up in mirrors, too.
$self->{D} //= $self->{reindex} ? {} : undef; # OID_BIN => NR
- my $stk = log2stack($self, $sync, $git, $range);
+ my $stk = log2stack $self, $git, $range;
return 0 if $self->{quit};
my $nr = $stk ? $stk->num_records : 0;
$pr->("$nr\n") if $pr;
* [PATCH 17/31] v2writable: move {mm_tmp} to $self
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
` (5 preceding siblings ...)
2025-01-10 23:19 ` [PATCH 16/31] searchidx: eliminate $sync from subroutines Eric Wong
@ 2025-01-10 23:19 ` Eric Wong
2025-01-10 23:19 ` [PATCH 18/31] (ext)index: eliminate most uses of `$sync->{ibx}' Eric Wong
2025-01-10 23:19 ` [PATCH 19/31] extindex: eliminate repeated ->eidx_key method call Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:19 UTC (permalink / raw)
To: meta
Another variable we can easily local-ize to limit scope and
reduce refcount traffic from copying into the per-cat_async $arg
structure.
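
For readers less familiar with the idiom, a minimal standalone
sketch of `local' on a hash element follows. The `$self' hash and
the `with_tmp' sub are hypothetical stand-ins and are not taken
from V2Writable.pm.

```perl
use strict;
use warnings;

# hypothetical stand-in for the indexer object
my $self = { mm_tmp => 'outer' };
my @seen;

sub with_tmp {
	my ($self) = @_;
	# the previous value is saved here and restored when this sub
	# is left, whether by a normal return or by die
	local $self->{mm_tmp} = 'inner';
	push @seen, $self->{mm_tmp};
	die "simulated failure\n";
}

eval { with_tmp($self) };
push @seen, $self->{mm_tmp}; # restored despite the die
```

Because the restore happens on scope exit, no explicit cleanup is
needed on error paths, which is what makes the field safe to keep
in a long-lived object.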
---
lib/PublicInbox/V2Writable.pm | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index a80a3fb8..e5d54dc6 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -701,7 +701,7 @@ sub reindex_checkpoint ($$) {
$self->git->async_wait_all;
$self->update_last_commit($sync);
$self->{need_checkpoint} = 0;
- my $mm_tmp = $sync->{mm_tmp};
+ my $mm_tmp = $self->{mm_tmp};
$mm_tmp->atfork_prepare if $mm_tmp;
die 'BUG: {im} during reindex' if $self->{im};
if ($self->{ibx_map} && !$self->{checkpoint_unlocks}) {
@@ -766,7 +766,7 @@ sub index_oid { # cat_async callback
}
}
$mid0 //= do { # is this a number we got before?
- $num = $arg->{mm_tmp}->num_for($mids->[0]);
+ $num = $self->{mm_tmp}->num_for($mids->[0]);
# don't clobber existing if Message-ID is reused:
if (my $x = defined($num) ? $oidx->get_art($num) : undef) {
@@ -776,7 +776,7 @@ sub index_oid { # cat_async callback
};
if (!defined($num)) {
for (my $i = $#$mids; $i >= 1; $i--) {
- $num = $arg->{mm_tmp}->num_for($mids->[$i]);
+ $num = $self->{mm_tmp}->num_for($mids->[$i]);
if (defined($num)) {
$mid0 = $mids->[$i];
last;
@@ -784,7 +784,7 @@ sub index_oid { # cat_async callback
}
}
if (defined($num)) {
- $arg->{mm_tmp}->num_delete($num);
+ $self->{mm_tmp}->num_delete($num);
} else { # never seen
$num = $self->{mm}->mid_insert($mids->[0]);
if (defined($num)) {
@@ -1228,6 +1228,7 @@ sub index_sync {
$self->{mg}->fill_alternates;
$self->{oidx}->rethread_prepare($opt);
local $self->{reindex} = $opt->{reindex};
+ local $self->{mm_tmp};
my $sync = {
self => $self,
ibx => $self->{ibx},
@@ -1245,12 +1246,12 @@ sub index_sync {
# only for batch performance.
$self->{mm}->{dbh}->rollback;
$self->{mm}->{dbh}->begin_work;
- $sync->{mm_tmp} =
+ $self->{mm_tmp} =
$self->{mm}->tmp_clone($self->{ibx}->{inboxdir});
# xapian_only works incrementally w/o --reindex
if ($seq && !$opt->{reindex}) {
- $art_beg = $sync->{mm_tmp}->max || -1;
+ $art_beg = $self->{mm_tmp}->max || -1;
$art_beg++;
}
}
@@ -1266,7 +1267,7 @@ sub index_sync {
my $quit_warn;
# deal with Xapian shards sequentially
- if ($seq && delete($sync->{mm_tmp})) {
+ if ($seq && delete($self->{mm_tmp})) {
if ($self->{quit}) {
$quit_warn = 1;
} else {
* [PATCH 18/31] (ext)index: eliminate most uses of `$sync->{ibx}'
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
` (6 preceding siblings ...)
2025-01-10 23:19 ` [PATCH 17/31] v2writable: move {mm_tmp} to $self Eric Wong
@ 2025-01-10 23:19 ` Eric Wong
2025-01-10 23:19 ` [PATCH 19/31] extindex: eliminate repeated ->eidx_key method call Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:19 UTC (permalink / raw)
To: meta
Yet another step towards eliminating the $sync structure.
There are still a few remaining dependencies due to cross-inbox
linkage but the goal is to reduce refcount traffic for the
majority of Inbox object references. This field in $sync will
be completely eliminated in the next few commits.
---
lib/PublicInbox/ExtSearchIdx.pm | 6 +++---
lib/PublicInbox/V2Writable.pm | 10 +++++-----
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 32cdd090..76699c6b 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -375,8 +375,8 @@ sub unindex_oid { # git->cat_async callback for 'd'
sub last_commits {
my ($self, $sync) = @_;
my $heads = [];
- my $ekey = $sync->{ibx}->eidx_key;
- my $uv = $sync->{ibx}->uidvalidity;
+ my $ekey = $self->{ibx}->eidx_key;
+ my $uv = $self->{ibx}->uidvalidity;
for my $i (0..$sync->{epoch_max}) {
$heads->[$i] = $self->{oidx}->eidx_meta("lc-v2:$ekey//$uv;$i");
}
@@ -1170,7 +1170,7 @@ sub update_last_commit { # overrides V2Writable
my $unit = $sync->{unit} // return;
my $latest_cmt = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
defined($latest_cmt) or return;
- my $ibx = $sync->{ibx} or die 'BUG: {ibx} missing';
+ my $ibx = $self->{ibx} or die 'BUG: {ibx} missing';
my $ekey = $ibx->eidx_key;
my $uv = $ibx->uidvalidity;
my $epoch = $unit->{epoch};
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index e5d54dc6..0606c99f 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -912,12 +912,12 @@ sub sync_prepare ($$) {
$sync->{ranges} = sync_ranges($self, $sync);
my $pr = $self->{-opt}->{-progress};
my $regen_max = 0;
- my $head = $sync->{ibx}->{ref_head} || 'HEAD';
+ my $head = $self->{ibx}->{ref_head} || 'HEAD';
my $pfx;
local $self->{D}; # delete state
if ($pr) {
- ($pfx) = ($sync->{ibx}->{inboxdir} =~ m!([^/]+)\z!g);
- $pfx //= $sync->{ibx}->{inboxdir};
+ ($pfx) = ($self->{ibx}->{inboxdir} =~ m!([^/]+)\z!g);
+ $pfx //= $self->{ibx}->{inboxdir};
}
my $reindex_heads;
@@ -927,7 +927,7 @@ sub sync_prepare ($$) {
# what's in the per-inbox index.
$reindex_heads = [];
my $v = PublicInbox::Search::SCHEMA_VERSION;
- my $mm = $sync->{ibx}->mm;
+ my $mm = $self->{ibx}->mm;
for my $i (0..$sync->{epoch_max}) {
$reindex_heads->[$i] = $mm->last_commit_xap($v, $i);
}
@@ -938,7 +938,7 @@ sub sync_prepare ($$) {
}
$self->{max_size} = $self->{-opt}->{max_size} and
$self->{index_oid} = $self->can('index_oid');
- my $git_pfx = "$sync->{ibx}->{inboxdir}/git";
+ my $git_pfx = "$self->{ibx}->{inboxdir}/git";
for (my $i = $sync->{epoch_max}; $i >= 0; $i--) {
my $git_dir = "$git_pfx/$i.git";
-d $git_dir or next; # missing epochs are fine
* [PATCH 19/31] extindex: eliminate repeated ->eidx_key method call
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
` (7 preceding siblings ...)
2025-01-10 23:19 ` [PATCH 18/31] (ext)index: eliminate most uses of `$sync->{ibx}' Eric Wong
@ 2025-01-10 23:19 ` Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:19 UTC (permalink / raw)
To: meta
No need to make two relatively expensive `->' method calls when
one will do.
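
A small sketch of the chained assignment used here, assuming a
hypothetical `CountingInbox' class that merely counts how often
its accessor runs (it is not part of public-inbox):

```perl
use strict;
use warnings;

package CountingInbox { # hypothetical class that counts method calls
	sub new {
		my ($class) = @_;
		return bless { calls => 0 }, $class;
	}
	sub eidx_key {
		my ($self) = @_;
		$self->{calls}++;
		return 'inbox.example';
	}
}

my $ibx = CountingInbox->new;
my $new_smsg = {};
# assignment is right-associative: the method runs once and its
# result lands in both the hash field and the lexical
my $ekey = $new_smsg->{eidx_key} = $ibx->eidx_key;
```

The lexical `$ekey' can then be passed to later calls without
another method dispatch.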
---
lib/PublicInbox/ExtSearchIdx.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 76699c6b..6693542b 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -254,8 +254,8 @@ sub index_unseen ($) {
$self->{oidx}->add_overview($eml, $new_smsg);
my $oid = $new_smsg->{blob};
my $ibx = delete $req->{ibx} or die 'BUG: {ibx} unset';
- $self->{oidx}->add_xref3($docid, $req->{xnum}, $oid, $ibx->eidx_key);
- $new_smsg->{eidx_key} = $ibx->eidx_key;
+ my $ekey = $new_smsg->{eidx_key} = $ibx->eidx_key;
+ $self->{oidx}->add_xref3($docid, $req->{xnum}, $oid, $ekey);
$idx->index_eml($eml, $new_smsg);
update_checkpoint $self, $new_smsg->{bytes};
}
* [PATCH 20/31] v2writable: eliminate $sync->{art_end}
2025-01-10 23:17 [PATCH 00/31] (ext)index: eliminate $sync structure Eric Wong
2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 21/31] (ext)index: $sync->{unit} => $self->{unit} Eric Wong
` (8 more replies)
2025-01-10 23:20 ` [PATCH 30/31] (ext)index: eliminate $sync->{ibx} Eric Wong
3 siblings, 9 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
Instead of creating a long-lived {art_end} entry or even
local-izing it in $self, pass the `$art_end' value on stack to
achieve symmetry with the existing `$art_beg' arg in calls to
index_xap_step().
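
As a rough illustration of the calling convention, here is a
hypothetical, much simplified stepping sub. Only the early return
and the `$step' default mirror the hunk below; the loop body and
the name `step_nums' are assumptions made for the example.

```perl
use strict;
use warnings;

# hypothetical simplification: $end arrives on the stack next to
# $beg instead of being looked up in a shared hash
sub step_nums ($$$;$) {
	my ($self, $beg, $end, $step) = @_;
	return if $beg > $end; # nothing to do
	$step //= $self->{shards};
	my @nums;
	for (my $num = $beg; $num <= $end; $num += $step) {
		push @nums, $num;
	}
	@nums;
}

my $self = { shards => 3 };
my @shard0 = step_nums $self, 1, 10;   # step defaults to {shards}
my @all = step_nums $self, 1, 5, 1;    # explicit step of 1
my @none = step_nums $self, 6, 5;      # $beg > $end
```

With both bounds in the argument list, the sub no longer depends
on any field being set up by its caller beforehand.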
---
lib/PublicInbox/V2Writable.pm | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 0606c99f..4e42b347 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1095,9 +1095,8 @@ sub index_xap_only { # git->cat_async callback
$idx->index_eml(PublicInbox::Eml->new($bref), $smsg);
}
-sub index_xap_step ($$$;$) {
- my ($self, $sync, $beg, $step) = @_;
- my $end = $sync->{art_end};
+sub index_xap_step ($$$$;$) {
+ my ($self, $sync, $beg, $end, $step) = @_;
return if $beg > $end; # nothing to do
$step //= $self->{shards};
@@ -1177,18 +1176,18 @@ sub xapian_only ($;$$) {
if (my $art_end = $self->{ibx}->mm->max) {
$self->{-regen_fmt} //= "%u/?\n";
$sync //= { self => $self };
- $sync->{art_end} = $art_end;
if ($seq || !$self->{parallel}) {
my $shard_end = $self->{shards} - 1;
for my $i (0..$shard_end) {
last if $self->{quit};
- index_xap_step($self, $sync, $art_beg + $i);
+ index_xap_step $self, $sync, $art_beg + $i,
+ $art_end;
if ($i != $shard_end) {
reindex_checkpoint($self, $sync);
}
}
} else { # parallel (maybe)
- index_xap_step($self, $sync, $art_beg, 1);
+ index_xap_step $self, $sync, $art_beg, $art_end, 1;
}
}
$self->git->async_wait_all;
* [PATCH 21/31] (ext)index: $sync->{unit} => $self->{unit}
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 22/31] (ext)index: eliminate redundant $sync->{epoch_max} Eric Wong
` (7 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
Since we were using `local' anyway on the $sync hashref, it's a
trivial swap to move it into $self.
---
lib/PublicInbox/ExtSearchIdx.pm | 2 +-
lib/PublicInbox/V2Writable.pm | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 6693542b..0080c1db 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -1167,7 +1167,7 @@ sub eidx_sync { # main entry point
sub update_last_commit { # overrides V2Writable
my ($self, $sync, $stk) = @_;
- my $unit = $sync->{unit} // return;
+ my $unit = $self->{unit} // return;
my $latest_cmt = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
defined($latest_cmt) or return;
my $ibx = $self->{ibx} or die 'BUG: {ibx} missing';
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 4e42b347..5fcc0800 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -818,7 +818,7 @@ sub index_oid { # cat_async callback
# only update last_commit for $i on reindex iff newer than current
sub update_last_commit {
my ($self, $sync, $stk) = @_;
- my $unit = $sync->{unit} // return;
+ my $unit = $self->{unit} // return;
my $latest_cmt = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
defined($latest_cmt) or return;
my $last = last_epoch_commit($self, $unit->{epoch});
@@ -1137,7 +1137,7 @@ sub index_todo ($$$) {
}
local $self->{current_info} = "$pfx ";
local $self->{latest_cmt};
- local $sync->{unit} = $unit;
+ local $self->{unit} = $unit;
while (my ($f, $at, $ct, $oid, $cmt) = $stk->pop_rec) {
if ($self->{quit}) {
warn "waiting to quit...\n";
* [PATCH 22/31] (ext)index: eliminate redundant $sync->{epoch_max}
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
2025-01-10 23:20 ` [PATCH 21/31] (ext)index: $sync->{unit} => $self->{unit} Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 23/31] extindex: move {dedupe_cull} to self Eric Wong
` (6 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
We already had $self->{epoch_max}, and it's easy enough to
`local'-ize it to ensure it doesn't outlive its usefulness when
-extindex handles multiple inboxes.
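
A sketch of why the `local' matters when one object visits several
inboxes in turn. The `sync_one' sub and the plain hashes standing
in for Inbox objects are hypothetical; only the
`local ... // return' shape is borrowed from the patch.

```perl
use strict;
use warnings;

# hypothetical per-inbox sync step
sub sync_one {
	my ($self, $ibx, $seen) = @_;
	# record whether a previous inbox's value leaked into this call
	push @$seen, exists($self->{epoch_max}) ? 'stale' : 'clean';
	local $self->{epoch_max} = $ibx->{max_git_epoch} // return;
	$self->{epoch_max};
}

my $self = {};
my @seen;
my @got = map { sync_one($self, $_, \@seen) }
	({ max_git_epoch => 4 }, {}, { max_git_epoch => 0 });
```

Each call starts without an {epoch_max} left over from the
previous inbox, and the inbox lacking epochs returns early without
leaving the field behind either.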
---
lib/PublicInbox/ExtSearchIdx.pm | 5 +++--
lib/PublicInbox/V2Writable.pm | 11 +++++------
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 0080c1db..73802813 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -377,7 +377,7 @@ sub last_commits {
my $heads = [];
my $ekey = $self->{ibx}->eidx_key;
my $uv = $self->{ibx}->uidvalidity;
- for my $i (0..$sync->{epoch_max}) {
+ for my $i (0..$self->{epoch_max}) {
$heads->[$i] = $self->{oidx}->eidx_meta("lc-v2:$ekey//$uv;$i");
}
$heads;
@@ -400,9 +400,10 @@ sub _sync_inbox ($$$) {
$sync->{ibx} = $ibx; # FIXME: eliminate
local $self->{ibx} = $ibx;
$self->{nrec} = 0;
+ local $self->{epoch_max};
my $v = $ibx->version;
if ($v == 2) {
- $sync->{epoch_max} = $ibx->max_git_epoch // return;
+ $self->{epoch_max} = $ibx->max_git_epoch // return;
sync_prepare($self, $sync); # or return # TODO: once MiscIdx is stable
} elsif ($v == 1) {
my $uv = $ibx->uidvalidity;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 5fcc0800..e8d5e005 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -838,7 +838,7 @@ sub update_last_commit {
sub last_commits {
my ($self, $sync) = @_;
my $heads = [];
- for (my $i = $sync->{epoch_max}; $i >= 0; $i--) {
+ for (my $i = $self->{epoch_max}; $i >= 0; $i--) {
$heads->[$i] = last_epoch_commit($self, $i);
}
$heads;
@@ -928,7 +928,7 @@ sub sync_prepare ($$) {
$reindex_heads = [];
my $v = PublicInbox::Search::SCHEMA_VERSION;
my $mm = $self->{ibx}->mm;
- for my $i (0..$sync->{epoch_max}) {
+ for my $i (0..$self->{epoch_max}) {
$reindex_heads->[$i] = $mm->last_commit_xap($v, $i);
}
} elsif ($self->{reindex}) { # V2 inbox
@@ -939,7 +939,7 @@ sub sync_prepare ($$) {
$self->{max_size} = $self->{-opt}->{max_size} and
$self->{index_oid} = $self->can('index_oid');
my $git_pfx = "$self->{ibx}->{inboxdir}/git";
- for (my $i = $sync->{epoch_max}; $i >= 0; $i--) {
+ for (my $i = $self->{epoch_max}; $i >= 0; $i--) {
my $git_dir = "$git_pfx/$i.git";
-d $git_dir or next; # missing epochs are fine
my $git = PublicInbox::Git->new($git_dir);
@@ -1205,8 +1205,8 @@ sub index_sync {
local $self->{-regen_fmt};
return xapian_only($self) if $opt->{xapian_only};
- my $epoch_max = $self->{ibx}->max_git_epoch // return;
- my $latest = $self->{mg}->epoch_dir."/$epoch_max.git";
+ local $self->{epoch_max} = $self->{ibx}->max_git_epoch // return;
+ my $latest = $self->{mg}->epoch_dir."/$self->{epoch_max}.git";
if ($opt->{'fast-noop'}) { # nanosecond (st_ctim) comparison
use Time::HiRes qw(stat);
if (my @mm = stat("$self->{ibx}->{inboxdir}/msgmap.sqlite3")) {
@@ -1231,7 +1231,6 @@ sub index_sync {
my $sync = {
self => $self,
ibx => $self->{ibx},
- epoch_max => $epoch_max,
};
my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
* [PATCH 23/31] extindex: move {dedupe_cull} to self
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
2025-01-10 23:20 ` [PATCH 21/31] (ext)index: $sync->{unit} => $self->{unit} Eric Wong
2025-01-10 23:20 ` [PATCH 22/31] (ext)index: eliminate redundant $sync->{epoch_max} Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 24/31] extindex: simplify data structures used for dedupe Eric Wong
` (5 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
This allows us to eliminate the $per_mid->{sync} field used for
dedupe, as well.
---
lib/PublicInbox/ExtSearchIdx.pm | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 73802813..f6d0a7cc 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -1024,7 +1024,7 @@ sub dd_smsg { # git->cat_async callback
while (my ($chash, $ary) = each %{$per_mid->{dd_chash}}) {
my $keep = shift @$ary;
next if !scalar(@$ary);
- $per_mid->{sync}->{dedupe_cull} += scalar(@$ary);
+ $per_mid->{self}->{dedupe_cull} += scalar(@$ary);
print STDERR
"# <$keep->{mid}> keeping #$keep->{num}, dropping ",
join(', ', map { "#$_->{num}" } @$ary),"\n";
@@ -1040,7 +1040,7 @@ sub dd_smsg { # git->cat_async callback
sub eidx_dedupe ($$$) {
my ($self, $sync, $msgids) = @_;
- $sync->{dedupe_cull} = 0;
+ local $self->{dedupe_cull} = 0;
my $candidates = 0;
my $nr_mid = 0;
return unless eidxq_lock_acquire($self);
@@ -1079,7 +1079,6 @@ EOS
my $per_mid = {
dd_chash => {}, # chash => [ary of smsgs]
last_smsg => $smsg[-1],
- sync => $sync
};
$nr_mid++;
$candidates += scalar(@smsg) - 1;
@@ -1104,7 +1103,7 @@ EOS
}
goto dedupe_restart if defined($msgids->[++$idx]);
- my $n = delete $sync->{dedupe_cull};
+ my $n = delete $self->{dedupe_cull};
if (my $pr = $self->{-opt}->{-progress}) {
$pr->("culled $n/$candidates candidates ($nr_mid msgids)\n");
}
* [PATCH 24/31] extindex: simplify data structures used for dedupe
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
` (2 preceding siblings ...)
2025-01-10 23:20 ` [PATCH 23/31] extindex: move {dedupe_cull} to self Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 25/31] (ext)index: move {todo} into $self Eric Wong
` (4 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
We can stuff more into $self via `local' and eliminate the $per_mid
structure and extra lookups.
---
lib/PublicInbox/ExtSearchIdx.pm | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index f6d0a7cc..21d91475 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -1012,19 +1012,18 @@ sub dd_smsg { # git->cat_async callback
my ($bref, $oid, $type, $size, $dd) = @_;
my $smsg = $dd->{smsg} // die 'BUG: dd->{smsg} missing';
my $self = $dd->{self} // die 'BUG: {self} missing';
- my $per_mid = $dd->{per_mid} // die 'BUG: {per_mid} missing';
if ($type eq 'missing') {
_blob_missing($dd, $smsg);
} elsif (!is_bad_blob($oid, $type, $size, $smsg->{blob})) {
local $self->{current_info} = "$self->{current_info} $oid";
my $chash = content_hash(PublicInbox::Eml->new($bref));
- push(@{$per_mid->{dd_chash}->{$chash}}, $smsg);
+ push @{$self->{dd_chash}->{$chash}}, $smsg;
}
- return if $per_mid->{last_smsg} != $smsg;
- while (my ($chash, $ary) = each %{$per_mid->{dd_chash}}) {
+ return if $self->{last_smsg} != $smsg;
+ while (my ($chash, $ary) = each %{$self->{dd_chash}}) {
my $keep = shift @$ary;
next if !scalar(@$ary);
- $per_mid->{self}->{dedupe_cull} += scalar(@$ary);
+ $self->{dedupe_cull} += scalar(@$ary);
print STDERR
"# <$keep->{mid}> keeping #$keep->{num}, dropping ",
join(', ', map { "#$_->{num}" } @$ary),"\n";
@@ -1076,18 +1075,13 @@ EOS
push @smsg, $x;
}
next if scalar(@smsg) < 2;
- my $per_mid = {
- dd_chash => {}, # chash => [ary of smsgs]
- last_smsg => $smsg[-1],
- };
+ # per-msgid data:
+ local $self->{dd_chash} = {}; # chash => [ary of smsgs]
+ local $self->{last_smsg} = $smsg[-1];
$nr_mid++;
$candidates += scalar(@smsg) - 1;
for my $smsg (@smsg) {
- my $dd = {
- per_mid => $per_mid,
- smsg => $smsg,
- self => $self,
- };
+ my $dd = { smsg => $smsg, self => $self };
$self->git->cat_async($smsg->{blob}, \&dd_smsg, $dd);
}
# need to wait on every single one @smsg contents can get
* [PATCH 25/31] (ext)index: move {todo} into $self
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
` (3 preceding siblings ...)
2025-01-10 23:20 ` [PATCH 24/31] extindex: simplify data structures used for dedupe Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 26/31] v2writable: {in_unindex} moved to self Eric Wong
` (3 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
Another trivial field to `local'-ize and another step towards
eliminating the $sync structure.
---
lib/PublicInbox/ExtSearchIdx.pm | 5 +++--
lib/PublicInbox/V2Writable.pm | 10 ++++------
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 21d91475..6c578b3b 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -402,6 +402,7 @@ sub _sync_inbox ($$$) {
$self->{nrec} = 0;
local $self->{epoch_max};
my $v = $ibx->version;
+ local $self->{todo}; # set by sync_prepare
if ($v == 2) {
$self->{epoch_max} = $ibx->max_git_epoch // return;
sync_prepare($self, $sync); # or return # TODO: once MiscIdx is stable
@@ -412,11 +413,11 @@ sub _sync_inbox ($$$) {
return "E: $ibx->{inboxdir} is not indexed";
my $stk = prepare_stack $self, $lc ? "$lc..$head" : $head;
my $unit = { stack => $stk, git => $ibx->git };
- push @{$sync->{todo}}, $unit;
+ push @{$self->{todo}}, $unit;
} else {
return "E: $ekey unsupported inbox version (v$v)";
}
- for my $unit (@{delete($sync->{todo}) // []}) {
+ for my $unit (@{delete($self->{todo}) // []}) {
last if $self->{quit};
index_todo($self, $sync, $unit);
}
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index e8d5e005..0bd49627 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -965,7 +965,7 @@ sub sync_prepare ($$) {
my $nr = $stk ? $stk->num_records : 0;
$pr->("$nr\n") if $pr;
$unit->{stack} = $stk; # may be undef
- unshift @{$sync->{todo}}, $unit;
+ unshift @{$self->{todo}}, $unit;
$regen_max += $nr;
}
return 0 if $self->{quit};
@@ -1228,10 +1228,8 @@ sub index_sync {
$self->{oidx}->rethread_prepare($opt);
local $self->{reindex} = $opt->{reindex};
local $self->{mm_tmp};
- my $sync = {
- self => $self,
- ibx => $self->{ibx},
- };
+ local $self->{todo}; # sync_prepare
+ my $sync = { self => $self, ibx => $self->{ibx} };
my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
@@ -1254,7 +1252,7 @@ sub index_sync {
}
}
# work forwards through history
- index_todo($self, $sync, $_) for @{delete($sync->{todo}) // []};
+ index_todo($self, $sync, $_) for @{delete($self->{todo}) // []};
$self->{oidx}->rethread_done($opt) unless $self->{quit};
$self->done;
* [PATCH 26/31] v2writable: {in_unindex} moved to self
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
` (4 preceding siblings ...)
2025-01-10 23:20 ` [PATCH 25/31] (ext)index: move {todo} into $self Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 27/31] v2writable: hoist out process_todo sub for extindex Eric Wong
` (2 subsequent siblings)
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
{in_unindex} is already local-ized for $sync, so moving it to
$self is trivial since we use ->async_wait_all to ensure
cat-file requests are done.
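
To show why the wait matters, here is a toy model with a
hypothetical in-memory queue standing in for the cat-file pipe.
None of these subs are the real PublicInbox::Git API; they only
imitate the ordering of "queue a request" and "drain callbacks".

```perl
use strict;
use warnings;

my @queue; # hypothetical stand-in for pending cat-file requests
sub cat_async { my ($cb, $arg) = @_; push @queue, [ $cb, $arg ] }
sub async_wait_all {
	while (my $job = shift @queue) {
		my ($cb, $arg) = @$job;
		$cb->($arg);
	}
}

my @seen;
sub unindex_cb { # callback reads the flag from $self, not from $arg
	my ($arg) = @_;
	push @seen, $arg->{self}->{in_unindex} ? 'in_unindex' : 'normal';
}

sub unindex_drained {
	my ($self) = @_;
	local $self->{in_unindex} = 1;
	cat_async(\&unindex_cb, { self => $self });
	async_wait_all(); # callbacks fire while the flag is still set
}

sub unindex_leaky { # what the wait protects against
	my ($self) = @_;
	local $self->{in_unindex} = 1;
	cat_async(\&unindex_cb, { self => $self });
} # the flag is already gone when the queue is drained later

my $self = {};
unindex_drained($self);
unindex_leaky($self);
async_wait_all();
```

The second callback runs after its `local' scope has ended and so
sees no flag, which is the failure mode that draining the queue
inside the scope avoids.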
---
lib/PublicInbox/V2Writable.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 0bd49627..072542cb 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1014,7 +1014,7 @@ sub unindex_oid ($$;$) { # git->cat_async callback
return index_finalize($arg, 0);
my $self = $arg->{self};
local $self->{current_info} = "$self->{current_info} $oid";
- my $unindexed = $arg->{in_unindex} ? $arg->{unindexed} : undef;
+ my $unindexed = $self->{in_unindex} ? $arg->{unindexed} : undef;
my $mm = $self->{mm};
my $mids = mids(PublicInbox::Eml->new($bref));
undef $$bref;
@@ -1057,7 +1057,7 @@ sub unindex_todo ($$$) {
# order does not matter, here:
my $fh = $unit->{git}->popen(qw(log --raw -r --no-notes --no-color
--no-abbrev --no-renames), $unindex_range);
- local $sync->{in_unindex} = 1;
+ local $self->{in_unindex} = 1;
my $unindex_oid = $self->can('unindex_oid');
while (<$fh>) {
/\A:\d{6} 100644 $OID ($OID) [AM]\tm$/o or next;
* [PATCH 27/31] v2writable: hoist out process_todo sub for extindex
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
` (5 preceding siblings ...)
2025-01-10 23:20 ` [PATCH 26/31] v2writable: {in_unindex} moved to self Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 28/31] (ext)index: move {unindexed} to $self Eric Wong
2025-01-10 23:20 ` [PATCH 29/31] (ext)index: move {ranges} " Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
We can have consistent $self->{quit} detection this way and
eliminate an early return from index_todo.
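
The shape of the hoisted loop can be sketched with two
hypothetical packages, `BaseIdx' and `ExtIdx', standing in for
V2Writable and ExtSearchIdx. The real subs take an extra $sync
argument and do actual indexing; this only shows the {quit} check
and the method dispatch.

```perl
use strict;
use warnings;

package BaseIdx { # hypothetical stand-in for V2Writable
	sub new {
		my ($class) = @_;
		return bless { todo => [], done => [] }, $class;
	}
	sub index_todo {
		my ($self, $unit) = @_;
		push @{$self->{done}}, "base:$unit";
	}
	sub process_todo {
		my ($self) = @_;
		# taking the list out of $self first means nothing is
		# left queued, even when we stop early
		for my $unit (@{delete($self->{todo}) // []}) {
			last if $self->{quit};
			$self->index_todo($unit); # subclass may override
		}
	}
}

package ExtIdx { # hypothetical stand-in for ExtSearchIdx
	our @ISA = ('BaseIdx');
	sub index_todo {
		my ($self, $unit) = @_;
		push @{$self->{done}}, "ext:$unit";
		$self->{quit} = 1 if $unit eq 'b'; # simulate a signal
	}
}

my $idx = ExtIdx->new;
$idx->{todo} = [ qw(a b c) ];
$idx->process_todo;
```

Keeping the {quit} test in the one shared loop means both callers
stop at the same point, and the per-unit sub does not need its own
early return.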
---
lib/PublicInbox/ExtSearchIdx.pm | 5 +----
lib/PublicInbox/V2Writable.pm | 11 +++++++++--
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 6c578b3b..807666fc 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -417,10 +417,7 @@ sub _sync_inbox ($$$) {
} else {
return "E: $ekey unsupported inbox version (v$v)";
}
- for my $unit (@{delete($self->{todo}) // []}) {
- last if $self->{quit};
- index_todo($self, $sync, $unit);
- }
+ PublicInbox::V2Writable::process_todo $self, $sync;
$self->{midx}->index_ibx($ibx) unless $self->{quit};
$ibx->git->cleanup; # done with this inbox, now
undef;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 072542cb..548d8bde 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1122,7 +1122,6 @@ sub index_xap_step ($$$$;$) {
sub index_todo ($$$) {
my ($self, $sync, $unit) = @_;
- return if $self->{quit};
unindex_todo($self, $sync, $unit);
my $stk = delete($unit->{stack}) or return;
my $all = $self->git; # initialize self->{ibx}->{git}
@@ -1195,6 +1194,14 @@ sub xapian_only ($;$$) {
$self->done;
}
+sub process_todo ($$) {
+ my ($self, $sync) = @_;
+ for my $unit (@{delete($self->{todo}) // []}) {
+ last if $self->{quit};
+ $self->index_todo($sync, $unit); # may be ExtSearchIdx
+ }
+}
+
# public, called by public-inbox-index
sub index_sync {
my ($self, $opt) = @_;
@@ -1252,7 +1259,7 @@ sub index_sync {
}
}
# work forwards through history
- index_todo($self, $sync, $_) for @{delete($self->{todo}) // []};
+ process_todo $self, $sync;
$self->{oidx}->rethread_done($opt) unless $self->{quit};
$self->done;
* [PATCH 28/31] (ext)index: move {unindexed} to $self
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
` (6 preceding siblings ...)
2025-01-10 23:20 ` [PATCH 27/31] v2writable: hoist out process_todo sub for extindex Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:20 ` [PATCH 29/31] (ext)index: move {ranges} " Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
As with most things that were formerly in $sync, we can just
rely on `local' to flatten the data structure graph and work
towards eliminating $sync.
---
lib/PublicInbox/ExtSearchIdx.pm | 1 +
lib/PublicInbox/V2Writable.pm | 13 ++++++-------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 807666fc..7ccfb1db 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -403,6 +403,7 @@ sub _sync_inbox ($$$) {
local $self->{epoch_max};
my $v = $ibx->version;
local $self->{todo}; # set by sync_prepare
+ local $self->{unindexed};
if ($v == 2) {
$self->{epoch_max} = $ibx->max_git_epoch // return;
sync_prepare($self, $sync); # or return # TODO: once MiscIdx is stable
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 548d8bde..6b43bac1 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -746,16 +746,14 @@ sub index_oid { # cat_async callback
}
# {unindexed} is unlikely
- if (my $unindexed = $arg->{unindexed}) {
+ if (my $unindexed = $self->{unindexed}) {
my $oidbin = pack('H*', $oid);
my $u = $unindexed->{$oidbin};
($num, $mid0) = splice(@$u, 0, 2) if $u;
if (defined $num) {
$self->{mm}->mid_set($num, $mid0);
- if (scalar(@$u) == 0) { # done with current OID
- delete $unindexed->{$oidbin};
- delete($arg->{unindexed}) if !keys(%$unindexed);
- }
+ # done w/ current OID?
+ delete $unindexed->{$oidbin} if !@$u;
}
}
my $oidx = $self->{oidx};
@@ -1014,7 +1012,7 @@ sub unindex_oid ($$;$) { # git->cat_async callback
return index_finalize($arg, 0);
my $self = $arg->{self};
local $self->{current_info} = "$self->{current_info} $oid";
- my $unindexed = $self->{in_unindex} ? $arg->{unindexed} : undef;
+ my $unindexed = $self->{in_unindex} ? $self->{unindexed} : undef;
my $mm = $self->{mm};
my $mids = mids(PublicInbox::Eml->new($bref));
undef $$bref;
@@ -1052,7 +1050,7 @@ sub git { $_[0]->{ibx}->git }
sub unindex_todo ($$$) {
my ($self, $sync, $unit) = @_;
my $unindex_range = delete($unit->{unindex_range}) // return;
- my $unindexed = $sync->{unindexed} //= {}; # $oidbin => [$num, $mid0]
+ my $unindexed = $self->{unindexed} //= {}; # $oidbin => [$num, $mid0]
my $before = scalar keys %$unindexed;
# order does not matter, here:
my $fh = $unit->{git}->popen(qw(log --raw -r --no-notes --no-color
@@ -1236,6 +1234,7 @@ sub index_sync {
local $self->{reindex} = $opt->{reindex};
local $self->{mm_tmp};
local $self->{todo}; # sync_prepare
+ local $self->{unindexed};
my $sync = { self => $self, ibx => $self->{ibx} };
my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
* [PATCH 29/31] (ext)index: move {ranges} to $self
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
` (7 preceding siblings ...)
2025-01-10 23:20 ` [PATCH 28/31] (ext)index: move {unindexed} to $self Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
8 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
Another field we can trivially `local'-ize in our quest to
eliminate the clumsy $sync structure.
---
lib/PublicInbox/ExtSearchIdx.pm | 1 +
lib/PublicInbox/V2Writable.pm | 12 +++++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 7ccfb1db..c0eae0e3 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -403,6 +403,7 @@ sub _sync_inbox ($$$) {
local $self->{epoch_max};
my $v = $ibx->version;
local $self->{todo}; # set by sync_prepare
+ local $self->{ranges};
local $self->{unindexed};
if ($v == 2) {
$self->{epoch_max} = $ibx->max_git_epoch // return;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 6b43bac1..ded3f2c1 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -845,17 +845,18 @@ sub last_commits {
# returns a revision range for git-log(1)
sub log_range ($$$) {
my ($sync, $unit, $tip) = @_;
- my $opt = $sync->{self}->{-opt};
+ my $self = $sync->{self};
+ my $opt = $self->{-opt};
my $pr = $opt->{-progress} if (($opt->{verbose} || 0) > 1);
my $i = $unit->{epoch};
- my $cur = $sync->{ranges}->[$i] or do {
+ my $cur = $self->{ranges}->[$i] or do {
$pr->("$i.git indexing all of $tip\n") if $pr;
return $tip; # all of it
};
# fast equality check to avoid (v)fork+execve overhead
if ($cur eq $tip) {
- $sync->{ranges}->[$i] = undef;
+ $self->{ranges}->[$i] = undef;
return;
}
@@ -867,7 +868,7 @@ sub log_range ($$$) {
my $n = $git->qx(qw(rev-list --count), $range);
chomp($n);
if ($n == 0) {
- $sync->{ranges}->[$i] = undef;
+ $self->{ranges}->[$i] = undef;
$pr->("$i.git has nothing new\n") if $pr;
return; # nothing to do
}
@@ -907,7 +908,7 @@ sub artnum_max { $_[0]->{mm}->num_highwater }
sub sync_prepare ($$) {
my ($self, $sync) = @_;
- $sync->{ranges} = sync_ranges($self, $sync);
+ $self->{ranges} = sync_ranges($self, $sync);
my $pr = $self->{-opt}->{-progress};
my $regen_max = 0;
my $head = $self->{ibx}->{ref_head} || 'HEAD';
@@ -1234,6 +1235,7 @@ sub index_sync {
local $self->{reindex} = $opt->{reindex};
local $self->{mm_tmp};
local $self->{todo}; # sync_prepare
+ local $self->{ranges};
local $self->{unindexed};
my $sync = { self => $self, ibx => $self->{ibx} };
my $quit = PublicInbox::SearchIdx::quit_cb $self;
* [PATCH 30/31] (ext)index: eliminate $sync->{ibx}
2025-01-10 23:17 [PATCH 00/31] (ext)index: eliminate $sync structure Eric Wong
` (2 preceding siblings ...)
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
@ 2025-01-10 23:20 ` Eric Wong
2025-01-10 23:21 ` [PATCH 31/31] (ext)index: eliminate $sync entirely Eric Wong
3 siblings, 1 reply; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:20 UTC (permalink / raw)
To: meta
Perhaps the trickiest field to eliminate from $sync is {ibx},
due to the way -extindex handles cross-posted messages. We
now eliminate the per-request $req->{ibx} for non-crossposted
messages and use the local-ized $self->{ibx} directly.
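
A rough sketch of the fallback described above, assuming requests are
plain hashrefs carrying {self}. The helper name ibx_for_req and the
string values standing in for inbox objects are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper showing the `$req->{ibx} // $self->{ibx}' lookup;
# real requests hold PublicInbox::Inbox objects, strings are used here.
sub ibx_for_req {
	my ($req) = @_;
	# a per-request {ibx} wins, otherwise use the local-ized $self->{ibx}
	my $ibx = ($req->{ibx} // $req->{self}->{ibx}) or
		die 'BUG: {ibx} unset';
	return $ibx;
}

my $self = { ibx => 'current-inbox' };
my $plain = { self => $self }; # common case: no per-request {ibx}
my $xpost = { self => $self, ibx => 'other-inbox' }; # carries its own
print ibx_for_req($plain), "\n";
print ibx_for_req($xpost), "\n";
```

Only requests which refer to an inbox other than the one currently
being synced need to carry their own {ibx} in this scheme.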
---
lib/PublicInbox/ExtSearchIdx.pm | 9 ++++-----
lib/PublicInbox/V2Writable.pm | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index c0eae0e3..946f5a4b 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -223,7 +223,7 @@ sub do_xpost ($$) {
my $self = $req->{self};
my $docid = $smsg->{num};
my $oid = $req->{oid};
- my $xibx = $req->{ibx};
+ my $xibx = $req->{ibx} // $self->{ibx};
my $eml = $req->{eml};
if (my $new_smsg = $req->{new_smsg}) { # 'm' on cross-posted message
my $eidx_key = $xibx->eidx_key;
@@ -253,7 +253,7 @@ sub index_unseen ($) {
my $idx = $self->idx_shard($docid);
$self->{oidx}->add_overview($eml, $new_smsg);
my $oid = $new_smsg->{blob};
- my $ibx = delete $req->{ibx} or die 'BUG: {ibx} unset';
+ my $ibx = ($req->{ibx} // $self->{ibx}) or die 'BUG: {ibx} unset';
my $ekey = $new_smsg->{eidx_key} = $ibx->eidx_key;
$self->{oidx}->add_xref3($docid, $req->{xnum}, $oid, $ekey);
$idx->index_eml($eml, $new_smsg);
@@ -326,8 +326,8 @@ sub ck_existing { # git->cat_async callback
# return the number if so
sub cur_ibx_xnum ($$;$) {
my ($req, $bref, $mismatch) = @_;
- my $ibx = $req->{ibx} or die 'BUG: current {ibx} missing';
-
+ my $ibx = ($req->{ibx} // $req->{self}->{ibx}) or
+ die 'BUG: current {ibx} missing';
$req->{eml} = PublicInbox::Eml->new($bref);
$req->{chash} = content_hash($req->{eml});
$req->{mids} = mids($req->{eml});
@@ -397,7 +397,6 @@ sub _sync_inbox ($$$) {
if (defined(my $err = _ibx_index_reject($ibx))) {
return "W: skipping $ekey ($err)";
}
- $sync->{ibx} = $ibx; # FIXME: eliminate
local $self->{ibx} = $ibx;
$self->{nrec} = 0;
local $self->{epoch_max};
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index ded3f2c1..80ee346d 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1237,7 +1237,7 @@ sub index_sync {
local $self->{todo}; # sync_prepare
local $self->{ranges};
local $self->{unindexed};
- my $sync = { self => $self, ibx => $self->{ibx} };
+ my $sync = { self => $self };
my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
* [PATCH 31/31] (ext)index: eliminate $sync entirely
2025-01-10 23:20 ` [PATCH 30/31] (ext)index: eliminate $sync->{ibx} Eric Wong
@ 2025-01-10 23:21 ` Eric Wong
0 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2025-01-10 23:21 UTC (permalink / raw)
To: meta
Finally, the elimination of this variable simplifies subroutine
calls and hopefully makes object lifetimes easier to reason about.
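
A small sketch of the resulting call shape, assuming a blessed $self.
The Sketch::Idx package and its {done} field are hypothetical and only
stand in for V2Writable/ExtSearchIdx:

```perl
use strict;
use warnings;

package Sketch::Idx; # hypothetical class, stands in for V2Writable

sub new { bless { done => [] }, shift }

sub index_todo { # called as a method so a subclass may override it
	my ($self, $unit) = @_;
	push @{$self->{done}}, $unit->{epoch};
}

sub process_todo ($) { # one-arg prototype permits the paren-less call below
	my ($self) = @_;
	for my $unit (@{delete($self->{todo}) // []}) {
		last if $self->{quit};
		$self->index_todo($unit);
	}
}

package main;

my $self = Sketch::Idx->new;
local $self->{todo} = [ { epoch => 0 }, { epoch => 1 } ];
Sketch::Idx::process_todo $self; # no $sync to thread through anymore
print "indexed epochs: @{$self->{done}}\n";
```

All state is reachable from $self, so every callee takes one fewer
argument and async requests only need `self => $self'.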
---
lib/PublicInbox/ExtSearchIdx.pm | 143 +++++++++++++++-----------------
lib/PublicInbox/V2Writable.pm | 86 +++++++++----------
2 files changed, 107 insertions(+), 122 deletions(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 946f5a4b..d1a16c84 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -373,7 +373,7 @@ sub unindex_oid { # git->cat_async callback for 'd'
# overrides V2Writable::last_commits, called by sync_ranges via sync_prepare
sub last_commits {
- my ($self, $sync) = @_;
+ my ($self) = @_;
my $heads = [];
my $ekey = $self->{ibx}->eidx_key;
my $uv = $self->{ibx}->uidvalidity;
@@ -391,8 +391,8 @@ sub _ibx_index_reject ($) {
undef;
}
-sub _sync_inbox ($$$) {
- my ($self, $sync, $ibx) = @_;
+sub _sync_inbox ($$) {
+ my ($self, $ibx) = @_;
my $ekey = $ibx->eidx_key;
if (defined(my $err = _ibx_index_reject($ibx))) {
return "W: skipping $ekey ($err)";
@@ -406,7 +406,7 @@ sub _sync_inbox ($$$) {
local $self->{unindexed};
if ($v == 2) {
$self->{epoch_max} = $ibx->max_git_epoch // return;
- sync_prepare($self, $sync); # or return # TODO: once MiscIdx is stable
+ sync_prepare($self); # or return # TODO: once MiscIdx is stable
} elsif ($v == 1) {
my $uv = $ibx->uidvalidity;
my $lc = $self->{oidx}->eidx_meta("lc-v1:$ekey//$uv");
@@ -418,14 +418,14 @@ sub _sync_inbox ($$$) {
} else {
return "E: $ekey unsupported inbox version (v$v)";
}
- PublicInbox::V2Writable::process_todo $self, $sync;
+ PublicInbox::V2Writable::process_todo $self;
$self->{midx}->index_ibx($ibx) unless $self->{quit};
$ibx->git->cleanup; # done with this inbox, now
undef;
}
-sub eidx_gc_scan_inboxes ($$) {
- my ($self, $sync) = @_;
+sub eidx_gc_scan_inboxes ($) {
+ my ($self) = @_;
my ($x3_doc, $ibx_ck);
restart:
$x3_doc = $self->{oidx}->dbh->prepare(<<EOM);
@@ -448,7 +448,7 @@ EOM
warn "# $r #$docid $eidx_key $oid\n";
if (update_checkpoint $self) {
$x3_doc = $ibx_ck = undef;
- reindex_checkpoint($self, $sync);
+ reindex_checkpoint($self);
goto restart;
}
}
@@ -473,26 +473,26 @@ DELETE FROM eidx_meta WHERE key = ?
}
}
-sub eidx_gc_scan_shards ($$) { # TODO: use for lei/store
- my ($self, $sync) = @_;
+sub eidx_gc_scan_shards ($) { # TODO: use for lei/store
+ my ($self) = @_;
my $nr = $self->{oidx}->dbh->do(<<'');
DELETE FROM xref3 WHERE docid NOT IN (SELECT num FROM over)
warn "# eliminated $nr stale xref3 entries\n" if $nr != 0;
- reindex_checkpoint($self, $sync) if update_checkpoint $self;
+ reindex_checkpoint($self) if update_checkpoint $self;
# fixup from old bugs:
$nr = $self->{oidx}->dbh->do(<<'');
DELETE FROM over WHERE num > 0 AND num NOT IN (SELECT docid FROM xref3)
warn "# eliminated $nr stale over entries\n" if $nr != 0;
- reindex_checkpoint($self, $sync) if update_checkpoint $self;
+ reindex_checkpoint($self) if update_checkpoint $self;
$nr = $self->{oidx}->dbh->do(<<'');
DELETE FROM eidxq WHERE docid NOT IN (SELECT num FROM over)
warn "# eliminated $nr stale reindex queue entries\n" if $nr != 0;
- reindex_checkpoint($self, $sync) if update_checkpoint $self;
+ reindex_checkpoint($self) if update_checkpoint $self;
my ($cur) = $self->{oidx}->dbh->selectrow_array(<<EOM);
SELECT MIN(num) FROM over WHERE num > 0
@@ -518,7 +518,7 @@ SELECT num FROM over WHERE num >= ? ORDER BY num ASC LIMIT 10000
$nr += $idx->ipc_do('nr_quiet_rm')
}
%active_shards = ();
- reindex_checkpoint($self, $sync);
+ reindex_checkpoint($self);
}
}
warn "# eliminated $nr stale Xapian documents\n" if $nr != 0;
@@ -533,15 +533,14 @@ sub eidx_gc { # top-level entry point
local $self->{nrec};
local $self->{-opt} = $opt;
local $self->{latest_cmt};
- my $sync = { self => $self };
$self->idx_init($opt); # acquire lock via V2Writable::_idx_init
- eidx_gc_scan_inboxes($self, $sync);
- eidx_gc_scan_shards($self, $sync);
+ eidx_gc_scan_inboxes $self;
+ eidx_gc_scan_shards $self;
done($self);
}
-sub _ibx_for ($$$) {
- my ($self, $sync, $smsg) = @_;
+sub _ibx_for ($$) {
+ my ($self, $smsg) = @_;
my $ibx_id = delete($smsg->{ibx_id}) // die 'BUG: {ibx_id} unset';
my $pos = $self->{id2pos}->{$ibx_id} //
bad_ibx_id($self, $ibx_id, \&croak);
@@ -572,8 +571,7 @@ EOF
sub _reindex_finalize ($$$) {
my ($req, $smsg, $eml) = @_;
- my $sync = $req->{sync};
- my $self = $sync->{self};
+ my $self = $req->{self};
my $by_chash = delete $req->{by_chash} or die 'BUG: no {by_chash}';
my $nr = scalar(keys(%$by_chash)) or die 'BUG: no content hashes';
my $orig_smsg = $req->{orig_smsg} // die 'BUG: no {orig_smsg}';
@@ -586,11 +584,11 @@ sub _reindex_finalize ($$$) {
my $idx = $self->idx_shard($docid);
my $top_smsg = pop @$stable;
$top_smsg == $smsg or die 'BUG: top_smsg != smsg';
- my $ibx = _ibx_for($self, $sync, $smsg);
+ my $ibx = _ibx_for $self, $smsg;
$smsg->{eidx_key} = $ibx->eidx_key;
$idx->index_eml($eml, $smsg);
for my $x (reverse @$stable) {
- $ibx = _ibx_for($self, $sync, $x);
+ $ibx = _ibx_for $self, $x;
my $hdr = delete $x->{hdr} // die 'BUG: no {hdr}';
$idx->ipc_do('add_eidx_info', $docid, $ibx->eidx_key, $hdr);
}
@@ -608,7 +606,7 @@ sub _reindex_finalize ($$$) {
}
my $x = pop(@$ary) // die "BUG: #$docid {by_chash} empty";
$x->{num} = delete($x->{xnum}) // die '{xnum} unset';
- $ibx = _ibx_for($self, $sync, $x);
+ $ibx = _ibx_for $self, $x;
if (my $over = $ibx->over) {
my $e = $over->get_art($x->{num});
$e->{blob} eq $x->{blob} or die <<EOF;
@@ -622,14 +620,13 @@ EOF
}
undef $by_chash;
while (my ($ibx, $e) = splice(@todo, 0, 2)) {
- reindex_unseen($self, $sync, $ibx, $e);
+ reindex_unseen($self, $ibx, $e);
}
}
sub _reindex_oid { # git->cat_async callback
my ($bref, $oid, $type, $size, $req) = @_;
- my $sync = $req->{sync};
- my $self = $sync->{self};
+ my $self = $req->{self};
my $orig_smsg = $req->{orig_smsg} // die 'BUG: no {orig_smsg}';
my $expect_oid = $req->{xr3r}->[$req->{ix}]->[2];
my $docid = $orig_smsg->{num};
@@ -666,8 +663,8 @@ sub _reindex_oid { # git->cat_async callback
}
}
-sub _reindex_smsg ($$$) {
- my ($self, $sync, $smsg) = @_;
+sub _reindex_smsg ($$) {
+ my ($self, $smsg) = @_;
my $docid = $smsg->{num};
my $xr3 = $self->{oidx}->get_xref3($docid, 1);
if (scalar(@$xr3) == 0) { # _reindex_check_stale should've covered this
@@ -689,7 +686,7 @@ BUG? #$docid $smsg->{blob} is not referenced by inboxes during reindex
$b->[1] <=> $a->[1] # break ties with {xnum}
} @$xr3;
@$xr3 = map { [ $_->[0], $_->[1], unpack('H*', $_->[2]) ] } @$xr3;
- my $req = { orig_smsg => $smsg, sync => $sync, xr3r => $xr3, ix => 0 };
+ my $req = { orig_smsg => $smsg, self => $self, xr3r => $xr3, ix => 0 };
$self->git->cat_async($xr3->[$req->{ix}]->[2], \&_reindex_oid, $req);
}
@@ -709,7 +706,7 @@ sub host_ident () {
};
}
-sub eidxq_release {
+sub eidxq_release ($) {
my ($self) = @_;
my $expect = delete($self->{-eidxq_locked}) or return;
my ($owner_pid, undef) = split(/-/, $expect);
@@ -730,7 +727,7 @@ sub eidxq_release {
sub DESTROY {
my ($self) = @_;
- eidxq_release($self) and $self->{oidx}->commit_lazy;
+ eidxq_release $self and $self->{oidx}->commit_lazy;
}
sub _eidxq_take ($) {
@@ -789,8 +786,8 @@ sub prep_id2pos ($) {
\%id2pos;
}
-sub eidxq_process ($$) { # for reindexing
- my ($self, $sync) = @_;
+sub eidxq_process ($) { # for reindexing
+ my ($self) = @_;
local $self->{current_info} = 'eidxq process';
return unless ($self->{cfg} && eidxq_lock_acquire($self));
my $dbh = $self->{oidx}->dbh;
@@ -812,7 +809,7 @@ restart:
while (defined(my $docid = $iter->fetchrow_array)) {
last if $self->{quit};
if (my $smsg = $self->{oidx}->get_art($docid)) {
- _reindex_smsg($self, $sync, $smsg);
+ _reindex_smsg($self, $smsg);
} else {
warn "E: #$docid does not exist in over\n";
}
@@ -821,7 +818,7 @@ restart:
if (update_checkpoint $self) {
$dbh = $del = $iter = undef;
- reindex_checkpoint($self, $sync); # release lock
+ reindex_checkpoint($self); # release lock
$dbh = $self->{oidx}->dbh;
goto restart;
}
@@ -845,10 +842,10 @@ sub _reindex_unseen { # git->cat_async callback
}
# --reindex may catch totally unseen messages, this handles them
-sub reindex_unseen ($$$$) {
- my ($self, $sync, $ibx, $xsmsg) = @_;
+sub reindex_unseen ($$$) {
+ my ($self, $ibx, $xsmsg) = @_;
my $req = {
- %$sync, # has {self}
+ self => $self,
autime => $xsmsg->{ds},
cotime => $xsmsg->{ts},
oid => $xsmsg->{blob},
@@ -881,8 +878,8 @@ EOS
1;
}
-sub _reindex_check_ibx ($$$) {
- my ($self, $sync, $ibx) = @_;
+sub _reindex_check_ibx ($$) {
+ my ($self, $ibx) = @_;
my $ibx_id = $ibx->{-ibx_id};
my $slice = 10000;
my $opt = { limit => $slice };
@@ -891,7 +888,7 @@ sub _reindex_check_ibx ($$$) {
my ($max, $max0);
do {
$max0 = $ibx->mm->num_highwater;
- sync_inbox($self, $sync, $ibx) and return; # warned
+ sync_inbox($self, $ibx) and return; # warned
$max = $ibx->mm->num_highwater;
return if $self->{quit};
} while ($max > $max0 &&
@@ -912,7 +909,7 @@ sub _reindex_check_ibx ($$$) {
$end = $beg + $slice;
$end = $max if $end > $max;
update_checkpoint $self and
- reindex_checkpoint($self, $sync); # release lock
+ reindex_checkpoint($self); # release lock
($lo, $hi) = ($msgs->[0]->{num}, $msgs->[-1]->{num});
$usr //= _unref_stale_range($self, $ibx, "xnum < $lo");
my $x3a = $self->{oidx}->dbh->selectall_arrayref(
@@ -930,7 +927,7 @@ ibx_id = ? AND xnum >= ? AND xnum <= ?
my $k = pack('JH*', $xsmsg->{num}, $xsmsg->{blob});
my $docids = delete($x3m{$k});
if (!defined($docids)) {
- reindex_unseen($self, $sync, $ibx, $xsmsg);
+ reindex_unseen($self, $ibx, $xsmsg);
} elsif (!$fast) {
for my $num (@$docids) {
$self->{oidx}->eidxq_add($num);
@@ -969,20 +966,20 @@ BUG: (non-fatal) $ekey #$xnum $smsg->{blob} still matches (old exp: $exp)
_unref_stale_range($self, $ibx, "xnum > $hi AND xnum <= $max");
}
-sub _reindex_inbox ($$$) {
- my ($self, $sync, $ibx) = @_;
+sub _reindex_inbox ($$) {
+ my ($self, $ibx) = @_;
my $ekey = $ibx->eidx_key;
local $self->{current_info} = $ekey;
if (defined(my $err = _ibx_index_reject($ibx))) {
warn "W: cannot reindex $ekey ($err)\n";
} else {
- _reindex_check_ibx($self, $sync, $ibx);
+ _reindex_check_ibx($self, $ibx);
}
delete @$ibx{qw(over mm search git)}; # won't need these for a bit
}
-sub eidx_reindex {
- my ($self, $sync) = @_;
+sub eidx_reindex ($) {
+ my ($self) = @_;
return unless $self->{cfg};
# acquire eidxq_lock early because full reindex takes forever
@@ -992,16 +989,16 @@ sub eidx_reindex {
return;
}
for my $ibx (@{ibx_sorted($self, 'active')}) {
- _reindex_inbox($self, $sync, $ibx);
+ _reindex_inbox($self, $ibx);
last if $self->{quit};
}
$self->git->async_wait_all; # ensure eidxq gets filled completely
- eidxq_process($self, $sync) unless $self->{quit};
+ eidxq_process $self unless $self->{quit};
}
-sub sync_inbox {
- my ($self, $sync, $ibx) = @_;
- my $err = _sync_inbox($self, $sync, $ibx);
+sub sync_inbox ($$) {
+ my ($self, $ibx) = @_;
+ my $err = _sync_inbox $self, $ibx;
delete @$ibx{qw(mm over)};
warn $err, "\n" if defined($err);
$err;
@@ -1036,8 +1033,8 @@ sub dd_smsg { # git->cat_async callback
}
}
-sub eidx_dedupe ($$$) {
- my ($self, $sync, $msgids) = @_;
+sub eidx_dedupe ($$) {
+ my ($self, $msgids) = @_;
local $self->{dedupe_cull} = 0;
my $candidates = 0;
my $nr_mid = 0;
@@ -1090,7 +1087,7 @@ EOS
if (update_checkpoint $self) {
undef $iter;
- reindex_checkpoint($self, $sync);
+ reindex_checkpoint($self);
goto dedupe_restart;
}
}
@@ -1103,7 +1100,7 @@ EOS
$self->{nrec} = 0;
}
-sub eidx_sync { # main entry point
+sub eidx_sync ($$) { # main entry point
my ($self, $opt) = @_;
local $self->{current_info} = '';
@@ -1115,12 +1112,6 @@ sub eidx_sync { # main entry point
local $self->{-opt} = $opt;
local $self->{-regen_fmt} = "%u/?\n";
local $self->{latest_cmt};
- my $sync = {
- # DO NOT SET {reindex} here, it's incompatible with reused
- # V2Writable code, reindex is totally different here
- # compared to v1/v2 inboxes because we have multiple histories
- self => $self,
- };
local $SIG{USR1} = sub { $self->{need_checkpoint} = 1 };
my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
@@ -1136,30 +1127,29 @@ sub eidx_sync { # main entry point
if (my $msgids = delete($opt->{dedupe})) {
local $self->{checkpoint_unlocks} = 1;
- eidx_dedupe($self, $sync, $msgids);
+ eidx_dedupe $self, $msgids;
}
if (delete($opt->{reindex})) {
local $self->{checkpoint_unlocks} = 1;
- eidx_reindex($self, $sync);
+ eidx_reindex $self;
}
# don't use $_ here, it'll get clobbered by reindex_checkpoint
if ($opt->{scan} // 1) {
for my $ibx (@{ibx_sorted($self, 'active')}) {
last if $self->{quit};
- sync_inbox($self, $sync, $ibx);
+ sync_inbox $self, $ibx;
}
}
$self->{oidx}->rethread_done($opt) unless $self->{quit};
- eidxq_process($self, $sync) unless $self->{quit};
+ eidxq_process $self unless $self->{quit};
- eidxq_release($self);
+ eidxq_release $self;
done($self);
- $sync; # for eidx_watch
}
sub update_last_commit { # overrides V2Writable
- my ($self, $sync, $stk) = @_;
+ my ($self, $stk) = @_;
my $unit = $self->{unit} // return;
my $latest_cmt = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
defined($latest_cmt) or return;
@@ -1291,10 +1281,10 @@ sub idx_init { # similar to V2Writable
sub _watch_commit { # PublicInbox::DS::add_timer callback
my ($self) = @_;
delete $self->{-commit_timer};
- eidxq_process($self, $self->{-watch_sync});
- eidxq_release($self);
+ eidxq_process $self;
+ eidxq_release $self;
my $fmt = delete $self->{-regen_fmt};
- reindex_checkpoint($self, $self->{-watch_sync});
+ reindex_checkpoint($self);
$self->{-regen_fmt} = $fmt;
# call event_step => done unless commit_timer is armed
@@ -1309,7 +1299,7 @@ sub on_inbox_unlock { # called by PublicInbox::InboxIdle
local $0 = "sync $ekey";
$pr->("indexing $ekey\n") if $pr;
$self->idx_init($opt);
- sync_inbox($self, $self->{-watch_sync}, $ibx);
+ sync_inbox $self, $ibx;
$self->{-commit_timer} //= add_timer($opt->{'commit-interval'} // 10,
\&_watch_commit, $self);
}
@@ -1378,7 +1368,7 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
my $pr = $opt->{-progress};
$pr->("performing initial scan ...\n") if $pr;
local $self->{-opt} = $opt;
- my $sync = eidx_sync($self, $opt); # initial sync
+ eidx_sync $self, $opt; # initial sync
return if $self->{quit};
my $oldset = PublicInbox::DS::block_signals();
local $self->{current_info} = '';
@@ -1390,7 +1380,6 @@ sub eidx_watch { # public-inbox-extindex --watch main loop
};
my $quit = PublicInbox::SearchIdx::quit_cb $self;
$sig->{QUIT} = $sig->{INT} = $sig->{TERM} = $quit;
- local $self->{-watch_sync} = $sync; # for ->on_inbox_unlock
local @PublicInbox::DS::post_loop_do = (sub { !$self->{quit} });
$pr->("initial scan complete, entering event loop\n") if $pr;
# calls InboxIdle->event_step:
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 80ee346d..3f05d5a6 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -695,11 +695,11 @@ sub atfork_child {
die "BUG: unexpected mm" if $self->{mm};
}
-sub reindex_checkpoint ($$) {
- my ($self, $sync) = @_;
+sub reindex_checkpoint ($) {
+ my ($self) = @_;
$self->git->async_wait_all;
- $self->update_last_commit($sync);
+ $self->update_last_commit;
$self->{need_checkpoint} = 0;
my $mm_tmp = $self->{mm_tmp};
$mm_tmp->atfork_prepare if $mm_tmp;
@@ -815,7 +815,7 @@ sub index_oid { # cat_async callback
# only update last_commit for $i on reindex iff newer than current
sub update_last_commit {
- my ($self, $sync, $stk) = @_;
+ my ($self, $stk) = @_;
my $unit = $self->{unit} // return;
my $latest_cmt = $stk ? $stk->{latest_cmt} : $self->{latest_cmt};
defined($latest_cmt) or return;
@@ -834,7 +834,7 @@ sub update_last_commit {
}
sub last_commits {
- my ($self, $sync) = @_;
+ my ($self) = @_;
my $heads = [];
for (my $i = $self->{epoch_max}; $i >= 0; $i--) {
$heads->[$i] = last_epoch_commit($self, $i);
@@ -844,8 +844,7 @@ sub last_commits {
# returns a revision range for git-log(1)
sub log_range ($$$) {
- my ($sync, $unit, $tip) = @_;
- my $self = $sync->{self};
+ my ($self, $unit, $tip) = @_;
my $opt = $self->{-opt};
my $pr = $opt->{-progress} if (($opt->{verbose} || 0) > 1);
my $i = $unit->{epoch};
@@ -863,7 +862,7 @@ sub log_range ($$$) {
my $range = "$cur..$tip";
$pr->("$i.git checking contiguity... ") if $pr;
my $git = $unit->{git};
- if (is_ancestor($sync->{self}->git, $cur, $tip)) { # common case
+ if (is_ancestor($self->git, $cur, $tip)) { # common case
$pr->("OK\n") if $pr;
my $n = $git->qx(qw(rev-list --count), $range);
chomp($n);
@@ -906,9 +905,9 @@ starting at $range
# overridden by ExtSearchIdx
sub artnum_max { $_[0]->{mm}->num_highwater }
-sub sync_prepare ($$) {
- my ($self, $sync) = @_;
- $self->{ranges} = sync_ranges($self, $sync);
+sub sync_prepare ($) {
+ my ($self) = @_;
+ $self->{ranges} = sync_ranges($self);
my $pr = $self->{-opt}->{-progress};
my $regen_max = 0;
my $head = $self->{ibx}->{ref_head} || 'HEAD';
@@ -933,7 +932,7 @@ sub sync_prepare ($$) {
} elsif ($self->{reindex}) { # V2 inbox
# reindex stops at the current heads and we later
# rerun index_sync without {reindex}
- $reindex_heads = $self->last_commits($sync);
+ $reindex_heads = $self->last_commits;
}
$self->{max_size} = $self->{-opt}->{max_size} and
$self->{index_oid} = $self->can('index_oid');
@@ -951,7 +950,7 @@ sub sync_prepare ($$) {
next if $?; # new repo
chomp $tip;
}
- my $range = log_range($sync, $unit, $tip) or next;
+ my $range = log_range($self, $unit, $tip) or next;
# can't use 'rev-list --count' if we use --diff-filter
$pr->("$pfx $i.git counting $range ... ") if $pr;
# Don't bump num_highwater on --reindex by using {D}.
@@ -978,7 +977,7 @@ sub sync_prepare ($$) {
for my $oid (@leftovers) {
last if $self->{quit};
$oid = unpack('H*', $oid);
- my $req = { %$sync, oid => $oid };
+ my $req = { self => $self, oid => $oid };
$self->git->cat_async($oid, $unindex_oid, $req);
}
$self->git->async_wait_all;
@@ -1048,8 +1047,8 @@ sub git { $_[0]->{ibx}->git }
# this is rare, it only happens when we get discontiguous history in
# a mirror because the source used -purge or -edit
-sub unindex_todo ($$$) {
- my ($self, $sync, $unit) = @_;
+sub unindex_todo ($$) {
+ my ($self, $unit) = @_;
my $unindex_range = delete($unit->{unindex_range}) // return;
my $unindexed = $self->{unindexed} //= {}; # $oidbin => [$num, $mid0]
my $before = scalar keys %$unindexed;
@@ -1060,7 +1059,8 @@ sub unindex_todo ($$$) {
my $unindex_oid = $self->can('unindex_oid');
while (<$fh>) {
/\A:\d{6} 100644 $OID ($OID) [AM]\tm$/o or next;
- $self->git->cat_async($1, $unindex_oid, { %$sync, oid => $1 });
+ $self->git->cat_async($1, $unindex_oid,
+ { self => $self, oid => $1 });
}
$fh->close or die "git log failed: \$?=$?";
$self->git->async_wait_all;
@@ -1074,10 +1074,10 @@ sub unindex_todo ($$$) {
--prune=all --quiet)));
}
-sub sync_ranges ($$) {
- my ($self, $sync) = @_;
+sub sync_ranges ($) {
+ my ($self) = @_;
my $reindex = $self->{reindex};
- return $self->last_commits($sync) unless $reindex;
+ return $self->last_commits unless $reindex;
return [] if ref($reindex) ne 'HASH';
my $ranges = $reindex->{from}; # arrayref;
@@ -1094,8 +1094,8 @@ sub index_xap_only { # git->cat_async callback
$idx->index_eml(PublicInbox::Eml->new($bref), $smsg);
}
-sub index_xap_step ($$$$;$) {
- my ($self, $sync, $beg, $end, $step) = @_;
+sub index_xap_step ($$$;$) {
+ my ($self, $beg, $end, $step) = @_;
return if $beg > $end; # nothing to do
$step //= $self->{shards};
@@ -1114,14 +1114,14 @@ sub index_xap_step ($$$$;$) {
my $n = $self->{transact_bytes} += $smsg->{bytes};
if ($n >= $self->{batch_bytes}) {
$self->{nrec} = $num;
- reindex_checkpoint($self, $sync);
+ reindex_checkpoint $self;
}
}
}
-sub index_todo ($$$) {
- my ($self, $sync, $unit) = @_;
- unindex_todo($self, $sync, $unit);
+sub index_todo ($$) {
+ my ($self, $unit) = @_;
+ unindex_todo($self, $unit);
my $stk = delete($unit->{stack}) or return;
my $all = $self->git; # initialize self->{ibx}->{git}
my $index_oid = $self->can('index_oid');
@@ -1140,11 +1140,11 @@ sub index_todo ($$$) {
if ($self->{quit}) {
warn "waiting to quit...\n";
$all->async_wait_all;
- $self->update_last_commit($sync);
+ $self->update_last_commit;
return;
}
my $req = {
- %$sync,
+ self => $self,
autime => $at,
cotime => $ct,
oid => $oid,
@@ -1159,33 +1159,31 @@ sub index_todo ($$$) {
} elsif ($f eq 'd') {
$all->cat_async($oid, $unindex_oid, $req);
}
- reindex_checkpoint($self, $sync) if $self->{need_checkpoint};
+ reindex_checkpoint $self if $self->{need_checkpoint};
}
$all->async_wait_all;
- $self->update_last_commit($sync, $stk);
+ $self->update_last_commit($stk);
}
-sub xapian_only ($;$$) {
- my ($self, $sync, $art_beg) = @_;
+sub xapian_only ($;$) {
+ my ($self, $art_beg) = @_;
my $seq = $self->{-opt}->{'sequential-shard'};
$art_beg //= 0;
local $self->{parallel} = 0 if $seq;
$self->idx_init($self->{-opt}); # acquire lock
if (my $art_end = $self->{ibx}->mm->max) {
$self->{-regen_fmt} //= "%u/?\n";
- $sync //= { self => $self };
if ($seq || !$self->{parallel}) {
my $shard_end = $self->{shards} - 1;
for my $i (0..$shard_end) {
last if $self->{quit};
- index_xap_step $self, $sync, $art_beg + $i,
- $art_end;
+ index_xap_step $self, $art_beg + $i, $art_end;
if ($i != $shard_end) {
- reindex_checkpoint($self, $sync);
+ reindex_checkpoint $self;
}
}
} else { # parallel (maybe)
- index_xap_step $self, $sync, $art_beg, $art_end, 1;
+ index_xap_step $self, $art_beg, $art_end, 1;
}
}
$self->git->async_wait_all;
@@ -1193,11 +1191,11 @@ sub xapian_only ($;$$) {
$self->done;
}
-sub process_todo ($$) {
- my ($self, $sync) = @_;
+sub process_todo ($) {
+ my ($self) = @_;
for my $unit (@{delete($self->{todo}) // []}) {
last if $self->{quit};
- $self->index_todo($sync, $unit); # may be ExtSearchIdx
+ $self->index_todo($unit); # may be ExtSearchIdx
}
}
@@ -1237,13 +1235,12 @@ sub index_sync {
local $self->{todo}; # sync_prepare
local $self->{ranges};
local $self->{unindexed};
- my $sync = { self => $self };
my $quit = PublicInbox::SearchIdx::quit_cb $self;
local $SIG{QUIT} = $quit;
local $SIG{INT} = $quit;
local $SIG{TERM} = $quit;
- if (sync_prepare($self, $sync)) {
+ if (sync_prepare($self)) {
# tmp_clone seems to fail if inside a transaction, so
# we rollback here (because we opened {mm} for reading)
# Note: we do NOT rely on DBI transactions for atomicity;
@@ -1260,7 +1257,7 @@ sub index_sync {
}
}
# work forwards through history
- process_todo $self, $sync;
+ process_todo $self;
$self->{oidx}->rethread_done($opt) unless $self->{quit};
$self->done;
@@ -1276,7 +1273,7 @@ sub index_sync {
$quit_warn = 1;
} else {
$self->{ibx}->{indexlevel} = $idxlevel;
- xapian_only($self, $sync, $art_beg);
+ xapian_only $self, $art_beg;
$quit_warn = 1 if $self->{quit};
}
}
@@ -1298,7 +1295,6 @@ sub index_sync {
if ($opt->{reindex} && !$self->{quit} &&
!grep(defined, @$opt{qw(since until)})) {
my %again = %$opt;
- $sync = undef;
delete @again{qw(rethread reindex -skip_lock)};
index_sync($self, \%again);
}
Thread overview: 33+ messages
2025-01-10 23:17 [PATCH 00/31] (ext)index: eliminate $sync structure Eric Wong
2025-01-10 23:18 ` Eric Wong
2025-01-10 23:18 ` [PATCH 01/31] (ext)index: ${$sync->{nr}} to $self->{nrec} Eric Wong
2025-01-10 23:18 ` [PATCH 02/31] (ext)index: move {-opt} from $sync to $self Eric Wong
2025-01-10 23:18 ` [PATCH 03/31] (ext)index: move {-regen_fmt} " Eric Wong
2025-01-10 23:18 ` [PATCH 04/31] (ext)index: move {latest_cmt} to $self (from $sync) Eric Wong
2025-01-10 23:18 ` [PATCH 05/31] smsg->populate: rename $sync to $cmt_info Eric Wong
2025-01-10 23:18 ` [PATCH 06/31] searchidx: prefix v1 code with `v1_' Eric Wong
2025-01-10 23:18 ` [PATCH 07/31] (ext)index: avoid needless {git} ref with --max-size Eric Wong
2025-01-10 23:18 ` [PATCH 08/31] (ext)index: move {quit} from $sync to $self Eric Wong
2025-01-10 23:18 ` [PATCH 09/31] extindex: {boost_in_use} field " Eric Wong
2025-01-10 23:18 ` [PATCH 10/31] extindex: move {id2pos} " Eric Wong
2025-01-10 23:18 ` [PATCH 11/31] searchidx: move {ntodo} " Eric Wong
2025-01-10 23:18 ` [PATCH 12/31] searchidx: rename {sidx} to {self} Eric Wong
2025-01-10 23:18 ` [PATCH 13/31] (ext)index: move {max_size} and related bits to $self Eric Wong
2025-01-10 23:18 ` [PATCH 14/31] index: move {D} (delete state) " Eric Wong
2025-01-10 23:19 ` [PATCH 15/31] index: move {reindex} " Eric Wong
2025-01-10 23:19 ` [PATCH 16/31] searchidx: eliminate $sync from subroutines Eric Wong
2025-01-10 23:19 ` [PATCH 17/31] v2writable: move {mm_tmp} to $self Eric Wong
2025-01-10 23:19 ` [PATCH 18/31] (ext)index: eliminate most uses of `$sync->{ibx}' Eric Wong
2025-01-10 23:19 ` [PATCH 19/31] extindex: eliminate repeated ->eidx_key method call Eric Wong
2025-01-10 23:20 ` [PATCH 20/31] v2writable: eliminate $sync->{art_end} Eric Wong
2025-01-10 23:20 ` [PATCH 21/31] (ext)index: $sync->{unit} => $self->{unit} Eric Wong
2025-01-10 23:20 ` [PATCH 22/31] (ext)index: eliminate redundant $sync->{epoch_max} Eric Wong
2025-01-10 23:20 ` [PATCH 23/31] extindex: move {dedupe_cull} to self Eric Wong
2025-01-10 23:20 ` [PATCH 24/31] extindex: simplify data structures used for dedupe Eric Wong
2025-01-10 23:20 ` [PATCH 25/31] (ext)index: move {todo} into $self Eric Wong
2025-01-10 23:20 ` [PATCH 26/31] v2writable: {in_unindex} moved to self Eric Wong
2025-01-10 23:20 ` [PATCH 27/31] v2writable: hoist out process_todo sub for extindex Eric Wong
2025-01-10 23:20 ` [PATCH 28/31] (ext)index: move {unindexed} to $self Eric Wong
2025-01-10 23:20 ` [PATCH 29/31] (ext)index: move {ranges} " Eric Wong
2025-01-10 23:20 ` [PATCH 30/31] (ext)index: eliminate $sync->{ibx} Eric Wong
2025-01-10 23:21 ` [PATCH 31/31] (ext)index: eliminate $sync entirely Eric Wong