* [PATCH 00/10] lei: cleanups + initial import support
@ 2021-02-04 9:59 Eric Wong
2021-02-04 9:59 ` [PATCH 01/10] lei q: delay worker spawn Eric Wong
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
Still some ways to go, but changes to the "lei q" backend
should make future work far easier. I went a bit overboard
with the FD passing in earlier iterations :x Maybe Inline::C
won't have to be a hard requirement for lei after all...
The PktOp package is nice and works out for "lei import", too.
Eric Wong (10):
lei q: delay worker spawn
ipc: localize fields assignment to prevent circular refs
lei q: reorder internals to reduce FD passing
lei q: only start pager if output is to stdout
lei q: reinstate early MUA spawn for Maildir
eml: handle warning ignores for lei
lei q: eliminate $not_done temporary git dir hack
lei_query: remove uneeded dwaitpid import
lei_xsearch: drop unused imports
lei import: initial implementation
MANIFEST | 1 +
lib/PublicInbox/Admin.pm | 7 +-
lib/PublicInbox/Eml.pm | 19 ++++
lib/PublicInbox/IPC.pm | 10 +--
lib/PublicInbox/InboxWritable.pm | 24 +-----
lib/PublicInbox/LEI.pm | 144 +++++++++++++------------------
lib/PublicInbox/LeiImport.pm | 106 +++++++++++++++++++++++
lib/PublicInbox/LeiOverview.pm | 44 ++--------
lib/PublicInbox/LeiQuery.pm | 20 ++---
lib/PublicInbox/LeiStore.pm | 18 ++++
lib/PublicInbox/LeiToMail.pm | 43 ++++++---
lib/PublicInbox/LeiXSearch.pm | 143 +++++++++++++++---------------
lib/PublicInbox/Watch.pm | 14 ++-
t/lei.t | 15 ++++
14 files changed, 345 insertions(+), 263 deletions(-)
create mode 100644 lib/PublicInbox/LeiImport.pm
* [PATCH 01/10] lei q: delay worker spawn
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 02/10] ipc: localize fields assignment Eric Wong
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
Now that --stdin support is sorted, we can delay spawning
workers until we know the query is ready-to-run.
---
lib/PublicInbox/LeiQuery.pm | 19 +++++--------------
lib/PublicInbox/LeiXSearch.pm | 6 ++++++
2 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 4fe40400..6b1aa40c 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -75,21 +75,12 @@ sub lei_q {
$xj ||= $lxs->concurrency($opt); # allow: "--jobs ,$WRITER_ONLY"
my $nproc = $lxs->detect_nproc; # don't memoize, schedtool(1) exists
$xj = $nproc if $xj > $nproc;
- PublicInbox::LeiOverview->new($self) or return;
- $self->atfork_prepare_wq($lxs);
- $lxs->wq_workers_start('lei_xsearch', $xj, $self->oldset);
- delete $lxs->{-ipc_atfork_child_close};
- if (my $l2m = $self->{l2m}) {
- if (defined($mj) && $mj !~ /\A[1-9][0-9]*\z/) {
- return $self->fail("`$mj' writer jobs must be >= 1");
- }
- $mj //= $nproc;
- $self->atfork_prepare_wq($l2m);
- $l2m->wq_workers_start('lei2mail', $mj, $self->oldset);
- delete $l2m->{-ipc_atfork_child_close};
+ $lxs->{jobs} = $xj;
+ if (defined($mj) && $mj !~ /\A[1-9][0-9]*\z/) {
+ return $self->fail("`$mj' writer jobs must be >= 1");
}
-
- # no forking workers after this
+ $self->{l2m}->{jobs} = ($mj // $nproc) if $self->{l2m};
+ PublicInbox::LeiOverview->new($self) or return;
my %mset_opt = map { $_ => $opt->{$_} } qw(thread limit offset);
$mset_opt{asc} = $opt->{'reverse'} ? 1 : 0;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 965617b5..ab66717c 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -406,7 +406,13 @@ sub do_query {
$lei->{ovv}->ovv_begin($lei);
my ($au_done, $zpipe);
my $l2m = $lei->{l2m};
+ $lei->atfork_prepare_wq($self);
+ $self->wq_workers_start('lei_xsearch', $self->{jobs}, $lei->oldset);
+ delete $self->{-ipc_atfork_child_close};
if ($l2m) {
+ $lei->atfork_prepare_wq($l2m);
+ $l2m->wq_workers_start('lei2mail', $l2m->{jobs}, $lei->oldset);
+ delete $l2m->{-ipc_atfork_child_close};
pipe($lei->{startq}, $au_done) or die "pipe: $!";
# 1031: F_SETPIPE_SZ
fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
* [PATCH 02/10] ipc: localize fields assignment
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
2021-02-04 9:59 ` [PATCH 01/10] lei q: delay worker spawn Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 03/10] lei q: reorder internals to reduce FD passing Eric Wong
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
We don't want circular references giving surprising behavior
during worker exit.
---
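A minimal sketch of why `local` helps here; this is not the IPC.pm code itself. `local` on a hash slice undoes the assignment when the enclosing block exits, so anything stored in the object for the worker's lifetime does not linger afterwards. `$self` and `$fields` below are made-up stand-ins for illustration only.

```perl
use strict;
use warnings;

my $self = { ident => 'worker' };            # stand-in for the IPC object
my $fields = { lei => { name => 'demo' } };  # stand-in for per-worker fields

my $seen_inside;
{
	# the slice assignment only lasts until this block exits
	local @$self{keys %$fields} = values(%$fields);
	$seen_inside = $self->{lei}->{name};
}
my $exists_after = exists $self->{lei} ? 1 : 0;
print "inside=$seen_inside exists_after=$exists_after\n";
```

Since the keys did not exist before the block, they are gone again once it ends, and so is the extra reference to whatever `$fields` pointed at.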
lib/PublicInbox/IPC.pm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 3873649b..078aaa2c 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -338,7 +338,6 @@ sub _wq_worker_start ($$$) {
srand($seed);
eval { PublicInbox::DS->Reset };
delete @$self{qw(-wq_s1 -wq_workers -wq_ppid)};
- @$self{keys %$fields} = values(%$fields) if $fields;
$SIG{$_} = 'IGNORE' for (qw(PIPE));
$SIG{$_} = 'DEFAULT' for (qw(TTOU TTIN TERM QUIT INT CHLD));
local $0 = $self->{-wq_ident};
@@ -346,6 +345,8 @@ sub _wq_worker_start ($$$) {
# ensure we properly exit even if warn() dies:
my $end = PublicInbox::OnDestroy->new($$, sub { exit(!!$@) });
eval {
+ $fields //= {};
+ local @$self{keys %$fields} = values(%$fields);
my $on_destroy = $self->ipc_atfork_child;
local %SIG = %SIG;
wq_worker_loop($self);
* [PATCH 03/10] lei q: reorder internals to reduce FD passing
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
2021-02-04 9:59 ` [PATCH 01/10] lei q: delay worker spawn Eric Wong
2021-02-04 9:59 ` [PATCH 02/10] ipc: localize fields assignment Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 04/10] lei q: only start pager if output is to stdout Eric Wong
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
---
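The idea of creating FDs before forking can be sketched in isolation as follows. This is a generic illustration using a plain pipe, assuming nothing about lei's actual worker setup: the child inherits the descriptor through fork(), so nothing needs to be passed over a socket afterwards.

```perl
use strict;
use warnings;

# create the pipe in the parent *before* forking
pipe(my ($r, $w)) or die "pipe: $!";
my $pid = fork();
defined($pid) or die "fork: $!";
if ($pid == 0) { # child: inherited both ends, keeps only the writer
	close $r;
	print $w "hello from $$\n";
	close $w;
	exit 0;
}
close $w; # parent keeps only the reader
my $line = <$r>;
waitpid($pid, 0);
my $child_status = $?;
print "parent got: $line";
```

The ordering is the whole trick: anything opened after the fork would have to be sent to the child explicitly.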
lib/PublicInbox/IPC.pm | 3 --
lib/PublicInbox/LEI.pm | 99 +++++++---------------------------
lib/PublicInbox/LeiOverview.pm | 28 +++-------
lib/PublicInbox/LeiToMail.pm | 28 ++++++----
lib/PublicInbox/LeiXSearch.pm | 97 ++++++++++++++++-----------------
5 files changed, 92 insertions(+), 163 deletions(-)
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 078aaa2c..7f5a3f6f 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -464,9 +464,6 @@ sub DESTROY {
ipc_worker_stop($self);
}
-# Sereal doesn't have dclone
-sub deep_clone { ipc_thaw(ipc_freeze($_[-1])) }
-
sub detect_nproc () {
# _SC_NPROCESSORS_ONLN = 84 on both Linux glibc and musl
return POSIX::sysconf(84) if $^O eq 'linux';
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 49deed13..0d4b1c11 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -286,7 +286,7 @@ sub x_it ($$) {
# make sure client sees stdout before exit
$self->{1}->autoflush(1) if $self->{1};
dump_and_clear_log();
- if (my $s = $self->{pkt_op} // $self->{sock}) {
+ if (my $s = $self->{pkt_op_p} // $self->{sock}) {
send($s, "x_it $code", MSG_EOR);
} elsif ($self->{oneshot}) {
# don't want to end up using $? from child processes
@@ -322,7 +322,8 @@ sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
sub fail ($$;$) {
my ($self, $buf, $exit_code) = @_;
err($self, $buf) if defined $buf;
- send($self->{pkt_op}, '!', MSG_EOR) if $self->{pkt_op}; # fail_handler
+ # calls fail_handler:
+ send($self->{pkt_op_p}, '!', MSG_EOR) if $self->{pkt_op_p};
x_it($self, ($exit_code // 1) << 8);
undef;
}
@@ -340,7 +341,7 @@ sub puts ($;@) { out(shift, map { "$_\n" } @_) }
sub child_error { # passes non-fatal curl exit codes to user
my ($self, $child_error) = @_; # child_error is $?
- if (my $s = $self->{pkt_op} // $self->{sock}) {
+ if (my $s = $self->{pkt_op_p} // $self->{sock}) {
# send to the parent lei-daemon or to lei(1) client
send($s, "child_error $child_error", MSG_EOR);
} elsif (!$PublicInbox::DS::in_loop) {
@@ -348,94 +349,34 @@ sub child_error { # passes non-fatal curl exit codes to user
} # else noop if client disconnected
}
-sub atfork_prepare_wq {
- my ($self, $wq) = @_;
- my $tcafc = $wq->{-ipc_atfork_child_close} //= [ $listener // () ];
- if (my $sock = $self->{sock}) {
- push @$tcafc, @$self{qw(0 1 2 3)}, $sock;
- }
- if (my $pgr = $self->{pgr}) {
- push @$tcafc, @$pgr[1,2];
- }
- if (my $old_1 = $self->{old_1}) {
- push @$tcafc, $old_1;
- }
- for my $f (qw(lxs l2m)) {
- my $ipc = $self->{$f} or next;
- push @$tcafc, grep { defined }
- @$ipc{qw(-wq_s1 -wq_s2 -ipc_req -ipc_res)};
- }
-}
-
-sub io_restore ($$) {
- my ($dst, $src) = @_;
- for my $i (0..2) { # standard FDs
- my $io = delete $src->{$i} or next;
- $dst->{$i} = $io;
- }
- for my $i (3..9) { # named (non-standard) FDs
- my $io = $src->{$i} or next;
- my @st = stat($io) or die "stat $src.$i ($io): $!";
- my $f = delete $dst->{"dev=$st[0],ino=$st[1]"} // next;
- $dst->{$f} = $io;
- delete $src->{$i};
- }
-}
-
sub note_sigpipe { # triggers sigpipe_handler
my ($self, $fd) = @_;
close(delete($self->{$fd})); # explicit close silences Perl warning
- send($self->{pkt_op}, '|', MSG_EOR) if $self->{pkt_op};
+ send($self->{pkt_op_p}, '|', MSG_EOR) if $self->{pkt_op_p};
x_it($self, 13);
}
-sub atfork_child_wq {
- my ($self, $wq) = @_;
- io_restore($self, $wq);
- -S $self->{pkt_op} or die 'BUG: {pkt_op} expected';
- io_restore($self->{l2m}, $wq);
+sub lei_atfork_child {
+ my ($self) = @_;
+ # we need to explicitly close things which are on stack
+ delete $self->{0};
+ for (delete @$self{qw(3 sock old_1 au_done)}) {
+ close($_) if defined($_);
+ }
+ if (my $op_c = delete $self->{pkt_op_c}) {
+ close(delete $op_c->{sock});
+ }
+ if (my $pgr = delete $self->{pgr}) {
+ close($_) for (@$pgr[1,2]);
+ }
+ close $listener if $listener;
+ undef $listener;
%PATH2CFG = ();
undef $errors_log;
$quit = \&CORE::exit;
$current_lei = $self; # for SIG{__WARN__}
}
-sub io_extract ($;@) {
- my ($obj, @fields) = @_;
- my @io;
- for my $f (@fields) {
- my $io = delete $obj->{$f} or next;
- my @st = stat($io) or die "W: stat $obj.$f ($io): $!";
- $obj->{"dev=$st[0],ino=$st[1]"} = $f;
- push @io, $io;
- }
- @io
-}
-
-# usage: ($lei, @io) = $lei->atfork_parent_wq($wq);
-sub atfork_parent_wq {
- my ($self, $wq) = @_;
- my $env = delete $self->{env}; # env is inherited at fork
- my $lei = bless { %$self }, ref($self);
- for my $f (qw(dedupe ovv)) {
- my $tmp = delete($lei->{$f}) or next;
- $lei->{$f} = $wq->deep_clone($tmp);
- }
- $self->{env} = $env;
- delete @$lei{qw(sock 3 -lei_store cfg old_1 pgr lxs)}; # keep l2m
- my @io = (delete(@$lei{qw(0 1 2)}),
- io_extract($lei, qw(pkt_op startq)));
- my $l2m = $lei->{l2m};
- if ($l2m && $l2m != $wq) { # $wq == lxs
- if (my $wq_s1 = $l2m->{-wq_s1}) {
- push @io, io_extract($l2m, '-wq_s1');
- $l2m->{-wq_s1} = $wq_s1;
- }
- $l2m->wq_close(1);
- }
- ($lei, @io);
-}
-
sub _help ($;$) {
my ($self, $errmsg) = @_;
my $cmd = $self->{cmd} // 'COMMAND';
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index e33d63a2..e6bf4f2a 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -207,7 +207,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
}
$lei->{ovv_buf} = \(my $buf = '') if !$l2m;
if ($l2m && !$ibxish) { # remote https?:// mboxrd
- delete $l2m->{-wq_s1};
my $g2m = $l2m->can('git_to_mail');
my $wcb = $l2m->write_cb($lei);
sub {
@@ -215,33 +214,20 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
$wcb->(undef, $smsg, $eml);
};
} elsif ($l2m && $l2m->{-wq_s1}) {
- my ($lei_ipc, @io) = $lei->atfork_parent_wq($l2m);
- # $io[0] becomes a notification pipe that triggers EOF
+ # $io->[0] becomes a notification pipe that triggers EOF
# in this wq worker when all outstanding ->write_mail
# calls are complete
- $io[0] = undef;
- pipe($l2m->{each_smsg_done}, $io[0]) or die "pipe: $!";
- fcntl($io[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
- delete @$lei_ipc{qw(l2m opt mset_opt cmd)};
+ my $io = [];
+ pipe($l2m->{each_smsg_done}, $io->[0]) or die "pipe: $!";
+ fcntl($io->[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
$self->{git} = $git;
my $git_dir = $git->{git_dir};
sub {
my ($smsg, $mitem) = @_;
$smsg->{pct} = get_pct($mitem) if $mitem;
- $l2m->wq_do('write_mail', \@io, $git_dir, $smsg,
- $lei_ipc);
+ $l2m->wq_do('write_mail', $io, $git_dir, $smsg);
}
- } elsif ($l2m) {
- my $wcb = $l2m->write_cb($lei);
- my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
- $self->{git} = $git; # for ovv_atexit_child
- my $g2m = $l2m->can('git_to_mail');
- sub {
- my ($smsg, $mitem) = @_;
- $smsg->{pct} = get_pct($mitem) if $mitem;
- $git->cat_async($smsg->{blob}, $g2m, [ $wcb, $smsg ]);
- };
} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
sub { # DIY prettiness :P
@@ -275,7 +261,9 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
$lei->out($buf);
$buf = '';
}
- } # else { ...
+ } else {
+ die "TODO: unhandled case $self->{fmt}"
+ }
}
no warnings 'once';
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index c704dc2a..f9250860 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -211,10 +211,10 @@ sub zsfx2cmd ($$$) {
}
sub _post_augment_mbox { # open a compressor process
- my ($self, $lei, $zpipe) = @_;
+ my ($self, $lei) = @_;
my $zsfx = $self->{zsfx} or return;
my $cmd = zsfx2cmd($zsfx, undef, $lei);
- my ($r, $w) = splice(@$zpipe, 0, 2);
+ my ($r, $w) = @{delete $lei->{zpipe}};
my $rdr = { 0 => $r, 1 => $lei->{1}, 2 => $lei->{2} };
my $pid = spawn($cmd, $lei->{env}, $rdr);
my $pp = gensym;
@@ -407,7 +407,7 @@ sub _pre_augment_mbox {
$! == ENOENT or die "unlink($dst): $!";
}
open my $out, $mode, $dst or die "open($dst): $!";
- $lei->{old_1} = $lei->{1};
+ $lei->{old_1} = $lei->{1}; # keep for spawning MUA
$lei->{1} = $out;
}
# Perl does SEEK_END even with O_APPEND :<
@@ -418,7 +418,7 @@ sub _pre_augment_mbox {
state $zsfx_allow = join('|', keys %zsfx2cmd);
($self->{zsfx}) = ($dst =~ /\.($zsfx_allow)\z/) or return;
pipe(my ($r, $w)) or die "pipe: $!";
- [ $r, $w ];
+ $lei->{zpipe} = [ $r, $w ];
}
sub _do_augment_mbox {
@@ -462,16 +462,24 @@ sub post_augment { # fast (spawn compressor or mkdir), runs in main daemon
$self->$m($lei, @args);
}
+sub ipc_atfork_child {
+ my ($self) = @_;
+ my $lei = delete $self->{lei};
+ $lei->lei_atfork_child;
+ if (my $zpipe = delete $lei->{zpipe}) {
+ $lei->{1} = $zpipe->[1];
+ close $zpipe->[0];
+ }
+ $self->{wcb} = $self->write_cb($lei);
+ $self->SUPER::ipc_atfork_child;
+}
+
sub write_mail { # via ->wq_do
- my ($self, $git_dir, $smsg, $lei) = @_;
+ my ($self, $git_dir, $smsg) = @_;
my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
- my $wcb = $self->{wcb} //= do { # first message
- $lei->atfork_child_wq($self);
- $self->write_cb($lei);
- };
my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);
git_async_cat($git, $smsg->{blob}, \&git_to_mail,
- [$wcb, $smsg, $not_done]);
+ [$self->{wcb}, $smsg, $not_done]);
}
sub wq_atexit_child {
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index ab66717c..e41d899e 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -110,8 +110,8 @@ sub wait_startq ($) {
sub mset_progress {
my $lei = shift;
return unless $lei->{-progress};
- if ($lei->{pkt_op}) { # called via pkt_op/pkt_do from workers
- pkt_do($lei->{pkt_op}, 'mset_progress', @_);
+ if ($lei->{pkt_op_p}) {
+ pkt_do($lei->{pkt_op_p}, 'mset_progress', @_);
} else { # single lei-daemon consumer
my ($desc, $mset_size, $mset_total_est) = @_;
$lei->{-mset_total} += $mset_size;
@@ -120,11 +120,10 @@ sub mset_progress {
}
sub query_thread_mset { # for --thread
- my ($self, $lei, $ibxish) = @_;
+ my ($self, $ibxish) = @_;
local $0 = "$0 query_thread_mset";
- $lei->atfork_child_wq($self);
+ my $lei = $self->{lei};
my $startq = delete $lei->{startq};
-
my ($srch, $over) = ($ibxish->search, $ibxish->over);
my $desc = $ibxish->{inboxdir} // $ibxish->{topdir};
return warn("$desc not indexed by Xapian\n") unless ($srch && $over);
@@ -154,9 +153,9 @@ sub query_thread_mset { # for --thread
}
sub query_mset { # non-parallel for non-"--thread" users
- my ($self, $lei) = @_;
+ my ($self) = @_;
local $0 = "$0 query_mset";
- $lei->atfork_child_wq($self);
+ my $lei = $self->{lei};
my $startq = delete $lei->{startq};
my $mo = { %{$lei->{mset_opt}} };
my $mset;
@@ -207,10 +206,10 @@ sub kill_reap {
}
sub query_remote_mboxrd {
- my ($self, $lei, $uris) = @_;
+ my ($self, $uris) = @_;
local $0 = "$0 query_remote_mboxrd";
- $lei->atfork_child_wq($self);
local $SIG{TERM} = sub { exit(0) }; # for DESTROY (File::Temp, $reap)
+ my $lei = $self->{lei};
my ($opt, $env) = @$lei{qw(opt env)};
my @qform = (q => $lei->{mset_opt}->{qstr}, x => 'm');
push(@qform, t => 1) if $opt->{thread};
@@ -307,7 +306,7 @@ sub git {
$git;
}
-sub query_done { # EOF callback
+sub query_done { # EOF callback for main daemon
my ($lei) = @_;
my $has_l2m = exists $lei->{l2m};
for my $f (qw(lxs l2m)) {
@@ -332,9 +331,8 @@ Error closing $lei->{ovv}->{dst}: $!
}
sub do_post_augment {
- my ($lei, $zpipe, $au_done) = @_;
- my $l2m = $lei->{l2m} or die 'BUG: no {l2m}';
- eval { $l2m->post_augment($lei, $zpipe) };
+ my ($lei) = @_;
+ eval { $lei->{l2m}->post_augment($lei) };
if (my $err = $@) {
if (my $lxs = delete $lei->{lxs}) {
$lxs->wq_kill;
@@ -342,7 +340,7 @@ sub do_post_augment {
}
$lei->fail("$err");
}
- close $au_done; # triggers wait_startq
+ close(delete $lei->{au_done}); # triggers wait_startq
}
my $MAX_PER_HOST = 4;
@@ -356,13 +354,13 @@ sub concurrency {
}
sub start_query { # always runs in main (lei-daemon) process
- my ($self, $io, $lei) = @_;
+ my ($self, $lei) = @_;
if ($lei->{opt}->{thread}) {
for my $ibxish (locals($self)) {
- $self->wq_do('query_thread_mset', $io, $lei, $ibxish);
+ $self->wq_do('query_thread_mset', [], $ibxish);
}
} elsif (locals($self)) {
- $self->wq_do('query_mset', $io, $lei);
+ $self->wq_do('query_mset', []);
}
my $i = 0;
my $q = [];
@@ -370,19 +368,23 @@ sub start_query { # always runs in main (lei-daemon) process
push @{$q->[$i++ % $MAX_PER_HOST]}, $uri;
}
for my $uris (@$q) {
- $self->wq_do('query_remote_mboxrd', $io, $lei, $uris);
+ $self->wq_do('query_remote_mboxrd', [], $uris);
}
- @$io = ();
+}
+
+sub ipc_atfork_child {
+ my ($self) = @_;
+ $self->{lei}->lei_atfork_child;
+ $self->SUPER::ipc_atfork_child;
}
sub query_prepare { # called by wq_do
- my ($self, $lei) = @_;
+ my ($self) = @_;
local $0 = "$0 query_prepare";
- $lei->atfork_child_wq($self);
- delete $lei->{l2m}->{-wq_s1};
+ my $lei = $self->{lei};
eval { $lei->{l2m}->do_augment($lei) };
$lei->fail($@) if $@;
- pkt_do($lei->{pkt_op}, '.') == 1 or die "do_post_augment trigger: $!"
+ pkt_do($lei->{pkt_op_p}, '.') == 1 or die "do_post_augment trigger: $!"
}
sub fail_handler ($;$$) {
@@ -401,45 +403,38 @@ sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
sub do_query {
my ($self, $lei) = @_;
- $lei->{1}->autoflush(1);
- $lei->start_pager if -t $lei->{1};
- $lei->{ovv}->ovv_begin($lei);
- my ($au_done, $zpipe);
- my $l2m = $lei->{l2m};
- $lei->atfork_prepare_wq($self);
- $self->wq_workers_start('lei_xsearch', $self->{jobs}, $lei->oldset);
- delete $self->{-ipc_atfork_child_close};
- if ($l2m) {
- $lei->atfork_prepare_wq($l2m);
- $l2m->wq_workers_start('lei2mail', $l2m->{jobs}, $lei->oldset);
- delete $l2m->{-ipc_atfork_child_close};
- pipe($lei->{startq}, $au_done) or die "pipe: $!";
- # 1031: F_SETPIPE_SZ
- fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
- $zpipe = $l2m->pre_augment($lei);
- }
my $ops = {
'|' => [ \&sigpipe_handler, $lei ],
'!' => [ \&fail_handler, $lei ],
- '.' => [ \&do_post_augment, $lei, $zpipe, $au_done ],
+ '.' => [ \&do_post_augment, $lei ],
'' => [ \&query_done, $lei ],
'mset_progress' => [ \&mset_progress, $lei ],
'x_it' => [ $lei->can('x_it'), $lei ],
'child_error' => [ $lei->can('child_error'), $lei ],
};
- (my $op, $lei->{pkt_op}) = PublicInbox::PktOp->pair($ops);
- my ($lei_ipc, @io) = $lei->atfork_parent_wq($self);
- delete($lei->{pkt_op});
-
- $lei->event_step_init; # wait for shutdowns
+ ($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+ $lei->{1}->autoflush(1);
+ $lei->start_pager if -t $lei->{1};
+ $lei->{ovv}->ovv_begin($lei);
+ my $l2m = $lei->{l2m};
if ($l2m) {
- $self->wq_do('query_prepare', \@io, $lei_ipc);
- $io[1] = $zpipe->[1] if $zpipe;
+ $l2m->pre_augment($lei);
+ $l2m->wq_workers_start('lei2mail', $l2m->{jobs},
+ $lei->oldset, { lei => $lei });
+ pipe($lei->{startq}, $lei->{au_done}) or die "pipe: $!";
+ # 1031: F_SETPIPE_SZ
+ fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
}
- start_query($self, \@io, $lei_ipc);
- $self->wq_close(1);
+ $self->wq_workers_start('lei_xsearch', $self->{jobs},
+ $lei->oldset, { lei => $lei });
+ my $op = delete $lei->{pkt_op_c};
+ delete $lei->{pkt_op_p};
+ $l2m->wq_close(1) if $l2m;
+ $lei->event_step_init; # wait for shutdowns
+ $self->wq_do('query_prepare', []) if $l2m;
+ start_query($self, $lei);
+ $self->wq_close(1); # lei_xsearch workers stop when done
if ($lei->{oneshot}) {
- # for the $lei_ipc->atfork_child_wq PIPE handler:
while ($op->{sock}) { $op->event_step }
}
}
* [PATCH 04/10] lei q: only start pager if output is to stdout
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
` (2 preceding siblings ...)
2021-02-04 9:59 ` [PATCH 03/10] lei q: reorder internals to reduce FD passing Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 05/10] lei q: reinstate early MUA spawn for Maildir Eric Wong
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
No need to be starting a pager if we're writing to a regular file.
---
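For illustration, a small sketch of the underlying check, assuming a hypothetical `need_pager` helper (the patch itself stores the result in `$lei->{need_pager}`): `-t` is only true for a terminal, so a handle opened on a regular file never asks for a pager.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# hypothetical helper: should output through this handle be paged?
sub need_pager { my ($fh) = @_; (-t $fh) ? 1 : 0 }

my ($out, $path) = tempfile(UNLINK => 1); # stands in for a regular output file
my $file_needs_pager = need_pager($out);
my $is_regular = (-f $out) ? 1 : 0;
print "regular=$is_regular need_pager=$file_needs_pager\n";
print "stdout need_pager=", need_pager(\*STDOUT), "\n";
```

The last line depends on how the script is run: it reports 1 on an interactive terminal and 0 when stdout is redirected.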
lib/PublicInbox/LeiOverview.pm | 3 +--
lib/PublicInbox/LeiXSearch.pm | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index e6bf4f2a..3125f015 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -78,9 +78,8 @@ sub new {
if ($fmt =~ /\A($JSONL|(?:concat)?json)\z/) {
$json = $self->{json} = ref(PublicInbox::Config->json);
}
- my ($isatty, $seekable);
if ($dst eq '/dev/stdout') {
- $isatty = -t $lei->{1};
+ my $isatty = $lei->{need_pager} = -t $lei->{1};
$opt->{pretty} //= $isatty;
if (!$isatty && -f _) {
my $fl = fcntl($lei->{1}, F_GETFL, 0) //
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index e41d899e..0ca871ea 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -414,7 +414,7 @@ sub do_query {
};
($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
$lei->{1}->autoflush(1);
- $lei->start_pager if -t $lei->{1};
+ $lei->start_pager if delete $lei->{need_pager};
$lei->{ovv}->ovv_begin($lei);
my $l2m = $lei->{l2m};
if ($l2m) {
* [PATCH 05/10] lei q: reinstate early MUA spawn for Maildir
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
` (3 preceding siblings ...)
2021-02-04 9:59 ` [PATCH 04/10] lei q: only start pager if output is to stdout Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 06/10] eml: handle warning ignores for lei Eric Wong
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
Once all files are written, we can use utime() to poke Maildirs
to wake up MUAs that fail to account for nanosecond timestamp
resolution.
---
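A rough sketch of the "poke" described above, run against a throwaway directory rather than a real Maildir. The layout and variable names are assumptions for the demo; only the `utime` call with `time + 1` mirrors what `poke_dst` does in this patch.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $maildir = tempdir(CLEANUP => 1); # hypothetical Maildir for the demo
for my $sub (qw(new cur tmp)) {
	mkdir("$maildir/$sub") or die "mkdir $maildir/$sub: $!";
}

my $before = (stat("$maildir/cur"))[9]; # mtime in whole seconds
my $t = time + 1; # one second ahead, as poke_dst does
utime($t, $t, "$maildir/cur") or die "utime: $!";
my $after = (stat("$maildir/cur"))[9];
print "mtime moved from $before to $after\n";
```

Bumping the mtime by a whole second means an MUA that only compares second-granularity timestamps still notices the change.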
lib/PublicInbox/LEI.pm | 1 +
lib/PublicInbox/LeiToMail.pm | 13 +++++++++++++
lib/PublicInbox/LeiXSearch.pm | 15 +++++++++------
3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 0d4b1c11..24efb494 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -739,6 +739,7 @@ sub start_mua {
} elsif ($self->{oneshot}) {
$self->{"mua.pid.$self.$$"} = spawn(\@cmd);
}
+ delete $self->{-progress};
}
# caller needs to "-t $self->{1}" to check if tty
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index f9250860..5a6f18fb 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -365,6 +365,7 @@ sub new {
} else {
die "bad mail --format=$fmt\n";
}
+ $self->{dst} = $dst;
$lei->{dedupe} = PublicInbox::LeiDedupe->new($lei);
$self;
}
@@ -474,6 +475,18 @@ sub ipc_atfork_child {
$self->SUPER::ipc_atfork_child;
}
+sub lock_free {
+ $_[0]->{base_type} =~ /\A(?:maildir|mh|imap|jmap)\z/ ? 1 : 0;
+}
+
+sub poke_dst {
+ my ($self) = @_;
+ if ($self->{base_type} eq 'maildir') {
+ my $t = time + 1;
+ utime($t, $t, "$self->{dst}/cur");
+ }
+}
+
sub write_mail { # via ->wq_do
my ($self, $git_dir, $smsg) = @_;
my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 0ca871ea..e7f0ef63 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -308,13 +308,13 @@ sub git {
sub query_done { # EOF callback for main daemon
my ($lei) = @_;
- my $has_l2m = exists $lei->{l2m};
- for my $f (qw(lxs l2m)) {
- my $wq = delete $lei->{$f} or next;
- $wq->wq_wait_old($lei);
+ my $l2m = delete $lei->{l2m};
+ $l2m->wq_wait_old($lei) if $l2m;
+ if (my $lxs = delete $lei->{lxs}) {
+ $lxs->wq_wait_old($lei);
}
$lei->{ovv}->ovv_end($lei);
- if ($has_l2m) { # close() calls LeiToMail reap_compress
+ if ($l2m) { # close() calls LeiToMail reap_compress
if (my $out = delete $lei->{old_1}) {
if (my $mbout = $lei->{1}) {
close($mbout) or return $lei->fail(<<"");
@@ -323,7 +323,7 @@ Error closing $lei->{ovv}->{dst}: $!
}
$lei->{1} = $out;
}
- $lei->start_mua;
+ $l2m->lock_free ? $l2m->poke_dst : $lei->start_mua;
}
$lei->{-progress} and
$lei->err('# ', $lei->{-mset_total} // 0, " matches");
@@ -355,6 +355,9 @@ sub concurrency {
sub start_query { # always runs in main (lei-daemon) process
my ($self, $lei) = @_;
+ if (my $l2m = $lei->{l2m}) {
+ $lei->start_mua if $l2m->lock_free;
+ }
if ($lei->{opt}->{thread}) {
for my $ibxish (locals($self)) {
$self->wq_do('query_thread_mset', [], $ibxish);
* [PATCH 06/10] eml: handle warning ignores for lei
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
` (4 preceding siblings ...)
2021-02-04 9:59 ` [PATCH 05/10] lei q: reinstate early MUA spawn for Maildir Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 07/10] lei q: eliminate $not_done temporary git dir hack Eric Wong
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
There's nothing we can do about bad emails in our search
results, so quiet things down and don't fight the MUA for
the terminal.
---
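To show how the callback is meant to be used, here is a self-contained sketch with a cut-down filter. `demo_warn_ignore` and `demo_warn_ignore_cb` are hypothetical names for this example; the real pattern list is the one in `PublicInbox::Eml::warn_ignore` below.

```perl
use strict;
use warnings;

# hypothetical, cut-down filter matching just one kind of noise
sub demo_warn_ignore { "@_" =~ /^bad Date: .+? in / }

# meant to be the RHS of: local $SIG{__WARN__} = ...
sub demo_warn_ignore_cb {
	my $cb = $SIG{__WARN__} // \&CORE::warn; # wrap whatever was there
	sub { $cb->(@_) unless demo_warn_ignore(@_) }
}

my @seen;
{
	local $SIG{__WARN__} = sub { push @seen, "@_" }; # outer handler
	local $SIG{__WARN__} = demo_warn_ignore_cb();
	warn "bad Date: yesterday-ish in <msgid>\n"; # filtered out
	warn "disk is full\n"; # reaches the outer handler
}
print scalar(@seen), " warning(s) got through: @seen";
```

Because the right-hand side is evaluated before `local` takes effect, the wrapper captures the previous handler and forwards everything it does not filter.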
lib/PublicInbox/Admin.pm | 7 +++----
lib/PublicInbox/Eml.pm | 19 +++++++++++++++++++
lib/PublicInbox/InboxWritable.pm | 24 +-----------------------
lib/PublicInbox/LeiToMail.pm | 1 +
lib/PublicInbox/Watch.pm | 14 ++++++--------
5 files changed, 30 insertions(+), 35 deletions(-)
diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index f96397ea..3b38a5a3 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -10,6 +10,7 @@ our @EXPORT_OK = qw(setup_signals);
use PublicInbox::Config;
use PublicInbox::Inbox;
use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::Eml;
*rel2abs_collapsed = \&PublicInbox::Config::rel2abs_collapsed;
sub setup_signals {
@@ -241,12 +242,10 @@ sub index_inbox {
}
local %SIG = %SIG;
setup_signals(\&index_terminate, $ibx);
- my $warn_cb = $SIG{__WARN__} // \&CORE::warn;
my $idx = { current_info => $ibx->{inboxdir} };
- my $warn_ignore = PublicInbox::InboxWritable->can('warn_ignore');
local $SIG{__WARN__} = sub {
- return if $warn_ignore->(@_);
- $warn_cb->($idx->{current_info}, ': ', @_);
+ return if PublicInbox::Eml::warn_ignore(@_);
+ warn($idx->{current_info}, ': ', @_);
};
if (ref($ibx) && $ibx->version == 2) {
eval { require PublicInbox::V2Writable };
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index bd27f19b..f7f62e7b 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -477,6 +477,25 @@ sub charset_set {
sub crlf { $_[0]->{crlf} // "\n" }
+# warnings to ignore when handling spam mailboxes and maybe other places
+sub warn_ignore {
+ my $s = "@_";
+ # Email::Address::XS warnings
+ $s =~ /^Argument contains empty address at /
+ || $s =~ /^Element at index [0-9]+ contains /
+ # PublicInbox::MsgTime
+ || $s =~ /^bogus TZ offset: .+?, ignoring and assuming \+0000/
+ || $s =~ /^bad Date: .+? in /
+ # Encode::Unicode::UTF7
+ || $s =~ /^Bad UTF7 data escape at /
+}
+
+# this expects to be RHS in this assignment: "local $SIG{__WARN__} = ..."
+sub warn_ignore_cb {
+ my $cb = $SIG{__WARN__} // \&CORE::warn;
+ sub { $cb->(@_) unless warn_ignore(@_) }
+}
+
sub willneed { re_memo($_) for @_ }
willneed(qw(From To Cc Date Subject Content-Type In-Reply-To References
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 982ad6e5..3a4012cd 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -9,7 +9,7 @@ use parent qw(PublicInbox::Inbox Exporter);
use PublicInbox::Import;
use PublicInbox::Filter::Base qw(REJECT);
use Errno qw(ENOENT);
-our @EXPORT_OK = qw(eml_from_path warn_ignore_cb);
+our @EXPORT_OK = qw(eml_from_path);
use constant {
PERM_UMASK => 0,
@@ -277,28 +277,6 @@ sub cleanup ($) {
delete @{$_[0]}{qw(over mm git search)};
}
-# warnings to ignore when handling spam mailboxes and maybe other places
-sub warn_ignore {
- my $s = "@_";
- # Email::Address::XS warnings
- $s =~ /^Argument contains empty address at /
- || $s =~ /^Element at index [0-9]+ contains /
- # PublicInbox::MsgTime
- || $s =~ /^bogus TZ offset: .+?, ignoring and assuming \+0000/
- || $s =~ /^bad Date: .+? in /
- # Encode::Unicode::UTF7
- || $s =~ /^Bad UTF7 data escape at /
-}
-
-# this expects to be RHS in this assignment: "local $SIG{__WARN__} = ..."
-sub warn_ignore_cb {
- my $cb = $SIG{__WARN__} // \&CORE::warn;
- sub {
- return if warn_ignore(@_);
- $cb->(@_);
- }
-}
-
# v2+ only, XXX: maybe we can just rely on ->max_git_epoch and remove
sub git_dir_latest {
my ($self, $max) = @_;
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 5a6f18fb..1f815e40 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -472,6 +472,7 @@ sub ipc_atfork_child {
close $zpipe->[0];
}
$self->{wcb} = $self->write_cb($lei);
+ $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
$self->SUPER::ipc_atfork_child;
}
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index 2b44ba43..185e5da8 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -7,7 +7,7 @@ package PublicInbox::Watch;
use strict;
use v5.10.1;
use PublicInbox::Eml;
-use PublicInbox::InboxWritable qw(eml_from_path warn_ignore_cb);
+use PublicInbox::InboxWritable qw(eml_from_path);
use PublicInbox::Filter::Base qw(REJECT);
use PublicInbox::Spamcheck;
use PublicInbox::Sigfd;
@@ -174,7 +174,7 @@ sub _remove_spam {
# path must be marked as (S)een
$path =~ /:2,[A-R]*S[T-Za-z]*\z/ or return;
my $eml = eml_from_path($path) or return;
- local $SIG{__WARN__} = warn_ignore_cb();
+ local $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
$self->{pi_cfg}->each_inbox(\&remove_eml_i, $self, $eml, $path);
}
@@ -414,13 +414,11 @@ sub imap_import_msg ($$$$$) {
import_eml($self, $ibx, $eml);
}
} elsif ($inboxes eq 'watchspam') {
- # we don't remove unseen messages
- if ($flags =~ /\\Seen\b/) {
- local $SIG{__WARN__} = warn_ignore_cb();
- my $eml = PublicInbox::Eml->new($raw);
- $self->{pi_cfg}->each_inbox(\&remove_eml_i,
+ return if $flags !~ /\\Seen\b/; # don't remove unseen messages
+ local $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
+ my $eml = PublicInbox::Eml->new($raw);
+ $self->{pi_cfg}->each_inbox(\&remove_eml_i,
$self, $eml, "$url UID:$uid");
- }
} else {
die "BUG: destination unknown $inboxes";
}
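
For readers unfamiliar with the idiom being moved here: a minimal, self-contained sketch of a warning filter meant to sit on the right-hand side of "local $SIG{__WARN__} = ...". The sketch_* names are hypothetical stand-ins and only two of the patterns are reproduced; the fallback handler that prints to STDERR is likewise an assumption of the sketch, not what the patch uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real ignore list; only two patterns shown.
sub sketch_warn_ignore {
    my $s = "@_";
    $s =~ /^Argument contains empty address at / ||
        $s =~ /^Bad UTF7 data escape at /;
}

# Returns a closure meant to be the RHS of: local $SIG{__WARN__} = ...
# The previous handler is captured before "local" replaces it.
sub sketch_warn_ignore_cb {
    my $cb = $SIG{__WARN__} // sub { print STDERR @_ };
    sub {
        return if sketch_warn_ignore(@_);
        $cb->(@_);
    };
}

my @seen;
{
    # outer handler just records what reaches it
    local $SIG{__WARN__} = sub { push @seen, "@_" };
    {
        local $SIG{__WARN__} = sketch_warn_ignore_cb();
        warn "Bad UTF7 data escape at offset 3\n"; # filtered out
        warn "disk is full\n";                     # passed through
    }
}
print scalar(@seen), " warning(s) reached the outer handler: @seen";
```

Because the right-hand side is evaluated before "local" takes effect, the closure chains to whatever handler was active in the enclosing scope.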
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 07/10] lei q: eliminate $not_done temporary git dir hack
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
` (5 preceding siblings ...)
2021-02-04 9:59 ` [PATCH 06/10] eml: handle warning ignores for lei Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 08/10] lei_query: remove uneeded dwaitpid import Eric Wong
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
Another step towards simplifying lei internals.
None of our current uses of ->wq_do involve FD passing, and the
plan is to rely on FD passing only between lei-daemon and lei(1).
It ought to be possible to order lei-daemon's internal bits
so that they do not need FD passing.
---
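
A rough sketch of the lifetime rule this patch leans on: a File::Temp->newdir object removes its directory when the last reference to it goes away, so keeping a second reference on a longer-lived object keeps the directory around. The %lei and %lxs hashes and the template name are hypothetical stand-ins, not the real objects.

```perl
use strict;
use warnings;
use File::Temp 0.19 (); # 0.19 for ->newdir

my %lei;     # hypothetical stand-in for the long-lived $lei object
my $dirname;
{
    my %lxs; # hypothetical stand-in for the shorter-lived searcher
    my $tmp = File::Temp->newdir('sketch_git.XXXXXX', TMPDIR => 1);
    $dirname = $tmp->dirname;
    $lei{git_tmp} = $lxs{git_tmp} = $tmp; # two owners, one directory
}
# %lxs and $tmp are gone, but %lei still holds a reference
my $alive_after_scope = -d $dirname ? 1 : 0;

delete $lei{git_tmp}; # last reference dropped, directory is removed
my $alive_after_release = -d $dirname ? 1 : 0;

print "after inner scope: $alive_after_scope, ",
    "after release: $alive_after_release\n";
```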
lib/PublicInbox/LeiOverview.pm | 23 ++---------------------
lib/PublicInbox/LeiToMail.pm | 3 +--
lib/PublicInbox/LeiXSearch.pm | 16 ++++++++++++----
3 files changed, 15 insertions(+), 27 deletions(-)
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index 3125f015..d3df4faa 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -147,17 +147,6 @@ sub _unbless_smsg {
sub ovv_atexit_child {
my ($self, $lei) = @_;
- if (my $l2m = $lei->{l2m}) {
- # wait for ->write_mail work we submitted to lei2mail
- if (my $rd = delete $l2m->{each_smsg_done}) {
- read($rd, my $buf, 1); # wait for EOF
- }
- }
- # order matters, git->{-tmp}->DESTROY must not fire until
- # {each_smsg_done} hits EOF above
- if (my $git = delete $self->{git}) {
- $git->async_wait_all;
- }
if (my $bref = delete $lei->{ovv_buf}) {
my $lk = $self->lock_for_scope;
$lei->out($$bref);
@@ -213,19 +202,11 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
$wcb->(undef, $smsg, $eml);
};
} elsif ($l2m && $l2m->{-wq_s1}) {
- # $io->[0] becomes a notification pipe that triggers EOF
- # in this wq worker when all outstanding ->write_mail
- # calls are complete
- my $io = [];
- pipe($l2m->{each_smsg_done}, $io->[0]) or die "pipe: $!";
- fcntl($io->[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
- my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
- $self->{git} = $git;
- my $git_dir = $git->{git_dir};
+ my $git_dir = $ibxish->git->{git_dir};
sub {
my ($smsg, $mitem) = @_;
$smsg->{pct} = get_pct($mitem) if $mitem;
- $l2m->wq_do('write_mail', $io, $git_dir, $smsg);
+ $l2m->wq_do('write_mail', [], $git_dir, $smsg);
}
} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 1f815e40..4f847221 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -490,10 +490,9 @@ sub poke_dst {
sub write_mail { # via ->wq_do
my ($self, $git_dir, $smsg) = @_;
- my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);
git_async_cat($git, $smsg->{blob}, \&git_to_mail,
- [$self->{wcb}, $smsg, $not_done]);
+ [$self->{wcb}, $smsg]);
}
sub wq_atexit_child {
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index e7f0ef63..2dc44414 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -287,12 +287,15 @@ sub query_remote_mboxrd {
$lei->{ovv}->ovv_atexit_child($lei);
}
-sub git {
+# called by LeiOverview::each_smsg_cb
+sub git { $_[0]->{git_tmp} // die 'BUG: caller did not set {git_tmp}' }
+
+sub git_tmp ($) {
my ($self) = @_;
my (%seen, @dirs);
- my $tmp = File::Temp->newdir('lei_xsrch_git-XXXXXXXX', TMPDIR => 1);
- for my $ibx (@{$self->{shard2ibx} // []}) {
- my $d = File::Spec->canonpath($ibx->git->{git_dir});
+ my $tmp = File::Temp->newdir("lei_xsearch_git.$$-XXXX", TMPDIR => 1);
+ for my $ibxish (locals($self)) {
+ my $d = File::Spec->canonpath($ibxish->git->{git_dir});
$seen{$d} //= push @dirs, "$d/objects\n"
}
my $git_dir = $tmp->dirname;
@@ -428,6 +431,11 @@ sub do_query {
# 1031: F_SETPIPE_SZ
fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
}
+ if (!$lei->{opt}->{thread} && locals($self)) { # for query_mset
+ # lei->{git_tmp} is set for wq_wait_old so we don't
+ # delete until all lei2mail + lei_xsearch workers are reaped
+ $lei->{git_tmp} = $self->{git_tmp} = git_tmp($self);
+ }
$self->wq_workers_start('lei_xsearch', $self->{jobs},
$lei->oldset, { lei => $lei });
my $op = delete $lei->{pkt_op_c};
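
The de-duplication idiom in git_tmp can be tried in isolation. This is only a sketch with made-up paths; what happens to @dirs afterwards is outside the hunk shown and is not reproduced here.

```perl
use strict;
use warnings;
use File::Spec ();

# hypothetical git directories; the first two name the same path
my @git_dirs = ('/srv/a.git', '/srv//a.git/', '/srv/b.git');

my (%seen, @dirs);
for my $dir (@git_dirs) {
    my $d = File::Spec->canonpath($dir);
    # push only runs while $seen{$d} is still undefined; its return
    # value (the new element count) then marks the path as seen
    $seen{$d} //= push @dirs, "$d/objects\n";
}
print @dirs;
```

Canonicalizing first means spelling variations of one path collapse into a single entry.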
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 08/10] lei_query: remove uneeded dwaitpid import
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
` (6 preceding siblings ...)
2021-02-04 9:59 ` [PATCH 07/10] lei q: eliminate $not_done temporary git dir hack Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 09/10] lei_xsearch: drop unused imports Eric Wong
2021-02-04 9:59 ` [PATCH 10/10] lei import: initial implementation Eric Wong
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
All process management is handled elsewhere.
---
lib/PublicInbox/LeiQuery.pm | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 6b1aa40c..56350386 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -5,7 +5,6 @@
package PublicInbox::LeiQuery;
use strict;
use v5.10.1;
-use PublicInbox::DS qw(dwaitpid);
sub prep_ext { # externals_each callback
my ($lxs, $exclude, $loc) = @_;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 09/10] lei_xsearch: drop unused imports
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
` (7 preceding siblings ...)
2021-02-04 9:59 ` [PATCH 08/10] lei_query: remove uneeded dwaitpid import Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
2021-02-04 9:59 ` [PATCH 10/10] lei import: initial implementation Eric Wong
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
Reaping is handled by the parent PublicInbox::IPC, and we
have no business using PublicInbox::Import since LeiXSearch
won't write to git directly (it will write via LeiStore).
---
lib/PublicInbox/LeiXSearch.pm | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 2dc44414..daf42098 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -8,9 +8,8 @@ package PublicInbox::LeiXSearch;
use strict;
use v5.10.1;
use parent qw(PublicInbox::LeiSearch PublicInbox::IPC);
-use PublicInbox::DS qw(dwaitpid now);
+use PublicInbox::DS qw(now);
use PublicInbox::PktOp qw(pkt_do);
-use PublicInbox::Import;
use File::Temp 0.19 (); # 0.19 for ->newdir
use File::Spec ();
use PublicInbox::Search qw(xap_terms);
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 10/10] lei import: initial implementation
2021-02-04 9:59 [PATCH 00/10] lei: cleanups + initial import support Eric Wong
` (8 preceding siblings ...)
2021-02-04 9:59 ` [PATCH 09/10] lei_xsearch: drop unused imports Eric Wong
@ 2021-02-04 9:59 ` Eric Wong
9 siblings, 0 replies; 11+ messages in thread
From: Eric Wong @ 2021-02-04 9:59 UTC (permalink / raw)
To: meta
Only tested with .eml files so far, but Maildir + IMAP
will be supported.
---
MANIFEST | 1 +
lib/PublicInbox/IPC.pm | 4 +-
lib/PublicInbox/LEI.pm | 48 ++++++++++++---
lib/PublicInbox/LeiImport.pm | 106 ++++++++++++++++++++++++++++++++++
lib/PublicInbox/LeiStore.pm | 18 ++++++
lib/PublicInbox/LeiXSearch.pm | 18 +-----
t/lei.t | 15 +++++
7 files changed, 184 insertions(+), 26 deletions(-)
create mode 100644 lib/PublicInbox/LeiImport.pm
diff --git a/MANIFEST b/MANIFEST
index 6922f9b1..a11d4106 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -179,6 +179,7 @@ lib/PublicInbox/KQNotify.pm
lib/PublicInbox/LEI.pm
lib/PublicInbox/LeiDedupe.pm
lib/PublicInbox/LeiExternal.pm
+lib/PublicInbox/LeiImport.pm
lib/PublicInbox/LeiOverview.pm
lib/PublicInbox/LeiQuery.pm
lib/PublicInbox/LeiSearch.pm
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 7f5a3f6f..a0e6bfee 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -101,7 +101,7 @@ sub ipc_worker_loop ($$$) {
# starts a worker if Sereal or Storable is installed
sub ipc_worker_spawn {
- my ($self, $ident, $oldset) = @_;
+ my ($self, $ident, $oldset, $fields) = @_;
return unless $enc; # no Sereal or Storable
return if ($self->{-ipc_ppid} // -1) == $$; # idempotent
delete(@$self{qw(-ipc_req -ipc_res -ipc_ppid -ipc_pid)});
@@ -123,6 +123,8 @@ sub ipc_worker_spawn {
# ensure we properly exit even if warn() dies:
my $end = PublicInbox::OnDestroy->new($$, sub { exit(!!$@) });
eval {
+ $fields //= {};
+ local @$self{keys %$fields} = values(%$fields);
my $on_destroy = $self->ipc_atfork_child;
local %SIG = %SIG;
ipc_worker_loop($self, $r_req, $w_res);
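
A small sketch of what "local" on a hash slice does, assuming a plain hash in place of the real IPC object: the extra fields are visible for the dynamic scope of the call and disappear again afterwards. The run_with_fields name and the field value are hypothetical.

```perl
use strict;
use warnings;

my $self = { name => 'worker' };
my $fields = { lei => 'client-state' }; # hypothetical extra field

sub run_with_fields {
    my ($self, $fields) = @_;
    $fields //= {};
    # temporarily assign every key of %$fields into %$self
    local @$self{keys %$fields} = values(%$fields);
    my $v = $self->{lei}; # visible while this scope is active
    return $v;
}

my $inside = run_with_fields($self, $fields);
# keys which did not exist before "local" are deleted on scope exit
my $after = exists $self->{lei} ? 'still set' : 'gone';
print "inside: $inside, after: $after\n";
```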
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 24efb494..682d1bd1 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -160,9 +160,10 @@ our %CMD = ( # sorted in order of importance/use:
'forget-watch' => [ '{WATCH_NUMBER|--prune}', 'stop and forget a watch',
qw(prune) ],
-'import' => [ 'URL_OR_PATHNAME|--stdin',
- 'one-shot import/update from URL or filesystem',
- qw(stdin| offset=i recursive|r exclude=s include=s !flags),
+'import' => [ 'URLS_OR_PATHNAMES...|--stdin',
+ 'one-time import/update from URL or filesystem',
+ qw(stdin| offset=i recursive|r exclude=s include|I=s
+ format|f=s flags!),
],
'config' => [ '[...]', sub {
@@ -194,8 +195,8 @@ our %CMD = ( # sorted in order of importance/use:
# $spec => [@ALLOWED_VALUES (default is first), $description],
# $spec => $description
# "$SUB_COMMAND TAB $spec" => as above
-my $stdin_formats = [ 'IN|auto|raw|mboxrd|mboxcl2|mboxcl|mboxo',
- 'specify message input format' ];
+my $stdin_formats = [ 'MAIL_FORMAT|eml|mboxrd|mboxcl2|mboxcl|mboxo',
+ 'specify message input format' ];
my $ls_format = [ 'OUT|plain|json|null', 'listing output format' ];
my %OPTDESC = (
@@ -240,6 +241,8 @@ my %OPTDESC = (
'q jobs=s' => [ '[SEARCH_JOBS][,WRITER_JOBS]',
'control number of search and writer jobs' ],
+'import format|f=s' => $stdin_formats,
+
'ls-query format|f=s' => $ls_format,
'ls-external format|f=s' => $ls_format,
@@ -319,6 +322,20 @@ sub err ($;@) {
sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
+sub fail_handler ($;$$) {
+ my ($lei, $code, $io) = @_;
+ for my $f (qw(imp lxs l2m)) {
+ my $wq = delete $lei->{$f} or next;
+ $wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
+ }
+ close($io) if $io; # needed to avoid warnings on SIGPIPE
+ $lei->x_it($code // (1 << 8));
+}
+
+sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
+ fail_handler($_[0], 13, delete $_[0]->{1});
+}
+
sub fail ($$;$) {
my ($self, $buf, $exit_code) = @_;
err($self, $buf) if defined $buf;
@@ -340,7 +357,8 @@ sub out ($;@) {
sub puts ($;@) { out(shift, map { "$_\n" } @_) }
sub child_error { # passes non-fatal curl exit codes to user
- my ($self, $child_error) = @_; # child_error is $?
+ my ($self, $child_error, $msg) = @_; # child_error is $?
+ $self->err($msg) if $msg;
if (my $s = $self->{pkt_op_p} // $self->{sock}) {
# send to the parent lei-daemon or to lei(1) client
send($s, "child_error $child_error", MSG_EOR);
@@ -357,9 +375,16 @@ sub note_sigpipe { # triggers sigpipe_handler
}
sub lei_atfork_child {
- my ($self) = @_;
+ my ($self, $persist) = @_;
# we need to explicitly close things which are on stack
- delete $self->{0};
+ if ($persist) {
+ my @io = delete @$self{0,1,2};
+ unless ($self->{oneshot}) {
+ close($_) for @io;
+ }
+ } else {
+ delete $self->{0};
+ }
for (delete @$self{qw(3 sock old_1 au_done)}) {
close($_) if defined($_);
}
@@ -374,7 +399,7 @@ sub lei_atfork_child {
%PATH2CFG = ();
undef $errors_log;
$quit = \&CORE::exit;
- $current_lei = $self; # for SIG{__WARN__}
+ $current_lei = $persist ? undef : $self; # for SIG{__WARN__}
}
sub _help ($;$) {
@@ -606,6 +631,11 @@ sub lei_config {
x_it($self, $?) if $?;
}
+sub lei_import {
+ require PublicInbox::LeiImport;
+ PublicInbox::LeiImport->call(@_);
+}
+
sub lei_init {
my ($self, $dir) = @_;
my $cfg = _lei_cfg($self, 1);
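
x_it and child_error are passed values in the $? format (see "x_it($self, $?) if $?" and the "child_error is $?" comment above). A short sketch of how that format relates to a plain exit code, using nothing beyond core Perl:

```perl
use strict;
use warnings;

# Perl's $? keeps the exit code in the high byte of a 16-bit value
my $exit_code = 1;
my $status = $exit_code << 8;    # what $? holds after "exit 1"
my $decoded = $status >> 8;      # recover the exit code
my $wrong_way = $exit_code >> 8; # shifting right instead yields zero

printf "status=%d decoded=%d wrong_way=%d\n",
    $status, $decoded, $wrong_way;

# cross-check against a real child process running "exit 1"
system($^X, '-e', 'exit 1');
printf "child: \$?=%d exit=%d\n", $?, $? >> 8;
```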
diff --git a/lib/PublicInbox/LeiImport.pm b/lib/PublicInbox/LeiImport.pm
new file mode 100644
index 00000000..4a9af8a7
--- /dev/null
+++ b/lib/PublicInbox/LeiImport.pm
@@ -0,0 +1,106 @@
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# front-end for the "lei import" sub-command
+package PublicInbox::LeiImport;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::IPC);
+use PublicInbox::MboxReader;
+use PublicInbox::Eml;
+
+sub _import_eml { # MboxReader callback
+ my ($eml, $sto, $set_kw) = @_;
+ $sto->ipc_do('set_eml', $eml, $set_kw ? $sto->mbox_keywords($eml) : ());
+}
+
+sub import_done { # EOF callback for main daemon
+ my ($lei) = @_;
+ my $imp = delete $lei->{imp};
+ $imp->wq_wait_old($lei) if $imp;
+ my $wait = $lei->{sto}->ipc_do('done');
+ $lei->dclose;
+}
+
+sub call { # the main "lei import" method
+ my ($cls, $lei, @argv) = @_;
+ my $sto = $lei->_lei_store(1);
+ $sto->write_prepare($lei);
+ $lei->{opt}->{flags} //= 1;
+ my $fmt = $lei->{opt}->{'format'};
+ my $self = $lei->{imp} = bless {}, $cls;
+ return $lei->fail('--format unspecified') if !$fmt;
+ $self->{0} = $lei->{0} if $lei->{opt}->{stdin};
+ my $ops = {
+ '!' => [ $lei->can('fail_handler'), $lei ],
+ 'x_it' => [ $lei->can('x_it'), $lei ],
+ 'child_error' => [ $lei->can('child_error'), $lei ],
+ '' => [ \&import_done, $lei ],
+ };
+ ($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+ my $j = $lei->{opt}->{jobs} // scalar(@argv) || 1;
+ my $nproc = $self->detect_nproc;
+ $j = $nproc if $j > $nproc;
+ $self->wq_workers_start('lei_import', $j, $lei->oldset, {lei => $lei});
+ my $op = delete $lei->{pkt_op_c};
+ delete $lei->{pkt_op_p};
+ $self->wq_do('import_stdin', []) if $self->{0};
+ for my $x (@argv) {
+ $self->wq_do('import_path_url', [], $x);
+ }
+ $self->wq_close(1);
+ $lei->event_step_init; # wait for shutdowns
+ if ($lei->{oneshot}) {
+ while ($op->{sock}) { $op->event_step }
+ }
+}
+
+sub ipc_atfork_child {
+ my ($self) = @_;
+ $self->{lei}->lei_atfork_child;
+ $self->SUPER::ipc_atfork_child;
+}
+
+sub _import_fh {
+ my ($lei, $fh, $x) = @_;
+ my $set_kw = $lei->{opt}->{flags};
+ my $fmt = $lei->{opt}->{'format'};
+ eval {
+ if ($fmt eq 'eml') {
+ my $buf = do { local $/; <$fh> } //
+ return $lei->child_error(1 << 8, <<"");
+ error reading $x: $!
+
+ my $eml = PublicInbox::Eml->new(\$buf);
+ _import_eml($eml, $lei->{sto}, $set_kw);
+ } else { # some mbox
+ my $cb = PublicInbox::MboxReader->can($fmt);
+ $cb or return $lei->child_error(1 << 8, <<"");
+ --format $fmt unsupported for $x
+
+ $cb->(undef, $fh, \&_import_eml, $lei->{sto}, $set_kw);
+ }
+ };
+ $lei->child_error(1 << 8, "$x: $@") if $@;
+}
+
+sub import_path_url {
+ my ($self, $x) = @_;
+ my $lei = $self->{lei};
+ # TODO auto-detect?
+ if (-f $x) {
+ open my $fh, '<', $x or return $lei->child_error(1 << 8, <<"");
+unable to open $x: $!
+
+ _import_fh($lei, $fh, $x);
+ } else {
+ $lei->fail("$x unsupported (TODO)");
+ }
+}
+
+sub import_stdin {
+ my ($self) = @_;
+ _import_fh($self->{lei}, $self->{0}, '<stdin>');
+}
+
+1;
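
The name-based dispatch in _import_fh (looking a reader up with ->can and calling it with an undef invocant) can be sketched roughly as follows. Sketch::Reader, its "lines" method, and read_with are hypothetical; they are not the MboxReader API.

```perl
use strict;
use warnings;

package Sketch::Reader; # hypothetical stand-in for a reader class

sub lines { # calls $cb once per line, loosely like a per-message callback
    my ($cls, $fh, $cb, @arg) = @_;
    while (defined(my $l = <$fh>)) {
        chomp $l;
        $cb->($l, @arg);
    }
}

package main;

sub read_with {
    my ($fmt, $fh, $out) = @_;
    # look the reader up by name; unknown names give an error string
    my $cb = Sketch::Reader->can($fmt) or
        return "--format $fmt unsupported";
    $cb->(undef, $fh, sub { push @{$_[1]}, $_[0] }, $out);
    return; # undef signals success in this sketch
}

open my $fh, '<', \"one\ntwo\n" or die "open: $!";
my @got;
my $err = read_with('lines', $fh, \@got);
my $err2 = read_with('nope', $fh, []);
print "got: @got\n", "error: $err2\n";
```

A lookup like this resolves any method the package can reach, including inherited ones, so a caller may want to check the name against a known list first.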
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index a7d7d953..3a215973 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -17,6 +17,7 @@ use PublicInbox::V2Writable;
use PublicInbox::ContentHash qw(content_hash content_digest);
use PublicInbox::MID qw(mids mids_in);
use PublicInbox::LeiSearch;
+use PublicInbox::MDA;
use List::Util qw(max);
sub new {
@@ -237,4 +238,21 @@ sub done {
die $err if $err;
}
+sub ipc_atfork_child {
+ my ($self) = @_;
+ my $lei = delete $self->{lei};
+ $lei->lei_atfork_child(1) if $lei;
+ $self->SUPER::ipc_atfork_child;
+}
+
+sub write_prepare {
+ my ($self, $lei) = @_;
+ $self->ipc_lock_init;
+ # Mail we import into lei is private, so filtering out the headers
+ # that -mda strips from public mail is not appropriate
+ local @PublicInbox::MDA::BAD_HEADERS = ();
+ $self->ipc_worker_spawn('lei_store', $lei->oldset, { lei => $lei });
+ $lei->{sto} = $self;
+}
+
1;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index daf42098..f8068362 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -392,25 +392,11 @@ sub query_prepare { # called by wq_do
pkt_do($lei->{pkt_op_p}, '.') == 1 or die "do_post_augment trigger: $!"
}
-sub fail_handler ($;$$) {
- my ($lei, $code, $io) = @_;
- for my $f (qw(lxs l2m)) {
- my $wq = delete $lei->{$f} or next;
- $wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
- }
- close($io) if $io; # needed to avoid warnings on SIGPIPE
- $lei->x_it($code // (1 >> 8));
-}
-
-sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
- fail_handler($_[0], 13, delete $_[0]->{1});
-}
-
sub do_query {
my ($self, $lei) = @_;
my $ops = {
- '|' => [ \&sigpipe_handler, $lei ],
- '!' => [ \&fail_handler, $lei ],
+ '|' => [ $lei->can('sigpipe_handler'), $lei ],
+ '!' => [ $lei->can('fail_handler'), $lei ],
'.' => [ \&do_post_augment, $lei ],
'' => [ \&query_done, $lei ],
'mset_progress' => [ \&mset_progress, $lei ],
diff --git a/t/lei.t b/t/lei.t
index a08a6d0d..eb824a30 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -389,6 +389,20 @@ SKIP: {
}; # /SKIP
};
+my $test_import = sub {
+ $cleanup->();
+ ok($lei->(qw(q s:boolean)), 'search miss before import');
+ unlike($out, qr/boolean/i, 'no results, yet');
+ open my $fh, '<', 't/data/0001.patch' or BAIL_OUT $!;
+ ok($lei->([qw(import -f eml -)], undef, { %$opt, 0 => $fh }),
+ 'import single file from stdin');
+ close $fh;
+ ok($lei->(qw(q s:boolean)), 'search hit after import');
+ ok($lei->(qw(import -f eml), 't/data/message_embed.eml'),
+ 'import single file by path');
+ $cleanup->();
+};
+
my $test_lei_common = sub {
$test_help->();
$test_config->();
@@ -396,6 +410,7 @@ my $test_lei_common = sub {
$test_external->();
$test_completion->();
$test_fail->();
+ $test_import->();
};
if ($ENV{TEST_LEI_ONESHOT}) {
^ permalink raw reply related [flat|nested] 11+ messages in thread