* [PATCH 0/5] watch: various bugfixes and fairness improvements @ 2017-06-26 3:15 Eric Wong 2017-06-26 3:15 ` [PATCH 1/5] watch: ensure HUP causes the scanner to be reloaded Eric Wong ` (4 more replies) 0 siblings, 5 replies; 7+ messages in thread From: Eric Wong @ 2017-06-26 3:15 UTC (permalink / raw) To: meta A fairly large reworking of "public-inbox-watch" to improve fairness when processing large directory scans. There are also some minor bugfixes for signal handling. Eric Wong (5): watch: ensure HUP causes the scanner to be reloaded spamc: retry on EINTR watch: improve fairness during full rescans watch: use "self-inotify-tempfile trick" for quit watch: commit changes to fast-import sooner lib/PublicInbox/Spamcheck/Spamc.pm | 6 +- lib/PublicInbox/WatchMaildir.pm | 110 +++++++++++++++++++++++++++++-------- script/public-inbox-watch | 9 ++- t/watch_maildir.t | 12 ++-- 4 files changed, 105 insertions(+), 32 deletions(-) -- EW ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/5] watch: ensure HUP causes the scanner to be reloaded 2017-06-26 3:15 [PATCH 0/5] watch: various bugfixes and fairness improvements Eric Wong @ 2017-06-26 3:15 ` Eric Wong 2017-06-26 3:15 ` [PATCH 2/5] spamc: retry on EINTR Eric Wong ` (3 subsequent siblings) 4 siblings, 0 replies; 7+ messages in thread From: Eric Wong @ 2017-06-26 3:15 UTC (permalink / raw) To: meta Otherwise the old watcher may run indefinitely --- lib/PublicInbox/WatchMaildir.pm | 4 +++- script/public-inbox-watch | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm index c15e138..f81a917 100644 --- a/lib/PublicInbox/WatchMaildir.pm +++ b/lib/PublicInbox/WatchMaildir.pm @@ -179,6 +179,8 @@ sub _try_path { $im->add($mime, $self->{spamcheck}); } +sub quit { $_[0]->{quit} = 1 } + sub watch { my ($self) = @_; my $cb = sub { _try_fsn_paths($self, \@_) }; @@ -188,7 +190,7 @@ sub watch { # in the future... require Filesys::Notify::Simple; my $watcher = Filesys::Notify::Simple->new($mdir); - $watcher->wait($cb) while (1); + $watcher->wait($cb) until ($self->{quit}); } sub scan { diff --git a/script/public-inbox-watch b/script/public-inbox-watch index bb65592..a72180c 100755 --- a/script/public-inbox-watch +++ b/script/public-inbox-watch @@ -8,6 +8,7 @@ use PublicInbox::Config; my ($config, $watch_md); my $reload = sub { $config = PublicInbox::Config->new; + $watch_md->quit if $watch_md; $watch_md = PublicInbox::WatchMaildir->new($config); }; $reload->(); @@ -17,5 +18,5 @@ if ($watch_md) { $SIG{USR1} = $scan; $SIG{ALRM} = sub { $SIG{ALRM} = 'DEFAULT'; $scan->() }; alarm(1); - $watch_md->watch; + $watch_md->watch while ($watch_md); } -- EW ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/5] spamc: retry on EINTR 2017-06-26 3:15 [PATCH 0/5] watch: various bugfixes and fairness improvements Eric Wong 2017-06-26 3:15 ` [PATCH 1/5] watch: ensure HUP causes the scanner to be reloaded Eric Wong @ 2017-06-26 3:15 ` Eric Wong 2017-06-26 3:15 ` [PATCH 3/5] watch: improve fairness during full rescans Eric Wong ` (2 subsequent siblings) 4 siblings, 0 replies; 7+ messages in thread From: Eric Wong @ 2017-06-26 3:15 UTC (permalink / raw) To: meta Signals can fire on us at any time if we're using blocking sysread. --- lib/PublicInbox/Spamcheck/Spamc.pm | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/Spamcheck/Spamc.pm b/lib/PublicInbox/Spamcheck/Spamc.pm index 30eec95..ba8e44a 100644 --- a/lib/PublicInbox/Spamcheck/Spamc.pm +++ b/lib/PublicInbox/Spamcheck/Spamc.pm @@ -29,10 +29,14 @@ sub spamcheck { my $buf = ''; $out = \$buf; } +again: do { $r = sysread($fh, $$out, 65536, length($$out)); } while (defined($r) && $r != 0); - defined $r or die "read failed: $!"; + unless (defined $r) { + goto again if $!{EINTR}; + die "read failed: $!"; + } close $fh or die "close failed: $!"; waitpid($pid, 0); ($? || $$out eq '') ? 0 : 1; -- EW ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/5] watch: improve fairness during full rescans 2017-06-26 3:15 [PATCH 0/5] watch: various bugfixes and fairness improvements Eric Wong 2017-06-26 3:15 ` [PATCH 1/5] watch: ensure HUP causes the scanner to be reloaded Eric Wong 2017-06-26 3:15 ` [PATCH 2/5] spamc: retry on EINTR Eric Wong @ 2017-06-26 3:15 ` Eric Wong 2017-06-26 4:01 ` Eric Wong 2017-06-26 3:15 ` [PATCH 4/5] watch: use "self-inotify-tempfile trick" for quit Eric Wong 2017-06-26 3:15 ` [PATCH 5/5] watch: commit changes to fast-import sooner Eric Wong 4 siblings, 1 reply; 7+ messages in thread From: Eric Wong @ 2017-06-26 3:15 UTC (permalink / raw) To: meta We need to ensure new messages are being processed fairly during full rescans, so have the ->scan subroutine yield and reschedule itself. Additionally, having a long-running task inside the signal handler is dangerous and subject to reentrancy bugs. Due to the limitations of the Filesys::Notify::Simple interface, we cannot rely on multiplexing I/O interfaces (select, IO::Poll, Danga::Socket, etc...) for this. Forking a separate process was considered, but it is more expensive for a mostly-idle process. So, we use a variant of the "self-pipe trick" via inotify (or whatever Filesys::Notify::Simple gives us). Instead of writing to our own pipe, we write to a file in our own temporary directory watched by Filesys::Notify::Simple to trigger events in signal handlers. --- lib/PublicInbox/WatchMaildir.pm | 99 +++++++++++++++++++++++++++++++---------- script/public-inbox-watch | 2 +- t/watch_maildir.t | 12 ++--- 3 files changed, 82 insertions(+), 31 deletions(-) diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm index f81a917..fc42ec4 100644 --- a/lib/PublicInbox/WatchMaildir.pm +++ b/lib/PublicInbox/WatchMaildir.pm @@ -13,25 +13,27 @@ use PublicInbox::Git; use PublicInbox::Import; use PublicInbox::MDA; use PublicInbox::Spawn qw(spawn); +use File::Temp qw//; sub new { my ($class, $config) = @_; - my (%mdmap, @mdir, $spamc); + my (%mdmap, @mdir, $spamc, $spamdir); # "publicinboxwatch" is the documented namespace # "publicinboxlearn" is legacy but may be supported # indefinitely... foreach my $pfx (qw(publicinboxwatch publicinboxlearn)) { my $k = "$pfx.watchspam"; - if (my $spamdir = $config->{$k}) { - if ($spamdir =~ s/\Amaildir://) { - $spamdir =~ s!/+\z!!; + if (my $dir = $config->{$k}) { + if ($dir =~ s/\Amaildir://) { + $dir =~ s!/+\z!!; # skip "new", no MUA has seen it, yet. - my $cur = "$spamdir/cur"; + my $cur = "$dir/cur"; + $spamdir = $cur; push @mdir, $cur; $mdmap{$cur} = 'watchspam'; } else { - warn "unsupported $k=$spamdir\n"; + warn "unsupported $k=$dir\n"; } } } @@ -77,22 +79,41 @@ sub new { $mdre = qr!\A($mdre)/!; bless { spamcheck => $spamcheck, + spamdir => $spamdir, mdmap => \%mdmap, mdir => \@mdir, mdre => $mdre, config => $config, importers => {}, + opendirs => {}, # dirname => dirhandle (in progress scans) }, $class; } sub _done_for_now { - $_->done foreach values %{$_[0]->{importers}}; + my ($self) = @_; + my $opendirs = $self->{opendirs}; + + # spamdir scanning means every importer remains open + my $spamdir = $self->{spamdir}; + return if defined($spamdir) && $opendirs->{$spamdir}; + + foreach my $im (values %{$self->{importers}}) { + # not done if we're scanning + next if $opendirs->{$im->{git}->{git_dir}}; + $im->done; + } } sub _try_fsn_paths { - my ($self, $paths) = @_; - _try_path($self, $_->{path}) foreach @$paths; - _done_for_now($self); + my ($self, $scan_re, $paths) = @_; + foreach (@$paths) { + my $path = $_->{path}; + if ($path =~ $scan_re) { + scan($self, $path); + } else { + _try_path($self, $path); + } + } } sub _remove_spam { @@ -183,31 +204,61 @@ sub quit { $_[0]->{quit} = 1 } sub watch { my ($self) = @_; - my $cb = sub { _try_fsn_paths($self, \@_) }; - my $mdir = $self->{mdir}; + my $scan = File::Temp->newdir("public-inbox-watch.$$.scan.XXXXXX", + TMPDIR => 1); + my $scandir = $self->{scandir} = $scan->dirname; + my $re = qr!\A$scandir/!; + my $cb = sub { _try_fsn_paths($self, $re, \@_) }; # lazy load here, we may support watching via IMAP IDLE # in the future... require Filesys::Notify::Simple; - my $watcher = Filesys::Notify::Simple->new($mdir); - $watcher->wait($cb) until ($self->{quit}); + my $fsn = Filesys::Notify::Simple->new([@{$self->{mdir}}, $scandir]); + $fsn->wait($cb) until ($self->{quit}); +} + +sub trigger_scan { + my ($self, $base) = @_; + my $dir = $self->{scandir} or die "not watch-ing, yet\n"; + open my $fh, '>', "$dir/$base" or die "open $dir/$base failed: $!\n"; + close $fh or die "close $dir/$base failed: $!\n"; } sub scan { - my ($self) = @_; - my $mdir = $self->{mdir}; - foreach my $dir (@$mdir) { - my $ok = opendir(my $dh, $dir); - unless ($ok) { - warn "failed to open $dir: $!\n"; - next; - } + my ($self, $path) = @_; + my $max = 10; + my $opendirs = $self->{opendirs}; + my @dirnames = keys %$opendirs; + foreach my $dir (@dirnames) { + my $dh = delete $opendirs->{$dir}; + my $n = $max; while (my $fn = readdir($dh)) { _try_path($self, "$dir/$fn"); + last if --$n < 0; } - closedir $dh; + $opendirs->{$dir} = $dh if $n < 0; + } + if ($path =~ /full\z/) { + foreach my $dir (@{$self->{mdir}}) { + next if $opendirs->{$dir}; # already in progress + my $ok = opendir(my $dh, $dir); + unless ($ok) { + warn "failed to open $dir: $!\n"; + next; + } + my $n = $max; + while (my $fn = readdir($dh)) { + _try_path($self, "$dir/$fn"); + last if --$n < 0; + } + $opendirs->{$dir} = $dh if $n < 0; + } + } + if (keys %$opendirs) { # do we have more work to do? + trigger_scan($self, 'cont'); + } else { + _done_for_now($self); } - _done_for_now($self); } sub _path_to_mime { diff --git a/script/public-inbox-watch b/script/public-inbox-watch index a72180c..51f1baa 100755 --- a/script/public-inbox-watch +++ b/script/public-inbox-watch @@ -13,7 +13,7 @@ my $reload = sub { }; $reload->(); if ($watch_md) { - my $scan = sub { $watch_md->scan if $watch_md }; + my $scan = sub { $watch_md->trigger_scan('full') if $watch_md }; $SIG{HUP} = $reload; $SIG{USR1} = $scan; $SIG{ALRM} = sub { $SIG{ALRM} = 'DEFAULT'; $scan->() }; diff --git a/t/watch_maildir.t b/t/watch_maildir.t index 3969c80..e12e083 100644 --- a/t/watch_maildir.t +++ b/t/watch_maildir.t @@ -42,7 +42,7 @@ my $config = PublicInbox::Config->new({ "publicinboxlearn.watchspam" => "maildir:$spamdir", }); -PublicInbox::WatchMaildir->new($config)->scan; +PublicInbox::WatchMaildir->new($config)->scan('full'); my $git = PublicInbox::Git->new($git_dir); my @list = $git->qx(qw(rev-list refs/heads/master)); is(scalar @list, 1, 'one revision in rev-list'); @@ -59,7 +59,7 @@ my $write_spam = sub { }; $write_spam->(); is(unlink(glob("$maildir/new/*")), 1, 'unlinked old spam'); -PublicInbox::WatchMaildir->new($config)->scan; +PublicInbox::WatchMaildir->new($config)->scan('full'); @list = $git->qx(qw(rev-list refs/heads/master)); is(scalar @list, 2, 'two revisions in rev-list'); @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master)); @@ -72,7 +72,7 @@ To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo\@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html\n); PublicInbox::Emergency->new($maildir)->prepare(\$msg); - PublicInbox::WatchMaildir->new($config)->scan; + PublicInbox::WatchMaildir->new($config)->scan('full'); @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master)); is(scalar @list, 1, 'tree has one file'); my $mref = $git->cat_file('HEAD:'.$list[0]); @@ -80,7 +80,7 @@ More majordomo info at http://vger.kernel.org/majordomo-info.html\n); is(unlink(glob("$maildir/new/*")), 1, 'unlinked spam'); $write_spam->(); - PublicInbox::WatchMaildir->new($config)->scan; + PublicInbox::WatchMaildir->new($config)->scan('full'); @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master)); is(scalar @list, 0, 'tree is empty'); @list = $git->qx(qw(rev-list refs/heads/master)); @@ -96,7 +96,7 @@ More majordomo info at http://vger.kernel.org/majordomo-info.html\n); $config->{'publicinboxwatch.spamcheck'} = 'spamc'; { local $SIG{__WARN__} = sub {}; # quiet spam check warning - PublicInbox::WatchMaildir->new($config)->scan; + PublicInbox::WatchMaildir->new($config)->scan('full'); } @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master)); is(scalar @list, 0, 'tree has no files spamc checked'); @@ -111,7 +111,7 @@ More majordomo info at http://vger.kernel.org/majordomo-info.html\n); PublicInbox::Emergency->new($maildir)->prepare(\$msg); $config->{'publicinboxwatch.spamcheck'} = 'spamc'; @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master)); - PublicInbox::WatchMaildir->new($config)->scan; + PublicInbox::WatchMaildir->new($config)->scan('full'); @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master)); is(scalar @list, 1, 'tree has one file after spamc checked'); -- EW ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3/5] watch: improve fairness during full rescans 2017-06-26 3:15 ` [PATCH 3/5] watch: improve fairness during full rescans Eric Wong @ 2017-06-26 4:01 ` Eric Wong 0 siblings, 0 replies; 7+ messages in thread From: Eric Wong @ 2017-06-26 4:01 UTC (permalink / raw) To: meta Eric Wong <e@80x24.org> wrote: > sub _try_fsn_paths { > - my ($self, $paths) = @_; > - _try_path($self, $_->{path}) foreach @$paths; > - _done_for_now($self); > + my ($self, $scan_re, $paths) = @_; > + foreach (@$paths) { > + my $path = $_->{path}; > + if ($path =~ $scan_re) { > + scan($self, $path); > + } else { > + _try_path($self, $path); > + } > + } That "_done_for_now($self);" call should not have been eliminated. Fixed in the version pushed out to public-inbox.git ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 4/5] watch: use "self-inotify-tempfile trick" for quit 2017-06-26 3:15 [PATCH 0/5] watch: various bugfixes and fairness improvements Eric Wong ` (2 preceding siblings ...) 2017-06-26 3:15 ` [PATCH 3/5] watch: improve fairness during full rescans Eric Wong @ 2017-06-26 3:15 ` Eric Wong 2017-06-26 3:15 ` [PATCH 5/5] watch: commit changes to fast-import sooner Eric Wong 4 siblings, 0 replies; 7+ messages in thread From: Eric Wong @ 2017-06-26 3:15 UTC (permalink / raw) To: meta This should be more reliable and safer as it'll ensure existing fast-import instances are shut down properly. --- lib/PublicInbox/WatchMaildir.pm | 12 ++++++++++-- script/public-inbox-watch | 4 ++++ 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm index fc42ec4..b867edc 100644 --- a/lib/PublicInbox/WatchMaildir.pm +++ b/lib/PublicInbox/WatchMaildir.pm @@ -200,7 +200,7 @@ sub _try_path { $im->add($mime, $self->{spamcheck}); } -sub quit { $_[0]->{quit} = 1 } +sub quit { trigger_scan($_[0], 'quit') } sub watch { my ($self) = @_; @@ -214,7 +214,7 @@ sub watch { # in the future... require Filesys::Notify::Simple; my $fsn = Filesys::Notify::Simple->new([@{$self->{mdir}}, $scandir]); - $fsn->wait($cb) until ($self->{quit}); + $fsn->wait($cb) until $self->{quit}; } sub trigger_scan { @@ -226,6 +226,14 @@ sub trigger_scan { sub scan { my ($self, $path) = @_; + if ($path =~ /quit\z/) { + %{$self->{opendirs}} = (); + _done_for_now($self); + $self->{quit} = 1; + return; + } + # else: $path =~ /(cont|full)\z/ + return if $self->{quit}; my $max = 10; my $opendirs = $self->{opendirs}; my @dirnames = keys %$opendirs; diff --git a/script/public-inbox-watch b/script/public-inbox-watch index 51f1baa..0d1cd83 100755 --- a/script/public-inbox-watch +++ b/script/public-inbox-watch @@ -17,6 +17,10 @@ if ($watch_md) { $SIG{HUP} = $reload; $SIG{USR1} = $scan; $SIG{ALRM} = sub { $SIG{ALRM} = 'DEFAULT'; $scan->() }; + $SIG{QUIT} = $SIG{TERM} = $SIG{INT} = sub { + $watch_md->quit if $watch_md; + $watch_md = undef; + }; alarm(1); $watch_md->watch while ($watch_md); } -- EW ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 5/5] watch: commit changes to fast-import sooner 2017-06-26 3:15 [PATCH 0/5] watch: various bugfixes and fairness improvements Eric Wong ` (3 preceding siblings ...) 2017-06-26 3:15 ` [PATCH 4/5] watch: use "self-inotify-tempfile trick" for quit Eric Wong @ 2017-06-26 3:15 ` Eric Wong 4 siblings, 0 replies; 7+ messages in thread From: Eric Wong @ 2017-06-26 3:15 UTC (permalink / raw) To: meta We should make changes visible sooner, even during lengthy scans. --- lib/PublicInbox/WatchMaildir.pm | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm index b867edc..b0ad2a7 100644 --- a/lib/PublicInbox/WatchMaildir.pm +++ b/lib/PublicInbox/WatchMaildir.pm @@ -91,13 +91,18 @@ sub new { sub _done_for_now { my ($self) = @_; + my $importers = $self->{importers}; + foreach my $im (values %$importers) { + $im->done if $im->{nchg}; + } + my $opendirs = $self->{opendirs}; # spamdir scanning means every importer remains open my $spamdir = $self->{spamdir}; return if defined($spamdir) && $opendirs->{$spamdir}; - foreach my $im (values %{$self->{importers}}) { + foreach my $im (values %$importers) { # not done if we're scanning next if $opendirs->{$im->{git}->{git_dir}}; $im->done; @@ -262,11 +267,9 @@ sub scan { $opendirs->{$dir} = $dh if $n < 0; } } - if (keys %$opendirs) { # do we have more work to do? - trigger_scan($self, 'cont'); - } else { - _done_for_now($self); - } + _done_for_now($self); + # do we have more work to do? + trigger_scan($self, 'cont') if keys %$opendirs; } sub _path_to_mime { -- EW ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-06-26 4:01 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-06-26 3:15 [PATCH 0/5] watch: various bugfixes and fairness improvements Eric Wong 2017-06-26 3:15 ` [PATCH 1/5] watch: ensure HUP causes the scanner to be reloaded Eric Wong 2017-06-26 3:15 ` [PATCH 2/5] spamc: retry on EINTR Eric Wong 2017-06-26 3:15 ` [PATCH 3/5] watch: improve fairness during full rescans Eric Wong 2017-06-26 4:01 ` Eric Wong 2017-06-26 3:15 ` [PATCH 4/5] watch: use "self-inotify-tempfile trick" for quit Eric Wong 2017-06-26 3:15 ` [PATCH 5/5] watch: commit changes to fast-import sooner Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).