From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00,
	URIBL_BLOCKED shortcircuit=no autolearn=ham autolearn_force=no
	version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 90D031FA12;
	Mon, 31 Aug 2020 04:41:41 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Cc: Eric Wong
Subject: [PATCH 05/11] watch: avoid unnecessary spawning on spam removals
Date: Mon, 31 Aug 2020 04:41:34 +0000
Message-Id: <20200831044140.17027-6-e@80x24.org>
In-Reply-To: <20200831044140.17027-1-e@80x24.org>
References: <20200831044140.17027-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

From: Eric Wong

This should further mitigate lock contention problems when
-watch is configured to watch a Maildir for spam while
performing a large NNTP import.

There is now a small risk that a message won't get removed if
it's in the current (uncommitted) fast-import batch, but that is
unlikely given the batch size is now only 10 messages.  If that
small window is hit, flipping the \Seen flag (e.g. marking it
unread, and then read again) will trigger another removal
attempt via IMAP or Maildir.
---
 lib/PublicInbox/Import.pm     |  3 +++
 lib/PublicInbox/V2Writable.pm |  3 +++
 lib/PublicInbox/Watch.pm      | 31 +++++++++++++++++++++++++------
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 700b4026..ee5ca2ea 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -461,6 +461,9 @@ sub init_bare {
 	}
 }
 
+# true if locked and active
+sub active { !!$_[0]->{out} }
+
 sub done {
 	my ($self) = @_;
 	my $w = delete $self->{out} or return;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index f2288904..553dd839 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -655,6 +655,9 @@ sub checkpoint ($;$) {
 # public
 sub barrier { checkpoint($_[0], 1) };
 
+# true if locked and active
+sub active { !!$_[0]->{im} }
+
 # public
 sub done {
 	my ($self) = @_;
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index 5f786139..0bb92d0a 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -134,15 +134,34 @@ sub _done_for_now {
 sub remove_eml_i { # each_inbox callback
 	my ($ibx, $arg) = @_;
 	my ($self, $eml, $loc) = @$arg;
+
 	eval {
-		my $im = _importer_for($self, $ibx);
-		$im->remove($eml, 'spam');
-		if (my $scrub = $ibx->filter($im)) {
-			my $scrubbed = $scrub->scrub($eml, 1);
-			if ($scrubbed && $scrubbed != REJECT) {
-				$im->remove($scrubbed, 'spam');
+		# try to avoid taking a lock or unnecessary spawning
+		my $im = $self->{importers}->{"$ibx"};
+		my $scrubbed;
+		if ((!$im || !$im->active) && $ibx->over) {
+			if (content_exists($ibx, $eml)) {
+				# continue
+			} elsif (my $scrub = $ibx->filter($im)) {
+				$scrubbed = $scrub->scrub($eml, 1);
+				if ($scrubbed && $scrubbed != REJECT &&
+					  !content_exists($ibx, $scrubbed)) {
+					return;
+				}
+			} else {
+				return;
 			}
 		}
+
+		$im //= _importer_for($self, $ibx); # may spawn fast-import
+		$im->remove($eml, 'spam');
+		$scrubbed //= do {
+			my $scrub = $ibx->filter($im);
+			$scrub ? $scrub->scrub($eml, 1) : undef;
+		};
+		if ($scrubbed && $scrubbed != REJECT) {
+			$im->remove($scrubbed, 'spam');
+		}
 	};
 	if ($@) {
 		warn "error removing spam at: $loc from $ibx->{name}: $@\n";