* [PATCH 0/2] watch-related fixes for filter users @ 2024-12-30 22:28 Eric Wong 2024-12-30 22:28 ` [PATCH 1/2] watch|mda|purge: Filter::*->scrub is destructive Eric Wong 2024-12-30 22:28 ` [PATCH 2/2] watch: import_eml: avoid Eml dup for non-scrub case Eric Wong 0 siblings, 2 replies; 4+ messages in thread From: Eric Wong @ 2024-12-30 22:28 UTC (permalink / raw) To: meta 1/2 is a bugfix, and 2/2 is a small optimization for common setups (probably immeasurable). Eric Wong (2): watch|mda|purge: Filter::*->scrub is destructive watch: import_eml: avoid Eml dup for non-scrub case lib/PublicInbox/Filter/Vger.pm | 15 +++++----- lib/PublicInbox/InboxWritable.pm | 2 -- lib/PublicInbox/Watch.pm | 50 ++++++++++++-------------------- script/public-inbox-mda | 19 +++++------- script/public-inbox-purge | 12 ++++---- 5 files changed, 39 insertions(+), 59 deletions(-) ^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/2] watch|mda|purge: Filter::*->scrub is destructive 2024-12-30 22:28 [PATCH 0/2] watch-related fixes for filter users Eric Wong @ 2024-12-30 22:28 ` Eric Wong 2024-12-30 22:28 ` [PATCH 2/2] watch: import_eml: avoid Eml dup for non-scrub case Eric Wong 1 sibling, 0 replies; 4+ messages in thread From: Eric Wong @ 2024-12-30 22:28 UTC (permalink / raw) To: meta -watch was failing to properly account for inboxes having different filters and ->scrub behavior during message removal (but not adds). Explicitly duplicate and simplify the -watch code for handling message removal and update the rest of the callers to reflect the standardized destructive behavior of ->scrub. While most filters were destructive anyways, the Vger filter was a notable exception and updated to be destructive. We'll also update some variables from `$mime' uses to `$eml' to reflect the switch from (Email|PublicInbox)::MIME to PublicInbox::Eml years ago. Finally, the Vger filter includes updated comments and uses v5.10.1+ features, now. --- lib/PublicInbox/Filter/Vger.pm | 15 ++++++------- lib/PublicInbox/InboxWritable.pm | 2 -- lib/PublicInbox/Watch.pm | 36 +++++++++++--------------------- script/public-inbox-mda | 19 +++++++---------- script/public-inbox-purge | 12 +++++------ 5 files changed, 33 insertions(+), 51 deletions(-) diff --git a/lib/PublicInbox/Filter/Vger.pm b/lib/PublicInbox/Filter/Vger.pm index 5b3c0277..2202a217 100644 --- a/lib/PublicInbox/Filter/Vger.pm +++ b/lib/PublicInbox/Filter/Vger.pm @@ -1,9 +1,10 @@ -# Copyright (C) 2016-2021 all contributors <meta@public-inbox.org> +# Copyright (C) all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> # Filter for vger.kernel.org list trailer package PublicInbox::Filter::Vger; -use base qw(PublicInbox::Filter::Base); +use v5.10.1; # check regexps before v5.12 +use parent qw(PublicInbox::Filter::Base); use strict; use PublicInbox::Eml; @@ -18,16 +19,16 @@ my $l3 = my $l4 = qr!Please read the FAQ at +http://www\.tux\.org/lkml/!; sub scrub { - my ($self, $mime) = @_; - my $s = $mime->as_string; + my ($self, $eml) = @_; + my $s = $eml->as_string; - # the vger appender seems to only work on the raw string, + # the old vger appender seemed to only work on the raw string, # so in multipart (e.g. GPG-signed) messages, the list trailer # becomes invisible to MIME-aware email clients. if ($s =~ s/$l0\n$l1\n$l2\n$l3\n(?:$l4\n)?\n*\z//os) { - $mime = PublicInbox::Eml->new(\$s); + $_[1] = $eml= PublicInbox::Eml->new(\$s); } - $self->ACCEPT($mime); + $self->ACCEPT($eml); } sub delivery { diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm index b995a8ad..18412cc3 100644 --- a/lib/PublicInbox/InboxWritable.pm +++ b/lib/PublicInbox/InboxWritable.pm @@ -129,7 +129,6 @@ sub _each_maildir_eml { if ($self && (my $filter = $self->filter($im))) { my $ret = $filter->scrub($eml) or return; return if $ret == REJECT(); - $eml = $ret; } $im->add($eml); } @@ -153,7 +152,6 @@ sub _mbox_eml_cb { # MboxReader->mbox* callback if ($filter) { my $ret = $filter->scrub($eml) or return; return if $ret == REJECT(); - $eml = $ret; } $im->add($eml); } diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm index 5cde1d80..c270f587 100644 --- a/lib/PublicInbox/Watch.pm +++ b/lib/PublicInbox/Watch.pm @@ -176,30 +176,19 @@ sub remove_eml_i { # each_inbox callback eval { # try to avoid taking a lock or unnecessary spawning my $im = $self->{importers}->{"$ibx"}; - my $scrubbed; + my @eml = ($eml); + if (my $filter = $ibx->filter($im)) { + my $tmp = PublicInbox::Eml->new(\($eml->as_string)); + my $ret = $filter->scrub($tmp, 1); + push @eml, $tmp if $ret && $ret != REJECT; + } + if ((!$im || !$im->active) && $ibx->over) { - if (content_exists($ibx, $eml)) { - # continue - } elsif (my $scrub = $ibx->filter($im)) { - $scrubbed = $scrub->scrub($eml, 1); - if ($scrubbed && $scrubbed != REJECT && - !content_exists($ibx, $scrubbed)) { - return; - } - } else { - return; - } + @eml = grep { content_exists($ibx, $_) } @eml or return; } $im //= _importer_for($self, $ibx); # may spawn fast-import - $im->remove($eml, 'spam'); - $scrubbed //= do { - my $scrub = $ibx->filter($im); - $scrub ? $scrub->scrub($eml, 1) : undef; - }; - if ($scrubbed && $scrubbed != REJECT) { - $im->remove($scrubbed, 'spam'); - } + $im->remove($_, 'spam') for @eml; }; if ($@) { warn "error removing spam at: $loc from $ibx->{name}: $@\n"; @@ -232,10 +221,9 @@ sub import_eml ($$$) { } eval { my $im = _importer_for($self, $ibx); - if (my $scrub = $ibx->filter($im)) { - my $scrubbed = $scrub->scrub($eml) or return; - $scrubbed == REJECT and return; - $eml = $scrubbed; + if (my $filter = $ibx->filter($im)) { + my $ret = $filter->scrub($eml) or return; + $ret == REJECT and return; } $im->add($eml, $self->{spamcheck}); }; diff --git a/script/public-inbox-mda b/script/public-inbox-mda index 74202912..9d68ea35 100755 --- a/script/public-inbox-mda +++ b/script/public-inbox-mda @@ -120,26 +120,21 @@ my @rejects; for my $ibx (@$dests) { mda_filter_adjust($ibx); my $filter = $ibx->filter; - my $mime = PublicInbox::Eml->new($str); - my $ret = $filter->delivery($mime); - if (ref($ret) && ($ret->isa('PublicInbox::Eml') || - $ret->isa('Email::MIME'))) { # filter altered message - $mime = $ret; - } elsif ($ret == PublicInbox::Filter::Base::IGNORE) { - next; # nothing, keep looping - } elsif ($ret == PublicInbox::Filter::Base::REJECT) { + my $eml = PublicInbox::Eml->new($str); + my $ret = $filter->delivery($eml) or next; # $ret == IGNORE + if ($ret == PublicInbox::Filter::Base::REJECT) { push @rejects, $filter->err; next; - } + } # else success - PublicInbox::MDA->set_list_headers($mime, $ibx); + PublicInbox::MDA->set_list_headers($eml, $ibx); my $im = $ibx->importer(0); - if (defined $im->add($mime)) { + if (defined $im->add($eml)) { # ->abort is idempotent, no emergency if a single # destination succeeds $emm->abort; } else { # v1-only - my $mid = $mime->header_raw('Message-ID'); + my $mid = $eml->header_raw('Message-ID'); # this message is similar to what ssoma-mda shows: print STDERR "CONFLICT: Message-ID: $mid exists\n"; } diff --git a/script/public-inbox-purge b/script/public-inbox-purge index 618cfec4..381f58cd 100755 --- a/script/public-inbox-purge +++ b/script/public-inbox-purge @@ -38,16 +38,16 @@ PublicInbox::Eml::strip_from($data); my $n_purged = 0; foreach my $ibx (@ibxs) { - my $mime = PublicInbox::Eml->new($data); + my $eml = PublicInbox::Eml->new($data); my $v2w = PublicInbox::V2Writable->new($ibx, 0); - my $commits = $v2w->purge($mime) || []; + my $commits = $v2w->purge($eml) || []; - if (my $scrub = $ibx->filter($v2w)) { - my $scrubbed = $scrub->scrub($mime, 1); + if (my $filter = $ibx->filter($v2w)) { + my $ret = $filter->scrub($eml, 1); # destructive - if ($scrubbed && $scrubbed != REJECT()) { - my $scrub_commits = $v2w->purge($scrubbed); + if ($ret && $ret != REJECT) { + my $scrub_commits = $v2w->purge($eml); push @$commits, @$scrub_commits if $scrub_commits; } } ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] watch: import_eml: avoid Eml dup for non-scrub case 2024-12-30 22:28 [PATCH 0/2] watch-related fixes for filter users Eric Wong 2024-12-30 22:28 ` [PATCH 1/2] watch|mda|purge: Filter::*->scrub is destructive Eric Wong @ 2024-12-30 22:28 ` Eric Wong 2025-01-01 23:11 ` [PATCH 3/2] watch: don't propagate header changes to other inboxes Eric Wong 1 sibling, 1 reply; 4+ messages in thread From: Eric Wong @ 2024-12-30 22:28 UTC (permalink / raw) To: meta Most inboxes don't use filters, so the destructive ->scrub won't be called and we can avoid duplicating the Eml object in the common case. We'll also drop the last inbox optimization for net_cb since it's likely unnecessary given that filters are not very common. This may be more noticeable for users who map multiple inboxes to a single Maildir|IMAP|NNTP source. --- lib/PublicInbox/Watch.pm | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm index c270f587..a37038b5 100644 --- a/lib/PublicInbox/Watch.pm +++ b/lib/PublicInbox/Watch.pm @@ -222,8 +222,10 @@ sub import_eml ($$$) { eval { my $im = _importer_for($self, $ibx); if (my $filter = $ibx->filter($im)) { - my $ret = $filter->scrub($eml) or return; + my $tmp = PublicInbox::Eml->new(\($eml->as_string)); + my $ret = $filter->scrub($tmp) or return; $ret == REJECT and return; + $eml = $tmp; } $im->add($eml, $self->{spamcheck}); }; @@ -263,9 +265,9 @@ sub _try_path ($$) { $warn_cb->($pfx, "path: $path\n", @_); }; return _remove_spam($self, $path) if $inboxes eq 'watchspam'; + my $eml = eml_from_path($path) or return; for my $ibx (@$inboxes) { - my $eml = eml_from_path($path) or next; - import_eml($self, $ibx, $eml); # $eml may be scrubbed + import_eml($self, $ibx, $eml); } 1; } @@ -323,13 +325,9 @@ sub net_cb { # NetReader::(nntp|imap)_each callback return if grep(/\Adraft\z/, @$kw); local $self->{cur_uid} = $art; # IMAP UID or NNTP article if (ref($inboxes)) { - my @ibx = @$inboxes; - my $last = pop @ibx; - for my $ibx (@ibx) { - my $tmp = PublicInbox::Eml->new(\($eml->as_string)); - import_eml($self, $ibx, $tmp); + for my $ibx (@$inboxes) { + import_eml($self, $ibx, $eml); } - import_eml($self, $last, $eml); } elsif ($inboxes eq 'watchspam') { if ($uri->scheme =~ /\Aimaps?\z/ && !grep(/\Aseen\z/, @$kw)) { return; ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 3/2] watch: don't propagate header changes to other inboxes 2024-12-30 22:28 ` [PATCH 2/2] watch: import_eml: avoid Eml dup for non-scrub case Eric Wong @ 2025-01-01 23:11 ` Eric Wong 0 siblings, 0 replies; 4+ messages in thread From: Eric Wong @ 2025-01-01 23:11 UTC (permalink / raw) To: meta Message-IDs may be appended by v2 inboxes for messages with with conflicting Message-IDs or SpamAssassin (or another spam filter) can change headers of both v1 and v2 messages. So we rely on `local' to reach into Eml internals to localize ->header_set modifications and rely on modern Perl having copy-on-write scalars to minimize memory traffic in the common case. --- MANIFEST | 1 + lib/PublicInbox/Watch.pm | 4 ++ t/eml.t | 9 +++++ t/watch_v1_v2_mix_no_modify.t | 74 +++++++++++++++++++++++++++++++++++ 4 files changed, 88 insertions(+) create mode 100644 t/watch_v1_v2_mix_no_modify.t diff --git a/MANIFEST b/MANIFEST index b0b4f71c..30adab80 100644 --- a/MANIFEST +++ b/MANIFEST @@ -641,6 +641,7 @@ t/watch_maildir.t t/watch_maildir_v2.t t/watch_mh.t t/watch_multiple_headers.t +t/watch_v1_v2_mix_no_modify.t t/www_altid.t t/www_listing.t t/www_static.t diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm index a37038b5..eb6eb85f 100644 --- a/lib/PublicInbox/Watch.pm +++ b/lib/PublicInbox/Watch.pm @@ -210,6 +210,10 @@ sub _remove_spam { sub import_eml ($$$) { my ($self, $ibx, $eml) = @_; + # v2 may add a new Message-ID header on conflicts w/ different content + # and SpamAssassin may alter headers. $copy is CoW in newer Perls. + local $eml->{hdr} = \(my $copy = ${$eml->{hdr}}); + # any header match means it's eligible for the inbox: if (my $watch_hdrs = $ibx->{-watchheaders}) { my $ok; diff --git a/t/eml.t b/t/eml.t index 690ada57..d63f7a3b 100644 --- a/t/eml.t +++ b/t/eml.t @@ -28,6 +28,15 @@ sub mime_load ($) { is($eml->as_string, "a: b\n\nhi\n", '->as_string'); my $empty = PublicInbox::Eml->new("\n\n"); is($empty->as_string, "\n\n", 'empty message'); + + # we use this pattern in -watch for writing an eml to multiple inboxes + my $in = do { # n.b. `my' must be used to assign a local var + local $eml->{hdr} = \(my $copy = ${$eml->{hdr}}); + $eml->header_set(qw(Message-ID <a@example> <b@example>)); + ${$eml->{hdr}}; + }; + like $in, qr/<a\@example>.*?<b\@example>/s, 'Message-ID set in copy'; + is $eml->header_raw('Message-ID'), undef, 'local $eml->{hdr} works'; } for my $cls (@classes) { diff --git a/t/watch_v1_v2_mix_no_modify.t b/t/watch_v1_v2_mix_no_modify.t new file mode 100644 index 00000000..759fd0bb --- /dev/null +++ b/t/watch_v1_v2_mix_no_modify.t @@ -0,0 +1,74 @@ +#!perl -w +# Copyright (C) all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use v5.12; +use autodie; +use PublicInbox::TestCommon; +require_mods 'v2'; +use PublicInbox::InboxIdle; +use PublicInbox::DS qw(now); +use PublicInbox::IO qw(write_file); +use PublicInbox::Eml; +my $tmpdir = tmpdir; +local $ENV{HOME} = "$tmpdir"; +local $ENV{PI_CONFIG} = "$tmpdir/.public-inbox/config"; +my $msg = <<'EOM'; +From: a@example.com +To: v1@example.com, v2@example.com +Subject: test +Message-ID: <a@example> + +hi +EOM + +for my $v (2, 1) { + run_script(['-init', "-V$v", "v$v", '-Lbasic', "$tmpdir/v$v", + "https://example.com/v$v", "v$v\@example.com" ]) or + xbail "init -V$v"; + write_file '>>', $ENV{PI_CONFIG}, "watch = maildir:$tmpdir/md$v"; + mkdir "$tmpdir/md$v/$_" for ('', qw(cur tmp new)); +} +my $orig = "$tmpdir/md2/cur/initial:2,S"; +write_file '>', $orig, $msg; + +my $delivered = 0; +my $cb = sub { + my ($ibx) = @_; + diag "message delivered to `$ibx->{name}'"; + $delivered++; +}; +my $ii = PublicInbox::InboxIdle->new(my $cfg = PublicInbox::Config->new); +my $obj = bless \$cb, 'PublicInbox::TestCommon::InboxWakeup'; +$cfg->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) }); +my $exp = now + 10; +local @PublicInbox::DS::post_loop_do = (sub { $delivered < 1 || now > $exp}); + +my $rdr; +open $rdr->{2}, '>>', undef; +my $w = start_script(['-watch'], undef, $rdr); +diag 'waiting for -watch to import 2 existing messages..'; +PublicInbox::DS::add_timer 10, \&PublicInbox::Config::noop; +PublicInbox::DS::event_loop; +is $delivered, 1, 'delivered initial message'; + +$exp = now + 10; +PublicInbox::DS::add_timer 10, \&PublicInbox::Config::noop; +local @PublicInbox::DS::post_loop_do = (sub { $delivered < 3 || now > $exp }); +my $tmpmsg = "$tmpdir/md2/tmp/conflict:2,S"; +write_file '>', $tmpmsg, $msg, "hi\n"; +link $tmpmsg, "$tmpdir/md1/cur/conflict:2,S"; +rename $tmpmsg, "$tmpdir/md2/cur/conflict:2,S"; +PublicInbox::DS::event_loop; +is $delivered, 3, 'delivered new conflicting message to v2 and new to v1'; + +my $v1ibx = $cfg->lookup('v1@example.com'); +my $v1msgs = $v1ibx->over->recent; +is scalar(@$v1msgs), 1, 'only one message in v1'; +my $v1eml = PublicInbox::Eml->new($v1ibx->git->cat_file($v1msgs->[0]->{blob})); +my @mid = $v1eml->header_raw('Message-ID'); +is_deeply \@mid, [ qw[<a@example>] ], 'only one Message-ID in v1 message'; + +my $v2msgs = $cfg->lookup('v2@example.com')->over->recent; +is scalar(@$v2msgs), 2, 'both messages in v2'; + +done_testing; ^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-01-01 23:11 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-30 22:28 [PATCH 0/2] watch-related fixes for filter users Eric Wong 2024-12-30 22:28 ` [PATCH 1/2] watch|mda|purge: Filter::*->scrub is destructive Eric Wong 2024-12-30 22:28 ` [PATCH 2/2] watch: import_eml: avoid Eml dup for non-scrub case Eric Wong 2025-01-01 23:11 ` [PATCH 3/2] watch: don't propagate header changes to other inboxes Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).