From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Subject: [PATCH 1/3] learn: support --keep-going/-k switch
Date: Fri, 25 Oct 2024 03:19:55 +0000 [thread overview]
Message-ID: <20241025031957.1305385-2-e@80x24.org> (raw)
In-Reply-To: <20241025031957.1305385-1-e@80x24.org>
Inspired by make(1), this switch allows -learn to work with
config files which contain both read-only and read-write
inboxes.
The remove_or_add() sub is now anonymous and locally-scoped to
facilitate `@fail_ibx' variable sharing use when the entire
script is made into a subroutine for `make check-run' (via
TestCommon->key2sub). A subsequent commit will reduce needless
parameter passing of global variables for readability.
Requested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20241024-nimble-honest-stallion-dad849@lemur/
---
Documentation/public-inbox-learn.pod | 25 +++++++-
MANIFEST | 1 +
script/public-inbox-learn | 95 ++++++++++++++++------------
t/learn.t | 85 +++++++++++++++++++++++++
4 files changed, 164 insertions(+), 42 deletions(-)
create mode 100644 t/learn.t
diff --git a/Documentation/public-inbox-learn.pod b/Documentation/public-inbox-learn.pod
index b08e4bc8..ccb967fa 100644
--- a/Documentation/public-inbox-learn.pod
+++ b/Documentation/public-inbox-learn.pod
@@ -54,9 +54,28 @@ This is similar to the C<spam> command above, but does
not feed the message to L<spamc(1)> and only removes messages
which match on any of the C<To:>, C<Cc:>, and C<List-ID:> headers.
-The C<--all> option may be used to match C<spam> semantics in removing
-the message from all configured inboxes. C<--all> is only
-available in public-inbox 1.6.0+.
+=back
+
+=head1 OPTIONS
+
+=over
+
+=item --all
+
+C<--all> may be used with C<rm> to match C<spam> semantics in
+removing a message from all configured inboxes. C<--all> is
+only available in public-inbox 1.6.0+.
+
+=item --keep-going
+
+=item -k
+
+Continue to other inboxes after errors. This ensures C<spam>
+and C<rm --all> can succeed on all writable inboxes and act
+as a no-op on read-only inboxes.
+
+Useful for dealing with cases where some inboxes in PI_CONFIG
+are read-only.
=back
diff --git a/MANIFEST b/MANIFEST
index 3c4cefb4..6b759487 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -507,6 +507,7 @@ t/io.t
t/ipc.t
t/iso-2202-jp.eml
t/kqnotify.t
+t/learn.t
t/lei-auto-watch.t
t/lei-convert.t
t/lei-daemon.t
diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index a955cdf6..dae4400f 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -1,5 +1,5 @@
#!/usr/bin/perl -w
-# Copyright (C) 2014-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
#
# Used for training spam (via SpamAssassin) and removing messages from a
@@ -16,6 +16,7 @@ required action argument:
options:
--all scan all inboxes on `rm'
+ -k keep going after errors (for read-only inboxes)
See public-inbox-learn(1) man page for full documentation.
EOF
@@ -27,7 +28,7 @@ use PublicInbox::Address;
use PublicInbox::Spamcheck::Spamc;
use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
my %opt = (all => 0);
-GetOptions(\%opt, qw(all help|h)) or die $help;
+GetOptions(\%opt, qw(all keep-going|k help|h)) or die $help;
use PublicInbox::Import;
my $train = shift or die $help;
@@ -40,7 +41,7 @@ my $spamc = PublicInbox::Spamcheck::Spamc->new;
my $pi_cfg = PublicInbox::Config->new;
local $PublicInbox::Import::DROP_UNIQUE_UNSUB;
PublicInbox::Import::load_config($pi_cfg);
-my $err;
+my ($err, @fail_ibx);
my $mime = PublicInbox::Eml->new(do{
my $data = PublicInbox::IO::read_all \*STDIN;
PublicInbox::Eml::strip_from($data);
@@ -52,51 +53,66 @@ my $mime = PublicInbox::Eml->new(do{
} elsif ($train eq 'spam') {
$spamc->spamlearn(\$data);
}
- die "spamc failed with: $?\n" if $?;
+ die "E: spamc failed with: $?\n" if $?;
};
$err = $@;
}
\$data
});
-sub remove_or_add ($$$$) {
+my $ibx_fail = sub {
+ my ($ibx) = @_;
+ my $m = "E: $@ ($ibx->{inboxdir})\n";
+ die $m if !$opt{'keep-going'};
+ warn $m;
+ push @fail_ibx, $ibx;
+};
+
+my $remove_or_add = sub {
my ($ibx, $train, $mime, $addr) = @_;
+ eval {
+ # We do not touch GIT_COMMITTER_* env here so we can track
+ # who trained the message.
+ $ibx->{name} = $ENV{GIT_COMMITTER_NAME} // $ibx->{name};
+ $ibx->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} // $addr;
+ $ibx = PublicInbox::InboxWritable->new($ibx);
+ $ibx->{indexlevel} = $ibx->detect_indexlevel;
+ my $im = $ibx->importer(0);
- # We do not touch GIT_COMMITTER_* env here so we can track
- # who trained the message.
- $ibx->{name} = $ENV{GIT_COMMITTER_NAME} // $ibx->{name};
- $ibx->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} // $addr;
- $ibx = PublicInbox::InboxWritable->new($ibx);
- $ibx->{indexlevel} = $ibx->detect_indexlevel;
- my $im = $ibx->importer(0);
-
- if ($train eq "rm") {
- # This needs to be idempotent, as my inotify trainer
- # may train for each cross-posted message, and this
- # script already learns for every list in
- # ~/.public-inbox/config
- $im->remove($mime, $train);
- } elsif ($train eq "ham") {
- # no checking for spam here, we assume the message has
- # been reviewed by a human at this point:
- PublicInbox::MDA->set_list_headers($mime, $ibx);
-
- # Ham messages are trained when they're marked into
- # a SEEN state, so this is idempotent:
- $im->add($mime);
- }
- $im->done;
-}
+ if ($train eq "rm") {
+ # This needs to be idempotent, as my inotify trainer
+ # may train for each cross-posted message, and this
+ # script already learns for every list in
+ # ~/.public-inbox/config
+ $im->remove($mime, $train);
+ } elsif ($train eq "ham") {
+ # no checking for spam here, we assume the message has
+ # been reviewed by a human at this point:
+ PublicInbox::MDA->set_list_headers($mime, $ibx);
+
+ # Ham messages are trained when they're marked into
+ # a SEEN state, so this is idempotent:
+ $im->add($mime);
+ }
+ $im->done;
+ };
+ $ibx_fail->($ibx) if $@;
+};
-# spam is removed from all known inboxes since it is often Bcc:-ed
-if ($train eq 'spam' || ($train eq 'rm' && $opt{all})) {
- $pi_cfg->each_inbox(sub {
- my ($ibx) = @_;
+my $remove_all = sub { # each_inbox cb
+ my ($ibx) = @_;
+ eval {
$ibx = PublicInbox::InboxWritable->new($ibx);
my $im = $ibx->importer(0);
$im->remove($mime, $train);
$im->done;
- });
+ };
+ $ibx_fail->($ibx) if $@;
+};
+
+# spam is removed from all known inboxes since it is often Bcc:-ed
+if ($train eq 'spam' || ($train eq 'rm' && $opt{all})) {
+ $pi_cfg->each_inbox($remove_all);
} else {
require PublicInbox::MDA;
@@ -114,16 +130,17 @@ if ($train eq 'spam' || ($train eq 'rm' && $opt{all})) {
while (my ($addr, $ibx) = each %dests) {
next unless ref($ibx); # $ibx may be 0
next if $seen{0 + $ibx}++;
- remove_or_add($ibx, $train, $mime, $addr);
+ $remove_or_add->($ibx, $train, $mime, $addr);
}
my $dests = PublicInbox::MDA->inboxes_for_list_id($pi_cfg, $mime);
for my $ibx (@$dests) {
next if $seen{0 + $ibx}++;
- remove_or_add($ibx, $train, $mime, $ibx->{-primary_address});
+ $remove_or_add->($ibx, $train, $mime, $ibx->{-primary_address});
}
}
-if ($err) {
- warn $err;
+if ($err || @fail_ibx) {
+ warn $err if $err;
+ warn 'E: ', scalar(@fail_ibx), " inbox(es) failed\n";
exit 1;
}
diff --git a/t/learn.t b/t/learn.t
new file mode 100644
index 00000000..0c848459
--- /dev/null
+++ b/t/learn.t
@@ -0,0 +1,85 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use v5.12;
+use autodie;
+use PublicInbox::TestCommon;
+plan skip_all => "cannot test $0 as root" if $> == 0;
+require_mods qw(DBD::SQLite);
+my ($v2, $v2ro, @v2cfg);
+my $tmpdir = tmpdir;
+my $eml = eml_load 't/plack-qp.eml';
+my $v1 = create_inbox 'v1', indexlevel => 'basic', tmpdir => "$tmpdir/v1", sub {
+ my ($im) = @_;
+ $im->add($eml);
+};
+
+my $v1ro = create_inbox 'v1ro', indexlevel => 'basic',
+ tmpdir => "$tmpdir/v1ro", sub {
+ my ($im) = @_;
+ $im->add($eml);
+};
+chmod 0500, $v1ro->{inboxdir};
+chmod 0400, glob("$v1ro->{inboxdir}/public-inbox/xapian*/over.sqlite3"),
+ glob("$v1ro->{inboxdir}/public-inbox/msgmap.sqlite3");
+
+SKIP: {
+ require_git v2.6, 1;
+ $v2 = create_inbox 'v2', indexlevel => 'basic', version => 2,
+ tmpdir => "$tmpdir/v2", sub {
+ my ($im, $ibx) = @_;
+ $im->add($eml);
+ };
+ $v2ro = create_inbox 'v2', indexlevel => 'basic', version => 2,
+ tmpdir => "$tmpdir/v2ro", sub {
+ my ($im, $ibx) = @_;
+ $im->add($eml);
+ };
+ chmod 0500, $v2ro->{inboxdir}, "$v2ro->{inboxdir}/git/0.git";
+ chmod 0400, glob("$v2ro->{inboxdir}/xap*/over.sqlite3"),
+ glob("$v2ro->{inboxdir}/msgmap.sqlite3");
+ @v2cfg = (<<EOM);
+[publicinbox "v2ro"]
+ inboxdir = $v2ro->{inboxdir};
+ address = v2ro\@example.com
+ indexlevel = basic
+[publicinbox "v2"]
+ inboxdir = $v2->{inboxdir};
+ address = v2\@example.com
+ indexlevel = basic
+EOM
+}
+
+my $cfg = cfg_new $tmpdir, <<EOM, @v2cfg;
+[publicinbox "v1ro"]
+ inboxdir = $v1ro->{inboxdir}
+ address = v1ro\@example.com
+ indexlevel = basic
+[publicinbox "v1"]
+ inboxdir = $v1->{inboxdir}
+ address = v1\@example.com
+ indexlevel = basic
+EOM
+
+my $opt = {
+ 0 => \($eml->as_string),
+ 1 => \(my $out),
+ 2 => \(my $err),
+};
+my $env = { PI_CONFIG => $cfg->{-f} };
+
+run_script [ qw(-learn rm --all -k) ], $env, $opt;
+isnt $?, 0, 'learn $? is non-zero';
+is 0, $v1->over->max, 'removed from r/w v1';
+is 1, $v1ro->over->max, 'not removed from r/o v1';
+my $nr = 1;
+SKIP: {
+ require_git v2.6, 1;
+ is 0, $v2->over->max, 'removed from r/w v2';
+ is 1, $v2ro->over->max, 'not removed from r/o v2';
+ $nr = 2;
+}
+
+like $err, qr/E: $nr inbox\(es\) failed/, 'failures noted in stderr';
+is $out, '', 'stdout is empty';
+
+done_testing;
next prev parent reply other threads:[~2024-10-25 3:19 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-25 3:19 [PATCH 0/3] learn: support --keep-going + cleanups Eric Wong
2024-10-25 3:19 ` Eric Wong [this message]
2024-10-25 3:19 ` [PATCH 2/3] learn: reduce parameter passing Eric Wong
2024-10-25 3:19 ` [PATCH 3/3] learn: use Perl 5.12 Eric Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://public-inbox.org/README
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241025031957.1305385-2-e@80x24.org \
--to=e@80x24.org \
--cc=konstantin@linuxfoundation.org \
--cc=meta@public-inbox.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).