Mostly stuff around -watch; I've finally decided NNTP and IMAP
client support aren't as horrible-to-configure as I was imagining.
Well, maybe; at least I'm finding it somewhat useful...

One problem I've been trying to avoid is having excessive,
overwhelming amounts of documentation. I tend to get overwhelmed
myself when learning new things, too. Moving the watch-only stuff
out of the config manpage seems like a step in the right direction
in that regard.

Eric Wong (8):
  watchmaildir: ensure I:/W:/E: prefixes in warnings
  imaptracker: preserve WAL journal_mode if set by user
  overidx: inline create_ghost sub
  doc: document graceful shutdown signals
  doc: speling fickses
  watch: imap: only remove \Seen spam
  doc: move watch config docs to -watch manpage
  doc: watch: expand on NNTP and IMAP-specific knobs

 Documentation/public-inbox-config.pod |  38 ++--------
 Documentation/public-inbox-edit.pod   |   2 +-
 Documentation/public-inbox-purge.pod  |   2 +-
 Documentation/public-inbox-tuning.pod |   2 +-
 Documentation/public-inbox-watch.pod  | 100 +++++++++++++++++++++++---
 lib/PublicInbox/IMAPTracker.pm        |   7 +-
 lib/PublicInbox/OverIdx.pm            |  23 +++---
 lib/PublicInbox/WatchMaildir.pm       |  31 +++++---
 t/over.t                              |   4 +-
 9 files changed, 135 insertions(+), 74 deletions(-)
For consistency in output, any URL/path-context-dependent
prefixes should have the same prefix as the actual warning
which triggered it.
---
 lib/PublicInbox/WatchMaildir.pm | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 2ba10a9e..768e0efe 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -198,7 +198,10 @@ sub _try_path {
 		return;
 	}
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
-	local $SIG{__WARN__} = sub { $warn_cb->("path: $path\n", @_) };
+	local $SIG{__WARN__} = sub {
+		my $pfx = ($_[0] // '') =~ /^([A-Z]: )/g ? $1 : '';
+		$warn_cb->($pfx, "path: $path\n", @_);
+	};
 	if (!ref($inboxes) && $inboxes eq 'watchspam') {
 		return _remove_spam($self, $path);
 	}
@@ -443,8 +446,9 @@ sub imap_fetch_all ($$$) {
 	my ($uids, $batch);
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	local $SIG{__WARN__} = sub {
+		my $pfx = ($_[0] // '') =~ /^([A-Z]: )/g ? $1 : '';
 		$batch //= '?';
-		$warn_cb->("$url UID:$batch\n", @_);
+		$warn_cb->("$pfx$url UID:$batch\n", @_);
 	};
 	my $err;
 	do {
@@ -875,7 +879,8 @@ sub nntp_fetch_all ($$$) {
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	my ($err, $art);
 	local $SIG{__WARN__} = sub {
-		$warn_cb->("$url ", $art ? ("ARTICLE $art") : (), "\n", @_);
+		my $pfx = ($_[0] // '') =~ /^([A-Z]: )/g ? $1 : '';
+		$warn_cb->("$pfx$url ", $art ? ("ARTICLE $art") : (), "\n", @_);
 	};
 	my $inboxes = $self->{nntp}->{$url};
 	my $last_art;
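The prefix propagation can be tried on its own. The script below is a hypothetical stand-alone sketch, not code from WatchMaildir.pm. The `warn_prefix` helper and the `/tmp/example` path are invented for illustration. It wraps `$SIG{__WARN__}` the same way `_try_path` does, and shows that a `W:` warning produces a `W:`-prefixed context line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the "X: " severity prefix (e.g. "I: ",
# "W: ", "E: ") of a warning message, or '' if it has none.
sub warn_prefix {
	my ($first) = @_;
	return ($first // '') =~ /\A([A-Z]: )/ ? $1 : '';
}

my @seen; # collects what the outer warn handler receives
my $warn_cb = sub { push @seen, join('', @_) };

{
	my $path = '/tmp/example'; # hypothetical watched path
	local $SIG{__WARN__} = sub {
		# copy the severity prefix onto the context line
		$warn_cb->(warn_prefix($_[0]), "path: $path\n", @_);
	};
	warn "W: unable to parse message\n";
}

print $seen[0];
# prints:
# W: path: /tmp/example
# W: unable to parse message
```

Because the context line now carries the same `W:` as the warning itself, filtering log output by severity keeps both lines together.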
It's no problem for most users to enable WAL here, since
there's only a single process doing both reading and writing
(unlike the read-only daemons). However, WAL doesn't work
on network filesystems, so it can't be enabled by default.
---
 lib/PublicInbox/IMAPTracker.pm | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
index 102a74ce..92f21584 100644
--- a/lib/PublicInbox/IMAPTracker.pm
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -29,7 +29,12 @@ sub dbh_new ($) {
 		sqlite_use_immediate_transaction => 1,
 	});
 	$dbh->{sqlite_unicode} = 1;
-	$dbh->do('PRAGMA journal_mode = TRUNCATE');
+
+	# TRUNCATE reduces I/O compared to the default (DELETE).
+	# Allow and preserve user-overridden WAL, but don't force it.
+	my $jm = $dbh->selectrow_array('PRAGMA journal_mode');
+	$dbh->do('PRAGMA journal_mode = TRUNCATE') if $jm ne 'wal';
+
 	create_tables($dbh);
 	$dbh;
 }
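The journal-mode rule comes down to one comparison. This is a minimal sketch. It assumes the current mode string has already been read with `$dbh->selectrow_array('PRAGMA journal_mode')`, as in the patch. The helper name `journal_mode_stmt` is hypothetical. It returns the PRAGMA to run, or `undef` when a user-chosen WAL should be left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: given the journal mode SQLite currently reports,
# return the statement to execute, or undef to keep the existing mode.
sub journal_mode_stmt {
	my ($current) = @_;
	# SQLite reports modes in lowercase ("delete", "wal", ...);
	# lc() is only a safety net in this sketch.
	return undef if lc($current // '') eq 'wal'; # keep user-set WAL
	return 'PRAGMA journal_mode = TRUNCATE'; # less I/O than DELETE
}

for my $mode (qw(delete truncate wal)) {
	my $stmt = journal_mode_stmt($mode);
	printf "%-8s => %s\n", $mode, $stmt // '(unchanged)';
}
```

Checking the current mode first matters because SQLite's WAL mode persists in the database file across connections. A user who switches the tracker database to WAL once would otherwise see it reset to TRUNCATE on the next start.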
There's no need for this to be a separate sub since there's
only a single caller. This saves a few kilobytes at least in
short-lived processes.
---
 lib/PublicInbox/OverIdx.pm | 23 ++++++++++-------------
 t/over.t                   |  4 ++--
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 67f8cf65..6f0477f0 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -184,23 +184,20 @@ sub resolve_mid_to_tid {
 	if (my $del = delete $self->{-ghosts_to_delete}) {
 		delete_by_num($self, $_) for @$del;
 	}
-	$tid // create_ghost($self, $mid);
-}
-
-sub create_ghost {
-	my ($self, $mid) = @_;
-	my $id = mid2id($self, $mid);
-	my $num = next_ghost_num($self);
-	$num < 0 or die "ghost num is non-negative: $num\n";
-	my $tid = next_tid($self);
-	my $dbh = $self->{dbh};
-	$dbh->prepare_cached(<<'')->execute($num, $tid);
+	$tid // do { # create a new ghost
+		my $id = mid2id($self, $mid);
+		my $num = next_ghost_num($self);
+		$num < 0 or die "ghost num is non-negative: $num\n";
+		$tid = next_tid($self);
+		my $dbh = $self->{dbh};
+		$dbh->prepare_cached(<<'')->execute($num, $tid);
 INSERT INTO over (num, tid) VALUES (?,?)
 
-	$dbh->prepare_cached(<<'')->execute($id, $num);
+		$dbh->prepare_cached(<<'')->execute($id, $num);
 INSERT INTO id2num (id, num) VALUES (?,?)
 
-	$tid;
+		$tid;
+	};
 }
 
 sub merge_threads {
diff --git a/t/over.t b/t/over.t
index 41c13872..4c8f8098 100644
--- a/t/over.t
+++ b/t/over.t
@@ -33,9 +33,9 @@ $over->dbh;
 is($over->sid('hello-world'), $x, 'idempotent across reopen');
 $over->each_by_mid('never', sub { fail('should not be called') });
-$x = $over->create_ghost('never');
+$x = $over->resolve_mid_to_tid('never');
 is(int($x), $x, 'integer tid for ghost');
 
-$y = $over->create_ghost('NEVAR');
+$y = $over->resolve_mid_to_tid('NEVAR');
 is($y, $x + 1, 'integer tid for ghost increases');
 
 my $ddd = compress('');
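The `$tid // do { ... }` construct works because a `do` block evaluates to its last statement, so the inlined code still returns the new thread ID. The sketch below replaces the SQLite tables with hypothetical in-memory hashes (`%mid2tid`, `%ghost_num`) and counters. It only shows the control flow and the property `t/over.t` checks: each new ghost gets the next integer tid, and a known Message-ID resolves to its existing tid.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the over/id2num tables.
my %mid2tid;          # Message-ID => thread ID
my %ghost_num;        # Message-ID => (negative) ghost article number
my $next_tid = 1;
my $next_ghost = -1;  # ghosts use negative article numbers

sub resolve_mid_to_tid_sketch {
	my ($mid) = @_;
	my $tid = $mid2tid{$mid};
	$tid // do { # create a new ghost
		my $num = $next_ghost--;
		$num < 0 or die "ghost num is non-negative: $num\n";
		$tid = $next_tid++;
		$ghost_num{$mid} = $num;
		$mid2tid{$mid} = $tid;
		$tid; # value of the do block, returned by the sub
	};
}

my $x = resolve_mid_to_tid_sketch('never');
my $y = resolve_mid_to_tid_sketch('NEVAR');
print "$x $y\n"; # prints "1 2"
```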
Same as the read-only daemons.
---
 Documentation/public-inbox-watch.pod | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/public-inbox-watch.pod b/Documentation/public-inbox-watch.pod
index bf3c9bd4..34e8c4f2 100644
--- a/Documentation/public-inbox-watch.pod
+++ b/Documentation/public-inbox-watch.pod
@@ -93,6 +93,11 @@ Reload the config file (default: ~/.public-inbox/config)
 Rescan all watched mailboxes. This is done automatically after
 startup.
 
+=item SIGQUIT / SIGTERM / SIGINT
+
+Gracefully shut down. In-flight messages will be stored
+and indexed.
+
 =back
 
 =head1 ENVIRONMENT
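A common way to get "finish in-flight work, then exit" behaviour is a signal handler that only sets a flag, which the processing loop checks between messages. This mirrors the `last if $self->{quit};` check in the IMAP fetch loop. The sketch below is hypothetical and not taken from public-inbox-watch. It sends itself SIGTERM partway through a queue to simulate an operator.

```perl
use strict;
use warnings;

my $quit = 0;
# The handler only records the request; the loop decides when to stop.
$SIG{$_} = sub { $quit = 1 } for qw(QUIT TERM INT);

my @queue = (1 .. 5); # hypothetical pending messages
my @stored;
while (defined(my $msg = shift @queue)) {
	push @stored, $msg; # the in-flight message is always finished
	kill('TERM', $$) if $msg == 2; # simulate `kill -TERM <pid>`
	last if $quit;
}
print 'stored: ', join(' ', @stored), "\n";
```

Keeping the handler trivial avoids doing I/O or git work inside signal context. Each message is either fully stored or not started.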
---
 Documentation/public-inbox-edit.pod   | 2 +-
 Documentation/public-inbox-purge.pod  | 2 +-
 Documentation/public-inbox-tuning.pod | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/public-inbox-edit.pod b/Documentation/public-inbox-edit.pod
index 3853fa9c..68180872 100644
--- a/Documentation/public-inbox-edit.pod
+++ b/Documentation/public-inbox-edit.pod
@@ -102,7 +102,7 @@ edited data remaining indexed.
 Incremental L<public-inbox-index(1)> (without C<--reindex>)
 is fine.
 
-Keep in mind this is a last resort, as it will be distruptive
+Keep in mind this is a last resort, as it will be disruptive
 to anyone using L<git(1)> to mirror the inbox being edited.
 
 =head1 CONTACT
diff --git a/Documentation/public-inbox-purge.pod b/Documentation/public-inbox-purge.pod
index e20e18df..a9479657 100644
--- a/Documentation/public-inbox-purge.pod
+++ b/Documentation/public-inbox-purge.pod
@@ -62,7 +62,7 @@ purged data remaining indexed.
 Incremental L<public-inbox-index(1)> (without C<--reindex>)
 is fine.
 
-Keep in mind this is a last resort, as it will be distruptive
+Keep in mind this is a last resort, as it will be disruptive
 to anyone using L<git(1)> to mirror the inbox being purged.
 
 =head1 CONTACT
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index e3f2899b..b4e7698b 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -131,7 +131,7 @@ for parallelism.
 
 The open file descriptor limit (C<RLIMIT_NOFILE>, C<ulimit -n> in L<sh(1)>,
 C<LimitNOFILE=> in L<systemd.exec(5)>) may need to be raised to
-accomodate many concurrent clients.
+accommodate many concurrent clients.
 
 Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
 increases memory use of client sockets, sure to account for that in
This matches the behavior of Maildir `watchspam' handling
in not removing unseen messages. NNTP can't match this
behavior, since NNTP servers don't store flags; clients do.
---
 lib/PublicInbox/WatchMaildir.pm | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 768e0efe..4ae400f7 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -382,8 +382,8 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 	$mic;
 }
 
-sub imap_import_msg ($$$$) {
-	my ($self, $url, $uid, $raw) = @_;
+sub imap_import_msg ($$$$$) {
+	my ($self, $url, $uid, $raw, $flags) = @_;
 	# our target audience expects LF-only, save storage
 	$$raw =~ s/\r\n/\n/sg;
 
@@ -394,10 +394,13 @@ sub imap_import_msg ($$$$) {
 			my $x = import_eml($self, $ibx, $eml);
 		}
 	} elsif ($inboxes eq 'watchspam') {
-		local $SIG{__WARN__} = warn_ignore_cb();
-		my $eml = PublicInbox::Eml->new($raw);
-		my $arg = [ $self, $eml, "$url UID:$uid" ];
-		$self->{config}->each_inbox(\&remove_eml_i, $arg);
+		# we don't remove unseen messages
+		if ($flags =~ /\\Seen\b/) {
+			local $SIG{__WARN__} = warn_ignore_cb();
+			my $eml = PublicInbox::Eml->new($raw);
+			my $arg = [ $self, $eml, "$url UID:$uid" ];
+			$self->{config}->each_inbox(\&remove_eml_i, $arg);
+		}
 	} else {
 		die "BUG: destination unknown $inboxes";
 	}
@@ -474,7 +477,7 @@ sub imap_fetch_all ($$$) {
 			my @batch = splice(@$uids, 0, $bs);
 			$batch = join(',', @batch);
 			local $0 = "UID:$batch $mbx $sec";
-			my $r = $mic->fetch_hash($batch, $req);
+			my $r = $mic->fetch_hash($batch, $req, 'FLAGS');
 			unless ($r) { # network error?
 				$err = "E: $url UID FETCH $batch error: $!";
 				last;
@@ -483,7 +486,8 @@ sub imap_fetch_all ($$$) {
 				# messages get deleted, so holes appear
 				my $per_uid = delete $r->{$uid} // next;
 				my $raw = delete($per_uid->{$key}) // next;
-				imap_import_msg($self, $url, $uid, \$raw);
+				my $fl = $per_uid->{FLAGS} // '';
+				imap_import_msg($self, $url, $uid, \$raw, $fl);
 				$last_uid = $uid;
 				last if $self->{quit};
 			}
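The new condition depends on the FLAGS string fetched alongside each message. This is a minimal sketch. Like the patch, it assumes the flags arrive as a space-separated string such as `\Seen \Flagged`. The helper name `should_hide_spam` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patch: only messages carrying the
# IMAP \Seen system flag are treated as confirmed spam to hide.
sub should_hide_spam {
	my ($flags) = @_;
	return ($flags // '') =~ /\\Seen\b/ ? 1 : 0;
}

for my $fl ('\\Seen', '\\Seen \\Flagged', '\\Flagged', '') {
	printf "%-18s => %s\n", "'$fl'", should_hide_spam($fl) ? 'hide' : 'skip';
}
```

The `\b` keeps a longer keyword that merely starts with `\Seen` from matching. NNTP has no server-side equivalent of this flag, which is why the commit message says NNTP cannot match this behavior.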
The -config manpage is a bit long and the -watch stuff is
isolated from the rest of it while we start documenting NNTP
and IMAP support.

I'm not entirely happy with the way IMAP and NNTP are
configured, but it's still good enough for small setups.

This also fixes a long-standing misplaced comment about
`publicinboxwatch.spamcheck' affecting all configured inboxes;
that comment was actually for `publicinboxwatch.watchspam'.

We'll omit documenting NNTP for `watchspam' for now, given the
lack of \Seen flags in NNTP, and I'm not sure it's even useful.
There may not be any newsgroups for sharing confirmed spam,
either...
---
 Documentation/public-inbox-config.pod | 38 ++---------------
 Documentation/public-inbox-watch.pod  | 61 ++++++++++++++++++++++-----
 2 files changed, 55 insertions(+), 44 deletions(-)

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index 1dfb926e..2d845f16 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -74,26 +74,11 @@ Default: none, optional
 
 =item publicinbox.<name>.watch
 
-A location for L<public-inbox-watch(1)> to watch. Currently,
-only C<maildir:> paths are supported:
-
-	[publicinbox "test"]
-		watch = maildir:/path/to/maildirs/.INBOX.test/
-
-Default: none; only for L<public-inbox-watch(1)> users
+See L<public-inbox-watch(1)>
 
 =item publicinbox.<name>.watchheader
 
-	[publicinbox "test"]
-		watchheader = List-Id:<test.example.com>
-
-If specified, L<public-inbox-watch(1)> will only process mail
-matching the given header. If specified multiple times in
-public-inbox 1.5 or later, mail will be processed if it matches
-any of the values. Only the last value was used in public-inbox
-1.4 and earlier.
-
-Default: none; only for L<public-inbox-watch(1)> users
+See L<public-inbox-watch(1)>
 
 =item publicinbox.<name>.listid
 
@@ -204,26 +189,11 @@ Default: spamc
 
 =item publicinboxwatch.spamcheck
 
-This may be set to C<spamc> to enable the use of SpamAssassin
-L<spamc(1)> for filtering spam before it is imported into git
-history. Other spam filtering backends may be supported in
-the future.
-
-This requires L<public-inbox-watch(1)>, but affects all configured
-public-inboxes in PI_CONFIG.
-
-Default: none
+See L<public-inbox-watch(1)>
 
 =item publicinboxwatch.watchspam
 
-A Maildir to watch for confirmed spam messages to appear in.
-Messages which appear in this folder with the (S)een Maildir flag
-will be hidden from all configured inboxes based on Message-ID
-and content matching.
-
-Messages without the (S)een Maildir flag are not considered for hiding.
-
-Default: none; only for L<public-inbox-watch(1)> users
+See L<public-inbox-watch(1)>
 
 =item publicinbox.nntpserver
 
diff --git a/Documentation/public-inbox-watch.pod b/Documentation/public-inbox-watch.pod
index 34e8c4f2..b07d0fb5 100644
--- a/Documentation/public-inbox-watch.pod
+++ b/Documentation/public-inbox-watch.pod
@@ -35,8 +35,8 @@ In ~/.public-inbox/config:
 
 =head1 DESCRIPTION
 
-public-inbox-watch allows watching a mailbox (currently only
-Maildir) for the arrival of new messages and automatically
+public-inbox-watch allows watching a mailbox or newsgroup
+for the arrival of new messages and automatically
 importing them into public-inbox git repositories and indices.
 public-inbox-watch is useful in situations when a user wishes to
 mirror an existing mailing list, but has no access to run
@@ -48,11 +48,9 @@ of large Maildirs.
 Upon startup, it scans the mailbox for new messages to be
 imported while it was not running.
 
-Currently, only Maildirs are supported.
-
-For now, IMAP users should use tools such as L<mbsync(1)>
-or L<offlineimap(1)> to bidirectionally sync their IMAP
-folders to Maildirs for public-inbox-watch.
+As of public-inbox 1.6.0, Maildirs, IMAP folders, and NNTP
+newsgroups are supported. Previous versions of public-inbox
+only supported Maildirs.
 
 public-inbox-watch should be run inside a L<screen(1)> session
 or as a L<systemd(1)> service. Errors are emitted to stderr.
@@ -64,21 +62,64 @@ public-inbox-watch takes no command-line options.
 =head1 CONFIGURATION
 
 These configuration knobs should be used in the
-L<public-inbox-config(5)>
+L<public-inbox-config(5)> file
 
 =over 8
 
 =item publicinbox.<name>.watch
 
+A location to watch. public-inbox 1.5.0 and earlier only supported
+C<maildir:> paths:
+
+	[publicinbox "test"]
+		watch = maildir:/path/to/maildirs/.INBOX.test/
+
+public-inbox 1.6.0 supports C<nntp://>, C<nntps://>,
+C<imap://> and C<imaps://> URLs:
+
+	watch = nntp://news.example.com/inbox.test.group
+	watch = imaps://mail.example.com/INBOX.test.foo
+
+Default: none
+
 =item publicinbox.<name>.watchheader
 
+	[publicinbox "test"]
+		watchheader = List-Id:<test.example.com>
+
+If specified, L<public-inbox-watch(1)> will only process mail
+matching the given header. If specified multiple times in
+public-inbox 1.5 or later, mail will be processed if it matches
+any of the values. Only the last value was used in public-inbox
+1.4 and earlier.
+
+Default: none
+
 =item publicinboxwatch.spamcheck
 
+This may be set to C<spamc> to enable the use of SpamAssassin
+L<spamc(1)> for filtering spam before it is imported into git
+history. Other spam filtering backends may be supported in
+the future.
+
+Default: none
+
 =item publicinboxwatch.watchspam
 
-=back
+A Maildir to watch for confirmed spam messages to appear in.
+Messages which appear in this folder with the (S)een flag
+will be hidden from all configured inboxes based on Message-ID
+and content matching.
+
+Messages without the (S)een flag are not considered for hiding.
+This hiding affects all configured public-inboxes in PI_CONFIG.
+
+As with C<publicinbox.$NAME.watch>, C<imap://> and C<imaps://> URLs
+are supported in public-inbox 1.6.0.
 
-See L<public-inbox-config(5)> for documentation on them.
+Default: none; only for L<public-inbox-watch(1)> users
+
+=back
 
 =head1 SIGNALS
 
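The `watch` values documented above come in a few forms, each identified by its prefix. The sketch below is a hypothetical classifier, not public-inbox's own parser. It only illustrates which forms the 1.6.0 manpage lists. It also shows that `watchspam` is documented for Maildir and IMAP but not NNTP.

```perl
use strict;
use warnings;

# Hypothetical: classify a watch location by its documented prefix.
sub watch_kind {
	my ($loc) = @_;
	return 'maildir' if $loc =~ /\Amaildir:/;
	return lc($1) if $loc =~ m!\A(imaps?|nntps?)://!i;
	return undef; # not a documented form
}

# Per the manpage, watchspam accepts Maildir and IMAP(S), not NNTP.
sub watchspam_supported {
	my $kind = watch_kind($_[0]) // return 0;
	return $kind =~ /\A(?:maildir|imaps?)\z/ ? 1 : 0;
}

for my $loc ('maildir:/path/to/maildirs/.INBOX.test/',
		'nntp://news.example.com/inbox.test.group',
		'imaps://mail.example.com/INBOX.test.foo') {
	my $kind = watch_kind($loc) // 'unknown';
	printf "%-8s watchspam:%-3s %s\n", $kind,
		watchspam_supported($loc) ? 'yes' : 'no', $loc;
}
```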
There are a few more, but maybe they're too esoteric to be worth
documenting at the moment (batch sizes, timeouts, etc.).
---
 Documentation/public-inbox-watch.pod | 36 +++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/public-inbox-watch.pod b/Documentation/public-inbox-watch.pod
index b07d0fb5..39b8ac06 100644
--- a/Documentation/public-inbox-watch.pod
+++ b/Documentation/public-inbox-watch.pod
@@ -78,7 +78,12 @@ public-inbox 1.6.0 supports C<nntp://>, C<nntps://>,
 C<imap://> and C<imaps://> URLs:
 
 	watch = nntp://news.example.com/inbox.test.group
-	watch = imaps://mail.example.com/INBOX.test.foo
+	watch = imaps://user@mail.example.com/INBOX.test.foo
+
+This may be specified multiple times to combine several mailboxes
+into a single public-inbox. URLs requiring authentication
+will require L<netrc(5)> and/or L<git-credential(1)> to fill
+in the username and password.
 
 Default: none
 
@@ -119,6 +124,35 @@ are supported in public-inbox 1.6.0.
 
 Default: none; only for L<public-inbox-watch(1)> users
 
+=item imap.Starttls / imap.$URL.Starttls
+
+Whether or not to use C<STARTTLS> on plain C<imap://> connections.
+
+May be specified for certain URLs via L<git-config(1)/--get-urlmatch>
+in C<git(1)> 1.8.5+.
+
+Default: C<true>
+
+=item imap.Compress / imap.$URL.Compress
+
+Whether or not to use the IMAP COMPRESS (RFC4978) extension to
+save bandwidth. This is not supported by all IMAP servers and
+some advertising this feature may not implement it correctly.
+
+May be specified only for certain URLs if L<git(1)> 1.8.5+ is
+installed to use L<git-config(1)/--get-urlmatch>
+
+Default: C<false>
+
+=item nntp.Starttls / nntp.$URL.Starttls
+
+Whether or not to use C<STARTTLS> on plain C<nntp://> connections.
+
+May be specified for certain URLs via L<git-config(1)/--get-urlmatch>
+in C<git(1)> 1.8.5+.
+
+Default: C<false> if the hostname is a Tor C<.onion>, C<true> otherwise
+
 =back
 
 =head1 SIGNALS
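The documented `nntp.Starttls` default depends only on the hostname. This is a minimal sketch of that rule. The helper `nntp_starttls` is hypothetical. It takes the configured value (if any, as git-config would report it) and the host. git-config(1) accepts a few more boolean spellings than this sketch handles.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the documented default:
# STARTTLS is off for Tor .onion hosts and on for everything else,
# unless nntp.Starttls (or nntp.$URL.Starttls) says otherwise.
sub nntp_starttls {
	my ($configured, $host) = @_;
	if (defined $configured) {
		return 1 if $configured =~ /\A(?:true|yes|on|1)\z/i;
		return 0 if $configured =~ /\A(?:false|no|off|0)\z/i;
		die "not a boolean: $configured\n";
	}
	return $host =~ /\.onion\z/i ? 0 : 1;
}

printf "%s\n", nntp_starttls(undef, 'news.example.com'); # prints 1
```

Skipping STARTTLS for `.onion` hosts is a reasonable default because Tor onion services already encrypt the connection end to end. An explicit setting always wins.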
Eric Wong <e@yhbt.net> wrote:
> --- a/Documentation/public-inbox-watch.pod
> +++ b/Documentation/public-inbox-watch.pod
> @@ -78,7 +78,12 @@ public-inbox 1.6.0 supports C<nntp://>, C<nntps://>,
>  C<imap://> and C<imaps://> URLs:
> 
>  	watch = nntp://news.example.com/inbox.test.group
> -	watch = imaps://mail.example.com/INBOX.test.foo
> +	watch = imaps://user@mail.example.com/INBOX.test.foo
> +

That exceeds 80 columns (I only ran "make check-run", not
"make check" :x). Will squash this in:

diff --git a/Documentation/public-inbox-watch.pod b/Documentation/public-inbox-watch.pod
index 39b8ac06..f3e622b0 100644
--- a/Documentation/public-inbox-watch.pod
+++ b/Documentation/public-inbox-watch.pod
@@ -78,7 +78,7 @@ public-inbox 1.6.0 supports C<nntp://>, C<nntps://>,
 C<imap://> and C<imaps://> URLs:
 
 	watch = nntp://news.example.com/inbox.test.group
-	watch = imaps://user@mail.example.com/INBOX.test.foo
+	watch = imaps://user@mail.example.com/INBOX.test
 
 This may be specified multiple times to combine several mailboxes
 into a single public-inbox. URLs requiring authentication