unofficial mirror of meta@public-inbox.org
* [PATCH 00/34] watch: add IMAP and NNTP support
@ 2020-06-27 10:03 Eric Wong
  2020-06-27 10:03 ` [PATCH 01/34] inboxwritable: ensure ssoma.lock exists on init Eric Wong
                   ` (34 more replies)
  0 siblings, 35 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Some fairly major changes to -watch.  Filesys::Notify::Simple is
no longer used, and -watch now uses inotify, signalfd or kevent
like the read-only daemons.

Credentials are handled via Net::Netrc (Perl standard library)
or "git-credential", so we do no password storage on our own.
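For readers unfamiliar with "git-credential", here is a minimal
sketch of the key=value request that `git credential fill' reads
on stdin.  The helper name and the example host are hypothetical;
the wrapper this series adds (lib/PublicInbox/GitCredential.pm)
is not shown here and may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not from this series): format the key=value
# request that `git credential fill' reads on stdin.
sub credential_request {
	my (%attr) = @_;
	my $req = '';
	for my $k (qw(protocol host path username)) {
		defined(my $v = $attr{$k}) or next; # skip unset attributes
		die "newline in $k\n" if $v =~ /\n/;
		$req .= "$k=$v\n";
	}
	$req . "\n"; # a blank line ends the request
}

print credential_request(protocol => 'imaps', host => 'mail.example.com',
	username => 'user');
```

The resulting text would be piped to `git credential fill', which
asks the configured helper (or the terminal) for the password, so
nothing needs to be stored by the caller.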

NNTP (and non-IDLE IMAP) may allow more parallelization in the
future.

One significant project-wide change is getting rid of "use
fields".  It gets in my way more than it helps, and it's
probably alien to a fair number of Perl hackers.  AFAIK, it's
never really been popular outside of Danga::Socket-based
projects.


Eric W. Biederman (1):
  IMAPTracker: Add a helper to track our place in reading imap mailboxes

Eric Wong (33):
  inboxwritable: ensure ssoma.lock exists on init
  inbox: warn on ->on_inbox_unlock exception
  imaptracker: use ~/.local/share/public-inbox/imap.sqlite3
  watchmaildir: hoist out compile_watchheaders
  watchmaildir: fix check for spam vs ham inbox conflicts
  URI IMAP support
  watch: preliminary IMAP support
  kqnotify|fake_inotify: detect Maildir write ops
  watch: remove Filesys::Notify::Simple dependency
  watch: use signalfd for Maildir watching
  ds: remove fields.pm usage
  watch: wire up IMAP IDLE reapers to DS
  watch: support IMAP polling
  config: support ->urlmatch method for -watch
  watch: stop importers before forking
  watch: use UID SEARCH to avoid empty UID FETCH
  ds: add_timer: allow passing arg to callback.
  imaptracker: add {url} field to reduce args
  imaptracker: drop {dbname} field
  watch: avoid long transaction when writing to IMAPTracker
  watch: support imap.fetchBatchSize parameter
  watch: imap: be quieter about disconnecting on quit
  watch: support multiple watch: directives per-inbox
  watch: remove {mdir} array
  watch: just use ->urlmatch
  testcommon: $ENV{TAIL} supports non-@ARGV redirects
  watch: add NNTP support
  watch: show user-specified URL consistently.
  watch: enable autoflush for STDOUT and STDERR
  watch: use our own "git credential" wrapper
  watch: support ~/.netrc via Net::Netrc
  imaptracker: use flock(2) around writes
  watch: simplify internal structures

 Documentation/public-inbox-watch.pod |   3 +-
 INSTALL                              |   8 -
 MANIFEST                             |  11 +
 Makefile.PL                          |   4 -
 ci/deps.perl                         |   1 -
 lib/PublicInbox/Config.pm            |  21 +-
 lib/PublicInbox/DS.pm                |  29 +-
 lib/PublicInbox/Daemon.pm            |  19 +-
 lib/PublicInbox/DirIdle.pm           |  49 ++
 lib/PublicInbox/FakeInotify.pm       |  56 +-
 lib/PublicInbox/GitAsyncCat.pm       |   4 +-
 lib/PublicInbox/GitCredential.pm     |  55 ++
 lib/PublicInbox/HTTP.pm              |  23 +-
 lib/PublicInbox/HTTPD/Async.pm       |  22 +-
 lib/PublicInbox/IMAP.pm              |  19 +-
 lib/PublicInbox/IMAPTracker.pm       |  82 +++
 lib/PublicInbox/In2Tie.pm            |  13 +
 lib/PublicInbox/Inbox.pm             |   1 +
 lib/PublicInbox/InboxIdle.pm         |  20 +-
 lib/PublicInbox/InboxWritable.pm     |   3 +
 lib/PublicInbox/KQNotify.pm          |  38 +-
 lib/PublicInbox/Listener.pm          |   8 +-
 lib/PublicInbox/NNTP.pm              |  12 +-
 lib/PublicInbox/NNTPdeflate.pm       |   5 +-
 lib/PublicInbox/ParentPipe.pm        |   8 +-
 lib/PublicInbox/Sigfd.pm             |  21 +-
 lib/PublicInbox/TestCommon.pm        |  40 +-
 lib/PublicInbox/URIimap.pm           | 113 +++
 lib/PublicInbox/WatchMaildir.pm      | 998 +++++++++++++++++++++++----
 script/public-inbox-watch            |  33 +-
 t/config.t                           |  18 +
 t/dir_idle.t                         |   6 +
 t/fake_inotify.t                     |  45 ++
 t/imap_tracker.t                     |  54 ++
 t/imapd.t                            |  74 ++
 t/kqnotify.t                         |  41 ++
 t/nntpd.t                            |  52 ++
 t/uri_imap.t                         |  65 ++
 t/watch_filter_rubylang.t            |   2 +-
 t/watch_imap.t                       |  21 +
 t/watch_maildir.t                    |  96 ++-
 t/watch_maildir_v2.t                 |   4 +-
 t/watch_multiple_headers.t           |   2 +-
 t/watch_nntp.t                       |  17 +
 xt/mem-imapd-tls.t                   |  18 +-
 45 files changed, 1944 insertions(+), 290 deletions(-)
 create mode 100644 lib/PublicInbox/DirIdle.pm
 create mode 100644 lib/PublicInbox/GitCredential.pm
 create mode 100644 lib/PublicInbox/IMAPTracker.pm
 create mode 100644 lib/PublicInbox/URIimap.pm
 create mode 100644 t/dir_idle.t
 create mode 100644 t/fake_inotify.t
 create mode 100644 t/imap_tracker.t
 create mode 100644 t/kqnotify.t
 create mode 100644 t/uri_imap.t
 create mode 100644 t/watch_imap.t
 create mode 100644 t/watch_nntp.t



* [PATCH 01/34] inboxwritable: ensure ssoma.lock exists on init
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 02/34] inbox: warn on ->on_inbox_unlock exception Eric Wong
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

This will allow us to use InboxIdle on empty/unindexed v1 inboxes.
---
 lib/PublicInbox/InboxWritable.pm | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index f9e28502001..9bdf8637e6e 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -53,6 +53,9 @@ sub init_inbox {
 				$mm->{dbh}->commit;
 			}) if defined($skip_artnum);
 			$sidx->commit_txn_lazy;
+		} else {
+			open my $fh, '>>', "$dir/ssoma.lock" or
+				die "$dir/ssoma.lock: $!\n";
 		}
 	} else {
 		my $v2w = importer($self);


* [PATCH 02/34] inbox: warn on ->on_inbox_unlock exception
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
  2020-06-27 10:03 ` [PATCH 01/34] inboxwritable: ensure ssoma.lock exists on init Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 03/34] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric Wong
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Otherwise, we may never know what went wrong.
---
 lib/PublicInbox/Inbox.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 7d5e048363f..02186dac717 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -421,6 +421,7 @@ sub on_unlock {
 	my $subs = $self->{unlock_subs} or return;
 	for (values %$subs) {
 		eval { $_->on_inbox_unlock($self) };
+		warn "E: $@ ($self->{inboxdir})\n" if $@;
 	}
 }
 


* [PATCH 03/34] IMAPTracker: Add a helper to track our place in reading imap mailboxes
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
  2020-06-27 10:03 ` [PATCH 01/34] inboxwritable: ensure ssoma.lock exists on init Eric Wong
  2020-06-27 10:03 ` [PATCH 02/34] inbox: warn on ->on_inbox_unlock exception Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 04/34] imaptracker: use ~/.local/share/public-inbox/imap.sqlite3 Eric Wong
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta; +Cc: Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

This removes the need to delete from an IMAP mailbox when
downloading its messages.

[ew: minor style changes]

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 MANIFEST                       |  1 +
 lib/PublicInbox/IMAPTracker.pm | 61 ++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100644 lib/PublicInbox/IMAPTracker.pm

diff --git a/MANIFEST b/MANIFEST
index 3e7d4cc0e29..42a00d74344 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -132,6 +132,7 @@ lib/PublicInbox/Hval.pm
 lib/PublicInbox/IMAP.pm
 lib/PublicInbox/IMAPClient.pm
 lib/PublicInbox/IMAPD.pm
+lib/PublicInbox/IMAPTracker.pm
 lib/PublicInbox/IMAPdeflate.pm
 lib/PublicInbox/IMAPsearchqp.pm
 lib/PublicInbox/Import.pm
diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
new file mode 100644
index 00000000000..c7da422b725
--- /dev/null
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -0,0 +1,61 @@
+# Copyright (C) 2018-2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+package PublicInbox::IMAPTracker;
+use strict;
+use DBI;
+use DBD::SQLite;
+use PublicInbox::Config;
+
+sub create_tables ($) {
+	my ($dbh) = @_;
+
+	$dbh->do(<<'');
+CREATE TABLE IF NOT EXISTS imap_last (
+	url VARCHAR PRIMARY KEY NOT NULL,
+	uid_validity INTEGER NOT NULL,
+	uid INTEGER NOT NULL,
+	UNIQUE (url)
+)
+
+}
+
+sub dbh_new ($) {
+	my ($dbname) = @_;
+	my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", '', '', {
+		AutoCommit => 1,
+		RaiseError => 1,
+		PrintError => 0,
+		sqlite_use_immediate_transaction => 1,
+	});
+	$dbh->{sqlite_unicode} = 1;
+	$dbh->do('PRAGMA journal_mode = TRUNCATE');
+	create_tables($dbh);
+	$dbh;
+}
+
+sub get_last ($$) {
+	my ($self, $url) = @_;
+	my $sth = $self->{dbh}->prepare_cached(<<'', undef, 1);
+SELECT uid_validity, uid FROM imap_last WHERE url = ?
+
+	$sth->execute($url);
+	$sth->fetchrow_array;
+}
+
+sub update_last ($$$$) {
+	my ($self, $url, $validity, $last) = @_;
+	my $sth = $self->{dbh}->prepare_cached(<<'');
+INSERT OR REPLACE INTO imap_last (url, uid_validity, uid)
+VALUES (?, ?, ?)
+
+	$sth->execute($url, $validity, $last);
+}
+
+sub new {
+	my ($class) = @_;
+	my $dbname = PublicInbox::Config->config_dir() . "/imap.sqlite3";
+	my $dbh = dbh_new($dbname);
+	bless { dbname => $dbname, dbh => $dbh }, $class;
+}
+
+1;


* [PATCH 04/34] imaptracker: use ~/.local/share/public-inbox/imap.sqlite3
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (2 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 03/34] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 05/34] watchmaildir: hoist out compile_watchheaders Eric Wong
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta; +Cc: Eric W. Biederman

Respect XDG_DATA_HOME to avoid cluttering ~/.public-inbox/.
Existing users of ~/.public-inbox/imap.sqlite3 will remain
supported, but the preference for new data is to use
~/.local/share and other paths standardized by XDG.
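
The lookup order described above can be sketched as a standalone
function.  This is only an illustration: the helper name and the
explicit $env argument (standing in for %ENV) are assumptions,
and it does not create any directories.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): pick the tracker DB path.
# $legacy is the old ~/.public-inbox location; %$env stands in for %ENV.
sub imap_tracker_path {
	my ($legacy, $env) = @_;
	# an existing file at the old location keeps working
	return $legacy if defined($legacy) && -f $legacy;
	# otherwise prefer XDG_DATA_HOME, falling back to ~/.local/share
	my $data_home = $env->{XDG_DATA_HOME} //
		(($env->{HOME} // '/nonexistent') . '/.local/share');
	"$data_home/public-inbox/imap.sqlite3";
}

print imap_tracker_path('/nonexistent/imap.sqlite3',
	{ XDG_DATA_HOME => '/tmp/data' }), "\n";
```

Passing the environment in explicitly keeps the sketch easy to
exercise without touching the real %ENV.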

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 MANIFEST                       |  1 +
 lib/PublicInbox/IMAPTracker.pm | 19 +++++++++++++++++--
 t/imap_tracker.t               | 26 ++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 t/imap_tracker.t

diff --git a/MANIFEST b/MANIFEST
index 42a00d74344..158d7ca2d8e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -274,6 +274,7 @@ t/httpd.t
 t/hval.t
 t/imap.t
 t/imap_searchqp.t
+t/imap_tracker.t
 t/imapd-tls.t
 t/imapd.t
 t/import.t
diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
index c7da422b725..bb4a39cc41a 100644
--- a/lib/PublicInbox/IMAPTracker.pm
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -52,8 +52,23 @@ VALUES (?, ?, ?)
 }
 
 sub new {
-	my ($class) = @_;
-	my $dbname = PublicInbox::Config->config_dir() . "/imap.sqlite3";
+	my ($class, $dbname) = @_;
+
+	# original name for compatibility with old setups:
+	$dbname //= PublicInbox::Config->config_dir() . "/imap.sqlite3";
+
+	# use the new XDG-compliant name for new setups:
+	if (!-f $dbname) {
+		$dbname = ($ENV{XDG_DATA_HOME} //
+			(($ENV{HOME} // '/nonexistent').'/.local/share')) .
+			'/public-inbox/imap.sqlite3';
+	}
+	if (!-f $dbname) {
+		require File::Path;
+		require File::Basename;;
+		File::Path::mkpath(File::Basename::dirname($dbname));
+	}
+
 	my $dbh = dbh_new($dbname);
 	bless { dbname => $dbname, dbh => $dbh }, $class;
 }
diff --git a/t/imap_tracker.t b/t/imap_tracker.t
new file mode 100644
index 00000000000..8dc04ed77a3
--- /dev/null
+++ b/t/imap_tracker.t
@@ -0,0 +1,26 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use Test::More;
+use strict;
+use PublicInbox::TestCommon;
+require_mods 'DBD::SQLite';
+use_ok 'PublicInbox::IMAPTracker';
+my ($tmpdir, $for_destroy) = tmpdir();
+mkdir "$tmpdir/old" or die "mkdir $tmpdir/old: $!";
+my $old = "$tmpdir/old/imap.sqlite3";
+my $cur = "$tmpdir/data/public-inbox/imap.sqlite3";
+{
+	local $ENV{XDG_DATA_HOME} = "$tmpdir/data";
+	local $ENV{PI_DIR} = "$tmpdir/old";
+
+	my $tracker = PublicInbox::IMAPTracker->new;
+	ok(-f $cur, '->new creates file');
+	$tracker = undef;
+	ok(-f $cur, 'file persists after DESTROY');
+	link $cur, $old or die "link $cur => $old: $!";
+	unlink $cur or die "unlink $cur: $!";
+	$tracker = PublicInbox::IMAPTracker->new;
+	ok(!-f $cur, '->new does not create new file if old is present');
+}
+
+done_testing;


* [PATCH 05/34] watchmaildir: hoist out compile_watchheaders
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (3 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 04/34] imaptracker: use ~/.local/share/public-inbox/imap.sqlite3 Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 06/34] watchmaildir: fix check for spam vs ham inbox conflicts Eric Wong
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

It's too deeply indented, and we will be using it for IMAP, too.
---
 lib/PublicInbox/WatchMaildir.pm | 45 ++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 7ca35403517..496199c90f5 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -13,6 +13,29 @@ use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
 *mime_from_path = \&PublicInbox::InboxWritable::mime_from_path;
 
+sub compile_watchheaders ($) {
+	my ($ibx) = @_;
+	my $watch_hdrs = [];
+	if (my $whs = $ibx->{watchheader}) {
+		for (@$whs) {
+			my ($k, $v) = split(/:/, $_, 2);
+			# XXX should this be case-insensitive?
+			# Or, mutt-style, case-sensitive iff
+			# a capital letter exists?
+			push @$watch_hdrs, [ $k, qr/\Q$v\E/ ];
+		}
+	}
+	if (my $list_ids = $ibx->{listid}) {
+		for (@$list_ids) {
+			# RFC2919 section 6 stipulates
+			# "case insensitive equality"
+			my $re = qr/<[ \t]*\Q$_\E[ \t]*>/i;
+			push @$watch_hdrs, ['List-Id', $re ];
+		}
+	}
+	$ibx->{-watchheaders} = $watch_hdrs if scalar @$watch_hdrs;
+}
+
 sub new {
 	my ($class, $config) = @_;
 	my (%mdmap, @mdir, $spamc);
@@ -58,27 +81,7 @@ sub new {
 
 		my $watch = $ibx->{watch} or return;
 		if (is_maildir($watch)) {
-			my $watch_hdrs = [];
-			if (my $whs = $ibx->{watchheader}) {
-				for (@$whs) {
-					my ($k, $v) = split(/:/, $_, 2);
-					# XXX should this be case-insensitive?
-					# Or, mutt-style, case-sensitive iff
-					# a capital letter exists?
-					push @$watch_hdrs, [ $k, qr/\Q$v\E/ ];
-				}
-			}
-			if (my $list_ids = $ibx->{listid}) {
-				for (@$list_ids) {
-					# RFC2919 section 6 stipulates
-					# "case insensitive equality"
-					my $re = qr/<[ \t]*\Q$_\E[ \t]*>/i;
-					push @$watch_hdrs, ['List-Id', $re ];
-				}
-			}
-			if (scalar @$watch_hdrs) {
-				$ibx->{-watchheaders} = $watch_hdrs;
-			}
+			compile_watchheaders($ibx);
 			my $new = "$watch/new";
 			my $cur = "$watch/cur";
 			push @mdir, $new unless $uniq{$new}++;


* [PATCH 06/34] watchmaildir: fix check for spam vs ham inbox conflicts
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (4 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 05/34] watchmaildir: hoist out compile_watchheaders Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 07/34] URI IMAP support Eric Wong
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

The old check was ineffective since we process the spam folder
config before ham inboxes, and it would only fail when attempting
to treat the scalar "watchspam" string as an array ref.
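
A minimal sketch of why the check has to run while ham inboxes
are processed.  The directory names and the helper are made up
for illustration; only the "string or array ref" convention for
map values mirrors what the diff below uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the directory map built in ->new: a
# directory maps either to the string 'watchspam' or to an array ref
# of inboxes.
sub claims_spam_dir {
	my ($mdmap, $cur) = @_;
	my $ws = $mdmap->{$cur};
	$ws && !ref($ws) && $ws eq 'watchspam' ? 1 : 0;
}

my %mdmap = ('/mail/spam/cur' => 'watchspam'); # spam config is seen first
for my $cur ('/mail/spam/cur', '/mail/list/cur') {
	if (claims_spam_dir(\%mdmap, $cur)) {
		warn "E: $cur is a spam folder and cannot be used as input\n";
		next; # onto the next directory, the spam entry stays intact
	}
	push @{$mdmap{$cur} ||= []}, { name => 'example' };
}
```

Since the spam entry is already in the map by the time inboxes
are visited, checking at that point catches the conflict before
the string is dereferenced as an array.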
---
 lib/PublicInbox/WatchMaildir.pm | 17 +++++++----------
 t/watch_maildir.t               | 15 +++++++++++++++
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 496199c90f5..c1f3c5c258f 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -52,15 +52,6 @@ sub new {
 			if (is_maildir($dir)) {
 				# skip "new", no MUA has seen it, yet.
 				my $cur = "$dir/cur";
-				my $old = $mdmap{$cur};
-				if (ref($old)) {
-					foreach my $ibx (@$old) {
-						warn <<"";
-"$cur already watched for `$ibx->{name}'
-
-					}
-					die;
-				}
 				push @mdir, $cur;
 				$uniq{$cur}++;
 				$mdmap{$cur} = 'watchspam';
@@ -84,9 +75,15 @@ sub new {
 			compile_watchheaders($ibx);
 			my $new = "$watch/new";
 			my $cur = "$watch/cur";
+			my $ws = $mdmap{$cur};
+			if ($ws && !ref($ws) && $ws eq 'watchspam') {
+				warn <<EOF;
+E: $cur is a spam folder and cannot be used for `$ibx->{name}' input
+EOF
+				return; # onto next inbox
+			}
 			push @mdir, $new unless $uniq{$new}++;
 			push @mdir, $cur unless $uniq{$cur}++;
-
 			push @{$mdmap{$new} ||= []}, $ibx;
 			push @{$mdmap{$cur} ||= []}, $ibx;
 		} else {
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index 66955072cd5..33a3458bd4d 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -32,6 +32,21 @@ ok(POSIX::mkfifo("$maildir/cur/fifo", 0777),
 	'create FIFO to ensure we do not get stuck on it :P');
 my $sem = PublicInbox::Emergency->new($spamdir); # create dirs
 
+{
+	my @w;
+	local $SIG{__WARN__} = sub { push @w, @_ };
+	my $config = PublicInbox::Config->new(\<<EOF);
+$cfgpfx.address=$addr
+$cfgpfx.inboxdir=$git_dir
+$cfgpfx.watch=maildir:$spamdir
+publicinboxlearn.watchspam=maildir:$spamdir
+EOF
+	my $wm = PublicInbox::WatchMaildir->new($config);
+	is(scalar grep(/is a spam folder/, @w), 1, 'got warning about spam');
+	is_deeply($wm->{mdmap}, { "$spamdir/cur" => 'watchspam' },
+		'only got the spam folder to watch');
+}
+
 my $config = PublicInbox::Config->new(\<<EOF);
 $cfgpfx.address=$addr
 $cfgpfx.inboxdir=$git_dir


* [PATCH 07/34] URI IMAP support
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (5 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 06/34] watchmaildir: fix check for spam vs ham inbox conflicts Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 08/34] watch: preliminary " Eric Wong
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

We'll be supporting the IMAP URL scheme described in RFC 5092
for -watch, so add this module to fill in what the `URI' package
lacks.
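
As a rough illustration of the URL shape involved, here is a
regex-only sketch using nothing outside core Perl.  It is not
the module added below (which builds on URI::Split); the helper
name and returned hash keys are assumptions, and it skips
percent-decoding, ;AUTH= and ;UIDVALIDITY= handling.

```perl
use strict;
use warnings;

my %default_ports = (imap => 143, imaps => 993);

# Hypothetical helper, not part of the patch: split an imap(s) URL into
# a hash ref, or return undef for other schemes.  No percent-decoding.
sub parse_imap_url {
	my ($url) = @_;
	$url =~ m!\A(imaps?)://(?:[^/@]*@)?(\[[^\]]+\]|[^/:]+)(?::([0-9]+))?(?:/([^;?#]*))?!i
		or return undef;
	my ($scheme, $host, $port, $mbox) = (lc($1), lc($2), $3, $4);
	$host =~ s/\A\[(.*)\]\z/$1/; # strip brackets from an IPv6 literal
	return {
		scheme => $scheme,
		host => $host,
		# fall back to the scheme's default port when none is given
		port => ($port // $default_ports{$scheme}) + 0,
		mailbox => (defined($mbox) && $mbox ne '') ? $mbox : undef,
	};
}

my $u = parse_imap_url('imaps://EXAMPLE.com:993/INBOX.sub');
print "$u->{scheme} $u->{host} $u->{port} $u->{mailbox}\n";
```

A real implementation also needs the user, password and AUTH
parts of the authority, which is what the accessors in the
module below provide.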
---
 MANIFEST                   |   2 +
 lib/PublicInbox/URIimap.pm | 113 +++++++++++++++++++++++++++++++++++++
 t/uri_imap.t               |  65 +++++++++++++++++++++
 3 files changed, 180 insertions(+)
 create mode 100644 lib/PublicInbox/URIimap.pm
 create mode 100644 t/uri_imap.t

diff --git a/MANIFEST b/MANIFEST
index 158d7ca2d8e..ffd79c1f1b3 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -181,6 +181,7 @@ lib/PublicInbox/Syscall.pm
 lib/PublicInbox/TLS.pm
 lib/PublicInbox/TestCommon.pm
 lib/PublicInbox/Tmpfile.pm
+lib/PublicInbox/URIimap.pm
 lib/PublicInbox/Unsubscribe.pm
 lib/PublicInbox/UserContent.pm
 lib/PublicInbox/V2Writable.pm
@@ -335,6 +336,7 @@ t/spamcheck_spamc.t
 t/spawn.t
 t/thread-cycle.t
 t/time.t
+t/uri_imap.t
 t/utf8.eml
 t/v1-add-remove-add.t
 t/v1reindex.t
diff --git a/lib/PublicInbox/URIimap.pm b/lib/PublicInbox/URIimap.pm
new file mode 100644
index 00000000000..56b6002a379
--- /dev/null
+++ b/lib/PublicInbox/URIimap.pm
@@ -0,0 +1,113 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# cf. RFC 5092, which the `URI' package doesn't support
+#
+# This depends only on the documented public API of the `URI' dist,
+# not on internal `_'-prefixed subclasses such as `URI::_server'
+#
+# <https://metacpan.org/pod/URI::imap> exists, but it's not in
+# common distros.
+#
+# RFC 2192 also describes ";TYPE=<list_type>"
+package PublicInbox::URIimap;
+use strict;
+use URI::Split qw(uri_split uri_join); # part of URI
+use URI::Escape qw(uri_unescape);
+
+my %default_ports = (imap => 143, imaps => 993);
+
+sub new {
+	my ($class, $url) = @_;
+	$url =~ m!\Aimaps?://! ? bless \$url, $class : undef;
+}
+
+sub canonical {
+	my ($self) = @_;
+
+	# no #frag in RFC 5092 from what I can tell
+	my ($scheme, $auth, $path, $query, $_frag) = uri_split($$self);
+	$path =~ s!\A/+!/!; # excessive leading slash
+
+	# lowercase the host portion
+	$auth =~ s#\A(.*@)?(.*?)(?::([0-9]+))?\z#
+		my $ret = ($1//'').lc($2);
+		if (defined(my $port = $3)) {
+			if ($default_ports{lc($scheme)} != $port) {
+				$ret .= ":$port";
+			}
+		}
+		$ret#ei;
+
+	ref($self)->new(uri_join(lc($scheme), $auth, $path, $query));
+}
+
+sub host {
+	my ($self) = @_;
+	my (undef, $auth) = uri_split($$self);
+	$auth =~ s!\A.*?@!!;
+	$auth =~ s!:[0-9]+\z!!;
+	$auth =~ s!\A\[(.*)\]\z!$1!; # IPv6
+	uri_unescape($auth);
+}
+
+# unescaped, may be used for globbing
+sub path {
+	my ($self) = @_;
+	my (undef, undef, $path) = uri_split($$self);
+	$path =~ s!\A/+!!;
+	$path =~ s/;.*\z//; # ;UIDVALIDITY=nz-number
+	$path eq '' ? undef : $path;
+}
+
+sub mailbox {
+	my ($self) = @_;
+	my $path = path($self);
+	defined($path) ? uri_unescape($path) : undef;
+}
+
+# TODO: UIDVALIDITY, search, and other params
+
+sub port {
+	my ($self) = @_;
+	my ($scheme, $auth) = uri_split($$self);
+	$auth =~ /:([0-9]+)\z/ ? $1 + 0 : $default_ports{lc($scheme)};
+}
+
+sub authority {
+	my ($self) = @_;
+	my (undef, $auth) = uri_split($$self);
+	$auth
+}
+
+sub user {
+	my ($self) = @_;
+	my (undef, $auth) = uri_split($$self);
+	$auth =~ s/@.*\z// or return undef; # drop host:port
+	$auth =~ s/;.*\z//; # drop ;AUTH=...
+	$auth =~ s/:.*\z//; # drop password
+	uri_unescape($auth);
+}
+
+sub password {
+	my ($self) = @_;
+	my (undef, $auth) = uri_split($$self);
+	$auth =~ s/@.*\z// or return undef; # drop host:port
+	$auth =~ s/;.*\z//; # drop ;AUTH=...
+	$auth =~ s/\A[^:]+:// ? uri_unescape($auth) : undef; # drop ->user
+}
+
+sub auth {
+	my ($self) = @_;
+	my (undef, $auth) = uri_split($$self);
+	$auth =~ s/@.*\z//; # drop host:port
+	$auth =~ /;AUTH=(.+)\z/i ? uri_unescape($1) : undef;
+}
+
+sub scheme {
+	my ($self) = @_;
+	(uri_split($$self))[0];
+}
+
+sub as_string { ${$_[0]} }
+
+1;
diff --git a/t/uri_imap.t b/t/uri_imap.t
new file mode 100644
index 00000000000..a2e86a7ec9c
--- /dev/null
+++ b/t/uri_imap.t
@@ -0,0 +1,65 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+require_mods 'URI::Split';
+use_ok 'PublicInbox::URIimap';
+
+is(PublicInbox::URIimap->new('https://example.com/'), undef,
+	'invalid scheme ignored');
+
+my $uri = PublicInbox::URIimap->new('imaps://EXAMPLE.com/');
+is($uri->host, 'EXAMPLE.com', 'host ok');
+is($uri->canonical->host, 'example.com', 'host canonicalized');
+is($uri->canonical->as_string, 'imaps://example.com/', 'URI canonicalized');
+is($uri->port, 993, 'imaps port');
+is($uri->auth, undef);
+is($uri->user, undef);
+
+$uri = PublicInbox::URIimap->new('imaps://foo@0/');
+is($uri->host, '0', 'numeric host');
+is($uri->user, 'foo', 'user extracted');
+
+$uri = PublicInbox::URIimap->new('imap://0/INBOX.sub#frag')->canonical;
+is($uri->as_string, 'imap://0/INBOX.sub', 'no fragment');
+is($uri->scheme, 'imap');
+
+$uri = PublicInbox::URIimap->new('imaps://;AUTH=ANONYMOUS@0/');
+is($uri->auth, 'ANONYMOUS', 'AUTH=ANONYMOUS accepted');
+
+$uri = PublicInbox::URIimap->new('imaps://bar%40example.com;AUTH=99%25@0/');
+is($uri->auth, '99%', 'decoded AUTH');
+is($uri->user, 'bar@example.com', 'decoded user');
+is($uri->mailbox, undef, 'mailbox is undef');
+
+$uri = PublicInbox::URIimap->new('imaps://ipv6@[::1]');
+is($uri->host, '::1', 'IPv6 host');
+is($uri->mailbox, undef, 'mailbox is undef');
+
+$uri = PublicInbox::URIimap->new('imaps://0:666/INBOX');
+is($uri->port, 666, 'port read');
+is($uri->mailbox, 'INBOX');
+$uri = PublicInbox::URIimap->new('imaps://0/INBOX.sub');
+is($uri->mailbox, 'INBOX.sub');
+is($uri->scheme, 'imaps');
+
+is(PublicInbox::URIimap->new('imap://0:143/')->canonical->as_string,
+	'imap://0/');
+is(PublicInbox::URIimap->new('imaps://0:993/')->canonical->as_string,
+	'imaps://0/');
+
+$uri = PublicInbox::URIimap->new('imap://NSA:Hunter2@0/INBOX');
+is($uri->user, 'NSA');
+is($uri->password, 'Hunter2');
+
+$uri = PublicInbox::URIimap->new('imap://0/%');
+is($uri->mailbox, '%', "RFC 2192 '%' supported");
+$uri = PublicInbox::URIimap->new('imap://0/%25');
+$uri = PublicInbox::URIimap->new('imap://0/*');
+is($uri->mailbox, '*', "RFC 2192 '*' supported");
+
+# TODO: support UIDVALIDITY and other params
+
+done_testing;


* [PATCH 08/34] watch: preliminary IMAP support
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (6 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 07/34] URI IMAP support Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 09/34] kqnotify|fake_inotify: detect Maildir write ops Eric Wong
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Only servers with IDLE are supported, for now.  Polling will
be needed since users may need to watch many inboxes with only
a few active connections due to IMAP server limitations.
---
 MANIFEST                        |   1 +
 lib/PublicInbox/WatchMaildir.pm | 493 ++++++++++++++++++++++++++++----
 t/imapd.t                       |  43 +++
 t/watch_imap.t                  |  21 ++
 4 files changed, 504 insertions(+), 54 deletions(-)
 create mode 100644 t/watch_imap.t

diff --git a/MANIFEST b/MANIFEST
index ffd79c1f1b3..161b6cddbe0 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -347,6 +347,7 @@ t/v2reindex.t
 t/v2writable.t
 t/view.t
 t/watch_filter_rubylang.t
+t/watch_imap.t
 t/watch_maildir.t
 t/watch_maildir_v2.t
 t/watch_multiple_headers.t
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index c1f3c5c258f..fea7d5ef9ee 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -11,6 +11,8 @@ use PublicInbox::InboxWritable;
 use File::Temp 0.19 (); # 0.19 for ->newdir
 use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
+use PublicInbox::DS qw(now);
+use POSIX qw(_exit WNOHANG);
 *mime_from_path = \&PublicInbox::InboxWritable::mime_from_path;
 
 sub compile_watchheaders ($) {
@@ -39,7 +41,8 @@ sub compile_watchheaders ($) {
 sub new {
 	my ($class, $config) = @_;
 	my (%mdmap, @mdir, $spamc);
-	my %uniq;
+	my %uniq; # directory => count
+	my %imap; # url => [inbox objects] or 'watchspam'
 
 	# "publicinboxwatch" is the documented namespace
 	# "publicinboxlearn" is legacy but may be supported
@@ -55,6 +58,8 @@ sub new {
 				push @mdir, $cur;
 				$uniq{$cur}++;
 				$mdmap{$cur} = 'watchspam';
+			} elsif (my $url = imap_url($dir)) {
+				$imap{$url} = 'watchspam';
 			} else {
 				warn "unsupported $k=$dir\n";
 			}
@@ -73,33 +78,34 @@ sub new {
 		my $watch = $ibx->{watch} or return;
 		if (is_maildir($watch)) {
 			compile_watchheaders($ibx);
-			my $new = "$watch/new";
-			my $cur = "$watch/cur";
-			my $ws = $mdmap{$cur};
-			if ($ws && !ref($ws) && $ws eq 'watchspam') {
-				warn <<EOF;
-E: $cur is a spam folder and cannot be used for `$ibx->{name}' input
-EOF
-				return; # onto next inbox
-			}
+			my ($new, $cur) = ("$watch/new", "$watch/cur");
+			return if is_watchspam($cur, $mdmap{$cur}, $ibx);
 			push @mdir, $new unless $uniq{$new}++;
 			push @mdir, $cur unless $uniq{$cur}++;
 			push @{$mdmap{$new} ||= []}, $ibx;
 			push @{$mdmap{$cur} ||= []}, $ibx;
+		} elsif (my $url = imap_url($watch)) {
+			return if is_watchspam($url, $imap{$url}, $ibx);
+			compile_watchheaders($ibx);
+			push @{$imap{$url} ||= []}, $ibx;
 		} else {
 			warn "watch unsupported: $k=$watch\n";
 		}
 	});
-	return unless @mdir;
+	return unless scalar(@mdir) || scalar(keys %imap);
 
-	my $mdre = join('|', map { quotemeta($_) } @mdir);
-	$mdre = qr!\A($mdre)/!;
+	my $mdre;
+	if (@mdir) {
+		$mdre = join('|', map { quotemeta($_) } @mdir);
+		$mdre = qr!\A($mdre)/!;
+	}
 	bless {
 		spamcheck => $spamcheck,
 		mdmap => \%mdmap,
 		mdir => \@mdir,
 		mdre => $mdre,
 		config => $config,
+		imap => scalar keys %imap ? \%imap : undef,
 		importers => {},
 		opendirs => {}, # dirname => dirhandle (in progress scans)
 	}, $class;
@@ -126,28 +132,51 @@ sub _try_fsn_paths {
 	_done_for_now($self);
 }
 
+sub remove_eml_i { # each_inbox callback
+	my ($ibx, $arg) = @_;
+	my ($self, $eml, $loc) = @$arg;
+	eval {
+		my $im = _importer_for($self, $ibx);
+		$im->remove($eml, 'spam');
+		if (my $scrub = $ibx->filter($im)) {
+			my $scrubbed = $scrub->scrub($eml, 1);
+			$scrubbed or return;
+			$scrubbed == REJECT() and return;
+			$im->remove($scrubbed, 'spam');
+		}
+	};
+	warn "error removing spam at: $loc from $ibx->{name}: $@\n" if $@;
+}
+
 sub _remove_spam {
 	my ($self, $path) = @_;
 	# path must be marked as (S)een
 	$path =~ /:2,[A-R]*S[T-Za-z]*\z/ or return;
-	my $mime = mime_from_path($path) or return;
-	$self->{config}->each_inbox(sub {
-		my ($ibx) = @_;
-		eval {
-			my $im = _importer_for($self, $ibx);
-			$im->remove($mime, 'spam');
-			if (my $scrub = $ibx->filter($im)) {
-				my $scrubbed = $scrub->scrub($mime, 1);
-				$scrubbed or return;
-				$scrubbed == REJECT() and return;
-				$im->remove($scrubbed, 'spam');
-			}
-		};
-		if ($@) {
-			warn "error removing spam at: ", $path,
-				" from ", $ibx->{name}, ': ', $@, "\n";
+	my $eml = mime_from_path($path) or return;
+	$self->{config}->each_inbox(\&remove_eml_i, [ $self, $eml, $path ]);
+}
+
+sub import_eml ($$$) {
+	my ($self, $ibx, $eml) = @_;
+	my $im = _importer_for($self, $ibx);
+
+	# any header match means it's eligible for the inbox:
+	if (my $watch_hdrs = $ibx->{-watchheaders}) {
+		my $ok;
+		my $hdr = $eml->header_obj;
+		for my $wh (@$watch_hdrs) {
+			my @v = $hdr->header_raw($wh->[0]);
+			$ok = grep(/$wh->[1]/, @v) and last;
 		}
-	})
+		return unless $ok;
+	}
+
+	if (my $scrub = $ibx->filter($im)) {
+		my $ret = $scrub->scrub($eml) or return;
+		$ret == REJECT() and return;
+		$eml = $ret;
+	}
+	$im->add($eml, $self->{spamcheck});
 }
 
 sub _try_path {
@@ -172,32 +201,29 @@ sub _try_path {
 		$warn_cb->(@_);
 	};
 	foreach my $ibx (@$inboxes) {
-		my $mime = mime_from_path($path) or next;
-		my $im = _importer_for($self, $ibx);
-
-		# any header match means it's eligible for the inbox:
-		if (my $watch_hdrs = $ibx->{-watchheaders}) {
-			my $ok;
-			my $hdr = $mime->header_obj;
-			for my $wh (@$watch_hdrs) {
-				my @v = $hdr->header_raw($wh->[0]);
-				$ok = grep(/$wh->[1]/, @v) and last;
-			}
-			next unless $ok;
-		}
-
-		if (my $scrub = $ibx->filter($im)) {
-			my $ret = $scrub->scrub($mime) or next;
-			$ret == REJECT() and next;
-			$mime = $ret;
-		}
-		$im->add($mime, $self->{spamcheck});
+		my $eml = mime_from_path($path) or next;
+		import_eml($self, $ibx, $eml);
 	}
 }
 
-sub quit { trigger_scan($_[0], 'quit') }
+sub quit {
+	my ($self) = @_;
+	trigger_scan($self, 'quit') or $self->{quit} = 1;
+	if (my $imap_pid = $self->{-imap_pid}) {
+		kill('QUIT', $imap_pid);
+	}
+	if (my $idle_pids = $self->{idle_pids}) {
+		kill('QUIT', $_) for (keys %$idle_pids);
+	}
+	if (my $idle_mic = $self->{idle_mic}) {
+		eval { $idle_mic->done };
+		warn "IDLE DONE error: $@\n" if $@;
+		eval { $idle_mic->disconnect };
+		warn "IDLE LOGOUT error: $@\n" if $@;
+	}
+}
 
-sub watch {
+sub watch_fs {
 	my ($self) = @_;
 	my $scan = File::Temp->newdir("public-inbox-watch.$$.scan.XXXXXX",
 					TMPDIR => 1);
@@ -205,14 +231,355 @@ sub watch {
 	my $re = qr!\A$scandir/!;
 	my $cb = sub { _try_fsn_paths($self, $re, \@_) };
 
-	# lazy load here, we may support watching via IMAP IDLE
-	# in the future...
 	eval { require Filesys::Notify::Simple } or
 		die "Filesys::Notify::Simple is currently required for $0\n";
 	my $fsn = Filesys::Notify::Simple->new([@{$self->{mdir}}, $scandir]);
 	$fsn->wait($cb) until $self->{quit};
 }
 
+# returns the git config section name, e.g. [imap "imaps://user@example.com"]
+# without the mailbox, so we can share connections between different inboxes
+sub imap_section ($) {
+	my ($uri) = @_;
+	$uri->scheme . '://' . $uri->authority;
+}
+
+sub cfg_intvl ($$) {
+	my ($cfg, $key) = @_;
+	defined(my $v = $cfg->{lc($key)}) or return;
+	$v =~ /\A[0-9]+\z/s and return $v + 0;
+	if (ref($v) eq 'ARRAY') {
+		$v = join(', ', @$v);
+		warn "W: $key has multiple values: $v\nW: $key ignored\n";
+	} else {
+		warn "W: $key=$v is not an integer value in seconds\n";
+	}
+}
+
+# flesh out common IMAP-specific data structures
+sub imap_common_init ($) {
+	my ($self) = @_;
+	my $cfg = $self->{config};
+	my $mic_args = {}; # scheme://authority => Mail::IMAPClient arg
+	for my $url (sort keys %{$self->{imap}}) {
+		my $uri = PublicInbox::URIimap->new($url);
+		my $sec = imap_section($uri);
+		for my $k (qw(Starttls Debug Compress)) {
+			my $key = lc("imap.$sec.$k");
+			defined(my $orig = $cfg->{$key}) or next;
+			my $v = PublicInbox::Config::_git_config_bool($orig);
+			if (defined($v)) {
+				$mic_args->{$sec}->{$k} = $v;
+			} else {
+				warn "W: $key=$orig is not boolean\n";
+			}
+		}
+		my $to = cfg_intvl($cfg, "imap.$sec.Timeout");
+		$mic_args->{$sec}->{Timeout} = $to if $to;
+		$to = cfg_intvl($cfg, "imap.$sec.PollInterval");
+		$self->{imap_opt}->{$sec}->{poll_intvl} = $to if $to;
+		$to = cfg_intvl($cfg, "imap.$sec.IdleInterval");
+		$self->{imap_opt}->{$sec}->{idle_intvl} = $to if $to;
+	}
+	$mic_args;
+}
+
+sub auth_anon_cb { '' }; # for Mail::IMAPClient::Authcallback
+
+sub mic_for ($$$) { # mic = Mail::IMAPClient
+	my ($self, $uri, $mic_args) = @_;
+	my $url = $uri->as_string;
+	my $cred = {
+		url => $url,
+		protocol => $uri->scheme,
+		host => $uri->host,
+		username => $uri->user,
+		password => $uri->password,
+	};
+	my $common = $mic_args->{imap_section($uri)} // {};
+	my $host = $cred->{host};
+	my $mic_arg = {
+		Port => $uri->port,
+		# IMAPClient mishandles `0', so we pass `127.0.0.1'
+		Server => $host eq '0' ? '127.0.0.1' : $host,
+		Ssl => $uri->scheme eq 'imaps',
+		Keepalive => 1, # SO_KEEPALIVE
+		%$common, # may set Starttls, Compress, Debug ....
+	};
+	my $mic = PublicInbox::IMAPClient->new(%$mic_arg) or
+		die "E: <$url> new: $@\n";
+
+	# default to using STARTTLS if it's available, but allow
+	# it to be disabled since I usually connect to localhost
+	if (!$mic_arg->{Ssl} && !defined($mic_arg->{Starttls}) &&
+			$mic->has_capability('STARTTLS') &&
+			$mic->can('starttls')) {
+		$mic->starttls or die "E: <$url> STARTTLS: $@\n";
+	}
+
+	# do we even need credentials?
+	if (!defined($cred->{username}) &&
+			$mic->has_capability('AUTH=ANONYMOUS')) {
+		$cred = undef;
+	}
+	if ($cred) {
+		Git::credential($cred, 'fill'); # may prompt user here
+		$mic->User($mic_arg->{User} = $cred->{username});
+		$mic->Password($mic_arg->{Password} = $cred->{password});
+	} else { # AUTH=ANONYMOUS
+		$mic->Authmechanism($mic_arg->{Authmechanism} = 'ANONYMOUS');
+		$mic->Authcallback($mic_arg->{Authcallback} = \&auth_anon_cb);
+	}
+	if ($mic->login && $mic->IsAuthenticated) {
+		# success! keep IMAPClient->new arg in case we get disconnected
+		$self->{mic_arg}->{imap_section($uri)} = $mic_arg;
+	} else {
+		warn "E: <$url> LOGIN: $@\n";
+		$mic = undef;
+	}
+	Git::credential($cred, $mic ? 'approve' : 'reject') if $cred;
+	$mic;
+}
+
+sub imap_start ($) {
+	my ($self) = @_;
+	eval { require PublicInbox::IMAPClient } or
+		die "Mail::IMAPClient is required for IMAP:\n$@\n";
+	eval { require Git } or
+		die "Git (Perl module) is required for IMAP:\n$@\n";
+	eval { require PublicInbox::IMAPTracker } or
+		die "DBD::SQLite is required for IMAP:\n$@\n";
+
+	my $mic_args = imap_common_init($self);
+	# make sure we can connect and cache the credentials in memory
+	$self->{mic_arg} = {}; # scheme://authority => IMAPClient->new args
+	my $mics = $self->{mics} = {}; # scheme://authority => IMAPClient obj
+	for my $url (sort keys %{$self->{imap}}) {
+		my $uri = PublicInbox::URIimap->new($url);
+		$mics->{imap_section($uri)} //= mic_for($self, $uri, $mic_args);
+	}
+}
+
+sub imap_fetch_all ($$$) {
+	my ($self, $mic, $uri) = @_;
+	my $sec = imap_section($uri);
+	my $mbx = $uri->mailbox;
+	my $url = $uri->as_string;
+	$mic->Clear(1); # trim results history
+	$mic->examine($mbx) or return "E: EXAMINE $mbx ($sec) failed: $!";
+	my ($r_uidval, $r_uidnext);
+	for ($mic->Results) {
+		/^\* OK \[UIDVALIDITY ([0-9]+)\].*/ and $r_uidval = $1;
+		/^\* OK \[UIDNEXT ([0-9]+)\].*/ and $r_uidnext = $1;
+		last if $r_uidval && $r_uidnext;
+	}
+	$r_uidval //= $mic->uidvalidity($mbx) //
+		return "E: $url cannot get UIDVALIDITY";
+	$r_uidnext //= $mic->uidnext($mbx) //
+		return "E: $url cannot get UIDNEXT";
+	my $itrk = PublicInbox::IMAPTracker->new;
+	my ($l_uidval, $l_uid) = $itrk->get_last($url);
+	$l_uidval //= $r_uidval; # first time
+	$l_uid //= 1;
+	if ($l_uidval != $r_uidval) {
+		return "E: $url UIDVALIDITY mismatch\n".
+			"E: local=$l_uidval != remote=$r_uidval";
+	}
+	my $r_uid = $r_uidnext - 1;
+	if ($l_uid != 1 && $l_uid > $r_uid) {
+		return "E: $url local UID exceeds remote ($l_uid > $r_uid)\n".
+			"E: $url strangely, UIDVALIDITY matches ($l_uidval)\n";
+	}
+	return if $l_uid >= $r_uid; # nothing to do
+
+	$mic->Uid(1); # the default, we hope
+	my $req = $mic->imap4rev1 ? 'BODY.PEEK[]' : 'RFC822.PEEK';
+	my $key = $req;
+	$key =~ s/\.PEEK//;
+	my $inboxes = $self->{imap}->{$url};
+	warn "I: $url fetching $l_uid..$r_uid\n";
+	my $uid = -1;
+	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
+	local $SIG{__WARN__} = sub {
+		$warn_cb->("$url UID:$uid\n");
+		$warn_cb->(@_);
+	};
+	my $err;
+	$itrk->{dbh}->begin_work;
+	for my $u ($l_uid..$r_uid) {
+		$uid = $u;
+		local $0 = "UID:$uid $mbx $sec";
+		my $r = $mic->fetch_hash($uid, $req);
+		unless ($r) { # network error?
+			$err = "E: $url UID FETCH $uid error: $!\n";
+			last;
+		}
+
+		# messages get deleted, so holes appear
+		defined(my $raw = delete $r->{$uid}->{$key}) or next;
+
+		# our target audience expects LF-only, save storage
+		$raw =~ s/\r\n/\n/sg;
+
+		if (ref($inboxes)) {
+			for my $ibx (@$inboxes) {
+				my $eml = PublicInbox::Eml->new($raw);
+				my $x = import_eml($self, $ibx, $eml);
+			}
+		} elsif ($inboxes eq 'watchspam') {
+			my $eml = PublicInbox::Eml->new($raw);
+			my $arg = [ $self, $eml, "$uri UID:$uid" ];
+			$self->{config}->each_inbox(\&remove_eml_i, $arg);
+		} else {
+			die "BUG: destination unknown $inboxes";
+		}
+		$itrk->update_last($url, $r_uidval, $uid);
+		last if $self->{quit};
+	}
+	_done_for_now($self);
+	$itrk->{dbh}->commit;
+	$err;
+}
+
+sub imap_idle_once ($$$$) {
+	my ($self, $mic, $intvl, $url) = @_;
+	my $i = $intvl //= (29 * 60);
+	my $end = now() + $intvl;
+	warn "I: $url idling for ${intvl}s\n";
+	local $0 = "IDLE $0";
+	unless ($mic->idle) {
+		return if $self->{quit};
+		return "E: IDLE failed on $url: $!";
+	}
+	$self->{idle_mic} = $mic; # for ->quit
+	my @res;
+	until ($self->{quit} || grep(/^\* [0-9]+ EXISTS/, @res) || $i <= 0) {
+		@res = $mic->idle_data($i);
+		$i = $end - now();
+	}
+	delete $self->{idle_mic};
+	unless ($self->{quit}) {
+		$mic->IsConnected or return "E: IDLE disconnected on $url";
+		$mic->done or return "E: IDLE DONE failed on $url: $!";
+	}
+	undef;
+}
+
+# idles on a single URI
+sub watch_imap_idle_1 ($$$) {
+	my ($self, $uri, $intvl) = @_;
+	my $sec = imap_section($uri);
+	my $mic_arg = $self->{mic_arg}->{$sec} or
+			die "BUG: no Mail::IMAPClient->new arg for $sec";
+	my $mic;
+	local $0 = $uri->mailbox." $sec";
+	until ($self->{quit}) {
+		$mic //= delete($self->{mics}->{$sec}) //
+				PublicInbox::IMAPClient->new(%$mic_arg);
+		my $err = imap_fetch_all($self, $mic, $uri);
+		$err //= imap_idle_once($self, $mic, $intvl, $uri->as_string);
+		if ($err && !$self->{quit}) {
+			warn $err, "\n";
+			$mic = undef;
+			sleep 60 unless $self->{quit};
+		}
+	}
+}
+
+sub watch_imap_idle_all ($$) {
+	my ($self, $idle) = @_; # $idle = [[ uri1, intvl1 ], [ uri2, intvl2 ]]
+	$self->{mics} = {}; # going to be forking, so disconnect
+	my $idle_pids = $self->{idle_pids} = {};
+	until ($self->{quit}) {
+		while (my $uri_intvl = shift @$idle) {
+			my ($uri, $intvl) = @$uri_intvl;
+			defined(my $pid = fork) or die "fork: $!";
+			if ($pid == 0) {
+				delete $self->{idle_pids};
+				watch_imap_idle_1($self, $uri, $intvl);
+				_exit(0);
+			}
+			$idle_pids->{$pid} = $uri_intvl;
+		}
+		my $pid = waitpid(-1, 0) or next;
+		if ($pid < 0) {
+			warn "W: no idling children: $!";
+			if (@$idle) {
+				sleep 60;
+			} else {
+				warn "W: nothing to respawn, quitting IDLE\n";
+				last;
+			}
+		}
+		if (my $uri_intvl = delete $idle_pids->{$pid}) {
+			my ($uri, $intvl) = @$uri_intvl;
+			my $url = $uri->as_string;
+			if ($? || !$self->{quit}) {
+				warn "W: PID=$pid on $url died: \$?=$?\n";
+			}
+			push @$idle, $uri_intvl;
+		} else {
+			warn "W: PID=$pid (unknown) reaped: \$?=$?\n";
+		}
+	}
+
+	# tear it all down
+	kill('QUIT', $_) for (keys %$idle_pids);
+	while (scalar keys %$idle_pids) {
+		if (my $pid = waitpid(-1, WNOHANG)) {
+			if ($pid < 0) {
+				warn "E: no children? $! (PIDs: ",
+					join(', ', keys %$idle_pids),")\n";
+				last;
+			} else {
+				delete $idle_pids->{$pid};
+			}
+		} else { # signals aren't that reliable w/o signalfd/kevent
+			sleep 1;
+			kill('QUIT', $_) for (keys %$idle_pids);
+		}
+	}
+}
+
+sub watch_imap ($) {
+	my ($self) = @_;
+	my $idle = []; # [ [ uri1, intvl1 ], [uri2, intvl2] ];
+	my $poll = {}; # intvl_seconds => [ uri1, uri2 ]
+	for my $url (keys %{$self->{imap}}) {
+		my $uri = PublicInbox::URIimap->new($url);
+		my $sec = imap_section($uri);
+		my $mic = $self->{mics}->{$sec};
+		my $intvl = $self->{imap_opt}->{$sec}->{poll_intvl};
+		if ($mic->has_capability('IDLE') && !$intvl) {
+			$intvl = $self->{imap_opt}->{$sec}->{idle_intvl};
+			push @$idle, [ $uri, $intvl // () ];
+		} else {
+			push @{$poll->{$intvl || 120}}, $uri;
+		}
+	}
+	my $nr_poll = scalar keys %$poll;
+	if (scalar @$idle && !$nr_poll) { # multiple idlers, need fork
+		watch_imap_idle_all($self, $idle);
+	}
+	# TODO: polling
+}
+
+sub watch {
+	my ($self) = @_;
+	if ($self->{mdre} && $self->{imap}) {
+		defined(my $pid = fork) or die "fork: $!";
+		if ($pid == 0) {
+			imap_start($self);
+			goto &watch_imap;
+		}
+		$self->{-imap_pid} = $pid;
+	} elsif ($self->{imap}) {
+		imap_start($self);
+		goto &watch_imap;
+	}
+	goto &watch_fs;
+}
+
 sub trigger_scan {
 	my ($self, $base) = @_;
 	my $dir = $self->{scandir} or return;
@@ -296,4 +663,22 @@ sub is_maildir {
 	$_[0];
 }
 
+sub is_watchspam {
+	my ($cur, $ws, $ibx) = @_;
+	if ($ws && !ref($ws) && $ws eq 'watchspam') {
+		warn <<EOF;
+E: $cur is a spam folder and cannot be used for `$ibx->{name}' input
+EOF
+		return 1;
+	}
+	undef;
+}
+
+sub imap_url {
+	my ($url) = @_;
+	require PublicInbox::URIimap;
+	my $uri = PublicInbox::URIimap->new($url);
+	$uri ? $uri->canonical->as_string : undef;
+}
+
 1;
diff --git a/t/imapd.t b/t/imapd.t
index ffa195d57ac..3f31743df2e 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -440,6 +440,45 @@ ok($mic->logout, 'logged out');
 	like(<$c>, qr/\Atagonly BAD Error in IMAP command/, 'tag-only line');
 }
 
+{
+	use_ok 'PublicInbox::WatchMaildir';
+	use_ok 'PublicInbox::InboxIdle';
+	my $home = "$tmpdir/watch_home";
+	mkdir $home or BAIL_OUT $!;
+	mkdir "$home/.public-inbox" or BAIL_OUT $!;
+	local $ENV{HOME} = $home;
+	my $name = 'watchimap';
+	my $addr = "i1\@example.com";
+	my $url = "http://example.com/i1";
+	my $inboxdir = "$tmpdir/watchimap";
+	my $cmd = ['-init', '-V2', '-Lbasic', $name, $inboxdir, $url, $addr];
+	my ($ihost, $iport) = ($sock->sockhost, $sock->sockport);
+	my $imapurl = "imap://$ihost:$iport/inbox.i1.0";
+	run_script($cmd) or BAIL_OUT("init $name");
+	xsys(qw(git config), "--file=$home/.public-inbox/config",
+			"publicinbox.$name.watch",
+			$imapurl) == 0 or BAIL_OUT "git config $?";
+	my $cfg = PublicInbox::Config->new;
+	PublicInbox::DS->Reset;
+	my $ii = PublicInbox::InboxIdle->new($cfg);
+	my $cb = sub { PublicInbox::DS->SetPostLoopCallback(sub {}) };
+	my $obj = bless \$cb, 'InboxWakeup';
+	$cfg->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) });
+	open my $err, '+>', undef or BAIL_OUT $!;
+	my $w = start_script(['-watch'], undef, { 2 => $err });
+	PublicInbox::DS->EventLoop;
+	diag 'inbox unlocked';
+	$w->kill;
+	$w->join;
+	is($?, 0, 'no error in exited -watch process');
+	$cfg->each_inbox(sub { shift->unsubscribe_unlock('ident') });
+	$ii->close;
+	PublicInbox::DS->Reset;
+	seek($err, 0, 0);
+	my @err = grep(!/^I:/, <$err>);
+	is(@err, 0, 'no warnings/errors from -watch'.join(' ', @err));
+}
+
 $td->kill;
 $td->join;
 is($?, 0, 'no error in exited process');
@@ -449,3 +488,7 @@ unlike($eout, qr/wide/i, 'no Wide character warnings');
 unlike($eout, qr/uninitialized/i, 'no uninitialized warnings');
 
 done_testing;
+
+package InboxWakeup;
+use strict;
+sub on_inbox_unlock { ${$_[0]}->() }
diff --git a/t/watch_imap.t b/t/watch_imap.t
new file mode 100644
index 00000000000..9433bb6f443
--- /dev/null
+++ b/t/watch_imap.t
@@ -0,0 +1,21 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::Config;
+# see t/imapd*.t for tests against a live IMAP server
+
+use_ok 'PublicInbox::WatchMaildir';
+my $cfg = PublicInbox::Config->new(\<<EOF);
+publicinbox.i.address=i\@example.com
+publicinbox.i.inboxdir=/nonexistent
+publicinbox.i.watch=imap://example.com/INBOX.a
+publicinboxlearn.watchspam=imap://example.com/INBOX.spam
+EOF
+my $watch = PublicInbox::WatchMaildir->new($cfg);
+is($watch->{imap}->{'imap://example.com/INBOX.a'}->[0]->{name}, 'i',
+	'watched an inbox');
+is($watch->{imap}->{'imap://example.com/INBOX.spam'}, 'watchspam',
+	'watched spam folder');
+
+done_testing;


* [PATCH 09/34] kqnotify|fake_inotify: detect Maildir write ops
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (7 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 08/34] watch: preliminary " Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 10/34] watch: remove Filesys::Notify::Simple dependency Eric Wong
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

We need to detect link(2) and rename(2) in other apps
writing to the Maildir.

We'll be removing the Filesys::Notify::Simple dependency from
-watch in favor of using IO::KQueue or Linux::Inotify2 directly.
Ensure non-inotify emulations can support everything we
expect for Maildir writers.
---
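For readers unfamiliar with Maildir delivery, here is a minimal sketch
of the two write patterns mentioned above and of the ctime comparison
the polling fallback relies on.  It is not code from this patch:
new_entries() and the temporary layout are hypothetical names used only
for illustration, and the long sleep is a conservative assumption so the
comparison also holds on filesystems with 1-second timestamps.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::HiRes qw(stat sleep);

# hypothetical helper: entries in $dir with a ctime newer than $old_ctime
sub new_entries {
	my ($dir, $old_ctime) = @_;
	opendir(my $dh, $dir) or die "opendir $dir: $!";
	my @new;
	while (defined(my $base = readdir($dh))) {
		next if $base =~ /\A\.\.?\z/;
		my @st = stat("$dir/$base") or next;
		push @new, $base if $st[10] > $old_ctime; # 10: ctime
	}
	closedir($dh);
	@new = sort @new;
	return @new;
}

my $md = tempdir(CLEANUP => 1);
mkdir "$md/$_" or die "mkdir: $!" for qw(tmp new);
my $old_ctime = (stat("$md/new"))[10];
sleep 1.1; # assumption: enough for any filesystem timestamp to advance

# writers first create the message under tmp/ ...
for my $name (qw(a b)) {
	open(my $fh, '>', "$md/tmp/$name") or die "open: $!";
	close($fh) or die "close: $!";
}

# ... then publish it into new/ via rename(2) ...
rename("$md/tmp/a", "$md/new/a") or die "rename: $!";

# ... or via link(2) followed by unlink(2)
link("$md/tmp/b", "$md/new/b") or die "link: $!";
unlink("$md/tmp/b") or die "unlink: $!";

my @found = new_entries("$md/new", $old_ctime);
print "new entries: @found\n";
```

Both delivery styles leave a new name in new/ without a separate
close-after-write on that name, which is why a directory-level check
such as the one sketched here is needed.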
 MANIFEST                       |  2 ++
 lib/PublicInbox/FakeInotify.pm | 46 ++++++++++++++++++++++++++++------
 lib/PublicInbox/KQNotify.pm    | 38 +++++++++++++++++++++++-----
 t/fake_inotify.t               | 45 +++++++++++++++++++++++++++++++++
 t/kqnotify.t                   | 41 ++++++++++++++++++++++++++++++
 5 files changed, 159 insertions(+), 13 deletions(-)
 create mode 100644 t/fake_inotify.t
 create mode 100644 t/kqnotify.t

diff --git a/MANIFEST b/MANIFEST
index 161b6cddbe0..9d1a4e4a8b1 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -253,6 +253,7 @@ t/eml_content_disposition.t
 t/eml_content_type.t
 t/epoll.t
 t/fail-bin/spamc
+t/fake_inotify.t
 t/feed.t
 t/filter_base-junk.eml
 t/filter_base-xhtml.eml
@@ -286,6 +287,7 @@ t/indexlevels-mirror-v1.t
 t/indexlevels-mirror.t
 t/init.t
 t/iso-2202-jp.eml
+t/kqnotify.t
 t/linkify.t
 t/main-bin/spamc
 t/mda-mime.eml
diff --git a/lib/PublicInbox/FakeInotify.pm b/lib/PublicInbox/FakeInotify.pm
index b077d63a4b4..df63173f083 100644
--- a/lib/PublicInbox/FakeInotify.pm
+++ b/lib/PublicInbox/FakeInotify.pm
@@ -6,10 +6,13 @@
 package PublicInbox::FakeInotify;
 use strict;
 use Time::HiRes qw(stat);
+use PublicInbox::DS;
 my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify
+# my $IN_MOVED_TO = 0x80;
+# my $IN_CREATE = 0x100;
+sub MOVED_TO_OR_CREATE () { 0x80 | 0x100 }
 
 my $poll_intvl = 2; # same as Filesys::Notify::Simple
-my $for_cancel = bless \(my $x), 'PublicInbox::FakeInotify::Watch';
 
 sub poll_once {
 	my ($self) = @_;
@@ -30,8 +33,22 @@ sub new {
 sub watch {
 	my ($self, $path, $mask, $cb) = @_;
 	my @st = stat($path) or return;
-	$self->{watch}->{"$path\0$mask"} = [ @st, $cb ];
-	$for_cancel;
+	my $k = "$path\0$mask";
+	$self->{watch}->{$k} = [ $st[10], $cb ]; # 10 - ctime
+	bless [ $self->{watch}, $k ], 'PublicInbox::FakeInotify::Watch';
+}
+
+sub on_new_files ($$$$) {
+	my ($dh, $cb, $path, $old_ctime) = @_;
+	while (defined(my $base = readdir($dh))) {
+		next if $base =~ /\A\.\.?\z/;
+		my $full = "$path/$base";
+		my @st = stat($full);
+		if (@st && $st[10] > $old_ctime) {
+			bless \$full, 'PublicInbox::FakeInotify::Event';
+			eval { $cb->(\$full) };
+		}
+	}
 }
 
 # behaves like non-blocking Linux::Inotify2->poll
@@ -43,17 +60,32 @@ sub poll {
 		my @now = stat($path) or next;
 		my $prv = $watch->{$x};
 		my $cb = $prv->[-1];
-		# 10: ctime, 7: size
-		if ($prv->[10] != $now[10]) {
+		my $old_ctime = $prv->[0];
+		if ($old_ctime != $now[10]) {
 			if (($mask & $IN_CLOSE) == $IN_CLOSE) {
 				eval { $cb->() };
+			} elsif ($mask & MOVED_TO_OR_CREATE) {
+				opendir(my $dh, $path) or do {
+					warn "W: opendir $path: $!\n";
+					next;
+				};
+				on_new_files($dh, $cb, $path, $old_ctime);
 			}
 		}
-		@$prv = (@now, $cb);
+		@$prv = ($now[10], $cb);
 	}
 }
 
 package PublicInbox::FakeInotify::Watch;
-sub cancel {} # noop
+use strict;
+
+sub cancel {
+	my ($self) = @_;
+	delete $self->[0]->{$self->[1]};
+}
+
+package PublicInbox::FakeInotify::Event;
+use strict;
 
+sub fullname { ${$_[0]} }
 1;
diff --git a/lib/PublicInbox/KQNotify.pm b/lib/PublicInbox/KQNotify.pm
index 110594cc02c..9673b44290a 100644
--- a/lib/PublicInbox/KQNotify.pm
+++ b/lib/PublicInbox/KQNotify.pm
@@ -7,6 +7,11 @@ package PublicInbox::KQNotify;
 use strict;
 use IO::KQueue;
 use PublicInbox::DSKQXS; # wraps IO::KQueue for fork-safe DESTROY
+use PublicInbox::FakeInotify;
+use Time::HiRes qw(stat);
+
+# NOTE_EXTEND detects rename(2), NOTE_WRITE detects link(2)
+sub MOVED_TO_OR_CREATE () { NOTE_EXTEND|NOTE_WRITE }
 
 sub new {
 	my ($class) = @_;
@@ -15,19 +20,28 @@ sub new {
 
 sub watch {
 	my ($self, $path, $mask, $cb) = @_;
-	open(my $fh, '<', $path) or return;
+	my ($fh, $cls, @extra);
+	if (-d $path) {
+		opendir($fh, $path) or return;
+		my @st = stat($fh);
+		@extra = ($path, $st[10]); # 10: ctime
+		$cls = 'PublicInbox::KQNotify::Watchdir';
+	} else {
+		open($fh, '<', $path) or return;
+		$cls = 'PublicInbox::KQNotify::Watch';
+	}
 	my $ident = fileno($fh);
 	$self->{dskq}->{kq}->EV_SET($ident, # ident
 		EVFILT_VNODE, # filter
 		EV_ADD | EV_CLEAR, # flags
 		$mask, # fflags
 		0, 0); # data, udata
-	if ($mask == NOTE_WRITE) {
-		$self->{watch}->{$ident} = [ $fh, $cb ];
+	if ($mask == NOTE_WRITE || $mask == MOVED_TO_OR_CREATE) {
+		$self->{watch}->{$ident} = [ $fh, $cb, @extra ];
 	} else {
 		die "TODO Not implemented: $mask";
 	}
-	bless \$fh, 'PublicInbox::KQNotify::Watch';
+	bless \$fh, $cls;
 }
 
 # emulate Linux::Inotify::fileno
@@ -48,8 +62,15 @@ sub poll {
 	for my $kev (@kevents) {
 		my $ident = $kev->[KQ_IDENT];
 		my $mask = $kev->[KQ_FFLAGS];
-		if (($mask & NOTE_WRITE) == NOTE_WRITE) {
-			eval { $self->{watch}->{$ident}->[1]->() };
+		my ($dh, $cb, $path, $old_ctime) = @{$self->{watch}->{$ident}};
+		if (!defined($path) && ($mask & NOTE_WRITE) == NOTE_WRITE) {
+			eval { $cb->() };
+		} elsif ($mask & MOVED_TO_OR_CREATE) {
+			my @new_st = stat($path) or next;
+			$self->{watch}->{$ident}->[3] = $new_st[10]; # ctime
+			rewinddir($dh);
+			PublicInbox::FakeInotify::on_new_files($dh, $cb,
+							$path, $old_ctime);
 		}
 	}
 }
@@ -59,4 +80,9 @@ use strict;
 
 sub cancel { close ${$_[0]} or die "close: $!" }
 
+package PublicInbox::KQNotify::Watchdir;
+use strict;
+
+sub cancel { closedir ${$_[0]} or die "closedir: $!" }
+
 1;
diff --git a/t/fake_inotify.t b/t/fake_inotify.t
new file mode 100644
index 00000000000..f0db0cb58ec
--- /dev/null
+++ b/t/fake_inotify.t
@@ -0,0 +1,45 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Ensure FakeInotify can pick up rename(2) and link(2) operations
+# used by Maildir writing tools
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+use_ok 'PublicInbox::FakeInotify';
+my $MIN_FS_TICK = 0.011; # for low-res CONFIG_HZ=100 systems
+my ($tmpdir, $for_destroy) = tmpdir();
+mkdir "$tmpdir/new" or BAIL_OUT "mkdir: $!";
+open my $fh, '>', "$tmpdir/tst" or BAIL_OUT "open: $!";
+close $fh or BAIL_OUT "close: $!";
+
+my $fi = PublicInbox::FakeInotify->new;
+my $mask = PublicInbox::FakeInotify::MOVED_TO_OR_CREATE();
+my $hit = [];
+my $cb = sub { push @$hit, map { $_->fullname } @_ };
+my $w = $fi->watch("$tmpdir/new", $mask, $cb);
+
+select undef, undef, undef, $MIN_FS_TICK;
+rename("$tmpdir/tst", "$tmpdir/new/tst") or BAIL_OUT "rename: $!";
+$fi->poll;
+is_deeply($hit, ["$tmpdir/new/tst"], 'rename(2) detected');
+
+@$hit = ();
+select undef, undef, undef, $MIN_FS_TICK;
+open $fh, '>', "$tmpdir/tst" or BAIL_OUT "open: $!";
+close $fh or BAIL_OUT "close: $!";
+link("$tmpdir/tst", "$tmpdir/new/link") or BAIL_OUT "link: $!";
+$fi->poll;
+is_deeply($hit, ["$tmpdir/new/link"], 'link(2) detected');
+
+$w->cancel;
+@$hit = ();
+select undef, undef, undef, $MIN_FS_TICK;
+link("$tmpdir/new/tst", "$tmpdir/new/link2") or BAIL_OUT "link: $!";
+$fi->poll;
+is_deeply($hit, [], 'link(2) not detected after cancel');
+
+PublicInbox::DS->Reset;
+
+done_testing;
diff --git a/t/kqnotify.t b/t/kqnotify.t
new file mode 100644
index 00000000000..b3414b8ae33
--- /dev/null
+++ b/t/kqnotify.t
@@ -0,0 +1,41 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Ensure KQNotify can pick up rename(2) and link(2) operations
+# used by Maildir writing tools
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+plan skip_all => 'KQNotify is only for *BSD systems' if $^O !~ /bsd/;
+require_mods('IO::KQueue');
+use_ok 'PublicInbox::KQNotify';
+my ($tmpdir, $for_destroy) = tmpdir();
+mkdir "$tmpdir/new" or BAIL_OUT "mkdir: $!";
+open my $fh, '>', "$tmpdir/tst" or BAIL_OUT "open: $!";
+close $fh or BAIL_OUT "close: $!";
+
+my $kqn = PublicInbox::KQNotify->new;
+my $mask = PublicInbox::KQNotify::MOVED_TO_OR_CREATE();
+my $hit = [];
+my $cb = sub { push @$hit, map { $_->fullname } @_ };
+my $w = $kqn->watch("$tmpdir/new", $mask, $cb);
+
+rename("$tmpdir/tst", "$tmpdir/new/tst") or BAIL_OUT "rename: $!";
+$kqn->poll;
+is_deeply($hit, ["$tmpdir/new/tst"], 'rename(2) detected (via NOTE_EXTEND)');
+
+@$hit = ();
+open $fh, '>', "$tmpdir/tst" or BAIL_OUT "open: $!";
+close $fh or BAIL_OUT "close: $!";
+link("$tmpdir/tst", "$tmpdir/new/link") or BAIL_OUT "link: $!";
+$kqn->poll;
+is_deeply($hit, ["$tmpdir/new/link"], 'link(2) detected (via NOTE_WRITE)');
+
+$w->cancel;
+@$hit = ();
+link("$tmpdir/new/tst", "$tmpdir/new/link2") or BAIL_OUT "link: $!";
+$kqn->poll;
+is_deeply($hit, [], 'link(2) not detected after cancel');
+
+done_testing;


* [PATCH 10/34] watch: remove Filesys::Notify::Simple dependency
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (8 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 09/34] kqnotify|fake_inotify: detect Maildir write ops Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 11/34] watch: use signalfd for Maildir watching Eric Wong
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Since we already use inotify and EVFILT_VNODE (kqueue)
in -imapd, we might as well use them directly in -watch,
too.

This will allow public-inbox-watch to use PublicInbox::DS
for timers to watch newsgroups/mailboxes and have saner
signal handling in future commits.
---
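As a rough illustration of the backend choice this change moves into
-watch, the sketch below picks a notification module if one can be
loaded and otherwise falls back to polling.  It is only a sketch under
assumptions: first_loadable() is a hypothetical helper, and the real
selection logic lives in the DirIdle.pm added by this patch.

```perl
use strict;
use warnings;

# hypothetical helper: return the first class in @_ that can be loaded
sub first_loadable {
	for my $cls (@_) {
		(my $file = "$cls.pm") =~ s!::!/!g;
		return $cls if eval { require $file; 1 };
	}
	return undef;
}

# prefer inotify on Linux and kqueue elsewhere; both are optional
my @want = $^O eq 'linux' ? qw(Linux::Inotify2) : qw(IO::KQueue);
my $backend = first_loadable(@want) // 'stat(2) polling';
print "notification backend: $backend\n";
```

Loading inside eval keeps a missing optional module from being fatal,
so the stat(2)-based fallback remains usable everywhere.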
 Documentation/public-inbox-watch.pod |  3 +-
 INSTALL                              |  8 ---
 MANIFEST                             |  2 +
 Makefile.PL                          |  4 --
 ci/deps.perl                         |  1 -
 lib/PublicInbox/DirIdle.pm           | 50 ++++++++++++++++
 lib/PublicInbox/In2Tie.pm            | 13 +++++
 lib/PublicInbox/InboxIdle.pm         | 11 +---
 lib/PublicInbox/TestCommon.pm        |  6 +-
 lib/PublicInbox/WatchMaildir.pm      | 39 +++++++------
 script/public-inbox-watch            |  3 +-
 t/dir_idle.t                         |  6 ++
 t/imapd.t                            |  6 +-
 t/watch_filter_rubylang.t            |  2 +-
 t/watch_maildir.t                    | 85 +++++++++++++++++++++++++---
 t/watch_maildir_v2.t                 |  2 +-
 t/watch_multiple_headers.t           |  2 +-
 17 files changed, 182 insertions(+), 61 deletions(-)
 create mode 100644 lib/PublicInbox/DirIdle.pm
 create mode 100644 t/dir_idle.t

diff --git a/Documentation/public-inbox-watch.pod b/Documentation/public-inbox-watch.pod
index 8a3ef076a51..bf3c9bd4bb9 100644
--- a/Documentation/public-inbox-watch.pod
+++ b/Documentation/public-inbox-watch.pod
@@ -48,8 +48,7 @@ of large Maildirs.
 Upon startup, it scans the mailbox for new messages to be
 imported while it was not running.
 
-Currently, only Maildirs are supported and the
-L<Filesys::Notify::Simple> Perl module is required.
+Currently, only Maildirs are supported.
 
 For now, IMAP users should use tools such as L<mbsync(1)>
 or L<offlineimap(1)> to bidirectionally sync their IMAP
diff --git a/INSTALL b/INSTALL
index 80cee7535f8..05e0f95e914 100644
--- a/INSTALL
+++ b/INSTALL
@@ -132,18 +132,10 @@ above, so there is no need to explicitly install them:
                                    (optional for stale FD cleanup in daemons,
                                     typically installed alongside Perl5)
 
-- Filesys::Notify::Simple          deb: libfilesys-notify-simple-perl
-                                   pkg: p5-Filesys-Notify-Simple
-                                   rpm: perl-Filesys-Notify-Simple
-                                   (for public-inbox-watch, pulled in by Plack)
-
 - Linux::Inotify2                  deb: liblinux-inotify2-perl
                                    rpm: perl-Linux-Inotify2
                                    (for public-inbox-watch on Linux)
 
-- Filesys::Notify::KQueue          pkg: p5-Filesys-Notify-KQueue
-                                   (for public-inbox-watch on FreeBSD)
-
 - IO::Compress (::Gzip)            deb: perl-modules (or libio-compress-perl)
                                    pkg: perl5
                                    rpm: perl-IO-Compress
diff --git a/MANIFEST b/MANIFEST
index 9d1a4e4a8b1..035c45bf498 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -106,6 +106,7 @@ lib/PublicInbox/DS.pm
 lib/PublicInbox/DSKQXS.pm
 lib/PublicInbox/DSPoll.pm
 lib/PublicInbox/Daemon.pm
+lib/PublicInbox/DirIdle.pm
 lib/PublicInbox/DummyInbox.pm
 lib/PublicInbox/Emergency.pm
 lib/PublicInbox/Eml.pm
@@ -243,6 +244,7 @@ t/content_hash.t
 t/convert-compact.t
 t/data/0001.patch
 t/data/message_embed.eml
+t/dir_idle.t
 t/ds-kqxs.t
 t/ds-leak.t
 t/ds-poll.t
diff --git a/Makefile.PL b/Makefile.PL
index 472baa9b080..8d90ad46cf6 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -138,10 +138,6 @@ WriteMakefile(
 		# Plack is needed for public-inbox-httpd and PublicInbox::WWW
 		# 'Plack' => 0,
 
-		# Filesys::Notify::Simple is pulled in by Plack, but also
-		# needed by public-inbox-watch (for now)
-		# 'Filesys::Notify::Simple' => 0,
-
 		# TODO: this should really be made optional...
 		'URI::Escape' => 0,
 
diff --git a/ci/deps.perl b/ci/deps.perl
index 48aaa9e46d4..501f51129e7 100755
--- a/ci/deps.perl
+++ b/ci/deps.perl
@@ -32,7 +32,6 @@ my $profiles = {
 		BSD::Resource
 		DBD::SQLite
 		DBI
-		Filesys::Notify::Simple
 		Inline::C
 		Net::Server
 		Plack
diff --git a/lib/PublicInbox/DirIdle.pm b/lib/PublicInbox/DirIdle.pm
new file mode 100644
index 00000000000..ffceda66530
--- /dev/null
+++ b/lib/PublicInbox/DirIdle.pm
@@ -0,0 +1,50 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# Used by public-inbox-watch for Maildir (and possibly MH in the future)
+package PublicInbox::DirIdle;
+use strict;
+use base 'PublicInbox::DS';
+use fields qw(inot);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
+use PublicInbox::In2Tie;
+
+my ($MAIL_IN, $ino_cls);
+if ($^O eq 'linux' && eval { require Linux::Inotify2; 1 }) {
+	$MAIL_IN = Linux::Inotify2::IN_MOVED_TO() |
+		Linux::Inotify2::IN_CREATE();
+	$ino_cls = 'Linux::Inotify2';
+} elsif (eval { require PublicInbox::KQNotify }) {
+	$MAIL_IN = PublicInbox::KQNotify::MOVED_TO_OR_CREATE();
+	$ino_cls = 'PublicInbox::KQNotify';
+} else {
+	require PublicInbox::FakeInotify;
+	$MAIL_IN = PublicInbox::FakeInotify::MOVED_TO_OR_CREATE();
+}
+
+sub new {
+	my ($class, $dirs, $cb) = @_;
+	my $self = fields::new($class);
+	my $inot;
+	if ($ino_cls) {
+		$inot = $ino_cls->new or die "E: $ino_cls->new: $!";
+		my $io = PublicInbox::In2Tie::io($inot);
+		$self->SUPER::new($io, EPOLLIN | EPOLLET);
+	} else {
+		require PublicInbox::FakeInotify;
+		$inot = PublicInbox::FakeInotify->new; # starts timer
+	}
+
+	# Linux::Inotify2->watch or similar
+	$inot->watch($_, $MAIL_IN, $cb) for @$dirs;
+	$self->{inot} = $inot;
+	$self;
+}
+
+sub event_step {
+	my ($self) = @_;
+	eval { $self->{inot}->poll }; # Linux::Inotify2::poll
+	warn "$self->{inot}->poll err: $@\n" if $@;
+}
+
+1;
diff --git a/lib/PublicInbox/In2Tie.pm b/lib/PublicInbox/In2Tie.pm
index db1dc1045c1..7dee362724e 100644
--- a/lib/PublicInbox/In2Tie.pm
+++ b/lib/PublicInbox/In2Tie.pm
@@ -5,6 +5,19 @@
 # on Linux::Inotify2 objects
 package PublicInbox::In2Tie;
 use strict;
+use Symbol qw(gensym);
+
+sub io {
+	my $in2 = $_[0];
+	$in2->blocking(0);
+	if ($in2->can('on_overflow')) {
+		# broadcasts everything on overflow
+		$in2->on_overflow(undef);
+	}
+	my $io = gensym;
+	tie *$io, __PACKAGE__, $in2;
+	$io;
+}
 
 sub TIEHANDLE {
 	my ($class, $in2) = @_;
diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm
index 97e9d53250e..ba8200aef05 100644
--- a/lib/PublicInbox/InboxIdle.pm
+++ b/lib/PublicInbox/InboxIdle.pm
@@ -6,7 +6,6 @@ use strict;
 use base qw(PublicInbox::DS);
 use fields qw(pi_config inot pathmap);
 use Cwd qw(abs_path);
-use Symbol qw(gensym);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
 my $IN_MODIFY = 0x02; # match Linux inotify
 my $ino_cls;
@@ -55,14 +54,8 @@ sub new {
 	my $inot;
 	if ($ino_cls) {
 		$inot = $ino_cls->new or die "E: $ino_cls->new: $!";
-		my $sock = gensym;
-		tie *$sock, 'PublicInbox::In2Tie', $inot;
-		$inot->blocking(0);
-		if ($inot->can('on_overflow')) {
-			 # broadcasts everything on overflow
-			$inot->on_overflow(undef);
-		}
-		$self->SUPER::new($sock, EPOLLIN | EPOLLET);
+		my $io = PublicInbox::In2Tie::io($inot);
+		$self->SUPER::new($io, EPOLLIN | EPOLLET);
 	} else {
 		require PublicInbox::FakeInotify;
 		$inot = PublicInbox::FakeInotify->new;
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index dc360135569..b252810fca5 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -10,7 +10,7 @@ use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
 use POSIX qw(dup2);
 use IO::Socket::INET;
 our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
-	run_script start_script key2sub xsys xqx eml_load);
+	run_script start_script key2sub xsys xqx eml_load tick);
 
 sub eml_load ($) {
 	my ($path, $cb) = @_;
@@ -418,4 +418,8 @@ sub DESTROY {
 	$self->join('TERM');
 }
 
+package PublicInbox::TestCommon::InboxWakeup;
+use strict;
+sub on_inbox_unlock { ${$_[0]}->($_[1]) }
+
 1;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index fea7d5ef9ee..22f190366a4 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -119,19 +119,6 @@ sub _done_for_now {
 	}
 }
 
-sub _try_fsn_paths {
-	my ($self, $scan_re, $paths) = @_;
-	foreach (@$paths) {
-		my $path = $_->{path};
-		if ($path =~ $scan_re) {
-			scan($self, $path);
-		} else {
-			_try_path($self, $path);
-		}
-	}
-	_done_for_now($self);
-}
-
 sub remove_eml_i { # each_inbox callback
 	my ($ibx, $arg) = @_;
 	my ($self, $eml, $loc) = @$arg;
@@ -225,16 +212,28 @@ sub quit {
 
 sub watch_fs {
 	my ($self) = @_;
+	require PublicInbox::DirIdle;
 	my $scan = File::Temp->newdir("public-inbox-watch.$$.scan.XXXXXX",
 					TMPDIR => 1);
 	my $scandir = $self->{scandir} = $scan->dirname;
-	my $re = qr!\A$scandir/!;
-	my $cb = sub { _try_fsn_paths($self, $re, \@_) };
-
-	eval { require Filesys::Notify::Simple } or
-		die "Filesys::Notify::Simple is currently required for $0\n";
-	my $fsn = Filesys::Notify::Simple->new([@{$self->{mdir}}, $scandir]);
-	$fsn->wait($cb) until $self->{quit};
+	my $scan_re = qr!\A$scandir/!;
+	my $done = sub {
+		delete $self->{done_timer};
+		_done_for_now($self);
+	};
+	my $cb = sub {
+		my $path = $_[0]->fullname;
+		if ($path =~ $scan_re) {
+			scan($self, $path);
+		} else {
+			_try_path($self, $path);
+		}
+		$self->{done_timer} //= PublicInbox::DS::requeue($done);
+	};
+	my $di = PublicInbox::DirIdle->new([@{$self->{mdir}}, $scandir], $cb);
+	PublicInbox::DS->SetPostLoopCallback(sub { !$self->{quit} });
+	PublicInbox::DS->EventLoop;
+	_done_for_now($self);
 }
 
 # returns the git config section name, e.g [imap "imaps://user@example.com"]
diff --git a/script/public-inbox-watch b/script/public-inbox-watch
index 645abeda971..2057066a2a9 100755
--- a/script/public-inbox-watch
+++ b/script/public-inbox-watch
@@ -21,6 +21,7 @@ if ($watch_md) {
 		$watch_md->quit if $watch_md;
 		$watch_md = undef;
 	};
-	alarm(1);
+	# --no-scan is only intended for testing atm, undocumented.
+	alarm(1) unless (grep(/\A--no-scan\z/, @ARGV));
 	$watch_md->watch while ($watch_md);
 }
diff --git a/t/dir_idle.t b/t/dir_idle.t
new file mode 100644
index 00000000000..587599e83ff
--- /dev/null
+++ b/t/dir_idle.t
@@ -0,0 +1,6 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use Test::More;
+use_ok 'PublicInbox::DirIdle';
+done_testing;
diff --git a/t/imapd.t b/t/imapd.t
index 3f31743df2e..cc87a127851 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -462,7 +462,7 @@ ok($mic->logout, 'logged out');
 	PublicInbox::DS->Reset;
 	my $ii = PublicInbox::InboxIdle->new($cfg);
 	my $cb = sub { PublicInbox::DS->SetPostLoopCallback(sub {}) };
-	my $obj = bless \$cb, 'InboxWakeup';
+	my $obj = bless \$cb, 'PublicInbox::TestCommon::InboxWakeup';
 	$cfg->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) });
 	open my $err, '+>', undef or BAIL_OUT $!;
 	my $w = start_script(['-watch'], undef, { 2 => $err });
@@ -488,7 +488,3 @@ unlike($eout, qr/wide/i, 'no Wide character warnings');
 unlike($eout, qr/uninitialized/i, 'no uninitialized warnings');
 
 done_testing;
-
-package InboxWakeup;
-use strict;
-sub on_inbox_unlock { ${$_[0]}->() }
diff --git a/t/watch_filter_rubylang.t b/t/watch_filter_rubylang.t
index 2e7d402e5fa..db48cb2ffde 100644
--- a/t/watch_filter_rubylang.t
+++ b/t/watch_filter_rubylang.t
@@ -6,7 +6,7 @@ use PublicInbox::TestCommon;
 use Test::More;
 use PublicInbox::Eml;
 use PublicInbox::Config;
-require_mods(qw(Filesys::Notify::Simple DBD::SQLite Search::Xapian));
+require_mods(qw(DBD::SQLite Search::Xapian));
 use_ok 'PublicInbox::WatchMaildir';
 use_ok 'PublicInbox::Emergency';
 my ($tmpdir, $for_destroy) = tmpdir();
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index 33a3458bd4d..a2c09b0351b 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -7,7 +7,6 @@ use Cwd;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
 use PublicInbox::Import;
-require_mods(qw(Filesys::Notify::Simple));
 my ($tmpdir, $for_destroy) = tmpdir();
 my $git_dir = "$tmpdir/test.git";
 my $maildir = "$tmpdir/md";
@@ -47,14 +46,22 @@ EOF
 		'only got the spam folder to watch');
 }
 
-my $config = PublicInbox::Config->new(\<<EOF);
-$cfgpfx.address=$addr
-$cfgpfx.inboxdir=$git_dir
-$cfgpfx.watch=maildir:$maildir
-$cfgpfx.filter=PublicInbox::Filter::Vger
-publicinboxlearn.watchspam=maildir:$spamdir
+my $cfg_path = "$tmpdir/config";
+{
+	open my $fh, '>', $cfg_path or BAIL_OUT $!;
+	print $fh <<EOF or BAIL_OUT $!;
+[publicinbox "test"]
+	address = $addr
+	inboxdir = $git_dir
+	watch = maildir:$maildir
+	filter = PublicInbox::Filter::Vger
+[publicinboxlearn]
+	watchspam = maildir:$spamdir
 EOF
+	close $fh or BAIL_OUT $!;
+}
 
+my $config = PublicInbox::Config->new($cfg_path);
 PublicInbox::WatchMaildir->new($config)->scan('full');
 my $git = PublicInbox::Git->new($git_dir);
 my @list = $git->qx(qw(rev-list refs/heads/master));
@@ -136,6 +143,70 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 	like($$mref, qr/something\n\z/s, 'message scrubbed on import');
 }
 
+# end-to-end test which actually uses inotify/kevent
+{
+	my $env = { PI_CONFIG => $cfg_path };
+	$git->cleanup;
+
+	# n.b. --no-scan is only intended for testing atm
+	my $wm = start_script([qw(-watch --no-scan)], $env);
+	my $eml = eml_load('t/data/0001.patch');
+	$eml->header_set('Cc', $addr);
+	my $em = PublicInbox::Emergency->new($maildir);
+	$em->prepare(\($eml->as_string));
+
+	use_ok 'PublicInbox::InboxIdle';
+	use_ok 'PublicInbox::DS';
+	my $delivered = 0;
+	my $cb = sub {
+		my ($ibx) = @_;
+		diag "message delivered to `$ibx->{name}'";
+		$delivered++;
+	};
+	PublicInbox::DS->Reset;
+	my $ii = PublicInbox::InboxIdle->new($config);
+	my $obj = bless \$cb, 'PublicInbox::TestCommon::InboxWakeup';
+	$config->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) });
+	PublicInbox::DS->SetPostLoopCallback(sub { $delivered == 0 });
+
+	# wait for -watch to setup inotify watches
+	my $sleep = 1;
+	if (eval { require Linux::Inotify2 } && -d "/proc/$wm->{pid}/fd") {
+		my $end = time + 2;
+		my (@ino, @ino_info);
+		do {
+			@ino = grep {
+				(readlink($_)//'') =~ /\binotify\b/
+			} glob("/proc/$wm->{pid}/fd/*");
+		} until (@ino || time > $end || !tick);
+		if (scalar(@ino) == 1) {
+			my $ino_fd = (split('/', $ino[0]))[-1];
+			my $ino_fdinfo = "/proc/$wm->{pid}/fdinfo/$ino_fd";
+			while (time < $end && open(my $fh, '<', $ino_fdinfo)) {
+				@ino_info = grep(/^inotify wd:/, <$fh>);
+				last if @ino_info >= 4;
+				tick;
+			}
+			$sleep = undef if @ino_info >= 4;
+		}
+	}
+	if ($sleep) {
+		diag "waiting ${sleep}s for -watch to start up";
+		sleep $sleep;
+	}
+
+	$em->commit; # wake -watch up
+	diag 'waiting for -watch to import new message';
+	PublicInbox::DS->EventLoop;
+	$wm->kill;
+	$wm->join;
+	$ii->close;
+	PublicInbox::DS->Reset;
+	my $head = $git->qx(qw(cat-file commit HEAD));
+	my $subj = $eml->header('Subject');
+	like($head, qr/^\Q$subj\E/sm, 'new commit made');
+}
+
 sub is_maildir {
 	my ($dir) = @_;
 	PublicInbox::WatchMaildir::is_maildir($dir);
diff --git a/t/watch_maildir_v2.t b/t/watch_maildir_v2.t
index 19a2da77070..6cc8b6ff0e9 100644
--- a/t/watch_maildir_v2.t
+++ b/t/watch_maildir_v2.t
@@ -8,7 +8,7 @@ use PublicInbox::Config;
 use PublicInbox::TestCommon;
 use PublicInbox::Import;
 require_git(2.6);
-require_mods(qw(Search::Xapian DBD::SQLite Filesys::Notify::Simple));
+require_mods(qw(Search::Xapian DBD::SQLite));
 require PublicInbox::V2Writable;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $inboxdir = "$tmpdir/v2";
diff --git a/t/watch_multiple_headers.t b/t/watch_multiple_headers.t
index 3a39eba9e0e..0ee96d5ff89 100644
--- a/t/watch_multiple_headers.t
+++ b/t/watch_multiple_headers.t
@@ -5,7 +5,7 @@ use Test::More;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
 require_git(2.6);
-require_mods(qw(Search::Xapian DBD::SQLite Filesys::Notify::Simple));
+require_mods(qw(Search::Xapian DBD::SQLite));
 my ($tmpdir, $for_destroy) = tmpdir();
 my $inboxdir = "$tmpdir/v2";
 my $maildir = "$tmpdir/md";


* [PATCH 11/34] watch: use signalfd for Maildir watching
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (9 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 10/34] watch: remove Filesys::Notify::Simple dependency Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 19:05   ` Kyle Meyer
  2020-06-27 10:03 ` [PATCH 12/34] ds: remove fields.pm usage Eric Wong
                   ` (23 subsequent siblings)
  34 siblings, 1 reply; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

We can get rid of the janky wannabe
self-using-a-directory-instead-of-pipe thing we needed to
work around Filesys::Notify::Simple being blocking.

For existing Maildir users, this should be more robust and
immune to missed wakeups on signalfd and kqueue-enabled
systems, as well as being immune to BOFHs clearing $TMPDIR
and preventing notifications from firing.

The IMAP IDLE code still uses normal Perl signals, so it's still
vulnerable to missed wakeups.  That will be addressed in future
commits.
---
 lib/PublicInbox/Daemon.pm       | 19 ++++------
 lib/PublicInbox/Sigfd.pm        | 12 +++++-
 lib/PublicInbox/WatchMaildir.pm | 67 ++++++++++++++++++---------------
 script/public-inbox-watch       | 24 +++++++++---
 t/watch_maildir.t               |  4 +-
 5 files changed, 75 insertions(+), 51 deletions(-)
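
A minimal standalone sketch of the block/restore pattern which
block_signals() and sig_setmask() in Sigfd.pm implement, assuming only
core POSIX.pm.  The two helpers are re-declared here so the snippet
runs outside of public-inbox; the USR1 handler, $hits and the $seen_*
variables are illustrative and not part of this patch:

```perl
use strict;
use warnings;
use POSIX qw(:signal_h);

# same shape as the helpers added to Sigfd.pm, copied for illustration
sub sig_setmask { sigprocmask(SIG_SETMASK, @_) or die "sigprocmask: $!" }

sub block_signals {
	my $oldset = POSIX::SigSet->new;
	my $newset = POSIX::SigSet->new;
	$newset->fillset or die "fillset: $!";
	sig_setmask($newset, $oldset); # block everything, save old mask
	$oldset; # caller keeps this to restore the mask later
}

my $hits = 0; # hypothetical counter, bumped by the handler
$SIG{USR1} = sub { $hits++ };

my $oldset = block_signals();
kill('USR1', $$); # stays pending while blocked
my $seen_while_blocked = $hits;

sig_setmask($oldset); # pending USR1 is delivered here
my $seen_after_restore = $hits;
print "blocked=$seen_while_blocked restored=$seen_after_restore\n";
```

A signal sent while the mask is full stays pending until the old mask
is restored, which is why -watch can block everything at startup and
only restore $oldset when neither signalfd nor kqueue is usable.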

diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 2f63bd73b4a..ab0c2226e40 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -18,9 +18,9 @@ use PublicInbox::DS qw(now);
 use PublicInbox::Syscall qw(SFD_NONBLOCK);
 require PublicInbox::Listener;
 require PublicInbox::ParentPipe;
-require PublicInbox::Sigfd;
+use PublicInbox::Sigfd;
 my @CMD;
-my ($set_user, $oldset, $newset);
+my ($set_user, $oldset);
 my (@cfg_listen, $stdout, $stderr, $group, $user, $pid_file, $daemonize);
 my $worker_processes = 1;
 my @listeners;
@@ -72,15 +72,10 @@ sub accept_tls_opt ($) {
 	{ SSL_server => 1, SSL_startHandshake => 0, SSL_reuse_ctx => $ctx };
 }
 
-sub sig_setmask { sigprocmask(SIG_SETMASK, @_) or die "sigprocmask: $!" }
-
 sub daemon_prepare ($) {
 	my ($default_listen) = @_;
 	my $listener_names = {}; # sockname => IO::Handle
-	$oldset = POSIX::SigSet->new();
-	$newset = POSIX::SigSet->new();
-	$newset->fillset or die "fillset: $!";
-	sig_setmask($newset, $oldset);
+	$oldset = PublicInbox::Sigfd::block_signals();
 	@CMD = ($0, @ARGV);
 	my %opts = (
 		'l|listen=s' => \@cfg_listen,
@@ -515,7 +510,7 @@ EOF
 	};
 	my $sigfd = PublicInbox::Sigfd->new($sig, 0);
 	local %SIG = (%SIG, %$sig) if !$sigfd;
-	sig_setmask($oldset) if !$sigfd;
+	PublicInbox::Sigfd::sig_setmask($oldset) if !$sigfd;
 	while (1) { # main loop
 		my $n = scalar keys %pids;
 		unless (@listeners) {
@@ -531,7 +526,7 @@ EOF
 		}
 		my $want = $worker_processes - 1;
 		if ($n <= $want) {
-			sig_setmask($newset) if !$sigfd;
+			PublicInbox::Sigfd::block_signals() if !$sigfd;
 			for my $i ($n..$want) {
 				my $pid = fork;
 				if (!defined $pid) {
@@ -544,7 +539,7 @@ EOF
 					$pids{$pid} = $i;
 				}
 			}
-			sig_setmask($oldset) if !$sigfd;
+			PublicInbox::Sigfd::sig_setmask($oldset) if !$sigfd;
 		}
 
 		if ($sigfd) { # Linux and IO::KQueue users:
@@ -632,7 +627,7 @@ sub daemon_loop ($$$$) {
 	if (!$sigfd) {
 		# wake up every second to accept signals if we don't
 		# have signalfd or IO::KQueue:
-		sig_setmask($oldset);
+		PublicInbox::Sigfd::sig_setmask($oldset);
 		PublicInbox::DS->SetLoopTimeout(1000);
 	}
 	PublicInbox::DS->EventLoop;
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
index f500902ea67..17456592a7e 100644
--- a/lib/PublicInbox/Sigfd.pm
+++ b/lib/PublicInbox/Sigfd.pm
@@ -5,7 +5,7 @@ use strict;
 use parent qw(PublicInbox::DS);
 use fields qw(sig); # hashref similar to %SIG, but signal numbers as keys
 use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET SFD_NONBLOCK);
-use POSIX ();
+use POSIX qw(:signal_h);
 use IO::Handle ();
 
 # returns a coderef to unblock signals if neither signalfd or kqueue
@@ -62,4 +62,14 @@ sub event_step {
 	while (wait_once($_[0])) {} # non-blocking
 }
 
+sub sig_setmask { sigprocmask(SIG_SETMASK, @_) or die "sigprocmask: $!" }
+
+sub block_signals () {
+	my $oldset = POSIX::SigSet->new;
+	my $newset = POSIX::SigSet->new;
+	$newset->fillset or die "fillset: $!";
+	sig_setmask($newset, $oldset);
+	$oldset;
+}
+
 1;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 22f190366a4..4d3cd032e5a 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -8,9 +8,9 @@ use strict;
 use warnings;
 use PublicInbox::Eml;
 use PublicInbox::InboxWritable;
-use File::Temp 0.19 (); # 0.19 for ->newdir
 use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
+use PublicInbox::Sigfd;
 use PublicInbox::DS qw(now);
 use POSIX qw(_exit WNOHANG);
 *mime_from_path = \&PublicInbox::InboxWritable::mime_from_path;
@@ -108,6 +108,7 @@ sub new {
 		imap => scalar keys %imap ? \%imap : undef,
 		importers => {},
 		opendirs => {}, # dirname => dirhandle (in progress scans)
+		ops => [], # 'quit', 'full'
 	}, $class;
 }
 
@@ -195,7 +196,9 @@ sub _try_path {
 
 sub quit {
 	my ($self) = @_;
-	trigger_scan($self, 'quit') or $self->{quit} = 1;
+	$self->{quit} = 1;
+	%{$self->{opendirs}} = ();
+	_done_for_now($self);
 	if (my $imap_pid = $self->{-imap_pid}) {
 		kill('QUIT', $imap_pid);
 	}
@@ -213,24 +216,15 @@ sub quit {
 sub watch_fs {
 	my ($self) = @_;
 	require PublicInbox::DirIdle;
-	my $scan = File::Temp->newdir("public-inbox-watch.$$.scan.XXXXXX",
-					TMPDIR => 1);
-	my $scandir = $self->{scandir} = $scan->dirname;
-	my $scan_re = qr!\A$scandir/!;
 	my $done = sub {
 		delete $self->{done_timer};
 		_done_for_now($self);
 	};
 	my $cb = sub {
-		my $path = $_[0]->fullname;
-		if ($path =~ $scan_re) {
-			scan($self, $path);
-		} else {
-			_try_path($self, $path);
-		}
+		_try_path($self, $_[0]->fullname);
 		$self->{done_timer} //= PublicInbox::DS::requeue($done);
 	};
-	my $di = PublicInbox::DirIdle->new([@{$self->{mdir}}, $scandir], $cb);
+	my $di = PublicInbox::DirIdle->new($self->{mdir}, $cb);
 	PublicInbox::DS->SetPostLoopCallback(sub { !$self->{quit} });
 	PublicInbox::DS->EventLoop;
 	_done_for_now($self);
@@ -485,6 +479,12 @@ sub watch_imap_idle_1 ($$$) {
 	}
 }
 
+sub watch_atfork_child ($) {
+	my ($self) = @_;
+	PublicInbox::Sigfd::sig_setmask($self->{oldset});
+	%SIG = (%SIG, %{$self->{sig}});
+}
+
 sub watch_imap_idle_all ($$) {
 	my ($self, $idle) = @_; # $idle = [[ uri1, intvl1 ], [ uri2, intvl2 ]]
 	$self->{mics} = {}; # going to be forking, so disconnect
@@ -494,6 +494,7 @@ sub watch_imap_idle_all ($$) {
 			my ($uri, $intvl) = @$uri_intvl;
 			defined(my $pid = fork) or die "fork: $!";
 			if ($pid == 0) {
+				watch_atfork_child($self);
 				delete $self->{idle_pids};
 				watch_imap_idle_1($self, $uri, $intvl);
 				_exit(0);
@@ -564,15 +565,20 @@ sub watch_imap ($) {
 }
 
 sub watch {
-	my ($self) = @_;
+	my ($self, $sig, $oldset) = @_;
+	$self->{oldset} = $oldset;
+	$self->{sig} = $sig;
 	if ($self->{mdre} && $self->{imap}) {
 		defined(my $pid = fork) or die "fork: $!";
 		if ($pid == 0) {
+			watch_atfork_child($self);
 			imap_start($self);
 			goto &watch_imap;
 		}
 		$self->{-imap_pid} = $pid;
 	} elsif ($self->{imap}) {
+		# not a child process, but no signalfd, yet:
+		watch_atfork_child($self);
 		imap_start($self);
 		goto &watch_imap;
 	}
@@ -580,23 +586,18 @@ sub watch {
 }
 
 sub trigger_scan {
-	my ($self, $base) = @_;
-	my $dir = $self->{scandir} or return;
-	open my $fh, '>', "$dir/$base" or die "open $dir/$base failed: $!\n";
-	close $fh or die "close $dir/$base failed: $!\n";
+	my ($self, $op) = @_;
+	push @{$self->{ops}}, $op;
+	PublicInbox::DS::requeue($self);
 }
 
-sub scan {
-	my ($self, $path) = @_;
-	if ($path =~ /quit\z/) {
-		%{$self->{opendirs}} = ();
-		_done_for_now($self);
-		delete $self->{scandir};
-		$self->{quit} = 1;
-		return;
-	}
-	# else: $path =~ /(cont|full)\z/
+# called directly, and by PublicInbox::DS
+sub event_step ($) {
+	my ($self) = @_;
 	return if $self->{quit};
+	my $op = shift @{$self->{ops}};
+
+	# continue existing scan
 	my $max = 10;
 	my $opendirs = $self->{opendirs};
 	my @dirnames = keys %$opendirs;
@@ -609,7 +610,7 @@ sub scan {
 		}
 		$opendirs->{$dir} = $dh if $n < 0;
 	}
-	if ($path =~ /full\z/) {
+	if ($op && $op eq 'full') {
 		foreach my $dir (@{$self->{mdir}}) {
 			next if $opendirs->{$dir}; # already in progress
 			my $ok = opendir(my $dh, $dir);
@@ -627,7 +628,13 @@ sub scan {
 	}
 	_done_for_now($self);
 	# do we have more work to do?
-	trigger_scan($self, 'cont') if keys %$opendirs;
+	PublicInbox::DS::requeue($self) if keys %$opendirs;
+}
+
+sub scan {
+	my ($self, $op) = @_;
+	push @{$self->{ops}}, $op;
+	goto &event_step;
 }
 
 sub _importer_for {
diff --git a/script/public-inbox-watch b/script/public-inbox-watch
index 2057066a2a9..b6d545adad7 100755
--- a/script/public-inbox-watch
+++ b/script/public-inbox-watch
@@ -5,6 +5,10 @@ use strict;
 use warnings;
 use PublicInbox::WatchMaildir;
 use PublicInbox::Config;
+use PublicInbox::DS;
+use PublicInbox::Sigfd;
+use PublicInbox::Syscall qw(SFD_NONBLOCK);
+my $oldset = PublicInbox::Sigfd::block_signals();
 my ($config, $watch_md);
 my $reload = sub {
 	$config = PublicInbox::Config->new;
@@ -14,14 +18,22 @@ my $reload = sub {
 $reload->();
 if ($watch_md) {
 	my $scan = sub { $watch_md->trigger_scan('full') if $watch_md };
-	$SIG{HUP} = $reload;
-	$SIG{USR1} = $scan;
-	$SIG{ALRM} = sub { $SIG{ALRM} = 'DEFAULT'; $scan->() };
-	$SIG{QUIT} = $SIG{TERM} = $SIG{INT} = sub {
+	my $quit = sub {
 		$watch_md->quit if $watch_md;
 		$watch_md = undef;
 	};
+	my $sig = { HUP => $reload, USR1 => $scan };
+	$sig->{QUIT} = $sig->{TERM} = $sig->{INT} = $quit;
+
 	# --no-scan is only intended for testing atm, undocumented.
-	alarm(1) unless (grep(/\A--no-scan\z/, @ARGV));
-	$watch_md->watch while ($watch_md);
+	unless (grep(/\A--no-scan\z/, @ARGV)) {
+		PublicInbox::DS::requeue($scan);
+	}
+	my $sigfd = PublicInbox::Sigfd->new($sig, SFD_NONBLOCK);
+	local %SIG = (%SIG, %$sig) if !$sigfd;
+	if (!$sigfd) {
+		PublicInbox::Sigfd::sig_setmask($oldset);
+		PublicInbox::DS->SetLoopTimeout(1000);
+	}
+	$watch_md->watch($sig, $oldset) while ($watch_md);
 }
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index a2c09b0351b..c8658140cf2 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -184,10 +184,10 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 			my $ino_fdinfo = "/proc/$wm->{pid}/fdinfo/$ino_fd";
 			while (time < $end && open(my $fh, '<', $ino_fdinfo)) {
 				@ino_info = grep(/^inotify wd:/, <$fh>);
-				last if @ino_info >= 4;
+				last if @ino_info >= 3;
 				tick;
 			}
-			$sleep = undef if @ino_info >= 4;
+			$sleep = undef if @ino_info >= 3;
 		}
 	}
 	if ($sleep) {


* [PATCH 12/34] ds: remove fields.pm usage
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (10 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 11/34] watch: use signalfd for Maildir watching Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 13/34] watch: wire up IMAP IDLE reapers to DS Eric Wong
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Since the removal of pseudo-hash support in Perl 5.10, the
"fields" module no longer provides the space or speed benefits
it did in 5.8.  It also does not allow for compile-time checks,
only run-time checks.

To me, the extra developer overhead in maintaining "use fields"
args has become a hassle.  None of our non-DS-related code uses
fields.pm, nor do any of our current dependencies.  In fact,
Danga::Socket (which DS was originally forked from) and its
subclasses are the only fields.pm users I've ever encountered in
the wild.  Removing fields may make our code more approachable
to other Perl hackers.

So stop using fields.pm and locked hashes, but continue to
document what fields do for non-trivial classes.
---
 lib/PublicInbox/DS.pm          | 17 ++++++-----------
 lib/PublicInbox/DirIdle.pm     |  5 ++---
 lib/PublicInbox/GitAsyncCat.pm |  4 +---
 lib/PublicInbox/HTTP.pm        | 23 +++++++++++++++--------
 lib/PublicInbox/HTTPD/Async.pm | 22 +++++++++++++---------
 lib/PublicInbox/IMAP.pm        | 19 +++++++++++--------
 lib/PublicInbox/InboxIdle.pm   |  9 ++++++---
 lib/PublicInbox/Listener.pm    |  8 ++------
 lib/PublicInbox/NNTP.pm        | 12 +++++++-----
 lib/PublicInbox/NNTPdeflate.pm |  5 +----
 lib/PublicInbox/ParentPipe.pm  |  8 ++------
 lib/PublicInbox/Sigfd.pm       |  9 +++++----
 xt/mem-imapd-tls.t             | 18 +++++++-----------
 13 files changed, 78 insertions(+), 81 deletions(-)
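
A rough sketch of the constructor shape the subclasses move to, using
plain bless instead of fields::new.  My::Base and My::Listener are
hypothetical stand-ins for PublicInbox::DS and one of its subclasses
so the snippet runs on its own; the {ev} key is only there for
illustration and the real DS->new also registers the fd with epoll:

```perl
use strict;
use warnings;

package My::Base; # hypothetical stand-in for PublicInbox::DS
sub new {
	my ($self, $sock, $ev) = @_;
	# $self is always an already-blessed object now, never a class name
	$self->{sock} = $sock;
	$self->{ev} = $ev;
	$self;
}

package My::Listener; # hypothetical subclass
our @ISA = ('My::Base');
sub new {
	my ($class, $sock, $cb) = @_;
	# fields are set up front in the bless, no "use fields" list needed
	my $self = bless { post_accept => $cb }, $class;
	$self->SUPER::new($sock, 1); # returns $self
}

package main;
my $l = My::Listener->new(\*STDIN, sub { 'accepted' });
print ref($l), ': ', join(',', sort keys %$l), "\n";
```

Since the base ->new returns $self, a subclass can end its constructor
with the SUPER::new call, as the Listener.pm and HTTP.pm hunks below do.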

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index aa65b2d3642..da68802dda9 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -13,6 +13,12 @@
 # Bugs encountered were reported to bug-Danga-Socket@rt.cpan.org,
 # fixed in Danga::Socket 1.62 and visible at:
 # https://rt.cpan.org/Public/Dist/Display.html?Name=Danga-Socket
+#
+# fields:
+# sock: underlying socket
+# rbuf: scalarref, usually undef
+# wbuf: arrayref of coderefs or tmpio (autovivified)
+#        (tmpio = [ GLOB, offset, [ length ] ])
 package PublicInbox::DS;
 use strict;
 use bytes;
@@ -22,19 +28,10 @@ use Fcntl qw(SEEK_SET :DEFAULT O_APPEND);
 use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
 use parent qw(Exporter);
 our @EXPORT_OK = qw(now msg_more);
-use warnings;
 use 5.010_001;
 use Scalar::Util qw(blessed);
-
 use PublicInbox::Syscall qw(:epoll);
 use PublicInbox::Tmpfile;
-
-use fields ('sock',              # underlying socket
-            'rbuf',              # scalarref, usually undef
-            'wbuf', # arrayref of coderefs or tmpio (autovivified))
-                    # (tmpio = [ GLOB, offset, [ length ] ])
-            );
-
 use Errno qw(EAGAIN EINVAL);
 use Carp qw(confess carp);
 
@@ -328,8 +325,6 @@ This is normally (always?) called from your subclass via:
 =cut
 sub new {
     my ($self, $sock, $ev) = @_;
-    $self = fields::new($self) unless ref $self;
-
     $self->{sock} = $sock;
     my $fd = fileno($sock);
 
diff --git a/lib/PublicInbox/DirIdle.pm b/lib/PublicInbox/DirIdle.pm
index ffceda66530..fbbc9531a20 100644
--- a/lib/PublicInbox/DirIdle.pm
+++ b/lib/PublicInbox/DirIdle.pm
@@ -4,8 +4,7 @@
 # Used by public-inbox-watch for Maildir (and possibly MH in the future)
 package PublicInbox::DirIdle;
 use strict;
-use base 'PublicInbox::DS';
-use fields qw(inot);
+use parent 'PublicInbox::DS';
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
 use PublicInbox::In2Tie;
 
@@ -24,7 +23,7 @@ if ($^O eq 'linux' && eval { require Linux::Inotify2; 1 }) {
 
 sub new {
 	my ($class, $dirs, $cb) = @_;
-	my $self = fields::new($class);
+	my $self = bless {}, $class;
 	my $inot;
 	if ($ino_cls) {
 		$inot = $ino_cls->new or die "E: $ino_cls->new: $!";
diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm
index 098101aed00..0b777204a7c 100644
--- a/lib/PublicInbox/GitAsyncCat.pm
+++ b/lib/PublicInbox/GitAsyncCat.pm
@@ -11,16 +11,14 @@
 package PublicInbox::GitAsyncCat;
 use strict;
 use parent qw(PublicInbox::DS Exporter);
-use fields qw(git);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
 our @EXPORT = qw(git_async_cat);
 
 sub _add {
 	my ($class, $git) = @_;
-	my $self = fields::new($class);
 	$git->batch_prepare;
+	my $self = bless { git => $git }, $class;
 	$self->SUPER::new($git->{in}, EPOLLIN|EPOLLET);
-	$self->{git} = $git;
 	\undef; # this is a true ref()
 }
 
diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index 6ccf2059240..8281746538e 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -6,12 +6,21 @@
 # to learn different ways to admin both NNTP and HTTP components.
 # There's nothing which depends on public-inbox, here.
 # Each instance of this class represents a HTTP client socket
-
+#
+# fields:
+# httpd: PublicInbox::HTTPD ref
+# env: PSGI env hashref
+# input_left: bytes left to read in request body (e.g. POST/PUT)
+# remote_addr: remote IP address as a string (e.g. "127.0.0.1")
+# remote_port: peer port
+# forward: response body object, response to ->getline + ->close
+# alive: HTTP keepalive state:
+#	0: drop connection when done
+#	1: keep connection when done
+#	2: keep connection, chunk responses
 package PublicInbox::HTTP;
 use strict;
-use warnings;
-use base qw(PublicInbox::DS);
-use fields qw(httpd env input_left remote_addr remote_port forward alive);
+use parent qw(PublicInbox::DS);
 use bytes (); # only for bytes::length
 use Fcntl qw(:seek);
 use Plack::HTTPParser qw(parse_http_request); # XS or pure Perl
@@ -56,7 +65,7 @@ sub http_date () {
 
 sub new ($$$) {
 	my ($class, $sock, $addr, $httpd) = @_;
-	my $self = fields::new($class);
+	my $self = bless { httpd => $httpd }, $class;
 	my $ev = EPOLLIN;
 	my $wbuf;
 	if ($sock->can('accept_SSL') && !$sock->accept_SSL) {
@@ -64,12 +73,10 @@ sub new ($$$) {
 		$ev = PublicInbox::TLS::epollbit();
 		$wbuf = [ \&PublicInbox::DS::accept_tls_step ];
 	}
-	$self->SUPER::new($sock, $ev | EPOLLONESHOT);
-	$self->{httpd} = $httpd;
 	$self->{wbuf} = $wbuf if $wbuf;
 	($self->{remote_addr}, $self->{remote_port}) =
 		PublicInbox::Daemon::host_with_port($addr);
-	$self;
+	$self->SUPER::new($sock, $ev | EPOLLONESHOT);
 }
 
 sub event_step { # called by PublicInbox::DS
diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index 35075d344b0..87a6a5f9cf0 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -6,11 +6,16 @@
 # The name of this key is not even stable!
 # Currently intended for use with read-only pipes with expensive
 # processes such as git-http-backend(1), cgit(1)
+#
+# fields:
+# http: PublicInbox::HTTP ref
+# fh: PublicInbox::HTTP::{Identity,Chunked} ref (can ->write + ->close)
+# cb: initial read callback
+# arg: arg for {cb}
+# end_obj: CODE or object which responds to ->event_step when ->close is called
 package PublicInbox::HTTPD::Async;
 use strict;
-use warnings;
-use base qw(PublicInbox::DS);
-use fields qw(http fh cb arg end_obj);
+use parent qw(PublicInbox::DS);
 use Errno qw(EAGAIN);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
 
@@ -27,14 +32,13 @@ sub new {
 		die '$end_obj unsupported w/o $io' if $end_obj;
 		return;
 	}
-
-	my $self = fields::new($class);
+	my $self = bless {
+		cb => $cb, # initial read callback
+		arg => $arg, # arg for $cb
+		end_obj => $end_obj, # like END{}, can ->event_step
+	}, $class;
 	IO::Handle::blocking($io, 0);
 	$self->SUPER::new($io, EPOLLIN | EPOLLET);
-	$self->{cb} = $cb; # initial read callback
-	$self->{arg} = $arg; # arg for $cb
-	$self->{end_obj} = $end_obj; # like END{}, can ->event_step
-	$self;
 }
 
 sub event_step {
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 0a6993c64c4..24f96e6691a 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -21,12 +21,18 @@
 #   as a 50K uint16_t array (via pack("S*", ...)).  "UID offset"
 #   is the offset from {uid_base} which determines the start of
 #   the mailbox slice.
-
+#
+# fields:
+# imapd: PublicInbox::IMAPD ref
+# ibx: PublicInbox::Inbox ref
+# long_cb: long_response private data
+# uid_base: base UID for mailbox slice (0-based)
+# -login_tag: IMAP TAG for LOGIN
+# -idle_tag: IMAP response tag for IDLE
+# uo2m: UID-to-MSN mapping
 package PublicInbox::IMAP;
 use strict;
-use base qw(PublicInbox::DS);
-use fields qw(imapd ibx long_cb -login_tag
-	uid_base -idle_tag uo2m);
+use parent qw(PublicInbox::DS);
 use PublicInbox::Eml;
 use PublicInbox::EmlContentFoo qw(parse_content_disposition);
 use PublicInbox::DS qw(now);
@@ -34,7 +40,6 @@ use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
 use PublicInbox::GitAsyncCat;
 use Text::ParseWords qw(parse_line);
 use Errno qw(EAGAIN);
-use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
 use PublicInbox::Search;
 use PublicInbox::IMAPsearchqp;
 *mdocid = \&PublicInbox::Search::mdocid;
@@ -107,8 +112,7 @@ sub greet ($) {
 
 sub new ($$$) {
 	my ($class, $sock, $imapd) = @_;
-	my $self = fields::new('PublicInbox::IMAP_preauth');
-	unlock_hash(%$self);
+	my $self = bless { imapd => $imapd }, 'PublicInbox::IMAP_preauth';
 	my $ev = EPOLLIN;
 	my $wbuf;
 	if ($sock->can('accept_SSL') && !$sock->accept_SSL) {
@@ -117,7 +121,6 @@ sub new ($$$) {
 		$wbuf = [ \&PublicInbox::DS::accept_tls_step, \&greet ];
 	}
 	$self->SUPER::new($sock, $ev | EPOLLONESHOT);
-	$self->{imapd} = $imapd;
 	if ($wbuf) {
 		$self->{wbuf} = $wbuf;
 	} else {
diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm
index ba8200aef05..59cb833fd5a 100644
--- a/lib/PublicInbox/InboxIdle.pm
+++ b/lib/PublicInbox/InboxIdle.pm
@@ -1,10 +1,13 @@
 # Copyright (C) 2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
+# fields:
+# pi_config: PublicInbox::Config ref
+# inot: Linux::Inotify2-like object
+# pathmap => { inboxdir => [ ibx, watch1, watch2, watch3... ] } mapping
 package PublicInbox::InboxIdle;
 use strict;
-use base qw(PublicInbox::DS);
-use fields qw(pi_config inot pathmap);
+use parent qw(PublicInbox::DS);
 use Cwd qw(abs_path);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
 my $IN_MODIFY = 0x02; # match Linux inotify
@@ -50,7 +53,7 @@ sub refresh {
 
 sub new {
 	my ($class, $pi_config) = @_;
-	my $self = fields::new($class);
+	my $self = bless {}, $class;
 	my $inot;
 	if ($ino_cls) {
 		$inot = $ino_cls->new or die "E: $ino_cls->new: $!";
diff --git a/lib/PublicInbox/Listener.pm b/lib/PublicInbox/Listener.pm
index eb7dd8d46cc..2e0fc248fe7 100644
--- a/lib/PublicInbox/Listener.pm
+++ b/lib/PublicInbox/Listener.pm
@@ -4,10 +4,8 @@
 # Used by -nntpd for listen sockets
 package PublicInbox::Listener;
 use strict;
-use warnings;
-use base 'PublicInbox::DS';
+use parent 'PublicInbox::DS';
 use Socket qw(SOL_SOCKET SO_KEEPALIVE IPPROTO_TCP TCP_NODELAY);
-use fields qw(post_accept);
 use IO::Handle;
 use PublicInbox::Syscall qw(EPOLLIN EPOLLEXCLUSIVE EPOLLET);
 use Errno qw(EAGAIN ECONNABORTED EPERM);
@@ -23,10 +21,8 @@ sub new ($$$) {
 	setsockopt($s, SOL_SOCKET, SO_KEEPALIVE, 1);
 	setsockopt($s, IPPROTO_TCP, TCP_NODELAY, 1); # ignore errors on non-TCP
 	listen($s, 1024);
-	my $self = fields::new($class);
+	my $self = bless { post_accept => $cb }, $class;
 	$self->SUPER::new($s, EPOLLIN|EPOLLET|EPOLLEXCLUSIVE);
-	$self->{post_accept} = $cb;
-	$self
 }
 
 sub event_step {
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 76f14bbd97d..9d91544abd3 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -2,11 +2,14 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
 # Each instance of this represents a NNTP client socket
+# fields:
+# nntpd: PublicInbox::NNTPD ref
+# article: per-session current article number
+# ng: PublicInbox::Inbox ref
+# long_cb: long_response private data
 package PublicInbox::NNTP;
 use strict;
-use warnings;
-use base qw(PublicInbox::DS);
-use fields qw(nntpd article ng long_cb);
+use parent qw(PublicInbox::DS);
 use PublicInbox::MID qw(mid_escape $MID_EXTRACT);
 use PublicInbox::Eml;
 use POSIX qw(strftime);
@@ -45,7 +48,7 @@ sub greet ($) { $_[0]->write($_[0]->{nntpd}->{greet}) };
 
 sub new ($$$) {
 	my ($class, $sock, $nntpd) = @_;
-	my $self = fields::new($class);
+	my $self = bless { nntpd => $nntpd }, $class;
 	my $ev = EPOLLIN;
 	my $wbuf;
 	if ($sock->can('accept_SSL') && !$sock->accept_SSL) {
@@ -54,7 +57,6 @@ sub new ($$$) {
 		$wbuf = [ \&PublicInbox::DS::accept_tls_step, \&greet ];
 	}
 	$self->SUPER::new($sock, $ev | EPOLLONESHOT);
-	$self->{nntpd} = $nntpd;
 	if ($wbuf) {
 		$self->{wbuf} = $wbuf;
 	} else {
diff --git a/lib/PublicInbox/NNTPdeflate.pm b/lib/PublicInbox/NNTPdeflate.pm
index dec88aba3a5..02af935f0df 100644
--- a/lib/PublicInbox/NNTPdeflate.pm
+++ b/lib/PublicInbox/NNTPdeflate.pm
@@ -16,11 +16,9 @@
 #       efficient in terms of server memory usage.
 package PublicInbox::NNTPdeflate;
 use strict;
-use warnings;
 use 5.010_001;
-use base qw(PublicInbox::NNTP);
+use parent qw(PublicInbox::NNTP);
 use Compress::Raw::Zlib;
-use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
 
 my %IN_OPT = (
 	-Bufsize => PublicInbox::NNTP::LINE_MAX,
@@ -53,7 +51,6 @@ sub enable {
 		$self->res('403 Unable to activate compression');
 		return;
 	}
-	unlock_hash(%$self);
 	$self->res('206 Compression active');
 	bless $self, $class;
 	$self->{zin} = $in;
diff --git a/lib/PublicInbox/ParentPipe.pm b/lib/PublicInbox/ParentPipe.pm
index f62f011bbe3..538b5632c62 100644
--- a/lib/PublicInbox/ParentPipe.pm
+++ b/lib/PublicInbox/ParentPipe.pm
@@ -5,17 +5,13 @@
 # notified if the master process dies.
 package PublicInbox::ParentPipe;
 use strict;
-use warnings;
-use base qw(PublicInbox::DS);
-use fields qw(cb);
+use parent qw(PublicInbox::DS);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
 
 sub new ($$$) {
 	my ($class, $pipe, $worker_quit) = @_;
-	my $self = fields::new($class);
+	my $self = bless { cb => $worker_quit }, $class;
 	$self->SUPER::new($pipe, EPOLLIN|EPOLLONESHOT);
-	$self->{cb} = $worker_quit;
-	$self;
 }
 
 # master process died, time to call worker_quit ourselves
diff --git a/lib/PublicInbox/Sigfd.pm b/lib/PublicInbox/Sigfd.pm
index 17456592a7e..bf91bb3774f 100644
--- a/lib/PublicInbox/Sigfd.pm
+++ b/lib/PublicInbox/Sigfd.pm
@@ -1,9 +1,11 @@
 # Copyright (C) 2019-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# Wraps a signalfd (or similar) for PublicInbox::DS
+# fields: (sig: hashref similar to %SIG, but signal numbers as keys)
 package PublicInbox::Sigfd;
 use strict;
 use parent qw(PublicInbox::DS);
-use fields qw(sig); # hashref similar to %SIG, but signal numbers as keys
 use PublicInbox::Syscall qw(signalfd EPOLLIN EPOLLET SFD_NONBLOCK);
 use POSIX qw(:signal_h);
 use IO::Handle ();
@@ -12,7 +14,6 @@ use IO::Handle ();
 # are available.
 sub new {
 	my ($class, $sig, $flags) = @_;
-	my $self = fields::new($class);
 	my %signo = map {;
 		my $cb = $sig->{$_};
 		# SIGWINCH is 28 on FreeBSD, NetBSD, OpenBSD
@@ -22,6 +23,7 @@ sub new {
 		};
 		$num => $cb;
 	} keys %$sig;
+	my $self = bless { sig => \%signo }, $class;
 	my $io;
 	my $fd = signalfd(-1, [keys %signo], $flags);
 	if (defined $fd && $fd >= 0) {
@@ -35,9 +37,8 @@ sub new {
 		$self->SUPER::new($io, EPOLLIN | EPOLLET);
 	} else { # master main loop
 		$self->{sock} = $io;
+		$self;
 	}
-	$self->{sig} = \%signo;
-	$self;
 }
 
 # PublicInbox::Daemon in master main loop (blocking)
diff --git a/xt/mem-imapd-tls.t b/xt/mem-imapd-tls.t
index 97e67d3029a..660fdc77a2b 100644
--- a/xt/mem-imapd-tls.t
+++ b/xt/mem-imapd-tls.t
@@ -132,8 +132,8 @@ done_testing;
 
 package IMAPC;
 use strict;
-use base qw(PublicInbox::DS);
-use fields qw(step zin);
+use parent qw(PublicInbox::DS);
+# fields: step: state machine, zin: Zlib inflate context
 use PublicInbox::Syscall qw(EPOLLIN EPOLLOUT EPOLLONESHOT);
 use Errno qw(EAGAIN);
 # determines where we start event_step
@@ -207,26 +207,23 @@ sub event_step {
 
 sub new {
 	my ($class, $io) = @_;
-	my $self = fields::new($class);
-
-	# wait for connect(), and maybe SSL_connect()
-	$self->SUPER::new($io, EPOLLOUT|EPOLLONESHOT);
+	my $self = bless { step => FIRST_STEP }, $class;
 	if ($io->can('connect_SSL')) {
 		$self->{wbuf} = [ \&connect_tls_step ];
 	}
-	$self->{step} = FIRST_STEP;
-	$self;
+	# wait for connect(), and maybe SSL_connect()
+	$self->SUPER::new($io, EPOLLOUT|EPOLLONESHOT);
 }
 
 1;
 package IMAPCdeflate;
 use strict;
-use base qw(IMAPC); # parent doesn't work for fields
-use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
+our @ISA;
 use Compress::Raw::Zlib;
 use PublicInbox::IMAPdeflate;
 my %ZIN_OPT;
 BEGIN {
+	@ISA = qw(IMAPC);
 	%ZIN_OPT = ( -WindowBits => -15, -AppendOutput => 1 );
 	*write = \&PublicInbox::IMAPdeflate::write;
 	*do_read = \&PublicInbox::IMAPdeflate::do_read;
@@ -236,7 +233,6 @@ sub enable {
 	my ($class, $self) = @_;
 	my ($in, $err) = Compress::Raw::Zlib::Inflate->new(%ZIN_OPT);
 	die "Inflate->new failed: $err" if $err != Z_OK;
-	unlock_hash(%$self);
 	bless $self, $class;
 	$self->{zin} = $in;
 }


* [PATCH 13/34] watch: wire up IMAP IDLE reapers to DS
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (11 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 12/34] ds: remove fields.pm usage Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 14/34] watch: support IMAP polling Eric Wong
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

We can avoid synchronous `waitpid(-1, 0)' and save a process
when watching Maildirs and IMAP mailboxes simultaneously.

One DS bug is fixed: ->Reset needs to clear the DS $in_loop flag
in forked children so dwaitpid() fails and allows git processes
to be reaped synchronously.  TestCommon also calls DS->Reset
when spawning new processes, since t/imapd.t uses DS->EventLoop
while waiting on -watch to write.
---
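For readers unfamiliar with the pattern, here is a minimal standalone
sketch of non-blocking reaping with per-PID callbacks.  It uses only
core waitpid(WNOHANG); `reap_later' and `reap_ready' are hypothetical
names for illustration and are not the PublicInbox::DS API.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG _exit);

my %reap_cb; # pid => [ callback, arg ]

# register a callback to run once $pid has exited (hypothetical helper)
sub reap_later {
	my ($pid, $cb, $arg) = @_;
	$reap_cb{$pid} = [ $cb, $arg ];
}

# reap whatever has exited without blocking; returns the number reaped
sub reap_ready {
	my $n = 0;
	for my $pid (keys %reap_cb) {
		my $ret = waitpid($pid, WNOHANG);
		next if $ret == 0; # still running
		my ($cb, $arg) = @{delete $reap_cb{$pid}};
		$cb->($arg, $pid) if $ret == $pid; # $? is set for the callback
		$n++;
	}
	$n;
}

my @exited;
for my $code (0, 3) {
	defined(my $pid = fork) or die "fork: $!";
	_exit($code) if $pid == 0;
	reap_later($pid, sub {
		my ($codes, $reaped_pid) = @_;
		push @$codes, $? >> 8;
	}, \@exited);
}

# a real event loop would call reap_ready from its SIGCHLD handling
# instead of sleeping between attempts
until (!%reap_cb) {
	reap_ready() or select(undef, undef, undef, 0.01);
}
```

The callback receives its argument first and the PID second, mirroring
the ($self, $pid) order used by the reap callbacks in this patch.
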
 lib/PublicInbox/DS.pm           |   2 +-
 lib/PublicInbox/TestCommon.pm   |   1 +
 lib/PublicInbox/WatchMaildir.pm | 170 +++++++++++++-------------------
 script/public-inbox-watch       |   6 +-
 4 files changed, 77 insertions(+), 102 deletions(-)

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index da68802dda9..c46b20cba27 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -68,7 +68,7 @@ Reset all state
 =cut
 sub Reset {
     %DescriptorMap = ();
-    $wait_pids = $later_queue = undef;
+    $in_loop = $wait_pids = $later_queue = undef;
     $EXPMAP = {};
     $nextq = $ToClose = $reap_timer = $later_timer = $exp_timer = undef;
     $LoopTimeout = -1;  # no timeout by default
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index b252810fca5..14ebba10563 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -350,6 +350,7 @@ sub start_script {
 	}
 	defined(my $pid = fork) or die "fork: $!\n";
 	if ($pid == 0) {
+		eval { PublicInbox::DS->Reset };
 		# pretend to be systemd (cf. sd_listen_fds(3))
 		# 3 == SD_LISTEN_FDS_START
 		my $fd;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 4d3cd032e5a..431350be277 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -12,7 +12,7 @@ use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
 use PublicInbox::Sigfd;
 use PublicInbox::DS qw(now);
-use POSIX qw(_exit WNOHANG);
+use POSIX qw(_exit);
 *mime_from_path = \&PublicInbox::InboxWritable::mime_from_path;
 
 sub compile_watchheaders ($) {
@@ -213,9 +213,8 @@ sub quit {
 	}
 }
 
-sub watch_fs {
+sub watch_fs_init ($) {
 	my ($self) = @_;
-	require PublicInbox::DirIdle;
 	my $done = sub {
 		delete $self->{done_timer};
 		_done_for_now($self);
@@ -224,10 +223,8 @@ sub watch_fs {
 		_try_path($self, $_[0]->fullname);
 		$self->{done_timer} //= PublicInbox::DS::requeue($done);
 	};
-	my $di = PublicInbox::DirIdle->new($self->{mdir}, $cb);
-	PublicInbox::DS->SetPostLoopCallback(sub { !$self->{quit} });
-	PublicInbox::DS->EventLoop;
-	_done_for_now($self);
+	require PublicInbox::DirIdle;
+	PublicInbox::DirIdle->new($self->{mdir}, $cb); # EPOLL_CTL_ADD
 }
 
 # returns the git config section name, e.g [imap "imaps://user@example.com"]
@@ -334,25 +331,6 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 	$mic;
 }
 
-sub imap_start ($) {
-	my ($self) = @_;
-	eval { require PublicInbox::IMAPClient } or
-		die "Mail::IMAPClient is required for IMAP:\n$@\n";
-	eval { require Git } or
-		die "Git (Perl module) is required for IMAP:\n$@\n";
-	eval { require PublicInbox::IMAPTracker } or
-		die "DBD::SQLite is required for IMAP\n:$@\n";
-
-	my $mic_args = imap_common_init($self);
-	# make sure we can connect and cache the credentials in memory
-	$self->{mic_arg} = {}; # schema://authority => IMAPClient->new args
-	my $mics = $self->{mics} = {}; # schema://authority => IMAPClient obj
-	for my $url (sort keys %{$self->{imap}}) {
-		my $uri = PublicInbox::URIimap->new($url);
-		$mics->{imap_section($uri)} //= mic_for($self, $uri, $mic_args);
-	}
-}
-
 sub imap_fetch_all ($$$) {
 	my ($self, $mic, $uri) = @_;
 	my $sec = imap_section($uri);
@@ -481,74 +459,76 @@ sub watch_imap_idle_1 ($$$) {
 
 sub watch_atfork_child ($) {
 	my ($self) = @_;
+	delete $self->{idle_pids};
+	PublicInbox::DS->Reset;
 	PublicInbox::Sigfd::sig_setmask($self->{oldset});
 	%SIG = (%SIG, %{$self->{sig}});
 }
 
-sub watch_imap_idle_all ($$) {
-	my ($self, $idle) = @_; # $idle = [[ uri1, intvl1 ], [ uri2, intvl2 ]]
-	$self->{mics} = {}; # going to be forking, so disconnect
-	my $idle_pids = $self->{idle_pids} = {};
-	until ($self->{quit}) {
-		while (my $uri_intvl = shift @$idle) {
-			my ($uri, $intvl) = @$uri_intvl;
-			defined(my $pid = fork) or die "fork: $!";
-			if ($pid == 0) {
-				watch_atfork_child($self);
-				delete $self->{idle_pids};
-				watch_imap_idle_1($self, $uri, $intvl);
-				_exit(0);
-			}
-			$idle_pids->{$pid} = $uri_intvl;
-		}
-		my $pid = waitpid(-1, 0) or next;
-		if ($pid < 0) {
-			warn "W: no idling children: $!";
-			if (@$idle) {
-				sleep 60;
-			} else {
-				warn "W: nothing to respawn, quitting IDLE\n";
-				last;
-			}
-		}
-		if (my $uri_intvl = delete $idle_pids->{$pid}) {
-			my ($uri, $intvl) = @$uri_intvl;
-			my $url = $uri->as_string;
-			if ($? || !$self->{quit}) {
-				warn "W: PID=$pid on $url died: \$?=$?\n";
-			}
-			push @$idle, $uri_intvl;
-		} else {
-			warn "W: PID=$pid (unknown) reaped: \$?=$?\n";
-		}
+sub imap_idle_reap { # PublicInbox::DS::dwaitpid callback
+	my ($self, $pid) = @_;
+	my $uri_intvl = delete $self->{idle_pids}->{$pid} or
+		die "BUG: PID=$pid (unknown) reaped: \$?=$?\n";
+
+	my ($uri, $intvl) = @$uri_intvl;
+	my $url = $uri->as_string;
+	return if $self->{quit};
+	warn "W: PID=$pid on $url died: \$?=$?\n" if $?;
+	push @{$self->{idle_todo}}, $uri_intvl;
+	PublicInbox::DS::requeue($self); # call ->event_step to respawn
+}
+
+sub imap_idle_fork ($$) {
+	my ($self, $uri_intvl) = @_;
+	my ($uri, $intvl) = @$uri_intvl;
+	defined(my $pid = fork) or die "fork: $!";
+	if ($pid == 0) {
+		watch_atfork_child($self);
+		watch_imap_idle_1($self, $uri, $intvl);
+		_exit(0);
 	}
+	$self->{idle_pids}->{$pid} = $uri_intvl;
+	PublicInbox::DS::dwaitpid($pid, \&imap_idle_reap, $self);
+}
 
-	# tear it all down
-	kill('QUIT', $_) for (keys %$idle_pids);
-	while (scalar keys %$idle_pids) {
-		if (my $pid = waitpid(-1, WNOHANG)) {
-			if ($pid < 0) {
-				warn "E: no children? $! (PIDs: ",
-					join(', ', keys %$idle_pids),")\n";
-				last;
-			} else {
-				delete $idle_pids->{$pid};
-			}
-		} else { # signals aren't that reliable w/o signalfd/kevent
-			sleep 1;
-			kill('QUIT', $_) for (keys %$idle_pids);
+sub event_step {
+	my ($self) = @_;
+	return if $self->{quit};
+	my $idle_todo = $self->{idle_todo};
+	if ($idle_todo && @$idle_todo) {
+		$self->{mics} = {}; # going to be forking, so disconnect
+		while (my $uri_intvl = shift(@$idle_todo)) {
+			imap_idle_fork($self, $uri_intvl);
 		}
 	}
+	goto(&fs_scan_step) if $self->{mdre};
 }
 
-sub watch_imap ($) {
+sub watch_imap_init ($) {
 	my ($self) = @_;
-	my $idle = []; # [ [ uri1, intvl1 ], [uri2, intvl2] ];
+	eval { require PublicInbox::IMAPClient } or
+		die "Mail::IMAPClient is required for IMAP:\n$@\n";
+	eval { require Git } or
+		die "Git (Perl module) is required for IMAP:\n$@\n";
+	eval { require PublicInbox::IMAPTracker } or
+		die "DBD::SQLite is required for IMAP:\n$@\n";
+
+	my $mic_args = imap_common_init($self); # read args from config
+
+	# make sure we can connect and cache the credentials in memory
+	$self->{mic_arg} = {}; # schema://authority => IMAPClient->new args
+	my $mics = $self->{mics} = {}; # schema://authority => IMAPClient obj
+	for my $url (sort keys %{$self->{imap}}) {
+		my $uri = PublicInbox::URIimap->new($url);
+		$mics->{imap_section($uri)} //= mic_for($self, $uri, $mic_args);
+	}
+
+	my $idle = []; # [ [ uri1, intvl1 ], [uri2, intvl2] ]
 	my $poll = {}; # intvl_seconds => [ uri1, uri2 ]
 	for my $url (keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
 		my $sec = imap_section($uri);
-		my $mic = $self->{mics}->{$sec};
+		my $mic = $mics->{$sec};
 		my $intvl = $self->{imap_opt}->{$sec}->{poll_intvl};
 		if ($mic->has_capability('IDLE') && !$intvl) {
 			$intvl = $self->{imap_opt}->{$sec}->{idle_intvl};
@@ -557,9 +537,10 @@ sub watch_imap ($) {
 			push @{$poll->{$intvl || 120}}, $uri;
 		}
 	}
-	my $nr_poll = scalar keys %$poll;
-	if (scalar @$idle && !$nr_poll) { # multiple idlers, need fork
-		watch_imap_idle_all($self, $idle);
+	if (scalar @$idle) {
+		$self->{idle_pids} = {};
+		$self->{idle_todo} = $idle;
+		PublicInbox::DS::requeue($self); # ->event_step to fork
 	}
 	# TODO: polling
 }
@@ -568,21 +549,11 @@ sub watch {
 	my ($self, $sig, $oldset) = @_;
 	$self->{oldset} = $oldset;
 	$self->{sig} = $sig;
-	if ($self->{mdre} && $self->{imap}) {
-		defined(my $pid = fork) or die "fork: $!";
-		if ($pid == 0) {
-			watch_atfork_child($self);
-			imap_start($self);
-			goto &watch_imap;
-		}
-		$self->{-imap_pid} = $pid;
-	} elsif ($self->{imap}) {
-		# not a child process, but no signalfd, yet:
-		watch_atfork_child($self);
-		imap_start($self);
-		goto &watch_imap;
-	}
-	goto &watch_fs;
+	watch_imap_init($self) if $self->{imap};
+	watch_fs_init($self) if $self->{mdre};
+	PublicInbox::DS->SetPostLoopCallback(sub {});
+	PublicInbox::DS->EventLoop until $self->{quit};
+	_done_for_now($self);
 }
 
 sub trigger_scan {
@@ -591,8 +562,7 @@ sub trigger_scan {
 	PublicInbox::DS::requeue($self);
 }
 
-# called directly, and by PublicInbox::DS
-sub event_step ($) {
+sub fs_scan_step {
 	my ($self) = @_;
 	return if $self->{quit};
 	my $op = shift @{$self->{ops}};
@@ -634,7 +604,7 @@ sub event_step ($) {
 sub scan {
 	my ($self, $op) = @_;
 	push @{$self->{ops}}, $op;
-	goto &event_step;
+	goto &fs_scan_step;
 }
 
 sub _importer_for {
diff --git a/script/public-inbox-watch b/script/public-inbox-watch
index b6d545adad7..ae7b70be355 100755
--- a/script/public-inbox-watch
+++ b/script/public-inbox-watch
@@ -22,7 +22,11 @@ if ($watch_md) {
 		$watch_md->quit if $watch_md;
 		$watch_md = undef;
 	};
-	my $sig = { HUP => $reload, USR1 => $scan };
+	my $sig = {
+		HUP => $reload,
+		USR1 => $scan,
+		CHLD => \&PublicInbox::DS::enqueue_reap,
+	};
 	$sig->{QUIT} = $sig->{TERM} = $sig->{INT} = $quit;
 
 	# --no-scan is only intended for testing atm, undocumented.


* [PATCH 14/34] watch: support IMAP polling
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (12 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 13/34] watch: wire up IMAP IDLE reapers to DS Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 15/34] config: support ->urlmatch method for -watch Eric Wong
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Not all IMAP servers support IDLE, and IDLE may be prohibitively
expensive for some IMAP servers with many inboxes.  So allow
configuring an imap.$IMAP_URL.pollInterval=SECONDS to poll
mailboxes.

We'll also need to poll for NNTP servers in the future.
---
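A rough sketch of how per-URL intervals might be validated and grouped
so that all URLs sharing an interval are polled together.  The
`parse_intvl' helper and the example URLs are hypothetical; only the
numeric pattern and the 120s fallback are taken from this series.

```perl
use strict;
use warnings;

# returns the interval in seconds, or undef if $v is not a plain
# non-negative decimal number (hypothetical helper for illustration)
sub parse_intvl {
	my ($v) = @_;
	return unless defined($v) && !ref($v);
	$v =~ /\A[0-9]+(?:\.[0-9]+)?\z/ ? $v + 0 : undef;
}

# made-up configuration values, keyed by IMAP URL
my %cfg = (
	'imap://a.example.com/INBOX' => '30',
	'imap://b.example.com/INBOX' => '0.5',
	'imap://c.example.com/INBOX' => '30',
	'imap://d.example.com/INBOX' => 'soon', # invalid, uses fallback
);
my $default = 120; # same fallback as `$intvl || 120' in WatchMaildir
my %poll; # interval => [ url1, url2, ... ]
for my $url (sort keys %cfg) {
	my $intvl = parse_intvl($cfg{$url}) || $default;
	push @{$poll{$intvl}}, $url;
}
```

Each key of %poll would then get one timer, and the URLs under it are
fetched sequentially when that timer fires.
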
 lib/PublicInbox/WatchMaildir.pm | 64 ++++++++++++++++++++++++++++++---
 t/imapd.t                       | 39 ++++++++++++++++++--
 2 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 431350be277..ac980d9b0f1 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -202,8 +202,9 @@ sub quit {
 	if (my $imap_pid = $self->{-imap_pid}) {
 		kill('QUIT', $imap_pid);
 	}
-	if (my $idle_pids = $self->{idle_pids}) {
-		kill('QUIT', $_) for (keys %$idle_pids);
+	for (qw(idle_pids poll_pids)) {
+		my $pids = $self->{$_} or next;
+		kill('QUIT', $_) for (keys %$pids);
 	}
 	if (my $idle_mic = $self->{idle_mic}) {
 		eval { $idle_mic->done };
@@ -237,12 +238,12 @@ sub imap_section ($) {
 sub cfg_intvl ($$) {
 	my ($cfg, $key) = @_;
 	defined(my $v = $cfg->{lc($key)}) or return;
-	$v =~ /\A[0-9]+\z/s and return $v + 0;
+	$v =~ /\A[0-9]+(?:\.[0-9]+)?\z/s and return $v + 0;
 	if (ref($v) eq 'ARRAY') {
 		$v = join(', ', @$v);
 		warn "W: $key has multiple values: $v\nW: $key ignored\n";
 	} else {
-		warn "W: $key=$v is not an integer value in seconds\n";
+		warn "W: $key=$v is not a numeric value in seconds\n";
 	}
 }
 
@@ -460,6 +461,7 @@ sub watch_imap_idle_1 ($$$) {
 sub watch_atfork_child ($) {
 	my ($self) = @_;
 	delete $self->{idle_pids};
+	delete $self->{poll_pids};
 	PublicInbox::DS->Reset;
 	PublicInbox::Sigfd::sig_setmask($self->{oldset});
 	%SIG = (%SIG, %{$self->{sig}});
@@ -504,6 +506,52 @@ sub event_step {
 	goto(&fs_scan_step) if $self->{mdre};
 }
 
+sub watch_imap_fetch_all ($$) {
+	my ($self, $uris) = @_;
+	for my $uri (@$uris) {
+		my $sec = imap_section($uri);
+		my $mic_arg = $self->{mic_arg}->{$sec} or
+			die "BUG: no Mail::IMAPClient->new arg for $sec";
+		my $mic = PublicInbox::IMAPClient->new(%$mic_arg) or next;
+		my $err = imap_fetch_all($self, $mic, $uri);
+		last if $self->{quit};
+		warn $err, "\n" if $err;
+	}
+}
+
+sub imap_fetch_fork ($$$) {
+	my ($self, $intvl, $uris) = @_;
+	return if $self->{quit};
+	$self->{mics} = {}; # going to be forking, so disconnect
+	defined(my $pid = fork) or die "fork: $!";
+	if ($pid == 0) {
+		watch_atfork_child($self);
+		watch_imap_fetch_all($self, $uris);
+		_exit(0);
+	}
+	$self->{poll_pids}->{$pid} = [ $intvl, $uris ];
+	PublicInbox::DS::dwaitpid($pid, \&imap_fetch_reap, $self);
+}
+
+sub imap_fetch_cb ($$$) {
+	my ($self, $intvl, $uris) = @_;
+	sub { imap_fetch_fork($self, $intvl, $uris) };
+}
+
+sub imap_fetch_reap { # PublicInbox::DS::dwaitpid callback
+	my ($self, $pid) = @_;
+	my $intvl_uris = delete $self->{poll_pids}->{$pid} or
+		die "BUG: PID=$pid (unknown) reaped: \$?=$?\n";
+	return if $self->{quit};
+	my ($intvl, $uris) = @$intvl_uris;
+	if ($?) {
+		warn "W: PID=$pid died: \$?=$?\n",
+			map { $_->as_string."\n" } @$uris;
+	}
+	warn('I: will check ', $_->as_string, " in ${intvl}s\n") for @$uris;
+	PublicInbox::DS::add_timer($intvl, imap_fetch_cb($self, $intvl, $uris));
+}
+
 sub watch_imap_init ($) {
 	my ($self) = @_;
 	eval { require PublicInbox::IMAPClient } or
@@ -542,7 +590,13 @@ sub watch_imap_init ($) {
 		$self->{idle_todo} = $idle;
 		PublicInbox::DS::requeue($self); # ->event_step to fork
 	}
-	# TODO: polling
+	return unless scalar keys %$poll;
+	$self->{poll_pids} = {};
+
+	# poll all URIs for a given interval sequentially
+	while (my ($intvl, $uris) = each %$poll) {
+		PublicInbox::DS::requeue(imap_fetch_cb($self, $intvl, $uris));
+	}
 }
 
 sub watch {
diff --git a/t/imapd.t b/t/imapd.t
index cc87a127851..ee3a3b26767 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -443,6 +443,7 @@ ok($mic->logout, 'logged out');
 {
 	use_ok 'PublicInbox::WatchMaildir';
 	use_ok 'PublicInbox::InboxIdle';
+	my $old_env = { HOME => $ENV{HOME} };
 	my $home = "$tmpdir/watch_home";
 	mkdir $home or BAIL_OUT $!;
 	mkdir "$home/.public-inbox" or BAIL_OUT $!;
@@ -464,13 +465,45 @@ ok($mic->logout, 'logged out');
 	my $cb = sub { PublicInbox::DS->SetPostLoopCallback(sub {}) };
 	my $obj = bless \$cb, 'PublicInbox::TestCommon::InboxWakeup';
 	$cfg->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) });
-	open my $err, '+>', undef or BAIL_OUT $!;
-	my $w = start_script(['-watch'], undef, { 2 => $err });
+	my $watcherr = "$tmpdir/watcherr";
+	open my $err_wr, '>', $watcherr or BAIL_OUT $!;
+	open my $err, '<', $watcherr or BAIL_OUT $!;
+	my $w = start_script(['-watch'], undef, { 2 => $err_wr });
+
+	diag 'waiting for initial fetch...';
+	PublicInbox::DS->EventLoop;
+	diag 'inbox unlocked on initial fetch, waiting for IDLE';
+
+	tick until (grep(/I: \S+ idling/, <$err>));
+	open my $fh, '<', 't/iso-2202-jp.eml' or BAIL_OUT $!;
+	$old_env->{ORIGINAL_RECIPIENT} = $addr;
+	ok(run_script([qw(-mda --no-precheck)], $old_env, { 0 => $fh }),
+		'delivered a message for IDLE to kick -watch');
+	diag 'waiting for IMAP IDLE wakeup';
+	PublicInbox::DS->SetPostLoopCallback(undef);
+	PublicInbox::DS->EventLoop;
+	diag 'inbox unlocked on IDLE wakeup';
+
+	# try again with polling
+	xsys(qw(git config), "--file=$home/.public-inbox/config",
+		"imap.imap://$ihost:$iport.PollInterval", 0.11) == 0
+		or BAIL_OUT "git config $?";
+	$w->kill('HUP');
+	diag 'waiting for -watch reload + initial fetch';
+	tick until (grep(/I: will check/, <$err>));
+
+	open $fh, '<', 't/psgi_attach.eml' or BAIL_OUT $!;
+	ok(run_script([qw(-mda --no-precheck)], $old_env, { 0 => $fh }),
+		'delivered a message for -watch PollInterval');
+
+	diag 'waiting for PollInterval wakeup';
+	PublicInbox::DS->SetPostLoopCallback(undef);
 	PublicInbox::DS->EventLoop;
-	diag 'inbox unlocked';
+	diag 'inbox unlocked (poll)';
 	$w->kill;
 	$w->join;
 	is($?, 0, 'no error in exited -watch process');
+
 	$cfg->each_inbox(sub { shift->unsubscribe_unlock('ident') });
 	$ii->close;
 	PublicInbox::DS->Reset;


* [PATCH 15/34] config: support ->urlmatch method for -watch
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (13 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 14/34] watch: support IMAP polling Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 16/34] watch: stop importers before forking Eric Wong
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Since we have IMAP client support in -watch, make sure per-URL
settings are familiar to git users by taking advantage of git's
URL matching abilities.

This requires git 1.8.5+, which most users ought to have
(though base CentOS 7 is on 1.8.3).
---
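As an illustration of the version requirement, a small sketch of
comparing `git --version' output against a minimum version.  The
function names are hypothetical; the bit-packing follows the approach
require_git takes in TestCommon below.

```perl
use strict;
use warnings;

# pack "MAJ.MIN[.SUB]" into one integer so versions compare numerically
sub version_int {
	my ($v) = @_;
	my ($maj, $min, $sub) = ($v =~ /(\d+)\.(\d+)(?:\.(\d+))?/) or return;
	($maj << 24) | ($min << 16) | ($sub // 0);
}

# true if the `git --version' output in $out is at least $req
sub git_ge {
	my ($out, $req) = @_;
	my $cur = version_int($out) // return 0;
	$cur >= version_int($req);
}

my $centos7 = git_ge("git version 1.8.3.1\n", '1.8.5'); # false
my $recent = git_ge("git version 2.27.0\n", '1.8.5'); # true
```

Callers without a new enough git would simply skip the
--get-urlmatch lookup and rely on exact section names.
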
 lib/PublicInbox/Config.pm       | 21 ++++++++++++++++++++-
 lib/PublicInbox/TestCommon.pm   | 11 ++++++-----
 lib/PublicInbox/WatchMaildir.pm | 17 ++++++++++-------
 t/config.t                      | 18 ++++++++++++++++++
 t/imapd.t                       |  2 +-
 5 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 19535beb973..c0e2cc575ec 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -9,7 +9,7 @@
 
 package PublicInbox::Config;
 use strict;
-use warnings;
+use v5.10.1;
 use PublicInbox::Inbox;
 use PublicInbox::Spawn qw(popen_rd);
 
@@ -462,4 +462,23 @@ sub _fill {
 	$ibx
 }
 
+sub urlmatch {
+	my ($self, $key, $url) = @_;
+	state $urlmatch_broken; # requires git 1.8.5
+	return if $urlmatch_broken;
+	my $file = default_file();
+	my $cmd = [qw/git config -z --includes --get-urlmatch/,
+		"--file=$file", $key, $url ];
+	my $fh = popen_rd($cmd);
+	local $/ = "\0";
+	my $val = <$fh>;
+	if (close($fh)) {
+		chomp($val);
+		$val;
+	} else {
+		$urlmatch_broken = 1 if (($? >> 8) != 1);
+		undef;
+	}
+}
+
 1;
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 14ebba10563..7b4da8b5f09 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -55,15 +55,16 @@ sub tcp_connect {
 
 sub require_git ($;$) {
 	my ($req, $maybe) = @_;
-	my ($req_maj, $req_min) = split(/\./, $req);
-	my ($cur_maj, $cur_min) = (`git --version` =~ /version (\d+)\.(\d+)/);
+	my ($req_maj, $req_min, $req_sub) = split(/\./, $req);
+	my ($cur_maj, $cur_min, $cur_sub) = (xqx([qw(git --version)])
+			=~ /version (\d+)\.(\d+)(?:\.(\d+))?/);
 
-	my $req_int = ($req_maj << 24) | ($req_min << 16);
-	my $cur_int = ($cur_maj << 24) | ($cur_min << 16);
+	my $req_int = ($req_maj << 24) | ($req_min << 16) | ($req_sub // 0);
+	my $cur_int = ($cur_maj << 24) | ($cur_min << 16) | ($cur_sub // 0);
 	if ($cur_int < $req_int) {
 		return 0 if $maybe;
 		Test::More::plan(skip_all =>
-				"git $req+ required, have $cur_maj.$cur_min");
+			"git $req+ required, have $cur_maj.$cur_min.$cur_sub");
 	}
 	1;
 }
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index ac980d9b0f1..494fe7a8f21 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -235,9 +235,11 @@ sub imap_section ($) {
 	$uri->scheme . '://' . $uri->authority;
 }
 
-sub cfg_intvl ($$) {
-	my ($cfg, $key) = @_;
-	defined(my $v = $cfg->{lc($key)}) or return;
+sub cfg_intvl ($$$$$) {
+	my ($cfg, $cfg_section, $cfg_key, $imap_section, $url) = @_;
+	my $key = "$cfg_section.$imap_section.$cfg_key";
+	my $v = $cfg->{lc($key)} //
+		$cfg->urlmatch("$cfg_section.$cfg_key", $url) // return;
 	$v =~ /\A[0-9]+(?:\.[0-9]+)?\z/s and return $v + 0;
 	if (ref($v) eq 'ARRAY') {
 		$v = join(', ', @$v);
@@ -257,7 +259,8 @@ sub imap_common_init ($) {
 		my $sec = imap_section($uri);
 		for my $k (qw(Starttls Debug Compress)) {
 			my $key = lc("imap.$sec.$k");
-			defined(my $orig = $cfg->{$key}) or next;
+			my $orig = $cfg->{$key} //
+				$cfg->urlmatch("imap.$k", $url) // next;
 			my $v = PublicInbox::Config::_git_config_bool($orig);
 			if (defined($v)) {
 				$mic_args->{$sec}->{$k} = $v;
@@ -265,11 +268,11 @@ sub imap_common_init ($) {
 				warn "W: $key=$orig is not boolean\n";
 			}
 		}
-		my $to = cfg_intvl($cfg, "imap.$sec.Timeout");
+		my $to = cfg_intvl($cfg, 'imap', 'Timeout', $sec, $url);
 		$mic_args->{$sec}->{Timeout} = $to if $to;
-		$to = cfg_intvl($cfg, "imap.$sec.PollInterval");
+		$to = cfg_intvl($cfg, 'imap', 'PollInterval', $sec, $url);
 		$self->{imap_opt}->{$sec}->{poll_intvl} = $to if $to;
-		$to = cfg_intvl($cfg, "imap.$sec.IdleInterval");
+		$to = cfg_intvl($cfg, 'imap', 'IdleInterval', $sec, $url);
 		$self->{imap_opt}->{$sec}->{idle_intvl} = $to if $to;
 	}
 	$mic_args;
diff --git a/t/config.t b/t/config.t
index 3f41c0042a9..ad543ad3638 100644
--- a/t/config.t
+++ b/t/config.t
@@ -225,4 +225,22 @@ EOF
 		'bogus is undef');
 }
 
+SKIP: {
+	require_git('1.8.5', 2) or
+		skip 'git 1.8.5+ required for --get-urlmatch', 2;
+	my $f = "$tmpdir/urlmatch";
+	open my $fh, '>', $f or BAIL_OUT $!;
+	print $fh <<EOF or BAIL_OUT $!;
+[imap "imap://*.example.com"]
+	pollInterval = 9
+EOF
+	close $fh or BAIL_OUT;
+	local $ENV{PI_CONFIG} = $f;
+	my $cfg = PublicInbox::Config->new;
+	my $url = 'imap://mail.example.com/INBOX';
+	is($cfg->urlmatch('imap.pollInterval', $url), 9, 'urlmatch hit');
+	is($cfg->urlmatch('imap.idleInterval', $url), undef, 'urlmatch miss');
+};
+
+
 done_testing();
diff --git a/t/imapd.t b/t/imapd.t
index ee3a3b26767..5626d24765f 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -486,7 +486,7 @@ ok($mic->logout, 'logged out');
 
 	# try again with polling
 	xsys(qw(git config), "--file=$home/.public-inbox/config",
-		"imap.imap://$ihost:$iport.PollInterval", 0.11) == 0
+		'imap.PollInterval', 0.11) == 0
 		or BAIL_OUT "git config $?";
 	$w->kill('HUP');
 	diag 'waiting for -watch reload + initial fetch';


* [PATCH 16/34] watch: stop importers before forking
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (14 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 15/34] config: support ->urlmatch method for -watch Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 17/34] watch: use UID SEARCH to avoid empty UID FETCH Eric Wong
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

This fixes cases where watch is handling both Maildirs and IMAP
connections.  While we're at it, close open directories in the
IMAP children to save FDs.
---
 lib/PublicInbox/WatchMaildir.pm | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 494fe7a8f21..24989130979 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -465,11 +465,18 @@ sub watch_atfork_child ($) {
 	my ($self) = @_;
 	delete $self->{idle_pids};
 	delete $self->{poll_pids};
+	delete $self->{opendirs};
 	PublicInbox::DS->Reset;
 	PublicInbox::Sigfd::sig_setmask($self->{oldset});
 	%SIG = (%SIG, %{$self->{sig}});
 }
 
+sub watch_atfork_parent ($) {
+	my ($self) = @_;
+	_done_for_now($self);
+	$self->{mics} = {}; # going to be forking, so disconnect
+}
+
 sub imap_idle_reap { # PublicInbox::DS::dwaitpid callback
 	my ($self, $pid) = @_;
 	my $uri_intvl = delete $self->{idle_pids}->{$pid} or
@@ -501,7 +508,7 @@ sub event_step {
 	return if $self->{quit};
 	my $idle_todo = $self->{idle_todo};
 	if ($idle_todo && @$idle_todo) {
-		$self->{mics} = {}; # going to be forking, so disconnect
+		watch_atfork_parent($self);
 		while (my $uri_intvl = shift(@$idle_todo)) {
 			imap_idle_fork($self, $uri_intvl);
 		}
@@ -525,7 +532,7 @@ sub watch_imap_fetch_all ($$) {
 sub imap_fetch_fork ($$$) {
 	my ($self, $intvl, $uris) = @_;
 	return if $self->{quit};
-	$self->{mics} = {}; # going to be forking, so disconnect
+	watch_atfork_parent($self);
 	defined(my $pid = fork) or die "fork: $!";
 	if ($pid == 0) {
 		watch_atfork_child($self);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 17/34] watch: use UID SEARCH to avoid empty UID FETCH
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (15 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 16/34] watch: stop importers before forking Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 18/34] ds: add_timer: allow passing arg to callback Eric Wong
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

For mailboxes with many gaps in the UID sequence,
performing a UID SEARCH beforehand can reduce the
number of articles to fetch.

However, the downside to this is we may end up with
an arbitrarily large list of UIDs from the server.
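
To make the trade-off above concrete, here is a rough sketch that counts
requests for both approaches.  It does not talk to a real server: the
%exists hash, fetch_by_range and fetch_by_search are hypothetical
stand-ins for the mailbox and for the old and new loops, not code from
this patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the server: the UIDs that still exist.
my %exists = map { $_ => 1 } (3, 4, 1000, 1001, 50000);

# Old approach: one UID FETCH per number in the range, most come back empty.
sub fetch_by_range {
	my ($l_uid, $r_uid) = @_;
	my ($requests, @got) = (0);
	for my $uid ($l_uid..$r_uid) {
		$requests++;
		push @got, $uid if $exists{$uid};
	}
	($requests, \@got);
}

# New approach: one UID SEARCH, then one UID FETCH per UID it reported.
sub fetch_by_search {
	my ($l_uid) = @_;
	# sort ourselves, the server's response order is not relied upon
	my @uids = sort { $a <=> $b } grep { $_ >= $l_uid } keys %exists;
	my $requests = 1 + scalar(@uids); # the search itself plus the fetches
	($requests, \@uids);
}

my ($n_range, $by_range) = fetch_by_range(1, 50000);
my ($n_search, $by_search) = fetch_by_search(1);
print "range: $n_range requests, search: $n_search requests\n";
```

Both calls yield the same five messages.  The cost of the second
approach is that @uids is held in memory all at once, which is the
downside noted above.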
---
 lib/PublicInbox/WatchMaildir.pm | 88 ++++++++++++++++++++-------------
 1 file changed, 54 insertions(+), 34 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 24989130979..b82b51025e6 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -335,6 +335,27 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 	$mic;
 }
 
+sub imap_import_msg ($$$$$$) {
+	my ($self, $itrk, $url, $r_uidval, $uid, $raw) = @_;
+	# our target audience expects LF-only, save storage
+	$$raw =~ s/\r\n/\n/sg;
+
+	my $inboxes = $self->{imap}->{$url};
+	if (ref($inboxes)) {
+		for my $ibx (@$inboxes) {
+			my $eml = PublicInbox::Eml->new($$raw);
+			my $x = import_eml($self, $ibx, $eml);
+		}
+	} elsif ($inboxes eq 'watchspam') {
+		my $eml = PublicInbox::Eml->new($raw);
+		my $arg = [ $self, $eml, "$url UID:$uid" ];
+		$self->{config}->each_inbox(\&remove_eml_i, $arg);
+	} else {
+		die "BUG: destination unknown $inboxes";
+	}
+	$itrk->update_last($url, $r_uidval, $uid);
+}
+
 sub imap_fetch_all ($$$) {
 	my ($self, $mic, $uri) = @_;
 	my $sec = imap_section($uri);
@@ -367,52 +388,51 @@ sub imap_fetch_all ($$$) {
 	}
 	return if $l_uid >= $r_uid; # nothing to do
 
+	warn "I: $url fetching UID $l_uid:$r_uid\n";
 	$mic->Uid(1); # the default, we hope
+	my $uids;
 	my $req = $mic->imap4rev1 ? 'BODY.PEEK[]' : 'RFC822.PEEK';
 	my $key = $req;
 	$key =~ s/\.PEEK//;
-	my $inboxes = $self->{imap}->{$url};
-	warn "I: $url fetching $l_uid..$r_uid\n";
-	my $uid = -1;
+	my $uid;
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	local $SIG{__WARN__} = sub {
+		$uid //= -1;
 		$warn_cb->("$url UID:$uid\n");
 		$warn_cb->(@_);
 	};
 	my $err;
-	$itrk->{dbh}->begin_work;
-	for my $u ($l_uid..$r_uid) {
-		$uid = $u;
-		local $0 = "UID:$uid $mbx $sec";
-		my $r = $mic->fetch_hash($uid, $req);
-		unless ($r) { # network error?
-			$err = "E: $url UID FETCH $uid error: $!\n";
-			last;
-		}
-
-		# messages get deleted, so holes appear
-		defined(my $raw = delete $r->{$uid}->{$key}) or next;
-
-		# our target audience expects LF-only, save storage
-		$raw =~ s/\r\n/\n/sg;
-
-		if (ref($inboxes)) {
-			for my $ibx (@$inboxes) {
-				my $eml = PublicInbox::Eml->new($raw);
-				my $x = import_eml($self, $ibx, $eml);
+	do {
+		$uids = $mic->search("UID $l_uid:*") or
+			return "E: $url UID SEARCH $l_uid:* error: $!";
+		return if scalar(@$uids) == 0;
+
+		# RFC 3501 doesn't seem to indicate order of UID SEARCH
+		# responses, so sort it ourselves
+		@$uids = sort { $a <=> $b } @$uids;
+
+		# Did we actually get new messages?
+		return if $uids->[0] < $l_uid;
+
+		$l_uid = $uids->[-1] + 1; # for next search
+
+		$itrk->{dbh}->begin_work;
+		while (defined(($uid = shift(@$uids)))) {
+			local $0 = "UID:$uid $mbx $sec";
+			my $r = $mic->fetch_hash($uid, $req);
+			unless ($r) { # network error?
+				$err = "E: $url UID FETCH $uid error: $!";
+				last;
 			}
-		} elsif ($inboxes eq 'watchspam') {
-			my $eml = PublicInbox::Eml->new($raw);
-			my $arg = [ $self, $eml, "$uri UID:$uid" ];
-			$self->{config}->each_inbox(\&remove_eml_i, $arg);
-		} else {
-			die "BUG: destination unknown $inboxes";
+			# messages get deleted, so holes appear
+			defined(my $raw = delete $r->{$uid}->{$key}) or next;
+			imap_import_msg($self, $itrk, $url, $r_uidval, $uid,
+					\$raw);
+			last if $self->{quit};
 		}
-		$itrk->update_last($url, $r_uidval, $uid);
-		last if $self->{quit};
-	}
-	_done_for_now($self);
-	$itrk->{dbh}->commit;
+		_done_for_now($self);
+		$itrk->{dbh}->commit;
+	} until ($err || $self->{quit});
 	$err;
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 18/34] ds: add_timer: allow passing arg to callback.
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (16 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 17/34] watch: use UID SEARCH to avoid empty UID FETCH Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 19/34] imaptracker: add {url} field to reduce args Eric Wong
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

This allows callers to avoid creating expensive closures.
We no longer pass the `$now' value to callbacks, as none of
them used it.
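
A minimal sketch of the calling convention, assuming a toy timer queue
rather than the real PublicInbox::DS: the add_timer and run_timers below
are simplified stand-ins which take an absolute fire time (instead of
seconds from now) so the example stays deterministic.

```perl
use strict;
use warnings;

my @timers; # each entry: [ $fire_time, $coderef, $arg ]

# Simplified stand-in: keeps the queue sorted by fire time.
sub add_timer {
	my ($fire_time, $coderef, $arg) = @_;
	push @timers, [ $fire_time, $coderef, $arg ];
	@timers = sort { $a->[0] <=> $b->[0] } @timers;
}

sub run_timers {
	my ($now) = @_;
	while (@timers && $timers[0][0] <= $now) {
		my $to_run = shift @timers;
		$to_run->[1]->($to_run->[2]); # callback receives its own arg
	}
}

my @log;
sub poll_once { # named sub, no closure created per call
	my ($self) = @_;
	push @log, "polled $self->{name}";
}

add_timer(2, \&poll_once, { name => 'b' });
add_timer(1, \&poll_once, { name => 'a' });
run_timers(1); # only 'a' is due
run_timers(5); # now 'b' fires
print join("\n", @log), "\n";
```

The same \&poll_once reference is reused for every timer; only the
argument differs, so no new anonymous sub has to be allocated per timer.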
---
 lib/PublicInbox/DS.pm           | 10 +++++-----
 lib/PublicInbox/FakeInotify.pm  | 10 ++++------
 lib/PublicInbox/WatchMaildir.pm | 15 ++++++---------
 3 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index c46b20cba27..a3f2e76c16a 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -95,18 +95,18 @@ sub SetLoopTimeout {
     return $LoopTimeout = $_[1] + 0;
 }
 
-=head2 C<< PublicInbox::DS::add_timer( $seconds, $coderef ) >>
+=head2 C<< PublicInbox::DS::add_timer( $seconds, $coderef, $arg) >>
 
 Add a timer to occur $seconds from now. $seconds may be fractional, but timers
 are not guaranteed to fire at the exact time you ask for.
 
 =cut
-sub add_timer ($$) {
-    my ($secs, $coderef) = @_;
+sub add_timer ($$;$) {
+    my ($secs, $coderef, $arg) = @_;
 
     my $fire_time = now() + $secs;
 
-    my $timer = [$fire_time, $coderef];
+    my $timer = [$fire_time, $coderef, $arg];
 
     if (!@Timers || $fire_time >= $Timers[-1][0]) {
         push @Timers, $timer;
@@ -198,7 +198,7 @@ sub RunTimers {
     # Run expired timers
     while (@Timers && $Timers[0][0] <= $now) {
         my $to_run = shift(@Timers);
-        $to_run->[1]->($now) if $to_run->[1];
+        $to_run->[1]->($to_run->[2]);
     }
 
     # timers may enqueue into nextq:
diff --git a/lib/PublicInbox/FakeInotify.pm b/lib/PublicInbox/FakeInotify.pm
index df63173f083..debd2d39ae5 100644
--- a/lib/PublicInbox/FakeInotify.pm
+++ b/lib/PublicInbox/FakeInotify.pm
@@ -16,16 +16,14 @@ my $poll_intvl = 2; # same as Filesys::Notify::Simple
 
 sub poll_once {
 	my ($self) = @_;
-	sub {
-		eval { $self->poll };
-		warn "E: FakeInotify->poll: $@\n" if $@;
-		PublicInbox::DS::add_timer($poll_intvl, poll_once($self));
-	};
+	eval { $self->poll };
+	warn "E: FakeInotify->poll: $@\n" if $@;
+	PublicInbox::DS::add_timer($poll_intvl, \&poll_once, $self);
 }
 
 sub new {
 	my $self = bless { watch => {} }, __PACKAGE__;
-	PublicInbox::DS::add_timer($poll_intvl, poll_once($self));
+	PublicInbox::DS::add_timer($poll_intvl, \&poll_once, $self);
 	$self;
 }
 
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index b82b51025e6..f36aa20aa33 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -549,8 +549,8 @@ sub watch_imap_fetch_all ($$) {
 	}
 }
 
-sub imap_fetch_fork ($$$) {
-	my ($self, $intvl, $uris) = @_;
+sub imap_fetch_fork ($) { # DS::add_timer callback
+	my ($self, $intvl, $uris) = @{$_[0]};
 	return if $self->{quit};
 	watch_atfork_parent($self);
 	defined(my $pid = fork) or die "fork: $!";
@@ -563,11 +563,6 @@ sub imap_fetch_fork ($$$) {
 	PublicInbox::DS::dwaitpid($pid, \&imap_fetch_reap, $self);
 }
 
-sub imap_fetch_cb ($$$) {
-	my ($self, $intvl, $uris) = @_;
-	sub { imap_fetch_fork($self, $intvl, $uris) };
-}
-
 sub imap_fetch_reap { # PublicInbox::DS::dwaitpid callback
 	my ($self, $pid) = @_;
 	my $intvl_uris = delete $self->{poll_pids}->{$pid} or
@@ -579,7 +574,8 @@ sub imap_fetch_reap { # PublicInbox::DS::dwaitpid callback
 			map { $_->as_string."\n" } @$uris;
 	}
 	warn('I: will check ', $_->as_string, " in ${intvl}s\n") for @$uris;
-	PublicInbox::DS::add_timer($intvl, imap_fetch_cb($self, $intvl, $uris));
+	PublicInbox::DS::add_timer($intvl, \&imap_fetch_fork,
+					[$self, $intvl, $uris]);
 }
 
 sub watch_imap_init ($) {
@@ -625,7 +621,8 @@ sub watch_imap_init ($) {
 
 	# poll all URIs for a given interval sequentially
 	while (my ($intvl, $uris) = each %$poll) {
-		PublicInbox::DS::requeue(imap_fetch_cb($self, $intvl, $uris));
+		PublicInbox::DS::add_timer(0, \&imap_fetch_fork,
+						[$self, $intvl, $uris]);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 19/34] imaptracker: add {url} field to reduce args
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (17 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 18/34] ds: add_timer: allow passing arg to callback Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 20/34] imaptracker: drop {dbname} field Eric Wong
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta; +Cc: Eric W . Biederman

Passing a $url parameter to every function was error-prone
and having a {url} field for a short-lived object is appropriate.

This matches the version of IMAPTracker posted by
Eric W. Biederman on 2020-05-15 at:
https://public-inbox.org/meta/87ftc0c3r4.fsf_-_@x220.int.ebiederm.org/

The version I originally imported was based on the one
posted on 2019-10-09:
https://public-inbox.org/meta/874l0i9vhc.fsf_-_@x220.int.ebiederm.org/

Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/PublicInbox/IMAPTracker.pm  | 20 ++++++++++----------
 lib/PublicInbox/WatchMaildir.pm | 17 ++++++++---------
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
index bb4a39cc41a..26274568b9c 100644
--- a/lib/PublicInbox/IMAPTracker.pm
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -33,29 +33,29 @@ sub dbh_new ($) {
 	$dbh;
 }
 
-sub get_last ($$) {
-	my ($self, $url) = @_;
+sub get_last ($) {
+	my ($self) = @_;
 	my $sth = $self->{dbh}->prepare_cached(<<'', undef, 1);
 SELECT uid_validity, uid FROM imap_last WHERE url = ?
 
-	$sth->execute($url);
+	$sth->execute($self->{url});
 	$sth->fetchrow_array;
 }
 
-sub update_last ($$$$) {
-	my ($self, $url, $validity, $last) = @_;
+sub update_last ($$$) {
+	my ($self, $validity, $last) = @_;
 	my $sth = $self->{dbh}->prepare_cached(<<'');
 INSERT OR REPLACE INTO imap_last (url, uid_validity, uid)
 VALUES (?, ?, ?)
 
-	$sth->execute($url, $validity, $last);
+	$sth->execute($self->{url}, $validity, $last);
 }
 
 sub new {
-	my ($class, $dbname) = @_;
+	my ($class, $url) = @_;
 
 	# original name for compatibility with old setups:
-	$dbname //= PublicInbox::Config->config_dir() . "/imap.sqlite3";
+	my $dbname = PublicInbox::Config->config_dir() . "/imap.sqlite3";
 
 	# use the new XDG-compliant name for new setups:
 	if (!-f $dbname) {
@@ -65,12 +65,12 @@ sub new {
 	}
 	if (!-f $dbname) {
 		require File::Path;
-		require File::Basename;;
+		require File::Basename;
 		File::Path::mkpath(File::Basename::dirname($dbname));
 	}
 
 	my $dbh = dbh_new($dbname);
-	bless { dbname => $dbname, dbh => $dbh }, $class;
+	bless { dbname => $dbname, url => $url, dbh => $dbh }, $class;
 }
 
 1;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index f36aa20aa33..e0caaa563b2 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -335,12 +335,12 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 	$mic;
 }
 
-sub imap_import_msg ($$$$$$) {
-	my ($self, $itrk, $url, $r_uidval, $uid, $raw) = @_;
+sub imap_import_msg ($$$$$) {
+	my ($self, $itrk, $r_uidval, $uid, $raw) = @_;
 	# our target audience expects LF-only, save storage
 	$$raw =~ s/\r\n/\n/sg;
 
-	my $inboxes = $self->{imap}->{$url};
+	my $inboxes = $self->{imap}->{$itrk->{url}};
 	if (ref($inboxes)) {
 		for my $ibx (@$inboxes) {
 			my $eml = PublicInbox::Eml->new($$raw);
@@ -348,12 +348,12 @@ sub imap_import_msg ($$$$$$) {
 		}
 	} elsif ($inboxes eq 'watchspam') {
 		my $eml = PublicInbox::Eml->new($raw);
-		my $arg = [ $self, $eml, "$url UID:$uid" ];
+		my $arg = [ $self, $eml, "$itrk->{url} UID:$uid" ];
 		$self->{config}->each_inbox(\&remove_eml_i, $arg);
 	} else {
 		die "BUG: destination unknown $inboxes";
 	}
-	$itrk->update_last($url, $r_uidval, $uid);
+	$itrk->update_last($r_uidval, $uid);
 }
 
 sub imap_fetch_all ($$$) {
@@ -373,8 +373,8 @@ sub imap_fetch_all ($$$) {
 		return "E: $url cannot get UIDVALIDITY";
 	$r_uidnext //= $mic->uidnext($mbx) //
 		return "E: $url cannot get UIDNEXT";
-	my $itrk = PublicInbox::IMAPTracker->new;
-	my ($l_uidval, $l_uid) = $itrk->get_last($url);
+	my $itrk = PublicInbox::IMAPTracker->new($url);
+	my ($l_uidval, $l_uid) = $itrk->get_last;
 	$l_uidval //= $r_uidval; # first time
 	$l_uid //= 1;
 	if ($l_uidval != $r_uidval) {
@@ -426,8 +426,7 @@ sub imap_fetch_all ($$$) {
 			}
 			# messages get deleted, so holes appear
 			defined(my $raw = delete $r->{$uid}->{$key}) or next;
-			imap_import_msg($self, $itrk, $url, $r_uidval, $uid,
-					\$raw);
+			imap_import_msg($self, $itrk, $r_uidval, $uid, \$raw);
 			last if $self->{quit};
 		}
 		_done_for_now($self);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 20/34] imaptracker: drop {dbname} field
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (18 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 19/34] imaptracker: add {url} field to reduce args Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 21/34] watch: avoid long transaction to IMAPTracker Eric Wong
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta; +Cc: Eric W . Biederman

It's not used anywhere since the IMAPTracker object doesn't
disconnect and reconnect.  If we ever need the filename,
{dbh}->sqlite_db_filename may be used.

Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/PublicInbox/IMAPTracker.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
index 26274568b9c..0bbabe07fae 100644
--- a/lib/PublicInbox/IMAPTracker.pm
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -69,8 +69,7 @@ sub new {
 		File::Path::mkpath(File::Basename::dirname($dbname));
 	}
 
-	my $dbh = dbh_new($dbname);
-	bless { dbname => $dbname, url => $url, dbh => $dbh }, $class;
+	bless { url => $url, dbh => dbh_new($dbname) }, $class;
 }
 
 1;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 21/34] watch: avoid long transaction to IMAPTracker
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (19 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 20/34] imaptracker: drop {dbname} field Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 22/34] watch: support imap.fetchBatchSize parameter Eric Wong
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

With different polling intervals, multiple processes may
simultaneously write to IMAPTracker.  Recording the last UID
once per batch, instead of holding a transaction open while
fetching, ought to reduce SQLite busy waiting and contention
issues when importing many inboxes in parallel.
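
The shape of the change can be sketched without SQLite.  Everything
below is an assumption made for illustration: %tracker and this
update_last are an in-memory stand-in for IMAPTracker, and %msgs plays
the role of the fetch results.

```perl
use strict;
use warnings;

# Hypothetical in-memory tracker; the real one writes to SQLite.
my %tracker = (writes => 0, last => undef);
sub update_last { $tracker{writes}++; $tracker{last} = $_[0] }

# Hypothetical fetch results: undef marks a message deleted on the server.
my %msgs = (7 => 'a', 8 => undef, 9 => 'c');

my $last_uid;
for my $uid (sort { $a <=> $b } keys %msgs) {
	defined(my $raw = $msgs{$uid}) or next; # hole in the UID sequence
	# ... slow import work would happen here, with no DB lock held ...
	$last_uid = $uid;
}
update_last($last_uid) if defined $last_uid; # one short write at the end
print "writes=$tracker{writes} last=$tracker{last}\n";
```

The tracker is touched once, after the slow work, so a second process
polling on a different interval is not kept waiting for the whole fetch.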
---
 lib/PublicInbox/WatchMaildir.pm | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index e0caaa563b2..d492e5d65b7 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -335,12 +335,12 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 	$mic;
 }
 
-sub imap_import_msg ($$$$$) {
-	my ($self, $itrk, $r_uidval, $uid, $raw) = @_;
+sub imap_import_msg ($$$$) {
+	my ($self, $url, $uid, $raw) = @_;
 	# our target audience expects LF-only, save storage
 	$$raw =~ s/\r\n/\n/sg;
 
-	my $inboxes = $self->{imap}->{$itrk->{url}};
+	my $inboxes = $self->{imap}->{$url};
 	if (ref($inboxes)) {
 		for my $ibx (@$inboxes) {
 			my $eml = PublicInbox::Eml->new($$raw);
@@ -348,12 +348,11 @@ sub imap_import_msg ($$$$$) {
 		}
 	} elsif ($inboxes eq 'watchspam') {
 		my $eml = PublicInbox::Eml->new($raw);
-		my $arg = [ $self, $eml, "$itrk->{url} UID:$uid" ];
+		my $arg = [ $self, $eml, "$url UID:$uid" ];
 		$self->{config}->each_inbox(\&remove_eml_i, $arg);
 	} else {
 		die "BUG: destination unknown $inboxes";
 	}
-	$itrk->update_last($r_uidval, $uid);
 }
 
 sub imap_fetch_all ($$$) {
@@ -415,8 +414,8 @@ sub imap_fetch_all ($$$) {
 		return if $uids->[0] < $l_uid;
 
 		$l_uid = $uids->[-1] + 1; # for next search
+		my $last_uid;
 
-		$itrk->{dbh}->begin_work;
 		while (defined(($uid = shift(@$uids)))) {
 			local $0 = "UID:$uid $mbx $sec";
 			my $r = $mic->fetch_hash($uid, $req);
@@ -426,11 +425,12 @@ sub imap_fetch_all ($$$) {
 			}
 			# messages get deleted, so holes appear
 			defined(my $raw = delete $r->{$uid}->{$key}) or next;
-			imap_import_msg($self, $itrk, $r_uidval, $uid, \$raw);
+			imap_import_msg($self, $url, $uid, \$raw);
+			$last_uid = $uid;
 			last if $self->{quit};
 		}
 		_done_for_now($self);
-		$itrk->{dbh}->commit;
+		$itrk->update_last($r_uidval, $last_uid) if defined $last_uid;
 	} until ($err || $self->{quit});
 	$err;
 }

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 22/34] watch: support imap.fetchBatchSize parameter
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (20 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 21/34] watch: avoid long transaction to IMAPTracker Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 23/34] watch: imap: be quiet about disconnecting on quit Eric Wong
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

IMAP allows retrieving multiple messages with a single command,
and Mail::IMAPClient supports that.  Unfortunately, it means we
slurp multiple messages into memory at once.  This option allows
users to trade off memory usage to reduce network round-trips.

Ideally, we'd support pipelining; but AFAIK no widely installed
Perl IMAP library supports it.
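
As a rough illustration of the batching only (no network involved), the
sketch below splits a sorted UID list into comma-separated sets of at
most $bs entries.  uid_batches is a hypothetical helper, and the clamp
to a minimum of 1 is an assumption of this sketch, not something the
patch does.

```perl
use strict;
use warnings;

# Hypothetical helper: split UIDs into comma-joined batches of up to $bs.
sub uid_batches {
	my ($uids, $bs) = @_;
	$bs = 1 if $bs < 1; # assumption: a batch of 0 would never drain @todo
	my @todo = @$uids; # copy, since splice is destructive
	my @batches;
	while (scalar @todo) {
		my @batch = splice(@todo, 0, $bs);
		push @batches, join(',', @batch);
	}
	@batches;
}

my @uids = (3, 4, 1000, 1001, 50000);
print "$_\n" for uid_batches(\@uids, 2);
```

With a batch size of 2, the five UIDs become three fetch requests
instead of five; each request then holds up to two messages in memory.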
---
 lib/PublicInbox/WatchMaildir.pm | 47 ++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index d492e5d65b7..05aa6594147 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -274,6 +274,15 @@ sub imap_common_init ($) {
 		$self->{imap_opt}->{$sec}->{poll_intvl} = $to if $to;
 		$to = cfg_intvl($cfg, 'imap', 'IdleInterval', $sec, $url);
 		$self->{imap_opt}->{$sec}->{idle_intvl} = $to if $to;
+
+		my $key = lc("imap.$sec.fetchBatchSize");
+		my $bs = $cfg->{lc($key)} //
+			$cfg->urlmatch('imap.fetchBatchSize', $url) // next;
+		if ($bs =~ /\A([0-9]+)\z/) {
+			$self->{imap_opt}->{$sec}->{batch_size} = $bs;
+		} else {
+			warn "W: $key=$bs is not an integer\n";
+		}
 	}
 	$mic_args;
 }
@@ -389,25 +398,31 @@ sub imap_fetch_all ($$$) {
 
 	warn "I: $url fetching UID $l_uid:$r_uid\n";
 	$mic->Uid(1); # the default, we hope
-	my $uids;
+	my $bs = $self->{imap_opt}->{$sec}->{batch_size} // 1;
 	my $req = $mic->imap4rev1 ? 'BODY.PEEK[]' : 'RFC822.PEEK';
+
+	# TODO: FLAGS may be useful for personal use
 	my $key = $req;
 	$key =~ s/\.PEEK//;
-	my $uid;
+	my ($uids, $batch);
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	local $SIG{__WARN__} = sub {
-		$uid //= -1;
-		$warn_cb->("$url UID:$uid\n");
+		$batch //= '?';
+		$warn_cb->("$url UID:$batch\n");
 		$warn_cb->(@_);
 	};
 	my $err;
 	do {
+		# I wish "UID FETCH $START:*" could work, but:
+		# 1) servers do not need to return results in any order
+		# 2) Mail::IMAPClient doesn't offer a streaming API
 		$uids = $mic->search("UID $l_uid:*") or
 			return "E: $url UID SEARCH $l_uid:* error: $!";
 		return if scalar(@$uids) == 0;
 
 		# RFC 3501 doesn't seem to indicate order of UID SEARCH
-		# responses, so sort it ourselves
+		# responses, so sort it ourselves.  Order matters so
+		# IMAPTracker can store the newest UID.
 		@$uids = sort { $a <=> $b } @$uids;
 
 		# Did we actually get new messages?
@@ -416,17 +431,23 @@ sub imap_fetch_all ($$$) {
 		$l_uid = $uids->[-1] + 1; # for next search
 		my $last_uid;
 
-		while (defined(($uid = shift(@$uids)))) {
-			local $0 = "UID:$uid $mbx $sec";
-			my $r = $mic->fetch_hash($uid, $req);
+		while (scalar @$uids) {
+			my @batch = splice(@$uids, 0, $bs);
+			$batch = join(',', @batch);
+			local $0 = "UID:$batch $mbx $sec";
+			my $r = $mic->fetch_hash($batch, $req);
 			unless ($r) { # network error?
-				$err = "E: $url UID FETCH $uid error: $!";
+				$err = "E: $url UID FETCH $batch error: $!";
 				last;
 			}
-			# messages get deleted, so holes appear
-			defined(my $raw = delete $r->{$uid}->{$key}) or next;
-			imap_import_msg($self, $url, $uid, \$raw);
-			$last_uid = $uid;
+			for my $uid (@batch) {
+				# messages get deleted, so holes appear
+				my $per_uid = delete $r->{$uid} // next;
+				my $raw = delete($per_uid->{$key}) // next;
+				imap_import_msg($self, $url, $uid, \$raw);
+				$last_uid = $uid;
+				last if $self->{quit};
+			}
 			last if $self->{quit};
 		}
 		_done_for_now($self);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 23/34] watch: imap: be quiet about disconnecting on quit
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (21 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 22/34] watch: support imap.fetchBatchSize parameter Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 24/34] watch: support multiple watch: directives per-inbox Eric Wong
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

If ->idle_done was handled successfully, we can just
let normal ->DESTROY disconnect and avoid ugly backtraces
when a user hits Ctrl-C to take down the process group.
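
The control flow can be tried out with a mock object.  MockMic and
quit_idle below are hypothetical names for this sketch; only the
done/disconnect ordering mirrors the change.

```perl
use strict;
use warnings;

package MockMic { # hypothetical stand-in for a Mail::IMAPClient object
	sub new { my ($class, %o) = @_; bless { %o, calls => [] }, $class }
	sub done {
		my ($self) = @_;
		push @{$self->{calls}}, 'done';
		die "connection lost\n" if $self->{fail_done};
	}
	sub disconnect { push @{$_[0]->{calls}}, 'disconnect' }
}

# Only disconnect explicitly when leaving IDLE failed.
sub quit_idle {
	my ($mic) = @_;
	eval { $mic->done };
	if ($@) {
		warn "IDLE DONE error: $@";
		eval { $mic->disconnect };
		warn "IDLE LOGOUT error: $@" if $@;
	}
}

my $ok = MockMic->new;
my $bad = MockMic->new(fail_done => 1);
quit_idle($_) for ($ok, $bad);
print "ok: @{$ok->{calls}}\n", "bad: @{$bad->{calls}}\n";
```

The healthy connection only sees ->done; the failing one warns and is
then disconnected explicitly.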
---
 lib/PublicInbox/WatchMaildir.pm | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 05aa6594147..e4106490c27 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -208,9 +208,11 @@ sub quit {
 	}
 	if (my $idle_mic = $self->{idle_mic}) {
 		eval { $idle_mic->done };
-		warn "IDLE DONE error: $@\n" if $@;
-		eval { $idle_mic->disconnect };
-		warn "IDLE LOGOUT error: $@\n" if $@;
+		if ($@) {
+			warn "IDLE DONE error: $@\n";
+			eval { $idle_mic->disconnect };
+			warn "IDLE LOGOUT error: $@\n" if $@;
+		}
 	}
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 24/34] watch: support multiple watch: directives per-inbox
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (22 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 23/34] watch: imap: be quiet about disconnecting on quit Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 25/34] watch: remove {mdir} array Eric Wong
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Some users will find it useful to merge several Maildir or
IMAP mailboxes into one public-inbox.  Let them do it, since
we've always supported multi-address inboxes.
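
A small sketch of the normalization this relies on: a config value may
be a single string or an array of strings, and both are treated as a
list.  The as_array helper, the inbox hashes and the watch locations
are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either a single value or an array ref.
sub as_array { ref($_[0]) eq 'ARRAY' ? $_[0] : [ $_[0] ] }

# Hypothetical inboxes: one with a single watch, one with two.
my @inboxes = (
	{ name => 'solo', watch => 'maildir:/home/u/solo' },
	{ name => 'merged', watch => [ 'maildir:/home/u/a',
					'imaps://example.com/INBOX' ] },
);

my %dst; # watch location => [ inbox names ]
for my $ibx (@inboxes) {
	for my $watch (@{as_array($ibx->{watch})}) {
		push @{$dst{$watch} //= []}, $ibx->{name};
	}
}
print "$_ => @{$dst{$_}}\n" for sort keys %dst;
```

Each watch location ends up with its own list of destination inboxes,
so one inbox can be fed from several sources.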
---
 lib/PublicInbox/WatchMaildir.pm | 36 ++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index e4106490c27..621d41bd81d 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -50,7 +50,7 @@ sub new {
 	foreach my $pfx (qw(publicinboxwatch publicinboxlearn)) {
 		my $k = "$pfx.watchspam";
 		defined(my $dirs = $config->{$k}) or next;
-		$dirs = [ $dirs ] if !ref($dirs);
+		$dirs = PublicInbox::Config::_array($dirs);
 		for my $dir (@$dirs) {
 			if (is_maildir($dir)) {
 				# skip "new", no MUA has seen it, yet.
@@ -75,21 +75,25 @@ sub new {
 		# need to make all inboxes writable for spam removal:
 		my $ibx = $_[0] = PublicInbox::InboxWritable->new($_[0]);
 
-		my $watch = $ibx->{watch} or return;
-		if (is_maildir($watch)) {
-			compile_watchheaders($ibx);
-			my ($new, $cur) = ("$watch/new", "$watch/cur");
-			return if is_watchspam($cur, $mdmap{$cur}, $ibx);
-			push @mdir, $new unless $uniq{$new}++;
-			push @mdir, $cur unless $uniq{$cur}++;
-			push @{$mdmap{$new} ||= []}, $ibx;
-			push @{$mdmap{$cur} ||= []}, $ibx;
-		} elsif (my $url = imap_url($watch)) {
-			return if is_watchspam($url, $imap{$url}, $ibx);
-			compile_watchheaders($ibx);
-			push @{$imap{$url} ||= []}, $ibx;
-		} else {
-			warn "watch unsupported: $k=$watch\n";
+		my $watches = $ibx->{watch} or return;
+		$watches = PublicInbox::Config::_array($watches);
+		for my $watch (@$watches) {
+			if (is_maildir($watch)) {
+				compile_watchheaders($ibx);
+				my ($new, $cur) = ("$watch/new", "$watch/cur");
+				my $cur_dst = $mdmap{$cur} //= [];
+				return if is_watchspam($cur, $cur_dst, $ibx);
+				push @mdir, $new unless $uniq{$new}++;
+				push @mdir, $cur unless $uniq{$cur}++;
+				push @{$mdmap{$new} //= []}, $ibx;
+				push @$cur_dst, $ibx;
+			} elsif (my $url = imap_url($watch)) {
+				return if is_watchspam($url, $imap{$url}, $ibx);
+				compile_watchheaders($ibx);
+				push @{$imap{$url} ||= []}, $ibx;
+			} else {
+				warn "watch unsupported: $k=$watch\n";
+			}
 		}
 	});
 	return unless scalar(@mdir) || scalar(keys %imap);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 25/34] watch: remove {mdir} array
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (23 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 24/34] watch: support multiple watch: directives per-inbox Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 26/34] watch: just use ->urlmatch Eric Wong
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Since we store all watched directory names as keys in %mdmap,
there should be no need to keep an array of those directories
around.
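
For illustration, the regexp can be derived from the hash keys alone.
The paths and the dir_of helper below are hypothetical; the join and
quotemeta construction follows the diff.

```perl
use strict;
use warnings;

# Hypothetical map of watched directories to their destinations.
my %mdmap = (
	'/home/u/Mail.dir/new' => [ 'inbox-a' ],
	'/home/u/Mail.dir/cur' => [ 'inbox-a' ],
	'/home/u/spam/cur' => 'watchspam',
);

# The keys alone are enough to build the matching regexp.
my $mdre = join('|', map { quotemeta($_) } sort keys %mdmap);
$mdre = qr!\A($mdre)/!;

# Hypothetical helper: return the watched directory a path lives in.
sub dir_of { $_[0] =~ $mdre ? $1 : undef }

print dir_of('/home/u/spam/cur/1593252180.M1P2.host:2,S'), "\n";
```

Because of quotemeta, the "." in "Mail.dir" only matches a literal dot,
so a lookalike path such as /home/u/MailXdir/cur/... is not picked up.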

t/watch_maildir*.t required changes to remove trained spam.
Once we've trained something as spam, there shouldn't be
a need to rescan it.
---
 lib/PublicInbox/WatchMaildir.pm | 22 ++++++++--------------
 t/watch_maildir.t               |  2 ++
 t/watch_maildir_v2.t            |  2 ++
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 621d41bd81d..8d2dc432684 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -40,8 +40,7 @@ sub compile_watchheaders ($) {
 
 sub new {
 	my ($class, $config) = @_;
-	my (%mdmap, @mdir, $spamc);
-	my %uniq; # directory => count
+	my (%mdmap, $spamc);
 	my %imap; # url => [inbox objects] or 'watchspam'
 
 	# "publicinboxwatch" is the documented namespace
@@ -54,10 +53,7 @@ sub new {
 		for my $dir (@$dirs) {
 			if (is_maildir($dir)) {
 				# skip "new", no MUA has seen it, yet.
-				my $cur = "$dir/cur";
-				push @mdir, $cur;
-				$uniq{$cur}++;
-				$mdmap{$cur} = 'watchspam';
+				$mdmap{"$dir/cur"} = 'watchspam';
 			} elsif (my $url = imap_url($dir)) {
 				$imap{$url} = 'watchspam';
 			} else {
@@ -83,8 +79,6 @@ sub new {
 				my ($new, $cur) = ("$watch/new", "$watch/cur");
 				my $cur_dst = $mdmap{$cur} //= [];
 				return if is_watchspam($cur, $cur_dst, $ibx);
-				push @mdir, $new unless $uniq{$new}++;
-				push @mdir, $cur unless $uniq{$cur}++;
 				push @{$mdmap{$new} //= []}, $ibx;
 				push @$cur_dst, $ibx;
 			} elsif (my $url = imap_url($watch)) {
@@ -96,17 +90,16 @@ sub new {
 			}
 		}
 	});
-	return unless scalar(@mdir) || scalar(keys %imap);
 
 	my $mdre;
-	if (@mdir) {
-		$mdre = join('|', map { quotemeta($_) } @mdir);
+	if (scalar keys %mdmap) {
+		$mdre = join('|', map { quotemeta($_) } keys %mdmap);
 		$mdre = qr!\A($mdre)/!;
 	}
+	return unless $mdre || scalar(keys %imap);
 	bless {
 		spamcheck => $spamcheck,
 		mdmap => \%mdmap,
-		mdir => \@mdir,
 		mdre => $mdre,
 		config => $config,
 		imap => scalar keys %imap ? \%imap : undef,
@@ -231,7 +224,8 @@ sub watch_fs_init ($) {
 		$self->{done_timer} //= PublicInbox::DS::requeue($done);
 	};
 	require PublicInbox::DirIdle;
-	PublicInbox::DirIdle->new($self->{mdir}, $cb); # EPOLL_CTL_ADD
+	# inotify_create + EPOLL_CTL_ADD
+	PublicInbox::DirIdle->new([keys %{$self->{mdmap}}], $cb);
 }
 
 # returns the git config section name, e.g [imap "imaps://user@example.com"]
@@ -688,7 +682,7 @@ sub fs_scan_step {
 		$opendirs->{$dir} = $dh if $n < 0;
 	}
 	if ($op && $op eq 'full') {
-		foreach my $dir (@{$self->{mdir}}) {
+		foreach my $dir (keys %{$self->{mdmap}}) {
 			next if $opendirs->{$dir}; # already in progress
 			my $ok = opendir(my $dh, $dir);
 			unless ($ok) {
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index c8658140cf2..c44273f0519 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -84,6 +84,7 @@ PublicInbox::WatchMaildir->new($config)->scan('full');
 is(scalar @list, 2, 'two revisions in rev-list');
 @list = $git->qx(qw(ls-tree -r --name-only refs/heads/master));
 is(scalar @list, 0, 'tree is empty');
+is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 
 # check with scrubbing
 {
@@ -105,6 +106,7 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 	is(scalar @list, 0, 'tree is empty');
 	@list = $git->qx(qw(rev-list refs/heads/master));
 	is(scalar @list, 4, 'four revisions in rev-list');
+	is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 }
 
 {
diff --git a/t/watch_maildir_v2.t b/t/watch_maildir_v2.t
index 6cc8b6ff0e9..f5b8e932985 100644
--- a/t/watch_maildir_v2.t
+++ b/t/watch_maildir_v2.t
@@ -71,6 +71,7 @@ $write_spam->();
 is(unlink(glob("$maildir/new/*")), 1, 'unlinked old spam');
 PublicInbox::WatchMaildir->new($config)->scan('full');
 is(($srch->reopen->query(''))[0], 0, 'deleted file');
+is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 
 # check with scrubbing
 {
@@ -90,6 +91,7 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html\n);
 	PublicInbox::WatchMaildir->new($config)->scan('full');
 	($nr, $msgs) = $srch->reopen->query('');
 	is($nr, 0, 'inbox is empty again');
+	is(unlink(glob("$spamdir/cur/*")), 1, 'unlinked trained spam');
 }
 
 {

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 26/34] watch: just use ->urlmatch
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (24 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 25/34] watch: remove {mdir} array Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 27/34] testcommon: $ENV{TAIL} supports non-@ARGV redirects Eric Wong
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

We may just modify PublicInbox::Config->urlmatch in the future
to support git <1.8.5, but I wonder if there are enough users on
git <1.8.5 to justify it.
---
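For readers unfamiliar with the lookup this patch leans on: ->urlmatch
appears to be backed by "git config --get-urlmatch", which is why the
test below is skipped on git older than 1.8.5.  A minimal sketch of such
a lookup follows.  The helper name urlmatch_value and the config contents
are made up for illustration; this is not the PublicInbox::Config code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: ask git for the value of $key that best matches $url.
# Returns undef if git is missing, too old, or nothing matches.
sub urlmatch_value {
	my ($file, $key, $url) = @_;
	my @cmd = (qw(git config), "--file=$file", '--get-urlmatch', $key, $url);
	open(my $fh, '-|', @cmd) or return;
	my $v = do { local $/; <$fh> };
	close($fh) or return; # non-zero exit: no match or unsupported option
	return unless defined $v;
	chomp $v;
	$v;
}

# made-up config: one global fallback and one URL-scoped section
my ($cfg_fh, $cfg_file) = tempfile(UNLINK => 1);
print $cfg_fh <<'EOF';
[imap]
	pollInterval = 120
[imap "imap://mail.example.com"]
	pollInterval = 60
EOF
close $cfg_fh or die "close: $!";

my $v = urlmatch_value($cfg_file, 'imap.pollInterval',
			'imap://mail.example.com/INBOX');
print defined($v) ? "pollInterval=$v\n" : "no match (or git unavailable)\n";
```

The point of the sketch is that a single URL-aware query covers both the
per-server section and the global fallback, so the explicit
"imap.$sec.$key" lookup removed below is no longer needed.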
 lib/PublicInbox/WatchMaildir.pm | 32 ++++++++++++++------------------
 t/imapd.t                       |  4 +++-
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 8d2dc432684..535dadd539c 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -235,11 +235,9 @@ sub imap_section ($) {
 	$uri->scheme . '://' . $uri->authority;
 }
 
-sub cfg_intvl ($$$$$) {
-	my ($cfg, $cfg_section, $cfg_key, $imap_section, $url) = @_;
-	my $key = "$cfg_section.$imap_section.$cfg_key";
-	my $v = $cfg->{lc($key)} //
-		$cfg->urlmatch("$cfg_section.$cfg_key", $url) // return;
+sub cfg_intvl ($$$) {
+	my ($cfg, $key, $url) = @_;
+	my $v = $cfg->urlmatch($key, $url) // return;
 	$v =~ /\A[0-9]+(?:\.[0-9]+)?\z/s and return $v + 0;
 	if (ref($v) eq 'ARRAY') {
 		$v = join(', ', @$v);
@@ -257,31 +255,29 @@ sub imap_common_init ($) {
 	for my $url (sort keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
 		my $sec = imap_section($uri);
-		for my $k (qw(Starttls Debug Compress)) {
-			my $key = lc("imap.$sec.$k");
-			my $orig = $cfg->{$key} //
-				$cfg->urlmatch("imap.$k", $url) // next;
+		for my $f (qw(Starttls Debug Compress)) {
+			my $k = "imap.$f";
+			my $orig = $cfg->urlmatch($k, $url) // next;
 			my $v = PublicInbox::Config::_git_config_bool($orig);
 			if (defined($v)) {
-				$mic_args->{$sec}->{$k} = $v;
+				$mic_args->{$sec}->{$f} = $v;
 			} else {
-				warn "W: $key=$orig is not boolean\n";
+				warn "W: $k=$orig for $url is not boolean\n";
 			}
 		}
-		my $to = cfg_intvl($cfg, 'imap', 'Timeout', $sec, $url);
+		my $to = cfg_intvl($cfg, 'imap.timeout', $url);
 		$mic_args->{$sec}->{Timeout} = $to if $to;
-		$to = cfg_intvl($cfg, 'imap', 'PollInterval', $sec, $url);
+		$to = cfg_intvl($cfg, 'imap.pollInterval', $url);
 		$self->{imap_opt}->{$sec}->{poll_intvl} = $to if $to;
-		$to = cfg_intvl($cfg, 'imap', 'IdleInterval', $sec, $url);
+		$to = cfg_intvl($cfg, 'imap.IdleInterval', $url);
 		$self->{imap_opt}->{$sec}->{idle_intvl} = $to if $to;
 
-		my $key = lc("imap.$sec.fetchBatchSize");
-		my $bs = $cfg->{lc($key)} //
-			$cfg->urlmatch('imap.fetchBatchSize', $url) // next;
+		my $k = 'imap.fetchBatchSize';
+		my $bs = $cfg->urlmatch($k, $url) // next;
 		if ($bs =~ /\A([0-9]+)\z/) {
 			$self->{imap_opt}->{$sec}->{batch_size} = $bs;
 		} else {
-			warn "W: $key=$bs is not an integer\n";
+			warn "$k=$bs is not an integer\n";
 		}
 	}
 	$mic_args;
diff --git a/t/imapd.t b/t/imapd.t
index 5626d24765f..cf327e9fbea 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -440,9 +440,11 @@ ok($mic->logout, 'logged out');
 	like(<$c>, qr/\Atagonly BAD Error in IMAP command/, 'tag-only line');
 }
 
-{
+SKIP: {
 	use_ok 'PublicInbox::WatchMaildir';
 	use_ok 'PublicInbox::InboxIdle';
+	require_git('1.8.5', 1) or
+		skip('git 1.8.5+ needed for --urlmatch', 4);
 	my $old_env = { HOME => $ENV{HOME} };
 	my $home = "$tmpdir/watch_home";
 	mkdir $home or BAIL_OUT $!;

* [PATCH 27/34] testcommon: $ENV{TAIL} supports non-@ARGV redirects
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (25 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 26/34] watch: just use ->urlmatch Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 28/34] watch: add NNTP support Eric Wong
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Existing use of $ENV{TAIL} relied on parsing --std{out,err},
which was only usable for read-only daemons.  However, -watch
doesn't use PublicInbox::Daemon code(*), so attempt to figure
out redirects.

(*) -watch won't be able to run as a daemon in cases when
    git-credential prompts for IMAP/NNTP passwords.
    PublicInbox::Daemon is also designed for read-only
    parallelism where all worker processes are the same.
    Any subprocesses spawned by -watch are to do specific
    tasks for a particular set of inboxes.
---
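The GLOB branch added to start_script below maps an already-open handle
back to a path by reading the /proc symlink for its descriptor.  A
standalone sketch of that idea follows; path_for_handle is a hypothetical
name, and the approach assumes a Linux-style /proc.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: map an open filehandle back to a filesystem path.
# Only works where /proc/$pid/fd exists (e.g. Linux); returns undef otherwise.
sub path_for_handle {
	my ($fh) = @_;
	my $fd = fileno($fh) // return;
	my $path = readlink("/proc/$$/fd/$fd") // return;
	-e $path ? $path : undef; # pipes/sockets read as "pipe:[123]", skip those
}

my ($fh, $name) = tempfile(UNLINK => 1);
my $resolved = path_for_handle($fh);
print defined($resolved) ? 'fd '.fileno($fh)." => $resolved\n"
			: "could not resolve (no /proc?)\n";
```

In this sketch the result of readlink is checked before the -e test, so
descriptors that cannot be resolved are skipped quietly.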
 lib/PublicInbox/TestCommon.pm | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 7b4da8b5f09..b03e93e0f5b 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -276,11 +276,11 @@ sub tick (;$) {
 }
 
 sub wait_for_tail ($;$) {
-	my ($tail_pid, $stop) = @_;
+	my ($tail_pid, $want) = @_;
 	my $wait = 2;
 	if ($^O eq 'linux') { # GNU tail may use inotify
 		state $tail_has_inotify;
-		return tick if $stop && $tail_has_inotify;
+		return tick if $want < 0 && $tail_has_inotify;
 		my $end = time + $wait;
 		my @ino;
 		do {
@@ -297,7 +297,7 @@ sub wait_for_tail ($;$) {
 				local $/ = "\n";
 				@info = grep(/^inotify wd:/, <$fh>);
 			}
-		} while (scalar(@info) < 2 && time <= $end and tick);
+		} while (scalar(@info) < $want && time <= $end and tick);
 	} else {
 		sleep($wait);
 	}
@@ -337,6 +337,18 @@ sub start_script {
 			next unless /\A--std(?:err|out)=(.+)\z/;
 			push @paths, $1;
 		}
+		if ($opt) {
+			for (1, 2) {
+				my $f = $opt->{$_} or next;
+				if (!ref($f)) {
+					push @paths, $f;
+				} elsif (ref($f) eq 'GLOB' && $^O eq 'linux') {
+					my $fd = fileno($f);
+					my $f = readlink "/proc/$$/fd/$fd";
+					push @paths, $f if -e $f;
+				}
+			}
+		}
 		if (@paths) {
 			defined($tail_pid = fork) or die "fork: $!\n";
 			if ($tail_pid == 0) {
@@ -346,7 +358,7 @@ sub start_script {
 				exec(split(' ', $tail_cmd), @paths);
 				die "$tail_cmd failed: $!";
 			}
-			wait_for_tail($tail_pid);
+			wait_for_tail($tail_pid, scalar @paths);
 		}
 	}
 	defined(my $pid = fork) or die "fork: $!\n";
@@ -414,7 +426,7 @@ sub DESTROY {
 	my ($self) = @_;
 	return if $self->{owner} != $$;
 	if (my $tail_pid = delete $self->{tail_pid}) {
-		PublicInbox::TestCommon::wait_for_tail($tail_pid, 1);
+		PublicInbox::TestCommon::wait_for_tail($tail_pid, -1);
 		CORE::kill('TERM', $tail_pid);
 	}
 	$self->join('TERM');

* [PATCH 28/34] watch: add NNTP support
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (26 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 27/34] testcommon: $ENV{TAIL} supports non-@ARGV redirects Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 19:06   ` Kyle Meyer
  2020-06-27 10:03 ` [PATCH 29/34] watch: show user-specified URL consistently Eric Wong
                   ` (6 subsequent siblings)
  34 siblings, 1 reply; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

This is similar to IMAP support, but only supports polling.
Automatic altid support is not yet implemented, but may
be in the future.
---
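Since NNTP is polling-only here, watch_nntp_init groups the watched URLs
by their configured poll interval and schedules one fetch per group.  A
simplified sketch of that grouping follows, using plain strings in place
of URI objects.  The URLs and the %interval_for table are invented for
the example; 120 seconds mirrors the fallback used in the patch.

```perl
use strict;
use warnings;

# invented per-server settings; undef/missing means "use the default"
my %interval_for = (
	'nntp://news.example.com' => 30,
	'nntps://news.example.org' => undef,
);
my @urls = qw(
	nntp://news.example.com/inbox.foo
	nntp://news.example.com/inbox.bar
	nntps://news.example.org/inbox.baz
);

my %poll; # interval in seconds => [ url1, url2, ... ]
for my $url (@urls) {
	# section is "scheme://authority", i.e. the URL without the group
	my ($sec) = ($url =~ m!\A([a-z]+://[^/]+)!i) or next;
	my $intvl = $interval_for{$sec};
	push @{$poll{$intvl || 120}}, $url;
}

for my $intvl (sort { $a <=> $b } keys %poll) {
	print "every ${intvl}s: @{$poll{$intvl}}\n";
}
```

Groups that share an interval are fetched sequentially by one child
process, which keeps the number of forks tied to the number of distinct
intervals rather than the number of newsgroups.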
 MANIFEST                        |   1 +
 lib/PublicInbox/WatchMaildir.pm | 357 ++++++++++++++++++++++++++++----
 t/nntpd.t                       |  52 +++++
 t/watch_nntp.t                  |  15 ++
 4 files changed, 389 insertions(+), 36 deletions(-)
 create mode 100644 t/watch_nntp.t

diff --git a/MANIFEST b/MANIFEST
index 035c45bf498..f9d1eea5bd9 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -355,6 +355,7 @@ t/watch_imap.t
 t/watch_maildir.t
 t/watch_maildir_v2.t
 t/watch_multiple_headers.t
+t/watch_nntp.t
 t/www_altid.t
 t/www_listing.t
 t/www_static.t
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 535dadd539c..616c63a3857 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -41,7 +41,7 @@ sub compile_watchheaders ($) {
 sub new {
 	my ($class, $config) = @_;
 	my (%mdmap, $spamc);
-	my %imap; # url => [inbox objects] or 'watchspam'
+	my (%imap, %nntp); # url => [inbox objects] or 'watchspam'
 
 	# "publicinboxwatch" is the documented namespace
 	# "publicinboxlearn" is legacy but may be supported
@@ -51,11 +51,14 @@ sub new {
 		defined(my $dirs = $config->{$k}) or next;
 		$dirs = PublicInbox::Config::_array($dirs);
 		for my $dir (@$dirs) {
+			my $url;
 			if (is_maildir($dir)) {
 				# skip "new", no MUA has seen it, yet.
 				$mdmap{"$dir/cur"} = 'watchspam';
-			} elsif (my $url = imap_url($dir)) {
+			} elsif ($url = imap_url($dir)) {
 				$imap{$url} = 'watchspam';
+			} elsif ($url = nntp_url($dir)) {
+				$nntp{$url} = 'watchspam';
 			} else {
 				warn "unsupported $k=$dir\n";
 			}
@@ -74,6 +77,7 @@ sub new {
 		my $watches = $ibx->{watch} or return;
 		$watches = PublicInbox::Config::_array($watches);
 		for my $watch (@$watches) {
+			my $url;
 			if (is_maildir($watch)) {
 				compile_watchheaders($ibx);
 				my ($new, $cur) = ("$watch/new", "$watch/cur");
@@ -81,10 +85,14 @@ sub new {
 				return if is_watchspam($cur, $cur_dst, $ibx);
 				push @{$mdmap{$new} //= []}, $ibx;
 				push @$cur_dst, $ibx;
-			} elsif (my $url = imap_url($watch)) {
+			} elsif ($url = imap_url($watch)) {
 				return if is_watchspam($url, $imap{$url}, $ibx);
 				compile_watchheaders($ibx);
 				push @{$imap{$url} ||= []}, $ibx;
+			} elsif ($url = nntp_url($watch)) {
+				return if is_watchspam($url, $nntp{$url}, $ibx);
+				compile_watchheaders($ibx);
+				push @{$nntp{$url} ||= []}, $ibx;
 			} else {
 				warn "watch unsupported: $k=$watch\n";
 			}
@@ -96,13 +104,15 @@ sub new {
 		$mdre = join('|', map { quotemeta($_) } keys %mdmap);
 		$mdre = qr!\A($mdre)/!;
 	}
-	return unless $mdre || scalar(keys %imap);
+	return unless $mdre || scalar(keys %imap) || scalar(keys %nntp);
+
 	bless {
 		spamcheck => $spamcheck,
 		mdmap => \%mdmap,
 		mdre => $mdre,
 		config => $config,
 		imap => scalar keys %imap ? \%imap : undef,
+		nntp => scalar keys %nntp? \%nntp : undef,
 		importers => {},
 		opendirs => {}, # dirname => dirhandle (in progress scans)
 		ops => [], # 'quit', 'full'
@@ -230,7 +240,7 @@ sub watch_fs_init ($) {
 
 # returns the git config section name, e.g [imap "imaps://user@example.com"]
 # without the mailbox, so we can share connections between different inboxes
-sub imap_section ($) {
+sub uri_section ($) {
 	my ($uri) = @_;
 	$uri->scheme . '://' . $uri->authority;
 }
@@ -247,6 +257,14 @@ sub cfg_intvl ($$$) {
 	}
 }
 
+sub cfg_bool ($$$) {
+	my ($cfg, $key, $url) = @_;
+	my $orig = $cfg->urlmatch($key, $url) // return;
+	my $bool = PublicInbox::Config::_git_config_bool($orig);
+	warn "W: $key=$orig for $url is not boolean\n" unless defined($bool);
+	$bool;
+}
+
 # flesh out common IMAP-specific data structures
 sub imap_common_init ($) {
 	my ($self) = @_;
@@ -254,24 +272,17 @@ sub imap_common_init ($) {
 	my $mic_args = {}; # scheme://authority => Mail:IMAPClient arg
 	for my $url (sort keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
-		my $sec = imap_section($uri);
-		for my $f (qw(Starttls Debug Compress)) {
-			my $k = "imap.$f";
-			my $orig = $cfg->urlmatch($k, $url) // next;
-			my $v = PublicInbox::Config::_git_config_bool($orig);
-			if (defined($v)) {
-				$mic_args->{$sec}->{$f} = $v;
-			} else {
-				warn "W: $k=$orig for $url is not boolean\n";
-			}
+		my $sec = uri_section($uri);
+		for my $k (qw(Starttls Debug Compress)) {
+			my $bool = cfg_bool($cfg, "imap.$k", $url) // next;
+			$mic_args->{$sec}->{$k} = $bool;
 		}
 		my $to = cfg_intvl($cfg, 'imap.timeout', $url);
 		$mic_args->{$sec}->{Timeout} = $to if $to;
-		$to = cfg_intvl($cfg, 'imap.pollInterval', $url);
-		$self->{imap_opt}->{$sec}->{poll_intvl} = $to if $to;
-		$to = cfg_intvl($cfg, 'imap.IdleInterval', $url);
-		$self->{imap_opt}->{$sec}->{idle_intvl} = $to if $to;
-
+		for my $k (qw(pollInterval idleInterval)) {
+			$to = cfg_intvl($cfg, "imap.$k", $url) // next;
+			$self->{imap_opt}->{$sec}->{$k} = $to;
+		}
 		my $k = 'imap.fetchBatchSize';
 		my $bs = $cfg->urlmatch($k, $url) // next;
 		if ($bs =~ /\A([0-9]+)\z/) {
@@ -295,7 +306,7 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 		username => $uri->user,
 		password => $uri->password,
 	};
-	my $common = $mic_args->{imap_section($uri)} // {};
+	my $common = $mic_args->{uri_section($uri)} // {};
 	my $host = $cred->{host};
 	my $mic_arg = {
 		Port => $uri->port,
@@ -331,7 +342,7 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 	}
 	if ($mic->login && $mic->IsAuthenticated) {
 		# success! keep IMAPClient->new arg in case we get disconnected
-		$self->{mic_arg}->{imap_section($uri)} = $mic_arg;
+		$self->{mic_arg}->{uri_section($uri)} = $mic_arg;
 	} else {
 		warn "E: <$url> LOGIN: $@\n";
 		$mic = undef;
@@ -362,7 +373,7 @@ sub imap_import_msg ($$$$) {
 
 sub imap_fetch_all ($$$) {
 	my ($self, $mic, $uri) = @_;
-	my $sec = imap_section($uri);
+	my $sec = uri_section($uri);
 	my $mbx = $uri->mailbox;
 	my $url = $uri->as_string;
 	$mic->Clear(1); # trim results history
@@ -479,7 +490,7 @@ sub imap_idle_once ($$$$) {
 # idles on a single URI
 sub watch_imap_idle_1 ($$$) {
 	my ($self, $uri, $intvl) = @_;
-	my $sec = imap_section($uri);
+	my $sec = uri_section($uri);
 	my $mic_arg = $self->{mic_arg}->{$sec} or
 			die "BUG: no Mail::IMAPClient->new arg for $sec";
 	my $mic;
@@ -555,7 +566,7 @@ sub event_step {
 sub watch_imap_fetch_all ($$) {
 	my ($self, $uris) = @_;
 	for my $uri (@$uris) {
-		my $sec = imap_section($uri);
+		my $sec = uri_section($uri);
 		my $mic_arg = $self->{mic_arg}->{$sec} or
 			die "BUG: no Mail::IMAPClient->new arg for $sec";
 		my $mic = PublicInbox::IMAPClient->new(%$mic_arg) or next;
@@ -565,21 +576,56 @@ sub watch_imap_fetch_all ($$) {
 	}
 }
 
-sub imap_fetch_fork ($) { # DS::add_timer callback
+sub watch_nntp_fetch_all ($$) {
+	my ($self, $uris) = @_;
+	for my $uri (@$uris) {
+		my $sec = uri_section($uri);
+		my $nn_arg = $self->{nn_arg}->{$sec} or
+			die "BUG: no Net::NNTP->new arg for $sec";
+		my $nntp_opt = $self->{nntp_opt}->{$sec};
+		my $url = $uri->as_string;
+		my $nn = nn_new($nn_arg, $nntp_opt, $url);
+		unless ($nn) {
+			warn "E: $url: \$!=$!\n";
+			next;
+		}
+		last if $self->{quit};
+		if (my $postconn = $nntp_opt->{-postconn}) {
+			for my $m_arg (@$postconn) {
+				my ($method, @args) = @$m_arg;
+				$nn->$method(@args) and next;
+				warn "E: <$url> $method failed\n";
+				$nn = undef;
+				last;
+			}
+		}
+		last if $self->{quit};
+		if ($nn) {
+			my $err = nntp_fetch_all($self, $nn, $uri);
+			warn $err, "\n" if $err;
+		}
+	}
+}
+
+sub poll_fetch_fork ($) { # DS::add_timer callback
 	my ($self, $intvl, $uris) = @{$_[0]};
 	return if $self->{quit};
 	watch_atfork_parent($self);
 	defined(my $pid = fork) or die "fork: $!";
 	if ($pid == 0) {
 		watch_atfork_child($self);
-		watch_imap_fetch_all($self, $uris);
+		if ($uris->[0]->scheme =~ /\Aimaps?\z/) {
+			watch_imap_fetch_all($self, $uris);
+		} else {
+			watch_nntp_fetch_all($self, $uris);
+		}
 		_exit(0);
 	}
 	$self->{poll_pids}->{$pid} = [ $intvl, $uris ];
-	PublicInbox::DS::dwaitpid($pid, \&imap_fetch_reap, $self);
+	PublicInbox::DS::dwaitpid($pid, \&poll_fetch_reap, $self);
 }
 
-sub imap_fetch_reap { # PublicInbox::DS::dwaitpid callback
+sub poll_fetch_reap { # PublicInbox::DS::dwaitpid callback
 	my ($self, $pid) = @_;
 	my $intvl_uris = delete $self->{poll_pids}->{$pid} or
 		die "BUG: PID=$pid (unknown) reaped: \$?=$?\n";
@@ -590,7 +636,7 @@ sub imap_fetch_reap { # PublicInbox::DS::dwaitpid callback
 			map { $_->as_string."\n" } @$uris;
 	}
 	warn('I: will check ', $_->as_string, " in ${intvl}s\n") for @$uris;
-	PublicInbox::DS::add_timer($intvl, \&imap_fetch_fork,
+	PublicInbox::DS::add_timer($intvl, \&poll_fetch_fork,
 					[$self, $intvl, $uris]);
 }
 
@@ -610,18 +656,18 @@ sub watch_imap_init ($) {
 	my $mics = $self->{mics} = {}; # schema://authority => IMAPClient obj
 	for my $url (sort keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
-		$mics->{imap_section($uri)} //= mic_for($self, $uri, $mic_args);
+		$mics->{uri_section($uri)} //= mic_for($self, $uri, $mic_args);
 	}
 
 	my $idle = []; # [ [ uri1, intvl1 ], [uri2, intvl2] ]
 	my $poll = {}; # intvl_seconds => [ uri1, uri2 ]
 	for my $url (keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
-		my $sec = imap_section($uri);
+		my $sec = uri_section($uri);
 		my $mic = $mics->{$sec};
-		my $intvl = $self->{imap_opt}->{$sec}->{poll_intvl};
+		my $intvl = $self->{imap_opt}->{$sec}->{pollInterval};
 		if ($mic->has_capability('IDLE') && !$intvl) {
-			$intvl = $self->{imap_opt}->{$sec}->{idle_intvl};
+			$intvl = $self->{imap_opt}->{$sec}->{idleInterval};
 			push @$idle, [ $uri, $intvl // () ];
 		} else {
 			push @{$poll->{$intvl || 120}}, $uri;
@@ -633,11 +679,238 @@ sub watch_imap_init ($) {
 		PublicInbox::DS::requeue($self); # ->event_step to fork
 	}
 	return unless scalar keys %$poll;
-	$self->{poll_pids} = {};
+	$self->{poll_pids} //= {};
+
+	# poll all URIs for a given interval sequentially
+	while (my ($intvl, $uris) = each %$poll) {
+		PublicInbox::DS::add_timer(0, \&poll_fetch_fork,
+						[$self, $intvl, $uris]);
+	}
+}
+
+# flesh out common NNTP-specific data structures
+sub nntp_common_init ($) {
+	my ($self) = @_;
+	my $cfg = $self->{config};
+	my $nn_args = {}; # scheme://authority => Net::NNTP->new arg
+	for my $url (sort keys %{$self->{nntp}}) {
+		my $sec = uri_section(URI->new($url));
+
+		# Debug and Timeout are passed to Net::NNTP->new
+		my $v = cfg_bool($cfg, 'nntp.Debug', $url);
+		$nn_args->{$sec}->{Debug} = $v if defined $v;
+		my $to = cfg_intvl($cfg, 'nntp.Timeout', $url);
+		$nn_args->{$sec}->{Timeout} = $to if $to;
+
+		# Net::NNTP post-connect commands
+		for my $k (qw(starttls compress)) {
+			$v = cfg_bool($cfg, "nntp.$k", $url) // next;
+			$self->{nntp_opt}->{$sec}->{$k} = $v;
+		}
+
+		# internal option
+		for my $k (qw(pollInterval)) {
+			$to = cfg_intvl($cfg, "nntp.$k", $url) // next;
+			$self->{nntp_opt}->{$sec}->{$k} = $to;
+		}
+	}
+	$nn_args;
+}
+
+# Net::NNTP doesn't support CAPABILITIES, yet
+sub try_starttls ($) {
+	my ($host) = @_;
+	return if $host =~ /\.onion\z/s;
+	return if $host =~ /\A127\.[0-9]+\.[0-9]+\.[0-9]+\z/s;
+	return if $host eq '::1';
+	1;
+}
+
+sub nn_new ($$$) {
+	my ($nn_arg, $nntp_opt, $url) = @_;
+	my $nn = Net::NNTP->new(%$nn_arg) or die "E: <$url> new: $!\n";
+
+	# default to using STARTTLS if it's available, but allow
+	# it to be disabled for localhost/VPN users
+	if (!$nn_arg->{SSL} && $nn->can('starttls')) {
+		if (!defined($nntp_opt->{starttls}) &&
+				try_starttls($nn_arg->{Host})) {
+			# soft fail by default
+			$nn->starttls or warn <<"";
+W: <$url> STARTTLS tried and failed (not requested)
+
+		} elsif ($nntp_opt->{starttls}) {
+			# hard fail if explicitly configured
+			$nn->starttls or die <<"";
+E: <$url> STARTTLS requested and failed
+
+		}
+	} elsif ($nntp_opt->{starttls}) {
+		$nn->can('starttls') or
+			die "E: <$url> Net::NNTP too old for STARTTLS\n";
+		$nn->starttls or die <<"";
+E: <$url> STARTTLS requested and failed
+
+	}
+	$nn;
+}
+
+sub nn_for ($$$) { # nn = Net::NNTP
+	my ($self, $uri, $nn_args) = @_;
+	my $url = $uri->as_string;
+	my $sec = uri_section($uri);
+	my $nntp_opt = $self->{nntp_opt}->{$sec} //= {};
+	my $cred;
+	my ($u, $p);
+	if (defined(my $ui = $uri->userinfo)) {
+		$cred = {
+			url => $sec,
+			protocol => $uri->scheme,
+			host => $uri->host,
+		};
+		($u, $p) = split(/:/, $ui, 2);
+		($cred->{username}, $cred->{password}) = ($u, $p);
+	}
+	my $common = $nn_args->{$sec} // {};
+	my $nn_arg = {
+		Port => $uri->port,
+		# Net::NNTP mishandles `0', so we pass `127.0.0.1'
+		Host => $uri->host eq '0' ? '127.0.0.1' : $uri->host,
+		SSL => $uri->secure, # snews == nntps
+		%$common, # may Debug ....
+	};
+	my $nn = nn_new($nn_arg, $nntp_opt, $url);
+
+	if ($cred) {
+		Git::credential($cred, 'fill'); # may prompt user here
+		if ($nn->authinfo($u, $p)) {
+			push @{$nntp_opt->{-postconn}}, [ 'authinfo', $u, $p ];
+		} else {
+			warn "E: <$url> AUTHINFO $u XXXX failed\n";
+			$nn = undef;
+		}
+	}
+
+	if ($nntp_opt->{compress}) {
+		# https://rt.cpan.org/Ticket/Display.html?id=129967
+		if ($nn->can('compress')) {
+			if ($nn->compress) {
+				push @{$nntp_opt->{-postconn}}, [ 'compress' ];
+			} else {
+				warn "W: <$url> COMPRESS failed\n";
+			}
+		} else {
+			delete $nntp_opt->{compress};
+			warn <<"";
+W: <$url> COMPRESS not supported by Net::NNTP
+W: see https://rt.cpan.org/Ticket/Display.html?id=129967 for updates
+
+		}
+	}
+
+	$self->{nn_arg}->{$sec} = $nn_arg;
+	Git::credential($cred, $nn ? 'approve' : 'reject') if $cred;
+	$nn;
+}
+
+sub nntp_fetch_all ($$$) {
+	my ($self, $nn, $uri) = @_;
+	my ($group, $num_a, $num_b) = $uri->group;
+	my $sec = uri_section($uri);
+	my $url = $uri->as_string;
+	my ($nr, $beg, $end) = $nn->group($group);
+	unless (defined($nr)) {
+		chomp(my $msg = $nn->message);
+		return "E: GROUP $group <$sec> $msg";
+	}
+
+	# IMAPTracker is also used for tracking NNTP, UID == article number
+	# LIST.ACTIVE can get the equivalent of UIDVALIDITY, but that's
+	# expensive.  So we assume newsgroups don't change:
+	my $itrk = PublicInbox::IMAPTracker->new($url);
+	my (undef, $l_art) = $itrk->get_last;
+	$l_art //= $beg; # initial import
+
+	# allow users to specify articles to refetch
+	# cf. https://tools.ietf.org/id/draft-gilman-news-url-01.txt
+	# nntp://example.com/inbox.foo/$num_a-$num_b
+	$l_art = $num_a if defined($num_a) && $num_a < $l_art;
+	$end = $num_b if defined($num_b) && $num_b < $end;
+
+	return if $l_art >= $end; # nothing to do
+	$beg = $l_art + 1;
+
+	warn "I: $url fetching ARTICLE $beg..$end\n";
+	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
+	my ($err, $art);
+	local $SIG{__WARN__} = sub {
+		$warn_cb->("$url ", $art ? ("ARTICLE $art") : (), "\n", @_);
+	};
+	my $inboxes = $self->{nntp}->{$url};
+	my $last_art;
+	for ($beg..$end) {
+		last if $self->{quit};
+		$art = $_;
+		my $raw = $nn->article($art);
+		unless (defined($raw)) {
+			my $msg = $nn->message;
+			if ($nn->code == 421) { # pseudo response from Net::Cmd
+				$err = "E: $msg";
+				last;
+			} else { # probably just a deleted message (spam)
+				warn "W: $msg";
+				next;
+			}
+		}
+		s/\r\n/\n/ for @$raw;
+		$raw = join('', @$raw);
+		if (ref($inboxes)) {
+			for my $ibx (@$inboxes) {
+				my $eml = PublicInbox::Eml->new($raw);
+				import_eml($self, $ibx, $eml);
+			}
+		} elsif ($inboxes eq 'watchspam') {
+			my $eml = PublicInbox::Eml->new(\$raw);
+			my $arg = [ $self, $eml, "$url ARTICLE $art" ];
+			$self->{config}->each_inbox(\&remove_eml_i, $arg);
+		} else {
+			die "BUG: destination unknown $inboxes";
+		}
+		$last_art = $art;
+	}
+	$itrk->update_last(0, $last_art) if defined $last_art;
+	_done_for_now($self);
+	$err;
+}
+
+sub watch_nntp_init ($) {
+	my ($self) = @_;
+	eval { require Net::NNTP } or
+		die "Net::NNTP is required for NNTP:\n$@\n";
+	eval { require Git } or
+		die "Git (Perl module) is required for NNTP:\n$@\n";
+	eval { require PublicInbox::IMAPTracker } or
+		die "DBD::SQLite is required for NNTP:\n$@\n";
+
+	my $nn_args = nntp_common_init($self); # read args from config
+
+	# make sure we can connect and cache the credentials in memory
+	$self->{nn_arg} = {}; # schema://authority => Net::NNTP->new args
+	for my $url (sort keys %{$self->{nntp}}) {
+		nn_for($self, URI->new($url), $nn_args);
+	}
+	my $poll = {}; # intvl_seconds => [ uri1, uri2 ]
+	for my $url (keys %{$self->{nntp}}) {
+		my $uri = URI->new($url);
+		my $sec = uri_section($uri);
+		my $intvl = $self->{nntp_opt}->{$sec}->{pollInterval};
+		push @{$poll->{$intvl || 120}}, $uri;
+	}
+	$self->{poll_pids} //= {};
 
 	# poll all URIs for a given interval sequentially
 	while (my ($intvl, $uris) = each %$poll) {
-		PublicInbox::DS::add_timer(0, \&imap_fetch_fork,
+		PublicInbox::DS::add_timer(0, \&poll_fetch_fork,
 						[$self, $intvl, $uris]);
 	}
 }
@@ -647,6 +920,7 @@ sub watch {
 	$self->{oldset} = $oldset;
 	$self->{sig} = $sig;
 	watch_imap_init($self) if $self->{imap};
+	watch_nntp_init($self) if $self->{nntp};
 	watch_fs_init($self) if $self->{mdre};
 	PublicInbox::DS->SetPostLoopCallback(sub {});
 	PublicInbox::DS->EventLoop until $self->{quit};
@@ -754,4 +1028,15 @@ sub imap_url {
 	$uri ? $uri->canonical->as_string : undef;
 }
 
+my %IS_NNTP = (news => 1, snews => 1, nntp => 1);
+sub nntp_url {
+	my ($url) = @_;
+	require URI;
+	# URI::snews exists, URI::nntps does not, so use URI::snews
+	$url =~ s!\Anntps://!snews://!i;
+	my $uri = URI->new($url);
+	return unless $uri && $IS_NNTP{$uri->scheme};
+	$uri->group ? $uri->canonical->as_string : undef;
+}
+
 1;
diff --git a/t/nntpd.t b/t/nntpd.t
index 9d0ee2baa6d..d72d6a1ce7e 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -352,6 +352,7 @@ Date: Fri, 02 Oct 1993 00:00:00 +0000
 		my @of = xqx([$lsof, '-p', $td->{pid}], undef, $noerr);
 		is(scalar(grep(/\(deleted\)/, @of)), 0, 'no deleted files');
 	};
+	SKIP: { test_watch($tmpdir, $sock, $group) };
 	{
 		setsockopt($s, IPPROTO_TCP, TCP_NODELAY, 1);
 		syswrite($s, 'HDR List-id 1-');
@@ -391,4 +392,55 @@ sub read_til_dot {
 	$buf;
 }
 
+sub test_watch {
+	my ($tmpdir, $sock, $group) = @_;
+	use_ok 'PublicInbox::WatchMaildir';
+	use_ok 'PublicInbox::InboxIdle';
+	require_git('1.8.5', 1) or skip('git 1.8.5+ needed for --urlmatch', 4);
+	my $old_env = { HOME => $ENV{HOME} };
+	my $home = "$tmpdir/watch_home";
+	mkdir $home or BAIL_OUT $!;
+	mkdir "$home/.public-inbox" or BAIL_OUT $!;
+	local $ENV{HOME} = $home;
+	my $name = 'watchnntp';
+	my $addr = "i1\@example.com";
+	my $url = "http://example.com/i1";
+	my $inboxdir = "$tmpdir/watchnntp";
+	my $cmd = ['-init', '-V1', '-Lbasic', $name, $inboxdir, $url, $addr];
+	my ($ihost, $iport) = ($sock->sockhost, $sock->sockport);
+	my $nntpurl = "nntp://$ihost:$iport/$group";
+	run_script($cmd) or BAIL_OUT("init $name");
+	xsys(qw(git config), "--file=$home/.public-inbox/config",
+			"publicinbox.$name.watch",
+			$nntpurl) == 0 or BAIL_OUT "git config $?";
+	# try again with polling
+	xsys(qw(git config), "--file=$home/.public-inbox/config",
+		'nntp.PollInterval', 0.11) == 0
+		or BAIL_OUT "git config $?";
+	my $cfg = PublicInbox::Config->new;
+	PublicInbox::DS->Reset;
+	my $ii = PublicInbox::InboxIdle->new($cfg);
+	my $cb = sub { PublicInbox::DS->SetPostLoopCallback(sub {}) };
+	my $obj = bless \$cb, 'PublicInbox::TestCommon::InboxWakeup';
+	$cfg->each_inbox(sub { $_[0]->subscribe_unlock('ident', $obj) });
+	my $watcherr = "$tmpdir/watcherr";
+	open my $err_wr, '>', $watcherr or BAIL_OUT $!;
+	open my $err, '<', $watcherr or BAIL_OUT $!;
+	my $w = start_script(['-watch'], undef, { 2 => $err_wr });
+
+	diag 'waiting for initial fetch...';
+	PublicInbox::DS->EventLoop;
+	diag 'inbox unlocked on initial fetch';
+	$w->kill;
+	$w->join;
+	is($?, 0, 'no error in exited -watch process');
+	$cfg->each_inbox(sub { shift->unsubscribe_unlock('ident') });
+	$ii->close;
+	PublicInbox::DS->Reset;
+	my @err = grep(!/^I:/, <$err>);
+	is(@err, 0, 'no warnings/errors from -watch'.join(' ', @err));
+	my @ls = xqx(['git', "--git-dir=$inboxdir", qw(ls-tree -r HEAD)]);
+	isnt(scalar(@ls), 0, 'imported something');
+}
+
 1;
diff --git a/t/watch_nntp.t b/t/watch_nntp.t
new file mode 100644
index 00000000000..f919930e7d8
--- /dev/null
+++ b/t/watch_nntp.t
@@ -0,0 +1,15 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::Config;
+# see t/nntpd*.t for tests against a live NNTP server
+
+use_ok 'PublicInbox::WatchMaildir';
+my $nntp_url = \&PublicInbox::WatchMaildir::nntp_url;
+is('news://example.com/inbox.foo',
+	$nntp_url->('NEWS://examplE.com/inbox.foo'), 'lowercased');
+is('snews://example.com/inbox.foo',
+	$nntp_url->('nntps://example.com/inbox.foo'), 'nntps:// is snews://');
+
+done_testing;
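
The range selection in nntp_fetch_all above can be read as a small pure
function.  The following is a sketch only: article_range is a
hypothetical name, and it takes the GROUP response bounds, the last
article recorded by the tracker, and the optional $num_a-$num_b range
from the URL.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the arithmetic in nntp_fetch_all.
# $beg/$end:     low/high article numbers reported by GROUP
# $l_art:        last article already imported (undef on first run)
# $num_a/$num_b: optional user-supplied refetch range from the URL
# Returns ($first, $last) to fetch, or an empty list if nothing to do.
sub article_range {
	my ($beg, $end, $l_art, $num_a, $num_b) = @_;
	$l_art //= $beg; # initial import
	$l_art = $num_a if defined($num_a) && $num_a < $l_art;
	$end = $num_b if defined($num_b) && $num_b < $end;
	return if $l_art >= $end; # nothing to do
	($l_art + 1, $end);
}

my @r = article_range(1, 50, 40);      # resume after article 40
print "fetch @r\n" if @r;
@r = article_range(1, 50, 50);         # already caught up
print "nothing to do\n" unless @r;
@r = article_range(1, 50, 40, 10, 20); # user asked for 10-20 again
print "refetch @r\n" if @r;
```

Note that, following the patch's arithmetic, fetching always starts at
$l_art + 1, so a URL range of 10-20 yields articles 11..20 and an
initial import starts at $beg + 1.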

* [PATCH 29/34] watch: show user-specified URL consistently
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (27 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 28/34] watch: add NNTP support Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 30/34] watch: enable autoflush for STDOUT and STDERR Eric Wong
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Since we use the non-ref scalar URL in many error messages,
favor keeping the unblessed URL in the long-lived process.

This avoids showing "snews://" to users who've specified
"nntps://" URLs, since "nntps" is IANA-registered nowadays and
what we show in our documentation, while "snews" was just a
draft the URI package picked up decades ago.

---
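The effect of the %SCHEME_MAP lookup in this patch can be illustrated on
plain strings.  This sketch deliberately avoids the URI package and uses
a regex instead, so display_section is a hypothetical stand-in for
uri_section, not the code in the patch.

```perl
use strict;
use warnings;

# avoid exposing deprecated "snews" to users (same mapping as the patch)
my %SCHEME_MAP = (snews => 'nntps');

# Hypothetical stand-in for uri_section(): "scheme://authority" with
# the user-facing scheme name, or undef if $url doesn't look like a URL.
sub display_section {
	my ($url) = @_;
	my ($scheme, $authority) =
		($url =~ m!\A([a-z][a-z0-9+.-]*)://([^/]*)!i) or return;
	$scheme = lc $scheme;
	($SCHEME_MAP{$scheme} // $scheme) . '://' . $authority;
}

print display_section('snews://news.example.com/inbox.foo'), "\n";
print display_section('imaps://user@mail.example.com/INBOX'), "\n";
```

Schemes without an entry in the map pass through unchanged, so IMAP
section names are unaffected by the mapping.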
 lib/PublicInbox/WatchMaildir.pm | 142 ++++++++++++++++++--------------
 t/watch_nntp.t                  |   6 +-
 2 files changed, 84 insertions(+), 64 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 616c63a3857..43c8395c79b 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -238,11 +238,20 @@ sub watch_fs_init ($) {
 	PublicInbox::DirIdle->new([keys %{$self->{mdmap}}], $cb);
 }
 
+# avoid exposing deprecated "snews" to users.
+my %SCHEME_MAP = ('snews' => 'nntps');
+
+sub uri_scheme ($) {
+	my ($uri) = @_;
+	my $scheme = $uri->scheme;
+	$SCHEME_MAP{$scheme} // $scheme;
+}
+
 # returns the git config section name, e.g [imap "imaps://user@example.com"]
 # without the mailbox, so we can share connections between different inboxes
 sub uri_section ($) {
 	my ($uri) = @_;
-	$uri->scheme . '://' . $uri->authority;
+	uri_scheme($uri) . '://' . $uri->authority;
 }
 
 sub cfg_intvl ($$$) {
@@ -297,8 +306,8 @@ sub imap_common_init ($) {
 sub auth_anon_cb { '' }; # for Mail::IMAPClient::Authcallback
 
 sub mic_for ($$$) { # mic = Mail::IMAPClient
-	my ($self, $uri, $mic_args) = @_;
-	my $url = $uri->as_string;
+	my ($self, $url, $mic_args) = @_;
+	my $uri = PublicInbox::URIimap->new($url);
 	my $cred = {
 		url => $url,
 		protocol => $uri->scheme,
@@ -372,10 +381,10 @@ sub imap_import_msg ($$$$) {
 }
 
 sub imap_fetch_all ($$$) {
-	my ($self, $mic, $uri) = @_;
+	my ($self, $mic, $url) = @_;
+	my $uri = PublicInbox::URIimap->new($url);
 	my $sec = uri_section($uri);
 	my $mbx = $uri->mailbox;
-	my $url = $uri->as_string;
 	$mic->Clear(1); # trim results history
 	$mic->examine($mbx) or return "E: EXAMINE $mbx ($sec) failed: $!";
 	my ($r_uidval, $r_uidnext);
@@ -489,7 +498,8 @@ sub imap_idle_once ($$$$) {
 
 # idles on a single URI
 sub watch_imap_idle_1 ($$$) {
-	my ($self, $uri, $intvl) = @_;
+	my ($self, $url, $intvl) = @_;
+	my $uri = PublicInbox::URIimap->new($url);
 	my $sec = uri_section($uri);
 	my $mic_arg = $self->{mic_arg}->{$sec} or
 			die "BUG: no Mail::IMAPClient->new arg for $sec";
@@ -498,8 +508,8 @@ sub watch_imap_idle_1 ($$$) {
 	until ($self->{quit}) {
 		$mic //= delete($self->{mics}->{$sec}) //
 				PublicInbox::IMAPClient->new(%$mic_arg);
-		my $err = imap_fetch_all($self, $mic, $uri);
-		$err //= imap_idle_once($self, $mic, $intvl, $uri->as_string);
+		my $err = imap_fetch_all($self, $mic, $url);
+		$err //= imap_idle_once($self, $mic, $intvl, $url);
 		if ($err && !$self->{quit}) {
 			warn $err, "\n";
 			$mic = undef;
@@ -526,27 +536,26 @@ sub watch_atfork_parent ($) {
 
 sub imap_idle_reap { # PublicInbox::DS::dwaitpid callback
 	my ($self, $pid) = @_;
-	my $uri_intvl = delete $self->{idle_pids}->{$pid} or
+	my $url_intvl = delete $self->{idle_pids}->{$pid} or
 		die "BUG: PID=$pid (unknown) reaped: \$?=$?\n";
 
-	my ($uri, $intvl) = @$uri_intvl;
-	my $url = $uri->as_string;
+	my ($url, $intvl) = @$url_intvl;
 	return if $self->{quit};
 	warn "W: PID=$pid on $url died: \$?=$?\n" if $?;
-	push @{$self->{idle_todo}}, $uri_intvl;
+	push @{$self->{idle_todo}}, $url_intvl;
 	PublicInbox::DS::requeue($self); # call ->event_step to respawn
 }
 
 sub imap_idle_fork ($$) {
-	my ($self, $uri_intvl) = @_;
-	my ($uri, $intvl) = @$uri_intvl;
+	my ($self, $url_intvl) = @_;
+	my ($url, $intvl) = @$url_intvl;
 	defined(my $pid = fork) or die "fork: $!";
 	if ($pid == 0) {
 		watch_atfork_child($self);
-		watch_imap_idle_1($self, $uri, $intvl);
+		watch_imap_idle_1($self, $url, $intvl);
 		_exit(0);
 	}
-	$self->{idle_pids}->{$pid} = $uri_intvl;
+	$self->{idle_pids}->{$pid} = $url_intvl;
 	PublicInbox::DS::dwaitpid($pid, \&imap_idle_reap, $self);
 }
 
@@ -556,34 +565,35 @@ sub event_step {
 	my $idle_todo = $self->{idle_todo};
 	if ($idle_todo && @$idle_todo) {
 		watch_atfork_parent($self);
-		while (my $uri_intvl = shift(@$idle_todo)) {
-			imap_idle_fork($self, $uri_intvl);
+		while (my $url_intvl = shift(@$idle_todo)) {
+			imap_idle_fork($self, $url_intvl);
 		}
 	}
 	goto(&fs_scan_step) if $self->{mdre};
 }
 
 sub watch_imap_fetch_all ($$) {
-	my ($self, $uris) = @_;
-	for my $uri (@$uris) {
+	my ($self, $urls) = @_;
+	for my $url (@$urls) {
+		my $uri = PublicInbox::URIimap->new($url);
 		my $sec = uri_section($uri);
 		my $mic_arg = $self->{mic_arg}->{$sec} or
 			die "BUG: no Mail::IMAPClient->new arg for $sec";
 		my $mic = PublicInbox::IMAPClient->new(%$mic_arg) or next;
-		my $err = imap_fetch_all($self, $mic, $uri);
+		my $err = imap_fetch_all($self, $mic, $url);
 		last if $self->{quit};
 		warn $err, "\n" if $err;
 	}
 }
 
 sub watch_nntp_fetch_all ($$) {
-	my ($self, $uris) = @_;
-	for my $uri (@$uris) {
+	my ($self, $urls) = @_;
+	for my $url (@$urls) {
+		my $uri = uri_new($url);
 		my $sec = uri_section($uri);
 		my $nn_arg = $self->{nn_arg}->{$sec} or
 			die "BUG: no Net::NNTP->new arg for $sec";
 		my $nntp_opt = $self->{nntp_opt}->{$sec};
-		my $url = $uri->as_string;
 		my $nn = nn_new($nn_arg, $nntp_opt, $url);
 		unless ($nn) {
 			warn "E: $url: \$!=$!\n";
@@ -601,43 +611,42 @@ sub watch_nntp_fetch_all ($$) {
 		}
 		last if $self->{quit};
 		if ($nn) {
-			my $err = nntp_fetch_all($self, $nn, $uri);
+			my $err = nntp_fetch_all($self, $nn, $url);
 			warn $err, "\n" if $err;
 		}
 	}
 }
 
 sub poll_fetch_fork ($) { # DS::add_timer callback
-	my ($self, $intvl, $uris) = @{$_[0]};
+	my ($self, $intvl, $urls) = @{$_[0]};
 	return if $self->{quit};
 	watch_atfork_parent($self);
 	defined(my $pid = fork) or die "fork: $!";
 	if ($pid == 0) {
 		watch_atfork_child($self);
-		if ($uris->[0]->scheme =~ /\Aimaps?\z/) {
-			watch_imap_fetch_all($self, $uris);
+		if ($urls->[0] =~ m!\Aimaps?://!i) {
+			watch_imap_fetch_all($self, $urls);
 		} else {
-			watch_nntp_fetch_all($self, $uris);
+			watch_nntp_fetch_all($self, $urls);
 		}
 		_exit(0);
 	}
-	$self->{poll_pids}->{$pid} = [ $intvl, $uris ];
+	$self->{poll_pids}->{$pid} = [ $intvl, $urls ];
 	PublicInbox::DS::dwaitpid($pid, \&poll_fetch_reap, $self);
 }
 
 sub poll_fetch_reap { # PublicInbox::DS::dwaitpid callback
 	my ($self, $pid) = @_;
-	my $intvl_uris = delete $self->{poll_pids}->{$pid} or
+	my $intvl_urls = delete $self->{poll_pids}->{$pid} or
 		die "BUG: PID=$pid (unknown) reaped: \$?=$?\n";
 	return if $self->{quit};
-	my ($intvl, $uris) = @$intvl_uris;
+	my ($intvl, $urls) = @$intvl_urls;
 	if ($?) {
-		warn "W: PID=$pid died: \$?=$?\n",
-			map { $_->as_string."\n" } @$uris;
+		warn "W: PID=$pid died: \$?=$?\n", map { "$_\n" } @$urls;
 	}
-	warn('I: will check ', $_->as_string, " in ${intvl}s\n") for @$uris;
+	warn("I: will check $_ in ${intvl}s\n") for @$urls;
 	PublicInbox::DS::add_timer($intvl, \&poll_fetch_fork,
-					[$self, $intvl, $uris]);
+					[$self, $intvl, $urls]);
 }
 
 sub watch_imap_init ($) {
@@ -656,11 +665,11 @@ sub watch_imap_init ($) {
 	my $mics = $self->{mics} = {}; # schema://authority => IMAPClient obj
 	for my $url (sort keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
-		$mics->{uri_section($uri)} //= mic_for($self, $uri, $mic_args);
+		$mics->{uri_section($uri)} //= mic_for($self, $url, $mic_args);
 	}
 
-	my $idle = []; # [ [ uri1, intvl1 ], [uri2, intvl2] ]
-	my $poll = {}; # intvl_seconds => [ uri1, uri2 ]
+	my $idle = []; # [ [ url1, intvl1 ], [url2, intvl2] ]
+	my $poll = {}; # intvl_seconds => [ url1, url2 ]
 	for my $url (keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
 		my $sec = uri_section($uri);
@@ -668,9 +677,9 @@ sub watch_imap_init ($) {
 		my $intvl = $self->{imap_opt}->{$sec}->{pollInterval};
 		if ($mic->has_capability('IDLE') && !$intvl) {
 			$intvl = $self->{imap_opt}->{$sec}->{idleInterval};
-			push @$idle, [ $uri, $intvl // () ];
+			push @$idle, [ $url, $intvl // () ];
 		} else {
-			push @{$poll->{$intvl || 120}}, $uri;
+			push @{$poll->{$intvl || 120}}, $url;
 		}
 	}
 	if (scalar @$idle) {
@@ -681,10 +690,10 @@ sub watch_imap_init ($) {
 	return unless scalar keys %$poll;
 	$self->{poll_pids} //= {};
 
-	# poll all URIs for a given interval sequentially
-	while (my ($intvl, $uris) = each %$poll) {
+	# poll all URLs for a given interval sequentially
+	while (my ($intvl, $urls) = each %$poll) {
 		PublicInbox::DS::add_timer(0, \&poll_fetch_fork,
-						[$self, $intvl, $uris]);
+						[$self, $intvl, $urls]);
 	}
 }
 
@@ -694,7 +703,7 @@ sub nntp_common_init ($) {
 	my $cfg = $self->{config};
 	my $nn_args = {}; # scheme://authority => Net::NNTP->new arg
 	for my $url (sort keys %{$self->{nntp}}) {
-		my $sec = uri_section(URI->new($url));
+		my $sec = uri_section(uri_new($url));
 
 		# Debug and Timeout are is passed to Net::NNTP->new
 		my $v = cfg_bool($cfg, 'nntp.Debug', $url);
@@ -756,8 +765,8 @@ E: <$url> STARTTLS requested and failed
 }
 
 sub nn_for ($$$) { # nn = Net::NNTP
-	my ($self, $uri, $nn_args) = @_;
-	my $url = $uri->as_string;
+	my ($self, $url, $nn_args) = @_;
+	my $uri = uri_new($url);
 	my $sec = uri_section($uri);
 	my $nntp_opt = $self->{nntp_opt}->{$sec} //= {};
 	my $cred;
@@ -765,7 +774,7 @@ sub nn_for ($$$) { # nn = Net::NNTP
 	if (defined(my $ui = $uri->userinfo)) {
 		$cred = {
 			url => $sec,
-			protocol => $uri->scheme,
+			protocol => uri_scheme($uri),
 			host => $uri->host,
 		};
 		($u, $p) = split(/:/, $ui, 2);
@@ -814,10 +823,10 @@ W: see https://rt.cpan.org/Ticket/Display.html?id=129967 for updates
 }
 
 sub nntp_fetch_all ($$$) {
-	my ($self, $nn, $uri) = @_;
+	my ($self, $nn, $url) = @_;
+	my $uri = uri_new($url);
 	my ($group, $num_a, $num_b) = $uri->group;
 	my $sec = uri_section($uri);
-	my $url = $uri->as_string;
 	my ($nr, $beg, $end) = $nn->group($group);
 	unless (defined($nr)) {
 		chomp(my $msg = $nn->message);
@@ -897,21 +906,21 @@ sub watch_nntp_init ($) {
 	# make sure we can connect and cache the credentials in memory
 	$self->{nn_arg} = {}; # schema://authority => Net::NNTP->new args
 	for my $url (sort keys %{$self->{nntp}}) {
-		nn_for($self, URI->new($url), $nn_args);
+		nn_for($self, $url, $nn_args);
 	}
-	my $poll = {}; # intvl_seconds => [ uri1, uri2 ]
+	my $poll = {}; # intvl_seconds => [ url1, url2 ]
 	for my $url (keys %{$self->{nntp}}) {
-		my $uri = URI->new($url);
+		my $uri = uri_new($url);
 		my $sec = uri_section($uri);
 		my $intvl = $self->{nntp_opt}->{$sec}->{pollInterval};
-		push @{$poll->{$intvl || 120}}, $uri;
+		push @{$poll->{$intvl || 120}}, $url;
 	}
 	$self->{poll_pids} //= {};
 
-	# poll all URIs for a given interval sequentially
-	while (my ($intvl, $uris) = each %$poll) {
+	# poll all URLs for a given interval sequentially
+	while (my ($intvl, $urls) = each %$poll) {
 		PublicInbox::DS::add_timer(0, \&poll_fetch_fork,
-						[$self, $intvl, $uris]);
+						[$self, $intvl, $urls]);
 	}
 }
 
@@ -1021,6 +1030,14 @@ EOF
 	undef;
 }
 
+sub uri_new {
+	my ($url) = @_;
+
+	# URI::snews exists, URI::nntps does not, so use URI::snews
+	$url =~ s!\Anntps://!snews://!i;
+	URI->new($url);
+}
+
 sub imap_url {
 	my ($url) = @_;
 	require PublicInbox::URIimap;
@@ -1032,11 +1049,12 @@ my %IS_NNTP = (news => 1, snews => 1, nntp => 1);
 sub nntp_url {
 	my ($url) = @_;
 	require URI;
-	# URI::snews exists, URI::nntps does not, so use URI::snews
-	$url =~ s!\Anntps://!snews://!i;
-	my $uri = URI->new($url);
-	return unless $uri && $IS_NNTP{$uri->scheme};
-	$uri->group ? $uri->canonical->as_string : undef;
+	my $uri = uri_new($url);
+	return unless $uri && $IS_NNTP{$uri->scheme} && $uri->group;
+	$url = $uri->canonical->as_string;
+	# nntps is IANA registered, snews is deprecated
+	$url =~ s!\Asnews://!nntps://!;
+	$url;
 }
 
 1;
diff --git a/t/watch_nntp.t b/t/watch_nntp.t
index f919930e7d8..98fb1161d1c 100644
--- a/t/watch_nntp.t
+++ b/t/watch_nntp.t
@@ -9,7 +9,9 @@ use_ok 'PublicInbox::WatchMaildir';
 my $nntp_url = \&PublicInbox::WatchMaildir::nntp_url;
 is('news://example.com/inbox.foo',
 	$nntp_url->('NEWS://examplE.com/inbox.foo'), 'lowercased');
-is('snews://example.com/inbox.foo',
-	$nntp_url->('nntps://example.com/inbox.foo'), 'nntps:// is snews://');
+is('nntps://example.com/inbox.foo',
+	$nntp_url->('nntps://example.com/inbox.foo'), 'nntps:// accepted');
+is('nntps://example.com/inbox.foo',
+	$nntp_url->('SNEWS://example.com/inbox.foo'), 'snews => nntps');
 
 done_testing;

^ permalink raw reply related	[flat|nested] 46+ messages in thread
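
The scheme handling in 29/34 (parse nntps:// as snews://, then show
nntps:// to the user) can be sketched without the URI module.  This
is only an illustration: for_parsing() and for_display() are
hypothetical names, and unlike nntp_url() this sketch does no
canonicalization or group validation.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not the functions in the patch.
# for_parsing(): rewrite nntps:// to snews:// so a parser which only
# knows snews can handle it (same substitution as uri_new).
sub for_parsing {
	my ($url) = @_;
	$url =~ s!\Anntps://!snews://!i;
	$url;
}

# for_display(): map the deprecated snews:// spelling back to nntps://.
# Assumption: /i is used here because this sketch never lowercases the
# scheme; the patch does the substitution after ->canonical.
sub for_display {
	my ($url) = @_;
	$url =~ s!\Asnews://!nntps://!i;
	$url;
}

my @seen = map { for_display(for_parsing($_)) } (
	'nntps://example.com/inbox.foo',
	'snews://example.com/inbox.foo',
	'news://example.com/inbox.foo',
);
print "$_\n" for @seen;
```

Both TLS spellings end up as nntps://, while plain news:// passes
through untouched.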

* [PATCH 30/34] watch: enable autoflush for STDOUT and STDERR
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (28 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 29/34] watch: show user-specified URL consistently Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 31/34] watch: use our own "git credential" wrapper Eric Wong
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

In case output is redirected to a pipe, ensure stdout and stderr
are always unbuffered, as -watch may go long periods without
any output to fill up buffers.
---
 script/public-inbox-watch | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/script/public-inbox-watch b/script/public-inbox-watch
index ae7b70be355..c07d45d74ae 100755
--- a/script/public-inbox-watch
+++ b/script/public-inbox-watch
@@ -2,13 +2,15 @@
 # Copyright (C) 2016-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
-use warnings;
+use IO::Handle;
 use PublicInbox::WatchMaildir;
 use PublicInbox::Config;
 use PublicInbox::DS;
 use PublicInbox::Sigfd;
 use PublicInbox::Syscall qw(SFD_NONBLOCK);
 my $oldset = PublicInbox::Sigfd::block_signals();
+STDOUT->autoflush(1);
+STDERR->autoflush(1);
 my ($config, $watch_md);
 my $reload = sub {
 	$config = PublicInbox::Config->new;

^ permalink raw reply related	[flat|nested] 46+ messages in thread
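
A minimal sketch of why autoflush matters when output goes to a
pipe, as in 30/34.  It uses an in-process pipe instead of STDOUT so
the effect is visible in one script; the handle names are made up
for the example.

```perl
use strict;
use warnings;
use IO::Handle;

pipe(my $r, my $w) or die "pipe: $!";
$w->autoflush(1); # same call the patch applies to STDOUT and STDERR

# With autoflush on, each print is written to the pipe immediately
# instead of sitting in a userspace buffer until it fills.
print $w "I: one short log line\n" or die "print: $!";

# The line is readable right away, even though $w is still open and
# far less than a buffer's worth of data has been written.
my $line = <$r>;
print "reader saw: $line";
```

Without the autoflush call, the read would block, since the single
short line would stay in the writer's buffer.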

* [PATCH 31/34] watch: use our own "git credential" wrapper
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (29 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 30/34] watch: enable autoflush for STDOUT and STDERR Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 32/34] watch: support ~/.netrc via Net::Netrc Eric Wong
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

Git.pm may not be installed on some systems; or some users have
multiple Perl installations and Git.pm is not available to the
Perl running -watch.  Accommodate both those types of users by
providing our own "git credential" wrapper.
---
 MANIFEST                         |  1 +
 lib/PublicInbox/GitCredential.pm | 40 ++++++++++++++++++++++++++++++++
 lib/PublicInbox/WatchMaildir.pm  | 22 ++++++++----------
 3 files changed, 51 insertions(+), 12 deletions(-)
 create mode 100644 lib/PublicInbox/GitCredential.pm

diff --git a/MANIFEST b/MANIFEST
index f9d1eea5bd9..6de2c72581b 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -123,6 +123,7 @@ lib/PublicInbox/Filter/Vger.pm
 lib/PublicInbox/GetlineBody.pm
 lib/PublicInbox/Git.pm
 lib/PublicInbox/GitAsyncCat.pm
+lib/PublicInbox/GitCredential.pm
 lib/PublicInbox/GitHTTPBackend.pm
 lib/PublicInbox/GzipFilter.pm
 lib/PublicInbox/HTTP.pm
diff --git a/lib/PublicInbox/GitCredential.pm b/lib/PublicInbox/GitCredential.pm
new file mode 100644
index 00000000000..826e7a55e8b
--- /dev/null
+++ b/lib/PublicInbox/GitCredential.pm
@@ -0,0 +1,40 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+package PublicInbox::GitCredential;
+use strict;
+use PublicInbox::Spawn qw(popen_rd);
+
+sub run ($$) {
+	my ($self, $op) = @_;
+	my ($in_r, $in_w);
+	pipe($in_r, $in_w) or die "pipe: $!";
+	my $out_r = popen_rd([qw(git credential), $op], undef, { 0 => $in_r });
+	close $in_r or die "close in_r: $!";
+
+	my $out = '';
+	for my $k (qw(url protocol host username password)) {
+		defined(my $v = $self->{$k}) or next;
+		die "`$k' contains `\\n' or `\\0'\n" if $v =~ /[\n\0]/;
+		$out .= "$k=$v\n";
+	}
+	$out .= "\n";
+	print $in_w $out or die "print (git credential $op): $!";
+	close $in_w or die "close (git credential $op): $!";
+	return $out_r if $op eq 'fill';
+	<$out_r> and die "unexpected output from `git credential $op'\n";
+	close $out_r or die "`git credential $op' failed: \$!=$! \$?=$?\n";
+}
+
+sub fill {
+	my ($self) = @_;
+	my $out_r = run($self, 'fill');
+	while (<$out_r>) {
+		chomp;
+		return if $_ eq '';
+		/\A([^=]+)=(.*)\z/ or die "bad line: $_\n";
+		$self->{$1} = $2;
+	}
+	close $out_r or die "git credential fill failed: \$!=$! \$?=$?\n";
+}
+
+1;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 43c8395c79b..19f894d4315 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -308,13 +308,14 @@ sub auth_anon_cb { '' }; # for Mail::IMAPClient::Authcallback
 sub mic_for ($$$) { # mic = Mail::IMAPClient
 	my ($self, $url, $mic_args) = @_;
 	my $uri = PublicInbox::URIimap->new($url);
-	my $cred = {
+	require PublicInbox::GitCredential;
+	my $cred = bless {
 		url => $url,
 		protocol => $uri->scheme,
 		host => $uri->host,
 		username => $uri->user,
 		password => $uri->password,
-	};
+	}, 'PublicInbox::GitCredential';
 	my $common = $mic_args->{uri_section($uri)} // {};
 	my $host = $cred->{host};
 	my $mic_arg = {
@@ -342,7 +343,7 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 		$cred = undef;
 	}
 	if ($cred) {
-		Git::credential($cred, 'fill'); # may prompt user here
+		$cred->fill; # may prompt user here
 		$mic->User($mic_arg->{User} = $cred->{username});
 		$mic->Password($mic_arg->{Password} = $cred->{password});
 	} else { # AUTH=ANONYMOUS
@@ -356,7 +357,7 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 		warn "E: <$url> LOGIN: $@\n";
 		$mic = undef;
 	}
-	Git::credential($cred, $mic ? 'approve' : 'reject') if $cred;
+	$cred->run($mic ? 'approve' : 'reject') if $cred;
 	$mic;
 }
 
@@ -653,8 +654,6 @@ sub watch_imap_init ($) {
 	my ($self) = @_;
 	eval { require PublicInbox::IMAPClient } or
 		die "Mail::IMAPClient is required for IMAP:\n$@\n";
-	eval { require Git } or
-		die "Git (Perl module) is required for IMAP:\n$@\n";
 	eval { require PublicInbox::IMAPTracker } or
 		die "DBD::SQLite is required for IMAP\n:$@\n";
 
@@ -772,11 +771,12 @@ sub nn_for ($$$) { # nn = Net::NNTP
 	my $cred;
 	my ($u, $p);
 	if (defined(my $ui = $uri->userinfo)) {
-		$cred = {
+		require PublicInbox::GitCredential;
+		$cred = bless {
 			url => $sec,
 			protocol => uri_scheme($uri),
 			host => $uri->host,
-		};
+		}, 'PublicInbox::GitCredential';
 		($u, $p) = split(/:/, $ui, 2);
 		($cred->{username}, $cred->{password}) = ($u, $p);
 	}
@@ -791,7 +791,7 @@ sub nn_for ($$$) { # nn = Net::NNTP
 	my $nn = nn_new($nn_arg, $nntp_opt, $url);
 
 	if ($cred) {
-		Git::credential($cred, 'fill'); # may prompt user here
+		$cred->fill; # may prompt user here
 		if ($nn->authinfo($u, $p)) {
 			push @{$nntp_opt->{-postconn}}, [ 'authinfo', $u, $p ];
 		} else {
@@ -818,7 +818,7 @@ W: see https://rt.cpan.org/Ticket/Display.html?id=129967 for updates
 	}
 
 	$self->{nn_arg}->{$sec} = $nn_arg;
-	Git::credential($cred, $nn ? 'approve' : 'reject') if $cred;
+	$cred->run($nn ? 'approve' : 'reject') if $cred;
 	$nn;
 }
 
@@ -896,8 +896,6 @@ sub watch_nntp_init ($) {
 	my ($self) = @_;
 	eval { require Net::NNTP } or
 		die "Net::NNTP is required for NNTP:\n$@\n";
-	eval { require Git } or
-		die "Git (Perl module) is required for NNTP:\n$@\n";
 	eval { require PublicInbox::IMAPTracker } or
 		die "DBD::SQLite is required for NNTP\n:$@\n";
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread
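
The wrapper in 31/34 speaks the "git credential" stdin/stdout
format: key=value lines terminated by a blank line.  A rough sketch
of just that framing follows, without spawning git.
cred_serialize() and cred_parse() are hypothetical names for this
example and the credential values are invented.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are not from the patch.
# Serialize a credential hash the way "git credential" expects on
# stdin: one key=value per line, terminated by a blank line.
sub cred_serialize {
	my ($cred) = @_;
	my $out = '';
	for my $k (qw(url protocol host username password)) {
		defined(my $v = $cred->{$k}) or next;
		die "$k contains newline or NUL\n" if $v =~ /[\n\0]/;
		$out .= "$k=$v\n";
	}
	$out . "\n";
}

# Parse key=value lines back into a hash, stopping at the blank line.
# Only the first `=' separates key from value.
sub cred_parse {
	my ($buf) = @_;
	my %cred;
	for my $line (split(/\n/, $buf)) {
		last if $line eq '';
		$line =~ /\A([^=]+)=(.*)\z/ or die "bad line: $line\n";
		$cred{$1} = $2;
	}
	\%cred;
}

my $wire = cred_serialize({
	protocol => 'imaps',
	host => 'mail.example.com',
	username => 'alice',
});
print $wire;

my $reply = "protocol=imaps\nhost=mail.example.com\n" .
	"username=alice\npassword=s3cr=t\n\n";
my $parsed = cred_parse($reply);
print "password: $parsed->{password}\n";
```

Rejecting newlines and NULs before writing matters because either
would let one value smuggle in extra fields.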

* [PATCH 32/34] watch: support ~/.netrc via Net::Netrc
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (30 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 31/34] watch: use our own "git credential" wrapper Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:03 ` [PATCH 33/34] imaptracker: use flock(2) around writes Eric Wong
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

While git-credential-netrc exists in git.git contrib/, it may
not be widely known or installed.  Net::Netrc is already a
standard part of most (if not all) Perl installations, so use it
directly if available.
---
 lib/PublicInbox/GitCredential.pm | 15 +++++++++++++++
 lib/PublicInbox/WatchMaildir.pm  | 15 ++++++++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/GitCredential.pm b/lib/PublicInbox/GitCredential.pm
index 826e7a55e8b..c6da6a090ca 100644
--- a/lib/PublicInbox/GitCredential.pm
+++ b/lib/PublicInbox/GitCredential.pm
@@ -25,6 +25,21 @@ sub run ($$) {
 	close $out_r or die "`git credential $op' failed: \$!=$! \$?=$?\n";
 }
 
+sub check_netrc ($) {
+	my ($self) = @_;
+
+	# part of the standard library, but distributions may split it out
+	eval { require Net::Netrc };
+	if ($@) {
+		warn "W: Net::Netrc missing: $@\n";
+		return;
+	}
+	if (my $x = Net::Netrc->lookup($self->{host}, $self->{username})) {
+		$self->{username} //= $x->login;
+		$self->{password} = $x->password;
+	}
+}
+
 sub fill {
 	my ($self) = @_;
 	my $out_r = run($self, 'fill');
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 19f894d4315..377548a2ad5 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -317,11 +317,12 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 		password => $uri->password,
 	}, 'PublicInbox::GitCredential';
 	my $common = $mic_args->{uri_section($uri)} // {};
+	# IMAPClient and Net::Netrc both mishandle `0', so we pass `127.0.0.1'
 	my $host = $cred->{host};
+	$host = '127.0.0.1' if $host eq '0';
 	my $mic_arg = {
 		Port => $uri->port,
-		# IMAPClient mishandles `0', so we pass `127.0.0.1'
-		Server => $host eq '0' ? '127.0.0.1' : $host,
+		Server => $host,
 		Ssl => $uri->scheme eq 'imaps',
 		Keepalive => 1, # SO_KEEPALIVE
 		%$common, # may set Starttls, Compress, Debug ....
@@ -343,6 +344,7 @@ sub mic_for ($$$) { # mic = Mail::IMAPClient
 		$cred = undef;
 	}
 	if ($cred) {
+		$cred->check_netrc unless defined $cred->{password};
 		$cred->fill; # may prompt user here
 		$mic->User($mic_arg->{User} = $cred->{username});
 		$mic->Password($mic_arg->{Password} = $cred->{password});
@@ -768,6 +770,9 @@ sub nn_for ($$$) { # nn = Net::NNTP
 	my $uri = uri_new($url);
 	my $sec = uri_section($uri);
 	my $nntp_opt = $self->{nntp_opt}->{$sec} //= {};
+	my $host = $uri->host;
+	# Net::NNTP and Net::Netrc both mishandle `0', so we pass `127.0.0.1'
+	$host = '127.0.0.1' if $host eq '0';
 	my $cred;
 	my ($u, $p);
 	if (defined(my $ui = $uri->userinfo)) {
@@ -775,16 +780,16 @@ sub nn_for ($$$) { # nn = Net::NNTP
 		$cred = bless {
 			url => $sec,
 			protocol => uri_scheme($uri),
-			host => $uri->host,
+			host => $host,
 		}, 'PublicInbox::GitCredential';
 		($u, $p) = split(/:/, $ui, 2);
 		($cred->{username}, $cred->{password}) = ($u, $p);
+		$cred->check_netrc unless defined $p;
 	}
 	my $common = $nn_args->{$sec} // {};
 	my $nn_arg = {
 		Port => $uri->port,
-		# Net::NNTP mishandles `0', so we pass `127.0.0.1'
-		Host => $uri->host eq '0' ? '127.0.0.1' : $uri->host,
+		Host => $host,
 		SSL => $uri->secure, # snews == nntps
 		%$common, # may Debug ....
 	};

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 33/34] imaptracker: use flock(2) around writes
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (31 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 32/34] watch: support ~/.netrc via Net::Netrc Eric Wong
@ 2020-06-27 10:03 ` Eric Wong
  2020-06-27 10:04 ` [PATCH 34/34] watch: simplify internal structures Eric Wong
  2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:03 UTC (permalink / raw)
  To: meta

SQLite only issues non-blocking F_SETLK ops (not F_SETLKW) and
retries failures using a configurable busy_timeout.  SQLite's
busy loop sleeps for a millisecond and retries the lock until
the configured busy_timeout is hit.

Trying to set ->sqlite_busy_timeout to larger values (e.g. 30000
milliseconds) still leads to failure when running the new stress
test with 8 processes with TMPDIR on a 7200 RPM HDD.

Inspection of SQLite source reveals there's no built-in way to
use F_SETLKW, so tack on the existing flock(2) support we use to
synchronize git + SQLite + Xapian for inbox writing.  We use
flock(2) instead of POSIX fcntl(2) locks since Perl doesn't
provide a way to manipulate "struct flock" portably.
---
 lib/PublicInbox/IMAPTracker.pm | 13 ++++++++++---
 t/imap_tracker.t               | 30 +++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
index 0bbabe07fae..102a74ce66b 100644
--- a/lib/PublicInbox/IMAPTracker.pm
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -2,6 +2,7 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 package PublicInbox::IMAPTracker;
 use strict;
+use parent qw(PublicInbox::Lock);
 use DBI;
 use DBD::SQLite;
 use PublicInbox::Config;
@@ -48,7 +49,10 @@ sub update_last ($$$) {
 INSERT OR REPLACE INTO imap_last (url, uid_validity, uid)
 VALUES (?, ?, ?)
 
-	$sth->execute($self->{url}, $validity, $last);
+	$self->lock_acquire;
+	my $rv = $sth->execute($self->{url}, $validity, $last);
+	$self->lock_release;
+	$rv;
 }
 
 sub new {
@@ -68,8 +72,11 @@ sub new {
 		require File::Basename;
 		File::Path::mkpath(File::Basename::dirname($dbname));
 	}
-
-	bless { url => $url, dbh => dbh_new($dbname) }, $class;
+	my $self = bless { lock_path => "$dbname.lock", url => $url }, $class;
+	$self->lock_acquire;
+	$self->{dbh} = dbh_new($dbname);
+	$self->lock_release;
+	$self;
 }
 
 1;
diff --git a/t/imap_tracker.t b/t/imap_tracker.t
index 8dc04ed77a3..01e1d0b1549 100644
--- a/t/imap_tracker.t
+++ b/t/imap_tracker.t
@@ -9,8 +9,8 @@ my ($tmpdir, $for_destroy) = tmpdir();
 mkdir "$tmpdir/old" or die "mkdir $tmpdir/old: $!";
 my $old = "$tmpdir/old/imap.sqlite3";
 my $cur = "$tmpdir/data/public-inbox/imap.sqlite3";
+local $ENV{XDG_DATA_HOME} = "$tmpdir/data";
 {
-	local $ENV{XDG_DATA_HOME} = "$tmpdir/data";
 	local $ENV{PI_DIR} = "$tmpdir/old";
 
 	my $tracker = PublicInbox::IMAPTracker->new;
@@ -22,5 +22,33 @@ my $cur = "$tmpdir/data/public-inbox/imap.sqlite3";
 	$tracker = PublicInbox::IMAPTracker->new;
 	ok(!-f $cur, '->new does not create new file if old is present');
 }
+SKIP: {
+	my $nproc = $ENV{TEST_STRESS_NPROC};
+	skip 'TEST_STRESS_NPROC= not set', 1 unless $nproc;
+	my $nr = $ENV{TEST_STRESS_NR} // 10000;
+	diag "TEST_STRESS_NPROC=$nproc TEST_STRESS_NR=$nr";
+	require POSIX;
+	for my $n (1..$nproc) {
+		defined(my $pid = fork) or BAIL_OUT "fork: $!";
+		if ($pid == 0) {
+			my $url = "imap://example.com/INBOX.$$";
+			my $uidval = time;
+			eval {
+				my $itrk = PublicInbox::IMAPTracker->new($url);
+				for my $uid (1..$nr) {
+					$itrk->update_last($uidval, $uid);
+					my ($uv, $u) = $itrk->get_last;
+				}
+			};
+			warn "E: $n $$ - $@\n" if $@;
+			POSIX::_exit($@ ? 1 : 0);
+		}
+	}
+	while (1) {
+		my $pid = waitpid(-1, 0);
+		last if $pid < 0;
+		is($?, 0, "$pid exited");
+	}
+}
 
 done_testing;

^ permalink raw reply related	[flat|nested] 46+ messages in thread
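
The locking idea in 33/34 can be sketched with plain files: a
blocking flock(2) on a separate lock file serializes writers, so
nobody has to spin on a non-blocking lock.  This is a simplified
stand-in and not PublicInbox::Lock; with_lock(), increment() and the
counter file are assumptions made for the example, and no SQLite is
involved.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempdir);
use POSIX ();

my $dir = tempdir(CLEANUP => 1);
my $lock_path = "$dir/counter.lock"; # plays the role of "$dbname.lock"
my $data_path = "$dir/counter";
open(my $init, '>', $data_path) or die "open: $!";
print $init "0\n" or die "print: $!";
close $init or die "close: $!";

# Run $cb while holding an exclusive flock(2) on $lock_path.
# LOCK_EX without LOCK_NB sleeps in the kernel until the lock is
# free, rather than retrying in a userspace busy loop.
sub with_lock {
	my ($cb) = @_;
	open(my $lk, '>>', $lock_path) or die "open $lock_path: $!";
	flock($lk, LOCK_EX) or die "flock: $!";
	my $rv = $cb->();
	close $lk or die "close: $!"; # closing the handle drops the lock
	$rv;
}

# Unsafe read-modify-write unless callers hold the lock.
sub increment {
	open(my $fh, '+<', $data_path) or die "open: $!";
	chomp(my $n = <$fh>);
	my $next = $n + 1;
	seek($fh, 0, SEEK_SET) or die "seek: $!";
	print $fh "$next\n" or die "print: $!";
	close $fh or die "close: $!";
}

my ($nproc, $nr) = (4, 50);
my @pids;
for (1..$nproc) {
	defined(my $pid = fork) or die "fork: $!";
	if ($pid == 0) {
		eval { with_lock(\&increment) for 1..$nr };
		warn "E: $$ - $@" if $@;
		POSIX::_exit($@ ? 1 : 0); # skip END blocks in the child
	}
	push @pids, $pid;
}
waitpid($_, 0) for @pids;

open(my $fh, '<', $data_path) or die "open: $!";
chomp(my $total = <$fh>);
close $fh or die "close: $!";
print "total=$total\n";
```

Every increment happens under the lock, so the final count should
equal $nproc * $nr with no lost updates.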

* [PATCH 34/34] watch: simplify internal structures
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (32 preceding siblings ...)
  2020-06-27 10:03 ` [PATCH 33/34] imaptracker: use flock(2) around writes Eric Wong
@ 2020-06-27 10:04 ` Eric Wong
  2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
  34 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 10:04 UTC (permalink / raw)
  To: meta

For now, we won't attempt to reuse the Mail::IMAPClient
connections used to check authentication info, so stop storing
$self->{mics}.

We can also combine $poll initialization for IMAP and NNTP
to avoid data structure duplication.  Furthermore, rely on
autovivification to create {idle_pids} and {poll_pids}.
---
 lib/PublicInbox/WatchMaildir.pm | 42 +++++++++++----------------------
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 377548a2ad5..94b03ab36e3 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -509,8 +509,7 @@ sub watch_imap_idle_1 ($$$) {
 	my $mic;
 	local $0 = $uri->mailbox." $sec";
 	until ($self->{quit}) {
-		$mic //= delete($self->{mics}->{$sec}) //
-				PublicInbox::IMAPClient->new(%$mic_arg);
+		$mic //= PublicInbox::IMAPClient->new(%$mic_arg);
 		my $err = imap_fetch_all($self, $mic, $url);
 		$err //= imap_idle_once($self, $mic, $intvl, $url);
 		if ($err && !$self->{quit}) {
@@ -534,7 +533,6 @@ sub watch_atfork_child ($) {
 sub watch_atfork_parent ($) {
 	my ($self) = @_;
 	_done_for_now($self);
-	$self->{mics} = {}; # going to be forking, so disconnect
 }
 
 sub imap_idle_reap { # PublicInbox::DS::dwaitpid callback
@@ -652,8 +650,8 @@ sub poll_fetch_reap { # PublicInbox::DS::dwaitpid callback
 					[$self, $intvl, $urls]);
 }
 
-sub watch_imap_init ($) {
-	my ($self) = @_;
+sub watch_imap_init ($$) {
+	my ($self, $poll) = @_;
 	eval { require PublicInbox::IMAPClient } or
 		die "Mail::IMAPClient is required for IMAP:\n$@\n";
 	eval { require PublicInbox::IMAPTracker } or
@@ -663,14 +661,13 @@ sub watch_imap_init ($) {
 
 	# make sure we can connect and cache the credentials in memory
 	$self->{mic_arg} = {}; # schema://authority => IMAPClient->new args
-	my $mics = $self->{mics} = {}; # schema://authority => IMAPClient obj
+	my $mics = {}; # schema://authority => IMAPClient obj
 	for my $url (sort keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
 		$mics->{uri_section($uri)} //= mic_for($self, $url, $mic_args);
 	}
 
 	my $idle = []; # [ [ url1, intvl1 ], [url2, intvl2] ]
-	my $poll = {}; # intvl_seconds => [ url1, url2 ]
 	for my $url (keys %{$self->{imap}}) {
 		my $uri = PublicInbox::URIimap->new($url);
 		my $sec = uri_section($uri);
@@ -684,18 +681,9 @@ sub watch_imap_init ($) {
 		}
 	}
 	if (scalar @$idle) {
-		$self->{idle_pids} = {};
 		$self->{idle_todo} = $idle;
 		PublicInbox::DS::requeue($self); # ->event_step to fork
 	}
-	return unless scalar keys %$poll;
-	$self->{poll_pids} //= {};
-
-	# poll all URLs for a given interval sequentially
-	while (my ($intvl, $urls) = each %$poll) {
-		PublicInbox::DS::add_timer(0, \&poll_fetch_fork,
-						[$self, $intvl, $urls]);
-	}
 }
 
 # flesh out common NNTP-specific data structures
@@ -897,8 +885,8 @@ sub nntp_fetch_all ($$$) {
 	$err;
 }
 
-sub watch_nntp_init ($) {
-	my ($self) = @_;
+sub watch_nntp_init ($$) {
+	my ($self, $poll) = @_;
 	eval { require Net::NNTP } or
 		die "Net::NNTP is required for NNTP:\n$@\n";
 	eval { require PublicInbox::IMAPTracker } or
@@ -911,28 +899,26 @@ sub watch_nntp_init ($) {
 	for my $url (sort keys %{$self->{nntp}}) {
 		nn_for($self, $url, $nn_args);
 	}
-	my $poll = {}; # intvl_seconds => [ url1, url2 ]
 	for my $url (keys %{$self->{nntp}}) {
 		my $uri = uri_new($url);
 		my $sec = uri_section($uri);
 		my $intvl = $self->{nntp_opt}->{$sec}->{pollInterval};
 		push @{$poll->{$intvl || 120}}, $url;
 	}
-	$self->{poll_pids} //= {};
-
-	# poll all URLs for a given interval sequentially
-	while (my ($intvl, $urls) = each %$poll) {
-		PublicInbox::DS::add_timer(0, \&poll_fetch_fork,
-						[$self, $intvl, $urls]);
-	}
 }
 
 sub watch {
 	my ($self, $sig, $oldset) = @_;
 	$self->{oldset} = $oldset;
 	$self->{sig} = $sig;
-	watch_imap_init($self) if $self->{imap};
-	watch_nntp_init($self) if $self->{nntp};
+	my $poll = {}; # intvl_seconds => [ url1, url2 ]
+	watch_imap_init($self, $poll) if $self->{imap};
+	watch_nntp_init($self, $poll) if $self->{nntp};
+	while (my ($intvl, $urls) = each %$poll) {
+		# poll all URLs for a given interval sequentially
+		PublicInbox::DS::add_timer(0, \&poll_fetch_fork,
+						[$self, $intvl, $urls]);
+	}
 	watch_fs_init($self) if $self->{mdre};
 	PublicInbox::DS->SetPostLoopCallback(sub {});
 	PublicInbox::DS->EventLoop until $self->{quit};

^ permalink raw reply related	[flat|nested] 46+ messages in thread
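
The shared $poll structure from 34/34 is a hash of interval =>
URL list which IMAP and NNTP setup both push into.  A small sketch
of the grouping step, relying on autovivification as the patch
does; the URLs and the %poll_interval table are invented for the
example.

```perl
use strict;
use warnings;

# Hypothetical per-URL settings; undef means no pollInterval is set.
my %poll_interval = (
	'imap://mail.example.com/INBOX' => 60,
	'nntp://news.example.com/inbox.foo' => undef,
	'nntps://news.example.com/inbox.bar' => 60,
	'imaps://mail.example.com/lists' => undef,
);

my $poll = {}; # intvl_seconds => [ url1, url2 ]
for my $url (sort keys %poll_interval) {
	my $intvl = $poll_interval{$url};
	# autovivification creates the array ref on first push;
	# 120 seconds is the fallback used in the patch
	push @{$poll->{$intvl || 120}}, $url;
}

for my $intvl (sort { $a <=> $b } keys %$poll) {
	print "every ${intvl}s: @{$poll->{$intvl}}\n";
}
```

Each key would then get a single timer, with the URLs under it
polled one after another.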

* Re: [PATCH 11/34] watch: use signalfd for Maildir watching
  2020-06-27 10:03 ` [PATCH 11/34] watch: use signalfd for Maildir watching Eric Wong
@ 2020-06-27 19:05   ` Kyle Meyer
  2020-06-27 22:32     ` Eric Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Kyle Meyer @ 2020-06-27 19:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

One of these days, I'm going to read through a series on this list and
make a valuable technical comment :]  Until then...

Eric Wong writes:

> We can get rid of the janky wannabe
> self-using-a-directory-instead-of-pipe thing we needed to
> workaround Filesys::Notify::Simple being blocking.
>
> For existing Maildir users, this should be more robust and
> immune to missed wakeups for to signalfd and kqueue-enabled

s/for to/for/ ?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 28/34] watch: add NNTP support
  2020-06-27 10:03 ` [PATCH 28/34] watch: add NNTP support Eric Wong
@ 2020-06-27 19:06   ` Kyle Meyer
  0 siblings, 0 replies; 46+ messages in thread
From: Kyle Meyer @ 2020-06-27 19:06 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong writes:

> +# flesh out common NNTP-specific data structures
> +sub nntp_common_init ($) {
> +	my ($self) = @_;
> +	my $cfg = $self->{config};
> +	my $nn_args = {}; # scheme://authority => Net::NNTP->new arg
> +	for my $url (sort keys %{$self->{nntp}}) {
> +		my $sec = uri_section(URI->new($url));
> +
> +		# Debug and Timeout are is passed to Net::NNTP->new

s/ is//

> +		my $v = cfg_bool($cfg, 'nntp.Debug', $url);
> +		$nn_args->{$sec}->{Debug} = $v if defined $v;
> +		my $to = cfg_intvl($cfg, 'nntp.Timeout', $url);
> +		$nn_args->{$sec}->{Timeout} = $to if $to;

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/34] watch: use signalfd for Maildir watching
  2020-06-27 19:05   ` Kyle Meyer
@ 2020-06-27 22:32     ` Eric Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-27 22:32 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> One of these days, I'm going to read through a series on this list and
> make a valuable technical comment :]  Until then...

No worries, your contributions are valuable regardless of
whether they're technical or not :>

> Eric Wong writes:
> 
> > We can get rid of the janky wannabe
> > self-using-a-directory-instead-of-pipe thing we needed to
> > workaround Filesys::Notify::Simple being blocking.
> >
> > For existing Maildir users, this should be more robust and
> > immune to missed wakeups for to signalfd and kqueue-enabled
> 
> s/for to/for/ ?

Thanks, will queue up some fixes along with your other comment.

There are deeper issues with this series w.r.t. zombies for
existing Maildir users...

* [PATCH 0/5] watch: Maildir fixes
  2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
                   ` (33 preceding siblings ...)
  2020-06-27 10:04 ` [PATCH 34/34] watch: simplify internal structures Eric Wong
@ 2020-06-29 10:34 ` Eric Wong
  2020-06-29 10:34   ` [PATCH 1/5] watch: check for duplicates in ->over before spamcheck Eric Wong
                     ` (4 more replies)
  34 siblings, 5 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-29 10:34 UTC (permalink / raw)
  To: meta

The introduction of NNTP and IMAP support in -watch introduced
some annoying-but-non-fatal regressions for existing Maildir
users, in one case due to a race fix.

  https://public-inbox.org/meta/20200627100400.9871-1-e@yhbt.net/

It turned out -watch could miss full scans at startup due
to the old alarm(1)-triggered scan firing too soon, and
switching to signalfd(2)[*] exposed performance problems fixed
by [PATCH 1/5] in this series.

PATCH 2/5 was just a minor debugging change while I was fixing
commit 52e37476e8e62e8e54104d9a2abcf2a86d1d1a32
("eml: header_str_set: correctly encode UTF-8 headers")

3 & 4 may not be strictly necessary at the moment, but they
could be in the future if subprocesses (spamc, git) we spawn
ever depend on SIGCHLD.  More signals may be unblocked in
subprocesses in the future; perhaps only SIGINT (from Ctrl-C)
could remain masked.

Finally, 5/5 fixes the newly introduced zombies which kicked off
the investigation behind this series.

Eric Wong (5):
  watch: check for duplicates in ->over before spamcheck
  watch: show path for warnings from spam messages
  watch: ensure SIGCHLD works in forked children
  spawn: unblock SIGCHLD in subprocess
  watch: make waitpid() synchronous for Maildir scans

 lib/PublicInbox/Import.pm       |  2 +-
 lib/PublicInbox/Spawn.pm        | 11 +++++++---
 lib/PublicInbox/SpawnPP.pm      |  5 +++++
 lib/PublicInbox/V2Writable.pm   |  2 +-
 lib/PublicInbox/WatchMaildir.pm | 37 +++++++++++++++++++++++++--------
 t/spawn.t                       | 27 ++++++++++++++++++++++++
 6 files changed, 70 insertions(+), 14 deletions(-)

* [PATCH 1/5] watch: check for duplicates in ->over before spamcheck
  2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
@ 2020-06-29 10:34   ` Eric Wong
  2020-06-29 10:34   ` [PATCH 2/5] watch: show path for warnings from spam messages Eric Wong
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-29 10:34 UTC (permalink / raw)
  To: meta

It's cheaper to check for duplicates than run `spamc'
repeatedly when rechecking.  We already do this for
v1 by using the "ls" command with fast-import,
but v2 requires checking against over.sqlite3.
---
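A minimal sketch of the ordering described above, under stated
assumptions: the in-memory %seen set and the SHA-256 digest of the raw
message are hypothetical stand-ins for the ->over lookup and
content_hash(), and dedup_check_cb is an illustrative name, not part of
this patch.  It only shows the cheap duplicate test running before the
expensive checker.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical stand-in for the ->over lookup: an in-memory set of
# content digests for messages which were already imported.
my %seen;

# Wrap an expensive checker ($check, e.g. something which runs spamc)
# so it is skipped for content we already have.
sub dedup_check_cb {
	my ($check) = @_;
	sub {
		my ($raw) = @_;
		my $key = sha256_hex($raw);
		return if $seen{$key}; # duplicate: skip the expensive check
		$check->($raw) or return; # rejected by the checker
		$seen{$key} = 1;
		$raw;
	};
}
```

Calling the returned callback twice with the same message should
invoke $check only once.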
 lib/PublicInbox/Import.pm       |  2 +-
 lib/PublicInbox/V2Writable.pm   |  2 +-
 lib/PublicInbox/WatchMaildir.pm | 21 ++++++++++++++++++++-
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index ae508cd8013..fb813159ef7 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -387,7 +387,7 @@ sub add {
 
 	# spam check:
 	if ($check_cb) {
-		$mime = $check_cb->($mime) or return;
+		$mime = $check_cb->($mime, $self->{-inbox}) or return;
 	}
 
 	my $blob = $self->{mark}++;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 8b31b69a62f..528f5e9a565 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -171,7 +171,7 @@ sub _add {
 
 	# spam check:
 	if ($check_cb) {
-		$mime = $check_cb->($mime) or return;
+		$mime = $check_cb->($mime, $self->{-inbox}) or return;
 	}
 
 	# All pipes (> $^F) known to Perl 5.6+ have FD_CLOEXEC set,
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index efc9849a6ef..ec28a3034ff 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -12,6 +12,8 @@ use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
 use PublicInbox::Sigfd;
 use PublicInbox::DS qw(now);
+use PublicInbox::MID qw(mids);
+use PublicInbox::ContentHash qw(content_hash);
 use POSIX qw(_exit);
 *mime_from_path = \&PublicInbox::InboxWritable::mime_from_path;
 
@@ -988,10 +990,27 @@ sub _importer_for {
 	$importers->{"$ibx"} = $im;
 }
 
+# XXX consider sharing with V2Writable, this only requires read-only access
+sub content_exists ($$) {
+	my ($ibx, $eml) = @_;
+	my $over = $ibx->over or return;
+	my $mids = mids($eml);
+	my $chash = content_hash($eml);
+	my ($id, $prev);
+	for my $mid (@$mids) {
+		while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
+			my $cmp = $ibx->smsg_eml($smsg) or return;
+			return 1 if $chash eq content_hash($cmp);
+		}
+	}
+	undef;
+}
+
 sub _spamcheck_cb {
 	my ($sc) = @_;
 	sub {
-		my ($mime) = @_;
+		my ($mime, $ibx) = @_;
+		return if content_exists($ibx, $mime);
 		my $tmp = '';
 		if ($sc->spamcheck($mime, \$tmp)) {
 			return PublicInbox::Eml->new(\$tmp);

* [PATCH 2/5] watch: show path for warnings from spam messages
  2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
  2020-06-29 10:34   ` [PATCH 1/5] watch: check for duplicates in ->over before spamcheck Eric Wong
@ 2020-06-29 10:34   ` Eric Wong
  2020-06-29 10:34   ` [PATCH 3/5] watch: ensure SIGCHLD works in forked children Eric Wong
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-29 10:34 UTC (permalink / raw)
  To: meta

It could be useful to see warnings generated for known problematic
messages just as it is for possibly non-problematic ones.
---
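A small sketch of the wrapping pattern this patch relies on, assuming a
hypothetical helper name (with_path_warnings) which does not exist in
the tree.  It shows why the `local $SIG{__WARN__}' must be in place
before any code which may warn, including the spam removal path.

```perl
use strict;
use warnings;

# Hypothetical helper: run $cb with every warning prefixed by the
# path being processed.
sub with_path_warnings {
	my ($path, $cb) = @_;
	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
	local $SIG{__WARN__} = sub {
		$warn_cb->("path: $path\n");
		$warn_cb->(@_);
	};
	$cb->(); # anything warning in here gets the "path:" prefix
}
```

Code which runs before the `local' statement would still warn, only
without the path.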
 lib/PublicInbox/WatchMaildir.pm | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index ec28a3034ff..25b87e938e0 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -188,15 +188,14 @@ sub _try_path {
 		warn "unmappable dir: $1\n";
 		return;
 	}
-	if (!ref($inboxes) && $inboxes eq 'watchspam') {
-		return _remove_spam($self, $path);
-	}
-
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	local $SIG{__WARN__} = sub {
 		$warn_cb->("path: $path\n");
 		$warn_cb->(@_);
 	};
+	if (!ref($inboxes) && $inboxes eq 'watchspam') {
+		return _remove_spam($self, $path);
+	}
 	foreach my $ibx (@$inboxes) {
 		my $eml = mime_from_path($path) or next;
 		import_eml($self, $ibx, $eml);

* [PATCH 3/5] watch: ensure SIGCHLD works in forked children
  2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
  2020-06-29 10:34   ` [PATCH 1/5] watch: check for duplicates in ->over before spamcheck Eric Wong
  2020-06-29 10:34   ` [PATCH 2/5] watch: show path for warnings from spam messages Eric Wong
@ 2020-06-29 10:34   ` Eric Wong
  2020-06-29 10:34   ` [PATCH 4/5] spawn: unblock SIGCHLD in subprocess Eric Wong
  2020-06-29 10:34   ` [PATCH 5/5] watch: make waitpid() synchronous for Maildir scans Eric Wong
  4 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-29 10:34 UTC (permalink / raw)
  To: meta

In case our git or spam checker subprocesses spawn
subprocesses of their own.  We'll also ensure signal
handlers are properly set up before unblocking them.
---
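A sketch of why the ordering matters, using only the POSIX module and
SIGUSR1 rather than PublicInbox::Sigfd (the function name is
hypothetical).  A signal which became pending while blocked is
delivered as soon as it is unblocked, so the handler has to be
installed first; with the two steps swapped, the default action would
terminate the process.

```perl
use strict;
use warnings;
use POSIX qw(sigprocmask SIG_BLOCK SIG_UNBLOCK SIGUSR1);

# Returns the number of times our handler ran for a SIGUSR1 which
# was sent while the signal was still blocked.
sub handle_pending_usr1 {
	my $set = POSIX::SigSet->new(SIGUSR1);
	my $got = 0;
	sigprocmask(SIG_BLOCK, $set) or die "block: $!";
	kill('USR1', $$) or die "kill: $!"; # stays pending while blocked
	local $SIG{USR1} = sub { $got++ }; # 1. install the handler
	sigprocmask(SIG_UNBLOCK, $set) or die "unblock: $!"; # 2. unblock
	# give Perl a chance to dispatch the deferred handler
	select(undef, undef, undef, 0.01) until $got;
	$got;
}
```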
 lib/PublicInbox/WatchMaildir.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 25b87e938e0..288f64d1e6c 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -527,8 +527,8 @@ sub watch_atfork_child ($) {
 	delete $self->{poll_pids};
 	delete $self->{opendirs};
 	PublicInbox::DS->Reset;
+	%SIG = (%SIG, %{$self->{sig}}, CHLD => 'DEFAULT');
 	PublicInbox::Sigfd::sig_setmask($self->{oldset});
-	%SIG = (%SIG, %{$self->{sig}});
 }
 
 sub watch_atfork_parent ($) {

* [PATCH 4/5] spawn: unblock SIGCHLD in subprocess
  2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
                     ` (2 preceding siblings ...)
  2020-06-29 10:34   ` [PATCH 3/5] watch: ensure SIGCHLD works in forked children Eric Wong
@ 2020-06-29 10:34   ` Eric Wong
  2020-07-07  6:17     ` [PATCH 6/5] t/spawn: fix test reliability Eric Wong
  2020-06-29 10:34   ` [PATCH 5/5] watch: make waitpid() synchronous for Maildir scans Eric Wong
  4 siblings, 1 reply; 46+ messages in thread
From: Eric Wong @ 2020-06-29 10:34 UTC (permalink / raw)
  To: meta

Subprocesses we spawn may want to use SIGCHLD for themselves.
This also ensures we restore default signal handlers
in the pure Perl version.
---
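A self-contained sketch of the child-side step, assuming plain fork
and a hypothetical function name.  Instead of calling exec, the child
here inspects its own signal mask and reports through its exit code
whether SIGCHLD ended up unblocked, which makes the effect easy to
observe.

```perl
use strict;
use warnings;
use POSIX qw(sigprocmask SIG_BLOCK SIG_UNBLOCK SIG_SETMASK SIGCHLD);

# Fork with SIGCHLD blocked in the parent, then unblock it in the child.
# Returns the child's exit code: 0 if SIGCHLD is unblocked there.
sub child_sigchld_unblocked {
	my $cset = POSIX::SigSet->new(SIGCHLD);
	my $old = POSIX::SigSet->new;
	sigprocmask(SIG_BLOCK, $cset, $old) or die "block: $!";
	defined(my $pid = fork) or die "fork: $!";
	if ($pid == 0) {
		$SIG{CHLD} = 'DEFAULT';
		sigprocmask(SIG_UNBLOCK, $cset) or POSIX::_exit(2);
		# a real spawn would exec here; we only inspect the mask
		my $cur = POSIX::SigSet->new;
		sigprocmask(SIG_BLOCK, POSIX::SigSet->new, $cur) or
			POSIX::_exit(2);
		POSIX::_exit($cur->ismember(SIGCHLD) ? 1 : 0);
	}
	waitpid($pid, 0) == $pid or die "waitpid: $!";
	my $code = $? >> 8;
	sigprocmask(SIG_SETMASK, $old) or die "restore: $!";
	$code;
}
```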
 lib/PublicInbox/Spawn.pm   | 11 ++++++++---
 lib/PublicInbox/SpawnPP.pm |  5 +++++
 t/spawn.t                  | 27 +++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index 888283d0d09..f90d8f6d3dd 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -78,7 +78,7 @@ int pi_fork_exec(SV *redirref, SV *file, SV *cmdref, SV *envref, SV *rlimref,
 	const char *filename = SvPV_nolen(file);
 	pid_t pid;
 	char **argv, **envp;
-	sigset_t set, old;
+	sigset_t set, old, cset;
 	int ret, perrnum, cerrnum = 0;
 
 	AV2C_COPY(argv, cmd);
@@ -88,6 +88,10 @@ int pi_fork_exec(SV *redirref, SV *file, SV *cmdref, SV *envref, SV *rlimref,
 	assert(ret == 0 && "BUG calling sigfillset");
 	ret = sigprocmask(SIG_SETMASK, &set, &old);
 	assert(ret == 0 && "BUG calling sigprocmask to block");
+	ret = sigemptyset(&cset);
+	assert(ret == 0 && "BUG calling sigemptyset");
+	ret = sigaddset(&cset, SIGCHLD);
+	assert(ret == 0 && "BUG calling sigaddset for SIGCHLD");
 	pid = vfork();
 	if (pid == 0) {
 		int sig;
@@ -120,9 +124,10 @@ int pi_fork_exec(SV *redirref, SV *file, SV *cmdref, SV *envref, SV *rlimref,
 		}
 
 		/*
-		 * don't bother unblocking, we don't want signals
-		 * to the group taking out a subprocess
+		 * don't bother unblocking other signals for now, just SIGCHLD.
+		 * we don't want signals to the group taking out a subprocess
 		 */
+		(void)sigprocmask(SIG_UNBLOCK, &cset, NULL);
 		execve(filename, argv, envp);
 		exit_err(&cerrnum);
 	}
diff --git a/lib/PublicInbox/SpawnPP.pm b/lib/PublicInbox/SpawnPP.pm
index 34ce2052c60..a72d5a2d86f 100644
--- a/lib/PublicInbox/SpawnPP.pm
+++ b/lib/PublicInbox/SpawnPP.pm
@@ -36,6 +36,11 @@ sub pi_fork_exec ($$$$$$) {
 		if ($cd ne '') {
 			chdir $cd or die "chdir $cd: $!";
 		}
+		$SIG{$_} = 'DEFAULT' for keys %SIG;
+		my $cset = POSIX::SigSet->new();
+		$cset->addset(POSIX::SIGCHLD) or die "can't add SIGCHLD: $!";
+		sigprocmask(SIG_UNBLOCK, $cset) or
+					die "can't unblock SIGCHLD: $!";
 		if ($ENV{MOD_PERL}) {
 			exec which('env'), '-i', @$env, @$cmd;
 			die "exec env -i ... $cmd->[0] failed: $!\n";
diff --git a/t/spawn.t b/t/spawn.t
index 44355f43655..fd669e222e1 100644
--- a/t/spawn.t
+++ b/t/spawn.t
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use Test::More;
 use PublicInbox::Spawn qw(which spawn popen_rd);
+use PublicInbox::Sigfd;
 
 {
 	my $true = which('true');
@@ -17,6 +18,32 @@ use PublicInbox::Spawn qw(which spawn popen_rd);
 	is($?, 0, 'true exited successfully');
 }
 
+{ # ensure waitpid(-1, 0) and SIGCHLD works in spawned process
+	my $script = <<'EOF';
+$| = 1; # unbuffer stdout
+defined(my $pid = fork) or die "fork: $!";
+if ($pid == 0) { exit }
+elsif ($pid > 0) {
+	my $waited = waitpid(-1, 0);
+	$waited == $pid or die "mismatched child $pid != $waited";
+	$? == 0 or die "child err: $>";
+	$SIG{CHLD} = sub { print "HI\n"; exit };
+	print "RDY $$\n";
+	sleep while 1;
+}
+EOF
+	my $oldset = PublicInbox::Sigfd::block_signals();
+	my $rd = popen_rd([$^X, '-e', $script]);
+	diag 'waiting for child to reap grandchild...';
+	chomp(my $line = readline($rd));
+	my ($rdy, $pid) = split(' ', $line);
+	is($rdy, 'RDY', 'got ready signal, waitpid(-1) works in child');
+	ok(kill('CHLD', $pid), 'sent SIGCHLD to child');
+	is(readline($rd), "HI\n", '$SIG{CHLD} works in child');
+	ok(close $rd, 'popen_rd close works');
+	PublicInbox::Sigfd::sig_setmask($oldset);
+}
+
 {
 	my ($r, $w);
 	pipe $r, $w or die "pipe failed: $!";

* [PATCH 5/5] watch: make waitpid() synchronous for Maildir scans
  2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
                     ` (3 preceding siblings ...)
  2020-06-29 10:34   ` [PATCH 4/5] spawn: unblock SIGCHLD in subprocess Eric Wong
@ 2020-06-29 10:34   ` Eric Wong
  2020-06-29 10:37     ` Eric Wong
  4 siblings, 1 reply; 46+ messages in thread
From: Eric Wong @ 2020-06-29 10:34 UTC (permalink / raw)
  To: meta

Maildir scanning still happens in the main process.  Scanning
dozens of Maildirs is still time-consuming and monopolizes the
event loop during WatchMaildir::event_step.  This can cause
zombies to accumulate before Sigfd::event_step triggers
DS::reap_pids.
---
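A generic sketch of the technique, not the PublicInbox::DS code itself:
$in_loop, @pending, reap_or_defer, spawn_exit and scan_step are
hypothetical names standing in for whatever DS does internally.  It
assumes reaping is deferred to the event loop while a flag is true, and
shows how `local' flips that flag for the duration of one long step so
children are reaped immediately.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

our $in_loop = 1; # hypothetical flag: true while an event loop runs
my @pending; # PIDs left for the event loop to reap later

# Reap $pid now if no event loop is active, otherwise defer it.
# Returns $? when reaped synchronously, undef when deferred.
sub reap_or_defer {
	my ($pid) = @_;
	if ($in_loop) {
		push @pending, $pid; # stays a zombie until the loop runs
		return;
	}
	waitpid($pid, 0) == $pid or die "waitpid($pid): $!";
	$?;
}

# Fork a child which exits immediately with $code
sub spawn_exit {
	my ($code) = @_;
	defined(my $pid = fork) or die "fork: $!";
	_exit($code) if $pid == 0;
	$pid;
}

# A long step which monopolizes the loop, with synchronous reaping
sub scan_step {
	local $in_loop = 0; # restored automatically on return
	reap_or_defer(spawn_exit(3));
}
```

Because `local' is dynamically scoped, every function called from
scan_step sees the flag as false, and callers see it restored
afterwards.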
 lib/PublicInbox/WatchMaildir.pm | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 288f64d1e6c..d147962994e 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -123,9 +123,9 @@ sub new {
 
 sub _done_for_now {
 	my ($self) = @_;
-	my $importers = $self->{importers};
-	foreach my $im (values %$importers) {
-		$im->done;
+	local $PublicInbox::DS::in_loop = 0; # waitpid() synchronlusly
+	for (values %{$self->{importers}}) {
+		$_->done if $_; # $_ may be undef during cleanup
 	}
 }
 
@@ -936,6 +936,7 @@ sub fs_scan_step {
 	my ($self) = @_;
 	return if $self->{quit};
 	my $op = shift @{$self->{ops}};
+	local $PublicInbox::DS::in_loop = 0; # waitpid() synchronlusly
 
 	# continue existing scan
 	my $max = 10;

* Re: [PATCH 5/5] watch: make waitpid() synchronous for Maildir scans
  2020-06-29 10:34   ` [PATCH 5/5] watch: make waitpid() synchronous for Maildir scans Eric Wong
@ 2020-06-29 10:37     ` Eric Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-06-29 10:37 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> wrote:
> +	local $PublicInbox::DS::in_loop = 0; # waitpid() synchronlusly

> +	local $PublicInbox::DS::in_loop = 0; # waitpid() synchronlusly

"synchronously" :x

* [PATCH 6/5] t/spawn: fix test reliability
  2020-06-29 10:34   ` [PATCH 4/5] spawn: unblock SIGCHLD in subprocess Eric Wong
@ 2020-07-07  6:17     ` Eric Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2020-07-07  6:17 UTC (permalink / raw)
  To: meta

Perl doesn't internally use a self-pipe for sleep/select/poll/etc,
and neither signalfd nor EVFILT_SIGNAL is always available, so
wake up every 10ms to ensure the script can see the SIGCHLD.

Fixes: 761baa2a300e4268 ("spawn: unblock SIGCHLD in subprocess")
---
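A standalone sketch of the polling loop, using SIGUSR1 and a
hypothetical function name so it can run outside the test suite.  The
assumption here is that a signal arriving just before an indefinite
"sleep" may go unnoticed until something else wakes the process; a
10ms timeout bounds that delay.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

# Wait for SIGUSR1 by waking every 10ms instead of blocking forever.
# Returns the number of times the handler ran.
sub wait_for_usr1 {
	my $got = 0;
	local $SIG{USR1} = sub { $got++ };
	my $ppid = $$;
	defined(my $pid = fork) or die "fork: $!";
	if ($pid == 0) { # child: signal the parent after a short delay
		select(undef, undef, undef, 0.05);
		kill('USR1', $ppid);
		_exit(0);
	}
	# the short timeout lets Perl run the deferred handler soon
	# after the signal arrives, whenever that happens
	select(undef, undef, undef, 0.01) until $got;
	waitpid($pid, 0);
	$got;
}
```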
 t/spawn.t | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/spawn.t b/t/spawn.t
index fd669e22..a0019202 100644
--- a/t/spawn.t
+++ b/t/spawn.t
@@ -29,7 +29,7 @@ elsif ($pid > 0) {
 	$? == 0 or die "child err: $>";
 	$SIG{CHLD} = sub { print "HI\n"; exit };
 	print "RDY $$\n";
-	sleep while 1;
+	select(undef, undef, undef, 0.01) while 1;
 }
 EOF
 	my $oldset = PublicInbox::Sigfd::block_signals();

Thread overview: 46+ messages
2020-06-27 10:03 [PATCH 00/34] watch: add IMAP and NNTP support Eric Wong
2020-06-27 10:03 ` [PATCH 01/34] inboxwritable: ensure ssoma.lock exists on init Eric Wong
2020-06-27 10:03 ` [PATCH 02/34] inbox: warn on ->on_inbox_unlock exception Eric Wong
2020-06-27 10:03 ` [PATCH 03/34] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric Wong
2020-06-27 10:03 ` [PATCH 04/34] imaptracker: use ~/.local/share/public-inbox/imap.sqlite3 Eric Wong
2020-06-27 10:03 ` [PATCH 05/34] watchmaildir: hoist out compile_watchheaders Eric Wong
2020-06-27 10:03 ` [PATCH 06/34] watchmaildir: fix check for spam vs ham inbox conflicts Eric Wong
2020-06-27 10:03 ` [PATCH 07/34] URI IMAP support Eric Wong
2020-06-27 10:03 ` [PATCH 08/34] watch: preliminary " Eric Wong
2020-06-27 10:03 ` [PATCH 09/34] kqnotify|fake_inotify: detect Maildir write ops Eric Wong
2020-06-27 10:03 ` [PATCH 10/34] watch: remove Filesys::Notify::Simple dependency Eric Wong
2020-06-27 10:03 ` [PATCH 11/34] watch: use signalfd for Maildir watching Eric Wong
2020-06-27 19:05   ` Kyle Meyer
2020-06-27 22:32     ` Eric Wong
2020-06-27 10:03 ` [PATCH 12/34] ds: remove fields.pm usage Eric Wong
2020-06-27 10:03 ` [PATCH 13/34] watch: wire up IMAP IDLE reapers to DS Eric Wong
2020-06-27 10:03 ` [PATCH 14/34] watch: support IMAP polling Eric Wong
2020-06-27 10:03 ` [PATCH 15/34] config: support ->urlmatch method for -watch Eric Wong
2020-06-27 10:03 ` [PATCH 16/34] watch: stop importers before forking Eric Wong
2020-06-27 10:03 ` [PATCH 17/34] watch: use UID SEARCH to avoid empty UID FETCH Eric Wong
2020-06-27 10:03 ` [PATCH 18/34] ds: add_timer: allow passing arg to callback Eric Wong
2020-06-27 10:03 ` [PATCH 19/34] imaptracker: add {url} field to reduce args Eric Wong
2020-06-27 10:03 ` [PATCH 20/34] imaptracker: drop {dbname} field Eric Wong
2020-06-27 10:03 ` [PATCH 21/34] watch: avoid long transaction to IMAPTracker Eric Wong
2020-06-27 10:03 ` [PATCH 22/34] watch: support imap.fetchBatchSize parameter Eric Wong
2020-06-27 10:03 ` [PATCH 23/34] watch: imap: be quiet about disconnecting on quit Eric Wong
2020-06-27 10:03 ` [PATCH 24/34] watch: support multiple watch: directives per-inbox Eric Wong
2020-06-27 10:03 ` [PATCH 25/34] watch: remove {mdir} array Eric Wong
2020-06-27 10:03 ` [PATCH 26/34] watch: just use ->urlmatch Eric Wong
2020-06-27 10:03 ` [PATCH 27/34] testcommon: $ENV{TAIL} supports non-@ARGV redirects Eric Wong
2020-06-27 10:03 ` [PATCH 28/34] watch: add NNTP support Eric Wong
2020-06-27 19:06   ` Kyle Meyer
2020-06-27 10:03 ` [PATCH 29/34] watch: show user-specified URL consistently Eric Wong
2020-06-27 10:03 ` [PATCH 30/34] watch: enable autoflush for STDOUT and STDERR Eric Wong
2020-06-27 10:03 ` [PATCH 31/34] watch: use our own "git credential" wrapper Eric Wong
2020-06-27 10:03 ` [PATCH 32/34] watch: support ~/.netrc via Net::Netrc Eric Wong
2020-06-27 10:03 ` [PATCH 33/34] imaptracker: use flock(2) around writes Eric Wong
2020-06-27 10:04 ` [PATCH 34/34] watch: simplify internal structures Eric Wong
2020-06-29 10:34 ` [PATCH 0/5] watch: Maildir fixes Eric Wong
2020-06-29 10:34   ` [PATCH 1/5] watch: check for duplicates in ->over before spamcheck Eric Wong
2020-06-29 10:34   ` [PATCH 2/5] watch: show path for warnings from spam messages Eric Wong
2020-06-29 10:34   ` [PATCH 3/5] watch: ensure SIGCHLD works in forked children Eric Wong
2020-06-29 10:34   ` [PATCH 4/5] spawn: unblock SIGCHLD in subprocess Eric Wong
2020-07-07  6:17     ` [PATCH 6/5] t/spawn: fix test reliability Eric Wong
2020-06-29 10:34   ` [PATCH 5/5] watch: make waitpid() synchronous for Maildir scans Eric Wong
2020-06-29 10:37     ` Eric Wong