unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 00/82] public-inbox-imapd: read-only IMAP server
@ 2020-06-10  7:03 Eric Wong
  2020-06-10  7:03 ` [PATCH 01/82] doc: add some IMAP standards Eric Wong
                   ` (82 more replies)
  0 siblings, 83 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:03 UTC (permalink / raw)
  To: meta

So I finally wrote my first IMAP server!  And I'm actually
fairly satisfied with how it's turning out to support a bunch
of other performance + scalability work I've wanted to do.

Some previous notes here:
  https://public-inbox.org/meta/20200609113442.GA16856@dcvr/

I finally seem to have gotten it to play nicely with mutt header
caching, so it's fit for public consumption :)

imaps://news.public-inbox.org/INBOX.comp.mail.public-inbox.meta.0

You can use any username+password, and AUTH=ANONYMOUS also
works if your client does that.

It doesn't support UTF-7 (mailbox names) or advertise UTF-8
in CAPABILITIES, yet; I still have RFCs to read :P

And there's a bunch of new things which could use some
testing from non-mutt/mbsync/offlineimap users.
Maybe you'll find some client-side bugs like I did :P

v1 reindexing also gets a little bit of parallelism :)

Anyways, I'll probably be porting some of the scalability
and slow-storage work to older parts of the code before
fiddling with more IMAP extensions.

Eric Wong (82):
  doc: add some IMAP standards
  nntpd: restrict allowed newsgroup names
  preliminary imap server implementation
  inboxidle: new class to detect inbox changes
  imap: support IDLE
  msgmap: split ->max into its own method
  imap: delay InboxIdle start, support refresh
  imap: implement STATUS command
  imap: use Text::ParseWords::parse_line to handle quoted words
  imap: support LIST command
  t/imapd: support FakeInotify and KQNotify
  imap: support fetch for BODYSTRUCTURE and BODY
  eml: each_part: single part $idx is 1
  imap: allow fetch of partial of BODY[...] and headers
  imap: always include `resp-text' in responses
  imap: split out unit tests and benchmarks
  imap: fix multi-message partial header fetches
  imap: simplify partial fetch structure
  imap: support sequence number FETCH
  imap: do not include ".PEEK" in responses
  imap: support the CLOSE command
  imap: speed up HEADER.FIELDS[.NOT] range fetches
  git: async: flatten the inflight array
  git: do our own read buffering for cat-file
  imap: use git-cat-file asynchronously
  git: idle rbuf for async
  imap: support LSUB command
  imap: FETCH: support comma-delimited ranges
  add imapd compression test
  testcommon: tcp_(server|connect): BAIL_OUT on failure
  *deflate: drop invalid comment about rbuf
  imap: fix pipelining with async git
  git: cat_async: provide requested OID + "missing" on missing blobs
  git: move async_cat reference to PublicInbox::Git
  git: async: automatic retry on alternates change
  imapclient: wrapper for Mail::IMAPClient
  xt: add imapd-validate and imapd-mbsync-oimap
  imap: support out-of-bounds ranges
  xt/perf-imap-list: time refresh_inboxlist
  imap: case-insensitive mailbox name comparisons
  imap: break giant inboxes into sub-inboxes of 50K messages
  imap: start introducing iterative config reloading
  imap: require ".$UID_MIN-$UID_END" suffix
  imapd: ensure LIST is sorted alphabetically, for now
  imap: omit $UID_END from mailbox name, use index
  t/config.t: always compare against git bool behavior
  xt/*: show some tunable parameters
  imap: STATUS and LIST are case-insensitive, too
  imap: EXAMINE/STATUS: return correct counts
  imap: avoid uninitialized warnings on incomplete commands
  imap: start parsing out queries for SQLite and Xapian
  imap: SEARCH: clamp results to the 50K UID range
  imap: allow UID range search on timestamps
  over: get_art: use dbh->prepare_cached
  search: index byte size of a message for IMAP search
  search: index UID for IMAP search, too
  imap: remove dummies from sequence number FETCH
  imap: compile UID FETCH to opcodes
  imap: UID FETCH: optimize for smsg-only case
  imap: UID FETCH: optimize (UID FLAGS) harder
  imap: IDLE: avoid extraneous wakeups, keep-alive
  imap: 30 minute auto-logout timer
  imap: split ->logged_in attribute into a separate class
  searchidx: v1 (re)-index uses git asynchronously
  index: account for CRLF conversion when storing bytes
  imap: rely on smsg->{bytes} for RFC822.SIZE
  imap: UID FETCH requires at least one data item
  imap: LIST shows "INBOX" in all caps
  imap: support 8000 octet lines
  imap: reinstate some message sequence number support
  imap: cleanup ->{uid_base} usage
  imap: FETCH: more granular CRLF conversion
  imap: further speed up HEADER.FIELDS FETCH requests
  imap: FETCH: try to make fake MSNs sequentially
  imap: STATUS/EXAMINE: rely on SQLite overview
  imap: UID SEARCH: support multiple ranges
  imap: wire up Xapian search, msn SEARCH and multiple ranges
  imap: misc cleanups and notes
  imapd: don't bother sorting LIST output
  imap: drop non-UID SEARCH for now
  over: uid_range: remove LIMIT
  imap: FETCH: proper MSN => UID mapping for requests

 Documentation/public-inbox-imapd.pod |   91 ++
 Documentation/standards.perl         |   10 +
 MANIFEST                             |   18 +
 lib/PublicInbox/Config.pm            |   18 +
 lib/PublicInbox/Daemon.pm            |   24 +-
 lib/PublicInbox/DummyInbox.pm        |   22 +
 lib/PublicInbox/Eml.pm               |    9 +-
 lib/PublicInbox/FakeInotify.pm       |   59 ++
 lib/PublicInbox/Git.pm               |  163 +--
 lib/PublicInbox/GitAsyncCat.pm       |   51 +
 lib/PublicInbox/IMAP.pm              | 1397 ++++++++++++++++++++++++++
 lib/PublicInbox/IMAPClient.pm        |  119 +++
 lib/PublicInbox/IMAPD.pm             |  114 +++
 lib/PublicInbox/IMAPdeflate.pm       |  126 +++
 lib/PublicInbox/Import.pm            |    2 +-
 lib/PublicInbox/In2Tie.pm            |   17 +
 lib/PublicInbox/Inbox.pm             |   33 +-
 lib/PublicInbox/InboxIdle.pm         |   79 ++
 lib/PublicInbox/KQNotify.pm          |   66 ++
 lib/PublicInbox/Lock.pm              |    7 +
 lib/PublicInbox/MsgIter.pm           |    2 +-
 lib/PublicInbox/Msgmap.pm            |   20 +-
 lib/PublicInbox/NNTPD.pm             |   12 +-
 lib/PublicInbox/NNTPdeflate.pm       |    1 -
 lib/PublicInbox/Over.pm              |   50 +-
 lib/PublicInbox/Search.pm            |   32 +-
 lib/PublicInbox/SearchIdx.pm         |   89 +-
 lib/PublicInbox/SearchIdxShard.pm    |   11 +-
 lib/PublicInbox/Smsg.pm              |    8 +-
 lib/PublicInbox/TestCommon.pm        |    7 +-
 lib/PublicInbox/V2Writable.pm        |   10 +-
 script/public-inbox-imapd            |   14 +
 t/config.t                           |   15 +-
 t/eml.t                              |    2 +-
 t/git.t                              |   40 +-
 t/imap.t                             |  133 +++
 t/imapd-tls.t                        |  204 ++++
 t/imapd.t                            |  398 ++++++++
 t/import.t                           |    5 +-
 t/inbox_idle.t                       |   72 ++
 t/nntpd.t                            |    5 +-
 t/over.t                             |    3 +
 t/search.t                           |   19 +
 xt/cmp-msgstr.t                      |    1 -
 xt/cmp-msgview.t                     |    1 -
 xt/eml_check_limits.t                |    6 +-
 xt/git_async_cmp.t                   |    2 +-
 xt/imapd-mbsync-oimap.t              |  132 +++
 xt/imapd-validate.t                  |  177 ++++
 xt/mem-msgview.t                     |    1 +
 xt/msgtime_cmp.t                     |    1 -
 xt/perf-msgview.t                    |    1 -
 52 files changed, 3718 insertions(+), 181 deletions(-)
 create mode 100644 Documentation/public-inbox-imapd.pod
 create mode 100644 lib/PublicInbox/DummyInbox.pm
 create mode 100644 lib/PublicInbox/FakeInotify.pm
 create mode 100644 lib/PublicInbox/GitAsyncCat.pm
 create mode 100644 lib/PublicInbox/IMAP.pm
 create mode 100644 lib/PublicInbox/IMAPClient.pm
 create mode 100644 lib/PublicInbox/IMAPD.pm
 create mode 100644 lib/PublicInbox/IMAPdeflate.pm
 create mode 100644 lib/PublicInbox/In2Tie.pm
 create mode 100644 lib/PublicInbox/InboxIdle.pm
 create mode 100644 lib/PublicInbox/KQNotify.pm
 create mode 100644 script/public-inbox-imapd
 create mode 100644 t/imap.t
 create mode 100644 t/imapd-tls.t
 create mode 100644 t/imapd.t
 create mode 100644 t/inbox_idle.t
 create mode 100644 xt/imapd-mbsync-oimap.t
 create mode 100644 xt/imapd-validate.t

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH 01/82] doc: add some IMAP standards
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
@ 2020-06-10  7:03 ` Eric Wong
  2020-06-10  7:03 ` [PATCH 02/82] nntpd: restrict allowed newsgroup names Eric Wong
                   ` (81 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:03 UTC (permalink / raw)
  To: meta

There's more, but IMAP is big and complex already.
---
 Documentation/standards.perl | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/standards.perl b/Documentation/standards.perl
index 34ab829e2e3..37309956f39 100755
--- a/Documentation/standards.perl
+++ b/Documentation/standards.perl
@@ -42,6 +42,15 @@ my $rfcs = [
 	822 => 'Internet message format (1982)',
 	2822 => 'Internet message format (2001)',
 	5322 => 'Internet message format (2008)',
+	3501 => 'IMAP4rev1',
+	2177 => 'IMAP IDLE', # TODO
+	# 5032 = 'WITHIN search extension for IMAP',
+	4978 => 'IMAP COMPRESS Extension',
+	# 5182 = 'IMAP Extension for Referencing the Last SEARCH Result',
+	# 5738 =>  'IMAP Support for UTF-8',
+	# 8474 => 'IMAP Extension for Object Identifiers',
+	# ...
+
 	# TODO: flesh this out
 
 ];

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 02/82] nntpd: restrict allowed newsgroup names
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
  2020-06-10  7:03 ` [PATCH 01/82] doc: add some IMAP standards Eric Wong
@ 2020-06-10  7:03 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 03/82] preliminary imap server implementation Eric Wong
                   ` (80 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:03 UTC (permalink / raw)
  To: meta

We'll be using newsgroup names as mailbox names for IMAP,
too, so ensure we don't send wonky characters in responses.

I doubt this affects any real-world instances, but a BOFH could
choose strange names to cause grief for clients.
---
 lib/PublicInbox/NNTPD.pm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/PublicInbox/NNTPD.pm b/lib/PublicInbox/NNTPD.pm
index 451f4d4183b..b8ec84ed6b0 100644
--- a/lib/PublicInbox/NNTPD.pm
+++ b/lib/PublicInbox/NNTPD.pm
@@ -41,6 +41,12 @@ sub refresh_groups () {
 		if (ref $ngname) {
 			warn 'multiple newsgroups not supported: '.
 				join(', ', @$ngname). "\n";
+		# Newsgroup name needs to be compatible with RFC 3977
+		# wildmat-exact and RFC 3501 (IMAP) ATOM-CHAR.
+		# Leave out a few chars likely to cause problems or conflicts:
+		# '|', '<', '>', ';', '#', '$', '&',
+		} elsif ($ngname =~ m![^A-Za-z0-9/_\.\-\~\@\+\=:]!) {
+			warn "newsgroup name invalid: `$ngname'\n";
 		} elsif ($ng->nntp_usable) {
 			# Only valid if msgmap and search works
 			$new->{$ngname} = $ng;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 03/82] preliminary imap server implementation
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
  2020-06-10  7:03 ` [PATCH 01/82] doc: add some IMAP standards Eric Wong
  2020-06-10  7:03 ` [PATCH 02/82] nntpd: restrict allowed newsgroup names Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 04/82] inboxidle: new class to detect inbox changes Eric Wong
                   ` (79 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

It shares a bit of code with NNTP.  It's copy+pasted for now
since this provides new ground to experiment with APIs for
dealing with slow storage and many inboxes.
---
 Documentation/public-inbox-imapd.pod |  91 +++++
 MANIFEST                             |   7 +
 lib/PublicInbox/Daemon.pm            |  24 +-
 lib/PublicInbox/IMAP.pm              | 523 +++++++++++++++++++++++++++
 lib/PublicInbox/IMAPD.pm             |  15 +
 lib/PublicInbox/IMAPdeflate.pm       | 119 ++++++
 lib/PublicInbox/Smsg.pm              |   8 +-
 script/public-inbox-imapd            |  14 +
 t/imapd-tls.t                        | 209 +++++++++++
 t/imapd.t                            | 149 ++++++++
 10 files changed, 1146 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/public-inbox-imapd.pod
 create mode 100644 lib/PublicInbox/IMAP.pm
 create mode 100644 lib/PublicInbox/IMAPD.pm
 create mode 100644 lib/PublicInbox/IMAPdeflate.pm
 create mode 100644 script/public-inbox-imapd
 create mode 100644 t/imapd-tls.t
 create mode 100644 t/imapd.t

diff --git a/Documentation/public-inbox-imapd.pod b/Documentation/public-inbox-imapd.pod
new file mode 100644
index 00000000000..02027f4f254
--- /dev/null
+++ b/Documentation/public-inbox-imapd.pod
@@ -0,0 +1,91 @@
+=head1 NAME
+
+public-inbox-imapd - IMAP server for sharing public-inboxes
+
+=head1 SYNOPSIS
+
+B<public-inbox-imapd> [OPTIONS]
+
+=head1 DESCRIPTION
+
+public-inbox-imapd provides a read-only IMAP daemon for
+public-inbox.  It uses options and environment variables common
+to all L<public-inbox-daemon(8)> implementations.
+
+Like L<public-inbox-nntpd(1)> and L<public-inbox-httpd(1)>,
+C<public-inbox-imapd> will never require write access
+to the directory where the public-inboxes are stored, so it
+may be run as a different user than the user running
+L<public-inbox-watch(1)>, L<public-inbox-mda(1)>, or
+L<git-fetch(1)>.
+
+=head1 OPTIONS
+
+See common options in L<public-inbox-daemon(8)/OPTIONS>.
+Additionally, IMAP-specific behavior for certain options
+are supported and documented below.
+
+=over
+
+=item -l, --listen PROTO://ADDRESS/?cert=/path/to/cert,key=/path/to/key
+
+In addition to the normal C<-l>/C<--listen> switch described in
+L<public-inbox-daemon(8)>, the C<PROTO> prefix (e.g. C<imap://> or
+C<imaps://>) may be specified to force a given protocol.
+
+For STARTTLS and IMAPS support, the C<cert> and C<key> may be specified
+on a per-listener basis after a C<?> character and separated by C<,>.
+These directives are per-directive, and it's possible to use a different
+cert for every listener.
+
+=item --cert /path/to/cert
+
+The default TLS certificate for optional STARTTLS and IMAPS support
+if the C<cert> option is not given with C<--listen>.
+
+If using systemd-compatible socket activation and a TCP listener on port
+993 is inherited, it is automatically IMAPS when this option is given.
+When a listener on port 143 is inherited and this option is given, it
+automatically gets STARTTLS support.
+
+=item --key /path/to/key
+
+The default private TLS certificate key for optional STARTTLS and IMAPS
+support if the C<key> option is not given with C<--listen>.  The private
+key may concatenated into the path used by C<--cert>, in which case this
+option is not needed.
+
+=back
+
+=head1 CONFIGURATION
+
+C<public-inbox-imapd> uses the same configuration knobs
+as L<public-inbox-nntpd(1)>, see L<public-inbox-nntpd(1)>
+and L<public-inbox-config(5)>.
+
+=over 8
+
+=item publicinbox.<name>.newsgroup
+
+The newsgroup name maps to an IMAP folder name.
+
+=back
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/>,
+L<nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta>,
+L<nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta>
+
+=head1 COPYRIGHT
+
+Copyright 2020 all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<git(1)>, L<git-config(1)>, L<public-inbox-daemon(8)>,
+L<public-inbox-config(5)>, L<public-inbox-nntpd(1)>
diff --git a/MANIFEST b/MANIFEST
index 24f95faa942..73b874b42a0 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -26,6 +26,7 @@ Documentation/public-inbox-convert.pod
 Documentation/public-inbox-daemon.pod
 Documentation/public-inbox-edit.pod
 Documentation/public-inbox-httpd.pod
+Documentation/public-inbox-imapd.pod
 Documentation/public-inbox-index.pod
 Documentation/public-inbox-init.pod
 Documentation/public-inbox-learn.pod
@@ -124,6 +125,9 @@ lib/PublicInbox/HTTPD.pm
 lib/PublicInbox/HTTPD/Async.pm
 lib/PublicInbox/HlMod.pm
 lib/PublicInbox/Hval.pm
+lib/PublicInbox/IMAP.pm
+lib/PublicInbox/IMAPD.pm
+lib/PublicInbox/IMAPdeflate.pm
 lib/PublicInbox/Import.pm
 lib/PublicInbox/Inbox.pm
 lib/PublicInbox/InboxWritable.pm
@@ -193,6 +197,7 @@ script/public-inbox-compact
 script/public-inbox-convert
 script/public-inbox-edit
 script/public-inbox-httpd
+script/public-inbox-imapd
 script/public-inbox-index
 script/public-inbox-init
 script/public-inbox-learn
@@ -257,6 +262,8 @@ t/httpd-https.t
 t/httpd-unix.t
 t/httpd.t
 t/hval.t
+t/imapd-tls.t
+t/imapd.t
 t/import.t
 t/inbox.t
 t/index-git-times.t
diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm
index 4ff7cad4939..2f63bd73b4a 100644
--- a/lib/PublicInbox/Daemon.pm
+++ b/lib/PublicInbox/Daemon.pm
@@ -1,6 +1,6 @@
 # Copyright (C) 2015-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-# contains common daemon code for the nntpd and httpd servers.
+# contains common daemon code for the httpd, imapd, and nntpd servers.
 # This may be used for read-only IMAP server if we decide to implement it.
 package PublicInbox::Daemon;
 use strict;
@@ -29,8 +29,8 @@ my %tls_opt; # scheme://sockname => args for IO::Socket::SSL->start_SSL
 my $reexec_pid;
 my ($uid, $gid);
 my ($default_cert, $default_key);
-my %KNOWN_TLS = ( 443 => 'https', 563 => 'nntps' );
-my %KNOWN_STARTTLS = ( 119 => 'nntp' );
+my %KNOWN_TLS = ( 443 => 'https', 563 => 'nntps', 993 => 'imaps' );
+my %KNOWN_STARTTLS = ( 119 => 'nntp', 143 => 'imap' );
 
 sub accept_tls_opt ($) {
 	my ($opt_str) = @_;
@@ -123,7 +123,7 @@ sub daemon_prepare ($) {
 			$tls_opt{"$scheme://$l"} = accept_tls_opt($1);
 		} elsif (defined($default_cert)) {
 			$tls_opt{"$scheme://$l"} = accept_tls_opt('');
-		} elsif ($scheme =~ /\A(?:nntps|https)\z/) {
+		} elsif ($scheme =~ /\A(?:https|imaps|imaps)\z/) {
 			die "$orig specified w/o cert=\n";
 		}
 		# TODO: use scheme to load either NNTP.pm or HTTP.pm
@@ -584,13 +584,13 @@ sub defer_accept ($$) {
 }
 
 sub daemon_loop ($$$$) {
-	my ($refresh, $post_accept, $nntpd, $af_default) = @_;
+	my ($refresh, $post_accept, $tlsd, $af_default) = @_;
 	my %post_accept;
 	while (my ($k, $v) = each %tls_opt) {
-		if ($k =~ s!\A(?:nntps|https)://!!) {
+		if ($k =~ s!\A(?:https|imaps|nntps)://!!) {
 			$post_accept{$k} = tls_start_cb($v, $post_accept);
-		} elsif ($nntpd) { # STARTTLS, $k eq '' is OK
-			$nntpd->{accept_tls} = $v;
+		} elsif ($tlsd) { # STARTTLS, $k eq '' is OK
+			$tlsd->{accept_tls} = $v;
 		}
 	}
 	my $sig = {
@@ -620,8 +620,8 @@ sub daemon_loop ($$$$) {
 	@listeners = map {;
 		my $tls_cb = $post_accept{sockname($_)};
 
-		# NNTPS, HTTPS, HTTP, and POP3S are client-first traffic
-		# NNTP and POP3 are server-first
+		# NNTPS, HTTPS, HTTP, IMAPS and POP3S are client-first traffic
+		# IMAP, NNTP and POP3 are server-first
 		defer_accept($_, $tls_cb ? 'dataready' : $af_default);
 
 		# this calls epoll_create:
@@ -639,12 +639,12 @@ sub daemon_loop ($$$$) {
 }
 
 sub run ($$$;$) {
-	my ($default, $refresh, $post_accept, $nntpd) = @_;
+	my ($default, $refresh, $post_accept, $tlsd) = @_;
 	local $SIG{PIPE} = 'IGNORE';
 	daemon_prepare($default);
 	my $af_default = $default =~ /:8080\z/ ? 'httpready' : undef;
 	my $for_destroy = daemonize();
-	daemon_loop($refresh, $post_accept, $nntpd, $af_default);
+	daemon_loop($refresh, $post_accept, $tlsd, $af_default);
 	PublicInbox::DS->Reset;
 	# ->DESTROY runs when $for_destroy goes out-of-scope
 }
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
new file mode 100644
index 00000000000..c0636066b9f
--- /dev/null
+++ b/lib/PublicInbox/IMAP.pm
@@ -0,0 +1,523 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Each instance of this represents an IMAP client connected to
+# public-inbox-imapd.  Much of this was taken from NNTP, but
+# further refined while experimenting on future ideas to handle
+# slow storage.
+#
+# data notes:
+# * NNTP article numbers are UIDs and message sequence numbers (MSNs)
+# * Message sequence numbers (MSNs) can be stable since we're read-only.
+#   Most IMAP clients use UIDs (I hope), and we can return a dummy
+#   message if a client requests a non-existent MSN.
+
+package PublicInbox::IMAP;
+use strict;
+use base qw(PublicInbox::DS);
+use fields qw(imapd logged_in ibx long_cb -login_tag);
+use PublicInbox::Eml;
+use PublicInbox::DS qw(now);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
+use Errno qw(EAGAIN);
+my $Address;
+for my $mod (qw(Email::Address::XS Mail::Address)) {
+	eval "require $mod" or next;
+	$Address = $mod and last;
+}
+die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address;
+
+sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977?
+
+my %FETCH_NEED_BLOB = ( # for future optimization
+	'BODY.PEEK[HEADER]' => 1,
+	'BODY.PEEK[TEXT]' => 1,
+	'BODY.PEEK[]' => 1,
+	'BODY[HEADER]' => 1,
+	'BODY[TEXT]' => 1,
+	'BODY[]' => 1,
+	'RFC822.HEADER' => 1,
+	'RFC822.SIZE' => 1, # needs CRLF conversion :<
+	'RFC822.TEXT' => 1,
+	BODY => 1,
+	BODYSTRUCTURE => 1,
+	ENVELOPE => 1,
+	FLAGS => 0,
+	INTERNALDATE => 0,
+	RFC822 => 1,
+	UID => 0,
+);
+my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED_BLOB;
+
+# aliases (RFC 3501 section 6.4.5)
+$FETCH_ATT{FAST} = [ qw(FLAGS INTERNALDATE RFC822.SIZE) ];
+$FETCH_ATT{ALL} = [ @{$FETCH_ATT{FAST}}, 'ENVELOPE' ];
+$FETCH_ATT{FULL} = [ @{$FETCH_ATT{ALL}}, 'BODY' ];
+
+for my $att (keys %FETCH_ATT) {
+	my %h = map { $_ => 1 } @{$FETCH_ATT{$att}};
+	$FETCH_ATT{$att} = \%h;
+}
+
+sub greet ($) {
+	my ($self) = @_;
+	my $capa = capa($self);
+	$self->write(\"* OK [$capa] public-inbox-imapd ready\r\n");
+}
+
+sub new ($$$) {
+	my ($class, $sock, $imapd) = @_;
+	my $self = fields::new($class);
+	my $ev = EPOLLIN;
+	my $wbuf;
+	if ($sock->can('accept_SSL') && !$sock->accept_SSL) {
+		return CORE::close($sock) if $! != EAGAIN;
+		$ev = PublicInbox::TLS::epollbit();
+		$wbuf = [ \&PublicInbox::DS::accept_tls_step, \&greet ];
+	}
+	$self->SUPER::new($sock, $ev | EPOLLONESHOT);
+	$self->{imapd} = $imapd;
+	if ($wbuf) {
+		$self->{wbuf} = $wbuf;
+	} else {
+		greet($self);
+	}
+	$self->update_idle_time;
+	$self;
+}
+
+sub capa ($) {
+	my ($self) = @_;
+	my $capa = 'CAPABILITY IMAP4rev1';
+	if ($self->{logged_in}) {
+		$capa .= ' COMPRESS=DEFLATE';
+	} else {
+		if (!($self->{sock} // $self)->can('accept_SSL') &&
+			$self->{imapd}->{accept_tls}) {
+			$capa .= ' STARTTLS';
+		}
+		$capa .= ' AUTH=ANONYMOUS';
+	}
+}
+
+sub login_success ($$) {
+	my ($self, $tag) = @_;
+	$self->{logged_in} = 1;
+	my $capa = capa($self);
+	"$tag OK [$capa] Logged in\r\n";
+}
+
+sub auth_challenge_ok ($) {
+	my ($self) = @_;
+	my $tag = delete($self->{-login_tag}) or return;
+	login_success($self, $tag);
+}
+
+sub cmd_login ($$$$) {
+	my ($self, $tag) = @_; # ignore ($user, $password) = ($_[2], $_[3])
+	login_success($self, $tag);
+}
+
+sub cmd_logout ($$) {
+	my ($self, $tag) = @_;
+	delete $self->{logged_in};
+	$self->write(\"* BYE logging out\r\n$tag OK logout completed\r\n");
+	$self->shutdn; # PublicInbox::DS::shutdn
+	undef;
+}
+
+sub cmd_authenticate ($$$) {
+	my ($self, $tag) = @_; # $method = $_[2], should be "ANONYMOUS"
+	$self->{-login_tag} = $tag;
+	"+\r\n"; # challenge
+}
+
+sub cmd_capability ($$) {
+	my ($self, $tag) = @_;
+	'* '.capa($self)."\r\n$tag OK\r\n";
+}
+
+sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" }
+
+sub cmd_examine ($$$) {
+	my ($self, $tag, $mailbox) = @_;
+	my $ibx = $self->{imapd}->{groups}->{$mailbox} or
+		return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
+	my $mm = $ibx->mm;
+	my $max = $mm->num_highwater // 0;
+	# RFC 3501 2.3.1.1 -  "A good UIDVALIDITY value to use in
+	# this case is a 32-bit representation of the creation
+	# date/time of the mailbox"
+	my $uidvalidity = $mm->created_at or return "$tag BAD UIDVALIDITY\r\n";
+	my $uidnext = $max + 1;
+
+	# XXX: do we need this? RFC 5162/7162
+	my $ret = $self->{ibx} ? "* OK [CLOSED] previous closed\r\n" : '';
+	$self->{ibx} = $ibx;
+	$ret .= <<EOF;
+* $max EXISTS\r
+* $max RECENT\r
+* FLAGS (\\Seen)\r
+* OK [PERMANENTFLAGS ()] Read-only mailbox\r
+EOF
+	$ret .= "* OK [UNSEEN $max]\r\n" if $max;
+	$ret .= "* OK [UIDNEXT $uidnext]\r\n" if defined $uidnext;
+	$ret .= "* OK [UIDVALIDITY $uidvalidity]\r\n" if defined $uidvalidity;
+	$ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT complete\r\n";
+}
+
+sub _esc ($) {
+	my ($v) = @_;
+	if (!defined($v)) {
+		'NIL';
+	} elsif ($v =~ /[{"\r\n%*\\\[]/) { # literal string
+		'{' . length($v) . "}\r\n" . $v;
+	} else { # quoted string
+		qq{"$v"}
+	}
+}
+
+sub addr_envelope ($$;$) {
+	my ($eml, $x, $y) = @_;
+	my $v = $eml->header_raw($x) //
+		($y ? $eml->header_raw($y) : undef) // return 'NIL';
+
+	my @x = $Address->parse($v) or return 'NIL';
+	'(' . join('',
+		map { '(' . join(' ',
+				_esc($_->name), 'NIL',
+				_esc($_->user), _esc($_->host)
+			) . ')'
+		} @x) .
+	')';
+}
+
+sub eml_envelope ($) {
+	my ($eml) = @_;
+	'(' . join(' ',
+		_esc($eml->header_raw('Date')),
+		_esc($eml->header_raw('Subject')),
+		addr_envelope($eml, 'From'),
+		addr_envelope($eml, 'Sender', 'From'),
+		addr_envelope($eml, 'Reply-To', 'From'),
+		addr_envelope($eml, 'To'),
+		addr_envelope($eml, 'Cc'),
+		addr_envelope($eml, 'Bcc'),
+		_esc($eml->header_raw('In-Reply-To')),
+		_esc($eml->header_raw('Message-ID')),
+	) . ')';
+}
+
+sub uid_fetch_cb { # called by git->cat_async
+	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
+	my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg;
+	my $smsg = shift @$msgs or die 'BUG: no smsg';
+	$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
+	$$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy
+
+	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
+	$$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s;
+
+	$self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}");
+
+	$want->{'RFC822.SIZE'} and
+		$self->msg_more(' RFC822.SIZE '.length($$bref));
+	$want->{INTERNALDATE} and
+		$self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"');
+	$want->{FLAGS} and $self->msg_more(' FLAGS ()');
+	for ('RFC822', 'BODY[]', 'BODY.PEEK[]') {
+		next unless $want->{$_};
+		$self->msg_more(" $_ {".length($$bref)."}\r\n");
+		$self->msg_more($$bref);
+	}
+
+	my $eml = PublicInbox::Eml->new($bref);
+
+	$want->{ENVELOPE} and
+		$self->msg_more(' ENVELOPE '.eml_envelope($eml));
+
+	for my $f ('RFC822.HEADER', 'BODY[HEADER]', 'BODY.PEEK[HEADER]') {
+		next unless $want->{$f};
+		$self->msg_more(" $f {".length(${$eml->{hdr}})."}\r\n");
+		$self->msg_more(${$eml->{hdr}});
+	}
+	for my $f ('RFC822.TEXT', 'BODY[TEXT]') {
+		next unless $want->{$f};
+		$self->msg_more(" $f {".length($$bref)."}\r\n");
+		$self->msg_more($$bref);
+	}
+	# TODO BODY/BODYSTRUCTURE, specific headers
+	$self->msg_more(")\r\n");
+}
+
+sub uid_fetch_m { # long_response
+	my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_;
+	if (!@$msgs) { # refill
+		@$msgs = @{$ibx->over->query_xover($$beg, $end)};
+		if (!@$msgs) {
+			$self->write(\"$tag OK Fetch done\r\n");
+			return;
+		}
+		$$beg = $msgs->[-1]->{num} + 1;
+	}
+	my $git = $ibx->git;
+	$git->cat_async_begin; # TODO: actually make async
+	$git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_);
+	$git->cat_async_wait;
+	1;
+}
+
+sub cmd_uid_fetch ($$$;@) {
+	my ($self, $tag, $range, @want) = @_;
+	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
+	if ($want[0] =~ s/\A\(//s) {
+		$want[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n";
+	}
+	my %want = map {;
+		my $x = $FETCH_ATT{uc($_)} or return "$tag BAD param: $_\r\n";
+		%$x;
+	} @want;
+	my ($beg, $end);
+	my $msgs = [];
+	if ($range =~ /\A([0-9]+):([0-9]+)\z/s) {
+		($beg, $end) = ($1, $2);
+	} elsif ($range =~ /\A([0-9]+):\*\z/s) {
+		($beg, $end) =  ($1, $ibx->mm->num_highwater // 0);
+	} elsif ($range =~ /\A[0-9]+\z/) {
+		my $smsg = $ibx->over->get_art($range) or return "$tag OK\r\n";
+		push @$msgs, $smsg;
+		($beg, $end) = ($range, 0);
+	} else {
+		return "$tag BAD\r\n";
+	}
+	long_response($self, \&uid_fetch_m, $tag, $ibx,
+				\$beg, $end, $msgs, \%want);
+}
+
+sub uid_search_all { # long_response
+	my ($self, $tag, $ibx, $num) = @_;
+	my $uids = $ibx->mm->ids_after($num);
+	if (scalar(@$uids)) {
+		$self->msg_more(join(' ', '', @$uids));
+	} else {
+		$self->write(\"\r\n$tag OK\r\n");
+		undef;
+	}
+}
+
+sub uid_search_uid_range { # long_response
+	my ($self, $tag, $ibx, $beg, $end) = @_;
+	my $uids = $ibx->mm->msg_range($beg, $end, 'num');
+	if (@$uids) {
+		$self->msg_more(join('', map { " $_->[0]" } @$uids));
+	} else {
+		$self->write(\"\r\n$tag OK\r\n");
+		undef;
+	}
+}
+
+sub cmd_uid_search ($$$;) {
+	my ($self, $tag, $arg, @rest) = @_;
+	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
+	$arg = uc($arg);
+	if ($arg eq 'ALL' && !@rest) {
+		$self->msg_more('* SEARCH');
+		my $num = 0;
+		long_response($self, \&uid_search_all, $tag, $ibx, \$num);
+	} elsif ($arg eq 'UID' && scalar(@rest) == 1) {
+		if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
+			my ($beg, $end) = ($1, $2);
+			$end = ($ibx->mm->minmax)[1] if $end eq '*';
+			$self->msg_more('* SEARCH');
+			long_response($self, \&uid_search_uid_range,
+					$tag, $ibx, \$beg, $end);
+		} elsif ($rest[0] =~ /\A[0-9]+\z/s) {
+			my $uid = $rest[0];
+			$uid = $ibx->over->get_art($uid) ? " $uid" : '';
+			"* SEARCH$uid\r\n$tag OK\r\n";
+		} else {
+			"$tag BAD\r\n";
+		}
+	} else {
+		"$tag BAD\r\n";
+	}
+}
+
+sub args_ok ($$) { # duplicated from PublicInbox::NNTP
+	my ($cb, $argc) = @_;
+	my $tot = prototype $cb;
+	my ($nreq, undef) = split(';', $tot);
+	$nreq = ($nreq =~ tr/$//) - 1;
+	$tot = ($tot =~ tr/$//) - 1;
+	($argc <= $tot && $argc >= $nreq);
+}
+
+# returns 1 if we can continue, 0 if not due to buffered writes or disconnect
+sub process_line ($$) {
+	my ($self, $l) = @_;
+	my ($tag, $req, @args) = split(/[ \t]+/, $l);
+	if (@args && uc($req) eq 'UID') {
+		$req .= "_".(shift @args);
+	}
+	my $res = eval {
+		if (my $cmd = $self->can('cmd_'.lc($req // ''))) {
+			$cmd->($self, $tag, @args);
+		} else { # this is weird
+			auth_challenge_ok($self) //
+				"$tag BAD Error in IMAP command $req: ".
+				"Unknown command\r\n";
+		}
+	};
+	my $err = $@;
+	if ($err && $self->{sock}) {
+		$l =~ s/\r?\n//s;
+		err($self, 'error from: %s (%s)', $l, $err);
+		$res = "$tag BAD program fault - command not performed\r\n";
+	}
+	return 0 unless defined $res;
+	$self->write($res);
+}
+
+sub long_step {
+	my ($self) = @_;
+	# wbuf is unset or empty, here; {long} may add to it
+	my ($fd, $cb, $t0, @args) = @{$self->{long_cb}};
+	my $more = eval { $cb->($self, @args) };
+	if ($@ || !$self->{sock}) { # something bad happened...
+		delete $self->{long_cb};
+		my $elapsed = now() - $t0;
+		if ($@) {
+			err($self,
+			    "%s during long response[$fd] - %0.6f",
+			    $@, $elapsed);
+		}
+		out($self, " deferred[$fd] aborted - %0.6f", $elapsed);
+		$self->close;
+	} elsif ($more) { # $self->{wbuf}:
+		$self->update_idle_time;
+
+		# COMPRESS users all share the same DEFLATE context.
+		# Flush it here to ensure clients don't see
+		# each other's data
+		$self->zflush;
+
+		# no recursion, schedule another call ASAP, but only after
+		# all pending writes are done.  autovivify wbuf:
+		my $new_size = push(@{$self->{wbuf}}, \&long_step);
+
+		# wbuf may be populated by $cb, no need to rearm if so:
+		$self->requeue if $new_size == 1;
+	} else { # all done!
+		delete $self->{long_cb};
+		my $elapsed = now() - $t0;
+		my $fd = fileno($self->{sock});
+		out($self, " deferred[$fd] done - %0.6f", $elapsed);
+		my $wbuf = $self->{wbuf}; # do NOT autovivify
+
+		$self->requeue unless $wbuf && @$wbuf;
+	}
+}
+
+sub err ($$;@) {
+	my ($self, $fmt, @args) = @_;
+	printf { $self->{imapd}->{err} } $fmt."\n", @args;
+}
+
+sub out ($$;@) {
+	my ($self, $fmt, @args) = @_;
+	printf { $self->{imapd}->{out} } $fmt."\n", @args;
+}
+
+sub long_response ($$;@) {
+	my ($self, $cb, @args) = @_; # cb returns true if more, false if done
+
+	my $sock = $self->{sock} or return;
+	# make sure we disable reading during a long response,
+	# clients should not be sending us stuff and making us do more
+	# work while we are stream a response to them
+	$self->{long_cb} = [ fileno($sock), $cb, now(), @args ];
+	long_step($self); # kick off!
+	undef;
+}
+
+# callback used by PublicInbox::DS for any (e)poll (in/out/hup/err)
+sub event_step {
+	my ($self) = @_;
+
+	return unless $self->flush_write && $self->{sock};
+
+	$self->update_idle_time;
+	# only read more requests if we've drained the write buffer,
+	# otherwise we can be buffering infinitely w/o backpressure
+
+	my $rbuf = $self->{rbuf} // (\(my $x = ''));
+	my $r = 1;
+
+	if (index($$rbuf, "\n") < 0) {
+		my $off = length($$rbuf);
+		$r = $self->do_read($rbuf, LINE_MAX, $off) or return;
+	}
+	while ($r > 0 && $$rbuf =~ s/\A[ \t]*([^\n]*?)\r?\n//) {
+		my $line = $1;
+		return $self->close if $line =~ /[[:cntrl:]]/s;
+		my $t0 = now();
+		my $fd = fileno($self->{sock});
+		$r = eval { process_line($self, $line) };
+		my $pending = $self->{wbuf} ? ' pending' : '';
+		out($self, "[$fd] %s - %0.6f$pending", $line, now() - $t0);
+	}
+
+	return $self->close if $r < 0;
+	my $len = length($$rbuf);
+	return $self->close if ($len >= LINE_MAX);
+	$self->rbuf_idle($rbuf);
+	$self->update_idle_time;
+
+	# maybe there's more pipelined data, or we'll have
+	# to register it for socket-readiness notifications
+	$self->requeue unless $self->{wbuf};
+}
+
+sub compressed { undef }
+
+sub zflush {} # overridden by IMAPdeflate
+
+# RFC 4978
+sub cmd_compress ($$$) {
+	my ($self, $tag, $alg) = @_;
+	return "$tag BAD DEFLATE only\r\n" if uc($alg) ne "DEFLATE";
+	return "$tag BAD COMPRESS active\r\n" if $self->compressed;
+
+	# CRIME made TLS compression obsolete
+	# return "$tag NO [COMPRESSIONACTIVE]\r\n" if $self->tls_compressed;
+
+	PublicInbox::IMAPdeflate->enable($self, $tag);
+	$self->requeue;
+	undef
+}
+
+sub cmd_starttls ($$) {
+	my ($self, $tag) = @_;
+	my $sock = $self->{sock} or return;
+	if ($sock->can('stop_SSL') || $self->compressed) {
+		return "$tag BAD TLS or compression already enabled\r\n";
+	}
+	my $opt = $self->{imapd}->{accept_tls} or
+		return "$tag BAD can not initiate TLS negotiation\r\n";
+	$self->write(\"$tag OK begin TLS negotiation now\r\n");
+	$self->{sock} = IO::Socket::SSL->start_SSL($sock, %$opt);
+	$self->requeue if PublicInbox::DS::accept_tls_step($self);
+	undef;
+}
+
+# for graceful shutdown in PublicInbox::Daemon:
+sub busy {
+	my ($self, $now) = @_;
+	($self->{rbuf} || $self->{wbuf} || $self->not_idle_long($now));
+}
+
+# we're read-only, so SELECT and EXAMINE do the same thing
+no warnings 'once';
+*cmd_select = \&cmd_examine;
+
+1;
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
new file mode 100644
index 00000000000..1011d6a413b
--- /dev/null
+++ b/lib/PublicInbox/IMAPD.pm
@@ -0,0 +1,15 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# represents an IMAPD (currently a singleton),
+# see script/public-inbox-imapd for how it is used
+package PublicInbox::IMAPD;
+use strict;
+use parent qw(PublicInbox::NNTPD);
+
+sub new {
+	my ($class) = @_;
+	$class->SUPER::new; # PublicInbox::NNTPD->new
+}
+
+1;
diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm
new file mode 100644
index 00000000000..9366db7a7fc
--- /dev/null
+++ b/lib/PublicInbox/IMAPdeflate.pm
@@ -0,0 +1,119 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# TODO: reduce duplication from PublicInbox::NNTPdeflate
+
+# RFC 4978
+package PublicInbox::IMAPdeflate;
+use strict;
+use warnings;
+use 5.010_001;
+use base qw(PublicInbox::IMAP);
+use Compress::Raw::Zlib;
+use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
+
+my %IN_OPT = (
+	-Bufsize => 1024,
+	-WindowBits => -15, # RFC 1951
+	-AppendOutput => 1,
+);
+
+# global deflate context and buffer
+my $zbuf = \(my $buf = '');
+my $zout;
+{
+	my $err;
+	($zout, $err) = Compress::Raw::Zlib::Deflate->new(
+		# nnrpd (INN) and Compress::Raw::Zlib favor MemLevel=9,
+		# the zlib C library and git use MemLevel=8 as the default
+		# -MemLevel => 9,
+		-Bufsize => 65536, # same as nnrpd
+		-WindowBits => -15, # RFC 1951
+		-AppendOutput => 1,
+	);
+	$err == Z_OK or die "Failed to initialize zlib deflate stream: $err";
+}
+
+sub enable {
+	my ($class, $self, $tag) = @_;
+	my ($in, $err) = Compress::Raw::Zlib::Inflate->new(%IN_OPT);
+	if ($err != Z_OK) {
+		$self->err("Inflate->new failed: $err");
+		$self->write(\"$tag BAD failed to activate compression\r\n");
+		return;
+	}
+	unlock_hash(%$self);
+	$self->write(\"$tag OK DEFLATE active\r\n");
+	bless $self, $class;
+	$self->{zin} = $in;
+}
+
+# overrides PublicInbox::NNTP::compressed
+sub compressed { 1 }
+
+# $_[1] may be a reference or not
+sub do_read ($$$$) {
+	my ($self, $rbuf, $len, $off) = @_;
+
+	my $zin = $self->{zin} or return; # closed
+	my $doff;
+	my $dbuf = delete($self->{dbuf}) // '';
+	$doff = length($dbuf);
+	my $r = PublicInbox::DS::do_read($self, \$dbuf, $len, $doff) or return;
+
+	# assert(length($$rbuf) == $off) as far as NNTP.pm is concerned
+	# -ConsumeInput is true, so $dbuf is automatically emptied
+	my $err = $zin->inflate($dbuf, $rbuf);
+	if ($err == Z_OK) {
+		$self->{dbuf} = $dbuf if $dbuf ne '';
+		$r = length($$rbuf) and return $r;
+		# nothing ready, yet, get more, later
+		$self->requeue;
+	} else {
+		delete $self->{zin};
+		$self->close;
+	}
+	0;
+}
+
+# override PublicInbox::DS::msg_more
+sub msg_more ($$) {
+	my $self = $_[0];
+
+	# $_[1] may be a reference or not for ->deflate
+	my $err = $zout->deflate($_[1], $zbuf);
+	$err == Z_OK or die "->deflate failed $err";
+	1;
+}
+
+sub zflush ($) {
+	my ($self) = @_;
+
+	my $deflated = $zbuf;
+	$zbuf = \(my $next = '');
+
+	my $err = $zout->flush($deflated, Z_FULL_FLUSH);
+	$err == Z_OK or die "->flush failed $err";
+
+	# We can still let the lower socket layer do buffering:
+	PublicInbox::DS::msg_more($self, $$deflated);
+}
+
+# compatible with PublicInbox::DS::write, so $_[1] may be a reference or not
+sub write ($$) {
+	my $self = $_[0];
+	return PublicInbox::DS::write($self, $_[1]) if ref($_[1]) eq 'CODE';
+
+	my $deflated = $zbuf;
+	$zbuf = \(my $next = '');
+
+	# $_[1] may be a reference or not for ->deflate
+	my $err = $zout->deflate($_[1], $deflated);
+	$err == Z_OK or die "->deflate failed $err";
+	$err = $zout->flush($deflated, Z_FULL_FLUSH);
+	$err == Z_OK or die "->flush failed $err";
+
+	# We can still let the socket layer do buffering:
+	PublicInbox::DS::write($self, $deflated);
+}
+
+1;
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index e8f9c9a3681..725d4206282 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -131,14 +131,20 @@ sub populate {
 my @DoW = qw(Sun Mon Tue Wed Thu Fri Sat);
 my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
 
-sub date ($) {
+sub date ($) { # for NNTP
 	my ($self) = @_;
 	my $ds = $self->{ds};
 	return unless defined $ds;
 	my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($ds);
 	"$DoW[$wday], " . sprintf("%02d $MoY[$mon] %04d %02d:%02d:%02d +0000",
 				$mday, $year+1900, $hour, $min, $sec);
+}
 
+sub internaldate { # for IMAP
+	my ($self) = @_;
+	my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($self->{ts} // 0);
+	sprintf("%02d-$MoY[$mon]-%04d %02d:%02d:%02d +0000",
+				$mday, $year+1900, $hour, $min, $sec);
 }
 
 our $REPLY_RE = qr/^re:\s+/i;
diff --git a/script/public-inbox-imapd b/script/public-inbox-imapd
new file mode 100644
index 00000000000..63f865f53f0
--- /dev/null
+++ b/script/public-inbox-imapd
@@ -0,0 +1,14 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Standalone read-only IMAP server for public-inbox.
+use strict;
+use PublicInbox::Daemon;
+use PublicInbox::IMAPdeflate; # loads PublicInbox::IMAP
+use PublicInbox::IMAPD;
+my $imapd = PublicInbox::IMAPD->new;
+PublicInbox::Daemon::run('0.0.0.0:143',
+	sub { $imapd->refresh_groups }, # refresh
+	sub ($$$) { PublicInbox::IMAP->new($_[0], $imapd) }, # post_accept
+	$imapd);
diff --git a/t/imapd-tls.t b/t/imapd-tls.t
new file mode 100644
index 00000000000..9f5abfe048e
--- /dev/null
+++ b/t/imapd-tls.t
@@ -0,0 +1,209 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use Socket qw(IPPROTO_TCP SOL_SOCKET);
+use PublicInbox::TestCommon;
+# IO::Poll is part of the standard library, but distros may split it off...
+require_mods(qw(DBD::SQLite IO::Socket::SSL Mail::IMAPClient IO::Poll));
+Mail::IMAPClient->can('starttls') or
+	plan skip_all => 'Mail::IMAPClient does not support TLS';
+my $cert = 'certs/server-cert.pem';
+my $key = 'certs/server-key.pem';
+unless (-r $key && -r $cert) {
+	plan skip_all =>
+		"certs/ missing for $0, run $^X ./create-certs.perl in certs/";
+}
+use_ok 'PublicInbox::TLS';
+use_ok 'IO::Socket::SSL';
+use PublicInbox::InboxWritable;
+require PublicInbox::SearchIdx;
+my $version = 1; # v2 needs newer git
+require_git('2.6') if $version >= 2;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $err = "$tmpdir/stderr.log";
+my $out = "$tmpdir/stdout.log";
+my $inboxdir = "$tmpdir";
+my $pi_config = "$tmpdir/pi_config";
+my $group = 'test-imapd-tls';
+my $addr = $group . '@example.com';
+my $starttls = tcp_server();
+my $imaps = tcp_server();
+my $ibx = PublicInbox::Inbox->new({
+	inboxdir => $inboxdir,
+	name => 'imapd-tls',
+	version => $version,
+	-primary_address => $addr,
+	indexlevel => 'basic',
+});
+$ibx = PublicInbox::InboxWritable->new($ibx, {nproc=>1});
+$ibx->init_inbox(0);
+{
+	open my $fh, '>', $pi_config or BAIL_OUT "open: $!";
+	print $fh <<EOF
+[publicinbox "imapd-tls"]
+	inboxdir = $inboxdir
+	address = $addr
+	indexlevel = basic
+	newsgroup = $group
+EOF
+	;
+	close $fh or BAIL_OUT "close: $!\n";
+}
+
+{
+	my $im = $ibx->importer(0);
+	ok($im->add(eml_load('t/data/0001.patch')), 'message added');
+	$im->done;
+	if ($version == 1) {
+		my $s = PublicInbox::SearchIdx->new($ibx, 1);
+		$s->index_sync;
+	}
+}
+
+my $imaps_addr = $imaps->sockhost . ':' . $imaps->sockport;
+my $starttls_addr = $starttls->sockhost . ':' . $starttls->sockport;
+my $env = { PI_CONFIG => $pi_config };
+my $td;
+
+# Mail::IMAPClient ->compress creates cyclic reference:
+# https://rt.cpan.org/Ticket/Display.html?id=132654
+my $compress_logout = sub {
+	my ($c) = @_;
+	ok($c->logout, 'logout ok after ->compress');
+	# all documented in Mail::IMAPClient manpage:
+	for (qw(Readmoremethod Readmethod Prewritemethod)) {
+		$c->$_(undef);
+	}
+};
+
+
+for my $args (
+	[ "--cert=$cert", "--key=$key",
+		"-limaps://$imaps_addr",
+		"-limap://$starttls_addr" ],
+) {
+	for ($out, $err) {
+		open my $fh, '>', $_ or BAIL_OUT "truncate: $!";
+	}
+	my $cmd = [ '-imapd', '-W0', @$args, "--stdout=$out", "--stderr=$err" ];
+	$td = start_script($cmd, $env, { 3 => $starttls, 4 => $imaps });
+	my %o = (
+		SSL_hostname => 'server.local',
+		SSL_verifycn_name => 'server.local',
+		SSL_verify_mode => SSL_VERIFY_PEER(),
+		SSL_ca_file => 'certs/test-ca.pem',
+	);
+	# start negotiating a slow TLS connection
+	my $slow = tcp_connect($imaps, Blocking => 0);
+	$slow = IO::Socket::SSL->start_SSL($slow, SSL_startHandshake => 0, %o);
+	my $slow_done = $slow->connect_SSL;
+	my @poll;
+	if ($slow_done) {
+		diag('W: connect_SSL early OK, slow client test invalid');
+		use PublicInbox::Syscall qw(EPOLLIN EPOLLOUT);
+		@poll = (fileno($slow), EPOLLIN | EPOLLOUT);
+	} else {
+		@poll = (fileno($slow), PublicInbox::TLS::epollbit());
+	}
+	# we should call connect_SSL much later...
+	my %imaps_opt = (User => 'a', Password => 'b',
+			Server => $imaps->sockhost,
+			Port => $imaps->sockport);
+	# IMAPS
+	my $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]);
+	ok($c && $c->IsAuthenticated, 'authenticated');
+	ok($c->select($group), 'SELECT works');
+	ok(!(scalar $c->has_capability('STARTTLS')),
+		'starttls not advertised with IMAPS');
+	ok(!$c->starttls, "starttls fails");
+	ok($c->has_capability('COMPRESS'), 'compress advertised');
+	ok($c->compress, 'compression enabled with IMAPS');
+	ok(!$c->starttls, 'starttls still fails');
+	ok($c->noop, 'noop succeeds');
+	$compress_logout->($c);
+
+	# STARTTLS
+	my %imap_opt = (Server => $starttls->sockhost,
+			Port => $starttls->sockport);
+	$c = Mail::IMAPClient->new(%imap_opt);
+	ok(scalar $c->has_capability('STARTTLS'),
+		'starttls advertised');
+	ok($c->Starttls([ %o ]), 'set starttls options');
+	ok($c->starttls, '->starttls works');
+	ok(!(scalar($c->has_capability('STARTTLS'))),
+		'starttls not advertised');
+	ok(!$c->starttls, '->starttls again fails');
+	ok(!(scalar($c->has_capability('STARTTLS'))),
+		'starttls still not advertised');
+	ok($c->examine($group), 'EXAMINE works');
+	ok($c->noop, 'NOOP works');
+	ok($c->compress, 'compression enabled with IMAPS');
+	ok($c->noop, 'NOOP works after compress');
+	$compress_logout->($c);
+
+	# STARTTLS with bad hostname
+	$o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.invalid';
+	$c = Mail::IMAPClient->new(%imap_opt);
+	ok(scalar $c->has_capability('STARTTLS'), 'starttls advertised');
+	ok($c->Starttls([ %o ]), 'set starttls options');
+	ok(!$c->starttls, '->starttls fails with bad hostname');
+
+	$c = Mail::IMAPClient->new(%imap_opt);
+	ok($c->noop, 'NOOP still works from plain IMAP');
+
+	# IMAPS with bad hostname
+	$c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]);
+	is($c, undef, 'IMAPS fails with bad hostname');
+
+	# make hostname valid
+	$o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.local';
+	$c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]);
+	ok($c, 'IMAPS succeeds again with valid hostname');
+
+	# slow TLS connection did not block the other fast clients while
+	# connecting, finish it off:
+	until ($slow_done) {
+		IO::Poll::_poll(-1, @poll);
+		$slow_done = $slow->connect_SSL and last;
+		@poll = (fileno($slow), PublicInbox::TLS::epollbit());
+	}
+	$slow->blocking(1);
+	ok(sysread($slow, my $greet, 4096) > 0, 'slow got a greeting');
+	like($greet, qr/\A\* OK \[CAPABILITY IMAP4rev1 /, 'got greeting');
+	is(syswrite($slow, "1 LOGOUT\r\n"), 10, 'slow wrote LOGOUT');
+	ok(sysread($slow, my $end, 4096) > 0, 'got end');
+	is(sysread($slow, my $eof, 4096), 0, 'got EOF');
+
+	SKIP: {
+		skip 'TCP_DEFER_ACCEPT is Linux-only', 2 if $^O ne 'linux';
+		my $var = eval { Socket::TCP_DEFER_ACCEPT() } // 9;
+		defined(my $x = getsockopt($imaps, IPPROTO_TCP, $var)) or die;
+		ok(unpack('i', $x) > 0, 'TCP_DEFER_ACCEPT set on IMAPS');
+		defined($x = getsockopt($starttls, IPPROTO_TCP, $var)) or die;
+		is(unpack('i', $x), 0, 'TCP_DEFER_ACCEPT is 0 on plain IMAP');
+	};
+	SKIP: {
+		skip 'SO_ACCEPTFILTER is FreeBSD-only', 2 if $^O ne 'freebsd';
+		if (system('kldstat -m accf_data >/dev/null')) {
+			skip 'accf_data not loaded? kldload accf_data', 2;
+		}
+		require PublicInbox::Daemon;
+		my $var = PublicInbox::Daemon::SO_ACCEPTFILTER();
+		my $x = getsockopt($imaps, SOL_SOCKET, $var);
+		like($x, qr/\Adataready\0+\z/, 'got dataready accf for IMAPS');
+		$x = getsockopt($starttls, IPPROTO_TCP, $var);
+		is($x, undef, 'no BSD accept filter for plain IMAP');
+	};
+
+	$c = undef;
+	$td->kill;
+	$td->join;
+	is($?, 0, 'no error in exited process');
+	open my $fh, '<', $err or BAIL_OUT "open $err failed: $!";
+	my $eout = do { local $/; <$fh> };
+	unlike($eout, qr/wide/i, 'no Wide character warnings');
+}
+
+done_testing;
diff --git a/t/imapd.t b/t/imapd.t
new file mode 100644
index 00000000000..f28a663bf9d
--- /dev/null
+++ b/t/imapd.t
@@ -0,0 +1,149 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+require_mods(qw(DBD::SQLite Mail::IMAPClient));
+my $level = '-Lbasic';
+SKIP: {
+	require_mods('Search::Xapian', 1);
+	$level = '-Lmedium';
+};
+
+my @V = (1);
+#push(@V, 2) if require_git('2.6', 1);
+
+my ($tmpdir, $for_destroy) = tmpdir();
+my $home = "$tmpdir/home";
+local $ENV{HOME} = $home;
+
+for my $V (@V) {
+	my $addr = "i$V\@example.com";
+	my $name = "i$V";
+	my $url = "http://example.com/i$V";
+	my $inboxdir = "$tmpdir/$name";
+	my $folder = "inbox.i$V";
+	my $cmd = ['-init', "-V$V", $level, $name, $inboxdir, $url, $addr];
+	run_script($cmd) or BAIL_OUT("init $name");
+	xsys(qw(git config), "--file=$ENV{HOME}/.public-inbox/config",
+			"publicinbox.$name.newsgroup", $folder) == 0 or
+			BAIL_OUT("setting newsgroup $V");
+	if ($V == 1) {
+		xsys(qw(git config), "--file=$ENV{HOME}/.public-inbox/config",
+			'publicinboxmda.spamcheck', 'none') == 0 or
+			BAIL_OUT("config: $?");
+	}
+	open(my $fh, '<', 't/utf8.eml') or BAIL_OUT("open t/utf8.eml: $!");
+	my $env = { ORIGINAL_RECIPIENT => $addr };
+	run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or
+		BAIL_OUT('-mda delivery');
+	if ($V == 1) {
+		run_script(['-index', $inboxdir]) or BAIL_OUT("index $?");
+	}
+}
+my $sock = tcp_server();
+my $err = "$tmpdir/stderr.log";
+my $out = "$tmpdir/stdout.log";
+my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err" ];
+my $td = start_script($cmd, undef, { 3 => $sock }) or BAIL_OUT("-imapd: $?");
+my %mic_opt = (
+	Server => $sock->sockhost,
+	Port => $sock->sockport,
+	Uid => 1,
+);
+my $mic = Mail::IMAPClient->new(%mic_opt);
+my $pre_login_capa = $mic->capability;
+is(grep(/\AAUTH=ANONYMOUS\z/, @$pre_login_capa), 1,
+	'AUTH=ANONYMOUS advertised pre-login');
+
+$mic->User('lorelei');
+$mic->Password('Hunter2');
+ok($mic->login && $mic->IsAuthenticated, 'LOGIN works');
+my $post_login_capa = $mic->capability;
+ok(join("\n", @$pre_login_capa) ne join("\n", @$post_login_capa),
+	'got different capabilities post-login');
+
+$mic_opt{Authmechanism} = 'ANONYMOUS';
+$mic_opt{Authcallback} = sub { '' };
+$mic = Mail::IMAPClient->new(%mic_opt);
+ok($mic && $mic->login && $mic->IsAuthenticated, 'AUTHENTICATE ANONYMOUS');
+my $post_auth_anon_capa = $mic->capability;
+is_deeply($post_auth_anon_capa, $post_login_capa,
+	'auth anon has same capabilities');
+my $e;
+ok(!$mic->examine('foo') && ($e = $@), 'EXAMINE non-existent');
+like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent');
+ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent');
+like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent');
+ok($mic->select('inbox.i1'), 'SELECT succeeds');
+ok($mic->examine('inbox.i1'), 'EXAMINE succeeds');
+
+my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@";
+is_deeply($ret, [ 1 ], 'search all works');
+$ret = $mic->search('uid 1') or BAIL_OUT "SEARCH FAIL $@";
+is_deeply($ret, [ 1 ], 'search UID 1 works');
+$ret = $mic->search('uid 1:1') or BAIL_OUT "SEARCH FAIL $@";
+is_deeply($ret, [ 1 ], 'search UID 1:1 works');
+$ret = $mic->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@";
+is_deeply($ret, [ 1 ], 'search UID 1:* works');
+
+is_deeply(scalar $mic->flags('1'), [], '->flags works');
+
+for my $r ('1:*', '1') {
+	$ret = $mic->fetch_hash($r, 'RFC822') or BAIL_OUT "FETCH $@";
+	is_deeply([keys %$ret], [1]);
+	like($ret->{1}->{RFC822}, qr/\r\n\r\nThis is a test/, 'read full');
+
+	# ensure Mail::IMAPClient behaves
+	my $str = $mic->message_string($r) or BAIL_OUT "->message_string: $@";
+	is($str, $ret->{1}->{RFC822}, '->message_string works as expected');
+
+	my $sz = $mic->fetch_hash($r, 'RFC822.size') or BAIL_OUT "FETCH $@";
+	is($sz->{1}->{'RFC822.SIZE'}, length($ret->{1}->{RFC822}),
+		'RFC822.SIZE');
+
+	$ret = $mic->fetch_hash($r, 'RFC822.HEADER') or BAIL_OUT "FETCH $@";
+	is_deeply([keys %$ret], [1]);
+	like($ret->{1}->{'RFC822.HEADER'},
+		qr/^Message-ID: <testmessage\@example\.com>/ms, 'read header');
+
+	$ret = $mic->fetch_hash($r, 'INTERNALDATE') or BAIL_OUT "FETCH $@";
+	is($ret->{1}->{'INTERNALDATE'}, '01-Jan-1970 00:00:00 +0000',
+		'internaldate matches');
+	ok(!$mic->fetch_hash($r, 'INFERNALDATE'), 'bogus attribute fails');
+
+	my $envelope = $mic->get_envelope($r) or BAIL_OUT("get_envelope: $@");
+	is($envelope->{bcc}, 'NIL', 'empty bcc');
+	is($envelope->{messageid}, '<testmessage@example.com>', 'messageid');
+	is(scalar @{$envelope->{to}}, 1, 'one {to} header');
+	# *sigh* too much to verify...
+	#use Data::Dumper; diag Dumper($envelope);
+
+	$ret = $mic->fetch_hash($r, 'FLAGS') or BAIL_OUT "FETCH $@";
+	is_deeply($ret->{1}->{FLAGS}, '', 'no flags');
+}
+
+# Mail::IMAPClient ->compress creates cyclic reference:
+# https://rt.cpan.org/Ticket/Display.html?id=132654
+my $compress_logout = sub {
+	my ($c) = @_;
+	ok($c->logout, 'logout ok after ->compress');
+	# all documented in Mail::IMAPClient manpage:
+	for (qw(Readmoremethod Readmethod Prewritemethod)) {
+		$c->$_(undef);
+	}
+};
+
+is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap');
+ok($mic->compress, 'compress enabled');
+$compress_logout->($mic);
+
+$td->kill;
+$td->join;
+is($?, 0, 'no error in exited process');
+open my $fh, '<', $err or BAIL_OUT("open $err failed: $!");
+my $eout = do { local $/; <$fh> };
+unlike($eout, qr/wide/i, 'no Wide character warnings');
+
+done_testing;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 04/82] inboxidle: new class to detect inbox changes
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (2 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 03/82] preliminary imap server implementation Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 05/82] imap: support IDLE Eric Wong
                   ` (78 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This will be used to implement IMAP IDLE, first.

Eventually, it may be used to trigger other things:

* incremental internal updates for manifest.js.gz
* restart `git cat-file' processes on pack index unlink
* IMAP IDLE-like long-polling HTTP endpoint

And maybe more things we haven't thought of, yet.

It uses Linux::Inotify2 or IO::KQueue depending on what packages
are installed and what the kernel supports.  It falls back to
nanosecond-aware Time::HiRes::stat() (available with Perl 5.10.0+)
on systems lacking Linux::Inotify2 and IO::KQueue.

In the future, a pure Perl alternative to Linux::Inotify2 may be
supplied for users of architectures we already support signalfd
and epoll on.

v2 changes:
- avoid O_TRUNC on lock file
- change ctime on Linux systems w/o inotify
- fix naming of comments and fields
---
 MANIFEST                       |  5 +++
 lib/PublicInbox/FakeInotify.pm | 39 ++++++++++++++++++
 lib/PublicInbox/In2Tie.pm      | 17 ++++++++
 lib/PublicInbox/Inbox.pm       | 20 ++++++++++
 lib/PublicInbox/InboxIdle.pm   | 55 ++++++++++++++++++++++++++
 lib/PublicInbox/KQNotify.pm    | 60 ++++++++++++++++++++++++++++
 lib/PublicInbox/Lock.pm        |  7 ++++
 t/inbox_idle.t                 | 72 ++++++++++++++++++++++++++++++++++
 8 files changed, 275 insertions(+)
 create mode 100644 lib/PublicInbox/FakeInotify.pm
 create mode 100644 lib/PublicInbox/In2Tie.pm
 create mode 100644 lib/PublicInbox/InboxIdle.pm
 create mode 100644 lib/PublicInbox/KQNotify.pm
 create mode 100644 t/inbox_idle.t

diff --git a/MANIFEST b/MANIFEST
index 73b874b42a0..8aff192c7ee 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -109,6 +109,7 @@ lib/PublicInbox/Emergency.pm
 lib/PublicInbox/Eml.pm
 lib/PublicInbox/EmlContentFoo.pm
 lib/PublicInbox/ExtMsg.pm
+lib/PublicInbox/FakeInotify.pm
 lib/PublicInbox/Feed.pm
 lib/PublicInbox/Filter/Base.pm
 lib/PublicInbox/Filter/Gmane.pm
@@ -129,8 +130,11 @@ lib/PublicInbox/IMAP.pm
 lib/PublicInbox/IMAPD.pm
 lib/PublicInbox/IMAPdeflate.pm
 lib/PublicInbox/Import.pm
+lib/PublicInbox/In2Tie.pm
 lib/PublicInbox/Inbox.pm
+lib/PublicInbox/InboxIdle.pm
 lib/PublicInbox/InboxWritable.pm
+lib/PublicInbox/KQNotify.pm
 lib/PublicInbox/Linkify.pm
 lib/PublicInbox/Listener.pm
 lib/PublicInbox/Lock.pm
@@ -266,6 +270,7 @@ t/imapd-tls.t
 t/imapd.t
 t/import.t
 t/inbox.t
+t/inbox_idle.t
 t/index-git-times.t
 t/indexlevels-mirror-v1.t
 t/indexlevels-mirror.t
diff --git a/lib/PublicInbox/FakeInotify.pm b/lib/PublicInbox/FakeInotify.pm
new file mode 100644
index 00000000000..bd610463181
--- /dev/null
+++ b/lib/PublicInbox/FakeInotify.pm
@@ -0,0 +1,39 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# for systems lacking Linux::Inotify2 or IO::KQueue, just emulates
+# enough of Linux::Inotify2
+package PublicInbox::FakeInotify;
+use strict;
+use Time::HiRes qw(stat);
+my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify
+
+sub new { bless { watch => {} }, __PACKAGE__ }
+
+# behaves like Linux::Inotify2->watch
+sub watch {
+	my ($self, $path, $mask, $cb) = @_;
+	my @st = stat($path) or return;
+	$self->{watch}->{"$path\0$mask"} = [ @st, $cb ];
+}
+
+# behaves like non-blocking Linux::Inotify2->poll
+sub poll {
+	my ($self) = @_;
+	my $watch = $self->{watch} or return;
+	for my $x (keys %$watch) {
+		my ($path, $mask) = split(/\0/, $x, 2);
+		my @now = stat($path) or next;
+		my $prv = $watch->{$x};
+		my $cb = $prv->[-1];
+		# 10: ctime, 7: size
+		if ($prv->[10] != $now[10]) {
+			if (($mask & $IN_CLOSE) == $IN_CLOSE) {
+				eval { $cb->() };
+			}
+		}
+		@$prv = (@now, $cb);
+	}
+}
+
+1;
diff --git a/lib/PublicInbox/In2Tie.pm b/lib/PublicInbox/In2Tie.pm
new file mode 100644
index 00000000000..db1dc1045c1
--- /dev/null
+++ b/lib/PublicInbox/In2Tie.pm
@@ -0,0 +1,17 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# used to ensure PublicInbox::DS can call fileno() as a function
+# on Linux::Inotify2 objects
+package PublicInbox::In2Tie;
+use strict;
+
+sub TIEHANDLE {
+	my ($class, $in2) = @_;
+	bless \$in2, $class; # a scalar reference to an existing reference
+}
+
+# this calls Linux::Inotify2::fileno
+sub FILENO { ${$_[0]}->fileno }
+
+1;
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index af034358b15..b250bef33d3 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -391,4 +391,24 @@ sub altid_map ($) {
 	} // {};
 }
 
+# $obj must respond to ->on_inbox_unlock, which takes Inbox ($self) as an arg
+sub subscribe_unlock {
+	my ($self, $ident, $obj) = @_;
+	$self->{unlock_subs}->{$ident} = $obj;
+}
+
+sub unsubscribe_unlock {
+	my ($self, $ident) = @_;
+	delete $self->{unlock_subs}->{$ident};
+}
+
+# called by inotify
+sub on_unlock {
+	my ($self) = @_;
+	my $subs = $self->{unlock_subs} or return;
+	for (values %$subs) {
+		eval { $_->on_inbox_unlock($self) };
+	}
+}
+
 1;
diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm
new file mode 100644
index 00000000000..095a801c946
--- /dev/null
+++ b/lib/PublicInbox/InboxIdle.pm
@@ -0,0 +1,55 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+package PublicInbox::InboxIdle;
+use strict;
+use base qw(PublicInbox::DS);
+use fields qw(pi_config inot);
+use Symbol qw(gensym);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
+my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify
+my $ino_cls;
+if ($^O eq 'linux' && eval { require Linux::Inotify2; 1 }) {
+	$IN_CLOSE = Linux::Inotify2::IN_CLOSE();
+	$ino_cls = 'Linux::Inotify2';
+} elsif (eval { require PublicInbox::KQNotify }) {
+	$IN_CLOSE = PublicInbox::KQNotify::IN_CLOSE();
+	$ino_cls = 'PublicInbox::KQNotify';
+}
+require PublicInbox::In2Tie if $ino_cls;
+
+sub in2_arm ($$) { # PublicInbox::Config::each_inbox callback
+	my ($ibx, $inot) = @_;
+	my $path = "$ibx->{inboxdir}/";
+	$path .= $ibx->version >= 2 ? 'inbox.lock' : 'ssoma.lock';
+	$inot->watch($path, $IN_CLOSE, sub { $ibx->on_unlock });
+	# TODO: detect deleted packs (and possibly other files)
+}
+
+sub new {
+	my ($class, $pi_config) = @_;
+	my $self = fields::new($class);
+	my $inot;
+	if ($ino_cls) {
+		$inot = $ino_cls->new or die "E: $ino_cls->new: $!";
+		my $sock = gensym;
+		tie *$sock, 'PublicInbox::In2Tie', $inot;
+		$inot->blocking(0);
+		$inot->on_overflow(undef); # broadcasts everything on overflow
+		$self->SUPER::new($sock, EPOLLIN | EPOLLET);
+	} else {
+		require PublicInbox::FakeInotify;
+		$inot = PublicInbox::FakeInotify->new;
+	}
+	$self->{inot} = $inot;
+	$pi_config->each_inbox(\&in2_arm, $inot);
+	$self;
+}
+
+sub event_step {
+	my ($self) = @_;
+	eval { $self->{inot}->poll }; # Linux::Inotify2::poll
+	warn "$self->{inot}->poll err: $@\n" if $@;
+}
+
+1;
diff --git a/lib/PublicInbox/KQNotify.pm b/lib/PublicInbox/KQNotify.pm
new file mode 100644
index 00000000000..3cf9c0f5a0c
--- /dev/null
+++ b/lib/PublicInbox/KQNotify.pm
@@ -0,0 +1,60 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# implements the small subset of Linux::Inotify2 functionality we use
+# using IO::KQueue on *BSD systems.
+package PublicInbox::KQNotify;
+use strict;
+use IO::KQueue;
+use PublicInbox::DSKQXS; # wraps IO::KQueue for fork-safe DESTROY
+
+# only true as far as public-inbox is concerned with .lock files:
+sub IN_CLOSE () { NOTE_WRITE }
+#sub IN_CLOSE () { 0x200 } # NOTE_CLOSE_WRITE (FreeBSD 11+ only)
+
+sub new {
+	my ($class) = @_;
+	bless { dskq => PublicInbox::DSKQXS->new, watch => {} }, $class;
+}
+
+sub watch {
+	my ($self, $path, $mask, $cb) = @_;
+	open(my $fh, '<', $path) or return;
+	my $ident = fileno($fh);
+	$self->{dskq}->{kq}->EV_SET($ident, # ident
+		EVFILT_VNODE, # filter
+		EV_ADD | EV_CLEAR, # flags
+		$mask, # fflags
+		0, 0); # data, udata
+	if ($mask == IN_CLOSE) {
+		$self->{watch}->{$ident} = [ $fh, $cb ];
+	} else {
+		die "TODO Not implemented: $mask";
+	}
+}
+
+# emulate Linux::Inotify::fileno
+sub fileno { ${$_[0]->{dskq}->{kq}} }
+
+# noop for Linux::Inotify2 compatibility.  Unlike inotify,
+# kqueue doesn't seem to overflow since it's limited by the number of
+# open FDs the process has
+sub on_overflow {}
+
+# noop for Linux::Inotify2 compatibility, we use `0' timeout for ->kevent
+sub blocking {}
+
+# behave like Linux::Inotify2::poll
+sub poll {
+	my ($self) = @_;
+	my @kevents = $self->{dskq}->{kq}->kevent(0);
+	for my $kev (@kevents) {
+		my $ident = $kev->[KQ_IDENT];
+		my $mask = $kev->[KQ_FFLAGS];
+		if (($mask & IN_CLOSE) == IN_CLOSE) {
+			eval { $self->{watch}->{$ident}->[1]->() };
+		}
+	}
+}
+
+1;
diff --git a/lib/PublicInbox/Lock.pm b/lib/PublicInbox/Lock.pm
index 032841ed5cf..693a37949a6 100644
--- a/lib/PublicInbox/Lock.pm
+++ b/lib/PublicInbox/Lock.pm
@@ -24,6 +24,13 @@ sub lock_release {
 	my ($self) = @_;
 	return unless $self->{lock_path};
 	my $lockfh = delete $self->{lockfh} or croak 'not locked';
+
+	# NetBSD 8.1 and OpenBSD 6.5 (and maybe other versions/*BSDs) lack
+	# NOTE_CLOSE_WRITE from FreeBSD 11+, so trigger NOTE_WRITE, instead.
+	# We also need to change the ctime on Linux systems w/o inotify
+	if ($^O ne 'linux' || !eval { require Linux::Inotify2; 1 }) {
+		syswrite($lockfh, '.');
+	}
 	flock($lockfh, LOCK_UN) or die "unlock failed: $!\n";
 	close $lockfh or die "close failed: $!\n";
 }
diff --git a/t/inbox_idle.t b/t/inbox_idle.t
new file mode 100644
index 00000000000..6bd56113602
--- /dev/null
+++ b/t/inbox_idle.t
@@ -0,0 +1,72 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::Config;
+require_mods(qw(DBD::SQLite));
+require PublicInbox::SearchIdx;
+use_ok 'PublicInbox::InboxIdle';
+require_git('2.6');
+use PublicInbox::InboxWritable;
+my ($tmpdir, $for_destroy) = tmpdir();
+
+for my $V (1, 2) {
+	my $inboxdir = "$tmpdir/$V";
+	mkdir $inboxdir or BAIL_OUT("mkdir: $!");
+	my %opt = (
+		inboxdir => $inboxdir,
+		name => 'inbox-idle',
+		version => $V,
+		-primary_address => 'test@example.com',
+		indexlevel => 'basic',
+	);
+	my $ibx = PublicInbox::Inbox->new({ %opt });
+	$ibx = PublicInbox::InboxWritable->new($ibx);
+	my $obj = InboxIdleTestObj->new;
+	$ibx->init_inbox(0);
+	my $im = $ibx->importer(0);
+	if ($V == 1) {
+		my $sidx = PublicInbox::SearchIdx->new($ibx, 1);
+		$sidx->_xdb_acquire;
+		$sidx->set_indexlevel;
+		$sidx->_xdb_release; # allow watching on lockfile
+	}
+	my $pi_config = PublicInbox::Config->new(\<<EOF);
+publicinbox.inbox-idle.inboxdir=$inboxdir
+publicinbox.inbox-idle.indexlevel=basic
+publicinbox.inbox-idle.address=test\@example.com
+EOF
+	my $ident = 'whatever';
+	$pi_config->each_inbox(sub { shift->subscribe_unlock($ident, $obj) });
+	my $ii = PublicInbox::InboxIdle->new($pi_config);
+	ok($ii, 'InboxIdle created');
+	SKIP: {
+		skip('inotify or kqueue missing', 1) unless $ii->{sock};
+		ok(fileno($ii->{sock}) >= 0, 'fileno() gave valid FD');
+	}
+	ok($im->add(eml_load('t/utf8.eml')), "$V added");
+	$im->done;
+	PublicInbox::SearchIdx->new($ibx)->index_sync if $V == 1;
+	$ii->event_step;
+	is(scalar @{$obj->{called}}, 1, 'called on unlock');
+	$pi_config->each_inbox(sub { shift->unsubscribe_unlock($ident) });
+	ok($im->add(eml_load('t/data/0001.patch')), "$V added #2");
+	$im->done;
+	PublicInbox::SearchIdx->new($ibx)->index_sync if $V == 1;
+	$ii->event_step;
+	is(scalar @{$obj->{called}}, 1, 'not called when unsubbed');
+	$ii->close;
+}
+
+done_testing;
+
+package InboxIdleTestObj;
+use strict;
+
+sub new { bless {}, shift }
+
+sub on_inbox_unlock {
+	my ($self, $ibx) = @_;
+	push @{$self->{called}}, $ibx;
+}

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 05/82] imap: support IDLE
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (3 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 04/82] inboxidle: new class to detect inbox changes Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 06/82] msgmap: split ->max into its own method Eric Wong
                   ` (77 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

It seems to be working as far as Mail::IMAPClient is concerned.
---
 Documentation/standards.perl |  2 +-
 lib/PublicInbox/IMAP.pm      | 58 ++++++++++++++++++++++++++++++++++--
 lib/PublicInbox/IMAPD.pm     | 20 ++++++++++++-
 lib/PublicInbox/NNTPD.pm     |  6 ++--
 t/imapd.t                    | 28 +++++++++++++++--
 5 files changed, 104 insertions(+), 10 deletions(-)

diff --git a/Documentation/standards.perl b/Documentation/standards.perl
index 37309956f39..8fc852c722b 100755
--- a/Documentation/standards.perl
+++ b/Documentation/standards.perl
@@ -43,7 +43,7 @@ my $rfcs = [
 	2822 => 'Internet message format (2001)',
 	5322 => 'Internet message format (2008)',
 	3501 => 'IMAP4rev1',
-	2177 => 'IMAP IDLE', # TODO
+	2177 => 'IMAP IDLE',
 	# 5032 = 'WITHIN search extension for IMAP',
 	4978 => 'IMAP COMPRESS Extension',
 	# 5182 = 'IMAP Extension for Referencing the Last SEARCH Result',
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index c0636066b9f..99c6c817fd7 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -15,7 +15,8 @@
 package PublicInbox::IMAP;
 use strict;
 use base qw(PublicInbox::DS);
-use fields qw(imapd logged_in ibx long_cb -login_tag);
+use fields qw(imapd logged_in ibx long_cb -login_tag
+	-idle_tag -idle_max);
 use PublicInbox::Eml;
 use PublicInbox::DS qw(now);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
@@ -88,7 +89,10 @@ sub new ($$$) {
 
 sub capa ($) {
 	my ($self) = @_;
-	my $capa = 'CAPABILITY IMAP4rev1';
+
+	# dovecot advertises IDLE pre-login; perhaps because some clients
+	# depend on it, so we'll do the same
+	my $capa = 'CAPABILITY IMAP4rev1 IDLE';
 	if ($self->{logged_in}) {
 		$capa .= ' COMPRESS=DEFLATE';
 	} else {
@@ -139,6 +143,40 @@ sub cmd_capability ($$) {
 
 sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" }
 
+# called by PublicInbox::InboxIdle
+sub on_inbox_unlock {
+	my ($self, $ibx) = @_;
+	my $new = ($ibx->mm->minmax)[1];
+	defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset';
+	if ($new > $old) {
+		$self->{-idle_max} = $new;
+		$self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1));
+		$self->write(\"* $new EXISTS\r\n");
+	}
+}
+
+sub cmd_idle ($$) {
+	my ($self, $tag) = @_;
+	# IDLE seems allowed by dovecot w/o a mailbox selected *shrug*
+	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
+	$ibx->subscribe_unlock(fileno($self->{sock}), $self);
+	$self->{-idle_tag} = $tag;
+	$self->{-idle_max} = ($ibx->mm->minmax)[1] // 0;
+	"+ idling\r\n"
+}
+
+sub cmd_done ($$) {
+	my ($self, $tag) = @_; # $tag is "DONE" (case-insensitive)
+	defined(my $idle_tag = delete $self->{-idle_tag}) or
+		return "$tag BAD not idle\r\n";
+	my $ibx = $self->{ibx} or do {
+		warn "BUG: idle_tag set w/o inbox";
+		return "$tag BAD internal bug\r\n";
+	};
+	$ibx->unsubscribe_unlock(fileno($self->{sock}));
+	"$idle_tag OK Idle completed\r\n";
+}
+
 sub cmd_examine ($$$) {
 	my ($self, $tag, $mailbox) = @_;
 	my $ibx = $self->{imapd}->{groups}->{$mailbox} or
@@ -361,7 +399,11 @@ sub process_line ($$) {
 	}
 	my $res = eval {
 		if (my $cmd = $self->can('cmd_'.lc($req // ''))) {
-			$cmd->($self, $tag, @args);
+			defined($self->{-idle_tag}) ?
+				"$self->{-idle_tag} BAD expected DONE\r\n" :
+				$cmd->($self, $tag, @args);
+		} elsif (uc($tag // '') eq 'DONE' && !defined($req)) {
+			cmd_done($self, $tag);
 		} else { # this is weird
 			auth_challenge_ok($self) //
 				"$tag BAD Error in IMAP command $req: ".
@@ -516,6 +558,16 @@ sub busy {
 	($self->{rbuf} || $self->{wbuf} || $self->not_idle_long($now));
 }
 
+sub close {
+	my ($self) = @_;
+	if (my $ibx = delete $self->{ibx}) {
+		if (my $sock = $self->{sock}) {;
+			$ibx->unsubscribe_unlock(fileno($sock));
+		}
+	}
+	$self->SUPER::close; # PublicInbox::DS::close
+}
+
 # we're read-only, so SELECT and EXAMINE do the same thing
 no warnings 'once';
 *cmd_select = \&cmd_examine;
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index 1011d6a413b..1922c16046a 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -6,10 +6,28 @@
 package PublicInbox::IMAPD;
 use strict;
 use parent qw(PublicInbox::NNTPD);
+use PublicInbox::InboxIdle;
 
 sub new {
 	my ($class) = @_;
-	$class->SUPER::new; # PublicInbox::NNTPD->new
+	bless {
+		groups => {},
+		err => \*STDERR,
+		out => \*STDOUT,
+		grouplist => [],
+		# accept_tls => { SSL_server => 1, ..., SSL_reuse_ctx => ... }
+		# idler => PublicInbox::InboxIdle
+	}, $class;
+}
+
+sub refresh_groups {
+	my ($self) = @_;
+	if (my $old_idler = delete $self->{idler}) {
+		$old_idler->close; # PublicInbox::DS::close
+	}
+	my $pi_config = PublicInbox::Config->new;
+	$self->{idler} = PublicInbox::InboxIdle->new($pi_config);
+	$self->SUPER::refresh_groups($pi_config);
 }
 
 1;
diff --git a/lib/PublicInbox/NNTPD.pm b/lib/PublicInbox/NNTPD.pm
index b8ec84ed6b0..ed5cf7cc8c0 100644
--- a/lib/PublicInbox/NNTPD.pm
+++ b/lib/PublicInbox/NNTPD.pm
@@ -30,9 +30,9 @@ sub new {
 	}, $class;
 }
 
-sub refresh_groups () {
-	my ($self) = @_;
-	my $pi_config = PublicInbox::Config->new;
+sub refresh_groups {
+	my ($self, $pi_config) = @_;
+	$pi_config //= PublicInbox::Config->new;
 	my $new = {};
 	my @list;
 	$pi_config->each_inbox(sub {
diff --git a/t/imapd.t b/t/imapd.t
index f28a663bf9d..359c4c033b2 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -3,8 +3,10 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use Test::More;
+use Time::HiRes ();
 use PublicInbox::TestCommon;
-require_mods(qw(DBD::SQLite Mail::IMAPClient));
+use PublicInbox::Config;
+require_mods(qw(DBD::SQLite Mail::IMAPClient Linux::Inotify2));
 my $level = '-Lbasic';
 SKIP: {
 	require_mods('Search::Xapian', 1);
@@ -12,7 +14,7 @@ SKIP: {
 };
 
 my @V = (1);
-#push(@V, 2) if require_git('2.6', 1);
+push(@V, 2) if require_git('2.6', 1);
 
 my ($tmpdir, $for_destroy) = tmpdir();
 my $home = "$tmpdir/home";
@@ -139,6 +141,28 @@ is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap');
 ok($mic->compress, 'compress enabled');
 $compress_logout->($mic);
 
+my $pi_config = PublicInbox::Config->new;
+$pi_config->each_inbox(sub {
+	my ($ibx) = @_;
+	my $name = $ibx->{name};
+	my $ng = $ibx->{newsgroup};
+	my $mic = Mail::IMAPClient->new(%mic_opt);
+	ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name");
+	is_deeply([$mic->has_capability('IDLE')], ['IDLE'], "IDLE capa $name");
+	ok(!$mic->idle, "IDLE fails w/o SELECT/EXAMINE $name");
+	ok($mic->examine($ng), "EXAMINE $ng succeeds");
+	ok($mic->idle, "IDLE succeeds on $ng");
+
+	open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!");
+	my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} };
+	run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or
+		BAIL_OUT('-mda delivery');
+	my $t0 = Time::HiRes::time();
+	ok(my @res = $mic->idle_data(11), "IDLE succeeds on $ng");
+	ok(grep(/\A\* [0-9] EXISTS\b/, @res), 'got EXISTS message');
+	ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified');
+});
+
 $td->kill;
 $td->join;
 is($?, 0, 'no error in exited process');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 06/82] msgmap: split ->max into its own method
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (4 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 05/82] imap: support IDLE Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 07/82] imap: delay InboxIdle start, support refresh Eric Wong
                   ` (76 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

There's enough places where we only care about the max NNTP
article number to warrant avoiding a call into SQLite.

Using ->num_highwater in read-only packages such as
PublicInbox::IMAP is also incorrect, since that memoizes
and won't pick up changes made by other processes.
---
 lib/PublicInbox/IMAP.pm   | 10 +++++-----
 lib/PublicInbox/Msgmap.pm | 20 +++++++++++---------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 99c6c817fd7..4a43185c512 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -146,7 +146,7 @@ sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" }
 # called by PublicInbox::InboxIdle
 sub on_inbox_unlock {
 	my ($self, $ibx) = @_;
-	my $new = ($ibx->mm->minmax)[1];
+	my $new = $ibx->mm->max;
 	defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset';
 	if ($new > $old) {
 		$self->{-idle_max} = $new;
@@ -161,7 +161,7 @@ sub cmd_idle ($$) {
 	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
 	$ibx->subscribe_unlock(fileno($self->{sock}), $self);
 	$self->{-idle_tag} = $tag;
-	$self->{-idle_max} = ($ibx->mm->minmax)[1] // 0;
+	$self->{-idle_max} = $ibx->mm->max // 0;
 	"+ idling\r\n"
 }
 
@@ -182,7 +182,7 @@ sub cmd_examine ($$$) {
 	my $ibx = $self->{imapd}->{groups}->{$mailbox} or
 		return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
 	my $mm = $ibx->mm;
-	my $max = $mm->num_highwater // 0;
+	my $max = $mm->max // 0;
 	# RFC 3501 2.3.1.1 -  "A good UIDVALIDITY value to use in
 	# this case is a 32-bit representation of the creation
 	# date/time of the mailbox"
@@ -320,7 +320,7 @@ sub cmd_uid_fetch ($$$;@) {
 	if ($range =~ /\A([0-9]+):([0-9]+)\z/s) {
 		($beg, $end) = ($1, $2);
 	} elsif ($range =~ /\A([0-9]+):\*\z/s) {
-		($beg, $end) =  ($1, $ibx->mm->num_highwater // 0);
+		($beg, $end) =  ($1, $ibx->mm->max // 0);
 	} elsif ($range =~ /\A[0-9]+\z/) {
 		my $smsg = $ibx->over->get_art($range) or return "$tag OK\r\n";
 		push @$msgs, $smsg;
@@ -365,7 +365,7 @@ sub cmd_uid_search ($$$;) {
 	} elsif ($arg eq 'UID' && scalar(@rest) == 1) {
 		if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
 			my ($beg, $end) = ($1, $2);
-			$end = ($ibx->mm->minmax)[1] if $end eq '*';
+			$end = $ibx->mm->max if $end eq '*';
 			$self->msg_more('* SEARCH');
 			long_response($self, \&uid_search_uid_range,
 					$tag, $ibx, \$beg, $end);
diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index a2ffe7203e7..d115cbce352 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -55,8 +55,7 @@ sub new_file {
 		$dbh->begin_work;
 		$self->created_at(time) unless $self->created_at;
 
-		my (undef, $max) = $self->minmax();
-		$max ||= 0;
+		my $max = $self->max // 0;
 		$self->num_highwater($max);
 		$dbh->commit;
 	}
@@ -159,17 +158,20 @@ sub num_for {
 	$sth->fetchrow_array;
 }
 
+sub max {
+	my $sth = $_[0]->{dbh}->prepare_cached('SELECT MAX(num) FROM msgmap',
+						undef, 1);
+	$sth->execute;
+	$sth->fetchrow_array;
+}
+
 sub minmax {
-	my ($self) = @_;
-	my $dbh = $self->{dbh};
 	# breaking MIN and MAX into separate queries speeds up from 250ms
 	# to around 700us with 2.7million messages.
-	my $sth = $dbh->prepare_cached('SELECT MIN(num) FROM msgmap', undef, 1);
-	$sth->execute;
-	my $min = $sth->fetchrow_array;
-	$sth = $dbh->prepare_cached('SELECT MAX(num) FROM msgmap', undef, 1);
+	my $sth = $_[0]->{dbh}->prepare_cached('SELECT MIN(num) FROM msgmap',
+						undef, 1);
 	$sth->execute;
-	($min, $sth->fetchrow_array);
+	($sth->fetchrow_array, max($_[0]));
 }
 
 sub mid_delete {

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 07/82] imap: delay InboxIdle start, support refresh
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (5 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 06/82] msgmap: split ->max into its own method Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 08/82] imap: implement STATUS command Eric Wong
                   ` (75 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

InboxIdle should not be holding onto Inbox objects after the
Config object they came from expires, and Config objects may
expire on SIGHUP.

Old Inbox objects still persist due to IMAP clients holding onto
them, but that's a concern we'll deal with at another time, or
not at all, since all clients expire, eventually.

Regardless, stale inotify watch descriptors should not be left
hanging after SIGHUP refreshes.
---
 lib/PublicInbox/IMAP.pm      |  1 +
 lib/PublicInbox/IMAPD.pm     | 14 +++++----
 lib/PublicInbox/InboxIdle.pm | 36 ++++++++++++++++++----
 t/imapd.t                    | 58 ++++++++++++++++++++++++++++++++++--
 4 files changed, 96 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 4a43185c512..c8592dc0329 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -160,6 +160,7 @@ sub cmd_idle ($$) {
 	# IDLE seems allowed by dovecot w/o a mailbox selected *shrug*
 	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
 	$ibx->subscribe_unlock(fileno($self->{sock}), $self);
+	$self->{imapd}->idler_start;
 	$self->{-idle_tag} = $tag;
 	$self->{-idle_max} = $ibx->mm->max // 0;
 	"+ idling\r\n"
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index 1922c16046a..05aa30e42a1 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -16,18 +16,22 @@ sub new {
 		out => \*STDOUT,
 		grouplist => [],
 		# accept_tls => { SSL_server => 1, ..., SSL_reuse_ctx => ... }
+		# pi_config => PublicInbox::Config
 		# idler => PublicInbox::InboxIdle
 	}, $class;
 }
 
 sub refresh_groups {
 	my ($self) = @_;
-	if (my $old_idler = delete $self->{idler}) {
-		$old_idler->close; # PublicInbox::DS::close
-	}
-	my $pi_config = PublicInbox::Config->new;
-	$self->{idler} = PublicInbox::InboxIdle->new($pi_config);
+	my $pi_config = $self->{pi_config} = PublicInbox::Config->new;
 	$self->SUPER::refresh_groups($pi_config);
+	if (my $idler = $self->{idler}) {
+		$idler->refresh($pi_config);
+	}
+}
+
+sub idler_start {
+	$_[0]->{idler} //= PublicInbox::InboxIdle->new($_[0]->{pi_config});
 }
 
 1;
diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm
index 095a801c946..c19b8d186cd 100644
--- a/lib/PublicInbox/InboxIdle.pm
+++ b/lib/PublicInbox/InboxIdle.pm
@@ -4,7 +4,8 @@
 package PublicInbox::InboxIdle;
 use strict;
 use base qw(PublicInbox::DS);
-use fields qw(pi_config inot);
+use fields qw(pi_config inot pathmap);
+use Cwd qw(abs_path);
 use Symbol qw(gensym);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
 my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify
@@ -19,13 +20,35 @@ if ($^O eq 'linux' && eval { require Linux::Inotify2; 1 }) {
 require PublicInbox::In2Tie if $ino_cls;
 
 sub in2_arm ($$) { # PublicInbox::Config::each_inbox callback
-	my ($ibx, $inot) = @_;
-	my $path = "$ibx->{inboxdir}/";
-	$path .= $ibx->version >= 2 ? 'inbox.lock' : 'ssoma.lock';
-	$inot->watch($path, $IN_CLOSE, sub { $ibx->on_unlock });
+	my ($ibx, $self) = @_;
+	my $dir = abs_path($ibx->{inboxdir});
+	if (!defined($dir)) {
+		warn "W: $ibx->{inboxdir} not watched: $!\n";
+		return;
+	}
+	my $inot = $self->{inot};
+	my $cur = $self->{pathmap}->{$dir} //= [];
+
+	# transfer old subscriptions to the current inbox, cancel the old watch
+	if (my $old_ibx = $cur->[0]) {
+		$ibx->{unlock_subs} and
+			die "BUG: $dir->{unlock_subs} should not exist";
+		$ibx->{unlock_subs} = $old_ibx->{unlock_subs};
+		$cur->[1]->cancel;
+	}
+	$cur->[0] = $ibx;
+
+	my $lock = "$dir/".($ibx->version >= 2 ? 'inbox.lock' : 'ssoma.lock');
+	$cur->[1] = $inot->watch($lock, $IN_CLOSE, sub { $ibx->on_unlock });
+
 	# TODO: detect deleted packs (and possibly other files)
 }
 
+sub refresh {
+	my ($self, $pi_config) = @_;
+	$pi_config->each_inbox(\&in2_arm, $self);
+}
+
 sub new {
 	my ($class, $pi_config) = @_;
 	my $self = fields::new($class);
@@ -42,7 +65,8 @@ sub new {
 		$inot = PublicInbox::FakeInotify->new;
 	}
 	$self->{inot} = $inot;
-	$pi_config->each_inbox(\&in2_arm, $inot);
+	$self->{pathmap} = {}; # inboxdir => [ ibx, watch1, watch2, watch3...]
+	refresh($self, $pi_config);
 	$self;
 }
 
diff --git a/t/imapd.t b/t/imapd.t
index 359c4c033b2..b0caa8f1779 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -6,6 +6,7 @@ use Test::More;
 use Time::HiRes ();
 use PublicInbox::TestCommon;
 use PublicInbox::Config;
+use PublicInbox::Spawn qw(which);
 require_mods(qw(DBD::SQLite Mail::IMAPClient Linux::Inotify2));
 my $level = '-Lbasic';
 SKIP: {
@@ -141,9 +142,12 @@ is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap');
 ok($mic->compress, 'compress enabled');
 $compress_logout->($mic);
 
+my $have_inotify = eval { require Linux::Inotify2; 1 };
+
 my $pi_config = PublicInbox::Config->new;
 $pi_config->each_inbox(sub {
 	my ($ibx) = @_;
+	my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} };
 	my $name = $ibx->{name};
 	my $ng = $ibx->{newsgroup};
 	my $mic = Mail::IMAPClient->new(%mic_opt);
@@ -154,12 +158,62 @@ $pi_config->each_inbox(sub {
 	ok($mic->idle, "IDLE succeeds on $ng");
 
 	open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!");
-	my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} };
 	run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or
 		BAIL_OUT('-mda delivery');
 	my $t0 = Time::HiRes::time();
 	ok(my @res = $mic->idle_data(11), "IDLE succeeds on $ng");
-	ok(grep(/\A\* [0-9] EXISTS\b/, @res), 'got EXISTS message');
+	is(grep(/\A\* [0-9] EXISTS\b/, @res), 1, 'got EXISTS message');
+	ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified');
+
+	my (@ino_info, $ino_fdinfo);
+	SKIP: {
+		skip 'no inotify support', 1 unless $have_inotify;
+		skip 'missing /proc/$PID/fd', 1 if !-d "/proc/$td->{pid}/fd";
+		my @ino = grep {
+			readlink($_) =~ /\binotify\b/
+		} glob("/proc/$td->{pid}/fd/*");
+		is(scalar(@ino), 1, 'only one inotify FD');
+		my $ino_fd = (split('/', $ino[0]))[-1];
+		$ino_fdinfo = "/proc/$td->{pid}/fdinfo/$ino_fd";
+		if (open my $fh, '<', $ino_fdinfo) {
+			local $/ = "\n";
+			@ino_info = grep(/^inotify wd:/, <$fh>);
+			ok(scalar(@ino_info), 'inotify has watches');
+		} else {
+			skip "$ino_fdinfo missing: $!", 1;
+		}
+	};
+
+	# ensure IDLE persists across HUP, w/o extra watches or FDs
+	$td->kill('HUP') or BAIL_OUT "failed to kill -imapd: $!";
+	SKIP: {
+		skip 'no inotify fdinfo (or support)', 2 if !@ino_info;
+		my (@tmp, %prev);
+		local $/ = "\n";
+		my $end = time + 5;
+		until (time > $end) {
+			select undef, undef, undef, 0.01;
+			open my $fh, '<', $ino_fdinfo or
+						BAIL_OUT "$ino_fdinfo: $!";
+			%prev = map { $_ => 1 } @ino_info;
+			@tmp = grep(/^inotify wd:/, <$fh>);
+			if (scalar(@tmp) == scalar(@ino_info)) {
+				delete @prev{@tmp};
+				last if scalar(keys(%prev)) == @ino_info;
+			}
+		}
+		is(scalar @tmp, scalar @ino_info,
+			'old inotify watches replaced');
+		is(scalar keys %prev, scalar @ino_info,
+			'no previous watches overlap');
+	};
+
+	open($fh, '<', 't/data/0001.patch') or BAIL_OUT("open: $!");
+	run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or
+		BAIL_OUT('-mda delivery');
+	$t0 = Time::HiRes::time();
+	ok(@res = $mic->idle_data(11), "IDLE succeeds on $ng after HUP");
+	is(grep(/\A\* [0-9] EXISTS\b/, @res), 1, 'got EXISTS message');
 	ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified');
 });
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 08/82] imap: implement STATUS command
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (6 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 07/82] imap: delay InboxIdle start, support refresh Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 09/82] imap: use Text::ParseWords::parse_line to handle quoted words Eric Wong
                   ` (74 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

I'm not sure if there's much use for this command, but it's
part of RFC3501 and works read-only.
---
 lib/PublicInbox/IMAP.pm | 29 +++++++++++++++++++++++++++++
 t/imapd.t               |  5 +++++
 2 files changed, 34 insertions(+)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index c8592dc0329..a2d59e5cccc 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -306,6 +306,35 @@ sub uid_fetch_m { # long_response
 	1;
 }
 
+sub cmd_status ($$$;@) {
+	my ($self, $tag, $mailbox, @items) = @_;
+	my $ibx = $self->{imapd}->{groups}->{$mailbox} or
+		return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
+	return "$tag BAD no items\r\n" if !scalar(@items);
+	($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and
+		return "$tag BAD invalid args\r\n";
+
+	my $mm = $ibx->mm;
+	my ($max, @it);
+	for my $it (@items) {
+		$it = uc($it);
+		push @it, $it;
+		if ($it =~ /\A(?:MESSAGES|UNSEEN|RECENT)\z/) {
+			push(@it, ($max //= $mm->max // 0));
+		} elsif ($it eq 'UIDNEXT') {
+			push(@it, ($max //= $mm->max // 0) + 1);
+		} elsif ($it eq 'UIDVALIDITY') {
+			push(@it, $mm->created_at //
+				return("$tag BAD UIDVALIDITY\r\n"));
+		} else {
+			return "$tag BAD invalid item\r\n";
+		}
+	}
+	return "$tag BAD no items\r\n" if !@it;
+	"* STATUS $mailbox (".join(' ', @it).")\r\n" .
+	"$tag OK Status complete\r\n";
+}
+
 sub cmd_uid_fetch ($$$;@) {
 	my ($self, $tag, $range, @want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
diff --git a/t/imapd.t b/t/imapd.t
index b0caa8f1779..7512bb90050 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -81,6 +81,11 @@ ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent');
 like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent');
 ok($mic->select('inbox.i1'), 'SELECT succeeds');
 ok($mic->examine('inbox.i1'), 'EXAMINE succeeds');
+my @raw = $mic->status('inbox.i1', qw(Messages uidnext uidvalidity));
+is(scalar(@raw), 2, 'got status response');
+like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20
+	\(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx);
+like($raw[1], qr/\A\S+ OK /, 'finished status response');
 
 my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@";
 is_deeply($ret, [ 1 ], 'search all works');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 09/82] imap: use Text::ParseWords::parse_line to handle quoted words
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (7 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 08/82] imap: implement STATUS command Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 10/82] imap: support LIST command Eric Wong
                   ` (73 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

IMAP clients may quote args and escape similar to POSIX shell,
so attempt to handle them properly using this standard library
module.
---
 lib/PublicInbox/IMAP.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index a2d59e5cccc..7745d9f96f7 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -20,6 +20,7 @@ use fields qw(imapd logged_in ibx long_cb -login_tag
 use PublicInbox::Eml;
 use PublicInbox::DS qw(now);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
+use Text::ParseWords qw(parse_line);
 use Errno qw(EAGAIN);
 my $Address;
 for my $mod (qw(Email::Address::XS Mail::Address)) {
@@ -423,7 +424,8 @@ sub args_ok ($$) { # duplicated from PublicInbox::NNTP
 # returns 1 if we can continue, 0 if not due to buffered writes or disconnect
 sub process_line ($$) {
 	my ($self, $l) = @_;
-	my ($tag, $req, @args) = split(/[ \t]+/, $l);
+	my ($tag, $req, @args) = parse_line('[ \t]+', 0, $l);
+	pop(@args) if (@args && !defined($args[-1]));
 	if (@args && uc($req) eq 'UID') {
 		$req .= "_".(shift @args);
 	}

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 10/82] imap: support LIST command
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (8 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 09/82] imap: use Text::ParseWords::parse_line to handle quoted words Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 11/82] t/imapd: support FakeInotify and KQNotify Eric Wong
                   ` (72 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We'll optimize for the common case of: $TAG LIST "" *
and rely on the grep perlfunc to handle trickier cases.
---
 lib/PublicInbox/IMAP.pm  | 14 ++++++++++
 lib/PublicInbox/IMAPD.pm | 25 +++++++++++++++++
 t/imapd.t                | 59 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 7745d9f96f7..ca9a0ea7d42 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -336,6 +336,20 @@ sub cmd_status ($$$;@) {
 	"$tag OK Status complete\r\n";
 }
 
+my %patmap = ('*' => '.*', '%' => '[^\.]*');
+sub cmd_list ($$$$) {
+	my ($self, $tag, $refname, $wildcard) = @_;
+	my $l = $self->{imapd}->{inboxlist};
+	if ($refname eq '' && $wildcard eq '') {
+		# request for hierarchy delimiter
+		$l = [ qq[* LIST (\\Noselect) "." ""\r\n] ];
+	} elsif ($refname ne '' || $wildcard ne '*') {
+		$wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eig;
+		$l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ];
+	}
+	\(join('', @$l, "$tag OK List complete\r\n"));
+}
+
 sub cmd_uid_fetch ($$$;@) {
 	my ($self, $tag, $range, @want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index 05aa30e42a1..a3a2598661b 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -21,10 +21,35 @@ sub new {
 	}, $class;
 }
 
+sub refresh_inboxlist ($) {
+	my ($self) = @_;
+	my @names = map { $_->{newsgroup} } @{delete $self->{grouplist}};
+	my %ns; # "\Noselect \HasChildren"
+	for (@names) {
+		my $up = $_;
+		while ($up =~ s/\.[^\.]+\z//) {
+			$ns{$up} = '\\Noselect \\HasChildren';
+		}
+	}
+	@names = map {;
+		my $at = delete($ns{$_}) ? '\\HasChildren' : '\\HasNoChildren';
+		qq[* LIST ($at) "." $_\r\n]
+	} @names;
+	push(@names, map { qq[* LIST ($ns{$_}) "." $_\r\n] } keys %ns);
+	@names = sort {
+		my ($xa) = ($a =~ / (\S+)\r\n/g);
+		my ($xb) = ($b =~ / (\S+)\r\n/g);
+		length($xa) <=> length($xb);
+	} @names;
+	$self->{inboxlist} = \@names;
+}
+
 sub refresh_groups {
 	my ($self) = @_;
 	my $pi_config = $self->{pi_config} = PublicInbox::Config->new;
 	$self->SUPER::refresh_groups($pi_config);
+	refresh_inboxlist($self);
+
 	if (my $idler = $self->{idler}) {
 		$idler->refresh($pi_config);
 	}
diff --git a/t/imapd.t b/t/imapd.t
index 7512bb90050..a377c02ab43 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -87,6 +87,65 @@ like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20
 	\(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx);
 like($raw[1], qr/\A\S+ OK /, 'finished status response');
 
+@raw = $mic->list;
+like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
+	'got an inbox');
+like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
+is(scalar(@raw), scalar(@V) + 2, 'default LIST response');
+@raw = $mic->list('', 'inbox.i1');
+is(scalar(@raw), 2, 'limited LIST response');
+like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
+		'got an inbox.i1');
+like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
+
+{ # make sure we get '%' globbing right
+	my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y));
+	my $self = { imapd => { grouplist => \@n } };
+	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
+	my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%');
+	is(scalar($$res =~ tr/\n/\n/), 2, 'only one result');
+	like($$res, qr/ x\r\ntag OK/, 'saw expected');
+	$res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x.', '%');
+	is(scalar($$res =~ tr/\n/\n/), 3, 'only one result');
+	is(scalar(my @x = ($$res =~ m/ x\.[zy]\r\n/g)), 2, 'match expected');
+
+	$res = PublicInbox::IMAP::cmd_list($self, 't', 'x.(?{die "RCE"})', '%');
+	like($$res, qr/\At OK /, 'refname does not match attempted RCE');
+	$res = PublicInbox::IMAP::cmd_list($self, 't', '', '(?{die "RCE"})%');
+	like($$res, qr/\At OK /, 'wildcard does not match attempted RCE');
+}
+
+if ($ENV{TEST_BENCHMARK}) {
+	use Benchmark qw(:all);
+	my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000);
+	push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000);
+	my $self = { imapd => { grouplist => \@n } };
+	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
+
+	my $n = scalar @n;
+	open my $null, '>', '/dev/null' or die;
+	my $ds = { sock => $null };
+	my $nr = 200;
+	diag "starting benchmark...";
+	my $t = timeit(1, sub {
+		for (0..$nr) {
+			my $res = PublicInbox::IMAP::cmd_list($self, 'tag',
+								'', '*');
+			PublicInbox::DS::write($ds, $res);
+		}
+	});
+	diag timestr($t). "list all for $n inboxes $nr times";
+	$nr = 20;
+	$t = timeit(1, sub {
+		for (0..$nr) {
+			my $res = PublicInbox::IMAP::cmd_list($self, 'tag',
+								'inbox.', '%');
+			PublicInbox::DS::write($ds, $res);
+		}
+	});
+	diag timestr($t). "list partial for $n inboxes $nr times";
+}
+
 my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@";
 is_deeply($ret, [ 1 ], 'search all works');
 $ret = $mic->search('uid 1') or BAIL_OUT "SEARCH FAIL $@";

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 11/82] t/imapd: support FakeInotify and KQNotify
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (9 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 10/82] imap: support LIST command Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 12/82] imap: support fetch for BODYSTRUCTURE and BODY Eric Wong
                   ` (71 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We can fill in some missing pieces from the emulation APIs
to enable IMAP IDLE tests on non-Linux platforms.
---
 lib/PublicInbox/FakeInotify.pm | 22 +++++++++++++++++++++-
 lib/PublicInbox/KQNotify.pm    |  6 ++++++
 t/imapd.t                      |  3 ++-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/FakeInotify.pm b/lib/PublicInbox/FakeInotify.pm
index bd610463181..b077d63a4b4 100644
--- a/lib/PublicInbox/FakeInotify.pm
+++ b/lib/PublicInbox/FakeInotify.pm
@@ -8,13 +8,30 @@ use strict;
 use Time::HiRes qw(stat);
 my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify
 
-sub new { bless { watch => {} }, __PACKAGE__ }
+my $poll_intvl = 2; # same as Filesys::Notify::Simple
+my $for_cancel = bless \(my $x), 'PublicInbox::FakeInotify::Watch';
+
+sub poll_once {
+	my ($self) = @_;
+	sub {
+		eval { $self->poll };
+		warn "E: FakeInotify->poll: $@\n" if $@;
+		PublicInbox::DS::add_timer($poll_intvl, poll_once($self));
+	};
+}
+
+sub new {
+	my $self = bless { watch => {} }, __PACKAGE__;
+	PublicInbox::DS::add_timer($poll_intvl, poll_once($self));
+	$self;
+}
 
 # behaves like Linux::Inotify2->watch
 sub watch {
 	my ($self, $path, $mask, $cb) = @_;
 	my @st = stat($path) or return;
 	$self->{watch}->{"$path\0$mask"} = [ @st, $cb ];
+	$for_cancel;
 }
 
 # behaves like non-blocking Linux::Inotify2->poll
@@ -36,4 +53,7 @@ sub poll {
 	}
 }
 
+package PublicInbox::FakeInotify::Watch;
+sub cancel {} # noop
+
 1;
diff --git a/lib/PublicInbox/KQNotify.pm b/lib/PublicInbox/KQNotify.pm
index 3cf9c0f5a0c..1b5c578ef3d 100644
--- a/lib/PublicInbox/KQNotify.pm
+++ b/lib/PublicInbox/KQNotify.pm
@@ -31,6 +31,7 @@ sub watch {
 	} else {
 		die "TODO Not implemented: $mask";
 	}
+	bless \$fh, 'PublicInbox::KQNotify::Watch';
 }
 
 # emulate Linux::Inotify::fileno
@@ -57,4 +58,9 @@ sub poll {
 	}
 }
 
+package PublicInbox::KQNotify::Watch;
+use strict;
+
+sub cancel { close ${$_[0]} or die "close: $!" }
+
 1;
diff --git a/t/imapd.t b/t/imapd.t
index a377c02ab43..c31ac12f941 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -7,7 +7,8 @@ use Time::HiRes ();
 use PublicInbox::TestCommon;
 use PublicInbox::Config;
 use PublicInbox::Spawn qw(which);
-require_mods(qw(DBD::SQLite Mail::IMAPClient Linux::Inotify2));
+require_mods(qw(DBD::SQLite Mail::IMAPClient));
+
 my $level = '-Lbasic';
 SKIP: {
 	require_mods('Search::Xapian', 1);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 12/82] imap: support fetch for BODYSTRUCTURE and BODY
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (10 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 11/82] t/imapd: support FakeInotify and KQNotify Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 13/82] eml: each_part: single part $idx is 1 Eric Wong
                   ` (70 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

I'm not sure which clients use these, but it could be useful
down the line.
---
 lib/PublicInbox/Eml.pm  |   7 +--
 lib/PublicInbox/IMAP.pm | 106 +++++++++++++++++++++++++++++++++++++++-
 t/imapd.t               |  16 +++++-
 3 files changed, 123 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 6f6874cd237..d2bd3915545 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -235,11 +235,11 @@ sub mp_descend ($$) {
 # $arg - user-supplied arg (think pthread_create)
 # $once - unref body scalar during iteration
 sub each_part {
-	my ($self, $cb, $arg, $once) = @_;
+	my ($self, $cb, $arg, $once, $all) = @_;
 	my $p = mp_descend($self, $once // 0) or
 					return $cb->([$self, 0, 0], $arg);
 
-	$cb->([$self, 0, 0], $arg) if $self->{-call_cb}; # rare
+	$cb->([$self, 0, 0], $arg) if ($all || $self->{-call_cb}); # rare
 
 	$p = [ $p, 0 ];
 	my @s; # our virtual stack
@@ -255,7 +255,8 @@ sub each_part {
 				(my $nxt = mp_descend($sub, $nr))) {
 			push(@s, $p) if scalar @{$p->[0]};
 			$p = [ $nxt, @idx, 0 ];
-			$cb->([$sub, $depth, @idx], $arg) if $sub->{-call_cb};
+			($all || $sub->{-call_cb}) and
+				$cb->([$sub, $depth, @idx], $arg);
 		} else { # a leaf node
 			$cb->([$sub, $depth, @idx], $arg);
 		}
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index ca9a0ea7d42..6c5e0290925 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -18,6 +18,7 @@ use base qw(PublicInbox::DS);
 use fields qw(imapd logged_in ibx long_cb -login_tag
 	-idle_tag -idle_max);
 use PublicInbox::Eml;
+use PublicInbox::EmlContentFoo qw(parse_content_disposition);
 use PublicInbox::DS qw(now);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
 use Text::ParseWords qw(parse_line);
@@ -248,6 +249,105 @@ sub eml_envelope ($) {
 	) . ')';
 }
 
+sub _esc_hash ($) {
+	my ($hash) = @_;
+	if ($hash && scalar keys %$hash) {
+		$hash = [ %$hash ]; # flatten hash into 1-dimensional array
+		'(' . join(' ', map { _esc($_) } @$hash) . ')';
+	} else {
+		'NIL';
+	}
+}
+
+sub body_disposition ($) {
+	my ($eml) = @_;
+	my $cd = $eml->header_raw('Content-Disposition') or return 'NIL';
+	$cd = parse_content_disposition($cd);
+	my $buf = '('._esc($cd->{type});
+	$buf .= ' ' . _esc_hash(delete $cd->{attributes});
+	$buf .= ')';
+}
+
+sub body_leaf ($$;$) {
+	my ($eml, $structure, $hold) = @_;
+	my $buf = '';
+	$eml->{is_submsg} and # parent was a message/(rfc822|news|global)
+		$buf .= eml_envelope($eml). ' ';
+	my $ct = $eml->ct;
+	$buf .= '('._esc($ct->{type}).' ';
+	$buf .= _esc($ct->{subtype});
+	$buf .= ' ' . _esc_hash(delete $ct->{attributes});
+	$buf .= ' ' . _esc($eml->header_raw('Content-ID'));
+	$buf .= ' ' . _esc($eml->header_raw('Content-Description'));
+	my $cte = $eml->header_raw('Content-Transfer-Encoding') // '7bit';
+	$buf .= ' ' . _esc($cte);
+	$buf .= ' ' . $eml->{imap_body_len};
+	$buf .= ' '.($eml->body_raw =~ tr/\n/\n/) if lc($ct->{type}) eq 'text';
+
+	# for message/(rfc822|global|news), $hold[0] should have envelope
+	$buf .= ' ' . (@$hold ? join('', @$hold) : 'NIL') if $hold;
+
+	if ($structure) {
+		$buf .= ' '._esc($eml->header_raw('Content-MD5'));
+		$buf .= ' '. body_disposition($eml);
+		$buf .= ' '._esc($eml->header_raw('Content-Language'));
+		$buf .= ' '._esc($eml->header_raw('Content-Location'));
+	}
+	$buf .= ')';
+}
+
+sub body_parent ($$$) {
+	my ($eml, $structure, $hold) = @_;
+	my $ct = $eml->ct;
+	my $type = lc($ct->{type});
+	if ($type eq 'multipart') {
+		my $buf = '(';
+		$buf .= @$hold ? join('', @$hold) : 'NIL';
+		$buf .= ' '._esc($ct->{subtype});
+		if ($structure) {
+			$buf .= ' '._esc_hash(delete $ct->{attributes});
+			$buf .= ' '.body_disposition($eml);
+			$buf .= ' '._esc($eml->header_raw('Content-Language'));
+			$buf .= ' '._esc($eml->header_raw('Content-Location'));
+		}
+		$buf .= ')';
+		@$hold = ($buf);
+	} else { # message/(rfc822|global|news)
+		@$hold = (body_leaf($eml, $structure, $hold));
+	}
+}
+
+# this is gross, but we need to process the parent part AFTER
+# the child parts are done
+sub bodystructure_prep {
+	my ($p, $q) = @_;
+	my ($eml, $depth) = @$p; # ignore idx
+	# set length here, as $eml->{bdy} gets deleted for message/rfc822
+	$eml->{imap_body_len} = length($eml->body_raw);
+	push @$q, $eml, $depth;
+}
+
+# for FETCH BODY and FETCH BODYSTRUCTURE
+sub fetch_body ($;$) {
+	my ($eml, $structure) = @_;
+	my @q;
+	$eml->each_part(\&bodystructure_prep, \@q, 0, 1);
+	my $cur_depth = 0;
+	my @hold;
+	do {
+		my ($part, $depth) = splice(@q, -2);
+		my $is_mp_parent = $depth == ($cur_depth - 1);
+		$cur_depth = $depth;
+
+		if ($is_mp_parent) {
+			body_parent($part, $structure, \@hold);
+		} else {
+			unshift @hold, body_leaf($part, $structure);
+		}
+	} while (@q);
+	join('', @hold);
+}
+
 sub uid_fetch_cb { # called by git->cat_async
 	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
 	my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg;
@@ -286,7 +386,11 @@ sub uid_fetch_cb { # called by git->cat_async
 		$self->msg_more(" $f {".length($$bref)."}\r\n");
 		$self->msg_more($$bref);
 	}
-	# TODO BODY/BODYSTRUCTURE, specific headers
+	$want->{BODYSTRUCTURE} and
+		$self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1));
+	$want->{BODY} and
+		$self->msg_more(' BODY '.fetch_body($eml));
+
 	$self->msg_more(")\r\n");
 }
 
diff --git a/t/imapd.t b/t/imapd.t
index c31ac12f941..31154bdf3e7 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -7,7 +7,7 @@ use Time::HiRes ();
 use PublicInbox::TestCommon;
 use PublicInbox::Config;
 use PublicInbox::Spawn qw(which);
-require_mods(qw(DBD::SQLite Mail::IMAPClient));
+require_mods(qw(DBD::SQLite Mail::IMAPClient Mail::IMAPClient::BodyStructure));
 
 my $level = '-Lbasic';
 SKIP: {
@@ -190,6 +190,11 @@ for my $r ('1:*', '1') {
 
 	$ret = $mic->fetch_hash($r, 'FLAGS') or BAIL_OUT "FETCH $@";
 	is_deeply($ret->{1}->{FLAGS}, '', 'no flags');
+
+	my $bs = $mic->get_bodystructure($r) or BAIL_OUT("bodystructure: $@");
+	ok($bs, 'got a bodystructure');
+	is(lc($bs->bodytype), 'text', '->bodytype');
+	is(lc($bs->bodyenc), '8bit', '->bodyenc');
 }
 
 # Mail::IMAPClient ->compress creates cyclic reference:
@@ -217,10 +222,12 @@ $pi_config->each_inbox(sub {
 	my $ng = $ibx->{newsgroup};
 	my $mic = Mail::IMAPClient->new(%mic_opt);
 	ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name");
+	my $uidnext = $mic->uidnext($ng); # we'll fetch BODYSTRUCTURE on this
+	ok($uidnext, 'got uidnext for later fetch');
 	is_deeply([$mic->has_capability('IDLE')], ['IDLE'], "IDLE capa $name");
 	ok(!$mic->idle, "IDLE fails w/o SELECT/EXAMINE $name");
 	ok($mic->examine($ng), "EXAMINE $ng succeeds");
-	ok($mic->idle, "IDLE succeeds on $ng");
+	ok(my $idle_tag = $mic->idle, "IDLE succeeds on $ng");
 
 	open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!");
 	run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or
@@ -280,6 +287,11 @@ $pi_config->each_inbox(sub {
 	ok(@res = $mic->idle_data(11), "IDLE succeeds on $ng after HUP");
 	is(grep(/\A\* [0-9] EXISTS\b/, @res), 1, 'got EXISTS message');
 	ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified');
+	ok($mic->done($idle_tag), 'IDLE DONE');
+	my $bs = $mic->get_bodystructure($uidnext);
+	ok($bs, 'BODYSTRUCTURE ok for deeply nested');
+	$ret = $mic->fetch_hash($uidnext, 'BODY') or BAIL_OUT "FETCH $@";
+	ok($ret->{$uidnext}->{BODY}, 'got something in BODY');
 });
 
 $td->kill;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 13/82] eml: each_part: single part $idx is 1
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (11 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 12/82] imap: support fetch for BODYSTRUCTURE and BODY Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 14/82] imap: allow fetch of partial of BODY[...] and headers Eric Wong
                   ` (69 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Instead of counts starting at 0, we start the single-part
message at 1 like we do with subparts of a multipart message.
This will make it easier to map offsets for "BODY[$SECTION]"
when using IMAP FETCH, since $SECTION must contain non-zero
numbers according to RFC 3501.

This doesn't make any difference for WWW URLs, since single part
messages cannot have downloadable attachments.
---
 lib/PublicInbox/Eml.pm     | 2 +-
 lib/PublicInbox/MsgIter.pm | 2 +-
 t/eml.t                    | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index d2bd3915545..ec682b919ea 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -237,7 +237,7 @@ sub mp_descend ($$) {
 sub each_part {
 	my ($self, $cb, $arg, $once, $all) = @_;
 	my $p = mp_descend($self, $once // 0) or
-					return $cb->([$self, 0, 0], $arg);
+					return $cb->([$self, 0, 1], $arg);
 
 	$cb->([$self, 0, 0], $arg) if ($all || $self->{-call_cb}); # rare
 
diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index 5ec2a4d9c7f..bb1dfeadc29 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -34,7 +34,7 @@ sub em_each_part ($$;$$) {
 			}
 		}
 	} else {
-		$cb->([$mime, 0, 0], $cb_arg);
+		$cb->([$mime, 0, 1], $cb_arg);
 	}
 }
 
diff --git a/t/eml.t b/t/eml.t
index a2479f6f068..8d131b1418e 100644
--- a/t/eml.t
+++ b/t/eml.t
@@ -169,7 +169,7 @@ EOF
 	is(scalar(@tmp), 1, 'got one part even w/o boundary');
 	is($tmp[0]->[0]->[0]->body, "hello world\n", 'body preserved');
 	is($tmp[0]->[0]->[1], 0, '$depth is zero');
-	is($tmp[0]->[0]->[2], 0, '@idx is zero');
+	is($tmp[0]->[0]->[2], 1, '@idx is one');
 }
 
 # I guess the following only worked in PI::M because of a happy accident

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 14/82] imap: allow fetch of partial of BODY[...] and headers
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (12 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 13/82] eml: each_part: single part $idx is 1 Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 15/82] imap: always include `resp-text' in responses Eric Wong
                   ` (68 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

IMAP supports a high level of granularity when it comes to
fetching, but fortunately Perl makes it fairly easy to support.
---
 MANIFEST                |   1 +
 lib/PublicInbox/IMAP.pm | 154 ++++++++++++++++++++++++++++++++++++++--
 t/imap.t                |  43 +++++++++++
 t/imapd.t               |  61 +++++++++++++++-
 4 files changed, 253 insertions(+), 6 deletions(-)
 create mode 100644 t/imap.t

diff --git a/MANIFEST b/MANIFEST
index 8aff192c7ee..0803c3654d6 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -266,6 +266,7 @@ t/httpd-https.t
 t/httpd-unix.t
 t/httpd.t
 t/hval.t
+t/imap.t
 t/imapd-tls.t
 t/imapd.t
 t/import.t
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 6c5e0290925..86e0a176be1 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -390,7 +390,9 @@ sub uid_fetch_cb { # called by git->cat_async
 		$self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1));
 	$want->{BODY} and
 		$self->msg_more(' BODY '.fetch_body($eml));
-
+	if (my $partial = delete $want->{-partial}) {
+		partial_emit($self, $partial, $eml);
+	}
 	$self->msg_more(")\r\n");
 }
 
@@ -454,16 +456,158 @@ sub cmd_list ($$$$) {
 	\(join('', @$l, "$tag OK List complete\r\n"));
 }
 
+sub eml_index_offs_i { # PublicInbox::Eml::each_part callback
+	my ($p, $all) = @_;
+	my ($eml, undef, $idx) = @$p;
+	if ($idx && lc($eml->ct->{type}) eq 'multipart') {
+		$eml->{imap_bdy} = $eml->{bdy} // \'';
+	}
+	$all->{$idx} = $eml; # $idx => Eml
+}
+
+# prepares an index for BODY[$SECTION_IDX] fetches
+sub eml_body_idx ($$) {
+	my ($eml, $section_idx) = @_;
+	my $idx = $eml->{imap_all_parts} //= do {
+		my $all = {};
+		$eml->each_part(\&eml_index_offs_i, $all, 0, 1);
+		# top-level of multipart, BODY[0] not allowed (nz-number)
+		delete $all->{0};
+		$all;
+	};
+	$idx->{$section_idx};
+}
+
+# BODY[($SECTION_IDX)?(.$SECTION_NAME)?]<$offset.$bytes>
+sub partial_body {
+	my ($eml, $section_idx, $section_name) = @_;
+	if (defined $section_idx) {
+		$eml = eml_body_idx($eml, $section_idx) or return;
+	}
+	if (defined $section_name) {
+		if ($section_name eq 'MIME') {
+			# RFC 3501 6.4.5 states:
+			#	The MIME part specifier MUST be prefixed
+			#	by one or more numeric part specifiers
+			return unless defined $section_idx;
+			return $eml->header_obj->as_string . "\r\n";
+		}
+		my $bdy = $eml->{bdy} // $eml->{imap_bdy} // \'';
+		$eml = PublicInbox::Eml->new($$bdy);
+		if ($section_name eq 'TEXT') {
+			return $eml->body_raw;
+		} elsif ($section_name eq 'HEADER') {
+			return $eml->header_obj->as_string . "\r\n";
+		} else {
+			die "BUG: bad section_name=$section_name";
+		}
+	}
+	${$eml->{bdy} // $eml->{imap_bdy} // \''};
+}
+
+# similar to what's in PublicInbox::Eml::re_memo, but doesn't memoize
+# to avoid OOM with malicious users
+sub hdrs_regexp ($) {
+	my ($hdrs) = @_;
+	my $names = join('|', map { "\Q$_" } split(/[ \t]+/, $hdrs));
+	qr/^(?:$names):[ \t]*[^\n]*\r?\n # 1st line
+		# continuation lines:
+		(?:[^:\n]*?[ \t]+[^\n]*\r?\n)*
+		/ismx;
+}
+
+# BODY[($SECTION_IDX.)?HEADER.FIELDS.NOT ($HDRS)]<$offset.$bytes>
+sub partial_hdr_not {
+	my ($eml, $section_idx, $hdrs) = @_;
+	if (defined $section_idx) {
+		$eml = eml_body_idx($eml, $section_idx) or return;
+	}
+	my $str = $eml->header_obj->as_string;
+	my $re = hdrs_regexp($hdrs);
+	$str =~ s/$re//g;
+	$str .= "\r\n";
+}
+
+# BODY[($SECTION_IDX.)?HEADER.FIELDS ($HDRS)]<$offset.$bytes>
+sub partial_hdr_get {
+	my ($eml, $section_idx, $hdrs) = @_;
+	if (defined $section_idx) {
+		$eml = eml_body_idx($eml, $section_idx) or return;
+	}
+	my $str = $eml->header_obj->as_string;
+	my $re = hdrs_regexp($hdrs);
+	join('', ($str =~ m/($re)/g), "\r\n");
+}
+
+sub partial_prepare ($$$) {
+	my ($partial, $want, $att) = @_;
+
+	# recombine [ "BODY[1.HEADER.FIELDS", "(foo", "bar)]" ]
+	# back to: "BODY[1.HEADER.FIELDS (foo bar)]"
+	return unless $att =~ /\ABODY(?:\.PEEK)?\[/s;
+	until (rindex($att, ']') >= 0) {
+		my $next = shift @$want or return;
+		$att .= ' ' . uc($next);
+	}
+	if ($att =~ /\ABODY(?:\.PEEK)?\[
+				([0-9]+(?:\.[0-9]+)*)? # 1 - section_idx
+				(?:\.(HEADER|MIME|TEXT))? # 2 - section_name
+			\](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 3, 4
+		$partial->{$att} = [ \&partial_body, $1, $2, $3, $4 ];
+	} elsif ($att =~ /\ABODY(?:\.PEEK)?\[
+				(?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx
+				(?:HEADER\.FIELDS(\.NOT)?)\x20 # 2
+				\(([A-Z0-9\-\x20]+)\) # 3 - hdrs
+			\](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 4 5
+		$partial->{$att} = [ $2 ? \&partial_hdr_not
+					: \&partial_hdr_get,
+					$1, $3, $4, $5 ];
+	} else {
+		undef;
+	}
+}
+
+sub partial_emit ($$$) {
+	my ($self, $partial, $eml) = @_;
+	for my $k (sort keys %$partial) {
+		my ($cb, @args) = @{$partial->{$k}};
+		my ($offset, $len) = splice(@args, -2);
+		# $cb is partial_body|partial_hdr_get|partial_hdr_not
+		my $str = $cb->($eml, @args) // '';
+		if (defined $offset) {
+			if (defined $len) {
+				$str = substr($str, $offset, $len);
+				$k =~ s/\.$len>\z/>/ or warn
+"BUG: unable to remove `.$len>' from `$k'";
+			} else {
+				$str = substr($str, $offset);
+				$len = length($str);
+			}
+		} else {
+			$len = length($str);
+		}
+		$self->msg_more(" $k {$len}\r\n");
+		$self->msg_more($str);
+	}
+}
+
 sub cmd_uid_fetch ($$$;@) {
 	my ($self, $tag, $range, @want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
 	if ($want[0] =~ s/\A\(//s) {
 		$want[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n";
 	}
-	my %want = map {;
-		my $x = $FETCH_ATT{uc($_)} or return "$tag BAD param: $_\r\n";
-		%$x;
-	} @want;
+	my (%partial, %want);
+	while (defined(my $att = shift @want)) {
+		$att = uc($att);
+		my $x = $FETCH_ATT{$att};
+		if ($x) {
+			%want = (%want, %$x);
+		} elsif (!partial_prepare(\%partial, \@want, $att)) {
+			return "$tag BAD param: $att\r\n";
+		}
+	}
+	$want{-partial} = \%partial if scalar keys %partial;
 	my ($beg, $end);
 	my $msgs = [];
 	if ($range =~ /\A([0-9]+):([0-9]+)\z/s) {
diff --git a/t/imap.t b/t/imap.t
new file mode 100644
index 00000000000..c435d365543
--- /dev/null
+++ b/t/imap.t
@@ -0,0 +1,43 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::IMAP;
+{
+	my $partial_prepare = \&PublicInbox::IMAP::partial_prepare;
+	my $x = {};
+	my $r = $partial_prepare->($x, [], my $p = 'BODY.PEEK[9]');
+	ok($r, $p);
+	$r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5>');
+	ok($r, $p);
+	$r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5.1>');
+	ok($r, $p);
+	$r = $partial_prepare->($x, [], $p = 'BODY[1.1]');
+	ok($r, $p);
+	$r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS (DATE FROM)]');
+	ok($r, $p);
+	$r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS.NOT (TO)]');
+	ok($r, $p);
+	$r = $partial_prepare->($x, [], $p = 'BODY[HEDDER.FIELDS.NOT (TO)]');
+	ok(!$r, "rejected misspelling $p");
+	$r = $partial_prepare->($x, [], $p = 'BODY[1.1.HEADER.FIELDS (TO)]');
+	ok($r, $p);
+	my $partial_body = \&PublicInbox::IMAP::partial_body;
+	my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get;
+	my $partial_hdr_not = \&PublicInbox::IMAP::partial_hdr_not;
+	is_deeply($x, {
+		'BODY.PEEK[9]' => [ $partial_body, 9, undef, undef, undef ],
+		'BODY.PEEK[9]<5>' => [ $partial_body, 9, undef, 5, undef ],
+		'BODY.PEEK[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ],
+		'BODY[1.1]' => [ $partial_body, '1.1', undef, undef, undef ],
+		'BODY[HEADER.FIELDS (DATE FROM)]' => [ $partial_hdr_get,
+					undef, 'DATE FROM', undef, undef ],
+		'BODY[HEADER.FIELDS.NOT (TO)]' => [ $partial_hdr_not,
+						undef, 'TO', undef, undef ],
+		'BODY[1.1.HEADER.FIELDS (TO)]' => [ $partial_hdr_get,
+						'1.1', 'TO', undef, undef ],
+	}, 'structure matches expected');
+}
+
+done_testing;
diff --git a/t/imapd.t b/t/imapd.t
index 31154bdf3e7..a63be0fd0c2 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -191,6 +191,31 @@ for my $r ('1:*', '1') {
 	$ret = $mic->fetch_hash($r, 'FLAGS') or BAIL_OUT "FETCH $@";
 	is_deeply($ret->{1}->{FLAGS}, '', 'no flags');
 
+	$ret = $mic->fetch_hash($r, 'BODY[1]') or BAIL_OUT "FETCH $@";
+	like($ret->{1}->{'BODY[1]'}, qr/\AThis is a test message/, 'BODY[1]');
+
+	$ret = $mic->fetch_hash($r, 'BODY[1]<1>') or BAIL_OUT "FETCH $@";
+	like($ret->{1}->{'BODY[1]<1>'}, qr/\Ahis is a test message/,
+			'BODY[1]<1>');
+
+	$ret = $mic->fetch_hash($r, 'BODY[1]<2.3>') or BAIL_OUT "FETCH $@";
+	is($ret->{1}->{'BODY[1]<2>'}, "is ", 'BODY[1]<2.3>');
+	$ret = $mic->bodypart_string($r, 1, 3, 2) or
+					BAIL_OUT "bodypart_string $@";
+	is($ret, "is ", 'bodypart string');
+
+	$ret = $mic->fetch_hash($r, 'BODY[HEADER.FIELDS.NOT (Message-ID)]')
+		or BAIL_OUT "FETCH $@";
+	$ret = $ret->{1}->{'BODY[HEADER.FIELDS.NOT (MESSAGE-ID)]'};
+	unlike($ret, qr/message-id/i, 'Message-ID excluded');
+	like($ret, qr/\r\n\r\n\z/s, 'got header end');
+
+	$ret = $mic->fetch_hash($r, 'BODY[HEADER.FIELDS (Message-ID)]')
+		or BAIL_OUT "FETCH $@";
+	is($ret->{1}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},
+		'Message-ID: <testmessage@example.com>'."\r\n\r\n",
+		'got only Message-ID');
+
 	my $bs = $mic->get_bodystructure($r) or BAIL_OUT("bodystructure: $@");
 	ok($bs, 'got a bodystructure');
 	is(lc($bs->bodytype), 'text', '->bodytype');
@@ -292,7 +317,41 @@ $pi_config->each_inbox(sub {
 	ok($bs, 'BODYSTRUCTURE ok for deeply nested');
 	$ret = $mic->fetch_hash($uidnext, 'BODY') or BAIL_OUT "FETCH $@";
 	ok($ret->{$uidnext}->{BODY}, 'got something in BODY');
-});
+
+	# this matches dovecot behavior
+	$ret = $mic->fetch_hash($uidnext, 'BODY[1]') or BAIL_OUT "FETCH $@";
+	is($ret->{$uidnext}->{'BODY[1]'},
+		"testing embedded message harder\r\n", 'BODY[1]');
+	$ret = $mic->fetch_hash($uidnext, 'BODY[2]') or BAIL_OUT "FETCH $@";
+	like($ret->{$uidnext}->{'BODY[2]'},
+		qr/\ADate: Sat, 18 Apr 2020 22:20:20 /, 'BODY[2]');
+
+	$ret = $mic->fetch_hash($uidnext, 'BODY[2.1.1]') or BAIL_OUT "FETCH $@";
+	is($ret->{$uidnext}->{'BODY[2.1.1]'},
+		"testing embedded message\r\n", 'BODY[2.1.1]');
+
+	$ret = $mic->fetch_hash($uidnext, 'BODY[2.1.2]') or BAIL_OUT "FETCH $@";
+	like($ret->{$uidnext}->{'BODY[2.1.2]'}, qr/\AFrom: /,
+		'BODY[2.1.2] tip matched');
+	like($ret->{$uidnext}->{'BODY[2.1.2]'},
+		 # trailing CRLF may vary depending on MIME parser
+		 qr/done_testing;(?:\r\n){1,2}\z/,
+		'BODY[2.1.2] tail matched');
+
+	$ret = $mic->fetch_hash($uidnext, 'BODY[2.HEADER]') or
+						BAIL_OUT "2.HEADER $@";
+	like($ret->{$uidnext}->{'BODY[2.HEADER]'},
+		qr/\ADate: Sat, 18 Apr 2020 22:20:20 /,
+		'2.HEADER of message/rfc822');
+
+	$ret = $mic->fetch_hash($uidnext, 'BODY[2.MIME]') or
+		BAIL_OUT "2.MIME $@";
+	is($ret->{$uidnext}->{'BODY[2.MIME]'}, <<EOF, 'BODY[2.MIME]');
+Content-Type: message/rfc822\r
+Content-Disposition: attachment; filename="embed2x\.eml"\r
+\r
+EOF
+}); # each_inbox
 
 $td->kill;
 $td->join;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 15/82] imap: always include `resp-text' in responses
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (13 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 14/82] imap: allow fetch of partial of BODY[...] and headers Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 16/82] imap: split out unit tests and benchmarks Eric Wong
                   ` (67 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Mail::IMAPClient doesn't seem to mind the lack of `resp-text';
but it's required by RFC 3501.  Preliminary tests with
offlineimap(1) indicates the presence of `resp-text' is
necessary, even if it's just the freeform `text'.

And make the `text' more consistent, favoring "done" over
"complete" or "completed"; while we're at it.
---
 lib/PublicInbox/IMAP.pm | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 86e0a176be1..4292c564f01 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -127,7 +127,7 @@ sub cmd_login ($$$$) {
 sub cmd_logout ($$) {
 	my ($self, $tag) = @_;
 	delete $self->{logged_in};
-	$self->write(\"* BYE logging out\r\n$tag OK logout completed\r\n");
+	$self->write(\"* BYE logging out\r\n$tag OK Logout done\r\n");
 	$self->shutdn; # PublicInbox::DS::shutdn
 	undef;
 }
@@ -140,10 +140,10 @@ sub cmd_authenticate ($$$) {
 
 sub cmd_capability ($$) {
 	my ($self, $tag) = @_;
-	'* '.capa($self)."\r\n$tag OK\r\n";
+	'* '.capa($self)."\r\n$tag OK Capability done\r\n";
 }
 
-sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" }
+sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" }
 
 # called by PublicInbox::InboxIdle
 sub on_inbox_unlock {
@@ -177,7 +177,7 @@ sub cmd_done ($$) {
 		return "$tag BAD internal bug\r\n";
 	};
 	$ibx->unsubscribe_unlock(fileno($self->{sock}));
-	"$idle_tag OK Idle completed\r\n";
+	"$idle_tag OK Idle done\r\n";
 }
 
 sub cmd_examine ($$$) {
@@ -204,7 +204,7 @@ EOF
 	$ret .= "* OK [UNSEEN $max]\r\n" if $max;
 	$ret .= "* OK [UIDNEXT $uidnext]\r\n" if defined $uidnext;
 	$ret .= "* OK [UIDVALIDITY $uidvalidity]\r\n" if defined $uidvalidity;
-	$ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT complete\r\n";
+	$ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT done\r\n";
 }
 
 sub _esc ($) {
@@ -439,7 +439,7 @@ sub cmd_status ($$$;@) {
 	}
 	return "$tag BAD no items\r\n" if !@it;
 	"* STATUS $mailbox (".join(' ', @it).")\r\n" .
-	"$tag OK Status complete\r\n";
+	"$tag OK Status done\r\n";
 }
 
 my %patmap = ('*' => '.*', '%' => '[^\.]*');
@@ -453,7 +453,7 @@ sub cmd_list ($$$$) {
 		$wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eig;
 		$l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ];
 	}
-	\(join('', @$l, "$tag OK List complete\r\n"));
+	\(join('', @$l, "$tag OK List done\r\n"));
 }
 
 sub eml_index_offs_i { # PublicInbox::Eml::each_part callback
@@ -615,11 +615,12 @@ sub cmd_uid_fetch ($$$;@) {
 	} elsif ($range =~ /\A([0-9]+):\*\z/s) {
 		($beg, $end) =  ($1, $ibx->mm->max // 0);
 	} elsif ($range =~ /\A[0-9]+\z/) {
-		my $smsg = $ibx->over->get_art($range) or return "$tag OK\r\n";
+		my $smsg = $ibx->over->get_art($range) or
+			return "$tag OK Fetch done\r\n"; # really OK(!)
 		push @$msgs, $smsg;
 		($beg, $end) = ($range, 0);
 	} else {
-		return "$tag BAD\r\n";
+		return "$tag BAD fetch range\r\n";
 	}
 	long_response($self, \&uid_fetch_m, $tag, $ibx,
 				\$beg, $end, $msgs, \%want);
@@ -631,7 +632,7 @@ sub uid_search_all { # long_response
 	if (scalar(@$uids)) {
 		$self->msg_more(join(' ', '', @$uids));
 	} else {
-		$self->write(\"\r\n$tag OK\r\n");
+		$self->write(\"\r\n$tag OK Search done\r\n");
 		undef;
 	}
 }
@@ -642,7 +643,7 @@ sub uid_search_uid_range { # long_response
 	if (@$uids) {
 		$self->msg_more(join('', map { " $_->[0]" } @$uids));
 	} else {
-		$self->write(\"\r\n$tag OK\r\n");
+		$self->write(\"\r\n$tag OK Search done\r\n");
 		undef;
 	}
 }
@@ -665,12 +666,12 @@ sub cmd_uid_search ($$$;) {
 		} elsif ($rest[0] =~ /\A[0-9]+\z/s) {
 			my $uid = $rest[0];
 			$uid = $ibx->over->get_art($uid) ? " $uid" : '';
-			"* SEARCH$uid\r\n$tag OK\r\n";
+			"* SEARCH$uid\r\n$tag OK Search done\r\n";
 		} else {
-			"$tag BAD\r\n";
+			"$tag BAD Error\r\n";
 		}
 	} else {
-		"$tag BAD\r\n";
+		"$tag BAD Error\r\n";
 	}
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 16/82] imap: split out unit tests and benchmarks
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (14 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 15/82] imap: always include `resp-text' in responses Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 17/82] imap: fix multi-message partial header fetches Eric Wong
                   ` (66 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This makes the test code easier-to-manage and allows us to run
faster unit tests which don't involve loading Mail::IMAPClient.
---
 MANIFEST            |  1 +
 t/imap.t            | 20 ++++++++++++++++++
 t/imapd.t           | 49 +--------------------------------------------
 xt/perf-imap-list.t | 35 ++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 48 deletions(-)
 create mode 100644 xt/perf-imap-list.t

diff --git a/MANIFEST b/MANIFEST
index 0803c3654d6..2e4ec915412 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -353,6 +353,7 @@ xt/git_async_cmp.t
 xt/mem-msgview.t
 xt/msgtime_cmp.t
 xt/nntpd-validate.t
+xt/perf-imap-list.t
 xt/perf-msgview.t
 xt/perf-nntpd.t
 xt/perf-threading.t
diff --git a/t/imap.t b/t/imap.t
index c435d365543..9b64f1646c6 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -1,9 +1,29 @@
 #!perl -w
 # Copyright (C) 2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# unit tests (no network) for IMAP, see t/imapd.t for end-to-end tests
 use strict;
 use Test::More;
 use PublicInbox::IMAP;
+use PublicInbox::IMAPD;
+
+{ # make sure we get '%' globbing right
+	my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y));
+	my $self = { imapd => { grouplist => \@n } };
+	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
+	my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%');
+	is(scalar($$res =~ tr/\n/\n/), 2, 'only one result');
+	like($$res, qr/ x\r\ntag OK/, 'saw expected');
+	$res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x.', '%');
+	is(scalar($$res =~ tr/\n/\n/), 3, 'only one result');
+	is(scalar(my @x = ($$res =~ m/ x\.[zy]\r\n/g)), 2, 'match expected');
+
+	$res = PublicInbox::IMAP::cmd_list($self, 't', 'x.(?{die "RCE"})', '%');
+	like($$res, qr/\At OK /, 'refname does not match attempted RCE');
+	$res = PublicInbox::IMAP::cmd_list($self, 't', '', '(?{die "RCE"})%');
+	like($$res, qr/\At OK /, 'wildcard does not match attempted RCE');
+}
+
 {
 	my $partial_prepare = \&PublicInbox::IMAP::partial_prepare;
 	my $x = {};
diff --git a/t/imapd.t b/t/imapd.t
index a63be0fd0c2..59b95a6b83f 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -1,6 +1,7 @@
 #!perl -w
 # Copyright (C) 2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# end-to-end IMAP tests, see unit tests in t/imap.t, too
 use strict;
 use Test::More;
 use Time::HiRes ();
@@ -99,54 +100,6 @@ like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
 		'got an inbox.i1');
 like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
 
-{ # make sure we get '%' globbing right
-	my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y));
-	my $self = { imapd => { grouplist => \@n } };
-	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
-	my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%');
-	is(scalar($$res =~ tr/\n/\n/), 2, 'only one result');
-	like($$res, qr/ x\r\ntag OK/, 'saw expected');
-	$res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x.', '%');
-	is(scalar($$res =~ tr/\n/\n/), 3, 'only one result');
-	is(scalar(my @x = ($$res =~ m/ x\.[zy]\r\n/g)), 2, 'match expected');
-
-	$res = PublicInbox::IMAP::cmd_list($self, 't', 'x.(?{die "RCE"})', '%');
-	like($$res, qr/\At OK /, 'refname does not match attempted RCE');
-	$res = PublicInbox::IMAP::cmd_list($self, 't', '', '(?{die "RCE"})%');
-	like($$res, qr/\At OK /, 'wildcard does not match attempted RCE');
-}
-
-if ($ENV{TEST_BENCHMARK}) {
-	use Benchmark qw(:all);
-	my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000);
-	push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000);
-	my $self = { imapd => { grouplist => \@n } };
-	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
-
-	my $n = scalar @n;
-	open my $null, '>', '/dev/null' or die;
-	my $ds = { sock => $null };
-	my $nr = 200;
-	diag "starting benchmark...";
-	my $t = timeit(1, sub {
-		for (0..$nr) {
-			my $res = PublicInbox::IMAP::cmd_list($self, 'tag',
-								'', '*');
-			PublicInbox::DS::write($ds, $res);
-		}
-	});
-	diag timestr($t). "list all for $n inboxes $nr times";
-	$nr = 20;
-	$t = timeit(1, sub {
-		for (0..$nr) {
-			my $res = PublicInbox::IMAP::cmd_list($self, 'tag',
-								'inbox.', '%');
-			PublicInbox::DS::write($ds, $res);
-		}
-	});
-	diag timestr($t). "list partial for $n inboxes $nr times";
-}
-
 my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@";
 is_deeply($ret, [ 1 ], 'search all works');
 $ret = $mic->search('uid 1') or BAIL_OUT "SEARCH FAIL $@";
diff --git a/xt/perf-imap-list.t b/xt/perf-imap-list.t
new file mode 100644
index 00000000000..37640a90f3c
--- /dev/null
+++ b/xt/perf-imap-list.t
@@ -0,0 +1,35 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use Test::More;
+use_ok 'PublicInbox::IMAP';
+use_ok 'PublicInbox::IMAPD';
+use PublicInbox::DS;
+use Benchmark qw(:all);
+my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000);
+push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000);
+my $self = { imapd => { grouplist => \@n } };
+PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
+
+my $n = scalar @n;
+open my $null, '>', '/dev/null' or BAIL_OUT "open: $!";
+my $ds = { sock => $null };
+my $nr = 200;
+diag "starting benchmark...";
+my $cmd_list = \&PublicInbox::IMAP::cmd_list;
+my $t = timeit(1, sub {
+	for (0..$nr) {
+		my $res = $cmd_list->($self, 'tag', '', '*');
+		PublicInbox::DS::write($ds, $res);
+	}
+});
+diag timestr($t). "list all for $n inboxes $nr times";
+$nr = 20;
+$t = timeit(1, sub {
+	for (0..$nr) {
+		my $res = $cmd_list->($self, 'tag', 'inbox.', '%');
+		PublicInbox::DS::write($ds, $res);
+	}
+});
+diag timestr($t). "list partial for $n inboxes $nr times";
+done_testing;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 17/82] imap: fix multi-message partial header fetches
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (15 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 16/82] imap: split out unit tests and benchmarks Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 18/82] imap: simplify partial fetch structure Eric Wong
                   ` (65 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We must keep the contents of {-partial} around when handling
a request to fetch multiple messages.
---
 lib/PublicInbox/IMAP.pm | 2 +-
 t/imapd.t               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 4292c564f01..fffd611b3c6 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -390,7 +390,7 @@ sub uid_fetch_cb { # called by git->cat_async
 		$self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1));
 	$want->{BODY} and
 		$self->msg_more(' BODY '.fetch_body($eml));
-	if (my $partial = delete $want->{-partial}) {
+	if (my $partial = $want->{-partial}) {
 		partial_emit($self, $partial, $eml);
 	}
 	$self->msg_more(")\r\n");
diff --git a/t/imapd.t b/t/imapd.t
index 59b95a6b83f..fcbbdc09d31 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -291,7 +291,7 @@ $pi_config->each_inbox(sub {
 		 qr/done_testing;(?:\r\n){1,2}\z/,
 		'BODY[2.1.2] tail matched');
 
-	$ret = $mic->fetch_hash($uidnext, 'BODY[2.HEADER]') or
+	$ret = $mic->fetch_hash("1:$uidnext", 'BODY[2.HEADER]') or
 						BAIL_OUT "2.HEADER $@";
 	like($ret->{$uidnext}->{'BODY[2.HEADER]'},
 		qr/\ADate: Sat, 18 Apr 2020 22:20:20 /,

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 18/82] imap: simplify partial fetch structure
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (16 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 17/82] imap: fix multi-message partial header fetches Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 19/82] imap: support sequence number FETCH Eric Wong
                   ` (64 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

While the contents of normal %want hash keys are bounded in
size, %partial can cause more overhead and lead to repeated sort
calls on multi-message fetches.  So sort it once and use
arrayrefs to make the data structure more compact.
---
 lib/PublicInbox/IMAP.pm | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index fffd611b3c6..673e1646216 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -569,8 +569,8 @@ sub partial_prepare ($$$) {
 
 sub partial_emit ($$$) {
 	my ($self, $partial, $eml) = @_;
-	for my $k (sort keys %$partial) {
-		my ($cb, @args) = @{$partial->{$k}};
+	for (@$partial) {
+		my ($k, $cb, @args) = @$_;
 		my ($offset, $len) = splice(@args, -2);
 		# $cb is partial_body|partial_hdr_get|partial_hdr_not
 		my $str = $cb->($eml, @args) // '';
@@ -607,7 +607,14 @@ sub cmd_uid_fetch ($$$;@) {
 			return "$tag BAD param: $att\r\n";
 		}
 	}
-	$want{-partial} = \%partial if scalar keys %partial;
+
+	# stabilize partial order for consistency and ease-of-debugging:
+	if (scalar keys %partial) {
+		$want{-partial} = [ map {
+			[ $_, @{$partial{$_}} ]
+		} sort keys %partial ];
+	}
+
 	my ($beg, $end);
 	my $msgs = [];
 	if ($range =~ /\A([0-9]+):([0-9]+)\z/s) {

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 19/82] imap: support sequence number FETCH
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (17 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 18/82] imap: simplify partial fetch structure Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 20/82] imap: do not include ".PEEK" in responses Eric Wong
                   ` (63 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We'll return dummy messages for now when sequence numbers go
missing, in case clients can't handle missing messages.
---
 lib/PublicInbox/IMAP.pm | 87 ++++++++++++++++++++++++++++++++++++-----
 t/imapd.t               | 18 +++++++++
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 673e1646216..2aa7ab346c1 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -348,11 +348,34 @@ sub fetch_body ($;$) {
 	join('', @hold);
 }
 
+sub dummy_message ($$) {
+	my ($seqno, $ibx) = @_;
+	my $ret = <<EOF;
+From: nobody\@localhost\r
+To: nobody\@localhost\r
+Date: Thu, 01 Jan 1970 00:00:00 +0000\r
+Message-ID: <dummy-$seqno\@$ibx->{newsgroup}>\r
+Subject: dummy message #$seqno\r
+\r
+You're seeing this message because your IMAP client didn't use UIDs.\r
+The message which used to use this sequence number was likely spam\r
+and removed by the administrator.\r
+EOF
+	\$ret;
+}
+
 sub uid_fetch_cb { # called by git->cat_async
 	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
 	my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
-	$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
+	if (!defined($oid)) {
+		# it's possible to have TOCTOU if an admin runs
+		# public-inbox-(edit|purge), just move onto the next message
+		return unless defined $want->{-seqno};
+		$bref = dummy_message($smsg->{num}, $ibx);
+	} else {
+		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
+	}
 	$$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy
 
 	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
@@ -591,19 +614,19 @@ sub partial_emit ($$$) {
 	}
 }
 
-sub cmd_uid_fetch ($$$;@) {
-	my ($self, $tag, $range, @want) = @_;
+sub fetch_common ($$$$) {
+	my ($self, $tag, $range, $want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
-	if ($want[0] =~ s/\A\(//s) {
-		$want[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n";
+	if ($want->[0] =~ s/\A\(//s) {
+		$want->[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n";
 	}
 	my (%partial, %want);
-	while (defined(my $att = shift @want)) {
+	while (defined(my $att = shift @$want)) {
 		$att = uc($att);
 		my $x = $FETCH_ATT{$att};
 		if ($x) {
 			%want = (%want, %$x);
-		} elsif (!partial_prepare(\%partial, \@want, $att)) {
+		} elsif (!partial_prepare(\%partial, $want, $att)) {
 			return "$tag BAD param: $att\r\n";
 		}
 	}
@@ -629,8 +652,54 @@ sub cmd_uid_fetch ($$$;@) {
 	} else {
 		return "$tag BAD fetch range\r\n";
 	}
-	long_response($self, \&uid_fetch_m, $tag, $ibx,
-				\$beg, $end, $msgs, \%want);
+	[ $tag, $ibx, \$beg, $end, $msgs, \%want ];
+}
+
+sub cmd_uid_fetch ($$$;@) {
+	my ($self, $tag, $range, @want) = @_;
+	my $args = fetch_common($self, $tag, $range, \@want);
+	ref($args) eq 'ARRAY' ?
+		long_response($self, \&uid_fetch_m, @$args) :
+		$args; # error
+}
+
+sub seq_fetch_m { # long_response
+	my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_;
+	if (!@$msgs) { # refill
+		@$msgs = @{$ibx->over->query_xover($$beg, $end)};
+		if (!@$msgs) {
+			$self->write(\"$tag OK Fetch done\r\n");
+			return;
+		}
+		$$beg = $msgs->[-1]->{num} + 1;
+	}
+	my $seq = $want->{-seqno}++;
+	my $cur_num = $msgs->[0]->{num};
+	if ($cur_num == $seq) { # as expected
+		my $git = $ibx->git;
+		$git->cat_async_begin; # TODO: actually make async
+		$git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_);
+		$git->cat_async_wait;
+	} elsif ($cur_num > $seq) {
+		# send dummy messages until $seq catches up to $cur_num
+		my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg';
+		unshift @$msgs, $smsg;
+		my $bref = dummy_message($seq, $ibx);
+		uid_fetch_cb($bref, undef, undef, undef, \@_);
+	} else { # should not happen
+		die "BUG: cur_num=$cur_num < seq=$seq";
+	}
+	1; # more messages on the way
+}
+
+sub cmd_fetch ($$$;@) {
+	my ($self, $tag, $range, @want) = @_;
+	my $args = fetch_common($self, $tag, $range, \@want);
+	ref($args) eq 'ARRAY' ? do {
+		my $want = $args->[-1];
+		$want->{-seqno} = ${$args->[2]}; # $$beg
+		long_response($self, \&seq_fetch_m, @$args)
+	} : $args; # error
 }
 
 sub uid_search_all { # long_response
diff --git a/t/imapd.t b/t/imapd.t
index fcbbdc09d31..fc90948e182 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -306,6 +306,24 @@ Content-Disposition: attachment; filename="embed2x\.eml"\r
 EOF
 }); # each_inbox
 
+# message sequence numbers :<
+is($mic->Uid(0), 0, 'disable UID on '.ref($mic));
+ok($mic->reconnect, 'reconnected');
+$ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@";
+is(scalar keys %$ret, 3, 'got all 3 messages');
+{
+	my $rdr = { 0 => \($ret->{1}->{RFC822}) };
+	my $env = { HOME => $ENV{HOME} };
+	my @cmd = qw(-learn rm --all);
+	run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm');
+}
+my $r2 = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@";
+is(scalar keys %$r2, 3, 'still got all 3 messages');
+like($r2->{1}->{RFC822}, qr/dummy message #1/, 'got dummy message 1');
+is($r2->{2}->{RFC822}, $ret->{2}->{RFC822}, 'message 2 unchanged');
+is($r2->{3}->{RFC822}, $ret->{3}->{RFC822}, 'message 3 unchanged');
+ok($mic->logout, 'logged out');
+
 $td->kill;
 $td->join;
 is($?, 0, 'no error in exited process');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 20/82] imap: do not include ".PEEK" in responses
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (18 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 19/82] imap: support sequence number FETCH Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 21/82] imap: support the CLOSE command Eric Wong
                   ` (62 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

They're not specified in RFC 3501 for responses, and at least
mutt fails to handle it.
---
 lib/PublicInbox/IMAP.pm | 32 ++++++++++++++------------------
 t/imap.t                | 12 ++++++------
 t/imapd.t               | 14 ++++++++++----
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 2aa7ab346c1..54c616eee2f 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -33,9 +33,6 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address;
 sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977?
 
 my %FETCH_NEED_BLOB = ( # for future optimization
-	'BODY.PEEK[HEADER]' => 1,
-	'BODY.PEEK[TEXT]' => 1,
-	'BODY.PEEK[]' => 1,
 	'BODY[HEADER]' => 1,
 	'BODY[TEXT]' => 1,
 	'BODY[]' => 1,
@@ -388,8 +385,8 @@ sub uid_fetch_cb { # called by git->cat_async
 	$want->{INTERNALDATE} and
 		$self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"');
 	$want->{FLAGS} and $self->msg_more(' FLAGS ()');
-	for ('RFC822', 'BODY[]', 'BODY.PEEK[]') {
-		next unless $want->{$_};
+	for ('RFC822', 'BODY[]') {
+		$want->{$_} or next;
 		$self->msg_more(" $_ {".length($$bref)."}\r\n");
 		$self->msg_more($$bref);
 	}
@@ -399,14 +396,14 @@ sub uid_fetch_cb { # called by git->cat_async
 	$want->{ENVELOPE} and
 		$self->msg_more(' ENVELOPE '.eml_envelope($eml));
 
-	for my $f ('RFC822.HEADER', 'BODY[HEADER]', 'BODY.PEEK[HEADER]') {
-		next unless $want->{$f};
-		$self->msg_more(" $f {".length(${$eml->{hdr}})."}\r\n");
+	for ('RFC822.HEADER', 'BODY[HEADER]') {
+		$want->{$_} or next;
+		$self->msg_more(" $_ {".length(${$eml->{hdr}})."}\r\n");
 		$self->msg_more(${$eml->{hdr}});
 	}
-	for my $f ('RFC822.TEXT', 'BODY[TEXT]') {
-		next unless $want->{$f};
-		$self->msg_more(" $f {".length($$bref)."}\r\n");
+	for ('RFC822.TEXT', 'BODY[TEXT]') {
+		$want->{$_} or next;
+		$self->msg_more(" $_ {".length($$bref)."}\r\n");
 		$self->msg_more($$bref);
 	}
 	$want->{BODYSTRUCTURE} and
@@ -567,18 +564,16 @@ sub partial_prepare ($$$) {
 
 	# recombine [ "BODY[1.HEADER.FIELDS", "(foo", "bar)]" ]
 	# back to: "BODY[1.HEADER.FIELDS (foo bar)]"
-	return unless $att =~ /\ABODY(?:\.PEEK)?\[/s;
+	return unless $att =~ /\ABODY\[/s;
 	until (rindex($att, ']') >= 0) {
 		my $next = shift @$want or return;
 		$att .= ' ' . uc($next);
 	}
-	if ($att =~ /\ABODY(?:\.PEEK)?\[
-				([0-9]+(?:\.[0-9]+)*)? # 1 - section_idx
-				(?:\.(HEADER|MIME|TEXT))? # 2 - section_name
+	if ($att =~ /\ABODY\[([0-9]+(?:\.[0-9]+)*)? # 1 - section_idx
+			(?:\.(HEADER|MIME|TEXT))? # 2 - section_name
 			\](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 3, 4
 		$partial->{$att} = [ \&partial_body, $1, $2, $3, $4 ];
-	} elsif ($att =~ /\ABODY(?:\.PEEK)?\[
-				(?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx
+	} elsif ($att =~ /\ABODY\[(?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx
 				(?:HEADER\.FIELDS(\.NOT)?)\x20 # 2
 				\(([A-Z0-9\-\x20]+)\) # 3 - hdrs
 			\](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 4 5
@@ -623,6 +618,7 @@ sub fetch_common ($$$$) {
 	my (%partial, %want);
 	while (defined(my $att = shift @$want)) {
 		$att = uc($att);
+		$att =~ s/\ABODY\.PEEK\[/BODY\[/; # we're read-only
 		my $x = $FETCH_ATT{$att};
 		if ($x) {
 			%want = (%want, %$x);
@@ -633,7 +629,7 @@ sub fetch_common ($$$$) {
 
 	# stabilize partial order for consistency and ease-of-debugging:
 	if (scalar keys %partial) {
-		$want{-partial} = [ map {
+		$want{-partial} = [ map {;
 			[ $_, @{$partial{$_}} ]
 		} sort keys %partial ];
 	}
diff --git a/t/imap.t b/t/imap.t
index 9b64f1646c6..fe6352b678c 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -27,11 +27,11 @@ use PublicInbox::IMAPD;
 {
 	my $partial_prepare = \&PublicInbox::IMAP::partial_prepare;
 	my $x = {};
-	my $r = $partial_prepare->($x, [], my $p = 'BODY.PEEK[9]');
+	my $r = $partial_prepare->($x, [], my $p = 'BODY[9]');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5>');
+	$r = $partial_prepare->($x, [], $p = 'BODY[9]<5>');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5.1>');
+	$r = $partial_prepare->($x, [], $p = 'BODY[9]<5.1>');
 	ok($r, $p);
 	$r = $partial_prepare->($x, [], $p = 'BODY[1.1]');
 	ok($r, $p);
@@ -47,9 +47,9 @@ use PublicInbox::IMAPD;
 	my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get;
 	my $partial_hdr_not = \&PublicInbox::IMAP::partial_hdr_not;
 	is_deeply($x, {
-		'BODY.PEEK[9]' => [ $partial_body, 9, undef, undef, undef ],
-		'BODY.PEEK[9]<5>' => [ $partial_body, 9, undef, 5, undef ],
-		'BODY.PEEK[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ],
+		'BODY[9]' => [ $partial_body, 9, undef, undef, undef ],
+		'BODY[9]<5>' => [ $partial_body, 9, undef, 5, undef ],
+		'BODY[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ],
 		'BODY[1.1]' => [ $partial_body, '1.1', undef, undef, undef ],
 		'BODY[HEADER.FIELDS (DATE FROM)]' => [ $partial_hdr_get,
 					undef, 'DATE FROM', undef, undef ],
diff --git a/t/imapd.t b/t/imapd.t
index fc90948e182..1ec0d5c370f 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -317,11 +317,17 @@ is(scalar keys %$ret, 3, 'got all 3 messages');
 	my @cmd = qw(-learn rm --all);
 	run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm');
 }
-my $r2 = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@";
+my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@";
 is(scalar keys %$r2, 3, 'still got all 3 messages');
-like($r2->{1}->{RFC822}, qr/dummy message #1/, 'got dummy message 1');
-is($r2->{2}->{RFC822}, $ret->{2}->{RFC822}, 'message 2 unchanged');
-is($r2->{3}->{RFC822}, $ret->{3}->{RFC822}, 'message 3 unchanged');
+like($r2->{1}->{'BODY[]'}, qr/dummy message #1/, 'got dummy message 1');
+is($r2->{2}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged');
+is($r2->{3}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged');
+$r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]')
+			or BAIL_OUT "FETCH $@";
+is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},
+	'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n",
+	'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK');
+
 ok($mic->logout, 'logged out');
 
 $td->kill;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 21/82] imap: support the CLOSE command
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (19 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 20/82] imap: do not include ".PEEK" in responses Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 22/82] imap: speed up HEADER.FIELDS[.NOT] range fetches Eric Wong
                   ` (61 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

It seems worthless to support CLOSE for read-only inboxes, but
mutt sends it, so don't return a BAD error with proper use.
---
 lib/PublicInbox/IMAP.pm | 6 ++++++
 t/imapd.t               | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 54c616eee2f..0852ffab868 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -121,6 +121,12 @@ sub cmd_login ($$$$) {
 	login_success($self, $tag);
 }
 
+sub cmd_close ($$) {
+	my ($self, $tag) = @_;
+	delete $self->{ibx} ? "$tag OK Close done\r\n"
+				: "$tag BAD No mailbox\r\n";
+}
+
 sub cmd_logout ($$) {
 	my ($self, $tag) = @_;
 	delete $self->{logged_in};
diff --git a/t/imapd.t b/t/imapd.t
index 1ec0d5c370f..8172a91923d 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -327,7 +327,8 @@ $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]')
 is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},
 	'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n",
 	'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK');
-
+ok($mic->close, 'CLOSE works');
+ok(!$mic->close, 'CLOSE not idempotent');
 ok($mic->logout, 'logged out');
 
 $td->kill;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 22/82] imap: speed up HEADER.FIELDS[.NOT] range fetches
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (20 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 21/82] imap: support the CLOSE command Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 23/82] git: async: flatten the inflight array Eric Wong
                   ` (60 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

While we can't memoize the regexp forever like we do with other
Eml users, we can still benefit from caching regexp compilation
on a per-request basis.

A FETCH request from mutt on a 4K message inbox is around 8%
faster after this.  Since regexp compilation via qr// isn't
unbearably slow, a shared cache probably isn't worth the
trouble of implementing.  A per-request cache seems enough.
---
 lib/PublicInbox/IMAP.pm | 17 ++++++++---------
 t/imap.t                | 10 +++++++---
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 0852ffab868..39667199080 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -544,25 +544,23 @@ sub hdrs_regexp ($) {
 
 # BODY[($SECTION_IDX.)?HEADER.FIELDS.NOT ($HDRS)]<$offset.$bytes>
 sub partial_hdr_not {
-	my ($eml, $section_idx, $hdrs) = @_;
+	my ($eml, $section_idx, $hdrs_re) = @_;
 	if (defined $section_idx) {
 		$eml = eml_body_idx($eml, $section_idx) or return;
 	}
 	my $str = $eml->header_obj->as_string;
-	my $re = hdrs_regexp($hdrs);
-	$str =~ s/$re//g;
+	$str =~ s/$hdrs_re//g;
 	$str .= "\r\n";
 }
 
 # BODY[($SECTION_IDX.)?HEADER.FIELDS ($HDRS)]<$offset.$bytes>
 sub partial_hdr_get {
-	my ($eml, $section_idx, $hdrs) = @_;
+	my ($eml, $section_idx, $hdrs_re) = @_;
 	if (defined $section_idx) {
 		$eml = eml_body_idx($eml, $section_idx) or return;
 	}
 	my $str = $eml->header_obj->as_string;
-	my $re = hdrs_regexp($hdrs);
-	join('', ($str =~ m/($re)/g), "\r\n");
+	join('', ($str =~ m/($hdrs_re)/g), "\r\n");
 }
 
 sub partial_prepare ($$$) {
@@ -583,9 +581,10 @@ sub partial_prepare ($$$) {
 				(?:HEADER\.FIELDS(\.NOT)?)\x20 # 2
 				\(([A-Z0-9\-\x20]+)\) # 3 - hdrs
 			\](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 4 5
-		$partial->{$att} = [ $2 ? \&partial_hdr_not
-					: \&partial_hdr_get,
-					$1, $3, $4, $5 ];
+		my $tmp = $partial->{$att} = [ $2 ? \&partial_hdr_not
+						: \&partial_hdr_get,
+						$1, undef, $4, $5 ];
+		$tmp->[2] = hdrs_regexp($3);
 	} else {
 		undef;
 	}
diff --git a/t/imap.t b/t/imap.t
index fe6352b678c..451b6596bf9 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -46,17 +46,21 @@ use PublicInbox::IMAPD;
 	my $partial_body = \&PublicInbox::IMAP::partial_body;
 	my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get;
 	my $partial_hdr_not = \&PublicInbox::IMAP::partial_hdr_not;
+	my $hdrs_regexp = \&PublicInbox::IMAP::hdrs_regexp;
 	is_deeply($x, {
 		'BODY[9]' => [ $partial_body, 9, undef, undef, undef ],
 		'BODY[9]<5>' => [ $partial_body, 9, undef, 5, undef ],
 		'BODY[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ],
 		'BODY[1.1]' => [ $partial_body, '1.1', undef, undef, undef ],
 		'BODY[HEADER.FIELDS (DATE FROM)]' => [ $partial_hdr_get,
-					undef, 'DATE FROM', undef, undef ],
+					undef, $hdrs_regexp->('DATE FROM'),
+					undef, undef ],
 		'BODY[HEADER.FIELDS.NOT (TO)]' => [ $partial_hdr_not,
-						undef, 'TO', undef, undef ],
+						undef, $hdrs_regexp->('TO'),
+						undef, undef ],
 		'BODY[1.1.HEADER.FIELDS (TO)]' => [ $partial_hdr_get,
-						'1.1', 'TO', undef, undef ],
+						'1.1', $hdrs_regexp->('TO'),
+						undef, undef ],
 	}, 'structure matches expected');
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 23/82] git: async: flatten the inflight array
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (21 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 22/82] imap: speed up HEADER.FIELDS[.NOT] range fetches Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 24/82] git: do our own read buffering for cat-file Eric Wong
                   ` (59 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Small array refs have considerable overhead in Perl, so reduce
AV/SV overhead and instead allow the inflight array to grow
twice as large.
---
 lib/PublicInbox/Git.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 8426cc7d6a7..e1d5c386e7e 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -18,7 +18,7 @@ use base qw(Exporter);
 our @EXPORT_OK = qw(git_unquote git_quote);
 
 use constant MAX_INFLIGHT =>
-	($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF())
+	(($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 2)
 	/
 	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
 
@@ -135,8 +135,8 @@ sub read_cat_in_full ($$) {
 
 sub _cat_async_step ($$) {
 	my ($self, $inflight) = @_;
-	my $pair = shift @$inflight or die 'BUG: inflight empty';
-	my ($cb, $arg) = @$pair;
+	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2;
+	my ($cb, $arg) = splice(@$inflight, 0, 2);
 	local $/ = "\n";
 	my $head = readline($self->{in});
 	$head =~ / missing$/ and return
@@ -314,7 +314,7 @@ sub cat_async ($$$;$) {
 	}
 
 	print { $self->{out} } $oid, "\n" or fail($self, "write error: $!");
-	push(@$inflight, [ $cb, $arg ]);
+	push(@$inflight, $cb, $arg);
 }
 
 sub extract_cmt_time {

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 24/82] git: do our own read buffering for cat-file
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (22 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 23/82] git: async: flatten the inflight array Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 25/82] imap: use git-cat-file asynchronously Eric Wong
                   ` (58 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

To work with our event loop, we must perform read buffering
ourselves or risk starvation, as there doesn't appear to be
a way to check the amount of data buffered in userspace by
by the PerlIO layers without resorting to C or XS.

This lets us perform fewer syscalls at the expense of more Perl
ops.  As it stands, there seems to be a tiny performance
improvement, but more will be possible in the future.
---
 lib/PublicInbox/Git.pm | 82 +++++++++++++++++++++++++++---------------
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index e1d5c386e7e..54c163e8c2f 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -16,6 +16,8 @@ use PublicInbox::Spawn qw(popen_rd);
 use PublicInbox::Tmpfile;
 use base qw(Exporter);
 our @EXPORT_OK = qw(git_unquote git_quote);
+use Errno qw(EINTR);
+our $PIPE_BUFSIZ = 65536; # Linux default
 
 use constant MAX_INFLIGHT =>
 	(($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 2)
@@ -121,32 +123,54 @@ sub _bidi_pipe {
 		fcntl($out_w, 1031, 4096);
 		fcntl($in_r, 1031, 4096) if $batch eq '--batch-check';
 	}
+	$self->{$batch} = \(my $rbuf = '');
 	$self->{$out} = $out_w;
 	$self->{$in} = $in_r;
 }
 
-sub read_cat_in_full ($$) {
-	my ($self, $len) = @_;
-	++$len; # for final "\n" added by git
-	read($self->{in}, my $buf, $len) == $len or fail($self, 'short read');
-	chop($buf) eq "\n" or fail($self, 'newline missing after blob');
-	\$buf;
+sub my_read ($$$) {
+	my ($fh, $rbuf, $len) = @_;
+	my $left = $len - length($$rbuf);
+	my $r;
+	while ($left > 0) {
+		$r = sysread($fh, $$rbuf, $PIPE_BUFSIZ, length($$rbuf));
+		if ($r) {
+			$left -= $r;
+		} else {
+			next if (!defined($r) && $! == EINTR);
+			return $r;
+		}
+	}
+	\substr($$rbuf, 0, $len, '');
+}
+
+sub my_readline ($$) {
+	my ($fh, $rbuf) = @_;
+	while (1) {
+		if ((my $n = index($$rbuf, "\n")) >= 0) {
+			return substr($$rbuf, 0, $n + 1, '');
+		}
+		my $r = sysread($fh, $$rbuf, $PIPE_BUFSIZ, length($$rbuf));
+		next if $r || (!defined($r) && $! == EINTR);
+		return defined($r) ? '' : undef; # EOF or error
+	}
 }
 
 sub _cat_async_step ($$) {
 	my ($self, $inflight) = @_;
 	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2;
 	my ($cb, $arg) = splice(@$inflight, 0, 2);
-	local $/ = "\n";
-	my $head = readline($self->{in});
+	my $head = my_readline($self->{in}, $self->{'--batch'});
 	$head =~ / missing$/ and return
 		eval { $cb->(undef, undef, undef, undef, $arg) };
 
 	$head =~ /^([0-9a-f]{40}) (\S+) ([0-9]+)$/ or
 		fail($self, "Unexpected result from async git cat-file: $head");
 	my ($oid_hex, $type, $size) = ($1, $2, $3 + 0);
-	my $bref = read_cat_in_full($self, $size);
-	eval { $cb->($bref, $oid_hex, $type, $size, $arg) };
+	my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1);
+	fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret;
+	chop($$ret) eq "\n" or fail($self, 'newline missing after blob');
+	eval { $cb->($ret, $oid_hex, $type, $size, $arg) };
 	warn "E: $oid_hex $@\n" if $@;
 }
 
@@ -158,16 +182,18 @@ sub cat_async_wait ($) {
 	}
 }
 
+sub batch_prepare ($) {
+	_bidi_pipe($_[0], qw(--batch in out pid));
+}
+
 sub cat_file {
-	my ($self, $obj, $ref) = @_;
+	my ($self, $obj, $sizeref) = @_;
 	my ($retried, $head);
 	cat_async_wait($self);
 again:
 	batch_prepare($self);
 	print { $self->{out} } $obj, "\n" or fail($self, "write error: $!");
-
-	local $/ = "\n";
-	$head = readline($self->{in});
+	$head = my_readline($self->{in}, $self->{'--batch'});
 	if ($head =~ / missing$/) {
 		if (!$retried && alternates_changed($self)) {
 			$retried = 1;
@@ -179,19 +205,19 @@ again:
 	$head =~ /^[0-9a-f]{40} \S+ ([0-9]+)$/ or
 		fail($self, "Unexpected result from git cat-file: $head");
 
-	my $size = $1;
-	$$ref = $size if $ref;
-	read_cat_in_full($self, $size);
+	my $size = $1 + 0;
+	$$sizeref = $size if $sizeref;
+	my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1);
+	fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret;
+	chop($$ret) eq "\n" or fail($self, 'newline missing after blob');
+	$ret;
 }
 
-sub batch_prepare ($) { _bidi_pipe($_[0], qw(--batch in out pid)) }
-
 sub check {
 	my ($self, $obj) = @_;
 	_bidi_pipe($self, qw(--batch-check in_c out_c pid_c err_c));
 	print { $self->{out_c} } $obj, "\n" or fail($self, "write error: $!");
-	local $/ = "\n";
-	chomp(my $line = readline($self->{in_c}));
+	chomp(my $line = my_readline($self->{in_c}, $self->{'--batch-check'}));
 	my ($hex, $type, $size) = split(' ', $line);
 
 	# Future versions of git.git may show 'ambiguous', but for now,
@@ -201,9 +227,9 @@ sub check {
 	return if $type eq 'missing' || $type eq 'ambiguous';
 
 	if ($hex eq 'dangling' || $hex eq 'notdir' || $hex eq 'loop') {
-		$size = $type + length("\n");
-		my $r = read($self->{in_c}, my $buf, $size);
-		defined($r) or fail($self, "read failed: $!");
+		my $ret = my_read($self->{in_c}, $self->{'--batch-check'},
+				$type + 1);
+		fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret;
 		return;
 	}
 
@@ -211,9 +237,9 @@ sub check {
 }
 
 sub _destroy {
-	my ($self, $in, $out, $pid, $err) = @_;
+	my ($self, $batch, $in, $out, $pid, $err) = @_;
 	my $p = delete $self->{$pid} or return;
-	delete @$self{($in, $out)};
+	delete @$self{($batch, $in, $out)};
 	delete $self->{$err} if $err; # `err_c'
 
 	# PublicInbox::DS may not be loaded
@@ -251,8 +277,8 @@ sub qx {
 # returns true if there are pending "git cat-file" processes
 sub cleanup {
 	my ($self) = @_;
-	_destroy($self, qw(in out pid));
-	_destroy($self, qw(in_c out_c pid_c err_c));
+	_destroy($self, qw(--batch in out pid));
+	_destroy($self, qw(--batch-check in_c out_c pid_c err_c));
 	!!($self->{pid} || $self->{pid_c});
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 25/82] imap: use git-cat-file asynchronously
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (23 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 24/82] git: do our own read buffering for cat-file Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 26/82] git: idle rbuf for async Eric Wong
                   ` (57 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This ought to improve overall performance with multiple clients.
Single client performance suffers a tiny bit due to extra
syscall overhead from epoll.

This also makes the existing async interface easier-to-use,
since calling cat_async_begin is no longer required.
---
 MANIFEST                       |  1 +
 lib/PublicInbox/Git.pm         | 10 +++----
 lib/PublicInbox/GitAsyncCat.pm | 49 ++++++++++++++++++++++++++++++++++
 lib/PublicInbox/IMAP.pm        | 49 ++++++++++++++++++----------------
 lib/PublicInbox/Inbox.pm       | 16 ++++++++---
 t/git.t                        |  1 -
 xt/cmp-msgstr.t                |  1 -
 xt/cmp-msgview.t               |  1 -
 xt/eml_check_limits.t          |  1 -
 xt/git_async_cmp.t             |  1 -
 xt/msgtime_cmp.t               |  1 -
 xt/perf-msgview.t              |  1 -
 12 files changed, 93 insertions(+), 39 deletions(-)
 create mode 100644 lib/PublicInbox/GitAsyncCat.pm

diff --git a/MANIFEST b/MANIFEST
index 2e4ec915412..82f047f3623 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -119,6 +119,7 @@ lib/PublicInbox/Filter/SubjectTag.pm
 lib/PublicInbox/Filter/Vger.pm
 lib/PublicInbox/GetlineBody.pm
 lib/PublicInbox/Git.pm
+lib/PublicInbox/GitAsyncCat.pm
 lib/PublicInbox/GitHTTPBackend.pm
 lib/PublicInbox/GzipFilter.pm
 lib/PublicInbox/HTTP.pm
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 54c163e8c2f..c5a3fa4642c 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -156,7 +156,7 @@ sub my_readline ($$) {
 	}
 }
 
-sub _cat_async_step ($$) {
+sub cat_async_step ($$) {
 	my ($self, $inflight) = @_;
 	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2;
 	my ($cb, $arg) = splice(@$inflight, 0, 2);
@@ -178,7 +178,7 @@ sub cat_async_wait ($) {
 	my ($self) = @_;
 	my $inflight = delete $self->{inflight} or return;
 	while (scalar(@$inflight)) {
-		_cat_async_step($self, $inflight);
+		cat_async_step($self, $inflight);
 	}
 }
 
@@ -277,6 +277,7 @@ sub qx {
 # returns true if there are pending "git cat-file" processes
 sub cleanup {
 	my ($self) = @_;
+	cat_async_wait($self);
 	_destroy($self, qw(--batch in out pid));
 	_destroy($self, qw(--batch-check in_c out_c pid_c err_c));
 	!!($self->{pid} || $self->{pid_c});
@@ -334,9 +335,9 @@ sub cat_async_begin {
 
 sub cat_async ($$$;$) {
 	my ($self, $oid, $cb, $arg) = @_;
-	my $inflight = $self->{inflight} or die 'BUG: not in async';
+	my $inflight = $self->{inflight} // cat_async_begin($self);
 	if (scalar(@$inflight) >= MAX_INFLIGHT) {
-		_cat_async_step($self, $inflight);
+		cat_async_step($self, $inflight);
 	}
 
 	print { $self->{out} } $oid, "\n" or fail($self, "write error: $!");
@@ -358,7 +359,6 @@ sub modified ($) {
 	my ($self) = @_;
 	my $modified = 0;
 	my $fh = popen($self, qw(rev-parse --branches));
-	cat_async_begin($self);
 	local $/ = "\n";
 	while (my $oid = <$fh>) {
 		chomp $oid;
diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm
new file mode 100644
index 00000000000..f168169f68d
--- /dev/null
+++ b/lib/PublicInbox/GitAsyncCat.pm
@@ -0,0 +1,49 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# internal class used by PublicInbox::Git + Danga::Socket
+# This parses the output pipe of "git cat-file --batch"
+#
+# Note: this does NOT set the non-blocking flag, we expect `git cat-file'
+# to be a local process, and git won't start writing a blob until it's
+# fully read.  So minimize context switching and read as much as possible
+# and avoid holding a buffer in our heap any longer than it has to live.
+package PublicInbox::GitAsyncCat;
+use strict;
+use parent qw(PublicInbox::DS Exporter);
+use fields qw(git);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
+our @EXPORT = qw(git_async_msg);
+
+sub new {
+	my ($class, $git) = @_;
+	my $self = fields::new($class);
+	$git->batch_prepare;
+	$self->SUPER::new($git->{in}, EPOLLIN|EPOLLET);
+	$self->{git} = $git;
+	$self;
+}
+
+sub event_step {
+	my ($self) = @_;
+	my $git = $self->{git} or return; # ->close-ed
+	my $inflight = $git->{inflight};
+	if (@$inflight) {
+		$git->cat_async_step($inflight);
+		$self->requeue if @$inflight || length(${$git->{'--batch'}});
+	}
+}
+
+sub close {
+	my ($self) = @_;
+	delete $self->{git};
+	$self->SUPER::close; # PublicInbox::DS::close
+}
+
+sub git_async_msg ($$$$) {
+	my ($ibx, $smsg, $cb, $arg) = @_;
+	$ibx->git->cat_async($smsg->{blob}, $cb, $arg);
+	$ibx->{async_cat} //= new(__PACKAGE__, $ibx->{git});
+}
+
+1;
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 39667199080..2282e3ce78c 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -21,8 +21,10 @@ use PublicInbox::Eml;
 use PublicInbox::EmlContentFoo qw(parse_content_disposition);
 use PublicInbox::DS qw(now);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
+use PublicInbox::GitAsyncCat;
 use Text::ParseWords qw(parse_line);
 use Errno qw(EAGAIN);
+
 my $Address;
 for my $mod (qw(Email::Address::XS Mail::Address)) {
 	eval "require $mod" or next;
@@ -367,14 +369,30 @@ EOF
 	\$ret;
 }
 
-sub uid_fetch_cb { # called by git->cat_async
+sub requeue_once ($) {
+	my ($self) = @_;
+	# COMPRESS users all share the same DEFLATE context.
+	# Flush it here to ensure clients don't see
+	# each other's data
+	$self->zflush;
+
+	# no recursion, schedule another call ASAP,
+	# but only after all pending writes are done.
+	# autovivify wbuf:
+	my $new_size = push(@{$self->{wbuf}}, \&long_step);
+
+	# wbuf may be populated by $cb, no need to rearm if so:
+	$self->requeue if $new_size == 1;
+}
+
+sub uid_fetch_cb { # called by git->cat_async via git_async_msg
 	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
 	my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
-		return unless defined $want->{-seqno};
+		return requeue_once($self) unless defined $want->{-seqno};
 		$bref = dummy_message($smsg->{num}, $ibx);
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
@@ -420,6 +438,7 @@ sub uid_fetch_cb { # called by git->cat_async
 		partial_emit($self, $partial, $eml);
 	}
 	$self->msg_more(")\r\n");
+	requeue_once($self);
 }
 
 sub uid_fetch_m { # long_response
@@ -432,11 +451,7 @@ sub uid_fetch_m { # long_response
 		}
 		$$beg = $msgs->[-1]->{num} + 1;
 	}
-	my $git = $ibx->git;
-	$git->cat_async_begin; # TODO: actually make async
-	$git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_);
-	$git->cat_async_wait;
-	1;
+	git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_);
 }
 
 sub cmd_status ($$$;@) {
@@ -677,20 +692,17 @@ sub seq_fetch_m { # long_response
 	my $seq = $want->{-seqno}++;
 	my $cur_num = $msgs->[0]->{num};
 	if ($cur_num == $seq) { # as expected
-		my $git = $ibx->git;
-		$git->cat_async_begin; # TODO: actually make async
-		$git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_);
-		$git->cat_async_wait;
+		git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_);
 	} elsif ($cur_num > $seq) {
 		# send dummy messages until $seq catches up to $cur_num
 		my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg';
 		unshift @$msgs, $smsg;
 		my $bref = dummy_message($seq, $ibx);
 		uid_fetch_cb($bref, undef, undef, undef, \@_);
+		$smsg; # blessed response since uid_fetch_cb requeues
 	} else { # should not happen
 		die "BUG: cur_num=$cur_num < seq=$seq";
 	}
-	1; # more messages on the way
 }
 
 sub cmd_fetch ($$$;@) {
@@ -810,17 +822,8 @@ sub long_step {
 	} elsif ($more) { # $self->{wbuf}:
 		$self->update_idle_time;
 
-		# COMPRESS users all share the same DEFLATE context.
-		# Flush it here to ensure clients don't see
-		# each other's data
-		$self->zflush;
-
-		# no recursion, schedule another call ASAP, but only after
-		# all pending writes are done.  autovivify wbuf:
-		my $new_size = push(@{$self->{wbuf}}, \&long_step);
-
-		# wbuf may be populated by $cb, no need to rearm if so:
-		$self->requeue if $new_size == 1;
+		# control passed to $more may be a GitAsyncCat object
+		requeue_once($self) if !ref($more);
 	} else { # all done!
 		delete $self->{long_cb};
 		my $elapsed = now() - $t0;
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index b250bef33d3..407751c3063 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -18,6 +18,16 @@ my $cleanup_timer;
 my $cleanup_avail = -1; # 0, or 1
 my $have_devel_peek;
 my $CLEANUP = {}; # string(inbox) -> inbox
+
+sub git_cleanup ($) {
+	my ($self) = @_;
+	my $git = $self->{git} or return;
+	if (my $async_cat = delete $self->{async_cat}) {
+		$async_cat->close;
+	}
+	$git->cleanup;
+}
+
 sub cleanup_task () {
 	$cleanup_timer = undef;
 	my $next = {};
@@ -32,9 +42,7 @@ sub cleanup_task () {
 				# refcnt is zero when tmp is out-of-scope
 			}
 		}
-		if (my $git = $ibx->{git}) {
-			$again = $git->cleanup;
-		}
+		git_cleanup($ibx);
 		if (my $gits = $ibx->{-repo_objs}) {
 			foreach my $git (@$gits) {
 				$again = 1 if $git->cleanup;
@@ -157,7 +165,7 @@ sub max_git_epoch {
 	my $cur = $self->{-max_git_epoch};
 	my $changed = git($self)->alternates_changed;
 	if (!defined($cur) || $changed) {
-		$self->git->cleanup if $changed;
+		git_cleanup($self) if $changed;
 		my $gits = "$self->{inboxdir}/git";
 		if (opendir my $dh, $gits) {
 			my $max = -1;
diff --git a/t/git.t b/t/git.t
index b05ac123bec..0b2089ba701 100644
--- a/t/git.t
+++ b/t/git.t
@@ -40,7 +40,6 @@ use_ok 'PublicInbox::Git';
 	my $arg = { 'foo' => 'bar' };
 	my $res = [];
 	my $missing = [];
-	$gcf->cat_async_begin;
 	$gcf->cat_async($oid, sub {
 		my ($bref, $oid_hex, $type, $size, $arg) = @_;
 		$res = [ @_ ];
diff --git a/xt/cmp-msgstr.t b/xt/cmp-msgstr.t
index 6bae0f66e83..0276f8454b0 100644
--- a/xt/cmp-msgstr.t
+++ b/xt/cmp-msgstr.t
@@ -93,7 +93,6 @@ my $git_cb = sub {
 		diag xqx([qw(git diff), "$tmpdir/mime", "$tmpdir/eml"]);
 	}
 };
-$git->cat_async_begin;
 my $t = timeit(1, sub {
 	while (<$fh>) {
 		my ($oid, $type) = split / /;
diff --git a/xt/cmp-msgview.t b/xt/cmp-msgview.t
index 66fb467eb1a..5bd7aa174df 100644
--- a/xt/cmp-msgview.t
+++ b/xt/cmp-msgview.t
@@ -77,7 +77,6 @@ my $git_cb = sub {
 		is_deeply($eml_cmp, $mime_cmp, "$inboxdir $oid match");
 	}
 };
-$git->cat_async_begin;
 my $t = timeit(1, sub {
 	while (<$fh>) {
 		my ($oid, $type) = split / /;
diff --git a/xt/eml_check_limits.t b/xt/eml_check_limits.t
index 39de047645b..2d632799956 100644
--- a/xt/eml_check_limits.t
+++ b/xt/eml_check_limits.t
@@ -55,7 +55,6 @@ my $cat_cb = sub {
 };
 
 my $t = timeit(1, sub {
-	$git->cat_async_begin;
 	my ($blob, $type);
 	while (<$fh>) {
 		($blob, $type) = split / /;
diff --git a/xt/git_async_cmp.t b/xt/git_async_cmp.t
index 46d27b26bed..8f8d1cf4d01 100644
--- a/xt/git_async_cmp.t
+++ b/xt/git_async_cmp.t
@@ -25,7 +25,6 @@ my $async = timeit($nr, sub {
 		$dig->add($$bref);
 	};
 	my $cat = $git->popen(@cat);
-	$git->cat_async_begin;
 
 	while (<$cat>) {
 		my ($oid, undef, undef) = split(/ /);
diff --git a/xt/msgtime_cmp.t b/xt/msgtime_cmp.t
index 95d7c64bec6..0ce3c042217 100644
--- a/xt/msgtime_cmp.t
+++ b/xt/msgtime_cmp.t
@@ -59,7 +59,6 @@ sub compare {
 }
 
 my $fh = $git->popen(@cat);
-$git->cat_async_begin;
 while (<$fh>) {
 	my ($oid, $type) = split / /;
 	next if $type ne 'blob';
diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t
index 30fc07dcbc8..d99101a30de 100644
--- a/xt/perf-msgview.t
+++ b/xt/perf-msgview.t
@@ -44,7 +44,6 @@ my $cb = sub {
 	$obuf = '';
 };
 
-$git->cat_async_begin;
 my $t = timeit(1, sub {
 	$ctx->{obuf} = \$obuf;
 	$ctx->{mhref} = '../';

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 26/82] git: idle rbuf for async
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (24 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 25/82] imap: use git-cat-file asynchronously Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 27/82] imap: support LSUB command Eric Wong
                   ` (56 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We do this for the C10K-oriented HTTP/NNTP/IMAP processes, and
we may support thousands of git-cat-file processes in the
future.
---
 lib/PublicInbox/Git.pm         | 47 ++++++++++++++++++----------------
 lib/PublicInbox/GitAsyncCat.pm |  2 +-
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index c5a3fa4642c..1c148086f46 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -123,7 +123,6 @@ sub _bidi_pipe {
 		fcntl($out_w, 1031, 4096);
 		fcntl($in_r, 1031, 4096) if $batch eq '--batch-check';
 	}
-	$self->{$batch} = \(my $rbuf = '');
 	$self->{$out} = $out_w;
 	$self->{$in} = $in_r;
 }
@@ -160,18 +159,20 @@ sub cat_async_step ($$) {
 	my ($self, $inflight) = @_;
 	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2;
 	my ($cb, $arg) = splice(@$inflight, 0, 2);
-	my $head = my_readline($self->{in}, $self->{'--batch'});
-	$head =~ / missing$/ and return
-		eval { $cb->(undef, undef, undef, undef, $arg) };
-
-	$head =~ /^([0-9a-f]{40}) (\S+) ([0-9]+)$/ or
+	my $rbuf = delete($self->{cat_rbuf}) // \(my $new = '');
+	my ($bref, $oid, $type, $size);
+	my $head = my_readline($self->{in}, $rbuf);
+	if ($head =~ /^([0-9a-f]{40}) (\S+) ([0-9]+)$/) {
+		($oid, $type, $size) = ($1, $2, $3 + 0);
+		$bref = my_read($self->{in}, $rbuf, $size + 1) or
+			fail($self, defined($bref) ? 'read EOF' : "read: $!");
+		chop($$bref) eq "\n" or fail($self, 'LF missing after blob');
+	} elsif ($head !~ / missing$/) {
 		fail($self, "Unexpected result from async git cat-file: $head");
-	my ($oid_hex, $type, $size) = ($1, $2, $3 + 0);
-	my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1);
-	fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret;
-	chop($$ret) eq "\n" or fail($self, 'newline missing after blob');
-	eval { $cb->($ret, $oid_hex, $type, $size, $arg) };
-	warn "E: $oid_hex $@\n" if $@;
+	}
+	eval { $cb->($bref, $oid, $type, $size, $arg) };
+	$self->{cat_rbuf} = $rbuf if $$rbuf ne '';
+	warn "E: $oid: $@\n" if $@;
 }
 
 sub cat_async_wait ($) {
@@ -188,12 +189,13 @@ sub batch_prepare ($) {
 
 sub cat_file {
 	my ($self, $obj, $sizeref) = @_;
-	my ($retried, $head);
+	my ($retried, $head, $rbuf);
 	cat_async_wait($self);
 again:
 	batch_prepare($self);
+	$rbuf = delete($self->{cat_rbuf}) // \(my $new = '');
 	print { $self->{out} } $obj, "\n" or fail($self, "write error: $!");
-	$head = my_readline($self->{in}, $self->{'--batch'});
+	$head = my_readline($self->{in}, $rbuf);
 	if ($head =~ / missing$/) {
 		if (!$retried && alternates_changed($self)) {
 			$retried = 1;
@@ -207,7 +209,8 @@ again:
 
 	my $size = $1 + 0;
 	$$sizeref = $size if $sizeref;
-	my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1);
+	my $ret = my_read($self->{in}, $rbuf, $size + 1);
+	$self->{cat_rbuf} = $rbuf if $$rbuf ne '';
 	fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret;
 	chop($$ret) eq "\n" or fail($self, 'newline missing after blob');
 	$ret;
@@ -217,7 +220,8 @@ sub check {
 	my ($self, $obj) = @_;
 	_bidi_pipe($self, qw(--batch-check in_c out_c pid_c err_c));
 	print { $self->{out_c} } $obj, "\n" or fail($self, "write error: $!");
-	chomp(my $line = my_readline($self->{in_c}, $self->{'--batch-check'}));
+	my $rbuf = ''; # TODO: async + {chk_rbuf}
+	chomp(my $line = my_readline($self->{in_c}, \$rbuf));
 	my ($hex, $type, $size) = split(' ', $line);
 
 	# Future versions of git.git may show 'ambiguous', but for now,
@@ -227,8 +231,7 @@ sub check {
 	return if $type eq 'missing' || $type eq 'ambiguous';
 
 	if ($hex eq 'dangling' || $hex eq 'notdir' || $hex eq 'loop') {
-		my $ret = my_read($self->{in_c}, $self->{'--batch-check'},
-				$type + 1);
+		my $ret = my_read($self->{in_c}, \$rbuf, $type + 1);
 		fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret;
 		return;
 	}
@@ -237,9 +240,9 @@ sub check {
 }
 
 sub _destroy {
-	my ($self, $batch, $in, $out, $pid, $err) = @_;
+	my ($self, $rbuf, $in, $out, $pid, $err) = @_;
 	my $p = delete $self->{$pid} or return;
-	delete @$self{($batch, $in, $out)};
+	delete @$self{($rbuf, $in, $out)};
 	delete $self->{$err} if $err; # `err_c'
 
 	# PublicInbox::DS may not be loaded
@@ -278,8 +281,8 @@ sub qx {
 sub cleanup {
 	my ($self) = @_;
 	cat_async_wait($self);
-	_destroy($self, qw(--batch in out pid));
-	_destroy($self, qw(--batch-check in_c out_c pid_c err_c));
+	_destroy($self, qw(cat_rbuf in out pid));
+	_destroy($self, qw(chk_rbuf in_c out_c pid_c err_c));
 	!!($self->{pid} || $self->{pid_c});
 }
 
diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm
index f168169f68d..65e16121969 100644
--- a/lib/PublicInbox/GitAsyncCat.pm
+++ b/lib/PublicInbox/GitAsyncCat.pm
@@ -30,7 +30,7 @@ sub event_step {
 	my $inflight = $git->{inflight};
 	if (@$inflight) {
 		$git->cat_async_step($inflight);
-		$self->requeue if @$inflight || length(${$git->{'--batch'}});
+		$self->requeue if @$inflight || exists $git->{cat_rbuf};
 	}
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 27/82] imap: support LSUB command
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (25 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 26/82] git: idle rbuf for async Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 28/82] imap: FETCH: support comma-delimited ranges Eric Wong
                   ` (55 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Since we only support read-only operation, we can't save
subscriptions requested by clients.  So just list no inboxes as
subscribed, some MUAs may blindly try to fetch everything its
subscribed to.
---
 lib/PublicInbox/IMAP.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 2282e3ce78c..51ab8b8c2f2 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -497,6 +497,11 @@ sub cmd_list ($$$$) {
 	\(join('', @$l, "$tag OK List done\r\n"));
 }
 
+sub cmd_lsub ($$$$) {
+	my (undef, $tag) = @_; # same args as cmd_list
+	"$tag OK Lsub done\r\n";
+}
+
 sub eml_index_offs_i { # PublicInbox::Eml::each_part callback
 	my ($p, $all) = @_;
 	my ($eml, undef, $idx) = @$p;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 28/82] imap: FETCH: support comma-delimited ranges
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (26 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 27/82] imap: support LSUB command Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 29/82] add imapd compression test Eric Wong
                   ` (54 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

The RFC 3501 `sequence-set' definition allows comma-delimited
ranges, so we'll support it in case clients send them.

Coalescing overlapping ranges isn't required, so we won't
support it as such an attempt to save bandwidth would waste
memory on the server, instead.
---
 lib/PublicInbox/IMAP.pm | 92 +++++++++++++++++++++++++----------------
 t/imapd.t               |  2 +
 2 files changed, 59 insertions(+), 35 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 51ab8b8c2f2..917833f7c47 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -61,6 +61,9 @@ for my $att (keys %FETCH_ATT) {
 	$FETCH_ATT{$att} = \%h;
 }
 
+my $valid_range = '[0-9]+|[0-9]+:[0-9]+|[0-9]+:\*';
+$valid_range = qr/\A(?:$valid_range)(?:,(?:$valid_range))*\z/;
+
 sub greet ($) {
 	my ($self) = @_;
 	my $capa = capa($self);
@@ -387,7 +390,7 @@ sub requeue_once ($) {
 
 sub uid_fetch_cb { # called by git->cat_async via git_async_msg
 	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
-	my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg;
+	my ($self, undef, $ibx, $msgs, undef, $want) = @$fetch_m_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
@@ -441,15 +444,48 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_msg
 	requeue_once($self);
 }
 
+sub range_step ($$) {
+	my ($ibx, $range_csv) = @_;
+	my ($beg, $end, $range);
+	if ($$range_csv =~ s/\A([^,]+),//) {
+		$range = $1;
+	} else {
+		$range = $$range_csv;
+		$$range_csv = undef;
+	}
+	if ($range =~ /\A([0-9]+):([0-9]+)\z/) {
+		($beg, $end) = ($1, $2);
+	} elsif ($range =~ /\A([0-9]+):\*\z/) {
+		($beg, $end) =  ($1, $ibx->mm->max // 0);
+	} elsif ($range =~ /\A[0-9]+\z/) {
+		$beg = $end = $range;
+	} else {
+		return 'BAD fetch range';
+	}
+	[ $beg, $end, $$range_csv ];
+}
+
+sub refill_range ($$$) {
+	my ($ibx, $msgs, $range_info) = @_;
+	my ($beg, $end, $range_csv) = @$range_info;
+	if (scalar(@$msgs = @{$ibx->over->query_xover($beg, $end)})) {
+		$range_info->[0] = $msgs->[-1]->{num} + 1;
+		return;
+	}
+	return 'OK Fetch done' if !$range_csv;
+	my $next_range = range_step($ibx, \$range_csv);
+	return $next_range if !ref($next_range); # error
+	@$range_info = @$next_range;
+	undef; # keep looping
+}
+
 sub uid_fetch_m { # long_response
-	my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_;
-	if (!@$msgs) { # refill
-		@$msgs = @{$ibx->over->query_xover($$beg, $end)};
-		if (!@$msgs) {
-			$self->write(\"$tag OK Fetch done\r\n");
+	my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_;
+	while (!@$msgs) { # rare
+		if (my $end = refill_range($ibx, $msgs, $range_info)) {
+			$self->write(\"$tag $end\r\n");
 			return;
 		}
-		$$beg = $msgs->[-1]->{num} + 1;
 	}
 	git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_);
 }
@@ -635,7 +671,7 @@ sub partial_emit ($$$) {
 }
 
 sub fetch_common ($$$$) {
-	my ($self, $tag, $range, $want) = @_;
+	my ($self, $tag, $range_csv, $want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
 	if ($want->[0] =~ s/\A\(//s) {
 		$want->[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n";
@@ -658,41 +694,27 @@ sub fetch_common ($$$$) {
 			[ $_, @{$partial{$_}} ]
 		} sort keys %partial ];
 	}
-
-	my ($beg, $end);
-	my $msgs = [];
-	if ($range =~ /\A([0-9]+):([0-9]+)\z/s) {
-		($beg, $end) = ($1, $2);
-	} elsif ($range =~ /\A([0-9]+):\*\z/s) {
-		($beg, $end) =  ($1, $ibx->mm->max // 0);
-	} elsif ($range =~ /\A[0-9]+\z/) {
-		my $smsg = $ibx->over->get_art($range) or
-			return "$tag OK Fetch done\r\n"; # really OK(!)
-		push @$msgs, $smsg;
-		($beg, $end) = ($range, 0);
-	} else {
-		return "$tag BAD fetch range\r\n";
-	}
-	[ $tag, $ibx, \$beg, $end, $msgs, \%want ];
+	$range_csv = 'bad' if $range_csv !~ $valid_range;
+	my $range_info = range_step($ibx, \$range_csv);
+	return "$tag $range_info\r\n" if !ref($range_info);
+	[ $tag, $ibx, [], $range_info, \%want ];
 }
 
 sub cmd_uid_fetch ($$$;@) {
-	my ($self, $tag, $range, @want) = @_;
-	my $args = fetch_common($self, $tag, $range, \@want);
+	my ($self, $tag, $range_csv, @want) = @_;
+	my $args = fetch_common($self, $tag, $range_csv, \@want);
 	ref($args) eq 'ARRAY' ?
 		long_response($self, \&uid_fetch_m, @$args) :
 		$args; # error
 }
 
 sub seq_fetch_m { # long_response
-	my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_;
-	if (!@$msgs) { # refill
-		@$msgs = @{$ibx->over->query_xover($$beg, $end)};
-		if (!@$msgs) {
-			$self->write(\"$tag OK Fetch done\r\n");
+	my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_;
+	while (!@$msgs) { # rare
+		if (my $end = refill_range($ibx, $msgs, $range_info)) {
+			$self->write(\"$tag $end\r\n");
 			return;
 		}
-		$$beg = $msgs->[-1]->{num} + 1;
 	}
 	my $seq = $want->{-seqno}++;
 	my $cur_num = $msgs->[0]->{num};
@@ -711,11 +733,11 @@ sub seq_fetch_m { # long_response
 }
 
 sub cmd_fetch ($$$;@) {
-	my ($self, $tag, $range, @want) = @_;
-	my $args = fetch_common($self, $tag, $range, \@want);
+	my ($self, $tag, $range_csv, @want) = @_;
+	my $args = fetch_common($self, $tag, $range_csv, \@want);
 	ref($args) eq 'ARRAY' ? do {
 		my $want = $args->[-1];
-		$want->{-seqno} = ${$args->[2]}; # $$beg
+		$want->{-seqno} = $args->[3]->[0]; # $beg == $range_info->[0];
 		long_response($self, \&seq_fetch_m, @$args)
 	} : $args; # error
 }
diff --git a/t/imapd.t b/t/imapd.t
index 8172a91923d..7af14f1b150 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -309,6 +309,8 @@ EOF
 # message sequence numbers :<
 is($mic->Uid(0), 0, 'disable UID on '.ref($mic));
 ok($mic->reconnect, 'reconnected');
+$ret = $mic->fetch_hash('1,2:3', 'RFC822') or BAIL_OUT "FETCH $@";
+is(scalar keys %$ret, 3, 'got all 3 messages with comma-separated sequence');
 $ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@";
 is(scalar keys %$ret, 3, 'got all 3 messages');
 {

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 29/82] add imapd compression test
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (27 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 28/82] imap: FETCH: support comma-delimited ranges Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 30/82] testcommon: tcp_(server|connect): BAIL_OUT on failure Eric Wong
                   ` (53 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Include a test for Mail::IMAPTalk, here, since Mail::IMAPClient
stalls with compression enabled:
	https://rt.cpan.org/Ticket/Display.html?id=132720
---
 MANIFEST                |  1 +
 xt/cmp-imapd-compress.t | 83 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)
 create mode 100644 xt/cmp-imapd-compress.t

diff --git a/MANIFEST b/MANIFEST
index 82f047f3623..f74852b6e90 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -346,6 +346,7 @@ t/www_listing.t
 t/www_static.t
 t/x-unknown-alpine.eml
 t/xcpdb-reshard.t
+xt/cmp-imapd-compress.t
 xt/cmp-msgstr.t
 xt/cmp-msgview.t
 xt/eml_check_limits.t
diff --git a/xt/cmp-imapd-compress.t b/xt/cmp-imapd-compress.t
new file mode 100644
index 00000000000..b12cf74e2d5
--- /dev/null
+++ b/xt/cmp-imapd-compress.t
@@ -0,0 +1,83 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+require_mods('Data::Dumper');
+Data::Dumper->import('Dumper');
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+(defined($inboxdir) && -d $inboxdir) or
+	plan skip_all => "GIANT_INBOX_DIR not defined for $0";
+plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $cfg = "$tmpdir/cfg";
+my $mailbox = 'inbox.test';
+{
+	open my $fh, '>', $cfg or BAIL_OUT "open: $!";
+	print $fh <<EOF or BAIL_OUT "print: $!";
+[publicinbox "test"]
+	newsgroup = $mailbox
+	address = test\@example.com
+	inboxdir = $inboxdir
+EOF
+	close $fh or BAIL_OUT "close: $!";
+}
+my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log");
+my $sock = tcp_server();
+my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err"];
+my $env = { PI_CONFIG => $cfg };
+my $td = start_script($cmd, $env, { 3 => $sock }) or BAIL_OUT "-imapd: $?";
+my ($host, $port) = ($sock->sockhost, $sock->sockport);
+my $c = tcp_connect($sock);
+like(readline($c), qr/CAPABILITY /, 'got greeting');
+undef $c;
+
+SKIP: {
+	require_mods('Mail::IMAPClient', 3);
+	unless ($ENV{RT_132720_FIXED}) {
+		my $bug = 'https://rt.cpan.org/Ticket/Display.html?id=132720';
+		skip "<$bug>, RT_132720_FIXED not defined", 3;
+	}
+	my %opt = (Server => $host, Port => $port,
+			User => 'u', Password => 'p', Clear => 1);
+	my $uc = Mail::IMAPClient->new(%opt);
+	my $c = Mail::IMAPClient->new(%opt);
+	ok($c->compress, 'enabled compression');
+	ok $c->examine($mailbox), 'compressed EXAMINE-ed';
+	ok $uc->examine($mailbox), 'uncompress EXAMINE-ed';
+	my $range = $uc->search('all');
+	for my $uid (@$range) {
+		my $A = $uc->fetch_hash($uid, 'BODY[]');
+		my $B = $c->fetch_hash($uid, 'BODY[]');
+		if (!is_deeply($A, $B, "$uid identical")) {
+			diag Dumper([$A, $B]);
+			diag Dumper([$uc, $c]);
+			last;
+		}
+	}
+	$uc->logout;
+	$c->logout;
+}
+
+SKIP: {
+	require_mods('Mail::IMAPTalk', 3);
+	my %opt = (Server => $host, Port => $port, UseSSL => 0,
+		Username => 'u', Password => 'p', Uid => 1);
+	my $uc = Mail::IMAPTalk->new(%opt) or BAIL_OUT 'IMAPTalk->new';
+	my $c = Mail::IMAPTalk->new(%opt, UseCompress => 1) or
+		BAIL_OUT 'IMAPTalk->new(UseCompress => 1)';
+	ok $c->examine($mailbox), 'compressed EXAMINE-ed';
+	ok $uc->examine($mailbox), 'uncompress EXAMINE-ed';
+	my $range = $uc->search('all');
+	for my $uid (@$range) {
+		my $A = $uc->fetch($uid, 'rfc822');
+		my $B = $c->fetch($uid, 'rfc822');
+		if (!is_deeply($A, $B, "$uid identical")) {
+			diag Dumper([$A, $B]);
+			diag Dumper([$uc, $c]);
+			last;
+		}
+	}
+}
+done_testing;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 30/82] testcommon: tcp_(server|connect): BAIL_OUT on failure
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (28 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 29/82] add imapd compression test Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 31/82] *deflate: drop invalid comment about rbuf Eric Wong
                   ` (52 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

None of our tests rely on this failing, so just bail out
if the system is out of resources.
---
 lib/PublicInbox/TestCommon.pm | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 246047b1357..5e7dc8b0d3f 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -37,17 +37,18 @@ sub tcp_server () {
 		Type => Socket::SOCK_STREAM(),
 		Listen => 1024,
 		Blocking => 0,
-	)
+	) or Test::More::BAIL_OUT("failed to create TCP server: $!");
 }
 
 sub tcp_connect {
 	my ($dest, %opt) = @_;
+	my $addr = $dest->sockhost . ':' . $dest->sockport;
 	my $s = IO::Socket::INET->new(
 		Proto => 'tcp',
 		Type => Socket::SOCK_STREAM(),
-		PeerAddr => $dest->sockhost . ':' . $dest->sockport,
+		PeerAddr => $addr,
 		%opt,
-	);
+	) or Test::More::BAIL_OUT("failed to connect to $addr: $!");
 	$s->autoflush(1);
 	$s;
 }

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 31/82] *deflate: drop invalid comment about rbuf
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (29 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 30/82] testcommon: tcp_(server|connect): BAIL_OUT on failure Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 32/82] imap: fix pipelining with async git Eric Wong
                   ` (51 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

It must be a scalar reference, unlike ->write
---
 lib/PublicInbox/IMAPdeflate.pm | 1 -
 lib/PublicInbox/NNTPdeflate.pm | 1 -
 2 files changed, 2 deletions(-)

diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm
index 9366db7a7fc..67c9a9738d5 100644
--- a/lib/PublicInbox/IMAPdeflate.pm
+++ b/lib/PublicInbox/IMAPdeflate.pm
@@ -50,7 +50,6 @@ sub enable {
 # overrides PublicInbox::NNTP::compressed
 sub compressed { 1 }
 
-# $_[1] may be a reference or not
 sub do_read ($$$$) {
 	my ($self, $rbuf, $len, $off) = @_;
 
diff --git a/lib/PublicInbox/NNTPdeflate.pm b/lib/PublicInbox/NNTPdeflate.pm
index 8efb662f5ab..eb400c9c220 100644
--- a/lib/PublicInbox/NNTPdeflate.pm
+++ b/lib/PublicInbox/NNTPdeflate.pm
@@ -62,7 +62,6 @@ sub enable {
 # overrides PublicInbox::NNTP::compressed
 sub compressed { 1 }
 
-# $_[1] may be a reference or not
 sub do_read ($$$$) {
 	my ($self, $rbuf, $len, $off) = @_;
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 32/82] imap: fix pipelining with async git
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (30 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 31/82] *deflate: drop invalid comment about rbuf Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 33/82] git: cat_async: provide requested OID + "missing" on missing blobs Eric Wong
                   ` (50 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Since IMAP yields control to GitAsyncCat, IMAP->event_step may
be invoked with {long_cb} still active.  We must be sure to
bail out of IMAP->event_step if that happens and continue to let
GitAsyncCat drive IMAP.

This also improves fairness by never processing more than one
request per ->event_step.
---
 lib/PublicInbox/IMAP.pm        | 37 ++++++++++++++++------------------
 lib/PublicInbox/IMAPdeflate.pm | 10 +++++++++
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 917833f7c47..3c32d846508 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -888,38 +888,35 @@ sub long_response ($$;@) {
 sub event_step {
 	my ($self) = @_;
 
-	return unless $self->flush_write && $self->{sock};
+	return unless $self->flush_write && $self->{sock} && !$self->{long_cb};
 
 	$self->update_idle_time;
 	# only read more requests if we've drained the write buffer,
 	# otherwise we can be buffering infinitely w/o backpressure
 
-	my $rbuf = $self->{rbuf} // (\(my $x = ''));
-	my $r = 1;
-
-	if (index($$rbuf, "\n") < 0) {
-		my $off = length($$rbuf);
-		$r = $self->do_read($rbuf, LINE_MAX, $off) or return;
-	}
-	while ($r > 0 && $$rbuf =~ s/\A[ \t]*([^\n]*?)\r?\n//) {
-		my $line = $1;
-		return $self->close if $line =~ /[[:cntrl:]]/s;
-		my $t0 = now();
-		my $fd = fileno($self->{sock});
-		$r = eval { process_line($self, $line) };
-		my $pending = $self->{wbuf} ? ' pending' : '';
-		out($self, "[$fd] %s - %0.6f$pending", $line, now() - $t0);
-	}
+	my $rbuf = $self->{rbuf} // \(my $x = '');
+	my $line = index($$rbuf, "\n");
+	while ($line < 0) {
+		return $self->close if length($$rbuf) >= LINE_MAX;
+		$self->do_read($rbuf, LINE_MAX, length($$rbuf)) or return;
+		$line = index($$rbuf, "\n");
+	}
+	$line = substr($$rbuf, 0, $line + 1, '');
+	$line =~ s/\r?\n\z//s;
+	return $self->close if $line =~ /[[:cntrl:]]/s;
+	my $t0 = now();
+	my $fd = fileno($self->{sock});
+	my $r = eval { process_line($self, $line) };
+	my $pending = $self->{wbuf} ? ' pending' : '';
+	out($self, "[$fd] %s - %0.6f$pending - $r", $line, now() - $t0);
 
 	return $self->close if $r < 0;
-	my $len = length($$rbuf);
-	return $self->close if ($len >= LINE_MAX);
 	$self->rbuf_idle($rbuf);
 	$self->update_idle_time;
 
 	# maybe there's more pipelined data, or we'll have
 	# to register it for socket-readiness notifications
-	$self->requeue unless $self->{wbuf};
+	$self->requeue unless $pending;
 }
 
 sub compressed { undef }
diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm
index 67c9a9738d5..42daa6cffaa 100644
--- a/lib/PublicInbox/IMAPdeflate.pm
+++ b/lib/PublicInbox/IMAPdeflate.pm
@@ -59,6 +59,16 @@ sub do_read ($$$$) {
 	$doff = length($dbuf);
 	my $r = PublicInbox::DS::do_read($self, \$dbuf, $len, $doff) or return;
 
+	# Workaround inflate bug appending to OOK scalars:
+	# <https://rt.cpan.org/Ticket/Display.html?id=132734>
+	# We only have $off if the client is pipelining, and pipelining
+	# is where our substr() OOK optimization in event_step makes sense.
+	if ($off) {
+		my $copy = $$rbuf;
+		undef $$rbuf;
+		$$rbuf = $copy;
+	}
+
 	# assert(length($$rbuf) == $off) as far as NNTP.pm is concerned
 	# -ConsumeInput is true, so $dbuf is automatically emptied
 	my $err = $zin->inflate($dbuf, $rbuf);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 33/82] git: cat_async: provide requested OID + "missing" on missing blobs
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (31 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 32/82] imap: fix pipelining with async git Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 34/82] git: move async_cat reference to PublicInbox::Git Eric Wong
                   ` (49 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This will make it easier to implement the retries on
alternates_changed() of the synchronous ->cat_file API.
---
 lib/PublicInbox/Git.pm | 13 ++++++++-----
 t/git.t                |  2 +-
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 1c148086f46..b3ae2b812ba 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -20,7 +20,7 @@ use Errno qw(EINTR);
 our $PIPE_BUFSIZ = 65536; # Linux default
 
 use constant MAX_INFLIGHT =>
-	(($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 2)
+	(($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 3)
 	/
 	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
 
@@ -157,8 +157,8 @@ sub my_readline ($$) {
 
 sub cat_async_step ($$) {
 	my ($self, $inflight) = @_;
-	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2;
-	my ($cb, $arg) = splice(@$inflight, 0, 2);
+	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 3;
+	my ($req, $cb, $arg) = splice(@$inflight, 0, 3);
 	my $rbuf = delete($self->{cat_rbuf}) // \(my $new = '');
 	my ($bref, $oid, $type, $size);
 	my $head = my_readline($self->{in}, $rbuf);
@@ -167,7 +167,10 @@ sub cat_async_step ($$) {
 		$bref = my_read($self->{in}, $rbuf, $size + 1) or
 			fail($self, defined($bref) ? 'read EOF' : "read: $!");
 		chop($$bref) eq "\n" or fail($self, 'LF missing after blob');
-	} elsif ($head !~ / missing$/) {
+	} elsif ($head =~ / missing$/) {
+		$type = 'missing';
+		$oid = $req;
+	} else {
 		fail($self, "Unexpected result from async git cat-file: $head");
 	}
 	eval { $cb->($bref, $oid, $type, $size, $arg) };
@@ -344,7 +347,7 @@ sub cat_async ($$$;$) {
 	}
 
 	print { $self->{out} } $oid, "\n" or fail($self, "write error: $!");
-	push(@$inflight, $cb, $arg);
+	push(@$inflight, $oid, $cb, $arg);
 }
 
 sub extract_cmt_time {
diff --git a/t/git.t b/t/git.t
index 0b2089ba701..98d16f289c3 100644
--- a/t/git.t
+++ b/t/git.t
@@ -53,7 +53,7 @@ use_ok 'PublicInbox::Git';
 	is_deeply([$oid_hex, $type, $size], \@x, 'got expected header');
 	is($arg_res, $arg, 'arg passed to cat_async');
 	is_deeply($raw, $bref, 'blob result matches');
-	is_deeply($missing, [ undef, undef, undef, undef, $arg],
+	is_deeply($missing, [ undef, 'non-existent', 'missing', undef, $arg],
 		'non-existent blob gives expected result');
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 34/82] git: move async_cat reference to PublicInbox::Git
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (32 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 33/82] git: cat_async: provide requested OID + "missing" on missing blobs Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 35/82] git: async: automatic retry on alternates change Eric Wong
                   ` (48 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Trying to avoid a circular reference by relying on $ibx object
here makes no sense, since skipping GitCatAsync::close will
result in an FD leak, anyways.  So keep GitAsyncCat contained to
git-only operations, since we'll be using it for Solver in the
distant feature.
---
 lib/PublicInbox/Git.pm         |  3 +++
 lib/PublicInbox/GitAsyncCat.pm | 14 ++++++++------
 lib/PublicInbox/IMAP.pm        |  8 +++++---
 lib/PublicInbox/Inbox.pm       |  3 ---
 4 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index b3ae2b812ba..60236afe7bf 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -283,6 +283,9 @@ sub qx {
 # returns true if there are pending "git cat-file" processes
 sub cleanup {
 	my ($self) = @_;
+	if (my $ac = $self->{async_cat}) {
+		$ac->close; # PublicInbox::GitAsyncCat::close -> EPOLL_CTL_DEL
+	}
 	cat_async_wait($self);
 	_destroy($self, qw(cat_rbuf in out pid));
 	_destroy($self, qw(chk_rbuf in_c out_c pid_c err_c));
diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm
index 65e16121969..8701e4cfe48 100644
--- a/lib/PublicInbox/GitAsyncCat.pm
+++ b/lib/PublicInbox/GitAsyncCat.pm
@@ -13,7 +13,7 @@ use strict;
 use parent qw(PublicInbox::DS Exporter);
 use fields qw(git);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLET);
-our @EXPORT = qw(git_async_msg);
+our @EXPORT = qw(git_async_cat);
 
 sub new {
 	my ($class, $git) = @_;
@@ -36,14 +36,16 @@ sub event_step {
 
 sub close {
 	my ($self) = @_;
-	delete $self->{git};
+	if (my $git = delete $self->{git}) {
+		delete $git->{async_cat}; # drop circular reference
+	}
 	$self->SUPER::close; # PublicInbox::DS::close
 }
 
-sub git_async_msg ($$$$) {
-	my ($ibx, $smsg, $cb, $arg) = @_;
-	$ibx->git->cat_async($smsg->{blob}, $cb, $arg);
-	$ibx->{async_cat} //= new(__PACKAGE__, $ibx->{git});
+sub git_async_cat ($$$$) {
+	my ($git, $oid, $cb, $arg) = @_;
+	$git->cat_async($oid, $cb, $arg);
+	$git->{async_cat} //= new(__PACKAGE__, $git); # circular reference
 }
 
 1;
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 3c32d846508..3815141a15e 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -388,7 +388,7 @@ sub requeue_once ($) {
 	$self->requeue if $new_size == 1;
 }
 
-sub uid_fetch_cb { # called by git->cat_async via git_async_msg
+sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
 	my ($self, undef, $ibx, $msgs, undef, $want) = @$fetch_m_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
@@ -400,6 +400,7 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_msg
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
+
 	$$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy
 
 	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
@@ -487,7 +488,7 @@ sub uid_fetch_m { # long_response
 			return;
 		}
 	}
-	git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_);
+	git_async_cat($ibx->git, $msgs->[0]->{blob}, \&uid_fetch_cb, \@_);
 }
 
 sub cmd_status ($$$;@) {
@@ -719,7 +720,8 @@ sub seq_fetch_m { # long_response
 	my $seq = $want->{-seqno}++;
 	my $cur_num = $msgs->[0]->{num};
 	if ($cur_num == $seq) { # as expected
-		git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_);
+		git_async_cat($ibx->git, $msgs->[0]->{blob},
+				\&uid_fetch_cb, \@_);
 	} elsif ($cur_num > $seq) {
 		# send dummy messages until $seq catches up to $cur_num
 		my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg';
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 407751c3063..b2b0b56fdd3 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -22,9 +22,6 @@ my $CLEANUP = {}; # string(inbox) -> inbox
 sub git_cleanup ($) {
 	my ($self) = @_;
 	my $git = $self->{git} or return;
-	if (my $async_cat = delete $self->{async_cat}) {
-		$async_cat->close;
-	}
 	$git->cleanup;
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 35/82] git: async: automatic retry on alternates change
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (33 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 34/82] git: move async_cat reference to PublicInbox::Git Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 36/82] imapclient: wrapper for Mail::IMAPClient Eric Wong
                   ` (47 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This matches the behavior of the existing synchronous ->cat_file
method.  In fact, ->cat_file now becomes a small wrapper around
the ->cat_async method.
---
 lib/PublicInbox/Git.pm | 64 +++++++++++++++++++++++++-----------------
 t/git.t                | 37 ++++++++++++++++++++----
 2 files changed, 70 insertions(+), 31 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 60236afe7bf..a55c48d5f52 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -18,6 +18,7 @@ use base qw(Exporter);
 our @EXPORT_OK = qw(git_unquote git_quote);
 use Errno qw(EINTR);
 our $PIPE_BUFSIZ = 65536; # Linux default
+our $in_cleanup;
 
 use constant MAX_INFLIGHT =>
 	(($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 3)
@@ -155,6 +156,26 @@ sub my_readline ($$) {
 	}
 }
 
+sub cat_async_retry ($$$$$) {
+	my ($self, $inflight, $req, $cb, $arg) = @_;
+
+	# {inflight} may be non-existent, but if it isn't we delete it
+	# here to prevent cleanup() from waiting:
+	delete $self->{inflight};
+	cleanup($self);
+
+	$self->{inflight} = $inflight;
+	batch_prepare($self);
+	my $buf = "$req\n";
+	for (my $i = 0; $i < @$inflight; $i += 3) {
+		$buf .= "$inflight->[$i]\n";
+	}
+	print { $self->{out} } $buf or fail($self, "write error: $!");
+	unshift(@$inflight, \$req, $cb, $arg); # \$ref to indicate retried
+
+	cat_async_step($self, $inflight); # take one step
+}
+
 sub cat_async_step ($$) {
 	my ($self, $inflight) = @_;
 	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 3;
@@ -168,8 +189,13 @@ sub cat_async_step ($$) {
 			fail($self, defined($bref) ? 'read EOF' : "read: $!");
 		chop($$bref) eq "\n" or fail($self, 'LF missing after blob');
 	} elsif ($head =~ / missing$/) {
+		# ref($req) indicates it's already been retried
+		if (!ref($req) && !$in_cleanup && alternates_changed($self)) {
+			return cat_async_retry($self, $inflight,
+						$req, $cb, $arg);
+		}
 		$type = 'missing';
-		$oid = $req;
+		$oid = ref($req) ? $$req : $req;
 	} else {
 		fail($self, "Unexpected result from async git cat-file: $head");
 	}
@@ -190,33 +216,18 @@ sub batch_prepare ($) {
 	_bidi_pipe($_[0], qw(--batch in out pid));
 }
 
+sub _cat_file_cb {
+	my ($bref, undef, undef, $size, $result) = @_;
+	@$result = ($bref, $size);
+}
+
 sub cat_file {
-	my ($self, $obj, $sizeref) = @_;
-	my ($retried, $head, $rbuf);
+	my ($self, $oid, $sizeref) = @_;
+	my $result = [];
+	cat_async($self, $oid, \&_cat_file_cb, $result);
 	cat_async_wait($self);
-again:
-	batch_prepare($self);
-	$rbuf = delete($self->{cat_rbuf}) // \(my $new = '');
-	print { $self->{out} } $obj, "\n" or fail($self, "write error: $!");
-	$head = my_readline($self->{in}, $rbuf);
-	if ($head =~ / missing$/) {
-		if (!$retried && alternates_changed($self)) {
-			$retried = 1;
-			cleanup($self);
-			goto again;
-		}
-		return;
-	}
-	$head =~ /^[0-9a-f]{40} \S+ ([0-9]+)$/ or
-		fail($self, "Unexpected result from git cat-file: $head");
-
-	my $size = $1 + 0;
-	$$sizeref = $size if $sizeref;
-	my $ret = my_read($self->{in}, $rbuf, $size + 1);
-	$self->{cat_rbuf} = $rbuf if $$rbuf ne '';
-	fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret;
-	chop($$ret) eq "\n" or fail($self, 'newline missing after blob');
-	$ret;
+	$$sizeref = $result->[1] if $sizeref;
+	$result->[0];
 }
 
 sub check {
@@ -283,6 +294,7 @@ sub qx {
 # returns true if there are pending "git cat-file" processes
 sub cleanup {
 	my ($self) = @_;
+	local $in_cleanup = 1;
 	if (my $ac = $self->{async_cat}) {
 		$ac->close; # PublicInbox::GitAsyncCat::close -> EPOLL_CTL_DEL
 	}
diff --git a/t/git.t b/t/git.t
index 98d16f289c3..228df90fa45 100644
--- a/t/git.t
+++ b/t/git.t
@@ -86,23 +86,50 @@ if (1) {
 
 if ('alternates reloaded') {
 	my ($alt, $alt_obj) = tmpdir();
-	my @cmd = ('git', "--git-dir=$alt", qw(hash-object -w --stdin));
+	my $hash_obj = [ 'git', "--git-dir=$alt", qw(hash-object -w --stdin) ];
 	PublicInbox::Import::init_bare($alt);
 	open my $fh, '<', "$alt/config" or die "open failed: $!\n";
-	my $rd = popen_rd(\@cmd, {}, { 0 => $fh } );
-	close $fh or die "close failed: $!";
-	chomp(my $remote = <$rd>);
+	chomp(my $remote = xqx($hash_obj, undef, { 0 => $fh }));
 	my $gcf = PublicInbox::Git->new($dir);
 	is($gcf->cat_file($remote), undef, "remote file not found");
 	open $fh, '>>', "$dir/objects/info/alternates" or
 			die "open failed: $!\n";
-	print $fh "$alt/objects" or die "print failed: $!\n";
+	print $fh "$alt/objects\n" or die "print failed: $!\n";
 	close $fh or die "close failed: $!";
 	my $found = $gcf->cat_file($remote);
 	open $fh, '<', "$alt/config" or die "open failed: $!\n";
 	my $config = eval { local $/; <$fh> };
 	is($$found, $config, 'alternates reloaded');
 
+	# with the async interface
+	my ($async_alt, $async_dir_obj) = tmpdir();
+	PublicInbox::Import::init_bare($async_alt);
+	my @exist = map { chomp; [ split / / ] } (xqx(['git', "--git-dir=$dir",
+			qw(cat-file --batch-all-objects --batch-check)]));
+	my $results = [];
+	my $cb = sub {
+		my ($bref, $oid, $type, $size) = @_;
+		push @$results, [ $oid, $type, $size ];
+	};
+	for my $i (0..5) {
+		$gcf->cat_async($exist[$i]->[0], $cb, $results);
+		next if $i != 3;
+
+		# stick a new alternate into a running async pipeline
+		$hash_obj->[1] = "--git-dir=$async_alt";
+		$remote = xqx($hash_obj, undef, { 0 => \'async' });
+		chomp $remote;
+		open $fh, '>>', "$dir/objects/info/alternates" or
+				die "open failed: $!\n";
+		print $fh "$async_alt/objects\n" or die "print failed: $!\n";
+		close $fh or die "close failed: $!";
+		# trigger cat_async_retry:
+		$gcf->cat_async($remote, $cb, $results);
+	}
+	$gcf->cat_async_wait;
+	my $expect = [ @exist[0..3], [ $remote, 'blob', 5 ], @exist[4..5] ];
+	is_deeply($results, $expect, 'got expected results');
+
 	ok(!$gcf->cleanup, 'cleanup can expire');
 	ok(!$gcf->cleanup, 'cleanup idempotent');
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 36/82] imapclient: wrapper for Mail::IMAPClient
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (34 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 35/82] git: async: automatic retry on alternates change Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 37/82] xt: add imapd-validate and imapd-mbsync-oimap Eric Wong
                   ` (46 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We'll be using this wrapper class to workaround some upstream
bugs in Mail::IMAPClient.  There may also be experiments with
new APIs for more performance.
---
 MANIFEST                      |   1 +
 lib/PublicInbox/IMAPClient.pm | 119 ++++++++++++++++++++++++++++++++++
 t/imapd-tls.t                 |  37 +++++------
 t/imapd.t                     |  35 +++++-----
 4 files changed, 155 insertions(+), 37 deletions(-)
 create mode 100644 lib/PublicInbox/IMAPClient.pm

diff --git a/MANIFEST b/MANIFEST
index f74852b6e90..6744a519efc 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -128,6 +128,7 @@ lib/PublicInbox/HTTPD/Async.pm
 lib/PublicInbox/HlMod.pm
 lib/PublicInbox/Hval.pm
 lib/PublicInbox/IMAP.pm
+lib/PublicInbox/IMAPClient.pm
 lib/PublicInbox/IMAPD.pm
 lib/PublicInbox/IMAPdeflate.pm
 lib/PublicInbox/Import.pm
diff --git a/lib/PublicInbox/IMAPClient.pm b/lib/PublicInbox/IMAPClient.pm
new file mode 100644
index 00000000000..33deee9eb3d
--- /dev/null
+++ b/lib/PublicInbox/IMAPClient.pm
@@ -0,0 +1,119 @@
+# This library is free software; you can redistribute it and/or modify it
+# under the same terms as Perl itself, either Perl version 5.8.0 or, at
+# your option, any later version of Perl 5 you may have available.
+#
+# The license for this file differs from the rest of public-inbox.
+#
+# Workaround some bugs in upstream Mail::IMAPClient when
+# compression is enabled:
+# - reference cycle: https://rt.cpan.org/Ticket/Display.html?id=132654
+# - read starvation: https://rt.cpan.org/Ticket/Display.html?id=132720
+package PublicInbox::IMAPClient;
+use strict;
+use parent 'Mail::IMAPClient';
+use Errno qw(EAGAIN);
+
+# RFC4978 COMPRESS
+sub compress {
+    my ($self) = @_;
+
+    # BUG? strict check on capability commented out for now...
+    #my $can = $self->has_capability("COMPRESS")
+    #return undef unless $can and $can eq "DEFLATE";
+
+    $self->_imap_command("COMPRESS DEFLATE") or return undef;
+
+    my $zcl = $self->_load_module("Compress-Zlib") or return undef;
+
+    # give caller control of args if desired
+    $self->Compress(
+        [
+            -WindowBits => -$zcl->MAX_WBITS(),
+            -Level      => $zcl->Z_BEST_SPEED()
+        ]
+    ) unless ( $self->Compress and ref( $self->Compress ) eq "ARRAY" );
+
+    my ( $rc, $do, $io );
+
+    ( $do, $rc ) = Compress::Zlib::deflateInit( @{ $self->Compress } );
+    unless ( $rc == $zcl->Z_OK ) {
+        $self->LastError("deflateInit failed (rc=$rc)");
+        return undef;
+    }
+
+    ( $io, $rc ) =
+      Compress::Zlib::inflateInit( -WindowBits => -$zcl->MAX_WBITS() );
+    unless ( $rc == $zcl->Z_OK ) {
+        $self->LastError("inflateInit failed (rc=$rc)");
+        return undef;
+    }
+
+    $self->{Prewritemethod} = sub {
+        my ( $self, $string ) = @_;
+
+        my ( $rc, $out1, $out2 );
+        ( $out1, $rc ) = $do->deflate($string);
+        ( $out2, $rc ) = $do->flush( $zcl->Z_PARTIAL_FLUSH() )
+          unless ( $rc != $zcl->Z_OK );
+
+        unless ( $rc == $zcl->Z_OK ) {
+            $self->LastError("deflate/flush failed (rc=$rc)");
+            return undef;
+        }
+
+        return $out1 . $out2;
+    };
+
+    # need to retain some state for Readmoremethod/Readmethod calls
+    my ( $Zbuf, $Ibuf ) = ( "", "" );
+
+    $self->{Readmoremethod} = sub {
+        my $self = shift;
+        return 1 if ( length($Zbuf) || length($Ibuf) );
+        $self->__read_more(@_);
+    };
+
+    $self->{Readmethod} = sub {
+        my ( $self, $fh, $buf, $len, $off ) = @_;
+
+        # get more data, but empty $Ibuf first if any data is left
+        my ( $lz, $li ) = ( length $Zbuf, length $Ibuf );
+        if ( $lz || !$li ) {
+            my $readlen = $self->Buffer || 4096;
+            my $ret = sysread( $fh, $Zbuf, $readlen, length $Zbuf );
+            $lz = length $Zbuf;
+            return $ret if ( !$ret && !$lz );    # $ret is undef or 0
+        }
+
+        # accumulate inflated data in $Ibuf
+        if ($lz) {
+            my ( $tbuf, $rc ) = $io->inflate( \$Zbuf );
+            unless ( $rc == $zcl->Z_OK ) {
+                $self->LastError("inflate failed (rc=$rc)");
+                return undef;
+            }
+            $Ibuf .= $tbuf;
+            $li = length $Ibuf;
+        }
+
+        if ( !$li ) {
+            # note: faking EAGAIN here is only safe with level-triggered
+            # I/O readiness notifications (select, poll).  Refactoring
+            # callers will be needed in the unlikely case somebody wants
+            # to use edge-triggered notifications (EV_CLEAR, EPOLLET).
+            $! = EAGAIN;
+            return undef;
+        }
+
+        # pull desired length of data from $Ibuf
+        my $tbuf = substr( $Ibuf, 0, $len );
+        substr( $Ibuf, 0, $len ) = "";
+        substr( $$buf, $off ) = $tbuf;
+
+        return length $tbuf;
+    };
+
+    return $self;
+}
+
+1;
diff --git a/t/imapd-tls.t b/t/imapd-tls.t
index 9f5abfe048e..5352d100e4b 100644
--- a/t/imapd-tls.t
+++ b/t/imapd-tls.t
@@ -7,8 +7,15 @@ use Socket qw(IPPROTO_TCP SOL_SOCKET);
 use PublicInbox::TestCommon;
 # IO::Poll is part of the standard library, but distros may split it off...
 require_mods(qw(DBD::SQLite IO::Socket::SSL Mail::IMAPClient IO::Poll));
-Mail::IMAPClient->can('starttls') or
+my $imap_client = 'Mail::IMAPClient';
+$imap_client->can('starttls') or
 	plan skip_all => 'Mail::IMAPClient does not support TLS';
+my $can_compress = $imap_client->can('compress');
+if ($can_compress) { # hope this gets fixed upstream, soon
+	require PublicInbox::IMAPClient;
+	$imap_client = 'PublicInbox::IMAPClient';
+}
+
 my $cert = 'certs/server-cert.pem';
 my $key = 'certs/server-key.pem';
 unless (-r $key && -r $cert) {
@@ -67,18 +74,6 @@ my $starttls_addr = $starttls->sockhost . ':' . $starttls->sockport;
 my $env = { PI_CONFIG => $pi_config };
 my $td;
 
-# Mail::IMAPClient ->compress creates cyclic reference:
-# https://rt.cpan.org/Ticket/Display.html?id=132654
-my $compress_logout = sub {
-	my ($c) = @_;
-	ok($c->logout, 'logout ok after ->compress');
-	# all documented in Mail::IMAPClient manpage:
-	for (qw(Readmoremethod Readmethod Prewritemethod)) {
-		$c->$_(undef);
-	}
-};
-
-
 for my $args (
 	[ "--cert=$cert", "--key=$key",
 		"-limaps://$imaps_addr",
@@ -112,7 +107,7 @@ for my $args (
 			Server => $imaps->sockhost,
 			Port => $imaps->sockport);
 	# IMAPS
-	my $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]);
+	my $c = $imap_client->new(%imaps_opt, Ssl => [ %o ]);
 	ok($c && $c->IsAuthenticated, 'authenticated');
 	ok($c->select($group), 'SELECT works');
 	ok(!(scalar $c->has_capability('STARTTLS')),
@@ -122,12 +117,12 @@ for my $args (
 	ok($c->compress, 'compression enabled with IMAPS');
 	ok(!$c->starttls, 'starttls still fails');
 	ok($c->noop, 'noop succeeds');
-	$compress_logout->($c);
+	ok($c->logout, 'logout succeeds');
 
 	# STARTTLS
 	my %imap_opt = (Server => $starttls->sockhost,
 			Port => $starttls->sockport);
-	$c = Mail::IMAPClient->new(%imap_opt);
+	$c = $imap_client->new(%imap_opt);
 	ok(scalar $c->has_capability('STARTTLS'),
 		'starttls advertised');
 	ok($c->Starttls([ %o ]), 'set starttls options');
@@ -141,25 +136,25 @@ for my $args (
 	ok($c->noop, 'NOOP works');
 	ok($c->compress, 'compression enabled with IMAPS');
 	ok($c->noop, 'NOOP works after compress');
-	$compress_logout->($c);
+	ok($c->logout, 'logout succeeds after compress');
 
 	# STARTTLS with bad hostname
 	$o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.invalid';
-	$c = Mail::IMAPClient->new(%imap_opt);
+	$c = $imap_client->new(%imap_opt);
 	ok(scalar $c->has_capability('STARTTLS'), 'starttls advertised');
 	ok($c->Starttls([ %o ]), 'set starttls options');
 	ok(!$c->starttls, '->starttls fails with bad hostname');
 
-	$c = Mail::IMAPClient->new(%imap_opt);
+	$c = $imap_client->new(%imap_opt);
 	ok($c->noop, 'NOOP still works from plain IMAP');
 
 	# IMAPS with bad hostname
-	$c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]);
+	$c = $imap_client->new(%imaps_opt, Ssl => [ %o ]);
 	is($c, undef, 'IMAPS fails with bad hostname');
 
 	# make hostname valid
 	$o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.local';
-	$c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]);
+	$c = $imap_client->new(%imaps_opt, Ssl => [ %o ]);
 	ok($c, 'IMAPS succeeds again with valid hostname');
 
 	# slow TLS connection did not block the other fast clients while
diff --git a/t/imapd.t b/t/imapd.t
index 7af14f1b150..3d0be340846 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -9,6 +9,12 @@ use PublicInbox::TestCommon;
 use PublicInbox::Config;
 use PublicInbox::Spawn qw(which);
 require_mods(qw(DBD::SQLite Mail::IMAPClient Mail::IMAPClient::BodyStructure));
+my $imap_client = 'Mail::IMAPClient';
+my $can_compress = $imap_client->can('compress');
+if ($can_compress) { # hope this gets fixed upstream, soon
+	require PublicInbox::IMAPClient;
+	$imap_client = 'PublicInbox::IMAPClient';
+}
 
 my $level = '-Lbasic';
 SKIP: {
@@ -57,7 +63,7 @@ my %mic_opt = (
 	Port => $sock->sockport,
 	Uid => 1,
 );
-my $mic = Mail::IMAPClient->new(%mic_opt);
+my $mic = $imap_client->new(%mic_opt);
 my $pre_login_capa = $mic->capability;
 is(grep(/\AAUTH=ANONYMOUS\z/, @$pre_login_capa), 1,
 	'AUTH=ANONYMOUS advertised pre-login');
@@ -71,7 +77,7 @@ ok(join("\n", @$pre_login_capa) ne join("\n", @$post_login_capa),
 
 $mic_opt{Authmechanism} = 'ANONYMOUS';
 $mic_opt{Authcallback} = sub { '' };
-$mic = Mail::IMAPClient->new(%mic_opt);
+$mic = $imap_client->new(%mic_opt);
 ok($mic && $mic->login && $mic->IsAuthenticated, 'AUTHENTICATE ANONYMOUS');
 my $post_auth_anon_capa = $mic->capability;
 is_deeply($post_auth_anon_capa, $post_login_capa,
@@ -175,20 +181,17 @@ for my $r ('1:*', '1') {
 	is(lc($bs->bodyenc), '8bit', '->bodyenc');
 }
 
-# Mail::IMAPClient ->compress creates cyclic reference:
-# https://rt.cpan.org/Ticket/Display.html?id=132654
-my $compress_logout = sub {
-	my ($c) = @_;
-	ok($c->logout, 'logout ok after ->compress');
-	# all documented in Mail::IMAPClient manpage:
-	for (qw(Readmoremethod Readmethod Prewritemethod)) {
-		$c->$_(undef);
-	}
-};
-
 is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap');
-ok($mic->compress, 'compress enabled');
-$compress_logout->($mic);
+SKIP: {
+	skip 'Mail::IMAPClient too old for ->compress', 2 if !$can_compress;
+	my $c = $imap_client->new(%mic_opt);
+	ok($c && $c->compress, 'compress enabled');
+	ok($c->examine('inbox.i1'), 'EXAMINE succeeds after COMPRESS');
+	$ret = $c->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@";
+	is_deeply($ret, [ 1 ], 'search UID 1:* works after compression');
+}
+
+ok($mic->logout, 'logout works');
 
 my $have_inotify = eval { require Linux::Inotify2; 1 };
 
@@ -198,7 +201,7 @@ $pi_config->each_inbox(sub {
 	my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} };
 	my $name = $ibx->{name};
 	my $ng = $ibx->{newsgroup};
-	my $mic = Mail::IMAPClient->new(%mic_opt);
+	my $mic = $imap_client->new(%mic_opt);
 	ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name");
 	my $uidnext = $mic->uidnext($ng); # we'll fetch BODYSTRUCTURE on this
 	ok($uidnext, 'got uidnext for later fetch');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 37/82] xt: add imapd-validate and imapd-mbsync-oimap
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (35 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 36/82] imapclient: wrapper for Mail::IMAPClient Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 38/82] imap: support out-of-bounds ranges Eric Wong
                   ` (45 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

imapd-validate is a beefed up version of our nntpd-validate test
which hammers the server with parallel connections over regular
IMAP, IMAPS, IMAP+STARTTLS; and COMPRESS=DEFLATE variants of
each of those.  It uses $START_UID:$END_UID fetch ranges to
reduce requests and slurp many responses at once to saturate
"git cat-file --batch" processes.

mbsync(1) also uses pipelining extensively (but IMHO
unnecessarily), so it was able to shake out some bugs in
the async git code.

Finally, we remove xt/cmp-imapd-compress.t since it's
redundant now that we have PublicInbox::IMAPClient to work
around bugs in Mail::IMAPClient.
---
 MANIFEST                |   3 +-
 xt/cmp-imapd-compress.t |  83 -------------------
 xt/imapd-mbsync-oimap.t | 129 +++++++++++++++++++++++++++++
 xt/imapd-validate.t     | 175 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 306 insertions(+), 84 deletions(-)
 delete mode 100644 xt/cmp-imapd-compress.t
 create mode 100644 xt/imapd-mbsync-oimap.t
 create mode 100644 xt/imapd-validate.t

diff --git a/MANIFEST b/MANIFEST
index 6744a519efc..957228250a7 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -347,12 +347,13 @@ t/www_listing.t
 t/www_static.t
 t/x-unknown-alpine.eml
 t/xcpdb-reshard.t
-xt/cmp-imapd-compress.t
 xt/cmp-msgstr.t
 xt/cmp-msgview.t
 xt/eml_check_limits.t
 xt/git-http-backend.t
 xt/git_async_cmp.t
+xt/imapd-mbsync-oimap.t
+xt/imapd-validate.t
 xt/mem-msgview.t
 xt/msgtime_cmp.t
 xt/nntpd-validate.t
diff --git a/xt/cmp-imapd-compress.t b/xt/cmp-imapd-compress.t
deleted file mode 100644
index b12cf74e2d5..00000000000
--- a/xt/cmp-imapd-compress.t
+++ /dev/null
@@ -1,83 +0,0 @@
-#!perl -w
-# Copyright (C) 2020 all contributors <meta@public-inbox.org>
-# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-use strict;
-use Test::More;
-use PublicInbox::TestCommon;
-require_mods('Data::Dumper');
-Data::Dumper->import('Dumper');
-my $inboxdir = $ENV{GIANT_INBOX_DIR};
-(defined($inboxdir) && -d $inboxdir) or
-	plan skip_all => "GIANT_INBOX_DIR not defined for $0";
-plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!;
-my ($tmpdir, $for_destroy) = tmpdir();
-my $cfg = "$tmpdir/cfg";
-my $mailbox = 'inbox.test';
-{
-	open my $fh, '>', $cfg or BAIL_OUT "open: $!";
-	print $fh <<EOF or BAIL_OUT "print: $!";
-[publicinbox "test"]
-	newsgroup = $mailbox
-	address = test\@example.com
-	inboxdir = $inboxdir
-EOF
-	close $fh or BAIL_OUT "close: $!";
-}
-my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log");
-my $sock = tcp_server();
-my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err"];
-my $env = { PI_CONFIG => $cfg };
-my $td = start_script($cmd, $env, { 3 => $sock }) or BAIL_OUT "-imapd: $?";
-my ($host, $port) = ($sock->sockhost, $sock->sockport);
-my $c = tcp_connect($sock);
-like(readline($c), qr/CAPABILITY /, 'got greeting');
-undef $c;
-
-SKIP: {
-	require_mods('Mail::IMAPClient', 3);
-	unless ($ENV{RT_132720_FIXED}) {
-		my $bug = 'https://rt.cpan.org/Ticket/Display.html?id=132720';
-		skip "<$bug>, RT_132720_FIXED not defined", 3;
-	}
-	my %opt = (Server => $host, Port => $port,
-			User => 'u', Password => 'p', Clear => 1);
-	my $uc = Mail::IMAPClient->new(%opt);
-	my $c = Mail::IMAPClient->new(%opt);
-	ok($c->compress, 'enabled compression');
-	ok $c->examine($mailbox), 'compressed EXAMINE-ed';
-	ok $uc->examine($mailbox), 'uncompress EXAMINE-ed';
-	my $range = $uc->search('all');
-	for my $uid (@$range) {
-		my $A = $uc->fetch_hash($uid, 'BODY[]');
-		my $B = $c->fetch_hash($uid, 'BODY[]');
-		if (!is_deeply($A, $B, "$uid identical")) {
-			diag Dumper([$A, $B]);
-			diag Dumper([$uc, $c]);
-			last;
-		}
-	}
-	$uc->logout;
-	$c->logout;
-}
-
-SKIP: {
-	require_mods('Mail::IMAPTalk', 3);
-	my %opt = (Server => $host, Port => $port, UseSSL => 0,
-		Username => 'u', Password => 'p', Uid => 1);
-	my $uc = Mail::IMAPTalk->new(%opt) or BAIL_OUT 'IMAPTalk->new';
-	my $c = Mail::IMAPTalk->new(%opt, UseCompress => 1) or
-		BAIL_OUT 'IMAPTalk->new(UseCompress => 1)';
-	ok $c->examine($mailbox), 'compressed EXAMINE-ed';
-	ok $uc->examine($mailbox), 'uncompress EXAMINE-ed';
-	my $range = $uc->search('all');
-	for my $uid (@$range) {
-		my $A = $uc->fetch($uid, 'rfc822');
-		my $B = $c->fetch($uid, 'rfc822');
-		if (!is_deeply($A, $B, "$uid identical")) {
-			diag Dumper([$A, $B]);
-			diag Dumper([$uc, $c]);
-			last;
-		}
-	}
-}
-done_testing;
diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t
new file mode 100644
index 00000000000..d2237a24bef
--- /dev/null
+++ b/xt/imapd-mbsync-oimap.t
@@ -0,0 +1,129 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# ensure mbsync and offlineimap compatibility
+use strict;
+use Test::More;
+use File::Path qw(mkpath);
+use PublicInbox::TestCommon;
+use PublicInbox::Spawn qw(which spawn);
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+(defined($inboxdir) && -d $inboxdir) or
+	plan skip_all => "GIANT_INBOX_DIR not defined for $0";
+plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $cfg = "$tmpdir/cfg";
+my $mailbox = 'inbox.test';
+{
+	open my $fh, '>', $cfg or BAIL_OUT "open: $!";
+	print $fh <<EOF or BAIL_OUT "print: $!";
+[publicinbox "test"]
+	newsgroup = $mailbox
+	address = oimap\@example.com
+	inboxdir = $inboxdir
+EOF
+	close $fh or BAIL_OUT "close: $!";
+}
+my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log");
+my $sock = tcp_server();
+my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err" ];
+my $env = { PI_CONFIG => $cfg };
+my $td = start_script($cmd, $env, { 3 => $sock }) or BAIL_OUT "-imapd: $?";
+{
+	my $c = tcp_connect($sock);
+	like(readline($c), qr/CAPABILITY /, 'got greeting');
+}
+my ($host, $port) = ($sock->sockhost, $sock->sockport);
+my %pids;
+
+SKIP: {
+	mkpath([map { "$tmpdir/oimapdir/$_" } qw(cur new tmp)]);
+	my $oimap = which('offlineimap') or skip 'no offlineimap(1)', 1;
+	open my $fh, '>', "$tmpdir/.offlineimaprc" or BAIL_OUT "open: $!";
+	print $fh <<EOF or BAIL_OUT "print: $!";
+[general]
+accounts = test
+socktimeout = 10
+fsync = false
+
+[Account test]
+localrepository = l.test
+remoterepository = r.test
+
+[Repository l.test]
+type = Maildir
+localfolders = ~/oimapdir
+
+[Repository r.test]
+type = IMAP
+ssl = no
+remotehost = $host
+remoteport = $port
+remoteuser = anonymous
+remotepass = Hunter2
+
+# python-imaplib2 times out on select/poll when compression is enabled
+# <https://bugs.debian.org/961713>
+usecompression = no
+EOF
+	close $fh or BAIL_OUT "close: $!";
+	my $cmd = [ $oimap, qw(-o -q -u quiet) ];
+	my $pid = spawn($cmd, { HOME => $tmpdir }, { 1 => 2 });
+	$pids{$pid} = $cmd;
+}
+
+SKIP: {
+	mkpath([map { "$tmpdir/mbsyncdir/test/$_" } qw(cur new tmp)]);
+	my $mbsync = which('mbsync') or skip 'no mbsync(1)', 1;
+	open my $fh, '>', "$tmpdir/.mbsyncrc" or BAIL_OUT "open: $!";
+	print $fh <<EOF or BAIL_OUT "print: $!";
+Create Slave
+SyncState *
+Remove None
+FSync no
+
+MaildirStore local
+Path ~/mbsyncdir/
+Inbox ~/mbsyncdir/test
+SubFolders verbatim
+
+IMAPStore remote
+Host $host
+Port $port
+User anonymous
+Pass Hunter2
+SSLType None
+UseNamespace no
+# DisableExtension COMPRESS=DEFLATE
+
+Channel "test"
+Master ":remote:inbox"
+Slave ":local:test"
+Expunge None
+Sync PullNew
+Patterns *
+EOF
+	close $fh or BAIL_OUT "close: $!";
+	my $cmd = [ $mbsync, qw(-aqq) ];
+	my $pid = spawn($cmd, { HOME => $tmpdir }, { 1 => 2 });
+	$pids{$pid} = $cmd;
+}
+
+while (scalar keys %pids) {
+	my $pid = waitpid(-1, 0) or next;
+	my $cmd = delete $pids{$pid} or next;
+	is($?, 0, join(' ', @$cmd, 'done'));
+}
+
+if (my $sec = $ENV{TEST_PERSIST}) {
+	diag "sleeping ${sec}s, imap://$host:$port/$mailbox available";
+	diag "tmpdir=$tmpdir (Maildirs available)";
+	diag "stdout=$out";
+	diag "stderr=$err";
+	diag "pid=$td->{pid}";
+	sleep $sec;
+}
+$td->kill;
+$td->join;
+is($?, 0, 'no error on -imapd exit');
+done_testing;
diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t
new file mode 100644
index 00000000000..f96ec8791b9
--- /dev/null
+++ b/xt/imapd-validate.t
@@ -0,0 +1,175 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# Expensive test to validate compression and TLS.
+use strict;
+use Test::More;
+use Symbol qw(gensym);
+use PublicInbox::DS qw(now);
+use POSIX qw(_exit);
+use PublicInbox::TestCommon;
+my $inbox_dir = $ENV{GIANT_INBOX_DIR};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inbox_dir;
+# how many emails to read into memory at once per-process
+my $BATCH = $ENV{TEST_BATCH} // 100;
+my $REPEAT = $ENV{TEST_REPEAT} // 1;
+
+require_mods(qw(Mail::IMAPClient));
+my $imap_client = 'Mail::IMAPClient';
+my $can_compress = $imap_client->can('compress');
+if ($can_compress) { # hope this gets fixed upstream, soon
+	require PublicInbox::IMAPClient;
+	$imap_client = 'PublicInbox::IMAPClient';
+}
+
+my $test_tls = $ENV{TEST_SKIP_TLS} ? 0 : eval { require IO::Socket::SSL };
+my ($cert, $key) = qw(certs/server-cert.pem certs/server-key.pem);
+if ($test_tls && !-r $key || !-r $cert) {
+	plan skip_all =>
+		"certs/ missing for $0, run $^X ./certs/create-certs.perl";
+}
+my ($tmpdir, $for_destroy) = tmpdir();
+my %OPT = qw(User u Password p);
+my (%STARTTLS_OPT, %IMAPS_OPT, $td, $mailbox, $make_local_server);
+if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) {
+	($OPT{Server}, $mailbox) = ($1, $2);
+	$OPT{Server} =~ s/:([0-9]+)\z// and $OPT{Port} = $1 + 0;
+	%STARTTLS_OPT = %OPT;
+	%IMAPS_OPT = (%OPT, Port => 993) if $OPT{Port} == 143;
+} else {
+	require_mods(qw(DBD::SQLite));
+	$make_local_server->();
+}
+
+my %opts = (imap => \%OPT, 'imap+compress' => { %OPT, Compress => 1 });
+my $uid_max = do {
+	my $mic = $imap_client->new(%OPT) or BAIL_OUT "new $!";
+	$mic->examine($mailbox) or BAIL_OUT "examine: $!";
+	my $next = $mic->uidnext($mailbox) or BAIL_OUT "uidnext: $!";
+	$next - 1;
+};
+
+if (scalar keys %STARTTLS_OPT) {
+	$opts{starttls} = \%STARTTLS_OPT;
+	$opts{'starttls+compress'} = { %STARTTLS_OPT, Compress => 1 };
+}
+if (scalar keys %IMAPS_OPT) {
+	$opts{imaps} = \%IMAPS_OPT;
+	$opts{'imaps+compress'} = { %IMAPS_OPT, Compress => 1 };
+}
+
+my $do_get_all = sub {
+	my ($desc, $opt) = @_;
+	local $SIG{__DIE__} = sub { print STDERR $desc, ': ', @_; _exit(1) };
+	my $t0 = now();
+	my $dig = Digest::SHA->new(1);
+	my $mic = $imap_client->new(%$opt);
+	$mic->examine($mailbox) or die "examine: $!";
+	my $uid_base = 1;
+	my $bytes = 0;
+	my $nr = 0;
+	until ($uid_base > $uid_max) {
+		my $end = $uid_base + $BATCH;
+		my $ret = $mic->fetch_hash("$uid_base:$end", 'BODY[]') or last;
+		for my $uid ($uid_base..$end) {
+			$dig->add($uid);
+			my $h = delete $ret->{$uid} or next;
+			my $body = delete $h->{'BODY[]'} or
+						die "no BODY[] for UID=$uid";
+			$dig->add($body);
+			$bytes += length($body);
+			++$nr;
+		}
+		$uid_base = $end + 1;
+	}
+	$mic->logout or die "logout failed: $!";
+	my $elapsed = sprintf('%0.3f', now() - $t0);
+	my $res = $dig->hexdigest;
+	print STDERR "# $desc $res (${elapsed}s) $bytes bytes, NR=$nr\n";
+	$res;
+};
+
+my (%pids, %res);
+for (1..$REPEAT) {
+	while (my ($desc, $opt) = each %opts) {
+		pipe(my ($r, $w)) or die;
+		my $pid = fork;
+		if ($pid == 0) {
+			close $r or die;
+			my $res = $do_get_all->($desc, $opt);
+			print $w $res or die;
+			close $w or die;
+			_exit(0);
+		}
+		close $w or die;
+		$pids{$pid} = [ $desc, $r ];
+	}
+}
+
+while (scalar keys %pids) {
+	my $pid = waitpid(-1, 0) or next;
+	my $child = delete $pids{$pid} or next;
+	my ($desc, $rpipe) = @$child;
+	is($?, 0, "$desc done");
+	my $sum = do { local $/; <$rpipe> };
+	push @{$res{$sum}}, $desc;
+}
+is(scalar keys %res, 1, 'all got the same result');
+$td->kill;
+$td->join;
+is($?, 0, 'no error on -imapd exit');
+done_testing;
+
+BEGIN {
+
+$make_local_server = sub {
+	require PublicInbox::Inbox;
+	$mailbox = 'inbox.test';
+	my $ibx = { inboxdir => $inbox_dir, newsgroup => $mailbox };
+	$ibx = PublicInbox::Inbox->new($ibx);
+	my $pi_config = "$tmpdir/config";
+	{
+		open my $fh, '>', $pi_config or die "open($pi_config): $!";
+		print $fh <<"" or die "print $pi_config: $!";
+[publicinbox "test"]
+	newsgroup = $mailbox
+	inboxdir = $inbox_dir
+	address = test\@example.com
+
+		close $fh or die "close($pi_config): $!";
+	}
+	my ($out, $err) = ("$tmpdir/out", "$tmpdir/err");
+	for ($out, $err) {
+		open my $fh, '>', $_ or die "truncate: $!";
+	}
+	my $imap = tcp_server();
+	my $rdr = { 3 => $imap };
+	$OPT{Server} = $imap->sockhost;
+	$OPT{Port} = $imap->sockport;
+
+	# not using multiple workers, here, since we want to increase
+	# the chance of tripping concurrency bugs within PublicInbox/IMAP*.pm
+	my $cmd = [ '-imapd', "--stdout=$out", "--stderr=$err", '-W0' ];
+	push @$cmd, '-limap://'.$imap->sockhost.':'.$imap->sockport;
+	if ($test_tls) {
+		my $imaps = tcp_server();
+		$rdr->{4} = $imaps;
+		push @$cmd, '-limaps://'.$imaps->sockhost.':'.$imaps->sockport;
+		push @$cmd, "--cert=$cert", "--key=$key";
+		my $tls_opt = [
+			SSL_hostname => 'server.local',
+			SSL_verifycn_name => 'server.local',
+			SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_PEER(),
+			SSL_ca_file => 'certs/test-ca.pem',
+		];
+		%STARTTLS_OPT = (%OPT, Starttls => $tls_opt);
+		%IMAPS_OPT = (%OPT, Ssl => $tls_opt,
+			Server => $imaps->sockhost,
+			Port => $imaps->sockport
+		);
+	}
+	print STDERR "# CMD ". join(' ', @$cmd). "\n";
+	my $env = { PI_CONFIG => $pi_config };
+	$td = start_script($cmd, $env, $rdr);
+};
+} # BEGIN

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 38/82] imap: support out-of-bounds ranges
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (36 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 37/82] xt: add imapd-validate and imapd-mbsync-oimap Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 39/82] xt/perf-imap-list: time refresh_inboxlist Eric Wong
                   ` (44 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

"$UID_START:*" needs to return at least one message according
to RFC 3501 section 6.4.8.

While we're in the area, coerce ranges to (unsigned) integers by
adding zero ("+ 0") to reduce memory overhead.
---
 lib/PublicInbox/IMAP.pm |  8 +++++---
 t/imapd.t               | 13 +++++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 3815141a15e..ffa76bb0266 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -455,11 +455,13 @@ sub range_step ($$) {
 		$$range_csv = undef;
 	}
 	if ($range =~ /\A([0-9]+):([0-9]+)\z/) {
-		($beg, $end) = ($1, $2);
+		($beg, $end) = ($1 + 0, $2 + 0);
 	} elsif ($range =~ /\A([0-9]+):\*\z/) {
-		($beg, $end) =  ($1, $ibx->mm->max // 0);
+		$beg = $1 + 0;
+		$end = $ibx->mm->max // 0;
+		$beg = $end if $beg > $end;
 	} elsif ($range =~ /\A[0-9]+\z/) {
-		$beg = $end = $range;
+		$beg = $end = $range + 0;
 	} else {
 		return 'BAD fetch range';
 	}
diff --git a/t/imapd.t b/t/imapd.t
index 3d0be340846..2c4315dec30 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -116,6 +116,19 @@ $ret = $mic->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@";
 is_deeply($ret, [ 1 ], 'search UID 1:* works');
 
 is_deeply(scalar $mic->flags('1'), [], '->flags works');
+{
+	# RFC 3501 section 6.4.8 states:
+	# Also note that a UID range of 559:* always includes the
+	# UID of the last message in the mailbox, even if 559 is
+	# higher than any assigned UID value.
+	my $exp = $mic->fetch_hash(1, 'UID');
+	$ret = $mic->fetch_hash('559:*', 'UID');
+	is_deeply($ret, $exp, 'beginning range too big');
+	for my $r (qw(559:558 558:559)) {
+		$ret = $mic->fetch_hash($r, 'UID');
+		is_deeply($ret, {}, "out-of-range UID FETCH $r");
+	}
+}
 
 for my $r ('1:*', '1') {
 	$ret = $mic->fetch_hash($r, 'RFC822') or BAIL_OUT "FETCH $@";

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 39/82] xt/perf-imap-list: time refresh_inboxlist
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (37 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 38/82] imap: support out-of-bounds ranges Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 40/82] imap: case-insensitive mailbox name comparisons Eric Wong
                   ` (43 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

It's useful to know how fast SIGHUP can be handled, too.
---
 xt/perf-imap-list.t | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xt/perf-imap-list.t b/xt/perf-imap-list.t
index 37640a90f3c..0f00f487991 100644
--- a/xt/perf-imap-list.t
+++ b/xt/perf-imap-list.t
@@ -9,15 +9,18 @@ use Benchmark qw(:all);
 my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000);
 push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000);
 my $self = { imapd => { grouplist => \@n } };
-PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
-
 my $n = scalar @n;
+my $t = timeit(1, sub {
+	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
+});
+diag timestr($t). "refresh $n inboxes";
+
 open my $null, '>', '/dev/null' or BAIL_OUT "open: $!";
 my $ds = { sock => $null };
 my $nr = 200;
 diag "starting benchmark...";
 my $cmd_list = \&PublicInbox::IMAP::cmd_list;
-my $t = timeit(1, sub {
+$t = timeit(1, sub {
 	for (0..$nr) {
 		my $res = $cmd_list->($self, 'tag', '', '*');
 		PublicInbox::DS::write($ds, $res);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 40/82] imap: case-insensitive mailbox name comparisons
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (38 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 39/82] xt/perf-imap-list: time refresh_inboxlist Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 41/82] imap: break giant inboxes into sub-inboxes of 50K messages Eric Wong
                   ` (42 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

IMAP RFC 3501 stipulates case-insensitive comparisons, and so
does RFC 977 (NNTP).  However, INN (nnrpd) uses case-sensitive
comparisons, so we've always used case-sensitive comparisons for
NNTP to match nnrpd behavior.

Unfortunately, some IMAP clients insist on sending "INBOX" with
caps, which causes problems for us.  Since NNTP group names are
typically all lowercase anyways, just force all comparisons to
lowercase for IMAP and warn admins if uppercase-containing
newsgroups won't be accessible over IMAP.

This ensures our existing -nntpd behavior remains unchanged
while being compatible with the expectations of real-world IMAP
clients.
---
 lib/PublicInbox/IMAPD.pm | 7 +++++++
 t/imap.t                 | 7 ++++++-
 t/imapd.t                | 2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index a3a2598661b..966879d8ea5 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -25,6 +25,13 @@ sub refresh_inboxlist ($) {
 	my ($self) = @_;
 	my @names = map { $_->{newsgroup} } @{delete $self->{grouplist}};
 	my %ns; # "\Noselect \HasChildren"
+
+	if (my @uc = grep(/[A-Z]/, @names)) {
+		warn "Uppercase not allowed for IMAP newsgroup(s):\n",
+			map { "\t$_\n" } @uc;
+		my %uc = map { $_ => 1 } @uc;
+		@names = grep { !$uc{$_} } @names;
+	}
 	for (@names) {
 		my $up = $_;
 		while ($up =~ s/\.[^\.]+\z//) {
diff --git a/t/imap.t b/t/imap.t
index 451b6596bf9..aa262a19632 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -8,9 +8,14 @@ use PublicInbox::IMAP;
 use PublicInbox::IMAPD;
 
 { # make sure we get '%' globbing right
-	my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y));
+	my @w;
+	local $SIG{__WARN__} = sub { push @w, @_ };
+	my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y IGNORE.THIS));
 	my $self = { imapd => { grouplist => \@n } };
 	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
+	is(scalar(@w), 1, 'got a warning for upper-case');
+	like($w[0], qr/IGNORE\.THIS/, 'warned about upper-case');
+
 	my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%');
 	is(scalar($$res =~ tr/\n/\n/), 2, 'only one result');
 	like($$res, qr/ x\r\ntag OK/, 'saw expected');
diff --git a/t/imapd.t b/t/imapd.t
index 2c4315dec30..a5324f78da0 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -88,7 +88,7 @@ like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent');
 ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent');
 like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent');
 ok($mic->select('inbox.i1'), 'SELECT succeeds');
-ok($mic->examine('inbox.i1'), 'EXAMINE succeeds');
+ok($mic->examine('INBOX.i1'), 'EXAMINE succeeds');
 my @raw = $mic->status('inbox.i1', qw(Messages uidnext uidvalidity));
 is(scalar(@raw), 2, 'got status response');
 like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 41/82] imap: break giant inboxes into sub-inboxes of 50K messages
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (39 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 40/82] imap: case-insensitive mailbox name comparisons Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 42/82] imap: start doing iterative config reloading Eric Wong
                   ` (41 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This limit on mailbox size should keep users of tools like
mbsync (isync) and offlineimap happy, since typical filesystems
struggle with giant Maildirs.

I chose 50K since it's a bit more than what LKML typically sees
in a month and still manages to give acceptable performance on
my ancient Centrino laptop.

There were also no responses to my original proposal at:
<https://public-inbox.org/meta/20200519090000.GA24273@dcvr/>
so no objections, either :>
---
 lib/PublicInbox/IMAP.pm | 134 ++++++++++++++++++++++++++++++----------
 1 file changed, 100 insertions(+), 34 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index ffa76bb0266..959dcfbe328 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -16,7 +16,7 @@ package PublicInbox::IMAP;
 use strict;
 use base qw(PublicInbox::DS);
 use fields qw(imapd logged_in ibx long_cb -login_tag
-	-idle_tag -idle_max);
+	uid_min -idle_tag -idle_max);
 use PublicInbox::Eml;
 use PublicInbox::EmlContentFoo qw(parse_content_disposition);
 use PublicInbox::DS qw(now);
@@ -34,6 +34,9 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address;
 
 sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977?
 
+# changing this will cause grief for clients which cache
+sub UID_BLOCK () { 50_000 }
+
 my %FETCH_NEED_BLOB = ( # for future optimization
 	'BODY[HEADER]' => 1,
 	'BODY[TEXT]' => 1,
@@ -128,6 +131,7 @@ sub cmd_login ($$$$) {
 
 sub cmd_close ($$) {
 	my ($self, $tag) = @_;
+	delete $self->{uid_min};
 	delete $self->{ibx} ? "$tag OK Close done\r\n"
 				: "$tag BAD No mailbox\r\n";
 }
@@ -188,12 +192,67 @@ sub cmd_done ($$) {
 	"$idle_tag OK Idle done\r\n";
 }
 
+sub ensure_old_ranges_exist ($$$) {
+	my ($self, $ibx, $uid_min) = @_;
+	my $groups = $self->{imapd}->{groups};
+	my $mailbox = $ibx->{newsgroup};
+	my @created;
+	$uid_min -= UID_BLOCK;
+	my $uid_end = $uid_min + UID_BLOCK - 1;
+	while ($uid_min > 0) {
+		my $sub_mailbox = "$mailbox.$uid_min-$uid_end";
+		last if exists $groups->{$sub_mailbox};
+		$groups->{$sub_mailbox} = $ibx;
+		$uid_end -= UID_BLOCK;
+		$uid_min -= UID_BLOCK;
+		push @created, $sub_mailbox;
+	}
+	return unless @created;
+	my $l = $self->{imapd}->{inboxlist};
+	grep {
+		/ \Q$mailbox\E\r\n\z/ and s/\(\\HasNoChildren/\(\\HasChildren/;
+	} @$l;
+	push @$l, map { qq[* LIST (\\HasNoChildren) "." $_\r\n] } @created;
+}
+
 sub cmd_examine ($$$) {
 	my ($self, $tag, $mailbox) = @_;
-	my $ibx = $self->{imapd}->{groups}->{$mailbox} or
-		return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
-	my $mm = $ibx->mm;
-	my $max = $mm->max // 0;
+	my ($ibx, $mm, $max);
+
+	if ($mailbox =~ /\A(.+)\.([0-9]+)-([0-9]+)\z/) {
+		# old mail: inbox.comp.foo.$uid_min-$uid_end
+		my ($mb_top, $uid_min, $uid_end) = ($1, $2 + 0, $3 + 0);
+		$ibx = $self->{imapd}->{groups}->{lc $mb_top};
+		if (!$ibx || ($uid_end % UID_BLOCK) != 0 ||
+				($uid_min + UID_BLOCK - 1) != $uid_end) {
+			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
+		}
+		$mm = $ibx->mm;
+		$max = $mm->max // 0;
+
+		# don't let users create inboxes w/ not-yet-possible range:
+		$uid_min > $max and
+			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
+
+		$max = $uid_min + UID_BLOCK + 1;
+		$self->{uid_min} = $uid_min;
+		ensure_old_ranges_exist($self, $ibx, $uid_min);
+	} else { # current mailbox (most recent UID_BLOCK messages)
+		$ibx = $self->{imapd}->{groups}->{lc $mailbox} or
+			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
+
+		$mm = $ibx->mm;
+		$max = $mm->max // 0;
+
+		my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1;
+		if ($uid_min == 1) { # normal inbox with <UID_BLOCK messages
+			delete $self->{uid_min}; # implicit cmd_close
+		} else { # we have a giant inbox:
+			$self->{uid_min} = $uid_min;
+			ensure_old_ranges_exist($self, $ibx, $uid_min);
+		}
+	}
+
 	# RFC 3501 2.3.1.1 -  "A good UIDVALIDITY value to use in
 	# this case is a 32-bit representation of the creation
 	# date/time of the mailbox"
@@ -208,11 +267,11 @@ sub cmd_examine ($$$) {
 * $max RECENT\r
 * FLAGS (\\Seen)\r
 * OK [PERMANENTFLAGS ()] Read-only mailbox\r
+* OK [UNSEEN $max]\r
+* OK [UIDNEXT $uidnext]\r
+* OK [UIDVALIDITY $uidvalidity]\r
+$tag OK [READ-ONLY] EXAMINE/SELECT done\r
 EOF
-	$ret .= "* OK [UNSEEN $max]\r\n" if $max;
-	$ret .= "* OK [UIDNEXT $uidnext]\r\n" if defined $uidnext;
-	$ret .= "* OK [UIDVALIDITY $uidvalidity]\r\n" if defined $uidvalidity;
-	$ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT done\r\n";
 }
 
 sub _esc ($) {
@@ -357,12 +416,12 @@ sub fetch_body ($;$) {
 }
 
 sub dummy_message ($$) {
-	my ($seqno, $ibx) = @_;
+	my ($self, $seqno) = @_;
 	my $ret = <<EOF;
 From: nobody\@localhost\r
 To: nobody\@localhost\r
 Date: Thu, 01 Jan 1970 00:00:00 +0000\r
-Message-ID: <dummy-$seqno\@$ibx->{newsgroup}>\r
+Message-ID: <dummy-$seqno\@$self->{ibx}->{newsgroup}>\r
 Subject: dummy message #$seqno\r
 \r
 You're seeing this message because your IMAP client didn't use UIDs.\r
@@ -390,13 +449,13 @@ sub requeue_once ($) {
 
 sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
-	my ($self, undef, $ibx, $msgs, undef, $want) = @$fetch_m_arg;
+	my ($self, undef, $msgs, undef, $want) = @$fetch_m_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
 		return requeue_once($self) unless defined $want->{-seqno};
-		$bref = dummy_message($smsg->{num}, $ibx);
+		$bref = dummy_message($self, $smsg->{num});
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
@@ -446,7 +505,7 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 }
 
 sub range_step ($$) {
-	my ($ibx, $range_csv) = @_;
+	my ($self, $range_csv) = @_;
 	my ($beg, $end, $range);
 	if ($$range_csv =~ s/\A([^,]+),//) {
 		$range = $1;
@@ -458,39 +517,46 @@ sub range_step ($$) {
 		($beg, $end) = ($1 + 0, $2 + 0);
 	} elsif ($range =~ /\A([0-9]+):\*\z/) {
 		$beg = $1 + 0;
-		$end = $ibx->mm->max // 0;
+		$end = $self->{ibx}->mm->max // 0;
 		$beg = $end if $beg > $end;
 	} elsif ($range =~ /\A[0-9]+\z/) {
 		$beg = $end = $range + 0;
+		undef $range;
 	} else {
 		return 'BAD fetch range';
 	}
+	if (defined($range) && (my $uid_min = $self->{uid_min})) {
+		my $uid_end = $uid_min + UID_BLOCK - 1;
+		$beg = $uid_min if $beg < $uid_min;
+		$end = $uid_end if $end > $uid_end;
+	}
 	[ $beg, $end, $$range_csv ];
 }
 
 sub refill_range ($$$) {
-	my ($ibx, $msgs, $range_info) = @_;
+	my ($self, $msgs, $range_info) = @_;
 	my ($beg, $end, $range_csv) = @$range_info;
-	if (scalar(@$msgs = @{$ibx->over->query_xover($beg, $end)})) {
+	if (scalar(@$msgs = @{$self->{ibx}->over->query_xover($beg, $end)})) {
 		$range_info->[0] = $msgs->[-1]->{num} + 1;
 		return;
 	}
 	return 'OK Fetch done' if !$range_csv;
-	my $next_range = range_step($ibx, \$range_csv);
+	my $next_range = range_step($self, \$range_csv);
 	return $next_range if !ref($next_range); # error
 	@$range_info = @$next_range;
 	undef; # keep looping
 }
 
 sub uid_fetch_m { # long_response
-	my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_;
+	my ($self, $tag, $msgs, $range_info, $want) = @_;
 	while (!@$msgs) { # rare
-		if (my $end = refill_range($ibx, $msgs, $range_info)) {
+		if (my $end = refill_range($self, $msgs, $range_info)) {
 			$self->write(\"$tag $end\r\n");
 			return;
 		}
 	}
-	git_async_cat($ibx->git, $msgs->[0]->{blob}, \&uid_fetch_cb, \@_);
+	git_async_cat($self->{ibx}->git, $msgs->[0]->{blob},
+			\&uid_fetch_cb, \@_);
 }
 
 sub cmd_status ($$$;@) {
@@ -698,9 +764,9 @@ sub fetch_common ($$$$) {
 		} sort keys %partial ];
 	}
 	$range_csv = 'bad' if $range_csv !~ $valid_range;
-	my $range_info = range_step($ibx, \$range_csv);
+	my $range_info = range_step($self, \$range_csv);
 	return "$tag $range_info\r\n" if !ref($range_info);
-	[ $tag, $ibx, [], $range_info, \%want ];
+	[ $tag, [], $range_info, \%want ];
 }
 
 sub cmd_uid_fetch ($$$;@) {
@@ -712,9 +778,9 @@ sub cmd_uid_fetch ($$$;@) {
 }
 
 sub seq_fetch_m { # long_response
-	my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_;
+	my ($self, $tag, $msgs, $range_info, $want) = @_;
 	while (!@$msgs) { # rare
-		if (my $end = refill_range($ibx, $msgs, $range_info)) {
+		if (my $end = refill_range($self, $msgs, $range_info)) {
 			$self->write(\"$tag $end\r\n");
 			return;
 		}
@@ -722,13 +788,13 @@ sub seq_fetch_m { # long_response
 	my $seq = $want->{-seqno}++;
 	my $cur_num = $msgs->[0]->{num};
 	if ($cur_num == $seq) { # as expected
-		git_async_cat($ibx->git, $msgs->[0]->{blob},
+		git_async_cat($self->{ibx}->git, $msgs->[0]->{blob},
 				\&uid_fetch_cb, \@_);
 	} elsif ($cur_num > $seq) {
 		# send dummy messages until $seq catches up to $cur_num
 		my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg';
 		unshift @$msgs, $smsg;
-		my $bref = dummy_message($seq, $ibx);
+		my $bref = dummy_message($self, $seq);
 		uid_fetch_cb($bref, undef, undef, undef, \@_);
 		$smsg; # blessed response since uid_fetch_cb requeues
 	} else { # should not happen
@@ -741,14 +807,14 @@ sub cmd_fetch ($$$;@) {
 	my $args = fetch_common($self, $tag, $range_csv, \@want);
 	ref($args) eq 'ARRAY' ? do {
 		my $want = $args->[-1];
-		$want->{-seqno} = $args->[3]->[0]; # $beg == $range_info->[0];
+		$want->{-seqno} = $args->[2]->[0]; # $beg == $range_info->[0];
 		long_response($self, \&seq_fetch_m, @$args)
 	} : $args; # error
 }
 
 sub uid_search_all { # long_response
-	my ($self, $tag, $ibx, $num) = @_;
-	my $uids = $ibx->mm->ids_after($num);
+	my ($self, $tag, $num) = @_;
+	my $uids = $self->{ibx}->mm->ids_after($num);
 	if (scalar(@$uids)) {
 		$self->msg_more(join(' ', '', @$uids));
 	} else {
@@ -758,8 +824,8 @@ sub uid_search_all { # long_response
 }
 
 sub uid_search_uid_range { # long_response
-	my ($self, $tag, $ibx, $beg, $end) = @_;
-	my $uids = $ibx->mm->msg_range($beg, $end, 'num');
+	my ($self, $tag, $beg, $end) = @_;
+	my $uids = $self->{ibx}->mm->msg_range($beg, $end, 'num');
 	if (@$uids) {
 		$self->msg_more(join('', map { " $_->[0]" } @$uids));
 	} else {
@@ -775,14 +841,14 @@ sub cmd_uid_search ($$$;) {
 	if ($arg eq 'ALL' && !@rest) {
 		$self->msg_more('* SEARCH');
 		my $num = 0;
-		long_response($self, \&uid_search_all, $tag, $ibx, \$num);
+		long_response($self, \&uid_search_all, $tag, \$num);
 	} elsif ($arg eq 'UID' && scalar(@rest) == 1) {
 		if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
 			my ($beg, $end) = ($1, $2);
 			$end = $ibx->mm->max if $end eq '*';
 			$self->msg_more('* SEARCH');
 			long_response($self, \&uid_search_uid_range,
-					$tag, $ibx, \$beg, $end);
+					$tag, \$beg, $end);
 		} elsif ($rest[0] =~ /\A[0-9]+\z/s) {
 			my $uid = $rest[0];
 			$uid = $ibx->over->get_art($uid) ? " $uid" : '';

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 42/82] imap: start doing iterative config reloading
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (40 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 41/82] imap: break giant inboxes into sub-inboxes of 50K messages Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 43/82] imap: require ".$UID_MIN-$UID_END" suffix Eric Wong
                   ` (40 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This will be used to prevent reloading a giant config with
tens/hundreds of thousands of inboxes from blocking the event
loop.
---
 lib/PublicInbox/Config.pm | 15 ++++++++++
 lib/PublicInbox/IMAPD.pm  | 61 +++++++++++++++++++++++++++++++++++----
 script/public-inbox-imapd |  2 +-
 3 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 458f29b2964..c18c9c75b4f 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -99,6 +99,21 @@ sub each_inbox {
 	}
 }
 
+sub iterate_start {
+	my ($self, $cb, $arg) = @_;
+	my $i = 0;
+	$self->{-iter} = [ \$i, $cb, $arg ];
+}
+
+# for PublicInbox::DS::next_tick
+sub event_step {
+	my ($self) = @_;
+	my ($i, $cb, $arg) = @{$self->{-iter}};
+	my $section = $self->{-section_order}->[$$i++];
+	delete($self->{-iter}) unless defined($section);
+	$cb->($self, $section, $arg);
+}
+
 sub lookup_newsgroup {
 	my ($self, $ng) = @_;
 	_lookup_fill($self, '-by_newsgroup', lc($ng));
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index 966879d8ea5..a10fabffb1f 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -7,6 +7,8 @@ package PublicInbox::IMAPD;
 use strict;
 use parent qw(PublicInbox::NNTPD);
 use PublicInbox::InboxIdle;
+use PublicInbox::IMAP;
+# *UID_BLOCK = \&PublicInbox::IMAP::UID_BLOCK;
 
 sub new {
 	my ($class) = @_;
@@ -51,17 +53,64 @@ sub refresh_inboxlist ($) {
 	$self->{inboxlist} = \@names;
 }
 
-sub refresh_groups {
-	my ($self) = @_;
-	my $pi_config = $self->{pi_config} = PublicInbox::Config->new;
-	$self->SUPER::refresh_groups($pi_config);
-	refresh_inboxlist($self);
+sub imapd_refresh_ibx { # pi_config->each_inbox cb
+	my ($ibx, $imapd) = @_;
+	my $ngname = $ibx->{newsgroup} or return;
+	if (ref $ngname) {
+		warn 'multiple newsgroups not supported: '.
+			join(', ', @$ngname). "\n";
+	} elsif ($ngname =~ m![^a-z0-9/_\.\-\~\@\+\=:]! ||
+		 $ngname =~ /\.[0-9]+-[0-9]+\z/) {
+		warn "mailbox name invalid: `$ngname'\n";
+	}
+
+	my $mm = $ibx->mm or return;
+	$ibx->{mm} = undef;
+	defined($ibx->{uidvalidity} = $mm->created_at) or return;
+	$imapd->{tmp_groups}->{$ngname} = $ibx;
 
-	if (my $idler = $self->{idler}) {
+	# preload to avoid fragmentation:
+	$ibx->description;
+	$ibx->base_url;
+	# my $max = $mm->max // 0;
+	# my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1;
+}
+
+sub imapd_refresh_finalize {
+	my ($imapd, $pi_config) = @_;
+	$imapd->{groups} = delete $imapd->{tmp_groups};
+	$imapd->{grouplist} = [ values %{$imapd->{groups}} ];
+	refresh_inboxlist($imapd);
+	$imapd->{pi_config} = $pi_config;
+	if (my $idler = $imapd->{idler}) {
 		$idler->refresh($pi_config);
 	}
 }
 
+sub imapd_refresh_step { # pi_config->iterate_start cb
+	my ($pi_config, $section, $imapd) = @_;
+	if (defined($section)) {
+		return if $section !~ m!\Apublicinbox\.([^/]+)\z!;
+		my $ibx = $pi_config->lookup_name($1) or return;
+		imapd_refresh_ibx($ibx, $imapd);
+	} else { # "EOF"
+		imapd_refresh_finalize($imapd, $pi_config);
+	}
+}
+
+sub refresh_groups {
+	my ($self, $sig) = @_;
+	my $pi_config = PublicInbox::Config->new;
+	$self->{tmp_groups} = {};
+	if (0 && $sig) { # SIGHUP
+		$pi_config->iterate_start(\&imapd_refresh_step, $self);
+		PublicInbox::DS::requeue($pi_config); # call event_step
+	} else { # initial start
+		$pi_config->each_inbox(\&imapd_refresh_ibx, $self);
+		imapd_refresh_finalize($self, $pi_config);
+	}
+}
+
 sub idler_start {
 	$_[0]->{idler} //= PublicInbox::InboxIdle->new($_[0]->{pi_config});
 }
diff --git a/script/public-inbox-imapd b/script/public-inbox-imapd
index 63f865f53f0..60f2e6d8a36 100644
--- a/script/public-inbox-imapd
+++ b/script/public-inbox-imapd
@@ -9,6 +9,6 @@ use PublicInbox::IMAPdeflate; # loads PublicInbox::IMAP
 use PublicInbox::IMAPD;
 my $imapd = PublicInbox::IMAPD->new;
 PublicInbox::Daemon::run('0.0.0.0:143',
-	sub { $imapd->refresh_groups }, # refresh
+	sub { $imapd->refresh_groups(@_) }, # refresh
 	sub ($$$) { PublicInbox::IMAP->new($_[0], $imapd) }, # post_accept
 	$imapd);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 43/82] imap: require ".$UID_MIN-$UID_END" suffix
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (41 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 42/82] imap: start doing iterative config reloading Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 44/82] imapd: ensure LIST is sorted alphabetically, for now Eric Wong
                   ` (39 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Finish up the IMAP-only portion of iterative config reloading,
which allows us to create all sub-ranges of an inbox up front.
The InboxIdler still uses ->each_inbox which will struggle with
100K inboxes.

Having messages in the top-level newsgroup name of an inbox will
still waste bandwidth for clients which want to do full syncs
once there's a rollover to a new 50K range.  So instead, make
every inbox accessible exclusively via 50K slices in the form of
"$NEWSGROUP.$UID_MIN-$UID_END".

This introduces the DummyInbox, which makes $NEWSGROUP
and every parent component a selectable, empty inbox.
This aids navigation with mutt and possibly other MUAs.

Finally, the xt/perf-imap-list maintainer test is broken, now,
so remove it.  The grep perlfunc is already proven effective,
and we'll have separate tests for mocking out ~100k inboxes.
---
 MANIFEST                      |  2 +-
 lib/PublicInbox/Config.pm     |  7 ++-
 lib/PublicInbox/DummyInbox.pm | 21 +++++++++
 lib/PublicInbox/IMAP.pm       | 64 +++++++++----------------
 lib/PublicInbox/IMAPD.pm      | 88 ++++++++++++++++-------------------
 t/imap.t                      | 33 +++++++++++--
 t/imapd.t                     | 22 +++++----
 xt/imapd-mbsync-oimap.t       |  5 +-
 xt/imapd-validate.t           |  9 ++--
 xt/perf-imap-list.t           | 38 ---------------
 10 files changed, 139 insertions(+), 150 deletions(-)
 create mode 100644 lib/PublicInbox/DummyInbox.pm
 delete mode 100644 xt/perf-imap-list.t

diff --git a/MANIFEST b/MANIFEST
index 957228250a7..a133b71f881 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -105,6 +105,7 @@ lib/PublicInbox/DS.pm
 lib/PublicInbox/DSKQXS.pm
 lib/PublicInbox/DSPoll.pm
 lib/PublicInbox/Daemon.pm
+lib/PublicInbox/DummyInbox.pm
 lib/PublicInbox/Emergency.pm
 lib/PublicInbox/Eml.pm
 lib/PublicInbox/EmlContentFoo.pm
@@ -357,7 +358,6 @@ xt/imapd-validate.t
 xt/mem-msgview.t
 xt/msgtime_cmp.t
 xt/nntpd-validate.t
-xt/perf-imap-list.t
 xt/perf-msgview.t
 xt/perf-nntpd.t
 xt/perf-threading.t
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index c18c9c75b4f..19535beb973 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -105,13 +105,16 @@ sub iterate_start {
 	$self->{-iter} = [ \$i, $cb, $arg ];
 }
 
-# for PublicInbox::DS::next_tick
+# for PublicInbox::DS::next_tick, we only call this is if
+# PublicInbox::DS is already loaded
 sub event_step {
 	my ($self) = @_;
 	my ($i, $cb, $arg) = @{$self->{-iter}};
 	my $section = $self->{-section_order}->[$$i++];
 	delete($self->{-iter}) unless defined($section);
-	$cb->($self, $section, $arg);
+	eval { $cb->($self, $section, $arg) };
+	warn "E: $@ in ${self}::event_step" if $@;
+	PublicInbox::DS::requeue($self) if defined($section);
 }
 
 sub lookup_newsgroup {
diff --git a/lib/PublicInbox/DummyInbox.pm b/lib/PublicInbox/DummyInbox.pm
new file mode 100644
index 00000000000..e38f9e5a432
--- /dev/null
+++ b/lib/PublicInbox/DummyInbox.pm
@@ -0,0 +1,21 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# An EXAMINE-able, PublicInbox::Inbox-like object for IMAP.  Some
+# IMAP clients don't like having unselectable parent mailboxes,
+# so we have a dummy
+package PublicInbox::DummyInbox;
+use strict;
+
+sub created_at { 0 } # Msgmap::created_at
+sub mm { shift }
+sub max { undef } # Msgmap::max
+sub msg_range { [] } # Msgmap::msg_range
+
+no warnings 'once';
+*query_xover = \&msg_range;
+*over = \&mm;
+*subscribe_unlock = *unsubscribe_unlock =
+	*get_art = *description = *base_url = \&max;
+
+1;
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 959dcfbe328..6f64dff9958 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -192,26 +192,23 @@ sub cmd_done ($$) {
 	"$idle_tag OK Idle done\r\n";
 }
 
-sub ensure_old_ranges_exist ($$$) {
-	my ($self, $ibx, $uid_min) = @_;
-	my $groups = $self->{imapd}->{groups};
-	my $mailbox = $ibx->{newsgroup};
+sub ensure_ranges_exist ($$$) {
+	my ($imapd, $ibx, $max) = @_;
+	my $mailboxes = $imapd->{mailboxes};
+	my $mb_top = $ibx->{newsgroup};
 	my @created;
-	$uid_min -= UID_BLOCK;
+	my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1;
 	my $uid_end = $uid_min + UID_BLOCK - 1;
 	while ($uid_min > 0) {
-		my $sub_mailbox = "$mailbox.$uid_min-$uid_end";
-		last if exists $groups->{$sub_mailbox};
-		$groups->{$sub_mailbox} = $ibx;
+		my $sub_mailbox = "$mb_top.$uid_min-$uid_end";
+		last if exists $mailboxes->{$sub_mailbox};
+		$mailboxes->{$sub_mailbox} = $ibx;
 		$uid_end -= UID_BLOCK;
 		$uid_min -= UID_BLOCK;
 		push @created, $sub_mailbox;
 	}
 	return unless @created;
-	my $l = $self->{imapd}->{inboxlist};
-	grep {
-		/ \Q$mailbox\E\r\n\z/ and s/\(\\HasNoChildren/\(\\HasChildren/;
-	} @$l;
+	my $l = $imapd->{inboxlist} or return;
 	push @$l, map { qq[* LIST (\\HasNoChildren) "." $_\r\n] } @created;
 }
 
@@ -222,41 +219,23 @@ sub cmd_examine ($$$) {
 	if ($mailbox =~ /\A(.+)\.([0-9]+)-([0-9]+)\z/) {
 		# old mail: inbox.comp.foo.$uid_min-$uid_end
 		my ($mb_top, $uid_min, $uid_end) = ($1, $2 + 0, $3 + 0);
-		$ibx = $self->{imapd}->{groups}->{lc $mb_top};
-		if (!$ibx || ($uid_end % UID_BLOCK) != 0 ||
-				($uid_min + UID_BLOCK - 1) != $uid_end) {
-			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
-		}
-		$mm = $ibx->mm;
-		$max = $mm->max // 0;
 
-		# don't let users create inboxes w/ not-yet-possible range:
-		$uid_min > $max and
+		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
 			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
 
-		$max = $uid_min + UID_BLOCK + 1;
+		$mm = $ibx->mm;
+		$max = $mm->max // 0;
 		$self->{uid_min} = $uid_min;
-		ensure_old_ranges_exist($self, $ibx, $uid_min);
-	} else { # current mailbox (most recent UID_BLOCK messages)
-		$ibx = $self->{imapd}->{groups}->{lc $mailbox} or
+		ensure_ranges_exist($self->{imapd}, $ibx, $max);
+		$max = $uid_end if $max > $uid_end;
+	} else { # check for dummy inboxes
+		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
 			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
-
+		delete $self->{uid_min};
+		$max = 0;
 		$mm = $ibx->mm;
-		$max = $mm->max // 0;
-
-		my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1;
-		if ($uid_min == 1) { # normal inbox with <UID_BLOCK messages
-			delete $self->{uid_min}; # implicit cmd_close
-		} else { # we have a giant inbox:
-			$self->{uid_min} = $uid_min;
-			ensure_old_ranges_exist($self, $ibx, $uid_min);
-		}
 	}
 
-	# RFC 3501 2.3.1.1 -  "A good UIDVALIDITY value to use in
-	# this case is a 32-bit representation of the creation
-	# date/time of the mailbox"
-	my $uidvalidity = $mm->created_at or return "$tag BAD UIDVALIDITY\r\n";
 	my $uidnext = $max + 1;
 
 	# XXX: do we need this? RFC 5162/7162
@@ -269,7 +248,7 @@ sub cmd_examine ($$$) {
 * OK [PERMANENTFLAGS ()] Read-only mailbox\r
 * OK [UNSEEN $max]\r
 * OK [UIDNEXT $uidnext]\r
-* OK [UIDVALIDITY $uidvalidity]\r
+* OK [UIDVALIDITY $ibx->{uidvalidity}]\r
 $tag OK [READ-ONLY] EXAMINE/SELECT done\r
 EOF
 }
@@ -561,7 +540,7 @@ sub uid_fetch_m { # long_response
 
 sub cmd_status ($$$;@) {
 	my ($self, $tag, $mailbox, @items) = @_;
-	my $ibx = $self->{imapd}->{groups}->{$mailbox} or
+	my $ibx = $self->{imapd}->{mailboxes}->{$mailbox} or
 		return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
 	return "$tag BAD no items\r\n" if !scalar(@items);
 	($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and
@@ -577,8 +556,7 @@ sub cmd_status ($$$;@) {
 		} elsif ($it eq 'UIDNEXT') {
 			push(@it, ($max //= $mm->max // 0) + 1);
 		} elsif ($it eq 'UIDVALIDITY') {
-			push(@it, $mm->created_at //
-				return("$tag BAD UIDVALIDITY\r\n"));
+			push(@it, $ibx->{uidvalidity});
 		} else {
 			return "$tag BAD invalid item\r\n";
 		}
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index a10fabffb1f..b647c940505 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -5,82 +5,75 @@
 # see script/public-inbox-imapd for how it is used
 package PublicInbox::IMAPD;
 use strict;
-use parent qw(PublicInbox::NNTPD);
+use PublicInbox::Config;
 use PublicInbox::InboxIdle;
 use PublicInbox::IMAP;
-# *UID_BLOCK = \&PublicInbox::IMAP::UID_BLOCK;
+use PublicInbox::DummyInbox;
+my $dummy = bless { uidvalidity => 0 }, 'PublicInbox::DummyInbox';
 
 sub new {
 	my ($class) = @_;
 	bless {
-		groups => {},
+		mailboxes => {},
 		err => \*STDERR,
 		out => \*STDOUT,
-		grouplist => [],
 		# accept_tls => { SSL_server => 1, ..., SSL_reuse_ctx => ... }
 		# pi_config => PublicInbox::Config
 		# idler => PublicInbox::InboxIdle
 	}, $class;
 }
 
-sub refresh_inboxlist ($) {
-	my ($self) = @_;
-	my @names = map { $_->{newsgroup} } @{delete $self->{grouplist}};
-	my %ns; # "\Noselect \HasChildren"
-
-	if (my @uc = grep(/[A-Z]/, @names)) {
-		warn "Uppercase not allowed for IMAP newsgroup(s):\n",
-			map { "\t$_\n" } @uc;
-		my %uc = map { $_ => 1 } @uc;
-		@names = grep { !$uc{$_} } @names;
-	}
-	for (@names) {
-		my $up = $_;
-		while ($up =~ s/\.[^\.]+\z//) {
-			$ns{$up} = '\\Noselect \\HasChildren';
-		}
-	}
-	@names = map {;
-		my $at = delete($ns{$_}) ? '\\HasChildren' : '\\HasNoChildren';
-		qq[* LIST ($at) "." $_\r\n]
-	} @names;
-	push(@names, map { qq[* LIST ($ns{$_}) "." $_\r\n] } keys %ns);
-	@names = sort {
-		my ($xa) = ($a =~ / (\S+)\r\n/g);
-		my ($xb) = ($b =~ / (\S+)\r\n/g);
-		length($xa) <=> length($xb);
-	} @names;
-	$self->{inboxlist} = \@names;
-}
-
 sub imapd_refresh_ibx { # pi_config->each_inbox cb
 	my ($ibx, $imapd) = @_;
 	my $ngname = $ibx->{newsgroup} or return;
 	if (ref $ngname) {
 		warn 'multiple newsgroups not supported: '.
 			join(', ', @$ngname). "\n";
+		return;
 	} elsif ($ngname =~ m![^a-z0-9/_\.\-\~\@\+\=:]! ||
 		 $ngname =~ /\.[0-9]+-[0-9]+\z/) {
-		warn "mailbox name invalid: `$ngname'\n";
+		warn "mailbox name invalid: newsgroup=`$ngname'\n";
+		return;
 	}
-
+	$ibx->over or return;
+	$ibx->{over} = undef;
 	my $mm = $ibx->mm or return;
 	$ibx->{mm} = undef;
+
+	# RFC 3501 2.3.1.1 -  "A good UIDVALIDITY value to use in
+	# this case is a 32-bit representation of the creation
+	# date/time of the mailbox"
 	defined($ibx->{uidvalidity} = $mm->created_at) or return;
-	$imapd->{tmp_groups}->{$ngname} = $ibx;
+	PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 1);
 
 	# preload to avoid fragmentation:
 	$ibx->description;
 	$ibx->base_url;
-	# my $max = $mm->max // 0;
-	# my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1;
+
+	# ensure dummies are selectable
+	my $dummies = $imapd->{dummies};
+	do {
+		$dummies->{$ngname} = $dummy;
+	} while ($ngname =~ s/\.[^\.]+\z//);
 }
 
 sub imapd_refresh_finalize {
 	my ($imapd, $pi_config) = @_;
-	$imapd->{groups} = delete $imapd->{tmp_groups};
-	$imapd->{grouplist} = [ values %{$imapd->{groups}} ];
-	refresh_inboxlist($imapd);
+	my $mailboxes;
+	if (my $next = delete $imapd->{imapd_next}) {
+		$imapd->{mailboxes} = delete $next->{mailboxes};
+		$mailboxes = delete $next->{dummies};
+	} else {
+		$mailboxes = delete $imapd->{dummies};
+	}
+	%$mailboxes = (%$mailboxes, %{$imapd->{mailboxes}});
+	$imapd->{mailboxes} = $mailboxes;
+	$imapd->{inboxlist} = [
+		map {
+			my $no = $mailboxes->{$_} == $dummy ? '' : 'No';
+			qq[* LIST (\\Has${no}Children) "." $_\r\n]
+		} sort { length($a) <=> length($b) } keys %$mailboxes
+	];
 	$imapd->{pi_config} = $pi_config;
 	if (my $idler = $imapd->{idler}) {
 		$idler->refresh($pi_config);
@@ -92,8 +85,8 @@ sub imapd_refresh_step { # pi_config->iterate_start cb
 	if (defined($section)) {
 		return if $section !~ m!\Apublicinbox\.([^/]+)\z!;
 		my $ibx = $pi_config->lookup_name($1) or return;
-		imapd_refresh_ibx($ibx, $imapd);
-	} else { # "EOF"
+		imapd_refresh_ibx($ibx, $imapd->{imapd_next});
+	} else { # undef == "EOF"
 		imapd_refresh_finalize($imapd, $pi_config);
 	}
 }
@@ -101,11 +94,12 @@ sub imapd_refresh_step { # pi_config->iterate_start cb
 sub refresh_groups {
 	my ($self, $sig) = @_;
 	my $pi_config = PublicInbox::Config->new;
-	$self->{tmp_groups} = {};
-	if (0 && $sig) { # SIGHUP
+	if ($sig) { # SIGHUP is handled through the event loop
+		$self->{imapd_next} = { dummies => {}, mailboxes => {} };
 		$pi_config->iterate_start(\&imapd_refresh_step, $self);
 		PublicInbox::DS::requeue($pi_config); # call event_step
-	} else { # initial start
+	} else { # initial start is synchronous
+		$self->{dummies} = {};
 		$pi_config->each_inbox(\&imapd_refresh_ibx, $self);
 		imapd_refresh_finalize($self, $pi_config);
 	}
diff --git a/t/imap.t b/t/imap.t
index aa262a19632..af59ef69386 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -6,16 +6,39 @@ use strict;
 use Test::More;
 use PublicInbox::IMAP;
 use PublicInbox::IMAPD;
+use PublicInbox::TestCommon;
+require_mods(qw(DBD::SQLite));
+require_git 2.6;
 
-{ # make sure we get '%' globbing right
+my ($tmpdir, $for_destroy) = tmpdir();
+my $cfgfile = "$tmpdir/config";
+{
+	open my $fh, '>', $cfgfile or BAIL_OUT $!;
+	print $fh <<EOF or BAIL_OUT $!;
+[publicinbox "a"]
+	inboxdir = $tmpdir/a
+	newsgroup = x.y.z
+[publicinbox "b"]
+	inboxdir = $tmpdir/b
+	newsgroup = x.z.y
+[publicinbox "c"]
+	inboxdir = $tmpdir/c
+	newsgroup = IGNORE.THIS
+EOF
+	close $fh or BAIL_OUT $!;
+	local $ENV{PI_CONFIG} = $cfgfile;
+	for my $x (qw(a b c)) {
+		ok(run_script(['-init', '-Lbasic', '-V2', $x, "$tmpdir/$x",
+				"https://example.com/$x", "$x\@example.com"]),
+			"init $x");
+	}
+	my $imapd = PublicInbox::IMAPD->new;
 	my @w;
 	local $SIG{__WARN__} = sub { push @w, @_ };
-	my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y IGNORE.THIS));
-	my $self = { imapd => { grouplist => \@n } };
-	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
+	$imapd->refresh_groups;
+	my $self = { imapd => $imapd };
 	is(scalar(@w), 1, 'got a warning for upper-case');
 	like($w[0], qr/IGNORE\.THIS/, 'warned about upper-case');
-
 	my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%');
 	is(scalar($$res =~ tr/\n/\n/), 2, 'only one result');
 	like($$res, qr/ x\r\ntag OK/, 'saw expected');
diff --git a/t/imapd.t b/t/imapd.t
index a5324f78da0..017bfa0b27b 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -16,6 +16,9 @@ if ($can_compress) { # hope this gets fixed upstream, soon
 	$imap_client = 'PublicInbox::IMAPClient';
 }
 
+require_ok 'PublicInbox::IMAP';
+my $first_range = '1-'.PublicInbox::IMAP::UID_BLOCK();
+
 my $level = '-Lbasic';
 SKIP: {
 	require_mods('Search::Xapian', 1);
@@ -87,11 +90,13 @@ ok(!$mic->examine('foo') && ($e = $@), 'EXAMINE non-existent');
 like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent');
 ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent');
 like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent');
-ok($mic->select('inbox.i1'), 'SELECT succeeds');
-ok($mic->examine('INBOX.i1'), 'EXAMINE succeeds');
-my @raw = $mic->status('inbox.i1', qw(Messages uidnext uidvalidity));
+my $mailbox1 = "inbox.i1.$first_range";
+ok($mic->select('inbox.i1'), 'SELECT on parent succeeds');
+ok($mic->select($mailbox1), 'SELECT succeeds');
+ok($mic->examine($mailbox1), 'EXAMINE succeeds');
+my @raw = $mic->status($mailbox1, qw(Messages uidnext uidvalidity));
 is(scalar(@raw), 2, 'got status response');
-like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20
+like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\.$first_range\x20
 	\(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx);
 like($raw[1], qr/\A\S+ OK /, 'finished status response');
 
@@ -99,7 +104,7 @@ like($raw[1], qr/\A\S+ OK /, 'finished status response');
 like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
 	'got an inbox');
 like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
-is(scalar(@raw), scalar(@V) + 2, 'default LIST response');
+is(scalar(@raw), scalar(@V) + 4, 'default LIST response');
 @raw = $mic->list('', 'inbox.i1');
 is(scalar(@raw), 2, 'limited LIST response');
 like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
@@ -199,7 +204,7 @@ SKIP: {
 	skip 'Mail::IMAPClient too old for ->compress', 2 if !$can_compress;
 	my $c = $imap_client->new(%mic_opt);
 	ok($c && $c->compress, 'compress enabled');
-	ok($c->examine('inbox.i1'), 'EXAMINE succeeds after COMPRESS');
+	ok($c->examine($mailbox1), 'EXAMINE succeeds after COMPRESS');
 	$ret = $c->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@";
 	is_deeply($ret, [ 1 ], 'search UID 1:* works after compression');
 }
@@ -216,11 +221,12 @@ $pi_config->each_inbox(sub {
 	my $ng = $ibx->{newsgroup};
 	my $mic = $imap_client->new(%mic_opt);
 	ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name");
-	my $uidnext = $mic->uidnext($ng); # we'll fetch BODYSTRUCTURE on this
+	my $mb = "$ng.$first_range";
+	my $uidnext = $mic->uidnext($mb); # we'll fetch BODYSTRUCTURE on this
 	ok($uidnext, 'got uidnext for later fetch');
 	is_deeply([$mic->has_capability('IDLE')], ['IDLE'], "IDLE capa $name");
 	ok(!$mic->idle, "IDLE fails w/o SELECT/EXAMINE $name");
-	ok($mic->examine($ng), "EXAMINE $ng succeeds");
+	ok($mic->examine($mb), "EXAMINE $ng succeeds");
 	ok(my $idle_tag = $mic->idle, "IDLE succeeds on $ng");
 
 	open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!");
diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t
index d2237a24bef..b2cb8737f82 100644
--- a/xt/imapd-mbsync-oimap.t
+++ b/xt/imapd-mbsync-oimap.t
@@ -13,12 +13,13 @@ my $inboxdir = $ENV{GIANT_INBOX_DIR};
 plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $cfg = "$tmpdir/cfg";
-my $mailbox = 'inbox.test';
+my $newsgroup = 'inbox.test';
+my $mailbox = "$newsgroup.1-50000";
 {
 	open my $fh, '>', $cfg or BAIL_OUT "open: $!";
 	print $fh <<EOF or BAIL_OUT "print: $!";
 [publicinbox "test"]
-	newsgroup = $mailbox
+	newsgroup = $newsgroup
 	address = oimap\@example.com
 	inboxdir = $inboxdir
 EOF
diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t
index f96ec8791b9..b7b66d0583f 100644
--- a/xt/imapd-validate.t
+++ b/xt/imapd-validate.t
@@ -30,7 +30,7 @@ if ($test_tls && !-r $key || !-r $cert) {
 }
 my ($tmpdir, $for_destroy) = tmpdir();
 my %OPT = qw(User u Password p);
-my (%STARTTLS_OPT, %IMAPS_OPT, $td, $mailbox, $make_local_server);
+my (%STARTTLS_OPT, %IMAPS_OPT, $td, $newsgroup, $mailbox, $make_local_server);
 if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) {
 	($OPT{Server}, $mailbox) = ($1, $2);
 	$OPT{Server} =~ s/:([0-9]+)\z// and $OPT{Port} = $1 + 0;
@@ -39,6 +39,7 @@ if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) {
 } else {
 	require_mods(qw(DBD::SQLite));
 	$make_local_server->();
+	$mailbox = "$newsgroup.1-50000";
 }
 
 my %opts = (imap => \%OPT, 'imap+compress' => { %OPT, Compress => 1 });
@@ -124,15 +125,15 @@ BEGIN {
 
 $make_local_server = sub {
 	require PublicInbox::Inbox;
-	$mailbox = 'inbox.test';
-	my $ibx = { inboxdir => $inbox_dir, newsgroup => $mailbox };
+	$newsgroup = 'inbox.test';
+	my $ibx = { inboxdir => $inbox_dir, newsgroup => $newsgroup };
 	$ibx = PublicInbox::Inbox->new($ibx);
 	my $pi_config = "$tmpdir/config";
 	{
 		open my $fh, '>', $pi_config or die "open($pi_config): $!";
 		print $fh <<"" or die "print $pi_config: $!";
 [publicinbox "test"]
-	newsgroup = $mailbox
+	newsgroup = $newsgroup
 	inboxdir = $inbox_dir
 	address = test\@example.com
 
diff --git a/xt/perf-imap-list.t b/xt/perf-imap-list.t
deleted file mode 100644
index 0f00f487991..00000000000
--- a/xt/perf-imap-list.t
+++ /dev/null
@@ -1,38 +0,0 @@
-#!perl -w
-# Copyright (C) 2020 all contributors <meta@public-inbox.org>
-# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-use Test::More;
-use_ok 'PublicInbox::IMAP';
-use_ok 'PublicInbox::IMAPD';
-use PublicInbox::DS;
-use Benchmark qw(:all);
-my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000);
-push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000);
-my $self = { imapd => { grouplist => \@n } };
-my $n = scalar @n;
-my $t = timeit(1, sub {
-	PublicInbox::IMAPD::refresh_inboxlist($self->{imapd});
-});
-diag timestr($t). "refresh $n inboxes";
-
-open my $null, '>', '/dev/null' or BAIL_OUT "open: $!";
-my $ds = { sock => $null };
-my $nr = 200;
-diag "starting benchmark...";
-my $cmd_list = \&PublicInbox::IMAP::cmd_list;
-$t = timeit(1, sub {
-	for (0..$nr) {
-		my $res = $cmd_list->($self, 'tag', '', '*');
-		PublicInbox::DS::write($ds, $res);
-	}
-});
-diag timestr($t). "list all for $n inboxes $nr times";
-$nr = 20;
-$t = timeit(1, sub {
-	for (0..$nr) {
-		my $res = $cmd_list->($self, 'tag', 'inbox.', '%');
-		PublicInbox::DS::write($ds, $res);
-	}
-});
-diag timestr($t). "list partial for $n inboxes $nr times";
-done_testing;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 44/82] imapd: ensure LIST is sorted alphabetically, for now
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (42 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 43/82] imap: require ".$UID_MIN-$UID_END" suffix Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 45/82] imap: omit $UID_END from mailbox name, use index Eric Wong
                   ` (38 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

I'm not sure this matters, and it could be a waste of
CPU cycles if no real clients care.  However, it does
make debugging over telnet or s_client a bit easier.
---
 lib/PublicInbox/IMAPD.pm |  7 ++++++-
 t/imapd.t                | 14 +++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index b647c940505..6488dc0f3bc 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -72,7 +72,12 @@ sub imapd_refresh_finalize {
 		map {
 			my $no = $mailboxes->{$_} == $dummy ? '' : 'No';
 			qq[* LIST (\\Has${no}Children) "." $_\r\n]
-		} sort { length($a) <=> length($b) } keys %$mailboxes
+		} sort {
+			# shortest names first, alphabetically if lengths match
+			length($a) == length($b) ?
+				($a cmp $b) :
+				(length($a) <=> length($b))
+		} keys %$mailboxes
 	];
 	$imapd->{pi_config} = $pi_config;
 	if (my $idler = $imapd->{idler}) {
diff --git a/t/imapd.t b/t/imapd.t
index 017bfa0b27b..5d9a7d1181c 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -100,7 +100,7 @@ like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\.$first_range\x20
 	\(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx);
 like($raw[1], qr/\A\S+ OK /, 'finished status response');
 
-@raw = $mic->list;
+my @orig_list = @raw = $mic->list;
 like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
 	'got an inbox');
 like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
@@ -351,6 +351,18 @@ $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]')
 is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},
 	'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n",
 	'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK');
+
+{
+	my @new_list = $mic->list;
+	# tag differs in [-1]
+	like($orig_list[-1], qr/\A\S+ OK List done\r\n/, 'orig LIST');
+	like($new_list[-1], qr/\A\S+ OK List done\r\n/, 'new LIST');
+	pop @new_list;
+	pop @orig_list;
+	# TODO: not sure if sort order matters, imapd_refresh_finalize
+	# sorts, for now, but maybe clients don't care...
+	is_deeply(\@new_list, \@orig_list, 'LIST identical');
+}
 ok($mic->close, 'CLOSE works');
 ok(!$mic->close, 'CLOSE not idempotent');
 ok($mic->logout, 'logged out');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 45/82] imap: omit $UID_END from mailbox name, use index
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (43 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 44/82] imapd: ensure LIST is sorted alphabetically, for now Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 46/82] t/config.t: always compare against git bool behavior Eric Wong
                   ` (37 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Having two large numbers separated by a dash can make visual
comparisons difficult when numbers are in the 3,000,000 range
for LKML.  So avoid the $UID_END value, since it can be
calculated from $UID_MIN.  And we can avoid large values of
$UID_MIN, too, by instead storing the block index and just
multiplying it by 50000 (and adding 1) on the server side.

Of course, LKML still goes up to 72, at the moment.
---
 lib/PublicInbox/IMAP.pm  | 15 ++++++---------
 lib/PublicInbox/IMAPD.pm |  2 +-
 t/imapd.t                |  2 +-
 xt/imapd-mbsync-oimap.t  |  2 +-
 xt/imapd-validate.t      |  2 +-
 5 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 6f64dff9958..5865822f0c3 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -197,14 +197,10 @@ sub ensure_ranges_exist ($$$) {
 	my $mailboxes = $imapd->{mailboxes};
 	my $mb_top = $ibx->{newsgroup};
 	my @created;
-	my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1;
-	my $uid_end = $uid_min + UID_BLOCK - 1;
-	while ($uid_min > 0) {
-		my $sub_mailbox = "$mb_top.$uid_min-$uid_end";
+	for (my $i = int($max/UID_BLOCK); $i >= 0; --$i) {
+		my $sub_mailbox = "$mb_top.$i";
 		last if exists $mailboxes->{$sub_mailbox};
 		$mailboxes->{$sub_mailbox} = $ibx;
-		$uid_end -= UID_BLOCK;
-		$uid_min -= UID_BLOCK;
 		push @created, $sub_mailbox;
 	}
 	return unless @created;
@@ -216,9 +212,9 @@ sub cmd_examine ($$$) {
 	my ($self, $tag, $mailbox) = @_;
 	my ($ibx, $mm, $max);
 
-	if ($mailbox =~ /\A(.+)\.([0-9]+)-([0-9]+)\z/) {
-		# old mail: inbox.comp.foo.$uid_min-$uid_end
-		my ($mb_top, $uid_min, $uid_end) = ($1, $2 + 0, $3 + 0);
+	if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) {
+		# old mail: inbox.comp.foo.$uid_block_idx
+		my ($mb_top, $uid_min) = ($1, $2 * UID_BLOCK + 1);
 
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
 			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
@@ -227,6 +223,7 @@ sub cmd_examine ($$$) {
 		$max = $mm->max // 0;
 		$self->{uid_min} = $uid_min;
 		ensure_ranges_exist($self->{imapd}, $ibx, $max);
+		my $uid_end = $uid_min + UID_BLOCK - 1;
 		$max = $uid_end if $max > $uid_end;
 	} else { # check for dummy inboxes
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index 6488dc0f3bc..261d756042f 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -31,7 +31,7 @@ sub imapd_refresh_ibx { # pi_config->each_inbox cb
 			join(', ', @$ngname). "\n";
 		return;
 	} elsif ($ngname =~ m![^a-z0-9/_\.\-\~\@\+\=:]! ||
-		 $ngname =~ /\.[0-9]+-[0-9]+\z/) {
+		 $ngname =~ /\.[0-9]+\z/) {
 		warn "mailbox name invalid: newsgroup=`$ngname'\n";
 		return;
 	}
diff --git a/t/imapd.t b/t/imapd.t
index 5d9a7d1181c..45ee401a1e0 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -17,7 +17,7 @@ if ($can_compress) { # hope this gets fixed upstream, soon
 }
 
 require_ok 'PublicInbox::IMAP';
-my $first_range = '1-'.PublicInbox::IMAP::UID_BLOCK();
+my $first_range = '0';
 
 my $level = '-Lbasic';
 SKIP: {
diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t
index b2cb8737f82..fdaa22aa9ef 100644
--- a/xt/imapd-mbsync-oimap.t
+++ b/xt/imapd-mbsync-oimap.t
@@ -14,7 +14,7 @@ plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $cfg = "$tmpdir/cfg";
 my $newsgroup = 'inbox.test';
-my $mailbox = "$newsgroup.1-50000";
+my $mailbox = "$newsgroup.0";
 {
 	open my $fh, '>', $cfg or BAIL_OUT "open: $!";
 	print $fh <<EOF or BAIL_OUT "print: $!";
diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t
index b7b66d0583f..aeeb43b9b40 100644
--- a/xt/imapd-validate.t
+++ b/xt/imapd-validate.t
@@ -39,7 +39,7 @@ if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) {
 } else {
 	require_mods(qw(DBD::SQLite));
 	$make_local_server->();
-	$mailbox = "$newsgroup.1-50000";
+	$mailbox = "$newsgroup.0";
 }
 
 my %opts = (imap => \%OPT, 'imap+compress' => { %OPT, Compress => 1 });

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 46/82] t/config.t: always compare against git bool behavior
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (44 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 45/82] imap: omit $UID_END from mailbox name, use index Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 47/82] xt/*: show some tunable parameters Eric Wong
                   ` (36 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We'll use the xqx() to avoid losing too much performance
compared to normal `backtick` (qx) when testing using
"make check-run" + Inline::C.
---
 t/config.t | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/t/config.t b/t/config.t
index 1f50bb86a09..3f41c0042a9 100644
--- a/t/config.t
+++ b/t/config.t
@@ -209,20 +209,17 @@ EOF
 }
 
 {
-	my $check_git = !!$ENV{CHECK_GIT_BOOL};
 	for my $t (qw(TRUE true yes on 1 +1 -1 13 0x1 0x12 0X5)) {
 		is(PublicInbox::Config::_git_config_bool($t), 1, "$t is true");
-		if ($check_git) {
-			is(`git -c test.val=$t config --bool test.val`,
-				"true\n", "$t matches git-config behavior");
-		}
+		is(xqx([qw(git -c), "test.val=$t",
+			qw(config --bool test.val)]),
+			"true\n", "$t matches git-config behavior");
 	}
 	for my $f (qw(FALSE false no off 0 +0 +000 00 0x00 0X0)) {
 		is(PublicInbox::Config::_git_config_bool($f), 0, "$f is false");
-		if ($check_git) {
-			is(`git -c test.val=$f config --bool test.val`,
-				"false\n", "$f matches git-config behavior");
-		}
+		is(xqx([qw(git -c), "test.val=$f",
+			qw(config --bool test.val)]),
+			"false\n", "$f matches git-config behavior");
 	}
 	is(PublicInbox::Config::_git_config_bool('bogus'), undef,
 		'bogus is undef');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 47/82] xt/*: show some tunable parameters
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (45 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 46/82] t/config.t: always compare against git bool behavior Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 48/82] imap: STATUS and LIST are case-insensitive, too Eric Wong
                   ` (35 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This will make it easier to show parameters used for testing
and potential tweaks to be made.
---
 xt/eml_check_limits.t   | 5 ++++-
 xt/git_async_cmp.t      | 1 +
 xt/imapd-mbsync-oimap.t | 4 +++-
 xt/imapd-validate.t     | 1 +
 xt/mem-msgview.t        | 1 +
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/xt/eml_check_limits.t b/xt/eml_check_limits.t
index 2d632799956..cf780c77bab 100644
--- a/xt/eml_check_limits.t
+++ b/xt/eml_check_limits.t
@@ -13,7 +13,10 @@ require_git(2.19); # for --unordered
 require_mods(qw(BSD::Resource));
 BSD::Resource->import(qw(getrusage));
 my $cls = $ENV{TEST_CLASS};
-require_mods($cls) if $cls;
+if ($cls) {
+	diag "TEST_CLASS=$cls";
+	require_mods($cls);
+}
 $cls //= 'PublicInbox::Eml';
 my $inboxdir = $ENV{GIANT_INBOX_DIR};
 plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
diff --git a/xt/git_async_cmp.t b/xt/git_async_cmp.t
index 8f8d1cf4d01..f9c9ddefeb0 100644
--- a/xt/git_async_cmp.t
+++ b/xt/git_async_cmp.t
@@ -18,6 +18,7 @@ if (require_git(2.19, 1)) {
 }
 my @dig;
 my $nr = $ENV{NR} || 1;
+diag "NR=$nr";
 my $async = timeit($nr, sub {
 	my $dig = Digest::SHA->new(1);
 	my $cb = sub {
diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t
index fdaa22aa9ef..c097a0262f0 100644
--- a/xt/imapd-mbsync-oimap.t
+++ b/xt/imapd-mbsync-oimap.t
@@ -116,7 +116,9 @@ while (scalar keys %pids) {
 	is($?, 0, join(' ', @$cmd, 'done'));
 }
 
-if (my $sec = $ENV{TEST_PERSIST}) {
+my $sec = $ENV{TEST_PERSIST} // 0;
+diag "TEST_PERSIST=$sec";
+if ($sec) {
 	diag "sleeping ${sec}s, imap://$host:$port/$mailbox available";
 	diag "tmpdir=$tmpdir (Maildirs available)";
 	diag "stdout=$out";
diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t
index aeeb43b9b40..9a56c2d04c8 100644
--- a/xt/imapd-validate.t
+++ b/xt/imapd-validate.t
@@ -13,6 +13,7 @@ plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inbox_dir;
 # how many emails to read into memory at once per-process
 my $BATCH = $ENV{TEST_BATCH} // 100;
 my $REPEAT = $ENV{TEST_REPEAT} // 1;
+diag "TEST_BATCH=$BATCH TEST_REPEAT=$REPEAT";
 
 require_mods(qw(Mail::IMAPClient));
 my $imap_client = 'Mail::IMAPClient';
diff --git a/xt/mem-msgview.t b/xt/mem-msgview.t
index bffb1768b9e..c09afde083d 100644
--- a/xt/mem-msgview.t
+++ b/xt/mem-msgview.t
@@ -13,6 +13,7 @@ my @mods = qw(DBD::SQLite BSD::Resource PublicInbox::WWW);
 require_mods(@mods);
 use_ok($_) for @mods;
 my $lines = $ENV{NR_LINES} // 50000;
+diag "NR_LINES=$lines";
 my ($tmpdir, $for_destroy) = tmpdir();
 my $inboxname = 'big';
 my $inboxdir = "$tmpdir/big";

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 48/82] imap: STATUS and LIST are case-insensitive, too
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (46 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 47/82] xt/*: show some tunable parameters Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 49/82] imap: EXAMINE/STATUS: return correct counts Eric Wong
                   ` (34 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Some clients insist on sending "INBOX" in all caps,
since it's special in RFC 3501.
---
 lib/PublicInbox/IMAP.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 5865822f0c3..8c6fa7b62c9 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -537,7 +537,7 @@ sub uid_fetch_m { # long_response
 
 sub cmd_status ($$$;@) {
 	my ($self, $tag, $mailbox, @items) = @_;
-	my $ibx = $self->{imapd}->{mailboxes}->{$mailbox} or
+	my $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
 		return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
 	return "$tag BAD no items\r\n" if !scalar(@items);
 	($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and
@@ -571,7 +571,8 @@ sub cmd_list ($$$$) {
 		# request for hierarchy delimiter
 		$l = [ qq[* LIST (\\Noselect) "." ""\r\n] ];
 	} elsif ($refname ne '' || $wildcard ne '*') {
-		$wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eig;
+		$wildcard = lc $wildcard;
+		$wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eg;
 		$l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ];
 	}
 	\(join('', @$l, "$tag OK List done\r\n"));

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 49/82] imap: EXAMINE/STATUS: return correct counts
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (47 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 48/82] imap: STATUS and LIST are case-insensitive, too Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 50/82] imap: avoid uninitialized warnings on incomplete commands Eric Wong
                   ` (33 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We can share code between them and account for each 50K
mailbox slice.  However, we must overreport these for
non-zero slices and just return lots of empty data for
high-numbered slices because some MUAs still insist
on non-UID fetches.
---
 lib/PublicInbox/IMAP.pm | 53 ++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 8c6fa7b62c9..e726307a678 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -208,42 +208,43 @@ sub ensure_ranges_exist ($$$) {
 	push @$l, map { qq[* LIST (\\HasNoChildren) "." $_\r\n] } @created;
 }
 
-sub cmd_examine ($$$) {
-	my ($self, $tag, $mailbox) = @_;
-	my ($ibx, $mm, $max);
-
+sub inbox_lookup ($$) {
+	my ($self, $mailbox) = @_;
+	my ($ibx, $exists, $uidnext);
 	if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) {
 		# old mail: inbox.comp.foo.$uid_block_idx
 		my ($mb_top, $uid_min) = ($1, $2 * UID_BLOCK + 1);
 
-		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
-			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
-
-		$mm = $ibx->mm;
-		$max = $mm->max // 0;
+		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
+		$exists = $ibx->mm->max // 0;
 		$self->{uid_min} = $uid_min;
-		ensure_ranges_exist($self->{imapd}, $ibx, $max);
+		ensure_ranges_exist($self->{imapd}, $ibx, $exists);
 		my $uid_end = $uid_min + UID_BLOCK - 1;
-		$max = $uid_end if $max > $uid_end;
+		$exists = $uid_end if $exists > $uid_end;
+		$uidnext = $exists + 1;
 	} else { # check for dummy inboxes
-		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
-			return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
+		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
 		delete $self->{uid_min};
-		$max = 0;
-		$mm = $ibx->mm;
+		$exists = 0;
+		$uidnext = 1;
 	}
+	($ibx, $exists, $uidnext);
+}
 
-	my $uidnext = $max + 1;
+sub cmd_examine ($$$) {
+	my ($self, $tag, $mailbox) = @_;
+	my ($ibx, $exists, $uidnext) = inbox_lookup($self, $mailbox);
+	return "$tag NO Mailbox doesn't exist: $mailbox\r\n" if !$ibx;
 
 	# XXX: do we need this? RFC 5162/7162
 	my $ret = $self->{ibx} ? "* OK [CLOSED] previous closed\r\n" : '';
 	$self->{ibx} = $ibx;
 	$ret .= <<EOF;
-* $max EXISTS\r
-* $max RECENT\r
+* $exists EXISTS\r
+* $exists RECENT\r
 * FLAGS (\\Seen)\r
 * OK [PERMANENTFLAGS ()] Read-only mailbox\r
-* OK [UNSEEN $max]\r
+* OK [UNSEEN $exists]\r
 * OK [UIDNEXT $uidnext]\r
 * OK [UIDVALIDITY $ibx->{uidvalidity}]\r
 $tag OK [READ-ONLY] EXAMINE/SELECT done\r
@@ -537,23 +538,21 @@ sub uid_fetch_m { # long_response
 
 sub cmd_status ($$$;@) {
 	my ($self, $tag, $mailbox, @items) = @_;
-	my $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or
-		return "$tag NO Mailbox doesn't exist: $mailbox\r\n";
 	return "$tag BAD no items\r\n" if !scalar(@items);
 	($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and
 		return "$tag BAD invalid args\r\n";
-
-	my $mm = $ibx->mm;
-	my ($max, @it);
+	my ($ibx, $exists, $uidnext) = inbox_lookup($self, $mailbox);
+	return "$tag NO Mailbox doesn't exist: $mailbox\r\n" if !$ibx;
+	my @it;
 	for my $it (@items) {
 		$it = uc($it);
 		push @it, $it;
 		if ($it =~ /\A(?:MESSAGES|UNSEEN|RECENT)\z/) {
-			push(@it, ($max //= $mm->max // 0));
+			push @it, $exists;
 		} elsif ($it eq 'UIDNEXT') {
-			push(@it, ($max //= $mm->max // 0) + 1);
+			push @it, $uidnext;
 		} elsif ($it eq 'UIDVALIDITY') {
-			push(@it, $ibx->{uidvalidity});
+			push @it, $ibx->{uidvalidity};
 		} else {
 			return "$tag BAD invalid item\r\n";
 		}

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 50/82] imap: avoid uninitialized warnings on incomplete commands
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (48 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 49/82] imap: EXAMINE/STATUS: return correct counts Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 51/82] imap: start parsing out queries for SQLite and Xapian Eric Wong
                   ` (32 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

No point in spewing "uninitialized" warnings into logs when
the cat jumps on the Enter key.
---
 lib/PublicInbox/IMAP.pm |  7 +++++--
 t/imapd.t               | 11 +++++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index e726307a678..0452d6df937 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -862,14 +862,17 @@ sub process_line ($$) {
 			cmd_done($self, $tag);
 		} else { # this is weird
 			auth_challenge_ok($self) //
-				"$tag BAD Error in IMAP command $req: ".
-				"Unknown command\r\n";
+					($tag // '*') .
+					' BAD Error in IMAP command '.
+					($req // '(???)').
+					": Unknown command\r\n";
 		}
 	};
 	my $err = $@;
 	if ($err && $self->{sock}) {
 		$l =~ s/\r?\n//s;
 		err($self, 'error from: %s (%s)', $l, $err);
+		$tag //= '*';
 		$res = "$tag BAD program fault - command not performed\r\n";
 	}
 	return 0 unless defined $res;
diff --git a/t/imapd.t b/t/imapd.t
index 45ee401a1e0..11b56b09dbe 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -367,11 +367,22 @@ ok($mic->close, 'CLOSE works');
 ok(!$mic->close, 'CLOSE not idempotent');
 ok($mic->logout, 'logged out');
 
+{
+	my $c = tcp_connect($sock);
+	$c->autoflush(1);
+	like(<$c>, qr/\* OK/, 'got a greeting');
+	print $c "\r\n";
+	like(<$c>, qr/\A\* BAD Error in IMAP command/, 'empty line');
+	print $c "tagonly\r\n";
+	like(<$c>, qr/\Atagonly BAD Error in IMAP command/, 'tag-only line');
+}
+
 $td->kill;
 $td->join;
 is($?, 0, 'no error in exited process');
 open my $fh, '<', $err or BAIL_OUT("open $err failed: $!");
 my $eout = do { local $/; <$fh> };
 unlike($eout, qr/wide/i, 'no Wide character warnings');
+unlike($eout, qr/uninitialized/i, 'no uninitialized warnings');
 
 done_testing;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 51/82] imap: start parsing out queries for SQLite and Xapian
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (49 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 50/82] imap: avoid uninitialized warnings on incomplete commands Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 52/82] imap: SEARCH: clamp results to the 50K UID range Eric Wong
                   ` (31 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

None of the new cases are wired up, yet, but existing cases
still work.
---
 lib/PublicInbox/IMAP.pm | 142 ++++++++++++++++++++++++++++++++++++++--
 t/imap.t                |  15 +++++
 2 files changed, 150 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 0452d6df937..b24dfcd70b5 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -24,6 +24,8 @@ use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
 use PublicInbox::GitAsyncCat;
 use Text::ParseWords qw(parse_line);
 use Errno qw(EAGAIN);
+use Time::Local qw(timegm);
+use POSIX qw(strftime);
 
 my $Address;
 for my $mod (qw(Email::Address::XS Mail::Address)) {
@@ -67,6 +69,10 @@ for my $att (keys %FETCH_ATT) {
 my $valid_range = '[0-9]+|[0-9]+:[0-9]+|[0-9]+:\*';
 $valid_range = qr/\A(?:$valid_range)(?:,(?:$valid_range))*\z/;
 
+my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
+my %MoY;
+@MoY{@MoY} = (0..11);
+
 sub greet ($) {
 	my ($self) = @_;
 	my $capa = capa($self);
@@ -787,6 +793,17 @@ sub cmd_fetch ($$$;@) {
 	} : $args; # error
 }
 
+
+sub parse_date ($) { # 02-Oct-1993
+	my ($date_text) = @_;
+	my ($dd, $mon, $yyyy) = split(/-/, $_[0], 3);
+	defined($yyyy) or return;
+	my $mm = $MoY{$mon} // return;
+	$dd =~ /\A[0123]?[0-9]\z/ or return;
+	$yyyy =~ /\A[0-9]{4,}\z/ or return; # Y10K-compatible!
+	timegm(0, 0, 0, $dd, $mm, $yyyy);
+}
+
 sub uid_search_all { # long_response
 	my ($self, $tag, $num) = @_;
 	my $uids = $self->{ibx}->mm->ids_after($num);
@@ -809,23 +826,134 @@ sub uid_search_uid_range { # long_response
 	}
 }
 
+sub date_search {
+	my ($q, $k, $d) = @_;
+	my $sql = $q->{sql};
+
+	# Date: header
+	if ($k eq 'SENTON') {
+		my $end = $d + 86399; # no leap day...
+		my $da = strftime('%Y%m%d%H%M%S', gmtime($d));
+		my $db = strftime('%Y%m%d%H%M%S', gmtime($end));
+		$q->{xap} .= " dt:$da..$db";
+		$$sql .= " AND ds >= $d AND ds <= $end" if defined($sql);
+	} elsif ($k eq 'SENTBEFORE') {
+		$q->{xap} .= ' d:..'.strftime('%Y%m%d', gmtime($d));
+		$$sql .= " AND ds <= $d" if defined($sql);
+	} elsif ($k eq 'SENTSINCE') {
+		$q->{xap} .= ' d:'.strftime('%Y%m%d', gmtime($d)).'..';
+		$$sql .= " AND ds >= $d" if defined($sql);
+
+	# INTERNALDATE (Received)
+	} elsif ($k eq 'ON') {
+		my $end = $d + 86399; # no leap day...
+		$q->{xap} .= " ts:$d..$end";
+		$$sql .= " AND ts >= $d AND ts <= $end" if defined($sql);
+	} elsif ($k eq 'BEFORE') {
+		$q->{xap} .= " ts:..$d";
+		$$sql .= " AND ts <= $d" if defined($sql);
+	} elsif ($k eq 'SINCE') {
+		$q->{xap} .= " ts:$d..";
+		$$sql .= " AND ts >= $d" if defined($sql);
+	} else {
+		die "BUG: $k not recognized";
+	}
+}
+
+# IMAP to Xapian search key mapping
+my %I2X = (
+	SUBJECT => 's:',
+	BODY => 'b:',
+	FROM => 'f:',
+	TEXT => '', # n.b. does not include all headers
+	TO => 't:',
+	CC => 'c:',
+	# BCC => 'bcc:', # TODO
+	# KEYWORD # TODO ? dfpre,dfpost,...
+);
+
+sub parse_query {
+	my ($self, $rest) = @_;
+	if (uc($rest->[0]) eq 'CHARSET') {
+		shift @$rest;
+		defined(my $c = shift @$rest) or return 'BAD missing charset';
+		$c =~ /\A(?:UTF-8|US-ASCII)\z/ or return 'NO [BADCHARSET]';
+	}
+
+	my $sql = ''; # date conditions, {sql} deleted if Xapian is needed
+	my $q = { xap => '', sql => \$sql };
+	while (@$rest) {
+		my $k = uc(shift @$rest);
+		# default criteria
+		next if $k =~ /\A(?:ALL|RECENT|UNSEEN|NEW)\z/;
+		next if $k eq 'AND'; # the default, until we support OR
+		if ($k =~ $valid_range) { # sequence numbers == UIDs
+			push @{$q->{uid}}, $k;
+		} elsif ($k eq 'UID') {
+			$k = shift(@$rest) // '';
+			$k =~ $valid_range or return 'BAD UID range';
+			push @{$q->{uid}}, $k;
+		} elsif ($k =~ /\A(?:SENT)?(?:SINCE|ON|BEFORE)\z/) {
+			my $d = parse_date(shift(@$rest) // '');
+			defined $d or return "BAD $k date format";
+			date_search($q, $k, $d);
+		} elsif ($k =~ /\A(?:SMALLER|LARGER)\z/) {
+			delete $q->{sql}; # can't use over.sqlite3
+			my $bytes = shift(@$rest) // '';
+			$bytes =~ /\A[0-9]+\z/ or return "BAD $k not a number";
+			$q->{xap} .= ' bytes:' . ($k eq 'SMALLER' ?
+							'..'.(--$bytes) :
+							(++$bytes).'..');
+		} elsif (defined(my $xk = $I2X{$k})) {
+			delete $q->{sql}; # can't use over.sqlite3
+			my $arg = shift @$rest;
+			defined($arg) or return "BAD $k no arg";
+
+			# Xapian can't handle [*"] in probabilistic terms
+			$arg =~ tr/*"//d;
+			$q->{xap} .= qq[ $xk:"$arg"];
+		} else {
+			# TODO: parentheses, OR, NOT ...
+			return "BAD $k not supported (yet?)";
+		}
+	}
+
+	# favor using over.sqlite3 if possible, since Xapian is optional
+	if (exists $q->{sql}) {
+		delete($q->{xap});
+		delete($q->{sql}) if $sql eq '';
+	} elsif (!$self->{ibx}->search) {
+		return 'BAD Xapian not configured for mailbox';
+	}
+
+	if (my $uid = $q->{uid}) {
+		((@$uid > 1) || $uid->[0] =~ /,/) and
+			return 'BAD multiple ranges not supported, yet';
+		($q->{sql} // $q->{xap}) and
+			return 'BAD ranges and queries do not mix, yet';
+		$q->{uid} = join(',', @$uid); # TODO: multiple ranges
+	}
+	$q;
+}
+
 sub cmd_uid_search ($$$;) {
-	my ($self, $tag, $arg, @rest) = @_;
+	my ($self, $tag) = splice(@_, 0, 2);
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
-	$arg = uc($arg);
-	if ($arg eq 'ALL' && !@rest) {
+	my $q = parse_query($self, \@_);
+	return "$tag $q\r\n" if !ref($q);
+
+	if (!scalar(keys %$q)) {
 		$self->msg_more('* SEARCH');
 		my $num = 0;
 		long_response($self, \&uid_search_all, $tag, \$num);
-	} elsif ($arg eq 'UID' && scalar(@rest) == 1) {
-		if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
+	} elsif (my $uid = $q->{uid}) {
+		if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
 			my ($beg, $end) = ($1, $2);
 			$end = $ibx->mm->max if $end eq '*';
 			$self->msg_more('* SEARCH');
 			long_response($self, \&uid_search_uid_range,
 					$tag, \$beg, $end);
-		} elsif ($rest[0] =~ /\A[0-9]+\z/s) {
-			my $uid = $rest[0];
+		} elsif ($uid =~ /\A[0-9]+\z/s) {
 			$uid = $ibx->over->get_art($uid) ? " $uid" : '';
 			"* SEARCH$uid\r\n$tag OK Search done\r\n";
 		} else {
diff --git a/t/imap.t b/t/imap.t
index af59ef69386..47e86ef42c7 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -9,6 +9,21 @@ use PublicInbox::IMAPD;
 use PublicInbox::TestCommon;
 require_mods(qw(DBD::SQLite));
 require_git 2.6;
+use POSIX qw(strftime);
+
+{
+	my $parse_date = \&PublicInbox::IMAP::parse_date;
+	is(strftime('%Y-%m-%d', gmtime($parse_date->('02-Oct-1993'))),
+		'1993-10-02', 'parse_date works');
+	is(strftime('%Y-%m-%d', gmtime($parse_date->('2-Oct-1993'))),
+		'1993-10-02', 'parse_date works w/o leading zero');
+
+	is($parse_date->('2-10-1993'), undef, 'bad month');
+
+	# from what I can tell, RFC 3501 says nothing about date-month
+	# case-insensitivity, so be case-sensitive for now
+	is($parse_date->('02-oct-1993'), undef, 'case-sensitive month');
+}
 
 my ($tmpdir, $for_destroy) = tmpdir();
 my $cfgfile = "$tmpdir/config";

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 52/82] imap: SEARCH: clamp results to the 50K UID range
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (50 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 51/82] imap: start parsing out queries for SQLite and Xapian Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 53/82] imap: allow UID range search on timestamps Eric Wong
                   ` (30 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We won't support searching across mailboxes, just yet;
but maybe in the future.
---
 lib/PublicInbox/IMAP.pm | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index b24dfcd70b5..f9af530aa19 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -487,6 +487,14 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 	requeue_once($self);
 }
 
+sub uid_clamp ($$$) {
+	my ($self, $beg, $end) = @_;
+	my $uid_min = $self->{uid_min} or return;
+	my $uid_end = $uid_min + UID_BLOCK - 1;
+	$$beg = $uid_min if $$beg < $uid_min;
+	$$end = $uid_end if $$end > $uid_end;
+}
+
 sub range_step ($$) {
 	my ($self, $range_csv) = @_;
 	my ($beg, $end, $range);
@@ -501,6 +509,8 @@ sub range_step ($$) {
 	} elsif ($range =~ /\A([0-9]+):\*\z/) {
 		$beg = $1 + 0;
 		$end = $self->{ibx}->mm->max // 0;
+		my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK;
+		$end = $uid_end if $end > $uid_end;
 		$beg = $end if $beg > $end;
 	} elsif ($range =~ /\A[0-9]+\z/) {
 		$beg = $end = $range + 0;
@@ -508,11 +518,7 @@ sub range_step ($$) {
 	} else {
 		return 'BAD fetch range';
 	}
-	if (defined($range) && (my $uid_min = $self->{uid_min})) {
-		my $uid_end = $uid_min + UID_BLOCK - 1;
-		$beg = $uid_min if $beg < $uid_min;
-		$end = $uid_end if $end > $uid_end;
-	}
+	uid_clamp($self, \$beg, \$end) if defined($range);
 	[ $beg, $end, $$range_csv ];
 }
 
@@ -804,17 +810,6 @@ sub parse_date ($) { # 02-Oct-1993
 	timegm(0, 0, 0, $dd, $mm, $yyyy);
 }
 
-sub uid_search_all { # long_response
-	my ($self, $tag, $num) = @_;
-	my $uids = $self->{ibx}->mm->ids_after($num);
-	if (scalar(@$uids)) {
-		$self->msg_more(join(' ', '', @$uids));
-	} else {
-		$self->write(\"\r\n$tag OK Search done\r\n");
-		undef;
-	}
-}
-
 sub uid_search_uid_range { # long_response
 	my ($self, $tag, $beg, $end) = @_;
 	my $uids = $self->{ibx}->mm->msg_range($beg, $end, 'num');
@@ -944,12 +939,15 @@ sub cmd_uid_search ($$$;) {
 
 	if (!scalar(keys %$q)) {
 		$self->msg_more('* SEARCH');
-		my $num = 0;
-		long_response($self, \&uid_search_all, $tag, \$num);
+		my $beg = $self->{uid_min} // 1;
+		my $end = $ibx->mm->max;
+		uid_clamp($self, \$beg, \$end);
+		long_response($self, \&uid_search_uid_range, $tag, \$beg, $end);
 	} elsif (my $uid = $q->{uid}) {
 		if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
 			my ($beg, $end) = ($1, $2);
 			$end = $ibx->mm->max if $end eq '*';
+			uid_clamp($self, \$beg, \$end);
 			$self->msg_more('* SEARCH');
 			long_response($self, \&uid_search_uid_range,
 					$tag, \$beg, $end);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 53/82] imap: allow UID range search on timestamps
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (51 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 52/82] imap: SEARCH: clamp results to the 50K UID range Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 54/82] over: get_art: use dbh->prepare_cached Eric Wong
                   ` (29 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Since it seems somewhat common for IMAP clients to limit
searches by sent Date: or INTERNALDATE, we can rely on
the NNTP/WWW-optimized overview DB.

For other queries, we'll have to depend on the Xapian DB.
---
 lib/PublicInbox/DummyInbox.pm |  2 +-
 lib/PublicInbox/IMAP.pm       | 13 ++++++++-----
 lib/PublicInbox/Over.pm       | 13 +++++++++++++
 t/imapd.t                     |  6 ++++++
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/DummyInbox.pm b/lib/PublicInbox/DummyInbox.pm
index e38f9e5a432..b6c48db1a0e 100644
--- a/lib/PublicInbox/DummyInbox.pm
+++ b/lib/PublicInbox/DummyInbox.pm
@@ -13,7 +13,7 @@ sub max { undef } # Msgmap::max
 sub msg_range { [] } # Msgmap::msg_range
 
 no warnings 'once';
-*query_xover = \&msg_range;
+*uid_range = *query_xover = \&msg_range;
 *over = \&mm;
 *subscribe_unlock = *unsubscribe_unlock =
 	*get_art = *description = *base_url = \&max;
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index f9af530aa19..78042b9e2ee 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -811,10 +811,11 @@ sub parse_date ($) { # 02-Oct-1993
 }
 
 sub uid_search_uid_range { # long_response
-	my ($self, $tag, $beg, $end) = @_;
-	my $uids = $self->{ibx}->mm->msg_range($beg, $end, 'num');
+	my ($self, $tag, $beg, $end, $sql) = @_;
+	my $uids = $self->{ibx}->over->uid_range($$beg, $end, $sql);
 	if (@$uids) {
-		$self->msg_more(join('', map { " $_->[0]" } @$uids));
+		$$beg = $uids->[-1] + 1;
+		$self->msg_more(join(' ', '', @$uids));
 	} else {
 		$self->write(\"\r\n$tag OK Search done\r\n");
 		undef;
@@ -936,13 +937,15 @@ sub cmd_uid_search ($$$;) {
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
 	my $q = parse_query($self, \@_);
 	return "$tag $q\r\n" if !ref($q);
+	my $sql = delete $q->{sql};
 
 	if (!scalar(keys %$q)) {
 		$self->msg_more('* SEARCH');
 		my $beg = $self->{uid_min} // 1;
 		my $end = $ibx->mm->max;
 		uid_clamp($self, \$beg, \$end);
-		long_response($self, \&uid_search_uid_range, $tag, \$beg, $end);
+		long_response($self, \&uid_search_uid_range,
+				$tag, \$beg, $end, $sql);
 	} elsif (my $uid = $q->{uid}) {
 		if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
 			my ($beg, $end) = ($1, $2);
@@ -950,7 +953,7 @@ sub cmd_uid_search ($$$;) {
 			uid_clamp($self, \$beg, \$end);
 			$self->msg_more('* SEARCH');
 			long_response($self, \&uid_search_uid_range,
-					$tag, \$beg, $end);
+					$tag, \$beg, $end, $sql);
 		} elsif ($uid =~ /\A[0-9]+\z/s) {
 			$uid = $ibx->over->get_art($uid) ? " $uid" : '';
 			"* SEARCH$uid\r\n$tag OK Search done\r\n";
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 286fb7f6b4f..402cbf7ce07 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -215,4 +215,17 @@ SELECT num,ts,ds,ddd FROM over WHERE num = ? LIMIT 1
 	load_from_row($smsg);
 }
 
+# IMAP search
+sub uid_range {
+	my ($self, $beg, $end, $sql) = @_;
+	my $dbh = $self->connect;
+	my $q = 'SELECT num FROM over WHERE num >= ? AND num <= ?';
+
+	# This is read-only, anyways; but caller should verify it's
+	# only sending \A[0-9]+\z for ds and ts column ranges
+	$q .= $$sql if $sql;
+	$q .= ' ORDER BY num ASC LIMIT ' . DEFAULT_LIMIT;
+	$dbh->selectcol_arrayref($q, undef, $beg, $end);
+}
+
 1;
diff --git a/t/imapd.t b/t/imapd.t
index 11b56b09dbe..2546ab51ac6 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -326,6 +326,12 @@ Content-Type: message/rfc822\r
 Content-Disposition: attachment; filename="embed2x\.eml"\r
 \r
 EOF
+
+	my @hits = $mic->search('SENTON' => '18-Apr-2020');
+	is_deeply(\@hits, [ $uidnext ], 'search with date condition works');
+	ok($mic->examine($ng), 'EXAMINE on dummy');
+	@hits = $mic->search('SENTSINCE' => '18-Apr-2020');
+	is_deeply(\@hits, [], 'search on dummy with condition works');
 }); # each_inbox
 
 # message sequence numbers :<

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 54/82] over: get_art: use dbh->prepare_cached
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (52 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 53/82] imap: allow UID range search on timestamps Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 55/82] search: index byte size of a message for IMAP search Eric Wong
                   ` (28 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This speeds up xt/imapd-validate.t by around 10% when used with
an abandoned patch to remove ->query_xover.  We may also depend
on this further if we abandon storing doc_data in Xapian to save
disk space.
---
 lib/PublicInbox/Over.pm | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 402cbf7ce07..1faeff418f4 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -179,11 +179,12 @@ SELECT COUNT(num) FROM over WHERE num > 0
 sub get_art {
 	my ($self, $num) = @_;
 	my $dbh = $self->connect;
-	my $smsg = $dbh->selectrow_hashref(<<'', undef, $num);
+	my $sth = $dbh->prepare_cached(<<'', undef, 1);
 SELECT num,ds,ts,ddd FROM over WHERE num = ? LIMIT 1
 
-	return load_from_row($smsg) if $smsg;
-	undef;
+	$sth->execute($num);
+	my $smsg = $sth->fetchrow_hashref;
+	$smsg ? load_from_row($smsg) : undef;
 }
 
 sub next_by_mid {

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 55/82] search: index byte size of a message for IMAP search
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (53 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 54/82] over: get_art: use dbh->prepare_cached Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 56/82] search: index UID for IMAP search, too Eric Wong
                   ` (27 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Searching for messages smaller than a certain size is allowed by
offlineimap(1), mbsync(1), and possibly other tools.  Maybe
public-inbox-watch will support it, too.

I don't see a reason to expose searching by size via WWW search
right now (but maybe in the future, I could be convinced to).

Note: we only store the byte-size of the message in git,
this is typically LF-only and we won't have the correct
size after CRLF conversion for NNTP or IMAP.
---
 lib/PublicInbox/Search.pm    | 12 ++++++++----
 lib/PublicInbox/SearchIdx.pm |  2 ++
 t/search.t                   |  6 ++++++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index cb669e8733e..f2d3b92dc82 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -5,12 +5,16 @@
 # Read-only search interface for use by the web and NNTP interfaces
 package PublicInbox::Search;
 use strict;
-use warnings;
 
 # values for searching
-use constant TS => 0;  # Received: header in Unix time
-use constant YYYYMMDD => 1; # Date: header for searching in the WWW UI
-use constant DT => 2; # Date: YYYYMMDDHHMMSS
+use constant {
+	TS => 0, # Received: header in Unix time (IMAP INTERNALDATE)
+	YYYYMMDD => 1, # Date: header for searching in the WWW UI
+	DT => 2, # Date: YYYYMMDDHHMMSS
+	BYTES => 3, # IMAP RFC822.SIZE
+	# TODO
+	# REPLYCNT => 4, # IMAP ANSWERED
+};
 
 use PublicInbox::Smsg;
 use PublicInbox::Over;
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index f4fa50ff10f..f7462aa74ca 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -341,6 +341,7 @@ sub add_xapian ($$$$) {
 	add_val($doc, PublicInbox::Search::YYYYMMDD(), $yyyymmdd);
 	my $dt = strftime('%Y%m%d%H%M%S', @ds);
 	add_val($doc, PublicInbox::Search::DT(), $dt);
+	add_val($doc, PublicInbox::Search::BYTES(), $smsg->{bytes});
 
 	my $tg = term_generator($self);
 	$tg->set_document($doc);
@@ -388,6 +389,7 @@ sub add_message {
 
 	# v1 and tests only:
 	$smsg->populate($hdr, $self);
+	$smsg->{bytes} //= length($mime->as_string);
 
 	eval {
 		# order matters, overview stores every possible piece of
diff --git a/t/search.t b/t/search.t
index 6cf2bc2d6b4..cf3254169ca 100644
--- a/t/search.t
+++ b/t/search.t
@@ -318,6 +318,12 @@ $ibx->with_umask(sub {
 	foreach my $m ($mset->items) {
 		my $smsg = $ro->{over_ro}->get_art($m->get_docid);
 		like($smsg->{to}, qr/\blist\@example\.com\b/, 'to appears');
+		my $doc = $m->get_document;
+		my $col = PublicInbox::Search::BYTES();
+		my $bytes = PublicInbox::Smsg::get_val($doc, $col);
+		like($bytes, qr/\A[0-9]+\z/, '$bytes stored as digit');
+		ok($bytes > 0, '$bytes is > 0');
+		is($bytes, $smsg->{bytes}, 'bytes Xapian value matches Over');
 	}
 
 	$mset = $ro->query('tc:list@example.com', {mset => 1});

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 56/82] search: index UID for IMAP search, too
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (54 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 55/82] search: index byte size of a message for IMAP search Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 57/82] imap: remove dummies from sequence number FETCH Eric Wong
                   ` (26 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We'll need to support searching UID ranges for IMAP,
so make sure it's indexed, too.
---
 lib/PublicInbox/Search.pm    | 1 +
 lib/PublicInbox/SearchIdx.pm | 1 +
 t/search.t                   | 5 +++++
 3 files changed, 7 insertions(+)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index f2d3b92dc82..c54cf7b9911 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -12,6 +12,7 @@ use constant {
 	YYYYMMDD => 1, # Date: header for searching in the WWW UI
 	DT => 2, # Date: YYYYMMDDHHMMSS
 	BYTES => 3, # IMAP RFC822.SIZE
+	UID => 4, # IMAP UID == NNTP article number == Xapian docid
 	# TODO
 	# REPLYCNT => 4, # IMAP ANSWERED
 };
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index f7462aa74ca..3df7970ebf9 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -342,6 +342,7 @@ sub add_xapian ($$$$) {
 	my $dt = strftime('%Y%m%d%H%M%S', @ds);
 	add_val($doc, PublicInbox::Search::DT(), $dt);
 	add_val($doc, PublicInbox::Search::BYTES(), $smsg->{bytes});
+	add_val($doc, PublicInbox::Search::UID(), $smsg->{num});
 
 	my $tg = term_generator($self);
 	$tg->set_document($doc);
diff --git a/t/search.t b/t/search.t
index cf3254169ca..d4ca28c794f 100644
--- a/t/search.t
+++ b/t/search.t
@@ -324,6 +324,11 @@ $ibx->with_umask(sub {
 		like($bytes, qr/\A[0-9]+\z/, '$bytes stored as digit');
 		ok($bytes > 0, '$bytes is > 0');
 		is($bytes, $smsg->{bytes}, 'bytes Xapian value matches Over');
+
+		$col = PublicInbox::Search::UID();
+		my $uid = PublicInbox::Smsg::get_val($doc, $col);
+		is($uid, $smsg->{num}, 'UID column matches {num}');
+		is($uid, $m->get_docid, 'UID column matches docid');
 	}
 
 	$mset = $ro->query('tc:list@example.com', {mset => 1});

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 57/82] imap: remove dummies from sequence number FETCH
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (55 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 56/82] search: index UID for IMAP search, too Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 58/82] imap: compile UID FETCH to opcodes Eric Wong
                   ` (25 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

Dummy messages make for bad user experience with MUAs which
still use sequence numbers.  Not being able to fetch a message
doesn't seem fatal in mutt, so just ignore (sometimes large)
gaps.
---
 lib/PublicInbox/IMAP.pm | 61 ++++-------------------------------------
 t/imapd.t               |  3 +-
 2 files changed, 6 insertions(+), 58 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 78042b9e2ee..8307343cf15 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -9,8 +9,9 @@
 # data notes:
 # * NNTP article numbers are UIDs and message sequence numbers (MSNs)
 # * Message sequence numbers (MSNs) can be stable since we're read-only.
-#   Most IMAP clients use UIDs (I hope), and we can return a dummy
-#   message if a client requests a non-existent MSN.
+#   Most IMAP clients use UIDs (I hope).  We may return a dummy message
+#   in the future if a client requests a non-existent MSN, but that seems
+#   unecessary with mutt.
 
 package PublicInbox::IMAP;
 use strict;
@@ -398,22 +399,6 @@ sub fetch_body ($;$) {
 	join('', @hold);
 }
 
-sub dummy_message ($$) {
-	my ($self, $seqno) = @_;
-	my $ret = <<EOF;
-From: nobody\@localhost\r
-To: nobody\@localhost\r
-Date: Thu, 01 Jan 1970 00:00:00 +0000\r
-Message-ID: <dummy-$seqno\@$self->{ibx}->{newsgroup}>\r
-Subject: dummy message #$seqno\r
-\r
-You're seeing this message because your IMAP client didn't use UIDs.\r
-The message which used to use this sequence number was likely spam\r
-and removed by the administrator.\r
-EOF
-	\$ret;
-}
-
 sub requeue_once ($) {
 	my ($self) = @_;
 	# COMPRESS users all share the same DEFLATE context.
@@ -437,8 +422,7 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
-		return requeue_once($self) unless defined $want->{-seqno};
-		$bref = dummy_message($self, $smsg->{num});
+		return requeue_once($self);
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
@@ -764,42 +748,6 @@ sub cmd_uid_fetch ($$$;@) {
 		$args; # error
 }
 
-sub seq_fetch_m { # long_response
-	my ($self, $tag, $msgs, $range_info, $want) = @_;
-	while (!@$msgs) { # rare
-		if (my $end = refill_range($self, $msgs, $range_info)) {
-			$self->write(\"$tag $end\r\n");
-			return;
-		}
-	}
-	my $seq = $want->{-seqno}++;
-	my $cur_num = $msgs->[0]->{num};
-	if ($cur_num == $seq) { # as expected
-		git_async_cat($self->{ibx}->git, $msgs->[0]->{blob},
-				\&uid_fetch_cb, \@_);
-	} elsif ($cur_num > $seq) {
-		# send dummy messages until $seq catches up to $cur_num
-		my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg';
-		unshift @$msgs, $smsg;
-		my $bref = dummy_message($self, $seq);
-		uid_fetch_cb($bref, undef, undef, undef, \@_);
-		$smsg; # blessed response since uid_fetch_cb requeues
-	} else { # should not happen
-		die "BUG: cur_num=$cur_num < seq=$seq";
-	}
-}
-
-sub cmd_fetch ($$$;@) {
-	my ($self, $tag, $range_csv, @want) = @_;
-	my $args = fetch_common($self, $tag, $range_csv, \@want);
-	ref($args) eq 'ARRAY' ? do {
-		my $want = $args->[-1];
-		$want->{-seqno} = $args->[2]->[0]; # $beg == $range_info->[0];
-		long_response($self, \&seq_fetch_m, @$args)
-	} : $args; # error
-}
-
-
 sub parse_date ($) { # 02-Oct-1993
 	my ($date_text) = @_;
 	my ($dd, $mon, $yyyy) = split(/-/, $_[0], 3);
@@ -1147,5 +1095,6 @@ sub close {
 # we're read-only, so SELECT and EXAMINE do the same thing
 no warnings 'once';
 *cmd_select = \&cmd_examine;
+*cmd_fetch = \&cmd_uid_fetch;
 
 1;
diff --git a/t/imapd.t b/t/imapd.t
index 2546ab51ac6..233be9f2c0a 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -348,8 +348,7 @@ is(scalar keys %$ret, 3, 'got all 3 messages');
 	run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm');
 }
 my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@";
-is(scalar keys %$r2, 3, 'still got all 3 messages');
-like($r2->{1}->{'BODY[]'}, qr/dummy message #1/, 'got dummy message 1');
+is(scalar keys %$r2, 2, 'did not get all 3 messages');
 is($r2->{2}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged');
 is($r2->{3}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged');
 $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]')

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 58/82] imap: compile UID FETCH to opcodes
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (56 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 57/82] imap: remove dummies from sequence number FETCH Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 59/82] imap: UID FETCH: optimize for smsg-only case Eric Wong
                   ` (24 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

This is just a hair faster and cacheable in the future, if we
need it.  Most notably, this avoids doing PublicInbox::Eml->new
for simple "RFC822", "BODY[]", and "RFC822.SIZE" requests.
---
 lib/PublicInbox/IMAP.pm | 188 +++++++++++++++++++++++++---------------
 t/imap.t                |  16 ++++
 2 files changed, 134 insertions(+), 70 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 8307343cf15..3fae81112aa 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -40,22 +40,26 @@ sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977?
 # changing this will cause grief for clients which cache
 sub UID_BLOCK () { 50_000 }
 
-my %FETCH_NEED_BLOB = ( # for future optimization
-	'BODY[HEADER]' => 1,
-	'BODY[TEXT]' => 1,
-	'BODY[]' => 1,
-	'RFC822.HEADER' => 1,
-	'RFC822.SIZE' => 1, # needs CRLF conversion :<
-	'RFC822.TEXT' => 1,
-	BODY => 1,
-	BODYSTRUCTURE => 1,
-	ENVELOPE => 1,
-	FLAGS => 0,
-	INTERNALDATE => 0,
-	RFC822 => 1,
-	UID => 0,
+# these values area also used for sorting
+sub NEED_BLOB () { 1 }
+sub NEED_EML () { NEED_BLOB|2 }
+my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ];
+
+my %FETCH_NEED = ( # for future optimization
+	'BODY[HEADER]' => [ NEED_EML, \&emit_rfc822_header ],
+	'BODY[TEXT]' => [ NEED_EML, \&emit_rfc822_text ],
+	'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ],
+	'RFC822.HEADER' => [ NEED_EML, \&emit_rfc822_header ],
+	'RFC822.TEXT' => [ NEED_EML, \&emit_rfc822_text ],
+	'RFC822.SIZE' => [ NEED_BLOB, \&emit_rfc822_size ],
+	RFC822 => [ NEED_BLOB, \&emit_rfc822 ],
+	BODY => [ NEED_EML, \&emit_body ],
+	BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ],
+	ENVELOPE => [ NEED_EML, \&emit_envelope ],
+	FLAGS => [ 0, \&emit_flags ],
+	INTERNALDATE => [ 0, \&emit_internaldate ],
 );
-my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED_BLOB;
+my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED;
 
 # aliases (RFC 3501 section 6.4.5)
 $FETCH_ATT{FAST} = [ qw(FLAGS INTERNALDATE RFC822.SIZE) ];
@@ -63,9 +67,10 @@ $FETCH_ATT{ALL} = [ @{$FETCH_ATT{FAST}}, 'ENVELOPE' ];
 $FETCH_ATT{FULL} = [ @{$FETCH_ATT{ALL}}, 'BODY' ];
 
 for my $att (keys %FETCH_ATT) {
-	my %h = map { $_ => 1 } @{$FETCH_ATT{$att}};
+	my %h = map { $_ => $FETCH_NEED{$_} } @{$FETCH_ATT{$att}};
 	$FETCH_ATT{$att} = \%h;
 }
+undef %FETCH_NEED;
 
 my $valid_range = '[0-9]+|[0-9]+:[0-9]+|[0-9]+:\*';
 $valid_range = qr/\A(?:$valid_range)(?:,(?:$valid_range))*\z/;
@@ -417,7 +422,7 @@ sub requeue_once ($) {
 
 sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
-	my ($self, undef, $msgs, undef, $want) = @$fetch_m_arg;
+	my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_m_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
@@ -426,51 +431,72 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
-
 	$$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy
 
 	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
 	$$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s;
-
 	$self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}");
-
-	$want->{'RFC822.SIZE'} and
-		$self->msg_more(' RFC822.SIZE '.length($$bref));
-	$want->{INTERNALDATE} and
-		$self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"');
-	$want->{FLAGS} and $self->msg_more(' FLAGS ()');
-	for ('RFC822', 'BODY[]') {
-		$want->{$_} or next;
-		$self->msg_more(" $_ {".length($$bref)."}\r\n");
-		$self->msg_more($$bref);
+	my $eml;
+	for (my $i = 0; $i < @$ops;) {
+		my $k = $ops->[$i++];
+		$ops->[$i++]->($self, $k, $smsg, $bref, $eml);
 	}
+	partial_emit($self, $partial, $eml) if $partial;
+	$self->msg_more(")\r\n");
+	requeue_once($self);
+}
 
-	my $eml = PublicInbox::Eml->new($bref);
+sub emit_rfc822 {
+	my ($self, $k, undef, $bref) = @_;
+	$self->msg_more(" $k {" . length($$bref)."}\r\n");
+	$self->msg_more($$bref);
+}
 
-	$want->{ENVELOPE} and
-		$self->msg_more(' ENVELOPE '.eml_envelope($eml));
+# Mail::IMAPClient::message_string cares about this by default
+# (->Ignoresizeerrors attribute)
+sub emit_rfc822_size {
+	my ($self, $k, undef, $bref) = @_;
+	$self->msg_more(' RFC822.SIZE ' . length($$bref));
+}
 
-	for ('RFC822.HEADER', 'BODY[HEADER]') {
-		$want->{$_} or next;
-		$self->msg_more(" $_ {".length(${$eml->{hdr}})."}\r\n");
-		$self->msg_more(${$eml->{hdr}});
-	}
-	for ('RFC822.TEXT', 'BODY[TEXT]') {
-		$want->{$_} or next;
-		$self->msg_more(" $_ {".length($$bref)."}\r\n");
-		$self->msg_more($$bref);
-	}
-	$want->{BODYSTRUCTURE} and
-		$self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1));
-	$want->{BODY} and
-		$self->msg_more(' BODY '.fetch_body($eml));
-	if (my $partial = $want->{-partial}) {
-		partial_emit($self, $partial, $eml);
-	}
-	$self->msg_more(")\r\n");
-	requeue_once($self);
+sub emit_internaldate {
+	my ($self, undef, $smsg) = @_;
+	$self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"');
+}
+
+sub emit_flags { $_[0]->msg_more(' FLAGS ()') }
+
+sub emit_envelope {
+	my ($self, undef, undef, undef, $eml) = @_;
+	$self->msg_more(' ENVELOPE '.eml_envelope($eml));
+}
+
+sub emit_rfc822_header {
+	my ($self, $k, undef, undef, $eml) = @_;
+	$self->msg_more(" $k {".length(${$eml->{hdr}})."}\r\n");
+	$self->msg_more(${$eml->{hdr}});
+}
+
+# n.b. this is sorted to be after any emit_eml_new ops
+sub emit_rfc822_text {
+	my ($self, $k, undef, $bref) = @_;
+	$self->msg_more(" $k {".length($$bref)."}\r\n");
+	$self->msg_more($$bref);
+}
+
+sub emit_bodystructure {
+	my ($self, undef, undef, undef, $eml) = @_;
+	$self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1));
+}
+
+sub emit_body {
+	my ($self, undef, undef, undef, $eml) = @_;
+	$self->msg_more(' BODY '.fetch_body($eml));
 }
 
+# set $eml once ($_[4] == $eml, $_[3] == $bref)
+sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) }
+
 sub uid_clamp ($$$) {
 	my ($self, $beg, $end) = @_;
 	my $uid_min = $self->{uid_min} or return;
@@ -521,7 +547,7 @@ sub refill_range ($$$) {
 }
 
 sub uid_fetch_m { # long_response
-	my ($self, $tag, $msgs, $range_info, $want) = @_;
+	my ($self, $tag, $msgs, $range_info) = @_; # \@ops, \@partial
 	while (!@$msgs) { # rare
 		if (my $end = refill_range($self, $msgs, $range_info)) {
 			$self->write(\"$tag $end\r\n");
@@ -710,42 +736,64 @@ sub partial_emit ($$$) {
 	}
 }
 
-sub fetch_common ($$$$) {
-	my ($self, $tag, $range_csv, $want) = @_;
-	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
+sub fetch_compile ($) {
+	my ($want) = @_;
 	if ($want->[0] =~ s/\A\(//s) {
-		$want->[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n";
+		$want->[-1] =~ s/\)\z//s or return 'BAD no rparen';
 	}
-	my (%partial, %want);
+	my (%partial, %seen, @op);
+	my $need = 0;
 	while (defined(my $att = shift @$want)) {
 		$att = uc($att);
+		next if $att eq 'UID'; # always returned
 		$att =~ s/\ABODY\.PEEK\[/BODY\[/; # we're read-only
 		my $x = $FETCH_ATT{$att};
 		if ($x) {
-			%want = (%want, %$x);
+			while (my ($k, $fl_cb) = each %$x) {
+				next if $seen{$k}++;
+				$need |= $fl_cb->[0];
+
+				# insert a special op to convert $bref to $eml
+				# the first time we need it
+				if ($need == NEED_EML && !$seen{$need}++) {
+					push @op, $OP_EML_NEW;
+				}
+				# $fl_cb = [ flags, \&emit_foo ]
+				push @op, [ @$fl_cb , $k ];
+			}
 		} elsif (!partial_prepare(\%partial, $want, $att)) {
-			return "$tag BAD param: $att\r\n";
+			return "BAD param: $att";
 		}
 	}
+	my @r;
 
 	# stabilize partial order for consistency and ease-of-debugging:
 	if (scalar keys %partial) {
-		$want{-partial} = [ map {;
-			[ $_, @{$partial{$_}} ]
-		} sort keys %partial ];
+		$need = NEED_EML;
+		push @op, $OP_EML_NEW if !$seen{$need}++;
+		$r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ];
 	}
-	$range_csv = 'bad' if $range_csv !~ $valid_range;
-	my $range_info = range_step($self, \$range_csv);
-	return "$tag $range_info\r\n" if !ref($range_info);
-	[ $tag, [], $range_info, \%want ];
+
+	$r[0] = $need;
+
+	# r[1] = [ $key1, $cb1, $key2, $cb2, ... ]
+	use sort 'stable'; # makes output more consistent
+	$r[1] = [ map { ($_->[2], $_->[1]) } sort { $a->[0] <=> $b->[0] } @op ];
+	@r;
 }
 
 sub cmd_uid_fetch ($$$;@) {
 	my ($self, $tag, $range_csv, @want) = @_;
-	my $args = fetch_common($self, $tag, $range_csv, \@want);
-	ref($args) eq 'ARRAY' ?
-		long_response($self, \&uid_fetch_m, @$args) :
-		$args; # error
+	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
+	my ($need, $ops, $partial) = fetch_compile(\@want);
+	return "$tag $need\r\n" unless $ops;
+
+	$range_csv = 'bad' if $range_csv !~ $valid_range;
+	my $range_info = range_step($self, \$range_csv);
+	return "$tag $range_info\r\n" if !ref($range_info);
+
+	long_response($self, \&uid_fetch_m,
+			$tag, [], $range_info, $ops, $partial);
 }
 
 sub parse_date ($) { # 02-Oct-1993
diff --git a/t/imap.t b/t/imap.t
index 47e86ef42c7..2401237c8a0 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -107,4 +107,20 @@ EOF
 	}, 'structure matches expected');
 }
 
+{
+	my $fetch_compile = \&PublicInbox::IMAP::fetch_compile;
+	my ($cb, $ops, $partial) = $fetch_compile->(['BODY[]']);
+	is($partial, undef, 'no partial fetch data');
+	is_deeply($ops,
+		[ 'BODY[]', \&PublicInbox::IMAP::emit_rfc822 ],
+		'proper key and op compiled for BODY[]');
+
+	($cb, $ops, $partial) = $fetch_compile->(['BODY', 'BODY[]']);
+	is_deeply($ops, [
+		'BODY[]', \&PublicInbox::IMAP::emit_rfc822,
+		undef, \&PublicInbox::IMAP::op_eml_new,
+		'BODY', \&PublicInbox::IMAP::emit_body,
+	], 'placed op_eml_new before emit_body');
+}
+
 done_testing;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 59/82] imap: UID FETCH: optimize for smsg-only case
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (57 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 58/82] imap: compile UID FETCH to opcodes Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 60/82] imap: UID FETCH: optimize (UID FLAGS) harder Eric Wong
                   ` (23 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We can avoid loading the entire message from git when mutt makes
a "UID FETCH" request for "(UID FLAGS)".  This speeds mutt up by
more than an order-of-magnitude in informal measurements.
---
 lib/PublicInbox/IMAP.pm | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 3fae81112aa..0fe31a77244 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -45,7 +45,7 @@ sub NEED_BLOB () { 1 }
 sub NEED_EML () { NEED_BLOB|2 }
 my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ];
 
-my %FETCH_NEED = ( # for future optimization
+my %FETCH_NEED = (
 	'BODY[HEADER]' => [ NEED_EML, \&emit_rfc822_header ],
 	'BODY[TEXT]' => [ NEED_EML, \&emit_rfc822_text ],
 	'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ],
@@ -546,7 +546,7 @@ sub refill_range ($$$) {
 	undef; # keep looping
 }
 
-sub uid_fetch_m { # long_response
+sub uid_fetch_msg { # long_response
 	my ($self, $tag, $msgs, $range_info) = @_; # \@ops, \@partial
 	while (!@$msgs) { # rare
 		if (my $end = refill_range($self, $msgs, $range_info)) {
@@ -558,6 +558,26 @@ sub uid_fetch_m { # long_response
 			\&uid_fetch_cb, \@_);
 }
 
+sub uid_fetch_smsg { # long_response
+	my ($self, $tag, $msgs, $range_info, $ops) = @_;
+	while (!@$msgs) { # rare
+		if (my $end = refill_range($self, $msgs, $range_info)) {
+			$self->write(\"$tag $end\r\n");
+			return;
+		}
+	}
+	for my $smsg (@$msgs) {
+		$self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}");
+		for (my $i = 0; $i < @$ops;) {
+			my $k = $ops->[$i++];
+			$ops->[$i++]->($self, $k, $smsg);
+		}
+		$self->msg_more(")\r\n");
+	}
+	@$msgs = ();
+	1; # more
+}
+
 sub cmd_status ($$$;@) {
 	my ($self, $tag, $mailbox, @items) = @_;
 	return "$tag BAD no items\r\n" if !scalar(@items);
@@ -774,7 +794,7 @@ sub fetch_compile ($) {
 		$r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ];
 	}
 
-	$r[0] = $need;
+	$r[0] = $need ? \&uid_fetch_msg : \&uid_fetch_smsg;
 
 	# r[1] = [ $key1, $cb1, $key2, $cb2, ... ]
 	use sort 'stable'; # makes output more consistent
@@ -785,15 +805,13 @@ sub fetch_compile ($) {
 sub cmd_uid_fetch ($$$;@) {
 	my ($self, $tag, $range_csv, @want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
-	my ($need, $ops, $partial) = fetch_compile(\@want);
-	return "$tag $need\r\n" unless $ops;
+	my ($cb, $ops, $partial) = fetch_compile(\@want);
+	return "$tag $cb\r\n" unless $ops;
 
 	$range_csv = 'bad' if $range_csv !~ $valid_range;
 	my $range_info = range_step($self, \$range_csv);
 	return "$tag $range_info\r\n" if !ref($range_info);
-
-	long_response($self, \&uid_fetch_m,
-			$tag, [], $range_info, $ops, $partial);
+	long_response($self, $cb, $tag, [], $range_info, $ops, $partial);
 }
 
 sub parse_date ($) { # 02-Oct-1993

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 60/82] imap: UID FETCH: optimize (UID FLAGS) harder
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (58 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 59/82] imap: UID FETCH: optimize for smsg-only case Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 61/82] imap: IDLE: avoid extraneous wakeups, keep-alive Eric Wong
                   ` (22 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We can speed up this common mutt request by another 2-3x by not
loading the entire smsg from SQLite, just the UID.
---
 lib/PublicInbox/IMAP.pm | 42 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 0fe31a77244..0d553da91f9 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -41,8 +41,9 @@ sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977?
 sub UID_BLOCK () { 50_000 }
 
 # these values area also used for sorting
-sub NEED_BLOB () { 1 }
-sub NEED_EML () { NEED_BLOB|2 }
+sub NEED_SMSG () { 1 }
+sub NEED_BLOB () { NEED_SMSG|2 }
+sub NEED_EML () { NEED_BLOB|4 }
 my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ];
 
 my %FETCH_NEED = (
@@ -57,7 +58,7 @@ my %FETCH_NEED = (
 	BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ],
 	ENVELOPE => [ NEED_EML, \&emit_envelope ],
 	FLAGS => [ 0, \&emit_flags ],
-	INTERNALDATE => [ 0, \&emit_internaldate ],
+	INTERNALDATE => [ NEED_SMSG, \&emit_internaldate ],
 );
 my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED;
 
@@ -578,6 +579,38 @@ sub uid_fetch_smsg { # long_response
 	1; # more
 }
 
+sub uid_fetch_uid { # long_response
+	my ($self, $tag, $uids, $range_info, $ops) = @_;
+	while (!@$uids) { # rare
+		my ($beg, $end, $range_csv) = @$range_info;
+		if (scalar(@$uids = @{$self->{ibx}->over->
+					uid_range($beg, $end)})) {
+			$range_info->[0] = $uids->[-1] + 1;
+		} elsif (!$range_csv) {
+			$self->write(\"$tag OK Fetch done\r\n");
+			return;
+		} else {
+			my $next_range = range_step($self, \$range_csv);
+			if (!ref($next_range)) { # error
+				$self->write(\"$tag $next_range\r\n");
+				return;
+			}
+			@$range_info = @$next_range;
+		}
+		# continue looping
+	}
+	for (@$uids) {
+		$self->msg_more("* $_ FETCH (UID $_");
+		for (my $i = 0; $i < @$ops;) {
+			my $k = $ops->[$i++];
+			$ops->[$i++]->($self, $k);
+		}
+		$self->msg_more(")\r\n");
+	}
+	@$uids = ();
+	1; # more
+}
+
 sub cmd_status ($$$;@) {
 	my ($self, $tag, $mailbox, @items) = @_;
 	return "$tag BAD no items\r\n" if !scalar(@items);
@@ -794,7 +827,8 @@ sub fetch_compile ($) {
 		$r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ];
 	}
 
-	$r[0] = $need ? \&uid_fetch_msg : \&uid_fetch_smsg;
+	$r[0] = $need & NEED_BLOB ? \&uid_fetch_msg :
+		($need & NEED_SMSG ? \&uid_fetch_smsg : \&uid_fetch_uid);
 
 	# r[1] = [ $key1, $cb1, $key2, $cb2, ... ]
 	use sort 'stable'; # makes output more consistent

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 61/82] imap: IDLE: avoid extraneous wakeups, keep-alive
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (59 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 60/82] imap: UID FETCH: optimize (UID FLAGS) harder Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:04 ` [PATCH 62/82] imap: 30 minute auto-logout timer Eric Wong
                   ` (21 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

We should not waste memory for IDLE unless it's used on the most
recent inbox slice.  We also need to keep the IDLE connection
alive regardless of $PublicInbox::DS::EXPTIME.
---
 lib/PublicInbox/IMAP.pm | 56 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 47 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 0d553da91f9..77e8af12fa9 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -151,7 +151,7 @@ sub cmd_close ($$) {
 
 sub cmd_logout ($$) {
 	my ($self, $tag) = @_;
-	delete $self->{logged_in};
+	delete @$self{qw(logged_in -idle_tag)};
 	$self->write(\"* BYE logging out\r\n$tag OK Logout done\r\n");
 	$self->shutdn; # PublicInbox::DS::shutdn
 	undef;
@@ -174,23 +174,63 @@ sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" }
 sub on_inbox_unlock {
 	my ($self, $ibx) = @_;
 	my $new = $ibx->mm->max;
+	my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK;
 	defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset';
+	$new = $uid_end if $new > $uid_end;
 	if ($new > $old) {
 		$self->{-idle_max} = $new;
 		$self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1));
 		$self->write(\"* $new EXISTS\r\n");
+	} elsif ($new == $uid_end) { # max exceeded $uid_end
+		# continue idling w/o inotify
+		delete $self->{-idle_max};
+		my $sock = $self->{sock} or return;
+		$ibx->unsubscribe_unlock(fileno($sock));
 	}
 }
 
+# called every X minute(s) or so by PublicInbox::DS::later
+my $IDLERS = {};
+my $idle_timer;
+sub idle_tick_all {
+	my $old = $IDLERS;
+	$IDLERS = {};
+	for my $i (values %$old) {
+		next if ($i->{wbuf} || !exists($i->{-idle_tag}));
+		$i->update_idle_time or next;
+		$IDLERS->{fileno($i->{sock})} = $i;
+		$i->write(\"* OK Still here\r\n");
+	}
+	$idle_timer = scalar keys %$IDLERS ?
+			PublicInbox::DS::later(\&idle_tick_all) : undef;
+}
+
 sub cmd_idle ($$) {
 	my ($self, $tag) = @_;
 	# IDLE seems allowed by dovecot w/o a mailbox selected *shrug*
 	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
-	$ibx->subscribe_unlock(fileno($self->{sock}), $self);
-	$self->{imapd}->idler_start;
 	$self->{-idle_tag} = $tag;
-	$self->{-idle_max} = $ibx->mm->max // 0;
-	"+ idling\r\n"
+	my $max = $ibx->mm->max // 0;
+	my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK;
+	my $sock = $self->{sock} or return;
+	my $fd = fileno($sock);
+	# only do inotify on most recent slice
+	if ($max < $uid_end) {
+		$ibx->subscribe_unlock($fd, $self);
+		$self->{imapd}->idler_start;
+		$self->{-idle_max} = $max;
+	}
+	$idle_timer //= PublicInbox::DS::later(\&idle_tick_all);
+	$IDLERS->{$fd} = $self;
+	\"+ idling\r\n"
+}
+
+sub stop_idle ($$) {
+	my ($self, $ibx);
+	my $sock = $self->{sock} or return;
+	my $fd = fileno($sock);
+	delete $IDLERS->{$fd};
+	$ibx->unsubscribe_unlock($fd);
 }
 
 sub cmd_done ($$) {
@@ -201,7 +241,7 @@ sub cmd_done ($$) {
 		warn "BUG: idle_tag set w/o inbox";
 		return "$tag BAD internal bug\r\n";
 	};
-	$ibx->unsubscribe_unlock(fileno($self->{sock}));
+	stop_idle($self, $ibx);
 	"$idle_tag OK Idle done\r\n";
 }
 
@@ -1185,9 +1225,7 @@ sub busy {
 sub close {
 	my ($self) = @_;
 	if (my $ibx = delete $self->{ibx}) {
-		if (my $sock = $self->{sock}) {;
-			$ibx->unsubscribe_unlock(fileno($sock));
-		}
+		stop_idle($self, $ibx);
 	}
 	$self->SUPER::close; # PublicInbox::DS::close
 }

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 62/82] imap: 30 minute auto-logout timer
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (60 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 61/82] imap: IDLE: avoid extraneous wakeups, keep-alive Eric Wong
@ 2020-06-10  7:04 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 63/82] imap: split ->logged_in attribute into a separate class Eric Wong
                   ` (20 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:04 UTC (permalink / raw)
  To: meta

RFC 3501 section 5.4 requires this to be >= 30 minutes,
10x higher than what is recommended for NNTP.  Fortunately
our design is reasonably memory-efficient despite being Perl.
---
 lib/PublicInbox/IMAP.pm | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 77e8af12fa9..13f415cf8d6 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -80,6 +80,9 @@ my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
 my %MoY;
 @MoY{@MoY} = (0..11);
 
+# RFC 3501 5.4. Autologout Timer needs to be >= 30min
+$PublicInbox::DS::EXPTIME = 60 * 30;
+
 sub greet ($) {
 	my ($self) = @_;
 	my $capa = capa($self);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 63/82] imap: split ->logged_in attribute into a separate class
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (61 preceding siblings ...)
  2020-06-10  7:04 ` [PATCH 62/82] imap: 30 minute auto-logout timer Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 64/82] searchidx: v1 (re)-index uses git asynchronously Eric Wong
                   ` (19 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

This is one boolean attribute not worth wasting space for.
With 20000 sockets, this reduces RSS by around 5% at a glance,
and locked hashes doesn't do us much good when clients
use compression, anyways.
---
 lib/PublicInbox/IMAP.pm        | 19 ++++++++++++++-----
 lib/PublicInbox/IMAPdeflate.pm |  2 --
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 13f415cf8d6..803ce31ff22 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -16,7 +16,7 @@
 package PublicInbox::IMAP;
 use strict;
 use base qw(PublicInbox::DS);
-use fields qw(imapd logged_in ibx long_cb -login_tag
+use fields qw(imapd ibx long_cb -login_tag
 	uid_min -idle_tag -idle_max);
 use PublicInbox::Eml;
 use PublicInbox::EmlContentFoo qw(parse_content_disposition);
@@ -27,6 +27,7 @@ use Text::ParseWords qw(parse_line);
 use Errno qw(EAGAIN);
 use Time::Local qw(timegm);
 use POSIX qw(strftime);
+use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
 
 my $Address;
 for my $mod (qw(Email::Address::XS Mail::Address)) {
@@ -91,7 +92,8 @@ sub greet ($) {
 
 sub new ($$$) {
 	my ($class, $sock, $imapd) = @_;
-	my $self = fields::new($class);
+	my $self = fields::new('PublicInbox::IMAP_preauth');
+	unlock_hash(%$self);
 	my $ev = EPOLLIN;
 	my $wbuf;
 	if ($sock->can('accept_SSL') && !$sock->accept_SSL) {
@@ -110,13 +112,15 @@ sub new ($$$) {
 	$self;
 }
 
+sub logged_in { 1 }
+
 sub capa ($) {
 	my ($self) = @_;
 
 	# dovecot advertises IDLE pre-login; perhaps because some clients
 	# depend on it, so we'll do the same
 	my $capa = 'CAPABILITY IMAP4rev1 IDLE';
-	if ($self->{logged_in}) {
+	if ($self->logged_in) {
 		$capa .= ' COMPRESS=DEFLATE';
 	} else {
 		if (!($self->{sock} // $self)->can('accept_SSL') &&
@@ -129,7 +133,7 @@ sub capa ($) {
 
 sub login_success ($$) {
 	my ($self, $tag) = @_;
-	$self->{logged_in} = 1;
+	bless $self, 'PublicInbox::IMAP';
 	my $capa = capa($self);
 	"$tag OK [$capa] Logged in\r\n";
 }
@@ -154,7 +158,7 @@ sub cmd_close ($$) {
 
 sub cmd_logout ($$) {
 	my ($self, $tag) = @_;
-	delete @$self{qw(logged_in -idle_tag)};
+	delete $self->{-idle_tag};
 	$self->write(\"* BYE logging out\r\n$tag OK Logout done\r\n");
 	$self->shutdn; # PublicInbox::DS::shutdn
 	undef;
@@ -1238,4 +1242,9 @@ no warnings 'once';
 *cmd_select = \&cmd_examine;
 *cmd_fetch = \&cmd_uid_fetch;
 
+package PublicInbox::IMAP_preauth;
+our @ISA = qw(PublicInbox::IMAP);
+
+sub logged_in { 0 }
+
 1;
diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm
index 42daa6cffaa..b98a069d9b6 100644
--- a/lib/PublicInbox/IMAPdeflate.pm
+++ b/lib/PublicInbox/IMAPdeflate.pm
@@ -9,7 +9,6 @@ use warnings;
 use 5.010_001;
 use base qw(PublicInbox::IMAP);
 use Compress::Raw::Zlib;
-use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
 
 my %IN_OPT = (
 	-Bufsize => 1024,
@@ -41,7 +40,6 @@ sub enable {
 		$self->write(\"$tag BAD failed to activate compression\r\n");
 		return;
 	}
-	unlock_hash(%$self);
 	$self->write(\"$tag OK DEFLATE active\r\n");
 	bless $self, $class;
 	$self->{zin} = $in;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 64/82] searchidx: v1 (re)-index uses git asynchronously
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (62 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 63/82] imap: split ->logged_in attribute into a separate class Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 65/82] index: account for CRLF conversion when storing bytes Eric Wong
                   ` (18 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

We can cleanup some of our v1 code slightly and let git do
I/O+decoding in parallel.  This gives a slight 2-4%
re-indexing performance boost even on an SSD.
---
 lib/PublicInbox/SearchIdx.pm | 76 +++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 41 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 3df7970ebf9..a790ac4076a 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -378,7 +378,7 @@ sub _msgmap_init ($) {
 
 sub add_message {
 	# mime = PublicInbox::Eml or Email::MIME object
-	my ($self, $mime, $smsg) = @_;
+	my ($self, $mime, $smsg, $sync) = @_;
 	my $hdr = $mime->header_obj;
 	my $mids = mids_for_index($hdr);
 	$smsg //= bless { blob => '' }, 'PublicInbox::Smsg'; # test-only compat
@@ -389,7 +389,7 @@ sub add_message {
 	};
 
 	# v1 and tests only:
-	$smsg->populate($hdr, $self);
+	$smsg->populate($hdr, $sync);
 	$smsg->{bytes} //= length($mime->as_string);
 
 	eval {
@@ -549,24 +549,24 @@ sub unindex_mm {
 	$self->{mm}->mid_delete(mid_mime($mime));
 }
 
-sub index_both {
-	my ($self, $mime, $smsg) = @_;
-	my $num = index_mm($self, $mime);
+sub index_both { # git->cat_async callback
+	my ($bref, $oid, $type, $size, $sync) = @_;
+	my ($nr, $max) = @$sync{qw(nr max)};
+	++$$nr;
+	$$max -= $size;
+	my $smsg = bless { bytes => $size, blob => $oid }, 'PublicInbox::Smsg';
+	my $self = $sync->{sidx};
+	my $eml = PublicInbox::Eml->new($bref);
+	my $num = index_mm($self, $eml);
 	$smsg->{num} = $num;
-	add_message($self, $mime, $smsg);
+	add_message($self, $eml, $smsg, $sync);
 }
 
-sub unindex_both {
-	my ($self, $mime) = @_;
-	unindex_blob($self, $mime);
-	unindex_mm($self, $mime);
-}
-
-sub do_cat_mail {
-	my ($git, $blob, $sizeref) = @_;
-	my $str = $git->cat_file($blob, $sizeref) or
-		die "BUG: $blob not found in $git->{git_dir}";
-	PublicInbox::Eml->new($str);
+sub unindex_both { # git->cat_async callback
+	my ($bref, $oid, $type, $size, $self) = @_;
+	my $eml = PublicInbox::Eml->new($bref);
+	unindex_blob($self, $eml);
+	unindex_mm($self, $eml);
 }
 
 # called by public-inbox-index
@@ -576,15 +576,6 @@ sub index_sync {
 	$self->{-inbox}->with_umask(sub { $self->_index_sync($opts) })
 }
 
-sub batch_adjust ($$$$$) {
-	my ($max, $bytes, $batch_cb, $latest, $nr) = @_;
-	$$max -= $bytes;
-	if ($$max <= 0) {
-		$$max = $BATCH_BYTES;
-		$batch_cb->($nr, $latest);
-	}
-}
-
 sub too_big ($$$) {
 	my ($self, $git, $oid) = @_;
 	my $max_size = $self->{index_max_size} or return;
@@ -597,24 +588,28 @@ sub too_big ($$$) {
 
 # only for v1
 sub read_log {
-	my ($self, $log, $add_cb, $del_cb, $batch_cb) = @_;
+	my ($self, $log, $batch_cb) = @_;
 	my $hex = '[a-f0-9]';
 	my $h40 = $hex .'{40}';
 	my $addmsg = qr!^:000000 100644 \S+ ($h40) A\t${hex}{2}/${hex}{38}$!;
 	my $delmsg = qr!^:100644 000000 ($h40) \S+ D\t${hex}{2}/${hex}{38}$!;
 	my $git = $self->{git};
 	my $latest;
-	my $bytes;
 	my $max = $BATCH_BYTES;
 	local $/ = "\n";
 	my %D;
 	my $line;
 	my $newest;
 	my $nr = 0;
+	my $sync = { sidx => $self, nr => \$nr, max => \$max };
 	while (defined($line = <$log>)) {
 		if ($line =~ /$addmsg/o) {
 			my $blob = $1;
 			if (delete $D{$blob}) {
+				# make sure pending index writes are done
+				# before writing to ->mm
+				$git->cat_async_wait;
+
 				if (defined $self->{regen_down}) {
 					my $num = $self->{regen_down}--;
 					$self->{mm}->num_highwater($num);
@@ -622,12 +617,12 @@ sub read_log {
 				next;
 			}
 			next if too_big($self, $git, $blob);
-			my $mime = do_cat_mail($git, $blob, \$bytes);
-			my $smsg = bless {}, 'PublicInbox::Smsg';
-			batch_adjust(\$max, $bytes, $batch_cb, $latest, ++$nr);
-			$smsg->{blob} = $blob;
-			$smsg->{bytes} = $bytes;
-			$add_cb->($self, $mime, $smsg);
+			$git->cat_async($blob, \&index_both, { %$sync });
+			if ($max <= 0) {
+				$git->cat_async_wait;
+				$max = $BATCH_BYTES;
+				$batch_cb->($nr, $latest);
+			}
 		} elsif ($line =~ /$delmsg/o) {
 			my $blob = $1;
 			$D{$blob} = 1 unless too_big($self, $git, $blob);
@@ -635,18 +630,17 @@ sub read_log {
 			$latest = $1;
 			$newest ||= $latest;
 		} elsif ($line =~ /^author .*? ([0-9]+) [\-\+][0-9]+$/) {
-			$self->{autime} = $1;
+			$sync->{autime} = $1;
 		} elsif ($line =~ /^committer .*? ([0-9]+) [\-\+][0-9]+$/) {
-			$self->{cotime} = $1;
+			$sync->{cotime} = $1;
 		}
 	}
 	close($log) or die "git log failed: \$?=$?";
 	# get the leftovers
 	foreach my $blob (keys %D) {
-		my $mime = do_cat_mail($git, $blob, \$bytes);
-		$del_cb->($self, $mime);
+		$git->cat_async($blob, \&unindex_both, $self);
 	}
-	delete @$self{qw(autime cotime)};
+	$git->cat_async_wait;
 	$batch_cb->($nr, $latest, $newest);
 }
 
@@ -774,7 +768,7 @@ sub _index_sync {
 	} while (_last_x_commit($self, $mm) ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
-	my $cb = sub {
+	my $batch_cb = sub {
 		my ($nr, $commit, $newest) = @_;
 		if ($dbh) {
 			if ($newest) {
@@ -803,7 +797,7 @@ sub _index_sync {
 	};
 
 	$dbh->begin_work;
-	read_log($self, $xlog, *index_both, *unindex_both, $cb);
+	read_log($self, $xlog, $batch_cb);
 }
 
 sub DESTROY {

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 65/82] index: account for CRLF conversion when storing bytes
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (63 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 64/82] searchidx: v1 (re)-index uses git asynchronously Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 66/82] imap: rely on smsg->{bytes} for RFC822.SIZE Eric Wong
                   ` (17 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

NNTP and IMAP both require CRLF conversions on the wire.
They're also the only components which care about
$smsg->{bytes}, so store the CRLF-adjusted value in over.sqlite3
and Xapian DBs..

This will allow us to optimize RFC822.SIZE fetch item in IMAP
without triggering size mismatch errors in some clients' default
configurations (e.g. Mail::IMAPClient), but not most others.

It could also fix hypothetical problems with NNTP clients that
report discrepancies between overview and article data.
---
 lib/PublicInbox/Import.pm         |  2 +-
 lib/PublicInbox/SearchIdx.pm      | 12 ++++++++++++
 lib/PublicInbox/SearchIdxShard.pm | 11 ++++++-----
 lib/PublicInbox/V2Writable.pm     | 10 ++++++----
 t/import.t                        |  5 +++--
 t/nntpd.t                         |  5 ++++-
 t/search.t                        |  8 ++++++++
 7 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index ab75aa00dc2..af35905be49 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -400,7 +400,7 @@ sub add {
 	# v2: we need this for Xapian
 	if ($smsg) {
 		$smsg->{blob} = $self->get_mark(":$blob");
-		$smsg->{bytes} = $n;
+		$smsg->{raw_bytes} = $n;
 		$smsg->{-raw_email} = \$raw_email;
 	}
 	my $ref = $self->{ref};
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index a790ac4076a..85821ea706a 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -549,11 +549,23 @@ sub unindex_mm {
 	$self->{mm}->mid_delete(mid_mime($mime));
 }
 
+# returns the number of bytes to add if given a non-CRLF arg
+sub crlf_adjust ($) {
+	if (index($_[0], "\r\n") < 0) {
+		# common case is LF-only, every \n needs an \r;
+		# so favor a cheap tr// over an expensive m//g
+		$_[0] =~ tr/\n/\n/;
+	} else { # count number of '\n' w/o '\r', expensive:
+		scalar(my @n = ($_[0] =~ m/(?<!\r)\n/g));
+	}
+}
+
 sub index_both { # git->cat_async callback
 	my ($bref, $oid, $type, $size, $sync) = @_;
 	my ($nr, $max) = @$sync{qw(nr max)};
 	++$$nr;
 	$$max -= $size;
+	$size += crlf_adjust($$bref);
 	my $smsg = bless { bytes => $size, blob => $oid }, 'PublicInbox::Smsg';
 	my $self = $sync->{sidx};
 	my $eml = PublicInbox::Eml->new($bref);
diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm
index c1f52d8b884..f7ba293ff5b 100644
--- a/lib/PublicInbox/SearchIdxShard.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -71,11 +71,11 @@ sub shard_worker_loop ($$$$$) {
 		} else {
 			chomp $line;
 			# n.b. $mid may contain spaces(!)
-			my ($bytes, $num, $blob, $ds, $ts, $mid) =
-							split(/ /, $line, 6);
+			my ($to_read, $bytes, $num, $blob, $ds, $ts, $mid) =
+							split(/ /, $line, 7);
 			$self->begin_txn_lazy;
-			my $n = read($r, my $msg, $bytes) or die "read: $!\n";
-			$n == $bytes or die "short read: $n != $bytes\n";
+			my $n = read($r, my $msg, $to_read) or die "read: $!\n";
+			$n == $to_read or die "short read: $n != $to_read\n";
 			my $mime = PublicInbox::Eml->new(\$msg);
 			my $smsg = bless {
 				bytes => $bytes,
@@ -96,7 +96,8 @@ sub index_raw {
 	my ($self, $msgref, $mime, $smsg) = @_;
 	if (my $w = $self->{w}) {
 		# mid must be last, it can contain spaces (but not LF)
-		print $w join(' ', @$smsg{qw(bytes num blob ds ts mid)}),
+		print $w join(' ', @$smsg{qw(raw_bytes bytes
+						num blob ds ts mid)}),
 			"\n", $$msgref or die "failed to write shard $!\n";
 	} else {
 		$$msgref = undef;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 79bee7f9f3d..91379431633 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -155,10 +155,12 @@ sub add {
 # indexes a message, returns true if checkpointing is needed
 sub do_idx ($$$$) {
 	my ($self, $msgref, $mime, $smsg) = @_;
+	$smsg->{bytes} = $smsg->{raw_bytes} +
+			PublicInbox::SearchIdx::crlf_adjust($$msgref);
 	$self->{over}->add_overview($mime, $smsg);
 	my $idx = idx_shard($self, $smsg->{num} % $self->{shards});
 	$idx->index_raw($msgref, $mime, $smsg);
-	my $n = $self->{transact_bytes} += $smsg->{bytes};
+	my $n = $self->{transact_bytes} += $smsg->{raw_bytes};
 	$n >= ($PublicInbox::SearchIdx::BATCH_BYTES * $self->{shards});
 }
 
@@ -568,7 +570,7 @@ W: $list
 	for my $smsg (@$need_reindex) {
 		my $new_smsg = bless {
 			blob => $blob,
-			bytes => $bytes,
+			raw_bytes => $bytes,
 			num => $smsg->{num},
 			mid => $smsg->{mid},
 		}, 'PublicInbox::Smsg';
@@ -962,7 +964,7 @@ sub reindex_oid_m ($$$$;$) {
 	}
 	$sync->{nr}++;
 	my $smsg = bless {
-		bytes => $len,
+		raw_bytes => $len,
 		num => $num,
 		blob => $oid,
 		mid => $mid0,
@@ -1054,7 +1056,7 @@ sub reindex_oid ($$$$) {
 		die "failed to delete <$mid0> for article #$num\n";
 	$sync->{nr}++;
 	my $smsg = bless {
-		bytes => $len,
+		raw_bytes => $len,
 		num => $num,
 		blob => $oid,
 		mid => $mid0,
diff --git a/t/import.t b/t/import.t
index f987b1141f7..abbc8229d0e 100644
--- a/t/import.t
+++ b/t/import.t
@@ -32,8 +32,9 @@ like($im->add($mime, undef, $smsg), qr/\A:[0-9]+\z/, 'added one message');
 
 if ($v2) {
 	like($smsg->{blob}, qr/\A[a-f0-9]{40}\z/, 'got last object_id');
-	is($mime->as_string, ${$smsg->{-raw_email}}, 'string matches');
-	is($smsg->{bytes}, length(${$smsg->{-raw_email}}), 'length matches');
+	my $raw_email = $smsg->{-raw_email};
+	is($mime->as_string, $$raw_email, 'string matches');
+	is($smsg->{raw_bytes}, length($$raw_email), 'length matches');
 	my @cmd = ('git', "--git-dir=$git->{git_dir}", qw(hash-object --stdin));
 	my $in = tempfile();
 	print $in $mime->as_string or die "write failed: $!";
diff --git a/t/nntpd.t b/t/nntpd.t
index eee67ea65bb..d2f31323115 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -73,7 +73,10 @@ EOF
 		my $list_id = $addr;
 		$list_id =~ s/@/./;
 		$mime->header_set('List-Id', "<$list_id>");
-		$len = length($mime->as_string);
+		my $str = $mime->as_string;
+		$str =~ s/(?<!\r)\n/\r\n/sg;
+		$len = length($str);
+		undef $str;
 		$im->add($mime);
 		$im->done;
 		if ($version == 1) {
diff --git a/t/search.t b/t/search.t
index d4ca28c794f..82caf9e41c3 100644
--- a/t/search.t
+++ b/t/search.t
@@ -59,6 +59,14 @@ sub oct_is ($$$) {
 	}
 }
 
+{
+	my $crlf_adjust = \&PublicInbox::SearchIdx::crlf_adjust;
+	is($crlf_adjust->("hi\r\nworld\r\n"), 0, 'no adjustment needed');
+	is($crlf_adjust->("hi\nworld\n"), 2, 'LF-only counts two CR');
+	is($crlf_adjust->("hi\r\nworld\n"), 1, 'CRLF/LF-mix 1 counts 1 CR');
+	is($crlf_adjust->("hi\nworld\r\n"), 1, 'CRLF/LF-mix 2 counts 1 CR');
+}
+
 $ibx->with_umask(sub {
 	my $root = PublicInbox::Eml->new(<<'EOF');
 Date: Fri, 02 Oct 1993 00:00:00 +0000

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 66/82] imap: rely on smsg->{bytes} for RFC822.SIZE
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (64 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 65/82] index: account for CRLF conversion when storing bytes Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 67/82] imap: UID FETCH requires at least one data item Eric Wong
                   ` (16 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

Since we started indexing the CRLF-adjusted size of messages,
we can take an order-of-magnitude speedup for certain MUAs
which fetch this attribute without needing much else.

Admins are encouraged to --reindex existing inboxes for IMAP
support, anyways.  It won't be fatal if it's not reindexed, but
some client bugs and warnings can be fixed and they'll be able
to support more of IMAP.
---
 lib/PublicInbox/IMAP.pm | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 803ce31ff22..9cdcba693c7 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -53,7 +53,7 @@ my %FETCH_NEED = (
 	'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ],
 	'RFC822.HEADER' => [ NEED_EML, \&emit_rfc822_header ],
 	'RFC822.TEXT' => [ NEED_EML, \&emit_rfc822_text ],
-	'RFC822.SIZE' => [ NEED_BLOB, \&emit_rfc822_size ],
+	'RFC822.SIZE' => [ NEED_SMSG, \&emit_rfc822_size ],
 	RFC822 => [ NEED_BLOB, \&emit_rfc822 ],
 	BODY => [ NEED_EML, \&emit_body ],
 	BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ],
@@ -500,11 +500,12 @@ sub emit_rfc822 {
 	$self->msg_more($$bref);
 }
 
-# Mail::IMAPClient::message_string cares about this by default
-# (->Ignoresizeerrors attribute)
+# Mail::IMAPClient::message_string cares about this by default,
+# (->Ignoresizeerrors attribute).  Admins are encouraged to
+# --reindex for IMAP support, anyways.
 sub emit_rfc822_size {
-	my ($self, $k, undef, $bref) = @_;
-	$self->msg_more(' RFC822.SIZE ' . length($$bref));
+	my ($self, $k, $smsg) = @_;
+	$self->msg_more(' RFC822.SIZE ' . $smsg->{bytes});
 }
 
 sub emit_internaldate {

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 67/82] imap: UID FETCH requires at least one data item
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (65 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 66/82] imap: rely on smsg->{bytes} for RFC822.SIZE Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 68/82] imap: LIST shows "INBOX" in all caps Eric Wong
                   ` (15 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

It seems required based on my reading of RFC 3501 for
the non-UID "FETCH" command.
---
 lib/PublicInbox/IMAP.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 9cdcba693c7..d3f4874e496 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -884,7 +884,7 @@ sub fetch_compile ($) {
 	@r;
 }
 
-sub cmd_uid_fetch ($$$;@) {
+sub cmd_uid_fetch ($$$$;@) {
 	my ($self, $tag, $range_csv, @want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
 	my ($cb, $ops, $partial) = fetch_compile(\@want);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 68/82] imap: LIST shows "INBOX" in all caps
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (66 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 67/82] imap: UID FETCH requires at least one data item Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 69/82] imap: support 8000 octet lines Eric Wong
                   ` (14 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

While selecting a mailbox is done case-insensitively, "INBOX" is
special for the LIST command, according to RFC 3501 6.3.8:

> The special name INBOX is included in the output from LIST, if
> INBOX is supported by this server for this user and if the
> uppercase string "INBOX" matches the interpreted reference and
> mailbox name arguments with wildcards as described above.  The
> criteria for omitting INBOX is whether SELECT INBOX will
> return failure; it is not relevant whether the user's real
> INBOX resides on this or some other server.

Thus, the existing news.public-inbox.org convention of naming
newsgroups starting with "inbox." needs to be special-cased to
not confuse clients.

While we're at it, do not create ".0" for dummy newsgroups if
they're selected, either.
---
 lib/PublicInbox/IMAP.pm  | 8 ++++----
 lib/PublicInbox/IMAPD.pm | 4 +++-
 t/imapd.t                | 4 ++--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index d3f4874e496..b3c449b0712 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -254,13 +254,14 @@ sub cmd_done ($$) {
 
 sub ensure_ranges_exist ($$$) {
 	my ($imapd, $ibx, $max) = @_;
+	defined(my $mb_top = $ibx->{newsgroup}) or return;
 	my $mailboxes = $imapd->{mailboxes};
-	my $mb_top = $ibx->{newsgroup};
 	my @created;
 	for (my $i = int($max/UID_BLOCK); $i >= 0; --$i) {
 		my $sub_mailbox = "$mb_top.$i";
 		last if exists $mailboxes->{$sub_mailbox};
 		$mailboxes->{$sub_mailbox} = $ibx;
+		$sub_mailbox =~ s/\Ainbox\./INBOX./i; # more familiar to users
 		push @created, $sub_mailbox;
 	}
 	return unless @created;
@@ -693,9 +694,8 @@ sub cmd_list ($$$$) {
 		# request for hierarchy delimiter
 		$l = [ qq[* LIST (\\Noselect) "." ""\r\n] ];
 	} elsif ($refname ne '' || $wildcard ne '*') {
-		$wildcard = lc $wildcard;
-		$wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eg;
-		$l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ];
+		$wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!egi;
+		$l = [ grep(/ \Q$refname\E$wildcard\r\n\z/is, @$l) ];
 	}
 	\(join('', @$l, "$tag OK List done\r\n"));
 }
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index 261d756042f..186ec7b0062 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -71,7 +71,9 @@ sub imapd_refresh_finalize {
 	$imapd->{inboxlist} = [
 		map {
 			my $no = $mailboxes->{$_} == $dummy ? '' : 'No';
-			qq[* LIST (\\Has${no}Children) "." $_\r\n]
+			my $u = $_; # capitalize "INBOX" for user-familiarity
+			$u =~ s/\Ainbox(\.|\z)/INBOX$1/i;
+			qq[* LIST (\\Has${no}Children) "." $u\r\n]
 		} sort {
 			# shortest names first, alphabetically if lengths match
 			length($a) == length($b) ?
diff --git a/t/imapd.t b/t/imapd.t
index 233be9f2c0a..c691e1a96a7 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -101,13 +101,13 @@ like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\.$first_range\x20
 like($raw[1], qr/\A\S+ OK /, 'finished status response');
 
 my @orig_list = @raw = $mic->list;
-like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
+like($raw[0], qr/^\* LIST \(.*?\) "\." INBOX/,
 	'got an inbox');
 like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
 is(scalar(@raw), scalar(@V) + 4, 'default LIST response');
 @raw = $mic->list('', 'inbox.i1');
 is(scalar(@raw), 2, 'limited LIST response');
-like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/,
+like($raw[0], qr/^\* LIST \(.*?\) "\." INBOX/,
 		'got an inbox.i1');
 like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 69/82] imap: support 8000 octet lines
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (67 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 68/82] imap: LIST shows "INBOX" in all caps Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 70/82] imap: reinstate some message sequence number support Eric Wong
                   ` (13 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

RFC 2683 section 3.2.1.5 recommends it:

> For its part, a server should allow for a command line of at least
> 8000 octets.  This provides plenty of leeway for accepting reasonable
> length commands from clients.  The server should send a BAD response
> to a command that does not end within the server's maximum accepted
> command length.

To conserve memory, we won't bother reading the entire line
before sending the BAD response and disconnecting them.
---
 Documentation/standards.perl | 1 +
 lib/PublicInbox/IMAP.pm      | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/standards.perl b/Documentation/standards.perl
index 8fc852c722b..b4bda8a9c06 100755
--- a/Documentation/standards.perl
+++ b/Documentation/standards.perl
@@ -44,6 +44,7 @@ my $rfcs = [
 	5322 => 'Internet message format (2008)',
 	3501 => 'IMAP4rev1',
 	2177 => 'IMAP IDLE',
+	2683 => 'IMAP4 Implementation Recommendations',
 	# 5032 = 'WITHIN search extension for IMAP',
 	4978 => 'IMAP COMPRESS Extension',
 	# 5182 = 'IMAP Extension for Referencing the Last SEARCH Result',
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index b3c449b0712..2e50415de57 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -36,7 +36,7 @@ for my $mod (qw(Email::Address::XS Mail::Address)) {
 }
 die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address;
 
-sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977?
+sub LINE_MAX () { 8000 } # RFC 2683 3.2.1.5
 
 # changing this will cause grief for clients which cache
 sub UID_BLOCK () { 50_000 }
@@ -1170,7 +1170,10 @@ sub event_step {
 	my $rbuf = $self->{rbuf} // \(my $x = '');
 	my $line = index($$rbuf, "\n");
 	while ($line < 0) {
-		return $self->close if length($$rbuf) >= LINE_MAX;
+		if (length($$rbuf) >= LINE_MAX) {
+			$self->write(\"\* BAD request too long\r\n");
+			return $self->close;
+		}
 		$self->do_read($rbuf, LINE_MAX, length($$rbuf)) or return;
 		$line = index($$rbuf, "\n");
 	}

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 70/82] imap: reinstate some message sequence number support
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (68 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 69/82] imap: support 8000 octet lines Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 71/82] imap: cleanup ->{uid_base} usage Eric Wong
                   ` (12 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

The performance problem with mutt not using header caches isn't
fixed, yet, but mutt header caching seems to depend on MSNs
(message sequence numbers).  We'll switch to storing the 0-based
{uid_base} instead of the 1-based {uid_min} since it simplifies
most of our code.
---
 lib/PublicInbox/IMAP.pm | 121 ++++++++++++++++++++++++----------------
 1 file changed, 73 insertions(+), 48 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 2e50415de57..12123072d1a 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -7,17 +7,13 @@
 # slow storage.
 #
 # data notes:
-# * NNTP article numbers are UIDs and message sequence numbers (MSNs)
-# * Message sequence numbers (MSNs) can be stable since we're read-only.
-#   Most IMAP clients use UIDs (I hope).  We may return a dummy message
-#   in the future if a client requests a non-existent MSN, but that seems
-#   unecessary with mutt.
+# * NNTP article numbers are UIDs
 
 package PublicInbox::IMAP;
 use strict;
 use base qw(PublicInbox::DS);
 use fields qw(imapd ibx long_cb -login_tag
-	uid_min -idle_tag -idle_max);
+	uid_base -idle_tag -idle_max);
 use PublicInbox::Eml;
 use PublicInbox::EmlContentFoo qw(parse_content_disposition);
 use PublicInbox::DS qw(now);
@@ -151,7 +147,7 @@ sub cmd_login ($$$$) {
 
 sub cmd_close ($$) {
 	my ($self, $tag) = @_;
-	delete $self->{uid_min};
+	delete $self->{uid_base};
 	delete $self->{ibx} ? "$tag OK Close done\r\n"
 				: "$tag BAD No mailbox\r\n";
 }
@@ -181,11 +177,14 @@ sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" }
 sub on_inbox_unlock {
 	my ($self, $ibx) = @_;
 	my $new = $ibx->mm->max;
-	my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK;
+	my $uid_base = $self->{uid_base} // 0;
+	my $uid_end = $uid_base + UID_BLOCK;
 	defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset';
 	$new = $uid_end if $new > $uid_end;
 	if ($new > $old) {
 		$self->{-idle_max} = $new;
+		$new -= $uid_base;
+		$old -= $uid_base;
 		$self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1));
 		$self->write(\"* $new EXISTS\r\n");
 	} elsif ($new == $uid_end) { # max exceeded $uid_end
@@ -218,7 +217,7 @@ sub cmd_idle ($$) {
 	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
 	$self->{-idle_tag} = $tag;
 	my $max = $ibx->mm->max // 0;
-	my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK;
+	my $uid_end = ($self->{uid_base} // 0) + UID_BLOCK;
 	my $sock = $self->{sock} or return;
 	my $fd = fileno($sock);
 	# only do inotify on most recent slice
@@ -274,18 +273,19 @@ sub inbox_lookup ($$) {
 	my ($ibx, $exists, $uidnext);
 	if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) {
 		# old mail: inbox.comp.foo.$uid_block_idx
-		my ($mb_top, $uid_min) = ($1, $2 * UID_BLOCK + 1);
+		my ($mb_top, $uid_base) = ($1, $2 * UID_BLOCK);
 
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
 		$exists = $ibx->mm->max // 0;
-		$self->{uid_min} = $uid_min;
+		$self->{uid_base} = $uid_base;
 		ensure_ranges_exist($self->{imapd}, $ibx, $exists);
-		my $uid_end = $uid_min + UID_BLOCK - 1;
+		my $uid_end = $uid_base + UID_BLOCK;
 		$exists = $uid_end if $exists > $uid_end;
 		$uidnext = $exists + 1;
+		$exists -= $uid_base;
 	} else { # check for dummy inboxes
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
-		delete $self->{uid_min};
+		delete $self->{uid_base};
 		$exists = 0;
 		$uidnext = 1;
 	}
@@ -469,9 +469,24 @@ sub requeue_once ($) {
 	$self->requeue if $new_size == 1;
 }
 
-sub uid_fetch_cb { # called by git->cat_async via git_async_cat
-	my ($bref, $oid, $type, $size, $fetch_m_arg) = @_;
-	my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_m_arg;
+# my ($uid_base, $UID) = @_;
+sub fetch_msn_uid ($$) { '* '.($_[1] - $_[0]).' FETCH (UID '.$_[1] }
+
+sub fetch_run_ops {
+	my ($self, $uid_base, $smsg, $bref, $ops, $partial) = @_;
+	$self->msg_more(fetch_msn_uid($uid_base, $smsg->{num}));
+	my ($eml, $k);
+	for (my $i = 0; $i < @$ops;) {
+		$k = $ops->[$i++];
+		$ops->[$i++]->($self, $k, $smsg, $bref, $eml);
+	}
+	partial_emit($self, $partial, $eml) if $partial;
+	$self->msg_more(")\r\n");
+}
+
+sub fetch_blob_cb { # called by git->cat_async via git_async_cat
+	my ($bref, $oid, $type, $size, $fetch_arg) = @_;
+	my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
@@ -484,14 +499,8 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat
 
 	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
 	$$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s;
-	$self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}");
-	my $eml;
-	for (my $i = 0; $i < @$ops;) {
-		my $k = $ops->[$i++];
-		$ops->[$i++]->($self, $k, $smsg, $bref, $eml);
-	}
-	partial_emit($self, $partial, $eml) if $partial;
-	$self->msg_more(")\r\n");
+	fetch_run_ops($self, $self->{uid_base} // 0,
+			$smsg, $bref, $ops, $partial);
 	requeue_once($self);
 }
 
@@ -549,7 +558,7 @@ sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) }
 
 sub uid_clamp ($$$) {
 	my ($self, $beg, $end) = @_;
-	my $uid_min = $self->{uid_min} or return;
+	my $uid_min = ($self->{uid_base} // 0) + 1;
 	my $uid_end = $uid_min + UID_BLOCK - 1;
 	$$beg = $uid_min if $$beg < $uid_min;
 	$$end = $uid_end if $$end > $uid_end;
@@ -569,7 +578,7 @@ sub range_step ($$) {
 	} elsif ($range =~ /\A([0-9]+):\*\z/) {
 		$beg = $1 + 0;
 		$end = $self->{ibx}->mm->max // 0;
-		my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK;
+		my $uid_end = $self->{uid_base} + UID_BLOCK;
 		$end = $uid_end if $end > $uid_end;
 		$beg = $end if $beg > $end;
 	} elsif ($range =~ /\A[0-9]+\z/) {
@@ -596,8 +605,8 @@ sub refill_range ($$$) {
 	undef; # keep looping
 }
 
-sub uid_fetch_msg { # long_response
-	my ($self, $tag, $msgs, $range_info) = @_; # \@ops, \@partial
+sub fetch_blob { # long_response
+	my ($self, $tag, $msgs, $range_info, $ops, $partial) = @_;
 	while (!@$msgs) { # rare
 		if (my $end = refill_range($self, $msgs, $range_info)) {
 			$self->write(\"$tag $end\r\n");
@@ -605,10 +614,10 @@ sub uid_fetch_msg { # long_response
 		}
 	}
 	git_async_cat($self->{ibx}->git, $msgs->[0]->{blob},
-			\&uid_fetch_cb, \@_);
+			\&fetch_blob_cb, \@_);
 }
 
-sub uid_fetch_smsg { # long_response
+sub fetch_smsg { # long_response
 	my ($self, $tag, $msgs, $range_info, $ops) = @_;
 	while (!@$msgs) { # rare
 		if (my $end = refill_range($self, $msgs, $range_info)) {
@@ -616,20 +625,15 @@ sub uid_fetch_smsg { # long_response
 			return;
 		}
 	}
-	for my $smsg (@$msgs) {
-		$self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}");
-		for (my $i = 0; $i < @$ops;) {
-			my $k = $ops->[$i++];
-			$ops->[$i++]->($self, $k, $smsg);
-		}
-		$self->msg_more(")\r\n");
-	}
+	my $uid_base = $self->{uid_base} // 0;
+	fetch_run_ops($self, $uid_base, $_, undef, $ops) for @$msgs;
 	@$msgs = ();
 	1; # more
 }
 
-sub uid_fetch_uid { # long_response
+sub fetch_uid { # long_response
 	my ($self, $tag, $uids, $range_info, $ops) = @_;
+
 	while (!@$uids) { # rare
 		my ($beg, $end, $range_csv) = @$range_info;
 		if (scalar(@$uids = @{$self->{ibx}->over->
@@ -648,10 +652,12 @@ sub uid_fetch_uid { # long_response
 		}
 		# continue looping
 	}
+	my $uid_base = $self->{uid_base} // 0;
+	my ($i, $k);
 	for (@$uids) {
-		$self->msg_more("* $_ FETCH (UID $_");
-		for (my $i = 0; $i < @$ops;) {
-			my $k = $ops->[$i++];
+		$self->msg_more(fetch_msn_uid($uid_base, $_));
+		for ($i = 0; $i < @$ops;) {
+			$k = $ops->[$i++];
 			$ops->[$i++]->($self, $k);
 		}
 		$self->msg_more(")\r\n");
@@ -875,8 +881,8 @@ sub fetch_compile ($) {
 		$r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ];
 	}
 
-	$r[0] = $need & NEED_BLOB ? \&uid_fetch_msg :
-		($need & NEED_SMSG ? \&uid_fetch_smsg : \&uid_fetch_uid);
+	$r[0] = $need & NEED_BLOB ? \&fetch_blob :
+		($need & NEED_SMSG ? \&fetch_smsg : \&fetch_uid);
 
 	# r[1] = [ $key1, $cb1, $key2, $cb2, ... ]
 	use sort 'stable'; # makes output more consistent
@@ -896,6 +902,25 @@ sub cmd_uid_fetch ($$$$;@) {
 	long_response($self, $cb, $tag, [], $range_info, $ops, $partial);
 }
 
+sub msn_to_uid_range ($$) {
+	my $uid_base = $_[0]->{uid_base} // 0;
+	$_[1] =~ s/([0-9]+)/$uid_base + $1/sge;
+}
+
+sub cmd_fetch ($$$$;@) {
+	my ($self, $tag, $range_csv, @want) = @_;
+	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
+	my ($cb, $ops, $partial) = fetch_compile(\@want);
+	return "$tag $cb\r\n" unless $ops;
+
+	# cb is one of fetch_blob, fetch_smsg, fetch_uid
+	$range_csv = 'bad' if $range_csv !~ $valid_range;
+	msn_to_uid_range($self, $range_csv);
+	my $range_info = range_step($self, \$range_csv);
+	return "$tag $range_info\r\n" if !ref($range_info);
+	long_response($self, $cb, $tag, [], $range_info, $ops, $partial);
+}
+
 sub parse_date ($) { # 02-Oct-1993
 	my ($date_text) = @_;
 	my ($dd, $mon, $yyyy) = split(/-/, $_[0], 3);
@@ -979,7 +1004,8 @@ sub parse_query {
 		# default criteria
 		next if $k =~ /\A(?:ALL|RECENT|UNSEEN|NEW)\z/;
 		next if $k eq 'AND'; # the default, until we support OR
-		if ($k =~ $valid_range) { # sequence numbers == UIDs
+		if ($k =~ $valid_range) { # convert sequence numbers to UIDs
+			msn_to_uid_range($self, $k);
 			push @{$q->{uid}}, $k;
 		} elsif ($k eq 'UID') {
 			$k = shift(@$rest) // '';
@@ -1037,8 +1063,8 @@ sub cmd_uid_search ($$$;) {
 
 	if (!scalar(keys %$q)) {
 		$self->msg_more('* SEARCH');
-		my $beg = $self->{uid_min} // 1;
-		my $end = $ibx->mm->max;
+		my $beg = 1;
+		my $end = $ibx->mm->max // 0;
 		uid_clamp($self, \$beg, \$end);
 		long_response($self, \&uid_search_uid_range,
 				$tag, \$beg, $end, $sql);
@@ -1244,7 +1270,6 @@ sub close {
 # we're read-only, so SELECT and EXAMINE do the same thing
 no warnings 'once';
 *cmd_select = \&cmd_examine;
-*cmd_fetch = \&cmd_uid_fetch;
 
 package PublicInbox::IMAP_preauth;
 our @ISA = qw(PublicInbox::IMAP);

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 71/82] imap: cleanup ->{uid_base} usage
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (69 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 70/82] imap: reinstate some message sequence number support Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 72/82] imap: FETCH: more granular CRLF conversion Eric Wong
                   ` (11 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

Ensure {uid_base} is always set, so we don't need to add `//'
checks everywhere.  Furthermore, this fixes a hard-to-test bug
where the STATUS command would inadvertantly clobber {uid_base}.
---
 lib/PublicInbox/IMAP.pm | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 12123072d1a..e9fa338b5fe 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -217,7 +217,7 @@ sub cmd_idle ($$) {
 	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
 	$self->{-idle_tag} = $tag;
 	my $max = $ibx->mm->max // 0;
-	my $uid_end = ($self->{uid_base} // 0) + UID_BLOCK;
+	my $uid_end = $self->{uid_base} + UID_BLOCK;
 	my $sock = $self->{sock} or return;
 	my $fd = fileno($sock);
 	# only do inotify on most recent slice
@@ -270,14 +270,13 @@ sub ensure_ranges_exist ($$$) {
 
 sub inbox_lookup ($$) {
 	my ($self, $mailbox) = @_;
-	my ($ibx, $exists, $uidnext);
+	my ($ibx, $exists, $uidnext, $uid_base);
 	if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) {
 		# old mail: inbox.comp.foo.$uid_block_idx
-		my ($mb_top, $uid_base) = ($1, $2 * UID_BLOCK);
-
+		my $mb_top = $1;
+		$uid_base = $2 * UID_BLOCK;
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
 		$exists = $ibx->mm->max // 0;
-		$self->{uid_base} = $uid_base;
 		ensure_ranges_exist($self->{imapd}, $ibx, $exists);
 		my $uid_end = $uid_base + UID_BLOCK;
 		$exists = $uid_end if $exists > $uid_end;
@@ -285,17 +284,17 @@ sub inbox_lookup ($$) {
 		$exists -= $uid_base;
 	} else { # check for dummy inboxes
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
-		delete $self->{uid_base};
-		$exists = 0;
+		$uid_base = $exists = 0;
 		$uidnext = 1;
 	}
-	($ibx, $exists, $uidnext);
+	($ibx, $exists, $uidnext, $uid_base);
 }
 
 sub cmd_examine ($$$) {
 	my ($self, $tag, $mailbox) = @_;
-	my ($ibx, $exists, $uidnext) = inbox_lookup($self, $mailbox);
+	my ($ibx, $exists, $uidnext, $base) = inbox_lookup($self, $mailbox);
 	return "$tag NO Mailbox doesn't exist: $mailbox\r\n" if !$ibx;
+	$self->{uid_base} = $base;
 
 	# XXX: do we need this? RFC 5162/7162
 	my $ret = $self->{ibx} ? "* OK [CLOSED] previous closed\r\n" : '';
@@ -499,8 +498,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat
 
 	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
 	$$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s;
-	fetch_run_ops($self, $self->{uid_base} // 0,
-			$smsg, $bref, $ops, $partial);
+	fetch_run_ops($self, $self->{uid_base}, $smsg, $bref, $ops, $partial);
 	requeue_once($self);
 }
 
@@ -558,7 +556,7 @@ sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) }
 
 sub uid_clamp ($$$) {
 	my ($self, $beg, $end) = @_;
-	my $uid_min = ($self->{uid_base} // 0) + 1;
+	my $uid_min = $self->{uid_base} + 1;
 	my $uid_end = $uid_min + UID_BLOCK - 1;
 	$$beg = $uid_min if $$beg < $uid_min;
 	$$end = $uid_end if $$end > $uid_end;
@@ -625,7 +623,7 @@ sub fetch_smsg { # long_response
 			return;
 		}
 	}
-	my $uid_base = $self->{uid_base} // 0;
+	my $uid_base = $self->{uid_base};
 	fetch_run_ops($self, $uid_base, $_, undef, $ops) for @$msgs;
 	@$msgs = ();
 	1; # more
@@ -652,7 +650,7 @@ sub fetch_uid { # long_response
 		}
 		# continue looping
 	}
-	my $uid_base = $self->{uid_base} // 0;
+	my $uid_base = $self->{uid_base};
 	my ($i, $k);
 	for (@$uids) {
 		$self->msg_more(fetch_msn_uid($uid_base, $_));
@@ -903,7 +901,7 @@ sub cmd_uid_fetch ($$$$;@) {
 }
 
 sub msn_to_uid_range ($$) {
-	my $uid_base = $_[0]->{uid_base} // 0;
+	my $uid_base = $_[0]->{uid_base};
 	$_[1] =~ s/([0-9]+)/$uid_base + $1/sge;
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 72/82] imap: FETCH: more granular CRLF conversion
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (70 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 71/82] imap: cleanup ->{uid_base} usage Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 73/82] imap: further speed up HEADER.FIELDS FETCH requests Eric Wong
                   ` (10 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

This speeds up requests from mutt for HEADER.FIELDS by around 10%
since we don't waste time doing CRLF conversion on large message
bodies that get discarded, anyways.
---
 lib/PublicInbox/IMAP.pm | 80 ++++++++++++++++++++++++++---------------
 t/imap.t                | 29 +++++++++------
 2 files changed, 70 insertions(+), 39 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index e9fa338b5fe..3cc68e66872 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -40,20 +40,27 @@ sub UID_BLOCK () { 50_000 }
 # these values area also used for sorting
 sub NEED_SMSG () { 1 }
 sub NEED_BLOB () { NEED_SMSG|2 }
-sub NEED_EML () { NEED_BLOB|4 }
-my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ];
+sub CRLF_BREF () { 4 }
+sub EML_HDR () { 8 }
+sub CRLF_HDR () { 16 }
+sub EML_BDY () { 32 }
+sub CRLF_BDY () { 64 }
+my $OP_EML_NEW = [ EML_HDR - 1, \&op_eml_new ];
+my $OP_CRLF_BREF = [ CRLF_BREF, \&op_crlf_bref ];
+my $OP_CRLF_HDR = [ CRLF_HDR, \&op_crlf_hdr ];
+my $OP_CRLF_BDY = [ CRLF_BDY, \&op_crlf_bdy ];
 
 my %FETCH_NEED = (
-	'BODY[HEADER]' => [ NEED_EML, \&emit_rfc822_header ],
-	'BODY[TEXT]' => [ NEED_EML, \&emit_rfc822_text ],
-	'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ],
-	'RFC822.HEADER' => [ NEED_EML, \&emit_rfc822_header ],
-	'RFC822.TEXT' => [ NEED_EML, \&emit_rfc822_text ],
+	'BODY[HEADER]' => [ NEED_BLOB|EML_HDR|CRLF_HDR, \&emit_rfc822_header ],
+	'BODY[TEXT]' => [ NEED_BLOB|EML_BDY|CRLF_BDY, \&emit_rfc822_text ],
+	'BODY[]' => [ NEED_BLOB|CRLF_BREF, \&emit_rfc822 ],
+	'RFC822.HEADER' => [ NEED_BLOB|EML_HDR|CRLF_HDR, \&emit_rfc822_header ],
+	'RFC822.TEXT' => [ NEED_BLOB|EML_BDY|CRLF_BDY, \&emit_rfc822_text ],
 	'RFC822.SIZE' => [ NEED_SMSG, \&emit_rfc822_size ],
-	RFC822 => [ NEED_BLOB, \&emit_rfc822 ],
-	BODY => [ NEED_EML, \&emit_body ],
-	BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ],
-	ENVELOPE => [ NEED_EML, \&emit_envelope ],
+	RFC822 => [ NEED_BLOB|CRLF_BREF, \&emit_rfc822 ],
+	BODY => [ NEED_BLOB|EML_HDR|EML_BDY, \&emit_body ],
+	BODYSTRUCTURE => [ NEED_BLOB|EML_HDR|EML_BDY, \&emit_bodystructure ],
+	ENVELOPE => [ NEED_BLOB|EML_HDR, \&emit_envelope ],
 	FLAGS => [ 0, \&emit_flags ],
 	INTERNALDATE => [ NEED_SMSG, \&emit_internaldate ],
 );
@@ -494,10 +501,6 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
-	$$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy
-
-	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
-	$$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s;
 	fetch_run_ops($self, $self->{uid_base}, $smsg, $bref, $ops, $partial);
 	requeue_once($self);
 }
@@ -554,6 +557,18 @@ sub emit_body {
 # set $eml once ($_[4] == $eml, $_[3] == $bref)
 sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) }
 
+# s/From / fixes old bug from import (pre-a0c07cba0e5d8b6a)
+sub to_crlf_full {
+	${$_[0]} =~ s/(?<!\r)\n/\r\n/sg;
+	${$_[0]} =~ s/\A[\r\n]*From [^\r\n]*\r\n//s;
+}
+
+sub op_crlf_bref { to_crlf_full($_[3]) }
+
+sub op_crlf_hdr { to_crlf_full($_[4]->{hdr}) }
+
+sub op_crlf_bdy { ${$_[4]->{bdy}} =~ s/(?<!\r)\n/\r\n/sg if $_[4]->{bdy} }
+
 sub uid_clamp ($$$) {
 	my ($self, $beg, $end) = @_;
 	my $uid_min = $self->{uid_base} + 1;
@@ -790,8 +805,8 @@ sub partial_hdr_get {
 	join('', ($str =~ m/($hdrs_re)/g), "\r\n");
 }
 
-sub partial_prepare ($$$) {
-	my ($partial, $want, $att) = @_;
+sub partial_prepare ($$$$) {
+	my ($need, $partial, $want, $att) = @_;
 
 	# recombine [ "BODY[1.HEADER.FIELDS", "(foo", "bar)]" ]
 	# back to: "BODY[1.HEADER.FIELDS (foo bar)]"
@@ -804,6 +819,7 @@ sub partial_prepare ($$$) {
 			(?:\.(HEADER|MIME|TEXT))? # 2 - section_name
 			\](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 3, 4
 		$partial->{$att} = [ \&partial_body, $1, $2, $3, $4 ];
+		$$need |= CRLF_BREF|EML_HDR|EML_BDY;
 	} elsif ($att =~ /\ABODY\[(?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx
 				(?:HEADER\.FIELDS(\.NOT)?)\x20 # 2
 				\(([A-Z0-9\-\x20]+)\) # 3 - hdrs
@@ -812,6 +828,7 @@ sub partial_prepare ($$$) {
 						: \&partial_hdr_get,
 						$1, undef, $4, $5 ];
 		$tmp->[2] = hdrs_regexp($3);
+		$$need |= CRLF_HDR|EML_HDR;
 	} else {
 		undef;
 	}
@@ -857,16 +874,9 @@ sub fetch_compile ($) {
 			while (my ($k, $fl_cb) = each %$x) {
 				next if $seen{$k}++;
 				$need |= $fl_cb->[0];
-
-				# insert a special op to convert $bref to $eml
-				# the first time we need it
-				if ($need == NEED_EML && !$seen{$need}++) {
-					push @op, $OP_EML_NEW;
-				}
-				# $fl_cb = [ flags, \&emit_foo ]
-				push @op, [ @$fl_cb , $k ];
+				push @op, [ @$fl_cb, $k ];
 			}
-		} elsif (!partial_prepare(\%partial, $want, $att)) {
+		} elsif (!partial_prepare(\$need, \%partial, $want, $att)) {
 			return "BAD param: $att";
 		}
 	}
@@ -874,11 +884,25 @@ sub fetch_compile ($) {
 
 	# stabilize partial order for consistency and ease-of-debugging:
 	if (scalar keys %partial) {
-		$need = NEED_EML;
-		push @op, $OP_EML_NEW if !$seen{$need}++;
+		$need |= NEED_BLOB;
 		$r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ];
 	}
 
+	push @op, $OP_EML_NEW if ($need & (EML_HDR|EML_BDY));
+
+	# do we need CRLF conversion?
+	if ($need & CRLF_BREF) {
+		push @op, $OP_CRLF_BREF;
+	} elsif (my $crlf = ($need & (CRLF_HDR|CRLF_BDY))) {
+		if ($crlf == (CRLF_HDR|CRLF_BDY)) {
+			push @op, $OP_CRLF_BREF;
+		} elsif ($need & CRLF_HDR) {
+			push @op, $OP_CRLF_HDR;
+		} else {
+			push @op, $OP_CRLF_BDY;
+		}
+	}
+
 	$r[0] = $need & NEED_BLOB ? \&fetch_blob :
 		($need & NEED_SMSG ? \&fetch_smsg : \&fetch_uid);
 
diff --git a/t/imap.t b/t/imap.t
index 2401237c8a0..81b3d844e76 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -70,21 +70,26 @@ EOF
 {
 	my $partial_prepare = \&PublicInbox::IMAP::partial_prepare;
 	my $x = {};
-	my $r = $partial_prepare->($x, [], my $p = 'BODY[9]');
+	my $n = 0;
+	my $r = $partial_prepare->(\$n, $x, [], my $p = 'BODY[9]');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY[9]<5>');
+	$r = $partial_prepare->(\$n, $x, [], $p = 'BODY[9]<5>');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY[9]<5.1>');
+	$r = $partial_prepare->(\$n, $x, [], $p = 'BODY[9]<5.1>');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY[1.1]');
+	$r = $partial_prepare->(\$n, $x, [], $p = 'BODY[1.1]');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS (DATE FROM)]');
+	$r = $partial_prepare->(\$n, $x, [],
+				$p = 'BODY[HEADER.FIELDS (DATE FROM)]');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS.NOT (TO)]');
+	$r = $partial_prepare->(\$n, $x, [],
+				$p = 'BODY[HEADER.FIELDS.NOT (TO)]');
 	ok($r, $p);
-	$r = $partial_prepare->($x, [], $p = 'BODY[HEDDER.FIELDS.NOT (TO)]');
+	$r = $partial_prepare->(\$n, $x, [],
+				$p = 'BODY[HEDDER.FIELDS.NOT (TO)]');
 	ok(!$r, "rejected misspelling $p");
-	$r = $partial_prepare->($x, [], $p = 'BODY[1.1.HEADER.FIELDS (TO)]');
+	$r = $partial_prepare->(\$n, $x, [],
+				$p = 'BODY[1.1.HEADER.FIELDS (TO)]');
 	ok($r, $p);
 	my $partial_body = \&PublicInbox::IMAP::partial_body;
 	my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get;
@@ -111,12 +116,14 @@ EOF
 	my $fetch_compile = \&PublicInbox::IMAP::fetch_compile;
 	my ($cb, $ops, $partial) = $fetch_compile->(['BODY[]']);
 	is($partial, undef, 'no partial fetch data');
-	is_deeply($ops,
-		[ 'BODY[]', \&PublicInbox::IMAP::emit_rfc822 ],
-		'proper key and op compiled for BODY[]');
+	is_deeply($ops, [
+		undef, \&PublicInbox::IMAP::op_crlf_bref,
+		'BODY[]', \&PublicInbox::IMAP::emit_rfc822
+	], 'proper key and op compiled for BODY[]');
 
 	($cb, $ops, $partial) = $fetch_compile->(['BODY', 'BODY[]']);
 	is_deeply($ops, [
+		undef, \&PublicInbox::IMAP::op_crlf_bref,
 		'BODY[]', \&PublicInbox::IMAP::emit_rfc822,
 		undef, \&PublicInbox::IMAP::op_eml_new,
 		'BODY', \&PublicInbox::IMAP::emit_body,

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 73/82] imap: further speed up HEADER.FIELDS FETCH requests
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (71 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 72/82] imap: FETCH: more granular CRLF conversion Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 74/82] imap: FETCH: try to make fake MSNs sequentially Eric Wong
                   ` (9 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

Since headers are big and include a lot of lines MUAs don't
care about, we can skip the CRLF_HDR ops and just do the
CRLF conversion in partial_hdr_get and partial_hdr_not.
This is another 10-15% speedup for mutt w/o header caching.
---
 lib/PublicInbox/IMAP.pm | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 3cc68e66872..fe3c7d1a85b 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -792,6 +792,7 @@ sub partial_hdr_not {
 	}
 	my $str = $eml->header_obj->as_string;
 	$str =~ s/$hdrs_re//g;
+	$str =~ s/(?<!\r)\n/\r\n/sg;
 	$str .= "\r\n";
 }
 
@@ -802,7 +803,9 @@ sub partial_hdr_get {
 		$eml = eml_body_idx($eml, $section_idx) or return;
 	}
 	my $str = $eml->header_obj->as_string;
-	join('', ($str =~ m/($hdrs_re)/g), "\r\n");
+	$str = join('', ($str =~ m/($hdrs_re)/g));
+	$str =~ s/(?<!\r)\n/\r\n/sg;
+	$str .= "\r\n";
 }
 
 sub partial_prepare ($$$$) {
@@ -828,7 +831,11 @@ sub partial_prepare ($$$$) {
 						: \&partial_hdr_get,
 						$1, undef, $4, $5 ];
 		$tmp->[2] = hdrs_regexp($3);
-		$$need |= CRLF_HDR|EML_HDR;
+
+		# don't emit CRLF_HDR instruction, here, partial_hdr_*
+		# will do CRLF conversion with only the extracted result
+		# and not waste time converting lines we don't care about.
+		$$need |= EML_HDR;
 	} else {
 		undef;
 	}

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 74/82] imap: FETCH: try to make fake MSNs sequentially
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (72 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 73/82] imap: further speed up HEADER.FIELDS FETCH requests Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 75/82] imap: STATUS/EXAMINE: rely on SQLite overview Eric Wong
                   ` (8 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

This appears to significantly improve header caching behavior
with mutt.  With the current public-inbox.org/git mirror(*),
mutt will only re-FETCH the last ~300 or so messages in the
final "inbox.comp.version-control.git.7" mailbox, instead of
~49,000 messages every time.

It's not perfect, but a 500ms query is better than a >10s query
and mutt itself spends as much time loading its header cache.

(*) there are many gaps in NNTP article numbers (UIDs) due to
    spam removal from public-inbox-learn.
---
 lib/PublicInbox/IMAP.pm | 22 +++++++++++-----------
 t/imapd.t               |  4 ++--
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index fe3c7d1a85b..d0683530988 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -475,12 +475,12 @@ sub requeue_once ($) {
 	$self->requeue if $new_size == 1;
 }
 
-# my ($uid_base, $UID) = @_;
-sub fetch_msn_uid ($$) { '* '.($_[1] - $_[0]).' FETCH (UID '.$_[1] }
+# my ($msn, $UID) = @_;
+sub fetch_msn_uid ($$) { '* '.(${$_[0]}++).' FETCH (UID '.$_[1] }
 
 sub fetch_run_ops {
-	my ($self, $uid_base, $smsg, $bref, $ops, $partial) = @_;
-	$self->msg_more(fetch_msn_uid($uid_base, $smsg->{num}));
+	my ($self, $msn, $smsg, $bref, $ops, $partial) = @_;
+	$self->msg_more(fetch_msn_uid($msn, $smsg->{num}));
 	my ($eml, $k);
 	for (my $i = 0; $i < @$ops;) {
 		$k = $ops->[$i++];
@@ -492,7 +492,7 @@ sub fetch_run_ops {
 
 sub fetch_blob_cb { # called by git->cat_async via git_async_cat
 	my ($bref, $oid, $type, $size, $fetch_arg) = @_;
-	my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_arg;
+	my ($self, undef, $msgs, $range_info, $ops, $partial) = @$fetch_arg;
 	my $smsg = shift @$msgs or die 'BUG: no smsg';
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
@@ -501,7 +501,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
-	fetch_run_ops($self, $self->{uid_base}, $smsg, $bref, $ops, $partial);
+	fetch_run_ops($self, $range_info->[3], $smsg, $bref, $ops, $partial);
 	requeue_once($self);
 }
 
@@ -601,7 +601,8 @@ sub range_step ($$) {
 		return 'BAD fetch range';
 	}
 	uid_clamp($self, \$beg, \$end) if defined($range);
-	[ $beg, $end, $$range_csv ];
+	my $msn = $beg - $self->{uid_base};
+	[ $beg, $end, $$range_csv, \$msn ];
 }
 
 sub refill_range ($$$) {
@@ -638,8 +639,7 @@ sub fetch_smsg { # long_response
 			return;
 		}
 	}
-	my $uid_base = $self->{uid_base};
-	fetch_run_ops($self, $uid_base, $_, undef, $ops) for @$msgs;
+	fetch_run_ops($self, $range_info->[3], $_, undef, $ops) for @$msgs;
 	@$msgs = ();
 	1; # more
 }
@@ -665,10 +665,10 @@ sub fetch_uid { # long_response
 		}
 		# continue looping
 	}
-	my $uid_base = $self->{uid_base};
 	my ($i, $k);
+	my $msn = $range_info->[3];
 	for (@$uids) {
-		$self->msg_more(fetch_msn_uid($uid_base, $_));
+		$self->msg_more(fetch_msn_uid($msn, $_));
 		for ($i = 0; $i < @$ops;) {
 			$k = $ops->[$i++];
 			$ops->[$i++]->($self, $k);
diff --git a/t/imapd.t b/t/imapd.t
index c691e1a96a7..fdab074a249 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -349,8 +349,8 @@ is(scalar keys %$ret, 3, 'got all 3 messages');
 }
 my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@";
 is(scalar keys %$r2, 2, 'did not get all 3 messages');
-is($r2->{2}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged');
-is($r2->{3}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged');
+is($r2->{1}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged');
+is($r2->{2}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged');
 $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]')
 			or BAIL_OUT "FETCH $@";
 is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 75/82] imap: STATUS/EXAMINE: rely on SQLite overview
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (73 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 74/82] imap: FETCH: try to make fake MSNs sequentially Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 76/82] imap: UID SEARCH: support multiple ranges Eric Wong
                   ` (7 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

We can get exact values for EXISTS, UIDNEXT using SQLite
rather than calculating off $ibx->mm->max ourselves.

Furthermore, $ibx->mm is less useful than $ibx->over for IMAP
(and for our read-only daemons in general) so do not depend on
$ibx->mm outside of startup/reload to save FDs and reduce kernel
page cache footprint.
---
 lib/PublicInbox/DummyInbox.pm | 11 ++++++-----
 lib/PublicInbox/IMAP.pm       | 30 ++++++++++++++++++------------
 lib/PublicInbox/IMAPD.pm      |  2 +-
 lib/PublicInbox/Over.pm       | 30 ++++++++++++++++++++++++++++++
 t/over.t                      |  3 +++
 5 files changed, 58 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/DummyInbox.pm b/lib/PublicInbox/DummyInbox.pm
index b6c48db1a0e..c3f4f5c6674 100644
--- a/lib/PublicInbox/DummyInbox.pm
+++ b/lib/PublicInbox/DummyInbox.pm
@@ -9,13 +9,14 @@ use strict;
 
 sub created_at { 0 } # Msgmap::created_at
 sub mm { shift }
-sub max { undef } # Msgmap::max
-sub msg_range { [] } # Msgmap::msg_range
+sub uid_range { [] } # Over::uid_range
+sub subscribe_unlock { undef };
 
 no warnings 'once';
-*uid_range = *query_xover = \&msg_range;
+*max = \&created_at;
+*query_xover = \&uid_range;
 *over = \&mm;
-*subscribe_unlock = *unsubscribe_unlock =
-	*get_art = *description = *base_url = \&max;
+*unsubscribe_unlock =
+	*get_art = *description = *base_url = \&subscribe_unlock;
 
 1;
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index d0683530988..41c8066446c 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -183,7 +183,7 @@ sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" }
 # called by PublicInbox::InboxIdle
 sub on_inbox_unlock {
 	my ($self, $ibx) = @_;
-	my $new = $ibx->mm->max;
+	my $new = $ibx->over->max;
 	my $uid_base = $self->{uid_base} // 0;
 	my $uid_end = $uid_base + UID_BLOCK;
 	defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset';
@@ -223,7 +223,7 @@ sub cmd_idle ($$) {
 	# IDLE seems allowed by dovecot w/o a mailbox selected *shrug*
 	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
 	$self->{-idle_tag} = $tag;
-	my $max = $ibx->mm->max // 0;
+	my $max = $ibx->over->max;
 	my $uid_end = $self->{uid_base} + UID_BLOCK;
 	my $sock = $self->{sock} or return;
 	my $fd = fileno($sock);
@@ -283,14 +283,20 @@ sub inbox_lookup ($$) {
 		my $mb_top = $1;
 		$uid_base = $2 * UID_BLOCK;
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
-		$exists = $ibx->mm->max // 0;
-		ensure_ranges_exist($self->{imapd}, $ibx, $exists);
-		my $uid_end = $uid_base + UID_BLOCK;
-		$exists = $uid_end if $exists > $uid_end;
-		$uidnext = $exists + 1;
-		$exists -= $uid_base;
+		my $max;
+		($exists, $uidnext, $max) = $ibx->over->imap_status($uid_base,
+							$uid_base + UID_BLOCK);
+		ensure_ranges_exist($self->{imapd}, $ibx, $max);
 	} else { # check for dummy inboxes
-		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
+		$mailbox = lc $mailbox;
+		$ibx = $self->{imapd}->{mailboxes}->{$mailbox} or return;
+
+		# if "INBOX.foo.bar" is selected and "INBOX.foo.bar.0",
+		# check for new UID ranges (e.g. "INBOX.foo.bar.1")
+		if (my $z = $self->{imapd}->{mailboxes}->{"$mailbox.0"}) {
+			ensure_ranges_exist($self->{imapd}, $z, $z->over->max);
+		}
+
 		$uid_base = $exists = 0;
 		$uidnext = 1;
 	}
@@ -590,7 +596,7 @@ sub range_step ($$) {
 		($beg, $end) = ($1 + 0, $2 + 0);
 	} elsif ($range =~ /\A([0-9]+):\*\z/) {
 		$beg = $1 + 0;
-		$end = $self->{ibx}->mm->max // 0;
+		$end = $self->{ibx}->over->max;
 		my $uid_end = $self->{uid_base} + UID_BLOCK;
 		$end = $uid_end if $end > $uid_end;
 		$beg = $end if $beg > $end;
@@ -1093,14 +1099,14 @@ sub cmd_uid_search ($$$;) {
 	if (!scalar(keys %$q)) {
 		$self->msg_more('* SEARCH');
 		my $beg = 1;
-		my $end = $ibx->mm->max // 0;
+		my $end = $ibx->over->max;
 		uid_clamp($self, \$beg, \$end);
 		long_response($self, \&uid_search_uid_range,
 				$tag, \$beg, $end, $sql);
 	} elsif (my $uid = $q->{uid}) {
 		if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
 			my ($beg, $end) = ($1, $2);
-			$end = $ibx->mm->max if $end eq '*';
+			$end = $ibx->over->max if $end eq '*';
 			uid_clamp($self, \$beg, \$end);
 			$self->msg_more('* SEARCH');
 			long_response($self, \&uid_search_uid_range,
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index 186ec7b0062..e4dc90e8d0e 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -44,7 +44,7 @@ sub imapd_refresh_ibx { # pi_config->each_inbox cb
 	# this case is a 32-bit representation of the creation
 	# date/time of the mailbox"
 	defined($ibx->{uidvalidity} = $mm->created_at) or return;
-	PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 1);
+	PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 0);
 
 	# preload to avoid fragmentation:
 	$ibx->description;
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 1faeff418f4..e0f20ea6d3d 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -229,4 +229,34 @@ sub uid_range {
 	$dbh->selectcol_arrayref($q, undef, $beg, $end);
 }
 
+sub max {
+	my ($self) = @_;
+	my $sth = $self->connect->prepare_cached(<<'', undef, 1);
+SELECT MAX(num) FROM over WHERE num > 0
+
+	$sth->execute;
+	$sth->fetchrow_array // 0;
+}
+
+sub imap_status {
+	my ($self, $uid_base, $uid_end) = @_;
+	my $dbh = $self->connect;
+	my $sth = $dbh->prepare_cached(<<'', undef, 1);
+SELECT COUNT(num) FROM over WHERE num > ? AND num <= ?
+
+	$sth->execute($uid_base, $uid_end);
+	my $exists = $sth->fetchrow_array;
+
+	$sth = $dbh->prepare_cached(<<'', undef, 1);
+SELECT MAX(num) + 1 FROM over WHERE num <= ?
+
+	$sth->execute($uid_end);
+	my $uidnext = $sth->fetchrow_array;
+
+	$sth = $dbh->prepare_cached(<<'', undef, 1);
+SELECT MAX(num) FROM over WHERE num > 0
+
+	($exists, $uidnext, $sth->fetchrow_array // 0);
+}
+
 1;
diff --git a/t/over.t b/t/over.t
index 5586aa310e6..734fdaa3604 100644
--- a/t/over.t
+++ b/t/over.t
@@ -10,6 +10,8 @@ use_ok 'PublicInbox::OverIdx';
 my ($tmpdir, $for_destroy) = tmpdir();
 my $over = PublicInbox::OverIdx->new("$tmpdir/over.sqlite3");
 $over->connect;
+is($over->max, 0, 'max is zero on new DB (scalar context)');
+is_deeply([$over->max], [0], 'max is zero on new DB (list context)');
 my $x = $over->next_tid;
 is(int($x), $x, 'integer tid');
 my $y = $over->next_tid;
@@ -58,6 +60,7 @@ foreach my $mid (qw(a b)) {
 	my $msgs = [ map { $_->{num} } @{$over->get_thread($mid)} ];
 	is_deeply([98, 99], $msgs, "linked messages by Message-ID: <$mid>");
 }
+isnt($over->max, 0, 'max is non-zero');
 
 $over->rollback_lazy;
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 76/82] imap: UID SEARCH: support multiple ranges
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (74 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 75/82] imap: STATUS/EXAMINE: rely on SQLite overview Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 77/82] imap: wire up Xapian, MSN SEARCH and multi sequence-sets Eric Wong
                   ` (6 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

We can share a bit of code with FETCH to refill UID
ranges which hit the SQLite overview.
---
 lib/PublicInbox/IMAP.pm | 84 +++++++++++++++++------------------------
 1 file changed, 35 insertions(+), 49 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 41c8066446c..f6106a1e806 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -650,26 +650,31 @@ sub fetch_smsg { # long_response
 	1; # more
 }
 
-sub fetch_uid { # long_response
-	my ($self, $tag, $uids, $range_info, $ops) = @_;
-
-	while (!@$uids) { # rare
-		my ($beg, $end, $range_csv) = @$range_info;
-		if (scalar(@$uids = @{$self->{ibx}->over->
-					uid_range($beg, $end)})) {
-			$range_info->[0] = $uids->[-1] + 1;
-		} elsif (!$range_csv) {
-			$self->write(\"$tag OK Fetch done\r\n");
+sub refill_uids ($$$;$) {
+	my ($self, $uids, $range_info, $sql) = @_;
+	my ($beg, $end, $range_csv) = @$range_info;
+	my $over = $self->{ibx}->over;
+	while (1) {
+		if (scalar(@$uids = @{$over->uid_range($beg, $end, $sql)})) {
+			$range_info->[0] = $uids->[-1] + 1; # update $beg
 			return;
+		} elsif (!$range_csv) {
+			return 0;
 		} else {
 			my $next_range = range_step($self, \$range_csv);
-			if (!ref($next_range)) { # error
-				$self->write(\"$tag $next_range\r\n");
-				return;
-			}
-			@$range_info = @$next_range;
+			return $next_range if !ref($next_range); # error
+			($beg, $end, $range_csv) = @$range_info = @$next_range;
+			# continue looping
 		}
-		# continue looping
+	}
+}
+
+sub fetch_uid { # long_response
+	my ($self, $tag, $uids, $range_info, $ops) = @_;
+	if (defined(my $err = refill_uids($self, $uids, $range_info))) {
+		$err ||= 'OK Fetch done';
+		$self->write("$tag $err\r\n");
+		return;
 	}
 	my ($i, $k);
 	my $msn = $range_info->[3];
@@ -967,15 +972,15 @@ sub parse_date ($) { # 02-Oct-1993
 }
 
 sub uid_search_uid_range { # long_response
-	my ($self, $tag, $beg, $end, $sql) = @_;
-	my $uids = $self->{ibx}->over->uid_range($$beg, $end, $sql);
-	if (@$uids) {
-		$$beg = $uids->[-1] + 1;
-		$self->msg_more(join(' ', '', @$uids));
-	} else {
-		$self->write(\"\r\n$tag OK Search done\r\n");
-		undef;
+	my ($self, $tag, $uids, $sql, $range_info) = @_;
+	if (defined(my $err = refill_uids($self, $uids, $range_info, $sql))) {
+		$err ||= 'OK Search done';
+		$self->write("\r\n$tag $err\r\n");
+		return;
 	}
+	$self->msg_more(join(' ', '', @$uids));
+	@$uids = ();
+	1; # more
 }
 
 sub date_search {
@@ -1079,12 +1084,8 @@ sub parse_query {
 		return 'BAD Xapian not configured for mailbox';
 	}
 
-	if (my $uid = $q->{uid}) {
-		((@$uid > 1) || $uid->[0] =~ /,/) and
-			return 'BAD multiple ranges not supported, yet';
-		($q->{sql} // $q->{xap}) and
-			return 'BAD ranges and queries do not mix, yet';
-		$q->{uid} = join(',', @$uid); # TODO: multiple ranges
+	if (my $uid = delete $q->{uid}) {
+		$q->{uid} = join(',', @$uid);
 	}
 	$q;
 }
@@ -1095,28 +1096,13 @@ sub cmd_uid_search ($$$;) {
 	my $q = parse_query($self, \@_);
 	return "$tag $q\r\n" if !ref($q);
 	my $sql = delete $q->{sql};
-
+	my $range_csv = delete $q->{uid} // '1:*';
+	my $range_info = range_step($self, \$range_csv);
+	return "$tag $range_info\r\n" if !ref($range_info);
 	if (!scalar(keys %$q)) {
 		$self->msg_more('* SEARCH');
-		my $beg = 1;
-		my $end = $ibx->over->max;
-		uid_clamp($self, \$beg, \$end);
 		long_response($self, \&uid_search_uid_range,
-				$tag, \$beg, $end, $sql);
-	} elsif (my $uid = $q->{uid}) {
-		if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) {
-			my ($beg, $end) = ($1, $2);
-			$end = $ibx->over->max if $end eq '*';
-			uid_clamp($self, \$beg, \$end);
-			$self->msg_more('* SEARCH');
-			long_response($self, \&uid_search_uid_range,
-					$tag, \$beg, $end, $sql);
-		} elsif ($uid =~ /\A[0-9]+\z/s) {
-			$uid = $ibx->over->get_art($uid) ? " $uid" : '';
-			"* SEARCH$uid\r\n$tag OK Search done\r\n";
-		} else {
-			"$tag BAD Error\r\n";
-		}
+				$tag, [], $sql, $range_info);
 	} else {
 		"$tag BAD Error\r\n";
 	}

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 77/82] imap: wire up Xapian, MSN SEARCH and multi sequence-sets
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (75 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 76/82] imap: UID SEARCH: support multiple ranges Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 78/82] imap: misc cleanups and notes Eric Wong
                   ` (5 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

Simple queries work, more complex queries involving parentheses,
"OR", "NOT" don't work, yet.

Tested with "=b", "=B", and "=H" search and limits in mutt
on both v1 and v2 with multiple Xapian shards.
---
 lib/PublicInbox/IMAP.pm   | 142 ++++++++++++++++++++++++++++++--------
 lib/PublicInbox/Search.pm |  19 ++++-
 t/imapd.t                 |  46 +++++++++++-
 3 files changed, 175 insertions(+), 32 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index f6106a1e806..67bc32ab59b 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -24,6 +24,8 @@ use Errno qw(EAGAIN);
 use Time::Local qw(timegm);
 use POSIX qw(strftime);
 use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
+use PublicInbox::Search;
+*mdocid = \&PublicInbox::Search::mdocid;
 
 my $Address;
 for my $mod (qw(Email::Address::XS Mail::Address)) {
@@ -592,22 +594,26 @@ sub range_step ($$) {
 		$range = $$range_csv;
 		$$range_csv = undef;
 	}
+	my $uid_base = $self->{uid_base};
+	my $uid_end = $uid_base + UID_BLOCK;
 	if ($range =~ /\A([0-9]+):([0-9]+)\z/) {
 		($beg, $end) = ($1 + 0, $2 + 0);
+		uid_clamp($self, \$beg, \$end);
 	} elsif ($range =~ /\A([0-9]+):\*\z/) {
 		$beg = $1 + 0;
 		$end = $self->{ibx}->over->max;
-		my $uid_end = $self->{uid_base} + UID_BLOCK;
 		$end = $uid_end if $end > $uid_end;
 		$beg = $end if $beg > $end;
+		uid_clamp($self, \$beg, \$end);
 	} elsif ($range =~ /\A[0-9]+\z/) {
 		$beg = $end = $range + 0;
-		undef $range;
+		# just let the caller do an out-of-range query if a single
+		# UID is out-of-range
+		++$beg if ($beg <= $uid_base || $end > $uid_end);
 	} else {
 		return 'BAD fetch range';
 	}
-	uid_clamp($self, \$beg, \$end) if defined($range);
-	my $msn = $beg - $self->{uid_base};
+	my $msn = $beg - $uid_base;
 	[ $beg, $end, $$range_csv, \$msn ];
 }
 
@@ -971,15 +977,22 @@ sub parse_date ($) { # 02-Oct-1993
 	timegm(0, 0, 0, $dd, $mm, $yyyy);
 }
 
-sub uid_search_uid_range { # long_response
-	my ($self, $tag, $uids, $sql, $range_info) = @_;
+sub msn_convert ($$) {
+	my ($self, $uids) = @_;
+	my $adj = $self->{uid_base};
+	$_ -= $adj for @$uids;
+}
+
+sub search_uid_range { # long_response
+	my ($self, $tag, $sql, $range_info, $want_msn) = @_;
+	my $uids = [];
 	if (defined(my $err = refill_uids($self, $uids, $range_info, $sql))) {
 		$err ||= 'OK Search done';
 		$self->write("\r\n$tag $err\r\n");
 		return;
 	}
+	msn_convert($self, $uids) if $want_msn;
 	$self->msg_more(join(' ', '', @$uids));
-	@$uids = ();
 	1; # more
 }
 
@@ -1029,6 +1042,21 @@ my %I2X = (
 	# KEYWORD # TODO ? dfpre,dfpost,...
 );
 
+# IMAP allows searching arbitrary headers via "HEADER $HDR_NAME $HDR_VAL"
+# which gets silly expensive.  We only allow the headers we already index.
+my %H2X = (%I2X, 'MESSAGE-ID' => 'm:', 'LIST-ID' => 'l:');
+
+sub xap_append ($$$$) {
+	my ($q, $rest, $k, $xk) = @_;
+	delete $q->{sql}; # can't use over.sqlite3
+	defined(my $arg = shift @$rest) or return "BAD $k no arg";
+
+	# AFAIK Xapian can't handle [*"] in probabilistic terms
+	$arg =~ tr/*"//d;
+	${$q->{xap}} .= qq[ $xk"$arg"];
+	undef;
+}
+
 sub parse_query {
 	my ($self, $rest) = @_;
 	if (uc($rest->[0]) eq 'CHARSET') {
@@ -1038,7 +1066,8 @@ sub parse_query {
 	}
 
 	my $sql = ''; # date conditions, {sql} deleted if Xapian is needed
-	my $q = { xap => '', sql => \$sql };
+	my $xap = '';
+	my $q = { sql => \$sql, xap => \$xap };
 	while (@$rest) {
 		my $k = uc(shift @$rest);
 		# default criteria
@@ -1059,17 +1088,18 @@ sub parse_query {
 			delete $q->{sql}; # can't use over.sqlite3
 			my $bytes = shift(@$rest) // '';
 			$bytes =~ /\A[0-9]+\z/ or return "BAD $k not a number";
-			$q->{xap} .= ' bytes:' . ($k eq 'SMALLER' ?
+			$xap .= ' bytes:' . ($k eq 'SMALLER' ?
 							'..'.(--$bytes) :
 							(++$bytes).'..');
+		} elsif ($k eq 'HEADER') {
+			$k = uc(shift(@$rest) // '');
+			my $xk = $H2X{$k} or
+				return "BAD HEADER $k not supported";
+			my $err = xap_append($q, $rest, $k, $xk);
+			return $err if $err;
 		} elsif (defined(my $xk = $I2X{$k})) {
-			delete $q->{sql}; # can't use over.sqlite3
-			my $arg = shift @$rest;
-			defined($arg) or return "BAD $k no arg";
-
-			# Xapian can't handle [*"] in probabilistic terms
-			$arg =~ tr/*"//d;
-			$q->{xap} .= qq[ $xk:"$arg"];
+			my $err = xap_append($q, $rest, $k, $xk);
+			return $err if $err;
 		} else {
 			# TODO: parentheses, OR, NOT ...
 			return "BAD $k not supported (yet?)";
@@ -1083,31 +1113,87 @@ sub parse_query {
 	} elsif (!$self->{ibx}->search) {
 		return 'BAD Xapian not configured for mailbox';
 	}
-
+	my $max = $self->{ibx}->over->max;
 	if (my $uid = delete $q->{uid}) {
-		$q->{uid} = join(',', @$uid);
+		my $range_csv = join(',', @$uid);
+		do {
+			my $nxt = range_step($self, \$range_csv);
+			my ($beg, $end) = @$nxt;
+			if ($xap) {
+				$xap .= " uid:$beg..$end";
+			} elsif ($beg == $end) {
+				$sql .= " AND num = $beg";
+			} else {
+				$sql .= " AND num >= $beg AND num <= $end";
+			}
+		} while ($range_csv);
 	}
+	my $beg = 1;
+	uid_clamp($self, \$beg, \$max);
+	$q->{range_info} = [ $beg, $max ];
 	$q;
 }
 
-sub cmd_uid_search ($$$;) {
-	my ($self, $tag) = splice(@_, 0, 2);
+sub refill_xap ($$$$) {
+	my ($self, $uids, $range_info, $q) = @_;
+	my ($beg, $end) = @$range_info;
+	my $srch = $self->{ibx}->search;
+	my $opt = { mset => 2, limit => 1000 };
+	my $nshard = $srch->{nshard} // 1;
+	while (1) {
+		my $mset = $srch->query("$$q uid:$beg..$end", $opt);
+		@$uids = map { mdocid($nshard, $_) } $mset->items;
+		if (@$uids) {
+			$range_info->[0] = $uids->[-1] + 1; # update $beg
+			return;
+		} else { # all done
+			return 0;
+		}
+	}
+}
+
+sub search_xap_range { # long_response
+	my ($self, $tag, $q, $range_info, $want_msn) = @_;
+	my $uids = [];
+	if (defined(my $err = refill_xap($self, $uids, $range_info, $q))) {
+		$err ||= 'OK Search done';
+		$self->write("\r\n$tag $err\r\n");
+		return;
+	}
+	msn_convert($self, $uids) if $want_msn;
+	$self->msg_more(join(' ', '', @$uids));
+	1; # more
+}
+
+sub search_common {
+	my ($self, $tag, $rest, $want_msn) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
-	my $q = parse_query($self, \@_);
+	my $q = parse_query($self, $rest);
 	return "$tag $q\r\n" if !ref($q);
-	my $sql = delete $q->{sql};
-	my $range_csv = delete $q->{uid} // '1:*';
-	my $range_info = range_step($self, \$range_csv);
-	return "$tag $range_info\r\n" if !ref($range_info);
-	if (!scalar(keys %$q)) {
+	my ($sql, $range_info) = delete @$q{qw(sql range_info)};
+	if (!scalar(keys %$q)) { # overview.sqlite3
+		$self->msg_more('* SEARCH');
+		long_response($self, \&search_uid_range,
+				$tag, $sql, $range_info, $want_msn);
+	} elsif ($q = $q->{xap}) {
 		$self->msg_more('* SEARCH');
-		long_response($self, \&uid_search_uid_range,
-				$tag, [], $sql, $range_info);
+		long_response($self, \&search_xap_range,
+				$tag, $q, $range_info, $want_msn);
 	} else {
 		"$tag BAD Error\r\n";
 	}
 }
 
+sub cmd_uid_search ($$$;) {
+	my ($self, $tag) = splice(@_, 0, 2);
+	search_common($self, $tag, \@_);
+}
+
+sub cmd_search ($$$;) {
+	my ($self, $tag) = splice(@_, 0, 2);
+	search_common($self, $tag, \@_, 1);
+}
+
 sub args_ok ($$) { # duplicated from PublicInbox::NNTP
 	my ($cb, $argc) = @_;
 	my $tot = prototype $cb;
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index c54cf7b9911..55eee41ca4a 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -182,6 +182,7 @@ sub _xdb ($) {
 	my ($xdb, $slow_phrase);
 	my $qpf = \($self->{qp_flags} ||= $QP_FLAGS);
 	if ($self->{ibx_ver} >= 2) {
+		my $n = 0;
 		foreach my $shard (<$dir/*>) {
 			-d $shard && $shard =~ m!/[0-9]+\z! or next;
 			my $sub = $X{Database}->new($shard);
@@ -191,7 +192,9 @@ sub _xdb ($) {
 				$xdb = $sub;
 			}
 			$slow_phrase ||= -f "$shard/iamchert";
+			++$n;
 		}
+		$self->{nshard} = $n;
 	} else {
 		$slow_phrase = -f "$dir/iamchert";
 		$xdb = $X{Database}->new($dir);
@@ -200,6 +203,15 @@ sub _xdb ($) {
 	$xdb;
 }
 
+# v2 Xapian docids don't conflict, so they're identical to
+# NNTP article numbers and IMAP UIDs.
+# https://trac.xapian.org/wiki/FAQ/MultiDatabaseDocumentID
+sub mdocid {
+	my ($nshard, $mitem) = @_;
+	my $docid = $mitem->get_docid;
+	int(($docid - 1) / $nshard) + 1;
+}
+
 sub xdb ($) {
 	my ($self) = @_;
 	$self->{xdb} ||= do {
@@ -283,7 +295,7 @@ sub _enquire_once { # retry_reopen callback
 	$enquire->set_query($query);
 	$opts ||= {};
         my $desc = !$opts->{asc};
-	if (($opts->{mset} || 0) == 2) {
+	if (($opts->{mset} || 0) == 2) { # mset == 2: ORDER BY docid/UID
 		$enquire->set_docid_order($ENQ_ASCENDING);
 		$enquire->set_weighting_scheme($X{BoolWeight}->new);
 	} elsif ($opts->{relevance}) {
@@ -322,6 +334,11 @@ sub qp {
 	$qp->add_valuerangeprocessor($nvrp->new(YYYYMMDD, 'd:'));
 	$qp->add_valuerangeprocessor($nvrp->new(DT, 'dt:'));
 
+	# for IMAP, undocumented for WWW and may be split off go away
+	$qp->add_valuerangeprocessor($nvrp->new(BYTES, 'bytes:'));
+	$qp->add_valuerangeprocessor($nvrp->new(TS, 'ts:'));
+	$qp->add_valuerangeprocessor($nvrp->new(UID, 'uid:'));
+
 	while (my ($name, $prefix) = each %bool_pfx_external) {
 		$qp->add_boolean_prefix($name, $_) foreach split(/ /, $prefix);
 	}
diff --git a/t/imapd.t b/t/imapd.t
index fdab074a249..e3cce2d30f0 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -19,10 +19,10 @@ if ($can_compress) { # hope this gets fixed upstream, soon
 require_ok 'PublicInbox::IMAP';
 my $first_range = '0';
 
-my $level = '-Lbasic';
+my $level = 'basic';
 SKIP: {
 	require_mods('Search::Xapian', 1);
-	$level = '-Lmedium';
+	$level = 'medium';
 };
 
 my @V = (1);
@@ -38,7 +38,7 @@ for my $V (@V) {
 	my $url = "http://example.com/i$V";
 	my $inboxdir = "$tmpdir/$name";
 	my $folder = "inbox.i$V";
-	my $cmd = ['-init', "-V$V", $level, $name, $inboxdir, $url, $addr];
+	my $cmd = ['-init', "-V$V", "-L$level", $name, $inboxdir, $url, $addr];
 	run_script($cmd) or BAIL_OUT("init $name");
 	xsys(qw(git config), "--file=$ENV{HOME}/.public-inbox/config",
 			"publicinbox.$name.newsgroup", $folder) == 0 or
@@ -120,6 +120,24 @@ is_deeply($ret, [ 1 ], 'search UID 1:1 works');
 $ret = $mic->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@";
 is_deeply($ret, [ 1 ], 'search UID 1:* works');
 
+SKIP: {
+	skip 'Xapian missing', 6 if $level eq 'basic';
+	my $x = $mic->search(qw(smaller 99999));
+	is_deeply($x, [1], 'SMALLER works with Xapian (hit)');
+	$x = $mic->search(qw(smaller 9));
+	is_deeply($x, [], 'SMALLER works with Xapian (miss)');
+
+	$x = $mic->search(qw(larger 99999));
+	is_deeply($x, [], 'LARGER works with Xapian (miss)');
+	$x = $mic->search(qw(larger 9));
+	is_deeply($x, [1], 'LARGER works with Xapian (hit)');
+
+	$x = $mic->search(qw(HEADER Message-ID testmessage@example.com));
+	is_deeply($x, [1], 'HEADER Message-ID works');
+	$x = $mic->search(qw(HEADER Message-ID miss));
+	is_deeply($x, [], 'HEADER Message-ID can miss');
+}
+
 is_deeply(scalar $mic->flags('1'), [], '->flags works');
 {
 	# RFC 3501 section 6.4.8 states:
@@ -341,12 +359,34 @@ $ret = $mic->fetch_hash('1,2:3', 'RFC822') or BAIL_OUT "FETCH $@";
 is(scalar keys %$ret, 3, 'got all 3 messages with comma-separated sequence');
 $ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@";
 is(scalar keys %$ret, 3, 'got all 3 messages');
+
+SKIP: {
+	# do any clients use non-UID IMAP SEARCH?
+	skip 'Xapian missing', 2 if $level eq 'basic';
+	my $x = $mic->search('all');
+	is_deeply($x, [1, 2, 3], 'MSN SEARCH works before rm');
+	$x = $mic->search(qw(header subject embedded));
+	is_deeply($x, [2], 'MSN SEARCH on Subject works before rm');
+}
+
 {
 	my $rdr = { 0 => \($ret->{1}->{RFC822}) };
 	my $env = { HOME => $ENV{HOME} };
 	my @cmd = qw(-learn rm --all);
 	run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm');
 }
+
+SKIP: {
+	# do any clients use non-UID IMAP SEARCH?  We only ensure
+	# MSN "SEARCH" can return a result which can be retrieved
+	# via MSN "FETCH"
+	skip 'Xapian missing', 3 if $level eq 'basic';
+	my $x = $mic->search(qw(header subject embedded));
+	is(scalar(@$x), 1, 'MSN SEARCH on Subject works after rm');
+	$x = $mic->message_string($x->[0]);
+	is($x, $ret->{2}->{RFC822}, 'message 2 unchanged');
+}
+
 my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@";
 is(scalar keys %$r2, 2, 'did not get all 3 messages');
 is($r2->{1}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 78/82] imap: misc cleanups and notes
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (76 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 77/82] imap: wire up Xapian, MSN SEARCH and multi sequence-sets Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 79/82] imapd: don't bother sorting LIST output Eric Wong
                   ` (4 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

Note some of our limitations for potential hackers.

We'll be renaming "UID_BLOCK" to "UID_SLICE", since "block" is
overused term and "slice" isn't used in our codebase.  Also,
document how "slice" and "epochs" are similar concepts for
different clients.
---
 lib/PublicInbox/IMAP.pm  | 38 +++++++++++++++++++++++++-------------
 lib/PublicInbox/IMAPD.pm |  2 +-
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 67bc32ab59b..1ba3a3ff1bb 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -7,7 +7,16 @@
 # slow storage.
 #
 # data notes:
-# * NNTP article numbers are UIDs
+#
+# * NNTP article numbers are UIDs, mm->created_at is UIDVALIDITY
+#
+# * public-inboxes are sliced into mailboxes of 50K messages
+#   to not overload MUAs: $NEWSGROUP_NAME.$SLICE_INDEX
+#   Slices are similar in concept to v2 "epochs".  Epochs
+#   are for the limitations of git clients, while slices are
+#   for the limitations of IMAP clients.
+#
+# * sequence numbers are estimated based on slice
 
 package PublicInbox::IMAP;
 use strict;
@@ -37,7 +46,7 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address;
 sub LINE_MAX () { 8000 } # RFC 2683 3.2.1.5
 
 # changing this will cause grief for clients which cache
-sub UID_BLOCK () { 50_000 }
+sub UID_SLICE () { 50_000 }
 
 # these values area also used for sorting
 sub NEED_SMSG () { 1 }
@@ -187,7 +196,7 @@ sub on_inbox_unlock {
 	my ($self, $ibx) = @_;
 	my $new = $ibx->over->max;
 	my $uid_base = $self->{uid_base} // 0;
-	my $uid_end = $uid_base + UID_BLOCK;
+	my $uid_end = $uid_base + UID_SLICE;
 	defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset';
 	$new = $uid_end if $new > $uid_end;
 	if ($new > $old) {
@@ -226,7 +235,7 @@ sub cmd_idle ($$) {
 	my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n";
 	$self->{-idle_tag} = $tag;
 	my $max = $ibx->over->max;
-	my $uid_end = $self->{uid_base} + UID_BLOCK;
+	my $uid_end = $self->{uid_base} + UID_SLICE;
 	my $sock = $self->{sock} or return;
 	my $fd = fileno($sock);
 	# only do inotify on most recent slice
@@ -260,12 +269,12 @@ sub cmd_done ($$) {
 	"$idle_tag OK Idle done\r\n";
 }
 
-sub ensure_ranges_exist ($$$) {
+sub ensure_slices_exist ($$$) {
 	my ($imapd, $ibx, $max) = @_;
 	defined(my $mb_top = $ibx->{newsgroup}) or return;
 	my $mailboxes = $imapd->{mailboxes};
 	my @created;
-	for (my $i = int($max/UID_BLOCK); $i >= 0; --$i) {
+	for (my $i = int($max/UID_SLICE); $i >= 0; --$i) {
 		my $sub_mailbox = "$mb_top.$i";
 		last if exists $mailboxes->{$sub_mailbox};
 		$mailboxes->{$sub_mailbox} = $ibx;
@@ -281,14 +290,14 @@ sub inbox_lookup ($$) {
 	my ($self, $mailbox) = @_;
 	my ($ibx, $exists, $uidnext, $uid_base);
 	if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) {
-		# old mail: inbox.comp.foo.$uid_block_idx
+		# old mail: inbox.comp.foo.$SLICE_IDX
 		my $mb_top = $1;
-		$uid_base = $2 * UID_BLOCK;
+		$uid_base = $2 * UID_SLICE;
 		$ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return;
 		my $max;
 		($exists, $uidnext, $max) = $ibx->over->imap_status($uid_base,
-							$uid_base + UID_BLOCK);
-		ensure_ranges_exist($self->{imapd}, $ibx, $max);
+							$uid_base + UID_SLICE);
+		ensure_slices_exist($self->{imapd}, $ibx, $max);
 	} else { # check for dummy inboxes
 		$mailbox = lc $mailbox;
 		$ibx = $self->{imapd}->{mailboxes}->{$mailbox} or return;
@@ -296,7 +305,7 @@ sub inbox_lookup ($$) {
 		# if "INBOX.foo.bar" is selected and "INBOX.foo.bar.0",
 		# check for new UID ranges (e.g. "INBOX.foo.bar.1")
 		if (my $z = $self->{imapd}->{mailboxes}->{"$mailbox.0"}) {
-			ensure_ranges_exist($self->{imapd}, $z, $z->over->max);
+			ensure_slices_exist($self->{imapd}, $z, $z->over->max);
 		}
 
 		$uid_base = $exists = 0;
@@ -580,7 +589,7 @@ sub op_crlf_bdy { ${$_[4]->{bdy}} =~ s/(?<!\r)\n/\r\n/sg if $_[4]->{bdy} }
 sub uid_clamp ($$$) {
 	my ($self, $beg, $end) = @_;
 	my $uid_min = $self->{uid_base} + 1;
-	my $uid_end = $uid_min + UID_BLOCK - 1;
+	my $uid_end = $uid_min + UID_SLICE - 1;
 	$$beg = $uid_min if $$beg < $uid_min;
 	$$end = $uid_end if $$end > $uid_end;
 }
@@ -595,7 +604,7 @@ sub range_step ($$) {
 		$$range_csv = undef;
 	}
 	my $uid_base = $self->{uid_base};
-	my $uid_end = $uid_base + UID_BLOCK;
+	my $uid_end = $uid_base + UID_SLICE;
 	if ($range =~ /\A([0-9]+):([0-9]+)\z/) {
 		($beg, $end) = ($1 + 0, $2 + 0);
 		uid_clamp($self, \$beg, \$end);
@@ -1206,6 +1215,9 @@ sub args_ok ($$) { # duplicated from PublicInbox::NNTP
 # returns 1 if we can continue, 0 if not due to buffered writes or disconnect
 sub process_line ($$) {
 	my ($self, $l) = @_;
+
+	# TODO: IMAP allows literals for big requests to upload messages
+	# (which we don't support) but maybe some big search queries use it.
 	my ($tag, $req, @args) = parse_line('[ \t]+', 0, $l);
 	pop(@args) if (@args && !defined($args[-1]));
 	if (@args && uc($req) eq 'UID') {
diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index e4dc90e8d0e..fe50de0f7cd 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -44,7 +44,7 @@ sub imapd_refresh_ibx { # pi_config->each_inbox cb
 	# this case is a 32-bit representation of the creation
 	# date/time of the mailbox"
 	defined($ibx->{uidvalidity} = $mm->created_at) or return;
-	PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 0);
+	PublicInbox::IMAP::ensure_slices_exist($imapd, $ibx, $mm->max // 0);
 
 	# preload to avoid fragmentation:
 	$ibx->description;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 79/82] imapd: don't bother sorting LIST output
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (77 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 78/82] imap: misc cleanups and notes Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 80/82] imap: remove non-UID SEARCH for now Eric Wong
                   ` (3 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

The sort was unstable on my test instance anyways, and
clients don't seem to mind.  So stop wasting CPU cycles.
---
 lib/PublicInbox/IMAPD.pm | 5 -----
 t/imapd.t                | 4 +++-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm
index fe50de0f7cd..09bedf5ca9b 100644
--- a/lib/PublicInbox/IMAPD.pm
+++ b/lib/PublicInbox/IMAPD.pm
@@ -74,11 +74,6 @@ sub imapd_refresh_finalize {
 			my $u = $_; # capitalize "INBOX" for user-familiarity
 			$u =~ s/\Ainbox(\.|\z)/INBOX$1/i;
 			qq[* LIST (\\Has${no}Children) "." $u\r\n]
-		} sort {
-			# shortest names first, alphabetically if lengths match
-			length($a) == length($b) ?
-				($a cmp $b) :
-				(length($a) <=> length($b))
 		} keys %$mailboxes
 	];
 	$imapd->{pi_config} = $pi_config;
diff --git a/t/imapd.t b/t/imapd.t
index e3cce2d30f0..c7e0baaa10b 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -405,7 +405,9 @@ is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},
 	pop @new_list;
 	pop @orig_list;
 	# TODO: not sure if sort order matters, imapd_refresh_finalize
-	# sorts, for now, but maybe clients don't care...
+	# doesn't sort, hopefully clients don't care...
+	@new_list = sort @new_list;
+	@orig_list = sort @orig_list;
 	is_deeply(\@new_list, \@orig_list, 'LIST identical');
 }
 ok($mic->close, 'CLOSE works');

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 80/82] imap: remove non-UID SEARCH for now
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (78 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 79/82] imapd: don't bother sorting LIST output Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 81/82] over: uid_range: remove LIMIT Eric Wong
                   ` (2 subsequent siblings)
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

Supporting MSNs in long-lived connections beyond the lifetime of
a single request/response cycle is not scalable to a C10K
scenario.  It's probably not needed, since most clients seem to
use UIDs.

A somewhat efficient implementation I can come up uses
pack("S*" ...) (AKA "uint16_t mapping[50000]") has an overhead
of 100K per-client socket on a mailbox with 50K messages.  The
100K is a contiguous scalar, so it could be swapped out for
idle clients on most architectures if THP is disabled.

An alternative could be to use a tempfile as an allocator
partitioned into 100K chunks (or SQLite); but I'll only do that
if somebody presents a compelling case to support MSN SEARCH.
---
 lib/PublicInbox/IMAP.pm | 33 +++++++++------------------------
 t/imapd.t               | 37 -------------------------------------
 2 files changed, 9 insertions(+), 61 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 1ba3a3ff1bb..7efe5387646 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -986,21 +986,14 @@ sub parse_date ($) { # 02-Oct-1993
 	timegm(0, 0, 0, $dd, $mm, $yyyy);
 }
 
-sub msn_convert ($$) {
-	my ($self, $uids) = @_;
-	my $adj = $self->{uid_base};
-	$_ -= $adj for @$uids;
-}
-
 sub search_uid_range { # long_response
-	my ($self, $tag, $sql, $range_info, $want_msn) = @_;
+	my ($self, $tag, $sql, $range_info) = @_;
 	my $uids = [];
 	if (defined(my $err = refill_uids($self, $uids, $range_info, $sql))) {
 		$err ||= 'OK Search done';
 		$self->write("\r\n$tag $err\r\n");
 		return;
 	}
-	msn_convert($self, $uids) if $want_msn;
 	$self->msg_more(join(' ', '', @$uids));
 	1; # more
 }
@@ -1162,46 +1155,38 @@ sub refill_xap ($$$$) {
 }
 
 sub search_xap_range { # long_response
-	my ($self, $tag, $q, $range_info, $want_msn) = @_;
+	my ($self, $tag, $q, $range_info) = @_;
 	my $uids = [];
 	if (defined(my $err = refill_xap($self, $uids, $range_info, $q))) {
 		$err ||= 'OK Search done';
 		$self->write("\r\n$tag $err\r\n");
 		return;
 	}
-	msn_convert($self, $uids) if $want_msn;
 	$self->msg_more(join(' ', '', @$uids));
 	1; # more
 }
 
-sub search_common {
-	my ($self, $tag, $rest, $want_msn) = @_;
+sub cmd_uid_search ($$$;) {
+	my ($self, $tag) = splice(@_, 0, 2);
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
-	my $q = parse_query($self, $rest);
+	my $q = parse_query($self, \@_);
 	return "$tag $q\r\n" if !ref($q);
 	my ($sql, $range_info) = delete @$q{qw(sql range_info)};
 	if (!scalar(keys %$q)) { # overview.sqlite3
 		$self->msg_more('* SEARCH');
 		long_response($self, \&search_uid_range,
-				$tag, $sql, $range_info, $want_msn);
+				$tag, $sql, $range_info);
 	} elsif ($q = $q->{xap}) {
 		$self->msg_more('* SEARCH');
 		long_response($self, \&search_xap_range,
-				$tag, $q, $range_info, $want_msn);
+				$tag, $q, $range_info);
 	} else {
 		"$tag BAD Error\r\n";
 	}
 }
 
-sub cmd_uid_search ($$$;) {
-	my ($self, $tag) = splice(@_, 0, 2);
-	search_common($self, $tag, \@_);
-}
-
-sub cmd_search ($$$;) {
-	my ($self, $tag) = splice(@_, 0, 2);
-	search_common($self, $tag, \@_, 1);
-}
+# note: MSN SEARCH is NOT supported.  Do any widely-used MUAs
+# rely on MSNs from SEARCH results?  Let us know at meta@public-inbox.org
 
 sub args_ok ($$) { # duplicated from PublicInbox::NNTP
 	my ($cb, $argc) = @_;
diff --git a/t/imapd.t b/t/imapd.t
index c7e0baaa10b..f5ca8b7eb03 100644
--- a/t/imapd.t
+++ b/t/imapd.t
@@ -360,43 +360,6 @@ is(scalar keys %$ret, 3, 'got all 3 messages with comma-separated sequence');
 $ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@";
 is(scalar keys %$ret, 3, 'got all 3 messages');
 
-SKIP: {
-	# do any clients use non-UID IMAP SEARCH?
-	skip 'Xapian missing', 2 if $level eq 'basic';
-	my $x = $mic->search('all');
-	is_deeply($x, [1, 2, 3], 'MSN SEARCH works before rm');
-	$x = $mic->search(qw(header subject embedded));
-	is_deeply($x, [2], 'MSN SEARCH on Subject works before rm');
-}
-
-{
-	my $rdr = { 0 => \($ret->{1}->{RFC822}) };
-	my $env = { HOME => $ENV{HOME} };
-	my @cmd = qw(-learn rm --all);
-	run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm');
-}
-
-SKIP: {
-	# do any clients use non-UID IMAP SEARCH?  We only ensure
-	# MSN "SEARCH" can return a result which can be retrieved
-	# via MSN "FETCH"
-	skip 'Xapian missing', 3 if $level eq 'basic';
-	my $x = $mic->search(qw(header subject embedded));
-	is(scalar(@$x), 1, 'MSN SEARCH on Subject works after rm');
-	$x = $mic->message_string($x->[0]);
-	is($x, $ret->{2}->{RFC822}, 'message 2 unchanged');
-}
-
-my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@";
-is(scalar keys %$r2, 2, 'did not get all 3 messages');
-is($r2->{1}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged');
-is($r2->{2}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged');
-$r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]')
-			or BAIL_OUT "FETCH $@";
-is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},
-	'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n",
-	'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK');
-
 {
 	my @new_list = $mic->list;
 	# tag differs in [-1]

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 81/82] over: uid_range: remove LIMIT
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (79 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 80/82] imap: remove non-UID SEARCH for now Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-10  7:05 ` [PATCH 82/82] imap: FETCH: proper MSN => UID mapping for requests Eric Wong
  2020-06-12 23:49 ` [PATCH 83/82] imap: introduce memory-efficient uo2m mapping Eric Wong
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

The IMAP code already limits the range to UID_SLICE (50K),
so that's about 1.6MB of of IVs for an ephemeral allocation
that won't live beyond one iteration of the event loop.
---
 lib/PublicInbox/Over.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index e0f20ea6d3d..e32104f0cf8 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -216,7 +216,7 @@ SELECT num,ts,ds,ddd FROM over WHERE num = ? LIMIT 1
 	load_from_row($smsg);
 }
 
-# IMAP search
+# IMAP search, this is limited by callers to UID_SLICE size (50K)
 sub uid_range {
 	my ($self, $beg, $end, $sql) = @_;
 	my $dbh = $self->connect;
@@ -225,7 +225,7 @@ sub uid_range {
 	# This is read-only, anyways; but caller should verify it's
 	# only sending \A[0-9]+\z for ds and ts column ranges
 	$q .= $$sql if $sql;
-	$q .= ' ORDER BY num ASC LIMIT ' . DEFAULT_LIMIT;
+	$q .= ' ORDER BY num ASC';
 	$dbh->selectcol_arrayref($q, undef, $beg, $end);
 }
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 82/82] imap: FETCH: proper MSN => UID mapping for requests
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (80 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 81/82] over: uid_range: remove LIMIT Eric Wong
@ 2020-06-10  7:05 ` Eric Wong
  2020-06-12 23:49 ` [PATCH 83/82] imap: introduce memory-efficient uo2m mapping Eric Wong
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-10  7:05 UTC (permalink / raw)
  To: meta

This finally seems to make mutt header caching behave properly.

We expect to be able to safely load 50K IV/UVs in memory without
OOM, since that's "only" 1.6 MB that won't live beyond a single
event loop iteration.  So create a simple array which can
quickly map MSNs in requests to UIDs and not leave out messages.

MSNs in the FETCH response will NOT be correct, since it's
inefficient to implement properly and mutt doesn't seem to
care.

Since the conversion code is easily shared, "UID SEARCH" can
allow the same MSN => UID mapping non-UID "FETCH" does.
---
 lib/PublicInbox/IMAP.pm | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 7efe5387646..ff01d0b5e8e 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -16,7 +16,12 @@
 #   are for the limitations of git clients, while slices are
 #   for the limitations of IMAP clients.
 #
-# * sequence numbers are estimated based on slice
+# * sequence numbers are estimated based on slice.  If they
+#   wrong, they're higher than than the corresponding UID
+#   because UIDs have gaps due to spam removals.
+#   We only support an ephemeral mapping non-UID "FETCH"
+#   because mutt header caching relies on it; mutt uses
+#   UID requests everywhere else.
 
 package PublicInbox::IMAP;
 use strict;
@@ -957,9 +962,17 @@ sub cmd_uid_fetch ($$$$;@) {
 	long_response($self, $cb, $tag, [], $range_info, $ops, $partial);
 }
 
+# returns an arrayref of UIDs, so MSNs can be translated via:
+# $msn2uid->[$MSN-1] => $UID
+sub msn2uid ($) {
+	my ($self) = @_;
+	my $x = $self->{uid_base};
+	$self->{ibx}->over->uid_range($x + 1, $x + UID_SLICE);
+}
+
 sub msn_to_uid_range ($$) {
-	my $uid_base = $_[0]->{uid_base};
-	$_[1] =~ s/([0-9]+)/$uid_base + $1/sge;
+	my $msn2uid = $_[0];
+	$_[1] =~ s!([0-9]+)!$msn2uid->[$1 - 1] // ($msn2uid->[-1] + 1)!sge;
 }
 
 sub cmd_fetch ($$$$;@) {
@@ -970,7 +983,7 @@ sub cmd_fetch ($$$$;@) {
 
 	# cb is one of fetch_blob, fetch_smsg, fetch_uid
 	$range_csv = 'bad' if $range_csv !~ $valid_range;
-	msn_to_uid_range($self, $range_csv);
+	msn_to_uid_range(msn2uid($self), $range_csv);
 	my $range_info = range_step($self, \$range_csv);
 	return "$tag $range_info\r\n" if !ref($range_info);
 	long_response($self, $cb, $tag, [], $range_info, $ops, $partial);
@@ -1070,13 +1083,14 @@ sub parse_query {
 	my $sql = ''; # date conditions, {sql} deleted if Xapian is needed
 	my $xap = '';
 	my $q = { sql => \$sql, xap => \$xap };
+	my $msn2uid;
 	while (@$rest) {
 		my $k = uc(shift @$rest);
 		# default criteria
 		next if $k =~ /\A(?:ALL|RECENT|UNSEEN|NEW)\z/;
 		next if $k eq 'AND'; # the default, until we support OR
 		if ($k =~ $valid_range) { # convert sequence numbers to UIDs
-			msn_to_uid_range($self, $k);
+			msn_to_uid_range($msn2uid //= msn2uid($self), $k);
 			push @{$q->{uid}}, $k;
 		} elsif ($k eq 'UID') {
 			$k = shift(@$rest) // '';

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 83/82] imap: introduce memory-efficient uo2m mapping
  2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
                   ` (81 preceding siblings ...)
  2020-06-10  7:05 ` [PATCH 82/82] imap: FETCH: proper MSN => UID mapping for requests Eric Wong
@ 2020-06-12 23:49 ` Eric Wong
  82 siblings, 0 replies; 84+ messages in thread
From: Eric Wong @ 2020-06-12 23:49 UTC (permalink / raw)
  To: meta

Since we limit our mailboxes slices to 50K and can guarantee a
contiguous UID space for those mailboxes, we can store a mapping
of "UID offsets" (not full UIDs) to Message Sequence Numbers as
an array of 16-bit unsigned integers in a 100K scalar.

For UID-only FETCH responses, we can momentarily unpack the
compact 100K representation to a ~1.6M Perl array of IV/UV
elements for a slight speedup.

Furthermore, we can (ab)use hash key deduplication in Perl5 to
deduplicate this 100K scalar across all clients with the same
mailbox slice open.

Technically we can increase our slice size to 64K w/o increasing
our storage overhead, but I suspect humans are more accustomed
to slices easily divisible by 10.
---
 I've been testing this on public-inbox.org for a bit and
 it seems to fix problems with header caching and mutt.

 MANIFEST                |   1 +
 lib/PublicInbox/IMAP.pm | 178 +++++++++++++++++++++--------
 t/imap.t                |  43 +++++++
 xt/mem-imapd-tls.t      | 243 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 420 insertions(+), 45 deletions(-)
 create mode 100644 xt/mem-imapd-tls.t

diff --git a/MANIFEST b/MANIFEST
index a133b71f..1a8d6c68 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -355,6 +355,7 @@ xt/git-http-backend.t
 xt/git_async_cmp.t
 xt/imapd-mbsync-oimap.t
 xt/imapd-validate.t
+xt/mem-imapd-tls.t
 xt/mem-msgview.t
 xt/msgtime_cmp.t
 xt/nntpd-validate.t
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index ff01d0b5..bc890517 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -16,18 +16,17 @@
 #   are for the limitations of git clients, while slices are
 #   for the limitations of IMAP clients.
 #
-# * sequence numbers are estimated based on slice.  If they
-#   wrong, they're higher than than the corresponding UID
-#   because UIDs have gaps due to spam removals.
-#   We only support an ephemeral mapping non-UID "FETCH"
-#   because mutt header caching relies on it; mutt uses
-#   UID requests everywhere else.
+# * We also take advantage of slices being only 50K to store
+#   "UID offset" to message sequence number (MSN) mapping
+#   as a 50K uint16_t array (via pack("S*", ...)).  "UID offset"
+#   is the offset from {uid_base} which determines the start of
+#   the mailbox slice.
 
 package PublicInbox::IMAP;
 use strict;
 use base qw(PublicInbox::DS);
 use fields qw(imapd ibx long_cb -login_tag
-	uid_base -idle_tag -idle_max);
+	uid_base -idle_tag uo2m);
 use PublicInbox::Eml;
 use PublicInbox::EmlContentFoo qw(parse_content_disposition);
 use PublicInbox::DS qw(now);
@@ -50,7 +49,9 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address;
 
 sub LINE_MAX () { 8000 } # RFC 2683 3.2.1.5
 
-# changing this will cause grief for clients which cache
+# Changing UID_SLICE will cause grief for clients which cache.
+# This also needs to be <64K: we pack it into a uint16_t
+# for long_response UID (offset) => MSN mappings
 sub UID_SLICE () { 50_000 }
 
 # these values area also used for sorting
@@ -196,23 +197,118 @@ sub cmd_capability ($$) {
 
 sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" }
 
+# uo2m: UID Offset to MSN, this is an arrayref by default,
+# but uo2m_hibernate can compact and deduplicate it
+sub uo2m_ary_new ($) {
+	my ($self) = @_;
+	my $base = $self->{uid_base};
+	my $uids = $self->{ibx}->over->uid_range($base + 1, $base + UID_SLICE);
+
+	# convert UIDs to offsets from {base}
+	my @tmp; # [$UID_OFFSET] => $MSN
+	my $msn = 0;
+	++$base;
+	$tmp[$_ - $base] = ++$msn for @$uids;
+	\@tmp;
+}
+
+# changes UID-offset-to-MSN mapping into a deduplicated scalar:
+# uint16_t uo2m[UID_SLICE].
+# May be swapped out for idle clients if THP is disabled.
+sub uo2m_hibernate ($) {
+	my ($self) = @_;
+	ref(my $uo2m = $self->{uo2m}) or return;
+	my %dedupe = ( uo2m_pack($uo2m) => undef );
+	$self->{uo2m} = (keys(%dedupe))[0];
+	undef;
+}
+
+sub uo2m_last_uid ($) {
+	my ($self) = @_;
+	my $uo2m = $self->{uo2m} or die 'BUG: uo2m_last_uid w/o {uo2m}';
+	(ref($uo2m) ? @$uo2m : (length($uo2m) >> 1)) + $self->{uid_base};
+}
+
+sub uo2m_pack ($) {
+	# $_[0] is an arrayref of MSNs, it may have undef gaps if there
+	# are gaps in the corresponding UIDs: [ msn1, msn2, undef, msn3 ]
+	no warnings 'uninitialized';
+	pack('S*', @{$_[0]});
+}
+
+# extend {uo2m} to account for new messages which arrived since
+# {uo2m} was created.
+sub uo2m_extend ($$) {
+	my ($self, $new_uid_max) = @_;
+	defined(my $uo2m = $self->{uo2m}) or
+		return($self->{uo2m} = uo2m_ary_new($self));
+	my $beg = uo2m_last_uid($self); # last UID we've learned
+	return $uo2m if $beg >= $new_uid_max; # fast path
+
+	# need to extend the current range:
+	my $base = $self->{uid_base};
+	++$beg;
+	my $uids = $self->{ibx}->over->uid_range($beg, $base + UID_SLICE);
+	my @tmp; # [$UID_OFFSET] => $MSN
+	if (ref($uo2m)) {
+		my $msn = $uo2m->[-1];
+		$tmp[$_ - $beg] = ++$msn for @$uids;
+		push @$uo2m, @tmp;
+		$uo2m;
+	} else {
+		my $msn = unpack('S', substr($uo2m, -2, 2));
+		$tmp[$_ - $beg] = ++$msn for @$uids;
+		$uo2m .= uo2m_pack(\@tmp);
+		my %dedupe = ($uo2m => undef);
+		$self->{uo2m} = (keys %dedupe)[0];
+	}
+}
+
+# the flexible version which works on scalars and array refs.
+# Must call uo2m_extend before this
+sub uid2msn ($$) {
+	my ($self, $uid) = @_;
+	my $uo2m = $self->{uo2m};
+	my $off = $uid - $self->{uid_base} - 1;
+	ref($uo2m) ? $uo2m->[$off] : unpack('S', substr($uo2m, $off << 1, 2));
+}
+
+# returns an arrayref of UIDs, so MSNs can be translated to UIDs via:
+# $msn2uid->[$MSN-1] => $UID.  The result of this is always ephemeral
+# and does not live beyond the event loop.
+sub msn2uid ($) {
+	my ($self) = @_;
+	my $base = $self->{uid_base};
+	my $uo2m = uo2m_extend($self, $base + UID_SLICE);
+	$uo2m = [ unpack('S*', $uo2m) ] if !ref($uo2m);
+
+	my $uo = 0;
+	my @msn2uid;
+	for my $msn (@$uo2m) {
+		++$uo;
+		$msn2uid[$msn - 1] = $uo + $base if $msn;
+	}
+	\@msn2uid;
+}
+
+# converts a set of message sequence numbers in requests to UIDs:
+sub msn_to_uid_range ($$) {
+	my $msn2uid = $_[0];
+	$_[1] =~ s!([0-9]+)!$msn2uid->[$1 - 1] // ($msn2uid->[-1] + 1)!sge;
+}
+
 # called by PublicInbox::InboxIdle
 sub on_inbox_unlock {
 	my ($self, $ibx) = @_;
-	my $new = $ibx->over->max;
-	my $uid_base = $self->{uid_base} // 0;
-	my $uid_end = $uid_base + UID_SLICE;
-	defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset';
-	$new = $uid_end if $new > $uid_end;
+	my $old = uo2m_last_uid($self);
+	my $uid_end = $self->{uid_base} + UID_SLICE;
+	uo2m_extend($self, $uid_end);
+	my $new = uo2m_last_uid($self);
 	if ($new > $old) {
-		$self->{-idle_max} = $new;
-		$new -= $uid_base;
-		$old -= $uid_base;
-		$self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1));
-		$self->write(\"* $new EXISTS\r\n");
+		my $msn = uid2msn($self, $new);
+		$self->write(\"* $msn EXISTS\r\n");
 	} elsif ($new == $uid_end) { # max exceeded $uid_end
 		# continue idling w/o inotify
-		delete $self->{-idle_max};
 		my $sock = $self->{sock} or return;
 		$ibx->unsubscribe_unlock(fileno($sock));
 	}
@@ -245,9 +341,9 @@ sub cmd_idle ($$) {
 	my $fd = fileno($sock);
 	# only do inotify on most recent slice
 	if ($max < $uid_end) {
+		uo2m_extend($self, $uid_end);
 		$ibx->subscribe_unlock($fd, $self);
 		$self->{imapd}->idler_start;
-		$self->{-idle_max} = $max;
 	}
 	$idle_timer //= PublicInbox::DS::later(\&idle_tick_all);
 	$IDLERS->{$fd} = $self;
@@ -497,12 +593,10 @@ sub requeue_once ($) {
 	$self->requeue if $new_size == 1;
 }
 
-# my ($msn, $UID) = @_;
-sub fetch_msn_uid ($$) { '* '.(${$_[0]}++).' FETCH (UID '.$_[1] }
-
 sub fetch_run_ops {
-	my ($self, $msn, $smsg, $bref, $ops, $partial) = @_;
-	$self->msg_more(fetch_msn_uid($msn, $smsg->{num}));
+	my ($self, $smsg, $bref, $ops, $partial) = @_;
+	my $uid = $smsg->{num};
+	$self->msg_more('* '.uid2msn($self, $uid)." FETCH (UID $uid");
 	my ($eml, $k);
 	for (my $i = 0; $i < @$ops;) {
 		$k = $ops->[$i++];
@@ -523,7 +617,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
-	fetch_run_ops($self, $range_info->[3], $smsg, $bref, $ops, $partial);
+	fetch_run_ops($self, $smsg, $bref, $ops, $partial);
 	requeue_once($self);
 }
 
@@ -627,8 +721,7 @@ sub range_step ($$) {
 	} else {
 		return 'BAD fetch range';
 	}
-	my $msn = $beg - $uid_base;
-	[ $beg, $end, $$range_csv, \$msn ];
+	[ $beg, $end, $$range_csv ];
 }
 
 sub refill_range ($$$) {
@@ -653,6 +746,7 @@ sub fetch_blob { # long_response
 			return;
 		}
 	}
+	uo2m_extend($self, $msgs->[-1]->{num});
 	git_async_cat($self->{ibx}->git, $msgs->[0]->{blob},
 			\&fetch_blob_cb, \@_);
 }
@@ -665,7 +759,8 @@ sub fetch_smsg { # long_response
 			return;
 		}
 	}
-	fetch_run_ops($self, $range_info->[3], $_, undef, $ops) for @$msgs;
+	uo2m_extend($self, $msgs->[-1]->{num});
+	fetch_run_ops($self, $_, undef, $ops) for @$msgs;
 	@$msgs = ();
 	1; # more
 }
@@ -696,10 +791,12 @@ sub fetch_uid { # long_response
 		$self->write("$tag $err\r\n");
 		return;
 	}
+	my $adj = $self->{uid_base} + 1;
+	my $uo2m = uo2m_extend($self, $uids->[-1]);
+	$uo2m = [ unpack('S*', $uo2m) ] if !ref($uo2m);
 	my ($i, $k);
-	my $msn = $range_info->[3];
 	for (@$uids) {
-		$self->msg_more(fetch_msn_uid($msn, $_));
+		$self->msg_more("* $uo2m->[$_ - $adj] FETCH (UID $_");
 		for ($i = 0; $i < @$ops;) {
 			$k = $ops->[$i++];
 			$ops->[$i++]->($self, $k);
@@ -956,25 +1053,14 @@ sub cmd_uid_fetch ($$$$;@) {
 	my ($cb, $ops, $partial) = fetch_compile(\@want);
 	return "$tag $cb\r\n" unless $ops;
 
+	# cb is one of fetch_blob, fetch_smsg, fetch_uid
 	$range_csv = 'bad' if $range_csv !~ $valid_range;
 	my $range_info = range_step($self, \$range_csv);
 	return "$tag $range_info\r\n" if !ref($range_info);
+	uo2m_hibernate($self) if $cb == \&fetch_blob; # slow, save RAM
 	long_response($self, $cb, $tag, [], $range_info, $ops, $partial);
 }
 
-# returns an arrayref of UIDs, so MSNs can be translated via:
-# $msn2uid->[$MSN-1] => $UID
-sub msn2uid ($) {
-	my ($self) = @_;
-	my $x = $self->{uid_base};
-	$self->{ibx}->over->uid_range($x + 1, $x + UID_SLICE);
-}
-
-sub msn_to_uid_range ($$) {
-	my $msn2uid = $_[0];
-	$_[1] =~ s!([0-9]+)!$msn2uid->[$1 - 1] // ($msn2uid->[-1] + 1)!sge;
-}
-
 sub cmd_fetch ($$$$;@) {
 	my ($self, $tag, $range_csv, @want) = @_;
 	my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n";
@@ -986,6 +1072,7 @@ sub cmd_fetch ($$$$;@) {
 	msn_to_uid_range(msn2uid($self), $range_csv);
 	my $range_info = range_step($self, \$range_csv);
 	return "$tag $range_info\r\n" if !ref($range_info);
+	uo2m_hibernate($self) if $cb == \&fetch_blob; # slow, save RAM
 	long_response($self, $cb, $tag, [], $range_info, $ops, $partial);
 }
 
@@ -1318,7 +1405,8 @@ sub event_step {
 			$self->write(\"\* BAD request too long\r\n");
 			return $self->close;
 		}
-		$self->do_read($rbuf, LINE_MAX, length($$rbuf)) or return;
+		$self->do_read($rbuf, LINE_MAX, length($$rbuf)) or
+				return uo2m_hibernate($self);
 		$line = index($$rbuf, "\n");
 	}
 	$line = substr($$rbuf, 0, $line + 1, '');
diff --git a/t/imap.t b/t/imap.t
index 81b3d844..0700f578 100644
--- a/t/imap.t
+++ b/t/imap.t
@@ -130,4 +130,47 @@ EOF
 	], 'placed op_eml_new before emit_body');
 }
 
+# UID <=> MSN mapping
+
+sub uo2m_str_new ($) {
+	no warnings 'uninitialized'; # uom2m_ary_new may have may have undef
+	pack('S*', @{$_[0]->uo2m_ary_new}); # 2 bytes per-MSN
+}
+
+{
+	my $ibx = bless { uid_range => [ 1, 2, 4 ] }, 'Uo2mTestInbox';
+	my $imap = bless { uid_base => 0, ibx => $ibx }, 'PublicInbox::IMAP';
+	my $uo2m = $imap->uo2m_ary_new;
+	is_deeply($uo2m, [ 1, 2, undef, 3 ], 'uo2m ary');
+	$uo2m = uo2m_str_new($imap);
+	is_deeply([ unpack('S*', $uo2m) ], [ 1, 2, 0, 3 ], 'uo2m str');
+
+	$ibx->{uid_range} = [ 1, 2, 4, 5, 6 ];
+	for ([ 1, 2, undef, 3 ], $uo2m) {
+		$imap->{uo2m} = $_;
+		is($imap->uid2msn(1), 1, 'uid2msn');
+		is($imap->uid2msn(4), 3, 'uid2msn');
+		is($imap->uo2m_last_uid, 4, 'uo2m_last_uid');
+		$imap->uo2m_extend(6);
+		is($imap->uid2msn(5), 4, 'uid2msn 5 => 4');
+		is($imap->uid2msn(6), 5, 'uid2msn 6 => 5');
+		is($imap->uo2m_last_uid, 6, 'uo2m_last_uid');
+
+		my $msn2uid = $imap->msn2uid;
+		my $range = '1,4:5';
+		$imap->can('msn_to_uid_range')->($msn2uid, $range);
+		is($range, '1,5:6', 'range converted');
+	}
+}
+
 done_testing;
+
+package Uo2mTestInbox;
+use strict;
+require PublicInbox::DummyInbox;
+our @ISA = qw(PublicInbox::DummyInbox);
+sub over { shift }
+sub uid_range {
+	my ($self, $beg, $end, undef) = @_;
+	[ grep { $_ >= $beg && $_ <= $end } @{$self->{uid_range}} ];
+}
diff --git a/xt/mem-imapd-tls.t b/xt/mem-imapd-tls.t
new file mode 100644
index 00000000..accf7564
--- /dev/null
+++ b/xt/mem-imapd-tls.t
@@ -0,0 +1,243 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# Idle client memory usage test, particularly after EXAMINE when
+# Message Sequence Numbers are loaded
+use strict;
+use Test::More;
+use Socket qw(SOCK_STREAM IPPROTO_TCP SOL_SOCKET);
+use PublicInbox::TestCommon;
+use PublicInbox::Syscall qw(:epoll);
+use PublicInbox::DS;
+require_mods(qw(DBD::SQLite));
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+my $TEST_TLS;
+SKIP: {
+	require_mods('IO::Socket::SSL', 1);
+	$TEST_TLS = $ENV{TEST_TLS} // 1;
+};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
+diag 'TEST_COMPRESS='.($ENV{TEST_COMPRESS} // 1) . " TEST_TLS=$TEST_TLS";
+
+my ($cert, $key) = qw(certs/server-cert.pem certs/server-key.pem);
+if ($TEST_TLS) {
+	if (!-r $key || !-r $cert) {
+		plan skip_all =>
+			"certs/ missing for $0, run ./certs/create-certs.perl";
+	}
+	use_ok 'PublicInbox::TLS';
+}
+my ($tmpdir, $for_destroy) = tmpdir();
+my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log");
+my $pi_config = "$tmpdir/pi_config";
+my $group = 'inbox.test';
+local $SIG{PIPE} = 'IGNORE'; # for IMAPC (below)
+my $imaps = tcp_server();
+{
+	open my $fh, '>', $pi_config or die "open: $!\n";
+	print $fh <<EOF or die;
+[publicinbox "imapd-tls"]
+	inboxdir = $inboxdir
+	address = $group\@example.com
+	newsgroup = $group
+	indexlevel = basic
+EOF
+	close $fh or die "close: $!\n";
+}
+my $imaps_addr = $imaps->sockhost . ':' . $imaps->sockport;
+my $env = { PI_CONFIG => $pi_config };
+my $arg = $TEST_TLS ? [ "-limaps://$imaps_addr/?cert=$cert,key=$key" ] : [];
+my $cmd = [ '-imapd', '-W0', @$arg, "--stdout=$out", "--stderr=$err" ];
+my $td = start_script($cmd, $env, { 3 => $imaps });
+my %ssl_opt;
+if ($TEST_TLS) {
+	%ssl_opt = (
+		SSL_hostname => 'server.local',
+		SSL_verifycn_name => 'server.local',
+		SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_PEER(),
+		SSL_ca_file => 'certs/test-ca.pem',
+	);
+	my $ctx = IO::Socket::SSL::SSL_Context->new(%ssl_opt);
+
+	# cf. https://rt.cpan.org/Ticket/Display.html?id=129463
+	my $mode = eval { Net::SSLeay::MODE_RELEASE_BUFFERS() };
+	if ($mode && $ctx->{context}) {
+		eval { Net::SSLeay::CTX_set_mode($ctx->{context}, $mode) };
+		warn "W: $@ (setting SSL_MODE_RELEASE_BUFFERS)\n" if $@;
+	}
+
+	$ssl_opt{SSL_reuse_ctx} = $ctx;
+	$ssl_opt{SSL_startHandshake} = 0;
+}
+chomp(my $nfd = `/bin/sh -c 'ulimit -n'`);
+$nfd -= 10;
+ok($nfd > 0, 'positive FD count');
+my $MAX_FD = 10000;
+$nfd = $MAX_FD if $nfd >= $MAX_FD;
+our $DONE = 0;
+sub once { 0 }; # stops event loop
+
+# setup the event loop so that it exits at every step
+# while we're still doing connect(2)
+PublicInbox::DS->SetLoopTimeout(0);
+PublicInbox::DS->SetPostLoopCallback(\&once);
+my $pid = $td->{pid};
+if ($^O eq 'linux' && open(my $f, '<', "/proc/$pid/status")) {
+	diag(grep(/RssAnon/, <$f>));
+}
+
+foreach my $n (1..$nfd) {
+	my $io = tcp_connect($imaps, Blocking => 0);
+	$io = IO::Socket::SSL->start_SSL($io, %ssl_opt) if $TEST_TLS;
+	IMAPC->new($io);
+
+	# one step through the event loop
+	# do a little work as we connect:
+	PublicInbox::DS->EventLoop;
+
+	# try not to overflow the listen() backlog:
+	if (!($n % 128) && $DONE != $n) {
+		diag("nr: ($n) $DONE/$nfd");
+		PublicInbox::DS->SetLoopTimeout(-1);
+		PublicInbox::DS->SetPostLoopCallback(sub { $DONE != $n });
+
+		# clear the backlog:
+		PublicInbox::DS->EventLoop;
+
+		# resume looping
+		PublicInbox::DS->SetLoopTimeout(0);
+		PublicInbox::DS->SetPostLoopCallback(\&once);
+	}
+}
+
+# run the event loop normally, now:
+diag "done?: @".time." $DONE/$nfd";
+if ($DONE != $nfd) {
+	PublicInbox::DS->SetLoopTimeout(-1);
+	PublicInbox::DS->SetPostLoopCallback(sub { $DONE != $nfd });
+	PublicInbox::DS->EventLoop;
+}
+is($nfd, $DONE, "$nfd/$DONE done");
+if ($^O eq 'linux' && open(my $f, '<', "/proc/$pid/status")) {
+	diag(grep(/RssAnon/, <$f>));
+	diag "  SELF lsof | wc -l ".`lsof -p $$ |wc -l`;
+	diag "SERVER lsof | wc -l ".`lsof -p $pid |wc -l`;
+}
+PublicInbox::DS->Reset;
+$td->kill;
+$td->join;
+is($?, 0, 'no error in exited process');
+done_testing;
+
+package IMAPC;
+use strict;
+use base qw(PublicInbox::DS);
+use fields qw(step zin);
+use PublicInbox::Syscall qw(EPOLLIN EPOLLOUT EPOLLONESHOT);
+use Errno qw(EAGAIN);
+# determines where we start event_step
+use constant FIRST_STEP => ($ENV{TEST_COMPRESS} // 1) ? -2 : 0;
+
+# return true if complete, false if incomplete (or failure)
+sub connect_tls_step {
+	my ($self) = @_;
+	my $sock = $self->{sock} or return;
+	return 1 if $sock->connect_SSL;
+	return $self->drop("$!") if $! != EAGAIN;
+	if (my $ev = PublicInbox::TLS::epollbit()) {
+		unshift @{$self->{wbuf}}, \&connect_tls_step;
+		PublicInbox::DS::epwait($sock, $ev | EPOLLONESHOT);
+		0;
+	} else {
+		$self->drop('BUG? EAGAIN but '.PublicInbox::TLS::err());
+	}
+}
+
+sub event_step {
+	my ($self) = @_;
+
+	# TLS negotiation happens in flush_write via {wbuf}
+	return unless $self->flush_write && $self->{sock};
+
+	if ($self->{step} == -2) {
+		$self->do_read(\(my $buf = ''), 128) or return;
+		$buf =~ /\A\* OK / or die 'no greeting';
+		$self->{step} = -1;
+		$self->write(\"1 COMPRESS DEFLATE\r\n");
+	}
+	if ($self->{step} == -1) {
+		$self->do_read(\(my $buf = ''), 128) or return;
+		$buf =~ /\A1 OK / or die "no compression $buf";
+		IMAPCdeflate->enable($self);
+		$self->{step} = 1;
+		$self->write(\"2 EXAMINE inbox.test.0\r\n");
+	}
+	if ($self->{step} == 0) {
+		$self->do_read(\(my $buf = ''), 128) or return;
+		$buf =~ /\A\* OK / or die 'no greeting';
+		$self->{step} = 1;
+		$self->write(\"2 EXAMINE inbox.test.0\r\n");
+	}
+	if ($self->{step} == 1) {
+		my $buf = '';
+		until ($buf =~ /^2 OK \[READ-ONLY/ms) {
+			$self->do_read(\$buf, 4096, length($buf)) or return;
+		}
+		$self->{step} = 2;
+		$self->write(\"3 UID FETCH 1 (UID FLAGS)\r\n");
+	}
+	if ($self->{step} == 2) {
+		my $buf = '';
+		until ($buf =~ /^3 OK /ms) {
+			$self->do_read(\$buf, 4096, length($buf)) or return;
+		}
+		$self->{step} = 3;
+		$self->write(\"4 IDLE\r\n");
+	}
+	if ($self->{step} == 3) {
+		$self->do_read(\(my $buf = ''), 128) or return;
+		no warnings 'once';
+		$::DONE++;
+		$self->{step} = 5; # all done
+	} else {
+		warn "$self->{step} Should never get here $self";
+	}
+}
+
+sub new {
+	my ($class, $io) = @_;
+	my $self = fields::new($class);
+
+	# wait for connect(), and maybe SSL_connect()
+	$self->SUPER::new($io, EPOLLOUT|EPOLLONESHOT);
+	if ($io->can('connect_SSL')) {
+		$self->{wbuf} = [ \&connect_tls_step ];
+	}
+	$self->{step} = FIRST_STEP;
+	$self;
+}
+
+1;
+package IMAPCdeflate;
+use strict;
+use base qw(IMAPC); # parent doesn't work for fields
+use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways
+use Compress::Raw::Zlib;
+use PublicInbox::IMAPdeflate;
+my %ZIN_OPT;
+BEGIN {
+	%ZIN_OPT = ( -WindowBits => -15, -AppendOutput => 1 );
+	*write = \&PublicInbox::IMAPdeflate::write;
+	*do_read = \&PublicInbox::IMAPdeflate::do_read;
+};
+
+sub enable {
+	my ($class, $self) = @_;
+	my ($in, $err) = Compress::Raw::Zlib::Inflate->new(%ZIN_OPT);
+	die "Inflate->new failed: $err" if $err != Z_OK;
+	unlock_hash(%$self);
+	bless $self, $class;
+	$self->{zin} = $in;
+}
+
+1;

^ permalink raw reply related	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2020-06-12 23:49 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10  7:03 [PATCH 00/82] public-inbox-imapd: read-only IMAP server Eric Wong
2020-06-10  7:03 ` [PATCH 01/82] doc: add some IMAP standards Eric Wong
2020-06-10  7:03 ` [PATCH 02/82] nntpd: restrict allowed newsgroup names Eric Wong
2020-06-10  7:04 ` [PATCH 03/82] preliminary imap server implementation Eric Wong
2020-06-10  7:04 ` [PATCH 04/82] inboxidle: new class to detect inbox changes Eric Wong
2020-06-10  7:04 ` [PATCH 05/82] imap: support IDLE Eric Wong
2020-06-10  7:04 ` [PATCH 06/82] msgmap: split ->max into its own method Eric Wong
2020-06-10  7:04 ` [PATCH 07/82] imap: delay InboxIdle start, support refresh Eric Wong
2020-06-10  7:04 ` [PATCH 08/82] imap: implement STATUS command Eric Wong
2020-06-10  7:04 ` [PATCH 09/82] imap: use Text::ParseWords::parse_line to handle quoted words Eric Wong
2020-06-10  7:04 ` [PATCH 10/82] imap: support LIST command Eric Wong
2020-06-10  7:04 ` [PATCH 11/82] t/imapd: support FakeInotify and KQNotify Eric Wong
2020-06-10  7:04 ` [PATCH 12/82] imap: support fetch for BODYSTRUCTURE and BODY Eric Wong
2020-06-10  7:04 ` [PATCH 13/82] eml: each_part: single part $idx is 1 Eric Wong
2020-06-10  7:04 ` [PATCH 14/82] imap: allow fetch of partial of BODY[...] and headers Eric Wong
2020-06-10  7:04 ` [PATCH 15/82] imap: always include `resp-text' in responses Eric Wong
2020-06-10  7:04 ` [PATCH 16/82] imap: split out unit tests and benchmarks Eric Wong
2020-06-10  7:04 ` [PATCH 17/82] imap: fix multi-message partial header fetches Eric Wong
2020-06-10  7:04 ` [PATCH 18/82] imap: simplify partial fetch structure Eric Wong
2020-06-10  7:04 ` [PATCH 19/82] imap: support sequence number FETCH Eric Wong
2020-06-10  7:04 ` [PATCH 20/82] imap: do not include ".PEEK" in responses Eric Wong
2020-06-10  7:04 ` [PATCH 21/82] imap: support the CLOSE command Eric Wong
2020-06-10  7:04 ` [PATCH 22/82] imap: speed up HEADER.FIELDS[.NOT] range fetches Eric Wong
2020-06-10  7:04 ` [PATCH 23/82] git: async: flatten the inflight array Eric Wong
2020-06-10  7:04 ` [PATCH 24/82] git: do our own read buffering for cat-file Eric Wong
2020-06-10  7:04 ` [PATCH 25/82] imap: use git-cat-file asynchronously Eric Wong
2020-06-10  7:04 ` [PATCH 26/82] git: idle rbuf for async Eric Wong
2020-06-10  7:04 ` [PATCH 27/82] imap: support LSUB command Eric Wong
2020-06-10  7:04 ` [PATCH 28/82] imap: FETCH: support comma-delimited ranges Eric Wong
2020-06-10  7:04 ` [PATCH 29/82] add imapd compression test Eric Wong
2020-06-10  7:04 ` [PATCH 30/82] testcommon: tcp_(server|connect): BAIL_OUT on failure Eric Wong
2020-06-10  7:04 ` [PATCH 31/82] *deflate: drop invalid comment about rbuf Eric Wong
2020-06-10  7:04 ` [PATCH 32/82] imap: fix pipelining with async git Eric Wong
2020-06-10  7:04 ` [PATCH 33/82] git: cat_async: provide requested OID + "missing" on missing blobs Eric Wong
2020-06-10  7:04 ` [PATCH 34/82] git: move async_cat reference to PublicInbox::Git Eric Wong
2020-06-10  7:04 ` [PATCH 35/82] git: async: automatic retry on alternates change Eric Wong
2020-06-10  7:04 ` [PATCH 36/82] imapclient: wrapper for Mail::IMAPClient Eric Wong
2020-06-10  7:04 ` [PATCH 37/82] xt: add imapd-validate and imapd-mbsync-oimap Eric Wong
2020-06-10  7:04 ` [PATCH 38/82] imap: support out-of-bounds ranges Eric Wong
2020-06-10  7:04 ` [PATCH 39/82] xt/perf-imap-list: time refresh_inboxlist Eric Wong
2020-06-10  7:04 ` [PATCH 40/82] imap: case-insensitive mailbox name comparisons Eric Wong
2020-06-10  7:04 ` [PATCH 41/82] imap: break giant inboxes into sub-inboxes of 50K messages Eric Wong
2020-06-10  7:04 ` [PATCH 42/82] imap: start doing iterative config reloading Eric Wong
2020-06-10  7:04 ` [PATCH 43/82] imap: require ".$UID_MIN-$UID_END" suffix Eric Wong
2020-06-10  7:04 ` [PATCH 44/82] imapd: ensure LIST is sorted alphabetically, for now Eric Wong
2020-06-10  7:04 ` [PATCH 45/82] imap: omit $UID_END from mailbox name, use index Eric Wong
2020-06-10  7:04 ` [PATCH 46/82] t/config.t: always compare against git bool behavior Eric Wong
2020-06-10  7:04 ` [PATCH 47/82] xt/*: show some tunable parameters Eric Wong
2020-06-10  7:04 ` [PATCH 48/82] imap: STATUS and LIST are case-insensitive, too Eric Wong
2020-06-10  7:04 ` [PATCH 49/82] imap: EXAMINE/STATUS: return correct counts Eric Wong
2020-06-10  7:04 ` [PATCH 50/82] imap: avoid uninitialized warnings on incomplete commands Eric Wong
2020-06-10  7:04 ` [PATCH 51/82] imap: start parsing out queries for SQLite and Xapian Eric Wong
2020-06-10  7:04 ` [PATCH 52/82] imap: SEARCH: clamp results to the 50K UID range Eric Wong
2020-06-10  7:04 ` [PATCH 53/82] imap: allow UID range search on timestamps Eric Wong
2020-06-10  7:04 ` [PATCH 54/82] over: get_art: use dbh->prepare_cached Eric Wong
2020-06-10  7:04 ` [PATCH 55/82] search: index byte size of a message for IMAP search Eric Wong
2020-06-10  7:04 ` [PATCH 56/82] search: index UID for IMAP search, too Eric Wong
2020-06-10  7:04 ` [PATCH 57/82] imap: remove dummies from sequence number FETCH Eric Wong
2020-06-10  7:04 ` [PATCH 58/82] imap: compile UID FETCH to opcodes Eric Wong
2020-06-10  7:04 ` [PATCH 59/82] imap: UID FETCH: optimize for smsg-only case Eric Wong
2020-06-10  7:04 ` [PATCH 60/82] imap: UID FETCH: optimize (UID FLAGS) harder Eric Wong
2020-06-10  7:04 ` [PATCH 61/82] imap: IDLE: avoid extraneous wakeups, keep-alive Eric Wong
2020-06-10  7:04 ` [PATCH 62/82] imap: 30 minute auto-logout timer Eric Wong
2020-06-10  7:05 ` [PATCH 63/82] imap: split ->logged_in attribute into a separate class Eric Wong
2020-06-10  7:05 ` [PATCH 64/82] searchidx: v1 (re)-index uses git asynchronously Eric Wong
2020-06-10  7:05 ` [PATCH 65/82] index: account for CRLF conversion when storing bytes Eric Wong
2020-06-10  7:05 ` [PATCH 66/82] imap: rely on smsg->{bytes} for RFC822.SIZE Eric Wong
2020-06-10  7:05 ` [PATCH 67/82] imap: UID FETCH requires at least one data item Eric Wong
2020-06-10  7:05 ` [PATCH 68/82] imap: LIST shows "INBOX" in all caps Eric Wong
2020-06-10  7:05 ` [PATCH 69/82] imap: support 8000 octet lines Eric Wong
2020-06-10  7:05 ` [PATCH 70/82] imap: reinstate some message sequence number support Eric Wong
2020-06-10  7:05 ` [PATCH 71/82] imap: cleanup ->{uid_base} usage Eric Wong
2020-06-10  7:05 ` [PATCH 72/82] imap: FETCH: more granular CRLF conversion Eric Wong
2020-06-10  7:05 ` [PATCH 73/82] imap: further speed up HEADER.FIELDS FETCH requests Eric Wong
2020-06-10  7:05 ` [PATCH 74/82] imap: FETCH: try to make fake MSNs sequentially Eric Wong
2020-06-10  7:05 ` [PATCH 75/82] imap: STATUS/EXAMINE: rely on SQLite overview Eric Wong
2020-06-10  7:05 ` [PATCH 76/82] imap: UID SEARCH: support multiple ranges Eric Wong
2020-06-10  7:05 ` [PATCH 77/82] imap: wire up Xapian, MSN SEARCH and multi sequence-sets Eric Wong
2020-06-10  7:05 ` [PATCH 78/82] imap: misc cleanups and notes Eric Wong
2020-06-10  7:05 ` [PATCH 79/82] imapd: don't bother sorting LIST output Eric Wong
2020-06-10  7:05 ` [PATCH 80/82] imap: remove non-UID SEARCH for now Eric Wong
2020-06-10  7:05 ` [PATCH 81/82] over: uid_range: remove LIMIT Eric Wong
2020-06-10  7:05 ` [PATCH 82/82] imap: FETCH: proper MSN => UID mapping for requests Eric Wong
2020-06-12 23:49 ` [PATCH 83/82] imap: introduce memory-efficient uo2m mapping Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).