So I finally wrote my first IMAP server! And I'm actually fairly satisfied with how it's turning out to support a bunch of other performance + scalability work I've wanted to do. Some previous notes here: https://public-inbox.org/meta/20200609113442.GA16856@dcvr/ I finally seem to have gotten it to play nicely with mutt header caching, so it's fit for public consumption :) imaps://news.public-inbox.org/INBOX.comp.mail.public-inbox.meta.0 You can use any username+password, and AUTH=ANONYMOUS also works if your client does that. It doesn't support UTF-7 (mailbox names) or advertise UTF-8 in CAPABILITIES, yet; I still have RFCs to read :P And there's a bunch of new things which could use some testing from non-mutt/mbsync/offlineimap users. Maybe you'll find some client-side bugs like I did :P v1 reindexing also gets a little bit of parallelism :) Anyways, I'll probably be porting some of the scalability and slow-storage work to older parts of the code before fiddling with more IMAP extensions. Eric Wong (82): doc: add some IMAP standards nntpd: restrict allowed newsgroup names preliminary imap server implementation inboxidle: new class to detect inbox changes imap: support IDLE msgmap: split ->max into its own method imap: delay InboxIdle start, support refresh imap: implement STATUS command imap: use Text::ParseWords::parse_line to handle quoted words imap: support LIST command t/imapd: support FakeInotify and KQNotify imap: support fetch for BODYSTRUCTURE and BODY eml: each_part: single part $idx is 1 imap: allow fetch of partial of BODY[...] and headers imap: always include `resp-text' in responses imap: split out unit tests and benchmarks imap: fix multi-message partial header fetches imap: simplify partial fetch structure imap: support sequence number FETCH imap: do not include ".PEEK" in responses imap: support the CLOSE command imap: speed up HEADER.FIELDS[.NOT] range fetches git: async: flatten the inflight array git: do our own read buffering for cat-file imap: use git-cat-file asynchronously git: idle rbuf for async imap: support LSUB command imap: FETCH: support comma-delimited ranges add imapd compression test testcommon: tcp_(server|connect): BAIL_OUT on failure *deflate: drop invalid comment about rbuf imap: fix pipelining with async git git: cat_async: provide requested OID + "missing" on missing blobs git: move async_cat reference to PublicInbox::Git git: async: automatic retry on alternates change imapclient: wrapper for Mail::IMAPClient xt: add imapd-validate and imapd-mbsync-oimap imap: support out-of-bounds ranges xt/perf-imap-list: time refresh_inboxlist imap: case-insensitive mailbox name comparisons imap: break giant inboxes into sub-inboxes of 50K messages imap: start introducing iterative config reloading imap: require ".$UID_MIN-$UID_END" suffix imapd: ensure LIST is sorted alphabetically, for now imap: omit $UID_END from mailbox name, use index t/config.t: always compare against git bool behavior xt/*: show some tunable parameters imap: STATUS and LIST are case-insensitive, too imap: EXAMINE/STATUS: return correct counts imap: avoid uninitialized warnings on incomplete commands imap: start parsing out queries for SQLite and Xapian imap: SEARCH: clamp results to the 50K UID range imap: allow UID range search on timestamps over: get_art: use dbh->prepare_cached search: index byte size of a message for IMAP search search: index UID for IMAP search, too imap: remove dummies from sequence number FETCH imap: compile UID FETCH to opcodes imap: UID FETCH: optimize for smsg-only case imap: UID FETCH: optimize (UID FLAGS) harder imap: IDLE: avoid extraneous wakeups, keep-alive imap: 30 minute auto-logout timer imap: split ->logged_in attribute into a separate class searchidx: v1 (re)-index uses git asynchronously index: account for CRLF conversion when storing bytes imap: rely on smsg->{bytes} for RFC822.SIZE imap: UID FETCH requires at least one data item imap: LIST shows "INBOX" in all caps imap: support 8000 octet lines imap: reinstate some message sequence number support imap: cleanup ->{uid_base} usage imap: FETCH: more granular CRLF conversion imap: further speed up HEADER.FIELDS FETCH requests imap: FETCH: try to make fake MSNs sequentially imap: STATUS/EXAMINE: rely on SQLite overview imap: UID SEARCH: support multiple ranges imap: wire up Xapian search, msn SEARCH and multiple ranges imap: misc cleanups and notes imapd: don't bother sorting LIST output imap: drop non-UID SEARCH for now over: uid_range: remove LIMIT imap: FETCH: proper MSN => UID mapping for requests Documentation/public-inbox-imapd.pod | 91 ++ Documentation/standards.perl | 10 + MANIFEST | 18 + lib/PublicInbox/Config.pm | 18 + lib/PublicInbox/Daemon.pm | 24 +- lib/PublicInbox/DummyInbox.pm | 22 + lib/PublicInbox/Eml.pm | 9 +- lib/PublicInbox/FakeInotify.pm | 59 ++ lib/PublicInbox/Git.pm | 163 +-- lib/PublicInbox/GitAsyncCat.pm | 51 + lib/PublicInbox/IMAP.pm | 1397 ++++++++++++++++++++++++++ lib/PublicInbox/IMAPClient.pm | 119 +++ lib/PublicInbox/IMAPD.pm | 114 +++ lib/PublicInbox/IMAPdeflate.pm | 126 +++ lib/PublicInbox/Import.pm | 2 +- lib/PublicInbox/In2Tie.pm | 17 + lib/PublicInbox/Inbox.pm | 33 +- lib/PublicInbox/InboxIdle.pm | 79 ++ lib/PublicInbox/KQNotify.pm | 66 ++ lib/PublicInbox/Lock.pm | 7 + lib/PublicInbox/MsgIter.pm | 2 +- lib/PublicInbox/Msgmap.pm | 20 +- lib/PublicInbox/NNTPD.pm | 12 +- lib/PublicInbox/NNTPdeflate.pm | 1 - lib/PublicInbox/Over.pm | 50 +- lib/PublicInbox/Search.pm | 32 +- lib/PublicInbox/SearchIdx.pm | 89 +- lib/PublicInbox/SearchIdxShard.pm | 11 +- lib/PublicInbox/Smsg.pm | 8 +- lib/PublicInbox/TestCommon.pm | 7 +- lib/PublicInbox/V2Writable.pm | 10 +- script/public-inbox-imapd | 14 + t/config.t | 15 +- t/eml.t | 2 +- t/git.t | 40 +- t/imap.t | 133 +++ t/imapd-tls.t | 204 ++++ t/imapd.t | 398 ++++++++ t/import.t | 5 +- t/inbox_idle.t | 72 ++ t/nntpd.t | 5 +- t/over.t | 3 + t/search.t | 19 + xt/cmp-msgstr.t | 1 - xt/cmp-msgview.t | 1 - xt/eml_check_limits.t | 6 +- xt/git_async_cmp.t | 2 +- xt/imapd-mbsync-oimap.t | 132 +++ xt/imapd-validate.t | 177 ++++ xt/mem-msgview.t | 1 + xt/msgtime_cmp.t | 1 - xt/perf-msgview.t | 1 - 52 files changed, 3718 insertions(+), 181 deletions(-) create mode 100644 Documentation/public-inbox-imapd.pod create mode 100644 lib/PublicInbox/DummyInbox.pm create mode 100644 lib/PublicInbox/FakeInotify.pm create mode 100644 lib/PublicInbox/GitAsyncCat.pm create mode 100644 lib/PublicInbox/IMAP.pm create mode 100644 lib/PublicInbox/IMAPClient.pm create mode 100644 lib/PublicInbox/IMAPD.pm create mode 100644 lib/PublicInbox/IMAPdeflate.pm create mode 100644 lib/PublicInbox/In2Tie.pm create mode 100644 lib/PublicInbox/InboxIdle.pm create mode 100644 lib/PublicInbox/KQNotify.pm create mode 100644 script/public-inbox-imapd create mode 100644 t/imap.t create mode 100644 t/imapd-tls.t create mode 100644 t/imapd.t create mode 100644 t/inbox_idle.t create mode 100644 xt/imapd-mbsync-oimap.t create mode 100644 xt/imapd-validate.t
There's more, but IMAP is big and complex already. --- Documentation/standards.perl | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/Documentation/standards.perl b/Documentation/standards.perl index 34ab829e2e3..37309956f39 100755 --- a/Documentation/standards.perl +++ b/Documentation/standards.perl @@ -42,6 +42,15 @@ my $rfcs = [ 822 => 'Internet message format (1982)', 2822 => 'Internet message format (2001)', 5322 => 'Internet message format (2008)', + 3501 => 'IMAP4rev1', + 2177 => 'IMAP IDLE', # TODO + # 5032 = 'WITHIN search extension for IMAP', + 4978 => 'IMAP COMPRESS Extension', + # 5182 = 'IMAP Extension for Referencing the Last SEARCH Result', + # 5738 => 'IMAP Support for UTF-8', + # 8474 => 'IMAP Extension for Object Identifiers', + # ... + # TODO: flesh this out ];
We'll be using newsgroup names as mailbox names for IMAP, too, so ensure we don't send wonky characters in responses. I doubt this affects any real-world instances, but a BOFH could choose strange names to cause grief for clients. --- lib/PublicInbox/NNTPD.pm | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/lib/PublicInbox/NNTPD.pm b/lib/PublicInbox/NNTPD.pm index 451f4d4183b..b8ec84ed6b0 100644 --- a/lib/PublicInbox/NNTPD.pm +++ b/lib/PublicInbox/NNTPD.pm @@ -41,6 +41,12 @@ sub refresh_groups () { if (ref $ngname) { warn 'multiple newsgroups not supported: '. join(', ', @$ngname). "\n"; + # Newsgroup name needs to be compatible with RFC 3977 + # wildmat-exact and RFC 3501 (IMAP) ATOM-CHAR. + # Leave out a few chars likely to cause problems or conflicts: + # '|', '<', '>', ';', '#', '$', '&', + } elsif ($ngname =~ m![^A-Za-z0-9/_\.\-\~\@\+\=:]!) { + warn "newsgroup name invalid: `$ngname'\n"; } elsif ($ng->nntp_usable) { # Only valid if msgmap and search works $new->{$ngname} = $ng;
It shares a bit of code with NNTP. It's copy+pasted for now since this provides new ground to experiment with APIs for dealing with slow storage and many inboxes. --- Documentation/public-inbox-imapd.pod | 91 +++++ MANIFEST | 7 + lib/PublicInbox/Daemon.pm | 24 +- lib/PublicInbox/IMAP.pm | 523 +++++++++++++++++++++++++++ lib/PublicInbox/IMAPD.pm | 15 + lib/PublicInbox/IMAPdeflate.pm | 119 ++++++ lib/PublicInbox/Smsg.pm | 8 +- script/public-inbox-imapd | 14 + t/imapd-tls.t | 209 +++++++++++ t/imapd.t | 149 ++++++++ 10 files changed, 1146 insertions(+), 13 deletions(-) create mode 100644 Documentation/public-inbox-imapd.pod create mode 100644 lib/PublicInbox/IMAP.pm create mode 100644 lib/PublicInbox/IMAPD.pm create mode 100644 lib/PublicInbox/IMAPdeflate.pm create mode 100644 script/public-inbox-imapd create mode 100644 t/imapd-tls.t create mode 100644 t/imapd.t diff --git a/Documentation/public-inbox-imapd.pod b/Documentation/public-inbox-imapd.pod new file mode 100644 index 00000000000..02027f4f254 --- /dev/null +++ b/Documentation/public-inbox-imapd.pod @@ -0,0 +1,91 @@ +=head1 NAME + +public-inbox-imapd - IMAP server for sharing public-inboxes + +=head1 SYNOPSIS + +B<public-inbox-imapd> [OPTIONS] + +=head1 DESCRIPTION + +public-inbox-imapd provides a read-only IMAP daemon for +public-inbox. It uses options and environment variables common +to all L<public-inbox-daemon(8)> implementations. + +Like L<public-inbox-nntpd(1)> and L<public-inbox-httpd(1)>, +C<public-inbox-imapd> will never require write access +to the directory where the public-inboxes are stored, so it +may be run as a different user than the user running +L<public-inbox-watch(1)>, L<public-inbox-mda(1)>, or +L<git-fetch(1)>. + +=head1 OPTIONS + +See common options in L<public-inbox-daemon(8)/OPTIONS>. +Additionally, IMAP-specific behavior for certain options +are supported and documented below. + +=over + +=item -l, --listen PROTO://ADDRESS/?cert=/path/to/cert,key=/path/to/key + +In addition to the normal C<-l>/C<--listen> switch described in +L<public-inbox-daemon(8)>, the C<PROTO> prefix (e.g. C<imap://> or +C<imaps://>) may be specified to force a given protocol. + +For STARTTLS and IMAPS support, the C<cert> and C<key> may be specified +on a per-listener basis after a C<?> character and separated by C<,>. +These directives are per-directive, and it's possible to use a different +cert for every listener. + +=item --cert /path/to/cert + +The default TLS certificate for optional STARTTLS and IMAPS support +if the C<cert> option is not given with C<--listen>. + +If using systemd-compatible socket activation and a TCP listener on port +993 is inherited, it is automatically IMAPS when this option is given. +When a listener on port 143 is inherited and this option is given, it +automatically gets STARTTLS support. + +=item --key /path/to/key + +The default private TLS certificate key for optional STARTTLS and IMAPS +support if the C<key> option is not given with C<--listen>. The private +key may concatenated into the path used by C<--cert>, in which case this +option is not needed. + +=back + +=head1 CONFIGURATION + +C<public-inbox-imapd> uses the same configuration knobs +as L<public-inbox-nntpd(1)>, see L<public-inbox-nntpd(1)> +and L<public-inbox-config(5)>. + +=over 8 + +=item publicinbox.<name>.newsgroup + +The newsgroup name maps to an IMAP folder name. + +=back + +=head1 CONTACT + +Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org> + +The mail archives are hosted at L<https://public-inbox.org/meta/>, +L<nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta>, +L<nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta> + +=head1 COPYRIGHT + +Copyright 2020 all contributors L<mailto:meta@public-inbox.org> + +License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt> + +=head1 SEE ALSO + +L<git(1)>, L<git-config(1)>, L<public-inbox-daemon(8)>, +L<public-inbox-config(5)>, L<public-inbox-nntpd(1)> diff --git a/MANIFEST b/MANIFEST index 24f95faa942..73b874b42a0 100644 --- a/MANIFEST +++ b/MANIFEST @@ -26,6 +26,7 @@ Documentation/public-inbox-convert.pod Documentation/public-inbox-daemon.pod Documentation/public-inbox-edit.pod Documentation/public-inbox-httpd.pod +Documentation/public-inbox-imapd.pod Documentation/public-inbox-index.pod Documentation/public-inbox-init.pod Documentation/public-inbox-learn.pod @@ -124,6 +125,9 @@ lib/PublicInbox/HTTPD.pm lib/PublicInbox/HTTPD/Async.pm lib/PublicInbox/HlMod.pm lib/PublicInbox/Hval.pm +lib/PublicInbox/IMAP.pm +lib/PublicInbox/IMAPD.pm +lib/PublicInbox/IMAPdeflate.pm lib/PublicInbox/Import.pm lib/PublicInbox/Inbox.pm lib/PublicInbox/InboxWritable.pm @@ -193,6 +197,7 @@ script/public-inbox-compact script/public-inbox-convert script/public-inbox-edit script/public-inbox-httpd +script/public-inbox-imapd script/public-inbox-index script/public-inbox-init script/public-inbox-learn @@ -257,6 +262,8 @@ t/httpd-https.t t/httpd-unix.t t/httpd.t t/hval.t +t/imapd-tls.t +t/imapd.t t/import.t t/inbox.t t/index-git-times.t diff --git a/lib/PublicInbox/Daemon.pm b/lib/PublicInbox/Daemon.pm index 4ff7cad4939..2f63bd73b4a 100644 --- a/lib/PublicInbox/Daemon.pm +++ b/lib/PublicInbox/Daemon.pm @@ -1,6 +1,6 @@ # Copyright (C) 2015-2020 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> -# contains common daemon code for the nntpd and httpd servers. +# contains common daemon code for the httpd, imapd, and nntpd servers. # This may be used for read-only IMAP server if we decide to implement it. package PublicInbox::Daemon; use strict; @@ -29,8 +29,8 @@ my %tls_opt; # scheme://sockname => args for IO::Socket::SSL->start_SSL my $reexec_pid; my ($uid, $gid); my ($default_cert, $default_key); -my %KNOWN_TLS = ( 443 => 'https', 563 => 'nntps' ); -my %KNOWN_STARTTLS = ( 119 => 'nntp' ); +my %KNOWN_TLS = ( 443 => 'https', 563 => 'nntps', 993 => 'imaps' ); +my %KNOWN_STARTTLS = ( 119 => 'nntp', 143 => 'imap' ); sub accept_tls_opt ($) { my ($opt_str) = @_; @@ -123,7 +123,7 @@ sub daemon_prepare ($) { $tls_opt{"$scheme://$l"} = accept_tls_opt($1); } elsif (defined($default_cert)) { $tls_opt{"$scheme://$l"} = accept_tls_opt(''); - } elsif ($scheme =~ /\A(?:nntps|https)\z/) { + } elsif ($scheme =~ /\A(?:https|imaps|imaps)\z/) { die "$orig specified w/o cert=\n"; } # TODO: use scheme to load either NNTP.pm or HTTP.pm @@ -584,13 +584,13 @@ sub defer_accept ($$) { } sub daemon_loop ($$$$) { - my ($refresh, $post_accept, $nntpd, $af_default) = @_; + my ($refresh, $post_accept, $tlsd, $af_default) = @_; my %post_accept; while (my ($k, $v) = each %tls_opt) { - if ($k =~ s!\A(?:nntps|https)://!!) { + if ($k =~ s!\A(?:https|imaps|nntps)://!!) { $post_accept{$k} = tls_start_cb($v, $post_accept); - } elsif ($nntpd) { # STARTTLS, $k eq '' is OK - $nntpd->{accept_tls} = $v; + } elsif ($tlsd) { # STARTTLS, $k eq '' is OK + $tlsd->{accept_tls} = $v; } } my $sig = { @@ -620,8 +620,8 @@ sub daemon_loop ($$$$) { @listeners = map {; my $tls_cb = $post_accept{sockname($_)}; - # NNTPS, HTTPS, HTTP, and POP3S are client-first traffic - # NNTP and POP3 are server-first + # NNTPS, HTTPS, HTTP, IMAPS and POP3S are client-first traffic + # IMAP, NNTP and POP3 are server-first defer_accept($_, $tls_cb ? 'dataready' : $af_default); # this calls epoll_create: @@ -639,12 +639,12 @@ sub daemon_loop ($$$$) { } sub run ($$$;$) { - my ($default, $refresh, $post_accept, $nntpd) = @_; + my ($default, $refresh, $post_accept, $tlsd) = @_; local $SIG{PIPE} = 'IGNORE'; daemon_prepare($default); my $af_default = $default =~ /:8080\z/ ? 'httpready' : undef; my $for_destroy = daemonize(); - daemon_loop($refresh, $post_accept, $nntpd, $af_default); + daemon_loop($refresh, $post_accept, $tlsd, $af_default); PublicInbox::DS->Reset; # ->DESTROY runs when $for_destroy goes out-of-scope } diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm new file mode 100644 index 00000000000..c0636066b9f --- /dev/null +++ b/lib/PublicInbox/IMAP.pm @@ -0,0 +1,523 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# +# Each instance of this represents an IMAP client connected to +# public-inbox-imapd. Much of this was taken from NNTP, but +# further refined while experimenting on future ideas to handle +# slow storage. +# +# data notes: +# * NNTP article numbers are UIDs and message sequence numbers (MSNs) +# * Message sequence numbers (MSNs) can be stable since we're read-only. +# Most IMAP clients use UIDs (I hope), and we can return a dummy +# message if a client requests a non-existent MSN. + +package PublicInbox::IMAP; +use strict; +use base qw(PublicInbox::DS); +use fields qw(imapd logged_in ibx long_cb -login_tag); +use PublicInbox::Eml; +use PublicInbox::DS qw(now); +use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT); +use Errno qw(EAGAIN); +my $Address; +for my $mod (qw(Email::Address::XS Mail::Address)) { + eval "require $mod" or next; + $Address = $mod and last; +} +die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address; + +sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977? + +my %FETCH_NEED_BLOB = ( # for future optimization + 'BODY.PEEK[HEADER]' => 1, + 'BODY.PEEK[TEXT]' => 1, + 'BODY.PEEK[]' => 1, + 'BODY[HEADER]' => 1, + 'BODY[TEXT]' => 1, + 'BODY[]' => 1, + 'RFC822.HEADER' => 1, + 'RFC822.SIZE' => 1, # needs CRLF conversion :< + 'RFC822.TEXT' => 1, + BODY => 1, + BODYSTRUCTURE => 1, + ENVELOPE => 1, + FLAGS => 0, + INTERNALDATE => 0, + RFC822 => 1, + UID => 0, +); +my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED_BLOB; + +# aliases (RFC 3501 section 6.4.5) +$FETCH_ATT{FAST} = [ qw(FLAGS INTERNALDATE RFC822.SIZE) ]; +$FETCH_ATT{ALL} = [ @{$FETCH_ATT{FAST}}, 'ENVELOPE' ]; +$FETCH_ATT{FULL} = [ @{$FETCH_ATT{ALL}}, 'BODY' ]; + +for my $att (keys %FETCH_ATT) { + my %h = map { $_ => 1 } @{$FETCH_ATT{$att}}; + $FETCH_ATT{$att} = \%h; +} + +sub greet ($) { + my ($self) = @_; + my $capa = capa($self); + $self->write(\"* OK [$capa] public-inbox-imapd ready\r\n"); +} + +sub new ($$$) { + my ($class, $sock, $imapd) = @_; + my $self = fields::new($class); + my $ev = EPOLLIN; + my $wbuf; + if ($sock->can('accept_SSL') && !$sock->accept_SSL) { + return CORE::close($sock) if $! != EAGAIN; + $ev = PublicInbox::TLS::epollbit(); + $wbuf = [ \&PublicInbox::DS::accept_tls_step, \&greet ]; + } + $self->SUPER::new($sock, $ev | EPOLLONESHOT); + $self->{imapd} = $imapd; + if ($wbuf) { + $self->{wbuf} = $wbuf; + } else { + greet($self); + } + $self->update_idle_time; + $self; +} + +sub capa ($) { + my ($self) = @_; + my $capa = 'CAPABILITY IMAP4rev1'; + if ($self->{logged_in}) { + $capa .= ' COMPRESS=DEFLATE'; + } else { + if (!($self->{sock} // $self)->can('accept_SSL') && + $self->{imapd}->{accept_tls}) { + $capa .= ' STARTTLS'; + } + $capa .= ' AUTH=ANONYMOUS'; + } +} + +sub login_success ($$) { + my ($self, $tag) = @_; + $self->{logged_in} = 1; + my $capa = capa($self); + "$tag OK [$capa] Logged in\r\n"; +} + +sub auth_challenge_ok ($) { + my ($self) = @_; + my $tag = delete($self->{-login_tag}) or return; + login_success($self, $tag); +} + +sub cmd_login ($$$$) { + my ($self, $tag) = @_; # ignore ($user, $password) = ($_[2], $_[3]) + login_success($self, $tag); +} + +sub cmd_logout ($$) { + my ($self, $tag) = @_; + delete $self->{logged_in}; + $self->write(\"* BYE logging out\r\n$tag OK logout completed\r\n"); + $self->shutdn; # PublicInbox::DS::shutdn + undef; +} + +sub cmd_authenticate ($$$) { + my ($self, $tag) = @_; # $method = $_[2], should be "ANONYMOUS" + $self->{-login_tag} = $tag; + "+\r\n"; # challenge +} + +sub cmd_capability ($$) { + my ($self, $tag) = @_; + '* '.capa($self)."\r\n$tag OK\r\n"; +} + +sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" } + +sub cmd_examine ($$$) { + my ($self, $tag, $mailbox) = @_; + my $ibx = $self->{imapd}->{groups}->{$mailbox} or + return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; + my $mm = $ibx->mm; + my $max = $mm->num_highwater // 0; + # RFC 3501 2.3.1.1 - "A good UIDVALIDITY value to use in + # this case is a 32-bit representation of the creation + # date/time of the mailbox" + my $uidvalidity = $mm->created_at or return "$tag BAD UIDVALIDITY\r\n"; + my $uidnext = $max + 1; + + # XXX: do we need this? RFC 5162/7162 + my $ret = $self->{ibx} ? "* OK [CLOSED] previous closed\r\n" : ''; + $self->{ibx} = $ibx; + $ret .= <<EOF; +* $max EXISTS\r +* $max RECENT\r +* FLAGS (\\Seen)\r +* OK [PERMANENTFLAGS ()] Read-only mailbox\r +EOF + $ret .= "* OK [UNSEEN $max]\r\n" if $max; + $ret .= "* OK [UIDNEXT $uidnext]\r\n" if defined $uidnext; + $ret .= "* OK [UIDVALIDITY $uidvalidity]\r\n" if defined $uidvalidity; + $ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT complete\r\n"; +} + +sub _esc ($) { + my ($v) = @_; + if (!defined($v)) { + 'NIL'; + } elsif ($v =~ /[{"\r\n%*\\\[]/) { # literal string + '{' . length($v) . "}\r\n" . $v; + } else { # quoted string + qq{"$v"} + } +} + +sub addr_envelope ($$;$) { + my ($eml, $x, $y) = @_; + my $v = $eml->header_raw($x) // + ($y ? $eml->header_raw($y) : undef) // return 'NIL'; + + my @x = $Address->parse($v) or return 'NIL'; + '(' . join('', + map { '(' . join(' ', + _esc($_->name), 'NIL', + _esc($_->user), _esc($_->host) + ) . ')' + } @x) . + ')'; +} + +sub eml_envelope ($) { + my ($eml) = @_; + '(' . join(' ', + _esc($eml->header_raw('Date')), + _esc($eml->header_raw('Subject')), + addr_envelope($eml, 'From'), + addr_envelope($eml, 'Sender', 'From'), + addr_envelope($eml, 'Reply-To', 'From'), + addr_envelope($eml, 'To'), + addr_envelope($eml, 'Cc'), + addr_envelope($eml, 'Bcc'), + _esc($eml->header_raw('In-Reply-To')), + _esc($eml->header_raw('Message-ID')), + ) . ')'; +} + +sub uid_fetch_cb { # called by git->cat_async + my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; + my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg; + my $smsg = shift @$msgs or die 'BUG: no smsg'; + $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; + $$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy + + # fixup old bug from import (pre-a0c07cba0e5d8b6a) + $$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s; + + $self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}"); + + $want->{'RFC822.SIZE'} and + $self->msg_more(' RFC822.SIZE '.length($$bref)); + $want->{INTERNALDATE} and + $self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"'); + $want->{FLAGS} and $self->msg_more(' FLAGS ()'); + for ('RFC822', 'BODY[]', 'BODY.PEEK[]') { + next unless $want->{$_}; + $self->msg_more(" $_ {".length($$bref)."}\r\n"); + $self->msg_more($$bref); + } + + my $eml = PublicInbox::Eml->new($bref); + + $want->{ENVELOPE} and + $self->msg_more(' ENVELOPE '.eml_envelope($eml)); + + for my $f ('RFC822.HEADER', 'BODY[HEADER]', 'BODY.PEEK[HEADER]') { + next unless $want->{$f}; + $self->msg_more(" $f {".length(${$eml->{hdr}})."}\r\n"); + $self->msg_more(${$eml->{hdr}}); + } + for my $f ('RFC822.TEXT', 'BODY[TEXT]') { + next unless $want->{$f}; + $self->msg_more(" $f {".length($$bref)."}\r\n"); + $self->msg_more($$bref); + } + # TODO BODY/BODYSTRUCTURE, specific headers + $self->msg_more(")\r\n"); +} + +sub uid_fetch_m { # long_response + my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_; + if (!@$msgs) { # refill + @$msgs = @{$ibx->over->query_xover($$beg, $end)}; + if (!@$msgs) { + $self->write(\"$tag OK Fetch done\r\n"); + return; + } + $$beg = $msgs->[-1]->{num} + 1; + } + my $git = $ibx->git; + $git->cat_async_begin; # TODO: actually make async + $git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_); + $git->cat_async_wait; + 1; +} + +sub cmd_uid_fetch ($$$;@) { + my ($self, $tag, $range, @want) = @_; + my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; + if ($want[0] =~ s/\A\(//s) { + $want[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n"; + } + my %want = map {; + my $x = $FETCH_ATT{uc($_)} or return "$tag BAD param: $_\r\n"; + %$x; + } @want; + my ($beg, $end); + my $msgs = []; + if ($range =~ /\A([0-9]+):([0-9]+)\z/s) { + ($beg, $end) = ($1, $2); + } elsif ($range =~ /\A([0-9]+):\*\z/s) { + ($beg, $end) = ($1, $ibx->mm->num_highwater // 0); + } elsif ($range =~ /\A[0-9]+\z/) { + my $smsg = $ibx->over->get_art($range) or return "$tag OK\r\n"; + push @$msgs, $smsg; + ($beg, $end) = ($range, 0); + } else { + return "$tag BAD\r\n"; + } + long_response($self, \&uid_fetch_m, $tag, $ibx, + \$beg, $end, $msgs, \%want); +} + +sub uid_search_all { # long_response + my ($self, $tag, $ibx, $num) = @_; + my $uids = $ibx->mm->ids_after($num); + if (scalar(@$uids)) { + $self->msg_more(join(' ', '', @$uids)); + } else { + $self->write(\"\r\n$tag OK\r\n"); + undef; + } +} + +sub uid_search_uid_range { # long_response + my ($self, $tag, $ibx, $beg, $end) = @_; + my $uids = $ibx->mm->msg_range($beg, $end, 'num'); + if (@$uids) { + $self->msg_more(join('', map { " $_->[0]" } @$uids)); + } else { + $self->write(\"\r\n$tag OK\r\n"); + undef; + } +} + +sub cmd_uid_search ($$$;) { + my ($self, $tag, $arg, @rest) = @_; + my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; + $arg = uc($arg); + if ($arg eq 'ALL' && !@rest) { + $self->msg_more('* SEARCH'); + my $num = 0; + long_response($self, \&uid_search_all, $tag, $ibx, \$num); + } elsif ($arg eq 'UID' && scalar(@rest) == 1) { + if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) { + my ($beg, $end) = ($1, $2); + $end = ($ibx->mm->minmax)[1] if $end eq '*'; + $self->msg_more('* SEARCH'); + long_response($self, \&uid_search_uid_range, + $tag, $ibx, \$beg, $end); + } elsif ($rest[0] =~ /\A[0-9]+\z/s) { + my $uid = $rest[0]; + $uid = $ibx->over->get_art($uid) ? " $uid" : ''; + "* SEARCH$uid\r\n$tag OK\r\n"; + } else { + "$tag BAD\r\n"; + } + } else { + "$tag BAD\r\n"; + } +} + +sub args_ok ($$) { # duplicated from PublicInbox::NNTP + my ($cb, $argc) = @_; + my $tot = prototype $cb; + my ($nreq, undef) = split(';', $tot); + $nreq = ($nreq =~ tr/$//) - 1; + $tot = ($tot =~ tr/$//) - 1; + ($argc <= $tot && $argc >= $nreq); +} + +# returns 1 if we can continue, 0 if not due to buffered writes or disconnect +sub process_line ($$) { + my ($self, $l) = @_; + my ($tag, $req, @args) = split(/[ \t]+/, $l); + if (@args && uc($req) eq 'UID') { + $req .= "_".(shift @args); + } + my $res = eval { + if (my $cmd = $self->can('cmd_'.lc($req // ''))) { + $cmd->($self, $tag, @args); + } else { # this is weird + auth_challenge_ok($self) // + "$tag BAD Error in IMAP command $req: ". + "Unknown command\r\n"; + } + }; + my $err = $@; + if ($err && $self->{sock}) { + $l =~ s/\r?\n//s; + err($self, 'error from: %s (%s)', $l, $err); + $res = "$tag BAD program fault - command not performed\r\n"; + } + return 0 unless defined $res; + $self->write($res); +} + +sub long_step { + my ($self) = @_; + # wbuf is unset or empty, here; {long} may add to it + my ($fd, $cb, $t0, @args) = @{$self->{long_cb}}; + my $more = eval { $cb->($self, @args) }; + if ($@ || !$self->{sock}) { # something bad happened... + delete $self->{long_cb}; + my $elapsed = now() - $t0; + if ($@) { + err($self, + "%s during long response[$fd] - %0.6f", + $@, $elapsed); + } + out($self, " deferred[$fd] aborted - %0.6f", $elapsed); + $self->close; + } elsif ($more) { # $self->{wbuf}: + $self->update_idle_time; + + # COMPRESS users all share the same DEFLATE context. + # Flush it here to ensure clients don't see + # each other's data + $self->zflush; + + # no recursion, schedule another call ASAP, but only after + # all pending writes are done. autovivify wbuf: + my $new_size = push(@{$self->{wbuf}}, \&long_step); + + # wbuf may be populated by $cb, no need to rearm if so: + $self->requeue if $new_size == 1; + } else { # all done! + delete $self->{long_cb}; + my $elapsed = now() - $t0; + my $fd = fileno($self->{sock}); + out($self, " deferred[$fd] done - %0.6f", $elapsed); + my $wbuf = $self->{wbuf}; # do NOT autovivify + + $self->requeue unless $wbuf && @$wbuf; + } +} + +sub err ($$;@) { + my ($self, $fmt, @args) = @_; + printf { $self->{imapd}->{err} } $fmt."\n", @args; +} + +sub out ($$;@) { + my ($self, $fmt, @args) = @_; + printf { $self->{imapd}->{out} } $fmt."\n", @args; +} + +sub long_response ($$;@) { + my ($self, $cb, @args) = @_; # cb returns true if more, false if done + + my $sock = $self->{sock} or return; + # make sure we disable reading during a long response, + # clients should not be sending us stuff and making us do more + # work while we are stream a response to them + $self->{long_cb} = [ fileno($sock), $cb, now(), @args ]; + long_step($self); # kick off! + undef; +} + +# callback used by PublicInbox::DS for any (e)poll (in/out/hup/err) +sub event_step { + my ($self) = @_; + + return unless $self->flush_write && $self->{sock}; + + $self->update_idle_time; + # only read more requests if we've drained the write buffer, + # otherwise we can be buffering infinitely w/o backpressure + + my $rbuf = $self->{rbuf} // (\(my $x = '')); + my $r = 1; + + if (index($$rbuf, "\n") < 0) { + my $off = length($$rbuf); + $r = $self->do_read($rbuf, LINE_MAX, $off) or return; + } + while ($r > 0 && $$rbuf =~ s/\A[ \t]*([^\n]*?)\r?\n//) { + my $line = $1; + return $self->close if $line =~ /[[:cntrl:]]/s; + my $t0 = now(); + my $fd = fileno($self->{sock}); + $r = eval { process_line($self, $line) }; + my $pending = $self->{wbuf} ? ' pending' : ''; + out($self, "[$fd] %s - %0.6f$pending", $line, now() - $t0); + } + + return $self->close if $r < 0; + my $len = length($$rbuf); + return $self->close if ($len >= LINE_MAX); + $self->rbuf_idle($rbuf); + $self->update_idle_time; + + # maybe there's more pipelined data, or we'll have + # to register it for socket-readiness notifications + $self->requeue unless $self->{wbuf}; +} + +sub compressed { undef } + +sub zflush {} # overridden by IMAPdeflate + +# RFC 4978 +sub cmd_compress ($$$) { + my ($self, $tag, $alg) = @_; + return "$tag BAD DEFLATE only\r\n" if uc($alg) ne "DEFLATE"; + return "$tag BAD COMPRESS active\r\n" if $self->compressed; + + # CRIME made TLS compression obsolete + # return "$tag NO [COMPRESSIONACTIVE]\r\n" if $self->tls_compressed; + + PublicInbox::IMAPdeflate->enable($self, $tag); + $self->requeue; + undef +} + +sub cmd_starttls ($$) { + my ($self, $tag) = @_; + my $sock = $self->{sock} or return; + if ($sock->can('stop_SSL') || $self->compressed) { + return "$tag BAD TLS or compression already enabled\r\n"; + } + my $opt = $self->{imapd}->{accept_tls} or + return "$tag BAD can not initiate TLS negotiation\r\n"; + $self->write(\"$tag OK begin TLS negotiation now\r\n"); + $self->{sock} = IO::Socket::SSL->start_SSL($sock, %$opt); + $self->requeue if PublicInbox::DS::accept_tls_step($self); + undef; +} + +# for graceful shutdown in PublicInbox::Daemon: +sub busy { + my ($self, $now) = @_; + ($self->{rbuf} || $self->{wbuf} || $self->not_idle_long($now)); +} + +# we're read-only, so SELECT and EXAMINE do the same thing +no warnings 'once'; +*cmd_select = \&cmd_examine; + +1; diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm new file mode 100644 index 00000000000..1011d6a413b --- /dev/null +++ b/lib/PublicInbox/IMAPD.pm @@ -0,0 +1,15 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +# represents an IMAPD (currently a singleton), +# see script/public-inbox-imapd for how it is used +package PublicInbox::IMAPD; +use strict; +use parent qw(PublicInbox::NNTPD); + +sub new { + my ($class) = @_; + $class->SUPER::new; # PublicInbox::NNTPD->new +} + +1; diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm new file mode 100644 index 00000000000..9366db7a7fc --- /dev/null +++ b/lib/PublicInbox/IMAPdeflate.pm @@ -0,0 +1,119 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# TODO: reduce duplication from PublicInbox::NNTPdeflate + +# RFC 4978 +package PublicInbox::IMAPdeflate; +use strict; +use warnings; +use 5.010_001; +use base qw(PublicInbox::IMAP); +use Compress::Raw::Zlib; +use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways + +my %IN_OPT = ( + -Bufsize => 1024, + -WindowBits => -15, # RFC 1951 + -AppendOutput => 1, +); + +# global deflate context and buffer +my $zbuf = \(my $buf = ''); +my $zout; +{ + my $err; + ($zout, $err) = Compress::Raw::Zlib::Deflate->new( + # nnrpd (INN) and Compress::Raw::Zlib favor MemLevel=9, + # the zlib C library and git use MemLevel=8 as the default + # -MemLevel => 9, + -Bufsize => 65536, # same as nnrpd + -WindowBits => -15, # RFC 1951 + -AppendOutput => 1, + ); + $err == Z_OK or die "Failed to initialize zlib deflate stream: $err"; +} + +sub enable { + my ($class, $self, $tag) = @_; + my ($in, $err) = Compress::Raw::Zlib::Inflate->new(%IN_OPT); + if ($err != Z_OK) { + $self->err("Inflate->new failed: $err"); + $self->write(\"$tag BAD failed to activate compression\r\n"); + return; + } + unlock_hash(%$self); + $self->write(\"$tag OK DEFLATE active\r\n"); + bless $self, $class; + $self->{zin} = $in; +} + +# overrides PublicInbox::NNTP::compressed +sub compressed { 1 } + +# $_[1] may be a reference or not +sub do_read ($$$$) { + my ($self, $rbuf, $len, $off) = @_; + + my $zin = $self->{zin} or return; # closed + my $doff; + my $dbuf = delete($self->{dbuf}) // ''; + $doff = length($dbuf); + my $r = PublicInbox::DS::do_read($self, \$dbuf, $len, $doff) or return; + + # assert(length($$rbuf) == $off) as far as NNTP.pm is concerned + # -ConsumeInput is true, so $dbuf is automatically emptied + my $err = $zin->inflate($dbuf, $rbuf); + if ($err == Z_OK) { + $self->{dbuf} = $dbuf if $dbuf ne ''; + $r = length($$rbuf) and return $r; + # nothing ready, yet, get more, later + $self->requeue; + } else { + delete $self->{zin}; + $self->close; + } + 0; +} + +# override PublicInbox::DS::msg_more +sub msg_more ($$) { + my $self = $_[0]; + + # $_[1] may be a reference or not for ->deflate + my $err = $zout->deflate($_[1], $zbuf); + $err == Z_OK or die "->deflate failed $err"; + 1; +} + +sub zflush ($) { + my ($self) = @_; + + my $deflated = $zbuf; + $zbuf = \(my $next = ''); + + my $err = $zout->flush($deflated, Z_FULL_FLUSH); + $err == Z_OK or die "->flush failed $err"; + + # We can still let the lower socket layer do buffering: + PublicInbox::DS::msg_more($self, $$deflated); +} + +# compatible with PublicInbox::DS::write, so $_[1] may be a reference or not +sub write ($$) { + my $self = $_[0]; + return PublicInbox::DS::write($self, $_[1]) if ref($_[1]) eq 'CODE'; + + my $deflated = $zbuf; + $zbuf = \(my $next = ''); + + # $_[1] may be a reference or not for ->deflate + my $err = $zout->deflate($_[1], $deflated); + $err == Z_OK or die "->deflate failed $err"; + $err = $zout->flush($deflated, Z_FULL_FLUSH); + $err == Z_OK or die "->flush failed $err"; + + # We can still let the socket layer do buffering: + PublicInbox::DS::write($self, $deflated); +} + +1; diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm index e8f9c9a3681..725d4206282 100644 --- a/lib/PublicInbox/Smsg.pm +++ b/lib/PublicInbox/Smsg.pm @@ -131,14 +131,20 @@ sub populate { my @DoW = qw(Sun Mon Tue Wed Thu Fri Sat); my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec); -sub date ($) { +sub date ($) { # for NNTP my ($self) = @_; my $ds = $self->{ds}; return unless defined $ds; my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($ds); "$DoW[$wday], " . sprintf("%02d $MoY[$mon] %04d %02d:%02d:%02d +0000", $mday, $year+1900, $hour, $min, $sec); +} +sub internaldate { # for IMAP + my ($self) = @_; + my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($self->{ts} // 0); + sprintf("%02d-$MoY[$mon]-%04d %02d:%02d:%02d +0000", + $mday, $year+1900, $hour, $min, $sec); } our $REPLY_RE = qr/^re:\s+/i; diff --git a/script/public-inbox-imapd b/script/public-inbox-imapd new file mode 100644 index 00000000000..63f865f53f0 --- /dev/null +++ b/script/public-inbox-imapd @@ -0,0 +1,14 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# +# Standalone read-only IMAP server for public-inbox. +use strict; +use PublicInbox::Daemon; +use PublicInbox::IMAPdeflate; # loads PublicInbox::IMAP +use PublicInbox::IMAPD; +my $imapd = PublicInbox::IMAPD->new; +PublicInbox::Daemon::run('0.0.0.0:143', + sub { $imapd->refresh_groups }, # refresh + sub ($$$) { PublicInbox::IMAP->new($_[0], $imapd) }, # post_accept + $imapd); diff --git a/t/imapd-tls.t b/t/imapd-tls.t new file mode 100644 index 00000000000..9f5abfe048e --- /dev/null +++ b/t/imapd-tls.t @@ -0,0 +1,209 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use warnings; +use Test::More; +use Socket qw(IPPROTO_TCP SOL_SOCKET); +use PublicInbox::TestCommon; +# IO::Poll is part of the standard library, but distros may split it off... +require_mods(qw(DBD::SQLite IO::Socket::SSL Mail::IMAPClient IO::Poll)); +Mail::IMAPClient->can('starttls') or + plan skip_all => 'Mail::IMAPClient does not support TLS'; +my $cert = 'certs/server-cert.pem'; +my $key = 'certs/server-key.pem'; +unless (-r $key && -r $cert) { + plan skip_all => + "certs/ missing for $0, run $^X ./create-certs.perl in certs/"; +} +use_ok 'PublicInbox::TLS'; +use_ok 'IO::Socket::SSL'; +use PublicInbox::InboxWritable; +require PublicInbox::SearchIdx; +my $version = 1; # v2 needs newer git +require_git('2.6') if $version >= 2; +my ($tmpdir, $for_destroy) = tmpdir(); +my $err = "$tmpdir/stderr.log"; +my $out = "$tmpdir/stdout.log"; +my $inboxdir = "$tmpdir"; +my $pi_config = "$tmpdir/pi_config"; +my $group = 'test-imapd-tls'; +my $addr = $group . '@example.com'; +my $starttls = tcp_server(); +my $imaps = tcp_server(); +my $ibx = PublicInbox::Inbox->new({ + inboxdir => $inboxdir, + name => 'imapd-tls', + version => $version, + -primary_address => $addr, + indexlevel => 'basic', +}); +$ibx = PublicInbox::InboxWritable->new($ibx, {nproc=>1}); +$ibx->init_inbox(0); +{ + open my $fh, '>', $pi_config or BAIL_OUT "open: $!"; + print $fh <<EOF +[publicinbox "imapd-tls"] + inboxdir = $inboxdir + address = $addr + indexlevel = basic + newsgroup = $group +EOF + ; + close $fh or BAIL_OUT "close: $!\n"; +} + +{ + my $im = $ibx->importer(0); + ok($im->add(eml_load('t/data/0001.patch')), 'message added'); + $im->done; + if ($version == 1) { + my $s = PublicInbox::SearchIdx->new($ibx, 1); + $s->index_sync; + } +} + +my $imaps_addr = $imaps->sockhost . ':' . $imaps->sockport; +my $starttls_addr = $starttls->sockhost . ':' . $starttls->sockport; +my $env = { PI_CONFIG => $pi_config }; +my $td; + +# Mail::IMAPClient ->compress creates cyclic reference: +# https://rt.cpan.org/Ticket/Display.html?id=132654 +my $compress_logout = sub { + my ($c) = @_; + ok($c->logout, 'logout ok after ->compress'); + # all documented in Mail::IMAPClient manpage: + for (qw(Readmoremethod Readmethod Prewritemethod)) { + $c->$_(undef); + } +}; + + +for my $args ( + [ "--cert=$cert", "--key=$key", + "-limaps://$imaps_addr", + "-limap://$starttls_addr" ], +) { + for ($out, $err) { + open my $fh, '>', $_ or BAIL_OUT "truncate: $!"; + } + my $cmd = [ '-imapd', '-W0', @$args, "--stdout=$out", "--stderr=$err" ]; + $td = start_script($cmd, $env, { 3 => $starttls, 4 => $imaps }); + my %o = ( + SSL_hostname => 'server.local', + SSL_verifycn_name => 'server.local', + SSL_verify_mode => SSL_VERIFY_PEER(), + SSL_ca_file => 'certs/test-ca.pem', + ); + # start negotiating a slow TLS connection + my $slow = tcp_connect($imaps, Blocking => 0); + $slow = IO::Socket::SSL->start_SSL($slow, SSL_startHandshake => 0, %o); + my $slow_done = $slow->connect_SSL; + my @poll; + if ($slow_done) { + diag('W: connect_SSL early OK, slow client test invalid'); + use PublicInbox::Syscall qw(EPOLLIN EPOLLOUT); + @poll = (fileno($slow), EPOLLIN | EPOLLOUT); + } else { + @poll = (fileno($slow), PublicInbox::TLS::epollbit()); + } + # we should call connect_SSL much later... + my %imaps_opt = (User => 'a', Password => 'b', + Server => $imaps->sockhost, + Port => $imaps->sockport); + # IMAPS + my $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]); + ok($c && $c->IsAuthenticated, 'authenticated'); + ok($c->select($group), 'SELECT works'); + ok(!(scalar $c->has_capability('STARTTLS')), + 'starttls not advertised with IMAPS'); + ok(!$c->starttls, "starttls fails"); + ok($c->has_capability('COMPRESS'), 'compress advertised'); + ok($c->compress, 'compression enabled with IMAPS'); + ok(!$c->starttls, 'starttls still fails'); + ok($c->noop, 'noop succeeds'); + $compress_logout->($c); + + # STARTTLS + my %imap_opt = (Server => $starttls->sockhost, + Port => $starttls->sockport); + $c = Mail::IMAPClient->new(%imap_opt); + ok(scalar $c->has_capability('STARTTLS'), + 'starttls advertised'); + ok($c->Starttls([ %o ]), 'set starttls options'); + ok($c->starttls, '->starttls works'); + ok(!(scalar($c->has_capability('STARTTLS'))), + 'starttls not advertised'); + ok(!$c->starttls, '->starttls again fails'); + ok(!(scalar($c->has_capability('STARTTLS'))), + 'starttls still not advertised'); + ok($c->examine($group), 'EXAMINE works'); + ok($c->noop, 'NOOP works'); + ok($c->compress, 'compression enabled with IMAPS'); + ok($c->noop, 'NOOP works after compress'); + $compress_logout->($c); + + # STARTTLS with bad hostname + $o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.invalid'; + $c = Mail::IMAPClient->new(%imap_opt); + ok(scalar $c->has_capability('STARTTLS'), 'starttls advertised'); + ok($c->Starttls([ %o ]), 'set starttls options'); + ok(!$c->starttls, '->starttls fails with bad hostname'); + + $c = Mail::IMAPClient->new(%imap_opt); + ok($c->noop, 'NOOP still works from plain IMAP'); + + # IMAPS with bad hostname + $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]); + is($c, undef, 'IMAPS fails with bad hostname'); + + # make hostname valid + $o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.local'; + $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]); + ok($c, 'IMAPS succeeds again with valid hostname'); + + # slow TLS connection did not block the other fast clients while + # connecting, finish it off: + until ($slow_done) { + IO::Poll::_poll(-1, @poll); + $slow_done = $slow->connect_SSL and last; + @poll = (fileno($slow), PublicInbox::TLS::epollbit()); + } + $slow->blocking(1); + ok(sysread($slow, my $greet, 4096) > 0, 'slow got a greeting'); + like($greet, qr/\A\* OK \[CAPABILITY IMAP4rev1 /, 'got greeting'); + is(syswrite($slow, "1 LOGOUT\r\n"), 10, 'slow wrote LOGOUT'); + ok(sysread($slow, my $end, 4096) > 0, 'got end'); + is(sysread($slow, my $eof, 4096), 0, 'got EOF'); + + SKIP: { + skip 'TCP_DEFER_ACCEPT is Linux-only', 2 if $^O ne 'linux'; + my $var = eval { Socket::TCP_DEFER_ACCEPT() } // 9; + defined(my $x = getsockopt($imaps, IPPROTO_TCP, $var)) or die; + ok(unpack('i', $x) > 0, 'TCP_DEFER_ACCEPT set on IMAPS'); + defined($x = getsockopt($starttls, IPPROTO_TCP, $var)) or die; + is(unpack('i', $x), 0, 'TCP_DEFER_ACCEPT is 0 on plain IMAP'); + }; + SKIP: { + skip 'SO_ACCEPTFILTER is FreeBSD-only', 2 if $^O ne 'freebsd'; + if (system('kldstat -m accf_data >/dev/null')) { + skip 'accf_data not loaded? kldload accf_data', 2; + } + require PublicInbox::Daemon; + my $var = PublicInbox::Daemon::SO_ACCEPTFILTER(); + my $x = getsockopt($imaps, SOL_SOCKET, $var); + like($x, qr/\Adataready\0+\z/, 'got dataready accf for IMAPS'); + $x = getsockopt($starttls, IPPROTO_TCP, $var); + is($x, undef, 'no BSD accept filter for plain IMAP'); + }; + + $c = undef; + $td->kill; + $td->join; + is($?, 0, 'no error in exited process'); + open my $fh, '<', $err or BAIL_OUT "open $err failed: $!"; + my $eout = do { local $/; <$fh> }; + unlike($eout, qr/wide/i, 'no Wide character warnings'); +} + +done_testing; diff --git a/t/imapd.t b/t/imapd.t new file mode 100644 index 00000000000..f28a663bf9d --- /dev/null +++ b/t/imapd.t @@ -0,0 +1,149 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use Test::More; +use PublicInbox::TestCommon; +require_mods(qw(DBD::SQLite Mail::IMAPClient)); +my $level = '-Lbasic'; +SKIP: { + require_mods('Search::Xapian', 1); + $level = '-Lmedium'; +}; + +my @V = (1); +#push(@V, 2) if require_git('2.6', 1); + +my ($tmpdir, $for_destroy) = tmpdir(); +my $home = "$tmpdir/home"; +local $ENV{HOME} = $home; + +for my $V (@V) { + my $addr = "i$V\@example.com"; + my $name = "i$V"; + my $url = "http://example.com/i$V"; + my $inboxdir = "$tmpdir/$name"; + my $folder = "inbox.i$V"; + my $cmd = ['-init', "-V$V", $level, $name, $inboxdir, $url, $addr]; + run_script($cmd) or BAIL_OUT("init $name"); + xsys(qw(git config), "--file=$ENV{HOME}/.public-inbox/config", + "publicinbox.$name.newsgroup", $folder) == 0 or + BAIL_OUT("setting newsgroup $V"); + if ($V == 1) { + xsys(qw(git config), "--file=$ENV{HOME}/.public-inbox/config", + 'publicinboxmda.spamcheck', 'none') == 0 or + BAIL_OUT("config: $?"); + } + open(my $fh, '<', 't/utf8.eml') or BAIL_OUT("open t/utf8.eml: $!"); + my $env = { ORIGINAL_RECIPIENT => $addr }; + run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or + BAIL_OUT('-mda delivery'); + if ($V == 1) { + run_script(['-index', $inboxdir]) or BAIL_OUT("index $?"); + } +} +my $sock = tcp_server(); +my $err = "$tmpdir/stderr.log"; +my $out = "$tmpdir/stdout.log"; +my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err" ]; +my $td = start_script($cmd, undef, { 3 => $sock }) or BAIL_OUT("-imapd: $?"); +my %mic_opt = ( + Server => $sock->sockhost, + Port => $sock->sockport, + Uid => 1, +); +my $mic = Mail::IMAPClient->new(%mic_opt); +my $pre_login_capa = $mic->capability; +is(grep(/\AAUTH=ANONYMOUS\z/, @$pre_login_capa), 1, + 'AUTH=ANONYMOUS advertised pre-login'); + +$mic->User('lorelei'); +$mic->Password('Hunter2'); +ok($mic->login && $mic->IsAuthenticated, 'LOGIN works'); +my $post_login_capa = $mic->capability; +ok(join("\n", @$pre_login_capa) ne join("\n", @$post_login_capa), + 'got different capabilities post-login'); + +$mic_opt{Authmechanism} = 'ANONYMOUS'; +$mic_opt{Authcallback} = sub { '' }; +$mic = Mail::IMAPClient->new(%mic_opt); +ok($mic && $mic->login && $mic->IsAuthenticated, 'AUTHENTICATE ANONYMOUS'); +my $post_auth_anon_capa = $mic->capability; +is_deeply($post_auth_anon_capa, $post_login_capa, + 'auth anon has same capabilities'); +my $e; +ok(!$mic->examine('foo') && ($e = $@), 'EXAMINE non-existent'); +like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent'); +ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent'); +like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent'); +ok($mic->select('inbox.i1'), 'SELECT succeeds'); +ok($mic->examine('inbox.i1'), 'EXAMINE succeeds'); + +my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@"; +is_deeply($ret, [ 1 ], 'search all works'); +$ret = $mic->search('uid 1') or BAIL_OUT "SEARCH FAIL $@"; +is_deeply($ret, [ 1 ], 'search UID 1 works'); +$ret = $mic->search('uid 1:1') or BAIL_OUT "SEARCH FAIL $@"; +is_deeply($ret, [ 1 ], 'search UID 1:1 works'); +$ret = $mic->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@"; +is_deeply($ret, [ 1 ], 'search UID 1:* works'); + +is_deeply(scalar $mic->flags('1'), [], '->flags works'); + +for my $r ('1:*', '1') { + $ret = $mic->fetch_hash($r, 'RFC822') or BAIL_OUT "FETCH $@"; + is_deeply([keys %$ret], [1]); + like($ret->{1}->{RFC822}, qr/\r\n\r\nThis is a test/, 'read full'); + + # ensure Mail::IMAPClient behaves + my $str = $mic->message_string($r) or BAIL_OUT "->message_string: $@"; + is($str, $ret->{1}->{RFC822}, '->message_string works as expected'); + + my $sz = $mic->fetch_hash($r, 'RFC822.size') or BAIL_OUT "FETCH $@"; + is($sz->{1}->{'RFC822.SIZE'}, length($ret->{1}->{RFC822}), + 'RFC822.SIZE'); + + $ret = $mic->fetch_hash($r, 'RFC822.HEADER') or BAIL_OUT "FETCH $@"; + is_deeply([keys %$ret], [1]); + like($ret->{1}->{'RFC822.HEADER'}, + qr/^Message-ID: <testmessage\@example\.com>/ms, 'read header'); + + $ret = $mic->fetch_hash($r, 'INTERNALDATE') or BAIL_OUT "FETCH $@"; + is($ret->{1}->{'INTERNALDATE'}, '01-Jan-1970 00:00:00 +0000', + 'internaldate matches'); + ok(!$mic->fetch_hash($r, 'INFERNALDATE'), 'bogus attribute fails'); + + my $envelope = $mic->get_envelope($r) or BAIL_OUT("get_envelope: $@"); + is($envelope->{bcc}, 'NIL', 'empty bcc'); + is($envelope->{messageid}, '<testmessage@example.com>', 'messageid'); + is(scalar @{$envelope->{to}}, 1, 'one {to} header'); + # *sigh* too much to verify... + #use Data::Dumper; diag Dumper($envelope); + + $ret = $mic->fetch_hash($r, 'FLAGS') or BAIL_OUT "FETCH $@"; + is_deeply($ret->{1}->{FLAGS}, '', 'no flags'); +} + +# Mail::IMAPClient ->compress creates cyclic reference: +# https://rt.cpan.org/Ticket/Display.html?id=132654 +my $compress_logout = sub { + my ($c) = @_; + ok($c->logout, 'logout ok after ->compress'); + # all documented in Mail::IMAPClient manpage: + for (qw(Readmoremethod Readmethod Prewritemethod)) { + $c->$_(undef); + } +}; + +is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap'); +ok($mic->compress, 'compress enabled'); +$compress_logout->($mic); + +$td->kill; +$td->join; +is($?, 0, 'no error in exited process'); +open my $fh, '<', $err or BAIL_OUT("open $err failed: $!"); +my $eout = do { local $/; <$fh> }; +unlike($eout, qr/wide/i, 'no Wide character warnings'); + +done_testing;
This will be used to implement IMAP IDLE, first. Eventually, it may be used to trigger other things: * incremental internal updates for manifest.js.gz * restart `git cat-file' processes on pack index unlink * IMAP IDLE-like long-polling HTTP endpoint And maybe more things we haven't thought of, yet. It uses Linux::Inotify2 or IO::KQueue depending on what packages are installed and what the kernel supports. It falls back to nanosecond-aware Time::HiRes::stat() (available with Perl 5.10.0+) on systems lacking Linux::Inotify2 and IO::KQueue. In the future, a pure Perl alternative to Linux::Inotify2 may be supplied for users of architectures we already support signalfd and epoll on. v2 changes: - avoid O_TRUNC on lock file - change ctime on Linux systems w/o inotify - fix naming of comments and fields --- MANIFEST | 5 +++ lib/PublicInbox/FakeInotify.pm | 39 ++++++++++++++++++ lib/PublicInbox/In2Tie.pm | 17 ++++++++ lib/PublicInbox/Inbox.pm | 20 ++++++++++ lib/PublicInbox/InboxIdle.pm | 55 ++++++++++++++++++++++++++ lib/PublicInbox/KQNotify.pm | 60 ++++++++++++++++++++++++++++ lib/PublicInbox/Lock.pm | 7 ++++ t/inbox_idle.t | 72 ++++++++++++++++++++++++++++++++++ 8 files changed, 275 insertions(+) create mode 100644 lib/PublicInbox/FakeInotify.pm create mode 100644 lib/PublicInbox/In2Tie.pm create mode 100644 lib/PublicInbox/InboxIdle.pm create mode 100644 lib/PublicInbox/KQNotify.pm create mode 100644 t/inbox_idle.t diff --git a/MANIFEST b/MANIFEST index 73b874b42a0..8aff192c7ee 100644 --- a/MANIFEST +++ b/MANIFEST @@ -109,6 +109,7 @@ lib/PublicInbox/Emergency.pm lib/PublicInbox/Eml.pm lib/PublicInbox/EmlContentFoo.pm lib/PublicInbox/ExtMsg.pm +lib/PublicInbox/FakeInotify.pm lib/PublicInbox/Feed.pm lib/PublicInbox/Filter/Base.pm lib/PublicInbox/Filter/Gmane.pm @@ -129,8 +130,11 @@ lib/PublicInbox/IMAP.pm lib/PublicInbox/IMAPD.pm lib/PublicInbox/IMAPdeflate.pm lib/PublicInbox/Import.pm +lib/PublicInbox/In2Tie.pm lib/PublicInbox/Inbox.pm +lib/PublicInbox/InboxIdle.pm lib/PublicInbox/InboxWritable.pm +lib/PublicInbox/KQNotify.pm lib/PublicInbox/Linkify.pm lib/PublicInbox/Listener.pm lib/PublicInbox/Lock.pm @@ -266,6 +270,7 @@ t/imapd-tls.t t/imapd.t t/import.t t/inbox.t +t/inbox_idle.t t/index-git-times.t t/indexlevels-mirror-v1.t t/indexlevels-mirror.t diff --git a/lib/PublicInbox/FakeInotify.pm b/lib/PublicInbox/FakeInotify.pm new file mode 100644 index 00000000000..bd610463181 --- /dev/null +++ b/lib/PublicInbox/FakeInotify.pm @@ -0,0 +1,39 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +# for systems lacking Linux::Inotify2 or IO::KQueue, just emulates +# enough of Linux::Inotify2 +package PublicInbox::FakeInotify; +use strict; +use Time::HiRes qw(stat); +my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify + +sub new { bless { watch => {} }, __PACKAGE__ } + +# behaves like Linux::Inotify2->watch +sub watch { + my ($self, $path, $mask, $cb) = @_; + my @st = stat($path) or return; + $self->{watch}->{"$path\0$mask"} = [ @st, $cb ]; +} + +# behaves like non-blocking Linux::Inotify2->poll +sub poll { + my ($self) = @_; + my $watch = $self->{watch} or return; + for my $x (keys %$watch) { + my ($path, $mask) = split(/\0/, $x, 2); + my @now = stat($path) or next; + my $prv = $watch->{$x}; + my $cb = $prv->[-1]; + # 10: ctime, 7: size + if ($prv->[10] != $now[10]) { + if (($mask & $IN_CLOSE) == $IN_CLOSE) { + eval { $cb->() }; + } + } + @$prv = (@now, $cb); + } +} + +1; diff --git a/lib/PublicInbox/In2Tie.pm b/lib/PublicInbox/In2Tie.pm new file mode 100644 index 00000000000..db1dc1045c1 --- /dev/null +++ b/lib/PublicInbox/In2Tie.pm @@ -0,0 +1,17 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +# used to ensure PublicInbox::DS can call fileno() as a function +# on Linux::Inotify2 objects +package PublicInbox::In2Tie; +use strict; + +sub TIEHANDLE { + my ($class, $in2) = @_; + bless \$in2, $class; # a scalar reference to an existing reference +} + +# this calls Linux::Inotify2::fileno +sub FILENO { ${$_[0]}->fileno } + +1; diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm index af034358b15..b250bef33d3 100644 --- a/lib/PublicInbox/Inbox.pm +++ b/lib/PublicInbox/Inbox.pm @@ -391,4 +391,24 @@ sub altid_map ($) { } // {}; } +# $obj must respond to ->on_inbox_unlock, which takes Inbox ($self) as an arg +sub subscribe_unlock { + my ($self, $ident, $obj) = @_; + $self->{unlock_subs}->{$ident} = $obj; +} + +sub unsubscribe_unlock { + my ($self, $ident) = @_; + delete $self->{unlock_subs}->{$ident}; +} + +# called by inotify +sub on_unlock { + my ($self) = @_; + my $subs = $self->{unlock_subs} or return; + for (values %$subs) { + eval { $_->on_inbox_unlock($self) }; + } +} + 1; diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm new file mode 100644 index 00000000000..095a801c946 --- /dev/null +++ b/lib/PublicInbox/InboxIdle.pm @@ -0,0 +1,55 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +package PublicInbox::InboxIdle; +use strict; +use base qw(PublicInbox::DS); +use fields qw(pi_config inot); +use Symbol qw(gensym); +use PublicInbox::Syscall qw(EPOLLIN EPOLLET); +my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify +my $ino_cls; +if ($^O eq 'linux' && eval { require Linux::Inotify2; 1 }) { + $IN_CLOSE = Linux::Inotify2::IN_CLOSE(); + $ino_cls = 'Linux::Inotify2'; +} elsif (eval { require PublicInbox::KQNotify }) { + $IN_CLOSE = PublicInbox::KQNotify::IN_CLOSE(); + $ino_cls = 'PublicInbox::KQNotify'; +} +require PublicInbox::In2Tie if $ino_cls; + +sub in2_arm ($$) { # PublicInbox::Config::each_inbox callback + my ($ibx, $inot) = @_; + my $path = "$ibx->{inboxdir}/"; + $path .= $ibx->version >= 2 ? 'inbox.lock' : 'ssoma.lock'; + $inot->watch($path, $IN_CLOSE, sub { $ibx->on_unlock }); + # TODO: detect deleted packs (and possibly other files) +} + +sub new { + my ($class, $pi_config) = @_; + my $self = fields::new($class); + my $inot; + if ($ino_cls) { + $inot = $ino_cls->new or die "E: $ino_cls->new: $!"; + my $sock = gensym; + tie *$sock, 'PublicInbox::In2Tie', $inot; + $inot->blocking(0); + $inot->on_overflow(undef); # broadcasts everything on overflow + $self->SUPER::new($sock, EPOLLIN | EPOLLET); + } else { + require PublicInbox::FakeInotify; + $inot = PublicInbox::FakeInotify->new; + } + $self->{inot} = $inot; + $pi_config->each_inbox(\&in2_arm, $inot); + $self; +} + +sub event_step { + my ($self) = @_; + eval { $self->{inot}->poll }; # Linux::Inotify2::poll + warn "$self->{inot}->poll err: $@\n" if $@; +} + +1; diff --git a/lib/PublicInbox/KQNotify.pm b/lib/PublicInbox/KQNotify.pm new file mode 100644 index 00000000000..3cf9c0f5a0c --- /dev/null +++ b/lib/PublicInbox/KQNotify.pm @@ -0,0 +1,60 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +# implements the small subset of Linux::Inotify2 functionality we use +# using IO::KQueue on *BSD systems. +package PublicInbox::KQNotify; +use strict; +use IO::KQueue; +use PublicInbox::DSKQXS; # wraps IO::KQueue for fork-safe DESTROY + +# only true as far as public-inbox is concerned with .lock files: +sub IN_CLOSE () { NOTE_WRITE } +#sub IN_CLOSE () { 0x200 } # NOTE_CLOSE_WRITE (FreeBSD 11+ only) + +sub new { + my ($class) = @_; + bless { dskq => PublicInbox::DSKQXS->new, watch => {} }, $class; +} + +sub watch { + my ($self, $path, $mask, $cb) = @_; + open(my $fh, '<', $path) or return; + my $ident = fileno($fh); + $self->{dskq}->{kq}->EV_SET($ident, # ident + EVFILT_VNODE, # filter + EV_ADD | EV_CLEAR, # flags + $mask, # fflags + 0, 0); # data, udata + if ($mask == IN_CLOSE) { + $self->{watch}->{$ident} = [ $fh, $cb ]; + } else { + die "TODO Not implemented: $mask"; + } +} + +# emulate Linux::Inotify::fileno +sub fileno { ${$_[0]->{dskq}->{kq}} } + +# noop for Linux::Inotify2 compatibility. Unlike inotify, +# kqueue doesn't seem to overflow since it's limited by the number of +# open FDs the process has +sub on_overflow {} + +# noop for Linux::Inotify2 compatibility, we use `0' timeout for ->kevent +sub blocking {} + +# behave like Linux::Inotify2::poll +sub poll { + my ($self) = @_; + my @kevents = $self->{dskq}->{kq}->kevent(0); + for my $kev (@kevents) { + my $ident = $kev->[KQ_IDENT]; + my $mask = $kev->[KQ_FFLAGS]; + if (($mask & IN_CLOSE) == IN_CLOSE) { + eval { $self->{watch}->{$ident}->[1]->() }; + } + } +} + +1; diff --git a/lib/PublicInbox/Lock.pm b/lib/PublicInbox/Lock.pm index 032841ed5cf..693a37949a6 100644 --- a/lib/PublicInbox/Lock.pm +++ b/lib/PublicInbox/Lock.pm @@ -24,6 +24,13 @@ sub lock_release { my ($self) = @_; return unless $self->{lock_path}; my $lockfh = delete $self->{lockfh} or croak 'not locked'; + + # NetBSD 8.1 and OpenBSD 6.5 (and maybe other versions/*BSDs) lack + # NOTE_CLOSE_WRITE from FreeBSD 11+, so trigger NOTE_WRITE, instead. + # We also need to change the ctime on Linux systems w/o inotify + if ($^O ne 'linux' || !eval { require Linux::Inotify2; 1 }) { + syswrite($lockfh, '.'); + } flock($lockfh, LOCK_UN) or die "unlock failed: $!\n"; close $lockfh or die "close failed: $!\n"; } diff --git a/t/inbox_idle.t b/t/inbox_idle.t new file mode 100644 index 00000000000..6bd56113602 --- /dev/null +++ b/t/inbox_idle.t @@ -0,0 +1,72 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use Test::More; +use PublicInbox::TestCommon; +use PublicInbox::Config; +require_mods(qw(DBD::SQLite)); +require PublicInbox::SearchIdx; +use_ok 'PublicInbox::InboxIdle'; +require_git('2.6'); +use PublicInbox::InboxWritable; +my ($tmpdir, $for_destroy) = tmpdir(); + +for my $V (1, 2) { + my $inboxdir = "$tmpdir/$V"; + mkdir $inboxdir or BAIL_OUT("mkdir: $!"); + my %opt = ( + inboxdir => $inboxdir, + name => 'inbox-idle', + version => $V, + -primary_address => 'test@example.com', + indexlevel => 'basic', + ); + my $ibx = PublicInbox::Inbox->new({ %opt }); + $ibx = PublicInbox::InboxWritable->new($ibx); + my $obj = InboxIdleTestObj->new; + $ibx->init_inbox(0); + my $im = $ibx->importer(0); + if ($V == 1) { + my $sidx = PublicInbox::SearchIdx->new($ibx, 1); + $sidx->_xdb_acquire; + $sidx->set_indexlevel; + $sidx->_xdb_release; # allow watching on lockfile + } + my $pi_config = PublicInbox::Config->new(\<<EOF); +publicinbox.inbox-idle.inboxdir=$inboxdir +publicinbox.inbox-idle.indexlevel=basic +publicinbox.inbox-idle.address=test\@example.com +EOF + my $ident = 'whatever'; + $pi_config->each_inbox(sub { shift->subscribe_unlock($ident, $obj) }); + my $ii = PublicInbox::InboxIdle->new($pi_config); + ok($ii, 'InboxIdle created'); + SKIP: { + skip('inotify or kqueue missing', 1) unless $ii->{sock}; + ok(fileno($ii->{sock}) >= 0, 'fileno() gave valid FD'); + } + ok($im->add(eml_load('t/utf8.eml')), "$V added"); + $im->done; + PublicInbox::SearchIdx->new($ibx)->index_sync if $V == 1; + $ii->event_step; + is(scalar @{$obj->{called}}, 1, 'called on unlock'); + $pi_config->each_inbox(sub { shift->unsubscribe_unlock($ident) }); + ok($im->add(eml_load('t/data/0001.patch')), "$V added #2"); + $im->done; + PublicInbox::SearchIdx->new($ibx)->index_sync if $V == 1; + $ii->event_step; + is(scalar @{$obj->{called}}, 1, 'not called when unsubbed'); + $ii->close; +} + +done_testing; + +package InboxIdleTestObj; +use strict; + +sub new { bless {}, shift } + +sub on_inbox_unlock { + my ($self, $ibx) = @_; + push @{$self->{called}}, $ibx; +}
It seems to be working as far as Mail::IMAPClient is concerned. --- Documentation/standards.perl | 2 +- lib/PublicInbox/IMAP.pm | 58 ++++++++++++++++++++++++++++++++++-- lib/PublicInbox/IMAPD.pm | 20 ++++++++++++- lib/PublicInbox/NNTPD.pm | 6 ++-- t/imapd.t | 28 +++++++++++++++-- 5 files changed, 104 insertions(+), 10 deletions(-) diff --git a/Documentation/standards.perl b/Documentation/standards.perl index 37309956f39..8fc852c722b 100755 --- a/Documentation/standards.perl +++ b/Documentation/standards.perl @@ -43,7 +43,7 @@ my $rfcs = [ 2822 => 'Internet message format (2001)', 5322 => 'Internet message format (2008)', 3501 => 'IMAP4rev1', - 2177 => 'IMAP IDLE', # TODO + 2177 => 'IMAP IDLE', # 5032 = 'WITHIN search extension for IMAP', 4978 => 'IMAP COMPRESS Extension', # 5182 = 'IMAP Extension for Referencing the Last SEARCH Result', diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index c0636066b9f..99c6c817fd7 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -15,7 +15,8 @@ package PublicInbox::IMAP; use strict; use base qw(PublicInbox::DS); -use fields qw(imapd logged_in ibx long_cb -login_tag); +use fields qw(imapd logged_in ibx long_cb -login_tag + -idle_tag -idle_max); use PublicInbox::Eml; use PublicInbox::DS qw(now); use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT); @@ -88,7 +89,10 @@ sub new ($$$) { sub capa ($) { my ($self) = @_; - my $capa = 'CAPABILITY IMAP4rev1'; + + # dovecot advertises IDLE pre-login; perhaps because some clients + # depend on it, so we'll do the same + my $capa = 'CAPABILITY IMAP4rev1 IDLE'; if ($self->{logged_in}) { $capa .= ' COMPRESS=DEFLATE'; } else { @@ -139,6 +143,40 @@ sub cmd_capability ($$) { sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" } +# called by PublicInbox::InboxIdle +sub on_inbox_unlock { + my ($self, $ibx) = @_; + my $new = ($ibx->mm->minmax)[1]; + defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset'; + if ($new > $old) { + $self->{-idle_max} = $new; + $self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1)); + $self->write(\"* $new EXISTS\r\n"); + } +} + +sub cmd_idle ($$) { + my ($self, $tag) = @_; + # IDLE seems allowed by dovecot w/o a mailbox selected *shrug* + my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; + $ibx->subscribe_unlock(fileno($self->{sock}), $self); + $self->{-idle_tag} = $tag; + $self->{-idle_max} = ($ibx->mm->minmax)[1] // 0; + "+ idling\r\n" +} + +sub cmd_done ($$) { + my ($self, $tag) = @_; # $tag is "DONE" (case-insensitive) + defined(my $idle_tag = delete $self->{-idle_tag}) or + return "$tag BAD not idle\r\n"; + my $ibx = $self->{ibx} or do { + warn "BUG: idle_tag set w/o inbox"; + return "$tag BAD internal bug\r\n"; + }; + $ibx->unsubscribe_unlock(fileno($self->{sock})); + "$idle_tag OK Idle completed\r\n"; +} + sub cmd_examine ($$$) { my ($self, $tag, $mailbox) = @_; my $ibx = $self->{imapd}->{groups}->{$mailbox} or @@ -361,7 +399,11 @@ sub process_line ($$) { } my $res = eval { if (my $cmd = $self->can('cmd_'.lc($req // ''))) { - $cmd->($self, $tag, @args); + defined($self->{-idle_tag}) ? + "$self->{-idle_tag} BAD expected DONE\r\n" : + $cmd->($self, $tag, @args); + } elsif (uc($tag // '') eq 'DONE' && !defined($req)) { + cmd_done($self, $tag); } else { # this is weird auth_challenge_ok($self) // "$tag BAD Error in IMAP command $req: ". @@ -516,6 +558,16 @@ sub busy { ($self->{rbuf} || $self->{wbuf} || $self->not_idle_long($now)); } +sub close { + my ($self) = @_; + if (my $ibx = delete $self->{ibx}) { + if (my $sock = $self->{sock}) {; + $ibx->unsubscribe_unlock(fileno($sock)); + } + } + $self->SUPER::close; # PublicInbox::DS::close +} + # we're read-only, so SELECT and EXAMINE do the same thing no warnings 'once'; *cmd_select = \&cmd_examine; diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index 1011d6a413b..1922c16046a 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -6,10 +6,28 @@ package PublicInbox::IMAPD; use strict; use parent qw(PublicInbox::NNTPD); +use PublicInbox::InboxIdle; sub new { my ($class) = @_; - $class->SUPER::new; # PublicInbox::NNTPD->new + bless { + groups => {}, + err => \*STDERR, + out => \*STDOUT, + grouplist => [], + # accept_tls => { SSL_server => 1, ..., SSL_reuse_ctx => ... } + # idler => PublicInbox::InboxIdle + }, $class; +} + +sub refresh_groups { + my ($self) = @_; + if (my $old_idler = delete $self->{idler}) { + $old_idler->close; # PublicInbox::DS::close + } + my $pi_config = PublicInbox::Config->new; + $self->{idler} = PublicInbox::InboxIdle->new($pi_config); + $self->SUPER::refresh_groups($pi_config); } 1; diff --git a/lib/PublicInbox/NNTPD.pm b/lib/PublicInbox/NNTPD.pm index b8ec84ed6b0..ed5cf7cc8c0 100644 --- a/lib/PublicInbox/NNTPD.pm +++ b/lib/PublicInbox/NNTPD.pm @@ -30,9 +30,9 @@ sub new { }, $class; } -sub refresh_groups () { - my ($self) = @_; - my $pi_config = PublicInbox::Config->new; +sub refresh_groups { + my ($self, $pi_config) = @_; + $pi_config //= PublicInbox::Config->new; my $new = {}; my @list; $pi_config->each_inbox(sub { diff --git a/t/imapd.t b/t/imapd.t index f28a663bf9d..359c4c033b2 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -3,8 +3,10 @@ # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> use strict; use Test::More; +use Time::HiRes (); use PublicInbox::TestCommon; -require_mods(qw(DBD::SQLite Mail::IMAPClient)); +use PublicInbox::Config; +require_mods(qw(DBD::SQLite Mail::IMAPClient Linux::Inotify2)); my $level = '-Lbasic'; SKIP: { require_mods('Search::Xapian', 1); @@ -12,7 +14,7 @@ SKIP: { }; my @V = (1); -#push(@V, 2) if require_git('2.6', 1); +push(@V, 2) if require_git('2.6', 1); my ($tmpdir, $for_destroy) = tmpdir(); my $home = "$tmpdir/home"; @@ -139,6 +141,28 @@ is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap'); ok($mic->compress, 'compress enabled'); $compress_logout->($mic); +my $pi_config = PublicInbox::Config->new; +$pi_config->each_inbox(sub { + my ($ibx) = @_; + my $name = $ibx->{name}; + my $ng = $ibx->{newsgroup}; + my $mic = Mail::IMAPClient->new(%mic_opt); + ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name"); + is_deeply([$mic->has_capability('IDLE')], ['IDLE'], "IDLE capa $name"); + ok(!$mic->idle, "IDLE fails w/o SELECT/EXAMINE $name"); + ok($mic->examine($ng), "EXAMINE $ng succeeds"); + ok($mic->idle, "IDLE succeeds on $ng"); + + open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!"); + my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} }; + run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or + BAIL_OUT('-mda delivery'); + my $t0 = Time::HiRes::time(); + ok(my @res = $mic->idle_data(11), "IDLE succeeds on $ng"); + ok(grep(/\A\* [0-9] EXISTS\b/, @res), 'got EXISTS message'); + ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified'); +}); + $td->kill; $td->join; is($?, 0, 'no error in exited process');
There's enough places where we only care about the max NNTP article number to warrant avoiding a call into SQLite. Using ->num_highwater in read-only packages such as PublicInbox::IMAP is also incorrect, since that memoizes and won't pick up changes made by other processes. --- lib/PublicInbox/IMAP.pm | 10 +++++----- lib/PublicInbox/Msgmap.pm | 20 +++++++++++--------- 2 files changed, 16 insertions(+), 14 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 99c6c817fd7..4a43185c512 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -146,7 +146,7 @@ sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" } # called by PublicInbox::InboxIdle sub on_inbox_unlock { my ($self, $ibx) = @_; - my $new = ($ibx->mm->minmax)[1]; + my $new = $ibx->mm->max; defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset'; if ($new > $old) { $self->{-idle_max} = $new; @@ -161,7 +161,7 @@ sub cmd_idle ($$) { my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; $ibx->subscribe_unlock(fileno($self->{sock}), $self); $self->{-idle_tag} = $tag; - $self->{-idle_max} = ($ibx->mm->minmax)[1] // 0; + $self->{-idle_max} = $ibx->mm->max // 0; "+ idling\r\n" } @@ -182,7 +182,7 @@ sub cmd_examine ($$$) { my $ibx = $self->{imapd}->{groups}->{$mailbox} or return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; my $mm = $ibx->mm; - my $max = $mm->num_highwater // 0; + my $max = $mm->max // 0; # RFC 3501 2.3.1.1 - "A good UIDVALIDITY value to use in # this case is a 32-bit representation of the creation # date/time of the mailbox" @@ -320,7 +320,7 @@ sub cmd_uid_fetch ($$$;@) { if ($range =~ /\A([0-9]+):([0-9]+)\z/s) { ($beg, $end) = ($1, $2); } elsif ($range =~ /\A([0-9]+):\*\z/s) { - ($beg, $end) = ($1, $ibx->mm->num_highwater // 0); + ($beg, $end) = ($1, $ibx->mm->max // 0); } elsif ($range =~ /\A[0-9]+\z/) { my $smsg = $ibx->over->get_art($range) or return "$tag OK\r\n"; push @$msgs, $smsg; @@ -365,7 +365,7 @@ sub cmd_uid_search ($$$;) { } elsif ($arg eq 'UID' && scalar(@rest) == 1) { if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) { my ($beg, $end) = ($1, $2); - $end = ($ibx->mm->minmax)[1] if $end eq '*'; + $end = $ibx->mm->max if $end eq '*'; $self->msg_more('* SEARCH'); long_response($self, \&uid_search_uid_range, $tag, $ibx, \$beg, $end); diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm index a2ffe7203e7..d115cbce352 100644 --- a/lib/PublicInbox/Msgmap.pm +++ b/lib/PublicInbox/Msgmap.pm @@ -55,8 +55,7 @@ sub new_file { $dbh->begin_work; $self->created_at(time) unless $self->created_at; - my (undef, $max) = $self->minmax(); - $max ||= 0; + my $max = $self->max // 0; $self->num_highwater($max); $dbh->commit; } @@ -159,17 +158,20 @@ sub num_for { $sth->fetchrow_array; } +sub max { + my $sth = $_[0]->{dbh}->prepare_cached('SELECT MAX(num) FROM msgmap', + undef, 1); + $sth->execute; + $sth->fetchrow_array; +} + sub minmax { - my ($self) = @_; - my $dbh = $self->{dbh}; # breaking MIN and MAX into separate queries speeds up from 250ms # to around 700us with 2.7million messages. - my $sth = $dbh->prepare_cached('SELECT MIN(num) FROM msgmap', undef, 1); - $sth->execute; - my $min = $sth->fetchrow_array; - $sth = $dbh->prepare_cached('SELECT MAX(num) FROM msgmap', undef, 1); + my $sth = $_[0]->{dbh}->prepare_cached('SELECT MIN(num) FROM msgmap', + undef, 1); $sth->execute; - ($min, $sth->fetchrow_array); + ($sth->fetchrow_array, max($_[0])); } sub mid_delete {
InboxIdle should not be holding onto Inbox objects after the Config object they came from expires, and Config objects may expire on SIGHUP. Old Inbox objects still persist due to IMAP clients holding onto them, but that's a concern we'll deal with at another time, or not at all, since all clients expire, eventually. Regardless, stale inotify watch descriptors should not be left hanging after SIGHUP refreshes. --- lib/PublicInbox/IMAP.pm | 1 + lib/PublicInbox/IMAPD.pm | 14 +++++---- lib/PublicInbox/InboxIdle.pm | 36 ++++++++++++++++++---- t/imapd.t | 58 ++++++++++++++++++++++++++++++++++-- 4 files changed, 96 insertions(+), 13 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 4a43185c512..c8592dc0329 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -160,6 +160,7 @@ sub cmd_idle ($$) { # IDLE seems allowed by dovecot w/o a mailbox selected *shrug* my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; $ibx->subscribe_unlock(fileno($self->{sock}), $self); + $self->{imapd}->idler_start; $self->{-idle_tag} = $tag; $self->{-idle_max} = $ibx->mm->max // 0; "+ idling\r\n" diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index 1922c16046a..05aa30e42a1 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -16,18 +16,22 @@ sub new { out => \*STDOUT, grouplist => [], # accept_tls => { SSL_server => 1, ..., SSL_reuse_ctx => ... } + # pi_config => PublicInbox::Config # idler => PublicInbox::InboxIdle }, $class; } sub refresh_groups { my ($self) = @_; - if (my $old_idler = delete $self->{idler}) { - $old_idler->close; # PublicInbox::DS::close - } - my $pi_config = PublicInbox::Config->new; - $self->{idler} = PublicInbox::InboxIdle->new($pi_config); + my $pi_config = $self->{pi_config} = PublicInbox::Config->new; $self->SUPER::refresh_groups($pi_config); + if (my $idler = $self->{idler}) { + $idler->refresh($pi_config); + } +} + +sub idler_start { + $_[0]->{idler} //= PublicInbox::InboxIdle->new($_[0]->{pi_config}); } 1; diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm index 095a801c946..c19b8d186cd 100644 --- a/lib/PublicInbox/InboxIdle.pm +++ b/lib/PublicInbox/InboxIdle.pm @@ -4,7 +4,8 @@ package PublicInbox::InboxIdle; use strict; use base qw(PublicInbox::DS); -use fields qw(pi_config inot); +use fields qw(pi_config inot pathmap); +use Cwd qw(abs_path); use Symbol qw(gensym); use PublicInbox::Syscall qw(EPOLLIN EPOLLET); my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify @@ -19,13 +20,35 @@ if ($^O eq 'linux' && eval { require Linux::Inotify2; 1 }) { require PublicInbox::In2Tie if $ino_cls; sub in2_arm ($$) { # PublicInbox::Config::each_inbox callback - my ($ibx, $inot) = @_; - my $path = "$ibx->{inboxdir}/"; - $path .= $ibx->version >= 2 ? 'inbox.lock' : 'ssoma.lock'; - $inot->watch($path, $IN_CLOSE, sub { $ibx->on_unlock }); + my ($ibx, $self) = @_; + my $dir = abs_path($ibx->{inboxdir}); + if (!defined($dir)) { + warn "W: $ibx->{inboxdir} not watched: $!\n"; + return; + } + my $inot = $self->{inot}; + my $cur = $self->{pathmap}->{$dir} //= []; + + # transfer old subscriptions to the current inbox, cancel the old watch + if (my $old_ibx = $cur->[0]) { + $ibx->{unlock_subs} and + die "BUG: $dir->{unlock_subs} should not exist"; + $ibx->{unlock_subs} = $old_ibx->{unlock_subs}; + $cur->[1]->cancel; + } + $cur->[0] = $ibx; + + my $lock = "$dir/".($ibx->version >= 2 ? 'inbox.lock' : 'ssoma.lock'); + $cur->[1] = $inot->watch($lock, $IN_CLOSE, sub { $ibx->on_unlock }); + # TODO: detect deleted packs (and possibly other files) } +sub refresh { + my ($self, $pi_config) = @_; + $pi_config->each_inbox(\&in2_arm, $self); +} + sub new { my ($class, $pi_config) = @_; my $self = fields::new($class); @@ -42,7 +65,8 @@ sub new { $inot = PublicInbox::FakeInotify->new; } $self->{inot} = $inot; - $pi_config->each_inbox(\&in2_arm, $inot); + $self->{pathmap} = {}; # inboxdir => [ ibx, watch1, watch2, watch3...] + refresh($self, $pi_config); $self; } diff --git a/t/imapd.t b/t/imapd.t index 359c4c033b2..b0caa8f1779 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -6,6 +6,7 @@ use Test::More; use Time::HiRes (); use PublicInbox::TestCommon; use PublicInbox::Config; +use PublicInbox::Spawn qw(which); require_mods(qw(DBD::SQLite Mail::IMAPClient Linux::Inotify2)); my $level = '-Lbasic'; SKIP: { @@ -141,9 +142,12 @@ is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap'); ok($mic->compress, 'compress enabled'); $compress_logout->($mic); +my $have_inotify = eval { require Linux::Inotify2; 1 }; + my $pi_config = PublicInbox::Config->new; $pi_config->each_inbox(sub { my ($ibx) = @_; + my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} }; my $name = $ibx->{name}; my $ng = $ibx->{newsgroup}; my $mic = Mail::IMAPClient->new(%mic_opt); @@ -154,12 +158,62 @@ $pi_config->each_inbox(sub { ok($mic->idle, "IDLE succeeds on $ng"); open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!"); - my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} }; run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or BAIL_OUT('-mda delivery'); my $t0 = Time::HiRes::time(); ok(my @res = $mic->idle_data(11), "IDLE succeeds on $ng"); - ok(grep(/\A\* [0-9] EXISTS\b/, @res), 'got EXISTS message'); + is(grep(/\A\* [0-9] EXISTS\b/, @res), 1, 'got EXISTS message'); + ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified'); + + my (@ino_info, $ino_fdinfo); + SKIP: { + skip 'no inotify support', 1 unless $have_inotify; + skip 'missing /proc/$PID/fd', 1 if !-d "/proc/$td->{pid}/fd"; + my @ino = grep { + readlink($_) =~ /\binotify\b/ + } glob("/proc/$td->{pid}/fd/*"); + is(scalar(@ino), 1, 'only one inotify FD'); + my $ino_fd = (split('/', $ino[0]))[-1]; + $ino_fdinfo = "/proc/$td->{pid}/fdinfo/$ino_fd"; + if (open my $fh, '<', $ino_fdinfo) { + local $/ = "\n"; + @ino_info = grep(/^inotify wd:/, <$fh>); + ok(scalar(@ino_info), 'inotify has watches'); + } else { + skip "$ino_fdinfo missing: $!", 1; + } + }; + + # ensure IDLE persists across HUP, w/o extra watches or FDs + $td->kill('HUP') or BAIL_OUT "failed to kill -imapd: $!"; + SKIP: { + skip 'no inotify fdinfo (or support)', 2 if !@ino_info; + my (@tmp, %prev); + local $/ = "\n"; + my $end = time + 5; + until (time > $end) { + select undef, undef, undef, 0.01; + open my $fh, '<', $ino_fdinfo or + BAIL_OUT "$ino_fdinfo: $!"; + %prev = map { $_ => 1 } @ino_info; + @tmp = grep(/^inotify wd:/, <$fh>); + if (scalar(@tmp) == scalar(@ino_info)) { + delete @prev{@tmp}; + last if scalar(keys(%prev)) == @ino_info; + } + } + is(scalar @tmp, scalar @ino_info, + 'old inotify watches replaced'); + is(scalar keys %prev, scalar @ino_info, + 'no previous watches overlap'); + }; + + open($fh, '<', 't/data/0001.patch') or BAIL_OUT("open: $!"); + run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or + BAIL_OUT('-mda delivery'); + $t0 = Time::HiRes::time(); + ok(@res = $mic->idle_data(11), "IDLE succeeds on $ng after HUP"); + is(grep(/\A\* [0-9] EXISTS\b/, @res), 1, 'got EXISTS message'); ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified'); });
I'm not sure if there's much use for this command, but it's part of RFC3501 and works read-only. --- lib/PublicInbox/IMAP.pm | 29 +++++++++++++++++++++++++++++ t/imapd.t | 5 +++++ 2 files changed, 34 insertions(+) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index c8592dc0329..a2d59e5cccc 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -306,6 +306,35 @@ sub uid_fetch_m { # long_response 1; } +sub cmd_status ($$$;@) { + my ($self, $tag, $mailbox, @items) = @_; + my $ibx = $self->{imapd}->{groups}->{$mailbox} or + return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; + return "$tag BAD no items\r\n" if !scalar(@items); + ($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and + return "$tag BAD invalid args\r\n"; + + my $mm = $ibx->mm; + my ($max, @it); + for my $it (@items) { + $it = uc($it); + push @it, $it; + if ($it =~ /\A(?:MESSAGES|UNSEEN|RECENT)\z/) { + push(@it, ($max //= $mm->max // 0)); + } elsif ($it eq 'UIDNEXT') { + push(@it, ($max //= $mm->max // 0) + 1); + } elsif ($it eq 'UIDVALIDITY') { + push(@it, $mm->created_at // + return("$tag BAD UIDVALIDITY\r\n")); + } else { + return "$tag BAD invalid item\r\n"; + } + } + return "$tag BAD no items\r\n" if !@it; + "* STATUS $mailbox (".join(' ', @it).")\r\n" . + "$tag OK Status complete\r\n"; +} + sub cmd_uid_fetch ($$$;@) { my ($self, $tag, $range, @want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; diff --git a/t/imapd.t b/t/imapd.t index b0caa8f1779..7512bb90050 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -81,6 +81,11 @@ ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent'); like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent'); ok($mic->select('inbox.i1'), 'SELECT succeeds'); ok($mic->examine('inbox.i1'), 'EXAMINE succeeds'); +my @raw = $mic->status('inbox.i1', qw(Messages uidnext uidvalidity)); +is(scalar(@raw), 2, 'got status response'); +like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20 + \(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx); +like($raw[1], qr/\A\S+ OK /, 'finished status response'); my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@"; is_deeply($ret, [ 1 ], 'search all works');
IMAP clients may quote args and escape similar to POSIX shell, so attempt to handle them properly using this standard library module. --- lib/PublicInbox/IMAP.pm | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index a2d59e5cccc..7745d9f96f7 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -20,6 +20,7 @@ use fields qw(imapd logged_in ibx long_cb -login_tag use PublicInbox::Eml; use PublicInbox::DS qw(now); use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT); +use Text::ParseWords qw(parse_line); use Errno qw(EAGAIN); my $Address; for my $mod (qw(Email::Address::XS Mail::Address)) { @@ -423,7 +424,8 @@ sub args_ok ($$) { # duplicated from PublicInbox::NNTP # returns 1 if we can continue, 0 if not due to buffered writes or disconnect sub process_line ($$) { my ($self, $l) = @_; - my ($tag, $req, @args) = split(/[ \t]+/, $l); + my ($tag, $req, @args) = parse_line('[ \t]+', 0, $l); + pop(@args) if (@args && !defined($args[-1])); if (@args && uc($req) eq 'UID') { $req .= "_".(shift @args); }
We'll optimize for the common case of: $TAG LIST "" * and rely on the grep perlfunc to handle trickier cases. --- lib/PublicInbox/IMAP.pm | 14 ++++++++++ lib/PublicInbox/IMAPD.pm | 25 +++++++++++++++++ t/imapd.t | 59 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 98 insertions(+) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 7745d9f96f7..ca9a0ea7d42 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -336,6 +336,20 @@ sub cmd_status ($$$;@) { "$tag OK Status complete\r\n"; } +my %patmap = ('*' => '.*', '%' => '[^\.]*'); +sub cmd_list ($$$$) { + my ($self, $tag, $refname, $wildcard) = @_; + my $l = $self->{imapd}->{inboxlist}; + if ($refname eq '' && $wildcard eq '') { + # request for hierarchy delimiter + $l = [ qq[* LIST (\\Noselect) "." ""\r\n] ]; + } elsif ($refname ne '' || $wildcard ne '*') { + $wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eig; + $l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ]; + } + \(join('', @$l, "$tag OK List complete\r\n")); +} + sub cmd_uid_fetch ($$$;@) { my ($self, $tag, $range, @want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index 05aa30e42a1..a3a2598661b 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -21,10 +21,35 @@ sub new { }, $class; } +sub refresh_inboxlist ($) { + my ($self) = @_; + my @names = map { $_->{newsgroup} } @{delete $self->{grouplist}}; + my %ns; # "\Noselect \HasChildren" + for (@names) { + my $up = $_; + while ($up =~ s/\.[^\.]+\z//) { + $ns{$up} = '\\Noselect \\HasChildren'; + } + } + @names = map {; + my $at = delete($ns{$_}) ? '\\HasChildren' : '\\HasNoChildren'; + qq[* LIST ($at) "." $_\r\n] + } @names; + push(@names, map { qq[* LIST ($ns{$_}) "." $_\r\n] } keys %ns); + @names = sort { + my ($xa) = ($a =~ / (\S+)\r\n/g); + my ($xb) = ($b =~ / (\S+)\r\n/g); + length($xa) <=> length($xb); + } @names; + $self->{inboxlist} = \@names; +} + sub refresh_groups { my ($self) = @_; my $pi_config = $self->{pi_config} = PublicInbox::Config->new; $self->SUPER::refresh_groups($pi_config); + refresh_inboxlist($self); + if (my $idler = $self->{idler}) { $idler->refresh($pi_config); } diff --git a/t/imapd.t b/t/imapd.t index 7512bb90050..a377c02ab43 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -87,6 +87,65 @@ like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20 \(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx); like($raw[1], qr/\A\S+ OK /, 'finished status response'); +@raw = $mic->list; +like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, + 'got an inbox'); +like($raw[-1], qr/^\S+ OK /, 'response ended with OK'); +is(scalar(@raw), scalar(@V) + 2, 'default LIST response'); +@raw = $mic->list('', 'inbox.i1'); +is(scalar(@raw), 2, 'limited LIST response'); +like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, + 'got an inbox.i1'); +like($raw[-1], qr/^\S+ OK /, 'response ended with OK'); + +{ # make sure we get '%' globbing right + my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y)); + my $self = { imapd => { grouplist => \@n } }; + PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); + my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%'); + is(scalar($$res =~ tr/\n/\n/), 2, 'only one result'); + like($$res, qr/ x\r\ntag OK/, 'saw expected'); + $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x.', '%'); + is(scalar($$res =~ tr/\n/\n/), 3, 'only one result'); + is(scalar(my @x = ($$res =~ m/ x\.[zy]\r\n/g)), 2, 'match expected'); + + $res = PublicInbox::IMAP::cmd_list($self, 't', 'x.(?{die "RCE"})', '%'); + like($$res, qr/\At OK /, 'refname does not match attempted RCE'); + $res = PublicInbox::IMAP::cmd_list($self, 't', '', '(?{die "RCE"})%'); + like($$res, qr/\At OK /, 'wildcard does not match attempted RCE'); +} + +if ($ENV{TEST_BENCHMARK}) { + use Benchmark qw(:all); + my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000); + push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000); + my $self = { imapd => { grouplist => \@n } }; + PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); + + my $n = scalar @n; + open my $null, '>', '/dev/null' or die; + my $ds = { sock => $null }; + my $nr = 200; + diag "starting benchmark..."; + my $t = timeit(1, sub { + for (0..$nr) { + my $res = PublicInbox::IMAP::cmd_list($self, 'tag', + '', '*'); + PublicInbox::DS::write($ds, $res); + } + }); + diag timestr($t). "list all for $n inboxes $nr times"; + $nr = 20; + $t = timeit(1, sub { + for (0..$nr) { + my $res = PublicInbox::IMAP::cmd_list($self, 'tag', + 'inbox.', '%'); + PublicInbox::DS::write($ds, $res); + } + }); + diag timestr($t). "list partial for $n inboxes $nr times"; +} + my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@"; is_deeply($ret, [ 1 ], 'search all works'); $ret = $mic->search('uid 1') or BAIL_OUT "SEARCH FAIL $@";
We can fill in some missing pieces from the emulation APIs to enable IMAP IDLE tests on non-Linux platforms. --- lib/PublicInbox/FakeInotify.pm | 22 +++++++++++++++++++++- lib/PublicInbox/KQNotify.pm | 6 ++++++ t/imapd.t | 3 ++- 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/FakeInotify.pm b/lib/PublicInbox/FakeInotify.pm index bd610463181..b077d63a4b4 100644 --- a/lib/PublicInbox/FakeInotify.pm +++ b/lib/PublicInbox/FakeInotify.pm @@ -8,13 +8,30 @@ use strict; use Time::HiRes qw(stat); my $IN_CLOSE = 0x08 | 0x10; # match Linux inotify -sub new { bless { watch => {} }, __PACKAGE__ } +my $poll_intvl = 2; # same as Filesys::Notify::Simple +my $for_cancel = bless \(my $x), 'PublicInbox::FakeInotify::Watch'; + +sub poll_once { + my ($self) = @_; + sub { + eval { $self->poll }; + warn "E: FakeInotify->poll: $@\n" if $@; + PublicInbox::DS::add_timer($poll_intvl, poll_once($self)); + }; +} + +sub new { + my $self = bless { watch => {} }, __PACKAGE__; + PublicInbox::DS::add_timer($poll_intvl, poll_once($self)); + $self; +} # behaves like Linux::Inotify2->watch sub watch { my ($self, $path, $mask, $cb) = @_; my @st = stat($path) or return; $self->{watch}->{"$path\0$mask"} = [ @st, $cb ]; + $for_cancel; } # behaves like non-blocking Linux::Inotify2->poll @@ -36,4 +53,7 @@ sub poll { } } +package PublicInbox::FakeInotify::Watch; +sub cancel {} # noop + 1; diff --git a/lib/PublicInbox/KQNotify.pm b/lib/PublicInbox/KQNotify.pm index 3cf9c0f5a0c..1b5c578ef3d 100644 --- a/lib/PublicInbox/KQNotify.pm +++ b/lib/PublicInbox/KQNotify.pm @@ -31,6 +31,7 @@ sub watch { } else { die "TODO Not implemented: $mask"; } + bless \$fh, 'PublicInbox::KQNotify::Watch'; } # emulate Linux::Inotify::fileno @@ -57,4 +58,9 @@ sub poll { } } +package PublicInbox::KQNotify::Watch; +use strict; + +sub cancel { close ${$_[0]} or die "close: $!" } + 1; diff --git a/t/imapd.t b/t/imapd.t index a377c02ab43..c31ac12f941 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -7,7 +7,8 @@ use Time::HiRes (); use PublicInbox::TestCommon; use PublicInbox::Config; use PublicInbox::Spawn qw(which); -require_mods(qw(DBD::SQLite Mail::IMAPClient Linux::Inotify2)); +require_mods(qw(DBD::SQLite Mail::IMAPClient)); + my $level = '-Lbasic'; SKIP: { require_mods('Search::Xapian', 1);
I'm not sure which clients use these, but it could be useful down the line. --- lib/PublicInbox/Eml.pm | 7 +-- lib/PublicInbox/IMAP.pm | 106 +++++++++++++++++++++++++++++++++++++++- t/imapd.t | 16 +++++- 3 files changed, 123 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm index 6f6874cd237..d2bd3915545 100644 --- a/lib/PublicInbox/Eml.pm +++ b/lib/PublicInbox/Eml.pm @@ -235,11 +235,11 @@ sub mp_descend ($$) { # $arg - user-supplied arg (think pthread_create) # $once - unref body scalar during iteration sub each_part { - my ($self, $cb, $arg, $once) = @_; + my ($self, $cb, $arg, $once, $all) = @_; my $p = mp_descend($self, $once // 0) or return $cb->([$self, 0, 0], $arg); - $cb->([$self, 0, 0], $arg) if $self->{-call_cb}; # rare + $cb->([$self, 0, 0], $arg) if ($all || $self->{-call_cb}); # rare $p = [ $p, 0 ]; my @s; # our virtual stack @@ -255,7 +255,8 @@ sub each_part { (my $nxt = mp_descend($sub, $nr))) { push(@s, $p) if scalar @{$p->[0]}; $p = [ $nxt, @idx, 0 ]; - $cb->([$sub, $depth, @idx], $arg) if $sub->{-call_cb}; + ($all || $sub->{-call_cb}) and + $cb->([$sub, $depth, @idx], $arg); } else { # a leaf node $cb->([$sub, $depth, @idx], $arg); } diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index ca9a0ea7d42..6c5e0290925 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -18,6 +18,7 @@ use base qw(PublicInbox::DS); use fields qw(imapd logged_in ibx long_cb -login_tag -idle_tag -idle_max); use PublicInbox::Eml; +use PublicInbox::EmlContentFoo qw(parse_content_disposition); use PublicInbox::DS qw(now); use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT); use Text::ParseWords qw(parse_line); @@ -248,6 +249,105 @@ sub eml_envelope ($) { ) . ')'; } +sub _esc_hash ($) { + my ($hash) = @_; + if ($hash && scalar keys %$hash) { + $hash = [ %$hash ]; # flatten hash into 1-dimensional array + '(' . join(' ', map { _esc($_) } @$hash) . ')'; + } else { + 'NIL'; + } +} + +sub body_disposition ($) { + my ($eml) = @_; + my $cd = $eml->header_raw('Content-Disposition') or return 'NIL'; + $cd = parse_content_disposition($cd); + my $buf = '('._esc($cd->{type}); + $buf .= ' ' . _esc_hash(delete $cd->{attributes}); + $buf .= ')'; +} + +sub body_leaf ($$;$) { + my ($eml, $structure, $hold) = @_; + my $buf = ''; + $eml->{is_submsg} and # parent was a message/(rfc822|news|global) + $buf .= eml_envelope($eml). ' '; + my $ct = $eml->ct; + $buf .= '('._esc($ct->{type}).' '; + $buf .= _esc($ct->{subtype}); + $buf .= ' ' . _esc_hash(delete $ct->{attributes}); + $buf .= ' ' . _esc($eml->header_raw('Content-ID')); + $buf .= ' ' . _esc($eml->header_raw('Content-Description')); + my $cte = $eml->header_raw('Content-Transfer-Encoding') // '7bit'; + $buf .= ' ' . _esc($cte); + $buf .= ' ' . $eml->{imap_body_len}; + $buf .= ' '.($eml->body_raw =~ tr/\n/\n/) if lc($ct->{type}) eq 'text'; + + # for message/(rfc822|global|news), $hold[0] should have envelope + $buf .= ' ' . (@$hold ? join('', @$hold) : 'NIL') if $hold; + + if ($structure) { + $buf .= ' '._esc($eml->header_raw('Content-MD5')); + $buf .= ' '. body_disposition($eml); + $buf .= ' '._esc($eml->header_raw('Content-Language')); + $buf .= ' '._esc($eml->header_raw('Content-Location')); + } + $buf .= ')'; +} + +sub body_parent ($$$) { + my ($eml, $structure, $hold) = @_; + my $ct = $eml->ct; + my $type = lc($ct->{type}); + if ($type eq 'multipart') { + my $buf = '('; + $buf .= @$hold ? join('', @$hold) : 'NIL'; + $buf .= ' '._esc($ct->{subtype}); + if ($structure) { + $buf .= ' '._esc_hash(delete $ct->{attributes}); + $buf .= ' '.body_disposition($eml); + $buf .= ' '._esc($eml->header_raw('Content-Language')); + $buf .= ' '._esc($eml->header_raw('Content-Location')); + } + $buf .= ')'; + @$hold = ($buf); + } else { # message/(rfc822|global|news) + @$hold = (body_leaf($eml, $structure, $hold)); + } +} + +# this is gross, but we need to process the parent part AFTER +# the child parts are done +sub bodystructure_prep { + my ($p, $q) = @_; + my ($eml, $depth) = @$p; # ignore idx + # set length here, as $eml->{bdy} gets deleted for message/rfc822 + $eml->{imap_body_len} = length($eml->body_raw); + push @$q, $eml, $depth; +} + +# for FETCH BODY and FETCH BODYSTRUCTURE +sub fetch_body ($;$) { + my ($eml, $structure) = @_; + my @q; + $eml->each_part(\&bodystructure_prep, \@q, 0, 1); + my $cur_depth = 0; + my @hold; + do { + my ($part, $depth) = splice(@q, -2); + my $is_mp_parent = $depth == ($cur_depth - 1); + $cur_depth = $depth; + + if ($is_mp_parent) { + body_parent($part, $structure, \@hold); + } else { + unshift @hold, body_leaf($part, $structure); + } + } while (@q); + join('', @hold); +} + sub uid_fetch_cb { # called by git->cat_async my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg; @@ -286,7 +386,11 @@ sub uid_fetch_cb { # called by git->cat_async $self->msg_more(" $f {".length($$bref)."}\r\n"); $self->msg_more($$bref); } - # TODO BODY/BODYSTRUCTURE, specific headers + $want->{BODYSTRUCTURE} and + $self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1)); + $want->{BODY} and + $self->msg_more(' BODY '.fetch_body($eml)); + $self->msg_more(")\r\n"); } diff --git a/t/imapd.t b/t/imapd.t index c31ac12f941..31154bdf3e7 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -7,7 +7,7 @@ use Time::HiRes (); use PublicInbox::TestCommon; use PublicInbox::Config; use PublicInbox::Spawn qw(which); -require_mods(qw(DBD::SQLite Mail::IMAPClient)); +require_mods(qw(DBD::SQLite Mail::IMAPClient Mail::IMAPClient::BodyStructure)); my $level = '-Lbasic'; SKIP: { @@ -190,6 +190,11 @@ for my $r ('1:*', '1') { $ret = $mic->fetch_hash($r, 'FLAGS') or BAIL_OUT "FETCH $@"; is_deeply($ret->{1}->{FLAGS}, '', 'no flags'); + + my $bs = $mic->get_bodystructure($r) or BAIL_OUT("bodystructure: $@"); + ok($bs, 'got a bodystructure'); + is(lc($bs->bodytype), 'text', '->bodytype'); + is(lc($bs->bodyenc), '8bit', '->bodyenc'); } # Mail::IMAPClient ->compress creates cyclic reference: @@ -217,10 +222,12 @@ $pi_config->each_inbox(sub { my $ng = $ibx->{newsgroup}; my $mic = Mail::IMAPClient->new(%mic_opt); ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name"); + my $uidnext = $mic->uidnext($ng); # we'll fetch BODYSTRUCTURE on this + ok($uidnext, 'got uidnext for later fetch'); is_deeply([$mic->has_capability('IDLE')], ['IDLE'], "IDLE capa $name"); ok(!$mic->idle, "IDLE fails w/o SELECT/EXAMINE $name"); ok($mic->examine($ng), "EXAMINE $ng succeeds"); - ok($mic->idle, "IDLE succeeds on $ng"); + ok(my $idle_tag = $mic->idle, "IDLE succeeds on $ng"); open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!"); run_script(['-mda', '--no-precheck'], $env, { 0 => $fh }) or @@ -280,6 +287,11 @@ $pi_config->each_inbox(sub { ok(@res = $mic->idle_data(11), "IDLE succeeds on $ng after HUP"); is(grep(/\A\* [0-9] EXISTS\b/, @res), 1, 'got EXISTS message'); ok((Time::HiRes::time() - $t0) < 10, 'IDLE client notified'); + ok($mic->done($idle_tag), 'IDLE DONE'); + my $bs = $mic->get_bodystructure($uidnext); + ok($bs, 'BODYSTRUCTURE ok for deeply nested'); + $ret = $mic->fetch_hash($uidnext, 'BODY') or BAIL_OUT "FETCH $@"; + ok($ret->{$uidnext}->{BODY}, 'got something in BODY'); }); $td->kill;
Instead of counts starting at 0, we start the single-part message at 1 like we do with subparts of a multipart message. This will make it easier to map offsets for "BODY[$SECTION]" when using IMAP FETCH, since $SECTION must contain non-zero numbers according to RFC 3501. This doesn't make any difference for WWW URLs, since single part messages cannot have downloadable attachments. --- lib/PublicInbox/Eml.pm | 2 +- lib/PublicInbox/MsgIter.pm | 2 +- t/eml.t | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm index d2bd3915545..ec682b919ea 100644 --- a/lib/PublicInbox/Eml.pm +++ b/lib/PublicInbox/Eml.pm @@ -237,7 +237,7 @@ sub mp_descend ($$) { sub each_part { my ($self, $cb, $arg, $once, $all) = @_; my $p = mp_descend($self, $once // 0) or - return $cb->([$self, 0, 0], $arg); + return $cb->([$self, 0, 1], $arg); $cb->([$self, 0, 0], $arg) if ($all || $self->{-call_cb}); # rare diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm index 5ec2a4d9c7f..bb1dfeadc29 100644 --- a/lib/PublicInbox/MsgIter.pm +++ b/lib/PublicInbox/MsgIter.pm @@ -34,7 +34,7 @@ sub em_each_part ($$;$$) { } } } else { - $cb->([$mime, 0, 0], $cb_arg); + $cb->([$mime, 0, 1], $cb_arg); } } diff --git a/t/eml.t b/t/eml.t index a2479f6f068..8d131b1418e 100644 --- a/t/eml.t +++ b/t/eml.t @@ -169,7 +169,7 @@ EOF is(scalar(@tmp), 1, 'got one part even w/o boundary'); is($tmp[0]->[0]->[0]->body, "hello world\n", 'body preserved'); is($tmp[0]->[0]->[1], 0, '$depth is zero'); - is($tmp[0]->[0]->[2], 0, '@idx is zero'); + is($tmp[0]->[0]->[2], 1, '@idx is one'); } # I guess the following only worked in PI::M because of a happy accident
IMAP supports a high level of granularity when it comes to fetching, but fortunately Perl makes it fairly easy to support. --- MANIFEST | 1 + lib/PublicInbox/IMAP.pm | 154 ++++++++++++++++++++++++++++++++++++++-- t/imap.t | 43 +++++++++++ t/imapd.t | 61 +++++++++++++++- 4 files changed, 253 insertions(+), 6 deletions(-) create mode 100644 t/imap.t diff --git a/MANIFEST b/MANIFEST index 8aff192c7ee..0803c3654d6 100644 --- a/MANIFEST +++ b/MANIFEST @@ -266,6 +266,7 @@ t/httpd-https.t t/httpd-unix.t t/httpd.t t/hval.t +t/imap.t t/imapd-tls.t t/imapd.t t/import.t diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 6c5e0290925..86e0a176be1 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -390,7 +390,9 @@ sub uid_fetch_cb { # called by git->cat_async $self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1)); $want->{BODY} and $self->msg_more(' BODY '.fetch_body($eml)); - + if (my $partial = delete $want->{-partial}) { + partial_emit($self, $partial, $eml); + } $self->msg_more(")\r\n"); } @@ -454,16 +456,158 @@ sub cmd_list ($$$$) { \(join('', @$l, "$tag OK List complete\r\n")); } +sub eml_index_offs_i { # PublicInbox::Eml::each_part callback + my ($p, $all) = @_; + my ($eml, undef, $idx) = @$p; + if ($idx && lc($eml->ct->{type}) eq 'multipart') { + $eml->{imap_bdy} = $eml->{bdy} // \''; + } + $all->{$idx} = $eml; # $idx => Eml +} + +# prepares an index for BODY[$SECTION_IDX] fetches +sub eml_body_idx ($$) { + my ($eml, $section_idx) = @_; + my $idx = $eml->{imap_all_parts} //= do { + my $all = {}; + $eml->each_part(\&eml_index_offs_i, $all, 0, 1); + # top-level of multipart, BODY[0] not allowed (nz-number) + delete $all->{0}; + $all; + }; + $idx->{$section_idx}; +} + +# BODY[($SECTION_IDX)?(.$SECTION_NAME)?]<$offset.$bytes> +sub partial_body { + my ($eml, $section_idx, $section_name) = @_; + if (defined $section_idx) { + $eml = eml_body_idx($eml, $section_idx) or return; + } + if (defined $section_name) { + if ($section_name eq 'MIME') { + # RFC 3501 6.4.5 states: + # The MIME part specifier MUST be prefixed + # by one or more numeric part specifiers + return unless defined $section_idx; + return $eml->header_obj->as_string . "\r\n"; + } + my $bdy = $eml->{bdy} // $eml->{imap_bdy} // \''; + $eml = PublicInbox::Eml->new($$bdy); + if ($section_name eq 'TEXT') { + return $eml->body_raw; + } elsif ($section_name eq 'HEADER') { + return $eml->header_obj->as_string . "\r\n"; + } else { + die "BUG: bad section_name=$section_name"; + } + } + ${$eml->{bdy} // $eml->{imap_bdy} // \''}; +} + +# similar to what's in PublicInbox::Eml::re_memo, but doesn't memoize +# to avoid OOM with malicious users +sub hdrs_regexp ($) { + my ($hdrs) = @_; + my $names = join('|', map { "\Q$_" } split(/[ \t]+/, $hdrs)); + qr/^(?:$names):[ \t]*[^\n]*\r?\n # 1st line + # continuation lines: + (?:[^:\n]*?[ \t]+[^\n]*\r?\n)* + /ismx; +} + +# BODY[($SECTION_IDX.)?HEADER.FIELDS.NOT ($HDRS)]<$offset.$bytes> +sub partial_hdr_not { + my ($eml, $section_idx, $hdrs) = @_; + if (defined $section_idx) { + $eml = eml_body_idx($eml, $section_idx) or return; + } + my $str = $eml->header_obj->as_string; + my $re = hdrs_regexp($hdrs); + $str =~ s/$re//g; + $str .= "\r\n"; +} + +# BODY[($SECTION_IDX.)?HEADER.FIELDS ($HDRS)]<$offset.$bytes> +sub partial_hdr_get { + my ($eml, $section_idx, $hdrs) = @_; + if (defined $section_idx) { + $eml = eml_body_idx($eml, $section_idx) or return; + } + my $str = $eml->header_obj->as_string; + my $re = hdrs_regexp($hdrs); + join('', ($str =~ m/($re)/g), "\r\n"); +} + +sub partial_prepare ($$$) { + my ($partial, $want, $att) = @_; + + # recombine [ "BODY[1.HEADER.FIELDS", "(foo", "bar)]" ] + # back to: "BODY[1.HEADER.FIELDS (foo bar)]" + return unless $att =~ /\ABODY(?:\.PEEK)?\[/s; + until (rindex($att, ']') >= 0) { + my $next = shift @$want or return; + $att .= ' ' . uc($next); + } + if ($att =~ /\ABODY(?:\.PEEK)?\[ + ([0-9]+(?:\.[0-9]+)*)? # 1 - section_idx + (?:\.(HEADER|MIME|TEXT))? # 2 - section_name + \](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 3, 4 + $partial->{$att} = [ \&partial_body, $1, $2, $3, $4 ]; + } elsif ($att =~ /\ABODY(?:\.PEEK)?\[ + (?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx + (?:HEADER\.FIELDS(\.NOT)?)\x20 # 2 + \(([A-Z0-9\-\x20]+)\) # 3 - hdrs + \](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 4 5 + $partial->{$att} = [ $2 ? \&partial_hdr_not + : \&partial_hdr_get, + $1, $3, $4, $5 ]; + } else { + undef; + } +} + +sub partial_emit ($$$) { + my ($self, $partial, $eml) = @_; + for my $k (sort keys %$partial) { + my ($cb, @args) = @{$partial->{$k}}; + my ($offset, $len) = splice(@args, -2); + # $cb is partial_body|partial_hdr_get|partial_hdr_not + my $str = $cb->($eml, @args) // ''; + if (defined $offset) { + if (defined $len) { + $str = substr($str, $offset, $len); + $k =~ s/\.$len>\z/>/ or warn +"BUG: unable to remove `.$len>' from `$k'"; + } else { + $str = substr($str, $offset); + $len = length($str); + } + } else { + $len = length($str); + } + $self->msg_more(" $k {$len}\r\n"); + $self->msg_more($str); + } +} + sub cmd_uid_fetch ($$$;@) { my ($self, $tag, $range, @want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; if ($want[0] =~ s/\A\(//s) { $want[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n"; } - my %want = map {; - my $x = $FETCH_ATT{uc($_)} or return "$tag BAD param: $_\r\n"; - %$x; - } @want; + my (%partial, %want); + while (defined(my $att = shift @want)) { + $att = uc($att); + my $x = $FETCH_ATT{$att}; + if ($x) { + %want = (%want, %$x); + } elsif (!partial_prepare(\%partial, \@want, $att)) { + return "$tag BAD param: $att\r\n"; + } + } + $want{-partial} = \%partial if scalar keys %partial; my ($beg, $end); my $msgs = []; if ($range =~ /\A([0-9]+):([0-9]+)\z/s) { diff --git a/t/imap.t b/t/imap.t new file mode 100644 index 00000000000..c435d365543 --- /dev/null +++ b/t/imap.t @@ -0,0 +1,43 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use Test::More; +use PublicInbox::IMAP; +{ + my $partial_prepare = \&PublicInbox::IMAP::partial_prepare; + my $x = {}; + my $r = $partial_prepare->($x, [], my $p = 'BODY.PEEK[9]'); + ok($r, $p); + $r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5>'); + ok($r, $p); + $r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5.1>'); + ok($r, $p); + $r = $partial_prepare->($x, [], $p = 'BODY[1.1]'); + ok($r, $p); + $r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS (DATE FROM)]'); + ok($r, $p); + $r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS.NOT (TO)]'); + ok($r, $p); + $r = $partial_prepare->($x, [], $p = 'BODY[HEDDER.FIELDS.NOT (TO)]'); + ok(!$r, "rejected misspelling $p"); + $r = $partial_prepare->($x, [], $p = 'BODY[1.1.HEADER.FIELDS (TO)]'); + ok($r, $p); + my $partial_body = \&PublicInbox::IMAP::partial_body; + my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get; + my $partial_hdr_not = \&PublicInbox::IMAP::partial_hdr_not; + is_deeply($x, { + 'BODY.PEEK[9]' => [ $partial_body, 9, undef, undef, undef ], + 'BODY.PEEK[9]<5>' => [ $partial_body, 9, undef, 5, undef ], + 'BODY.PEEK[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ], + 'BODY[1.1]' => [ $partial_body, '1.1', undef, undef, undef ], + 'BODY[HEADER.FIELDS (DATE FROM)]' => [ $partial_hdr_get, + undef, 'DATE FROM', undef, undef ], + 'BODY[HEADER.FIELDS.NOT (TO)]' => [ $partial_hdr_not, + undef, 'TO', undef, undef ], + 'BODY[1.1.HEADER.FIELDS (TO)]' => [ $partial_hdr_get, + '1.1', 'TO', undef, undef ], + }, 'structure matches expected'); +} + +done_testing; diff --git a/t/imapd.t b/t/imapd.t index 31154bdf3e7..a63be0fd0c2 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -191,6 +191,31 @@ for my $r ('1:*', '1') { $ret = $mic->fetch_hash($r, 'FLAGS') or BAIL_OUT "FETCH $@"; is_deeply($ret->{1}->{FLAGS}, '', 'no flags'); + $ret = $mic->fetch_hash($r, 'BODY[1]') or BAIL_OUT "FETCH $@"; + like($ret->{1}->{'BODY[1]'}, qr/\AThis is a test message/, 'BODY[1]'); + + $ret = $mic->fetch_hash($r, 'BODY[1]<1>') or BAIL_OUT "FETCH $@"; + like($ret->{1}->{'BODY[1]<1>'}, qr/\Ahis is a test message/, + 'BODY[1]<1>'); + + $ret = $mic->fetch_hash($r, 'BODY[1]<2.3>') or BAIL_OUT "FETCH $@"; + is($ret->{1}->{'BODY[1]<2>'}, "is ", 'BODY[1]<2.3>'); + $ret = $mic->bodypart_string($r, 1, 3, 2) or + BAIL_OUT "bodypart_string $@"; + is($ret, "is ", 'bodypart string'); + + $ret = $mic->fetch_hash($r, 'BODY[HEADER.FIELDS.NOT (Message-ID)]') + or BAIL_OUT "FETCH $@"; + $ret = $ret->{1}->{'BODY[HEADER.FIELDS.NOT (MESSAGE-ID)]'}; + unlike($ret, qr/message-id/i, 'Message-ID excluded'); + like($ret, qr/\r\n\r\n\z/s, 'got header end'); + + $ret = $mic->fetch_hash($r, 'BODY[HEADER.FIELDS (Message-ID)]') + or BAIL_OUT "FETCH $@"; + is($ret->{1}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'}, + 'Message-ID: <testmessage@example.com>'."\r\n\r\n", + 'got only Message-ID'); + my $bs = $mic->get_bodystructure($r) or BAIL_OUT("bodystructure: $@"); ok($bs, 'got a bodystructure'); is(lc($bs->bodytype), 'text', '->bodytype'); @@ -292,7 +317,41 @@ $pi_config->each_inbox(sub { ok($bs, 'BODYSTRUCTURE ok for deeply nested'); $ret = $mic->fetch_hash($uidnext, 'BODY') or BAIL_OUT "FETCH $@"; ok($ret->{$uidnext}->{BODY}, 'got something in BODY'); -}); + + # this matches dovecot behavior + $ret = $mic->fetch_hash($uidnext, 'BODY[1]') or BAIL_OUT "FETCH $@"; + is($ret->{$uidnext}->{'BODY[1]'}, + "testing embedded message harder\r\n", 'BODY[1]'); + $ret = $mic->fetch_hash($uidnext, 'BODY[2]') or BAIL_OUT "FETCH $@"; + like($ret->{$uidnext}->{'BODY[2]'}, + qr/\ADate: Sat, 18 Apr 2020 22:20:20 /, 'BODY[2]'); + + $ret = $mic->fetch_hash($uidnext, 'BODY[2.1.1]') or BAIL_OUT "FETCH $@"; + is($ret->{$uidnext}->{'BODY[2.1.1]'}, + "testing embedded message\r\n", 'BODY[2.1.1]'); + + $ret = $mic->fetch_hash($uidnext, 'BODY[2.1.2]') or BAIL_OUT "FETCH $@"; + like($ret->{$uidnext}->{'BODY[2.1.2]'}, qr/\AFrom: /, + 'BODY[2.1.2] tip matched'); + like($ret->{$uidnext}->{'BODY[2.1.2]'}, + # trailing CRLF may vary depending on MIME parser + qr/done_testing;(?:\r\n){1,2}\z/, + 'BODY[2.1.2] tail matched'); + + $ret = $mic->fetch_hash($uidnext, 'BODY[2.HEADER]') or + BAIL_OUT "2.HEADER $@"; + like($ret->{$uidnext}->{'BODY[2.HEADER]'}, + qr/\ADate: Sat, 18 Apr 2020 22:20:20 /, + '2.HEADER of message/rfc822'); + + $ret = $mic->fetch_hash($uidnext, 'BODY[2.MIME]') or + BAIL_OUT "2.MIME $@"; + is($ret->{$uidnext}->{'BODY[2.MIME]'}, <<EOF, 'BODY[2.MIME]'); +Content-Type: message/rfc822\r +Content-Disposition: attachment; filename="embed2x\.eml"\r +\r +EOF +}); # each_inbox $td->kill; $td->join;
Mail::IMAPClient doesn't seem to mind the lack of `resp-text'; but it's required by RFC 3501. Preliminary tests with offlineimap(1) indicates the presence of `resp-text' is necessary, even if it's just the freeform `text'. And make the `text' more consistent, favoring "done" over "complete" or "completed"; while we're at it. --- lib/PublicInbox/IMAP.pm | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 86e0a176be1..4292c564f01 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -127,7 +127,7 @@ sub cmd_login ($$$$) { sub cmd_logout ($$) { my ($self, $tag) = @_; delete $self->{logged_in}; - $self->write(\"* BYE logging out\r\n$tag OK logout completed\r\n"); + $self->write(\"* BYE logging out\r\n$tag OK Logout done\r\n"); $self->shutdn; # PublicInbox::DS::shutdn undef; } @@ -140,10 +140,10 @@ sub cmd_authenticate ($$$) { sub cmd_capability ($$) { my ($self, $tag) = @_; - '* '.capa($self)."\r\n$tag OK\r\n"; + '* '.capa($self)."\r\n$tag OK Capability done\r\n"; } -sub cmd_noop ($$) { "$_[1] OK NOOP completed\r\n" } +sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" } # called by PublicInbox::InboxIdle sub on_inbox_unlock { @@ -177,7 +177,7 @@ sub cmd_done ($$) { return "$tag BAD internal bug\r\n"; }; $ibx->unsubscribe_unlock(fileno($self->{sock})); - "$idle_tag OK Idle completed\r\n"; + "$idle_tag OK Idle done\r\n"; } sub cmd_examine ($$$) { @@ -204,7 +204,7 @@ EOF $ret .= "* OK [UNSEEN $max]\r\n" if $max; $ret .= "* OK [UIDNEXT $uidnext]\r\n" if defined $uidnext; $ret .= "* OK [UIDVALIDITY $uidvalidity]\r\n" if defined $uidvalidity; - $ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT complete\r\n"; + $ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT done\r\n"; } sub _esc ($) { @@ -439,7 +439,7 @@ sub cmd_status ($$$;@) { } return "$tag BAD no items\r\n" if !@it; "* STATUS $mailbox (".join(' ', @it).")\r\n" . - "$tag OK Status complete\r\n"; + "$tag OK Status done\r\n"; } my %patmap = ('*' => '.*', '%' => '[^\.]*'); @@ -453,7 +453,7 @@ sub cmd_list ($$$$) { $wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eig; $l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ]; } - \(join('', @$l, "$tag OK List complete\r\n")); + \(join('', @$l, "$tag OK List done\r\n")); } sub eml_index_offs_i { # PublicInbox::Eml::each_part callback @@ -615,11 +615,12 @@ sub cmd_uid_fetch ($$$;@) { } elsif ($range =~ /\A([0-9]+):\*\z/s) { ($beg, $end) = ($1, $ibx->mm->max // 0); } elsif ($range =~ /\A[0-9]+\z/) { - my $smsg = $ibx->over->get_art($range) or return "$tag OK\r\n"; + my $smsg = $ibx->over->get_art($range) or + return "$tag OK Fetch done\r\n"; # really OK(!) push @$msgs, $smsg; ($beg, $end) = ($range, 0); } else { - return "$tag BAD\r\n"; + return "$tag BAD fetch range\r\n"; } long_response($self, \&uid_fetch_m, $tag, $ibx, \$beg, $end, $msgs, \%want); @@ -631,7 +632,7 @@ sub uid_search_all { # long_response if (scalar(@$uids)) { $self->msg_more(join(' ', '', @$uids)); } else { - $self->write(\"\r\n$tag OK\r\n"); + $self->write(\"\r\n$tag OK Search done\r\n"); undef; } } @@ -642,7 +643,7 @@ sub uid_search_uid_range { # long_response if (@$uids) { $self->msg_more(join('', map { " $_->[0]" } @$uids)); } else { - $self->write(\"\r\n$tag OK\r\n"); + $self->write(\"\r\n$tag OK Search done\r\n"); undef; } } @@ -665,12 +666,12 @@ sub cmd_uid_search ($$$;) { } elsif ($rest[0] =~ /\A[0-9]+\z/s) { my $uid = $rest[0]; $uid = $ibx->over->get_art($uid) ? " $uid" : ''; - "* SEARCH$uid\r\n$tag OK\r\n"; + "* SEARCH$uid\r\n$tag OK Search done\r\n"; } else { - "$tag BAD\r\n"; + "$tag BAD Error\r\n"; } } else { - "$tag BAD\r\n"; + "$tag BAD Error\r\n"; } }
This makes the test code easier-to-manage and allows us to run faster unit tests which don't involve loading Mail::IMAPClient. --- MANIFEST | 1 + t/imap.t | 20 ++++++++++++++++++ t/imapd.t | 49 +-------------------------------------------- xt/perf-imap-list.t | 35 ++++++++++++++++++++++++++++++++ 4 files changed, 57 insertions(+), 48 deletions(-) create mode 100644 xt/perf-imap-list.t diff --git a/MANIFEST b/MANIFEST index 0803c3654d6..2e4ec915412 100644 --- a/MANIFEST +++ b/MANIFEST @@ -353,6 +353,7 @@ xt/git_async_cmp.t xt/mem-msgview.t xt/msgtime_cmp.t xt/nntpd-validate.t +xt/perf-imap-list.t xt/perf-msgview.t xt/perf-nntpd.t xt/perf-threading.t diff --git a/t/imap.t b/t/imap.t index c435d365543..9b64f1646c6 100644 --- a/t/imap.t +++ b/t/imap.t @@ -1,9 +1,29 @@ #!perl -w # Copyright (C) 2020 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# unit tests (no network) for IMAP, see t/imapd.t for end-to-end tests use strict; use Test::More; use PublicInbox::IMAP; +use PublicInbox::IMAPD; + +{ # make sure we get '%' globbing right + my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y)); + my $self = { imapd => { grouplist => \@n } }; + PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); + my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%'); + is(scalar($$res =~ tr/\n/\n/), 2, 'only one result'); + like($$res, qr/ x\r\ntag OK/, 'saw expected'); + $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x.', '%'); + is(scalar($$res =~ tr/\n/\n/), 3, 'only one result'); + is(scalar(my @x = ($$res =~ m/ x\.[zy]\r\n/g)), 2, 'match expected'); + + $res = PublicInbox::IMAP::cmd_list($self, 't', 'x.(?{die "RCE"})', '%'); + like($$res, qr/\At OK /, 'refname does not match attempted RCE'); + $res = PublicInbox::IMAP::cmd_list($self, 't', '', '(?{die "RCE"})%'); + like($$res, qr/\At OK /, 'wildcard does not match attempted RCE'); +} + { my $partial_prepare = \&PublicInbox::IMAP::partial_prepare; my $x = {}; diff --git a/t/imapd.t b/t/imapd.t index a63be0fd0c2..59b95a6b83f 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -1,6 +1,7 @@ #!perl -w # Copyright (C) 2020 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# end-to-end IMAP tests, see unit tests in t/imap.t, too use strict; use Test::More; use Time::HiRes (); @@ -99,54 +100,6 @@ like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, 'got an inbox.i1'); like($raw[-1], qr/^\S+ OK /, 'response ended with OK'); -{ # make sure we get '%' globbing right - my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y)); - my $self = { imapd => { grouplist => \@n } }; - PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); - my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%'); - is(scalar($$res =~ tr/\n/\n/), 2, 'only one result'); - like($$res, qr/ x\r\ntag OK/, 'saw expected'); - $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x.', '%'); - is(scalar($$res =~ tr/\n/\n/), 3, 'only one result'); - is(scalar(my @x = ($$res =~ m/ x\.[zy]\r\n/g)), 2, 'match expected'); - - $res = PublicInbox::IMAP::cmd_list($self, 't', 'x.(?{die "RCE"})', '%'); - like($$res, qr/\At OK /, 'refname does not match attempted RCE'); - $res = PublicInbox::IMAP::cmd_list($self, 't', '', '(?{die "RCE"})%'); - like($$res, qr/\At OK /, 'wildcard does not match attempted RCE'); -} - -if ($ENV{TEST_BENCHMARK}) { - use Benchmark qw(:all); - my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000); - push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000); - my $self = { imapd => { grouplist => \@n } }; - PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); - - my $n = scalar @n; - open my $null, '>', '/dev/null' or die; - my $ds = { sock => $null }; - my $nr = 200; - diag "starting benchmark..."; - my $t = timeit(1, sub { - for (0..$nr) { - my $res = PublicInbox::IMAP::cmd_list($self, 'tag', - '', '*'); - PublicInbox::DS::write($ds, $res); - } - }); - diag timestr($t). "list all for $n inboxes $nr times"; - $nr = 20; - $t = timeit(1, sub { - for (0..$nr) { - my $res = PublicInbox::IMAP::cmd_list($self, 'tag', - 'inbox.', '%'); - PublicInbox::DS::write($ds, $res); - } - }); - diag timestr($t). "list partial for $n inboxes $nr times"; -} - my $ret = $mic->search('all') or BAIL_OUT "SEARCH FAIL $@"; is_deeply($ret, [ 1 ], 'search all works'); $ret = $mic->search('uid 1') or BAIL_OUT "SEARCH FAIL $@"; diff --git a/xt/perf-imap-list.t b/xt/perf-imap-list.t new file mode 100644 index 00000000000..37640a90f3c --- /dev/null +++ b/xt/perf-imap-list.t @@ -0,0 +1,35 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use Test::More; +use_ok 'PublicInbox::IMAP'; +use_ok 'PublicInbox::IMAPD'; +use PublicInbox::DS; +use Benchmark qw(:all); +my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000); +push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000); +my $self = { imapd => { grouplist => \@n } }; +PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); + +my $n = scalar @n; +open my $null, '>', '/dev/null' or BAIL_OUT "open: $!"; +my $ds = { sock => $null }; +my $nr = 200; +diag "starting benchmark..."; +my $cmd_list = \&PublicInbox::IMAP::cmd_list; +my $t = timeit(1, sub { + for (0..$nr) { + my $res = $cmd_list->($self, 'tag', '', '*'); + PublicInbox::DS::write($ds, $res); + } +}); +diag timestr($t). "list all for $n inboxes $nr times"; +$nr = 20; +$t = timeit(1, sub { + for (0..$nr) { + my $res = $cmd_list->($self, 'tag', 'inbox.', '%'); + PublicInbox::DS::write($ds, $res); + } +}); +diag timestr($t). "list partial for $n inboxes $nr times"; +done_testing;
We must keep the contents of {-partial} around when handling a request to fetch multiple messages. --- lib/PublicInbox/IMAP.pm | 2 +- t/imapd.t | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 4292c564f01..fffd611b3c6 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -390,7 +390,7 @@ sub uid_fetch_cb { # called by git->cat_async $self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1)); $want->{BODY} and $self->msg_more(' BODY '.fetch_body($eml)); - if (my $partial = delete $want->{-partial}) { + if (my $partial = $want->{-partial}) { partial_emit($self, $partial, $eml); } $self->msg_more(")\r\n"); diff --git a/t/imapd.t b/t/imapd.t index 59b95a6b83f..fcbbdc09d31 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -291,7 +291,7 @@ $pi_config->each_inbox(sub { qr/done_testing;(?:\r\n){1,2}\z/, 'BODY[2.1.2] tail matched'); - $ret = $mic->fetch_hash($uidnext, 'BODY[2.HEADER]') or + $ret = $mic->fetch_hash("1:$uidnext", 'BODY[2.HEADER]') or BAIL_OUT "2.HEADER $@"; like($ret->{$uidnext}->{'BODY[2.HEADER]'}, qr/\ADate: Sat, 18 Apr 2020 22:20:20 /,
While the contents of normal %want hash keys are bounded in size, %partial can cause more overhead and lead to repeated sort calls on multi-message fetches. So sort it once and use arrayrefs to make the data structure more compact. --- lib/PublicInbox/IMAP.pm | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index fffd611b3c6..673e1646216 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -569,8 +569,8 @@ sub partial_prepare ($$$) { sub partial_emit ($$$) { my ($self, $partial, $eml) = @_; - for my $k (sort keys %$partial) { - my ($cb, @args) = @{$partial->{$k}}; + for (@$partial) { + my ($k, $cb, @args) = @$_; my ($offset, $len) = splice(@args, -2); # $cb is partial_body|partial_hdr_get|partial_hdr_not my $str = $cb->($eml, @args) // ''; @@ -607,7 +607,14 @@ sub cmd_uid_fetch ($$$;@) { return "$tag BAD param: $att\r\n"; } } - $want{-partial} = \%partial if scalar keys %partial; + + # stabilize partial order for consistency and ease-of-debugging: + if (scalar keys %partial) { + $want{-partial} = [ map { + [ $_, @{$partial{$_}} ] + } sort keys %partial ]; + } + my ($beg, $end); my $msgs = []; if ($range =~ /\A([0-9]+):([0-9]+)\z/s) {
We'll return dummy messages for now when sequence numbers go missing, in case clients can't handle missing messages. --- lib/PublicInbox/IMAP.pm | 87 ++++++++++++++++++++++++++++++++++++----- t/imapd.t | 18 +++++++++ 2 files changed, 96 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 673e1646216..2aa7ab346c1 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -348,11 +348,34 @@ sub fetch_body ($;$) { join('', @hold); } +sub dummy_message ($$) { + my ($seqno, $ibx) = @_; + my $ret = <<EOF; +From: nobody\@localhost\r +To: nobody\@localhost\r +Date: Thu, 01 Jan 1970 00:00:00 +0000\r +Message-ID: <dummy-$seqno\@$ibx->{newsgroup}>\r +Subject: dummy message #$seqno\r +\r +You're seeing this message because your IMAP client didn't use UIDs.\r +The message which used to use this sequence number was likely spam\r +and removed by the administrator.\r +EOF + \$ret; +} + sub uid_fetch_cb { # called by git->cat_async my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; - $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; + if (!defined($oid)) { + # it's possible to have TOCTOU if an admin runs + # public-inbox-(edit|purge), just move onto the next message + return unless defined $want->{-seqno}; + $bref = dummy_message($smsg->{num}, $ibx); + } else { + $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; + } $$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy # fixup old bug from import (pre-a0c07cba0e5d8b6a) @@ -591,19 +614,19 @@ sub partial_emit ($$$) { } } -sub cmd_uid_fetch ($$$;@) { - my ($self, $tag, $range, @want) = @_; +sub fetch_common ($$$$) { + my ($self, $tag, $range, $want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; - if ($want[0] =~ s/\A\(//s) { - $want[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n"; + if ($want->[0] =~ s/\A\(//s) { + $want->[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n"; } my (%partial, %want); - while (defined(my $att = shift @want)) { + while (defined(my $att = shift @$want)) { $att = uc($att); my $x = $FETCH_ATT{$att}; if ($x) { %want = (%want, %$x); - } elsif (!partial_prepare(\%partial, \@want, $att)) { + } elsif (!partial_prepare(\%partial, $want, $att)) { return "$tag BAD param: $att\r\n"; } } @@ -629,8 +652,54 @@ sub cmd_uid_fetch ($$$;@) { } else { return "$tag BAD fetch range\r\n"; } - long_response($self, \&uid_fetch_m, $tag, $ibx, - \$beg, $end, $msgs, \%want); + [ $tag, $ibx, \$beg, $end, $msgs, \%want ]; +} + +sub cmd_uid_fetch ($$$;@) { + my ($self, $tag, $range, @want) = @_; + my $args = fetch_common($self, $tag, $range, \@want); + ref($args) eq 'ARRAY' ? + long_response($self, \&uid_fetch_m, @$args) : + $args; # error +} + +sub seq_fetch_m { # long_response + my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_; + if (!@$msgs) { # refill + @$msgs = @{$ibx->over->query_xover($$beg, $end)}; + if (!@$msgs) { + $self->write(\"$tag OK Fetch done\r\n"); + return; + } + $$beg = $msgs->[-1]->{num} + 1; + } + my $seq = $want->{-seqno}++; + my $cur_num = $msgs->[0]->{num}; + if ($cur_num == $seq) { # as expected + my $git = $ibx->git; + $git->cat_async_begin; # TODO: actually make async + $git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_); + $git->cat_async_wait; + } elsif ($cur_num > $seq) { + # send dummy messages until $seq catches up to $cur_num + my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg'; + unshift @$msgs, $smsg; + my $bref = dummy_message($seq, $ibx); + uid_fetch_cb($bref, undef, undef, undef, \@_); + } else { # should not happen + die "BUG: cur_num=$cur_num < seq=$seq"; + } + 1; # more messages on the way +} + +sub cmd_fetch ($$$;@) { + my ($self, $tag, $range, @want) = @_; + my $args = fetch_common($self, $tag, $range, \@want); + ref($args) eq 'ARRAY' ? do { + my $want = $args->[-1]; + $want->{-seqno} = ${$args->[2]}; # $$beg + long_response($self, \&seq_fetch_m, @$args) + } : $args; # error } sub uid_search_all { # long_response diff --git a/t/imapd.t b/t/imapd.t index fcbbdc09d31..fc90948e182 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -306,6 +306,24 @@ Content-Disposition: attachment; filename="embed2x\.eml"\r EOF }); # each_inbox +# message sequence numbers :< +is($mic->Uid(0), 0, 'disable UID on '.ref($mic)); +ok($mic->reconnect, 'reconnected'); +$ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@"; +is(scalar keys %$ret, 3, 'got all 3 messages'); +{ + my $rdr = { 0 => \($ret->{1}->{RFC822}) }; + my $env = { HOME => $ENV{HOME} }; + my @cmd = qw(-learn rm --all); + run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm'); +} +my $r2 = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@"; +is(scalar keys %$r2, 3, 'still got all 3 messages'); +like($r2->{1}->{RFC822}, qr/dummy message #1/, 'got dummy message 1'); +is($r2->{2}->{RFC822}, $ret->{2}->{RFC822}, 'message 2 unchanged'); +is($r2->{3}->{RFC822}, $ret->{3}->{RFC822}, 'message 3 unchanged'); +ok($mic->logout, 'logged out'); + $td->kill; $td->join; is($?, 0, 'no error in exited process');
They're not specified in RFC 3501 for responses, and at least mutt fails to handle it. --- lib/PublicInbox/IMAP.pm | 32 ++++++++++++++------------------ t/imap.t | 12 ++++++------ t/imapd.t | 14 ++++++++++---- 3 files changed, 30 insertions(+), 28 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 2aa7ab346c1..54c616eee2f 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -33,9 +33,6 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address; sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977? my %FETCH_NEED_BLOB = ( # for future optimization - 'BODY.PEEK[HEADER]' => 1, - 'BODY.PEEK[TEXT]' => 1, - 'BODY.PEEK[]' => 1, 'BODY[HEADER]' => 1, 'BODY[TEXT]' => 1, 'BODY[]' => 1, @@ -388,8 +385,8 @@ sub uid_fetch_cb { # called by git->cat_async $want->{INTERNALDATE} and $self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"'); $want->{FLAGS} and $self->msg_more(' FLAGS ()'); - for ('RFC822', 'BODY[]', 'BODY.PEEK[]') { - next unless $want->{$_}; + for ('RFC822', 'BODY[]') { + $want->{$_} or next; $self->msg_more(" $_ {".length($$bref)."}\r\n"); $self->msg_more($$bref); } @@ -399,14 +396,14 @@ sub uid_fetch_cb { # called by git->cat_async $want->{ENVELOPE} and $self->msg_more(' ENVELOPE '.eml_envelope($eml)); - for my $f ('RFC822.HEADER', 'BODY[HEADER]', 'BODY.PEEK[HEADER]') { - next unless $want->{$f}; - $self->msg_more(" $f {".length(${$eml->{hdr}})."}\r\n"); + for ('RFC822.HEADER', 'BODY[HEADER]') { + $want->{$_} or next; + $self->msg_more(" $_ {".length(${$eml->{hdr}})."}\r\n"); $self->msg_more(${$eml->{hdr}}); } - for my $f ('RFC822.TEXT', 'BODY[TEXT]') { - next unless $want->{$f}; - $self->msg_more(" $f {".length($$bref)."}\r\n"); + for ('RFC822.TEXT', 'BODY[TEXT]') { + $want->{$_} or next; + $self->msg_more(" $_ {".length($$bref)."}\r\n"); $self->msg_more($$bref); } $want->{BODYSTRUCTURE} and @@ -567,18 +564,16 @@ sub partial_prepare ($$$) { # recombine [ "BODY[1.HEADER.FIELDS", "(foo", "bar)]" ] # back to: "BODY[1.HEADER.FIELDS (foo bar)]" - return unless $att =~ /\ABODY(?:\.PEEK)?\[/s; + return unless $att =~ /\ABODY\[/s; until (rindex($att, ']') >= 0) { my $next = shift @$want or return; $att .= ' ' . uc($next); } - if ($att =~ /\ABODY(?:\.PEEK)?\[ - ([0-9]+(?:\.[0-9]+)*)? # 1 - section_idx - (?:\.(HEADER|MIME|TEXT))? # 2 - section_name + if ($att =~ /\ABODY\[([0-9]+(?:\.[0-9]+)*)? # 1 - section_idx + (?:\.(HEADER|MIME|TEXT))? # 2 - section_name \](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 3, 4 $partial->{$att} = [ \&partial_body, $1, $2, $3, $4 ]; - } elsif ($att =~ /\ABODY(?:\.PEEK)?\[ - (?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx + } elsif ($att =~ /\ABODY\[(?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx (?:HEADER\.FIELDS(\.NOT)?)\x20 # 2 \(([A-Z0-9\-\x20]+)\) # 3 - hdrs \](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 4 5 @@ -623,6 +618,7 @@ sub fetch_common ($$$$) { my (%partial, %want); while (defined(my $att = shift @$want)) { $att = uc($att); + $att =~ s/\ABODY\.PEEK\[/BODY\[/; # we're read-only my $x = $FETCH_ATT{$att}; if ($x) { %want = (%want, %$x); @@ -633,7 +629,7 @@ sub fetch_common ($$$$) { # stabilize partial order for consistency and ease-of-debugging: if (scalar keys %partial) { - $want{-partial} = [ map { + $want{-partial} = [ map {; [ $_, @{$partial{$_}} ] } sort keys %partial ]; } diff --git a/t/imap.t b/t/imap.t index 9b64f1646c6..fe6352b678c 100644 --- a/t/imap.t +++ b/t/imap.t @@ -27,11 +27,11 @@ use PublicInbox::IMAPD; { my $partial_prepare = \&PublicInbox::IMAP::partial_prepare; my $x = {}; - my $r = $partial_prepare->($x, [], my $p = 'BODY.PEEK[9]'); + my $r = $partial_prepare->($x, [], my $p = 'BODY[9]'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5>'); + $r = $partial_prepare->($x, [], $p = 'BODY[9]<5>'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY.PEEK[9]<5.1>'); + $r = $partial_prepare->($x, [], $p = 'BODY[9]<5.1>'); ok($r, $p); $r = $partial_prepare->($x, [], $p = 'BODY[1.1]'); ok($r, $p); @@ -47,9 +47,9 @@ use PublicInbox::IMAPD; my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get; my $partial_hdr_not = \&PublicInbox::IMAP::partial_hdr_not; is_deeply($x, { - 'BODY.PEEK[9]' => [ $partial_body, 9, undef, undef, undef ], - 'BODY.PEEK[9]<5>' => [ $partial_body, 9, undef, 5, undef ], - 'BODY.PEEK[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ], + 'BODY[9]' => [ $partial_body, 9, undef, undef, undef ], + 'BODY[9]<5>' => [ $partial_body, 9, undef, 5, undef ], + 'BODY[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ], 'BODY[1.1]' => [ $partial_body, '1.1', undef, undef, undef ], 'BODY[HEADER.FIELDS (DATE FROM)]' => [ $partial_hdr_get, undef, 'DATE FROM', undef, undef ], diff --git a/t/imapd.t b/t/imapd.t index fc90948e182..1ec0d5c370f 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -317,11 +317,17 @@ is(scalar keys %$ret, 3, 'got all 3 messages'); my @cmd = qw(-learn rm --all); run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm'); } -my $r2 = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@"; +my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@"; is(scalar keys %$r2, 3, 'still got all 3 messages'); -like($r2->{1}->{RFC822}, qr/dummy message #1/, 'got dummy message 1'); -is($r2->{2}->{RFC822}, $ret->{2}->{RFC822}, 'message 2 unchanged'); -is($r2->{3}->{RFC822}, $ret->{3}->{RFC822}, 'message 3 unchanged'); +like($r2->{1}->{'BODY[]'}, qr/dummy message #1/, 'got dummy message 1'); +is($r2->{2}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged'); +is($r2->{3}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged'); +$r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]') + or BAIL_OUT "FETCH $@"; +is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'}, + 'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n", + 'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK'); + ok($mic->logout, 'logged out'); $td->kill;
It seems worthless to support CLOSE for read-only inboxes, but mutt sends it, so don't return a BAD error with proper use. --- lib/PublicInbox/IMAP.pm | 6 ++++++ t/imapd.t | 3 ++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 54c616eee2f..0852ffab868 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -121,6 +121,12 @@ sub cmd_login ($$$$) { login_success($self, $tag); } +sub cmd_close ($$) { + my ($self, $tag) = @_; + delete $self->{ibx} ? "$tag OK Close done\r\n" + : "$tag BAD No mailbox\r\n"; +} + sub cmd_logout ($$) { my ($self, $tag) = @_; delete $self->{logged_in}; diff --git a/t/imapd.t b/t/imapd.t index 1ec0d5c370f..8172a91923d 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -327,7 +327,8 @@ $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]') is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'}, 'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n", 'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK'); - +ok($mic->close, 'CLOSE works'); +ok(!$mic->close, 'CLOSE not idempotent'); ok($mic->logout, 'logged out'); $td->kill;
While we can't memoize the regexp forever like we do with other Eml users, we can still benefit from caching regexp compilation on a per-request basis. A FETCH request from mutt on a 4K message inbox is around 8% faster after this. Since regexp compilation via qr// isn't unbearably slow, a shared cache probably isn't worth the trouble of implementing. A per-request cache seems enough. --- lib/PublicInbox/IMAP.pm | 17 ++++++++--------- t/imap.t | 10 +++++++--- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 0852ffab868..39667199080 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -544,25 +544,23 @@ sub hdrs_regexp ($) { # BODY[($SECTION_IDX.)?HEADER.FIELDS.NOT ($HDRS)]<$offset.$bytes> sub partial_hdr_not { - my ($eml, $section_idx, $hdrs) = @_; + my ($eml, $section_idx, $hdrs_re) = @_; if (defined $section_idx) { $eml = eml_body_idx($eml, $section_idx) or return; } my $str = $eml->header_obj->as_string; - my $re = hdrs_regexp($hdrs); - $str =~ s/$re//g; + $str =~ s/$hdrs_re//g; $str .= "\r\n"; } # BODY[($SECTION_IDX.)?HEADER.FIELDS ($HDRS)]<$offset.$bytes> sub partial_hdr_get { - my ($eml, $section_idx, $hdrs) = @_; + my ($eml, $section_idx, $hdrs_re) = @_; if (defined $section_idx) { $eml = eml_body_idx($eml, $section_idx) or return; } my $str = $eml->header_obj->as_string; - my $re = hdrs_regexp($hdrs); - join('', ($str =~ m/($re)/g), "\r\n"); + join('', ($str =~ m/($hdrs_re)/g), "\r\n"); } sub partial_prepare ($$$) { @@ -583,9 +581,10 @@ sub partial_prepare ($$$) { (?:HEADER\.FIELDS(\.NOT)?)\x20 # 2 \(([A-Z0-9\-\x20]+)\) # 3 - hdrs \](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 4 5 - $partial->{$att} = [ $2 ? \&partial_hdr_not - : \&partial_hdr_get, - $1, $3, $4, $5 ]; + my $tmp = $partial->{$att} = [ $2 ? \&partial_hdr_not + : \&partial_hdr_get, + $1, undef, $4, $5 ]; + $tmp->[2] = hdrs_regexp($3); } else { undef; } diff --git a/t/imap.t b/t/imap.t index fe6352b678c..451b6596bf9 100644 --- a/t/imap.t +++ b/t/imap.t @@ -46,17 +46,21 @@ use PublicInbox::IMAPD; my $partial_body = \&PublicInbox::IMAP::partial_body; my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get; my $partial_hdr_not = \&PublicInbox::IMAP::partial_hdr_not; + my $hdrs_regexp = \&PublicInbox::IMAP::hdrs_regexp; is_deeply($x, { 'BODY[9]' => [ $partial_body, 9, undef, undef, undef ], 'BODY[9]<5>' => [ $partial_body, 9, undef, 5, undef ], 'BODY[9]<5.1>' => [ $partial_body, 9, undef, 5, 1 ], 'BODY[1.1]' => [ $partial_body, '1.1', undef, undef, undef ], 'BODY[HEADER.FIELDS (DATE FROM)]' => [ $partial_hdr_get, - undef, 'DATE FROM', undef, undef ], + undef, $hdrs_regexp->('DATE FROM'), + undef, undef ], 'BODY[HEADER.FIELDS.NOT (TO)]' => [ $partial_hdr_not, - undef, 'TO', undef, undef ], + undef, $hdrs_regexp->('TO'), + undef, undef ], 'BODY[1.1.HEADER.FIELDS (TO)]' => [ $partial_hdr_get, - '1.1', 'TO', undef, undef ], + '1.1', $hdrs_regexp->('TO'), + undef, undef ], }, 'structure matches expected'); }
Small array refs have considerable overhead in Perl, so reduce AV/SV overhead and instead allow the inflight array to grow twice as large. --- lib/PublicInbox/Git.pm | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index 8426cc7d6a7..e1d5c386e7e 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -18,7 +18,7 @@ use base qw(Exporter); our @EXPORT_OK = qw(git_unquote git_quote); use constant MAX_INFLIGHT => - ($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) + (($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 2) / 65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1 @@ -135,8 +135,8 @@ sub read_cat_in_full ($$) { sub _cat_async_step ($$) { my ($self, $inflight) = @_; - my $pair = shift @$inflight or die 'BUG: inflight empty'; - my ($cb, $arg) = @$pair; + die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2; + my ($cb, $arg) = splice(@$inflight, 0, 2); local $/ = "\n"; my $head = readline($self->{in}); $head =~ / missing$/ and return @@ -314,7 +314,7 @@ sub cat_async ($$$;$) { } print { $self->{out} } $oid, "\n" or fail($self, "write error: $!"); - push(@$inflight, [ $cb, $arg ]); + push(@$inflight, $cb, $arg); } sub extract_cmt_time {
To work with our event loop, we must perform read buffering ourselves or risk starvation, as there doesn't appear to be a way to check the amount of data buffered in userspace by by the PerlIO layers without resorting to C or XS. This lets us perform fewer syscalls at the expense of more Perl ops. As it stands, there seems to be a tiny performance improvement, but more will be possible in the future. --- lib/PublicInbox/Git.pm | 82 +++++++++++++++++++++++++++--------------- 1 file changed, 54 insertions(+), 28 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index e1d5c386e7e..54c163e8c2f 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -16,6 +16,8 @@ use PublicInbox::Spawn qw(popen_rd); use PublicInbox::Tmpfile; use base qw(Exporter); our @EXPORT_OK = qw(git_unquote git_quote); +use Errno qw(EINTR); +our $PIPE_BUFSIZ = 65536; # Linux default use constant MAX_INFLIGHT => (($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 2) @@ -121,32 +123,54 @@ sub _bidi_pipe { fcntl($out_w, 1031, 4096); fcntl($in_r, 1031, 4096) if $batch eq '--batch-check'; } + $self->{$batch} = \(my $rbuf = ''); $self->{$out} = $out_w; $self->{$in} = $in_r; } -sub read_cat_in_full ($$) { - my ($self, $len) = @_; - ++$len; # for final "\n" added by git - read($self->{in}, my $buf, $len) == $len or fail($self, 'short read'); - chop($buf) eq "\n" or fail($self, 'newline missing after blob'); - \$buf; +sub my_read ($$$) { + my ($fh, $rbuf, $len) = @_; + my $left = $len - length($$rbuf); + my $r; + while ($left > 0) { + $r = sysread($fh, $$rbuf, $PIPE_BUFSIZ, length($$rbuf)); + if ($r) { + $left -= $r; + } else { + next if (!defined($r) && $! == EINTR); + return $r; + } + } + \substr($$rbuf, 0, $len, ''); +} + +sub my_readline ($$) { + my ($fh, $rbuf) = @_; + while (1) { + if ((my $n = index($$rbuf, "\n")) >= 0) { + return substr($$rbuf, 0, $n + 1, ''); + } + my $r = sysread($fh, $$rbuf, $PIPE_BUFSIZ, length($$rbuf)); + next if $r || (!defined($r) && $! == EINTR); + return defined($r) ? '' : undef; # EOF or error + } } sub _cat_async_step ($$) { my ($self, $inflight) = @_; die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2; my ($cb, $arg) = splice(@$inflight, 0, 2); - local $/ = "\n"; - my $head = readline($self->{in}); + my $head = my_readline($self->{in}, $self->{'--batch'}); $head =~ / missing$/ and return eval { $cb->(undef, undef, undef, undef, $arg) }; $head =~ /^([0-9a-f]{40}) (\S+) ([0-9]+)$/ or fail($self, "Unexpected result from async git cat-file: $head"); my ($oid_hex, $type, $size) = ($1, $2, $3 + 0); - my $bref = read_cat_in_full($self, $size); - eval { $cb->($bref, $oid_hex, $type, $size, $arg) }; + my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1); + fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret; + chop($$ret) eq "\n" or fail($self, 'newline missing after blob'); + eval { $cb->($ret, $oid_hex, $type, $size, $arg) }; warn "E: $oid_hex $@\n" if $@; } @@ -158,16 +182,18 @@ sub cat_async_wait ($) { } } +sub batch_prepare ($) { + _bidi_pipe($_[0], qw(--batch in out pid)); +} + sub cat_file { - my ($self, $obj, $ref) = @_; + my ($self, $obj, $sizeref) = @_; my ($retried, $head); cat_async_wait($self); again: batch_prepare($self); print { $self->{out} } $obj, "\n" or fail($self, "write error: $!"); - - local $/ = "\n"; - $head = readline($self->{in}); + $head = my_readline($self->{in}, $self->{'--batch'}); if ($head =~ / missing$/) { if (!$retried && alternates_changed($self)) { $retried = 1; @@ -179,19 +205,19 @@ again: $head =~ /^[0-9a-f]{40} \S+ ([0-9]+)$/ or fail($self, "Unexpected result from git cat-file: $head"); - my $size = $1; - $$ref = $size if $ref; - read_cat_in_full($self, $size); + my $size = $1 + 0; + $$sizeref = $size if $sizeref; + my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1); + fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret; + chop($$ret) eq "\n" or fail($self, 'newline missing after blob'); + $ret; } -sub batch_prepare ($) { _bidi_pipe($_[0], qw(--batch in out pid)) } - sub check { my ($self, $obj) = @_; _bidi_pipe($self, qw(--batch-check in_c out_c pid_c err_c)); print { $self->{out_c} } $obj, "\n" or fail($self, "write error: $!"); - local $/ = "\n"; - chomp(my $line = readline($self->{in_c})); + chomp(my $line = my_readline($self->{in_c}, $self->{'--batch-check'})); my ($hex, $type, $size) = split(' ', $line); # Future versions of git.git may show 'ambiguous', but for now, @@ -201,9 +227,9 @@ sub check { return if $type eq 'missing' || $type eq 'ambiguous'; if ($hex eq 'dangling' || $hex eq 'notdir' || $hex eq 'loop') { - $size = $type + length("\n"); - my $r = read($self->{in_c}, my $buf, $size); - defined($r) or fail($self, "read failed: $!"); + my $ret = my_read($self->{in_c}, $self->{'--batch-check'}, + $type + 1); + fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret; return; } @@ -211,9 +237,9 @@ sub check { } sub _destroy { - my ($self, $in, $out, $pid, $err) = @_; + my ($self, $batch, $in, $out, $pid, $err) = @_; my $p = delete $self->{$pid} or return; - delete @$self{($in, $out)}; + delete @$self{($batch, $in, $out)}; delete $self->{$err} if $err; # `err_c' # PublicInbox::DS may not be loaded @@ -251,8 +277,8 @@ sub qx { # returns true if there are pending "git cat-file" processes sub cleanup { my ($self) = @_; - _destroy($self, qw(in out pid)); - _destroy($self, qw(in_c out_c pid_c err_c)); + _destroy($self, qw(--batch in out pid)); + _destroy($self, qw(--batch-check in_c out_c pid_c err_c)); !!($self->{pid} || $self->{pid_c}); }
This ought to improve overall performance with multiple clients. Single client performance suffers a tiny bit due to extra syscall overhead from epoll. This also makes the existing async interface easier-to-use, since calling cat_async_begin is no longer required. --- MANIFEST | 1 + lib/PublicInbox/Git.pm | 10 +++---- lib/PublicInbox/GitAsyncCat.pm | 49 ++++++++++++++++++++++++++++++++++ lib/PublicInbox/IMAP.pm | 49 ++++++++++++++++++---------------- lib/PublicInbox/Inbox.pm | 16 ++++++++--- t/git.t | 1 - xt/cmp-msgstr.t | 1 - xt/cmp-msgview.t | 1 - xt/eml_check_limits.t | 1 - xt/git_async_cmp.t | 1 - xt/msgtime_cmp.t | 1 - xt/perf-msgview.t | 1 - 12 files changed, 93 insertions(+), 39 deletions(-) create mode 100644 lib/PublicInbox/GitAsyncCat.pm diff --git a/MANIFEST b/MANIFEST index 2e4ec915412..82f047f3623 100644 --- a/MANIFEST +++ b/MANIFEST @@ -119,6 +119,7 @@ lib/PublicInbox/Filter/SubjectTag.pm lib/PublicInbox/Filter/Vger.pm lib/PublicInbox/GetlineBody.pm lib/PublicInbox/Git.pm +lib/PublicInbox/GitAsyncCat.pm lib/PublicInbox/GitHTTPBackend.pm lib/PublicInbox/GzipFilter.pm lib/PublicInbox/HTTP.pm diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index 54c163e8c2f..c5a3fa4642c 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -156,7 +156,7 @@ sub my_readline ($$) { } } -sub _cat_async_step ($$) { +sub cat_async_step ($$) { my ($self, $inflight) = @_; die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2; my ($cb, $arg) = splice(@$inflight, 0, 2); @@ -178,7 +178,7 @@ sub cat_async_wait ($) { my ($self) = @_; my $inflight = delete $self->{inflight} or return; while (scalar(@$inflight)) { - _cat_async_step($self, $inflight); + cat_async_step($self, $inflight); } } @@ -277,6 +277,7 @@ sub qx { # returns true if there are pending "git cat-file" processes sub cleanup { my ($self) = @_; + cat_async_wait($self); _destroy($self, qw(--batch in out pid)); _destroy($self, qw(--batch-check in_c out_c pid_c err_c)); !!($self->{pid} || $self->{pid_c}); @@ -334,9 +335,9 @@ sub cat_async_begin { sub cat_async ($$$;$) { my ($self, $oid, $cb, $arg) = @_; - my $inflight = $self->{inflight} or die 'BUG: not in async'; + my $inflight = $self->{inflight} // cat_async_begin($self); if (scalar(@$inflight) >= MAX_INFLIGHT) { - _cat_async_step($self, $inflight); + cat_async_step($self, $inflight); } print { $self->{out} } $oid, "\n" or fail($self, "write error: $!"); @@ -358,7 +359,6 @@ sub modified ($) { my ($self) = @_; my $modified = 0; my $fh = popen($self, qw(rev-parse --branches)); - cat_async_begin($self); local $/ = "\n"; while (my $oid = <$fh>) { chomp $oid; diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm new file mode 100644 index 00000000000..f168169f68d --- /dev/null +++ b/lib/PublicInbox/GitAsyncCat.pm @@ -0,0 +1,49 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# +# internal class used by PublicInbox::Git + Danga::Socket +# This parses the output pipe of "git cat-file --batch" +# +# Note: this does NOT set the non-blocking flag, we expect `git cat-file' +# to be a local process, and git won't start writing a blob until it's +# fully read. So minimize context switching and read as much as possible +# and avoid holding a buffer in our heap any longer than it has to live. +package PublicInbox::GitAsyncCat; +use strict; +use parent qw(PublicInbox::DS Exporter); +use fields qw(git); +use PublicInbox::Syscall qw(EPOLLIN EPOLLET); +our @EXPORT = qw(git_async_msg); + +sub new { + my ($class, $git) = @_; + my $self = fields::new($class); + $git->batch_prepare; + $self->SUPER::new($git->{in}, EPOLLIN|EPOLLET); + $self->{git} = $git; + $self; +} + +sub event_step { + my ($self) = @_; + my $git = $self->{git} or return; # ->close-ed + my $inflight = $git->{inflight}; + if (@$inflight) { + $git->cat_async_step($inflight); + $self->requeue if @$inflight || length(${$git->{'--batch'}}); + } +} + +sub close { + my ($self) = @_; + delete $self->{git}; + $self->SUPER::close; # PublicInbox::DS::close +} + +sub git_async_msg ($$$$) { + my ($ibx, $smsg, $cb, $arg) = @_; + $ibx->git->cat_async($smsg->{blob}, $cb, $arg); + $ibx->{async_cat} //= new(__PACKAGE__, $ibx->{git}); +} + +1; diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 39667199080..2282e3ce78c 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -21,8 +21,10 @@ use PublicInbox::Eml; use PublicInbox::EmlContentFoo qw(parse_content_disposition); use PublicInbox::DS qw(now); use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT); +use PublicInbox::GitAsyncCat; use Text::ParseWords qw(parse_line); use Errno qw(EAGAIN); + my $Address; for my $mod (qw(Email::Address::XS Mail::Address)) { eval "require $mod" or next; @@ -367,14 +369,30 @@ EOF \$ret; } -sub uid_fetch_cb { # called by git->cat_async +sub requeue_once ($) { + my ($self) = @_; + # COMPRESS users all share the same DEFLATE context. + # Flush it here to ensure clients don't see + # each other's data + $self->zflush; + + # no recursion, schedule another call ASAP, + # but only after all pending writes are done. + # autovivify wbuf: + my $new_size = push(@{$self->{wbuf}}, \&long_step); + + # wbuf may be populated by $cb, no need to rearm if so: + $self->requeue if $new_size == 1; +} + +sub uid_fetch_cb { # called by git->cat_async via git_async_msg my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message - return unless defined $want->{-seqno}; + return requeue_once($self) unless defined $want->{-seqno}; $bref = dummy_message($smsg->{num}, $ibx); } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; @@ -420,6 +438,7 @@ sub uid_fetch_cb { # called by git->cat_async partial_emit($self, $partial, $eml); } $self->msg_more(")\r\n"); + requeue_once($self); } sub uid_fetch_m { # long_response @@ -432,11 +451,7 @@ sub uid_fetch_m { # long_response } $$beg = $msgs->[-1]->{num} + 1; } - my $git = $ibx->git; - $git->cat_async_begin; # TODO: actually make async - $git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_); - $git->cat_async_wait; - 1; + git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_); } sub cmd_status ($$$;@) { @@ -677,20 +692,17 @@ sub seq_fetch_m { # long_response my $seq = $want->{-seqno}++; my $cur_num = $msgs->[0]->{num}; if ($cur_num == $seq) { # as expected - my $git = $ibx->git; - $git->cat_async_begin; # TODO: actually make async - $git->cat_async($msgs->[0]->{blob}, \&uid_fetch_cb, \@_); - $git->cat_async_wait; + git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_); } elsif ($cur_num > $seq) { # send dummy messages until $seq catches up to $cur_num my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg'; unshift @$msgs, $smsg; my $bref = dummy_message($seq, $ibx); uid_fetch_cb($bref, undef, undef, undef, \@_); + $smsg; # blessed response since uid_fetch_cb requeues } else { # should not happen die "BUG: cur_num=$cur_num < seq=$seq"; } - 1; # more messages on the way } sub cmd_fetch ($$$;@) { @@ -810,17 +822,8 @@ sub long_step { } elsif ($more) { # $self->{wbuf}: $self->update_idle_time; - # COMPRESS users all share the same DEFLATE context. - # Flush it here to ensure clients don't see - # each other's data - $self->zflush; - - # no recursion, schedule another call ASAP, but only after - # all pending writes are done. autovivify wbuf: - my $new_size = push(@{$self->{wbuf}}, \&long_step); - - # wbuf may be populated by $cb, no need to rearm if so: - $self->requeue if $new_size == 1; + # control passed to $more may be a GitAsyncCat object + requeue_once($self) if !ref($more); } else { # all done! delete $self->{long_cb}; my $elapsed = now() - $t0; diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm index b250bef33d3..407751c3063 100644 --- a/lib/PublicInbox/Inbox.pm +++ b/lib/PublicInbox/Inbox.pm @@ -18,6 +18,16 @@ my $cleanup_timer; my $cleanup_avail = -1; # 0, or 1 my $have_devel_peek; my $CLEANUP = {}; # string(inbox) -> inbox + +sub git_cleanup ($) { + my ($self) = @_; + my $git = $self->{git} or return; + if (my $async_cat = delete $self->{async_cat}) { + $async_cat->close; + } + $git->cleanup; +} + sub cleanup_task () { $cleanup_timer = undef; my $next = {}; @@ -32,9 +42,7 @@ sub cleanup_task () { # refcnt is zero when tmp is out-of-scope } } - if (my $git = $ibx->{git}) { - $again = $git->cleanup; - } + git_cleanup($ibx); if (my $gits = $ibx->{-repo_objs}) { foreach my $git (@$gits) { $again = 1 if $git->cleanup; @@ -157,7 +165,7 @@ sub max_git_epoch { my $cur = $self->{-max_git_epoch}; my $changed = git($self)->alternates_changed; if (!defined($cur) || $changed) { - $self->git->cleanup if $changed; + git_cleanup($self) if $changed; my $gits = "$self->{inboxdir}/git"; if (opendir my $dh, $gits) { my $max = -1; diff --git a/t/git.t b/t/git.t index b05ac123bec..0b2089ba701 100644 --- a/t/git.t +++ b/t/git.t @@ -40,7 +40,6 @@ use_ok 'PublicInbox::Git'; my $arg = { 'foo' => 'bar' }; my $res = []; my $missing = []; - $gcf->cat_async_begin; $gcf->cat_async($oid, sub { my ($bref, $oid_hex, $type, $size, $arg) = @_; $res = [ @_ ]; diff --git a/xt/cmp-msgstr.t b/xt/cmp-msgstr.t index 6bae0f66e83..0276f8454b0 100644 --- a/xt/cmp-msgstr.t +++ b/xt/cmp-msgstr.t @@ -93,7 +93,6 @@ my $git_cb = sub { diag xqx([qw(git diff), "$tmpdir/mime", "$tmpdir/eml"]); } }; -$git->cat_async_begin; my $t = timeit(1, sub { while (<$fh>) { my ($oid, $type) = split / /; diff --git a/xt/cmp-msgview.t b/xt/cmp-msgview.t index 66fb467eb1a..5bd7aa174df 100644 --- a/xt/cmp-msgview.t +++ b/xt/cmp-msgview.t @@ -77,7 +77,6 @@ my $git_cb = sub { is_deeply($eml_cmp, $mime_cmp, "$inboxdir $oid match"); } }; -$git->cat_async_begin; my $t = timeit(1, sub { while (<$fh>) { my ($oid, $type) = split / /; diff --git a/xt/eml_check_limits.t b/xt/eml_check_limits.t index 39de047645b..2d632799956 100644 --- a/xt/eml_check_limits.t +++ b/xt/eml_check_limits.t @@ -55,7 +55,6 @@ my $cat_cb = sub { }; my $t = timeit(1, sub { - $git->cat_async_begin; my ($blob, $type); while (<$fh>) { ($blob, $type) = split / /; diff --git a/xt/git_async_cmp.t b/xt/git_async_cmp.t index 46d27b26bed..8f8d1cf4d01 100644 --- a/xt/git_async_cmp.t +++ b/xt/git_async_cmp.t @@ -25,7 +25,6 @@ my $async = timeit($nr, sub { $dig->add($$bref); }; my $cat = $git->popen(@cat); - $git->cat_async_begin; while (<$cat>) { my ($oid, undef, undef) = split(/ /); diff --git a/xt/msgtime_cmp.t b/xt/msgtime_cmp.t index 95d7c64bec6..0ce3c042217 100644 --- a/xt/msgtime_cmp.t +++ b/xt/msgtime_cmp.t @@ -59,7 +59,6 @@ sub compare { } my $fh = $git->popen(@cat); -$git->cat_async_begin; while (<$fh>) { my ($oid, $type) = split / /; next if $type ne 'blob'; diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t index 30fc07dcbc8..d99101a30de 100644 --- a/xt/perf-msgview.t +++ b/xt/perf-msgview.t @@ -44,7 +44,6 @@ my $cb = sub { $obuf = ''; }; -$git->cat_async_begin; my $t = timeit(1, sub { $ctx->{obuf} = \$obuf; $ctx->{mhref} = '../';
We do this for the C10K-oriented HTTP/NNTP/IMAP processes, and we may support thousands of git-cat-file processes in the future. --- lib/PublicInbox/Git.pm | 47 ++++++++++++++++++---------------- lib/PublicInbox/GitAsyncCat.pm | 2 +- 2 files changed, 26 insertions(+), 23 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index c5a3fa4642c..1c148086f46 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -123,7 +123,6 @@ sub _bidi_pipe { fcntl($out_w, 1031, 4096); fcntl($in_r, 1031, 4096) if $batch eq '--batch-check'; } - $self->{$batch} = \(my $rbuf = ''); $self->{$out} = $out_w; $self->{$in} = $in_r; } @@ -160,18 +159,20 @@ sub cat_async_step ($$) { my ($self, $inflight) = @_; die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2; my ($cb, $arg) = splice(@$inflight, 0, 2); - my $head = my_readline($self->{in}, $self->{'--batch'}); - $head =~ / missing$/ and return - eval { $cb->(undef, undef, undef, undef, $arg) }; - - $head =~ /^([0-9a-f]{40}) (\S+) ([0-9]+)$/ or + my $rbuf = delete($self->{cat_rbuf}) // \(my $new = ''); + my ($bref, $oid, $type, $size); + my $head = my_readline($self->{in}, $rbuf); + if ($head =~ /^([0-9a-f]{40}) (\S+) ([0-9]+)$/) { + ($oid, $type, $size) = ($1, $2, $3 + 0); + $bref = my_read($self->{in}, $rbuf, $size + 1) or + fail($self, defined($bref) ? 'read EOF' : "read: $!"); + chop($$bref) eq "\n" or fail($self, 'LF missing after blob'); + } elsif ($head !~ / missing$/) { fail($self, "Unexpected result from async git cat-file: $head"); - my ($oid_hex, $type, $size) = ($1, $2, $3 + 0); - my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1); - fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret; - chop($$ret) eq "\n" or fail($self, 'newline missing after blob'); - eval { $cb->($ret, $oid_hex, $type, $size, $arg) }; - warn "E: $oid_hex $@\n" if $@; + } + eval { $cb->($bref, $oid, $type, $size, $arg) }; + $self->{cat_rbuf} = $rbuf if $$rbuf ne ''; + warn "E: $oid: $@\n" if $@; } sub cat_async_wait ($) { @@ -188,12 +189,13 @@ sub batch_prepare ($) { sub cat_file { my ($self, $obj, $sizeref) = @_; - my ($retried, $head); + my ($retried, $head, $rbuf); cat_async_wait($self); again: batch_prepare($self); + $rbuf = delete($self->{cat_rbuf}) // \(my $new = ''); print { $self->{out} } $obj, "\n" or fail($self, "write error: $!"); - $head = my_readline($self->{in}, $self->{'--batch'}); + $head = my_readline($self->{in}, $rbuf); if ($head =~ / missing$/) { if (!$retried && alternates_changed($self)) { $retried = 1; @@ -207,7 +209,8 @@ again: my $size = $1 + 0; $$sizeref = $size if $sizeref; - my $ret = my_read($self->{in}, $self->{'--batch'}, $size + 1); + my $ret = my_read($self->{in}, $rbuf, $size + 1); + $self->{cat_rbuf} = $rbuf if $$rbuf ne ''; fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret; chop($$ret) eq "\n" or fail($self, 'newline missing after blob'); $ret; @@ -217,7 +220,8 @@ sub check { my ($self, $obj) = @_; _bidi_pipe($self, qw(--batch-check in_c out_c pid_c err_c)); print { $self->{out_c} } $obj, "\n" or fail($self, "write error: $!"); - chomp(my $line = my_readline($self->{in_c}, $self->{'--batch-check'})); + my $rbuf = ''; # TODO: async + {chk_rbuf} + chomp(my $line = my_readline($self->{in_c}, \$rbuf)); my ($hex, $type, $size) = split(' ', $line); # Future versions of git.git may show 'ambiguous', but for now, @@ -227,8 +231,7 @@ sub check { return if $type eq 'missing' || $type eq 'ambiguous'; if ($hex eq 'dangling' || $hex eq 'notdir' || $hex eq 'loop') { - my $ret = my_read($self->{in_c}, $self->{'--batch-check'}, - $type + 1); + my $ret = my_read($self->{in_c}, \$rbuf, $type + 1); fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret; return; } @@ -237,9 +240,9 @@ sub check { } sub _destroy { - my ($self, $batch, $in, $out, $pid, $err) = @_; + my ($self, $rbuf, $in, $out, $pid, $err) = @_; my $p = delete $self->{$pid} or return; - delete @$self{($batch, $in, $out)}; + delete @$self{($rbuf, $in, $out)}; delete $self->{$err} if $err; # `err_c' # PublicInbox::DS may not be loaded @@ -278,8 +281,8 @@ sub qx { sub cleanup { my ($self) = @_; cat_async_wait($self); - _destroy($self, qw(--batch in out pid)); - _destroy($self, qw(--batch-check in_c out_c pid_c err_c)); + _destroy($self, qw(cat_rbuf in out pid)); + _destroy($self, qw(chk_rbuf in_c out_c pid_c err_c)); !!($self->{pid} || $self->{pid_c}); } diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm index f168169f68d..65e16121969 100644 --- a/lib/PublicInbox/GitAsyncCat.pm +++ b/lib/PublicInbox/GitAsyncCat.pm @@ -30,7 +30,7 @@ sub event_step { my $inflight = $git->{inflight}; if (@$inflight) { $git->cat_async_step($inflight); - $self->requeue if @$inflight || length(${$git->{'--batch'}}); + $self->requeue if @$inflight || exists $git->{cat_rbuf}; } }
Since we only support read-only operation, we can't save subscriptions requested by clients. So just list no inboxes as subscribed, some MUAs may blindly try to fetch everything its subscribed to. --- lib/PublicInbox/IMAP.pm | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 2282e3ce78c..51ab8b8c2f2 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -497,6 +497,11 @@ sub cmd_list ($$$$) { \(join('', @$l, "$tag OK List done\r\n")); } +sub cmd_lsub ($$$$) { + my (undef, $tag) = @_; # same args as cmd_list + "$tag OK Lsub done\r\n"; +} + sub eml_index_offs_i { # PublicInbox::Eml::each_part callback my ($p, $all) = @_; my ($eml, undef, $idx) = @$p;
The RFC 3501 `sequence-set' definition allows comma-delimited ranges, so we'll support it in case clients send them. Coalescing overlapping ranges isn't required, so we won't support it as such an attempt to save bandwidth would waste memory on the server, instead. --- lib/PublicInbox/IMAP.pm | 92 +++++++++++++++++++++++++---------------- t/imapd.t | 2 + 2 files changed, 59 insertions(+), 35 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 51ab8b8c2f2..917833f7c47 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -61,6 +61,9 @@ for my $att (keys %FETCH_ATT) { $FETCH_ATT{$att} = \%h; } +my $valid_range = '[0-9]+|[0-9]+:[0-9]+|[0-9]+:\*'; +$valid_range = qr/\A(?:$valid_range)(?:,(?:$valid_range))*\z/; + sub greet ($) { my ($self) = @_; my $capa = capa($self); @@ -387,7 +390,7 @@ sub requeue_once ($) { sub uid_fetch_cb { # called by git->cat_async via git_async_msg my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; - my ($self, undef, $ibx, undef, undef, $msgs, $want) = @$fetch_m_arg; + my ($self, undef, $ibx, $msgs, undef, $want) = @$fetch_m_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs @@ -441,15 +444,48 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_msg requeue_once($self); } +sub range_step ($$) { + my ($ibx, $range_csv) = @_; + my ($beg, $end, $range); + if ($$range_csv =~ s/\A([^,]+),//) { + $range = $1; + } else { + $range = $$range_csv; + $$range_csv = undef; + } + if ($range =~ /\A([0-9]+):([0-9]+)\z/) { + ($beg, $end) = ($1, $2); + } elsif ($range =~ /\A([0-9]+):\*\z/) { + ($beg, $end) = ($1, $ibx->mm->max // 0); + } elsif ($range =~ /\A[0-9]+\z/) { + $beg = $end = $range; + } else { + return 'BAD fetch range'; + } + [ $beg, $end, $$range_csv ]; +} + +sub refill_range ($$$) { + my ($ibx, $msgs, $range_info) = @_; + my ($beg, $end, $range_csv) = @$range_info; + if (scalar(@$msgs = @{$ibx->over->query_xover($beg, $end)})) { + $range_info->[0] = $msgs->[-1]->{num} + 1; + return; + } + return 'OK Fetch done' if !$range_csv; + my $next_range = range_step($ibx, \$range_csv); + return $next_range if !ref($next_range); # error + @$range_info = @$next_range; + undef; # keep looping +} + sub uid_fetch_m { # long_response - my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_; - if (!@$msgs) { # refill - @$msgs = @{$ibx->over->query_xover($$beg, $end)}; - if (!@$msgs) { - $self->write(\"$tag OK Fetch done\r\n"); + my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_; + while (!@$msgs) { # rare + if (my $end = refill_range($ibx, $msgs, $range_info)) { + $self->write(\"$tag $end\r\n"); return; } - $$beg = $msgs->[-1]->{num} + 1; } git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_); } @@ -635,7 +671,7 @@ sub partial_emit ($$$) { } sub fetch_common ($$$$) { - my ($self, $tag, $range, $want) = @_; + my ($self, $tag, $range_csv, $want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; if ($want->[0] =~ s/\A\(//s) { $want->[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n"; @@ -658,41 +694,27 @@ sub fetch_common ($$$$) { [ $_, @{$partial{$_}} ] } sort keys %partial ]; } - - my ($beg, $end); - my $msgs = []; - if ($range =~ /\A([0-9]+):([0-9]+)\z/s) { - ($beg, $end) = ($1, $2); - } elsif ($range =~ /\A([0-9]+):\*\z/s) { - ($beg, $end) = ($1, $ibx->mm->max // 0); - } elsif ($range =~ /\A[0-9]+\z/) { - my $smsg = $ibx->over->get_art($range) or - return "$tag OK Fetch done\r\n"; # really OK(!) - push @$msgs, $smsg; - ($beg, $end) = ($range, 0); - } else { - return "$tag BAD fetch range\r\n"; - } - [ $tag, $ibx, \$beg, $end, $msgs, \%want ]; + $range_csv = 'bad' if $range_csv !~ $valid_range; + my $range_info = range_step($ibx, \$range_csv); + return "$tag $range_info\r\n" if !ref($range_info); + [ $tag, $ibx, [], $range_info, \%want ]; } sub cmd_uid_fetch ($$$;@) { - my ($self, $tag, $range, @want) = @_; - my $args = fetch_common($self, $tag, $range, \@want); + my ($self, $tag, $range_csv, @want) = @_; + my $args = fetch_common($self, $tag, $range_csv, \@want); ref($args) eq 'ARRAY' ? long_response($self, \&uid_fetch_m, @$args) : $args; # error } sub seq_fetch_m { # long_response - my ($self, $tag, $ibx, $beg, $end, $msgs, $want) = @_; - if (!@$msgs) { # refill - @$msgs = @{$ibx->over->query_xover($$beg, $end)}; - if (!@$msgs) { - $self->write(\"$tag OK Fetch done\r\n"); + my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_; + while (!@$msgs) { # rare + if (my $end = refill_range($ibx, $msgs, $range_info)) { + $self->write(\"$tag $end\r\n"); return; } - $$beg = $msgs->[-1]->{num} + 1; } my $seq = $want->{-seqno}++; my $cur_num = $msgs->[0]->{num}; @@ -711,11 +733,11 @@ sub seq_fetch_m { # long_response } sub cmd_fetch ($$$;@) { - my ($self, $tag, $range, @want) = @_; - my $args = fetch_common($self, $tag, $range, \@want); + my ($self, $tag, $range_csv, @want) = @_; + my $args = fetch_common($self, $tag, $range_csv, \@want); ref($args) eq 'ARRAY' ? do { my $want = $args->[-1]; - $want->{-seqno} = ${$args->[2]}; # $$beg + $want->{-seqno} = $args->[3]->[0]; # $beg == $range_info->[0]; long_response($self, \&seq_fetch_m, @$args) } : $args; # error } diff --git a/t/imapd.t b/t/imapd.t index 8172a91923d..7af14f1b150 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -309,6 +309,8 @@ EOF # message sequence numbers :< is($mic->Uid(0), 0, 'disable UID on '.ref($mic)); ok($mic->reconnect, 'reconnected'); +$ret = $mic->fetch_hash('1,2:3', 'RFC822') or BAIL_OUT "FETCH $@"; +is(scalar keys %$ret, 3, 'got all 3 messages with comma-separated sequence'); $ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@"; is(scalar keys %$ret, 3, 'got all 3 messages'); {
Include a test for Mail::IMAPTalk, here, since Mail::IMAPClient stalls with compression enabled: https://rt.cpan.org/Ticket/Display.html?id=132720 --- MANIFEST | 1 + xt/cmp-imapd-compress.t | 83 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 84 insertions(+) create mode 100644 xt/cmp-imapd-compress.t diff --git a/MANIFEST b/MANIFEST index 82f047f3623..f74852b6e90 100644 --- a/MANIFEST +++ b/MANIFEST @@ -346,6 +346,7 @@ t/www_listing.t t/www_static.t t/x-unknown-alpine.eml t/xcpdb-reshard.t +xt/cmp-imapd-compress.t xt/cmp-msgstr.t xt/cmp-msgview.t xt/eml_check_limits.t diff --git a/xt/cmp-imapd-compress.t b/xt/cmp-imapd-compress.t new file mode 100644 index 00000000000..b12cf74e2d5 --- /dev/null +++ b/xt/cmp-imapd-compress.t @@ -0,0 +1,83 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use Test::More; +use PublicInbox::TestCommon; +require_mods('Data::Dumper'); +Data::Dumper->import('Dumper'); +my $inboxdir = $ENV{GIANT_INBOX_DIR}; +(defined($inboxdir) && -d $inboxdir) or + plan skip_all => "GIANT_INBOX_DIR not defined for $0"; +plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!; +my ($tmpdir, $for_destroy) = tmpdir(); +my $cfg = "$tmpdir/cfg"; +my $mailbox = 'inbox.test'; +{ + open my $fh, '>', $cfg or BAIL_OUT "open: $!"; + print $fh <<EOF or BAIL_OUT "print: $!"; +[publicinbox "test"] + newsgroup = $mailbox + address = test\@example.com + inboxdir = $inboxdir +EOF + close $fh or BAIL_OUT "close: $!"; +} +my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log"); +my $sock = tcp_server(); +my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err"]; +my $env = { PI_CONFIG => $cfg }; +my $td = start_script($cmd, $env, { 3 => $sock }) or BAIL_OUT "-imapd: $?"; +my ($host, $port) = ($sock->sockhost, $sock->sockport); +my $c = tcp_connect($sock); +like(readline($c), qr/CAPABILITY /, 'got greeting'); +undef $c; + +SKIP: { + require_mods('Mail::IMAPClient', 3); + unless ($ENV{RT_132720_FIXED}) { + my $bug = 'https://rt.cpan.org/Ticket/Display.html?id=132720'; + skip "<$bug>, RT_132720_FIXED not defined", 3; + } + my %opt = (Server => $host, Port => $port, + User => 'u', Password => 'p', Clear => 1); + my $uc = Mail::IMAPClient->new(%opt); + my $c = Mail::IMAPClient->new(%opt); + ok($c->compress, 'enabled compression'); + ok $c->examine($mailbox), 'compressed EXAMINE-ed'; + ok $uc->examine($mailbox), 'uncompress EXAMINE-ed'; + my $range = $uc->search('all'); + for my $uid (@$range) { + my $A = $uc->fetch_hash($uid, 'BODY[]'); + my $B = $c->fetch_hash($uid, 'BODY[]'); + if (!is_deeply($A, $B, "$uid identical")) { + diag Dumper([$A, $B]); + diag Dumper([$uc, $c]); + last; + } + } + $uc->logout; + $c->logout; +} + +SKIP: { + require_mods('Mail::IMAPTalk', 3); + my %opt = (Server => $host, Port => $port, UseSSL => 0, + Username => 'u', Password => 'p', Uid => 1); + my $uc = Mail::IMAPTalk->new(%opt) or BAIL_OUT 'IMAPTalk->new'; + my $c = Mail::IMAPTalk->new(%opt, UseCompress => 1) or + BAIL_OUT 'IMAPTalk->new(UseCompress => 1)'; + ok $c->examine($mailbox), 'compressed EXAMINE-ed'; + ok $uc->examine($mailbox), 'uncompress EXAMINE-ed'; + my $range = $uc->search('all'); + for my $uid (@$range) { + my $A = $uc->fetch($uid, 'rfc822'); + my $B = $c->fetch($uid, 'rfc822'); + if (!is_deeply($A, $B, "$uid identical")) { + diag Dumper([$A, $B]); + diag Dumper([$uc, $c]); + last; + } + } +} +done_testing;
None of our tests rely on this failing, so just bail out if the system is out of resources. --- lib/PublicInbox/TestCommon.pm | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm index 246047b1357..5e7dc8b0d3f 100644 --- a/lib/PublicInbox/TestCommon.pm +++ b/lib/PublicInbox/TestCommon.pm @@ -37,17 +37,18 @@ sub tcp_server () { Type => Socket::SOCK_STREAM(), Listen => 1024, Blocking => 0, - ) + ) or Test::More::BAIL_OUT("failed to create TCP server: $!"); } sub tcp_connect { my ($dest, %opt) = @_; + my $addr = $dest->sockhost . ':' . $dest->sockport; my $s = IO::Socket::INET->new( Proto => 'tcp', Type => Socket::SOCK_STREAM(), - PeerAddr => $dest->sockhost . ':' . $dest->sockport, + PeerAddr => $addr, %opt, - ); + ) or Test::More::BAIL_OUT("failed to connect to $addr: $!"); $s->autoflush(1); $s; }
It must be a scalar reference, unlike ->write --- lib/PublicInbox/IMAPdeflate.pm | 1 - lib/PublicInbox/NNTPdeflate.pm | 1 - 2 files changed, 2 deletions(-) diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm index 9366db7a7fc..67c9a9738d5 100644 --- a/lib/PublicInbox/IMAPdeflate.pm +++ b/lib/PublicInbox/IMAPdeflate.pm @@ -50,7 +50,6 @@ sub enable { # overrides PublicInbox::NNTP::compressed sub compressed { 1 } -# $_[1] may be a reference or not sub do_read ($$$$) { my ($self, $rbuf, $len, $off) = @_; diff --git a/lib/PublicInbox/NNTPdeflate.pm b/lib/PublicInbox/NNTPdeflate.pm index 8efb662f5ab..eb400c9c220 100644 --- a/lib/PublicInbox/NNTPdeflate.pm +++ b/lib/PublicInbox/NNTPdeflate.pm @@ -62,7 +62,6 @@ sub enable { # overrides PublicInbox::NNTP::compressed sub compressed { 1 } -# $_[1] may be a reference or not sub do_read ($$$$) { my ($self, $rbuf, $len, $off) = @_;
Since IMAP yields control to GitAsyncCat, IMAP->event_step may be invoked with {long_cb} still active. We must be sure to bail out of IMAP->event_step if that happens and continue to let GitAsyncCat drive IMAP. This also improves fairness by never processing more than one request per ->event_step. --- lib/PublicInbox/IMAP.pm | 37 ++++++++++++++++------------------ lib/PublicInbox/IMAPdeflate.pm | 10 +++++++++ 2 files changed, 27 insertions(+), 20 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 917833f7c47..3c32d846508 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -888,38 +888,35 @@ sub long_response ($$;@) { sub event_step { my ($self) = @_; - return unless $self->flush_write && $self->{sock}; + return unless $self->flush_write && $self->{sock} && !$self->{long_cb}; $self->update_idle_time; # only read more requests if we've drained the write buffer, # otherwise we can be buffering infinitely w/o backpressure - my $rbuf = $self->{rbuf} // (\(my $x = '')); - my $r = 1; - - if (index($$rbuf, "\n") < 0) { - my $off = length($$rbuf); - $r = $self->do_read($rbuf, LINE_MAX, $off) or return; - } - while ($r > 0 && $$rbuf =~ s/\A[ \t]*([^\n]*?)\r?\n//) { - my $line = $1; - return $self->close if $line =~ /[[:cntrl:]]/s; - my $t0 = now(); - my $fd = fileno($self->{sock}); - $r = eval { process_line($self, $line) }; - my $pending = $self->{wbuf} ? ' pending' : ''; - out($self, "[$fd] %s - %0.6f$pending", $line, now() - $t0); - } + my $rbuf = $self->{rbuf} // \(my $x = ''); + my $line = index($$rbuf, "\n"); + while ($line < 0) { + return $self->close if length($$rbuf) >= LINE_MAX; + $self->do_read($rbuf, LINE_MAX, length($$rbuf)) or return; + $line = index($$rbuf, "\n"); + } + $line = substr($$rbuf, 0, $line + 1, ''); + $line =~ s/\r?\n\z//s; + return $self->close if $line =~ /[[:cntrl:]]/s; + my $t0 = now(); + my $fd = fileno($self->{sock}); + my $r = eval { process_line($self, $line) }; + my $pending = $self->{wbuf} ? ' pending' : ''; + out($self, "[$fd] %s - %0.6f$pending - $r", $line, now() - $t0); return $self->close if $r < 0; - my $len = length($$rbuf); - return $self->close if ($len >= LINE_MAX); $self->rbuf_idle($rbuf); $self->update_idle_time; # maybe there's more pipelined data, or we'll have # to register it for socket-readiness notifications - $self->requeue unless $self->{wbuf}; + $self->requeue unless $pending; } sub compressed { undef } diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm index 67c9a9738d5..42daa6cffaa 100644 --- a/lib/PublicInbox/IMAPdeflate.pm +++ b/lib/PublicInbox/IMAPdeflate.pm @@ -59,6 +59,16 @@ sub do_read ($$$$) { $doff = length($dbuf); my $r = PublicInbox::DS::do_read($self, \$dbuf, $len, $doff) or return; + # Workaround inflate bug appending to OOK scalars: + # <https://rt.cpan.org/Ticket/Display.html?id=132734> + # We only have $off if the client is pipelining, and pipelining + # is where our substr() OOK optimization in event_step makes sense. + if ($off) { + my $copy = $$rbuf; + undef $$rbuf; + $$rbuf = $copy; + } + # assert(length($$rbuf) == $off) as far as NNTP.pm is concerned # -ConsumeInput is true, so $dbuf is automatically emptied my $err = $zin->inflate($dbuf, $rbuf);
This will make it easier to implement the retries on alternates_changed() of the synchronous ->cat_file API. --- lib/PublicInbox/Git.pm | 13 ++++++++----- t/git.t | 2 +- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index 1c148086f46..b3ae2b812ba 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -20,7 +20,7 @@ use Errno qw(EINTR); our $PIPE_BUFSIZ = 65536; # Linux default use constant MAX_INFLIGHT => - (($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 2) + (($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 3) / 65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1 @@ -157,8 +157,8 @@ sub my_readline ($$) { sub cat_async_step ($$) { my ($self, $inflight) = @_; - die 'BUG: inflight empty or odd' if scalar(@$inflight) < 2; - my ($cb, $arg) = splice(@$inflight, 0, 2); + die 'BUG: inflight empty or odd' if scalar(@$inflight) < 3; + my ($req, $cb, $arg) = splice(@$inflight, 0, 3); my $rbuf = delete($self->{cat_rbuf}) // \(my $new = ''); my ($bref, $oid, $type, $size); my $head = my_readline($self->{in}, $rbuf); @@ -167,7 +167,10 @@ sub cat_async_step ($$) { $bref = my_read($self->{in}, $rbuf, $size + 1) or fail($self, defined($bref) ? 'read EOF' : "read: $!"); chop($$bref) eq "\n" or fail($self, 'LF missing after blob'); - } elsif ($head !~ / missing$/) { + } elsif ($head =~ / missing$/) { + $type = 'missing'; + $oid = $req; + } else { fail($self, "Unexpected result from async git cat-file: $head"); } eval { $cb->($bref, $oid, $type, $size, $arg) }; @@ -344,7 +347,7 @@ sub cat_async ($$$;$) { } print { $self->{out} } $oid, "\n" or fail($self, "write error: $!"); - push(@$inflight, $cb, $arg); + push(@$inflight, $oid, $cb, $arg); } sub extract_cmt_time { diff --git a/t/git.t b/t/git.t index 0b2089ba701..98d16f289c3 100644 --- a/t/git.t +++ b/t/git.t @@ -53,7 +53,7 @@ use_ok 'PublicInbox::Git'; is_deeply([$oid_hex, $type, $size], \@x, 'got expected header'); is($arg_res, $arg, 'arg passed to cat_async'); is_deeply($raw, $bref, 'blob result matches'); - is_deeply($missing, [ undef, undef, undef, undef, $arg], + is_deeply($missing, [ undef, 'non-existent', 'missing', undef, $arg], 'non-existent blob gives expected result'); }
Trying to avoid a circular reference by relying on $ibx object here makes no sense, since skipping GitCatAsync::close will result in an FD leak, anyways. So keep GitAsyncCat contained to git-only operations, since we'll be using it for Solver in the distant feature. --- lib/PublicInbox/Git.pm | 3 +++ lib/PublicInbox/GitAsyncCat.pm | 14 ++++++++------ lib/PublicInbox/IMAP.pm | 8 +++++--- lib/PublicInbox/Inbox.pm | 3 --- 4 files changed, 16 insertions(+), 12 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index b3ae2b812ba..60236afe7bf 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -283,6 +283,9 @@ sub qx { # returns true if there are pending "git cat-file" processes sub cleanup { my ($self) = @_; + if (my $ac = $self->{async_cat}) { + $ac->close; # PublicInbox::GitAsyncCat::close -> EPOLL_CTL_DEL + } cat_async_wait($self); _destroy($self, qw(cat_rbuf in out pid)); _destroy($self, qw(chk_rbuf in_c out_c pid_c err_c)); diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm index 65e16121969..8701e4cfe48 100644 --- a/lib/PublicInbox/GitAsyncCat.pm +++ b/lib/PublicInbox/GitAsyncCat.pm @@ -13,7 +13,7 @@ use strict; use parent qw(PublicInbox::DS Exporter); use fields qw(git); use PublicInbox::Syscall qw(EPOLLIN EPOLLET); -our @EXPORT = qw(git_async_msg); +our @EXPORT = qw(git_async_cat); sub new { my ($class, $git) = @_; @@ -36,14 +36,16 @@ sub event_step { sub close { my ($self) = @_; - delete $self->{git}; + if (my $git = delete $self->{git}) { + delete $git->{async_cat}; # drop circular reference + } $self->SUPER::close; # PublicInbox::DS::close } -sub git_async_msg ($$$$) { - my ($ibx, $smsg, $cb, $arg) = @_; - $ibx->git->cat_async($smsg->{blob}, $cb, $arg); - $ibx->{async_cat} //= new(__PACKAGE__, $ibx->{git}); +sub git_async_cat ($$$$) { + my ($git, $oid, $cb, $arg) = @_; + $git->cat_async($oid, $cb, $arg); + $git->{async_cat} //= new(__PACKAGE__, $git); # circular reference } 1; diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 3c32d846508..3815141a15e 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -388,7 +388,7 @@ sub requeue_once ($) { $self->requeue if $new_size == 1; } -sub uid_fetch_cb { # called by git->cat_async via git_async_msg +sub uid_fetch_cb { # called by git->cat_async via git_async_cat my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; my ($self, undef, $ibx, $msgs, undef, $want) = @$fetch_m_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; @@ -400,6 +400,7 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_msg } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } + $$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy # fixup old bug from import (pre-a0c07cba0e5d8b6a) @@ -487,7 +488,7 @@ sub uid_fetch_m { # long_response return; } } - git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_); + git_async_cat($ibx->git, $msgs->[0]->{blob}, \&uid_fetch_cb, \@_); } sub cmd_status ($$$;@) { @@ -719,7 +720,8 @@ sub seq_fetch_m { # long_response my $seq = $want->{-seqno}++; my $cur_num = $msgs->[0]->{num}; if ($cur_num == $seq) { # as expected - git_async_msg($ibx, $msgs->[0], \&uid_fetch_cb, \@_); + git_async_cat($ibx->git, $msgs->[0]->{blob}, + \&uid_fetch_cb, \@_); } elsif ($cur_num > $seq) { # send dummy messages until $seq catches up to $cur_num my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg'; diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm index 407751c3063..b2b0b56fdd3 100644 --- a/lib/PublicInbox/Inbox.pm +++ b/lib/PublicInbox/Inbox.pm @@ -22,9 +22,6 @@ my $CLEANUP = {}; # string(inbox) -> inbox sub git_cleanup ($) { my ($self) = @_; my $git = $self->{git} or return; - if (my $async_cat = delete $self->{async_cat}) { - $async_cat->close; - } $git->cleanup; }
This matches the behavior of the existing synchronous ->cat_file method. In fact, ->cat_file now becomes a small wrapper around the ->cat_async method. --- lib/PublicInbox/Git.pm | 64 +++++++++++++++++++++++++----------------- t/git.t | 37 ++++++++++++++++++++---- 2 files changed, 70 insertions(+), 31 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index 60236afe7bf..a55c48d5f52 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -18,6 +18,7 @@ use base qw(Exporter); our @EXPORT_OK = qw(git_unquote git_quote); use Errno qw(EINTR); our $PIPE_BUFSIZ = 65536; # Linux default +our $in_cleanup; use constant MAX_INFLIGHT => (($^O eq 'linux' ? 4096 : POSIX::_POSIX_PIPE_BUF()) * 3) @@ -155,6 +156,26 @@ sub my_readline ($$) { } } +sub cat_async_retry ($$$$$) { + my ($self, $inflight, $req, $cb, $arg) = @_; + + # {inflight} may be non-existent, but if it isn't we delete it + # here to prevent cleanup() from waiting: + delete $self->{inflight}; + cleanup($self); + + $self->{inflight} = $inflight; + batch_prepare($self); + my $buf = "$req\n"; + for (my $i = 0; $i < @$inflight; $i += 3) { + $buf .= "$inflight->[$i]\n"; + } + print { $self->{out} } $buf or fail($self, "write error: $!"); + unshift(@$inflight, \$req, $cb, $arg); # \$ref to indicate retried + + cat_async_step($self, $inflight); # take one step +} + sub cat_async_step ($$) { my ($self, $inflight) = @_; die 'BUG: inflight empty or odd' if scalar(@$inflight) < 3; @@ -168,8 +189,13 @@ sub cat_async_step ($$) { fail($self, defined($bref) ? 'read EOF' : "read: $!"); chop($$bref) eq "\n" or fail($self, 'LF missing after blob'); } elsif ($head =~ / missing$/) { + # ref($req) indicates it's already been retried + if (!ref($req) && !$in_cleanup && alternates_changed($self)) { + return cat_async_retry($self, $inflight, + $req, $cb, $arg); + } $type = 'missing'; - $oid = $req; + $oid = ref($req) ? $$req : $req; } else { fail($self, "Unexpected result from async git cat-file: $head"); } @@ -190,33 +216,18 @@ sub batch_prepare ($) { _bidi_pipe($_[0], qw(--batch in out pid)); } +sub _cat_file_cb { + my ($bref, undef, undef, $size, $result) = @_; + @$result = ($bref, $size); +} + sub cat_file { - my ($self, $obj, $sizeref) = @_; - my ($retried, $head, $rbuf); + my ($self, $oid, $sizeref) = @_; + my $result = []; + cat_async($self, $oid, \&_cat_file_cb, $result); cat_async_wait($self); -again: - batch_prepare($self); - $rbuf = delete($self->{cat_rbuf}) // \(my $new = ''); - print { $self->{out} } $obj, "\n" or fail($self, "write error: $!"); - $head = my_readline($self->{in}, $rbuf); - if ($head =~ / missing$/) { - if (!$retried && alternates_changed($self)) { - $retried = 1; - cleanup($self); - goto again; - } - return; - } - $head =~ /^[0-9a-f]{40} \S+ ([0-9]+)$/ or - fail($self, "Unexpected result from git cat-file: $head"); - - my $size = $1 + 0; - $$sizeref = $size if $sizeref; - my $ret = my_read($self->{in}, $rbuf, $size + 1); - $self->{cat_rbuf} = $rbuf if $$rbuf ne ''; - fail($self, defined($ret) ? 'read EOF' : "read: $!") if !$ret; - chop($$ret) eq "\n" or fail($self, 'newline missing after blob'); - $ret; + $$sizeref = $result->[1] if $sizeref; + $result->[0]; } sub check { @@ -283,6 +294,7 @@ sub qx { # returns true if there are pending "git cat-file" processes sub cleanup { my ($self) = @_; + local $in_cleanup = 1; if (my $ac = $self->{async_cat}) { $ac->close; # PublicInbox::GitAsyncCat::close -> EPOLL_CTL_DEL } diff --git a/t/git.t b/t/git.t index 98d16f289c3..228df90fa45 100644 --- a/t/git.t +++ b/t/git.t @@ -86,23 +86,50 @@ if (1) { if ('alternates reloaded') { my ($alt, $alt_obj) = tmpdir(); - my @cmd = ('git', "--git-dir=$alt", qw(hash-object -w --stdin)); + my $hash_obj = [ 'git', "--git-dir=$alt", qw(hash-object -w --stdin) ]; PublicInbox::Import::init_bare($alt); open my $fh, '<', "$alt/config" or die "open failed: $!\n"; - my $rd = popen_rd(\@cmd, {}, { 0 => $fh } ); - close $fh or die "close failed: $!"; - chomp(my $remote = <$rd>); + chomp(my $remote = xqx($hash_obj, undef, { 0 => $fh })); my $gcf = PublicInbox::Git->new($dir); is($gcf->cat_file($remote), undef, "remote file not found"); open $fh, '>>', "$dir/objects/info/alternates" or die "open failed: $!\n"; - print $fh "$alt/objects" or die "print failed: $!\n"; + print $fh "$alt/objects\n" or die "print failed: $!\n"; close $fh or die "close failed: $!"; my $found = $gcf->cat_file($remote); open $fh, '<', "$alt/config" or die "open failed: $!\n"; my $config = eval { local $/; <$fh> }; is($$found, $config, 'alternates reloaded'); + # with the async interface + my ($async_alt, $async_dir_obj) = tmpdir(); + PublicInbox::Import::init_bare($async_alt); + my @exist = map { chomp; [ split / / ] } (xqx(['git', "--git-dir=$dir", + qw(cat-file --batch-all-objects --batch-check)])); + my $results = []; + my $cb = sub { + my ($bref, $oid, $type, $size) = @_; + push @$results, [ $oid, $type, $size ]; + }; + for my $i (0..5) { + $gcf->cat_async($exist[$i]->[0], $cb, $results); + next if $i != 3; + + # stick a new alternate into a running async pipeline + $hash_obj->[1] = "--git-dir=$async_alt"; + $remote = xqx($hash_obj, undef, { 0 => \'async' }); + chomp $remote; + open $fh, '>>', "$dir/objects/info/alternates" or + die "open failed: $!\n"; + print $fh "$async_alt/objects\n" or die "print failed: $!\n"; + close $fh or die "close failed: $!"; + # trigger cat_async_retry: + $gcf->cat_async($remote, $cb, $results); + } + $gcf->cat_async_wait; + my $expect = [ @exist[0..3], [ $remote, 'blob', 5 ], @exist[4..5] ]; + is_deeply($results, $expect, 'got expected results'); + ok(!$gcf->cleanup, 'cleanup can expire'); ok(!$gcf->cleanup, 'cleanup idempotent');
We'll be using this wrapper class to workaround some upstream bugs in Mail::IMAPClient. There may also be experiments with new APIs for more performance. --- MANIFEST | 1 + lib/PublicInbox/IMAPClient.pm | 119 ++++++++++++++++++++++++++++++++++ t/imapd-tls.t | 37 +++++------ t/imapd.t | 35 +++++----- 4 files changed, 155 insertions(+), 37 deletions(-) create mode 100644 lib/PublicInbox/IMAPClient.pm diff --git a/MANIFEST b/MANIFEST index f74852b6e90..6744a519efc 100644 --- a/MANIFEST +++ b/MANIFEST @@ -128,6 +128,7 @@ lib/PublicInbox/HTTPD/Async.pm lib/PublicInbox/HlMod.pm lib/PublicInbox/Hval.pm lib/PublicInbox/IMAP.pm +lib/PublicInbox/IMAPClient.pm lib/PublicInbox/IMAPD.pm lib/PublicInbox/IMAPdeflate.pm lib/PublicInbox/Import.pm diff --git a/lib/PublicInbox/IMAPClient.pm b/lib/PublicInbox/IMAPClient.pm new file mode 100644 index 00000000000..33deee9eb3d --- /dev/null +++ b/lib/PublicInbox/IMAPClient.pm @@ -0,0 +1,119 @@ +# This library is free software; you can redistribute it and/or modify it +# under the same terms as Perl itself, either Perl version 5.8.0 or, at +# your option, any later version of Perl 5 you may have available. +# +# The license for this file differs from the rest of public-inbox. +# +# Workaround some bugs in upstream Mail::IMAPClient when +# compression is enabled: +# - reference cycle: https://rt.cpan.org/Ticket/Display.html?id=132654 +# - read starvation: https://rt.cpan.org/Ticket/Display.html?id=132720 +package PublicInbox::IMAPClient; +use strict; +use parent 'Mail::IMAPClient'; +use Errno qw(EAGAIN); + +# RFC4978 COMPRESS +sub compress { + my ($self) = @_; + + # BUG? strict check on capability commented out for now... + #my $can = $self->has_capability("COMPRESS") + #return undef unless $can and $can eq "DEFLATE"; + + $self->_imap_command("COMPRESS DEFLATE") or return undef; + + my $zcl = $self->_load_module("Compress-Zlib") or return undef; + + # give caller control of args if desired + $self->Compress( + [ + -WindowBits => -$zcl->MAX_WBITS(), + -Level => $zcl->Z_BEST_SPEED() + ] + ) unless ( $self->Compress and ref( $self->Compress ) eq "ARRAY" ); + + my ( $rc, $do, $io ); + + ( $do, $rc ) = Compress::Zlib::deflateInit( @{ $self->Compress } ); + unless ( $rc == $zcl->Z_OK ) { + $self->LastError("deflateInit failed (rc=$rc)"); + return undef; + } + + ( $io, $rc ) = + Compress::Zlib::inflateInit( -WindowBits => -$zcl->MAX_WBITS() ); + unless ( $rc == $zcl->Z_OK ) { + $self->LastError("inflateInit failed (rc=$rc)"); + return undef; + } + + $self->{Prewritemethod} = sub { + my ( $self, $string ) = @_; + + my ( $rc, $out1, $out2 ); + ( $out1, $rc ) = $do->deflate($string); + ( $out2, $rc ) = $do->flush( $zcl->Z_PARTIAL_FLUSH() ) + unless ( $rc != $zcl->Z_OK ); + + unless ( $rc == $zcl->Z_OK ) { + $self->LastError("deflate/flush failed (rc=$rc)"); + return undef; + } + + return $out1 . $out2; + }; + + # need to retain some state for Readmoremethod/Readmethod calls + my ( $Zbuf, $Ibuf ) = ( "", "" ); + + $self->{Readmoremethod} = sub { + my $self = shift; + return 1 if ( length($Zbuf) || length($Ibuf) ); + $self->__read_more(@_); + }; + + $self->{Readmethod} = sub { + my ( $self, $fh, $buf, $len, $off ) = @_; + + # get more data, but empty $Ibuf first if any data is left + my ( $lz, $li ) = ( length $Zbuf, length $Ibuf ); + if ( $lz || !$li ) { + my $readlen = $self->Buffer || 4096; + my $ret = sysread( $fh, $Zbuf, $readlen, length $Zbuf ); + $lz = length $Zbuf; + return $ret if ( !$ret && !$lz ); # $ret is undef or 0 + } + + # accumulate inflated data in $Ibuf + if ($lz) { + my ( $tbuf, $rc ) = $io->inflate( \$Zbuf ); + unless ( $rc == $zcl->Z_OK ) { + $self->LastError("inflate failed (rc=$rc)"); + return undef; + } + $Ibuf .= $tbuf; + $li = length $Ibuf; + } + + if ( !$li ) { + # note: faking EAGAIN here is only safe with level-triggered + # I/O readiness notifications (select, poll). Refactoring + # callers will be needed in the unlikely case somebody wants + # to use edge-triggered notifications (EV_CLEAR, EPOLLET). + $! = EAGAIN; + return undef; + } + + # pull desired length of data from $Ibuf + my $tbuf = substr( $Ibuf, 0, $len ); + substr( $Ibuf, 0, $len ) = ""; + substr( $$buf, $off ) = $tbuf; + + return length $tbuf; + }; + + return $self; +} + +1; diff --git a/t/imapd-tls.t b/t/imapd-tls.t index 9f5abfe048e..5352d100e4b 100644 --- a/t/imapd-tls.t +++ b/t/imapd-tls.t @@ -7,8 +7,15 @@ use Socket qw(IPPROTO_TCP SOL_SOCKET); use PublicInbox::TestCommon; # IO::Poll is part of the standard library, but distros may split it off... require_mods(qw(DBD::SQLite IO::Socket::SSL Mail::IMAPClient IO::Poll)); -Mail::IMAPClient->can('starttls') or +my $imap_client = 'Mail::IMAPClient'; +$imap_client->can('starttls') or plan skip_all => 'Mail::IMAPClient does not support TLS'; +my $can_compress = $imap_client->can('compress'); +if ($can_compress) { # hope this gets fixed upstream, soon + require PublicInbox::IMAPClient; + $imap_client = 'PublicInbox::IMAPClient'; +} + my $cert = 'certs/server-cert.pem'; my $key = 'certs/server-key.pem'; unless (-r $key && -r $cert) { @@ -67,18 +74,6 @@ my $starttls_addr = $starttls->sockhost . ':' . $starttls->sockport; my $env = { PI_CONFIG => $pi_config }; my $td; -# Mail::IMAPClient ->compress creates cyclic reference: -# https://rt.cpan.org/Ticket/Display.html?id=132654 -my $compress_logout = sub { - my ($c) = @_; - ok($c->logout, 'logout ok after ->compress'); - # all documented in Mail::IMAPClient manpage: - for (qw(Readmoremethod Readmethod Prewritemethod)) { - $c->$_(undef); - } -}; - - for my $args ( [ "--cert=$cert", "--key=$key", "-limaps://$imaps_addr", @@ -112,7 +107,7 @@ for my $args ( Server => $imaps->sockhost, Port => $imaps->sockport); # IMAPS - my $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]); + my $c = $imap_client->new(%imaps_opt, Ssl => [ %o ]); ok($c && $c->IsAuthenticated, 'authenticated'); ok($c->select($group), 'SELECT works'); ok(!(scalar $c->has_capability('STARTTLS')), @@ -122,12 +117,12 @@ for my $args ( ok($c->compress, 'compression enabled with IMAPS'); ok(!$c->starttls, 'starttls still fails'); ok($c->noop, 'noop succeeds'); - $compress_logout->($c); + ok($c->logout, 'logout succeeds'); # STARTTLS my %imap_opt = (Server => $starttls->sockhost, Port => $starttls->sockport); - $c = Mail::IMAPClient->new(%imap_opt); + $c = $imap_client->new(%imap_opt); ok(scalar $c->has_capability('STARTTLS'), 'starttls advertised'); ok($c->Starttls([ %o ]), 'set starttls options'); @@ -141,25 +136,25 @@ for my $args ( ok($c->noop, 'NOOP works'); ok($c->compress, 'compression enabled with IMAPS'); ok($c->noop, 'NOOP works after compress'); - $compress_logout->($c); + ok($c->logout, 'logout succeeds after compress'); # STARTTLS with bad hostname $o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.invalid'; - $c = Mail::IMAPClient->new(%imap_opt); + $c = $imap_client->new(%imap_opt); ok(scalar $c->has_capability('STARTTLS'), 'starttls advertised'); ok($c->Starttls([ %o ]), 'set starttls options'); ok(!$c->starttls, '->starttls fails with bad hostname'); - $c = Mail::IMAPClient->new(%imap_opt); + $c = $imap_client->new(%imap_opt); ok($c->noop, 'NOOP still works from plain IMAP'); # IMAPS with bad hostname - $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]); + $c = $imap_client->new(%imaps_opt, Ssl => [ %o ]); is($c, undef, 'IMAPS fails with bad hostname'); # make hostname valid $o{SSL_hostname} = $o{SSL_verifycn_name} = 'server.local'; - $c = Mail::IMAPClient->new(%imaps_opt, Ssl => [ %o ]); + $c = $imap_client->new(%imaps_opt, Ssl => [ %o ]); ok($c, 'IMAPS succeeds again with valid hostname'); # slow TLS connection did not block the other fast clients while diff --git a/t/imapd.t b/t/imapd.t index 7af14f1b150..3d0be340846 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -9,6 +9,12 @@ use PublicInbox::TestCommon; use PublicInbox::Config; use PublicInbox::Spawn qw(which); require_mods(qw(DBD::SQLite Mail::IMAPClient Mail::IMAPClient::BodyStructure)); +my $imap_client = 'Mail::IMAPClient'; +my $can_compress = $imap_client->can('compress'); +if ($can_compress) { # hope this gets fixed upstream, soon + require PublicInbox::IMAPClient; + $imap_client = 'PublicInbox::IMAPClient'; +} my $level = '-Lbasic'; SKIP: { @@ -57,7 +63,7 @@ my %mic_opt = ( Port => $sock->sockport, Uid => 1, ); -my $mic = Mail::IMAPClient->new(%mic_opt); +my $mic = $imap_client->new(%mic_opt); my $pre_login_capa = $mic->capability; is(grep(/\AAUTH=ANONYMOUS\z/, @$pre_login_capa), 1, 'AUTH=ANONYMOUS advertised pre-login'); @@ -71,7 +77,7 @@ ok(join("\n", @$pre_login_capa) ne join("\n", @$post_login_capa), $mic_opt{Authmechanism} = 'ANONYMOUS'; $mic_opt{Authcallback} = sub { '' }; -$mic = Mail::IMAPClient->new(%mic_opt); +$mic = $imap_client->new(%mic_opt); ok($mic && $mic->login && $mic->IsAuthenticated, 'AUTHENTICATE ANONYMOUS'); my $post_auth_anon_capa = $mic->capability; is_deeply($post_auth_anon_capa, $post_login_capa, @@ -175,20 +181,17 @@ for my $r ('1:*', '1') { is(lc($bs->bodyenc), '8bit', '->bodyenc'); } -# Mail::IMAPClient ->compress creates cyclic reference: -# https://rt.cpan.org/Ticket/Display.html?id=132654 -my $compress_logout = sub { - my ($c) = @_; - ok($c->logout, 'logout ok after ->compress'); - # all documented in Mail::IMAPClient manpage: - for (qw(Readmoremethod Readmethod Prewritemethod)) { - $c->$_(undef); - } -}; - is_deeply([$mic->has_capability('COMPRESS')], ['DEFLATE'], 'deflate cap'); -ok($mic->compress, 'compress enabled'); -$compress_logout->($mic); +SKIP: { + skip 'Mail::IMAPClient too old for ->compress', 2 if !$can_compress; + my $c = $imap_client->new(%mic_opt); + ok($c && $c->compress, 'compress enabled'); + ok($c->examine('inbox.i1'), 'EXAMINE succeeds after COMPRESS'); + $ret = $c->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@"; + is_deeply($ret, [ 1 ], 'search UID 1:* works after compression'); +} + +ok($mic->logout, 'logout works'); my $have_inotify = eval { require Linux::Inotify2; 1 }; @@ -198,7 +201,7 @@ $pi_config->each_inbox(sub { my $env = { ORIGINAL_RECIPIENT => $ibx->{-primary_address} }; my $name = $ibx->{name}; my $ng = $ibx->{newsgroup}; - my $mic = Mail::IMAPClient->new(%mic_opt); + my $mic = $imap_client->new(%mic_opt); ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name"); my $uidnext = $mic->uidnext($ng); # we'll fetch BODYSTRUCTURE on this ok($uidnext, 'got uidnext for later fetch');
imapd-validate is a beefed up version of our nntpd-validate test which hammers the server with parallel connections over regular IMAP, IMAPS, IMAP+STARTTLS; and COMPRESS=DEFLATE variants of each of those. It uses $START_UID:$END_UID fetch ranges to reduce requests and slurp many responses at once to saturate "git cat-file --batch" processes. mbsync(1) also uses pipelining extensively (but IMHO unnecessarily), so it was able to shake out some bugs in the async git code. Finally, we remove xt/cmp-imapd-compress.t since it's redundant now that we have PublicInbox::IMAPClient to work around bugs in Mail::IMAPClient. --- MANIFEST | 3 +- xt/cmp-imapd-compress.t | 83 ------------------- xt/imapd-mbsync-oimap.t | 129 +++++++++++++++++++++++++++++ xt/imapd-validate.t | 175 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 306 insertions(+), 84 deletions(-) delete mode 100644 xt/cmp-imapd-compress.t create mode 100644 xt/imapd-mbsync-oimap.t create mode 100644 xt/imapd-validate.t diff --git a/MANIFEST b/MANIFEST index 6744a519efc..957228250a7 100644 --- a/MANIFEST +++ b/MANIFEST @@ -347,12 +347,13 @@ t/www_listing.t t/www_static.t t/x-unknown-alpine.eml t/xcpdb-reshard.t -xt/cmp-imapd-compress.t xt/cmp-msgstr.t xt/cmp-msgview.t xt/eml_check_limits.t xt/git-http-backend.t xt/git_async_cmp.t +xt/imapd-mbsync-oimap.t +xt/imapd-validate.t xt/mem-msgview.t xt/msgtime_cmp.t xt/nntpd-validate.t diff --git a/xt/cmp-imapd-compress.t b/xt/cmp-imapd-compress.t deleted file mode 100644 index b12cf74e2d5..00000000000 --- a/xt/cmp-imapd-compress.t +++ /dev/null @@ -1,83 +0,0 @@ -#!perl -w -# Copyright (C) 2020 all contributors <meta@public-inbox.org> -# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> -use strict; -use Test::More; -use PublicInbox::TestCommon; -require_mods('Data::Dumper'); -Data::Dumper->import('Dumper'); -my $inboxdir = $ENV{GIANT_INBOX_DIR}; -(defined($inboxdir) && -d $inboxdir) or - plan skip_all => "GIANT_INBOX_DIR not defined for $0"; -plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!; -my ($tmpdir, $for_destroy) = tmpdir(); -my $cfg = "$tmpdir/cfg"; -my $mailbox = 'inbox.test'; -{ - open my $fh, '>', $cfg or BAIL_OUT "open: $!"; - print $fh <<EOF or BAIL_OUT "print: $!"; -[publicinbox "test"] - newsgroup = $mailbox - address = test\@example.com - inboxdir = $inboxdir -EOF - close $fh or BAIL_OUT "close: $!"; -} -my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log"); -my $sock = tcp_server(); -my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err"]; -my $env = { PI_CONFIG => $cfg }; -my $td = start_script($cmd, $env, { 3 => $sock }) or BAIL_OUT "-imapd: $?"; -my ($host, $port) = ($sock->sockhost, $sock->sockport); -my $c = tcp_connect($sock); -like(readline($c), qr/CAPABILITY /, 'got greeting'); -undef $c; - -SKIP: { - require_mods('Mail::IMAPClient', 3); - unless ($ENV{RT_132720_FIXED}) { - my $bug = 'https://rt.cpan.org/Ticket/Display.html?id=132720'; - skip "<$bug>, RT_132720_FIXED not defined", 3; - } - my %opt = (Server => $host, Port => $port, - User => 'u', Password => 'p', Clear => 1); - my $uc = Mail::IMAPClient->new(%opt); - my $c = Mail::IMAPClient->new(%opt); - ok($c->compress, 'enabled compression'); - ok $c->examine($mailbox), 'compressed EXAMINE-ed'; - ok $uc->examine($mailbox), 'uncompress EXAMINE-ed'; - my $range = $uc->search('all'); - for my $uid (@$range) { - my $A = $uc->fetch_hash($uid, 'BODY[]'); - my $B = $c->fetch_hash($uid, 'BODY[]'); - if (!is_deeply($A, $B, "$uid identical")) { - diag Dumper([$A, $B]); - diag Dumper([$uc, $c]); - last; - } - } - $uc->logout; - $c->logout; -} - -SKIP: { - require_mods('Mail::IMAPTalk', 3); - my %opt = (Server => $host, Port => $port, UseSSL => 0, - Username => 'u', Password => 'p', Uid => 1); - my $uc = Mail::IMAPTalk->new(%opt) or BAIL_OUT 'IMAPTalk->new'; - my $c = Mail::IMAPTalk->new(%opt, UseCompress => 1) or - BAIL_OUT 'IMAPTalk->new(UseCompress => 1)'; - ok $c->examine($mailbox), 'compressed EXAMINE-ed'; - ok $uc->examine($mailbox), 'uncompress EXAMINE-ed'; - my $range = $uc->search('all'); - for my $uid (@$range) { - my $A = $uc->fetch($uid, 'rfc822'); - my $B = $c->fetch($uid, 'rfc822'); - if (!is_deeply($A, $B, "$uid identical")) { - diag Dumper([$A, $B]); - diag Dumper([$uc, $c]); - last; - } - } -} -done_testing; diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t new file mode 100644 index 00000000000..d2237a24bef --- /dev/null +++ b/xt/imapd-mbsync-oimap.t @@ -0,0 +1,129 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# ensure mbsync and offlineimap compatibility +use strict; +use Test::More; +use File::Path qw(mkpath); +use PublicInbox::TestCommon; +use PublicInbox::Spawn qw(which spawn); +my $inboxdir = $ENV{GIANT_INBOX_DIR}; +(defined($inboxdir) && -d $inboxdir) or + plan skip_all => "GIANT_INBOX_DIR not defined for $0"; +plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!; +my ($tmpdir, $for_destroy) = tmpdir(); +my $cfg = "$tmpdir/cfg"; +my $mailbox = 'inbox.test'; +{ + open my $fh, '>', $cfg or BAIL_OUT "open: $!"; + print $fh <<EOF or BAIL_OUT "print: $!"; +[publicinbox "test"] + newsgroup = $mailbox + address = oimap\@example.com + inboxdir = $inboxdir +EOF + close $fh or BAIL_OUT "close: $!"; +} +my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log"); +my $sock = tcp_server(); +my $cmd = [ '-imapd', '-W0', "--stdout=$out", "--stderr=$err" ]; +my $env = { PI_CONFIG => $cfg }; +my $td = start_script($cmd, $env, { 3 => $sock }) or BAIL_OUT "-imapd: $?"; +{ + my $c = tcp_connect($sock); + like(readline($c), qr/CAPABILITY /, 'got greeting'); +} +my ($host, $port) = ($sock->sockhost, $sock->sockport); +my %pids; + +SKIP: { + mkpath([map { "$tmpdir/oimapdir/$_" } qw(cur new tmp)]); + my $oimap = which('offlineimap') or skip 'no offlineimap(1)', 1; + open my $fh, '>', "$tmpdir/.offlineimaprc" or BAIL_OUT "open: $!"; + print $fh <<EOF or BAIL_OUT "print: $!"; +[general] +accounts = test +socktimeout = 10 +fsync = false + +[Account test] +localrepository = l.test +remoterepository = r.test + +[Repository l.test] +type = Maildir +localfolders = ~/oimapdir + +[Repository r.test] +type = IMAP +ssl = no +remotehost = $host +remoteport = $port +remoteuser = anonymous +remotepass = Hunter2 + +# python-imaplib2 times out on select/poll when compression is enabled +# <https://bugs.debian.org/961713> +usecompression = no +EOF + close $fh or BAIL_OUT "close: $!"; + my $cmd = [ $oimap, qw(-o -q -u quiet) ]; + my $pid = spawn($cmd, { HOME => $tmpdir }, { 1 => 2 }); + $pids{$pid} = $cmd; +} + +SKIP: { + mkpath([map { "$tmpdir/mbsyncdir/test/$_" } qw(cur new tmp)]); + my $mbsync = which('mbsync') or skip 'no mbsync(1)', 1; + open my $fh, '>', "$tmpdir/.mbsyncrc" or BAIL_OUT "open: $!"; + print $fh <<EOF or BAIL_OUT "print: $!"; +Create Slave +SyncState * +Remove None +FSync no + +MaildirStore local +Path ~/mbsyncdir/ +Inbox ~/mbsyncdir/test +SubFolders verbatim + +IMAPStore remote +Host $host +Port $port +User anonymous +Pass Hunter2 +SSLType None +UseNamespace no +# DisableExtension COMPRESS=DEFLATE + +Channel "test" +Master ":remote:inbox" +Slave ":local:test" +Expunge None +Sync PullNew +Patterns * +EOF + close $fh or BAIL_OUT "close: $!"; + my $cmd = [ $mbsync, qw(-aqq) ]; + my $pid = spawn($cmd, { HOME => $tmpdir }, { 1 => 2 }); + $pids{$pid} = $cmd; +} + +while (scalar keys %pids) { + my $pid = waitpid(-1, 0) or next; + my $cmd = delete $pids{$pid} or next; + is($?, 0, join(' ', @$cmd, 'done')); +} + +if (my $sec = $ENV{TEST_PERSIST}) { + diag "sleeping ${sec}s, imap://$host:$port/$mailbox available"; + diag "tmpdir=$tmpdir (Maildirs available)"; + diag "stdout=$out"; + diag "stderr=$err"; + diag "pid=$td->{pid}"; + sleep $sec; +} +$td->kill; +$td->join; +is($?, 0, 'no error on -imapd exit'); +done_testing; diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t new file mode 100644 index 00000000000..f96ec8791b9 --- /dev/null +++ b/xt/imapd-validate.t @@ -0,0 +1,175 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# Expensive test to validate compression and TLS. +use strict; +use Test::More; +use Symbol qw(gensym); +use PublicInbox::DS qw(now); +use POSIX qw(_exit); +use PublicInbox::TestCommon; +my $inbox_dir = $ENV{GIANT_INBOX_DIR}; +plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inbox_dir; +# how many emails to read into memory at once per-process +my $BATCH = $ENV{TEST_BATCH} // 100; +my $REPEAT = $ENV{TEST_REPEAT} // 1; + +require_mods(qw(Mail::IMAPClient)); +my $imap_client = 'Mail::IMAPClient'; +my $can_compress = $imap_client->can('compress'); +if ($can_compress) { # hope this gets fixed upstream, soon + require PublicInbox::IMAPClient; + $imap_client = 'PublicInbox::IMAPClient'; +} + +my $test_tls = $ENV{TEST_SKIP_TLS} ? 0 : eval { require IO::Socket::SSL }; +my ($cert, $key) = qw(certs/server-cert.pem certs/server-key.pem); +if ($test_tls && !-r $key || !-r $cert) { + plan skip_all => + "certs/ missing for $0, run $^X ./certs/create-certs.perl"; +} +my ($tmpdir, $for_destroy) = tmpdir(); +my %OPT = qw(User u Password p); +my (%STARTTLS_OPT, %IMAPS_OPT, $td, $mailbox, $make_local_server); +if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) { + ($OPT{Server}, $mailbox) = ($1, $2); + $OPT{Server} =~ s/:([0-9]+)\z// and $OPT{Port} = $1 + 0; + %STARTTLS_OPT = %OPT; + %IMAPS_OPT = (%OPT, Port => 993) if $OPT{Port} == 143; +} else { + require_mods(qw(DBD::SQLite)); + $make_local_server->(); +} + +my %opts = (imap => \%OPT, 'imap+compress' => { %OPT, Compress => 1 }); +my $uid_max = do { + my $mic = $imap_client->new(%OPT) or BAIL_OUT "new $!"; + $mic->examine($mailbox) or BAIL_OUT "examine: $!"; + my $next = $mic->uidnext($mailbox) or BAIL_OUT "uidnext: $!"; + $next - 1; +}; + +if (scalar keys %STARTTLS_OPT) { + $opts{starttls} = \%STARTTLS_OPT; + $opts{'starttls+compress'} = { %STARTTLS_OPT, Compress => 1 }; +} +if (scalar keys %IMAPS_OPT) { + $opts{imaps} = \%IMAPS_OPT; + $opts{'imaps+compress'} = { %IMAPS_OPT, Compress => 1 }; +} + +my $do_get_all = sub { + my ($desc, $opt) = @_; + local $SIG{__DIE__} = sub { print STDERR $desc, ': ', @_; _exit(1) }; + my $t0 = now(); + my $dig = Digest::SHA->new(1); + my $mic = $imap_client->new(%$opt); + $mic->examine($mailbox) or die "examine: $!"; + my $uid_base = 1; + my $bytes = 0; + my $nr = 0; + until ($uid_base > $uid_max) { + my $end = $uid_base + $BATCH; + my $ret = $mic->fetch_hash("$uid_base:$end", 'BODY[]') or last; + for my $uid ($uid_base..$end) { + $dig->add($uid); + my $h = delete $ret->{$uid} or next; + my $body = delete $h->{'BODY[]'} or + die "no BODY[] for UID=$uid"; + $dig->add($body); + $bytes += length($body); + ++$nr; + } + $uid_base = $end + 1; + } + $mic->logout or die "logout failed: $!"; + my $elapsed = sprintf('%0.3f', now() - $t0); + my $res = $dig->hexdigest; + print STDERR "# $desc $res (${elapsed}s) $bytes bytes, NR=$nr\n"; + $res; +}; + +my (%pids, %res); +for (1..$REPEAT) { + while (my ($desc, $opt) = each %opts) { + pipe(my ($r, $w)) or die; + my $pid = fork; + if ($pid == 0) { + close $r or die; + my $res = $do_get_all->($desc, $opt); + print $w $res or die; + close $w or die; + _exit(0); + } + close $w or die; + $pids{$pid} = [ $desc, $r ]; + } +} + +while (scalar keys %pids) { + my $pid = waitpid(-1, 0) or next; + my $child = delete $pids{$pid} or next; + my ($desc, $rpipe) = @$child; + is($?, 0, "$desc done"); + my $sum = do { local $/; <$rpipe> }; + push @{$res{$sum}}, $desc; +} +is(scalar keys %res, 1, 'all got the same result'); +$td->kill; +$td->join; +is($?, 0, 'no error on -imapd exit'); +done_testing; + +BEGIN { + +$make_local_server = sub { + require PublicInbox::Inbox; + $mailbox = 'inbox.test'; + my $ibx = { inboxdir => $inbox_dir, newsgroup => $mailbox }; + $ibx = PublicInbox::Inbox->new($ibx); + my $pi_config = "$tmpdir/config"; + { + open my $fh, '>', $pi_config or die "open($pi_config): $!"; + print $fh <<"" or die "print $pi_config: $!"; +[publicinbox "test"] + newsgroup = $mailbox + inboxdir = $inbox_dir + address = test\@example.com + + close $fh or die "close($pi_config): $!"; + } + my ($out, $err) = ("$tmpdir/out", "$tmpdir/err"); + for ($out, $err) { + open my $fh, '>', $_ or die "truncate: $!"; + } + my $imap = tcp_server(); + my $rdr = { 3 => $imap }; + $OPT{Server} = $imap->sockhost; + $OPT{Port} = $imap->sockport; + + # not using multiple workers, here, since we want to increase + # the chance of tripping concurrency bugs within PublicInbox/IMAP*.pm + my $cmd = [ '-imapd', "--stdout=$out", "--stderr=$err", '-W0' ]; + push @$cmd, '-limap://'.$imap->sockhost.':'.$imap->sockport; + if ($test_tls) { + my $imaps = tcp_server(); + $rdr->{4} = $imaps; + push @$cmd, '-limaps://'.$imaps->sockhost.':'.$imaps->sockport; + push @$cmd, "--cert=$cert", "--key=$key"; + my $tls_opt = [ + SSL_hostname => 'server.local', + SSL_verifycn_name => 'server.local', + SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_PEER(), + SSL_ca_file => 'certs/test-ca.pem', + ]; + %STARTTLS_OPT = (%OPT, Starttls => $tls_opt); + %IMAPS_OPT = (%OPT, Ssl => $tls_opt, + Server => $imaps->sockhost, + Port => $imaps->sockport + ); + } + print STDERR "# CMD ". join(' ', @$cmd). "\n"; + my $env = { PI_CONFIG => $pi_config }; + $td = start_script($cmd, $env, $rdr); +}; +} # BEGIN
"$UID_START:*" needs to return at least one message according to RFC 3501 section 6.4.8. While we're in the area, coerce ranges to (unsigned) integers by adding zero ("+ 0") to reduce memory overhead. --- lib/PublicInbox/IMAP.pm | 8 +++++--- t/imapd.t | 13 +++++++++++++ 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 3815141a15e..ffa76bb0266 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -455,11 +455,13 @@ sub range_step ($$) { $$range_csv = undef; } if ($range =~ /\A([0-9]+):([0-9]+)\z/) { - ($beg, $end) = ($1, $2); + ($beg, $end) = ($1 + 0, $2 + 0); } elsif ($range =~ /\A([0-9]+):\*\z/) { - ($beg, $end) = ($1, $ibx->mm->max // 0); + $beg = $1 + 0; + $end = $ibx->mm->max // 0; + $beg = $end if $beg > $end; } elsif ($range =~ /\A[0-9]+\z/) { - $beg = $end = $range; + $beg = $end = $range + 0; } else { return 'BAD fetch range'; } diff --git a/t/imapd.t b/t/imapd.t index 3d0be340846..2c4315dec30 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -116,6 +116,19 @@ $ret = $mic->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@"; is_deeply($ret, [ 1 ], 'search UID 1:* works'); is_deeply(scalar $mic->flags('1'), [], '->flags works'); +{ + # RFC 3501 section 6.4.8 states: + # Also note that a UID range of 559:* always includes the + # UID of the last message in the mailbox, even if 559 is + # higher than any assigned UID value. + my $exp = $mic->fetch_hash(1, 'UID'); + $ret = $mic->fetch_hash('559:*', 'UID'); + is_deeply($ret, $exp, 'beginning range too big'); + for my $r (qw(559:558 558:559)) { + $ret = $mic->fetch_hash($r, 'UID'); + is_deeply($ret, {}, "out-of-range UID FETCH $r"); + } +} for my $r ('1:*', '1') { $ret = $mic->fetch_hash($r, 'RFC822') or BAIL_OUT "FETCH $@";
It's useful to know how fast SIGHUP can be handled, too. --- xt/perf-imap-list.t | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/xt/perf-imap-list.t b/xt/perf-imap-list.t index 37640a90f3c..0f00f487991 100644 --- a/xt/perf-imap-list.t +++ b/xt/perf-imap-list.t @@ -9,15 +9,18 @@ use Benchmark qw(:all); my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000); push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000); my $self = { imapd => { grouplist => \@n } }; -PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); - my $n = scalar @n; +my $t = timeit(1, sub { + PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); +}); +diag timestr($t). "refresh $n inboxes"; + open my $null, '>', '/dev/null' or BAIL_OUT "open: $!"; my $ds = { sock => $null }; my $nr = 200; diag "starting benchmark..."; my $cmd_list = \&PublicInbox::IMAP::cmd_list; -my $t = timeit(1, sub { +$t = timeit(1, sub { for (0..$nr) { my $res = $cmd_list->($self, 'tag', '', '*'); PublicInbox::DS::write($ds, $res);
IMAP RFC 3501 stipulates case-insensitive comparisons, and so does RFC 977 (NNTP). However, INN (nnrpd) uses case-sensitive comparisons, so we've always used case-sensitive comparisons for NNTP to match nnrpd behavior. Unfortunately, some IMAP clients insist on sending "INBOX" with caps, which causes problems for us. Since NNTP group names are typically all lowercase anyways, just force all comparisons to lowercase for IMAP and warn admins if uppercase-containing newsgroups won't be accessible over IMAP. This ensures our existing -nntpd behavior remains unchanged while being compatible with the expectations of real-world IMAP clients. --- lib/PublicInbox/IMAPD.pm | 7 +++++++ t/imap.t | 7 ++++++- t/imapd.t | 2 +- 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index a3a2598661b..966879d8ea5 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -25,6 +25,13 @@ sub refresh_inboxlist ($) { my ($self) = @_; my @names = map { $_->{newsgroup} } @{delete $self->{grouplist}}; my %ns; # "\Noselect \HasChildren" + + if (my @uc = grep(/[A-Z]/, @names)) { + warn "Uppercase not allowed for IMAP newsgroup(s):\n", + map { "\t$_\n" } @uc; + my %uc = map { $_ => 1 } @uc; + @names = grep { !$uc{$_} } @names; + } for (@names) { my $up = $_; while ($up =~ s/\.[^\.]+\z//) { diff --git a/t/imap.t b/t/imap.t index 451b6596bf9..aa262a19632 100644 --- a/t/imap.t +++ b/t/imap.t @@ -8,9 +8,14 @@ use PublicInbox::IMAP; use PublicInbox::IMAPD; { # make sure we get '%' globbing right - my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y)); + my @w; + local $SIG{__WARN__} = sub { push @w, @_ }; + my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y IGNORE.THIS)); my $self = { imapd => { grouplist => \@n } }; PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); + is(scalar(@w), 1, 'got a warning for upper-case'); + like($w[0], qr/IGNORE\.THIS/, 'warned about upper-case'); + my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%'); is(scalar($$res =~ tr/\n/\n/), 2, 'only one result'); like($$res, qr/ x\r\ntag OK/, 'saw expected'); diff --git a/t/imapd.t b/t/imapd.t index 2c4315dec30..a5324f78da0 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -88,7 +88,7 @@ like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent'); ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent'); like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent'); ok($mic->select('inbox.i1'), 'SELECT succeeds'); -ok($mic->examine('inbox.i1'), 'EXAMINE succeeds'); +ok($mic->examine('INBOX.i1'), 'EXAMINE succeeds'); my @raw = $mic->status('inbox.i1', qw(Messages uidnext uidvalidity)); is(scalar(@raw), 2, 'got status response'); like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20
This limit on mailbox size should keep users of tools like mbsync (isync) and offlineimap happy, since typical filesystems struggle with giant Maildirs. I chose 50K since it's a bit more than what LKML typically sees in a month and still manages to give acceptable performance on my ancient Centrino laptop. There were also no responses to my original proposal at: <https://public-inbox.org/meta/20200519090000.GA24273@dcvr/> so no objections, either :> --- lib/PublicInbox/IMAP.pm | 134 ++++++++++++++++++++++++++++++---------- 1 file changed, 100 insertions(+), 34 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index ffa76bb0266..959dcfbe328 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -16,7 +16,7 @@ package PublicInbox::IMAP; use strict; use base qw(PublicInbox::DS); use fields qw(imapd logged_in ibx long_cb -login_tag - -idle_tag -idle_max); + uid_min -idle_tag -idle_max); use PublicInbox::Eml; use PublicInbox::EmlContentFoo qw(parse_content_disposition); use PublicInbox::DS qw(now); @@ -34,6 +34,9 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address; sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977? +# changing this will cause grief for clients which cache +sub UID_BLOCK () { 50_000 } + my %FETCH_NEED_BLOB = ( # for future optimization 'BODY[HEADER]' => 1, 'BODY[TEXT]' => 1, @@ -128,6 +131,7 @@ sub cmd_login ($$$$) { sub cmd_close ($$) { my ($self, $tag) = @_; + delete $self->{uid_min}; delete $self->{ibx} ? "$tag OK Close done\r\n" : "$tag BAD No mailbox\r\n"; } @@ -188,12 +192,67 @@ sub cmd_done ($$) { "$idle_tag OK Idle done\r\n"; } +sub ensure_old_ranges_exist ($$$) { + my ($self, $ibx, $uid_min) = @_; + my $groups = $self->{imapd}->{groups}; + my $mailbox = $ibx->{newsgroup}; + my @created; + $uid_min -= UID_BLOCK; + my $uid_end = $uid_min + UID_BLOCK - 1; + while ($uid_min > 0) { + my $sub_mailbox = "$mailbox.$uid_min-$uid_end"; + last if exists $groups->{$sub_mailbox}; + $groups->{$sub_mailbox} = $ibx; + $uid_end -= UID_BLOCK; + $uid_min -= UID_BLOCK; + push @created, $sub_mailbox; + } + return unless @created; + my $l = $self->{imapd}->{inboxlist}; + grep { + / \Q$mailbox\E\r\n\z/ and s/\(\\HasNoChildren/\(\\HasChildren/; + } @$l; + push @$l, map { qq[* LIST (\\HasNoChildren) "." $_\r\n] } @created; +} + sub cmd_examine ($$$) { my ($self, $tag, $mailbox) = @_; - my $ibx = $self->{imapd}->{groups}->{$mailbox} or - return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; - my $mm = $ibx->mm; - my $max = $mm->max // 0; + my ($ibx, $mm, $max); + + if ($mailbox =~ /\A(.+)\.([0-9]+)-([0-9]+)\z/) { + # old mail: inbox.comp.foo.$uid_min-$uid_end + my ($mb_top, $uid_min, $uid_end) = ($1, $2 + 0, $3 + 0); + $ibx = $self->{imapd}->{groups}->{lc $mb_top}; + if (!$ibx || ($uid_end % UID_BLOCK) != 0 || + ($uid_min + UID_BLOCK - 1) != $uid_end) { + return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; + } + $mm = $ibx->mm; + $max = $mm->max // 0; + + # don't let users create inboxes w/ not-yet-possible range: + $uid_min > $max and + return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; + + $max = $uid_min + UID_BLOCK + 1; + $self->{uid_min} = $uid_min; + ensure_old_ranges_exist($self, $ibx, $uid_min); + } else { # current mailbox (most recent UID_BLOCK messages) + $ibx = $self->{imapd}->{groups}->{lc $mailbox} or + return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; + + $mm = $ibx->mm; + $max = $mm->max // 0; + + my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1; + if ($uid_min == 1) { # normal inbox with <UID_BLOCK messages + delete $self->{uid_min}; # implicit cmd_close + } else { # we have a giant inbox: + $self->{uid_min} = $uid_min; + ensure_old_ranges_exist($self, $ibx, $uid_min); + } + } + # RFC 3501 2.3.1.1 - "A good UIDVALIDITY value to use in # this case is a 32-bit representation of the creation # date/time of the mailbox" @@ -208,11 +267,11 @@ sub cmd_examine ($$$) { * $max RECENT\r * FLAGS (\\Seen)\r * OK [PERMANENTFLAGS ()] Read-only mailbox\r +* OK [UNSEEN $max]\r +* OK [UIDNEXT $uidnext]\r +* OK [UIDVALIDITY $uidvalidity]\r +$tag OK [READ-ONLY] EXAMINE/SELECT done\r EOF - $ret .= "* OK [UNSEEN $max]\r\n" if $max; - $ret .= "* OK [UIDNEXT $uidnext]\r\n" if defined $uidnext; - $ret .= "* OK [UIDVALIDITY $uidvalidity]\r\n" if defined $uidvalidity; - $ret .= "$tag OK [READ-ONLY] EXAMINE/SELECT done\r\n"; } sub _esc ($) { @@ -357,12 +416,12 @@ sub fetch_body ($;$) { } sub dummy_message ($$) { - my ($seqno, $ibx) = @_; + my ($self, $seqno) = @_; my $ret = <<EOF; From: nobody\@localhost\r To: nobody\@localhost\r Date: Thu, 01 Jan 1970 00:00:00 +0000\r -Message-ID: <dummy-$seqno\@$ibx->{newsgroup}>\r +Message-ID: <dummy-$seqno\@$self->{ibx}->{newsgroup}>\r Subject: dummy message #$seqno\r \r You're seeing this message because your IMAP client didn't use UIDs.\r @@ -390,13 +449,13 @@ sub requeue_once ($) { sub uid_fetch_cb { # called by git->cat_async via git_async_cat my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; - my ($self, undef, $ibx, $msgs, undef, $want) = @$fetch_m_arg; + my ($self, undef, $msgs, undef, $want) = @$fetch_m_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message return requeue_once($self) unless defined $want->{-seqno}; - $bref = dummy_message($smsg->{num}, $ibx); + $bref = dummy_message($self, $smsg->{num}); } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } @@ -446,7 +505,7 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat } sub range_step ($$) { - my ($ibx, $range_csv) = @_; + my ($self, $range_csv) = @_; my ($beg, $end, $range); if ($$range_csv =~ s/\A([^,]+),//) { $range = $1; @@ -458,39 +517,46 @@ sub range_step ($$) { ($beg, $end) = ($1 + 0, $2 + 0); } elsif ($range =~ /\A([0-9]+):\*\z/) { $beg = $1 + 0; - $end = $ibx->mm->max // 0; + $end = $self->{ibx}->mm->max // 0; $beg = $end if $beg > $end; } elsif ($range =~ /\A[0-9]+\z/) { $beg = $end = $range + 0; + undef $range; } else { return 'BAD fetch range'; } + if (defined($range) && (my $uid_min = $self->{uid_min})) { + my $uid_end = $uid_min + UID_BLOCK - 1; + $beg = $uid_min if $beg < $uid_min; + $end = $uid_end if $end > $uid_end; + } [ $beg, $end, $$range_csv ]; } sub refill_range ($$$) { - my ($ibx, $msgs, $range_info) = @_; + my ($self, $msgs, $range_info) = @_; my ($beg, $end, $range_csv) = @$range_info; - if (scalar(@$msgs = @{$ibx->over->query_xover($beg, $end)})) { + if (scalar(@$msgs = @{$self->{ibx}->over->query_xover($beg, $end)})) { $range_info->[0] = $msgs->[-1]->{num} + 1; return; } return 'OK Fetch done' if !$range_csv; - my $next_range = range_step($ibx, \$range_csv); + my $next_range = range_step($self, \$range_csv); return $next_range if !ref($next_range); # error @$range_info = @$next_range; undef; # keep looping } sub uid_fetch_m { # long_response - my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_; + my ($self, $tag, $msgs, $range_info, $want) = @_; while (!@$msgs) { # rare - if (my $end = refill_range($ibx, $msgs, $range_info)) { + if (my $end = refill_range($self, $msgs, $range_info)) { $self->write(\"$tag $end\r\n"); return; } } - git_async_cat($ibx->git, $msgs->[0]->{blob}, \&uid_fetch_cb, \@_); + git_async_cat($self->{ibx}->git, $msgs->[0]->{blob}, + \&uid_fetch_cb, \@_); } sub cmd_status ($$$;@) { @@ -698,9 +764,9 @@ sub fetch_common ($$$$) { } sort keys %partial ]; } $range_csv = 'bad' if $range_csv !~ $valid_range; - my $range_info = range_step($ibx, \$range_csv); + my $range_info = range_step($self, \$range_csv); return "$tag $range_info\r\n" if !ref($range_info); - [ $tag, $ibx, [], $range_info, \%want ]; + [ $tag, [], $range_info, \%want ]; } sub cmd_uid_fetch ($$$;@) { @@ -712,9 +778,9 @@ sub cmd_uid_fetch ($$$;@) { } sub seq_fetch_m { # long_response - my ($self, $tag, $ibx, $msgs, $range_info, $want) = @_; + my ($self, $tag, $msgs, $range_info, $want) = @_; while (!@$msgs) { # rare - if (my $end = refill_range($ibx, $msgs, $range_info)) { + if (my $end = refill_range($self, $msgs, $range_info)) { $self->write(\"$tag $end\r\n"); return; } @@ -722,13 +788,13 @@ sub seq_fetch_m { # long_response my $seq = $want->{-seqno}++; my $cur_num = $msgs->[0]->{num}; if ($cur_num == $seq) { # as expected - git_async_cat($ibx->git, $msgs->[0]->{blob}, + git_async_cat($self->{ibx}->git, $msgs->[0]->{blob}, \&uid_fetch_cb, \@_); } elsif ($cur_num > $seq) { # send dummy messages until $seq catches up to $cur_num my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg'; unshift @$msgs, $smsg; - my $bref = dummy_message($seq, $ibx); + my $bref = dummy_message($self, $seq); uid_fetch_cb($bref, undef, undef, undef, \@_); $smsg; # blessed response since uid_fetch_cb requeues } else { # should not happen @@ -741,14 +807,14 @@ sub cmd_fetch ($$$;@) { my $args = fetch_common($self, $tag, $range_csv, \@want); ref($args) eq 'ARRAY' ? do { my $want = $args->[-1]; - $want->{-seqno} = $args->[3]->[0]; # $beg == $range_info->[0]; + $want->{-seqno} = $args->[2]->[0]; # $beg == $range_info->[0]; long_response($self, \&seq_fetch_m, @$args) } : $args; # error } sub uid_search_all { # long_response - my ($self, $tag, $ibx, $num) = @_; - my $uids = $ibx->mm->ids_after($num); + my ($self, $tag, $num) = @_; + my $uids = $self->{ibx}->mm->ids_after($num); if (scalar(@$uids)) { $self->msg_more(join(' ', '', @$uids)); } else { @@ -758,8 +824,8 @@ sub uid_search_all { # long_response } sub uid_search_uid_range { # long_response - my ($self, $tag, $ibx, $beg, $end) = @_; - my $uids = $ibx->mm->msg_range($beg, $end, 'num'); + my ($self, $tag, $beg, $end) = @_; + my $uids = $self->{ibx}->mm->msg_range($beg, $end, 'num'); if (@$uids) { $self->msg_more(join('', map { " $_->[0]" } @$uids)); } else { @@ -775,14 +841,14 @@ sub cmd_uid_search ($$$;) { if ($arg eq 'ALL' && !@rest) { $self->msg_more('* SEARCH'); my $num = 0; - long_response($self, \&uid_search_all, $tag, $ibx, \$num); + long_response($self, \&uid_search_all, $tag, \$num); } elsif ($arg eq 'UID' && scalar(@rest) == 1) { if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) { my ($beg, $end) = ($1, $2); $end = $ibx->mm->max if $end eq '*'; $self->msg_more('* SEARCH'); long_response($self, \&uid_search_uid_range, - $tag, $ibx, \$beg, $end); + $tag, \$beg, $end); } elsif ($rest[0] =~ /\A[0-9]+\z/s) { my $uid = $rest[0]; $uid = $ibx->over->get_art($uid) ? " $uid" : '';
This will be used to prevent reloading a giant config with tens/hundreds of thousands of inboxes from blocking the event loop. --- lib/PublicInbox/Config.pm | 15 ++++++++++ lib/PublicInbox/IMAPD.pm | 61 +++++++++++++++++++++++++++++++++++---- script/public-inbox-imapd | 2 +- 3 files changed, 71 insertions(+), 7 deletions(-) diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index 458f29b2964..c18c9c75b4f 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -99,6 +99,21 @@ sub each_inbox { } } +sub iterate_start { + my ($self, $cb, $arg) = @_; + my $i = 0; + $self->{-iter} = [ \$i, $cb, $arg ]; +} + +# for PublicInbox::DS::next_tick +sub event_step { + my ($self) = @_; + my ($i, $cb, $arg) = @{$self->{-iter}}; + my $section = $self->{-section_order}->[$$i++]; + delete($self->{-iter}) unless defined($section); + $cb->($self, $section, $arg); +} + sub lookup_newsgroup { my ($self, $ng) = @_; _lookup_fill($self, '-by_newsgroup', lc($ng)); diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index 966879d8ea5..a10fabffb1f 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -7,6 +7,8 @@ package PublicInbox::IMAPD; use strict; use parent qw(PublicInbox::NNTPD); use PublicInbox::InboxIdle; +use PublicInbox::IMAP; +# *UID_BLOCK = \&PublicInbox::IMAP::UID_BLOCK; sub new { my ($class) = @_; @@ -51,17 +53,64 @@ sub refresh_inboxlist ($) { $self->{inboxlist} = \@names; } -sub refresh_groups { - my ($self) = @_; - my $pi_config = $self->{pi_config} = PublicInbox::Config->new; - $self->SUPER::refresh_groups($pi_config); - refresh_inboxlist($self); +sub imapd_refresh_ibx { # pi_config->each_inbox cb + my ($ibx, $imapd) = @_; + my $ngname = $ibx->{newsgroup} or return; + if (ref $ngname) { + warn 'multiple newsgroups not supported: '. + join(', ', @$ngname). "\n"; + } elsif ($ngname =~ m![^a-z0-9/_\.\-\~\@\+\=:]! || + $ngname =~ /\.[0-9]+-[0-9]+\z/) { + warn "mailbox name invalid: `$ngname'\n"; + } + + my $mm = $ibx->mm or return; + $ibx->{mm} = undef; + defined($ibx->{uidvalidity} = $mm->created_at) or return; + $imapd->{tmp_groups}->{$ngname} = $ibx; - if (my $idler = $self->{idler}) { + # preload to avoid fragmentation: + $ibx->description; + $ibx->base_url; + # my $max = $mm->max // 0; + # my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1; +} + +sub imapd_refresh_finalize { + my ($imapd, $pi_config) = @_; + $imapd->{groups} = delete $imapd->{tmp_groups}; + $imapd->{grouplist} = [ values %{$imapd->{groups}} ]; + refresh_inboxlist($imapd); + $imapd->{pi_config} = $pi_config; + if (my $idler = $imapd->{idler}) { $idler->refresh($pi_config); } } +sub imapd_refresh_step { # pi_config->iterate_start cb + my ($pi_config, $section, $imapd) = @_; + if (defined($section)) { + return if $section !~ m!\Apublicinbox\.([^/]+)\z!; + my $ibx = $pi_config->lookup_name($1) or return; + imapd_refresh_ibx($ibx, $imapd); + } else { # "EOF" + imapd_refresh_finalize($imapd, $pi_config); + } +} + +sub refresh_groups { + my ($self, $sig) = @_; + my $pi_config = PublicInbox::Config->new; + $self->{tmp_groups} = {}; + if (0 && $sig) { # SIGHUP + $pi_config->iterate_start(\&imapd_refresh_step, $self); + PublicInbox::DS::requeue($pi_config); # call event_step + } else { # initial start + $pi_config->each_inbox(\&imapd_refresh_ibx, $self); + imapd_refresh_finalize($self, $pi_config); + } +} + sub idler_start { $_[0]->{idler} //= PublicInbox::InboxIdle->new($_[0]->{pi_config}); } diff --git a/script/public-inbox-imapd b/script/public-inbox-imapd index 63f865f53f0..60f2e6d8a36 100644 --- a/script/public-inbox-imapd +++ b/script/public-inbox-imapd @@ -9,6 +9,6 @@ use PublicInbox::IMAPdeflate; # loads PublicInbox::IMAP use PublicInbox::IMAPD; my $imapd = PublicInbox::IMAPD->new; PublicInbox::Daemon::run('0.0.0.0:143', - sub { $imapd->refresh_groups }, # refresh + sub { $imapd->refresh_groups(@_) }, # refresh sub ($$$) { PublicInbox::IMAP->new($_[0], $imapd) }, # post_accept $imapd);
Finish up the IMAP-only portion of iterative config reloading, which allows us to create all sub-ranges of an inbox up front. The InboxIdler still uses ->each_inbox which will struggle with 100K inboxes. Having messages in the top-level newsgroup name of an inbox will still waste bandwidth for clients which want to do full syncs once there's a rollover to a new 50K range. So instead, make every inbox accessible exclusively via 50K slices in the form of "$NEWSGROUP.$UID_MIN-$UID_END". This introduces the DummyInbox, which makes $NEWSGROUP and every parent component a selectable, empty inbox. This aids navigation with mutt and possibly other MUAs. Finally, the xt/perf-imap-list maintainer test is broken, now, so remove it. The grep perlfunc is already proven effective, and we'll have separate tests for mocking out ~100k inboxes. --- MANIFEST | 2 +- lib/PublicInbox/Config.pm | 7 ++- lib/PublicInbox/DummyInbox.pm | 21 +++++++++ lib/PublicInbox/IMAP.pm | 64 +++++++++---------------- lib/PublicInbox/IMAPD.pm | 88 ++++++++++++++++------------------- t/imap.t | 33 +++++++++++-- t/imapd.t | 22 +++++---- xt/imapd-mbsync-oimap.t | 5 +- xt/imapd-validate.t | 9 ++-- xt/perf-imap-list.t | 38 --------------- 10 files changed, 139 insertions(+), 150 deletions(-) create mode 100644 lib/PublicInbox/DummyInbox.pm delete mode 100644 xt/perf-imap-list.t diff --git a/MANIFEST b/MANIFEST index 957228250a7..a133b71f881 100644 --- a/MANIFEST +++ b/MANIFEST @@ -105,6 +105,7 @@ lib/PublicInbox/DS.pm lib/PublicInbox/DSKQXS.pm lib/PublicInbox/DSPoll.pm lib/PublicInbox/Daemon.pm +lib/PublicInbox/DummyInbox.pm lib/PublicInbox/Emergency.pm lib/PublicInbox/Eml.pm lib/PublicInbox/EmlContentFoo.pm @@ -357,7 +358,6 @@ xt/imapd-validate.t xt/mem-msgview.t xt/msgtime_cmp.t xt/nntpd-validate.t -xt/perf-imap-list.t xt/perf-msgview.t xt/perf-nntpd.t xt/perf-threading.t diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index c18c9c75b4f..19535beb973 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -105,13 +105,16 @@ sub iterate_start { $self->{-iter} = [ \$i, $cb, $arg ]; } -# for PublicInbox::DS::next_tick +# for PublicInbox::DS::next_tick, we only call this is if +# PublicInbox::DS is already loaded sub event_step { my ($self) = @_; my ($i, $cb, $arg) = @{$self->{-iter}}; my $section = $self->{-section_order}->[$$i++]; delete($self->{-iter}) unless defined($section); - $cb->($self, $section, $arg); + eval { $cb->($self, $section, $arg) }; + warn "E: $@ in ${self}::event_step" if $@; + PublicInbox::DS::requeue($self) if defined($section); } sub lookup_newsgroup { diff --git a/lib/PublicInbox/DummyInbox.pm b/lib/PublicInbox/DummyInbox.pm new file mode 100644 index 00000000000..e38f9e5a432 --- /dev/null +++ b/lib/PublicInbox/DummyInbox.pm @@ -0,0 +1,21 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# +# An EXAMINE-able, PublicInbox::Inbox-like object for IMAP. Some +# IMAP clients don't like having unselectable parent mailboxes, +# so we have a dummy +package PublicInbox::DummyInbox; +use strict; + +sub created_at { 0 } # Msgmap::created_at +sub mm { shift } +sub max { undef } # Msgmap::max +sub msg_range { [] } # Msgmap::msg_range + +no warnings 'once'; +*query_xover = \&msg_range; +*over = \&mm; +*subscribe_unlock = *unsubscribe_unlock = + *get_art = *description = *base_url = \&max; + +1; diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 959dcfbe328..6f64dff9958 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -192,26 +192,23 @@ sub cmd_done ($$) { "$idle_tag OK Idle done\r\n"; } -sub ensure_old_ranges_exist ($$$) { - my ($self, $ibx, $uid_min) = @_; - my $groups = $self->{imapd}->{groups}; - my $mailbox = $ibx->{newsgroup}; +sub ensure_ranges_exist ($$$) { + my ($imapd, $ibx, $max) = @_; + my $mailboxes = $imapd->{mailboxes}; + my $mb_top = $ibx->{newsgroup}; my @created; - $uid_min -= UID_BLOCK; + my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1; my $uid_end = $uid_min + UID_BLOCK - 1; while ($uid_min > 0) { - my $sub_mailbox = "$mailbox.$uid_min-$uid_end"; - last if exists $groups->{$sub_mailbox}; - $groups->{$sub_mailbox} = $ibx; + my $sub_mailbox = "$mb_top.$uid_min-$uid_end"; + last if exists $mailboxes->{$sub_mailbox}; + $mailboxes->{$sub_mailbox} = $ibx; $uid_end -= UID_BLOCK; $uid_min -= UID_BLOCK; push @created, $sub_mailbox; } return unless @created; - my $l = $self->{imapd}->{inboxlist}; - grep { - / \Q$mailbox\E\r\n\z/ and s/\(\\HasNoChildren/\(\\HasChildren/; - } @$l; + my $l = $imapd->{inboxlist} or return; push @$l, map { qq[* LIST (\\HasNoChildren) "." $_\r\n] } @created; } @@ -222,41 +219,23 @@ sub cmd_examine ($$$) { if ($mailbox =~ /\A(.+)\.([0-9]+)-([0-9]+)\z/) { # old mail: inbox.comp.foo.$uid_min-$uid_end my ($mb_top, $uid_min, $uid_end) = ($1, $2 + 0, $3 + 0); - $ibx = $self->{imapd}->{groups}->{lc $mb_top}; - if (!$ibx || ($uid_end % UID_BLOCK) != 0 || - ($uid_min + UID_BLOCK - 1) != $uid_end) { - return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; - } - $mm = $ibx->mm; - $max = $mm->max // 0; - # don't let users create inboxes w/ not-yet-possible range: - $uid_min > $max and + $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; - $max = $uid_min + UID_BLOCK + 1; + $mm = $ibx->mm; + $max = $mm->max // 0; $self->{uid_min} = $uid_min; - ensure_old_ranges_exist($self, $ibx, $uid_min); - } else { # current mailbox (most recent UID_BLOCK messages) - $ibx = $self->{imapd}->{groups}->{lc $mailbox} or + ensure_ranges_exist($self->{imapd}, $ibx, $max); + $max = $uid_end if $max > $uid_end; + } else { # check for dummy inboxes + $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; - + delete $self->{uid_min}; + $max = 0; $mm = $ibx->mm; - $max = $mm->max // 0; - - my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1; - if ($uid_min == 1) { # normal inbox with <UID_BLOCK messages - delete $self->{uid_min}; # implicit cmd_close - } else { # we have a giant inbox: - $self->{uid_min} = $uid_min; - ensure_old_ranges_exist($self, $ibx, $uid_min); - } } - # RFC 3501 2.3.1.1 - "A good UIDVALIDITY value to use in - # this case is a 32-bit representation of the creation - # date/time of the mailbox" - my $uidvalidity = $mm->created_at or return "$tag BAD UIDVALIDITY\r\n"; my $uidnext = $max + 1; # XXX: do we need this? RFC 5162/7162 @@ -269,7 +248,7 @@ sub cmd_examine ($$$) { * OK [PERMANENTFLAGS ()] Read-only mailbox\r * OK [UNSEEN $max]\r * OK [UIDNEXT $uidnext]\r -* OK [UIDVALIDITY $uidvalidity]\r +* OK [UIDVALIDITY $ibx->{uidvalidity}]\r $tag OK [READ-ONLY] EXAMINE/SELECT done\r EOF } @@ -561,7 +540,7 @@ sub uid_fetch_m { # long_response sub cmd_status ($$$;@) { my ($self, $tag, $mailbox, @items) = @_; - my $ibx = $self->{imapd}->{groups}->{$mailbox} or + my $ibx = $self->{imapd}->{mailboxes}->{$mailbox} or return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; return "$tag BAD no items\r\n" if !scalar(@items); ($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and @@ -577,8 +556,7 @@ sub cmd_status ($$$;@) { } elsif ($it eq 'UIDNEXT') { push(@it, ($max //= $mm->max // 0) + 1); } elsif ($it eq 'UIDVALIDITY') { - push(@it, $mm->created_at // - return("$tag BAD UIDVALIDITY\r\n")); + push(@it, $ibx->{uidvalidity}); } else { return "$tag BAD invalid item\r\n"; } diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index a10fabffb1f..b647c940505 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -5,82 +5,75 @@ # see script/public-inbox-imapd for how it is used package PublicInbox::IMAPD; use strict; -use parent qw(PublicInbox::NNTPD); +use PublicInbox::Config; use PublicInbox::InboxIdle; use PublicInbox::IMAP; -# *UID_BLOCK = \&PublicInbox::IMAP::UID_BLOCK; +use PublicInbox::DummyInbox; +my $dummy = bless { uidvalidity => 0 }, 'PublicInbox::DummyInbox'; sub new { my ($class) = @_; bless { - groups => {}, + mailboxes => {}, err => \*STDERR, out => \*STDOUT, - grouplist => [], # accept_tls => { SSL_server => 1, ..., SSL_reuse_ctx => ... } # pi_config => PublicInbox::Config # idler => PublicInbox::InboxIdle }, $class; } -sub refresh_inboxlist ($) { - my ($self) = @_; - my @names = map { $_->{newsgroup} } @{delete $self->{grouplist}}; - my %ns; # "\Noselect \HasChildren" - - if (my @uc = grep(/[A-Z]/, @names)) { - warn "Uppercase not allowed for IMAP newsgroup(s):\n", - map { "\t$_\n" } @uc; - my %uc = map { $_ => 1 } @uc; - @names = grep { !$uc{$_} } @names; - } - for (@names) { - my $up = $_; - while ($up =~ s/\.[^\.]+\z//) { - $ns{$up} = '\\Noselect \\HasChildren'; - } - } - @names = map {; - my $at = delete($ns{$_}) ? '\\HasChildren' : '\\HasNoChildren'; - qq[* LIST ($at) "." $_\r\n] - } @names; - push(@names, map { qq[* LIST ($ns{$_}) "." $_\r\n] } keys %ns); - @names = sort { - my ($xa) = ($a =~ / (\S+)\r\n/g); - my ($xb) = ($b =~ / (\S+)\r\n/g); - length($xa) <=> length($xb); - } @names; - $self->{inboxlist} = \@names; -} - sub imapd_refresh_ibx { # pi_config->each_inbox cb my ($ibx, $imapd) = @_; my $ngname = $ibx->{newsgroup} or return; if (ref $ngname) { warn 'multiple newsgroups not supported: '. join(', ', @$ngname). "\n"; + return; } elsif ($ngname =~ m![^a-z0-9/_\.\-\~\@\+\=:]! || $ngname =~ /\.[0-9]+-[0-9]+\z/) { - warn "mailbox name invalid: `$ngname'\n"; + warn "mailbox name invalid: newsgroup=`$ngname'\n"; + return; } - + $ibx->over or return; + $ibx->{over} = undef; my $mm = $ibx->mm or return; $ibx->{mm} = undef; + + # RFC 3501 2.3.1.1 - "A good UIDVALIDITY value to use in + # this case is a 32-bit representation of the creation + # date/time of the mailbox" defined($ibx->{uidvalidity} = $mm->created_at) or return; - $imapd->{tmp_groups}->{$ngname} = $ibx; + PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 1); # preload to avoid fragmentation: $ibx->description; $ibx->base_url; - # my $max = $mm->max // 0; - # my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1; + + # ensure dummies are selectable + my $dummies = $imapd->{dummies}; + do { + $dummies->{$ngname} = $dummy; + } while ($ngname =~ s/\.[^\.]+\z//); } sub imapd_refresh_finalize { my ($imapd, $pi_config) = @_; - $imapd->{groups} = delete $imapd->{tmp_groups}; - $imapd->{grouplist} = [ values %{$imapd->{groups}} ]; - refresh_inboxlist($imapd); + my $mailboxes; + if (my $next = delete $imapd->{imapd_next}) { + $imapd->{mailboxes} = delete $next->{mailboxes}; + $mailboxes = delete $next->{dummies}; + } else { + $mailboxes = delete $imapd->{dummies}; + } + %$mailboxes = (%$mailboxes, %{$imapd->{mailboxes}}); + $imapd->{mailboxes} = $mailboxes; + $imapd->{inboxlist} = [ + map { + my $no = $mailboxes->{$_} == $dummy ? '' : 'No'; + qq[* LIST (\\Has${no}Children) "." $_\r\n] + } sort { length($a) <=> length($b) } keys %$mailboxes + ]; $imapd->{pi_config} = $pi_config; if (my $idler = $imapd->{idler}) { $idler->refresh($pi_config); @@ -92,8 +85,8 @@ sub imapd_refresh_step { # pi_config->iterate_start cb if (defined($section)) { return if $section !~ m!\Apublicinbox\.([^/]+)\z!; my $ibx = $pi_config->lookup_name($1) or return; - imapd_refresh_ibx($ibx, $imapd); - } else { # "EOF" + imapd_refresh_ibx($ibx, $imapd->{imapd_next}); + } else { # undef == "EOF" imapd_refresh_finalize($imapd, $pi_config); } } @@ -101,11 +94,12 @@ sub imapd_refresh_step { # pi_config->iterate_start cb sub refresh_groups { my ($self, $sig) = @_; my $pi_config = PublicInbox::Config->new; - $self->{tmp_groups} = {}; - if (0 && $sig) { # SIGHUP + if ($sig) { # SIGHUP is handled through the event loop + $self->{imapd_next} = { dummies => {}, mailboxes => {} }; $pi_config->iterate_start(\&imapd_refresh_step, $self); PublicInbox::DS::requeue($pi_config); # call event_step - } else { # initial start + } else { # initial start is synchronous + $self->{dummies} = {}; $pi_config->each_inbox(\&imapd_refresh_ibx, $self); imapd_refresh_finalize($self, $pi_config); } diff --git a/t/imap.t b/t/imap.t index aa262a19632..af59ef69386 100644 --- a/t/imap.t +++ b/t/imap.t @@ -6,16 +6,39 @@ use strict; use Test::More; use PublicInbox::IMAP; use PublicInbox::IMAPD; +use PublicInbox::TestCommon; +require_mods(qw(DBD::SQLite)); +require_git 2.6; -{ # make sure we get '%' globbing right +my ($tmpdir, $for_destroy) = tmpdir(); +my $cfgfile = "$tmpdir/config"; +{ + open my $fh, '>', $cfgfile or BAIL_OUT $!; + print $fh <<EOF or BAIL_OUT $!; +[publicinbox "a"] + inboxdir = $tmpdir/a + newsgroup = x.y.z +[publicinbox "b"] + inboxdir = $tmpdir/b + newsgroup = x.z.y +[publicinbox "c"] + inboxdir = $tmpdir/c + newsgroup = IGNORE.THIS +EOF + close $fh or BAIL_OUT $!; + local $ENV{PI_CONFIG} = $cfgfile; + for my $x (qw(a b c)) { + ok(run_script(['-init', '-Lbasic', '-V2', $x, "$tmpdir/$x", + "https://example.com/$x", "$x\@example.com"]), + "init $x"); + } + my $imapd = PublicInbox::IMAPD->new; my @w; local $SIG{__WARN__} = sub { push @w, @_ }; - my @n = map { { newsgroup => $_ } } (qw(x.y.z x.z.y IGNORE.THIS)); - my $self = { imapd => { grouplist => \@n } }; - PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); + $imapd->refresh_groups; + my $self = { imapd => $imapd }; is(scalar(@w), 1, 'got a warning for upper-case'); like($w[0], qr/IGNORE\.THIS/, 'warned about upper-case'); - my $res = PublicInbox::IMAP::cmd_list($self, 'tag', 'x', '%'); is(scalar($$res =~ tr/\n/\n/), 2, 'only one result'); like($$res, qr/ x\r\ntag OK/, 'saw expected'); diff --git a/t/imapd.t b/t/imapd.t index a5324f78da0..017bfa0b27b 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -16,6 +16,9 @@ if ($can_compress) { # hope this gets fixed upstream, soon $imap_client = 'PublicInbox::IMAPClient'; } +require_ok 'PublicInbox::IMAP'; +my $first_range = '1-'.PublicInbox::IMAP::UID_BLOCK(); + my $level = '-Lbasic'; SKIP: { require_mods('Search::Xapian', 1); @@ -87,11 +90,13 @@ ok(!$mic->examine('foo') && ($e = $@), 'EXAMINE non-existent'); like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent'); ok(!$mic->select('foo') && ($e = $@), 'EXAMINE non-existent'); like($e, qr/\bNO\b/, 'got a NO on EXAMINE for non-existent'); -ok($mic->select('inbox.i1'), 'SELECT succeeds'); -ok($mic->examine('INBOX.i1'), 'EXAMINE succeeds'); -my @raw = $mic->status('inbox.i1', qw(Messages uidnext uidvalidity)); +my $mailbox1 = "inbox.i1.$first_range"; +ok($mic->select('inbox.i1'), 'SELECT on parent succeeds'); +ok($mic->select($mailbox1), 'SELECT succeeds'); +ok($mic->examine($mailbox1), 'EXAMINE succeeds'); +my @raw = $mic->status($mailbox1, qw(Messages uidnext uidvalidity)); is(scalar(@raw), 2, 'got status response'); -like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\x20 +like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\.$first_range\x20 \(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx); like($raw[1], qr/\A\S+ OK /, 'finished status response'); @@ -99,7 +104,7 @@ like($raw[1], qr/\A\S+ OK /, 'finished status response'); like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, 'got an inbox'); like($raw[-1], qr/^\S+ OK /, 'response ended with OK'); -is(scalar(@raw), scalar(@V) + 2, 'default LIST response'); +is(scalar(@raw), scalar(@V) + 4, 'default LIST response'); @raw = $mic->list('', 'inbox.i1'); is(scalar(@raw), 2, 'limited LIST response'); like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, @@ -199,7 +204,7 @@ SKIP: { skip 'Mail::IMAPClient too old for ->compress', 2 if !$can_compress; my $c = $imap_client->new(%mic_opt); ok($c && $c->compress, 'compress enabled'); - ok($c->examine('inbox.i1'), 'EXAMINE succeeds after COMPRESS'); + ok($c->examine($mailbox1), 'EXAMINE succeeds after COMPRESS'); $ret = $c->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@"; is_deeply($ret, [ 1 ], 'search UID 1:* works after compression'); } @@ -216,11 +221,12 @@ $pi_config->each_inbox(sub { my $ng = $ibx->{newsgroup}; my $mic = $imap_client->new(%mic_opt); ok($mic && $mic->login && $mic->IsAuthenticated, "authed $name"); - my $uidnext = $mic->uidnext($ng); # we'll fetch BODYSTRUCTURE on this + my $mb = "$ng.$first_range"; + my $uidnext = $mic->uidnext($mb); # we'll fetch BODYSTRUCTURE on this ok($uidnext, 'got uidnext for later fetch'); is_deeply([$mic->has_capability('IDLE')], ['IDLE'], "IDLE capa $name"); ok(!$mic->idle, "IDLE fails w/o SELECT/EXAMINE $name"); - ok($mic->examine($ng), "EXAMINE $ng succeeds"); + ok($mic->examine($mb), "EXAMINE $ng succeeds"); ok(my $idle_tag = $mic->idle, "IDLE succeeds on $ng"); open(my $fh, '<', 't/data/message_embed.eml') or BAIL_OUT("open: $!"); diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t index d2237a24bef..b2cb8737f82 100644 --- a/xt/imapd-mbsync-oimap.t +++ b/xt/imapd-mbsync-oimap.t @@ -13,12 +13,13 @@ my $inboxdir = $ENV{GIANT_INBOX_DIR}; plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!; my ($tmpdir, $for_destroy) = tmpdir(); my $cfg = "$tmpdir/cfg"; -my $mailbox = 'inbox.test'; +my $newsgroup = 'inbox.test'; +my $mailbox = "$newsgroup.1-50000"; { open my $fh, '>', $cfg or BAIL_OUT "open: $!"; print $fh <<EOF or BAIL_OUT "print: $!"; [publicinbox "test"] - newsgroup = $mailbox + newsgroup = $newsgroup address = oimap\@example.com inboxdir = $inboxdir EOF diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t index f96ec8791b9..b7b66d0583f 100644 --- a/xt/imapd-validate.t +++ b/xt/imapd-validate.t @@ -30,7 +30,7 @@ if ($test_tls && !-r $key || !-r $cert) { } my ($tmpdir, $for_destroy) = tmpdir(); my %OPT = qw(User u Password p); -my (%STARTTLS_OPT, %IMAPS_OPT, $td, $mailbox, $make_local_server); +my (%STARTTLS_OPT, %IMAPS_OPT, $td, $newsgroup, $mailbox, $make_local_server); if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) { ($OPT{Server}, $mailbox) = ($1, $2); $OPT{Server} =~ s/:([0-9]+)\z// and $OPT{Port} = $1 + 0; @@ -39,6 +39,7 @@ if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) { } else { require_mods(qw(DBD::SQLite)); $make_local_server->(); + $mailbox = "$newsgroup.1-50000"; } my %opts = (imap => \%OPT, 'imap+compress' => { %OPT, Compress => 1 }); @@ -124,15 +125,15 @@ BEGIN { $make_local_server = sub { require PublicInbox::Inbox; - $mailbox = 'inbox.test'; - my $ibx = { inboxdir => $inbox_dir, newsgroup => $mailbox }; + $newsgroup = 'inbox.test'; + my $ibx = { inboxdir => $inbox_dir, newsgroup => $newsgroup }; $ibx = PublicInbox::Inbox->new($ibx); my $pi_config = "$tmpdir/config"; { open my $fh, '>', $pi_config or die "open($pi_config): $!"; print $fh <<"" or die "print $pi_config: $!"; [publicinbox "test"] - newsgroup = $mailbox + newsgroup = $newsgroup inboxdir = $inbox_dir address = test\@example.com diff --git a/xt/perf-imap-list.t b/xt/perf-imap-list.t deleted file mode 100644 index 0f00f487991..00000000000 --- a/xt/perf-imap-list.t +++ /dev/null @@ -1,38 +0,0 @@ -#!perl -w -# Copyright (C) 2020 all contributors <meta@public-inbox.org> -# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> -use Test::More; -use_ok 'PublicInbox::IMAP'; -use_ok 'PublicInbox::IMAPD'; -use PublicInbox::DS; -use Benchmark qw(:all); -my @n = map { { newsgroup => "inbox.comp.foo.bar.$_" } } (0..50000); -push @n, map { { newsgroup => "xobni.womp.foo.bar.$_" } } (0..50000); -my $self = { imapd => { grouplist => \@n } }; -my $n = scalar @n; -my $t = timeit(1, sub { - PublicInbox::IMAPD::refresh_inboxlist($self->{imapd}); -}); -diag timestr($t). "refresh $n inboxes"; - -open my $null, '>', '/dev/null' or BAIL_OUT "open: $!"; -my $ds = { sock => $null }; -my $nr = 200; -diag "starting benchmark..."; -my $cmd_list = \&PublicInbox::IMAP::cmd_list; -$t = timeit(1, sub { - for (0..$nr) { - my $res = $cmd_list->($self, 'tag', '', '*'); - PublicInbox::DS::write($ds, $res); - } -}); -diag timestr($t). "list all for $n inboxes $nr times"; -$nr = 20; -$t = timeit(1, sub { - for (0..$nr) { - my $res = $cmd_list->($self, 'tag', 'inbox.', '%'); - PublicInbox::DS::write($ds, $res); - } -}); -diag timestr($t). "list partial for $n inboxes $nr times"; -done_testing;
I'm not sure this matters, and it could be a waste of CPU cycles if no real clients care. However, it does make debugging over telnet or s_client a bit easier. --- lib/PublicInbox/IMAPD.pm | 7 ++++++- t/imapd.t | 14 +++++++++++++- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index b647c940505..6488dc0f3bc 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -72,7 +72,12 @@ sub imapd_refresh_finalize { map { my $no = $mailboxes->{$_} == $dummy ? '' : 'No'; qq[* LIST (\\Has${no}Children) "." $_\r\n] - } sort { length($a) <=> length($b) } keys %$mailboxes + } sort { + # shortest names first, alphabetically if lengths match + length($a) == length($b) ? + ($a cmp $b) : + (length($a) <=> length($b)) + } keys %$mailboxes ]; $imapd->{pi_config} = $pi_config; if (my $idler = $imapd->{idler}) { diff --git a/t/imapd.t b/t/imapd.t index 017bfa0b27b..5d9a7d1181c 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -100,7 +100,7 @@ like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\.$first_range\x20 \(MESSAGES\x20\d+\x20UIDNEXT\x20\d+\x20UIDVALIDITY\x20\d+\)\r\n/sx); like($raw[1], qr/\A\S+ OK /, 'finished status response'); -@raw = $mic->list; +my @orig_list = @raw = $mic->list; like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, 'got an inbox'); like($raw[-1], qr/^\S+ OK /, 'response ended with OK'); @@ -351,6 +351,18 @@ $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]') is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'}, 'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n", 'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK'); + +{ + my @new_list = $mic->list; + # tag differs in [-1] + like($orig_list[-1], qr/\A\S+ OK List done\r\n/, 'orig LIST'); + like($new_list[-1], qr/\A\S+ OK List done\r\n/, 'new LIST'); + pop @new_list; + pop @orig_list; + # TODO: not sure if sort order matters, imapd_refresh_finalize + # sorts, for now, but maybe clients don't care... + is_deeply(\@new_list, \@orig_list, 'LIST identical'); +} ok($mic->close, 'CLOSE works'); ok(!$mic->close, 'CLOSE not idempotent'); ok($mic->logout, 'logged out');
Having two large numbers separated by a dash can make visual comparisons difficult when numbers are in the 3,000,000 range for LKML. So avoid the $UID_END value, since it can be calculated from $UID_MIN. And we can avoid large values of $UID_MIN, too, by instead storing the block index and just multiplying it by 50000 (and adding 1) on the server side. Of course, LKML still goes up to 72, at the moment. --- lib/PublicInbox/IMAP.pm | 15 ++++++--------- lib/PublicInbox/IMAPD.pm | 2 +- t/imapd.t | 2 +- xt/imapd-mbsync-oimap.t | 2 +- xt/imapd-validate.t | 2 +- 5 files changed, 10 insertions(+), 13 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 6f64dff9958..5865822f0c3 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -197,14 +197,10 @@ sub ensure_ranges_exist ($$$) { my $mailboxes = $imapd->{mailboxes}; my $mb_top = $ibx->{newsgroup}; my @created; - my $uid_min = UID_BLOCK * int($max/UID_BLOCK) + 1; - my $uid_end = $uid_min + UID_BLOCK - 1; - while ($uid_min > 0) { - my $sub_mailbox = "$mb_top.$uid_min-$uid_end"; + for (my $i = int($max/UID_BLOCK); $i >= 0; --$i) { + my $sub_mailbox = "$mb_top.$i"; last if exists $mailboxes->{$sub_mailbox}; $mailboxes->{$sub_mailbox} = $ibx; - $uid_end -= UID_BLOCK; - $uid_min -= UID_BLOCK; push @created, $sub_mailbox; } return unless @created; @@ -216,9 +212,9 @@ sub cmd_examine ($$$) { my ($self, $tag, $mailbox) = @_; my ($ibx, $mm, $max); - if ($mailbox =~ /\A(.+)\.([0-9]+)-([0-9]+)\z/) { - # old mail: inbox.comp.foo.$uid_min-$uid_end - my ($mb_top, $uid_min, $uid_end) = ($1, $2 + 0, $3 + 0); + if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) { + # old mail: inbox.comp.foo.$uid_block_idx + my ($mb_top, $uid_min) = ($1, $2 * UID_BLOCK + 1); $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; @@ -227,6 +223,7 @@ sub cmd_examine ($$$) { $max = $mm->max // 0; $self->{uid_min} = $uid_min; ensure_ranges_exist($self->{imapd}, $ibx, $max); + my $uid_end = $uid_min + UID_BLOCK - 1; $max = $uid_end if $max > $uid_end; } else { # check for dummy inboxes $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index 6488dc0f3bc..261d756042f 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -31,7 +31,7 @@ sub imapd_refresh_ibx { # pi_config->each_inbox cb join(', ', @$ngname). "\n"; return; } elsif ($ngname =~ m![^a-z0-9/_\.\-\~\@\+\=:]! || - $ngname =~ /\.[0-9]+-[0-9]+\z/) { + $ngname =~ /\.[0-9]+\z/) { warn "mailbox name invalid: newsgroup=`$ngname'\n"; return; } diff --git a/t/imapd.t b/t/imapd.t index 5d9a7d1181c..45ee401a1e0 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -17,7 +17,7 @@ if ($can_compress) { # hope this gets fixed upstream, soon } require_ok 'PublicInbox::IMAP'; -my $first_range = '1-'.PublicInbox::IMAP::UID_BLOCK(); +my $first_range = '0'; my $level = '-Lbasic'; SKIP: { diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t index b2cb8737f82..fdaa22aa9ef 100644 --- a/xt/imapd-mbsync-oimap.t +++ b/xt/imapd-mbsync-oimap.t @@ -14,7 +14,7 @@ plan skip_all => "bad characters in $inboxdir" if $inboxdir =~ m![^\w\.\-/]!; my ($tmpdir, $for_destroy) = tmpdir(); my $cfg = "$tmpdir/cfg"; my $newsgroup = 'inbox.test'; -my $mailbox = "$newsgroup.1-50000"; +my $mailbox = "$newsgroup.0"; { open my $fh, '>', $cfg or BAIL_OUT "open: $!"; print $fh <<EOF or BAIL_OUT "print: $!"; diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t index b7b66d0583f..aeeb43b9b40 100644 --- a/xt/imapd-validate.t +++ b/xt/imapd-validate.t @@ -39,7 +39,7 @@ if (($ENV{IMAP_TEST_URL} // '') =~ m!\Aimap://([^/]+)/(.+)\z!) { } else { require_mods(qw(DBD::SQLite)); $make_local_server->(); - $mailbox = "$newsgroup.1-50000"; + $mailbox = "$newsgroup.0"; } my %opts = (imap => \%OPT, 'imap+compress' => { %OPT, Compress => 1 });
We'll use the xqx() to avoid losing too much performance compared to normal `backtick` (qx) when testing using "make check-run" + Inline::C. --- t/config.t | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/t/config.t b/t/config.t index 1f50bb86a09..3f41c0042a9 100644 --- a/t/config.t +++ b/t/config.t @@ -209,20 +209,17 @@ EOF } { - my $check_git = !!$ENV{CHECK_GIT_BOOL}; for my $t (qw(TRUE true yes on 1 +1 -1 13 0x1 0x12 0X5)) { is(PublicInbox::Config::_git_config_bool($t), 1, "$t is true"); - if ($check_git) { - is(`git -c test.val=$t config --bool test.val`, - "true\n", "$t matches git-config behavior"); - } + is(xqx([qw(git -c), "test.val=$t", + qw(config --bool test.val)]), + "true\n", "$t matches git-config behavior"); } for my $f (qw(FALSE false no off 0 +0 +000 00 0x00 0X0)) { is(PublicInbox::Config::_git_config_bool($f), 0, "$f is false"); - if ($check_git) { - is(`git -c test.val=$f config --bool test.val`, - "false\n", "$f matches git-config behavior"); - } + is(xqx([qw(git -c), "test.val=$f", + qw(config --bool test.val)]), + "false\n", "$f matches git-config behavior"); } is(PublicInbox::Config::_git_config_bool('bogus'), undef, 'bogus is undef');
This will make it easier to show parameters used for testing and potential tweaks to be made. --- xt/eml_check_limits.t | 5 ++++- xt/git_async_cmp.t | 1 + xt/imapd-mbsync-oimap.t | 4 +++- xt/imapd-validate.t | 1 + xt/mem-msgview.t | 1 + 5 files changed, 10 insertions(+), 2 deletions(-) diff --git a/xt/eml_check_limits.t b/xt/eml_check_limits.t index 2d632799956..cf780c77bab 100644 --- a/xt/eml_check_limits.t +++ b/xt/eml_check_limits.t @@ -13,7 +13,10 @@ require_git(2.19); # for --unordered require_mods(qw(BSD::Resource)); BSD::Resource->import(qw(getrusage)); my $cls = $ENV{TEST_CLASS}; -require_mods($cls) if $cls; +if ($cls) { + diag "TEST_CLASS=$cls"; + require_mods($cls); +} $cls //= 'PublicInbox::Eml'; my $inboxdir = $ENV{GIANT_INBOX_DIR}; plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir; diff --git a/xt/git_async_cmp.t b/xt/git_async_cmp.t index 8f8d1cf4d01..f9c9ddefeb0 100644 --- a/xt/git_async_cmp.t +++ b/xt/git_async_cmp.t @@ -18,6 +18,7 @@ if (require_git(2.19, 1)) { } my @dig; my $nr = $ENV{NR} || 1; +diag "NR=$nr"; my $async = timeit($nr, sub { my $dig = Digest::SHA->new(1); my $cb = sub { diff --git a/xt/imapd-mbsync-oimap.t b/xt/imapd-mbsync-oimap.t index fdaa22aa9ef..c097a0262f0 100644 --- a/xt/imapd-mbsync-oimap.t +++ b/xt/imapd-mbsync-oimap.t @@ -116,7 +116,9 @@ while (scalar keys %pids) { is($?, 0, join(' ', @$cmd, 'done')); } -if (my $sec = $ENV{TEST_PERSIST}) { +my $sec = $ENV{TEST_PERSIST} // 0; +diag "TEST_PERSIST=$sec"; +if ($sec) { diag "sleeping ${sec}s, imap://$host:$port/$mailbox available"; diag "tmpdir=$tmpdir (Maildirs available)"; diag "stdout=$out"; diff --git a/xt/imapd-validate.t b/xt/imapd-validate.t index aeeb43b9b40..9a56c2d04c8 100644 --- a/xt/imapd-validate.t +++ b/xt/imapd-validate.t @@ -13,6 +13,7 @@ plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inbox_dir; # how many emails to read into memory at once per-process my $BATCH = $ENV{TEST_BATCH} // 100; my $REPEAT = $ENV{TEST_REPEAT} // 1; +diag "TEST_BATCH=$BATCH TEST_REPEAT=$REPEAT"; require_mods(qw(Mail::IMAPClient)); my $imap_client = 'Mail::IMAPClient'; diff --git a/xt/mem-msgview.t b/xt/mem-msgview.t index bffb1768b9e..c09afde083d 100644 --- a/xt/mem-msgview.t +++ b/xt/mem-msgview.t @@ -13,6 +13,7 @@ my @mods = qw(DBD::SQLite BSD::Resource PublicInbox::WWW); require_mods(@mods); use_ok($_) for @mods; my $lines = $ENV{NR_LINES} // 50000; +diag "NR_LINES=$lines"; my ($tmpdir, $for_destroy) = tmpdir(); my $inboxname = 'big'; my $inboxdir = "$tmpdir/big";
Some clients insist on sending "INBOX" in all caps, since it's special in RFC 3501. --- lib/PublicInbox/IMAP.pm | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 5865822f0c3..8c6fa7b62c9 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -537,7 +537,7 @@ sub uid_fetch_m { # long_response sub cmd_status ($$$;@) { my ($self, $tag, $mailbox, @items) = @_; - my $ibx = $self->{imapd}->{mailboxes}->{$mailbox} or + my $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; return "$tag BAD no items\r\n" if !scalar(@items); ($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and @@ -571,7 +571,8 @@ sub cmd_list ($$$$) { # request for hierarchy delimiter $l = [ qq[* LIST (\\Noselect) "." ""\r\n] ]; } elsif ($refname ne '' || $wildcard ne '*') { - $wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eig; + $wildcard = lc $wildcard; + $wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eg; $l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ]; } \(join('', @$l, "$tag OK List done\r\n"));
We can share code between them and account for each 50K mailbox slice. However, we must overreport these for non-zero slices and just return lots of empty data for high-numbered slices because some MUAs still insist on non-UID fetches. --- lib/PublicInbox/IMAP.pm | 53 ++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 27 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 8c6fa7b62c9..e726307a678 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -208,42 +208,43 @@ sub ensure_ranges_exist ($$$) { push @$l, map { qq[* LIST (\\HasNoChildren) "." $_\r\n] } @created; } -sub cmd_examine ($$$) { - my ($self, $tag, $mailbox) = @_; - my ($ibx, $mm, $max); - +sub inbox_lookup ($$) { + my ($self, $mailbox) = @_; + my ($ibx, $exists, $uidnext); if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) { # old mail: inbox.comp.foo.$uid_block_idx my ($mb_top, $uid_min) = ($1, $2 * UID_BLOCK + 1); - $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or - return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; - - $mm = $ibx->mm; - $max = $mm->max // 0; + $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; + $exists = $ibx->mm->max // 0; $self->{uid_min} = $uid_min; - ensure_ranges_exist($self->{imapd}, $ibx, $max); + ensure_ranges_exist($self->{imapd}, $ibx, $exists); my $uid_end = $uid_min + UID_BLOCK - 1; - $max = $uid_end if $max > $uid_end; + $exists = $uid_end if $exists > $uid_end; + $uidnext = $exists + 1; } else { # check for dummy inboxes - $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or - return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; + $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; delete $self->{uid_min}; - $max = 0; - $mm = $ibx->mm; + $exists = 0; + $uidnext = 1; } + ($ibx, $exists, $uidnext); +} - my $uidnext = $max + 1; +sub cmd_examine ($$$) { + my ($self, $tag, $mailbox) = @_; + my ($ibx, $exists, $uidnext) = inbox_lookup($self, $mailbox); + return "$tag NO Mailbox doesn't exist: $mailbox\r\n" if !$ibx; # XXX: do we need this? RFC 5162/7162 my $ret = $self->{ibx} ? "* OK [CLOSED] previous closed\r\n" : ''; $self->{ibx} = $ibx; $ret .= <<EOF; -* $max EXISTS\r -* $max RECENT\r +* $exists EXISTS\r +* $exists RECENT\r * FLAGS (\\Seen)\r * OK [PERMANENTFLAGS ()] Read-only mailbox\r -* OK [UNSEEN $max]\r +* OK [UNSEEN $exists]\r * OK [UIDNEXT $uidnext]\r * OK [UIDVALIDITY $ibx->{uidvalidity}]\r $tag OK [READ-ONLY] EXAMINE/SELECT done\r @@ -537,23 +538,21 @@ sub uid_fetch_m { # long_response sub cmd_status ($$$;@) { my ($self, $tag, $mailbox, @items) = @_; - my $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or - return "$tag NO Mailbox doesn't exist: $mailbox\r\n"; return "$tag BAD no items\r\n" if !scalar(@items); ($items[0] !~ s/\A\(//s || $items[-1] !~ s/\)\z//s) and return "$tag BAD invalid args\r\n"; - - my $mm = $ibx->mm; - my ($max, @it); + my ($ibx, $exists, $uidnext) = inbox_lookup($self, $mailbox); + return "$tag NO Mailbox doesn't exist: $mailbox\r\n" if !$ibx; + my @it; for my $it (@items) { $it = uc($it); push @it, $it; if ($it =~ /\A(?:MESSAGES|UNSEEN|RECENT)\z/) { - push(@it, ($max //= $mm->max // 0)); + push @it, $exists; } elsif ($it eq 'UIDNEXT') { - push(@it, ($max //= $mm->max // 0) + 1); + push @it, $uidnext; } elsif ($it eq 'UIDVALIDITY') { - push(@it, $ibx->{uidvalidity}); + push @it, $ibx->{uidvalidity}; } else { return "$tag BAD invalid item\r\n"; }
No point in spewing "uninitialized" warnings into logs when the cat jumps on the Enter key. --- lib/PublicInbox/IMAP.pm | 7 +++++-- t/imapd.t | 11 +++++++++++ 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index e726307a678..0452d6df937 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -862,14 +862,17 @@ sub process_line ($$) { cmd_done($self, $tag); } else { # this is weird auth_challenge_ok($self) // - "$tag BAD Error in IMAP command $req: ". - "Unknown command\r\n"; + ($tag // '*') . + ' BAD Error in IMAP command '. + ($req // '(???)'). + ": Unknown command\r\n"; } }; my $err = $@; if ($err && $self->{sock}) { $l =~ s/\r?\n//s; err($self, 'error from: %s (%s)', $l, $err); + $tag //= '*'; $res = "$tag BAD program fault - command not performed\r\n"; } return 0 unless defined $res; diff --git a/t/imapd.t b/t/imapd.t index 45ee401a1e0..11b56b09dbe 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -367,11 +367,22 @@ ok($mic->close, 'CLOSE works'); ok(!$mic->close, 'CLOSE not idempotent'); ok($mic->logout, 'logged out'); +{ + my $c = tcp_connect($sock); + $c->autoflush(1); + like(<$c>, qr/\* OK/, 'got a greeting'); + print $c "\r\n"; + like(<$c>, qr/\A\* BAD Error in IMAP command/, 'empty line'); + print $c "tagonly\r\n"; + like(<$c>, qr/\Atagonly BAD Error in IMAP command/, 'tag-only line'); +} + $td->kill; $td->join; is($?, 0, 'no error in exited process'); open my $fh, '<', $err or BAIL_OUT("open $err failed: $!"); my $eout = do { local $/; <$fh> }; unlike($eout, qr/wide/i, 'no Wide character warnings'); +unlike($eout, qr/uninitialized/i, 'no uninitialized warnings'); done_testing;
None of the new cases are wired up, yet, but existing cases still work. --- lib/PublicInbox/IMAP.pm | 142 ++++++++++++++++++++++++++++++++++++++-- t/imap.t | 15 +++++ 2 files changed, 150 insertions(+), 7 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 0452d6df937..b24dfcd70b5 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -24,6 +24,8 @@ use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT); use PublicInbox::GitAsyncCat; use Text::ParseWords qw(parse_line); use Errno qw(EAGAIN); +use Time::Local qw(timegm); +use POSIX qw(strftime); my $Address; for my $mod (qw(Email::Address::XS Mail::Address)) { @@ -67,6 +69,10 @@ for my $att (keys %FETCH_ATT) { my $valid_range = '[0-9]+|[0-9]+:[0-9]+|[0-9]+:\*'; $valid_range = qr/\A(?:$valid_range)(?:,(?:$valid_range))*\z/; +my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec); +my %MoY; +@MoY{@MoY} = (0..11); + sub greet ($) { my ($self) = @_; my $capa = capa($self); @@ -787,6 +793,17 @@ sub cmd_fetch ($$$;@) { } : $args; # error } + +sub parse_date ($) { # 02-Oct-1993 + my ($date_text) = @_; + my ($dd, $mon, $yyyy) = split(/-/, $_[0], 3); + defined($yyyy) or return; + my $mm = $MoY{$mon} // return; + $dd =~ /\A[0123]?[0-9]\z/ or return; + $yyyy =~ /\A[0-9]{4,}\z/ or return; # Y10K-compatible! + timegm(0, 0, 0, $dd, $mm, $yyyy); +} + sub uid_search_all { # long_response my ($self, $tag, $num) = @_; my $uids = $self->{ibx}->mm->ids_after($num); @@ -809,23 +826,134 @@ sub uid_search_uid_range { # long_response } } +sub date_search { + my ($q, $k, $d) = @_; + my $sql = $q->{sql}; + + # Date: header + if ($k eq 'SENTON') { + my $end = $d + 86399; # no leap day... + my $da = strftime('%Y%m%d%H%M%S', gmtime($d)); + my $db = strftime('%Y%m%d%H%M%S', gmtime($end)); + $q->{xap} .= " dt:$da..$db"; + $$sql .= " AND ds >= $d AND ds <= $end" if defined($sql); + } elsif ($k eq 'SENTBEFORE') { + $q->{xap} .= ' d:..'.strftime('%Y%m%d', gmtime($d)); + $$sql .= " AND ds <= $d" if defined($sql); + } elsif ($k eq 'SENTSINCE') { + $q->{xap} .= ' d:'.strftime('%Y%m%d', gmtime($d)).'..'; + $$sql .= " AND ds >= $d" if defined($sql); + + # INTERNALDATE (Received) + } elsif ($k eq 'ON') { + my $end = $d + 86399; # no leap day... + $q->{xap} .= " ts:$d..$end"; + $$sql .= " AND ts >= $d AND ts <= $end" if defined($sql); + } elsif ($k eq 'BEFORE') { + $q->{xap} .= " ts:..$d"; + $$sql .= " AND ts <= $d" if defined($sql); + } elsif ($k eq 'SINCE') { + $q->{xap} .= " ts:$d.."; + $$sql .= " AND ts >= $d" if defined($sql); + } else { + die "BUG: $k not recognized"; + } +} + +# IMAP to Xapian search key mapping +my %I2X = ( + SUBJECT => 's:', + BODY => 'b:', + FROM => 'f:', + TEXT => '', # n.b. does not include all headers + TO => 't:', + CC => 'c:', + # BCC => 'bcc:', # TODO + # KEYWORD # TODO ? dfpre,dfpost,... +); + +sub parse_query { + my ($self, $rest) = @_; + if (uc($rest->[0]) eq 'CHARSET') { + shift @$rest; + defined(my $c = shift @$rest) or return 'BAD missing charset'; + $c =~ /\A(?:UTF-8|US-ASCII)\z/ or return 'NO [BADCHARSET]'; + } + + my $sql = ''; # date conditions, {sql} deleted if Xapian is needed + my $q = { xap => '', sql => \$sql }; + while (@$rest) { + my $k = uc(shift @$rest); + # default criteria + next if $k =~ /\A(?:ALL|RECENT|UNSEEN|NEW)\z/; + next if $k eq 'AND'; # the default, until we support OR + if ($k =~ $valid_range) { # sequence numbers == UIDs + push @{$q->{uid}}, $k; + } elsif ($k eq 'UID') { + $k = shift(@$rest) // ''; + $k =~ $valid_range or return 'BAD UID range'; + push @{$q->{uid}}, $k; + } elsif ($k =~ /\A(?:SENT)?(?:SINCE|ON|BEFORE)\z/) { + my $d = parse_date(shift(@$rest) // ''); + defined $d or return "BAD $k date format"; + date_search($q, $k, $d); + } elsif ($k =~ /\A(?:SMALLER|LARGER)\z/) { + delete $q->{sql}; # can't use over.sqlite3 + my $bytes = shift(@$rest) // ''; + $bytes =~ /\A[0-9]+\z/ or return "BAD $k not a number"; + $q->{xap} .= ' bytes:' . ($k eq 'SMALLER' ? + '..'.(--$bytes) : + (++$bytes).'..'); + } elsif (defined(my $xk = $I2X{$k})) { + delete $q->{sql}; # can't use over.sqlite3 + my $arg = shift @$rest; + defined($arg) or return "BAD $k no arg"; + + # Xapian can't handle [*"] in probabilistic terms + $arg =~ tr/*"//d; + $q->{xap} .= qq[ $xk:"$arg"]; + } else { + # TODO: parentheses, OR, NOT ... + return "BAD $k not supported (yet?)"; + } + } + + # favor using over.sqlite3 if possible, since Xapian is optional + if (exists $q->{sql}) { + delete($q->{xap}); + delete($q->{sql}) if $sql eq ''; + } elsif (!$self->{ibx}->search) { + return 'BAD Xapian not configured for mailbox'; + } + + if (my $uid = $q->{uid}) { + ((@$uid > 1) || $uid->[0] =~ /,/) and + return 'BAD multiple ranges not supported, yet'; + ($q->{sql} // $q->{xap}) and + return 'BAD ranges and queries do not mix, yet'; + $q->{uid} = join(',', @$uid); # TODO: multiple ranges + } + $q; +} + sub cmd_uid_search ($$$;) { - my ($self, $tag, $arg, @rest) = @_; + my ($self, $tag) = splice(@_, 0, 2); my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; - $arg = uc($arg); - if ($arg eq 'ALL' && !@rest) { + my $q = parse_query($self, \@_); + return "$tag $q\r\n" if !ref($q); + + if (!scalar(keys %$q)) { $self->msg_more('* SEARCH'); my $num = 0; long_response($self, \&uid_search_all, $tag, \$num); - } elsif ($arg eq 'UID' && scalar(@rest) == 1) { - if ($rest[0] =~ /\A([0-9]+):([0-9]+|\*)\z/s) { + } elsif (my $uid = $q->{uid}) { + if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) { my ($beg, $end) = ($1, $2); $end = $ibx->mm->max if $end eq '*'; $self->msg_more('* SEARCH'); long_response($self, \&uid_search_uid_range, $tag, \$beg, $end); - } elsif ($rest[0] =~ /\A[0-9]+\z/s) { - my $uid = $rest[0]; + } elsif ($uid =~ /\A[0-9]+\z/s) { $uid = $ibx->over->get_art($uid) ? " $uid" : ''; "* SEARCH$uid\r\n$tag OK Search done\r\n"; } else { diff --git a/t/imap.t b/t/imap.t index af59ef69386..47e86ef42c7 100644 --- a/t/imap.t +++ b/t/imap.t @@ -9,6 +9,21 @@ use PublicInbox::IMAPD; use PublicInbox::TestCommon; require_mods(qw(DBD::SQLite)); require_git 2.6; +use POSIX qw(strftime); + +{ + my $parse_date = \&PublicInbox::IMAP::parse_date; + is(strftime('%Y-%m-%d', gmtime($parse_date->('02-Oct-1993'))), + '1993-10-02', 'parse_date works'); + is(strftime('%Y-%m-%d', gmtime($parse_date->('2-Oct-1993'))), + '1993-10-02', 'parse_date works w/o leading zero'); + + is($parse_date->('2-10-1993'), undef, 'bad month'); + + # from what I can tell, RFC 3501 says nothing about date-month + # case-insensitivity, so be case-sensitive for now + is($parse_date->('02-oct-1993'), undef, 'case-sensitive month'); +} my ($tmpdir, $for_destroy) = tmpdir(); my $cfgfile = "$tmpdir/config";
We won't support searching across mailboxes, just yet; but maybe in the future. --- lib/PublicInbox/IMAP.pm | 34 ++++++++++++++++------------------ 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index b24dfcd70b5..f9af530aa19 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -487,6 +487,14 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat requeue_once($self); } +sub uid_clamp ($$$) { + my ($self, $beg, $end) = @_; + my $uid_min = $self->{uid_min} or return; + my $uid_end = $uid_min + UID_BLOCK - 1; + $$beg = $uid_min if $$beg < $uid_min; + $$end = $uid_end if $$end > $uid_end; +} + sub range_step ($$) { my ($self, $range_csv) = @_; my ($beg, $end, $range); @@ -501,6 +509,8 @@ sub range_step ($$) { } elsif ($range =~ /\A([0-9]+):\*\z/) { $beg = $1 + 0; $end = $self->{ibx}->mm->max // 0; + my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK; + $end = $uid_end if $end > $uid_end; $beg = $end if $beg > $end; } elsif ($range =~ /\A[0-9]+\z/) { $beg = $end = $range + 0; @@ -508,11 +518,7 @@ sub range_step ($$) { } else { return 'BAD fetch range'; } - if (defined($range) && (my $uid_min = $self->{uid_min})) { - my $uid_end = $uid_min + UID_BLOCK - 1; - $beg = $uid_min if $beg < $uid_min; - $end = $uid_end if $end > $uid_end; - } + uid_clamp($self, \$beg, \$end) if defined($range); [ $beg, $end, $$range_csv ]; } @@ -804,17 +810,6 @@ sub parse_date ($) { # 02-Oct-1993 timegm(0, 0, 0, $dd, $mm, $yyyy); } -sub uid_search_all { # long_response - my ($self, $tag, $num) = @_; - my $uids = $self->{ibx}->mm->ids_after($num); - if (scalar(@$uids)) { - $self->msg_more(join(' ', '', @$uids)); - } else { - $self->write(\"\r\n$tag OK Search done\r\n"); - undef; - } -} - sub uid_search_uid_range { # long_response my ($self, $tag, $beg, $end) = @_; my $uids = $self->{ibx}->mm->msg_range($beg, $end, 'num'); @@ -944,12 +939,15 @@ sub cmd_uid_search ($$$;) { if (!scalar(keys %$q)) { $self->msg_more('* SEARCH'); - my $num = 0; - long_response($self, \&uid_search_all, $tag, \$num); + my $beg = $self->{uid_min} // 1; + my $end = $ibx->mm->max; + uid_clamp($self, \$beg, \$end); + long_response($self, \&uid_search_uid_range, $tag, \$beg, $end); } elsif (my $uid = $q->{uid}) { if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) { my ($beg, $end) = ($1, $2); $end = $ibx->mm->max if $end eq '*'; + uid_clamp($self, \$beg, \$end); $self->msg_more('* SEARCH'); long_response($self, \&uid_search_uid_range, $tag, \$beg, $end);
Since it seems somewhat common for IMAP clients to limit searches by sent Date: or INTERNALDATE, we can rely on the NNTP/WWW-optimized overview DB. For other queries, we'll have to depend on the Xapian DB. --- lib/PublicInbox/DummyInbox.pm | 2 +- lib/PublicInbox/IMAP.pm | 13 ++++++++----- lib/PublicInbox/Over.pm | 13 +++++++++++++ t/imapd.t | 6 ++++++ 4 files changed, 28 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/DummyInbox.pm b/lib/PublicInbox/DummyInbox.pm index e38f9e5a432..b6c48db1a0e 100644 --- a/lib/PublicInbox/DummyInbox.pm +++ b/lib/PublicInbox/DummyInbox.pm @@ -13,7 +13,7 @@ sub max { undef } # Msgmap::max sub msg_range { [] } # Msgmap::msg_range no warnings 'once'; -*query_xover = \&msg_range; +*uid_range = *query_xover = \&msg_range; *over = \&mm; *subscribe_unlock = *unsubscribe_unlock = *get_art = *description = *base_url = \&max; diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index f9af530aa19..78042b9e2ee 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -811,10 +811,11 @@ sub parse_date ($) { # 02-Oct-1993 } sub uid_search_uid_range { # long_response - my ($self, $tag, $beg, $end) = @_; - my $uids = $self->{ibx}->mm->msg_range($beg, $end, 'num'); + my ($self, $tag, $beg, $end, $sql) = @_; + my $uids = $self->{ibx}->over->uid_range($$beg, $end, $sql); if (@$uids) { - $self->msg_more(join('', map { " $_->[0]" } @$uids)); + $$beg = $uids->[-1] + 1; + $self->msg_more(join(' ', '', @$uids)); } else { $self->write(\"\r\n$tag OK Search done\r\n"); undef; @@ -936,13 +937,15 @@ sub cmd_uid_search ($$$;) { my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; my $q = parse_query($self, \@_); return "$tag $q\r\n" if !ref($q); + my $sql = delete $q->{sql}; if (!scalar(keys %$q)) { $self->msg_more('* SEARCH'); my $beg = $self->{uid_min} // 1; my $end = $ibx->mm->max; uid_clamp($self, \$beg, \$end); - long_response($self, \&uid_search_uid_range, $tag, \$beg, $end); + long_response($self, \&uid_search_uid_range, + $tag, \$beg, $end, $sql); } elsif (my $uid = $q->{uid}) { if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) { my ($beg, $end) = ($1, $2); @@ -950,7 +953,7 @@ sub cmd_uid_search ($$$;) { uid_clamp($self, \$beg, \$end); $self->msg_more('* SEARCH'); long_response($self, \&uid_search_uid_range, - $tag, \$beg, $end); + $tag, \$beg, $end, $sql); } elsif ($uid =~ /\A[0-9]+\z/s) { $uid = $ibx->over->get_art($uid) ? " $uid" : ''; "* SEARCH$uid\r\n$tag OK Search done\r\n"; diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm index 286fb7f6b4f..402cbf7ce07 100644 --- a/lib/PublicInbox/Over.pm +++ b/lib/PublicInbox/Over.pm @@ -215,4 +215,17 @@ SELECT num,ts,ds,ddd FROM over WHERE num = ? LIMIT 1 load_from_row($smsg); } +# IMAP search +sub uid_range { + my ($self, $beg, $end, $sql) = @_; + my $dbh = $self->connect; + my $q = 'SELECT num FROM over WHERE num >= ? AND num <= ?'; + + # This is read-only, anyways; but caller should verify it's + # only sending \A[0-9]+\z for ds and ts column ranges + $q .= $$sql if $sql; + $q .= ' ORDER BY num ASC LIMIT ' . DEFAULT_LIMIT; + $dbh->selectcol_arrayref($q, undef, $beg, $end); +} + 1; diff --git a/t/imapd.t b/t/imapd.t index 11b56b09dbe..2546ab51ac6 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -326,6 +326,12 @@ Content-Type: message/rfc822\r Content-Disposition: attachment; filename="embed2x\.eml"\r \r EOF + + my @hits = $mic->search('SENTON' => '18-Apr-2020'); + is_deeply(\@hits, [ $uidnext ], 'search with date condition works'); + ok($mic->examine($ng), 'EXAMINE on dummy'); + @hits = $mic->search('SENTSINCE' => '18-Apr-2020'); + is_deeply(\@hits, [], 'search on dummy with condition works'); }); # each_inbox # message sequence numbers :<
This speeds up xt/imapd-validate.t by around 10% when used with an abandoned patch to remove ->query_xover. We may also depend on this further if we abandon storing doc_data in Xapian to save disk space. --- lib/PublicInbox/Over.pm | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm index 402cbf7ce07..1faeff418f4 100644 --- a/lib/PublicInbox/Over.pm +++ b/lib/PublicInbox/Over.pm @@ -179,11 +179,12 @@ SELECT COUNT(num) FROM over WHERE num > 0 sub get_art { my ($self, $num) = @_; my $dbh = $self->connect; - my $smsg = $dbh->selectrow_hashref(<<'', undef, $num); + my $sth = $dbh->prepare_cached(<<'', undef, 1); SELECT num,ds,ts,ddd FROM over WHERE num = ? LIMIT 1 - return load_from_row($smsg) if $smsg; - undef; + $sth->execute($num); + my $smsg = $sth->fetchrow_hashref; + $smsg ? load_from_row($smsg) : undef; } sub next_by_mid {
Searching for messages smaller than a certain size is allowed by offlineimap(1), mbsync(1), and possibly other tools. Maybe public-inbox-watch will support it, too. I don't see a reason to expose searching by size via WWW search right now (but maybe in the future, I could be convinced to). Note: we only store the byte-size of the message in git, this is typically LF-only and we won't have the correct size after CRLF conversion for NNTP or IMAP. --- lib/PublicInbox/Search.pm | 12 ++++++++---- lib/PublicInbox/SearchIdx.pm | 2 ++ t/search.t | 6 ++++++ 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm index cb669e8733e..f2d3b92dc82 100644 --- a/lib/PublicInbox/Search.pm +++ b/lib/PublicInbox/Search.pm @@ -5,12 +5,16 @@ # Read-only search interface for use by the web and NNTP interfaces package PublicInbox::Search; use strict; -use warnings; # values for searching -use constant TS => 0; # Received: header in Unix time -use constant YYYYMMDD => 1; # Date: header for searching in the WWW UI -use constant DT => 2; # Date: YYYYMMDDHHMMSS +use constant { + TS => 0, # Received: header in Unix time (IMAP INTERNALDATE) + YYYYMMDD => 1, # Date: header for searching in the WWW UI + DT => 2, # Date: YYYYMMDDHHMMSS + BYTES => 3, # IMAP RFC822.SIZE + # TODO + # REPLYCNT => 4, # IMAP ANSWERED +}; use PublicInbox::Smsg; use PublicInbox::Over; diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index f4fa50ff10f..f7462aa74ca 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/PublicInbox/SearchIdx.pm @@ -341,6 +341,7 @@ sub add_xapian ($$$$) { add_val($doc, PublicInbox::Search::YYYYMMDD(), $yyyymmdd); my $dt = strftime('%Y%m%d%H%M%S', @ds); add_val($doc, PublicInbox::Search::DT(), $dt); + add_val($doc, PublicInbox::Search::BYTES(), $smsg->{bytes}); my $tg = term_generator($self); $tg->set_document($doc); @@ -388,6 +389,7 @@ sub add_message { # v1 and tests only: $smsg->populate($hdr, $self); + $smsg->{bytes} //= length($mime->as_string); eval { # order matters, overview stores every possible piece of diff --git a/t/search.t b/t/search.t index 6cf2bc2d6b4..cf3254169ca 100644 --- a/t/search.t +++ b/t/search.t @@ -318,6 +318,12 @@ $ibx->with_umask(sub { foreach my $m ($mset->items) { my $smsg = $ro->{over_ro}->get_art($m->get_docid); like($smsg->{to}, qr/\blist\@example\.com\b/, 'to appears'); + my $doc = $m->get_document; + my $col = PublicInbox::Search::BYTES(); + my $bytes = PublicInbox::Smsg::get_val($doc, $col); + like($bytes, qr/\A[0-9]+\z/, '$bytes stored as digit'); + ok($bytes > 0, '$bytes is > 0'); + is($bytes, $smsg->{bytes}, 'bytes Xapian value matches Over'); } $mset = $ro->query('tc:list@example.com', {mset => 1});
We'll need to support searching UID ranges for IMAP, so make sure it's indexed, too. --- lib/PublicInbox/Search.pm | 1 + lib/PublicInbox/SearchIdx.pm | 1 + t/search.t | 5 +++++ 3 files changed, 7 insertions(+) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm index f2d3b92dc82..c54cf7b9911 100644 --- a/lib/PublicInbox/Search.pm +++ b/lib/PublicInbox/Search.pm @@ -12,6 +12,7 @@ use constant { YYYYMMDD => 1, # Date: header for searching in the WWW UI DT => 2, # Date: YYYYMMDDHHMMSS BYTES => 3, # IMAP RFC822.SIZE + UID => 4, # IMAP UID == NNTP article number == Xapian docid # TODO # REPLYCNT => 4, # IMAP ANSWERED }; diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index f7462aa74ca..3df7970ebf9 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/PublicInbox/SearchIdx.pm @@ -342,6 +342,7 @@ sub add_xapian ($$$$) { my $dt = strftime('%Y%m%d%H%M%S', @ds); add_val($doc, PublicInbox::Search::DT(), $dt); add_val($doc, PublicInbox::Search::BYTES(), $smsg->{bytes}); + add_val($doc, PublicInbox::Search::UID(), $smsg->{num}); my $tg = term_generator($self); $tg->set_document($doc); diff --git a/t/search.t b/t/search.t index cf3254169ca..d4ca28c794f 100644 --- a/t/search.t +++ b/t/search.t @@ -324,6 +324,11 @@ $ibx->with_umask(sub { like($bytes, qr/\A[0-9]+\z/, '$bytes stored as digit'); ok($bytes > 0, '$bytes is > 0'); is($bytes, $smsg->{bytes}, 'bytes Xapian value matches Over'); + + $col = PublicInbox::Search::UID(); + my $uid = PublicInbox::Smsg::get_val($doc, $col); + is($uid, $smsg->{num}, 'UID column matches {num}'); + is($uid, $m->get_docid, 'UID column matches docid'); } $mset = $ro->query('tc:list@example.com', {mset => 1});
Dummy messages make for bad user experience with MUAs which still use sequence numbers. Not being able to fetch a message doesn't seem fatal in mutt, so just ignore (sometimes large) gaps. --- lib/PublicInbox/IMAP.pm | 61 ++++------------------------------------- t/imapd.t | 3 +- 2 files changed, 6 insertions(+), 58 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 78042b9e2ee..8307343cf15 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -9,8 +9,9 @@ # data notes: # * NNTP article numbers are UIDs and message sequence numbers (MSNs) # * Message sequence numbers (MSNs) can be stable since we're read-only. -# Most IMAP clients use UIDs (I hope), and we can return a dummy -# message if a client requests a non-existent MSN. +# Most IMAP clients use UIDs (I hope). We may return a dummy message +# in the future if a client requests a non-existent MSN, but that seems +# unecessary with mutt. package PublicInbox::IMAP; use strict; @@ -398,22 +399,6 @@ sub fetch_body ($;$) { join('', @hold); } -sub dummy_message ($$) { - my ($self, $seqno) = @_; - my $ret = <<EOF; -From: nobody\@localhost\r -To: nobody\@localhost\r -Date: Thu, 01 Jan 1970 00:00:00 +0000\r -Message-ID: <dummy-$seqno\@$self->{ibx}->{newsgroup}>\r -Subject: dummy message #$seqno\r -\r -You're seeing this message because your IMAP client didn't use UIDs.\r -The message which used to use this sequence number was likely spam\r -and removed by the administrator.\r -EOF - \$ret; -} - sub requeue_once ($) { my ($self) = @_; # COMPRESS users all share the same DEFLATE context. @@ -437,8 +422,7 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message - return requeue_once($self) unless defined $want->{-seqno}; - $bref = dummy_message($self, $smsg->{num}); + return requeue_once($self); } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } @@ -764,42 +748,6 @@ sub cmd_uid_fetch ($$$;@) { $args; # error } -sub seq_fetch_m { # long_response - my ($self, $tag, $msgs, $range_info, $want) = @_; - while (!@$msgs) { # rare - if (my $end = refill_range($self, $msgs, $range_info)) { - $self->write(\"$tag $end\r\n"); - return; - } - } - my $seq = $want->{-seqno}++; - my $cur_num = $msgs->[0]->{num}; - if ($cur_num == $seq) { # as expected - git_async_cat($self->{ibx}->git, $msgs->[0]->{blob}, - \&uid_fetch_cb, \@_); - } elsif ($cur_num > $seq) { - # send dummy messages until $seq catches up to $cur_num - my $smsg = bless { num => $seq, ts => 0 }, 'PublicInbox::Smsg'; - unshift @$msgs, $smsg; - my $bref = dummy_message($self, $seq); - uid_fetch_cb($bref, undef, undef, undef, \@_); - $smsg; # blessed response since uid_fetch_cb requeues - } else { # should not happen - die "BUG: cur_num=$cur_num < seq=$seq"; - } -} - -sub cmd_fetch ($$$;@) { - my ($self, $tag, $range_csv, @want) = @_; - my $args = fetch_common($self, $tag, $range_csv, \@want); - ref($args) eq 'ARRAY' ? do { - my $want = $args->[-1]; - $want->{-seqno} = $args->[2]->[0]; # $beg == $range_info->[0]; - long_response($self, \&seq_fetch_m, @$args) - } : $args; # error -} - - sub parse_date ($) { # 02-Oct-1993 my ($date_text) = @_; my ($dd, $mon, $yyyy) = split(/-/, $_[0], 3); @@ -1147,5 +1095,6 @@ sub close { # we're read-only, so SELECT and EXAMINE do the same thing no warnings 'once'; *cmd_select = \&cmd_examine; +*cmd_fetch = \&cmd_uid_fetch; 1; diff --git a/t/imapd.t b/t/imapd.t index 2546ab51ac6..233be9f2c0a 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -348,8 +348,7 @@ is(scalar keys %$ret, 3, 'got all 3 messages'); run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm'); } my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@"; -is(scalar keys %$r2, 3, 'still got all 3 messages'); -like($r2->{1}->{'BODY[]'}, qr/dummy message #1/, 'got dummy message 1'); +is(scalar keys %$r2, 2, 'did not get all 3 messages'); is($r2->{2}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged'); is($r2->{3}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged'); $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]')
This is just a hair faster and cacheable in the future, if we need it. Most notably, this avoids doing PublicInbox::Eml->new for simple "RFC822", "BODY[]", and "RFC822.SIZE" requests. --- lib/PublicInbox/IMAP.pm | 188 +++++++++++++++++++++++++--------------- t/imap.t | 16 ++++ 2 files changed, 134 insertions(+), 70 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 8307343cf15..3fae81112aa 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -40,22 +40,26 @@ sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977? # changing this will cause grief for clients which cache sub UID_BLOCK () { 50_000 } -my %FETCH_NEED_BLOB = ( # for future optimization - 'BODY[HEADER]' => 1, - 'BODY[TEXT]' => 1, - 'BODY[]' => 1, - 'RFC822.HEADER' => 1, - 'RFC822.SIZE' => 1, # needs CRLF conversion :< - 'RFC822.TEXT' => 1, - BODY => 1, - BODYSTRUCTURE => 1, - ENVELOPE => 1, - FLAGS => 0, - INTERNALDATE => 0, - RFC822 => 1, - UID => 0, +# these values area also used for sorting +sub NEED_BLOB () { 1 } +sub NEED_EML () { NEED_BLOB|2 } +my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ]; + +my %FETCH_NEED = ( # for future optimization + 'BODY[HEADER]' => [ NEED_EML, \&emit_rfc822_header ], + 'BODY[TEXT]' => [ NEED_EML, \&emit_rfc822_text ], + 'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ], + 'RFC822.HEADER' => [ NEED_EML, \&emit_rfc822_header ], + 'RFC822.TEXT' => [ NEED_EML, \&emit_rfc822_text ], + 'RFC822.SIZE' => [ NEED_BLOB, \&emit_rfc822_size ], + RFC822 => [ NEED_BLOB, \&emit_rfc822 ], + BODY => [ NEED_EML, \&emit_body ], + BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ], + ENVELOPE => [ NEED_EML, \&emit_envelope ], + FLAGS => [ 0, \&emit_flags ], + INTERNALDATE => [ 0, \&emit_internaldate ], ); -my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED_BLOB; +my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED; # aliases (RFC 3501 section 6.4.5) $FETCH_ATT{FAST} = [ qw(FLAGS INTERNALDATE RFC822.SIZE) ]; @@ -63,9 +67,10 @@ $FETCH_ATT{ALL} = [ @{$FETCH_ATT{FAST}}, 'ENVELOPE' ]; $FETCH_ATT{FULL} = [ @{$FETCH_ATT{ALL}}, 'BODY' ]; for my $att (keys %FETCH_ATT) { - my %h = map { $_ => 1 } @{$FETCH_ATT{$att}}; + my %h = map { $_ => $FETCH_NEED{$_} } @{$FETCH_ATT{$att}}; $FETCH_ATT{$att} = \%h; } +undef %FETCH_NEED; my $valid_range = '[0-9]+|[0-9]+:[0-9]+|[0-9]+:\*'; $valid_range = qr/\A(?:$valid_range)(?:,(?:$valid_range))*\z/; @@ -417,7 +422,7 @@ sub requeue_once ($) { sub uid_fetch_cb { # called by git->cat_async via git_async_cat my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; - my ($self, undef, $msgs, undef, $want) = @$fetch_m_arg; + my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_m_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs @@ -426,51 +431,72 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } - $$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy # fixup old bug from import (pre-a0c07cba0e5d8b6a) $$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s; - $self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}"); - - $want->{'RFC822.SIZE'} and - $self->msg_more(' RFC822.SIZE '.length($$bref)); - $want->{INTERNALDATE} and - $self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"'); - $want->{FLAGS} and $self->msg_more(' FLAGS ()'); - for ('RFC822', 'BODY[]') { - $want->{$_} or next; - $self->msg_more(" $_ {".length($$bref)."}\r\n"); - $self->msg_more($$bref); + my $eml; + for (my $i = 0; $i < @$ops;) { + my $k = $ops->[$i++]; + $ops->[$i++]->($self, $k, $smsg, $bref, $eml); } + partial_emit($self, $partial, $eml) if $partial; + $self->msg_more(")\r\n"); + requeue_once($self); +} - my $eml = PublicInbox::Eml->new($bref); +sub emit_rfc822 { + my ($self, $k, undef, $bref) = @_; + $self->msg_more(" $k {" . length($$bref)."}\r\n"); + $self->msg_more($$bref); +} - $want->{ENVELOPE} and - $self->msg_more(' ENVELOPE '.eml_envelope($eml)); +# Mail::IMAPClient::message_string cares about this by default +# (->Ignoresizeerrors attribute) +sub emit_rfc822_size { + my ($self, $k, undef, $bref) = @_; + $self->msg_more(' RFC822.SIZE ' . length($$bref)); +} - for ('RFC822.HEADER', 'BODY[HEADER]') { - $want->{$_} or next; - $self->msg_more(" $_ {".length(${$eml->{hdr}})."}\r\n"); - $self->msg_more(${$eml->{hdr}}); - } - for ('RFC822.TEXT', 'BODY[TEXT]') { - $want->{$_} or next; - $self->msg_more(" $_ {".length($$bref)."}\r\n"); - $self->msg_more($$bref); - } - $want->{BODYSTRUCTURE} and - $self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1)); - $want->{BODY} and - $self->msg_more(' BODY '.fetch_body($eml)); - if (my $partial = $want->{-partial}) { - partial_emit($self, $partial, $eml); - } - $self->msg_more(")\r\n"); - requeue_once($self); +sub emit_internaldate { + my ($self, undef, $smsg) = @_; + $self->msg_more(' INTERNALDATE "'.$smsg->internaldate.'"'); +} + +sub emit_flags { $_[0]->msg_more(' FLAGS ()') } + +sub emit_envelope { + my ($self, undef, undef, undef, $eml) = @_; + $self->msg_more(' ENVELOPE '.eml_envelope($eml)); +} + +sub emit_rfc822_header { + my ($self, $k, undef, undef, $eml) = @_; + $self->msg_more(" $k {".length(${$eml->{hdr}})."}\r\n"); + $self->msg_more(${$eml->{hdr}}); +} + +# n.b. this is sorted to be after any emit_eml_new ops +sub emit_rfc822_text { + my ($self, $k, undef, $bref) = @_; + $self->msg_more(" $k {".length($$bref)."}\r\n"); + $self->msg_more($$bref); +} + +sub emit_bodystructure { + my ($self, undef, undef, undef, $eml) = @_; + $self->msg_more(' BODYSTRUCTURE '.fetch_body($eml, 1)); +} + +sub emit_body { + my ($self, undef, undef, undef, $eml) = @_; + $self->msg_more(' BODY '.fetch_body($eml)); } +# set $eml once ($_[4] == $eml, $_[3] == $bref) +sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) } + sub uid_clamp ($$$) { my ($self, $beg, $end) = @_; my $uid_min = $self->{uid_min} or return; @@ -521,7 +547,7 @@ sub refill_range ($$$) { } sub uid_fetch_m { # long_response - my ($self, $tag, $msgs, $range_info, $want) = @_; + my ($self, $tag, $msgs, $range_info) = @_; # \@ops, \@partial while (!@$msgs) { # rare if (my $end = refill_range($self, $msgs, $range_info)) { $self->write(\"$tag $end\r\n"); @@ -710,42 +736,64 @@ sub partial_emit ($$$) { } } -sub fetch_common ($$$$) { - my ($self, $tag, $range_csv, $want) = @_; - my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; +sub fetch_compile ($) { + my ($want) = @_; if ($want->[0] =~ s/\A\(//s) { - $want->[-1] =~ s/\)\z//s or return "$tag BAD no rparen\r\n"; + $want->[-1] =~ s/\)\z//s or return 'BAD no rparen'; } - my (%partial, %want); + my (%partial, %seen, @op); + my $need = 0; while (defined(my $att = shift @$want)) { $att = uc($att); + next if $att eq 'UID'; # always returned $att =~ s/\ABODY\.PEEK\[/BODY\[/; # we're read-only my $x = $FETCH_ATT{$att}; if ($x) { - %want = (%want, %$x); + while (my ($k, $fl_cb) = each %$x) { + next if $seen{$k}++; + $need |= $fl_cb->[0]; + + # insert a special op to convert $bref to $eml + # the first time we need it + if ($need == NEED_EML && !$seen{$need}++) { + push @op, $OP_EML_NEW; + } + # $fl_cb = [ flags, \&emit_foo ] + push @op, [ @$fl_cb , $k ]; + } } elsif (!partial_prepare(\%partial, $want, $att)) { - return "$tag BAD param: $att\r\n"; + return "BAD param: $att"; } } + my @r; # stabilize partial order for consistency and ease-of-debugging: if (scalar keys %partial) { - $want{-partial} = [ map {; - [ $_, @{$partial{$_}} ] - } sort keys %partial ]; + $need = NEED_EML; + push @op, $OP_EML_NEW if !$seen{$need}++; + $r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ]; } - $range_csv = 'bad' if $range_csv !~ $valid_range; - my $range_info = range_step($self, \$range_csv); - return "$tag $range_info\r\n" if !ref($range_info); - [ $tag, [], $range_info, \%want ]; + + $r[0] = $need; + + # r[1] = [ $key1, $cb1, $key2, $cb2, ... ] + use sort 'stable'; # makes output more consistent + $r[1] = [ map { ($_->[2], $_->[1]) } sort { $a->[0] <=> $b->[0] } @op ]; + @r; } sub cmd_uid_fetch ($$$;@) { my ($self, $tag, $range_csv, @want) = @_; - my $args = fetch_common($self, $tag, $range_csv, \@want); - ref($args) eq 'ARRAY' ? - long_response($self, \&uid_fetch_m, @$args) : - $args; # error + my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; + my ($need, $ops, $partial) = fetch_compile(\@want); + return "$tag $need\r\n" unless $ops; + + $range_csv = 'bad' if $range_csv !~ $valid_range; + my $range_info = range_step($self, \$range_csv); + return "$tag $range_info\r\n" if !ref($range_info); + + long_response($self, \&uid_fetch_m, + $tag, [], $range_info, $ops, $partial); } sub parse_date ($) { # 02-Oct-1993 diff --git a/t/imap.t b/t/imap.t index 47e86ef42c7..2401237c8a0 100644 --- a/t/imap.t +++ b/t/imap.t @@ -107,4 +107,20 @@ EOF }, 'structure matches expected'); } +{ + my $fetch_compile = \&PublicInbox::IMAP::fetch_compile; + my ($cb, $ops, $partial) = $fetch_compile->(['BODY[]']); + is($partial, undef, 'no partial fetch data'); + is_deeply($ops, + [ 'BODY[]', \&PublicInbox::IMAP::emit_rfc822 ], + 'proper key and op compiled for BODY[]'); + + ($cb, $ops, $partial) = $fetch_compile->(['BODY', 'BODY[]']); + is_deeply($ops, [ + 'BODY[]', \&PublicInbox::IMAP::emit_rfc822, + undef, \&PublicInbox::IMAP::op_eml_new, + 'BODY', \&PublicInbox::IMAP::emit_body, + ], 'placed op_eml_new before emit_body'); +} + done_testing;
We can avoid loading the entire message from git when mutt makes a "UID FETCH" request for "(UID FLAGS)". This speeds mutt up by more than an order-of-magnitude in informal measurements. --- lib/PublicInbox/IMAP.pm | 34 ++++++++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 3fae81112aa..0fe31a77244 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -45,7 +45,7 @@ sub NEED_BLOB () { 1 } sub NEED_EML () { NEED_BLOB|2 } my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ]; -my %FETCH_NEED = ( # for future optimization +my %FETCH_NEED = ( 'BODY[HEADER]' => [ NEED_EML, \&emit_rfc822_header ], 'BODY[TEXT]' => [ NEED_EML, \&emit_rfc822_text ], 'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ], @@ -546,7 +546,7 @@ sub refill_range ($$$) { undef; # keep looping } -sub uid_fetch_m { # long_response +sub uid_fetch_msg { # long_response my ($self, $tag, $msgs, $range_info) = @_; # \@ops, \@partial while (!@$msgs) { # rare if (my $end = refill_range($self, $msgs, $range_info)) { @@ -558,6 +558,26 @@ sub uid_fetch_m { # long_response \&uid_fetch_cb, \@_); } +sub uid_fetch_smsg { # long_response + my ($self, $tag, $msgs, $range_info, $ops) = @_; + while (!@$msgs) { # rare + if (my $end = refill_range($self, $msgs, $range_info)) { + $self->write(\"$tag $end\r\n"); + return; + } + } + for my $smsg (@$msgs) { + $self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}"); + for (my $i = 0; $i < @$ops;) { + my $k = $ops->[$i++]; + $ops->[$i++]->($self, $k, $smsg); + } + $self->msg_more(")\r\n"); + } + @$msgs = (); + 1; # more +} + sub cmd_status ($$$;@) { my ($self, $tag, $mailbox, @items) = @_; return "$tag BAD no items\r\n" if !scalar(@items); @@ -774,7 +794,7 @@ sub fetch_compile ($) { $r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ]; } - $r[0] = $need; + $r[0] = $need ? \&uid_fetch_msg : \&uid_fetch_smsg; # r[1] = [ $key1, $cb1, $key2, $cb2, ... ] use sort 'stable'; # makes output more consistent @@ -785,15 +805,13 @@ sub fetch_compile ($) { sub cmd_uid_fetch ($$$;@) { my ($self, $tag, $range_csv, @want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; - my ($need, $ops, $partial) = fetch_compile(\@want); - return "$tag $need\r\n" unless $ops; + my ($cb, $ops, $partial) = fetch_compile(\@want); + return "$tag $cb\r\n" unless $ops; $range_csv = 'bad' if $range_csv !~ $valid_range; my $range_info = range_step($self, \$range_csv); return "$tag $range_info\r\n" if !ref($range_info); - - long_response($self, \&uid_fetch_m, - $tag, [], $range_info, $ops, $partial); + long_response($self, $cb, $tag, [], $range_info, $ops, $partial); } sub parse_date ($) { # 02-Oct-1993
We can speed up this common mutt request by another 2-3x by not loading the entire smsg from SQLite, just the UID. --- lib/PublicInbox/IMAP.pm | 42 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 0fe31a77244..0d553da91f9 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -41,8 +41,9 @@ sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977? sub UID_BLOCK () { 50_000 } # these values area also used for sorting -sub NEED_BLOB () { 1 } -sub NEED_EML () { NEED_BLOB|2 } +sub NEED_SMSG () { 1 } +sub NEED_BLOB () { NEED_SMSG|2 } +sub NEED_EML () { NEED_BLOB|4 } my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ]; my %FETCH_NEED = ( @@ -57,7 +58,7 @@ my %FETCH_NEED = ( BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ], ENVELOPE => [ NEED_EML, \&emit_envelope ], FLAGS => [ 0, \&emit_flags ], - INTERNALDATE => [ 0, \&emit_internaldate ], + INTERNALDATE => [ NEED_SMSG, \&emit_internaldate ], ); my %FETCH_ATT = map { $_ => [ $_ ] } keys %FETCH_NEED; @@ -578,6 +579,38 @@ sub uid_fetch_smsg { # long_response 1; # more } +sub uid_fetch_uid { # long_response + my ($self, $tag, $uids, $range_info, $ops) = @_; + while (!@$uids) { # rare + my ($beg, $end, $range_csv) = @$range_info; + if (scalar(@$uids = @{$self->{ibx}->over-> + uid_range($beg, $end)})) { + $range_info->[0] = $uids->[-1] + 1; + } elsif (!$range_csv) { + $self->write(\"$tag OK Fetch done\r\n"); + return; + } else { + my $next_range = range_step($self, \$range_csv); + if (!ref($next_range)) { # error + $self->write(\"$tag $next_range\r\n"); + return; + } + @$range_info = @$next_range; + } + # continue looping + } + for (@$uids) { + $self->msg_more("* $_ FETCH (UID $_"); + for (my $i = 0; $i < @$ops;) { + my $k = $ops->[$i++]; + $ops->[$i++]->($self, $k); + } + $self->msg_more(")\r\n"); + } + @$uids = (); + 1; # more +} + sub cmd_status ($$$;@) { my ($self, $tag, $mailbox, @items) = @_; return "$tag BAD no items\r\n" if !scalar(@items); @@ -794,7 +827,8 @@ sub fetch_compile ($) { $r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ]; } - $r[0] = $need ? \&uid_fetch_msg : \&uid_fetch_smsg; + $r[0] = $need & NEED_BLOB ? \&uid_fetch_msg : + ($need & NEED_SMSG ? \&uid_fetch_smsg : \&uid_fetch_uid); # r[1] = [ $key1, $cb1, $key2, $cb2, ... ] use sort 'stable'; # makes output more consistent
We should not waste memory for IDLE unless it's used on the most recent inbox slice. We also need to keep the IDLE connection alive regardless of $PublicInbox::DS::EXPTIME. --- lib/PublicInbox/IMAP.pm | 56 ++++++++++++++++++++++++++++++++++------- 1 file changed, 47 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 0d553da91f9..77e8af12fa9 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -151,7 +151,7 @@ sub cmd_close ($$) { sub cmd_logout ($$) { my ($self, $tag) = @_; - delete $self->{logged_in}; + delete @$self{qw(logged_in -idle_tag)}; $self->write(\"* BYE logging out\r\n$tag OK Logout done\r\n"); $self->shutdn; # PublicInbox::DS::shutdn undef; @@ -174,23 +174,63 @@ sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" } sub on_inbox_unlock { my ($self, $ibx) = @_; my $new = $ibx->mm->max; + my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK; defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset'; + $new = $uid_end if $new > $uid_end; if ($new > $old) { $self->{-idle_max} = $new; $self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1)); $self->write(\"* $new EXISTS\r\n"); + } elsif ($new == $uid_end) { # max exceeded $uid_end + # continue idling w/o inotify + delete $self->{-idle_max}; + my $sock = $self->{sock} or return; + $ibx->unsubscribe_unlock(fileno($sock)); } } +# called every X minute(s) or so by PublicInbox::DS::later +my $IDLERS = {}; +my $idle_timer; +sub idle_tick_all { + my $old = $IDLERS; + $IDLERS = {}; + for my $i (values %$old) { + next if ($i->{wbuf} || !exists($i->{-idle_tag})); + $i->update_idle_time or next; + $IDLERS->{fileno($i->{sock})} = $i; + $i->write(\"* OK Still here\r\n"); + } + $idle_timer = scalar keys %$IDLERS ? + PublicInbox::DS::later(\&idle_tick_all) : undef; +} + sub cmd_idle ($$) { my ($self, $tag) = @_; # IDLE seems allowed by dovecot w/o a mailbox selected *shrug* my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; - $ibx->subscribe_unlock(fileno($self->{sock}), $self); - $self->{imapd}->idler_start; $self->{-idle_tag} = $tag; - $self->{-idle_max} = $ibx->mm->max // 0; - "+ idling\r\n" + my $max = $ibx->mm->max // 0; + my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK; + my $sock = $self->{sock} or return; + my $fd = fileno($sock); + # only do inotify on most recent slice + if ($max < $uid_end) { + $ibx->subscribe_unlock($fd, $self); + $self->{imapd}->idler_start; + $self->{-idle_max} = $max; + } + $idle_timer //= PublicInbox::DS::later(\&idle_tick_all); + $IDLERS->{$fd} = $self; + \"+ idling\r\n" +} + +sub stop_idle ($$) { + my ($self, $ibx); + my $sock = $self->{sock} or return; + my $fd = fileno($sock); + delete $IDLERS->{$fd}; + $ibx->unsubscribe_unlock($fd); } sub cmd_done ($$) { @@ -201,7 +241,7 @@ sub cmd_done ($$) { warn "BUG: idle_tag set w/o inbox"; return "$tag BAD internal bug\r\n"; }; - $ibx->unsubscribe_unlock(fileno($self->{sock})); + stop_idle($self, $ibx); "$idle_tag OK Idle done\r\n"; } @@ -1185,9 +1225,7 @@ sub busy { sub close { my ($self) = @_; if (my $ibx = delete $self->{ibx}) { - if (my $sock = $self->{sock}) {; - $ibx->unsubscribe_unlock(fileno($sock)); - } + stop_idle($self, $ibx); } $self->SUPER::close; # PublicInbox::DS::close }
RFC 3501 section 5.4 requires this to be >= 30 minutes, 10x higher than what is recommended for NNTP. Fortunately our design is reasonably memory-efficient despite being Perl. --- lib/PublicInbox/IMAP.pm | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 77e8af12fa9..13f415cf8d6 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -80,6 +80,9 @@ my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec); my %MoY; @MoY{@MoY} = (0..11); +# RFC 3501 5.4. Autologout Timer needs to be >= 30min +$PublicInbox::DS::EXPTIME = 60 * 30; + sub greet ($) { my ($self) = @_; my $capa = capa($self);
This is one boolean attribute not worth wasting space for. With 20000 sockets, this reduces RSS by around 5% at a glance, and locked hashes doesn't do us much good when clients use compression, anyways. --- lib/PublicInbox/IMAP.pm | 19 ++++++++++++++----- lib/PublicInbox/IMAPdeflate.pm | 2 -- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 13f415cf8d6..803ce31ff22 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -16,7 +16,7 @@ package PublicInbox::IMAP; use strict; use base qw(PublicInbox::DS); -use fields qw(imapd logged_in ibx long_cb -login_tag +use fields qw(imapd ibx long_cb -login_tag uid_min -idle_tag -idle_max); use PublicInbox::Eml; use PublicInbox::EmlContentFoo qw(parse_content_disposition); @@ -27,6 +27,7 @@ use Text::ParseWords qw(parse_line); use Errno qw(EAGAIN); use Time::Local qw(timegm); use POSIX qw(strftime); +use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways my $Address; for my $mod (qw(Email::Address::XS Mail::Address)) { @@ -91,7 +92,8 @@ sub greet ($) { sub new ($$$) { my ($class, $sock, $imapd) = @_; - my $self = fields::new($class); + my $self = fields::new('PublicInbox::IMAP_preauth'); + unlock_hash(%$self); my $ev = EPOLLIN; my $wbuf; if ($sock->can('accept_SSL') && !$sock->accept_SSL) { @@ -110,13 +112,15 @@ sub new ($$$) { $self; } +sub logged_in { 1 } + sub capa ($) { my ($self) = @_; # dovecot advertises IDLE pre-login; perhaps because some clients # depend on it, so we'll do the same my $capa = 'CAPABILITY IMAP4rev1 IDLE'; - if ($self->{logged_in}) { + if ($self->logged_in) { $capa .= ' COMPRESS=DEFLATE'; } else { if (!($self->{sock} // $self)->can('accept_SSL') && @@ -129,7 +133,7 @@ sub capa ($) { sub login_success ($$) { my ($self, $tag) = @_; - $self->{logged_in} = 1; + bless $self, 'PublicInbox::IMAP'; my $capa = capa($self); "$tag OK [$capa] Logged in\r\n"; } @@ -154,7 +158,7 @@ sub cmd_close ($$) { sub cmd_logout ($$) { my ($self, $tag) = @_; - delete @$self{qw(logged_in -idle_tag)}; + delete $self->{-idle_tag}; $self->write(\"* BYE logging out\r\n$tag OK Logout done\r\n"); $self->shutdn; # PublicInbox::DS::shutdn undef; @@ -1238,4 +1242,9 @@ no warnings 'once'; *cmd_select = \&cmd_examine; *cmd_fetch = \&cmd_uid_fetch; +package PublicInbox::IMAP_preauth; +our @ISA = qw(PublicInbox::IMAP); + +sub logged_in { 0 } + 1; diff --git a/lib/PublicInbox/IMAPdeflate.pm b/lib/PublicInbox/IMAPdeflate.pm index 42daa6cffaa..b98a069d9b6 100644 --- a/lib/PublicInbox/IMAPdeflate.pm +++ b/lib/PublicInbox/IMAPdeflate.pm @@ -9,7 +9,6 @@ use warnings; use 5.010_001; use base qw(PublicInbox::IMAP); use Compress::Raw::Zlib; -use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways my %IN_OPT = ( -Bufsize => 1024, @@ -41,7 +40,6 @@ sub enable { $self->write(\"$tag BAD failed to activate compression\r\n"); return; } - unlock_hash(%$self); $self->write(\"$tag OK DEFLATE active\r\n"); bless $self, $class; $self->{zin} = $in;
We can cleanup some of our v1 code slightly and let git do I/O+decoding in parallel. This gives a slight 2-4% re-indexing performance boost even on an SSD. --- lib/PublicInbox/SearchIdx.pm | 76 +++++++++++++++++------------------- 1 file changed, 35 insertions(+), 41 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 3df7970ebf9..a790ac4076a 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/PublicInbox/SearchIdx.pm @@ -378,7 +378,7 @@ sub _msgmap_init ($) { sub add_message { # mime = PublicInbox::Eml or Email::MIME object - my ($self, $mime, $smsg) = @_; + my ($self, $mime, $smsg, $sync) = @_; my $hdr = $mime->header_obj; my $mids = mids_for_index($hdr); $smsg //= bless { blob => '' }, 'PublicInbox::Smsg'; # test-only compat @@ -389,7 +389,7 @@ sub add_message { }; # v1 and tests only: - $smsg->populate($hdr, $self); + $smsg->populate($hdr, $sync); $smsg->{bytes} //= length($mime->as_string); eval { @@ -549,24 +549,24 @@ sub unindex_mm { $self->{mm}->mid_delete(mid_mime($mime)); } -sub index_both { - my ($self, $mime, $smsg) = @_; - my $num = index_mm($self, $mime); +sub index_both { # git->cat_async callback + my ($bref, $oid, $type, $size, $sync) = @_; + my ($nr, $max) = @$sync{qw(nr max)}; + ++$$nr; + $$max -= $size; + my $smsg = bless { bytes => $size, blob => $oid }, 'PublicInbox::Smsg'; + my $self = $sync->{sidx}; + my $eml = PublicInbox::Eml->new($bref); + my $num = index_mm($self, $eml); $smsg->{num} = $num; - add_message($self, $mime, $smsg); + add_message($self, $eml, $smsg, $sync); } -sub unindex_both { - my ($self, $mime) = @_; - unindex_blob($self, $mime); - unindex_mm($self, $mime); -} - -sub do_cat_mail { - my ($git, $blob, $sizeref) = @_; - my $str = $git->cat_file($blob, $sizeref) or - die "BUG: $blob not found in $git->{git_dir}"; - PublicInbox::Eml->new($str); +sub unindex_both { # git->cat_async callback + my ($bref, $oid, $type, $size, $self) = @_; + my $eml = PublicInbox::Eml->new($bref); + unindex_blob($self, $eml); + unindex_mm($self, $eml); } # called by public-inbox-index @@ -576,15 +576,6 @@ sub index_sync { $self->{-inbox}->with_umask(sub { $self->_index_sync($opts) }) } -sub batch_adjust ($$$$$) { - my ($max, $bytes, $batch_cb, $latest, $nr) = @_; - $$max -= $bytes; - if ($$max <= 0) { - $$max = $BATCH_BYTES; - $batch_cb->($nr, $latest); - } -} - sub too_big ($$$) { my ($self, $git, $oid) = @_; my $max_size = $self->{index_max_size} or return; @@ -597,24 +588,28 @@ sub too_big ($$$) { # only for v1 sub read_log { - my ($self, $log, $add_cb, $del_cb, $batch_cb) = @_; + my ($self, $log, $batch_cb) = @_; my $hex = '[a-f0-9]'; my $h40 = $hex .'{40}'; my $addmsg = qr!^:000000 100644 \S+ ($h40) A\t${hex}{2}/${hex}{38}$!; my $delmsg = qr!^:100644 000000 ($h40) \S+ D\t${hex}{2}/${hex}{38}$!; my $git = $self->{git}; my $latest; - my $bytes; my $max = $BATCH_BYTES; local $/ = "\n"; my %D; my $line; my $newest; my $nr = 0; + my $sync = { sidx => $self, nr => \$nr, max => \$max }; while (defined($line = <$log>)) { if ($line =~ /$addmsg/o) { my $blob = $1; if (delete $D{$blob}) { + # make sure pending index writes are done + # before writing to ->mm + $git->cat_async_wait; + if (defined $self->{regen_down}) { my $num = $self->{regen_down}--; $self->{mm}->num_highwater($num); @@ -622,12 +617,12 @@ sub read_log { next; } next if too_big($self, $git, $blob); - my $mime = do_cat_mail($git, $blob, \$bytes); - my $smsg = bless {}, 'PublicInbox::Smsg'; - batch_adjust(\$max, $bytes, $batch_cb, $latest, ++$nr); - $smsg->{blob} = $blob; - $smsg->{bytes} = $bytes; - $add_cb->($self, $mime, $smsg); + $git->cat_async($blob, \&index_both, { %$sync }); + if ($max <= 0) { + $git->cat_async_wait; + $max = $BATCH_BYTES; + $batch_cb->($nr, $latest); + } } elsif ($line =~ /$delmsg/o) { my $blob = $1; $D{$blob} = 1 unless too_big($self, $git, $blob); @@ -635,18 +630,17 @@ sub read_log { $latest = $1; $newest ||= $latest; } elsif ($line =~ /^author .*? ([0-9]+) [\-\+][0-9]+$/) { - $self->{autime} = $1; + $sync->{autime} = $1; } elsif ($line =~ /^committer .*? ([0-9]+) [\-\+][0-9]+$/) { - $self->{cotime} = $1; + $sync->{cotime} = $1; } } close($log) or die "git log failed: \$?=$?"; # get the leftovers foreach my $blob (keys %D) { - my $mime = do_cat_mail($git, $blob, \$bytes); - $del_cb->($self, $mime); + $git->cat_async($blob, \&unindex_both, $self); } - delete @$self{qw(autime cotime)}; + $git->cat_async_wait; $batch_cb->($nr, $latest, $newest); } @@ -774,7 +768,7 @@ sub _index_sync { } while (_last_x_commit($self, $mm) ne $last_commit); my $dbh = $mm->{dbh} if $mm; - my $cb = sub { + my $batch_cb = sub { my ($nr, $commit, $newest) = @_; if ($dbh) { if ($newest) { @@ -803,7 +797,7 @@ sub _index_sync { }; $dbh->begin_work; - read_log($self, $xlog, *index_both, *unindex_both, $cb); + read_log($self, $xlog, $batch_cb); } sub DESTROY {
NNTP and IMAP both require CRLF conversions on the wire. They're also the only components which care about $smsg->{bytes}, so store the CRLF-adjusted value in over.sqlite3 and Xapian DBs.. This will allow us to optimize RFC822.SIZE fetch item in IMAP without triggering size mismatch errors in some clients' default configurations (e.g. Mail::IMAPClient), but not most others. It could also fix hypothetical problems with NNTP clients that report discrepancies between overview and article data. --- lib/PublicInbox/Import.pm | 2 +- lib/PublicInbox/SearchIdx.pm | 12 ++++++++++++ lib/PublicInbox/SearchIdxShard.pm | 11 ++++++----- lib/PublicInbox/V2Writable.pm | 10 ++++++---- t/import.t | 5 +++-- t/nntpd.t | 5 ++++- t/search.t | 8 ++++++++ 7 files changed, 40 insertions(+), 13 deletions(-) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index ab75aa00dc2..af35905be49 100644 --- a/lib/PublicInbox/Import.pm +++ b/lib/PublicInbox/Import.pm @@ -400,7 +400,7 @@ sub add { # v2: we need this for Xapian if ($smsg) { $smsg->{blob} = $self->get_mark(":$blob"); - $smsg->{bytes} = $n; + $smsg->{raw_bytes} = $n; $smsg->{-raw_email} = \$raw_email; } my $ref = $self->{ref}; diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index a790ac4076a..85821ea706a 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/PublicInbox/SearchIdx.pm @@ -549,11 +549,23 @@ sub unindex_mm { $self->{mm}->mid_delete(mid_mime($mime)); } +# returns the number of bytes to add if given a non-CRLF arg +sub crlf_adjust ($) { + if (index($_[0], "\r\n") < 0) { + # common case is LF-only, every \n needs an \r; + # so favor a cheap tr// over an expensive m//g + $_[0] =~ tr/\n/\n/; + } else { # count number of '\n' w/o '\r', expensive: + scalar(my @n = ($_[0] =~ m/(?<!\r)\n/g)); + } +} + sub index_both { # git->cat_async callback my ($bref, $oid, $type, $size, $sync) = @_; my ($nr, $max) = @$sync{qw(nr max)}; ++$$nr; $$max -= $size; + $size += crlf_adjust($$bref); my $smsg = bless { bytes => $size, blob => $oid }, 'PublicInbox::Smsg'; my $self = $sync->{sidx}; my $eml = PublicInbox::Eml->new($bref); diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm index c1f52d8b884..f7ba293ff5b 100644 --- a/lib/PublicInbox/SearchIdxShard.pm +++ b/lib/PublicInbox/SearchIdxShard.pm @@ -71,11 +71,11 @@ sub shard_worker_loop ($$$$$) { } else { chomp $line; # n.b. $mid may contain spaces(!) - my ($bytes, $num, $blob, $ds, $ts, $mid) = - split(/ /, $line, 6); + my ($to_read, $bytes, $num, $blob, $ds, $ts, $mid) = + split(/ /, $line, 7); $self->begin_txn_lazy; - my $n = read($r, my $msg, $bytes) or die "read: $!\n"; - $n == $bytes or die "short read: $n != $bytes\n"; + my $n = read($r, my $msg, $to_read) or die "read: $!\n"; + $n == $to_read or die "short read: $n != $to_read\n"; my $mime = PublicInbox::Eml->new(\$msg); my $smsg = bless { bytes => $bytes, @@ -96,7 +96,8 @@ sub index_raw { my ($self, $msgref, $mime, $smsg) = @_; if (my $w = $self->{w}) { # mid must be last, it can contain spaces (but not LF) - print $w join(' ', @$smsg{qw(bytes num blob ds ts mid)}), + print $w join(' ', @$smsg{qw(raw_bytes bytes + num blob ds ts mid)}), "\n", $$msgref or die "failed to write shard $!\n"; } else { $$msgref = undef; diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index 79bee7f9f3d..91379431633 100644 --- a/lib/PublicInbox/V2Writable.pm +++ b/lib/PublicInbox/V2Writable.pm @@ -155,10 +155,12 @@ sub add { # indexes a message, returns true if checkpointing is needed sub do_idx ($$$$) { my ($self, $msgref, $mime, $smsg) = @_; + $smsg->{bytes} = $smsg->{raw_bytes} + + PublicInbox::SearchIdx::crlf_adjust($$msgref); $self->{over}->add_overview($mime, $smsg); my $idx = idx_shard($self, $smsg->{num} % $self->{shards}); $idx->index_raw($msgref, $mime, $smsg); - my $n = $self->{transact_bytes} += $smsg->{bytes}; + my $n = $self->{transact_bytes} += $smsg->{raw_bytes}; $n >= ($PublicInbox::SearchIdx::BATCH_BYTES * $self->{shards}); } @@ -568,7 +570,7 @@ W: $list for my $smsg (@$need_reindex) { my $new_smsg = bless { blob => $blob, - bytes => $bytes, + raw_bytes => $bytes, num => $smsg->{num}, mid => $smsg->{mid}, }, 'PublicInbox::Smsg'; @@ -962,7 +964,7 @@ sub reindex_oid_m ($$$$;$) { } $sync->{nr}++; my $smsg = bless { - bytes => $len, + raw_bytes => $len, num => $num, blob => $oid, mid => $mid0, @@ -1054,7 +1056,7 @@ sub reindex_oid ($$$$) { die "failed to delete <$mid0> for article #$num\n"; $sync->{nr}++; my $smsg = bless { - bytes => $len, + raw_bytes => $len, num => $num, blob => $oid, mid => $mid0, diff --git a/t/import.t b/t/import.t index f987b1141f7..abbc8229d0e 100644 --- a/t/import.t +++ b/t/import.t @@ -32,8 +32,9 @@ like($im->add($mime, undef, $smsg), qr/\A:[0-9]+\z/, 'added one message'); if ($v2) { like($smsg->{blob}, qr/\A[a-f0-9]{40}\z/, 'got last object_id'); - is($mime->as_string, ${$smsg->{-raw_email}}, 'string matches'); - is($smsg->{bytes}, length(${$smsg->{-raw_email}}), 'length matches'); + my $raw_email = $smsg->{-raw_email}; + is($mime->as_string, $$raw_email, 'string matches'); + is($smsg->{raw_bytes}, length($$raw_email), 'length matches'); my @cmd = ('git', "--git-dir=$git->{git_dir}", qw(hash-object --stdin)); my $in = tempfile(); print $in $mime->as_string or die "write failed: $!"; diff --git a/t/nntpd.t b/t/nntpd.t index eee67ea65bb..d2f31323115 100644 --- a/t/nntpd.t +++ b/t/nntpd.t @@ -73,7 +73,10 @@ EOF my $list_id = $addr; $list_id =~ s/@/./; $mime->header_set('List-Id', "<$list_id>"); - $len = length($mime->as_string); + my $str = $mime->as_string; + $str =~ s/(?<!\r)\n/\r\n/sg; + $len = length($str); + undef $str; $im->add($mime); $im->done; if ($version == 1) { diff --git a/t/search.t b/t/search.t index d4ca28c794f..82caf9e41c3 100644 --- a/t/search.t +++ b/t/search.t @@ -59,6 +59,14 @@ sub oct_is ($$$) { } } +{ + my $crlf_adjust = \&PublicInbox::SearchIdx::crlf_adjust; + is($crlf_adjust->("hi\r\nworld\r\n"), 0, 'no adjustment needed'); + is($crlf_adjust->("hi\nworld\n"), 2, 'LF-only counts two CR'); + is($crlf_adjust->("hi\r\nworld\n"), 1, 'CRLF/LF-mix 1 counts 1 CR'); + is($crlf_adjust->("hi\nworld\r\n"), 1, 'CRLF/LF-mix 2 counts 1 CR'); +} + $ibx->with_umask(sub { my $root = PublicInbox::Eml->new(<<'EOF'); Date: Fri, 02 Oct 1993 00:00:00 +0000
Since we started indexing the CRLF-adjusted size of messages, we can take an order-of-magnitude speedup for certain MUAs which fetch this attribute without needing much else. Admins are encouraged to --reindex existing inboxes for IMAP support, anyways. It won't be fatal if it's not reindexed, but some client bugs and warnings can be fixed and they'll be able to support more of IMAP. --- lib/PublicInbox/IMAP.pm | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 803ce31ff22..9cdcba693c7 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -53,7 +53,7 @@ my %FETCH_NEED = ( 'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ], 'RFC822.HEADER' => [ NEED_EML, \&emit_rfc822_header ], 'RFC822.TEXT' => [ NEED_EML, \&emit_rfc822_text ], - 'RFC822.SIZE' => [ NEED_BLOB, \&emit_rfc822_size ], + 'RFC822.SIZE' => [ NEED_SMSG, \&emit_rfc822_size ], RFC822 => [ NEED_BLOB, \&emit_rfc822 ], BODY => [ NEED_EML, \&emit_body ], BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ], @@ -500,11 +500,12 @@ sub emit_rfc822 { $self->msg_more($$bref); } -# Mail::IMAPClient::message_string cares about this by default -# (->Ignoresizeerrors attribute) +# Mail::IMAPClient::message_string cares about this by default, +# (->Ignoresizeerrors attribute). Admins are encouraged to +# --reindex for IMAP support, anyways. sub emit_rfc822_size { - my ($self, $k, undef, $bref) = @_; - $self->msg_more(' RFC822.SIZE ' . length($$bref)); + my ($self, $k, $smsg) = @_; + $self->msg_more(' RFC822.SIZE ' . $smsg->{bytes}); } sub emit_internaldate {
It seems required based on my reading of RFC 3501 for the non-UID "FETCH" command. --- lib/PublicInbox/IMAP.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 9cdcba693c7..d3f4874e496 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -884,7 +884,7 @@ sub fetch_compile ($) { @r; } -sub cmd_uid_fetch ($$$;@) { +sub cmd_uid_fetch ($$$$;@) { my ($self, $tag, $range_csv, @want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; my ($cb, $ops, $partial) = fetch_compile(\@want);
While selecting a mailbox is done case-insensitively, "INBOX" is special for the LIST command, according to RFC 3501 6.3.8: > The special name INBOX is included in the output from LIST, if > INBOX is supported by this server for this user and if the > uppercase string "INBOX" matches the interpreted reference and > mailbox name arguments with wildcards as described above. The > criteria for omitting INBOX is whether SELECT INBOX will > return failure; it is not relevant whether the user's real > INBOX resides on this or some other server. Thus, the existing news.public-inbox.org convention of naming newsgroups starting with "inbox." needs to be special-cased to not confuse clients. While we're at it, do not create ".0" for dummy newsgroups if they're selected, either. --- lib/PublicInbox/IMAP.pm | 8 ++++---- lib/PublicInbox/IMAPD.pm | 4 +++- t/imapd.t | 4 ++-- 3 files changed, 9 insertions(+), 7 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index d3f4874e496..b3c449b0712 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -254,13 +254,14 @@ sub cmd_done ($$) { sub ensure_ranges_exist ($$$) { my ($imapd, $ibx, $max) = @_; + defined(my $mb_top = $ibx->{newsgroup}) or return; my $mailboxes = $imapd->{mailboxes}; - my $mb_top = $ibx->{newsgroup}; my @created; for (my $i = int($max/UID_BLOCK); $i >= 0; --$i) { my $sub_mailbox = "$mb_top.$i"; last if exists $mailboxes->{$sub_mailbox}; $mailboxes->{$sub_mailbox} = $ibx; + $sub_mailbox =~ s/\Ainbox\./INBOX./i; # more familiar to users push @created, $sub_mailbox; } return unless @created; @@ -693,9 +694,8 @@ sub cmd_list ($$$$) { # request for hierarchy delimiter $l = [ qq[* LIST (\\Noselect) "." ""\r\n] ]; } elsif ($refname ne '' || $wildcard ne '*') { - $wildcard = lc $wildcard; - $wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!eg; - $l = [ grep(/ \Q$refname\E$wildcard\r\n\z/s, @$l) ]; + $wildcard =~ s!([^a-z0-9_])!$patmap{$1} // "\Q$1"!egi; + $l = [ grep(/ \Q$refname\E$wildcard\r\n\z/is, @$l) ]; } \(join('', @$l, "$tag OK List done\r\n")); } diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index 261d756042f..186ec7b0062 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -71,7 +71,9 @@ sub imapd_refresh_finalize { $imapd->{inboxlist} = [ map { my $no = $mailboxes->{$_} == $dummy ? '' : 'No'; - qq[* LIST (\\Has${no}Children) "." $_\r\n] + my $u = $_; # capitalize "INBOX" for user-familiarity + $u =~ s/\Ainbox(\.|\z)/INBOX$1/i; + qq[* LIST (\\Has${no}Children) "." $u\r\n] } sort { # shortest names first, alphabetically if lengths match length($a) == length($b) ? diff --git a/t/imapd.t b/t/imapd.t index 233be9f2c0a..c691e1a96a7 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -101,13 +101,13 @@ like($raw[0], qr/\A\*\x20STATUS\x20inbox\.i1\.$first_range\x20 like($raw[1], qr/\A\S+ OK /, 'finished status response'); my @orig_list = @raw = $mic->list; -like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, +like($raw[0], qr/^\* LIST \(.*?\) "\." INBOX/, 'got an inbox'); like($raw[-1], qr/^\S+ OK /, 'response ended with OK'); is(scalar(@raw), scalar(@V) + 4, 'default LIST response'); @raw = $mic->list('', 'inbox.i1'); is(scalar(@raw), 2, 'limited LIST response'); -like($raw[0], qr/^\* LIST \(.*?\) "\." inbox/, +like($raw[0], qr/^\* LIST \(.*?\) "\." INBOX/, 'got an inbox.i1'); like($raw[-1], qr/^\S+ OK /, 'response ended with OK');
RFC 2683 section 3.2.1.5 recommends it: > For its part, a server should allow for a command line of at least > 8000 octets. This provides plenty of leeway for accepting reasonable > length commands from clients. The server should send a BAD response > to a command that does not end within the server's maximum accepted > command length. To conserve memory, we won't bother reading the entire line before sending the BAD response and disconnecting them. --- Documentation/standards.perl | 1 + lib/PublicInbox/IMAP.pm | 7 +++++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/Documentation/standards.perl b/Documentation/standards.perl index 8fc852c722b..b4bda8a9c06 100755 --- a/Documentation/standards.perl +++ b/Documentation/standards.perl @@ -44,6 +44,7 @@ my $rfcs = [ 5322 => 'Internet message format (2008)', 3501 => 'IMAP4rev1', 2177 => 'IMAP IDLE', + 2683 => 'IMAP4 Implementation Recommendations', # 5032 = 'WITHIN search extension for IMAP', 4978 => 'IMAP COMPRESS Extension', # 5182 = 'IMAP Extension for Referencing the Last SEARCH Result', diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index b3c449b0712..2e50415de57 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -36,7 +36,7 @@ for my $mod (qw(Email::Address::XS Mail::Address)) { } die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address; -sub LINE_MAX () { 512 } # does RFC 3501 have a limit like RFC 977? +sub LINE_MAX () { 8000 } # RFC 2683 3.2.1.5 # changing this will cause grief for clients which cache sub UID_BLOCK () { 50_000 } @@ -1170,7 +1170,10 @@ sub event_step { my $rbuf = $self->{rbuf} // \(my $x = ''); my $line = index($$rbuf, "\n"); while ($line < 0) { - return $self->close if length($$rbuf) >= LINE_MAX; + if (length($$rbuf) >= LINE_MAX) { + $self->write(\"\* BAD request too long\r\n"); + return $self->close; + } $self->do_read($rbuf, LINE_MAX, length($$rbuf)) or return; $line = index($$rbuf, "\n"); }
The performance problem with mutt not using header caches isn't fixed, yet, but mutt header caching seems to depend on MSNs (message sequence numbers). We'll switch to storing the 0-based {uid_base} instead of the 1-based {uid_min} since it simplifies most of our code. --- lib/PublicInbox/IMAP.pm | 121 ++++++++++++++++++++++++---------------- 1 file changed, 73 insertions(+), 48 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 2e50415de57..12123072d1a 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -7,17 +7,13 @@ # slow storage. # # data notes: -# * NNTP article numbers are UIDs and message sequence numbers (MSNs) -# * Message sequence numbers (MSNs) can be stable since we're read-only. -# Most IMAP clients use UIDs (I hope). We may return a dummy message -# in the future if a client requests a non-existent MSN, but that seems -# unecessary with mutt. +# * NNTP article numbers are UIDs package PublicInbox::IMAP; use strict; use base qw(PublicInbox::DS); use fields qw(imapd ibx long_cb -login_tag - uid_min -idle_tag -idle_max); + uid_base -idle_tag -idle_max); use PublicInbox::Eml; use PublicInbox::EmlContentFoo qw(parse_content_disposition); use PublicInbox::DS qw(now); @@ -151,7 +147,7 @@ sub cmd_login ($$$$) { sub cmd_close ($$) { my ($self, $tag) = @_; - delete $self->{uid_min}; + delete $self->{uid_base}; delete $self->{ibx} ? "$tag OK Close done\r\n" : "$tag BAD No mailbox\r\n"; } @@ -181,11 +177,14 @@ sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" } sub on_inbox_unlock { my ($self, $ibx) = @_; my $new = $ibx->mm->max; - my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK; + my $uid_base = $self->{uid_base} // 0; + my $uid_end = $uid_base + UID_BLOCK; defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset'; $new = $uid_end if $new > $uid_end; if ($new > $old) { $self->{-idle_max} = $new; + $new -= $uid_base; + $old -= $uid_base; $self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1)); $self->write(\"* $new EXISTS\r\n"); } elsif ($new == $uid_end) { # max exceeded $uid_end @@ -218,7 +217,7 @@ sub cmd_idle ($$) { my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; $self->{-idle_tag} = $tag; my $max = $ibx->mm->max // 0; - my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK; + my $uid_end = ($self->{uid_base} // 0) + UID_BLOCK; my $sock = $self->{sock} or return; my $fd = fileno($sock); # only do inotify on most recent slice @@ -274,18 +273,19 @@ sub inbox_lookup ($$) { my ($ibx, $exists, $uidnext); if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) { # old mail: inbox.comp.foo.$uid_block_idx - my ($mb_top, $uid_min) = ($1, $2 * UID_BLOCK + 1); + my ($mb_top, $uid_base) = ($1, $2 * UID_BLOCK); $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; $exists = $ibx->mm->max // 0; - $self->{uid_min} = $uid_min; + $self->{uid_base} = $uid_base; ensure_ranges_exist($self->{imapd}, $ibx, $exists); - my $uid_end = $uid_min + UID_BLOCK - 1; + my $uid_end = $uid_base + UID_BLOCK; $exists = $uid_end if $exists > $uid_end; $uidnext = $exists + 1; + $exists -= $uid_base; } else { # check for dummy inboxes $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; - delete $self->{uid_min}; + delete $self->{uid_base}; $exists = 0; $uidnext = 1; } @@ -469,9 +469,24 @@ sub requeue_once ($) { $self->requeue if $new_size == 1; } -sub uid_fetch_cb { # called by git->cat_async via git_async_cat - my ($bref, $oid, $type, $size, $fetch_m_arg) = @_; - my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_m_arg; +# my ($uid_base, $UID) = @_; +sub fetch_msn_uid ($$) { '* '.($_[1] - $_[0]).' FETCH (UID '.$_[1] } + +sub fetch_run_ops { + my ($self, $uid_base, $smsg, $bref, $ops, $partial) = @_; + $self->msg_more(fetch_msn_uid($uid_base, $smsg->{num})); + my ($eml, $k); + for (my $i = 0; $i < @$ops;) { + $k = $ops->[$i++]; + $ops->[$i++]->($self, $k, $smsg, $bref, $eml); + } + partial_emit($self, $partial, $eml) if $partial; + $self->msg_more(")\r\n"); +} + +sub fetch_blob_cb { # called by git->cat_async via git_async_cat + my ($bref, $oid, $type, $size, $fetch_arg) = @_; + my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs @@ -484,14 +499,8 @@ sub uid_fetch_cb { # called by git->cat_async via git_async_cat # fixup old bug from import (pre-a0c07cba0e5d8b6a) $$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s; - $self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}"); - my $eml; - for (my $i = 0; $i < @$ops;) { - my $k = $ops->[$i++]; - $ops->[$i++]->($self, $k, $smsg, $bref, $eml); - } - partial_emit($self, $partial, $eml) if $partial; - $self->msg_more(")\r\n"); + fetch_run_ops($self, $self->{uid_base} // 0, + $smsg, $bref, $ops, $partial); requeue_once($self); } @@ -549,7 +558,7 @@ sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) } sub uid_clamp ($$$) { my ($self, $beg, $end) = @_; - my $uid_min = $self->{uid_min} or return; + my $uid_min = ($self->{uid_base} // 0) + 1; my $uid_end = $uid_min + UID_BLOCK - 1; $$beg = $uid_min if $$beg < $uid_min; $$end = $uid_end if $$end > $uid_end; @@ -569,7 +578,7 @@ sub range_step ($$) { } elsif ($range =~ /\A([0-9]+):\*\z/) { $beg = $1 + 0; $end = $self->{ibx}->mm->max // 0; - my $uid_end = ($self->{uid_min} // 1) - 1 + UID_BLOCK; + my $uid_end = $self->{uid_base} + UID_BLOCK; $end = $uid_end if $end > $uid_end; $beg = $end if $beg > $end; } elsif ($range =~ /\A[0-9]+\z/) { @@ -596,8 +605,8 @@ sub refill_range ($$$) { undef; # keep looping } -sub uid_fetch_msg { # long_response - my ($self, $tag, $msgs, $range_info) = @_; # \@ops, \@partial +sub fetch_blob { # long_response + my ($self, $tag, $msgs, $range_info, $ops, $partial) = @_; while (!@$msgs) { # rare if (my $end = refill_range($self, $msgs, $range_info)) { $self->write(\"$tag $end\r\n"); @@ -605,10 +614,10 @@ sub uid_fetch_msg { # long_response } } git_async_cat($self->{ibx}->git, $msgs->[0]->{blob}, - \&uid_fetch_cb, \@_); + \&fetch_blob_cb, \@_); } -sub uid_fetch_smsg { # long_response +sub fetch_smsg { # long_response my ($self, $tag, $msgs, $range_info, $ops) = @_; while (!@$msgs) { # rare if (my $end = refill_range($self, $msgs, $range_info)) { @@ -616,20 +625,15 @@ sub uid_fetch_smsg { # long_response return; } } - for my $smsg (@$msgs) { - $self->msg_more("* $smsg->{num} FETCH (UID $smsg->{num}"); - for (my $i = 0; $i < @$ops;) { - my $k = $ops->[$i++]; - $ops->[$i++]->($self, $k, $smsg); - } - $self->msg_more(")\r\n"); - } + my $uid_base = $self->{uid_base} // 0; + fetch_run_ops($self, $uid_base, $_, undef, $ops) for @$msgs; @$msgs = (); 1; # more } -sub uid_fetch_uid { # long_response +sub fetch_uid { # long_response my ($self, $tag, $uids, $range_info, $ops) = @_; + while (!@$uids) { # rare my ($beg, $end, $range_csv) = @$range_info; if (scalar(@$uids = @{$self->{ibx}->over-> @@ -648,10 +652,12 @@ sub uid_fetch_uid { # long_response } # continue looping } + my $uid_base = $self->{uid_base} // 0; + my ($i, $k); for (@$uids) { - $self->msg_more("* $_ FETCH (UID $_"); - for (my $i = 0; $i < @$ops;) { - my $k = $ops->[$i++]; + $self->msg_more(fetch_msn_uid($uid_base, $_)); + for ($i = 0; $i < @$ops;) { + $k = $ops->[$i++]; $ops->[$i++]->($self, $k); } $self->msg_more(")\r\n"); @@ -875,8 +881,8 @@ sub fetch_compile ($) { $r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ]; } - $r[0] = $need & NEED_BLOB ? \&uid_fetch_msg : - ($need & NEED_SMSG ? \&uid_fetch_smsg : \&uid_fetch_uid); + $r[0] = $need & NEED_BLOB ? \&fetch_blob : + ($need & NEED_SMSG ? \&fetch_smsg : \&fetch_uid); # r[1] = [ $key1, $cb1, $key2, $cb2, ... ] use sort 'stable'; # makes output more consistent @@ -896,6 +902,25 @@ sub cmd_uid_fetch ($$$$;@) { long_response($self, $cb, $tag, [], $range_info, $ops, $partial); } +sub msn_to_uid_range ($$) { + my $uid_base = $_[0]->{uid_base} // 0; + $_[1] =~ s/([0-9]+)/$uid_base + $1/sge; +} + +sub cmd_fetch ($$$$;@) { + my ($self, $tag, $range_csv, @want) = @_; + my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; + my ($cb, $ops, $partial) = fetch_compile(\@want); + return "$tag $cb\r\n" unless $ops; + + # cb is one of fetch_blob, fetch_smsg, fetch_uid + $range_csv = 'bad' if $range_csv !~ $valid_range; + msn_to_uid_range($self, $range_csv); + my $range_info = range_step($self, \$range_csv); + return "$tag $range_info\r\n" if !ref($range_info); + long_response($self, $cb, $tag, [], $range_info, $ops, $partial); +} + sub parse_date ($) { # 02-Oct-1993 my ($date_text) = @_; my ($dd, $mon, $yyyy) = split(/-/, $_[0], 3); @@ -979,7 +1004,8 @@ sub parse_query { # default criteria next if $k =~ /\A(?:ALL|RECENT|UNSEEN|NEW)\z/; next if $k eq 'AND'; # the default, until we support OR - if ($k =~ $valid_range) { # sequence numbers == UIDs + if ($k =~ $valid_range) { # convert sequence numbers to UIDs + msn_to_uid_range($self, $k); push @{$q->{uid}}, $k; } elsif ($k eq 'UID') { $k = shift(@$rest) // ''; @@ -1037,8 +1063,8 @@ sub cmd_uid_search ($$$;) { if (!scalar(keys %$q)) { $self->msg_more('* SEARCH'); - my $beg = $self->{uid_min} // 1; - my $end = $ibx->mm->max; + my $beg = 1; + my $end = $ibx->mm->max // 0; uid_clamp($self, \$beg, \$end); long_response($self, \&uid_search_uid_range, $tag, \$beg, $end, $sql); @@ -1244,7 +1270,6 @@ sub close { # we're read-only, so SELECT and EXAMINE do the same thing no warnings 'once'; *cmd_select = \&cmd_examine; -*cmd_fetch = \&cmd_uid_fetch; package PublicInbox::IMAP_preauth; our @ISA = qw(PublicInbox::IMAP);
Ensure {uid_base} is always set, so we don't need to add `//' checks everywhere. Furthermore, this fixes a hard-to-test bug where the STATUS command would inadvertantly clobber {uid_base}. --- lib/PublicInbox/IMAP.pm | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 12123072d1a..e9fa338b5fe 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -217,7 +217,7 @@ sub cmd_idle ($$) { my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; $self->{-idle_tag} = $tag; my $max = $ibx->mm->max // 0; - my $uid_end = ($self->{uid_base} // 0) + UID_BLOCK; + my $uid_end = $self->{uid_base} + UID_BLOCK; my $sock = $self->{sock} or return; my $fd = fileno($sock); # only do inotify on most recent slice @@ -270,14 +270,13 @@ sub ensure_ranges_exist ($$$) { sub inbox_lookup ($$) { my ($self, $mailbox) = @_; - my ($ibx, $exists, $uidnext); + my ($ibx, $exists, $uidnext, $uid_base); if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) { # old mail: inbox.comp.foo.$uid_block_idx - my ($mb_top, $uid_base) = ($1, $2 * UID_BLOCK); - + my $mb_top = $1; + $uid_base = $2 * UID_BLOCK; $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; $exists = $ibx->mm->max // 0; - $self->{uid_base} = $uid_base; ensure_ranges_exist($self->{imapd}, $ibx, $exists); my $uid_end = $uid_base + UID_BLOCK; $exists = $uid_end if $exists > $uid_end; @@ -285,17 +284,17 @@ sub inbox_lookup ($$) { $exists -= $uid_base; } else { # check for dummy inboxes $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; - delete $self->{uid_base}; - $exists = 0; + $uid_base = $exists = 0; $uidnext = 1; } - ($ibx, $exists, $uidnext); + ($ibx, $exists, $uidnext, $uid_base); } sub cmd_examine ($$$) { my ($self, $tag, $mailbox) = @_; - my ($ibx, $exists, $uidnext) = inbox_lookup($self, $mailbox); + my ($ibx, $exists, $uidnext, $base) = inbox_lookup($self, $mailbox); return "$tag NO Mailbox doesn't exist: $mailbox\r\n" if !$ibx; + $self->{uid_base} = $base; # XXX: do we need this? RFC 5162/7162 my $ret = $self->{ibx} ? "* OK [CLOSED] previous closed\r\n" : ''; @@ -499,8 +498,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat # fixup old bug from import (pre-a0c07cba0e5d8b6a) $$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s; - fetch_run_ops($self, $self->{uid_base} // 0, - $smsg, $bref, $ops, $partial); + fetch_run_ops($self, $self->{uid_base}, $smsg, $bref, $ops, $partial); requeue_once($self); } @@ -558,7 +556,7 @@ sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) } sub uid_clamp ($$$) { my ($self, $beg, $end) = @_; - my $uid_min = ($self->{uid_base} // 0) + 1; + my $uid_min = $self->{uid_base} + 1; my $uid_end = $uid_min + UID_BLOCK - 1; $$beg = $uid_min if $$beg < $uid_min; $$end = $uid_end if $$end > $uid_end; @@ -625,7 +623,7 @@ sub fetch_smsg { # long_response return; } } - my $uid_base = $self->{uid_base} // 0; + my $uid_base = $self->{uid_base}; fetch_run_ops($self, $uid_base, $_, undef, $ops) for @$msgs; @$msgs = (); 1; # more @@ -652,7 +650,7 @@ sub fetch_uid { # long_response } # continue looping } - my $uid_base = $self->{uid_base} // 0; + my $uid_base = $self->{uid_base}; my ($i, $k); for (@$uids) { $self->msg_more(fetch_msn_uid($uid_base, $_)); @@ -903,7 +901,7 @@ sub cmd_uid_fetch ($$$$;@) { } sub msn_to_uid_range ($$) { - my $uid_base = $_[0]->{uid_base} // 0; + my $uid_base = $_[0]->{uid_base}; $_[1] =~ s/([0-9]+)/$uid_base + $1/sge; }
This speeds up requests from mutt for HEADER.FIELDS by around 10% since we don't waste time doing CRLF conversion on large message bodies that get discarded, anyways. --- lib/PublicInbox/IMAP.pm | 80 ++++++++++++++++++++++++++--------------- t/imap.t | 29 +++++++++------ 2 files changed, 70 insertions(+), 39 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index e9fa338b5fe..3cc68e66872 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -40,20 +40,27 @@ sub UID_BLOCK () { 50_000 } # these values area also used for sorting sub NEED_SMSG () { 1 } sub NEED_BLOB () { NEED_SMSG|2 } -sub NEED_EML () { NEED_BLOB|4 } -my $OP_EML_NEW = [ NEED_EML - 1, \&op_eml_new ]; +sub CRLF_BREF () { 4 } +sub EML_HDR () { 8 } +sub CRLF_HDR () { 16 } +sub EML_BDY () { 32 } +sub CRLF_BDY () { 64 } +my $OP_EML_NEW = [ EML_HDR - 1, \&op_eml_new ]; +my $OP_CRLF_BREF = [ CRLF_BREF, \&op_crlf_bref ]; +my $OP_CRLF_HDR = [ CRLF_HDR, \&op_crlf_hdr ]; +my $OP_CRLF_BDY = [ CRLF_BDY, \&op_crlf_bdy ]; my %FETCH_NEED = ( - 'BODY[HEADER]' => [ NEED_EML, \&emit_rfc822_header ], - 'BODY[TEXT]' => [ NEED_EML, \&emit_rfc822_text ], - 'BODY[]' => [ NEED_BLOB, \&emit_rfc822 ], - 'RFC822.HEADER' => [ NEED_EML, \&emit_rfc822_header ], - 'RFC822.TEXT' => [ NEED_EML, \&emit_rfc822_text ], + 'BODY[HEADER]' => [ NEED_BLOB|EML_HDR|CRLF_HDR, \&emit_rfc822_header ], + 'BODY[TEXT]' => [ NEED_BLOB|EML_BDY|CRLF_BDY, \&emit_rfc822_text ], + 'BODY[]' => [ NEED_BLOB|CRLF_BREF, \&emit_rfc822 ], + 'RFC822.HEADER' => [ NEED_BLOB|EML_HDR|CRLF_HDR, \&emit_rfc822_header ], + 'RFC822.TEXT' => [ NEED_BLOB|EML_BDY|CRLF_BDY, \&emit_rfc822_text ], 'RFC822.SIZE' => [ NEED_SMSG, \&emit_rfc822_size ], - RFC822 => [ NEED_BLOB, \&emit_rfc822 ], - BODY => [ NEED_EML, \&emit_body ], - BODYSTRUCTURE => [ NEED_EML, \&emit_bodystructure ], - ENVELOPE => [ NEED_EML, \&emit_envelope ], + RFC822 => [ NEED_BLOB|CRLF_BREF, \&emit_rfc822 ], + BODY => [ NEED_BLOB|EML_HDR|EML_BDY, \&emit_body ], + BODYSTRUCTURE => [ NEED_BLOB|EML_HDR|EML_BDY, \&emit_bodystructure ], + ENVELOPE => [ NEED_BLOB|EML_HDR, \&emit_envelope ], FLAGS => [ 0, \&emit_flags ], INTERNALDATE => [ NEED_SMSG, \&emit_internaldate ], ); @@ -494,10 +501,6 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } - $$bref =~ s/(?<!\r)\n/\r\n/sg; # make strict clients happy - - # fixup old bug from import (pre-a0c07cba0e5d8b6a) - $$bref =~ s/\A[\r\n]*From [^\r\n]*\r\n//s; fetch_run_ops($self, $self->{uid_base}, $smsg, $bref, $ops, $partial); requeue_once($self); } @@ -554,6 +557,18 @@ sub emit_body { # set $eml once ($_[4] == $eml, $_[3] == $bref) sub op_eml_new { $_[4] = PublicInbox::Eml->new($_[3]) } +# s/From / fixes old bug from import (pre-a0c07cba0e5d8b6a) +sub to_crlf_full { + ${$_[0]} =~ s/(?<!\r)\n/\r\n/sg; + ${$_[0]} =~ s/\A[\r\n]*From [^\r\n]*\r\n//s; +} + +sub op_crlf_bref { to_crlf_full($_[3]) } + +sub op_crlf_hdr { to_crlf_full($_[4]->{hdr}) } + +sub op_crlf_bdy { ${$_[4]->{bdy}} =~ s/(?<!\r)\n/\r\n/sg if $_[4]->{bdy} } + sub uid_clamp ($$$) { my ($self, $beg, $end) = @_; my $uid_min = $self->{uid_base} + 1; @@ -790,8 +805,8 @@ sub partial_hdr_get { join('', ($str =~ m/($hdrs_re)/g), "\r\n"); } -sub partial_prepare ($$$) { - my ($partial, $want, $att) = @_; +sub partial_prepare ($$$$) { + my ($need, $partial, $want, $att) = @_; # recombine [ "BODY[1.HEADER.FIELDS", "(foo", "bar)]" ] # back to: "BODY[1.HEADER.FIELDS (foo bar)]" @@ -804,6 +819,7 @@ sub partial_prepare ($$$) { (?:\.(HEADER|MIME|TEXT))? # 2 - section_name \](?:<([0-9]+)(?:\.([0-9]+))?>)?\z/sx) { # 3, 4 $partial->{$att} = [ \&partial_body, $1, $2, $3, $4 ]; + $$need |= CRLF_BREF|EML_HDR|EML_BDY; } elsif ($att =~ /\ABODY\[(?:([0-9]+(?:\.[0-9]+)*)\.)? # 1 - section_idx (?:HEADER\.FIELDS(\.NOT)?)\x20 # 2 \(([A-Z0-9\-\x20]+)\) # 3 - hdrs @@ -812,6 +828,7 @@ sub partial_prepare ($$$) { : \&partial_hdr_get, $1, undef, $4, $5 ]; $tmp->[2] = hdrs_regexp($3); + $$need |= CRLF_HDR|EML_HDR; } else { undef; } @@ -857,16 +874,9 @@ sub fetch_compile ($) { while (my ($k, $fl_cb) = each %$x) { next if $seen{$k}++; $need |= $fl_cb->[0]; - - # insert a special op to convert $bref to $eml - # the first time we need it - if ($need == NEED_EML && !$seen{$need}++) { - push @op, $OP_EML_NEW; - } - # $fl_cb = [ flags, \&emit_foo ] - push @op, [ @$fl_cb , $k ]; + push @op, [ @$fl_cb, $k ]; } - } elsif (!partial_prepare(\%partial, $want, $att)) { + } elsif (!partial_prepare(\$need, \%partial, $want, $att)) { return "BAD param: $att"; } } @@ -874,11 +884,25 @@ sub fetch_compile ($) { # stabilize partial order for consistency and ease-of-debugging: if (scalar keys %partial) { - $need = NEED_EML; - push @op, $OP_EML_NEW if !$seen{$need}++; + $need |= NEED_BLOB; $r[2] = [ map { [ $_, @{$partial{$_}} ] } sort keys %partial ]; } + push @op, $OP_EML_NEW if ($need & (EML_HDR|EML_BDY)); + + # do we need CRLF conversion? + if ($need & CRLF_BREF) { + push @op, $OP_CRLF_BREF; + } elsif (my $crlf = ($need & (CRLF_HDR|CRLF_BDY))) { + if ($crlf == (CRLF_HDR|CRLF_BDY)) { + push @op, $OP_CRLF_BREF; + } elsif ($need & CRLF_HDR) { + push @op, $OP_CRLF_HDR; + } else { + push @op, $OP_CRLF_BDY; + } + } + $r[0] = $need & NEED_BLOB ? \&fetch_blob : ($need & NEED_SMSG ? \&fetch_smsg : \&fetch_uid); diff --git a/t/imap.t b/t/imap.t index 2401237c8a0..81b3d844e76 100644 --- a/t/imap.t +++ b/t/imap.t @@ -70,21 +70,26 @@ EOF { my $partial_prepare = \&PublicInbox::IMAP::partial_prepare; my $x = {}; - my $r = $partial_prepare->($x, [], my $p = 'BODY[9]'); + my $n = 0; + my $r = $partial_prepare->(\$n, $x, [], my $p = 'BODY[9]'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY[9]<5>'); + $r = $partial_prepare->(\$n, $x, [], $p = 'BODY[9]<5>'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY[9]<5.1>'); + $r = $partial_prepare->(\$n, $x, [], $p = 'BODY[9]<5.1>'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY[1.1]'); + $r = $partial_prepare->(\$n, $x, [], $p = 'BODY[1.1]'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS (DATE FROM)]'); + $r = $partial_prepare->(\$n, $x, [], + $p = 'BODY[HEADER.FIELDS (DATE FROM)]'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY[HEADER.FIELDS.NOT (TO)]'); + $r = $partial_prepare->(\$n, $x, [], + $p = 'BODY[HEADER.FIELDS.NOT (TO)]'); ok($r, $p); - $r = $partial_prepare->($x, [], $p = 'BODY[HEDDER.FIELDS.NOT (TO)]'); + $r = $partial_prepare->(\$n, $x, [], + $p = 'BODY[HEDDER.FIELDS.NOT (TO)]'); ok(!$r, "rejected misspelling $p"); - $r = $partial_prepare->($x, [], $p = 'BODY[1.1.HEADER.FIELDS (TO)]'); + $r = $partial_prepare->(\$n, $x, [], + $p = 'BODY[1.1.HEADER.FIELDS (TO)]'); ok($r, $p); my $partial_body = \&PublicInbox::IMAP::partial_body; my $partial_hdr_get = \&PublicInbox::IMAP::partial_hdr_get; @@ -111,12 +116,14 @@ EOF my $fetch_compile = \&PublicInbox::IMAP::fetch_compile; my ($cb, $ops, $partial) = $fetch_compile->(['BODY[]']); is($partial, undef, 'no partial fetch data'); - is_deeply($ops, - [ 'BODY[]', \&PublicInbox::IMAP::emit_rfc822 ], - 'proper key and op compiled for BODY[]'); + is_deeply($ops, [ + undef, \&PublicInbox::IMAP::op_crlf_bref, + 'BODY[]', \&PublicInbox::IMAP::emit_rfc822 + ], 'proper key and op compiled for BODY[]'); ($cb, $ops, $partial) = $fetch_compile->(['BODY', 'BODY[]']); is_deeply($ops, [ + undef, \&PublicInbox::IMAP::op_crlf_bref, 'BODY[]', \&PublicInbox::IMAP::emit_rfc822, undef, \&PublicInbox::IMAP::op_eml_new, 'BODY', \&PublicInbox::IMAP::emit_body,
Since headers are big and include a lot of lines MUAs don't care about, we can skip the CRLF_HDR ops and just do the CRLF conversion in partial_hdr_get and partial_hdr_not. This is another 10-15% speedup for mutt w/o header caching. --- lib/PublicInbox/IMAP.pm | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 3cc68e66872..fe3c7d1a85b 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -792,6 +792,7 @@ sub partial_hdr_not { } my $str = $eml->header_obj->as_string; $str =~ s/$hdrs_re//g; + $str =~ s/(?<!\r)\n/\r\n/sg; $str .= "\r\n"; } @@ -802,7 +803,9 @@ sub partial_hdr_get { $eml = eml_body_idx($eml, $section_idx) or return; } my $str = $eml->header_obj->as_string; - join('', ($str =~ m/($hdrs_re)/g), "\r\n"); + $str = join('', ($str =~ m/($hdrs_re)/g)); + $str =~ s/(?<!\r)\n/\r\n/sg; + $str .= "\r\n"; } sub partial_prepare ($$$$) { @@ -828,7 +831,11 @@ sub partial_prepare ($$$$) { : \&partial_hdr_get, $1, undef, $4, $5 ]; $tmp->[2] = hdrs_regexp($3); - $$need |= CRLF_HDR|EML_HDR; + + # don't emit CRLF_HDR instruction, here, partial_hdr_* + # will do CRLF conversion with only the extracted result + # and not waste time converting lines we don't care about. + $$need |= EML_HDR; } else { undef; }
This appears to significantly improve header caching behavior with mutt. With the current public-inbox.org/git mirror(*), mutt will only re-FETCH the last ~300 or so messages in the final "inbox.comp.version-control.git.7" mailbox, instead of ~49,000 messages every time. It's not perfect, but a 500ms query is better than a >10s query and mutt itself spends as much time loading its header cache. (*) there are many gaps in NNTP article numbers (UIDs) due to spam removal from public-inbox-learn. --- lib/PublicInbox/IMAP.pm | 22 +++++++++++----------- t/imapd.t | 4 ++-- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index fe3c7d1a85b..d0683530988 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -475,12 +475,12 @@ sub requeue_once ($) { $self->requeue if $new_size == 1; } -# my ($uid_base, $UID) = @_; -sub fetch_msn_uid ($$) { '* '.($_[1] - $_[0]).' FETCH (UID '.$_[1] } +# my ($msn, $UID) = @_; +sub fetch_msn_uid ($$) { '* '.(${$_[0]}++).' FETCH (UID '.$_[1] } sub fetch_run_ops { - my ($self, $uid_base, $smsg, $bref, $ops, $partial) = @_; - $self->msg_more(fetch_msn_uid($uid_base, $smsg->{num})); + my ($self, $msn, $smsg, $bref, $ops, $partial) = @_; + $self->msg_more(fetch_msn_uid($msn, $smsg->{num})); my ($eml, $k); for (my $i = 0; $i < @$ops;) { $k = $ops->[$i++]; @@ -492,7 +492,7 @@ sub fetch_run_ops { sub fetch_blob_cb { # called by git->cat_async via git_async_cat my ($bref, $oid, $type, $size, $fetch_arg) = @_; - my ($self, undef, $msgs, undef, $ops, $partial) = @$fetch_arg; + my ($self, undef, $msgs, $range_info, $ops, $partial) = @$fetch_arg; my $smsg = shift @$msgs or die 'BUG: no smsg'; if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs @@ -501,7 +501,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } - fetch_run_ops($self, $self->{uid_base}, $smsg, $bref, $ops, $partial); + fetch_run_ops($self, $range_info->[3], $smsg, $bref, $ops, $partial); requeue_once($self); } @@ -601,7 +601,8 @@ sub range_step ($$) { return 'BAD fetch range'; } uid_clamp($self, \$beg, \$end) if defined($range); - [ $beg, $end, $$range_csv ]; + my $msn = $beg - $self->{uid_base}; + [ $beg, $end, $$range_csv, \$msn ]; } sub refill_range ($$$) { @@ -638,8 +639,7 @@ sub fetch_smsg { # long_response return; } } - my $uid_base = $self->{uid_base}; - fetch_run_ops($self, $uid_base, $_, undef, $ops) for @$msgs; + fetch_run_ops($self, $range_info->[3], $_, undef, $ops) for @$msgs; @$msgs = (); 1; # more } @@ -665,10 +665,10 @@ sub fetch_uid { # long_response } # continue looping } - my $uid_base = $self->{uid_base}; my ($i, $k); + my $msn = $range_info->[3]; for (@$uids) { - $self->msg_more(fetch_msn_uid($uid_base, $_)); + $self->msg_more(fetch_msn_uid($msn, $_)); for ($i = 0; $i < @$ops;) { $k = $ops->[$i++]; $ops->[$i++]->($self, $k); diff --git a/t/imapd.t b/t/imapd.t index c691e1a96a7..fdab074a249 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -349,8 +349,8 @@ is(scalar keys %$ret, 3, 'got all 3 messages'); } my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@"; is(scalar keys %$r2, 2, 'did not get all 3 messages'); -is($r2->{2}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged'); -is($r2->{3}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged'); +is($r2->{1}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged'); +is($r2->{2}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged'); $r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]') or BAIL_OUT "FETCH $@"; is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'},
We can get exact values for EXISTS, UIDNEXT using SQLite rather than calculating off $ibx->mm->max ourselves. Furthermore, $ibx->mm is less useful than $ibx->over for IMAP (and for our read-only daemons in general) so do not depend on $ibx->mm outside of startup/reload to save FDs and reduce kernel page cache footprint. --- lib/PublicInbox/DummyInbox.pm | 11 ++++++----- lib/PublicInbox/IMAP.pm | 30 ++++++++++++++++++------------ lib/PublicInbox/IMAPD.pm | 2 +- lib/PublicInbox/Over.pm | 30 ++++++++++++++++++++++++++++++ t/over.t | 3 +++ 5 files changed, 58 insertions(+), 18 deletions(-) diff --git a/lib/PublicInbox/DummyInbox.pm b/lib/PublicInbox/DummyInbox.pm index b6c48db1a0e..c3f4f5c6674 100644 --- a/lib/PublicInbox/DummyInbox.pm +++ b/lib/PublicInbox/DummyInbox.pm @@ -9,13 +9,14 @@ use strict; sub created_at { 0 } # Msgmap::created_at sub mm { shift } -sub max { undef } # Msgmap::max -sub msg_range { [] } # Msgmap::msg_range +sub uid_range { [] } # Over::uid_range +sub subscribe_unlock { undef }; no warnings 'once'; -*uid_range = *query_xover = \&msg_range; +*max = \&created_at; +*query_xover = \&uid_range; *over = \&mm; -*subscribe_unlock = *unsubscribe_unlock = - *get_art = *description = *base_url = \&max; +*unsubscribe_unlock = + *get_art = *description = *base_url = \&subscribe_unlock; 1; diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index d0683530988..41c8066446c 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -183,7 +183,7 @@ sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" } # called by PublicInbox::InboxIdle sub on_inbox_unlock { my ($self, $ibx) = @_; - my $new = $ibx->mm->max; + my $new = $ibx->over->max; my $uid_base = $self->{uid_base} // 0; my $uid_end = $uid_base + UID_BLOCK; defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset'; @@ -223,7 +223,7 @@ sub cmd_idle ($$) { # IDLE seems allowed by dovecot w/o a mailbox selected *shrug* my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; $self->{-idle_tag} = $tag; - my $max = $ibx->mm->max // 0; + my $max = $ibx->over->max; my $uid_end = $self->{uid_base} + UID_BLOCK; my $sock = $self->{sock} or return; my $fd = fileno($sock); @@ -283,14 +283,20 @@ sub inbox_lookup ($$) { my $mb_top = $1; $uid_base = $2 * UID_BLOCK; $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; - $exists = $ibx->mm->max // 0; - ensure_ranges_exist($self->{imapd}, $ibx, $exists); - my $uid_end = $uid_base + UID_BLOCK; - $exists = $uid_end if $exists > $uid_end; - $uidnext = $exists + 1; - $exists -= $uid_base; + my $max; + ($exists, $uidnext, $max) = $ibx->over->imap_status($uid_base, + $uid_base + UID_BLOCK); + ensure_ranges_exist($self->{imapd}, $ibx, $max); } else { # check for dummy inboxes - $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; + $mailbox = lc $mailbox; + $ibx = $self->{imapd}->{mailboxes}->{$mailbox} or return; + + # if "INBOX.foo.bar" is selected and "INBOX.foo.bar.0", + # check for new UID ranges (e.g. "INBOX.foo.bar.1") + if (my $z = $self->{imapd}->{mailboxes}->{"$mailbox.0"}) { + ensure_ranges_exist($self->{imapd}, $z, $z->over->max); + } + $uid_base = $exists = 0; $uidnext = 1; } @@ -590,7 +596,7 @@ sub range_step ($$) { ($beg, $end) = ($1 + 0, $2 + 0); } elsif ($range =~ /\A([0-9]+):\*\z/) { $beg = $1 + 0; - $end = $self->{ibx}->mm->max // 0; + $end = $self->{ibx}->over->max; my $uid_end = $self->{uid_base} + UID_BLOCK; $end = $uid_end if $end > $uid_end; $beg = $end if $beg > $end; @@ -1093,14 +1099,14 @@ sub cmd_uid_search ($$$;) { if (!scalar(keys %$q)) { $self->msg_more('* SEARCH'); my $beg = 1; - my $end = $ibx->mm->max // 0; + my $end = $ibx->over->max; uid_clamp($self, \$beg, \$end); long_response($self, \&uid_search_uid_range, $tag, \$beg, $end, $sql); } elsif (my $uid = $q->{uid}) { if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) { my ($beg, $end) = ($1, $2); - $end = $ibx->mm->max if $end eq '*'; + $end = $ibx->over->max if $end eq '*'; uid_clamp($self, \$beg, \$end); $self->msg_more('* SEARCH'); long_response($self, \&uid_search_uid_range, diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index 186ec7b0062..e4dc90e8d0e 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -44,7 +44,7 @@ sub imapd_refresh_ibx { # pi_config->each_inbox cb # this case is a 32-bit representation of the creation # date/time of the mailbox" defined($ibx->{uidvalidity} = $mm->created_at) or return; - PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 1); + PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 0); # preload to avoid fragmentation: $ibx->description; diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm index 1faeff418f4..e0f20ea6d3d 100644 --- a/lib/PublicInbox/Over.pm +++ b/lib/PublicInbox/Over.pm @@ -229,4 +229,34 @@ sub uid_range { $dbh->selectcol_arrayref($q, undef, $beg, $end); } +sub max { + my ($self) = @_; + my $sth = $self->connect->prepare_cached(<<'', undef, 1); +SELECT MAX(num) FROM over WHERE num > 0 + + $sth->execute; + $sth->fetchrow_array // 0; +} + +sub imap_status { + my ($self, $uid_base, $uid_end) = @_; + my $dbh = $self->connect; + my $sth = $dbh->prepare_cached(<<'', undef, 1); +SELECT COUNT(num) FROM over WHERE num > ? AND num <= ? + + $sth->execute($uid_base, $uid_end); + my $exists = $sth->fetchrow_array; + + $sth = $dbh->prepare_cached(<<'', undef, 1); +SELECT MAX(num) + 1 FROM over WHERE num <= ? + + $sth->execute($uid_end); + my $uidnext = $sth->fetchrow_array; + + $sth = $dbh->prepare_cached(<<'', undef, 1); +SELECT MAX(num) FROM over WHERE num > 0 + + ($exists, $uidnext, $sth->fetchrow_array // 0); +} + 1; diff --git a/t/over.t b/t/over.t index 5586aa310e6..734fdaa3604 100644 --- a/t/over.t +++ b/t/over.t @@ -10,6 +10,8 @@ use_ok 'PublicInbox::OverIdx'; my ($tmpdir, $for_destroy) = tmpdir(); my $over = PublicInbox::OverIdx->new("$tmpdir/over.sqlite3"); $over->connect; +is($over->max, 0, 'max is zero on new DB (scalar context)'); +is_deeply([$over->max], [0], 'max is zero on new DB (list context)'); my $x = $over->next_tid; is(int($x), $x, 'integer tid'); my $y = $over->next_tid; @@ -58,6 +60,7 @@ foreach my $mid (qw(a b)) { my $msgs = [ map { $_->{num} } @{$over->get_thread($mid)} ]; is_deeply([98, 99], $msgs, "linked messages by Message-ID: <$mid>"); } +isnt($over->max, 0, 'max is non-zero'); $over->rollback_lazy;
We can share a bit of code with FETCH to refill UID ranges which hit the SQLite overview. --- lib/PublicInbox/IMAP.pm | 84 +++++++++++++++++------------------------ 1 file changed, 35 insertions(+), 49 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 41c8066446c..f6106a1e806 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -650,26 +650,31 @@ sub fetch_smsg { # long_response 1; # more } -sub fetch_uid { # long_response - my ($self, $tag, $uids, $range_info, $ops) = @_; - - while (!@$uids) { # rare - my ($beg, $end, $range_csv) = @$range_info; - if (scalar(@$uids = @{$self->{ibx}->over-> - uid_range($beg, $end)})) { - $range_info->[0] = $uids->[-1] + 1; - } elsif (!$range_csv) { - $self->write(\"$tag OK Fetch done\r\n"); +sub refill_uids ($$$;$) { + my ($self, $uids, $range_info, $sql) = @_; + my ($beg, $end, $range_csv) = @$range_info; + my $over = $self->{ibx}->over; + while (1) { + if (scalar(@$uids = @{$over->uid_range($beg, $end, $sql)})) { + $range_info->[0] = $uids->[-1] + 1; # update $beg return; + } elsif (!$range_csv) { + return 0; } else { my $next_range = range_step($self, \$range_csv); - if (!ref($next_range)) { # error - $self->write(\"$tag $next_range\r\n"); - return; - } - @$range_info = @$next_range; + return $next_range if !ref($next_range); # error + ($beg, $end, $range_csv) = @$range_info = @$next_range; + # continue looping } - # continue looping + } +} + +sub fetch_uid { # long_response + my ($self, $tag, $uids, $range_info, $ops) = @_; + if (defined(my $err = refill_uids($self, $uids, $range_info))) { + $err ||= 'OK Fetch done'; + $self->write("$tag $err\r\n"); + return; } my ($i, $k); my $msn = $range_info->[3]; @@ -967,15 +972,15 @@ sub parse_date ($) { # 02-Oct-1993 } sub uid_search_uid_range { # long_response - my ($self, $tag, $beg, $end, $sql) = @_; - my $uids = $self->{ibx}->over->uid_range($$beg, $end, $sql); - if (@$uids) { - $$beg = $uids->[-1] + 1; - $self->msg_more(join(' ', '', @$uids)); - } else { - $self->write(\"\r\n$tag OK Search done\r\n"); - undef; + my ($self, $tag, $uids, $sql, $range_info) = @_; + if (defined(my $err = refill_uids($self, $uids, $range_info, $sql))) { + $err ||= 'OK Search done'; + $self->write("\r\n$tag $err\r\n"); + return; } + $self->msg_more(join(' ', '', @$uids)); + @$uids = (); + 1; # more } sub date_search { @@ -1079,12 +1084,8 @@ sub parse_query { return 'BAD Xapian not configured for mailbox'; } - if (my $uid = $q->{uid}) { - ((@$uid > 1) || $uid->[0] =~ /,/) and - return 'BAD multiple ranges not supported, yet'; - ($q->{sql} // $q->{xap}) and - return 'BAD ranges and queries do not mix, yet'; - $q->{uid} = join(',', @$uid); # TODO: multiple ranges + if (my $uid = delete $q->{uid}) { + $q->{uid} = join(',', @$uid); } $q; } @@ -1095,28 +1096,13 @@ sub cmd_uid_search ($$$;) { my $q = parse_query($self, \@_); return "$tag $q\r\n" if !ref($q); my $sql = delete $q->{sql}; - + my $range_csv = delete $q->{uid} // '1:*'; + my $range_info = range_step($self, \$range_csv); + return "$tag $range_info\r\n" if !ref($range_info); if (!scalar(keys %$q)) { $self->msg_more('* SEARCH'); - my $beg = 1; - my $end = $ibx->over->max; - uid_clamp($self, \$beg, \$end); long_response($self, \&uid_search_uid_range, - $tag, \$beg, $end, $sql); - } elsif (my $uid = $q->{uid}) { - if ($uid =~ /\A([0-9]+):([0-9]+|\*)\z/s) { - my ($beg, $end) = ($1, $2); - $end = $ibx->over->max if $end eq '*'; - uid_clamp($self, \$beg, \$end); - $self->msg_more('* SEARCH'); - long_response($self, \&uid_search_uid_range, - $tag, \$beg, $end, $sql); - } elsif ($uid =~ /\A[0-9]+\z/s) { - $uid = $ibx->over->get_art($uid) ? " $uid" : ''; - "* SEARCH$uid\r\n$tag OK Search done\r\n"; - } else { - "$tag BAD Error\r\n"; - } + $tag, [], $sql, $range_info); } else { "$tag BAD Error\r\n"; }
Simple queries work, more complex queries involving parentheses, "OR", "NOT" don't work, yet. Tested with "=b", "=B", and "=H" search and limits in mutt on both v1 and v2 with multiple Xapian shards. --- lib/PublicInbox/IMAP.pm | 142 ++++++++++++++++++++++++++++++-------- lib/PublicInbox/Search.pm | 19 ++++- t/imapd.t | 46 +++++++++++- 3 files changed, 175 insertions(+), 32 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index f6106a1e806..67bc32ab59b 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -24,6 +24,8 @@ use Errno qw(EAGAIN); use Time::Local qw(timegm); use POSIX qw(strftime); use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways +use PublicInbox::Search; +*mdocid = \&PublicInbox::Search::mdocid; my $Address; for my $mod (qw(Email::Address::XS Mail::Address)) { @@ -592,22 +594,26 @@ sub range_step ($$) { $range = $$range_csv; $$range_csv = undef; } + my $uid_base = $self->{uid_base}; + my $uid_end = $uid_base + UID_BLOCK; if ($range =~ /\A([0-9]+):([0-9]+)\z/) { ($beg, $end) = ($1 + 0, $2 + 0); + uid_clamp($self, \$beg, \$end); } elsif ($range =~ /\A([0-9]+):\*\z/) { $beg = $1 + 0; $end = $self->{ibx}->over->max; - my $uid_end = $self->{uid_base} + UID_BLOCK; $end = $uid_end if $end > $uid_end; $beg = $end if $beg > $end; + uid_clamp($self, \$beg, \$end); } elsif ($range =~ /\A[0-9]+\z/) { $beg = $end = $range + 0; - undef $range; + # just let the caller do an out-of-range query if a single + # UID is out-of-range + ++$beg if ($beg <= $uid_base || $end > $uid_end); } else { return 'BAD fetch range'; } - uid_clamp($self, \$beg, \$end) if defined($range); - my $msn = $beg - $self->{uid_base}; + my $msn = $beg - $uid_base; [ $beg, $end, $$range_csv, \$msn ]; } @@ -971,15 +977,22 @@ sub parse_date ($) { # 02-Oct-1993 timegm(0, 0, 0, $dd, $mm, $yyyy); } -sub uid_search_uid_range { # long_response - my ($self, $tag, $uids, $sql, $range_info) = @_; +sub msn_convert ($$) { + my ($self, $uids) = @_; + my $adj = $self->{uid_base}; + $_ -= $adj for @$uids; +} + +sub search_uid_range { # long_response + my ($self, $tag, $sql, $range_info, $want_msn) = @_; + my $uids = []; if (defined(my $err = refill_uids($self, $uids, $range_info, $sql))) { $err ||= 'OK Search done'; $self->write("\r\n$tag $err\r\n"); return; } + msn_convert($self, $uids) if $want_msn; $self->msg_more(join(' ', '', @$uids)); - @$uids = (); 1; # more } @@ -1029,6 +1042,21 @@ my %I2X = ( # KEYWORD # TODO ? dfpre,dfpost,... ); +# IMAP allows searching arbitrary headers via "HEADER $HDR_NAME $HDR_VAL" +# which gets silly expensive. We only allow the headers we already index. +my %H2X = (%I2X, 'MESSAGE-ID' => 'm:', 'LIST-ID' => 'l:'); + +sub xap_append ($$$$) { + my ($q, $rest, $k, $xk) = @_; + delete $q->{sql}; # can't use over.sqlite3 + defined(my $arg = shift @$rest) or return "BAD $k no arg"; + + # AFAIK Xapian can't handle [*"] in probabilistic terms + $arg =~ tr/*"//d; + ${$q->{xap}} .= qq[ $xk"$arg"]; + undef; +} + sub parse_query { my ($self, $rest) = @_; if (uc($rest->[0]) eq 'CHARSET') { @@ -1038,7 +1066,8 @@ sub parse_query { } my $sql = ''; # date conditions, {sql} deleted if Xapian is needed - my $q = { xap => '', sql => \$sql }; + my $xap = ''; + my $q = { sql => \$sql, xap => \$xap }; while (@$rest) { my $k = uc(shift @$rest); # default criteria @@ -1059,17 +1088,18 @@ sub parse_query { delete $q->{sql}; # can't use over.sqlite3 my $bytes = shift(@$rest) // ''; $bytes =~ /\A[0-9]+\z/ or return "BAD $k not a number"; - $q->{xap} .= ' bytes:' . ($k eq 'SMALLER' ? + $xap .= ' bytes:' . ($k eq 'SMALLER' ? '..'.(--$bytes) : (++$bytes).'..'); + } elsif ($k eq 'HEADER') { + $k = uc(shift(@$rest) // ''); + my $xk = $H2X{$k} or + return "BAD HEADER $k not supported"; + my $err = xap_append($q, $rest, $k, $xk); + return $err if $err; } elsif (defined(my $xk = $I2X{$k})) { - delete $q->{sql}; # can't use over.sqlite3 - my $arg = shift @$rest; - defined($arg) or return "BAD $k no arg"; - - # Xapian can't handle [*"] in probabilistic terms - $arg =~ tr/*"//d; - $q->{xap} .= qq[ $xk:"$arg"]; + my $err = xap_append($q, $rest, $k, $xk); + return $err if $err; } else { # TODO: parentheses, OR, NOT ... return "BAD $k not supported (yet?)"; @@ -1083,31 +1113,87 @@ sub parse_query { } elsif (!$self->{ibx}->search) { return 'BAD Xapian not configured for mailbox'; } - + my $max = $self->{ibx}->over->max; if (my $uid = delete $q->{uid}) { - $q->{uid} = join(',', @$uid); + my $range_csv = join(',', @$uid); + do { + my $nxt = range_step($self, \$range_csv); + my ($beg, $end) = @$nxt; + if ($xap) { + $xap .= " uid:$beg..$end"; + } elsif ($beg == $end) { + $sql .= " AND num = $beg"; + } else { + $sql .= " AND num >= $beg AND num <= $end"; + } + } while ($range_csv); } + my $beg = 1; + uid_clamp($self, \$beg, \$max); + $q->{range_info} = [ $beg, $max ]; $q; } -sub cmd_uid_search ($$$;) { - my ($self, $tag) = splice(@_, 0, 2); +sub refill_xap ($$$$) { + my ($self, $uids, $range_info, $q) = @_; + my ($beg, $end) = @$range_info; + my $srch = $self->{ibx}->search; + my $opt = { mset => 2, limit => 1000 }; + my $nshard = $srch->{nshard} // 1; + while (1) { + my $mset = $srch->query("$$q uid:$beg..$end", $opt); + @$uids = map { mdocid($nshard, $_) } $mset->items; + if (@$uids) { + $range_info->[0] = $uids->[-1] + 1; # update $beg + return; + } else { # all done + return 0; + } + } +} + +sub search_xap_range { # long_response + my ($self, $tag, $q, $range_info, $want_msn) = @_; + my $uids = []; + if (defined(my $err = refill_xap($self, $uids, $range_info, $q))) { + $err ||= 'OK Search done'; + $self->write("\r\n$tag $err\r\n"); + return; + } + msn_convert($self, $uids) if $want_msn; + $self->msg_more(join(' ', '', @$uids)); + 1; # more +} + +sub search_common { + my ($self, $tag, $rest, $want_msn) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; - my $q = parse_query($self, \@_); + my $q = parse_query($self, $rest); return "$tag $q\r\n" if !ref($q); - my $sql = delete $q->{sql}; - my $range_csv = delete $q->{uid} // '1:*'; - my $range_info = range_step($self, \$range_csv); - return "$tag $range_info\r\n" if !ref($range_info); - if (!scalar(keys %$q)) { + my ($sql, $range_info) = delete @$q{qw(sql range_info)}; + if (!scalar(keys %$q)) { # overview.sqlite3 + $self->msg_more('* SEARCH'); + long_response($self, \&search_uid_range, + $tag, $sql, $range_info, $want_msn); + } elsif ($q = $q->{xap}) { $self->msg_more('* SEARCH'); - long_response($self, \&uid_search_uid_range, - $tag, [], $sql, $range_info); + long_response($self, \&search_xap_range, + $tag, $q, $range_info, $want_msn); } else { "$tag BAD Error\r\n"; } } +sub cmd_uid_search ($$$;) { + my ($self, $tag) = splice(@_, 0, 2); + search_common($self, $tag, \@_); +} + +sub cmd_search ($$$;) { + my ($self, $tag) = splice(@_, 0, 2); + search_common($self, $tag, \@_, 1); +} + sub args_ok ($$) { # duplicated from PublicInbox::NNTP my ($cb, $argc) = @_; my $tot = prototype $cb; diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm index c54cf7b9911..55eee41ca4a 100644 --- a/lib/PublicInbox/Search.pm +++ b/lib/PublicInbox/Search.pm @@ -182,6 +182,7 @@ sub _xdb ($) { my ($xdb, $slow_phrase); my $qpf = \($self->{qp_flags} ||= $QP_FLAGS); if ($self->{ibx_ver} >= 2) { + my $n = 0; foreach my $shard (<$dir/*>) { -d $shard && $shard =~ m!/[0-9]+\z! or next; my $sub = $X{Database}->new($shard); @@ -191,7 +192,9 @@ sub _xdb ($) { $xdb = $sub; } $slow_phrase ||= -f "$shard/iamchert"; + ++$n; } + $self->{nshard} = $n; } else { $slow_phrase = -f "$dir/iamchert"; $xdb = $X{Database}->new($dir); @@ -200,6 +203,15 @@ sub _xdb ($) { $xdb; } +# v2 Xapian docids don't conflict, so they're identical to +# NNTP article numbers and IMAP UIDs. +# https://trac.xapian.org/wiki/FAQ/MultiDatabaseDocumentID +sub mdocid { + my ($nshard, $mitem) = @_; + my $docid = $mitem->get_docid; + int(($docid - 1) / $nshard) + 1; +} + sub xdb ($) { my ($self) = @_; $self->{xdb} ||= do { @@ -283,7 +295,7 @@ sub _enquire_once { # retry_reopen callback $enquire->set_query($query); $opts ||= {}; my $desc = !$opts->{asc}; - if (($opts->{mset} || 0) == 2) { + if (($opts->{mset} || 0) == 2) { # mset == 2: ORDER BY docid/UID $enquire->set_docid_order($ENQ_ASCENDING); $enquire->set_weighting_scheme($X{BoolWeight}->new); } elsif ($opts->{relevance}) { @@ -322,6 +334,11 @@ sub qp { $qp->add_valuerangeprocessor($nvrp->new(YYYYMMDD, 'd:')); $qp->add_valuerangeprocessor($nvrp->new(DT, 'dt:')); + # for IMAP, undocumented for WWW and may be split off go away + $qp->add_valuerangeprocessor($nvrp->new(BYTES, 'bytes:')); + $qp->add_valuerangeprocessor($nvrp->new(TS, 'ts:')); + $qp->add_valuerangeprocessor($nvrp->new(UID, 'uid:')); + while (my ($name, $prefix) = each %bool_pfx_external) { $qp->add_boolean_prefix($name, $_) foreach split(/ /, $prefix); } diff --git a/t/imapd.t b/t/imapd.t index fdab074a249..e3cce2d30f0 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -19,10 +19,10 @@ if ($can_compress) { # hope this gets fixed upstream, soon require_ok 'PublicInbox::IMAP'; my $first_range = '0'; -my $level = '-Lbasic'; +my $level = 'basic'; SKIP: { require_mods('Search::Xapian', 1); - $level = '-Lmedium'; + $level = 'medium'; }; my @V = (1); @@ -38,7 +38,7 @@ for my $V (@V) { my $url = "http://example.com/i$V"; my $inboxdir = "$tmpdir/$name"; my $folder = "inbox.i$V"; - my $cmd = ['-init', "-V$V", $level, $name, $inboxdir, $url, $addr]; + my $cmd = ['-init', "-V$V", "-L$level", $name, $inboxdir, $url, $addr]; run_script($cmd) or BAIL_OUT("init $name"); xsys(qw(git config), "--file=$ENV{HOME}/.public-inbox/config", "publicinbox.$name.newsgroup", $folder) == 0 or @@ -120,6 +120,24 @@ is_deeply($ret, [ 1 ], 'search UID 1:1 works'); $ret = $mic->search('uid 1:*') or BAIL_OUT "SEARCH FAIL $@"; is_deeply($ret, [ 1 ], 'search UID 1:* works'); +SKIP: { + skip 'Xapian missing', 6 if $level eq 'basic'; + my $x = $mic->search(qw(smaller 99999)); + is_deeply($x, [1], 'SMALLER works with Xapian (hit)'); + $x = $mic->search(qw(smaller 9)); + is_deeply($x, [], 'SMALLER works with Xapian (miss)'); + + $x = $mic->search(qw(larger 99999)); + is_deeply($x, [], 'LARGER works with Xapian (miss)'); + $x = $mic->search(qw(larger 9)); + is_deeply($x, [1], 'LARGER works with Xapian (hit)'); + + $x = $mic->search(qw(HEADER Message-ID testmessage@example.com)); + is_deeply($x, [1], 'HEADER Message-ID works'); + $x = $mic->search(qw(HEADER Message-ID miss)); + is_deeply($x, [], 'HEADER Message-ID can miss'); +} + is_deeply(scalar $mic->flags('1'), [], '->flags works'); { # RFC 3501 section 6.4.8 states: @@ -341,12 +359,34 @@ $ret = $mic->fetch_hash('1,2:3', 'RFC822') or BAIL_OUT "FETCH $@"; is(scalar keys %$ret, 3, 'got all 3 messages with comma-separated sequence'); $ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@"; is(scalar keys %$ret, 3, 'got all 3 messages'); + +SKIP: { + # do any clients use non-UID IMAP SEARCH? + skip 'Xapian missing', 2 if $level eq 'basic'; + my $x = $mic->search('all'); + is_deeply($x, [1, 2, 3], 'MSN SEARCH works before rm'); + $x = $mic->search(qw(header subject embedded)); + is_deeply($x, [2], 'MSN SEARCH on Subject works before rm'); +} + { my $rdr = { 0 => \($ret->{1}->{RFC822}) }; my $env = { HOME => $ENV{HOME} }; my @cmd = qw(-learn rm --all); run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm'); } + +SKIP: { + # do any clients use non-UID IMAP SEARCH? We only ensure + # MSN "SEARCH" can return a result which can be retrieved + # via MSN "FETCH" + skip 'Xapian missing', 3 if $level eq 'basic'; + my $x = $mic->search(qw(header subject embedded)); + is(scalar(@$x), 1, 'MSN SEARCH on Subject works after rm'); + $x = $mic->message_string($x->[0]); + is($x, $ret->{2}->{RFC822}, 'message 2 unchanged'); +} + my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@"; is(scalar keys %$r2, 2, 'did not get all 3 messages'); is($r2->{1}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged');
Note some of our limitations for potential hackers. We'll be renaming "UID_BLOCK" to "UID_SLICE", since "block" is overused term and "slice" isn't used in our codebase. Also, document how "slice" and "epochs" are similar concepts for different clients. --- lib/PublicInbox/IMAP.pm | 38 +++++++++++++++++++++++++------------- lib/PublicInbox/IMAPD.pm | 2 +- 2 files changed, 26 insertions(+), 14 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 67bc32ab59b..1ba3a3ff1bb 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -7,7 +7,16 @@ # slow storage. # # data notes: -# * NNTP article numbers are UIDs +# +# * NNTP article numbers are UIDs, mm->created_at is UIDVALIDITY +# +# * public-inboxes are sliced into mailboxes of 50K messages +# to not overload MUAs: $NEWSGROUP_NAME.$SLICE_INDEX +# Slices are similar in concept to v2 "epochs". Epochs +# are for the limitations of git clients, while slices are +# for the limitations of IMAP clients. +# +# * sequence numbers are estimated based on slice package PublicInbox::IMAP; use strict; @@ -37,7 +46,7 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address; sub LINE_MAX () { 8000 } # RFC 2683 3.2.1.5 # changing this will cause grief for clients which cache -sub UID_BLOCK () { 50_000 } +sub UID_SLICE () { 50_000 } # these values area also used for sorting sub NEED_SMSG () { 1 } @@ -187,7 +196,7 @@ sub on_inbox_unlock { my ($self, $ibx) = @_; my $new = $ibx->over->max; my $uid_base = $self->{uid_base} // 0; - my $uid_end = $uid_base + UID_BLOCK; + my $uid_end = $uid_base + UID_SLICE; defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset'; $new = $uid_end if $new > $uid_end; if ($new > $old) { @@ -226,7 +235,7 @@ sub cmd_idle ($$) { my $ibx = $self->{ibx} or return "$tag BAD no mailbox selected\r\n"; $self->{-idle_tag} = $tag; my $max = $ibx->over->max; - my $uid_end = $self->{uid_base} + UID_BLOCK; + my $uid_end = $self->{uid_base} + UID_SLICE; my $sock = $self->{sock} or return; my $fd = fileno($sock); # only do inotify on most recent slice @@ -260,12 +269,12 @@ sub cmd_done ($$) { "$idle_tag OK Idle done\r\n"; } -sub ensure_ranges_exist ($$$) { +sub ensure_slices_exist ($$$) { my ($imapd, $ibx, $max) = @_; defined(my $mb_top = $ibx->{newsgroup}) or return; my $mailboxes = $imapd->{mailboxes}; my @created; - for (my $i = int($max/UID_BLOCK); $i >= 0; --$i) { + for (my $i = int($max/UID_SLICE); $i >= 0; --$i) { my $sub_mailbox = "$mb_top.$i"; last if exists $mailboxes->{$sub_mailbox}; $mailboxes->{$sub_mailbox} = $ibx; @@ -281,14 +290,14 @@ sub inbox_lookup ($$) { my ($self, $mailbox) = @_; my ($ibx, $exists, $uidnext, $uid_base); if ($mailbox =~ /\A(.+)\.([0-9]+)\z/) { - # old mail: inbox.comp.foo.$uid_block_idx + # old mail: inbox.comp.foo.$SLICE_IDX my $mb_top = $1; - $uid_base = $2 * UID_BLOCK; + $uid_base = $2 * UID_SLICE; $ibx = $self->{imapd}->{mailboxes}->{lc $mailbox} or return; my $max; ($exists, $uidnext, $max) = $ibx->over->imap_status($uid_base, - $uid_base + UID_BLOCK); - ensure_ranges_exist($self->{imapd}, $ibx, $max); + $uid_base + UID_SLICE); + ensure_slices_exist($self->{imapd}, $ibx, $max); } else { # check for dummy inboxes $mailbox = lc $mailbox; $ibx = $self->{imapd}->{mailboxes}->{$mailbox} or return; @@ -296,7 +305,7 @@ sub inbox_lookup ($$) { # if "INBOX.foo.bar" is selected and "INBOX.foo.bar.0", # check for new UID ranges (e.g. "INBOX.foo.bar.1") if (my $z = $self->{imapd}->{mailboxes}->{"$mailbox.0"}) { - ensure_ranges_exist($self->{imapd}, $z, $z->over->max); + ensure_slices_exist($self->{imapd}, $z, $z->over->max); } $uid_base = $exists = 0; @@ -580,7 +589,7 @@ sub op_crlf_bdy { ${$_[4]->{bdy}} =~ s/(?<!\r)\n/\r\n/sg if $_[4]->{bdy} } sub uid_clamp ($$$) { my ($self, $beg, $end) = @_; my $uid_min = $self->{uid_base} + 1; - my $uid_end = $uid_min + UID_BLOCK - 1; + my $uid_end = $uid_min + UID_SLICE - 1; $$beg = $uid_min if $$beg < $uid_min; $$end = $uid_end if $$end > $uid_end; } @@ -595,7 +604,7 @@ sub range_step ($$) { $$range_csv = undef; } my $uid_base = $self->{uid_base}; - my $uid_end = $uid_base + UID_BLOCK; + my $uid_end = $uid_base + UID_SLICE; if ($range =~ /\A([0-9]+):([0-9]+)\z/) { ($beg, $end) = ($1 + 0, $2 + 0); uid_clamp($self, \$beg, \$end); @@ -1206,6 +1215,9 @@ sub args_ok ($$) { # duplicated from PublicInbox::NNTP # returns 1 if we can continue, 0 if not due to buffered writes or disconnect sub process_line ($$) { my ($self, $l) = @_; + + # TODO: IMAP allows literals for big requests to upload messages + # (which we don't support) but maybe some big search queries use it. my ($tag, $req, @args) = parse_line('[ \t]+', 0, $l); pop(@args) if (@args && !defined($args[-1])); if (@args && uc($req) eq 'UID') { diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index e4dc90e8d0e..fe50de0f7cd 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -44,7 +44,7 @@ sub imapd_refresh_ibx { # pi_config->each_inbox cb # this case is a 32-bit representation of the creation # date/time of the mailbox" defined($ibx->{uidvalidity} = $mm->created_at) or return; - PublicInbox::IMAP::ensure_ranges_exist($imapd, $ibx, $mm->max // 0); + PublicInbox::IMAP::ensure_slices_exist($imapd, $ibx, $mm->max // 0); # preload to avoid fragmentation: $ibx->description;
The sort was unstable on my test instance anyways, and clients don't seem to mind. So stop wasting CPU cycles. --- lib/PublicInbox/IMAPD.pm | 5 ----- t/imapd.t | 4 +++- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/IMAPD.pm b/lib/PublicInbox/IMAPD.pm index fe50de0f7cd..09bedf5ca9b 100644 --- a/lib/PublicInbox/IMAPD.pm +++ b/lib/PublicInbox/IMAPD.pm @@ -74,11 +74,6 @@ sub imapd_refresh_finalize { my $u = $_; # capitalize "INBOX" for user-familiarity $u =~ s/\Ainbox(\.|\z)/INBOX$1/i; qq[* LIST (\\Has${no}Children) "." $u\r\n] - } sort { - # shortest names first, alphabetically if lengths match - length($a) == length($b) ? - ($a cmp $b) : - (length($a) <=> length($b)) } keys %$mailboxes ]; $imapd->{pi_config} = $pi_config; diff --git a/t/imapd.t b/t/imapd.t index e3cce2d30f0..c7e0baaa10b 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -405,7 +405,9 @@ is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'}, pop @new_list; pop @orig_list; # TODO: not sure if sort order matters, imapd_refresh_finalize - # sorts, for now, but maybe clients don't care... + # doesn't sort, hopefully clients don't care... + @new_list = sort @new_list; + @orig_list = sort @orig_list; is_deeply(\@new_list, \@orig_list, 'LIST identical'); } ok($mic->close, 'CLOSE works');
Supporting MSNs in long-lived connections beyond the lifetime of a single request/response cycle is not scalable to a C10K scenario. It's probably not needed, since most clients seem to use UIDs. A somewhat efficient implementation I can come up uses pack("S*" ...) (AKA "uint16_t mapping[50000]") has an overhead of 100K per-client socket on a mailbox with 50K messages. The 100K is a contiguous scalar, so it could be swapped out for idle clients on most architectures if THP is disabled. An alternative could be to use a tempfile as an allocator partitioned into 100K chunks (or SQLite); but I'll only do that if somebody presents a compelling case to support MSN SEARCH. --- lib/PublicInbox/IMAP.pm | 33 +++++++++------------------------ t/imapd.t | 37 ------------------------------------- 2 files changed, 9 insertions(+), 61 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 1ba3a3ff1bb..7efe5387646 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -986,21 +986,14 @@ sub parse_date ($) { # 02-Oct-1993 timegm(0, 0, 0, $dd, $mm, $yyyy); } -sub msn_convert ($$) { - my ($self, $uids) = @_; - my $adj = $self->{uid_base}; - $_ -= $adj for @$uids; -} - sub search_uid_range { # long_response - my ($self, $tag, $sql, $range_info, $want_msn) = @_; + my ($self, $tag, $sql, $range_info) = @_; my $uids = []; if (defined(my $err = refill_uids($self, $uids, $range_info, $sql))) { $err ||= 'OK Search done'; $self->write("\r\n$tag $err\r\n"); return; } - msn_convert($self, $uids) if $want_msn; $self->msg_more(join(' ', '', @$uids)); 1; # more } @@ -1162,46 +1155,38 @@ sub refill_xap ($$$$) { } sub search_xap_range { # long_response - my ($self, $tag, $q, $range_info, $want_msn) = @_; + my ($self, $tag, $q, $range_info) = @_; my $uids = []; if (defined(my $err = refill_xap($self, $uids, $range_info, $q))) { $err ||= 'OK Search done'; $self->write("\r\n$tag $err\r\n"); return; } - msn_convert($self, $uids) if $want_msn; $self->msg_more(join(' ', '', @$uids)); 1; # more } -sub search_common { - my ($self, $tag, $rest, $want_msn) = @_; +sub cmd_uid_search ($$$;) { + my ($self, $tag) = splice(@_, 0, 2); my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; - my $q = parse_query($self, $rest); + my $q = parse_query($self, \@_); return "$tag $q\r\n" if !ref($q); my ($sql, $range_info) = delete @$q{qw(sql range_info)}; if (!scalar(keys %$q)) { # overview.sqlite3 $self->msg_more('* SEARCH'); long_response($self, \&search_uid_range, - $tag, $sql, $range_info, $want_msn); + $tag, $sql, $range_info); } elsif ($q = $q->{xap}) { $self->msg_more('* SEARCH'); long_response($self, \&search_xap_range, - $tag, $q, $range_info, $want_msn); + $tag, $q, $range_info); } else { "$tag BAD Error\r\n"; } } -sub cmd_uid_search ($$$;) { - my ($self, $tag) = splice(@_, 0, 2); - search_common($self, $tag, \@_); -} - -sub cmd_search ($$$;) { - my ($self, $tag) = splice(@_, 0, 2); - search_common($self, $tag, \@_, 1); -} +# note: MSN SEARCH is NOT supported. Do any widely-used MUAs +# rely on MSNs from SEARCH results? Let us know at meta@public-inbox.org sub args_ok ($$) { # duplicated from PublicInbox::NNTP my ($cb, $argc) = @_; diff --git a/t/imapd.t b/t/imapd.t index c7e0baaa10b..f5ca8b7eb03 100644 --- a/t/imapd.t +++ b/t/imapd.t @@ -360,43 +360,6 @@ is(scalar keys %$ret, 3, 'got all 3 messages with comma-separated sequence'); $ret = $mic->fetch_hash('1:*', 'RFC822') or BAIL_OUT "FETCH $@"; is(scalar keys %$ret, 3, 'got all 3 messages'); -SKIP: { - # do any clients use non-UID IMAP SEARCH? - skip 'Xapian missing', 2 if $level eq 'basic'; - my $x = $mic->search('all'); - is_deeply($x, [1, 2, 3], 'MSN SEARCH works before rm'); - $x = $mic->search(qw(header subject embedded)); - is_deeply($x, [2], 'MSN SEARCH on Subject works before rm'); -} - -{ - my $rdr = { 0 => \($ret->{1}->{RFC822}) }; - my $env = { HOME => $ENV{HOME} }; - my @cmd = qw(-learn rm --all); - run_script(\@cmd, $env, $rdr) or BAIL_OUT('-learn rm'); -} - -SKIP: { - # do any clients use non-UID IMAP SEARCH? We only ensure - # MSN "SEARCH" can return a result which can be retrieved - # via MSN "FETCH" - skip 'Xapian missing', 3 if $level eq 'basic'; - my $x = $mic->search(qw(header subject embedded)); - is(scalar(@$x), 1, 'MSN SEARCH on Subject works after rm'); - $x = $mic->message_string($x->[0]); - is($x, $ret->{2}->{RFC822}, 'message 2 unchanged'); -} - -my $r2 = $mic->fetch_hash('1:*', 'BODY.PEEK[]') or BAIL_OUT "FETCH $@"; -is(scalar keys %$r2, 2, 'did not get all 3 messages'); -is($r2->{1}->{'BODY[]'}, $ret->{2}->{RFC822}, 'message 2 unchanged'); -is($r2->{2}->{'BODY[]'}, $ret->{3}->{RFC822}, 'message 3 unchanged'); -$r2 = $mic->fetch_hash(2, 'BODY.PEEK[HEADER.FIELDS (message-id)]') - or BAIL_OUT "FETCH $@"; -is($r2->{2}->{'BODY[HEADER.FIELDS (MESSAGE-ID)]'}, - 'Message-ID: <20200418222508.GA13918@dcvr>'."\r\n\r\n", - 'BODY.PEEK[HEADER.FIELDS ...] drops .PEEK'); - { my @new_list = $mic->list; # tag differs in [-1]
The IMAP code already limits the range to UID_SLICE (50K), so that's about 1.6MB of of IVs for an ephemeral allocation that won't live beyond one iteration of the event loop. --- lib/PublicInbox/Over.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm index e0f20ea6d3d..e32104f0cf8 100644 --- a/lib/PublicInbox/Over.pm +++ b/lib/PublicInbox/Over.pm @@ -216,7 +216,7 @@ SELECT num,ts,ds,ddd FROM over WHERE num = ? LIMIT 1 load_from_row($smsg); } -# IMAP search +# IMAP search, this is limited by callers to UID_SLICE size (50K) sub uid_range { my ($self, $beg, $end, $sql) = @_; my $dbh = $self->connect; @@ -225,7 +225,7 @@ sub uid_range { # This is read-only, anyways; but caller should verify it's # only sending \A[0-9]+\z for ds and ts column ranges $q .= $$sql if $sql; - $q .= ' ORDER BY num ASC LIMIT ' . DEFAULT_LIMIT; + $q .= ' ORDER BY num ASC'; $dbh->selectcol_arrayref($q, undef, $beg, $end); }
This finally seems to make mutt header caching behave properly. We expect to be able to safely load 50K IV/UVs in memory without OOM, since that's "only" 1.6 MB that won't live beyond a single event loop iteration. So create a simple array which can quickly map MSNs in requests to UIDs and not leave out messages. MSNs in the FETCH response will NOT be correct, since it's inefficient to implement properly and mutt doesn't seem to care. Since the conversion code is easily shared, "UID SEARCH" can allow the same MSN => UID mapping non-UID "FETCH" does. --- lib/PublicInbox/IMAP.pm | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index 7efe5387646..ff01d0b5e8e 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -16,7 +16,12 @@ # are for the limitations of git clients, while slices are # for the limitations of IMAP clients. # -# * sequence numbers are estimated based on slice +# * sequence numbers are estimated based on slice. If they +# wrong, they're higher than than the corresponding UID +# because UIDs have gaps due to spam removals. +# We only support an ephemeral mapping non-UID "FETCH" +# because mutt header caching relies on it; mutt uses +# UID requests everywhere else. package PublicInbox::IMAP; use strict; @@ -957,9 +962,17 @@ sub cmd_uid_fetch ($$$$;@) { long_response($self, $cb, $tag, [], $range_info, $ops, $partial); } +# returns an arrayref of UIDs, so MSNs can be translated via: +# $msn2uid->[$MSN-1] => $UID +sub msn2uid ($) { + my ($self) = @_; + my $x = $self->{uid_base}; + $self->{ibx}->over->uid_range($x + 1, $x + UID_SLICE); +} + sub msn_to_uid_range ($$) { - my $uid_base = $_[0]->{uid_base}; - $_[1] =~ s/([0-9]+)/$uid_base + $1/sge; + my $msn2uid = $_[0]; + $_[1] =~ s!([0-9]+)!$msn2uid->[$1 - 1] // ($msn2uid->[-1] + 1)!sge; } sub cmd_fetch ($$$$;@) { @@ -970,7 +983,7 @@ sub cmd_fetch ($$$$;@) { # cb is one of fetch_blob, fetch_smsg, fetch_uid $range_csv = 'bad' if $range_csv !~ $valid_range; - msn_to_uid_range($self, $range_csv); + msn_to_uid_range(msn2uid($self), $range_csv); my $range_info = range_step($self, \$range_csv); return "$tag $range_info\r\n" if !ref($range_info); long_response($self, $cb, $tag, [], $range_info, $ops, $partial); @@ -1070,13 +1083,14 @@ sub parse_query { my $sql = ''; # date conditions, {sql} deleted if Xapian is needed my $xap = ''; my $q = { sql => \$sql, xap => \$xap }; + my $msn2uid; while (@$rest) { my $k = uc(shift @$rest); # default criteria next if $k =~ /\A(?:ALL|RECENT|UNSEEN|NEW)\z/; next if $k eq 'AND'; # the default, until we support OR if ($k =~ $valid_range) { # convert sequence numbers to UIDs - msn_to_uid_range($self, $k); + msn_to_uid_range($msn2uid //= msn2uid($self), $k); push @{$q->{uid}}, $k; } elsif ($k eq 'UID') { $k = shift(@$rest) // '';
Since we limit our mailboxes slices to 50K and can guarantee a contiguous UID space for those mailboxes, we can store a mapping of "UID offsets" (not full UIDs) to Message Sequence Numbers as an array of 16-bit unsigned integers in a 100K scalar. For UID-only FETCH responses, we can momentarily unpack the compact 100K representation to a ~1.6M Perl array of IV/UV elements for a slight speedup. Furthermore, we can (ab)use hash key deduplication in Perl5 to deduplicate this 100K scalar across all clients with the same mailbox slice open. Technically we can increase our slice size to 64K w/o increasing our storage overhead, but I suspect humans are more accustomed to slices easily divisible by 10. --- I've been testing this on public-inbox.org for a bit and it seems to fix problems with header caching and mutt. MANIFEST | 1 + lib/PublicInbox/IMAP.pm | 178 +++++++++++++++++++++-------- t/imap.t | 43 +++++++ xt/mem-imapd-tls.t | 243 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 420 insertions(+), 45 deletions(-) create mode 100644 xt/mem-imapd-tls.t diff --git a/MANIFEST b/MANIFEST index a133b71f..1a8d6c68 100644 --- a/MANIFEST +++ b/MANIFEST @@ -355,6 +355,7 @@ xt/git-http-backend.t xt/git_async_cmp.t xt/imapd-mbsync-oimap.t xt/imapd-validate.t +xt/mem-imapd-tls.t xt/mem-msgview.t xt/msgtime_cmp.t xt/nntpd-validate.t diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index ff01d0b5..bc890517 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -16,18 +16,17 @@ # are for the limitations of git clients, while slices are # for the limitations of IMAP clients. # -# * sequence numbers are estimated based on slice. If they -# wrong, they're higher than than the corresponding UID -# because UIDs have gaps due to spam removals. -# We only support an ephemeral mapping non-UID "FETCH" -# because mutt header caching relies on it; mutt uses -# UID requests everywhere else. +# * We also take advantage of slices being only 50K to store +# "UID offset" to message sequence number (MSN) mapping +# as a 50K uint16_t array (via pack("S*", ...)). "UID offset" +# is the offset from {uid_base} which determines the start of +# the mailbox slice. package PublicInbox::IMAP; use strict; use base qw(PublicInbox::DS); use fields qw(imapd ibx long_cb -login_tag - uid_base -idle_tag -idle_max); + uid_base -idle_tag uo2m); use PublicInbox::Eml; use PublicInbox::EmlContentFoo qw(parse_content_disposition); use PublicInbox::DS qw(now); @@ -50,7 +49,9 @@ die "neither Email::Address::XS nor Mail::Address loaded: $@" if !$Address; sub LINE_MAX () { 8000 } # RFC 2683 3.2.1.5 -# changing this will cause grief for clients which cache +# Changing UID_SLICE will cause grief for clients which cache. +# This also needs to be <64K: we pack it into a uint16_t +# for long_response UID (offset) => MSN mappings sub UID_SLICE () { 50_000 } # these values area also used for sorting @@ -196,23 +197,118 @@ sub cmd_capability ($$) { sub cmd_noop ($$) { "$_[1] OK Noop done\r\n" } +# uo2m: UID Offset to MSN, this is an arrayref by default, +# but uo2m_hibernate can compact and deduplicate it +sub uo2m_ary_new ($) { + my ($self) = @_; + my $base = $self->{uid_base}; + my $uids = $self->{ibx}->over->uid_range($base + 1, $base + UID_SLICE); + + # convert UIDs to offsets from {base} + my @tmp; # [$UID_OFFSET] => $MSN + my $msn = 0; + ++$base; + $tmp[$_ - $base] = ++$msn for @$uids; + \@tmp; +} + +# changes UID-offset-to-MSN mapping into a deduplicated scalar: +# uint16_t uo2m[UID_SLICE]. +# May be swapped out for idle clients if THP is disabled. +sub uo2m_hibernate ($) { + my ($self) = @_; + ref(my $uo2m = $self->{uo2m}) or return; + my %dedupe = ( uo2m_pack($uo2m) => undef ); + $self->{uo2m} = (keys(%dedupe))[0]; + undef; +} + +sub uo2m_last_uid ($) { + my ($self) = @_; + my $uo2m = $self->{uo2m} or die 'BUG: uo2m_last_uid w/o {uo2m}'; + (ref($uo2m) ? @$uo2m : (length($uo2m) >> 1)) + $self->{uid_base}; +} + +sub uo2m_pack ($) { + # $_[0] is an arrayref of MSNs, it may have undef gaps if there + # are gaps in the corresponding UIDs: [ msn1, msn2, undef, msn3 ] + no warnings 'uninitialized'; + pack('S*', @{$_[0]}); +} + +# extend {uo2m} to account for new messages which arrived since +# {uo2m} was created. +sub uo2m_extend ($$) { + my ($self, $new_uid_max) = @_; + defined(my $uo2m = $self->{uo2m}) or + return($self->{uo2m} = uo2m_ary_new($self)); + my $beg = uo2m_last_uid($self); # last UID we've learned + return $uo2m if $beg >= $new_uid_max; # fast path + + # need to extend the current range: + my $base = $self->{uid_base}; + ++$beg; + my $uids = $self->{ibx}->over->uid_range($beg, $base + UID_SLICE); + my @tmp; # [$UID_OFFSET] => $MSN + if (ref($uo2m)) { + my $msn = $uo2m->[-1]; + $tmp[$_ - $beg] = ++$msn for @$uids; + push @$uo2m, @tmp; + $uo2m; + } else { + my $msn = unpack('S', substr($uo2m, -2, 2)); + $tmp[$_ - $beg] = ++$msn for @$uids; + $uo2m .= uo2m_pack(\@tmp); + my %dedupe = ($uo2m => undef); + $self->{uo2m} = (keys %dedupe)[0]; + } +} + +# the flexible version which works on scalars and array refs. +# Must call uo2m_extend before this +sub uid2msn ($$) { + my ($self, $uid) = @_; + my $uo2m = $self->{uo2m}; + my $off = $uid - $self->{uid_base} - 1; + ref($uo2m) ? $uo2m->[$off] : unpack('S', substr($uo2m, $off << 1, 2)); +} + +# returns an arrayref of UIDs, so MSNs can be translated to UIDs via: +# $msn2uid->[$MSN-1] => $UID. The result of this is always ephemeral +# and does not live beyond the event loop. +sub msn2uid ($) { + my ($self) = @_; + my $base = $self->{uid_base}; + my $uo2m = uo2m_extend($self, $base + UID_SLICE); + $uo2m = [ unpack('S*', $uo2m) ] if !ref($uo2m); + + my $uo = 0; + my @msn2uid; + for my $msn (@$uo2m) { + ++$uo; + $msn2uid[$msn - 1] = $uo + $base if $msn; + } + \@msn2uid; +} + +# converts a set of message sequence numbers in requests to UIDs: +sub msn_to_uid_range ($$) { + my $msn2uid = $_[0]; + $_[1] =~ s!([0-9]+)!$msn2uid->[$1 - 1] // ($msn2uid->[-1] + 1)!sge; +} + # called by PublicInbox::InboxIdle sub on_inbox_unlock { my ($self, $ibx) = @_; - my $new = $ibx->over->max; - my $uid_base = $self->{uid_base} // 0; - my $uid_end = $uid_base + UID_SLICE; - defined(my $old = $self->{-idle_max}) or die 'BUG: -idle_max unset'; - $new = $uid_end if $new > $uid_end; + my $old = uo2m_last_uid($self); + my $uid_end = $self->{uid_base} + UID_SLICE; + uo2m_extend($self, $uid_end); + my $new = uo2m_last_uid($self); if ($new > $old) { - $self->{-idle_max} = $new; - $new -= $uid_base; - $old -= $uid_base; - $self->msg_more("* $_ EXISTS\r\n") for (($old + 1)..($new - 1)); - $self->write(\"* $new EXISTS\r\n"); + my $msn = uid2msn($self, $new); + $self->write(\"* $msn EXISTS\r\n"); } elsif ($new == $uid_end) { # max exceeded $uid_end # continue idling w/o inotify - delete $self->{-idle_max}; my $sock = $self->{sock} or return; $ibx->unsubscribe_unlock(fileno($sock)); } @@ -245,9 +341,9 @@ sub cmd_idle ($$) { my $fd = fileno($sock); # only do inotify on most recent slice if ($max < $uid_end) { + uo2m_extend($self, $uid_end); $ibx->subscribe_unlock($fd, $self); $self->{imapd}->idler_start; - $self->{-idle_max} = $max; } $idle_timer //= PublicInbox::DS::later(\&idle_tick_all); $IDLERS->{$fd} = $self; @@ -497,12 +593,10 @@ sub requeue_once ($) { $self->requeue if $new_size == 1; } -# my ($msn, $UID) = @_; -sub fetch_msn_uid ($$) { '* '.(${$_[0]}++).' FETCH (UID '.$_[1] } - sub fetch_run_ops { - my ($self, $msn, $smsg, $bref, $ops, $partial) = @_; - $self->msg_more(fetch_msn_uid($msn, $smsg->{num})); + my ($self, $smsg, $bref, $ops, $partial) = @_; + my $uid = $smsg->{num}; + $self->msg_more('* '.uid2msn($self, $uid)." FETCH (UID $uid"); my ($eml, $k); for (my $i = 0; $i < @$ops;) { $k = $ops->[$i++]; @@ -523,7 +617,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } - fetch_run_ops($self, $range_info->[3], $smsg, $bref, $ops, $partial); + fetch_run_ops($self, $smsg, $bref, $ops, $partial); requeue_once($self); } @@ -627,8 +721,7 @@ sub range_step ($$) { } else { return 'BAD fetch range'; } - my $msn = $beg - $uid_base; - [ $beg, $end, $$range_csv, \$msn ]; + [ $beg, $end, $$range_csv ]; } sub refill_range ($$$) { @@ -653,6 +746,7 @@ sub fetch_blob { # long_response return; } } + uo2m_extend($self, $msgs->[-1]->{num}); git_async_cat($self->{ibx}->git, $msgs->[0]->{blob}, \&fetch_blob_cb, \@_); } @@ -665,7 +759,8 @@ sub fetch_smsg { # long_response return; } } - fetch_run_ops($self, $range_info->[3], $_, undef, $ops) for @$msgs; + uo2m_extend($self, $msgs->[-1]->{num}); + fetch_run_ops($self, $_, undef, $ops) for @$msgs; @$msgs = (); 1; # more } @@ -696,10 +791,12 @@ sub fetch_uid { # long_response $self->write("$tag $err\r\n"); return; } + my $adj = $self->{uid_base} + 1; + my $uo2m = uo2m_extend($self, $uids->[-1]); + $uo2m = [ unpack('S*', $uo2m) ] if !ref($uo2m); my ($i, $k); - my $msn = $range_info->[3]; for (@$uids) { - $self->msg_more(fetch_msn_uid($msn, $_)); + $self->msg_more("* $uo2m->[$_ - $adj] FETCH (UID $_"); for ($i = 0; $i < @$ops;) { $k = $ops->[$i++]; $ops->[$i++]->($self, $k); @@ -956,25 +1053,14 @@ sub cmd_uid_fetch ($$$$;@) { my ($cb, $ops, $partial) = fetch_compile(\@want); return "$tag $cb\r\n" unless $ops; + # cb is one of fetch_blob, fetch_smsg, fetch_uid $range_csv = 'bad' if $range_csv !~ $valid_range; my $range_info = range_step($self, \$range_csv); return "$tag $range_info\r\n" if !ref($range_info); + uo2m_hibernate($self) if $cb == \&fetch_blob; # slow, save RAM long_response($self, $cb, $tag, [], $range_info, $ops, $partial); } -# returns an arrayref of UIDs, so MSNs can be translated via: -# $msn2uid->[$MSN-1] => $UID -sub msn2uid ($) { - my ($self) = @_; - my $x = $self->{uid_base}; - $self->{ibx}->over->uid_range($x + 1, $x + UID_SLICE); -} - -sub msn_to_uid_range ($$) { - my $msn2uid = $_[0]; - $_[1] =~ s!([0-9]+)!$msn2uid->[$1 - 1] // ($msn2uid->[-1] + 1)!sge; -} - sub cmd_fetch ($$$$;@) { my ($self, $tag, $range_csv, @want) = @_; my $ibx = $self->{ibx} or return "$tag BAD No mailbox selected\r\n"; @@ -986,6 +1072,7 @@ sub cmd_fetch ($$$$;@) { msn_to_uid_range(msn2uid($self), $range_csv); my $range_info = range_step($self, \$range_csv); return "$tag $range_info\r\n" if !ref($range_info); + uo2m_hibernate($self) if $cb == \&fetch_blob; # slow, save RAM long_response($self, $cb, $tag, [], $range_info, $ops, $partial); } @@ -1318,7 +1405,8 @@ sub event_step { $self->write(\"\* BAD request too long\r\n"); return $self->close; } - $self->do_read($rbuf, LINE_MAX, length($$rbuf)) or return; + $self->do_read($rbuf, LINE_MAX, length($$rbuf)) or + return uo2m_hibernate($self); $line = index($$rbuf, "\n"); } $line = substr($$rbuf, 0, $line + 1, ''); diff --git a/t/imap.t b/t/imap.t index 81b3d844..0700f578 100644 --- a/t/imap.t +++ b/t/imap.t @@ -130,4 +130,47 @@ EOF ], 'placed op_eml_new before emit_body'); } +# UID <=> MSN mapping + +sub uo2m_str_new ($) { + no warnings 'uninitialized'; # uom2m_ary_new may have may have undef + pack('S*', @{$_[0]->uo2m_ary_new}); # 2 bytes per-MSN +} + +{ + my $ibx = bless { uid_range => [ 1, 2, 4 ] }, 'Uo2mTestInbox'; + my $imap = bless { uid_base => 0, ibx => $ibx }, 'PublicInbox::IMAP'; + my $uo2m = $imap->uo2m_ary_new; + is_deeply($uo2m, [ 1, 2, undef, 3 ], 'uo2m ary'); + $uo2m = uo2m_str_new($imap); + is_deeply([ unpack('S*', $uo2m) ], [ 1, 2, 0, 3 ], 'uo2m str'); + + $ibx->{uid_range} = [ 1, 2, 4, 5, 6 ]; + for ([ 1, 2, undef, 3 ], $uo2m) { + $imap->{uo2m} = $_; + is($imap->uid2msn(1), 1, 'uid2msn'); + is($imap->uid2msn(4), 3, 'uid2msn'); + is($imap->uo2m_last_uid, 4, 'uo2m_last_uid'); + $imap->uo2m_extend(6); + is($imap->uid2msn(5), 4, 'uid2msn 5 => 4'); + is($imap->uid2msn(6), 5, 'uid2msn 6 => 5'); + is($imap->uo2m_last_uid, 6, 'uo2m_last_uid'); + + my $msn2uid = $imap->msn2uid; + my $range = '1,4:5'; + $imap->can('msn_to_uid_range')->($msn2uid, $range); + is($range, '1,5:6', 'range converted'); + } +} + done_testing; + +package Uo2mTestInbox; +use strict; +require PublicInbox::DummyInbox; +our @ISA = qw(PublicInbox::DummyInbox); +sub over { shift } +sub uid_range { + my ($self, $beg, $end, undef) = @_; + [ grep { $_ >= $beg && $_ <= $end } @{$self->{uid_range}} ]; +} diff --git a/xt/mem-imapd-tls.t b/xt/mem-imapd-tls.t new file mode 100644 index 00000000..accf7564 --- /dev/null +++ b/xt/mem-imapd-tls.t @@ -0,0 +1,243 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# Idle client memory usage test, particularly after EXAMINE when +# Message Sequence Numbers are loaded +use strict; +use Test::More; +use Socket qw(SOCK_STREAM IPPROTO_TCP SOL_SOCKET); +use PublicInbox::TestCommon; +use PublicInbox::Syscall qw(:epoll); +use PublicInbox::DS; +require_mods(qw(DBD::SQLite)); +my $inboxdir = $ENV{GIANT_INBOX_DIR}; +my $TEST_TLS; +SKIP: { + require_mods('IO::Socket::SSL', 1); + $TEST_TLS = $ENV{TEST_TLS} // 1; +}; +plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir; +diag 'TEST_COMPRESS='.($ENV{TEST_COMPRESS} // 1) . " TEST_TLS=$TEST_TLS"; + +my ($cert, $key) = qw(certs/server-cert.pem certs/server-key.pem); +if ($TEST_TLS) { + if (!-r $key || !-r $cert) { + plan skip_all => + "certs/ missing for $0, run ./certs/create-certs.perl"; + } + use_ok 'PublicInbox::TLS'; +} +my ($tmpdir, $for_destroy) = tmpdir(); +my ($out, $err) = ("$tmpdir/stdout.log", "$tmpdir/stderr.log"); +my $pi_config = "$tmpdir/pi_config"; +my $group = 'inbox.test'; +local $SIG{PIPE} = 'IGNORE'; # for IMAPC (below) +my $imaps = tcp_server(); +{ + open my $fh, '>', $pi_config or die "open: $!\n"; + print $fh <<EOF or die; +[publicinbox "imapd-tls"] + inboxdir = $inboxdir + address = $group\@example.com + newsgroup = $group + indexlevel = basic +EOF + close $fh or die "close: $!\n"; +} +my $imaps_addr = $imaps->sockhost . ':' . $imaps->sockport; +my $env = { PI_CONFIG => $pi_config }; +my $arg = $TEST_TLS ? [ "-limaps://$imaps_addr/?cert=$cert,key=$key" ] : []; +my $cmd = [ '-imapd', '-W0', @$arg, "--stdout=$out", "--stderr=$err" ]; +my $td = start_script($cmd, $env, { 3 => $imaps }); +my %ssl_opt; +if ($TEST_TLS) { + %ssl_opt = ( + SSL_hostname => 'server.local', + SSL_verifycn_name => 'server.local', + SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_PEER(), + SSL_ca_file => 'certs/test-ca.pem', + ); + my $ctx = IO::Socket::SSL::SSL_Context->new(%ssl_opt); + + # cf. https://rt.cpan.org/Ticket/Display.html?id=129463 + my $mode = eval { Net::SSLeay::MODE_RELEASE_BUFFERS() }; + if ($mode && $ctx->{context}) { + eval { Net::SSLeay::CTX_set_mode($ctx->{context}, $mode) }; + warn "W: $@ (setting SSL_MODE_RELEASE_BUFFERS)\n" if $@; + } + + $ssl_opt{SSL_reuse_ctx} = $ctx; + $ssl_opt{SSL_startHandshake} = 0; +} +chomp(my $nfd = `/bin/sh -c 'ulimit -n'`); +$nfd -= 10; +ok($nfd > 0, 'positive FD count'); +my $MAX_FD = 10000; +$nfd = $MAX_FD if $nfd >= $MAX_FD; +our $DONE = 0; +sub once { 0 }; # stops event loop + +# setup the event loop so that it exits at every step +# while we're still doing connect(2) +PublicInbox::DS->SetLoopTimeout(0); +PublicInbox::DS->SetPostLoopCallback(\&once); +my $pid = $td->{pid}; +if ($^O eq 'linux' && open(my $f, '<', "/proc/$pid/status")) { + diag(grep(/RssAnon/, <$f>)); +} + +foreach my $n (1..$nfd) { + my $io = tcp_connect($imaps, Blocking => 0); + $io = IO::Socket::SSL->start_SSL($io, %ssl_opt) if $TEST_TLS; + IMAPC->new($io); + + # one step through the event loop + # do a little work as we connect: + PublicInbox::DS->EventLoop; + + # try not to overflow the listen() backlog: + if (!($n % 128) && $DONE != $n) { + diag("nr: ($n) $DONE/$nfd"); + PublicInbox::DS->SetLoopTimeout(-1); + PublicInbox::DS->SetPostLoopCallback(sub { $DONE != $n }); + + # clear the backlog: + PublicInbox::DS->EventLoop; + + # resume looping + PublicInbox::DS->SetLoopTimeout(0); + PublicInbox::DS->SetPostLoopCallback(\&once); + } +} + +# run the event loop normally, now: +diag "done?: @".time." $DONE/$nfd"; +if ($DONE != $nfd) { + PublicInbox::DS->SetLoopTimeout(-1); + PublicInbox::DS->SetPostLoopCallback(sub { $DONE != $nfd }); + PublicInbox::DS->EventLoop; +} +is($nfd, $DONE, "$nfd/$DONE done"); +if ($^O eq 'linux' && open(my $f, '<', "/proc/$pid/status")) { + diag(grep(/RssAnon/, <$f>)); + diag " SELF lsof | wc -l ".`lsof -p $$ |wc -l`; + diag "SERVER lsof | wc -l ".`lsof -p $pid |wc -l`; +} +PublicInbox::DS->Reset; +$td->kill; +$td->join; +is($?, 0, 'no error in exited process'); +done_testing; + +package IMAPC; +use strict; +use base qw(PublicInbox::DS); +use fields qw(step zin); +use PublicInbox::Syscall qw(EPOLLIN EPOLLOUT EPOLLONESHOT); +use Errno qw(EAGAIN); +# determines where we start event_step +use constant FIRST_STEP => ($ENV{TEST_COMPRESS} // 1) ? -2 : 0; + +# return true if complete, false if incomplete (or failure) +sub connect_tls_step { + my ($self) = @_; + my $sock = $self->{sock} or return; + return 1 if $sock->connect_SSL; + return $self->drop("$!") if $! != EAGAIN; + if (my $ev = PublicInbox::TLS::epollbit()) { + unshift @{$self->{wbuf}}, \&connect_tls_step; + PublicInbox::DS::epwait($sock, $ev | EPOLLONESHOT); + 0; + } else { + $self->drop('BUG? EAGAIN but '.PublicInbox::TLS::err()); + } +} + +sub event_step { + my ($self) = @_; + + # TLS negotiation happens in flush_write via {wbuf} + return unless $self->flush_write && $self->{sock}; + + if ($self->{step} == -2) { + $self->do_read(\(my $buf = ''), 128) or return; + $buf =~ /\A\* OK / or die 'no greeting'; + $self->{step} = -1; + $self->write(\"1 COMPRESS DEFLATE\r\n"); + } + if ($self->{step} == -1) { + $self->do_read(\(my $buf = ''), 128) or return; + $buf =~ /\A1 OK / or die "no compression $buf"; + IMAPCdeflate->enable($self); + $self->{step} = 1; + $self->write(\"2 EXAMINE inbox.test.0\r\n"); + } + if ($self->{step} == 0) { + $self->do_read(\(my $buf = ''), 128) or return; + $buf =~ /\A\* OK / or die 'no greeting'; + $self->{step} = 1; + $self->write(\"2 EXAMINE inbox.test.0\r\n"); + } + if ($self->{step} == 1) { + my $buf = ''; + until ($buf =~ /^2 OK \[READ-ONLY/ms) { + $self->do_read(\$buf, 4096, length($buf)) or return; + } + $self->{step} = 2; + $self->write(\"3 UID FETCH 1 (UID FLAGS)\r\n"); + } + if ($self->{step} == 2) { + my $buf = ''; + until ($buf =~ /^3 OK /ms) { + $self->do_read(\$buf, 4096, length($buf)) or return; + } + $self->{step} = 3; + $self->write(\"4 IDLE\r\n"); + } + if ($self->{step} == 3) { + $self->do_read(\(my $buf = ''), 128) or return; + no warnings 'once'; + $::DONE++; + $self->{step} = 5; # all done + } else { + warn "$self->{step} Should never get here $self"; + } +} + +sub new { + my ($class, $io) = @_; + my $self = fields::new($class); + + # wait for connect(), and maybe SSL_connect() + $self->SUPER::new($io, EPOLLOUT|EPOLLONESHOT); + if ($io->can('connect_SSL')) { + $self->{wbuf} = [ \&connect_tls_step ]; + } + $self->{step} = FIRST_STEP; + $self; +} + +1; +package IMAPCdeflate; +use strict; +use base qw(IMAPC); # parent doesn't work for fields +use Hash::Util qw(unlock_hash); # dependency of fields for perl 5.10+, anyways +use Compress::Raw::Zlib; +use PublicInbox::IMAPdeflate; +my %ZIN_OPT; +BEGIN { + %ZIN_OPT = ( -WindowBits => -15, -AppendOutput => 1 ); + *write = \&PublicInbox::IMAPdeflate::write; + *do_read = \&PublicInbox::IMAPdeflate::do_read; +}; + +sub enable { + my ($class, $self) = @_; + my ($in, $err) = Compress::Raw::Zlib::Inflate->new(%ZIN_OPT); + die "Inflate->new failed: $err" if $err != Z_OK; + unlock_hash(%$self); + bless $self, $class; + $self->{zin} = $in; +} + +1;