This allows -httpd to make better use of time it spends waiting on git-cat-file to respond. It allows us to deal with high-latency HDD storage without a client monopolizing the event loop. Even on a mid-range consumer-grade SSD, this seems to give a 10+% speed improvement for HTTP responses requiring many blobs, including all /T/, /t/, and /t.mbox.gz endpoints. This only benefits indexed inboxes (both v1 and v2); I'm not sure if anybody still uses unindexed v1 inboxes nowadays. A new xt/httpd-async-stream.t maintainer test ensures checksums for responses before and after this series match exactly as before. This builds off a branch I started several months ago (but never published here) to integrate gzip responses into our codebase and remove our optional dependency on Plack::Middleware::Deflater. We already gzip a bunch of things independent of Plack::Middleware::Deflater: manifest.js.gz, altid SQLite3 dumps and all the *.mbox.gz endpoints; so being able to use gzip on all of our responses without an extra dependency seemed logical. Being able to consistently use our GzipFilter API to perform buffering via ->zmore made it significantly easier to reason about small response chunks for ghost messages interspersed with large ones when streaming /$INBOX/$MSGID/t/ endpoints. I'm not yet maximizing use of ->zmore for all buffering of HTTP responses, yet; measurements need to happen, first. That may happen in the 1.7 time frame. In particular, we would need to ensure the Perl method dispatch and DSO overhead to Zlib.so and libz.so of making many ->zmore calls doesn't cause performance regressions compared to the current `.=' use and calling ->zmore/->translate fewer times. Eric Wong (43): gzipfilter: minor cleanups wwwstream: oneshot: perform gzip without middleware www*stream: gzip ->getline responses wwwtext: gzip text/plain responses, as well wwwtext: switch to html_oneshot www: need: use WwwStream::html_oneshot wwwlisting: use GzipFilter for HTML gzipfilter: replace Compress::Raw::Deflate usages {gzip,noop}filter: ->zmore returns undef, always mbox: remove html_oneshot import wwwstatic: support gzipped response directory listings qspawn: learn to gzip streaming responses stop auto-loading Plack::Middleware::Deflater mboxgz: do asynchronous git blob retrievals mboxgz: reduce object hash depth mbox: async blob retrieval for "single message" raw mboxrd wwwatomstream: simplify feed_update callers wwwatomstream: use PublicInbox::Inbox->modified for feed_updated wwwatomstream: reuse $ctx as $self xt/httpd-async-stream: allow more options wwwatomstream: support asynchronous blob retrievals wwwstream: reduce object graph depth wwwstream: reduce blob retrieval paths for ->getline www: start making gzipfilter the parent response class remove unused/redundant zlib-related imports wwwstream: use parent.pm and no warnings wwwstream: subclass off GzipFilter view: wire up /$INBOX/$MESSAGE_ID/ permalink to async view: /$INBOX/$MSGID/t/ reads blobs asynchronously view: update /$INBOX/$MSGID/T/ to be async feed: generate_i: eliminate pointless loop feed: /$INBOX/new.html retrieves blobs asynchronously ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs view: eml_entry: reduce parameters view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case wwwstream: eliminate ::response, use html_oneshot www: update internal docs view: simplify eml_entry callers further wwwtext: simplify gzf_maybe use wwwattach: support async blob retrievals gzipfilter: drop HTTP connection on bugs or data corruption daemon: warn on missing blobs gzipfilter: check http->{forward} for client disconnects Documentation/mknews.perl | 20 +-- Documentation/public-inbox-httpd.pod | 1 - Documentation/technical/ds.txt | 4 +- INSTALL | 5 - MANIFEST | 2 + ci/deps.perl | 1 - examples/cgit.psgi | 8 - examples/newswww.psgi | 8 - examples/public-inbox.psgi | 9 -- examples/unsubscribe.psgi | 1 - lib/PublicInbox/CompressNoop.pm | 22 +++ lib/PublicInbox/Feed.pm | 22 ++- lib/PublicInbox/GetlineBody.pm | 4 +- lib/PublicInbox/GzipFilter.pm | 168 +++++++++++++++++--- lib/PublicInbox/HTTP.pm | 7 + lib/PublicInbox/HTTPD.pm | 5 +- lib/PublicInbox/IMAP.pm | 1 + lib/PublicInbox/Mbox.pm | 137 +++++++++-------- lib/PublicInbox/MboxGz.pm | 81 ++++------ lib/PublicInbox/NNTP.pm | 1 + lib/PublicInbox/Qspawn.pm | 6 +- lib/PublicInbox/SearchView.pm | 40 ++--- lib/PublicInbox/View.pm | 219 ++++++++++++++------------- lib/PublicInbox/WWW.pm | 9 +- lib/PublicInbox/WwwAtomStream.pm | 66 ++++---- lib/PublicInbox/WwwAttach.pm | 63 ++++++-- lib/PublicInbox/WwwListing.pm | 24 +-- lib/PublicInbox/WwwStatic.pm | 14 +- lib/PublicInbox/WwwStream.pm | 110 ++++++++------ lib/PublicInbox/WwwText.pm | 26 ++-- script/public-inbox-httpd | 9 -- script/public-inbox.cgi | 7 - t/httpd-corner.psgi | 7 + t/httpd-corner.t | 9 +- t/plack.t | 4 + t/psgi_attach.t | 162 +++++++++++--------- t/psgi_text.t | 33 +++- t/psgi_v2.t | 80 ++++++++-- t/www_listing.t | 8 +- t/www_static.t | 11 +- xt/httpd-async-stream.t | 104 +++++++++++++ 41 files changed, 964 insertions(+), 554 deletions(-) create mode 100644 lib/PublicInbox/CompressNoop.pm create mode 100644 xt/httpd-async-stream.t
We currently don't use bytes::length in ->write, so there's no need to `use bytes'. Favor `//=' to describe the intent of the conditional assignment since the C::R::Z::Deflate object is always truthy. Also use the local $gz variable to avoid unnecessary {gz} hash lookups. --- lib/PublicInbox/GzipFilter.pm | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 864095862..a7355a8df 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -4,7 +4,6 @@ # Qspawn filter package PublicInbox::GzipFilter; use strict; -use bytes (); # length use Compress::Raw::Zlib qw(Z_FINISH Z_OK); my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1); @@ -24,21 +23,21 @@ sub translate ($$) { # allocate the zlib context lazily here, instead of in ->new. # Deflate contexts are memory-intensive and this object may # be sitting in the Qspawn limiter queue for a while. - my $gz = $self->{gz} ||= do { + my $gz = $self->{gz} //= do { my ($g, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); $err == Z_OK or die "Deflate->new failed: $err"; $g; }; my $zbuf = delete($self->{zbuf}); if (defined $_[1]) { # my $buf = $_[1]; - my $err = $self->{gz}->deflate($_[1], $zbuf); + my $err = $gz->deflate($_[1], $zbuf); die "gzip->deflate: $err" if $err != Z_OK; return $zbuf if length($zbuf) >= 8192; $self->{zbuf} = $zbuf; ''; } else { # undef == EOF - my $err = $self->{gz}->flush($zbuf, Z_FINISH); + my $err = $gz->flush($zbuf, Z_FINISH); die "gzip->flush: $err" if $err != Z_OK; $zbuf; }
Plack::Middleware::Deflater forces us to use a memory-intensive closure. Instead, work towards building compressed strings in memory to reduce the overhead of buffering large HTML output. --- lib/PublicInbox/GzipFilter.pm | 13 +++++++++++++ lib/PublicInbox/WwwStream.pm | 27 ++++++++++++++++++++++----- 2 files changed, 35 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index a7355a8df..115660cb1 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -4,7 +4,9 @@ # Qspawn filter package PublicInbox::GzipFilter; use strict; +use parent qw(Exporter); use Compress::Raw::Zlib qw(Z_FINISH Z_OK); +our @EXPORT_OK = qw(gzip_maybe); my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1); sub new { bless {}, shift } @@ -16,6 +18,17 @@ sub attach { $self } +sub gzip_maybe ($) { + my ($env) = @_; + return if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/; + + # in case Plack::Middleware::Deflater is loaded: + $env->{'plack.skip-deflater'} = 1; + + my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); + $err == Z_OK ? $gz : undef; +} + # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'} sub translate ($$) { my $self = $_[0]; diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index 915a71ba0..79ed6871e 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -13,6 +13,8 @@ use base qw(Exporter); our @EXPORT_OK = qw(html_oneshot); use bytes (); # length use PublicInbox::Hval qw(ascii_html prurl); +use Compress::Raw::Zlib qw(Z_FINISH Z_OK); +use PublicInbox::GzipFilter qw(gzip_maybe); our $TOR_URL = 'https://www.torproject.org/'; our $CODE_URL = 'https://public-inbox.org/public-inbox.git'; @@ -178,13 +180,28 @@ sub html_oneshot ($$;$) { ctx => $ctx, base_url => base_url($ctx), }, __PACKAGE__; - my @x = (_html_top($self), $sref ? $$sref : (), _html_end($self)); + my @x; + my @h = ('Content-Type' => 'text/html; charset=UTF-8'); + if (my $gz = gzip_maybe($ctx->{env})) { + my $err = $gz->deflate(_html_top($self), $x[0]); + die "gzip->deflate: $err" if $err != Z_OK; + if ($sref) { + $err = $gz->deflate($sref, $x[0]); + die "gzip->deflate: $err" if $err != Z_OK; + } + $err = $gz->deflate(_html_end($self), $x[0]); + die "gzip->deflate: $err" if $err != Z_OK; + $err = $gz->flush($x[0], Z_FINISH); + die "gzip->flush: $err" if $err != Z_OK; + push @h, qw(Vary Accept-Encoding Content-Encoding gzip); + } else { + @x = (_html_top($self), $sref ? $$sref : (), _html_end($self)); + } + my $len = 0; $len += bytes::length($_) for @x; - [ $code, [ - 'Content-Type' => 'text/html; charset=UTF-8', - 'Content-Length' => $len - ], \@x ]; + push @h, 'Content-Length', $len; + [ $code, \@h, \@x ] } 1;
Our most common endpoints deserve to be gzipped. --- lib/PublicInbox/GzipFilter.pm | 21 +++++++++++++++----- lib/PublicInbox/WwwAtomStream.pm | 25 ++++++++++++++++------- lib/PublicInbox/WwwStream.pm | 34 ++++++++++++++++++++------------ 3 files changed, 55 insertions(+), 25 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 115660cb1..95fced053 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -6,8 +6,9 @@ package PublicInbox::GzipFilter; use strict; use parent qw(Exporter); use Compress::Raw::Zlib qw(Z_FINISH Z_OK); -our @EXPORT_OK = qw(gzip_maybe); +our @EXPORT_OK = qw(gzip_maybe gzf_maybe); my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1); +my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip); sub new { bless {}, shift } @@ -18,18 +19,28 @@ sub attach { $self } -sub gzip_maybe ($) { - my ($env) = @_; +sub gzip_maybe ($$) { + my ($res_hdr, $env) = @_; return if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/; + my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); + return if $err != Z_OK; + # in case Plack::Middleware::Deflater is loaded: $env->{'plack.skip-deflater'} = 1; - my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); - $err == Z_OK ? $gz : undef; + push @$res_hdr, @GZIP_HDRS; + $gz; +} + +sub gzf_maybe ($$) { + my ($res_hdr, $env) = @_; + my $gz = gzip_maybe($res_hdr, $env) or return 0; + bless { gz => $gz }, __PACKAGE__; } # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'} +# Also used for ->getline callbacks sub translate ($$) { my $self = $_[0]; diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm index 9dc24e16e..c407e343f 100644 --- a/lib/PublicInbox/WwwAtomStream.pm +++ b/lib/PublicInbox/WwwAtomStream.pm @@ -14,6 +14,7 @@ use Digest::SHA qw(sha1_hex); use PublicInbox::Address; use PublicInbox::Hval qw(ascii_html mid_href); use PublicInbox::MsgTime qw(msg_timestamp); +use PublicInbox::GzipFilter qw(gzf_maybe); # called by PSGI server after getline: sub close {} @@ -26,18 +27,28 @@ sub new { sub response { my ($class, $ctx, $code, $cb) = @_; - [ $code, [ 'Content-Type', 'application/atom+xml' ], - $class->new($ctx, $cb) ] + my $h = [ 'Content-Type' => 'application/atom+xml' ]; + my $self = $class->new($ctx, $cb); + $self->{gzf} = gzf_maybe($h, $ctx->{env}); + [ $code, $h, $self ] } # called once for each message by PSGI server sub getline { my ($self) = @_; - if (my $middle = $self->{cb}) { - my $smsg = $middle->($self->{ctx}); - return feed_entry($self, $smsg) if $smsg; - } - delete $self->{cb} ? '</feed>' : undef; + my $buf = do { + if (my $middle = $self->{cb}) { + my $smsg = $middle->($self->{ctx}); + feed_entry($self, $smsg) if $smsg; + } + } // (delete($self->{cb}) ? '</feed>' : undef); + + # gzf may be GzipFilter, `undef' or `0' + my $gzf = $self->{gzf} or return $buf; + + return $gzf->translate($buf) if defined $buf; + $self->{gzf} = 0; # next call to ->getline returns $buf (== undef) + $gzf->translate(undef); } # private diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index 79ed6871e..c964dbd41 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -14,7 +14,7 @@ our @EXPORT_OK = qw(html_oneshot); use bytes (); # length use PublicInbox::Hval qw(ascii_html prurl); use Compress::Raw::Zlib qw(Z_FINISH Z_OK); -use PublicInbox::GzipFilter qw(gzip_maybe); +use PublicInbox::GzipFilter qw(gzip_maybe gzf_maybe); our $TOR_URL = 'https://www.torproject.org/'; our $CODE_URL = 'https://public-inbox.org/public-inbox.git'; @@ -41,8 +41,10 @@ sub new { sub response { my ($class, $ctx, $code, $cb) = @_; - [ $code, [ 'Content-Type', 'text/html; charset=UTF-8' ], - $class->new($ctx, $cb) ] + my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ]; + my $self = $class->new($ctx, $cb); + $self->{gzf} = gzf_maybe($h, $ctx->{env}); + [ $code, $h, $self ] } sub _html_top ($) { @@ -165,13 +167,20 @@ sub getline { my ($self) = @_; my $nr = $self->{nr}++; - return _html_top($self) if $nr == 0; + my $buf = do { + if ($nr == 0) { + _html_top($self); + } elsif (my $middle = $self->{cb}) { + $middle->($nr, $self->{ctx}); + } + } // (delete($self->{cb}) ? _html_end($self) : undef); - if (my $middle = $self->{cb}) { - $middle = $middle->($nr, $self->{ctx}) and return $middle; - } + # gzf may be GzipFilter, `undef' or `0' + my $gzf = $self->{gzf} or return $buf; - delete $self->{cb} ? _html_end($self) : undef; + return $gzf->translate($buf) if defined $buf; + $self->{gzf} = 0; # next call to ->getline returns $buf (== undef) + $gzf->translate(undef); } sub html_oneshot ($$;$) { @@ -181,8 +190,8 @@ sub html_oneshot ($$;$) { base_url => base_url($ctx), }, __PACKAGE__; my @x; - my @h = ('Content-Type' => 'text/html; charset=UTF-8'); - if (my $gz = gzip_maybe($ctx->{env})) { + my $h = [ 'Content-Type' => 'text/html; charset=UTF-8' ]; + if (my $gz = gzip_maybe($h, $ctx->{env})) { my $err = $gz->deflate(_html_top($self), $x[0]); die "gzip->deflate: $err" if $err != Z_OK; if ($sref) { @@ -193,15 +202,14 @@ sub html_oneshot ($$;$) { die "gzip->deflate: $err" if $err != Z_OK; $err = $gz->flush($x[0], Z_FINISH); die "gzip->flush: $err" if $err != Z_OK; - push @h, qw(Vary Accept-Encoding Content-Encoding gzip); } else { @x = (_html_top($self), $sref ? $$sref : (), _html_end($self)); } my $len = 0; $len += bytes::length($_) for @x; - push @h, 'Content-Length', $len; - [ $code, \@h, \@x ] + push @$h, 'Content-Length', $len; + [ $code, $h, \@x ] } 1;
Most of our plain-text responses are config files big enough to warrant compression. --- lib/PublicInbox/WwwText.pm | 17 ++++++++++++++--- t/psgi_text.t | 33 ++++++++++++++++++++++++++------- 2 files changed, 40 insertions(+), 10 deletions(-) diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm index b23a415e4..508005fba 100644 --- a/lib/PublicInbox/WwwText.pm +++ b/lib/PublicInbox/WwwText.pm @@ -10,6 +10,8 @@ use PublicInbox::Linkify; use PublicInbox::WwwStream; use PublicInbox::Hval qw(ascii_html); use URI::Escape qw(uri_escape_utf8); +use PublicInbox::GzipFilter qw(gzf_maybe); +use Compress::Raw::Zlib qw(Z_FINISH Z_OK); our $QP_URL = 'https://xapian.org/docs/queryparser.html'; our $WIKI_URL = 'https://en.wikipedia.org/wiki'; my $hl = eval { @@ -35,14 +37,23 @@ sub get_text { $code = 404; $txt = "404 Not Found ($key)\n"; } + my $env = $ctx->{env}; if ($raw) { - $hdr->[3] = bytes::length($txt); - return [ $code, $hdr, [ $txt ] ] + my $body; + if (my $gzf = $code == 200 ? gzf_maybe($hdr, $env) : undef) { + my $zbuf = $gzf->translate($txt); + undef $txt; + $body = [ $zbuf .= $gzf->translate(undef) ]; + } else { + $body = [ $txt ]; + } + $hdr->[3] = bytes::length($body->[0]); + return [ $code, $hdr, $body ] } # enforce trailing slash for "wget -r" compatibility if (!$have_tslash && $code == 200) { - my $url = $ctx->{-inbox}->base_url($ctx->{env}); + my $url = $ctx->{-inbox}->base_url($env); $url .= "_/text/$key/"; return [ 302, [ 'Content-Type', 'text/plain', diff --git a/t/psgi_text.t b/t/psgi_text.t index 833bcaba7..9867feaa4 100644 --- a/t/psgi_text.t +++ b/t/psgi_text.t @@ -10,7 +10,7 @@ my $maindir = "$tmpdir/main.git"; my $addr = 'test-public@example.com'; my $cfgpfx = "publicinbox.test"; my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape Plack::Builder); -require_mods(@mods); +require_mods(@mods, 'IO::Uncompress::Gunzip'); use_ok $_ foreach @mods; use PublicInbox::Import; use PublicInbox::Git; @@ -26,17 +26,36 @@ my $www = PublicInbox::WWW->new($config); test_psgi(sub { $www->call(@_) }, sub { my ($cb) = @_; - my $res; - $res = $cb->(GET('/test/_/text/help/')); - like($res->content, qr!<title>public-inbox help.*</title>!, - 'default help'); - $res = $cb->(GET('/test/_/text/config/raw')); + my $gunzipped; + my $req = GET('/test/_/text/help/'); + my $res = $cb->($req); + my $content = $res->content; + like($content, qr!<title>public-inbox help.*</title>!, 'default help'); + $req->header('Accept-Encoding' => 'gzip'); + $res = $cb->($req); + is($res->header('Content-Encoding'), 'gzip', 'got gzip encoding'); + is($res->header('Content-Type'), 'text/html; charset=UTF-8', + 'got gzipped HTML'); + IO::Uncompress::Gunzip::gunzip(\($res->content) => \$gunzipped); + is($gunzipped, $content, 'gzipped content is correct'); + + $req = GET('/test/_/text/config/raw'); + $res = $cb->($req); + $content = $res->content; + my $olen = $res->header('Content-Length'); my $f = "$tmpdir/cfg"; open my $fh, '>', $f or die; - print $fh $res->content or die; + print $fh $content or die; close $fh or die; my $cfg = PublicInbox::Config->new($f); is($cfg->{"$cfgpfx.address"}, $addr, 'got expected address in config'); + + $req->header('Accept-Encoding' => 'gzip'); + $res = $cb->($req); + is($res->header('Content-Encoding'), 'gzip', 'got gzip encoding'); + ok($res->header('Content-Length') < $olen, 'gzipped help is smaller'); + IO::Uncompress::Gunzip::gunzip(\($res->content) => \$gunzipped); + is($gunzipped, $content); }); done_testing();
No point in streaming a tiny response via ->getline, but we may stream to a gzipped buffer, later. --- lib/PublicInbox/WwwText.pm | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm index 508005fba..ea50428e3 100644 --- a/lib/PublicInbox/WwwText.pm +++ b/lib/PublicInbox/WwwText.pm @@ -64,25 +64,18 @@ sub get_text { # Follow git commit message conventions, # first line is the Subject/title my ($title) = ($txt =~ /\A([^\n]*)/s); - $ctx->{txt} = \$txt; $ctx->{-title_html} = ascii_html($title); my $nslash = ($key =~ tr!/!/!); $ctx->{-upfx} = '../../../' . ('../' x $nslash); - PublicInbox::WwwStream->response($ctx, $code, \&_do_linkify); -} - -sub _do_linkify { - my ($nr, $ctx) = @_; - return unless $nr == 1; my $l = PublicInbox::Linkify->new; - my $txt = delete $ctx->{txt}; - $l->linkify_1($$txt); + $l->linkify_1($txt); if ($hl) { - $hl->do_hl_text($txt); + $hl->do_hl_text(\$txt); } else { - $$txt = ascii_html($$txt); + $txt = ascii_html($txt); } - '<pre>' . $l->linkify_2($$txt) . '</pre>'; + $txt = '<pre>' . $l->linkify_2($txt) . '</pre>'; + PublicInbox::WwwStream::html_oneshot($ctx, $code, \$txt); } sub _srch_prefix ($$) {
It'll give us a nicer HTML header and footer. --- lib/PublicInbox/WWW.pm | 9 ++++----- t/plack.t | 4 ++++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm index 289d0ce48..e4ad515a4 100644 --- a/lib/PublicInbox/WWW.pm +++ b/lib/PublicInbox/WWW.pm @@ -322,12 +322,11 @@ sub get_altid_dump { sub need { my ($ctx, $extra) = @_; - my $msg = <<EOF; -<html><head><title>$extra not available for this -public-inbox</title><body><pre>$extra is not available for this public-inbox -<a href="../">Return to index</a></pre></body></html> + require PublicInbox::WwwStream; + PublicInbox::WwwStream::html_oneshot($ctx, 501, \<<EOF); +<pre>$extra is not available for this public-inbox +<a\nhref="../">Return to index</a></pre> EOF - [ 501, [ 'Content-Type' => 'text/html; charset=UTF-8' ], [ $msg ] ]; } # /$INBOX/$MESSAGE_ID/t.mbox -> thread as mbox diff --git a/t/plack.t b/t/plack.t index 37a6b3948..4b830a21e 100644 --- a/t/plack.t +++ b/t/plack.t @@ -209,6 +209,10 @@ test_psgi($app, sub { my $res = $cb->(GET($pfx . '/blah@example.com/raw')); is(200, $res->code, 'success response received for /*/raw'); like($res->content, qr!^From !sm, "mbox returned"); + + $res = $cb->(GET($pfx . '/blah@example.com/t.mbox.gz')); + is(501, $res->code, '501 when overview missing'); + like($res->content, qr!\bOverview\b!, 'overview omission noted'); }); # legacy redirects
The changes to GzipFilter here may be beneficial for building HTML and XML responses in other places, too. --- MANIFEST | 1 + lib/PublicInbox/GzipFilter.pm | 28 ++++++++++++++++++++++++++-- lib/PublicInbox/NoopFilter.pm | 13 +++++++++++++ lib/PublicInbox/WwwListing.pm | 25 +++++++++++++++---------- t/www_listing.t | 8 +++++++- 5 files changed, 62 insertions(+), 13 deletions(-) create mode 100644 lib/PublicInbox/NoopFilter.pm diff --git a/MANIFEST b/MANIFEST index 6de2c7258..dcd7a7e5f 100644 --- a/MANIFEST +++ b/MANIFEST @@ -159,6 +159,7 @@ lib/PublicInbox/NNTP.pm lib/PublicInbox/NNTPD.pm lib/PublicInbox/NNTPdeflate.pm lib/PublicInbox/NewsWWW.pm +lib/PublicInbox/NoopFilter.pm lib/PublicInbox/Over.pm lib/PublicInbox/OverIdx.pm lib/PublicInbox/ParentPipe.pm diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 95fced053..8cc5ea00b 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -42,7 +42,7 @@ sub gzf_maybe ($$) { # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'} # Also used for ->getline callbacks sub translate ($$) { - my $self = $_[0]; + my $self = $_[0]; # $_[1] => input # allocate the zlib context lazily here, instead of in ->new. # Deflate contexts are memory-intensive and this object may @@ -72,10 +72,34 @@ sub write { $_[0]->{fh}->write(translate($_[0], $_[1])); } +# similar to ->translate; use this when we're sure we know we have +# more data to buffer after this +sub zmore { + my $self = $_[0]; # $_[1] => input + my $err = $self->{gz}->deflate($_[1], $self->{zbuf}); + die "gzip->deflate: $err" if $err != Z_OK; + ''; +} + +# flushes and returns the final bit of gzipped data +sub zflush ($;$) { + my $self = $_[0]; # $_[1] => final input (optional) + my $zbuf = delete $self->{zbuf}; + my $gz = delete $self->{gz}; + my $err; + if (defined $_[1]) { + $err = $gz->deflate($_[1], $zbuf); + die "gzip->deflate: $err" if $err != Z_OK; + } + $err = $gz->flush($zbuf, Z_FINISH); + die "gzip->flush: $err" if $err != Z_OK; + $zbuf; +} + sub close { my ($self) = @_; my $fh = delete $self->{fh}; - $fh->write(translate($self, undef)); + $fh->write(zflush($self)); $fh->close; } diff --git a/lib/PublicInbox/NoopFilter.pm b/lib/PublicInbox/NoopFilter.pm new file mode 100644 index 000000000..b9c00ff7a --- /dev/null +++ b/lib/PublicInbox/NoopFilter.pm @@ -0,0 +1,13 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +package PublicInbox::NoopFilter; +use strict; + +sub new { bless \(my $ignore), __PACKAGE__ } + +# noop workalike for PublicInbox::GzipFilter methods +sub translate { $_[1] // '' } +sub zmore { $_[1] } +sub zflush { $_[1] // '' } +1; diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm index a3d4e2b35..780c97e91 100644 --- a/lib/PublicInbox/WwwListing.pm +++ b/lib/PublicInbox/WwwListing.pm @@ -10,6 +10,8 @@ use PublicInbox::Hval qw(ascii_html prurl); use PublicInbox::Linkify; use PublicInbox::View; use PublicInbox::Inbox; +use PublicInbox::NoopFilter; +use PublicInbox::GzipFilter qw(gzf_maybe); use bytes (); # bytes::length use HTTP::Date qw(time2str); use Digest::SHA (); @@ -104,13 +106,15 @@ sub ibx_entry { sub html ($$) { my ($env, $list) = @_; - my $title = 'public-inbox'; - my $out = ''; + my $h = [ 'Content-Type', 'text/html; charset=UTF-8', + 'Content-Length', undef ]; + my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new(); + my $out = $gzf->zmore('<html><head><title>' . + 'public-inbox listing</title>' . + '</head><body><pre>'); my $code = 404; if (@$list) { - $title .= ' - listing'; $code = 200; - # Schwartzian transform since Inbox->modified is expensive @$list = sort { $b->[0] <=> $a->[0] @@ -118,13 +122,14 @@ sub html ($$) { my $tmp = join("\n", map { ibx_entry(@$_, $env) } @$list); my $l = PublicInbox::Linkify->new; - $out = '<pre>'.$l->to_html($tmp).'</pre><hr>'; + $out .= $gzf->zmore($l->to_html($tmp)); + } else { + $out .= $gzf->zmore('no inboxes, yet'); } - $out = "<html><head><title>$title</title></head><body>" . $out; - $out .= '<pre>'. PublicInbox::WwwStream::code_footer($env) . - '</pre></body></html>'; - - my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ]; + $out .= $gzf->zflush('</pre><hr><pre>'. + PublicInbox::WwwStream::code_footer($env) . + '</pre></body></html>'); + $h->[3] = bytes::length($out); [ $code, $h, [ $out ] ]; } diff --git a/t/www_listing.t b/t/www_listing.t index 0aededd43..c4511cd1f 100644 --- a/t/www_listing.t +++ b/t/www_listing.t @@ -35,13 +35,19 @@ like(PublicInbox::WwwListing::fingerprint($bare), qr/\A[a-f0-9]{40}\z/, sub tiny_test { my ($json, $host, $port) = @_; + my $tmp; my $http = HTTP::Tiny->new; my $res = $http->get("http://$host:$port/"); is($res->{status}, 200, 'got HTML listing'); like($res->{content}, qr!</html>!si, 'listing looks like HTML'); + + $res = $http->get("http://$host:$port/", {'Accept-Encoding'=>'gzip'}); + is($res->{status}, 200, 'got gzipped HTML listing'); + IO::Uncompress::Gunzip::gunzip(\(delete $res->{content}) => \$tmp); + like($tmp, qr!</html>!si, 'unzipped listing looks like HTML'); + $res = $http->get("http://$host:$port/manifest.js.gz"); is($res->{status}, 200, 'got manifest'); - my $tmp; IO::Uncompress::Gunzip::gunzip(\(delete $res->{content}) => \$tmp); unlike($tmp, qr/"modified":\s*"/, 'modified is an integer'); my $manifest = $json->decode($tmp);
The new ->zmore and ->zflush APIs make it possible to replace existing verbose usages of Compress::Raw::Deflate and simplify buffering logic for streaming large gzipped data. One potentially user visible change is we now break the mbox.gz response on zlib failures, instead of silently continuing onto the next message. zlib only seems to fail on OOM, which should be rare; so it's ideal we drop the connection anyways. --- lib/PublicInbox/GzipFilter.pm | 27 ++++++++++------------- lib/PublicInbox/MboxGz.pm | 41 ++++++++++------------------------- lib/PublicInbox/WwwStream.pm | 27 ++++++++--------------- 3 files changed, 31 insertions(+), 64 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 8cc5ea00b..d2eb4e664 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -6,7 +6,7 @@ package PublicInbox::GzipFilter; use strict; use parent qw(Exporter); use Compress::Raw::Zlib qw(Z_FINISH Z_OK); -our @EXPORT_OK = qw(gzip_maybe gzf_maybe); +our @EXPORT_OK = qw(gzf_maybe); my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1); my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip); @@ -19,24 +19,23 @@ sub attach { $self } -sub gzip_maybe ($$) { +# returns `0' and not `undef' on failure (see Www*Stream) +sub gzf_maybe ($$) { my ($res_hdr, $env) = @_; - return if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/; - + return 0 if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/; my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); - return if $err != Z_OK; + return 0 if $err != Z_OK; # in case Plack::Middleware::Deflater is loaded: $env->{'plack.skip-deflater'} = 1; - push @$res_hdr, @GZIP_HDRS; - $gz; + bless { gz => $gz }, __PACKAGE__; } -sub gzf_maybe ($$) { - my ($res_hdr, $env) = @_; - my $gz = gzip_maybe($res_hdr, $env) or return 0; - bless { gz => $gz }, __PACKAGE__; +sub gzip_or_die () { + my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); + $err == Z_OK or die "Deflate->new failed: $err"; + $gz; } # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'} @@ -47,11 +46,7 @@ sub translate ($$) { # allocate the zlib context lazily here, instead of in ->new. # Deflate contexts are memory-intensive and this object may # be sitting in the Qspawn limiter queue for a while. - my $gz = $self->{gz} //= do { - my ($g, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); - $err == Z_OK or die "Deflate->new failed: $err"; - $g; - }; + my $gz = $self->{gz} //= gzip_or_die(); my $zbuf = delete($self->{zbuf}); if (defined $_[1]) { # my $buf = $_[1]; my $err = $gz->deflate($_[1], $zbuf); diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm index f7fc4afc1..535ef96c9 100644 --- a/lib/PublicInbox/MboxGz.pm +++ b/lib/PublicInbox/MboxGz.pm @@ -2,19 +2,19 @@ # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> package PublicInbox::MboxGz; use strict; -use warnings; +use parent 'PublicInbox::GzipFilter'; use PublicInbox::Eml; use PublicInbox::Hval qw/to_filename/; use PublicInbox::Mbox; -use Compress::Raw::Zlib qw(Z_FINISH Z_OK); -my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1); sub new { my ($class, $ctx, $cb) = @_; $ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env}); - my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); - $err == Z_OK or die "Deflate->new failed: $err"; - bless { gz => $gz, cb => $cb, ctx => $ctx }, $class; + bless { + gz => PublicInbox::GzipFilter::gzip_or_die(), + cb => $cb, + ctx => $ctx + }, $class; } sub response { @@ -27,40 +27,21 @@ sub response { [ 200, $h, $body ]; } -sub gzip_fail ($$) { - my ($ctx, $err) = @_; - $ctx->{env}->{'psgi.errors'}->print("deflate failed: $err\n"); - ''; -} - # called by Plack::Util::foreach or similar sub getline { my ($self) = @_; my $ctx = $self->{ctx} or return; - my $gz = $self->{gz}; - my $buf = delete($self->{buf}); while (my $smsg = $self->{cb}->($ctx)) { my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next; my $h = PublicInbox::Eml->new($mref)->header_obj; - - my $err = $gz->deflate( - PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}), - $buf); - return gzip_fail($ctx, $err) if $err != Z_OK; - - $err = $gz->deflate(PublicInbox::Mbox::msg_body($$mref), $buf); - return gzip_fail($ctx, $err) if $err != Z_OK; - - return $buf if length($buf) >= 8192; - - # be fair to other clients on public-inbox-httpd: - $self->{buf} = $buf; - return ''; + $self->zmore( + PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}) + ); + return $self->translate(PublicInbox::Mbox::msg_body($$mref)); } # signal that we're done and can return undef next call: delete $self->{ctx}; - my $err = $gz->flush($buf, Z_FINISH); - ($err == Z_OK) ? $buf : gzip_fail($ctx, $err); + $self->zflush; } sub close {} # noop diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index c964dbd41..8623440b8 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -13,8 +13,7 @@ use base qw(Exporter); our @EXPORT_OK = qw(html_oneshot); use bytes (); # length use PublicInbox::Hval qw(ascii_html prurl); -use Compress::Raw::Zlib qw(Z_FINISH Z_OK); -use PublicInbox::GzipFilter qw(gzip_maybe gzf_maybe); +use PublicInbox::GzipFilter qw(gzf_maybe); our $TOR_URL = 'https://www.torproject.org/'; our $CODE_URL = 'https://public-inbox.org/public-inbox.git'; @@ -190,25 +189,17 @@ sub html_oneshot ($$;$) { base_url => base_url($ctx), }, __PACKAGE__; my @x; - my $h = [ 'Content-Type' => 'text/html; charset=UTF-8' ]; - if (my $gz = gzip_maybe($h, $ctx->{env})) { - my $err = $gz->deflate(_html_top($self), $x[0]); - die "gzip->deflate: $err" if $err != Z_OK; - if ($sref) { - $err = $gz->deflate($sref, $x[0]); - die "gzip->deflate: $err" if $err != Z_OK; - } - $err = $gz->deflate(_html_end($self), $x[0]); - die "gzip->deflate: $err" if $err != Z_OK; - $err = $gz->flush($x[0], Z_FINISH); - die "gzip->flush: $err" if $err != Z_OK; + my $h = [ 'Content-Type' => 'text/html; charset=UTF-8', + 'Content-Length' => undef ]; + if (my $gzf = gzf_maybe($h, $ctx->{env})) { + $gzf->zmore(_html_top($self)); + $gzf->zmore($$sref) if $sref; + $x[0] = $gzf->zflush(_html_end($self)); + $h->[3] = length($x[0]); } else { @x = (_html_top($self), $sref ? $$sref : (), _html_end($self)); + $h->[3] += bytes::length($_) for @x; } - - my $len = 0; - $len += bytes::length($_) for @x; - push @$h, 'Content-Length', $len; [ $code, $h, \@x ] }
This simplifies callers, as witnessed by the change to WwwListing. It adds overhead to NoopFilter, but NoopFilter should see little use as nearly all HTTP clients request gzip. --- lib/PublicInbox/GzipFilter.pm | 2 +- lib/PublicInbox/NoopFilter.pm | 19 +++++++++++++++---- lib/PublicInbox/WwwListing.pm | 8 ++++---- 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index d2eb4e664..0fbb4476a 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -73,7 +73,7 @@ sub zmore { my $self = $_[0]; # $_[1] => input my $err = $self->{gz}->deflate($_[1], $self->{zbuf}); die "gzip->deflate: $err" if $err != Z_OK; - ''; + undef; } # flushes and returns the final bit of gzipped data diff --git a/lib/PublicInbox/NoopFilter.pm b/lib/PublicInbox/NoopFilter.pm index b9c00ff7a..a97dbde64 100644 --- a/lib/PublicInbox/NoopFilter.pm +++ b/lib/PublicInbox/NoopFilter.pm @@ -4,10 +4,21 @@ package PublicInbox::NoopFilter; use strict; -sub new { bless \(my $ignore), __PACKAGE__ } +sub new { bless \(my $self = ''), __PACKAGE__ } # noop workalike for PublicInbox::GzipFilter methods -sub translate { $_[1] // '' } -sub zmore { $_[1] } -sub zflush { $_[1] // '' } +sub translate { + my $self = $_[0]; + my $ret = $$self .= ($_[1] // ''); + $$self = ''; + $ret; +} + +sub zmore { + ${$_[0]} .= $_[1]; + undef; +} + +sub zflush { translate($_[0], $_[1]) } + 1; diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm index 780c97e91..d641e6d5c 100644 --- a/lib/PublicInbox/WwwListing.pm +++ b/lib/PublicInbox/WwwListing.pm @@ -109,7 +109,7 @@ sub html ($$) { my $h = [ 'Content-Type', 'text/html; charset=UTF-8', 'Content-Length', undef ]; my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new(); - my $out = $gzf->zmore('<html><head><title>' . + $gzf->zmore('<html><head><title>' . 'public-inbox listing</title>' . '</head><body><pre>'); my $code = 404; @@ -122,11 +122,11 @@ sub html ($$) { my $tmp = join("\n", map { ibx_entry(@$_, $env) } @$list); my $l = PublicInbox::Linkify->new; - $out .= $gzf->zmore($l->to_html($tmp)); + $gzf->zmore($l->to_html($tmp)); } else { - $out .= $gzf->zmore('no inboxes, yet'); + $gzf->zmore('no inboxes, yet'); } - $out .= $gzf->zflush('</pre><hr><pre>'. + my $out = $gzf->zflush('</pre><hr><pre>'. PublicInbox::WwwStream::code_footer($env) . '</pre></body></html>'); $h->[3] = bytes::length($out);
It's no longer needed, we no longer show a runtime error for zlib being missing, as zlib is a hard requirement. Fixes: a318e758129d616b ("make zlib-related modules a hard dependency") --- lib/PublicInbox/Mbox.pm | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm index b46dacfdc..4c0b01edf 100644 --- a/lib/PublicInbox/Mbox.pm +++ b/lib/PublicInbox/Mbox.pm @@ -13,7 +13,6 @@ use warnings; use PublicInbox::MID qw/mid_escape/; use PublicInbox::Hval qw/to_filename/; use PublicInbox::Smsg; -use PublicInbox::WwwStream qw(html_oneshot); use PublicInbox::Eml; sub subject_fn ($) {
This will allow others to mimic our award-winning homepage design without needing to rely on Plack::Middleware::Deflater or varnish to compress responses. --- lib/PublicInbox/WwwStatic.pm | 15 ++++++++++----- t/www_static.t | 11 ++++++++++- 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm index 3c9331564..d0611949d 100644 --- a/lib/PublicInbox/WwwStatic.pm +++ b/lib/PublicInbox/WwwStatic.pm @@ -17,6 +17,8 @@ use HTTP::Date qw(time2str); use HTTP::Status qw(status_message); use Errno qw(EACCES ENOTDIR ENOENT); use URI::Escape qw(uri_escape_utf8); +use PublicInbox::NoopFilter; +use PublicInbox::GzipFilter qw(gzf_maybe); use PublicInbox::Hval qw(ascii_html); use Plack::MIME; our @EXPORT_OK = qw(@NO_CACHE r path_info_raw); @@ -310,12 +312,15 @@ sub dir_response ($$$) { (map { ${$other{$_}} } sort keys %other)); my $path_info_html = ascii_html($path_info); - my $body = "<html><head><title>Index of $path_info_html</title>" . + my $h = [qw(Content-Type text/html Content-Length), undef]; + my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new(); + $gzf->zmore("<html><head><title>Index of $path_info_html</title>" . ${$self->{style}} . - "</head><body><pre>Index of $path_info_html</pre><hr><pre>\n"; - $body .= join("\n", @entries) . "</pre><hr></body></html>\n"; - [ 200, [ qw(Content-Type text/html - Content-Length), bytes::length($body) ], [ $body ] ] + "</head><body><pre>Index of $path_info_html</pre><hr><pre>\n"); + $gzf->zmore(join("\n", @entries)); + my $out = $gzf->zflush("</pre><hr></body></html>\n"); + $h->[3] = bytes::length($out); + [ 200, $h, [ $out ] ] } sub call { # PSGI app endpoint diff --git a/t/www_static.t b/t/www_static.t index 10757cb7f..364b9447a 100644 --- a/t/www_static.t +++ b/t/www_static.t @@ -6,7 +6,7 @@ use Test::More; use PublicInbox::TestCommon; my ($tmpdir, $for_destroy) = tmpdir(); my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape); -require_mods(@mods); +require_mods(@mods, 'IO::Uncompress::Gunzip'); use_ok $_ foreach @mods; use_ok 'PublicInbox::WwwStatic'; @@ -91,6 +91,15 @@ test_psgi($app->(autoindex => 1, index => []), sub { $get->header('Accept-Encoding' => 'gzip'); $res = $cb->($get); is($res->content, "hi", 'got compressed on mtime match'); + + $get = GET('/dir/'); + $get->header('Accept-Encoding' => 'gzip'); + $res = $cb->($get); + my $in = $res->content; + my $out = ''; + IO::Uncompress::Gunzip::gunzip(\$in => \$out); + like($out, qr/\A<html>/, 'got HTML start after gunzip'); + like($out, qr{</html>$}, 'got HTML end after gunzip'); }); done_testing();
This will allow us to gzip responses generated by cgit and any other CGI programs or long-lived streaming responses we may spawn. --- lib/PublicInbox/GzipFilter.pm | 16 ++++++++++++++++ lib/PublicInbox/Qspawn.pm | 6 ++++-- t/httpd-corner.psgi | 7 +++++++ t/httpd-corner.t | 9 ++++++++- 4 files changed, 35 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 0fbb4476a..0a6c56a5d 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -32,6 +32,22 @@ sub gzf_maybe ($$) { bless { gz => $gz }, __PACKAGE__; } +sub qsp_maybe ($$) { + my ($res_hdr, $env) = @_; + return if ($env->{HTTP_ACCEPT_ENCODING} // '') !~ /\bgzip\b/; + my $hdr = join("\n", @$res_hdr); + return if $hdr !~ m!^Content-Type\n + (?:(?:text/(?:html|plain))| + application/atom\+xml)\b!ixsm; + return if $hdr =~ m!^Content-Encoding\ngzip\n!smi; + return if $hdr =~ m!^Content-Length\n[0-9]+\n!smi; + return if $hdr =~ m!^Transfer-Encoding\n!smi; + # in case Plack::Middleware::Deflater is loaded: + return if $env->{'plack.skip-deflater'}++; + push @$res_hdr, @GZIP_HDRS; + bless {}, __PACKAGE__; +} + sub gzip_or_die () { my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); $err == Z_OK or die "Deflate->new failed: $err"; diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm index d395a10b3..88b6d390a 100644 --- a/lib/PublicInbox/Qspawn.pm +++ b/lib/PublicInbox/Qspawn.pm @@ -25,8 +25,8 @@ package PublicInbox::Qspawn; use strict; -use warnings; use PublicInbox::Spawn qw(popen_rd); +use PublicInbox::GzipFilter; # n.b.: we get EAGAIN with public-inbox-httpd, and EINTR on other PSGI servers use Errno qw(EAGAIN EINTR); @@ -255,7 +255,9 @@ sub psgi_return_init_cb { my ($self) = @_; my $r = rd_hdr($self) or return; my $env = $self->{psgi_env}; - my $filter = delete $env->{'qspawn.filter'}; + my $filter = delete $env->{'qspawn.filter'} // + PublicInbox::GzipFilter::qsp_maybe($r->[1], $env); + my $wcb = delete $env->{'qspawn.wcb'}; my $async = delete $self->{async}; if (scalar(@$r) == 3) { # error diff --git a/t/httpd-corner.psgi b/t/httpd-corner.psgi index 446296200..cb41cfa05 100644 --- a/t/httpd-corner.psgi +++ b/t/httpd-corner.psgi @@ -94,6 +94,13 @@ my $app = sub { return $qsp->psgi_return($env, undef, sub { [ 200, [ qw(Content-Type application/octet-stream)]] }); + } elsif ($path eq '/psgi-return-compressible') { + require PublicInbox::Qspawn; + my $cmd = [qw(echo goodbye world)]; + my $qsp = PublicInbox::Qspawn->new($cmd); + return $qsp->psgi_return($env, undef, sub { + [200, [qw(Content-Type text/plain)]] + }); } elsif ($path eq '/psgi-return-enoent') { require PublicInbox::Qspawn; my $cmd = [ 'this-better-not-exist-in-PATH'.rand ]; diff --git a/t/httpd-corner.t b/t/httpd-corner.t index 681486550..514672a1b 100644 --- a/t/httpd-corner.t +++ b/t/httpd-corner.t @@ -340,11 +340,18 @@ SKIP: { is($n, 30 * 1024 * 1024, 'got expected output from curl'); is($non_zero, 0, 'read all zeros'); - require_mods(@zmods, 2); + require_mods(@zmods, 4); my $buf = xqx([$curl, '-sS', "$base/psgi-return-gzip"]); is($?, 0, 'curl succesful'); IO::Uncompress::Gunzip::gunzip(\$buf => \(my $out)); is($out, "hello world\n"); + my $curl_rdr = { 2 => \(my $curl_err = '') }; + $buf = xqx([$curl, qw(-sSv --compressed), + "$base/psgi-return-compressible"], undef, $curl_rdr); + is($?, 0, 'curl --compressed successful'); + is($buf, "goodbye world\n", 'gzipped response as expected'); + like($curl_err, qr/\bContent-Encoding: gzip\b/, + 'curl got gzipped response'); } {
Instead of gzipping some (mbox.gz, manifest.js.gz) responses and leaving P::M::D to do the rest, we gzip everything ourselves, now, so P::M::D is redundant. --- Documentation/public-inbox-httpd.pod | 1 - INSTALL | 5 ----- ci/deps.perl | 1 - examples/cgit.psgi | 8 -------- examples/newswww.psgi | 8 -------- examples/public-inbox.psgi | 9 --------- examples/unsubscribe.psgi | 1 - script/public-inbox-httpd | 9 --------- script/public-inbox.cgi | 7 ------- 9 files changed, 49 deletions(-) diff --git a/Documentation/public-inbox-httpd.pod b/Documentation/public-inbox-httpd.pod index c1ebfd822..2f4e9e5d1 100644 --- a/Documentation/public-inbox-httpd.pod +++ b/Documentation/public-inbox-httpd.pod @@ -15,7 +15,6 @@ the PSGI file. If a PSGI file is not specified, L<PublicInbox::WWW> is loaded with a default middleware stack consisting of -L<Plack::Middleware::Deflater>, L<Plack::Middleware::ReverseProxy>, and L<Plack::Middleware::Head> diff --git a/INSTALL b/INSTALL index 05e0f95e9..9f05c3f62 100644 --- a/INSTALL +++ b/INSTALL @@ -100,11 +100,6 @@ Numerous optional modules are likely to be useful as well: (ensures redirects are correct when running behind nginx or Varnish) -- Plack::Middleware::Deflater deb: libplack-middleware-deflater-perl - pkg: p5 -Plack-Middleware-Deflater - rpm: perl-Plack-Middleware-Deflater - (saves bandwidth on responses) - * highlight deb: libhighlight-perl (for syntax highlighting with coderepo) diff --git a/ci/deps.perl b/ci/deps.perl index 501f51129..77d95fc8e 100755 --- a/ci/deps.perl +++ b/ci/deps.perl @@ -36,7 +36,6 @@ my $profiles = { Net::Server Plack Plack::Test - Plack::Middleware::Deflater Plack::Middleware::ReverseProxy Search::Xapian Socket6 diff --git a/examples/cgit.psgi b/examples/cgit.psgi index e72e832d0..7ad38e280 100644 --- a/examples/cgit.psgi +++ b/examples/cgit.psgi @@ -18,14 +18,6 @@ my $pi_config = PublicInbox::Config->new; my $cgit = PublicInbox::Cgit->new($pi_config); builder { - eval { - enable 'Deflater', - content_type => [ qw( - text/html - text/plain - application/atom+xml - )] - }; eval { enable 'ReverseProxy' }; enable 'Head'; sub { $cgit->call($_[0]) } diff --git a/examples/newswww.psgi b/examples/newswww.psgi index 3cce7191d..52ad7043e 100644 --- a/examples/newswww.psgi +++ b/examples/newswww.psgi @@ -36,14 +36,6 @@ builder { # regular PublicInbox::WWW code: # see comments in examples/public-inbox.psgi for more info: mount '/' => builder { - eval { - enable 'Deflater', - content_type => [ qw( - text/html - text/plain - application/atom+xml - )] - }; eval { enable 'ReverseProxy' }; enable 'Head'; sub { $www->call($_[0]) } diff --git a/examples/public-inbox.psgi b/examples/public-inbox.psgi index 9891a1f03..3537be2c7 100644 --- a/examples/public-inbox.psgi +++ b/examples/public-inbox.psgi @@ -22,15 +22,6 @@ my $src = $ENV{SRC_GIT_DIR}; # '/path/to/public-inbox.git' $src = PublicInbox::Git->new($src) if defined $src; builder { - eval { - enable 'Deflater', - content_type => [ qw( - text/html - text/plain - application/atom+xml - )] - }; - # Enable to ensure redirects and Atom feed URLs are generated # properly when running behind a reverse proxy server which # sets the X-Forwarded-Proto request header. diff --git a/examples/unsubscribe.psgi b/examples/unsubscribe.psgi index 6a40f251d..7b97e2532 100644 --- a/examples/unsubscribe.psgi +++ b/examples/unsubscribe.psgi @@ -61,7 +61,6 @@ my $app = PublicInbox::Unsubscribe->new( builder { mount '/u' => builder { - eval { enable 'Deflater' }; # optional eval { enable 'ReverseProxy' }; # optional enable 'Head'; sub { $app->call(@_) }; diff --git a/script/public-inbox-httpd b/script/public-inbox-httpd index 09da505e5..b8159f3a5 100755 --- a/script/public-inbox-httpd +++ b/script/public-inbox-httpd @@ -27,15 +27,6 @@ my $refresh = sub { my $www = PublicInbox::WWW->new; $www->preload; $app = builder { - eval { - enable 'Deflater', - content_type => [ qw( - text/html - text/plain - application/atom+xml - )] - }; - eval { enable 'ReverseProxy' }; $@ and warn "Plack::Middleware::ReverseProxy missing,\n", diff --git a/script/public-inbox.cgi b/script/public-inbox.cgi index c766483a2..42ab17c9e 100755 --- a/script/public-inbox.cgi +++ b/script/public-inbox.cgi @@ -13,14 +13,7 @@ BEGIN { PublicInbox::WWW->preload if $ENV{MOD_PERL}; } my $www = PublicInbox::WWW->new; -my $have_deflater = eval { require Plack::Middleware::Deflater; 1 }; my $app = builder { - if ($have_deflater) { - enable 'Deflater', - content_type => [ 'text/html', 'text/plain', - 'application/atom+xml' ]; - } - # Enable to ensure redirects and Atom feed URLs are generated # properly when running behind a reverse proxy server which # sets the X-Forwarded-Proto request header.
This lets the -httpd worker process make better use of time instead of waiting for git-cat-file to respond. With 4 jobs in the new test case against a clone of <https://public-inbox.org/meta/>, a speedup of 10-12% is shown. Even a single job shows a 2-5% improvement on an SSD. --- MANIFEST | 1 + lib/PublicInbox/HTTP.pm | 7 +++ lib/PublicInbox/MboxGz.pm | 69 +++++++++++++++++++++++---- xt/httpd-async-stream.t | 99 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 167 insertions(+), 9 deletions(-) create mode 100644 xt/httpd-async-stream.t diff --git a/MANIFEST b/MANIFEST index dcd7a7e5f..9b0f50203 100644 --- a/MANIFEST +++ b/MANIFEST @@ -368,6 +368,7 @@ xt/cmp-msgview.t xt/eml_check_limits.t xt/git-http-backend.t xt/git_async_cmp.t +xt/httpd-async-stream.t xt/imapd-mbsync-oimap.t xt/imapd-validate.t xt/mem-imapd-tls.t diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm index 828174653..5844ef440 100644 --- a/lib/PublicInbox/HTTP.pm +++ b/lib/PublicInbox/HTTP.pm @@ -488,6 +488,13 @@ sub busy () { ($self->{rbuf} || exists($self->{env}) || $self->{wbuf}); } +# runs $cb on the next iteration of the event loop at earliest +sub next_step { + my ($self, $cb) = @_; + return unless exists $self->{sock}; + $self->requeue if 1 == push(@{$self->{wbuf}}, $cb); +} + # Chunked and Identity packages are used for writing responses. # They may be exposed to the PSGI application when the PSGI app # returns a CODE ref for "push"-based responses diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm index 535ef96c9..8c9010afb 100644 --- a/lib/PublicInbox/MboxGz.pm +++ b/lib/PublicInbox/MboxGz.pm @@ -6,6 +6,9 @@ use parent 'PublicInbox::GzipFilter'; use PublicInbox::Eml; use PublicInbox::Hval qw/to_filename/; use PublicInbox::Mbox; +use PublicInbox::GitAsyncCat; +*msg_hdr = \&PublicInbox::Mbox::msg_hdr; +*msg_body = \&PublicInbox::Mbox::msg_body; sub new { my ($class, $ctx, $cb) = @_; @@ -17,33 +20,81 @@ sub new { }, $class; } +# this is public-inbox-httpd-specific +sub mboxgz_blob_cb { # git->cat_async callback + my ($bref, $oid, $type, $size, $self) = @_; + my $http = $self->{ctx}->{env}->{'psgix.io'} or return; # client abort + my $smsg = delete $self->{smsg} or die 'BUG: no smsg'; + if (!defined($oid)) { + # it's possible to have TOCTOU if an admin runs + # public-inbox-(edit|purge), just move onto the next message + return $http->next_step(\&async_next); + } else { + $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; + } + $self->zmore(msg_hdr($self->{ctx}, + PublicInbox::Eml->new($bref)->header_obj, + $smsg->{mid})); + + # PublicInbox::HTTP::{Chunked,Identity}::write + $self->{http_out}->write($self->translate(msg_body($$bref))); + + $http->next_step(\&async_next); +} + +# this is public-inbox-httpd-specific +sub async_step ($) { + my ($self) = @_; + if (my $smsg = $self->{smsg} = $self->{cb}->($self->{ctx})) { + git_async_cat($self->{ctx}->{-inbox}->git, $smsg->{blob}, + \&mboxgz_blob_cb, $self); + } elsif (my $out = delete $self->{http_out}) { + $out->write($self->zflush); + $out->close; + } +} + +# called by PublicInbox::DS::write +sub async_next { + my ($http) = @_; # PublicInbox::HTTP + async_step($http->{forward}); +} + +# called by PublicInbox::HTTP::close, or any other PSGI server +sub close { !!delete($_[0]->{http_out}) } + sub response { my ($class, $ctx, $cb, $fn) = @_; - my $body = $class->new($ctx, $cb); + my $self = $class->new($ctx, $cb); # http://www.iana.org/assignments/media-types/application/gzip $fn = defined($fn) && $fn ne '' ? to_filename($fn) : 'no-subject'; my $h = [ qw(Content-Type application/gzip), 'Content-Disposition', "inline; filename=$fn.mbox.gz" ]; - [ 200, $h, $body ]; + if ($ctx->{env}->{'pi-httpd.async'}) { + sub { + my ($wcb) = @_; # -httpd provided write callback + $self->{http_out} = $wcb->([200, $h]); + $self->{ctx}->{env}->{'psgix.io'}->{forward} = $self; + async_step($self); # start stepping + }; + } else { # generic PSGI + [ 200, $h, $self ]; + } } -# called by Plack::Util::foreach or similar +# called by Plack::Util::foreach or similar (generic PSGI) sub getline { my ($self) = @_; my $ctx = $self->{ctx} or return; while (my $smsg = $self->{cb}->($ctx)) { my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next; my $h = PublicInbox::Eml->new($mref)->header_obj; - $self->zmore( - PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}) - ); - return $self->translate(PublicInbox::Mbox::msg_body($$mref)); + $self->zmore(msg_hdr($ctx, $h, $smsg->{mid})); + return $self->translate(msg_body($$mref)); } # signal that we're done and can return undef next call: delete $self->{ctx}; $self->zflush; } -sub close {} # noop - 1; diff --git a/xt/httpd-async-stream.t b/xt/httpd-async-stream.t new file mode 100644 index 000000000..29bcb6125 --- /dev/null +++ b/xt/httpd-async-stream.t @@ -0,0 +1,99 @@ +#!perl -w +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +# Expensive test to validate compression and TLS. +use strict; +use Test::More; +use PublicInbox::TestCommon; +use PublicInbox::DS qw(now); +use PublicInbox::Spawn qw(which popen_rd); +use Digest::MD5; +use POSIX qw(_exit); +my $inboxdir = $ENV{GIANT_INBOX_DIR}; +plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir; +my $curl = which('curl') or plan skip_all => "curl(1) missing for $0"; +my ($tmpdir, $for_destroy) = tmpdir(); +require_mods(qw(DBD::SQLite)); +my $JOBS = $ENV{TEST_JOBS} // 4; +diag "TEST_JOBS=$JOBS"; + +my $make_local_server = sub { + my $pi_config = "$tmpdir/config"; + open my $fh, '>', $pi_config or die "open($pi_config): $!"; + print $fh <<"" or die "print $pi_config: $!"; +[publicinbox "test"] +inboxdir = $inboxdir +address = test\@example.com + + close $fh or die "close($pi_config): $!"; + my ($out, $err) = ("$tmpdir/out", "$tmpdir/err"); + for ($out, $err) { + open my $fh, '>', $_ or die "truncate: $!"; + } + my $http = tcp_server(); + my $rdr = { 3 => $http }; + + # not using multiple workers, here, since we want to increase + # the chance of tripping concurrency bugs within PublicInbox/HTTP*.pm + my $cmd = [ '-httpd', "--stdout=$out", "--stderr=$err", '-W0' ]; + my $host_port = $http->sockhost.':'.$http->sockport; + push @$cmd, "-lhttp://$host_port"; + my $url = "$host_port/test/all.mbox.gz"; + print STDERR "# CMD ". join(' ', @$cmd). "\n"; + my $env = { PI_CONFIG => $pi_config }; + (start_script($cmd, $env, $rdr), $url); +}; + +my ($td, $url) = $make_local_server->(); + +my $do_get_all = sub { + my ($job) = @_; + local $SIG{__DIE__} = sub { print STDERR $job, ': ', @_; _exit(1) }; + my $dig = Digest::MD5->new; + my ($buf, $nr); + my $bytes = 0; + my $t0 = now(); + my ($rd, $pid) = popen_rd([$curl, qw(-HHost:example.com -sSf), $url]); + while (1) { + $nr = sysread($rd, $buf, 65536); + last if !$nr; + $dig->add($buf); + $bytes += $nr; + } + my $res = $dig->hexdigest; + my $elapsed = sprintf('%0.3f', now() - $t0); + close $rd or die "close curl failed: $!\n"; + waitpid($pid, 0) == $pid or die "waitpid failed: $!\n"; + $? == 0 or die "curl failed: $?\n"; + print STDERR "# $job $$ ($?) $res (${elapsed}s) $bytes bytes\n"; + $res; +}; + +my (%pids, %res); +for my $job (1..$JOBS) { + pipe(my ($r, $w)) or die; + my $pid = fork; + if ($pid == 0) { + close $r or die; + my $res = $do_get_all->($job); + print $w $res or die; + close $w or die; + _exit(0); + } + close $w or die; + $pids{$pid} = [ $job, $r ]; +} + +while (scalar keys %pids) { + my $pid = waitpid(-1, 0) or next; + my $child = delete $pids{$pid} or next; + my ($job, $rpipe) = @$child; + is($?, 0, "$job done"); + my $sum = do { local $/; <$rpipe> }; + push @{$res{$sum}}, $job; +} +is(scalar keys %res, 1, 'all got the same result'); +$td->kill; +$td->join; +is($?, 0, 'no error on -httpd exit'); +done_testing;
We can bless $ctx directly into a MboxGz object to reduce hash lookups and allocations. --- lib/PublicInbox/MboxGz.pm | 39 ++++++++++++++++----------------------- 1 file changed, 16 insertions(+), 23 deletions(-) diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm index 8c9010afb..598b10347 100644 --- a/lib/PublicInbox/MboxGz.pm +++ b/lib/PublicInbox/MboxGz.pm @@ -10,20 +10,10 @@ use PublicInbox::GitAsyncCat; *msg_hdr = \&PublicInbox::Mbox::msg_hdr; *msg_body = \&PublicInbox::Mbox::msg_body; -sub new { - my ($class, $ctx, $cb) = @_; - $ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env}); - bless { - gz => PublicInbox::GzipFilter::gzip_or_die(), - cb => $cb, - ctx => $ctx - }, $class; -} - # this is public-inbox-httpd-specific sub mboxgz_blob_cb { # git->cat_async callback my ($bref, $oid, $type, $size, $self) = @_; - my $http = $self->{ctx}->{env}->{'psgix.io'} or return; # client abort + my $http = $self->{env}->{'psgix.io'} or return; # client abort my $smsg = delete $self->{smsg} or die 'BUG: no smsg'; if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs @@ -32,7 +22,7 @@ sub mboxgz_blob_cb { # git->cat_async callback } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } - $self->zmore(msg_hdr($self->{ctx}, + $self->zmore(msg_hdr($self, PublicInbox::Eml->new($bref)->header_obj, $smsg->{mid})); @@ -45,8 +35,8 @@ sub mboxgz_blob_cb { # git->cat_async callback # this is public-inbox-httpd-specific sub async_step ($) { my ($self) = @_; - if (my $smsg = $self->{smsg} = $self->{cb}->($self->{ctx})) { - git_async_cat($self->{ctx}->{-inbox}->git, $smsg->{blob}, + if (my $smsg = $self->{smsg} = $self->{cb}->($self)) { + git_async_cat($self->{-inbox}->git, $smsg->{blob}, \&mboxgz_blob_cb, $self); } elsif (my $out = delete $self->{http_out}) { $out->write($self->zflush); @@ -64,17 +54,20 @@ sub async_next { sub close { !!delete($_[0]->{http_out}) } sub response { - my ($class, $ctx, $cb, $fn) = @_; - my $self = $class->new($ctx, $cb); + my ($class, $self, $cb, $fn) = @_; + $self->{base_url} = $self->{-inbox}->base_url($self->{env}); + $self->{cb} = $cb; + $self->{gz} = PublicInbox::GzipFilter::gzip_or_die(); + bless $self, $class; # http://www.iana.org/assignments/media-types/application/gzip $fn = defined($fn) && $fn ne '' ? to_filename($fn) : 'no-subject'; my $h = [ qw(Content-Type application/gzip), 'Content-Disposition', "inline; filename=$fn.mbox.gz" ]; - if ($ctx->{env}->{'pi-httpd.async'}) { + if ($self->{env}->{'pi-httpd.async'}) { sub { my ($wcb) = @_; # -httpd provided write callback $self->{http_out} = $wcb->([200, $h]); - $self->{ctx}->{env}->{'psgix.io'}->{forward} = $self; + $self->{env}->{'psgix.io'}->{forward} = $self; async_step($self); # start stepping }; } else { # generic PSGI @@ -85,15 +78,15 @@ sub response { # called by Plack::Util::foreach or similar (generic PSGI) sub getline { my ($self) = @_; - my $ctx = $self->{ctx} or return; - while (my $smsg = $self->{cb}->($ctx)) { - my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next; + my $cb = $self->{cb} or return; + while (my $smsg = $cb->($self)) { + my $mref = $self->{-inbox}->msg_by_smsg($smsg) or next; my $h = PublicInbox::Eml->new($mref)->header_obj; - $self->zmore(msg_hdr($ctx, $h, $smsg->{mid})); + $self->zmore(msg_hdr($self, $h, $smsg->{mid})); return $self->translate(msg_body($$mref)); } # signal that we're done and can return undef next call: - delete $self->{ctx}; + delete $self->{cb}; $self->zflush; }
This restores gzip-by-default behavior for /$INBOX/$MSGID/raw endpoints for all indexed inboxes. Unindexed v1 inboxes will remain uncompressed, for now. --- lib/PublicInbox/Mbox.pm | 160 +++++++++++++++++++++++++------------- lib/PublicInbox/MboxGz.pm | 51 ++++++------ t/psgi_v2.t | 46 +++++++++-- 3 files changed, 175 insertions(+), 82 deletions(-) diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm index 4c0b01edf..895f828c5 100644 --- a/lib/PublicInbox/Mbox.pm +++ b/lib/PublicInbox/Mbox.pm @@ -14,62 +14,69 @@ use PublicInbox::MID qw/mid_escape/; use PublicInbox::Hval qw/to_filename/; use PublicInbox::Smsg; use PublicInbox::Eml; - -sub subject_fn ($) { - my ($hdr) = @_; - my $fn = $hdr->header_str('Subject'); - return 'no-subject' if (!defined($fn) || $fn eq ''); - - $fn =~ s/^re:\s+//i; - $fn eq '' ? 'no-subject' : to_filename($fn); -} - -sub mb_stream { - my ($more) = @_; - bless $more, 'PublicInbox::Mbox'; -} +use PublicInbox::GitAsyncCat; +use PublicInbox::GzipFilter qw(gzf_maybe); # called by PSGI server as body response # this gets called twice for every message, once to return the header, # once to retrieve the body sub getline { - my ($more) = @_; # self - my ($ctx, $id, $prev, $next, $mref, $hdr) = @$more; - if ($hdr) { # first message hits this, only - pop @$more; # $hdr - pop @$more; # $mref - return msg_hdr($ctx, $hdr) . msg_body($$mref); - } - my $cur = $next or return; + my ($ctx) = @_; # ctx + my $smsg = $ctx->{smsg} or return; my $ibx = $ctx->{-inbox}; - $next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev); - $mref = $ibx->msg_by_smsg($cur) or return; - $hdr = PublicInbox::Eml->new($mref)->header_obj; - @$more = ($ctx, $id, $prev, $next); # $next may be undef, here - msg_hdr($ctx, $hdr) . msg_body($$mref); + my $eml = $ibx->smsg_eml($smsg) or return; + $ctx->{smsg} = $ibx->over->next_by_mid($ctx->{mid}, @{$ctx->{id_prev}}); + msg_hdr($ctx, $eml, $smsg->{mid}) . msg_body($eml); } -sub close {} # noop +sub close { !!delete($_[0]->{http_out}) } -# /$INBOX/$MESSAGE_ID/raw -sub emit_raw { +sub mbox_async_step ($) { # public-inbox-httpd-only my ($ctx) = @_; - my $mid = $ctx->{mid}; - my $ibx = $ctx->{-inbox}; - $ctx->{base_url} = $ibx->base_url($ctx->{env}); - my ($mref, $more, $id, $prev, $next); - if (my $over = $ibx->over) { - my $smsg = $over->next_by_mid($mid, \$id, \$prev) or return; - $mref = $ibx->msg_by_smsg($smsg) or return; - $next = $over->next_by_mid($mid, \$id, \$prev); + if (my $smsg = $ctx->{smsg}) { + git_async_cat($ctx->{-inbox}->git, $smsg->{blob}, + \&mbox_blob_cb, $ctx); + } elsif (my $out = delete $ctx->{http_out}) { + $out->close; + } +} + +# called by PublicInbox::DS::write +sub mbox_async_next { + my ($http) = @_; # PublicInbox::HTTP + my $ctx = $http->{forward} or return; # client aborted + eval { + $ctx->{smsg} = $ctx->{-inbox}->over->next_by_mid( + $ctx->{mid}, @{$ctx->{id_prev}}); + mbox_async_step($ctx); + }; +} + +# this is public-inbox-httpd-specific +sub mbox_blob_cb { # git->cat_async callback + my ($bref, $oid, $type, $size, $ctx) = @_; + my $http = $ctx->{env}->{'psgix.io'} or return; # client abort + my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg'; + if (!defined($oid)) { + # it's possible to have TOCTOU if an admin runs + # public-inbox-(edit|purge), just move onto the next message + return $http->next_step(\&mbox_async_next); } else { - $mref = $ibx->msg_by_mid($mid) or return; + $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } - my $hdr = PublicInbox::Eml->new($mref)->header_obj; - $more = [ $ctx, $id, $prev, $next, $mref, $hdr ]; # for ->getline - my $fn = subject_fn($hdr); + my $eml = PublicInbox::Eml->new($bref); + $ctx->{http_out}->write(msg_hdr($ctx, $eml, $smsg->{mid})); + $ctx->{http_out}->write(msg_body($eml)); + $http->next_step(\&mbox_async_next); +} + +sub res_hdr ($$) { + my ($ctx, $subject) = @_; + my $fn = $subject // 'no-subject'; + $fn =~ s/^re:\s+//i; + $fn = $fn eq '' ? 'no-subject' : to_filename($fn); my @hdr = ('Content-Type'); - if ($ibx->{obfuscate}) { + if ($ctx->{-inbox}->{obfuscate}) { # obfuscation is stupid, but maybe scrapers are, too... push @hdr, 'application/mbox'; $fn .= '.mbox'; @@ -78,11 +85,58 @@ sub emit_raw { $fn .= '.txt'; } push @hdr, 'Content-Disposition', "inline; filename=$fn"; - [ 200, \@hdr, mb_stream($more) ]; + \@hdr; +} + +# for rare cases where v1 inboxes aren't indexed w/ ->over at all +sub no_over_raw ($) { + my ($ctx) = @_; + my $mref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return; + my $eml = PublicInbox::Eml->new($mref); + [ 200, res_hdr($ctx, $eml->header_str('Subject')), + [ msg_hdr($ctx, $eml, $ctx->{mid}) . msg_body($eml) ] ] +} + +sub stream_raw { # MboxGz response callback + my ($ctx) = @_; + delete($ctx->{smsg}) // + $ctx->{-inbox}->over->next_by_mid($ctx->{mid}, + @{$ctx->{id_prev}}); +} + +# /$INBOX/$MESSAGE_ID/raw +sub emit_raw { + my ($ctx) = @_; + my $env = $ctx->{env}; + $ctx->{base_url} = $ctx->{-inbox}->base_url($env); + my $over = $ctx->{-inbox}->over or return no_over_raw($ctx); + my ($id, $prev); + my $smsg = $over->next_by_mid($ctx->{mid}, \$id, \$prev) or return; + $ctx->{smsg} = $smsg; + my $res_hdr = res_hdr($ctx, $smsg->{subject}); + $ctx->{id_prev} = [ \$id, \$prev ]; + + if (my $gzf = gzf_maybe($res_hdr, $env)) { + $ctx->{gz} = delete $gzf->{gz}; + require PublicInbox::MboxGz; + PublicInbox::MboxGz::response($ctx, \&stream_raw, $res_hdr); + } elsif ($env->{'pi-httpd.async'}) { + sub { + my ($wcb) = @_; # -httpd provided write callback + $ctx->{http_out} = $wcb->([200, $res_hdr]); + $ctx->{env}->{'psgix.io'}->{forward} = $ctx; + bless $ctx, __PACKAGE__; + mbox_async_step($ctx); # start stepping + }; + } else { # generic PSGI code path + bless $ctx, __PACKAGE__; # respond to ->getline + [ 200, $res_hdr, $ctx ]; + } } sub msg_hdr ($$;$) { - my ($ctx, $header_obj, $mid) = @_; + my ($ctx, $eml, $mid) = @_; + my $header_obj = $eml->header_obj; # drop potentially confusing headers, ssoma already should've dropped # Lines and Content-Length @@ -120,12 +174,13 @@ sub msg_hdr ($$;$) { } sub msg_body ($) { + my $bdy = $_[0]->{bdy} // return "\n"; # mboxrd quoting style # https://en.wikipedia.org/wiki/Mbox#Modified_mbox # https://www.loc.gov/preservation/digital/formats/fdd/fdd000385.shtml # https://web.archive.org/http://www.qmail.org/man/man5/mbox.html - $_[0] =~ s/^(>*From )/>$1/gm; - $_[0] .= "\n"; + $$bdy =~ s/^(>*From )/>$1/gm; + $$bdy .= "\n"; } sub thread_cb { @@ -145,12 +200,12 @@ sub thread_cb { sub thread_mbox { my ($ctx, $over, $sfx) = @_; - require PublicInbox::MboxGz; my $msgs = $ctx->{msgs} = $over->get_thread($ctx->{mid}, {}); return [404, [qw(Content-Type text/plain)], []] if !@$msgs; $ctx->{prev} = $msgs->[-1]; $ctx->{over} = $over; # bump refcnt - PublicInbox::MboxGz->response($ctx, \&thread_cb, $msgs->[0]->{subject}); + require PublicInbox::MboxGz; + PublicInbox::MboxGz::mbox_gz($ctx, \&thread_cb, $msgs->[0]->{subject}); } sub emit_range { @@ -188,7 +243,8 @@ sub mbox_all_ids { return PublicInbox::WWW::need($ctx, 'Overview'); $ctx->{ids} = $ids; $ctx->{prev} = $prev; - return PublicInbox::MboxGz->response($ctx, \&all_ids_cb, 'all'); + require PublicInbox::MboxGz; + PublicInbox::MboxGz::mbox_gz($ctx, \&all_ids_cb, 'all'); } sub results_cb { @@ -213,7 +269,6 @@ sub results_cb { sub mbox_all { my ($ctx, $query) = @_; - require PublicInbox::MboxGz; return mbox_all_ids($ctx) if $query eq ''; my $qopts = $ctx->{qopts} = { mset => 2 }; my $srch = $ctx->{srch} = $ctx->{-inbox}->search or @@ -224,7 +279,8 @@ sub mbox_all { ["No results found\n"]]; $ctx->{iter} = 0; $ctx->{query} = $query; - PublicInbox::MboxGz->response($ctx, \&results_cb, 'results-'.$query); + require PublicInbox::MboxGz; + PublicInbox::MboxGz::mbox_gz($ctx, \&results_cb, 'results-'.$query); } 1; diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm index 598b10347..716bf7b19 100644 --- a/lib/PublicInbox/MboxGz.pm +++ b/lib/PublicInbox/MboxGz.pm @@ -18,22 +18,21 @@ sub mboxgz_blob_cb { # git->cat_async callback if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message - return $http->next_step(\&async_next); + return $http->next_step(\&mboxgz_async_next); } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; } - $self->zmore(msg_hdr($self, - PublicInbox::Eml->new($bref)->header_obj, - $smsg->{mid})); + my $eml = PublicInbox::Eml->new($bref); + $self->zmore(msg_hdr($self, $eml, $smsg->{mid})); # PublicInbox::HTTP::{Chunked,Identity}::write - $self->{http_out}->write($self->translate(msg_body($$bref))); + $self->{http_out}->write($self->translate(msg_body($eml))); - $http->next_step(\&async_next); + $http->next_step(\&mboxgz_async_next); } # this is public-inbox-httpd-specific -sub async_step ($) { +sub mboxgz_async_step ($) { my ($self) = @_; if (my $smsg = $self->{smsg} = $self->{cb}->($self)) { git_async_cat($self->{-inbox}->git, $smsg->{blob}, @@ -45,45 +44,49 @@ sub async_step ($) { } # called by PublicInbox::DS::write -sub async_next { +sub mboxgz_async_next { my ($http) = @_; # PublicInbox::HTTP - async_step($http->{forward}); + mboxgz_async_step($http->{forward}); } # called by PublicInbox::HTTP::close, or any other PSGI server sub close { !!delete($_[0]->{http_out}) } sub response { - my ($class, $self, $cb, $fn) = @_; - $self->{base_url} = $self->{-inbox}->base_url($self->{env}); + my ($self, $cb, $res_hdr) = @_; $self->{cb} = $cb; - $self->{gz} = PublicInbox::GzipFilter::gzip_or_die(); - bless $self, $class; - # http://www.iana.org/assignments/media-types/application/gzip - $fn = defined($fn) && $fn ne '' ? to_filename($fn) : 'no-subject'; - my $h = [ qw(Content-Type application/gzip), - 'Content-Disposition', "inline; filename=$fn.mbox.gz" ]; + bless $self, __PACKAGE__; if ($self->{env}->{'pi-httpd.async'}) { sub { my ($wcb) = @_; # -httpd provided write callback - $self->{http_out} = $wcb->([200, $h]); + $self->{http_out} = $wcb->([200, $res_hdr]); $self->{env}->{'psgix.io'}->{forward} = $self; - async_step($self); # start stepping + mboxgz_async_step($self); # start stepping }; } else { # generic PSGI - [ 200, $h, $self ]; + [ 200, $res_hdr, $self ]; } } +sub mbox_gz { + my ($self, $cb, $fn) = @_; + $self->{base_url} = $self->{-inbox}->base_url($self->{env}); + $self->{gz} = PublicInbox::GzipFilter::gzip_or_die(); + $fn = to_filename($fn // 'no-subject'); + $fn = 'no-subject' if $fn eq ''; + # http://www.iana.org/assignments/media-types/application/gzip + response($self, $cb, [ qw(Content-Type application/gzip), + 'Content-Disposition', "inline; filename=$fn.mbox.gz" ]); +} + # called by Plack::Util::foreach or similar (generic PSGI) sub getline { my ($self) = @_; my $cb = $self->{cb} or return; while (my $smsg = $cb->($self)) { - my $mref = $self->{-inbox}->msg_by_smsg($smsg) or next; - my $h = PublicInbox::Eml->new($mref)->header_obj; - $self->zmore(msg_hdr($self, $h, $smsg->{mid})); - return $self->translate(msg_body($$mref)); + my $eml = $self->{-inbox}->smsg_eml($smsg) or next; + $self->zmore(msg_hdr($self, $eml, $smsg->{mid})); + return $self->translate(msg_body($eml)); } # signal that we're done and can return undef next call: delete $self->{cb}; diff --git a/t/psgi_v2.t b/t/psgi_v2.t index 8f75a3fbf..90a710a44 100644 --- a/t/psgi_v2.t +++ b/t/psgi_v2.t @@ -103,7 +103,7 @@ $mids = mids($mime->header_obj); my $third = $mids->[-1]; $im->done; -test_psgi(sub { $www->call(@_) }, sub { +my $client = sub { my ($cb) = @_; $res = $cb->(GET("/v2test/$third/raw")); $raw = $res->content; @@ -122,12 +122,19 @@ test_psgi(sub { $www->call(@_) }, sub { SKIP: { eval { require IO::Uncompress::Gunzip }; - skip 'IO::Uncompress::Gunzip missing', 4 if $@; + skip 'IO::Uncompress::Gunzip missing', 6 if $@; + my ($in, $out, $status); + my $req = GET('/v2test/a-mid@b/raw'); + $req->header('Accept-Encoding' => 'gzip'); + $res = $cb->($req); + is($res->header('Content-Encoding'), 'gzip', 'gzip encoding'); + $in = $res->content; + IO::Uncompress::Gunzip::gunzip(\$in => \$out); + is($out, $raw, 'gzip response matches'); $res = $cb->(GET('/v2test/a-mid@b/t.mbox.gz')); - my $out; - my $in = $res->content; - my $status = IO::Uncompress::Gunzip::gunzip(\$in => \$out); + $in = $res->content; + $status = IO::Uncompress::Gunzip::gunzip(\$in => \$out); unlike($out, qr/^From oldbug/sm, 'buggy "From_" line omitted'); like($out, qr/^hello world$/m, 'got first in t.mbox.gz'); like($out, qr/^hello world!$/m, 'got second in t.mbox.gz'); @@ -187,7 +194,34 @@ test_psgi(sub { $www->call(@_) }, sub { like($raw, qr!>\Q$mid\E</a>!s, "Message-ID $mid shown"); } like($raw, qr/\b3\+ messages\b/, 'thread overview shown'); +}; +test_psgi(sub { $www->call(@_) }, $client); +SKIP: { + require_mods(qw(Plack::Test::ExternalServer), 37); + my $cfgpath = "$inboxdir/$$.config"; + open my $fh, '>', $cfgpath or BAIL_OUT $!; + print $fh <<EOF or BAIL_OUT $!; +[publicinbox "v2test"] + inboxdir = $inboxdir + address = test\@example.com +EOF + close $fh or BAIL_OUT $!; + my $env = { PI_CONFIG => $cfgpath }; + my $sock = tcp_server() or die; + my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err); + my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ]; + my $td = start_script($cmd, $env, { 3 => $sock }); + my ($h, $p) = ($sock->sockhost, $sock->sockport); + local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p"; + Plack::Test::ExternalServer::test_psgi(client => $client); + $td->join('TERM'); + open $fh, '<', $err or BAIL_OUT $!; + is(do { local $/; <$fh> }, '', 'no errors'); +}; + +test_psgi(sub { $www->call(@_) }, sub { + my ($cb) = @_; my $exp = [ qw(<a-mid@b> <reuse@mid>) ]; $mime->header_set('Message-Id', @$exp); $mime->header_set('Subject', '4th dupe'); @@ -208,7 +242,7 @@ test_psgi(sub { $www->call(@_) }, sub { $res = $cb->(GET('/v2test/reuse@mid/T/')); $raw = $res->content; like($raw, qr/\b4\+ messages\b/, 'thread overview shown with /T/'); - @over = ($raw =~ m/^\d{4}-\d+-\d+\s+\d+:\d+ (.+)$/gm); + my @over = ($raw =~ m/^\d{4}-\d+-\d+\s+\d+:\d+ (.+)$/gm); is_deeply(\@over, [ '<a', '` <a', '` <a', '` <a' ], 'duplicate messages share the same root');
We always return Z (UTC) times, anyways, so we'll always use gmtime() on the seconds-after-the-epoch. --- Documentation/mknews.perl | 2 +- lib/PublicInbox/WwwAtomStream.pm | 8 +++----- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl index 965c30c1d..ba049d9e6 100755 --- a/Documentation/mknews.perl +++ b/Documentation/mknews.perl @@ -131,7 +131,7 @@ sub atom_start { delete $astream->{emit_header}; my $ibx = $ctx->{-inbox}; my $title = PublicInbox::WwwAtomStream::title_tag($ibx->description); - my $updated = PublicInbox::WwwAtomStream::feed_updated(gmtime($mtime)); + my $updated = PublicInbox::WwwAtomStream::feed_updated($mtime); print $out <<EOF or die; <?xml version="1.0" encoding="us-ascii"?> <feed diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm index c407e343f..c494fa226 100644 --- a/lib/PublicInbox/WwwAtomStream.pm +++ b/lib/PublicInbox/WwwAtomStream.pm @@ -103,7 +103,7 @@ sub atom_header { qq(\nhref="$base_url"/>) . qq(<link\nrel="self"\nhref="$self_url"/>) . qq(<id>$page_id</id>) . - feed_updated(gmtime($mtime)); + feed_updated($mtime); } # returns undef or string @@ -125,9 +125,7 @@ sub feed_entry { $irt = ''; } my $href = $base . mid_href($mid) . '/'; - my $t = msg_timestamp($hdr); - my @t = gmtime(defined $t ? $t : time); - my $updated = feed_updated(@t); + my $updated = feed_updated(msg_timestamp($hdr)); my $title = $hdr->header('Subject'); $title = '(no subject)' unless defined $title && $title ne ''; @@ -158,7 +156,7 @@ sub feed_entry { } sub feed_updated { - '<updated>' . strftime('%Y-%m-%dT%H:%M:%SZ', @_) . '</updated>'; + '<updated>' . strftime('%Y-%m-%dT%H:%M:%SZ', gmtime(@_)) . '</updated>'; } 1;
stat(2) on the inboxdir is unlikely to be correct, now that msgmap truncates its journal (rather than unlinking it). --- lib/PublicInbox/WwwAtomStream.pm | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm index c494fa226..35b536c1c 100644 --- a/lib/PublicInbox/WwwAtomStream.pm +++ b/lib/PublicInbox/WwwAtomStream.pm @@ -93,8 +93,6 @@ sub atom_header { $self_url .= 'new.atom'; $page_id = "mailto:$ibx->{-primary_address}"; } - my $mtime = (stat($ibx->{inboxdir}))[9] || time; - qq(<?xml version="1.0" encoding="us-ascii"?>\n) . qq(<feed\nxmlns="http://www.w3.org/2005/Atom"\n) . qq(xmlns:thr="http://purl.org/syndication/thread/1.0">) . @@ -103,7 +101,7 @@ sub atom_header { qq(\nhref="$base_url"/>) . qq(<link\nrel="self"\nhref="$self_url"/>) . qq(<id>$page_id</id>) . - feed_updated($mtime); + feed_updated($ibx->modified); } # returns undef or string
No need to deepen our object graph, here. --- lib/PublicInbox/WwwAtomStream.pm | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm index 35b536c1c..9f322d38f 100644 --- a/lib/PublicInbox/WwwAtomStream.pm +++ b/lib/PublicInbox/WwwAtomStream.pm @@ -22,15 +22,17 @@ sub close {} sub new { my ($class, $ctx, $cb) = @_; $ctx->{feed_base_url} = $ctx->{-inbox}->base_url($ctx->{env}); - bless { cb => $cb || \&close, ctx => $ctx, emit_header => 1 }, $class; + $ctx->{cb} = $cb || \&close; + $ctx->{emit_header} = 1; + bless $ctx, $class; } sub response { my ($class, $ctx, $code, $cb) = @_; my $h = [ 'Content-Type' => 'application/atom+xml' ]; - my $self = $class->new($ctx, $cb); - $self->{gzf} = gzf_maybe($h, $ctx->{env}); - [ $code, $h, $self ] + $class->new($ctx, $cb); + $ctx->{gzf} = gzf_maybe($h, $ctx->{env}); + [ $code, $h, $ctx ] } # called once for each message by PSGI server @@ -38,7 +40,7 @@ sub getline { my ($self) = @_; my $buf = do { if (my $middle = $self->{cb}) { - my $smsg = $middle->($self->{ctx}); + my $smsg = $middle->($self); feed_entry($self, $smsg) if $smsg; } } // (delete($self->{cb}) ? '</feed>' : undef); @@ -106,8 +108,7 @@ sub atom_header { # returns undef or string sub feed_entry { - my ($self, $smsg) = @_; - my $ctx = $self->{ctx}; + my ($ctx, $smsg) = @_; my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return ''; my $hdr = $eml->header_obj; my $mid = $smsg->{mid}; @@ -136,7 +137,7 @@ sub feed_entry { $email = ascii_html($email); my $s = ''; - if (delete $self->{emit_header}) { + if (delete $ctx->{emit_header}) { $s .= atom_header($ctx, $title); } $s .= "<entry><author><name>$name</name><email>$email</email>" .
We want to be able to parallelize and stress test more endpoints and toggle `--compressed' and possibly other options in curl. --- xt/httpd-async-stream.t | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/xt/httpd-async-stream.t b/xt/httpd-async-stream.t index 29bcb6125..22a96875d 100644 --- a/xt/httpd-async-stream.t +++ b/xt/httpd-async-stream.t @@ -15,7 +15,12 @@ my $curl = which('curl') or plan skip_all => "curl(1) missing for $0"; my ($tmpdir, $for_destroy) = tmpdir(); require_mods(qw(DBD::SQLite)); my $JOBS = $ENV{TEST_JOBS} // 4; -diag "TEST_JOBS=$JOBS"; +my $endpoint = $ENV{TEST_ENDPOINT} // 'all.mbox.gz'; +my $curl_opt = $ENV{TEST_CURL_OPT} // ''; +diag "TEST_JOBS=$JOBS TEST_ENDPOINT=$endpoint TEST_CURL_OPT=$curl_opt"; + +# we set Host: to ensure stable results across test runs +my @CURL_OPT = (qw(-HHost:example.com -sSf), split(' ', $curl_opt)); my $make_local_server = sub { my $pi_config = "$tmpdir/config"; @@ -38,7 +43,7 @@ address = test\@example.com my $cmd = [ '-httpd', "--stdout=$out", "--stderr=$err", '-W0' ]; my $host_port = $http->sockhost.':'.$http->sockport; push @$cmd, "-lhttp://$host_port"; - my $url = "$host_port/test/all.mbox.gz"; + my $url = "$host_port/test/$endpoint"; print STDERR "# CMD ". join(' ', @$cmd). "\n"; my $env = { PI_CONFIG => $pi_config }; (start_script($cmd, $env, $rdr), $url); @@ -53,7 +58,7 @@ my $do_get_all = sub { my ($buf, $nr); my $bytes = 0; my $t0 = now(); - my ($rd, $pid) = popen_rd([$curl, qw(-HHost:example.com -sSf), $url]); + my ($rd, $pid) = popen_rd([$curl, @CURL_OPT, $url]); while (1) { $nr = sysread($rd, $buf, 65536); last if !$nr;
This allows -httpd to handle other requests while waiting for git to retrieve and decode blobs. We'll also break apart t/psgi_v2.t further to ensure tests run against -httpd in addition to generic PSGI testing. Using xt/httpd-async-stream.t to test against clones of meta@public-inbox.org shows a 10-12% performance improvement with the following env: TEST_JOBS=1000 TEST_CURL_OPT=--compressed TEST_ENDPOINT=new.atom --- Documentation/mknews.perl | 7 +-- lib/PublicInbox/WwwAtomStream.pm | 74 +++++++++++++++++++++++---- t/psgi_v2.t | 86 ++++++++++++++++++++------------ 3 files changed, 123 insertions(+), 44 deletions(-) diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl index ba049d9e6..2d22d147e 100755 --- a/Documentation/mknews.perl +++ b/Documentation/mknews.perl @@ -147,9 +147,10 @@ EOF } sub mime2atom { - my ($out, $astream, $mime, $ctx) = @_; - my $smsg = bless { mime => $mime }, 'PublicInbox::Smsg'; - if (defined(my $str = $astream->feed_entry($smsg))) { + my ($out, $astream, $eml, $ctx) = @_; + my $smsg = bless {}, 'PublicInbox::Smsg'; + $smsg->populate($eml); + if (defined(my $str = $astream->feed_entry($smsg, $eml))) { print $out $str or die; } } diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm index 9f322d38f..583309228 100644 --- a/lib/PublicInbox/WwwAtomStream.pm +++ b/lib/PublicInbox/WwwAtomStream.pm @@ -15,9 +15,11 @@ use PublicInbox::Address; use PublicInbox::Hval qw(ascii_html mid_href); use PublicInbox::MsgTime qw(msg_timestamp); use PublicInbox::GzipFilter qw(gzf_maybe); +use PublicInbox::GitAsyncCat; -# called by PSGI server after getline: -sub close {} +# called by generic PSGI server after getline, +# and also by PublicInbox::HTTP::close +sub close { !!delete($_[0]->{http_out}) } sub new { my ($class, $ctx, $cb) = @_; @@ -27,12 +29,62 @@ sub new { bless $ctx, $class; } +# called by PublicInbox::DS::write +sub atom_async_next { + my ($http) = @_; # PublicInbox::HTTP + atom_async_step($http->{forward}); +} + +# this is public-inbox-httpd-specific +sub atom_blob_cb { # git->cat_async callback + my ($bref, $oid, $type, $size, $ctx) = @_; + my $http = $ctx->{env}->{'psgix.io'} or return; # client abort + my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg'; + if (!defined($oid)) { + # it's possible to have TOCTOU if an admin runs + # public-inbox-(edit|purge), just move onto the next message + return $http->next_step(\&atom_async_next); + } else { + $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; + } + my $buf = feed_entry($ctx, $smsg, PublicInbox::Eml->new($bref)); + if (my $gzf = $ctx->{gzf}) { + $buf = $gzf->translate($buf); + } + # PublicInbox::HTTP::{Chunked,Identity}::write + $ctx->{http_out}->write($buf); + + $http->next_step(\&atom_async_next); +} + +sub atom_async_step { # this is public-inbox-httpd-specific + my ($ctx) = @_; + if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) { + git_async_cat($ctx->{-inbox}->git, $smsg->{blob}, + \&atom_blob_cb, $ctx); + } elsif (my $out = delete $ctx->{http_out}) { + if (my $gzf = delete $ctx->{gzf}) { + $out->write($gzf->zflush); + } + $out->close; + } +} + sub response { my ($class, $ctx, $code, $cb) = @_; - my $h = [ 'Content-Type' => 'application/atom+xml' ]; + my $res_hdr = [ 'Content-Type' => 'application/atom+xml' ]; $class->new($ctx, $cb); - $ctx->{gzf} = gzf_maybe($h, $ctx->{env}); - [ $code, $h, $ctx ] + $ctx->{gzf} = gzf_maybe($res_hdr, $ctx->{env}); + if ($ctx->{env}->{'pi-httpd.async'}) { + sub { + my ($wcb) = @_; # -httpd provided write callback + $ctx->{http_out} = $wcb->([200, $res_hdr]); + $ctx->{env}->{'psgix.io'}->{forward} = $ctx; + atom_async_step($ctx); # start stepping + }; + } else { + [ $code, $res_hdr, $ctx ]; + } } # called once for each message by PSGI server @@ -40,8 +92,13 @@ sub getline { my ($self) = @_; my $buf = do { if (my $middle = $self->{cb}) { - my $smsg = $middle->($self); - feed_entry($self, $smsg) if $smsg; + if (my $smsg = $middle->($self)) { + my $eml = $self->{-inbox}->smsg_eml($smsg) or + return ''; + feed_entry($self, $smsg, $eml); + } else { + undef; + } } } // (delete($self->{cb}) ? '</feed>' : undef); @@ -108,8 +165,7 @@ sub atom_header { # returns undef or string sub feed_entry { - my ($ctx, $smsg) = @_; - my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return ''; + my ($ctx, $smsg, $eml) = @_; my $hdr = $eml->header_obj; my $mid = $smsg->{mid}; my $irt = PublicInbox::View::in_reply_to($hdr); diff --git a/t/psgi_v2.t b/t/psgi_v2.t index 90a710a44..4ab9601c0 100644 --- a/t/psgi_v2.t +++ b/t/psgi_v2.t @@ -14,6 +14,36 @@ use_ok($_) for (qw(HTTP::Request::Common Plack::Test)); use_ok 'PublicInbox::WWW'; use_ok 'PublicInbox::V2Writable'; my ($inboxdir, $for_destroy) = tmpdir(); +my $cfgpath = "$inboxdir/$$.config"; +SKIP: { + require_mods(qw(Plack::Test::ExternalServer), 1); + open my $fh, '>', $cfgpath or BAIL_OUT $!; + print $fh <<EOF or BAIL_OUT $!; +[publicinbox "v2test"] + inboxdir = $inboxdir + address = test\@example.com +EOF + close $fh or BAIL_OUT $!; +} + +my $run_httpd = sub { + my ($client, $skip) = @_; + SKIP: { + require_mods(qw(Plack::Test::ExternalServer), $skip); + my $env = { PI_CONFIG => $cfgpath }; + my $sock = tcp_server() or die; + my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err); + my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ]; + my $td = start_script($cmd, $env, { 3 => $sock }); + my ($h, $p) = ($sock->sockhost, $sock->sockport); + local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p"; + Plack::Test::ExternalServer::test_psgi(client => $client); + $td->join('TERM'); + open my $fh, '<', $err or BAIL_OUT $!; + is(do { local $/; <$fh> }, '', 'no errors'); + } +}; + my $ibx = { inboxdir => $inboxdir, name => 'test-v2writable', @@ -60,7 +90,7 @@ EOF my $config = PublicInbox::Config->new(\$cfg); my $www = PublicInbox::WWW->new($config); my ($res, $raw, @from_); -test_psgi(sub { $www->call(@_) }, sub { +my $client0 = sub { my ($cb) = @_; $res = $cb->(GET('/v2test/description')); like($res->content, qr!\$INBOX_DIR/description missing!, @@ -90,7 +120,9 @@ test_psgi(sub { $www->call(@_) }, sub { @bodies = ($res->content =~ /^(hello [^<]+)$/mg); is_deeply(\@bodies, [ "hello world!\n", "hello world\n" ], 'new.html ordering is chronological'); -}); +}; +test_psgi(sub { $www->call(@_) }, $client0); +$run_httpd->($client0, 9); $mime->header_set('Message-Id', 'a-mid@b'); $mime->body_set("hello ghosts\n"); @@ -103,7 +135,7 @@ $mids = mids($mime->header_obj); my $third = $mids->[-1]; $im->done; -my $client = sub { +my $client1 = sub { my ($cb) = @_; $res = $cb->(GET("/v2test/$third/raw")); $raw = $res->content; @@ -196,32 +228,10 @@ my $client = sub { like($raw, qr/\b3\+ messages\b/, 'thread overview shown'); }; -test_psgi(sub { $www->call(@_) }, $client); -SKIP: { - require_mods(qw(Plack::Test::ExternalServer), 37); - my $cfgpath = "$inboxdir/$$.config"; - open my $fh, '>', $cfgpath or BAIL_OUT $!; - print $fh <<EOF or BAIL_OUT $!; -[publicinbox "v2test"] - inboxdir = $inboxdir - address = test\@example.com -EOF - close $fh or BAIL_OUT $!; - my $env = { PI_CONFIG => $cfgpath }; - my $sock = tcp_server() or die; - my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err); - my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ]; - my $td = start_script($cmd, $env, { 3 => $sock }); - my ($h, $p) = ($sock->sockhost, $sock->sockport); - local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p"; - Plack::Test::ExternalServer::test_psgi(client => $client); - $td->join('TERM'); - open $fh, '<', $err or BAIL_OUT $!; - is(do { local $/; <$fh> }, '', 'no errors'); -}; +test_psgi(sub { $www->call(@_) }, $client1); +$run_httpd->($client1, 38); -test_psgi(sub { $www->call(@_) }, sub { - my ($cb) = @_; +{ my $exp = [ qw(<a-mid@b> <reuse@mid>) ]; $mime->header_set('Message-Id', @$exp); $mime->header_set('Subject', '4th dupe'); @@ -230,10 +240,12 @@ test_psgi(sub { $www->call(@_) }, sub { $im->done; my @h = $mime->header('Message-ID'); is_deeply($exp, \@h, 'reused existing Message-ID'); - $config->each_inbox(sub { $_[0]->search->reopen }); +} - $res = $cb->(GET('/v2test/new.atom')); +my $client2 = sub { + my ($cb) = @_; + my $res = $cb->(GET('/v2test/new.atom')); my @ids = ($res->content =~ m!<id>urn:uuid:([^<]+)</id>!sg); my %ids; $ids{$_}++ for @ids; @@ -256,7 +268,11 @@ test_psgi(sub { $www->call(@_) }, sub { is($res->code, 200, 'got info refs for dumb clones w/ .git suffix'); $res = $cb->(GET('/v2test/info/refs')); is($res->code, 404, 'v2 git URL w/o shard fails'); +}; +test_psgi(sub { $www->call(@_) }, $client2); +$run_httpd->($client2, 8); +{ # ensure conflicted attachments can be resolved foreach my $body (qw(old new)) { $mime = eml_load "t/psgi_v2-$body.eml"; @@ -264,7 +280,11 @@ test_psgi(sub { $www->call(@_) }, sub { } $im->done; $config->each_inbox(sub { $_[0]->search->reopen }); - $res = $cb->(GET('/v2test/a@dup/')); +} + +my $client3 = sub { + my ($cb) = @_; + my $res = $cb->(GET('/v2test/a@dup/')); my @links = ($res->content =~ m!"\.\./([^/]+/2-attach\.txt)\"!g); is(scalar(@links), 2, 'both attachment links exist'); isnt($links[0], $links[1], 'attachment links are different'); @@ -276,7 +296,9 @@ test_psgi(sub { $www->call(@_) }, sub { } $res = $cb->(GET('/v2test/?t=1970'.'01'.'01'.'000000')); is($res->code, 404, '404 for out-of-range t= param'); -}); +}; +test_psgi(sub { $www->call(@_) }, $client3); +$run_httpd->($client3, 4); done_testing();
Like with WwwAtomStream and MboxGz, we can bless the existing $ctx object directly to avoid allocating a new hashref. We'll also switch from "->" to "::" to reduce stack utilization. --- Documentation/mknews.perl | 2 +- lib/PublicInbox/Feed.pm | 2 +- lib/PublicInbox/SearchView.pm | 2 +- lib/PublicInbox/View.pm | 8 ++--- lib/PublicInbox/WwwStream.pm | 59 +++++++++++++++-------------------- 5 files changed, 33 insertions(+), 40 deletions(-) diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl index 2d22d147e..1bd704e68 100755 --- a/Documentation/mknews.perl +++ b/Documentation/mknews.perl @@ -112,7 +112,7 @@ sub html_start { my ($out, $ctx) = @_; require PublicInbox::WwwStream; $ctx->{www} = My::MockObject->new(style => ''); - my $www_stream = PublicInbox::WwwStream->new($ctx); + my $www_stream = PublicInbox::WwwStream::init($ctx); print $out $www_stream->_html_top, '<pre>' or die; } diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm index 4c1056b46..f25dd267e 100644 --- a/lib/PublicInbox/Feed.pm +++ b/lib/PublicInbox/Feed.pm @@ -70,7 +70,7 @@ sub new_html { $ctx->{-html_tip} = '<pre>'; $ctx->{-upfx} = ''; $ctx->{-hr} = 1; - PublicInbox::WwwStream->response($ctx, 200, \&new_html_i); + PublicInbox::WwwStream::response($ctx, 200, \&new_html_i); } # private subs diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm index d53a533e5..71c3ae707 100644 --- a/lib/PublicInbox/SearchView.pm +++ b/lib/PublicInbox/SearchView.pm @@ -82,7 +82,7 @@ retry: $cb = mset_summary($ctx, $mset, $q); } } - PublicInbox::WwwStream->response($ctx, $code, $cb); + PublicInbox::WwwStream::response($ctx, $code, $cb); } # display non-nested search results similar to what users expect from diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 0bc2b06e4..4d6f44e0b 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -66,7 +66,7 @@ sub msg_page { $ctx->{mhref} = $next ? '../'.mid_href($smsg->{mid}).'/' : ''; multipart_text_as_html($mime, $ctx); $ctx->{-html_tip} = (${delete $ctx->{obuf}} .= '</pre><hr>'); - PublicInbox::WwwStream->response($ctx, 200, \&msg_page_i); + PublicInbox::WwwStream::response($ctx, 200, \&msg_page_i); } sub msg_page_more { # cold @@ -413,7 +413,7 @@ sub stream_thread ($$) { $ctx->{-title_html} = ascii_html($smsg->{subject}); $ctx->{-html_tip} = thread_eml_entry($ctx, $level, $smsg, $eml); $ctx->{-queue} = \@q; - PublicInbox::WwwStream->response($ctx, 200, \&stream_thread_i); + PublicInbox::WwwStream::response($ctx, 200, \&stream_thread_i); } # /$INBOX/$MESSAGE_ID/t/ @@ -459,7 +459,7 @@ sub thread_html { $ctx->{-title_html} = ascii_html($smsg->{subject}); $ctx->{-html_tip} = '<pre>'.eml_entry($ctx, $smsg, $eml, scalar @$msgs); $ctx->{msgs} = $msgs; - PublicInbox::WwwStream->response($ctx, 200, \&thread_html_i); + PublicInbox::WwwStream::response($ctx, 200, \&thread_html_i); } sub thread_html_i { # PublicInbox::WwwStream::getline callback @@ -1213,7 +1213,7 @@ sub index_topics { if (@$msgs) { walk_thread(thread_results($ctx, $msgs), $ctx, \&acc_topic); } - PublicInbox::WwwStream->response($ctx, dump_topics($ctx), \&index_nav); + PublicInbox::WwwStream::response($ctx, dump_topics($ctx), \&index_nav); } sub thread_adj_level { diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index 8623440b8..c80440d14 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -27,28 +27,24 @@ sub base_url ($) { $base_url; } -sub new { - my ($class, $ctx, $cb) = @_; - - bless { - nr => 0, - cb => $cb, - ctx => $ctx, - base_url => base_url($ctx), - }, $class; +sub init { + my ($ctx, $cb) = @_; + $ctx->{cb} = $cb; + $ctx->{base_url} = base_url($ctx); + $ctx->{nr} = 0; + bless $ctx, __PACKAGE__; } sub response { - my ($class, $ctx, $code, $cb) = @_; + my ($ctx, $code, $cb) = @_; my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ]; - my $self = $class->new($ctx, $cb); - $self->{gzf} = gzf_maybe($h, $ctx->{env}); - [ $code, $h, $self ] + init($ctx, $cb); + $ctx->{gzf} = gzf_maybe($h, $ctx->{env}); + [ $code, $h, $ctx ] } sub _html_top ($) { - my ($self) = @_; - my $ctx = $self->{ctx}; + my ($ctx) = @_; my $ibx = $ctx->{-inbox}; my $desc = ascii_html($ibx->description); my $title = delete($ctx->{-title_html}) // $desc; @@ -89,14 +85,13 @@ sub code_footer ($) { } sub _html_end { - my ($self) = @_; + my ($ctx) = @_; my $urls = 'Archives are clonable:'; - my $ctx = $self->{ctx}; my $ibx = $ctx->{-inbox}; my $desc = ascii_html($ibx->description); my @urls; - my $http = $self->{base_url}; + my $http = $ctx->{base_url}; my $max = $ibx->max_git_epoch; my $dir = (split(m!/!, $http))[-1]; my %seen = ($http => 1); @@ -163,41 +158,39 @@ EOF # callback for HTTP.pm (and any other PSGI servers) sub getline { - my ($self) = @_; - my $nr = $self->{nr}++; + my ($ctx) = @_; + my $nr = $ctx->{nr}++; my $buf = do { if ($nr == 0) { - _html_top($self); - } elsif (my $middle = $self->{cb}) { - $middle->($nr, $self->{ctx}); + _html_top($ctx); + } elsif (my $middle = $ctx->{cb}) { + $middle->($nr, $ctx); } - } // (delete($self->{cb}) ? _html_end($self) : undef); + } // (delete($ctx->{cb}) ? _html_end($ctx) : undef); # gzf may be GzipFilter, `undef' or `0' - my $gzf = $self->{gzf} or return $buf; + my $gzf = $ctx->{gzf} or return $buf; return $gzf->translate($buf) if defined $buf; - $self->{gzf} = 0; # next call to ->getline returns $buf (== undef) + $ctx->{gzf} = 0; # next call to ->getline returns $buf (== undef) $gzf->translate(undef); } sub html_oneshot ($$;$) { my ($ctx, $code, $sref) = @_; - my $self = bless { - ctx => $ctx, - base_url => base_url($ctx), - }, __PACKAGE__; + $ctx->{base_url} = base_url($ctx); + bless $ctx, __PACKAGE__; my @x; my $h = [ 'Content-Type' => 'text/html; charset=UTF-8', 'Content-Length' => undef ]; if (my $gzf = gzf_maybe($h, $ctx->{env})) { - $gzf->zmore(_html_top($self)); + $gzf->zmore(_html_top($ctx)); $gzf->zmore($$sref) if $sref; - $x[0] = $gzf->zflush(_html_end($self)); + $x[0] = $gzf->zflush(_html_end($ctx)); $h->[3] = length($x[0]); } else { - @x = (_html_top($self), $sref ? $$sref : (), _html_end($self)); + @x = (_html_top($ctx), $sref ? $$sref : (), _html_end($ctx)); $h->[3] += bytes::length($_) for @x; } [ $code, $h, \@x ]
This will make it easier to support asynchronous blob retrievals. The `$ctx->{nr}' counter is no longer implicitly supplied since many users didn't care for it, so stack overhead is slightly reduced. --- Documentation/mknews.perl | 4 +- lib/PublicInbox/Feed.pm | 3 +- lib/PublicInbox/SearchView.pm | 28 +++--- lib/PublicInbox/View.pm | 181 +++++++++++++++++----------------- lib/PublicInbox/WwwStream.pm | 19 ++-- 5 files changed, 113 insertions(+), 122 deletions(-) diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl index 1bd704e68..51d54b716 100755 --- a/Documentation/mknews.perl +++ b/Documentation/mknews.perl @@ -37,7 +37,7 @@ if ($dst eq 'NEWS') { my $ibx = My::MockObject->new( description => 'public-inbox releases', over => undef, - search => 1, # for WwwStream:_html_top + search => 1, # for WwwStream::html_top base_url => "$base_url/", ); $ibx->{-primary_address} = $addr; @@ -113,7 +113,7 @@ sub html_start { require PublicInbox::WwwStream; $ctx->{www} = My::MockObject->new(style => ''); my $www_stream = PublicInbox::WwwStream::init($ctx); - print $out $www_stream->_html_top, '<pre>' or die; + print $out $www_stream->html_top, '<pre>' or die; } sub html_end { diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm index f25dd267e..b15fc3a09 100644 --- a/lib/PublicInbox/Feed.pm +++ b/lib/PublicInbox/Feed.pm @@ -50,7 +50,8 @@ sub generate_html_index { } sub new_html_i { - my ($nr, $ctx) = @_; + my ($ctx) = @_; + return $ctx->html_top if exists $ctx->{-html_tip}; my $msgs = $ctx->{msgs}; while (my $smsg = shift @$msgs) { my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next; diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm index 71c3ae707..eeebdfa31 100644 --- a/lib/PublicInbox/SearchView.pm +++ b/lib/PublicInbox/SearchView.pm @@ -10,12 +10,11 @@ use PublicInbox::Smsg; use PublicInbox::Hval qw(ascii_html obfuscate_addrs mid_href); use PublicInbox::View; use PublicInbox::WwwAtomStream; +use PublicInbox::WwwStream qw(html_oneshot); use PublicInbox::SearchThread; our $LIM = 200; my %rmap_inc; -my $noop = sub {}; - sub mbox_results { my ($ctx) = @_; my $q = PublicInbox::SearchQuery->new($ctx->{qp}); @@ -48,7 +47,7 @@ sub sres_top_html { relevance => $q->{r}, asc => $asc, }; - my ($mset, $total, $err, $cb); + my ($mset, $total, $err, $html); retry: eval { $mset = $srch->query($query, $opts); @@ -58,8 +57,7 @@ retry: ctx_prepare($q, $ctx); if ($err) { $code = 400; - $ctx->{-html_tip} = '<pre>'.err_txt($ctx, $err).'</pre><hr>'; - $cb = $noop; + $html = '<pre>'.err_txt($ctx, $err).'</pre><hr>'; } elsif ($total == 0) { if (defined($ctx->{-uxs_retried})) { # undo retry damage: @@ -70,19 +68,16 @@ retry: goto retry; } $code = 404; - $ctx->{-html_tip} = "<pre>\n[No results found]</pre><hr>"; - $cb = $noop; + $html = "<pre>\n[No results found]</pre><hr>"; } else { return adump($_[0], $mset, $q, $ctx) if $x eq 'A'; $ctx->{-html_tip} = search_nav_top($mset, $q, $ctx); - if ($x eq 't') { - $cb = mset_thread($ctx, $mset, $q); - } else { - $cb = mset_summary($ctx, $mset, $q); - } + return mset_thread($ctx, $mset, $q) if $x eq 't'; + mset_summary($ctx, $mset, $q); # appends to {-html_tip} + $html = ''; } - PublicInbox::WwwStream::response($ctx, $code, $cb); + html_oneshot($ctx, $code); } # display non-nested search results similar to what users expect from @@ -122,7 +117,7 @@ sub mset_summary { $$res .= "$pfx - by $f @ $date UTC [$pct%]\n\n"; } $$res .= search_nav_bot($mset, $q); - $noop; + undef; } # shorten "/full/path/to/Foo/Bar.pm" to "Foo/Bar.pm" so error @@ -292,12 +287,13 @@ sub mset_thread { @$msgs = reverse @$msgs if $r; $ctx->{msgs} = $msgs; - \&mset_thread_i; + PublicInbox::WwwStream::response($ctx, 200, \&mset_thread_i); } # callback for PublicInbox::WwwStream::getline sub mset_thread_i { - my ($nr, $ctx) = @_; + my ($ctx) = @_; + return $ctx->html_top if exists $ctx->{-html_tip}; my $msgs = $ctx->{msgs} or return; while (my $smsg = pop @$msgs) { my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next; diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 4d6f44e0b..243528263 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -27,60 +27,60 @@ use constant TCHILD => '` '; sub th_pfx ($) { $_[0] == 0 ? '' : TCHILD }; sub msg_page_i { - my ($nr, $ctx) = @_; - if (my $more = delete $ctx->{more}) { # unlikely - # fake an EOF if $more retrieval fails; - eval { msg_page_more($ctx, $nr, @$more) }; - } elsif (my $hdr = delete $ctx->{hdr}) { - # fake an EOF if generating the footer fails; - # we want to at least show the message if something - # here crashes: - eval { html_footer($ctx, $hdr) }; - } else { - undef + my ($ctx) = @_; + my $cur = delete $ctx->{smsg} or return; # undef: done + my $nxt; + if (my $over = $ctx->{-inbox}->over) { + $nxt = $ctx->{smsg} = $over->next_by_mid(@{$ctx->{next_arg}}); } + $ctx->{mhref} = ($ctx->{nr} || $nxt) ? + "../${\mid_href($cur->{mid})}/" : ''; + my $eml = $ctx->{-inbox}->smsg_eml($cur) or return; + my $hdr = $eml->header_obj; + my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx); + multipart_text_as_html($eml, $ctx); + delete $ctx->{obuf}; + $$obuf .= '</pre><hr>'; + # we want to at least show the message if something + # here crashes: + eval { $$obuf .= html_footer($ctx, $ctx->{first_hdr}) } if !$nxt; + $$obuf; +} + +# /$INBOX/$MESSAGE_ID/ for unindexed v1 inboxes +sub no_over_i { + my ($ctx) = @_; + my $eml = delete $ctx->{eml} or return; + my $hdr = $eml->header_obj; + $ctx->{mhref} = ''; + my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx); + multipart_text_as_html($eml, $ctx); + delete $ctx->{obuf}; + $$obuf .= '</pre><hr>'; + eval { $$obuf .= html_footer($ctx, $hdr) }; + $$obuf +} + +sub no_over_html ($) { + my ($ctx) = @_; + my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return; # 404 + $ctx->{eml} = PublicInbox::Eml->new($bref); + PublicInbox::WwwStream::response($ctx, 200, \&no_over_i); } # public functions: (unstable) sub msg_page { my ($ctx) = @_; - my $mid = $ctx->{mid}; my $ibx = $ctx->{-inbox}; - my ($smsg, $first, $next); - if (my $over = $ibx->over) { - my ($id, $prev); - $smsg = $over->next_by_mid($mid, \$id, \$prev) or return; - $first = $ibx->msg_by_smsg($smsg) or return; - $next = $over->next_by_mid($mid, \$id, \$prev); - $ctx->{more} = [ $id, $prev, $next ] if $next; - } else { - $first = $ibx->msg_by_mid($mid) or return; - } - my $mime = PublicInbox::Eml->new($first); $ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef; - my $hdr = $ctx->{hdr} = $mime->header_obj; - $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx, 0); - $ctx->{smsg} = $smsg; - # $next cannot be true w/o $smsg being defined: - $ctx->{mhref} = $next ? '../'.mid_href($smsg->{mid}).'/' : ''; - multipart_text_as_html($mime, $ctx); - $ctx->{-html_tip} = (${delete $ctx->{obuf}} .= '</pre><hr>'); + my $over = $ibx->over or return no_over_html($ctx); + my ($id, $prev); + my $next_arg = $ctx->{next_arg} = [ $ctx->{mid}, \$id, \$prev ]; + $ctx->{smsg} = $over->next_by_mid(@$next_arg) or return; PublicInbox::WwwStream::response($ctx, 200, \&msg_page_i); } -sub msg_page_more { # cold - my ($ctx, $nr, $id, $prev, $smsg) = @_; - my $ibx = $ctx->{-inbox}; - my $next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev); - $ctx->{more} = [ $id, $prev, $next ] if $next; - my $eml = $ibx->smsg_eml($smsg) or return ''; - $ctx->{mhref} = '../' . mid_href($smsg->{mid}) . '/'; - $ctx->{obuf} = _msg_page_prepare_obuf($eml->header_obj, $ctx, $nr); - multipart_text_as_html($eml, $ctx); - ${delete $ctx->{obuf}} .= '</pre><hr>'; -} - # /$INBOX/$MESSAGE_ID/#R sub msg_reply ($$) { my ($ctx, $hdr) = @_; @@ -377,42 +377,40 @@ sub thread_eml_entry { $beg . '<pre>' . eml_entry($ctx, $smsg, $eml, 0) . '</pre>' . $end; } -sub stream_thread_i { # PublicInbox::WwwStream::getline callback - my ($nr, $ctx) = @_; - return unless exists($ctx->{skel}); - my $q = $ctx->{-queue}; +sub next_in_queue ($;$) { + my ($q, $ghost_ok) = @_; while (@$q) { - my $level = shift @$q; - my $node = shift @$q or next; + my ($level, $smsg) = splice(@$q, 0, 2); my $cl = $level + 1; - unshift @$q, map { ($cl, $_) } @{$node->{children}}; - if (my $eml = $ctx->{-inbox}->smsg_eml($node)) { - return thread_eml_entry($ctx, $level, $node, $eml); - } else { - return ghost_index_entry($ctx, $level, $node); - } + unshift @$q, map { ($cl, $_) } @{$smsg->{children}}; + return ($level, $smsg) if $ghost_ok || exists($smsg->{blob}); } - join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{skel}}; + undef; } -sub stream_thread ($$) { - my ($rootset, $ctx) = @_; - my $ibx = $ctx->{-inbox}; - my @q = map { (0, $_) } @$rootset; - my ($smsg, $eml, $level); - while (@q) { - $level = shift @q; - $smsg = shift @q or next; - my $cl = $level + 1; - unshift @q, map { ($cl, $_) } @{$smsg->{children}}; - $eml = $ibx->smsg_eml($smsg) and last; +sub stream_thread_i { # PublicInbox::WwwStream::getline callback + my ($ctx) = @_; + return unless exists($ctx->{skel}); + my $nr = $ctx->{nr}++; + my ($level, $smsg) = next_in_queue($ctx->{-queue}, $nr); + + $smsg or return + join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{skel}}; + + my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return + ghost_index_entry($ctx, $level, $smsg); + + if ($nr == 0) { + $ctx->{-title_html} = ascii_html($smsg->{subject}); + $ctx->html_top . thread_eml_entry($ctx, $level, $smsg, $eml); + } else { + thread_eml_entry($ctx, $level, $smsg, $eml); } - return missing_thread($ctx) unless $eml; +} - $ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef; - $ctx->{-title_html} = ascii_html($smsg->{subject}); - $ctx->{-html_tip} = thread_eml_entry($ctx, $level, $smsg, $eml); - $ctx->{-queue} = \@q; +sub stream_thread ($$) { + my ($rootset, $ctx) = @_; + $ctx->{-queue} = [ map { (0, $_) } @$rootset ]; PublicInbox::WwwStream::response($ctx, 200, \&stream_thread_i); } @@ -451,22 +449,21 @@ sub thread_html { return stream_thread($rootset, $ctx) unless $ctx->{flat}; # flat display: lazy load the full message from smsg - my ($smsg, $eml); - while ($smsg = shift @$msgs) { - $eml = $ibx->smsg_eml($smsg) and last; - } - return missing_thread($ctx) unless $smsg; - $ctx->{-title_html} = ascii_html($smsg->{subject}); - $ctx->{-html_tip} = '<pre>'.eml_entry($ctx, $smsg, $eml, scalar @$msgs); $ctx->{msgs} = $msgs; + $ctx->{-html_tip} = '<pre>'; PublicInbox::WwwStream::response($ctx, 200, \&thread_html_i); } sub thread_html_i { # PublicInbox::WwwStream::getline callback - my ($nr, $ctx) = @_; + my ($ctx) = @_; my $msgs = $ctx->{msgs} or return; while (my $smsg = shift @$msgs) { my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next; + if (exists $ctx->{-html_tip}) { + $ctx->{-title_html} = ascii_html($smsg->{subject}); + return $ctx->html_top . + eml_entry($ctx, $smsg, $eml, scalar @$msgs); + } return eml_entry($ctx, $smsg, $eml, scalar @$msgs); } my ($skel) = delete @$ctx{qw(skel msgs)}; @@ -624,23 +621,23 @@ sub add_text_body { # callback for each_part } sub _msg_page_prepare_obuf { - my ($hdr, $ctx, $nr) = @_; + my ($hdr, $ctx) = @_; my $over = $ctx->{-inbox}->over; my $obfs_ibx = $ctx->{-obfs_ibx}; my $rv = ''; my $mids = mids_for_index($hdr); - if ($nr == 0) { - if ($ctx->{more}) { + my $nr = $ctx->{nr}++; + if ($nr) { # unlikely + $rv .= '<pre>'; + } else { + $ctx->{first_hdr} = $hdr; + if ($ctx->{smsg}) { $rv .= "<pre>WARNING: multiple messages have this Message-ID\n</pre>"; } $rv .= "<pre\nid=b>"; # anchor for body start - } else { - $rv .= '<pre>'; - } - if ($over) { - $ctx->{-upfx} = '../'; } + $ctx->{-upfx} = '../' if $over; my @title; # (Subject[0], From[0]) for my $v ($hdr->header('From')) { my @n = PublicInbox::Address::names($v); @@ -681,7 +678,10 @@ sub _msg_page_prepare_obuf { obfuscate_addrs($obfs_ibx, $v) if $obfs_ibx; # possible :P $rv .= "Date: $v\n"; } - $ctx->{-title_html} = join(' - ', @title); + if (!$nr) { # first (and only) message, common case + $ctx->{-title_html} = join(' - ', @title); + $rv = $ctx->html_top . $rv; + } if (scalar(@$mids) == 1) { # common case my $mhtml = ascii_html($mids->[0]); $rv .= "Message-ID: <$mhtml> "; @@ -1160,8 +1160,9 @@ sub pagination_footer ($$) { "<hr><pre>page: $next$prev</pre>"; } -sub index_nav { # callback for WwwStream - my (undef, $ctx) = @_; +sub index_nav { # callback for WwwStream::getline + my ($ctx) = @_; + return $ctx->html_top if exists $ctx->{-html_tip}; pagination_footer($ctx, '.') } diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index c80440d14..4d82cbb48 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -31,7 +31,6 @@ sub init { my ($ctx, $cb) = @_; $ctx->{cb} = $cb; $ctx->{base_url} = base_url($ctx); - $ctx->{nr} = 0; bless $ctx, __PACKAGE__; } @@ -43,7 +42,7 @@ sub response { [ $code, $h, $ctx ] } -sub _html_top ($) { +sub html_top ($) { my ($ctx) = @_; my $ibx = $ctx->{-inbox}; my $desc = ascii_html($ibx->description); @@ -159,15 +158,9 @@ EOF # callback for HTTP.pm (and any other PSGI servers) sub getline { my ($ctx) = @_; - my $nr = $ctx->{nr}++; - - my $buf = do { - if ($nr == 0) { - _html_top($ctx); - } elsif (my $middle = $ctx->{cb}) { - $middle->($nr, $ctx); - } - } // (delete($ctx->{cb}) ? _html_end($ctx) : undef); + my $cb = $ctx->{cb}; + my $buf = $cb->($ctx) if $cb; + $buf //= delete($ctx->{cb}) ? _html_end($ctx) : undef; # gzf may be GzipFilter, `undef' or `0' my $gzf = $ctx->{gzf} or return $buf; @@ -185,12 +178,12 @@ sub html_oneshot ($$;$) { my $h = [ 'Content-Type' => 'text/html; charset=UTF-8', 'Content-Length' => undef ]; if (my $gzf = gzf_maybe($h, $ctx->{env})) { - $gzf->zmore(_html_top($ctx)); + $gzf->zmore(html_top($ctx)); $gzf->zmore($$sref) if $sref; $x[0] = $gzf->zflush(_html_end($ctx)); $h->[3] = length($x[0]); } else { - @x = (_html_top($ctx), $sref ? $$sref : (), _html_end($ctx)); + @x = (html_top($ctx), $sref ? $$sref : (), _html_end($ctx)); $h->[3] += bytes::length($_) for @x; } [ $code, $h, \@x ]
Virtually all of our responses are going to be gzipped, anyways. This will allow us to utilize zlib as a buffering layer and share common code for async blob retrieval responses. To streamline this and allow GzipFilter to be a parent class, we'll replace the NoopFilter with a similar CompressNoop class which emulates the two Compress::Raw::Zlib::Deflate methods we use. This drops a bunch of redundant code and will hopefully make upcoming WwwStream changes easier to reason about. --- MANIFEST | 2 +- lib/PublicInbox/CompressNoop.pm | 22 +++++++ lib/PublicInbox/GzipFilter.pm | 75 +++++++++++++++++++----- lib/PublicInbox/Mbox.pm | 90 ++++++++--------------------- lib/PublicInbox/MboxGz.pm | 71 +++++------------------ lib/PublicInbox/NoopFilter.pm | 24 -------- lib/PublicInbox/WwwAtomStream.pm | 98 ++++++++------------------------ lib/PublicInbox/WwwListing.pm | 3 +- lib/PublicInbox/WwwStatic.pm | 3 +- 9 files changed, 148 insertions(+), 240 deletions(-) create mode 100644 lib/PublicInbox/CompressNoop.pm delete mode 100644 lib/PublicInbox/NoopFilter.pm diff --git a/MANIFEST b/MANIFEST index 9b0f50203..963caad02 100644 --- a/MANIFEST +++ b/MANIFEST @@ -100,6 +100,7 @@ lib/PublicInbox/Admin.pm lib/PublicInbox/AdminEdit.pm lib/PublicInbox/AltId.pm lib/PublicInbox/Cgit.pm +lib/PublicInbox/CompressNoop.pm lib/PublicInbox/Config.pm lib/PublicInbox/ContentHash.pm lib/PublicInbox/DS.pm @@ -159,7 +160,6 @@ lib/PublicInbox/NNTP.pm lib/PublicInbox/NNTPD.pm lib/PublicInbox/NNTPdeflate.pm lib/PublicInbox/NewsWWW.pm -lib/PublicInbox/NoopFilter.pm lib/PublicInbox/Over.pm lib/PublicInbox/OverIdx.pm lib/PublicInbox/ParentPipe.pm diff --git a/lib/PublicInbox/CompressNoop.pm b/lib/PublicInbox/CompressNoop.pm new file mode 100644 index 000000000..fe73c2d10 --- /dev/null +++ b/lib/PublicInbox/CompressNoop.pm @@ -0,0 +1,22 @@ +# Copyright (C) 2020 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +# Provide the same methods as Compress::Raw::Zlib::Deflate but +# does no transformation of outgoing data +package PublicInbox::CompressNoop; +use strict; +use Compress::Raw::Zlib qw(Z_OK); + +sub new { bless \(my $self), __PACKAGE__ } + +sub deflate { # ($self, $input, $output) + $_[2] .= $_[1]; + Z_OK; +} + +sub flush { # ($self, $output, $flags = Z_FINISH) + $_[1] //= ''; + Z_OK; +} + +1; diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 0a6c56a5d..99d43cf04 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -6,6 +6,10 @@ package PublicInbox::GzipFilter; use strict; use parent qw(Exporter); use Compress::Raw::Zlib qw(Z_FINISH Z_OK); +use PublicInbox::CompressNoop; +use PublicInbox::Eml; +use PublicInbox::GitAsyncCat; + our @EXPORT_OK = qw(gzf_maybe); my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1); my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip); @@ -14,22 +18,41 @@ sub new { bless {}, shift } # for Qspawn if using $env->{'pi-httpd.async'} sub attach { - my ($self, $fh) = @_; - $self->{fh} = $fh; + my ($self, $http_out) = @_; + $self->{http_out} = $http_out; $self } -# returns `0' and not `undef' on failure (see Www*Stream) -sub gzf_maybe ($$) { +sub gz_or_noop { my ($res_hdr, $env) = @_; - return 0 if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/; - my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT); - return 0 if $err != Z_OK; + if (($env->{HTTP_ACCEPT_ENCODING} // '') =~ /\bgzip\b/) { + $env->{'plack.skip-deflater'} = 1; + push @$res_hdr, @GZIP_HDRS; + gzip_or_die(); + } else { + PublicInbox::CompressNoop::new(); + } +} - # in case Plack::Middleware::Deflater is loaded: - $env->{'plack.skip-deflater'} = 1; - push @$res_hdr, @GZIP_HDRS; - bless { gz => $gz }, __PACKAGE__; +sub gzf_maybe ($$) { bless { gz => gz_or_noop(@_) }, __PACKAGE__ } + +sub psgi_response { + my ($self, $code, $res_hdr, $next_cb, $eml_cb) = @_; + my $env = $self->{env}; + $self->{gz} //= gz_or_noop($res_hdr, $env); + if ($env->{'pi-httpd.async'}) { + $self->{async_next} = $next_cb; + $self->{async_eml} = $eml_cb; + my $http = $env->{'psgix.io'}; # PublicInbox::HTTP + $http->{forward} = $self; + sub { + my ($wcb) = @_; # -httpd provided write callback + $self->{http_out} = $wcb->([$code, $res_hdr]); + $next_cb->($http); # start stepping + }; + } else { # generic PSGI code path + [ $code, $res_hdr, $self ]; + } } sub qsp_maybe ($$) { @@ -80,7 +103,7 @@ sub translate ($$) { sub write { # my $ret = bytes::length($_[1]); # XXX does anybody care? - $_[0]->{fh}->write(translate($_[0], $_[1])); + $_[0]->{http_out}->write(translate($_[0], $_[1])); } # similar to ->translate; use this when we're sure we know we have @@ -109,9 +132,31 @@ sub zflush ($;$) { sub close { my ($self) = @_; - my $fh = delete $self->{fh}; - $fh->write(zflush($self)); - $fh->close; + if (my $http_out = delete $self->{http_out}) { + $http_out->write(zflush($self)); + $http_out->close; + } +} + +# this is public-inbox-httpd-specific +sub async_blob_cb { # git->cat_async callback + my ($bref, $oid, $type, $size, $self) = @_; + my $http = $self->{env}->{'psgix.io'} or return; # client abort + my $smsg = $self->{smsg} or die 'BUG: no smsg'; + if (!defined($oid)) { + # it's possible to have TOCTOU if an admin runs + # public-inbox-(edit|purge), just move onto the next message + return $http->next_step($self->{async_next}); + } + $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; + $self->{async_eml}->($self, PublicInbox::Eml->new($bref)); + $http->next_step($self->{async_next}); +} + +sub smsg_blob { + my ($self, $smsg) = @_; + git_async_cat($self->{-inbox}->git, $smsg->{blob}, + \&async_blob_cb, $self); } 1; diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm index 895f828c5..abdf43c93 100644 --- a/lib/PublicInbox/Mbox.pm +++ b/lib/PublicInbox/Mbox.pm @@ -9,13 +9,11 @@ # more common "push" model) package PublicInbox::Mbox; use strict; -use warnings; +use parent 'PublicInbox::GzipFilter'; use PublicInbox::MID qw/mid_escape/; use PublicInbox::Hval qw/to_filename/; use PublicInbox::Smsg; use PublicInbox::Eml; -use PublicInbox::GitAsyncCat; -use PublicInbox::GzipFilter qw(gzf_maybe); # called by PSGI server as body response # this gets called twice for every message, once to return the header, @@ -25,49 +23,35 @@ sub getline { my $smsg = $ctx->{smsg} or return; my $ibx = $ctx->{-inbox}; my $eml = $ibx->smsg_eml($smsg) or return; - $ctx->{smsg} = $ibx->over->next_by_mid($ctx->{mid}, @{$ctx->{id_prev}}); - msg_hdr($ctx, $eml, $smsg->{mid}) . msg_body($eml); -} - -sub close { !!delete($_[0]->{http_out}) } - -sub mbox_async_step ($) { # public-inbox-httpd-only - my ($ctx) = @_; - if (my $smsg = $ctx->{smsg}) { - git_async_cat($ctx->{-inbox}->git, $smsg->{blob}, - \&mbox_blob_cb, $ctx); - } elsif (my $out = delete $ctx->{http_out}) { - $out->close; + my $n = $ctx->{smsg} = $ibx->over->next_by_mid(@{$ctx->{next_arg}}); + $ctx->zmore(msg_hdr($ctx, $eml, $smsg->{mid})); + if ($n) { + $ctx->translate(msg_body($eml)); + } else { # last message + $ctx->zmore(msg_body($eml)); + $ctx->zflush; } } # called by PublicInbox::DS::write -sub mbox_async_next { +sub async_next { my ($http) = @_; # PublicInbox::HTTP my $ctx = $http->{forward} or return; # client aborted eval { - $ctx->{smsg} = $ctx->{-inbox}->over->next_by_mid( - $ctx->{mid}, @{$ctx->{id_prev}}); - mbox_async_step($ctx); + my $smsg = $ctx->{smsg} or return $ctx->close; + $ctx->smsg_blob($smsg); }; + warn "E: $@" if $@; } -# this is public-inbox-httpd-specific -sub mbox_blob_cb { # git->cat_async callback - my ($bref, $oid, $type, $size, $ctx) = @_; - my $http = $ctx->{env}->{'psgix.io'} or return; # client abort - my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg'; - if (!defined($oid)) { - # it's possible to have TOCTOU if an admin runs - # public-inbox-(edit|purge), just move onto the next message - return $http->next_step(\&mbox_async_next); - } else { - $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; - } - my $eml = PublicInbox::Eml->new($bref); - $ctx->{http_out}->write(msg_hdr($ctx, $eml, $smsg->{mid})); - $ctx->{http_out}->write(msg_body($eml)); - $http->next_step(\&mbox_async_next); +sub async_eml { # ->{async_eml} for async_blob_cb + my ($ctx, $eml) = @_; + my $smsg = delete $ctx->{smsg}; + # next message + $ctx->{smsg} = $ctx->{-inbox}->over->next_by_mid(@{$ctx->{next_arg}}); + + $ctx->zmore(msg_hdr($ctx, $eml, $smsg->{mid})); + $ctx->{http_out}->write($ctx->translate(msg_body($eml))); } sub res_hdr ($$) { @@ -97,41 +81,17 @@ sub no_over_raw ($) { [ msg_hdr($ctx, $eml, $ctx->{mid}) . msg_body($eml) ] ] } -sub stream_raw { # MboxGz response callback - my ($ctx) = @_; - delete($ctx->{smsg}) // - $ctx->{-inbox}->over->next_by_mid($ctx->{mid}, - @{$ctx->{id_prev}}); -} - # /$INBOX/$MESSAGE_ID/raw sub emit_raw { my ($ctx) = @_; - my $env = $ctx->{env}; - $ctx->{base_url} = $ctx->{-inbox}->base_url($env); + $ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env}); my $over = $ctx->{-inbox}->over or return no_over_raw($ctx); my ($id, $prev); - my $smsg = $over->next_by_mid($ctx->{mid}, \$id, \$prev) or return; - $ctx->{smsg} = $smsg; + my $mip = $ctx->{next_arg} = [ $ctx->{mid}, \$id, \$prev ]; + my $smsg = $ctx->{smsg} = $over->next_by_mid(@$mip) or return; my $res_hdr = res_hdr($ctx, $smsg->{subject}); - $ctx->{id_prev} = [ \$id, \$prev ]; - - if (my $gzf = gzf_maybe($res_hdr, $env)) { - $ctx->{gz} = delete $gzf->{gz}; - require PublicInbox::MboxGz; - PublicInbox::MboxGz::response($ctx, \&stream_raw, $res_hdr); - } elsif ($env->{'pi-httpd.async'}) { - sub { - my ($wcb) = @_; # -httpd provided write callback - $ctx->{http_out} = $wcb->([200, $res_hdr]); - $ctx->{env}->{'psgix.io'}->{forward} = $ctx; - bless $ctx, __PACKAGE__; - mbox_async_step($ctx); # start stepping - }; - } else { # generic PSGI code path - bless $ctx, __PACKAGE__; # respond to ->getline - [ 200, $res_hdr, $ctx ]; - } + bless $ctx, __PACKAGE__; + $ctx->psgi_response(200, $res_hdr, \&async_next, \&async_eml); } sub msg_hdr ($$;$) { diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm index 716bf7b19..fdd16f68e 100644 --- a/lib/PublicInbox/MboxGz.pm +++ b/lib/PublicInbox/MboxGz.pm @@ -6,77 +6,32 @@ use parent 'PublicInbox::GzipFilter'; use PublicInbox::Eml; use PublicInbox::Hval qw/to_filename/; use PublicInbox::Mbox; -use PublicInbox::GitAsyncCat; *msg_hdr = \&PublicInbox::Mbox::msg_hdr; *msg_body = \&PublicInbox::Mbox::msg_body; -# this is public-inbox-httpd-specific -sub mboxgz_blob_cb { # git->cat_async callback - my ($bref, $oid, $type, $size, $self) = @_; - my $http = $self->{env}->{'psgix.io'} or return; # client abort - my $smsg = delete $self->{smsg} or die 'BUG: no smsg'; - if (!defined($oid)) { - # it's possible to have TOCTOU if an admin runs - # public-inbox-(edit|purge), just move onto the next message - return $http->next_step(\&mboxgz_async_next); - } else { - $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; - } - my $eml = PublicInbox::Eml->new($bref); - $self->zmore(msg_hdr($self, $eml, $smsg->{mid})); - - # PublicInbox::HTTP::{Chunked,Identity}::write - $self->{http_out}->write($self->translate(msg_body($eml))); - - $http->next_step(\&mboxgz_async_next); -} - -# this is public-inbox-httpd-specific -sub mboxgz_async_step ($) { - my ($self) = @_; - if (my $smsg = $self->{smsg} = $self->{cb}->($self)) { - git_async_cat($self->{-inbox}->git, $smsg->{blob}, - \&mboxgz_blob_cb, $self); - } elsif (my $out = delete $self->{http_out}) { - $out->write($self->zflush); - $out->close; - } -} - -# called by PublicInbox::DS::write -sub mboxgz_async_next { +sub async_next ($) { my ($http) = @_; # PublicInbox::HTTP - mboxgz_async_step($http->{forward}); -} - -# called by PublicInbox::HTTP::close, or any other PSGI server -sub close { !!delete($_[0]->{http_out}) } - -sub response { - my ($self, $cb, $res_hdr) = @_; - $self->{cb} = $cb; - bless $self, __PACKAGE__; - if ($self->{env}->{'pi-httpd.async'}) { - sub { - my ($wcb) = @_; # -httpd provided write callback - $self->{http_out} = $wcb->([200, $res_hdr]); - $self->{env}->{'psgix.io'}->{forward} = $self; - mboxgz_async_step($self); # start stepping - }; - } else { # generic PSGI - [ 200, $res_hdr, $self ]; - } + my $ctx = $http->{forward} or return; + eval { + $ctx->{smsg} = $ctx->{cb}->($ctx) or return $ctx->close; + $ctx->smsg_blob($ctx->{smsg}); + }; + warn "E: $@" if $@; } sub mbox_gz { my ($self, $cb, $fn) = @_; + $self->{cb} = $cb; $self->{base_url} = $self->{-inbox}->base_url($self->{env}); $self->{gz} = PublicInbox::GzipFilter::gzip_or_die(); $fn = to_filename($fn // 'no-subject'); $fn = 'no-subject' if $fn eq ''; # http://www.iana.org/assignments/media-types/application/gzip - response($self, $cb, [ qw(Content-Type application/gzip), - 'Content-Disposition', "inline; filename=$fn.mbox.gz" ]); + bless $self, __PACKAGE__; + my $res_hdr = [ 'Content-Type' => 'application/gzip', + 'Content-Disposition' => "inline; filename=$fn.mbox.gz" ]; + $self->psgi_response(200, $res_hdr, \&async_next, + \&PublicInbox::Mbox::async_eml); } # called by Plack::Util::foreach or similar (generic PSGI) diff --git a/lib/PublicInbox/NoopFilter.pm b/lib/PublicInbox/NoopFilter.pm deleted file mode 100644 index a97dbde64..000000000 --- a/lib/PublicInbox/NoopFilter.pm +++ /dev/null @@ -1,24 +0,0 @@ -# Copyright (C) 2020 all contributors <meta@public-inbox.org> -# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> - -package PublicInbox::NoopFilter; -use strict; - -sub new { bless \(my $self = ''), __PACKAGE__ } - -# noop workalike for PublicInbox::GzipFilter methods -sub translate { - my $self = $_[0]; - my $ret = $$self .= ($_[1] // ''); - $$self = ''; - $ret; -} - -sub zmore { - ${$_[0]} .= $_[1]; - undef; -} - -sub zflush { translate($_[0], $_[1]) } - -1; diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm index 583309228..073df1dfa 100644 --- a/lib/PublicInbox/WwwAtomStream.pm +++ b/lib/PublicInbox/WwwAtomStream.pm @@ -7,107 +7,59 @@ # more common "push" model) package PublicInbox::WwwAtomStream; use strict; -use warnings; +use parent 'PublicInbox::GzipFilter'; use POSIX qw(strftime); use Digest::SHA qw(sha1_hex); use PublicInbox::Address; use PublicInbox::Hval qw(ascii_html mid_href); use PublicInbox::MsgTime qw(msg_timestamp); -use PublicInbox::GzipFilter qw(gzf_maybe); -use PublicInbox::GitAsyncCat; - -# called by generic PSGI server after getline, -# and also by PublicInbox::HTTP::close -sub close { !!delete($_[0]->{http_out}) } sub new { my ($class, $ctx, $cb) = @_; $ctx->{feed_base_url} = $ctx->{-inbox}->base_url($ctx->{env}); - $ctx->{cb} = $cb || \&close; + $ctx->{cb} = $cb || \&PublicInbox::GzipFilter::close; $ctx->{emit_header} = 1; bless $ctx, $class; } -# called by PublicInbox::DS::write -sub atom_async_next { +sub async_next ($) { my ($http) = @_; # PublicInbox::HTTP - atom_async_step($http->{forward}); -} - -# this is public-inbox-httpd-specific -sub atom_blob_cb { # git->cat_async callback - my ($bref, $oid, $type, $size, $ctx) = @_; - my $http = $ctx->{env}->{'psgix.io'} or return; # client abort - my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg'; - if (!defined($oid)) { - # it's possible to have TOCTOU if an admin runs - # public-inbox-(edit|purge), just move onto the next message - return $http->next_step(\&atom_async_next); - } else { - $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; - } - my $buf = feed_entry($ctx, $smsg, PublicInbox::Eml->new($bref)); - if (my $gzf = $ctx->{gzf}) { - $buf = $gzf->translate($buf); - } - # PublicInbox::HTTP::{Chunked,Identity}::write - $ctx->{http_out}->write($buf); - - $http->next_step(\&atom_async_next); + my $ctx = $http->{forward} or return; + eval { + if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) { + $ctx->smsg_blob($smsg); + } else { + $ctx->{http_out}->write($ctx->translate('</feed>')); + $ctx->close; + } + }; + warn "E: $@" if $@; } -sub atom_async_step { # this is public-inbox-httpd-specific - my ($ctx) = @_; - if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) { - git_async_cat($ctx->{-inbox}->git, $smsg->{blob}, - \&atom_blob_cb, $ctx); - } elsif (my $out = delete $ctx->{http_out}) { - if (my $gzf = delete $ctx->{gzf}) { - $out->write($gzf->zflush); - } - $out->close; - } +sub async_eml { # ->{async_eml} for async_blob_cb + my ($ctx, $eml) = @_; + my $smsg = delete $ctx->{smsg}; + $ctx->{http_out}->write($ctx->translate(feed_entry($ctx, $smsg, $eml))) } sub response { my ($class, $ctx, $code, $cb) = @_; my $res_hdr = [ 'Content-Type' => 'application/atom+xml' ]; $class->new($ctx, $cb); - $ctx->{gzf} = gzf_maybe($res_hdr, $ctx->{env}); - if ($ctx->{env}->{'pi-httpd.async'}) { - sub { - my ($wcb) = @_; # -httpd provided write callback - $ctx->{http_out} = $wcb->([200, $res_hdr]); - $ctx->{env}->{'psgix.io'}->{forward} = $ctx; - atom_async_step($ctx); # start stepping - }; - } else { - [ $code, $res_hdr, $ctx ]; - } + $ctx->psgi_response($code, $res_hdr, \&async_next, \&async_eml); } # called once for each message by PSGI server sub getline { my ($self) = @_; - my $buf = do { - if (my $middle = $self->{cb}) { - if (my $smsg = $middle->($self)) { - my $eml = $self->{-inbox}->smsg_eml($smsg) or - return ''; - feed_entry($self, $smsg, $eml); - } else { - undef; - } - } - } // (delete($self->{cb}) ? '</feed>' : undef); - - # gzf may be GzipFilter, `undef' or `0' - my $gzf = $self->{gzf} or return $buf; - - return $gzf->translate($buf) if defined $buf; - $self->{gzf} = 0; # next call to ->getline returns $buf (== undef) - $gzf->translate(undef); + my $cb = $self->{cb} or return; + while (my $smsg = $cb->($self)) { + my $eml = $self->{-inbox}->smsg_eml($smsg) or next; + return $self->translate(feed_entry($self, $smsg, $eml)); + } + delete $self->{cb}; + $self->zflush('</feed>'); } # private diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm index d641e6d5c..5f85e3464 100644 --- a/lib/PublicInbox/WwwListing.pm +++ b/lib/PublicInbox/WwwListing.pm @@ -10,7 +10,6 @@ use PublicInbox::Hval qw(ascii_html prurl); use PublicInbox::Linkify; use PublicInbox::View; use PublicInbox::Inbox; -use PublicInbox::NoopFilter; use PublicInbox::GzipFilter qw(gzf_maybe); use bytes (); # bytes::length use HTTP::Date qw(time2str); @@ -108,7 +107,7 @@ sub html ($$) { my ($env, $list) = @_; my $h = [ 'Content-Type', 'text/html; charset=UTF-8', 'Content-Length', undef ]; - my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new(); + my $gzf = gzf_maybe($h, $env); $gzf->zmore('<html><head><title>' . 'public-inbox listing</title>' . '</head><body><pre>'); diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm index d0611949d..051d2e039 100644 --- a/lib/PublicInbox/WwwStatic.pm +++ b/lib/PublicInbox/WwwStatic.pm @@ -17,7 +17,6 @@ use HTTP::Date qw(time2str); use HTTP::Status qw(status_message); use Errno qw(EACCES ENOTDIR ENOENT); use URI::Escape qw(uri_escape_utf8); -use PublicInbox::NoopFilter; use PublicInbox::GzipFilter qw(gzf_maybe); use PublicInbox::Hval qw(ascii_html); use Plack::MIME; @@ -313,7 +312,7 @@ sub dir_response ($$$) { my $path_info_html = ascii_html($path_info); my $h = [qw(Content-Type text/html Content-Length), undef]; - my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new(); + my $gzf = gzf_maybe($h, $env); $gzf->zmore("<html><head><title>Index of $path_info_html</title>" . ${$self->{style}} . "</head><body><pre>Index of $path_info_html</pre><hr><pre>\n");
Z_FINISH is the default for Compress::Raw::Zlib::Deflate->flush, anyways, so there's no reason to import it. And none of C::R::Z is needed in WwwText now that gzf_maybe handles it all. --- lib/PublicInbox/GzipFilter.pm | 6 +++--- lib/PublicInbox/WwwText.pm | 1 - 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 99d43cf04..6380f50e9 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -5,7 +5,7 @@ package PublicInbox::GzipFilter; use strict; use parent qw(Exporter); -use Compress::Raw::Zlib qw(Z_FINISH Z_OK); +use Compress::Raw::Zlib qw(Z_OK); use PublicInbox::CompressNoop; use PublicInbox::Eml; use PublicInbox::GitAsyncCat; @@ -95,7 +95,7 @@ sub translate ($$) { $self->{zbuf} = $zbuf; ''; } else { # undef == EOF - my $err = $gz->flush($zbuf, Z_FINISH); + my $err = $gz->flush($zbuf); die "gzip->flush: $err" if $err != Z_OK; $zbuf; } @@ -125,7 +125,7 @@ sub zflush ($;$) { $err = $gz->deflate($_[1], $zbuf); die "gzip->deflate: $err" if $err != Z_OK; } - $err = $gz->flush($zbuf, Z_FINISH); + $err = $gz->flush($zbuf); die "gzip->flush: $err" if $err != Z_OK; $zbuf; } diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm index ea50428e3..08691684d 100644 --- a/lib/PublicInbox/WwwText.pm +++ b/lib/PublicInbox/WwwText.pm @@ -11,7 +11,6 @@ use PublicInbox::WwwStream; use PublicInbox::Hval qw(ascii_html); use URI::Escape qw(uri_escape_utf8); use PublicInbox::GzipFilter qw(gzf_maybe); -use Compress::Raw::Zlib qw(Z_FINISH Z_OK); our $QP_URL = 'https://xapian.org/docs/queryparser.html'; our $WIKI_URL = 'https://en.wikipedia.org/wiki'; my $hl = eval {
parent.pm is leaner than base and we'll rely on `-w' for warnings during development. --- lib/PublicInbox/WwwStream.pm | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index 4d82cbb48..fd558e1b7 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -8,8 +8,7 @@ # more common "push" model) package PublicInbox::WwwStream; use strict; -use warnings; -use base qw(Exporter); +use parent qw(Exporter); our @EXPORT_OK = qw(html_oneshot); use bytes (); # length use PublicInbox::Hval qw(ascii_html prurl);
This makes WwwStream closer to MboxGz and WwwAtomStream and will eventually allow us to follow the same patterns. --- lib/PublicInbox/WwwStream.pm | 45 +++++++++++++++--------------------- 1 file changed, 18 insertions(+), 27 deletions(-) diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index fd558e1b7..42fb183f4 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -8,11 +8,10 @@ # more common "push" model) package PublicInbox::WwwStream; use strict; -use parent qw(Exporter); +use parent qw(Exporter PublicInbox::GzipFilter); our @EXPORT_OK = qw(html_oneshot); use bytes (); # length use PublicInbox::Hval qw(ascii_html prurl); -use PublicInbox::GzipFilter qw(gzf_maybe); our $TOR_URL = 'https://www.torproject.org/'; our $CODE_URL = 'https://public-inbox.org/public-inbox.git'; @@ -35,10 +34,10 @@ sub init { sub response { my ($ctx, $code, $cb) = @_; - my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ]; + my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8' ]; init($ctx, $cb); - $ctx->{gzf} = gzf_maybe($h, $ctx->{env}); - [ $code, $h, $ctx ] + $ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env}); + [ $code, $res_hdr, $ctx ] } sub html_top ($) { @@ -157,35 +156,27 @@ EOF # callback for HTTP.pm (and any other PSGI servers) sub getline { my ($ctx) = @_; - my $cb = $ctx->{cb}; - my $buf = $cb->($ctx) if $cb; - $buf //= delete($ctx->{cb}) ? _html_end($ctx) : undef; - - # gzf may be GzipFilter, `undef' or `0' - my $gzf = $ctx->{gzf} or return $buf; - - return $gzf->translate($buf) if defined $buf; - $ctx->{gzf} = 0; # next call to ->getline returns $buf (== undef) - $gzf->translate(undef); + my $cb = $ctx->{cb} or return; + if (defined(my $buf = $cb->($ctx))) { + return $ctx->translate($buf); + } + delete $ctx->{cb}; + $ctx->zflush(_html_end($ctx)); } sub html_oneshot ($$;$) { my ($ctx, $code, $sref) = @_; $ctx->{base_url} = base_url($ctx); bless $ctx, __PACKAGE__; - my @x; - my $h = [ 'Content-Type' => 'text/html; charset=UTF-8', + my @bdy; + my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8', 'Content-Length' => undef ]; - if (my $gzf = gzf_maybe($h, $ctx->{env})) { - $gzf->zmore(html_top($ctx)); - $gzf->zmore($$sref) if $sref; - $x[0] = $gzf->zflush(_html_end($ctx)); - $h->[3] = length($x[0]); - } else { - @x = (html_top($ctx), $sref ? $$sref : (), _html_end($ctx)); - $h->[3] += bytes::length($_) for @x; - } - [ $code, $h, \@x ] + $ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env}); + $ctx->zmore(html_top($ctx)); + $ctx->zmore($$sref) if $sref; + $bdy[0] = $ctx->zflush(_html_end($ctx)); + $res_hdr->[3] = bytes::length($bdy[0]); + [ $code, $res_hdr, \@bdy ] } 1;
This will allow -httpd to handle other requusts if waiting on an HDD seek or git to decode a blob. --- lib/PublicInbox/View.pm | 38 ++++++++++++++++------------------ lib/PublicInbox/WwwStream.pm | 40 +++++++++++++++++++++++++++++++----- 2 files changed, 53 insertions(+), 25 deletions(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 243528263..98445f0e0 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -27,24 +27,22 @@ use constant TCHILD => '` '; sub th_pfx ($) { $_[0] == 0 ? '' : TCHILD }; sub msg_page_i { - my ($ctx) = @_; - my $cur = delete $ctx->{smsg} or return; # undef: done - my $nxt; - if (my $over = $ctx->{-inbox}->over) { - $nxt = $ctx->{smsg} = $over->next_by_mid(@{$ctx->{next_arg}}); + my ($ctx, $eml) = @_; + if ($eml) { # called by WwwStream::async_eml or getline + my $smsg = $ctx->{smsg}; + $ctx->{smsg} = $ctx->{over}->next_by_mid(@{$ctx->{next_arg}}); + $ctx->{mhref} = ($ctx->{nr} || $ctx->{smsg}) ? + "../${\mid_href($smsg->{mid})}/" : ''; + my $hdr = $eml->header_obj; + my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx); + multipart_text_as_html($eml, $ctx); + delete $ctx->{obuf}; + $$obuf .= '</pre><hr>'; + $$obuf .= html_footer($ctx, $ctx->{first_hdr}) if !$ctx->{smsg}; + $$obuf; + } else { # called by WwwStream::async_next or getline + $ctx->{smsg}; # may be undef } - $ctx->{mhref} = ($ctx->{nr} || $nxt) ? - "../${\mid_href($cur->{mid})}/" : ''; - my $eml = $ctx->{-inbox}->smsg_eml($cur) or return; - my $hdr = $eml->header_obj; - my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx); - multipart_text_as_html($eml, $ctx); - delete $ctx->{obuf}; - $$obuf .= '</pre><hr>'; - # we want to at least show the message if something - # here crashes: - eval { $$obuf .= html_footer($ctx, $ctx->{first_hdr}) } if !$nxt; - $$obuf; } # /$INBOX/$MESSAGE_ID/ for unindexed v1 inboxes @@ -74,11 +72,11 @@ sub msg_page { my ($ctx) = @_; my $ibx = $ctx->{-inbox}; $ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef; - my $over = $ibx->over or return no_over_html($ctx); + my $over = $ctx->{over} = $ibx->over or return no_over_html($ctx); my ($id, $prev); my $next_arg = $ctx->{next_arg} = [ $ctx->{mid}, \$id, \$prev ]; - $ctx->{smsg} = $over->next_by_mid(@$next_arg) or return; - PublicInbox::WwwStream::response($ctx, 200, \&msg_page_i); + $ctx->{smsg} = $over->next_by_mid(@$next_arg) or return; # undef == 404 + PublicInbox::WwwStream::aresponse($ctx, 200, \&msg_page_i); } # /$INBOX/$MESSAGE_ID/#R diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index 42fb183f4..eecc27019 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -15,9 +15,6 @@ use PublicInbox::Hval qw(ascii_html prurl); our $TOR_URL = 'https://www.torproject.org/'; our $CODE_URL = 'https://public-inbox.org/public-inbox.git'; -# noop for HTTP.pm (and any other PSGI servers) -sub close {} - sub base_url ($) { my $ctx = shift; my $base_url = $ctx->{-inbox}->base_url($ctx->{env}); @@ -40,6 +37,11 @@ sub response { [ $code, $res_hdr, $ctx ] } +sub async_eml { # ->{async_eml} for async_blob_cb + my ($ctx, $eml) = @_; + $ctx->{http_out}->write($ctx->translate($ctx->{cb}->($ctx, $eml))); +} + sub html_top ($) { my ($ctx) = @_; my $ibx = $ctx->{-inbox}; @@ -157,8 +159,14 @@ EOF sub getline { my ($ctx) = @_; my $cb = $ctx->{cb} or return; - if (defined(my $buf = $cb->($ctx))) { - return $ctx->translate($buf); + while (defined(my $x = $cb->($ctx))) { # x = smsg or scalar non-ref + if (ref($x)) { # smsg + my $eml = $ctx->{-inbox}->smsg_eml($x) or next; + $ctx->{smsg} = $x; + return $ctx->translate($cb->($ctx, $eml)); + } else { # scalar + return $ctx->translate($x); + } } delete $ctx->{cb}; $ctx->zflush(_html_end($ctx)); @@ -179,4 +187,26 @@ sub html_oneshot ($$;$) { [ $code, $res_hdr, \@bdy ] } +sub async_next ($) { + my ($http) = @_; # PublicInbox::HTTP + my $ctx = $http->{forward} or return; + eval { + if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) { + $ctx->smsg_blob($smsg); + } else { + $ctx->{http_out}->write( + $ctx->translate(_html_end($ctx))); + $ctx->close; # GzipFilter->close + } + }; + warn "E: $@" if $@; +} + +sub aresponse { + my ($ctx, $code, $cb) = @_; + my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8' ]; + init($ctx, $cb); + $ctx->psgi_response($code, $res_hdr, \&async_next, \&async_eml); +} + 1;
Once again, this shows a ~10% speedup with multi-message threads in xt/httpd-async-stream.t regardless of whether TEST_JOBS is 1 or 100. --- lib/PublicInbox/View.pm | 44 +++++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 17 deletions(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 98445f0e0..117257a64 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -375,7 +375,7 @@ sub thread_eml_entry { $beg . '<pre>' . eml_entry($ctx, $smsg, $eml, 0) . '</pre>' . $end; } -sub next_in_queue ($;$) { +sub next_in_queue ($$) { my ($q, $ghost_ok) = @_; while (@$q) { my ($level, $smsg) = splice(@$q, 0, 2); @@ -387,29 +387,39 @@ sub next_in_queue ($;$) { } sub stream_thread_i { # PublicInbox::WwwStream::getline callback - my ($ctx) = @_; - return unless exists($ctx->{skel}); - my $nr = $ctx->{nr}++; - my ($level, $smsg) = next_in_queue($ctx->{-queue}, $nr); - - $smsg or return - join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{skel}}; - - my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return - ghost_index_entry($ctx, $level, $smsg); + my ($ctx, $eml) = @_; - if ($nr == 0) { - $ctx->{-title_html} = ascii_html($smsg->{subject}); - $ctx->html_top . thread_eml_entry($ctx, $level, $smsg, $eml); - } else { - thread_eml_entry($ctx, $level, $smsg, $eml); + if ($eml) { + my ($level, $smsg) = delete @$ctx{qw(level smsg)}; + if ($ctx->{nr} == 1) { + $ctx->{-title_html} = ascii_html($smsg->{subject}); + $ctx->zmore($ctx->html_top); + } + return thread_eml_entry($ctx, $level, $smsg, $eml); + } + return unless exists($ctx->{skel}); + my $ghost_ok = $ctx->{nr}++; + while (1) { + my ($lvl, $smsg) = next_in_queue($ctx->{-queue}, $ghost_ok); + if ($smsg) { + if (exists $smsg->{blob}) { # next message for cat-file + $ctx->{level} = $lvl; + return $smsg; + } + # buffer the ghost entry and loop + $ctx->zmore(ghost_index_entry($ctx, $lvl, $smsg)); + } else { # all done + $ctx->zmore(join('', thread_adj_level($ctx, 0))); + $ctx->zmore(${delete($ctx->{skel})}); + return; + } } } sub stream_thread ($$) { my ($rootset, $ctx) = @_; $ctx->{-queue} = [ map { (0, $_) } @$rootset ]; - PublicInbox::WwwStream::response($ctx, 200, \&stream_thread_i); + PublicInbox::WwwStream::aresponse($ctx, 200, \&stream_thread_i); } # /$INBOX/$MESSAGE_ID/t/
Another 10% or so speedup in a frequently-hit endpoint. --- lib/PublicInbox/View.pm | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 117257a64..16a0fcdfb 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -459,23 +459,26 @@ sub thread_html { # flat display: lazy load the full message from smsg $ctx->{msgs} = $msgs; $ctx->{-html_tip} = '<pre>'; - PublicInbox::WwwStream::response($ctx, 200, \&thread_html_i); + PublicInbox::WwwStream::aresponse($ctx, 200, \&thread_html_i); } sub thread_html_i { # PublicInbox::WwwStream::getline callback - my ($ctx) = @_; - my $msgs = $ctx->{msgs} or return; - while (my $smsg = shift @$msgs) { - my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next; + my ($ctx, $eml) = @_; + if ($eml) { + my $smsg = $ctx->{smsg}; if (exists $ctx->{-html_tip}) { $ctx->{-title_html} = ascii_html($smsg->{subject}); - return $ctx->html_top . - eml_entry($ctx, $smsg, $eml, scalar @$msgs); + $ctx->zmore($ctx->html_top); + } + return eml_entry($ctx, $smsg, $eml, scalar @{$ctx->{msgs}}); + } else { + while (my $smsg = shift @{$ctx->{msgs}}) { + return $smsg if exists($smsg->{blob}); } - return eml_entry($ctx, $smsg, $eml, scalar @$msgs); + my $skel = delete($ctx->{skel}) or return; # all done + $ctx->zmore($$skel); + undef; } - my ($skel) = delete @$ctx{qw(skel msgs)}; - $$skel; } sub multipart_text_as_html {
$ctx->{msgs} won't ever contain undef values. --- lib/PublicInbox/Feed.pm | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm index b15fc3a09..9141faaf0 100644 --- a/lib/PublicInbox/Feed.pm +++ b/lib/PublicInbox/Feed.pm @@ -11,9 +11,7 @@ use PublicInbox::Smsg; # this loads w/o Search::Xapian sub generate_i { my ($ctx) = @_; - while (my $smsg = shift @{$ctx->{msgs}}) { - return $smsg; - } + shift @{$ctx->{msgs}}; } # main function
Once again this speeds another endpoint up 10% or so. --- lib/PublicInbox/Feed.pm | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm index 9141faaf0..279106d28 100644 --- a/lib/PublicInbox/Feed.pm +++ b/lib/PublicInbox/Feed.pm @@ -48,15 +48,15 @@ sub generate_html_index { } sub new_html_i { - my ($ctx) = @_; - return $ctx->html_top if exists $ctx->{-html_tip}; - my $msgs = $ctx->{msgs}; - while (my $smsg = shift @$msgs) { - my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next; - return PublicInbox::View::eml_entry($ctx, $smsg, $eml, - scalar @$msgs); - } - PublicInbox::View::pagination_footer($ctx, './new.html'); + my ($ctx, $eml) = @_; + $ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip}; + + $eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml, + scalar @{$ctx->{msgs}}); + my $smsg = shift @{$ctx->{msgs}} or + $ctx->zmore(PublicInbox::View::pagination_footer( + $ctx, './new.html')); + $smsg; } sub new_html { @@ -69,7 +69,7 @@ sub new_html { $ctx->{-html_tip} = '<pre>'; $ctx->{-upfx} = ''; $ctx->{-hr} = 1; - PublicInbox::WwwStream::response($ctx, 200, \&new_html_i); + PublicInbox::WwwStream::aresponse($ctx, 200, \&new_html_i); } # private subs
Another 10% or so speedup when displaying full messages off search results. --- lib/PublicInbox/SearchView.pm | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm index eeebdfa31..921992a5d 100644 --- a/lib/PublicInbox/SearchView.pm +++ b/lib/PublicInbox/SearchView.pm @@ -287,21 +287,18 @@ sub mset_thread { @$msgs = reverse @$msgs if $r; $ctx->{msgs} = $msgs; - PublicInbox::WwwStream::response($ctx, 200, \&mset_thread_i); + PublicInbox::WwwStream::aresponse($ctx, 200, \&mset_thread_i); } # callback for PublicInbox::WwwStream::getline sub mset_thread_i { - my ($ctx) = @_; - return $ctx->html_top if exists $ctx->{-html_tip}; - my $msgs = $ctx->{msgs} or return; - while (my $smsg = pop @$msgs) { - my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next; - return PublicInbox::View::eml_entry($ctx, $smsg, $eml, - scalar @$msgs); - } - my ($skel) = delete @$ctx{qw(skel msgs)}; - $$skel .= "\n</pre>"; + my ($ctx, $eml) = @_; + $ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip}; + $eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml, + scalar @{$ctx->{msgs}}); + my $smsg = shift @{$ctx->{msgs}} or + $ctx->zmore(${delete($ctx->{skel})}); + $smsg; } sub ctx_prepare {
We can save stack space and simplify subroutine calls, here. --- Documentation/mknews.perl | 4 ++-- lib/PublicInbox/Feed.pm | 2 +- lib/PublicInbox/SearchView.pm | 2 +- lib/PublicInbox/View.pm | 17 +++++++++-------- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl index 51d54b716..4a5d0e563 100755 --- a/Documentation/mknews.perl +++ b/Documentation/mknews.perl @@ -103,9 +103,9 @@ sub mime2txt { sub mime2html { my ($out, $eml, $ctx) = @_; - my $smsg = bless {}, 'PublicInbox::Smsg'; + my $smsg = $ctx->{smsg} = bless {}, 'PublicInbox::Smsg'; $smsg->populate($eml); - print $out PublicInbox::View::eml_entry($ctx, $smsg, $eml, 1) or die; + print $out PublicInbox::View::eml_entry($ctx, $eml, 1) or die; } sub html_start { diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm index 279106d28..476d946f5 100644 --- a/lib/PublicInbox/Feed.pm +++ b/lib/PublicInbox/Feed.pm @@ -51,7 +51,7 @@ sub new_html_i { my ($ctx, $eml) = @_; $ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip}; - $eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml, + $eml and return PublicInbox::View::eml_entry($ctx, $eml, scalar @{$ctx->{msgs}}); my $smsg = shift @{$ctx->{msgs}} or $ctx->zmore(PublicInbox::View::pagination_footer( diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm index 921992a5d..623b16fb2 100644 --- a/lib/PublicInbox/SearchView.pm +++ b/lib/PublicInbox/SearchView.pm @@ -294,7 +294,7 @@ sub mset_thread { sub mset_thread_i { my ($ctx, $eml) = @_; $ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip}; - $eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml, + $eml and return PublicInbox::View::eml_entry($ctx, $eml, scalar @{$ctx->{msgs}}); my $smsg = shift @{$ctx->{msgs}} or $ctx->zmore(${delete($ctx->{skel})}); diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 16a0fcdfb..656953928 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -181,7 +181,8 @@ sub fmt_ts ($) { strftime('%Y-%m-%d %k:%M', gmtime($_[0])) } # Displays the text of of the message for /$INBOX/$MSGID/[Tt]/ endpoint # this is already inside a <pre> sub eml_entry { - my ($ctx, $smsg, $eml, $more) = @_; + my ($ctx, $eml, $more) = @_; + my $smsg = delete $ctx->{smsg}; my $subj = delete $smsg->{subject}; my $mid_raw = $smsg->{mid}; my $id = id_compress($mid_raw, 1); @@ -370,9 +371,9 @@ sub pre_thread { # walk_thread callback } sub thread_eml_entry { - my ($ctx, $level, $smsg, $eml) = @_; - my ($beg, $end) = thread_adj_level($ctx, $level); - $beg . '<pre>' . eml_entry($ctx, $smsg, $eml, 0) . '</pre>' . $end; + my ($ctx, $eml) = @_; + my ($beg, $end) = thread_adj_level($ctx, $ctx->{level}); + $beg . '<pre>' . eml_entry($ctx, $eml, 0) . '</pre>' . $end; } sub next_in_queue ($$) { @@ -390,12 +391,12 @@ sub stream_thread_i { # PublicInbox::WwwStream::getline callback my ($ctx, $eml) = @_; if ($eml) { - my ($level, $smsg) = delete @$ctx{qw(level smsg)}; if ($ctx->{nr} == 1) { - $ctx->{-title_html} = ascii_html($smsg->{subject}); + $ctx->{-title_html} = + ascii_html($ctx->{smsg}->{subject}); $ctx->zmore($ctx->html_top); } - return thread_eml_entry($ctx, $level, $smsg, $eml); + goto &thread_eml_entry; # tail recursion } return unless exists($ctx->{skel}); my $ghost_ok = $ctx->{nr}++; @@ -470,7 +471,7 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback $ctx->{-title_html} = ascii_html($smsg->{subject}); $ctx->zmore($ctx->html_top); } - return eml_entry($ctx, $smsg, $eml, scalar @{$ctx->{msgs}}); + return eml_entry($ctx, $eml, scalar @{$ctx->{msgs}}); } else { while (my $smsg = shift @{$ctx->{msgs}}) { return $smsg if exists($smsg->{blob});
We can build and buffer the HTML <head> section once the first non-ghost message in a thread is loaded, so there's no need to perform an extra check on $ctx->{nr} once the $eml is ready. --- lib/PublicInbox/View.pm | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 656953928..138e0c3a2 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -389,15 +389,7 @@ sub next_in_queue ($$) { sub stream_thread_i { # PublicInbox::WwwStream::getline callback my ($ctx, $eml) = @_; - - if ($eml) { - if ($ctx->{nr} == 1) { - $ctx->{-title_html} = - ascii_html($ctx->{smsg}->{subject}); - $ctx->zmore($ctx->html_top); - } - goto &thread_eml_entry; # tail recursion - } + goto &thread_eml_entry if $eml; # tail recursion return unless exists($ctx->{skel}); my $ghost_ok = $ctx->{nr}++; while (1) { @@ -405,6 +397,11 @@ sub stream_thread_i { # PublicInbox::WwwStream::getline callback if ($smsg) { if (exists $smsg->{blob}) { # next message for cat-file $ctx->{level} = $lvl; + if (!$ghost_ok) { # first non-ghost + $ctx->{-title_html} = + ascii_html($smsg->{subject}); + $ctx->zmore($ctx->html_top); + } return $smsg; } # buffer the ghost entry and loop
All of our streaming responses use ::aresponse, now, and our synchronous responses use html_oneshot. So there's no need for the old WwwStream::response. --- lib/PublicInbox/View.pm | 28 +++++++++------------------- lib/PublicInbox/WwwStream.pm | 23 ++++++++--------------- 2 files changed, 17 insertions(+), 34 deletions(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 138e0c3a2..895e4f278 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -14,7 +14,7 @@ use PublicInbox::MID qw(id_compress mids mids_for_index references $MID_EXTRACT); use PublicInbox::MsgIter; use PublicInbox::Address; -use PublicInbox::WwwStream; +use PublicInbox::WwwStream qw(html_oneshot); use PublicInbox::Reply; use PublicInbox::ViewDiff qw(flush_diff); use PublicInbox::Eml; @@ -45,25 +45,20 @@ sub msg_page_i { } } -# /$INBOX/$MESSAGE_ID/ for unindexed v1 inboxes -sub no_over_i { +# /$INBOX/$MSGID/ for unindexed v1 inboxes +sub no_over_html ($) { my ($ctx) = @_; - my $eml = delete $ctx->{eml} or return; + my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return; # 404 + my $eml = PublicInbox::Eml->new($bref); my $hdr = $eml->header_obj; $ctx->{mhref} = ''; + PublicInbox::WwwStream::init($ctx); my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx); multipart_text_as_html($eml, $ctx); delete $ctx->{obuf}; $$obuf .= '</pre><hr>'; eval { $$obuf .= html_footer($ctx, $hdr) }; - $$obuf -} - -sub no_over_html ($) { - my ($ctx) = @_; - my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return; # 404 - $ctx->{eml} = PublicInbox::Eml->new($bref); - PublicInbox::WwwStream::response($ctx, 200, \&no_over_i); + html_oneshot($ctx, 200, $obuf); } # public functions: (unstable) @@ -1169,12 +1164,6 @@ sub pagination_footer ($$) { "<hr><pre>page: $next$prev</pre>"; } -sub index_nav { # callback for WwwStream::getline - my ($ctx) = @_; - return $ctx->html_top if exists $ctx->{-html_tip}; - pagination_footer($ctx, '.') -} - sub paginate_recent ($$) { my ($ctx, $lim) = @_; my $t = $ctx->{qp}->{t} || ''; @@ -1223,7 +1212,8 @@ sub index_topics { if (@$msgs) { walk_thread(thread_results($ctx, $msgs), $ctx, \&acc_topic); } - PublicInbox::WwwStream::response($ctx, dump_topics($ctx), \&index_nav); + html_oneshot($ctx, dump_topics($ctx), \pagination_footer($ctx, '.')); + } sub thread_adj_level { diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index eecc27019..7d257a191 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -29,14 +29,6 @@ sub init { bless $ctx, __PACKAGE__; } -sub response { - my ($ctx, $code, $cb) = @_; - my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8' ]; - init($ctx, $cb); - $ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env}); - [ $code, $res_hdr, $ctx ] -} - sub async_eml { # ->{async_eml} for async_blob_cb my ($ctx, $eml) = @_; $ctx->{http_out}->write($ctx->translate($ctx->{cb}->($ctx, $eml))); @@ -174,17 +166,18 @@ sub getline { sub html_oneshot ($$;$) { my ($ctx, $code, $sref) = @_; - $ctx->{base_url} = base_url($ctx); - bless $ctx, __PACKAGE__; - my @bdy; my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8', 'Content-Length' => undef ]; + bless $ctx, __PACKAGE__; $ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env}); - $ctx->zmore(html_top($ctx)); + $ctx->{base_url} //= do { + $ctx->zmore(html_top($ctx)); + base_url($ctx); + }; $ctx->zmore($$sref) if $sref; - $bdy[0] = $ctx->zflush(_html_end($ctx)); - $res_hdr->[3] = bytes::length($bdy[0]); - [ $code, $res_hdr, \@bdy ] + my $bdy = $ctx->zflush(_html_end($ctx)); + $res_hdr->[3] = bytes::length($bdy); + [ $code, $res_hdr, [ $bdy ] ] } sub async_next ($) {
We no longer favor getline+close for streaming PSGI responses when using public-inbox-httpd. We still support it for other PSGI servers, though. --- Documentation/technical/ds.txt | 4 ++-- lib/PublicInbox/GetlineBody.pm | 4 +--- lib/PublicInbox/GzipFilter.pm | 17 +++++++++++++---- lib/PublicInbox/HTTPD.pm | 5 ++--- lib/PublicInbox/Mbox.pm | 8 ++------ lib/PublicInbox/View.pm | 2 +- lib/PublicInbox/WwwAtomStream.pm | 6 ++---- lib/PublicInbox/WwwStream.pm | 7 +++---- 8 files changed, 26 insertions(+), 27 deletions(-) diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt index cbd06cfb4..a0793ca23 100644 --- a/Documentation/technical/ds.txt +++ b/Documentation/technical/ds.txt @@ -64,8 +64,8 @@ Augmented features: * ->requeue support. An optimization of the AddTimer(0, ...) idiom for immediately dispatching code at the next event loop iteration. public-inbox uses this for fairly generating large responses - iteratively (see PublicInbox::NNTP::long_response or the use of - ->getline callbacks for generating gigantic gzipped mboxes). + iteratively (see PublicInbox::NNTP::long_response or git_async_cat + for blob retrievals). New features diff --git a/lib/PublicInbox/GetlineBody.pm b/lib/PublicInbox/GetlineBody.pm index 6becaaf5f..988bc63f4 100644 --- a/lib/PublicInbox/GetlineBody.pm +++ b/lib/PublicInbox/GetlineBody.pm @@ -5,9 +5,7 @@ # end callback when the object goes out-of-scope. # This depends on rpipe being _blocking_ on getline. # -# public-inbox-httpd favors "getline" response bodies to take a -# "pull"-based approach to feeding slow clients (as opposed to a -# more common "push" model) +# This is only used by generic PSGI servers and not public-inbox-httpd package PublicInbox::GetlineBody; use strict; use warnings; diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 6380f50e9..d72ad3c88 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -1,7 +1,16 @@ # Copyright (C) 2020 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> - -# Qspawn filter +# +# In public-inbox <=1.5.0, public-inbox-httpd favored "getline" +# response bodies to take a "pull"-based approach to feeding +# slow clients (as opposed to a more common "push" model). +# +# In newer versions, public-inbox-httpd supports a backpressure-aware +# pull/push model which also accounts for slow git blob storage. +# {async_next} callbacks only run when the DS {wbuf} is drained +# {async_eml} callbacks only run when a blob arrives from git. +# +# We continue to support getline+close for generic PSGI servers. package PublicInbox::GzipFilter; use strict; use parent qw(Exporter); @@ -14,12 +23,12 @@ our @EXPORT_OK = qw(gzf_maybe); my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1); my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip); -sub new { bless {}, shift } +sub new { bless {}, shift } # qspawn filter # for Qspawn if using $env->{'pi-httpd.async'} sub attach { my ($self, $http_out) = @_; - $self->{http_out} = $http_out; + $self->{http_out} = $http_out; # PublicInbox::HTTP::{Chunked,Identity} $self } diff --git a/lib/PublicInbox/HTTPD.pm b/lib/PublicInbox/HTTPD.pm index 331939699..a9f55ff61 100644 --- a/lib/PublicInbox/HTTPD.pm +++ b/lib/PublicInbox/HTTPD.pm @@ -36,9 +36,8 @@ sub new { # XXX unstable API!, only GitHTTPBackend needs # this to limit git-http-backend(1) parallelism. - # The rest of our PSGI code is generic, relying - # on "pull" model using "getline" to prevent - # over-buffering. + # We also check for the truthiness of this to + # detect when to use git_async_cat for slow blobs 'pi-httpd.async' => \&pi_httpd_async ); bless { diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm index abdf43c93..8726b9f64 100644 --- a/lib/PublicInbox/Mbox.pm +++ b/lib/PublicInbox/Mbox.pm @@ -1,12 +1,8 @@ # Copyright (C) 2015-2020 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> -# Streaming (via getline) interface for formatting messages as an mboxrd. -# Used by the PSGI web interface. -# -# public-inbox-httpd favors "getline" response bodies to take a -# "pull"-based approach to feeding slow clients (as opposed to a -# more common "push" model) +# Streaming interface for mboxrd HTTP responses +# See PublicInbox::GzipFilter for details. package PublicInbox::Mbox; use strict; use parent 'PublicInbox::GzipFilter'; diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 895e4f278..60dad6bac 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -415,7 +415,7 @@ sub stream_thread ($$) { PublicInbox::WwwStream::aresponse($ctx, 200, \&stream_thread_i); } -# /$INBOX/$MESSAGE_ID/t/ +# /$INBOX/$MSGID/t/ and /$INBOX/$MSGID/T/ sub thread_html { my ($ctx) = @_; my $mid = $ctx->{mid}; diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm index 073df1dfa..3b5b133a5 100644 --- a/lib/PublicInbox/WwwAtomStream.pm +++ b/lib/PublicInbox/WwwAtomStream.pm @@ -1,10 +1,8 @@ # Copyright (C) 2016-2020 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> # -# Atom body stream for which yields getline+close methods -# public-inbox-httpd favors "getline" response bodies to take a -# "pull"-based approach to feeding slow clients (as opposed to a -# more common "push" model) +# Atom body stream for HTTP responses +# See PublicInbox::GzipFilter for details. package PublicInbox::WwwAtomStream; use strict; use parent 'PublicInbox::GzipFilter'; diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index 7d257a191..23b03f0e8 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -1,11 +1,10 @@ # Copyright (C) 2016-2020 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> # -# HTML body stream for which yields getline+close methods +# HTML body stream for which yields getline+close methods for +# generic PSGI servers and callbacks for public-inbox-httpd. # -# public-inbox-httpd favors "getline" response bodies to take a -# "pull"-based approach to feeding slow clients (as opposed to a -# more common "push" model) +# See PublicInbox::GzipFilter parent class for more info. package PublicInbox::WwwStream; use strict; use parent qw(Exporter PublicInbox::GzipFilter);
This simplifies the primary callers of eml_entry while only making mknews.perl worse. --- Documentation/mknews.perl | 3 ++- lib/PublicInbox/Feed.pm | 3 +-- lib/PublicInbox/SearchView.pm | 3 +-- lib/PublicInbox/View.pm | 9 +++++---- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl index 4a5d0e563..f053e2bfb 100755 --- a/Documentation/mknews.perl +++ b/Documentation/mknews.perl @@ -105,7 +105,8 @@ sub mime2html { my ($out, $eml, $ctx) = @_; my $smsg = $ctx->{smsg} = bless {}, 'PublicInbox::Smsg'; $smsg->populate($eml); - print $out PublicInbox::View::eml_entry($ctx, $eml, 1) or die; + $ctx->{msgs} = [ 1 ]; # for <hr> in eml_entry + print $out PublicInbox::View::eml_entry($ctx, $eml) or die; } sub html_start { diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm index 476d946f5..bf095a2cc 100644 --- a/lib/PublicInbox/Feed.pm +++ b/lib/PublicInbox/Feed.pm @@ -51,8 +51,7 @@ sub new_html_i { my ($ctx, $eml) = @_; $ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip}; - $eml and return PublicInbox::View::eml_entry($ctx, $eml, - scalar @{$ctx->{msgs}}); + $eml and return PublicInbox::View::eml_entry($ctx, $eml); my $smsg = shift @{$ctx->{msgs}} or $ctx->zmore(PublicInbox::View::pagination_footer( $ctx, './new.html')); diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm index 623b16fb2..84c04c6c4 100644 --- a/lib/PublicInbox/SearchView.pm +++ b/lib/PublicInbox/SearchView.pm @@ -294,8 +294,7 @@ sub mset_thread { sub mset_thread_i { my ($ctx, $eml) = @_; $ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip}; - $eml and return PublicInbox::View::eml_entry($ctx, $eml, - scalar @{$ctx->{msgs}}); + $eml and return PublicInbox::View::eml_entry($ctx, $eml); my $smsg = shift @{$ctx->{msgs}} or $ctx->zmore(${delete($ctx->{skel})}); $smsg; diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 60dad6bac..d7ec4eb0a 100644 --- a/lib/PublicInbox/View.pm +++ b/lib/PublicInbox/View.pm @@ -176,7 +176,7 @@ sub fmt_ts ($) { strftime('%Y-%m-%d %k:%M', gmtime($_[0])) } # Displays the text of of the message for /$INBOX/$MSGID/[Tt]/ endpoint # this is already inside a <pre> sub eml_entry { - my ($ctx, $eml, $more) = @_; + my ($ctx, $eml) = @_; my $smsg = delete $ctx->{smsg}; my $subj = delete $smsg->{subject}; my $mid_raw = $smsg->{mid}; @@ -267,7 +267,8 @@ sub eml_entry { $hr = $ctx->{-hr}; } - $rv .= $more ? '</pre><hr><pre>' : '</pre>' if $hr; + # do we have more messages? start a new <pre> if so + $rv .= scalar(@{$ctx->{msgs}}) ? '</pre><hr><pre>' : '</pre>' if $hr; $rv; } @@ -368,7 +369,7 @@ sub pre_thread { # walk_thread callback sub thread_eml_entry { my ($ctx, $eml) = @_; my ($beg, $end) = thread_adj_level($ctx, $ctx->{level}); - $beg . '<pre>' . eml_entry($ctx, $eml, 0) . '</pre>' . $end; + $beg . '<pre>' . eml_entry($ctx, $eml) . '</pre>' . $end; } sub next_in_queue ($$) { @@ -463,7 +464,7 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback $ctx->{-title_html} = ascii_html($smsg->{subject}); $ctx->zmore($ctx->html_top); } - return eml_entry($ctx, $eml, scalar @{$ctx->{msgs}}); + return eml_entry($ctx, $eml); } else { while (my $smsg = shift @{$ctx->{msgs}}) { return $smsg if exists($smsg->{blob});
gzf_maybe always returns a GzipFilter object, even if it uses CompressNoop. We can also use ->zflush instead of ->translate(undef) here for the final bit. --- lib/PublicInbox/WwwText.pm | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm index 08691684d..a691e2d8e 100644 --- a/lib/PublicInbox/WwwText.pm +++ b/lib/PublicInbox/WwwText.pm @@ -38,16 +38,13 @@ sub get_text { } my $env = $ctx->{env}; if ($raw) { - my $body; - if (my $gzf = $code == 200 ? gzf_maybe($hdr, $env) : undef) { - my $zbuf = $gzf->translate($txt); - undef $txt; - $body = [ $zbuf .= $gzf->translate(undef) ]; - } else { - $body = [ $txt ]; + if ($code == 200) { + my $gzf = gzf_maybe($hdr, $env); + $txt = $gzf->translate($txt); + $txt .= $gzf->zflush; } - $hdr->[3] = bytes::length($body->[0]); - return [ $code, $hdr, $body ] + $hdr->[3] = bytes::length($txt); + return [ $code, $hdr, [ $txt ] ] } # enforce trailing slash for "wget -r" compatibility
We can reuse some of the GzipFilter infrastructure used by other WWW components to handle slow blob retrieval, here. The difference from previous changes is we don't decide on the 200 status code until we've retrieved the blob and found the attachment. While we're at it, ensure we can compress text attachment responses once again, since all text attachments are served as text/plain. --- lib/PublicInbox/WwwAttach.pm | 63 +++++++++++--- t/psgi_attach.t | 162 ++++++++++++++++++++--------------- 2 files changed, 144 insertions(+), 81 deletions(-) diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm index 7e8496d7a..20417295e 100644 --- a/lib/PublicInbox/WwwAttach.pm +++ b/lib/PublicInbox/WwwAttach.pm @@ -4,15 +4,16 @@ # For retrieving attachments from messages in the WWW interface package PublicInbox::WwwAttach; # internal package use strict; -use warnings; +use parent qw(PublicInbox::GzipFilter); use bytes (); # only for bytes::length use PublicInbox::EmlContentFoo qw(parse_content_type); use PublicInbox::Eml; sub get_attach_i { # ->each_part callback my ($part, $depth, $idx) = @{$_[0]}; - my $res = $_[1]; - return if $idx ne $res->[3]; # [0-9]+(?:\.[0-9]+)+ + my $ctx = $_[1]; + return if $idx ne $ctx->{idx}; # [0-9]+(?:\.[0-9]+)+ + my $res = $ctx->{res}; $res->[0] = 200; my $ct = $part->content_type; $ct = parse_content_type($ct) if $ct; @@ -23,24 +24,64 @@ sub get_attach_i { # ->each_part callback if ($cset && ($cset =~ /\A[a-zA-Z0-9_\-]+\z/)) { $res->[1]->[1] .= qq(; charset=$cset); } + $ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res->[1], + $ctx->{env}); + $part = $ctx->zflush($part->body); } else { # TODO: allow user to configure safe types $res->[1]->[1] = 'application/octet-stream'; + $part = $part->body; } - $part = $part->body; push @{$res->[1]}, 'Content-Length', bytes::length($part); $res->[2]->[0] = $part; } +sub async_eml { # ->{async_eml} for async_blob_cb + my ($ctx, $eml) = @_; + eval { $eml->each_part(\&get_attach_i, $ctx, 1) }; + if ($@) { + $ctx->{res}->[0] = 500; + warn "E: $@"; + } +} + +sub async_next { + my ($http) = @_; + my $ctx = $http->{forward} or return; # client aborted + # finally, we call the user-supplied callback + eval { $ctx->{wcb}->($ctx->{res}) }; + warn "E: $@" if $@; +} + +sub scan_attach ($) { # public-inbox-httpd only + my ($ctx) = @_; + $ctx->{env}->{'psgix.io'}->{forward} = $ctx; + $ctx->{async_eml} = \&async_eml; + $ctx->{async_next} = \&async_next; + $ctx->smsg_blob($ctx->{smsg}); +} + # /$LISTNAME/$MESSAGE_ID/$IDX-$FILENAME sub get_attach ($$$) { my ($ctx, $idx, $fn) = @_; - my $res = [ 404, [ 'Content-Type', 'text/plain' ], [ "Not found\n" ] ]; - my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res; - $mime = PublicInbox::Eml->new($mime); - $res->[3] = $idx; - $mime->each_part(\&get_attach_i, $res, 1); - pop @$res; # cleanup before letting PSGI server see it - $res + $ctx->{res} = [ 404, [ 'Content-Type' => 'text/plain' ], + [ "Not found\n" ] ]; + $ctx->{idx} = $idx; + bless $ctx, __PACKAGE__; + my $eml; + if ($ctx->{smsg} = $ctx->{-inbox}->smsg_by_mid($ctx->{mid})) { + return sub { # public-inbox-httpd-only + $ctx->{wcb} = $_[0]; + scan_attach($ctx); + } if $ctx->{env}->{'pi-httpd.async'}; + # generic PSGI: + $eml = $ctx->{-inbox}->smsg_eml($ctx->{smsg}); + } elsif (!$ctx->{-inbox}->over) { + if (my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid})) { + $eml = PublicInbox::Eml->new($bref); + } + } + $eml->each_part(\&get_attach_i, $ctx, 1) if $eml; + $ctx->{res} } 1; diff --git a/t/psgi_attach.t b/t/psgi_attach.t index 9a734f813..14d20adb1 100644 --- a/t/psgi_attach.t +++ b/t/psgi_attach.t @@ -5,9 +5,8 @@ use warnings; use Test::More; use PublicInbox::TestCommon; my ($tmpdir, $for_destroy) = tmpdir(); -my $maindir = "$tmpdir/main.git"; +my $inboxdir = "$tmpdir/main.git"; my $addr = 'test-public@example.com'; -my $cfgpfx = "publicinbox.test"; my @mods = qw(HTTP::Request::Common Plack::Builder Plack::Test URI::Escape); require_mods(@mods); use_ok $_ foreach @mods; @@ -17,85 +16,108 @@ use PublicInbox::Git; use PublicInbox::Config; use PublicInbox::Eml; use_ok 'PublicInbox::WwwAttach'; -my $config = PublicInbox::Config->new(\<<EOF); -$cfgpfx.address=$addr -$cfgpfx.inboxdir=$maindir + +my $cfgpath = "$tmpdir/config"; +open my $fh, '>', $cfgpath or BAIL_OUT $!; +print $fh <<EOF or BAIL_OUT $!; +[publicinbox "test"] + address = $addr + inboxdir = $inboxdir EOF -my $git = PublicInbox::Git->new($maindir); +close $fh or BAIL_OUT $!; +my $config = PublicInbox::Config->new($cfgpath); +my $git = PublicInbox::Git->new($inboxdir); my $im = PublicInbox::Import->new($git, 'test', $addr); $im->init_bare; -{ - my $qp = "abcdef=g\n==blah\n"; - my $b64 = "b64\xde\xad\xbe\xef\n"; - my $txt = "plain\ntext\npass\nthrough\n"; - my $dot = "dotfile\n"; - $im->add(eml_load('t/psgi_attach.eml')); - $im->add(eml_load('t/data/message_embed.eml')); - $im->done; +my $qp = "abcdef=g\n==blah\n"; +my $b64 = "b64\xde\xad\xbe\xef\n"; +my $txt = "plain\ntext\npass\nthrough\n"; +my $dot = "dotfile\n"; +$im->add(eml_load('t/psgi_attach.eml')); +$im->add(eml_load('t/data/message_embed.eml')); +$im->done; + +my $www = PublicInbox::WWW->new($config); +my $client = sub { + my ($cb) = @_; + my $res; + $res = $cb->(GET('/test/Z%40B/')); + my @href = ($res->content =~ /^href="([^"]+)"/gms); + @href = grep(/\A[\d\.]+-/, @href); + is_deeply([qw(1-queue-pee 2-bayce-sixty-four 3-noop.txt + 4-a.txt)], + \@href, 'attachment links generated'); + + $res = $cb->(GET('/test/Z%40B/1-queue-pee')); + my $qp_res = $res->content; + ok(length($qp_res) >= length($qp), 'QP length is close'); + like($qp_res, qr/\n\z/s, 'trailing newline exists'); + # is(index($qp_res, $qp), 0, 'QP trailing newline is there'); + $qp_res =~ s/\r\n/\n/g; + is(index($qp_res, $qp), 0, 'QP trailing newline is there'); + + $res = $cb->(GET('/test/Z%40B/2-base-sixty-four')); + is(quotemeta($res->content), quotemeta($b64), + 'Base64 matches exactly'); - my $www = PublicInbox::WWW->new($config); - test_psgi(sub { $www->call(@_) }, sub { - my ($cb) = @_; - my $res; - $res = $cb->(GET('/test/Z%40B/')); - my @href = ($res->content =~ /^href="([^"]+)"/gms); - @href = grep(/\A[\d\.]+-/, @href); - is_deeply([qw(1-queue-pee 2-bayce-sixty-four 3-noop.txt - 4-a.txt)], - \@href, 'attachment links generated'); + $res = $cb->(GET('/test/Z%40B/3-noop.txt')); + my $txt_res = $res->content; + ok(length($txt_res) >= length($txt), + 'plain text almost matches'); + like($txt_res, qr/\n\z/s, 'trailing newline exists in text'); + is(index($txt_res, $txt), 0, 'plain text not truncated'); - $res = $cb->(GET('/test/Z%40B/1-queue-pee')); - my $qp_res = $res->content; - ok(length($qp_res) >= length($qp), 'QP length is close'); - like($qp_res, qr/\n\z/s, 'trailing newline exists'); - # is(index($qp_res, $qp), 0, 'QP trailing newline is there'); - $qp_res =~ s/\r\n/\n/g; - is(index($qp_res, $qp), 0, 'QP trailing newline is there'); + $res = $cb->(GET('/test/Z%40B/4-a.txt')); + my $dot_res = $res->content; + ok(length($dot_res) >= length($dot), 'dot almost matches'); + $res = $cb->(GET('/test/Z%40B/4-any-filename.txt')); + is($res->content, $dot_res, 'user-specified filename is OK'); - $res = $cb->(GET('/test/Z%40B/2-base-sixty-four')); - is(quotemeta($res->content), quotemeta($b64), - 'Base64 matches exactly'); + my $mid = '20200418222508.GA13918@dcvr'; + my $irt = '20200418222020.GA2745@dcvr'; + $res = $cb->(GET("/test/$mid/")); + unlike($res->content, qr! multipart/mixed, Size: 0 bytes!, + '0-byte download not offered'); + like($res->content, qr/\bhref="2-embed2x\.eml"/s, + 'href to message/rfc822 attachment visible'); + like($res->content, qr/\bhref="2\.1\.2-test\.eml"/s, + 'href to nested message/rfc822 attachment visible'); - $res = $cb->(GET('/test/Z%40B/3-noop.txt')); - my $txt_res = $res->content; - ok(length($txt_res) >= length($txt), - 'plain text almost matches'); - like($txt_res, qr/\n\z/s, 'trailing newline exists in text'); - is(index($txt_res, $txt), 0, 'plain text not truncated'); + $res = $cb->(GET("/test/$mid/2-embed2x.eml")); + my $eml = PublicInbox::Eml->new(\($res->content)); + is_deeply([ $eml->header_raw('Message-ID') ], [ "<$irt>" ], + 'got attached eml'); + my @subs = $eml->subparts; + is(scalar(@subs), 2, 'attachment had 2 subparts'); + like($subs[0]->body_str, qr/^testing embedded message\n*\z/sm, + '1st attachment is as expected'); + is($subs[1]->header('Content-Type'), 'message/rfc822', + '2nd attachment is as expected'); - $res = $cb->(GET('/test/Z%40B/4-a.txt')); - my $dot_res = $res->content; - ok(length($dot_res) >= length($dot), 'dot almost matches'); - $res = $cb->(GET('/test/Z%40B/4-any-filename.txt')); - is($res->content, $dot_res, 'user-specified filename is OK'); + $res = $cb->(GET("/test/$mid/2.1.2-test.eml")); + $eml = PublicInbox::Eml->new(\($res->content)); + is_deeply([ $eml->header_raw('Message-ID') ], + [ '<20200418214114.7575-1-e@yhbt.net>' ], + 'nested eml retrieved'); +}; - my $mid = '20200418222508.GA13918@dcvr'; - my $irt = '20200418222020.GA2745@dcvr'; - $res = $cb->(GET("/test/$mid/")); - unlike($res->content, qr! multipart/mixed, Size: 0 bytes!, - '0-byte download not offered'); - like($res->content, qr/\bhref="2-embed2x\.eml"/s, - 'href to message/rfc822 attachment visible'); - like($res->content, qr/\bhref="2\.1\.2-test\.eml"/s, - 'href to nested message/rfc822 attachment visible'); +test_psgi(sub { $www->call(@_) }, $client); +SKIP: { + diag 'testing with index indexed'; + require_mods('DBD::SQLite', 19); + my $env = { PI_CONFIG => $cfgpath }; + ok(run_script(['-index', $inboxdir], $env), 'indexed'); - $res = $cb->(GET("/test/$mid/2-embed2x.eml")); - my $eml = PublicInbox::Eml->new(\($res->content)); - is_deeply([ $eml->header_raw('Message-ID') ], [ "<$irt>" ], - 'got attached eml'); - my @subs = $eml->subparts; - is(scalar(@subs), 2, 'attachment had 2 subparts'); - like($subs[0]->body_str, qr/^testing embedded message\n*\z/sm, - '1st attachment is as expected'); - is($subs[1]->header('Content-Type'), 'message/rfc822', - '2nd attachment is as expected'); + test_psgi(sub { $www->call(@_) }, $client); - $res = $cb->(GET("/test/$mid/2.1.2-test.eml")); - $eml = PublicInbox::Eml->new(\($res->content)); - is_deeply([ $eml->header_raw('Message-ID') ], - [ '<20200418214114.7575-1-e@yhbt.net>' ], - 'nested eml retrieved'); - }); + require_mods(qw(Plack::Test::ExternalServer), 18); + my $sock = tcp_server() or die; + my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err); + my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ]; + my $td = start_script($cmd, $env, { 3 => $sock }); + my ($h, $p) = ($sock->sockhost, $sock->sockport); + local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p"; + Plack::Test::ExternalServer::test_psgi(client => $client); } done_testing();
While all the {async_next} callbacks needed eval guards anyways because of DS->write, {async_eml} callbacks did not. Ensure any bugs in our code or data corruption result in termination of the HTTP connection, so as not to leave clients hanging on a response which never comes or is mangled in some way. --- lib/PublicInbox/GzipFilter.pm | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index d72ad3c88..57344604b 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -147,18 +147,34 @@ sub close { } } +sub bail { + my $self = shift; + if (my $env = $self->{env}) { + eval { $env->{'psgi.errors'}->print(@_, "\n") }; + warn("E: error printing to psgi.errors: $@", @_) if $@; + my $http = $env->{'psgix.io'} or return; # client abort + eval { $http->close }; # should hit our close + warn "E: error in http->close: $@" if $@; + eval { $self->close }; # just in case... + warn "E: error in self->close: $@" if $@; + } else { + warn @_, "\n"; + } +} + # this is public-inbox-httpd-specific sub async_blob_cb { # git->cat_async callback my ($bref, $oid, $type, $size, $self) = @_; my $http = $self->{env}->{'psgix.io'} or return; # client abort - my $smsg = $self->{smsg} or die 'BUG: no smsg'; + my $smsg = $self->{smsg} or bail($self, 'BUG: no smsg'); if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message return $http->next_step($self->{async_next}); } - $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; - $self->{async_eml}->($self, PublicInbox::Eml->new($bref)); + $smsg->{blob} eq $oid or bail($self, "BUG: $smsg->{blob} != $oid"); + eval { $self->{async_eml}->($self, PublicInbox::Eml->new($bref)) }; + bail($self, "E: async_eml: $@") if $@; $http->next_step($self->{async_next}); }
Since -edit and -purge should be rare and TOCTOU around them rarer still; missing {blobs} could be indicative of a real bug elsewhere. Warn on them. And I somehow ended up with 3 different field names for Inbox objects. Perhaps they'll be made consistent in the future. --- lib/PublicInbox/GzipFilter.pm | 1 + lib/PublicInbox/IMAP.pm | 1 + lib/PublicInbox/NNTP.pm | 1 + 3 files changed, 3 insertions(+) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index 57344604b..b5ad9eb88 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -170,6 +170,7 @@ sub async_blob_cb { # git->cat_async callback if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message + warn "E: $smsg->{blob} missing in $self->{-inbox}->{inboxdir}\n"; return $http->next_step($self->{async_next}); } $smsg->{blob} eq $oid or bail($self, "BUG: $smsg->{blob} != $oid"); diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm index e06021438..d8c898f4b 100644 --- a/lib/PublicInbox/IMAP.pm +++ b/lib/PublicInbox/IMAP.pm @@ -620,6 +620,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message + warn "E: $smsg->{blob} missing in $self->{ibx}->{inboxdir}\n"; return requeue_once($self); } else { $smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid"; diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm index 9d91544ab..87ddf7a4a 100644 --- a/lib/PublicInbox/NNTP.pm +++ b/lib/PublicInbox/NNTP.pm @@ -523,6 +523,7 @@ sub blob_cb { # called by git->cat_async via git_async_cat if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs # public-inbox-(edit|purge), just move onto the next message + warn "E: $smsg->{blob} missing in $self->{ng}->{inboxdir}\n"; return $self->requeue; } elsif ($smsg->{blob} ne $oid) { $self->close;
We actually don't do anything with {env} or {'psgix.io'} on client aborts, so checking the truthiness of '{forward}' is necessary. --- lib/PublicInbox/GzipFilter.pm | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm index b5ad9eb88..f1354d2b7 100644 --- a/lib/PublicInbox/GzipFilter.pm +++ b/lib/PublicInbox/GzipFilter.pm @@ -165,7 +165,8 @@ sub bail { # this is public-inbox-httpd-specific sub async_blob_cb { # git->cat_async callback my ($bref, $oid, $type, $size, $self) = @_; - my $http = $self->{env}->{'psgix.io'} or return; # client abort + my $http = $self->{env}->{'psgix.io'}; + $http->{forward} or return; # client aborted my $smsg = $self->{smsg} or bail($self, 'BUG: no smsg'); if (!defined($oid)) { # it's possible to have TOCTOU if an admin runs