unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 00/43] www: async git cat-file w/ -httpd
@ 2020-07-05 23:27 Eric Wong
  2020-07-05 23:27 ` [PATCH 01/43] gzipfilter: minor cleanups Eric Wong
                   ` (42 more replies)
  0 siblings, 43 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This allows -httpd to make better use of time it spends waiting
on git-cat-file to respond.  It allows us to deal with
high-latency HDD storage without a client monopolizing the event
loop.  Even on a mid-range consumer-grade SSD, this seems to
give a 10+% speed improvement for HTTP responses requiring many
blobs, including all /T/, /t/, and /t.mbox.gz endpoints.

This only benefits indexed inboxes (both v1 and v2); I'm not
sure if anybody still uses unindexed v1 inboxes nowadays.

A new xt/httpd-async-stream.t maintainer test ensures checksums
for responses before and after this series match exactly as
before.

This builds off a branch I started several months ago (but never
published here) to integrate gzip responses into our codebase
and remove our optional dependency on Plack::Middleware::Deflater.

We already gzip a bunch of things independent of
Plack::Middleware::Deflater: manifest.js.gz, altid SQLite3 dumps
and all the *.mbox.gz endpoints; so being able to use gzip on
all of our responses without an extra dependency seemed logical.

Being able to consistently use our GzipFilter API to perform
buffering via ->zmore made it significantly easier to reason
about small response chunks for ghost messages interspersed with
large ones when streaming /$INBOX/$MSGID/t/ endpoints.

I'm not yet maximizing use of ->zmore for all buffering of HTTP
responses, yet; measurements need to happen, first.  That may
happen in the 1.7 time frame.  In particular, we would need to
ensure the Perl method dispatch and DSO overhead to Zlib.so and
libz.so of making many ->zmore calls doesn't cause performance
regressions compared to the current `.=' use and calling
->zmore/->translate fewer times.

Eric Wong (43):
  gzipfilter: minor cleanups
  wwwstream: oneshot: perform gzip without middleware
  www*stream: gzip ->getline responses
  wwwtext: gzip text/plain responses, as well
  wwwtext: switch to html_oneshot
  www: need: use WwwStream::html_oneshot
  wwwlisting: use GzipFilter for HTML
  gzipfilter: replace Compress::Raw::Deflate usages
  {gzip,noop}filter: ->zmore returns undef, always
  mbox: remove html_oneshot import
  wwwstatic: support gzipped response directory listings
  qspawn: learn to gzip streaming responses
  stop auto-loading Plack::Middleware::Deflater
  mboxgz: do asynchronous git blob retrievals
  mboxgz: reduce object hash depth
  mbox: async blob retrieval for "single message" raw mboxrd
  wwwatomstream: simplify feed_update callers
  wwwatomstream: use PublicInbox::Inbox->modified for feed_updated
  wwwatomstream: reuse $ctx as $self
  xt/httpd-async-stream: allow more options
  wwwatomstream: support asynchronous blob retrievals
  wwwstream: reduce object graph depth
  wwwstream: reduce blob retrieval paths for ->getline
  www: start making gzipfilter the parent response class
  remove unused/redundant zlib-related imports
  wwwstream: use parent.pm and no warnings
  wwwstream: subclass off GzipFilter
  view: wire up /$INBOX/$MESSAGE_ID/ permalink to async
  view: /$INBOX/$MSGID/t/ reads blobs asynchronously
  view: update /$INBOX/$MSGID/T/ to be async
  feed: generate_i: eliminate pointless loop
  feed: /$INBOX/new.html retrieves blobs asynchronously
  ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs
  view: eml_entry: reduce parameters
  view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case
  wwwstream: eliminate ::response, use html_oneshot
  www: update internal docs
  view: simplify eml_entry callers further
  wwwtext: simplify gzf_maybe use
  wwwattach: support async blob retrievals
  gzipfilter: drop HTTP connection on bugs or data corruption
  daemon: warn on missing blobs
  gzipfilter: check http->{forward} for client disconnects

 Documentation/mknews.perl            |  20 +--
 Documentation/public-inbox-httpd.pod |   1 -
 Documentation/technical/ds.txt       |   4 +-
 INSTALL                              |   5 -
 MANIFEST                             |   2 +
 ci/deps.perl                         |   1 -
 examples/cgit.psgi                   |   8 -
 examples/newswww.psgi                |   8 -
 examples/public-inbox.psgi           |   9 --
 examples/unsubscribe.psgi            |   1 -
 lib/PublicInbox/CompressNoop.pm      |  22 +++
 lib/PublicInbox/Feed.pm              |  22 ++-
 lib/PublicInbox/GetlineBody.pm       |   4 +-
 lib/PublicInbox/GzipFilter.pm        | 168 +++++++++++++++++---
 lib/PublicInbox/HTTP.pm              |   7 +
 lib/PublicInbox/HTTPD.pm             |   5 +-
 lib/PublicInbox/IMAP.pm              |   1 +
 lib/PublicInbox/Mbox.pm              | 137 +++++++++--------
 lib/PublicInbox/MboxGz.pm            |  81 ++++------
 lib/PublicInbox/NNTP.pm              |   1 +
 lib/PublicInbox/Qspawn.pm            |   6 +-
 lib/PublicInbox/SearchView.pm        |  40 ++---
 lib/PublicInbox/View.pm              | 219 ++++++++++++++-------------
 lib/PublicInbox/WWW.pm               |   9 +-
 lib/PublicInbox/WwwAtomStream.pm     |  66 ++++----
 lib/PublicInbox/WwwAttach.pm         |  63 ++++++--
 lib/PublicInbox/WwwListing.pm        |  24 +--
 lib/PublicInbox/WwwStatic.pm         |  14 +-
 lib/PublicInbox/WwwStream.pm         | 110 ++++++++------
 lib/PublicInbox/WwwText.pm           |  26 ++--
 script/public-inbox-httpd            |   9 --
 script/public-inbox.cgi              |   7 -
 t/httpd-corner.psgi                  |   7 +
 t/httpd-corner.t                     |   9 +-
 t/plack.t                            |   4 +
 t/psgi_attach.t                      | 162 +++++++++++---------
 t/psgi_text.t                        |  33 +++-
 t/psgi_v2.t                          |  80 ++++++++--
 t/www_listing.t                      |   8 +-
 t/www_static.t                       |  11 +-
 xt/httpd-async-stream.t              | 104 +++++++++++++
 41 files changed, 964 insertions(+), 554 deletions(-)
 create mode 100644 lib/PublicInbox/CompressNoop.pm
 create mode 100644 xt/httpd-async-stream.t

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 01/43] gzipfilter: minor cleanups
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware Eric Wong
                   ` (41 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We currently don't use bytes::length in ->write, so there's no
need to `use bytes'.  Favor `//=' to describe the intent of the
conditional assignment since the C::R::Z::Deflate object is
always truthy.  Also use the local $gz variable to avoid
unnecessary {gz} hash lookups.
---
 lib/PublicInbox/GzipFilter.pm | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 864095862..a7355a8df 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -4,7 +4,6 @@
 # Qspawn filter
 package PublicInbox::GzipFilter;
 use strict;
-use bytes (); # length
 use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 
@@ -24,21 +23,21 @@ sub translate ($$) {
 	# allocate the zlib context lazily here, instead of in ->new.
 	# Deflate contexts are memory-intensive and this object may
 	# be sitting in the Qspawn limiter queue for a while.
-	my $gz = $self->{gz} ||= do {
+	my $gz = $self->{gz} //= do {
 		my ($g, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
 		$err == Z_OK or die "Deflate->new failed: $err";
 		$g;
 	};
 	my $zbuf = delete($self->{zbuf});
 	if (defined $_[1]) { # my $buf = $_[1];
-		my $err = $self->{gz}->deflate($_[1], $zbuf);
+		my $err = $gz->deflate($_[1], $zbuf);
 		die "gzip->deflate: $err" if $err != Z_OK;
 		return $zbuf if length($zbuf) >= 8192;
 
 		$self->{zbuf} = $zbuf;
 		'';
 	} else { # undef == EOF
-		my $err = $self->{gz}->flush($zbuf, Z_FINISH);
+		my $err = $gz->flush($zbuf, Z_FINISH);
 		die "gzip->flush: $err" if $err != Z_OK;
 		$zbuf;
 	}

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
  2020-07-05 23:27 ` [PATCH 01/43] gzipfilter: minor cleanups Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 03/43] www*stream: gzip ->getline responses Eric Wong
                   ` (40 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Plack::Middleware::Deflater forces us to use a memory-intensive
closure.  Instead, work towards building compressed strings in
memory to reduce the overhead of buffering large HTML output.
---
 lib/PublicInbox/GzipFilter.pm | 13 +++++++++++++
 lib/PublicInbox/WwwStream.pm  | 27 ++++++++++++++++++++++-----
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index a7355a8df..115660cb1 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -4,7 +4,9 @@
 # Qspawn filter
 package PublicInbox::GzipFilter;
 use strict;
+use parent qw(Exporter);
 use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+our @EXPORT_OK = qw(gzip_maybe);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 
 sub new { bless {}, shift }
@@ -16,6 +18,17 @@ sub attach {
 	$self
 }
 
+sub gzip_maybe ($) {
+	my ($env) = @_;
+	return if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/;
+
+	# in case Plack::Middleware::Deflater is loaded:
+	$env->{'plack.skip-deflater'} = 1;
+
+	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
+	$err == Z_OK ? $gz : undef;
+}
+
 # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'}
 sub translate ($$) {
 	my $self = $_[0];
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 915a71ba0..79ed6871e 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -13,6 +13,8 @@ use base qw(Exporter);
 our @EXPORT_OK = qw(html_oneshot);
 use bytes (); # length
 use PublicInbox::Hval qw(ascii_html prurl);
+use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+use PublicInbox::GzipFilter qw(gzip_maybe);
 our $TOR_URL = 'https://www.torproject.org/';
 our $CODE_URL = 'https://public-inbox.org/public-inbox.git';
 
@@ -178,13 +180,28 @@ sub html_oneshot ($$;$) {
 		ctx => $ctx,
 		base_url => base_url($ctx),
 	}, __PACKAGE__;
-	my @x = (_html_top($self), $sref ? $$sref : (), _html_end($self));
+	my @x;
+	my @h = ('Content-Type' => 'text/html; charset=UTF-8');
+	if (my $gz = gzip_maybe($ctx->{env})) {
+		my $err = $gz->deflate(_html_top($self), $x[0]);
+		die "gzip->deflate: $err" if $err != Z_OK;
+		if ($sref) {
+			$err = $gz->deflate($sref, $x[0]);
+			die "gzip->deflate: $err" if $err != Z_OK;
+		}
+		$err = $gz->deflate(_html_end($self), $x[0]);
+		die "gzip->deflate: $err" if $err != Z_OK;
+		$err = $gz->flush($x[0], Z_FINISH);
+		die "gzip->flush: $err" if $err != Z_OK;
+		push @h, qw(Vary Accept-Encoding Content-Encoding gzip);
+	} else {
+		@x = (_html_top($self), $sref ? $$sref : (), _html_end($self));
+	}
+
 	my $len = 0;
 	$len += bytes::length($_) for @x;
-	[ $code, [
-		'Content-Type' => 'text/html; charset=UTF-8',
-		'Content-Length' => $len
-	], \@x ];
+	push @h, 'Content-Length', $len;
+	[ $code, \@h, \@x ]
 }
 
 1;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 03/43] www*stream: gzip ->getline responses
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
  2020-07-05 23:27 ` [PATCH 01/43] gzipfilter: minor cleanups Eric Wong
  2020-07-05 23:27 ` [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 04/43] wwwtext: gzip text/plain responses, as well Eric Wong
                   ` (39 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Our most common endpoints deserve to be gzipped.
---
 lib/PublicInbox/GzipFilter.pm    | 21 +++++++++++++++-----
 lib/PublicInbox/WwwAtomStream.pm | 25 ++++++++++++++++-------
 lib/PublicInbox/WwwStream.pm     | 34 ++++++++++++++++++++------------
 3 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 115660cb1..95fced053 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -6,8 +6,9 @@ package PublicInbox::GzipFilter;
 use strict;
 use parent qw(Exporter);
 use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
-our @EXPORT_OK = qw(gzip_maybe);
+our @EXPORT_OK = qw(gzip_maybe gzf_maybe);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
+my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip);
 
 sub new { bless {}, shift }
 
@@ -18,18 +19,28 @@ sub attach {
 	$self
 }
 
-sub gzip_maybe ($) {
-	my ($env) = @_;
+sub gzip_maybe ($$) {
+	my ($res_hdr, $env) = @_;
 	return if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/;
 
+	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
+	return if $err != Z_OK;
+
 	# in case Plack::Middleware::Deflater is loaded:
 	$env->{'plack.skip-deflater'} = 1;
 
-	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
-	$err == Z_OK ? $gz : undef;
+	push @$res_hdr, @GZIP_HDRS;
+	$gz;
+}
+
+sub gzf_maybe ($$) {
+	my ($res_hdr, $env) = @_;
+	my $gz = gzip_maybe($res_hdr, $env) or return 0;
+	bless { gz => $gz }, __PACKAGE__;
 }
 
 # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'}
+# Also used for ->getline callbacks
 sub translate ($$) {
 	my $self = $_[0];
 
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 9dc24e16e..c407e343f 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -14,6 +14,7 @@ use Digest::SHA qw(sha1_hex);
 use PublicInbox::Address;
 use PublicInbox::Hval qw(ascii_html mid_href);
 use PublicInbox::MsgTime qw(msg_timestamp);
+use PublicInbox::GzipFilter qw(gzf_maybe);
 
 # called by PSGI server after getline:
 sub close {}
@@ -26,18 +27,28 @@ sub new {
 
 sub response {
 	my ($class, $ctx, $code, $cb) = @_;
-	[ $code, [ 'Content-Type', 'application/atom+xml' ],
-	  $class->new($ctx, $cb) ]
+	my $h = [ 'Content-Type' => 'application/atom+xml' ];
+	my $self = $class->new($ctx, $cb);
+	$self->{gzf} = gzf_maybe($h, $ctx->{env});
+	[ $code, $h, $self ]
 }
 
 # called once for each message by PSGI server
 sub getline {
 	my ($self) = @_;
-	if (my $middle = $self->{cb}) {
-		my $smsg = $middle->($self->{ctx});
-		return feed_entry($self, $smsg) if $smsg;
-	}
-	delete $self->{cb} ? '</feed>' : undef;
+	my $buf = do {
+		if (my $middle = $self->{cb}) {
+			my $smsg = $middle->($self->{ctx});
+			feed_entry($self, $smsg) if $smsg;
+		}
+	} // (delete($self->{cb}) ? '</feed>' : undef);
+
+	# gzf may be GzipFilter, `undef' or `0'
+	my $gzf = $self->{gzf} or return $buf;
+
+	return $gzf->translate($buf) if defined $buf;
+	$self->{gzf} = 0; # next call to ->getline returns $buf (== undef)
+	$gzf->translate(undef);
 }
 
 # private
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 79ed6871e..c964dbd41 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -14,7 +14,7 @@ our @EXPORT_OK = qw(html_oneshot);
 use bytes (); # length
 use PublicInbox::Hval qw(ascii_html prurl);
 use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
-use PublicInbox::GzipFilter qw(gzip_maybe);
+use PublicInbox::GzipFilter qw(gzip_maybe gzf_maybe);
 our $TOR_URL = 'https://www.torproject.org/';
 our $CODE_URL = 'https://public-inbox.org/public-inbox.git';
 
@@ -41,8 +41,10 @@ sub new {
 
 sub response {
 	my ($class, $ctx, $code, $cb) = @_;
-	[ $code, [ 'Content-Type', 'text/html; charset=UTF-8' ],
-	  $class->new($ctx, $cb) ]
+	my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ];
+	my $self = $class->new($ctx, $cb);
+	$self->{gzf} = gzf_maybe($h, $ctx->{env});
+	[ $code, $h, $self ]
 }
 
 sub _html_top ($) {
@@ -165,13 +167,20 @@ sub getline {
 	my ($self) = @_;
 	my $nr = $self->{nr}++;
 
-	return _html_top($self) if $nr == 0;
+	my $buf = do {
+		if ($nr == 0) {
+			_html_top($self);
+		} elsif (my $middle = $self->{cb}) {
+			$middle->($nr, $self->{ctx});
+		}
+	} // (delete($self->{cb}) ? _html_end($self) : undef);
 
-	if (my $middle = $self->{cb}) {
-		$middle = $middle->($nr, $self->{ctx}) and return $middle;
-	}
+	# gzf may be GzipFilter, `undef' or `0'
+	my $gzf = $self->{gzf} or return $buf;
 
-	delete $self->{cb} ? _html_end($self) : undef;
+	return $gzf->translate($buf) if defined $buf;
+	$self->{gzf} = 0; # next call to ->getline returns $buf (== undef)
+	$gzf->translate(undef);
 }
 
 sub html_oneshot ($$;$) {
@@ -181,8 +190,8 @@ sub html_oneshot ($$;$) {
 		base_url => base_url($ctx),
 	}, __PACKAGE__;
 	my @x;
-	my @h = ('Content-Type' => 'text/html; charset=UTF-8');
-	if (my $gz = gzip_maybe($ctx->{env})) {
+	my $h = [ 'Content-Type' => 'text/html; charset=UTF-8' ];
+	if (my $gz = gzip_maybe($h, $ctx->{env})) {
 		my $err = $gz->deflate(_html_top($self), $x[0]);
 		die "gzip->deflate: $err" if $err != Z_OK;
 		if ($sref) {
@@ -193,15 +202,14 @@ sub html_oneshot ($$;$) {
 		die "gzip->deflate: $err" if $err != Z_OK;
 		$err = $gz->flush($x[0], Z_FINISH);
 		die "gzip->flush: $err" if $err != Z_OK;
-		push @h, qw(Vary Accept-Encoding Content-Encoding gzip);
 	} else {
 		@x = (_html_top($self), $sref ? $$sref : (), _html_end($self));
 	}
 
 	my $len = 0;
 	$len += bytes::length($_) for @x;
-	push @h, 'Content-Length', $len;
-	[ $code, \@h, \@x ]
+	push @$h, 'Content-Length', $len;
+	[ $code, $h, \@x ]
 }
 
 1;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 04/43] wwwtext: gzip text/plain responses, as well
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (2 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 03/43] www*stream: gzip ->getline responses Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 05/43] wwwtext: switch to html_oneshot Eric Wong
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Most of our plain-text responses are config files
big enough to warrant compression.
---
 lib/PublicInbox/WwwText.pm | 17 ++++++++++++++---
 t/psgi_text.t              | 33 ++++++++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm
index b23a415e4..508005fba 100644
--- a/lib/PublicInbox/WwwText.pm
+++ b/lib/PublicInbox/WwwText.pm
@@ -10,6 +10,8 @@ use PublicInbox::Linkify;
 use PublicInbox::WwwStream;
 use PublicInbox::Hval qw(ascii_html);
 use URI::Escape qw(uri_escape_utf8);
+use PublicInbox::GzipFilter qw(gzf_maybe);
+use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
 our $QP_URL = 'https://xapian.org/docs/queryparser.html';
 our $WIKI_URL = 'https://en.wikipedia.org/wiki';
 my $hl = eval {
@@ -35,14 +37,23 @@ sub get_text {
 		$code = 404;
 		$txt = "404 Not Found ($key)\n";
 	}
+	my $env = $ctx->{env};
 	if ($raw) {
-		$hdr->[3] = bytes::length($txt);
-		return [ $code, $hdr, [ $txt ] ]
+		my $body;
+		if (my $gzf = $code == 200 ? gzf_maybe($hdr, $env) : undef) {
+			my $zbuf = $gzf->translate($txt);
+			undef $txt;
+			$body = [ $zbuf .= $gzf->translate(undef) ];
+		} else {
+			$body = [ $txt ];
+		}
+		$hdr->[3] = bytes::length($body->[0]);
+		return [ $code, $hdr, $body ]
 	}
 
 	# enforce trailing slash for "wget -r" compatibility
 	if (!$have_tslash && $code == 200) {
-		my $url = $ctx->{-inbox}->base_url($ctx->{env});
+		my $url = $ctx->{-inbox}->base_url($env);
 		$url .= "_/text/$key/";
 
 		return [ 302, [ 'Content-Type', 'text/plain',
diff --git a/t/psgi_text.t b/t/psgi_text.t
index 833bcaba7..9867feaa4 100644
--- a/t/psgi_text.t
+++ b/t/psgi_text.t
@@ -10,7 +10,7 @@ my $maindir = "$tmpdir/main.git";
 my $addr = 'test-public@example.com';
 my $cfgpfx = "publicinbox.test";
 my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape Plack::Builder);
-require_mods(@mods);
+require_mods(@mods, 'IO::Uncompress::Gunzip');
 use_ok $_ foreach @mods;
 use PublicInbox::Import;
 use PublicInbox::Git;
@@ -26,17 +26,36 @@ my $www = PublicInbox::WWW->new($config);
 
 test_psgi(sub { $www->call(@_) }, sub {
 	my ($cb) = @_;
-	my $res;
-	$res = $cb->(GET('/test/_/text/help/'));
-	like($res->content, qr!<title>public-inbox help.*</title>!,
-		'default help');
-	$res = $cb->(GET('/test/_/text/config/raw'));
+	my $gunzipped;
+	my $req = GET('/test/_/text/help/');
+	my $res = $cb->($req);
+	my $content = $res->content;
+	like($content, qr!<title>public-inbox help.*</title>!, 'default help');
+	$req->header('Accept-Encoding' => 'gzip');
+	$res = $cb->($req);
+	is($res->header('Content-Encoding'), 'gzip', 'got gzip encoding');
+	is($res->header('Content-Type'), 'text/html; charset=UTF-8',
+		'got gzipped HTML');
+	IO::Uncompress::Gunzip::gunzip(\($res->content) => \$gunzipped);
+	is($gunzipped, $content, 'gzipped content is correct');
+
+	$req = GET('/test/_/text/config/raw');
+	$res = $cb->($req);
+	$content = $res->content;
+	my $olen = $res->header('Content-Length');
 	my $f = "$tmpdir/cfg";
 	open my $fh, '>', $f or die;
-	print $fh $res->content or die;
+	print $fh $content or die;
 	close $fh or die;
 	my $cfg = PublicInbox::Config->new($f);
 	is($cfg->{"$cfgpfx.address"}, $addr, 'got expected address in config');
+
+	$req->header('Accept-Encoding' => 'gzip');
+	$res = $cb->($req);
+	is($res->header('Content-Encoding'), 'gzip', 'got gzip encoding');
+	ok($res->header('Content-Length') < $olen, 'gzipped help is smaller');
+	IO::Uncompress::Gunzip::gunzip(\($res->content) => \$gunzipped);
+	is($gunzipped, $content);
 });
 
 done_testing();

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 05/43] wwwtext: switch to html_oneshot
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (3 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 04/43] wwwtext: gzip text/plain responses, as well Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 06/43] www: need: use WwwStream::html_oneshot Eric Wong
                   ` (37 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

No point in streaming a tiny response via ->getline,
but we may stream to a gzipped buffer, later.
---
 lib/PublicInbox/WwwText.pm | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm
index 508005fba..ea50428e3 100644
--- a/lib/PublicInbox/WwwText.pm
+++ b/lib/PublicInbox/WwwText.pm
@@ -64,25 +64,18 @@ sub get_text {
 	# Follow git commit message conventions,
 	# first line is the Subject/title
 	my ($title) = ($txt =~ /\A([^\n]*)/s);
-	$ctx->{txt} = \$txt;
 	$ctx->{-title_html} = ascii_html($title);
 	my $nslash = ($key =~ tr!/!/!);
 	$ctx->{-upfx} = '../../../' . ('../' x $nslash);
-	PublicInbox::WwwStream->response($ctx, $code, \&_do_linkify);
-}
-
-sub _do_linkify {
-	my ($nr, $ctx) = @_;
-	return unless $nr == 1;
 	my $l = PublicInbox::Linkify->new;
-	my $txt = delete $ctx->{txt};
-	$l->linkify_1($$txt);
+	$l->linkify_1($txt);
 	if ($hl) {
-		$hl->do_hl_text($txt);
+		$hl->do_hl_text(\$txt);
 	} else {
-		$$txt = ascii_html($$txt);
+		$txt = ascii_html($txt);
 	}
-	'<pre>' . $l->linkify_2($$txt) . '</pre>';
+	$txt = '<pre>' . $l->linkify_2($txt) . '</pre>';
+	PublicInbox::WwwStream::html_oneshot($ctx, $code, \$txt);
 }
 
 sub _srch_prefix ($$) {

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 06/43] www: need: use WwwStream::html_oneshot
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (4 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 05/43] wwwtext: switch to html_oneshot Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 07/43] wwwlisting: use GzipFilter for HTML Eric Wong
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

It'll give us a nicer HTML header and footer.
---
 lib/PublicInbox/WWW.pm | 9 ++++-----
 t/plack.t              | 4 ++++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 289d0ce48..e4ad515a4 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -322,12 +322,11 @@ sub get_altid_dump {
 
 sub need {
 	my ($ctx, $extra) = @_;
-	my $msg = <<EOF;
-<html><head><title>$extra not available for this
-public-inbox</title><body><pre>$extra is not available for this public-inbox
-<a href="../">Return to index</a></pre></body></html>
+	require PublicInbox::WwwStream;
+	PublicInbox::WwwStream::html_oneshot($ctx, 501, \<<EOF);
+<pre>$extra is not available for this public-inbox
+<a\nhref="../">Return to index</a></pre>
 EOF
-	[ 501, [ 'Content-Type' => 'text/html; charset=UTF-8' ], [ $msg ] ];
 }
 
 # /$INBOX/$MESSAGE_ID/t.mbox           -> thread as mbox
diff --git a/t/plack.t b/t/plack.t
index 37a6b3948..4b830a21e 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -209,6 +209,10 @@ test_psgi($app, sub {
 	my $res = $cb->(GET($pfx . '/blah@example.com/raw'));
 	is(200, $res->code, 'success response received for /*/raw');
 	like($res->content, qr!^From !sm, "mbox returned");
+
+	$res = $cb->(GET($pfx . '/blah@example.com/t.mbox.gz'));
+	is(501, $res->code, '501 when overview missing');
+	like($res->content, qr!\bOverview\b!, 'overview omission noted');
 });
 
 # legacy redirects

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 07/43] wwwlisting: use GzipFilter for HTML
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (5 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 06/43] www: need: use WwwStream::html_oneshot Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 08/43] gzipfilter: replace Compress::Raw::Deflate usages Eric Wong
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

The changes to GzipFilter here may be beneficial for building
HTML and XML responses in other places, too.
---
 MANIFEST                      |  1 +
 lib/PublicInbox/GzipFilter.pm | 28 ++++++++++++++++++++++++++--
 lib/PublicInbox/NoopFilter.pm | 13 +++++++++++++
 lib/PublicInbox/WwwListing.pm | 25 +++++++++++++++----------
 t/www_listing.t               |  8 +++++++-
 5 files changed, 62 insertions(+), 13 deletions(-)
 create mode 100644 lib/PublicInbox/NoopFilter.pm

diff --git a/MANIFEST b/MANIFEST
index 6de2c7258..dcd7a7e5f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -159,6 +159,7 @@ lib/PublicInbox/NNTP.pm
 lib/PublicInbox/NNTPD.pm
 lib/PublicInbox/NNTPdeflate.pm
 lib/PublicInbox/NewsWWW.pm
+lib/PublicInbox/NoopFilter.pm
 lib/PublicInbox/Over.pm
 lib/PublicInbox/OverIdx.pm
 lib/PublicInbox/ParentPipe.pm
diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 95fced053..8cc5ea00b 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -42,7 +42,7 @@ sub gzf_maybe ($$) {
 # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'}
 # Also used for ->getline callbacks
 sub translate ($$) {
-	my $self = $_[0];
+	my $self = $_[0]; # $_[1] => input
 
 	# allocate the zlib context lazily here, instead of in ->new.
 	# Deflate contexts are memory-intensive and this object may
@@ -72,10 +72,34 @@ sub write {
 	$_[0]->{fh}->write(translate($_[0], $_[1]));
 }
 
+# similar to ->translate; use this when we're sure we know we have
+# more data to buffer after this
+sub zmore {
+	my $self = $_[0]; # $_[1] => input
+	my $err = $self->{gz}->deflate($_[1], $self->{zbuf});
+	die "gzip->deflate: $err" if $err != Z_OK;
+	'';
+}
+
+# flushes and returns the final bit of gzipped data
+sub zflush ($;$) {
+	my $self = $_[0]; # $_[1] => final input (optional)
+	my $zbuf = delete $self->{zbuf};
+	my $gz = delete $self->{gz};
+	my $err;
+	if (defined $_[1]) {
+		$err = $gz->deflate($_[1], $zbuf);
+		die "gzip->deflate: $err" if $err != Z_OK;
+	}
+	$err = $gz->flush($zbuf, Z_FINISH);
+	die "gzip->flush: $err" if $err != Z_OK;
+	$zbuf;
+}
+
 sub close {
 	my ($self) = @_;
 	my $fh = delete $self->{fh};
-	$fh->write(translate($self, undef));
+	$fh->write(zflush($self));
 	$fh->close;
 }
 
diff --git a/lib/PublicInbox/NoopFilter.pm b/lib/PublicInbox/NoopFilter.pm
new file mode 100644
index 000000000..b9c00ff7a
--- /dev/null
+++ b/lib/PublicInbox/NoopFilter.pm
@@ -0,0 +1,13 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+package PublicInbox::NoopFilter;
+use strict;
+
+sub new { bless \(my $ignore), __PACKAGE__ }
+
+# noop workalike for PublicInbox::GzipFilter methods
+sub translate { $_[1] // '' }
+sub zmore { $_[1] }
+sub zflush { $_[1] // '' }
+1;
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index a3d4e2b35..780c97e91 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -10,6 +10,8 @@ use PublicInbox::Hval qw(ascii_html prurl);
 use PublicInbox::Linkify;
 use PublicInbox::View;
 use PublicInbox::Inbox;
+use PublicInbox::NoopFilter;
+use PublicInbox::GzipFilter qw(gzf_maybe);
 use bytes (); # bytes::length
 use HTTP::Date qw(time2str);
 use Digest::SHA ();
@@ -104,13 +106,15 @@ sub ibx_entry {
 
 sub html ($$) {
 	my ($env, $list) = @_;
-	my $title = 'public-inbox';
-	my $out = '';
+	my $h = [ 'Content-Type', 'text/html; charset=UTF-8',
+			'Content-Length', undef ];
+	my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new();
+	my $out = $gzf->zmore('<html><head><title>' .
+				'public-inbox listing</title>' .
+				'</head><body><pre>');
 	my $code = 404;
 	if (@$list) {
-		$title .= ' - listing';
 		$code = 200;
-
 		# Schwartzian transform since Inbox->modified is expensive
 		@$list = sort {
 			$b->[0] <=> $a->[0]
@@ -118,13 +122,14 @@ sub html ($$) {
 
 		my $tmp = join("\n", map { ibx_entry(@$_, $env) } @$list);
 		my $l = PublicInbox::Linkify->new;
-		$out = '<pre>'.$l->to_html($tmp).'</pre><hr>';
+		$out .= $gzf->zmore($l->to_html($tmp));
+	} else {
+		$out .= $gzf->zmore('no inboxes, yet');
 	}
-	$out = "<html><head><title>$title</title></head><body>" . $out;
-	$out .= '<pre>'. PublicInbox::WwwStream::code_footer($env) .
-		'</pre></body></html>';
-
-	my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ];
+	$out .= $gzf->zflush('</pre><hr><pre>'.
+				PublicInbox::WwwStream::code_footer($env) .
+				'</pre></body></html>');
+	$h->[3] = bytes::length($out);
 	[ $code, $h, [ $out ] ];
 }
 
diff --git a/t/www_listing.t b/t/www_listing.t
index 0aededd43..c4511cd1f 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -35,13 +35,19 @@ like(PublicInbox::WwwListing::fingerprint($bare), qr/\A[a-f0-9]{40}\z/,
 
 sub tiny_test {
 	my ($json, $host, $port) = @_;
+	my $tmp;
 	my $http = HTTP::Tiny->new;
 	my $res = $http->get("http://$host:$port/");
 	is($res->{status}, 200, 'got HTML listing');
 	like($res->{content}, qr!</html>!si, 'listing looks like HTML');
+
+	$res = $http->get("http://$host:$port/", {'Accept-Encoding'=>'gzip'});
+	is($res->{status}, 200, 'got gzipped HTML listing');
+	IO::Uncompress::Gunzip::gunzip(\(delete $res->{content}) => \$tmp);
+	like($tmp, qr!</html>!si, 'unzipped listing looks like HTML');
+
 	$res = $http->get("http://$host:$port/manifest.js.gz");
 	is($res->{status}, 200, 'got manifest');
-	my $tmp;
 	IO::Uncompress::Gunzip::gunzip(\(delete $res->{content}) => \$tmp);
 	unlike($tmp, qr/"modified":\s*"/, 'modified is an integer');
 	my $manifest = $json->decode($tmp);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 08/43] gzipfilter: replace Compress::Raw::Deflate usages
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (6 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 07/43] wwwlisting: use GzipFilter for HTML Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 09/43] {gzip,noop}filter: ->zmore returns undef, always Eric Wong
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

The new ->zmore and ->zflush APIs make it possible to replace
existing verbose usages of Compress::Raw::Deflate and simplify
buffering logic for streaming large gzipped data.

One potentially user visible change is we now break the mbox.gz
response on zlib failures, instead of silently continuing onto
the next message.  zlib only seems to fail on OOM, which should
be rare; so it's ideal we drop the connection anyways.
---
 lib/PublicInbox/GzipFilter.pm | 27 ++++++++++-------------
 lib/PublicInbox/MboxGz.pm     | 41 ++++++++++-------------------------
 lib/PublicInbox/WwwStream.pm  | 27 ++++++++---------------
 3 files changed, 31 insertions(+), 64 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 8cc5ea00b..d2eb4e664 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -6,7 +6,7 @@ package PublicInbox::GzipFilter;
 use strict;
 use parent qw(Exporter);
 use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
-our @EXPORT_OK = qw(gzip_maybe gzf_maybe);
+our @EXPORT_OK = qw(gzf_maybe);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip);
 
@@ -19,24 +19,23 @@ sub attach {
 	$self
 }
 
-sub gzip_maybe ($$) {
+# returns `0' and not `undef' on failure (see Www*Stream)
+sub gzf_maybe ($$) {
 	my ($res_hdr, $env) = @_;
-	return if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/;
-
+	return 0 if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/;
 	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
-	return if $err != Z_OK;
+	return 0 if $err != Z_OK;
 
 	# in case Plack::Middleware::Deflater is loaded:
 	$env->{'plack.skip-deflater'} = 1;
-
 	push @$res_hdr, @GZIP_HDRS;
-	$gz;
+	bless { gz => $gz }, __PACKAGE__;
 }
 
-sub gzf_maybe ($$) {
-	my ($res_hdr, $env) = @_;
-	my $gz = gzip_maybe($res_hdr, $env) or return 0;
-	bless { gz => $gz }, __PACKAGE__;
+sub gzip_or_die () {
+	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
+	$err == Z_OK or die "Deflate->new failed: $err";
+	$gz;
 }
 
 # for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'}
@@ -47,11 +46,7 @@ sub translate ($$) {
 	# allocate the zlib context lazily here, instead of in ->new.
 	# Deflate contexts are memory-intensive and this object may
 	# be sitting in the Qspawn limiter queue for a while.
-	my $gz = $self->{gz} //= do {
-		my ($g, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
-		$err == Z_OK or die "Deflate->new failed: $err";
-		$g;
-	};
+	my $gz = $self->{gz} //= gzip_or_die();
 	my $zbuf = delete($self->{zbuf});
 	if (defined $_[1]) { # my $buf = $_[1];
 		my $err = $gz->deflate($_[1], $zbuf);
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index f7fc4afc1..535ef96c9 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -2,19 +2,19 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 package PublicInbox::MboxGz;
 use strict;
-use warnings;
+use parent 'PublicInbox::GzipFilter';
 use PublicInbox::Eml;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Mbox;
-use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
-my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
 	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
-	$err == Z_OK or die "Deflate->new failed: $err";
-	bless { gz => $gz, cb => $cb, ctx => $ctx }, $class;
+	bless {
+		gz => PublicInbox::GzipFilter::gzip_or_die(),
+		cb => $cb,
+		ctx => $ctx
+	}, $class;
 }
 
 sub response {
@@ -27,40 +27,21 @@ sub response {
 	[ 200, $h, $body ];
 }
 
-sub gzip_fail ($$) {
-	my ($ctx, $err) = @_;
-	$ctx->{env}->{'psgi.errors'}->print("deflate failed: $err\n");
-	'';
-}
-
 # called by Plack::Util::foreach or similar
 sub getline {
 	my ($self) = @_;
 	my $ctx = $self->{ctx} or return;
-	my $gz = $self->{gz};
-	my $buf = delete($self->{buf});
 	while (my $smsg = $self->{cb}->($ctx)) {
 		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
 		my $h = PublicInbox::Eml->new($mref)->header_obj;
-
-		my $err = $gz->deflate(
-			PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}),
-		        $buf);
-		return gzip_fail($ctx, $err) if $err != Z_OK;
-
-		$err = $gz->deflate(PublicInbox::Mbox::msg_body($$mref), $buf);
-		return gzip_fail($ctx, $err) if $err != Z_OK;
-
-		return $buf if length($buf) >= 8192;
-
-		# be fair to other clients on public-inbox-httpd:
-		$self->{buf} = $buf;
-		return '';
+		$self->zmore(
+			PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid})
+		);
+		return $self->translate(PublicInbox::Mbox::msg_body($$mref));
 	}
 	# signal that we're done and can return undef next call:
 	delete $self->{ctx};
-	my $err = $gz->flush($buf, Z_FINISH);
-	($err == Z_OK) ? $buf : gzip_fail($ctx, $err);
+	$self->zflush;
 }
 
 sub close {} # noop
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index c964dbd41..8623440b8 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -13,8 +13,7 @@ use base qw(Exporter);
 our @EXPORT_OK = qw(html_oneshot);
 use bytes (); # length
 use PublicInbox::Hval qw(ascii_html prurl);
-use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
-use PublicInbox::GzipFilter qw(gzip_maybe gzf_maybe);
+use PublicInbox::GzipFilter qw(gzf_maybe);
 our $TOR_URL = 'https://www.torproject.org/';
 our $CODE_URL = 'https://public-inbox.org/public-inbox.git';
 
@@ -190,25 +189,17 @@ sub html_oneshot ($$;$) {
 		base_url => base_url($ctx),
 	}, __PACKAGE__;
 	my @x;
-	my $h = [ 'Content-Type' => 'text/html; charset=UTF-8' ];
-	if (my $gz = gzip_maybe($h, $ctx->{env})) {
-		my $err = $gz->deflate(_html_top($self), $x[0]);
-		die "gzip->deflate: $err" if $err != Z_OK;
-		if ($sref) {
-			$err = $gz->deflate($sref, $x[0]);
-			die "gzip->deflate: $err" if $err != Z_OK;
-		}
-		$err = $gz->deflate(_html_end($self), $x[0]);
-		die "gzip->deflate: $err" if $err != Z_OK;
-		$err = $gz->flush($x[0], Z_FINISH);
-		die "gzip->flush: $err" if $err != Z_OK;
+	my $h = [ 'Content-Type' => 'text/html; charset=UTF-8',
+		'Content-Length' => undef ];
+	if (my $gzf = gzf_maybe($h, $ctx->{env})) {
+		$gzf->zmore(_html_top($self));
+		$gzf->zmore($$sref) if $sref;
+		$x[0] = $gzf->zflush(_html_end($self));
+		$h->[3] = length($x[0]);
 	} else {
 		@x = (_html_top($self), $sref ? $$sref : (), _html_end($self));
+		$h->[3] += bytes::length($_) for @x;
 	}
-
-	my $len = 0;
-	$len += bytes::length($_) for @x;
-	push @$h, 'Content-Length', $len;
 	[ $code, $h, \@x ]
 }
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 09/43] {gzip,noop}filter: ->zmore returns undef, always
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (7 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 08/43] gzipfilter: replace Compress::Raw::Deflate usages Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 10/43] mbox: remove html_oneshot import Eric Wong
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This simplifies callers, as witnessed by the change to
WwwListing.  It adds overhead to NoopFilter, but NoopFilter
should see little use as nearly all HTTP clients request gzip.
---
 lib/PublicInbox/GzipFilter.pm |  2 +-
 lib/PublicInbox/NoopFilter.pm | 19 +++++++++++++++----
 lib/PublicInbox/WwwListing.pm |  8 ++++----
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index d2eb4e664..0fbb4476a 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -73,7 +73,7 @@ sub zmore {
 	my $self = $_[0]; # $_[1] => input
 	my $err = $self->{gz}->deflate($_[1], $self->{zbuf});
 	die "gzip->deflate: $err" if $err != Z_OK;
-	'';
+	undef;
 }
 
 # flushes and returns the final bit of gzipped data
diff --git a/lib/PublicInbox/NoopFilter.pm b/lib/PublicInbox/NoopFilter.pm
index b9c00ff7a..a97dbde64 100644
--- a/lib/PublicInbox/NoopFilter.pm
+++ b/lib/PublicInbox/NoopFilter.pm
@@ -4,10 +4,21 @@
 package PublicInbox::NoopFilter;
 use strict;
 
-sub new { bless \(my $ignore), __PACKAGE__ }
+sub new { bless \(my $self = ''), __PACKAGE__ }
 
 # noop workalike for PublicInbox::GzipFilter methods
-sub translate { $_[1] // '' }
-sub zmore { $_[1] }
-sub zflush { $_[1] // '' }
+sub translate {
+	my $self = $_[0];
+	my $ret = $$self .= ($_[1] // '');
+	$$self = '';
+	$ret;
+}
+
+sub zmore {
+	${$_[0]} .= $_[1];
+	undef;
+}
+
+sub zflush { translate($_[0], $_[1]) }
+
 1;
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index 780c97e91..d641e6d5c 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -109,7 +109,7 @@ sub html ($$) {
 	my $h = [ 'Content-Type', 'text/html; charset=UTF-8',
 			'Content-Length', undef ];
 	my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new();
-	my $out = $gzf->zmore('<html><head><title>' .
+	$gzf->zmore('<html><head><title>' .
 				'public-inbox listing</title>' .
 				'</head><body><pre>');
 	my $code = 404;
@@ -122,11 +122,11 @@ sub html ($$) {
 
 		my $tmp = join("\n", map { ibx_entry(@$_, $env) } @$list);
 		my $l = PublicInbox::Linkify->new;
-		$out .= $gzf->zmore($l->to_html($tmp));
+		$gzf->zmore($l->to_html($tmp));
 	} else {
-		$out .= $gzf->zmore('no inboxes, yet');
+		$gzf->zmore('no inboxes, yet');
 	}
-	$out .= $gzf->zflush('</pre><hr><pre>'.
+	my $out = $gzf->zflush('</pre><hr><pre>'.
 				PublicInbox::WwwStream::code_footer($env) .
 				'</pre></body></html>');
 	$h->[3] = bytes::length($out);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 10/43] mbox: remove html_oneshot import
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (8 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 09/43] {gzip,noop}filter: ->zmore returns undef, always Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 11/43] wwwstatic: support gzipped directory listings Eric Wong
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

It's no longer needed, we no longer show a runtime error
for zlib being missing, as zlib is a hard requirement.

Fixes: a318e758129d616b ("make zlib-related modules a hard dependency")
---
 lib/PublicInbox/Mbox.pm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index b46dacfdc..4c0b01edf 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -13,7 +13,6 @@ use warnings;
 use PublicInbox::MID qw/mid_escape/;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Smsg;
-use PublicInbox::WwwStream qw(html_oneshot);
 use PublicInbox::Eml;
 
 sub subject_fn ($) {

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 11/43] wwwstatic: support gzipped directory listings
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (9 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 10/43] mbox: remove html_oneshot import Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 12/43] qspawn: learn to gzip streaming responses Eric Wong
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This will allow others to mimic our award-winning homepage
design without needing to rely on Plack::Middleware::Deflater
or varnish to compress responses.
---
 lib/PublicInbox/WwwStatic.pm | 15 ++++++++++-----
 t/www_static.t               | 11 ++++++++++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index 3c9331564..d0611949d 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -17,6 +17,8 @@ use HTTP::Date qw(time2str);
 use HTTP::Status qw(status_message);
 use Errno qw(EACCES ENOTDIR ENOENT);
 use URI::Escape qw(uri_escape_utf8);
+use PublicInbox::NoopFilter;
+use PublicInbox::GzipFilter qw(gzf_maybe);
 use PublicInbox::Hval qw(ascii_html);
 use Plack::MIME;
 our @EXPORT_OK = qw(@NO_CACHE r path_info_raw);
@@ -310,12 +312,15 @@ sub dir_response ($$$) {
 			(map { ${$other{$_}} } sort keys %other));
 
 	my $path_info_html = ascii_html($path_info);
-	my $body = "<html><head><title>Index of $path_info_html</title>" .
+	my $h = [qw(Content-Type text/html Content-Length), undef];
+	my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new();
+	$gzf->zmore("<html><head><title>Index of $path_info_html</title>" .
 		${$self->{style}} .
-		"</head><body><pre>Index of $path_info_html</pre><hr><pre>\n";
-	$body .= join("\n", @entries) . "</pre><hr></body></html>\n";
-	[ 200, [ qw(Content-Type text/html
-			Content-Length), bytes::length($body) ], [ $body ] ]
+		"</head><body><pre>Index of $path_info_html</pre><hr><pre>\n");
+	$gzf->zmore(join("\n", @entries));
+	my $out = $gzf->zflush("</pre><hr></body></html>\n");
+	$h->[3] = bytes::length($out);
+	[ 200, $h, [ $out ] ]
 }
 
 sub call { # PSGI app endpoint
diff --git a/t/www_static.t b/t/www_static.t
index 10757cb7f..364b9447a 100644
--- a/t/www_static.t
+++ b/t/www_static.t
@@ -6,7 +6,7 @@ use Test::More;
 use PublicInbox::TestCommon;
 my ($tmpdir, $for_destroy) = tmpdir();
 my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape);
-require_mods(@mods);
+require_mods(@mods, 'IO::Uncompress::Gunzip');
 use_ok $_ foreach @mods;
 use_ok 'PublicInbox::WwwStatic';
 
@@ -91,6 +91,15 @@ test_psgi($app->(autoindex => 1, index => []), sub {
 	$get->header('Accept-Encoding' => 'gzip');
 	$res = $cb->($get);
 	is($res->content, "hi", 'got compressed on mtime match');
+
+	$get = GET('/dir/');
+	$get->header('Accept-Encoding' => 'gzip');
+	$res = $cb->($get);
+	my $in = $res->content;
+	my $out = '';
+	IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+	like($out, qr/\A<html>/, 'got HTML start after gunzip');
+	like($out, qr{</html>$}, 'got HTML end after gunzip');
 });
 
 done_testing();

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/43] qspawn: learn to gzip streaming responses
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (10 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 11/43] wwwstatic: support gzipped directory listings Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 13/43] stop auto-loading Plack::Middleware::Deflater Eric Wong
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This will allow us to gzip responses generated by cgit
and any other CGI programs or long-lived streaming
responses we may spawn.
---
 lib/PublicInbox/GzipFilter.pm | 16 ++++++++++++++++
 lib/PublicInbox/Qspawn.pm     |  6 ++++--
 t/httpd-corner.psgi           |  7 +++++++
 t/httpd-corner.t              |  9 ++++++++-
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 0fbb4476a..0a6c56a5d 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -32,6 +32,22 @@ sub gzf_maybe ($$) {
 	bless { gz => $gz }, __PACKAGE__;
 }
 
+sub qsp_maybe ($$) {
+	my ($res_hdr, $env) = @_;
+	return if ($env->{HTTP_ACCEPT_ENCODING} // '') !~ /\bgzip\b/;
+	my $hdr = join("\n", @$res_hdr);
+	return if $hdr !~ m!^Content-Type\n
+				(?:(?:text/(?:html|plain))|
+				application/atom\+xml)\b!ixsm;
+	return if $hdr =~ m!^Content-Encoding\ngzip\n!smi;
+	return if $hdr =~ m!^Content-Length\n[0-9]+\n!smi;
+	return if $hdr =~ m!^Transfer-Encoding\n!smi;
+	# in case Plack::Middleware::Deflater is loaded:
+	return if $env->{'plack.skip-deflater'}++;
+	push @$res_hdr, @GZIP_HDRS;
+	bless {}, __PACKAGE__;
+}
+
 sub gzip_or_die () {
 	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
 	$err == Z_OK or die "Deflate->new failed: $err";
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index d395a10b3..88b6d390a 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -25,8 +25,8 @@
 
 package PublicInbox::Qspawn;
 use strict;
-use warnings;
 use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::GzipFilter;
 
 # n.b.: we get EAGAIN with public-inbox-httpd, and EINTR on other PSGI servers
 use Errno qw(EAGAIN EINTR);
@@ -255,7 +255,9 @@ sub psgi_return_init_cb {
 	my ($self) = @_;
 	my $r = rd_hdr($self) or return;
 	my $env = $self->{psgi_env};
-	my $filter = delete $env->{'qspawn.filter'};
+	my $filter = delete $env->{'qspawn.filter'} //
+		PublicInbox::GzipFilter::qsp_maybe($r->[1], $env);
+
 	my $wcb = delete $env->{'qspawn.wcb'};
 	my $async = delete $self->{async};
 	if (scalar(@$r) == 3) { # error
diff --git a/t/httpd-corner.psgi b/t/httpd-corner.psgi
index 446296200..cb41cfa05 100644
--- a/t/httpd-corner.psgi
+++ b/t/httpd-corner.psgi
@@ -94,6 +94,13 @@ my $app = sub {
 		return $qsp->psgi_return($env, undef, sub {
 			[ 200, [ qw(Content-Type application/octet-stream)]]
 		});
+	} elsif ($path eq '/psgi-return-compressible') {
+		require PublicInbox::Qspawn;
+		my $cmd = [qw(echo goodbye world)];
+		my $qsp = PublicInbox::Qspawn->new($cmd);
+		return $qsp->psgi_return($env, undef, sub {
+			[200, [qw(Content-Type text/plain)]]
+		});
 	} elsif ($path eq '/psgi-return-enoent') {
 		require PublicInbox::Qspawn;
 		my $cmd = [ 'this-better-not-exist-in-PATH'.rand ];
diff --git a/t/httpd-corner.t b/t/httpd-corner.t
index 681486550..514672a1b 100644
--- a/t/httpd-corner.t
+++ b/t/httpd-corner.t
@@ -340,11 +340,18 @@ SKIP: {
 	is($n, 30 * 1024 * 1024, 'got expected output from curl');
 	is($non_zero, 0, 'read all zeros');
 
-	require_mods(@zmods, 2);
+	require_mods(@zmods, 4);
 	my $buf = xqx([$curl, '-sS', "$base/psgi-return-gzip"]);
 	is($?, 0, 'curl succesful');
 	IO::Uncompress::Gunzip::gunzip(\$buf => \(my $out));
 	is($out, "hello world\n");
+	my $curl_rdr = { 2 => \(my $curl_err = '') };
+	$buf = xqx([$curl, qw(-sSv --compressed),
+			"$base/psgi-return-compressible"], undef, $curl_rdr);
+	is($?, 0, 'curl --compressed successful');
+	is($buf, "goodbye world\n", 'gzipped response as expected');
+	like($curl_err, qr/\bContent-Encoding: gzip\b/,
+		'curl got gzipped response');
 }
 
 {

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/43] stop auto-loading Plack::Middleware::Deflater
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (11 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 12/43] qspawn: learn to gzip streaming responses Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 14/43] mboxgz: do asynchronous git blob retrievals Eric Wong
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Instead of gzipping some (mbox.gz, manifest.js.gz) responses and
leaving P::M::D to do the rest, we gzip everything ourselves,
now, so P::M::D is redundant.
---
 Documentation/public-inbox-httpd.pod | 1 -
 INSTALL                              | 5 -----
 ci/deps.perl                         | 1 -
 examples/cgit.psgi                   | 8 --------
 examples/newswww.psgi                | 8 --------
 examples/public-inbox.psgi           | 9 ---------
 examples/unsubscribe.psgi            | 1 -
 script/public-inbox-httpd            | 9 ---------
 script/public-inbox.cgi              | 7 -------
 9 files changed, 49 deletions(-)

diff --git a/Documentation/public-inbox-httpd.pod b/Documentation/public-inbox-httpd.pod
index c1ebfd822..2f4e9e5d1 100644
--- a/Documentation/public-inbox-httpd.pod
+++ b/Documentation/public-inbox-httpd.pod
@@ -15,7 +15,6 @@ the PSGI file.
 
 If a PSGI file is not specified, L<PublicInbox::WWW> is
 loaded with a default middleware stack consisting of
-L<Plack::Middleware::Deflater>,
 L<Plack::Middleware::ReverseProxy>, and
 L<Plack::Middleware::Head>
 
diff --git a/INSTALL b/INSTALL
index 05e0f95e9..9f05c3f62 100644
--- a/INSTALL
+++ b/INSTALL
@@ -100,11 +100,6 @@ Numerous optional modules are likely to be useful as well:
                                    (ensures redirects are correct when running
                                     behind nginx or Varnish)
 
-- Plack::Middleware::Deflater      deb: libplack-middleware-deflater-perl
-                                   pkg: p5 -Plack-Middleware-Deflater
-                                   rpm: perl-Plack-Middleware-Deflater
-                                   (saves bandwidth on responses)
-
 * highlight                        deb: libhighlight-perl
                                    (for syntax highlighting with coderepo)
 
diff --git a/ci/deps.perl b/ci/deps.perl
index 501f51129..77d95fc8e 100755
--- a/ci/deps.perl
+++ b/ci/deps.perl
@@ -36,7 +36,6 @@ my $profiles = {
 		Net::Server
 		Plack
 		Plack::Test
-		Plack::Middleware::Deflater
 		Plack::Middleware::ReverseProxy
 		Search::Xapian
 		Socket6
diff --git a/examples/cgit.psgi b/examples/cgit.psgi
index e72e832d0..7ad38e280 100644
--- a/examples/cgit.psgi
+++ b/examples/cgit.psgi
@@ -18,14 +18,6 @@ my $pi_config = PublicInbox::Config->new;
 my $cgit = PublicInbox::Cgit->new($pi_config);
 
 builder {
-	eval {
-		enable 'Deflater',
-			content_type => [ qw(
-				text/html
-				text/plain
-				application/atom+xml
-				)]
-	};
 	eval { enable 'ReverseProxy' };
 	enable 'Head';
 	sub { $cgit->call($_[0]) }
diff --git a/examples/newswww.psgi b/examples/newswww.psgi
index 3cce7191d..52ad7043e 100644
--- a/examples/newswww.psgi
+++ b/examples/newswww.psgi
@@ -36,14 +36,6 @@ builder {
 	# regular PublicInbox::WWW code:
 	# see comments in examples/public-inbox.psgi for more info:
 	mount '/' => builder {
-		eval {
-			enable 'Deflater',
-				content_type => [ qw(
-					text/html
-					text/plain
-					application/atom+xml
-					)]
-		};
 		eval { enable 'ReverseProxy' };
 		enable 'Head';
 		sub { $www->call($_[0]) }
diff --git a/examples/public-inbox.psgi b/examples/public-inbox.psgi
index 9891a1f03..3537be2c7 100644
--- a/examples/public-inbox.psgi
+++ b/examples/public-inbox.psgi
@@ -22,15 +22,6 @@ my $src = $ENV{SRC_GIT_DIR}; # '/path/to/public-inbox.git'
 $src = PublicInbox::Git->new($src) if defined $src;
 
 builder {
-	eval {
-		enable 'Deflater',
-			content_type => [ qw(
-				text/html
-				text/plain
-				application/atom+xml
-				)]
-	};
-
 	# Enable to ensure redirects and Atom feed URLs are generated
 	# properly when running behind a reverse proxy server which
 	# sets the X-Forwarded-Proto request header.
diff --git a/examples/unsubscribe.psgi b/examples/unsubscribe.psgi
index 6a40f251d..7b97e2532 100644
--- a/examples/unsubscribe.psgi
+++ b/examples/unsubscribe.psgi
@@ -61,7 +61,6 @@ my $app = PublicInbox::Unsubscribe->new(
 
 builder {
 	mount '/u' => builder {
-		eval { enable 'Deflater' }; # optional
 		eval { enable 'ReverseProxy' }; # optional
 		enable 'Head';
 		sub { $app->call(@_) };
diff --git a/script/public-inbox-httpd b/script/public-inbox-httpd
index 09da505e5..b8159f3a5 100755
--- a/script/public-inbox-httpd
+++ b/script/public-inbox-httpd
@@ -27,15 +27,6 @@ my $refresh = sub {
 		my $www = PublicInbox::WWW->new;
 		$www->preload;
 		$app = builder {
-			eval {
-				enable 'Deflater',
-					content_type => [ qw(
-						text/html
-						text/plain
-						application/atom+xml
-						)]
-			};
-
 			eval { enable 'ReverseProxy' };
 			$@ and warn
 "Plack::Middleware::ReverseProxy missing,\n",
diff --git a/script/public-inbox.cgi b/script/public-inbox.cgi
index c766483a2..42ab17c9e 100755
--- a/script/public-inbox.cgi
+++ b/script/public-inbox.cgi
@@ -13,14 +13,7 @@ BEGIN {
 	PublicInbox::WWW->preload if $ENV{MOD_PERL};
 }
 my $www = PublicInbox::WWW->new;
-my $have_deflater = eval { require Plack::Middleware::Deflater; 1 };
 my $app = builder {
-	if ($have_deflater) {
-		enable 'Deflater',
-			content_type => [ 'text/html', 'text/plain',
-					'application/atom+xml' ];
-	}
-
 	# Enable to ensure redirects and Atom feed URLs are generated
 	# properly when running behind a reverse proxy server which
 	# sets the X-Forwarded-Proto request header.

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 14/43] mboxgz: do asynchronous git blob retrievals
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (12 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 13/43] stop auto-loading Plack::Middleware::Deflater Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 15/43] mboxgz: reduce hash depth Eric Wong
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This lets the -httpd worker process make better use of time
instead of waiting for git-cat-file to respond.  With 4 jobs in
the new test case against a clone of
<https://public-inbox.org/meta/>, a speedup of 10-12% is shown.
Even a single job shows a 2-5% improvement on an SSD.
---
 MANIFEST                  |  1 +
 lib/PublicInbox/HTTP.pm   |  7 +++
 lib/PublicInbox/MboxGz.pm | 69 +++++++++++++++++++++++----
 xt/httpd-async-stream.t   | 99 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 167 insertions(+), 9 deletions(-)
 create mode 100644 xt/httpd-async-stream.t

diff --git a/MANIFEST b/MANIFEST
index dcd7a7e5f..9b0f50203 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -368,6 +368,7 @@ xt/cmp-msgview.t
 xt/eml_check_limits.t
 xt/git-http-backend.t
 xt/git_async_cmp.t
+xt/httpd-async-stream.t
 xt/imapd-mbsync-oimap.t
 xt/imapd-validate.t
 xt/mem-imapd-tls.t
diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index 828174653..5844ef440 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -488,6 +488,13 @@ sub busy () {
 	($self->{rbuf} || exists($self->{env}) || $self->{wbuf});
 }
 
+# runs $cb on the next iteration of the event loop at earliest
+sub next_step {
+	my ($self, $cb) = @_;
+	return unless exists $self->{sock};
+	$self->requeue if 1 == push(@{$self->{wbuf}}, $cb);
+}
+
 # Chunked and Identity packages are used for writing responses.
 # They may be exposed to the PSGI application when the PSGI app
 # returns a CODE ref for "push"-based responses
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index 535ef96c9..8c9010afb 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -6,6 +6,9 @@ use parent 'PublicInbox::GzipFilter';
 use PublicInbox::Eml;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Mbox;
+use PublicInbox::GitAsyncCat;
+*msg_hdr = \&PublicInbox::Mbox::msg_hdr;
+*msg_body = \&PublicInbox::Mbox::msg_body;
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
@@ -17,33 +20,81 @@ sub new {
 	}, $class;
 }
 
+# this is public-inbox-httpd-specific
+sub mboxgz_blob_cb { # git->cat_async callback
+	my ($bref, $oid, $type, $size, $self) = @_;
+	my $http = $self->{ctx}->{env}->{'psgix.io'} or return; # client abort
+	my $smsg = delete $self->{smsg} or die 'BUG: no smsg';
+	if (!defined($oid)) {
+		# it's possible to have TOCTOU if an admin runs
+		# public-inbox-(edit|purge), just move onto the next message
+		return $http->next_step(\&async_next);
+	} else {
+		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
+	}
+	$self->zmore(msg_hdr($self->{ctx},
+				PublicInbox::Eml->new($bref)->header_obj,
+				$smsg->{mid}));
+
+	# PublicInbox::HTTP::{Chunked,Identity}::write
+	$self->{http_out}->write($self->translate(msg_body($$bref)));
+
+	$http->next_step(\&async_next);
+}
+
+# this is public-inbox-httpd-specific
+sub async_step ($) {
+	my ($self) = @_;
+	if (my $smsg = $self->{smsg} = $self->{cb}->($self->{ctx})) {
+		git_async_cat($self->{ctx}->{-inbox}->git, $smsg->{blob},
+				\&mboxgz_blob_cb, $self);
+	} elsif (my $out = delete $self->{http_out}) {
+		$out->write($self->zflush);
+		$out->close;
+	}
+}
+
+# called by PublicInbox::DS::write
+sub async_next {
+	my ($http) = @_; # PublicInbox::HTTP
+	async_step($http->{forward});
+}
+
+# called by PublicInbox::HTTP::close, or any other PSGI server
+sub close { !!delete($_[0]->{http_out}) }
+
 sub response {
 	my ($class, $ctx, $cb, $fn) = @_;
-	my $body = $class->new($ctx, $cb);
+	my $self = $class->new($ctx, $cb);
 	# http://www.iana.org/assignments/media-types/application/gzip
 	$fn = defined($fn) && $fn ne '' ? to_filename($fn) : 'no-subject';
 	my $h = [ qw(Content-Type application/gzip),
 		'Content-Disposition', "inline; filename=$fn.mbox.gz" ];
-	[ 200, $h, $body ];
+	if ($ctx->{env}->{'pi-httpd.async'}) {
+		sub {
+			my ($wcb) = @_; # -httpd provided write callback
+			$self->{http_out} = $wcb->([200, $h]);
+			$self->{ctx}->{env}->{'psgix.io'}->{forward} = $self;
+			async_step($self); # start stepping
+		};
+	} else { # generic PSGI
+		[ 200, $h, $self ];
+	}
 }
 
-# called by Plack::Util::foreach or similar
+# called by Plack::Util::foreach or similar (generic PSGI)
 sub getline {
 	my ($self) = @_;
 	my $ctx = $self->{ctx} or return;
 	while (my $smsg = $self->{cb}->($ctx)) {
 		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
 		my $h = PublicInbox::Eml->new($mref)->header_obj;
-		$self->zmore(
-			PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid})
-		);
-		return $self->translate(PublicInbox::Mbox::msg_body($$mref));
+		$self->zmore(msg_hdr($ctx, $h, $smsg->{mid}));
+		return $self->translate(msg_body($$mref));
 	}
 	# signal that we're done and can return undef next call:
 	delete $self->{ctx};
 	$self->zflush;
 }
 
-sub close {} # noop
-
 1;
diff --git a/xt/httpd-async-stream.t b/xt/httpd-async-stream.t
new file mode 100644
index 000000000..29bcb6125
--- /dev/null
+++ b/xt/httpd-async-stream.t
@@ -0,0 +1,99 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# Expensive test to validate compression and TLS.
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::DS qw(now);
+use PublicInbox::Spawn qw(which popen_rd);
+use Digest::MD5;
+use POSIX qw(_exit);
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
+my $curl = which('curl') or plan skip_all => "curl(1) missing for $0";
+my ($tmpdir, $for_destroy) = tmpdir();
+require_mods(qw(DBD::SQLite));
+my $JOBS = $ENV{TEST_JOBS} // 4;
+diag "TEST_JOBS=$JOBS";
+
+my $make_local_server = sub {
+	my $pi_config = "$tmpdir/config";
+	open my $fh, '>', $pi_config or die "open($pi_config): $!";
+	print $fh <<"" or die "print $pi_config: $!";
+[publicinbox "test"]
+inboxdir = $inboxdir
+address = test\@example.com
+
+	close $fh or die "close($pi_config): $!";
+	my ($out, $err) = ("$tmpdir/out", "$tmpdir/err");
+	for ($out, $err) {
+		open my $fh, '>', $_ or die "truncate: $!";
+	}
+	my $http = tcp_server();
+	my $rdr = { 3 => $http };
+
+	# not using multiple workers, here, since we want to increase
+	# the chance of tripping concurrency bugs within PublicInbox/HTTP*.pm
+	my $cmd = [ '-httpd', "--stdout=$out", "--stderr=$err", '-W0' ];
+	my $host_port = $http->sockhost.':'.$http->sockport;
+	push @$cmd, "-lhttp://$host_port";
+	my $url = "$host_port/test/all.mbox.gz";
+	print STDERR "# CMD ". join(' ', @$cmd). "\n";
+	my $env = { PI_CONFIG => $pi_config };
+	(start_script($cmd, $env, $rdr), $url);
+};
+
+my ($td, $url) = $make_local_server->();
+
+my $do_get_all = sub {
+	my ($job) = @_;
+	local $SIG{__DIE__} = sub { print STDERR $job, ': ', @_; _exit(1) };
+	my $dig = Digest::MD5->new;
+	my ($buf, $nr);
+	my $bytes = 0;
+	my $t0 = now();
+	my ($rd, $pid) = popen_rd([$curl, qw(-HHost:example.com -sSf), $url]);
+	while (1) {
+		$nr = sysread($rd, $buf, 65536);
+		last if !$nr;
+		$dig->add($buf);
+		$bytes += $nr;
+	}
+	my $res = $dig->hexdigest;
+	my $elapsed = sprintf('%0.3f', now() - $t0);
+	close $rd or die "close curl failed: $!\n";
+	waitpid($pid, 0) == $pid or die "waitpid failed: $!\n";
+	$? == 0 or die "curl failed: $?\n";
+	print STDERR "# $job $$ ($?) $res (${elapsed}s) $bytes bytes\n";
+	$res;
+};
+
+my (%pids, %res);
+for my $job (1..$JOBS) {
+	pipe(my ($r, $w)) or die;
+	my $pid = fork;
+	if ($pid == 0) {
+		close $r or die;
+		my $res = $do_get_all->($job);
+		print $w $res or die;
+		close $w or die;
+		_exit(0);
+	}
+	close $w or die;
+	$pids{$pid} = [ $job, $r ];
+}
+
+while (scalar keys %pids) {
+	my $pid = waitpid(-1, 0) or next;
+	my $child = delete $pids{$pid} or next;
+	my ($job, $rpipe) = @$child;
+	is($?, 0, "$job done");
+	my $sum = do { local $/; <$rpipe> };
+	push @{$res{$sum}}, $job;
+}
+is(scalar keys %res, 1, 'all got the same result');
+$td->kill;
+$td->join;
+is($?, 0, 'no error on -httpd exit');
+done_testing;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 15/43] mboxgz: reduce hash depth
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (13 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 14/43] mboxgz: do asynchronous git blob retrievals Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 16/43] mbox: async blob fetch for "single message" raw mboxrd Eric Wong
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We can bless $ctx directly into a MboxGz object to reduce
hash lookups and allocations.
---
 lib/PublicInbox/MboxGz.pm | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index 8c9010afb..598b10347 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -10,20 +10,10 @@ use PublicInbox::GitAsyncCat;
 *msg_hdr = \&PublicInbox::Mbox::msg_hdr;
 *msg_body = \&PublicInbox::Mbox::msg_body;
 
-sub new {
-	my ($class, $ctx, $cb) = @_;
-	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	bless {
-		gz => PublicInbox::GzipFilter::gzip_or_die(),
-		cb => $cb,
-		ctx => $ctx
-	}, $class;
-}
-
 # this is public-inbox-httpd-specific
 sub mboxgz_blob_cb { # git->cat_async callback
 	my ($bref, $oid, $type, $size, $self) = @_;
-	my $http = $self->{ctx}->{env}->{'psgix.io'} or return; # client abort
+	my $http = $self->{env}->{'psgix.io'} or return; # client abort
 	my $smsg = delete $self->{smsg} or die 'BUG: no smsg';
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
@@ -32,7 +22,7 @@ sub mboxgz_blob_cb { # git->cat_async callback
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
-	$self->zmore(msg_hdr($self->{ctx},
+	$self->zmore(msg_hdr($self,
 				PublicInbox::Eml->new($bref)->header_obj,
 				$smsg->{mid}));
 
@@ -45,8 +35,8 @@ sub mboxgz_blob_cb { # git->cat_async callback
 # this is public-inbox-httpd-specific
 sub async_step ($) {
 	my ($self) = @_;
-	if (my $smsg = $self->{smsg} = $self->{cb}->($self->{ctx})) {
-		git_async_cat($self->{ctx}->{-inbox}->git, $smsg->{blob},
+	if (my $smsg = $self->{smsg} = $self->{cb}->($self)) {
+		git_async_cat($self->{-inbox}->git, $smsg->{blob},
 				\&mboxgz_blob_cb, $self);
 	} elsif (my $out = delete $self->{http_out}) {
 		$out->write($self->zflush);
@@ -64,17 +54,20 @@ sub async_next {
 sub close { !!delete($_[0]->{http_out}) }
 
 sub response {
-	my ($class, $ctx, $cb, $fn) = @_;
-	my $self = $class->new($ctx, $cb);
+	my ($class, $self, $cb, $fn) = @_;
+	$self->{base_url} = $self->{-inbox}->base_url($self->{env});
+	$self->{cb} = $cb;
+	$self->{gz} = PublicInbox::GzipFilter::gzip_or_die();
+	bless $self, $class;
 	# http://www.iana.org/assignments/media-types/application/gzip
 	$fn = defined($fn) && $fn ne '' ? to_filename($fn) : 'no-subject';
 	my $h = [ qw(Content-Type application/gzip),
 		'Content-Disposition', "inline; filename=$fn.mbox.gz" ];
-	if ($ctx->{env}->{'pi-httpd.async'}) {
+	if ($self->{env}->{'pi-httpd.async'}) {
 		sub {
 			my ($wcb) = @_; # -httpd provided write callback
 			$self->{http_out} = $wcb->([200, $h]);
-			$self->{ctx}->{env}->{'psgix.io'}->{forward} = $self;
+			$self->{env}->{'psgix.io'}->{forward} = $self;
 			async_step($self); # start stepping
 		};
 	} else { # generic PSGI
@@ -85,15 +78,15 @@ sub response {
 # called by Plack::Util::foreach or similar (generic PSGI)
 sub getline {
 	my ($self) = @_;
-	my $ctx = $self->{ctx} or return;
-	while (my $smsg = $self->{cb}->($ctx)) {
-		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
+	my $cb = $self->{cb} or return;
+	while (my $smsg = $cb->($self)) {
+		my $mref = $self->{-inbox}->msg_by_smsg($smsg) or next;
 		my $h = PublicInbox::Eml->new($mref)->header_obj;
-		$self->zmore(msg_hdr($ctx, $h, $smsg->{mid}));
+		$self->zmore(msg_hdr($self, $h, $smsg->{mid}));
 		return $self->translate(msg_body($$mref));
 	}
 	# signal that we're done and can return undef next call:
-	delete $self->{ctx};
+	delete $self->{cb};
 	$self->zflush;
 }
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 16/43] mbox: async blob fetch for "single message" raw mboxrd
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (14 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 15/43] mboxgz: reduce hash depth Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 17/43] wwwatomstream: simplify feed_update callers Eric Wong
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This restores gzip-by-default behavior for /$INBOX/$MSGID/raw
endpoints for all indexed inboxes.  Unindexed v1 inboxes will
remain uncompressed, for now.
---
 lib/PublicInbox/Mbox.pm   | 160 +++++++++++++++++++++++++-------------
 lib/PublicInbox/MboxGz.pm |  51 ++++++------
 t/psgi_v2.t               |  46 +++++++++--
 3 files changed, 175 insertions(+), 82 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 4c0b01edf..895f828c5 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -14,62 +14,69 @@ use PublicInbox::MID qw/mid_escape/;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Smsg;
 use PublicInbox::Eml;
-
-sub subject_fn ($) {
-	my ($hdr) = @_;
-	my $fn = $hdr->header_str('Subject');
-	return 'no-subject' if (!defined($fn) || $fn eq '');
-
-	$fn =~ s/^re:\s+//i;
-	$fn eq '' ? 'no-subject' : to_filename($fn);
-}
-
-sub mb_stream {
-	my ($more) = @_;
-	bless $more, 'PublicInbox::Mbox';
-}
+use PublicInbox::GitAsyncCat;
+use PublicInbox::GzipFilter qw(gzf_maybe);
 
 # called by PSGI server as body response
 # this gets called twice for every message, once to return the header,
 # once to retrieve the body
 sub getline {
-	my ($more) = @_; # self
-	my ($ctx, $id, $prev, $next, $mref, $hdr) = @$more;
-	if ($hdr) { # first message hits this, only
-		pop @$more; # $hdr
-		pop @$more; # $mref
-		return msg_hdr($ctx, $hdr) . msg_body($$mref);
-	}
-	my $cur = $next or return;
+	my ($ctx) = @_; # ctx
+	my $smsg = $ctx->{smsg} or return;
 	my $ibx = $ctx->{-inbox};
-	$next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev);
-	$mref = $ibx->msg_by_smsg($cur) or return;
-	$hdr = PublicInbox::Eml->new($mref)->header_obj;
-	@$more = ($ctx, $id, $prev, $next); # $next may be undef, here
-	msg_hdr($ctx, $hdr) . msg_body($$mref);
+	my $eml = $ibx->smsg_eml($smsg) or return;
+	$ctx->{smsg} = $ibx->over->next_by_mid($ctx->{mid}, @{$ctx->{id_prev}});
+	msg_hdr($ctx, $eml, $smsg->{mid}) . msg_body($eml);
 }
 
-sub close {} # noop
+sub close { !!delete($_[0]->{http_out}) }
 
-# /$INBOX/$MESSAGE_ID/raw
-sub emit_raw {
+sub mbox_async_step ($) { # public-inbox-httpd-only
 	my ($ctx) = @_;
-	my $mid = $ctx->{mid};
-	my $ibx = $ctx->{-inbox};
-	$ctx->{base_url} = $ibx->base_url($ctx->{env});
-	my ($mref, $more, $id, $prev, $next);
-	if (my $over = $ibx->over) {
-		my $smsg = $over->next_by_mid($mid, \$id, \$prev) or return;
-		$mref = $ibx->msg_by_smsg($smsg) or return;
-		$next = $over->next_by_mid($mid, \$id, \$prev);
+	if (my $smsg = $ctx->{smsg}) {
+		git_async_cat($ctx->{-inbox}->git, $smsg->{blob},
+				\&mbox_blob_cb, $ctx);
+	} elsif (my $out = delete $ctx->{http_out}) {
+		$out->close;
+	}
+}
+
+# called by PublicInbox::DS::write
+sub mbox_async_next {
+	my ($http) = @_; # PublicInbox::HTTP
+	my $ctx = $http->{forward} or return; # client aborted
+	eval {
+		$ctx->{smsg} = $ctx->{-inbox}->over->next_by_mid(
+					$ctx->{mid}, @{$ctx->{id_prev}});
+		mbox_async_step($ctx);
+	};
+}
+
+# this is public-inbox-httpd-specific
+sub mbox_blob_cb { # git->cat_async callback
+	my ($bref, $oid, $type, $size, $ctx) = @_;
+	my $http = $ctx->{env}->{'psgix.io'} or return; # client abort
+	my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg';
+	if (!defined($oid)) {
+		# it's possible to have TOCTOU if an admin runs
+		# public-inbox-(edit|purge), just move onto the next message
+		return $http->next_step(\&mbox_async_next);
 	} else {
-		$mref = $ibx->msg_by_mid($mid) or return;
+		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
-	my $hdr = PublicInbox::Eml->new($mref)->header_obj;
-	$more = [ $ctx, $id, $prev, $next, $mref, $hdr ]; # for ->getline
-	my $fn = subject_fn($hdr);
+	my $eml = PublicInbox::Eml->new($bref);
+	$ctx->{http_out}->write(msg_hdr($ctx, $eml, $smsg->{mid}));
+	$ctx->{http_out}->write(msg_body($eml));
+	$http->next_step(\&mbox_async_next);
+}
+
+sub res_hdr ($$) {
+	my ($ctx, $subject) = @_;
+	my $fn = $subject // 'no-subject';
+	$fn =~ s/^re:\s+//i;
+	$fn = $fn eq '' ? 'no-subject' : to_filename($fn);
 	my @hdr = ('Content-Type');
-	if ($ibx->{obfuscate}) {
+	if ($ctx->{-inbox}->{obfuscate}) {
 		# obfuscation is stupid, but maybe scrapers are, too...
 		push @hdr, 'application/mbox';
 		$fn .= '.mbox';
@@ -78,11 +85,58 @@ sub emit_raw {
 		$fn .= '.txt';
 	}
 	push @hdr, 'Content-Disposition', "inline; filename=$fn";
-	[ 200, \@hdr, mb_stream($more) ];
+	\@hdr;
+}
+
+# for rare cases where v1 inboxes aren't indexed w/ ->over at all
+sub no_over_raw ($) {
+	my ($ctx) = @_;
+	my $mref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return;
+	my $eml = PublicInbox::Eml->new($mref);
+	[ 200, res_hdr($ctx, $eml->header_str('Subject')),
+		[ msg_hdr($ctx, $eml, $ctx->{mid}) . msg_body($eml) ] ]
+}
+
+sub stream_raw { # MboxGz response callback
+	my ($ctx) = @_;
+	delete($ctx->{smsg}) //
+		$ctx->{-inbox}->over->next_by_mid($ctx->{mid},
+						@{$ctx->{id_prev}});
+}
+
+# /$INBOX/$MESSAGE_ID/raw
+sub emit_raw {
+	my ($ctx) = @_;
+	my $env = $ctx->{env};
+	$ctx->{base_url} = $ctx->{-inbox}->base_url($env);
+	my $over = $ctx->{-inbox}->over or return no_over_raw($ctx);
+	my ($id, $prev);
+	my $smsg = $over->next_by_mid($ctx->{mid}, \$id, \$prev) or return;
+	$ctx->{smsg} = $smsg;
+	my $res_hdr = res_hdr($ctx, $smsg->{subject});
+	$ctx->{id_prev} = [ \$id, \$prev ];
+
+	if (my $gzf = gzf_maybe($res_hdr, $env)) {
+		$ctx->{gz} = delete $gzf->{gz};
+		require PublicInbox::MboxGz;
+		PublicInbox::MboxGz::response($ctx, \&stream_raw, $res_hdr);
+	} elsif ($env->{'pi-httpd.async'}) {
+		sub {
+			my ($wcb) = @_; # -httpd provided write callback
+			$ctx->{http_out} = $wcb->([200, $res_hdr]);
+			$ctx->{env}->{'psgix.io'}->{forward} = $ctx;
+			bless $ctx, __PACKAGE__;
+			mbox_async_step($ctx); # start stepping
+		};
+	} else { # generic PSGI code path
+		bless $ctx, __PACKAGE__; # respond to ->getline
+		[ 200, $res_hdr, $ctx ];
+	}
 }
 
 sub msg_hdr ($$;$) {
-	my ($ctx, $header_obj, $mid) = @_;
+	my ($ctx, $eml, $mid) = @_;
+	my $header_obj = $eml->header_obj;
 
 	# drop potentially confusing headers, ssoma already should've dropped
 	# Lines and Content-Length
@@ -120,12 +174,13 @@ sub msg_hdr ($$;$) {
 }
 
 sub msg_body ($) {
+	my $bdy = $_[0]->{bdy} // return "\n";
 	# mboxrd quoting style
 	# https://en.wikipedia.org/wiki/Mbox#Modified_mbox
 	# https://www.loc.gov/preservation/digital/formats/fdd/fdd000385.shtml
 	# https://web.archive.org/http://www.qmail.org/man/man5/mbox.html
-	$_[0] =~ s/^(>*From )/>$1/gm;
-	$_[0] .= "\n";
+	$$bdy =~ s/^(>*From )/>$1/gm;
+	$$bdy .= "\n";
 }
 
 sub thread_cb {
@@ -145,12 +200,12 @@ sub thread_cb {
 
 sub thread_mbox {
 	my ($ctx, $over, $sfx) = @_;
-	require PublicInbox::MboxGz;
 	my $msgs = $ctx->{msgs} = $over->get_thread($ctx->{mid}, {});
 	return [404, [qw(Content-Type text/plain)], []] if !@$msgs;
 	$ctx->{prev} = $msgs->[-1];
 	$ctx->{over} = $over; # bump refcnt
-	PublicInbox::MboxGz->response($ctx, \&thread_cb, $msgs->[0]->{subject});
+	require PublicInbox::MboxGz;
+	PublicInbox::MboxGz::mbox_gz($ctx, \&thread_cb, $msgs->[0]->{subject});
 }
 
 sub emit_range {
@@ -188,7 +243,8 @@ sub mbox_all_ids {
 		return PublicInbox::WWW::need($ctx, 'Overview');
 	$ctx->{ids} = $ids;
 	$ctx->{prev} = $prev;
-	return PublicInbox::MboxGz->response($ctx, \&all_ids_cb, 'all');
+	require PublicInbox::MboxGz;
+	PublicInbox::MboxGz::mbox_gz($ctx, \&all_ids_cb, 'all');
 }
 
 sub results_cb {
@@ -213,7 +269,6 @@ sub results_cb {
 sub mbox_all {
 	my ($ctx, $query) = @_;
 
-	require PublicInbox::MboxGz;
 	return mbox_all_ids($ctx) if $query eq '';
 	my $qopts = $ctx->{qopts} = { mset => 2 };
 	my $srch = $ctx->{srch} = $ctx->{-inbox}->search or
@@ -224,7 +279,8 @@ sub mbox_all {
 				["No results found\n"]];
 	$ctx->{iter} = 0;
 	$ctx->{query} = $query;
-	PublicInbox::MboxGz->response($ctx, \&results_cb, 'results-'.$query);
+	require PublicInbox::MboxGz;
+	PublicInbox::MboxGz::mbox_gz($ctx, \&results_cb, 'results-'.$query);
 }
 
 1;
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index 598b10347..716bf7b19 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -18,22 +18,21 @@ sub mboxgz_blob_cb { # git->cat_async callback
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
-		return $http->next_step(\&async_next);
+		return $http->next_step(\&mboxgz_async_next);
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
 	}
-	$self->zmore(msg_hdr($self,
-				PublicInbox::Eml->new($bref)->header_obj,
-				$smsg->{mid}));
+	my $eml = PublicInbox::Eml->new($bref);
+	$self->zmore(msg_hdr($self, $eml, $smsg->{mid}));
 
 	# PublicInbox::HTTP::{Chunked,Identity}::write
-	$self->{http_out}->write($self->translate(msg_body($$bref)));
+	$self->{http_out}->write($self->translate(msg_body($eml)));
 
-	$http->next_step(\&async_next);
+	$http->next_step(\&mboxgz_async_next);
 }
 
 # this is public-inbox-httpd-specific
-sub async_step ($) {
+sub mboxgz_async_step ($) {
 	my ($self) = @_;
 	if (my $smsg = $self->{smsg} = $self->{cb}->($self)) {
 		git_async_cat($self->{-inbox}->git, $smsg->{blob},
@@ -45,45 +44,49 @@ sub async_step ($) {
 }
 
 # called by PublicInbox::DS::write
-sub async_next {
+sub mboxgz_async_next {
 	my ($http) = @_; # PublicInbox::HTTP
-	async_step($http->{forward});
+	mboxgz_async_step($http->{forward});
 }
 
 # called by PublicInbox::HTTP::close, or any other PSGI server
 sub close { !!delete($_[0]->{http_out}) }
 
 sub response {
-	my ($class, $self, $cb, $fn) = @_;
-	$self->{base_url} = $self->{-inbox}->base_url($self->{env});
+	my ($self, $cb, $res_hdr) = @_;
 	$self->{cb} = $cb;
-	$self->{gz} = PublicInbox::GzipFilter::gzip_or_die();
-	bless $self, $class;
-	# http://www.iana.org/assignments/media-types/application/gzip
-	$fn = defined($fn) && $fn ne '' ? to_filename($fn) : 'no-subject';
-	my $h = [ qw(Content-Type application/gzip),
-		'Content-Disposition', "inline; filename=$fn.mbox.gz" ];
+	bless $self, __PACKAGE__;
 	if ($self->{env}->{'pi-httpd.async'}) {
 		sub {
 			my ($wcb) = @_; # -httpd provided write callback
-			$self->{http_out} = $wcb->([200, $h]);
+			$self->{http_out} = $wcb->([200, $res_hdr]);
 			$self->{env}->{'psgix.io'}->{forward} = $self;
-			async_step($self); # start stepping
+			mboxgz_async_step($self); # start stepping
 		};
 	} else { # generic PSGI
-		[ 200, $h, $self ];
+		[ 200, $res_hdr, $self ];
 	}
 }
 
+sub mbox_gz {
+	my ($self, $cb, $fn) = @_;
+	$self->{base_url} = $self->{-inbox}->base_url($self->{env});
+	$self->{gz} = PublicInbox::GzipFilter::gzip_or_die();
+	$fn = to_filename($fn // 'no-subject');
+	$fn = 'no-subject' if $fn eq '';
+	# http://www.iana.org/assignments/media-types/application/gzip
+	response($self, $cb, [ qw(Content-Type application/gzip),
+		'Content-Disposition', "inline; filename=$fn.mbox.gz" ]);
+}
+
 # called by Plack::Util::foreach or similar (generic PSGI)
 sub getline {
 	my ($self) = @_;
 	my $cb = $self->{cb} or return;
 	while (my $smsg = $cb->($self)) {
-		my $mref = $self->{-inbox}->msg_by_smsg($smsg) or next;
-		my $h = PublicInbox::Eml->new($mref)->header_obj;
-		$self->zmore(msg_hdr($self, $h, $smsg->{mid}));
-		return $self->translate(msg_body($$mref));
+		my $eml = $self->{-inbox}->smsg_eml($smsg) or next;
+		$self->zmore(msg_hdr($self, $eml, $smsg->{mid}));
+		return $self->translate(msg_body($eml));
 	}
 	# signal that we're done and can return undef next call:
 	delete $self->{cb};
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index 8f75a3fbf..90a710a44 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -103,7 +103,7 @@ $mids = mids($mime->header_obj);
 my $third = $mids->[-1];
 $im->done;
 
-test_psgi(sub { $www->call(@_) }, sub {
+my $client = sub {
 	my ($cb) = @_;
 	$res = $cb->(GET("/v2test/$third/raw"));
 	$raw = $res->content;
@@ -122,12 +122,19 @@ test_psgi(sub { $www->call(@_) }, sub {
 
 	SKIP: {
 		eval { require IO::Uncompress::Gunzip };
-		skip 'IO::Uncompress::Gunzip missing', 4 if $@;
+		skip 'IO::Uncompress::Gunzip missing', 6 if $@;
+		my ($in, $out, $status);
+		my $req = GET('/v2test/a-mid@b/raw');
+		$req->header('Accept-Encoding' => 'gzip');
+		$res = $cb->($req);
+		is($res->header('Content-Encoding'), 'gzip', 'gzip encoding');
+		$in = $res->content;
+		IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+		is($out, $raw, 'gzip response matches');
 
 		$res = $cb->(GET('/v2test/a-mid@b/t.mbox.gz'));
-		my $out;
-		my $in = $res->content;
-		my $status = IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+		$in = $res->content;
+		$status = IO::Uncompress::Gunzip::gunzip(\$in => \$out);
 		unlike($out, qr/^From oldbug/sm, 'buggy "From_" line omitted');
 		like($out, qr/^hello world$/m, 'got first in t.mbox.gz');
 		like($out, qr/^hello world!$/m, 'got second in t.mbox.gz');
@@ -187,7 +194,34 @@ test_psgi(sub { $www->call(@_) }, sub {
 		like($raw, qr!>\Q$mid\E</a>!s, "Message-ID $mid shown");
 	}
 	like($raw, qr/\b3\+ messages\b/, 'thread overview shown');
+};
 
+test_psgi(sub { $www->call(@_) }, $client);
+SKIP: {
+	require_mods(qw(Plack::Test::ExternalServer), 37);
+	my $cfgpath = "$inboxdir/$$.config";
+	open my $fh, '>', $cfgpath or BAIL_OUT $!;
+	print $fh <<EOF or BAIL_OUT $!;
+[publicinbox "v2test"]
+	inboxdir = $inboxdir
+	address = test\@example.com
+EOF
+	close $fh or BAIL_OUT $!;
+	my $env = { PI_CONFIG => $cfgpath };
+	my $sock = tcp_server() or die;
+	my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err);
+	my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ];
+	my $td = start_script($cmd, $env, { 3 => $sock });
+	my ($h, $p) = ($sock->sockhost, $sock->sockport);
+	local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p";
+	Plack::Test::ExternalServer::test_psgi(client => $client);
+	$td->join('TERM');
+	open $fh, '<', $err or BAIL_OUT $!;
+	is(do { local $/; <$fh> }, '', 'no errors');
+};
+
+test_psgi(sub { $www->call(@_) }, sub {
+	my ($cb) = @_;
 	my $exp = [ qw(<a-mid@b> <reuse@mid>) ];
 	$mime->header_set('Message-Id', @$exp);
 	$mime->header_set('Subject', '4th dupe');
@@ -208,7 +242,7 @@ test_psgi(sub { $www->call(@_) }, sub {
 	$res = $cb->(GET('/v2test/reuse@mid/T/'));
 	$raw = $res->content;
 	like($raw, qr/\b4\+ messages\b/, 'thread overview shown with /T/');
-	@over = ($raw =~ m/^\d{4}-\d+-\d+\s+\d+:\d+ (.+)$/gm);
+	my @over = ($raw =~ m/^\d{4}-\d+-\d+\s+\d+:\d+ (.+)$/gm);
 	is_deeply(\@over, [ '<a', '` <a', '` <a', '` <a' ],
 		'duplicate messages share the same root');
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 17/43] wwwatomstream: simplify feed_update callers
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (15 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 16/43] mbox: async blob fetch for "single message" raw mboxrd Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 18/43] wwwatomstream: use PublicInbox::Inbox->modified for feed_updated Eric Wong
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We always return Z (UTC) times, anyways, so we'll always
use gmtime() on the seconds-after-the-epoch.
---
 Documentation/mknews.perl        | 2 +-
 lib/PublicInbox/WwwAtomStream.pm | 8 +++-----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index 965c30c1d..ba049d9e6 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -131,7 +131,7 @@ sub atom_start {
 	delete $astream->{emit_header};
 	my $ibx = $ctx->{-inbox};
 	my $title = PublicInbox::WwwAtomStream::title_tag($ibx->description);
-	my $updated = PublicInbox::WwwAtomStream::feed_updated(gmtime($mtime));
+	my $updated = PublicInbox::WwwAtomStream::feed_updated($mtime);
 	print $out <<EOF or die;
 <?xml version="1.0" encoding="us-ascii"?>
 <feed
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index c407e343f..c494fa226 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -103,7 +103,7 @@ sub atom_header {
 		qq(\nhref="$base_url"/>) .
 	qq(<link\nrel="self"\nhref="$self_url"/>) .
 	qq(<id>$page_id</id>) .
-	feed_updated(gmtime($mtime));
+	feed_updated($mtime);
 }
 
 # returns undef or string
@@ -125,9 +125,7 @@ sub feed_entry {
 		$irt = '';
 	}
 	my $href = $base . mid_href($mid) . '/';
-	my $t = msg_timestamp($hdr);
-	my @t = gmtime(defined $t ? $t : time);
-	my $updated = feed_updated(@t);
+	my $updated = feed_updated(msg_timestamp($hdr));
 
 	my $title = $hdr->header('Subject');
 	$title = '(no subject)' unless defined $title && $title ne '';
@@ -158,7 +156,7 @@ sub feed_entry {
 }
 
 sub feed_updated {
-	'<updated>' . strftime('%Y-%m-%dT%H:%M:%SZ', @_) . '</updated>';
+	'<updated>' . strftime('%Y-%m-%dT%H:%M:%SZ', gmtime(@_)) . '</updated>';
 }
 
 1;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 18/43] wwwatomstream: use PublicInbox::Inbox->modified for feed_updated
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (16 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 17/43] wwwatomstream: simplify feed_update callers Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 19/43] wwwatomstream: reuse $ctx as $self Eric Wong
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

stat(2) on the inboxdir is unlikely to be correct, now that
msgmap truncates its journal (rather than unlinking it).
---
 lib/PublicInbox/WwwAtomStream.pm | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index c494fa226..35b536c1c 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -93,8 +93,6 @@ sub atom_header {
 		$self_url .= 'new.atom';
 		$page_id = "mailto:$ibx->{-primary_address}";
 	}
-	my $mtime = (stat($ibx->{inboxdir}))[9] || time;
-
 	qq(<?xml version="1.0" encoding="us-ascii"?>\n) .
 	qq(<feed\nxmlns="http://www.w3.org/2005/Atom"\n) .
 	qq(xmlns:thr="http://purl.org/syndication/thread/1.0">) .
@@ -103,7 +101,7 @@ sub atom_header {
 		qq(\nhref="$base_url"/>) .
 	qq(<link\nrel="self"\nhref="$self_url"/>) .
 	qq(<id>$page_id</id>) .
-	feed_updated($mtime);
+	feed_updated($ibx->modified);
 }
 
 # returns undef or string

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 19/43] wwwatomstream: reuse $ctx as $self
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (17 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 18/43] wwwatomstream: use PublicInbox::Inbox->modified for feed_updated Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 20/43] xt/httpd-async-stream: allow more options Eric Wong
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

No need to deepen our object graph, here.
---
 lib/PublicInbox/WwwAtomStream.pm | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 35b536c1c..9f322d38f 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -22,15 +22,17 @@ sub close {}
 sub new {
 	my ($class, $ctx, $cb) = @_;
 	$ctx->{feed_base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	bless { cb => $cb || \&close, ctx => $ctx, emit_header => 1 }, $class;
+	$ctx->{cb} = $cb || \&close;
+	$ctx->{emit_header} = 1;
+	bless $ctx, $class;
 }
 
 sub response {
 	my ($class, $ctx, $code, $cb) = @_;
 	my $h = [ 'Content-Type' => 'application/atom+xml' ];
-	my $self = $class->new($ctx, $cb);
-	$self->{gzf} = gzf_maybe($h, $ctx->{env});
-	[ $code, $h, $self ]
+	$class->new($ctx, $cb);
+	$ctx->{gzf} = gzf_maybe($h, $ctx->{env});
+	[ $code, $h, $ctx ]
 }
 
 # called once for each message by PSGI server
@@ -38,7 +40,7 @@ sub getline {
 	my ($self) = @_;
 	my $buf = do {
 		if (my $middle = $self->{cb}) {
-			my $smsg = $middle->($self->{ctx});
+			my $smsg = $middle->($self);
 			feed_entry($self, $smsg) if $smsg;
 		}
 	} // (delete($self->{cb}) ? '</feed>' : undef);
@@ -106,8 +108,7 @@ sub atom_header {
 
 # returns undef or string
 sub feed_entry {
-	my ($self, $smsg) = @_;
-	my $ctx = $self->{ctx};
+	my ($ctx, $smsg) = @_;
 	my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return '';
 	my $hdr = $eml->header_obj;
 	my $mid = $smsg->{mid};
@@ -136,7 +137,7 @@ sub feed_entry {
 	$email = ascii_html($email);
 
 	my $s = '';
-	if (delete $self->{emit_header}) {
+	if (delete $ctx->{emit_header}) {
 		$s .= atom_header($ctx, $title);
 	}
 	$s .= "<entry><author><name>$name</name><email>$email</email>" .

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 20/43] xt/httpd-async-stream: allow more options
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (18 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 19/43] wwwatomstream: reuse $ctx as $self Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 21/43] wwwatomstream: support async blob fetch Eric Wong
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We want to be able to parallelize and stress test more
endpoints and toggle `--compressed' and possibly other
options in curl.
---
 xt/httpd-async-stream.t | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/xt/httpd-async-stream.t b/xt/httpd-async-stream.t
index 29bcb6125..22a96875d 100644
--- a/xt/httpd-async-stream.t
+++ b/xt/httpd-async-stream.t
@@ -15,7 +15,12 @@ my $curl = which('curl') or plan skip_all => "curl(1) missing for $0";
 my ($tmpdir, $for_destroy) = tmpdir();
 require_mods(qw(DBD::SQLite));
 my $JOBS = $ENV{TEST_JOBS} // 4;
-diag "TEST_JOBS=$JOBS";
+my $endpoint = $ENV{TEST_ENDPOINT} // 'all.mbox.gz';
+my $curl_opt = $ENV{TEST_CURL_OPT} // '';
+diag "TEST_JOBS=$JOBS TEST_ENDPOINT=$endpoint TEST_CURL_OPT=$curl_opt";
+
+# we set Host: to ensure stable results across test runs
+my @CURL_OPT = (qw(-HHost:example.com -sSf), split(' ', $curl_opt));
 
 my $make_local_server = sub {
 	my $pi_config = "$tmpdir/config";
@@ -38,7 +43,7 @@ address = test\@example.com
 	my $cmd = [ '-httpd', "--stdout=$out", "--stderr=$err", '-W0' ];
 	my $host_port = $http->sockhost.':'.$http->sockport;
 	push @$cmd, "-lhttp://$host_port";
-	my $url = "$host_port/test/all.mbox.gz";
+	my $url = "$host_port/test/$endpoint";
 	print STDERR "# CMD ". join(' ', @$cmd). "\n";
 	my $env = { PI_CONFIG => $pi_config };
 	(start_script($cmd, $env, $rdr), $url);
@@ -53,7 +58,7 @@ my $do_get_all = sub {
 	my ($buf, $nr);
 	my $bytes = 0;
 	my $t0 = now();
-	my ($rd, $pid) = popen_rd([$curl, qw(-HHost:example.com -sSf), $url]);
+	my ($rd, $pid) = popen_rd([$curl, @CURL_OPT, $url]);
 	while (1) {
 		$nr = sysread($rd, $buf, 65536);
 		last if !$nr;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 21/43] wwwatomstream: support async blob fetch
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (19 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 20/43] xt/httpd-async-stream: allow more options Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 22/43] wwwstream: reduce object graph depth Eric Wong
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This allows -httpd to handle other requests while waiting
for git to retrieve and decode blobs.  We'll also break
apart t/psgi_v2.t further to ensure tests run against
-httpd in addition to generic PSGI testing.

Using xt/httpd-async-stream.t to test against clones of meta@public-inbox.org
shows a 10-12% performance improvement with the following env:
TEST_JOBS=1000 TEST_CURL_OPT=--compressed TEST_ENDPOINT=new.atom
---
 Documentation/mknews.perl        |  7 +--
 lib/PublicInbox/WwwAtomStream.pm | 74 +++++++++++++++++++++++----
 t/psgi_v2.t                      | 86 ++++++++++++++++++++------------
 3 files changed, 123 insertions(+), 44 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index ba049d9e6..2d22d147e 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -147,9 +147,10 @@ EOF
 }
 
 sub mime2atom  {
-	my ($out, $astream, $mime, $ctx) = @_;
-	my $smsg = bless { mime => $mime }, 'PublicInbox::Smsg';
-	if (defined(my $str = $astream->feed_entry($smsg))) {
+	my ($out, $astream, $eml, $ctx) = @_;
+	my $smsg = bless {}, 'PublicInbox::Smsg';
+	$smsg->populate($eml);
+	if (defined(my $str = $astream->feed_entry($smsg, $eml))) {
 		print $out $str or die;
 	}
 }
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 9f322d38f..583309228 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -15,9 +15,11 @@ use PublicInbox::Address;
 use PublicInbox::Hval qw(ascii_html mid_href);
 use PublicInbox::MsgTime qw(msg_timestamp);
 use PublicInbox::GzipFilter qw(gzf_maybe);
+use PublicInbox::GitAsyncCat;
 
-# called by PSGI server after getline:
-sub close {}
+# called by generic PSGI server after getline,
+# and also by PublicInbox::HTTP::close
+sub close { !!delete($_[0]->{http_out}) }
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
@@ -27,12 +29,62 @@ sub new {
 	bless $ctx, $class;
 }
 
+# called by PublicInbox::DS::write
+sub atom_async_next {
+	my ($http) = @_; # PublicInbox::HTTP
+	atom_async_step($http->{forward});
+}
+
+# this is public-inbox-httpd-specific
+sub atom_blob_cb { # git->cat_async callback
+	my ($bref, $oid, $type, $size, $ctx) = @_;
+	my $http = $ctx->{env}->{'psgix.io'} or return; # client abort
+	my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg';
+	if (!defined($oid)) {
+		# it's possible to have TOCTOU if an admin runs
+		# public-inbox-(edit|purge), just move onto the next message
+		return $http->next_step(\&atom_async_next);
+	} else {
+		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
+	}
+	my $buf = feed_entry($ctx, $smsg, PublicInbox::Eml->new($bref));
+	if (my $gzf = $ctx->{gzf}) {
+		$buf = $gzf->translate($buf);
+	}
+	# PublicInbox::HTTP::{Chunked,Identity}::write
+	$ctx->{http_out}->write($buf);
+
+	$http->next_step(\&atom_async_next);
+}
+
+sub atom_async_step { # this is public-inbox-httpd-specific
+	my ($ctx) = @_;
+	if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) {
+		git_async_cat($ctx->{-inbox}->git, $smsg->{blob},
+				\&atom_blob_cb, $ctx);
+	} elsif (my $out = delete $ctx->{http_out}) {
+		if (my $gzf = delete $ctx->{gzf}) {
+			$out->write($gzf->zflush);
+		}
+		$out->close;
+	}
+}
+
 sub response {
 	my ($class, $ctx, $code, $cb) = @_;
-	my $h = [ 'Content-Type' => 'application/atom+xml' ];
+	my $res_hdr = [ 'Content-Type' => 'application/atom+xml' ];
 	$class->new($ctx, $cb);
-	$ctx->{gzf} = gzf_maybe($h, $ctx->{env});
-	[ $code, $h, $ctx ]
+	$ctx->{gzf} = gzf_maybe($res_hdr, $ctx->{env});
+	if ($ctx->{env}->{'pi-httpd.async'}) {
+		sub {
+			my ($wcb) = @_; # -httpd provided write callback
+			$ctx->{http_out} = $wcb->([200, $res_hdr]);
+			$ctx->{env}->{'psgix.io'}->{forward} = $ctx;
+			atom_async_step($ctx); # start stepping
+		};
+	} else {
+		[ $code, $res_hdr, $ctx ];
+	}
 }
 
 # called once for each message by PSGI server
@@ -40,8 +92,13 @@ sub getline {
 	my ($self) = @_;
 	my $buf = do {
 		if (my $middle = $self->{cb}) {
-			my $smsg = $middle->($self);
-			feed_entry($self, $smsg) if $smsg;
+			if (my $smsg = $middle->($self)) {
+				my $eml = $self->{-inbox}->smsg_eml($smsg) or
+						return '';
+				feed_entry($self, $smsg, $eml);
+			} else {
+				undef;
+			}
 		}
 	} // (delete($self->{cb}) ? '</feed>' : undef);
 
@@ -108,8 +165,7 @@ sub atom_header {
 
 # returns undef or string
 sub feed_entry {
-	my ($ctx, $smsg) = @_;
-	my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return '';
+	my ($ctx, $smsg, $eml) = @_;
 	my $hdr = $eml->header_obj;
 	my $mid = $smsg->{mid};
 	my $irt = PublicInbox::View::in_reply_to($hdr);
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index 90a710a44..4ab9601c0 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -14,6 +14,36 @@ use_ok($_) for (qw(HTTP::Request::Common Plack::Test));
 use_ok 'PublicInbox::WWW';
 use_ok 'PublicInbox::V2Writable';
 my ($inboxdir, $for_destroy) = tmpdir();
+my $cfgpath = "$inboxdir/$$.config";
+SKIP: {
+	require_mods(qw(Plack::Test::ExternalServer), 1);
+	open my $fh, '>', $cfgpath or BAIL_OUT $!;
+	print $fh <<EOF or BAIL_OUT $!;
+[publicinbox "v2test"]
+	inboxdir = $inboxdir
+	address = test\@example.com
+EOF
+	close $fh or BAIL_OUT $!;
+}
+
+my $run_httpd = sub {
+	my ($client, $skip) = @_;
+	SKIP: {
+		require_mods(qw(Plack::Test::ExternalServer), $skip);
+		my $env = { PI_CONFIG => $cfgpath };
+		my $sock = tcp_server() or die;
+		my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err);
+		my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ];
+		my $td = start_script($cmd, $env, { 3 => $sock });
+		my ($h, $p) = ($sock->sockhost, $sock->sockport);
+		local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p";
+		Plack::Test::ExternalServer::test_psgi(client => $client);
+		$td->join('TERM');
+		open my $fh, '<', $err or BAIL_OUT $!;
+		is(do { local $/; <$fh> }, '', 'no errors');
+	}
+};
+
 my $ibx = {
 	inboxdir => $inboxdir,
 	name => 'test-v2writable',
@@ -60,7 +90,7 @@ EOF
 my $config = PublicInbox::Config->new(\$cfg);
 my $www = PublicInbox::WWW->new($config);
 my ($res, $raw, @from_);
-test_psgi(sub { $www->call(@_) }, sub {
+my $client0 = sub {
 	my ($cb) = @_;
 	$res = $cb->(GET('/v2test/description'));
 	like($res->content, qr!\$INBOX_DIR/description missing!,
@@ -90,7 +120,9 @@ test_psgi(sub { $www->call(@_) }, sub {
 	@bodies = ($res->content =~ /^(hello [^<]+)$/mg);
 	is_deeply(\@bodies, [ "hello world!\n", "hello world\n" ],
 		'new.html ordering is chronological');
-});
+};
+test_psgi(sub { $www->call(@_) }, $client0);
+$run_httpd->($client0, 9);
 
 $mime->header_set('Message-Id', 'a-mid@b');
 $mime->body_set("hello ghosts\n");
@@ -103,7 +135,7 @@ $mids = mids($mime->header_obj);
 my $third = $mids->[-1];
 $im->done;
 
-my $client = sub {
+my $client1 = sub {
 	my ($cb) = @_;
 	$res = $cb->(GET("/v2test/$third/raw"));
 	$raw = $res->content;
@@ -196,32 +228,10 @@ my $client = sub {
 	like($raw, qr/\b3\+ messages\b/, 'thread overview shown');
 };
 
-test_psgi(sub { $www->call(@_) }, $client);
-SKIP: {
-	require_mods(qw(Plack::Test::ExternalServer), 37);
-	my $cfgpath = "$inboxdir/$$.config";
-	open my $fh, '>', $cfgpath or BAIL_OUT $!;
-	print $fh <<EOF or BAIL_OUT $!;
-[publicinbox "v2test"]
-	inboxdir = $inboxdir
-	address = test\@example.com
-EOF
-	close $fh or BAIL_OUT $!;
-	my $env = { PI_CONFIG => $cfgpath };
-	my $sock = tcp_server() or die;
-	my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err);
-	my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ];
-	my $td = start_script($cmd, $env, { 3 => $sock });
-	my ($h, $p) = ($sock->sockhost, $sock->sockport);
-	local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p";
-	Plack::Test::ExternalServer::test_psgi(client => $client);
-	$td->join('TERM');
-	open $fh, '<', $err or BAIL_OUT $!;
-	is(do { local $/; <$fh> }, '', 'no errors');
-};
+test_psgi(sub { $www->call(@_) }, $client1);
+$run_httpd->($client1, 38);
 
-test_psgi(sub { $www->call(@_) }, sub {
-	my ($cb) = @_;
+{
 	my $exp = [ qw(<a-mid@b> <reuse@mid>) ];
 	$mime->header_set('Message-Id', @$exp);
 	$mime->header_set('Subject', '4th dupe');
@@ -230,10 +240,12 @@ test_psgi(sub { $www->call(@_) }, sub {
 	$im->done;
 	my @h = $mime->header('Message-ID');
 	is_deeply($exp, \@h, 'reused existing Message-ID');
-
 	$config->each_inbox(sub { $_[0]->search->reopen });
+}
 
-	$res = $cb->(GET('/v2test/new.atom'));
+my $client2 = sub {
+	my ($cb) = @_;
+	my $res = $cb->(GET('/v2test/new.atom'));
 	my @ids = ($res->content =~ m!<id>urn:uuid:([^<]+)</id>!sg);
 	my %ids;
 	$ids{$_}++ for @ids;
@@ -256,7 +268,11 @@ test_psgi(sub { $www->call(@_) }, sub {
 	is($res->code, 200, 'got info refs for dumb clones w/ .git suffix');
 	$res = $cb->(GET('/v2test/info/refs'));
 	is($res->code, 404, 'v2 git URL w/o shard fails');
+};
 
+test_psgi(sub { $www->call(@_) }, $client2);
+$run_httpd->($client2, 8);
+{
 	# ensure conflicted attachments can be resolved
 	foreach my $body (qw(old new)) {
 		$mime = eml_load "t/psgi_v2-$body.eml";
@@ -264,7 +280,11 @@ test_psgi(sub { $www->call(@_) }, sub {
 	}
 	$im->done;
 	$config->each_inbox(sub { $_[0]->search->reopen });
-	$res = $cb->(GET('/v2test/a@dup/'));
+}
+
+my $client3 = sub {
+	my ($cb) = @_;
+	my $res = $cb->(GET('/v2test/a@dup/'));
 	my @links = ($res->content =~ m!"\.\./([^/]+/2-attach\.txt)\"!g);
 	is(scalar(@links), 2, 'both attachment links exist');
 	isnt($links[0], $links[1], 'attachment links are different');
@@ -276,7 +296,9 @@ test_psgi(sub { $www->call(@_) }, sub {
 	}
 	$res = $cb->(GET('/v2test/?t=1970'.'01'.'01'.'000000'));
 	is($res->code, 404, '404 for out-of-range t= param');
-});
+};
+test_psgi(sub { $www->call(@_) }, $client3);
+$run_httpd->($client3, 4);
 
 done_testing();
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 22/43] wwwstream: reduce object graph depth
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (20 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 21/43] wwwatomstream: support async blob fetch Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 23/43] wwwstream: reduce blob fetch paths for ->getline Eric Wong
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Like with WwwAtomStream and MboxGz, we can bless the existing
$ctx object directly to avoid allocating a new hashref.  We'll
also switch from "->" to "::" to reduce stack utilization.
---
 Documentation/mknews.perl     |  2 +-
 lib/PublicInbox/Feed.pm       |  2 +-
 lib/PublicInbox/SearchView.pm |  2 +-
 lib/PublicInbox/View.pm       |  8 ++---
 lib/PublicInbox/WwwStream.pm  | 59 +++++++++++++++--------------------
 5 files changed, 33 insertions(+), 40 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index 2d22d147e..1bd704e68 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -112,7 +112,7 @@ sub html_start {
 	my ($out, $ctx) = @_;
 	require PublicInbox::WwwStream;
 	$ctx->{www} = My::MockObject->new(style => '');
-	my $www_stream = PublicInbox::WwwStream->new($ctx);
+	my $www_stream = PublicInbox::WwwStream::init($ctx);
 	print $out $www_stream->_html_top, '<pre>' or die;
 }
 
diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index 4c1056b46..f25dd267e 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -70,7 +70,7 @@ sub new_html {
 	$ctx->{-html_tip} = '<pre>';
 	$ctx->{-upfx} = '';
 	$ctx->{-hr} = 1;
-	PublicInbox::WwwStream->response($ctx, 200, \&new_html_i);
+	PublicInbox::WwwStream::response($ctx, 200, \&new_html_i);
 }
 
 # private subs
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index d53a533e5..71c3ae707 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -82,7 +82,7 @@ retry:
 			$cb = mset_summary($ctx, $mset, $q);
 		}
 	}
-	PublicInbox::WwwStream->response($ctx, $code, $cb);
+	PublicInbox::WwwStream::response($ctx, $code, $cb);
 }
 
 # display non-nested search results similar to what users expect from
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 0bc2b06e4..4d6f44e0b 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -66,7 +66,7 @@ sub msg_page {
 	$ctx->{mhref} = $next ? '../'.mid_href($smsg->{mid}).'/' : '';
 	multipart_text_as_html($mime, $ctx);
 	$ctx->{-html_tip} = (${delete $ctx->{obuf}} .= '</pre><hr>');
-	PublicInbox::WwwStream->response($ctx, 200, \&msg_page_i);
+	PublicInbox::WwwStream::response($ctx, 200, \&msg_page_i);
 }
 
 sub msg_page_more { # cold
@@ -413,7 +413,7 @@ sub stream_thread ($$) {
 	$ctx->{-title_html} = ascii_html($smsg->{subject});
 	$ctx->{-html_tip} = thread_eml_entry($ctx, $level, $smsg, $eml);
 	$ctx->{-queue} = \@q;
-	PublicInbox::WwwStream->response($ctx, 200, \&stream_thread_i);
+	PublicInbox::WwwStream::response($ctx, 200, \&stream_thread_i);
 }
 
 # /$INBOX/$MESSAGE_ID/t/
@@ -459,7 +459,7 @@ sub thread_html {
 	$ctx->{-title_html} = ascii_html($smsg->{subject});
 	$ctx->{-html_tip} = '<pre>'.eml_entry($ctx, $smsg, $eml, scalar @$msgs);
 	$ctx->{msgs} = $msgs;
-	PublicInbox::WwwStream->response($ctx, 200, \&thread_html_i);
+	PublicInbox::WwwStream::response($ctx, 200, \&thread_html_i);
 }
 
 sub thread_html_i { # PublicInbox::WwwStream::getline callback
@@ -1213,7 +1213,7 @@ sub index_topics {
 	if (@$msgs) {
 		walk_thread(thread_results($ctx, $msgs), $ctx, \&acc_topic);
 	}
-	PublicInbox::WwwStream->response($ctx, dump_topics($ctx), \&index_nav);
+	PublicInbox::WwwStream::response($ctx, dump_topics($ctx), \&index_nav);
 }
 
 sub thread_adj_level {
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 8623440b8..c80440d14 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -27,28 +27,24 @@ sub base_url ($) {
 	$base_url;
 }
 
-sub new {
-	my ($class, $ctx, $cb) = @_;
-
-	bless {
-		nr => 0,
-		cb => $cb,
-		ctx => $ctx,
-		base_url => base_url($ctx),
-	}, $class;
+sub init {
+	my ($ctx, $cb) = @_;
+	$ctx->{cb} = $cb;
+	$ctx->{base_url} = base_url($ctx);
+	$ctx->{nr} = 0;
+	bless $ctx, __PACKAGE__;
 }
 
 sub response {
-	my ($class, $ctx, $code, $cb) = @_;
+	my ($ctx, $code, $cb) = @_;
 	my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ];
-	my $self = $class->new($ctx, $cb);
-	$self->{gzf} = gzf_maybe($h, $ctx->{env});
-	[ $code, $h, $self ]
+	init($ctx, $cb);
+	$ctx->{gzf} = gzf_maybe($h, $ctx->{env});
+	[ $code, $h, $ctx ]
 }
 
 sub _html_top ($) {
-	my ($self) = @_;
-	my $ctx = $self->{ctx};
+	my ($ctx) = @_;
 	my $ibx = $ctx->{-inbox};
 	my $desc = ascii_html($ibx->description);
 	my $title = delete($ctx->{-title_html}) // $desc;
@@ -89,14 +85,13 @@ sub code_footer ($) {
 }
 
 sub _html_end {
-	my ($self) = @_;
+	my ($ctx) = @_;
 	my $urls = 'Archives are clonable:';
-	my $ctx = $self->{ctx};
 	my $ibx = $ctx->{-inbox};
 	my $desc = ascii_html($ibx->description);
 
 	my @urls;
-	my $http = $self->{base_url};
+	my $http = $ctx->{base_url};
 	my $max = $ibx->max_git_epoch;
 	my $dir = (split(m!/!, $http))[-1];
 	my %seen = ($http => 1);
@@ -163,41 +158,39 @@ EOF
 
 # callback for HTTP.pm (and any other PSGI servers)
 sub getline {
-	my ($self) = @_;
-	my $nr = $self->{nr}++;
+	my ($ctx) = @_;
+	my $nr = $ctx->{nr}++;
 
 	my $buf = do {
 		if ($nr == 0) {
-			_html_top($self);
-		} elsif (my $middle = $self->{cb}) {
-			$middle->($nr, $self->{ctx});
+			_html_top($ctx);
+		} elsif (my $middle = $ctx->{cb}) {
+			$middle->($nr, $ctx);
 		}
-	} // (delete($self->{cb}) ? _html_end($self) : undef);
+	} // (delete($ctx->{cb}) ? _html_end($ctx) : undef);
 
 	# gzf may be GzipFilter, `undef' or `0'
-	my $gzf = $self->{gzf} or return $buf;
+	my $gzf = $ctx->{gzf} or return $buf;
 
 	return $gzf->translate($buf) if defined $buf;
-	$self->{gzf} = 0; # next call to ->getline returns $buf (== undef)
+	$ctx->{gzf} = 0; # next call to ->getline returns $buf (== undef)
 	$gzf->translate(undef);
 }
 
 sub html_oneshot ($$;$) {
 	my ($ctx, $code, $sref) = @_;
-	my $self = bless {
-		ctx => $ctx,
-		base_url => base_url($ctx),
-	}, __PACKAGE__;
+	$ctx->{base_url} = base_url($ctx);
+	bless $ctx, __PACKAGE__;
 	my @x;
 	my $h = [ 'Content-Type' => 'text/html; charset=UTF-8',
 		'Content-Length' => undef ];
 	if (my $gzf = gzf_maybe($h, $ctx->{env})) {
-		$gzf->zmore(_html_top($self));
+		$gzf->zmore(_html_top($ctx));
 		$gzf->zmore($$sref) if $sref;
-		$x[0] = $gzf->zflush(_html_end($self));
+		$x[0] = $gzf->zflush(_html_end($ctx));
 		$h->[3] = length($x[0]);
 	} else {
-		@x = (_html_top($self), $sref ? $$sref : (), _html_end($self));
+		@x = (_html_top($ctx), $sref ? $$sref : (), _html_end($ctx));
 		$h->[3] += bytes::length($_) for @x;
 	}
 	[ $code, $h, \@x ]

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 23/43] wwwstream: reduce blob fetch paths for ->getline
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (21 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 22/43] wwwstream: reduce object graph depth Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 24/43] www: start making gzipfilter the parent response class Eric Wong
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This will make it easier to support asynchronous blob
retrievals.  The `$ctx->{nr}' counter is no longer implicitly
supplied since many users didn't care for it, so stack overhead
is slightly reduced.
---
 Documentation/mknews.perl     |   4 +-
 lib/PublicInbox/Feed.pm       |   3 +-
 lib/PublicInbox/SearchView.pm |  28 +++---
 lib/PublicInbox/View.pm       | 181 +++++++++++++++++-----------------
 lib/PublicInbox/WwwStream.pm  |  19 ++--
 5 files changed, 113 insertions(+), 122 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index 1bd704e68..51d54b716 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -37,7 +37,7 @@ if ($dst eq 'NEWS') {
 	my $ibx = My::MockObject->new(
 		description => 'public-inbox releases',
 		over => undef,
-		search => 1, # for WwwStream:_html_top
+		search => 1, # for WwwStream::html_top
 		base_url => "$base_url/",
 	);
 	$ibx->{-primary_address} = $addr;
@@ -113,7 +113,7 @@ sub html_start {
 	require PublicInbox::WwwStream;
 	$ctx->{www} = My::MockObject->new(style => '');
 	my $www_stream = PublicInbox::WwwStream::init($ctx);
-	print $out $www_stream->_html_top, '<pre>' or die;
+	print $out $www_stream->html_top, '<pre>' or die;
 }
 
 sub html_end {
diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index f25dd267e..b15fc3a09 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -50,7 +50,8 @@ sub generate_html_index {
 }
 
 sub new_html_i {
-	my ($nr, $ctx) = @_;
+	my ($ctx) = @_;
+	return $ctx->html_top if exists $ctx->{-html_tip};
 	my $msgs = $ctx->{msgs};
 	while (my $smsg = shift @$msgs) {
 		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 71c3ae707..eeebdfa31 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -10,12 +10,11 @@ use PublicInbox::Smsg;
 use PublicInbox::Hval qw(ascii_html obfuscate_addrs mid_href);
 use PublicInbox::View;
 use PublicInbox::WwwAtomStream;
+use PublicInbox::WwwStream qw(html_oneshot);
 use PublicInbox::SearchThread;
 our $LIM = 200;
 my %rmap_inc;
 
-my $noop = sub {};
-
 sub mbox_results {
 	my ($ctx) = @_;
 	my $q = PublicInbox::SearchQuery->new($ctx->{qp});
@@ -48,7 +47,7 @@ sub sres_top_html {
 		relevance => $q->{r},
 		asc => $asc,
 	};
-	my ($mset, $total, $err, $cb);
+	my ($mset, $total, $err, $html);
 retry:
 	eval {
 		$mset = $srch->query($query, $opts);
@@ -58,8 +57,7 @@ retry:
 	ctx_prepare($q, $ctx);
 	if ($err) {
 		$code = 400;
-		$ctx->{-html_tip} = '<pre>'.err_txt($ctx, $err).'</pre><hr>';
-		$cb = $noop;
+		$html = '<pre>'.err_txt($ctx, $err).'</pre><hr>';
 	} elsif ($total == 0) {
 		if (defined($ctx->{-uxs_retried})) {
 			# undo retry damage:
@@ -70,19 +68,16 @@ retry:
 			goto retry;
 		}
 		$code = 404;
-		$ctx->{-html_tip} = "<pre>\n[No results found]</pre><hr>";
-		$cb = $noop;
+		$html = "<pre>\n[No results found]</pre><hr>";
 	} else {
 		return adump($_[0], $mset, $q, $ctx) if $x eq 'A';
 
 		$ctx->{-html_tip} = search_nav_top($mset, $q, $ctx);
-		if ($x eq 't') {
-			$cb = mset_thread($ctx, $mset, $q);
-		} else {
-			$cb = mset_summary($ctx, $mset, $q);
-		}
+		return mset_thread($ctx, $mset, $q) if $x eq 't';
+		mset_summary($ctx, $mset, $q); # appends to {-html_tip}
+		$html = '';
 	}
-	PublicInbox::WwwStream::response($ctx, $code, $cb);
+	html_oneshot($ctx, $code);
 }
 
 # display non-nested search results similar to what users expect from
@@ -122,7 +117,7 @@ sub mset_summary {
 		$$res .= "$pfx  - by $f @ $date UTC [$pct%]\n\n";
 	}
 	$$res .= search_nav_bot($mset, $q);
-	$noop;
+	undef;
 }
 
 # shorten "/full/path/to/Foo/Bar.pm" to "Foo/Bar.pm" so error
@@ -292,12 +287,13 @@ sub mset_thread {
 
 	@$msgs = reverse @$msgs if $r;
 	$ctx->{msgs} = $msgs;
-	\&mset_thread_i;
+	PublicInbox::WwwStream::response($ctx, 200, \&mset_thread_i);
 }
 
 # callback for PublicInbox::WwwStream::getline
 sub mset_thread_i {
-	my ($nr, $ctx) = @_;
+	my ($ctx) = @_;
+	return $ctx->html_top if exists $ctx->{-html_tip};
 	my $msgs = $ctx->{msgs} or return;
 	while (my $smsg = pop @$msgs) {
 		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 4d6f44e0b..243528263 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -27,60 +27,60 @@ use constant TCHILD => '` ';
 sub th_pfx ($) { $_[0] == 0 ? '' : TCHILD };
 
 sub msg_page_i {
-	my ($nr, $ctx) = @_;
-	if (my $more = delete $ctx->{more}) { # unlikely
-		# fake an EOF if $more retrieval fails;
-		eval { msg_page_more($ctx, $nr, @$more) };
-	} elsif (my $hdr = delete $ctx->{hdr}) {
-		# fake an EOF if generating the footer fails;
-		# we want to at least show the message if something
-		# here crashes:
-		eval { html_footer($ctx, $hdr) };
-	} else {
-		undef
+	my ($ctx) = @_;
+	my $cur = delete $ctx->{smsg} or return; # undef: done
+	my $nxt;
+	if (my $over = $ctx->{-inbox}->over) {
+		$nxt = $ctx->{smsg} = $over->next_by_mid(@{$ctx->{next_arg}});
 	}
+	$ctx->{mhref} = ($ctx->{nr} || $nxt) ?
+			"../${\mid_href($cur->{mid})}/" : '';
+	my $eml = $ctx->{-inbox}->smsg_eml($cur) or return;
+	my $hdr = $eml->header_obj;
+	my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx);
+	multipart_text_as_html($eml, $ctx);
+	delete $ctx->{obuf};
+	$$obuf .= '</pre><hr>';
+	# we want to at least show the message if something
+	# here crashes:
+	eval { $$obuf .= html_footer($ctx, $ctx->{first_hdr}) } if !$nxt;
+	$$obuf;
+}
+
+# /$INBOX/$MESSAGE_ID/ for unindexed v1 inboxes
+sub no_over_i {
+	my ($ctx) = @_;
+	my $eml = delete $ctx->{eml} or return;
+	my $hdr = $eml->header_obj;
+	$ctx->{mhref} = '';
+	my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx);
+	multipart_text_as_html($eml, $ctx);
+	delete $ctx->{obuf};
+	$$obuf .= '</pre><hr>';
+	eval { $$obuf .= html_footer($ctx, $hdr) };
+	$$obuf
+}
+
+sub no_over_html ($) {
+	my ($ctx) = @_;
+	my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return; # 404
+	$ctx->{eml} = PublicInbox::Eml->new($bref);
+	PublicInbox::WwwStream::response($ctx, 200, \&no_over_i);
 }
 
 # public functions: (unstable)
 
 sub msg_page {
 	my ($ctx) = @_;
-	my $mid = $ctx->{mid};
 	my $ibx = $ctx->{-inbox};
-	my ($smsg, $first, $next);
-	if (my $over = $ibx->over) {
-		my ($id, $prev);
-		$smsg = $over->next_by_mid($mid, \$id, \$prev) or return;
-		$first = $ibx->msg_by_smsg($smsg) or return;
-		$next = $over->next_by_mid($mid, \$id, \$prev);
-		$ctx->{more} = [ $id, $prev, $next ] if $next;
-	} else {
-		$first = $ibx->msg_by_mid($mid) or return;
-	}
-	my $mime = PublicInbox::Eml->new($first);
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
-	my $hdr = $ctx->{hdr} = $mime->header_obj;
-	$ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx, 0);
-	$ctx->{smsg} = $smsg;
-	# $next cannot be true w/o $smsg being defined:
-	$ctx->{mhref} = $next ? '../'.mid_href($smsg->{mid}).'/' : '';
-	multipart_text_as_html($mime, $ctx);
-	$ctx->{-html_tip} = (${delete $ctx->{obuf}} .= '</pre><hr>');
+	my $over = $ibx->over or return no_over_html($ctx);
+	my ($id, $prev);
+	my $next_arg = $ctx->{next_arg} = [ $ctx->{mid}, \$id, \$prev ];
+	$ctx->{smsg} = $over->next_by_mid(@$next_arg) or return;
 	PublicInbox::WwwStream::response($ctx, 200, \&msg_page_i);
 }
 
-sub msg_page_more { # cold
-	my ($ctx, $nr, $id, $prev, $smsg) = @_;
-	my $ibx = $ctx->{-inbox};
-	my $next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev);
-	$ctx->{more} = [ $id, $prev, $next ] if $next;
-	my $eml = $ibx->smsg_eml($smsg) or return '';
-	$ctx->{mhref} = '../' . mid_href($smsg->{mid}) . '/';
-	$ctx->{obuf} = _msg_page_prepare_obuf($eml->header_obj, $ctx, $nr);
-	multipart_text_as_html($eml, $ctx);
-	${delete $ctx->{obuf}} .= '</pre><hr>';
-}
-
 # /$INBOX/$MESSAGE_ID/#R
 sub msg_reply ($$) {
 	my ($ctx, $hdr) = @_;
@@ -377,42 +377,40 @@ sub thread_eml_entry {
 	$beg . '<pre>' . eml_entry($ctx, $smsg, $eml, 0) . '</pre>' . $end;
 }
 
-sub stream_thread_i { # PublicInbox::WwwStream::getline callback
-	my ($nr, $ctx) = @_;
-	return unless exists($ctx->{skel});
-	my $q = $ctx->{-queue};
+sub next_in_queue ($;$) {
+	my ($q, $ghost_ok) = @_;
 	while (@$q) {
-		my $level = shift @$q;
-		my $node = shift @$q or next;
+		my ($level, $smsg) = splice(@$q, 0, 2);
 		my $cl = $level + 1;
-		unshift @$q, map { ($cl, $_) } @{$node->{children}};
-		if (my $eml = $ctx->{-inbox}->smsg_eml($node)) {
-			return thread_eml_entry($ctx, $level, $node, $eml);
-		} else {
-			return ghost_index_entry($ctx, $level, $node);
-		}
+		unshift @$q, map { ($cl, $_) } @{$smsg->{children}};
+		return ($level, $smsg) if $ghost_ok || exists($smsg->{blob});
 	}
-	join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{skel}};
+	undef;
 }
 
-sub stream_thread ($$) {
-	my ($rootset, $ctx) = @_;
-	my $ibx = $ctx->{-inbox};
-	my @q = map { (0, $_) } @$rootset;
-	my ($smsg, $eml, $level);
-	while (@q) {
-		$level = shift @q;
-		$smsg = shift @q or next;
-		my $cl = $level + 1;
-		unshift @q, map { ($cl, $_) } @{$smsg->{children}};
-		$eml = $ibx->smsg_eml($smsg) and last;
+sub stream_thread_i { # PublicInbox::WwwStream::getline callback
+	my ($ctx) = @_;
+	return unless exists($ctx->{skel});
+	my $nr = $ctx->{nr}++;
+	my ($level, $smsg) = next_in_queue($ctx->{-queue}, $nr);
+
+	$smsg or return
+		join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{skel}};
+
+	my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return
+		ghost_index_entry($ctx, $level, $smsg);
+
+	if ($nr == 0) {
+		$ctx->{-title_html} = ascii_html($smsg->{subject});
+		$ctx->html_top . thread_eml_entry($ctx, $level, $smsg, $eml);
+	} else {
+		thread_eml_entry($ctx, $level, $smsg, $eml);
 	}
-	return missing_thread($ctx) unless $eml;
+}
 
-	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
-	$ctx->{-title_html} = ascii_html($smsg->{subject});
-	$ctx->{-html_tip} = thread_eml_entry($ctx, $level, $smsg, $eml);
-	$ctx->{-queue} = \@q;
+sub stream_thread ($$) {
+	my ($rootset, $ctx) = @_;
+	$ctx->{-queue} = [ map { (0, $_) } @$rootset ];
 	PublicInbox::WwwStream::response($ctx, 200, \&stream_thread_i);
 }
 
@@ -451,22 +449,21 @@ sub thread_html {
 	return stream_thread($rootset, $ctx) unless $ctx->{flat};
 
 	# flat display: lazy load the full message from smsg
-	my ($smsg, $eml);
-	while ($smsg = shift @$msgs) {
-		$eml = $ibx->smsg_eml($smsg) and last;
-	}
-	return missing_thread($ctx) unless $smsg;
-	$ctx->{-title_html} = ascii_html($smsg->{subject});
-	$ctx->{-html_tip} = '<pre>'.eml_entry($ctx, $smsg, $eml, scalar @$msgs);
 	$ctx->{msgs} = $msgs;
+	$ctx->{-html_tip} = '<pre>';
 	PublicInbox::WwwStream::response($ctx, 200, \&thread_html_i);
 }
 
 sub thread_html_i { # PublicInbox::WwwStream::getline callback
-	my ($nr, $ctx) = @_;
+	my ($ctx) = @_;
 	my $msgs = $ctx->{msgs} or return;
 	while (my $smsg = shift @$msgs) {
 		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
+		if (exists $ctx->{-html_tip}) {
+			$ctx->{-title_html} = ascii_html($smsg->{subject});
+			return $ctx->html_top .
+				eml_entry($ctx, $smsg, $eml, scalar @$msgs);
+		}
 		return eml_entry($ctx, $smsg, $eml, scalar @$msgs);
 	}
 	my ($skel) = delete @$ctx{qw(skel msgs)};
@@ -624,23 +621,23 @@ sub add_text_body { # callback for each_part
 }
 
 sub _msg_page_prepare_obuf {
-	my ($hdr, $ctx, $nr) = @_;
+	my ($hdr, $ctx) = @_;
 	my $over = $ctx->{-inbox}->over;
 	my $obfs_ibx = $ctx->{-obfs_ibx};
 	my $rv = '';
 	my $mids = mids_for_index($hdr);
-	if ($nr == 0) {
-		if ($ctx->{more}) {
+	my $nr = $ctx->{nr}++;
+	if ($nr) { # unlikely
+		$rv .= '<pre>';
+	} else {
+		$ctx->{first_hdr} = $hdr;
+		if ($ctx->{smsg}) {
 			$rv .=
 "<pre>WARNING: multiple messages have this Message-ID\n</pre>";
 		}
 		$rv .= "<pre\nid=b>"; # anchor for body start
-	} else {
-		$rv .= '<pre>';
-	}
-	if ($over) {
-		$ctx->{-upfx} = '../';
 	}
+	$ctx->{-upfx} = '../' if $over;
 	my @title; # (Subject[0], From[0])
 	for my $v ($hdr->header('From')) {
 		my @n = PublicInbox::Address::names($v);
@@ -681,7 +678,10 @@ sub _msg_page_prepare_obuf {
 		obfuscate_addrs($obfs_ibx, $v) if $obfs_ibx; # possible :P
 		$rv .= "Date: $v\n";
 	}
-	$ctx->{-title_html} = join(' - ', @title);
+	if (!$nr) { # first (and only) message, common case
+		$ctx->{-title_html} = join(' - ', @title);
+		$rv = $ctx->html_top . $rv;
+	}
 	if (scalar(@$mids) == 1) { # common case
 		my $mhtml = ascii_html($mids->[0]);
 		$rv .= "Message-ID: &lt;$mhtml&gt; ";
@@ -1160,8 +1160,9 @@ sub pagination_footer ($$) {
 	"<hr><pre>page: $next$prev</pre>";
 }
 
-sub index_nav { # callback for WwwStream
-	my (undef, $ctx) = @_;
+sub index_nav { # callback for WwwStream::getline
+	my ($ctx) = @_;
+	return $ctx->html_top if exists $ctx->{-html_tip};
 	pagination_footer($ctx, '.')
 }
 
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index c80440d14..4d82cbb48 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -31,7 +31,6 @@ sub init {
 	my ($ctx, $cb) = @_;
 	$ctx->{cb} = $cb;
 	$ctx->{base_url} = base_url($ctx);
-	$ctx->{nr} = 0;
 	bless $ctx, __PACKAGE__;
 }
 
@@ -43,7 +42,7 @@ sub response {
 	[ $code, $h, $ctx ]
 }
 
-sub _html_top ($) {
+sub html_top ($) {
 	my ($ctx) = @_;
 	my $ibx = $ctx->{-inbox};
 	my $desc = ascii_html($ibx->description);
@@ -159,15 +158,9 @@ EOF
 # callback for HTTP.pm (and any other PSGI servers)
 sub getline {
 	my ($ctx) = @_;
-	my $nr = $ctx->{nr}++;
-
-	my $buf = do {
-		if ($nr == 0) {
-			_html_top($ctx);
-		} elsif (my $middle = $ctx->{cb}) {
-			$middle->($nr, $ctx);
-		}
-	} // (delete($ctx->{cb}) ? _html_end($ctx) : undef);
+	my $cb = $ctx->{cb};
+	my $buf = $cb->($ctx) if $cb;
+	$buf //= delete($ctx->{cb}) ? _html_end($ctx) : undef;
 
 	# gzf may be GzipFilter, `undef' or `0'
 	my $gzf = $ctx->{gzf} or return $buf;
@@ -185,12 +178,12 @@ sub html_oneshot ($$;$) {
 	my $h = [ 'Content-Type' => 'text/html; charset=UTF-8',
 		'Content-Length' => undef ];
 	if (my $gzf = gzf_maybe($h, $ctx->{env})) {
-		$gzf->zmore(_html_top($ctx));
+		$gzf->zmore(html_top($ctx));
 		$gzf->zmore($$sref) if $sref;
 		$x[0] = $gzf->zflush(_html_end($ctx));
 		$h->[3] = length($x[0]);
 	} else {
-		@x = (_html_top($ctx), $sref ? $$sref : (), _html_end($ctx));
+		@x = (html_top($ctx), $sref ? $$sref : (), _html_end($ctx));
 		$h->[3] += bytes::length($_) for @x;
 	}
 	[ $code, $h, \@x ]

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 24/43] www: start making gzipfilter the parent response class
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (22 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 23/43] wwwstream: reduce blob fetch paths for ->getline Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 25/43] remove unused/redundant zlib-related imports Eric Wong
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Virtually all of our responses are going to be gzipped, anyways.
This will allow us to utilize zlib as a buffering layer and
share common code for async blob retrieval responses.

To streamline this and allow GzipFilter to be a parent class,
we'll replace the NoopFilter with a similar CompressNoop class
which emulates the two Compress::Raw::Zlib::Deflate methods we
use.

This drops a bunch of redundant code and will hopefully make
upcoming WwwStream changes easier to reason about.
---
 MANIFEST                         |  2 +-
 lib/PublicInbox/CompressNoop.pm  | 22 +++++++
 lib/PublicInbox/GzipFilter.pm    | 75 +++++++++++++++++++-----
 lib/PublicInbox/Mbox.pm          | 90 ++++++++---------------------
 lib/PublicInbox/MboxGz.pm        | 71 +++++------------------
 lib/PublicInbox/NoopFilter.pm    | 24 --------
 lib/PublicInbox/WwwAtomStream.pm | 98 ++++++++------------------------
 lib/PublicInbox/WwwListing.pm    |  3 +-
 lib/PublicInbox/WwwStatic.pm     |  3 +-
 9 files changed, 148 insertions(+), 240 deletions(-)
 create mode 100644 lib/PublicInbox/CompressNoop.pm
 delete mode 100644 lib/PublicInbox/NoopFilter.pm

diff --git a/MANIFEST b/MANIFEST
index 9b0f50203..963caad02 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -100,6 +100,7 @@ lib/PublicInbox/Admin.pm
 lib/PublicInbox/AdminEdit.pm
 lib/PublicInbox/AltId.pm
 lib/PublicInbox/Cgit.pm
+lib/PublicInbox/CompressNoop.pm
 lib/PublicInbox/Config.pm
 lib/PublicInbox/ContentHash.pm
 lib/PublicInbox/DS.pm
@@ -159,7 +160,6 @@ lib/PublicInbox/NNTP.pm
 lib/PublicInbox/NNTPD.pm
 lib/PublicInbox/NNTPdeflate.pm
 lib/PublicInbox/NewsWWW.pm
-lib/PublicInbox/NoopFilter.pm
 lib/PublicInbox/Over.pm
 lib/PublicInbox/OverIdx.pm
 lib/PublicInbox/ParentPipe.pm
diff --git a/lib/PublicInbox/CompressNoop.pm b/lib/PublicInbox/CompressNoop.pm
new file mode 100644
index 000000000..fe73c2d10
--- /dev/null
+++ b/lib/PublicInbox/CompressNoop.pm
@@ -0,0 +1,22 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# Provide the same methods as Compress::Raw::Zlib::Deflate but
+# does no transformation of outgoing data
+package PublicInbox::CompressNoop;
+use strict;
+use Compress::Raw::Zlib qw(Z_OK);
+
+sub new { bless \(my $self), __PACKAGE__ }
+
+sub deflate { # ($self, $input, $output)
+	$_[2] .= $_[1];
+	Z_OK;
+}
+
+sub flush { # ($self, $output, $flags = Z_FINISH)
+	$_[1] //= '';
+	Z_OK;
+}
+
+1;
diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 0a6c56a5d..99d43cf04 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -6,6 +6,10 @@ package PublicInbox::GzipFilter;
 use strict;
 use parent qw(Exporter);
 use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+use PublicInbox::CompressNoop;
+use PublicInbox::Eml;
+use PublicInbox::GitAsyncCat;
+
 our @EXPORT_OK = qw(gzf_maybe);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip);
@@ -14,22 +18,41 @@ sub new { bless {}, shift }
 
 # for Qspawn if using $env->{'pi-httpd.async'}
 sub attach {
-	my ($self, $fh) = @_;
-	$self->{fh} = $fh;
+	my ($self, $http_out) = @_;
+	$self->{http_out} = $http_out;
 	$self
 }
 
-# returns `0' and not `undef' on failure (see Www*Stream)
-sub gzf_maybe ($$) {
+sub gz_or_noop {
 	my ($res_hdr, $env) = @_;
-	return 0 if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/;
-	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
-	return 0 if $err != Z_OK;
+	if (($env->{HTTP_ACCEPT_ENCODING} // '') =~ /\bgzip\b/) {
+		$env->{'plack.skip-deflater'} = 1;
+		push @$res_hdr, @GZIP_HDRS;
+		gzip_or_die();
+	} else {
+		PublicInbox::CompressNoop::new();
+	}
+}
 
-	# in case Plack::Middleware::Deflater is loaded:
-	$env->{'plack.skip-deflater'} = 1;
-	push @$res_hdr, @GZIP_HDRS;
-	bless { gz => $gz }, __PACKAGE__;
+sub gzf_maybe ($$) { bless { gz => gz_or_noop(@_) }, __PACKAGE__ }
+
+sub psgi_response {
+	my ($self, $code, $res_hdr, $next_cb, $eml_cb) = @_;
+	my $env = $self->{env};
+	$self->{gz} //= gz_or_noop($res_hdr, $env);
+	if ($env->{'pi-httpd.async'}) {
+		$self->{async_next} = $next_cb;
+		$self->{async_eml} = $eml_cb;
+		my $http = $env->{'psgix.io'}; # PublicInbox::HTTP
+		$http->{forward} = $self;
+		sub {
+			my ($wcb) = @_; # -httpd provided write callback
+			$self->{http_out} = $wcb->([$code, $res_hdr]);
+			$next_cb->($http); # start stepping
+		};
+	} else { # generic PSGI code path
+		[ $code, $res_hdr, $self ];
+	}
 }
 
 sub qsp_maybe ($$) {
@@ -80,7 +103,7 @@ sub translate ($$) {
 
 sub write {
 	# my $ret = bytes::length($_[1]); # XXX does anybody care?
-	$_[0]->{fh}->write(translate($_[0], $_[1]));
+	$_[0]->{http_out}->write(translate($_[0], $_[1]));
 }
 
 # similar to ->translate; use this when we're sure we know we have
@@ -109,9 +132,31 @@ sub zflush ($;$) {
 
 sub close {
 	my ($self) = @_;
-	my $fh = delete $self->{fh};
-	$fh->write(zflush($self));
-	$fh->close;
+	if (my $http_out = delete $self->{http_out}) {
+		$http_out->write(zflush($self));
+		$http_out->close;
+	}
+}
+
+# this is public-inbox-httpd-specific
+sub async_blob_cb { # git->cat_async callback
+	my ($bref, $oid, $type, $size, $self) = @_;
+	my $http = $self->{env}->{'psgix.io'} or return; # client abort
+	my $smsg = $self->{smsg} or die 'BUG: no smsg';
+	if (!defined($oid)) {
+		# it's possible to have TOCTOU if an admin runs
+		# public-inbox-(edit|purge), just move onto the next message
+		return $http->next_step($self->{async_next});
+	}
+	$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
+	$self->{async_eml}->($self, PublicInbox::Eml->new($bref));
+	$http->next_step($self->{async_next});
+}
+
+sub smsg_blob {
+	my ($self, $smsg) = @_;
+	git_async_cat($self->{-inbox}->git, $smsg->{blob},
+			\&async_blob_cb, $self);
 }
 
 1;
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 895f828c5..abdf43c93 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -9,13 +9,11 @@
 # more common "push" model)
 package PublicInbox::Mbox;
 use strict;
-use warnings;
+use parent 'PublicInbox::GzipFilter';
 use PublicInbox::MID qw/mid_escape/;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Smsg;
 use PublicInbox::Eml;
-use PublicInbox::GitAsyncCat;
-use PublicInbox::GzipFilter qw(gzf_maybe);
 
 # called by PSGI server as body response
 # this gets called twice for every message, once to return the header,
@@ -25,49 +23,35 @@ sub getline {
 	my $smsg = $ctx->{smsg} or return;
 	my $ibx = $ctx->{-inbox};
 	my $eml = $ibx->smsg_eml($smsg) or return;
-	$ctx->{smsg} = $ibx->over->next_by_mid($ctx->{mid}, @{$ctx->{id_prev}});
-	msg_hdr($ctx, $eml, $smsg->{mid}) . msg_body($eml);
-}
-
-sub close { !!delete($_[0]->{http_out}) }
-
-sub mbox_async_step ($) { # public-inbox-httpd-only
-	my ($ctx) = @_;
-	if (my $smsg = $ctx->{smsg}) {
-		git_async_cat($ctx->{-inbox}->git, $smsg->{blob},
-				\&mbox_blob_cb, $ctx);
-	} elsif (my $out = delete $ctx->{http_out}) {
-		$out->close;
+	my $n = $ctx->{smsg} = $ibx->over->next_by_mid(@{$ctx->{next_arg}});
+	$ctx->zmore(msg_hdr($ctx, $eml, $smsg->{mid}));
+	if ($n) {
+		$ctx->translate(msg_body($eml));
+	} else { # last message
+		$ctx->zmore(msg_body($eml));
+		$ctx->zflush;
 	}
 }
 
 # called by PublicInbox::DS::write
-sub mbox_async_next {
+sub async_next {
 	my ($http) = @_; # PublicInbox::HTTP
 	my $ctx = $http->{forward} or return; # client aborted
 	eval {
-		$ctx->{smsg} = $ctx->{-inbox}->over->next_by_mid(
-					$ctx->{mid}, @{$ctx->{id_prev}});
-		mbox_async_step($ctx);
+		my $smsg = $ctx->{smsg} or return $ctx->close;
+		$ctx->smsg_blob($smsg);
 	};
+	warn "E: $@" if $@;
 }
 
-# this is public-inbox-httpd-specific
-sub mbox_blob_cb { # git->cat_async callback
-	my ($bref, $oid, $type, $size, $ctx) = @_;
-	my $http = $ctx->{env}->{'psgix.io'} or return; # client abort
-	my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg';
-	if (!defined($oid)) {
-		# it's possible to have TOCTOU if an admin runs
-		# public-inbox-(edit|purge), just move onto the next message
-		return $http->next_step(\&mbox_async_next);
-	} else {
-		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
-	}
-	my $eml = PublicInbox::Eml->new($bref);
-	$ctx->{http_out}->write(msg_hdr($ctx, $eml, $smsg->{mid}));
-	$ctx->{http_out}->write(msg_body($eml));
-	$http->next_step(\&mbox_async_next);
+sub async_eml { # ->{async_eml} for async_blob_cb
+	my ($ctx, $eml) = @_;
+	my $smsg = delete $ctx->{smsg};
+	# next message
+	$ctx->{smsg} = $ctx->{-inbox}->over->next_by_mid(@{$ctx->{next_arg}});
+
+	$ctx->zmore(msg_hdr($ctx, $eml, $smsg->{mid}));
+	$ctx->{http_out}->write($ctx->translate(msg_body($eml)));
 }
 
 sub res_hdr ($$) {
@@ -97,41 +81,17 @@ sub no_over_raw ($) {
 		[ msg_hdr($ctx, $eml, $ctx->{mid}) . msg_body($eml) ] ]
 }
 
-sub stream_raw { # MboxGz response callback
-	my ($ctx) = @_;
-	delete($ctx->{smsg}) //
-		$ctx->{-inbox}->over->next_by_mid($ctx->{mid},
-						@{$ctx->{id_prev}});
-}
-
 # /$INBOX/$MESSAGE_ID/raw
 sub emit_raw {
 	my ($ctx) = @_;
-	my $env = $ctx->{env};
-	$ctx->{base_url} = $ctx->{-inbox}->base_url($env);
+	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
 	my $over = $ctx->{-inbox}->over or return no_over_raw($ctx);
 	my ($id, $prev);
-	my $smsg = $over->next_by_mid($ctx->{mid}, \$id, \$prev) or return;
-	$ctx->{smsg} = $smsg;
+	my $mip = $ctx->{next_arg} = [ $ctx->{mid}, \$id, \$prev ];
+	my $smsg = $ctx->{smsg} = $over->next_by_mid(@$mip) or return;
 	my $res_hdr = res_hdr($ctx, $smsg->{subject});
-	$ctx->{id_prev} = [ \$id, \$prev ];
-
-	if (my $gzf = gzf_maybe($res_hdr, $env)) {
-		$ctx->{gz} = delete $gzf->{gz};
-		require PublicInbox::MboxGz;
-		PublicInbox::MboxGz::response($ctx, \&stream_raw, $res_hdr);
-	} elsif ($env->{'pi-httpd.async'}) {
-		sub {
-			my ($wcb) = @_; # -httpd provided write callback
-			$ctx->{http_out} = $wcb->([200, $res_hdr]);
-			$ctx->{env}->{'psgix.io'}->{forward} = $ctx;
-			bless $ctx, __PACKAGE__;
-			mbox_async_step($ctx); # start stepping
-		};
-	} else { # generic PSGI code path
-		bless $ctx, __PACKAGE__; # respond to ->getline
-		[ 200, $res_hdr, $ctx ];
-	}
+	bless $ctx, __PACKAGE__;
+	$ctx->psgi_response(200, $res_hdr, \&async_next, \&async_eml);
 }
 
 sub msg_hdr ($$;$) {
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index 716bf7b19..fdd16f68e 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -6,77 +6,32 @@ use parent 'PublicInbox::GzipFilter';
 use PublicInbox::Eml;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Mbox;
-use PublicInbox::GitAsyncCat;
 *msg_hdr = \&PublicInbox::Mbox::msg_hdr;
 *msg_body = \&PublicInbox::Mbox::msg_body;
 
-# this is public-inbox-httpd-specific
-sub mboxgz_blob_cb { # git->cat_async callback
-	my ($bref, $oid, $type, $size, $self) = @_;
-	my $http = $self->{env}->{'psgix.io'} or return; # client abort
-	my $smsg = delete $self->{smsg} or die 'BUG: no smsg';
-	if (!defined($oid)) {
-		# it's possible to have TOCTOU if an admin runs
-		# public-inbox-(edit|purge), just move onto the next message
-		return $http->next_step(\&mboxgz_async_next);
-	} else {
-		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
-	}
-	my $eml = PublicInbox::Eml->new($bref);
-	$self->zmore(msg_hdr($self, $eml, $smsg->{mid}));
-
-	# PublicInbox::HTTP::{Chunked,Identity}::write
-	$self->{http_out}->write($self->translate(msg_body($eml)));
-
-	$http->next_step(\&mboxgz_async_next);
-}
-
-# this is public-inbox-httpd-specific
-sub mboxgz_async_step ($) {
-	my ($self) = @_;
-	if (my $smsg = $self->{smsg} = $self->{cb}->($self)) {
-		git_async_cat($self->{-inbox}->git, $smsg->{blob},
-				\&mboxgz_blob_cb, $self);
-	} elsif (my $out = delete $self->{http_out}) {
-		$out->write($self->zflush);
-		$out->close;
-	}
-}
-
-# called by PublicInbox::DS::write
-sub mboxgz_async_next {
+sub async_next ($) {
 	my ($http) = @_; # PublicInbox::HTTP
-	mboxgz_async_step($http->{forward});
-}
-
-# called by PublicInbox::HTTP::close, or any other PSGI server
-sub close { !!delete($_[0]->{http_out}) }
-
-sub response {
-	my ($self, $cb, $res_hdr) = @_;
-	$self->{cb} = $cb;
-	bless $self, __PACKAGE__;
-	if ($self->{env}->{'pi-httpd.async'}) {
-		sub {
-			my ($wcb) = @_; # -httpd provided write callback
-			$self->{http_out} = $wcb->([200, $res_hdr]);
-			$self->{env}->{'psgix.io'}->{forward} = $self;
-			mboxgz_async_step($self); # start stepping
-		};
-	} else { # generic PSGI
-		[ 200, $res_hdr, $self ];
-	}
+	my $ctx = $http->{forward} or return;
+	eval {
+		$ctx->{smsg} = $ctx->{cb}->($ctx) or return $ctx->close;
+		$ctx->smsg_blob($ctx->{smsg});
+	};
+	warn "E: $@" if $@;
 }
 
 sub mbox_gz {
 	my ($self, $cb, $fn) = @_;
+	$self->{cb} = $cb;
 	$self->{base_url} = $self->{-inbox}->base_url($self->{env});
 	$self->{gz} = PublicInbox::GzipFilter::gzip_or_die();
 	$fn = to_filename($fn // 'no-subject');
 	$fn = 'no-subject' if $fn eq '';
 	# http://www.iana.org/assignments/media-types/application/gzip
-	response($self, $cb, [ qw(Content-Type application/gzip),
-		'Content-Disposition', "inline; filename=$fn.mbox.gz" ]);
+	bless $self, __PACKAGE__;
+	my $res_hdr = [ 'Content-Type' => 'application/gzip',
+		'Content-Disposition' => "inline; filename=$fn.mbox.gz" ];
+	$self->psgi_response(200, $res_hdr, \&async_next,
+				\&PublicInbox::Mbox::async_eml);
 }
 
 # called by Plack::Util::foreach or similar (generic PSGI)
diff --git a/lib/PublicInbox/NoopFilter.pm b/lib/PublicInbox/NoopFilter.pm
deleted file mode 100644
index a97dbde64..000000000
--- a/lib/PublicInbox/NoopFilter.pm
+++ /dev/null
@@ -1,24 +0,0 @@
-# Copyright (C) 2020 all contributors <meta@public-inbox.org>
-# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-
-package PublicInbox::NoopFilter;
-use strict;
-
-sub new { bless \(my $self = ''), __PACKAGE__ }
-
-# noop workalike for PublicInbox::GzipFilter methods
-sub translate {
-	my $self = $_[0];
-	my $ret = $$self .= ($_[1] // '');
-	$$self = '';
-	$ret;
-}
-
-sub zmore {
-	${$_[0]} .= $_[1];
-	undef;
-}
-
-sub zflush { translate($_[0], $_[1]) }
-
-1;
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 583309228..073df1dfa 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -7,107 +7,59 @@
 # more common "push" model)
 package PublicInbox::WwwAtomStream;
 use strict;
-use warnings;
+use parent 'PublicInbox::GzipFilter';
 
 use POSIX qw(strftime);
 use Digest::SHA qw(sha1_hex);
 use PublicInbox::Address;
 use PublicInbox::Hval qw(ascii_html mid_href);
 use PublicInbox::MsgTime qw(msg_timestamp);
-use PublicInbox::GzipFilter qw(gzf_maybe);
-use PublicInbox::GitAsyncCat;
-
-# called by generic PSGI server after getline,
-# and also by PublicInbox::HTTP::close
-sub close { !!delete($_[0]->{http_out}) }
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
 	$ctx->{feed_base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	$ctx->{cb} = $cb || \&close;
+	$ctx->{cb} = $cb || \&PublicInbox::GzipFilter::close;
 	$ctx->{emit_header} = 1;
 	bless $ctx, $class;
 }
 
-# called by PublicInbox::DS::write
-sub atom_async_next {
+sub async_next ($) {
 	my ($http) = @_; # PublicInbox::HTTP
-	atom_async_step($http->{forward});
-}
-
-# this is public-inbox-httpd-specific
-sub atom_blob_cb { # git->cat_async callback
-	my ($bref, $oid, $type, $size, $ctx) = @_;
-	my $http = $ctx->{env}->{'psgix.io'} or return; # client abort
-	my $smsg = delete $ctx->{smsg} or die 'BUG: no smsg';
-	if (!defined($oid)) {
-		# it's possible to have TOCTOU if an admin runs
-		# public-inbox-(edit|purge), just move onto the next message
-		return $http->next_step(\&atom_async_next);
-	} else {
-		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
-	}
-	my $buf = feed_entry($ctx, $smsg, PublicInbox::Eml->new($bref));
-	if (my $gzf = $ctx->{gzf}) {
-		$buf = $gzf->translate($buf);
-	}
-	# PublicInbox::HTTP::{Chunked,Identity}::write
-	$ctx->{http_out}->write($buf);
-
-	$http->next_step(\&atom_async_next);
+	my $ctx = $http->{forward} or return;
+	eval {
+		if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) {
+			$ctx->smsg_blob($smsg);
+		} else {
+			$ctx->{http_out}->write($ctx->translate('</feed>'));
+			$ctx->close;
+		}
+	};
+	warn "E: $@" if $@;
 }
 
-sub atom_async_step { # this is public-inbox-httpd-specific
-	my ($ctx) = @_;
-	if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) {
-		git_async_cat($ctx->{-inbox}->git, $smsg->{blob},
-				\&atom_blob_cb, $ctx);
-	} elsif (my $out = delete $ctx->{http_out}) {
-		if (my $gzf = delete $ctx->{gzf}) {
-			$out->write($gzf->zflush);
-		}
-		$out->close;
-	}
+sub async_eml { # ->{async_eml} for async_blob_cb
+	my ($ctx, $eml) = @_;
+	my $smsg = delete $ctx->{smsg};
+	$ctx->{http_out}->write($ctx->translate(feed_entry($ctx, $smsg, $eml)))
 }
 
 sub response {
 	my ($class, $ctx, $code, $cb) = @_;
 	my $res_hdr = [ 'Content-Type' => 'application/atom+xml' ];
 	$class->new($ctx, $cb);
-	$ctx->{gzf} = gzf_maybe($res_hdr, $ctx->{env});
-	if ($ctx->{env}->{'pi-httpd.async'}) {
-		sub {
-			my ($wcb) = @_; # -httpd provided write callback
-			$ctx->{http_out} = $wcb->([200, $res_hdr]);
-			$ctx->{env}->{'psgix.io'}->{forward} = $ctx;
-			atom_async_step($ctx); # start stepping
-		};
-	} else {
-		[ $code, $res_hdr, $ctx ];
-	}
+	$ctx->psgi_response($code, $res_hdr, \&async_next, \&async_eml);
 }
 
 # called once for each message by PSGI server
 sub getline {
 	my ($self) = @_;
-	my $buf = do {
-		if (my $middle = $self->{cb}) {
-			if (my $smsg = $middle->($self)) {
-				my $eml = $self->{-inbox}->smsg_eml($smsg) or
-						return '';
-				feed_entry($self, $smsg, $eml);
-			} else {
-				undef;
-			}
-		}
-	} // (delete($self->{cb}) ? '</feed>' : undef);
-
-	# gzf may be GzipFilter, `undef' or `0'
-	my $gzf = $self->{gzf} or return $buf;
-
-	return $gzf->translate($buf) if defined $buf;
-	$self->{gzf} = 0; # next call to ->getline returns $buf (== undef)
-	$gzf->translate(undef);
+	my $cb = $self->{cb} or return;
+	while (my $smsg = $cb->($self)) {
+		my $eml = $self->{-inbox}->smsg_eml($smsg) or next;
+		return $self->translate(feed_entry($self, $smsg, $eml));
+	}
+	delete $self->{cb};
+	$self->zflush('</feed>');
 }
 
 # private
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index d641e6d5c..5f85e3464 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -10,7 +10,6 @@ use PublicInbox::Hval qw(ascii_html prurl);
 use PublicInbox::Linkify;
 use PublicInbox::View;
 use PublicInbox::Inbox;
-use PublicInbox::NoopFilter;
 use PublicInbox::GzipFilter qw(gzf_maybe);
 use bytes (); # bytes::length
 use HTTP::Date qw(time2str);
@@ -108,7 +107,7 @@ sub html ($$) {
 	my ($env, $list) = @_;
 	my $h = [ 'Content-Type', 'text/html; charset=UTF-8',
 			'Content-Length', undef ];
-	my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new();
+	my $gzf = gzf_maybe($h, $env);
 	$gzf->zmore('<html><head><title>' .
 				'public-inbox listing</title>' .
 				'</head><body><pre>');
diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index d0611949d..051d2e039 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -17,7 +17,6 @@ use HTTP::Date qw(time2str);
 use HTTP::Status qw(status_message);
 use Errno qw(EACCES ENOTDIR ENOENT);
 use URI::Escape qw(uri_escape_utf8);
-use PublicInbox::NoopFilter;
 use PublicInbox::GzipFilter qw(gzf_maybe);
 use PublicInbox::Hval qw(ascii_html);
 use Plack::MIME;
@@ -313,7 +312,7 @@ sub dir_response ($$$) {
 
 	my $path_info_html = ascii_html($path_info);
 	my $h = [qw(Content-Type text/html Content-Length), undef];
-	my $gzf = gzf_maybe($h, $env) || PublicInbox::NoopFilter::new();
+	my $gzf = gzf_maybe($h, $env);
 	$gzf->zmore("<html><head><title>Index of $path_info_html</title>" .
 		${$self->{style}} .
 		"</head><body><pre>Index of $path_info_html</pre><hr><pre>\n");

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 25/43] remove unused/redundant zlib-related imports
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (23 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 24/43] www: start making gzipfilter the parent response class Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 26/43] wwwstream: use parent.pm and no warnings Eric Wong
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Z_FINISH is the default for Compress::Raw::Zlib::Deflate->flush,
anyways, so there's no reason to import it.  And none of C::R::Z
is needed in WwwText now that gzf_maybe handles it all.
---
 lib/PublicInbox/GzipFilter.pm | 6 +++---
 lib/PublicInbox/WwwText.pm    | 1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 99d43cf04..6380f50e9 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -5,7 +5,7 @@
 package PublicInbox::GzipFilter;
 use strict;
 use parent qw(Exporter);
-use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+use Compress::Raw::Zlib qw(Z_OK);
 use PublicInbox::CompressNoop;
 use PublicInbox::Eml;
 use PublicInbox::GitAsyncCat;
@@ -95,7 +95,7 @@ sub translate ($$) {
 		$self->{zbuf} = $zbuf;
 		'';
 	} else { # undef == EOF
-		my $err = $gz->flush($zbuf, Z_FINISH);
+		my $err = $gz->flush($zbuf);
 		die "gzip->flush: $err" if $err != Z_OK;
 		$zbuf;
 	}
@@ -125,7 +125,7 @@ sub zflush ($;$) {
 		$err = $gz->deflate($_[1], $zbuf);
 		die "gzip->deflate: $err" if $err != Z_OK;
 	}
-	$err = $gz->flush($zbuf, Z_FINISH);
+	$err = $gz->flush($zbuf);
 	die "gzip->flush: $err" if $err != Z_OK;
 	$zbuf;
 }
diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm
index ea50428e3..08691684d 100644
--- a/lib/PublicInbox/WwwText.pm
+++ b/lib/PublicInbox/WwwText.pm
@@ -11,7 +11,6 @@ use PublicInbox::WwwStream;
 use PublicInbox::Hval qw(ascii_html);
 use URI::Escape qw(uri_escape_utf8);
 use PublicInbox::GzipFilter qw(gzf_maybe);
-use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
 our $QP_URL = 'https://xapian.org/docs/queryparser.html';
 our $WIKI_URL = 'https://en.wikipedia.org/wiki';
 my $hl = eval {

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 26/43] wwwstream: use parent.pm and no warnings
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (24 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 25/43] remove unused/redundant zlib-related imports Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 27/43] wwwstream: subclass off GzipFilter Eric Wong
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

parent.pm is leaner than base and we'll rely on `-w' for
warnings during development.
---
 lib/PublicInbox/WwwStream.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 4d82cbb48..fd558e1b7 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -8,8 +8,7 @@
 # more common "push" model)
 package PublicInbox::WwwStream;
 use strict;
-use warnings;
-use base qw(Exporter);
+use parent qw(Exporter);
 our @EXPORT_OK = qw(html_oneshot);
 use bytes (); # length
 use PublicInbox::Hval qw(ascii_html prurl);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 27/43] wwwstream: subclass off GzipFilter
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (25 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 26/43] wwwstream: use parent.pm and no warnings Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 28/43] view: make /$INBOX/$MSGID/ permalink async Eric Wong
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This makes WwwStream closer to MboxGz and WwwAtomStream
and will eventually allow us to follow the same patterns.
---
 lib/PublicInbox/WwwStream.pm | 45 +++++++++++++++---------------------
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index fd558e1b7..42fb183f4 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -8,11 +8,10 @@
 # more common "push" model)
 package PublicInbox::WwwStream;
 use strict;
-use parent qw(Exporter);
+use parent qw(Exporter PublicInbox::GzipFilter);
 our @EXPORT_OK = qw(html_oneshot);
 use bytes (); # length
 use PublicInbox::Hval qw(ascii_html prurl);
-use PublicInbox::GzipFilter qw(gzf_maybe);
 our $TOR_URL = 'https://www.torproject.org/';
 our $CODE_URL = 'https://public-inbox.org/public-inbox.git';
 
@@ -35,10 +34,10 @@ sub init {
 
 sub response {
 	my ($ctx, $code, $cb) = @_;
-	my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ];
+	my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8' ];
 	init($ctx, $cb);
-	$ctx->{gzf} = gzf_maybe($h, $ctx->{env});
-	[ $code, $h, $ctx ]
+	$ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env});
+	[ $code, $res_hdr, $ctx ]
 }
 
 sub html_top ($) {
@@ -157,35 +156,27 @@ EOF
 # callback for HTTP.pm (and any other PSGI servers)
 sub getline {
 	my ($ctx) = @_;
-	my $cb = $ctx->{cb};
-	my $buf = $cb->($ctx) if $cb;
-	$buf //= delete($ctx->{cb}) ? _html_end($ctx) : undef;
-
-	# gzf may be GzipFilter, `undef' or `0'
-	my $gzf = $ctx->{gzf} or return $buf;
-
-	return $gzf->translate($buf) if defined $buf;
-	$ctx->{gzf} = 0; # next call to ->getline returns $buf (== undef)
-	$gzf->translate(undef);
+	my $cb = $ctx->{cb} or return;
+	if (defined(my $buf = $cb->($ctx))) {
+		return $ctx->translate($buf);
+	}
+	delete $ctx->{cb};
+	$ctx->zflush(_html_end($ctx));
 }
 
 sub html_oneshot ($$;$) {
 	my ($ctx, $code, $sref) = @_;
 	$ctx->{base_url} = base_url($ctx);
 	bless $ctx, __PACKAGE__;
-	my @x;
-	my $h = [ 'Content-Type' => 'text/html; charset=UTF-8',
+	my @bdy;
+	my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8',
 		'Content-Length' => undef ];
-	if (my $gzf = gzf_maybe($h, $ctx->{env})) {
-		$gzf->zmore(html_top($ctx));
-		$gzf->zmore($$sref) if $sref;
-		$x[0] = $gzf->zflush(_html_end($ctx));
-		$h->[3] = length($x[0]);
-	} else {
-		@x = (html_top($ctx), $sref ? $$sref : (), _html_end($ctx));
-		$h->[3] += bytes::length($_) for @x;
-	}
-	[ $code, $h, \@x ]
+	$ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env});
+	$ctx->zmore(html_top($ctx));
+	$ctx->zmore($$sref) if $sref;
+	$bdy[0] = $ctx->zflush(_html_end($ctx));
+	$res_hdr->[3] = bytes::length($bdy[0]);
+	[ $code, $res_hdr, \@bdy ]
 }
 
 1;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 28/43] view: make /$INBOX/$MSGID/ permalink async
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (26 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 27/43] wwwstream: subclass off GzipFilter Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 29/43] view: /$INBOX/$MSGID/t/ reads blobs asynchronously Eric Wong
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This will allow -httpd to handle other requusts if waiting on
an HDD seek or git to decode a blob.
---
 lib/PublicInbox/View.pm      | 38 ++++++++++++++++------------------
 lib/PublicInbox/WwwStream.pm | 40 +++++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 25 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 243528263..98445f0e0 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -27,24 +27,22 @@ use constant TCHILD => '` ';
 sub th_pfx ($) { $_[0] == 0 ? '' : TCHILD };
 
 sub msg_page_i {
-	my ($ctx) = @_;
-	my $cur = delete $ctx->{smsg} or return; # undef: done
-	my $nxt;
-	if (my $over = $ctx->{-inbox}->over) {
-		$nxt = $ctx->{smsg} = $over->next_by_mid(@{$ctx->{next_arg}});
+	my ($ctx, $eml) = @_;
+	if ($eml) { # called by WwwStream::async_eml or getline
+		my $smsg = $ctx->{smsg};
+		$ctx->{smsg} = $ctx->{over}->next_by_mid(@{$ctx->{next_arg}});
+		$ctx->{mhref} = ($ctx->{nr} || $ctx->{smsg}) ?
+				"../${\mid_href($smsg->{mid})}/" : '';
+		my $hdr = $eml->header_obj;
+		my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx);
+		multipart_text_as_html($eml, $ctx);
+		delete $ctx->{obuf};
+		$$obuf .= '</pre><hr>';
+		$$obuf .= html_footer($ctx, $ctx->{first_hdr}) if !$ctx->{smsg};
+		$$obuf;
+	} else { # called by WwwStream::async_next or getline
+		$ctx->{smsg}; # may be undef
 	}
-	$ctx->{mhref} = ($ctx->{nr} || $nxt) ?
-			"../${\mid_href($cur->{mid})}/" : '';
-	my $eml = $ctx->{-inbox}->smsg_eml($cur) or return;
-	my $hdr = $eml->header_obj;
-	my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx);
-	multipart_text_as_html($eml, $ctx);
-	delete $ctx->{obuf};
-	$$obuf .= '</pre><hr>';
-	# we want to at least show the message if something
-	# here crashes:
-	eval { $$obuf .= html_footer($ctx, $ctx->{first_hdr}) } if !$nxt;
-	$$obuf;
 }
 
 # /$INBOX/$MESSAGE_ID/ for unindexed v1 inboxes
@@ -74,11 +72,11 @@ sub msg_page {
 	my ($ctx) = @_;
 	my $ibx = $ctx->{-inbox};
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
-	my $over = $ibx->over or return no_over_html($ctx);
+	my $over = $ctx->{over} = $ibx->over or return no_over_html($ctx);
 	my ($id, $prev);
 	my $next_arg = $ctx->{next_arg} = [ $ctx->{mid}, \$id, \$prev ];
-	$ctx->{smsg} = $over->next_by_mid(@$next_arg) or return;
-	PublicInbox::WwwStream::response($ctx, 200, \&msg_page_i);
+	$ctx->{smsg} = $over->next_by_mid(@$next_arg) or return; # undef == 404
+	PublicInbox::WwwStream::aresponse($ctx, 200, \&msg_page_i);
 }
 
 # /$INBOX/$MESSAGE_ID/#R
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 42fb183f4..eecc27019 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -15,9 +15,6 @@ use PublicInbox::Hval qw(ascii_html prurl);
 our $TOR_URL = 'https://www.torproject.org/';
 our $CODE_URL = 'https://public-inbox.org/public-inbox.git';
 
-# noop for HTTP.pm (and any other PSGI servers)
-sub close {}
-
 sub base_url ($) {
 	my $ctx = shift;
 	my $base_url = $ctx->{-inbox}->base_url($ctx->{env});
@@ -40,6 +37,11 @@ sub response {
 	[ $code, $res_hdr, $ctx ]
 }
 
+sub async_eml { # ->{async_eml} for async_blob_cb
+	my ($ctx, $eml) = @_;
+	$ctx->{http_out}->write($ctx->translate($ctx->{cb}->($ctx, $eml)));
+}
+
 sub html_top ($) {
 	my ($ctx) = @_;
 	my $ibx = $ctx->{-inbox};
@@ -157,8 +159,14 @@ EOF
 sub getline {
 	my ($ctx) = @_;
 	my $cb = $ctx->{cb} or return;
-	if (defined(my $buf = $cb->($ctx))) {
-		return $ctx->translate($buf);
+	while (defined(my $x = $cb->($ctx))) { # x = smsg or scalar non-ref
+		if (ref($x)) { # smsg
+			my $eml = $ctx->{-inbox}->smsg_eml($x) or next;
+			$ctx->{smsg} = $x;
+			return $ctx->translate($cb->($ctx, $eml));
+		} else { # scalar
+			return $ctx->translate($x);
+		}
 	}
 	delete $ctx->{cb};
 	$ctx->zflush(_html_end($ctx));
@@ -179,4 +187,26 @@ sub html_oneshot ($$;$) {
 	[ $code, $res_hdr, \@bdy ]
 }
 
+sub async_next ($) {
+	my ($http) = @_; # PublicInbox::HTTP
+	my $ctx = $http->{forward} or return;
+	eval {
+		if (my $smsg = $ctx->{smsg} = $ctx->{cb}->($ctx)) {
+			$ctx->smsg_blob($smsg);
+		} else {
+			$ctx->{http_out}->write(
+					$ctx->translate(_html_end($ctx)));
+			$ctx->close; # GzipFilter->close
+		}
+	};
+	warn "E: $@" if $@;
+}
+
+sub aresponse {
+	my ($ctx, $code, $cb) = @_;
+	my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8' ];
+	init($ctx, $cb);
+	$ctx->psgi_response($code, $res_hdr, \&async_next, \&async_eml);
+}
+
 1;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 29/43] view: /$INBOX/$MSGID/t/ reads blobs asynchronously
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (27 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 28/43] view: make /$INBOX/$MSGID/ permalink async Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 30/43] view: update /$INBOX/$MSGID/T/ to be async Eric Wong
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Once again, this shows a ~10% speedup with multi-message
threads in xt/httpd-async-stream.t regardless of whether
TEST_JOBS is 1 or 100.
---
 lib/PublicInbox/View.pm | 44 +++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 98445f0e0..117257a64 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -375,7 +375,7 @@ sub thread_eml_entry {
 	$beg . '<pre>' . eml_entry($ctx, $smsg, $eml, 0) . '</pre>' . $end;
 }
 
-sub next_in_queue ($;$) {
+sub next_in_queue ($$) {
 	my ($q, $ghost_ok) = @_;
 	while (@$q) {
 		my ($level, $smsg) = splice(@$q, 0, 2);
@@ -387,29 +387,39 @@ sub next_in_queue ($;$) {
 }
 
 sub stream_thread_i { # PublicInbox::WwwStream::getline callback
-	my ($ctx) = @_;
-	return unless exists($ctx->{skel});
-	my $nr = $ctx->{nr}++;
-	my ($level, $smsg) = next_in_queue($ctx->{-queue}, $nr);
-
-	$smsg or return
-		join('', thread_adj_level($ctx, 0)) . ${delete $ctx->{skel}};
-
-	my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return
-		ghost_index_entry($ctx, $level, $smsg);
+	my ($ctx, $eml) = @_;
 
-	if ($nr == 0) {
-		$ctx->{-title_html} = ascii_html($smsg->{subject});
-		$ctx->html_top . thread_eml_entry($ctx, $level, $smsg, $eml);
-	} else {
-		thread_eml_entry($ctx, $level, $smsg, $eml);
+	if ($eml) {
+		my ($level, $smsg) = delete @$ctx{qw(level smsg)};
+		if ($ctx->{nr} == 1) {
+			$ctx->{-title_html} = ascii_html($smsg->{subject});
+			$ctx->zmore($ctx->html_top);
+		}
+		return thread_eml_entry($ctx, $level, $smsg, $eml);
+	}
+	return unless exists($ctx->{skel});
+	my $ghost_ok = $ctx->{nr}++;
+	while (1) {
+		my ($lvl, $smsg) = next_in_queue($ctx->{-queue}, $ghost_ok);
+		if ($smsg) {
+			if (exists $smsg->{blob}) { # next message for cat-file
+				$ctx->{level} = $lvl;
+				return $smsg;
+			}
+			# buffer the ghost entry and loop
+			$ctx->zmore(ghost_index_entry($ctx, $lvl, $smsg));
+		} else { # all done
+			$ctx->zmore(join('', thread_adj_level($ctx, 0)));
+			$ctx->zmore(${delete($ctx->{skel})});
+			return;
+		}
 	}
 }
 
 sub stream_thread ($$) {
 	my ($rootset, $ctx) = @_;
 	$ctx->{-queue} = [ map { (0, $_) } @$rootset ];
-	PublicInbox::WwwStream::response($ctx, 200, \&stream_thread_i);
+	PublicInbox::WwwStream::aresponse($ctx, 200, \&stream_thread_i);
 }
 
 # /$INBOX/$MESSAGE_ID/t/

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 30/43] view: update /$INBOX/$MSGID/T/ to be async
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (28 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 29/43] view: /$INBOX/$MSGID/t/ reads blobs asynchronously Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 31/43] feed: generate_i: eliminate pointless loop Eric Wong
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Another 10% or so speedup in a frequently-hit endpoint.
---
 lib/PublicInbox/View.pm | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 117257a64..16a0fcdfb 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -459,23 +459,26 @@ sub thread_html {
 	# flat display: lazy load the full message from smsg
 	$ctx->{msgs} = $msgs;
 	$ctx->{-html_tip} = '<pre>';
-	PublicInbox::WwwStream::response($ctx, 200, \&thread_html_i);
+	PublicInbox::WwwStream::aresponse($ctx, 200, \&thread_html_i);
 }
 
 sub thread_html_i { # PublicInbox::WwwStream::getline callback
-	my ($ctx) = @_;
-	my $msgs = $ctx->{msgs} or return;
-	while (my $smsg = shift @$msgs) {
-		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
+	my ($ctx, $eml) = @_;
+	if ($eml) {
+		my $smsg = $ctx->{smsg};
 		if (exists $ctx->{-html_tip}) {
 			$ctx->{-title_html} = ascii_html($smsg->{subject});
-			return $ctx->html_top .
-				eml_entry($ctx, $smsg, $eml, scalar @$msgs);
+			$ctx->zmore($ctx->html_top);
+		}
+		return eml_entry($ctx, $smsg, $eml, scalar @{$ctx->{msgs}});
+	} else {
+		while (my $smsg = shift @{$ctx->{msgs}}) {
+			return $smsg if exists($smsg->{blob});
 		}
-		return eml_entry($ctx, $smsg, $eml, scalar @$msgs);
+		my $skel = delete($ctx->{skel}) or return; # all done
+		$ctx->zmore($$skel);
+		undef;
 	}
-	my ($skel) = delete @$ctx{qw(skel msgs)};
-	$$skel;
 }
 
 sub multipart_text_as_html {

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 31/43] feed: generate_i: eliminate pointless loop
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (29 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 30/43] view: update /$INBOX/$MSGID/T/ to be async Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 32/43] feed: /$INBOX/new.html fetches blobs asynchronously Eric Wong
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

$ctx->{msgs} won't ever contain undef values.
---
 lib/PublicInbox/Feed.pm | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index b15fc3a09..9141faaf0 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -11,9 +11,7 @@ use PublicInbox::Smsg; # this loads w/o Search::Xapian
 
 sub generate_i {
 	my ($ctx) = @_;
-	while (my $smsg = shift @{$ctx->{msgs}}) {
-		return $smsg;
-	}
+	shift @{$ctx->{msgs}};
 }
 
 # main function

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 32/43] feed: /$INBOX/new.html fetches blobs asynchronously
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (30 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 31/43] feed: generate_i: eliminate pointless loop Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 33/43] ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs Eric Wong
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Once again this speeds another endpoint up 10% or so.
---
 lib/PublicInbox/Feed.pm | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index 9141faaf0..279106d28 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -48,15 +48,15 @@ sub generate_html_index {
 }
 
 sub new_html_i {
-	my ($ctx) = @_;
-	return $ctx->html_top if exists $ctx->{-html_tip};
-	my $msgs = $ctx->{msgs};
-	while (my $smsg = shift @$msgs) {
-		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
-		return PublicInbox::View::eml_entry($ctx, $smsg, $eml,
-							scalar @$msgs);
-	}
-	PublicInbox::View::pagination_footer($ctx, './new.html');
+	my ($ctx, $eml) = @_;
+	$ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip};
+
+	$eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml,
+						scalar @{$ctx->{msgs}});
+	my $smsg = shift @{$ctx->{msgs}} or
+		$ctx->zmore(PublicInbox::View::pagination_footer(
+						$ctx, './new.html'));
+	$smsg;
 }
 
 sub new_html {
@@ -69,7 +69,7 @@ sub new_html {
 	$ctx->{-html_tip} = '<pre>';
 	$ctx->{-upfx} = '';
 	$ctx->{-hr} = 1;
-	PublicInbox::WwwStream::response($ctx, 200, \&new_html_i);
+	PublicInbox::WwwStream::aresponse($ctx, 200, \&new_html_i);
 }
 
 # private subs

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 33/43] ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (31 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 32/43] feed: /$INBOX/new.html fetches blobs asynchronously Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 34/43] view: eml_entry: reduce parameters Eric Wong
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Another 10% or so speedup when displaying full messages off
search results.
---
 lib/PublicInbox/SearchView.pm | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index eeebdfa31..921992a5d 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -287,21 +287,18 @@ sub mset_thread {
 
 	@$msgs = reverse @$msgs if $r;
 	$ctx->{msgs} = $msgs;
-	PublicInbox::WwwStream::response($ctx, 200, \&mset_thread_i);
+	PublicInbox::WwwStream::aresponse($ctx, 200, \&mset_thread_i);
 }
 
 # callback for PublicInbox::WwwStream::getline
 sub mset_thread_i {
-	my ($ctx) = @_;
-	return $ctx->html_top if exists $ctx->{-html_tip};
-	my $msgs = $ctx->{msgs} or return;
-	while (my $smsg = pop @$msgs) {
-		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
-		return PublicInbox::View::eml_entry($ctx, $smsg, $eml,
-							scalar @$msgs);
-	}
-	my ($skel) = delete @$ctx{qw(skel msgs)};
-	$$skel .= "\n</pre>";
+	my ($ctx, $eml) = @_;
+	$ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip};
+	$eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml,
+						scalar @{$ctx->{msgs}});
+	my $smsg = shift @{$ctx->{msgs}} or
+		$ctx->zmore(${delete($ctx->{skel})});
+	$smsg;
 }
 
 sub ctx_prepare {

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 34/43] view: eml_entry: reduce parameters
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (32 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 33/43] ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 35/43] view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case Eric Wong
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We can save stack space and simplify subroutine calls, here.
---
 Documentation/mknews.perl     |  4 ++--
 lib/PublicInbox/Feed.pm       |  2 +-
 lib/PublicInbox/SearchView.pm |  2 +-
 lib/PublicInbox/View.pm       | 17 +++++++++--------
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index 51d54b716..4a5d0e563 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -103,9 +103,9 @@ sub mime2txt {
 
 sub mime2html {
 	my ($out, $eml, $ctx) = @_;
-	my $smsg = bless {}, 'PublicInbox::Smsg';
+	my $smsg = $ctx->{smsg} = bless {}, 'PublicInbox::Smsg';
 	$smsg->populate($eml);
-	print $out PublicInbox::View::eml_entry($ctx, $smsg, $eml, 1) or die;
+	print $out PublicInbox::View::eml_entry($ctx, $eml, 1) or die;
 }
 
 sub html_start {
diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index 279106d28..476d946f5 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -51,7 +51,7 @@ sub new_html_i {
 	my ($ctx, $eml) = @_;
 	$ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip};
 
-	$eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml,
+	$eml and return PublicInbox::View::eml_entry($ctx, $eml,
 						scalar @{$ctx->{msgs}});
 	my $smsg = shift @{$ctx->{msgs}} or
 		$ctx->zmore(PublicInbox::View::pagination_footer(
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 921992a5d..623b16fb2 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -294,7 +294,7 @@ sub mset_thread {
 sub mset_thread_i {
 	my ($ctx, $eml) = @_;
 	$ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip};
-	$eml and return PublicInbox::View::eml_entry($ctx, $ctx->{smsg}, $eml,
+	$eml and return PublicInbox::View::eml_entry($ctx, $eml,
 						scalar @{$ctx->{msgs}});
 	my $smsg = shift @{$ctx->{msgs}} or
 		$ctx->zmore(${delete($ctx->{skel})});
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 16a0fcdfb..656953928 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -181,7 +181,8 @@ sub fmt_ts ($) { strftime('%Y-%m-%d %k:%M', gmtime($_[0])) }
 # Displays the text of of the message for /$INBOX/$MSGID/[Tt]/ endpoint
 # this is already inside a <pre>
 sub eml_entry {
-	my ($ctx, $smsg, $eml, $more) = @_;
+	my ($ctx, $eml, $more) = @_;
+	my $smsg = delete $ctx->{smsg};
 	my $subj = delete $smsg->{subject};
 	my $mid_raw = $smsg->{mid};
 	my $id = id_compress($mid_raw, 1);
@@ -370,9 +371,9 @@ sub pre_thread  { # walk_thread callback
 }
 
 sub thread_eml_entry {
-	my ($ctx, $level, $smsg, $eml) = @_;
-	my ($beg, $end) = thread_adj_level($ctx, $level);
-	$beg . '<pre>' . eml_entry($ctx, $smsg, $eml, 0) . '</pre>' . $end;
+	my ($ctx, $eml) = @_;
+	my ($beg, $end) = thread_adj_level($ctx, $ctx->{level});
+	$beg . '<pre>' . eml_entry($ctx, $eml, 0) . '</pre>' . $end;
 }
 
 sub next_in_queue ($$) {
@@ -390,12 +391,12 @@ sub stream_thread_i { # PublicInbox::WwwStream::getline callback
 	my ($ctx, $eml) = @_;
 
 	if ($eml) {
-		my ($level, $smsg) = delete @$ctx{qw(level smsg)};
 		if ($ctx->{nr} == 1) {
-			$ctx->{-title_html} = ascii_html($smsg->{subject});
+			$ctx->{-title_html} =
+					ascii_html($ctx->{smsg}->{subject});
 			$ctx->zmore($ctx->html_top);
 		}
-		return thread_eml_entry($ctx, $level, $smsg, $eml);
+		goto &thread_eml_entry; # tail recursion
 	}
 	return unless exists($ctx->{skel});
 	my $ghost_ok = $ctx->{nr}++;
@@ -470,7 +471,7 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 			$ctx->{-title_html} = ascii_html($smsg->{subject});
 			$ctx->zmore($ctx->html_top);
 		}
-		return eml_entry($ctx, $smsg, $eml, scalar @{$ctx->{msgs}});
+		return eml_entry($ctx, $eml, scalar @{$ctx->{msgs}});
 	} else {
 		while (my $smsg = shift @{$ctx->{msgs}}) {
 			return $smsg if exists($smsg->{blob});

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 35/43] view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (33 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 34/43] view: eml_entry: reduce parameters Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 36/43] wwwstream: eliminate ::response, use html_oneshot Eric Wong
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We can build and buffer the HTML <head> section once the first
non-ghost message in a thread is loaded, so there's no need to
perform an extra check on $ctx->{nr} once the $eml is ready.
---
 lib/PublicInbox/View.pm | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 656953928..138e0c3a2 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -389,15 +389,7 @@ sub next_in_queue ($$) {
 
 sub stream_thread_i { # PublicInbox::WwwStream::getline callback
 	my ($ctx, $eml) = @_;
-
-	if ($eml) {
-		if ($ctx->{nr} == 1) {
-			$ctx->{-title_html} =
-					ascii_html($ctx->{smsg}->{subject});
-			$ctx->zmore($ctx->html_top);
-		}
-		goto &thread_eml_entry; # tail recursion
-	}
+	goto &thread_eml_entry if $eml; # tail recursion
 	return unless exists($ctx->{skel});
 	my $ghost_ok = $ctx->{nr}++;
 	while (1) {
@@ -405,6 +397,11 @@ sub stream_thread_i { # PublicInbox::WwwStream::getline callback
 		if ($smsg) {
 			if (exists $smsg->{blob}) { # next message for cat-file
 				$ctx->{level} = $lvl;
+				if (!$ghost_ok) { # first non-ghost
+					$ctx->{-title_html} =
+						ascii_html($smsg->{subject});
+					$ctx->zmore($ctx->html_top);
+				}
 				return $smsg;
 			}
 			# buffer the ghost entry and loop

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 36/43] wwwstream: eliminate ::response, use html_oneshot
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (34 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 35/43] view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 37/43] www: update internal docs Eric Wong
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

All of our streaming responses use ::aresponse, now, and our
synchronous responses use html_oneshot.  So there's no need
for the old WwwStream::response.
---
 lib/PublicInbox/View.pm      | 28 +++++++++-------------------
 lib/PublicInbox/WwwStream.pm | 23 ++++++++---------------
 2 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 138e0c3a2..895e4f278 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -14,7 +14,7 @@ use PublicInbox::MID qw(id_compress mids mids_for_index references
 			$MID_EXTRACT);
 use PublicInbox::MsgIter;
 use PublicInbox::Address;
-use PublicInbox::WwwStream;
+use PublicInbox::WwwStream qw(html_oneshot);
 use PublicInbox::Reply;
 use PublicInbox::ViewDiff qw(flush_diff);
 use PublicInbox::Eml;
@@ -45,25 +45,20 @@ sub msg_page_i {
 	}
 }
 
-# /$INBOX/$MESSAGE_ID/ for unindexed v1 inboxes
-sub no_over_i {
+# /$INBOX/$MSGID/ for unindexed v1 inboxes
+sub no_over_html ($) {
 	my ($ctx) = @_;
-	my $eml = delete $ctx->{eml} or return;
+	my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return; # 404
+	my $eml = PublicInbox::Eml->new($bref);
 	my $hdr = $eml->header_obj;
 	$ctx->{mhref} = '';
+	PublicInbox::WwwStream::init($ctx);
 	my $obuf = $ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx);
 	multipart_text_as_html($eml, $ctx);
 	delete $ctx->{obuf};
 	$$obuf .= '</pre><hr>';
 	eval { $$obuf .= html_footer($ctx, $hdr) };
-	$$obuf
-}
-
-sub no_over_html ($) {
-	my ($ctx) = @_;
-	my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return; # 404
-	$ctx->{eml} = PublicInbox::Eml->new($bref);
-	PublicInbox::WwwStream::response($ctx, 200, \&no_over_i);
+	html_oneshot($ctx, 200, $obuf);
 }
 
 # public functions: (unstable)
@@ -1169,12 +1164,6 @@ sub pagination_footer ($$) {
 	"<hr><pre>page: $next$prev</pre>";
 }
 
-sub index_nav { # callback for WwwStream::getline
-	my ($ctx) = @_;
-	return $ctx->html_top if exists $ctx->{-html_tip};
-	pagination_footer($ctx, '.')
-}
-
 sub paginate_recent ($$) {
 	my ($ctx, $lim) = @_;
 	my $t = $ctx->{qp}->{t} || '';
@@ -1223,7 +1212,8 @@ sub index_topics {
 	if (@$msgs) {
 		walk_thread(thread_results($ctx, $msgs), $ctx, \&acc_topic);
 	}
-	PublicInbox::WwwStream::response($ctx, dump_topics($ctx), \&index_nav);
+	html_oneshot($ctx, dump_topics($ctx), \pagination_footer($ctx, '.'));
+
 }
 
 sub thread_adj_level {
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index eecc27019..7d257a191 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -29,14 +29,6 @@ sub init {
 	bless $ctx, __PACKAGE__;
 }
 
-sub response {
-	my ($ctx, $code, $cb) = @_;
-	my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8' ];
-	init($ctx, $cb);
-	$ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env});
-	[ $code, $res_hdr, $ctx ]
-}
-
 sub async_eml { # ->{async_eml} for async_blob_cb
 	my ($ctx, $eml) = @_;
 	$ctx->{http_out}->write($ctx->translate($ctx->{cb}->($ctx, $eml)));
@@ -174,17 +166,18 @@ sub getline {
 
 sub html_oneshot ($$;$) {
 	my ($ctx, $code, $sref) = @_;
-	$ctx->{base_url} = base_url($ctx);
-	bless $ctx, __PACKAGE__;
-	my @bdy;
 	my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8',
 		'Content-Length' => undef ];
+	bless $ctx, __PACKAGE__;
 	$ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res_hdr, $ctx->{env});
-	$ctx->zmore(html_top($ctx));
+	$ctx->{base_url} //= do {
+		$ctx->zmore(html_top($ctx));
+		base_url($ctx);
+	};
 	$ctx->zmore($$sref) if $sref;
-	$bdy[0] = $ctx->zflush(_html_end($ctx));
-	$res_hdr->[3] = bytes::length($bdy[0]);
-	[ $code, $res_hdr, \@bdy ]
+	my $bdy = $ctx->zflush(_html_end($ctx));
+	$res_hdr->[3] = bytes::length($bdy);
+	[ $code, $res_hdr, [ $bdy ] ]
 }
 
 sub async_next ($) {

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 37/43] www: update internal docs
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (35 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 36/43] wwwstream: eliminate ::response, use html_oneshot Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 38/43] view: simplify eml_entry callers further Eric Wong
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We no longer favor getline+close for streaming PSGI responses
when using public-inbox-httpd.  We still support it for other
PSGI servers, though.
---
 Documentation/technical/ds.txt   |  4 ++--
 lib/PublicInbox/GetlineBody.pm   |  4 +---
 lib/PublicInbox/GzipFilter.pm    | 17 +++++++++++++----
 lib/PublicInbox/HTTPD.pm         |  5 ++---
 lib/PublicInbox/Mbox.pm          |  8 ++------
 lib/PublicInbox/View.pm          |  2 +-
 lib/PublicInbox/WwwAtomStream.pm |  6 ++----
 lib/PublicInbox/WwwStream.pm     |  7 +++----
 8 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index cbd06cfb4..a0793ca23 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -64,8 +64,8 @@ Augmented features:
 * ->requeue support.  An optimization of the AddTimer(0, ...) idiom
   for immediately dispatching code at the next event loop iteration.
   public-inbox uses this for fairly generating large responses
-  iteratively (see PublicInbox::NNTP::long_response or the use of
-  ->getline callbacks for generating gigantic gzipped mboxes).
+  iteratively (see PublicInbox::NNTP::long_response or git_async_cat
+  for blob retrievals).
 
 New features
 
diff --git a/lib/PublicInbox/GetlineBody.pm b/lib/PublicInbox/GetlineBody.pm
index 6becaaf5f..988bc63f4 100644
--- a/lib/PublicInbox/GetlineBody.pm
+++ b/lib/PublicInbox/GetlineBody.pm
@@ -5,9 +5,7 @@
 # end callback when the object goes out-of-scope.
 # This depends on rpipe being _blocking_ on getline.
 #
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# This is only used by generic PSGI servers and not public-inbox-httpd
 package PublicInbox::GetlineBody;
 use strict;
 use warnings;
diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 6380f50e9..d72ad3c88 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -1,7 +1,16 @@
 # Copyright (C) 2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-
-# Qspawn filter
+#
+# In public-inbox <=1.5.0, public-inbox-httpd favored "getline"
+# response bodies to take a "pull"-based approach to feeding
+# slow clients (as opposed to a more common "push" model).
+#
+# In newer versions, public-inbox-httpd supports a backpressure-aware
+# pull/push model which also accounts for slow git blob storage.
+# {async_next} callbacks only run when the DS {wbuf} is drained
+# {async_eml} callbacks only run when a blob arrives from git.
+#
+# We continue to support getline+close for generic PSGI servers.
 package PublicInbox::GzipFilter;
 use strict;
 use parent qw(Exporter);
@@ -14,12 +23,12 @@ our @EXPORT_OK = qw(gzf_maybe);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 my @GZIP_HDRS = qw(Vary Accept-Encoding Content-Encoding gzip);
 
-sub new { bless {}, shift }
+sub new { bless {}, shift } # qspawn filter
 
 # for Qspawn if using $env->{'pi-httpd.async'}
 sub attach {
 	my ($self, $http_out) = @_;
-	$self->{http_out} = $http_out;
+	$self->{http_out} = $http_out; # PublicInbox::HTTP::{Chunked,Identity}
 	$self
 }
 
diff --git a/lib/PublicInbox/HTTPD.pm b/lib/PublicInbox/HTTPD.pm
index 331939699..a9f55ff61 100644
--- a/lib/PublicInbox/HTTPD.pm
+++ b/lib/PublicInbox/HTTPD.pm
@@ -36,9 +36,8 @@ sub new {
 
 		# XXX unstable API!, only GitHTTPBackend needs
 		# this to limit git-http-backend(1) parallelism.
-		# The rest of our PSGI code is generic, relying
-		# on "pull" model using "getline" to prevent
-		# over-buffering.
+		# We also check for the truthiness of this to
+		# detect when to use git_async_cat for slow blobs
 		'pi-httpd.async' => \&pi_httpd_async
 	);
 	bless {
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index abdf43c93..8726b9f64 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -1,12 +1,8 @@
 # Copyright (C) 2015-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# Streaming (via getline) interface for formatting messages as an mboxrd.
-# Used by the PSGI web interface.
-#
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# Streaming interface for mboxrd HTTP responses
+# See PublicInbox::GzipFilter for details.
 package PublicInbox::Mbox;
 use strict;
 use parent 'PublicInbox::GzipFilter';
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 895e4f278..60dad6bac 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -415,7 +415,7 @@ sub stream_thread ($$) {
 	PublicInbox::WwwStream::aresponse($ctx, 200, \&stream_thread_i);
 }
 
-# /$INBOX/$MESSAGE_ID/t/
+# /$INBOX/$MSGID/t/ and /$INBOX/$MSGID/T/
 sub thread_html {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 073df1dfa..3b5b133a5 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -1,10 +1,8 @@
 # Copyright (C) 2016-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
-# Atom body stream for which yields getline+close methods
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# Atom body stream for HTTP responses
+# See PublicInbox::GzipFilter for details.
 package PublicInbox::WwwAtomStream;
 use strict;
 use parent 'PublicInbox::GzipFilter';
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 7d257a191..23b03f0e8 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -1,11 +1,10 @@
 # Copyright (C) 2016-2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
-# HTML body stream for which yields getline+close methods
+# HTML body stream for which yields getline+close methods for
+# generic PSGI servers and callbacks for public-inbox-httpd.
 #
-# public-inbox-httpd favors "getline" response bodies to take a
-# "pull"-based approach to feeding slow clients (as opposed to a
-# more common "push" model)
+# See PublicInbox::GzipFilter parent class for more info.
 package PublicInbox::WwwStream;
 use strict;
 use parent qw(Exporter PublicInbox::GzipFilter);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 38/43] view: simplify eml_entry callers further
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (36 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 37/43] www: update internal docs Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 39/43] wwwtext: simplify gzf_maybe use Eric Wong
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

This simplifies the primary callers of eml_entry while only making
mknews.perl worse.
---
 Documentation/mknews.perl     | 3 ++-
 lib/PublicInbox/Feed.pm       | 3 +--
 lib/PublicInbox/SearchView.pm | 3 +--
 lib/PublicInbox/View.pm       | 9 +++++----
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index 4a5d0e563..f053e2bfb 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -105,7 +105,8 @@ sub mime2html {
 	my ($out, $eml, $ctx) = @_;
 	my $smsg = $ctx->{smsg} = bless {}, 'PublicInbox::Smsg';
 	$smsg->populate($eml);
-	print $out PublicInbox::View::eml_entry($ctx, $eml, 1) or die;
+	$ctx->{msgs} = [ 1 ]; # for <hr> in eml_entry
+	print $out PublicInbox::View::eml_entry($ctx, $eml) or die;
 }
 
 sub html_start {
diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index 476d946f5..bf095a2cc 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -51,8 +51,7 @@ sub new_html_i {
 	my ($ctx, $eml) = @_;
 	$ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip};
 
-	$eml and return PublicInbox::View::eml_entry($ctx, $eml,
-						scalar @{$ctx->{msgs}});
+	$eml and return PublicInbox::View::eml_entry($ctx, $eml);
 	my $smsg = shift @{$ctx->{msgs}} or
 		$ctx->zmore(PublicInbox::View::pagination_footer(
 						$ctx, './new.html'));
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 623b16fb2..84c04c6c4 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -294,8 +294,7 @@ sub mset_thread {
 sub mset_thread_i {
 	my ($ctx, $eml) = @_;
 	$ctx->zmore($ctx->html_top) if exists $ctx->{-html_tip};
-	$eml and return PublicInbox::View::eml_entry($ctx, $eml,
-						scalar @{$ctx->{msgs}});
+	$eml and return PublicInbox::View::eml_entry($ctx, $eml);
 	my $smsg = shift @{$ctx->{msgs}} or
 		$ctx->zmore(${delete($ctx->{skel})});
 	$smsg;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 60dad6bac..d7ec4eb0a 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -176,7 +176,7 @@ sub fmt_ts ($) { strftime('%Y-%m-%d %k:%M', gmtime($_[0])) }
 # Displays the text of of the message for /$INBOX/$MSGID/[Tt]/ endpoint
 # this is already inside a <pre>
 sub eml_entry {
-	my ($ctx, $eml, $more) = @_;
+	my ($ctx, $eml) = @_;
 	my $smsg = delete $ctx->{smsg};
 	my $subj = delete $smsg->{subject};
 	my $mid_raw = $smsg->{mid};
@@ -267,7 +267,8 @@ sub eml_entry {
 		$hr = $ctx->{-hr};
 	}
 
-	$rv .= $more ? '</pre><hr><pre>' : '</pre>' if $hr;
+	# do we have more messages? start a new <pre> if so
+	$rv .= scalar(@{$ctx->{msgs}}) ? '</pre><hr><pre>' : '</pre>' if $hr;
 	$rv;
 }
 
@@ -368,7 +369,7 @@ sub pre_thread  { # walk_thread callback
 sub thread_eml_entry {
 	my ($ctx, $eml) = @_;
 	my ($beg, $end) = thread_adj_level($ctx, $ctx->{level});
-	$beg . '<pre>' . eml_entry($ctx, $eml, 0) . '</pre>' . $end;
+	$beg . '<pre>' . eml_entry($ctx, $eml) . '</pre>' . $end;
 }
 
 sub next_in_queue ($$) {
@@ -463,7 +464,7 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 			$ctx->{-title_html} = ascii_html($smsg->{subject});
 			$ctx->zmore($ctx->html_top);
 		}
-		return eml_entry($ctx, $eml, scalar @{$ctx->{msgs}});
+		return eml_entry($ctx, $eml);
 	} else {
 		while (my $smsg = shift @{$ctx->{msgs}}) {
 			return $smsg if exists($smsg->{blob});

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 39/43] wwwtext: simplify gzf_maybe use
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (37 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 38/43] view: simplify eml_entry callers further Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 40/43] wwwattach: support async blob retrievals Eric Wong
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

gzf_maybe always returns a GzipFilter object, even if it uses
CompressNoop.  We can also use ->zflush instead of
->translate(undef) here for the final bit.
---
 lib/PublicInbox/WwwText.pm | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm
index 08691684d..a691e2d8e 100644
--- a/lib/PublicInbox/WwwText.pm
+++ b/lib/PublicInbox/WwwText.pm
@@ -38,16 +38,13 @@ sub get_text {
 	}
 	my $env = $ctx->{env};
 	if ($raw) {
-		my $body;
-		if (my $gzf = $code == 200 ? gzf_maybe($hdr, $env) : undef) {
-			my $zbuf = $gzf->translate($txt);
-			undef $txt;
-			$body = [ $zbuf .= $gzf->translate(undef) ];
-		} else {
-			$body = [ $txt ];
+		if ($code == 200) {
+			my $gzf = gzf_maybe($hdr, $env);
+			$txt = $gzf->translate($txt);
+			$txt .= $gzf->zflush;
 		}
-		$hdr->[3] = bytes::length($body->[0]);
-		return [ $code, $hdr, $body ]
+		$hdr->[3] = bytes::length($txt);
+		return [ $code, $hdr, [ $txt ] ]
 	}
 
 	# enforce trailing slash for "wget -r" compatibility

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 40/43] wwwattach: support async blob retrievals
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (38 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 39/43] wwwtext: simplify gzf_maybe use Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 41/43] gzipfilter: drop HTTP connection on bugs or data corruption Eric Wong
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We can reuse some of the GzipFilter infrastructure used by other
WWW components to handle slow blob retrieval, here.  The
difference from previous changes is we don't decide on the 200
status code until we've retrieved the blob and found the
attachment.

While we're at it, ensure we can compress text attachment
responses once again, since all text attachments are served
as text/plain.
---
 lib/PublicInbox/WwwAttach.pm |  63 +++++++++++---
 t/psgi_attach.t              | 162 ++++++++++++++++++++---------------
 2 files changed, 144 insertions(+), 81 deletions(-)

diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index 7e8496d7a..20417295e 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -4,15 +4,16 @@
 # For retrieving attachments from messages in the WWW interface
 package PublicInbox::WwwAttach; # internal package
 use strict;
-use warnings;
+use parent qw(PublicInbox::GzipFilter);
 use bytes (); # only for bytes::length
 use PublicInbox::EmlContentFoo qw(parse_content_type);
 use PublicInbox::Eml;
 
 sub get_attach_i { # ->each_part callback
 	my ($part, $depth, $idx) = @{$_[0]};
-	my $res = $_[1];
-	return if $idx ne $res->[3]; # [0-9]+(?:\.[0-9]+)+
+	my $ctx = $_[1];
+	return if $idx ne $ctx->{idx}; # [0-9]+(?:\.[0-9]+)+
+	my $res = $ctx->{res};
 	$res->[0] = 200;
 	my $ct = $part->content_type;
 	$ct = parse_content_type($ct) if $ct;
@@ -23,24 +24,64 @@ sub get_attach_i { # ->each_part callback
 		if ($cset && ($cset =~ /\A[a-zA-Z0-9_\-]+\z/)) {
 			$res->[1]->[1] .= qq(; charset=$cset);
 		}
+		$ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($res->[1],
+								$ctx->{env});
+		$part = $ctx->zflush($part->body);
 	} else { # TODO: allow user to configure safe types
 		$res->[1]->[1] = 'application/octet-stream';
+		$part = $part->body;
 	}
-	$part = $part->body;
 	push @{$res->[1]}, 'Content-Length', bytes::length($part);
 	$res->[2]->[0] = $part;
 }
 
+sub async_eml { # ->{async_eml} for async_blob_cb
+	my ($ctx, $eml) = @_;
+	eval { $eml->each_part(\&get_attach_i, $ctx, 1) };
+	if ($@) {
+		$ctx->{res}->[0] = 500;
+		warn "E: $@";
+	}
+}
+
+sub async_next {
+	my ($http) = @_;
+	my $ctx = $http->{forward} or return; # client aborted
+	# finally, we call the user-supplied callback
+	eval { $ctx->{wcb}->($ctx->{res}) };
+	warn "E: $@" if $@;
+}
+
+sub scan_attach ($) { # public-inbox-httpd only
+	my ($ctx) = @_;
+	$ctx->{env}->{'psgix.io'}->{forward} = $ctx;
+	$ctx->{async_eml} = \&async_eml;
+	$ctx->{async_next} = \&async_next;
+	$ctx->smsg_blob($ctx->{smsg});
+}
+
 # /$LISTNAME/$MESSAGE_ID/$IDX-$FILENAME
 sub get_attach ($$$) {
 	my ($ctx, $idx, $fn) = @_;
-	my $res = [ 404, [ 'Content-Type', 'text/plain' ], [ "Not found\n" ] ];
-	my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res;
-	$mime = PublicInbox::Eml->new($mime);
-	$res->[3] = $idx;
-	$mime->each_part(\&get_attach_i, $res, 1);
-	pop @$res; # cleanup before letting PSGI server see it
-	$res
+	$ctx->{res} = [ 404, [ 'Content-Type' => 'text/plain' ],
+				[ "Not found\n" ] ];
+	$ctx->{idx} = $idx;
+	bless $ctx, __PACKAGE__;
+	my $eml;
+	if ($ctx->{smsg} = $ctx->{-inbox}->smsg_by_mid($ctx->{mid})) {
+		return sub { # public-inbox-httpd-only
+			$ctx->{wcb} = $_[0];
+			scan_attach($ctx);
+		} if $ctx->{env}->{'pi-httpd.async'};
+		# generic PSGI:
+		$eml = $ctx->{-inbox}->smsg_eml($ctx->{smsg});
+	} elsif (!$ctx->{-inbox}->over) {
+		if (my $bref = $ctx->{-inbox}->msg_by_mid($ctx->{mid})) {
+			$eml = PublicInbox::Eml->new($bref);
+		}
+	}
+	$eml->each_part(\&get_attach_i, $ctx, 1) if $eml;
+	$ctx->{res}
 }
 
 1;
diff --git a/t/psgi_attach.t b/t/psgi_attach.t
index 9a734f813..14d20adb1 100644
--- a/t/psgi_attach.t
+++ b/t/psgi_attach.t
@@ -5,9 +5,8 @@ use warnings;
 use Test::More;
 use PublicInbox::TestCommon;
 my ($tmpdir, $for_destroy) = tmpdir();
-my $maindir = "$tmpdir/main.git";
+my $inboxdir = "$tmpdir/main.git";
 my $addr = 'test-public@example.com';
-my $cfgpfx = "publicinbox.test";
 my @mods = qw(HTTP::Request::Common Plack::Builder Plack::Test URI::Escape);
 require_mods(@mods);
 use_ok $_ foreach @mods;
@@ -17,85 +16,108 @@ use PublicInbox::Git;
 use PublicInbox::Config;
 use PublicInbox::Eml;
 use_ok 'PublicInbox::WwwAttach';
-my $config = PublicInbox::Config->new(\<<EOF);
-$cfgpfx.address=$addr
-$cfgpfx.inboxdir=$maindir
+
+my $cfgpath = "$tmpdir/config";
+open my $fh, '>', $cfgpath or BAIL_OUT $!;
+print $fh <<EOF or BAIL_OUT $!;
+[publicinbox "test"]
+	address = $addr
+	inboxdir = $inboxdir
 EOF
-my $git = PublicInbox::Git->new($maindir);
+close $fh or BAIL_OUT $!;
+my $config = PublicInbox::Config->new($cfgpath);
+my $git = PublicInbox::Git->new($inboxdir);
 my $im = PublicInbox::Import->new($git, 'test', $addr);
 $im->init_bare;
 
-{
-	my $qp = "abcdef=g\n==blah\n";
-	my $b64 = "b64\xde\xad\xbe\xef\n";
-	my $txt = "plain\ntext\npass\nthrough\n";
-	my $dot = "dotfile\n";
-	$im->add(eml_load('t/psgi_attach.eml'));
-	$im->add(eml_load('t/data/message_embed.eml'));
-	$im->done;
+my $qp = "abcdef=g\n==blah\n";
+my $b64 = "b64\xde\xad\xbe\xef\n";
+my $txt = "plain\ntext\npass\nthrough\n";
+my $dot = "dotfile\n";
+$im->add(eml_load('t/psgi_attach.eml'));
+$im->add(eml_load('t/data/message_embed.eml'));
+$im->done;
+
+my $www = PublicInbox::WWW->new($config);
+my $client = sub {
+	my ($cb) = @_;
+	my $res;
+	$res = $cb->(GET('/test/Z%40B/'));
+	my @href = ($res->content =~ /^href="([^"]+)"/gms);
+	@href = grep(/\A[\d\.]+-/, @href);
+	is_deeply([qw(1-queue-pee 2-bayce-sixty-four 3-noop.txt
+			4-a.txt)],
+		\@href, 'attachment links generated');
+
+	$res = $cb->(GET('/test/Z%40B/1-queue-pee'));
+	my $qp_res = $res->content;
+	ok(length($qp_res) >= length($qp), 'QP length is close');
+	like($qp_res, qr/\n\z/s, 'trailing newline exists');
+	# is(index($qp_res, $qp), 0, 'QP trailing newline is there');
+	$qp_res =~ s/\r\n/\n/g;
+	is(index($qp_res, $qp), 0, 'QP trailing newline is there');
+
+	$res = $cb->(GET('/test/Z%40B/2-base-sixty-four'));
+	is(quotemeta($res->content), quotemeta($b64),
+		'Base64 matches exactly');
 
-	my $www = PublicInbox::WWW->new($config);
-	test_psgi(sub { $www->call(@_) }, sub {
-		my ($cb) = @_;
-		my $res;
-		$res = $cb->(GET('/test/Z%40B/'));
-		my @href = ($res->content =~ /^href="([^"]+)"/gms);
-		@href = grep(/\A[\d\.]+-/, @href);
-		is_deeply([qw(1-queue-pee 2-bayce-sixty-four 3-noop.txt
-				4-a.txt)],
-			\@href, 'attachment links generated');
+	$res = $cb->(GET('/test/Z%40B/3-noop.txt'));
+	my $txt_res = $res->content;
+	ok(length($txt_res) >= length($txt),
+		'plain text almost matches');
+	like($txt_res, qr/\n\z/s, 'trailing newline exists in text');
+	is(index($txt_res, $txt), 0, 'plain text not truncated');
 
-		$res = $cb->(GET('/test/Z%40B/1-queue-pee'));
-		my $qp_res = $res->content;
-		ok(length($qp_res) >= length($qp), 'QP length is close');
-		like($qp_res, qr/\n\z/s, 'trailing newline exists');
-		# is(index($qp_res, $qp), 0, 'QP trailing newline is there');
-		$qp_res =~ s/\r\n/\n/g;
-		is(index($qp_res, $qp), 0, 'QP trailing newline is there');
+	$res = $cb->(GET('/test/Z%40B/4-a.txt'));
+	my $dot_res = $res->content;
+	ok(length($dot_res) >= length($dot), 'dot almost matches');
+	$res = $cb->(GET('/test/Z%40B/4-any-filename.txt'));
+	is($res->content, $dot_res, 'user-specified filename is OK');
 
-		$res = $cb->(GET('/test/Z%40B/2-base-sixty-four'));
-		is(quotemeta($res->content), quotemeta($b64),
-			'Base64 matches exactly');
+	my $mid = '20200418222508.GA13918@dcvr';
+	my $irt = '20200418222020.GA2745@dcvr';
+	$res = $cb->(GET("/test/$mid/"));
+	unlike($res->content, qr! multipart/mixed, Size: 0 bytes!,
+		'0-byte download not offered');
+	like($res->content, qr/\bhref="2-embed2x\.eml"/s,
+		'href to message/rfc822 attachment visible');
+	like($res->content, qr/\bhref="2\.1\.2-test\.eml"/s,
+		'href to nested message/rfc822 attachment visible');
 
-		$res = $cb->(GET('/test/Z%40B/3-noop.txt'));
-		my $txt_res = $res->content;
-		ok(length($txt_res) >= length($txt),
-			'plain text almost matches');
-		like($txt_res, qr/\n\z/s, 'trailing newline exists in text');
-		is(index($txt_res, $txt), 0, 'plain text not truncated');
+	$res = $cb->(GET("/test/$mid/2-embed2x.eml"));
+	my $eml = PublicInbox::Eml->new(\($res->content));
+	is_deeply([ $eml->header_raw('Message-ID') ], [ "<$irt>" ],
+		'got attached eml');
+	my @subs = $eml->subparts;
+	is(scalar(@subs), 2, 'attachment had 2 subparts');
+	like($subs[0]->body_str, qr/^testing embedded message\n*\z/sm,
+		'1st attachment is as expected');
+	is($subs[1]->header('Content-Type'), 'message/rfc822',
+		'2nd attachment is as expected');
 
-		$res = $cb->(GET('/test/Z%40B/4-a.txt'));
-		my $dot_res = $res->content;
-		ok(length($dot_res) >= length($dot), 'dot almost matches');
-		$res = $cb->(GET('/test/Z%40B/4-any-filename.txt'));
-		is($res->content, $dot_res, 'user-specified filename is OK');
+	$res = $cb->(GET("/test/$mid/2.1.2-test.eml"));
+	$eml = PublicInbox::Eml->new(\($res->content));
+	is_deeply([ $eml->header_raw('Message-ID') ],
+		[ '<20200418214114.7575-1-e@yhbt.net>' ],
+		'nested eml retrieved');
+};
 
-		my $mid = '20200418222508.GA13918@dcvr';
-		my $irt = '20200418222020.GA2745@dcvr';
-		$res = $cb->(GET("/test/$mid/"));
-		unlike($res->content, qr! multipart/mixed, Size: 0 bytes!,
-			'0-byte download not offered');
-		like($res->content, qr/\bhref="2-embed2x\.eml"/s,
-			'href to message/rfc822 attachment visible');
-		like($res->content, qr/\bhref="2\.1\.2-test\.eml"/s,
-			'href to nested message/rfc822 attachment visible');
+test_psgi(sub { $www->call(@_) }, $client);
+SKIP: {
+	diag 'testing with index indexed';
+	require_mods('DBD::SQLite', 19);
+	my $env = { PI_CONFIG => $cfgpath };
+	ok(run_script(['-index', $inboxdir], $env), 'indexed');
 
-		$res = $cb->(GET("/test/$mid/2-embed2x.eml"));
-		my $eml = PublicInbox::Eml->new(\($res->content));
-		is_deeply([ $eml->header_raw('Message-ID') ], [ "<$irt>" ],
-			'got attached eml');
-		my @subs = $eml->subparts;
-		is(scalar(@subs), 2, 'attachment had 2 subparts');
-		like($subs[0]->body_str, qr/^testing embedded message\n*\z/sm,
-			'1st attachment is as expected');
-		is($subs[1]->header('Content-Type'), 'message/rfc822',
-			'2nd attachment is as expected');
+	test_psgi(sub { $www->call(@_) }, $client);
 
-		$res = $cb->(GET("/test/$mid/2.1.2-test.eml"));
-		$eml = PublicInbox::Eml->new(\($res->content));
-		is_deeply([ $eml->header_raw('Message-ID') ],
-			[ '<20200418214114.7575-1-e@yhbt.net>' ],
-			'nested eml retrieved');
-	});
+	require_mods(qw(Plack::Test::ExternalServer), 18);
+	my $sock = tcp_server() or die;
+	my ($out, $err) = map { "$inboxdir/std$_.log" } qw(out err);
+	my $cmd = [ qw(-httpd -W0), "--stdout=$out", "--stderr=$err" ];
+	my $td = start_script($cmd, $env, { 3 => $sock });
+	my ($h, $p) = ($sock->sockhost, $sock->sockport);
+	local $ENV{PLACK_TEST_EXTERNALSERVER_URI} = "http://$h:$p";
+	Plack::Test::ExternalServer::test_psgi(client => $client);
 }
 done_testing();

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 41/43] gzipfilter: drop HTTP connection on bugs or data corruption
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (39 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 40/43] wwwattach: support async blob retrievals Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 42/43] daemon: warn on missing blobs Eric Wong
  2020-07-05 23:27 ` [PATCH 43/43] gzipfilter: check http->{forward} for client disconnects Eric Wong
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

While all the {async_next} callbacks needed eval guards anyways
because of DS->write, {async_eml} callbacks did not.

Ensure any bugs in our code or data corruption result in
termination of the HTTP connection, so as not to leave clients
hanging on a response which never comes or is mangled in some
way.
---
 lib/PublicInbox/GzipFilter.pm | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index d72ad3c88..57344604b 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -147,18 +147,34 @@ sub close {
 	}
 }
 
+sub bail  {
+	my $self = shift;
+	if (my $env = $self->{env}) {
+		eval { $env->{'psgi.errors'}->print(@_, "\n") };
+		warn("E: error printing to psgi.errors: $@", @_) if $@;
+		my $http = $env->{'psgix.io'} or return; # client abort
+		eval { $http->close }; # should hit our close
+		warn "E: error in http->close: $@" if $@;
+		eval { $self->close }; # just in case...
+		warn "E: error in self->close: $@" if $@;
+	} else {
+		warn @_, "\n";
+	}
+}
+
 # this is public-inbox-httpd-specific
 sub async_blob_cb { # git->cat_async callback
 	my ($bref, $oid, $type, $size, $self) = @_;
 	my $http = $self->{env}->{'psgix.io'} or return; # client abort
-	my $smsg = $self->{smsg} or die 'BUG: no smsg';
+	my $smsg = $self->{smsg} or bail($self, 'BUG: no smsg');
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
 		return $http->next_step($self->{async_next});
 	}
-	$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
-	$self->{async_eml}->($self, PublicInbox::Eml->new($bref));
+	$smsg->{blob} eq $oid or bail($self, "BUG: $smsg->{blob} != $oid");
+	eval { $self->{async_eml}->($self, PublicInbox::Eml->new($bref)) };
+	bail($self, "E: async_eml: $@") if $@;
 	$http->next_step($self->{async_next});
 }
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 42/43] daemon: warn on missing blobs
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (40 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 41/43] gzipfilter: drop HTTP connection on bugs or data corruption Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  2020-07-05 23:27 ` [PATCH 43/43] gzipfilter: check http->{forward} for client disconnects Eric Wong
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

Since -edit and -purge should be rare and TOCTOU around them
rarer still; missing {blobs} could be indicative of a real bug
elsewhere.  Warn on them.

And I somehow ended up with 3 different field names for Inbox
objects.  Perhaps they'll be made consistent in the future.
---
 lib/PublicInbox/GzipFilter.pm | 1 +
 lib/PublicInbox/IMAP.pm       | 1 +
 lib/PublicInbox/NNTP.pm       | 1 +
 3 files changed, 3 insertions(+)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 57344604b..b5ad9eb88 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -170,6 +170,7 @@ sub async_blob_cb { # git->cat_async callback
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
+		warn "E: $smsg->{blob} missing in $self->{-inbox}->{inboxdir}\n";
 		return $http->next_step($self->{async_next});
 	}
 	$smsg->{blob} eq $oid or bail($self, "BUG: $smsg->{blob} != $oid");
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index e06021438..d8c898f4b 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -620,6 +620,7 @@ sub fetch_blob_cb { # called by git->cat_async via git_async_cat
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
+		warn "E: $smsg->{blob} missing in $self->{ibx}->{inboxdir}\n";
 		return requeue_once($self);
 	} else {
 		$smsg->{blob} eq $oid or die "BUG: $smsg->{blob} != $oid";
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 9d91544ab..87ddf7a4a 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -523,6 +523,7 @@ sub blob_cb { # called by git->cat_async via git_async_cat
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs
 		# public-inbox-(edit|purge), just move onto the next message
+		warn "E: $smsg->{blob} missing in $self->{ng}->{inboxdir}\n";
 		return $self->requeue;
 	} elsif ($smsg->{blob} ne $oid) {
 		$self->close;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 43/43] gzipfilter: check http->{forward} for client disconnects
  2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
                   ` (41 preceding siblings ...)
  2020-07-05 23:27 ` [PATCH 42/43] daemon: warn on missing blobs Eric Wong
@ 2020-07-05 23:27 ` Eric Wong
  42 siblings, 0 replies; 44+ messages in thread
From: Eric Wong @ 2020-07-05 23:27 UTC (permalink / raw)
  To: meta

We actually don't do anything with {env} or {'psgix.io'}
on client aborts, so checking the truthiness of '{forward}'
is necessary.
---
 lib/PublicInbox/GzipFilter.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index b5ad9eb88..f1354d2b7 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -165,7 +165,8 @@ sub bail  {
 # this is public-inbox-httpd-specific
 sub async_blob_cb { # git->cat_async callback
 	my ($bref, $oid, $type, $size, $self) = @_;
-	my $http = $self->{env}->{'psgix.io'} or return; # client abort
+	my $http = $self->{env}->{'psgix.io'};
+	$http->{forward} or return; # client aborted
 	my $smsg = $self->{smsg} or bail($self, 'BUG: no smsg');
 	if (!defined($oid)) {
 		# it's possible to have TOCTOU if an admin runs

^ permalink raw reply related	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2020-07-05 23:28 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
2020-07-05 23:27 ` [PATCH 01/43] gzipfilter: minor cleanups Eric Wong
2020-07-05 23:27 ` [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware Eric Wong
2020-07-05 23:27 ` [PATCH 03/43] www*stream: gzip ->getline responses Eric Wong
2020-07-05 23:27 ` [PATCH 04/43] wwwtext: gzip text/plain responses, as well Eric Wong
2020-07-05 23:27 ` [PATCH 05/43] wwwtext: switch to html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 06/43] www: need: use WwwStream::html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 07/43] wwwlisting: use GzipFilter for HTML Eric Wong
2020-07-05 23:27 ` [PATCH 08/43] gzipfilter: replace Compress::Raw::Deflate usages Eric Wong
2020-07-05 23:27 ` [PATCH 09/43] {gzip,noop}filter: ->zmore returns undef, always Eric Wong
2020-07-05 23:27 ` [PATCH 10/43] mbox: remove html_oneshot import Eric Wong
2020-07-05 23:27 ` [PATCH 11/43] wwwstatic: support gzipped directory listings Eric Wong
2020-07-05 23:27 ` [PATCH 12/43] qspawn: learn to gzip streaming responses Eric Wong
2020-07-05 23:27 ` [PATCH 13/43] stop auto-loading Plack::Middleware::Deflater Eric Wong
2020-07-05 23:27 ` [PATCH 14/43] mboxgz: do asynchronous git blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 15/43] mboxgz: reduce hash depth Eric Wong
2020-07-05 23:27 ` [PATCH 16/43] mbox: async blob fetch for "single message" raw mboxrd Eric Wong
2020-07-05 23:27 ` [PATCH 17/43] wwwatomstream: simplify feed_update callers Eric Wong
2020-07-05 23:27 ` [PATCH 18/43] wwwatomstream: use PublicInbox::Inbox->modified for feed_updated Eric Wong
2020-07-05 23:27 ` [PATCH 19/43] wwwatomstream: reuse $ctx as $self Eric Wong
2020-07-05 23:27 ` [PATCH 20/43] xt/httpd-async-stream: allow more options Eric Wong
2020-07-05 23:27 ` [PATCH 21/43] wwwatomstream: support async blob fetch Eric Wong
2020-07-05 23:27 ` [PATCH 22/43] wwwstream: reduce object graph depth Eric Wong
2020-07-05 23:27 ` [PATCH 23/43] wwwstream: reduce blob fetch paths for ->getline Eric Wong
2020-07-05 23:27 ` [PATCH 24/43] www: start making gzipfilter the parent response class Eric Wong
2020-07-05 23:27 ` [PATCH 25/43] remove unused/redundant zlib-related imports Eric Wong
2020-07-05 23:27 ` [PATCH 26/43] wwwstream: use parent.pm and no warnings Eric Wong
2020-07-05 23:27 ` [PATCH 27/43] wwwstream: subclass off GzipFilter Eric Wong
2020-07-05 23:27 ` [PATCH 28/43] view: make /$INBOX/$MSGID/ permalink async Eric Wong
2020-07-05 23:27 ` [PATCH 29/43] view: /$INBOX/$MSGID/t/ reads blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 30/43] view: update /$INBOX/$MSGID/T/ to be async Eric Wong
2020-07-05 23:27 ` [PATCH 31/43] feed: generate_i: eliminate pointless loop Eric Wong
2020-07-05 23:27 ` [PATCH 32/43] feed: /$INBOX/new.html fetches blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 33/43] ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs Eric Wong
2020-07-05 23:27 ` [PATCH 34/43] view: eml_entry: reduce parameters Eric Wong
2020-07-05 23:27 ` [PATCH 35/43] view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case Eric Wong
2020-07-05 23:27 ` [PATCH 36/43] wwwstream: eliminate ::response, use html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 37/43] www: update internal docs Eric Wong
2020-07-05 23:27 ` [PATCH 38/43] view: simplify eml_entry callers further Eric Wong
2020-07-05 23:27 ` [PATCH 39/43] wwwtext: simplify gzf_maybe use Eric Wong
2020-07-05 23:27 ` [PATCH 40/43] wwwattach: support async blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 41/43] gzipfilter: drop HTTP connection on bugs or data corruption Eric Wong
2020-07-05 23:27 ` [PATCH 42/43] daemon: warn on missing blobs Eric Wong
2020-07-05 23:27 ` [PATCH 43/43] gzipfilter: check http->{forward} for client disconnects Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).