From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware
Date: Sun, 5 Jul 2020 23:27:18 +0000
Message-ID: <20200705232759.3161-3-e@yhbt.net>
In-Reply-To: <20200705232759.3161-1-e@yhbt.net>

Plack::Middleware::Deflater forces us to use a memory-intensive
closure. Instead, work towards building compressed strings in
memory to reduce the overhead of buffering large HTML output.
---
lib/PublicInbox/GzipFilter.pm | 13 +++++++++++++
lib/PublicInbox/WwwStream.pm | 27 ++++++++++++++++++++++-----
2 files changed, 35 insertions(+), 5 deletions(-)
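For readers less familiar with Compress::Raw::Zlib, here is a minimal stand-alone sketch of the same one-shot idea, assuming a PSGI-style caller. It checks Accept-Encoding, deflates each chunk into a single in-memory string with a gzip wrapper (-WindowBits => 15 + 16, as %OPT does), finishes with flush(Z_FINISH), and computes Content-Length from the result. The helper name oneshot_sketch and its fixed 200 status are hypothetical and not part of public-inbox. The chunks are assumed to be already-encoded octets.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib qw(Z_OK Z_FINISH);

# Hypothetical helper, not part of public-inbox.  Builds the whole
# response body in memory, gzipped when the client accepts gzip.
# @chunks must already be encoded octets (deflate croaks on wide chars).
sub oneshot_sketch {
	my ($env, @chunks) = @_;
	my @h = ('Content-Type' => 'text/html; charset=UTF-8');
	my $body = '';
	my $ae = $env->{HTTP_ACCEPT_ENCODING} // '';
	if ($ae =~ /\bgzip\b/) {
		# 15 + 16: 32K window, wrapped in a gzip header and trailer
		my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
			-WindowBits => 15 + 16, -AppendOutput => 1);
		die "Deflate->new: $err" if $err != Z_OK;
		for my $chunk (@chunks) {
			# -AppendOutput => 1 appends compressed bytes to $body
			$err = $gz->deflate($chunk, $body);
			die "deflate: $err" if $err != Z_OK;
		}
		# emit any buffered data plus the gzip trailer
		$err = $gz->flush($body, Z_FINISH);
		die "flush: $err" if $err != Z_OK;
		push @h, 'Vary', 'Accept-Encoding', 'Content-Encoding', 'gzip';
	} else {
		$body = join('', @chunks);
	}
	# $body holds octets only, so length() is its size in bytes
	push @h, 'Content-Length', length($body);
	return [ 200, \@h, [ $body ] ];
}
```

Like gzip_maybe, the regex ignores q-values, so "gzip;q=0" is still treated as acceptance. Building one string up front trades peak memory for a known Content-Length and no per-request middleware closure.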
diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index a7355a8df..115660cb1 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -4,7 +4,9 @@
# Qspawn filter
package PublicInbox::GzipFilter;
use strict;
+use parent qw(Exporter);
use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+our @EXPORT_OK = qw(gzip_maybe);
my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
sub new { bless {}, shift }
@@ -16,6 +18,17 @@ sub attach {
$self
}
+sub gzip_maybe ($) {
+ my ($env) = @_;
+ return if (($env->{HTTP_ACCEPT_ENCODING}) // '') !~ /\bgzip\b/;
+
+ # in case Plack::Middleware::Deflater is loaded:
+ $env->{'plack.skip-deflater'} = 1;
+
+ my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
+ $err == Z_OK ? $gz : undef;
+}
+
# for GetlineBody (via Qspawn) when NOT using $env->{'pi-httpd.async'}
sub translate ($$) {
my $self = $_[0];
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 915a71ba0..79ed6871e 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -13,6 +13,8 @@ use base qw(Exporter);
our @EXPORT_OK = qw(html_oneshot);
use bytes (); # length
use PublicInbox::Hval qw(ascii_html prurl);
+use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+use PublicInbox::GzipFilter qw(gzip_maybe);
our $TOR_URL = 'https://www.torproject.org/';
our $CODE_URL = 'https://public-inbox.org/public-inbox.git';
@@ -178,13 +180,28 @@ sub html_oneshot ($$;$) {
ctx => $ctx,
base_url => base_url($ctx),
}, __PACKAGE__;
- my @x = (_html_top($self), $sref ? $$sref : (), _html_end($self));
+ my @x;
+ my @h = ('Content-Type' => 'text/html; charset=UTF-8');
+ if (my $gz = gzip_maybe($ctx->{env})) {
+ my $err = $gz->deflate(_html_top($self), $x[0]);
+ die "gzip->deflate: $err" if $err != Z_OK;
+ if ($sref) {
+ $err = $gz->deflate($sref, $x[0]);
+ die "gzip->deflate: $err" if $err != Z_OK;
+ }
+ $err = $gz->deflate(_html_end($self), $x[0]);
+ die "gzip->deflate: $err" if $err != Z_OK;
+ $err = $gz->flush($x[0], Z_FINISH);
+ die "gzip->flush: $err" if $err != Z_OK;
+ push @h, qw(Vary Accept-Encoding Content-Encoding gzip);
+ } else {
+ @x = (_html_top($self), $sref ? $$sref : (), _html_end($self));
+ }
+
my $len = 0;
$len += bytes::length($_) for @x;
- [ $code, [
- 'Content-Type' => 'text/html; charset=UTF-8',
- 'Content-Length' => $len
- ], \@x ];
+ push @h, 'Content-Length', $len;
+ [ $code, \@h, \@x ]
}
1;
Thread overview: 44+ messages
2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
2020-07-05 23:27 ` [PATCH 01/43] gzipfilter: minor cleanups Eric Wong
2020-07-05 23:27 ` [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware Eric Wong [this message]
2020-07-05 23:27 ` [PATCH 03/43] www*stream: gzip ->getline responses Eric Wong
2020-07-05 23:27 ` [PATCH 04/43] wwwtext: gzip text/plain responses, as well Eric Wong
2020-07-05 23:27 ` [PATCH 05/43] wwwtext: switch to html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 06/43] www: need: use WwwStream::html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 07/43] wwwlisting: use GzipFilter for HTML Eric Wong
2020-07-05 23:27 ` [PATCH 08/43] gzipfilter: replace Compress::Raw::Deflate usages Eric Wong
2020-07-05 23:27 ` [PATCH 09/43] {gzip,noop}filter: ->zmore returns undef, always Eric Wong
2020-07-05 23:27 ` [PATCH 10/43] mbox: remove html_oneshot import Eric Wong
2020-07-05 23:27 ` [PATCH 11/43] wwwstatic: support gzipped directory listings Eric Wong
2020-07-05 23:27 ` [PATCH 12/43] qspawn: learn to gzip streaming responses Eric Wong
2020-07-05 23:27 ` [PATCH 13/43] stop auto-loading Plack::Middleware::Deflater Eric Wong
2020-07-05 23:27 ` [PATCH 14/43] mboxgz: do asynchronous git blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 15/43] mboxgz: reduce hash depth Eric Wong
2020-07-05 23:27 ` [PATCH 16/43] mbox: async blob fetch for "single message" raw mboxrd Eric Wong
2020-07-05 23:27 ` [PATCH 17/43] wwwatomstream: simplify feed_update callers Eric Wong
2020-07-05 23:27 ` [PATCH 18/43] wwwatomstream: use PublicInbox::Inbox->modified for feed_updated Eric Wong
2020-07-05 23:27 ` [PATCH 19/43] wwwatomstream: reuse $ctx as $self Eric Wong
2020-07-05 23:27 ` [PATCH 20/43] xt/httpd-async-stream: allow more options Eric Wong
2020-07-05 23:27 ` [PATCH 21/43] wwwatomstream: support async blob fetch Eric Wong
2020-07-05 23:27 ` [PATCH 22/43] wwwstream: reduce object graph depth Eric Wong
2020-07-05 23:27 ` [PATCH 23/43] wwwstream: reduce blob fetch paths for ->getline Eric Wong
2020-07-05 23:27 ` [PATCH 24/43] www: start making gzipfilter the parent response class Eric Wong
2020-07-05 23:27 ` [PATCH 25/43] remove unused/redundant zlib-related imports Eric Wong
2020-07-05 23:27 ` [PATCH 26/43] wwwstream: use parent.pm and no warnings Eric Wong
2020-07-05 23:27 ` [PATCH 27/43] wwwstream: subclass off GzipFilter Eric Wong
2020-07-05 23:27 ` [PATCH 28/43] view: make /$INBOX/$MSGID/ permalink async Eric Wong
2020-07-05 23:27 ` [PATCH 29/43] view: /$INBOX/$MSGID/t/ reads blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 30/43] view: update /$INBOX/$MSGID/T/ to be async Eric Wong
2020-07-05 23:27 ` [PATCH 31/43] feed: generate_i: eliminate pointless loop Eric Wong
2020-07-05 23:27 ` [PATCH 32/43] feed: /$INBOX/new.html fetches blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 33/43] ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs Eric Wong
2020-07-05 23:27 ` [PATCH 34/43] view: eml_entry: reduce parameters Eric Wong
2020-07-05 23:27 ` [PATCH 35/43] view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case Eric Wong
2020-07-05 23:27 ` [PATCH 36/43] wwwstream: eliminate ::response, use html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 37/43] www: update internal docs Eric Wong
2020-07-05 23:27 ` [PATCH 38/43] view: simplify eml_entry callers further Eric Wong
2020-07-05 23:27 ` [PATCH 39/43] wwwtext: simplify gzf_maybe use Eric Wong
2020-07-05 23:27 ` [PATCH 40/43] wwwattach: support async blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 41/43] gzipfilter: drop HTTP connection on bugs or data corruption Eric Wong
2020-07-05 23:27 ` [PATCH 42/43] daemon: warn on missing blobs Eric Wong
2020-07-05 23:27 ` [PATCH 43/43] gzipfilter: check http->{forward} for client disconnects Eric Wong