* [PATCH 0/3] start tidying up gzip-related code
@ 2019-11-16 2:34 Eric Wong
2019-11-16 2:34 ` [PATCH 1/3] mbox: unused mid_clean import Eric Wong
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-16 2:34 UTC
To: meta
Starting with the mbox.gz stuff, first. Getting rid of
Plack::Middleware::Deflater is a long-term goal since we can
take advantage of doing gzip during HTML/XML rendering to
reduce memory usage.
Eric Wong (3):
mbox: unused mid_clean import
mbox: split mboxgz out into a separate file
mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
MANIFEST | 1 +
lib/PublicInbox/Mbox.pm | 68 +++----------------------------------
lib/PublicInbox/MboxGz.pm | 71 +++++++++++++++++++++++++++++++++++++++
3 files changed, 76 insertions(+), 64 deletions(-)
create mode 100644 lib/PublicInbox/MboxGz.pm
* [PATCH 1/3] mbox: unused mid_clean import
2019-11-16 2:34 [PATCH 0/3] start tidying up gzip-related code Eric Wong
@ 2019-11-16 2:34 ` Eric Wong
2019-11-16 2:34 ` [PATCH 2/3] mbox: split mboxgz out into a separate file Eric Wong
2019-11-16 2:34 ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
2 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-16 2:34 UTC
To: meta
We're gradually phasing mid_clean out (in favor of mids()).
---
lib/PublicInbox/Mbox.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 67b671f5..9e808c09 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -10,7 +10,7 @@
package PublicInbox::Mbox;
use strict;
use warnings;
-use PublicInbox::MID qw/mid_clean mid_escape/;
+use PublicInbox::MID qw/mid_escape/;
use PublicInbox::Hval qw/to_filename/;
use Email::Simple;
use Email::MIME::Encode;
* [PATCH 2/3] mbox: split mboxgz out into a separate file
2019-11-16 2:34 [PATCH 0/3] start tidying up gzip-related code Eric Wong
2019-11-16 2:34 ` [PATCH 1/3] mbox: unused mid_clean import Eric Wong
@ 2019-11-16 2:34 ` Eric Wong
2019-11-16 2:34 ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
2 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-16 2:34 UTC
To: meta
This will make switching to Compress::Raw::Zlib easier, since
a separate file can "use" it and import its constants directly.
---
MANIFEST | 1 +
lib/PublicInbox/Mbox.pm | 64 ++-------------------------------------
lib/PublicInbox/MboxGz.pm | 64 +++++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+), 62 deletions(-)
create mode 100644 lib/PublicInbox/MboxGz.pm
diff --git a/MANIFEST b/MANIFEST
index ef8538b4..689d3d4e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -119,6 +119,7 @@ lib/PublicInbox/MDA.pm
lib/PublicInbox/MID.pm
lib/PublicInbox/MIME.pm
lib/PublicInbox/Mbox.pm
+lib/PublicInbox/MboxGz.pm
lib/PublicInbox/MsgIter.pm
lib/PublicInbox/MsgTime.pm
lib/PublicInbox/Msgmap.pm
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 9e808c09..42ed8c5d 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -136,7 +136,7 @@ sub msg_body ($) {
sub thread_mbox {
my ($ctx, $over, $sfx) = @_;
- eval { require IO::Compress::Gzip };
+ eval { require PublicInbox::MboxGz };
return sub { need_gzip(@_) } if $@;
my $mid = $ctx->{mid};
my $msgs = $over->get_thread($mid, {});
@@ -196,7 +196,7 @@ sub mbox_all_ids {
sub mbox_all {
my ($ctx, $query) = @_;
- eval { require IO::Compress::Gzip };
+ eval { require PublicInbox::MboxGz };
return sub { need_gzip(@_) } if $@;
return mbox_all_ids($ctx) if $query eq '';
my $opts = { mset => 2 };
@@ -239,63 +239,3 @@ EOF
}
1;
-
-package PublicInbox::MboxGz;
-use strict;
-use warnings;
-use PublicInbox::Hval qw/to_filename/;
-
-sub new {
- my ($class, $ctx, $cb) = @_;
- my $buf = '';
- $ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
- bless {
- buf => \$buf,
- gz => IO::Compress::Gzip->new(\$buf, Time => 0),
- cb => $cb,
- ctx => $ctx,
- }, $class;
-}
-
-sub response {
- my ($class, $ctx, $cb, $fn) = @_;
- my $body = $class->new($ctx, $cb);
- # http://www.iana.org/assignments/media-types/application/gzip
- my @h = qw(Content-Type application/gzip);
- if ($fn) {
- $fn = to_filename($fn);
- push @h, 'Content-Disposition', "inline; filename=$fn.mbox.gz";
- }
- [ 200, \@h, $body ];
-}
-
-# called by Plack::Util::foreach or similar
-sub getline {
- my ($self) = @_;
- my $ctx = $self->{ctx} or return;
- my $gz = $self->{gz};
- while (my $smsg = $self->{cb}->()) {
- my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
- my $h = Email::Simple->new($mref)->header_obj;
- $gz->write(PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}));
- $gz->write(PublicInbox::Mbox::msg_body($$mref));
-
- my $bref = $self->{buf};
- if (length($$bref) >= 8192) {
- my $ret = $$bref; # copy :<
- ${$self->{buf}} = '';
- return $ret;
- }
-
- # be fair to other clients on public-inbox-httpd:
- return '';
- }
- delete($self->{gz})->close;
- # signal that we're done and can return undef next call:
- delete $self->{ctx};
- ${delete $self->{buf}};
-}
-
-sub close {} # noop
-
-1;
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
new file mode 100644
index 00000000..2919ad6a
--- /dev/null
+++ b/lib/PublicInbox/MboxGz.pm
@@ -0,0 +1,64 @@
+# Copyright (C) 2015-2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+package PublicInbox::MboxGz;
+use strict;
+use warnings;
+use Email::Simple;
+use PublicInbox::Hval qw/to_filename/;
+use PublicInbox::Mbox;
+use IO::Compress::Gzip;
+
+sub new {
+ my ($class, $ctx, $cb) = @_;
+ my $buf = '';
+ $ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
+ bless {
+ buf => \$buf,
+ gz => IO::Compress::Gzip->new(\$buf, Time => 0),
+ cb => $cb,
+ ctx => $ctx,
+ }, $class;
+}
+
+sub response {
+ my ($class, $ctx, $cb, $fn) = @_;
+ my $body = $class->new($ctx, $cb);
+ # http://www.iana.org/assignments/media-types/application/gzip
+ my @h = qw(Content-Type application/gzip);
+ if ($fn) {
+ $fn = to_filename($fn);
+ push @h, 'Content-Disposition', "inline; filename=$fn.mbox.gz";
+ }
+ [ 200, \@h, $body ];
+}
+
+# called by Plack::Util::foreach or similar
+sub getline {
+ my ($self) = @_;
+ my $ctx = $self->{ctx} or return;
+ my $gz = $self->{gz};
+ while (my $smsg = $self->{cb}->()) {
+ my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
+ my $h = Email::Simple->new($mref)->header_obj;
+ $gz->write(PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}));
+ $gz->write(PublicInbox::Mbox::msg_body($$mref));
+
+ my $bref = $self->{buf};
+ if (length($$bref) >= 8192) {
+ my $ret = $$bref; # copy :<
+ ${$self->{buf}} = '';
+ return $ret;
+ }
+
+ # be fair to other clients on public-inbox-httpd:
+ return '';
+ }
+ delete($self->{gz})->close;
+ # signal that we're done and can return undef next call:
+ delete $self->{ctx};
+ ${delete $self->{buf}};
+}
+
+sub close {} # noop
+
+1;
* [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
2019-11-16 2:34 [PATCH 0/3] start tidying up gzip-related code Eric Wong
2019-11-16 2:34 ` [PATCH 1/3] mbox: unused mid_clean import Eric Wong
2019-11-16 2:34 ` [PATCH 2/3] mbox: split mboxgz out into a separate file Eric Wong
@ 2019-11-16 2:34 ` Eric Wong
2019-11-19 13:57 ` SZEDER Gábor
2 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2019-11-16 2:34 UTC
To: meta
IO::Compress::Gzip is a wrapper around Compress::Raw::Zlib,
anyways, and being able to easily detach buffers to return them
via ->getline is nice. This results in a 1-2% performance
improvement when fetching giant mboxes.
---
lib/PublicInbox/Mbox.pm | 2 +-
lib/PublicInbox/MboxGz.pm | 41 +++++++++++++++++++++++----------------
2 files changed, 25 insertions(+), 18 deletions(-)
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 42ed8c5d..42cedd15 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -231,7 +231,7 @@ sub need_gzip {
my $title = 'gzipped mbox not available';
$fh->write(<<EOF);
<html><head><title>$title</title><body><pre>$title
-The administrator needs to install the IO::Compress::Gzip Perl module
+The administrator needs to install the Compress::Raw::Zlib Perl module
to support gzipped mboxes.
<a href="../">Return to index</a></pre></body></html>
EOF
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index 2919ad6a..2a55447f 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -7,17 +7,15 @@ use Email::Simple;
use PublicInbox::Hval qw/to_filename/;
use PublicInbox::Mbox;
use IO::Compress::Gzip;
+use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
sub new {
my ($class, $ctx, $cb) = @_;
- my $buf = '';
$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
- bless {
- buf => \$buf,
- gz => IO::Compress::Gzip->new(\$buf, Time => 0),
- cb => $cb,
- ctx => $ctx,
- }, $class;
+ my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
+ $err == Z_OK or die "Deflate->new failed: $err";
+ bless { gz => $gz, cb => $cb, ctx => $ctx }, $class;
}
sub response {
@@ -32,31 +30,40 @@ sub response {
[ 200, \@h, $body ];
}
+sub gzip_fail ($$) {
+ my ($ctx, $err) = @_;
+ $ctx->{env}->{'psgi.errors'}->print("deflate failed: $err\n");
+ '';
+}
+
# called by Plack::Util::foreach or similar
sub getline {
my ($self) = @_;
my $ctx = $self->{ctx} or return;
my $gz = $self->{gz};
+ my $buf = delete($self->{buf});
while (my $smsg = $self->{cb}->()) {
my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
my $h = Email::Simple->new($mref)->header_obj;
- $gz->write(PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}));
- $gz->write(PublicInbox::Mbox::msg_body($$mref));
- my $bref = $self->{buf};
- if (length($$bref) >= 8192) {
- my $ret = $$bref; # copy :<
- ${$self->{buf}} = '';
- return $ret;
- }
+ my $err = $gz->deflate(
+ PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}),
+ $buf);
+ return gzip_fail($ctx, $err) if $err != Z_OK;
+
+ $err = $gz->deflate(PublicInbox::Mbox::msg_body($$mref), $buf);
+ return gzip_fail($ctx, $err) if $err != Z_OK;
+
+ return $buf if length($buf) >= 8192;
# be fair to other clients on public-inbox-httpd:
+ $self->{buf} = $buf;
return '';
}
- delete($self->{gz})->close;
# signal that we're done and can return undef next call:
delete $self->{ctx};
- ${delete $self->{buf}};
+ my $err = $gz->flush($buf, Z_FINISH);
+ $err == Z_OK ? $buf : gzip_fail($ctx, $err);
}
sub close {} # noop
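
For readers unfamiliar with Compress::Raw::Zlib, here is a minimal
standalone sketch of the streaming pattern this patch relies on. It is
not the patch's code: gzip_chunks() is a hypothetical helper and the
sample input is made up. It only shows that a Deflate object created
with -WindowBits => 15 + 16 emits a gzip stream, and that -AppendOutput
lets repeated deflate() calls accumulate into one scalar which the
caller owns and can hand off without copying out of a wrapper object.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib qw(Z_FINISH Z_OK);

# gzip_chunks() is a hypothetical helper for illustration only.
# It compresses every chunk it is given into a single gzip stream.
sub gzip_chunks {
	# 15 + 16 selects the maximum window size plus a gzip header/trailer;
	# AppendOutput makes deflate() and flush() append to $out.
	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
		-WindowBits => 15 + 16,
		-AppendOutput => 1,
	);
	$err == Z_OK or die "Deflate->new failed: $err";

	my $out = '';
	for my $chunk (@_) {
		$err = $gz->deflate($chunk, $out);
		$err == Z_OK or die "deflate failed: $err";
	}

	# Z_FINISH writes any pending data and the gzip trailer
	$err = $gz->flush($out, Z_FINISH);
	$err == Z_OK or die "flush failed: $err";
	$out;
}

my $demo = gzip_chunks("first message\n", "second message\n" x 100);
print length($demo), " bytes of gzip output\n";
```

In a ->getline style body, the same $out scalar could be returned as
soon as it grows past some threshold (the patch uses 8192 bytes) and
replaced with a fresh one for the next call.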
* Re: [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
2019-11-16 2:34 ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
@ 2019-11-19 13:57 ` SZEDER Gábor
2019-11-19 20:12 ` Eric Wong
0 siblings, 1 reply; 6+ messages in thread
From: SZEDER Gábor @ 2019-11-19 13:57 UTC
To: Eric Wong; +Cc: meta
Hi,
On Sat, Nov 16, 2019 at 02:34:39AM +0000, Eric Wong wrote:
> IO::Compress::Gzip is a wrapper around Compress::Raw::Zlib,
> anyways, and being able to easily detach buffers to return them
> via ->getline is nice. This results in a 1-2% performance
> improvement when fetching giant mboxes.
I've just stumbled upon an issue that I suspect to be related to this
patch series (or maybe just a strange coincidence...).
When trying to download a mbox.gz with 'wget' I get a "501 Not
Implemented", e.g.:
$ wget https://public-inbox.org/meta/20191116023439.32410-1-e@80x24.org/t.mbox.gz
--2019-11-19 14:53:37-- https://public-inbox.org/meta/20191116023439.32410-1-e@80x24.org/t.mbox.gz
Resolving public-inbox.org (public-inbox.org)... 64.71.152.64, 2600:3c01::f03c:91ff:fe96:f5d6
Connecting to public-inbox.org (public-inbox.org)|64.71.152.64|:443... connected.
HTTP request sent, awaiting response... 501 Not Implemented
2019-11-19 14:53:38 ERROR 501: Not Implemented.
When I try to do that with Firefox, I get:
gzipped mbox not available
The administrator needs to install the Compress::Raw::Zlib Perl module
to support gzipped mboxes.
Return to index
Thanks,
Gábor
* Re: [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
2019-11-19 13:57 ` SZEDER Gábor
@ 2019-11-19 20:12 ` Eric Wong
0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-19 20:12 UTC
To: SZEDER Gábor; +Cc: meta
SZEDER Gábor <szeder.dev@gmail.com> wrote:
> I've just stumbled upon an issue that I suspect to be related to this
> patch series (or maybe just a strange coincidence...).
>
> When trying to download a mbox.gz with 'wget' I get a "501 Not
> Implemented", e.g.:
Thanks, fixed now. It's a bug in the build/install since
PublicInbox/MboxGz.pm was not installed (being a new file).
I made commit 4c20de0694d06ff3a5f963d7f51d509319060b50
("Makefile.PL: add dependency on MANIFEST contents") to
avoid that bug, but apparently it wasn't enough...
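
The report above shows why a bare "eval { require ... }" can mislead:
the error page named Compress::Raw::Zlib, while the file that was
actually missing was PublicInbox/MboxGz.pm. A rough sketch of one way
to tell the two cases apart is to look at $@ after the failed require.
missing_module() below is a hypothetical helper, not part of
public-inbox, and it assumes perl's usual "Can't locate Foo/Bar.pm in
@INC" wording.

```perl
use strict;
use warnings;

# missing_module() is a hypothetical helper for illustration only.
# Returns '' if $mod loads.  Otherwise returns the name of the module
# perl could not find (possibly a dependency of $mod rather than $mod
# itself), or the raw error text if loading failed for another reason.
sub missing_module {
	my ($mod) = @_;
	(my $file = "$mod.pm") =~ s{::}{/}g;
	return '' if eval { require $file; 1 };
	my $err = $@;
	if ($err =~ m{\ACan't locate (\S+)\.pm in \@INC}) {
		(my $missing = $1) =~ s{/}{::}g;
		return $missing;
	}
	$err;
}

# "strict" always loads; the second name is made up and should not exist
for my $mod (qw(strict Hypothetical::Missing::Module)) {
	my $missing = missing_module($mod);
	print "$mod: ", ($missing eq '' ? 'loaded' : "missing $missing"), "\n";
}
```

An error page built from that result would have pointed at
PublicInbox::MboxGz here, which hints at an incomplete install rather
than a missing CPAN dependency.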
Thread overview: 6+ messages
2019-11-16 2:34 [PATCH 0/3] start tidying up gzip-related code Eric Wong
2019-11-16 2:34 ` [PATCH 1/3] mbox: unused mid_clean import Eric Wong
2019-11-16 2:34 ` [PATCH 2/3] mbox: split mboxgz out into a separate file Eric Wong
2019-11-16 2:34 ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
2019-11-19 13:57 ` SZEDER Gábor
2019-11-19 20:12 ` Eric Wong