* [PATCH 0/4] WWW-related memory savings
@ 2021-10-09 12:03 Eric Wong
2021-10-09 12:03 ` [PATCH 1/4] solver_git: shorten scalar lifetimes Eric Wong
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
To: meta
Some things I noticed while tracking down our own
reference cycle leak and the Encode <= 3.12 memory leak.
There's more aggressive stuff I'm testing, too, but
I've yet to check throughput performance.
Eric Wong (4):
solver_git: shorten scalar lifetimes
view: discard Eml->{bdy} when done using
http: avoid Perl target cache for psgi.input
view: save memory by dropping smsg->{from_name} on use
lib/PublicInbox/HTTP.pm | 30 ++++++------------------------
lib/PublicInbox/SearchView.pm | 2 +-
lib/PublicInbox/Smsg.pm | 9 +++------
lib/PublicInbox/SolverGit.pm | 8 ++++----
lib/PublicInbox/View.pm | 4 +++-
5 files changed, 17 insertions(+), 36 deletions(-)
* [PATCH 1/4] solver_git: shorten scalar lifetimes
2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
2021-10-09 12:03 ` [PATCH 2/4] view: discard Eml->{bdy} when done using Eric Wong
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
To: meta
Some of these scalar buffers may be large patches, so try
to keep them as short-lived as possible to reduce memory
pressure.
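To illustrate the idea outside of SolverGit, here is a minimal sketch.
The first_index_line() helper and its sample input are hypothetical and
not part of public-inbox; only the pattern (extract what is needed, then
"undef" the large scalar before doing more work) mirrors this patch:

```perl
use strict;
use warnings;

# Hypothetical stand-in for extract_diff(): pull a small result out of a
# large buffer, then release the buffer before doing further work.
sub first_index_line {
	my ($big) = @_; # scalar ref, so the caller's buffer is not copied here
	my $s = $$big; # large temporary, like the decoded MIME part
	$s =~ /^(index [^\n]*)$/m or return;
	my $hdr = $1; # keep only the small piece we need
	undef $s; # release the large buffer now, not at scope exit
	# ... expensive work which no longer needs $s would go here ...
	$hdr;
}

my $patch = "diff --git a/x b/x\nindex b0cd0f2c..5d5060f4 100644\n" .
	("+line\n" x 1000);
my $hdr = first_index_line(\$patch);
print "$hdr\n";
```

How much memory actually returns to the allocator depends on the Perl
version and malloc implementation, so treat this as a sketch rather
than a measurement.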
---
lib/PublicInbox/SolverGit.pm | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index b0cd0f2c..5d5060f4 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -111,8 +111,6 @@ sub extract_diff ($$) {
my ($self, $want, $smsg) = @$arg;
my ($part) = @$p; # ignore $depth and @idx;
my $ct = $part->content_type || 'text/plain';
- my ($s, undef) = msg_part_text($part, $ct);
- defined $s or return;
my $post = $want->{oid_b};
my $pre = $want->{oid_a};
if (!defined($pre) || $pre !~ /\A[a-f0-9]+\z/) {
@@ -122,11 +120,12 @@ sub extract_diff ($$) {
# Email::MIME::Encodings forces QP to be CRLF upon decoding,
# change it back to LF:
my $cte = $part->header('Content-Transfer-Encoding') || '';
+ my ($s, undef) = msg_part_text($part, $ct);
+ defined $s or return;
+ delete $part->{bdy};
if ($cte =~ /\bquoted-printable\b/i && $part->crlf eq "\n") {
$s =~ s/\r\n/\n/sg;
}
-
-
$s =~ m!( # $1 start header lines we save for debugging:
# everything before ^index is optional, but we don't
@@ -169,6 +168,7 @@ sub extract_diff ($$) {
# because git-apply(1) handles that case, too
(?:^(?:[\@\+\x20\-\\][^\n]*|)$LF)+
)!smx or return;
+ undef $s; # free memory
my $di = {
hdr_lines => $1,
* [PATCH 2/4] view: discard Eml->{bdy} when done using
2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
2021-10-09 12:03 ` [PATCH 1/4] solver_git: shorten scalar lifetimes Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
2021-10-09 12:03 ` [PATCH 3/4] http: avoid Perl target cache for psgi.input Eric Wong
2021-10-09 12:03 ` [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use Eric Wong
3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
To: meta
We can release the raw body buffer once we've obtained a copy of
the decoded buffer. This reduces memory pressure ahead of some
expensive diff processing.
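A rough sketch of the pattern, assuming a plain hash as a stand-in for
the Eml object (the real {bdy} field may be stored differently, and the
"hdr" key below is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical part object; only the {bdy} key name mirrors the patch.
my $part = {
	hdr => "Content-Type: text/plain\n",
	bdy => 'x' x (1 << 20), # 1 MiB raw body
};

my $size = length($part->{bdy}); # take what we need from the raw body
delete $part->{bdy}; # drop the buffer; $part itself stays usable

print "size=$size, bdy still present: ",
	(exists $part->{bdy} ? 'yes' : 'no'), "\n";
```

The object outlives the body buffer here, which is the point: the part
can stay around for later callbacks without pinning the raw bytes.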
---
lib/PublicInbox/View.pm | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 64e73234..a6944b80 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -533,6 +533,7 @@ sub attach_link ($$$$;$) {
my $nl = $idx eq '1' ? '' : "\n"; # like join("\n", ...)
my $size = length($part->body);
+ delete $part->{bdy}; # save memory
# hide attributes normally, unless we want to aid users in
# spotting MUA problems:
@@ -632,6 +633,7 @@ sub add_text_body { # callback for each_part
attach_link($ctx, $ct, $p, $fn, $err);
$$rv .= "\n";
}
+ delete $part->{bdy}; # save memory
foreach my $cur (@sections) {
if ($cur =~ /\A>/) {
# we use a <span> here to allow users to specify
* [PATCH 3/4] http: avoid Perl target cache for psgi.input
2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
2021-10-09 12:03 ` [PATCH 1/4] solver_git: shorten scalar lifetimes Eric Wong
2021-10-09 12:03 ` [PATCH 2/4] view: discard Eml->{bdy} when done using Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
2021-10-09 12:03 ` [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use Eric Wong
3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
To: meta
Use syswrite to populate env->{psgi.input}. The substr() call
in IO::Handle->write triggers Perl's target/scratchpad and
results in a permanent allocation. Since this is a cold path,
that allocation is pointless, and syswrite() can already write a
substring.
Allowing Perl to cache a large allocation in a cold path only
results in fragmentation and wasted RAM.
write(2) on a regular file won't result in short writes
unless the FS quotas or free space limits are hit, or the buffer
is close to overflowing (e.g. the 0x7ffff000-byte Linux limit).
Since our HTTP server will never buffer that much in RAM,
there's no need to retry syswrite nor rely on the retrying
implicit in IO::Handle->write and the "print" perlop.
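A small standalone sketch of the syswrite() form, assuming an anonymous
temporary file in place of the real tmpfile() helper; the buffer
contents and the 11-byte length are made up for illustration:

```perl
use strict;
use warnings;

# Anonymous temporary file, standing in for the psgi.input tmpfile
open(my $fh, '+>', undef) or die "open: $!";

my $rbuf = "hello world, trailing bytes belong to the next request";
my $len = 11; # bytes still expected for this request body

# syswrite() accepts a LENGTH argument, so no substr() copy is needed.
# It returns the number of bytes written, or undef on error.
my $w = syswrite($fh, $rbuf, $len) // die "syswrite: $!";

# syswrite() bypasses PerlIO buffering, so the data can be read back
# right away with sysseek() + sysread().
sysseek($fh, 0, 0) or die "sysseek: $!";
defined(sysread($fh, my $got, 4096)) or die "sysread: $!";
print "wrote $w bytes: $got\n";
```

If LENGTH exceeds the data available in the scalar, syswrite() writes
only what is available, which covers the clamping xwrite() did by hand.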
---
lib/PublicInbox/HTTP.pm | 30 ++++++------------------------
1 file changed, 6 insertions(+), 24 deletions(-)
diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index b2c74cf3..82c2b200 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -26,7 +26,6 @@ use Plack::HTTPParser qw(parse_http_request); # XS or pure Perl
use Plack::Util;
use HTTP::Status qw(status_message);
use HTTP::Date qw(time2str);
-use IO::Handle; # ->write
use PublicInbox::DS qw(msg_more);
use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
use PublicInbox::Tmpfile;
@@ -117,15 +116,6 @@ sub rbuf_process {
$len ? read_input($self, $rbuf) : app_dispatch($self, undef, $rbuf);
}
-# IO::Handle::write returns boolean, this returns bytes written:
-sub xwrite ($$$) {
- my ($fh, $rbuf, $max) = @_;
- my $w = length($$rbuf);
- $w = $max if $w > $max;
- $fh->write($$rbuf, $w) or return;
- $w;
-}
-
sub read_input ($;$) {
my ($self, $rbuf) = @_;
$rbuf //= $self->{rbuf} // (\(my $x = ''));
@@ -138,7 +128,7 @@ sub read_input ($;$) {
while ($len > 0) {
if ($$rbuf ne '') {
- my $w = xwrite($input, $rbuf, $len);
+ my $w = syswrite($input, $$rbuf, $len);
return write_err($self, $len) unless $w;
$len -= $w;
die "BUG: $len < 0 (w=$w)" if $len < 0;
@@ -333,12 +323,6 @@ sub response_write {
}
}
-sub input_tmpfile ($) {
- my $input = tmpfile('http.input', $_[0]->{sock}) or return;
- $input->autoflush(1);
- $input;
-}
-
sub input_prepare {
my ($self, $env) = @_;
my ($input, $len);
@@ -354,24 +338,22 @@ sub input_prepare {
return quit($self, 400) if $hte !~ /\Achunked\z/i;
$len = CHUNK_START;
- $input = input_tmpfile($self);
+ $input = tmpfile('http.input', $self->{sock});
} else {
$len = $env->{CONTENT_LENGTH};
if (defined $len) {
# rfc7230 3.3.3.4
return quit($self, 400) if $len !~ /\A[0-9]+\z/;
-
return quit($self, 413) if $len > $MAX_REQUEST_BUFFER;
- $input = $len ? input_tmpfile($self) : $null_io;
+ $input = $len ? tmpfile('http.input', $self->{sock})
+ : $null_io;
} else {
$input = $null_io;
}
}
# TODO: expire idle clients on ENFILE / EMFILE
- return unless $input;
-
- $env->{'psgi.input'} = $input;
+ $env->{'psgi.input'} = $input // return;
$self->{env} = $env;
$self->{input_left} = $len || 0;
}
@@ -441,7 +423,7 @@ sub read_input_chunked { # unlikely...
# drain the current chunk
until ($len <= 0) {
if ($$rbuf ne '') {
- my $w = xwrite($input, $rbuf, $len);
+ my $w = syswrite($input, $$rbuf, $len);
return write_err($self, "$len chunk") if !$w;
$len -= $w;
if ($len == 0) {
* [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use
2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
` (2 preceding siblings ...)
2021-10-09 12:03 ` [PATCH 3/4] http: avoid Perl target cache for psgi.input Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
To: meta
We'll also save a few LoC when generating it. $smsg objects can
linger a while when rendering large threads, so saving a few
bytes here can add up to several hundred KB saved.
I noticed this while chasing the ref cycle leak in commit
b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03).
While there's no longer a leak, releasing memory earlier can
allow it to be reused sooner and reduce both memory traffic and
memory pressure.
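A sketch of the two idioms involved, using a plain hash as a
hypothetical $smsg and a trivial regexp in place of
PublicInbox::Address::names (which does real address parsing):

```perl
use strict;
use warnings;

# Hypothetical search result; field names follow the patch.
my $smsg = {
	from => 'Alice <alice@example.com>',
	subject => 'hello',
	tid => 1, to => 'meta', cc => '', bytes => 123, lines => 4,
};

# delete on a hash slice returns the deleted values in slice order,
# so the first element of the list is the old {from} value.
my ($f) = delete @$smsg{qw(from tid to cc bytes lines)};

# Simplified stand-in for PublicInbox::Address::names:
(my $name = $f // '') =~ s/\s*<[^>]*>\z//;
$smsg->{from_name} = $name;

# Later, at render time: take the value and drop the field in one step.
my $out = delete $smsg->{from_name};
print "$out\n";
print join(',', sort keys %$smsg), "\n";
```

After the final delete only {subject} remains in the hash, so a
lingering $smsg holds one fewer string per message.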
---
lib/PublicInbox/SearchView.pm | 2 +-
lib/PublicInbox/Smsg.pm | 9 +++------
lib/PublicInbox/View.pm | 2 +-
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 91196cca..e74ddb90 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -122,7 +122,7 @@ sub mset_summary {
$min = $pct;
my $s = ascii_html($smsg->{subject});
- my $f = ascii_html($smsg->{from_name});
+ my $f = ascii_html(delete $smsg->{from_name});
if ($obfs_ibx) {
obfuscate_addrs($obfs_ibx, $s);
obfuscate_addrs($obfs_ibx, $f);
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index fb28eff7..a2f54507 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -57,15 +57,12 @@ sub load_from_data ($$) {
sub psgi_cull ($) {
my ($self) = @_;
- # ghosts don't have ->{from}
- my $from = delete($self->{from}) // '';
- my @n = PublicInbox::Address::names($from);
- $self->{from_name} = join(', ', @n);
-
# drop NNTP-only fields which aren't relevant to PSGI results:
# saves ~80K on a 200 item search result:
# TODO: we may need to keep some of these for JMAP...
- delete @$self{qw(tid to cc bytes lines)};
+ my ($f) = delete @$self{qw(from tid to cc bytes lines)};
+ # ghosts don't have ->{from}
+ $self->{from_name} = join(', ', PublicInbox::Address::names($f // ''));
$self;
}
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index a6944b80..116aa641 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -978,7 +978,7 @@ sub skel_dump { # walk_thread callback
$$skel .= delete($ctx->{sl_note}) || '';
}
- my $f = ascii_html($smsg->{from_name});
+ my $f = ascii_html(delete $smsg->{from_name});
my $obfs_ibx = $ctx->{-obfs_ibx};
obfuscate_addrs($obfs_ibx, $f) if $obfs_ibx;