* [PATCH 0/4] memory reductions for WWW + solver
From: Eric Wong @ 2024-03-11 19:40 UTC
To: meta

1/4 gets rid of some overload caused by parallel solver
invocations under heavy (likely bot) traffic crawling
yhbt.net/lore with many coderepos enabled and joined to
inboxes.

2/4 is a large reduction in allocations from loading
coderepo <=> inbox associations, 4/4 is smaller.  I found 2/4
with Devel::Mwrap and noticed 4/4 while working on 2/4.

3/4 is just a doc update but I've been successfully using
jemalloc on my lore+gko mirror for a week or two, now
(and I plan to experiment with making glibc||dlmalloc more
resistant to fragmentation)

Eric Wong (4):
  www: use a dedicated limiter for blob solver
  codesearch: deduplicate {ibx_score} name pairs
  doc: tuning: note reduced fragmentation w/ jemalloc
  codesearch: deduplicate $git->{nick} field

 Documentation/public-inbox-tuning.pod |  5 +++
 examples/public-inbox-netd@.service   |  2 ++
 lib/PublicInbox/CodeSearch.pm         | 14 ++++++--
 lib/PublicInbox/SolverGit.pm          | 15 +++++----
 lib/PublicInbox/ViewVCS.pm            | 48 ++++++++++++++++++++++-----
 5 files changed, 66 insertions(+), 18 deletions(-)

* [PATCH 1/4] www: use a dedicated limiter for blob solver
From: Eric Wong @ 2024-03-11 19:40 UTC
To: meta

Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit) and is not effective for the short memory + CPU intensive
commands used for solver.

Each overall solver request is both memory + CPU intensive: it
spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.

Thus running parallel solvers from a single -netd/-httpd worker
(which have their own parallelization) results in excessive
parallelism that is both memory and CPU-bound (not network-bound)
and cascades into slowdowns for handling simpler memory/CPU-bound
requests.  Parallel solvers were also responsible for the
increased lifetime and frequency of zombies since the event loop
was too saturated to reap them.

We'll also return 503 on excessive solver queueing, since these
require an FD for the client HTTP(S) socket to be held onto.

(*) git (update-index|apply|ls-files) are all run by solver and short-lived
---
 lib/PublicInbox/SolverGit.pm | 15 ++++++-----
 lib/PublicInbox/ViewVCS.pm   | 48 +++++++++++++++++++++++++++++-------
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 4e79f750..296e7d17 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -256,6 +256,12 @@ sub update_index_result ($$) {
 	next_step($self); # onto do_git_apply
 }
 
+sub qsp_qx ($$$) {
+	my ($self, $qsp, $cb) = @_;
+	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
+	$qsp->psgi_qx($self->{psgi_env}, $self->{limiter}, $cb, $self);
+}
+
 sub prepare_index ($) {
 	my ($self) = @_;
 	my $patches = $self->{patches};
@@ -284,9 +290,8 @@ sub prepare_index ($) {
 	my $cmd = [ qw(git update-index -z --index-info) ];
 	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env}, $rdr);
 	$path_a = git_quote($path_a);
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
 	$self->{-msg} = "index prepared:\n$mode_a $oid_full\t$path_a";
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&update_index_result, $self);
+	qsp_qx $self, $qsp, \&update_index_result;
 }
 
 # pure Perl "git init"
@@ -465,8 +470,7 @@ sub apply_result ($$) { # qx_cb
 	my @cmd = qw(git ls-files -s -z);
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env});
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&ls_files_result, $self);
+	qsp_qx $self, $qsp, \&ls_files_result;
 }
 
 sub do_git_apply ($) {
@@ -495,8 +499,7 @@ sub do_git_apply ($) {
 	my $opt = { 2 => 1, -C => _tmp($self)->dirname, quiet => 1 };
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env}, $opt);
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&apply_result, $self);
+	qsp_qx $self, $qsp, \&apply_result;
 }
 
 sub di_url ($$) {
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 61329db6..790b9a2c 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -49,6 +49,10 @@ my %GIT_MODE = (
 	'160000' => 'g', # commit (gitlink)
 );
 
+# TODO: not fork safe, but we don't fork w/o exec in PublicInbox::WWW
+my (@solver_q, $solver_lim);
+my $solver_nr = 0;
+
 sub html_page ($$;@) {
 	my ($ctx, $code) = @_[0, 1];
 	my $wcb = delete $ctx->{-wcb};
@@ -614,26 +618,52 @@ sub show_blob { # git->cat_async callback
 		'</code></pre></td></tr></table>'.dbg_log($ctx), @def);
 }
 
-# GET /$INBOX/$GIT_OBJECT_ID/s/
-# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
-sub show ($$;$) {
-	my ($ctx, $oid_b, $fn) = @_;
-	my $hints = $ctx->{hints} = {};
+sub start_solver ($) {
+	my ($ctx) = @_;
 	while (my ($from, $to) = each %QP_MAP) {
 		my $v = $ctx->{qp}->{$from} // next;
-		$hints->{$to} = $v if $v ne '';
+		$ctx->{hints}->{$to} = $v if $v ne '';
 	}
-	$ctx->{fn} = $fn;
-	$ctx->{-tmp} = File::Temp->newdir("solver.$oid_b-XXXX", TMPDIR => 1);
+	$ctx->{-next_solver} = PublicInbox::OnDestroy->new($$, \&next_solver);
+	++$solver_nr;
+	$ctx->{-tmp} = File::Temp->newdir("solver.$ctx->{oid_b}-XXXX",
+						TMPDIR => 1);
 	$ctx->{lh} or open $ctx->{lh}, '+>>', "$ctx->{-tmp}/solve.log";
 	my $solver = PublicInbox::SolverGit->new($ctx->{ibx},
 						\&solve_result, $ctx);
+	$solver->{limiter} = $solver_lim;
 	$solver->{gits} //= [ $ctx->{git} ];
 	$solver->{tmp} = $ctx->{-tmp}; # share tmpdir
 	# PSGI server will call this immediately and give us a callback (-wcb)
+	$solver->solve(@$ctx{qw(env lh oid_b hints)});
+}
+
+# run the next solver job when done and DESTROY-ed
+sub next_solver {
+	--$solver_nr;
+	# XXX FIXME: client may've disconnected if it waited a long while
+	start_solver(shift(@solver_q) // return);
+}
+
+sub may_start_solver ($) {
+	my ($ctx) = @_;
+	$solver_lim //= $ctx->{www}->{pi_cfg}->limiter('codeblob');
+	if ($solver_nr >= $solver_lim->{max}) {
+		@solver_q > 128 ? html_page($ctx, 503, 'too busy')
+				: push(@solver_q, $ctx);
+	} else {
+		start_solver($ctx);
+	}
+}
+
+# GET /$INBOX/$GIT_OBJECT_ID/s/
+# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
+sub show ($$;$) {
+	my ($ctx, $oid_b, $fn) = @_;
+	@$ctx{qw(oid_b fn)} = ($oid_b, $fn);
 	sub {
 		$ctx->{-wcb} = $_[0]; # HTTP write callback
-		$solver->solve($ctx->{env}, $ctx->{lh}, $oid_b, $hints);
+		may_start_solver $ctx;
 	};
 }
 

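The start/queue/reject decision made by may_start_solver and next_solver in 1/4 can be tried in isolation.  The following is only a minimal sketch and not PublicInbox code: $MAX and $QUEUE_MAX are hypothetical stand-ins for $solver_lim->{max} and the hard-coded 128, job_done is called by hand where the patch relies on an OnDestroy callback, and a log line replaces the 503 response.

```perl
#!/usr/bin/perl
# Hypothetical sketch of a bounded solver queue; all names are illustrative.
use strict;
use warnings;

my ($MAX, $QUEUE_MAX) = (2, 3); # assumed limits, tiny on purpose
my (@queue, @log);
my $running = 0;

sub start_job {
	my ($job) = @_;
	++$running;
	push @log, "start $job";
}

# called when a running job finishes; starts the next queued job, if any
sub job_done {
	--$running;
	start_job(shift(@queue) // return);
}

# start immediately if below the limit, otherwise queue or reject
sub may_start_job {
	my ($job) = @_;
	if ($running >= $MAX) {
		@queue > $QUEUE_MAX ? push(@log, "503 $job")
				: push(@queue, $job);
	} else {
		start_job($job);
	}
}

may_start_job($_) for 1..7;
job_done(); # one of the two running jobs finishes
print "$_\n" for @log;
```

Because the check is `>' rather than `>=', the queue in this sketch accepts up to $QUEUE_MAX + 1 entries before rejecting, which mirrors the `@solver_q > 128' test in the patch.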
* [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs
From: Eric Wong @ 2024-03-11 19:40 UTC
To: meta

With my current mirror of lore + gko, this saves over 300K
allocations and brings the allocation count in this area down
to under 5K.  The reduction in AV refs saves around 45MB RAM
according to measurements done live via Devel::Mwrap.
---
 lib/PublicInbox/CodeSearch.pm | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index 1f95a726..48033bb5 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -292,6 +292,7 @@ W: cindex.$name.topdir=$self->{topdir} has no usable join data for $cfg_f
 EOM
 	my ($ekeys, $roots, $ibx2root) = @$jd{qw(ekeys roots ibx2root)};
 	my $roots2paths = roots2paths($self);
+	my %dedupe; # 50x alloc reduction w/ lore + gko mirror (Mar 2024)
 	for my $root_offs (@$ibx2root) {
 		my $ekey = shift(@$ekeys) // die 'BUG: {ekeys} empty';
 		scalar(@$root_offs) or next;
@@ -320,9 +321,15 @@ EOM
 				if (my $git = $dir2cr{$_}) {
 					$ibx_p2g{$_} = $git;
 					$ibx2self = 1;
-					$ibx->{-hide_www} or
-						push @{$git->{ibx_score}},
+					if (!$ibx->{-hide_www}) {
+						# don't stringify $nr directly
+						# to avoid long-lived PV
+						my $k = ($nr + 0)."\0".
+							($ibx + 0);
+						my $s = $dedupe{$k} //=
 							[ $nr, $ibx->{name} ];
+						push @{$git->{ibx_score}}, $s;
+					}
 					push @$gits, $git;
 				} else {
 					warn <<EOM;

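The `$dedupe{$k} //= [ ... ]' idiom in 2/4 hands out one shared array per distinct pair instead of allocating a new array for every association.  A minimal sketch with made-up data follows; it keys on the inbox name for simplicity, whereas the patch keys on ($ibx + 0), the numeric address of the inbox object.

```perl
#!/usr/bin/perl
# Hypothetical sketch: the scores and inbox names below are made up.
use strict;
use warnings;

my @input = ([ 3, 'git' ], [ 3, 'git' ], [ 1, 'meta' ], [ 3, 'git' ]);

my (%dedupe, @naive, @shared);
for my $pair (@input) {
	my ($nr, $name) = @$pair;
	push @naive, [ $nr, $name ]; # one new array per association

	# "$nr + 0" stringifies a temporary copy, as the patch does, so
	# $nr itself is not given a cached string value
	my $k = ($nr + 0)."\0".$name;
	push @shared, ($dedupe{$k} //= [ $nr, $name ]);
}

my %uniq = map { $_ => 1 } @shared; # array refs stringify to addresses
printf "naive arrays: %d, shared arrays: %d\n",
	scalar(@naive), scalar(keys %uniq);
```

The trade-off is that every holder of a shared pair sees the same array, so this only suits data that is treated as read-only after loading.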
* [PATCH 3/4] doc: tuning: note reduced fragmentation w/ jemalloc
From: Eric Wong @ 2024-03-11 19:40 UTC
To: meta

I may be mistaken, but I suspect the reason jemalloc handles
long-lived processes better than glibc is due to granularity
reduction being scaled to larger size classes.  This can waste
20% of an individual allocation, but increases the likelihood
of reuse (without splitting/consolidating into other sizes).

In other words, glibc seems to try too hard to make the best fit
for initial allocations.  This ends up being suboptimal over
time as those allocations are freed and similar (but not
identical) allocations come in.  jemalloc sacrifices the best
initial fit for better fits over a long process lifetime.
---
 Documentation/public-inbox-tuning.pod | 5 +++++
 examples/public-inbox-netd@.service   | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index 38810ce6..73246144 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -163,6 +163,11 @@ Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
 increases memory use of client sockets, be sure to account for that in
 capacity planning.
 
+Bursts of small object allocations late in process life contribute to
+fragmentation of the heap due to arenas (slabs) used internally by Perl.
+jemalloc (tested as an LD_PRELOAD on GNU/Linux) appears to reduce
+overall fragmentation compared to glibc malloc in long-lived processes.
+
 =head2 Other OS tuning knobs
 
 Linux users: the C<sys.vm.max_map_count> sysctl may need to be increased if
diff --git a/examples/public-inbox-netd@.service b/examples/public-inbox-netd@.service
index de5feea6..2330bd59 100644
--- a/examples/public-inbox-netd@.service
+++ b/examples/public-inbox-netd@.service
@@ -12,6 +12,8 @@ Wants = public-inbox-netd.socket
 After = public-inbox-netd.socket
 
 [Service]
+# An LD_PRELOAD for libjemalloc can be added here.  It currently seems
+# more resistant to fragmentation to glibc in long-lived daemons.
 Environment = PI_CONFIG=/home/pi/.public-inbox/config \
 PATH=/usr/local/bin:/usr/bin:/bin \
 TZ=UTC \

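The "can waste 20% of an individual allocation" figure in 3/4 can be checked with a little arithmetic.  The sketch below is a simplified model and not jemalloc's code: it assumes 16-byte spacing up to 128 bytes and four evenly spaced size classes per power-of-two doubling above that, and size_class is a hypothetical helper written only for this illustration.

```perl
#!/usr/bin/perl
# Simplified size-class model (an assumption, not jemalloc's implementation).
use strict;
use warnings;

sub size_class {
	my ($size) = @_;
	return int(($size + 15) / 16) * 16 if $size <= 128;
	my $group = 128; # becomes the largest power of two below $size
	$group *= 2 while $group * 2 < $size;
	my $delta = $group / 4; # class spacing within (group, 2 * group]
	return int(($size + $delta - 1) / $delta) * $delta;
}

for my $size (100, 129, 200, 257, 4097) {
	my $class = size_class($size);
	printf "%5d -> %5d (%.1f%% unused)\n", $size, $class,
		100 * ($class - $size) / $class;
}
```

Under this model the worst case is a request one byte above a power of two, such as 129 bytes rounding up to 160, which leaves just under 20% of the allocation unused.  The flip side is that a later request of any size from 129 to 160 bytes can reuse that same slot without splitting or merging.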
* [PATCH 4/4] codesearch: deduplicate $git->{nick} field
From: Eric Wong @ 2024-03-11 19:40 UTC
To: meta

While PublicInbox::Config is responsible for some instances of
setting $git->{nick}, more PublicInbox::Git objects may be
created from loading the cindex and we should do our best to
reuse that memory, too.

Followup-to: 84ed7ec1c887 (dedupe inbox names, coderepo nicks + git dirs, 2024-03-04)
---
 lib/PublicInbox/CodeSearch.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index 48033bb5..e5fa4480 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -283,7 +283,8 @@ EOM
 		$nick =~ s!$lre!$nick_pfx!s or next;
 		$dir2cr{$p} = $coderepos->{$nick} //= do {
 			my $git = PublicInbox::Git->new($p);
-			$git->{nick} = $nick; # for git->pub_urls
+			my %dedupe = ($nick => undef);
+			($git->{nick}) = keys %dedupe; # for git->pub_urls
 			$git;
 		};
 	}