unofficial mirror of meta@public-inbox.org
* [PATCH 0/4] memory reductions for WWW + solver
@ 2024-03-11 19:40 Eric Wong
  2024-03-11 19:40 ` [PATCH 1/4] www: use a dedicated limiter for blob solver Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2024-03-11 19:40 UTC (permalink / raw)
  To: meta

1/4 gets rid of some overload caused by parallel solver
invocations under heavy (likely bot) traffic crawling
yhbt.net/lore with many coderepos enabled and joined
to inboxes.

2/4 is a large reduction in allocations from loading
coderepo <=> inbox associations, 4/4 is smaller.
I found 2/4 with Devel::Mwrap and noticed 4/4 while
working on 2/4.

3/4 is just a doc update, but I've been successfully using
jemalloc on my lore+gko mirror for a week or two now
(and I plan to experiment with making glibc||dlmalloc more
resistant to fragmentation).

Eric Wong (4):
  www: use a dedicated limiter for blob solver
  codesearch: deduplicate {ibx_score} name pairs
  doc: tuning: note reduced fragmentation w/ jemalloc
  codesearch: deduplicate $git->{nick} field

 Documentation/public-inbox-tuning.pod |  5 +++
 examples/public-inbox-netd@.service   |  2 ++
 lib/PublicInbox/CodeSearch.pm         | 14 ++++++--
 lib/PublicInbox/SolverGit.pm          | 15 +++++----
 lib/PublicInbox/ViewVCS.pm            | 48 ++++++++++++++++++++++-----
 5 files changed, 66 insertions(+), 18 deletions(-)


* [PATCH 1/4] www: use a dedicated limiter for blob solver
  2024-03-11 19:40 [PATCH 0/4] memory reductions for WWW + solver Eric Wong
@ 2024-03-11 19:40 ` Eric Wong
  2024-03-11 19:40 ` [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2024-03-11 19:40 UTC (permalink / raw)
  To: meta

Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit) and is not effective for the short-lived, memory- and
CPU-intensive commands used by the solver.

Each overall solver request is both memory + CPU intensive: it
spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.

Thus running parallel solvers from a single -netd/-httpd worker
(which have their own parallelization) results in excessive
parallelism that is both memory- and CPU-bound (not network-bound)
and cascades into slowdowns for handling simpler memory/CPU-bound
requests.  Parallel solvers were also responsible for the
increased lifetime and frequency of zombies since the event loop
was too saturated to reap them.

We'll also return 503 on excessive solver queueing, since each
queued request requires holding onto an FD for the client HTTP(S)
socket.

(*) git (update-index|apply|ls-files) are all run by solver and
    short-lived
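
The queueing policy described above can be sketched in isolation.
This is only a minimal model, not the code in the patch: the names
($MAX_RUNNING, $MAX_QUEUED, may_start_job, finish_job) and the tiny
limits are hypothetical, and finish_job stands in for the destructor
callback which starts the next queued solver.

```perl
use strict;
use warnings;

# Hypothetical limits; the real values come from the limiter config
my $MAX_RUNNING = 2; # jobs allowed to run at once
my $MAX_QUEUED = 3;  # jobs allowed to wait before we answer 503
my (@queue, @log);
my $running = 0;

sub start_job {
	my ($job) = @_;
	++$running;
	push @log, "start $job";
}

# called when a job completes: free the slot and start the next job
sub finish_job {
	my ($job) = @_;
	--$running;
	push @log, "finish $job";
	my $next = shift(@queue) // return;
	start_job($next);
}

# run immediately, queue, or reject when the queue is already full
sub may_start_job {
	my ($job) = @_;
	if ($running >= $MAX_RUNNING) {
		if (@queue >= $MAX_QUEUED) {
			push @log, "503 $job";
			return 503;
		}
		push @queue, $job;
		return 'queued';
	}
	start_job($job);
	'started';
}

# 'a' and 'b' start, 'c'..'e' wait, 'f' is rejected
my @status = map { may_start_job($_) } qw(a b c d e f);
finish_job('a'); # frees a slot, so 'c' starts next
```

Rejecting early keeps the number of idle client sockets bounded,
at the cost of making some clients retry.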
---
 lib/PublicInbox/SolverGit.pm | 15 ++++++-----
 lib/PublicInbox/ViewVCS.pm   | 48 +++++++++++++++++++++++++++++-------
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 4e79f750..296e7d17 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -256,6 +256,12 @@ sub update_index_result ($$) {
 	next_step($self); # onto do_git_apply
 }
 
+sub qsp_qx ($$$) {
+	my ($self, $qsp, $cb) = @_;
+	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
+	$qsp->psgi_qx($self->{psgi_env}, $self->{limiter}, $cb, $self);
+}
+
 sub prepare_index ($) {
 	my ($self) = @_;
 	my $patches = $self->{patches};
@@ -284,9 +290,8 @@ sub prepare_index ($) {
 	my $cmd = [ qw(git update-index -z --index-info) ];
 	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env}, $rdr);
 	$path_a = git_quote($path_a);
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
 	$self->{-msg} = "index prepared:\n$mode_a $oid_full\t$path_a";
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&update_index_result, $self);
+	qsp_qx $self, $qsp, \&update_index_result;
 }
 
 # pure Perl "git init"
@@ -465,8 +470,7 @@ sub apply_result ($$) { # qx_cb
 	my @cmd = qw(git ls-files -s -z);
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env});
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&ls_files_result, $self);
+	qsp_qx $self, $qsp, \&ls_files_result;
 }
 
 sub do_git_apply ($) {
@@ -495,8 +499,7 @@ sub do_git_apply ($) {
 	my $opt = { 2 => 1, -C => _tmp($self)->dirname, quiet => 1 };
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env}, $opt);
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&apply_result, $self);
+	qsp_qx $self, $qsp, \&apply_result;
 }
 
 sub di_url ($$) {
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 61329db6..790b9a2c 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -49,6 +49,10 @@ my %GIT_MODE = (
 	'160000' => 'g', # commit (gitlink)
 );
 
+# TODO: not fork safe, but we don't fork w/o exec in PublicInbox::WWW
+my (@solver_q, $solver_lim);
+my $solver_nr = 0;
+
 sub html_page ($$;@) {
 	my ($ctx, $code) = @_[0, 1];
 	my $wcb = delete $ctx->{-wcb};
@@ -614,26 +618,52 @@ sub show_blob { # git->cat_async callback
 		'</code></pre></td></tr></table>'.dbg_log($ctx), @def);
 }
 
-# GET /$INBOX/$GIT_OBJECT_ID/s/
-# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
-sub show ($$;$) {
-	my ($ctx, $oid_b, $fn) = @_;
-	my $hints = $ctx->{hints} = {};
+sub start_solver ($) {
+	my ($ctx) = @_;
 	while (my ($from, $to) = each %QP_MAP) {
 		my $v = $ctx->{qp}->{$from} // next;
-		$hints->{$to} = $v if $v ne '';
+		$ctx->{hints}->{$to} = $v if $v ne '';
 	}
-	$ctx->{fn} = $fn;
-	$ctx->{-tmp} = File::Temp->newdir("solver.$oid_b-XXXX", TMPDIR => 1);
+	$ctx->{-next_solver} = PublicInbox::OnDestroy->new($$, \&next_solver);
+	++$solver_nr;
+	$ctx->{-tmp} = File::Temp->newdir("solver.$ctx->{oid_b}-XXXX",
+						TMPDIR => 1);
 	$ctx->{lh} or open $ctx->{lh}, '+>>', "$ctx->{-tmp}/solve.log";
 	my $solver = PublicInbox::SolverGit->new($ctx->{ibx},
 						\&solve_result, $ctx);
+	$solver->{limiter} = $solver_lim;
 	$solver->{gits} //= [ $ctx->{git} ];
 	$solver->{tmp} = $ctx->{-tmp}; # share tmpdir
 	# PSGI server will call this immediately and give us a callback (-wcb)
+	$solver->solve(@$ctx{qw(env lh oid_b hints)});
+}
+
+# run the next solver job when done and DESTROY-ed
+sub next_solver {
+	--$solver_nr;
+	# XXX FIXME: client may've disconnected if it waited a long while
+	start_solver(shift(@solver_q) // return);
+}
+
+sub may_start_solver ($) {
+	my ($ctx) = @_;
+	$solver_lim //= $ctx->{www}->{pi_cfg}->limiter('codeblob');
+	if ($solver_nr >= $solver_lim->{max}) {
+		@solver_q > 128 ? html_page($ctx, 503, 'too busy')
+				: push(@solver_q, $ctx);
+	} else {
+		start_solver($ctx);
+	}
+}
+
+# GET /$INBOX/$GIT_OBJECT_ID/s/
+# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
+sub show ($$;$) {
+	my ($ctx, $oid_b, $fn) = @_;
+	@$ctx{qw(oid_b fn)} = ($oid_b, $fn);
 	sub {
 		$ctx->{-wcb} = $_[0]; # HTTP write callback
-		$solver->solve($ctx->{env}, $ctx->{lh}, $oid_b, $hints);
+		may_start_solver $ctx;
 	};
 }
 


* [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs
  2024-03-11 19:40 [PATCH 0/4] memory reductions for WWW + solver Eric Wong
  2024-03-11 19:40 ` [PATCH 1/4] www: use a dedicated limiter for blob solver Eric Wong
@ 2024-03-11 19:40 ` Eric Wong
  2024-03-11 19:40 ` [PATCH 3/4] doc: tuning: note reduced fragmentation w/ jemalloc Eric Wong
  2024-03-11 19:40 ` [PATCH 4/4] codesearch: deduplicate $git->{nick} field Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2024-03-11 19:40 UTC (permalink / raw)
  To: meta

With my current mirror of lore + gko, this saves over 300K
allocations and brings the allocation count in this area down
to under 5K.  The reduction in AV refs saves around 45MB RAM
according to measurements done live via Devel::Mwrap.
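
The idea can be sketched on its own, as a rough illustration rather
than the actual change: identical [ score, name ] pairs are allocated
once and the same array ref is pushed everywhere.  The sample data and
the use of the inbox name in the key are assumptions for this sketch
(the patch keys on the numeric address of the inbox object).

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical (score, inbox name) pairs with many repeats
my @pairs = ([ 3, 'git' ], [ 3, 'git' ], [ 7, 'lkml' ], [ 3, 'git' ]);

my (%dedupe, @shared);
for my $p (@pairs) {
	my ($nr, $name) = @$p;
	# "$nr + 0" builds the key from a temporary instead of
	# stringifying $nr itself; "\0" separates the two fields
	my $k = ($nr + 0)."\0".$name;
	# allocate the 2-element array only the first time a pair is seen
	my $s = $dedupe{$k} //= [ $nr, $name ];
	push @shared, $s;
}

# the three [ 3, 'git' ] entries are now one array, referenced thrice
my $same = refaddr($shared[0]) == refaddr($shared[3]) ? 1 : 0;
```

The trade-off is that the shared arrays must be treated as read-only,
since modifying one changes it for every holder.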
---
 lib/PublicInbox/CodeSearch.pm | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index 1f95a726..48033bb5 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -292,6 +292,7 @@ W: cindex.$name.topdir=$self->{topdir} has no usable join data for $cfg_f
 EOM
 	my ($ekeys, $roots, $ibx2root) = @$jd{qw(ekeys roots ibx2root)};
 	my $roots2paths = roots2paths($self);
+	my %dedupe; # 50x alloc reduction w/ lore + gko mirror (Mar 2024)
 	for my $root_offs (@$ibx2root) {
 		my $ekey = shift(@$ekeys) // die 'BUG: {ekeys} empty';
 		scalar(@$root_offs) or next;
@@ -320,9 +321,15 @@ EOM
 				if (my $git = $dir2cr{$_}) {
 					$ibx_p2g{$_} = $git;
 					$ibx2self = 1;
-					$ibx->{-hide_www} or
-						push @{$git->{ibx_score}},
+					if (!$ibx->{-hide_www}) {
+						# don't stringify $nr directly
+						# to avoid long-lived PV
+						my $k = ($nr + 0)."\0".
+							($ibx + 0);
+						my $s = $dedupe{$k} //=
 							[ $nr, $ibx->{name} ];
+						push @{$git->{ibx_score}}, $s;
+					}
 					push @$gits, $git;
 				} else {
 					warn <<EOM;


* [PATCH 3/4] doc: tuning: note reduced fragmentation w/ jemalloc
  2024-03-11 19:40 [PATCH 0/4] memory reductions for WWW + solver Eric Wong
  2024-03-11 19:40 ` [PATCH 1/4] www: use a dedicated limiter for blob solver Eric Wong
  2024-03-11 19:40 ` [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs Eric Wong
@ 2024-03-11 19:40 ` Eric Wong
  2024-03-11 19:40 ` [PATCH 4/4] codesearch: deduplicate $git->{nick} field Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2024-03-11 19:40 UTC (permalink / raw)
  To: meta

I may be mistaken, but I suspect the reason jemalloc handles
long-lived processes better than glibc is due to granularity
reduction being scaled to larger size classes.  This can waste
20% of an individual allocation, but increases the likelihood
of reuse (without splitting/consolidating into other sizes).

In other words, glibc seems to try too hard to make the best fit
for initial allocations.  This ends up being suboptimal over
time as those allocations are freed and similar (but not
identical) allocations come in.  jemalloc sacrifices the best
initial fit for better fits over a long process lifetime.
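
To illustrate where a figure like 20% can come from, here is a
simplified model of size classes with four classes per power-of-two
doubling.  It is a sketch under that assumption only, not jemalloc's
actual code: size_class and the 16-byte minimum are hypothetical, and
real allocators handle very small and very large sizes differently.

```perl
use strict;
use warnings;

# Round $n (bytes) up to a class, four classes per doubling
sub size_class {
	my ($n) = @_;
	return 16 if $n <= 16; # hypothetical smallest class
	my $pow = 16;
	$pow *= 2 while $pow * 2 < $n; # now $pow < $n <= 2 * $pow
	my $delta = $pow / 4; # spacing between classes in this range
	int(($n + $delta - 1) / $delta) * $delta;
}

# worst case is just past a power of two: 1025 is served from 1280
my $class = size_class(1025);
my $waste = ($class - 1025) / $class; # slightly under 0.2
```

Any later request from 1025 to 1280 bytes can reuse that same chunk
without splitting or merging, which is the reuse described above.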
---
 Documentation/public-inbox-tuning.pod | 5 +++++
 examples/public-inbox-netd@.service   | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index 38810ce6..73246144 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -163,6 +163,11 @@ Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
 increases memory use of client sockets, be sure to account for that in
 capacity planning.
 
+Bursts of small object allocations late in process life contribute to
+fragmentation of the heap due to arenas (slabs) used internally by Perl.
+jemalloc (tested as an LD_PRELOAD on GNU/Linux) appears to reduce
+overall fragmentation compared to glibc malloc in long-lived processes.
+
 =head2 Other OS tuning knobs
 
 Linux users: the C<sys.vm.max_map_count> sysctl may need to be increased if
diff --git a/examples/public-inbox-netd@.service b/examples/public-inbox-netd@.service
index de5feea6..2330bd59 100644
--- a/examples/public-inbox-netd@.service
+++ b/examples/public-inbox-netd@.service
@@ -12,6 +12,8 @@ Wants = public-inbox-netd.socket
 After = public-inbox-netd.socket
 
 [Service]
+# An LD_PRELOAD for libjemalloc can be added here.  It currently seems
+# more resistant to fragmentation to glibc in long-lived daemons.
 Environment = PI_CONFIG=/home/pi/.public-inbox/config \
 PATH=/usr/local/bin:/usr/bin:/bin \
 TZ=UTC \


* [PATCH 4/4] codesearch: deduplicate $git->{nick} field
  2024-03-11 19:40 [PATCH 0/4] memory reductions for WWW + solver Eric Wong
                   ` (2 preceding siblings ...)
  2024-03-11 19:40 ` [PATCH 3/4] doc: tuning: note reduced fragmentation w/ jemalloc Eric Wong
@ 2024-03-11 19:40 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2024-03-11 19:40 UTC (permalink / raw)
  To: meta

While PublicInbox::Config is responsible for some instances of
setting $git->{nick}, more PublicInbox::Git objects may be
created from loading the cindex and we should do our best to
reuse that memory, too.
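
The idiom used for this can be shown standalone.  Perl keeps hash
keys in a shared string table, so a scalar taken from keys() can
share its buffer with other copies of the same key; whether memory is
actually saved depends on the perl version and build, so treat this
as a sketch.  The dedupe_str helper and the sample nicks are
hypothetical.

```perl
use strict;
use warnings;

# Return $str after a round trip through a hash key
sub dedupe_str {
	my ($str) = @_;
	my %dedupe = ($str => undef);
	my ($shared) = keys %dedupe; # the key, not the original scalar
	$shared;
}

# the string value is unchanged; only the underlying storage may differ
my @nicks = map { dedupe_str('repo/' . ($_ % 2)) } 1..4;
```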

Followup-to: 84ed7ec1c887 (dedupe inbox names, coderepo nicks + git dirs, 2024-03-04)
---
 lib/PublicInbox/CodeSearch.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index 48033bb5..e5fa4480 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -283,7 +283,8 @@ EOM
 		$nick =~ s!$lre!$nick_pfx!s or next;
 		$dir2cr{$p} = $coderepos->{$nick} //= do {
 			my $git = PublicInbox::Git->new($p);
-			$git->{nick} = $nick; # for git->pub_urls
+			my %dedupe = ($nick => undef);
+			($git->{nick}) = keys %dedupe; # for git->pub_urls
 			$git;
 		};
 	}

