unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 0/5] cleanups + posible major fragmentation reduction
@ 2024-06-06  7:44 Eric Wong
  2024-06-06  7:44 ` [PATCH 1/5] treewide: use \*STD(IN|OUT|ERR) consistently Eric Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Eric Wong @ 2024-06-06  7:44 UTC (permalink / raw)
  To: meta

1/5 is another attempt to fix sporadic t/imap_searchqp.t failures
This part of Perl still seems mysterious to me :<

Straightfoward stuff:

2,3,4 reduce syscalls and allocations from looking up `git'
(mainly in tests and odd parts this time around)

5/5 appears to be a major reduction in malloc fragmentation in
long-running -httpd and -netd instances under heavy traffic.
I'm still testing, but I'm kicking myself for not trying this
trade-off earlier even though mwrap-perl pointed this out
$ctx->{skel} as a hot spot months ago...

Eric Wong (5):
  treewide: use \*STD(IN|OUT|ERR) consistently
  git: decouple git_version from git_exe
  test_common: use cached git lookup to reduce stat(2)
  treewide: use cached git executable lookup
  www: reduce fragmentation in /t/ and /T/ endpoints

 lib/PublicInbox/Admin.pm          |  5 +--
 lib/PublicInbox/Config.pm         |  2 +-
 lib/PublicInbox/Git.pm            | 10 +++---
 lib/PublicInbox/GitCredential.pm  |  3 +-
 lib/PublicInbox/HTTP.pm           |  2 +-
 lib/PublicInbox/Import.pm         |  3 +-
 lib/PublicInbox/LEI.pm            |  9 ++---
 lib/PublicInbox/LeiBlob.pm        |  4 +--
 lib/PublicInbox/LeiConfig.pm      |  3 +-
 lib/PublicInbox/LeiInput.pm       |  4 ++-
 lib/PublicInbox/LeiMailDiff.pm    |  3 +-
 lib/PublicInbox/LeiSavedSearch.pm |  4 +--
 lib/PublicInbox/LeiViewText.pm    |  3 +-
 lib/PublicInbox/MailDiff.pm       |  3 +-
 lib/PublicInbox/MultiGit.pm       |  6 ++--
 lib/PublicInbox/SearchView.pm     |  7 ++--
 lib/PublicInbox/TestCommon.pm     | 12 +++----
 lib/PublicInbox/View.pm           | 55 ++++++++++++++++---------------
 lib/PublicInbox/WwwCoderepo.pm    |  7 ++--
 script/lei                        |  2 +-
 script/public-inbox-clone         |  2 +-
 script/public-inbox-fetch         |  2 +-
 22 files changed, 80 insertions(+), 71 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/5] treewide: use \*STD(IN|OUT|ERR) consistently
  2024-06-06  7:44 [PATCH 0/5] cleanups + posible major fragmentation reduction Eric Wong
@ 2024-06-06  7:44 ` Eric Wong
  2024-06-06  7:44 ` [PATCH 2/5] git: decouple git_version from git_exe Eric Wong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2024-06-06  7:44 UTC (permalink / raw)
  To: meta

Referencing the {IO} slot may not always be populated or work
(e.g. with `-t' filetest) if there's no IO handle.  Using merely
using `\*' is shorter than typing out `{GLOB}', so just use the
shortest form consistently.

This may fix occasional and difficult-to-reproduce failures from
redirecting STDERR in t/imap_searchqp.t
---
 lib/PublicInbox/Admin.pm       | 2 +-
 lib/PublicInbox/HTTP.pm        | 2 +-
 lib/PublicInbox/LEI.pm         | 6 +++---
 lib/PublicInbox/TestCommon.pm  | 3 +--
 lib/PublicInbox/WwwCoderepo.pm | 2 +-
 script/lei                     | 2 +-
 script/public-inbox-clone      | 2 +-
 script/public-inbox-fetch      | 2 +-
 8 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index a1b1fc07..a2045b37 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -317,7 +317,7 @@ sub progress_prepare ($;$) {
 		$opt->{1} = $null; # suitable for spawn() redirect
 	} else {
 		$opt->{verbose} ||= 1;
-		$dst //= *STDERR{GLOB};
+		$dst //= \*STDERR;
 		$opt->{-progress} = sub { print $dst '# ', @_ };
 	}
 }
diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index 7162732e..b7728bd2 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -47,7 +47,7 @@ open(my $null_io, '<', '/dev/null') or die "open /dev/null: $!";
 {
 	my @n = stat($null_io) or die "stat(/dev/null): $!";
 	my @i = stat(STDIN) or die "stat(STDIN): $!";
-	$null_io = *STDIN{IO} if "@n[0, 1]" eq "@i[0, 1]";
+	$null_io = \*STDIN if "@n[0, 1]" eq "@i[0, 1]";
 }
 
 my $http_date;
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index e9a0de6c..309d290d 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -506,12 +506,12 @@ sub x_it ($$) {
 
 sub err ($;@) {
 	my $self = shift;
-	my $err = $self->{2} // ($self->{pgr} // [])->[2] // *STDERR{GLOB};
+	my $err = $self->{2} // ($self->{pgr} // [])->[2] // \*STDERR;
 	my @eor = (substr($_[-1]//'', -1, 1) eq "\n" ? () : ("\n"));
 	print $err @_, @eor and return;
 	my $old_err = delete $self->{2};
 	$old_err->close if $! == EPIPE && $old_err;
-	$err = $self->{2} = ($self->{pgr} // [])->[2] // *STDERR{GLOB};
+	$err = $self->{2} = ($self->{pgr} // [])->[2] // \*STDERR;
 	print $err @_, @eor or print STDERR @_, @eor;
 }
 
@@ -1556,7 +1556,7 @@ sub sto_barrier_request {
 		eval { $lei->{sto}->wq_do('schedule_commit', $n) };
 	} else {
 		my $s = ($wq ? $wq->{lei_sock} : undef) // $lei->{sock};
-		my $errfh = $lei->{2} // *STDERR{GLOB};
+		my $errfh = $lei->{2} // \*STDERR;
 		my @io = $s ? ($errfh, $s) : ($errfh);
 		eval { $lei->{sto}->wq_io_do('barrier', \@io, 1) };
 	}
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 3a67ab54..a01949a3 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -281,8 +281,7 @@ sub key2script ($) {
 	'blib/script/'.$key;
 }
 
-my @io_mode = ([ *STDIN{IO}, '+<&' ], [ *STDOUT{IO}, '+>&' ],
-		[ *STDERR{IO}, '+>&' ]);
+my @io_mode = ([ \*STDIN, '+<&' ], [ \*STDOUT, '+>&' ], [ \*STDERR, '+>&' ]);
 
 sub _prepare_redirects ($) {
 	my ($fhref) = @_;
diff --git a/lib/PublicInbox/WwwCoderepo.pm b/lib/PublicInbox/WwwCoderepo.pm
index 8587f530..8d825284 100644
--- a/lib/PublicInbox/WwwCoderepo.pm
+++ b/lib/PublicInbox/WwwCoderepo.pm
@@ -87,7 +87,7 @@ sub new {
 	my @s = stat(STDIN) or die "stat(STDIN): $!";
 	if ("@l[0, 1]" eq "@s[0, 1]") {
 		my $f = fcntl(STDIN, F_GETFL, 0);
-		$self->{log_fh} = *STDIN{IO} if $f & O_RDWR;
+		$self->{log_fh} = \*STDIN if $f & O_RDWR;
 	}
 	$self;
 }
diff --git a/script/lei b/script/lei
index 087afc33..a5aef956 100755
--- a/script/lei
+++ b/script/lei
@@ -37,7 +37,7 @@ my $exec_cmd = sub {
 	my ($fds, $argc, @argv) = @_;
 	my $parent = $$;
 	require POSIX;
-	my @old = (*STDIN{IO}, *STDOUT{IO}, *STDERR{IO});
+	my @old = (\*STDIN, \*STDOUT, \*STDERR);
 	my @rdr;
 	for my $fd (@$fds) {
 		open(my $newfh, '+<&=', $fd) or die "open +<&=$fd: $!";
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index c3e64485..5ecf11ae 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -54,7 +54,7 @@ require PublicInbox::LeiMirror;
 $url = PublicInbox::LeiExternal::ext_canonicalize($url);
 my $lei = bless {
 	env => \%ENV, opt => $opt, cmd => 'public-inbox-clone',
-	0 => *STDIN{GLOB}, 2 => *STDERR{GLOB},
+	0 => \*STDIN, 2 => \*STDERR,
 }, 'PublicInbox::LEI';
 open $lei->{1}, '+<&=', 1 or die "dup: $!";
 open $lei->{3}, '.' or die "open . $!";
diff --git a/script/public-inbox-fetch b/script/public-inbox-fetch
index 6fd15328..1ae31171 100755
--- a/script/public-inbox-fetch
+++ b/script/public-inbox-fetch
@@ -33,7 +33,7 @@ $SIG{PIPE} = 'IGNORE';
 
 my $lei = bless {
 	env => \%ENV, opt => $opt, cmd => 'public-inbox-fetch',
-	0 => *STDIN{GLOB}, 1 => *STDOUT{GLOB}, 2 => *STDERR{GLOB},
+	0 => \*STDIN, 1 => \*STDOUT, 2 => \*STDERR,
 }, 'PublicInbox::LEI';
 PublicInbox::Fetch->do_fetch($lei, '.');
 exit(($lei->{child_error} // 0) >> 8);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/5] git: decouple git_version from git_exe
  2024-06-06  7:44 [PATCH 0/5] cleanups + posible major fragmentation reduction Eric Wong
  2024-06-06  7:44 ` [PATCH 1/5] treewide: use \*STD(IN|OUT|ERR) consistently Eric Wong
@ 2024-06-06  7:44 ` Eric Wong
  2024-06-06  7:44 ` [PATCH 3/5] test_common: use cached git lookup to reduce stat(2) Eric Wong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2024-06-06  7:44 UTC (permalink / raw)
  To: meta

No need to check for the git version number unless we actually
want it.
---
 lib/PublicInbox/Git.pm | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 6b722023..32c11a59 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -61,7 +61,10 @@ sub git_exe () {
 	return $GIT_EXE if $now < $next_check;
 	$next_check = $now + 10;
 	$GIT_EXE = which('git') // die "git not found in $ENV{PATH}";
-	my @st = stat(_) or die "stat($GIT_EXE): $!"; # can't do HiRes w/ _
+}
+
+sub git_version () {
+	my @st = stat(git_exe) or die "stat($GIT_EXE): $!";
 	my $st = pack('dd', $st[0], $st[1]);
 	if ($st ne $EXE_ST) {
 		my $v = run_qx([ $GIT_EXE, '--version' ]);
@@ -71,11 +74,6 @@ sub git_exe () {
 		$GIT_VER = eval("v$1") // die "BUG: bad vstring: $1 ($v)";
 		$EXE_ST = $st;
 	}
-	$GIT_EXE;
-}
-
-sub git_version () {
-	git_exe;
 	$GIT_VER;
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/5] test_common: use cached git lookup to reduce stat(2)
  2024-06-06  7:44 [PATCH 0/5] cleanups + posible major fragmentation reduction Eric Wong
  2024-06-06  7:44 ` [PATCH 1/5] treewide: use \*STD(IN|OUT|ERR) consistently Eric Wong
  2024-06-06  7:44 ` [PATCH 2/5] git: decouple git_version from git_exe Eric Wong
@ 2024-06-06  7:44 ` Eric Wong
  2024-06-06  7:44 ` [PATCH 4/5] treewide: use cached git executable lookup Eric Wong
  2024-06-06  7:44 ` [PATCH 5/5] www: reduce fragmentation in /t/ and /T/ endpoints Eric Wong
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2024-06-06  7:44 UTC (permalink / raw)
  To: meta

This reduces syscalls required to run tests and IMHO improves
readability since we're not having to import `git_exe' everywhere.
---
 lib/PublicInbox/TestCommon.pm | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index a01949a3..00e96aee 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -167,11 +167,8 @@ sub require_git ($;$) {
 sub require_git_http_backend (;$) {
 	my ($nr) = @_;
 	state $ok = do {
-		require PublicInbox::Git;
-		my $git = PublicInbox::Git::git_exe() or plan
-			skip_all => 'nothing in public-inbox works w/o git';
 		my $rdr = { 1 => \my $out, 2 => \my $err };
-		xsys([$git, qw(http-backend)], undef, $rdr);
+		xsys([qw(git http-backend)], undef, $rdr);
 		$out =~ /^Status:/ism;
 	};
 	if (!$ok) {
@@ -274,7 +271,9 @@ sub require_mods {
 
 sub key2script ($) {
 	my ($key) = @_;
-	return $key if ($key eq 'git' || index($key, '/') >= 0);
+	require PublicInbox::Git;
+	return PublicInbox::Git::git_exe() if $key eq 'git';
+	return $key if index($key, '/') >= 0;
 	# n.b. we may have scripts which don't start with "public-inbox" in
 	# the future:
 	$key =~ s/\A([-\.])/public-inbox$1/;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/5] treewide: use cached git executable lookup
  2024-06-06  7:44 [PATCH 0/5] cleanups + posible major fragmentation reduction Eric Wong
                   ` (2 preceding siblings ...)
  2024-06-06  7:44 ` [PATCH 3/5] test_common: use cached git lookup to reduce stat(2) Eric Wong
@ 2024-06-06  7:44 ` Eric Wong
  2024-06-06  7:44 ` [PATCH 5/5] www: reduce fragmentation in /t/ and /T/ endpoints Eric Wong
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2024-06-06  7:44 UTC (permalink / raw)
  To: meta

Repeated stat(2) syscalls are more expensive nowadays due
to CPU vulnerability mitigations and this change also
allows bypassing some heap allocations done by Perl.
---
 lib/PublicInbox/Admin.pm          | 3 ++-
 lib/PublicInbox/Config.pm         | 2 +-
 lib/PublicInbox/GitCredential.pm  | 3 ++-
 lib/PublicInbox/Import.pm         | 3 ++-
 lib/PublicInbox/LEI.pm            | 3 ++-
 lib/PublicInbox/LeiBlob.pm        | 4 ++--
 lib/PublicInbox/LeiConfig.pm      | 3 ++-
 lib/PublicInbox/LeiInput.pm       | 4 +++-
 lib/PublicInbox/LeiMailDiff.pm    | 3 ++-
 lib/PublicInbox/LeiSavedSearch.pm | 4 ++--
 lib/PublicInbox/LeiViewText.pm    | 3 ++-
 lib/PublicInbox/MailDiff.pm       | 3 ++-
 lib/PublicInbox/MultiGit.pm       | 6 ++++--
 lib/PublicInbox/WwwCoderepo.pm    | 5 +++--
 14 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index a2045b37..bb5d3653 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -11,6 +11,7 @@ use PublicInbox::Config;
 use PublicInbox::Inbox;
 use PublicInbox::Spawn qw(run_qx);
 use PublicInbox::Eml;
+use PublicInbox::Git qw(git_exe);
 *rel2abs_collapsed = \&PublicInbox::Config::rel2abs_collapsed;
 
 sub setup_signals {
@@ -77,7 +78,7 @@ sub resolve_git_dir {
 	my $env;
 	defined($pwd) && substr($cd // '/', 0, 1) ne '/' and
 		$env->{PWD} = "$pwd/$cd";
-	my $cmd = [ qw(git rev-parse --git-dir) ];
+	my $cmd = [ git_exe, qw(rev-parse --git-dir) ];
 	my $dir = run_qx($cmd, $env, { -C => $cd });
 	die "error in @$cmd (cwd:${\($cd // '.')}): $?\n" if $?;
 	chomp $dir;
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index e1843912..998fc25e 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -632,7 +632,7 @@ sub urlmatch {
 		} elsif (($? >> 8) != 1) {
 			$urlmatch_broken = 1;
 		} elsif ($try_git) { # n.b. this takes cwd into account
-			$val = run_qx([qw(git config), @bool,
+			$val = run_qx([$cmd->[0], 'config', @bool,
 					qw(-z --get-urlmatch), $key, $url]);
 			undef $val if $?;
 		}
diff --git a/lib/PublicInbox/GitCredential.pm b/lib/PublicInbox/GitCredential.pm
index bb225ff3..cf5a2213 100644
--- a/lib/PublicInbox/GitCredential.pm
+++ b/lib/PublicInbox/GitCredential.pm
@@ -4,13 +4,14 @@
 # git-credential wrapper with built-in .netrc fallback
 package PublicInbox::GitCredential;
 use v5.12;
+use PublicInbox::Git qw(git_exe);
 use PublicInbox::Spawn qw(popen_rd);
 use autodie qw(close pipe);
 
 sub run ($$;$) {
 	my ($self, $op, $lei) = @_;
 	my ($in_r, $in_w, $out_r);
-	my $cmd = [ qw(git credential), $op ];
+	my $cmd = [ git_exe, 'credential', $op ];
 	pipe($in_r, $in_w);
 	if ($lei) { # we'll die if disconnected:
 		pipe($out_r, my $out_w);
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index fefc282a..2e193d46 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -8,6 +8,7 @@
 package PublicInbox::Import;
 use v5.12;
 use parent qw(PublicInbox::Lock);
+use PublicInbox::Git qw(git_exe);
 use PublicInbox::Spawn qw(run_die run_qx spawn);
 use PublicInbox::MID qw(mids mid2path);
 use PublicInbox::Address;
@@ -24,7 +25,7 @@ use PublicInbox::IO qw(read_all);
 
 sub default_branch () {
 	state $default_branch = do {
-		my $h = run_qx([qw(git config --global init.defaultBranch)],
+		my $h = run_qx([git_exe,qw(config --global init.defaultBranch)],
 				 { GIT_CONFIG => undef });
 		chomp $h;
 		$h eq '' ? 'refs/heads/master' : "refs/heads/$h";
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 309d290d..c5146428 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -22,6 +22,7 @@ use PublicInbox::Syscall qw(EPOLLIN);
 use PublicInbox::Spawn qw(run_wait popen_rd run_qx);
 use PublicInbox::Lock;
 use PublicInbox::Eml;
+use PublicInbox::Git qw(git_exe);
 use PublicInbox::Import;
 use PublicInbox::ContentHash qw(git_sha);
 use PublicInbox::OnDestroy;
@@ -1098,7 +1099,7 @@ sub path_to_fd {
 # caller needs to "-t $self->{1}" to check if tty
 sub start_pager {
 	my ($self, $new_env) = @_;
-	chomp(my $pager = run_qx([qw(git var GIT_PAGER)]));
+	chomp(my $pager = run_qx([git_exe, qw(var GIT_PAGER)]));
 	warn "`git var PAGER' error: \$?=$?" if $?;
 	return if $pager eq 'cat' || $pager eq '';
 	$new_env //= {};
diff --git a/lib/PublicInbox/LeiBlob.pm b/lib/PublicInbox/LeiBlob.pm
index 7b2ea434..31936c36 100644
--- a/lib/PublicInbox/LeiBlob.pm
+++ b/lib/PublicInbox/LeiBlob.pm
@@ -10,14 +10,14 @@ use parent qw(PublicInbox::IPC);
 use PublicInbox::Spawn qw(run_wait run_qx which);
 use PublicInbox::DS;
 use PublicInbox::Eml;
-use PublicInbox::Git;
+use PublicInbox::Git qw(git_exe);
 use PublicInbox::IO qw(read_all);
 
 sub get_git_dir ($$) {
 	my ($lei, $d) = @_;
 	return $d if -d "$d/objects" && -d "$d/refs" && -e "$d/HEAD";
 
-	my $cmd = [ qw(git rev-parse --git-dir) ];
+	my $cmd = [ git_exe, qw(rev-parse --git-dir) ];
 	my $opt = { '-C' => $d };
 	if (defined($lei->{opt}->{cwd})) { # --cwd used, report errors
 		$opt->{2} = $lei->{2};
diff --git a/lib/PublicInbox/LeiConfig.pm b/lib/PublicInbox/LeiConfig.pm
index a50ff2b6..ae12249f 100644
--- a/lib/PublicInbox/LeiConfig.pm
+++ b/lib/PublicInbox/LeiConfig.pm
@@ -3,6 +3,7 @@
 package PublicInbox::LeiConfig; # subclassed by LeiEditSearch
 use v5.12;
 use PublicInbox::PktOp;
+use PublicInbox::Git qw(git_exe);
 use Fcntl qw(SEEK_SET);
 use autodie qw(open seek);
 use PublicInbox::IO qw(read_all);
@@ -11,7 +12,7 @@ sub cfg_do_edit ($;$) {
 	my ($self, $reason) = @_;
 	my $lei = $self->{lei};
 	$lei->pgr_err($reason) if defined $reason;
-	my $cmd = [ qw(git config --edit -f), $self->{-f} ];
+	my $cmd = [ git_exe, qw(config --edit -f), $self->{-f} ];
 	my $env = { GIT_CONFIG => $self->{-f} };
 	$self->cfg_edit_begin if $self->can('cfg_edit_begin');
 	# run in script/lei foreground
diff --git a/lib/PublicInbox/LeiInput.pm b/lib/PublicInbox/LeiInput.pm
index c388f7dc..0a6aba82 100644
--- a/lib/PublicInbox/LeiInput.pm
+++ b/lib/PublicInbox/LeiInput.pm
@@ -5,6 +5,7 @@
 package PublicInbox::LeiInput;
 use v5.12;
 use PublicInbox::DS;
+use PublicInbox::Git qw(git_exe);
 use PublicInbox::Spawn qw(which popen_rd);
 use PublicInbox::InboxWritable qw(eml_from_path);
 
@@ -252,7 +253,8 @@ sub input_path_url {
 		each_ibx_eml($self, $esrch, @args);
 	} elsif ($self->{missing_ok} && !-e $input) { # don't ->fail
 		if ($lei->{cmd} eq 'p2q') {
-			my $fp = [ qw(git format-patch --stdout -1), $input ];
+			my $fp = [ git_exe, qw(format-patch --stdout -1),
+					$input ];
 			my $rdr = { 2 => $lei->{2} };
 			my $fh = popen_rd($fp, undef, $rdr);
 			eval { $self->input_fh('eml', $fh, $input, @args) };
diff --git a/lib/PublicInbox/LeiMailDiff.pm b/lib/PublicInbox/LeiMailDiff.pm
index af6ecf82..270de507 100644
--- a/lib/PublicInbox/LeiMailDiff.pm
+++ b/lib/PublicInbox/LeiMailDiff.pm
@@ -7,13 +7,14 @@ package PublicInbox::LeiMailDiff;
 use v5.12;
 use parent qw(PublicInbox::IPC PublicInbox::LeiInput PublicInbox::MailDiff);
 use PublicInbox::Spawn qw(run_wait);
+use PublicInbox::Git qw(git_exe);
 require PublicInbox::LeiRediff;
 
 sub diff_a ($$) {
 	my ($self, $eml) = @_;
 	my $dir = "$self->{tmp}/N".(++$self->{nr});
 	$self->dump_eml($dir, $eml);
-	my $cmd = [ qw(git diff --no-index) ];
+	my $cmd = [ git_exe, qw(diff --no-index) ];
 	my $lei = $self->{lei};
 	PublicInbox::LeiRediff::_lei_diff_prepare($lei, $cmd);
 	push @$cmd, qw(-- a), "N$self->{nr}";
diff --git a/lib/PublicInbox/LeiSavedSearch.pm b/lib/PublicInbox/LeiSavedSearch.pm
index 1c8a1f73..c27b6f86 100644
--- a/lib/PublicInbox/LeiSavedSearch.pm
+++ b/lib/PublicInbox/LeiSavedSearch.pm
@@ -5,7 +5,7 @@
 package PublicInbox::LeiSavedSearch;
 use v5.12;
 use parent qw(PublicInbox::Lock);
-use PublicInbox::Git;
+use PublicInbox::Git qw(git_exe);
 use PublicInbox::OverIdx;
 use PublicInbox::LeiSearch;
 use PublicInbox::Config;
@@ -176,7 +176,7 @@ sub description { $_[0]->{qstr} } # for WWW
 sub cfg_set { # called by LeiXSearch
 	my ($self, @args) = @_;
 	my $lk = $self->lock_for_scope; # git-config doesn't wait
-	run_die([qw(git config -f), $self->{'-f'}, @args]);
+	run_die([git_exe, qw(config -f), $self->{'-f'}, @args]);
 }
 
 # drop-in for LeiDedupe API
diff --git a/lib/PublicInbox/LeiViewText.pm b/lib/PublicInbox/LeiViewText.pm
index c7d72c71..6510b19e 100644
--- a/lib/PublicInbox/LeiViewText.pm
+++ b/lib/PublicInbox/LeiViewText.pm
@@ -12,6 +12,7 @@ use PublicInbox::View;
 use PublicInbox::Hval;
 use PublicInbox::ViewDiff;
 use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::Git qw(git_exe);
 use Term::ANSIColor;
 use POSIX ();
 use PublicInbox::Address;
@@ -72,7 +73,7 @@ sub new {
 	my $self = bless { %{$lei->{opt}}, -colored => \&uncolored }, $cls;
 	$self->{-quote_reply} = 1 if $fmt eq 'reply';
 	return $self unless $self->{color} //= -t $lei->{1};
-	my @cmd = qw(git config -z --includes -l); # reuse normal git config
+	my @cmd = (git_exe, qw(config -z --includes -l)); # reuse normal git cfg
 	my $r = popen_rd(\@cmd, undef, { 2 => $lei->{2} });
 	my $cfg = PublicInbox::Config::config_fh_parse($r, "\0", "\n");
 	if (!$r->close) {
diff --git a/lib/PublicInbox/MailDiff.pm b/lib/PublicInbox/MailDiff.pm
index 125360fe..ce268bfb 100644
--- a/lib/PublicInbox/MailDiff.pm
+++ b/lib/PublicInbox/MailDiff.pm
@@ -7,6 +7,7 @@ use PublicInbox::ContentHash qw(content_digest);
 use PublicInbox::MsgIter qw(msg_part_text);
 use PublicInbox::ViewDiff qw(flush_diff);
 use PublicInbox::GitAsyncCat;
+use PublicInbox::Git qw(git_exe);
 use PublicInbox::ContentDigestDbg;
 use PublicInbox::Qspawn;
 use PublicInbox::IO qw(write_file);
@@ -81,7 +82,7 @@ sub do_diff {
 	my $n = 'N'.(++$self->{nr});
 	my $dir = "$self->{tmp}/$n";
 	$self->dump_eml($dir, $eml);
-	my $cmd = [ qw(git diff --no-index --no-color -- a), $n ];
+	my $cmd = [ git_exe, qw(diff --no-index --no-color -- a), $n ];
 	my $opt = { -C => "$self->{tmp}", quiet => 1 };
 	my $qsp = PublicInbox::Qspawn->new($cmd, undef, $opt);
 	$qsp->psgi_qx($self->{ctx}->{env}, undef, \&emit_msg_diff, $self);
diff --git a/lib/PublicInbox/MultiGit.pm b/lib/PublicInbox/MultiGit.pm
index b7691806..32bb3588 100644
--- a/lib/PublicInbox/MultiGit.pm
+++ b/lib/PublicInbox/MultiGit.pm
@@ -7,6 +7,7 @@ use strict;
 use v5.10.1;
 use PublicInbox::Spawn qw(run_die run_qx);
 use PublicInbox::Import;
+use PublicInbox::Git qw(git_exe);
 use File::Temp 0.19;
 use List::Util qw(max);
 use PublicInbox::IO qw(read_all);
@@ -110,11 +111,12 @@ sub epoch_cfg_set {
 	my ($self, $epoch_nr) = @_;
 	my $f = epoch_dir($self)."/$epoch_nr.git/config";
 	my $v = "../../$self->{all}/config";
+	my @cmd = (git_exe, qw(config -f), $f, 'include.path');
 	if (-r $f) {
-		chomp(my $x = run_qx([qw(git config -f), $f, 'include.path']));
+		chomp(my $x = run_qx(\@cmd));
 		return if $x eq $v;
 	}
-	run_die([qw(git config -f), $f, 'include.path', $v ]);
+	run_die [@cmd, $v];
 }
 
 sub add_epoch {
diff --git a/lib/PublicInbox/WwwCoderepo.pm b/lib/PublicInbox/WwwCoderepo.pm
index 8d825284..5e086fee 100644
--- a/lib/PublicInbox/WwwCoderepo.pm
+++ b/lib/PublicInbox/WwwCoderepo.pm
@@ -24,8 +24,9 @@ use PublicInbox::OnDestroy;
 use URI::Escape qw(uri_escape_utf8);
 use File::Spec;
 use autodie qw(fcntl open);
+use PublicInbox::Git qw(git_exe);
 
-my @EACH_REF = (qw(git for-each-ref --sort=-creatordate),
+my @EACH_REF = (git_exe, qw(for-each-ref --sort=-creatordate),
 		"--format=%(HEAD)%00".join('%00', map { "%($_)" }
 		qw(objectname refname:short subject creatordate:short)));
 my $HEADS_CMD = <<'';
@@ -249,7 +250,7 @@ sub summary ($$) {
 	my $qsp_err = \($ctx->{-qsp_err} = '');
 	my %opt = (quiet => 1, 2 => $ctx->{wcr}->{log_fh});
 	my %env = (GIT_DIR => $ctx->{git}->{git_dir});
-	my @log = (qw(git log), "-$nl", '--pretty=format:%d %H %h %cs %s');
+	my @log = (git_exe, 'log', "-$nl", '--pretty=format:%d %H %h %cs %s');
 	push(@log, $tip) if defined $tip;
 
 	# limit scope for MockHTTP test (t/solver_git.t)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/5] www: reduce fragmentation in /t/ and /T/ endpoints
  2024-06-06  7:44 [PATCH 0/5] cleanups + posible major fragmentation reduction Eric Wong
                   ` (3 preceding siblings ...)
  2024-06-06  7:44 ` [PATCH 4/5] treewide: use cached git executable lookup Eric Wong
@ 2024-06-06  7:44 ` Eric Wong
  2024-06-12 11:43   ` fragmentation notes [was: [PATCH 5/5] www: reduce fragmentation ...] Eric Wong
  4 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2024-06-06  7:44 UTC (permalink / raw)
  To: meta

For giant threads with /t/ and /T/ endpoints, avoid generating a
large string with a medium lifetime for the thread skeleton
($ctx->{skel}).  Instead, make $ctx->{skel} an arrayref and use
it to store a bunch of smaller strings, instead.

While keeping many small strings is inefficient due to pointer
chasing; forcing a smaller distribution of sizes makes it easier
for the malloc implementation to organize and find small chunks
of memory instead of having to find (and hold) larger contiguous
chunks.  When a large string is created now, it's lifetime is
kept as short as possible to decrease its likelyhood of causing
fragmentation.

Preliminary testing shows this appears to reduce RSS by roughly
20-40% under both glibc malloc (using a tiny
MALLOC_MMAP_THRESHOLD_=67000) on 32-bit and jemalloc 5.2.1 on
64-bit with standard settings.
---
 lib/PublicInbox/SearchView.pm |  7 ++---
 lib/PublicInbox/View.pm       | 55 ++++++++++++++++++-----------------
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 9919e25c..e27844d1 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -310,13 +310,12 @@ sub mset_thread {
 	my $rootset = PublicInbox::SearchThread::thread($msgs,
 		$r ? \&sort_relevance : \&PublicInbox::View::sort_ds,
 		$ctx);
-	my $skel = search_nav_bot($ctx, $mset, $q).'<pre>'. <<EOM;
+	$ctx->{skel} = [ search_nav_bot($ctx, $mset, $q).'<pre>'. <<EOM ];
 -- pct% links below jump to the message on this page, permalinks otherwise --
 EOM
 	$ctx->{-upfx} = '';
 	$ctx->{anchor_idx} = 1;
 	$ctx->{cur_level} = 0;
-	$ctx->{skel} = \$skel;
 	$ctx->{mapping} = {};
 	$ctx->{searchview} = 1;
 	$ctx->{prev_attr} = '';
@@ -326,7 +325,7 @@ EOM
 	# reduce hash lookups in skel_dump
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
 	PublicInbox::View::walk_thread($rootset, $ctx,
-		\&PublicInbox::View::pre_thread);
+		\&PublicInbox::View::pre_thread); # pushes to ctx->{skel}
 
 	# link $INBOX_DIR/description text to "recent" view around
 	# the newest message in this result set:
@@ -343,7 +342,7 @@ sub mset_thread_i {
 	print { $ctx->zfh } $ctx->html_top if exists $ctx->{-html_tip};
 	$eml and return PublicInbox::View::eml_entry($ctx, $eml);
 	my $smsg = shift @{$ctx->{msgs}} or
-		print { $ctx->zfh } ${delete($ctx->{skel})};
+		print { $ctx->zfh } @{delete($ctx->{skel})};
 	$smsg;
 }
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 44e1f2a8..958efa41 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -476,7 +476,7 @@ sub stream_thread_i { # PublicInbox::WwwStream::getline callback
 			print { $ctx->zfh } ghost_index_entry($ctx, $lvl, $smsg)
 		} else { # all done
 			print { $ctx->zfh } thread_adj_level($ctx, 0),
-						${delete($ctx->{skel})};
+						@{delete($ctx->{skel})};
 			return;
 		}
 	}
@@ -513,11 +513,13 @@ href="../../">newest</a>]
 EOF
 	$skel .= "<b\nid=t>Thread overview:</b> ";
 	$skel .= $nr == 1 ? '(only message)' : "$nr+ messages";
-	$skel .= " (download: <a\nhref=\"../t.mbox.gz\">mbox.gz</a>";
-	$skel .= " / follow: <a\nhref=\"../t.atom\">Atom feed</a>)\n";
-	$skel .= "-- links below jump to the message on this page --\n";
+	$skel .= <<EOM;
+ (download: <a\nhref="../t.mbox.gz">mbox.gz</a> follow: <a
+href=\"../t.atom\">Atom feed</a>
+-- links below jump to the message on this page --
+EOM
 	$ctx->{cur_level} = 0;
-	$ctx->{skel} = \$skel;
+	$ctx->{skel} = [ $skel ];
 	$ctx->{prev_attr} = '';
 	$ctx->{prev_level} = 0;
 	$ctx->{root_anchor} = 'm' . id_compress($mid, 1);
@@ -529,9 +531,9 @@ EOF
 
 	# reduce hash lookups in pre_thread->skel_dump
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
-	walk_thread($rootset, $ctx, \&pre_thread);
+	walk_thread($rootset, $ctx, \&pre_thread); # pushes to ctx->{skel}
 
-	$skel .= '</pre>';
+	push @{$ctx->{skel}}, '</pre>';
 	return stream_thread($rootset, $ctx) unless $ctx->{flat};
 
 	# flat display: lazy load the full message from smsg
@@ -553,8 +555,7 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 		while (my $smsg = shift @{$ctx->{msgs}}) {
 			return $smsg if exists($smsg->{blob});
 		}
-		my $skel = delete($ctx->{skel}) or return; # all done
-		print { $ctx->zfh } $$skel;
+		print { $ctx->zfh } @{delete $ctx->{skel} // []};
 		undef;
 	}
 }
@@ -778,13 +779,13 @@ sub thread_skel ($$$) {
 	my $ibx = $ctx->{ibx};
 	my ($nr, $msgs) = $ibx->over->get_thread($mid);
 	my $parent = in_reply_to($hdr);
-	$$skel .= "\n<b>Thread overview: </b>";
+	$skel->[-1] .= "\n<b>Thread overview: </b>";
 	if ($nr <= 1) {
 		if (defined $parent) {
-			$$skel .= SKEL_EXPAND."\n ";
-			$$skel .= ghost_parent('../', $parent) . "\n";
+			$skel->[-1] .= SKEL_EXPAND."\n ";
+			$skel->[-1] .= ghost_parent('../', $parent) . "\n";
 		} else {
-			$$skel .= "<a\nid=r>[no followups]</a> ".
+			$skel->[-1] .= "<a\nid=r>[no followups]</a> ".
 					SKEL_EXPAND."\n";
 		}
 		$ctx->{next_msg} = undef;
@@ -792,8 +793,9 @@ sub thread_skel ($$$) {
 		return;
 	}
 
-	$$skel .= $nr;
-	$$skel .= '+ messages / '.SKEL_EXPAND.qq!  <a\nhref="#b">top</a>\n!;
+	$skel->[-1] .= $nr;
+	$skel->[-1] .= '+ messages / '.SKEL_EXPAND.
+			qq!  <a\nhref="#b">top</a>\n!;
 
 	# nb: mutt only shows the first Subject in the index pane
 	# when multiple Subject: headers are present, so we follow suit:
@@ -815,7 +817,7 @@ sub thread_skel ($$$) {
 sub html_footer {
 	my ($ctx, $hdr) = @_;
 	my $upfx = '../';
-	my (@related, $skel);
+	my (@related, @skel);
 	my $foot = '<pre>';
 	my $qry = delete $ctx->{-qry};
 	if ($qry && $ctx->{ibx}->isrch) {
@@ -847,12 +849,12 @@ EOM
 		my $t = ts2str($ctx->{-t_max});
 		my $t_fmt = fmt_ts($ctx->{-t_max});
 		my $fallback = @related ? "\t" : "<a id=related>\t</a>";
-		$skel = <<EOF;
+		$skel[0] = <<EOF;
 ${fallback}other threads:[<a
 href="$upfx?t=$t">~$t_fmt UTC</a>|<a
 href="$upfx">newest</a>]
 EOF
-		thread_skel(\$skel, $ctx, $hdr);
+		thread_skel(\@skel, $ctx, $hdr);
 		my ($next, $prev);
 		my $parent = '       ';
 		$next = $prev = '    ';
@@ -879,11 +881,11 @@ EOF
 		}
 		$foot .= "$next $prev$parent ";
 	} else { # unindexed inboxes w/o over
-		$skel = qq( <a\nhref="$upfx">latest</a>);
+		$skel[0] = qq( <a\nhref="$upfx">latest</a>);
 	}
-	# $skel may be big for big threads, don't append it to $foot
+	# @skel may be big for big threads, don't push to it
 	print { $ctx->zfh } $foot, qq(<a\nhref="#R">reply</a>),
-				$skel, '</pre>', @related,
+				@skel, '</pre>', @related,
 				msg_reply($ctx, $hdr);
 }
 
@@ -985,7 +987,8 @@ sub skel_dump { # walk_thread callback
 	my $mid = $smsg->{mid};
 
 	if ($level == 0 && $ctx->{skel_dump_roots}++) {
-		$$skel .= delete($ctx->{sl_note}) || '';
+		my $note = delete $ctx->{sl_note};
+		push @$skel, $note if $note;
 	}
 
 	my $f = ascii_html(delete $smsg->{from_name});
@@ -1014,7 +1017,7 @@ sub skel_dump { # walk_thread callback
 	if ($cur) {
 		if ($cur eq $mid) {
 			delete $ctx->{cur};
-			$$skel .= "<b>$d<a\nid=r\nhref=\"#t\">".
+			push @$skel, "<b>$d<a\nid=r\nhref=\"#t\">".
 				 "$attr [this message]</a></b>\n";
 			return 1;
 		} else {
@@ -1054,8 +1057,7 @@ sub skel_dump { # walk_thread callback
 	} else {
 		$m = $ctx->{-upfx}.mid_href($mid).'/';
 	}
-	$$skel .=  $d . "<a\nhref=\"$m\"$id>" . $end;
-	1;
+	push @$skel, qq($d<a\nhref="$m"$id>$end);
 }
 
 sub _skel_ghost {
@@ -1078,8 +1080,7 @@ sub _skel_ghost {
 	} else {
 		$d .= qq{&lt;<a\nhref="$href">$html</a>&gt;\n};
 	}
-	${$ctx->{skel}} .= $d;
-	1;
+	push @{$ctx->{skel}}, $d;
 }
 
 # note: we favor Date: here because git-send-email increments it

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* fragmentation notes [was: [PATCH 5/5] www: reduce fragmentation ...]
  2024-06-06  7:44 ` [PATCH 5/5] www: reduce fragmentation in /t/ and /T/ endpoints Eric Wong
@ 2024-06-12 11:43   ` Eric Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2024-06-12 11:43 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> Preliminary testing shows this appears to reduce RSS by roughly
> 20-40% under both glibc malloc (using a tiny
> MALLOC_MMAP_THRESHOLD_=67000) on 32-bit and jemalloc 5.2.1 on
> 64-bit with standard settings.

This patch and 4fedc5caff06 (gzip_filter: use zlib DEF_MEM_LEVEL for gzip)
seem to be fairly substantial improvement in reducing glibc fragmentation;
but more is being worked on...

MALLOC_MMAP_THRESHOLD_ should probably be 65553 for both 32 and 64-bit.
This comes from the default Linux pipe size (65536) we use for sysread,
the extra NUL byte Perl adds to all scalars, and 16 bytes of glibc
overhead:

	65536 + 1 + 16 = 65553

glibc w/o the mmap threshold clamp remains awful out-of-the-box
compared to jemalloc; and git eats my memory in ways unrelated
to malloc :<

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-06-12 11:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-06  7:44 [PATCH 0/5] cleanups + posible major fragmentation reduction Eric Wong
2024-06-06  7:44 ` [PATCH 1/5] treewide: use \*STD(IN|OUT|ERR) consistently Eric Wong
2024-06-06  7:44 ` [PATCH 2/5] git: decouple git_version from git_exe Eric Wong
2024-06-06  7:44 ` [PATCH 3/5] test_common: use cached git lookup to reduce stat(2) Eric Wong
2024-06-06  7:44 ` [PATCH 4/5] treewide: use cached git executable lookup Eric Wong
2024-06-06  7:44 ` [PATCH 5/5] www: reduce fragmentation in /t/ and /T/ endpoints Eric Wong
2024-06-12 11:43   ` fragmentation notes [was: [PATCH 5/5] www: reduce fragmentation ...] Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).