unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 0/5] various cleanups and tweaks
@ 2023-04-12  0:12 Eric Wong
  2023-04-12  0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12  0:12 UTC (permalink / raw)
  To: meta

Mostly stuff I noticed while working on the fix in commit
4b1a8231d4bae093 (git: fix cat_async_retry, 2023-04-11).

Eric Wong (5):
  git: cat_async_step: reduce batch-command info checks
  gzip_filter: use carp in ->bail for failure checks
  git: rename version() to git_version()
  www_coderepo: drop unused $EACH_REF variable
  git: parallelize manifest_entry

 lib/PublicInbox/Git.pm         | 82 +++++++++++++++++-----------------
 lib/PublicInbox/GzipFilter.pm  | 18 ++++----
 lib/PublicInbox/LeiMirror.pm   |  2 +-
 lib/PublicInbox/TestCommon.pm  |  2 +-
 lib/PublicInbox/WwwCoderepo.pm |  1 -
 5 files changed, 50 insertions(+), 55 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/5] git: cat_async_step: reduce batch-command info checks
  2023-04-12  0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
@ 2023-04-12  0:12 ` Eric Wong
  2023-04-12  0:12 ` [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks Eric Wong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12  0:12 UTC (permalink / raw)
  To: meta

This improves readability for me.  Instead of checking for `info '
requests of `--batch-command' in multiple places of every
common branch, do it once per-call and stash its result.

We'll also avoid storing `$bc' for now since the only other
check is in a cold path.
---
 lib/PublicInbox/Git.pm | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index f153237b..cc337e5d 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -262,20 +262,19 @@ sub cat_async_step ($$) {
 	my $head = my_readline($self->{in}, $rbuf);
 	my $cmd = ref($req) ? $$req : $req;
 	# ->fail may be called via Gcf2Client.pm
-	my $bc = $self->{-bc};
+	my $info = $self->{-bc} && substr($cmd, 0, 5) eq 'info ';
 	if ($head =~ /^([0-9a-f]{40,}) (\S+) ([0-9]+)$/) {
 		($oid, $type, $size) = ($1, $2, $3 + 0);
-		unless ($bc && $cmd =~ /\Ainfo /) { # --batch-command
+		unless ($info) { # --batch-command
 			$bref = my_read($self->{in}, $rbuf, $size + 1) or
 				$self->fail(defined($bref) ?
 						'read EOF' : "read: $!");
 			chop($$bref) eq "\n" or
 					$self->fail('LF missing after blob');
 		}
-	} elsif ($bc && $cmd =~ /\Ainfo / &&
-			$head =~ / (missing|ambiguous)\n/) {
+	} elsif ($info && $head =~ / (missing|ambiguous)\n/) {
 		$type = $1;
-		$oid = substr($cmd, 5);
+		$oid = substr($cmd, 5); # remove 'info '
 	} elsif ($head =~ s/ missing\n//s) {
 		$oid = $head;
 		# ref($req) indicates it's already been retried
@@ -286,7 +285,7 @@ sub cat_async_step ($$) {
 		$type = 'missing';
 		if ($oid eq '') {
 			$oid = $cmd;
-			$oid =~ s/\A(?:contents|info) // if $bc;
+			$oid =~ s/\A(?:contents|info) // if $self->{-bc};
 		}
 	} else {
 		my $err = $! ? " ($!)" : '';
@@ -294,7 +293,7 @@ sub cat_async_step ($$) {
 	}
 	$self->{rbuf} = $rbuf if $$rbuf ne '';
 	splice(@$inflight, 0, 3); # don't retry $cb on ->fail
-	if ($bc && $cmd =~ /\Ainfo /) {
+	if ($info) {
 		eval { $cb->($oid, $type, $size, $arg, $self) };
 		async_err($self, $req, $oid, $@, 'check') if $@;
 	} else {

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks
  2023-04-12  0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
  2023-04-12  0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
@ 2023-04-12  0:12 ` Eric Wong
  2023-04-12  0:13 ` [PATCH 3/5] git: rename version() to git_version() Eric Wong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12  0:12 UTC (permalink / raw)
  To: meta

carp is more useful since it shows the perspective of the caller
and can be made to show a full backtrace with
PERL5OPT=-MCarp=verbose
---
 lib/PublicInbox/GzipFilter.pm | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index a11ba73f..a37080c8 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -18,6 +18,7 @@ use Compress::Raw::Zlib qw(Z_OK);
 use PublicInbox::CompressNoop;
 use PublicInbox::Eml;
 use PublicInbox::GitAsyncCat;
+use Carp qw(carp);
 
 our @EXPORT_OK = qw(gzf_maybe);
 my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
@@ -173,16 +174,13 @@ sub close {
 
 sub bail  {
 	my $self = shift;
-	if (my $env = $self->{env}) {
-		warn @_, "\n";
-		my $http = $env->{'psgix.io'} or return; # client abort
-		eval { $http->close }; # should hit our close
-		warn "E: error in http->close: $@" if $@;
-		eval { $self->close }; # just in case...
-		warn "E: error in self->close: $@" if $@;
-	} else {
-		warn @_, "\n";
-	}
+	carp @_;
+	my $env = $self->{env} or return;
+	my $http = $env->{'psgix.io'} or return; # client abort
+	eval { $http->close }; # should hit our close
+	carp "E: error in http->close: $@" if $@;
+	eval { $self->close }; # just in case...
+	carp "E: error in self->close: $@" if $@;
 }
 
 # this is public-inbox-httpd-specific

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/5] git: rename version() to git_version()
  2023-04-12  0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
  2023-04-12  0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
  2023-04-12  0:12 ` [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks Eric Wong
@ 2023-04-12  0:13 ` Eric Wong
  2023-04-12  0:13 ` [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable Eric Wong
  2023-04-12  0:13 ` [PATCH 5/5] git: parallelize manifest_entry Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12  0:13 UTC (permalink / raw)
  To: meta

In case it gets confused with Inbox->version or similar.
---
 lib/PublicInbox/Git.pm        | 10 +++++-----
 lib/PublicInbox/LeiMirror.pm  |  2 +-
 lib/PublicInbox/TestCommon.pm |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index cc337e5d..fd5bbc6b 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -52,11 +52,6 @@ my %ESC_GIT = map { $GIT_ESC{$_} => $_ } keys %GIT_ESC;
 my $EXE_ST = ''; # pack('dd', st_ctime, st_size);
 my ($GIT_EXE, $GIT_VER);
 
-sub version {
-	check_git_exe();
-	$GIT_VER;
-}
-
 sub check_git_exe () {
 	$GIT_EXE = which('git') // die "git not found in $ENV{PATH}";
 	my @st = stat($GIT_EXE) or die "stat: $!";
@@ -72,6 +67,11 @@ sub check_git_exe () {
 	}
 }
 
+sub git_version {
+	check_git_exe();
+	$GIT_VER;
+}
+
 # unquote pathnames used by git, see quote.c::unquote_c_style.c in git.git
 sub git_unquote ($) {
 	return $_[0] unless ($_[0] =~ /\A"(.*)"\z/);
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 8f749688..bed034f1 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -276,7 +276,7 @@ sub fetch_args ($$) {
 			($lei->{opt}->{jobs} // 1) > 1;
 	push @cmd, '-v' if $lei->{opt}->{verbose};
 	push(@cmd, '-p') if $lei->{opt}->{prune};
-	PublicInbox::Git::version() ge v2.29.0 and
+	PublicInbox::Git::git_version() ge v2.29.0 and
 		push(@cmd, '--no-write-fetch-head');
 	@cmd;
 }
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index aa2abc43..ddee58b1 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -111,7 +111,7 @@ sub have_xapian_compact (;$) {
 sub require_git ($;$) {
 	my ($req, $nr) = @_;
 	require PublicInbox::Git;
-	state $cur_vstr = PublicInbox::Git::version();
+	state $cur_vstr = PublicInbox::Git::git_version();
 	$req = eval("v$req") unless isvstring($req);
 
 	return 1 if $cur_vstr ge $req;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable
  2023-04-12  0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
                   ` (2 preceding siblings ...)
  2023-04-12  0:13 ` [PATCH 3/5] git: rename version() to git_version() Eric Wong
@ 2023-04-12  0:13 ` Eric Wong
  2023-04-12  0:13 ` [PATCH 5/5] git: parallelize manifest_entry Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12  0:13 UTC (permalink / raw)
  To: meta

It's unused since commit cbe2548c91859dfb
(www_coderepo: use OnDestroy to render summary view, 2023-04-09)
---
 lib/PublicInbox/WwwCoderepo.pm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/PublicInbox/WwwCoderepo.pm b/lib/PublicInbox/WwwCoderepo.pm
index bece45af..730c2d0a 100644
--- a/lib/PublicInbox/WwwCoderepo.pm
+++ b/lib/PublicInbox/WwwCoderepo.pm
@@ -24,7 +24,6 @@ use PublicInbox::OnDestroy;
 my @EACH_REF = (qw(git for-each-ref --sort=-creatordate),
 		"--format=%(HEAD)%00".join('%00', map { "%($_)" }
 		qw(objectname refname:short subject creatordate:short)));
-my $EACH_REF = "@EACH_REF[0..2] '$EACH_REF[3]'";
 my $HEADS_CMD = <<'';
 # heads (aka `branches'):
 $ git for-each-ref --sort=-creatordate refs/heads \

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 5/5] git: parallelize manifest_entry
  2023-04-12  0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
                   ` (3 preceding siblings ...)
  2023-04-12  0:13 ` [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable Eric Wong
@ 2023-04-12  0:13 ` Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12  0:13 UTC (permalink / raw)
  To: meta

This saves a few milliseconds per-epoch without incurring
any dependencies on the event loop.  It can be parallelized
further, of course, but it may not be worth it for -extindex
users since it's already cached.
---
 lib/PublicInbox/Git.pm | 59 +++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index fd5bbc6b..3108ed85 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -28,6 +28,10 @@ our $in_cleanup;
 our $RDTIMEO = 60_000; # milliseconds
 our $async_warn; # true in read-only daemons
 
+# committerdate:unix is git 2.9.4+ (2017-05-05), so using raw instead
+my @MODIFIED_DATE = qw[for-each-ref --sort=-committerdate
+			--format=%(committerdate:raw) --count=1];
+
 # 512: POSIX PIPE_BUF minimum (see pipe(7))
 # 3: @$inflight is flattened [ $OID, $cb, $arg ]
 # 65: SHA-256 hex size + "\n" in preparation for git using non-SHA1
@@ -592,10 +596,8 @@ sub cat_async ($$$;$) {
 
 # returns the modified time of a git repo, same as the "modified" field
 # of a grokmirror manifest
-sub modified ($) {
-	# committerdate:unix is git 2.9.4+ (2017-05-05), so using raw instead
-	my $fh = popen($_[0], qw[for-each-ref --sort=-committerdate
-				--format=%(committerdate:raw) --count=1]);
+sub modified ($;$) {
+	my $fh = $_[1] // popen($_[0], @MODIFIED_DATE);
 	(split(/ /, <$fh> // time))[0] + 0; # integerize for JSON
 }
 
@@ -632,41 +634,38 @@ sub cloneurl {
 # templates/this--description in git.git
 sub manifest_entry {
 	my ($self, $epoch, $default_desc) = @_;
-	my $fh = $self->popen('show-ref');
-	my $dig = PublicInbox::SHA->new(1);
-	while (read($fh, my $buf, 65536)) {
-		$dig->add($buf);
-	}
-	close $fh or return; # empty, uninitialized git repo
-	undef $fh; # for open, below
-	my $git_dir = $self->{git_dir};
-	my $ent = {
-		fingerprint => $dig->hexdigest,
-		reference => undef,
-		modified => modified($self),
-	};
-	chomp(my $owner = $self->qx('config', 'gitweb.owner'));
-	utf8::decode($owner);
-	$ent->{owner} = $owner eq '' ? undef : $owner;
-	my $desc = description($self);
-	if (defined $epoch && index($desc, 'Unnamed repository') == 0) {
-		$desc = "$default_desc [epoch $epoch]";
+	check_git_exe();
+	my $gd = $self->{git_dir};
+	my @git = ($GIT_EXE, "--git-dir=$gd");
+	my $sr = popen_rd([@git, 'show-ref']);
+	my $own = popen_rd([@git, qw(config gitweb.owner)]);
+	my $mod = popen_rd([@git, @MODIFIED_DATE]);
+	my $buf = description($self);
+	if (defined $epoch && index($buf, 'Unnamed repository') == 0) {
+		$buf = "$default_desc [epoch $epoch]";
 	}
-	$ent->{description} = $desc;
-	if (open($fh, '<', "$git_dir/objects/info/alternates")) {
+	my $ent = { description => $buf, reference => undef };
+	if (open(my $alt, '<', "$gd/objects/info/alternates")) {
 		# n.b.: GitPython doesn't seem to handle comments or C-quoted
 		# strings like native git does; and we don't for now, either.
 		local $/ = "\n";
-		chomp(my @alt = <$fh>);
+		chomp(my @alt = <$alt>);
 
 		# grokmirror only supports 1 alternate for "reference",
 		if (scalar(@alt) == 1) {
-			my $objdir = "$git_dir/objects";
-			my $ref = File::Spec->rel2abs($alt[0], $objdir);
-			$ref =~ s!/[^/]+/?\z!!; # basename
-			$ent->{reference} = $ref;
+			$buf = File::Spec->rel2abs($alt[0], "$gd/objects");
+			$buf =~ s!/[^/]+/?\z!!; # basename
+			$ent->{reference} = $buf;
 		}
 	}
+	my $dig = PublicInbox::SHA->new(1);
+	while (read($sr, $buf, 65536)) { $dig->add($buf) }
+	close $sr or return; # empty, uninitialized git repo
+	$ent->{fingerprint} = $dig->hexdigest;
+	$ent->{modified} = modified(undef, $mod);
+	chomp($buf = <$own> // '');
+	utf8::decode($buf);
+	$ent->{owner} = $buf eq '' ? undef : $buf;
 	$ent;
 }
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-04-12  0:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-12  0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
2023-04-12  0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
2023-04-12  0:12 ` [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks Eric Wong
2023-04-12  0:13 ` [PATCH 3/5] git: rename version() to git_version() Eric Wong
2023-04-12  0:13 ` [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable Eric Wong
2023-04-12  0:13 ` [PATCH 5/5] git: parallelize manifest_entry Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).