* [PATCH 0/5] various cleanups and tweaks
@ 2023-04-12 0:12 Eric Wong
2023-04-12 0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12 0:12 UTC (permalink / raw)
To: meta
Mostly stuff I noticed while working on the fix in commit
4b1a8231d4bae093 (git: fix cat_async_retry, 2023-04-11).
Eric Wong (5):
git: cat_async_step: reduce batch-command info checks
gzip_filter: use carp in ->bail for failure checks
git: rename version() to git_version()
www_coderepo: drop unused $EACH_REF variable
git: parallelize manifest_entry
lib/PublicInbox/Git.pm | 82 +++++++++++++++++-----------------
lib/PublicInbox/GzipFilter.pm | 18 ++++----
lib/PublicInbox/LeiMirror.pm | 2 +-
lib/PublicInbox/TestCommon.pm | 2 +-
lib/PublicInbox/WwwCoderepo.pm | 1 -
5 files changed, 50 insertions(+), 55 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/5] git: cat_async_step: reduce batch-command info checks
2023-04-12 0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
@ 2023-04-12 0:12 ` Eric Wong
2023-04-12 0:12 ` [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks Eric Wong
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12 0:12 UTC (permalink / raw)
To: meta
This improves readability for me. Instead of checking for `info '
requests of `--batch-command' in multiple places of every
common branch, do it once per-call and stash its result.
We'll also avoid storing `$bc' for now since the only other
check is in a cold path.
---
lib/PublicInbox/Git.pm | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index f153237b..cc337e5d 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -262,20 +262,19 @@ sub cat_async_step ($$) {
my $head = my_readline($self->{in}, $rbuf);
my $cmd = ref($req) ? $$req : $req;
# ->fail may be called via Gcf2Client.pm
- my $bc = $self->{-bc};
+ my $info = $self->{-bc} && substr($cmd, 0, 5) eq 'info ';
if ($head =~ /^([0-9a-f]{40,}) (\S+) ([0-9]+)$/) {
($oid, $type, $size) = ($1, $2, $3 + 0);
- unless ($bc && $cmd =~ /\Ainfo /) { # --batch-command
+ unless ($info) { # --batch-command
$bref = my_read($self->{in}, $rbuf, $size + 1) or
$self->fail(defined($bref) ?
'read EOF' : "read: $!");
chop($$bref) eq "\n" or
$self->fail('LF missing after blob');
}
- } elsif ($bc && $cmd =~ /\Ainfo / &&
- $head =~ / (missing|ambiguous)\n/) {
+ } elsif ($info && $head =~ / (missing|ambiguous)\n/) {
$type = $1;
- $oid = substr($cmd, 5);
+ $oid = substr($cmd, 5); # remove 'info '
} elsif ($head =~ s/ missing\n//s) {
$oid = $head;
# ref($req) indicates it's already been retried
@@ -286,7 +285,7 @@ sub cat_async_step ($$) {
$type = 'missing';
if ($oid eq '') {
$oid = $cmd;
- $oid =~ s/\A(?:contents|info) // if $bc;
+ $oid =~ s/\A(?:contents|info) // if $self->{-bc};
}
} else {
my $err = $! ? " ($!)" : '';
@@ -294,7 +293,7 @@ sub cat_async_step ($$) {
}
$self->{rbuf} = $rbuf if $$rbuf ne '';
splice(@$inflight, 0, 3); # don't retry $cb on ->fail
- if ($bc && $cmd =~ /\Ainfo /) {
+ if ($info) {
eval { $cb->($oid, $type, $size, $arg, $self) };
async_err($self, $req, $oid, $@, 'check') if $@;
} else {
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks
2023-04-12 0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
2023-04-12 0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
@ 2023-04-12 0:12 ` Eric Wong
2023-04-12 0:13 ` [PATCH 3/5] git: rename version() to git_version() Eric Wong
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12 0:12 UTC (permalink / raw)
To: meta
carp is more useful since it shows the perspective of the caller
and can be made to show a full backtrace with
PERL5OPT=-MCarp=verbose
---
lib/PublicInbox/GzipFilter.pm | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index a11ba73f..a37080c8 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -18,6 +18,7 @@ use Compress::Raw::Zlib qw(Z_OK);
use PublicInbox::CompressNoop;
use PublicInbox::Eml;
use PublicInbox::GitAsyncCat;
+use Carp qw(carp);
our @EXPORT_OK = qw(gzf_maybe);
my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
@@ -173,16 +174,13 @@ sub close {
sub bail {
my $self = shift;
- if (my $env = $self->{env}) {
- warn @_, "\n";
- my $http = $env->{'psgix.io'} or return; # client abort
- eval { $http->close }; # should hit our close
- warn "E: error in http->close: $@" if $@;
- eval { $self->close }; # just in case...
- warn "E: error in self->close: $@" if $@;
- } else {
- warn @_, "\n";
- }
+ carp @_;
+ my $env = $self->{env} or return;
+ my $http = $env->{'psgix.io'} or return; # client abort
+ eval { $http->close }; # should hit our close
+ carp "E: error in http->close: $@" if $@;
+ eval { $self->close }; # just in case...
+ carp "E: error in self->close: $@" if $@;
}
# this is public-inbox-httpd-specific
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/5] git: rename version() to git_version()
2023-04-12 0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
2023-04-12 0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
2023-04-12 0:12 ` [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks Eric Wong
@ 2023-04-12 0:13 ` Eric Wong
2023-04-12 0:13 ` [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable Eric Wong
2023-04-12 0:13 ` [PATCH 5/5] git: parallelize manifest_entry Eric Wong
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12 0:13 UTC (permalink / raw)
To: meta
In case it gets confused with Inbox->version or similar.
---
lib/PublicInbox/Git.pm | 10 +++++-----
lib/PublicInbox/LeiMirror.pm | 2 +-
lib/PublicInbox/TestCommon.pm | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index cc337e5d..fd5bbc6b 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -52,11 +52,6 @@ my %ESC_GIT = map { $GIT_ESC{$_} => $_ } keys %GIT_ESC;
my $EXE_ST = ''; # pack('dd', st_ctime, st_size);
my ($GIT_EXE, $GIT_VER);
-sub version {
- check_git_exe();
- $GIT_VER;
-}
-
sub check_git_exe () {
$GIT_EXE = which('git') // die "git not found in $ENV{PATH}";
my @st = stat($GIT_EXE) or die "stat: $!";
@@ -72,6 +67,11 @@ sub check_git_exe () {
}
}
+sub git_version {
+ check_git_exe();
+ $GIT_VER;
+}
+
# unquote pathnames used by git, see quote.c::unquote_c_style.c in git.git
sub git_unquote ($) {
return $_[0] unless ($_[0] =~ /\A"(.*)"\z/);
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 8f749688..bed034f1 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -276,7 +276,7 @@ sub fetch_args ($$) {
($lei->{opt}->{jobs} // 1) > 1;
push @cmd, '-v' if $lei->{opt}->{verbose};
push(@cmd, '-p') if $lei->{opt}->{prune};
- PublicInbox::Git::version() ge v2.29.0 and
+ PublicInbox::Git::git_version() ge v2.29.0 and
push(@cmd, '--no-write-fetch-head');
@cmd;
}
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index aa2abc43..ddee58b1 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -111,7 +111,7 @@ sub have_xapian_compact (;$) {
sub require_git ($;$) {
my ($req, $nr) = @_;
require PublicInbox::Git;
- state $cur_vstr = PublicInbox::Git::version();
+ state $cur_vstr = PublicInbox::Git::git_version();
$req = eval("v$req") unless isvstring($req);
return 1 if $cur_vstr ge $req;
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable
2023-04-12 0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
` (2 preceding siblings ...)
2023-04-12 0:13 ` [PATCH 3/5] git: rename version() to git_version() Eric Wong
@ 2023-04-12 0:13 ` Eric Wong
2023-04-12 0:13 ` [PATCH 5/5] git: parallelize manifest_entry Eric Wong
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12 0:13 UTC (permalink / raw)
To: meta
It's unused since commit cbe2548c91859dfb
(www_coderepo: use OnDestroy to render summary view, 2023-04-09)
---
lib/PublicInbox/WwwCoderepo.pm | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/PublicInbox/WwwCoderepo.pm b/lib/PublicInbox/WwwCoderepo.pm
index bece45af..730c2d0a 100644
--- a/lib/PublicInbox/WwwCoderepo.pm
+++ b/lib/PublicInbox/WwwCoderepo.pm
@@ -24,7 +24,6 @@ use PublicInbox::OnDestroy;
my @EACH_REF = (qw(git for-each-ref --sort=-creatordate),
"--format=%(HEAD)%00".join('%00', map { "%($_)" }
qw(objectname refname:short subject creatordate:short)));
-my $EACH_REF = "@EACH_REF[0..2] '$EACH_REF[3]'";
my $HEADS_CMD = <<'';
# heads (aka `branches'):
$ git for-each-ref --sort=-creatordate refs/heads \
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 5/5] git: parallelize manifest_entry
2023-04-12 0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
` (3 preceding siblings ...)
2023-04-12 0:13 ` [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable Eric Wong
@ 2023-04-12 0:13 ` Eric Wong
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2023-04-12 0:13 UTC (permalink / raw)
To: meta
This saves a few milliseconds per-epoch without incurring
any dependencies on the event loop. It can be parallelized
further, of course, but it may not be worth it for -extindex
users since it's already cached.
---
lib/PublicInbox/Git.pm | 59 +++++++++++++++++++++---------------------
1 file changed, 29 insertions(+), 30 deletions(-)
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index fd5bbc6b..3108ed85 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -28,6 +28,10 @@ our $in_cleanup;
our $RDTIMEO = 60_000; # milliseconds
our $async_warn; # true in read-only daemons
+# committerdate:unix is git 2.9.4+ (2017-05-05), so using raw instead
+my @MODIFIED_DATE = qw[for-each-ref --sort=-committerdate
+ --format=%(committerdate:raw) --count=1];
+
# 512: POSIX PIPE_BUF minimum (see pipe(7))
# 3: @$inflight is flattened [ $OID, $cb, $arg ]
# 65: SHA-256 hex size + "\n" in preparation for git using non-SHA1
@@ -592,10 +596,8 @@ sub cat_async ($$$;$) {
# returns the modified time of a git repo, same as the "modified" field
# of a grokmirror manifest
-sub modified ($) {
- # committerdate:unix is git 2.9.4+ (2017-05-05), so using raw instead
- my $fh = popen($_[0], qw[for-each-ref --sort=-committerdate
- --format=%(committerdate:raw) --count=1]);
+sub modified ($;$) {
+ my $fh = $_[1] // popen($_[0], @MODIFIED_DATE);
(split(/ /, <$fh> // time))[0] + 0; # integerize for JSON
}
@@ -632,41 +634,38 @@ sub cloneurl {
# templates/this--description in git.git
sub manifest_entry {
my ($self, $epoch, $default_desc) = @_;
- my $fh = $self->popen('show-ref');
- my $dig = PublicInbox::SHA->new(1);
- while (read($fh, my $buf, 65536)) {
- $dig->add($buf);
- }
- close $fh or return; # empty, uninitialized git repo
- undef $fh; # for open, below
- my $git_dir = $self->{git_dir};
- my $ent = {
- fingerprint => $dig->hexdigest,
- reference => undef,
- modified => modified($self),
- };
- chomp(my $owner = $self->qx('config', 'gitweb.owner'));
- utf8::decode($owner);
- $ent->{owner} = $owner eq '' ? undef : $owner;
- my $desc = description($self);
- if (defined $epoch && index($desc, 'Unnamed repository') == 0) {
- $desc = "$default_desc [epoch $epoch]";
+ check_git_exe();
+ my $gd = $self->{git_dir};
+ my @git = ($GIT_EXE, "--git-dir=$gd");
+ my $sr = popen_rd([@git, 'show-ref']);
+ my $own = popen_rd([@git, qw(config gitweb.owner)]);
+ my $mod = popen_rd([@git, @MODIFIED_DATE]);
+ my $buf = description($self);
+ if (defined $epoch && index($buf, 'Unnamed repository') == 0) {
+ $buf = "$default_desc [epoch $epoch]";
}
- $ent->{description} = $desc;
- if (open($fh, '<', "$git_dir/objects/info/alternates")) {
+ my $ent = { description => $buf, reference => undef };
+ if (open(my $alt, '<', "$gd/objects/info/alternates")) {
# n.b.: GitPython doesn't seem to handle comments or C-quoted
# strings like native git does; and we don't for now, either.
local $/ = "\n";
- chomp(my @alt = <$fh>);
+ chomp(my @alt = <$alt>);
# grokmirror only supports 1 alternate for "reference",
if (scalar(@alt) == 1) {
- my $objdir = "$git_dir/objects";
- my $ref = File::Spec->rel2abs($alt[0], $objdir);
- $ref =~ s!/[^/]+/?\z!!; # basename
- $ent->{reference} = $ref;
+ $buf = File::Spec->rel2abs($alt[0], "$gd/objects");
+ $buf =~ s!/[^/]+/?\z!!; # basename
+ $ent->{reference} = $buf;
}
}
+ my $dig = PublicInbox::SHA->new(1);
+ while (read($sr, $buf, 65536)) { $dig->add($buf) }
+ close $sr or return; # empty, uninitialized git repo
+ $ent->{fingerprint} = $dig->hexdigest;
+ $ent->{modified} = modified(undef, $mod);
+ chomp($buf = <$own> // '');
+ utf8::decode($buf);
+ $ent->{owner} = $buf eq '' ? undef : $buf;
$ent;
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-04-12 0:13 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-12 0:12 [PATCH 0/5] various cleanups and tweaks Eric Wong
2023-04-12 0:12 ` [PATCH 1/5] git: cat_async_step: reduce batch-command info checks Eric Wong
2023-04-12 0:12 ` [PATCH 2/5] gzip_filter: use carp in ->bail for failure checks Eric Wong
2023-04-12 0:13 ` [PATCH 3/5] git: rename version() to git_version() Eric Wong
2023-04-12 0:13 ` [PATCH 4/5] www_coderepo: drop unused $EACH_REF variable Eric Wong
2023-04-12 0:13 ` [PATCH 5/5] git: parallelize manifest_entry Eric Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).