unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
From: Eric Wong <e@80x24.org>
To: Chris Brannon <chris@the-brannons.com>
Cc: meta@public-inbox.org
Subject: [PATCH] git: fix asynchronous batching for deep pipelines
Date: Wed, 4 Jan 2023 03:49:34 +0000	[thread overview]
Message-ID: <20230104034934.M147841@dcvr> (raw)
In-Reply-To: <877cykl387.fsf@the-brannons.com>

Chris Brannon <chris@the-brannons.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > Also, can I assume just running 'public-inbox-index' on any
> > freshly-cloned inbox also fails for you independently of
> > -convert?
> 
> Yes it does.  I double-checked to be sure.
> 
> Thanks for the fix!  If you come up with something more permanent that
> you want me to test, please ask.



--------8<-------
Subject: [PATCH] git: fix asynchronous batching for deep pipelines

...By using non-blocking pipe writes.  This avoids problems for
musl (and other libc) where getdelim(3) used by `git cat-file --batch*'
uses a smaller input buffer than glibc or FreeBSD libc.

My key mistake was our check against MAX_INFLIGHT is only useful
for the initial batch of requests.  It is not useful for
subsequent requests since git will drain the pipe at
unpredictable rates due to libc differences.

To fix this problem, I initially tried to drain the read pipe
as long as readable data was pending.  However, reading git
output without giving git more work would also limit parallelism
opportunities since we don't want git to sit idle, either.  This
change ensures we keep both pipes reasonably full to reduce
stalls and maximize parallelism between git and public-inbox.

While the limit set a few weeks ago in commit
56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21)
remains in place, any higher or lower limit will work.  It may
be worth it to use an even lower limit to improve interactivity
w.r.t. Ctrl-C interrupts.

I've tested the pre-56e6e587745c and even higher values on an
Alpine VM in the GCC Farm <https://cfarm.tetaneutral.net>

Reported-by: Chris Brannon <chris@the-brannons.com>
Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
---
 lib/PublicInbox/Git.pm | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index a1af776b..86b80a4e 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -145,6 +145,7 @@ sub _bidi_pipe {
 		fcntl($out_w, 1031, 4096);
 		fcntl($in_r, 1031, 4096) if $batch eq '--batch-check';
 	}
+	$out_w->blocking(0);
 	$self->{$out} = $out_w;
 	$self->{$in} = $in_r;
 }
@@ -203,7 +204,9 @@ sub cat_async_retry ($$) {
 	for (my $i = 0; $i < @$inflight; $i += 3) {
 		$buf .= "$inflight->[$i]\n";
 	}
+	$self->{out}->blocking(1); # brand new pipe, should never block
 	print { $self->{out} } $buf or $self->fail("write error: $!");
+	$self->{out}->blocking(0);
 	my $req = shift @$inflight;
 	unshift(@$inflight, \$req); # \$ref to indicate retried
 
@@ -305,13 +308,27 @@ sub check_async_begin ($) {
 	$self->{inflight_c} = [];
 }
 
+sub write_all {
+	my ($self, $out, $buf, $read_step, $inflight) = @_;
+	$read_step->($self, $inflight) while @$inflight >= MAX_INFLIGHT;
+	do {
+		my $w = syswrite($out, $buf);
+		if (defined $w) {
+			return if $w == length($buf);
+			warn "chop: $w";
+			substr($buf, 0, $w, ''); # sv_chop
+		} elsif ($! != EAGAIN) {
+			$self->fail("write: $!");
+		} else { warn "E: $!" }
+		$read_step->($self, $inflight);
+	} while (1);
+}
+
 sub check_async ($$$$) {
 	my ($self, $oid, $cb, $arg) = @_;
 	my $inflight_c = $self->{inflight_c} // check_async_begin($self);
-	while (scalar(@$inflight_c) >= MAX_INFLIGHT) {
-		check_async_step($self, $inflight_c);
-	}
-	print { $self->{out_c} } $oid, "\n" or $self->fail("write error: $!");
+	write_all($self, $self->{out_c}, $oid."\n",
+		\&check_async_step, $inflight_c);
 	push(@$inflight_c, $oid, $cb, $arg);
 }
 
@@ -496,10 +513,7 @@ sub cat_async_begin {
 sub cat_async ($$$;$) {
 	my ($self, $oid, $cb, $arg) = @_;
 	my $inflight = $self->{inflight} // cat_async_begin($self);
-	while (scalar(@$inflight) >= MAX_INFLIGHT) {
-		cat_async_step($self, $inflight);
-	}
-	print { $self->{out} } $oid, "\n" or $self->fail("write error: $!");
+	write_all($self, $self->{out}, $oid."\n", \&cat_async_step, $inflight);
 	push(@$inflight, $oid, $cb, $arg);
 }
 

  reply	other threads:[~2023-01-04  3:49 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-21 11:28 public-inbox-convert hangs on systems using musl libc Chris Brannon
2022-12-21 12:21 ` Eric Wong
2022-12-21 13:46   ` Chris Brannon
2022-12-21 19:48     ` Eric Wong
2022-12-21 20:46       ` Chris Brannon
2022-12-21 21:11         ` Eric Wong
2022-12-21 22:17           ` Chris Brannon
2022-12-21 23:22             ` [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum Eric Wong
2022-12-21 23:57               ` Chris Brannon
2023-01-04  3:49                 ` Eric Wong [this message]
2023-01-05  1:08                   ` [PATCH] git: fix asynchronous batching for deep pipelines Chris Brannon
2023-01-05  1:44                     ` [PATCH] git: write_all: remove leftover debug messages Eric Wong
2023-01-05  7:32                       ` Chris Brannon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230104034934.M147841@dcvr \
    --to=e@80x24.org \
    --cc=chris@the-brannons.com \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).