unofficial mirror of meta@public-inbox.org
* [PATCH 00/95] clone: multi-inbox/repo support...
@ 2022-11-28  5:30 Eric Wong
  2022-11-28  5:30 ` [PATCH 01/95] clone: support multi-inbox clone Eric Wong
                   ` (94 more replies)
  0 siblings, 95 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:30 UTC (permalink / raw)
  To: meta

A large patchset, and not done yet :P  It's only been tested live,
but it seems to work reasonably well against live hosts...

Behavior changes to public-inbox-clone are NOT final; but
public-inbox-fetch|PublicInbox::Fetch will probably become
thin wrappers around LeiMirror.

--include= and --exclude= are now supported, with glob patterns

--keep-going and --dry-run support added, too, since it's
make(1) influenced (more below)

It supports coderepos, too, using --inbox-config=never (default: always),
along with --project-list=, --manifest=, --objstore=, and --prune.

key differences from grok-pull (grokmirror) for coderepos:

* uses relative paths on the FS (dumb HTTP untested, but dumb
  HTTP is a goal for memory-constrained hosts).  This means
  I can relocate coderepos freely within my FS or do sneakernet
  transfers across machines without having to `perl -pi -e s/x/y/'
  on hundreds of info/alternates and config files.

* CLI-only, no extra config files (may generate a Makefile, like
  individual inbox clones)

* objstore repos fetch from remotes directly
  (does not need, use, nor benefit from hardlinks at all)

* no sleep states

It is not a full replacement for grokmirror:

* reliant on default `git gc' behavior for repack. This is OK
  since it's only one-way relationships between objstore and
  non-objstore repos.

* no fsck support (probably will be in generated Makefile)

* doesn't generate forkgroups nor manifest.js.gz
  (I may do this for coderepo Xapian indexing)

It relies on parallel git-fetch for objstores, so `-j $NUM'
calculations may end up being ($NUM * $NUM) in the worst case.
Not sure how to best approach this...
Maybe `-j $M,$N', similar to `lei q -j$M,$N', is a solution...

Design note:

This is an exercise in building make(1)-like parallelism using
->DESTROY callbacks for prerequisites; so it's a newish paradigm
for me.  It forced me to fix a reference cycle, already.
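
To illustrate the DESTROY idea in isolation, here is a minimal,
self-contained sketch.  It is not the LeiMirror code: `Sketch::OnDestroy'
and the job names are made up for this example, and it assumes plain
reference counting with no cycles (a cycle would keep the target from
ever running).

```perl
use strict;
use warnings;

# Hypothetical guard object (name invented for this sketch): runs a
# callback when the last reference to it goes away.
package Sketch::OnDestroy;

sub new {
	my ($class, $cb, @args) = @_;
	bless [ $cb, @args ], $class;
}

sub DESTROY {
	my ($cb, @args) = @{$_[0]};
	$cb->(@args);
}

package main;

my @log;

# the "target": runs only after every prerequisite drops its reference
my $target = Sketch::OnDestroy->new(sub { push @log, 'index' });

# each "prerequisite" job holds a reference to the target until it is done
my @jobs;
push @jobs, { name => $_, target => $target } for qw(epoch0 epoch1 epoch2);
undef $target; # only the jobs keep the target alive now

while (my $job = shift @jobs) {
	push @log, "clone $job->{name}";
	# the job hash is released when $job is reassigned on the next pass
}
print join("\n", @log), "\n";
```

With the jobs processed in order, this should print the three `clone'
lines followed by `index', since the callback only fires once the last
job has dropped its reference.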

TODO: repo|symlink pruning, --exit-code, retry/refetch, manpage updates

Eric Wong (95):
  clone: support multi-inbox clone
  clone: support --include and --exclude with multi-clone
  clone: parallelize v2 epoch clones
  lei_mirror: async config retrieval for v2 w/ manifest
  lei_mirror: rely on DESTROY to index v2 inbox
  lei_mirror: rely on global process reaper
  clone: support parallel v1 clones
  lei_mirror: default to single job by default
  lei_mirror: move directory creation to v2-only path
  lei_mirror: retrieve description text asynchronously, too
  switch inotify/kevent stuff to v5.12
  manifest: update module blurb + v5.12
  lei_mirror: simplify _get_txt_start callers
  lei_mirror: elide description retrieval for v1|coderepo
  lei_mirror: add a hint for skipped epoch permissions
  lei_mirror: consolidate clone process management
  lei_mirror: load File::Path unconditionally
  lei_mirror: load most modules up-front
  lei_mirror: set gitweb.owner from manifest
  clone: support --dry-run / -n flag
  lei_mirror: initialize placeholders with "head" from manifest
  lei_mirror: support {reference} for v1 manifest clones
  lei_mirror: reduce noise on interrupted clones
  clone: support --inbox-config option
  lei_mirror: retrieve v2 description properly
  lei_mirror: reduce scope of v2 lock
  lei_mirror: allow --epoch on mixed v1/v2 clones
  lei_mirror: fix infinite loop in dependency resolution
  lei_mirror: defend against infinite loops
  lei_mirror: do not fetch descriptions if using manifest
  lei_mirror: require PublicInbox::Lock at use
  lei_mirror: fix glob semantics to match end-of-path
  lei_mirror: differentiate -entv vs -ent
  lei_mirror: support manifest {references} for v2 epochs
  lei_mirror: simplify v2 code paths
  clone: support --inbox-version
  lei_mirror: require Perl v5.12+
  lei_mirror: ensure curl exits 22 on HTTP 404 responses
  lei_mirror: cleanup File::Temp OO usage
  lei_mirror: add `index' target to generated Makefile
  lei_mirror: do not write Makefile for --inbox-config=never
  lei_mirror: hoist out dump_manifest sub
  lei_mirror: avoid convoluted lazy_cb usage
  lei_mirror: simplify clone_v2_prep
  lei_mirror: support --objstore and forkgroups
  lei_mirror: cleanup process reaping logic
  lei_mirror: ensure git <1.8.5 fallback can use torsocks
  clone: flesh out --objstore behavior and document
  lei_mirror: always pack refs for coderepos
  lei_mirror: set description for non-inboxes, too
  lei_mirror: force --no-tags when fetching forkgroups
  lei_mirror: preserve permissions of existing alternates file
  lei_mirror: do not show ref updates w/o --verbose
  lei_mirror: drop git <1.8.5 support
  lei_mirror: make basename more descriptive
  lei_mirror: fix --dry-run for forkgroups
  lei_mirror: forkgroups use `git fetch --multiple'
  clone: move --dry-run handling to lei_mirror
  clone: drop unnecessary requires
  clone: use v5.12
  clone: require `--objstore=' for default location
  lei_mirror: shorten remote names
  fetch: use v5.12
  fetch: eliminate File::Temp->filename var
  lei_mirror: properly pack-refs in non-forkgroup repos
  lei_mirror: show child error error code
  on_destroy: support ->cancel callback
  lei_mirror: support resuming multi-repo clones
  lei_mirror: check fingerprints before fetching
  clone: support loading manifest.js.gz from destination
  lei_mirror: delay configuring forkgroups
  clone: canonicalize destination path from CLI
  clone|fetch: support passing --prune(-tags) to `git fetch'
  lei_mirror: avoid needless FD passing
  clone: support --keep-going/-k like make(1)
  lei_mirror: don't warn on missing manifest on initial clone
  lei_mirror: respect `./' and `../' prefixes for CLI args
  lei_mirror: --manifest= affects destination, too
  lei_mirror: update fingerprints when writing local manifest.js.gz
  lei_mirror: remove janky mirror.done stamp file
  lei_mirror: simplify most process spawning
  lei_mirror: run v1_done earlier on forkgroup done
  lei_mirror: simplify forkgroup-related subs
  lei_mirror: shorten scope mirror objects
  lei_mirror: set {head} from manifest
  lei_mirror: support {symlinks} from manifest
  lei_mirror: eliminate circular references
  lei_mirror: use curl -z/--timecond if manifest exists
  lei_mirror: avoid redundant curl `-f' use
  lei_mirror: omit trailing slash for git remote.*.url
  lei_mirror: set info/web/last-modified from manifest
  lei_mirror: don't clobber inbox.config.example if it exists
  lei_mirror: break out of fgrp fetch iteration early
  clone: support --project-list= for cgit
  lei_mirror: handle forkgroup changes

 Documentation/lei-add-external.pod   |    4 +-
 Documentation/public-inbox-clone.pod |   76 ++
 Documentation/public-inbox-fetch.pod |    6 +
 lib/PublicInbox/DSKQXS.pm            |    5 +-
 lib/PublicInbox/DirIdle.pm           |    4 +-
 lib/PublicInbox/FakeInotify.pm       |   13 +-
 lib/PublicInbox/Fetch.pm             |   50 +-
 lib/PublicInbox/In2Tie.pm            |    4 +-
 lib/PublicInbox/InboxIdle.pm         |    2 +-
 lib/PublicInbox/KQNotify.pm          |   12 +-
 lib/PublicInbox/LEI.pm               |    5 +-
 lib/PublicInbox/LeiMirror.pm         | 1104 +++++++++++++++++++++-----
 lib/PublicInbox/ManifestJsGz.pm      |    8 +-
 lib/PublicInbox/OnDestroy.pm         |    5 +-
 lib/PublicInbox/TestCommon.pm        |    1 +
 script/public-inbox-clone            |   23 +-
 script/public-inbox-fetch            |    4 +-
 t/on_destroy.t                       |    8 +-
 t/www_listing.t                      |   71 +-
 19 files changed, 1148 insertions(+), 257 deletions(-)


* [PATCH 01/95] clone: support multi-inbox clone
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
@ 2022-11-28  5:30 ` Eric Wong
  2022-11-28  5:30 ` [PATCH 02/95] clone: support --include and --exclude with multi-clone Eric Wong
                   ` (93 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:30 UTC (permalink / raw)
  To: meta

This is to ensure we can do `public-inbox-clone https://yhbt.net/lore'
or `public-inbox-clone https://lore.kernel.org/' and clone all
inboxes (and whatever else git stores).
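
As a rough illustration of how manifest.js.gz keys can be split into v1
repos and v2 inboxes, here is a standalone sketch.  `classify' is a
hypothetical helper written for this example; only the `/git/N.git'
pattern is taken from the patch, and (like the patch) it assumes
everything that is not a v2 epoch is v1.

```perl
use strict;
use warnings;

# Hypothetical helper: split manifest keys into v1 paths and v2 inboxes.
# Returns (\@v1, \%v2) where %v2 maps "/$inbox_name" => [ epoch keys ].
sub classify {
	my (@keys) = @_;
	my (@v1, %v2);
	for (sort @keys) {
		if (m!\A(/.+)/git/[0-9]+\.git\z!) {
			push @{$v2{$1}}, $_; # one more epoch of inbox $1
		} else {
			push @v1, $_; # assumed to be v1 (or a coderepo)
		}
	}
	(\@v1, \%v2);
}

my ($v1, $v2) = classify(qw(/alt /bare /v2/git/0.git
			/v2/git/1.git /v2/git/2.git));
print "v1: @$v1\n";
print "v2 $_: @{$v2->{$_}}\n" for sort keys %$v2;
```

The keys used here are the ones the test in t/www_listing.t expects to
find in the saved manifest.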
---
 lib/PublicInbox/Fetch.pm     |  17 +++-
 lib/PublicInbox/LeiMirror.pm | 162 ++++++++++++++++++++++-------------
 t/www_listing.t              |  34 +++++++-
 3 files changed, 152 insertions(+), 61 deletions(-)

diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
index 364271e8..3b6aa389 100644
--- a/lib/PublicInbox/Fetch.pm
+++ b/lib/PublicInbox/Fetch.pm
@@ -44,6 +44,21 @@ sub remote_url ($$) {
 	undef
 }
 
+# PSGI mount prefixes and manifest.js.gz prefixes don't always align...
+# TODO: remove, handle multi-inbox fetch
+sub deduce_epochs ($$) {
+	my ($m, $path) = @_;
+	my ($v1_ent, @v2_epochs);
+	my $path_pfx = '';
+	$path =~ s!/+\z!!;
+	do {
+		$v1_ent = $m->{$path};
+		@v2_epochs = grep(m!\A\Q$path\E/git/[0-9]+\.git\z!, keys %$m);
+	} while (!defined($v1_ent) && !@v2_epochs &&
+		$path =~ s!\A(/[^/]+)/!/! and $path_pfx .= $1);
+	($path_pfx, $v1_ent ? $path : undef, @v2_epochs);
+}
+
 sub do_manifest ($$$) {
 	my ($lei, $dir, $ibx_uri) = @_;
 	my $muri = URI->new("$ibx_uri/manifest.js.gz");
@@ -88,7 +103,7 @@ sub do_manifest ($$$) {
 		return;
 	}
 	my (undef, $v1_path, @v2_epochs) =
-		PublicInbox::LeiMirror::deduce_epochs($mdiff, $ibx_uri->path);
+		deduce_epochs($mdiff, $ibx_uri->path);
 	[ 200, $muri, $v1_path, \@v2_epochs, $ft, $mf, $m1 ];
 }
 
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index ed8e4842..e356b5c5 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # "lei add-external --mirror" support (also "public-inbox-clone");
@@ -58,7 +58,7 @@ sub try_scrape {
 	if (my @v2_urls = grep(m!\A\Q$url\E/[0-9]+\z!, @urls)) {
 		my %v2_epochs = map {
 			my ($n) = (m!/([0-9]+)\z!);
-			$n => URI->new($_)
+			$n => [ URI->new($_), '' ]
 		} @v2_urls; # uniq
 		return clone_v2($self, \%v2_epochs);
 	}
@@ -104,26 +104,27 @@ sub ft_rename ($$$) {
 
 sub _get_txt { # non-fatal
 	my ($self, $endpoint, $file, $mode) = @_;
-	my $uri = URI->new($self->{src});
+	my $uri = URI->new($self->{cur_src} // $self->{src});
 	my $lei = $self->{lei};
 	my $path = $uri->path;
 	chop($path) eq '/' or die "BUG: $uri not canonicalized";
 	$uri->path("$path/$endpoint");
-	my $ft = File::Temp->new(TEMPLATE => "$file-XXXX", DIR => $self->{dst});
+	my $dst = $self->{cur_dst} // $self->{dst};
+	my $ft = File::Temp->new(TEMPLATE => "$file-XXXX", DIR => $dst);
 	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
 	my $cmd = $self->{curl}->for_uri($lei, $uri,
 					qw(--compressed -R -o), $ft->filename);
 	my $cerr = run_reap($lei, $cmd, $opt);
 	return "$uri missing" if ($cerr >> 8) == 22;
 	return "# @$cmd failed (non-fatal)" if $cerr;
-	ft_rename($ft, "$self->{dst}/$file", $mode);
+	ft_rename($ft, "$dst/$file", $mode);
 	undef; # success
 }
 
 # tries the relatively new /$INBOX/_/text/config/raw endpoint
 sub _try_config {
 	my ($self) = @_;
-	my $dst = $self->{dst};
+	my $dst = $self->{cur_dst} // $self->{dst};
 	if (!-d $dst || !mkdir($dst)) {
 		require File::Path;
 		File::Path::mkpath($dst);
@@ -132,7 +133,7 @@ sub _try_config {
 	my $err = _get_txt($self,
 			qw(_/text/config/raw inbox.config.example), 0444);
 	return warn($err, "\n") if $err;
-	my $f = "$self->{dst}/inbox.config.example";
+	my $f = "$dst/inbox.config.example";
 	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
 	my $ibx = $self->{ibx} = {};
 	for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {
@@ -144,7 +145,8 @@ sub _try_config {
 
 sub set_description ($) {
 	my ($self) = @_;
-	my $f = "$self->{dst}/description";
+	my $dst = $self->{cur_dst} // $self->{dst};
+	my $f = "$dst/description";
 	open my $fh, '+>>', $f or die "open($f): $!";
 	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
 	chomp(my $d = do { local $/; <$fh> } // die "read($f): $!");
@@ -152,7 +154,8 @@ sub set_description ($) {
 			$d =~ /^Unnamed repository/ || $d !~ /\S/) {
 		seek($fh, 0, SEEK_SET) or die "seek($f): $!";
 		truncate($fh, 0) or die "truncate($f): $!";
-		print $fh "mirror of $self->{src}\n" or die "print($f): $!";
+		my $src = $self->{cur_src} // $self->{src};
+		print $fh "mirror of $src\n" or die "print($f): $!";
 		close $fh or die "close($f): $!";
 	}
 }
@@ -172,7 +175,7 @@ sub index_cloned_inbox {
 			address => [ 'lei@example.com' ],
 			version => $iv,
 		};
-		$ibx->{inboxdir} = $self->{dst};
+		$ibx->{inboxdir} = $self->{cur_dst} // $self->{dst};
 		PublicInbox::Inbox->new($ibx);
 		PublicInbox::InboxWritable->new($ibx);
 		my $opt = {};
@@ -188,6 +191,7 @@ sub index_cloned_inbox {
 		PublicInbox::Admin::progress_prepare($opt, $lei->{2});
 		PublicInbox::Admin::index_inbox($ibx, undef, $opt);
 	}
+	return if defined $self->{cur_dst};
 	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }
 
@@ -205,21 +209,22 @@ sub clone_v1 {
 	my ($self) = @_;
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
-	my $uri = URI->new($self->{src});
+	my $uri = URI->new($self->{cur_src} // $self->{src});
 	defined($lei->{opt}->{epoch}) and
 		die "$uri is a v1 inbox, --epoch is not supported\n";
 	my $pfx = $curl->torsocks($lei, $uri) or return;
+	my $dst = $self->{cur_dst} // $self->{dst};
 	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}),
-			$uri->as_string, $self->{dst} ];
+			$uri->as_string, $dst ];
 	my $cerr = run_reap($lei, $cmd, $opt);
 	return $lei->child_error($cerr, "@$cmd failed") if $cerr;
 	_try_config($self);
-	write_makefile($self->{dst}, 1);
+	write_makefile($dst, 1);
 	index_cloned_inbox($self, 1);
 }
 
 sub parse_epochs ($$) {
-	my ($opt_epochs, $v2_epochs) = @_; # $epcohs "LOW..HIGH"
+	my ($opt_epochs, $v2_epochs) = @_; # $epochs "LOW..HIGH"
 	$opt_epochs // return; # undef => all epochs
 	my ($lo, $dotdot, $hi, @extra) = split(/(\.\.)/, $opt_epochs);
 	undef($lo) if ($lo // '') eq '';
@@ -282,12 +287,13 @@ sub clone_v2 ($$;$) {
 	my ($self, $v2_epochs, $m) = @_; # $m => manifest.js.gz hashref
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
-	my $pfx = $curl->torsocks($lei, (values %$v2_epochs)[0]) or return;
-	my $dst = $self->{dst};
+	my $first_uri = (map { $_->[0] } values %$v2_epochs)[0];
+	my $pfx = $curl->torsocks($lei, $first_uri) or return;
+	my $dst = $self->{cur_dst} // $self->{dst};
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
-	my (@src_edst, @read_only, @skip_nr);
+	my (@src_edst, @read_only, @skip);
 	for my $nr (sort { $a <=> $b } keys %$v2_epochs) {
-		my $uri = $v2_epochs->{$nr};
+		my ($uri, $key) = @{$v2_epochs->{$nr}};
 		my $src = $uri->as_string;
 		my $edst = $dst;
 		$src =~ m!/([0-9]+)(?:\.git)?\z! or die <<"";
@@ -300,15 +306,11 @@ failed to extract epoch number from $src
 		} else { # create a placeholder so users only need to chmod +w
 			init_placeholder($src, $edst);
 			push @read_only, $edst;
-			push @skip_nr, $nr;
+			push @skip, $key;
 		}
 	}
-	if (@skip_nr) { # filter out the epochs we skipped
-		my $re = join('|', @skip_nr);
-		my @del = grep(m!/git/$re\.git\z!, keys %$m);
-		delete @$m{@del};
-		$self->{-culled_manifest} = 1;
-	}
+	# filter out the epochs we skipped
+	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	_try_config($self);
 	my $on_destroy = $lk->lock_for_scope($$);
@@ -326,25 +328,11 @@ failed to extract epoch number from $src
 		my @st = stat($edst) or die "stat($edst): $!";
 		chmod($st[2] & 0555, $edst) or die "chmod(a-w, $edst): $!";
 	}
-	write_makefile($self->{dst}, 2);
+	write_makefile($dst, 2);
 	undef $on_destroy; # unlock
 	index_cloned_inbox($self, 2);
 }
 
-# PSGI mount prefixes and manifest.js.gz prefixes don't always align...
-sub deduce_epochs ($$) {
-	my ($m, $path) = @_;
-	my ($v1_ent, @v2_epochs);
-	my $path_pfx = '';
-	$path =~ s!/+\z!!;
-	do {
-		$v1_ent = $m->{$path};
-		@v2_epochs = grep(m!\A\Q$path\E/git/[0-9]+\.git\z!, keys %$m);
-	} while (!defined($v1_ent) && !@v2_epochs &&
-		$path =~ s!\A(/[^/]+)/!/! and $path_pfx .= $1);
-	($path_pfx, $v1_ent ? $path : undef, @v2_epochs);
-}
-
 sub decode_manifest ($$$) {
 	my ($fh, $fn, $uri) = @_;
 	my $js;
@@ -357,6 +345,40 @@ sub decode_manifest ($$$) {
 	$m;
 }
 
+sub multi_inbox ($$$) {
+	my ($self, $path, $m) = @_;
+
+	# assuming everything not v2 is v1, for now
+	my @v1 = sort grep(!m!.+/git/[0-9]+\.git\z!, keys %$m);
+	my @v2_epochs = sort grep(m!.+/git/[0-9]+\.git\z!, keys %$m);
+	my $v2 = {};
+
+	for (@v2_epochs) {
+		m!\A/(.+)/git/[0-9]+\.git\z! or die "BUG: $_";
+		push @{$v2->{$1}}, $_;
+	}
+	my $n = scalar(keys %$v2) + scalar(@v1);
+	my $ret; # { v1 => [ ... ], v2 => { $inbox_name => [ epochs ] }}
+	$ret->{v1} = \@v1 if @v1;
+	$ret->{v2} = $v2 if keys %$v2;
+	my $path_pfx = '';
+
+	# PSGI mount prefixes and manifest.js.gz prefixes don't always align...
+	if (@v2_epochs) {
+		until (grep(m!\A\Q$$path\E/git/[0-9]+\.git\z!,
+				@v2_epochs) == @v2_epochs) {
+			$$path =~ s!\A(/[^/]+)/!/! or last;
+			$path_pfx .= $1;
+		}
+	} elsif (@v1) {
+		while (!defined($m->{$$path}) && $$path =~ s!\A(/[^/]+)/!/!) {
+			$path_pfx .= $1;
+		}
+	}
+	($path_pfx, $n, $ret);
+}
+
+# FIXME: this gets confused by single inbox instance w/ global manifest.js.gz
 sub try_manifest {
 	my ($self) = @_;
 	my $uri = URI->new($self->{src});
@@ -384,25 +406,48 @@ sub try_manifest {
 		warn $@;
 		return try_scrape($self);
 	}
-	my ($path_pfx, $v1_path, @v2_epochs) = deduce_epochs($m, $path);
-	if (@v2_epochs) {
-		# It may be possible to have v1 + v2 in parallel someday:
-		warn(<<EOM) if defined $v1_path;
-# `$v1_path' appears to be a v1 inbox while v2 epochs exist:
-# @v2_epochs
-# ignoring $v1_path (use --inbox-version=1 to force v1 instead)
+	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
+	if (my $v2 = delete $multi->{v2}) {
+		for my $name (sort keys %$v2) {
+			my $epochs = delete $v2->{$name};
+			my %v2_epochs = map {
+				$uri->path($n > 1 ? $path_pfx.$path.$_
+						: $path_pfx.$_);
+				my ($e) = ("$uri" =~ m!/([0-9]+)\.git\z!);
+				$e // die "no [0-9]+\.git in `$uri'";
+				$e => [ $uri->clone, $_ ];
+			} @$epochs;
+			("$uri" =~ m!\A(.+/)git/[0-9]+\.git\z!) or
+				die "BUG: `$uri' !~ m!/git/[0-9]+.git!";
+			local $self->{cur_src} = $1;
+			local $self->{cur_dst} = $self->{dst};
+			if ($n > 1 && $uri->path =~ m!\A\Q$path_pfx$path\E/(.+)/
+							git/[0-9]+\.git\z!x) {
+				$self->{cur_dst} .= "/$1";
+			}
+			index($self->{cur_dst}, "\n") >= 0 and die <<EOM;
+E: `$self->{cur_dst}' must not contain newline
 EOM
-		my %v2_epochs = map {
-			$uri->path($path_pfx.$_);
-			my ($n) = ("$uri" =~ m!/([0-9]+)\.git\z!);
-			$n => $uri->clone
-		} @v2_epochs;
-		clone_v2($self, \%v2_epochs, $m);
-	} elsif (defined $v1_path) {
-		clone_v1($self);
-	} else {
-		die "E: confused by <$uri>, possible matches:\n\t",
-			join("\n\t", sort keys %$m), "\n";
+			clone_v2($self, \%v2_epochs, $m);
+		}
+	}
+	if (my $v1 = delete $multi->{v1}) {
+		my $p = $path_pfx.$path;
+		chop($p) if substr($p, -1, 1) eq '/';
+		$uri->path($p);
+		for my $name (@$v1) {
+			local $self->{cur_src} = "$uri";
+			local $self->{cur_dst} = $self->{dst};
+			if ($n > 1) {
+				$self->{cur_dst} .= $name;
+				$self->{cur_src} .= $name;
+			}
+			index($self->{cur_dst}, "\n") >= 0 and die <<EOM;
+E: `$self->{cur_dst}' must not contain newline
+EOM
+			$self->{cur_src} .= '/';
+			clone_v1($self, 1);
+		}
 	}
 	if (delete $self->{-culled_manifest}) { # set by clone_v2
 		# write the smaller manifest if epochs were skipped so
@@ -414,6 +459,7 @@ EOM
 		utime($mtime, $mtime, $fn) or die "utime(..., $fn): $!";
 	}
 	ft_rename($ft, "$self->{dst}/manifest.js.gz", 0666);
+	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }
 
 sub start_clone_url {
diff --git a/t/www_listing.t b/t/www_listing.t
index c556a2d7..e88bfbc5 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -1,5 +1,5 @@
 #!perl -w
-# Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 # manifest.js.gz generation and grok-pull integration test
 use strict; use v5.10.1; use PublicInbox::TestCommon;
@@ -115,10 +115,38 @@ SKIP: {
 
 	my $env = { PI_CONFIG => $cfgfile };
 	my $cmd = [ '-httpd', '-W0', "--stdout=$out", "--stderr=$err" ];
+	my $psgi = "$tmpdir/pfx.psgi";
+	{
+		open my $psgi_fh, '>', $psgi or xbail "open: $!";
+		print $psgi_fh <<'EOM' or xbail "print $!";
+use PublicInbox::WWW;
+use Plack::Builder;
+my $www = PublicInbox::WWW->new;
+builder {
+	enable 'Head';
+	mount '/pfx/' => sub { $www->call(@_) }
+}
+EOM
+		close $psgi_fh or xbail "close: $!";
+	}
+
+	# ensure prefixed mount full clones work:
+	$td = start_script([@$cmd, $psgi], $env, { 3 => $sock });
+	my $opt = { 2 => \(my $clone_err = '') };
+	ok(run_script(['-clone', "http://$host:$port/pfx", "$tmpdir/pfx" ],
+		undef, $opt), 'pfx clone w/pfx') or diag "clone_err=$clone_err";
+	undef $td;
+
 	$td = start_script($cmd, $env, { 3 => $sock });
 
 	# default publicinboxGrokManifest match=domain default
 	tiny_test($json, $host, $port);
+
+	# normal full clone on /
+	$clone_err = '';
+	ok(run_script(['-clone', "http://$host:$port/", "$tmpdir/full" ],
+		undef, $opt), 'full clone') or diag "clone_err=$clone_err";
+
 	undef $td;
 
 	print $fh <<"" or xbail "print $!";
@@ -127,9 +155,11 @@ SKIP: {
 
 	close $fh or xbail "close $!";
 	$td = start_script($cmd, $env, { 3 => $sock });
-	tiny_test($json, $host, $port, 1);
 	undef $sock;
+	tiny_test($json, $host, $port, 1);
 
+	# grok-pull sleeps a long while some places:
+	# https://lore.kernel.org/tools/20211013110344.GA10632@dcvr/
 	skip 'TEST_GROK unset', 12 unless $ENV{TEST_GROK};
 	my $grok_pull = require_cmd('grok-pull', 1) or
 		skip('grok-pull not available', 12);


* [PATCH 02/95] clone: support --include and --exclude with multi-clone
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
  2022-11-28  5:30 ` [PATCH 01/95] clone: support multi-inbox clone Eric Wong
@ 2022-11-28  5:30 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 03/95] clone: parallelize v2 epoch clones Eric Wong
                   ` (92 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:30 UTC (permalink / raw)
  To: meta

These will be handy when someone is interested in a subset of
inboxes on a large hosting site.
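
A simplified sketch of include/exclude filtering follows.  It does not
use public-inbox's own glob2re; `simple_glob2re' and `filter_names' are
hypothetical helpers for this example which only understand `*', `?'
and `[...]', and which anchor the pattern to the whole name (the real
matching semantics may differ).

```perl
use strict;
use warnings;

# Hypothetical, simplified glob translator (NOT public-inbox's glob2re).
# `*' => any run of characters, `?' => one character, `[...]' is passed
# through as a character class; everything else is matched literally.
sub simple_glob2re {
	my ($glob) = @_;
	my $re = '';
	while ($glob =~ /\G(?:(\*)|(\?)|(\[[^\]]+\])|(.))/gs) {
		$re .= defined($1) ? '.*'
			: defined($2) ? '.'
			: defined($3) ? $3
			: quotemeta($4);
	}
	qr/\A$re\z/;
}

# Keep names matching any include pattern (if given), then drop names
# matching any exclude pattern (if given).
sub filter_names {
	my ($names, $incl, $excl) = @_;
	my @keep = @$names;
	if ($incl && @$incl) {
		my @re = map { simple_glob2re($_) } @$incl;
		@keep = grep { my $n = $_; grep { $n =~ $_ } @re } @keep;
	}
	if ($excl && @$excl) {
		my @re = map { simple_glob2re($_) } @$excl;
		@keep = grep { my $n = $_; !grep { $n =~ $_ } @re } @keep;
	}
	@keep;
}

my @names = qw(/alt /bare /v2);
print join(' ', filter_names(\@names, [ '*/alt' ], undef)), "\n";
print join(' ', filter_names(\@names, undef, [ '/v2' ])), "\n";
```

Applying include before exclude means a name must pass both filters when
both options are given.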
---
 Documentation/public-inbox-clone.pod | 14 +++++++++++++
 lib/PublicInbox/LeiMirror.pm         | 31 +++++++++++++++++++++++++---
 script/public-inbox-clone            |  2 +-
 t/www_listing.t                      | 22 ++++++++++++++++++++
 4 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index c80c3c5f..7e95146e 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -51,6 +51,20 @@ C<--epoch=~2..> clones the three latest epochs.
 Default: C<0..~0> or C<0..> or C<..~0>
 (all epochs, all three examples are equivalent)
 
+=item -I PATTERN
+
+=item --include=PATTERN
+
+When cloning a top-level with multiple inboxes, only clone inboxes and
+repositories matching a given wildcard pattern (using C<*?> and C<[]> is
+supported).
+
+=item --exclude=PATTERN
+
+When cloning a top-level with multiple inboxes, ignore inboxes and
+repositories matching the given wildcard pattern.  Supports the same
+wildcards as L</--include>
+
 =item -q
 
 =item --quiet
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index e356b5c5..d5017642 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -347,6 +347,8 @@ sub decode_manifest ($$$) {
 
 sub multi_inbox ($$$) {
 	my ($self, $path, $m) = @_;
+	my $incl = $self->{lei}->{opt}->{include};
+	my $excl = $self->{lei}->{opt}->{exclude};
 
 	# assuming everything not v2 is v1, for now
 	my @v1 = sort grep(!m!.+/git/[0-9]+\.git\z!, keys %$m);
@@ -354,13 +356,35 @@ sub multi_inbox ($$$) {
 	my $v2 = {};
 
 	for (@v2_epochs) {
-		m!\A/(.+)/git/[0-9]+\.git\z! or die "BUG: $_";
+		m!\A(/.+)/git/[0-9]+\.git\z! or die "BUG: $_";
 		push @{$v2->{$1}}, $_;
 	}
 	my $n = scalar(keys %$v2) + scalar(@v1);
-	my $ret; # { v1 => [ ... ], v2 => { $inbox_name => [ epochs ] }}
+	my @orig = defined($incl // $excl) ? (keys %$v2, @v1) : ();
+	if (defined $incl) {
+		my $re = '(?:'.join('|', map {
+				$self->{lei}->glob2re($_) // qr/\A\Q$_\E\z/
+			} @$incl).')';
+		my @gone = delete @$v2{grep(!/$re/, keys %$v2)};
+		delete @$m{map { @$_ } @gone} and $self->{-culled_manifest} = 1;
+		delete @$m{grep(!/$re/, @v1)} and $self->{-culled_manifest} = 1;
+		@v1 = grep(/$re/, @v1);
+	}
+	if (defined $excl) {
+		my $re = '(?:'.join('|', map {
+				$self->{lei}->glob2re($_) // qr/\A\Q$_\E\z/
+			} @$excl).')';
+		my @gone = delete @$v2{grep(/$re/, keys %$v2)};
+		delete @$m{map { @$_ } @gone} and $self->{-culled_manifest} = 1;
+		delete @$m{grep(/$re/, @v1)} and $self->{-culled_manifest} = 1;
+		@v1 = grep(!/$re/, @v1);
+	}
+	my $ret; # { v1 => [ ... ], v2 => { "/$inbox_name" => [ epochs ] }}
 	$ret->{v1} = \@v1 if @v1;
 	$ret->{v2} = $v2 if keys %$v2;
+	$ret //= @orig ? "Nothing to clone, available repositories:\n\t".
+				join("\n\t", sort @orig)
+			: "Nothing available to clone\n";
 	my $path_pfx = '';
 
 	# PSGI mount prefixes and manifest.js.gz prefixes don't always align...
@@ -407,6 +431,7 @@ sub try_manifest {
 		return try_scrape($self);
 	}
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
+	return $lei->child_error(1, $multi) if !ref($multi);
 	if (my $v2 = delete $multi->{v2}) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};
@@ -449,7 +474,7 @@ EOM
 			clone_v1($self, 1);
 		}
 	}
-	if (delete $self->{-culled_manifest}) { # set by clone_v2
+	if (delete $self->{-culled_manifest}) { # set by clone_v2/-I/--exclude
 		# write the smaller manifest if epochs were skipped so
 		# users won't have to delete manifest if they +w an
 		# epoch they no longer want to skip
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 54059d03..4244e0c8 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -21,7 +21,7 @@ options:
     --quiet | -q      increase verbosity (may be repeated)
     -C DIR            chdir to specified directory
 EOF
-GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@
+GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
 		no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config
diff --git a/t/www_listing.t b/t/www_listing.t
index e88bfbc5..e6bb1bda 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -135,8 +135,29 @@ EOM
 	my $opt = { 2 => \(my $clone_err = '') };
 	ok(run_script(['-clone', "http://$host:$port/pfx", "$tmpdir/pfx" ],
 		undef, $opt), 'pfx clone w/pfx') or diag "clone_err=$clone_err";
+
+	open my $mh, '<', "$tmpdir/pfx/manifest.js.gz" or xbail "open: $!";
+	gunzip(\(do { local $/; <$mh> }) => \(my $mjs = ''));
+	my $mf = $json->decode($mjs);
+	is_deeply([sort keys %$mf], [ qw(/alt /bare /v2/git/0.git
+					/v2/git/1.git /v2/git/2.git) ],
+		'manifest saved');
+	for (keys %$mf) { ok(-d "$tmpdir/pfx$_", "pfx/$_ cloned") }
+
+	$clone_err = '';
+	ok(run_script(['-clone', '--include=*/alt',
+			"http://$host:$port/pfx", "$tmpdir/incl" ],
+		undef, $opt), 'clone w/include') or diag "clone_err=$clone_err";
+	ok(-d "$tmpdir/incl/alt", 'alt cloned');
+	ok(!-d "$tmpdir/incl/v2" && !-d "$tmpdir/incl/bare", 'only alt cloned');
+
 	undef $td;
 
+	open $mh, '<', "$tmpdir/incl/manifest.js.gz" or xbail "open: $!";
+	gunzip(\(do { local $/; <$mh> }) => \($mjs = ''));
+	$mf = $json->decode($mjs);
+	is_deeply([keys %$mf], [ '/alt' ], 'excluded keys skipped in manifest');
+
 	$td = start_script($cmd, $env, { 3 => $sock });
 
 	# default publicinboxGrokManifest match=domain default
@@ -146,6 +167,7 @@ EOM
 	$clone_err = '';
 	ok(run_script(['-clone', "http://$host:$port/", "$tmpdir/full" ],
 		undef, $opt), 'full clone') or diag "clone_err=$clone_err";
+	ok(-d "$tmpdir/full/$_", "$_ cloned") for qw(alt v2 bare);
 
 	undef $td;
 


* [PATCH 03/95] clone: parallelize v2 epoch clones
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
  2022-11-28  5:30 ` [PATCH 01/95] clone: support multi-inbox clone Eric Wong
  2022-11-28  5:30 ` [PATCH 02/95] clone: support --include and --exclude with multi-clone Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 04/95] lei_mirror: async config retrieval for v2 w/ manifest Eric Wong
                   ` (91 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This is a first step in supporting completely parallelized
clones.  Eventually, everything will be parallelized and
dependencies will be managed via callbacks.
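
For readers unfamiliar with the pattern, a bounded pool of child
processes can be sketched as below.  This is a simplification under
stated assumptions: `run_parallel' is a hypothetical helper, it uses
plain fork/exec and a blocking waitpid(-1, 0) instead of the SIGCHLD
handler and PublicInbox::Spawn used in the patch, and the commands are
trivial perl one-liners rather than `git clone'.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run each command (an arrayref) with at most
# $jobs children alive at once.  Stops starting new commands after the
# first failure and returns the number of failed commands.
sub run_parallel {
	my ($jobs, @cmds) = @_;
	my %live; # pid => command arrayref
	my $failed = 0;
	my $reap_one = sub {
		my $pid = waitpid(-1, 0); # block until any child exits
		if ($pid <= 0) { %live = (); return } # no children left
		my $cmd = delete $live{$pid} or return;
		if ($?) {
			warn "# @$cmd failed: \$?=$?\n";
			$failed++;
		}
	};
	while (@cmds && !$failed) {
		$reap_one->() while keys(%live) >= $jobs;
		last if $failed;
		my $cmd = shift @cmds;
		my $pid = fork // die "fork: $!";
		if ($pid == 0) { # child
			exec { $cmd->[0] } @$cmd;
			warn "exec @$cmd: $!\n";
			POSIX::_exit(127);
		}
		$live{$pid} = $cmd;
	}
	$reap_one->() while keys(%live); # wait for stragglers
	$failed;
}

my @cmds = map { [ $^X, '-e', 'exit 0' ] } 1..4;
print 'failed: ', run_parallel(2, @cmds), "\n";
```

A blocking waitpid keeps the sketch short; a SIGCHLD-driven reaper, as
in the patch, avoids blocking the parent while children run.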
---
 lib/PublicInbox/LeiMirror.pm  | 52 ++++++++++++++++++++++++++++++-----
 lib/PublicInbox/TestCommon.pm |  1 +
 script/public-inbox-clone     |  2 +-
 3 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index d5017642..a8aa6a9f 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -14,6 +14,7 @@ use PublicInbox::Spawn qw(popen_rd spawn);
 use File::Temp ();
 use Fcntl qw(SEEK_SET O_CREAT O_EXCL O_WRONLY);
 use Carp qw(croak);
+use POSIX qw(WNOHANG);
 
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
@@ -283,6 +284,16 @@ EOM
 	close $fh or die "close:($f): $!";
 }
 
+sub reap_clone { # async, called via SIGCHLD
+	my ($lei, $cmd, $live) = @_;
+	my $cerr = $?;
+	$? = 0; # don't let it influence normal exit
+	if ($cerr) {
+		kill('TERM', keys %$live);
+		$lei->child_error($cerr, "@$cmd failed");
+	}
+}
+
 sub clone_v2 ($$;$) {
 	my ($self, $v2_epochs, $m) = @_; # $m => manifest.js.gz hashref
 	my $lei = $self->{lei};
@@ -312,14 +323,41 @@ failed to extract epoch number from $src
 	# filter out the epochs we skipped
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
-	_try_config($self);
+	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
+
+	_try_config($task);
 	my $on_destroy = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
-	while (my ($src, $edst) = splice(@src_edst, 0, 2)) {
-		my $cmd = [ @$pfx, @cmd, $src, $edst ];
-		my $cerr = run_reap($lei, $cmd, $opt);
-		return $lei->child_error($cerr, "@$cmd failed") if $cerr;
-	}
+	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
+	my %live;
+	my $sigchld = sub {
+		my ($sig) = @_;
+		my $flags = $sig ? WNOHANG : 0;
+		while (1) {
+			my $pid = waitpid(-1, $flags) or return;
+			return if $pid < 0;
+			if (my $x = delete $live{$pid}) {
+				my $cb = shift @$x;
+				$cb->(@$x, \%live);
+			} else {
+				warn "reaped unknown PID=$pid ($?)\n";
+			}
+		}
+	};
+	do {
+		$sigchld->(0) while keys(%live) >= $jobs;
+		local $SIG{CHLD} = $sigchld;
+		while (keys(%live) < $jobs && @src_edst &&
+				!$lei->{child_error}) {
+			my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
+			$lei->qerr("# @$cmd");
+			my $pid = spawn($cmd, undef, $opt);
+			$live{$pid} = [ \&reap_clone, $lei, $cmd ];
+		}
+	} while (@src_edst && !$lei->{child_error});
+	$sigchld->(0) while keys(%live);
+	return if $lei->{child_error};
+
 	require PublicInbox::MultiGit;
 	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
 	$mg->fill_alternates;
@@ -330,7 +368,7 @@ failed to extract epoch number from $src
 	}
 	write_makefile($dst, 2);
 	undef $on_destroy; # unlock
-	index_cloned_inbox($self, 2);
+	index_cloned_inbox($task, 2);
 }
 
 sub decode_manifest ($$$) {
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index e793a001..888c1f1e 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -291,6 +291,7 @@ sub run_script ($;$$) {
 	my ($cmd, $env, $opt) = @_;
 	my ($key, @argv) = @$cmd;
 	my $run_mode = $ENV{TEST_RUN_MODE} // $opt->{run_mode} // 1;
+	$run_mode = 0 if $key eq '-clone'; # relies on SIGCHLD + waitpid(-1)
 	my $sub = $run_mode == 0 ? undef : key2sub($key);
 	my $fhref = [];
 	my $spawn_opt = {};
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 4244e0c8..ce4697f3 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -22,7 +22,7 @@ options:
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
-		no-torsocks torsocks=s epoch=s)) or die $help;
+		jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config
 PublicInbox::Admin::do_chdir(delete $opt->{C});


* [PATCH 04/95] lei_mirror: async config retrieval for v2 w/ manifest
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (2 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 03/95] clone: parallelize v2 epoch clones Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 05/95] lei_mirror: rely on DESTROY to index v2 inbox Eric Wong
                   ` (90 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Another step towards being able to minimize mirror time by
supporting parallelization.
---
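A rough sketch of the start/done split this patch moves towards,
assuming nothing beyond core Perl.  fetch_start, fetch_done and the
%task hash are hypothetical stand-ins, not the LeiMirror API; the idea
is only that the "start" half returns a PID without waiting, and the
"done" half inspects `$?' once some caller has reaped that PID.

```perl
use strict;
use warnings;
use POSIX ();

# hypothetical "start" half: spawn the command, return its PID at once
sub fetch_start {
	my ($task, @cmd) = @_;
	my $pid = fork // die "fork: $!";
	if ($pid == 0) { # child
		exec { $cmd[0] } @cmd or POSIX::_exit(127);
	}
	$task->{cmd} = \@cmd; # remembered for the "done" half
	$pid;
}

# hypothetical "done" half: call only after waitpid has set $?
sub fetch_done { # returns an error string (non-fatal) or undef on success
	my ($task) = @_;
	my $cerr = $?;
	$? = 0; # don't let it influence normal exit
	return "# @{$task->{cmd}} failed (non-fatal)" if $cerr;
	undef;
}

my %task;
my $pid = fetch_start(\%task, $^X, '-e', 'exit 0');
# other children could be started here while the first one runs
waitpid($pid, 0) == $pid or die "waitpid: $!";
my $err = fetch_done(\%task);
print defined($err) ? "$err\n" : "ok\n";
```

Waiting immediately after the start, as the last lines do, gives the
same blocking behavior as a synchronous wrapper; the benefit only
appears once the caller starts other work between the two halves.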
 lib/PublicInbox/LeiMirror.pm | 41 +++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index a8aa6a9f..ed2d71b8 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -103,7 +103,7 @@ sub ft_rename ($$$) {
 	$ft->unlink_on_destroy(0);
 }
 
-sub _get_txt { # non-fatal
+sub _get_txt_start { # non-fatal
 	my ($self, $endpoint, $file, $mode) = @_;
 	my $uri = URI->new($self->{cur_src} // $self->{src});
 	my $lei = $self->{lei};
@@ -115,15 +115,25 @@ sub _get_txt { # non-fatal
 	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
 	my $cmd = $self->{curl}->for_uri($lei, $uri,
 					qw(--compressed -R -o), $ft->filename);
-	my $cerr = run_reap($lei, $cmd, $opt);
+	$self->{-get_txt} = [ $ft, $cmd, $uri, $file, $mode ];
+	$lei->qerr("# @$cmd");
+	spawn($cmd, undef, $opt);
+}
+
+sub _get_txt_done {
+	my ($self) = @_;
+	my ($ft, $cmd, $uri, $file, $mode) = @{delete $self->{-get_txt}};
+	my $cerr = $?;
+	$? = 0;
 	return "$uri missing" if ($cerr >> 8) == 22;
 	return "# @$cmd failed (non-fatal)" if $cerr;
+	my $dst = $self->{cur_dst} // $self->{dst};
 	ft_rename($ft, "$dst/$file", $mode);
 	undef; # success
 }
 
 # tries the relatively new /$INBOX/_/text/config/raw endpoint
-sub _try_config {
+sub _try_config_start {
 	my ($self) = @_;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	if (!-d $dst || !mkdir($dst)) {
@@ -131,9 +141,14 @@ sub _try_config {
 		File::Path::mkpath($dst);
 		-d $dst or die "mkpath($dst): $!\n";
 	}
-	my $err = _get_txt($self,
-			qw(_/text/config/raw inbox.config.example), 0444);
+	_get_txt_start($self, qw(_/text/config/raw inbox.config.example), 0444);
+}
+
+sub _try_config_done {
+	my ($self) = @_;
+	my $err = _get_txt_done($self);
 	return warn($err, "\n") if $err;
+	my $dst = $self->{cur_dst} // $self->{dst};
 	my $f = "$dst/inbox.config.example";
 	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
 	my $ibx = $self->{ibx} = {};
@@ -144,6 +159,17 @@ sub _try_config {
 	}
 }
 
+sub _get_txt { # non-fatal temporary compat function
+	waitpid(_get_txt_start(@_), 0) > 0 or die "waitpid: $!";
+	_get_txt_done($_[0]);
+}
+
+# tries the relatively new /$INBOX/_/text/config/raw endpoint
+sub _try_config { # temporary compat function
+	waitpid(_try_config_start($_[0]), 0) > 0 or die "waitpid: $!";
+	_try_config_done($_[0]);
+}
+
 sub set_description ($) {
 	my ($self) = @_;
 	my $dst = $self->{cur_dst} // $self->{dst};
@@ -324,12 +350,11 @@ failed to extract epoch number from $src
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
-
-	_try_config($task);
+	my %live;
+	$live{_try_config_start($task)} = [ \&_try_config_done, $task ];
 	my $on_destroy = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
 	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
-	my %live;
 	my $sigchld = sub {
 		my ($sig) = @_;
 		my $flags = $sig ? WNOHANG : 0;


* [PATCH 05/95] lei_mirror: rely on DESTROY to index v2 inbox
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (3 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 04/95] lei_mirror: async config retrieval for v2 w/ manifest Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 06/95] lei_mirror: rely on global process reaper Eric Wong
                   ` (89 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This will give us more freedom in upcoming commits
to ensure indexing only happens after all epochs
are cloned.
---
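A minimal sketch of the DESTROY-based ordering, assuming plain Perl
reference counting.  DoneGuard is a hypothetical stand-in for
PublicInbox::OnDestroy (the PID argument seen in the patch is omitted),
and %pending plays the role of the table of live children.  Every
pending job holds a reference to the same guard, so the callback can
only fire once the last job has been removed.

```perl
use strict;
use warnings;

package DoneGuard; # hypothetical, not PublicInbox::OnDestroy itself

sub new {
	my ($class, $cb, @args) = @_;
	bless [ $cb, @args ], $class;
}

sub DESTROY { # runs when the last reference goes away
	my ($cb, @args) = @{$_[0]};
	$cb->(@args);
}

package main;

my @log;
my %pending; # job name => [ guard ]
{
	my $fini = DoneGuard->new(sub { push @log, "index $_[0]" }, 'inbox');
	$pending{"epoch$_"} = [ $fini ] for (0..2);
} # $fini is out of scope here, but %pending still holds three references

for my $name (sort keys %pending) {
	push @log, "cloned $name";
	delete $pending{$name}; # the final delete triggers DESTROY
}
print "$_\n" for @log;
```

The "index" entry lands after all three "cloned" entries without any
explicit counter; the ordering falls out of the reference count.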
 lib/PublicInbox/LeiMirror.pm | 43 +++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index ed2d71b8..0603dd48 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -320,6 +320,22 @@ sub reap_clone { # async, called via SIGCHLD
 	}
 }
 
+sub v2_done {
+	my ($self) = @_;
+	require PublicInbox::MultiGit;
+	my $dst = $self->{cur_dst} // $self->{dst};
+	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
+	$mg->fill_alternates;
+	for my $i ($mg->git_epochs) { $mg->epoch_cfg_set($i) }
+	for my $edst (@{delete($self->{-read_only}) // []}) {
+		my @st = stat($edst) or die "stat($edst): $!";
+		chmod($st[2] & 0555, $edst) or die "chmod(a-w, $edst): $!";
+	}
+	write_makefile($dst, 2);
+	delete $self->{-locked} // die "BUG: $dst not locked"; # unlock
+	index_cloned_inbox($self, 2);
+}
+
 sub clone_v2 ($$;$) {
 	my ($self, $v2_epochs, $m) = @_; # $m => manifest.js.gz hashref
 	my $lei = $self->{lei};
@@ -328,7 +344,8 @@ sub clone_v2 ($$;$) {
 	my $pfx = $curl->torsocks($lei, $first_uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
-	my (@src_edst, @read_only, @skip);
+	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
+	my (@src_edst, @skip);
 	for my $nr (sort { $a <=> $b } keys %$v2_epochs) {
 		my ($uri, $key) = @{$v2_epochs->{$nr}};
 		my $src = $uri->as_string;
@@ -342,17 +359,17 @@ failed to extract epoch number from $src
 			push @src_edst, $src, $edst;
 		} else { # create a placeholder so users only need to chmod +w
 			init_placeholder($src, $edst);
-			push @read_only, $edst;
+			push @{$task->{-read_only}}, $edst;
 			push @skip, $key;
 		}
 	}
 	# filter out the epochs we skipped
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
-	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
 	my %live;
-	$live{_try_config_start($task)} = [ \&_try_config_done, $task ];
-	my $on_destroy = $lk->lock_for_scope($$);
+	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
+	$live{_try_config_start($task)} = [ \&_try_config_done, $task, $fini ];
+	$task->{-locked} = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
 	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
 	my $sigchld = sub {
@@ -371,29 +388,15 @@ failed to extract epoch number from $src
 	};
 	do {
 		$sigchld->(0) while keys(%live) >= $jobs;
-		local $SIG{CHLD} = $sigchld;
 		while (keys(%live) < $jobs && @src_edst &&
 				!$lei->{child_error}) {
 			my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
 			$lei->qerr("# @$cmd");
 			my $pid = spawn($cmd, undef, $opt);
-			$live{$pid} = [ \&reap_clone, $lei, $cmd ];
+			$live{$pid} = [ \&reap_clone, $lei, $cmd, $fini ];
 		}
 	} while (@src_edst && !$lei->{child_error});
 	$sigchld->(0) while keys(%live);
-	return if $lei->{child_error};
-
-	require PublicInbox::MultiGit;
-	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
-	$mg->fill_alternates;
-	for my $i ($mg->git_epochs) { $mg->epoch_cfg_set($i) }
-	for my $edst (@read_only) {
-		my @st = stat($edst) or die "stat($edst): $!";
-		chmod($st[2] & 0555, $edst) or die "chmod(a-w, $edst): $!";
-	}
-	write_makefile($dst, 2);
-	undef $on_destroy; # unlock
-	index_cloned_inbox($task, 2);
 }
 
 sub decode_manifest ($$$) {


* [PATCH 06/95] lei_mirror: rely on global process reaper
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (4 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 05/95] lei_mirror: rely on DESTROY to index v2 inbox Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 07/95] clone: support parallel v1 clones Eric Wong
                   ` (88 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

For predictability, we no longer rely on SIGCHLD and instead
call waitpid at safe points.  This will make it easier for us to
do parallel mirroring of multiple inboxes while preserving
proper dependencies via ->DESTROY callbacks.
---
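The reaping pattern described above could be sketched roughly as
follows.  reap_one, start_job and @done are hypothetical names, and the
children merely exit with a chosen status instead of running
`git clone'.  Since waitpid is only ever called from the main flow,
there is no window where a child is reaped before its PID is recorded.

```perl
use strict;
use warnings;
use POSIX ();

our %LIVE; # pid => [ callback, @args ]
my @done;

sub reap_one { # blocks until any one child exits
	my $pid = waitpid(-1, 0);
	die "waitpid(-1): $!" if $pid <= 0; # waitpid returns -1 on error
	if (my $x = delete $LIVE{$pid}) {
		my $cb = shift @$x;
		$cb->(@$x); # $? still holds the status of $pid
	} else {
		warn "reaped unknown PID=$pid ($?)\n";
	}
}

sub start_job { # hypothetical: child stands in for a clone process
	my ($name, $code) = @_;
	my $pid = fork // die "fork: $!";
	POSIX::_exit($code) if $pid == 0;
	$LIVE{$pid} = [ sub { push @done, [ $_[0], $? >> 8 ] }, $name ];
}

my $jobs = 2;
for my $job ([ a => 0 ], [ b => 0 ], [ c => 1 ], [ d => 0 ]) {
	reap_one() while keys(%LIVE) >= $jobs; # safe point: limit reached
	start_job(@$job);
}
reap_one() while keys(%LIVE); # safe point: drain before returning
printf "%s exited %d\n", @$_ for @done;
```

The order of the printed lines depends on which child exits first, so
only the per-job exit codes are stable across runs.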
 lib/PublicInbox/LeiMirror.pm | 54 +++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 0603dd48..7dc47ab8 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -14,7 +14,7 @@ use PublicInbox::Spawn qw(popen_rd spawn);
 use File::Temp ();
 use Fcntl qw(SEEK_SET O_CREAT O_EXCL O_WRONLY);
 use Carp qw(croak);
-use POSIX qw(WNOHANG);
+our %LIVE;
 
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
@@ -61,7 +61,9 @@ sub try_scrape {
 			my ($n) = (m!/([0-9]+)\z!);
 			$n => [ URI->new($_), '' ]
 		} @v2_urls; # uniq
-		return clone_v2($self, \%v2_epochs);
+		clone_v2($self, \%v2_epochs);
+		reap_live() while keys(%LIVE);
+		return;
 	}
 
 	# filter out common URLs served by WWW (e.g /$MSGID/T/)
@@ -311,16 +313,16 @@ EOM
 }
 
 sub reap_clone { # async, called via SIGCHLD
-	my ($lei, $cmd, $live) = @_;
+	my ($lei, $cmd) = @_;
 	my $cerr = $?;
 	$? = 0; # don't let it influence normal exit
 	if ($cerr) {
-		kill('TERM', keys %$live);
+		kill('TERM', keys %LIVE);
 		$lei->child_error($cerr, "@$cmd failed");
 	}
 }
 
-sub v2_done {
+sub v2_done { # called via OnDestroy
 	my ($self) = @_;
 	require PublicInbox::MultiGit;
 	my $dst = $self->{cur_dst} // $self->{dst};
@@ -336,6 +338,16 @@ sub v2_done {
 	index_cloned_inbox($self, 2);
 }
 
+sub reap_live {
+	my $pid = waitpid(-1, 0) // die "waitpid(-1): $!";
+	if (my $x = delete $LIVE{$pid}) {
+		my $cb = shift @$x;
+		$cb->(@$x);
+	} else {
+		warn "reaped unknown PID=$pid ($?)\n";
+	}
+}
+
 sub clone_v2 ($$;$) {
 	my ($self, $v2_epochs, $m) = @_; # $m => manifest.js.gz hashref
 	my $lei = $self->{lei};
@@ -366,37 +378,21 @@ failed to extract epoch number from $src
 	# filter out the epochs we skipped
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
-	my %live;
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
-	$live{_try_config_start($task)} = [ \&_try_config_done, $task, $fini ];
+	$LIVE{_try_config_start($task)} = [ \&_try_config_done, $task, $fini ];
 	$task->{-locked} = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
 	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
-	my $sigchld = sub {
-		my ($sig) = @_;
-		my $flags = $sig ? WNOHANG : 0;
-		while (1) {
-			my $pid = waitpid(-1, $flags) or return;
-			return if $pid < 0;
-			if (my $x = delete $live{$pid}) {
-				my $cb = shift @$x;
-				$cb->(@$x, \%live);
-			} else {
-				warn "reaped unknown PID=$pid ($?)\n";
-			}
-		}
-	};
 	do {
-		$sigchld->(0) while keys(%live) >= $jobs;
-		while (keys(%live) < $jobs && @src_edst &&
+		reap_live() while keys(%LIVE) >= $jobs;
+		while (keys(%LIVE) < $jobs && @src_edst &&
 				!$lei->{child_error}) {
 			my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
 			$lei->qerr("# @$cmd");
-			my $pid = spawn($cmd, undef, $opt);
-			$live{$pid} = [ \&reap_clone, $lei, $cmd, $fini ];
+			$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone,
+							$lei, $cmd, $fini ];
 		}
 	} while (@src_edst && !$lei->{child_error});
-	$sigchld->(0) while keys(%live);
 }
 
 sub decode_manifest ($$$) {
@@ -487,6 +483,7 @@ sub try_manifest {
 	my $opt = { -C => $pdir };
 	$opt->{$_} = $lei->{$_} for (0..2);
 	my $cerr = run_reap($lei, $cmd, $opt);
+	local %LIVE;
 	if ($cerr) {
 		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
 		return $lei->child_error($cerr, "@$cmd failed");
@@ -498,6 +495,7 @@ sub try_manifest {
 	}
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
+	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
 	if (my $v2 = delete $multi->{v2}) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};
@@ -520,6 +518,8 @@ sub try_manifest {
 E: `$self->{cur_dst}' must not contain newline
 EOM
 			clone_v2($self, \%v2_epochs, $m);
+			reap_live() while keys(%LIVE) >= $jobs;
+			return if $self->{lei}->{child_error};
 		}
 	}
 	if (my $v1 = delete $multi->{v1}) {
@@ -540,6 +540,7 @@ EOM
 			clone_v1($self, 1);
 		}
 	}
+	reap_live() while keys(%LIVE);
 	if (delete $self->{-culled_manifest}) { # set by clone_v2/-I/--exclude
 		# write the smaller manifest if epochs were skipped so
 		# users won't have to delete manifest if they +w an
@@ -566,6 +567,7 @@ sub do_mirror { # via wq_io_do
 	eval {
 		my $iv = $lei->{opt}->{'inbox-version'};
 		if (defined $iv) {
+			local %LIVE;
 			return clone_v1($self) if $iv == 1;
 			return try_scrape($self) if $iv == 2;
 			die "bad --inbox-version=$iv\n";


* [PATCH 07/95] clone: support parallel v1 clones
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (5 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 06/95] lei_mirror: rely on global process reaper Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 08/95] lei_mirror: default to single job by default Eric Wong
                   ` (87 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This opens the door to parallel cloning of coderepos, too.  We
can also get rid of needless AutoReap usage here, since its
usage has been 100% synchronous and not DESTROY-based as it
is in tests.
---
 lib/PublicInbox/LeiMirror.pm | 59 +++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 7dc47ab8..a0b197a7 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -7,7 +7,6 @@ use strict;
 use v5.10.1;
 use parent qw(PublicInbox::IPC);
 use PublicInbox::Config;
-use PublicInbox::AutoReap;
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
 use IO::Compress::Gzip qw(gzip $GzipError);
 use PublicInbox::Spawn qw(popen_rd spawn);
@@ -166,12 +165,6 @@ sub _get_txt { # non-fatal temporary compat function
 	_get_txt_done($_[0]);
 }
 
-# tries the relatively new /$INBOX/_/text/config/raw endpoint
-sub _try_config { # temporary compat function
-	waitpid(_try_config_start($_[0]), 0) > 0 or die "waitpid: $!";
-	_try_config_done($_[0]);
-}
-
 sub set_description ($) {
 	my ($self) = @_;
 	my $dst = $self->{cur_dst} // $self->{dst};
@@ -227,15 +220,14 @@ sub index_cloned_inbox {
 sub run_reap {
 	my ($lei, $cmd, $opt) = @_;
 	$lei->qerr("# @$cmd");
-	my $ar = PublicInbox::AutoReap->new(spawn($cmd, undef, $opt));
-	$ar->join;
+	waitpid(spawn($cmd, undef, $opt), 0) // die "waitpid: $!";
 	my $ret = $?;
 	$? = 0; # don't let it influence normal exit
 	$ret;
 }
 
 sub clone_v1 {
-	my ($self) = @_;
+	my ($self, $nohang) = @_;
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
 	my $uri = URI->new($self->{cur_src} // $self->{src});
@@ -243,13 +235,17 @@ sub clone_v1 {
 		die "$uri is a v1 inbox, --epoch is not supported\n";
 	my $pfx = $curl->torsocks($lei, $uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
-	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}),
-			$uri->as_string, $dst ];
-	my $cerr = run_reap($lei, $cmd, $opt);
-	return $lei->child_error($cerr, "@$cmd failed") if $cerr;
-	_try_config($self);
-	write_makefile($dst, 1);
-	index_cloned_inbox($self, 1);
+	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
+	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
+	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}), "$uri", $dst ];
+	$lei->qerr("# @$cmd");
+	$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone, $lei, $cmd, $fini ];
+	reap_live() while keys(%LIVE) >= $jobs;
+
+	# wait for `git clone' to mkdir $dst (TODO: inotify/kevent?)
+	select(undef, undef, undef, 0.011) until -d $dst;
+	$LIVE{_try_config_start($self)} = [ \&_try_config_done, $self, $fini ];
+	reap_live() until ($nohang || !keys(%LIVE));
 }
 
 sub parse_epochs ($$) {
@@ -322,6 +318,13 @@ sub reap_clone { # async, called via SIGCHLD
 	}
 }
 
+sub v1_done { # called via OnDestroy
+	my ($self) = @_;
+	my $dst = $self->{cur_dst} // $self->{dst};
+	write_makefile($dst, 1);
+	index_cloned_inbox($self, 1);
+}
+
 sub v2_done { # called via OnDestroy
 	my ($self) = @_;
 	require PublicInbox::MultiGit;
@@ -527,20 +530,26 @@ EOM
 		chop($p) if substr($p, -1, 1) eq '/';
 		$uri->path($p);
 		for my $name (@$v1) {
-			local $self->{cur_src} = "$uri";
-			local $self->{cur_dst} = $self->{dst};
+			reap_live() while keys(%LIVE) >= $jobs;
+			return if $self->{lei}->{child_error};
+
+			my $task = bless { %$self }, __PACKAGE__;
+			$task->{cur_src} = "$uri";
+			$task->{cur_dst} = $task->{dst};
 			if ($n > 1) {
-				$self->{cur_dst} .= $name;
-				$self->{cur_src} .= $name;
+				$task->{cur_dst} .= $name;
+				$task->{cur_src} .= $name;
 			}
-			index($self->{cur_dst}, "\n") >= 0 and die <<EOM;
-E: `$self->{cur_dst}' must not contain newline
+			index($task->{cur_dst}, "\n") >= 0 and die <<EOM;
+E: `$task->{cur_dst}' must not contain newline
 EOM
-			$self->{cur_src} .= '/';
-			clone_v1($self, 1);
+			$task->{cur_src} .= '/';
+			clone_v1($task, 1);
 		}
 	}
 	reap_live() while keys(%LIVE);
+	return if $self->{lei}->{child_error};
+
 	if (delete $self->{-culled_manifest}) { # set by clone_v2/-I/--exclude
 		# write the smaller manifest if epochs were skipped so
 		# users won't have to delete manifest if they +w an


* [PATCH 08/95] lei_mirror: default to single job by default
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (6 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 07/95] clone: support parallel v1 clones Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 09/95] lei_mirror: move directory creation to v2-only path Eric Wong
                   ` (86 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Parallel git clones are expensive on the server-side, and
smaller machines (which we encourage) can't handle them well.

We'll also set `-q' when more than one job is requested, since
parallel clones would have their output step all over each other.
---
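A small sketch of the resulting defaults, loosely modelled on
clone_cmd.  clone_args is a hypothetical helper and the option hash
is a simplified stand-in for $lei->{opt}.

```perl
use strict;
use warnings;

sub clone_args { # hypothetical helper, not the real clone_cmd
	my ($opt) = @_;
	my $jobs = $opt->{jobs} // 1; # serial unless -j is given
	my @cmd = qw(git clone --mirror);
	# interleaved progress output from several clones is unreadable
	push @cmd, '-q' if $opt->{quiet} || $jobs > 1;
	push @cmd, '-v' if $opt->{verbose};
	($jobs, @cmd);
}

for my $opt ({}, { jobs => 4 }, { quiet => 1 }) {
	my ($jobs, @cmd) = clone_args($opt);
	print "jobs=$jobs: @cmd\n";
}
```

With no options this stays a plain serial `git clone --mirror';
`-q' only appears once --quiet or more than one job is requested.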
 lib/PublicInbox/LeiMirror.pm | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index a0b197a7..285c64d8 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -87,7 +87,8 @@ sub clone_cmd {
 	# e.g.: git -c http.proxy=socks5h://127.0.0.1:9050
 	push(@cmd, '-c', $_) for @{$lei->{opt}->{c} // []};
 	push @cmd, qw(clone --mirror);
-	push @cmd, '-q' if $lei->{opt}->{quiet};
+	push @cmd, '-q' if $lei->{opt}->{quiet} ||
+			($lei->{opt}->{jobs} // 1) > 1;
 	push @cmd, '-v' if $lei->{opt}->{verbose};
 	# XXX any other options to support?
 	# --reference is tricky with multiple epochs...
@@ -236,7 +237,7 @@ sub clone_v1 {
 	my $pfx = $curl->torsocks($lei, $uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
-	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
+	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}), "$uri", $dst ];
 	$lei->qerr("# @$cmd");
 	$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone, $lei, $cmd, $fini ];
@@ -385,7 +386,7 @@ failed to extract epoch number from $src
 	$LIVE{_try_config_start($task)} = [ \&_try_config_done, $task, $fini ];
 	$task->{-locked} = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
-	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
+	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	do {
 		reap_live() while keys(%LIVE) >= $jobs;
 		while (keys(%LIVE) < $jobs && @src_edst &&
@@ -498,7 +499,7 @@ sub try_manifest {
 	}
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
-	my $jobs = $self->{lei}->{opt}->{jobs} // 2;
+	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	if (my $v2 = delete $multi->{v2}) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};


* [PATCH 09/95] lei_mirror: move directory creation to v2-only path
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (7 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 08/95] lei_mirror: default to single job by default Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 10/95] lei_mirror: retrieve description text asynchronously, too Eric Wong
                   ` (85 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We rely on `git clone' to create the destination directory
for v1 and coderepos, so having it in _try_config_start was
senseless.
---
 lib/PublicInbox/LeiMirror.pm | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 285c64d8..6d72f15d 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -137,12 +137,6 @@ sub _get_txt_done {
 # tries the relatively new /$INBOX/_/text/config/raw endpoint
 sub _try_config_start {
 	my ($self) = @_;
-	my $dst = $self->{cur_dst} // $self->{dst};
-	if (!-d $dst || !mkdir($dst)) {
-		require File::Path;
-		File::Path::mkpath($dst);
-		-d $dst or die "mkpath($dst): $!\n";
-	}
 	_get_txt_start($self, qw(_/text/config/raw inbox.config.example), 0444);
 }
 
@@ -381,6 +375,12 @@ failed to extract epoch number from $src
 	}
 	# filter out the epochs we skipped
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
+
+	if (!-d $dst || !mkdir($dst)) {
+		require File::Path;
+		File::Path::mkpath($dst);
+		-d $dst or die "mkpath($dst): $!\n";
+	}
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 	$LIVE{_try_config_start($task)} = [ \&_try_config_done, $task, $fini ];


* [PATCH 10/95] lei_mirror: retrieve description text asynchronously, too
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (8 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 09/95] lei_mirror: move directory creation to v2-only path Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 11/95] switch inotify/kevent stuff to v5.12 Eric Wong
                   ` (84 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can easily parallelize this, so do it.
---
 lib/PublicInbox/LeiMirror.pm | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 6d72f15d..0696c2d9 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -122,13 +122,13 @@ sub _get_txt_start { # non-fatal
 	spawn($cmd, undef, $opt);
 }
 
-sub _get_txt_done {
+sub _get_txt_done { # returns true on error (non-fatal), undef on success
 	my ($self) = @_;
 	my ($ft, $cmd, $uri, $file, $mode) = @{delete $self->{-get_txt}};
 	my $cerr = $?;
 	$? = 0;
-	return "$uri missing" if ($cerr >> 8) == 22;
-	return "# @$cmd failed (non-fatal)" if $cerr;
+	return warn("$uri missing\n") if ($cerr >> 8) == 22;
+	return warn("# @$cmd failed (non-fatal)\n") if $cerr;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	ft_rename($ft, "$dst/$file", $mode);
 	undef; # success
@@ -142,8 +142,7 @@ sub _try_config_start {
 
 sub _try_config_done {
 	my ($self) = @_;
-	my $err = _get_txt_done($self);
-	return warn($err, "\n") if $err;
+	_get_txt_done($self) and return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $f = "$dst/inbox.config.example";
 	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
@@ -155,11 +154,6 @@ sub _try_config_done {
 	}
 }
 
-sub _get_txt { # non-fatal temporary compat function
-	waitpid(_get_txt_start(@_), 0) > 0 or die "waitpid: $!";
-	_get_txt_done($_[0]);
-}
-
 sub set_description ($) {
 	my ($self) = @_;
 	my $dst = $self->{cur_dst} // $self->{dst};
@@ -180,8 +174,6 @@ sub set_description ($) {
 sub index_cloned_inbox {
 	my ($self, $iv) = @_;
 	my $lei = $self->{lei};
-	my $err = _get_txt($self, qw(description description), 0666);
-	warn($err, "\n") if $err; # non fatal
 	eval { set_description($self) };
 	warn $@ if $@;
 
@@ -240,6 +232,9 @@ sub clone_v1 {
 	# wait for `git clone' to mkdir $dst (TODO: inotify/kevent?)
 	select(undef, undef, undef, 0.011) until -d $dst;
 	$LIVE{_try_config_start($self)} = [ \&_try_config_done, $self, $fini ];
+	reap_live() while keys(%LIVE) >= $jobs;
+	$LIVE{_get_txt_start($self, qw(description description), 0666)} =
+			[ \&_get_txt_done, $self, $fini ];
 	reap_live() until ($nohang || !keys(%LIVE));
 }
 
@@ -383,10 +378,13 @@ failed to extract epoch number from $src
 	}
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
+	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	$LIVE{_try_config_start($task)} = [ \&_try_config_done, $task, $fini ];
+	reap_live() while keys(%LIVE) >= $jobs;
+	$LIVE{_get_txt_start($self, qw(description description), 0666)} =
+				[ \&_get_txt_done, $self, $fini ];
 	$task->{-locked} = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
-	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	do {
 		reap_live() while keys(%LIVE) >= $jobs;
 		while (keys(%LIVE) < $jobs && @src_edst &&


* [PATCH 11/95] switch inotify/kevent stuff to v5.12
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (9 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 10/95] lei_mirror: retrieve description text asynchronously, too Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 12/95] manifest: update module blurb + v5.12 Eric Wong
                   ` (83 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Another tiny step towards eventual startup time improvements
by avoiding strict.pm.
---
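For reference, a quick way to confirm that `use v5.12' alone puts
strictures in effect, so dropping the explicit `use strict' lines
loses no checking.  The snippet is illustrative only and is not part
of the tree; the variable names are made up.

```perl
use warnings;

# compiles: the variable is declared with `my'
my $ok = eval q{
	use v5.12;
	my $declared = 1;
	$declared + 1;
};

# fails to compile: strict vars is implied by `use v5.12'
my $bad = eval q{
	use v5.12;
	$undeclared = 1;
};
my $err = $@;

print "declared: $ok\n";
print 'undeclared: ', (defined($bad) ? 'accepted' : 'rejected'), "\n";
print $err;
```

The second eval is rejected at compile time with a "Global symbol"
error, the same diagnostic an explicit `use strict' would give.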
 lib/PublicInbox/DSKQXS.pm      |  5 ++---
 lib/PublicInbox/DirIdle.pm     |  4 ++--
 lib/PublicInbox/FakeInotify.pm | 13 ++++++-------
 lib/PublicInbox/In2Tie.pm      |  4 ++--
 lib/PublicInbox/InboxIdle.pm   |  2 +-
 lib/PublicInbox/KQNotify.pm    | 12 +++++-------
 6 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/lib/PublicInbox/DSKQXS.pm b/lib/PublicInbox/DSKQXS.pm
index 7141b131..cb035bd5 100644
--- a/lib/PublicInbox/DSKQXS.pm
+++ b/lib/PublicInbox/DSKQXS.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2019-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # Licensed the same as Danga::Socket (and Perl5)
 # License: GPL-1.0+ or Artistic-1.0-Perl
 #  <https://www.gnu.org/licenses/gpl-1.0.txt>
@@ -11,8 +11,7 @@
 #
 # It also implements signalfd(2) emulation via "tie".
 package PublicInbox::DSKQXS;
-use strict;
-use warnings;
+use v5.12;
 use parent qw(Exporter);
 use Symbol qw(gensym);
 use IO::KQueue;
diff --git a/lib/PublicInbox/DirIdle.pm b/lib/PublicInbox/DirIdle.pm
index 9206da9c..55c3982f 100644
--- a/lib/PublicInbox/DirIdle.pm
+++ b/lib/PublicInbox/DirIdle.pm
@@ -1,9 +1,9 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # Used by public-inbox-watch for Maildir (and possibly MH in the future)
 package PublicInbox::DirIdle;
-use strict;
+use v5.12;
 use parent 'PublicInbox::DS';
 use PublicInbox::Syscall qw(EPOLLIN);
 use PublicInbox::In2Tie;
diff --git a/lib/PublicInbox/FakeInotify.pm b/lib/PublicInbox/FakeInotify.pm
index 6d269601..45b80f50 100644
--- a/lib/PublicInbox/FakeInotify.pm
+++ b/lib/PublicInbox/FakeInotify.pm
@@ -1,11 +1,10 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # for systems lacking Linux::Inotify2 or IO::KQueue, just emulates
 # enough of Linux::Inotify2
 package PublicInbox::FakeInotify;
-use strict;
-use v5.10.1;
+use v5.12;
 use parent qw(Exporter);
 use Time::HiRes qw(stat);
 use PublicInbox::DS qw(add_timer);
@@ -119,7 +118,7 @@ sub poll_once {
 }
 
 package PublicInbox::FakeInotify::Watch;
-use strict;
+use v5.12;
 
 sub cancel {
 	my ($self) = @_;
@@ -132,7 +131,7 @@ sub name {
 }
 
 package PublicInbox::FakeInotify::Event;
-use strict;
+use v5.12;
 
 sub fullname { ${$_[0]} }
 
@@ -141,14 +140,14 @@ sub IN_MOVED_FROM { 0 }
 sub IN_DELETE_SELF { 0 }
 
 package PublicInbox::FakeInotify::GoneEvent;
-use strict;
+use v5.12;
 our @ISA = qw(PublicInbox::FakeInotify::Event);
 
 sub IN_DELETE { 1 }
 sub IN_MOVED_FROM { 0 }
 
 package PublicInbox::FakeInotify::SelfGoneEvent;
-use strict;
+use v5.12;
 our @ISA = qw(PublicInbox::FakeInotify::GoneEvent);
 
 sub IN_DELETE_SELF { 1 }
diff --git a/lib/PublicInbox/In2Tie.pm b/lib/PublicInbox/In2Tie.pm
index ffe26a44..3689432b 100644
--- a/lib/PublicInbox/In2Tie.pm
+++ b/lib/PublicInbox/In2Tie.pm
@@ -1,10 +1,10 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # used to ensure PublicInbox::DS can call fileno() as a function
 # on Linux::Inotify2 objects
 package PublicInbox::In2Tie;
-use strict;
+use v5.12;
 use Symbol qw(gensym);
 
 sub io {
diff --git a/lib/PublicInbox/InboxIdle.pm b/lib/PublicInbox/InboxIdle.pm
index 005e2636..f0d8a972 100644
--- a/lib/PublicInbox/InboxIdle.pm
+++ b/lib/PublicInbox/InboxIdle.pm
@@ -5,7 +5,7 @@
 # inot: Linux::Inotify2-like object
 # pathmap => { inboxdir => [ ibx, watch1, watch2, watch3... ] } mapping
 package PublicInbox::InboxIdle;
-use strict;
+use v5.12;
 use parent qw(PublicInbox::DS);
 use PublicInbox::Syscall qw(EPOLLIN);
 my $IN_MODIFY = 0x02; # match Linux inotify
diff --git a/lib/PublicInbox/KQNotify.pm b/lib/PublicInbox/KQNotify.pm
index 7efb8b60..381711fa 100644
--- a/lib/PublicInbox/KQNotify.pm
+++ b/lib/PublicInbox/KQNotify.pm
@@ -1,11 +1,10 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # implements the small subset of Linux::Inotify2 functionality we use
 # using IO::KQueue on *BSD systems.
 package PublicInbox::KQNotify;
-use strict;
-use v5.10.1;
+use v5.12;
 use IO::KQueue;
 use PublicInbox::DSKQXS; # wraps IO::KQueue for fork-safe DESTROY
 use PublicInbox::FakeInotify qw(fill_dirlist on_dir_change);
@@ -29,8 +28,7 @@ sub watch {
 			'PublicInbox::KQNotify::Watchdir';
 	} else {
 		open($fh, '<', $path) or return;
-		$watch = bless [ $fh, $path ],
-			'PublicInbox::KQNotify::Watch';
+		$watch = bless [ $fh, $path ], 'PublicInbox::KQNotify::Watch';
 	}
 	my $ident = fileno($fh);
 	$self->{dskq}->{kq}->EV_SET($ident, # ident (fd)
@@ -100,14 +98,14 @@ sub read {
 }
 
 package PublicInbox::KQNotify::Watch;
-use strict;
+use v5.12;
 
 sub name { $_[0]->[1] }
 
 sub cancel { close $_[0]->[0] or die "close: $!" }
 
 package PublicInbox::KQNotify::Watchdir;
-use strict;
+use v5.12;
 
 sub name { $_[0]->[1] }
 


* [PATCH 12/95] manifest: update module blurb + v5.12
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (10 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 11/95] switch inotify/kevent stuff to v5.12 Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 13/95] lei_mirror: simplify _get_txt_start callers Eric Wong
                   ` (82 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Helps steer new contributors (or forgetful old ones) in the
right direction.
---
 lib/PublicInbox/ManifestJsGz.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/ManifestJsGz.pm b/lib/PublicInbox/ManifestJsGz.pm
index d5048a96..1f739baa 100644
--- a/lib/PublicInbox/ManifestJsGz.pm
+++ b/lib/PublicInbox/ManifestJsGz.pm
@@ -1,10 +1,10 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
-# generates manifest.js.gz for grokmirror(1)
+# generates manifest.js.gz for grokmirror(1) via PublicInbox::WWW
+# This doesn't parse manifest.js.gz (that happens in LeiMirror)
 package PublicInbox::ManifestJsGz;
-use strict;
-use v5.10.1;
+use v5.12;
 use parent qw(PublicInbox::WwwListing);
 use PublicInbox::Config;
 use IO::Compress::Gzip qw(gzip);


* [PATCH 13/95] lei_mirror: simplify _get_txt_start callers
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (11 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 12/95] manifest: update module blurb + v5.12 Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 14/95] lei_mirror: elide description retrieval for v1|coderepo Eric Wong
                   ` (81 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can avoid needless select()-based sleeps by always
using TMPDIR for temporary files, and just slurping the
small config or description file.

This will make it easier to reuse the description from
the manifest in the next commit.
---
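Illustration only, not part of the patch: a minimal sketch of the
"temporary file in TMPDIR, then slurp" approach described above.
The template name and the file contents are made up, and the print
stands in for curl writing to $ft->filename.

```perl
use v5.12;
use File::Temp ();
use Fcntl qw(SEEK_SET);

# Create a scratch file under $TMPDIR (or /tmp); it is unlinked
# automatically when $ft goes out of scope.
my $ft = File::Temp->new(TEMPLATE => 'description-XXXX', TMPDIR => 1);

# Stand-in for curl writing the response body to $ft->filename
print $ft "an example inbox\n" or die "print: $!";
$ft->flush or die "flush: $!";

# Rewind and slurp the whole (small) file, remembering its mtime
seek($ft, 0, SEEK_SET) or die "seek: $!";
my $mtime = (stat($ft))[9];
my $txt = do { local $/; <$ft> };
print "mtime=$mtime txt=$txt";
```

Since nothing is written into the destination directory up front,
there is no need to wait for `git clone' to create it.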
 lib/PublicInbox/LeiMirror.pm | 87 +++++++++++++++++++-----------------
 1 file changed, 46 insertions(+), 41 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 0696c2d9..da8987be 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -106,45 +106,50 @@ sub ft_rename ($$$) {
 }
 
 sub _get_txt_start { # non-fatal
-	my ($self, $endpoint, $file, $mode) = @_;
+	my ($self, $endpoint, $fini) = @_;
 	my $uri = URI->new($self->{cur_src} // $self->{src});
 	my $lei = $self->{lei};
 	my $path = $uri->path;
 	chop($path) eq '/' or die "BUG: $uri not canonicalized";
 	$uri->path("$path/$endpoint");
-	my $dst = $self->{cur_dst} // $self->{dst};
-	my $ft = File::Temp->new(TEMPLATE => "$file-XXXX", DIR => $dst);
+	my $f = (split(m!/!, $endpoint))[-1];
+	my $ft = File::Temp->new(TEMPLATE => "$f-XXXX", TMPDIR => 1);
 	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
-	my $cmd = $self->{curl}->for_uri($lei, $uri,
-					qw(--compressed -R -o), $ft->filename);
-	$self->{-get_txt} = [ $ft, $cmd, $uri, $file, $mode ];
+	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(--compressed -R -o),
+					$ft->filename);
+	$self->{"-get_txt.$endpoint"} = [ $ft, $cmd, $uri ];
 	$lei->qerr("# @$cmd");
-	spawn($cmd, undef, $opt);
+	my $jobs = $lei->{opt}->{jobs} // 1;
+	reap_live() while keys(%LIVE) >= $jobs;
+	$LIVE{spawn($cmd, undef, $opt)} =
+			[ \&_get_txt_done, $self, $endpoint, $fini ];
 }
 
 sub _get_txt_done { # returns true on error (non-fatal), undef on success
-	my ($self) = @_;
-	my ($ft, $cmd, $uri, $file, $mode) = @{delete $self->{-get_txt}};
+	my ($self, $endpoint) = @_;
+	my ($fh, $cmd, $uri) = @{delete $self->{"-get_txt.$endpoint"}};
 	my $cerr = $?;
-	$? = 0;
+	$? = 0; # don't influence normal lei exit
 	return warn("$uri missing\n") if ($cerr >> 8) == 22;
 	return warn("# @$cmd failed (non-fatal)\n") if $cerr;
-	my $dst = $self->{cur_dst} // $self->{dst};
-	ft_rename($ft, "$dst/$file", $mode);
+	seek($fh, 0, SEEK_SET) or die "seek: $!";
+	$self->{"mtime.$endpoint"} = (stat($fh))[9];
+	local $/;
+	$self->{"txt.$endpoint"} = <$fh>;
 	undef; # success
 }
 
-# tries the relatively new /$INBOX/_/text/config/raw endpoint
-sub _try_config_start {
+sub _write_inbox_config {
 	my ($self) = @_;
-	_get_txt_start($self, qw(_/text/config/raw inbox.config.example), 0444);
-}
-
-sub _try_config_done {
-	my ($self) = @_;
-	_get_txt_done($self) and return;
+	my $buf = delete($self->{'txt._/text/config/raw'}) // return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $f = "$dst/inbox.config.example";
+	open my $fh, '>', $f or die "open($f): $!";
+	print $fh $buf or die "print: $!";
+	chmod(0444 & ~umask, $fh) or die "chmod($f): $!";
+	my $mtime = delete $self->{'mtime._/text/config/raw'} // die 'BUG: no mtime';
+	$fh->flush or die "flush($f): $!";
+	utime($mtime, $mtime, $fh) or die "utime($f): $!";
 	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
 	my $ibx = $self->{ibx} = {};
 	for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {
@@ -160,15 +165,18 @@ sub set_description ($) {
 	my $f = "$dst/description";
 	open my $fh, '+>>', $f or die "open($f): $!";
 	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
-	chomp(my $d = do { local $/; <$fh> } // die "read($f): $!");
-	if ($d eq '($INBOX_DIR/description missing)' ||
-			$d =~ /^Unnamed repository/ || $d !~ /\S/) {
-		seek($fh, 0, SEEK_SET) or die "seek($f): $!";
-		truncate($fh, 0) or die "truncate($f): $!";
-		my $src = $self->{cur_src} // $self->{src};
-		print $fh "mirror of $src\n" or die "print($f): $!";
-		close $fh or die "close($f): $!";
+	my $d = do { local $/; <$fh> } // die "read($f): $!";
+	my $orig = $d;
+	while (defined($d) && ($d =~ m!^\(\$INBOX_DIR/description missing\)! ||
+			$d =~ /^Unnamed repository/ || $d !~ /\S/)) {
+		$d = delete($self->{'txt.description'});
 	}
+	$d //= 'mirror of '.($self->{cur_src} // $self->{src})."\n";
+	return if $d eq $orig;
+	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
+	truncate($fh, 0) or die "truncate($f): $!";
+	print $fh $d or die "print($f): $!";
+	close $fh or die "close($f): $!";
 }
 
 sub index_cloned_inbox {
@@ -229,12 +237,8 @@ sub clone_v1 {
 	$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone, $lei, $cmd, $fini ];
 	reap_live() while keys(%LIVE) >= $jobs;
 
-	# wait for `git clone' to mkdir $dst (TODO: inotify/kevent?)
-	select(undef, undef, undef, 0.011) until -d $dst;
-	$LIVE{_try_config_start($self)} = [ \&_try_config_done, $self, $fini ];
-	reap_live() while keys(%LIVE) >= $jobs;
-	$LIVE{_get_txt_start($self, qw(description description), 0666)} =
-			[ \&_get_txt_done, $self, $fini ];
+	_get_txt_start($self, '_/text/config/raw', $fini);
+	_get_txt_start($self, 'description', $fini);
 	reap_live() until ($nohang || !keys(%LIVE));
 }
 
@@ -310,13 +314,14 @@ sub reap_clone { # async, called via SIGCHLD
 
 sub v1_done { # called via OnDestroy
 	my ($self) = @_;
-	my $dst = $self->{cur_dst} // $self->{dst};
-	write_makefile($dst, 1);
+	_write_inbox_config($self);
+	write_makefile($self->{cur_dst} // $self->{dst}, 1);
 	index_cloned_inbox($self, 1);
 }
 
 sub v2_done { # called via OnDestroy
 	my ($self) = @_;
+	_write_inbox_config($self);
 	require PublicInbox::MultiGit;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
@@ -378,13 +383,13 @@ failed to extract epoch number from $src
 	}
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
-	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
-	$LIVE{_try_config_start($task)} = [ \&_try_config_done, $task, $fini ];
-	reap_live() while keys(%LIVE) >= $jobs;
-	$LIVE{_get_txt_start($self, qw(description description), 0666)} =
-				[ \&_get_txt_done, $self, $fini ];
+
+	_get_txt_start($task, '_/text/config/raw', $fini);
+	_get_txt_start($self, 'description', $fini);
+
 	$task->{-locked} = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
+	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	do {
 		reap_live() while keys(%LIVE) >= $jobs;
 		while (keys(%LIVE) < $jobs && @src_edst &&


* [PATCH 14/95] lei_mirror: elide description retrieval for v1|coderepo
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (12 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 13/95] lei_mirror: simplify _get_txt_start callers Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 15/95] lei_mirror: add a hint for skipped epoch permissions Eric Wong
                   ` (80 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

manifest.js.gz can provide the description without an extra
HTTP(S) request, so attempt to use it whenever we're using
the manifest.
---
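Illustration only, not part of the patch: a rough sketch of
preferring the manifest's description and only falling back to a
request when it is absent.  description_for(), %ent and $fetch are
hypothetical names; $fetch stands in for the extra HTTP(S) request.

```perl
use v5.12;

# Hypothetical manifest entry, shaped like a decoded manifest.js.gz value
my %ent = (description => "example list\n", owner => 'nobody');

# Returns the description from the manifest entry when present;
# otherwise calls $fetch (a stand-in for the extra HTTP request).
sub description_for {
	my ($ent, $fetch) = @_;
	my $d = $ent ? $ent->{description} : undef;
	$d // $fetch->();
}

my $requests = 0;
my $fetch = sub { $requests++; "fetched description\n" };
my $from_manifest = description_for(\%ent, $fetch); # no request made
my $from_http = description_for(undef, $fetch); # non-manifest clone
```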
 lib/PublicInbox/LeiMirror.pm | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index da8987be..150b4296 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -147,9 +147,11 @@ sub _write_inbox_config {
 	open my $fh, '>', $f or die "open($f): $!";
 	print $fh $buf or die "print: $!";
 	chmod(0444 & ~umask, $fh) or die "chmod($f): $!";
-	my $mtime = delete $self->{'mtime._/text/config/raw'} // die 'BUG: no mtime';
+	my $mtime = delete $self->{'mtime._/text/config/raw'};
 	$fh->flush or die "flush($f): $!";
-	utime($mtime, $mtime, $fh) or die "utime($f): $!";
+	if (defined $mtime) {
+		utime($mtime, $mtime, $fh) or die "utime($f): $!";
+	}
 	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
 	my $ibx = $self->{ibx} = {};
 	for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {
@@ -238,8 +240,11 @@ sub clone_v1 {
 	reap_live() while keys(%LIVE) >= $jobs;
 
 	_get_txt_start($self, '_/text/config/raw', $fini);
-	_get_txt_start($self, 'description', $fini);
-	reap_live() until ($nohang || !keys(%LIVE));
+	my $d = $self->{-ent} ? $self->{-ent}->{description} : undef;
+	defined($d) ? ($self->{'txt.description'} = $d) :
+		_get_txt_start($self, 'description', $fini);
+
+	reap_live() until ($nohang || !keys(%LIVE)); # for non-manifest clone
 }
 
 sub parse_epochs ($$) {
@@ -538,6 +543,8 @@ EOM
 			return if $self->{lei}->{child_error};
 
 			my $task = bless { %$self }, __PACKAGE__;
+			$task->{-ent} = $m->{$name} //
+					die("BUG: no `$name' in manifest");
 			$task->{cur_src} = "$uri";
 			$task->{cur_dst} = $task->{dst};
 			if ($n > 1) {


* [PATCH 15/95] lei_mirror: add a hint for skipped epoch permissions
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (13 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 14/95] lei_mirror: elide description retrieval for v1|coderepo Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 16/95] lei_mirror: consolidate clone process management Eric Wong
                   ` (79 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Some users may think it's a git-specific thing to enable
writability, rather than a *nix permissions thing.  Clarify that
it's a standard *nix thing.
---
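Illustration only, not part of the patch: what the hint amounts to
when done from Perl, assuming a throwaway directory in place of a
real epoch.  This adds only the owner write bit (roughly
`chmod u+w'); the path is made up.

```perl
use v5.12;
use File::Temp ();

my $tmp = File::Temp->newdir; # scratch area, removed on exit
my $edst = "$tmp/0.git"; # hypothetical epoch directory
mkdir($edst) or die "mkdir($edst): $!";

# make it read-only, as a skipped epoch would be
chmod(0555, $edst) or die "chmod($edst): $!";

# add the owner write bit while keeping the other permission bits
my @st = stat($edst) or die "stat($edst): $!";
chmod(($st[2] & 07777) | 0200, $edst) or die "chmod($edst): $!";
my $mode = (stat($edst))[2] & 07777;
printf("%04o\n", $mode);
```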
 lib/PublicInbox/LeiMirror.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 150b4296..eaffb8fa 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -303,6 +303,7 @@ sub init_placeholder ($$) {
 
 ; This git epoch was created read-only and "public-inbox-fetch"
 ; will not fetch updates for it unless write permission is added.
+; Hint: chmod +w $edst
 EOM
 	close $fh or die "close:($f): $!";
 }


* [PATCH 16/95] lei_mirror: consolidate clone process management
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (14 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 15/95] lei_mirror: add a hint for skipped epoch permissions Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 17/95] lei_mirror: load File::Path unconditionally Eric Wong
                   ` (78 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This simplifies our code by having fewer places check process
limits and perform reaping.  We'll also print command names
immediately before executing, instead of right before waiting
for running processes.
---
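Illustration only, not part of the patch: a standalone sketch of
keeping the job limit check in one place.  It uses plain fork and
waitpid instead of PublicInbox::Spawn; start_job() and reap_one()
are hypothetical names, and the child merely sleeps briefly.

```perl
use v5.12;

my %LIVE; # pid => callback, as in the patch
my $max_live = 0; # high-water mark, only for demonstration

# Block until one child exits, then run its callback
sub reap_one {
	my $pid = waitpid(-1, 0);
	return if $pid <= 0;
	my $cb = delete $LIVE{$pid} or return;
	$cb->($pid, $?);
}

# The single place which enforces the job limit before forking
sub start_job {
	my ($jobs, $code, $cb) = @_;
	reap_one() while keys(%LIVE) >= $jobs;
	my $pid = fork // die "fork: $!";
	if ($pid == 0) { $code->(); exit(0) }
	$LIVE{$pid} = $cb;
	my $n = keys %LIVE;
	$max_live = $n if $n > $max_live;
}

my @done;
for (1..5) {
	start_job(2, sub { select(undef, undef, undef, 0.01) },
		sub { push @done, $_[0] });
}
reap_one() while keys %LIVE;
```

Callers no longer need their own `reap_live() while ...' loops;
they just call the start function and the limit is applied there.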
 lib/PublicInbox/LeiMirror.pm | 37 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index eaffb8fa..8796951d 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -118,9 +118,9 @@ sub _get_txt_start { # non-fatal
 	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(--compressed -R -o),
 					$ft->filename);
 	$self->{"-get_txt.$endpoint"} = [ $ft, $cmd, $uri ];
-	$lei->qerr("# @$cmd");
 	my $jobs = $lei->{opt}->{jobs} // 1;
 	reap_live() while keys(%LIVE) >= $jobs;
+	$lei->qerr("# @$cmd");
 	$LIVE{spawn($cmd, undef, $opt)} =
 			[ \&_get_txt_done, $self, $endpoint, $fini ];
 }
@@ -223,6 +223,14 @@ sub run_reap {
 	$ret;
 }
 
+sub start_clone {
+	my ($self, $cmd, $opt, $fini) = @_;
+	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
+	reap_live() while keys(%LIVE) >= $jobs;
+	$self->{lei}->qerr("# @$cmd");
+	$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone, $self, $cmd, $fini ];
+}
+
 sub clone_v1 {
 	my ($self, $nohang) = @_;
 	my $lei = $self->{lei};
@@ -233,11 +241,8 @@ sub clone_v1 {
 	my $pfx = $curl->torsocks($lei, $uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
-	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}), "$uri", $dst ];
-	$lei->qerr("# @$cmd");
-	$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone, $lei, $cmd, $fini ];
-	reap_live() while keys(%LIVE) >= $jobs;
+	start_clone($self, $cmd, $opt, $fini);
 
 	_get_txt_start($self, '_/text/config/raw', $fini);
 	my $d = $self->{-ent} ? $self->{-ent}->{description} : undef;
@@ -309,12 +314,12 @@ EOM
 }
 
 sub reap_clone { # async, called via SIGCHLD
-	my ($lei, $cmd) = @_;
+	my ($self, $cmd) = @_;
 	my $cerr = $?;
 	$? = 0; # don't let it influence normal exit
 	if ($cerr) {
 		kill('TERM', keys %LIVE);
-		$lei->child_error($cerr, "@$cmd failed");
+		$self->{lei}->child_error($cerr, "@$cmd failed");
 	}
 }
 
@@ -395,17 +400,10 @@ failed to extract epoch number from $src
 
 	$task->{-locked} = $lk->lock_for_scope($$);
 	my @cmd = clone_cmd($lei, my $opt = {});
-	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
-	do {
-		reap_live() while keys(%LIVE) >= $jobs;
-		while (keys(%LIVE) < $jobs && @src_edst &&
-				!$lei->{child_error}) {
-			my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
-			$lei->qerr("# @$cmd");
-			$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone,
-							$lei, $cmd, $fini ];
-		}
-	} while (@src_edst && !$lei->{child_error});
+	while (@src_edst && !$lei->{child_error}) {
+		my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
+		start_clone($self, $cmd, $opt, $fini);
+	}
 }
 
 sub decode_manifest ($$$) {
@@ -508,7 +506,6 @@ sub try_manifest {
 	}
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
-	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	if (my $v2 = delete $multi->{v2}) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};
@@ -531,7 +528,6 @@ sub try_manifest {
 E: `$self->{cur_dst}' must not contain newline
 EOM
 			clone_v2($self, \%v2_epochs, $m);
-			reap_live() while keys(%LIVE) >= $jobs;
 			return if $self->{lei}->{child_error};
 		}
 	}
@@ -540,7 +536,6 @@ EOM
 		chop($p) if substr($p, -1, 1) eq '/';
 		$uri->path($p);
 		for my $name (@$v1) {
-			reap_live() while keys(%LIVE) >= $jobs;
 			return if $self->{lei}->{child_error};
 
 			my $task = bless { %$self }, __PACKAGE__;


* [PATCH 17/95] lei_mirror: load File::Path unconditionally
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (15 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 16/95] lei_mirror: consolidate clone process management Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 18/95] lei_mirror: load most modules up-front Eric Wong
                   ` (77 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

File::Temp already uses it, so there's no sense in conditionally
require-ing it to save startup time.
---
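Illustration only, not part of the patch: the one-line idiom this
change switches to, run against a scratch directory.  The "a/b/c"
path is made up.

```perl
use v5.12;
use File::Path ();
use File::Temp ();

my $tmp = File::Temp->newdir; # scratch directory, removed on exit
my $dst = "$tmp/a/b/c";

# mkpath creates missing parents and croaks on failure
-d $dst || File::Path::mkpath($dst);
my $first = -d $dst;

# with the directory already present, mkpath is not called again
-d $dst || File::Path::mkpath($dst);
```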
 lib/PublicInbox/LeiMirror.pm | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 8796951d..1ca603b3 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -10,6 +10,7 @@ use PublicInbox::Config;
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
 use IO::Compress::Gzip qw(gzip $GzipError);
 use PublicInbox::Spawn qw(popen_rd spawn);
+use File::Path ();
 use File::Temp ();
 use Fcntl qw(SEEK_SET O_CREAT O_EXCL O_WRONLY);
 use Carp qw(croak);
@@ -387,11 +388,7 @@ failed to extract epoch number from $src
 	# filter out the epochs we skipped
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 
-	if (!-d $dst || !mkdir($dst)) {
-		require File::Path;
-		File::Path::mkpath($dst);
-		-d $dst or die "mkpath($dst): $!\n";
-	}
+	-d $dst || File::Path::mkpath($dst);
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 
@@ -486,6 +483,7 @@ sub try_manifest {
 	$uri->path($path . '/manifest.js.gz');
 	my $pdir = $lei->rel2abs($self->{dst});
 	$pdir =~ s!/[^/]+/?\z!!;
+	-d $pdir || File::Path::mkpath($pdir);
 	my $ft = File::Temp->new(TEMPLATE => 'm-XXXX',
 				UNLINK => 1, DIR => $pdir, SUFFIX => '.tmp');
 	my $fn = $ft->filename;


* [PATCH 18/95] lei_mirror: load most modules up-front
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (16 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 17/95] lei_mirror: load File::Path unconditionally Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 19/95] lei_mirror: set gitweb.owner from manifest Eric Wong
                   ` (76 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

lei loads LeiMirror itself lazily, anyways, and it only
supports HTTP(S) mirrors, so there's no point in delaying most
of the modules it loads.  Some of the inbox-specific and
v2-specific stuff can be lazy-loaded, however, since this
will support mirroring non-inbox repositories, too.
---
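Illustration only, not part of the patch: a small sketch of the
lazy-loading idiom that remains for code paths not every caller
needs.  Digest::SHA is just a stand-in for an inbox-specific
module, and sha_of() is a hypothetical helper.

```perl
use v5.12;

# Only needed for one code path, so defer loading until it is used
sub sha_of {
	my ($buf) = @_;
	require Digest::SHA; # does nothing if already loaded
	Digest::SHA::sha1_hex($buf);
}

my $hex = sha_of('hello');
my $loaded_after = exists $INC{'Digest/SHA.pm'};
```

A `use' line at the top of the file would instead load the module
at compile time, whether or not that code path ever runs.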
 lib/PublicInbox/LeiMirror.pm | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 1ca603b3..d0bc7384 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -6,7 +6,6 @@ package PublicInbox::LeiMirror;
 use strict;
 use v5.10.1;
 use parent qw(PublicInbox::IPC);
-use PublicInbox::Config;
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
 use IO::Compress::Gzip qw(gzip $GzipError);
 use PublicInbox::Spawn qw(popen_rd spawn);
@@ -14,7 +13,13 @@ use File::Path ();
 use File::Temp ();
 use Fcntl qw(SEEK_SET O_CREAT O_EXCL O_WRONLY);
 use Carp qw(croak);
-our %LIVE;
+use URI;
+use PublicInbox::Config;
+use PublicInbox::Inbox;
+use PublicInbox::LeiCurl;
+use PublicInbox::OnDestroy;
+
+our %LIVE; # pid => callback
 
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
@@ -191,6 +196,8 @@ sub index_cloned_inbox {
 	# n.b. public-inbox-clone works w/o (SQLite || Xapian)
 	# lei is useless without Xapian + SQLite
 	if ($lei->{cmd} ne 'public-inbox-clone') {
+		require PublicInbox::InboxWritable;
+		require PublicInbox::Admin;
 		my $ibx = delete($self->{ibx}) // {
 			address => [ 'lei@example.com' ],
 			version => $iv,
@@ -389,6 +396,7 @@ failed to extract epoch number from $src
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 
 	-d $dst || File::Path::mkpath($dst);
+	require PublicInbox::Lock;
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 
@@ -595,14 +603,6 @@ sub do_mirror { # via wq_io_do
 sub start {
 	my ($cls, $lei, $src, $dst) = @_;
 	my $self = bless { src => $src, dst => $dst }, $cls;
-	if ($src =~ m!https?://!) {
-		require URI;
-		require PublicInbox::LeiCurl;
-	}
-	require PublicInbox::Lock;
-	require PublicInbox::Inbox;
-	require PublicInbox::Admin;
-	require PublicInbox::InboxWritable;
 	$lei->request_umask;
 	my ($op_c, $ops) = $lei->workers_start($self, 1);
 	$lei->{wq1} = $self;


* [PATCH 19/95] lei_mirror: set gitweb.owner from manifest
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (17 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 18/95] lei_mirror: load most modules up-front Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 20/95] clone: support --dry-run / -n flag Eric Wong
                   ` (75 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This is mainly for coderepos, but sometimes public-inboxes
get shared via cgit/gitweb, too.
---
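Illustration only, not part of the patch: a sketch of queueing
(destination, owner) pairs as a flat list and consuming them two
at a time.  The manifest keys, owners and /tmp/mirror prefix are
made up, and the git commands are only built, not run.

```perl
use v5.12;

# Hypothetical manifest entries; only some carry an "owner"
my %manifest = (
	'/example/git/0.git' => { owner => 'Alice' },
	'/example/git/1.git' => {},
	'/example/git/2.git' => { owner => 'Bob' },
);

# Queue (destination, owner) pairs as a flat list
my @edst_owner;
for my $key (sort keys %manifest) {
	my $o = $manifest{$key}->{owner};
	push(@edst_owner, "/tmp/mirror$key", $o) if defined($o);
}

# Consume two elements at a time, building one command per pair
my @cmds;
while (@edst_owner) {
	my ($edst, $o) = splice(@edst_owner, 0, 2);
	push @cmds, [qw(git config -f), "$edst/config", 'gitweb.owner', $o];
}
say "@$_" for @cmds;
```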
 lib/PublicInbox/LeiMirror.pm | 28 +++++++++++++++++++++++-----
 t/www_listing.t              |  2 ++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index d0bc7384..5e1b1c64 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -8,7 +8,7 @@ use v5.10.1;
 use parent qw(PublicInbox::IPC);
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
 use IO::Compress::Gzip qw(gzip $GzipError);
-use PublicInbox::Spawn qw(popen_rd spawn);
+use PublicInbox::Spawn qw(popen_rd spawn run_die);
 use File::Path ();
 use File::Temp ();
 use Fcntl qw(SEEK_SET O_CREAT O_EXCL O_WRONLY);
@@ -303,8 +303,8 @@ EOM
 	$want
 }
 
-sub init_placeholder ($$) {
-	my ($src, $edst) = @_;
+sub init_placeholder ($$$) {
+	my ($src, $edst, $owner) = @_;
 	PublicInbox::Import::init_bare($edst);
 	my $f = "$edst/config";
 	open my $fh, '>>', $f or die "open($f): $!";
@@ -318,6 +318,12 @@ sub init_placeholder ($$) {
 ; will not fetch updates for it unless write permission is added.
 ; Hint: chmod +w $edst
 EOM
+	if (defined($owner)) {
+		print $fh <<EOM or die "print($f): $!";
+[gitweb]
+	owner = $owner
+EOM
+	}
 	close $fh or die "close:($f): $!";
 }
 
@@ -334,7 +340,11 @@ sub reap_clone { # async, called via SIGCHLD
 sub v1_done { # called via OnDestroy
 	my ($self) = @_;
 	_write_inbox_config($self);
-	write_makefile($self->{cur_dst} // $self->{dst}, 1);
+	my $dst = $self->{cur_dst} // $self->{dst};
+	if (defined(my $o = $self->{-ent} ? $self->{-ent}->{owner} : undef)) {
+		run_die([qw(git config -f), "$dst/config", 'gitweb.owner', $o]);
+	}
+	write_makefile($dst, 1);
 	index_cloned_inbox($self, 1);
 }
 
@@ -346,6 +356,11 @@ sub v2_done { # called via OnDestroy
 	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
 	$mg->fill_alternates;
 	for my $i ($mg->git_epochs) { $mg->epoch_cfg_set($i) }
+	my $edst_owner = delete($self->{-owner}) // [];
+	while (@$edst_owner) {
+		my ($edst, $o) = splice(@$edst_owner, 0, 2);
+		run_die [qw(git config -f), "$edst/config", 'gitweb.owner', $o];
+	}
 	for my $edst (@{delete($self->{-read_only}) // []}) {
 		my @st = stat($edst) or die "stat($edst): $!";
 		chmod($st[2] & 0555, $edst) or die "chmod(a-w, $edst): $!";
@@ -384,10 +399,13 @@ failed to extract epoch number from $src
 
 		$1 + 0 == $nr or die "BUG: <$uri> miskeyed $1 != $nr";
 		$edst .= "/git/$nr.git";
+		$m->{$key} // die "BUG: `$key' not in manifest.js.gz";
 		if (!$want || $want->{$nr}) {
 			push @src_edst, $src, $edst;
+			my $o = $m->{$key}->{owner};
+			push(@{$task->{-owner}}, $edst, $o) if defined($o);
 		} else { # create a placeholder so users only need to chmod +w
-			init_placeholder($src, $edst);
+			init_placeholder($src, $edst, $m->{$key}->{owner});
 			push @{$task->{-read_only}}, $edst;
 			push @skip, $key;
 		}
diff --git a/t/www_listing.t b/t/www_listing.t
index e6bb1bda..c13d8f90 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -150,6 +150,8 @@ EOM
 		undef, $opt), 'clone w/include') or diag "clone_err=$clone_err";
 	ok(-d "$tmpdir/incl/alt", 'alt cloned');
 	ok(!-d "$tmpdir/incl/v2" && !-d "$tmpdir/incl/bare", 'only alt cloned');
+	is(xqx([qw(git config -f), "$tmpdir/incl/alt/config", 'gitweb.owner']),
+		"lorelei \xc4\x80\n", 'gitweb.owner set by -clone');
 
 	undef $td;
 


* [PATCH 20/95] clone: support --dry-run / -n flag
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (18 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 19/95] lei_mirror: set gitweb.owner from manifest Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 21/95] lei_mirror: initialize placeholders with "head" from manifest Eric Wong
                   ` (74 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

It still makes HTTP(S) requests to retrieve the manifest or
scrape HTML, but doesn't make permanent changes to the FS
(aside from modifying {acm}time of ${TMPDIR-/tmp}).
---
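Illustration only, not part of the patch: a minimal sketch of the
"announce, then return early" shape used for dry runs.  run_cmd()
is a hypothetical helper built on system(); the patch itself uses
spawn and a {dry_run} field.

```perl
use v5.12;

# Announce a command, and run it unless $dry_run is set.
# Returns the wait status, or undef when nothing was run.
sub run_cmd {
	my ($cmd, $dry_run) = @_;
	say STDERR "# @$cmd";
	return if $dry_run;
	system(@$cmd);
}

my $skipped = run_cmd([$^X, '-e', 'exit 3'], 1); # dry run: not executed
my $status = run_cmd([$^X, '-e', 'exit 0'], 0); # really runs perl
```

Printing before the check means a dry run still shows every
command that would have been started.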
 Documentation/public-inbox-clone.pod |  6 +++++
 lib/PublicInbox/LeiMirror.pm         | 35 ++++++++++++++--------------
 script/public-inbox-clone            |  6 ++++-
 t/www_listing.t                      |  6 +++++
 4 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 7e95146e..178d952a 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -65,6 +65,12 @@ When cloning a top-level with multiple inboxes, ignore inboxes and
 repositories matching the given wildcard pattern.  Supports the same
 wildcards as L</--include>
 
+=item -n
+
+=item --dry-run
+
+Show what would be done, without making any changes.
+
 =item -q
 
 =item --quiet
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 5e1b1c64..d955ac3b 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -27,10 +27,10 @@ sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my $f = "$mrr->{dst}/mirror.done";
 	if ($?) {
 		$lei->child_error($?);
-	} elsif (!unlink($f)) {
+	} elsif (!$mrr->{dry_run} && !unlink($f)) {
 		warn("unlink($f): $!\n") unless $!{ENOENT};
 	} else {
-		if ($lei->{cmd} ne 'public-inbox-clone') {
+		if (!$mrr->{dry_run} && $lei->{cmd} ne 'public-inbox-clone') {
 			# calls _finish_add_external
 			$lei->lazy_cb('add-external', '_finish_'
 					)->($lei, $mrr->{dst});
@@ -107,7 +107,8 @@ sub ft_rename ($$$) {
 	my @st = stat($dst);
 	my $mode = @st ? ($st[2] & 07777) : ($open_mode & ~umask);
 	chmod($mode, $ft) or croak "E: chmod $fn: $!";
-	rename($fn, $dst) or croak "E: rename($fn => $ft): $!";
+	require File::Copy;
+	File::Copy::mv($fn, $dst) or croak "E: mv($fn => $dst): $!";
 	$ft->unlink_on_destroy(0);
 }
 
@@ -123,10 +124,11 @@ sub _get_txt_start { # non-fatal
 	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
 	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(--compressed -R -o),
 					$ft->filename);
-	$self->{"-get_txt.$endpoint"} = [ $ft, $cmd, $uri ];
 	my $jobs = $lei->{opt}->{jobs} // 1;
 	reap_live() while keys(%LIVE) >= $jobs;
 	$lei->qerr("# @$cmd");
+	return if $self->{dry_run};
+	$self->{"-get_txt.$endpoint"} = [ $ft, $cmd, $uri ];
 	$LIVE{spawn($cmd, undef, $opt)} =
 			[ \&_get_txt_done, $self, $endpoint, $fini ];
 }
@@ -236,6 +238,7 @@ sub start_clone {
 	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
 	reap_live() while keys(%LIVE) >= $jobs;
 	$self->{lei}->qerr("# @$cmd");
+	return if $self->{dry_run};
 	$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone, $self, $cmd, $fini ];
 }
 
@@ -339,6 +342,7 @@ sub reap_clone { # async, called via SIGCHLD
 
 sub v1_done { # called via OnDestroy
 	my ($self) = @_;
+	return if $self->{dry_run};
 	_write_inbox_config($self);
 	my $dst = $self->{cur_dst} // $self->{dst};
 	if (defined(my $o = $self->{-ent} ? $self->{-ent}->{owner} : undef)) {
@@ -350,6 +354,7 @@ sub v1_done { # called via OnDestroy
 
 sub v2_done { # called via OnDestroy
 	my ($self) = @_;
+	return if $self->{dry_run};
 	_write_inbox_config($self);
 	require PublicInbox::MultiGit;
 	my $dst = $self->{cur_dst} // $self->{dst};
@@ -413,7 +418,8 @@ failed to extract epoch number from $src
 	# filter out the epochs we skipped
 	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
 
-	-d $dst || File::Path::mkpath($dst);
+	(!$self->{dry_run} && !-d $dst) and File::Path::mkpath($dst);
+
 	require PublicInbox::Lock;
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
@@ -421,7 +427,7 @@ failed to extract epoch number from $src
 	_get_txt_start($task, '_/text/config/raw', $fini);
 	_get_txt_start($self, 'description', $fini);
 
-	$task->{-locked} = $lk->lock_for_scope($$);
+	$task->{-locked} = $lk->lock_for_scope($$) if !$self->{dry_run};
 	my @cmd = clone_cmd($lei, my $opt = {});
 	while (@src_edst && !$lei->{child_error}) {
 		my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
@@ -507,17 +513,12 @@ sub try_manifest {
 	my $path = $uri->path;
 	chop($path) eq '/' or die "BUG: $uri not canonicalized";
 	$uri->path($path . '/manifest.js.gz');
-	my $pdir = $lei->rel2abs($self->{dst});
-	$pdir =~ s!/[^/]+/?\z!!;
-	-d $pdir || File::Path::mkpath($pdir);
-	my $ft = File::Temp->new(TEMPLATE => 'm-XXXX',
-				UNLINK => 1, DIR => $pdir, SUFFIX => '.tmp');
+	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX',
+				UNLINK => 1, TMPDIR => 1, SUFFIX => '.tmp');
 	my $fn = $ft->filename;
-	my ($bn) = ($fn =~ m!/([^/]+)\z!);
-	my $cmd = $curl->for_uri($lei, $uri, '-R', '-o', $bn);
-	my $opt = { -C => $pdir };
-	$opt->{$_} = $lei->{$_} for (0..2);
-	my $cerr = run_reap($lei, $cmd, $opt);
+	my $cmd = $curl->for_uri($lei, $uri, '-R', '-o', $fn);
+	my %opt = map { $_ => $lei->{$_} } (0..2);
+	my $cerr = run_reap($lei, $cmd, \%opt);
 	local %LIVE;
 	if ($cerr) {
 		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
@@ -579,7 +580,7 @@ EOM
 		}
 	}
 	reap_live() while keys(%LIVE);
-	return if $self->{lei}->{child_error};
+	return if $self->{lei}->{child_error} || $self->{dry_run};
 
 	if (delete $self->{-culled_manifest}) { # set by clone_v2/-I/--exclude
 		# write the smaller manifest if epochs were skipped so
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index ce4697f3..22ffc0fc 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -17,12 +17,13 @@ options:
   --torsocks VAL      whether or not to wrap git and curl commands with
                       torsocks (default: `auto')
                       Must be one of: `auto', `no' or `yes'
+  --dry-run | -n      show what would be cloned without cloning
   --verbose | -v      increase verbosity (may be repeated)
     --quiet | -q      increase verbosity (may be repeated)
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
-		jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
+	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config
 PublicInbox::Admin::do_chdir(delete $opt->{C});
@@ -54,6 +55,9 @@ my $mrr = bless {
 	src => $url,
 	dst => $dst,
 }, 'PublicInbox::LeiMirror';
+
+$? = 0;
+$mrr->{dry_run} = 1 if $lei->{opt}->{'dry-run'};
 $mrr->do_mirror;
 $mrr->can('_wq_done_wait')->([$mrr, $lei], $$);
 exit(($lei->{child_error} // 0) >> 8);
diff --git a/t/www_listing.t b/t/www_listing.t
index c13d8f90..45287c7d 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -153,6 +153,12 @@ EOM
 	is(xqx([qw(git config -f), "$tmpdir/incl/alt/config", 'gitweb.owner']),
 		"lorelei \xc4\x80\n", 'gitweb.owner set by -clone');
 
+	$clone_err = '';
+	ok(run_script(['-clone', '--dry-run',
+			"http://$host:$port/pfx", "$tmpdir/dry-run" ],
+		undef, $opt), 'clone --dry-run') or diag "clone_err=$clone_err";
+	ok(!-d "$tmpdir/dry-run", 'nothing cloned with --dry-run');
+
 	undef $td;
 
 	open $mh, '<', "$tmpdir/incl/manifest.js.gz" or xbail "open: $!";

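The dry-run guard used throughout the patch above can be sketched in
isolation: log the command first, then return before anything is
spawned.  This is only a minimal sketch; `run_or_show' and the `$log'
array are hypothetical names for illustration and are not part of
LeiMirror.

```perl
use strict;
use warnings;

# Hypothetical helper: records the command line, then either returns
# early (dry run) or actually runs the command.
sub run_or_show {
	my ($self, $cmd, $log) = @_;
	push @$log, "# @$cmd"; # always show what would be done
	return if $self->{dry_run}; # no side effects beyond logging
	system(@$cmd) == 0 or die "@$cmd failed: \$?=$?\n";
	1;
}

my @log;
my $ran = run_or_show({ dry_run => 1 }, [ $^X, '-e', 'exit 0' ], \@log);
print "$_\n" for @log;
print defined($ran) ? "ran\n" : "skipped\n";
```

Logging before the guard keeps `--dry-run' output identical to what a
real run would print, which is why the `return' sits after the
qerr() calls in the patch.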

* [PATCH 21/95] lei_mirror: initialize placeholders with "head" from manifest
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (19 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 20/95] clone: support --dry-run / -n flag Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 22/95] lei_mirror: support {reference} for v1 manifest clones Eric Wong
                   ` (73 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This only affects v2 epochs, but ensures our bases are covered,
at least.  We'll have to update PublicInbox::Fetch later to
deal with "head" entries in manifest.js.gz, too.
---
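A rough illustration of what the placeholder change amounts to,
assuming the manifest "head" value is a symbolic ref line such as
`ref: refs/heads/main' (the entry below is made up, and `write_head'
is a hypothetical name, not a LeiMirror function):

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical manifest entry; "head" is assumed to be a symbolic ref.
my $ent = { head => 'ref: refs/heads/main', owner => 'nobody' };
my $edst = File::Temp->newdir; # stand-in for a bare placeholder repo

sub write_head {
	my ($edst, $ent) = @_;
	my $head = $ent->{head} // return; # keep git's default HEAD
	my $f = "$edst/HEAD";
	open my $fh, '>', $f or die "open($f): $!";
	print $fh $head, "\n" or die "print($f): $!";
	close $fh or die "close($f): $!";
	$f;
}

my $f = write_head($edst, $ent);
open my $fh, '<', $f or die "open($f): $!";
print <$fh>;
```

Entries without a "head" key leave whatever HEAD the repo
initialization wrote.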
 lib/PublicInbox/LeiMirror.pm | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index d955ac3b..e6b3e32a 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -307,7 +307,7 @@ EOM
 }
 
 sub init_placeholder ($$$) {
-	my ($src, $edst, $owner) = @_;
+	my ($src, $edst, $ent) = @_;
 	PublicInbox::Import::init_bare($edst);
 	my $f = "$edst/config";
 	open my $fh, '>>', $f or die "open($f): $!";
@@ -321,13 +321,19 @@ sub init_placeholder ($$$) {
 ; will not fetch updates for it unless write permission is added.
 ; Hint: chmod +w $edst
 EOM
-	if (defined($owner)) {
+	if (defined($ent->{owner})) {
 		print $fh <<EOM or die "print($f): $!";
 [gitweb]
-	owner = $owner
+	owner = $ent->{owner}
 EOM
 	}
-	close $fh or die "close:($f): $!";
+	close $fh or die "close($f): $!";
+	if (defined $ent->{head}) {
+		$f = "$edst/HEAD";
+		open $fh, '>', $f or die "open($f): $!";
+		print $fh $ent->{head}, "\n" or die "print($f): $!";
+		close $fh or die "close($f): $!";
+	}
 }
 
 sub reap_clone { # async, called via SIGCHLD
@@ -410,7 +416,7 @@ failed to extract epoch number from $src
 			my $o = $m->{$key}->{owner};
 			push(@{$task->{-owner}}, $edst, $o) if defined($o);
 		} else { # create a placeholder so users only need to chmod +w
-			init_placeholder($src, $edst, $m->{$key}->{owner});
+			init_placeholder($src, $edst, $m->{$key});
 			push @{$task->{-read_only}}, $edst;
 			push @skip, $key;
 		}


* [PATCH 22/95] lei_mirror: support {reference} for v1 manifest clones
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (20 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 21/95] lei_mirror: initialize placeholders with "head" from manifest Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 23/95] lei_mirror: reduce noise on interrupted clones Eric Wong
                   ` (72 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This will be generalized to v2, as well.
---
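The alternates rewrite in v1_done relies on File::Spec->abs2rel to
turn the absolute path written by `git clone --reference' into one
relative to the objects directory.  A minimal sketch, assuming a Unix
path layout and made-up directories under /srv/mirror:

```perl
use strict;
use warnings;
use File::Spec ();

# Hypothetical layout: both repos live under the same mirror root.
my $objdir = '/srv/mirror/b.git/objects'; # repo doing the borrowing
my $alt = '/srv/mirror/a.git/objects'; # absolute alternates entry

# abs2rel() works on path strings, so neither directory needs to exist
my $rel = File::Spec->abs2rel($alt, $objdir);
print "$rel\n";
```

git resolves relative entries in objects/info/alternates against the
objects directory itself, so the base passed to abs2rel is "$dst/objects"
rather than "$dst".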
 lib/PublicInbox/LeiMirror.pm | 48 +++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index e6b3e32a..b6e12b95 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -11,6 +11,7 @@ use IO::Compress::Gzip qw(gzip $GzipError);
 use PublicInbox::Spawn qw(popen_rd spawn run_die);
 use File::Path ();
 use File::Temp ();
+use File::Spec ();
 use Fcntl qw(SEEK_SET O_CREAT O_EXCL O_WRONLY);
 use Carp qw(croak);
 use URI;
@@ -253,6 +254,9 @@ sub clone_v1 {
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
 	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}), "$uri", $dst ];
+	my $ref = $self->{-ent} ? $self->{-ent}->{reference} : undef;
+	defined($ref) && -e "$self->{dst}$ref" and
+		push @$cmd, '--reference', "$self->{dst}$ref";
 	start_clone($self, $cmd, $opt, $fini);
 
 	_get_txt_start($self, '_/text/config/raw', $fini);
@@ -354,6 +358,17 @@ sub v1_done { # called via OnDestroy
 	if (defined(my $o = $self->{-ent} ? $self->{-ent}->{owner} : undef)) {
 		run_die([qw(git config -f), "$dst/config", 'gitweb.owner', $o]);
 	}
+	my $o = "$dst/objects";
+	if (open(my $fh, '<', "$o/info/alternates")) {
+		chomp(my @l = <$fh>);
+		for (@l) { $_ = File::Spec->abs2rel($_, $o)."\n" }
+		my $f = File::Temp->new(TEMPLATE => '.XXXX', DIR => "$o/info");
+		print $f @l;
+		$f->flush or die "flush($f): $!";
+		rename($f->filename, "$o/info/alternates") or
+			die "rename($f, $o/info/alternates): $!";
+		$f->unlink_on_destroy(0);
+	}
 	write_makefile($dst, 1);
 	index_cloned_inbox($self, 1);
 }
@@ -510,6 +525,30 @@ sub multi_inbox ($$$) {
 	($path_pfx, $n, $ret);
 }
 
+sub clone_all {
+	my ($self, $todo, $m) = @_;
+	# handle no-dependency repos, first
+	for (@{delete($todo->{''}) // []}) {
+		clone_v1($_, 1);
+		return if $self->{lei}->{child_error};
+	}
+	# resolve references, deepest, first:
+	while (scalar keys %$todo) {
+		for my $x (keys %$todo) {
+			# resolve multi-level references
+			while (defined($m->{$x}->{reference})) {
+				$x = $m->{$x}->{reference};
+			}
+			my $y = delete $todo->{$x} // next; # already done
+			for (@$y) {
+				clone_v1($_, 1);
+				return if $self->{lei}->{child_error};
+			}
+			last; # restart %$todo iteration
+		}
+	}
+}
+
 # FIXME: this gets confused by single inbox instance w/ global manifest.js.gz
 sub try_manifest {
 	my ($self) = @_;
@@ -566,9 +605,9 @@ EOM
 		my $p = $path_pfx.$path;
 		chop($p) if substr($p, -1, 1) eq '/';
 		$uri->path($p);
+		my $todo = {};
+		my %want = map { $_ => 1 } @$v1;
 		for my $name (@$v1) {
-			return if $self->{lei}->{child_error};
-
 			my $task = bless { %$self }, __PACKAGE__;
 			$task->{-ent} = $m->{$name} //
 					die("BUG: no `$name' in manifest");
@@ -582,8 +621,11 @@ EOM
 E: `$task->{cur_dst}' must not contain newline
 EOM
 			$task->{cur_src} .= '/';
-			clone_v1($task, 1);
+			my $dep = $task->{-ent}->{reference} // '';
+			$dep = '' if !$want{$dep};
+			push @{$todo->{$dep}}, $task;
 		}
+		clone_all($self, $todo, $m);
 	}
 	reap_live() while keys(%LIVE);
 	return if $self->{lei}->{child_error} || $self->{dry_run};


* [PATCH 23/95] lei_mirror: reduce noise on interrupted clones
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (21 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 22/95] lei_mirror: support {reference} for v1 manifest clones Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 24/95] clone: support --inbox-config option Eric Wong
                   ` (71 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We don't need git-config or other commands failing loudly.
`git clone' and subcommands it spawns may still spew, but it's no
worse than interrupting `git clone' itself, now.

We accomplish this by localizing $LIVE (formerly %LIVE), which is
auto-vivified into a hashref on first use, and detecting when that
hashref has gone out of scope by the time the OnDestroy callbacks
fire (i.e. during what would be the `DESTRUCT' ${^GLOBAL_PHASE}).

We can't check ${^GLOBAL_PHASE} directly, yet, since it first
appeared in Perl 5.14 and we're still slowly migrating to Perl 5.12
before going to 5.14.
---
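The scoping trick can be sketched without any child processes.
`finish' below is a hypothetical stand-in for v1_done/v2_done, and the
PID is made up:

```perl
use strict;
use warnings;

our $LIVE; # pid => callback, undef outside of a clone scope

# Hypothetical stand-in for v1_done/v2_done
sub finish { $LIVE ? 'finishing' : 'skipped' }

my @seen;
sub clone_scope {
	local $LIVE; # undef until first use
	$LIVE->{12345} = [ 'fake child' ]; # autovivifies a hashref
	push @seen, finish(); # inside the scope: true
	delete $LIVE->{12345};
	push @seen, finish(); # an empty hashref is still true
}

clone_scope();
push @seen, finish(); # scope left: $LIVE is undef again
print join(', ', @seen), "\n";
```

A plain `local %LIVE' could not make this distinction, since an empty
hash looks the same inside and outside of the scope.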
 lib/PublicInbox/LeiMirror.pm | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index b6e12b95..f81f6094 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -20,7 +20,7 @@ use PublicInbox::Inbox;
 use PublicInbox::LeiCurl;
 use PublicInbox::OnDestroy;
 
-our %LIVE; # pid => callback
+our $LIVE; # pid => callback
 
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
@@ -68,7 +68,7 @@ sub try_scrape {
 			$n => [ URI->new($_), '' ]
 		} @v2_urls; # uniq
 		clone_v2($self, \%v2_epochs);
-		reap_live() while keys(%LIVE);
+		reap_live() while keys(%$LIVE);
 		return;
 	}
 
@@ -126,11 +126,11 @@ sub _get_txt_start { # non-fatal
 	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(--compressed -R -o),
 					$ft->filename);
 	my $jobs = $lei->{opt}->{jobs} // 1;
-	reap_live() while keys(%LIVE) >= $jobs;
+	reap_live() while keys(%$LIVE) >= $jobs;
 	$lei->qerr("# @$cmd");
 	return if $self->{dry_run};
 	$self->{"-get_txt.$endpoint"} = [ $ft, $cmd, $uri ];
-	$LIVE{spawn($cmd, undef, $opt)} =
+	$LIVE->{spawn($cmd, undef, $opt)} =
 			[ \&_get_txt_done, $self, $endpoint, $fini ];
 }
 
@@ -237,10 +237,10 @@ sub run_reap {
 sub start_clone {
 	my ($self, $cmd, $opt, $fini) = @_;
 	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
-	reap_live() while keys(%LIVE) >= $jobs;
+	reap_live() while keys(%$LIVE) >= $jobs;
 	$self->{lei}->qerr("# @$cmd");
 	return if $self->{dry_run};
-	$LIVE{spawn($cmd, undef, $opt)} = [ \&reap_clone, $self, $cmd, $fini ];
+	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_clone, $self, $cmd, $fini ]
 }
 
 sub clone_v1 {
@@ -264,7 +264,7 @@ sub clone_v1 {
 	defined($d) ? ($self->{'txt.description'} = $d) :
 		_get_txt_start($self, 'description', $fini);
 
-	reap_live() until ($nohang || !keys(%LIVE)); # for non-manifest clone
+	reap_live() until ($nohang || !keys(%$LIVE)); # for non-manifest clone
 }
 
 sub parse_epochs ($$) {
@@ -345,14 +345,14 @@ sub reap_clone { # async, called via SIGCHLD
 	my $cerr = $?;
 	$? = 0; # don't let it influence normal exit
 	if ($cerr) {
-		kill('TERM', keys %LIVE);
+		kill('TERM', keys %$LIVE);
 		$self->{lei}->child_error($cerr, "@$cmd failed");
 	}
 }
 
 sub v1_done { # called via OnDestroy
 	my ($self) = @_;
-	return if $self->{dry_run};
+	return if $self->{dry_run} || !$LIVE;
 	_write_inbox_config($self);
 	my $dst = $self->{cur_dst} // $self->{dst};
 	if (defined(my $o = $self->{-ent} ? $self->{-ent}->{owner} : undef)) {
@@ -375,7 +375,7 @@ sub v1_done { # called via OnDestroy
 
 sub v2_done { # called via OnDestroy
 	my ($self) = @_;
-	return if $self->{dry_run};
+	return if $self->{dry_run} || !$LIVE;
 	_write_inbox_config($self);
 	require PublicInbox::MultiGit;
 	my $dst = $self->{cur_dst} // $self->{dst};
@@ -398,7 +398,7 @@ sub v2_done { # called via OnDestroy
 
 sub reap_live {
 	my $pid = waitpid(-1, 0) // die "waitpid(-1): $!";
-	if (my $x = delete $LIVE{$pid}) {
+	if (my $x = delete $LIVE->{$pid}) {
 		my $cb = shift @$x;
 		$cb->(@$x);
 	} else {
@@ -564,7 +564,7 @@ sub try_manifest {
 	my $cmd = $curl->for_uri($lei, $uri, '-R', '-o', $fn);
 	my %opt = map { $_ => $lei->{$_} } (0..2);
 	my $cerr = run_reap($lei, $cmd, \%opt);
-	local %LIVE;
+	local $LIVE;
 	if ($cerr) {
 		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
 		return $lei->child_error($cerr, "@$cmd failed");
@@ -627,7 +627,7 @@ EOM
 		}
 		clone_all($self, $todo, $m);
 	}
-	reap_live() while keys(%LIVE);
+	reap_live() while keys(%$LIVE);
 	return if $self->{lei}->{child_error} || $self->{dry_run};
 
 	if (delete $self->{-culled_manifest}) { # set by clone_v2/-I/--exclude
@@ -656,7 +656,7 @@ sub do_mirror { # via wq_io_do
 	eval {
 		my $iv = $lei->{opt}->{'inbox-version'};
 		if (defined $iv) {
-			local %LIVE;
+			local $LIVE;
 			return clone_v1($self) if $iv == 1;
 			return try_scrape($self) if $iv == 2;
 			die "bad --inbox-version=$iv\n";


* [PATCH 24/95] clone: support --inbox-config option
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (22 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 23/95] lei_mirror: reduce noise on interrupted clones Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 25/95] lei_mirror: retrieve v2 description properly Eric Wong
                   ` (70 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This allows avoiding 404s when trying _/text/config/raw on code
repositories.
---
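The option handling boils down to one validation plus a per-version
check.  The sketch below folds both into a single hypothetical
function (`want_inbox_config' does not exist in LeiMirror, which does
the validation in do_mirror and the checks in clone_v1/clone_v2):

```perl
use strict;
use warnings;

# Returns true if config should be fetched for the given inbox version
sub want_inbox_config {
	my ($opt, $version) = @_; # $version is 1 or 2
	my $ic = $opt->{'inbox-config'} //= 'always';
	$ic =~ /\A(?:v1|v2|always|never)\z/s or die
		"--inbox-config must be one of `always', `v2', `v1', or `never'\n";
	$ic eq 'always' || $ic eq "v$version";
}

for my $ic (qw(always v2 v1 never)) {
	my @fetch = grep {
		want_inbox_config({ 'inbox-config' => $ic }, $_)
	} (1, 2);
	print "$ic: fetch config for inbox versions (@fetch)\n";
}
```

With `never', neither code path requests _/text/config/raw, which is
what avoids the 404s on code repositories.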
 Documentation/public-inbox-clone.pod | 11 +++++++++++
 lib/PublicInbox/LeiMirror.pm         | 13 ++++++++++---
 script/public-inbox-clone            |  1 +
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 178d952a..52c89cfd 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -65,6 +65,17 @@ When cloning a top-level with multiple inboxes, ignore inboxes and
 repositories matching the given wildcard pattern.  Supports the same
 wildcards as L</--include>
 
+=item --inbox-config=always|v2|v1|never
+
+Whether or not to retrieve the C<$INBOX/_/text/config/raw> HTTP(S)
+endpoint when cloning.
+
+Since v1 inboxes can't be distinguished from code repositories,
+setting this to C<v2> or C<never> can allow faster clones of code
+repositories if no v1 inboxes are present.
+
+Default: C<always>
+
 =item -n
 
 =item --dry-run
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index f81f6094..44d7a524 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -259,7 +259,8 @@ sub clone_v1 {
 		push @$cmd, '--reference', "$self->{dst}$ref";
 	start_clone($self, $cmd, $opt, $fini);
 
-	_get_txt_start($self, '_/text/config/raw', $fini);
+	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v1)\z/s and
+		_get_txt_start($self, '_/text/config/raw', $fini);
 	my $d = $self->{-ent} ? $self->{-ent}->{description} : undef;
 	defined($d) ? ($self->{'txt.description'} = $d) :
 		_get_txt_start($self, 'description', $fini);
@@ -445,7 +446,9 @@ failed to extract epoch number from $src
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 
-	_get_txt_start($task, '_/text/config/raw', $fini);
+	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s and
+		_get_txt_start($task, '_/text/config/raw', $fini);
+
 	_get_txt_start($self, 'description', $fini);
 
 	$task->{-locked} = $lk->lock_for_scope($$) if !$self->{dry_run};
@@ -649,11 +652,15 @@ sub start_clone_url {
 	die "TODO: non-HTTP/HTTPS clone of $self->{src} not supported, yet";
 }
 
-sub do_mirror { # via wq_io_do
+sub do_mirror { # via wq_io_do or public-inbox-clone
 	my ($self) = @_;
 	my $lei = $self->{lei};
 	umask($lei->{client_umask}) if defined $lei->{client_umask};
 	eval {
+		my $ic = $lei->{opt}->{'inbox-config'} //= 'always';
+		$ic =~ /\A(?:v1|v2|always|never)\z/s or die <<"";
+--inbox-config must be one of `always', `v2', `v1', or `never'
+
 		my $iv = $lei->{opt}->{'inbox-version'};
 		if (defined $iv) {
 			local $LIVE;
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 22ffc0fc..3d980c97 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -23,6 +23,7 @@ options:
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
+	inbox-config=s
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config


* [PATCH 25/95] lei_mirror: retrieve v2 description properly
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (23 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 24/95] clone: support --inbox-config option Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 26/95] lei_mirror: reduce scope of v2 lock Eric Wong
                   ` (69 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

---
 lib/PublicInbox/LeiMirror.pm | 2 +-
 t/www_listing.t              | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 44d7a524..9c457fca 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -449,7 +449,7 @@ failed to extract epoch number from $src
 	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s and
 		_get_txt_start($task, '_/text/config/raw', $fini);
 
-	_get_txt_start($self, 'description', $fini);
+	_get_txt_start($task, 'description', $fini);
 
 	$task->{-locked} = $lk->lock_for_scope($$) if !$self->{dry_run};
 	my @cmd = clone_cmd($lei, my $opt = {});
diff --git a/t/www_listing.t b/t/www_listing.t
index 45287c7d..6166b94e 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -91,6 +91,10 @@ SKIP: {
 		is(xsys(@clone, $alt, "$v2/git/$i.git"), 0, "clone epoch $i")
 	}
 	ok(open(my $fh, '>', "$v2/inbox.lock"), 'mock a v2 inbox');
+	open($fh, '>', "$v2/description") or xbail "open $v2/description: $!";
+	print $fh "a v2 inbox\n" or xbail "print $!";
+	close $fh or xbail "write: $v2/description $!";
+
 	open $fh, '>', "$alt/description" or xbail "open $alt/description $!";
 	print $fh "we're \xc4\x80ll clones\n" or xbail "print $!";
 	close $fh or xbail "write: $alt/description $!";
@@ -143,6 +147,9 @@ EOM
 					/v2/git/1.git /v2/git/2.git) ],
 		'manifest saved');
 	for (keys %$mf) { ok(-d "$tmpdir/pfx$_", "pfx/$_ cloned") }
+	open my $desc, '<', "$tmpdir/pfx/v2/description" or xbail "open: $!";
+	$desc = <$desc>;
+	is($desc, "a v2 inbox\n", 'v2 description retrieved');
 
 	$clone_err = '';
 	ok(run_script(['-clone', '--include=*/alt',


* [PATCH 26/95] lei_mirror: reduce scope of v2 lock
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (24 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 25/95] lei_mirror: retrieve v2 description properly Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 27/95] lei_mirror: allow --epoch on mixed v1/v2 clones Eric Wong
                   ` (68 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Guarding against parallel clones isn't realistic, so the lock now
only covers setting up all.git, and even then, I'm not 100% sure
the lock is useful.
---
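The narrower lock scope can be pictured roughly as follows.  This is
only a sketch: plain flock() on a temporary directory stands in for
PublicInbox::Lock, and `finish_setup' is a hypothetical name for the
locked part of v2_done.

```perl
use strict;
use warnings;
use Fcntl qw(LOCK_EX);
use File::Temp ();

my $dir = File::Temp->newdir; # stand-in for the v2 inbox directory
my @steps;

sub finish_setup { # hypothetical stand-in for the all.git setup
	my ($dst) = @_;
	open my $lk, '>>', "$dst/inbox.lock" or die "open: $!";
	flock($lk, LOCK_EX) or die "flock: $!";
	push @steps, 'locked: configure alternates and epochs';
	close $lk or die "close: $!"; # closing drops the lock
	push @steps, 'unlocked: index';
}

push @steps, 'unlocked: clone epochs';
finish_setup($dir);
print "$_\n" for @steps;
```

The long-running epoch clones and the final indexing both happen
outside of the lock in this arrangement.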
 lib/PublicInbox/LeiMirror.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 9c457fca..0a93ed44 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -377,9 +377,11 @@ sub v1_done { # called via OnDestroy
 sub v2_done { # called via OnDestroy
 	my ($self) = @_;
 	return if $self->{dry_run} || !$LIVE;
+	my $dst = $self->{cur_dst} // $self->{dst};
+	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
+	my $lck = $lk->lock_for_scope($$);
 	_write_inbox_config($self);
 	require PublicInbox::MultiGit;
-	my $dst = $self->{cur_dst} // $self->{dst};
 	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
 	$mg->fill_alternates;
 	for my $i ($mg->git_epochs) { $mg->epoch_cfg_set($i) }
@@ -393,7 +395,7 @@ sub v2_done { # called via OnDestroy
 		chmod($st[2] & 0555, $edst) or die "chmod(a-w, $edst): $!";
 	}
 	write_makefile($dst, 2);
-	delete $self->{-locked} // die "BUG: $dst not locked"; # unlock
+	undef $lck; # unlock
 	index_cloned_inbox($self, 2);
 }
 
@@ -443,7 +445,6 @@ failed to extract epoch number from $src
 	(!$self->{dry_run} && !-d $dst) and File::Path::mkpath($dst);
 
 	require PublicInbox::Lock;
-	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 
 	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s and
@@ -451,7 +452,6 @@ failed to extract epoch number from $src
 
 	_get_txt_start($task, 'description', $fini);
 
-	$task->{-locked} = $lk->lock_for_scope($$) if !$self->{dry_run};
 	my @cmd = clone_cmd($lei, my $opt = {});
 	while (@src_edst && !$lei->{child_error}) {
 		my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];


* [PATCH 27/95] lei_mirror: allow --epoch on mixed v1/v2 clones
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (25 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 26/95] lei_mirror: reduce scope of v2 lock Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 28/95] lei_mirror: fix infinite loop in dependency resolution Eric Wong
                   ` (67 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

It's entirely possible an instance will have both v1 and v2
inboxes (or v2 inboxes and coderepos).  Don't punish --epoch
users by forcing them to run multiple commands.
---
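The change hinges on `delete local', which needs Perl 5.12 or later
and restores the hash element when the enclosing scope exits.  A small
sketch with a made-up option hash (`clone_v1_like' is hypothetical):

```perl
use strict;
use warnings;

my $opt = { epoch => '2..', jobs => 4 };
my @seen;

sub clone_v1_like {
	push @seen, exists($opt->{epoch}) ? 'epoch set' : 'no epoch';
}

{
	# `delete local' needs Perl 5.12+; the key is restored at scope exit
	delete local $opt->{epoch};
	clone_v1_like(); # v1 code path never sees --epoch
}
clone_v1_like(); # restored for anything that runs afterwards
print join(', ', @seen), "\n";
```

This way, v1 clones never see an --epoch value meant for the v2
inboxes of the same instance.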
 lib/PublicInbox/LeiMirror.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 0a93ed44..ddb1e747 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -579,7 +579,8 @@ sub try_manifest {
 	}
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
-	if (my $v2 = delete $multi->{v2}) {
+	my $v2 = delete $multi->{v2};
+	if ($v2) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};
 			my %v2_epochs = map {
@@ -605,6 +606,7 @@ EOM
 		}
 	}
 	if (my $v1 = delete $multi->{v1}) {
+		delete local $lei->{opt}->{epoch} if defined($v2);
 		my $p = $path_pfx.$path;
 		chop($p) if substr($p, -1, 1) eq '/';
 		$uri->path($p);


* [PATCH 28/95] lei_mirror: fix infinite loop in dependency resolution
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (26 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 27/95] lei_mirror: allow --epoch on mixed v1/v2 clones Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 29/95] lei_mirror: defend against infinite loops Eric Wong
                   ` (66 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We need to account for dependencies which are marked `done'.
---
 lib/PublicInbox/LeiMirror.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index ddb1e747..0f46d355 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -539,8 +539,9 @@ sub clone_all {
 	while (scalar keys %$todo) {
 		for my $x (keys %$todo) {
 			# resolve multi-level references
-			while (defined($m->{$x}->{reference})) {
-				$x = $m->{$x}->{reference};
+			while (defined(my $nxt = $m->{$x}->{reference})) {
+				exists($todo->{$nxt}) or last;
+				$x = $nxt;
 			}
 			my $y = delete $todo->{$x} // next; # already done
 			for (@$y) {


* [PATCH 29/95] lei_mirror: defend against infinite loops
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (27 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 28/95] lei_mirror: fix infinite loop in dependency resolution Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 30/95] lei_mirror: do not fetch descriptions if using manifest Eric Wong
                   ` (65 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

A reference chain of 1000 ought to be enough, I think...
---
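For illustration, the resolution loop with its guard can be exercised
on a toy manifest.  `resolve_order' is a hypothetical reduction of
clone_all which returns repo names instead of cloning them; the repo
names and manifests are made up.

```perl
use strict;
use warnings;

# Returns the order in which repos would be cloned; $m maps a repo name
# to its manifest entry, $todo maps a dependency name to its dependents.
sub resolve_order {
	my ($todo, $m) = @_;
	my @order = @{delete($todo->{''}) // []}; # no dependencies, first
	while (keys %$todo) {
		for my $x (keys %$todo) {
			my $nr = 0;
			while (defined(my $nxt = $m->{$x}->{reference})) {
				exists($todo->{$nxt}) or last; # already done
				die "E: dependency loop detected (`$x' => `$nxt')\n"
					if ++$nr > 1000;
				$x = $nxt;
			}
			push @order, @{delete $todo->{$x}};
			last; # restart iteration since %$todo changed
		}
	}
	@order;
}

my $m = { a => {}, b => { reference => 'a' }, c => { reference => 'b' } };
my $todo = { '' => ['a'], a => ['b'], b => ['c'] };
print join(' ', resolve_order($todo, $m)), "\n";

my $loop = { x => { reference => 'y' }, y => { reference => 'x' } };
eval { resolve_order({ y => ['x'], x => ['y'] }, $loop) };
print $@;
```

Without the counter, the second call would spin forever, since each
entry of the cycle always finds the other one still pending.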
 lib/PublicInbox/LeiMirror.pm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 0f46d355..1d6ed51c 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -538,9 +538,13 @@ sub clone_all {
 	# resolve references, deepest, first:
 	while (scalar keys %$todo) {
 		for my $x (keys %$todo) {
+			my $nr;
 			# resolve multi-level references
 			while (defined(my $nxt = $m->{$x}->{reference})) {
 				exists($todo->{$nxt}) or last;
+				die <<EOM if ++$nr > 1000;
+E: dependency loop detected (`$x' => `$nxt')
+EOM
 				$x = $nxt;
 			}
 			my $y = delete $todo->{$x} // next; # already done


* [PATCH 30/95] lei_mirror: do not fetch descriptions if using manifest
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (28 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 29/95] lei_mirror: defend against infinite loops Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 31/95] lei_mirror: require PublicInbox::Lock at use Eric Wong
                   ` (64 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

If a manifest exists, we can expect the description to always be
present, thus there's no need to make a separate HTTP(S) request
since we can use it as-is from the manifest for v1 inboxes and
coderepos, and strip / \[epoch [0-9]+\]\z/ from v2 epochs.
---
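A quick sketch of the suffix stripping done for v2 epochs, assuming
per-epoch descriptions of the form `DESCRIPTION [epoch N]' (the
strings below are made up):

```perl
use strict;
use warnings;

# Hypothetical per-epoch descriptions as they might appear in a manifest
my @epochs = ('a v2 inbox [epoch 0]', 'a v2 inbox [epoch 1]');

my $desc;
for my $d (@epochs) {
	(my $tmp = $d) =~ s/ \[epoch [0-9]+\]\z//s; # same pattern as the patch
	$desc = $tmp; # last epoch wins, as in clone_v2
}
print "$desc\n";
```

Descriptions lacking the suffix pass through unchanged, so the same
substitution is harmless for manifests that don't append it.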
 lib/PublicInbox/LeiMirror.pm | 56 ++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 19 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 1d6ed51c..2b20873e 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -177,16 +177,17 @@ sub set_description ($) {
 	open my $fh, '+>>', $f or die "open($f): $!";
 	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
 	my $d = do { local $/; <$fh> } // die "read($f): $!";
-	my $orig = $d;
+	chomp(my $orig = $d);
 	while (defined($d) && ($d =~ m!^\(\$INBOX_DIR/description missing\)! ||
 			$d =~ /^Unnamed repository/ || $d !~ /\S/)) {
 		$d = delete($self->{'txt.description'});
 	}
-	$d //= 'mirror of '.($self->{cur_src} // $self->{src})."\n";
+	$d //= 'mirror of '.($self->{cur_src} // $self->{src});
+	chomp $d;
 	return if $d eq $orig;
 	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
 	truncate($fh, 0) or die "truncate($f): $!";
-	print $fh $d or die "print($f): $!";
+	print $fh $d, "\n" or die "print($f): $!";
 	close $fh or die "close($f): $!";
 }
 
@@ -261,8 +262,10 @@ sub clone_v1 {
 
 	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v1)\z/s and
 		_get_txt_start($self, '_/text/config/raw', $fini);
+
 	my $d = $self->{-ent} ? $self->{-ent}->{description} : undef;
-	defined($d) ? ($self->{'txt.description'} = $d) :
+	$self->{'txt.description'} = $d if defined $d;
+	(!defined($d) && !$nohang) and
 		_get_txt_start($self, 'description', $fini);
 
 	reap_live() until ($nohang || !keys(%$LIVE)); # for non-manifest clone
@@ -333,11 +336,14 @@ EOM
 EOM
 	}
 	close $fh or die "close($f): $!";
-	if (defined $ent->{head}) {
-		$f = "$edst/HEAD";
-		open $fh, '>', $f or die "open($f): $!";
-		print $fh $ent->{head}, "\n" or die "print($f): $!";
-		close $fh or die "close($f): $!";
+	my %map = (head => 'HEAD', description => undef);
+	while (my ($key, $fn) = each %map) {
+		my $val = $ent->{$key} // next;
+		$fn //= $key;
+		$fn = "$edst/$fn";
+		open $fh, '>', $fn or die "open($fn): $!";
+		print $fh $val, "\n" or die "print($fn): $!";
+		close $fh or die "close($fn): $!";
 	}
 }
 
@@ -385,10 +391,18 @@ sub v2_done { # called via OnDestroy
 	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
 	$mg->fill_alternates;
 	for my $i ($mg->git_epochs) { $mg->epoch_cfg_set($i) }
-	my $edst_owner = delete($self->{-owner}) // [];
-	while (@$edst_owner) {
-		my ($edst, $o) = splice(@$edst_owner);
-		run_die [qw(git config -f), "$edst/config", 'gitweb.owner', $o];
+	my $entries = delete($self->{-ent}) // [];
+	while (@$entries) {
+		my ($edst, $ent) = splice(@$entries);
+		if (defined(my $o = $ent->{owner})) {
+			run_die [qw(git config -f), "$edst/config",
+				'gitweb.owner', $o];
+		}
+		my $d = $ent->{description} // next;
+		my $fn = "$edst/description";
+		open my $fh, '>', $fn or die "open($fn): $!";
+		print $fh $d, "\n" or die "print($fn): $!";
+		close $fh or die "close($fn): $!";
 	}
 	for my $edst (@{delete($self->{-read_only}) // []}) {
 		my @st = stat($edst) or die "stat($edst): $!";
@@ -418,7 +432,7 @@ sub clone_v2 ($$;$) {
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
 	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
-	my (@src_edst, @skip);
+	my (@src_edst, @skip, $desc);
 	for my $nr (sort { $a <=> $b } keys %$v2_epochs) {
 		my ($uri, $key) = @{$v2_epochs->{$nr}};
 		my $src = $uri->as_string;
@@ -428,13 +442,16 @@ failed to extract epoch number from $src
 
 		$1 + 0 == $nr or die "BUG: <$uri> miskeyed $1 != $nr";
 		$edst .= "/git/$nr.git";
-		$m->{$key} // die "BUG: `$key' not in manifest.js.gz";
+		my $ent = $m->{$key} // die "BUG: `$key' not in manifest.js.gz";
+		if (defined(my $d = $ent->{description})) {
+			$d =~ s/ \[epoch [0-9]+\]\z//s;
+			$desc = $d;
+		}
 		if (!$want || $want->{$nr}) {
 			push @src_edst, $src, $edst;
-			my $o = $m->{$key}->{owner};
-			push(@{$task->{-owner}}, $edst, $o) if defined($o);
+			push @{$task->{-ent}}, $edst, $ent;
 		} else { # create a placeholder so users only need to chmod +w
-			init_placeholder($src, $edst, $m->{$key});
+			init_placeholder($src, $edst, $ent);
 			push @{$task->{-read_only}}, $edst;
 			push @skip, $key;
 		}
@@ -450,7 +467,8 @@ failed to extract epoch number from $src
 	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s and
 		_get_txt_start($task, '_/text/config/raw', $fini);
 
-	_get_txt_start($task, 'description', $fini);
+	defined($desc) ? ($task->{'txt.description'} = $desc) :
+		_get_txt_start($task, 'description', $fini);
 
 	my @cmd = clone_cmd($lei, my $opt = {});
 	while (@src_edst && !$lei->{child_error}) {


* [PATCH 31/95] lei_mirror: require PublicInbox::Lock at use
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (29 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 30/95] lei_mirror: do not fetch descriptions if using manifest Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 32/95] lei_mirror: fix glob semantics to match end-of-path Eric Wong
                   ` (63 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

It's easier to understand why we lazy-load Lock for v2-only
code paths when we require it near its first use.
---
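A minimal sketch of the lazy-load pattern described above, not taken
from LeiMirror.pm.  `Text::Wrap' is an arbitrary core module standing
in for PublicInbox::Lock, and `v2_only' is a hypothetical name.  The
module is only compiled when the sub needing it first runs:

```perl
use v5.12;

# Text::Wrap is an arbitrary core module standing in for
# PublicInbox::Lock; v2_only is a hypothetical name
sub v2_only {
	require Text::Wrap; # loaded on first call, a no-op afterwards
	Text::Wrap::wrap('', '', @_);
}

my $before = exists $INC{'Text/Wrap.pm'} ? 'loaded' : 'not loaded';
my $out = v2_only('only v2 code paths pay the load cost');
my $after = exists $INC{'Text/Wrap.pm'} ? 'loaded' : 'not loaded';
say "before: $before, after: $after";
```

Keeping the `require' next to the first use documents which code path
needs the module, which is the readability point of this patch.
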
 lib/PublicInbox/LeiMirror.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 2b20873e..a7ddfcd2 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -384,6 +384,7 @@ sub v2_done { # called via OnDestroy
 	my ($self) = @_;
 	return if $self->{dry_run} || !$LIVE;
 	my $dst = $self->{cur_dst} // $self->{dst};
+	require PublicInbox::Lock;
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
 	my $lck = $lk->lock_for_scope($$);
 	_write_inbox_config($self);
@@ -461,7 +462,6 @@ failed to extract epoch number from $src
 
 	(!$self->{dry_run} && !-d $dst) and File::Path::mkpath($dst);
 
-	require PublicInbox::Lock;
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 
 	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s and


* [PATCH 32/95] lei_mirror: fix glob semantics to match end-of-path
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (30 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 31/95] lei_mirror: require PublicInbox::Lock at use Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 33/95] lei_mirror: differentiate -entv vs -ent Eric Wong
                   ` (62 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Globs such as `*/foo' should not match `*/foobar'.
This allows cloning only `git' and not
`gitolite-transparency-log' off lore.
---
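A rough sketch of the anchoring change, under stated assumptions:
`simple_glob2re' is a hypothetical stand-in for the real glob2re (it
only handles `*' and `?'), `globs2re' is a made-up helper, and the
\A-anchored fallback for non-glob names is left out.  It shows how
appending \z to every alternative stops `*/git' from matching a longer
name:

```perl
use v5.12;

# hypothetical stand-in for glob2re, handles only `*' and `?'
sub simple_glob2re {
	my ($glob) = @_;
	join('', map {
		$_ eq '*' ? '[^/]*' : $_ eq '?' ? '[^/]' : quotemeta($_)
	} split(//, $glob));
}

# build one alternation from several globs, optionally adding
# an end-of-string anchor (\z) to every alternative
sub globs2re {
	my ($anchor_end, @globs) = @_;
	my $z = $anchor_end ? '\\z' : '';
	my $re = '(?:'.join("$z|", map { simple_glob2re($_) } @globs)."$z)";
	qr/$re/;
}

my @names = qw(/git /gitolite-transparency-log);
my $old = globs2re(0, '*/git'); # behavior before this patch
my $new = globs2re(1, '*/git'); # behavior after this patch
my @old = grep { $_ =~ $old } @names;
my @new = grep { $_ =~ $new } @names;
say "unanchored: @old";
say "anchored:   @new";
```

Joining with `\z|' rather than adding a single trailing \z matters
because a lone anchor after the last `|' would only bind to the final
alternative.
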
 lib/PublicInbox/LeiMirror.pm | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index a7ddfcd2..40164b67 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -506,18 +506,18 @@ sub multi_inbox ($$$) {
 	my $n = scalar(keys %$v2) + scalar(@v1);
 	my @orig = defined($incl // $excl) ? (keys %$v2, @v1) : ();
 	if (defined $incl) {
-		my $re = '(?:'.join('|', map {
-				$self->{lei}->glob2re($_) // qr/\A\Q$_\E\z/
-			} @$incl).')';
+		my $re = '(?:'.join('\\z|', map {
+				$self->{lei}->glob2re($_) // qr/\A\Q$_\E/
+			} @$incl).'\\z)';
 		my @gone = delete @$v2{grep(!/$re/, keys %$v2)};
 		delete @$m{map { @$_ } @gone} and $self->{-culled_manifest} = 1;
 		delete @$m{grep(!/$re/, @v1)} and $self->{-culled_manifest} = 1;
 		@v1 = grep(/$re/, @v1);
 	}
 	if (defined $excl) {
-		my $re = '(?:'.join('|', map {
-				$self->{lei}->glob2re($_) // qr/\A\Q$_\E\z/
-			} @$excl).')';
+		my $re = '(?:'.join('\\z|', map {
+				$self->{lei}->glob2re($_) // qr/\A\Q$_\E/
+			} @$excl).'\\z)';
 		my @gone = delete @$v2{grep(/$re/, keys %$v2)};
 		delete @$m{map { @$_ } @gone} and $self->{-culled_manifest} = 1;
 		delete @$m{grep(/$re/, @v1)} and $self->{-culled_manifest} = 1;


* [PATCH 33/95] lei_mirror: differentiate -entv vs -ent
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (31 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 32/95] lei_mirror: fix glob semantics to match end-of-path Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 34/95] lei_mirror: support manifest {references} for v2 epochs Eric Wong
                   ` (61 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

It makes the code easier to follow when we distinguish a single
entity (-ent) from multiple entities (-entv; `v' for vector, à la `argv').
---
 lib/PublicInbox/LeiMirror.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 40164b67..faf6a3b6 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -392,7 +392,7 @@ sub v2_done { # called via OnDestroy
 	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
 	$mg->fill_alternates;
 	for my $i ($mg->git_epochs) { $mg->epoch_cfg_set($i) }
-	my $entries = delete($self->{-ent}) // [];
+	my $entries = delete($self->{-entv}) // [];
 	while (@$entries) {
 		my ($edst, $ent) = splice(@$entries);
 		if (defined(my $o = $ent->{owner})) {
@@ -450,7 +450,7 @@ failed to extract epoch number from $src
 		}
 		if (!$want || $want->{$nr}) {
 			push @src_edst, $src, $edst;
-			push @{$task->{-ent}}, $edst, $ent;
+			push @{$task->{-entv}}, $edst, $ent;
 		} else { # create a placeholder so users only need to chmod +w
 			init_placeholder($src, $edst, $ent);
 			push @{$task->{-read_only}}, $edst;


* [PATCH 34/95] lei_mirror: support manifest {references} for v2 epochs
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (32 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 33/95] lei_mirror: differentiate -entv vs -ent Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 35/95] lei_mirror: simplify v2 code paths Eric Wong
                   ` (60 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This may be useful in case a v1 inbox gets forked into v2
(untested).
---
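A simplified sketch of dependency-ordered cloning via {reference}.
This is not the clone_all implementation in this patch: it uses a
plain queue instead of resolving the deepest reference first, and
`clone_order', the sample manifest, and its names are made up for
illustration.  Repos without a wanted {reference} go first, and a repo
is only queued once the repo it references has been queued:

```perl
use v5.12;

# hypothetical manifest: name => { reference => name }
my %m = (
	'/a' => {},
	'/b' => { reference => '/a' },
	'/c' => { reference => '/b' },
	'/d' => { reference => '/gone' }, # dependency we are not cloning
);

sub clone_order {
	my ($m, @want) = @_;
	my %want = map { $_ => 1 } @want;
	my (%todo, @order);
	for my $name (@want) {
		my $dep = $m->{$name}->{reference} // '';
		$dep = '' if !$want{$dep}; # do not download unwanted deps
		push @{$todo{$dep}}, $name;
	}
	# no-dependency repos first, then whatever they unblock
	my @ready = sort @{delete($todo{''}) // []};
	while (@ready) {
		my $name = shift @ready;
		push @order, $name;
		push @ready, sort @{delete($todo{$name}) // []};
	}
	die "E: dependency loop detected\n" if keys %todo;
	@order;
}

say join(' ', clone_order(\%m, sort keys %m));
```

Anything left in %todo once the queue drains can only be part of a
reference cycle, hence the loop check at the end.
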
 lib/PublicInbox/LeiMirror.pm | 62 ++++++++++++++++++++++++++----------
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index faf6a3b6..20ae3ac8 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -98,7 +98,8 @@ sub clone_cmd {
 			($lei->{opt}->{jobs} // 1) > 1;
 	push @cmd, '-v' if $lei->{opt}->{verbose};
 	# XXX any other options to support?
-	# --reference is tricky with multiple epochs...
+	# --reference is tricky with multiple epochs, but handled
+	# automatically if using manifest.js.gz
 	@cmd;
 }
 
@@ -260,8 +261,10 @@ sub clone_v1 {
 		push @$cmd, '--reference', "$self->{dst}$ref";
 	start_clone($self, $cmd, $opt, $fini);
 
-	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v1)\z/s and
+	if (!$self->{-is_epoch} && $lei->{opt}->{'inbox-config'} =~
+				/\A(?:always|v1)\z/s) {
 		_get_txt_start($self, '_/text/config/raw', $fini);
+	}
 
 	my $d = $self->{-ent} ? $self->{-ent}->{description} : undef;
 	$self->{'txt.description'} = $d if defined $d;
@@ -376,6 +379,7 @@ sub v1_done { # called via OnDestroy
 			die "rename($f, $o/info/alternates): $!";
 		$f->unlink_on_destroy(0);
 	}
+	return if $self->{-is_epoch};
 	write_makefile($dst, 1);
 	index_cloned_inbox($self, 1);
 }
@@ -433,6 +437,7 @@ sub clone_v2 ($$;$) {
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
 	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
+	delete $task->{todo}; # $self->{todo} still exists
 	my (@src_edst, @skip, $desc);
 	for my $nr (sort { $a <=> $b } keys %$v2_epochs) {
 		my ($uri, $key) = @{$v2_epochs->{$nr}};
@@ -451,6 +456,7 @@ failed to extract epoch number from $src
 		if (!$want || $want->{$nr}) {
 			push @src_edst, $src, $edst;
 			push @{$task->{-entv}}, $edst, $ent;
+			$self->{any_want}->{$key} = 1;
 		} else { # create a placeholder so users only need to chmod +w
 			init_placeholder($src, $edst, $ent);
 			push @{$task->{-read_only}}, $edst;
@@ -469,11 +475,27 @@ failed to extract epoch number from $src
 
 	defined($desc) ? ($task->{'txt.description'} = $desc) :
 		_get_txt_start($task, 'description', $fini);
-
-	my @cmd = clone_cmd($lei, my $opt = {});
-	while (@src_edst && !$lei->{child_error}) {
-		my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
-		start_clone($self, $cmd, $opt, $fini);
+	if (my $todo = $self->{todo}) {  # manifest clone, deal with references
+		my $entv = delete $task->{-entv};
+		while (@$entv) {
+			my ($edst, $ent) = splice(@$entv, 0, 2);
+			my $etask = bless { %$task }, __PACKAGE__;
+			$etask->{-ent} = $ent; # may have {reference}
+			$etask->{cur_src} = shift @src_edst //
+				die 'BUG: no cur_src';
+			$etask->{cur_dst} = shift @src_edst //
+				die 'BUG: no cur_dst';
+			$etask->{cur_dst} eq $edst or
+				die "BUG: `$etask->{cur_dst}' != `$edst'";
+			$etask->{-is_epoch} = $fini;
+			push @{$todo->{($ent->{reference} // '')}}, $etask;
+		}
+	} else {
+		my @cmd = clone_cmd($lei, my $opt = {});
+		while (@src_edst && !$lei->{child_error}) {
+			my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
+			start_clone($self, $cmd, $opt, $fini);
+		}
 	}
 }
 
@@ -546,10 +568,19 @@ sub multi_inbox ($$$) {
 	($path_pfx, $n, $ret);
 }
 
-sub clone_all {
-	my ($self, $todo, $m) = @_;
+sub clone_all ($$) {
+	my ($self, $m) = @_;
+	my $todo = delete $self->{todo};
+	my $nodep = delete $todo->{''};
+
+	# do not download unwanted deps
+	my $any_want = delete $self->{any_want};
+	my @unwanted = grep { !$any_want->{$_} } keys %$todo;
+	my @nodep = delete(@$todo{@unwanted});
+	push(@$nodep, @$_) for @nodep;
+
 	# handle no-dependency repos, first
-	for (@{delete($todo->{''}) // []}) {
+	for (@$nodep) {
 		clone_v1($_, 1);
 		return if $self->{lei}->{child_error};
 	}
@@ -603,6 +634,7 @@ sub try_manifest {
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
 	my $v2 = delete $multi->{v2};
+	local $self->{todo} = {};
 	if ($v2) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};
@@ -629,12 +661,9 @@ EOM
 		}
 	}
 	if (my $v1 = delete $multi->{v1}) {
-		delete local $lei->{opt}->{epoch} if defined($v2);
 		my $p = $path_pfx.$path;
 		chop($p) if substr($p, -1, 1) eq '/';
 		$uri->path($p);
-		my $todo = {};
-		my %want = map { $_ => 1 } @$v1;
 		for my $name (@$v1) {
 			my $task = bless { %$self }, __PACKAGE__;
 			$task->{-ent} = $m->{$name} //
@@ -650,11 +679,12 @@ E: `$task->{cur_dst}' must not contain newline
 EOM
 			$task->{cur_src} .= '/';
 			my $dep = $task->{-ent}->{reference} // '';
-			$dep = '' if !$want{$dep};
-			push @{$todo->{$dep}}, $task;
+			push @{$self->{todo}->{$dep}}, $task;
+			$self->{any_want}->{$name} = 1;
 		}
-		clone_all($self, $todo, $m);
 	}
+	delete local $lei->{opt}->{epoch} if defined($v2);
+	clone_all($self, $m);
 	reap_live() while keys(%$LIVE);
 	return if $self->{lei}->{child_error} || $self->{dry_run};
 


* [PATCH 35/95] lei_mirror: simplify v2 code paths
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (33 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 34/95] lei_mirror: support manifest {references} for v2 epochs Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 36/95] clone: support --inbox-version Eric Wong
                   ` (59 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can simply reuse the parallelization of the manifest
code path for non-manifest v2 clones, now.
---
 lib/PublicInbox/LeiMirror.pm | 86 ++++++++++++++----------------------
 1 file changed, 34 insertions(+), 52 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 20ae3ac8..18c825d3 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -67,8 +67,9 @@ sub try_scrape {
 			my ($n) = (m!/([0-9]+)\z!);
 			$n => [ URI->new($_), '' ]
 		} @v2_urls; # uniq
-		clone_v2($self, \%v2_epochs);
-		reap_live() while keys(%$LIVE);
+		clone_v2_prep($self, \%v2_epochs);
+		delete local $lei->{opt}->{epoch};
+		clone_all($self);
 		return;
 	}
 
@@ -223,7 +224,7 @@ sub index_cloned_inbox {
 		PublicInbox::Admin::progress_prepare($opt, $lei->{2});
 		PublicInbox::Admin::index_inbox($ibx, undef, $opt);
 	}
-	return if defined $self->{cur_dst};
+	return if defined $self->{cur_dst}; # one of many repos to clone
 	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }
 
@@ -396,19 +397,6 @@ sub v2_done { # called via OnDestroy
 	my $mg = PublicInbox::MultiGit->new($dst, 'all.git', 'git');
 	$mg->fill_alternates;
 	for my $i ($mg->git_epochs) { $mg->epoch_cfg_set($i) }
-	my $entries = delete($self->{-entv}) // [];
-	while (@$entries) {
-		my ($edst, $ent) = splice(@$entries);
-		if (defined(my $o = $ent->{owner})) {
-			run_die [qw(git config -f), "$edst/config",
-				'gitweb.owner', $o];
-		}
-		my $d = $ent->{description} // next;
-		my $fn = "$edst/description";
-		open my $fh, '>', $fn or die "open($fn): $!";
-		print $fh $d, "\n" or die "print($fn): $!";
-		close $fh or die "close($fn): $!";
-	}
 	for my $edst (@{delete($self->{-read_only}) // []}) {
 		my @st = stat($edst) or die "stat($edst): $!";
 		chmod($st[2] & 0555, $edst) or die "chmod(a-w, $edst): $!";
@@ -428,7 +416,7 @@ sub reap_live {
 	}
 }
 
-sub clone_v2 ($$;$) {
+sub clone_v2_prep ($$;$) {
 	my ($self, $v2_epochs, $m) = @_; # $m => manifest.js.gz hashref
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
@@ -438,7 +426,7 @@ sub clone_v2 ($$;$) {
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
 	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
 	delete $task->{todo}; # $self->{todo} still exists
-	my (@src_edst, @skip, $desc);
+	my (@src_edst, @skip, $desc, @entv);
 	for my $nr (sort { $a <=> $b } keys %$v2_epochs) {
 		my ($uri, $key) = @{$v2_epochs->{$nr}};
 		my $src = $uri->as_string;
@@ -448,14 +436,18 @@ failed to extract epoch number from $src
 
 		$1 + 0 == $nr or die "BUG: <$uri> miskeyed $1 != $nr";
 		$edst .= "/git/$nr.git";
-		my $ent = $m->{$key} // die "BUG: `$key' not in manifest.js.gz";
-		if (defined(my $d = $ent->{description})) {
-			$d =~ s/ \[epoch [0-9]+\]\z//s;
-			$desc = $d;
+		my $ent;
+		if ($m) {
+			$ent = $m->{$key} //
+				die("BUG: `$key' not in manifest.js.gz");
+			if (defined(my $d = $ent->{description})) {
+				$d =~ s/ \[epoch [0-9]+\]\z//s;
+				$desc = $d;
+			}
 		}
 		if (!$want || $want->{$nr}) {
 			push @src_edst, $src, $edst;
-			push @{$task->{-entv}}, $edst, $ent;
+			push @entv, $edst, $ent;
 			$self->{any_want}->{$key} = 1;
 		} else { # create a placeholder so users only need to chmod +w
 			init_placeholder($src, $edst, $ent);
@@ -464,7 +456,7 @@ failed to extract epoch number from $src
 		}
 	}
 	# filter out the epochs we skipped
-	$self->{-culled_manifest} = 1 if delete(@$m{@skip});
+	$self->{-culled_manifest} = 1 if $m && delete(@$m{@skip});
 
 	(!$self->{dry_run} && !-d $dst) and File::Path::mkpath($dst);
 
@@ -475,27 +467,16 @@ failed to extract epoch number from $src
 
 	defined($desc) ? ($task->{'txt.description'} = $desc) :
 		_get_txt_start($task, 'description', $fini);
-	if (my $todo = $self->{todo}) {  # manifest clone, deal with references
-		my $entv = delete $task->{-entv};
-		while (@$entv) {
-			my ($edst, $ent) = splice(@$entv, 0, 2);
-			my $etask = bless { %$task }, __PACKAGE__;
-			$etask->{-ent} = $ent; # may have {reference}
-			$etask->{cur_src} = shift @src_edst //
-				die 'BUG: no cur_src';
-			$etask->{cur_dst} = shift @src_edst //
-				die 'BUG: no cur_dst';
-			$etask->{cur_dst} eq $edst or
-				die "BUG: `$etask->{cur_dst}' != `$edst'";
-			$etask->{-is_epoch} = $fini;
-			push @{$todo->{($ent->{reference} // '')}}, $etask;
-		}
-	} else {
-		my @cmd = clone_cmd($lei, my $opt = {});
-		while (@src_edst && !$lei->{child_error}) {
-			my $cmd = [ @$pfx, @cmd, splice(@src_edst, 0, 2) ];
-			start_clone($self, $cmd, $opt, $fini);
-		}
+	while (@entv) {
+		my ($edst, $ent) = splice(@entv, 0, 2);
+		my $etask = bless { %$task }, __PACKAGE__;
+		$etask->{-ent} = $ent; # may have {reference}
+		$etask->{cur_src} = shift @src_edst // die 'BUG: no cur_src';
+		$etask->{cur_dst} = shift @src_edst // die 'BUG: no cur_dst';
+		$etask->{cur_dst} eq $edst or
+			die "BUG: `$etask->{cur_dst}' != `$edst'";
+		$etask->{-is_epoch} = $fini;
+		push @{$self->{todo}->{($ent->{reference} // '')}}, $etask;
 	}
 }
 
@@ -568,7 +549,7 @@ sub multi_inbox ($$$) {
 	($path_pfx, $n, $ret);
 }
 
-sub clone_all ($$) {
+sub clone_all {
 	my ($self, $m) = @_;
 	my $todo = delete $self->{todo};
 	my $nodep = delete $todo->{''};
@@ -587,9 +568,9 @@ sub clone_all ($$) {
 	# resolve references, deepest, first:
 	while (scalar keys %$todo) {
 		for my $x (keys %$todo) {
-			my $nr;
+			my ($nr, $nxt);
 			# resolve multi-level references
-			while (defined(my $nxt = $m->{$x}->{reference})) {
+			while ($m && defined($nxt = $m->{$x}->{reference})) {
 				exists($todo->{$nxt}) or last;
 				die <<EOM if ++$nr > 1000;
 E: dependency loop detected (`$x' => `$nxt')
@@ -604,6 +585,7 @@ EOM
 			last; # restart %$todo iteration
 		}
 	}
+	reap_live() while keys(%$LIVE);
 }
 
 # FIXME: this gets confused by single inbox instance w/ global manifest.js.gz
@@ -656,7 +638,7 @@ sub try_manifest {
 			index($self->{cur_dst}, "\n") >= 0 and die <<EOM;
 E: `$self->{cur_dst}' must not contain newline
 EOM
-			clone_v2($self, \%v2_epochs, $m);
+			clone_v2_prep($self, \%v2_epochs, $m);
 			return if $self->{lei}->{child_error};
 		}
 	}
@@ -679,16 +661,16 @@ E: `$task->{cur_dst}' must not contain newline
 EOM
 			$task->{cur_src} .= '/';
 			my $dep = $task->{-ent}->{reference} // '';
-			push @{$self->{todo}->{$dep}}, $task;
+			push @{$self->{todo}->{$dep}}, $task; # for clone_all
 			$self->{any_want}->{$name} = 1;
 		}
 	}
 	delete local $lei->{opt}->{epoch} if defined($v2);
 	clone_all($self, $m);
-	reap_live() while keys(%$LIVE);
 	return if $self->{lei}->{child_error} || $self->{dry_run};
 
-	if (delete $self->{-culled_manifest}) { # set by clone_v2/-I/--exclude
+	# set by clone_v2_prep/-I/--exclude
+	if (delete $self->{-culled_manifest}) {
 		# write the smaller manifest if epochs were skipped so
 		# users won't have to delete manifest if they +w an
 		# epoch they no longer want to skip


* [PATCH 36/95] clone: support --inbox-version
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (34 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 35/95] lei_mirror: simplify v2 code paths Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 37/95] lei_mirror: require Perl v5.12+ Eric Wong
                   ` (58 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This is part of `lei add-external --mirror', and it makes
sense to have for development and testing.  We'll also add
a fallback in case somebody tries --inbox-version and fails
due to a newer remote instance of public-inbox.
---
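A small sketch of how an `inbox-version=i' spec behaves with
Getopt::Long, assuming the same 1-or-2 validation as do_mirror.
`pick_version' is a hypothetical helper and not part of
public-inbox-clone; it returns 'auto' when the switch is absent:

```perl
use v5.12;
use Getopt::Long qw(GetOptionsFromArray);

# pick_version is a hypothetical helper; returns 'auto', 1 or 2
sub pick_version {
	my (@argv) = @_;
	my $opt = {};
	GetOptionsFromArray(\@argv, $opt,
			qw(inbox-config=s inbox-version=i)) or
		die "bad command-line args\n";
	my $iv = $opt->{'inbox-version'} // return 'auto';
	($iv == 1 || $iv == 2) or die "bad --inbox-version=$iv\n";
	$iv;
}

say pick_version();
say pick_version('--inbox-version=2');
eval { pick_version('--inbox-version=3') };
print "E: $@" if $@;
```

The `=i' suffix makes Getopt::Long reject non-integer values on its
own, so only the range check is left to the caller.
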
 Documentation/lei-add-external.pod   |  4 +++-
 Documentation/public-inbox-clone.pod |  6 ++++++
 lib/PublicInbox/LeiMirror.pm         | 31 +++++++++++++++++-----------
 script/public-inbox-clone            |  2 +-
 4 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/Documentation/lei-add-external.pod b/Documentation/lei-add-external.pod
index 7afcad63..2a131b55 100644
--- a/Documentation/lei-add-external.pod
+++ b/Documentation/lei-add-external.pod
@@ -75,7 +75,9 @@ Default: C<auto>
 
 =item --inbox-version=NUM
 
-Force a public-inbox version (must be C<1> or C<2>).
+Force a remote public-inbox version (must be C<1> or C<2>).
+This is auto-detected by default, and this option exists mainly
+for testing.
 
 =back
 
diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 52c89cfd..1c31fbb3 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -76,6 +76,12 @@ no v1 inboxes are present.
 
 Default: C<always>
 
+=item --inbox-version=NUM
+
+Force a remote public-inbox version (must be C<1> or C<2>).
+This is auto-detected by default, and this option exists mainly
+for testing.
+
 =item -n
 
 =item --dry-run
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 18c825d3..c3512d43 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -43,7 +43,7 @@ sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 
 # for old installations without manifest.js.gz
 sub try_scrape {
-	my ($self) = @_;
+	my ($self, $fallback_manifest) = @_;
 	my $uri = URI->new($self->{src});
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
@@ -54,9 +54,17 @@ sub try_scrape {
 	close($fh) or return $lei->child_error($?, "@$cmd failed");
 
 	# we grep with URL below, we don't want Subject/From headers
-	# making us clone random URLs
+	# making us clone random URLs.  This assumes remote instances
+	# prior to public-inbox 1.7.0
+	# 5b96edcb1e0d8252 (www: move mirror instructions to /text/, 2021-08-28)
 	my @html = split(/<hr>/, $html);
 	my @urls = ($html[-1] =~ m!\bgit clone --mirror ([a-z\+]+://\S+)!g);
+	if (!@urls && $fallback_manifest) {
+		warn <<EOM;
+W: failed to extract URLs from $uri, trying manifest.js.gz...
+EOM
+		return start_clone_url($self);
+	}
 	my $url = $uri->as_string;
 	chop($url) eq '/' or die "BUG: $uri not canonicalized";
 
@@ -603,7 +611,6 @@ sub try_manifest {
 	my $cmd = $curl->for_uri($lei, $uri, '-R', '-o', $fn);
 	my %opt = map { $_ => $lei->{$_} } (0..2);
 	my $cerr = run_reap($lei, $cmd, \%opt);
-	local $LIVE;
 	if ($cerr) {
 		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
 		return $lei->child_error($cerr, "@$cmd failed");
@@ -698,15 +705,15 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 		$ic =~ /\A(?:v1|v2|always|never)\z/s or die <<"";
 --inbox-config must be one of `always', `v2', `v1', or `never'
 
-		my $iv = $lei->{opt}->{'inbox-version'};
-		if (defined $iv) {
-			local $LIVE;
-			return clone_v1($self) if $iv == 1;
-			return try_scrape($self) if $iv == 2;
-			die "bad --inbox-version=$iv\n";
-		}
-		return start_clone_url($self) if $self->{src} =~ m!://!;
-		die "TODO: cloning local directories not supported, yet";
+		local $LIVE;
+		my $iv = $lei->{opt}->{'inbox-version'} //
+			return start_clone_url($self);
+		return clone_v1($self) if $iv == 1;
+		die "bad --inbox-version=$iv\n" if $iv != 2;
+		die <<EOM if $self->{src} !~ m!://!;
+cloning local v2 inboxes not supported
+EOM
+		try_scrape($self, 1);
 	};
 	$lei->fail($@) if $@;
 }
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 3d980c97..2900f232 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -23,7 +23,7 @@ options:
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
-	inbox-config=s
+	inbox-config=s inbox-version=i
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config


* [PATCH 37/95] lei_mirror: require Perl v5.12+
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (35 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 36/95] clone: support --inbox-version Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 38/95] lei_mirror: ensure curl exits 22 on HTTP 404 responses Eric Wong
                   ` (57 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Another tiny step towards improving startup performance by
relying on Perl 5.12 strictness and avoiding strict.pm
---
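A quick way to see the strictness `use v5.12' implies, as a sketch
(the variable name is arbitrary): a string-eval assigning to an
undeclared variable fails to compile without any explicit
`use strict':

```perl
use v5.12;

# string-eval a snippet which assigns to an undeclared variable;
# strict 'vars' (implied by `use v5.12') rejects it at compile time
my $ok = eval q{
	use v5.12;
	$undeclared = 1;
	1;
};
my $err = $@;
say $ok ? 'compiled (strict is off)' : "rejected: $err";
```
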
 lib/PublicInbox/LeiMirror.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index c3512d43..279ce30e 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -3,8 +3,7 @@
 
 # "lei add-external --mirror" support (also "public-inbox-clone");
 package PublicInbox::LeiMirror;
-use strict;
-use v5.10.1;
+use v5.12;
 use parent qw(PublicInbox::IPC);
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
 use IO::Compress::Gzip qw(gzip $GzipError);


* [PATCH 38/95] lei_mirror: ensure curl exits 22 on HTTP 404 responses
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (36 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 37/95] lei_mirror: require Perl v5.12+ Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 39/95] lei_mirror: cleanup File::Temp OO usage Eric Wong
                   ` (56 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Oops, this is actually a long-standing bug :x
---
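For context, curl only exits 22 on HTTP error responses such as 404
when -f (--fail) is given; without it, curl exits 0 and saves the error
page.  The sketch below shows how the `($cerr >> 8) == 22' test in
try_manifest decodes a wait status.  A child perl exiting 22 stands in
for curl here, so no network access is assumed:

```perl
use v5.12;

# stand-in for `curl -f' hitting a 404: a child perl which exits 22
system($^X, '-e', 'exit 22');
my $cerr = $?;         # raw wait status, like $cerr in try_manifest
my $code = $cerr >> 8; # exit code lives in the high byte
my $sig = $cerr & 127; # non-zero if the child died from a signal

if ($sig == 0 && $code == 22) {
	say 'HTTP error from server, fall back to scraping';
} elsif ($cerr) {
	say "child failed: code=$code sig=$sig";
} else {
	say 'success';
}
```
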
 lib/PublicInbox/LeiMirror.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 279ce30e..acf08665 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -46,7 +46,7 @@ sub try_scrape {
 	my $uri = URI->new($self->{src});
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
-	my $cmd = $curl->for_uri($lei, $uri, '--compressed');
+	my $cmd = $curl->for_uri($lei, $uri, qw(-f --compressed));
 	my $opt = { 0 => $lei->{0}, 2 => $lei->{2} };
 	my $fh = popen_rd($cmd, undef, $opt);
 	my $html = do { local $/; <$fh> } // die "read(curl $uri): $!";
@@ -132,7 +132,7 @@ sub _get_txt_start { # non-fatal
 	my $f = (split(m!/!, $endpoint))[-1];
 	my $ft = File::Temp->new(TEMPLATE => "$f-XXXX", TMPDIR => 1);
 	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
-	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(--compressed -R -o),
+	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(-f --compressed -R -o),
 					$ft->filename);
 	my $jobs = $lei->{opt}->{jobs} // 1;
 	reap_live() while keys(%$LIVE) >= $jobs;
@@ -607,7 +607,7 @@ sub try_manifest {
 	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX',
 				UNLINK => 1, TMPDIR => 1, SUFFIX => '.tmp');
 	my $fn = $ft->filename;
-	my $cmd = $curl->for_uri($lei, $uri, '-R', '-o', $fn);
+	my $cmd = $curl->for_uri($lei, $uri, qw(-f -R -o), $fn);
 	my %opt = map { $_ => $lei->{$_} } (0..2);
 	my $cerr = run_reap($lei, $cmd, \%opt);
 	if ($cerr) {


* [PATCH 39/95] lei_mirror: cleanup File::Temp OO usage
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (37 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 38/95] lei_mirror: ensure curl exits 22 on HTTP 404 responses Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 40/95] lei_mirror: add `index' target to generated Makefile Eric Wong
                   ` (55 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

There's no need to capture or rely on the File::Temp->filename
in most cases since most Perl functions accept file handles all
the same.
---
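A short sketch of the handle-based usage this patch moves towards (the
template name is arbitrary and the snippet is not from LeiMirror.pm).
stat, chmod, seek and truncate all take the File::Temp object
directly, and the object stringifies to its path where a name is
still required:

```perl
use v5.12;
use File::Temp;
use Fcntl qw(SEEK_SET);

my $ft = File::Temp->new(TEMPLATE => 'demo-XXXX', TMPDIR => 1);
print $ft "hello\n" or die "print($ft): $!";
$ft->flush or die "flush($ft): $!";

my $size = (stat($ft))[7]; # fstat(2) via the handle, no filename
chmod(0600, $ft) or die "chmod($ft): $!"; # fchmod(2) where available
seek($ft, 0, SEEK_SET) or die "seek($ft): $!"; # (FH, POSITION, WHENCE)
truncate($ft, 0) or die "truncate($ft): $!";
my $empty = (stat($ft))[7];

# the object stringifies to its path for APIs which need a name
my $name = "$ft";
say "size=$size empty=$empty same_name=",
	($name eq $ft->filename ? 1 : 0);
```

Functions which only accept a path, such as utime on older Perls or
File::Copy::mv, can still be given "$ft" or $ft->filename.
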
 lib/PublicInbox/LeiMirror.pm | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index acf08665..3bc19d2f 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -113,12 +113,11 @@ sub clone_cmd {
 
 sub ft_rename ($$$) {
 	my ($ft, $dst, $open_mode) = @_;
-	my $fn = $ft->filename;
 	my @st = stat($dst);
 	my $mode = @st ? ($st[2] & 07777) : ($open_mode & ~umask);
-	chmod($mode, $ft) or croak "E: chmod $fn: $!";
+	chmod($mode, $ft) or croak "E: chmod($ft): $!";
 	require File::Copy;
-	File::Copy::mv($fn, $dst) or croak "E: mv($fn => $ft): $!";
+	File::Copy::mv($ft->filename, $dst) or croak "E: mv($ft => $dst): $!";
 	$ft->unlink_on_destroy(0);
 }
 
@@ -606,15 +605,14 @@ sub try_manifest {
 	$uri->path($path . '/manifest.js.gz');
 	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX',
 				UNLINK => 1, TMPDIR => 1, SUFFIX => '.tmp');
-	my $fn = $ft->filename;
-	my $cmd = $curl->for_uri($lei, $uri, qw(-f -R -o), $fn);
+	my $cmd = $curl->for_uri($lei, $uri, qw(-f -R -o), $ft->filename);
 	my %opt = map { $_ => $lei->{$_} } (0..2);
 	my $cerr = run_reap($lei, $cmd, \%opt);
 	if ($cerr) {
 		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
 		return $lei->child_error($cerr, "@$cmd failed");
 	}
-	my $m = eval { decode_manifest($ft, $fn, $uri) };
+	my $m = eval { decode_manifest($ft, $ft, $uri) };
 	if ($@) {
 		warn $@;
 		return try_scrape($self);
@@ -681,9 +679,12 @@ EOM
 		# users won't have to delete manifest if they +w an
 		# epoch they no longer want to skip
 		my $json = PublicInbox::Config->json->encode($m);
-		my $mtime = (stat($fn))[9];
-		gzip(\$json => $fn) or die "gzip: $GzipError";
-		utime($mtime, $mtime, $fn) or die "utime(..., $fn): $!";
+		my $mtime = (stat($ft))[9];
+		seek($ft, SEEK_SET, 0) or die "seek($ft): $!";
+		truncate($ft, 0) or die "truncate($ft): $!";
+		gzip(\$json => $ft) or die "gzip($ft): $GzipError";
+		$ft->flush or die "flush($ft): $!";
+		utime($mtime, $mtime, "$ft") or die "utime(..., $ft): $!";
 	}
 	ft_rename($ft, "$self->{dst}/manifest.js.gz", 0666);
 	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait


* [PATCH 40/95] lei_mirror: add `index' target to generated Makefile
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (38 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 39/95] lei_mirror: cleanup File::Temp OO usage Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 41/95] lei_mirror: do not write Makefile for --inbox-config=never Eric Wong
                   ` (54 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

An explicit `index' target is probably a useful hint for the
initial index after a clone, so users aren't misled into always
using `--reindex'.
---
 lib/PublicInbox/LeiMirror.pm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 3bc19d2f..4b6da260 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -763,6 +763,7 @@ help :
 	@echo Rarely needed targets:
 	@echo '    make reindex      - may be needed for new features/bugfixes'
 	@echo '    make compact      - rewrite Xapian storage to save space'
+	@echo '    make index        - initial index after clone'
 
 fetch :
 	public-inbox-fetch
@@ -779,12 +780,14 @@ update :
 		echo 'public-inbox index not initialized'; \
 		echo 'see public-inbox-index(1) man page'; \
 	fi
+index :
+	public-inbox-index
 reindex :
 	public-inbox-index --reindex
 compact :
 	public-inbox-compact
 
-.PHONY : help fetch update reindex compact
+.PHONY : help fetch update index reindex compact
 EOM
 		close $fh or die "close($f): $!";
 	} else {


* [PATCH 41/95] lei_mirror: do not write Makefile for --inbox-config=never
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (39 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 40/95] lei_mirror: add `index' target to generated Makefile Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 42/95] lei_mirror: hoist out dump_manifest sub Eric Wong
                   ` (53 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We want to be able to clone non-inbox git repos, too.
---
 lib/PublicInbox/LeiMirror.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 4b6da260..b0e6fa53 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -386,7 +386,8 @@ sub v1_done { # called via OnDestroy
 			die "rename($f, $o/info/alternates): $!";
 		$f->unlink_on_destroy(0);
 	}
-	return if $self->{-is_epoch};
+	return if ($self->{-is_epoch} ||
+		$self->{lei}->{opt}->{'inbox-config'} ne 'always');
 	write_makefile($dst, 1);
 	index_cloned_inbox($self, 1);
 }


* [PATCH 42/95] lei_mirror: hoist out dump_manifest sub
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (40 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 41/95] lei_mirror: do not write Makefile for --inbox-config=never Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 43/95] lei_mirror: avoid convoluted lazy_cb usage Eric Wong
                   ` (52 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can reuse it in PublicInbox::Fetch, too.
---
 lib/PublicInbox/Fetch.pm     | 10 +---------
 lib/PublicInbox/LeiMirror.pm | 27 +++++++++++++++------------
 2 files changed, 16 insertions(+), 21 deletions(-)
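
Note (illustration only, not part of this commit): a rough standalone
sketch of what such a helper does, using core modules only.  JSON::PP
stands in for PublicInbox::Config->json, and the sub name, template and
manifest entry are hypothetical:

```perl
use strict;
use warnings;
use File::Temp ();
use Fcntl qw(SEEK_SET);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use JSON::PP ();

# hypothetical stand-in for dump_manifest: replace the contents of an
# already-open temp file with gzipped JSON, keeping the original mtime
sub rewrite_gz_json {
	my ($data, $ft) = @_;
	my $json = JSON::PP->new->canonical->encode($data);
	my $mtime = (stat($ft))[9];
	seek($ft, 0, SEEK_SET) or die "seek($ft): $!";
	truncate($ft, 0) or die "truncate($ft): $!";
	gzip(\$json => $ft) or die "gzip($ft): $GzipError";
	$ft->flush or die "flush($ft): $!";
	utime($mtime, $mtime, "$ft") or die "utime(..., $ft): $!";
}

# demo: pretend a download left an old timestamp on the file
my $ft = File::Temp->new(TEMPLATE => 'manifest-XXXX', TMPDIR => 1);
print $ft "placeholder for the downloaded manifest\n" or die "print: $!";
$ft->flush or die "flush: $!";
my $old = 1000000000; # arbitrary timestamp for the demo
utime($old, $old, "$ft") or die "utime: $!";

rewrite_gz_json({ '/example.git' => { modified => $old } }, $ft);

# read it back by name to check the round trip
my $out;
gunzip("$ft" => \$out) or die "gunzip: $GunzipError";
my $round_trip = JSON::PP->new->decode($out);
my $mtime_after = (stat("$ft"))[9];
```

Restoring the mtime after the rewrite matters because the flush itself
bumps it; the sketch sets it back by name once all writes are done.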

diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
index 3b6aa389..06ed775f 100644
--- a/lib/PublicInbox/Fetch.pm
+++ b/lib/PublicInbox/Fetch.pm
@@ -12,8 +12,6 @@ use PublicInbox::LEI;
 use PublicInbox::LeiCurl;
 use PublicInbox::LeiMirror;
 use File::Temp ();
-use PublicInbox::Config;
-use IO::Compress::Gzip qw(gzip $GzipError);
 
 sub new { bless {}, __PACKAGE__ }
 
@@ -233,13 +231,7 @@ EOM
 	}
 	for my $i (@new_epoch) { $mg->epoch_cfg_set($i) }
 	if ($ft) {
-		if ($mculled) {
-			my $json = PublicInbox::Config->json->encode($m1);
-			my $fn = $ft->filename;
-			my $mtime = (stat($fn))[9];
-			gzip(\$json => $fn) or die "gzip: $GzipError";
-			utime($mtime, $mtime, $fn) or die "utime(..., $fn): $!";
-		}
+		PublicInbox::LeiMirror::dump_manifest($m1 => $ft) if $mculled;
 		PublicInbox::LeiMirror::ft_rename($ft, $mf, 0666);
 	}
 	$lei->child_error($xit << 8) if $fp2 && $xit;
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index b0e6fa53..0df37724 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -595,6 +595,20 @@ EOM
 	reap_live() while keys(%$LIVE);
 }
 
+sub dump_manifest ($$) {
+	my ($m, $ft) = @_;
+	# write the smaller manifest if epochs were skipped so
+	# users won't have to delete manifest if they +w an
+	# epoch they no longer want to skip
+	my $json = PublicInbox::Config->json->encode($m);
+	my $mtime = (stat($ft))[9];
+	seek($ft, SEEK_SET, 0) or die "seek($ft): $!";
+	truncate($ft, 0) or die "truncate($ft): $!";
+	gzip(\$json => $ft) or die "gzip($ft): $GzipError";
+	$ft->flush or die "flush($ft): $!";
+	utime($mtime, $mtime, "$ft") or die "utime(..., $ft): $!";
+}
+
 # FIXME: this gets confused by single inbox instance w/ global manifest.js.gz
 sub try_manifest {
 	my ($self) = @_;
@@ -675,18 +689,7 @@ EOM
 	return if $self->{lei}->{child_error} || $self->{dry_run};
 
 	# set by clone_v2_prep/-I/--exclude
-	if (delete $self->{-culled_manifest}) {
-		# write the smaller manifest if epochs were skipped so
-		# users won't have to delete manifest if they +w an
-		# epoch they no longer want to skip
-		my $json = PublicInbox::Config->json->encode($m);
-		my $mtime = (stat($ft))[9];
-		seek($ft, SEEK_SET, 0) or die "seek($ft): $!";
-		truncate($ft, 0) or die "truncate($ft): $!";
-		gzip(\$json => $ft) or die "gzip($ft): $GzipError";
-		$ft->flush or die "flush($ft): $!";
-		utime($mtime, $mtime, "$ft") or die "utime(..., $ft): $!";
-	}
+	dump_manifest($m => $ft) if delete $self->{-culled_manifest};
 	ft_rename($ft, "$self->{dst}/manifest.js.gz", 0666);
 	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }


* [PATCH 43/95] lei_mirror: avoid convoluted lazy_cb usage
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (41 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 42/95] lei_mirror: hoist out dump_manifest sub Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 44/95] lei_mirror: simplify clone_v2_prep Eric Wong
                   ` (51 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

lazy_cb should only be used for lei command dispatch and
completion callbacks when the method isn't known at startup.
There's zero reason to use it when the method is known
ahead-of-time, especially when there's a comment pointing
reviewers towards the only possible method it can dispatch.
---
 lib/PublicInbox/LEI.pm       | 2 +-
 lib/PublicInbox/LeiMirror.pm | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)
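
Note (illustration only, not part of this commit): a loose imitation of
name-based dispatch next to a direct call.  This is not the real
lazy_cb body; the Demo:: package and lookup_cb are hypothetical and only
show why the direct form is easier to review:

```perl
use strict;
use warnings;

package Demo::AddExternal { # hypothetical stand-in module
	sub _finish_add_external {
		my ($lei, $dst) = @_;
		"finished $dst";
	}
}

# name-based lookup: the target is only known by assembling strings
sub lookup_cb {
	my ($cmd, $pfx) = @_;
	(my $ucmd = $cmd) =~ tr/-/_/;
	my $pkg = 'Demo::' . join('', map { ucfirst($_) } split(/_/, $ucmd));
	$pkg->can($pfx.$ucmd);
}

my $lei = {}; # placeholder object for the demo
my $indirect = lookup_cb('add-external', '_finish_')->($lei, '/tmp/dst');

# direct call: greppable, and a typo fails at the call site
my $direct = Demo::AddExternal::_finish_add_external($lei, '/tmp/dst');
```

Both calls reach the same sub; the indirect one only pays off when the
command name really is unknown until runtime.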

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index f3e80113..8a14ace4 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -787,7 +787,7 @@ EOM
 	}
 }
 
-sub lazy_cb ($$$) {
+sub lazy_cb ($$$) { # $pfx is _complete_ or lei_
 	my ($self, $cmd, $pfx) = @_;
 	my $ucmd = $cmd;
 	$ucmd =~ tr/-/_/;
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 0df37724..b2745295 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -31,9 +31,9 @@ sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 		warn("unlink($f): $!\n") unless $!{ENOENT};
 	} else {
 		if (!$mrr->{dry_run} && $lei->{cmd} ne 'public-inbox-clone') {
-			# calls _finish_add_external
-			$lei->lazy_cb('add-external', '_finish_'
-					)->($lei, $mrr->{dst});
+			require PublicInbox::LeiAddExternal;
+			PublicInbox::LeiAddExternal::_finish_add_external(
+							$lei, $mrr->{dst});
 		}
 		$lei->qerr("# mirrored $mrr->{src} => $mrr->{dst}");
 	}


* [PATCH 44/95] lei_mirror: simplify clone_v2_prep
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (42 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 43/95] lei_mirror: avoid convoluted lazy_cb usage Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 45/95] lei_mirror: support --objstore and forkgroups Eric Wong
                   ` (50 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Since everything relies on the instance-specific {todo} queue,
there's no need to have sub-specific queues.
---
 lib/PublicInbox/LeiMirror.pm | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)
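
Note (illustration only, not part of this commit): a small sketch of
queueing per-epoch jobs straight into one todo hash keyed by reference,
with '' holding jobs that have no prerequisite.  The paths, the %epochs
layout and the key names other than {reference} are made up:

```perl
use strict;
use warnings;

# hypothetical manifest entries; `reference' names a repo to borrow from
my %epochs = (
	0 => { path => '/demo/git/0.git' },
	1 => { path => '/demo/git/1.git', reference => '/demo/git/0.git' },
	2 => { path => '/demo/git/2.git', reference => '/demo/git/0.git' },
);
my $task = { dst => '/tmp/demo' }; # shared settings copied into each job
my %todo; # reference => [ jobs ]

for my $nr (sort { $a <=> $b } keys %epochs) {
	my $ent = $epochs{$nr};
	my $etask = { %$task, -ent => $ent, cur_dst => "$task->{dst}/$nr.git" };
	my $ref = $ent->{reference} // '';
	push @{$todo{$ref}}, $etask; # autovivifies the per-reference list
}

my @ready = map { $_->{cur_dst} } @{$todo{''}};
my @waiting = map { $_->{cur_dst} } @{$todo{'/demo/git/0.git'}};
```

Building each job inside the loop removes the need for parallel arrays
which have to be kept in sync and re-checked later.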

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index b2745295..de4cdc22 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -433,7 +433,8 @@ sub clone_v2_prep ($$;$) {
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
 	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
 	delete $task->{todo}; # $self->{todo} still exists
-	my (@src_edst, @skip, $desc, @entv);
+	my (@skip, $desc);
+	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 	for my $nr (sort { $a <=> $b } keys %$v2_epochs) {
 		my ($uri, $key) = @{$v2_epochs->{$nr}};
 		my $src = $uri->as_string;
@@ -453,8 +454,13 @@ failed to extract epoch number from $src
 			}
 		}
 		if (!$want || $want->{$nr}) {
-			push @src_edst, $src, $edst;
-			push @entv, $edst, $ent;
+			my $etask = bless { %$task }, __PACKAGE__;
+			$etask->{-ent} = $ent; # may have {reference}
+			$etask->{cur_src} = $src;
+			$etask->{cur_dst} = $edst;
+			$etask->{-is_epoch} = $fini;
+			my $ref = $ent->{reference} // '';
+			push @{$self->{todo}->{$ref}}, $etask;
 			$self->{any_want}->{$key} = 1;
 		} else { # create a placeholder so users only need to chmod +w
 			init_placeholder($src, $edst, $ent);
@@ -467,24 +473,11 @@ failed to extract epoch number from $src
 
 	(!$self->{dry_run} && !-d $dst) and File::Path::mkpath($dst);
 
-	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
-
 	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s and
 		_get_txt_start($task, '_/text/config/raw', $fini);
 
 	defined($desc) ? ($task->{'txt.description'} = $desc) :
 		_get_txt_start($task, 'description', $fini);
-	while (@entv) {
-		my ($edst, $ent) = splice(@entv, 0, 2);
-		my $etask = bless { %$task }, __PACKAGE__;
-		$etask->{-ent} = $ent; # may have {reference}
-		$etask->{cur_src} = shift @src_edst // die 'BUG: no cur_src';
-		$etask->{cur_dst} = shift @src_edst // die 'BUG: no cur_dst';
-		$etask->{cur_dst} eq $edst or
-			die "BUG: `$etask->{cur_dst}' != `$edst'";
-		$etask->{-is_epoch} = $fini;
-		push @{$self->{todo}->{($ent->{reference} // '')}}, $etask;
-	}
 }
 
 sub decode_manifest ($$$) {


* [PATCH 45/95] lei_mirror: support --objstore and forkgroups
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (43 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 44/95] lei_mirror: simplify clone_v2_prep Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 46/95] lei_mirror: cleanup process reaping logic Eric Wong
                   ` (49 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

The {forkgroup} directive of grokmirror 2.x manifest.js.gz
can facilitate more space savings and improved pack performance
with delta islands (pack.island and repack.useDeltaIslands).
---
 lib/PublicInbox/Fetch.pm     |  15 +--
 lib/PublicInbox/LeiMirror.pm | 200 +++++++++++++++++++++++++++++++++--
 script/public-inbox-clone    |   3 +-
 3 files changed, 194 insertions(+), 24 deletions(-)
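
Note (illustration only, not part of this commit): a sketch of the ref
comparison which decides between update, delete and create once the
objstore has been fetched.  It only computes [ op, ref, oid ] tuples and
does not talk to git or emit any `update-ref --stdin' input; the sub
name, refs and abbreviated object IDs are made up, and keys are sorted
here purely to make the result predictable:

```perl
use strict;
use warnings;

# Compare refs fetched into the objstore (%$src) against the refs of the
# per-project repo (%$dst); both map refname => object ID.
sub ref_diff {
	my ($src, $dst) = @_;
	my %src = %$src; # copy, entries are consumed below
	my @ops;
	for my $ref (sort keys %$dst) {
		my $new = delete $src{$ref};
		my $old = $dst->{$ref};
		if (defined $new) {
			push @ops, [ 'update', $ref, $new ] if $new ne $old;
		} else { # gone upstream
			push @ops, [ 'delete', $ref, $old ];
		}
	}
	# whatever remains only exists upstream
	for my $ref (sort keys %src) {
		push @ops, [ 'create', $ref, $src{$ref} ];
	}
	@ops;
}

my @ops = ref_diff({
	'refs/heads/master' => 'bbbb',
	'refs/heads/new' => 'cccc',
	'refs/tags/v1' => 'dddd',
}, {
	'refs/heads/master' => 'aaaa',
	'refs/heads/stale' => 'eeee',
	'refs/tags/v1' => 'dddd',
});
my @lines = map { join(' ', @$_) } @ops;
```

Unchanged refs (refs/tags/v1 in the demo) produce no operation at all,
which keeps the amount of work proportional to what actually changed.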

diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
index 06ed775f..3dbb0b55 100644
--- a/lib/PublicInbox/Fetch.pm
+++ b/lib/PublicInbox/Fetch.pm
@@ -15,19 +15,6 @@ use File::Temp ();
 
 sub new { bless {}, __PACKAGE__ }
 
-sub fetch_args ($$) {
-	my ($lei, $opt) = @_;
-	my @cmd; # (git --git-dir=...) to be added by caller
-	$opt->{$_} = $lei->{$_} for (0..2);
-	# we support "-c $key=$val" for arbitrary git config options
-	# e.g.: git -c http.proxy=socks5h://127.0.0.1:9050
-	push(@cmd, '-c', $_) for @{$lei->{opt}->{c} // []};
-	push @cmd, 'fetch';
-	push @cmd, '-q' if $lei->{opt}->{quiet};
-	push @cmd, '-v' if $lei->{opt}->{verbose};
-	@cmd;
-}
-
 sub remote_url ($$) {
 	my ($lei, $dir) = @_;
 	my $rn = $lei->{opt}->{'try-remote'} // [ 'origin', '_grokmirror' ];
@@ -205,7 +192,7 @@ EOM
 		if (-d $d) {
 			$fp2->[0] = get_fingerprint2($d) if $fp2;
 			$cmd = [ @$torsocks, 'git', "--git-dir=$d",
-				fetch_args($lei, $opt) ];
+			       PublicInbox::LeiMirror::fetch_args($lei, $opt)];
 		} else {
 			my $e_uri = $ibx_uri->clone;
 			my ($epath) = ($d =~ m!(/git/[0-9]+\.git)\z!);
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index de4cdc22..799939b5 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -18,8 +18,10 @@ use PublicInbox::Config;
 use PublicInbox::Inbox;
 use PublicInbox::LeiCurl;
 use PublicInbox::OnDestroy;
+use Digest::SHA qw(sha256_hex);
 
 our $LIVE; # pid => callback
+my $update_ref_stdin = $ENV{GIT_CAN_UPDATE_REF_STDIN} // 1;
 
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
@@ -249,7 +251,173 @@ sub start_clone {
 	reap_live() while keys(%$LIVE) >= $jobs;
 	$self->{lei}->qerr("# @$cmd");
 	return if $self->{dry_run};
-	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_clone, $self, $cmd, $fini ]
+	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $self, $cmd, $fini ]
+}
+
+sub fetch_args ($$) {
+	my ($lei, $opt) = @_;
+	my @cmd; # (git --git-dir=...) to be added by caller
+	$opt->{$_} = $lei->{$_} for (0..2);
+	# we support "-c $key=$val" for arbitrary git config options
+	# e.g.: git -c http.proxy=socks5h://127.0.0.1:9050
+	push(@cmd, '-c', $_) for @{$lei->{opt}->{c} // []};
+	push @cmd, 'fetch';
+	push @cmd, '-q' if $lei->{opt}->{quiet} ||
+			($lei->{opt}->{jobs} // 1) > 1;
+	push @cmd, '-v' if $lei->{opt}->{verbose};
+	@cmd;
+}
+
+sub fgrp_update_old ($) { # for git <1.8.5
+	my ($fgrp) = @_;
+	my $cmd = [ 'git', "--git-dir=$fgrp->{cur_dst}",
+		fetch_args($fgrp->{lei}, my $opt = {}) ];
+	$fgrp->{lei}->qerr("# @$cmd");
+	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $fgrp, $cmd ];
+	my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
+	reap_live() while keys(%$LIVE) >= $jobs;
+}
+
+sub upr { # feed `git update-ref --stdin -z' verbosely
+	my ($fgrp, $w, $op, $ref, $oid) = @_;
+	$fgrp->{lei}->qerr("# $op $ref $oid");
+	print $w "$op $ref\0$oid\0" or die "print(w): $!";
+}
+
+sub fgrp_update {
+	my ($fgrp) = @_;
+	my $srcfh = delete $fgrp->{srcfh} or return;
+	my $dstfh = delete $fgrp->{dstfh} or return;
+	seek($srcfh, SEEK_SET, 0) or die "seek(src): $!";
+	seek($dstfh, SEEK_SET, 0) or die "seek(dst): $!";
+	my %src = map { chomp; split(/\0/) } (<$srcfh>);
+	close $srcfh;
+	my %dst = map { chomp; split(/\0/) } (<$dstfh>);
+	close $dstfh;
+	pipe(my ($r, $w)) or die "pipe: $!";
+	my $cmd = [ 'git', "--git-dir=$fgrp->{cur_dst}",
+		qw(update-ref --stdin -z) ];
+	$fgrp->{lei}->qerr("# @$cmd");
+	my $opt = { 0 => $r, 1 => $fgrp->{lei}->{1}, 2 => $fgrp->{lei}->{2} };
+	my $pid = spawn($cmd, undef, $opt);
+	close $r or die "close(r): $!";
+	for my $ref (keys %dst) {
+		my $new = delete $src{$ref};
+		my $old = $dst{$ref};
+		if (defined $new) {
+			upr($fgrp, $w, 'update', $ref, $new) if $new ne $old;
+		} else {
+			upr($fgrp, $w, 'delete', $ref, $old);
+		}
+	}
+	while (my ($ref, $oid) = each %src) {
+		upr($fgrp, $w, 'create', $ref, $oid);
+	}
+	if (close($w)) { # git >= 1.8.5
+		$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd ];
+		my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
+		reap_live() while keys(%$LIVE) >= $jobs;
+	} else { # git <1.8.5 w/o update-ref --stdin
+		warn "E: close(update-ref --stdin): $!\n";
+		$update_ref_stdin = 0;
+		waitpid($pid, 0) // die "waitpid(update-ref --stdin): $!";
+		fgrp_update_old($fgrp);
+	}
+}
+
+sub fgrp_fetched {
+	my ($fgrp) = @_;
+	return if $fgrp->{dry_run} || !$LIVE;
+	my $rn = $fgrp->{-remote};
+	my %opt = map { $_ => $fgrp->{lei}->{$_} } (0..2);
+	my $cmd = [ 'git', "--git-dir=$fgrp->{-osdir}",
+			qw(pack-refs --all --prune) ];
+	$fgrp->{lei}->qerr("# @$cmd");
+	$LIVE->{spawn($cmd, undef, \%opt)} = [ \&reap_cmd, $fgrp, $cmd ];
+	my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
+	reap_live() while keys(%$LIVE) >= $jobs;
+
+	$update_ref_stdin or return fgrp_update_old($fgrp);
+
+	my $update_ref = PublicInbox::OnDestroy->new($$, \&fgrp_update, $fgrp);
+
+	my $src = [ 'git', "--git-dir=$fgrp->{-osdir}", 'for-each-ref',
+		"--format=refs/%(refname:lstrip=3)%00%(objectname)",
+		"refs/remotes/$rn/" ];
+	open($fgrp->{srcfh}, '+>', undef) or die "open(src): $!";
+	$fgrp->{lei}->qerr("# @$src >SRC");
+	my $pid = spawn($src, undef, { %opt, 1 => $fgrp->{srcfh} });
+	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $src, $update_ref ];
+	reap_live() while keys(%$LIVE) >= $jobs;
+
+	my $dst = [ 'git', "--git-dir=$fgrp->{cur_dst}", 'for-each-ref',
+		'--format=%(refname)%00%(objectname)' ];
+	open($fgrp->{dstfh}, '+>', undef) or die "open(dst): $!";
+	$fgrp->{lei}->qerr("# @$dst >DST");
+	$pid = spawn($dst, undef, { %opt, 1 => $fgrp->{dstfh} });
+	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $dst, $update_ref ];
+	reap_live() while keys(%$LIVE) >= $jobs;
+}
+
+sub fgrp_fetch {
+	my ($fgrp, $pfx, $fini) = @_;
+	my $cmd = [ @$pfx, 'git', "--git-dir=$fgrp->{-osdir}",
+			fetch_args($fgrp->{lei}, my $opt = {}),
+			$fgrp->{-remote} ];
+	$fgrp->{-fini} = $fini;
+	my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
+	reap_live() while keys(%$LIVE) >= $jobs;
+	$fgrp->{lei}->qerr("# @$cmd");
+	return if $fgrp->{dry_run};
+	my $fgrp_fini = PublicInbox::OnDestroy->new($$, \&fgrp_fetched, $fgrp);
+	my $pid = spawn($cmd, undef, $opt);
+	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd, $fgrp_fini ];
+}
+
+# keep this idempotent for future use by public-inbox-fetch
+sub forkgroup_prep {
+	my ($self, $uri) = @_;
+	$self->{-ent} // return;
+	my $os = $self->{-objstore} // return;
+	my $fg = $self->{-ent}->{forkgroup} // return;
+	my $dir = "$os/$fg.git";
+	my @cmd = ('git', "--git-dir=$dir", 'config');
+	my $opt = +{ map { $_ => $self->{lei}->{$_} } (0..2) };
+	if (!-d $dir) {
+		PublicInbox::Import::init_bare($dir);
+		for ('repack.useDeltaIslands=true',
+				'pack.island=refs/remotes/([^/]+)/') {
+			run_die([@cmd, split(/=/, $_, 2)], undef, $opt);
+		}
+	}
+	my $key = $self->{-key} // die 'BUG: no -key';
+	my ($bn) = ($key =~ m{/([a-z0-9_,;=!\+\{\}\|][^/]*)(?:\.git)?\z}i);
+	my $rn = "$bn-".substr(sha256_hex($key), 0, 16);
+	for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*") {
+		my @kv = split(/=/, $_, 2);
+		$kv[0] = "remote.$rn.$kv[0]";
+		run_die([@cmd, @kv], undef, $opt);
+	}
+	if (!-d $self->{cur_dst}) {
+		my $alt = File::Spec->rel2abs("$dir/objects");
+		PublicInbox::Import::init_bare($self->{cur_dst});
+		my $o = "$self->{cur_dst}/objects";
+		my $f = "$o/info/alternates";
+		my $l = File::Spec->abs2rel($alt, File::Spec->rel2abs($o));
+		open my $fh, '+>>', $f or die "open($f): $!";
+		seek($fh, SEEK_SET, 0) or die "seek($f): $!";
+		chomp(my @cur = <$fh>);
+		if (!grep(/\A\Q$l\E\z/, @cur)) {
+			say $fh $l or die "say($f): $!";
+		}
+		close $fh or die "close($f): $!";
+		@cmd = ('git', "--git-dir=$self->{cur_dst}",
+			qw(remote add --mirror=fetch origin), "$uri");
+		my $pid = spawn(\@cmd, undef, $opt);
+		waitpid($pid, 0) // die "waitpid(@cmd): $!";
+		die "E: @cmd: \$?=$?" if ($? && ($? >> 8) != 3);
+	}
+	bless { %$self, -osdir => $dir, -remote => $rn }, __PACKAGE__;
 }
 
 sub clone_v1 {
@@ -263,10 +431,15 @@ sub clone_v1 {
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
 	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}), "$uri", $dst ];
-	my $ref = $self->{-ent} ? $self->{-ent}->{reference} : undef;
-	defined($ref) && -e "$self->{dst}$ref" and
-		push @$cmd, '--reference', "$self->{dst}$ref";
-	start_clone($self, $cmd, $opt, $fini);
+	my $fgrp = forkgroup_prep($self, $uri);
+	if (!defined($fgrp) && defined($self->{-ent})) {
+		if (defined(my $ref = $self->{-ent}->{reference})) {
+			-e "$self->{dst}$ref" and
+				push @$cmd, '--reference', "$self->{dst}$ref";
+		}
+	}
+	$fgrp ? fgrp_fetch($fgrp, $pfx, $fini) :
+		start_clone($self, $cmd, $opt, $fini);
 
 	if (!$self->{-is_epoch} && $lei->{opt}->{'inbox-config'} =~
 				/\A(?:always|v1)\z/s) {
@@ -357,7 +530,7 @@ EOM
 	}
 }
 
-sub reap_clone { # async, called via SIGCHLD
+sub reap_cmd { # async, called via SIGCHLD
 	my ($self, $cmd) = @_;
 	my $cerr = $?;
 	$? = 0; # don't let it influence normal exit
@@ -377,8 +550,12 @@ sub v1_done { # called via OnDestroy
 	}
 	my $o = "$dst/objects";
 	if (open(my $fh, '<', "$o/info/alternates")) {
+		my $base = File::Spec->rel2abs($o);
 		chomp(my @l = <$fh>);
-		for (@l) { $_ = File::Spec->abs2rel($_, $o)."\n" }
+		for (@l) {
+			$_ = File::Spec->abs2rel($_, $base) if m!\A/!;
+			$_ .= "\n";
+		}
 		my $f = File::Temp->new(TEMPLATE => '.XXXX', DIR => "$o/info");
 		print $f @l;
 		$f->flush or die "flush($f): $!";
@@ -417,7 +594,7 @@ sub reap_live {
 	my $pid = waitpid(-1, 0) // die "waitpid(-1): $!";
 	if (my $x = delete $LIVE->{$pid}) {
 		my $cb = shift @$x;
-		$cb->(@$x);
+		$cb->(@$x) if $cb;
 	} else {
 		warn "reaped unknown PID=$pid ($?)\n";
 	}
@@ -454,7 +631,7 @@ failed to extract epoch number from $src
 			}
 		}
 		if (!$want || $want->{$nr}) {
-			my $etask = bless { %$task }, __PACKAGE__;
+			my $etask = bless { %$task, -key => $key }, __PACKAGE__;
 			$etask->{-ent} = $ent; # may have {reference}
 			$etask->{cur_src} = $src;
 			$etask->{cur_dst} = $edst;
@@ -664,6 +841,7 @@ EOM
 					die("BUG: no `$name' in manifest");
 			$task->{cur_src} = "$uri";
 			$task->{cur_dst} = $task->{dst};
+			$task->{-key} = $name;
 			if ($n > 1) {
 				$task->{cur_dst} .= $name;
 				$task->{cur_src} .= $name;
@@ -702,6 +880,10 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 		$ic =~ /\A(?:v1|v2|always|never)\z/s or die <<"";
 --inbox-config must be one of `always', `v2', `v1', or `never'
 
+		if (defined(my $os = $lei->{opt}->{objstore})) {
+			$os = 'objstore' if $os eq ''; # --objstore w/o args
+			$self->{-objstore} = "$self->{dst}/$os";
+		}
 		local $LIVE;
 		my $iv = $lei->{opt}->{'inbox-version'} //
 			return start_clone_url($self);
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 2900f232..59f01b54 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -14,6 +14,7 @@ usage: public-inbox-clone INBOX_URL [DESTINATION]
 options:
 
   --epoch=RANGE       range of v2 epochs to clone (e.g `2..5', `~0', `~1..')
+  --objstore [DIR]    share storage for coderepos
   --torsocks VAL      whether or not to wrap git and curl commands with
                       torsocks (default: `auto')
                       Must be one of: `auto', `no' or `yes'
@@ -23,7 +24,7 @@ options:
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
-	inbox-config=s inbox-version=i
+	inbox-config=s inbox-version=i objstore:s
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config


* [PATCH 46/95] lei_mirror: cleanup process reaping logic
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (44 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 45/95] lei_mirror: support --objstore and forkgroups Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 47/95] lei_mirror: ensure git <1.8.5 fallback can use torsocks Eric Wong
                   ` (48 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can put more of the default --jobs logic and loop handling
inside a sub to simplify callers.
---
 lib/PublicInbox/LeiMirror.pm | 51 ++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 26 deletions(-)
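
Note (illustration only, not part of this commit): a standalone sketch
of the same pattern, a single sub which blocks until fewer than $jobs
children are alive, so callers just call it before spawning.  It uses
plain fork() instead of PublicInbox::Spawn, and the names reap_until and
start_job are hypothetical:

```perl
use strict;
use warnings;
use POSIX ();

my %live; # pid => callback, similar in spirit to $LIVE
my @done; # exit codes collected by the callbacks
my $max_live = 0; # highest number of concurrent children seen

# block until fewer than $jobs children are running
sub reap_until {
	my ($jobs) = @_;
	$jobs = 1 if $jobs < 1;
	while (keys(%live) >= $jobs) {
		my $pid = waitpid(-1, 0);
		die "waitpid(-1): $!" if $pid < 0;
		if (my $cb = delete $live{$pid}) {
			$cb->($pid, $?);
		} else {
			warn "reaped unknown PID=$pid ($?)\n";
		}
	}
}

sub start_job {
	my ($jobs, $exit_code) = @_;
	reap_until($jobs); # make room before spawning
	my $pid = fork;
	defined($pid) or die "fork: $!";
	POSIX::_exit($exit_code) if $pid == 0; # child: skip END/destructors
	$live{$pid} = sub { push @done, $_[1] >> 8 };
	my $n = keys %live;
	$max_live = $n if $n > $max_live;
}

start_job(2, $_) for (0..4); # at most 2 children at a time
reap_until(1); # a limit of 1 drains everything that is left
```

Passing 1 as the limit is the same trick this patch uses to wait for
all remaining children without a second loop.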

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 799939b5..163f45ee 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -123,6 +123,21 @@ sub ft_rename ($$$) {
 	$ft->unlink_on_destroy(0);
 }
 
+sub do_reap ($;$) {
+	my ($self, $jobs) = @_;
+	$jobs //= $self->{-jobs} //= $self->{lei}->{opt}->{jobs} // 1;
+	$jobs = 1 if $jobs < 1;
+	while (keys(%$LIVE) >= $jobs) {
+		my $pid = waitpid(-1, 0) // die "waitpid(-1): $!";
+		if (my $x = delete $LIVE->{$pid}) {
+			my $cb = shift @$x;
+			$cb->(@$x) if $cb;
+		} else {
+			warn "reaped unknown PID=$pid ($?)\n";
+		}
+	}
+}
+
 sub _get_txt_start { # non-fatal
 	my ($self, $endpoint, $fini) = @_;
 	my $uri = URI->new($self->{cur_src} // $self->{src});
@@ -135,8 +150,7 @@ sub _get_txt_start { # non-fatal
 	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
 	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(-f --compressed -R -o),
 					$ft->filename);
-	my $jobs = $lei->{opt}->{jobs} // 1;
-	reap_live() while keys(%$LIVE) >= $jobs;
+	do_reap($self);
 	$lei->qerr("# @$cmd");
 	return if $self->{dry_run};
 	$self->{"-get_txt.$endpoint"} = [ $ft, $cmd, $uri ];
@@ -247,8 +261,7 @@ sub run_reap {
 
 sub start_clone {
 	my ($self, $cmd, $opt, $fini) = @_;
-	my $jobs = $self->{lei}->{opt}->{jobs} // 1;
-	reap_live() while keys(%$LIVE) >= $jobs;
+	do_reap($self);
 	$self->{lei}->qerr("# @$cmd");
 	return if $self->{dry_run};
 	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $self, $cmd, $fini ]
@@ -273,9 +286,8 @@ sub fgrp_update_old ($) { # for git <1.8.5
 	my $cmd = [ 'git', "--git-dir=$fgrp->{cur_dst}",
 		fetch_args($fgrp->{lei}, my $opt = {}) ];
 	$fgrp->{lei}->qerr("# @$cmd");
+	do_reap($fgrp);
 	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $fgrp, $cmd ];
-	my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
-	reap_live() while keys(%$LIVE) >= $jobs;
 }
 
 sub upr { # feed `git update-ref --stdin -z' verbosely
@@ -315,8 +327,7 @@ sub fgrp_update {
 	}
 	if (close($w)) { # git >= 1.8.5
 		$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd ];
-		my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
-		reap_live() while keys(%$LIVE) >= $jobs;
+		do_reap($fgrp);
 	} else { # git <1.8.5 w/o update-ref --stdin
 		warn "E: close(update-ref --stdin): $!\n";
 		$update_ref_stdin = 0;
@@ -332,10 +343,9 @@ sub fgrp_fetched {
 	my %opt = map { $_ => $fgrp->{lei}->{$_} } (0..2);
 	my $cmd = [ 'git', "--git-dir=$fgrp->{-osdir}",
 			qw(pack-refs --all --prune) ];
+	do_reap($fgrp);
 	$fgrp->{lei}->qerr("# @$cmd");
 	$LIVE->{spawn($cmd, undef, \%opt)} = [ \&reap_cmd, $fgrp, $cmd ];
-	my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
-	reap_live() while keys(%$LIVE) >= $jobs;
 
 	$update_ref_stdin or return fgrp_update_old($fgrp);
 
@@ -344,19 +354,19 @@ sub fgrp_fetched {
 	my $src = [ 'git', "--git-dir=$fgrp->{-osdir}", 'for-each-ref',
 		"--format=refs/%(refname:lstrip=3)%00%(objectname)",
 		"refs/remotes/$rn/" ];
+	do_reap($fgrp);
 	open($fgrp->{srcfh}, '+>', undef) or die "open(src): $!";
 	$fgrp->{lei}->qerr("# @$src >SRC");
 	my $pid = spawn($src, undef, { %opt, 1 => $fgrp->{srcfh} });
 	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $src, $update_ref ];
-	reap_live() while keys(%$LIVE) >= $jobs;
 
 	my $dst = [ 'git', "--git-dir=$fgrp->{cur_dst}", 'for-each-ref',
 		'--format=%(refname)%00%(objectname)' ];
+	do_reap($fgrp);
 	open($fgrp->{dstfh}, '+>', undef) or die "open(dst): $!";
 	$fgrp->{lei}->qerr("# @$dst >DST");
 	$pid = spawn($dst, undef, { %opt, 1 => $fgrp->{dstfh} });
 	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $dst, $update_ref ];
-	reap_live() while keys(%$LIVE) >= $jobs;
 }
 
 sub fgrp_fetch {
@@ -365,8 +375,7 @@ sub fgrp_fetch {
 			fetch_args($fgrp->{lei}, my $opt = {}),
 			$fgrp->{-remote} ];
 	$fgrp->{-fini} = $fini;
-	my $jobs = $fgrp->{lei}->{opt}->{jobs} // 1;
-	reap_live() while keys(%$LIVE) >= $jobs;
+	do_reap($fgrp);
 	$fgrp->{lei}->qerr("# @$cmd");
 	return if $fgrp->{dry_run};
 	my $fgrp_fini = PublicInbox::OnDestroy->new($$, \&fgrp_fetched, $fgrp);
@@ -451,7 +460,7 @@ sub clone_v1 {
 	(!defined($d) && !$nohang) and
 		_get_txt_start($self, 'description', $fini);
 
-	reap_live() until ($nohang || !keys(%$LIVE)); # for non-manifest clone
+	$nohang or do_reap($self, 1); # for non-manifest clone
 }
 
 sub parse_epochs ($$) {
@@ -590,16 +599,6 @@ sub v2_done { # called via OnDestroy
 	index_cloned_inbox($self, 2);
 }
 
-sub reap_live {
-	my $pid = waitpid(-1, 0) // die "waitpid(-1): $!";
-	if (my $x = delete $LIVE->{$pid}) {
-		my $cb = shift @$x;
-		$cb->(@$x) if $cb;
-	} else {
-		warn "reaped unknown PID=$pid ($?)\n";
-	}
-}
-
 sub clone_v2_prep ($$;$) {
 	my ($self, $v2_epochs, $m) = @_; # $m => manifest.js.gz hashref
 	my $lei = $self->{lei};
@@ -762,7 +761,7 @@ EOM
 			last; # restart %$todo iteration
 		}
 	}
-	reap_live() while keys(%$LIVE);
+	do_reap($self, 1);
 }
 
 sub dump_manifest ($$) {


* [PATCH 47/95] lei_mirror: ensure git <1.8.5 fallback can use torsocks
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (45 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 46/95] lei_mirror: cleanup process reaping logic Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 48/95] clone: flesh out --objstore behavior and document Eric Wong
                   ` (47 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Since we fall back to `git fetch' on versions of git without
`git update-ref --stdin' support, the fallback must also be
wrapped with torsocks for Tor .onion URLs.
---
 lib/PublicInbox/LeiMirror.pm | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
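
Note (illustration only, not part of this commit): a sketch of storing
the command prefix on the object so that every later command, including
a fallback path, picks it up.  torsocks_prefix is a simplified,
hypothetical stand-in for LeiCurl->torsocks and merely looks for an
.onion host; the URLs and paths are made up:

```perl
use strict;
use warnings;

my $lookups = 0; # counts how often the prefix is computed

# hypothetical stand-in: ['torsocks'] for .onion hosts, [] otherwise
sub torsocks_prefix {
	my ($url) = @_;
	$lookups++;
	$url =~ m!\A[a-z]+://[^/]*\.onion(?:[:/]|\z)!i ? [ 'torsocks' ] : [];
}

sub git_cmd {
	my ($self, @args) = @_;
	# decide once per object, every later command reuses the answer
	$self->{-torsocks} //= torsocks_prefix($self->{src});
	[ @{$self->{-torsocks}}, 'git', "--git-dir=$self->{dst}", @args ];
}

my $self = { src => 'http://example.onion/repo.git', dst => '/tmp/repo.git' };
my $fetch = git_cmd($self, 'fetch');
my $quiet = git_cmd($self, qw(fetch -q)); # reuses the cached prefix

my $plain = git_cmd({ src => 'https://example.com/repo.git',
			dst => '/tmp/plain.git' }, 'fetch');
```

Keeping the prefix on the object avoids threading an extra argument
through every sub that might end up running a network command.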

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 163f45ee..6efe23fa 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -283,7 +283,7 @@ sub fetch_args ($$) {
 
 sub fgrp_update_old ($) { # for git <1.8.5
 	my ($fgrp) = @_;
-	my $cmd = [ 'git', "--git-dir=$fgrp->{cur_dst}",
+	my $cmd = [ @{$fgrp->{-torsocks}}, 'git', "--git-dir=$fgrp->{cur_dst}",
 		fetch_args($fgrp->{lei}, my $opt = {}) ];
 	$fgrp->{lei}->qerr("# @$cmd");
 	do_reap($fgrp);
@@ -370,8 +370,8 @@ sub fgrp_fetched {
 }
 
 sub fgrp_fetch {
-	my ($fgrp, $pfx, $fini) = @_;
-	my $cmd = [ @$pfx, 'git', "--git-dir=$fgrp->{-osdir}",
+	my ($fgrp, $fini) = @_;
+	my $cmd = [ @{$fgrp->{-torsocks}}, 'git', "--git-dir=$fgrp->{-osdir}",
 			fetch_args($fgrp->{lei}, my $opt = {}),
 			$fgrp->{-remote} ];
 	$fgrp->{-fini} = $fini;
@@ -436,10 +436,11 @@ sub clone_v1 {
 	my $uri = URI->new($self->{cur_src} // $self->{src});
 	defined($lei->{opt}->{epoch}) and
 		die "$uri is a v1 inbox, --epoch is not supported\n";
-	my $pfx = $curl->torsocks($lei, $uri) or return;
+	$self->{-torsocks} //= $curl->torsocks($lei, $uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
-	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}), "$uri", $dst ];
+	my $cmd = [ @{$self->{-torsocks}}, clone_cmd($lei, my $opt = {}),
+		"$uri", $dst ];
 	my $fgrp = forkgroup_prep($self, $uri);
 	if (!defined($fgrp) && defined($self->{-ent})) {
 		if (defined(my $ref = $self->{-ent}->{reference})) {
@@ -447,7 +448,7 @@ sub clone_v1 {
 				push @$cmd, '--reference', "$self->{dst}$ref";
 		}
 	}
-	$fgrp ? fgrp_fetch($fgrp, $pfx, $fini) :
+	$fgrp ? fgrp_fetch($fgrp, $fini) :
 		start_clone($self, $cmd, $opt, $fini);
 
 	if (!$self->{-is_epoch} && $lei->{opt}->{'inbox-config'} =~
@@ -604,7 +605,7 @@ sub clone_v2_prep ($$;$) {
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
 	my $first_uri = (map { $_->[0] } values %$v2_epochs)[0];
-	my $pfx = $curl->torsocks($lei, $first_uri) or return;
+	$self->{-torsocks} //= $curl->torsocks($lei, $first_uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
 	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;

* [PATCH 48/95] clone: flesh out --objstore behavior and document
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (46 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 47/95] lei_mirror: ensure git <1.8.5 fallback can use torsocks Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 49/95] lei_mirror: always pack refs for coderepos Eric Wong
                   ` (46 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can support absolute paths to avoid surprising behaviors,
but relative paths are preferred since the goal is to be
accessible over the "dumb" HTTP git transport (the dumb
transport uses less memory and CPU on the server).
---
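Not part of the patch: a standalone sketch of the path resolution
described above, mirroring the logic in the LeiMirror.pm hunk below.
`objstore_dir' is a hypothetical helper name; $dst stands in for the
DESTINATION directory.

```perl
use strict;
use warnings;

# Resolve the --objstore[=DIR] value against the destination directory.
sub objstore_dir {
	my ($dst, $os) = @_;
	return unless defined $os;         # --objstore not given at all
	$os = 'objstore' if $os eq '';     # --objstore without DIR
	$os = "$dst/$os" if $os !~ m!\A/!; # relative => under DESTINATION
	$os;
}

print objstore_dir('/srv/mirror', ''), "\n";        # /srv/mirror/objstore
print objstore_dir('/srv/mirror', 'os'), "\n";      # /srv/mirror/os
print objstore_dir('/srv/mirror', '/var/os'), "\n"; # /var/os
```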
 Documentation/public-inbox-clone.pod | 12 ++++++++++++
 lib/PublicInbox/LeiMirror.pm         |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 1c31fbb3..cee9f76e 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -6,6 +6,8 @@ public-inbox-clone - "git clone --mirror" wrapper
 
 public-inbox-clone INBOX_URL [INBOX_DIR]
 
+public-inbox-clone ROOT_URL [DESTINATION]
+
 =head1 DESCRIPTION
 
 public-inbox-clone is a wrapper around C<git clone --mirror> for
@@ -82,6 +84,16 @@ Force a remote public-inbox version (must be C<1> or C<2>).
 This is auto-detected by default, and this option exists mainly
 for testing.
 
+=item --objstore[=DIR]
+
+Enables space savings when the remote C<manifest.js.gz>
+includes C<forkgroup> entries as generated by grokmirror 2.x.
+
+If C<DIR> is not an absolute path, it is relative to the
+C<DESTINATION> directory.  If only C<--objstore> is specified
+without C<DIR>, then C<objstore> (C<$DESTINATION/objstore>)
+is the implied value of C<DIR>.
+
 =item -n
 
 =item --dry-run
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 6efe23fa..2f96058a 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -882,7 +882,8 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 
 		if (defined(my $os = $lei->{opt}->{objstore})) {
 			$os = 'objstore' if $os eq ''; # --objstore w/o args
-			$self->{-objstore} = "$self->{dst}/$os";
+			$os = "$self->{dst}/$os" if $os !~ m!\A/!;
+			$self->{-objstore} = $os;
 		}
 		local $LIVE;
 		my $iv = $lei->{opt}->{'inbox-version'} //

* [PATCH 49/95] lei_mirror: always pack refs for coderepos
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (47 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 48/95] clone: flesh out --objstore behavior and document Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 50/95] lei_mirror: set description for non-inboxes, too Eric Wong
                   ` (45 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Unlike object packing, ref packing is cheap and fast.
---
 lib/PublicInbox/LeiMirror.pm | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 2f96058a..3fea4c29 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -336,16 +336,21 @@ sub fgrp_update {
 	}
 }
 
+sub pack_refs {
+	my ($self, $git_dir) = @_;
+	do_reap($self);
+	my $cmd = [ 'git', "--git-dir=$git_dir", qw(pack-refs --all --prune) ];
+	$self->{lei}->qerr("# @$cmd");
+	my $opt = { 1 => $self->{lei}->{1}, 2 => $self->{lei}->{2} };
+	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $self, $cmd ];
+}
+
 sub fgrp_fetched {
 	my ($fgrp) = @_;
 	return if $fgrp->{dry_run} || !$LIVE;
 	my $rn = $fgrp->{-remote};
 	my %opt = map { $_ => $fgrp->{lei}->{$_} } (0..2);
-	my $cmd = [ 'git', "--git-dir=$fgrp->{-osdir}",
-			qw(pack-refs --all --prune) ];
-	do_reap($fgrp);
-	$fgrp->{lei}->qerr("# @$cmd");
-	$LIVE->{spawn($cmd, undef, \%opt)} = [ \&reap_cmd, $fgrp, $cmd ];
+	pack_refs($fgrp, $fgrp->{-osdir}); # objstore refs always packed
 
 	$update_ref_stdin or return fgrp_update_old($fgrp);
 
@@ -407,6 +412,7 @@ sub forkgroup_prep {
 		$kv[0] = "remote.$rn.$kv[0]";
 		run_die([@cmd, @kv], undef, $opt);
 	}
+	$self->{-do_pack_refs} = 1; # likely coderepo
 	if (!-d $self->{cur_dst}) {
 		my $alt = File::Spec->rel2abs("$dir/objects");
 		PublicInbox::Import::init_bare($self->{cur_dst});
@@ -573,6 +579,7 @@ sub v1_done { # called via OnDestroy
 			die "rename($f, $o/info/alternates): $!";
 		$f->unlink_on_destroy(0);
 	}
+	pack_refs($self, $dst) if delete $self->{-do_pack_refs};
 	return if ($self->{-is_epoch} ||
 		$self->{lei}->{opt}->{'inbox-config'} ne 'always');
 	write_makefile($dst, 1);

* [PATCH 50/95] lei_mirror: set description for non-inboxes, too
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (48 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 49/95] lei_mirror: always pack refs for coderepos Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 51/95] lei_mirror: force --no-tags when fetching forkgroups Eric Wong
                   ` (44 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can still set $GIT_DIR/description when cloning coderepos with
--inbox-config=never.
---
 lib/PublicInbox/LeiMirror.pm | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 3fea4c29..21341efb 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -218,8 +218,6 @@ sub set_description ($) {
 sub index_cloned_inbox {
 	my ($self, $iv) = @_;
 	my $lei = $self->{lei};
-	eval { set_description($self) };
-	warn $@ if $@;
 
 	# n.b. public-inbox-clone works w/o (SQLite || Xapian)
 	# lei is useless without Xapian + SQLite
@@ -580,6 +578,8 @@ sub v1_done { # called via OnDestroy
 		$f->unlink_on_destroy(0);
 	}
 	pack_refs($self, $dst) if delete $self->{-do_pack_refs};
+	eval { set_description($self) };
+	warn $@ if $@;
 	return if ($self->{-is_epoch} ||
 		$self->{lei}->{opt}->{'inbox-config'} ne 'always');
 	write_makefile($dst, 1);
@@ -604,6 +604,8 @@ sub v2_done { # called via OnDestroy
 	}
 	write_makefile($dst, 2);
 	undef $lck; # unlock
+	eval { set_description($self) };
+	warn $@ if $@;
 	index_cloned_inbox($self, 2);
 }
 

* [PATCH 51/95] lei_mirror: force --no-tags when fetching forkgroups
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (49 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 50/95] lei_mirror: set description for non-inboxes, too Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 52/95] lei_mirror: preserve permissions of existing alternates file Eric Wong
                   ` (43 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We can't have multiple remotes writing to refs/tags/*
(instead of refs/remotes/*/tags) due to potential conflicts.
---
 lib/PublicInbox/LeiMirror.pm | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 21341efb..d6aca800 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -375,7 +375,7 @@ sub fgrp_fetched {
 sub fgrp_fetch {
 	my ($fgrp, $fini) = @_;
 	my $cmd = [ @{$fgrp->{-torsocks}}, 'git', "--git-dir=$fgrp->{-osdir}",
-			fetch_args($fgrp->{lei}, my $opt = {}),
+			fetch_args($fgrp->{lei}, my $opt = {}), '--no-tags',
 			$fgrp->{-remote} ];
 	$fgrp->{-fini} = $fini;
 	do_reap($fgrp);
@@ -405,7 +405,9 @@ sub forkgroup_prep {
 	my $key = $self->{-key} // die 'BUG: no -key';
 	my ($bn) = ($key =~ m{/([a-z0-9_,;=!\+\{\}\|][^/]*)(?:\.git)?\z}i);
 	my $rn = "$bn-".substr(sha256_hex($key), 0, 16);
-	for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*") {
+	# --no-tags is required to avoid conflicts
+	for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*",
+			'tagopt=--no-tags') {
 		my @kv = split(/=/, $_, 2);
 		$kv[0] = "remote.$rn.$kv[0]";
 		run_die([@cmd, @kv], undef, $opt);

* [PATCH 52/95] lei_mirror: preserve permissions of existing alternates file
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (50 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 51/95] lei_mirror: force --no-tags when fetching forkgroups Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 53/95] lei_mirror: do not show ref updates w/o --verbose Eric Wong
                   ` (42 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We don't want to be clobbering permissions when changing to
relative paths.  Furthermore, we can avoid writing to the
alternates file if there are no changes.
---
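Not part of the patch: a minimal sketch of replacing a file via a
temporary file while keeping the permission bits of the original.
`replace_keep_mode' is a hypothetical helper and is not the ft_rename
used in the diff; it assumes the temporary file can live in the same
directory as the destination so rename(2) stays atomic.

```perl
use strict;
use warnings;
use File::Temp ();
use File::Basename qw(dirname);

# Atomically replace $dst with $content.  If $dst already exists, its
# permission bits are reused; otherwise $default_mode is masked by umask.
sub replace_keep_mode {
	my ($dst, $content, $default_mode) = @_;
	my @st = stat($dst);
	my $mode = @st ? ($st[2] & 07777) : ($default_mode & ~umask);
	my $ft = File::Temp->new(TEMPLATE => '.XXXX', DIR => dirname($dst));
	print $ft $content or die "print($ft): $!";
	$ft->flush or die "flush($ft): $!";
	chmod($mode, $ft->filename) or die "chmod($ft): $!";
	rename($ft->filename, $dst) or die "rename($ft, $dst): $!";
	$ft->unlink_on_destroy(0); # the temporary name no longer exists
	$mode;
}

my $dir = File::Temp::tempdir(CLEANUP => 1);
my $alt = "$dir/alternates";
open(my $fh, '>', $alt) or die "open($alt): $!";
print $fh "/abs/objstore/objects\n" or die "print($alt): $!";
close $fh or die "close($alt): $!";
chmod(0640, $alt) or die "chmod($alt): $!";
replace_keep_mode($alt, "../../objstore/objects\n", 0666);
printf "%04o\n", (stat($alt))[2] & 07777;
```

Skipping the write entirely when nothing changed, as the patch does,
also avoids needless mtime updates.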
 lib/PublicInbox/LeiMirror.pm | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index d6aca800..829740bc 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -113,9 +113,9 @@ sub clone_cmd {
 	@cmd;
 }
 
-sub ft_rename ($$$) {
-	my ($ft, $dst, $open_mode) = @_;
-	my @st = stat($dst);
+sub ft_rename ($$$;$) {
+	my ($ft, $dst, $open_mode, $fh) = @_;
+	my @st = stat($fh // $dst);
 	my $mode = @st ? ($st[2] & 07777) : ($open_mode & ~umask);
 	chmod($mode, $ft) or croak "E: chmod($ft): $!";
 	require File::Copy;
@@ -565,19 +565,21 @@ sub v1_done { # called via OnDestroy
 		run_die([qw(git config -f), "$dst/config", 'gitweb.owner', $o]);
 	}
 	my $o = "$dst/objects";
-	if (open(my $fh, '<', "$o/info/alternates")) {
+	if (open(my $fh, '<', my $fn = "$o/info/alternates")) {;
 		my $base = File::Spec->rel2abs($o);
-		chomp(my @l = <$fh>);
+		my @l = <$fh>;
+		my $ft;
 		for (@l) {
-			$_ = File::Spec->abs2rel($_, $base) if m!\A/!;
-			$_ .= "\n";
+			next unless m!\A/!;
+			$_ = File::Spec->abs2rel($_, $base);
+			$ft //= File::Temp->new(TEMPLATE => '.XXXX',
+						DIR => "$o/info");
+		}
+		if ($ft) {
+			print $ft @l or die "print($ft): $!";
+			$ft->flush or die "flush($ft): $!";
+			ft_rename($ft, $fn, 0666, $fh);
 		}
-		my $f = File::Temp->new(TEMPLATE => '.XXXX', DIR => "$o/info");
-		print $f @l;
-		$f->flush or die "flush($f): $!";
-		rename($f->filename, "$o/info/alternates") or
-			die "rename($f, $o/info/alternates): $!";
-		$f->unlink_on_destroy(0);
 	}
 	pack_refs($self, $dst) if delete $self->{-do_pack_refs};
 	eval { set_description($self) };

* [PATCH 53/95] lei_mirror: do not show ref updates w/o --verbose
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (51 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 52/95] lei_mirror: preserve permissions of existing alternates file Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 54/95] lei_mirror: drop git <1.8.5 support Eric Wong
                   ` (41 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

It's too noisy IMHO, and UIs are always opinionated.
---
 lib/PublicInbox/LeiMirror.pm | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 829740bc..1138a82d 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -289,8 +289,8 @@ sub fgrp_update_old ($) { # for git <1.8.5
 }
 
 sub upr { # feed `git update-ref --stdin -z' verbosely
-	my ($fgrp, $w, $op, $ref, $oid) = @_;
-	$fgrp->{lei}->qerr("# $op $ref $oid");
+	my ($lei, $w, $op, $ref, $oid) = @_;
+	$lei->qerr("# $op $ref $oid") if $lei->{opt}->{verbose};
 	print $w "$op $ref\0$oid\0" or die "print(w): $!";
 }
 
@@ -307,21 +307,22 @@ sub fgrp_update {
 	pipe(my ($r, $w)) or die "pipe: $!";
 	my $cmd = [ 'git', "--git-dir=$fgrp->{cur_dst}",
 		qw(update-ref --stdin -z) ];
-	$fgrp->{lei}->qerr("# @$cmd");
-	my $opt = { 0 => $r, 1 => $fgrp->{lei}->{1}, 2 => $fgrp->{lei}->{2} };
+	my $lei = $fgrp->{lei};
+	$lei->qerr("# @$cmd");
+	my $opt = { 0 => $r, 1 => $lei->{1}, 2 => $lei->{2} };
 	my $pid = spawn($cmd, undef, $opt);
 	close $r or die "close(r): $!";
 	for my $ref (keys %dst) {
 		my $new = delete $src{$ref};
 		my $old = $dst{$ref};
 		if (defined $new) {
-			upr($fgrp, $w, 'update', $ref, $new) if $new ne $old;
+			upr($lei, $w, 'update', $ref, $new) if $new ne $old;
 		} else {
-			upr($fgrp, $w, 'delete', $ref, $old);
+			upr($lei, $w, 'delete', $ref, $old);
 		}
 	}
 	while (my ($ref, $oid) = each %src) {
-		upr($fgrp, $w, 'create', $ref, $oid);
+		upr($lei, $w, 'create', $ref, $oid);
 	}
 	if (close($w)) { # git >= 1.8.5
 		$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd ];

* [PATCH 54/95] lei_mirror: drop git <1.8.5 support
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (52 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 53/95] lei_mirror: do not show ref updates w/o --verbose Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 55/95] lei_mirror: make basename more descriptive Eric Wong
                   ` (40 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Supporting git <1.8.5 via fetch on non-forkgroup repos would
make auto-GC dangerous, and I want to support auto-GC instead
of relying on the preciousObjects extension.

Since git 1.8.5 is 9 years old at this point, and grokmirror
(used by the only CentOS 7.x user I know of) already relies on
newer git, simplify our code and only fetch into forkgroups.
---
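Not part of the patch: for context, the remaining code path compares
the refs in the fork group against the refs in the destination repo
and feeds update/delete/create instructions to
`git update-ref --stdin -z'.  A sketch of only that comparison,
assuming two refname => OID hashes; `ref_ops' is a hypothetical name
and no git process is involved.

```perl
use strict;
use warnings;

# Compute the instructions needed to make %$dst match %$src.
# Returns an array ref of [ op, refname, oid ] triples.
sub ref_ops {
	my ($src, $dst) = @_;
	my %src = %$src; # copy, since matched entries are deleted below
	my @ops;
	for my $ref (sort keys %$dst) {
		my $new = delete $src{$ref};
		my $old = $dst->{$ref};
		if (defined $new) {
			push @ops, [ 'update', $ref, $new ] if $new ne $old;
		} else { # gone from the source
			push @ops, [ 'delete', $ref, $old ];
		}
	}
	# whatever remains only exists in the source
	push @ops, map { [ 'create', $_, $src{$_} ] } sort keys %src;
	\@ops;
}

my $ops = ref_ops({ 'refs/heads/a' => '1', 'refs/heads/b' => '2' },
		{ 'refs/heads/b' => '3', 'refs/heads/c' => '9' });
print join(' ', @$_), "\n" for @$ops;
```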
 lib/PublicInbox/LeiMirror.pm | 41 +++++++++++++-----------------------
 1 file changed, 15 insertions(+), 26 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 1138a82d..28fef6f9 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -21,7 +21,6 @@ use PublicInbox::OnDestroy;
 use Digest::SHA qw(sha256_hex);
 
 our $LIVE; # pid => callback
-my $update_ref_stdin = $ENV{GIT_CAN_UPDATE_REF_STDIN} // 1;
 
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
@@ -279,15 +278,6 @@ sub fetch_args ($$) {
 	@cmd;
 }
 
-sub fgrp_update_old ($) { # for git <1.8.5
-	my ($fgrp) = @_;
-	my $cmd = [ @{$fgrp->{-torsocks}}, 'git', "--git-dir=$fgrp->{cur_dst}",
-		fetch_args($fgrp->{lei}, my $opt = {}) ];
-	$fgrp->{lei}->qerr("# @$cmd");
-	do_reap($fgrp);
-	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $fgrp, $cmd ];
-}
-
 sub upr { # feed `git update-ref --stdin -z' verbosely
 	my ($lei, $w, $op, $ref, $oid) = @_;
 	$lei->qerr("# $op $ref $oid") if $lei->{opt}->{verbose};
@@ -324,15 +314,9 @@ sub fgrp_update {
 	while (my ($ref, $oid) = each %src) {
 		upr($lei, $w, 'create', $ref, $oid);
 	}
-	if (close($w)) { # git >= 1.8.5
-		$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd ];
-		do_reap($fgrp);
-	} else { # git <1.8.5 w/o update-ref --stdin
-		warn "E: close(update-ref --stdin): $!\n";
-		$update_ref_stdin = 0;
-		waitpid($pid, 0) // die "waitpid(update-ref --stdin): $!";
-		fgrp_update_old($fgrp);
-	}
+	close($w) or warn "E: close(update-ref --stdin): $! (need git 1.8.5+)\n";
+	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd ];
+	do_reap($fgrp);
 }
 
 sub pack_refs {
@@ -351,8 +335,6 @@ sub fgrp_fetched {
 	my %opt = map { $_ => $fgrp->{lei}->{$_} } (0..2);
 	pack_refs($fgrp, $fgrp->{-osdir}); # objstore refs always packed
 
-	$update_ref_stdin or return fgrp_update_old($fgrp);
-
 	my $update_ref = PublicInbox::OnDestroy->new($$, \&fgrp_update, $fgrp);
 
 	my $src = [ 'git', "--git-dir=$fgrp->{-osdir}", 'for-each-ref',
@@ -427,11 +409,18 @@ sub forkgroup_prep {
 			say $fh $l or die "say($f): $!";
 		}
 		close $fh or die "close($f): $!";
-		@cmd = ('git', "--git-dir=$self->{cur_dst}",
-			qw(remote add --mirror=fetch origin), "$uri");
-		my $pid = spawn(\@cmd, undef, $opt);
-		waitpid($pid, 0) // die "waitpid(@cmd): $!";
-		die "E: @cmd: \$?=$?" if ($? && ($? >> 8) != 3);
+		$f = "$self->{cur_dst}/config";
+		open $fh, '+>>', $f or die "open:($f): $!";
+		print $fh <<EOM or die "print($f): $!";
+; rely on the "$rn" remote in the
+; $fg fork group for fetches
+; only uncomment the following iff you detach from fork groups
+; [remote "origin"]
+;	url = $uri
+;	fetch = +refs/*:refs/*
+;	mirror = true
+EOM
+		close $fh or die "close($f): $!";
 	}
 	bless { %$self, -osdir => $dir, -remote => $rn }, __PACKAGE__;
 }

* [PATCH 55/95] lei_mirror: make basename more descriptive
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (53 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 54/95] lei_mirror: drop git <1.8.5 support Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 56/95] lei_mirror: fix --dry-run for forkgroups Eric Wong
                   ` (39 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This makes it easier for humans to distinguish between
"Alice/project.git" and "Bob/project.git"
---
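Not part of the patch: a standalone sketch of the remote name
derivation, reusing the substitutions from the hunk below.
`remote_name' is a hypothetical wrapper; the sample keys are made up.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Turn a manifest key such as "/Alice/project.git" into a remote name
# that keeps the readable path and stays unique via a hash suffix.
sub remote_name {
	my ($key) = @_;
	my $rn = $key;
	$rn =~ s!\A[\./]+!!s;              # strip leading dots and slashes
	$rn =~ s/\.*?(?:\.git)?\.*?\z//s;  # strip trailing ".git"
	# replace characters that are awkward in ref and remote names
	$rn =~ s![\@\{\}/:\?\[\]\^~\s\f[:cntrl:]\*]!_!isg;
	$rn . '-' . substr(sha256_hex($key), 0, 16);
}

print remote_name('/Alice/project.git'), "\n";
print remote_name('/Bob/project.git'), "\n";
```

The readable part now starts with "Alice_project" or "Bob_project"
rather than just "project", and the 16 hex digit suffix still
disambiguates keys that sanitize to the same string.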
 lib/PublicInbox/LeiMirror.pm | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 28fef6f9..3220f48d 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -386,8 +386,11 @@ sub forkgroup_prep {
 		}
 	}
 	my $key = $self->{-key} // die 'BUG: no -key';
-	my ($bn) = ($key =~ m{/([a-z0-9_,;=!\+\{\}\|][^/]*)(?:\.git)?\z}i);
-	my $rn = "$bn-".substr(sha256_hex($key), 0, 16);
+	my $rn = $key;
+	$rn =~ s!\A[\./]+!!s;
+	$rn =~ s/\.*?(?:\.git)?\.*?\z//s;
+	$rn =~ s![\@\{\}/:\?\[\]\^~\s\f[:cntrl:]\*]!_!isg;
+	$rn .= '-'.substr(sha256_hex($key), 0, 16);
 	# --no-tags is required to avoid conflicts
 	for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*",
 			'tagopt=--no-tags') {

* [PATCH 56/95] lei_mirror: fix --dry-run for forkgroups
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (54 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 55/95] lei_mirror: make basename more descriptive Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 57/95] lei_mirror: forkgroups use `git fetch --multiple' Eric Wong
                   ` (38 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

We must not make permanent changes to the FS if --dry-run is in use.
---
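Not part of the patch: a minimal sketch of the guard pattern, assuming
a `dry_run' flag on the object as in the diff.  `ensure_dir' and the
`log' array are hypothetical and only for illustration.

```perl
use strict;
use warnings;
use File::Temp ();

# Report what would be done, but only touch the FS without dry_run.
# Returns 1 if the directory exists afterwards, 0 if it was skipped.
sub ensure_dir {
	my ($self, $dir) = @_;
	return 1 if -d $dir;
	push @{$self->{log}}, "# mkdir $dir";
	return 0 if $self->{dry_run};
	mkdir($dir) or die "mkdir($dir): $!";
	1;
}

my $tmp = File::Temp::tempdir(CLEANUP => 1);
my $dry = { dry_run => 1 };
ensure_dir($dry, "$tmp/objstore"); # only logs
print "$_\n" for @{$dry->{log}};
print -d "$tmp/objstore" ? "created\n" : "untouched\n";
```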
 lib/PublicInbox/LeiMirror.pm | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 3220f48d..00732128 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -378,7 +378,7 @@ sub forkgroup_prep {
 	my $dir = "$os/$fg.git";
 	my @cmd = ('git', "--git-dir=$dir", 'config');
 	my $opt = +{ map { $_ => $self->{lei}->{$_} } (0..2) };
-	if (!-d $dir) {
+	if (!-d $dir && !$self->{dry_run}) {
 		PublicInbox::Import::init_bare($dir);
 		for ('repack.useDeltaIslands=true',
 				'pack.island=refs/remotes/([^/]+)/') {
@@ -391,15 +391,17 @@ sub forkgroup_prep {
 	$rn =~ s/\.*?(?:\.git)?\.*?\z//s;
 	$rn =~ s![\@\{\}/:\?\[\]\^~\s\f[:cntrl:]\*]!_!isg;
 	$rn .= '-'.substr(sha256_hex($key), 0, 16);
-	# --no-tags is required to avoid conflicts
-	for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*",
-			'tagopt=--no-tags') {
-		my @kv = split(/=/, $_, 2);
-		$kv[0] = "remote.$rn.$kv[0]";
-		run_die([@cmd, @kv], undef, $opt);
+	unless ($self->{dry_run}) {
+		# --no-tags is required to avoid conflicts
+		for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*",
+				'tagopt=--no-tags') {
+			my @kv = split(/=/, $_, 2);
+			$kv[0] = "remote.$rn.$kv[0]";
+			run_die([@cmd, @kv], undef, $opt);
+		}
 	}
 	$self->{-do_pack_refs} = 1; # likely coderepo
-	if (!-d $self->{cur_dst}) {
+	if (!-d $self->{cur_dst} && !$self->{dry_run}) {
 		my $alt = File::Spec->rel2abs("$dir/objects");
 		PublicInbox::Import::init_bare($self->{cur_dst});
 		my $o = "$self->{cur_dst}/objects";

* [PATCH 57/95] lei_mirror: forkgroups use `git fetch --multiple'
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (55 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 56/95] lei_mirror: fix --dry-run for forkgroups Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 58/95] clone: move --dry-run handling to lei_mirror Eric Wong
                   ` (37 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

This offloads network parallelization and safety to git
itself while reducing the amount of unnecessary process spawning
we do.  This also improves readability of pack-refs invocations
and reduces the need for them.

To prevent heavily-forked repos from hitting system command-line
size limits, we group the remotes to be fetched in the "fgrptmp"
remotes group of the objstore config instead of listing them in argv.
---
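Not part of the patch: a sketch of the config-based grouping, assuming
git's `remotes.<group>' config key and `git fetch --multiple <group>'.
`remotes_group_buf' and `append_locked' are hypothetical helpers; the
lock handling is simplified and leaves the lock file behind if
anything dies in between.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Temp ();

# One "remotes.$grp" entry per remote keeps the name list out of argv.
sub remotes_group_buf {
	my ($grp, @remotes) = @_;
	join('', "[remotes]\n", map { "\t$grp = $_\n" } @remotes);
}

# Append to a git config file while holding git's "$f.lock" convention,
# so a concurrent `git config' writer will not race with us.
sub append_locked {
	my ($f, $buf) = @_;
	sysopen(my $lk, "$f.lock", O_CREAT|O_EXCL|O_WRONLY)
		or die "open($f.lock): $!";
	open(my $fh, '>>', $f) or die "open(>>$f): $!";
	print $fh $buf or die "print($f): $!";
	close $fh or die "close($f): $!";
	unlink("$f.lock") or die "unlink($f.lock): $!";
}

my $osdir = File::Temp::tempdir(CLEANUP => 1);
append_locked("$osdir/config",
	remotes_group_buf('fgrptmp', 'Alice_project-0123', 'Bob_project-4567'));
my @cmd = ('git', "--git-dir=$osdir", qw(fetch --no-tags --multiple fgrptmp));
print "# @cmd\n"; # the fetch itself is not run in this sketch
```

The command line stays the same length regardless of how many forks
share an objstore.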
 lib/PublicInbox/LeiMirror.pm | 160 +++++++++++++++++++++++------------
 1 file changed, 108 insertions(+), 52 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 00732128..8b55a7da 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -316,7 +316,7 @@ sub fgrp_update {
 	}
 	close($w) or warn "E: close(update-ref --stdin): $! (need git 1.8.5+)\n";
 	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd ];
-	do_reap($fgrp);
+	pack_refs($fgrp, $fgrp->{cur_dst});
 }
 
 sub pack_refs {
@@ -324,49 +324,101 @@ sub pack_refs {
 	do_reap($self);
 	my $cmd = [ 'git', "--git-dir=$git_dir", qw(pack-refs --all --prune) ];
 	$self->{lei}->qerr("# @$cmd");
+	return if $self->{dry_run};
 	my $opt = { 1 => $self->{lei}->{1}, 2 => $self->{lei}->{2} };
 	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $self, $cmd ];
 }
 
-sub fgrp_fetched {
-	my ($fgrp) = @_;
-	return if $fgrp->{dry_run} || !$LIVE;
-	my $rn = $fgrp->{-remote};
-	my %opt = map { $_ => $fgrp->{lei}->{$_} } (0..2);
-	pack_refs($fgrp, $fgrp->{-osdir}); # objstore refs always packed
-
-	my $update_ref = PublicInbox::OnDestroy->new($$, \&fgrp_update, $fgrp);
-
-	my $src = [ 'git', "--git-dir=$fgrp->{-osdir}", 'for-each-ref',
-		"--format=refs/%(refname:lstrip=3)%00%(objectname)",
-		"refs/remotes/$rn/" ];
-	do_reap($fgrp);
-	open($fgrp->{srcfh}, '+>', undef) or die "open(src): $!";
-	$fgrp->{lei}->qerr("# @$src >SRC");
-	my $pid = spawn($src, undef, { %opt, 1 => $fgrp->{srcfh} });
-	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $src, $update_ref ];
-
-	my $dst = [ 'git', "--git-dir=$fgrp->{cur_dst}", 'for-each-ref',
-		'--format=%(refname)%00%(objectname)' ];
-	do_reap($fgrp);
-	open($fgrp->{dstfh}, '+>', undef) or die "open(dst): $!";
-	$fgrp->{lei}->qerr("# @$dst >DST");
-	$pid = spawn($dst, undef, { %opt, 1 => $fgrp->{dstfh} });
-	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $dst, $update_ref ];
-}
-
-sub fgrp_fetch {
-	my ($fgrp, $fini) = @_;
-	my $cmd = [ @{$fgrp->{-torsocks}}, 'git', "--git-dir=$fgrp->{-osdir}",
-			fetch_args($fgrp->{lei}, my $opt = {}), '--no-tags',
-			$fgrp->{-remote} ];
-	$fgrp->{-fini} = $fini;
-	do_reap($fgrp);
-	$fgrp->{lei}->qerr("# @$cmd");
-	return if $fgrp->{dry_run};
-	my $fgrp_fini = PublicInbox::OnDestroy->new($$, \&fgrp_fetched, $fgrp);
-	my $pid = spawn($cmd, undef, $opt);
-	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd, $fgrp_fini ];
+sub fgrpv_done {
+	my ($fgrpv) = @_;
+	return if !$LIVE;
+	my $pid;
+	my $first = $fgrpv->[0] // die 'BUG: no fgrpv->[0]';
+	pack_refs($first, $first->{-osdir}); # objstore refs always packed
+	for my $fgrp (@$fgrpv) {
+		my $rn = $fgrp->{-remote};
+		my %opt = map { $_ => $fgrp->{lei}->{$_} } (0..2);
+
+		my $update_ref = $fgrp->{dry_run} ? undef :
+			PublicInbox::OnDestroy->new($$, \&fgrp_update, $fgrp);
+
+		my $src = [ 'git', "--git-dir=$fgrp->{-osdir}", 'for-each-ref',
+			"--format=refs/%(refname:lstrip=3)%00%(objectname)",
+			"refs/remotes/$rn/" ];
+		do_reap($fgrp);
+		$fgrp->{lei}->qerr("# @$src >SRC");
+		if ($update_ref) {
+			open(my $fh, '+>', undef) or die "open(src): $!";
+			$pid = spawn($src, undef, { %opt, 1 => $fh });
+			$fgrp->{srcfh} = $fh;
+			$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $src, $update_ref ]
+		}
+		my $dst = [ 'git', "--git-dir=$fgrp->{cur_dst}", 'for-each-ref',
+			'--format=%(refname)%00%(objectname)' ];
+		do_reap($fgrp);
+		$fgrp->{lei}->qerr("# @$dst >DST");
+		if ($update_ref) {
+			open(my $fh, '+>', undef) or die "open(dst): $!";
+			$pid = spawn($dst, undef, { %opt, 1 => $fh });
+			$fgrp->{dstfh} = $fh;
+			$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $dst, $update_ref ]
+		}
+	}
+}
+
+sub fgrp_fetch_all {
+	my ($self) = @_;
+	my $todo = delete $self->{fgrp_todo} or return;
+	keys(%$todo) or return;
+
+	# Rely on the fgrptmp remote groups in the config file rather
+	# than listing all remotes since the remote name list may exceed
+	# system argv limits:
+	my $grp = 'fgrptmp';
+
+	my @git = (@{$self->{-torsocks}}, 'git');
+	my $j = $self->{lei}->{opt}->{jobs};
+	my $opt = {};
+	my @fetch = do {
+		local $self->{lei}->{opt}->{jobs} = 1;
+		(fetch_args($self->{lei}, $opt),
+			qw(--no-tags --multiple));
+	};
+	push(@fetch, "-j$j") if $j;
+	my $pid;
+	while (my ($osdir, $fgrpv) = each %$todo) {
+		my $f = "$osdir/config";
+
+		# clobber group from previous run atomically
+		my $cmd = ['git', "--git-dir=$osdir", qw(config -f),
+				$f, '--unset-all', "remotes.$grp"];
+		$self->{lei}->qerr("# @$cmd");
+		if (!$self->{dry_run}) {
+			$pid = spawn($cmd);
+			waitpid($pid, 0) // die "waitpid: $!";
+			die "E: @$cmd: \$?=$?" if ($? && ($? >> 8) != 5);
+
+			# update the config atomically via O_APPEND while
+			# respecting git-config locking
+			sysopen(my $lk, "$f.lock", O_CREAT|O_EXCL|O_WRONLY)
+				or die "open($f.lock): $!";
+			open my $fh, '>>', $f or die "open(>>$f): $!";
+			$fh->autoflush(1);
+			my $buf = join('', "[remotes]\n",
+				map { "\t$grp = $_->{-remote}\n" } @$fgrpv);
+			print $fh $buf or die "print($f): $!";
+			close $fh or die "close($f): $!";
+			unlink("$f.lock") or die "unlink($f.lock): $!";
+		}
+
+		$cmd = [ @git, "--git-dir=$osdir", @fetch, $grp ];
+		do_reap($self);
+		$self->{lei}->qerr("# @$cmd");
+		my $end = PublicInbox::OnDestroy->new($$, \&fgrpv_done, $fgrpv);
+		return if $self->{dry_run};
+		$pid = spawn($cmd, undef, $opt);
+		$LIVE->{$pid} = [ \&reap_cmd, $self, $cmd, $end ];
+	}
 }
 
 # keep this idempotent for future use by public-inbox-fetch
@@ -400,7 +452,6 @@ sub forkgroup_prep {
 			run_die([@cmd, @kv], undef, $opt);
 		}
 	}
-	$self->{-do_pack_refs} = 1; # likely coderepo
 	if (!-d $self->{cur_dst} && !$self->{dry_run}) {
 		my $alt = File::Spec->rel2abs("$dir/objects");
 		PublicInbox::Import::init_bare($self->{cur_dst});
@@ -440,18 +491,21 @@ sub clone_v1 {
 	$self->{-torsocks} //= $curl->torsocks($lei, $uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
-	my $cmd = [ @{$self->{-torsocks}}, clone_cmd($lei, my $opt = {}),
-		"$uri", $dst ];
-	my $fgrp = forkgroup_prep($self, $uri);
-	if (!defined($fgrp) && defined($self->{-ent})) {
-		if (defined(my $ref = $self->{-ent}->{reference})) {
-			-e "$self->{dst}$ref" and
-				push @$cmd, '--reference', "$self->{dst}$ref";
+	if (my $fgrp = forkgroup_prep($self, $uri)) {
+		$fgrp->{-fini} = $fini;
+		push @{$self->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
+	} else { # normal fetch
+		my $cmd = [ @{$self->{-torsocks}},
+				clone_cmd($lei, my $opt = {}), "$uri", $dst ];
+		if (defined($self->{-ent})) {
+			if (defined(my $ref = $self->{-ent}->{reference})) {
+				-e "$self->{dst}$ref" and
+					push @$cmd, '--reference',
+						"$self->{dst}$ref";
+			}
 		}
-	}
-	$fgrp ? fgrp_fetch($fgrp, $fini) :
 		start_clone($self, $cmd, $opt, $fini);
-
+	}
 	if (!$self->{-is_epoch} && $lei->{opt}->{'inbox-config'} =~
 				/\A(?:always|v1)\z/s) {
 		_get_txt_start($self, '_/text/config/raw', $fini);
@@ -576,7 +630,6 @@ sub v1_done { # called via OnDestroy
 			ft_rename($ft, $fn, 0666, $fh);
 		}
 	}
-	pack_refs($self, $dst) if delete $self->{-do_pack_refs};
 	eval { set_description($self) };
 	warn $@ if $@;
 	return if ($self->{-is_epoch} ||
@@ -770,6 +823,7 @@ EOM
 			last; # restart %$todo iteration
 		}
 	}
+	fgrp_fetch_all($self);
 	do_reap($self, 1);
 }
 
@@ -793,6 +847,7 @@ sub try_manifest {
 	my $uri = URI->new($self->{src});
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+	$self->{-torsocks} //= $curl->torsocks($lei, $uri) or return;
 	my $path = $uri->path;
 	chop($path) eq '/' or die "BUG: $uri not canonicalized";
 	$uri->path($path . '/manifest.js.gz');
@@ -814,6 +869,7 @@ sub try_manifest {
 	return $lei->child_error(1, $multi) if !ref($multi);
 	my $v2 = delete $multi->{v2};
 	local $self->{todo} = {};
+	local $self->{fgrp_todo} = {}; # { objstore_dir => [fgrp, ...] }
 	if ($v2) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};

* [PATCH 58/95] clone: move --dry-run handling to lei_mirror
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (56 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 57/95] lei_mirror: forkgroups use `git fetch --multiple' Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 59/95] clone: drop unnecessary requires Eric Wong
                   ` (36 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

lei will probably support dry-run in more places, too.
---
 lib/PublicInbox/LeiMirror.pm | 1 +
 script/public-inbox-clone    | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 8b55a7da..2da4f881 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -938,6 +938,7 @@ sub start_clone_url {
 sub do_mirror { # via wq_io_do or public-inbox-clone
 	my ($self) = @_;
 	my $lei = $self->{lei};
+	$self->{dry_run} = 1 if $lei->{opt}->{'dry-run'};
 	umask($lei->{client_umask}) if defined $lei->{client_umask};
 	eval {
 		my $ic = $lei->{opt}->{'inbox-config'} //= 'always';
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 59f01b54..44626936 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -59,7 +59,6 @@ my $mrr = bless {
 }, 'PublicInbox::LeiMirror';
 
 $? = 0;
-$mrr->{dry_run} = 1 if $lei->{opt}->{'dry-run'};
 $mrr->do_mirror;
 $mrr->can('_wq_done_wait')->([$mrr, $lei], $$);
 exit(($lei->{child_error} // 0) >> 8);

* [PATCH 59/95] clone: drop unnecessary requires
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (57 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 58/95] clone: move --dry-run handling to lei_mirror Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 60/95] clone: use v5.12 Eric Wong
                   ` (35 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

These packages are all require-ed elsewhere.
---
 script/public-inbox-clone | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 44626936..a5651e5f 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -38,12 +38,9 @@ defined($dst) or ($dst) = ($url =~ m!/([^/]+)/?\z!);
 index($dst, "\n") >= 0 and die "`\\n' not allowed in `$dst'";
 
 # n.b. this is still a truckload of code...
-require URI;
 require PublicInbox::LEI;
 require PublicInbox::LeiExternal;
 require PublicInbox::LeiMirror;
-require PublicInbox::LeiCurl;
-require PublicInbox::Lock;
 
 $url = PublicInbox::LeiExternal::ext_canonicalize($url);
 my $lei = bless {

* [PATCH 60/95] clone: use v5.12
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (58 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 59/95] clone: drop unnecessary requires Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 61/95] clone: require `--objstore=' for default location Eric Wong
                   ` (34 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Another small step in what will probably be a decades-long
quest to reduce startup time by a few milliseconds.
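
For readers wondering why `use strict' can go away: perlfunc documents
that `use VERSION' with 5.11.0 or later enables strictures lexically.
A minimal sketch, not part of the patch; $undeclared is a made-up name
used only to trigger the error:

```perl
# Sketch: `use v5.12' implies strictures (see "use VERSION" in perlfunc).
# $undeclared is a hypothetical name, present only to trip strict vars.
my $ok = eval q{ use v5.12; $undeclared = 1; 1 };
my $err = $@;
print $ok ? "compiled\n" : "rejected: $err";
```

The string eval keeps the compile-time failure contained so the
surrounding script keeps running.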
---
 script/public-inbox-clone | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index a5651e5f..26f42e74 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -2,8 +2,7 @@
 # Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 # Wrapper to git clone remote public-inboxes
-use strict;
-use v5.10.1;
+use v5.12;
 use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
 my $opt = {};
 my $help = <<EOF; # the following should fit w/o scrolling in 80x24 term:

* [PATCH 61/95] clone: require `--objstore=' for default location
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (59 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 60/95] clone: use v5.12 Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:31 ` [PATCH 62/95] lei_mirror: shorten remote names Eric Wong
                   ` (33 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

Allowing just `--objstore' without `=' was confusing,
since it could eat one of the required parameters (URL or
DESTINATION).
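
A rough sketch of the resulting behavior, assuming the same
Getopt::Long configuration the script uses.  parse_objstore() and the
example.com URL are hypothetical and exist only for illustration:

```perl
use Getopt::Long qw(GetOptionsFromArray);
# same configuration flags as script/public-inbox-clone
Getopt::Long::Configure(qw(gnu_getopt no_ignore_case auto_abbrev));

# parse_objstore() is a hypothetical helper, not part of public-inbox
sub parse_objstore {
	my @argv = @_;
	my %opt;
	GetOptionsFromArray(\@argv, \%opt, 'objstore=s')
		or die "bad options\n";
	($opt{objstore}, @argv); # option value, then positional args
}

# `--objstore=' yields an empty string; URL and DESTINATION stay positional
my ($os, @rest) = parse_objstore(qw(--objstore= https://example.com/ dst));
print "objstore=`$os' rest=@rest\n";
```

With `=s', the value must be attached to the option (or follow it
explicitly), so the positional arguments are no longer at risk.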
---
 Documentation/public-inbox-clone.pod | 8 ++++----
 script/public-inbox-clone            | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index cee9f76e..257967d9 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -84,15 +84,15 @@ Force a remote public-inbox version (must be C<1> or C<2>).
 This is auto-detected by default, and this option exists mainly
 for testing.
 
-=item --objstore[=DIR]
+=item --objstore=DIR
 
 Enables space savings when the remote C<manifest.js.gz>
 includes C<forkgroup> entries as generated by grokmirror 2.x.
 
 If C<DIR> is not an absolute path, it is relative to the
-C<DESTINATION> directory.  If only C<--objstore> is specified
-without C<DIR>, then C<objstore> (C<$DESTINATION/objstore>)
-is the implied value of C<DIR>.
+C<DESTINATION> directory.  If only C<--objstore=> is specified
+where C<DIR> is an empty string (C<"">), then C<objstore>
+(C<$DESTINATION/objstore>) is the implied value of C<DIR>.
 
 =item -n
 
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 26f42e74..e38d7b0d 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -13,7 +13,7 @@ usage: public-inbox-clone INBOX_URL [DESTINATION]
 options:
 
   --epoch=RANGE       range of v2 epochs to clone (e.g `2..5', `~0', `~1..')
-  --objstore [DIR]    share storage for coderepos
+  --objstore=DIR      share storage for coderepos
   --torsocks VAL      whether or not to wrap git and curl commands with
                       torsocks (default: `auto')
                       Must be one of: `auto', `no' or `yes'
@@ -23,7 +23,7 @@ options:
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
-	inbox-config=s inbox-version=i objstore:s
+	inbox-config=s inbox-version=i objstore=s
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config

* [PATCH 62/95] lei_mirror: shorten remote names
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (60 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 61/95] clone: require `--objstore=' for default location Eric Wong
@ 2022-11-28  5:31 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 63/95] fetch: use v5.12 Eric Wong
                   ` (32 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:31 UTC (permalink / raw)
  To: meta

The lengthy-but-human-meaningful remote names are more expensive
at runtime and increase packed-refs space.
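
For illustration, a standalone sketch of the new naming scheme.  The
key below is made up; real keys come from manifest.js.gz:

```perl
use Digest::SHA qw(sha256_hex);

# remote_name() wraps the one-liner this patch introduces
sub remote_name { substr(sha256_hex($_[0]), 0, 16) }

my $key = '/example/repo.git'; # hypothetical manifest key
my $rn = remote_name($key);
# the refspec format matches what forkgroup_prep configures
print "remote.$rn.fetch=+refs/*:refs/remotes/$rn/*\n";
```

The name is always 16 hex characters, so every ref under
refs/remotes/ gets a fixed-length prefix regardless of the repo path.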
---
 lib/PublicInbox/LeiMirror.pm | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 2da4f881..04d9494c 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -438,11 +438,7 @@ sub forkgroup_prep {
 		}
 	}
 	my $key = $self->{-key} // die 'BUG: no -key';
-	my $rn = $key;
-	$rn =~ s!\A[\./]+!!s;
-	$rn =~ s/\.*?(?:\.git)?\.*?\z//s;
-	$rn =~ s![\@\{\}/:\?\[\]\^~\s\f[:cntrl:]\*]!_!isg;
-	$rn .= '-'.substr(sha256_hex($key), 0, 16);
+	my $rn = substr(sha256_hex($key), 0, 16);
 	unless ($self->{dry_run}) {
 		# --no-tags is required to avoid conflicts
 		for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*",

* [PATCH 63/95] fetch: use v5.12
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (61 preceding siblings ...)
  2022-11-28  5:31 ` [PATCH 62/95] lei_mirror: shorten remote names Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 64/95] fetch: eliminate File::Temp->filename var Eric Wong
                   ` (31 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Another tiny step towards improved startup performance by
avoiding one .pm file.
---
 lib/PublicInbox/Fetch.pm  | 3 +--
 script/public-inbox-fetch | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
index 3dbb0b55..d75e427b 100644
--- a/lib/PublicInbox/Fetch.pm
+++ b/lib/PublicInbox/Fetch.pm
@@ -2,8 +2,7 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 # Wrapper to "git fetch" remote public-inboxes
 package PublicInbox::Fetch;
-use strict;
-use v5.10.1;
+use v5.12;
 use parent qw(PublicInbox::IPC);
 use URI ();
 use PublicInbox::Spawn qw(popen_rd run_die spawn);
diff --git a/script/public-inbox-fetch b/script/public-inbox-fetch
index f9bac4e3..4b991a90 100755
--- a/script/public-inbox-fetch
+++ b/script/public-inbox-fetch
@@ -2,8 +2,7 @@
 # Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 # Wrapper to git fetch remote public-inboxes
-use strict;
-use v5.10.1;
+use v5.12;
 use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
 my $opt = {};
 my $help = <<EOF; # the following should fit w/o scrolling in 80x24 term:

* [PATCH 64/95] fetch: eliminate File::Temp->filename var
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (62 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 63/95] fetch: use v5.12 Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 65/95] lei_mirror: properly pack-refs in non-forkgroup repos Eric Wong
                   ` (30 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

File::Temp objects are overloaded to automatically
call ->filename when stringified, so there's no need
to store the ->filename result on the Perl stack.
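
A small sketch of the overload in question.  The template and suffix
are copied from do_manifest; TMPDIR => 1 replaces the DIR argument
only so the snippet runs anywhere:

```perl
use File::Temp ();

my $ft = File::Temp->new(TEMPLATE => 'm-XXXX', UNLINK => 1,
			SUFFIX => '.tmp', TMPDIR => 1);
my $same = ("$ft" eq $ft->filename); # overloaded stringification
my ($bn) = ("$ft" =~ m!/([^/]+)\z!); # basename, as in the patch
print "basename: $bn\n" if $same;
```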
---
 lib/PublicInbox/Fetch.pm | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
index d75e427b..198e2a60 100644
--- a/lib/PublicInbox/Fetch.pm
+++ b/lib/PublicInbox/Fetch.pm
@@ -48,7 +48,6 @@ sub do_manifest ($$$) {
 	my $muri = URI->new("$ibx_uri/manifest.js.gz");
 	my $ft = File::Temp->new(TEMPLATE => 'm-XXXX',
 				UNLINK => 1, DIR => $dir, SUFFIX => '.tmp');
-	my $fn = $ft->filename;
 	my $mf = "$dir/manifest.js.gz";
 	my $m0; # current manifest.js.gz contents
 	if (open my $fh, '<', $mf) {
@@ -57,7 +56,7 @@ sub do_manifest ($$$) {
 		};
 		warn($@) if $@;
 	}
-	my ($bn) = ($fn =~ m!/([^/]+)\z!);
+	my ($bn) = ($ft->filename =~ m!/([^/]+)\z!);
 	my $curl_cmd = $lei->{curl}->for_uri($lei, $muri, qw(-R -o), $bn);
 	my $opt = { -C => $dir };
 	$opt->{$_} = $lei->{$_} for (0..2);
@@ -68,7 +67,7 @@ sub do_manifest ($$$) {
 		return;
 	}
 	my $m1 = eval {
-		PublicInbox::LeiMirror::decode_manifest($ft, $fn, $muri);
+		PublicInbox::LeiMirror::decode_manifest($ft, $ft, $muri);
 	} or return [ 404, $muri ];
 	my $mdiff = { %$m1 };
 

* [PATCH 65/95] lei_mirror: properly pack-refs in non-forkgroup repos
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (63 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 64/95] fetch: eliminate File::Temp->filename var Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 66/95] lei_mirror: show child error error code Eric Wong
                   ` (29 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

We need to ensure `git update-ref --stdin' is complete
before running `git pack-refs', otherwise loose refs can
remain while update-ref is still running.
---
 lib/PublicInbox/LeiMirror.pm | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 04d9494c..4464b6b1 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -315,7 +315,12 @@ sub fgrp_update {
 		upr($lei, $w, 'create', $ref, $oid);
 	}
 	close($w) or warn "E: close(update-ref --stdin): $! (need git 1.8.5+)\n";
-	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd ];
+	my $pack = PublicInbox::OnDestroy->new($$, \&pack_dst, $fgrp);
+	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd, $pack ];
+}
+
+sub pack_dst { # packs lightweight satellite repos
+	my ($fgrp) = @_;
 	pack_refs($fgrp, $fgrp->{cur_dst});
 }
 

* [PATCH 66/95] lei_mirror: show child error error code
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (64 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 65/95] lei_mirror: properly pack-refs in non-forkgroup repos Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 67/95] on_destroy: support ->cancel callback Eric Wong
                   ` (28 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Just passing the exit value of the child process to our
parent process isn't very useful when multiple commands
are failing at once.
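
For anyone reading the new `$?=' value in the message, a sketch of how
the raw wait status decodes (bit layout per perlvar).
decode_wait_status() is a hypothetical helper, not part of this patch:

```perl
# decode_wait_status() is hypothetical; see `$?' in perlvar for the layout
sub decode_wait_status {
	my ($cerr) = @_;
	my $sig = $cerr & 127; # low 7 bits: terminating signal, if any
	return "killed by signal $sig" if $sig;
	'exited with status ' . ($cerr >> 8); # high byte: exit status
}

print "\$?=$_: ", decode_wait_status($_), "\n" for (256, 15, 32768);
```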
---
 lib/PublicInbox/LeiMirror.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 4464b6b1..8b276336 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -602,7 +602,7 @@ sub reap_cmd { # async, called via SIGCHLD
 	$? = 0; # don't let it influence normal exit
 	if ($cerr) {
 		kill('TERM', keys %$LIVE);
-		$self->{lei}->child_error($cerr, "@$cmd failed");
+		$self->{lei}->child_error($cerr, "@$cmd failed (\$?=$cerr)");
 	}
 }
 

* [PATCH 67/95] on_destroy: support ->cancel callback
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (65 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 66/95] lei_mirror: show child error error code Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 68/95] lei_mirror: support resuming multi-repo clones Eric Wong
                   ` (27 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

We probably use this idiom elsewhere, but having this method
around to make future use cases more readable is probably prudent.
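
The idiom, sketched with a stand-in class.  My::Guard is hypothetical
and omits the optional PID check of the real PublicInbox::OnDestroy;
only the cancel body is taken from this patch:

```perl
{
	package My::Guard; # hypothetical stand-in for illustration
	sub new { my ($class, @cb_args) = @_; bless [ @cb_args ], $class }
	sub cancel { @{$_[0]} = () } # same body as the patch
	sub DESTROY {
		my ($cb, @args) = @{$_[0]};
		$cb->(@args) if $cb; # nothing left to call after ->cancel
	}
}

my @log;
my $g = My::Guard->new(sub { push @log, "ran @_" }, 'fetch');
undef $g; # last reference dropped, callback fires
$g = My::Guard->new(sub { push @log, 'never' });
$g->cancel;
undef $g; # canceled, callback is skipped
print "@log\n";
```

Emptying the array in place means every holder of the reference sees
the cancellation, which is what a shared guard needs.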
---
 lib/PublicInbox/OnDestroy.pm | 5 ++++-
 t/on_destroy.t               | 8 ++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/OnDestroy.pm b/lib/PublicInbox/OnDestroy.pm
index 615bc450..d9a6cd24 100644
--- a/lib/PublicInbox/OnDestroy.pm
+++ b/lib/PublicInbox/OnDestroy.pm
@@ -1,13 +1,16 @@
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 package PublicInbox::OnDestroy;
+use v5.12;
 
 sub new {
 	shift; # ($class, $cb, @args)
 	bless [ @_ ], __PACKAGE__;
 }
 
+sub cancel { @{$_[0]} = () }
+
 sub DESTROY {
 	my ($cb, @args) = @{$_[0]};
 	if (!ref($cb) && $cb) {
diff --git a/t/on_destroy.t b/t/on_destroy.t
index 0de67d0b..e7945100 100644
--- a/t/on_destroy.t
+++ b/t/on_destroy.t
@@ -1,6 +1,5 @@
 #!perl -w
-use strict;
-use v5.10.1;
+use v5.12;
 use Test::More;
 require_ok 'PublicInbox::OnDestroy';
 my @x;
@@ -25,6 +24,11 @@ $od = PublicInbox::OnDestroy->new($$, sub { $tmp = $$ });
 undef $od;
 is($tmp, $$, '$tmp set to $$ by callback');
 
+$od = PublicInbox::OnDestroy->new($$, sub { $tmp = 'foo' });
+$od->cancel;
+$od = undef;
+isnt($tmp, 'foo', '->cancel');
+
 if (my $nr = $ENV{TEST_LEAK_NR}) {
 	for (0..$nr) {
 		$od = PublicInbox::OnDestroy->new(sub { @x = @_ }, qw(x y));

* [PATCH 68/95] lei_mirror: support resuming multi-repo clones
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (66 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 67/95] on_destroy: support ->cancel callback Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 69/95] lei_mirror: check fingerprints before fetching Eric Wong
                   ` (26 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

This is actually a combination of clone and fetch, and I don't
think `public-inbox-fetch' will be used to update multiple git
repos (inbox or not).

Our use of `git update-ref --stdin -z' was broken for
incremental updates, but is now fixed to properly NUL-terminate
commands.
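
To make the framing concrete: git-update-ref(1) documents the -z
forms as `update SP <ref> NUL <newvalue> NUL [<oldvalue>] NUL' and
`create SP <ref> NUL <newvalue> NUL'.  A sketch of the bytes the
reworked upr() emits; upr_cmd() is a hypothetical variant that returns
the string instead of writing to the pipe, and the object IDs are
placeholders:

```perl
# upr_cmd() is hypothetical: same formatting as upr(), minus the I/O
sub upr_cmd {
	my ($op, @rest) = @_; # @rest is ($ref, $oid[, $old_oid])
	"$op " . join("\0", @rest, ''); # every field ends with NUL
}

my ($new, $old) = ('a' x 40, 'b' x 40); # placeholder object IDs
my $update = upr_cmd('update', 'refs/heads/master', $new, $old);
my $create = upr_cmd('create', 'refs/heads/topic', $new);
(my $show = $update . $create) =~ s/\0/\\0/g; # make NULs visible
print "$show\n";
```

The old code always wrote two fields, which left `update' without
its trailing old-value field.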
---
 lib/PublicInbox/LeiMirror.pm | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 8b276336..f2111dd2 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -256,7 +256,7 @@ sub run_reap {
 	$ret;
 }
 
-sub start_clone {
+sub start_cmd {
 	my ($self, $cmd, $opt, $fini) = @_;
 	do_reap($self);
 	$self->{lei}->qerr("# @$cmd");
@@ -279,9 +279,9 @@ sub fetch_args ($$) {
 }
 
 sub upr { # feed `git update-ref --stdin -z' verbosely
-	my ($lei, $w, $op, $ref, $oid) = @_;
-	$lei->qerr("# $op $ref $oid") if $lei->{opt}->{verbose};
-	print $w "$op $ref\0$oid\0" or die "print(w): $!";
+	my ($lei, $w, $op, @rest) = @_; # ($ref, $oid) = @rest
+	$lei->qerr("# $op @rest") if $lei->{opt}->{verbose};
+	print $w "$op ", join("\0", @rest, '') or die "print(w): $!";
 }
 
 sub fgrp_update {
@@ -306,7 +306,8 @@ sub fgrp_update {
 		my $new = delete $src{$ref};
 		my $old = $dst{$ref};
 		if (defined $new) {
-			upr($lei, $w, 'update', $ref, $new) if $new ne $old;
+			$new eq $old or
+				upr($lei, $w, 'update', $ref, $new, $old);
 		} else {
 			upr($lei, $w, 'delete', $ref, $old);
 		}
@@ -482,6 +483,23 @@ EOM
 	bless { %$self, -osdir => $dir, -remote => $rn }, __PACKAGE__;
 }
 
+sub resume_fetch {
+	my ($self, $uri, $fini) = @_;
+	my $dst = $self->{cur_dst} // $self->{dst};
+	my @git = ('git', "--git-dir=$dst");
+	my $opt = +{ map { $_ => $self->{lei}->{$_} } (0..2) };
+	my $rn = 'origin'; # configurable?
+	for ("url=$uri", "fetch=+refs/*:refs/*", 'mirror=true') {
+		my @kv = split(/=/, $_, 2);
+		$kv[0] = "remote.$rn.$kv[0]";
+		next if $self->{dry_run};
+		run_die([@git, 'config', @kv], undef, $opt);
+	}
+	my $cmd = [ @{$self->{-torsocks}}, @git,
+			fetch_args($self->{lei}, $opt), $rn ];
+	start_cmd($self, $cmd, $opt, $fini);
+}
+
 sub clone_v1 {
 	my ($self, $nohang) = @_;
 	my $lei = $self->{lei};
@@ -495,7 +513,9 @@ sub clone_v1 {
 	if (my $fgrp = forkgroup_prep($self, $uri)) {
 		$fgrp->{-fini} = $fini;
 		push @{$self->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
-	} else { # normal fetch
+	} elsif (-d $dst) {
+		resume_fetch($self, $uri, $fini);
+	} else { # normal clone
 		my $cmd = [ @{$self->{-torsocks}},
 				clone_cmd($lei, my $opt = {}), "$uri", $dst ];
 		if (defined($self->{-ent})) {
@@ -505,7 +525,7 @@ sub clone_v1 {
 						"$self->{dst}$ref";
 			}
 		}
-		start_clone($self, $cmd, $opt, $fini);
+		start_cmd($self, $cmd, $opt, $fini);
 	}
 	if (!$self->{-is_epoch} && $lei->{opt}->{'inbox-config'} =~
 				/\A(?:always|v1)\z/s) {

* [PATCH 69/95] lei_mirror: check fingerprints before fetching
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (67 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 68/95] lei_mirror: support resuming multi-repo clones Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 70/95] clone: support loading manifest.js.gz from destination Eric Wong
                   ` (25 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

While we currently don't check an existing on-disk manifest,
using `git show-ref' can still save us precious network traffic.
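
The comparison done in fp_done boils down to the sketch below.
up_to_date() is a hypothetical helper, and the show-ref output is
canned with placeholder object IDs rather than read from a real repo:

```perl
use Digest::SHA qw(sha1_hex);

# up_to_date() is hypothetical; it mirrors the check in fp_done
sub up_to_date {
	my ($manifest_fp, $show_ref_out) = @_;
	$manifest_fp eq sha1_hex($show_ref_out);
}

# canned `git show-ref' output, placeholder object IDs
my $out = ('a' x 40) . " refs/heads/master\n" .
	('b' x 40) . " refs/tags/v1.0\n";
my $fp = sha1_hex($out); # stands in for the manifest "fingerprint"
print up_to_date($fp, $out) ? "skip fetch\n" : "fetch\n";
```

Any ref added, removed, or moved changes the show-ref output and
therefore the digest, so a match means there is nothing to fetch.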
---
 lib/PublicInbox/LeiMirror.pm | 54 +++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index f2111dd2..e744f06a 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -18,7 +18,7 @@ use PublicInbox::Config;
 use PublicInbox::Inbox;
 use PublicInbox::LeiCurl;
 use PublicInbox::OnDestroy;
-use Digest::SHA qw(sha256_hex);
+use Digest::SHA qw(sha256_hex sha1_hex);
 
 our $LIVE; # pid => callback
 
@@ -483,6 +483,36 @@ EOM
 	bless { %$self, -osdir => $dir, -remote => $rn }, __PACKAGE__;
 }
 
+sub fp_done {
+	my ($self, $go_fetch) = @_;
+	my $fh = delete $self->{-show_ref} // die 'BUG: no show-ref output';
+	seek($fh, 0, SEEK_SET) or die "seek(show_ref): $!";
+	$self->{-ent} // die 'BUG: no -ent';
+	my $A = $self->{-ent}->{fingerprint} // die 'BUG: no fingerprint';
+	my $B = sha1_hex(do { local $/; <$fh> } // die("read(show_ref): $!"));
+	return if $A ne $B; # $go_fetch->DESTROY fires
+	$go_fetch->cancel;
+	$self->{lei}->qerr("# $self->{-key} up-to-date");
+}
+
+sub cmp_fp_fetch {
+	my ($self, $go_fetch) = @_;
+	my $dst = $self->{cur_dst} // $self->{dst};
+	my $cmd = ['git', "--git-dir=$dst", 'show-ref'];
+	my $opt = { 2 => $self->{lei}->{2} };
+	open($opt->{1}, '+>', undef) or die "open(tmp): $!";
+	$self->{-show_ref} = $opt->{1};
+	my $done = PublicInbox::OnDestroy->new($$, \&fp_done, $self, $go_fetch);
+	start_cmd($self, $cmd, $opt, $done);
+}
+
+sub resume_fetch_maybe {
+	my ($self, $uri, $fini) = @_;
+	my $go_fetch = PublicInbox::OnDestroy->new($$, \&resume_fetch, @_);
+	cmp_fp_fetch($self, $go_fetch) if $self->{-ent} &&
+				defined($self->{-ent}->{fingerprint});
+}
+
 sub resume_fetch {
 	my ($self, $uri, $fini) = @_;
 	my $dst = $self->{cur_dst} // $self->{dst};
@@ -500,6 +530,19 @@ sub resume_fetch {
 	start_cmd($self, $cmd, $opt, $fini);
 }
 
+sub fgrp_enqueue_maybe {
+	my ($self, $fgrp) = @_;
+	my $enq = PublicInbox::OnDestroy->new($$, \&fgrp_enqueue, $self, $fgrp);
+	cmp_fp_fetch($self, $enq) if $self->{-ent} &&
+					defined($self->{-ent}->{fingerprint});
+	# $enq->DESTROY calls fgrp_enqueue otherwise
+}
+
+sub fgrp_enqueue {
+	my ($self, $fgrp) = @_;
+	push @{$self->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
+}
+
 sub clone_v1 {
 	my ($self, $nohang) = @_;
 	my $lei = $self->{lei};
@@ -510,11 +553,13 @@ sub clone_v1 {
 	$self->{-torsocks} //= $curl->torsocks($lei, $uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
+	my $resume = -d $dst;
 	if (my $fgrp = forkgroup_prep($self, $uri)) {
 		$fgrp->{-fini} = $fini;
-		push @{$self->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
-	} elsif (-d $dst) {
-		resume_fetch($self, $uri, $fini);
+		$resume ? fgrp_enqueue_maybe($self, $fgrp) :
+				fgrp_enqueue($self, $fgrp);
+	} elsif ($resume) {
+		resume_fetch_maybe($self, $uri, $fini);
 	} else { # normal clone
 		my $cmd = [ @{$self->{-torsocks}},
 				clone_cmd($lei, my $opt = {}), "$uri", $dst ];
@@ -844,6 +889,7 @@ EOM
 			last; # restart %$todo iteration
 		}
 	}
+	do_reap($self, 1); # finish all fingerprint checks
 	fgrp_fetch_all($self);
 	do_reap($self, 1);
 }

* [PATCH 70/95] clone: support loading manifest.js.gz from destination
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (68 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 69/95] lei_mirror: check fingerprints before fetching Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 71/95] lei_mirror: delay configuring forkgroups Eric Wong
                   ` (24 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

This will allow us to quickly check fingerprints against
remotes with a single HTTP(S) request, saving us numerous
`git show-ref' invocations.
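
The patch also unifies how `--objstore=' and `--manifest=' resolve
their paths.  A standalone sketch of that loop; resolve_paths() and
the /mirror and /srv/os paths are hypothetical:

```perl
# resolve_paths() is a hypothetical, standalone copy of the loop this
# patch adds to do_mirror
sub resolve_paths {
	my ($dst, $opt) = @_;
	my %ret;
	for my $default (qw(objstore manifest.js.gz)) {
		my ($k) = (split(/\./, $default))[0]; # objstore, manifest
		my $v = $opt->{$k} // next; # option not given at all
		$v = $default if $v eq ''; # --objstore= or --manifest=
		$v = "$dst/$v" if $v !~ m!\A/!; # relative to DESTINATION
		$ret{"-$k"} = $v;
	}
	\%ret;
}

my $r = resolve_paths('/mirror', { manifest => '', objstore => '/srv/os' });
print "$_ => $r->{$_}\n" for sort keys %$r;
```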
---
 Documentation/public-inbox-clone.pod | 10 ++++++++
 lib/PublicInbox/LeiMirror.pm         | 37 ++++++++++++++++++++++++----
 script/public-inbox-clone            |  2 +-
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 257967d9..9288b175 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -94,6 +94,16 @@ C<DESTINATION> directory.  If only C<--objstore=> is specified
 where C<DIR> is an empty string (C<"">), then C<objstore>
 (C<$DESTINATION/objstore>) is the implied value of C<DIR>.
 
+=item --manifest=FILE
+
+When incrementally updating an existing mirror, load the given
+manifest (typically C<manifest.js.gz>) to speed up updates.
+
+If C<FILE> is not an absolute path, it is relative to the
+C<DESTINATION> directory.  If only C<--manifest=> is specified
+where C<FILE> is an empty string (C<"">), then C<manifest.js.gz>
+(C<$DESTINATION/manifest.js.gz>) is the implied value of C<FILE>.
+
 =item -n
 
 =item --dry-run
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index e744f06a..51cc6d05 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -497,6 +497,13 @@ sub fp_done {
 
 sub cmp_fp_fetch {
 	my ($self, $go_fetch) = @_;
+	# $go_fetch is either resume_fetch or fgrp_enqueue
+	my $new = $self->{-ent}->{fingerprint} // die 'BUG: no fingerprint';
+	my $key = $self->{-key} // die 'BUG: no -key';
+	if (my $cur_ent = $self->{-local_manifest}->{$key}) {
+		# cancel $go_fetch (skipping the fetch) if fingerprints match
+		return $go_fetch->cancel if $cur_ent->{fingerprint} eq $new;
+	}
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $cmd = ['git', "--git-dir=$dst", 'show-ref'];
 	my $opt = { 2 => $self->{lei}->{2} };
@@ -677,7 +684,10 @@ sub v1_done { # called via OnDestroy
 	_write_inbox_config($self);
 	my $dst = $self->{cur_dst} // $self->{dst};
 	if (defined(my $o = $self->{-ent} ? $self->{-ent}->{owner} : undef)) {
-		run_die([qw(git config -f), "$dst/config", 'gitweb.owner', $o]);
+		my $key = $self->{-key} // die 'BUG: no -key';
+		my $cur = $self->{-local_manifest}->{$key}->{owner} // "\0";
+		$cur eq $o or run_die([qw(git config -f),
+					"$dst/config", 'gitweb.owner', $o]);
 	}
 	my $o = "$dst/objects";
 	if (open(my $fh, '<', my $fn = "$o/info/alternates")) {;
@@ -796,6 +806,19 @@ sub decode_manifest ($$$) {
 	$m;
 }
 
+sub load_current_manifest ($) {
+	my ($self) = @_;
+	my $fn = $self->{-manifest} // return;
+	if (open(my $fh, '<', $fn)) {
+		decode_manifest($fh, $fn, $fn);
+	} elsif ($!{ENOENT}) { # non-fatal, we can just do it slowly
+		warn "open($fn): $!\n";
+		undef;
+	} else {
+		die "open($fn): $!\n";
+	}
+}
+
 sub multi_inbox ($$$) {
 	my ($self, $path, $m) = @_;
 	my $incl = $self->{lei}->{opt}->{include};
@@ -932,6 +955,7 @@ sub try_manifest {
 		warn $@;
 		return try_scrape($self);
 	}
+	local $self->{-local_manifest} = load_current_manifest($self);
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
 	my $v2 = delete $multi->{v2};
@@ -1012,10 +1036,13 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 		$ic =~ /\A(?:v1|v2|always|never)\z/s or die <<"";
 --inbox-config must be one of `always', `v2', `v1', or `never'
 
-		if (defined(my $os = $lei->{opt}->{objstore})) {
-			$os = 'objstore' if $os eq ''; # --objstore w/o args
-			$os = "$self->{dst}/$os" if $os !~ m!\A/!;
-			$self->{-objstore} = $os;
+		# we support --objstore= and --manifest= with '' (empty string)
+		for my $default (qw(objstore manifest.js.gz)) {
+			my ($k) = (split(/\./, $default))[0];
+			my $v = $lei->{opt}->{$k} // next;
+			$v = $default if $v eq '';
+			$v = "$self->{dst}/$v" if $v !~ m!\A/!;
+			$self->{"-$k"} = $v;
 		}
 		local $LIVE;
 		my $iv = $lei->{opt}->{'inbox-version'} //
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index e38d7b0d..a11c6874 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -23,7 +23,7 @@ options:
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
-	inbox-config=s inbox-version=i objstore=s
+	inbox-config=s inbox-version=i objstore=s manifest=s
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config

* [PATCH 71/95] lei_mirror: delay configuring forkgroups
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (69 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 70/95] clone: support loading manifest.js.gz from destination Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 72/95] clone: canonicalize destination path from CLI Eric Wong
                   ` (23 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

When relying on `public-inbox-clone --manifest=', idempotent
`git config' invocations can take a considerable amount of
time.  We still configure inboxes idempotently since it
allows quickly changing URLs to mirrors, but we just defer
it until an update is actually needed.
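
For reference, a sketch of the deferred `git config' invocations.
remote_config_cmds() is hypothetical and only builds argument lists;
the directory, remote name and URL are placeholders:

```perl
# remote_config_cmds() is hypothetical: it builds the argv lists that
# fgrp_enqueue passes to run_die, without running anything
sub remote_config_cmds {
	my ($osdir, $rn, $uri) = @_;
	my @cmd = ('git', "--git-dir=$osdir", 'config');
	map {
		my @kv = split(/=/, $_, 2); # limit 2: values may contain `='
		$kv[0] = "remote.$rn.$kv[0]";
		[ @cmd, @kv ];
	} ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*", 'tagopt=--no-tags');
}

my @cmds = remote_config_cmds('/os/fg.git', '0123456789abcdef',
				'https://example.com/repo.git?x=y');
print "@$_\n" for @cmds;
```

That is three process spawns per repo, which adds up quickly across
a manifest with thousands of entries.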
---
 lib/PublicInbox/LeiMirror.pm | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 51cc6d05..0e8689ca 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -434,10 +434,10 @@ sub forkgroup_prep {
 	my $os = $self->{-objstore} // return;
 	my $fg = $self->{-ent}->{forkgroup} // return;
 	my $dir = "$os/$fg.git";
-	my @cmd = ('git', "--git-dir=$dir", 'config');
-	my $opt = +{ map { $_ => $self->{lei}->{$_} } (0..2) };
 	if (!-d $dir && !$self->{dry_run}) {
 		PublicInbox::Import::init_bare($dir);
+		my @cmd = ('git', "--git-dir=$dir", 'config');
+		my $opt = { 2 => $self->{lei}->{2} };
 		for ('repack.useDeltaIslands=true',
 				'pack.island=refs/remotes/([^/]+)/') {
 			run_die([@cmd, split(/=/, $_, 2)], undef, $opt);
@@ -445,15 +445,6 @@ sub forkgroup_prep {
 	}
 	my $key = $self->{-key} // die 'BUG: no -key';
 	my $rn = substr(sha256_hex($key), 0, 16);
-	unless ($self->{dry_run}) {
-		# --no-tags is required to avoid conflicts
-		for ("url=$uri", "fetch=+refs/*:refs/remotes/$rn/*",
-				'tagopt=--no-tags') {
-			my @kv = split(/=/, $_, 2);
-			$kv[0] = "remote.$rn.$kv[0]";
-			run_die([@cmd, @kv], undef, $opt);
-		}
-	}
 	if (!-d $self->{cur_dst} && !$self->{dry_run}) {
 		my $alt = File::Spec->rel2abs("$dir/objects");
 		PublicInbox::Import::init_bare($self->{cur_dst});
@@ -480,7 +471,9 @@ sub forkgroup_prep {
 EOM
 		close $fh or die "close($f): $!";
 	}
-	bless { %$self, -osdir => $dir, -remote => $rn }, __PACKAGE__;
+	bless {
+		%$self, -osdir => $dir, -remote => $rn, -uri => $uri
+	}, __PACKAGE__;
 }
 
 sub fp_done {
@@ -547,6 +540,17 @@ sub fgrp_enqueue_maybe {
 
 sub fgrp_enqueue {
 	my ($self, $fgrp) = @_;
+	my $opt = { 2 => $self->{lei}->{2} };
+	# --no-tags is required to avoid conflicts
+	my $u = $fgrp->{-uri} // die 'BUG: no {-uri}';
+	my $rn = $fgrp->{-remote} // die 'BUG: no {-remote}';
+	my @cmd = ('git', "--git-dir=$fgrp->{-osdir}", 'config');
+	for ("url=$u", "fetch=+refs/*:refs/remotes/$rn/*", 'tagopt=--no-tags') {
+		my @kv = split(/=/, $_, 2);
+		$kv[0] = "remote.$rn.$kv[0]";
+		$self->{dry_run} ? $self->{lei}->qerr("# @cmd @kv") :
+				run_die([@cmd, @kv], undef, $opt);
+	}
 	push @{$self->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
 }
 

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 72/95] clone: canonicalize destination path from CLI
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (70 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 71/95] lei_mirror: delay configuring forkgroups Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 73/95] clone|fetch: support passing --prune(-tags) to `git fetch' Eric Wong
                   ` (22 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

We'll probably save the destination path somewhere, so
ensure the path doesn't have redundant slashes and such.
---
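For illustration, a small sketch of what File::Spec->canonpath
does to a few made-up destination paths.  It is a purely lexical
cleanup and, on Unix-like systems, does not resolve `..'
components; the sample paths below are assumptions.

```perl
use strict;
use warnings;
use File::Spec;

# canonpath never touches the filesystem, it only rewrites the string
for my $dst ('inboxes//meta/', './mirror/./git', 'a/../b') {
	printf "%-16s => %s\n", $dst, File::Spec->canonpath($dst);
}
```
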
 script/public-inbox-clone | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index a11c6874..9a22fa21 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -37,6 +37,7 @@ defined($dst) or ($dst) = ($url =~ m!/([^/]+)/?\z!);
 index($dst, "\n") >= 0 and die "`\\n' not allowed in `$dst'";
 
 # n.b. this is still a truckload of code...
+require File::Spec;
 require PublicInbox::LEI;
 require PublicInbox::LeiExternal;
 require PublicInbox::LeiMirror;
@@ -51,7 +52,7 @@ open $lei->{3}, '.' or die "open . $!";
 my $mrr = bless {
 	lei => $lei,
 	src => $url,
-	dst => $dst,
+	dst => File::Spec->canonpath($dst),
 }, 'PublicInbox::LeiMirror';
 
 $? = 0;

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 73/95] clone|fetch: support passing --prune(-tags) to `git fetch'
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (71 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 72/95] clone: canonicalize destination path from CLI Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 74/95] lei_mirror: avoid needless FD passing Eric Wong
                   ` (21 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

We need to be able to get rid of removed branches and tags on
the remote.  --prune-tags is implied for non-objstore repos,
and incompatible with objstore repos.
---
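A rough sketch of the rule described above, assuming a
hypothetical `fetch_argv' helper and an `$is_objstore' flag which
do not exist in LeiMirror.pm.  `-p' and `-P' are the short forms
of `git fetch --prune' and `--prune-tags'.

```perl
use strict;
use warnings;

# Hypothetical helper: build a `git fetch' argv from CLI options
sub fetch_argv {
	my ($opt, $remote, $is_objstore) = @_;
	my @cmd = qw(git fetch);
	push @cmd, '-q' if $opt->{quiet};
	push @cmd, '-p' if $opt->{prune}; # --prune
	# --prune-tags is incompatible with objstore repos
	push @cmd, '-P' if $opt->{prune} && !$is_objstore;
	(@cmd, $remote);
}

print join(' ', fetch_argv({ prune => 1 }, 'origin', 0)), "\n";
print join(' ', fetch_argv({ prune => 1 }, 'abc123', 1)), "\n";
```
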
 Documentation/public-inbox-clone.pod | 7 +++++++
 Documentation/public-inbox-fetch.pod | 6 ++++++
 lib/PublicInbox/LeiMirror.pm         | 2 ++
 script/public-inbox-clone            | 1 +
 script/public-inbox-fetch            | 1 +
 5 files changed, 17 insertions(+)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 9288b175..bcf7dcc1 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -104,6 +104,13 @@ C<DESTINATION> directory.  If only C<--manifest => is specified
 where C<FILE > is an empty string (C<"">), then C<manifest.js.gz>
 (C<$DESTINATION/manifest.js.gz>) is the implied value of C<FILE>.
 
+=item -p
+
+=item --prune
+
+Pass the C<--prune> and C<--prune-tags> flags to L<git-fetch(1)>
+calls on incremental clones.
+
 =item -n
 
 =item --dry-run
diff --git a/Documentation/public-inbox-fetch.pod b/Documentation/public-inbox-fetch.pod
index c78ffc0b..c5e73d38 100644
--- a/Documentation/public-inbox-fetch.pod
+++ b/Documentation/public-inbox-fetch.pod
@@ -61,6 +61,12 @@ there are no updates:
 	public-inbox-fetch -q --exit-code && public-inbox-index
 	test $? -eq 0 || exit $?
 
+=item -p
+
+=item --prune
+
+Pass the C<--prune> and C<--prune-tags> flags to L<git-fetch(1)> calls.
+
 =item -v
 
 =item --verbose
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 0e8689ca..2473c74b 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -275,6 +275,7 @@ sub fetch_args ($$) {
 	push @cmd, '-q' if $lei->{opt}->{quiet} ||
 			($lei->{opt}->{jobs} // 1) > 1;
 	push @cmd, '-v' if $lei->{opt}->{verbose};
+	push(@cmd, '-p') if $lei->{opt}->{prune};
 	@cmd;
 }
 
@@ -527,6 +528,7 @@ sub resume_fetch {
 	}
 	my $cmd = [ @{$self->{-torsocks}}, @git,
 			fetch_args($self->{lei}, $opt), $rn ];
+	push @$cmd, '-P' if $self->{lei}->{prune}; # --prune-tags implied
 	start_cmd($self, $cmd, $opt, $fini);
 }
 
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index 9a22fa21..df9ddd37 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -24,6 +24,7 @@ options:
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
 	inbox-config=s inbox-version=i objstore=s manifest=s
+	prune|p
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config
diff --git a/script/public-inbox-fetch b/script/public-inbox-fetch
index 4b991a90..6fd15328 100755
--- a/script/public-inbox-fetch
+++ b/script/public-inbox-fetch
@@ -23,6 +23,7 @@ options:
     -C DIR            chdir to specified directory
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ try-remote|T=s@
+	prune|p
 	no-torsocks torsocks=s exit-code)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Fetch; # loads Admin

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 74/95] lei_mirror: avoid needless FD passing
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (72 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 73/95] clone|fetch: support passing --prune(-tags) to `git fetch' Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 75/95] clone: support --keep-going/-k like make(1) Eric Wong
                   ` (20 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Most git processes we invoke don't care about stdin nor stdout,
so don't waste cycles and memory dealing with it.

stderr passing is added to the `git config --unset-all remotes.fgrptmp'
invocation, though, since that can fail due to I/O errors or OOM.
---
 lib/PublicInbox/LeiMirror.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 2473c74b..7fcb4ebb 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -300,7 +300,7 @@ sub fgrp_update {
 		qw(update-ref --stdin -z) ];
 	my $lei = $fgrp->{lei};
 	$lei->qerr("# @$cmd");
-	my $opt = { 0 => $r, 1 => $lei->{1}, 2 => $lei->{2} };
+	my $opt = { 0 => $r, 2 => $lei->{2} };
 	my $pid = spawn($cmd, undef, $opt);
 	close $r or die "close(r): $!";
 	for my $ref (keys %dst) {
@@ -332,7 +332,7 @@ sub pack_refs {
 	my $cmd = [ 'git', "--git-dir=$git_dir", qw(pack-refs --all --prune) ];
 	$self->{lei}->qerr("# @$cmd");
 	return if $self->{dry_run};
-	my $opt = { 1 => $self->{lei}->{1}, 2 => $self->{lei}->{2} };
+	my $opt = { 2 => $self->{lei}->{2} };
 	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $self, $cmd ];
 }
 
@@ -344,7 +344,7 @@ sub fgrpv_done {
 	pack_refs($first, $first->{-osdir}); # objstore refs always packed
 	for my $fgrp (@$fgrpv) {
 		my $rn = $fgrp->{-remote};
-		my %opt = map { $_ => $fgrp->{lei}->{$_} } (0..2);
+		my %opt = ( 2 => $fgrp->{lei}->{2} );
 
 		my $update_ref = $fgrp->{dry_run} ? undef :
 			PublicInbox::OnDestroy->new($$, \&fgrp_update, $fgrp);
@@ -401,7 +401,7 @@ sub fgrp_fetch_all {
 				$f, '--unset-all', "remotes.$grp"];
 		$self->{lei}->qerr("# @$cmd");
 		if (!$self->{dry_run}) {
-			$pid = spawn($cmd);
+			$pid = spawn($cmd, undef, { 2 => $self->{lei}->{2} });
 			waitpid($pid, 0) // die "waitpid: $!";
 			die "E: @$cmd: \$?=$?" if ($? && ($? >> 8) != 5);
 
@@ -518,7 +518,7 @@ sub resume_fetch {
 	my ($self, $uri, $fini) = @_;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my @git = ('git', "--git-dir=$dst");
-	my $opt = +{ map { $_ => $self->{lei}->{$_} } (0..2) };
+	my $opt = { 2 => $self->{lei}->{2} };
 	my $rn = 'origin'; # configurable?
 	for ("url=$uri", "fetch=+refs/*:refs/*", 'mirror=true') {
 		my @kv = split(/=/, $_, 2);

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 75/95] clone: support --keep-going/-k like make(1)
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (73 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 74/95] lei_mirror: avoid needless FD passing Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 76/95] lei_mirror: don't warn on missing manifest on initial clone Eric Wong
                   ` (19 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

This can be useful for intermittent network errors,
and the required code changes make it less dependent
on global state.
---
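The predicate can be exercised standalone with plain hashes
standing in for the real objects.  This is a sketch: the `lei',
`opt' and `child_error' keys follow the patch below, everything
else (the sample hashes, the output strings) is made up.

```perl
use strict;
use warnings;

# stand-in for the pid => callback map; false means we are done
our $LIVE = {};

sub keep_going {
	my ($self) = @_;
	$LIVE && (!$self->{lei}->{child_error} ||
		$self->{lei}->{opt}->{'keep-going'});
}

for my $lei ({ opt => {} },
		{ opt => {}, child_error => 1 << 8 },
		{ opt => { 'keep-going' => 1 }, child_error => 1 << 8 }) {
	print keep_going({ lei => $lei }) ? "continue\n" : "stop\n";
}
```
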
 Documentation/public-inbox-clone.pod |  6 ++++++
 lib/PublicInbox/LeiMirror.pm         | 30 ++++++++++++++++++----------
 script/public-inbox-clone            |  2 +-
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index bcf7dcc1..5e6a6fe9 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -111,6 +111,12 @@ where C<FILE > is an empty string (C<"">), then C<manifest.js.gz>
 Pass the C<--prune> and C<--prune-tags> flags to L<git-fetch(1)>
 calls on incremental clones.
 
+=item -k
+
+=item --keep-going
+
+Continue as much as possible after an error.
+
 =item -n
 
 =item --dry-run
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 7fcb4ebb..8cd64b65 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -22,6 +22,11 @@ use Digest::SHA qw(sha256_hex sha1_hex);
 
 our $LIVE; # pid => callback
 
+sub keep_going ($) {
+	$LIVE && (!$_[0]->{lei}->{child_error} ||
+		$_[0]->{lei}->{opt}->{'keep-going'});
+}
+
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
 	my ($mrr, $lei) = @$arg;
@@ -287,6 +292,7 @@ sub upr { # feed `git update-ref --stdin -z' verbosely
 
 sub fgrp_update {
 	my ($fgrp) = @_;
+	return if !keep_going($fgrp);
 	my $srcfh = delete $fgrp->{srcfh} or return;
 	my $dstfh = delete $fgrp->{dstfh} or return;
 	seek($srcfh, SEEK_SET, 0) or die "seek(src): $!";
@@ -329,6 +335,7 @@ sub pack_dst { # packs lightweight satellite repos
 sub pack_refs {
 	my ($self, $git_dir) = @_;
 	do_reap($self);
+	return if !keep_going($self);
 	my $cmd = [ 'git', "--git-dir=$git_dir", qw(pack-refs --all --prune) ];
 	$self->{lei}->qerr("# @$cmd");
 	return if $self->{dry_run};
@@ -341,6 +348,7 @@ sub fgrpv_done {
 	return if !$LIVE;
 	my $pid;
 	my $first = $fgrpv->[0] // die 'BUG: no fgrpv->[0]';
+	return if !keep_going($first);
 	pack_refs($first, $first->{-osdir}); # objstore refs always packed
 	for my $fgrp (@$fgrpv) {
 		my $rn = $fgrp->{-remote};
@@ -479,6 +487,7 @@ EOM
 
 sub fp_done {
 	my ($self, $go_fetch) = @_;
+	return if !keep_going($self);
 	my $fh = delete $self->{-show_ref} // die 'BUG: no show-ref output';
 	seek($fh, SEEK_SET, 0) or die "seek(show_ref): $!";
 	$self->{-ent} // die 'BUG: no -ent';
@@ -516,6 +525,7 @@ sub resume_fetch_maybe {
 
 sub resume_fetch {
 	my ($self, $uri, $fini) = @_;
+	return if !keep_going($self);
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my @git = ('git', "--git-dir=$dst");
 	my $opt = { 2 => $self->{lei}->{2} };
@@ -542,6 +552,7 @@ sub fgrp_enqueue_maybe {
 
 sub fgrp_enqueue {
 	my ($self, $fgrp) = @_;
+	return if !keep_going($self);
 	my $opt = { 2 => $self->{lei}->{2} };
 	# --no-tags is required to avoid conflicts
 	my $u = $fgrp->{-uri} // die 'BUG: no {-uri}';
@@ -678,15 +689,12 @@ sub reap_cmd { # async, called via SIGCHLD
 	my ($self, $cmd) = @_;
 	my $cerr = $?;
 	$? = 0; # don't let it influence normal exit
-	if ($cerr) {
-		kill('TERM', keys %$LIVE);
-		$self->{lei}->child_error($cerr, "@$cmd failed (\$?=$cerr)");
-	}
+	$self->{lei}->child_error($cerr, "@$cmd failed (\$?=$cerr)") if $cerr;
 }
 
 sub v1_done { # called via OnDestroy
 	my ($self) = @_;
-	return if $self->{dry_run} || !$LIVE;
+	return if $self->{dry_run} || !keep_going($self);
 	_write_inbox_config($self);
 	my $dst = $self->{cur_dst} // $self->{dst};
 	if (defined(my $o = $self->{-ent} ? $self->{-ent}->{owner} : undef)) {
@@ -722,7 +730,7 @@ sub v1_done { # called via OnDestroy
 
 sub v2_done { # called via OnDestroy
 	my ($self) = @_;
-	return if $self->{dry_run} || !$LIVE;
+	return if $self->{dry_run} || !keep_going($self);
 	my $dst = $self->{cur_dst} // $self->{dst};
 	require PublicInbox::Lock;
 	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
@@ -896,7 +904,7 @@ sub clone_all {
 	# handle no-dependency repos, first
 	for (@$nodep) {
 		clone_v1($_, 1);
-		return if $self->{lei}->{child_error};
+		return if !keep_going($self);
 	}
 	# resolve references, deepest, first:
 	while (scalar keys %$todo) {
@@ -913,7 +921,7 @@ EOM
 			my $y = delete $todo->{$x} // next; # already done
 			for (@$y) {
 				clone_v1($_, 1);
-				return if $self->{lei}->{child_error};
+				return if !keep_going($self);
 			}
 			last; # restart %$todo iteration
 		}
@@ -989,7 +997,7 @@ sub try_manifest {
 E: `$self->{cur_dst}' must not contain newline
 EOM
 			clone_v2_prep($self, \%v2_epochs, $m);
-			return if $self->{lei}->{child_error};
+			return if !keep_going($self);
 		}
 	}
 	if (my $v1 = delete $multi->{v1}) {
@@ -1018,7 +1026,7 @@ EOM
 	}
 	delete local $lei->{opt}->{epoch} if defined($v2);
 	clone_all($self, $m);
-	return if $self->{lei}->{child_error} || $self->{dry_run};
+	return if $self->{dry_run} || !keep_going($self);
 
 	# set by clone_v2_prep/-I/--exclude
 	dump_manifest($m => $ft) if delete $self->{-culled_manifest};
@@ -1050,7 +1058,7 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 			$v = "$self->{dst}/$v" if $v !~ m!\A/!;
 			$self->{"-$k"} = $v;
 		}
-		local $LIVE;
+		local $LIVE = {};
 		my $iv = $lei->{opt}->{'inbox-version'} //
 			return start_clone_url($self);
 		return clone_v1($self) if $iv == 1;
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index df9ddd37..efe0cff6 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -24,7 +24,7 @@ options:
 EOF
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
 	inbox-config=s inbox-version=i objstore=s manifest=s
-	prune|p
+	prune|p keep-going|k
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };
 require PublicInbox::Admin; # loads Config

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 76/95] lei_mirror: don't warn on missing manifest on initial clone
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (74 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 75/95] clone: support --keep-going/-k like make(1) Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 77/95] lei_mirror: respect `./' and `../' prefixes for CLI args Eric Wong
                   ` (18 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Users may choose to specify a manifest on the initial clone,
so don't complain if it's missing in that case.
---
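A minimal sketch of the open/ENOENT pattern involved, assuming a
hypothetical `slurp_optional' helper and an `$initial_clone' flag
supplied by the caller; neither is part of LeiMirror.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: a missing file is non-fatal, and only worth
# a warning when we expected the file to exist already
sub slurp_optional {
	my ($fn, $initial_clone) = @_;
	if (open(my $fh, '<', $fn)) {
		local $/; # slurp mode
		return scalar <$fh>;
	} elsif ($!{ENOENT}) {
		warn "open($fn): $!\n" if !$initial_clone;
		return undef;
	}
	die "open($fn): $!\n"; # EACCES, EIO, etc. are still fatal
}

my $m = slurp_optional('/nonexistent/manifest.js.gz', 1);
print defined($m) ? "loaded\n" : "no local manifest, continuing\n";
```
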
 lib/PublicInbox/LeiMirror.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 8cd64b65..e0a212de 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -826,7 +826,7 @@ sub load_current_manifest ($) {
 	if (open(my $fh, '<', $fn)) {
 		decode_manifest($fh, $fn, $fn);
 	} elsif ($!{ENOENT}) { # non-fatal, we can just do it slowly
-		warn "open($fn): $!\n";
+		warn "open($fn): $!\n" if -d $self->{dst};
 		undef;
 	} else {
 		die "open($fn): $!\n";

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 77/95] lei_mirror: respect `./' and `../' prefixes for CLI args
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (75 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 76/95] lei_mirror: don't warn on missing manifest on initial clone Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 78/95] lei_mirror: --manifest= affects destination, too Eric Wong
                   ` (17 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Users may wish to keep objstore and manifest files at
a higher level to prevent direct access via HTTP(S),
so those relative paths probably make sense.
---
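To show which values the new pattern leaves alone, here is a
small sketch using the same regexp as the patch below.
`resolve_opt_path' and the sample paths are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the check from do_mirror
sub resolve_opt_path {
	my ($dst, $v) = @_;
	# `/', `./' and `../' prefixes are respected as-is
	$v = "$dst/$v" if $v !~ m!\A\.{0,2}/!;
	$v;
}

for my $v (qw(objstore /srv/objstore ./objstore ../objstore .hidden/o)) {
	printf "%-16s => %s\n", $v, resolve_opt_path('mirror', $v);
}
```

Note a name which merely starts with a dot (e.g. `.hidden/o') is
still treated as relative to the destination.
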
 Documentation/public-inbox-clone.pod | 12 ++++++------
 lib/PublicInbox/LeiMirror.pm         |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 5e6a6fe9..9bcb9967 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -89,9 +89,9 @@ for testing.
 Enables space savings when the remote C<manifest.js.gz>
 includes C<forkgroup> entries as generated by grokmirror 2.x.
 
-If C<DIR> is not an absolute path, it is relative to the
-C<DESTINATION> directory.  If only C<--objstore=> is specified
-where C<DIR> is an empty string (C<"">), then C<objstore>
+If C<DIR> does not start with C</>, C<./>, or C<../>, it is treated
+as relative to the C<DESTINATION> directory.  If only C<--objstore=>
+is specified where C<DIR> is an empty string (C<"">), then C<objstore>
 (C<$DESTINATION/objstore>) is the implied value of C<DIR>.
 
 =item --manifest=FILE
@@ -99,9 +99,9 @@ where C<DIR> is an empty string (C<"">), then C<objstore>
 When incrementally updating an existing mirror, load the given
 manifest (typically C<manifest.js.gz>) to speed up updates.
 
-If C<FILE> is not an absolute path, it is relative to the
-C<DESTINATION> directory.  If only C<--manifest => is specified
-where C<FILE > is an empty string (C<"">), then C<manifest.js.gz>
+If C<FILE> does not start with C</>, C<./>, or C<../>, it is treated
+as relative to the C<DESTINATION> directory.  If only C<--manifest=>
+is specified where C<FILE> is an empty string (C<"">), then C<manifest.js.gz>
 (C<$DESTINATION/manifest.js.gz>) is the implied value of C<FILE>.
 
 =item -p
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index e0a212de..ec2ed557 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -1055,7 +1055,7 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 			my ($k) = (split(/\./, $default))[0];
 			my $v = $lei->{opt}->{$k} // next;
 			$v = $default if $v eq '';
-			$v = "$self->{dst}/$v" if $v !~ m!\A/!;
+			$v = "$self->{dst}/$v" if $v !~ m!\A\.{0,2}/!;
 			$self->{"-$k"} = $v;
 		}
 		local $LIVE = {};

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 78/95] lei_mirror: --manifest= affects destination, too
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (76 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 77/95] lei_mirror: respect `./' and `../' prefixes for CLI args Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 79/95] lei_mirror: update fingerprints when writing local manifest.js.gz Eric Wong
                   ` (16 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

This probably makes the most sense: if a user wants to
read the manifest from an alternate path, they likely
want to write it there, too.
---
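A self-contained sketch of the tempfile placement used below:
creating the temporary file in the same directory as the final
manifest lets rename(2) succeed without EXDEV.  `tmp_for' is a
hypothetical name, and the real code goes through ft_rename
rather than a bare rename().

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical helper: pick the tempfile directory from $manifest
sub tmp_for {
	my ($manifest) = @_;
	my %opt = (UNLINK => 1, SUFFIX => '.tmp', TMPDIR => 1);
	if ($manifest =~ m!\A(.+?)/[^/]+\z! and -d $1) {
		$opt{DIR} = $1; # same FS as the final destination
		delete $opt{TMPDIR};
	}
	File::Temp->new(TEMPLATE => '.manifest-XXXX', %opt);
}

my $dir = File::Temp->newdir; # scratch directory for the demo
my $manifest = "$dir/manifest.js.gz";
my $ft = tmp_for($manifest);
print $ft "placeholder contents\n";
close($ft) or die "close: $!";
rename($ft->filename, $manifest) or die "rename: $!";
$ft->unlink_on_destroy(0); # the old name is gone after rename
print "wrote $manifest\n";
```
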
 Documentation/public-inbox-clone.pod |  4 ++++
 lib/PublicInbox/LeiMirror.pm         | 16 +++++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 9bcb9967..2a7081ac 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -99,6 +99,10 @@ is specified where C<DIR> is an empty string (C<"">), then C<objstore>
 When incrementally updating an existing mirror, load the given
 manifest (typically C<manifest.js.gz>) to speed up updates.
 
+By default, public-inbox writes the retrieved manifest to
+C<$DESTINATION/manifest.js.gz>, this directive also
+changes the destination to the specified C<FILE>
+
 If C<FILE> does not start with C</>, C<./>, or C<../>, it is treated
 as relative to the C<DESTINATION> directory.  If only C<--manifest=>
 is specified where C<FILE> is an empty string (C<"">), then C<manifest.js.gz>
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index ec2ed557..2e94a2fa 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -826,7 +826,7 @@ sub load_current_manifest ($) {
 	if (open(my $fh, '<', $fn)) {
 		decode_manifest($fh, $fn, $fn);
 	} elsif ($!{ENOENT}) { # non-fatal, we can just do it slowly
-		warn "open($fn): $!\n" if -d $self->{dst};
+		warn "open($fn): $!\n" if !$self->{-initial_clone};
 		undef;
 	} else {
 		die "open($fn): $!\n";
@@ -955,10 +955,15 @@ sub try_manifest {
 	my $path = $uri->path;
 	chop($path) eq '/' or die "BUG: $uri not canonicalized";
 	$uri->path($path . '/manifest.js.gz');
-	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX',
-				UNLINK => 1, TMPDIR => 1, SUFFIX => '.tmp');
+	my $manifest = $self->{-manifest} // "$self->{dst}/manifest.js.gz";
+	my %opt = (UNLINK => 1, SUFFIX => '.tmp', TMPDIR => 1);
+	if (!$self->{dry_run} && $manifest =~ m!\A(.+?)/[^/]+\z! and -d $1) {
+		$opt{DIR} = $1; # allows fast rename(2) w/o EXDEV
+		delete $opt{TMPDIR};
+	}
+	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX', %opt);
 	my $cmd = $curl->for_uri($lei, $uri, qw(-f -R -o), $ft->filename);
-	my %opt = map { $_ => $lei->{$_} } (0..2);
+	%opt = map { $_ => $lei->{$_} } (0..2);
 	my $cerr = run_reap($lei, $cmd, \%opt);
 	if ($cerr) {
 		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
@@ -1030,7 +1035,7 @@ EOM
 
 	# set by clone_v2_prep/-I/--exclude
 	dump_manifest($m => $ft) if delete $self->{-culled_manifest};
-	ft_rename($ft, "$self->{dst}/manifest.js.gz", 0666);
+	ft_rename($ft, $manifest, 0666);
 	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }
 
@@ -1045,6 +1050,7 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 	my $lei = $self->{lei};
 	$self->{dry_run} = 1 if $lei->{opt}->{'dry-run'};
 	umask($lei->{client_umask}) if defined $lei->{client_umask};
+	$self->{-initial_clone} = 1 if !-d $self->{dst};
 	eval {
 		my $ic = $lei->{opt}->{'inbox-config'} //= 'always';
 		$ic =~ /\A(?:v1|v2|always|never)\z/s or die <<"";

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 79/95] lei_mirror: update fingerprints when writing local manifest.js.gz
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (77 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 78/95] lei_mirror: --manifest= affects destination, too Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 80/95] lei_mirror: remove janky mirror.done stamp file Eric Wong
                   ` (15 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

We need our local manifest to match the actual data we store,
not what we're mirroring.
---
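As the diff below shows, the fingerprint is the SHA-1 of the
`git show-ref' output.  A minimal sketch of the comparison,
assuming a hypothetical `fp_changed' helper and a made-up
show-ref line in place of real command output:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: update $ent->{fingerprint} from local refs
# and report whether the manifest entry changed
sub fp_changed {
	my ($ent, $show_ref_out) = @_;
	my $new = sha1_hex($show_ref_out);
	return 0 if ($ent->{fingerprint} // '') eq $new;
	$ent->{fingerprint} = $new;
	1;
}

my $ent = { fingerprint => 'stale value copied from the remote' };
my $out = "0123456789abcdef0123456789abcdef01234567 refs/heads/master\n";
print fp_changed($ent, $out) ? "manifest needs rewriting\n"
				: "unchanged\n";
```
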
 lib/PublicInbox/LeiMirror.pm | 70 +++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 2e94a2fa..b982f919 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -692,17 +692,46 @@ sub reap_cmd { # async, called via SIGCHLD
 	$self->{lei}->child_error($cerr, "@$cmd failed (\$?=$cerr)") if $cerr;
 }
 
+sub up_fp_done {
+	my ($self) = @_;
+	return if !keep_going($self);
+	my $fh = delete $self->{-show_ref_up} // die 'BUG: no show-ref output';
+	seek($fh, SEEK_SET, 0) or die "seek(show_ref): $!";
+	$self->{-ent} // die 'BUG: no -ent';
+	my $A = $self->{-ent}->{fingerprint} // die 'BUG: no fingerprint';
+	my $B = sha1_hex(do { local $/; <$fh> } // die("read(show_ref): $!"));
+	return if $A eq $B;
+	$self->{-ent}->{fingerprint} = $B;
+	push @{$self->{chg}->{fp_mismatch}}, $self->{-key};
+}
+
+sub update_ent {
+	my ($self) = @_;
+	my $key = $self->{-key} // die 'BUG: no -key';
+	my $new = $self->{-ent}->{fingerprint};
+	my $cur = $self->{-local_manifest}->{$key}->{fingerprint} // "\0";
+	my $dst = $self->{cur_dst} // $self->{dst};
+	if (defined($new) && $new ne $cur) {
+		my $cmd = ['git', "--git-dir=$dst", 'show-ref'];
+		my $opt = { 2 => $self->{lei}->{2} };
+		open($opt->{1}, '+>', undef) or die "open(tmp): $!";
+		$self->{-show_ref_up} = $opt->{1};
+		my $done = PublicInbox::OnDestroy->new($$, \&up_fp_done, $self);
+		start_cmd($self, $cmd, $opt, $done);
+	}
+	$new = $self->{-ent}->{owner} // return;
+	$cur = $self->{-local_manifest}->{$key}->{owner} // "\0";
+	return if $cur eq $new;
+	my $cmd = [ qw(git config -f), "$dst/config", 'gitweb.owner', $new ];
+	start_cmd($self, $cmd, { 2 => $self->{lei}->{2} });
+}
+
 sub v1_done { # called via OnDestroy
 	my ($self) = @_;
 	return if $self->{dry_run} || !keep_going($self);
 	_write_inbox_config($self);
 	my $dst = $self->{cur_dst} // $self->{dst};
-	if (defined(my $o = $self->{-ent} ? $self->{-ent}->{owner} : undef)) {
-		my $key = $self->{-key} // die 'BUG: no -key';
-		my $cur = $self->{-local_manifest}->{$key}->{owner} // "\0";
-		$cur eq $o or run_die([qw(git config -f),
-					"$dst/config", 'gitweb.owner', $o]);
-	}
+	update_ent($self) if $self->{-ent};
 	my $o = "$dst/objects";
 	if (open(my $fh, '<', my $fn = "$o/info/alternates")) {;
 		my $base = File::Spec->rel2abs($o);
@@ -797,7 +826,7 @@ failed to extract epoch number from $src
 		}
 	}
 	# filter out the epochs we skipped
-	$self->{-culled_manifest} = 1 if $m && delete(@$m{@skip});
+	$self->{chg}->{manifest} = 1 if $m && delete(@$m{@skip});
 
 	(!$self->{dry_run} && !-d $dst) and File::Path::mkpath($dst);
 
@@ -854,8 +883,8 @@ sub multi_inbox ($$$) {
 				$self->{lei}->glob2re($_) // qr/\A\Q$_\E/
 			} @$incl).'\\z)';
 		my @gone = delete @$v2{grep(!/$re/, keys %$v2)};
-		delete @$m{map { @$_ } @gone} and $self->{-culled_manifest} = 1;
-		delete @$m{grep(!/$re/, @v1)} and $self->{-culled_manifest} = 1;
+		delete @$m{map { @$_ } @gone} and $self->{chg}->{manifest} = 1;
+		delete @$m{grep(!/$re/, @v1)} and $self->{chg}->{manifest} = 1;
 		@v1 = grep(/$re/, @v1);
 	}
 	if (defined $excl) {
@@ -863,8 +892,8 @@ sub multi_inbox ($$$) {
 				$self->{lei}->glob2re($_) // qr/\A\Q$_\E/
 			} @$excl).'\\z)';
 		my @gone = delete @$v2{grep(/$re/, keys %$v2)};
-		delete @$m{map { @$_ } @gone} and $self->{-culled_manifest} = 1;
-		delete @$m{grep(/$re/, @v1)} and $self->{-culled_manifest} = 1;
+		delete @$m{map { @$_ } @gone} and $self->{chg}->{manifest} = 1;
+		delete @$m{grep(/$re/, @v1)} and $self->{chg}->{manifest} = 1;
 		@v1 = grep(!/$re/, @v1);
 	}
 	my $ret; # { v1 => [ ... ], v2 => { "/$inbox_name" => [ epochs ] }}
@@ -963,6 +992,7 @@ sub try_manifest {
 	}
 	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX', %opt);
 	my $cmd = $curl->for_uri($lei, $uri, qw(-f -R -o), $ft->filename);
+	my $mf_url = "$uri";
 	%opt = map { $_ => $lei->{$_} } (0..2);
 	my $cerr = run_reap($lei, $cmd, \%opt);
 	if ($cerr) {
@@ -974,6 +1004,7 @@ sub try_manifest {
 		warn $@;
 		return try_scrape($self);
 	}
+	local $self->{chg} = {};
 	local $self->{-local_manifest} = load_current_manifest($self);
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
@@ -1034,7 +1065,22 @@ EOM
 	return if $self->{dry_run} || !keep_going($self);
 
 	# set by clone_v2_prep/-I/--exclude
-	dump_manifest($m => $ft) if delete $self->{-culled_manifest};
+	my $mis = delete $self->{chg}->{fp_mismatch};
+	if ($mis) {
+		my $t = (stat($ft))[9];
+		require POSIX;
+		$t = POSIX::strftime('%Y-%m-%d %k:%M:%S %z', localtime($t));
+		warn <<EOM;
+W: Fingerprints for the following repositories do not match
+W: $mf_url @ $t:
+W: These repositories may have updated since $t:
+EOM
+		warn "\t", $_, "\n" for @$mis;
+		warn <<EOM if !$self->{lei}->{opt}->{prune};
+W: The above fingerprints may never match without --prune
+EOM
+	}
+	dump_manifest($m => $ft) if delete($self->{chg}->{manifest}) || $mis;
 	ft_rename($ft, $manifest, 0666);
 	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH 80/95] lei_mirror: remove janky mirror.done stamp file
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (78 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 79/95] lei_mirror: update fingerprints when writing local manifest.js.gz Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 81/95] lei_mirror: simplify most process spawning Eric Wong
                   ` (14 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

This makes a fundamental (and overdue) change to the core of
lei in how it handles child errors.  Every process which
generates or receives a child error will remember it before
passing it on.  This ensures _wq_done_wait callbacks will
know of prior errors aside from $? when they run.
---
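A stripped-down sketch of the "remember before passing on" idea,
using a plain hash instead of a real PublicInbox::LEI object and
omitting the socket/pkt_op forwarding entirely:

```perl
use strict;
use warnings;

# Simplified stand-in for PublicInbox::LEI::child_error
sub child_error {
	my ($self, $child_error, $msg) = @_;
	$child_error ||= 1 << 8;
	warn(substr($msg, -1, 1) eq "\n" ? $msg : "$msg\n") if defined $msg;
	$self->{child_error} ||= $child_error; # first error sticks
	# (forwarding to lei-daemon or the lei(1) client would go here)
}

my $lei = {};
child_error($lei, 22 << 8);
child_error($lei, 1 << 8); # does not clobber the first error
printf "remembered \$?=%d\n", $lei->{child_error};
```
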
 lib/PublicInbox/LEI.pm       | 3 +--
 lib/PublicInbox/LeiMirror.pm | 7 +------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 8a14ace4..b78d70de 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -544,12 +544,11 @@ sub child_error { # passes non-fatal curl exit codes to user
 	local $current_lei = $self;
 	$child_error ||= 1 << 8;
 	warn(substr($msg, -1, 1) eq "\n" ? $msg : "$msg\n") if defined $msg;
+	$self->{child_error} ||= $child_error;
 	if ($self->{pkt_op_p}) { # to top lei-daemon
 		$self->{pkt_op_p}->pkt_do('child_error', $child_error);
 	} elsif ($self->{sock}) { # to lei(1) client
 		send($self->{sock}, "child_error $child_error", MSG_EOR);
-	} else { # non-lei admin command
-		$self->{child_error} ||= $child_error;
 	} # else noop if client disconnected
 }
 
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index b982f919..b8b6b504 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -30,12 +30,9 @@ sub keep_going ($) {
 sub _wq_done_wait { # dwaitpid callback (via wq_eof)
 	my ($arg, $pid) = @_;
 	my ($mrr, $lei) = @$arg;
-	my $f = "$mrr->{dst}/mirror.done";
 	if ($?) {
 		$lei->child_error($?);
-	} elsif (!$mrr->{dry_run} && !unlink($f)) {
-		warn("unlink($f): $!\n") unless $!{ENOENT};
-	} else {
+	} elsif (!$lei->{child_error}) {
 		if (!$mrr->{dry_run} && $lei->{cmd} ne 'public-inbox-clone') {
 			require PublicInbox::LeiAddExternal;
 			PublicInbox::LeiAddExternal::_finish_add_external(
@@ -249,7 +246,6 @@ sub index_cloned_inbox {
 		PublicInbox::Admin::index_inbox($ibx, undef, $opt);
 	}
 	return if defined $self->{cur_dst}; # one of many repos to clone
-	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }
 
 sub run_reap {
@@ -1082,7 +1078,6 @@ EOM
 	}
 	dump_manifest($m => $ft) if delete($self->{chg}->{manifest}) || $mis;
 	ft_rename($ft, $manifest, 0666);
-	open my $x, '>', "$self->{dst}/mirror.done"; # for _wq_done_wait
 }
 
 sub start_clone_url {

* [PATCH 81/95] lei_mirror: simplify most process spawning
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (79 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 80/95] lei_mirror: remove janky mirror.done stamp file Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 82/95] lei_mirror: run v1_done earlier on forkgroup done Eric Wong
                   ` (13 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

For commands where we rely on successful exit codes to continue,
start_cmd() generalizes well enough to be used in a variety
of places.
---
 lib/PublicInbox/LeiMirror.pm | 51 ++++++++++--------------------------
 1 file changed, 14 insertions(+), 37 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index b8b6b504..2812b696 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -301,10 +301,10 @@ sub fgrp_update {
 	my $cmd = [ 'git', "--git-dir=$fgrp->{cur_dst}",
 		qw(update-ref --stdin -z) ];
 	my $lei = $fgrp->{lei};
-	$lei->qerr("# @$cmd");
-	my $opt = { 0 => $r, 2 => $lei->{2} };
-	my $pid = spawn($cmd, undef, $opt);
+	my $pack = PublicInbox::OnDestroy->new($$, \&pack_dst, $fgrp);
+	start_cmd($fgrp, $cmd, { 0 => $r, 2 => $lei->{2} }, $pack);
 	close $r or die "close(r): $!";
+	return if $fgrp->{dry_run};
 	for my $ref (keys %dst) {
 		my $new = delete $src{$ref};
 		my $old = $dst{$ref};
@@ -319,8 +319,6 @@ sub fgrp_update {
 		upr($lei, $w, 'create', $ref, $oid);
 	}
 	close($w) or warn "E: close(update-ref --stdin): $! (need git 1.8.5+)\n";
-	my $pack = PublicInbox::OnDestroy->new($$, \&pack_dst, $fgrp);
-	$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $cmd, $pack ];
 }
 
 sub pack_dst { # packs lightweight satellite repos
@@ -330,19 +328,13 @@ sub pack_dst { # packs lightweight satellite repos
 
 sub pack_refs {
 	my ($self, $git_dir) = @_;
-	do_reap($self);
-	return if !keep_going($self);
 	my $cmd = [ 'git', "--git-dir=$git_dir", qw(pack-refs --all --prune) ];
-	$self->{lei}->qerr("# @$cmd");
-	return if $self->{dry_run};
-	my $opt = { 2 => $self->{lei}->{2} };
-	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $self, $cmd ];
+	start_cmd($self, $cmd, { 2 => $self->{lei}->{2} });
 }
 
 sub fgrpv_done {
 	my ($fgrpv) = @_;
 	return if !$LIVE;
-	my $pid;
 	my $first = $fgrpv->[0] // die 'BUG: no fgrpv->[0]';
 	return if !keep_going($first);
 	pack_refs($first, $first->{-osdir}); # objstore refs always packed
@@ -350,30 +342,20 @@ sub fgrpv_done {
 		my $rn = $fgrp->{-remote};
 		my %opt = ( 2 => $fgrp->{lei}->{2} );
 
-		my $update_ref = $fgrp->{dry_run} ? undef :
-			PublicInbox::OnDestroy->new($$, \&fgrp_update, $fgrp);
+		my $update_ref = PublicInbox::OnDestroy->new($$,
+							\&fgrp_update, $fgrp);
 
 		my $src = [ 'git', "--git-dir=$fgrp->{-osdir}", 'for-each-ref',
 			"--format=refs/%(refname:lstrip=3)%00%(objectname)",
 			"refs/remotes/$rn/" ];
-		do_reap($fgrp);
-		$fgrp->{lei}->qerr("# @$src >SRC");
-		if ($update_ref) {
-			open(my $fh, '+>', undef) or die "open(src): $!";
-			$pid = spawn($src, undef, { %opt, 1 => $fh });
-			$fgrp->{srcfh} = $fh;
-			$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $src, $update_ref ]
-		}
+		open(my $sfh, '+>', undef) or die "open(src): $!";
+		$fgrp->{srcfh} = $sfh;
+		start_cmd($fgrp, $src, { %opt, 1 => $sfh }, $update_ref);
 		my $dst = [ 'git', "--git-dir=$fgrp->{cur_dst}", 'for-each-ref',
 			'--format=%(refname)%00%(objectname)' ];
-		do_reap($fgrp);
-		$fgrp->{lei}->qerr("# @$dst >DST");
-		if ($update_ref) {
-			open(my $fh, '+>', undef) or die "open(dst): $!";
-			$pid = spawn($dst, undef, { %opt, 1 => $fh });
-			$fgrp->{dstfh} = $fh;
-			$LIVE->{$pid} = [ \&reap_cmd, $fgrp, $dst, $update_ref ]
-		}
+		open(my $dfh, '+>', undef) or die "open(dst): $!";
+		$fgrp->{dstfh} = $dfh;
+		start_cmd($fgrp, $dst, { %opt, 1 => $dfh }, $update_ref);
 	}
 }
 
@@ -396,7 +378,6 @@ sub fgrp_fetch_all {
 			qw(--no-tags --multiple));
 	};
 	push(@fetch, "-j$j") if $j;
-	my $pid;
 	while (my ($osdir, $fgrpv) = each %$todo) {
 		my $f = "$osdir/config";
 
@@ -405,7 +386,7 @@ sub fgrp_fetch_all {
 				$f, '--unset-all', "remotes.$grp"];
 		$self->{lei}->qerr("# @$cmd");
 		if (!$self->{dry_run}) {
-			$pid = spawn($cmd, undef, { 2 => $self->{lei}->{2} });
+			my $pid = spawn($cmd, undef, { 2 => $self->{lei}->{2} });
 			waitpid($pid, 0) // die "waitpid: $!";
 			die "E: @$cmd: \$?=$?" if ($? && ($? >> 8) != 5);
 
@@ -423,12 +404,8 @@ sub fgrp_fetch_all {
 		}
 
 		$cmd = [ @git, "--git-dir=$osdir", @fetch, $grp ];
-		do_reap($self);
-		$self->{lei}->qerr("# @$cmd");
 		my $end = PublicInbox::OnDestroy->new($$, \&fgrpv_done, $fgrpv);
-		return if $self->{dry_run};
-		$pid = spawn($cmd, undef, $opt);
-		$LIVE->{$pid} = [ \&reap_cmd, $self, $cmd, $end ];
+		start_cmd($self, $cmd, $opt, $end);
 	}
 }
 

* [PATCH 82/95] lei_mirror: run v1_done earlier on forkgroup done
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (80 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 81/95] lei_mirror: simplify most process spawning Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 83/95] lei_mirror: simplify forkgroup-related subs Eric Wong
                   ` (12 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

There's likely a circular reference somewhere which was
preventing v1_done from running early.  In any case, this allows
v1_done to run in parallel with the pack-refs process since
there's no ordering dependency between ref-packing and v1_done.
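
As a rough illustration of why deleting {-fini} triggers v1_done
right away: Perl runs DESTROY as soon as the last reference to an
object goes away.  The Guard package below is a hypothetical,
simplified stand-in for PublicInbox::OnDestroy:

```perl
use strict;
use warnings;

package Guard; # hypothetical stand-in, not the real OnDestroy
sub new { my ($class, $cb, @arg) = @_; bless [ $cb, @arg ], $class }
sub DESTROY { my ($cb, @arg) = @{$_[0]}; $cb->(@arg) }

package main;
my @log;
my $fgrp = { -fini => Guard->new(sub { push @log, "v1_done(@_)" }, 'repo') };
push @log, 'pack_refs started';
# dropping the only reference fires the callback immediately
delete($fgrp->{-fini}) // die 'BUG: no {-fini}';
push @log, 'after delete';
print "$_\n" for @log;
```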
---
 lib/PublicInbox/LeiMirror.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 2812b696..29bfcabf 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -324,6 +324,7 @@ sub fgrp_update {
 sub pack_dst { # packs lightweight satellite repos
 	my ($fgrp) = @_;
 	pack_refs($fgrp, $fgrp->{cur_dst});
+	delete($fgrp->{-fini}) // die 'BUG: no {-fini}'; # call v1_done
 }
 
 sub pack_refs {

* [PATCH 83/95] lei_mirror: simplify forkgroup-related subs
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (81 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 82/95] lei_mirror: run v1_done earlier on forkgroup done Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 84/95] lei_mirror: shorten scope mirror objects Eric Wong
                   ` (11 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

We can pass fewer variables around on the stack since $fgrp is just
a copy of $self.  We can also rely more on explicit callback
passing rather than relying on OnDestroy and ->cancel for
conditional calls.
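
A simplified sketch of the explicit-callback style, assuming
hypothetical {local_fp} and {remote_fp} fields in place of the real
manifest lookups and the asynchronous `git show-ref' step:

```perl
use strict;
use warnings;

# Hypothetical reduction of cmp_fp_do: run $cb only when needed,
# instead of creating an OnDestroy object and cancelling it.
sub do_if_changed {
	my ($self, $cb, @arg) = @_;
	# nothing to compare against: always run the callback
	my $new = $self->{remote_fp} // return $cb->($self, @arg);
	return 'up-to-date' if ($self->{local_fp} // '') eq $new;
	$cb->($self, @arg);
}

sub fetch { my ($self, $uri) = @_; "fetch $uri" }

my @out = (
	do_if_changed({ remote_fp => 'a', local_fp => 'a' }, \&fetch, 'u1'),
	do_if_changed({ remote_fp => 'b', local_fp => 'a' }, \&fetch, 'u2'),
	do_if_changed({}, \&fetch, 'u3'),
);
print "$_\n" for @out;
```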
---
 lib/PublicInbox/LeiMirror.pm | 49 +++++++++++++-----------------------
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 29bfcabf..79861d64 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -460,43 +460,36 @@ EOM
 }
 
 sub fp_done {
-	my ($self, $go_fetch) = @_;
+	my ($self, $cb, @arg) = @_;
 	return if !keep_going($self);
 	my $fh = delete $self->{-show_ref} // die 'BUG: no show-ref output';
 	seek($fh, SEEK_SET, 0) or die "seek(show_ref): $!";
 	$self->{-ent} // die 'BUG: no -ent';
 	my $A = $self->{-ent}->{fingerprint} // die 'BUG: no fingerprint';
 	my $B = sha1_hex(do { local $/; <$fh> } // die("read(show_ref): $!"));
-	return if $A ne $B; # $go_fetch->DESTROY fires
-	$go_fetch->cancel;
+	return $cb->($self, @arg) if $A ne $B;
 	$self->{lei}->qerr("# $self->{-key} up-to-date");
 }
 
-sub cmp_fp_fetch {
-	my ($self, $go_fetch) = @_;
-	# $go_fetch is either resume_fetch or fgrp_enqueue
-	my $new = $self->{-ent}->{fingerprint} // die 'BUG: no fingerprint';
+sub cmp_fp_do {
+	my ($self, $cb, @arg) = @_;
+	# $cb is either resume_fetch or fgrp_enqueue
+	$self->{-ent} // return $cb->($self, @arg);
+	my $new = $self->{-ent}->{fingerprint} // return $cb->($self, @arg);
 	my $key = $self->{-key} // die 'BUG: no -key';
 	if (my $cur_ent = $self->{-local_manifest}->{$key}) {
 		# runs go_fetch->DESTROY run if eq
-		return $go_fetch->cancel if $cur_ent->{fingerprint} eq $new;
+		return if $cur_ent->{fingerprint} eq $new;
 	}
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $cmd = ['git', "--git-dir=$dst", 'show-ref'];
 	my $opt = { 2 => $self->{lei}->{2} };
 	open($opt->{1}, '+>', undef) or die "open(tmp): $!";
 	$self->{-show_ref} = $opt->{1};
-	my $done = PublicInbox::OnDestroy->new($$, \&fp_done, $self, $go_fetch);
+	my $done = PublicInbox::OnDestroy->new($$, \&fp_done, $self, $cb, @arg);
 	start_cmd($self, $cmd, $opt, $done);
 }
 
-sub resume_fetch_maybe {
-	my ($self, $uri, $fini) = @_;
-	my $go_fetch = PublicInbox::OnDestroy->new($$, \&resume_fetch, @_);
-	cmp_fp_fetch($self, $go_fetch) if $self->{-ent} &&
-				defined($self->{-ent}->{fingerprint});
-}
-
 sub resume_fetch {
 	my ($self, $uri, $fini) = @_;
 	return if !keep_going($self);
@@ -516,18 +509,10 @@ sub resume_fetch {
 	start_cmd($self, $cmd, $opt, $fini);
 }
 
-sub fgrp_enqueue_maybe {
-	my ($self, $fgrp) = @_;
-	my $enq = PublicInbox::OnDestroy->new($$, \&fgrp_enqueue, $self, $fgrp);
-	cmp_fp_fetch($self, $enq) if $self->{-ent} &&
-					defined($self->{-ent}->{fingerprint});
-	# $enq->DESTROY calls fgrp_enqueue otherwise
-}
-
 sub fgrp_enqueue {
-	my ($self, $fgrp) = @_;
-	return if !keep_going($self);
-	my $opt = { 2 => $self->{lei}->{2} };
+	my ($fgrp) = @_;
+	return if !keep_going($fgrp);
+	my $opt = { 2 => $fgrp->{lei}->{2} };
 	# --no-tags is required to avoid conflicts
 	my $u = $fgrp->{-uri} // die 'BUG: no {-uri}';
 	my $rn = $fgrp->{-remote} // die 'BUG: no {-remote}';
@@ -535,10 +520,11 @@ sub fgrp_enqueue {
 	for ("url=$u", "fetch=+refs/*:refs/remotes/$rn/*", 'tagopt=--no-tags') {
 		my @kv = split(/=/, $_, 2);
 		$kv[0] = "remote.$rn.$kv[0]";
-		$self->{dry_run} ? $self->{lei}->qerr("# @cmd @kv") :
+		$fgrp->{dry_run} ? $fgrp->{lei}->qerr("# @cmd @kv") :
 				run_die([@cmd, @kv], undef, $opt);
 	}
-	push @{$self->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
+	$fgrp->{fgrp_todo} // die 'BUG: no fgrp_todo';
+	push @{$fgrp->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
 }
 
 sub clone_v1 {
@@ -554,10 +540,9 @@ sub clone_v1 {
 	my $resume = -d $dst;
 	if (my $fgrp = forkgroup_prep($self, $uri)) {
 		$fgrp->{-fini} = $fini;
-		$resume ? fgrp_enqueue_maybe($self, $fgrp) :
-				fgrp_enqueue($self, $fgrp);
+		$resume ? cmp_fp_do($fgrp, \&fgrp_enqueue) : fgrp_enqueue($fgrp)
 	} elsif ($resume) {
-		resume_fetch_maybe($self, $uri, $fini);
+		cmp_fp_do($self, \&resume_fetch, $uri, $fini);
 	} else { # normal clone
 		my $cmd = [ @{$self->{-torsocks}},
 				clone_cmd($lei, my $opt = {}), "$uri", $dst ];

* [PATCH 84/95] lei_mirror: shorten scope mirror objects
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (82 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 83/95] lei_mirror: simplify forkgroup-related subs Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 85/95] lei_mirror: set {head} from manifest Eric Wong
                   ` (10 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

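Scoping $nodep, $any_want and the temporary lists to a block lets
them go out of scope before the dependency-resolution loop runs.
We may be able to save some memory this way.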
---
 lib/PublicInbox/LeiMirror.pm | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 79861d64..f7db5a49 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -881,18 +881,20 @@ sub multi_inbox ($$$) {
 sub clone_all {
 	my ($self, $m) = @_;
 	my $todo = delete $self->{todo};
-	my $nodep = delete $todo->{''};
-
-	# do not download unwanted deps
-	my $any_want = delete $self->{any_want};
-	my @unwanted = grep { !$any_want->{$_} } keys %$todo;
-	my @nodep = delete(@$todo{@unwanted});
-	push(@$nodep, @$_) for @nodep;
-
-	# handle no-dependency repos, first
-	for (@$nodep) {
-		clone_v1($_, 1);
-		return if !keep_going($self);
+	{
+		my $nodep = delete $todo->{''};
+
+		# do not download unwanted deps
+		my $any_want = delete $self->{any_want};
+		my @unwanted = grep { !$any_want->{$_} } keys %$todo;
+		my @nodep = delete(@$todo{@unwanted});
+		push(@$nodep, @$_) for @nodep;
+
+		# handle no-dependency repos, first
+		for (@$nodep) {
+			clone_v1($_, 1);
+			return if !keep_going($self);
+		}
 	}
 	# resolve references, deepest, first:
 	while (scalar keys %$todo) {

* [PATCH 85/95] lei_mirror: set {head} from manifest
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (83 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 84/95] lei_mirror: shorten scope mirror objects Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 86/95] lei_mirror: support {symlinks} " Eric Wong
                   ` (9 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

We handle symbolic refs properly, at least.  It's also possible
for $GIT_DIR/HEAD to contain a full SHA-1/SHA-256, and we'll
support that by using `update-ref --no-deref'.
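
The two accepted {head} forms can be sketched as follows; head_args
is a hypothetical helper name, while the git subcommands and
patterns are the ones used in this patch:

```perl
use strict;
use warnings;

# Returns the git arguments for a manifest {head} value, or an
# empty list if the value is not understood.
sub head_args {
	my ($head) = @_;
	if ($head =~ s/\Aref: //) { # symbolic ref, e.g. "ref: refs/heads/main"
		return (qw(symbolic-ref HEAD), $head);
	} elsif ($head =~ /\A[a-f0-9]{40,}\z/) { # raw SHA-1/SHA-256 (detached)
		return (qw(update-ref --no-deref HEAD), $head);
	}
	return;
}

for my $h ('ref: refs/heads/main', 'a' x 40, 'bogus') {
	my @args = head_args($h);
	print @args ? "git @args\n" : "W: {head} => `$h' not understood\n";
}
```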
---
 lib/PublicInbox/LeiMirror.pm | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index f7db5a49..fc5bc88d 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -678,6 +678,23 @@ sub update_ent {
 		my $done = PublicInbox::OnDestroy->new($$, \&up_fp_done, $self);
 		start_cmd($self, $cmd, $opt, $done);
 	}
+
+	$new = $self->{-ent}->{head};
+	$cur = $self->{-local_manifest}->{$key}->{head} // "\0";
+	if (defined($new) && $new ne $cur) {
+		# n.b. grokmirror writes raw contents to $dst/HEAD w/o locking
+		my $cmd = [ 'git', "--git-dir=$dst" ];
+		if ($new =~ s/\Aref: //) {
+			push @$cmd, qw(symbolic-ref HEAD), $new;
+		} elsif ($new =~ /\A[a-f0-9]{40,}\z/) {
+			push @$cmd, qw(update-ref --no-deref HEAD), $new;
+		} else {
+			undef $cmd;
+			warn "W: $key: {head} => `$new' not understood\n";
+		}
+		start_cmd($self, $cmd, { 2 => $self->{lei}->{2} }) if $cmd;
+	}
+
 	$new = $self->{-ent}->{owner} // return;
 	$cur = $self->{-local_manifest}->{$key}->{owner} // "\0";
 	return if $cur eq $new;

* [PATCH 86/95] lei_mirror: support {symlinks} from manifest
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (84 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 85/95] lei_mirror: set {head} from manifest Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 87/95] lei_mirror: eliminate circular references Eric Wong
                   ` (8 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

It's part of grokmirror, and useful for keeping compatibility.
We can make use of File::Spec->abs2rel here to ensure our
symlinks are relative and the entire mirror can be copied
as a whole.
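
A small sketch of the relative-target calculation, assuming a
Unix-like filesystem and a hypothetical layout under /srv/mirror
(rel_target is an illustrative name only):

```perl
use strict;
use warnings;
use File::Spec;

# $top: mirror root, $key: repo path in the manifest,
# $p: symlink path listed in {symlinks}
sub rel_target {
	my ($top, $key, $p) = @_;
	my $ln = "$top/$p";
	$ln =~ tr!/!/!s; # squeeze duplicate slashes
	my (undef, $dn, undef) = File::Spec->splitpath($ln);
	# target expressed relative to the directory holding the symlink
	File::Spec->abs2rel("$top/$key", $dn);
}

my $tgt = rel_target('/srv/mirror', 'a/b.git', 'links/b.git');
print "links/b.git -> $tgt\n";
```

Since the target never names /srv/mirror itself, the tree can be
moved or copied elsewhere without rewriting the symlinks.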
---
 lib/PublicInbox/LeiMirror.pm | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index fc5bc88d..47db9ccd 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -664,6 +664,7 @@ sub up_fp_done {
 	push @{$self->{chg}->{fp_mismatch}}, $self->{-key};
 }
 
+# modifies the to-be-written manifest entry, and sets values from it, too
 sub update_ent {
 	my ($self) = @_;
 	my $key = $self->{-key} // die 'BUG: no -key';
@@ -694,7 +695,26 @@ sub update_ent {
 		}
 		start_cmd($self, $cmd, { 2 => $self->{lei}->{2} }) if $cmd;
 	}
-
+	if (my $symlinks = $self->{-ent}->{symlinks}) {
+		my $top = File::Spec->rel2abs($self->{dst});
+		for my $p (@$symlinks) {
+			my $ln = "$top/$p";
+			$ln =~ tr!/!/!s;
+			my (undef, $dn, $bn) = File::Spec->splitpath($ln);
+			File::Path::mkpath($dn);
+			my $tgt = "$top/$key";
+			$tgt = File::Spec->abs2rel($tgt, $dn);
+			if (lstat($ln)) {
+				if (-l _) {
+					next if readlink($ln) eq $tgt;
+					unlink($ln) or die "unlink($ln): $!";
+				} else {
+					push @{$self->{chg}->{badlink}}, $p;
+				}
+			}
+			symlink($tgt, $ln) or die "symlink($tgt, $ln): $!";
+		}
+	}
 	$new = $self->{-ent}->{owner} // return;
 	$cur = $self->{-local_manifest}->{$key}->{owner} // "\0";
 	return if $cur eq $new;
@@ -1059,6 +1079,10 @@ W: The above fingerprints may never match without --prune
 EOM
 	}
 	dump_manifest($m => $ft) if delete($self->{chg}->{manifest}) || $mis;
+	my $bad = delete $self->{chg}->{badlink};
+	warn(<<EOM, map { ("\t", $_, "\n") } @$bad) if $bad;
+W: The following exist and have not been converted to symlinks
+EOM
 	ft_rename($ft, $manifest, 0666);
 }
 

* [PATCH 87/95] lei_mirror: eliminate circular references
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (85 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 86/95] lei_mirror: support {symlinks} " Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 88/95] lei_mirror: use curl -z/--timecond if manifest exists Eric Wong
                   ` (7 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

...by using local-ized globals.  While non-globals could work,
eliminating the {todo} and {fgrp_todo} refs in all sub-refs
is more error-prone and the `local' construct is convenient.

This allows us to get rid of the `delete $fgrp->{-fini}' call
in pack_dst and eliminates the indiscriminate reaping of all
processes before calling fgrp_fetch_all.  This means we can
fully depend on DESTROY to provide predictable dependency
handling while supporting parallelization.

Global $TODO and $FGRP_TODO now become SCALAR refs on
consumption so they can act as assertions to detect future bugs.
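
The `local' + SCALAR ref tripwire can be sketched in isolation;
consume_todo and run are hypothetical names, and only the $TODO
handling mirrors this patch:

```perl
use strict;
use warnings;

our $TODO; # only populated for the duration of run()

sub consume_todo {
	my $todo = $TODO;
	$TODO = \'BUG on further use'; # SCALAR ref: hash derefs now die
	$todo;
}

sub run {
	local $TODO = {}; # restored automatically when run() returns
	push @{$TODO->{''}}, 'repo-a';
	my $todo = consume_todo();
	# a late enqueue is a bug and is caught at the dereference
	my $ok = eval { push @{$TODO->{''}}, 'repo-b'; 1 };
	return ($todo, $ok ? '' : $@);
}

my ($todo, $err) = run();
print "queued: @{$todo->{''}}\n";
print "late push failed: $err";
print 'after run: $TODO is ', defined($TODO) ? 'set' : 'undef', "\n";
```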
---
 lib/PublicInbox/LeiMirror.pm | 43 ++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 47db9ccd..b30cc519 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -21,6 +21,8 @@ use PublicInbox::OnDestroy;
 use Digest::SHA qw(sha256_hex sha1_hex);
 
 our $LIVE; # pid => callback
+our $FGRP_TODO; # objstore -> [ fgrp mirror objects ]
+our $TODO; # reference => [ non-fgrp mirror objects ]
 
 sub keep_going ($) {
 	$LIVE && (!$_[0]->{lei}->{child_error} ||
@@ -324,7 +326,6 @@ sub fgrp_update {
 sub pack_dst { # packs lightweight satellite repos
 	my ($fgrp) = @_;
 	pack_refs($fgrp, $fgrp->{cur_dst});
-	delete($fgrp->{-fini}) // die 'BUG: no {-fini}'; # call v1_done
 }
 
 sub pack_refs {
@@ -362,7 +363,8 @@ sub fgrpv_done {
 
 sub fgrp_fetch_all {
 	my ($self) = @_;
-	my $todo = delete $self->{fgrp_todo} or return;
+	my $todo = $FGRP_TODO;
+	$FGRP_TODO = \'BUG on further use';
 	keys(%$todo) or return;
 
 	# Rely on the fgrptmp remote groups in the config file rather
@@ -510,7 +512,7 @@ sub resume_fetch {
 }
 
 sub fgrp_enqueue {
-	my ($fgrp) = @_;
+	my ($fgrp, $end) = @_; # $end calls fgrp_fetch_all
 	return if !keep_going($fgrp);
 	my $opt = { 2 => $fgrp->{lei}->{2} };
 	# --no-tags is required to avoid conflicts
@@ -523,12 +525,11 @@ sub fgrp_enqueue {
 		$fgrp->{dry_run} ? $fgrp->{lei}->qerr("# @cmd @kv") :
 				run_die([@cmd, @kv], undef, $opt);
 	}
-	$fgrp->{fgrp_todo} // die 'BUG: no fgrp_todo';
-	push @{$fgrp->{fgrp_todo}->{$fgrp->{-osdir}}}, $fgrp;
+	push @{$FGRP_TODO->{$fgrp->{-osdir}}}, $fgrp;
 }
 
 sub clone_v1 {
-	my ($self, $nohang) = @_;
+	my ($self, $end) = @_;
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
 	my $uri = URI->new($self->{cur_src} // $self->{src});
@@ -540,7 +541,8 @@ sub clone_v1 {
 	my $resume = -d $dst;
 	if (my $fgrp = forkgroup_prep($self, $uri)) {
 		$fgrp->{-fini} = $fini;
-		$resume ? cmp_fp_do($fgrp, \&fgrp_enqueue) : fgrp_enqueue($fgrp)
+		$resume ? cmp_fp_do($fgrp, \&fgrp_enqueue, $end)
+			: fgrp_enqueue($fgrp, $end);
 	} elsif ($resume) {
 		cmp_fp_do($self, \&resume_fetch, $uri, $fini);
 	} else { # normal clone
@@ -562,10 +564,10 @@ sub clone_v1 {
 
 	my $d = $self->{-ent} ? $self->{-ent}->{description} : undef;
 	$self->{'txt.description'} = $d if defined $d;
-	(!defined($d) && !$nohang) and
+	(!defined($d) && !$end) and
 		_get_txt_start($self, 'description', $fini);
 
-	$nohang or do_reap($self, 1); # for non-manifest clone
+	$end or do_reap($self, 1); # for non-manifest clone
 }
 
 sub parse_epochs ($$) {
@@ -785,7 +787,6 @@ sub clone_v2_prep ($$;$) {
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $want = parse_epochs($lei->{opt}->{epoch}, $v2_epochs);
 	my $task = $m ? bless { %$self }, __PACKAGE__ : $self;
-	delete $task->{todo}; # $self->{todo} still exists
 	my (@skip, $desc);
 	my $fini = PublicInbox::OnDestroy->new($$, \&v2_done, $task);
 	for my $nr (sort { $a <=> $b } keys %$v2_epochs) {
@@ -813,7 +814,7 @@ failed to extract epoch number from $src
 			$etask->{cur_dst} = $edst;
 			$etask->{-is_epoch} = $fini;
 			my $ref = $ent->{reference} // '';
-			push @{$self->{todo}->{$ref}}, $etask;
+			push @{$TODO->{$ref}}, $etask;
 			$self->{any_want}->{$key} = 1;
 		} else { # create a placeholder so users only need to chmod +w
 			init_placeholder($src, $edst, $ent);
@@ -917,7 +918,9 @@ sub multi_inbox ($$$) {
 
 sub clone_all {
 	my ($self, $m) = @_;
-	my $todo = delete $self->{todo};
+	my $todo = $TODO;
+	$TODO = \'BUG on further use';
+	my $end = PublicInbox::OnDestroy->new($$, \&fgrp_fetch_all, $self);
 	{
 		my $nodep = delete $todo->{''};
 
@@ -929,7 +932,7 @@ sub clone_all {
 
 		# handle no-dependency repos, first
 		for (@$nodep) {
-			clone_v1($_, 1);
+			clone_v1($_, $end);
 			return if !keep_going($self);
 		}
 	}
@@ -947,14 +950,16 @@ EOM
 			}
 			my $y = delete $todo->{$x} // next; # already done
 			for (@$y) {
-				clone_v1($_, 1);
+				clone_v1($_, $end);
 				return if !keep_going($self);
 			}
 			last; # restart %$todo iteration
 		}
 	}
-	do_reap($self, 1); # finish all fingerprint checks
-	fgrp_fetch_all($self);
+
+	# $end->DESTROY will call fgrp_fetch_all once all references
+	# in $LIVE are gone, and do_reap will eventually drain $LIVE
+	$end = undef;
 	do_reap($self, 1);
 }
 
@@ -1007,8 +1012,6 @@ sub try_manifest {
 	my ($path_pfx, $n, $multi) = multi_inbox($self, \$path, $m);
 	return $lei->child_error(1, $multi) if !ref($multi);
 	my $v2 = delete $multi->{v2};
-	local $self->{todo} = {};
-	local $self->{fgrp_todo} = {}; # { objstore_dir => [fgrp, ...] }
 	if ($v2) {
 		for my $name (sort keys %$v2) {
 			my $epochs = delete $v2->{$name};
@@ -1054,7 +1057,7 @@ E: `$task->{cur_dst}' must not contain newline
 EOM
 			$task->{cur_src} .= '/';
 			my $dep = $task->{-ent}->{reference} // '';
-			push @{$self->{todo}->{$dep}}, $task; # for clone_all
+			push @{$TODO->{$dep}}, $task; # for clone_all
 			$self->{any_want}->{$name} = 1;
 		}
 	}
@@ -1112,6 +1115,8 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 			$self->{"-$k"} = $v;
 		}
 		local $LIVE = {};
+		local $TODO = {};
+		local $FGRP_TODO = {};
 		my $iv = $lei->{opt}->{'inbox-version'} //
 			return start_clone_url($self);
 		return clone_v1($self) if $iv == 1;

* [PATCH 88/95] lei_mirror: use curl -z/--timecond if manifest exists
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (86 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 87/95] lei_mirror: eliminate circular references Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 89/95] lei_mirror: avoid redundant curl `-f' use Eric Wong
                   ` (6 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

This lets us save cycles and avoid scanning + comparing manifest
contents by relying on the Last-Modified HTTP response header.
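
Roughly, the conditional request and the "unchanged" check look
like this.  manifest_cmd and unchanged are hypothetical helpers and
the URL is a placeholder; the curl switches are real:

```perl
use strict;
use warnings;
use File::Temp ();

sub manifest_cmd {
	my ($url, $tmp, $manifest) = @_;
	my @cmd = (qw(curl -f -R -o), $tmp, $url);
	# -z FILE makes the request conditional on FILE's mtime
	push @cmd, '-z', $manifest if -f $manifest;
	@cmd;
}

# an empty download next to an existing manifest means "not modified"
sub unchanged {
	my ($manifest, $tmp) = @_;
	-f $manifest && !-s $tmp;
}

my $manifest = File::Temp->new; # pretend this is the existing manifest
print {$manifest} "{}\n";
$manifest->flush;
my $tmp = File::Temp->new; # stays empty, as it would after a 304
my @cmd = manifest_cmd('https://example.com/manifest.js.gz',
			"$tmp", "$manifest");
print "# @cmd\n";
print "# manifest unchanged\n" if unchanged("$manifest", "$tmp");
```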
---
 lib/PublicInbox/LeiMirror.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index b30cc519..cc5ea1d2 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -995,6 +995,7 @@ sub try_manifest {
 	}
 	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX', %opt);
 	my $cmd = $curl->for_uri($lei, $uri, qw(-f -R -o), $ft->filename);
+	push(@$cmd, '-z', $manifest) if -f $manifest;
 	my $mf_url = "$uri";
 	%opt = map { $_ => $lei->{$_} } (0..2);
 	my $cerr = run_reap($lei, $cmd, \%opt);
@@ -1002,6 +1003,10 @@ sub try_manifest {
 		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
 		return $lei->child_error($cerr, "@$cmd failed");
 	}
+
+	# bail out if curl -z/--timecond hit 304 Not Modified, $ft will be empty
+	return $lei->qerr("# $manifest unchanged") if -f $manifest && !-s $ft;
+
 	my $m = eval { decode_manifest($ft, $ft, $uri) };
 	if ($@) {
 		warn $@;

* [PATCH 89/95] lei_mirror: avoid redundant curl `-f' use
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (87 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 88/95] lei_mirror: use curl -z/--timecond if manifest exists Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 90/95] lei_mirror: omit trailing slash for git remote.*.url Eric Wong
                   ` (5 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

All of our curl invocations use the `-f' (--fail) switch
anyways, and I can't imagine a time when we'd want silent
failures.
---
 lib/PublicInbox/LeiMirror.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index cc5ea1d2..cf4e58f1 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -51,7 +51,7 @@ sub try_scrape {
 	my $uri = URI->new($self->{src});
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
-	my $cmd = $curl->for_uri($lei, $uri, qw(-f --compressed));
+	my $cmd = $curl->for_uri($lei, $uri, '--compressed');
 	my $opt = { 0 => $lei->{0}, 2 => $lei->{2} };
 	my $fh = popen_rd($cmd, undef, $opt);
 	my $html = do { local $/; <$fh> } // die "read(curl $uri): $!";
@@ -151,7 +151,7 @@ sub _get_txt_start { # non-fatal
 	my $f = (split(m!/!, $endpoint))[-1];
 	my $ft = File::Temp->new(TEMPLATE => "$f-XXXX", TMPDIR => 1);
 	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
-	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(-f --compressed -R -o),
+	my $cmd = $self->{curl}->for_uri($lei, $uri, qw(--compressed -R -o),
 					$ft->filename);
 	do_reap($self);
 	$lei->qerr("# @$cmd");
@@ -994,7 +994,7 @@ sub try_manifest {
 		delete $opt{TMPDIR};
 	}
 	my $ft = File::Temp->new(TEMPLATE => '.manifest-XXXX', %opt);
-	my $cmd = $curl->for_uri($lei, $uri, qw(-f -R -o), $ft->filename);
+	my $cmd = $curl->for_uri($lei, $uri, qw(-R -o), $ft->filename);
 	push(@$cmd, '-z', $manifest) if -f $manifest;
 	my $mf_url = "$uri";
 	%opt = map { $_ => $lei->{$_} } (0..2);

* [PATCH 90/95] lei_mirror: omit trailing slash for git remote.*.url
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (88 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 89/95] lei_mirror: avoid redundant curl `-f' use Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 91/95] lei_mirror: set info/web/last-modified from manifest Eric Wong
                   ` (4 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

While PublicInbox::WWW URLs have a trailing slash in them
for compatibility with static web server mirrors, URLs
intended for `git clone' don't benefit from this and the
trailing `/' just looks awkward.
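
The effect on the path component can be sketched without the URI
module; strip_trailing_slashes is a hypothetical helper operating on
a plain string, and it uses `/+' where the patch uses `/*':

```perl
use strict;
use warnings;

sub strip_trailing_slashes {
	my ($path) = @_;
	$path =~ s!/+\z!!; # drop any run of slashes at the end
	$path;
}

for my $p ('/meta/', '/meta', '/pub/scm/git/git.git///') {
	printf "%-26s => %s\n", $p, strip_trailing_slashes($p);
}
```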
---
 lib/PublicInbox/LeiMirror.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index cf4e58f1..5b7cf9e6 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -533,6 +533,8 @@ sub clone_v1 {
 	my $lei = $self->{lei};
 	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
 	my $uri = URI->new($self->{cur_src} // $self->{src});
+	my $path = $uri->path;
+	$path =~ s!/*\z!! and $uri->path($path);
 	defined($lei->{opt}->{epoch}) and
 		die "$uri is a v1 inbox, --epoch is not supported\n";
 	$self->{-torsocks} //= $curl->torsocks($lei, $uri) or return;

* [PATCH 91/95] lei_mirror: set info/web/last-modified from manifest
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (89 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 90/95] lei_mirror: omit trailing slash for git remote.*.url Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 92/95] lei_mirror: don't clobber inbox.config.example if it exists Eric Wong
                   ` (3 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

The grokmirror manifest sets {modified}, so we might as well use
it to make life easier for users of cgit (and compatible)
front-ends.
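
For reference, the line written to info/web/last-modified is built
from the manifest's epoch seconds like so (last_modified_line is a
hypothetical helper wrapping the same strftime call):

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# $t is seconds since the Unix epoch, as found in {modified}
sub last_modified_line {
	my ($t) = @_;
	strftime('%F %T', gmtime($t))." +0000\n";
}

print last_modified_line(0); # the epoch itself, in UTC
print last_modified_line(86400); # one day later
```

Using gmtime with a fixed "+0000" suffix keeps the output
independent of the local timezone of the mirroring host.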
---
 lib/PublicInbox/LeiMirror.pm | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 5b7cf9e6..e284f55d 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -19,6 +19,7 @@ use PublicInbox::Inbox;
 use PublicInbox::LeiCurl;
 use PublicInbox::OnDestroy;
 use Digest::SHA qw(sha256_hex sha1_hex);
+use POSIX qw(strftime);
 
 our $LIVE; # pid => callback
 our $FGRP_TODO; # objstore -> [ fgrp mirror objects ]
@@ -200,22 +201,14 @@ sub _write_inbox_config {
 sub set_description ($) {
 	my ($self) = @_;
 	my $dst = $self->{cur_dst} // $self->{dst};
-	my $f = "$dst/description";
-	open my $fh, '+>>', $f or die "open($f): $!";
-	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
-	my $d = do { local $/; <$fh> } // die "read($f): $!";
-	chomp(my $orig = $d);
+	chomp(my $orig = PublicInbox::Git::try_cat("$dst/description"));
+	my $d = $orig;
 	while (defined($d) && ($d =~ m!^\(\$INBOX_DIR/description missing\)! ||
 			$d =~ /^Unnamed repository/ || $d !~ /\S/)) {
 		$d = delete($self->{'txt.description'});
 	}
 	$d //= 'mirror of '.($self->{cur_src} // $self->{src});
-	chomp $d;
-	return if $d eq $orig;
-	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
-	truncate($fh, 0) or die "truncate($f): $!";
-	print $fh $d, "\n" or die "print($f): $!";
-	close $fh or die "close($f): $!";
+	atomic_write($dst, 'description', $d."\n") if $d ne $orig;
 }
 
 sub index_cloned_inbox {
@@ -668,6 +661,14 @@ sub up_fp_done {
 	push @{$self->{chg}->{fp_mismatch}}, $self->{-key};
 }
 
+sub atomic_write ($$$) {
+	my ($dn, $bn, $raw) = @_;
+	my $ft = File::Temp->new(DIR => $dn, TEMPLATE => "$bn-XXXX");
+	print $ft $raw or die "print($ft): $!";
+	$ft->flush or die "flush($ft): $!";
+	ft_rename($ft, "$dn/$bn", 0666);
+}
+
 # modifies the to-be-written manifest entry, and sets values from it, too
 sub update_ent {
 	my ($self) = @_;
@@ -683,7 +684,6 @@ sub update_ent {
 		my $done = PublicInbox::OnDestroy->new($$, \&up_fp_done, $self);
 		start_cmd($self, $cmd, $opt, $done);
 	}
-
 	$new = $self->{-ent}->{head};
 	$cur = $self->{-local_manifest}->{$key}->{head} // "\0";
 	if (defined($new) && $new ne $cur) {
@@ -719,6 +719,14 @@ sub update_ent {
 			symlink($tgt, $ln) or die "symlink($tgt, $ln): $!";
 		}
 	}
+	if (defined(my $t = $self->{-ent}->{modified})) {
+		my ($dn, $bn) = ("$dst/info/web", 'last-modified');
+		my $orig = PublicInbox::Git::try_cat("$dn/$bn");
+		$t = strftime('%F %T', gmtime($t))." +0000\n";
+		File::Path::mkpath($dn);
+		atomic_write($dn, $bn, $t) if $orig ne $t;
+	}
+
 	$new = $self->{-ent}->{owner} // return;
 	$cur = $self->{-local_manifest}->{$key}->{owner} // "\0";
 	return if $cur eq $new;
@@ -1076,8 +1084,7 @@ EOM
 	my $mis = delete $self->{chg}->{fp_mismatch};
 	if ($mis) {
 		my $t = (stat($ft))[9];
-		require POSIX;
-		$t = POSIX::strftime('%Y-%m-%d %k:%M:%S %z', localtime($t));
+		$t = strftime('%F %k:%M:%S %z', localtime($t));
 		warn <<EOM;
 W: Fingerprints for the following repositories do not match
 W: $mf_url @ $t:

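A minimal standalone sketch of the two pieces this patch combines: the
timestamp format written to info/web/last-modified, and a
write-to-tempfile-then-rename helper.  ft_rename() is not shown in this
patch, so the chmod + rename below only assumes what it does; the names
fmt_last_modified and write_via_rename are hypothetical and not part of
LeiMirror.pm.

```perl
use strict;
use warnings;
use File::Temp ();
use File::Path ();
use POSIX qw(strftime);

# hypothetical helper: format an epoch time as UTC, like the patch does
sub fmt_last_modified {
	my ($epoch) = @_;
	strftime('%F %T', gmtime($epoch))." +0000\n";
}

# hypothetical helper: readers of "$dn/$bn" see either the old or the
# new contents, never a partial write
sub write_via_rename {
	my ($dn, $bn, $raw) = @_;
	my $ft = File::Temp->new(DIR => $dn, TEMPLATE => "$bn-XXXX");
	print $ft $raw or die "print($ft): $!";
	$ft->flush or die "flush($ft): $!";
	# assumption: File::Temp creates 0600 files, so widen to the umask
	chmod(0666 & ~umask, "$ft") or die "chmod($ft): $!";
	rename("$ft", "$dn/$bn") or die "rename($ft, $dn/$bn): $!";
	$ft->unlink_on_destroy(0); # the temporary name is gone after rename
}

my $dst = File::Temp->newdir; # scratch directory standing in for a repo
my $web = "$dst/info/web";
File::Path::mkpath($web);
my $t = fmt_last_modified(0);
write_via_rename($web, 'last-modified', $t);
print $t; # 1970-01-01 00:00:00 +0000
```

The temporary file lives in the destination directory so the rename
never crosses a filesystem boundary.
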

* [PATCH 92/95] lei_mirror: don't clobber inbox.config.example if it exists
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (90 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 91/95] lei_mirror: set info/web/last-modified from manifest Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 93/95] lei_mirror: break out of fgrp fetch iteration early Eric Wong
                   ` (2 subsequent siblings)
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Users may save notes or edits in there, and it's only an
example, so there's no need to mindlessly clobber it.
---
 lib/PublicInbox/LeiMirror.pm | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index e284f55d..04e54955 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -181,13 +181,16 @@ sub _write_inbox_config {
 	my $buf = delete($self->{'txt._/text/config/raw'}) // return;
 	my $dst = $self->{cur_dst} // $self->{dst};
 	my $f = "$dst/inbox.config.example";
-	open my $fh, '>', $f or die "open($f): $!";
-	print $fh $buf or die "print: $!";
-	chmod(0444 & ~umask, $fh) or die "chmod($f): $!";
 	my $mtime = delete $self->{'mtime._/text/config/raw'};
-	$fh->flush or die "flush($f): $!";
-	if (defined $mtime) {
-		utime($mtime, $mtime, $fh) or die "utime($f): $!";
+	if (sysopen(my $fh, $f, O_CREAT|O_EXCL|O_WRONLY)) {
+		print $fh $buf or die "print: $!";
+		chmod(0444 & ~umask, $fh) or die "chmod($f): $!";
+		$fh->flush or die "flush($f): $!";
+		if (defined $mtime) {
+			utime($mtime, $mtime, $fh) or die "utime($f): $!";
+		}
+	} elsif (!$!{EEXIST}) {
+		die "open($f): $!";
 	}
 	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
 	my $ibx = $self->{ibx} = {};

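The create-only-if-missing behavior can be tried in isolation with the
sketch below.  write_once is a hypothetical name; the real code also
handles chmod and utime, which are left out here.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Temp ();

# hypothetical helper: returns 1 if $f was created, 0 if it already
# existed and was left untouched
sub write_once {
	my ($f, $buf) = @_;
	if (sysopen(my $fh, $f, O_CREAT|O_EXCL|O_WRONLY)) {
		print $fh $buf or die "print($f): $!";
		close $fh or die "close($f): $!";
		return 1;
	} elsif ($!{EEXIST}) {
		return 0; # keep whatever notes or edits the user saved
	}
	die "open($f): $!"; # any other failure is still fatal
}

my $dir = File::Temp->newdir; # scratch directory
my $f = "$dir/inbox.config.example";
print write_once($f, "first\n"), "\n"; # 1
print write_once($f, "second\n"), "\n"; # 0, file still says "first"
```

O_EXCL makes the existence check and the creation a single step, so
there is no window between a separate -e test and the open.
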

* [PATCH 93/95] lei_mirror: break out of fgrp fetch iteration early
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (91 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 92/95] lei_mirror: don't clobber inbox.config.example if it exists Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 94/95] clone: support --project-list= for cgit Eric Wong
  2022-11-28  5:32 ` [PATCH 95/95] lei_mirror: handle forkgroup changes Eric Wong
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Don't queue up more work if we already have a failure somewhere.
---
 lib/PublicInbox/LeiMirror.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 04e54955..0508c9a8 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -379,6 +379,7 @@ sub fgrp_fetch_all {
 	push(@fetch, "-j$j") if $j;
 	while (my ($osdir, $fgrpv) = each %$todo) {
 		my $f = "$osdir/config";
+		return if !keep_going($self);
 
 		# clobber group from previous run atomically
 		my $cmd = ['git', "--git-dir=$osdir", qw(config -f),


* [PATCH 94/95] clone: support --project-list= for cgit
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (92 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 93/95] lei_mirror: break out of fgrp fetch iteration early Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  2022-11-28  5:32 ` [PATCH 95/95] lei_mirror: handle forkgroup changes Eric Wong
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

grokmirror supports it, and we also support cgit, so this should
make running mirrors easier.  This will be useful for scripting
purposes, too.
---
 lib/PublicInbox/LeiMirror.pm | 39 +++++++++++++++++++++++++++++++++---
 script/public-inbox-clone    |  4 ++++
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 0508c9a8..d4b14699 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -991,6 +991,34 @@ sub dump_manifest ($$) {
 	utime($mtime, $mtime, "$ft") or die "utime(..., $ft): $!";
 }
 
+sub dump_project_list ($$) {
+	my ($self, $m) = @_;
+	my $f = $self->{'-project-list'} // return;
+	my $old = PublicInbox::Git::try_cat($f);
+	my %new;
+
+	open my $dh, '<', '.' or die "open(.): $!";
+	chdir($self->{dst}) or die "chdir($self->{dst}): $!";
+	my @local = grep { -e $_ ? ($new{$_} = undef) : 1 } split(/\n/s, $old);
+	chdir($dh) or die "chdir(restore): $!";
+
+	$new{substr($_, 1)} = 1 for keys %$m; # drop leading '/'
+	my @list = sort keys %new;
+	my @remote = grep { !defined($new{$_}) } @list;
+
+	warn <<EOM if @remote;
+The following local repositories are ignored/gone from $self->{src}:
+EOM
+	warn "\t", $_, "\n" for @remote;
+	warn <<EOM if @local;
+The following repos in $f no longer exist on the filesystem:
+EOM
+	warn "\t", $_, "\n" for @local;
+
+	my (undef, $dn, $bn) = File::Spec->splitpath($f);
+	atomic_write($dn, $bn, join("\n", @list, ''));
+}
+
 # FIXME: this gets confused by single inbox instance w/ global manifest.js.gz
 sub try_manifest {
 	my ($self) = @_;
@@ -1104,6 +1132,7 @@ EOM
 	warn(<<EOM, map { ("\t", $_, "\n") } @$bad) if $bad;
 W: The following exist and have not been converted to symlinks
 EOM
+	dump_project_list($self, $m);
 	ft_rename($ft, $manifest, 0666);
 }
 
@@ -1124,14 +1153,18 @@ sub do_mirror { # via wq_io_do or public-inbox-clone
 		$ic =~ /\A(?:v1|v2|always|never)\z/s or die <<"";
 --inbox-config must be one of `always', `v2', `v1', or `never'
 
-		# we support --objstore= and --manifest= with '' (empty string)
-		for my $default (qw(objstore manifest.js.gz)) {
-			my ($k) = (split(/\./, $default))[0];
+		# we support these switches with '' (empty string).
+		# defaults match example conf distributed with grokmirror
+		my @pairs = qw(objstore objstore manifest manifest.js.gz
+				project-list projects.list);
+		while (@pairs) {
+			my ($k, $default) = splice(@pairs, 0, 2);
 			my $v = $lei->{opt}->{$k} // next;
 			$v = $default if $v eq '';
 			$v = "$self->{dst}/$v" if $v !~ m!\A\.{0,2}/!;
 			$self->{"-$k"} = $v;
 		}
+
 		local $LIVE = {};
 		local $TODO = {};
 		local $FGRP_TODO = {};
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
index efe0cff6..677c56c8 100755
--- a/script/public-inbox-clone
+++ b/script/public-inbox-clone
@@ -22,8 +22,12 @@ options:
     --quiet | -q      increase verbosity (may be repeated)
     -C DIR            chdir to specified directory
 EOF
+
+# cgit calls it `project-list', grokmirror calls it `projectslist',
+# support both :/
 GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@ include|I=s@ exclude=s@
 	inbox-config=s inbox-version=i objstore=s manifest=s
+	project-list|projectslist=s
 	prune|p keep-going|k
 	dry-run|n jobs|j=i no-torsocks torsocks=s epoch=s)) or die $help;
 if ($opt->{help}) { print $help; exit };

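The merge done by dump_project_list can be followed more easily as a
pure function.  This is only a sketch: merge_project_list is a
hypothetical name, and the $exists callback stands in for the -e test
the real code runs after chdir-ing into the destination.

```perl
use strict;
use warnings;

# $old: contents of the previous project list (one path per line)
# $manifest_keys: manifest paths, each with a leading '/'
# $exists: callback standing in for `-e $path'
sub merge_project_list {
	my ($old, $manifest_keys, $exists) = @_;
	my %new;
	# old entries still on disk are kept (value undef for now);
	# entries gone from the filesystem are dropped and reported
	my @gone = grep { $exists->($_) ? ($new{$_} = undef) : 1 }
			split(/\n/, $old);
	$new{substr($_, 1)} = 1 for @$manifest_keys; # drop leading '/'
	my @list = sort keys %new;
	# still on disk, but no longer listed by the remote manifest
	my @unlisted = grep { !defined($new{$_}) } @list;
	(\@list, \@gone, \@unlisted);
}

my %on_disk = map { $_ => 1 } qw(a.git b.git);
my ($list, $gone, $unlisted) = merge_project_list(
	"a.git\nb.git\nc.git\n", [ qw(/b.git /d.git) ],
	sub { $on_disk{$_[0]} });
print "list: @$list\n"; # list: a.git b.git d.git
print "gone: @$gone\n"; # gone: c.git
print "unlisted: @$unlisted\n"; # unlisted: a.git
```

Repositories that exist locally stay in the list even when the remote
no longer advertises them, and they only trigger a warning.
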

* [PATCH 95/95] lei_mirror: handle forkgroup changes
  2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
                   ` (93 preceding siblings ...)
  2022-11-28  5:32 ` [PATCH 94/95] clone: support --project-list= for cgit Eric Wong
@ 2022-11-28  5:32 ` Eric Wong
  94 siblings, 0 replies; 96+ messages in thread
From: Eric Wong @ 2022-11-28  5:32 UTC (permalink / raw)
  To: meta

Forkgroups for projects are not static and may change at
the whim of the remote sysadmin.  Ensure we can migrate
to the new forkgroup.

Old forkgroups do not get pruned yet, and their entries
stay in alternates.
---
 lib/PublicInbox/LeiMirror.pm | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index d4b14699..33cf55ab 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -428,20 +428,9 @@ sub forkgroup_prep {
 	my $key = $self->{-key} // die 'BUG: no -key';
 	my $rn = substr(sha256_hex($key), 0, 16);
 	if (!-d $self->{cur_dst} && !$self->{dry_run}) {
-		my $alt = File::Spec->rel2abs("$dir/objects");
 		PublicInbox::Import::init_bare($self->{cur_dst});
-		my $o = "$self->{cur_dst}/objects";
-		my $f = "$o/info/alternates";
-		my $l = File::Spec->abs2rel($alt, File::Spec->rel2abs($o));
-		open my $fh, '+>>', $f or die "open($f): $!";
-		seek($fh, SEEK_SET, 0) or die "seek($f): $!";
-		chomp(my @cur = <$fh>);
-		if (!grep(/\A\Q$l\E\z/, @cur)) {
-			say $fh $l or die "say($f): $!";
-		}
-		close $fh or die "close($f): $!";
-		$f = "$self->{cur_dst}/config";
-		open $fh, '+>>', $f or die "open:($f): $!";
+		my $f = "$self->{cur_dst}/config";
+		open my $fh, '+>>', $f or die "open($f): $!";
 		print $fh <<EOM or die "print($f): $!";
 ; rely on the "$rn" remote in the
 ; $fg fork group for fetches
@@ -453,6 +442,19 @@ sub forkgroup_prep {
 EOM
 		close $fh or die "close($f): $!";
 	}
+	if (!$self->{dry_run}) {
+		my $alt = File::Spec->rel2abs("$dir/objects");
+		my $o = "$self->{cur_dst}/objects";
+		my $f = "$o/info/alternates";
+		my $l = File::Spec->abs2rel($alt, File::Spec->rel2abs($o));
+		open my $fh, '+>>', $f or die "open($f): $!";
+		seek($fh, 0, SEEK_SET) or die "seek($f): $!";
+		chomp(my @cur = <$fh>);
+		if (!grep(/\A\Q$l\E\z/, @cur)) {
+			say $fh $l or die "say($f): $!";
+		}
+		close $fh or die "close($f): $!";
+	}
 	bless {
 		%$self, -osdir => $dir, -remote => $rn, -uri => $uri
 	}, __PACKAGE__;

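Since the alternates update now runs on every non-dry-run pass, it has
to be idempotent.  Below is a standalone sketch of that append-if-missing
step using relative paths; add_alternate is a hypothetical name and the
directory layout is made up for the demo.

```perl
use strict;
use warnings;
use File::Spec ();
use File::Path ();
use File::Temp ();

# hypothetical helper: make sure "$repo/objects/info/alternates" lists
# "$osdir/objects" relative to "$repo/objects"; returns 1 if a line
# was appended, 0 if it was already there
sub add_alternate {
	my ($repo, $osdir) = @_;
	my $alt = File::Spec->rel2abs("$osdir/objects");
	my $o = "$repo/objects";
	my $l = File::Spec->abs2rel($alt, File::Spec->rel2abs($o));
	my $f = "$o/info/alternates";
	open my $fh, '+>>', $f or die "open($f): $!";
	seek($fh, 0, 0) or die "seek($f): $!"; # rewind to read old entries
	chomp(my @cur = <$fh>);
	my $added = 0;
	if (!grep { $_ eq $l } @cur) {
		print $fh $l, "\n" or die "print($f): $!";
		$added = 1;
	}
	close $fh or die "close($f): $!";
	$added;
}

my $top = File::Temp->newdir; # scratch directory
File::Path::mkpath([ "$top/objstore/fg.git/objects",
		"$top/repo.git/objects/info" ]);
print add_alternate("$top/repo.git", "$top/objstore/fg.git"), "\n"; # 1
print add_alternate("$top/repo.git", "$top/objstore/fg.git"), "\n"; # 0
```

The entry written here is ../../objstore/fg.git/objects, so the whole
tree can be relocated without rewriting alternates.  Entries for old
forkgroups are left in place, matching the commit message above.
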

Thread overview: 96+ messages
2022-11-28  5:30 [PATCH 00/95] clone: multi-inbox/repo support Eric Wong
2022-11-28  5:30 ` [PATCH 01/95] clone: support multi-inbox clone Eric Wong
2022-11-28  5:30 ` [PATCH 02/95] clone: support --include and --exclude with multi-clone Eric Wong
2022-11-28  5:31 ` [PATCH 03/95] clone: parallelize v2 epoch clones Eric Wong
2022-11-28  5:31 ` [PATCH 04/95] lei_mirror: async config retrieval for v2 w/ manifest Eric Wong
2022-11-28  5:31 ` [PATCH 05/95] lei_mirror: rely on DESTROY to index v2 inbox Eric Wong
2022-11-28  5:31 ` [PATCH 06/95] lei_mirror: rely on global process reaper Eric Wong
2022-11-28  5:31 ` [PATCH 07/95] clone: support parallel v1 clones Eric Wong
2022-11-28  5:31 ` [PATCH 08/95] lei_mirror: default to single job by default Eric Wong
2022-11-28  5:31 ` [PATCH 09/95] lei_mirror: move directory creation to v2-only path Eric Wong
2022-11-28  5:31 ` [PATCH 10/95] lei_mirror: retrieve description text asynchronously, too Eric Wong
2022-11-28  5:31 ` [PATCH 11/95] switch inotify/kevent stuff to v5.12 Eric Wong
2022-11-28  5:31 ` [PATCH 12/95] manifest: update module blurb + v5.12 Eric Wong
2022-11-28  5:31 ` [PATCH 13/95] lei_mirror: simplify _get_txt_start callers Eric Wong
2022-11-28  5:31 ` [PATCH 14/95] lei_mirror: elide description retrieval for v1|coderepo Eric Wong
2022-11-28  5:31 ` [PATCH 15/95] lei_mirror: add a hint for skipped epoch permissions Eric Wong
2022-11-28  5:31 ` [PATCH 16/95] lei_mirror: consolidate clone process management Eric Wong
2022-11-28  5:31 ` [PATCH 17/95] lei_mirror: load File::Path unconditionally Eric Wong
2022-11-28  5:31 ` [PATCH 18/95] lei_mirror: load most modules up-front Eric Wong
2022-11-28  5:31 ` [PATCH 19/95] lei_mirror: set gitweb.owner from manifest Eric Wong
2022-11-28  5:31 ` [PATCH 20/95] clone: support --dry-run / -n flag Eric Wong
2022-11-28  5:31 ` [PATCH 21/95] lei_mirror: initialize placeholders with "head" from manifest Eric Wong
2022-11-28  5:31 ` [PATCH 22/95] lei_mirror: support {reference} for v1 manifest clones Eric Wong
2022-11-28  5:31 ` [PATCH 23/95] lei_mirror: reduce noise on interrupted clones Eric Wong
2022-11-28  5:31 ` [PATCH 24/95] clone: support --inbox-config option Eric Wong
2022-11-28  5:31 ` [PATCH 25/95] lei_mirror: retrieve v2 description properly Eric Wong
2022-11-28  5:31 ` [PATCH 26/95] lei_mirror: reduce scope of v2 lock Eric Wong
2022-11-28  5:31 ` [PATCH 27/95] lei_mirror: allow --epoch on mixed v1/v2 clones Eric Wong
2022-11-28  5:31 ` [PATCH 28/95] lei_mirror: fix infinite loop in dependency resolution Eric Wong
2022-11-28  5:31 ` [PATCH 29/95] lei_mirror: defend against infinite loops Eric Wong
2022-11-28  5:31 ` [PATCH 30/95] lei_mirror: do not fetch descriptions if using manifest Eric Wong
2022-11-28  5:31 ` [PATCH 31/95] lei_mirror: require PublicInbox::Lock at use Eric Wong
2022-11-28  5:31 ` [PATCH 32/95] lei_mirror: fix glob semantics to match end-of-path Eric Wong
2022-11-28  5:31 ` [PATCH 33/95] lei_mirror: differentiate -entv vs -ent Eric Wong
2022-11-28  5:31 ` [PATCH 34/95] lei_mirror: support manifest {references} for v2 epochs Eric Wong
2022-11-28  5:31 ` [PATCH 35/95] lei_mirror: simplify v2 code paths Eric Wong
2022-11-28  5:31 ` [PATCH 36/95] clone: support --inbox-version Eric Wong
2022-11-28  5:31 ` [PATCH 37/95] lei_mirror: require Perl v5.12+ Eric Wong
2022-11-28  5:31 ` [PATCH 38/95] lei_mirror: ensure curl exits 22 on HTTP 404 responses Eric Wong
2022-11-28  5:31 ` [PATCH 39/95] lei_mirror: cleanup File::Temp OO usage Eric Wong
2022-11-28  5:31 ` [PATCH 40/95] lei_mirror: add `index' target to generated Makefile Eric Wong
2022-11-28  5:31 ` [PATCH 41/95] lei_mirror: do not write Makefile for --inbox-config=never Eric Wong
2022-11-28  5:31 ` [PATCH 42/95] lei_mirror: hoist out dump_manifest sub Eric Wong
2022-11-28  5:31 ` [PATCH 43/95] lei_mirror: avoid convoluted lazy_cb usage Eric Wong
2022-11-28  5:31 ` [PATCH 44/95] lei_mirror: simplify clone_v2_prep Eric Wong
2022-11-28  5:31 ` [PATCH 45/95] lei_mirror: support --objstore and forkgroups Eric Wong
2022-11-28  5:31 ` [PATCH 46/95] lei_mirror: cleanup process reaping logic Eric Wong
2022-11-28  5:31 ` [PATCH 47/95] lei_mirror: ensure git <1.8.5 fallback can use torsocks Eric Wong
2022-11-28  5:31 ` [PATCH 48/95] clone: flesh out --objstore behavior and document Eric Wong
2022-11-28  5:31 ` [PATCH 49/95] lei_mirror: always pack refs for coderepos Eric Wong
2022-11-28  5:31 ` [PATCH 50/95] lei_mirror: set description for non-inboxes, too Eric Wong
2022-11-28  5:31 ` [PATCH 51/95] lei_mirror: force --no-tags when fetching forkgroups Eric Wong
2022-11-28  5:31 ` [PATCH 52/95] lei_mirror: preserve permissions of existing alternates file Eric Wong
2022-11-28  5:31 ` [PATCH 53/95] lei_mirror: do not show ref updates w/o --verbose Eric Wong
2022-11-28  5:31 ` [PATCH 54/95] lei_mirror: drop git <1.8.5 support Eric Wong
2022-11-28  5:31 ` [PATCH 55/95] lei_mirror: make basename more descriptive Eric Wong
2022-11-28  5:31 ` [PATCH 56/95] lei_mirror: fix --dry-run for forkgroups Eric Wong
2022-11-28  5:31 ` [PATCH 57/95] lei_mirror: forkgroups use `git fetch --multiple' Eric Wong
2022-11-28  5:31 ` [PATCH 58/95] clone: move --dry-run handling to lei_mirror Eric Wong
2022-11-28  5:31 ` [PATCH 59/95] clone: drop unnecessary requires Eric Wong
2022-11-28  5:31 ` [PATCH 60/95] clone: use v5.12 Eric Wong
2022-11-28  5:31 ` [PATCH 61/95] clone: require `--objstore=' for default location Eric Wong
2022-11-28  5:31 ` [PATCH 62/95] lei_mirror: shorten remote names Eric Wong
2022-11-28  5:32 ` [PATCH 63/95] fetch: use v5.12 Eric Wong
2022-11-28  5:32 ` [PATCH 64/95] fetch: eliminate File::Temp->filename var Eric Wong
2022-11-28  5:32 ` [PATCH 65/95] lei_mirror: properly pack-refs in non-forkgroup repos Eric Wong
2022-11-28  5:32 ` [PATCH 66/95] lei_mirror: show child error error code Eric Wong
2022-11-28  5:32 ` [PATCH 67/95] on_destroy: support ->cancel callback Eric Wong
2022-11-28  5:32 ` [PATCH 68/95] lei_mirror: support resuming multi-repo clones Eric Wong
2022-11-28  5:32 ` [PATCH 69/95] lei_mirror: check fingerprints before fetching Eric Wong
2022-11-28  5:32 ` [PATCH 70/95] clone: support loading manifest.js.gz from destination Eric Wong
2022-11-28  5:32 ` [PATCH 71/95] lei_mirror: delay configuring forkgroups Eric Wong
2022-11-28  5:32 ` [PATCH 72/95] clone: canonicalize destination path from CLI Eric Wong
2022-11-28  5:32 ` [PATCH 73/95] clone|fetch: support passing --prune(-tags) to `git fetch' Eric Wong
2022-11-28  5:32 ` [PATCH 74/95] lei_mirror: avoid needless FD passing Eric Wong
2022-11-28  5:32 ` [PATCH 75/95] clone: support --keep-going/-k like make(1) Eric Wong
2022-11-28  5:32 ` [PATCH 76/95] lei_mirror: don't warn on missing manifest on initial clone Eric Wong
2022-11-28  5:32 ` [PATCH 77/95] lei_mirror: respect `./' and `../' prefixes for CLI args Eric Wong
2022-11-28  5:32 ` [PATCH 78/95] lei_mirror: --manifest= affects destination, too Eric Wong
2022-11-28  5:32 ` [PATCH 79/95] lei_mirror: update fingerprints when writing local manifest.js.gz Eric Wong
2022-11-28  5:32 ` [PATCH 80/95] lei_mirror: remove janky mirror.done stamp file Eric Wong
2022-11-28  5:32 ` [PATCH 81/95] lei_mirror: simplify most process spawning Eric Wong
2022-11-28  5:32 ` [PATCH 82/95] lei_mirror: run v1_done earlier on forkgroup done Eric Wong
2022-11-28  5:32 ` [PATCH 83/95] lei_mirror: simplify forkgroup-related subs Eric Wong
2022-11-28  5:32 ` [PATCH 84/95] lei_mirror: shorten scope mirror objects Eric Wong
2022-11-28  5:32 ` [PATCH 85/95] lei_mirror: set {head} from manifest Eric Wong
2022-11-28  5:32 ` [PATCH 86/95] lei_mirror: support {symlinks} " Eric Wong
2022-11-28  5:32 ` [PATCH 87/95] lei_mirror: eliminate circular references Eric Wong
2022-11-28  5:32 ` [PATCH 88/95] lei_mirror: use curl -z/--timecond if manifest exists Eric Wong
2022-11-28  5:32 ` [PATCH 89/95] lei_mirror: avoid redundant curl `-f' use Eric Wong
2022-11-28  5:32 ` [PATCH 90/95] lei_mirror: omit trailing slash for git remote.*.url Eric Wong
2022-11-28  5:32 ` [PATCH 91/95] lei_mirror: set info/web/last-modified from manifest Eric Wong
2022-11-28  5:32 ` [PATCH 92/95] lei_mirror: don't clobber inbox.config.example if it exists Eric Wong
2022-11-28  5:32 ` [PATCH 93/95] lei_mirror: break out of fgrp fetch iteration early Eric Wong
2022-11-28  5:32 ` [PATCH 94/95] clone: support --project-list= for cgit Eric Wong
2022-11-28  5:32 ` [PATCH 95/95] lei_mirror: handle forkgroup changes Eric Wong
