unofficial mirror of meta@public-inbox.org
* [PATCH 0/7] new public-inbox-{clone,fetch} commands
@ 2021-09-12  7:47 Eric Wong
  2021-09-12  7:47 ` [PATCH 1/7] lei_mirror: simplify error reporting Eric Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC
  To: meta

Hopefully, these new commands make maintaining mirrors of a
single (or handful of) multi-epoch v2 inboxes easier and less
error-prone.

Unlike grokmirror:
* these commands do not require extra config files of any kind
* they only allow cloning/fetching a single inbox per-invocation
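
To illustrate the one-inbox-per-invocation model, here is a minimal
sketch of a wrapper that builds one command line per inbox.  The
helper names (mirror_argv, mirror_all) and the directory layout are
hypothetical; only the documented invocations from this series are
used ("public-inbox-clone INBOX_URL [INBOX_DIR]" and
"public-inbox-fetch -C INBOX_DIR").

```perl
use strict;
use warnings;

# Hypothetical helper: pick the command for a single inbox.
# An existing directory is assumed to be an earlier clone, so it is
# fetched into; anything else gets an initial clone.
sub mirror_argv {
	my ($url, $dir) = @_;
	(-d $dir) ? [ 'public-inbox-fetch', '-C', $dir ]
		: [ 'public-inbox-clone', $url, $dir ];
}

# One invocation per inbox; the URL => directory map passed in by the
# caller is the only "configuration" involved.
sub mirror_all {
	my (%inboxes) = @_;
	map { mirror_argv($_, $inboxes{$_}) } sort keys %inboxes;
}
```

A caller could then run each command with something like
"system(@$_) == 0 or warn ..." for every element mirror_all returns.
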

"description" files also default to something more meaningful
for both public-inbox-init and -clone.

PATCH 7/7 also begins laying the groundwork for a v1 => v2
migration path which does not require existing mirrors
to redownload everything.

Eric Wong (7):
  lei_mirror: simplify error reporting
  lei_mirror: fix error message
  new public-inbox-{clone,fetch} commands
  clone|lei_mirror: write description in mirrors
  import: do not write a "description" file
  init: set a useful description
  fetch: use manifest.js.gz for v1

 Documentation/public-inbox-clone.pod |  71 +++++++++++
 Documentation/public-inbox-fetch.pod |  63 ++++++++++
 MANIFEST                             |   5 +
 lib/PublicInbox/Admin.pm             |   8 ++
 lib/PublicInbox/Fetch.pm             | 172 +++++++++++++++++++++++++++
 lib/PublicInbox/Import.pm            |   3 -
 lib/PublicInbox/LEI.pm               |   6 +-
 lib/PublicInbox/LeiMirror.pm         | 167 ++++++++++++++++----------
 lib/PublicInbox/TestCommon.pm        |   3 +-
 script/public-inbox-clone            |  58 +++++++++
 script/public-inbox-fetch            |  35 ++++++
 script/public-inbox-init             |   6 +
 t/init.t                             |   3 +
 t/lei-mirror.t                       |  52 ++++++++
 t/www_listing.t                      |   1 -
 15 files changed, 585 insertions(+), 68 deletions(-)
 create mode 100644 Documentation/public-inbox-clone.pod
 create mode 100644 Documentation/public-inbox-fetch.pod
 create mode 100644 lib/PublicInbox/Fetch.pm
 create mode 100755 script/public-inbox-clone
 create mode 100755 script/public-inbox-fetch


* [PATCH 1/7] lei_mirror: simplify error reporting
  2021-09-12  7:47 [PATCH 0/7] new public-inbox-{clone,fetch} commands Eric Wong
@ 2021-09-12  7:47 ` Eric Wong
  2021-09-12  7:47 ` [PATCH 2/7] lei_mirror: fix error message Eric Wong
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC
  To: meta

We are slowly transitioning to using die() more, which should
improve code reuse between the lei and non-lei parts of our
code.
---
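
A minimal sketch of the error-reporting style this patch moves
toward, assuming a plain fork+exec in place of our spawn() and
leaving out the OnDestroy-based reaping: system-level failures
die(), and the caller only inspects the wait status.  The names
run_and_wait and is_http_error are hypothetical stand-ins, not
functions in LeiMirror.pm.

```perl
use strict;
use warnings;
use POSIX ();

# Stand-in for run_reap: run @$cmd, wait for it, return the raw wait
# status ($?).  A failed fork or waitpid dies instead of returning undef,
# so callers no longer need the "// return" check.
sub run_and_wait {
	my ($cmd) = @_;
	my $pid = fork;
	defined($pid) or die "fork: $!";
	if ($pid == 0) { # child
		exec { $cmd->[0] } @$cmd;
		warn "exec @$cmd: $!\n";
		POSIX::_exit(127);
	}
	waitpid($pid, 0) == $pid or die "waitpid @$cmd: $!";
	$?;
}

# The diff treats curl exit code 22 as "404 missing"; the exit code
# lives in the high byte of the wait status.
sub is_http_error { (($_[0] >> 8) == 22) }
```

With this shape, a caller can write "my $cerr = run_and_wait($cmd);"
and branch on $cerr alone, as the hunks below do with run_reap.
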
 lib/PublicInbox/LeiMirror.pm | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 8689b825..96580f9e 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -103,7 +103,7 @@ sub _try_config {
 	my $f = "$ce-$$.tmp";
 	open(my $fh, '+>', $f) or return $lei->err("open $f: $! (non-fatal)");
 	my $opt = { 0 => $lei->{0}, 1 => $fh, 2 => $lei->{2} };
-	my $cerr = run_reap($lei, $cmd, $opt) // return;
+	my $cerr = run_reap($lei, $cmd, $opt);
 	if (($cerr >> 8) == 22) { # 404 missing
 		unlink($f) if -s $fh == 0;
 		return;
@@ -150,9 +150,9 @@ sub run_reap {
 	$opt->{pgid} = 0;
 	my $pid = spawn($cmd, undef, $opt);
 	my $reap = PublicInbox::OnDestroy->new($lei->can('sigint_reap'), $pid);
-	my $err = waitpid($pid, 0) == $pid ? undef : "waitpid @$cmd: $!";
+	waitpid($pid, 0) == $pid or die "waitpid @$cmd: $!";
 	@$reap = (); # cancel reap
-	$err ? $lei->err($err) : $?
+	$?
 }
 
 sub clone_v1 {
@@ -163,7 +163,7 @@ sub clone_v1 {
 	my $pfx = $curl->torsocks($lei, $uri) or return;
 	my $cmd = [ @$pfx, clone_cmd($lei, my $opt = {}),
 			$uri->as_string, $self->{dst} ];
-	my $cerr = run_reap($lei, $cmd, $opt) // return;
+	my $cerr = run_reap($lei, $cmd, $opt);
 	return $lei->child_error($cerr, "@$cmd failed") if $cerr;
 	_try_config($self);
 	index_cloned_inbox($self, 1);
@@ -193,7 +193,7 @@ failed to extract epoch number from $src
 	my @cmd = clone_cmd($lei, my $opt = {});
 	while (my $pair = shift(@src_edst)) {
 		my $cmd = [ @$pfx, @cmd, @$pair ];
-		my $cerr = run_reap($lei, $cmd, $opt) // return;
+		my $cerr = run_reap($lei, $cmd, $opt);
 		return $lei->child_error($cerr, "@$cmd failed") if $cerr;
 	}
 	undef $on_destroy; # unlock
@@ -228,9 +228,8 @@ sub try_manifest {
 	my $reap = PublicInbox::OnDestroy->new($lei->can('sigint_reap'), $pid);
 	my $gz = do { local $/; <$fh> } // die "read(curl $uri): $!";
 	close $fh;
-	my $err = waitpid($pid, 0) == $pid ? undef : "waitpid @$cmd: $!";
+	waitpid($pid, 0) == $pid or die "waitpid @$cmd: $!";
 	@$reap = ();
-	return $lei->err($err) if $err;
 	if ($?) {
 		return try_scrape($self) if ($? >> 8) == 22; # 404 missing
 		return $lei->child_error($?, "@$cmd failed");


* [PATCH 2/7] lei_mirror: fix error message
  2021-09-12  7:47 [PATCH 0/7] new public-inbox-{clone,fetch} commands Eric Wong
  2021-09-12  7:47 ` [PATCH 1/7] lei_mirror: simplify error reporting Eric Wong
@ 2021-09-12  7:47 ` Eric Wong
  2021-09-12  7:47 ` [PATCH 3/7] new public-inbox-{clone,fetch} commands Eric Wong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC
  To: meta

We're using rename(2) rather than link(2)
---
 lib/PublicInbox/LeiMirror.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 96580f9e..355813bd 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -109,7 +109,7 @@ sub _try_config {
 		return;
 	}
 	return $lei->err("# @$cmd failed (non-fatal)") if $cerr;
-	rename($f, $ce) or return $lei->err("link($f, $ce): $! (non-fatal)");
+	rename($f, $ce) or return $lei->err("rename($f, $ce): $! (non-fatal)");
 	my $cfg = PublicInbox::Config->git_config_dump($f, $lei->{2});
 	my $ibx = $self->{ibx} = {};
 	for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {


* [PATCH 3/7] new public-inbox-{clone,fetch} commands
  2021-09-12  7:47 [PATCH 0/7] new public-inbox-{clone,fetch} commands Eric Wong
  2021-09-12  7:47 ` [PATCH 1/7] lei_mirror: simplify error reporting Eric Wong
  2021-09-12  7:47 ` [PATCH 2/7] lei_mirror: fix error message Eric Wong
@ 2021-09-12  7:47 ` Eric Wong
  2021-09-12  7:47 ` [PATCH 4/7] clone|lei_mirror: write description in mirrors Eric Wong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC
  To: meta

Setting up and maintaining git-only mirrors of v2 inboxes is
complex since multiple commands are required to clone and fetch
into epochs.

Unlike grokmirror, these commands do not require any
configuration.  Instead, they rely on existing git config files
and work like "git clone --mirror" and "git fetch",
respectively.
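
As a rough sketch of what "rely on existing git config files" means
for v2: -fetch looks at the newest epoch directory, reads its
remote.origin.url, and strips the epoch suffix to recover the inbox
URL.  The two helpers below mirror the expressions used in
Fetch.pm in this patch, but their names are hypothetical and the
example URLs in the usage note are made up.

```perl
use strict;
use warnings;

# Given directory entries from $INBOX_DIR/git, return the highest epoch
# number, or undef if there are no "<number>.git" entries.
sub newest_epoch {
	my (@names) = @_;
	my @nr = sort { $b <=> $a } map { substr($_, 0, -4) + 0 }
			grep(/\A[0-9]+\.git\z/, @names);
	$nr[0];
}

# Strip the epoch component from an epoch's remote.origin.url to get
# the inbox URL; both ".../git/N.git" and ".../N" forms are accepted.
sub inbox_url_from_epoch {
	my ($git_url, $epoch) = @_;
	my $inbox_url = $git_url;
	$inbox_url =~ s!/git/$epoch(?:\.git)?/?\z!! or
		$inbox_url =~ s!/$epoch(?:\.git)?/?\z!! or
		die "Unable to infer inbox URL from <$git_url>\n";
	$inbox_url;
}
```

For example, an epoch URL of "https://example.com/inbox/git/2.git"
with epoch 2 yields "https://example.com/inbox", to which
"/manifest.js.gz" is then appended.
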

Like grokmirror, they use manifest.js.gz, but only on a
per-inbox basis so users won't have to clone every inbox of a
large instance nor edit config files to include/exclude inboxes
they're interested in.
---
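
A small sketch of how the per-inbox manifest avoids traffic on old
epochs: entries whose "fingerprint" and "modified" values match the
previously saved manifest are dropped, and only what remains is
fetched or cloned.  This follows the loop in do_fetch below; the
function name changed_entries is hypothetical, and any sample
manifest data fed to it is invented for illustration.

```perl
use strict;
use warnings;

# $m0: previously saved manifest (hashref, may be undef on first run)
# $m1: freshly downloaded manifest (hashref)
# Returns a hashref of entries from $m1 which are new or changed.
sub changed_entries {
	my ($m0, $m1) = @_;
	my %mdiff = %$m1;
	for my $k (keys %{$m0 // {}}) {
		my $v0 = $m0->{$k};
		my $cur = $m1->{$k} // next; # gone from the new manifest
		my $f0 = $v0->{fingerprint} // next;
		my $f1 = $cur->{fingerprint} // next;
		my $t0 = $v0->{modified} // next;
		my $t1 = $cur->{modified} // next;
		# unchanged only if both fingerprint and mtime agree
		delete $mdiff{$k} if $f0 eq $f1 && $t0 == $t1;
	}
	\%mdiff;
}
```

Entries missing either field are kept in the result, so an
incomplete manifest errs on the side of fetching.
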
 Documentation/public-inbox-clone.pod |  71 +++++++++++++
 Documentation/public-inbox-fetch.pod |  63 ++++++++++++
 MANIFEST                             |   5 +
 lib/PublicInbox/Admin.pm             |   8 ++
 lib/PublicInbox/Fetch.pm             | 145 +++++++++++++++++++++++++++
 lib/PublicInbox/LEI.pm               |   6 +-
 lib/PublicInbox/LeiMirror.pm         |  95 +++++++++++-------
 script/public-inbox-clone            |  58 +++++++++++
 script/public-inbox-fetch            |  35 +++++++
 t/lei-mirror.t                       |  29 ++++++
 10 files changed, 475 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/public-inbox-clone.pod
 create mode 100644 Documentation/public-inbox-fetch.pod
 create mode 100644 lib/PublicInbox/Fetch.pm
 create mode 100755 script/public-inbox-clone
 create mode 100755 script/public-inbox-fetch

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
new file mode 100644
index 00000000..fdb57663
--- /dev/null
+++ b/Documentation/public-inbox-clone.pod
@@ -0,0 +1,71 @@
+=head1 NAME
+
+public-inbox-clone - "git clone --mirror" wrapper
+
+=head1 SYNOPSIS
+
+public-inbox-clone INBOX_URL [INBOX_DIR]
+
+=head1 DESCRIPTION
+
+public-inbox-clone is a wrapper around C<git clone --mirror> for
+making the initial clone of a remote HTTP(S) public-inbox.  It
+allows cloning multi-epoch v2 inboxes with a single command and
+zero configuration.
+
+It does not run L<public-inbox-init(1)> nor
+L<public-inbox-index(1)>.  Those commands must be run separately
+if serving/searching the mirror is required.  As-is,
+public-inbox-clone is suitable for creating a git-only backup.
+
+public-inbox-clone does not use nor require any extra
+configuration files (not even C<~/.public-inbox/config>).
+
+L<public-inbox-fetch(1)> may be used to keep C<INBOX_DIR>
+up-to-date.
+
+For v2 inboxes, it will create a C<$INBOX_DIR/manifest.js.gz>
+file to speed up subsequent L<public-inbox-fetch(1)>.
+
+=head1 OPTIONS
+
+=over
+
+=item -q
+
+=item --quiet
+
+Quiets down progress messages, also passed to L<git-fetch(1)>.
+
+=item -v
+
+=item --verbose
+
+Increases verbosity, also passed to L<git-fetch(1)>.
+
+=item --torsocks=auto|no|yes
+
+=item --no-torsocks
+
+Whether to wrap L<git(1)> and L<curl(1)> commands with torsocks.
+
+Default: C<auto>
+
+=back
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/> and
+L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
+
+=head1 COPYRIGHT
+
+Copyright all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<public-inbox-fetch(1)>, L<public-inbox-init(1)>, L<public-inbox-index(1)>
diff --git a/Documentation/public-inbox-fetch.pod b/Documentation/public-inbox-fetch.pod
new file mode 100644
index 00000000..7944fdcd
--- /dev/null
+++ b/Documentation/public-inbox-fetch.pod
@@ -0,0 +1,63 @@
+=head1 NAME
+
+public-inbox-fetch - "git fetch" wrapper for v2 inbox mirrors
+
+=head1 SYNOPSIS
+
+public-inbox-fetch -C INBOX_DIR
+
+=head1 DESCRIPTION
+
+public-inbox-fetch updates git storage of public-inbox mirrors.
+With v2 inboxes, it allows detection of new epochs and avoids
+unnecessary traffic on old epochs.
+
+public-inbox-fetch does not use nor require any configuration
+files of its own.
+
+It does not run L<public-inbox-index(1)>, making it suitable
+for maintaining git-only backups.
+
+For v2 inboxes, it will maintain a C<$INBOX_DIR/manifest.js.gz>
+file to speed up future invocations.
+
+=head1 OPTIONS
+
+=over
+
+=item -q
+
+=item --quiet
+
+Quiets down progress messages, also passed to L<git-fetch(1)>.
+
+=item -v
+
+=item --verbose
+
+Increases verbosity, also passed to L<git-fetch(1)>.
+
+=item --torsocks=auto|no|yes
+
+=item --no-torsocks
+
+Whether to wrap L<git(1)> and L<curl(1)> commands with torsocks.
+
+Default: C<auto>
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/> and
+L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
+
+=head1 COPYRIGHT
+
+Copyright all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<public-inbox-index(1)>
diff --git a/MANIFEST b/MANIFEST
index c64f7d94..a1450880 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -51,6 +51,7 @@ Documentation/lei.pod
 Documentation/lei_design_notes.txt
 Documentation/marketing.txt
 Documentation/mknews.perl
+Documentation/public-inbox-clone.pod
 Documentation/public-inbox-compact.pod
 Documentation/public-inbox-config.pod
 Documentation/public-inbox-convert.pod
@@ -58,6 +59,7 @@ Documentation/public-inbox-daemon.pod
 Documentation/public-inbox-edit.pod
 Documentation/public-inbox-extindex-format.pod
 Documentation/public-inbox-extindex.pod
+Documentation/public-inbox-fetch.pod
 Documentation/public-inbox-glossary.pod
 Documentation/public-inbox-httpd.pod
 Documentation/public-inbox-imapd.pod
@@ -163,6 +165,7 @@ lib/PublicInbox/ExtSearchIdx.pm
 lib/PublicInbox/FakeImport.pm
 lib/PublicInbox/FakeInotify.pm
 lib/PublicInbox/Feed.pm
+lib/PublicInbox/Fetch.pm
 lib/PublicInbox/Filter/Base.pm
 lib/PublicInbox/Filter/Gmane.pm
 lib/PublicInbox/Filter/Mirror.pm
@@ -329,10 +332,12 @@ sa_config/README
 sa_config/root/etc/spamassassin/public-inbox.pre
 sa_config/user/.spamassassin/user_prefs
 script/lei
+script/public-inbox-clone
 script/public-inbox-compact
 script/public-inbox-convert
 script/public-inbox-edit
 script/public-inbox-extindex
+script/public-inbox-fetch
 script/public-inbox-httpd
 script/public-inbox-imapd
 script/public-inbox-index
diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 2534958b..9ff59bca 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -372,4 +372,12 @@ sub index_prepare ($$) {
 	$env;
 }
 
+sub do_chdir ($) {
+	my $chdir = $_[0] // return;
+	for my $d (@$chdir) {
+		next if $d eq ''; # same as git(1)
+		chdir $d or die "cd $d: $!";
+	}
+}
+
 1;
diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
new file mode 100644
index 00000000..d795731c
--- /dev/null
+++ b/lib/PublicInbox/Fetch.pm
@@ -0,0 +1,145 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# Wrapper to "git fetch" remote public-inboxes
+package PublicInbox::Fetch;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::IPC);
+use URI ();
+use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::Admin;
+use PublicInbox::LEI;
+use PublicInbox::LeiCurl;
+use PublicInbox::LeiMirror;
+use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
+use File::Temp ();
+
+sub new { bless {}, __PACKAGE__ }
+
+sub fetch_cmd ($$) {
+	my ($lei, $opt) = @_;
+	my @cmd = qw(git);
+	$opt->{$_} = $lei->{$_} for (0..2);
+	# we support "-c $key=$val" for arbitrary git config options
+	# e.g.: git -c http.proxy=socks5h://127.0.0.1:9050
+	push(@cmd, '-c', $_) for @{$lei->{opt}->{c} // []};
+	push @cmd, 'fetch';
+	push @cmd, '-q' if $lei->{opt}->{quiet};
+	push @cmd, '-v' if $lei->{opt}->{verbose};
+	@cmd;
+}
+
+sub remote_url ($$) {
+	my ($lei, $dir) = @_; # TODO: support non-"origin"?
+	my $cmd = [ qw(git config remote.origin.url) ];
+	my $fh = popen_rd($cmd, undef, { -C => $dir, 2 => $lei->{2} });
+	my $url = <$fh>;
+	close $fh or return;
+	chomp $url;
+	$url;
+}
+
+sub do_fetch {
+	my ($cls, $lei, $cd) = @_;
+	my $ibx_ver;
+	my $curl = PublicInbox::LeiCurl->new($lei) or return;
+	my $dir = PublicInbox::Admin::resolve_inboxdir($cd, \$ibx_ver);
+	if ($ibx_ver == 1) {
+		my $url = remote_url($lei, $dir) //
+			die "E: $dir missing remote.origin.url\n";
+		my $uri = URI->new($url);
+		my $torsocks = $curl->torsocks($lei, $uri);
+		my $opt = { -C => $dir };
+		my $cmd = [ @$torsocks, fetch_cmd($lei, $opt) ];
+		my $cerr = PublicInbox::LeiMirror::run_reap($lei, $cmd, $opt);
+		$lei->child_error($cerr, "@$cmd failed") if $cerr;
+		return;
+	}
+	# v2:
+	opendir my $dh, "$dir/git" or die "opendir $dir/git: $!";
+	my @epochs = sort { $b <=> $a } map { substr($_, 0, -4) + 0 }
+				grep(/\A[0-9]+\.git\z/, readdir($dh));
+	my ($git_url, $epoch);
+	for my $nr (@epochs) { # try newest epoch, first
+		my $edir = "$dir/git/$nr.git";
+		if (defined(my $url = remote_url($lei, $edir))) {
+			$git_url = $url;
+			$epoch = $nr;
+			last;
+		} else {
+			warn "W: $edir missing remote.origin.url\n";
+		}
+	}
+	$git_url or die "Unable to determine git URL\n";
+	my $inbox_url = $git_url;
+	$inbox_url =~ s!/git/$epoch(?:\.git)?/?\z!! or
+		$inbox_url =~ s!/$epoch(?:\.git)?/?\z!! or die <<EOM;
+Unable to infer inbox URL from <$git_url>
+EOM
+	$lei->qerr("# inbox URL: $inbox_url/");
+	my $muri = URI->new("$inbox_url/manifest.js.gz");
+	my $ft = File::Temp->new(TEMPLATE => 'manifest-XXXX',
+				UNLINK => 1, DIR => $dir);
+	my $fn = $ft->filename;
+	my @opt = (qw(-R -o), $fn);
+	my $mf = "$dir/manifest.js.gz";
+	my $m0; # current manifest.js.gz contents
+	if (open my $fh, '<', $mf) {
+		$m0 = eval {
+			PublicInbox::LeiMirror::decode_manifest($fh, $mf, $mf)
+		};
+		$lei->err($@) if $@;
+		push @opt, '-z', $mf if defined($m0);
+	}
+	my $curl_cmd = $curl->for_uri($lei, $muri, @opt);
+	my $opt = {};
+	$opt->{$_} = $lei->{$_} for (0..2);
+	my $cerr = PublicInbox::LeiMirror::run_reap($lei, $curl_cmd, $opt);
+	return $lei->child_error($cerr, "@$curl_cmd failed") if $cerr;
+	return if !-s $ft; # 304 Not Modified via curl -z
+
+	my $m1 = PublicInbox::LeiMirror::decode_manifest($ft, $fn, $muri);
+	my $mdiff = { %$m1 };
+
+	# filter out unchanged entries
+	while (my ($k, $v0) = each %{$m0 // {}}) {
+		my $cur = $m1->{$k} // next;
+		my $f0 = $v0->{fingerprint} // next;
+		my $f1 = $cur->{fingerprint} // next;
+		my $t0 = $v0->{modified} // next;
+		my $t1 = $cur->{modified} // next;
+		delete($mdiff->{$k}) if $f0 eq $f1 && $t0 == $t1;
+	}
+	my $ibx_uri = URI->new("$inbox_url/");
+	my ($path_pfx, $v1_bare, @v2_epochs) =
+		PublicInbox::LeiMirror::deduce_epochs($mdiff, $ibx_uri->path);
+	defined($v1_bare) and die <<EOM;
+E: got v1 `$v1_bare' when expecting v2 epoch(s) in <$muri>, WTF?
+EOM
+	my @epoch_nr = sort { $a <=> $b }
+		map { my ($nr) = (m!/([0-9]+)\.git\z!g) } @v2_epochs;
+
+	# n.b. this expects all epochs are from the same host
+	my $torsocks = $curl->torsocks($lei, $muri);
+	for my $nr (@epoch_nr) {
+		my $dir = "$dir/git/$nr.git";
+		my $cmd;
+		my $opt = {};
+		if (-d $dir) {
+			$opt->{-C} = $dir;
+			$cmd = [ @$torsocks, fetch_cmd($lei, $opt) ];
+		} else {
+			my $e_uri = $ibx_uri->clone;
+			$e_uri->path($ibx_uri->path."git/$nr.git");
+			$cmd = [ @$torsocks,
+				PublicInbox::LeiMirror::clone_cmd($lei, $opt),
+				$$e_uri, $dir ];
+		}
+		my $cerr = PublicInbox::LeiMirror::run_reap($lei, $cmd, $opt);
+		return $lei->child_error($cerr, "@$cmd failed") if $cerr;
+	}
+	rename($fn, $mf) or die "E: rename($fn, $mf): $!\n";
+	$ft->unlink_on_destroy(0);
+}
+
+1;
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index aff2bf19..6d5d3c03 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -468,6 +468,8 @@ sub x_it ($$) {
 		$self->{pkt_op_p}->pkt_do('x_it', $code);
 	} elsif ($self->{sock}) { # to lei(1) client
 		send($self->{sock}, "x_it $code", MSG_EOR);
+	} elsif ($quit == \&CORE::exit) { # an admin command
+		exit($code >> 8);
 	} # else ignore if client disconnected
 }
 
@@ -511,7 +513,7 @@ sub fail ($$;$) {
 	my ($self, $buf, $exit_code) = @_;
 	$self->{failed}++;
 	err($self, $buf) if defined $buf;
-	# calls fail_handler:
+	# calls fail_handler
 	$self->{pkt_op_p}->pkt_do('!') if $self->{pkt_op_p};
 	x_it($self, ($exit_code // 1) << 8);
 	undef;
@@ -536,6 +538,8 @@ sub child_error { # passes non-fatal curl exit codes to user
 		$self->{pkt_op_p}->pkt_do('child_error', $child_error);
 	} elsif ($self->{sock}) { # to lei(1) client
 		send($self->{sock}, "child_error $child_error", MSG_EOR);
+	} else { # non-lei admin command
+		$self->{child_error} ||= $child_error;
 	} # else noop if client disconnected
 }
 
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 355813bd..c128d13d 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -8,6 +8,7 @@ use v5.10.1;
 use parent qw(PublicInbox::IPC);
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
 use PublicInbox::Spawn qw(popen_rd spawn);
+use File::Temp ();
 
 sub do_finish_mirror { # dwaitpid callback
 	my ($arg, $pid) = @_;
@@ -18,7 +19,9 @@ sub do_finish_mirror { # dwaitpid callback
 	} elsif (!unlink($f)) {
 		$lei->err("unlink($f): $!") unless $!{ENOENT};
 	} else {
-		$lei->add_external_finish($mrr->{dst});
+		if ($lei->{cmd} ne 'public-inbox-clone') {
+			$lei->add_external_finish($mrr->{dst});
+		}
 		$lei->qerr("# mirrored $mrr->{src} => $mrr->{dst}");
 	}
 	$lei->dclose;
@@ -121,33 +124,38 @@ sub _try_config {
 
 sub index_cloned_inbox {
 	my ($self, $iv) = @_;
-	my $ibx = delete($self->{ibx}) // {
-		address => [ 'lei@example.com' ],
-		version => $iv,
-	};
-	$ibx->{inboxdir} = $self->{dst};
-	PublicInbox::Inbox->new($ibx);
-	PublicInbox::InboxWritable->new($ibx);
-	my $opt = {};
 	my $lei = $self->{lei};
-	for my $sw ($lei->index_opt) {
-		my ($k) = ($sw =~ /\A([\w-]+)/);
-		$opt->{$k} = $lei->{opt}->{$k};
+
+	# n.b. public-inbox-clone works w/o (SQLite || Xapian)
+	# lei is useless without Xapian + SQLite
+	if ($lei->{cmd} ne 'public-inbox-clone') {
+		my $ibx = delete($self->{ibx}) // {
+			address => [ 'lei@example.com' ],
+			version => $iv,
+		};
+		$ibx->{inboxdir} = $self->{dst};
+		PublicInbox::Inbox->new($ibx);
+		PublicInbox::InboxWritable->new($ibx);
+		my $opt = {};
+		for my $sw ($lei->index_opt) {
+			my ($k) = ($sw =~ /\A([\w-]+)/);
+			$opt->{$k} = $lei->{opt}->{$k};
+		}
+		# force synchronous dwaitpid for v2:
+		local $PublicInbox::DS::in_loop = 0;
+		my $cfg = PublicInbox::Config->new(undef, $lei->{2});
+		my $env = PublicInbox::Admin::index_prepare($opt, $cfg);
+		local %ENV = (%ENV, %$env) if $env;
+		PublicInbox::Admin::progress_prepare($opt, $lei->{2});
+		PublicInbox::Admin::index_inbox($ibx, undef, $opt);
 	}
-	# force synchronous dwaitpid for v2:
-	local $PublicInbox::DS::in_loop = 0;
-	my $cfg = PublicInbox::Config->new(undef, $lei->{2});
-	my $env = PublicInbox::Admin::index_prepare($opt, $cfg);
-	local %ENV = (%ENV, %$env) if $env;
-	PublicInbox::Admin::progress_prepare($opt, $lei->{2});
-	PublicInbox::Admin::index_inbox($ibx, undef, $opt);
 	open my $x, '>', "$self->{dst}/mirror.done"; # for do_finish_mirror
 }
 
 sub run_reap {
 	my ($lei, $cmd, $opt) = @_;
 	$lei->qerr("# @$cmd");
-	$opt->{pgid} = 0;
+	$opt->{pgid} = 0 if $lei->{sock};
 	my $pid = spawn($cmd, undef, $opt);
 	my $reap = PublicInbox::OnDestroy->new($lei->can('sigint_reap'), $pid);
 	waitpid($pid, 0) == $pid or die "waitpid @$cmd: $!";
@@ -205,6 +213,7 @@ sub deduce_epochs ($$) {
 	my ($m, $path) = @_;
 	my ($v1_bare, @v2_epochs);
 	my $path_pfx = '';
+	$path =~ s!/+\z!!;
 	do {
 		$v1_bare = $m->{$path};
 		@v2_epochs = grep(m!\A\Q$path\E/git/[0-9]+\.git\z!, keys %$m);
@@ -213,6 +222,18 @@ sub deduce_epochs ($$) {
 	($path_pfx, $v1_bare, @v2_epochs);
 }
 
+sub decode_manifest ($$$) {
+	my ($fh, $fn, $uri) = @_;
+	my $js;
+	my $gz = do { local $/; <$fh> } // die "slurp($fn): $!";
+	gunzip(\$gz => \$js, MultiStream => 1) or
+		die "gunzip($uri): $GunzipError\n";
+	my $m = eval { PublicInbox::Config->json->decode($js) };
+	die "$uri: error decoding `$js': $@\n" if $@;
+	ref($m) eq 'HASH' or die "$uri unknown type: ".ref($m);
+	$m;
+}
+
 sub try_manifest {
 	my ($self) = @_;
 	my $uri = URI->new($self->{src});
@@ -221,26 +242,19 @@ sub try_manifest {
 	my $path = $uri->path;
 	chop($path) eq '/' or die "BUG: $uri not canonicalized";
 	$uri->path($path . '/manifest.js.gz');
-	my $cmd = $curl->for_uri($lei, $uri);
-	$lei->qerr("# @$cmd");
-	my $opt = { 0 => $lei->{0}, 2 => $lei->{2} };
-	my ($fh, $pid) = popen_rd($cmd, undef, $opt);
-	my $reap = PublicInbox::OnDestroy->new($lei->can('sigint_reap'), $pid);
-	my $gz = do { local $/; <$fh> } // die "read(curl $uri): $!";
-	close $fh;
-	waitpid($pid, 0) == $pid or die "waitpid @$cmd: $!";
-	@$reap = ();
-	if ($?) {
-		return try_scrape($self) if ($? >> 8) == 22; # 404 missing
-		return $lei->child_error($?, "@$cmd failed");
+	my $pdir = $lei->rel2abs($self->{dst});
+	$pdir =~ s!/[^/]+/?\z!!;
+	my $ft = File::Temp->new(TEMPLATE => 'manifest-XXXX',
+				UNLINK => 1, DIR => $pdir);
+	my $fn = $ft->filename;
+	my $cmd = $curl->for_uri($lei, $uri, '-R', '-o', $fn);
+	my $opt = { 0 => $lei->{0}, 1 => $lei->{1}, 2 => $lei->{2} };
+	my $cerr = run_reap($lei, $cmd, $opt);
+	if ($cerr) {
+		return try_scrape($self) if ($cerr >> 8) == 22; # 404 missing
+		return $lei->child_error($cerr, "@$cmd failed");
 	}
-	my $js;
-	gunzip(\$gz => \$js, MultiStream => 1) or
-		die "gunzip($uri): $GunzipError";
-	my $m = eval { PublicInbox::Config->json->decode($js) };
-	die "$uri: error decoding `$js': $@" if $@;
-	ref($m) eq 'HASH' or die "$uri unknown type: ".ref($m);
-
+	my $m = decode_manifest($ft, $fn, $uri);
 	my ($path_pfx, $v1_bare, @v2_epochs) = deduce_epochs($m, $path);
 	if (@v2_epochs) {
 		# It may be possible to have v1 + v2 in parallel someday:
@@ -254,6 +268,9 @@ EOM
 			$uri->clone
 		} @v2_epochs;
 		clone_v2($self, \@v2_epochs);
+		my $fin = "$self->{dst}/manifest.js.gz";
+		rename($fn, $fin) or die "E: rename($fn, $fin): $!";
+		$ft->unlink_on_destroy(0);
 	} elsif (defined $v1_bare) {
 		clone_v1($self);
 	} else {
diff --git a/script/public-inbox-clone b/script/public-inbox-clone
new file mode 100755
index 00000000..2b18969f
--- /dev/null
+++ b/script/public-inbox-clone
@@ -0,0 +1,58 @@
+#!perl -w
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# Wrapper to git clone remote public-inboxes
+use strict;
+use v5.10.1;
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
+my $opt = {};
+my $help = <<EOF; # the following should fit w/o scrolling in 80x24 term:
+usage: public-inbox-clone INBOX_URL [DESTINATION]
+
+  clone remote public-inboxes
+
+options:
+
+  --torsocks VAL      whether or not to wrap git and curl commands with
+                      torsocks (default: `auto')
+                      Must be one of: `auto', `no' or `yes'
+  --verbose | -v      increase verbosity (may be repeated)
+    --quiet | -q      quiet down progress messages
+    -C DIR            chdir to specified directory
+EOF
+GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@
+		no-torsocks torsocks=s)) or die $help;
+if ($opt->{help}) { print $help; exit };
+require PublicInbox::Admin; # loads Config
+PublicInbox::Admin::do_chdir(delete $opt->{C});
+PublicInbox::Admin::setup_signals();
+$SIG{PIPE} = 'IGNORE';
+
+my ($url, $dst, $extra) = @ARGV;
+die $help if !defined($url) || defined($extra);
+defined($dst) or ($dst) = ($url =~ m!/([^/]+)/?\z!);
+index($dst, "\n") >= 0 and die "`\\n' not allowed in `$dst'";
+
+# n.b. this is still a truckload of code...
+require URI;
+require PublicInbox::LEI;
+require PublicInbox::LeiExternal;
+require PublicInbox::LeiMirror;
+require PublicInbox::LeiCurl;
+require PublicInbox::Lock;
+
+$url = PublicInbox::LeiExternal::ext_canonicalize($url);
+my $lei = bless {
+	env => \%ENV, opt => $opt, cmd => 'public-inbox-clone',
+	0 => *STDIN{GLOB}, 2 => *STDERR{GLOB},
+}, 'PublicInbox::LEI';
+open $lei->{1}, '+<&=', 1 or die "dup: $!";
+open $lei->{3}, '.' or die "open . $!";
+my $mrr = bless {
+	lei => $lei,
+	src => $url,
+	dst => $dst,
+}, 'PublicInbox::LeiMirror';
+$mrr->do_mirror;
+$mrr->can('do_finish_mirror')->([$mrr, $lei], $$);
+exit(($lei->{child_error} // 0) >> 8);
diff --git a/script/public-inbox-fetch b/script/public-inbox-fetch
new file mode 100755
index 00000000..5d303574
--- /dev/null
+++ b/script/public-inbox-fetch
@@ -0,0 +1,35 @@
+#!perl -w
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# Wrapper to git fetch remote public-inboxes
+use strict;
+use v5.10.1;
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
+my $opt = {};
+my $help = <<EOF; # the following should fit w/o scrolling in 80x24 term:
+usage: public-inbox-fetch -C DESTINATION
+
+  fetch remote public-inboxes
+
+options:
+
+  --torsocks VAL      whether or not to wrap git and curl commands with
+                      torsocks (default: `auto')
+                      Must be one of: `auto', `no' or `yes'
+  --verbose | -v      increase verbosity (may be repeated)
+    --quiet | -q      quiet down progress messages
+    -C DIR            chdir to specified directory
+EOF
+GetOptions($opt, qw(help|h quiet|q verbose|v+ C=s@ c=s@
+	no-torsocks torsocks=s)) or die $help;
+if ($opt->{help}) { print $help; exit };
+require PublicInbox::Fetch; # loads Admin
+PublicInbox::Admin::do_chdir(delete $opt->{C});
+PublicInbox::Admin::setup_signals();
+$SIG{PIPE} = 'IGNORE';
+
+my $lei = bless {
+	env => \%ENV, opt => $opt, cmd => 'public-inbox-fetch',
+	0 => *STDIN{GLOB}, 1 => *STDOUT{GLOB}, 2 => *STDERR{GLOB},
+}, 'PublicInbox::LEI';
+PublicInbox::Fetch->do_fetch($lei, '.');
diff --git a/t/lei-mirror.t b/t/lei-mirror.t
index a61a7565..75e25b3f 100644
--- a/t/lei-mirror.t
+++ b/t/lei-mirror.t
@@ -46,6 +46,7 @@ test_lei({ tmpdir => $tmpdir }, sub {
 
 	lei_ok('add-external', "$t1-pfx", '--mirror', "$http/pfx/t1/",
 			\'--mirror v1 w/ PSGI prefix');
+	ok(!-e "$t1-pfx/mirror.done", 'no leftover mirror.done');
 
 	my $d = "$home/404";
 	ok(!lei(qw(add-external --mirror), "$http/404", $d), 'mirror 404');
@@ -77,6 +78,34 @@ test_lei({ tmpdir => $tmpdir }, sub {
 	} # for
 });
 
+SKIP: {
+	undef $sock;
+	my $d = "$tmpdir/d";
+	mkdir $d or xbail "mkdir $d $!";
+	my $opt = { -C => $d, 2 => \(my $err) };
+	ok(!run_script([qw(-clone -q), "$http/404"], undef, $opt), '404 fails');
+	ok(!-d "$d/404", 'destination not created');
+	delete $opt->{2};
+
+	ok(run_script([qw(-clone -q -C), $d, "$http/t2"], undef, $opt),
+		'-clone succeeds on v2');
+	ok(-d "$d/t2/git/0.git", 'epoch cloned');
+	ok(-f "$d/t2/manifest.js.gz", 'manifest saved');
+	ok(!-e "$d/t2/mirror.done", 'no leftover mirror.done');
+	ok(run_script([qw(-fetch -q -C), "$d/t2"], undef, $opt),
+		'-fetch succeeds w/ manifest.js.gz');
+	unlink("$d/t2/manifest.js.gz") or xbail "unlink $!";
+	ok(run_script([qw(-fetch -q -C), "$d/t2"], undef, $opt),
+		'-fetch succeeds w/o manifest.js.gz');
+
+	ok(run_script([qw(-clone -q -C), $d, "$http/t1"], undef, $opt),
+		'cloning v1 works');
+	ok(-d "$d/t1", 'v1 cloned');
+	ok(!-e "$d/t1/mirror.done", 'no leftover file');
+	ok(run_script([qw(-fetch -q -C), "$d/t1"], undef, $opt),
+		'fetching v1 works');
+}
+
 ok($td->kill, 'killed -httpd');
 $td->join;
 


* [PATCH 4/7] clone|lei_mirror: write description in mirrors
  2021-09-12  7:47 [PATCH 0/7] new public-inbox-{clone,fetch} commands Eric Wong
                   ` (2 preceding siblings ...)
  2021-09-12  7:47 ` [PATCH 3/7] new public-inbox-{clone,fetch} commands Eric Wong
@ 2021-09-12  7:47 ` Eric Wong
  2021-09-12  7:47 ` [PATCH 5/7] import: do not write a "description" file Eric Wong
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC
  To: meta

Instead of generic "Unnamed repository" or "missing" messages,
show "mirror of $URL" since it seems like a better default when
creating a mirror.
---
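
The new _get_txt helper writes into a File::Temp file in the
destination directory and only rename()s it into place on success,
so a failed download never leaves a partial file behind.  Below is
a minimal sketch of that pattern, assuming the content is already
in memory rather than written by curl; the function name
write_atomically is hypothetical.

```perl
use strict;
use warnings;
use File::Temp ();

# Write $content to "$dir/$file" via a temporary file in the same
# directory.  If anything dies before the rename, File::Temp unlinks
# the temporary file when $ft goes out of scope.
sub write_atomically {
	my ($dir, $file, $content) = @_;
	my $ft = File::Temp->new(TEMPLATE => "$file-XXXX",
				UNLINK => 1, DIR => $dir);
	print $ft $content or die "print: $!";
	close($ft) or die "close: $!";
	my $tmp = $ft->filename;
	my $dst = "$dir/$file";
	rename($tmp, $dst) or die "rename($tmp, $dst): $!";
	$ft->unlink_on_destroy(0); # nothing left to clean up
	$dst;
}
```

Something like write_atomically($dst, 'description', "mirror of
$url\n") would produce the default described above, though that
exact call is only an illustration.
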
 lib/PublicInbox/LeiMirror.pm | 63 +++++++++++++++++++++++++-----------
 t/lei-mirror.t               | 23 +++++++++++++
 2 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index c128d13d..fe1cefe2 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -9,6 +9,7 @@ use parent qw(PublicInbox::IPC);
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
 use PublicInbox::Spawn qw(popen_rd spawn);
 use File::Temp ();
+use Fcntl qw(SEEK_SET);
 
 sub do_finish_mirror { # dwaitpid callback
 	my ($arg, $pid) = @_;
@@ -87,6 +88,27 @@ sub clone_cmd {
 	@cmd;
 }
 
+sub _get_txt { # non-fatal
+	my ($self, $endpoint, $file) = @_;
+	my $uri = URI->new($self->{src});
+	my $lei = $self->{lei};
+	my $path = $uri->path;
+	chop($path) eq '/' or die "BUG: $uri not canonicalized";
+	$uri->path("$path/$endpoint");
+	my $cmd = $self->{curl}->for_uri($lei, $uri, '--compressed');
+	my $ce = "$self->{dst}/$file";
+	my $ft = File::Temp->new(TEMPLATE => "$file-XXXX",
+				UNLINK => 1, DIR => $self->{dst});
+	my $opt = { 0 => $lei->{0}, 1 => $ft, 2 => $lei->{2} };
+	my $cerr = run_reap($lei, $cmd, $opt);
+	return "$uri missing" if ($cerr >> 8) == 22;
+	return "# @$cmd failed (non-fatal)" if $cerr;
+	my $f = $ft->filename;
+	rename($f, $ce) or return "rename($f, $ce): $! (non-fatal)";
+	$ft->unlink_on_destroy(0);
+	undef; # success
+}
+
 # tries the relatively new /$INBOX/_/text/config/raw endpoint
 sub _try_config {
 	my ($self) = @_;
@@ -96,24 +118,10 @@ sub _try_config {
 		File::Path::mkpath($dst);
 		-d $dst or die "mkpath($dst): $!\n";
 	}
-	my $uri = URI->new($self->{src});
-	my $lei = $self->{lei};
-	my $path = $uri->path;
-	chop($path) eq '/' or die "BUG: $uri not canonicalized";
-	$uri->path($path . '/_/text/config/raw');
-	my $cmd = $self->{curl}->for_uri($lei, $uri, '--compressed');
-	my $ce = "$dst/inbox.config.example";
-	my $f = "$ce-$$.tmp";
-	open(my $fh, '+>', $f) or return $lei->err("open $f: $! (non-fatal)");
-	my $opt = { 0 => $lei->{0}, 1 => $fh, 2 => $lei->{2} };
-	my $cerr = run_reap($lei, $cmd, $opt);
-	if (($cerr >> 8) == 22) { # 404 missing
-		unlink($f) if -s $fh == 0;
-		return;
-	}
-	return $lei->err("# @$cmd failed (non-fatal)") if $cerr;
-	rename($f, $ce) or return $lei->err("rename($f, $ce): $! (non-fatal)");
-	my $cfg = PublicInbox::Config->git_config_dump($f, $lei->{2});
+	my $err = _get_txt($self, qw(_/text/config/raw inbox.config.example));
+	return $self->{lei}->err($err) if $err;
+	my $f = "$self->{dst}/inbox.config.example";
+	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
 	my $ibx = $self->{ibx} = {};
 	for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {
 		for (qw(address newsgroup nntpmirror)) {
@@ -122,9 +130,28 @@ sub _try_config {
 	}
 }
 
+sub set_description ($) {
+	my ($self) = @_;
+	my $f = "$self->{dst}/description";
+	open my $fh, '+>>', $f or die "open($f): $!";
+	seek($fh, 0, SEEK_SET) or die "seek($f): $!";
+	chomp(my $d = do { local $/; <$fh> } // die "read($f): $!");
+	if ($d eq '($INBOX_DIR/description missing)' ||
+			$d =~ /^Unnamed repository/ || $d !~ /\S/) {
+		seek($fh, 0, SEEK_SET) or die "seek($f): $!";
+		truncate($fh, 0) or die "truncate($f): $!";
+		print $fh "mirror of $self->{src}\n" or die "print($f): $!";
+		close $fh or die "close($f): $!";
+	}
+}
+
 sub index_cloned_inbox {
 	my ($self, $iv) = @_;
 	my $lei = $self->{lei};
+	my $err = _get_txt($self, qw(description description));
+	$lei->err($err) if $err; # non fatal
+	eval { set_description($self) };
+	warn $@ if $@;
 
 	# n.b. public-inbox-clone works w/o (SQLite || Xapian)
 	# lei is useless without Xapian + SQLite
diff --git a/t/lei-mirror.t b/t/lei-mirror.t
index 75e25b3f..35b77cf7 100644
--- a/t/lei-mirror.t
+++ b/t/lei-mirror.t
@@ -2,6 +2,7 @@
 # Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict; use v5.10.1; use PublicInbox::TestCommon;
+use PublicInbox::Inbox;
 require_mods(qw(-httpd lei));
 my $sock = tcp_server();
 my ($tmpdir, $for_destroy) = tmpdir();
@@ -15,6 +16,8 @@ test_lei({ tmpdir => $tmpdir }, sub {
 	my $t1 = "$home/t1-mirror";
 	lei_ok('add-external', $t1, '--mirror', "$http/t1/", \'--mirror v1');
 	ok(-f "$t1/public-inbox/msgmap.sqlite3", 't1-mirror indexed');
+	is(PublicInbox::Inbox::try_cat("$t1/description"),
+		"mirror of $http/t1/\n", 'description set');
 
 	lei_ok('ls-external');
 	like($lei_out, qr!\Q$t1\E!, 't1 added to ls-externals');
@@ -22,6 +25,9 @@ test_lei({ tmpdir => $tmpdir }, sub {
 	my $t2 = "$home/t2-mirror";
 	lei_ok('add-external', $t2, '--mirror', "$http/t2/", \'--mirror v2');
 	ok(-f "$t2/msgmap.sqlite3", 't2-mirror indexed');
+	ok(-f "$t2/description", 't2 description');
+	is(PublicInbox::Inbox::try_cat("$t2/description"),
+		"mirror of $http/t2/\n", 'description set');
 
 	lei_ok('ls-external');
 	like($lei_out, qr!\Q$t2\E!, 't2 added to ls-externals');
@@ -109,4 +115,21 @@ SKIP: {
 ok($td->kill, 'killed -httpd');
 $td->join;
 
+{
+	require_ok 'PublicInbox::LeiMirror';
+	my $mrr = { src => 'https://example.com/src/', dst => $tmpdir };
+	my $exp = "mirror of https://example.com/src/\n";
+	my $f = "$tmpdir/description";
+	PublicInbox::LeiMirror::set_description($mrr);
+	is(PublicInbox::Inbox::try_cat($f), $exp, 'description set on ENOENT');
+
+	my $fh;
+	(open($fh, '>', $f) and close($fh)) or xbail $!;
+	PublicInbox::LeiMirror::set_description($mrr);
+	is(PublicInbox::Inbox::try_cat($f), $exp, 'description set on empty');
+	(open($fh, '>', $f) and print $fh "x\n" and close($fh)) or xbail $!;
+	is(PublicInbox::Inbox::try_cat($f), "x\n",
+		'description preserved if non-default');
+}
+
 done_testing;


* [PATCH 5/7] import: do not write a "description" file
  2021-09-12  7:47 [PATCH 0/7] new public-inbox-{clone,fetch} commands Eric Wong
                   ` (3 preceding siblings ...)
  2021-09-12  7:47 ` [PATCH 4/7] clone|lei_mirror: write description in mirrors Eric Wong
@ 2021-09-12  7:47 ` Eric Wong
  2021-09-12  7:47 ` [PATCH 6/7] init: set a useful description Eric Wong
  2021-09-12  7:47 ` [PATCH 7/7] fetch: use manifest.js.gz for v1 Eric Wong
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC (permalink / raw)
  To: meta

The default value is worthless to us and git functions fine
without the file.  public-inbox-init will create a useful one
in the next change.
---
 lib/PublicInbox/Import.pm | 3 ---
 t/www_listing.t           | 1 -
 2 files changed, 4 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 362cdc47..17adfabd 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -451,9 +451,6 @@ sub add {
 }
 
 my @INIT_FILES = ('HEAD' => undef, # filled in at runtime
-		'description' => <<EOD,
-Unnamed repository; edit this file 'description' to name the repository.
-EOD
 		'config' => <<EOC);
 [core]
 	repositoryFormatVersion = 0
diff --git a/t/www_listing.t b/t/www_listing.t
index 7ea12eea..5f90139a 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -101,7 +101,6 @@ SKIP: {
 	is(xsys('git', "--git-dir=$alt", qw(config gitweb.owner),
 		"lorelei \xc4\x80"), 0,
 		'set gitweb user');
-	ok(unlink("$bare->{git_dir}/description"), 'removed bare/description');
 	open $fh, '>', $cfgfile or xbail "open $cfgfile: $!";
 	$fh->autoflush(1);
 	print $fh <<"" or xbail "print $!";


* [PATCH 6/7] init: set a useful description
  2021-09-12  7:47 [PATCH 0/7] new public-inbox-{clone,fetch} commands Eric Wong
                   ` (4 preceding siblings ...)
  2021-09-12  7:47 ` [PATCH 5/7] import: do not write a "description" file Eric Wong
@ 2021-09-12  7:47 ` Eric Wong
  2021-09-12  7:47 ` [PATCH 7/7] fetch: use manifest.js.gz for v1 Eric Wong
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC (permalink / raw)
  To: meta

"Unnamed repository" for v1 inboxes was misleading, and having a
non-existent description for v2 was equally annoying, so set a
short description based on the primary address.

We remove descriptions when setting up new test inboxes to
preserve the behavior of the t/lei-mirror.t test case.
---
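A minimal sketch of the create-only-if-absent idiom used for the
description file, assuming a throwaway directory.  `write_if_absent`
is a hypothetical name, and unlike the patch (which ignores any
sysopen failure) this sketch dies on errors other than EEXIST.

```perl
use strict;
use warnings;
use Errno qw(EEXIST);
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Temp qw(tempdir);

# Hypothetical helper: write $text to $path only if $path does not exist.
# Returns 1 if the file was created, 0 if it was already there.
sub write_if_absent {
	my ($path, $text) = @_;
	if (sysopen(my $fh, $path, O_CREAT|O_EXCL|O_WRONLY)) {
		print $fh $text or die "print($path): $!";
		close $fh or die "close($path): $!";
		return 1;
	}
	$! == EEXIST or die "sysopen($path): $!";
	0; # existing file is left untouched
}

my $dir = tempdir(CLEANUP => 1);
my $f = "$dir/description";
print write_if_absent($f, "public inbox for a\@example.com\n"), "\n";
print write_if_absent($f, "ignored\n"), "\n";
```

O_EXCL makes the existence check and the creation a single step, so
a description written by the user beforehand is never clobbered.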
 lib/PublicInbox/TestCommon.pm | 3 ++-
 script/public-inbox-init      | 6 ++++++
 t/init.t                      | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 2e3e9ecc..14dac03f 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -605,7 +605,8 @@ sub setup_public_inboxes () {
 		run_script([qw(-init --skip-docdata), "-V$V",
 				'--newsgroup', "t.v$V", "t$V",
 				"$test_home/t$V", "http://example.com/t$V",
-				"t$V\@example.com" ]) or BAIL_OUT "init v$V";
+				"t$V\@example.com" ]) or xbail "init v$V";
+		unlink "$test_home/t$V/description" or xbail "unlink $!";
 	}
 	require PublicInbox::Config;
 	require PublicInbox::InboxWritable;
diff --git a/script/public-inbox-init b/script/public-inbox-init
index ced88235..78a4d3bd 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -246,3 +246,9 @@ if (defined $perm) {
 rename $pi_config_tmp, $pi_config or
 	die "failed to rename `$pi_config_tmp' to `$pi_config': $!\n";
 undef $auto_unlink; # trigger ->DESTROY
+
+my $f = "$inboxdir/description";
+if (sysopen $fh, $f, O_CREAT|O_EXCL|O_WRONLY) {
+	print $fh "public inbox for $address[0]\n" or die "print($f): $!";
+	close $fh or die "close($f): $!";
+}
diff --git a/t/init.t b/t/init.t
index efa3314d..752e5af9 100644
--- a/t/init.t
+++ b/t/init.t
@@ -99,6 +99,9 @@ sub quiet_fail {
 	$err = '';
 	ok(run_script($cmd, $env, $rdr), 'initializes non-existent hierarchy');
 	ok(-d "$tmpdir/a/b/c/d", 'directory created');
+	is(PublicInbox::Inbox::try_cat("$tmpdir/a/b/c/d/description"),
+		"public inbox for abcd\@example.com\n", 'description set');
+
 	open my $fh, '>', "$tmpdir/d" or BAIL_OUT "open: $!";
 	close $fh;
 	$cmd = [ '-init', 'd-f-conflict', "$tmpdir/d/f/conflict",


* [PATCH 7/7] fetch: use manifest.js.gz for v1
  2021-09-12  7:47 [PATCH 0/7] new public-inbox-{clone,fetch} commands Eric Wong
                   ` (5 preceding siblings ...)
  2021-09-12  7:47 ` [PATCH 6/7] init: set a useful description Eric Wong
@ 2021-09-12  7:47 ` Eric Wong
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2021-09-12  7:47 UTC (permalink / raw)
  To: meta

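This is gentler on the remote HTTP server in the no-op case: an
unchanged manifest.js.gz yields a 304 Not Modified (via curl -z)
and no git fetch is spawned.  It will also allow client migrations
to some v2-ish format without forcing the client to redownload
everything.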
---
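The three manifest outcomes handled by do_fetch can be sketched as a
small classifier.  `classify_manifest_fetch` is a hypothetical name,
and this assumes curl runs with -f/--fail so that exit code 22 means
the server returned an HTTP error; like the patch, it treats that
as 404.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a manifest.js.gz download using the
# wait status of curl ($cerr, as in $?) and the downloaded file size.
sub classify_manifest_fetch {
	my ($cerr, $size) = @_;
	if ($cerr) {
		# curl exit code 22: HTTP error with --fail (assumed in use)
		return 404 if ($cerr >> 8) == 22;
		return; # any other failure: caller reports the error
	}
	return 304 if !$size; # nothing written: not modified (curl -z)
	200;
}

my @cases = ([22 << 8, 0], [0, 0], [0, 123], [7 << 8, 0]);
for my $c (@cases) {
	my $code = classify_manifest_fetch(@$c) // 'error';
	print "status=$c->[0] size=$c->[1] => $code\n";
}
```

Only 200 and 404 lead to git being run; 304 returns early, which is
the no-op case mentioned above.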
 lib/PublicInbox/Fetch.pm     | 147 +++++++++++++++++++++--------------
 lib/PublicInbox/LeiMirror.pm |   6 +-
 2 files changed, 90 insertions(+), 63 deletions(-)

diff --git a/lib/PublicInbox/Fetch.pm b/lib/PublicInbox/Fetch.pm
index d795731c..9613a582 100644
--- a/lib/PublicInbox/Fetch.pm
+++ b/lib/PublicInbox/Fetch.pm
@@ -35,49 +35,13 @@ sub remote_url ($$) {
 	my $fh = popen_rd($cmd, undef, { -C => $dir, 2 => $lei->{2} });
 	my $url = <$fh>;
 	close $fh or return;
-	chomp $url;
+	$url =~ s!/*\n!!s;
 	$url;
 }
 
-sub do_fetch {
-	my ($cls, $lei, $cd) = @_;
-	my $ibx_ver;
-	my $curl = PublicInbox::LeiCurl->new($lei) or return;
-	my $dir = PublicInbox::Admin::resolve_inboxdir($cd, \$ibx_ver);
-	if ($ibx_ver == 1) {
-		my $url = remote_url($lei, $dir) //
-			die "E: $dir missing remote.origin.url\n";
-		my $uri = URI->new($url);
-		my $torsocks = $curl->torsocks($lei, $uri);
-		my $opt = { -C => $dir };
-		my $cmd = [ @$torsocks, fetch_cmd($lei, $opt) ];
-		my $cerr = PublicInbox::LeiMirror::run_reap($lei, $cmd, $opt);
-		$lei->child_error($cerr, "@$cmd failed") if $cerr;
-		return;
-	}
-	# v2:
-	opendir my $dh, "$dir/git" or die "opendir $dir/git: $!";
-	my @epochs = sort { $b <=> $a } map { substr($_, 0, -4) + 0 }
-				grep(/\A[0-9]+\.git\z/, readdir($dh));
-	my ($git_url, $epoch);
-	for my $nr (@epochs) { # try newest epoch, first
-		my $edir = "$dir/git/$nr.git";
-		if (defined(my $url = remote_url($lei, $edir))) {
-			$git_url = $url;
-			$epoch = $nr;
-			last;
-		} else {
-			warn "W: $edir missing remote.origin.url\n";
-		}
-	}
-	$git_url or die "Unable to determine git URL\n";
-	my $inbox_url = $git_url;
-	$inbox_url =~ s!/git/$epoch(?:\.git)?/?\z!! or
-		$inbox_url =~ s!/$epoch(?:\.git)?/?\z!! or die <<EOM;
-Unable to infer inbox URL from <$git_url>
-EOM
-	$lei->qerr("# inbox URL: $inbox_url/");
-	my $muri = URI->new("$inbox_url/manifest.js.gz");
+sub do_manifest ($$$) {
+	my ($lei, $dir, $ibx_uri) = @_;
+	my $muri = URI->new("$ibx_uri/manifest.js.gz");
 	my $ft = File::Temp->new(TEMPLATE => 'manifest-XXXX',
 				UNLINK => 1, DIR => $dir);
 	my $fn = $ft->filename;
@@ -91,13 +55,16 @@ EOM
 		$lei->err($@) if $@;
 		push @opt, '-z', $mf if defined($m0);
 	}
-	my $curl_cmd = $curl->for_uri($lei, $muri, @opt);
+	my $curl_cmd = $lei->{curl}->for_uri($lei, $muri, @opt);
 	my $opt = {};
 	$opt->{$_} = $lei->{$_} for (0..2);
 	my $cerr = PublicInbox::LeiMirror::run_reap($lei, $curl_cmd, $opt);
-	return $lei->child_error($cerr, "@$curl_cmd failed") if $cerr;
-	return if !-s $ft; # 304 Not Modified via curl -z
-
+	if ($cerr) {
+		return [ 404 ] if ($cerr >> 8) == 22; # 404 Missing
+		$lei->child_error($cerr, "@$curl_cmd failed");
+		return;
+	}
+	return [ 304 ] if !-s $ft; # 304 Not Modified via curl -z
 	my $m1 = PublicInbox::LeiMirror::decode_manifest($ft, $fn, $muri);
 	my $mdiff = { %$m1 };
 
@@ -110,36 +77,96 @@ EOM
 		my $t1 = $cur->{modified} // next;
 		delete($mdiff->{$k}) if $f0 eq $f1 && $t0 == $t1;
 	}
-	my $ibx_uri = URI->new("$inbox_url/");
 	my ($path_pfx, $v1_bare, @v2_epochs) =
 		PublicInbox::LeiMirror::deduce_epochs($mdiff, $ibx_uri->path);
-	defined($v1_bare) and die <<EOM;
+	[ 200, $path_pfx, $v1_bare, \@v2_epochs, $muri, $ft, $mf ];
+}
+
+sub do_fetch {
+	my ($cls, $lei, $cd) = @_;
+	my $ibx_ver;
+	$lei->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+	my $dir = PublicInbox::Admin::resolve_inboxdir($cd, \$ibx_ver);
+	my ($ibx_uri, @git_dir, @epochs);
+	if ($ibx_ver == 1) {
+		my $url = remote_url($lei, $dir) //
+			die "E: $dir missing remote.origin.url\n";
+		$ibx_uri = URI->new($url);
+	} else { # v2:
+		opendir my $dh, "$dir/git" or die "opendir $dir/git: $!";
+		@epochs = sort { $b <=> $a } map { substr($_, 0, -4) + 0 }
+					grep(/\A[0-9]+\.git\z/, readdir($dh));
+		my ($git_url, $epoch);
+		for my $nr (@epochs) { # try newest epoch, first
+			my $edir = "$dir/git/$nr.git";
+			if (defined(my $url = remote_url($lei, $edir))) {
+				$git_url = $url;
+				$epoch = $nr;
+				last;
+			} else {
+				warn "W: $edir missing remote.origin.url\n";
+			}
+		}
+		$git_url or die "Unable to determine git URL\n";
+		my $inbox_url = $git_url;
+		$inbox_url =~ s!/git/$epoch(?:\.git)?/?\z!! or
+			$inbox_url =~ s!/$epoch(?:\.git)?/?\z!! or die <<EOM;
+Unable to infer inbox URL from <$git_url>
+EOM
+		$ibx_uri = URI->new($inbox_url);
+	}
+	$lei->qerr("# inbox URL: $ibx_uri/");
+	my $res = do_manifest($lei, $dir, $ibx_uri) or return;
+	my ($code, $path_pfx, $v1_bare, $v2_epochs, $muri, $ft, $mf) = @$res;
+	return if $code == 304;
+	if ($code == 404) {
+		# any pre-manifest.js.gz instances running? Just fetch all
+		# existing ones and unconditionally try cloning the next
+		$v2_epochs = [ map {;
+				"$dir/git/$_.git";
+				} @epochs ];
+		push @$v2_epochs, "$dir/git/".($epochs[0] + 1).'.git' if @epochs;
+	} else {
+		$code == 200 or die "BUG unexpected code $code\n";
+	}
+	if ($ibx_ver == 2) {
+		defined($v1_bare) and warn <<EOM;
 E: got v1 `$v1_bare' when expecting v2 epoch(s) in <$muri>, WTF?
 EOM
-	my @epoch_nr = sort { $a <=> $b }
-		map { my ($nr) = (m!/([0-9]+)\.git\z!g) } @v2_epochs;
-
+		@git_dir = map { "$dir/git/$_.git" } sort { $a <=> $b }
+			map { my ($nr) = (m!/([0-9]+)\.git\z!g) } @$v2_epochs;
+	} else {
+		$v1_bare eq $dir or warn "$v1_bare != $dir";
+		$git_dir[0] = $v1_bare // $dir;
+	}
 	# n.b. this expects all epochs are from the same host
-	my $torsocks = $curl->torsocks($lei, $muri);
-	for my $nr (@epoch_nr) {
-		my $dir = "$dir/git/$nr.git";
+	my $torsocks = $lei->{curl}->torsocks($lei, $muri);
+	for my $d (@git_dir) {
 		my $cmd;
-		my $opt = {};
-		if (-d $dir) {
-			$opt->{-C} = $dir;
+		my $opt = {}; # for spawn
+		if (-d $d) {
+			$opt->{-C} = $d;
 			$cmd = [ @$torsocks, fetch_cmd($lei, $opt) ];
 		} else {
 			my $e_uri = $ibx_uri->clone;
-			$e_uri->path($ibx_uri->path."git/$nr.git");
+			my ($epath) = ($d =~ m!/(git/[0-9]+\.git)\z!);
+			$e_uri->path($ibx_uri->path.$epath);
 			$cmd = [ @$torsocks,
 				PublicInbox::LeiMirror::clone_cmd($lei, $opt),
-				$$e_uri, $dir ];
+				$$e_uri, $d];
 		}
 		my $cerr = PublicInbox::LeiMirror::run_reap($lei, $cmd, $opt);
-		return $lei->child_error($cerr, "@$cmd failed") if $cerr;
+		# do not bail on clone failure if we didn't have a manifest
+		if ($cerr && ($code == 200 || -d $d)) {
+			$lei->child_error($cerr, "@$cmd failed");
+			return;
+		}
+	}
+	if ($ft) {
+		my $fn = $ft->filename;
+		rename($fn, $mf) or die "E: rename($fn, $mf): $!\n";
+		$ft->unlink_on_destroy(0);
 	}
-	rename($fn, $mf) or die "E: rename($fn, $mf): $!\n";
-	$ft->unlink_on_destroy(0);
 }
 
 1;
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index fe1cefe2..23813dcf 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -295,15 +295,15 @@ EOM
 			$uri->clone
 		} @v2_epochs;
 		clone_v2($self, \@v2_epochs);
-		my $fin = "$self->{dst}/manifest.js.gz";
-		rename($fn, $fin) or die "E: rename($fn, $fin): $!";
-		$ft->unlink_on_destroy(0);
 	} elsif (defined $v1_bare) {
 		clone_v1($self);
 	} else {
 		die "E: confused by <$uri>, possible matches:\n\t",
 			join(', ', sort keys %$m), "\n";
 	}
+	my $fin = "$self->{dst}/manifest.js.gz";
+	rename($fn, $fin) or die "E: rename($fn, $fin): $!";
+	$ft->unlink_on_destroy(0);
 }
 
 sub start_clone_url {


