unofficial mirror of meta@public-inbox.org
* [PATCH 0/5] clone improvements
@ 2023-03-13 12:00 Eric Wong
  2023-03-13 12:00 ` [PATCH 1/5] lei_mirror: describe why the {ibx} field is used Eric Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Eric Wong @ 2023-03-13 12:00 UTC (permalink / raw)
  To: meta

The more I use it, the more problems I find :x

Eric Wong (5):
  lei_mirror: describe why the {ibx} field is used
  lei_mirror: do not re-fetch inbox.config.example
  lei_mirror: do not fetch to read-only directories
  lei_mirror: handle UTF-8 from manifest.js.gz properly
  doc: clone: document --remote-manifest= option

 Documentation/public-inbox-clone.pod | 11 +++++++++++
 lib/PublicInbox/LeiMirror.pm         | 24 +++++++++++++++++-------
 t/clone-coderepo.t                   |  8 ++++++--
 3 files changed, 34 insertions(+), 9 deletions(-)


* [PATCH 1/5] lei_mirror: describe why the {ibx} field is used
  2023-03-13 12:00 [PATCH 0/5] clone improvements Eric Wong
@ 2023-03-13 12:00 ` Eric Wong
  2023-03-13 12:00 ` [PATCH 2/5] lei_mirror: do not re-fetch inbox.config.example Eric Wong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2023-03-13 12:00 UTC (permalink / raw)
  To: meta

I forgot why that hunk of code was needed :x, so maybe others
will find the comment helpful, too.
---
 lib/PublicInbox/LeiMirror.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index c148ebfd..d878f1e4 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -194,7 +194,7 @@ sub _write_inbox_config {
 		die "open($f): $!";
 	}
 	my $cfg = PublicInbox::Config->git_config_dump($f, $self->{lei}->{2});
-	my $ibx = $self->{ibx} = {};
+	my $ibx = $self->{ibx} = {}; # for indexing
 	for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {
 		for (qw(address newsgroup nntpmirror)) {
 			$ibx->{$_} = $cfg->{"$sec.$_"};


* [PATCH 2/5] lei_mirror: do not re-fetch inbox.config.example
  2023-03-13 12:00 [PATCH 0/5] clone improvements Eric Wong
  2023-03-13 12:00 ` [PATCH 1/5] lei_mirror: describe why the {ibx} field is used Eric Wong
@ 2023-03-13 12:00 ` Eric Wong
  2023-03-13 12:00 ` [PATCH 3/5] lei_mirror: do not fetch to read-only directories Eric Wong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2023-03-13 12:00 UTC (permalink / raw)
  To: meta

It's a significant source of latency for incremental updates at
the moment, and not really needed since it's just an example.
---
 lib/PublicInbox/LeiMirror.pm | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index d878f1e4..967a6422 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -620,7 +620,8 @@ sub clone_v1 {
 						\&run_puh, $self, $fini));
 	}
 	if (!$self->{-is_epoch} && $lei->{opt}->{'inbox-config'} =~
-				/\A(?:always|v1)\z/s) {
+				/\A(?:always|v1)\z/s &&
+			!-f "$dst/inbox.config.example") {
 		_get_txt_start($self, '_/text/config/raw', $fini);
 	}
 
@@ -923,8 +924,10 @@ failed to extract epoch number from $src
 
 	$self->{dry_run} or File::Path::mkpath($dst);
 
-	$lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s and
+	if ($lei->{opt}->{'inbox-config'} =~ /\A(?:always|v2)\z/s &&
+			!-f "$dst/inbox.config.example") {
 		_get_txt_start($task, '_/text/config/raw', $fini);
+	}
 
 	defined($desc) ? ($task->{'txt.description'} = $desc) :
 		_get_txt_start($task, 'description', $fini);

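The condition added in both hunks above can be read as a small predicate. The following is a minimal sketch, not code from LeiMirror.pm: the helper name want_config_example and its arguments are hypothetical, and it only assumes what the diff shows, namely that the example config is fetched when --inbox-config= is "always" or matches the inbox version, and only if inbox.config.example is not already present in the destination.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): decide whether the
# example config still needs to be fetched.
#   $mode - value of --inbox-config= (e.g. 'always', 'v1', 'v2', 'never')
#   $ver  - inbox version being cloned ('v1' or 'v2')
#   $dst  - destination directory of the clone
sub want_config_example {
	my ($mode, $ver, $dst) = @_;
	# same shape as the regexps in the patch: "always" or the version
	return 0 unless $mode =~ /\A(?:always|\Q$ver\E)\z/s;
	# skip the network round trip if a previous run already saved it
	return (-f "$dst/inbox.config.example") ? 0 : 1;
}
```

With this shape, an incremental update of an existing clone returns 0 without any HTTP request, which is where the latency saving described in the commit message would come from.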

* [PATCH 3/5] lei_mirror: do not fetch to read-only directories
  2023-03-13 12:00 [PATCH 0/5] clone improvements Eric Wong
  2023-03-13 12:00 ` [PATCH 1/5] lei_mirror: describe why the {ibx} field is used Eric Wong
  2023-03-13 12:00 ` [PATCH 2/5] lei_mirror: do not re-fetch inbox.config.example Eric Wong
@ 2023-03-13 12:00 ` Eric Wong
  2023-03-13 12:00 ` [PATCH 4/5] lei_mirror: handle UTF-8 from manifest.js.gz properly Eric Wong
  2023-03-13 12:00 ` [PATCH 5/5] doc: clone: document --remote-manifest= option Eric Wong
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2023-03-13 12:00 UTC (permalink / raw)
  To: meta

As with public-inbox-fetch, we shouldn't waste time fetching
into read-only directories, since --epoch= will make unwanted
epoch directories read-only placeholders.
---
 lib/PublicInbox/LeiMirror.pm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 967a6422..3ec8170f 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -593,8 +593,15 @@ sub clone_v1 {
 		die "$uri is a v1 inbox, --epoch is not supported\n";
 	$self->{-torsocks} //= $curl->torsocks($lei, $uri) or return;
 	my $dst = $self->{cur_dst} // $self->{dst};
-	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
 	my $resume = -d $dst;
+	if ($resume) { # respect read-only cloned w/ --epoch=
+		my @st = stat(_); # for root
+		if (!-w _ || !($st[2] & 0222)) {
+			warn "# skipping $dst, not writable\n";
+			return;
+		}
+	}
+	my $fini = PublicInbox::OnDestroy->new($$, \&v1_done, $self);
 	if (my $fgrp = forkgroup_prep($self, $uri)) {
 		$fgrp->{-fini} = $fini;
 		if ($resume) {

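The writability test in the hunk above combines two checks: the -w file test and the write bits of the mode returned by stat. A minimal standalone sketch follows; the helper name dir_is_writable is hypothetical. The "# for root" comment in the patch presumably refers to the fact that -w normally succeeds for the superuser regardless of permission bits, so the mode must be inspected as well.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): treat a directory as
# writable only if the -w test passes AND at least one write bit is set.
sub dir_is_writable {
	my ($dst) = @_;
	# stat() returns an empty list on failure (e.g. missing directory)
	my @st = stat($dst) or return 0;
	# "_" reuses the result of the stat() call above; $st[2] is the mode
	return (-w _ && ($st[2] & 0222)) ? 1 : 0;
}
```

A directory with mode 0555 yields 0 here for both ordinary users (the -w test fails) and root (no write bit is set), which matches the intent of skipping read-only placeholder epochs.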

* [PATCH 4/5] lei_mirror: handle UTF-8 from manifest.js.gz properly
  2023-03-13 12:00 [PATCH 0/5] clone improvements Eric Wong
                   ` (2 preceding siblings ...)
  2023-03-13 12:00 ` [PATCH 3/5] lei_mirror: do not fetch to read-only directories Eric Wong
@ 2023-03-13 12:00 ` Eric Wong
  2023-03-13 12:00 ` [PATCH 5/5] doc: clone: document --remote-manifest= option Eric Wong
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2023-03-13 12:00 UTC (permalink / raw)
  To: meta

This should ensure we display the "git config gitweb.owner
$OWNER" command invocation properly and also ensures we set the
description properly without triggering wide character warnings.

Also tested with a smallish iproute2 repo
(/pub/scm/linux/kernel/git/toke/iproute2.git) using my mirror:

  public-inbox-clone --remote-manifest=pub/manifest.js.gz \
    --include='*/toke/iproute2.git' --inbox-config=never \
    https://80x24.org/lore $DST

Anyways, I'm fairly certain this change and its tests are
correct; but I still struggle to understand Perl's approach to
Unicode and its interactions with various JSON implementations.

Fixes: 0830817c132cb105 ("lei_mirror: show non-ASCII owner properly w/ --verbose")
---
 lib/PublicInbox/LeiMirror.pm | 6 +++---
 t/clone-coderepo.t           | 8 ++++++--
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
index 3ec8170f..18932cf4 100644
--- a/lib/PublicInbox/LeiMirror.pm
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -259,8 +259,7 @@ sub run_reap {
 sub start_cmd {
 	my ($self, $cmd, $opt, $fini) = @_;
 	do_reap($self);
-	utf8::decode(my $msg = "# @$cmd");
-	$self->{lei}->qerr($msg);
+	$self->{lei}->qerr("# @$cmd");
 	return if $self->{dry_run};
 	$LIVE->{spawn($cmd, undef, $opt)} = [ \&reap_cmd, $self, $cmd, $fini ]
 }
@@ -633,7 +632,7 @@ sub clone_v1 {
 	}
 
 	my $d = $self->{-ent} ? $self->{-ent}->{description} : undef;
-	$self->{'txt.description'} = $d if defined $d;
+	utf8::encode($self->{'txt.description'} = $d) if defined $d;
 	(!defined($d) && !$end) and
 		_get_txt_start($self, 'description', $fini);
 
@@ -823,6 +822,7 @@ sub update_ent {
 	$new = $self->{-ent}->{owner} // return;
 	$cur = $self->{-local_manifest}->{$key}->{owner} // "\0";
 	return if $cur eq $new;
+	utf8::encode($new); # to octets
 	my $cmd = [ qw(git config -f), "$dst/config", 'gitweb.owner', $new ];
 	start_cmd($self, $cmd, { 2 => $self->{lei}->{2} });
 }
diff --git a/t/clone-coderepo.t b/t/clone-coderepo.t
index 1f33a6d7..3a5997c9 100644
--- a/t/clone-coderepo.t
+++ b/t/clone-coderepo.t
@@ -63,11 +63,13 @@ EOM
 	my $env = { TEST_DOCROOT => "$tmpdir/src", PI_CONFIG => $pi_config };
 	$td = start_script($cmd, $env, { 3 => $tcp });
 	my $fp = sha1_hex(my $refs = xqx([@git, 'show-ref']));
+	my $alice = "\x{100}lice";
 	$m = {
 		'/a.git' => {
 			fingerprint => $fp,
 			modified => 1,
-			owner => 'Alice',
+			owner => $alice,
+			description => "${alice}'s repo",
 		},
 		'/b.git' => {
 			fingerprint => $fp,
@@ -89,9 +91,11 @@ my $cmd = [qw(-clone --inbox-config=never --manifest= --project-list=
 	--objstore= -p -q), $url, "$tmpdir/dst", '--exit-code'];
 ok(run_script($cmd), 'clone');
 is(xqx([qw(git config gitweb.owner)], { GIT_DIR => "$tmpdir/dst/a.git" }),
-	"Alice\n", 'a.git gitweb.owner set');
+	"\xc4\x80lice\n", 'a.git gitweb.owner set');
 is(xqx([qw(git config gitweb.owner)], { GIT_DIR => "$tmpdir/dst/b.git" }),
 	"Bob\n", 'b.git gitweb.owner set');
+my $desc = PublicInbox::Git::try_cat("$tmpdir/dst/a.git/description");
+is($desc, "\xc4\x80lice's repo\n", 'description set');
 
 my $dst_pl = "$tmpdir/dst/projects.list";
 my $dst_mf = "$tmpdir/dst/manifest.js.gz";

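The test changes above rely on the difference between a Perl character string and its UTF-8 octets. A minimal sketch using only the core utf8::encode function and the same "\x{100}lice" value as the test:

```perl
use strict;
use warnings;

# "\x{100}" is U+0100 (LATIN CAPITAL LETTER A WITH MACRON): one character
my $owner = "\x{100}lice";
my $chars = length($owner);   # length in characters

utf8::encode($owner);         # in place: characters -> UTF-8 octets
my $octets = length($owner);  # length in bytes after encoding

# U+0100 encodes to the two bytes 0xC4 0x80, hence the expected
# "\xc4\x80lice" strings in t/clone-coderepo.t
printf "%d characters, %d octets\n", $chars, $octets;
```

Passing the encoded octets to an external command or writing them to a file avoids the "Wide character" warning that Perl emits when a string containing code points above 0xFF is written to a filehandle without an encoding layer.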

* [PATCH 5/5] doc: clone: document --remote-manifest= option
  2023-03-13 12:00 [PATCH 0/5] clone improvements Eric Wong
                   ` (3 preceding siblings ...)
  2023-03-13 12:00 ` [PATCH 4/5] lei_mirror: handle UTF-8 from manifest.js.gz properly Eric Wong
@ 2023-03-13 12:00 ` Eric Wong
  2023-03-13 23:34   ` Kyle Meyer
  4 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-03-13 12:00 UTC (permalink / raw)
  To: meta

---
 Documentation/public-inbox-clone.pod | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index af4e8e95..90359f5a 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -129,6 +129,17 @@ is specified where C<FILE> is an empty string (C<"">), then C<manifest.js.gz>
 
 This is a new option in public-inbox 2.0+
 
+=item --remote-manifest=URL|RELATIVE_PATH
+
+Use an alternate location for the remote manifest.js.gz file.
+This mah be specified as a full absolute URL (e.g
+C<--remote-manifest=https://80x24.org/lore/pub/manifest.js.gz>),
+or a pathname relative to the ROOT_URL (e.g
+C<--remote-manifest=pub/manifest.js.gz> when ROOT_URL is
+C<https://80x24.org/lore/>
+
+This is a new option in public-inbox 2.0+
+
 =item --project-list=FILE
 
 When cloning code repos from a manifest, generate a cgit-compatible

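The two documented forms of --remote-manifest= can be illustrated with a small resolver. This is a sketch under stated assumptions, not the logic public-inbox-clone actually uses: the helper name manifest_url is hypothetical, and it assumes a relative path is simply appended to ROOT_URL.

```perl
use strict;
use warnings;

# Hypothetical illustration only (not code from public-inbox):
# map a --remote-manifest= argument to a full URL.
sub manifest_url {
	my ($root_url, $arg) = @_;
	# anything starting with "scheme://" is taken as an absolute URL
	return $arg if $arg =~ m!\A[A-Za-z][A-Za-z0-9+.\-]*://!;
	$root_url =~ s!/*\z!/!; # exactly one trailing slash on ROOT_URL
	$arg =~ s!\A/+!!;       # no leading slash on the relative path
	return $root_url . $arg;
}
```

Under these assumptions, manifest_url('https://80x24.org/lore/', 'pub/manifest.js.gz') and the absolute form from the documentation both name https://80x24.org/lore/pub/manifest.js.gz.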

* Re: [PATCH 5/5] doc: clone: document --remote-manifest= option
  2023-03-13 12:00 ` [PATCH 5/5] doc: clone: document --remote-manifest= option Eric Wong
@ 2023-03-13 23:34   ` Kyle Meyer
  2023-03-14 20:50     ` [PATCH] doc: clone: fix typo in --remote-manifest= description Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Kyle Meyer @ 2023-03-13 23:34 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong writes:

> +Use an alternate location for the remote manifest.js.gz file.
> +This mah be specified as a full absolute URL (e.g

typo: may


* [PATCH] doc: clone: fix typo in --remote-manifest= description
  2023-03-13 23:34   ` Kyle Meyer
@ 2023-03-14 20:50     ` Eric Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2023-03-14 20:50 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: meta

Kyle Meyer <kyle@kyleam.com> wrote:
> Eric Wong writes:
> 
> > +Use an alternate location for the remote manifest.js.gz file.
> > +This mah be specified as a full absolute URL (e.g
> 
> typo: may

Thanks, pushed as b192069672eb5e9de3fdd07064a9c9231390d584

------8<-----
Subject: [PATCH] doc: clone: fix typo in --remote-manifest= description

Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v8j4ql8k.fsf@kyleam.com/
---
 Documentation/public-inbox-clone.pod | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/public-inbox-clone.pod b/Documentation/public-inbox-clone.pod
index 90359f5a..7def14ef 100644
--- a/Documentation/public-inbox-clone.pod
+++ b/Documentation/public-inbox-clone.pod
@@ -132,7 +132,7 @@ This is a new option in public-inbox 2.0+
 =item --remote-manifest=URL|RELATIVE_PATH
 
 Use an alternate location for the remote manifest.js.gz file.
-This mah be specified as a full absolute URL (e.g
+This may be specified as a full absolute URL (e.g
 C<--remote-manifest=https://80x24.org/lore/pub/manifest.js.gz>),
 or a pathname relative to the ROOT_URL (e.g
 C<--remote-manifest=pub/manifest.js.gz> when ROOT_URL is
