* Restarting daemons on config file change @ 2021-07-19 20:03 Konstantin Ryabitsev 2021-07-19 20:49 ` Eric Wong 0 siblings, 1 reply; 12+ messages in thread From: Konstantin Ryabitsev @ 2021-07-19 20:03 UTC (permalink / raw) To: meta Hello: Something I stumbled on today is the need to have the -httpd and -nntpd daemons reread the config file after we've mirrored and initialized new inboxdirs. The situation is: - public-inbox-{httpd,nntpd} are running as systemd services as user "publicinbox" - the mirroring and initialization/indexing is done as user "mirror", so we can't send a HUP to the daemon processes The best I can think of is a systemd watcher service that automatically restarts the daemons when the config file is modified, but I wanted to check here first to see if perhaps I'm missing something simpler. -K ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Restarting daemons on config file change 2021-07-19 20:03 Restarting daemons on config file change Konstantin Ryabitsev @ 2021-07-19 20:49 ` Eric Wong 2021-07-20 8:58 ` [PATCH] httpd: fix SIGHUP by invalidating cache on reload Eric Wong 2021-07-20 17:00 ` Restarting daemons on config file change Konstantin Ryabitsev 0 siblings, 2 replies; 12+ messages in thread From: Eric Wong @ 2021-07-19 20:49 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > Hello: > > Something I stumbled on today is the need to have the -httpd and -nntpd > daemons reread the config file after we've mirrored and initialized new > inboxdirs. The situation is: Correct, config files are read once at startup. > - public-inbox-{httpd,nntpd} are running as systemd services as user > "publicinbox" > - the mirroring and initialization/indexing is done as user "mirror", so we > can't send a HUP to the daemon processes I seem to recall HUP having some trouble with -httpd (and less so with nntpd/imapd); or at least that's what -daemon(8) manpage alludes to... I usually just swap between "@foo" and "@bar" systemd units, but I restart more often for code changes than config file changes. > The best I can think of is a systemd watcher service that automatically > restarts the daemons when the config file is modified, but I wanted to check > here first to see if perhaps I'm missing something simpler. Yes, a systemd.path unit might be the way to go. A patch for examples/ would be appreciated if you go down that route :> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] httpd: fix SIGHUP by invalidating cache on reload 2021-07-19 20:49 ` Eric Wong @ 2021-07-20 8:58 ` Eric Wong 2021-07-20 17:00 ` Restarting daemons on config file change Konstantin Ryabitsev 1 sibling, 0 replies; 12+ messages in thread From: Eric Wong @ 2021-07-20 8:58 UTC (permalink / raw) To: meta; +Cc: Konstantin Ryabitsev Eric Wong <e@80x24.org> wrote: > I seem to recall HUP having some trouble with -httpd (and less > so with nntpd/imapd); or at least that's what -daemon(8) manpage > alludes to... -------8<------- Subject: [PATCH] httpd: fix SIGHUP by invalidating cache on reload Since we require separate PublicInbox::HTTPD instances for each listen socket address (in order to support {SERVER_<NAME|PORT>} for PSGI env), the old cache needed to be invalidated on rare app refreshes. SIGHUP has always been broken in -httpd (but not -imapd or -nntpd) due to this cache. Update the daemon documentation and 5.10.1-ize some bits while we're in the area. --- Documentation/public-inbox-daemon.pod | 8 +++++--- script/public-inbox-httpd | 6 ++++-- t/httpd.t | 25 +++++++++++++++++++++++++ 3 files changed, 34 insertions(+), 5 deletions(-) diff --git a/Documentation/public-inbox-daemon.pod b/Documentation/public-inbox-daemon.pod index ec210efa..f77fc3a9 100644 --- a/Documentation/public-inbox-daemon.pod +++ b/Documentation/public-inbox-daemon.pod @@ -5,13 +5,14 @@ public-inbox-daemon - common usage for public-inbox network daemons =head1 SYNOPSIS public-inbox-httpd + public-inbox-imapd public-inbox-nntpd =head1 DESCRIPTION This manual describes common options and behavior for public-inbox network daemons. Network daemons for public-inbox -provide read-only NNTP and HTTP access to public-inboxes. Write +provide read-only NNTP, IMAP and HTTP access to public-inboxes. Write access to a public-inbox will never be required to run these. These daemons are implemented with a common core using @@ -103,7 +104,7 @@ See L</UPGRADING> below. =item SIGHUP Reload config files associated with the process. -(FIXME: not tested for -httpd, yet) +(Note: broken for L<public-inbox-httpd(1)> only in E<lt>= 1.6) =item SIGTTIN @@ -188,4 +189,5 @@ License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt> =head1 SEE ALSO -L<public-inbox-httpd(1)>, L<public-inbox-nntpd(1)> +L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, +L<public-inbox-nntpd(1)> diff --git a/script/public-inbox-httpd b/script/public-inbox-httpd index b31b896d..7b0ec560 100755 --- a/script/public-inbox-httpd +++ b/script/public-inbox-httpd @@ -4,6 +4,7 @@ # # Standalone HTTP server for public-inbox. use strict; +use v5.10.1; use PublicInbox::Daemon; BEGIN { for (qw(Plack::Builder Plack::Util)) { @@ -14,7 +15,7 @@ BEGIN { require PublicInbox::HTTPD; } -my %httpds; +my %httpds; # per-listen-FD mapping for HTTPD->{env}->{SERVER_<NAME|PORT>} my $app; my $refresh = sub { if (@ARGV) { @@ -37,12 +38,13 @@ my $refresh = sub { sub { $www->call(@_) }; }; } + %httpds = (); # invalidate cache }; PublicInbox::Daemon::run('0.0.0.0:8080', $refresh, sub ($$$) { # post_accept my ($client, $addr, $srv) = @_; my $fd = fileno($srv); - my $h = $httpds{$fd} ||= PublicInbox::HTTPD->new($srv, $app); + my $h = $httpds{$fd} //= PublicInbox::HTTPD->new($srv, $app); PublicInbox::HTTP->new($client, $addr, $h), }); diff --git a/t/httpd.t b/t/httpd.t index 0354a733..849f61bb 100644 --- a/t/httpd.t +++ b/t/httpd.t @@ -32,6 +32,10 @@ Date: Thu, 01 Jan 1970 06:06:06 +0000 nntp EOF }; + my $i2 = create_inbox 'test-2', sub { + my ($im, $ibx) = @_; + $im->add(eml_load('t/plack-qp.eml')) or xbail '->add'; + }; local $ENV{HOME} = $home; my $cmd = [ '-init', $group, $inboxdir, 'http://example.com/', $addr ]; ok(run_script($cmd), 'init ran properly'); @@ -64,6 +68,27 @@ EOF "$http_pfx/$group", "$tmpdir/dumb.git"), 0, 'clone successful'); + # test config reload + my $cfg = "$home/.public-inbox/config"; + open my $fh, '>>', $cfg or xbail "open: $!"; + print $fh <<EOM or xbail "print $!"; +[publicinbox "test-2"] + inboxdir = $i2->{inboxdir} + address = test-2\@example.com + url = https://example.com/test-2 +EOM + close $fh or xbail "close $!"; + $td->kill('HUP') or BAIL_OUT "failed to kill -httpd: $!"; + tick; # wait for HUP to take effect + my $buf = do { + my $c2 = tcp_connect($sock); + $c2->write("GET /test-2/qp\@example.com/raw HTTP/1.0\r\n\r\n") + or xbail "c2 write: $!"; + local $/; + <$c2> + }; + like($buf, qr!\AHTTP/1\.0 200\b!s, 'got 200 after reload for test-2'); + ok($td->kill, 'killed httpd'); $td->join; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: Restarting daemons on config file change 2021-07-19 20:49 ` Eric Wong 2021-07-20 8:58 ` [PATCH] httpd: fix SIGHUP by invalidating cache on reload Eric Wong @ 2021-07-20 17:00 ` Konstantin Ryabitsev 2021-07-20 20:34 ` Eric Wong 1 sibling, 1 reply; 12+ messages in thread From: Konstantin Ryabitsev @ 2021-07-20 17:00 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Mon, Jul 19, 2021 at 08:49:35PM +0000, Eric Wong wrote: > > The best I can think of is a systemd watcher service that automatically > > restarts the daemons when the config file is modified, but I wanted to check > > here first to see if perhaps I'm missing something simpler. > > Yes, a systemd.path unit might be the way to go. A patch for > examples/ would be appreciated if you go down that route :> Okay, let me see what I can come up with. Looks like the best course of action is to: 1. use a global blocking lock 2. copy the config file to a new location 3. make the necessary changes to the temporary config file 4. move the temp config file to overwrite the current config 5. release the lock This would make sure that there is are no races and that all changes show up at once and don't repeatedly trigger the path watcher. -K ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Restarting daemons on config file change 2021-07-20 17:00 ` Restarting daemons on config file change Konstantin Ryabitsev @ 2021-07-20 20:34 ` Eric Wong 2021-07-20 20:49 ` Konstantin Ryabitsev 0 siblings, 1 reply; 12+ messages in thread From: Eric Wong @ 2021-07-20 20:34 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > On Mon, Jul 19, 2021 at 08:49:35PM +0000, Eric Wong wrote: > > > The best I can think of is a systemd watcher service that automatically > > > restarts the daemons when the config file is modified, but I wanted to check > > > here first to see if perhaps I'm missing something simpler. > > > > Yes, a systemd.path unit might be the way to go. A patch for > > examples/ would be appreciated if you go down that route :> > > Okay, let me see what I can come up with. Looks like the best course of action > is to: > > 1. use a global blocking lock > 2. copy the config file to a new location > 3. make the necessary changes to the temporary config file > 4. move the temp config file to overwrite the current config > 5. release the lock public-inbox-init already does all of that. It doesn't set all possible keys, yet (e.g. infourl, maybe some others). > This would make sure that there is are no races and that all changes show up > at once and don't repeatedly trigger the path watcher. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Restarting daemons on config file change 2021-07-20 20:34 ` Eric Wong @ 2021-07-20 20:49 ` Konstantin Ryabitsev 2021-07-20 21:07 ` Eric Wong 0 siblings, 1 reply; 12+ messages in thread From: Konstantin Ryabitsev @ 2021-07-20 20:49 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Tue, Jul 20, 2021 at 08:34:33PM +0000, Eric Wong wrote: > > Okay, let me see what I can come up with. Looks like the best course of action > > is to: > > > > 1. use a global blocking lock > > 2. copy the config file to a new location > > 3. make the necessary changes to the temporary config file > > 4. move the temp config file to overwrite the current config > > 5. release the lock > > public-inbox-init already does all of that. It doesn't set > all possible keys, yet (e.g. infourl, maybe some others). I figured as much, but we do want to set extra keys *and* write the config in a certain order (e.g. prioritizing some sources over others by listing them first). I currently do this via a list-id globbing match (e.g. --listid-priority=*.linux.dev,*.kernel.org,*). Apropos, question -- will public-inbox-init do anything to the config file if it finds that all the settings are already in place, or will it just accept them? -K ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Restarting daemons on config file change 2021-07-20 20:49 ` Konstantin Ryabitsev @ 2021-07-20 21:07 ` Eric Wong 2021-07-20 21:18 ` Konstantin Ryabitsev 0 siblings, 1 reply; 12+ messages in thread From: Eric Wong @ 2021-07-20 21:07 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > On Tue, Jul 20, 2021 at 08:34:33PM +0000, Eric Wong wrote: > > > Okay, let me see what I can come up with. Looks like the best course of action > > > is to: > > > > > > 1. use a global blocking lock > > > 2. copy the config file to a new location > > > 3. make the necessary changes to the temporary config file > > > 4. move the temp config file to overwrite the current config > > > 5. release the lock > > > > public-inbox-init already does all of that. It doesn't set > > all possible keys, yet (e.g. infourl, maybe some others). > > I figured as much, but we do want to set extra keys *and* write the config in > a certain order (e.g. prioritizing some sources over others by listing them > first). I currently do this via a list-id globbing match (e.g. > --listid-priority=*.linux.dev,*.kernel.org,*). Ah, that gets tricky, since git-config doesn't seem to provide a mechanism for ordering. Not sure, perhaps serializing -init invocations somehow is the way to go? I'm not sure how to expose that ordering, though... > Apropos, question -- will public-inbox-init do anything to the config file if > it finds that all the settings are already in place, or will it just accept > them? It only checks conflicting addresses, at the moment. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Restarting daemons on config file change 2021-07-20 21:07 ` Eric Wong @ 2021-07-20 21:18 ` Konstantin Ryabitsev 2021-07-21 14:05 ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Konstantin Ryabitsev @ 2021-07-20 21:18 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Tue, Jul 20, 2021 at 09:07:24PM +0000, Eric Wong wrote: > > I figured as much, but we do want to set extra keys *and* write the config in > > a certain order (e.g. prioritizing some sources over others by listing them > > first). I currently do this via a list-id globbing match (e.g. > > --listid-priority=*.linux.dev,*.kernel.org,*). > > Ah, that gets tricky, since git-config doesn't seem to provide a > mechanism for ordering. Not sure, perhaps serializing -init > invocations somehow is the way to go? I'm not sure how to > expose that ordering, though... Serializing wouldn't necessarily help, as a list may be added to an existing archive collection at some later point, and we don't necessarily want it at the bottom. Another thought would be to give a "rank" ("weight"/"priority") configuration option that you can check before falling back to the listed order. E.g.: [publicinbox "dkim-mangling"] listid=dkim-mangling.list.example.org rank=10 [publicinbox "dkim-respecting"] listid=dkim-respecting.list.example.net rank=1 This way higher-ranking entries will win when returning aggregating threads, so we're less likely to see results from known dkim-mangling sources. -K ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] extsearch: support publicinbox.*.boost parameter 2021-07-20 21:18 ` Konstantin Ryabitsev @ 2021-07-21 14:05 ` Eric Wong 2021-07-21 14:05 ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong 2021-07-21 15:32 ` [PATCH 0/2] things to make mirroring easier Konstantin Ryabitsev 2 siblings, 0 replies; 12+ messages in thread From: Eric Wong @ 2021-07-21 14:05 UTC (permalink / raw) To: meta; +Cc: Konstantin Ryabitsev This behaves identically the lei external "boost" parameter in prioritizing raw messages for extindex. Relying exclusively on the config file order doesn't work well for mirrors since it's impossible to guarantee config file ordering via grokmirror hooks. Config file ordering remains the default if boost is unconfigured, or in case of ties. Note: I chose the name "boost" rather than "priority" or "rank" since I always get confused by whether higher or lower numbers take precedence when it comes to kernel scheduling. "weight" is also a part of Xapian API terminology, which we currently do not expose to configuration (but may in the future). --- Documentation/public-inbox-config.pod | 8 +++++ lib/PublicInbox/Config.pm | 2 +- lib/PublicInbox/ExtSearchIdx.pm | 43 +++++++++++++++++---------- t/extsearch.t | 32 ++++++++++++++++++++ 4 files changed, 69 insertions(+), 16 deletions(-) diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod index 05d9ca62..5b86ef6c 100644 --- a/Documentation/public-inbox-config.pod +++ b/Documentation/public-inbox-config.pod @@ -124,6 +124,14 @@ allow for searching for phrases using quoted text. Default: C<full> +=item publicinbox.<name>.boost + +Control indexing order for L<public-inbox-extindex(1)>, with ties +broken by config file order. This only affects indexing and does +not affect messages which are already indexed. + +Default: C<0> + =item publicinbox.<name>.indexSequentialShard See L<public-inbox-index(1)/publicInbox.indexSequentialShard> diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index 8e46328d..7aa1f6c8 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -429,7 +429,7 @@ sub _fill_ibx { $ibx->{$k} = $v if defined $v; } for my $k (qw(filter inboxdir newsgroup replyto httpbackendmax feedmax - indexlevel indexsequentialshard)) { + indexlevel indexsequentialshard boost)) { my $v = get_1($self, $pfx, $k) // next; $ibx->{$k} = $v; } diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm index 0e27bba6..357312b8 100644 --- a/lib/PublicInbox/ExtSearchIdx.pm +++ b/lib/PublicInbox/ExtSearchIdx.pm @@ -44,7 +44,7 @@ sub new { topdir => $dir, creat => $opt->{creat}, ibx_map => {}, # (newsgroup//inboxdir) => $ibx - ibx_list => [], + ibx_cfg => [], # by config section order indexlevel => $l, transact_bytes => 0, total_bytes => 0, @@ -62,7 +62,8 @@ sub new { sub attach_inbox { my ($self, $ibx) = @_; $self->{ibx_map}->{$ibx->eidx_key} //= do { - push @{$self->{ibx_list}}, $ibx; + delete $self->{-ibx_ary}; # invalidate cache + push @{$self->{ibx_cfg}}, $ibx; $ibx; } } @@ -388,7 +389,7 @@ sub _ibx_for ($$$) { my ($self, $sync, $smsg) = @_; my $ibx_id = delete($smsg->{ibx_id}) // die '{ibx_id} unset'; my $pos = $sync->{id2pos}->{$ibx_id} // die "$ibx_id no pos"; - $self->{ibx_list}->[$pos] // die "BUG: ibx for $smsg->{blob} not mapped" + $self->{-ibx_ary}->[$pos] // die "BUG: ibx for $smsg->{blob} not mapped" } sub _fd_constrained ($) { @@ -402,7 +403,7 @@ sub _fd_constrained ($) { chomp($soft = `sh -c 'ulimit -n'`); } if (defined($soft)) { - my $want = scalar(@{$self->{ibx_list}}) + 64; # estimate + my $want = scalar(@{$self->{-ibx_ary}}) + 64; # estimate my $ret = $want > $soft; if ($ret) { warn <<EOF; @@ -524,10 +525,10 @@ BUG? #$docid $smsg->{blob} is not referenced by inboxes during reindex return; } - # we sort {xr3r} in the reverse order of {ibx_list} so we can + # we sort {xr3r} in the reverse order of ibx_sorted so we can # hit the common case in _reindex_finalize without rereading # from git (or holding multiple messages in memory). - my $id2pos = $sync->{id2pos}; # index in {ibx_list} + my $id2pos = $sync->{id2pos}; # index in ibx_sorted @$xr3 = sort { $id2pos->{$b->[0]} <=> $id2pos->{$a->[0]} || @@ -621,6 +622,17 @@ EOF undef; } +sub ibx_sorted ($) { + my ($self) = @_; + $self->{-ibx_ary} //= do { + # highest boost first, stable for config-ordering tiebreaker + use sort 'stable'; + [ sort { + ($b->{boost} // 0) <=> ($a->{boost} // 0) + } @{$self->{ibx_cfg}} ]; + } +} + sub eidxq_process ($$) { # for reindexing my ($self, $sync) = @_; @@ -638,7 +650,7 @@ sub eidxq_process ($$) { # for reindexing $sync->{id2pos} //= do { my %id2pos; my $pos = 0; - $id2pos{$_->{-ibx_id}} = $pos++ for @{$self->{ibx_list}}; + $id2pos{$_->{-ibx_id}} = $pos++ for (@{ibx_sorted($self)}); \%id2pos; }; my ($del, $iter); @@ -829,7 +841,7 @@ sub eidx_reindex { warn "E: aborting --reindex\n"; return; } - for my $ibx (@{$self->{ibx_list}}) { + for my $ibx (@{ibx_sorted($self)}) { _reindex_inbox($self, $sync, $ibx); last if $sync->{quit}; } @@ -959,7 +971,7 @@ sub eidx_sync { # main entry point local $SIG{QUIT} = $quit; local $SIG{INT} = $quit; local $SIG{TERM} = $quit; - for my $ibx (@{$self->{ibx_list}}) { + for my $ibx (@{ibx_sorted($self)}) { $ibx->{-ibx_id} //= $self->{oidx}->ibx_id($ibx->eidx_key); } if (delete($opt->{dedupe})) { @@ -973,7 +985,7 @@ sub eidx_sync { # main entry point # don't use $_ here, it'll get clobbered by reindex_checkpoint if ($opt->{scan} // 1) { - for my $ibx (@{$self->{ibx_list}}) { + for my $ibx (@{ibx_sorted($self)}) { last if $sync->{quit}; sync_inbox($self, $sync, $ibx); } @@ -1115,7 +1127,7 @@ sub idx_init { # similar to V2Writable } undef $dh; } - for my $ibx (@{$self->{ibx_list}}) { + for my $ibx (@{ibx_sorted($self)}) { # create symlinks for multi-pack-index $git_midx += symlink_packs($ibx, $pd); # add new lines to our alternates file @@ -1180,7 +1192,8 @@ sub eidx_reload { # -extindex --watch SIGHUP handler my $pr = $self->{-watch_sync}->{-opt}->{-progress}; $pr->('reloading ...') if $pr; delete $self->{-resync_queue}; - @{$self->{ibx_list}} = (); + delete $self->{-ibx_ary}; + $self->{ibx_cfg} = []; %{$self->{ibx_map}} = (); delete $self->{-watch_sync}->{id2pos}; my $cfg = PublicInbox::Config->new; @@ -1194,7 +1207,7 @@ sub eidx_reload { # -extindex --watch SIGHUP handler sub eidx_resync_start ($) { # -extindex --watch SIGUSR1 handler my ($self) = @_; - $self->{-resync_queue} //= [ @{$self->{ibx_list}} ]; + $self->{-resync_queue} //= [ @{ibx_sorted($self)} ]; PublicInbox::DS::requeue($self); # trigger our ->event_step } @@ -1225,9 +1238,9 @@ sub eidx_watch { # public-inbox-extindex --watch main loop require PublicInbox::Sigfd; my $idler = PublicInbox::InboxIdle->new($self->{cfg}); if (!$self->{cfg}) { - $idler->watch_inbox($_) for @{$self->{ibx_list}}; + $idler->watch_inbox($_) for (@{ibx_sorted($self)}); } - $_->subscribe_unlock(__PACKAGE__, $self) for @{$self->{ibx_list}}; + $_->subscribe_unlock(__PACKAGE__, $self) for (@{ibx_sorted($self)}); my $pr = $opt->{-progress}; $pr->("performing initial scan ...\n") if $pr; my $sync = eidx_sync($self, $opt); # initial sync diff --git a/t/extsearch.t b/t/extsearch.t index 5f0cd866..46a6f2ec 100644 --- a/t/extsearch.t +++ b/t/extsearch.t @@ -60,6 +60,38 @@ ok(run_script([qw(-extindex --all), "$home/extindex"]), 'extindex init'); ok($es->has_threadid, '->has_threadid'); } +if ('with boost') { + xsys([qw(git config publicinbox.v1test.boost), 10], + { GIT_CONFIG => $cfg_path }); + ok(run_script([qw(-extindex --all), "$home/extindex-b"]), + 'extindex init with boost'); + my $es = PublicInbox::ExtSearch->new("$home/extindex-b"); + my $smsg = $es->over->get_art(1); + ok($smsg, 'got first article'); + my $xref3 = $es->over->get_xref3($smsg->{num}); + my @v1 = grep(/\Av1/, @$xref3); + my @v2 = grep(/\Av2/, @$xref3); + like($v1[0], qr/\Av1\.example.*?\b\Q$smsg->{blob}\E\b/, + 'smsg->{blob} respected boost'); + is(scalar(@$xref3), 2, 'only to entries'); + undef $es; + + xsys([qw(git config publicinbox.v2test.boost), 20], + { GIT_CONFIG => $cfg_path }); + ok(run_script([qw(-extindex --all --reindex), "$home/extindex-b"]), + 'extindex --reindex with altered boost'); + + $es = PublicInbox::ExtSearch->new("$home/extindex-b"); + $smsg = $es->over->get_art(1); + like($v2[0], qr/\Av2\.example.*?\b\Q$smsg->{blob}\E\b/, + 'smsg->{blob} respects boost after reindex'); + + xsys([qw(git config --unset publicinbox.v1test.boost)], + { GIT_CONFIG => $cfg_path }); + xsys([qw(git config --unset publicinbox.v2test.boost)], + { GIT_CONFIG => $cfg_path }); +} + { # TODO: -extindex should write this to config open $fh, '>>', $cfg_path or BAIL_OUT $!; print $fh <<EOF or BAIL_OUT $!; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE 2021-07-20 21:18 ` Konstantin Ryabitsev 2021-07-21 14:05 ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong @ 2021-07-21 14:05 ` Eric Wong 2021-07-25 10:40 ` [PATCH 3/2] init: support git <2.30 for "-c KEY=VALUE" args Eric Wong 2021-07-21 15:32 ` [PATCH 0/2] things to make mirroring easier Konstantin Ryabitsev 2 siblings, 1 reply; 12+ messages in thread From: Eric Wong @ 2021-07-21 14:05 UTC (permalink / raw) To: meta; +Cc: Konstantin Ryabitsev This won't blindly append identical key=values, but allows specifying multiple, different key=value pairs as long as the values are different. --- Documentation/public-inbox-init.pod | 7 ++++++ script/public-inbox-init | 36 ++++++++++++++++++++++++++++- t/init.t | 34 +++++++++++++++++++++++++++ 3 files changed, 76 insertions(+), 1 deletion(-) diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod index 62624f14..85c6c9e8 100644 --- a/Documentation/public-inbox-init.pod +++ b/Documentation/public-inbox-init.pod @@ -60,6 +60,13 @@ Available in public-inbox 1.6.0+. Default: none. +=item -c KEY=VALUE + +Allow setting arbitrary configs as C<publicinbox.$NAME.$KEY>. +This is idempotent for the same C<VALUE>, but allows setting +multiple values for keys such as C<publicinbox.$NAME.url> and +C<publicinbox.$NAME.watch>. + =item --skip-artnum This option allows archivists to publish incomplete archives diff --git a/script/public-inbox-init b/script/public-inbox-init index 335eb476..e22a0564 100755 --- a/script/public-inbox-init +++ b/script/public-inbox-init @@ -22,6 +22,7 @@ options: -V2 use scalable public-inbox-v2-format(5) -L LEVEL index level `basic', `medium', or `full' (default: full) --ng NEWSGROUP set NNTP newsgroup name + -c KEY=VALUE set additional config option(s) --skip-artnum=NUM NNTP article numbers to skip --skip-epoch=NUM epochs to skip (-V2 only) -j JOBS number of indexing jobs (-V2 only), (default: 4) @@ -35,6 +36,7 @@ PublicInbox::Admin::require_or_die('-base'); my ($version, $indexlevel, $skip_epoch, $skip_artnum, $jobs, $show_help); my $skip_docdata; my $ng = ''; +my @c_extra; my %opts = ( 'V|version=i' => \$version, 'L|index-level|indexlevel=s' => \$indexlevel, @@ -44,6 +46,7 @@ my %opts = ( 'ng|newsgroup=s' => \$ng, 'skip-docdata' => \$skip_docdata, 'help|h' => \$show_help, + 'c=s@' => \@c_extra, ); my $usage_cb = sub { print STDERR $help; @@ -51,13 +54,38 @@ my $usage_cb = sub { }; GetOptions(%opts) or $usage_cb->(); if ($show_help) { print $help; exit 0 }; -PublicInbox::Admin::indexlevel_ok_or_die($indexlevel) if defined $indexlevel; my $name = shift @ARGV or $usage_cb->(); my $inboxdir = shift @ARGV or $usage_cb->(); my $http_url = shift @ARGV or $usage_cb->(); my (@address) = @ARGV; @address or $usage_cb->(); +@c_extra = map { + my ($k, $v) = split(/=/, $_, 2); + defined($v) or die "Usage: -c KEY=VALUE\n"; + $k =~ /\A[a-z]+\z/i or die "$k contains invalid characters\n"; + $k = lc($k); + if ($k eq 'newsgroup') { + die "newsgroup already set ($ng)\n" if $ng ne ''; + $ng = $v; + (); + } elsif ($k eq 'address') { + push @address, $v; # for conflict checking + (); + } elsif ($k =~ /\A(?:inboxdir|mainrepo)\z/) { + die "$k not allowed via -c $_\n" + } elsif ($k eq 'indexlevel') { + defined($indexlevel) and + die "indexlevel already set ($indexlevel)\n"; + $indexlevel = $v; + (); + } else { + $_ + } +} @c_extra; + +PublicInbox::Admin::indexlevel_ok_or_die($indexlevel) if defined $indexlevel; + $ng =~ m![^A-Za-z0-9/_\.\-\~\@\+\=:]! and die "--newsgroup `$ng' is not valid\n"; ($ng =~ m!\A\.! || $ng =~ m!\.\z!) and @@ -201,6 +229,12 @@ if (defined($indexlevel)) { } run_die([@x, "$pfx.newsgroup", $ng]) if $ng ne ''; +for my $kv (@c_extra) { + my ($k, $v) = split(/=/, $kv, 2); + # --fixed-value for idempotent invocations + run_die([@x, qw(--replace-all --fixed-value), "$pfx.$k", $v, $v]); +} + # needed for git prior to v2.1.0 if (defined $perm) { chmod($perm & 07777, $pi_config_tmp) or diff --git a/t/init.t b/t/init.t index d46bef23..7382e05b 100644 --- a/t/init.t +++ b/t/init.t @@ -48,6 +48,40 @@ sub quiet_fail { is($? >> 8, 255, 'got expected exit code on lock failure'); ok(unlink("$cfgfile.lock"), '-init did not unlink lock on failure'); + + my @init_args = ('i', "$tmpdir/i", + qw(http://example.com/i i@example.com)); + $cmd = [ qw(-init -c .bogus=val), @init_args ]; + quiet_fail($cmd, 'invalid -c KEY=VALUE fails'); + $cmd = [ qw(-init -c .bogus=val), @init_args ]; + quiet_fail($cmd, '-c KEY-only fails'); + $cmd = [ qw(-init -c address=clist@example.com), @init_args ]; + quiet_fail($cmd, '-c address=CONFLICTING-VALUE fails'); + + $cmd = [ qw(-init -c no=problem -c no=problemo), @init_args ]; + ok(run_script($cmd), '-c KEY=VALUE runs'); + my $env = { GIT_CONFIG => "$ENV{PI_DIR}/config" }; + chomp(my @v = xqx([qw(git config --get-all publicinbox.i.no)], $env)); + is_deeply(\@v, [ qw(problem problemo) ]) or xbail(\@v); + + ok(run_script($cmd), '-c KEY=VALUE runs idempotently'); + chomp(my @v2 = xqx([qw(git config --get-all publicinbox.i.no)], $env)); + is_deeply(\@v, \@v2, 'nothing repeated') or xbail(\@v2); + + ok(run_script([@$cmd, '-c', 'no=more']), '-c KEY=VALUE addendum'); + chomp(@v = xqx([qw(git config --get-all publicinbox.i.no)], $env)); + is_deeply(\@v, [ qw(problem problemo more) ]) or xbail(\@v); + + + ok(run_script([@$cmd, '-c', 'no=problem']), '-c KEY=VALUE repeated'); + chomp(@v = xqx([qw(git config --get-all publicinbox.i.no)], $env)); + is_deeply(\@v, [ qw(problem problemo more) ]) or xbail(\@v); + + ok(run_script([@$cmd, '-c', 'address=j@example.com']), + '-c KEY=VALUE address'); + chomp(@v = xqx([qw(git config --get-all publicinbox.i.address)], $env)); + is_deeply(\@v, [ qw(i@example.com j@example.com) ], + 'extra address added via -c KEY=VALUE'); } { my $env = { PI_DIR => "$tmpdir/.public-inbox/" }; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/2] init: support git <2.30 for "-c KEY=VALUE" args 2021-07-21 14:05 ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong @ 2021-07-25 10:40 ` Eric Wong 0 siblings, 0 replies; 12+ messages in thread From: Eric Wong @ 2021-07-25 10:40 UTC (permalink / raw) To: meta; +Cc: Konstantin Ryabitsev It turns out `--fixed-value' is a relatively new git-config(1) feature in git 2.30+ (December 2020). So use the quotemeta perlop for now since it seems compatible-enough for POSIX ERE used by git. --- script/public-inbox-init | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/script/public-inbox-init b/script/public-inbox-init index e22a0564..6fac4d18 100755 --- a/script/public-inbox-init +++ b/script/public-inbox-init @@ -231,8 +231,11 @@ run_die([@x, "$pfx.newsgroup", $ng]) if $ng ne ''; for my $kv (@c_extra) { my ($k, $v) = split(/=/, $kv, 2); - # --fixed-value for idempotent invocations - run_die([@x, qw(--replace-all --fixed-value), "$pfx.$k", $v, $v]); + # git 2.30+ has --fixed-value for idempotent invocations, + # but that's too new to depend on in 2021. Perl quotemeta + # seems compatible enough for POSIX ERE which git uses + my $re = '^'.quotemeta($v).'$'; + run_die([@x, qw(--replace-all), "$pfx.$k", $v, $re]); } # needed for git prior to v2.1.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 0/2] things to make mirroring easier 2021-07-20 21:18 ` Konstantin Ryabitsev 2021-07-21 14:05 ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong 2021-07-21 14:05 ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong @ 2021-07-21 15:32 ` Konstantin Ryabitsev 2 siblings, 0 replies; 12+ messages in thread From: Konstantin Ryabitsev @ 2021-07-21 15:32 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Wed, Jul 21, 2021 at 02:05:48PM +0000, Eric Wong wrote: > publicinbox.<name>.boost is now supported (it should be obvious > higher numbers are handled first, because OS scheduler > "priority" always confuses me :x) > > And -init handles arbitrary "-c KEY=VALUE" things like > git-config and allows multi-value settings. Great stuff, thanks for both of these! -K ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-07-25 10:40 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-07-19 20:03 Restarting daemons on config file change Konstantin Ryabitsev 2021-07-19 20:49 ` Eric Wong 2021-07-20 8:58 ` [PATCH] httpd: fix SIGHUP by invalidating cache on reload Eric Wong 2021-07-20 17:00 ` Restarting daemons on config file change Konstantin Ryabitsev 2021-07-20 20:34 ` Eric Wong 2021-07-20 20:49 ` Konstantin Ryabitsev 2021-07-20 21:07 ` Eric Wong 2021-07-20 21:18 ` Konstantin Ryabitsev 2021-07-21 14:05 ` [PATCH 1/2] extsearch: support publicinbox.*.boost parameter Eric Wong 2021-07-21 14:05 ` [PATCH 2/2] init: allow arbitrary key-values via -c KEY=VALUE Eric Wong 2021-07-25 10:40 ` [PATCH 3/2] init: support git <2.30 for "-c KEY=VALUE" args Eric Wong 2021-07-21 15:32 ` [PATCH 0/2] things to make mirroring easier Konstantin Ryabitsev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).