From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 25/26] xcpdb|compact: support --jobs/-j flag like gmake(1)
Date: Thu, 23 May 2019 09:37:03 +0000
Message-ID: <20190523093704.18367-26-e@80x24.org>
In-Reply-To: <20190523093704.18367-1-e@80x24.org>
The number of parallel jobs no longer has to be tied to the number
of partitions, in case we made a bad choice at initialization.
This doesn't affect reindexing, but the copying phase alone is
already resource-intensive.

Also, optimize away the extra process when there is only a single
job, since it can't be parallelized anyway.

The wording for the (v2) reindexing phase could be improved
later. I also plan to allow repartitioning of existing
Xapian DBs.
---
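A minimal standalone sketch of the scheduling idea behind the new
process_queue(): with one job, the callback runs in-process (no
fork, so a die() propagates directly, which is also why the
eval/die wrapper in cb_spawn was redundant); otherwise at most $max
children run at once. The name run_jobs is hypothetical. Unlike the
patch, which waits for a whole batch of children before starting
more, this sketch refills a slot as soon as any child exits.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Xapcmd.pm): run each job in
# @$queue through $cb with at most $max jobs in flight.
# Returns the number of jobs completed; dies on the first failure.
sub run_jobs {
    my ($queue, $cb, $max) = @_;
    my $done = 0;
    if ($max <= 1) {
        # serial fast path: no fork(2); $cb runs in this process,
        # so any die() inside it propagates to the caller directly
        while (defined(my $job = shift @$queue)) {
            $cb->($job);
            $done++;
        }
        return $done;
    }
    my %pids; # child pid => job
    while (@$queue || %pids) {
        # top up the pool until $max children are running
        while (keys(%pids) < $max && @$queue) {
            my $job = shift @$queue;
            defined(my $pid = fork) or die "fork: $!";
            if ($pid == 0) { # child
                $cb->($job);
                exit 0;
            }
            $pids{$pid} = $job;
        }
        # reap one child, then loop back to refill the free slot
        my $pid = waitpid(-1, 0);
        last if $pid < 0;
        my $job = delete $pids{$pid};
        die "job $job failed: $?\n" if $?;
        $done++;
    }
    return $done;
}
```

With $max set to 1, side effects of the callback stay visible to the
caller because no child process is involved.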
lib/PublicInbox/Xapcmd.pm | 44 +++++++++++++++++++++++++--------------
1 file changed, 28 insertions(+), 16 deletions(-)
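For illustration, here is how the extended @COMPACT_OPT spec would
be parsed with Getopt::Long. The command line and the default of one
job are hypothetical; the actual default used by run() is not shown
in this patch.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Option spec copied from the patch's @COMPACT_OPT.
my @COMPACT_OPT = qw(jobs|j=i quiet|q blocksize|b=s no-full|n fuller|F);

# Hypothetical command line; the inbox path is a placeholder.
my @argv = qw(-j 4 --quiet /path/to/inbox);
my %opt;
GetOptionsFromArray(\@argv, \%opt, @COMPACT_OPT) or die "bad options\n";

# "jobs|j=i" stores under the primary name, so -j 4 and --jobs=4
# both yield $opt{jobs} == 4; non-integer values make parsing fail.
my $max = $opt{jobs} // 1; # hypothetical default of one job
print "jobs=$max quiet=$opt{quiet} rest=@argv\n";
```

Non-option arguments such as the inbox path are left in @argv.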
diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index 5b6d06b..a294d53 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -13,7 +13,7 @@ use File::Basename qw(dirname);
# support testing with dev versions of Xapian which installs
# commands with a version number suffix (e.g. "xapian-compact-1.5")
our $XAPIAN_COMPACT = $ENV{XAPIAN_COMPACT} || 'xapian-compact';
-our @COMPACT_OPT = qw(quiet|q blocksize|b=s no-full|n fuller|F);
+our @COMPACT_OPT = qw(jobs|j=i quiet|q blocksize|b=s no-full|n fuller|F);
sub commit_changes ($$$) {
my ($ibx, $tmp, $opt) = @_;
@@ -54,8 +54,7 @@ sub cb_spawn {
my ($cb, $args, $opt) = @_; # $cb = cpdb() or compact()
defined(my $pid = fork) or die "fork: $!";
return $pid if $pid > 0;
- eval { $cb->($args, $opt) };
- die $@ if $@;
+ $cb->($args, $opt);
exit 0;
}
@@ -103,6 +102,31 @@ sub same_fs_or_die ($$) {
die "$x and $y reside on different filesystems\n";
}
+sub process_queue {
+ my ($queue, $cb, $max, $opt) = @_;
+ if ($max <= 1) {
+ while (defined(my $args = shift @$queue)) {
+ $cb->($args, $opt);
+ }
+ return;
+ }
+
+ # run in parallel:
+ my %pids;
+ while (@$queue) {
+ while (scalar(keys(%pids)) < $max && scalar(@$queue)) {
+ my $args = shift @$queue;
+ $pids{cb_spawn($cb, $args, $opt)} = $args;
+ }
+
+ while (scalar keys %pids) {
+ my $pid = waitpid(-1, 0);
+ my $args = delete $pids{$pid};
+ die join(' ', @$args)." failed: $?\n" if $?;
+ }
+ }
+}
+
sub run {
my ($ibx, $task, $opt) = @_; # task = 'cpdb' or 'compact'
my $cb = \&${\"PublicInbox::Xapcmd::$task"};
@@ -163,19 +187,7 @@ sub run {
}
delete($ibx->{$_}) for (qw(mm over search)); # cleanup
- my %pids;
- while (@q) {
- while (scalar(keys(%pids)) < $max && scalar(@q)) {
- my $args = shift @q;
- $pids{cb_spawn($cb, $args, $opt)} = $args;
- }
-
- while (scalar keys %pids) {
- my $pid = waitpid(-1, 0);
- my $args = delete $pids{$pid};
- die join(' ', @$args)." failed: $?\n" if $?;
- }
- }
+ process_queue(\@q, $cb, $max, $opt);
commit_changes($ibx, $tmp, $opt);
});
}
--
EW
Thread overview: 28+ messages
2019-05-23 9:36 [PATCH 00/26] xcpdb: ease Xapian DB format migrations Eric Wong
2019-05-23 9:36 ` [PATCH 01/26] t/convert-compact: skip on missing xapian-compact(1) Eric Wong
2019-05-23 9:36 ` [PATCH 02/26] v1writable: retire in favor of InboxWritable Eric Wong
2019-05-23 9:36 ` [PATCH 03/26] doc: document the reason for --no-renumber Eric Wong
2019-05-23 9:36 ` [PATCH 04/26] search: reenable phrase search on non-chert Xapian Eric Wong
2019-05-23 9:36 ` [PATCH 05/26] xapcmd: new module for wrapping Xapian commands Eric Wong
2019-05-23 9:36 ` [PATCH 06/26] admin: hoist out resolve_inboxes for -compact and -index Eric Wong
2019-05-23 9:36 ` [PATCH 07/26] xapcmd: support spawn options Eric Wong
2019-05-23 9:36 ` [PATCH 08/26] xcpdb: new tool which wraps Xapian's copydatabase(1) Eric Wong
2019-05-23 9:36 ` [PATCH 09/26] xapcmd: do not cleanup on errors Eric Wong
2019-05-23 9:36 ` [PATCH 10/26] admin: move index_inbox over Eric Wong
2019-05-23 9:36 ` [PATCH 11/26] xcpdb: implement using Perl bindings Eric Wong
2019-05-23 9:36 ` [PATCH 12/26] xapcmd: xcpdb supports compaction Eric Wong
2019-05-23 9:36 ` [PATCH 13/26] v2writable: hoist out log_range sub for readability Eric Wong
2019-05-23 9:36 ` [PATCH 14/26] xcpdb: use fine-grained locking Eric Wong
2019-05-23 9:36 ` [PATCH 15/26] xcpdb: implement progress reporting Eric Wong
2019-05-23 9:36 ` [PATCH 16/26] xcpdb: cleanup error handling and diagnosis Eric Wong
2019-05-23 9:36 ` [PATCH 17/26] xapcmd: avoid EXDEV when finalizing changes Eric Wong
2019-05-23 9:36 ` [PATCH 18/26] doc: xcpdb: update to reflect the current state Eric Wong
2019-05-23 9:36 ` [PATCH 19/26] xapcmd: use "print STDERR" for progress reporting Eric Wong
2019-05-23 9:36 ` [PATCH 20/26] xcpdb: show re-indexing progress Eric Wong
2019-05-23 9:36 ` [PATCH 21/26] xcpdb: remove temporary directories on aborts Eric Wong
2019-05-23 9:37 ` [PATCH 22/26] compact: reuse infrastructure from xcpdb Eric Wong
2019-05-23 9:37 ` [PATCH 23/26] xcpdb|compact: support some xapian-compact switches Eric Wong
2019-05-23 9:37 ` [PATCH 24/26] xapcmd: cleanup on interrupted xcpdb "--compact" Eric Wong
2019-05-23  9:37 ` [PATCH 25/26] xcpdb|compact: support --jobs/-j flag like gmake(1) Eric Wong [this message]
2019-05-23 9:37 ` [PATCH 26/26] xapcmd: do not reset %SIG until last Xtmpdir is done Eric Wong
2019-05-23 10:37 ` [PATCH 27/26] doc: various updates to reflect current state Eric Wong