* [bug#39728] [PATCH] Allow parallel downloads and builds @ 2020-02-21 22:53 Julien Lepiller 2020-02-24 21:23 ` Ludovic Courtès 2021-07-14 2:49 ` [bug#39728] [PATCH] " Maxim Cournoyer 0 siblings, 2 replies; 10+ messages in thread From: Julien Lepiller @ 2020-02-21 22:53 UTC (permalink / raw) To: 39728 [-- Attachment #1: Type: text/plain, Size: 480 bytes --] Hi guix! This patch allows to count builds and downloads separately. The idea is that downloads need bandwidth, but no CPU, while builds do not need bandwidth, but need CPU. With this patch, guix will be able to download substitutes while building unrelated packages. Currently, guix needs to wait for the download to finish before proceeding to the build. This should reduce the time of guix commands that need to build and download things at the same time. What do you think? [-- Attachment #2: 0001-nix-Count-build-and-download-jobs-separately.patch --] [-- Type: text/x-patch, Size: 10298 bytes --] From 9c059d81ba4f4016f8c400b403f8c5edbdb160c2 Mon Sep 17 00:00:00 2001 From: Julien Lepiller <julien@lepiller.eu> Date: Fri, 21 Feb 2020 23:41:33 +0100 Subject: [PATCH] nix: Count build and download jobs separately. This allows to run downloads (that take bandwith) and builds (that take CPU time) independently from one another. * nix/nix-daemon/guix-daemon.cc: Add a max-download-jobs option. * nix/libstore/globals.hh: Add a maxDownloadJobs setting. * nix/libstore/globals.cc: Add a default value to it. * nix/libstore/build.cc: Manage build and download jobs separately. --- nix/libstore/build.cc | 75 +++++++++++++++++++++++++++-------- nix/libstore/globals.cc | 2 + nix/libstore/globals.hh | 3 ++ nix/nix-daemon/guix-daemon.cc | 5 +++ 4 files changed, 69 insertions(+), 16 deletions(-) diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc index 17e92c68a7..d08904c4ee 100644 --- a/nix/libstore/build.cc +++ b/nix/libstore/build.cc @@ -204,6 +204,7 @@ struct Child set<int> fds; bool respectTimeouts; bool inBuildSlot; + bool inDownloadSlot; time_t lastOutput; /* time we last got output on stdout/stderr */ time_t timeStarted; }; @@ -231,10 +232,14 @@ private: /* Child processes currently running. */ Children children; - /* Number of build slots occupied. This includes local builds and - substitutions but not remote builds via the build hook. */ + /* Number of build slots occupied. This includes local builds but not + substitution, built-ins and remote builds via the build hook. */ unsigned int nrLocalBuilds; + /* Number of download slots occupied. This includes substitution and + built-ins. */ + unsigned int nrDownloads; + /* Maps used to prevent multiple instantiations of a goal for the same derivation / path. */ WeakGoalMap derivationGoals; @@ -275,15 +280,18 @@ public: /* Wake up a goal (i.e., there is something for it to do). */ void wakeUp(GoalPtr goal); - /* Return the number of local build and substitution processes - currently running (but not remote builds via the build - hook). */ + /* Return the number of local build processes currently running (but + not remote builds via the build hook). */ unsigned int getNrLocalBuilds(); + /* Return the number of downloads currently running. */ + unsigned int getNrDownloads(); + /* Registers a running child process. `inBuildSlot' means that the process counts towards the jobs limit. */ void childStarted(GoalPtr goal, pid_t pid, - const set<int> & fds, bool inBuildSlot, bool respectTimeouts); + const set<int> & fds, bool inBuildSlot, bool inDownloadSlot, + bool respectTimeouts); /* Unregisters a running child process. `wakeSleepers' should be false if there is no sense in waking up goals that are sleeping @@ -295,6 +303,10 @@ public: might be right away). */ void waitForBuildSlot(GoalPtr goal); + /* Put `goal' to sleep until a download slot becomes available (which + might be right away). */ + void waitForDownloadSlot(GoalPtr goal); + /* Wait for any goal to finish. Pretty indiscriminate way to wait for some resource that some other goal is holding. */ void waitForAnyGoal(GoalPtr goal); @@ -1359,12 +1371,21 @@ void DerivationGoal::tryToBuild() derivation prefers to be done locally, do it even if maxBuildJobs is 0. */ unsigned int curBuilds = worker.getNrLocalBuilds(); - if (curBuilds >= settings.maxBuildJobs && !(buildLocally && curBuilds == 0)) { + if (curBuilds >= settings.maxBuildJobs && !(buildLocally && curBuilds == 0) && + !fixedOutput) { worker.waitForBuildSlot(shared_from_this()); outputLocks.unlock(); return; } + unsigned int curDownloads = worker.getNrDownloads(); + if (curDownloads >= (settings.maxDownloadJobs==0?1:settings.maxDownloadJobs) && + fixedOutput) { + worker.waitForDownloadSlot(shared_from_this()); + outputLocks.unlock(); + return; + } + try { /* Okay, we have to build. */ @@ -1648,7 +1669,7 @@ HookReply DerivationGoal::tryBuildHook() set<int> fds; fds.insert(hook->fromHook.readSide); fds.insert(hook->builderOut.readSide); - worker.childStarted(shared_from_this(), hook->pid, fds, false, true); + worker.childStarted(shared_from_this(), hook->pid, fds, false, false, true); if (settings.printBuildTrace) printMsg(lvlError, format("@ build-started %1% - %2% %3% %4%") @@ -2030,7 +2051,7 @@ void DerivationGoal::startBuilder() pid.setSeparatePG(true); builderOut.writeSide.close(); worker.childStarted(shared_from_this(), pid, - singleton<set<int> >(builderOut.readSide), true, true); + singleton<set<int> >(builderOut.readSide), !fixedOutput, fixedOutput, true); /* Check if setting up the build environment failed. */ string msg = readLine(builderOut.readSide); @@ -3034,11 +3055,11 @@ void SubstitutionGoal::tryToRun() trace("trying to run"); /* Make sure that we are allowed to start a build. Note that even - is maxBuildJobs == 0 (no local builds allowed), we still allow - a substituter to run. This is because substitutions cannot be - distributed to another machine via the build hook. */ - if (worker.getNrLocalBuilds() >= (settings.maxBuildJobs == 0 ? 1 : settings.maxBuildJobs)) { - worker.waitForBuildSlot(shared_from_this()); + is maxDownloadJobs == 0 (no downloads allowed), we still allow + a substituter to run. This is because we always need to download, so + not allowing is meaningless. */ + if (worker.getNrDownloads() >= (settings.maxDownloadJobs == 0 ? 1 : settings.maxDownloadJobs)) { + worker.waitForDownloadSlot(shared_from_this()); return; } @@ -3107,7 +3128,7 @@ void SubstitutionGoal::tryToRun() outPipe.writeSide.close(); logPipe.writeSide.close(); worker.childStarted(shared_from_this(), - pid, singleton<set<int> >(logPipe.readSide), true, true); + pid, singleton<set<int> >(logPipe.readSide), false, true, true); state = &SubstitutionGoal::finished; @@ -3242,6 +3263,7 @@ Worker::Worker(LocalStore & store) if (working) abort(); working = true; nrLocalBuilds = 0; + nrDownloads = 0; lastWokenUp = 0; permanentFailure = false; timedOut = false; @@ -3334,9 +3356,14 @@ unsigned Worker::getNrLocalBuilds() return nrLocalBuilds; } +unsigned Worker::getNrDownloads() +{ + return nrDownloads; +} + void Worker::childStarted(GoalPtr goal, - pid_t pid, const set<int> & fds, bool inBuildSlot, + pid_t pid, const set<int> & fds, bool inBuildSlot, bool inDownloadSlot, bool respectTimeouts) { Child child; @@ -3344,9 +3371,11 @@ void Worker::childStarted(GoalPtr goal, child.fds = fds; child.timeStarted = child.lastOutput = time(0); child.inBuildSlot = inBuildSlot; + child.inDownloadSlot = inDownloadSlot; child.respectTimeouts = respectTimeouts; children[pid] = child; if (inBuildSlot) nrLocalBuilds++; + if (inDownloadSlot) nrDownloads++; } @@ -3362,6 +3391,11 @@ void Worker::childTerminated(pid_t pid, bool wakeSleepers) nrLocalBuilds--; } + if (i->second.inDownloadSlot) { + assert(nrDownloads > 0); + nrDownloads--; + } + children.erase(pid); if (wakeSleepers) { @@ -3386,6 +3420,15 @@ void Worker::waitForBuildSlot(GoalPtr goal) addToWeakGoals(wantingToBuild, goal); } +void Worker::waitForDownloadSlot(GoalPtr goal) +{ + debug("wait for download slot"); + if (getNrDownloads() < (settings.maxDownloadJobs==0?1:settings.maxDownloadJobs)) + wakeUp(goal); /* we can do it right away */ + else + addToWeakGoals(wantingToBuild, goal); +} + void Worker::waitForAnyGoal(GoalPtr goal) { diff --git a/nix/libstore/globals.cc b/nix/libstore/globals.cc index 0cc001fbe4..416033718d 100644 --- a/nix/libstore/globals.cc +++ b/nix/libstore/globals.cc @@ -29,6 +29,7 @@ Settings::Settings() tryFallback = false; buildVerbosity = lvlError; maxBuildJobs = 1; + maxDownloadJobs = 1; buildCores = 1; readOnlyMode = false; thisSystem = SYSTEM; @@ -118,6 +119,7 @@ void Settings::update() { _get(tryFallback, "build-fallback"); _get(maxBuildJobs, "build-max-jobs"); + _get(maxDownloadJobs, "download-max-jobs"); _get(buildCores, "build-cores"); _get(thisSystem, "system"); _get(multiplexedBuildOutput, "multiplexed-build-output"); diff --git a/nix/libstore/globals.hh b/nix/libstore/globals.hh index 27616a2283..c033f8ed56 100644 --- a/nix/libstore/globals.hh +++ b/nix/libstore/globals.hh @@ -90,6 +90,9 @@ struct Settings { /* Maximum number of parallel build jobs. 0 means unlimited. */ unsigned int maxBuildJobs; + /* Maximum number of parallel download jobs. 0 means 1. */ + unsigned int maxDownloadJobs; + /* Number of CPU cores to utilize in parallel within a build, i.e. by passing this number to Make via '-j'. 0 means that the number of actual CPU cores on the local host ought to be diff --git a/nix/nix-daemon/guix-daemon.cc b/nix/nix-daemon/guix-daemon.cc index cd949aca67..6997c07f5a 100644 --- a/nix/nix-daemon/guix-daemon.cc +++ b/nix/nix-daemon/guix-daemon.cc @@ -99,6 +99,8 @@ static const struct argp_option options[] = }, { "max-jobs", 'M', n_("N"), 0, n_("allow at most N build jobs") }, + { "max-downloads", 'D', n_("N"), 0, + n_("allow at most N download jobs") }, { "timeout", GUIX_OPT_TIMEOUT, n_("SECONDS"), 0, n_("mark builds as failed after SECONDS of activity") }, { "max-silent-time", GUIX_OPT_MAX_SILENT_TIME, n_("SECONDS"), 0, @@ -276,6 +278,9 @@ parse_opt (int key, char *arg, struct argp_state *state) case 'M': settings.set ("build-max-jobs", arg); break; + case 'D': + settings.set ("download-max-jobs", arg); + break; case GUIX_OPT_TIMEOUT: settings.set ("build-timeout", arg); break; -- 2.24.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH] Allow parallel downloads and builds 2020-02-21 22:53 [bug#39728] [PATCH] Allow parallel downloads and builds Julien Lepiller @ 2020-02-24 21:23 ` Ludovic Courtès 2020-02-25 15:21 ` zimoun 2021-07-14 2:49 ` [bug#39728] [PATCH] " Maxim Cournoyer 1 sibling, 1 reply; 10+ messages in thread From: Ludovic Courtès @ 2020-02-24 21:23 UTC (permalink / raw) To: Julien Lepiller; +Cc: 39728 Hi! Julien Lepiller <julien@lepiller.eu> skribis: > This patch allows to count builds and downloads separately. The idea is > that downloads need bandwidth, but no CPU, while builds do not need > bandwidth, but need CPU. With this patch, guix will be able to download > substitutes while building unrelated packages. Currently, guix needs to > wait for the download to finish before proceeding to the build. This > should reduce the time of guix commands that need to build and download > things at the same time. > > What do you think? I think it’s a good idea! I wonder what the UI will look like: (guix status) would no longer display a progress bar when there’s more than on job (build or download) taking place at the same time. >>From 9c059d81ba4f4016f8c400b403f8c5edbdb160c2 Mon Sep 17 00:00:00 2001 > From: Julien Lepiller <julien@lepiller.eu> > Date: Fri, 21 Feb 2020 23:41:33 +0100 > Subject: [PATCH] nix: Count build and download jobs separately. ^ I’d write “daemon:” here. :-) > This allows to run downloads (that take bandwith) and builds (that take ^ “This allows us” > CPU time) independently from one another. > > * nix/nix-daemon/guix-daemon.cc: Add a max-download-jobs option. > * nix/libstore/globals.hh: Add a maxDownloadJobs setting. > * nix/libstore/globals.cc: Add a default value to it. > * nix/libstore/build.cc: Manage build and download jobs separately. For the final patch, please specify the entities changed (classes, functions, etc.). > + /* Number of download slots occupied. This includes substitution and > + built-ins. */ > + unsigned int nrDownloads; Note that not all builtins are downloads. Fixed-output derivations are (usually) also downloads. (It’d be the first time the daemon gets a notion of “download”. We should make sure it doesn’t conflict with other assumptions.) > /* Registers a running child process. `inBuildSlot' means that > the process counts towards the jobs limit. */ > void childStarted(GoalPtr goal, pid_t pid, > - const set<int> & fds, bool inBuildSlot, bool respectTimeouts); > + const set<int> & fds, bool inBuildSlot, bool inDownloadSlot, > + bool respectTimeouts); How about replacing these two Booleans by a single enum? > + unsigned int curDownloads = worker.getNrDownloads(); > + if (curDownloads >= (settings.maxDownloadJobs==0?1:settings.maxDownloadJobs) && > + fixedOutput) { This is hard to parse and lacking spacing. :-) Perhaps make an intermediate function or variable? > +void Worker::waitForDownloadSlot(GoalPtr goal) > +{ > + debug("wait for download slot"); > + if (getNrDownloads() < (settings.maxDownloadJobs==0?1:settings.maxDownloadJobs)) Same here. > @@ -118,6 +119,7 @@ void Settings::update() > { > _get(tryFallback, "build-fallback"); > _get(maxBuildJobs, "build-max-jobs"); > + _get(maxDownloadJobs, "download-max-jobs"); We should also allow ‘set-build-options’ to set this option, as well as add it to ‘%standard-build-options’. That can prolly come in a separate patch. > + { "max-downloads", 'D', n_("N"), 0, > + n_("allow at most N download jobs") }, We’d need to update doc/guix.texi. It would be great if you could test this patch for your daily usage. I find it surprisingly easy to break things in the daemon. :-) Thank you! Ludo’. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH] Allow parallel downloads and builds 2020-02-24 21:23 ` Ludovic Courtès @ 2020-02-25 15:21 ` zimoun 2020-02-25 15:39 ` Julien Lepiller 0 siblings, 1 reply; 10+ messages in thread From: zimoun @ 2020-02-25 15:21 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Julien Lepiller, 39728 Hi Julien, On Mon, 24 Feb 2020 at 22:43, Ludovic Courtès <ludo@gnu.org> wrote: > Julien Lepiller <julien@lepiller.eu> skribis: > > This patch allows to count builds and downloads separately. The idea is > > that downloads need bandwidth, but no CPU, while builds do not need > > bandwidth, but need CPU. With this patch, guix will be able to download > > substitutes while building unrelated packages. Currently, guix needs to > > wait for the download to finish before proceeding to the build. This > > should reduce the time of guix commands that need to build and download > > things at the same time. > > > > What do you think? > > I think it’s a good idea! > > I wonder what the UI will look like: (guix status) would no longer > display a progress bar when there’s more than on job (build or download) > taking place at the same time. Speaking about progress bar, it could be nice (as an improvement) to have a concurrent progress bar. As an example, see: http://hackage.haskell.org/package/concurrent-output > It would be great if you could test this patch for your daily usage. I > find it surprisingly easy to break things in the daemon. :-) How can I do that? After the 'make', how can change the daemon? And then revert it again to the default one? Cheers, simon ^ permalink raw reply [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH] Allow parallel downloads and builds 2020-02-25 15:21 ` zimoun @ 2020-02-25 15:39 ` Julien Lepiller 2020-02-26 10:36 ` Pierre Neidhardt 0 siblings, 1 reply; 10+ messages in thread From: Julien Lepiller @ 2020-02-25 15:39 UTC (permalink / raw) To: zimoun, Ludovic Courtès; +Cc: 39728 Le 25 février 2020 10:21:24 GMT-05:00, zimoun <zimon.toutoune@gmail.com> a écrit : >Hi Julien, > >On Mon, 24 Feb 2020 at 22:43, Ludovic Courtès <ludo@gnu.org> wrote: >> Julien Lepiller <julien@lepiller.eu> skribis: > >> > This patch allows to count builds and downloads separately. The >idea is >> > that downloads need bandwidth, but no CPU, while builds do not need >> > bandwidth, but need CPU. With this patch, guix will be able to >download >> > substitutes while building unrelated packages. Currently, guix >needs to >> > wait for the download to finish before proceeding to the build. >This >> > should reduce the time of guix commands that need to build and >download >> > things at the same time. >> > >> > What do you think? >> >> I think it’s a good idea! >> >> I wonder what the UI will look like: (guix status) would no longer >> display a progress bar when there’s more than on job (build or >download) >> taking place at the same time. > >Speaking about progress bar, it could be nice (as an improvement) to >have a concurrent progress bar. As an example, see: > >http://hackage.haskell.org/package/concurrent-output > > >> It would be great if you could test this patch for your daily usage. >I >> find it surprisingly easy to break things in the daemon. :-) > >How can I do that? >After the 'make', how can change the daemon? And then revert it again >to the default one? > > >Cheers, >simon On the guix system, try (in a guix environment guix) sudo herd stop guix-daemon; sudo ./pre-inst-env guix-daemon --build-users-group=guixbuild To revert back, kill this (^C) and sudo herd start guix-daemon. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH] Allow parallel downloads and builds 2020-02-25 15:39 ` Julien Lepiller @ 2020-02-26 10:36 ` Pierre Neidhardt 2021-11-21 22:49 ` [bug#39728] [PATCH v2] " Julien Lepiller 0 siblings, 1 reply; 10+ messages in thread From: Pierre Neidhardt @ 2020-02-26 10:36 UTC (permalink / raw) To: Julien Lepiller; +Cc: Ludovic Courtès, 39728, zimoun [-- Attachment #1: Type: text/plain, Size: 376 bytes --] Thank you so much for this, Julien! There was a thread on this topic on the mailing list: https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00002.html (could not find the first email :p). There were a couple other issues that were mentioned there. This patch would be a first step towards more parallelization! -- Pierre Neidhardt https://ambrevar.xyz/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH v2] Allow parallel downloads and builds 2020-02-26 10:36 ` Pierre Neidhardt @ 2021-11-21 22:49 ` Julien Lepiller 2021-11-25 12:53 ` Ludovic Courtès 0 siblings, 1 reply; 10+ messages in thread From: Julien Lepiller @ 2021-11-21 22:49 UTC (permalink / raw) To: Pierre Neidhardt; +Cc: Ludovic Courtès, 39728, zimoun [-- Attachment #1: Type: text/plain, Size: 612 bytes --] Le Wed, 26 Feb 2020 11:36:32 +0100, Pierre Neidhardt <mail@ambrevar.xyz> a écrit : > Thank you so much for this, Julien! > > There was a thread on this topic on the mailing list: > https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00002.html > (could not find the first email :p). > > There were a couple other issues that were mentioned there. > This patch would be a first step towards more parallelization! > Hi! After so long, I managed to find the time to go over the comments and improve my patches. I tested the new daemon for a bit, and it's working as expected so far :D [-- Attachment #2: 0001-daemon-Count-build-and-download-jobs-separately.patch --] [-- Type: text/x-patch, Size: 13953 bytes --] From 3dee3844b6da1d70b883391f5b2d23e331d6a5ad Mon Sep 17 00:00:00 2001 Message-Id: <3dee3844b6da1d70b883391f5b2d23e331d6a5ad.1637534792.git.julien@lepiller.eu> From: Julien Lepiller <julien@lepiller.eu> Date: Sun, 21 Nov 2021 22:35:09 +0100 Subject: [PATCH 1/2] daemon: Count build and download jobs separately. This allows us to run downloads (that take bandwith) and builds (that take CPU time) independently from one another. * nix/nix-daemon/guix-daemon.cc: Add a max-download-jobs option. * nix/nix-daemon/nix-daemon.cc: Add a max-download-jobs option. * nix/libstore/globals.hh (Settings): Add a maxDownloadJobs setting. * nix/libstore/globals.cc (Settings::Settings): Add a default value to it. (Settings::update): Add it. * nix/libstore/build.cc (Slot): New enum type. (Child): Use `Slot` enum to record the job slot of the task, if any. (Goal): Separate build and download slot members and methods. (DerivationGoal::tryToBuild): Check number of available slots in build or download tasks. (DerivationGoal::tryBuildHook, SubstitutionGoal::tryToRun): Check number of download slots instead of build slots. (Worker::Worker, Worker::childStarted, Worker::childTerminated): Manage build and download slots separately. * doc/guix.texi (Invoking guix-daemon): Document `--max-downloads`. --- doc/guix.texi | 6 ++ nix/libstore/build.cc | 105 ++++++++++++++++++++++++++-------- nix/libstore/globals.cc | 2 + nix/libstore/globals.hh | 3 + nix/nix-daemon/guix-daemon.cc | 5 ++ nix/nix-daemon/nix-daemon.cc | 2 +- 6 files changed, 97 insertions(+), 26 deletions(-) diff --git a/doc/guix.texi b/doc/guix.texi index 59ceb4477a..b8de53c53b 100644 --- a/doc/guix.texi +++ b/doc/guix.texi @@ -1619,6 +1619,12 @@ Invoking guix-daemon locally; instead, the daemon will offload builds (@pxref{Daemon Offload Setup}), or simply fail. +@item --max-downloads=@var{n} +@itemx -D @var{n} +Allow at most @var{n} download jobs in parallel. The default value is +@code{1}. Since some downloads always need to happen, setting it to +@code{0} does not make any sense and will be ignored. + @item --max-silent-time=@var{seconds} When the build or substitution process remains silent for more than @var{seconds}, terminate it and report a build failure. diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc index 5697ae5a43..2fbbd81af2 100644 --- a/nix/libstore/build.cc +++ b/nix/libstore/build.cc @@ -196,6 +196,9 @@ bool CompareGoalPtrs::operator() (const GoalPtr & a, const GoalPtr & b) { } +/* A list of possible slots */ +typedef enum { slotNone, slotBuild, slotDownload } Slot; + /* A mapping used to remember for each child process to what goal it belongs, and file descriptors for receiving log data and output path creation commands. */ @@ -204,7 +207,7 @@ struct Child WeakGoalPtr goal; set<int> fds; bool respectTimeouts; - bool inBuildSlot; + Slot inSlot; time_t lastOutput; /* time we last got output on stdout/stderr */ time_t timeStarted; }; @@ -232,10 +235,15 @@ private: /* Child processes currently running. */ Children children; - /* Number of build slots occupied. This includes local builds and - substitutions but not remote builds via the build hook. */ + /* Number of build slots occupied. This includes local builds but not + fixed-output derivations, substitution, built-ins and remote builds + via the build hook. */ unsigned int nrLocalBuilds; + /* Number of download slots occupied. This includes fixed-output + derivations, substitution and built-ins. */ + unsigned int nrDownloads; + /* Maps used to prevent multiple instantiations of a goal for the same derivation / path. */ WeakGoalMap derivationGoals; @@ -277,15 +285,18 @@ public: /* Wake up a goal (i.e., there is something for it to do). */ void wakeUp(GoalPtr goal); - /* Return the number of local build and substitution processes - currently running (but not remote builds via the build - hook). */ + /* Return the number of local build processes currently running (but + not remote builds via the build hook). */ unsigned int getNrLocalBuilds(); - /* Registers a running child process. `inBuildSlot' means that - the process counts towards the jobs limit. */ + /* Return the number of downloads currently running. */ + unsigned int getNrDownloads(); + + /* Registers a running child process. `inSlot' indicates whether + the process counts towards the jobs limit, and which one. */ void childStarted(GoalPtr goal, pid_t pid, - const set<int> & fds, bool inBuildSlot, bool respectTimeouts); + const set<int> & fds, Slot inSlot, + bool respectTimeouts); /* Unregisters a running child process. `wakeSleepers' should be false if there is no sense in waking up goals that are sleeping @@ -297,6 +308,10 @@ public: might be right away). */ void waitForBuildSlot(GoalPtr goal); + /* Put `goal' to sleep until a download slot becomes available (which + might be right away). */ + void waitForDownloadSlot(GoalPtr goal); + /* Wait for any goal to finish. Pretty indiscriminate way to wait for some resource that some other goal is holding. */ void waitForAnyGoal(GoalPtr goal); @@ -1249,16 +1264,28 @@ void DerivationGoal::tryToBuild() } } - /* Make sure that we are allowed to start a build. If this - derivation prefers to be done locally, do it even if - maxBuildJobs is 0. */ + /* Make sure that, if this is a build (not a download), we are allowed to + start a build. If this derivation prefers to be done locally, do it + even if maxBuildJobs is 0. */ unsigned int curBuilds = worker.getNrLocalBuilds(); - if (curBuilds >= settings.maxBuildJobs && !(buildLocally && curBuilds == 0)) { + if (curBuilds >= settings.maxBuildJobs && !(buildLocally && curBuilds == 0) && + !fixedOutput) { worker.waitForBuildSlot(shared_from_this()); outputLocks.unlock(); return; } + /* Make sure that, if this is a download (not a build), we are allowed to + start a download. Even if maxDownloadJobs is 0, we always allow + downloads, so it is set to 1 */ + unsigned int curDownloads = worker.getNrDownloads(); + unsigned int maxDownloads = settings.maxDownloadJobs == 0 ? 1 : settings.maxDownloadJobs; + if (curDownloads >= maxDownloads && fixedOutput) { + worker.waitForDownloadSlot(shared_from_this()); + outputLocks.unlock(); + return; + } + try { /* Okay, we have to build. */ @@ -1546,7 +1573,7 @@ HookReply DerivationGoal::tryBuildHook() set<int> fds; fds.insert(hook->fromAgent.readSide); fds.insert(hook->builderOut.readSide); - worker.childStarted(shared_from_this(), hook->pid, fds, false, true); + worker.childStarted(shared_from_this(), hook->pid, fds, slotNone, true); if (settings.printBuildTrace) printMsg(lvlError, format("@ build-started %1% - %2% %3% %4%") @@ -1940,7 +1967,8 @@ void DerivationGoal::startBuilder() pid.setSeparatePG(true); builderOut.writeSide.close(); worker.childStarted(shared_from_this(), pid, - singleton<set<int> >(builderOut.readSide), true, true); + singleton<set<int> >(builderOut.readSide), + fixedOutput? slotDownload: slotBuild, true); /* Check if setting up the build environment failed. */ string msg = readLine(builderOut.readSide); @@ -2979,11 +3007,11 @@ void SubstitutionGoal::tryToRun() trace("trying to run"); /* Make sure that we are allowed to start a build. Note that even - is maxBuildJobs == 0 (no local builds allowed), we still allow - a substituter to run. This is because substitutions cannot be - distributed to another machine via the build hook. */ - if (worker.getNrLocalBuilds() >= (settings.maxBuildJobs == 0 ? 1 : settings.maxBuildJobs)) { - worker.waitForBuildSlot(shared_from_this()); + is maxDownloadJobs == 0 (no downloads allowed), we still allow + a substituter to run. This is because we always need to download, so + not allowing is meaningless. */ + if (worker.getNrDownloads() >= (settings.maxDownloadJobs == 0 ? 1 : settings.maxDownloadJobs)) { + worker.waitForDownloadSlot(shared_from_this()); return; } @@ -3042,7 +3070,8 @@ void SubstitutionGoal::tryToRun() set<int> fds; fds.insert(substituter->fromAgent.readSide); fds.insert(substituter->builderOut.readSide); - worker.childStarted(shared_from_this(), substituter->pid, fds, true, true); + worker.childStarted(shared_from_this(), substituter->pid, fds, + slotDownload, true); state = &SubstitutionGoal::finished; @@ -3206,6 +3235,7 @@ Worker::Worker(LocalStore & store) if (working) abort(); working = true; nrLocalBuilds = 0; + nrDownloads = 0; lastWokenUp = 0; permanentFailure = false; timedOut = false; @@ -3299,18 +3329,25 @@ unsigned Worker::getNrLocalBuilds() } +unsigned Worker::getNrDownloads() +{ + return nrDownloads; +} + + void Worker::childStarted(GoalPtr goal, - pid_t pid, const set<int> & fds, bool inBuildSlot, + pid_t pid, const set<int> & fds, Slot inSlot, bool respectTimeouts) { Child child; child.goal = goal; child.fds = fds; child.timeStarted = child.lastOutput = time(0); - child.inBuildSlot = inBuildSlot; + child.inSlot = inSlot; child.respectTimeouts = respectTimeouts; children[pid] = child; - if (inBuildSlot) nrLocalBuilds++; + if (inSlot == slotBuild) nrLocalBuilds++; + if (inSlot == slotDownload) nrDownloads++; } @@ -3321,11 +3358,16 @@ void Worker::childTerminated(pid_t pid, bool wakeSleepers) Children::iterator i = children.find(pid); assert(i != children.end()); - if (i->second.inBuildSlot) { + if (i->second.inSlot == slotBuild) { assert(nrLocalBuilds > 0); nrLocalBuilds--; } + if (i->second.inSlot == slotDownload) { + assert(nrDownloads > 0); + nrDownloads--; + } + children.erase(pid); if (wakeSleepers) { @@ -3351,6 +3393,19 @@ void Worker::waitForBuildSlot(GoalPtr goal) } +void Worker::waitForDownloadSlot(GoalPtr goal) +{ + debug("wait for download slot"); + /* Disabling all downloads does not make any sense, so we make sure we + can download with at least one job. */ + unsigned int maxDownloads = settings.maxDownloadJobs == 0 ? 1 : settings.maxDownloadJobs; + if (getNrDownloads() < maxDownloads) + wakeUp(goal); /* we can do it right away */ + else + addToWeakGoals(wantingToBuild, goal); +} + + void Worker::waitForAnyGoal(GoalPtr goal) { debug("wait for any goal"); diff --git a/nix/libstore/globals.cc b/nix/libstore/globals.cc index 0cc001fbe4..416033718d 100644 --- a/nix/libstore/globals.cc +++ b/nix/libstore/globals.cc @@ -29,6 +29,7 @@ Settings::Settings() tryFallback = false; buildVerbosity = lvlError; maxBuildJobs = 1; + maxDownloadJobs = 1; buildCores = 1; readOnlyMode = false; thisSystem = SYSTEM; @@ -118,6 +119,7 @@ void Settings::update() { _get(tryFallback, "build-fallback"); _get(maxBuildJobs, "build-max-jobs"); + _get(maxDownloadJobs, "download-max-jobs"); _get(buildCores, "build-cores"); _get(thisSystem, "system"); _get(multiplexedBuildOutput, "multiplexed-build-output"); diff --git a/nix/libstore/globals.hh b/nix/libstore/globals.hh index 27616a2283..c033f8ed56 100644 --- a/nix/libstore/globals.hh +++ b/nix/libstore/globals.hh @@ -90,6 +90,9 @@ struct Settings { /* Maximum number of parallel build jobs. 0 means unlimited. */ unsigned int maxBuildJobs; + /* Maximum number of parallel download jobs. 0 means 1. */ + unsigned int maxDownloadJobs; + /* Number of CPU cores to utilize in parallel within a build, i.e. by passing this number to Make via '-j'. 0 means that the number of actual CPU cores on the local host ought to be diff --git a/nix/nix-daemon/guix-daemon.cc b/nix/nix-daemon/guix-daemon.cc index 36a06a3fae..f1b60b0050 100644 --- a/nix/nix-daemon/guix-daemon.cc +++ b/nix/nix-daemon/guix-daemon.cc @@ -100,6 +100,8 @@ static const struct argp_option options[] = }, { "max-jobs", 'M', n_("N"), 0, n_("allow at most N build jobs") }, + { "max-downloads", 'D', n_("N"), 0, + n_("allow at most N download jobs") }, { "timeout", GUIX_OPT_TIMEOUT, n_("SECONDS"), 0, n_("mark builds as failed after SECONDS of activity") }, { "max-silent-time", GUIX_OPT_MAX_SILENT_TIME, n_("SECONDS"), 0, @@ -285,6 +287,9 @@ parse_opt (int key, char *arg, struct argp_state *state) case 'M': settings.set ("build-max-jobs", arg); break; + case 'D': + settings.set ("download-max-jobs", arg); + break; case GUIX_OPT_TIMEOUT: settings.set ("build-timeout", arg); break; diff --git a/nix/nix-daemon/nix-daemon.cc b/nix/nix-daemon/nix-daemon.cc index 497de11a04..4b2f8bcf22 100644 --- a/nix/nix-daemon/nix-daemon.cc +++ b/nix/nix-daemon/nix-daemon.cc @@ -610,7 +610,7 @@ static void performOp(bool trusted, unsigned int clientVersion, string value = readString(from); if (name == "build-timeout" || name == "build-max-silent-time" || name == "build-max-jobs" || name == "build-cores" - || name == "build-repeat" + || name == "build-repeat" || name == "download-max-jobs" || name == "multiplexed-build-output") settings.set(name, value); else if (name == "user-name" -- 2.33.1 [-- Attachment #3: 0002-guix-Support-specifying-max-download-jobs.patch --] [-- Type: text/x-patch, Size: 4127 bytes --] From d8b39af21d97dc3f0d3057ceb7ba91a93ff8d3ec Mon Sep 17 00:00:00 2001 Message-Id: <d8b39af21d97dc3f0d3057ceb7ba91a93ff8d3ec.1637534792.git.julien@lepiller.eu> In-Reply-To: <3dee3844b6da1d70b883391f5b2d23e331d6a5ad.1637534792.git.julien@lepiller.eu> References: <3dee3844b6da1d70b883391f5b2d23e331d6a5ad.1637534792.git.julien@lepiller.eu> From: Julien Lepiller <julien@lepiller.eu> Date: Sun, 21 Nov 2021 23:26:11 +0100 Subject: [PATCH 2/2] guix: Support specifying max download jobs. * guix/store.scm (set-build-options): Optionally send max-download-jobs. * guix/scripts/build.scm (set-build-options-from-command-line) (%standard-build-options): Support `--max-downloads` * doc/guix.texi (Common Build Options): Document it. --- doc/guix.texi | 6 ++++++ guix/scripts/build.scm | 8 ++++++++ guix/store.scm | 5 +++++ 3 files changed, 19 insertions(+) diff --git a/doc/guix.texi b/doc/guix.texi index b8de53c53b..0afefbd416 100644 --- a/doc/guix.texi +++ b/doc/guix.texi @@ -10846,6 +10846,12 @@ Common Build Options guix-daemon, @option{--max-jobs}}, for details about this option and the equivalent @command{guix-daemon} option. +@item --max-downloads=@var{n} +@itemx -D @var{n} +Allow at most @var{n} download jobs in parallel. @xref{Invoking +guix-daemon, @option{--max-downloads}}, for details about this option and the +equivalent @command{guix-daemon} option. + @item --debug=@var{level} Produce debugging output coming from the build daemon. @var{level} must be an integer between 0 and 5; higher means more verbose output. Setting a level of diff --git a/guix/scripts/build.scm b/guix/scripts/build.scm index 97e2f5a167..0d0199eccd 100644 --- a/guix/scripts/build.scm +++ b/guix/scripts/build.scm @@ -207,6 +207,7 @@ (define (set-build-options-from-command-line store opts) #:rounds (assoc-ref opts 'rounds) #:build-cores (assoc-ref opts 'cores) #:max-build-jobs (assoc-ref opts 'max-jobs) + #:max-download-jobs (assoc-ref opts 'max-downloads) #:fallback? (assoc-ref opts 'fallback?) #:use-substitutes? (assoc-ref opts 'substitutes?) #:substitute-urls (assoc-ref opts 'substitute-urls) @@ -316,6 +317,13 @@ (define %standard-build-options (let ((c (false-if-exception (string->number arg)))) (if c (apply values (alist-cons 'max-jobs c result) rest) + (leave (G_ "not a number: '~a' option argument: ~a~%") + name arg))))) + (option '(#\D "max-downloads") #t #f + (lambda (opt name arg result . rest) + (let ((c (false-if-exception (string->number arg)))) + (if c + (apply values (alist-cons 'max-downloads c result) rest) (leave (G_ "not a number: '~a' option argument: ~a~%") name arg))))))) diff --git a/guix/store.scm b/guix/store.scm index 7388953d15..6b5b9262b1 100644 --- a/guix/store.scm +++ b/guix/store.scm @@ -803,6 +803,7 @@ (define* (set-build-options server (verbosity 0) rounds ;number of build rounds max-build-jobs + max-download-jobs timeout max-silent-time (offload? #t) @@ -896,6 +897,10 @@ (define* (set-build-options server `(("build-max-jobs" . ,(number->string max-build-jobs))) '()) + ,@(if max-download-jobs + `(("download-max-jobs" + . ,(number->string max-download-jobs))) + '()) ,@(if build-cores `(("build-cores" . ,(number->string build-cores))) '()) -- 2.33.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH v2] Allow parallel downloads and builds 2021-11-21 22:49 ` [bug#39728] [PATCH v2] " Julien Lepiller @ 2021-11-25 12:53 ` Ludovic Courtès 2021-11-25 13:01 ` Julien Lepiller 0 siblings, 1 reply; 10+ messages in thread From: Ludovic Courtès @ 2021-11-25 12:53 UTC (permalink / raw) To: Julien Lepiller; +Cc: 39728, Pierre Neidhardt, zimoun Hello, Julien Lepiller <julien@lepiller.eu> skribis: > After so long, I managed to find the time to go over the comments and > improve my patches. I tested the new daemon for a bit, and it's working > as expected so far :D On a recent daemon, have you seen cases where having multiple downloads in parallel speeds things up? The analysis in <https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/> suggests that at the time you first submitted this patch, substitution speed (which is different from raw download speed) was often CPU-bound. This is no longer the case, meaning that downloads should now be network-bound or almost. Thanks, Ludo’. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH v2] Allow parallel downloads and builds 2021-11-25 12:53 ` Ludovic Courtès @ 2021-11-25 13:01 ` Julien Lepiller 2021-11-26 10:16 ` Ludovic Courtès 0 siblings, 1 reply; 10+ messages in thread From: Julien Lepiller @ 2021-11-25 13:01 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 39728, Pierre Neidhardt, zimoun Le Thu, 25 Nov 2021 13:53:25 +0100, Ludovic Courtès <ludo@gnu.org> a écrit : > Hello, > > Julien Lepiller <julien@lepiller.eu> skribis: > > > After so long, I managed to find the time to go over the comments > > and improve my patches. I tested the new daemon for a bit, and it's > > working as expected so far :D > > On a recent daemon, have you seen cases where having multiple > downloads in parallel speeds things up? > > The analysis in > <https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/> > suggests that at the time you first submitted this patch, substitution > speed (which is different from raw download speed) was often > CPU-bound. This is no longer the case, meaning that downloads should > now be network-bound or almost. > > Thanks, > Ludo’. I would still say yes, because the output from berlin is often much less than my throughput. With multiple downloads in parallel it at least feels quicker, probably because I can download at full speed. In any case, I see often a build start while downloads are in progress, so I think it's still a win if you can get a few derivations built while waiting for a big download to finish at the same time :) At some point we might want to prioritize builds/downloads that help unlock as much builds as possible early, so we don't have builds waiting for downloads. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH v2] Allow parallel downloads and builds 2021-11-25 13:01 ` Julien Lepiller @ 2021-11-26 10:16 ` Ludovic Courtès 0 siblings, 0 replies; 10+ messages in thread From: Ludovic Courtès @ 2021-11-26 10:16 UTC (permalink / raw) To: Julien Lepiller; +Cc: 39728, Pierre Neidhardt, zimoun Hi! Julien Lepiller <julien@lepiller.eu> skribis: > I would still say yes, because the output from berlin is often much > less than my throughput. With multiple downloads in parallel it at > least feels quicker, probably because I can download at full speed. It would be nice to measure that because like I wrote, I think we’re pretty much network-bound these days, at least with zstd and uncompressed downloads. > In any case, I see often a build start while downloads are in progress, > so I think it's still a win if you can get a few derivations built > while waiting for a big download to finish at the same time :) True! Overlapping downloads and builds sounds like a good idea. > At some point we might want to prioritize builds/downloads that help > unlock as much builds as possible early, so we don't have builds > waiting for downloads. Right now the daemon starts with substitutes and builds afterwards. BTW, we’re assuming downloads = substitutes in this whole discussion, but we could/should take fixed-output derivations into account too. I’ll take a closer look later on… Thanks, Ludo’. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [bug#39728] [PATCH] Allow parallel downloads and builds 2020-02-21 22:53 [bug#39728] [PATCH] Allow parallel downloads and builds Julien Lepiller 2020-02-24 21:23 ` Ludovic Courtès @ 2021-07-14 2:49 ` Maxim Cournoyer 1 sibling, 0 replies; 10+ messages in thread From: Maxim Cournoyer @ 2021-07-14 2:49 UTC (permalink / raw) To: Julien Lepiller; +Cc: 39728 Hello! Julien Lepiller <julien@lepiller.eu> writes: > Hi guix! > > This patch allows to count builds and downloads separately. The idea is > that downloads need bandwidth, but no CPU, while builds do not need > bandwidth, but need CPU. With this patch, guix will be able to download > substitutes while building unrelated packages. Currently, guix needs to > wait for the download to finish before proceeding to the build. This > should reduce the time of guix commands that need to build and download > things at the same time. > > What do you think? Looks like a neat improvement! Could you provide a follow-up to Ludovic's review? It seems not much is missing (minor cosmetic changes to commit messages + code and more importantly, the accompanying documentation update). Thank you! Maxim ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-11-26 10:17 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-02-21 22:53 [bug#39728] [PATCH] Allow parallel downloads and builds Julien Lepiller 2020-02-24 21:23 ` Ludovic Courtès 2020-02-25 15:21 ` zimoun 2020-02-25 15:39 ` Julien Lepiller 2020-02-26 10:36 ` Pierre Neidhardt 2021-11-21 22:49 ` [bug#39728] [PATCH v2] " Julien Lepiller 2021-11-25 12:53 ` Ludovic Courtès 2021-11-25 13:01 ` Julien Lepiller 2021-11-26 10:16 ` Ludovic Courtès 2021-07-14 2:49 ` [bug#39728] [PATCH] " Maxim Cournoyer
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/guix.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).