unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
From: Julien Lepiller <julien@lepiller.eu>
To: Pierre Neidhardt <mail@ambrevar.xyz>
Cc: "Ludovic Courtès" <ludo@gnu.org>,
	39728@debbugs.gnu.org, zimoun <zimon.toutoune@gmail.com>
Subject: [bug#39728] [PATCH v2] Allow parallel downloads and builds
Date: Sun, 21 Nov 2021 23:49:41 +0100	[thread overview]
Message-ID: <20211121234941.32724b68@tachikoma.lepiller.eu> (raw)
In-Reply-To: <87zhd58wjz.fsf@ambrevar.xyz>

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

Le Wed, 26 Feb 2020 11:36:32 +0100,
Pierre Neidhardt <mail@ambrevar.xyz> a écrit :

> Thank you so much for this, Julien!
> 
> There was a thread on this topic on the mailing list:
> https://lists.gnu.org/archive/html/guix-devel/2019-11/msg00002.html
> (could not find the first email :p).
> 
> There were a couple other issues that were mentioned there.
> This patch would be a first step towards more parallelization!
> 

Hi!

After so long, I managed to find the time to go over the comments and
improve my patches. I tested the new daemon for a bit, and it's working
as expected so far :D

[-- Attachment #2: 0001-daemon-Count-build-and-download-jobs-separately.patch --]
[-- Type: text/x-patch, Size: 13953 bytes --]

From 3dee3844b6da1d70b883391f5b2d23e331d6a5ad Mon Sep 17 00:00:00 2001
Message-Id: <3dee3844b6da1d70b883391f5b2d23e331d6a5ad.1637534792.git.julien@lepiller.eu>
From: Julien Lepiller <julien@lepiller.eu>
Date: Sun, 21 Nov 2021 22:35:09 +0100
Subject: [PATCH 1/2] daemon: Count build and download jobs separately.

This allows us to run downloads (that take bandwith) and builds (that take
CPU time) independently from one another.

* nix/nix-daemon/guix-daemon.cc: Add a max-download-jobs option.
* nix/nix-daemon/nix-daemon.cc: Add a max-download-jobs option.
* nix/libstore/globals.hh (Settings): Add a maxDownloadJobs setting.
* nix/libstore/globals.cc (Settings::Settings): Add a default value to it.
(Settings::update): Add it.
* nix/libstore/build.cc (Slot): New enum type.
(Child): Use `Slot` enum to record the job slot of the task, if any.
(Goal): Separate build and download slot members and methods.
(DerivationGoal::tryToBuild): Check number of available slots in build
or download tasks.
(DerivationGoal::tryBuildHook, SubstitutionGoal::tryToRun): Check number
of download slots instead of build slots.
(Worker::Worker, Worker::childStarted, Worker::childTerminated):
Manage build and download slots separately.
* doc/guix.texi (Invoking guix-daemon): Document `--max-downloads`.
---
 doc/guix.texi                 |   6 ++
 nix/libstore/build.cc         | 105 ++++++++++++++++++++++++++--------
 nix/libstore/globals.cc       |   2 +
 nix/libstore/globals.hh       |   3 +
 nix/nix-daemon/guix-daemon.cc |   5 ++
 nix/nix-daemon/nix-daemon.cc  |   2 +-
 6 files changed, 97 insertions(+), 26 deletions(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 59ceb4477a..b8de53c53b 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -1619,6 +1619,12 @@ Invoking guix-daemon
 locally; instead, the daemon will offload builds (@pxref{Daemon Offload
 Setup}), or simply fail.
 
+@item --max-downloads=@var{n}
+@itemx -D @var{n}
+Allow at most @var{n} download jobs in parallel.  The default value is
+@code{1}.  Since some downloads always need to happen, setting it to
+@code{0} does not make any sense and will be ignored.
+
 @item --max-silent-time=@var{seconds}
 When the build or substitution process remains silent for more than
 @var{seconds}, terminate it and report a build failure.
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index 5697ae5a43..2fbbd81af2 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -196,6 +196,9 @@ bool CompareGoalPtrs::operator() (const GoalPtr & a, const GoalPtr & b) {
 }
 
 
+/* A list of possible slots */
+typedef enum { slotNone, slotBuild, slotDownload } Slot;
+
 /* A mapping used to remember for each child process to what goal it
    belongs, and file descriptors for receiving log data and output
    path creation commands. */
@@ -204,7 +207,7 @@ struct Child
     WeakGoalPtr goal;
     set<int> fds;
     bool respectTimeouts;
-    bool inBuildSlot;
+    Slot inSlot;
     time_t lastOutput; /* time we last got output on stdout/stderr */
     time_t timeStarted;
 };
@@ -232,10 +235,15 @@ private:
     /* Child processes currently running. */
     Children children;
 
-    /* Number of build slots occupied.  This includes local builds and
-       substitutions but not remote builds via the build hook. */
+    /* Number of build slots occupied.  This includes local builds but not
+       fixed-output derivations, substitution, built-ins and remote builds
+       via the build hook. */
     unsigned int nrLocalBuilds;
 
+    /* Number of download slots occupied.  This includes fixed-output
+       derivations, substitution and built-ins. */
+    unsigned int nrDownloads;
+
     /* Maps used to prevent multiple instantiations of a goal for the
        same derivation / path. */
     WeakGoalMap derivationGoals;
@@ -277,15 +285,18 @@ public:
     /* Wake up a goal (i.e., there is something for it to do). */
     void wakeUp(GoalPtr goal);
 
-    /* Return the number of local build and substitution processes
-       currently running (but not remote builds via the build
-       hook). */
+    /* Return the number of local build processes currently running (but
+       not remote builds via the build hook). */
     unsigned int getNrLocalBuilds();
 
-    /* Registers a running child process.  `inBuildSlot' means that
-       the process counts towards the jobs limit. */
+    /* Return the number of downloads currently running. */
+    unsigned int getNrDownloads();
+
+    /* Registers a running child process.  `inSlot' indicates whether
+       the process counts towards the jobs limit, and which one. */
     void childStarted(GoalPtr goal, pid_t pid,
-        const set<int> & fds, bool inBuildSlot, bool respectTimeouts);
+        const set<int> & fds, Slot inSlot,
+        bool respectTimeouts);
 
     /* Unregisters a running child process.  `wakeSleepers' should be
        false if there is no sense in waking up goals that are sleeping
@@ -297,6 +308,10 @@ public:
        might be right away). */
     void waitForBuildSlot(GoalPtr goal);
 
+    /* Put `goal' to sleep until a download slot becomes available (which
+       might be right away). */
+    void waitForDownloadSlot(GoalPtr goal);
+
     /* Wait for any goal to finish.  Pretty indiscriminate way to
        wait for some resource that some other goal is holding. */
     void waitForAnyGoal(GoalPtr goal);
@@ -1249,16 +1264,28 @@ void DerivationGoal::tryToBuild()
         }
     }
 
-    /* Make sure that we are allowed to start a build.  If this
-       derivation prefers to be done locally, do it even if
-       maxBuildJobs is 0. */
+    /* Make sure that, if this is a build (not a download), we are allowed to
+       start a build.  If this derivation prefers to be done locally, do it
+       even if maxBuildJobs is 0. */
     unsigned int curBuilds = worker.getNrLocalBuilds();
-    if (curBuilds >= settings.maxBuildJobs && !(buildLocally && curBuilds == 0)) {
+    if (curBuilds >= settings.maxBuildJobs && !(buildLocally && curBuilds == 0) &&
+                    !fixedOutput) {
         worker.waitForBuildSlot(shared_from_this());
         outputLocks.unlock();
         return;
     }
 
+    /* Make sure that, if this is a download (not a build), we are allowed to
+       start a download.  Even if maxDownloadJobs is 0, we always allow
+       downloads, so it is set to 1 */
+    unsigned int curDownloads = worker.getNrDownloads();
+    unsigned int maxDownloads = settings.maxDownloadJobs == 0 ? 1 : settings.maxDownloadJobs;
+    if (curDownloads >= maxDownloads && fixedOutput) {
+        worker.waitForDownloadSlot(shared_from_this());
+        outputLocks.unlock();
+        return;
+    }
+
     try {
 
         /* Okay, we have to build. */
@@ -1546,7 +1573,7 @@ HookReply DerivationGoal::tryBuildHook()
     set<int> fds;
     fds.insert(hook->fromAgent.readSide);
     fds.insert(hook->builderOut.readSide);
-    worker.childStarted(shared_from_this(), hook->pid, fds, false, true);
+    worker.childStarted(shared_from_this(), hook->pid, fds, slotNone, true);
 
     if (settings.printBuildTrace)
         printMsg(lvlError, format("@ build-started %1% - %2% %3% %4%")
@@ -1940,7 +1967,8 @@ void DerivationGoal::startBuilder()
     pid.setSeparatePG(true);
     builderOut.writeSide.close();
     worker.childStarted(shared_from_this(), pid,
-        singleton<set<int> >(builderOut.readSide), true, true);
+        singleton<set<int> >(builderOut.readSide),
+        fixedOutput? slotDownload: slotBuild, true);
 
     /* Check if setting up the build environment failed. */
     string msg = readLine(builderOut.readSide);
@@ -2979,11 +3007,11 @@ void SubstitutionGoal::tryToRun()
     trace("trying to run");
 
     /* Make sure that we are allowed to start a build.  Note that even
-       is maxBuildJobs == 0 (no local builds allowed), we still allow
-       a substituter to run.  This is because substitutions cannot be
-       distributed to another machine via the build hook. */
-    if (worker.getNrLocalBuilds() >= (settings.maxBuildJobs == 0 ? 1 : settings.maxBuildJobs)) {
-        worker.waitForBuildSlot(shared_from_this());
+       is maxDownloadJobs == 0 (no downloads allowed), we still allow
+       a substituter to run.  This is because we always need to download, so
+       not allowing is meaningless. */
+    if (worker.getNrDownloads() >= (settings.maxDownloadJobs == 0 ? 1 : settings.maxDownloadJobs)) {
+        worker.waitForDownloadSlot(shared_from_this());
         return;
     }
 
@@ -3042,7 +3070,8 @@ void SubstitutionGoal::tryToRun()
     set<int> fds;
     fds.insert(substituter->fromAgent.readSide);
     fds.insert(substituter->builderOut.readSide);
-    worker.childStarted(shared_from_this(), substituter->pid, fds, true, true);
+    worker.childStarted(shared_from_this(), substituter->pid, fds,
+                    slotDownload, true);
 
     state = &SubstitutionGoal::finished;
 
@@ -3206,6 +3235,7 @@ Worker::Worker(LocalStore & store)
     if (working) abort();
     working = true;
     nrLocalBuilds = 0;
+    nrDownloads = 0;
     lastWokenUp = 0;
     permanentFailure = false;
     timedOut = false;
@@ -3299,18 +3329,25 @@ unsigned Worker::getNrLocalBuilds()
 }
 
 
+unsigned Worker::getNrDownloads()
+{
+    return nrDownloads;
+}
+
+
 void Worker::childStarted(GoalPtr goal,
-    pid_t pid, const set<int> & fds, bool inBuildSlot,
+    pid_t pid, const set<int> & fds, Slot inSlot,
     bool respectTimeouts)
 {
     Child child;
     child.goal = goal;
     child.fds = fds;
     child.timeStarted = child.lastOutput = time(0);
-    child.inBuildSlot = inBuildSlot;
+    child.inSlot = inSlot;
     child.respectTimeouts = respectTimeouts;
     children[pid] = child;
-    if (inBuildSlot) nrLocalBuilds++;
+    if (inSlot == slotBuild) nrLocalBuilds++;
+    if (inSlot == slotDownload) nrDownloads++;
 }
 
 
@@ -3321,11 +3358,16 @@ void Worker::childTerminated(pid_t pid, bool wakeSleepers)
     Children::iterator i = children.find(pid);
     assert(i != children.end());
 
-    if (i->second.inBuildSlot) {
+    if (i->second.inSlot == slotBuild) {
         assert(nrLocalBuilds > 0);
         nrLocalBuilds--;
     }
 
+    if (i->second.inSlot == slotDownload) {
+        assert(nrDownloads > 0);
+        nrDownloads--;
+    }
+
     children.erase(pid);
 
     if (wakeSleepers) {
@@ -3351,6 +3393,19 @@ void Worker::waitForBuildSlot(GoalPtr goal)
 }
 
 
+void Worker::waitForDownloadSlot(GoalPtr goal)
+{
+    debug("wait for download slot");
+    /* Disabling all downloads does not make any sense, so we make sure we
+       can download with at least one job. */
+    unsigned int maxDownloads = settings.maxDownloadJobs == 0 ? 1 : settings.maxDownloadJobs;
+    if (getNrDownloads() < maxDownloads)
+        wakeUp(goal); /* we can do it right away */
+    else
+        addToWeakGoals(wantingToBuild, goal);
+}
+
+
 void Worker::waitForAnyGoal(GoalPtr goal)
 {
     debug("wait for any goal");
diff --git a/nix/libstore/globals.cc b/nix/libstore/globals.cc
index 0cc001fbe4..416033718d 100644
--- a/nix/libstore/globals.cc
+++ b/nix/libstore/globals.cc
@@ -29,6 +29,7 @@ Settings::Settings()
     tryFallback = false;
     buildVerbosity = lvlError;
     maxBuildJobs = 1;
+    maxDownloadJobs = 1;
     buildCores = 1;
     readOnlyMode = false;
     thisSystem = SYSTEM;
@@ -118,6 +119,7 @@ void Settings::update()
 {
     _get(tryFallback, "build-fallback");
     _get(maxBuildJobs, "build-max-jobs");
+    _get(maxDownloadJobs, "download-max-jobs");
     _get(buildCores, "build-cores");
     _get(thisSystem, "system");
     _get(multiplexedBuildOutput, "multiplexed-build-output");
diff --git a/nix/libstore/globals.hh b/nix/libstore/globals.hh
index 27616a2283..c033f8ed56 100644
--- a/nix/libstore/globals.hh
+++ b/nix/libstore/globals.hh
@@ -90,6 +90,9 @@ struct Settings {
     /* Maximum number of parallel build jobs.  0 means unlimited. */
     unsigned int maxBuildJobs;
 
+    /* Maximum number of parallel download jobs.  0 means 1. */
+    unsigned int maxDownloadJobs;
+
     /* Number of CPU cores to utilize in parallel within a build,
        i.e. by passing this number to Make via '-j'. 0 means that the
        number of actual CPU cores on the local host ought to be
diff --git a/nix/nix-daemon/guix-daemon.cc b/nix/nix-daemon/guix-daemon.cc
index 36a06a3fae..f1b60b0050 100644
--- a/nix/nix-daemon/guix-daemon.cc
+++ b/nix/nix-daemon/guix-daemon.cc
@@ -100,6 +100,8 @@ static const struct argp_option options[] =
     },
     { "max-jobs", 'M', n_("N"), 0,
       n_("allow at most N build jobs") },
+    { "max-downloads", 'D', n_("N"), 0,
+      n_("allow at most N download jobs") },
     { "timeout", GUIX_OPT_TIMEOUT, n_("SECONDS"), 0,
       n_("mark builds as failed after SECONDS of activity") },
     { "max-silent-time", GUIX_OPT_MAX_SILENT_TIME, n_("SECONDS"), 0,
@@ -285,6 +287,9 @@ parse_opt (int key, char *arg, struct argp_state *state)
     case 'M':
       settings.set ("build-max-jobs", arg);
       break;
+    case 'D':
+      settings.set ("download-max-jobs", arg);
+      break;
     case GUIX_OPT_TIMEOUT:
       settings.set ("build-timeout", arg);
       break;
diff --git a/nix/nix-daemon/nix-daemon.cc b/nix/nix-daemon/nix-daemon.cc
index 497de11a04..4b2f8bcf22 100644
--- a/nix/nix-daemon/nix-daemon.cc
+++ b/nix/nix-daemon/nix-daemon.cc
@@ -610,7 +610,7 @@ static void performOp(bool trusted, unsigned int clientVersion,
                 string value = readString(from);
                 if (name == "build-timeout" || name == "build-max-silent-time"
                     || name == "build-max-jobs" || name == "build-cores"
-                    || name == "build-repeat"
+                    || name == "build-repeat" || name == "download-max-jobs"
                     || name == "multiplexed-build-output")
                     settings.set(name, value);
 		else if (name == "user-name"
-- 
2.33.1


[-- Attachment #3: 0002-guix-Support-specifying-max-download-jobs.patch --]
[-- Type: text/x-patch, Size: 4127 bytes --]

From d8b39af21d97dc3f0d3057ceb7ba91a93ff8d3ec Mon Sep 17 00:00:00 2001
Message-Id: <d8b39af21d97dc3f0d3057ceb7ba91a93ff8d3ec.1637534792.git.julien@lepiller.eu>
In-Reply-To: <3dee3844b6da1d70b883391f5b2d23e331d6a5ad.1637534792.git.julien@lepiller.eu>
References: <3dee3844b6da1d70b883391f5b2d23e331d6a5ad.1637534792.git.julien@lepiller.eu>
From: Julien Lepiller <julien@lepiller.eu>
Date: Sun, 21 Nov 2021 23:26:11 +0100
Subject: [PATCH 2/2] guix: Support specifying max download jobs.

* guix/store.scm (set-build-options): Optionally send max-download-jobs.
* guix/scripts/build.scm (set-build-options-from-command-line)
(%standard-build-options): Support `--max-downloads`
* doc/guix.texi (Common Build Options): Document it.
---
 doc/guix.texi          | 6 ++++++
 guix/scripts/build.scm | 8 ++++++++
 guix/store.scm         | 5 +++++
 3 files changed, 19 insertions(+)

diff --git a/doc/guix.texi b/doc/guix.texi
index b8de53c53b..0afefbd416 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -10846,6 +10846,12 @@ Common Build Options
 guix-daemon, @option{--max-jobs}}, for details about this option and the
 equivalent @command{guix-daemon} option.
 
+@item --max-downloads=@var{n}
+@itemx -D @var{n}
+Allow at most @var{n} download jobs in parallel.  @xref{Invoking
+guix-daemon, @option{--max-downloads}}, for details about this option and the
+equivalent @command{guix-daemon} option.
+
 @item --debug=@var{level}
 Produce debugging output coming from the build daemon.  @var{level} must be an
 integer between 0 and 5; higher means more verbose output.  Setting a level of
diff --git a/guix/scripts/build.scm b/guix/scripts/build.scm
index 97e2f5a167..0d0199eccd 100644
--- a/guix/scripts/build.scm
+++ b/guix/scripts/build.scm
@@ -207,6 +207,7 @@ (define (set-build-options-from-command-line store opts)
                      #:rounds (assoc-ref opts 'rounds)
                      #:build-cores (assoc-ref opts 'cores)
                      #:max-build-jobs (assoc-ref opts 'max-jobs)
+                     #:max-download-jobs (assoc-ref opts 'max-downloads)
                      #:fallback? (assoc-ref opts 'fallback?)
                      #:use-substitutes? (assoc-ref opts 'substitutes?)
                      #:substitute-urls (assoc-ref opts 'substitute-urls)
@@ -316,6 +317,13 @@ (define %standard-build-options
                   (let ((c (false-if-exception (string->number arg))))
                     (if c
                         (apply values (alist-cons 'max-jobs c result) rest)
+                        (leave (G_ "not a number: '~a' option argument: ~a~%")
+                               name arg)))))
+        (option '(#\D "max-downloads") #t #f
+                (lambda (opt name arg result . rest)
+                  (let ((c (false-if-exception (string->number arg))))
+                    (if c
+                        (apply values (alist-cons 'max-downloads c result) rest)
                         (leave (G_ "not a number: '~a' option argument: ~a~%")
                                name arg)))))))
 
diff --git a/guix/store.scm b/guix/store.scm
index 7388953d15..6b5b9262b1 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -803,6 +803,7 @@ (define* (set-build-options server
                             (verbosity 0)
                             rounds                ;number of build rounds
                             max-build-jobs
+                            max-download-jobs
                             timeout
                             max-silent-time
                             (offload? #t)
@@ -896,6 +897,10 @@ (define* (set-build-options server
                            `(("build-max-jobs"
                               . ,(number->string max-build-jobs)))
                            '())
+                     ,@(if max-download-jobs
+                           `(("download-max-jobs"
+                              . ,(number->string max-download-jobs)))
+                           '())
                      ,@(if build-cores
                            `(("build-cores" . ,(number->string build-cores)))
                            '())
-- 
2.33.1


  reply	other threads:[~2021-11-21 22:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-21 22:53 [bug#39728] [PATCH] Allow parallel downloads and builds Julien Lepiller
2020-02-24 21:23 ` Ludovic Courtès
2020-02-25 15:21   ` zimoun
2020-02-25 15:39     ` Julien Lepiller
2020-02-26 10:36       ` Pierre Neidhardt
2021-11-21 22:49         ` Julien Lepiller [this message]
2021-11-25 12:53           ` [bug#39728] [PATCH v2] " Ludovic Courtès
2021-11-25 13:01             ` Julien Lepiller
2021-11-26 10:16               ` Ludovic Courtès
2021-07-14  2:49 ` [bug#39728] [PATCH] " Maxim Cournoyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211121234941.32724b68@tachikoma.lepiller.eu \
    --to=julien@lepiller.eu \
    --cc=39728@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    --cc=mail@ambrevar.xyz \
    --cc=zimon.toutoune@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).