From: zimoun <zimon.toutoune@gmail.com>
To: "Maxim Cournoyer" <maxim.cournoyer@gmail.com>,
	"Ludovic Courtès" <ludo@gnu.org>
Cc: 24496@debbugs.gnu.org, ng0 <ngillmann@runbox.com>
Subject: bug#24496: offloading should fall back to local build after n tries
Date: Sat, 18 Dec 2021 01:10:49 +0100
Message-ID: <86tuf6rcvq.fsf@gmail.com>
In-Reply-To: <87lf0i6gj6.fsf@gmail.com>

Hi,

I have not checked all the details: IIUC, the code of “guix offload” is
run by root, so it is not as friendly to debug as usual. :-)

On Fri, 17 Dec 2021 at 16:57, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:

>> However, I think this behavior was unintentionally lost in
>> efbf5fdd01817ea75de369e3dd2761a85f8f7dd5.  Maxim, WDYT?
>
> I just reviewed this commit, and don't see anywhere where the behavior
> would have changed.  The discarding happens here:

[...]

> previously load could be set to +inf.0.  Now it is a float between 0.0
> and 1.0, with threshold defaulting to 0.6.

My /etc/guix/machines.scm contains only one machine and --max-jobs=0.

Because the machine is unreachable, IIUC, ’node’ is (or should be) false
and ’load’ is thus not involved, I guess.  Indeed, ’report-load’
displays nothing, and instead I get:

--8<---------------cut here---------------start------------->8---
The following derivation will be built:
   /gnu/store/c1qicg17ygn1a0biq0q4mkprzy4p2x74-hello-2.10.drv
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
waiting for locks or build slots...
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
  C-c C-c
--8<---------------cut here---------------end--------------->8---


Well, if the machine is not reachable, then ’session’ is false, right?

--8<---------------cut here---------------start------------->8---
@@ -472,11 +480,15 @@ (define (machine-faster? m1 m2)
        (let* ((session (false-if-exception (open-ssh-session best
                                                              %short-timeout)))
               (node    (and session (remote-inferior session)))
-              (load    (and node (normalized-load best (node-load node))))
+              (load    (and node (node-load node)))
+              (threshold (build-machine-overload-threshold best))
               (space   (and node (node-free-disk-space node))))
+         (when load (report-load best load))
          (when node (close-inferior node))
          (when session (disconnect! session))
-         (if (and node (< load 2.) (>= space %minimum-disk-space))
+         (if (and node
+                  (or (not threshold) (< load threshold))
+                  (>= space %minimum-disk-space))
[...]
             (begin
               ;; BEST is unsuitable, so try the next one.
               (when (and space (< space %minimum-disk-space))
                 (format (current-error-port)
                         "skipping machine '~a' because it is low \
on disk space (~,2f MiB free)~%"
                         (build-machine-name best)
                         (/ space (expt 2 20) 1.)))
               (release-build-slot slot)
               (loop others)))))
--8<---------------cut here---------------end--------------->8---

Therefore, the ’else’ branch is taken and so the code does ’(loop others)’.

However, I do not understand why ’others’ is not empty (there is only
one machine in /etc/guix/machines.scm).  Well, the message «waiting for
locks or build slots...» suggests that something is restarted and that
it is not this ’loop’ we are observing but another one.
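
To make the reasoning above concrete, here is a minimal C++ model of
the suitability test from the excerpt.  It is only a sketch: the
’Machine’ struct, ’choose_machine’, and the choice of returning "no
machine" once the list is exhausted are hypothetical names and
assumptions for illustration, not the actual “guix offload” code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one entry of machines.scm as seen by the loop.
struct Machine {
    std::string name;
    bool reachable;                   // false models a failed 'open-ssh-session'
    double load;                      // only meaningful when reachable
    std::optional<double> threshold;  // nullopt models a #f overload threshold
    long long free_space;             // bytes, only meaningful when reachable
};

// Walk the candidates in order and return the first suitable one.
// Assumption: an exhausted list yields "no machine".
std::optional<std::string> choose_machine(const std::vector<Machine>& candidates,
                                          long long minimum_disk_space) {
    assert(minimum_disk_space >= 0);
    for (const Machine& best : candidates) {
        // Same shape as the (and node (or (not threshold) ...) ...) test:
        // an unreachable machine fails on the first operand, so its load
        // and disk space are never consulted.
        bool suitable = best.reachable
            && (!best.threshold || best.load < *best.threshold)
            && best.free_space >= minimum_disk_space;
        if (suitable)
            return best.name;
        // Otherwise BEST is unsuitable: fall through to the next candidate,
        // which plays the role of '(loop others)'.
    }
    return std::nullopt;
}
```

In this model, a list holding a single unreachable machine is exhausted
after one pass and the function returns no machine.  Repeated connection
attempts like the ones in the log would then have to come from a caller
invoking the selection again, not from this loop.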

On the daemon side, I do not know what ’waitingForAWhile’ and
’lastWokenUp’ mean.

--8<---------------cut here---------------start------------->8---
    /* If we are polling goals that are waiting for a lock, then wake
       up after a few seconds at most. */
    if (!waitingForAWhile.empty()) {
        useTimeout = true;
        if (lastWokenUp == 0)
            printMsg(lvlError, "waiting for locks or build slots...");
        if (lastWokenUp == 0 || lastWokenUp > before) lastWokenUp = before;
        timeout.tv_sec = std::max((time_t) 1, (time_t) (lastWokenUp + settings.pollInterval - before));
    } else lastWokenUp = 0;
--8<---------------cut here---------------end--------------->8---
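
Whatever these variables turn out to mean, the arithmetic of the excerpt
can be checked in isolation.  The following is a minimal sketch that
mirrors only the quoted lines; ’PollState’, ’next_timeout’, the value 5
for the poll interval, and returning 0 when nothing is waiting are
assumptions made for illustration, not the daemon's actual interface.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>

// Hypothetical stand-in for the two pieces of daemon state used above.
struct PollState {
    std::time_t lastWokenUp = 0;   // 0 is treated as "not polling yet"
    std::time_t pollInterval = 5;  // assumed value, in seconds
};

// Mirrors the quoted branch.  Returns the number of seconds to wait, or 0
// when nothing is waiting (a modelling choice: the excerpt does not set a
// timeout in that case).  'printed' tells whether the "waiting for locks
// or build slots..." message would be emitted on this call.
std::time_t next_timeout(PollState& s, bool someoneWaiting,
                         std::time_t before, bool& printed) {
    assert(s.pollInterval > 0);
    printed = false;
    if (!someoneWaiting) {
        s.lastWokenUp = 0;
        return 0;
    }
    if (s.lastWokenUp == 0)
        printed = true;  // only on the first pass after a reset
    if (s.lastWokenUp == 0 || s.lastWokenUp > before)
        s.lastWokenUp = before;
    // Never wait less than one second, even if the interval has elapsed.
    return std::max((std::time_t) 1,
                    (std::time_t) (s.lastWokenUp + s.pollInterval - before));
}
```

Under these assumptions, the message is emitted only when ’lastWokenUp’
is 0 on entry, and the wait shrinks from the full interval down to a
floor of one second as ’before’ advances.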


Bah, this requires more investigation, and I agree with Maxim that
efbf5fdd01817ea75de369e3dd2761a85f8f7dd5 is probably not the issue
here.

Cheers,
simon



