From: zimoun <zimon.toutoune@gmail.com>
To: "Maxim Cournoyer" <maxim.cournoyer@gmail.com>,
"Ludovic Courtès" <ludo@gnu.org>
Cc: 24496@debbugs.gnu.org, ng0 <ngillmann@runbox.com>
Subject: bug#24496: offloading should fall back to local build after n tries
Date: Sat, 18 Dec 2021 01:10:49 +0100
Message-ID: <86tuf6rcvq.fsf@gmail.com>
In-Reply-To: <87lf0i6gj6.fsf@gmail.com>
Hi,
I have not checked all the details since, IIUC, the code of “guix offload”
is run by root and so it is not as friendly as usual to debug. :-)
On Fri, 17 Dec 2021 at 16:57, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:
>> However, I think this behavior was unintentionally lost in
>> efbf5fdd01817ea75de369e3dd2761a85f8f7dd5. Maxim, WDYT?
>
> I just reviewed this commit, and don't see anywhere where the behavior
> would have changed. The discarding happens here:
[...]
> previously load could be set to +inf.0. Now it is a float between 0.0
> and 1.0, with threshold defaulting to 0.6.
My /etc/guix/machines.scm contains only one machine and --max-jobs=0.
Because the machine is unreachable, IIUC, ’node’ is (or should be) false
and ’load’ is thus not involved, I guess. Indeed, ’report-load’
displays nothing, and instead I get:
--8<---------------cut here---------------start------------->8---
The following derivation will be built:
/gnu/store/c1qicg17ygn1a0biq0q4mkprzy4p2x74-hello-2.10.drv
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
waiting for locks or build slots...
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
C-c C-c
--8<---------------cut here---------------end--------------->8---
Well, if the machine is not reachable, then ’session’ is false, right?
--8<---------------cut here---------------start------------->8---
@@ -472,11 +480,15 @@ (define (machine-faster? m1 m2)
            (let* ((session (false-if-exception (open-ssh-session best
                                                                  %short-timeout)))
                   (node (and session (remote-inferior session)))
-                  (load (and node (normalized-load best (node-load node))))
+                  (load (and node (node-load node)))
+                  (threshold (build-machine-overload-threshold best))
                   (space (and node (node-free-disk-space node))))
+             (when load (report-load best load))
              (when node (close-inferior node))
              (when session (disconnect! session))
-             (if (and node (< load 2.) (>= space %minimum-disk-space))
+             (if (and node
+                      (or (not threshold) (< load threshold))
+                      (>= space %minimum-disk-space))
[...]
                  (begin
                    ;; BEST is unsuitable, so try the next one.
                    (when (and space (< space %minimum-disk-space))
                      (format (current-error-port)
                              "skipping machine '~a' because it is low \
on disk space (~,2f MiB free)~%"
                              (build-machine-name best)
                              (/ space (expt 2 20) 1.)))
                    (release-build-slot slot)
                    (loop others)))))
--8<---------------cut here---------------end--------------->8---
Therefore, the ’else’ branch is taken and so the code does ’(loop others)’.
However, I do not see why ’others’ would not be empty (there is only one
machine in /etc/guix/machines.scm). Well, the message «waiting for locks or
build slots...» suggests that something is restarted, and that it is not
this ’loop’ we are observing but another one.
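To make the reasoning above concrete, here is a minimal sketch of the selection loop, written in C++ only so that it is easy to compile and test. It is not the Guix code: the `Machine` struct, `chooseMachine`, `kMinimumDiskSpace` and its value are all hypothetical, and returning -1 when the list is exhausted is an assumption about what happens once ’others’ is empty.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of one machines.scm entry together with
// the result of probing it; none of these names come from Guix.
struct Machine {
    std::string name;
    bool reachable;     // false models a failed 'open-ssh-session'
    double load;        // meaningful only when reachable
    bool hasThreshold;  // false models a threshold of #f
    double threshold;   // overload threshold
    double freeSpace;   // bytes free; meaningful only when reachable
};

// Assumed value, for illustration only.
const double kMinimumDiskSpace = 100.0 * (1 << 20);

// Returns the index of the first suitable machine, or -1 once the list is
// exhausted.  Mirrors the shape of the 'if' in the excerpt: an unreachable
// machine fails the 'node' test before load or space are consulted.
int chooseMachine(const std::vector<Machine> &machines) {
    for (std::size_t i = 0; i < machines.size(); ++i) {
        const Machine &best = machines[i];
        bool node = best.reachable;
        if (node
            && (!best.hasThreshold || best.load < best.threshold)
            && best.freeSpace >= kMinimumDiskSpace)
            return static_cast<int>(i);
        // BEST is unsuitable, so try the next one ('loop others').
    }
    return -1;
}
```

With a single unreachable machine this loop runs exactly one iteration and gives up, which is consistent with the guess that the repeated connection attempts in the log come from a retry somewhere else.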
On the daemon side, I do not know what ’waitingForAWhile’ and
’lastWokenUp’ mean.
--8<---------------cut here---------------start------------->8---
    /* If we are polling goals that are waiting for a lock, then wake
       up after a few seconds at most. */
    if (!waitingForAWhile.empty()) {
        useTimeout = true;
        if (lastWokenUp == 0)
            printMsg(lvlError, "waiting for locks or build slots...");
        if (lastWokenUp == 0 || lastWokenUp > before) lastWokenUp = before;
        timeout.tv_sec = std::max((time_t) 1, (time_t) (lastWokenUp + settings.pollInterval - before));
    } else lastWokenUp = 0;
--8<---------------cut here---------------end--------------->8---
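Reading only the excerpt above, the branch can be isolated as a small function to see what it computes. The sketch below is an illustration and not the daemon code: `PollState`, `PollResult` and `pollStep` are hypothetical names, `waiting` stands for `!waitingForAWhile.empty()`, and the poll interval is passed as a parameter instead of being read from `settings.pollInterval`.

```cpp
#include <algorithm>
#include <ctime>

// Hypothetical stand-in for the daemon state used by the excerpt.
struct PollState {
    time_t lastWokenUp = 0;  // 0 means "not currently polling"
};

struct PollResult {
    bool printedMessage;     // would "waiting for locks or build slots..." be printed?
    bool useTimeout;         // does this branch request a timeout?
    time_t timeoutSeconds;   // meaningful only when useTimeout is true
};

// Mirrors the excerpt: `before` is the current time.
PollResult pollStep(PollState &state, bool waiting, time_t before,
                    time_t pollInterval) {
    if (!waiting) {
        state.lastWokenUp = 0;
        return {false, false, 0};
    }
    // The message is printed only at the start of a polling period.
    bool printed = (state.lastWokenUp == 0);
    if (state.lastWokenUp == 0 || state.lastWokenUp > before)
        state.lastWokenUp = before;
    // Sleep until lastWokenUp + pollInterval, but for at least one second.
    time_t timeout =
        std::max<time_t>(1, state.lastWokenUp + pollInterval - before);
    return {printed, true, timeout};
}
```

Under this reading, the message is emitted once per polling period and stays silent while ’lastWokenUp’ remains non-zero, which would match the log showing it only once. Where ’lastWokenUp’ is advanced after goals are actually woken up is outside the excerpt.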
Bah, it requires more investigation, and I agree with Maxim that
efbf5fdd01817ea75de369e3dd2761a85f8f7dd5 is probably not the issue
here.
Cheers,
simon