From: "Ludovic Courtès" <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: 41625@debbugs.gnu.org
Subject: bug#41625: [PATCH] offload: Handle a possible EOF response from read-repl-response.
Date: Tue, 25 May 2021 22:27:02 +0200
Message-ID: <875yz61rvt.fsf@gnu.org>
In-Reply-To: <20210525155003.27590-1-maxim.cournoyer@gmail.com> (Maxim Cournoyer's message of "Tue, 25 May 2021 11:50:03 -0400")

Hi!

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/41625>.
>
> * guix/scripts/offload.scm (check-machine-availability): Refactor so that it
> takes a single machine object, to allow for retrying a single machine. Handle
> the case where the checks raised an exception due to the connection to the
> build machine having been lost, and retry up to 3 times. Ensure the cleanup
> code is run in all situations.
> (check-machines-availability): New procedure. Call
> CHECK-MACHINE-AVAILABILITY in parallel, which improves performance (about
> twice as fast with 4 build machines, from ~30 s to ~15 s).
> * guix/inferior.scm (&inferior-connection-lost): New condition type.
> (read-repl-response): Raise a condition of the above type when reading EOF
> from the build machine's port.
[...]
> +(define-condition-type &inferior-connection-lost &error
> +  inferior-connection-lost?)
> +
>  (define* (read-repl-response port #:optional inferior)
>    "Read a (guix repl) response from PORT and return it as a Scheme object.
>  Raise '&inferior-exception' when an exception is read from PORT."
> @@ -241,6 +246,10 @@ Raise '&inferior-exception' when an exception is read from PORT."
>    (match (read port)
>      (('values objects ...)
>       (apply values (map sexp->object objects)))
> +    ;; Unexpectedly read EOF from the port. This can happen for example when
> +    ;; the underlying connection for PORT was lost with Guile-SSH.
> +    (? eof-object?
> +       (raise (condition (&inferior-connection-lost))))

The match clause syntax is incorrect; it should be:

  ((? eof-object?)
   (raise …))

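To make the difference concrete, here is a minimal, self-contained sketch of the corrected clause shape, assuming plain Guile with (ice-9 match). The procedure name `classify-response` and the symbols it returns are hypothetical and exist only for illustration; they are not part of the patch or of (guix inferior).

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper, not from the patch: dispatch on a datum the way
;; 'read-repl-response' dispatches on what it reads from the port.
(define (classify-response datum)
  (match datum
    (('values objects ...)
     (list 'values objects))
    ;; The pattern is '(? PREDICATE)', wrapped in its own parentheses;
    ;; the clause body comes after the pattern.
    ((? eof-object?)
     'connection-lost)
    (_
     'unknown)))

;; A well-formed response takes the first clause.
(classify-response '(values 1 2))

;; Reading from an empty string port yields the EOF object, which takes
;; the second clause.
(classify-response (read (open-input-string "")))
```

In the real procedure the body of the EOF clause would raise the new condition instead of returning a symbol, as in the patch.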
> +    (info (G_ "Testing ~a build machines defined in '~a'...~%")
>            (length machines) machine-file)
> -    (let* ((names (map build-machine-name machines))
> -           (sockets (map build-machine-daemon-socket machines))
> -           (sessions (map (cut open-ssh-session <> %short-timeout) machines))
> -           (nodes (map remote-inferior sessions)))
> -      (for-each assert-node-has-guix nodes names)
> -      (for-each assert-node-repl nodes names)
> -      (for-each assert-node-can-import sessions nodes names sockets)
> -      (for-each assert-node-can-export sessions nodes names sockets)
> -      (for-each close-inferior nodes)
> -      (for-each disconnect! sessions))))
> +    (par-for-each check-machine-availability machines)))

Why not! IMO this should go in a separate patch, though, since it’s not
related.

> +(define (check-machine-availability machine)
> +  "Check whether MACHINE is available. Exit with an error upon failure."
> +  ;; Sometimes, the machine remote port may return EOF, presumably because the
> +  ;; connection was lost. Retry up to 3 times.
> +  (let loop ((retries 3))
> +    (guard (c ((inferior-connection-lost? c)
> +               (let ((retries-left (1- retries)))
> +                 (if (> retries-left 0)
> +                     (begin
> +                       (format (current-error-port)
> +                               (G_ "connection to machine ~s lost; retrying~%")
> +                               (build-machine-name machine))
> +                       (loop (retries-left)))
> +                     (leave (G_ "connection repeatedly lost with machine '~a'~%")
> +                            (build-machine-name machine))))))

I’m afraid we’re papering over problems here.

Is running ‘guix offload test /etc/guix/machines.scm overdrive1’ on
berlin enough to reproduce the issue? If so, we could monitor/strace
sshd on overdrive1 to get a better understanding of what’s going on.

WDYT?

Thanks,
Ludo’.