unofficial mirror of bug-guix@gnu.org 
From: "Ludovic Courtès" <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: 41625@debbugs.gnu.org
Subject: bug#41625: Sporadic guix-offload crashes due to EOF errors
Date: Mon, 05 Jul 2021 10:57:07 +0200
Message-ID: <878s2lw330.fsf_-_@gnu.org>
In-Reply-To: <87r1hsjkbv.fsf_-_@gmail.com> (Maxim Cournoyer's message of "Thu,  27 May 2021 10:57:24 -0400")

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Now that I have root access to overdrive1, I could strace the sshd
> process (I just did 'strace -p340', noting the process of sshd displayed
> with 'herd status sshd'):
>
> pselect6(87, [3 4], NULL, NULL, NULL, NULL) = 1 (in [3])
> accept(3, {sa_family=AF_INET, sin_port=htons(33262), sin_addr=inet_addr("66.158.152.121")}, [128->16]) = 5
> fcntl(5, F_GETFL)                       = 0x2 (flags O_RDWR)
> pipe2([6, 7], 0)                        = 0
> socketpair(AF_UNIX, SOCK_STREAM, 0, [8, 9]) = 0
> clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xffff8e0ef0e0) = 644
> close(7)                                = 0
> close(9)                                = 0
> write(8, "\0\0\1\245\0", 5)             = 5
> write(8, "\0\0\1\234\nPort 22\nPermitRootLogin no\n"..., 420) = 420
> close(8)                                = 0
> close(5)                                = 0
> getpid()                                = 340
> getpid()                                = 340
> getpid()                                = 340
> getpid()                                = 340
> getpid()                                = 340
> getpid()                                = 340
> getpid()                                = 340
> pselect6(87, [3 4 6], NULL, NULL, NULL, NULL) = 1 (in [6])
> read(6, "\0", 1)                        = 1
> pselect6(87, [3 4 6], NULL, NULL, NULL, NULL) = 1 (in [6])
> read(6, "", 1)                          = 0

OK, so it looks as if the client disconnected right away.  Hard to tell
exactly what happened.  :-/  Perhaps turning on libssh debugging on the
client side could help (by uncommenting “#:log-verbosity 'protocol” in
(guix ssh)).
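
For readers less used to strace output: the final ‘read(6, "", 1) = 0’ in
the trace is a zero-length read, which is how a closed peer shows up on a
pipe or socket.  A minimal Python sketch of that behaviour follows; the
pipe merely stands in for the descriptors in the trace, and the helper
name is hypothetical, not anything from sshd or Guix.

```python
import os

def read_until_eof(fd):
    """Read one byte at a time, as in the trace, until read() returns b''."""
    chunks = []
    while True:
        data = os.read(fd, 1)
        if data == b"":          # zero-length read: the write end was closed
            return chunks
        chunks.append(data)

# Assumption for illustration: a plain pipe plays the role of the
# descriptor pair seen in the trace.
read_end, write_end = os.pipe()
os.write(write_end, b"\0")       # the single byte seen in the first read
os.close(write_end)              # the peer goes away
received = read_until_eof(read_end)
os.close(read_end)
print(received)
```

The sketch only shows that the other end closed; like the trace, it says
nothing about why.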

>>From c7b2ec1c58adf8c795df0a6aaf075dbc331f41e8 Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
> Date: Thu, 27 May 2021 08:44:44 -0400
> Subject: [PATCH 1/2] offload: Parallelize machine check in offload test.
>
> * guix/scripts/offload.scm (check-machine-availability): Refactor so that it
> takes a single machine object.  Ensure the cleanup code is always run.
> (check-machines-availability): New procedure.  Call
> CHECK-MACHINES-AVAILABILITY in parallel, which improves performance (about
> twice as fast with 4 build machines, from ~30 s to ~15 s).

I remain wary of this change: it could lead to subtle non-deterministic
bugs (the kind that keep you busy for weeks), I personally don’t mind if
‘guix offload test’ takes 30s on berlin, and the intermingled output may
make diagnostics less clear.
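
To make the output concern concrete, here is a rough Python sketch (not
the Guile code from the patch) of one way parallel checks can keep their
output readable: each check returns its log lines instead of printing
them, and the caller prints them machine by machine.  The machine names
and both function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def check_machine(name):
    """Hypothetical check: returns its log lines instead of printing them."""
    return [f"{name}: connecting", f"{name}: ok"]

def check_machines(names):
    # pool.map yields results in input order, even when the checks
    # themselves finish in a different order.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(check_machine, names))

machines = ["machine-a", "machine-b", "machine-c"]
logs = check_machines(machines)
for lines in logs:
    print("\n".join(lines))
```

This only addresses interleaved output; it does nothing about the risk of
non-deterministic bugs in the checks themselves.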

>>From b5558777617e4674a150895458d57d202de56120 Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
> Date: Tue, 25 May 2021 08:42:06 -0400
> Subject: [PATCH 2/2] offload: Handle a possible EOF response from
>  read-repl-response.
>
> Partially fixes <https://issues.guix.gnu.org/41625>.
>
> * guix/scripts/offload.scm (check-machine-availability): Handle the case where
> the checks raised an exception due to receiving EOF prematurely, and retry up
> to 3 times.
> * guix/inferior.scm (&inferior-premature-eof): New condition type.
> (read-repl-response): Raise a condition of the above type when reading EOF
> from the build machine's port.

LGTM!
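
For reference, the retry-on-EOF pattern the commit message describes can
be sketched in Python as below.  This is an illustration, not the patch:
‘PrematureEOF’ is a hypothetical stand-in for &inferior-premature-eof,
‘with_retries’ and ‘flaky_check’ are invented names, and reading “up to 3
times” as three attempts in total is an assumption.

```python
class PrematureEOF(Exception):
    """Hypothetical stand-in for the &inferior-premature-eof condition."""

def with_retries(check, max_attempts=3):
    """Call check(); on PrematureEOF, try again, max_attempts in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return check()
        except PrematureEOF:
            if attempt == max_attempts:
                raise            # give up and let the caller see the error

# Usage: a check that hits EOF twice, then succeeds.
calls = {"count": 0}

def flaky_check():
    calls["count"] += 1
    if calls["count"] < 3:
        raise PrematureEOF("EOF while reading the REPL response")
    return "ok"

result = with_retries(flaky_check)
print(result, calls["count"])
```

Only the dedicated exception type is caught, so any other failure still
propagates immediately.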

Thanks,
Ludo’.



