all messages for Guix-related lists mirrored at yhetil.org
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 41625@debbugs.gnu.org
Subject: bug#41625: Sporadic guix-offload crashes due to EOF errors
Date: Fri, 24 Sep 2021 00:53:22 -0400
Message-ID: <87czoyk20t.fsf@gmail.com>
In-Reply-To: <878s2lw330.fsf_-_@gnu.org> ("Ludovic Courtès"'s message of "Mon, 05 Jul 2021 10:57:07 +0200")

Hello!

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Now that I have root access to overdrive1, I could strace the sshd
>> process (I just did 'strace -p340', using the PID of sshd as displayed
>> by 'herd status sshd'):
>>
>> pselect6(87, [3 4], NULL, NULL, NULL, NULL) = 1 (in [3])
>> accept(3, {sa_family=AF_INET, sin_port=htons(33262), sin_addr=inet_addr("66.158.152.121")}, [128->16]) = 5
>> fcntl(5, F_GETFL)                       = 0x2 (flags O_RDWR)
>> pipe2([6, 7], 0)                        = 0
>> socketpair(AF_UNIX, SOCK_STREAM, 0, [8, 9]) = 0
>> clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xffff8e0ef0e0) = 644
>> close(7)                                = 0
>> close(9)                                = 0
>> write(8, "\0\0\1\245\0", 5)             = 5
>> write(8, "\0\0\1\234\nPort 22\nPermitRootLogin no\n"..., 420) = 420
>> close(8)                                = 0
>> close(5)                                = 0
>> getpid()                                = 340
>> getpid()                                = 340
>> getpid()                                = 340
>> getpid()                                = 340
>> getpid()                                = 340
>> getpid()                                = 340
>> getpid()                                = 340
>> pselect6(87, [3 4 6], NULL, NULL, NULL, NULL) = 1 (in [6])
>> read(6, "\0", 1)                        = 1
>> pselect6(87, [3 4 6], NULL, NULL, NULL, NULL) = 1 (in [6])
>> read(6, "", 1)                          = 0
>
> OK, so it looks as if the client disconnected right away.  Hard to tell
> exactly why that happened.  :-/  Perhaps turning on libssh debugging on
> the client side could help (by uncommenting “#:log-verbosity 'protocol”
> in (guix ssh)).

I was able to better understand the problem after encountering it on
another low-power ARM board.  It is caused by the guile-ssh/libssh
timeout, which makes a channel read return EOF.

In one example here, it hangs at the (inferior-eval '(use-modules (gnu))
result) step: Guix runs for about 1 min 30 s there, apparently loading
all the package modules.  Perhaps my GUILE_LOAD_COMPILED_PATH is not set
correctly and things are slower than expected; I'm not sure.

What happens is that no output arrives before the 15 s timeout we set
for the SSH session elapses, so libssh's ssh_channel_read returns 0,
which is the same value it returns when it encounters EOF.  Guile's
peek_byte_or_eof then turns that zero into an EOF.  I've shared my
analysis on the guile-ssh bug tracker [0].

So information is lost at the libssh level, which is not so nice.
Still, now that we know exactly how that EOF comes into the picture, we
can handle it and produce better diagnostics.  I'll try reworking my
original patch in that direction.

Thanks,

Maxim

[0]  https://github.com/artyom-poptsov/guile-ssh/issues/29



Thread overview: 19+ messages
2020-05-31  9:51 bug#41625: Sporadic guix-offload crashes due to EOF errors Marius Bakke
2020-05-31 10:12 ` Marius Bakke
2020-05-31 11:21   ` Marius Bakke
2020-06-04 12:05     ` Ludovic Courtès
2021-05-24  5:33       ` Maxim Cournoyer
2021-05-25 15:50         ` bug#41625: [PATCH] offload: Handle a possible EOF response from read-repl-response Maxim Cournoyer
2021-05-25 20:27           ` Ludovic Courtès
2021-05-26  3:18             ` bug#41625: [PATCH v2] " Maxim Cournoyer
2021-05-26  9:14               ` Ludovic Courtès
2021-05-27 11:49                 ` Maxim Cournoyer
2021-05-27 14:57                 ` bug#41625: [PATCH v3] " Maxim Cournoyer
2021-07-05  8:57                   ` bug#41625: Sporadic guix-offload crashes due to EOF errors Ludovic Courtès
2021-09-24  4:53                     ` Maxim Cournoyer [this message]
2021-09-24  4:55                     ` Maxim Cournoyer
2021-05-27 17:20                 ` bug#41625: [PATCH v2] offload: Handle a possible EOF response from read-repl-response Maxim Cournoyer
2021-05-29 19:24                   ` Ludovic Courtès
2021-05-26 15:48               ` Marius Bakke
2021-05-27 11:51                 ` Maxim Cournoyer
2022-03-26  5:03                   ` bug#41625: Sporadic guix-offload crashes due to EOF errors Maxim Cournoyer
