unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Andrew Tropin <andrew@trop.in>
To: "Marius Bakke" <marius@gnu.org>, "Ludovic Courtès" <ludo@gnu.org>
Cc: poptsov.artyom@gmail.com, 58290@debbugs.gnu.org
Subject: bug#58290: guile ssh error on guix deploy
Date: Thu, 10 Nov 2022 14:06:33 +0400	[thread overview]
Message-ID: <87h6z75djq.fsf@trop.in> (raw)
In-Reply-To: <875yfv38jt.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 9820 bytes --]

On 2022-11-04 00:47, Marius Bakke wrote:

> Ludovic Courtès <ludo@gnu.org> skriver:
>
>> Hi,
>>
>> Marius Bakke <marius@gnu.org> skribis:
>>
>>> I have the same problem.  Log messages around failure look something
>>> like this (extracted from above message):
>>>
>>>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>>>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>>>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination. 
>>>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>>>
>>> Then deploy crashes with 'Channel opening failure'.
>>
>> “no more sessions” comes from this:
>>
>> --8<---------------cut here---------------start------------->8---
>> int
>> session_open(Authctxt *authctxt, int chanid)
>> {
>> 	Session *s = session_new();
>> 	debug("session_open: channel %d", chanid);
>> 	if (s == NULL) {
>> 		error("no more sessions");
>> 		return 0;
>> 	}
>>
>> [...]
>>
>> }
>> --8<---------------cut here---------------end--------------->8---
>>
>> Would be interesting to run sshd in verbose/debug mode and see why we
>> hit that; it could be because the maximum number of sessions has been
>> reached or something.
>
> I enabled (log-level 'debug) and this is what happens server side:
>
> Nov  3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 channel 8 pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release channel 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm session
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 0 request exec reply 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 0 req exec
> Nov  3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
> Nov  3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, nchannels 5
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, nchannels 4
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, nchannels 3
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, nchannels 2
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, nchannels 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: do_cleanup
>
> I compared this with a successful deploy:
>
> Nov  3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 channel 6 pid 25842
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release channel 6
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
> Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 0 request exec reply 1
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 0 req exec
> Nov  3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
> Nov  3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
> Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 8 request exec reply 1
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 8 req exec
> Nov  3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
> Nov  3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
> Nov  3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 67 win 64000 max 32768
> Nov  3 21:46:28 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
> Nov  3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
> [... more channels and stuffs before graceful shutdown ...]
>
> I believe the two-minute window before SIGCHLD is waiting for some
> Shepherd services to time out (long story..!).
>
> In my current tests the failure always occur at the
> very end of deploy:
>
> building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
> building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
> ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: #<input-output: channel (closed) 7fdefb015e80>
>
> What's interesting is that all of the failed deploys fail to open
> "channel 10", whereas successful deploys never reach it.  All of these
> are deploying the exact same configuration and commit, so the SSH
> traffic should nearly identical.
>
> I have attached the transcripts of each session (by grepping the PIDs
> from /var/log/debug).  Grepping for session_open yields some
> inconstencies, but haven't studied them in detail yet.
>
>>> Due to the number of SSH connections made by deploy and frequent
>>> occurence of this problem, I was not able to successfully deploy through
>>> many attempts.
>>
>> Ouch.  Normally, thanks to memoization, ‘guix deploy’ opens only one
>> session per machine.  (I think it works well with the 25 local build
>> machines behind berlin; it’s also fine for a small set of machines I
>> take care of at work.)
>
> Have they been rebooted since the switch to inetd-style sshd?  It fails
> more often than not for me.
>
>>> I found that removing the memoizing open-machine-ssh-session* in
>>> (gnu machine ssh) works around it and can happily deploy again.
>>>
>>> (that is, just use 'open-machine-ssh-session' instead)
>>
>> Interesting.  That gives us a hint.  Could you add a ‘pk’ in
>> ‘open-machine-ssh-session*’ to see how many connections it opens?
>
> Indeed it only opens a single connection.  The problem seems to be
> with multiple "channels" and "sessions" over a single connection.
>

Very detailed report, thank you for digging in!

CCed Artyom Poptsov, author of Guile-SSH.

-- 
Best regards,
Andrew Tropin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2022-11-10 10:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04 11:09 bug#58290: guile ssh error on guix deploy Andrew Tropin
2022-10-08 12:39 ` Ludovic Courtès
2022-10-14  6:45   ` Andrew Tropin
2022-10-17 14:39     ` Ludovic Courtès
2022-10-17 17:27       ` Marius Bakke
2022-10-18 15:31         ` Ludovic Courtès
2022-11-03 23:47           ` Marius Bakke
2022-11-10 10:06             ` Andrew Tropin [this message]
2023-01-12 12:11             ` Ludovic Courtès
2023-01-12 12:44               ` Ludovic Courtès
2023-01-22 20:00                 ` Artyom V. Poptsov
2023-01-23 20:14                   ` Artyom V. Poptsov
2023-01-23 21:50                     ` Ludovic Courtès
2023-01-24 15:53                       ` Andrew Tropin
2023-01-30 22:53                         ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h6z75djq.fsf@trop.in \
    --to=andrew@trop.in \
    --cc=58290@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    --cc=marius@gnu.org \
    --cc=poptsov.artyom@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).