unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Marius Bakke <marius@gnu.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 58290@debbugs.gnu.org, Andrew Tropin <andrew@trop.in>
Subject: bug#58290: guile ssh error on guix deploy
Date: Fri, 04 Nov 2022 00:47:02 +0100	[thread overview]
Message-ID: <875yfv38jt.fsf@gnu.org> (raw)
In-Reply-To: <87y1tdkv1d.fsf@gnu.org>


[-- Attachment #1.1: Type: text/plain, Size: 9393 bytes --]

Ludovic Courtès <ludo@gnu.org> skriver:

> Hi,
>
> Marius Bakke <marius@gnu.org> skribis:
>
>> I have the same problem.  Log messages around failure look something
>> like this (extracted from above message):
>>
>>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination. 
>>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>>
>> Then deploy crashes with 'Channel opening failure'.
>
> “no more sessions” comes from this:
>
> --8<---------------cut here---------------start------------->8---
> int
> session_open(Authctxt *authctxt, int chanid)
> {
> 	Session *s = session_new();
> 	debug("session_open: channel %d", chanid);
> 	if (s == NULL) {
> 		error("no more sessions");
> 		return 0;
> 	}
>
> [...]
>
> }
> --8<---------------cut here---------------end--------------->8---
>
> Would be interesting to run sshd in verbose/debug mode and see why we
> hit that; it could be because the maximum number of sessions has been
> reached or something.

I enabled (log-level 'debug) and this is what happens server side:

Nov  3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
Nov  3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
Nov  3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 channel 8 pid 26345
Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release channel 8
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
Nov  3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm session
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 0 request exec reply 1
Nov  3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 0 req exec
Nov  3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
Nov  3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
["no more sessions" error occurs here, in a different log file]
Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, nchannels 5
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, nchannels 4
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, nchannels 3
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, nchannels 2
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, nchannels 1
Nov  3 21:50:28 capella sshd[26115]: debug1: do_cleanup

I compared this with a successful deploy:

Nov  3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
Nov  3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
Nov  3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 channel 6 pid 25842
Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release channel 6
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
Nov  3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with channel 0
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 0 request exec reply 1
Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 channel 0
Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 0 req exec
Nov  3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
Nov  3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
Nov  3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with channel 8
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 8 request exec reply 1
Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 channel 8
Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 8 req exec
Nov  3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
Nov  3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
Nov  3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
Nov  3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 67 win 64000 max 32768
Nov  3 21:46:28 capella sshd[25612]: debug1: input_session_request
Nov  3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
Nov  3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
[... more channels and stuffs before graceful shutdown ...]

I believe the two-minute window before SIGCHLD is waiting for some
Shepherd services to time out (long story..!).

In my current tests the failure always occur at the
very end of deploy:

building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: #<input-output: channel (closed) 7fdefb015e80>

What's interesting is that all of the failed deploys fail to open
"channel 10", whereas successful deploys never reach it.  All of these
are deploying the exact same configuration and commit, so the SSH
traffic should nearly identical.

I have attached the transcripts of each session (by grepping the PIDs
from /var/log/debug).  Grepping for session_open yields some
inconstencies, but haven't studied them in detail yet.

>> Due to the number of SSH connections made by deploy and frequent
>> occurence of this problem, I was not able to successfully deploy through
>> many attempts.
>
> Ouch.  Normally, thanks to memoization, ‘guix deploy’ opens only one
> session per machine.  (I think it works well with the 25 local build
> machines behind berlin; it’s also fine for a small set of machines I
> take care of at work.)

Have they been rebooted since the switch to inetd-style sshd?  It fails
more often than not for me.

>> I found that removing the memoizing open-machine-ssh-session* in
>> (gnu machine ssh) works around it and can happily deploy again.
>>
>> (that is, just use 'open-machine-ssh-session' instead)
>
> Interesting.  That gives us a hint.  Could you add a ‘pk’ in
> ‘open-machine-ssh-session*’ to see how many connections it opens?

Indeed it only opens a single connection.  The problem seems to be
with multiple "channels" and "sessions" over a single connection.


[-- Attachment #1.2: deploy-fail.log.gz --]
[-- Type: application/gzip, Size: 3684 bytes --]

[-- Attachment #1.3: deploy-success.log.gz --]
[-- Type: application/gzip, Size: 3943 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]

  reply	other threads:[~2022-11-03 23:48 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04 11:09 bug#58290: guile ssh error on guix deploy Andrew Tropin
2022-10-08 12:39 ` Ludovic Courtès
2022-10-14  6:45   ` Andrew Tropin
2022-10-17 14:39     ` Ludovic Courtès
2022-10-17 17:27       ` Marius Bakke
2022-10-18 15:31         ` Ludovic Courtès
2022-11-03 23:47           ` Marius Bakke [this message]
2022-11-10 10:06             ` Andrew Tropin
2023-01-12 12:11             ` Ludovic Courtès
2023-01-12 12:44               ` Ludovic Courtès
2023-01-22 20:00                 ` Artyom V. Poptsov
2023-01-23 20:14                   ` Artyom V. Poptsov
2023-01-23 21:50                     ` Ludovic Courtès
2023-01-24 15:53                       ` Andrew Tropin
2023-01-30 22:53                         ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875yfv38jt.fsf@gnu.org \
    --to=marius@gnu.org \
    --cc=58290@debbugs.gnu.org \
    --cc=andrew@trop.in \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).