Ludovic Courtès skriver: > Hi, > > Marius Bakke skribis: > >> I have the same problem. Log messages around failure look something >> like this (extracted from above message): >> >> Oct 4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. >> Oct 4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y >> Oct 4 11:52:28 localhost sshd[252]: error: no more sessions >> Oct 4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination. >> Oct 4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. >> >> Then deploy crashes with 'Channel opening failure'. > > “no more sessions” comes from this: > > --8<---------------cut here---------------start------------->8--- > int > session_open(Authctxt *authctxt, int chanid) > { > Session *s = session_new(); > debug("session_open: channel %d", chanid); > if (s == NULL) { > error("no more sessions"); > return 0; > } > > [...] > > } > --8<---------------cut here---------------end--------------->8--- > > Would be interesting to run sshd in verbose/debug mode and see why we > hit that; it could be because the maximum number of sessions has been > reached or something. I enabled (log-level 'debug) and this is what happens server side: Nov 3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0 Nov 3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD. Nov 3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345 Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 channel 8 pid 26345 Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release channel 8 Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768 Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session] Nov 3 21:50:28 capella sshd[26115]: debug1: session_new: session 0 Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0 Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with channel 0 Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm session Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 0 request exec reply 1 Nov 3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 channel 0 Nov 3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 0 req exec Nov 3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials Nov 3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0 Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768 Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session] Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10 ["no more sessions" error occurs here, in a different log file] Nov 3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11 Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, nchannels 5 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, nchannels 4 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, nchannels 3 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, nchannels 2 Nov 3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, nchannels 1 Nov 3 21:50:28 capella sshd[26115]: debug1: do_cleanup I compared this with a successful deploy: Nov 3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0 Nov 3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD. Nov 3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842 Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 channel 6 pid 25842 Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release channel 6 Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768 Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request Nov 3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session] Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 0 Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0 Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with channel 0 Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 0 request exec reply 1 Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 channel 0 Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 0 req exec Nov 3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials Nov 3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0 Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768 Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request Nov 3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session] Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 8 Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8 Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with channel 8 Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 8 request exec reply 1 Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 channel 8 Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 8 req exec Nov 3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials Nov 3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0 Nov 3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read]) Nov 3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 67 win 64000 max 32768 Nov 3 21:46:28 capella sshd[25612]: debug1: input_session_request Nov 3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session] Nov 3 21:46:28 capella sshd[25612]: debug1: session_new: session 9 [... more channels and stuffs before graceful shutdown ...] I believe the two-minute window before SIGCHLD is waiting for some Shepherd services to time out (long story..!). In my current tests the failure always occur at the very end of deploy: building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv... building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv... ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: # What's interesting is that all of the failed deploys fail to open "channel 10", whereas successful deploys never reach it. All of these are deploying the exact same configuration and commit, so the SSH traffic should nearly identical. I have attached the transcripts of each session (by grepping the PIDs from /var/log/debug). Grepping for session_open yields some inconstencies, but haven't studied them in detail yet. >> Due to the number of SSH connections made by deploy and frequent >> occurence of this problem, I was not able to successfully deploy through >> many attempts. > > Ouch. Normally, thanks to memoization, ‘guix deploy’ opens only one > session per machine. (I think it works well with the 25 local build > machines behind berlin; it’s also fine for a small set of machines I > take care of at work.) Have they been rebooted since the switch to inetd-style sshd? It fails more often than not for me. >> I found that removing the memoizing open-machine-ssh-session* in >> (gnu machine ssh) works around it and can happily deploy again. >> >> (that is, just use 'open-machine-ssh-session' instead) > > Interesting. That gives us a hint. Could you add a ‘pk’ in > ‘open-machine-ssh-session*’ to see how many connections it opens? Indeed it only opens a single connection. The problem seems to be with multiple "channels" and "sessions" over a single connection.