On 2022-11-04 00:47, Marius Bakke wrote: > Ludovic Courtès skriver: > >> Hi, >> >> Marius Bakke skribis: >> >>> I have the same problem. Log messages around failure look something >>> like this (extracted from above message): >>> >>> Oct 4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. >>> Oct 4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y >>> Oct 4 11:52:28 localhost sshd[252]: error: no more sessions >>> Oct 4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination. >>> Oct 4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. >>> >>> Then deploy crashes with 'Channel opening failure'. >> >> “no more sessions” comes from this: >> >> --8<---------------cut here---------------start------------->8--- >> int >> session_open(Authctxt *authctxt, int chanid) >> { >> Session *s = session_new(); >> debug("session_open: channel %d", chanid); >> if (s == NULL) { >> error("no more sessions"); >> return 0; >> } >> >> [...] >> >> } >> --8<---------------cut here---------------end--------------->8--- >> >> Would be interesting to run sshd in verbose/debug mode and see why we >> hit that; it could be because the maximum number of sessions has been >> reached or something. > > I enabled (log-level 'debug) and this is what happens server side: > > Nov 3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0 > Nov 3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD. > Nov 3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345 > Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 channel 8 pid 26345 > Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release channel 8 > Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768 > Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session] > Nov 3 21:50:28 capella sshd[26115]: debug1: session_new: session 0 > Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0 > Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with channel 0 > Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm session > Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 0 request exec reply 1 > Nov 3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 channel 0 > Nov 3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 0 req exec > Nov 3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials > Nov 3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0 > Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768 > Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session] > Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10 > ["no more sessions" error occurs here, in a different log file] > Nov 3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11 > Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, nchannels 5 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, nchannels 4 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, nchannels 3 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, nchannels 2 > Nov 3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, nchannels 1 > Nov 3 21:50:28 capella sshd[26115]: debug1: do_cleanup > > I compared this with a successful deploy: > > Nov 3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0 > Nov 3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD. > Nov 3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842 > Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 channel 6 pid 25842 > Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release channel 6 > Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768 > Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request > Nov 3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session] > Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 0 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with channel 0 > Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session > Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 0 request exec reply 1 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 channel 0 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 0 req exec > Nov 3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials > Nov 3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0 > Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768 > Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request > Nov 3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session] > Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 8 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with channel 8 > Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session > Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 8 request exec reply 1 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 channel 8 > Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 8 req exec > Nov 3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials > Nov 3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0 > Nov 3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read]) > Nov 3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 67 win 64000 max 32768 > Nov 3 21:46:28 capella sshd[25612]: debug1: input_session_request > Nov 3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session] > Nov 3 21:46:28 capella sshd[25612]: debug1: session_new: session 9 > [... more channels and stuffs before graceful shutdown ...] > > I believe the two-minute window before SIGCHLD is waiting for some > Shepherd services to time out (long story..!). > > In my current tests the failure always occur at the > very end of deploy: > > building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv... > building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv... > ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: # > > What's interesting is that all of the failed deploys fail to open > "channel 10", whereas successful deploys never reach it. All of these > are deploying the exact same configuration and commit, so the SSH > traffic should nearly identical. > > I have attached the transcripts of each session (by grepping the PIDs > from /var/log/debug). Grepping for session_open yields some > inconstencies, but haven't studied them in detail yet. > >>> Due to the number of SSH connections made by deploy and frequent >>> occurence of this problem, I was not able to successfully deploy through >>> many attempts. >> >> Ouch. Normally, thanks to memoization, ‘guix deploy’ opens only one >> session per machine. (I think it works well with the 25 local build >> machines behind berlin; it’s also fine for a small set of machines I >> take care of at work.) > > Have they been rebooted since the switch to inetd-style sshd? It fails > more often than not for me. > >>> I found that removing the memoizing open-machine-ssh-session* in >>> (gnu machine ssh) works around it and can happily deploy again. >>> >>> (that is, just use 'open-machine-ssh-session' instead) >> >> Interesting. That gives us a hint. Could you add a ‘pk’ in >> ‘open-machine-ssh-session*’ to see how many connections it opens? > > Indeed it only opens a single connection. The problem seems to be > with multiple "channels" and "sessions" over a single connection. > Very detailed report, thank you for digging in! CCed Artyom Poptsov, author of Guile-SSH. -- Best regards, Andrew Tropin