From mboxrd@z Thu Jan 1 00:00:00 1970
From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#33239: 'guix offload' regularly hangs in 'channel-get-exit-status' call
Date: Fri, 23 Nov 2018 18:25:21 +0100
Message-ID: <87wop33dvi.fsf@gnu.org>
References: <87k1lvrblp.fsf@gnu.org>
In-Reply-To: <87k1lvrblp.fsf@gnu.org> (Ludovic Courtès's message of "Fri, 02 Nov 2018 11:57:06 +0100")
To: 33239@debbugs.gnu.org

ludo@gnu.org (Ludovic Courtès) skribis:

> (gdb) bt
> #0  0x00007f299fb330f1 in __GI___poll (fds=0x1dd58c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
> #1  0x00007f2994287577 in ssh_poll_ctx_dopoll () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
> #2  0x00007f29942884d9 in ssh_handle_packets () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
> #3  0x00007f29942885ad in ssh_handle_packets_termination () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
> #4  0x00007f2994275080 in ssh_channel_get_exit_status () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
> #5  0x00007f29946dd11a in guile_ssh_channel_get_exit_status () from target:/gnu/store/i3nfl17wfx7sryq6w15r9wxl7ilmq4rb-guile-ssh-0.11.3/lib/libguile-ssh.so.11
> #6  0x00007f29a1765965 in vm_regular_engine (thread=0x1dd58c0, vp=0x1d4df30, registers=0xffffffff, resume=-1615646479) at vm-engine.c:786
> #7  0x00007f29a1768fba in scm_call_n (proc=#, argv=argv@entry=0x7ffc76b1ece8, nargs=nargs@entry=1) at vm.c:1257
> #8  0x00007f29a16ecff7 in scm_primitive_eval (
>     exp=exp@entry=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/gnu/store/zz3b7j4iv6v143v7cqyr77k83zc5n3zw-guix-0.15.0-6.f9a8fce/bin/.guix-real") (main (command-line)) (quit)))) at eval.c:662
> #9  0x00007f29a16ed053 in scm_eval (
>     exp=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/gnu/store/zz3b7j4iv6v143v7cqyr77k83zc5n3zw-guix-0.15.0-6.f9a8fce/bin/.guix-real") (main (command-line)) (quit))), module_or_state=module_or_state@entry="#" = {...}) at eval.c:696
> #10 0x00007f29a1738220 in scm_shell (argc=11, argv=0x1dd5280) at script.c:454
>
> (gdb) frame 0
> #0  0x00007f299fb330f1 in __GI___poll (fds=0x1dd58c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
> 29	in ../sysdeps/unix/sysv/linux/poll.c
> (gdb) p *fds
> $1 = {fd = 14, events = 1, revents = 0}
> (gdb) shell ls -l /proc/12605/fd
> total 0
> lr-x------ 1 root root 64 Nov  2 11:20 0 -> 'pipe:[44413497]'
> l-wx------ 1 root root 64 Nov  2 11:33 1 -> 'pipe:[44413496]'
> lr-x------ 1 root root 64 Nov  2 11:33 10 -> 'pipe:[44459532]'
> l-wx------ 1 root root 64 Nov  2 11:33 11 -> 'pipe:[44459532]'
> lr-x------ 1 root root 64 Nov  2 11:33 12 -> 'pipe:[44429590]'
> l-wx------ 1 root root 64 Nov  2 11:33 13 -> 'pipe:[44429590]'
> lrwx------ 1 root root 64 Nov  2 11:33 14 -> 'socket:[44444783]'
> lrwx------ 1 root root 64 Nov  2 11:33 15 -> 'socket:[44444784]'
> l-wx------ 1 root root 64 Nov  2 11:33 16 -> /var/guix/offload/141.80.167.140/0

When that happens, the guile process on the remote node that runs the
‘redirect’ code of ‘remote-daemon-channel’ is stuck in select(2) with an
infinite timeout.

Note that on berlin the build nodes are still running Guile 2.2.2, which
is vulnerable to the ‘select’ bug that ‘redirect’ supposedly works
around.

Ludo’.
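
PS: For readers unfamiliar with the failure mode, here is a minimal sketch of what frame #0 amounts to: a poll on a single descriptor for readability (events = 1, i.e. POLLIN on Linux) with timeout = -1, which never returns if the peer stays silent. This is not libssh or Guile-SSH code. The helper name ‘wait_readable’, the socket pair standing in for fd 14, and the 200 ms timeout are all assumptions made for illustration; it only shows that a bounded timeout turns a silent peer into a detectable condition rather than a hang.

```python
import select
import socket


def wait_readable(sock, timeout_ms):
    """Return True if `sock` becomes readable within `timeout_ms` milliseconds.

    Hypothetical helper: passing a negative timeout (or None) here would
    reproduce the situation in the backtrace, blocking until an event arrives.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)  # wait for input only
    events = poller.poll(timeout_ms)      # empty list when the timeout expires
    return any(fd == sock.fileno() and (ev & select.POLLIN)
               for fd, ev in events)


# A connected socket pair stands in for the SSH connection.
local, remote = socket.socketpair()

idle = wait_readable(local, 200)    # the peer has sent nothing yet

remote.sendall(b"exit-status")      # arbitrary payload for the demo
ready = wait_readable(local, 200)   # now there is data to read

local.close()
remote.close()
```

With a finite timeout the caller gets control back and can decide whether to retry, log, or give up; whether that is the right fix for the offload hang is a separate question.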