From: Ludovic Courtès
Subject: bug#34033: Offloading sometimes hangs
Date: Thu, 10 Jan 2019 17:09:31 +0100
Message-ID: <87o98obikk.fsf@gnu.org>
To: 34033@debbugs.gnu.org

Hello,

So there’s another situation where offloading regularly hangs on berlin.
The ‘guix offload’ process looks like this:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f1f715686a1 in __GI___poll (fds=0x14e9b30, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f1f673b94e7 in ssh_poll (timeout=, nfds=, fds=)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/poll.c:98
#2  ssh_poll_ctx_dopoll (ctx=ctx@entry=0x14ee2e0, timeout=timeout@entry=-1)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/poll.c:612
#3  0x00007f1f673ba449 in ssh_handle_packets (session=session@entry=0x2249360, timeout=timeout@entry=-1)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/session.c:634
#4  0x00007f1f673ba51d in ssh_handle_packets_termination (session=session@entry=0x2249360, timeout=, timeout@entry=-3,
    fct=fct@entry=0x7f1f673a4430 , user=user@entry=0x7ffce23953f0)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/session.c:696
#5  0x00007f1f673a6aaf in ssh_channel_read_timeout (channel=0x224e360, dest=dest@entry=0x18ef020, count=count@entry=8,
    is_stderr=, timeout=-3, timeout@entry=-1)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/channels.c:2705
#6  0x00007f1f673a6bbb in ssh_channel_read (channel=, dest=dest@entry=0x18ef020, count=count@entry=8,
    is_stderr=) at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/channels.c:2621
#7  0x00007f1f67413a23 in read_from_channel_port (channel=0x22f01a0, dst=, start=0, count=8) at channel-type.c:161
#8  0x00007f1f71b65287 in scm_i_read_bytes (port=port@entry=0x22f01a0, dst=dst@entry="#" = {...},
    start=start@entry=0, count=count@entry=8) at ports.c:1559
#9  0x00007f1f71b6996c in scm_c_read_bytes (port=port@entry=0x22f01a0, dst=dst@entry="#" = {...},
    start=start@entry=0, count=count@entry=8) at ports.c:1639
#10 0x00007f1f71b6fd80 in scm_get_bytevector_n (port=0x22f01a0, count=) at r6rs-ports.c:421
#11 0x00007f1f71ba4715 in vm_regular_engine (thread=0x14e9b30, vp=0xc31f30, registers=0xffffffff, resume=1901495969)
    at vm-engine.c:786
[...]
(gdb) p *fds
$1 = {fd = 15, events = 1, revents = 0}
(gdb) shell ls -l /proc/12185/fd
total 0
lr-x------ 1 root root 64 Jan 10 16:56 0 -> 'pipe:[76778016]'
l-wx------ 1 root root 64 Jan 10 16:56 1 -> 'pipe:[76778015]'
lr-x------ 1 root root 64 Jan 10 16:56 10 -> 'pipe:[76838317]'
l-wx------ 1 root root 64 Jan 10 16:56 11 -> 'pipe:[76838317]'
lr-x------ 1 root root 64 Jan 10 16:56 12 -> 'pipe:[76851360]'
l-wx------ 1 root root 64 Jan 10 16:56 13 -> 'pipe:[76851360]'
l-wx------ 1 root root 64 Jan 10 16:56 14 -> /var/guix/offload/overdrive1.guixsd.org/1
lrwx------ 1 root root 64 Jan 10 16:56 15 -> 'socket:[76860702]'
lr-x------ 1 root root 64 Jan 10 16:56 16 -> /dev/urandom
l-wx------ 1 root root 64 Jan 10 16:56 2 -> 'pipe:[76778015]'
lr-x------ 1 root root 64 Jan 10 16:56 3 -> 'pipe:[76838313]'
l-wx------ 1 root root 64 Jan 10 16:56 4 -> 'pipe:[76778017]'
l-wx------ 1 root root 64 Jan 10 16:56 5 -> 'pipe:[76838313]'
lr-x------ 1 root root 64 Jan 10 16:56 6 -> 'pipe:[76838316]'
l-wx------ 1 root root 64 Jan 10 16:56 7 -> 'pipe:[76838316]'
lr-x------ 1 root root 64 Jan 10 16:56 8 -> 'pipe:[76841414]'
l-wx------ 1 root root 64 Jan 10 16:56 9 -> 'pipe:[76841414]'
--8<---------------cut here---------------end--------------->8---

It’s a ‘get-bytevector-n’ for 8 bytes, so it looks like the daemon
protocol.

At that point the socket is actually dead: if I connect on the remote
machine (overdrive1.guixsd.org) I can see that there are no other open
SSH sessions.

A simple thing would be to somehow get libssh to pass
POLLIN | POLLRDHUP instead of just POLLIN.

Additionally, we could change Guile-SSH so that we can specify a timeout
when reading from a channel.

Ludo’.
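
PS: To illustrate the two ideas above (polling for POLLIN | POLLRDHUP, and
reading with a timeout), here is a minimal sketch in Python on a plain
socket. It is not libssh or Guile-SSH code; ‘wait_readable’ and its return
values are hypothetical names chosen for the example, and POLLRDHUP is
Linux-specific. Note that POLLRDHUP can only report a hang-up the kernel has
actually seen; if the peer vanished silently, only the timeout would fire.

```python
import select
import socket


def wait_readable(sock, timeout_ms):
    """Wait until SOCK has data, the peer hangs up, or TIMEOUT_MS elapses.

    Returns "data", "hangup", or "timeout" (hypothetical labels).
    """
    poller = select.poll()
    # Ask for POLLRDHUP in addition to POLLIN so that a peer closing (or
    # shutting down the writing half of) the connection is reported as such.
    poller.register(sock.fileno(), select.POLLIN | select.POLLRDHUP)

    # A finite timeout replaces the timeout=-1 (wait forever) seen in the
    # backtrace above.
    events = poller.poll(timeout_ms)
    if not events:
        return "timeout"

    _, revents = events[0]
    # Hang-up takes precedence here; buffered data may still be readable.
    if revents & (select.POLLRDHUP | select.POLLHUP | select.POLLERR):
        return "hangup"
    return "data"
```

With this shape, a caller can tell "the peer is gone" and "nothing arrived
in time" apart from "bytes are ready", instead of blocking indefinitely in
poll.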