From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#30365: Offloading sometimes hangs
Date: Tue, 06 Feb 2018 11:04:10 +0100
Message-ID: <877erq8med.fsf@gnu.org>
To: 30365@debbugs.gnu.org

Hi,

On berlin.guixsd.org, offloading would sometimes hang in the middle of
an offloaded build: no more build log output showing up, nothing
happening (this is with guix-0.14.0-6.0dcf675).

On the build machine side, the guile process that forwards data between
the sshd and guix-daemon¹ is stuck on:

  read(0, …)

with this stack trace:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f09d6068aed in read ()
   from /gnu/store/3h31zsqxjjg52da5gp3qmhkh4x8klhah-glibc-2.25/lib/libpthread.so.0
#1  0x00007f09d653fc47 in fport_read ()
   from /gnu/store/0v539yjmdqhjm1xcpvndmagkgjz5fvh2-guile-2.2.2/lib/libguile-2.2.so.1
#2  0x00007f09d656cd77 in scm_i_read_bytes ()
   from /gnu/store/0v539yjmdqhjm1xcpvndmagkgjz5fvh2-guile-2.2.2/lib/libguile-2.2.so.1
#3  0x00007f09d65705fe in scm_fill_input ()
   from /gnu/store/0v539yjmdqhjm1xcpvndmagkgjz5fvh2-guile-2.2.2/lib/libguile-2.2.so.1
#4  0x00007f09d6577897 in scm_get_bytevector_some ()
   from /gnu/store/0v539yjmdqhjm1xcpvndmagkgjz5fvh2-guile-2.2.2/lib/libguile-2.2.so.1
#5  0x00007f09d65abc4d in vm_regular_engine ()
   from /gnu/store/0v539yjmdqhjm1xcpvndmagkgjz5fvh2-guile-2.2.2/lib/libguile-2.2.so.1
#6  0x00007f09d65af2aa in scm_call_n ()
   from /gnu/store/0v539yjmdqhjm1xcpvndmagkgjz5fvh2-guile-2.2.2/lib/libguile-2.2.so.1
#7  0x00007f09d65338d7 in scm_primitive_eval ()
   from /gnu/store/0v539yjmdqhjm1xcpvndmagkgjz5fvh2-guile-2.2.2/lib/libguile-2.2.so.1
--8<---------------cut here---------------end--------------->8---

In theory this “cannot happen” because it reads from stdin iff
‘select’ said stdin is ready.

On the server side (on berlin itself), the corresponding ‘guix offload’
process is stuck here:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007ff49b3590bd in poll ()
   from target:/gnu/store/3h31zsqxjjg52da5gp3qmhkh4x8klhah-glibc-2.25/lib/libc.so.6
#1  0x00007ff48f4db377 in ssh_poll_ctx_dopoll ()
   from target:/gnu/store/3phbrya78gpk7rg6flqyqzf53y3x9zv9-libssh-0.7.5/lib/libssh.so.4
#2  0x00007ff48f4dc319 in ssh_handle_packets ()
   from target:/gnu/store/3phbrya78gpk7rg6flqyqzf53y3x9zv9-libssh-0.7.5/lib/libssh.so.4
#3  0x00007ff48f4dc3ed in ssh_handle_packets_termination ()
   from target:/gnu/store/3phbrya78gpk7rg6flqyqzf53y3x9zv9-libssh-0.7.5/lib/libssh.so.4
#4  0x00007ff48f4c8eff in ssh_channel_read_timeout ()
   from target:/gnu/store/3phbrya78gpk7rg6flqyqzf53y3x9zv9-libssh-0.7.5/lib/libssh.so.4
#5  0x00007ff48f930803 in read_from_channel_port ()
   from target:/gnu/store/xfaqdvk060yz7ddc9isk3wkybqmcfj3w-guile-ssh-0.11.2/lib/libguile-ssh.so.11
#6  0x00007ff49cea7d77 in scm_i_read_bytes ()
   from target:/gnu/store/swyipr8smrd5bc72n92sdfxzx0p4cjpi-guile-2.2.2/lib/libguile-2.2.so.1
#7  0x00007ff49ceac3fc in scm_c_read_bytes ()
   from target:/gnu/store/swyipr8smrd5bc72n92sdfxzx0p4cjpi-guile-2.2.2/lib/libguile-2.2.so.1
#8  0x00007ff49ceb2838 in scm_get_bytevector_n ()
   from target:/gnu/store/swyipr8smrd5bc72n92sdfxzx0p4cjpi-guile-2.2.2/lib/libguile-2.2.so.1
#9  0x00007ff49cee6c4d in vm_regular_engine ()
   from target:/gnu/store/swyipr8smrd5bc72n92sdfxzx0p4cjpi-guile-2.2.2/lib/libguile-2.2.so.1
#10 0x00007ff49ceea2aa in scm_call_n ()
   from target:/gnu/store/swyipr8smrd5bc72n92sdfxzx0p4cjpi-guile-2.2.2/lib/libguile-2.2.so.1
#11 0x00007ff49ce6e8d7 in scm_primitive_eval ()
--8<---------------cut here---------------end--------------->8---

Presumably the ‘scm_get_bytevector_n’ call comes from (guix
serialization) or ‘process-stderr’.
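
For readers unfamiliar with the pattern, the “read from stdin only if
‘select’ said it is ready” invariant on the build machine side can be
sketched as follows.  This is only an illustration in Python, not the
Guile code in guix/ssh.scm; the names ‘forward_once’, ‘write_all’ and
‘demo’, the chunk size, and the timeouts are all hypothetical, and a
POSIX system is assumed.

```python
import os
import select


def write_all(fd, data):
    """Write all of DATA to FD, looping over partial writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def forward_once(src_fd, dst_fd, timeout=1.0, chunk=4096):
    """Forward at most one chunk from SRC_FD to DST_FD.

    Return the number of bytes forwarded, 0 on end of file, or None
    if SRC_FD did not become readable within TIMEOUT seconds.
    """
    readable, _, _ = select.select([src_fd], [], [], timeout)
    if src_fd not in readable:
        # select did not report readiness: do not call read at all.
        return None
    # A single read(2) call; it returns whatever is available, up to CHUNK.
    data = os.read(src_fd, chunk)
    if not data:
        return 0  # end of file
    write_all(dst_fd, data)
    return len(data)


def demo():
    """Exercise forward_once over two pipes (illustrative only)."""
    in_r, in_w = os.pipe()
    out_r, out_w = os.pipe()
    idle = forward_once(in_r, out_w, timeout=0.1)   # no data yet
    os.write(in_w, b"hello")
    count = forward_once(in_r, out_w, timeout=0.1)  # data is forwarded
    received = os.read(out_r, 4096)
    os.close(in_w)
    eof = forward_once(in_r, out_w, timeout=0.1)    # writer closed
    for fd in (in_r, out_r, out_w):
        os.close(fd)
    return idle, count, received, eof
```

With this structure, a process blocked in read means that ‘select’
reported readiness and the read still did not return.  The Linux
select(2) man page does document cases where a descriptor reported as
ready can block on a subsequent read; whether that is what happens here
is not established.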

IOW we have a deadlock where both sides are waiting for input data.

Ludo’.

¹ https://git.savannah.gnu.org/cgit/guix.git/tree/guix/ssh.scm?id=0362e5820ab6a1eb8eaf33bc47e592857c25f765#n102