From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ludovic Courtès
Subject: bug#34033: Offloading sometimes hangs
Date: Mon, 14 Jan 2019 23:45:56 +0100
Message-ID: <87fttuq2mz.fsf@gnu.org>
References: <87o98obikk.fsf@gnu.org>
In-Reply-To: <87o98obikk.fsf@gnu.org> (Ludovic Courtès's message of "Thu, 10 Jan 2019 17:09:31 +0100")
To: 34033@debbugs.gnu.org

Hello,

Ludovic Courtès writes:

> A simple thing would be to somehow get libssh to pass POLLIN | POLLRDHUP
> instead of just POLLIN.

Reported here:

  https://www.libssh.org/archive/libssh/2019-01/0000000.html

A fix has been proposed by upstream and should be committed shortly.

> Additionally, we could change Guile-SSH so that we can specify a timeout
> when reading from a channel.

Turns out we can set a per-session timeout, which we already do (see
#:timeout in ‘open-ssh-session’ in (guix scripts offload)) but
‘ssh_channel_read’ would ignore it and instead pass an infinite timeout
to poll(2):

  https://www.libssh.org/archive/libssh/2019-01/0000001.html

This issue happens to be fixed in libssh 0.8.x, so I upgraded our libssh
package in commit a8b0556ea1e439c89dc1ba33c8864e8b9b811f08.

(That still doesn’t tell us why our ‘guix offload’ processes would
occasionally be stuck but at least it ensures the build farm keeps
making progress even when that happens.)

Ludo’.
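
For illustration only, here is a minimal Python sketch of the two ideas discussed above: polling for POLLIN | POLLRDHUP rather than just POLLIN, and passing a finite timeout to poll(2) instead of an infinite one. This is not the libssh or Guile-SSH code; the function name `read_with_timeout` and its return values are hypothetical, and POLLRDHUP is assumed to be available only on Linux.

```python
import select
import socket


def read_with_timeout(sock, timeout_ms):
    """Wait for data or a peer hang-up, giving up after timeout_ms.

    Hypothetical helper: returns the bytes read, "closed" if the peer
    hung up, or "timeout" if nothing happened in time.
    """
    poller = select.poll()
    # POLLRDHUP is Linux-specific; fall back to plain POLLIN elsewhere.
    rdhup = getattr(select, "POLLRDHUP", 0)
    poller.register(sock, select.POLLIN | rdhup)

    # A finite timeout (in milliseconds) means a silent peer cannot
    # block the caller forever.
    events = poller.poll(timeout_ms)
    if not events:
        return "timeout"

    _, mask = events[0]
    if mask & select.POLLIN:
        data = sock.recv(4096)
        # An empty read means the peer closed the connection.
        return data if data else "closed"
    return "closed"
```

With a connected socket pair, the call returns "timeout" while the peer stays silent, the payload once the peer writes, and "closed" after the peer closes its end.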