From: Ludovic Courtès
Subject: bug#34033: Offloading sometimes hangs
Date: Sat, 22 Feb 2020 21:35:50 +0100
Message-ID: <87v9nyuzq1.fsf@gnu.org>
References: <87o98obikk.fsf@gnu.org> <87fttuq2mz.fsf@gnu.org> <87wo8fqlu5.fsf@apteryx.i-did-not-set--mail-host-address--so-tickle-me>
In-Reply-To: <87wo8fqlu5.fsf@apteryx.i-did-not-set--mail-host-address--so-tickle-me> (Maxim Cournoyer's message of "Fri, 21 Feb 2020 23:37:06 -0500")
To: Maxim Cournoyer
Cc: 34033@debbugs.gnu.org

Hi Maxim,

Maxim Cournoyer skribis:

> Ludovic Courtès writes:
>
>> Hello,
>>
>> Ludovic Courtès skribis:
>>
>>> A simple thing would be to somehow get libssh to pass POLLIN | POLLRDHUP
>>> instead of just POLLIN.
>>
>> Reported here:
>>
>>   https://www.libssh.org/archive/libssh/2019-01/0000000.html
>>
>> A fix has been proposed by upstream and should be committed shortly.
>>
>>> Additionally, we could change Guile-SSH so that we can specify a timeout
>>> when reading from a channel.
>>
>> Turns out we can set a per-session timeout, which we already do (see
>> #:timeout in ‘open-ssh-session’ in (guix scripts offload)) but
>> ‘ssh_channel_read’ would ignore it and instead pass an infinite timeout
>> to poll(2):
>>
>>   https://www.libssh.org/archive/libssh/2019-01/0000001.html
>>
>> This issue happens to be fixed in libssh 0.8.x, so I upgraded our libssh
>> package in commit a8b0556ea1e439c89dc1ba33c8864e8b9b811f08.
>>
>> (That still doesn’t tell us why our ‘guix offload’ processes would
>> occasionally be stuck but at least it ensures the build farm keeps
>> making progress even when that happens.)
>>
>> Ludo’.
>
> Seems the patch in the response at the URL you linked is awaiting some
> feedback/review.  Is this the reason 'guix substitute' hangs for so long
> when the substitute server is down? (like 1 minute or so).

The issues above are in libssh and were fixed a while ago.
‘guix substitute’ doesn’t use Guile-SSH/libssh, so the problem you’re
seeing must be something different.

What do you mean by “the substitute server is down”?  You mean
‘guix publish’ is not running, or the machine is unavailable altogether?

Thanks,
Ludo’.