From: Mark H Weaver
Subject: bug#35181: Hydra offloads often get stuck while exporting build requisites
Date: Tue, 09 Apr 2019 14:09:41 -0400
Message-ID: <87pnpvrqwv.fsf@netris.org>
References: <87mul17oo2.fsf@netris.org> <87imvp7ogv.fsf@netris.org> <20190407173105.GB1337@macbook41> <87ef6d6mdn.fsf@netris.org> <87pnpw29kp.fsf@gnu.org> <87o95g5lpd.fsf@netris.org> <87ftqrh2jn.fsf@gnu.org>
In-Reply-To: <87ftqrh2jn.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 09 Apr 2019 12:54:20 +0200")
To: Ludovic Courtès
Cc: 35181@debbugs.gnu.org

Hi Ludovic,

Ludovic Courtès writes:

> The problem is that this is an ancient Guix.
> In the meantime,
> offloading has seen relevant changes, in particular things like commit
> ed7b44370f71126087eb953f36aad8dc4c44109f which address stability issues
> with Guile-SSH (ssh dist node) that was previously used.
>
> I think we should upgrade Guix on hydra.gnu.org otherwise we’re likely
> to end up chasing old bugs.

Sure, that makes sense.  I also noticed the old Guix after writing my
last messages, so yesterday I tried updating Hydra's Guix to 0.16.0-11,
which at the time was the latest version built by Hydra.  After
updating, I quit and relaunched 'guix-daemon', as well as 'guix
publish', hydra-queue-runner, and hydra-evaluator.

With the new version of Guix, *all* offloads started failing in a
strange way: it got stuck in a loop, printing endlessly repeated
messages like this:

  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/0'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/0'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/1'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/2'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/0'

This is from memory because after killing the queue-runner and
cancelling the 'mozjs-60' jobs (which I had intended to start building
as a test), the nix output above is no longer visible on those pages,
and I'm not sure offhand where to look for it.

Anyway, in every offloaded build, it printed a line like the above every
few seconds, with the build slot number at the end varying.  I don't
remember if the process number varied.

This reminds me that I also ran into difficulties updating 'guix' on the
armhf build slaves, which are also currently stuck on an even more
ancient version of Guix (circa 0.12.0).

On both Hydra and its armhf build slaves, Guix is installed on top of a
Debian derivative, and both 'guix' and 'guix-daemon' are launched from
an environment without any Guix environment variable settings.
This apparently works in ancient versions of Guix, but not recent ones.

So, could the problem simply be that the 'guix' wrapper is not
installing enough environment variable settings for offloading to work?

      Mark