From: ludo@gnu.org (Ludovic Courtès)
To: ng0
Cc: 24496@debbugs.gnu.org
Subject: bug#24496: offloading should fall back to local build after n tries
Date: Wed, 05 Oct 2016 13:36:20 +0200
Message-ID: <87a8ej81u3.fsf@gnu.org>
In-Reply-To: <87vax8nis5.fsf@we.make.ritual.n0.is> (ng0's message of "Tue, 04 Oct 2016 17:08:58 +0000")
References: <8760ppr3q3.fsf@we.make.ritual.n0.is> <87r387nhjg.fsf@gnu.org> <87vax8nis5.fsf@we.make.ritual.n0.is>
List-Id: Bug reports for GNU Guix

ng0 wrote:

> Ludovic Courtès writes:

[...]

>> Like you say, on Hydra-style setup this could be a problem: the
>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>> builds on its own.
>>
>> So I guess we would need a command-line option to select a different
>> behavior.
>> I’m not sure how to do that because ‘guix offload’ is
>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>> option.
>
> Could the daemon run with --enable-hydra-style or --disable-hydra-style
> and --disable-hydra-style would allow falling back to local build if
> after a defined time - keeping slow connections in mind - the machine
> did not reply.

That would be too ad-hoc IMO, and the problem mentioned above remains.

>> In the meantime, you could also hack up your machines.scm: it would
>> return a list where unreachable machines have been filtered out.
>
> How can I achieve this?

Something like:

  (define the-machine
    (build-machine …))

  (if (managed-to-connect-timely the-machine)
      (list the-machine)
      '())

… where ‘managed-to-connect-timely’ would try to connect to the machine
with a timeout.

> And to append to this bug: it seems to me that offloading requires 1
> lsh-key for each
> build-machine.

The main machine needs to be able to connect to each build machine over
SSH, so indeed, that requires proper SSH key registration (host keys and
authorized user keys).

> (https://lists.gnu.org/archive/html/help-guix/2016-10/msg00007.html)
> and that you can not directly address them (say I want to create some
> system where I want to build on machine 1 AND machine 2. Having 2
> x86_64 in machines.scm only selects one of them (if 2 were working,
> see linked thread) and builds on the one which is accessible first. If
> however the first machine is somehow blocked and it fails, therefore
> terminates lsh connection, the build does not happen at all.

The code that selects machines is in (guix scripts offload),
specifically ‘choose-build-machine’.  It tries to choose the “best”
machine, which means, roughly, the fastest and least loaded one.

HTH,
Ludo’.
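
For illustration, the ‘managed-to-connect-timely’ predicate mentioned above
could be sketched in plain Guile roughly as follows.  This is only a sketch
under assumptions, not code from Guix: the procedure takes a host name rather
than the build-machine record used in the snippet above, the port (22) and
timeout (5 seconds) defaults are arbitrary, and it only checks that a TCP
connection can be opened in time, not that SSH authentication would succeed.

```scheme
;; Hypothetical helper for machines.scm; not part of Guix.
(define* (managed-to-connect-timely host #:key (port 22) (timeout 5))
  "Return #t if a TCP connection to HOST on PORT can be established within
TIMEOUT seconds, and #f otherwise (including on any lookup or socket error)."
  (catch #t
    (lambda ()
      (let* ((ai   (car (getaddrinfo host (number->string port)
                                     0 0 SOCK_STREAM)))
             (sock (socket (addrinfo:fam ai)
                           (addrinfo:socktype ai)
                           (addrinfo:protocol ai))))
        (dynamic-wind
          (lambda () #t)
          (lambda ()
            ;; Switch to non-blocking mode so 'connect' returns right away.
            (fcntl sock F_SETFL
                   (logior O_NONBLOCK (fcntl sock F_GETFL)))
            ;; Depending on the Guile version, a pending connection either
            ;; raises EINPROGRESS or makes 'connect' return #f; accept both.
            (catch 'system-error
              (lambda ()
                (connect sock (addrinfo:addr ai)))
              (lambda args
                (if (not (= EINPROGRESS (system-error-errno args)))
                    (apply throw args))))
            ;; Wait until the descriptor is writable or TIMEOUT expires,
            ;; then check whether the connection attempt actually succeeded.
            (let ((ready (select '() (list (fileno sock)) '() timeout)))
              (and (pair? (cadr ready))
                   (zero? (getsockopt sock SOL_SOCKET SO_ERROR)))))
          (lambda ()
            (close-port sock)))))
    (lambda _ #f)))

;; Possible use in machines.scm, keeping the elided 'build-machine' form
;; from the message as is ("HOST" stands for that machine's host name):
;;
;;   (if (managed-to-connect-timely "HOST")
;;       (list the-machine)
;;       '())
```

With several build machines, the same test could be applied to each one and
the resulting lists appended, so that machines.scm only returns the machines
that answered in time.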