From: Maxim Cournoyer
Date: Tue, 25 May 2021 11:50:03 -0400
Subject: bug#41625: [PATCH] offload: Handle a possible EOF response from read-repl-response.
To: 41625@debbugs.gnu.org
Cc: Maxim Cournoyer
Message-Id: <20210525155003.27590-1-maxim.cournoyer@gmail.com>
In-Reply-To: <87mtsky9um.fsf@gmail.com>
References: <87mtsky9um.fsf@gmail.com>

Fixes .

* guix/scripts/offload.scm (check-machine-availability): Refactor so that it
takes a single machine object, to allow for retrying a single machine.
Handle the case where the checks raised an exception due to the connection to
the build machine having been lost, and try up to 3 times in total. Ensure
the cleanup code is run in all situations.
(check-machines-availability): New procedure. Call CHECK-MACHINE-AVAILABILITY
in parallel, which improves performance (about twice as fast with 4 build
machines, from ~30 s to ~15 s).
* guix/inferior.scm (&inferior-connection-lost): New condition type.
(read-repl-response): Raise a condition of the above type when reading EOF
from the build machine's port.
---
 guix/inferior.scm        |  9 ++++++++
 guix/scripts/offload.scm | 50 +++++++++++++++++++++++++++++-----------
 2 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/guix/inferior.scm b/guix/inferior.scm
index 7c8e478f2a..4ac1ea3484 100644
--- a/guix/inferior.scm
+++ b/guix/inferior.scm
@@ -1,5 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2018, 2019, 2020, 2021 Ludovic Courtès
+;;; Copyright © 2021 Maxim Cournoyer
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -70,6 +71,7 @@
             inferior-exception-arguments
             inferior-exception-inferior
             inferior-exception-stack
+            inferior-connection-lost?
             read-repl-response
 
             inferior-packages
@@ -228,6 +230,9 @@ equivalent. Return #f if the inferior could not be launched."
   (inferior  inferior-exception-inferior)         ; | #f
   (stack     inferior-exception-stack))           ;list of (FILE COLUMN LINE)
 
+(define-condition-type &inferior-connection-lost &error
+  inferior-connection-lost?)
+
 (define* (read-repl-response port #:optional inferior)
   "Read a (guix repl) response from PORT and return it as a Scheme object.
 Raise '&inferior-exception' when an exception is read from PORT."
@@ -241,6 +246,10 @@ Raise '&inferior-exception' when an exception is read from PORT."
   (match (read port)
     (('values objects ...)
      (apply values (map sexp->object objects)))
+    ;; Unexpectedly read EOF from the port. This can happen for example when
+    ;; the underlying connection for PORT was lost with Guile-SSH.
+    ((? eof-object?)
+     (raise (condition (&inferior-connection-lost))))
     (('exception ('arguments key objects ...)
                  ('stack frames ...))
      ;; Protocol (0 1 1) and later.
diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
index 835078cb97..7c6d38b218 100644
--- a/guix/scripts/offload.scm
+++ b/guix/scripts/offload.scm
@@ -1,7 +1,7 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès
 ;;; Copyright © 2017 Ricardo Wurmus
-;;; Copyright © 2020 Maxim Cournoyer
+;;; Copyright © 2020, 2021 Maxim Cournoyer
 ;;; Copyright © 2020 Julien Lepiller
 ;;;
 ;;; This file is part of GNU Guix.
@@ -53,6 +53,7 @@
   #:use-module (ice-9 regex)
   #:use-module (ice-9 format)
   #:use-module (ice-9 binary-ports)
+  #:use-module (ice-9 threads)
   #:export (build-machine
             build-machine?
             build-machine-name
@@ -684,7 +685,7 @@ daemon is not running."
           (leave (G_ "failed to import '~a' from '~a'~%")
                  item name)))))
 
-(define (check-machine-availability machine-file pred)
+(define (check-machines-availability machine-file pred)
   "Check that each machine matching PRED in MACHINE-FILE is usable as a build
 machine."
   (define (build-machine=? m1 m2)
@@ -696,18 +697,39 @@ machine."
   (let ((machines (filter pred
                           (delete-duplicates (build-machines machine-file)
                                              build-machine=?))))
-    (info (G_ "testing ~a build machines defined in '~a'...~%")
+    (info (G_ "Testing ~a build machines defined in '~a'...~%")
           (length machines) machine-file)
-    (let* ((names    (map build-machine-name machines))
-           (sockets  (map build-machine-daemon-socket machines))
-           (sessions (map (cut open-ssh-session <> %short-timeout) machines))
-           (nodes    (map remote-inferior sessions)))
-      (for-each assert-node-has-guix nodes names)
-      (for-each assert-node-repl nodes names)
-      (for-each assert-node-can-import sessions nodes names sockets)
-      (for-each assert-node-can-export sessions nodes names sockets)
-      (for-each close-inferior nodes)
-      (for-each disconnect! sessions))))
+    (par-for-each check-machine-availability machines)))
+
+(define (check-machine-availability machine)
+  "Check whether MACHINE is available. Exit with an error upon failure."
+  ;; Sometimes, the machine remote port may return EOF, presumably because the
+  ;; connection was lost. Try up to 3 times in total.
+  (let loop ((retries 3))
+    (guard (c ((inferior-connection-lost? c)
+               (let ((retries-left (1- retries)))
+                 (if (> retries-left 0)
+                     (begin
+                       (format (current-error-port)
+                               (G_ "connection to machine ~s lost; retrying~%")
+                               (build-machine-name machine))
+                       (loop retries-left))
+                     (leave (G_ "connection repeatedly lost with machine '~a'~%")
+                            (build-machine-name machine))))))
+      (let* ((name    (build-machine-name machine))
+             (socket  (build-machine-daemon-socket machine))
+             (session (open-ssh-session machine %short-timeout))
+             (node    (remote-inferior session)))
+        (dynamic-wind
+          (lambda () #t)
+          (lambda ()
+            (assert-node-has-guix node name)
+            (assert-node-repl node name)
+            (assert-node-can-import session node name socket)
+            (assert-node-can-export session node name socket))
+          (lambda ()
+            (close-inferior node)
+            (disconnect! session)))))))
 
 (define (check-machine-status machine-file pred)
   "Print the load of each machine matching PRED in MACHINE-FILE."
@@ -824,7 +846,7 @@ machine."
               ((file) (values file (const #t)))
               (()     (values %machine-file (const #t)))
               (x      (leave (G_ "wrong number of arguments~%"))))))
-          (check-machine-availability (or file %machine-file) pred))))
+          (check-machines-availability (or file %machine-file) pred))))
         (("status" rest ...)
          (with-error-handling
            (let-values (((file pred)
-- 
2.31.1
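The EOF handling added to `read-repl-response` can be tried out in isolation with the minimal Guile sketch below. It assumes a simplified reader that only understands `(values ...)` responses; `&connection-lost`, `read-response` and `safe-read-response` are hypothetical stand-ins, not part of `(guix inferior)`. It also illustrates why the EOF clause has to be written `((? eof-object?) BODY)`: in `(ice-9 match)` the first element of a clause is its pattern, so a clause written `(? eof-object? BODY)` would have the bare symbol `?` as its pattern, which binds like an ordinary pattern variable and matches any response.

```scheme
(use-modules (ice-9 match)     ; match, (? pred) patterns
             (srfi srfi-34)    ; guard, raise
             (srfi srfi-35))   ; define-condition-type, condition

;; Hypothetical stand-in for the patch's &inferior-connection-lost.
(define-condition-type &connection-lost &error
  connection-lost?)

;; Simplified, hypothetical reader: it only understands (values OBJ ...)
;; responses and EOF, unlike the real read-repl-response.
(define (read-response port)
  (match (read port)
    (('values objects ...)
     (apply values objects))
    ;; The whole pattern is (? eof-object?); the body comes after it.
    ((? eof-object?)
     (raise (condition (&connection-lost))))))

;; Turn a lost connection into a plain symbol so that callers can react.
(define (safe-read-response port)
  (guard (c ((connection-lost? c) 'connection-lost))
    (read-response port)))
```

Reading from an empty port, as in `(safe-read-response (open-input-string ""))`, yields the symbol `connection-lost`, while `(values 1 2)` read from a port is returned as the two values 1 and 2.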
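The retry logic of `check-machine-availability` can likewise be sketched without SSH. In this minimal Guile sketch, `call-with-retries`, `check-all` and `&connection-lost` are hypothetical names; it assumes, as the patch does, that a lost connection surfaces as a condition caught by SRFI-34 `guard`, that cleanup belongs in the after thunk of `dynamic-wind`, and that `par-for-each` from `(ice-9 threads)` spreads the per-item checks over threads.

```scheme
(use-modules (srfi srfi-34)    ; guard, raise
             (srfi srfi-35)    ; define-condition-type, condition
             (ice-9 threads))  ; par-for-each

;; Hypothetical stand-in for the patch's &inferior-connection-lost.
(define-condition-type &connection-lost &error
  connection-lost?)

;; Call THUNK, starting over when it raises &connection-lost.  THUNK runs at
;; most MAX-ATTEMPTS times in total, and CLEANUP runs after every attempt,
;; whether THUNK returned or raised.  Return THUNK's value, or the symbol
;; gave-up once the attempts are exhausted.
(define (call-with-retries thunk cleanup max-attempts)
  (let loop ((retries max-attempts))
    (guard (c ((connection-lost? c)
               (let ((retries-left (1- retries)))
                 (if (> retries-left 0)
                     (loop retries-left)  ; pass the number, don't call it
                     'gave-up))))
      (dynamic-wind
        (lambda () #t)
        thunk
        cleanup))))

;; Hypothetical driver: run CHECK on every element of ITEMS concurrently,
;; each with its own retries, the way the patch hands machines to
;; par-for-each.
(define (check-all items check)
  (par-for-each (lambda (item)
                  (call-with-retries (lambda () (check item))
                                     (lambda () #t)
                                     3))
                items))
```

With `max-attempts` set to 3, as in the patch, a thunk that raises `&connection-lost` twice and then returns `ok` makes `call-with-retries` return `ok` after three attempts, with the cleanup thunk having run three times; a thunk that always raises yields `gave-up` after three attempts. The cleanup also runs on the failing attempts because SRFI-34 evaluates `guard` clauses in the dynamic environment of the `guard` form, so the `dynamic-wind` after thunk has already run by then.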