From: raid5atemyhomework
To: "guix-devel@gnu.org", Mathieu Othacehe
Date: Fri, 19 Mar 2021 03:09:54 +0000
Subject: Making the Guix installer resilient against transient network issues
List-Id: "Development of GNU Guix and the GNU System distribution."

Hello Guix devel,

When `guix system init` fails, there are a number of possible causes of failure:

* Packages being downloaded are so broken that they cannot actually be built.
  * QC should filter this out.
* The hardware being installed on is broken, usually a failure of the storage device being installed into.
* Downloading substitutes from the substitute server failed.

Of the above, the last is the most likely to occur in practice.

I have been doing a number of repeated installation tests on VMs using the SJTUG mirror server, as well as the Berlin Cuirass, and a significant number of installation attempts via the guided installer fail due to problems with downloading substitutes.

* From my system, the Berlin Cuirass server is very very very slow (< 40kiB/s, sometimes as low as 4kiB/s) and possibly because of the slowness, the download gets interrupted part of the way through which causes the install to fail.
* The SJTUG server sometimes responds in ways that the Guix downloader does not expect, causing failures.

What I do instead is to use the "manual" mode and just keep doing `guix system build` over and over until it manages to pull through.

I think that the guided installer should also use the same technique of trying `guix system build` repeatedly for at least some number of tries, possibly asking the user if they want to keep trying (in case the issue is a permanent network error rather than a transient network error).

Yes, currently a failure to install "just" kicks the user back to the guided install and they can rerun `guix system init`.  ***HOWEVER***, because the store is in a COW mode, this sometimes leaves the store in a wonky state and the `guix system init` performs the system build from 0, or it can fail.  Not to mention that this requires more keypresses from the user.

So, let me sketch proposed changes to `gnu/installer/final.scm`:

```patch
@@ -169,6 +169,15 @@ or #f.  Return #t on success and #f on failure."
                                   "/tmp/installer-system-init-options"
                                 read))
                             (const '())))
+         (build-command (append (list "guix" "system" "build"
+                                      "--fallback")
+                                options
+                                (list (%installer-configuration-file))))
+         (build-grub-command
+           (append (list "guix" "build"
+                         "--fallback"
+                         "grub" "grub-efi")
+                   options))
          (install-command (append (list "guix" "system" "init"
                                         "--fallback")
                                   options
@@ -178,6 +187,36 @@ or #f.  Return #t on success and #f on failure."
          (database-file (string-append database-dir "/db.sqlite"))
          (saved-database (string-append database-dir "/db.save"))
          (ret #f))
+
+    (define* (perform-install #:optional (tries 0))
+
+      (define (retry)
+        (perform-install (+ tries 1)))
+
+      (define (ask-if-retry)
+        ;; TODO.  Not sure best way to query user whether they
+        ;; would like to retry again.
+        )
+
+      (if (and (run-command build-command #:locale locale)
+               (run-command build-grub-command #:locale locale))
+          (run-command install-command #:locale locale)
+          ;; Try to recover.
+          (begin
+            (format #t "~%~%~s~%~s~%~%"
+                    (G_ "Failure while building system.")
+                    (G_ "This is usually caused by (hopefully transient) network errors."))
+            (cond
+              ((< tries %max-auto-system-build-retries)
+               (format #t "~s~%"
+                       (G_ "Will wait 3 seconds and retry..."))
+               (sleep 3)
+               (retry))
+              (else
+               #f)))))
+
     (mkdir-p (%installer-target-dir))

     ;; We want to initialize user passwords but we don't want to store them in
@@ -221,9 +260,8 @@ or #f.  Return #t on success and #f on failure."
               (lambda ()
                 (with-error-to-file "/dev/console"
                   (lambda ()
-                    (run-command install-command
-                                 #:locale locale)))))
-            (run-command install-command #:locale locale))))
+                    (perform-install)))))
+            (perform-install))))
       (lambda ()
         ;; Restart guix-daemon so that it does no keep the MNT namespace
         ;; alive.
```

Notes:

* `guix system build` only builds the *system*.  It doesn't build the bootloader.  I can't find a command that builds the bootloader; only `guix system init` or `guix system reconfigure` do that, but we need to differentiate the failure "downloading from the substituter failed" (which might be fixable by just retrying) from "writing to the device being installed into failed".
  * In the above I use `guix build grub grub-efi` as a proxy for this, but it would be nice if there were some kind of `guix system build-bootloader` that would perform *building* of the script that installs the bootloader, but doesn't actually install the bootloader *yet*.
* I don't know how best to ask the user if they want to retry the system building process.

Thanks
raid5atemyhomework
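
P.S. For reference, the "manual" workaround of rerunning `guix system build` until it pulls through can be sketched as a small shell helper. This is only a rough illustration and not part of the proposed patch: the function name `retry_build`, its argument order, and the configuration path in the usage comment are assumptions made for the example. Only `guix system build` and `--fallback` come from the text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix): run a command until it succeeds,
# giving up after a fixed number of attempts.
# Usage: retry_build MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry_build() {
    max_tries=$1
    delay=$2
    shift 2
    tries=1
    while :; do
        # Run the command exactly as given; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        # Stop retrying once the attempt budget is used up.
        if [ "$tries" -ge "$max_tries" ]; then
            echo "Giving up after $tries attempts." >&2
            return 1
        fi
        echo "Attempt $tries failed; retrying in $delay seconds..." >&2
        tries=$((tries + 1))
        sleep "$delay"
    done
}

# Example invocation (the configuration path is an assumption; adjust it):
#   retry_build 5 3 guix system build --fallback /mnt/etc/config.scm
```

Capping the number of attempts mirrors the `tries` counter in the patch sketch: a transient failure is retried automatically, while a permanent network error eventually returns control to the user instead of looping forever.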