* Making the Guix installer resilient against transient network issues
@ 2021-03-19 3:09 raid5atemyhomework
From: raid5atemyhomework @ 2021-03-19 3:09 UTC (permalink / raw)
To: guix-devel@gnu.org, Mathieu Othacehe
Hello Guix devel,
When `guix system init` fails, there are a number of possible causes of failure:
* Packages being downloaded are so broken that they cannot actually be built.
  * QC should filter this out.
* The hardware being installed on is broken, usually a failure of the storage device being installed into.
* Downloading substitutes from the substitute server failed.
Of the above, the last is the most likely to occur in practice.
I have been doing a number of repeated installation tests on VMs using the SJTUG mirror server, as well as the Berlin Cuirass, and a significant number of installation attempts via the guided installer fail due to problems with downloading substitutes.
* From my system, the Berlin Cuirass server is very slow (< 40kiB/s, sometimes as low as 4kiB/s). Possibly because of the slowness, downloads get interrupted partway through, which causes the install to fail.
* The SJTUG server sometimes responds in ways that the Guix downloader does not expect, causing failures.
What I do instead is to use the "manual" mode and just keep doing `guix system build` over and over until it manages to pull through.
I think that the guided installer should also use the same technique of trying `guix system build` repeatedly for at least some number of tries, possibly asking the user if they want to keep trying (in case the issue is a permanent network error rather than a transient network error).
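As a rough illustration of that manual technique, here is a minimal POSIX shell sketch: retry a command a bounded number of times, then ask whether to keep going. The function name `retry_build`, the limits, and the configuration path in the usage comment are assumptions for illustration only; none of this is part of the installer.

```shell
#!/bin/sh
# retry_build MAX DELAY CMD...
# Run CMD until it succeeds.  After MAX consecutive failures, ask on
# stdin whether to start another round of MAX attempts.
# Hypothetical helper, not something Guix provides.
retry_build() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            printf 'Still failing after %s attempts. Keep trying? [y/N] ' "$attempt" >&2
            # End of input counts as "no".
            read -r answer || return 1
            case $answer in
                [Yy]*) attempt=0 ;;   # start a fresh round
                *) return 1 ;;
            esac
        else
            echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
}

# Example use from the manual installation shell (path is an assumption):
#   retry_build 5 3 guix system build --fallback /mnt/etc/config.scm
```

A permanent network error still ends the loop, because the prompt defaults to "no".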
Yes, currently a failure to install "just" kicks the user back to the guided install, and they can rerun `guix system init`. ***HOWEVER***, because the store is in copy-on-write (COW) mode, this sometimes leaves the store in a wonky state, and `guix system init` then either performs the system build from scratch or fails outright. Not to mention that this requires more keypresses from the user.
So, let me sketch proposed changes to `gnu/installer/final.scm`:
```patch
@@ -169,6 +169,15 @@ or #f. Return #t on success and #f on failure."
"/tmp/installer-system-init-options"
read))
(const '())))
+ (build-command (append (list "guix" "system" "build"
+ "--fallback")
+ options
+ (list (%installer-configuration-file))))
+ (build-grub-command
+ (append (list "guix" "build"
+ "--fallback"
+ "grub" "grub-efi")
+ options))
(install-command (append (list "guix" "system" "init"
"--fallback")
options
@@ -178,6 +187,36 @@ or #f. Return #t on success and #f on failure."
(database-file (string-append database-dir "/db.sqlite"))
(saved-database (string-append database-dir "/db.save"))
(ret #f))
+
+    (define* (perform-install #:optional (tries 0))
+
+      (define (retry)
+        (perform-install (+ tries 1)))
+
+      (define (ask-if-retry)
+        ;; TODO.  Not sure best way to query user whether they
+        ;; would like to retry again.  Until then, never retry.
+        #f)
+
+      (if (and (run-command build-command #:locale locale)
+               (run-command build-grub-command #:locale locale))
+          (run-command install-command #:locale locale)
+          ;; Try to recover.
+          (begin
+            (format #t "~%~%~a~%~a~%~%"
+                    (G_ "Failure while building system.")
+                    (G_ "This is usually caused by (hopefully transient) network errors."))
+            (cond
+             ((< tries %max-auto-system-build-retries)
+              (format #t "~a~%"
+                      (G_ "Will wait 3 seconds and retry..."))
+              (sleep 3)
+              (retry))
+             (else
+              #f)))))
+
(mkdir-p (%installer-target-dir))
;; We want to initialize user passwords but we don't want to store them in
@@ -221,9 +260,8 @@ or #f. Return #t on success and #f on failure."
(lambda ()
(with-error-to-file "/dev/console"
(lambda ()
- (run-command install-command
- #:locale locale)))))
- (run-command install-command #:locale locale))))
+ (perform-install)))))
+ (perform-install))))
(lambda ()
;; Restart guix-daemon so that it does no keep the MNT namespace
;; alive.
```
Notes:
* `guix system build` only builds the *system*; it doesn't build the bootloader. I can't find a command that builds the bootloader: only `guix system init` or `guix system reconfigure` do that. But we need to differentiate the failure "downloading from the substituter failed" (which might be fixable by just retrying) from "writing to the device being installed into failed".
  * In the above I use `guix build grub grub-efi` as a proxy for this, but it would be nice if there were some kind of `guix system build-bootloader` that would perform *building* of the script that installs the bootloader, but doesn't actually install the bootloader *yet*.
* I don't know how best to ask the user if they want to retry the system building process.
Thanks
raid5atemyhomework