* Making the Guix installer resilient against transient network issues
From: raid5atemyhomework @ 2021-03-19  3:09 UTC
  To: guix-devel@gnu.org, Mathieu Othacehe

Hello Guix devel,

When `guix system init` fails, there are several possible causes:

* Packages being downloaded are so broken that they cannot actually be built.
  * QC should filter this out.
* The hardware being installed on is broken, usually a failure of the storage device being installed into.
* Downloading substitutes from the substitute server failed.

Of the above, the last is the most likely to occur in practice.
I have been running repeated installation tests on VMs, using both the SJTUG mirror server and the Berlin Cuirass, and a significant number of installation attempts via the guided installer fail because of problems downloading substitutes.

* From my system, the Berlin Cuirass server is very slow (< 40 kiB/s, sometimes as low as 4 kiB/s). Possibly because of this slowness, downloads get interrupted partway through, which causes the install to fail.
* The SJTUG server sometimes responds in ways that the Guix downloader does not expect, causing failures.

What I do instead is to use the "manual" mode and just keep doing `guix system build` over and over until it manages to pull through.
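That manual workaround amounts to a bounded retry loop. A minimal sketch in POSIX shell follows; the `retry` function name, the attempt limit, and the delay are illustrative assumptions and are not part of Guix.

```shell
#!/bin/sh
# retry MAX DELAY COMMAND...
# Run COMMAND until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts.  Returns 0 on success, 1 if every attempt failed.
# (Hypothetical helper for illustration; not part of Guix.)
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
        # Only sleep when another attempt will follow.
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

It could be used as `retry 10 3 guix system build --fallback /mnt/etc/config.scm`, where the configuration path is an assumption about where the system configuration lives during a manual install.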

I think that the guided installer should also use the same technique of trying `guix system build` repeatedly for at least some number of tries, possibly asking the user if they want to keep trying (in case the issue is a permanent network error rather than a transient network error).

Yes, currently a failed install "just" kicks the user back to the guided installer, where they can rerun `guix system init`.  ***HOWEVER***, because the store is in a COW mode, this sometimes leaves the store in a wonky state: `guix system init` then either redoes the system build from scratch or fails.  Not to mention that this requires more keypresses from the user.

So, let me sketch proposed changes to `gnu/installer/final.scm`:

```patch
@@ -169,6 +169,15 @@ or #f.  Return #t on success and #f on failure."
                                   "/tmp/installer-system-init-options"
                                 read))
                             (const '())))
+         (build-command   (append (list "guix" "system" "build"
+                                        "--fallback")
+                                  options
+                                  (list (%installer-configuration-file))))
+         (build-grub-command
+                          (append (list "guix" "build"
+                                        "--fallback"
+                                        "grub" "grub-efi")
+                                  options))
          (install-command (append (list "guix" "system" "init"
                                         "--fallback")
                                   options
@@ -178,6 +187,36 @@ or #f.  Return #t on success and #f on failure."
          (database-file   (string-append database-dir "/db.sqlite"))
          (saved-database  (string-append database-dir "/db.save"))
          (ret             #f))
+
+    ;; Number of automatic retries before giving up (assumed value).
+    (define %max-auto-system-build-retries 3)
+    (define* (perform-install #:optional (tries 0))
+
+      (define (retry)
+        (perform-install (+ tries 1)))
+
+      (define (ask-if-retry)
+        ;; TODO: query the user whether to retry; not sure of the best
+        ;; way to do this from the installer.  Until then, do not retry.
+        #f)
+
+      (if (and (run-command build-command #:locale locale)
+               (run-command build-grub-command #:locale locale))
+          (run-command install-command #:locale locale)
+          ;; Try to recover.
+          (begin
+            (format #t "~%~%~a~%~a~%~%"
+                    (G_ "Failure while building system.")
+                    (G_ "This is usually caused by (hopefully transient) network errors."))
+            (cond
+              ((< tries %max-auto-system-build-retries)
+               (format #t "~a~%"
+                       (G_ "Will wait 3 seconds and retry..."))
+               (sleep 3)
+               (retry))
+              (else
+               #f)))))
+
     (mkdir-p (%installer-target-dir))

     ;; We want to initialize user passwords but we don't want to store them in
@@ -221,9 +260,8 @@ or #f.  Return #t on success and #f on failure."
                          (lambda ()
                            (with-error-to-file "/dev/console"
                              (lambda ()
-                               (run-command install-command
-                                            #:locale locale)))))
-                       (run-command install-command #:locale locale))))
+                               (perform-install)))))
+                       (perform-install))))
            (lambda ()
              ;; Restart guix-daemon so that it does no keep the MNT namespace
              ;; alive.
```

Notes:

* `guix system build` only builds the *system*; it doesn't build the bootloader.  I can't find a command that only builds the bootloader: `guix system init` and `guix system reconfigure` build it, but they also install it.  We need to distinguish the failure "downloading from the substituter failed" (which might be fixable by just retrying) from "writing to the device being installed into failed".
  * In the above I use `guix build grub grub-efi` as a proxy for this, but it would be nice to have some kind of `guix system build-bootloader` that *builds* the script that installs the bootloader without actually installing the bootloader *yet*.
* I don't know how best to ask the user if they want to retry the system building process.
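One possible shape for the retry question, sketched in shell purely to show the control flow. The function names `ask_retry` and `build_with_prompt` are hypothetical, and the real installer would have to ask through its own UI rather than through `read`.

```shell
#!/bin/sh
# ask_retry: print a prompt on stderr and read one line from stdin.
# Returns 0 for "yes" (also the default on empty input), 1 for anything
# else or on end of input.  (Hypothetical helper, not part of Guix.)
ask_retry() {
    printf 'Building the system failed.  Retry? [Y/n] ' >&2
    read -r answer || return 1
    case "$answer" in
        ""|[Yy]*) return 0 ;;
        *)        return 1 ;;
    esac
}

# build_with_prompt COMMAND...
# Rerun COMMAND until it succeeds or the user declines to retry.
build_with_prompt() {
    until "$@"; do
        ask_retry || return 1
    done
    return 0
}
```

Defaulting to "yes" on empty input keeps the number of keypresses low when the failure is transient, while a permanent network error can still be escaped with a single "n".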


Thanks
raid5atemyhomework

