Making the Guix installer resilient against transient network issues
From: raid5atemyhomework
Subject: Making the Guix installer resilient against transient network issues
Date: Fri, 19 Mar 2021 03:09:54 +0000
Hello Guix devel,
When `guix system init` fails, there are a number of possible causes:
* Packages being downloaded are so broken that they cannot actually be built.
  * QC should filter this out.
* The hardware being installed on is broken, usually a failure of the storage
  device being installed into.
* Downloading substitutes from the substitute server failed.
Of the above, the last is the most likely to occur in practice.
I have been doing a number of repeated installation tests on VMs using the
SJTUG mirror server, as well as the Berlin Cuirass, and a significant number of
installation attempts via the guided installer fail due to problems with
downloading substitutes.
* From my system, the Berlin Cuirass server is very slow (< 40 KiB/s,
sometimes as low as 4 KiB/s). Possibly because of the slowness, the download
gets interrupted partway through, which causes the install to fail.
* The SJTUG server sometimes responds in ways that the Guix downloader does not
expect, causing failures.
What I do instead is to use the "manual" mode and just keep doing `guix system
build` over and over until it manages to pull through.
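The manual workaround above amounts to an unbounded retry loop. A minimal sketch in POSIX shell follows; the helper name `retry_until_success`, the `RETRY_DELAY` variable, and the `/mnt/etc/config.scm` path in the usage comment are illustrative assumptions, not part of the installer:

```sh
#!/bin/sh
# Hypothetical helper: rerun a command until it exits successfully.
# RETRY_DELAY (seconds between attempts) is an assumed knob; defaults to 3.
retry_until_success() {
    attempt=1
    until "$@"; do
        echo "attempt $attempt failed; retrying in ${RETRY_DELAY:-3}s..." >&2
        sleep "${RETRY_DELAY:-3}"
        attempt=$((attempt + 1))
    done
    echo "succeeded after $attempt attempt(s)" >&2
}

# Intended use from the manual install shell (the path is an assumption):
#   retry_until_success guix system build /mnt/etc/config.scm
```

Because the loop never gives up, it only suits the case where someone is watching and can interrupt it when the network is down for good.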
I think that the guided installer should also use the same technique of trying
`guix system build` repeatedly for at least some number of tries, possibly
asking the user if they want to keep trying (in case the issue is a permanent
network error rather than a transient network error).
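The policy described here (a few automatic tries, then a prompt) can be sketched in shell for the manual mode. This is only an illustration of the control flow, not how the guided installer would implement it; `retry_or_ask`, `MAX_TRIES`, and `RETRY_DELAY` are hypothetical names:

```sh
#!/bin/sh
# Hypothetical helper: try a command up to MAX_TRIES times automatically,
# then ask the user on stdin whether to start another round of tries.
# Returns 0 once the command succeeds, 1 if the user gives up.
MAX_TRIES=${MAX_TRIES:-3}
RETRY_DELAY=${RETRY_DELAY:-3}

retry_or_ask() {
    while :; do
        tries=0
        while [ "$tries" -lt "$MAX_TRIES" ]; do
            if "$@"; then
                return 0
            fi
            tries=$((tries + 1))
            echo "build failed (attempt $tries of $MAX_TRIES)" >&2
            if [ "$tries" -lt "$MAX_TRIES" ]; then
                sleep "$RETRY_DELAY"
            fi
        done
        printf 'Still failing; keep trying? [y/N] ' >&2
        # End of input counts as "no".
        read -r answer || return 1
        case $answer in
            [Yy]*) ;;        # user wants another round
            *) return 1 ;;   # anything else means give up
        esac
    done
}

# Intended use (the path is an assumption):
#   retry_or_ask guix system build /mnt/etc/config.scm
```

Defaulting the prompt to "no" means an unattended install with a permanently broken network stops instead of looping forever.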
Yes, currently a failure to install "just" kicks the user back to the guided
install and they can rerun `guix system init`. ***HOWEVER***, because the
store is in a copy-on-write (COW) mode, this sometimes leaves the store in a
wonky state, and the rerun `guix system init` either performs the system build
from scratch or fails outright. Not to mention that this requires more
keypresses from the user.
So, let me sketch proposed changes to `gnu/installer/final.scm`:
```patch
@@ -169,6 +169,15 @@ or #f. Return #t on success and #f on failure."
"/tmp/installer-system-init-options"
read))
(const '())))
+ (build-command (append (list "guix" "system" "build"
+ "--fallback")
+ options
+ (list (%installer-configuration-file))))
+ (build-grub-command
+ (append (list "guix" "build"
+ "--fallback"
+ "grub" "grub-efi")
+ options))
(install-command (append (list "guix" "system" "init"
"--fallback")
options
@@ -178,6 +187,36 @@ or #f. Return #t on success and #f on failure."
(database-file (string-append database-dir "/db.sqlite"))
(saved-database (string-append database-dir "/db.save"))
(ret #f))
+
+ (define* (perform-install #:optional (tries 0))
+
+ (define (retry)
+ (perform-install (+ tries 1)))
+
+ (define (ask-if-retry)
+ ;; TODO. Not sure best way to query user whether they
+ ;; would like to retry again.
+ )
+
+ (if (and (run-command build-command #:locale locale)
+ (run-command build-grub-command #:locale locale))
+ (run-command install-command #:locale locale)
+ ;; Try to recover.
+ (begin
+ (format #t "~%~%~a~%~a~%~%"
+ (G_ "Failure while building system.")
+ (G_ "This is usually caused by (hopefully transient) network errors."))
+ (cond
+ ((< tries %max-auto-system-build-retries)
+ (format #t "~a~%"
+ (G_ "Will wait 3 seconds and retry..."))
+ (sleep 3)
+ (retry))
+ (else
+ #f)))))
+
(mkdir-p (%installer-target-dir))
;; We want to initialize user passwords but we don't want to store them in
@@ -221,9 +260,8 @@ or #f. Return #t on success and #f on failure."
(lambda ()
(with-error-to-file "/dev/console"
(lambda ()
- (run-command install-command
- #:locale locale)))))
- (run-command install-command #:locale locale))))
+ (perform-install)))))
+ (perform-install))))
(lambda ()
;; Restart guix-daemon so that it does no keep the MNT namespace
;; alive.
```
Notes:
* `guix system build` only builds the *system*. It doesn't build the
bootloader. I can't find a command that builds the bootloader; only `guix
system init` or `guix system reconfigure` do that, but we need to distinguish
the failure "downloading from the substitute server failed" (which might be
fixable by just retrying) from "writing to the device being installed into
failed".
* In the above I use `guix build grub grub-efi` as a proxy for this, but it
would be nice if there were some kind of `guix system build-bootloader` that
would perform *building* of the script that installs the bootloader, but
doesn't actually install the bootloader *yet*.
* I don't know how best to ask the user if they want to retry the system
building process.
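To illustrate the split between the two failure classes in the notes above, here is a rough shell sketch of the same phases the patch uses: the network-dependent build steps are retried, and `guix system init` runs only once they succeed. The function name, the exit codes 10 and 20, and the `GUIX`, `BUILD_TRIES`, and `RETRY_DELAY` variables are assumptions made for this example:

```sh
#!/bin/sh
# GUIX is overridable so the sketch can be exercised without a real guix.
GUIX=${GUIX:-guix}
BUILD_TRIES=${BUILD_TRIES:-3}

# install_in_phases CONFIG TARGET
# Returns 0 on success, 10 if the build phase (substitute downloads) keeps
# failing, 20 if the build phase succeeded but `guix system init` failed.
install_in_phases() {
    config=$1
    target=$2
    attempt=1
    until "$GUIX" system build --fallback "$config" &&
          "$GUIX" build --fallback grub grub-efi; do
        if [ "$attempt" -ge "$BUILD_TRIES" ]; then
            echo "build phase failed $attempt times; likely a network problem" >&2
            return 10
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-3}"
    done
    # Most of what init needs should now be in the store, so a failure here
    # more likely points at the target device than at substitute downloads.
    "$GUIX" system init --fallback "$config" "$target" || {
        echo "init phase failed; check the target device" >&2
        return 20
    }
}

# Intended use (paths are assumptions):
#   install_in_phases /mnt/etc/config.scm /mnt
```

As the notes say, building `grub` and `grub-efi` is only a proxy for the bootloader step, so exit code 20 can still hide a download failure in whatever `guix system init` fetches beyond those packages.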
Thanks
raid5atemyhomework