unofficial mirror of bug-guix@gnu.org 
* bug#71144: Interactive prompt opened upon shepherd config file error
@ 2024-05-23 10:59 Ludovic Courtès
  2024-05-25  9:11 ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2024-05-23 10:59 UTC (permalink / raw)
  To: 71144; +Cc: Andreas Enge, Christopher Baines

Hello,

One problem we noticed in the analysis of the boot problem of bayfront
after the recent downtime¹ is that an interactive REPL would be opened
after an unbound variable was found in the shepherd config file:

--8<---------------cut here---------------start------------->8---
[   13.098907] shepherd[1]: Service root started.
[   13.100711] shepherd[1]: Service root running with value #t.
[   13.103824] shepherd[1]: Service root has been started.
[   13.426102] shepherd[1]: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
[   13.428099] shepherd[1]: Unbound variable: make-forkexec-constructor/container
[   13.429912] shepherd[1]: 
[   13.431108] shepherd[1]: Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
[   13.441983] shepherd[1]: GNU Guile 3.0.9
[   13.442728] shepherd[1]: Copyright (C) 1995-2023 Free Software Foundation, Inc.
[   13.443947] shepherd[1]: 
[   13.444427] shepherd[1]: Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
[   13.445679] shepherd[1]: This program is free software, and you are welcome to redistribute it
[   13.446919] shepherd[1]: under certain conditions; type `,show c' for 
[   13.447072] shepherd[1]: details.
[   13.448737] shepherd[1]: 
[   13.449239] shepherd[1]: Enter `,help' for help.
--8<---------------cut here---------------end--------------->8---

This was unhelpful because we couldn’t interact with that REPL remotely
(no IPMI).  Even when you can interact, it’s of limited use; in this
case, if you type “,q”, it tries to continue and fails:

--8<---------------cut here---------------start------------->8---
Uncaught exception in task:
In fibers.scm:
    172:8  7 (_)
In ice-9/exceptions.scm:
   406:15  6 (_)
In ice-9/boot-9.scm:
  1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In shepherd/service.scm:
   824:39  4 (_)
--8<---------------cut here---------------end--------------->8---

This is because we’re effectively adding #f in the middle of the list
passed to ‘register-services’ (see below).

This REPL-on-error “feature” comes from Guix System, not Shepherd, in
the config file generated from (gnu services shepherd):

    ;; Arrange to spawn a REPL if something goes wrong.  This is better
    ;; than a kernel panic.
    (call-with-error-handling
      (lambda ()
        (register-services
         (parameterize ((current-warning-port
                         (%make-void-port "w")))
           (map (lambda (file)
                  (save-module-excursion
                   (lambda ()
                     (set-current-module (make-user-module))
                     (load-compiled file))))
                '#$(map scm->go files))))))

The rationale mentioned in the comment no longer holds: starting from
Shepherd 0.10.2, the config file is loaded in the background; if its
evaluation fails, shepherd keeps running (see
‘tests/config-failure.sh’, which tests this behavior).

I think we should change the above to log and gracefully handle failure
to load an individual service file.
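
To make the #f problem and the proposed per-file handling concrete, here
is a minimal sketch that runs in plain Guile 3.0.  The ‘loaders’ thunks
and the ‘load-or-false’ helper are hypothetical stand-ins for calling
‘load-compiled’ on each service file; they are not part of Guix or the
Shepherd.

```scheme
(use-modules (srfi srfi-1))             ;for 'filter-map'

;; Hypothetical stand-ins for 'load-compiled' on individual service
;; files: each thunk either returns a "service" or raises an exception.
(define loaders
  (list (lambda () 'root-file-system)
        (lambda ()
          (error "Unbound variable" 'make-forkexec-constructor/container))
        (lambda () 'host-name)))

(define (load-or-false thunk)
  ;; Return the result of THUNK, or #f if it raises an exception.
  ;; With #:unwind? #t, the handler's return value becomes the value
  ;; of the whole 'with-exception-handler' expression.
  (with-exception-handler
      (lambda (exception)
        (format (current-error-port)
                "Exception caught while loading: ~s~%" exception)
        #f)
    thunk
    #:unwind? #t))

;; With 'map', the failed load leaves #f in the middle of the list.
(define with-map (map load-or-false loaders))

;; With 'filter-map', #f results are dropped from the list.
(define with-filter-map (filter-map load-or-false loaders))
```

Here ‘with-map’ still contains #f where the second loader failed,
whereas ‘with-filter-map’ only holds the two values that loaded.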

Ludo’.

¹ https://lists.gnu.org/archive/html/info-guix/2024-05/msg00000.html





* bug#71144: Interactive prompt opened upon shepherd config file error
  2024-05-23 10:59 bug#71144: Interactive prompt opened upon shepherd config file error Ludovic Courtès
@ 2024-05-25  9:11 ` Ludovic Courtès
  2024-05-25 12:02   ` Christopher Baines
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2024-05-25  9:11 UTC (permalink / raw)
  To: 71144; +Cc: Andreas Enge, Christopher Baines


Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> I think we should change the above to log and gracefully handle failure
> to load an individual service file.

With the change below, every service except the offending one is loaded
and started as expected:

--8<---------------cut here---------------start------------->8---
[   22.450515] shepherd[1]: Service root running with value #t.
[   22.454624] shepherd[1]: Service root has been started.
[   22.711738] shepherd[1]: Exception caught while loading '/gnu/store/fjis6iqpjfcnr90fy8rsg9v4j828jslv-shepherd-gwl-web.go': #<&compound-exception components: (#<&undefined-variable> #<&origin origin: #f> #<&message message: "Unbound variable: ~S"> #<&irritants irri
[   22.711839] tants: (make-forkexec-constructor/container)> #<&exception-with-kind-and-args kind: unbound-variable args: (#f "Unbound variable: ~S" (make-forkexec-constructor/container) #f)>)>
[   22.755146] shepherd[1]: starting services...
[   22.756491] shepherd[1]: Configuration successfully loaded from '/gnu/store/mq7y31xnjcjwjkyf6w7qiaq61g6n9f5x-shepherd.conf'.
Uncaught exception in task:
In fibers.scm:
    172:8  7 (_)
In ice-9/exceptions.scm:
   406:15  6 (_)
In ice-9/boot-9.scm:
  1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In shepherd/service.scm:
   824:39  4 (_)
In oop/goops.scm:
  1567:11  3 (cache-miss #f)
   1585:2  2 (_ _ _)
In ice-9/boot-9.scm:
  1685:16  1 (raise-exception _ #:continuable? _)
  1683:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1683:16: In procedure raise-exception:
No applicable method for #<<generic> one-shot-service? (1)> in call (one-shot-service? #f)
[   22.798737] shepherd[1]: Starting service user-file-systems...
[   22.800361] shepherd[1]: Starting service root-file-system...
[   22.802015] shepherd[1]: Starting service host-name...
[   22.803688] shepherd[1]: Starting service pam...
[   22.805372] shepherd[1]: Starting service sysctl...
[   22.806926] shepherd[1]: Starting service loopback...
[   22.808225] shepherd[1]: Starting service firewall...
--8<---------------cut here---------------end--------------->8---

(There’s still this scary-looking but harmless backtrace in the middle:
that’s because (start-in-the-background '(something-that-does-not-exist))
throws like that as of 0.10.4.)
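
The “No applicable method” message in the log is what GOOPS reports when
a generic function is applied to an argument that none of its methods
accepts, here #f.  A toy reproduction in plain Guile follows; the
‘<toy-service>’ class, the ‘toy-one-shot?’ generic, and the
‘dispatch-fails?’ helper are hypothetical illustrations, not the
Shepherd’s actual definitions.

```scheme
(use-modules (oop goops))

;; Hypothetical class standing in for a service object.
(define-class <toy-service> ()
  (one-shot? #:init-keyword #:one-shot? #:init-value #f))

;; Generic with a single method, specialized on <toy-service> only.
(define-method (toy-one-shot? (service <toy-service>))
  (slot-ref service 'one-shot?))

(define (dispatch-fails? obj)
  ;; Return #t if applying 'toy-one-shot?' to OBJ raises an exception,
  ;; #f if the call goes through.
  (catch #t
    (lambda ()
      (toy-one-shot? obj)
      #f)
    (lambda (key . args)
      #t)))

;; A real instance dispatches fine; #f has no applicable method.
(define instance-fails? (dispatch-fails? (make <toy-service>)))
(define false-fails? (dispatch-fails? #f))
```

In this sketch ‘instance-fails?’ is #f and ‘false-fails?’ is #t, which
mirrors the call (one-shot-service? #f) shown in the backtrace.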

Once booted, shepherd is fine and you can interact normally with it; the
only thing missing is, in this case, the ‘gwl-web’ service, which we
failed to load.

I think that’s a significant improvement.

Thoughts?

Ludo’.


[-- Attachment #2: Type: text/x-patch, Size: 2153 bytes --]

diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index 455e972535d..f13c52c37ba 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -380,8 +380,7 @@ (define (shepherd-configuration-file services shepherd)
         (scm->go (cute scm->go <> shepherd)))
     (define config
       #~(begin
-          (use-modules (srfi srfi-34)
-                       (system repl error-handling))
+          (use-modules (srfi srfi-1))
 
           (define (make-user-module)
             ;; Copied from (shepherd support), where it's private.
@@ -417,17 +416,22 @@ (define (shepherd-configuration-file services shepherd)
 
           ;; Arrange to spawn a REPL if something goes wrong.  This is better
           ;; than a kernel panic.
-          (call-with-error-handling
-            (lambda ()
-              (register-services
-               (parameterize ((current-warning-port
-                               (%make-void-port "w")))
-                 (map (lambda (file)
-                        (save-module-excursion
-                         (lambda ()
-                           (set-current-module (make-user-module))
-                           (load-compiled file))))
-                      '#$(map scm->go files))))))
+          (register-services
+           (parameterize ((current-warning-port (%make-void-port "w")))
+             (filter-map (lambda (file)
+                           (with-exception-handler
+                               (lambda (exception)
+                                 (format #t "Exception caught \
+while loading '~a': ~s~%"
+                                         file exception)
+                                 #f)
+                             (lambda ()
+                               (save-module-excursion
+                                (lambda ()
+                                  (set-current-module (make-user-module))
+                                  (load-compiled file))))
+                             #:unwind? #t))
+                         '#$(map scm->go files))))
 
           (format #t "starting services...~%")
           (let ((services-to-start


* bug#71144: Interactive prompt opened upon shepherd config file error
  2024-05-25  9:11 ` Ludovic Courtès
@ 2024-05-25 12:02   ` Christopher Baines
  2024-05-25 14:59     ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Baines @ 2024-05-25 12:02 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 71144, Andreas Enge


Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludovic.courtes@inria.fr> writes:
>
>> I think we should change the above to log and gracefully handle failure
>> to load an individual service file.
>
> With the change below, every service except the offending one is loaded
> and started as expected:
>
> [...]
>
> I think that’s a significant improvement.
>
> Thoughts?

That looks good to me, the "Arrange to spawn a REPL if something goes
wrong" comment needs removing/updating, but that's the only thing I
spotted.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* bug#71144: Interactive prompt opened upon shepherd config file error
  2024-05-25 12:02   ` Christopher Baines
@ 2024-05-25 14:59     ` Ludovic Courtès
  0 siblings, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2024-05-25 14:59 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 71144-done, Andreas Enge

Christopher Baines <mail@cbaines.net> writes:

> That looks good to me, the "Arrange to spawn a REPL if something goes
> wrong" comment needs removing/updating, but that's the only thing I
> spotted.

Cool.  I updated the comment and pushed it as
cca25a67693bb68a1884a081b415a43fad1e8641.

Thanks!

Ludo’.



