unofficial mirror of bug-guix@gnu.org 
* bug#33968: errors in shepherd service constructors are not logged and lead to misleading status
@ 2019-01-03 21:36 Florian Dold
  2021-05-20  3:00 ` bug#33968: errors in shepherd service constructors are not logged and lead to misleading status, hang boot Maxim Cournoyer
  2023-06-15 21:15 ` bug#33968: errors in shepherd service constructors are not logged and lead to misleading status Ludovic Courtès
  0 siblings, 2 replies; 3+ messages in thread
From: Florian Dold @ 2019-01-03 21:36 UTC
  To: 33968

Hi Guix,

when defining a service type that extends shepherd-root-service-type and
the 'start' function of the shepherd-service definition contains an
error, the error is silently ignored.  No log output is generated at all.

For example (full system definition is attached):

(define (errtest-shepherd-service c)
  (list
    (shepherd-service
      (provision '(errtest))
      (documentation "Errtest")
      (requirement '(file-systems))
      (modules `((shepherd support) (ice-9 match) ,@%default-modules))
      (start #~(lambda args
                 (local-output "errtest start")
                 this-is-an-unbound-variable
                 (local-output "errtest end")
                 #t)))))


The log message "errtest start" appears in /var/log/messages, as
expected.  The next line contains an error, and aborts execution of the
start function.

The error only becomes apparent when manually running "herd restart
errtest", which shows an error message (but without any error location
or stack trace).  The unbound-variable error itself is never logged, and
no log gives any indication that the service could not be started.
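
Until shepherd logs such errors itself, one possible workaround is to
catch them inside the 'start' procedure.  The following is only a sketch
and not part of the original report: it assumes that local-output accepts
a format string followed by arguments, and it relies on the convention
that a 'start' procedure returning #f marks the service as not started.

```scheme
(use-modules (gnu))
(use-service-modules shepherd)

;; Variant of the errtest service whose 'start' procedure reports its own
;; failures instead of relying on shepherd to log them.
(define (errtest-shepherd-service c)
  (list
    (shepherd-service
      (provision '(errtest))
      (documentation "Errtest, with errors in 'start' caught and logged")
      (requirement '(file-systems))
      (modules `((shepherd support) ,@%default-modules))
      (start #~(lambda args
                 (catch #t
                   (lambda ()
                     (local-output "errtest start")
                     this-is-an-unbound-variable
                     (local-output "errtest end")
                     #t)
                   (lambda (key . params)
                     ;; Log the exception key and its arguments, then
                     ;; return #f so that the start counts as failed.
                     (local-output "errtest failed to start: ~s ~s"
                                   key params)
                     #f)))))))
```

With this wrapper the failure should at least leave a line in the log
next to "errtest start".  It does not address the misleading "herd
status" output described below.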

Furthermore the "herd status" of a service that encountered an error in
the start function is very misleading:

root@errtest ~# herd status errtest
Status of errtest:
  It is stopped.
  It is enabled.
  Provides (errtest).
  Requires (file-systems).
  Conflicts with ().
  Will be respawned.


It shows "Will be respawned", which is wrong.

I'd be happy to work on a patch, but some design discussion seems
necessary first, in particular about how "Will be respawned" should be
handled.  Services have a "respawn?" flag, but of course respawning can
only work if the start function executed successfully in the first place
(and only the service process itself failed afterwards).

I generally feel like the state machine for services needs some work.
In particular, it would be useful to distinguish between "failed" and
"completed" services instead of conflating both states into "stopped".
Or maybe have some more general mechanism for storing state about the
service, instead of just the slot that usually contains the PID?
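
To make the proposed distinction concrete, here is a small, purely
hypothetical model in plain Guile.  The status names (stopped, running,
failed, completed), the event names, and the procedures next-status and
will-respawn? are all invented for illustration; this is not how shepherd
represents service state.

```scheme
(use-modules (ice-9 match))

;; Hypothetical model: none of these status or event names come from
;; shepherd itself.
(define (next-status status event)
  "Return the status that follows STATUS once EVENT has occurred."
  (match (list status event)
    (((or 'stopped 'failed 'completed) 'start-succeeded) 'running)
    (((or 'stopped 'failed 'completed) 'start-failed)    'failed)
    (('running 'process-crashed)  'failed)
    (('running 'process-finished) 'completed)
    (('running 'stop-requested)   'stopped)
    (_ status)))                        ;ignore events that do not apply

(define (will-respawn? status respawn?)
  "Respawning is only meaningful for a service that actually started."
  (and respawn? (eq? status 'running)))
```

In this model, (next-status 'stopped 'start-failed) yields the symbol
failed rather than stopped, and (will-respawn? 'failed #t) is #f, so a
status report built on it would not claim that a service whose 'start'
procedure failed will be respawned.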

- Florian

[-- Attachment #2: config-error-reporting.scm --]
[-- Type: text/x-scheme; name="config-error-reporting.scm", Size: 1253 bytes --]

(use-modules (gnu))
(use-service-modules networking ssh shepherd)

(define (errtest-shepherd-service c)
  (list
    (shepherd-service
      (provision '(errtest))
      (documentation "Errtest")
      (requirement '(file-systems))
      (modules `((shepherd support) (ice-9 match) ,@%default-modules))
      (start #~(lambda args
                 (local-output "errtest start")
                 this-is-an-unbound-variable
                 (local-output "errtest end")
                 #t)))))

(define errtest-service-type
  (service-type
   (name 'errtest)
   (extensions
    (list (service-extension shepherd-root-service-type errtest-shepherd-service)))
   (default-value #t)))

(operating-system
  (host-name "errtest")
  (timezone "Europe/Berlin")
  (locale "en_US.utf8")
  (kernel-arguments (list "console=ttyS0" "console=tty0"))
  (bootloader (bootloader-configuration
                (bootloader grub-bootloader)
                (target "/dev/sdX")))
  (file-systems (cons (file-system
                        (device (file-system-label "my-root"))
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))

  (services (cons* (service errtest-service-type)
                   %base-services)))

* bug#33968: errors in shepherd service constructors are not logged and lead to misleading status, hang boot
From: Maxim Cournoyer @ 2021-05-20  3:00 UTC
  To: Florian Dold; +Cc: 33968

Hi Florian,

I stumbled upon this problem with https://issues.guix.gnu.org/48521, and
had a hard time debugging it (because the relevant information is
completely missing from any output or log from shepherd).

Worse, this caused the system to hang early on boot!

I'm raising the priority of this issue.

Thanks,

Maxim

* bug#33968: errors in shepherd service constructors are not logged and lead to misleading status
From: Ludovic Courtès @ 2023-06-15 21:15 UTC
  To: 33968-done

Florian Dold <florian.dold@gmail.com> skribis:

> when defining a service type that extends shepherd-root-service-type and
> the 'start' function of the shepherd-service definition contains an
> error, the error is silently ignored.  No log output is generated at all.

[...]

> I generally feel like the state machine for services needs some work.
> In particular, it would be useful to distinguish between "failed" and
> "completed" services instead of conflating both states into "stopped".
> Or maybe have some more general mechanism for storing state about the
> service, instead of just the slot that usually contains the PID?

It’s been 4 years (!) but the good news is that all this is fixed as of
Shepherd 0.10.  Closing!

Ludo’.
