From mboxrd@z Thu Jan 1 00:00:00 1970
From: Florian Dold
Subject: bug#33968: errors in shepherd service constructors are not logged and lead to misleading status
Date: Thu, 3 Jan 2019 22:36:20 +0100
Message-ID: <7c7f7030-a0f2-5fd0-7d02-f203277d7bba@gmail.com>
To: 33968@debbugs.gnu.org
List-Id: Bug reports for GNU Guix
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------127386F850B9CE8C8EAF01A8"
Content-Language: en-US-large

This is a multi-part message in MIME format.
--------------127386F850B9CE8C8EAF01A8
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Hi Guix,

when defining a service type that extends shepherd-root-service-type
and the 'start' function of the shepherd-service definition contains an
error, the error is silently ignored. No log output is generated at
all.

For example (full system definition is attached):

(define (errtest-shepherd-service c)
  (list (shepherd-service
         (provision '(errtest))
         (documentation "Errtest")
         (requirement '(file-systems))
         (modules `((shepherd support)
                    (ice-9 match)
                    ,@%default-modules))
         (start #~(lambda args
                    (local-output "errtest start")
                    ;; deliberate error: this variable is unbound
                    this-is-an-unbound-variable
                    (local-output "errtest end")
                    #t)))))

The log message "errtest start" appears in /var/log/messages, as
expected. The next line contains an error, which aborts execution of
the start function.

The error only becomes apparent when manually doing a "herd restart
errtest", which shows an error message (but without any error location
or stack trace). The error itself (the unbound variable) is not logged,
and there is no indication in any log that the service couldn't be
started.

Furthermore, the "herd status" of a service that encountered an error
in the start function is very misleading:

root@errtest ~# herd status errtest
Status of errtest:
  It is stopped.
  It is enabled.
  Provides (errtest).
  Requires (file-systems).
  Conflicts with ().
  Will be respawned.

It shows "Will be respawned", which is wrong.
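
One conceivable stopgap, sketched here and not tested, would be to catch
the exception inside the start function and log it explicitly. This is
only a variant of the 'start' field from the example above, not a
proposed fix for shepherd itself. The message wording is made up, and it
assumes that returning #f from 'start' is treated as "failed to start":

```scheme
;; Sketch only: replacement for the 'start' field of the shepherd-service
;; in the example. It wraps the body in 'catch' so that any exception is
;; written to the log instead of vanishing silently.
(start #~(lambda args
           (catch #t
             (lambda ()
               (local-output "errtest start")
               this-is-an-unbound-variable
               (local-output "errtest end")
               #t)
             (lambda (key . rest)
               ;; KEY and REST are the exception key and arguments that
               ;; 'catch' passes to the handler. The message text is
               ;; hypothetical.
               (local-output "errtest failed to start: ~s ~s" key rest)
               ;; Assumption: #f tells shepherd the service did not start.
               #f))))
```

This would only make the error visible in the log; it does nothing about
the misleading "Will be respawned" status.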

I'd be happy to work on a patch, but it seems like there is some design
discussion necessary, in particular how the "Will be respawned" should
be handled. Services have a "respawn?" flag, but of course respawning
can only work if the start function executed successfully (and only the
service process itself failed) in the first place.

I generally feel like the state machine for services needs some work.
In particular, it would be useful to distinguish between "failed" and
"completed" services instead of conflating both states into "stopped".
Or maybe have some more general mechanism for storing state about the
service, instead of just the slot that usually contains the PID?

- Florian

--------------127386F850B9CE8C8EAF01A8
Content-Type: text/x-scheme; name="config-error-reporting.scm"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="config-error-reporting.scm"

(use-modules (gnu))
(use-service-modules networking ssh shepherd)

(define (errtest-shepherd-service c)
  (list (shepherd-service
         (provision '(errtest))
         (documentation "Errtest")
         (requirement '(file-systems))
         (modules `((shepherd support)
                    (ice-9 match)
                    ,@%default-modules))
         (start #~(lambda args
                    (local-output "errtest start")
                    this-is-an-unbound-variable
                    (local-output "errtest end")
                    #t)))))

(define errtest-service-type
  (service-type
   (name 'errtest)
   (extensions
    (list (service-extension shepherd-root-service-type
                             errtest-shepherd-service)))
   (default-value #t)))

(operating-system
  (host-name "errtest")
  (timezone "Europe/Berlin")
  (locale "en_US.utf8")
  (kernel-arguments (list "console=ttyS0" "console=tty0"))
  (bootloader (bootloader-configuration
               (bootloader grub-bootloader)
               (target "/dev/sdX")))
  (file-systems (cons (file-system
                        (device (file-system-label "my-root"))
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))
  (services (cons* (service errtest-service-type)
                   %base-services)))

--------------127386F850B9CE8C8EAF01A8--