all messages for Guix-related lists mirrored at yhetil.org
* Debugging failing shepherd startup
@ 2024-06-21 13:27 maya
  2024-06-21 20:32 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  2024-06-27 13:07 ` Ludovic Courtès
  0 siblings, 2 replies; 6+ messages in thread
From: maya @ 2024-06-21 13:27 UTC (permalink / raw)
  To: guix-devel

Hi,

I have an issue with my Guix configuration. Since a certain update, my
system fails to boot. The kernel boots successfully and starts
shepherd, but shepherd then fails to start some necessary service and
the system is soft-locked.

The problem is that I can neither control the system in that state nor
check the logs: logd has not been started yet, so kernel messages are
only preserved in memory and on the screen.

I cannot scroll the screen or interact with the system in any way. I
can display some of the messages by disabling the silent kernel
option, but the root cause scrolls away too fast to be read.

My question is: is there a way to debug this? I mostly need help
identifying the failing service; once I have it, I think I can sort
it out.

Best regards,
Maya


* Re: Debugging failing shepherd startup
  2024-06-21 13:27 Debugging failing shepherd startup maya
@ 2024-06-21 20:32 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  2024-06-27 13:07 ` Ludovic Courtès
  1 sibling, 0 replies; 6+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-21 20:32 UTC (permalink / raw)
  To: maya, guix-devel

Hi Maya,

On Fri, Jun 21 2024, maya@zenmaya.xyz wrote:

> Since a certain update, my system fails to boot. The kernel boots
> successfully and starts shepherd, but shepherd then fails to start
> some necessary service and the system is soft-locked.

I have had the same problem repeatedly on one piece of equipment.  It
is vexing to the n-th degree.

> My question is: is there a way to debug this? I mostly need help
> identifying the failing service; once I have it, I think I can sort
> it out.

I disable all suspect services and then add them back with an ad-hoc
geometric algorithm: I add half and, if those work, add a quarter, and
then an eighth...
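
The halving procedure can be sketched in Guile roughly as follows.
This is only an illustration: `find-culprit` and the `boots?`
predicate are hypothetical names, and in practice `boots?` stands for
a manual "reconfigure with exactly these services, reboot, and see
whether the system comes up" cycle. The sketch assumes that exactly
one suspect service breaks the boot.

```scheme
(use-modules (srfi srfi-1)    ; split-at
             (srfi srfi-11))  ; let-values

;; BOOTS? is a hypothetical predicate taking the list of enabled
;; suspect services and returning #t when the system boots with them.
(define (find-culprit suspects boots?)
  "Return the one service in SUSPECTS that breaks the boot, or #f when
SUSPECTS is empty.  Assumes exactly one culprit."
  (let loop ((good '())            ; services already known to be fine
             (suspects suspects))  ; services still under suspicion
    (cond ((null? suspects) #f)
          ((null? (cdr suspects)) (car suspects))
          (else
           (let-values (((half rest)
                         (split-at suspects
                                   (quotient (length suspects) 2))))
             (if (boots? (append good half))
                 ;; This half is fine: keep it enabled, look in the rest.
                 (loop (append good half) rest)
                 ;; The culprit is somewhere in this half.
                 (loop good half)))))))

;; Usage with a simulated predicate; the service names other than
;; 'fuse are made up for the example.
(find-culprit '(fuse mdadm-resync nginx cups)
              (lambda (enabled) (not (memq 'fuse enabled))))
```

Each round costs one reconfigure and reboot, so the number of reboots
grows with the logarithm of the number of suspect services rather than
with the number itself.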

There is some disagreement on how the Shepherd can produce better
logging output.  Some information may be in /var/log/messages but it can
be hard to find.

You are not alone.  Please post a reproducer if you have one.

Kind regards
Felix




* Re: Debugging failing shepherd startup
  2024-06-21 13:27 Debugging failing shepherd startup maya
  2024-06-21 20:32 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
@ 2024-06-27 13:07 ` Ludovic Courtès
  2024-06-27 17:22   ` Maya
  1 sibling, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2024-06-27 13:07 UTC (permalink / raw)
  To: maya; +Cc: guix-devel

Hi maya,

maya@zenmaya.xyz skribis:

> I have an issue with my Guix configuration. Since a certain update, my
> system fails to boot. The kernel boots successfully and starts
> shepherd, but shepherd then fails to start some necessary service and
> the system is soft-locked.
>
> The problem is that I can neither control the system in that state nor
> check the logs: logd has not been started yet, so kernel messages are
> only preserved in memory and on the screen.
>
> I cannot scroll the screen or interact with the system in any way. I
> can display some of the messages by disabling the silent kernel
> option, but the root cause scrolls away too fast to be read.
>
> My question is: is there a way to debug this? I mostly need help
> identifying the failing service; once I have it, I think I can sort
> it out.

Last month, Shepherd integration in Guix was fixed so that failure to
load one service would not prevent the system from loading and starting
other services:

  https://issues.guix.gnu.org/71144

Does the problem still manifest on a system reconfigured from a commit
after cca25a67693bb68a1884a081b415a43fad1e8641?

(See also <https://issues.guix.gnu.org/71193> for a recent example in
that vein and how the change above addresses it.)

HTH,
Ludo’.



* Re: Debugging failing shepherd startup
  2024-06-27 13:07 ` Ludovic Courtès
@ 2024-06-27 17:22   ` Maya
  2024-06-28 15:17     ` Reproducer for " Felix Lechner via Development of GNU Guix and the GNU System distribution.
  0 siblings, 1 reply; 6+ messages in thread
From: Maya @ 2024-06-27 17:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel


Hi Ludo'

> Last month, Shepherd integration in Guix was fixed so that failure to
> load one service would not prevent the system from loading and starting
> other services:
>
>   https://issues.guix.gnu.org/71144
>
> Does the problem still manifest on a system reconfigured from a commit
> after cca25a67693bb68a1884a081b415a43fad1e8641?
>
> (See also <https://issues.guix.gnu.org/71193> for a recent example in
> that vein and how the change above addresses it.)

It did not fix the issue, but I noticed that one of the failing
services was fuse: it was the last service listed as failed, and the
file-systems service depends on it.

Disabling fuse fixed the issue. I still don't know how to fix the issue
properly, but at least I can reconfigure my system now <3

Maya


* Reproducer for failing shepherd startup
  2024-06-27 17:22   ` Maya
@ 2024-06-28 15:17     ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  2024-06-28 16:43       ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  0 siblings, 1 reply; 6+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-28 15:17 UTC (permalink / raw)
  To: Maya, Ludovic Courtès, Attila Lendvai; +Cc: guix-devel

Hi Maya, Ludo' and Attila,

On Thu, Jun 27 2024, Maya wrote:

> I still don't know how to fix the issue properly, but at least I can
> reconfigure my system now <3

I have a reproducer!  In the code below, change "sunday" to "0" in
#:days-of-week, and add this line to your "services":

     (service mdadm-resync-service-type)

When reconfiguring, you should see something like:

    guix deploy: warning: an error occurred while upgrading services on
    'YOUR-FQDN': %exception #<inferior-object #<&action-exception-error
    service: root action: eval key: %exception args: ("#<&message
    message: \"calendar-event: 0: invalid day of week\">")>>

That system should refuse to boot.  Interestingly, the Shepherd will
block in such a way that even the Magic SysRq key 'i', which is
normally enough, will not stop it.  I have to go all the way
*backwards* to 'b' in the sequence B-U-S-I-E-R. [1]
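
One way to catch such a mistake before reconfiguring might be to
evaluate the calendar event at a Guile REPL first. The following is a
minimal sketch, assuming a Shepherd recent enough to provide the
(shepherd service timer) module is on Guile's load path;
`calendar-event-ok?` is a hypothetical helper and not part of the
Shepherd.

```scheme
(use-modules (shepherd service timer))  ; calendar-event

;; Hypothetical helper: call THUNK, which should build a calendar
;; event, and return #t on success.  Any exception is printed to the
;; error port and turned into #f instead of propagating.
(define (calendar-event-ok? thunk)
  (catch #t
    (lambda ()
      (thunk)
      #t)
    (lambda (key . args)
      (format (current-error-port)
              "invalid calendar event: ~s ~s~%" key args)
      #f)))

;; The event used by the timer in the code below.
(calendar-event-ok?
 (lambda ()
   (calendar-event #:days-of-month '(1 2 3 4 5 6 7)
                   #:days-of-week '(sunday)
                   #:hours '(1))))
```

Going by the warning quoted above, replacing '(sunday) with '(0) in
this call should be reported as invalid on a Shepherd where the days
of the week are symbolic.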

Ludo', thank you for making the #:days-of-week values symbolic [2] and
also for fixing the Shepherd to be able to show status and reboot again! [3]

Kind regards
Felix

[1] https://lists.gnu.org/archive/html/guix-devel/2024-04/msg00214.html
[2] https://git.savannah.gnu.org/cgit/shepherd.git/commit/?h=devel&id=2e844430ec8aa4aebb7a8c185f54d6f91bbc3cfe
[3] https://lists.gnu.org/archive/html/info-gnu/2024-06/msg00009.html

* * *

(define (mdadm-resync-shepherd-service config)
  (shepherd-service
   (provision '(mdadm-resync))
   (requirement '(file-systems user-processes))
   (modules '((ice-9 ftw)
              (ice-9 regex)
              (shepherd service timer)))
   (start #~(make-timer-constructor
             ;; Every first Sunday of the month at 1 AM.
             (calendar-event #:days-of-month '(1 2 3 4 5 6 7)
                             #:days-of-week '(sunday)
                             #:hours '(1))
             (lambda _
               ;; some helpers and error handling
               (define (info message)
                 (let ((timestamp (strftime "%Y-%m-%dT%H:%M:%S%z" (localtime (current-time)))))
                   (format (current-error-port) "~a ~a~%" timestamp message)))

               (define (resync array)
                 (let ((port (open-output-file (string-append "/sys/block/" array "/md/sync_action"))))
                   (display "check" port)
                   (close-port port))
                 (info (string-append "Started MD resync for " array ".")))

               (let* ((is-mdadm-device? (lambda (file)
                                          (string-match "^md.+" file)))
                      (arrays (scandir "/dev" is-mdadm-device?)))
                 (for-each resync arrays)))))
   (stop #~(make-timer-destructor))
   ;; (actions
   ;;  (list (shepherd-action
   ;;         (name 'trigger)
   ;;         (documentation "Trigger the action associated with this timer.")
   ;;         (procedure #~(identity trigger-timer)))))
   (documentation "MD array resync")))

(define mdadm-resync-service-type
  (service-type
   (name 'mdadm-resync)
   (description "MD array resync")
   (extensions
    (list
     (service-extension shepherd-root-service-type
                        (compose list mdadm-resync-shepherd-service))))
   (default-value #f)))
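
For completeness, here is a sketch of how the definitions above could
be wired into a system configuration. It is a fragment meant to live
in the same file as those definitions, not a full configuration; the
module list reflects where I believe each binding comes from, and
`%my-services` is a hypothetical name.

```scheme
(use-modules (gnu services)           ; service, service-type, service-extension
             (gnu services base)      ; %base-services
             (gnu services shepherd)  ; shepherd-service, shepherd-root-service-type
             (guix gexp))             ; #~ (gexp) syntax

;; Hypothetical name: the list to pass as the 'services' field of the
;; operating-system declaration.  No configuration value is needed
;; because the service type has a default value of #f.
(define %my-services
  (cons (service mdadm-resync-service-type)
        %base-services))
```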



* Re: Reproducer for failing shepherd startup
  2024-06-28 15:17     ` Reproducer for " Felix Lechner via Development of GNU Guix and the GNU System distribution.
@ 2024-06-28 16:43       ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-28 16:43 UTC (permalink / raw)
  To: Maya, Ludovic Courtès, Attila Lendvai; +Cc: guix-devel

On Fri, Jun 28 2024, Felix Lechner wrote:

> even the Magic SysRq key 'i' [...] enough will not stop it.

Sorry, I meant to write 'e', which presumably sends SIGTERM rather
than SIGKILL.

