Hello,

when a Guix machine is shutting down, it keeps waiting for the PID associated
with [mt76-tx phy0] to terminate.  Since that is a kernel thread, this never
happens.

The previous discussion of this bug took place via email and is copied here:

Date: Sun, 7 Jan 2024 15:59:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
> We are pleased to announce the GNU Shepherd version 0.10.3, a bug-fix
> release of the new 0.10.x series, representing 51 commits over 6 months.

Congratulations on the release :)

>
> ** Do not accidentally wait for Linux kernel thread completion
>    ()
>
> In cases a PID file contained a bogus PID or one that’s only valid in a
> separate PID namespace, shepherd could end up waiting for the termination of
> what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
> situation is now recognized and avoided.

This is great, I will not have to remember to run `modprobe -r mt7921e' before
each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)

Have a nice 2024,
Tomas Volf

Date: Wed, 10 Jan 2024 00:34:48 +0100
From: Ludovic Courtès
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Tomas Volf <~@wolfsden.cz> skribis:

> On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:

[...]

>> ** Do not accidentally wait for Linux kernel thread completion
>>    ()
>>
>> In cases a PID file contained a bogus PID or one that’s only valid in a
>> separate PID namespace, shepherd could end up waiting for the termination of
>> what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
>> situation is now recognized and avoided.
>
> This is great, I will not have to remember to run `modprobe -r mt7921e' before
> each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)

D’oh, why did you have to do that?  How did Shepherd end up with “wrong”
PID?

I hope this release fixes it!

Ludo’.

Date: Wed, 10 Jan 2024 17:38:17 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès
Cc: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-10 00:34:48 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
>
> [...]
>
> >> ** Do not accidentally wait for Linux kernel thread completion
> >>    ()
> >>
> >> In cases a PID file contained a bogus PID or one that’s only valid in a
> >> separate PID namespace, shepherd could end up waiting for the termination of
> >> what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
> >> situation is now recognized and avoided.
> >
> > This is great, I will not have to remember to run `modprobe -r mt7921e' before
> > each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)
>
> D’oh, why did you have to do that?

Otherwise the shepherd would be stuck on shutdown waiting for process named

  [mt76-tx phy0]

to terminate with messages along the lines of:

  shepherd[1]: waiting for process termination (processes left: (1 678))

It is a kernel thread as far as I can tell (based on
https://stackoverflow.com/a/12231039):

  $ cd /proc/678
  $ cat cmdline
  $ readlink exe; echo $?
  1

Removing the module mt7921e stops the thread, so shepherd does not wait for it.

> How did Shepherd end up with “wrong” PID?

That I do not know.  It is visible in `ps' output, so I assume shepherd picked
it up on its own somehow.

>
> I hope this release fixes it!

As far as I can tell, the 0.10.3 was already added into guix:

  $ ps 1 | cat
    PID TTY      STAT   TIME COMMAND
      1 ?        Sl     0:01 /gnu/store/bhynhk0c6ssq3fqqc59fvhxjzwywsjbb-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/06mz0yjkghi7r6d7lmhvv7gryipljhdd-shepherd-0.10.3/bin/shepherd
  +--config /gnu/store/klkqq2y65k141rlipq4ls0w2rlhds12h-shepherd.conf

So I have to say it sadly did not resolve this issue.  I am unsure why though.
I am not familiar with Shepherd's code base, but quick look at the git log
suggested that procedure (@@ (shepherd service) pseudo-process?) is the relevant
one.  When I try it from a REPL, it returns #t.

  $ guix shell guile shepherd guile-fibers -- guile
  GNU Guile 3.0.9
  Copyright (C) 1995-2023 Free Software Foundation, Inc.

  Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
  This program is free software, and you are welcome to redistribute it
  under certain conditions; type `,show c' for details.

  Enter `,help' for help.
  scheme@(guile-user)> ,use (shepherd service)
  scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)
  $1 = #t

So it *should* work?

However the issue is caused by non-free WiFi driver on a non-free kernel, so I
am not sure if it is even problem that needs to be solved...  I would
(obviously) like to see it resolved, but I probably cannot even bug report it,
since it requires non-free hardware and software to reproduce.

Tomas

PS: It is interesting that `guix shell guile shepherd' is not enough, the
guile-fibers have to be explicitly specified as well.  Is that expected?

Date: Wed, 10 Jan 2024 17:50:19 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès , guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

PS:

On 2024-01-10 17:38:17 +0100, Tomas Volf wrote:
>   scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)

The pid is different than above, because this was after a reboot.

Tomas

Date: Thu, 11 Jan 2024 13:41:39 +0100
From: Ludovic Courtès
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Hello,

Tomas Volf <~@wolfsden.cz> skribis:

> Otherwise the shepherd would be stuck on shutdown waiting for process named
>
>   [mt76-tx phy0]
>
> to terminate with messages along the lines of:
>
>   shepherd[1]: waiting for process termination (processes left: (1 678))
>
> It is a kernel thread as far as I can tell (based on
> https://stackoverflow.com/a/12231039):
>
>   $ cd /proc/678
>   $ cat cmdline
>   $ readlink exe; echo $?
>   1
>
> Removing the module mt7921e stops the thread, so shepherd does not wait for it.

Ooooh.

Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
process termination”) is currently in Guix, not in Shepherd.

However, ‘processes’, which is what is used here and which is defined in
(guix build syscalls), already checks for kernel threads, though it
does it differently than what I implemented in shepherd:

  (define (kernel? pid)
    "Return #t if PID designates a \"kernel thread\" rather than a normal
  user-land process."
    (let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
                  (compose string-tokenize read-string))))
      ;; See proc.txt in Linux's documentation for the list of fields.
      (match stat
        ((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
              cmin_flt maj_flt cmaj_flt utime stime cutime cstime
              priority nice num_thread it_real_value start_time
              vsize rss rsslim
              (= string->number start_code) (= string->number end_code) _ ...)
         ;; Got this obscure trick from sysvinit's 'killall5' program.
         (and (zero? start_code) (zero? end_code))))))

It would be great if you could check whether this approach works for
you.

(I had completely forgotten about this code.  Funny thing is this one
was inspired by sysvinit, whereas the one in Shepherd was inspired by
systemd.  A sign of times!)

Ludo’.

Date: Thu, 11 Jan 2024 14:12:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès
Cc: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-11 13:41:39 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > Otherwise the shepherd would be stuck on shutdown waiting for process named
> >
> >   [mt76-tx phy0]
> >
> > to terminate with messages along the lines of:
> >
> >   shepherd[1]: waiting for process termination (processes left: (1 678))
> >
> > It is a kernel thread as far as I can tell (based on
> > https://stackoverflow.com/a/12231039):
> >
> >   $ cd /proc/678
> >   $ cat cmdline
> >   $ readlink exe; echo $?
> >   1
> >
> > Removing the module mt7921e stops the thread, so shepherd does not wait for it.
>
> Ooooh.
>
> Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
> process termination”) is currently in Guix, not in Shepherd.
>
> However, ‘processes’, which is what is used here and which is defined in
> (guix build syscalls), already checks for kernel threads, though it
> does it differently than what I implemented in shepherd:
>
>   (define (kernel? pid)
>     "Return #t if PID designates a \"kernel thread\" rather than a normal
>   user-land process."
>     (let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
>                   (compose string-tokenize read-string))))
>       ;; See proc.txt in Linux's documentation for the list of fields.
>       (match stat
>         ((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
>               cmin_flt maj_flt cmaj_flt utime stime cutime cstime
>               priority nice num_thread it_real_value start_time
>               vsize rss rsslim
>               (= string->number start_code) (= string->number end_code) _ ...)
>          ;; Got this obscure trick from sysvinit's 'killall5' program.
>          (and (zero? start_code) (zero? end_code))))))
>
> It would be great if you could check whether this approach works for
> you.

Ah, that code indeed returns #f for the pid in question:

  scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
  $1 = #f

The stat file:

  $ cat /proc/688/stat
  688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0

So the start_code is not zero (I would guess it is -1).  I have no idea what
that means though.

Tomas

Date: Mon, 29 Jan 2024 17:31:33 +0100
From: Ludovic Courtès
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Hi,

Tomas Volf <~@wolfsden.cz> skribis:

> Ah, that code indeed returns #f for the pid in question:
>
>   scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
>   $1 = #f
>
> The stat file:
>
>   $ cat /proc/688/stat
>   688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0
>
> So the start_code is not zero (I would guess it is -1).  I have no idea what
> that means though.

What about this method (from shepherd)?

--8<---------------cut here---------------start------------->8---
(define (linux-process-flags pid)
  "Return the process flags of @var{pid} (or'd @code{PF_} constants), assuming
the Linux /proc file system is mounted; raise a @code{system-error} exception
otherwise."
  (call-with-input-file (string-append "/proc/" (number->string pid) "/stat")
    (lambda (port)
      (define line
        (get-string-all port))

      ;; Parse like systemd's 'is_kernel_thread' function.
      (let ((offset (string-index line #\))))     ;offset past 'tcomm' field
        (match (and offset
                    (string-tokenize (string-drop line (+ offset 1))))
          ((state ppid pgrp sid tty-nr tty-pgrp flags . _)
           (or (string->number flags) 0))
          (_
           0))))))

;; Per-process flag defined in .
(define PF_KTHREAD #x00200000)                    ;I am a kernel thread

(define (linux-kernel-thread? pid)
  "Return true if @var{pid} is a Linux kernel thread."
  (= PF_KTHREAD (logand (linux-process-flags pid) PF_KTHREAD)))
--8<---------------cut here---------------end--------------->8---

If it works better, we can use that in syscalls.scm as well.

Ludo’.

PS: Having an entry in bug-guix would ensure we don’t lose track of this.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
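
A possible explanation for the #f result quoted above, offered as a sketch and not as a confirmed diagnosis: the thread name "mt76-tx phy0" contains a space, so splitting the whole stat line on whitespace shifts every later field by one. Field 26 then holds the rsslim value (18446744073709551615) instead of start_code. The shell sketch below works on the stat line copied from the thread and compares a naive split with one that first skips past the closing parenthesis of the name, which is the approach the shepherd code takes. The function names naive_field, robust_field and is_kernel_thread are hypothetical helpers made up for this illustration, and the field numbering (flags = 9, start_code = 26, end_code = 27) is assumed from the order used in the kernel? pattern above.

```shell
#!/bin/sh
# Stat line copied verbatim from the report (PID 688 after the reboot).
stat_line='688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0'

# PF_KTHREAD (#x00200000 in the shepherd code above), written in decimal.
PF_KTHREAD=2097152

# Hypothetical helper: split the whole line on whitespace and print
# field $2.  A space inside the process name shifts the later fields.
naive_field() {
    printf '%s\n' "$1" | awk -v n="$2" '{ print $n }'
}

# Hypothetical helper: drop everything up to the last ") " first, so the
# remainder starts at field 3 (state), then print field $2 of the line.
robust_field() {
    rest=${1##*) }
    printf '%s\n' "$rest" | awk -v n="$(( $2 - 2 ))" '{ print $n }'
}

# Hypothetical helper: succeed if the flags field has PF_KTHREAD set.
is_kernel_thread() {
    flags=$(robust_field "$1" 9)
    [ $(( flags & PF_KTHREAD )) -ne 0 ]
}

echo "naive start_code:  $(naive_field "$stat_line" 26)"
echo "robust start_code: $(robust_field "$stat_line" 26)"
echo "robust end_code:   $(robust_field "$stat_line" 27)"
if is_kernel_thread "$stat_line"; then
    echo "flags say: kernel thread"
fi
```

On the quoted line the naive split reports 18446744073709551615 as start_code, while the robust split reports 0 for both start_code and end_code, and the flags value 2129984 has the PF_KTHREAD bit set. If this reading is right, both the start_code/end_code test and the flags test would classify the thread correctly once the name field is skipped properly.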