all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#68800: Guix waits for termination of a kernel thread
@ 2024-01-29 17:34 Tomas Volf
  2024-01-29 17:44 ` Tomas Volf
  0 siblings, 1 reply; 3+ messages in thread
From: Tomas Volf @ 2024-01-29 17:34 UTC (permalink / raw)
  To: 68800

[-- Attachment #1: Type: text/plain, Size: 12827 bytes --]

Hello,

when Guix machine is shutting down, it keeps waiting for PID associated with
[mt76-tx phy0] to terminate.  Since it is a kernel thread, it does not happen.

Previous discussion on this bug was done via email, and is copied here:



Date: Sun, 7 Jan 2024 15:59:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
> We are pleased to announce the GNU Shepherd version 0.10.3, a bug-fix
> release of the new 0.10.x series, representing 51 commits over 6 months.

Congratulations on the release :)

>
>   ** Do not accidentally wait for Linux kernel thread completion
>      (<https://issues.guix.gnu.org/67132>)
>
>   In cases a PID file contained a bogus PID or one that’s only valid in a
>   separate PID namespace, shepherd could end up waiting for the termination of
>   what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
>   situation is now recognized and avoided.

This is great, I will not have to remember to run `modprobe -r mt7921e' before
each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)

Have a nice 2024,
Tomas Volf



Date: Wed, 10 Jan 2024 00:34:48 +0100
From: Ludovic Courtès <ludo@gnu.org>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Tomas Volf <~@wolfsden.cz> skribis:

> On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:

[...]

>>   ** Do not accidentally wait for Linux kernel thread completion
>>      (<https://issues.guix.gnu.org/67132>)
>>
>>   In cases a PID file contained a bogus PID or one that’s only valid in a
>>   separate PID namespace, shepherd could end up waiting for the termination of
>>   what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
>>   situation is now recognized and avoided.
>
> This is great, I will not have to remember to run `modprobe -r mt7921e' before
> each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)

D’oh, why did you have to do that?  How did Shepherd end up with “wrong” PID?

I hope this release fixes it!

Ludo’.



Date: Wed, 10 Jan 2024 17:38:17 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-10 00:34:48 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
>
> [...]
>
> >>   ** Do not accidentally wait for Linux kernel thread completion
> >>      (<https://issues.guix.gnu.org/67132>)
> >>
> >>   In cases a PID file contained a bogus PID or one that’s only valid in a
> >>   separate PID namespace, shepherd could end up waiting for the termination of
> >>   what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
> >>   situation is now recognized and avoided.
> >
> > This is great, I will not have to remember to run `modprobe -r mt7921e' before
> > each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)
>
> D’oh, why did you have to do that?

Otherwise the shepherd would be stuck on shutdown waiting for process named

    [mt76-tx phy0]

to terminate with messages along the lines of:

    shepherd[1]: waiting for process termination (processes left: (1 678))

It is a kernel thread as far as I can tell (based on
https://stackoverflow.com/a/12231039):

    $ cd /proc/678
    $ cat cmdline
    $ readlink exe; echo $?
    1

Removing the module mt7921e stops the thread, so shepherd does not wait for it.

> How did Shepherd end up with “wrong” PID?

That I do not know.  It is visible in `ps' output, so I assume shepherd picked
it up on its own somehow.

>
> I hope this release fixes it!

As far as I can tell, the 0.10.3 was already added into guix:

    $ ps 1 | cat
      PID TTY      STAT   TIME COMMAND
        1 ?        Sl     0:01 /gnu/store/bhynhk0c6ssq3fqqc59fvhxjzwywsjbb-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/06mz0yjkghi7r6d7lmhvv7gryipljhdd-shepherd-0.10.3/bin/shepherd
+--config /gnu/store/klkqq2y65k141rlipq4ls0w2rlhds12h-shepherd.conf

So I have to say it sadly did not resolve this issue.  I am unsure why though.
I am not familiar with Shepherd's code base, but quick look at the git log
suggested that procedure (@@ (shepherd service) pseudo-process?) is the relevant
one.  When I try it from a REPL, it returns #t.

    $ guix shell guile shepherd guile-fibers -- guile
    GNU Guile 3.0.9
    Copyright (C) 1995-2023 Free Software Foundation, Inc.

    Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
    This program is free software, and you are welcome to redistribute it
    under certain conditions; type `,show c' for details.

    Enter `,help' for help.
    scheme@(guile-user)> ,use (shepherd service)
    scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)
    $1 = #t

So it *should* work?  However the issue is caused by non-free WiFi driver on a
corrupted kernel, so I am not sure if it is even problem that needs to be
solved...  I would (obviously) like to see it resolved, but I probably cannot
even bug report it, since it requires non-free hardware and software to
reproduce.

Tomas

PS: It is interesting that `guix shell guile shepherd' is not enough, the
    guile-fibers have to be explicitly specified as well.  Is that expected?



Date: Wed, 10 Jan 2024 17:50:19 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo@gnu.org>, guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

PS:

On 2024-01-10 17:38:17 +0100, Tomas Volf wrote:
>     scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)

The pid is different than above, because this was after a reboot.

Tomas



Date: Thu, 11 Jan 2024 13:41:39 +0100
From: Ludovic Courtès <ludo@gnu.org>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Hello,

Tomas Volf <~@wolfsden.cz> skribis:

> Otherwise the shepherd would be stuck on shutdown waiting for process named
>
>     [mt76-tx phy0]
>
> to terminate with messages along the lines of:
>
>     shepherd[1]: waiting for process termination (processes left: (1 678))
>
> It is a kernel thread as far as I can tell (based on
> https://stackoverflow.com/a/12231039):
>
>     $ cd /proc/678
>     $ cat cmdline
>     $ readlink exe; echo $?
>     1
>
> Removing the module mt7921e stops the thread, so shepherd does not wait for it.

Ooooh.

Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
process termination”) is currently in Guix, not in Shepherd.

However, ‘processes’, which is what is used here and which is defined in
(guix build syscalls), already checks for kernel threads, though it
does it differently than what I implemented in shepherd:

  (define (kernel? pid)
    "Return #t if PID designates a \"kernel thread\" rather than a normal
  user-land process."
    (let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
                  (compose string-tokenize read-string))))
      ;; See proc.txt in Linux's documentation for the list of fields.
      (match stat
        ((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
              cmin_flt maj_flt cmaj_flt utime stime cutime cstime
              priority nice num_thread it_real_value start_time
              vsize rss rsslim
              (= string->number start_code) (= string->number end_code) _ ...)
         ;; Got this obscure trick from sysvinit's 'killall5' program.
         (and (zero? start_code) (zero? end_code))))))

It would be great if you could check whether this approach works for
you.

(I had completely forgotten about this code.  Funny thing is this one
was inspired by sysvinit, whereas the one in Shepherd was inspired by
systemd.  A sign of times!)

Ludo’.



Date: Thu, 11 Jan 2024 14:12:51 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

On 2024-01-11 13:41:39 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > Otherwise the shepherd would be stuck on shutdown waiting for process named
> >
> >     [mt76-tx phy0]
> >
> > to terminate with messages along the lines of:
> >
> >     shepherd[1]: waiting for process termination (processes left: (1 678))
> >
> > It is a kernel thread as far as I can tell (based on
> > https://stackoverflow.com/a/12231039):
> >
> >     $ cd /proc/678
> >     $ cat cmdline
> >     $ readlink exe; echo $?
> >     1
> >
> > Removing the module mt7921e stops the thread, so shepherd does not wait for it.
>
> Ooooh.
>
> Then I’m afraid this bug isn’t fixed yet because that code (“waiting for
> process termination”) is currently in Guix, not in Shepherd.
>
> However, ‘processes’, which is what is used here and which is defined in
> (guix build syscalls), already checks for kernel threads, though it
> does it differently than what I implemented in shepherd:
>
>   (define (kernel? pid)
>     "Return #t if PID designates a \"kernel thread\" rather than a normal
>   user-land process."
>     (let ((stat (call-with-input-file (format #f "/proc/~a/stat" pid)
>                   (compose string-tokenize read-string))))
>       ;; See proc.txt in Linux's documentation for the list of fields.
>       (match stat
>         ((pid tcomm state ppid pgrp sid tty_nr tty_pgrp flags min_flt
>               cmin_flt maj_flt cmaj_flt utime stime cutime cstime
>               priority nice num_thread it_real_value start_time
>               vsize rss rsslim
>               (= string->number start_code) (= string->number end_code) _ ...)
>          ;; Got this obscure trick from sysvinit's 'killall5' program.
>          (and (zero? start_code) (zero? end_code))))))
>
> It would be great if you could check whether this approach works for
> you.

Ah, that code indeed returns #f for the pid in question:

    scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
    $1 = #f

The stat file:

    $ cat /proc/688/stat
    688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0

So the start_code is not zero (I would guess it is -1).  I have no idea what
that means though.

Tomas



Date: Mon, 29 Jan 2024 17:31:33 +0100
From: Ludovic Courtès <ludo@gnu.org>
To: guix-devel@gnu.org
Subject: Re: GNU Shepherd 0.10.3 released

Hi,

Tomas Volf <~@wolfsden.cz> skribis:

> Ah, that code indeed returns #f for the pid in question:
>
>     scheme@(guix-user)> ((@@ (guix build syscalls) kernel?) 688)
>     $1 = #f
>
> The stat file:
>
>     $ cat /proc/688/stat
>     688 (mt76-tx phy0) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -2 0 1 0 964 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 5 1 1 0 0 0 0 0 0 0 0 0 0 0
>
> So the start_code is not zero (I would guess it is -1).  I have no idea what
> that means though.

What about this method (from shepherd)?

--8<---------------cut here---------------start------------->8---
(define (linux-process-flags pid)
  "Return the process flags of @var{pid} (or'd @code{PF_} constants), assuming
the Linux /proc file system is mounted; raise a @code{system-error} exception
otherwise."
  (call-with-input-file (string-append "/proc/" (number->string pid)
                                       "/stat")
    (lambda (port)
      (define line
        (get-string-all port))

      ;; Parse like systemd's 'is_kernel_thread' function.
      (let ((offset (string-index line #\))))     ;offset past 'tcomm' field
        (match (and offset
                    (string-tokenize (string-drop line (+ offset 1))))
          ((state ppid pgrp sid tty-nr tty-pgrp flags . _)
           (or (string->number flags) 0))
          (_
           0))))))

;; Per-process flag defined in <linux/sched.h>.
(define PF_KTHREAD #x00200000)                    ;I am a kernel thread

(define (linux-kernel-thread? pid)
  "Return true if @var{pid} is a Linux kernel thread."
  (= PF_KTHREAD (logand (linux-process-flags pid) PF_KTHREAD)))
--8<---------------cut here---------------end--------------->8---

If it works better, we can use that in syscalls.scm as well.

Ludo’.

PS: Having an entry in bug-guix would ensure we don’t lose track of this.


--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-02-20 10:10 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-29 17:34 bug#68800: Guix waits for termination of a kernel thread Tomas Volf
2024-01-29 17:44 ` Tomas Volf
2024-02-20 10:08   ` Ludovic Courtès

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.