all messages for Guix-related lists mirrored at yhetil.org
From: "Ludovic Courtès" <ludo@gnu.org>
To: Mathieu Othacehe <othacehe@gnu.org>
Cc: 41948@debbugs.gnu.org
Subject: bug#41948: Shepherd deadlocks
Date: Fri, 07 May 2021 23:49:42 +0200
Message-ID: <87v97u9pu1.fsf@gnu.org>
In-Reply-To: <87k0xyhq22.fsf@gnu.org> (Mathieu Othacehe's message of "Sun, 16 Aug 2020 11:56:37 +0200")

[-- Attachment #1: Type: text/plain, Size: 3057 bytes --]

Hi!

Mathieu Othacehe <othacehe@gnu.org> wrote:

> Those two finalizer threads share the same pipe. When we try to
> stop the finalizer thread in Shepherd, right before forking a new
> process, we send a '\1' byte to the finalizer pipe.
>
> 1     write(6, "\1", 1 <unfinished ...>
>
> which is received (line 183597) by the marionette finalizer thread:
>
> 253   <... read resumed>"\1", 1)        = 1
>
> Then we pthread_join the Shepherd finalizer thread, which never
> stops! Quite unfortunate.

While working on a fix for this issue (the finalizer pipe shared between
parent and child processes), I found that the main thread’s ‘sleep_pipe’
is also shared between the parent and its child.

Attached is a reproducer.  It prints something like this before hanging:

--8<---------------cut here---------------start------------->8---
$ guile ~/src/guile-debugging/signal-pipe.scm
parent: 25947
child: 25953
alarm in parent!
alarm in parent!
alarm in parent!

[...]

alarm in parent!
alarm in parent!
child woken up!
--8<---------------cut here---------------end--------------->8---

“child woken up” means that the child process won the race to read
from the sleep pipe (in ‘scm_std_select’).

The parent process then hangs because, in ‘scm_std_select’, it did:

  1. select(2), which returned because data was available on ‘wakeup_fd’;

  2. ‘full_read (wakeup_fd, &dummy, 1)’, which blocks forever in read(2)
     because the child process read that byte in the meantime, so
     there’s nothing left to read.

Here’s the sequence:

--8<---------------cut here---------------start------------->8---
25947 select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=100000}) = 0 (Timeout)
25947 getpid()                          = 25947
25947 kill(25947, SIGALRM)              = 0
25947 --- SIGALRM {si_signo=SIGALRM, si_code=SI_USER, si_pid=25947, si_uid=1000} ---
25947 write(8, "\16", 1)                = 1
25947 rt_sigreturn({mask=[]})           = 0
25952 <... read resumed>"\16", 1)       = 1
25947 rt_sigprocmask(SIG_BLOCK, NULL,  <unfinished ...>
25952 write(4, "\0", 1 <unfinished ...>
25947 <... rt_sigprocmask resumed>[], 8) = 0
25953 <... select resumed>)             = 1 (in [3], left {tv_sec=0, tv_usec=346370})
25952 <... write resumed>)              = 1
25947 select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=100000} <unfinished ...>
25953 read(3,  <unfinished ...>
25952 rt_sigprocmask(SIG_BLOCK, NULL,  <unfinished ...>
25947 <... select resumed>)             = 1 (in [3], left {tv_sec=0, tv_usec=99999})
25953 <... read resumed>"\0", 1)        = 1
25947 read(3,  <unfinished ...>
25952 <... rt_sigprocmask resumed>~[KILL STOP PWR RTMIN RT_1], 8) = 0
25953 write(1, "child woken up!\n", 16 <unfinished ...>
25952 read(7,  <unfinished ...>
--8<---------------cut here---------------end--------------->8---

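Notice how “\16” (octal for 14, which is SIGALRM on Linux) is written
by the parent’s main thread (25947) and read by the parent’s signal
thread (25952). That thread then writes the wake-up byte “\0” to the
sleep pipe (fd 4), and that byte is read by the child’s main thread
(25953).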

Thoughts?

Ludo’.


[-- Attachment #2: the reproducer --]
[-- Type: text/plain, Size: 760 bytes --]

;; https://issues.guix.gnu.org/41948

(use-modules (ice-9 match))

(setvbuf (current-output-port) 'line)
(sigaction SIGCHLD pk)                            ;start signal thread

(match (primitive-fork)
  (0
   (format #t "child: ~a~%" (getpid))
   (let loop ()
     (unless (zero? (usleep 500000))
       ;; If this happens, it means the select(2) call in 'scm_std_select'
       ;; returned because one of our file descriptors had input data
       ;; available (which shouldn't happen).
       (format #t "child woken up!~%"))
     (loop)))
  (pid
   (format #t "parent: ~a~%" (getpid))
   (sigaction SIGALRM (lambda _
                        (format #t "alarm in parent!~%")))
   (let loop ()
     (kill (getpid) SIGALRM)
     (usleep 100000)
     (loop))))

Thread overview: 12+ messages
2020-06-19  8:41 bug#41948: Shepherd deadlocks Mathieu Othacehe
2020-06-19 12:10 ` Mathieu Othacehe
2020-06-20  0:16 ` Michael Rohleder
2020-06-20 10:31 ` Ludovic Courtès
2020-08-16  9:56   ` Mathieu Othacehe
2021-05-07 21:49     ` Ludovic Courtès [this message]
2021-05-07 22:07       ` Ludovic Courtès
2021-05-08 20:52         ` Ludovic Courtès
2021-05-08  9:43     ` Ludovic Courtès
2021-05-08 13:49       ` Andrew Whatson
2021-05-08 13:49         ` bug#41948: [PATCH] Fix some finalizer thread race conditions Andrew Whatson
2021-05-08 20:50           ` bug#41948: Shepherd deadlocks Ludovic Courtès
