unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: Mathieu Othacehe <othacehe@gnu.org>
Cc: 41948@debbugs.gnu.org
Subject: bug#41948: Shepherd deadlocks
Date: Sat, 20 Jun 2020 12:31:40 +0200	[thread overview]
Message-ID: <87a70yc9kj.fsf@gnu.org> (raw)
In-Reply-To: <87h7v75txx.fsf@gnu.org> (Mathieu Othacehe's message of "Fri, 19 Jun 2020 10:41:14 +0200")

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

> When running "gui-installed-desktop-os-encrypted" test, Shepherd seems
> to deadlock when restarting "guix-daemon". This can happen at different
> stages:
>
> * In "umount-cow-store" procedure, just before finishing the install.
>
> * During "set-http-proxy" tests inside the marionette.
>   
> This is not always reproducible. In order to gather some information, I
> created a Shepherd "strace" service that logs what's happening in
> Shepherd itself (patch attached).

We should be able to reproduce it with much simpler tests then, right?
Like maybe “while : ; do herd restart guix-daemon ; done” or similar?

> It seems that, just after blocking signals, in "fork+exec-command", I
> guess, Shepherd is taking a lock:
>
> 183553:1     chdir("/")                        = 0
> 183554:1     clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=4387222}) = 0
> 183555:1     rt_sigprocmask(SIG_BLOCK, NULL, [HUP INT TERM CHLD], 8) = 0
> 183560:1     madvise(0x7f179782d000, 12288, MADV_DONTNEED) = 0
> 183561:1     clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=13592357}) = 0
> 183562:1     clone(child_stack=0x7f17981e6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[271], tls=0x7f17981e7700, child_tidptr=0x7f17981e79d0) = 271
> 183563:1     stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) = 0
> 183564:1     write(15, "shepherd[1]: changing HTTP/HTTPS"..., 86 <unfinished ...>
> 183566:1     <... write resumed>)              = 86
> 183575:1     getpgid(169)                      = 169
> 183576:1     getpgid(0)                        = 0
> 183577:1     kill(-169, SIGTERM <unfinished ...>
> 183579:1     <... kill resumed>)               = 0
> 183582:1     stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) = 0
> 183583:1     write(15, "shepherd[1]: Service guix-daemon"..., 51 <unfinished ...>
> 183585:1     <... write resumed>)              = 51
> 183594:1     getuid()                          = 0
> 183595:1     rt_sigprocmask(SIG_BLOCK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0
> 183596:1     write(6, "\1", 1 <unfinished ...>
> 183598:1     <... write resumed>)              = 1
> 183605:1     futex(0x7f17981e79d0, FUTEX_WAIT, 271, NULL <unfinished ...>
>
> and is then blocking forever.

When that happens, we should check how many threads exist in PID 1.
There should be the finalization thread and the main thread, plus the
signal thread (because there are still ‘sigaction’ calls in the ‘main’
procedure), plus the GC marker threads.

In <https://issues.guix.gnu.org/31925#6>, Andy suggests that the signal
thread is not properly handled; indeed it takes locks and we don’t try
to shut it down upon fork.  However, when using signalfd, the signal
thread must be stuck in its ‘read’ call in ‘read_signal_pipe_data’, so I
don’t see how it could cause problems.

The GC threads are presumably taken care of by the atfork handler in
libgc.

Thoughts?

Ludo’.




  parent reply	other threads:[~2020-06-20 10:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-19  8:41 bug#41948: Shepherd deadlocks Mathieu Othacehe
2020-06-19 12:10 ` Mathieu Othacehe
2020-06-20  0:16 ` Michael Rohleder
2020-06-20 10:31 ` Ludovic Courtès [this message]
2020-08-16  9:56   ` Mathieu Othacehe
2021-05-07 21:49     ` Ludovic Courtès
2021-05-07 22:07       ` Ludovic Courtès
2021-05-08 20:52         ` Ludovic Courtès
2021-05-08  9:43     ` Ludovic Courtès
2021-05-08 13:49       ` Andrew Whatson
2021-05-08 13:49         ` bug#41948: [PATCH] Fix some finalizer thread race conditions Andrew Whatson
2021-05-08 20:50           ` bug#41948: Shepherd deadlocks Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a70yc9kj.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=41948@debbugs.gnu.org \
    --cc=othacehe@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).