From: Mathieu Othacehe <othacehe@gnu.org>
To: 41948@debbugs.gnu.org
Subject: bug#41948: Shepherd deadlocks
Date: Fri, 19 Jun 2020 10:41:14 +0200 [thread overview]
Message-ID: <87h7v75txx.fsf@gnu.org> (raw)
[-- Attachment #1: Type: text/plain, Size: 2414 bytes --]
Hello,
When running "gui-installed-desktop-os-encrypted" test, Shepherd seems
to deadlock when restarting "guix-daemon". This can happen at different
stages:
* In "umount-cow-store" procedure, just before finishing the install.
* During "set-http-proxy" tests inside the marionette.
This is not always reproducible. In order to gather some information, I
created a Shepherd "strace" service that logs what's happening in
Shepherd itself (patch attached).
It seems that, just after blocking signals, in "fork+exec-command", I
guess, Shepherd is taking a lock:
--8<---------------cut here---------------start------------->8---
183553:1 chdir("/") = 0
183554:1 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=4387222}) = 0
183555:1 rt_sigprocmask(SIG_BLOCK, NULL, [HUP INT TERM CHLD], 8) = 0
183560:1 madvise(0x7f179782d000, 12288, MADV_DONTNEED) = 0
183561:1 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=13592357}) = 0
183562:1 clone(child_stack=0x7f17981e6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[271], tls=0x7f17981e7700, child_tidptr=0x7f17981e79d0) = 271
183563:1 stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) = 0
183564:1 write(15, "shepherd[1]: changing HTTP/HTTPS"..., 86 <unfinished ...>
183566:1 <... write resumed>) = 86
183575:1 getpgid(169) = 169
183576:1 getpgid(0) = 0
183577:1 kill(-169, SIGTERM <unfinished ...>
183579:1 <... kill resumed>) = 0
183582:1 stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) = 0
183583:1 write(15, "shepherd[1]: Service guix-daemon"..., 51 <unfinished ...>
183585:1 <... write resumed>) = 51
183594:1 getuid() = 0
183595:1 rt_sigprocmask(SIG_BLOCK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0
183596:1 write(6, "\1", 1 <unfinished ...>
183598:1 <... write resumed>) = 1
183605:1 futex(0x7f17981e79d0, FUTEX_WAIT, 271, NULL <unfinished ...>
--8<---------------cut here---------------end--------------->8---
and is then blocking forever.
I cannot find any trace of locking in Shepherd, except for
"without-automatic-finalization" procedure.
Thanks,
Mathieu
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-strace-service.patch --]
[-- Type: text/x-diff, Size: 3177 bytes --]
From 13ac475ad953509cbbf88547e4f72efc08d31055 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <othacehe@gnu.org>
Date: Thu, 18 Jun 2020 14:13:15 +0200
Subject: [PATCH] strace service
---
gnu/services/base.scm | 3 +++
gnu/services/linux.scm | 25 ++++++++++++++++++++++++-
gnu/system/install.scm | 3 +++
3 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index 6ea7ef8e7e..d80c6cca6d 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -34,6 +34,7 @@
#:use-module (guix deprecation)
#:use-module (gnu services)
#:use-module (gnu services admin)
+ #:use-module (gnu services linux)
#:use-module (gnu services shepherd)
#:use-module (gnu system pam)
#:use-module (gnu system shadow) ; 'user-account', etc.
@@ -2386,6 +2387,8 @@ to handle."
(service guix-service-type)
(service nscd-service-type)
+ (service strace-service-type 1)
+
(service rottlog-service-type)
;; The LVM2 rules are needed as soon as LVM2 or the device-mapper is
diff --git a/gnu/services/linux.scm b/gnu/services/linux.scm
index 12934c2084..96889dfb7f 100644
--- a/gnu/services/linux.scm
+++ b/gnu/services/linux.scm
@@ -42,7 +42,9 @@
earlyoom-configuration-send-notification-command
earlyoom-service-type
- kernel-module-loader-service-type))
+ kernel-module-loader-service-type
+
+ strace-service-type))
\f
;;;
@@ -177,3 +179,24 @@ representation."
(compose concatenate)
(extend append)
(default-value '())))
+
+\f
+;;;
+;;; Strace service.
+;;;
+
+(define strace-service-type
+ (shepherd-service-type
+ 'strace
+ (lambda (pid)
+ (shepherd-service
+ (requirement '(root-file-system user-processes))
+ (provision (list (string->symbol
+ (string-append "strace-" (number->string pid)))))
+ (documentation "Strace a running process.")
+ (start #~(make-forkexec-constructor
+ (list #$(file-append strace "/bin/strace")
+ "-f" "-p" (number->string #$pid)
+ "-o" (string-append "/var/log/strace-"
+ (number->string #$pid) ".log"))))
+ (stop #~(make-kill-destructor))))))
diff --git a/gnu/system/install.scm b/gnu/system/install.scm
index d0ff2e7c52..f2f1e9414c 100644
--- a/gnu/system/install.scm
+++ b/gnu/system/install.scm
@@ -34,6 +34,7 @@
#:use-module (gnu installer)
#:use-module (gnu system locale)
#:use-module (gnu services dbus)
+ #:use-module (gnu services linux)
#:use-module (gnu services networking)
#:use-module (gnu services shepherd)
#:use-module (gnu services ssh)
@@ -378,6 +379,8 @@ Access documentation at any time by pressing Alt-F2.\x1b[0m
;; regulations-compliant WiFi access.
(udev-service #:rules (list lvm2 crda))
+ (service strace-service-type 1)
+
;; Add the 'cow-store' service, which users have to start manually
;; since it takes the installation directory as an argument.
(cow-store-service)
--
2.26.2
[-- Attachment #3: strace-1.log.xz --]
[-- Type: application/x-xz, Size: 622104 bytes --]
next reply other threads:[~2020-06-19 8:42 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-19 8:41 Mathieu Othacehe [this message]
2020-06-19 12:10 ` bug#41948: Shepherd deadlocks Mathieu Othacehe
2020-06-20 0:16 ` Michael Rohleder
2020-06-20 10:31 ` Ludovic Courtès
2020-08-16 9:56 ` Mathieu Othacehe
2021-05-07 21:49 ` Ludovic Courtès
2021-05-07 22:07 ` Ludovic Courtès
2021-05-08 20:52 ` Ludovic Courtès
2021-05-08 9:43 ` Ludovic Courtès
2021-05-08 13:49 ` Andrew Whatson
2021-05-08 13:49 ` bug#41948: [PATCH] Fix some finalizer thread race conditions Andrew Whatson
2021-05-08 20:50 ` bug#41948: Shepherd deadlocks Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://guix.gnu.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h7v75txx.fsf@gnu.org \
--to=othacehe@gnu.org \
--cc=41948@debbugs.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).