From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ludovic Courtès
Subject: bug#37757: Kernel panic upon shutdown
Date: Mon, 02 Dec 2019 18:33:03 +0100
Message-ID: <87d0d6k4z4.fsf@gnu.org>
References: <0876c9961fdffa47be54b756a05eb6320b6bdb18.camel@gmail.com>
 <874kzsfqsx.fsf@gnu.org> <87k183mnza.fsf@gnu.org>
 <87wobkw7gj.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Received: from eggs.gnu.org ([2001:470:142:3::10]:51568)
 by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from )
 id 1ibpaF-0007Vt-JP for bug-guix@gnu.org; Mon, 02 Dec 2019 12:34:04 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1ibpaE-00061l-9q
 for bug-guix@gnu.org; Mon, 02 Dec 2019 12:34:03 -0500
Received: from debbugs.gnu.org ([209.51.188.43]:32901)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71)
 (envelope-from ) id 1ibpaE-00061f-6Z
 for bug-guix@gnu.org; Mon, 02 Dec 2019 12:34:02 -0500
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
 (envelope-from ) id 1ibpaD-00030B-Vb
 for bug-guix@gnu.org; Mon, 02 Dec 2019 12:34:01 -0500
Sender: "Debbugs-submit"
In-Reply-To: <87wobkw7gj.fsf@gnu.org> ("Ludovic Courtès"'s message of
 "Thu, 28 Nov 2019 12:45:00 +0100")
List-Id: Bug reports for GNU Guix
Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org
Sender: "bug-Guix"
To: Jesse Gibbons, Jan
Cc: 37757@debbugs.gnu.org

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi!

Ludovic Courtès writes:

> Jesse (and anyone else experiencing this!), could you try to (1)
> reconfigure with this patch, (2) reboot, (3) try to halt the system to
> reproduce the crash, and (4) retrieve a backtrace from the ‘core’ file?
>
> For #4, you’ll have to do something along these lines once you’ve
> rebooted after the crash:
>
>   sudo gdb /run/current-system/profile/bin/guile /core
>
> and then type “thread apply all bt” at the GDB prompt.

It turns out the previous patch didn’t work; in short, we really have to
use async-signal-safe functions only from the signal handler, so this
has to be done in C.

The attached patch does that. I’ve tried it with ‘guix system
container’ and it seems to dump core as expected, from what I can see.

Let me know if you manage to reproduce the bug and to get a core dumped
with this patch.

To everyone reading this: if you’re experiencing shepherd crashes,
please raise your hand :-) and consider applying this patch so we can
gather debugging info!

Thanks,
Ludo’.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline

diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index 08bb33039c..cf82ef0a4c 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -271,6 +271,23 @@ and return the resulting '.go' file."
           (compile-file #$file #:output-file #$output
                         #:env env))))))
 
+(define (crash-handler)
+  (define gcc-toolchain
+    (module-ref (resolve-interface '(gnu packages commencement))
+                'gcc-toolchain))
+
+  (define source
+    (local-file "../system/aux-files/shepherd-crash-handler.c"))
+
+  (computed-file "crash-handler.so"
+                 #~(begin
+                     (setenv "PATH" #+(file-append gcc-toolchain "/bin"))
+                     (setenv "CPATH" #+(file-append gcc-toolchain "/include"))
+                     (setenv "LIBRARY_PATH"
+                             #+(file-append gcc-toolchain "/lib"))
+                     (system* "gcc" "-Wall" "-g" "-O3" "-fPIC"
+                              "-shared" "-o" #$output #$source))))
+
 (define (shepherd-configuration-file services)
   "Return the shepherd configuration file for SERVICES."
   (assert-valid-graph services)
@@ -281,6 +298,9 @@ and return the resulting '.go' file."
       (use-modules (srfi srfi-34)
                    (system repl error-handling))
 
+      ;; Load the crash handler, which allows shepherd to dump core.
+      (dynamic-link #$(crash-handler))
+
       ;; Arrange to spawn a REPL if something goes wrong. This is better
       ;; than a kernel panic.
       (call-with-error-handling
diff --git a/gnu/system/aux-files/shepherd-crash-handler.c b/gnu/system/aux-files/shepherd-crash-handler.c
new file mode 100644
index 0000000000..6b2db10866
--- /dev/null
+++ b/gnu/system/aux-files/shepherd-crash-handler.c
@@ -0,0 +1,70 @@
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include /* For SYS_xxx definitions */
+#include
+
+static void
+handle_crash (int sig)
+{
+  static const char msg[] = "Shepherd crashed!\n";
+  write (2, msg, sizeof msg);
+
+#ifdef __sparc__
+  /* See 'raw_clone' in systemd. */
+# error "SPARC uses a different 'clone' syscall convention"
+#endif
+
+  pid_t pid = syscall (SYS_clone, SIGCHLD, NULL);
+  if (pid < 0)
+    abort ();
+
+  if (pid == 0)
+    {
+      /* Restore the default signal handler to get a core dump. */
+      signal (sig, SIG_DFL);
+
+      const struct rlimit infinity = { RLIM_INFINITY, RLIM_INFINITY };
+      setrlimit (RLIMIT_CORE, &infinity);
+      chdir ("/");
+
+      int pid = syscall (SYS_getpid);
+      kill (pid, sig);
+
+      /* As it turns out, 'kill' simply returns without doing anything, which
+         is consistent with the "Notes" section of kill(2). Thus, force a
+         crash. */
+      * (int *) 0 = 42;
+
+      _exit (254);
+    }
+  else
+    {
+      signal (sig, SIG_IGN);
+
+      int status;
+      waitpid (pid, &status, 0);
+
+      sync ();
+
+      _exit (255);
+    }
+
+  _exit (253);
+}
+
+static void initialize_crash_handler (void)
+  __attribute__ ((constructor));
+
+static void
+initialize_crash_handler (void)
+{
+  signal (SIGSEGV, handle_crash);
+  signal (SIGABRT, handle_crash);
+}

--=-=-=--