* bug#37757: Kernel panic upon shutdown
From: Ludovic Courtès @ 2019-10-28 22:28 UTC
To: Jesse Gibbons; +Cc: 37757

Hi,

Jesse Gibbons <jgibbons2357@gmail.com> writes:

> Attached is a picture of the kernel panic. It happened when I tried to shut
> down.
> I do not know what log to look at to get any details about what happened
> about that time. Of course, the panic itself is not in any of the logs in
> /var/log.
> This is not the first time there was a kernel panic during the shutdown
> process.

I’ve just seen it on a laptop running GNOME and ‘%desktop-services’.
The kernel panic appeared right after shutting down ModemManager (I
don’t have ModemManager on my own laptop and I’ve never experienced the
bug, but I don’t know if it’s significant.)

Note that we see (roughly):

  attempted to kill init! exit code=0x0000000b

which, unless I’m mistaken, means that PID 1 segfaulted (SIGSEGV = 11),
which is bad.

According to reboot(2), the ‘reboot’ syscall doesn’t return in this
case, so the segfault must have happened before the ‘reboot’ call.

The problem appeared roughly after the ‘core-updates’ merge, but I don’t
see any change to the ‘reboot’ wrapper in glibc 2.29.

Is it reproducible for you in a VM built with ‘guix system vm’? It
would be helpful if we had that.

Thanks,
Ludo’.
* bug#37757: Kernel panic upon shutdown
From: Ludovic Courtès @ 2019-11-13 22:05 UTC
To: Jesse Gibbons; +Cc: 37757

Ludovic Courtès <ludo@gnu.org> writes:

> I’ve just seen it on a laptop running GNOME and ‘%desktop-services’.
> The kernel panic appeared right after shutting down ModemManager (I
> don’t have ModemManager on my own laptop and I’ve never experienced the
> bug, but I don’t know if it’s significant.)
>
> Note that we see (roughly):
>
>   attempted to kill init! exit code=0x0000000b

[...]

> Is it reproducible for you in a VM built with ‘guix system vm’? It
> would be helpful if we had that.

For the record, apparently I can’t reproduce it in a ‘guix system vm
gnu/system/examples/desktop.tmpl’ VM.

Ludo’.
* bug#37757: Kernel panic upon shutdown
From: Jan @ 2019-11-13 22:22 UTC
To: Ludovic Courtès; +Cc: Jesse Gibbons, 37757

Hi,

I encountered the same error today. I had run "sudo herd stop tor" and
then "sudo herd stop xorg-server", and it panicked.

Jan Wielkiewicz
* bug#37757: Kernel panic upon shutdown
From: Ludovic Courtès @ 2019-11-28 11:45 UTC
To: Jesse Gibbons; +Cc: 37757

Hello!

The attached patch should allow shepherd (PID 1) to dump core when it
crashes (systemd does something similar).

Jesse (and anyone else experiencing this!), could you try to (1)
reconfigure with this patch, (2) reboot, (3) try to halt the system to
reproduce the crash, and (4) retrieve a backtrace from the ‘core’ file?

For #4, you’ll have to do something along these lines once you’ve
rebooted after the crash:

  sudo gdb /run/current-system/profile/bin/guile /core

and then type “thread apply all bt” at the GDB prompt.

I’ll also try to do that on another machine where I’ve seen it happen.

Thanks in advance!

Ludo’.

[-- Attachment #2: Type: text/x-patch, Size: 6254 bytes --]

diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index 08bb33039c..ec49244cf6 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -277,45 +277,87 @@ and return the resulting '.go' file."
 
   (let ((files (map shepherd-service-file services)))
     (define config
-      #~(begin
-          (use-modules (srfi srfi-34)
-                       (system repl error-handling))
+      (with-imported-modules '((guix build syscalls))
+        #~(begin
+            (use-modules (srfi srfi-34)
+                         (system repl error-handling)
+                         (guix build syscalls)
+                         (system foreign))
 
-          ;; Arrange to spawn a REPL if something goes wrong.  This is better
-          ;; than a kernel panic.
-          (call-with-error-handling
-            (lambda ()
-              (apply register-services
-                     (map load-compiled '#$(map scm->go files)))))
+            (define signal
+              (let ((proc (pointer->procedure int
+                                              (dynamic-func "signal"
+                                                            (dynamic-link))
+                                              (list int '*))))
+                (lambda (signum handler)
+                  (proc signum
+                        (if (integer? handler)    ;SIG_DFL, etc.
+                            (make-pointer handler)
+                            (procedure->pointer void handler (list int)))))))
 
-          ;; guix-daemon 0.6 aborts if 'PATH' is undefined, so work around
-          ;; it.
-          (setenv "PATH" "/run/current-system/profile/bin")
+            (define (handle-crash sig)
+              (dynamic-wind
+                (const #t)
+                (lambda ()
+                  (gc-disable)
+                  (pk 'crash! sig)
+                  ;; Fork and have the child dump core at the root.
+                  (match (clone SIGCHLD)
+                    (0
+                     (setrlimit 'core #f #f)
+                     (chdir "/")
+                     (signal sig SIG_DFL)
+                     ;; Note: 'getpid' would return 1, hence this hack.
+                     (kill (string->number (readlink "/proc/self"))
+                           sig)
+                     (primitive-_exit 253))
+                    (child
+                     (waitpid child)
+                     (sync)
+                     ;; Hopefully at this point core has been dumped.
+                     (pk 'done)
+                     (sleep 3)
+                     (primitive-_exit 255))))
+                (lambda ()
+                  (primitive-_exit 254))))
 
-          (format #t "starting services...~%")
-          (for-each (lambda (service)
-                      ;; In the Shepherd 0.3 the 'start' method can raise
-                      ;; '&action-runtime-error' if it fails, so protect
-                      ;; against it.  (XXX: 'action-runtime-error?' is not
-                      ;; exported is 0.3, hence 'service-error?'.)
-                      (guard (c ((service-error? c)
-                                 (format (current-error-port)
-                                         "failed to start service '~a'~%"
-                                         service)))
-                        (start service)))
-                    '#$(append-map shepherd-service-provision
-                                   (filter shepherd-service-auto-start?
-                                           services)))
+            (signal SIGSEGV handle-crash)
 
-          ;; Hang up stdin.  At this point, we assume that 'start' methods
-          ;; that required user interaction on the console (e.g.,
-          ;; 'cryptsetup open' invocations, post-fsck emergency REPL) have
-          ;; completed.  User interaction becomes impossible after this
-          ;; call; this avoids situations where services wrongfully lead
-          ;; PID 1 to read from stdin (the console), which users may not
-          ;; have access to (see <https://bugs.gnu.org/23697>).
-          (redirect-port (open-input-file "/dev/null")
-                         (current-input-port))))
+            ;; Arrange to spawn a REPL if something goes wrong.  This is better
+            ;; than a kernel panic.
+            (call-with-error-handling
+              (lambda ()
+                (apply register-services
+                       (map load-compiled '#$(map scm->go files)))))
+
+            ;; guix-daemon 0.6 aborts if 'PATH' is undefined, so work around
+            ;; it.
+            (setenv "PATH" "/run/current-system/profile/bin")
+
+            (format #t "starting services...~%")
+            (for-each (lambda (service)
+                        ;; In the Shepherd 0.3 the 'start' method can raise
+                        ;; '&action-runtime-error' if it fails, so protect
+                        ;; against it.  (XXX: 'action-runtime-error?' is not
+                        ;; exported is 0.3, hence 'service-error?'.)
+                        (guard (c ((service-error? c)
+                                   (format (current-error-port)
+                                           "failed to start service '~a'~%"
+                                           service)))
+                          (start service)))
+                      '#$(append-map shepherd-service-provision
+                                     (filter shepherd-service-auto-start?
+                                             services)))
+
+            ;; Hang up stdin.  At this point, we assume that 'start' methods
+            ;; that required user interaction on the console (e.g.,
+            ;; 'cryptsetup open' invocations, post-fsck emergency REPL) have
+            ;; completed.  User interaction becomes impossible after this
+            ;; call; this avoids situations where services wrongfully lead
+            ;; PID 1 to read from stdin (the console), which users may not
+            ;; have access to (see <https://bugs.gnu.org/23697>).
+            (redirect-port (open-input-file "/dev/null")
+                           (current-input-port)))))
 
     (scheme-file "shepherd.conf" config)))
* bug#37757: Kernel panic upon shutdown
From: Ludovic Courtès @ 2019-12-02 17:33 UTC
To: Jesse Gibbons, Jan; +Cc: 37757

Hi!

Ludovic Courtès <ludo@gnu.org> writes:

> Jesse (and anyone else experiencing this!), could you try to (1)
> reconfigure with this patch, (2) reboot, (3) try to halt the system to
> reproduce the crash, and (4) retrieve a backtrace from the ‘core’ file?
>
> For #4, you’ll have to do something along these lines once you’ve
> rebooted after the crash:
>
>   sudo gdb /run/current-system/profile/bin/guile /core
>
> and then type “thread apply all bt” at the GDB prompt.

It turns out the previous patch didn’t work; in short, we really have to
use async-signal-safe functions only from the signal handler, so this
has to be done in C.

The attached patch does that. I’ve tried it with ‘guix system
container’ and it seems to dump core as expected, from what I can see.

Let me know if you manage to reproduce the bug and to get a core dumped
with this patch.

To everyone reading this: if you’re experiencing shepherd crashes,
please raise your hand :-) and consider applying this patch so we can
gather debugging info!

Thanks,
Ludo’.

[-- Attachment #2: Type: text/x-patch, Size: 3376 bytes --]

diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index 08bb33039c..cf82ef0a4c 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -271,6 +271,23 @@ and return the resulting '.go' file."
                       (compile-file #$file #:output-file #$output
                                     #:env env))))))
 
+(define (crash-handler)
+  (define gcc-toolchain
+    (module-ref (resolve-interface '(gnu packages commencement))
+                'gcc-toolchain))
+
+  (define source
+    (local-file "../system/aux-files/shepherd-crash-handler.c"))
+
+  (computed-file "crash-handler.so"
+                 #~(begin
+                     (setenv "PATH" #+(file-append gcc-toolchain "/bin"))
+                     (setenv "CPATH" #+(file-append gcc-toolchain "/include"))
+                     (setenv "LIBRARY_PATH"
+                             #+(file-append gcc-toolchain "/lib"))
+                     (system* "gcc" "-Wall" "-g" "-O3" "-fPIC"
+                              "-shared" "-o" #$output #$source))))
+
 (define (shepherd-configuration-file services)
   "Return the shepherd configuration file for SERVICES."
   (assert-valid-graph services)
@@ -281,6 +298,9 @@ and return the resulting '.go' file."
           (use-modules (srfi srfi-34)
                        (system repl error-handling))
 
+          ;; Load the crash handler, which allows shepherd to dump core.
+          (dynamic-link #$(crash-handler))
+
           ;; Arrange to spawn a REPL if something goes wrong.  This is better
           ;; than a kernel panic.
           (call-with-error-handling
diff --git a/gnu/system/aux-files/shepherd-crash-handler.c b/gnu/system/aux-files/shepherd-crash-handler.c
new file mode 100644
index 0000000000..6b2db10866
--- /dev/null
+++ b/gnu/system/aux-files/shepherd-crash-handler.c
@@ -0,0 +1,70 @@
+#define _GNU_SOURCE
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <sched.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>           /* For SYS_xxx definitions */
+#include <signal.h>
+
+static void
+handle_crash (int sig)
+{
+  static const char msg[] = "Shepherd crashed!\n";
+  write (2, msg, sizeof msg);
+
+#ifdef __sparc__
+  /* See 'raw_clone' in systemd.  */
+# error "SPARC uses a different 'clone' syscall convention"
+#endif
+
+  pid_t pid = syscall (SYS_clone, SIGCHLD, NULL);
+  if (pid < 0)
+    abort ();
+
+  if (pid == 0)
+    {
+      /* Restore the default signal handler to get a core dump.  */
+      signal (sig, SIG_DFL);
+
+      const struct rlimit infinity = { RLIM_INFINITY, RLIM_INFINITY };
+      setrlimit (RLIMIT_CORE, &infinity);
+      chdir ("/");
+
+      int pid = syscall (SYS_getpid);
+      kill (pid, sig);
+
+      /* As it turns out, 'kill' simply returns without doing anything, which
+         is consistent with the "Notes" section of kill(2).  Thus, force a
+         crash.  */
+      * (int *) 0 = 42;
+
+      _exit (254);
+    }
+  else
+    {
+      signal (sig, SIG_IGN);
+
+      int status;
+      waitpid (pid, &status, 0);
+
+      sync ();
+
+      _exit (255);
+    }
+
+  _exit (253);
+}
+
+static void initialize_crash_handler (void)
+  __attribute__ ((constructor));
+
+static void
+initialize_crash_handler (void)
+{
+  signal (SIGSEGV, handle_crash);
+  signal (SIGABRT, handle_crash);
+}
* bug#37757: Kernel panic upon shutdown
From: Arne Babenhauserheide @ 2019-12-03  9:43 UTC
To: 37757; +Cc: jgibbons2357

Ludovic Courtès <ludo@gnu.org> writes:

> To everyone reading this: if you’re experiencing shepherd crashes,
> please raise your hand :-)

\o

> and consider applying this patch so we can gather debugging info!

Can I do that without installing from a local checkout?

Best wishes,
Arne
-- 
Being apolitical means being political without noticing it.
* bug#37757: Kernel panic upon shutdown
From: Ludovic Courtès @ 2019-12-09 13:47 UTC
To: Jesse Gibbons; +Cc: Andy Wingo, 37757

Hello,

[+Cc: Andy for a heads-up on the fix below.]

Ludovic Courtès <ludo@gnu.org> writes:

> It turns out the previous patch didn’t work; in short, we really have to
> use async-signal-safe functions only from the signal handler, so this
> has to be done in C.
>
> The attached patch does that. I’ve tried it with ‘guix system
> container’ and it seems to dump core as expected, from what I can see.
>
> Let me know if you manage to reproduce the bug and to get a core dumped
> with this patch.

Good news! The patch does indeed allow shepherd to dump core, and I
managed to grab the backtrace below on an x86_64 machine running Guix
System (from yesterday) with GNOME:

--8<---------------cut here---------------start------------->8---
Using host libthread_db library "/gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libthread_db.so.1".
Core was generated by `/gnu/store/1mkkv2caiqbdbbd256c4dirfi4kwsacv-guile-2.2.6/bin/guile --no-auto-com'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  handle_crash (sig=11) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
43	      * (int *) 0 = 42;
[Current thread is 1 (LWP 4635)]
[…]
Thread 1 (LWP 4635):
#0  handle_crash (sig=11) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
        infinity = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pid = <optimized out>
        msg = "Shepherd crashed!\n"
        pid = <optimized out>
#1  <signal handler called>
No locals.
#2  handle_crash (sig=6) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
        infinity = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pid = <optimized out>
        msg = "Shepherd crashed!\n"
        pid = <optimized out>
#3  <signal handler called>
No locals.
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 2314885530818445312, 0 <repeats 14 times>}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#5  0x00007f03eef40891 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 13 times>, 139654877144192, 0, 139654877624544}}, sa_flags = -279049286, sa_restorer = 0x7f03ef57e480 <read_finalization_pipe_data>}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#6  0x00007f03ef57e89a in finalization_thread_proc (unused=<optimized out>) at finalizers.c:228
        data = {byte = -24 '\350', n = -1, err = 4}
#7  0x00007f03ef56f35a in c_body (d=0x7f03ed152e50) at continuations.c:422
        data = 0x7f03ed152e50
#8  0x00007f03ef5f079f in vm_regular_engine (thread=0x2, vp=0x7f03eb1caea0, registers=0x0, resume=-286001158) at vm-engine.c:786
        ret = 2
        ip = <optimized out>
        sp = <optimized out>
        op = 10
        jump_table_ = {…}
        jump_table = 0x7f03ef64d8e0 <jump_table_>
[…]
#19 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:710
No locals.
#20 0x00007f03ef497015 in start_thread (arg=0x7f03ed153700) at pthread_create.c:486
        ret = <optimized out>
        pd = 0x7f03ed153700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139654839219968, -749312912628550421, 140727702524830, 140727702524831, 140727702524832, 139654839219968, 837174519050892523, 837169745183601899}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#21 0x00007f03eeffd91f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--8<---------------cut here---------------end--------------->8---

So what happens is that ‘finalization_thread_proc’ in Guile receives
EINTR (data.err == 4) but then, despite EINTR, it goes on to check the
value of ‘data.byte’. That byte was never filled in by the interrupted
read (gdb shows byte = -24), so the thread aborts because it’s neither
0 nor 1.

My plan is to:

  1. push the patch below to the ‘stable-2.2’ branch of Guile;
     done:
     <https://git.savannah.gnu.org/cgit/guile.git/commit/?h=stable-2.2&id=edf5aea7ac852db2356ef36cba4a119eb0c81ea9>;

  2. use a patched Guile for the ‘shepherd’ package;

  3. include the crash handler in the Shepherd.

Thoughts?

Thanks,
Ludo’.

[-- Attachment #2: Type: text/x-patch, Size: 1353 bytes --]

diff --git a/libguile/finalizers.c b/libguile/finalizers.c
index c5d69e8e3..94a6e6b0a 100644
--- a/libguile/finalizers.c
+++ b/libguile/finalizers.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2012, 2013, 2014 Free Software Foundation, Inc.
+/* Copyright (C) 2012, 2013, 2014, 2019 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -211,21 +211,26 @@ finalization_thread_proc (void *unused)
       scm_without_guile (read_finalization_pipe_data, &data);
 
-      if (data.n <= 0 && data.err != EINTR)
+      if (data.n <= 0)
         {
-          perror ("error in finalization thread");
-          return NULL;
+          if (data.err != EINTR)
+            {
+              perror ("error in finalization thread");
+              return NULL;
+            }
         }
-
-      switch (data.byte)
+      else
         {
-        case 0:
-          scm_run_finalizers ();
-          break;
-        case 1:
-          return NULL;
-        default:
-          abort ();
+          switch (data.byte)
+            {
+            case 0:
+              scm_run_finalizers ();
+              break;
+            case 1:
+              return NULL;
+            default:
+              abort ();
+            }
         }
     }
 }
* bug#37757: Kernel panic upon shutdown
From: Ludovic Courtès @ 2019-12-09 23:13 UTC
To: Jesse Gibbons; +Cc: 37757-done

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> My plan is to:
>
>   1. push the patch below to the ‘stable-2.2’ branch of Guile;
>      done:
>      <https://git.savannah.gnu.org/cgit/guile.git/commit/?h=stable-2.2&id=edf5aea7ac852db2356ef36cba4a119eb0c81ea9>;
>
>   2. use a patched Guile for the ‘shepherd’ package;

Done: <https://git.savannah.gnu.org/cgit/guix.git/commit/?id=24ba2cee2b1671c5dae36bb4cdba139f1fd09023>.

>   3. include the crash handler in the Shepherd.

Done: <https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=dfb7c7ecdb2d12061073e6939ec6e765ae59c00c>.

I’m closing the bug. Please reopen it if you notice anything wrong!

Ludo’.