From: Ludovic Courtès
Subject: bug#37757: Kernel panic upon shutdown
Date: Thu, 28 Nov 2019 12:45:00 +0100
Message-ID: <87wobkw7gj.fsf@gnu.org>
References: <0876c9961fdffa47be54b756a05eb6320b6bdb18.camel@gmail.com> <874kzsfqsx.fsf@gnu.org> <87k183mnza.fsf@gnu.org>
In-Reply-To: <87k183mnza.fsf@gnu.org> ("Ludovic Courtès"'s message of "Wed, 13 Nov 2019 23:05:13 +0100")
To: Jesse Gibbons
Cc: 37757@debbugs.gnu.org

Hello!

The attached patch should allow shepherd (PID 1) to dump core when it
crashes (systemd does something similar).

Jesse (and anyone else experiencing this!), could you try to (1)
reconfigure with this patch, (2) reboot, (3) try to halt the system to
reproduce the crash, and (4) retrieve a backtrace from the ‘core’ file?

For #4, you’ll have to do something along these lines once you’ve
rebooted after the crash:

  sudo gdb /run/current-system/profile/bin/guile /core

and then type “thread apply all bt” at the GDB prompt.

I’ll also try to do that on another machine where I’ve seen it happen.

Thanks in advance!

Ludo’.

diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index 08bb33039c..ec49244cf6 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -277,45 +277,87 @@ and return the resulting '.go' file."
 
   (let ((files (map shepherd-service-file services)))
     (define config
-      #~(begin
-          (use-modules (srfi srfi-34)
-                       (system repl error-handling))
+      (with-imported-modules '((guix build syscalls))
+        #~(begin
+            (use-modules (srfi srfi-34)
+                         (system repl error-handling)
+                         (guix build syscalls)
+                         (system foreign))
 
-          ;; Arrange to spawn a REPL if something goes wrong.  This is better
-          ;; than a kernel panic.
-          (call-with-error-handling
-            (lambda ()
-              (apply register-services
-                     (map load-compiled '#$(map scm->go files)))))
+            (define signal
+              (let ((proc (pointer->procedure int
+                                              (dynamic-func "signal"
+                                                            (dynamic-link))
+                                              (list int '*))))
+                (lambda (signum handler)
+                  (proc signum
+                        (if (integer? handler)      ;SIG_DFL, etc.
+                            (make-pointer handler)
+                            (procedure->pointer void handler (list int)))))))
 
-          ;; guix-daemon 0.6 aborts if 'PATH' is undefined, so work around
-          ;; it.
-          (setenv "PATH" "/run/current-system/profile/bin")
+            (define (handle-crash sig)
+              (dynamic-wind
+                (const #t)
+                (lambda ()
+                  (gc-disable)
+                  (pk 'crash! sig)
+                  ;; Fork and have the child dump core at the root.
+                  (match (clone SIGCHLD)
+                    (0
+                     (setrlimit 'core #f #f)
+                     (chdir "/")
+                     (signal sig SIG_DFL)
+                     ;; Note: 'getpid' would return 1, hence this hack.
+                     (kill (string->number (readlink "/proc/self"))
+                           sig)
+                     (primitive-_exit 253))
+                    (child
+                     (waitpid child)
+                     (sync)
+                     ;; Hopefully at this point core has been dumped.
+                     (pk 'done)
+                     (sleep 3)
+                     (primitive-_exit 255))))
+                (lambda ()
+                  (primitive-_exit 254))))
 
-          (format #t "starting services...~%")
-          (for-each (lambda (service)
-                      ;; In the Shepherd 0.3 the 'start' method can raise
-                      ;; '&action-runtime-error' if it fails, so protect
-                      ;; against it.  (XXX: 'action-runtime-error?' is not
-                      ;; exported is 0.3, hence 'service-error?'.)
-                      (guard (c ((service-error? c)
-                                 (format (current-error-port)
-                                         "failed to start service '~a'~%"
-                                         service)))
-                        (start service)))
-                    '#$(append-map shepherd-service-provision
-                                   (filter shepherd-service-auto-start?
-                                           services)))
+            (signal SIGSEGV handle-crash)
 
-          ;; Hang up stdin.  At this point, we assume that 'start' methods
-          ;; that required user interaction on the console (e.g.,
-          ;; 'cryptsetup open' invocations, post-fsck emergency REPL) have
-          ;; completed.  User interaction becomes impossible after this
-          ;; call; this avoids situations where services wrongfully lead
-          ;; PID 1 to read from stdin (the console), which users may not
-          ;; have access to (see ).
-          (redirect-port (open-input-file "/dev/null")
-                         (current-input-port))))
+            ;; Arrange to spawn a REPL if something goes wrong.  This is better
+            ;; than a kernel panic.
+            (call-with-error-handling
+              (lambda ()
+                (apply register-services
+                       (map load-compiled '#$(map scm->go files)))))
+
+            ;; guix-daemon 0.6 aborts if 'PATH' is undefined, so work around
+            ;; it.
+            (setenv "PATH" "/run/current-system/profile/bin")
+
+            (format #t "starting services...~%")
+            (for-each (lambda (service)
+                        ;; In the Shepherd 0.3 the 'start' method can raise
+                        ;; '&action-runtime-error' if it fails, so protect
+                        ;; against it.  (XXX: 'action-runtime-error?' is not
+                        ;; exported is 0.3, hence 'service-error?'.)
+                        (guard (c ((service-error? c)
+                                   (format (current-error-port)
+                                           "failed to start service '~a'~%"
+                                           service)))
+                          (start service)))
+                      '#$(append-map shepherd-service-provision
+                                     (filter shepherd-service-auto-start?
+                                             services)))
+
+            ;; Hang up stdin.  At this point, we assume that 'start' methods
+            ;; that required user interaction on the console (e.g.,
+            ;; 'cryptsetup open' invocations, post-fsck emergency REPL) have
+            ;; completed.  User interaction becomes impossible after this
+            ;; call; this avoids situations where services wrongfully lead
+            ;; PID 1 to read from stdin (the console), which users may not
+            ;; have access to (see ).
+            (redirect-port (open-input-file "/dev/null")
+                           (current-input-port)))))
 
     (scheme-file "shepherd.conf" config)))
 
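
For step (4), the backtrace can probably also be captured without an
interactive GDB session, which makes it easier to attach to the bug
report.  This is only a sketch: it assumes the core file ended up at
/core, as in the command given in the message, and the output file name
shepherd-backtrace.txt is made up for illustration.  GDB's -batch and
-ex options run the given command and then exit.

```shell
# Paths taken from the message above; adjust if the core file is elsewhere.
guile=/run/current-system/profile/bin/guile
core=/core
# Hypothetical output file name, chosen only for this example.
out=shepherd-backtrace.txt

if [ -e "$core" ]; then
  # Run "thread apply all bt" non-interactively and save the result.
  sudo gdb -batch -ex 'thread apply all bt' "$guile" "$core" > "$out" 2>&1
  echo "backtrace written to $out"
else
  echo "no core file found at $core" >&2
fi
```

If no core file shows up after the crash, the patched configuration was
most likely not the one running, so it is worth checking that the
reconfigure and reboot steps both took place.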