Hello,

[+Cc: Andy for a heads-up on the fix below.]

Ludovic Courtès writes:

> It turns out the previous patch didn’t work; in short, we really have to
> use async-signal-safe functions only from the signal handler, so this
> has to be done in C.
>
> The attached patch does that.  I’ve tried it with ‘guix system
> container’ and it seems to dump core as expected, from what I can see.
>
> Let me know if you manage to reproduce the bug and to get a core dumped
> with this patch.

Good news!  The patch does indeed allow shepherd to dump core, and I
managed to grab the backtrace below on an x86_64 machine running Guix
System (from yesterday) with GNOME:

--8<---------------cut here---------------start------------->8---
Using host libthread_db library "/gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libthread_db.so.1".
Core was generated by `/gnu/store/1mkkv2caiqbdbbd256c4dirfi4kwsacv-guile-2.2.6/bin/guile --no-auto-com'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  handle_crash (sig=11) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
43        * (int *) 0 = 42;
[Current thread is 1 (LWP 4635)]

[…]

Thread 1 (LWP 4635):
#0  handle_crash (sig=11) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
        infinity = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pid =
        msg = "Shepherd crashed!\n"
        pid =
#1
No locals.
#2  handle_crash (sig=6) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
        infinity = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pid =
        msg = "Shepherd crashed!\n"
        pid =
#3
No locals.
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 2314885530818445312, 0 }}
        pid =
        tid =
        ret =
#5  0x00007f03eef40891 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 , 139654877144192, 0, 139654877624544}}, sa_flags = -279049286, sa_restorer = 0x7f03ef57e480 }
        sigs = {__val = {32, 0 }}
#6  0x00007f03ef57e89a in finalization_thread_proc (unused=) at finalizers.c:228
        data = {byte = -24 '\350', n = -1, err = 4}
#7  0x00007f03ef56f35a in c_body (d=0x7f03ed152e50) at continuations.c:422
        data = 0x7f03ed152e50
#8  0x00007f03ef5f079f in vm_regular_engine (thread=0x2, vp=0x7f03eb1caea0, registers=0x0, resume=-286001158) at vm-engine.c:786
        ret = 2
        ip =
        sp =
        op = 10
        jump_table_ = {…}
        jump_table = 0x7f03ef64d8e0

[…]

#19 scm_with_guile (func=, data=) at threads.c:710
No locals.
#20 0x00007f03ef497015 in start_thread (arg=0x7f03ed153700) at pthread_create.c:486
        ret =
        pd = 0x7f03ed153700
        now =
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139654839219968, -749312912628550421, 140727702524830, 140727702524831, 140727702524832, 139654839219968, 837174519050892523, 837169745183601899}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call =
#21 0x00007f03eeffd91f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--8<---------------cut here---------------end--------------->8---

So what happens is that ‘finalization_thread_proc’ in Guile receives
EINTR (data.err == 4) but then, despite EINTR, it goes on to check the
value of ‘data.byte’ and aborts because it’s neither 0 nor 1.

My plan is to:

  1. push the patch below to the ‘stable-2.2’ branch of Guile; done: ;

  2. use a patched Guile for the ‘shepherd’ package;

  3. include the crash handler in the Shepherd.

Thoughts?

Thanks,
Ludo’.
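
PS: To illustrate the EINTR problem described above, here is a minimal
sketch of the usual retry pattern.  It is not the Guile patch itself:
‘read_byte_retrying’ is a hypothetical helper name, and I am assuming
the byte comes from a plain read(2) on a file descriptor.  The point is
only that the byte must not be inspected when the call was interrupted.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not Guile's actual code: read one byte from FD
   into *BYTE, retrying when read(2) is interrupted by a signal (EINTR).
   Return 1 if a byte was read, 0 on end-of-file, and -1 on any other
   error (with errno set by read).  */
int
read_byte_retrying (int fd, char *byte)
{
  ssize_t n;

  do
    n = read (fd, byte, 1);
  while (n == -1 && errno == EINTR);   /* interrupted: *BYTE is not valid yet */

  return n < 0 ? -1 : (int) n;
}
```

A caller would look at the byte only when the return value is 1, which
avoids acting on an uninitialized value such as the ‘byte = -24’ seen
in frame #6.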