From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: MPS: a random backtrace while toying with gdb
Date: Tue, 02 Jul 2024 17:57:08 +0300
Message-ID: <86sewrc057.fsf@gnu.org>
References: <87bk3jh8bt.fsf@localhost> <86r0cefb0i.fsf@gnu.org> <86msn1fk0c.fsf@gnu.org> <86h6d9dlyg.fsf@gnu.org> <86h6d8c52h.fsf@gnu.org>
In-Reply-To: (message from Pip Cet on Tue, 02 Jul 2024 14:24:33 +0000)
Cc: eller.helmut@gmail.com, gerd.moellmann@gmail.com, yantar92@posteo.net, emacs-devel@gnu.org
To: Pip Cet

> Date: Tue, 02 Jul 2024 14:24:33 +0000
> From: Pip Cet
> Cc: eller.helmut@gmail.com, gerd.moellmann@gmail.com, yantar92@posteo.net, emacs-devel@gnu.org
>
> > > > That's not the problem, AFAIU.  The problem is that a signal handler
> > > > which accesses Lisp data or the state of the Lisp machine could
> > > > trigger an MPS call, which will try taking the arena lock, and that
> > > > cannot be nested, by MPS design.  And our handlers do access the Lisp
> > > > machine, albeit cautiously and as little as necessary.  So when the
> > > > signal happens in the middle of an MPS call which already took the
> > > > arena lock, we cannot safely access our data.
> > >
> > > I've tried quite hard to make this happen, but I didn't manage it. It seems that whenever MPS puts up a protection barrier for existing allocated memory, the arena lock has already been released. As signal handlers cannot allocate memory directly, there's no deadlock, either.
> > >
> > > I don't understand MPS as well as you apparently do, so could you help me and tell where to put a kill(getpid(), SIGWHATEVER) with an appropriate signal handler which will cause a crash (without, in the signal handler, allocating memory)?
> >
> > I thought using the profiler would trigger these easily enough?  I
> > think someone (Helmut?) posted a simple recipe for reproducing that
> > some time ago?
>
> Those were all signals interrupting MPS's SIGSEGV handler. You were talking about signals interrupting MPS code that runs outside of a signal handler, weren't you?

I don't think they all were interrupting MPS's SIGSEGV handler.  I
think it's the other way around: we interrupted MPS code, and our
signal handler accessed memory which triggered MPS's SIGSEGV.  But
even if I'm wrong, why is that important?  We need to solve both
kinds of situations, don't we?

> > Also, there was a recipe with SIGCHLD not long ago (you'd need to undo
> > Helmut's fixes for that, I believe, to be able to reproduce that).
>
> Same thing.

Not AFAICT.
Look:

  Thread 1 "emacs" hit Breakpoint 1, terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:443
  443	{
  (gdb) bt
  #0  terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:443
  #1  0x00005555558634be in set_state (state=IGC_STATE_DEAD) at igc.c:179
  #2  igc_assert_fail (file=, line=, msg=) at igc.c:205
  #3  0x00005555558f1e19 in mps_lib_assert_fail (condition=0x555555943c4c "res == 0", line=126, file=0x555555943c36 "lockix.c") at /home/yantar92/Dist/mps/code/mpsliban.c:87
  #4  LockClaim (lock=0x7fffe8000110) at /home/yantar92/Dist/mps/code/lockix.c:126
  #5  0x00005555558f204d in ArenaEnterLock (arena=0x7ffff7fbf000, recursive=0) at /home/yantar92/Dist/mps/code/global.c:576
  #6  0x000055555591aefe in ArenaEnter (arena=0x7ffff7fbf000) at /home/yantar92/Dist/mps/code/global.c:553
  #7  ArenaAccess (addr=0x7fffeb908758, mode=mode@entry=3, context=context@entry=0x7fffffff97d0) at /home/yantar92/Dist/mps/code/global.c:655
  #8  0x0000555555926202 in sigHandle (sig=, info=0x7fffffff9af0, uap=0x7fffffff99c0) at /home/yantar92/Dist/mps/code/protsgix.c:97
  #9  0x00007ffff3048050 in () at /lib64/libc.so.6
  #10 0x0000555555827385 in PSEUDOVECTORP (a=XIL(0x7fffeb90875d), code=9) at /home/yantar92/Git/emacs/src/lisp.h:1105
  #11 PROCESSP (a=XIL(0x7fffeb90875d)) at /home/yantar92/Git/emacs/src/process.h:212
  #12 XPROCESS (a=XIL(0x7fffeb90875d)) at /home/yantar92/Git/emacs/src/process.h:224
  #13 handle_child_signal (sig=sig@entry=17) at process.c:7660
  #14 0x000055555573b771 in deliver_process_signal (sig=17, handler=handler@entry=0x555555827200 ) at sysdep.c:1758
  #15 0x0000555555820647 in deliver_child_signal (sig=) at process.c:7702
  #16 0x00007ffff3048050 in () at /lib64/libc.so.6
  #17 0x000055555585f77b in fix_lisp_obj (ss=ss@entry=0x7fffffffa9a8, pobj=pobj@entry=0x7fffeee7ffe8) at igc.c:841
  #18 0x000055555586050d in fix_cons (ss=0x7fffffffa9a8, cons=0x7fffeee7ffe0) at igc.c:1474
  #19 dflt_scan_obj (ss=0x7fffffffa9a8, base_start=0x7fffeee7ffd8, base_limit=0x7fffeee80000, closure=0x0) at igc.c:1578
  #20 dflt_scanx (ss=ss@entry=0x7fffffffa9a8, base_start=, base_limit=0x7fffeee80000, closure=closure@entry=0x0) at igc.c:1658
  #21 0x00005555558613a3 in dflt_scan (ss=0x7fffffffa9a8, base_start=, base_limit=) at igc.c:1669
  #22 0x00005555558f163f in TraceScanFormat (limit=0x7fffeee80000, base=0x7fffeee7e000, ss=0x7fffffffa9a0) at /home/yantar92/Dist/mps/code/trace.c:1539
  #23 amcSegScan (totalReturn=0x7fffffffa99c, seg=0x7fffe845e4c8, ss=0x7fffffffa9a0) at /home/yantar92/Dist/mps/code/poolamc.c:1440
  #24 0x000055555591e7bc in traceScanSegRes (ts=ts@entry=1, rank=rank@entry=1, arena=arena@entry=0x7ffff7fbf000, seg=seg@entry=0x7fffe845e4c8) at /home/yantar92/Dist/mps/code/trace.c:1205
  #25 0x000055555591e9ca in traceScanSeg (ts=1, rank=1, arena=0x7ffff7fbf000, seg=0x7fffe845e4c8) at /home/yantar92/Dist/mps/code/trace.c:1267
  #26 0x000055555591f3a4 in TraceAdvance (trace=trace@entry=0x7ffff7fbfaa8) at /home/yantar92/Dist/mps/code/trace.c:1728
  #27 0x000055555591faa4 in TracePoll (workReturn=workReturn@entry=0x7fffffffab90, collectWorldReturn=collectWorldReturn@entry=0x7fffffffab8c, globals=globals@entry=0x7ffff7fbf008, collectWorldAllowed=) at /home/yantar92/Dist/mps/code/trace.c:1849
  #28 0x000055555591fceb in ArenaPoll (globals=globals@entry=0x7ffff7fbf008) at /home/yantar92/Dist/mps/code/global.c:745
  #29 0x00005555559200da in mps_ap_fill (p_o=p_o@entry=0x7fffffffad00, mps_ap=mps_ap@entry=0x7fffe80017f0, size=size@entry=24) at /home/yantar92/Dist/mps/code/mpsi.c:1097
  #30 0x00005555558601ee in alloc_impl (size=24, type=IGC_OBJ_CONS, ap=0x7fffe80017f0) at igc.c:3330
  #31 0x000055555586023c in alloc (size=size@entry=16, type=type@entry=IGC_OBJ_CONS) at igc.c:3358
  #32 0x000055555586187a in igc_make_cons (car=XIL(0x133e0), cdr=XIL(0)) at igc.c:3385
  #33 0x000055555578e7de in Fcons (car=, cdr=) at alloc.c:2926
  #34 Flist (nargs=31, args=0x7fffffffaf38) at alloc.c:3054
  #35 0x00007ffff06b13ea in F7365742d666163652d617474726962757465_set_face_attribute_0 ()

This says:

  . we called Fcons (from a "normal" Emacs Lisp program, which called
    set-face-attribute)
  . that entered MPS by way of igc_make_cons
  . MPS called our scanning code in dflt_scan
  . while in fix_* functions called by dflt_scan, we got SIGCHLD
  . the SIGCHLD handler accessed Lisp data of the process object(s),
    which triggered MPS SIGSEGV handler
  . the MPS handler tried to take the arena lock and aborted

IOW, SIGCHLD did NOT interrupt the MPS SIGSEGV handler, it interrupted
the "normal" MPS code when it called our scanning callbacks.

> > Why not simply bind the sigusr2 event to some function (see the node
> > "Misc Events" in the ELisp manual for how), and then use "kill -USR2"
> > outside of Emacs?  IOW, I guess I don't understand why you'd need all
> > that complexity just to reproduce the crashes.
>
> Because I wanted to be sure to hit the tiny window while a global lock was taken.

I think the scenario above with SIGCHLD does precisely that, no?
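
The failure mode in frames #4 to #8 of the backtrace (a handler trying to take a lock that the interrupted code already holds) can be sketched in isolation in C. This is only an illustration, not code from Emacs or MPS: arena_lock, on_signal and lock_reentry_demo are hypothetical names, and it assumes the arena lock behaves like an error-checking, non-recursive POSIX mutex, which the failed "res == 0" assertion in lockix.c suggests but which is not verified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the MPS arena lock.  */
static pthread_mutex_t arena_lock;

/* What the handler's lock attempt returned; -1 means "handler not run".  */
static volatile sig_atomic_t handler_result = -1;

/* Stand-in for a handler that touches protected Lisp data and so ends
   up trying to enter the arena.  pthread_mutex_lock is not
   async-signal-safe; it is called here only to show the re-entry.  */
static void
on_signal (int sig)
{
  (void) sig;
  int res = pthread_mutex_lock (&arena_lock);
  if (res == 0)
    pthread_mutex_unlock (&arena_lock);
  handler_result = res;
}

/* Take the lock, get interrupted while holding it, and return what the
   handler's attempt to take the same lock yielded.  */
int
lock_reentry_demo (void)
{
  pthread_mutexattr_t attr;
  struct sigaction sa;
  int rc;

  /* Assumption: the lock is an error-checking, non-recursive mutex.  */
  rc = pthread_mutexattr_init (&attr);
  assert (rc == 0);
  rc = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert (rc == 0);
  rc = pthread_mutex_init (&arena_lock, &attr);
  assert (rc == 0);
  pthread_mutexattr_destroy (&attr);

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_signal;
  sigemptyset (&sa.sa_mask);
  rc = sigaction (SIGUSR1, &sa, NULL);
  assert (rc == 0);

  pthread_mutex_lock (&arena_lock);   /* "normal" code enters the arena */
  raise (SIGUSR1);                    /* handler runs before raise returns */
  pthread_mutex_unlock (&arena_lock);

  pthread_mutex_destroy (&arena_lock);
  return handler_result;
}
```

Calling lock_reentry_demo () from a main function should show the handler's pthread_mutex_lock returning EDEADLK rather than blocking. In the backtrace, LockClaim asserts that the result is 0, so a nonzero result there becomes the abort in frame #3. With a default mutex instead of an error-checking one, the same sequence would typically hang rather than fail.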