From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: MPS: a random backtrace while toying with gdb
Date: Tue, 02 Jul 2024 16:10:46 +0300
Message-ID: <86h6d8c52h.fsf@gnu.org>
References: <87bk3jh8bt.fsf@localhost> <86cynyhfsn.fsf@gnu.org> <87v81qp91g.fsf@gmail.com> <86r0cefb0i.fsf@gnu.org> <86msn1fk0c.fsf@gnu.org> <86h6d9dlyg.fsf@gnu.org>
Cc: eller.helmut@gmail.com, gerd.moellmann@gmail.com, yantar92@posteo.net, emacs-devel@gnu.org
To: Pip Cet
In-Reply-To: (message from Pip Cet on Tue, 02 Jul 2024 07:55:26 +0000)
Xref: news.gmane.io gmane.emacs.devel:321147

> Date: Tue, 02 Jul 2024 07:55:26 +0000
> From: Pip Cet
> Cc: eller.helmut@gmail.com, gerd.moellmann@gmail.com, yantar92@posteo.net, emacs-devel@gnu.org
>
> > Which is why I suggested to block the signals before calling MPS and
> > unblock them immediately when we return from an MPS call.  All of
> > these calls are in igc.c, so the job of adding these blocks, while
> > mundane and boring, is not impossible.
>
> And it adds two syscalls to what should be a very fast operation. I'm not convinced it's necessary.

We should time these syscalls if we are afraid they could slow us down.

> > That's not the problem, AFAIU.  The problem is that a signal handler
> > which accesses Lisp data or the state of the Lisp machine could
> > trigger an MPS call, which will try taking the arena lock, and that
> > cannot be nested, by MPS design.  And our handlers do access the Lisp
> > machine, albeit cautiously and as little as necessary.  So when the
> > signal happens in the middle of an MPS call which already took the
> > arena lock, we cannot safely access our data.
>
> I've tried quite hard to make this happen, but I didn't manage it.
> It seems that whenever MPS puts up a protection barrier for existing allocated memory, the arena lock has already been released. As signal handlers cannot allocate memory directly, there's no deadlock, either.
>
> I don't understand MPS as well as you apparently do, so could you help me and tell where to put a kill(getpid(), SIGWHATEVER) with an appropriate signal handler which will cause a crash (without, in the signal handler, allocating memory)?

I thought using the profiler would trigger these easily enough?  I think someone (Helmut?) posted a simple recipe for reproducing that some time ago?  Also, there was a recipe with SIGCHLD not long ago (you'd need to undo Helmut's fixes for that, I believe, to be able to reproduce that).

> I'm seriously tempted to suggest that until we can produce such a crash, we can work on the assumption that blocking signals while handling SIGSEGV is enough, but, again, I don't fully understand MPS and its complicated locking scheme.

I agree that having a reproduction recipe is a necessary condition for trying to fix this.

> To expand a little on what I'm doing:
>
> * install a handler for SIGUSR2 which dereferences a pointer stored in a global variable (and remove the old SIGUSR2 handler)
> * modify MPS's locking functions to kill(getpid(), SIGUSR2) right after acquiring the lock
> * in gdb, wait for a SIGSEGV to find a protected address/segment. Store that in the pointer variable.
> * there should now be a crash when the SIGUSR2 handler runs and memory protection for the pointer is in effect
> * no crashes observed so far.

Why not simply bind the sigusr2 event to some function (see the node "Misc Events" in the ELisp manual for how), and then use "kill -USR2" outside of Emacs?

IOW, I guess I don't understand why you'd need all that complexity just to reproduce the crashes.
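
On the question of timing the two syscalls: a rough way to measure the block/unblock pair might look like the sketch below.  It is only an illustration under stated assumptions, not code from igc.c.  `fake_mps_call`, `call_with_signals_blocked` and `ns_per_call` are hypothetical names, the stand-in function does no real MPS work, and `sigprocmask` is used for brevity where a multi-threaded program would normally use `pthread_sigmask`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a short MPS call; NOT a real MPS function.  */
static volatile unsigned long fake_mps_calls;

static void
fake_mps_call (void)
{
  fake_mps_calls++;
}

/* Block all signals, run FN, then restore the previous signal mask.
   This costs two syscalls per call.  Return 0 on success, -1 on failure.  */
static int
call_with_signals_blocked (void (*fn) (void))
{
  sigset_t all, old;

  sigfillset (&all);
  if (sigprocmask (SIG_BLOCK, &all, &old) != 0)
    return -1;
  fn ();
  return sigprocmask (SIG_SETMASK, &old, NULL);
}

/* Return the average wall-clock nanoseconds per call of the stand-in
   over ITERATIONS calls.  If BLOCK is nonzero, each call is wrapped in
   the block/unblock pair; otherwise it is called directly.  */
static double
ns_per_call (long iterations, int block)
{
  struct timespec t0, t1;

  if (iterations <= 0)
    return 0.0;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (long i = 0; i < iterations; i++)
    {
      if (block)
        call_with_signals_blocked (fake_mps_call);
      else
        fake_mps_call ();
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
  return ns / iterations;
}
```

Comparing `ns_per_call (n, 1)` with `ns_per_call (n, 0)` for a large `n` gives an estimate of the per-call overhead of the two syscalls on a given system.  The number that matters for the discussion would come from timing actual MPS calls in igc.c, which this sketch does not attempt.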