From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Mon, 06 Jan 2025 14:59:07 +0200 Message-ID: <867c786quc.fsf@gnu.org> References: <87jzbbke6u.fsf@protonmail.com> <87msg7iq0o.fsf@protonmail.com> <86ed1jf1tp.fsf@gnu.org> <865xmugawr.fsf@gnu.org> <8634hx8k1u.fsf@gnu.org> <86msg56to8.fsf@gnu.org> <86h66d6pw1.fsf@gnu.org> <4B76EB57-AA29-40BC-8361-0906E00A3578@dancol.org> Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="15822"; mail-complaints-to="usenet@ciao.gmane.io" Cc: gerd.moellmann@gmail.com, pipcet@protonmail.com, 75322@debbugs.gnu.org To: Daniel Colascione Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Jan 06 14:00:53 2025 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tUmiz-0003yZ-NK for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 06 Jan 2025 14:00:53 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tUmiK-00045R-Sx; Mon, 06 Jan 2025 08:00:12 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tUmiH-00043e-Qa for bug-gnu-emacs@gnu.org; Mon, 06 Jan 2025 08:00:09 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tUmiA-0001j8-W3 for bug-gnu-emacs@gnu.org; Mon, 06 Jan 2025 08:00:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=References:In-Reply-To:From:Date:To:Subject; bh=aOjweXmfgueQjISw82b0XHL5Wa4qyoKsxsaSHjNXdP4=; b=d0qub9oIDr3Eiif4UkFLSsGMXrzgtyOlz99v6cVjGNmOpPBTWZfTGMC4IqC2M8RrWb83EwoZX5ukx5VaHPp0TqpJ/09DNOj86Pk8la0VGW9GWnhHyyOry9CKvsskAluyxy6du9OJnVfSsxeOf5upgG96lOZMHp+eGFptB6OvMs51GifZJTcbSXOYaDy1JGUXO9yjzbzGAmlr6DsbLVyg5GTHW5xDprizgC7SRzK2/puqWbdoH2caj+QVHcjft2lYSS3COFfK3rj40oppVwUynJkXtyRPr3U9FvqJpmKyJQ+wxkmBDwbTVfh7Fk64Pmy87R8EyXkquGIpXGC3goCKKA==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1tUmiA-0002Vh-PZ for bug-gnu-emacs@gnu.org; Mon, 06 Jan 2025 08:00:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 06 Jan 2025 13:00:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 75322 X-GNU-PR-Package: emacs Original-Received: via spool by 75322-submit@debbugs.gnu.org id=B75322.17361683629544 (code B ref 75322); Mon, 06 Jan 2025 13:00:02 +0000 Original-Received: (at 75322) by debbugs.gnu.org; 6 Jan 2025 12:59:22 +0000 Original-Received: from localhost ([127.0.0.1]:37174 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tUmhT-0002Tk-17 for submit@debbugs.gnu.org; Mon, 06 Jan 2025 07:59:22 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:53720) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tUmhR-0002TX-0A for 75322@debbugs.gnu.org; Mon, 06 Jan 2025 07:59:17 -0500 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tUmhL-0001e4-85; Mon, 06 Jan 2025 07:59:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=aOjweXmfgueQjISw82b0XHL5Wa4qyoKsxsaSHjNXdP4=; b=ISGSwZWiH8mn wVPJH2hCD3/EnTHIL/kKFSl0s/s/0hTtwBrQMMvNLB8Bgj0XeTAFhX/0tcMuHsthGv2XfdnjIWyOZ BbZq2Y+ShdNP2dhJbarCrWv3J8e3eGvyMd999a/29fQmzjgsE8F8KLkRtDeYsnlUDjKcHqv5CEYVq yEnhtbn1tXr3yHG5w0WPIPyYuZofptj3dY5bdDLqXzm5xz9mru4PCEpL8gWLW+IzL4xZMGOK8ICFd axOlrSXm5djchsZNvCJoCxo2mTw7VTNLZpzWH+DpylW5nVKBOq5aBkpzONBZYZp4xlB7XYfbqfKrL 5omF4AMArcwY+jzfDfdADQ==; In-Reply-To: <4B76EB57-AA29-40BC-8361-0906E00A3578@dancol.org> (message from Daniel Colascione on Sun, 05 Jan 2025 16:15:47 -0500) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:298650 Archived-At: > Date: Sun, 05 Jan 2025 16:15:47 -0500 > From: Daniel Colascione > CC: pipcet@protonmail.com, 75322@debbugs.gnu.org > > > . an automatic variable > > . a static variable that is protected by someone > > . a global variable that is protected by someone > > . a result of dereferencing a pointer that is somehow protected > > > >etc. etc., where "protected by someone" means that it is a descendant > >of some staticpro, or of some root, or... > > Well, yeah. Every other GC program does this. Emacs can too. There's no great burden: all Lisp objects get traced automatically. Everything on the stack or in a register gets traced automatically, and, because the scanning is conservative, pinned. You only have to take extra steps to tell the GC about something when you're going out of your way to go around the GC. > > It's simply not true that to adopt a modern GC every line of code has to change. I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite. The opinions in this thread are that changes in the code _are_ needed. Maybe that's not a "rewrite" you had in mind, but if we need to make such substantial changes in many places, that's a "rewrite" in my book. > >And if we cannot prove to ourselves that one of the above happens, > >then we'd need to force a copy of the variable to be on the stack? > > > >Does this sound practical? > > > >If this is the price of using MPS, and I'm not missing something > >obvious, then it sounds like we should run away from MPS, fast. > >Because we will sooner or later have to rewrite every single line of > >code we ever wrote. > > No, you do it by adopting a rule that when a function receives a pointer, the caller guarantees the validity of the pointer for the duration of the call. This way, only the level of the stack that spills the array to the heap has to take on the responsibility of keeping the referenced objects alive, and making the spilled array a pointer to the pinned guts of a Lisp vector is an adequate way to do this. We are talking about code such as this one: SAFE_NALLOCA (args2, 1, nargs + 1); args2[0] = Qcall_process; for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; coding_systems = Ffind_operation_coding_system (nargs + 1, args2); val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; "Look, ma: no pointers!" The args[] array is fine: it comes from the caller and is valid. The problem being discussed here is the args2[] array, in the case where SAFE_NALLOCA decides args2[] is too large for the stack and instead malloc's it. In that case, args2[] stores on the heap copies of the objects in args[] (still no pointers!), and the issue is that when GC happens (say, at some point inside Ffind_operation_coding_system), the objects in args[] are updated by GC, but their copies in args2[] are not. So this code needs to be changed. And if you look around, we have quite a lot of these in many places. > "Oh, but won't that kill performance?" That wasn't my primary bother. The primary bother is the need to modify many places. Which is not necessarily a purely mechanical refactoring, either. Consider: static Lisp_Object last_coding; Lisp_Object coding = Vcoding_system_for_write; [...] last_coding = coding; call_some_function (); if (NILP (last_coding)) do_something (); If call_some_function can trigger GC, the value of 'coding' after the call is still valid, but the value of 'last_coding' could be garbage (unless 'last_coding' is staticpro'd). We have almost 200 static Lisp_Object variables, probably not all of them staticpro'd (8 of them inside functions, like the above example, so definitely not staticpro'd). So now we need to examine the uses of all of them and either staticpro them or do something else (like move the assignment to 'last_coding' to after call_some_function). And we will probably find other usage patterns which trip the same wire. And the amazing part is not the above, it''s the fact that with all that "unsafe" code people are using the branch for production, and most of them report a stable Emacs. What does that tell us? that most, if not all, of those places are perfectly valid code, and we are haunted by the proverbial shadow of a dwarf? Or maybe it just makes the job harder, because we now need to identify the places where these bad things can _really_ happen, and fix only them? Or something else? This is what I'm trying to establish. The problem is not theoretical, it's entirely practical: where to steer our efforts of stabilizing the branch and preparing it for landing, given the available resources. > The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. If we allocate on the stack almost all the time, we will keep allocating on the stack almost all the time. But anyway, if you are now saying that the code is broken, then a rewrite, and a significant one at that, _is_ needed, right?