From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Daniel Colascione Newsgroups: gmane.emacs.bugs Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Mon, 06 Jan 2025 09:48:09 -0500 Message-ID: <87wmf8t2vq.fsf@dancol.org> References: <87jzbbke6u.fsf@protonmail.com> <87msg7iq0o.fsf@protonmail.com> <86ed1jf1tp.fsf@gnu.org> <865xmugawr.fsf@gnu.org> <8634hx8k1u.fsf@gnu.org> <86msg56to8.fsf@gnu.org> <86h66d6pw1.fsf@gnu.org> <4B76EB57-AA29-40BC-8361-0906E00A3578@dancol.org> <867c786quc.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="12109"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: mu4e 1.12.8; emacs 31.0.50 Cc: gerd.moellmann@gmail.com, pipcet@protonmail.com, 75322@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Jan 06 15:49:28 2025 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tUoQ3-000319-Hw for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 06 Jan 2025 15:49:28 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tUoPl-0005K3-8Z; Mon, 06 Jan 2025 09:49:09 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tUoPf-0005JC-5B for bug-gnu-emacs@gnu.org; Mon, 06 Jan 2025 09:49:04 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tUoPe-0007RH-SZ for bug-gnu-emacs@gnu.org; Mon, 06 Jan 2025 09:49:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=MIME-Version:Date:References:In-Reply-To:From:To:Subject; bh=RJhmg6Hyu9C3OwpRN4wq+SeVTTU+t9CeQb6mOEamAfg=; b=X9l0mjPfDLdQExczoJ0DzqQTgfpMIIcpE+PwYLRsssc8mRFf8UtYfswL+vWthm6+spfZpNmOjCnavOC5K6Int/X5pPaLDh/gWbApE4upZ9Cveed7fSuf1DantNAFDeNbWbRSqFDaAKlFsHjOtVycASlHmTii3Zwu1Ts6KBrilJ/tFiMOXxXZzCsPQugFSFLCYIed/0WkNM8KA/MwS7Hv0hXTr7Qd4/rGWo1xS1tGUUz/RnbA+qj8pgiekVo4bny1EEeebH7OZq3jMX/w6vdg6zpKkAhiTPm7JepNjueWqBLX02QujMcgER94JJvXT3hWRy8aMLAXMFT7wdq2xEkr4g==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1tUoPe-0008LU-Hl for bug-gnu-emacs@gnu.org; Mon, 06 Jan 2025 09:49:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Daniel Colascione Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 06 Jan 2025 14:49:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 75322 X-GNU-PR-Package: emacs Original-Received: via spool by 75322-submit@debbugs.gnu.org id=B75322.173617490332013 (code B ref 75322); Mon, 06 Jan 2025 14:49:02 +0000 Original-Received: (at 75322) by debbugs.gnu.org; 6 Jan 2025 14:48:23 +0000 Original-Received: from localhost ([127.0.0.1]:37502 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tUoP0-0008KH-QI for submit@debbugs.gnu.org; Mon, 06 Jan 2025 09:48:23 -0500 Original-Received: from dancol.org ([2600:3c01:e000:3d8::1]:57250) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tUoOy-0008K6-9u for 75322@debbugs.gnu.org; Mon, 06 Jan 2025 09:48:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=dancol.org; s=x; h=Content-Type:MIME-Version:Message-ID:Date:References:In-Reply-To: Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=RJhmg6Hyu9C3OwpRN4wq+SeVTTU+t9CeQb6mOEamAfg=; b=DcoIq6ZFTtC9pzIJtv3fmHpmkS fz4BQjIhcJP0bk0V75WjeFo/+P9rGrjk/dm7xyjAHMs7YSw7uJ1/jPZQMh+b1wLLzOFGScryAp8nP dOjlNjIkGzwFzleWkvpbbHTDUVw8HkiHPX/nRpOV3ctd96EiNlYTq+Og7SODW+NYxRwjZ+m0C+4UB OKOqRSkL907hd7sOTbfMA5KNXfDxqXvghtj3NZ/Roa/6VYEJQZjeF/9jLU/9GbksBMNRRydyt4uMv YDX6QbEqi+3/v9H8SrOBaW/cbq95JJOrqF8bKe83D2I1XcnpubVDZIvuayGppNUxqmoSKitW0fZGx KxfUpFVQ==; Original-Received: from 2603-9001-4203-1ab2-1e69-c57e-fb78-2899.inf6.spectrum.com ([2603:9001:4203:1ab2:1e69:c57e:fb78:2899]:32928 helo=localhost) by dancol.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tUoOp-0002X2-1p; Mon, 06 Jan 2025 09:48:11 -0500 In-Reply-To: <867c786quc.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 06 Jan 2025 14:59:07 +0200") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:298670 Archived-At: Eli Zaretskii writes: >> Date: Sun, 05 Jan 2025 16:15:47 -0500 >> From: Daniel Colascione >> CC: pipcet@protonmail.com, 75322@debbugs.gnu.org >> >> > . an automatic variable >> > . a static variable that is protected by someone >> > . a global variable that is protected by someone >> > . a result of dereferencing a pointer that is somehow protected >> > >> >etc. etc., where "protected by someone" means that it is a descendant >> >of some staticpro, or of some root, or... >> >> Well, yeah. Every other GC program does this. Emacs can too. There's >> no great burden: all Lisp objects get traced >> automatically. Everything on the stack or in a register gets traced >> automatically, and, because the scanning is conservative, >> pinned. You only have to take extra steps to tell the GC about >> something when you're going out of your way to go around the GC. >> >> It's simply not true that to adopt a modern GC every line of code has to change. I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite. > > The opinions in this thread are that changes in the code _are_ needed. > Maybe that's not a "rewrite" you had in mind, but if we need to make > such substantial changes in many places, that's a "rewrite" in my > book. I wouldn't call it a "rewrite". If auditing the codebase for memory safety is a "rewrite", I'm a "duck". We're talking about a few hundred lines of changes at the most. Most of the work is just auditing the code for problems. We should be grateful Gerd has done this work already, not "run away from MPS, fast". >> >And if we cannot prove to ourselves that one of the above happens, >> >then we'd need to force a copy of the variable to be on the stack? >> > >> >Does this sound practical? >> > >> >If this is the price of using MPS, and I'm not missing something >> >obvious, then it sounds like we should run away from MPS, fast. >> >Because we will sooner or later have to rewrite every single line of >> >code we ever wrote. >> >> No, you do it by adopting a rule that when a function receives a >> pointer, the caller guarantees the validity of the pointer for the >> duration of the call. This way, only the level of the stack that >> spills the array to the heap has to take on the responsibility of >> keeping the referenced objects alive, and making the spilled array a >> pointer to the pinned guts of a Lisp vector is an adequate way to >> do this. > > We are talking about code such as this one: > > SAFE_NALLOCA (args2, 1, nargs + 1); > args2[0] = Qcall_process; > for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; > coding_systems = Ffind_operation_coding_system (nargs + 1, args2); > val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; > > "Look, ma: no pointers!" Lisp_Object val, *args2; In the C programming language, "*" means "pointer". > The args[] array is fine: it comes from the caller and is valid. The > problem being discussed here is the args2[] array, in the case where > SAFE_NALLOCA decides args2[] is too large for the stack and instead > malloc's it. In that case, args2[] stores on the heap copies of the > objects in args[] (still no pointers!), and the issue is that when GC > happens (say, at some point inside Ffind_operation_coding_system), the > objects in args[] are updated by GC, but their copies in args2[] are > not. Each Lisp_Object held across a GC must either: 1) be visible to the GC for precise marking and updating (which is the case for all references from Lisp object to another as well as staticpro-protected static storage), or 2) be visible to the GC for conservative marking and pinning instead of moving, which today means the variable is on the stack. A Lisp_Object embedded in a block of malloc-heap memory doesn't satisfy either condition and so is unsafe. Also, this pattern is perfectly fine: void foo() { Lisp_Object c = Fcons(...); bar(&c); } void bar(Lisp_Object* c) { eassert(CONSP(c)); Fgarbage_collect(); eassert(CONSP(c)); } because when bar() gets its pointer, it implicitly assumes that it's *caller* has followed the rule above, which it has. The Emacs code already works this way. The changes we're talking about are not anywhere near as large-scale as what you might have in mind. > So this code needs to be changed. The snippet you quoted above can be fixed with a one-liner --- replace SAFE_NALLOCA with SAFE_ALLOCA_LISP. (We have to fix SAFE_ALLOCA_LISP_EXTRA to make it safe with the old GC, but that's not a change that needs to happen at call sites.) > And if you look around, we have quite a lot of these in many places. Sounds like Gerd's spent some time hunting them down. >> "Oh, but won't that kill performance?" > > That wasn't my primary bother. The primary bother is the need to > modify many places. Which is not necessarily a purely mechanical > refactoring, either. Consider: > > static Lisp_Object last_coding; > Lisp_Object coding = Vcoding_system_for_write; > [...] > last_coding = coding; > call_some_function (); > if (NILP (last_coding)) > do_something (); > > If call_some_function can trigger GC, the value of 'coding' after the > call is still valid, but the value of 'last_coding' could be garbage > (unless 'last_coding' is staticpro'd). Correct. The fix is to move last_coding to file scope and gcpro it. You can argue that moving this declaration around is not a "mechanical" fix, but it's straightforward, safe, and makes the code clearer: after the change, the static storage is at file scope where you can see it, not buried in a function body. > We have almost 200 static > Lisp_Object variables, probably not all of them staticpro'd (8 of them > inside functions, like the above example, so definitely not > staticpro'd). So now we need to examine the uses of all of them and > either staticpro them or do something else (like move the assignment > to 'last_coding' to after call_some_function). Changing eight variables from function statics to file statics hardly seems like a monumental effort. The static-storage global-scope Lisp_Object variables are probably almost all gcproed already. Gerd has already gone a long way towards identifying the various problem spots. In particular, his change to move uses of SAFE_ALLOCA to the _LISP variant in various places is good and should be applied regardless of the readiness of the rest of the MPS branch. Even in cases where for one reason or another the code he's changing is safe under the old GC, the old code is fragile and can break under modification. > And we will probably find other usage patterns which trip the same Yep. > And the amazing part is not the above, it''s the fact that with all > that "unsafe" code people are using the branch for production, and > most of them report a stable Emacs. What does that tell us? that > most, if not all, of those places are perfectly valid code, and we are > haunted by the proverbial shadow of a dwarf? Or maybe it just makes > the job harder, because we now need to identify the places where these > bad things can _really_ happen, and fix only them? Or something else? > This is what I'm trying to establish. The problem is not theoretical, > it's entirely practical: where to steer our efforts of stabilizing the > branch and preparing it for landing, given the available resources. There are three reasons we might not see code with dubious adherence to GC rules cause problems in practice: 1) the dubious code is unsafe only in rare edge cases (who has thousands of call-process parameters?) or the problems are GC-timing dependent, 2) the dubious code is safe only because all the Lisp values in question are protected by other references and the old GC is non-moving, and 3) the dubious code is safe only because the only Lisp values stored in these malloc-heap regions talk about objects that are naturally immortal, e.g. Qnil, and the GC doesn't move immortal values. A case of #1 is a memory safety problem. #2 and #3 merely mean we have fragile code. Sometimes it's hard to tell the difference, which is why we should fix all three. A moving GC like MPS imposes more stringent requirements on a program than a non-moving one does, but the changes needed to make Emacs safe under a moving GC are not particularly invasive and make the code safer and more maintainable overall. Most of the changes should be one- or few-line tweaks. We're talking about nothing remotely close to a general rewrite. >> The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. > > If we allocate on the stack almost all the time, we will keep > allocating on the stack almost all the time. But anyway, if you are > now saying that the code is broken, then a rewrite, and a significant > one at that, _is_ needed, right? "Almost" isn't good enough when it comes to memory safety problems and the security issues they often cause.