From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Sat, 04 Jan 2025 22:31:48 +0200 Message-ID: <86a5c6b9sb.fsf@gnu.org> References: <87jzbbke6u.fsf@protonmail.com> <86sepzf4h3.fsf@gnu.org> <87a5c6j0qn.fsf@protonmail.com> <86jzbad735.fsf@gnu.org> <877c7aha9n.fsf@protonmail.com> <86y0zqbgot.fsf@gnu.org> <87ttaee5qp.fsf@protonmail.com> Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="21699"; mail-complaints-to="usenet@ciao.gmane.io" Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org To: Pip Cet Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sat Jan 04 21:33:18 2025 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tUAph-0005TS-ML for geb-bug-gnu-emacs@m.gmane-mx.org; Sat, 04 Jan 2025 21:33:18 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tUApV-0003rm-Gi; Sat, 04 Jan 2025 15:33:05 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tUApT-0003rX-KG for bug-gnu-emacs@gnu.org; Sat, 04 Jan 2025 15:33:03 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tUApT-0004aN-4W for bug-gnu-emacs@gnu.org; Sat, 04 Jan 2025 15:33:03 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=References:In-Reply-To:From:Date:To:Subject; bh=DuiRCYQsARmL021cp51N9cc8YuKtHbOj2tCU1KOxqWY=; b=rDtimlJ82X0zYa9aGeZuzaAZu3E1etNLJWmUSmBpGPp9yN3rBKqp8Lf/HwfZCiRxcNNjZBiybSetyaJihlyVFb4S7pkZ2VdEYXHYDBK8+pbaGlO7A14vkpk8OYO+PsgITT9lydIa6q6j6qvm6fdco7J8AIPxh2rJmkUjN22FT2+itwSipX0Ju9+i9HAqwa24kg5iMdYEVhAGWqkBMyXEylx2ygAVckF5s9GBkaBWesEN4asPyZrVpDkDDiXZzy6NnEnYYf2bhO1fcxAA2rWF/u3ObVgTkH4ZsISFynWfwPaE3HEnizLXLEAV0dikPcCNBE+QQ8UxojeXNnN5H+HdNw==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1tUApS-0003T3-JK for bug-gnu-emacs@gnu.org; Sat, 04 Jan 2025 15:33:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 04 Jan 2025 20:33:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 75322 X-GNU-PR-Package: emacs Original-Received: via spool by 75322-submit@debbugs.gnu.org id=B75322.173602274413247 (code B ref 75322); Sat, 04 Jan 2025 20:33:02 +0000 Original-Received: (at 75322) by debbugs.gnu.org; 4 Jan 2025 20:32:24 +0000 Original-Received: from localhost ([127.0.0.1]:57574 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tUAop-0003Ra-RZ for submit@debbugs.gnu.org; Sat, 04 Jan 2025 15:32:24 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:34404) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tUAon-0003RM-Ga for 75322@debbugs.gnu.org; Sat, 04 Jan 2025 15:32:22 -0500 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tUAoi-0004X8-1O; Sat, 04 Jan 2025 15:32:16 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=DuiRCYQsARmL021cp51N9cc8YuKtHbOj2tCU1KOxqWY=; b=ZPTTFSM3pIfO 1IYkc+20uqJZ0FhiRJCwox7f+NIpZwi34Mmz54QJGzoBWUktCL09QlsByu2vXd6KZH46GjF51oAVJ ydPeknKnGwqndLgMyA0lPn9alrkNd/5UuzG5MCWWcvNAet1DiKnj9alHDn+ggtLifrltn2X9WwFAA 6WQfWZSnz4pcd2WOUllor4R9FSVbCPwAsc1aKj+OdYxFD1aUFlOPVoV6hIBtKT/KPKoDdQHfwv3DX Cm4WAxo3niMJF7PvEPdx3hcXkPgvE81tdDnaywUm5YvJsULY9YeK/62LFEkAKi8QxNipf014JIx1A l7zfV9hOKbi0DETr1LSetQ==; In-Reply-To: <87ttaee5qp.fsf@protonmail.com> (message from Pip Cet on Sat, 04 Jan 2025 19:32:01 +0000) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:298469 Archived-At: > Date: Sat, 04 Jan 2025 19:32:01 +0000 > From: Pip Cet > Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org > > > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one > > such call: > > > > path = ENCODE_FILE (path); > > > > We could simply move this to before the loop that sets up new_argv[]. > > Fixing the current code for this super-rare case would be good, but my > point was that we cannot prove the current code to be correct; it isn't, > technically speaking, even though it's battle-worn. I want to be pragmatic and solve practical problems in my life time. Proving that the code is correct I leave to the next generation. > >> >> Yes, make_environment_block does say its callers can't run GC, but > >> >> call_process doesn't indicate when and how it establishes a no-GC > >> >> assumption. > >> > > >> > What would be needed to indicate that? > >> > >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but > >> there is no obvious best solution here. > > > > Sorry, I don't understand: why macros? Do we use something like that > > anywhere else in Emacs? > > How is it different from other easserts? emacs_spawn has > > eassert (input_blocked_p ()); How do you eassert something that should not happen during some code fragment? If you have something specific in mind, do show the code. > >> My proposal would be to remove most (ideally, all, and then we're done) > >> no-GC assumptions, and put the few ones that must remain into separate > >> function bodies for the no-GC / possible-GC cases. Then we can put the > >> no-GC function bodies into a separate section and prohibit references > >> from no-GC to possible-GC functions, and have the linker check that. > > > > First, the techniques that rely on separate segments are non-portable, > > so not all platforms will support them. More importantly, I'm afraid > > Absolutely, debug builds on the main platforms can, though. But then important code that is #ifdef SOME-PLATFORM cannot be handled like that, and we are back at square one, because users of "non-main platforms" still report bugs and we try to investigate and fix them, so we still need to be able to solve this when separate segments are not available. > > the amount of code where we currently don't expect GC is much more > > than you seem to imagine, so I don't think the above is practical. > > That's why I want to remove most non-GC assumptions first. I gave up > the last time I tried doing this, FWIW, for the reason you describe. > > > In any case, the starting point is to audit all the places where GC > > can happen, and that needs to be done anyway if we want to do it > > thoroughly and proactively (as opposed to only when someone reports a > > bug). > > I don't see how putting a few macros in while we're there could hurt :-) We neither saw any macros yet nor have any idea how many of them will be needed. So I don't know why you think it's just a few macros. We are still discussing a tiny static function and haven't arrived at an agreed and safe solution. Let's be a bit more practical here, okay? > > The safe thing is to re-initialize the pointer from the string > > whenever we think GC could happen, and otherwise make sure GC cannot > > happen. > > For the old GC, yes. For the new GC, the safe thing would be to ensure > MPS has removed all memory barriers, take the arena lock, then call the > syscall and return. Or use xstrdup. If this is indeed so (and I don't think it is), then we need to discuss it very thoroughly, because it basically means we cannot do anything with Lisp strings in C. For example, the display iterator has a member that is a Lisp string, which is used when we produce glyphs for display from a string (such as a display property or an overlay before/after string) -- what you say here basically means that we cannot carry that string around, right? If not, how is that different? > >> If a pointer to "old" data is ever exposed to Emacs, we lose, because > >> MPS will reuse the memory for new data, which might be behind a barrier. > >> > >> If we ever do: > >> > >> static Lisp_Object unmarked; > ^^^^^^ > >> unmarked = string; > >> ... trigger GC here ... > >> puts (SDATA (unmarked); > >> > >> the most likely outcome (thanks to Gerd for the hint) is that > >> nonsensical data is printed > > > > Are you sure? > > With the static keyword, yes. (Assuming we don't staticpro the static > variable, of course). What does static have to do with this? What matters is the value, not the storage. The value comes from 'string', a different variable. It points to string data, and it's that string data that is of interest to us in the above snippet. > > The below is indeed unsafe: > > > > char *p = SDATA (unmarked); > > ... trigger GC here ... > > puts (p); > > Right now, that's safe for MPS, but not for the old GC, correct? If GC moves string data, then it is not safe, period. Does MPS move string data? > >> > To clarify, I was trying to understand whether the error message > >> > reported by Ihor in another thread could have happened because of GC > >> > in this are of the code. > >> > >> I currently think that Ihor's test case calls execve () with nonsensical > >> "environment" strings a lot, and once in a while they'll even be behind > >> the barrier which causes an EFAULT. > > > > Before we agree that this could be the reason, we'd need to find the > > places where GC could reuse the memory of a live string, while that > > string appears in some live data structure, and as long as no GC can > > happen between the time we put the SDATA pointers into the environment > > array and the time posix_spawn returns. > > Calling igc_collect () right before the spawn results in corrupt data: But the code doesn't call igc_collect, so how is this relevant? Please see how I described the situation in the paragraph above. > So the only mystery left is what causes GC to be called on current > scratch/igc after the environment block has been created; That's the _only_ mystery, and only if that in fact happens. Someone should explain how GC _can_ happen in that case on the igc branch. If you can, please do, and let's stop arguing about theoretical possibilities. > My current theory is that igc_on_grow_specpdl (called indirectly from > here: > > /* Do the unwind-protect now, even though the pid is not known, so > that no storage allocation is done in the critical section. > The actual PID will be filled in during the critical section. */ > record_unwind_protect (call_process_cleanup, Fcurrent_buffer ()); > > ) releases the arena, and MPS uses that apparent opportunity to call > ArenaPoll which might do GC things. Evidence, please.