From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" Newsgroups: gmane.emacs.bugs Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Sat, 04 Jan 2025 19:32:01 +0000 Message-ID: <87ttaee5qp.fsf@protonmail.com> References: <87jzbbke6u.fsf@protonmail.com> <86sepzf4h3.fsf@gnu.org> <87a5c6j0qn.fsf@protonmail.com> <86jzbad735.fsf@gnu.org> <877c7aha9n.fsf@protonmail.com> <86y0zqbgot.fsf@gnu.org> Reply-To: Pip Cet Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="18117"; mail-complaints-to="usenet@ciao.gmane.io" Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sat Jan 04 20:33:41 2025 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tU9u1-0004ao-GE for geb-bug-gnu-emacs@m.gmane-mx.org; Sat, 04 Jan 2025 20:33:41 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tU9tQ-0000xP-CG; Sat, 04 Jan 2025 14:33:04 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tU9tO-0000wa-Ei for bug-gnu-emacs@gnu.org; Sat, 04 Jan 2025 14:33:02 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tU9tN-0006l9-Vv for bug-gnu-emacs@gnu.org; Sat, 04 Jan 2025 14:33:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=MIME-Version:References:In-Reply-To:From:Date:To:Subject; bh=/PhpZgtpf4yj/ZKz2T0Ke7L61JZ0Sf107iJYIuNG+ZI=; b=lTNs0junoec/dD70RFE+Twu5BTwTV9tERmVuIkfkzcgugDog9cw1gmEFZzXXCzu0sn43AwSgNe9w7MGlEcmgXWtbo+VcO6w9M7/qmTDDqIjDlbT4Y4sDI2t1W6QYKtOJjMPBIybfblL4zVnKXIPHtdAMxrBdW7uNDtUdGxI5hLWdwD0q4ovGIycAhemKXeHURw/XPki0qflpXnp9h5mN//jli0tYGKI1/Q3Szezf09Zo1XMEMFHAyg4P3oyc0mA9lsPI4bijRvP9CraThb27qT7FlbOTu0zQMoSYHFG2gmgpOtTSis4FhxoLwR0CwesrhyPksJ7Z1/QM1PYxzDsZXQ==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1tU9tN-0000iu-Q3 for bug-gnu-emacs@gnu.org; Sat, 04 Jan 2025 14:33:01 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Pip Cet Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 04 Jan 2025 19:33:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 75322 X-GNU-PR-Package: emacs Original-Received: via spool by 75322-submit@debbugs.gnu.org id=B75322.17360191372719 (code B ref 75322); Sat, 04 Jan 2025 19:33:01 +0000 Original-Received: (at 75322) by debbugs.gnu.org; 4 Jan 2025 19:32:17 +0000 Original-Received: from localhost ([127.0.0.1]:57462 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tU9sd-0000hm-W6 for submit@debbugs.gnu.org; Sat, 04 Jan 2025 14:32:16 -0500 Original-Received: from mail-10630.protonmail.ch ([79.135.106.30]:10541) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tU9sb-0000hV-0X for 75322@debbugs.gnu.org; Sat, 04 Jan 2025 14:32:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1736019124; x=1736278324; bh=/PhpZgtpf4yj/ZKz2T0Ke7L61JZ0Sf107iJYIuNG+ZI=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector:List-Unsubscribe:List-Unsubscribe-Post; b=QjHQ3IrSAH73PIhP2BbLSE1AnXXnqugsjvt49/ho2EzQI5W3siDrj9lYPB0jJB10Q WDWwmN+vBkBkqnbQ5/z3VawwJZmc8WLA6TX8hUWhD3EfDNP584FNaQeQfTZ+CEsAXF 8hCPC3GH3N6ajlfFnWjssGbNriPN2oIWrc0Gn7HPysA9U/4YOU65nSqG4X57ZaAEo5 GC163tWZQaKjZ0dal/9ZTq9BTtQKFHbrTHtQY+V3+dWw3n746U0GeD1O0OM9PdTM+J 99SnmC5t35lBvA8TW8/o40Bm/LAwTNaT8yhXGG2rIoRguQ49et+xHuL/DzD0MB4Nss Bp68y/r52B91Q== In-Reply-To: <86y0zqbgot.fsf@gnu.org> Feedback-ID: 112775352:user:proton X-Pm-Message-ID: 19a505660cd17e8db9df073ee0cef67ce15876fe X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:298462 Archived-At: "Eli Zaretskii" writes: >> Date: Sat, 04 Jan 2025 15:26:01 +0000 >> From: Pip Cet >> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org >> >> "Eli Zaretskii" writes: >> >> > AFAICT, expand-file-name is called before we start using the SSDATA of >> > strings in the args[] array. Or what did I miss? >> >> You're right, thanks. I got confused between args and new_argv. >> >> The next thing I'd look at is the final call to ENCODE_FILE, called >> after new_argv is set up; this ends up in encode_coding_object, which >> calls safe_funcall in what seems to me to be an unlikely but possible >> code path. I assume that's unsafe (the safe_ refers to redisplay, not >> GC, IIUC)? > > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one > such call: > > path =3D ENCODE_FILE (path); > > We could simply move this to before the loop that sets up new_argv[]. Fixing the current code for this super-rare case would be good, but my point was that we cannot prove the current code to be correct; it isn't, technically speaking, even though it's battle-worn. > >> While maybe_quit is "safe" because it inhibits GC, I believe it can >> still call the debugger, which might require more memory than is >> available until GC is re-enabled. > > maybe_quit is not the problem, the problem AFAIU is that any encoding > could have pre-write-conversion function written in Lisp. Agreed, not the problem here. >> >> Yes, make_environment_block does say its callers can't run GC, but >> >> call_process doesn't indicate when and how it establishes a no-GC >> >> assumption. >> > >> > What would be needed to indicate that? >> >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but >> there is no obvious best solution here. > > Sorry, I don't understand: why macros? Do we use something like that > anywhere else in Emacs? How is it different from other easserts? emacs_spawn has eassert (input_blocked_p ()); static_assert (NIL_IS_ZERO) was intended to be a similarly proactive macro marking the code making that assumption, but that didn't work out. >> My proposal would be to remove most (ideally, all, and then we're done) >> no-GC assumptions, and put the few ones that must remain into separate >> function bodies for the no-GC / possible-GC cases. Then we can put the >> no-GC function bodies into a separate section and prohibit references >> from no-GC to possible-GC functions, and have the linker check that. > > First, the techniques that rely on separate segments are non-portable, > so not all platforms will support them. More importantly, I'm afraid Absolutely, debug builds on the main platforms can, though. > the amount of code where we currently don't expect GC is much more > than you seem to imagine, so I don't think the above is practical. That's why I want to remove most non-GC assumptions first. I gave up the last time I tried doing this, FWIW, for the reason you describe. > In any case, the starting point is to audit all the places where GC > can happen, and that needs to be done anyway if we want to do it > thoroughly and proactively (as opposed to only when someone reports a > bug). I don't see how putting a few macros in while we're there could hurt :-) > We can delay the decision what to do with these places to > later, when we understand better the magnitude of the problem and its > various aspects (in terms of several different effects that GC can > have on various objects). Agreed, I won't post a patch for now. >> >> As we agreed, code should be written to assume GC can strike at any >> >> time. >> > >> > I don't think we agreed to that. At least I didn't, not in this >> > general form. It would be a huge job to make all of our code comply >> > with this. >> >> I said "That's what you should think: GC can strike at any time", and >> you said "The same is true with the old GC". I misread that as >> agreement. > > I only meant that MPS is not very different from the old GC in this > regard, that's all. I think I understand now, thanks! >> >> IIUC, Gerd explained that the old GC can still move the string *data* >> >> used in that structure, even if the string metadata stays in place. >> > >> > If string data is moved, then accessing the old pointer would trigger >> > the barrier and MPS will do its thing, not? >> >> Sorry, but I think I'm confused here. >> >> IIUC, MPS doesn't currently use barriers on data that is moved (it >> could, so the data is copied lazily, but I don't think that's what you >> meant), it uses barriers on data that contains references that may not >> have been fixed. > > The safe thing is to re-initialize the pointer from the string > whenever we think GC could happen, and otherwise make sure GC cannot > happen. For the old GC, yes. For the new GC, the safe thing would be to ensure MPS has removed all memory barriers, take the arena lock, then call the syscall and return. Or use xstrdup. > >> If a pointer to "old" data is ever exposed to Emacs, we lose, because >> MPS will reuse the memory for new data, which might be behind a barrier. >> >> If we ever do: >> >> static Lisp_Object unmarked; ^^^^^^ >> unmarked =3D string; >> ... trigger GC here ... >> puts (SDATA (unmarked); >> >> the most likely outcome (thanks to Gerd for the hint) is that >> nonsensical data is printed > > Are you sure? With the static keyword, yes. (Assuming we don't staticpro the static variable, of course). > The below is indeed unsafe: > > char *p =3D SDATA (unmarked); > ... trigger GC here ... > puts (p); Right now, that's safe for MPS, but not for the old GC, correct? >> > To clarify, I was trying to understand whether the error message >> > reported by Ihor in another thread could have happened because of GC >> > in this are of the code. >> >> I currently think that Ihor's test case calls execve () with nonsensical >> "environment" strings a lot, and once in a while they'll even be behind >> the barrier which causes an EFAULT. > > Before we agree that this could be the reason, we'd need to find the > places where GC could reuse the memory of a live string, while that > string appears in some live data structure, and as long as no GC can > happen between the time we put the SDATA pointers into the environment > array and the time posix_spawn returns. Calling igc_collect () right before the spawn results in corrupt data: $ gdb --ar ./src/emacs -Q --batch --eval '(while t (with-temp-buffer (shell-command "/usr/bin/env" t) (message "%S" (buffer-string))))' (gdb) b posix_spawn Breakpoint 1 at 0x535e0 (gdb) r Starting program: ... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Breakpoint 1.1, 0x00007ffff5c2b9c0 in posix_spawn () from /lib64/libc.so.6 (gdb) p igc_collect () $1 =3D void (gdb) p env[1] No symbol "env" in current context. (gdb) up #1 0x00005555558a805d in emacs_spawn (newpid=3D0x7ffffffeb148, std_in=3D4,= std_out=3D6, std_err=3D6, argv=3D0x7ffffffeb0f0, envp=3D0x5555560bbae0,=20 cwd=3D0x7fffe3acb0c8 "/home/pip/emacs-forge/emacs", pty_name=3D0x0, pty= _in=3Dfalse, pty_out=3Dfalse, oldset=3D0x7ffffffeb280) at callproc.c:1490 1490=09 vfork_error =3D posix_spawn (&pid, argv[0], &actions, &attribu= tes, (gdb) p envp[1] $2 =3D 0x7fffe3a5ded0 "\bD\034\344\377\177" (gdb) c Continuing. [Detaching after vfork from child process 5771] "PWD=3D/home/pip/emacs-forge/emacs SHLVL=3D0 _=3D/usr/bin/env " The old pointer pointed to an AMCZ pool (no barriers), but GC might have reassigned the memory to an AMC pool (with barriers), which would cause the EFAULT (and explain why it's so rare). Hardcoding the assumption that string data is never behind a memory barrier is a bad idea. We might want to change it back to an AMC pool (I've done so in local debug builds so I could go back from string data to the string metadata). So the only mystery left is what causes GC to be called on current scratch/igc after the environment block has been created; we know we want to change the code so GC can be triggered from another thread, but it's failing now, and there are no other threads yet! My current theory is that igc_on_grow_specpdl (called indirectly from here: /* Do the unwind-protect now, even though the pid is not known, so that no storage allocation is done in the critical section. The actual PID will be filled in during the critical section. */ record_unwind_protect (call_process_cleanup, Fcurrent_buffer ()); ) releases the arena, and MPS uses that apparent opportunity to call ArenaPoll which might do GC things. But, again, we want to make this code safe for GC at any time. (We might be able to delay mps_arena_release until the next maybe_quit, I guess) Pip