From: Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
Date: Sat, 04 Jan 2025 19:32:01 +0000 [thread overview]
Message-ID: <87ttaee5qp.fsf@protonmail.com> (raw)
In-Reply-To: <86y0zqbgot.fsf@gnu.org>
"Eli Zaretskii" <eliz@gnu.org> writes:
>> Date: Sat, 04 Jan 2025 15:26:01 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> > AFAICT, expand-file-name is called before we start using the SSDATA of
>> > strings in the args[] array. Or what did I miss?
>>
>> You're right, thanks. I got confused between args and new_argv.
>>
>> The next thing I'd look at is the final call to ENCODE_FILE, called
>> after new_argv is set up; this ends up in encode_coding_object, which
>> calls safe_funcall in what seems to me to be an unlikely but possible
>> code path. I assume that's unsafe (the safe_ refers to redisplay, not
>> GC, IIUC)?
>
> ENCODE_FILE can indeed trigger GC in rare cases, but I see only one
> such call:
>
> path = ENCODE_FILE (path);
>
> We could simply move this to before the loop that sets up new_argv[].
Fixing the current code for this super-rare case would be good, but my
point was that we cannot prove the current code to be correct; it isn't,
technically speaking, even though it's battle-worn.
>
>> While maybe_quit is "safe" because it inhibits GC, I believe it can
>> still call the debugger, which might require more memory than is
>> available until GC is re-enabled.
>
> maybe_quit is not the problem, the problem AFAIU is that any encoding
> could have pre-write-conversion function written in Lisp.
Agreed, not the problem here.
>> >> Yes, make_environment_block does say its callers can't run GC, but
>> >> call_process doesn't indicate when and how it establishes a no-GC
>> >> assumption.
>> >
>> > What would be needed to indicate that?
>>
>> I'd prefer a pair of macros (no-ops in regular builds) to comments, but
>> there is no obvious best solution here.
>
> Sorry, I don't understand: why macros? Do we use something like that
> anywhere else in Emacs?
How is it different from other easserts? emacs_spawn has
eassert (input_blocked_p ());
static_assert (NIL_IS_ZERO) was intended to be a similarly proactive
macro marking the code making that assumption, but that didn't work out.
>> My proposal would be to remove most (ideally, all, and then we're done)
>> no-GC assumptions, and put the few ones that must remain into separate
>> function bodies for the no-GC / possible-GC cases. Then we can put the
>> no-GC function bodies into a separate section and prohibit references
>> from no-GC to possible-GC functions, and have the linker check that.
>
> First, the techniques that rely on separate segments are non-portable,
> so not all platforms will support them. More importantly, I'm afraid
Absolutely, debug builds on the main platforms can, though.
> the amount of code where we currently don't expect GC is much more
> than you seem to imagine, so I don't think the above is practical.
That's why I want to remove most non-GC assumptions first. I gave up
the last time I tried doing this, FWIW, for the reason you describe.
> In any case, the starting point is to audit all the places where GC
> can happen, and that needs to be done anyway if we want to do it
> thoroughly and proactively (as opposed to only when someone reports a
> bug).
I don't see how putting a few macros in while we're there could hurt :-)
> We can delay the decision what to do with these places to
> later, when we understand better the magnitude of the problem and its
> various aspects (in terms of several different effects that GC can
> have on various objects).
Agreed, I won't post a patch for now.
>> >> As we agreed, code should be written to assume GC can strike at any
>> >> time.
>> >
>> > I don't think we agreed to that. At least I didn't, not in this
>> > general form. It would be a huge job to make all of our code comply
>> > with this.
>>
>> I said "That's what you should think: GC can strike at any time", and
>> you said "The same is true with the old GC". I misread that as
>> agreement.
>
> I only meant that MPS is not very different from the old GC in this
> regard, that's all.
I think I understand now, thanks!
>> >> IIUC, Gerd explained that the old GC can still move the string *data*
>> >> used in that structure, even if the string metadata stays in place.
>> >
>> > If string data is moved, then accessing the old pointer would trigger
>> > the barrier and MPS will do its thing, not?
>>
>> Sorry, but I think I'm confused here.
>>
>> IIUC, MPS doesn't currently use barriers on data that is moved (it
>> could, so the data is copied lazily, but I don't think that's what you
>> meant), it uses barriers on data that contains references that may not
>> have been fixed.
>
> The safe thing is to re-initialize the pointer from the string
> whenever we think GC could happen, and otherwise make sure GC cannot
> happen.
For the old GC, yes. For the new GC, the safe thing would be to ensure
MPS has removed all memory barriers, take the arena lock, then call the
syscall and return. Or use xstrdup.
>
>> If a pointer to "old" data is ever exposed to Emacs, we lose, because
>> MPS will reuse the memory for new data, which might be behind a barrier.
>>
>> If we ever do:
>>
>> static Lisp_Object unmarked;
^^^^^^
>> unmarked = string;
>> ... trigger GC here ...
>> puts (SDATA (unmarked);
>>
>> the most likely outcome (thanks to Gerd for the hint) is that
>> nonsensical data is printed
>
> Are you sure?
With the static keyword, yes. (Assuming we don't staticpro the static
variable, of course).
> The below is indeed unsafe:
>
> char *p = SDATA (unmarked);
> ... trigger GC here ...
> puts (p);
Right now, that's safe for MPS, but not for the old GC, correct?
>> > To clarify, I was trying to understand whether the error message
>> > reported by Ihor in another thread could have happened because of GC
>> > in this are of the code.
>>
>> I currently think that Ihor's test case calls execve () with nonsensical
>> "environment" strings a lot, and once in a while they'll even be behind
>> the barrier which causes an EFAULT.
>
> Before we agree that this could be the reason, we'd need to find the
> places where GC could reuse the memory of a live string, while that
> string appears in some live data structure, and as long as no GC can
> happen between the time we put the SDATA pointers into the environment
> array and the time posix_spawn returns.
Calling igc_collect () right before the spawn results in corrupt data:
$ gdb --ar ./src/emacs -Q --batch --eval '(while t (with-temp-buffer
(shell-command "/usr/bin/env" t) (message "%S" (buffer-string))))'
(gdb) b posix_spawn
Breakpoint 1 at 0x535e0
(gdb) r
Starting program: ...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1.1, 0x00007ffff5c2b9c0 in posix_spawn () from /lib64/libc.so.6
(gdb) p igc_collect ()
$1 = void
(gdb) p env[1]
No symbol "env" in current context.
(gdb) up
#1 0x00005555558a805d in emacs_spawn (newpid=0x7ffffffeb148, std_in=4, std_out=6, std_err=6, argv=0x7ffffffeb0f0, envp=0x5555560bbae0,
cwd=0x7fffe3acb0c8 "/home/pip/emacs-forge/emacs", pty_name=0x0, pty_in=false, pty_out=false, oldset=0x7ffffffeb280) at callproc.c:1490
1490 vfork_error = posix_spawn (&pid, argv[0], &actions, &attributes,
(gdb) p envp[1]
$2 = 0x7fffe3a5ded0 "\bD\034\344\377\177"
(gdb) c
Continuing.
[Detaching after vfork from child process 5771]
"PWD=/home/pip/emacs-forge/emacs
SHLVL=0
_=/usr/bin/env
"
The old pointer pointed to an AMCZ pool (no barriers), but GC might have
reassigned the memory to an AMC pool (with barriers), which would cause
the EFAULT (and explain why it's so rare).
Hardcoding the assumption that string data is never behind a memory
barrier is a bad idea. We might want to change it back to an AMC pool
(I've done so in local debug builds so I could go back from string data
to the string metadata).
So the only mystery left is what causes GC to be called on current
scratch/igc after the environment block has been created; we know we
want to change the code so GC can be triggered from another thread, but
it's failing now, and there are no other threads yet!
My current theory is that igc_on_grow_specpdl (called indirectly from
here:
/* Do the unwind-protect now, even though the pid is not known, so
that no storage allocation is done in the critical section.
The actual PID will be filled in during the critical section. */
record_unwind_protect (call_process_cleanup, Fcurrent_buffer ());
) releases the arena, and MPS uses that apparent opportunity to call
ArenaPoll which might do GC things.
But, again, we want to make this code safe for GC at any time.
(We might be able to delay mps_arena_release until the next maybe_quit,
I guess)
Pip
next prev parent reply other threads:[~2025-01-04 19:32 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-03 19:55 ` Gerd Möllmann
2025-01-03 20:34 ` Gerd Möllmann
2025-01-03 20:48 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 4:40 ` Gerd Möllmann
2025-01-04 7:57 ` Eli Zaretskii
2025-01-04 8:47 ` Gerd Möllmann
2025-01-04 9:56 ` Eli Zaretskii
2025-01-04 10:20 ` Gerd Möllmann
2025-01-05 13:30 ` Eli Zaretskii
2025-01-05 14:11 ` Gerd Möllmann
2025-01-05 17:45 ` Eli Zaretskii
2025-01-05 18:17 ` Gerd Möllmann
2025-01-05 19:07 ` Eli Zaretskii
2025-01-05 20:04 ` Gerd Möllmann
2025-01-05 20:24 ` Eli Zaretskii
2025-01-06 3:57 ` Gerd Möllmann
2025-01-06 8:25 ` Gerd Möllmann
2025-01-06 14:07 ` Eli Zaretskii
2025-01-05 21:15 ` Daniel Colascione
2025-01-06 12:59 ` Eli Zaretskii
2025-01-06 14:48 ` Daniel Colascione
2025-01-06 15:12 ` Eli Zaretskii
2025-01-06 15:27 ` Daniel Colascione
2025-01-05 21:01 ` Daniel Colascione
2025-01-05 23:28 ` Daniel Colascione
2025-01-06 13:26 ` Eli Zaretskii
2025-01-06 15:08 ` Daniel Colascione
2025-01-06 4:23 ` Gerd Möllmann
2025-01-04 11:41 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 11:29 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 12:17 ` Gerd Möllmann
2025-01-04 7:00 ` Eli Zaretskii
2025-01-04 7:17 ` Gerd Möllmann
2025-01-04 8:23 ` Eli Zaretskii
2025-01-04 8:58 ` Gerd Möllmann
2025-01-04 11:08 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 13:47 ` Eli Zaretskii
2025-01-04 14:13 ` Gerd Möllmann
2025-01-04 15:26 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 15:34 ` Gerd Möllmann
2025-01-04 18:19 ` Eli Zaretskii
2025-01-04 18:35 ` Gerd Möllmann
2025-01-04 19:10 ` Eli Zaretskii
2025-01-04 19:24 ` Gerd Möllmann
2025-01-04 18:02 ` Eli Zaretskii
2025-01-04 19:32 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2025-01-04 20:31 ` Eli Zaretskii
2025-01-04 21:15 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 8:23 ` Eli Zaretskii
2025-01-05 9:04 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 9:32 ` Eli Zaretskii
2025-01-05 9:47 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:04 ` Eli Zaretskii
2025-01-06 15:54 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 19:16 ` Gerd Möllmann
2025-01-08 3:46 ` Gerd Möllmann
2025-01-05 6:32 ` Gerd Möllmann
2025-01-05 6:59 ` Gerd Möllmann
2025-01-05 10:21 ` Eli Zaretskii
2025-01-05 10:30 ` Gerd Möllmann
2025-01-05 10:35 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 10:45 ` Gerd Möllmann
2025-01-05 11:29 ` Eli Zaretskii
2025-01-05 11:37 ` Gerd Möllmann
2025-01-05 12:15 ` Eli Zaretskii
2025-01-05 13:21 ` Gerd Möllmann
2025-01-05 17:31 ` Eli Zaretskii
2025-01-05 17:49 ` Gerd Möllmann
2025-01-05 18:42 ` Eli Zaretskii
2025-01-05 19:02 ` Gerd Möllmann
2025-01-05 7:48 ` Eli Zaretskii
2025-01-05 8:19 ` Gerd Möllmann
2025-01-05 10:33 ` Eli Zaretskii
2025-01-05 10:40 ` Gerd Möllmann
2025-01-05 11:21 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:27 ` Gerd Möllmann
2025-01-05 11:49 ` Paul Eggert
2025-01-06 6:26 ` Gerd Möllmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ttaee5qp.fsf@protonmail.com \
--to=bug-gnu-emacs@gnu.org \
--cc=75322@debbugs.gnu.org \
--cc=eliz@gnu.org \
--cc=gerd.moellmann@gmail.com \
--cc=pipcet@protonmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.