all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Pip Cet <pipcet@protonmail.com>
Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
Date: Sat, 04 Jan 2025 22:31:48 +0200	[thread overview]
Message-ID: <86a5c6b9sb.fsf@gnu.org> (raw)
In-Reply-To: <87ttaee5qp.fsf@protonmail.com> (message from Pip Cet on Sat, 04 Jan 2025 19:32:01 +0000)

> Date: Sat, 04 Jan 2025 19:32:01 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
> 
> > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one
> > such call:
> >
> >   path = ENCODE_FILE (path);
> >
> > We could simply move this to before the loop that sets up new_argv[].
> 
> Fixing the current code for this super-rare case would be good, but my
> point was that we cannot prove the current code to be correct; it isn't,
> technically speaking, even though it's battle-worn.

I want to be pragmatic and solve practical problems in my life time.
Proving that the code is correct I leave to the next generation.

> >> >> Yes, make_environment_block does say its callers can't run GC, but
> >> >> call_process doesn't indicate when and how it establishes a no-GC
> >> >> assumption.
> >> >
> >> > What would be needed to indicate that?
> >>
> >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but
> >> there is no obvious best solution here.
> >
> > Sorry, I don't understand: why macros?  Do we use something like that
> > anywhere else in Emacs?
> 
> How is it different from other easserts?  emacs_spawn has
> 
>   eassert (input_blocked_p ());

How do you eassert something that should not happen during some code
fragment?  If you have something specific in mind, do show the code.

> >> My proposal would be to remove most (ideally, all, and then we're done)
> >> no-GC assumptions, and put the few ones that must remain into separate
> >> function bodies for the no-GC / possible-GC cases.  Then we can put the
> >> no-GC function bodies into a separate section and prohibit references
> >> from no-GC to possible-GC functions, and have the linker check that.
> >
> > First, the techniques that rely on separate segments are non-portable,
> > so not all platforms will support them.  More importantly, I'm afraid
> 
> Absolutely, debug builds on the main platforms can, though.

But then important code that is #ifdef SOME-PLATFORM cannot be handled
like that, and we are back at square one, because users of "non-main
platforms" still report bugs and we try to investigate and fix them,
so we still need to be able to solve this when separate segments are
not available.

> > the amount of code where we currently don't expect GC is much more
> > than you seem to imagine, so I don't think the above is practical.
> 
> That's why I want to remove most non-GC assumptions first.  I gave up
> the last time I tried doing this, FWIW, for the reason you describe.
> 
> > In any case, the starting point is to audit all the places where GC
> > can happen, and that needs to be done anyway if we want to do it
> > thoroughly and proactively (as opposed to only when someone reports a
> > bug).
> 
> I don't see how putting a few macros in while we're there could hurt :-)

We neither saw any macros yet nor have any idea how many of them will
be needed.  So I don't know why you think it's just a few macros.  We
are still discussing a tiny static function and haven't arrived at an
agreed and safe solution.  Let's be a bit more practical here, okay?

> > The safe thing is to re-initialize the pointer from the string
> > whenever we think GC could happen, and otherwise make sure GC cannot
> > happen.
> 
> For the old GC, yes.  For the new GC, the safe thing would be to ensure
> MPS has removed all memory barriers, take the arena lock, then call the
> syscall and return.  Or use xstrdup.

If this is indeed so (and I don't think it is), then we need to
discuss it very thoroughly, because it basically means we cannot do
anything with Lisp strings in C.  For example, the display iterator
has a member that is a Lisp string, which is used when we produce
glyphs for display from a string (such as a display property or an
overlay before/after string) -- what you say here basically means that
we cannot carry that string around, right?  If not, how is that
different?

> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because
> >> MPS will reuse the memory for new data, which might be behind a barrier.
> >>
> >> If we ever do:
> >>
> >>    static Lisp_Object unmarked;
>       ^^^^^^
> >>    unmarked = string;
> >>    ... trigger GC here ...
> >>    puts (SDATA (unmarked);
> >>
> >> the most likely outcome (thanks to Gerd for the hint) is that
> >> nonsensical data is printed
> >
> > Are you sure?
> 
> With the static keyword, yes.  (Assuming we don't staticpro the static
> variable, of course).

What does static have to do with this?  What matters is the value, not
the storage.  The value comes from 'string', a different variable.  It
points to string data, and it's that string data that is of interest
to us in the above snippet.

> > The below is indeed unsafe:
> >
> >   char *p = SDATA (unmarked);
> >   ... trigger GC here ...
> >   puts (p);
> 
> Right now, that's safe for MPS, but not for the old GC, correct?

If GC moves string data, then it is not safe, period.  Does MPS move
string data?

> >> > To clarify, I was trying to understand whether the error message
> >> > reported by Ihor in another thread could have happened because of GC
> >> > in this are of the code.
> >>
> >> I currently think that Ihor's test case calls execve () with nonsensical
> >> "environment" strings a lot, and once in a while they'll even be behind
> >> the barrier which causes an EFAULT.
> >
> > Before we agree that this could be the reason, we'd need to find the
> > places where GC could reuse the memory of a live string, while that
> > string appears in some live data structure, and as long as no GC can
> > happen between the time we put the SDATA pointers into the environment
> > array and the time posix_spawn returns.
> 
> Calling igc_collect () right before the spawn results in corrupt data:

But the code doesn't call igc_collect, so how is this relevant?
Please see how I described the situation in the paragraph above.

> So the only mystery left is what causes GC to be called on current
> scratch/igc after the environment block has been created;

That's the _only_ mystery, and only if that in fact happens.  Someone
should explain how GC _can_ happen in that case on the igc branch.  If
you can, please do, and let's stop arguing about theoretical
possibilities.

> My current theory is that igc_on_grow_specpdl (called indirectly from
> here:
> 
>   /* Do the unwind-protect now, even though the pid is not known, so
>      that no storage allocation is done in the critical section.
>      The actual PID will be filled in during the critical section.  */
>   record_unwind_protect (call_process_cleanup, Fcurrent_buffer ());
> 
> ) releases the arena, and MPS uses that apparent opportunity to call
> ArenaPoll which might do GC things.

Evidence, please.





  reply	other threads:[~2025-01-04 20:31 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-03 19:55 ` Gerd Möllmann
2025-01-03 20:34   ` Gerd Möllmann
2025-01-03 20:48     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04  4:40       ` Gerd Möllmann
2025-01-04  7:57         ` Eli Zaretskii
2025-01-04  8:47           ` Gerd Möllmann
2025-01-04  9:56             ` Eli Zaretskii
2025-01-04 10:20               ` Gerd Möllmann
2025-01-05 13:30                 ` Eli Zaretskii
2025-01-05 14:11                   ` Gerd Möllmann
2025-01-05 17:45                     ` Eli Zaretskii
2025-01-05 18:17                       ` Gerd Möllmann
2025-01-05 19:07                         ` Eli Zaretskii
2025-01-05 20:04                           ` Gerd Möllmann
2025-01-05 20:24                             ` Eli Zaretskii
2025-01-06  3:57                               ` Gerd Möllmann
2025-01-06  8:25                                 ` Gerd Möllmann
2025-01-06 14:07                                 ` Eli Zaretskii
2025-01-05 21:15                           ` Daniel Colascione
2025-01-06 12:59                             ` Eli Zaretskii
2025-01-06 14:48                               ` Daniel Colascione
2025-01-06 15:12                                 ` Eli Zaretskii
2025-01-06 15:27                                   ` Daniel Colascione
2025-01-05 21:01                     ` Daniel Colascione
2025-01-05 23:28                       ` Daniel Colascione
2025-01-06 13:26                         ` Eli Zaretskii
2025-01-06 15:08                           ` Daniel Colascione
2025-01-06  4:23                       ` Gerd Möllmann
2025-01-04 11:41               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 11:29         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 12:17           ` Gerd Möllmann
2025-01-04  7:00     ` Eli Zaretskii
2025-01-04  7:17       ` Gerd Möllmann
2025-01-04  8:23         ` Eli Zaretskii
2025-01-04  8:58           ` Gerd Möllmann
2025-01-04 11:08       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 13:47         ` Eli Zaretskii
2025-01-04 14:13           ` Gerd Möllmann
2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 15:34             ` Gerd Möllmann
2025-01-04 18:19               ` Eli Zaretskii
2025-01-04 18:35                 ` Gerd Möllmann
2025-01-04 19:10                   ` Eli Zaretskii
2025-01-04 19:24                     ` Gerd Möllmann
2025-01-04 18:02             ` Eli Zaretskii
2025-01-04 19:32               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 20:31                 ` Eli Zaretskii [this message]
2025-01-04 21:15                   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05  8:23                     ` Eli Zaretskii
2025-01-05  9:04                       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05  9:32                         ` Eli Zaretskii
2025-01-05  9:47                           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:04                             ` Eli Zaretskii
2025-01-06 15:54                               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 19:16                                 ` Gerd Möllmann
2025-01-08  3:46                                   ` Gerd Möllmann
2025-01-05  6:32 ` Gerd Möllmann
2025-01-05  6:59   ` Gerd Möllmann
2025-01-05 10:21     ` Eli Zaretskii
2025-01-05 10:30       ` Gerd Möllmann
2025-01-05 10:35         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 10:45           ` Gerd Möllmann
2025-01-05 11:29         ` Eli Zaretskii
2025-01-05 11:37           ` Gerd Möllmann
2025-01-05 12:15             ` Eli Zaretskii
2025-01-05 13:21               ` Gerd Möllmann
2025-01-05 17:31                 ` Eli Zaretskii
2025-01-05 17:49                   ` Gerd Möllmann
2025-01-05 18:42                     ` Eli Zaretskii
2025-01-05 19:02                       ` Gerd Möllmann
2025-01-05  7:48   ` Eli Zaretskii
2025-01-05  8:19     ` Gerd Möllmann
2025-01-05 10:33       ` Eli Zaretskii
2025-01-05 10:40         ` Gerd Möllmann
2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:27             ` Gerd Möllmann
2025-01-05 11:49             ` Paul Eggert
2025-01-06  6:26           ` Gerd Möllmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86a5c6b9sb.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=75322@debbugs.gnu.org \
    --cc=gerd.moellmann@gmail.com \
    --cc=pipcet@protonmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.