unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Pip Cet <pipcet@protonmail.com>
Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
Date: Sun, 05 Jan 2025 10:23:00 +0200	[thread overview]
Message-ID: <864j2dacuz.fsf@gnu.org> (raw)
In-Reply-To: <87ikque0xp.fsf@protonmail.com> (message from Pip Cet on Sat, 04 Jan 2025 21:15:50 +0000)

> Date: Sat, 04 Jan 2025 21:15:50 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> > The safe thing is to re-initialize the pointer from the string
> >> > whenever we think GC could happen, and otherwise make sure GC cannot
> >> > happen.
> >>
> >> For the old GC, yes.  For the new GC, the safe thing would be to ensure
> >> MPS has removed all memory barriers, take the arena lock, then call the
> >> syscall and return.  Or use xstrdup.
> >
> > If this is indeed so (and I don't think it is), then we need to
> 
> Which part do you think is wrong?

All of what you describe.  We should be able to use without
restrictions local variables whose values are Lisp strings, and we
should be able to use without restrictions static variables whose
values are Lisp strings.  To use the 'char *' pointer to a Lisp
string's data we should make sure GC cannot happen between the time we
extract the pointer using SDATA or SSDATA and the time we end the use
of the pointer.

If you think anything in the above is not correct with MPS, please
tell.  Please only respond with real practical problems, to keep this
discussion concrete and on-topic.

> > discuss it very thoroughly, because it basically means we cannot do
> > anything with Lisp strings in C.  For example, the display iterator
> > has a member that is a Lisp string, which is used when we produce
> > glyphs for display from a string (such as a display property or an
> > overlay before/after string) -- what you say here basically means that
> > we cannot carry that string around, right?  If not, how is that
> > different?
> 
> Of course we can use Lisp strings.  As long as there's an automatic
> variable pointing to the string data, it'll stay there.  If there's a
> static variable pointing to the string data, it might move, but then the
> static variable will be updated.

Then I don't understand why you seem to be saying that our usual use
patterns of Lisp strings and their textual data is unsafe and needs to
be amended by using xstrdup etc.  We do it already as described above.

> >> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because
> >> >> MPS will reuse the memory for new data, which might be behind a barrier.
> >> >>
> >> >> If we ever do:
> >> >>
> >> >>    static Lisp_Object unmarked;
> >>       ^^^^^^
> >> >>    unmarked = string;
> >> >>    ... trigger GC here ...
> >> >>    puts (SDATA (unmarked);
> >> >>
> >> >> the most likely outcome (thanks to Gerd for the hint) is that
> >> >> nonsensical data is printed
> >> >
> >> > Are you sure?
> >>
> >> With the static keyword, yes.  (Assuming we don't staticpro the static
> >> variable, of course).
> >
> > What does static have to do with this?
> 
> What matters is whether the value is visible to the garbage collector
> (in which case it remains a valid pointer) or isn't (in which case the
> memory it points to is used for something else).
> 
> Automatic variables, residing on the stack or residing in a register
> (which is then spilled to the stack) protect the memory they point to.
> Static variables don't unless we tell GC about them.
> 
> > What matters is the value, not the storage.
> 
> I have no idea what you mean by that.  The value of the variable is a
> tagged pointer.  It won't change during GC, because GC never alters
> automatic variables.  The question is whether this pointer still points
> to the right data area after GC.

The pointer taken before GC might not point to the string's data
after GC.  Which is why, if we know GC could happen between two points
in a program's code, we cannot use the same 'char *' pointer there;
instead, we must recompute the pointer (by using SDATA) after the
possible GC.

> > The value comes from 'string', a different variable.  It
> 
> Which might be in a register and not survive until GC is triggered.

A Lisp_Object variable will survive.  Its pointer will be updated if
needed to point to the new location of the data.  Thus, using the
Lisp_Object variable is always safe, but the pointer to its data must
be updated after a potential GC.

> >> > The below is indeed unsafe:
> >> >
> >> >   char *p = SDATA (unmarked);
> >> >   ... trigger GC here ...
> >> >   puts (p);
> >>
> >> Right now, that's safe for MPS, but not for the old GC, correct?
> >
> > If GC moves string data, then it is not safe, period.  Does MPS move
> > string data?
> 
> MPS does not move string data if there is a stack variable pointing to
> it.  It does in other cases.  This is why it's safe for MPS.  The old
> GC, IIUC, is less forgiving.

The conclusion is that the above is NOT safe, because in some cases
the data could move.  Which was what I said from the get-go.

> >> > Before we agree that this could be the reason, we'd need to find the
> >> > places where GC could reuse the memory of a live string, while that
> >> > string appears in some live data structure, and as long as no GC can
> >> > happen between the time we put the SDATA pointers into the environment
> >> > array and the time posix_spawn returns.
> >>
> >> Calling igc_collect () right before the spawn results in corrupt data:
> >
> > But the code doesn't call igc_collect, so how is this relevant?
> 
> This is all that is relevant to establishing the current code is not
> safe for the intended behavior of scratch/igc, which is to allow garbage
> collection at this point, equivalent to setting a breakpoint there and
> calling igc_collect().

Why would we allow GC at some point where the code clearly does NOT
expect GC to happen?

We are discussing whether some code in callproc.c needs to be changed
or is okay as it is.  If it is okay because GC cannot possibly happen
while some code fragment is being executed, then forcefully injecting
a call to GC there is pointless, because it changes the conditions
under which this code normally runs.  So please keep such arguments
out of this discussion, they just muddy the waters and are not useful.

> I honestly don't know whether hitting a memory barrier can result in
> other data being moved by MPS right now.  I'm assuming it can, and we
> should not assume string data is protected from being moved unless we
> root it explicitly.

We should not assume anything, we should have a clear understanding
what does and what doesn't move the data and the pointers.  We cannot
possibly analyze the code otherwise, or at least I cannot.

To keep this discussion focused, let's please return to looking at
some particular code in callproc.c where you think SAFE_ALLOCA is
unsafe.





  reply	other threads:[~2025-01-05  8:23 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-03 19:55 ` Gerd Möllmann
2025-01-03 20:34   ` Gerd Möllmann
2025-01-03 20:48     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04  4:40       ` Gerd Möllmann
2025-01-04  7:57         ` Eli Zaretskii
2025-01-04  8:47           ` Gerd Möllmann
2025-01-04  9:56             ` Eli Zaretskii
2025-01-04 10:20               ` Gerd Möllmann
2025-01-05 13:30                 ` Eli Zaretskii
2025-01-05 14:11                   ` Gerd Möllmann
2025-01-05 17:45                     ` Eli Zaretskii
2025-01-05 18:17                       ` Gerd Möllmann
2025-01-05 19:07                         ` Eli Zaretskii
2025-01-05 20:04                           ` Gerd Möllmann
2025-01-05 20:24                             ` Eli Zaretskii
2025-01-06  3:57                               ` Gerd Möllmann
2025-01-06  8:25                                 ` Gerd Möllmann
2025-01-06 14:07                                 ` Eli Zaretskii
2025-01-05 21:15                           ` Daniel Colascione
2025-01-06 12:59                             ` Eli Zaretskii
2025-01-06 14:48                               ` Daniel Colascione
2025-01-06 15:12                                 ` Eli Zaretskii
2025-01-06 15:27                                   ` Daniel Colascione
2025-01-05 21:01                     ` Daniel Colascione
2025-01-05 23:28                       ` Daniel Colascione
2025-01-06 13:26                         ` Eli Zaretskii
2025-01-06 15:08                           ` Daniel Colascione
2025-01-06  4:23                       ` Gerd Möllmann
2025-01-04 11:41               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 11:29         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 12:17           ` Gerd Möllmann
2025-01-04  7:00     ` Eli Zaretskii
2025-01-04  7:17       ` Gerd Möllmann
2025-01-04  8:23         ` Eli Zaretskii
2025-01-04  8:58           ` Gerd Möllmann
2025-01-04 11:08       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 13:47         ` Eli Zaretskii
2025-01-04 14:13           ` Gerd Möllmann
2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 15:34             ` Gerd Möllmann
2025-01-04 18:19               ` Eli Zaretskii
2025-01-04 18:35                 ` Gerd Möllmann
2025-01-04 19:10                   ` Eli Zaretskii
2025-01-04 19:24                     ` Gerd Möllmann
2025-01-04 18:02             ` Eli Zaretskii
2025-01-04 19:32               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 20:31                 ` Eli Zaretskii
2025-01-04 21:15                   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05  8:23                     ` Eli Zaretskii [this message]
2025-01-05  9:04                       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05  9:32                         ` Eli Zaretskii
2025-01-05  9:47                           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:04                             ` Eli Zaretskii
2025-01-06 15:54                               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 19:16                                 ` Gerd Möllmann
2025-01-05  6:32 ` Gerd Möllmann
2025-01-05  6:59   ` Gerd Möllmann
2025-01-05 10:21     ` Eli Zaretskii
2025-01-05 10:30       ` Gerd Möllmann
2025-01-05 10:35         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 10:45           ` Gerd Möllmann
2025-01-05 11:29         ` Eli Zaretskii
2025-01-05 11:37           ` Gerd Möllmann
2025-01-05 12:15             ` Eli Zaretskii
2025-01-05 13:21               ` Gerd Möllmann
2025-01-05 17:31                 ` Eli Zaretskii
2025-01-05 17:49                   ` Gerd Möllmann
2025-01-05 18:42                     ` Eli Zaretskii
2025-01-05 19:02                       ` Gerd Möllmann
2025-01-05  7:48   ` Eli Zaretskii
2025-01-05  8:19     ` Gerd Möllmann
2025-01-05 10:33       ` Eli Zaretskii
2025-01-05 10:40         ` Gerd Möllmann
2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:27             ` Gerd Möllmann
2025-01-05 11:49             ` Paul Eggert
2025-01-06  6:26           ` Gerd Möllmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=864j2dacuz.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=75322@debbugs.gnu.org \
    --cc=gerd.moellmann@gmail.com \
    --cc=pipcet@protonmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).