all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
@ 2025-01-03 17:20 Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-03 19:55 ` Gerd Möllmann
  2025-01-05  6:32 ` Gerd Möllmann
  0 siblings, 2 replies; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-03 17:20 UTC (permalink / raw)
  To: 75322

process.c and callproc.c use ALLOCA to store arrays of Lisp_Object
values, or arrays of pointers to string data.

With the current GC, that is correct code only if it's provable that
these objects are marked through some other reference to them (or we
know no GC can happen), because if the last reference is hiding in a
SAFE_ALLOCA'd buffer AND that buffer is not on the C stack, it will be
collected.

With MPS GC, it's even more important to do so, because the object might
otherwise be moved, invalidating the pointer.  This is a possible
explanation for bug#75292.

In either case, I don't immediately see that the current code would be
safe.

SAFE_ALLOCA doesn't call xmalloc very often: it usually uses alloca(),
which results in data on the C stack, where it is visible to both
garbage collectors.  An alternative to fixing this bug in callproc.c and
process.c (and reviewing every other use of SAFE_ALLOCA) would be to
ensure that in the rare case that SAFE_ALLOCA memory is not on the
stack, we'd still conservatively scan it for references.

If we decide to retain the current SAFE_ALLOCA behavior, it is very
important to test builds where SAFE_ALLOCA always uses xmalloc, so we
have a chance to detect such bugs.  Unfortunately, currently bidi.c and
some other places assume that MAX_ALLOCA is "large enough" so this isn't
a simple matter of defining that constant to be 0.

A large stack footprint comes at a cost: it needs to be scanned during
GC, but more importantly there is the risk of latent bugs because stack
pages might not be accessed in the "right" order and the OS might assume
that an excessively-offset access is a program bug rather than an
attempt to allocate large amounts of stack; GCC on GNU/Linux currently
tries to work around this issue by touching each 4096-byte page of the
stack at least once, in the right order.

Note that call_process already allocates more than 64 KB of stack space
unconditionally (in an inner scope, but GCC hoists such allocations to
the function frame).  It may be a good idea not to use SAFE_ALLOCA at
all in this function, unless the 64 KB allocation can be removed.






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-03 19:55 ` Gerd Möllmann
  2025-01-03 20:34   ` Gerd Möllmann
  2025-01-05  6:32 ` Gerd Möllmann
  1 sibling, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-03 19:55 UTC (permalink / raw)
  To: Pip Cet; +Cc: 75322

Pip Cet <pipcet@protonmail.com> writes:

> process.c and callproc.c use ALLOCA to store arrays of Lisp_Object
> values, or arrays of pointers to string data.
>
> With the current GC, that is correct code only if it's provable that
> these objects are marked through some other reference to them (or we
> know no GC can happen), because if the last reference is hiding in a
> SAFE_ALLOCA'd buffer AND that buffer is not on the C stack, it will be
> collected.
>
> With MPS GC, it's even more important to do so, because the object might
> otherwise be moved, invalidating the pointer.  This is a possible
> explanation for bug#75292.
>
> In either case, I don't immediately see that the current code would be
> safe.

I agree.

One should probably check why the places storing Lisp_Objects don't use
SAFE_ALLOCA_LISP. If that's possible it would work for old GC and igc.

The pointers to string data case probably requires adding yet another
macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
possibly and exact one which would be good.






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-03 19:55 ` Gerd Möllmann
@ 2025-01-03 20:34   ` Gerd Möllmann
  2025-01-03 20:48     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-04  7:00     ` Eli Zaretskii
  0 siblings, 2 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-03 20:34 UTC (permalink / raw)
  To: Pip Cet; +Cc: 75322

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>
> The pointers to string data case probably requires adding yet another
> macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
> possibly and exact one which would be good.

Or one does it as you did in b0a209e9204, that's of course also safe.
For both old and new GC. (Don't remember if you mentioned it Pip, but
old GC moves string data as well, during string compaction, should GC
run).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-03 20:34   ` Gerd Möllmann
@ 2025-01-03 20:48     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-04  4:40       ` Gerd Möllmann
  2025-01-04  7:00     ` Eli Zaretskii
  1 sibling, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-03 20:48 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 75322

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>>
>> The pointers to string data case probably requires adding yet another
>> macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
>> possibly and exact one which would be good.

Note that might still EFAULT if there's a memory barrier.  I think.

Do we really need to move all arguments to syscalls and libc functions
which might use a syscall into non-MPS memory?  That would be bad.

And which libc functions might use a syscall?  I think we can agree
fprintf might, and memcpy() doesn't (note to self: destroy all evidence
I ever considered making memcpy() use MMU tricks for very large
buffers), but what about all the others?

Maybe I'm panicking too much and fixing read/write/exec* is good enough?

> Or one does it as you did in b0a209e9204, that's of course also safe.
> For both old and new GC. (Don't remember if you mentioned it Pip, but
> old GC moves string data as well, during string compaction, should GC
> run).

Ouch.  Yes, I remember now.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-03 20:48     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-04  4:40       ` Gerd Möllmann
  2025-01-04  7:57         ` Eli Zaretskii
  2025-01-04 11:29         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04  4:40 UTC (permalink / raw)
  To: Pip Cet; +Cc: 75322

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>>
>>> The pointers to string data case probably requires adding yet another
>>> macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
>>> possibly and exact one which would be good.
>
> Note that might still EFAULT if there's a memory barrier.  I think.
>
> Do we really need to move all arguments to syscalls and libc functions
> which might use a syscall into non-MPS memory?  That would be bad.
>
> And which libc functions might use a syscall?  I think we can agree
> fprintf might, and memcpy() doesn't (note to self: destroy all evidence
> I ever considered making memcpy() use MMU tricks for very large
> buffers), but what about all the others?
>
> Maybe I'm panicking too much and fixing read/write/exec* is good
> enough?

Don't Panic! Quote from The Hitchhiker's Guide to Emacs (non-NS edition)
:-).

TBH, I couldn't follow your thoughts above with the EFAULT, syscalls and
so on.

>> Or one does it as you did in b0a209e9204, that's of course also safe.
>> For both old and new GC. (Don't remember if you mentioned it Pip, but
>> old GC moves string data as well, during string compaction, should GC
>> run).
>
> Ouch.  Yes, I remember now.
>
> Pip

And today I see you reverted that commit. Is there something wrong with
it? I couldn't see something wrong, and for me VALUE(no root) >
VALUE(exact) VALUE(ambig).

WRT Lisp_Object allocas, please tell if I should do that.

(I think this all could also be done on master, but since this is there
forever, I don't think it's absolutely necessary. Agree?)





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-03 20:34   ` Gerd Möllmann
  2025-01-03 20:48     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-04  7:00     ` Eli Zaretskii
  2025-01-04  7:17       ` Gerd Möllmann
  2025-01-04 11:08       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04  7:00 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> Cc: 75322@debbugs.gnu.org
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Date: Fri, 03 Jan 2025 21:34:07 +0100
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
> >
> > The pointers to string data case probably requires adding yet another
> > macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
> > possibly and exact one which would be good.
> 
> Or one does it as you did in b0a209e9204, that's of course also safe.
> For both old and new GC. (Don't remember if you mentioned it Pip, but
> old GC moves string data as well, during string compaction, should GC
> run).

The current code in callproc.c assumes that GC cannot run while we are
parked in posix_spawn or vfork.  Is that assumption false with MPS?
If so, what would trigger GC during that time?

Another question is about the global Lisp variables in 'globals'.  For
example, Vprocess_environment actually globals.f_Vprocess_environment.
Is this large struct protected from GC, i.e. can GC ever decide that
process-environment is not used and free it?  If it's protected, where
and how is it protected?  And if it is protected, then any members of
the list that is the value of process-environment are also protected
and cannot be freed by GC.

If 'globals' is not protected, I think we should protect it, no?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  7:00     ` Eli Zaretskii
@ 2025-01-04  7:17       ` Gerd Möllmann
  2025-01-04  8:23         ` Eli Zaretskii
  2025-01-04 11:08       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04  7:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Helmut Eller, pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 75322@debbugs.gnu.org
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Date: Fri, 03 Jan 2025 21:34:07 +0100
>> 
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>> >
>> > The pointers to string data case probably requires adding yet another
>> > macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
>> > possibly and exact one which would be good.
>> 
>> Or one does it as you did in b0a209e9204, that's of course also safe.
>> For both old and new GC. (Don't remember if you mentioned it Pip, but
>> old GC moves string data as well, during string compaction, should GC
>> run).
>
> The current code in callproc.c assumes that GC cannot run while we are
> parked in posix_spawn or vfork.  Is that assumption false with MPS?
> If so, what would trigger GC during that time?

Okay, so it's safe with the old GC, I assume. Do you know if it is done
with an inhibit_garbage_collection? Not that we run into the trap that
somewhere on the way to the actual fork maybe_quit is called which can
GC and so on, like we had it in the regexp engine a while back.

For MPS, I don't know for sure. I have seen in passing while git
grepping in their repo recently, that they write something about forking
and child processes, but I don't know what they do. Maybe someone having
read the code can answer that. (Added Helmut in CC).

>
> Another question is about the global Lisp variables in 'globals'.  For
> example, Vprocess_environment actually globals.f_Vprocess_environment.
> Is this large struct protected from GC, i.e. can GC ever decide that
> process-environment is not used and free it?  If it's protected, where
> and how is it protected?  And if it is protected, then any members of
> the list that is the value of process-environment are also protected
> and cannot be freed by GC.
>
> If 'globals' is not protected, I think we should protect it, no?

process-environment is a DEFVAR_LISP, and is root in both GCs via the
staticpro mechanism (staticvec array).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  4:40       ` Gerd Möllmann
@ 2025-01-04  7:57         ` Eli Zaretskii
  2025-01-04  8:47           ` Gerd Möllmann
  2025-01-04 11:29         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04  7:57 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> Cc: 75322@debbugs.gnu.org
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Date: Sat, 04 Jan 2025 05:40:09 +0100
> 
> And today I see you reverted that commit. Is there something wrong with
> it? I couldn't see something wrong, and for me VALUE(no root) >
> VALUE(exact) VALUE(ambig).

That's my fault.  I asked to post the patch and discuss it before
committing, as these are delicate issues where I prefer that we all
are on the same page before changing this code.

> WRT Lisp_Object allocas, please tell if I should do that.

Let's discuss this on a case by case basis.  Not all uses of alloca
are the same or have the same requirements and restrictions.

> (I think this all could also be done on master, but since this is there
> forever, I don't think it's absolutely necessary. Agree?)

Exactly my point, which is why I think we should discuss these changes
before installing them on the branch, let alone on master.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  7:17       ` Gerd Möllmann
@ 2025-01-04  8:23         ` Eli Zaretskii
  2025-01-04  8:58           ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04  8:23 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: eller.helmut, pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org, Helmut Eller
>  <eller.helmut@gmail.com>
> Date: Sat, 04 Jan 2025 08:17:34 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The current code in callproc.c assumes that GC cannot run while we are
> > parked in posix_spawn or vfork.  Is that assumption false with MPS?
> > If so, what would trigger GC during that time?
> 
> Okay, so it's safe with the old GC, I assume. Do you know if it is done
> with an inhibit_garbage_collection?

No, without.  Unless posix_spawn could somehow call back to us in a
way that posix_spawn will later return and resume running (as opposed
to via a fatal signal), how could GC happen?

> Not that we run into the trap that somewhere on the way to the
> actual fork maybe_quit is called which can GC and so on, like we had
> it in the regexp engine a while back.

This can be established by examining the code, right?

> For MPS, I don't know for sure. I have seen in passing while git
> grepping in their repo recently, that they write something about forking
> and child processes, but I don't know what they do. Maybe someone having
> read the code can answer that. (Added Helmut in CC).

We are talking about vfork, not fork.  In our case, the forked process
runs a completely separate program, in most cases not even Emacs, so MPS
has no bearing on that, I think.

> > Another question is about the global Lisp variables in 'globals'.  For
> > example, Vprocess_environment actually globals.f_Vprocess_environment.
> > Is this large struct protected from GC, i.e. can GC ever decide that
> > process-environment is not used and free it?  If it's protected, where
> > and how is it protected?  And if it is protected, then any members of
> > the list that is the value of process-environment are also protected
> > and cannot be freed by GC.
> >
> > If 'globals' is not protected, I think we should protect it, no?
> 
> process-environment is a DEFVAR_LISP, and is root in both GCs via the
> staticpro mechanism (staticvec array).

OK, so it's impossible that some member of the process-environment's
list will be GCed, right?  AFAIU, Pip was considering a situation
where some of these member strings are GCed, and then subroutines of
posix_spawn use a bad address, to explain the error message Ihor
reported.  This seems impossible because process-environment is
staticpro'd, right?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  7:57         ` Eli Zaretskii
@ 2025-01-04  8:47           ` Gerd Möllmann
  2025-01-04  9:56             ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04  8:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 75322@debbugs.gnu.org
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Date: Sat, 04 Jan 2025 05:40:09 +0100
>> 
>> And today I see you reverted that commit. Is there something wrong with
>> it? I couldn't see something wrong, and for me VALUE(no root) >
>> VALUE(exact) VALUE(ambig).
>
> That's my fault.  I asked to post the patch and discuss it before
> committing, as these are delicate issues where I prefer that we all
> are on the same page before changing this code.

Ah okay, that explains it, I missed that discussion, sorry.

>> WRT Lisp_Object allocas, please tell if I should do that.
>
> Let's discuss this on a case by case basis.  Not all uses of alloca
> are the same or have the same requirements and restrictions.

Okay. For the SDATA case, I find Pip's commit does the right thing. It
uses xstrdup for the strings and makes sure these are freed later by
registering them in one of the special specpdl entries that exist for
that purpose (record_unwind_protect_ptr). Works with both GCs, is always
safe, I don't see disadvantages, and we don't have to check if GC runs
or not and compact strings (old) or moves string data around (igc).

For the Lisp_Object cases, I strongly suspect that these cases are an
oversight and were left over when SAFE_ALLOCA_LISP was introduced. The
reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC):
to make sure that the Lisp_Objects are marked, via specpdl stack entries
again, when the specpdl stack(s) are marked. Not doing that looks
definitely wrong. Replacing the other ALLOCA macros that are currently
used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things
for both GCs.






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  8:23         ` Eli Zaretskii
@ 2025-01-04  8:58           ` Gerd Möllmann
  0 siblings, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04  8:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eller.helmut, pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org, Helmut Eller
>>  <eller.helmut@gmail.com>
>> Date: Sat, 04 Jan 2025 08:17:34 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > The current code in callproc.c assumes that GC cannot run while we are
>> > parked in posix_spawn or vfork.  Is that assumption false with MPS?
>> > If so, what would trigger GC during that time?
>> 
>> Okay, so it's safe with the old GC, I assume. Do you know if it is done
>> with an inhibit_garbage_collection?
>
> No, without.  Unless posix_spawn could somehow call back to us in a
> way that posix_spawn will later return and resume running (as opposed
> to via a fatal signal), how could GC happen?
>
>> Not that we run into the trap that somewhere on the way to the
>> actual fork maybe_quit is called which can GC and so on, like we had
>> it in the regexp engine a while back.
>
> This can be established by examining the code, right?

In principle yes, one could check if maybe_quit is called in the call
trees between the point SDATA is stored and fork happens, but I wouldn't
do it because (1) someone might add a maybe_quit at some point, (2)
Pip's solution is immune to the problem, and (3) it's potentially a lot
of work to figure out if GC can be called in a call tree.

>> For MPS, I don't know for sure. I have seen in passing while git
>> grepping in their repo recently, that they write something about forking
>> and child processes, but I don't know what they do. Maybe someone having
>> read the code can answer that. (Added Helmut in CC).
>
> We are talking about vfork, not fork.  In our case, the forked process
> runs a completely separate program, in most cases not even Emacs, so MPS
> has no bearing on that, I think.

Okay. I hope Helmut and/or Pip know more about this than I do.

>> > Another question is about the global Lisp variables in 'globals'.  For
>> > example, Vprocess_environment actually globals.f_Vprocess_environment.
>> > Is this large struct protected from GC, i.e. can GC ever decide that
>> > process-environment is not used and free it?  If it's protected, where
>> > and how is it protected?  And if it is protected, then any members of
>> > the list that is the value of process-environment are also protected
>> > and cannot be freed by GC.
>> >
>> > If 'globals' is not protected, I think we should protect it, no?
>> 
>> process-environment is a DEFVAR_LISP, and is root in both GCs via the
>> staticpro mechanism (staticvec array).
>
> OK, so it's impossible that some member of the process-environment's
> list will be GCed, right?  

Right.

> AFAIU, Pip was considering a situation where some of these member
> strings are GCed, and then subroutines of posix_spawn use a bad
> address, to explain the error message Ihor reported. This seems
> impossible because process-environment is staticpro'd, right?

I think that scenario cannot happen in this case, right. I'd like his
xstrdups better anyway, so that all becomes irrelevant for both GCs.
Less to worry about :-).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  8:47           ` Gerd Möllmann
@ 2025-01-04  9:56             ` Eli Zaretskii
  2025-01-04 10:20               ` Gerd Möllmann
  2025-01-04 11:41               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04  9:56 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sat, 04 Jan 2025 09:47:43 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Let's discuss this on a case by case basis.  Not all uses of alloca
> > are the same or have the same requirements and restrictions.
> 
> Okay. For the SDATA case, I find Pip's commit does the right thing. It
> uses xstrdup for the strings and makes sure these are freed later by
> registering them in one of the special specpdl entries that exist for
> that purpose (record_unwind_protect_ptr). Works with both GCs, is always
> safe, I don't see disadvantages, and we don't have to check if GC runs
> or not and compact strings (old) or moves string data around (igc).

The disadvantage is to xstrdup strings when that is not needed.  It
increases memory pressure and also costs some CPU time.  Not very
significant disadvantage, admittedly, but still.  If this is needed,
it's a small price to pay, but if it isn't needed, it's a waste.

So let's see if we agree that it is indeed needed.  The way to do that
is to explain how GC could happen while the code which assembles the
environment in make_environment_block and the code in emacs_spawn
afterwards run.  Note that we block SIGCHLD and block input while
emacs_spawn runs.

> For the Lisp_Object cases, I strongly suspect that these cases are an
> oversight and were left over when SAFE_ALLOCA_LISP was introduced. The
> reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC):
> to make sure that the Lisp_Objects are marked, via specpdl stack entries
> again, when the specpdl stack(s) are marked. Not doing that looks
> definitely wrong. Replacing the other ALLOCA macros that are currently
> used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things
> for both GCs.

Can we first identify those cases, so that we could discuss the
specific codes in question?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  9:56             ` Eli Zaretskii
@ 2025-01-04 10:20               ` Gerd Möllmann
  2025-01-05 13:30                 ` Eli Zaretskii
  2025-01-04 11:41               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04 10:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sat, 04 Jan 2025 09:47:43 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Let's discuss this on a case by case basis.  Not all uses of alloca
>> > are the same or have the same requirements and restrictions.
>> 
>> Okay. For the SDATA case, I find Pip's commit does the right thing. It
>> uses xstrdup for the strings and makes sure these are freed later by
>> registering them in one of the special specpdl entries that exist for
>> that purpose (record_unwind_protect_ptr). Works with both GCs, is always
>> safe, I don't see disadvantages, and we don't have to check if GC runs
>> or not and compact strings (old) or moves string data around (igc).
>
> The disadvantage is to xstrdup strings when that is not needed.  It
> increases memory pressure and also costs some CPU time.  Not very
> significant disadvantage, admittedly, but still.  If this is needed,
> it's a small price to pay, but if it isn't needed, it's a waste.

That's true.

>
> So let's see if we agree that it is indeed needed.  The way to do that
> is to explain how GC could happen while the code which assembles the
> environment in make_environment_block and the code in emacs_spawn
> afterwards run.  Note that we block SIGCHLD and block input while
> emacs_spawn runs.

I'm sorry, but I'm afraid that's not something for me. The possible gains
are so small that I don't want to spend time checking this.

>> For the Lisp_Object cases, I strongly suspect that these cases are an
>> oversight and were left over when SAFE_ALLOCA_LISP was introduced. The
>> reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC):
>> to make sure that the Lisp_Objects are marked, via specpdl stack entries
>> again, when the specpdl stack(s) are marked. Not doing that looks
>> definitely wrong. Replacing the other ALLOCA macros that are currently
>> used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things
>> for both GCs.
>
> Can we first identify those cases, so that we could discuss the
> specific codes in question?

Okay. Pip talked about callproc.c and process.c. I believe process.c is
the one with the SDATA.

In callproc.c I found two: call_process and create_temp_file both use
SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces
with SAVE_ALLOCA_LISP.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  7:00     ` Eli Zaretskii
  2025-01-04  7:17       ` Gerd Möllmann
@ 2025-01-04 11:08       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-04 13:47         ` Eli Zaretskii
  1 sibling, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 11:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gerd Möllmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Cc: 75322@debbugs.gnu.org
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Date: Fri, 03 Jan 2025 21:34:07 +0100
>>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>> >
>> > The pointers to string data case probably requires adding yet another
>> > macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
>> > possibly and exact one which would be good.
>>
>> Or one does it as you did in b0a209e9204, that's of course also safe.
>> For both old and new GC. (Don't remember if you mentioned it Pip, but
>> old GC moves string data as well, during string compaction, should GC
>> run).
>
> The current code in callproc.c assumes that GC cannot run while we are
> parked in posix_spawn or vfork.

If you're attempting to explain why the current code is safe (if you're
saying it is), it assumes much more than that.  call_process assumes
Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the
function calls Lisp, which may do anything, including modifying
Vprocess_environment.

Regardless of what you're saying, such assumptions need to be spelled
out.  Where they are made, that is, not in a utility function.

Yes, make_environment_block does say its callers can't run GC, but
call_process doesn't indicate when and how it establishes a no-GC
assumption.

> Is that assumption false with MPS?

As we agreed, code should be written to assume GC can strike at any
time.  In the context of MPS, to make things more difficult, "GC can
strike" may mean a full GC happens (moving objects) or a memory barrier
is lifted.

> Another question is about the global Lisp variables in 'globals'.  For
> example, Vprocess_environment actually globals.f_Vprocess_environment.

> Is this large struct protected from GC, i.e. can GC ever decide that
> process-environment is not used and free it?  If it's protected, where
> and how is it protected?

It's a global variable.  Those are protected.

> And if it is protected, then any members of
> the list that is the value of process-environment are also protected
> and cannot be freed by GC.

IIUC, Gerd explained that the old GC can still move the string *data*
used in that structure, even if the string metadata stays in place.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  4:40       ` Gerd Möllmann
  2025-01-04  7:57         ` Eli Zaretskii
@ 2025-01-04 11:29         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-04 12:17           ` Gerd Möllmann
  1 sibling, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 11:29 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 75322

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>
>>>>
>>>> The pointers to string data case probably requires adding yet another
>>>> macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root,
>>>> possibly and exact one which would be good.
>>
>> Note that might still EFAULT if there's a memory barrier.  I think.
>>
>> Do we really need to move all arguments to syscalls and libc functions
>> which might use a syscall into non-MPS memory?  That would be bad.
>>
>> And which libc functions might use a syscall?  I think we can agree
>> fprintf might, and memcpy() doesn't (note to self: destroy all evidence
>> I ever considered making memcpy() use MMU tricks for very large
>> buffers), but what about all the others?
>>
>> Maybe I'm panicking too much and fixing read/write/exec* is good
>> enough?
>
> Don't Panic! Quote from The Hitchhiker's Guide to Emacs (non-NS edition)
> :-).

> TBH, I couldn't follow your thoughts above with the EFAULT, syscalls and
> so on.

My understanding is that if there is a memory barrier in place for a
string that a syscall tries to access, we get an -EFAULT from Linux, an
EFAULT from glibc, and the syscall won't work.

This is what makes valid_pointer_p work, for example.  (To the extent it
does: valid_pointer_p assumes 16 bytes after the pointer are readable; I
don't see why that is true for small objects).

What makes this more difficult is that glibc and GCC disagree about what
to do with invalid pointers even in the simplest case: glibc documents
printf ("%s\n", NULL) to work, but GCC will rewrite it into puts (NULL),
which crashes.  I'm worried that glibc might wrap a syscall incorrectly
wrt EFAULT and SIGSEGV, in this case.

Worse, if the syscall is in a fork()ed process, MPS machinery to remove
the memory barrier might not be in place after the fork.  And who knows
about posix_spawn action descriptors?  Or vfork?

>>> Or one does it as you did in b0a209e9204, that's of course also safe.
>>> For both old and new GC. (Don't remember if you mentioned it Pip, but
>>> old GC moves string data as well, during string compaction, should GC
>>> run).
>>
>> Ouch.  Yes, I remember now.
>>
>> Pip
>
> And today I see you reverted that commit. Is there something wrong with
> it? I couldn't see something wrong, and for me VALUE(no root) >
> VALUE(exact) VALUE(ambig).

There were two reasons for the revert:

1. Eli asked me not to push the change right after I pushed.  I thought
it would be best to restore the "before" state so we could discuss the
solution.

2. For the non-MPS case, I rashly assumed it would be okay to remove the
no-GC assumption that call_process apparently establishes (even though
there is no comment saying so).  I'm not sure what I would do now; the
old code seems buggy to me because Fexpand_file_name can call Lisp, but
that bug affects only argv, not envp.  It may be best to fix the argv
code but leave the envp code in its (once again) current fragile state,
documenting precisely which assumptions are made there.

> WRT Lisp_Object allocas, please tell if I should do that.

Sorry, I don't understand.  Lisp_Objects shouldn't be allocated with
SAFE_ALLOCA, but allocating them with SAFE_ALLOCA_LISP_EXTRA is fine.
Pointers to string data cannot currently be safely allocated with
SAFE_ALLOCA, but I'm not sure whether SAFE_ALLOCA_AMBIGUOUS or
SAFE_ALLOCA_EXACT_POINTER would be the right thing to do.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04  9:56             ` Eli Zaretskii
  2025-01-04 10:20               ` Gerd Möllmann
@ 2025-01-04 11:41               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 11:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gerd Möllmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sat, 04 Jan 2025 09:47:43 +0100
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > Let's discuss this on a case by case basis.  Not all uses of alloca
>> > are the same or have the same requirements and restrictions.
>>
>> Okay. For the SDATA case, I find Pip's commit does the right thing. It
>> uses xstrdup for the strings and makes sure these are freed later by
>> registering them in one of the special specpdl entries that exist for
>> that purpose (record_unwind_protect_ptr). Works with both GCs, is always
>> safe, I don't see disadvantages, and we don't have to check if GC runs
>> or not and compact strings (old) or moves string data around (igc).
>
> The disadvantage is to xstrdup strings when that is not needed.  It
> increases memory pressure and also costs some CPU time.  Not very
> significant disadvantage, admittedly, but still.  If this is needed,
> it's a small price to pay, but if it isn't needed, it's a waste.

It's easier to xstrdup rather than figure out which no-GC assumptions
are in the subprocess code, which of them are valid (probably not all of
them, considering Fexpand_file_name), and how to document them in a way
that doesn't turn this area of code into more of a minefield than it is
already :-)

> So let's see if we agree that it is indeed needed.  The way to do that
> is to explain how GC could happen while the code which assembles the
> environment in make_environment_block and the code in emacs_spawn
> afterwards run.  Note that we block SIGCHLD and block input while
> emacs_spawn runs.

I'm not going to explain "how GC could happen" in the MPS case: GC can
strike at any time.

For traditional GC, if we want to change the code as little as possible,
we could indeed try to establish no-GC assumptions, but we don't have
to: we can simply start with correct but slow code and maybe drop a note
to the usual suspects that there is optimization potential here.

>> For the Lisp_Object cases, I strongly suspect that these cases are an
>> oversight and were left over when SAFE_ALLOCA_LISP was introduced. The
>> reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC):
>> to make sure that the Lisp_Objects are marked, via specpdl stack entries
>> again, when the specpdl stack(s) are marked. Not doing that looks
>> definitely wrong. Replacing the other ALLOCA macros that are currently
>> used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things
>> for both GCs.
>
> Can we first identify those cases, so that we could discuss the
> specific codes in question?

TBH, if there is a single bug caused by an incorrect SAFE_ALLOCA which
should be a SAFE_ALLOCA_LISP, I think that's reason enough to modify
SAFE_ALLOCA to conservatively scan any xmalloc'd memory it uses.  No
performance impact as this case is nonexistent in practice.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 11:29         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-04 12:17           ` Gerd Möllmann
  0 siblings, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04 12:17 UTC (permalink / raw)
  To: Pip Cet; +Cc: 75322

Pip Cet <pipcet@protonmail.com> writes:

>> TBH, I couldn't follow your thoughts above with the EFAULT, syscalls and
>> so on.
>
> My understanding is that if there is a memory barrier in place for a
> string that a syscall tries to access, we get an -EFAULT from Linux, an
> EFAULT from glibc, and the syscall won't work.

Ah, I think I understand now, thanks. Thing is that string data is in
our leaf pool (AMCZ). AMCZ doesn't use barriers because there are no
references in its objects. Does that make sense?

> This is what makes valid_pointer_p work, for example.  (To the extent it
> does: valid_pointer_p assumes 16 bytes after the pointer are readable; I
> don't see why that is true for small objects).
>
> What makes this more difficult is that glibc and GCC disagree about what
> to do with invalid pointers even in the simplest case: glibc documents
> printf ("%s\n", NULL) to work, but GCC will rewrite it into puts (NULL),
> which crashes.  I'm worried that glibc might wrap a syscall incorrectly
> wrt EFAULT and SIGSEGV, in this case.
>
> Worse, if the syscall is in a fork()ed process, MPS machinery to remove
> the memory barrier might not be in place after the fork.  And who knows
> about posix_spawn action descriptors?  Or vfork?
>
>>>> Or one does it as you did in b0a209e9204, that's of course also safe.
>>>> For both old and new GC. (Don't remember if you mentioned it Pip, but
>>>> old GC moves string data as well, during string compaction, should GC
>>>> run).
>>>
>>> Ouch.  Yes, I remember now.
>>>
>>> Pip
>>
>> And today I see you reverted that commit. Is there something wrong with
>> it? I couldn't see something wrong, and for me VALUE(no root) >
>> VALUE(exact) VALUE(ambig).
>
> There were two reasons for the revert:
>
> 1. Eli asked me not to push the change right after I pushed.  I thought
> it would be best to restore the "before" state so we could discuss the
> solution.
>
> 2. For the non-MPS case, I rashly assumed it would be okay to remove the
> no-GC assumption that call_process apparently establishes (even though
> there is no comment saying so).  I'm not sure what I would do now; the
> old code seems buggy to me because Fexpand_file_name can call Lisp, but
> that bug affects only argv, not envp.  It may be best to fix the argv
> code but leave the envp code in its (once again) current fragile state,
> documenting precisely which assumptions are made there.
>
>> WRT Lisp_Object allocas, please tell if I should do that.
>
> Sorry, I don't understand.  Lisp_Objects shouldn't be allocated with
> SAFE_ALLOCA, but allocating them with SAFE_ALLOCA_LISP_EXTRA is fine.
> Pointers to string data cannot currently be safely allocated with
> SAFE_ALLOCA, but I'm not sure whether SAFE_ALLOCA_AMBIGUOUS or
> SAFE_ALLOCA_EXACT_POINTER would be the right thing to do.

My fault: I meant allocas used to store Lisp_Object in them, i.e.
Lisp_Object * :-).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 11:08       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-04 13:47         ` Eli Zaretskii
  2025-01-04 14:13           ` Gerd Möllmann
  2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04 13:47 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, 75322

> Date: Sat, 04 Jan 2025 11:08:50 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, 75322@debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > The current code in callproc.c assumes that GC cannot run while we are
> > parked in posix_spawn or vfork.
> 
> If you're attempting to explain why the current code is safe (if you're
> saying it is), it assumes much more than that.  call_process assumes
> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the
> function calls Lisp, which may do anything, including modifying
> Vprocess_environment.

AFAICT, expand-file-name is called before we start using the SSDATA of
strings in the args[] array.  Or what did I miss?

> Regardless of what you're saying, such assumptions need to be spelled
> out.  Where they are made, that is, not in a utility function.

I'm okay with spelling out these assumptions.

> Yes, make_environment_block does say its callers can't run GC, but
> call_process doesn't indicate when and how it establishes a no-GC
> assumption.

What would be needed to indicate that?

> > Is that assumption false with MPS?
> 
> As we agreed, code should be written to assume GC can strike at any
> time.

I don't think we agreed to that.  At least I didn't, not in this
general form.  It would be a huge job to make all of our code comply
with this.

> > Another question is about the global Lisp variables in 'globals'.  For
> > example, Vprocess_environment actually globals.f_Vprocess_environment.
> 
> > Is this large struct protected from GC, i.e. can GC ever decide that
> > process-environment is not used and free it?  If it's protected, where
> > and how is it protected?
> 
> It's a global variable.  Those are protected.
> 
> > And if it is protected, then any members of
> > the list that is the value of process-environment are also protected
> > and cannot be freed by GC.
> 
> IIUC, Gerd explained that the old GC can still move the string *data*
> used in that structure, even if the string metadata stays in place.

If string data is moved, then accessing the old pointer would trigger
the barrier and MPS will do its thing, not?

To clarify, I was trying to understand whether the error message
reported by Ihor in another thread could have happened because of GC
in this are of the code.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 13:47         ` Eli Zaretskii
@ 2025-01-04 14:13           ` Gerd Möllmann
  2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04 14:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> IIUC, Gerd explained that the old GC can still move the string *data*
>> used in that structure, even if the string metadata stays in place.
>
> If string data is moved, then accessing the old pointer would trigger
> the barrier and MPS will do its thing, not?
>
> To clarify, I was trying to understand whether the error message
> reported by Ihor in another thread could have happened because of GC
> in this are of the code.

So, Ihor saw something in igc. In that case, the string _data_ will
never a barrier on it, because it's leaf data, i.e. data that doesn't
contain references, and that is allocated from an AMCZ pool. If that
helps, don't know.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 13:47         ` Eli Zaretskii
  2025-01-04 14:13           ` Gerd Möllmann
@ 2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-04 15:34             ` Gerd Möllmann
  2025-01-04 18:02             ` Eli Zaretskii
  1 sibling, 2 replies; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 15:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Sat, 04 Jan 2025 11:08:50 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, 75322@debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> > The current code in callproc.c assumes that GC cannot run while we are
>> > parked in posix_spawn or vfork.
>>
>> If you're attempting to explain why the current code is safe (if you're
>> saying it is), it assumes much more than that.  call_process assumes
>> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the
>> function calls Lisp, which may do anything, including modifying
>> Vprocess_environment.
>
> AFAICT, expand-file-name is called before we start using the SSDATA of
> strings in the args[] array.  Or what did I miss?

You're right, thanks.  I got confused between args and new_argv.

The next thing I'd look at is the final call to ENCODE_FILE, called
after new_argv is set up; this ends up in encode_coding_object, which
calls safe_funcall in what seems to me to be an unlikely but possible
code path.  I assume that's unsafe (the safe_ refers to redisplay, not
GC, IIUC)?

While maybe_quit is "safe" because it inhibits GC, I believe it can
still call the debugger, which might require more memory than is
available until GC is re-enabled.

>> Regardless of what you're saying, such assumptions need to be spelled
>> out.  Where they are made, that is, not in a utility function.
>
> I'm okay with spelling out these assumptions.

Excellent.  Now we just need to establish what they are :-)

>> Yes, make_environment_block does say its callers can't run GC, but
>> call_process doesn't indicate when and how it establishes a no-GC
>> assumption.
>
> What would be needed to indicate that?

I'd prefer a pair of macros (no-ops in regular builds) to comments, but
there is no obvious best solution here.

My proposal would be to remove most (ideally, all, and then we're done)
no-GC assumptions, and put the few ones that must remain into separate
function bodies for the no-GC / possible-GC cases.  Then we can put the
no-GC function bodies into a separate section and prohibit references
from no-GC to possible-GC functions, and have the linker check that.

>> > Is that assumption false with MPS?
>>
>> As we agreed, code should be written to assume GC can strike at any
>> time.
>
> I don't think we agreed to that.  At least I didn't, not in this
> general form.  It would be a huge job to make all of our code comply
> with this.

I said "That's what you should think: GC can strike at any time", and
you said "The same is true with the old GC".  I misread that as
agreement.

>> IIUC, Gerd explained that the old GC can still move the string *data*
>> used in that structure, even if the string metadata stays in place.
>
> If string data is moved, then accessing the old pointer would trigger
> the barrier and MPS will do its thing, not?

Sorry, but I think I'm confused here.

IIUC, MPS doesn't currently use barriers on data that is moved (it
could, so the data is copied lazily, but I don't think that's what you
meant), it uses barriers on data that contains references that may not
have been fixed.

If a pointer to "old" data is ever exposed to Emacs, we lose, because
MPS will reuse the memory for new data, which might be behind a barrier.

If we ever do:

   static Lisp_Object unmarked;
   unmarked = string;
   ... trigger GC here ...
   puts (SDATA (unmarked);

the most likely outcome (thanks to Gerd for the hint) is that
nonsensical data is printed, because the string data was reused for
another string; less likely, but possibly, the data is reused for a
different pool (with barriers) and we'd get a SIGSEGV; even less likely,
puts() will call write() directly and we get an EFAULT, and glibc does
whatever glibc does in that case (set errno, I hope).

Accessing the old pointer is never okay.  MPS can't fix it, and doesn't
make catching such errors easy.

> To clarify, I was trying to understand whether the error message
> reported by Ihor in another thread could have happened because of GC
> in this are of the code.

I currently think that Ihor's test case calls execve () with nonsensical
"environment" strings a lot, and once in a while they'll even be behind
the barrier which causes an EFAULT.  GNU/Linux seems relatively
forgiving about strings without "=" in the envp, for whatever reason.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-04 15:34             ` Gerd Möllmann
  2025-01-04 18:19               ` Eli Zaretskii
  2025-01-04 18:02             ` Eli Zaretskii
  1 sibling, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04 15:34 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, 75322

Pip Cet <pipcet@protonmail.com> writes:

> "Eli Zaretskii" <eliz@gnu.org> writes:
>
>>> Date: Sat, 04 Jan 2025 11:08:50 +0000
>>> From: Pip Cet <pipcet@protonmail.com>
>>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, 75322@debbugs.gnu.org
>>>
>>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>>
>>> > The current code in callproc.c assumes that GC cannot run while we are
>>> > parked in posix_spawn or vfork.
>>>
>>> If you're attempting to explain why the current code is safe (if you're
>>> saying it is), it assumes much more than that.  call_process assumes
>>> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the
>>> function calls Lisp, which may do anything, including modifying
>>> Vprocess_environment.
>>
>> AFAICT, expand-file-name is called before we start using the SSDATA of
>> strings in the args[] array.  Or what did I miss?
>
> You're right, thanks.  I got confused between args and new_argv.
>
> The next thing I'd look at is the final call to ENCODE_FILE, called
> after new_argv is set up; this ends up in encode_coding_object, which
> calls safe_funcall in what seems to me to be an unlikely but possible
> code path.  I assume that's unsafe (the safe_ refers to redisplay, not
> GC, IIUC)?
>
> While maybe_quit is "safe" because it inhibits GC, I believe it can
> still call the debugger, which might require more memory than is
> available until GC is re-enabled.
>
>>> Regardless of what you're saying, such assumptions need to be spelled
>>> out.  Where they are made, that is, not in a utility function.
>>
>> I'm okay with spelling out these assumptions.
>
> Excellent.  Now we just need to establish what they are :-)
>
>>> Yes, make_environment_block does say its callers can't run GC, but
>>> call_process doesn't indicate when and how it establishes a no-GC
>>> assumption.
>>
>> What would be needed to indicate that?
>
> I'd prefer a pair of macros (no-ops in regular builds) to comments, but
> there is no obvious best solution here.
>
> My proposal would be to remove most (ideally, all, and then we're done)
> no-GC assumptions, and put the few ones that must remain into separate
> function bodies for the no-GC / possible-GC cases.  Then we can put the
> no-GC function bodies into a separate section and prohibit references
> from no-GC to possible-GC functions, and have the linker check that.
>
>>> > Is that assumption false with MPS?
>>>
>>> As we agreed, code should be written to assume GC can strike at any
>>> time.
>>
>> I don't think we agreed to that.  At least I didn't, not in this
>> general form.  It would be a huge job to make all of our code comply
>> with this.
>
> I said "That's what you should think: GC can strike at any time", and
> you said "The same is true with the old GC".  I misread that as
> agreement.
>
>>> IIUC, Gerd explained that the old GC can still move the string *data*
>>> used in that structure, even if the string metadata stays in place.
>>
>> If string data is moved, then accessing the old pointer would trigger
>> the barrier and MPS will do its thing, not?
>
> Sorry, but I think I'm confused here.
>
> IIUC, MPS doesn't currently use barriers on data that is moved (it
> could, so the data is copied lazily, but I don't think that's what you
> meant), it uses barriers on data that contains references that may not
> have been fixed.
>
> If a pointer to "old" data is ever exposed to Emacs, we lose, because
> MPS will reuse the memory for new data, which might be behind a barrier.
>
> If we ever do:
>
>    static Lisp_Object unmarked;
>    unmarked = string;
>    ... trigger GC here ...
>    puts (SDATA (unmarked);
>
> the most likely outcome (thanks to Gerd for the hint) is that
> nonsensical data is printed, because the string data was reused for
> another string; less likely, but possibly, the data is reused for a
> different pool (with barriers) and we'd get a SIGSEGV; even less likely,
> puts() will call write() directly and we get an EFAULT, and glibc does
> whatever glibc does in that case (set errno, I hope).
>
> Accessing the old pointer is never okay.  MPS can't fix it, and doesn't
> make catching such errors easy.
>
>> To clarify, I was trying to understand whether the error message
>> reported by Ihor in another thread could have happened because of GC
>> in this are of the code.
>
> I currently think that Ihor's test case calls execve () with nonsensical
> "environment" strings a lot, and once in a while they'll even be behind
> the barrier which causes an EFAULT.  GNU/Linux seems relatively
> forgiving about strings without "=" in the envp, for whatever reason.
>
> Pip

I'm entering a state of confusion again, as usual in this discussion.
Can we _pretty please_ just do the xstrdup thing, and forget about it?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-04 15:34             ` Gerd Möllmann
@ 2025-01-04 18:02             ` Eli Zaretskii
  2025-01-04 19:32               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04 18:02 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, 75322

> Date: Sat, 04 Jan 2025 15:26:01 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > AFAICT, expand-file-name is called before we start using the SSDATA of
> > strings in the args[] array.  Or what did I miss?
> 
> You're right, thanks.  I got confused between args and new_argv.
> 
> The next thing I'd look at is the final call to ENCODE_FILE, called
> after new_argv is set up; this ends up in encode_coding_object, which
> calls safe_funcall in what seems to me to be an unlikely but possible
> code path.  I assume that's unsafe (the safe_ refers to redisplay, not
> GC, IIUC)?

ENCODE_FILE can indeed trigger GC in rare cases, but I see only one
such call:

  path = ENCODE_FILE (path);

We could simply move this to before the loop that sets up new_argv[].

> While maybe_quit is "safe" because it inhibits GC, I believe it can
> still call the debugger, which might require more memory than is
> available until GC is re-enabled.

maybe_quit is not the problem, the problem AFAIU is that any encoding
could have pre-write-conversion function written in Lisp.

> >> Yes, make_environment_block does say its callers can't run GC, but
> >> call_process doesn't indicate when and how it establishes a no-GC
> >> assumption.
> >
> > What would be needed to indicate that?
> 
> I'd prefer a pair of macros (no-ops in regular builds) to comments, but
> there is no obvious best solution here.

Sorry, I don't understand: why macros?  Do we use something like that
anywhere else in Emacs?

> My proposal would be to remove most (ideally, all, and then we're done)
> no-GC assumptions, and put the few ones that must remain into separate
> function bodies for the no-GC / possible-GC cases.  Then we can put the
> no-GC function bodies into a separate section and prohibit references
> from no-GC to possible-GC functions, and have the linker check that.

First, the techniques that rely on separate segments are non-portable,
so not all platforms will support them.  More importantly, I'm afraid
the amount of code where we currently don't expect GC is much more
than you seem to imagine, so I don't think the above is practical.

In any case, the starting point is to audit all the places where GC
can happen, and that needs to be done anyway if we want to do it
thoroughly and proactively (as opposed to only when someone reports a
bug).  We can delay the decision what to do with these places to
later, when we understand better the magnitude of the problem and its
various aspects (in terms of several different effects that GC can
have on various objects).

> >> As we agreed, code should be written to assume GC can strike at any
> >> time.
> >
> > I don't think we agreed to that.  At least I didn't, not in this
> > general form.  It would be a huge job to make all of our code comply
> > with this.
> 
> I said "That's what you should think: GC can strike at any time", and
> you said "The same is true with the old GC".  I misread that as
> agreement.

I only meant that MPS is not very different from the old GC in this
regard, that's all.

> >> IIUC, Gerd explained that the old GC can still move the string *data*
> >> used in that structure, even if the string metadata stays in place.
> >
> > If string data is moved, then accessing the old pointer would trigger
> > the barrier and MPS will do its thing, not?
> 
> Sorry, but I think I'm confused here.
> 
> IIUC, MPS doesn't currently use barriers on data that is moved (it
> could, so the data is copied lazily, but I don't think that's what you
> meant), it uses barriers on data that contains references that may not
> have been fixed.

The safe thing is to re-initialize the pointer from the string
whenever we think GC could happen, and otherwise make sure GC cannot
happen.

> If a pointer to "old" data is ever exposed to Emacs, we lose, because
> MPS will reuse the memory for new data, which might be behind a barrier.
> 
> If we ever do:
> 
>    static Lisp_Object unmarked;
>    unmarked = string;
>    ... trigger GC here ...
>    puts (SDATA (unmarked);
> 
> the most likely outcome (thanks to Gerd for the hint) is that
> nonsensical data is printed

Are you sure?  AFAIU, referencing 'unmarked' is safe, even after GC.
The below is indeed unsafe:

  char *p = SDATA (unmarked);
  ... trigger GC here ...
  puts (p);

> > To clarify, I was trying to understand whether the error message
> > reported by Ihor in another thread could have happened because of GC
> > in this are of the code.
> 
> I currently think that Ihor's test case calls execve () with nonsensical
> "environment" strings a lot, and once in a while they'll even be behind
> the barrier which causes an EFAULT.

Before we agree that this could be the reason, we'd need to find the
places where GC could reuse the memory of a live string, while that
string appears in some live data structure, and as long as no GC can
happen between the time we put the SDATA pointers into the environment
array and the time posix_spawn returns.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 15:34             ` Gerd Möllmann
@ 2025-01-04 18:19               ` Eli Zaretskii
  2025-01-04 18:35                 ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04 18:19 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  75322@debbugs.gnu.org
> Date: Sat, 04 Jan 2025 16:34:15 +0100
> 
> I'm entering a state of confusion again, as usual in this discussion.
> Can we _pretty please_ just do the xstrdup thing, and forget about it?

We could do that, but what does this mean for our protocol of using
data from Lisp objects in libc and system calls?  Does it mean we
cannot use SAFE_ALLOCA?  Does it mean we must always xstrdup every
string and xmalloc every other object before calling some system API?
Are Lisp strings safe when GC can strike, or aren't they?  What about
Lisp vectors?  Etc. etc.

We must figure out what are the safe and sound rules for doing this
with MPS, like we had with the old GC.  Otherwise, we will be unable
to write correct and sound code, and we'll be unable to audit code
written by others.

Right now, it doesn't seem to me like we have a clear idea of what's
permitted and what's unsafe.  We are still arguing whether GC moves
Lisp strings and what exactly does that mean.  We still don't
understand well enough what, if anything, are the problems with
SAFE_ALLOCA and its ilk.  We just established that ENCODE_FILE and
ENCODE_SYSTEM can trigger GC, and didn't yet review the affected code.
And there are many more places and calls to consider (e.g., do you
know that redisplay could trigger GC when it calls character
composition?).

If we don't consider this stuff, if we just "do the xstrdup thing and
forget about it", how can we continue cleaning up the igc branch and
making its code more stable and reliable?  It doesn't sound wise to
me.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 18:19               ` Eli Zaretskii
@ 2025-01-04 18:35                 ` Gerd Möllmann
  2025-01-04 19:10                   ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04 18:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  75322@debbugs.gnu.org
>> Date: Sat, 04 Jan 2025 16:34:15 +0100
>> 
>> I'm entering a state of confusion again, as usual in this discussion.
>> Can we _pretty please_ just do the xstrdup thing, and forget about it?
>
> We could do that, but what does this mean for our protocol of using
> data from Lisp objects in libc and system calls?  Does it mean we
> cannot use SAFE_ALLOCA?  Does it mean we must always xstrdup every
> string and xmalloc every other object before calling some system API?
> Are Lisp strings safe when GC can strike, or aren't they?  What about
> Lisp vectors?  Etc. etc.
>
> We must figure out what are the safe and sound rules for doing this
> with MPS, like we had with the old GC.  Otherwise, we will be unable
> to write correct and sound code, and we'll be unable to audit code
> written by others.
>
> Right now, it doesn't seem to me like we have a clear idea of what's
> permitted and what's unsafe.  

I'd say I have a clear idea.

> We are still arguing whether GC moves Lisp strings and what exactly
> does that mean. We still don't understand well enough what, if
> anything, are the problems with SAFE_ALLOCA and its ilk. 

I can't believe you say that. We talked about why xmalloc'd memory
has to be a root if it contains references. SAFE_NALLOCA uses xnmalloc.
Safe_ALLOCA_LISP does things differently.

> We just established that ENCODE_FILE and ENCODE_SYSTEM can trigger GC,
> and didn't yet review the affected code. And there are many more
> places and calls to consider (e.g., do you know that redisplay could
> trigger GC when it calls character composition?).
>
> If we don't consider this stuff, if we just "do the xstrdup thing and
> forget about it", how can we continue cleaning up the igc branch and
> making its code more stable and reliable?  It doesn't sound wise to
> me.

I'll bow out of this discussion, sorry. This time for real :-).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 18:35                 ` Gerd Möllmann
@ 2025-01-04 19:10                   ` Eli Zaretskii
  2025-01-04 19:24                     ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04 19:10 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sat, 04 Jan 2025 19:35:04 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > We are still arguing whether GC moves Lisp strings and what exactly
> > does that mean. We still don't understand well enough what, if
> > anything, are the problems with SAFE_ALLOCA and its ilk. 
> 
> I can't believe you say that. We talked about why xmalloc'd memory
> has to be a root if it contains references. SAFE_NALLOCA uses xnmalloc.
> Safe_ALLOCA_LISP does things differently.

Sorry, I don't have a clear idea what that means.  When can we use
SAFE_NALLOCA and friends? when can we or must we use SAFE_ALLOCA_LISP?
What are the considerations? etc. etc.

I believe you that you have a clear idea, but we need all of us have a
clear idea, and we should probably write that up somewhere once we do.

> > We just established that ENCODE_FILE and ENCODE_SYSTEM can trigger GC,
> > and didn't yet review the affected code. And there are many more
> > places and calls to consider (e.g., do you know that redisplay could
> > trigger GC when it calls character composition?).
> >
> > If we don't consider this stuff, if we just "do the xstrdup thing and
> > forget about it", how can we continue cleaning up the igc branch and
> > making its code more stable and reliable?  It doesn't sound wise to
> > me.
> 
> I'll bow out of this discussion, sorry. This time for real :-).

I'm sorry to hear that, because I think it's important to have you on
board in this discussion.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 19:10                   ` Eli Zaretskii
@ 2025-01-04 19:24                     ` Gerd Möllmann
  0 siblings, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-04 19:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sat, 04 Jan 2025 19:35:04 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > We are still arguing whether GC moves Lisp strings and what exactly
>> > does that mean. We still don't understand well enough what, if
>> > anything, are the problems with SAFE_ALLOCA and its ilk. 
>> 
>> I can't believe you say that. We talked about why xmalloc'd memory
>> has to be a root if it contains references. SAFE_NALLOCA uses xnmalloc.
>> Safe_ALLOCA_LISP does things differently.
>
> Sorry, I don't have a clear idea what that means.  When can we use
> SAFE_NALLOCA and friends? when can we or must we use SAFE_ALLOCA_LISP?
> What are the considerations? etc. etc.

I tried to explain that in igc.org, e.g. what roots are, or malloc in
one section. That's the best I can come up with.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 18:02             ` Eli Zaretskii
@ 2025-01-04 19:32               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-04 20:31                 ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 19:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Sat, 04 Jan 2025 15:26:01 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> > AFAICT, expand-file-name is called before we start using the SSDATA of
>> > strings in the args[] array.  Or what did I miss?
>>
>> You're right, thanks.  I got confused between args and new_argv.
>>
>> The next thing I'd look at is the final call to ENCODE_FILE, called
>> after new_argv is set up; this ends up in encode_coding_object, which
>> calls safe_funcall in what seems to me to be an unlikely but possible
>> code path.  I assume that's unsafe (the safe_ refers to redisplay, not
>> GC, IIUC)?
>
> ENCODE_FILE can indeed trigger GC in rare cases, but I see only one
> such call:
>
>   path = ENCODE_FILE (path);
>
> We could simply move this to before the loop that sets up new_argv[].

Fixing the current code for this super-rare case would be good, but my
point was that we cannot prove the current code to be correct; it isn't,
technically speaking, even though it's battle-worn.

>
>> While maybe_quit is "safe" because it inhibits GC, I believe it can
>> still call the debugger, which might require more memory than is
>> available until GC is re-enabled.
>
> maybe_quit is not the problem, the problem AFAIU is that any encoding
> could have pre-write-conversion function written in Lisp.

Agreed, not the problem here.

>> >> Yes, make_environment_block does say its callers can't run GC, but
>> >> call_process doesn't indicate when and how it establishes a no-GC
>> >> assumption.
>> >
>> > What would be needed to indicate that?
>>
>> I'd prefer a pair of macros (no-ops in regular builds) to comments, but
>> there is no obvious best solution here.
>
> Sorry, I don't understand: why macros?  Do we use something like that
> anywhere else in Emacs?

How is it different from other easserts?  emacs_spawn has

  eassert (input_blocked_p ());

static_assert (NIL_IS_ZERO) was intended to be a similarly proactive
macro marking the code making that assumption, but that didn't work out.

>> My proposal would be to remove most (ideally, all, and then we're done)
>> no-GC assumptions, and put the few ones that must remain into separate
>> function bodies for the no-GC / possible-GC cases.  Then we can put the
>> no-GC function bodies into a separate section and prohibit references
>> from no-GC to possible-GC functions, and have the linker check that.
>
> First, the techniques that rely on separate segments are non-portable,
> so not all platforms will support them.  More importantly, I'm afraid

Absolutely, debug builds on the main platforms can, though.

> the amount of code where we currently don't expect GC is much more
> than you seem to imagine, so I don't think the above is practical.

That's why I want to remove most non-GC assumptions first.  I gave up
the last time I tried doing this, FWIW, for the reason you describe.

> In any case, the starting point is to audit all the places where GC
> can happen, and that needs to be done anyway if we want to do it
> thoroughly and proactively (as opposed to only when someone reports a
> bug).

I don't see how putting a few macros in while we're there could hurt :-)

> We can delay the decision what to do with these places to
> later, when we understand better the magnitude of the problem and its
> various aspects (in terms of several different effects that GC can
> have on various objects).

Agreed, I won't post a patch for now.

>> >> As we agreed, code should be written to assume GC can strike at any
>> >> time.
>> >
>> > I don't think we agreed to that.  At least I didn't, not in this
>> > general form.  It would be a huge job to make all of our code comply
>> > with this.
>>
>> I said "That's what you should think: GC can strike at any time", and
>> you said "The same is true with the old GC".  I misread that as
>> agreement.
>
> I only meant that MPS is not very different from the old GC in this
> regard, that's all.

I think I understand now, thanks!

>> >> IIUC, Gerd explained that the old GC can still move the string *data*
>> >> used in that structure, even if the string metadata stays in place.
>> >
>> > If string data is moved, then accessing the old pointer would trigger
>> > the barrier and MPS will do its thing, not?
>>
>> Sorry, but I think I'm confused here.
>>
>> IIUC, MPS doesn't currently use barriers on data that is moved (it
>> could, so the data is copied lazily, but I don't think that's what you
>> meant), it uses barriers on data that contains references that may not
>> have been fixed.
>
> The safe thing is to re-initialize the pointer from the string
> whenever we think GC could happen, and otherwise make sure GC cannot
> happen.

For the old GC, yes.  For the new GC, the safe thing would be to ensure
MPS has removed all memory barriers, take the arena lock, then call the
syscall and return.  Or use xstrdup.

>
>> If a pointer to "old" data is ever exposed to Emacs, we lose, because
>> MPS will reuse the memory for new data, which might be behind a barrier.
>>
>> If we ever do:
>>
>>    static Lisp_Object unmarked;
      ^^^^^^
>>    unmarked = string;
>>    ... trigger GC here ...
>>    puts (SDATA (unmarked);
>>
>> the most likely outcome (thanks to Gerd for the hint) is that
>> nonsensical data is printed
>
> Are you sure?

With the static keyword, yes.  (Assuming we don't staticpro the static
variable, of course).

> The below is indeed unsafe:
>
>   char *p = SDATA (unmarked);
>   ... trigger GC here ...
>   puts (p);

Right now, that's safe for MPS, but not for the old GC, correct?

>> > To clarify, I was trying to understand whether the error message
>> > reported by Ihor in another thread could have happened because of GC
>> > in this are of the code.
>>
>> I currently think that Ihor's test case calls execve () with nonsensical
>> "environment" strings a lot, and once in a while they'll even be behind
>> the barrier which causes an EFAULT.
>
> Before we agree that this could be the reason, we'd need to find the
> places where GC could reuse the memory of a live string, while that
> string appears in some live data structure, and as long as no GC can
> happen between the time we put the SDATA pointers into the environment
> array and the time posix_spawn returns.

Calling igc_collect () right before the spawn results in corrupt data:

$ gdb --ar ./src/emacs -Q --batch --eval '(while t (with-temp-buffer
(shell-command "/usr/bin/env" t) (message "%S" (buffer-string))))'


(gdb) b posix_spawn
Breakpoint 1 at 0x535e0
(gdb) r
Starting program: ...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1.1, 0x00007ffff5c2b9c0 in posix_spawn () from /lib64/libc.so.6
(gdb) p igc_collect ()
$1 = void
(gdb) p env[1]
No symbol "env" in current context.
(gdb) up
#1  0x00005555558a805d in emacs_spawn (newpid=0x7ffffffeb148, std_in=4, std_out=6, std_err=6, argv=0x7ffffffeb0f0, envp=0x5555560bbae0, 
    cwd=0x7fffe3acb0c8 "/home/pip/emacs-forge/emacs", pty_name=0x0, pty_in=false, pty_out=false, oldset=0x7ffffffeb280) at callproc.c:1490
1490	      vfork_error = posix_spawn (&pid, argv[0], &actions, &attributes,
(gdb) p envp[1]
$2 = 0x7fffe3a5ded0 "\bD\034\344\377\177"
(gdb) c
Continuing.
[Detaching after vfork from child process 5771]
"PWD=/home/pip/emacs-forge/emacs
SHLVL=0
_=/usr/bin/env
"

The old pointer pointed to an AMCZ pool (no barriers), but GC might have
reassigned the memory to an AMC pool (with barriers), which would cause
the EFAULT (and explain why it's so rare).

Hardcoding the assumption that string data is never behind a memory
barrier is a bad idea.  We might want to change it back to an AMC pool
(I've done so in local debug builds so I could go back from string data
to the string metadata).

So the only mystery left is what causes GC to be called on current
scratch/igc after the environment block has been created; we know we
want to change the code so GC can be triggered from another thread, but
it's failing now, and there are no other threads yet!

My current theory is that igc_on_grow_specpdl (called indirectly from
here:

  /* Do the unwind-protect now, even though the pid is not known, so
     that no storage allocation is done in the critical section.
     The actual PID will be filled in during the critical section.  */
  record_unwind_protect (call_process_cleanup, Fcurrent_buffer ());

) releases the arena, and MPS uses that apparent opportunity to call
ArenaPoll which might do GC things.

But, again, we want to make this code safe for GC at any time.

(We might be able to delay mps_arena_release until the next maybe_quit,
I guess)

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 19:32               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-04 20:31                 ` Eli Zaretskii
  2025-01-04 21:15                   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-04 20:31 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, 75322

> Date: Sat, 04 Jan 2025 19:32:01 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
> 
> > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one
> > such call:
> >
> >   path = ENCODE_FILE (path);
> >
> > We could simply move this to before the loop that sets up new_argv[].
> 
> Fixing the current code for this super-rare case would be good, but my
> point was that we cannot prove the current code to be correct; it isn't,
> technically speaking, even though it's battle-worn.

I want to be pragmatic and solve practical problems in my life time.
Proving that the code is correct I leave to the next generation.

> >> >> Yes, make_environment_block does say its callers can't run GC, but
> >> >> call_process doesn't indicate when and how it establishes a no-GC
> >> >> assumption.
> >> >
> >> > What would be needed to indicate that?
> >>
> >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but
> >> there is no obvious best solution here.
> >
> > Sorry, I don't understand: why macros?  Do we use something like that
> > anywhere else in Emacs?
> 
> How is it different from other easserts?  emacs_spawn has
> 
>   eassert (input_blocked_p ());

How do you eassert something that should not happen during some code
fragment?  If you have something specific in mind, do show the code.

> >> My proposal would be to remove most (ideally, all, and then we're done)
> >> no-GC assumptions, and put the few ones that must remain into separate
> >> function bodies for the no-GC / possible-GC cases.  Then we can put the
> >> no-GC function bodies into a separate section and prohibit references
> >> from no-GC to possible-GC functions, and have the linker check that.
> >
> > First, the techniques that rely on separate segments are non-portable,
> > so not all platforms will support them.  More importantly, I'm afraid
> 
> Absolutely, debug builds on the main platforms can, though.

But then important code that is #ifdef SOME-PLATFORM cannot be handled
like that, and we are back at square one, because users of "non-main
platforms" still report bugs and we try to investigate and fix them,
so we still need to be able to solve this when separate segments are
not available.

> > the amount of code where we currently don't expect GC is much more
> > than you seem to imagine, so I don't think the above is practical.
> 
> That's why I want to remove most non-GC assumptions first.  I gave up
> the last time I tried doing this, FWIW, for the reason you describe.
> 
> > In any case, the starting point is to audit all the places where GC
> > can happen, and that needs to be done anyway if we want to do it
> > thoroughly and proactively (as opposed to only when someone reports a
> > bug).
> 
> I don't see how putting a few macros in while we're there could hurt :-)

We neither saw any macros yet nor have any idea how many of them will
be needed.  So I don't know why you think it's just a few macros.  We
are still discussing a tiny static function and haven't arrived at an
agreed and safe solution.  Let's be a bit more practical here, okay?

> > The safe thing is to re-initialize the pointer from the string
> > whenever we think GC could happen, and otherwise make sure GC cannot
> > happen.
> 
> For the old GC, yes.  For the new GC, the safe thing would be to ensure
> MPS has removed all memory barriers, take the arena lock, then call the
> syscall and return.  Or use xstrdup.

If this is indeed so (and I don't think it is), then we need to
discuss it very thoroughly, because it basically means we cannot do
anything with Lisp strings in C.  For example, the display iterator
has a member that is a Lisp string, which is used when we produce
glyphs for display from a string (such as a display property or an
overlay before/after string) -- what you say here basically means that
we cannot carry that string around, right?  If not, how is that
different?

> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because
> >> MPS will reuse the memory for new data, which might be behind a barrier.
> >>
> >> If we ever do:
> >>
> >>    static Lisp_Object unmarked;
>       ^^^^^^
> >>    unmarked = string;
> >>    ... trigger GC here ...
> >>    puts (SDATA (unmarked);
> >>
> >> the most likely outcome (thanks to Gerd for the hint) is that
> >> nonsensical data is printed
> >
> > Are you sure?
> 
> With the static keyword, yes.  (Assuming we don't staticpro the static
> variable, of course).

What does static have to do with this?  What matters is the value, not
the storage.  The value comes from 'string', a different variable.  It
points to string data, and it's that string data that is of interest
to us in the above snippet.

> > The below is indeed unsafe:
> >
> >   char *p = SDATA (unmarked);
> >   ... trigger GC here ...
> >   puts (p);
> 
> Right now, that's safe for MPS, but not for the old GC, correct?

If GC moves string data, then it is not safe, period.  Does MPS move
string data?

> >> > To clarify, I was trying to understand whether the error message
> >> > reported by Ihor in another thread could have happened because of GC
> >> > in this are of the code.
> >>
> >> I currently think that Ihor's test case calls execve () with nonsensical
> >> "environment" strings a lot, and once in a while they'll even be behind
> >> the barrier which causes an EFAULT.
> >
> > Before we agree that this could be the reason, we'd need to find the
> > places where GC could reuse the memory of a live string, while that
> > string appears in some live data structure, and as long as no GC can
> > happen between the time we put the SDATA pointers into the environment
> > array and the time posix_spawn returns.
> 
> Calling igc_collect () right before the spawn results in corrupt data:

But the code doesn't call igc_collect, so how is this relevant?
Please see how I described the situation in the paragraph above.

> So the only mystery left is what causes GC to be called on current
> scratch/igc after the environment block has been created;

That's the _only_ mystery, and only if that in fact happens.  Someone
should explain how GC _can_ happen in that case on the igc branch.  If
you can, please do, and let's stop arguing about theoretical
possibilities.

> My current theory is that igc_on_grow_specpdl (called indirectly from
> here:
> 
>   /* Do the unwind-protect now, even though the pid is not known, so
>      that no storage allocation is done in the critical section.
>      The actual PID will be filled in during the critical section.  */
>   record_unwind_protect (call_process_cleanup, Fcurrent_buffer ());
> 
> ) releases the arena, and MPS uses that apparent opportunity to call
> ArenaPoll which might do GC things.

Evidence, please.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 20:31                 ` Eli Zaretskii
@ 2025-01-04 21:15                   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-05  8:23                     ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 21:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:

>> > The safe thing is to re-initialize the pointer from the string
>> > whenever we think GC could happen, and otherwise make sure GC cannot
>> > happen.
>>
>> For the old GC, yes.  For the new GC, the safe thing would be to ensure
>> MPS has removed all memory barriers, take the arena lock, then call the
>> syscall and return.  Or use xstrdup.
>
> If this is indeed so (and I don't think it is), then we need to

Which part do you think is wrong?

> discuss it very thoroughly, because it basically means we cannot do
> anything with Lisp strings in C.  For example, the display iterator
> has a member that is a Lisp string, which is used when we produce
> glyphs for display from a string (such as a display property or an
> overlay before/after string) -- what you say here basically means that
> we cannot carry that string around, right?  If not, how is that
> different?

Of course we can use Lisp strings.  As long as there's an automatic
variable pointing to the string data, it'll stay there.  If there's a
static variable pointing to the string data, it might move, but then the
static variable will be updated.

>> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because
>> >> MPS will reuse the memory for new data, which might be behind a barrier.
>> >>
>> >> If we ever do:
>> >>
>> >>    static Lisp_Object unmarked;
>>       ^^^^^^
>> >>    unmarked = string;
>> >>    ... trigger GC here ...
>> >>    puts (SDATA (unmarked);
>> >>
>> >> the most likely outcome (thanks to Gerd for the hint) is that
>> >> nonsensical data is printed
>> >
>> > Are you sure?
>>
>> With the static keyword, yes.  (Assuming we don't staticpro the static
>> variable, of course).
>
> What does static have to do with this?

What matters is whether the value is visible to the garbage collector
(in which case it remains a valid pointer) or isn't (in which case the
memory it points to is used for something else).

Automatic variables, residing on the stack or residing in a register
(which is then spilled to the stack) protect the memory they point to.
Static variables don't unless we tell GC about them.

> What matters is the value, not the storage.

I have no idea what you mean by that.  The value of the variable is a
tagged pointer.  It won't change during GC, because GC never alters
automatic variables.  The question is whether this pointer still points
to the right data area after GC.  Unless there happens to be another
ambiguous reference to the string data (which means MPS cannot move the
string data, because it cannot alter ambiguous references), an
unprotected static Lisp_Object will most likely point to invalid data
after MPS GC runs.

> The value comes from 'string', a different variable.  It

Which might be in a register and not survive until GC is triggered.  Or
it might be a global/static variable which is in an exact root, which
means the data can be moved, 'string' will be updated, 'unmarked' won't.

> points to string data, and it's that string data that is of interest
> to us in the above snippet.

>> > The below is indeed unsafe:
>> >
>> >   char *p = SDATA (unmarked);
>> >   ... trigger GC here ...
>> >   puts (p);
>>
>> Right now, that's safe for MPS, but not for the old GC, correct?
>
> If GC moves string data, then it is not safe, period.  Does MPS move
> string data?

MPS does not move string data if there is a stack variable pointing to
it.  It does in other cases.  This is why it's safe for MPS.  The old
GC, IIUC, is less forgiving.

>> >> > To clarify, I was trying to understand whether the error message
>> >> > reported by Ihor in another thread could have happened because of GC
>> >> > in this are of the code.
>> >>
>> >> I currently think that Ihor's test case calls execve () with nonsensical
>> >> "environment" strings a lot, and once in a while they'll even be behind
>> >> the barrier which causes an EFAULT.
>> >
>> > Before we agree that this could be the reason, we'd need to find the
>> > places where GC could reuse the memory of a live string, while that
>> > string appears in some live data structure, and as long as no GC can
>> > happen between the time we put the SDATA pointers into the environment
>> > array and the time posix_spawn returns.
>>
>> Calling igc_collect () right before the spawn results in corrupt data:
>
> But the code doesn't call igc_collect, so how is this relevant?

This is all that is relevant to establishing the current code is not
safe for the intended behavior of scratch/igc, which is to allow garbage
collection at this point, equivalent to setting a breakpoint there and
calling igc_collect().

That we have a specific code path which ends up in MPS code is a bonus.

I honestly don't know whether hitting a memory barrier can result in
other data being moved by MPS right now.  I'm assuming it can, and we
should not assume string data is protected from being moved unless we
root it explicitly.

(Just answering the GC questions for now, sorry).

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-03 19:55 ` Gerd Möllmann
@ 2025-01-05  6:32 ` Gerd Möllmann
  2025-01-05  6:59   ` Gerd Möllmann
  2025-01-05  7:48   ` Eli Zaretskii
  1 sibling, 2 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05  6:32 UTC (permalink / raw)
  To: Pip Cet; +Cc: 75322

Just want to add that all SAFE_NALLOC uses should be checked in
scratch/igc. For example, set_overlays_multibyte uses it to store
itree_node *, and itree_node is in MPS.

I think I'll just make it allocate a root in my Emacs. That's the least
work.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  6:32 ` Gerd Möllmann
@ 2025-01-05  6:59   ` Gerd Möllmann
  2025-01-05 10:21     ` Eli Zaretskii
  2025-01-05  7:48   ` Eli Zaretskii
  1 sibling, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05  6:59 UTC (permalink / raw)
  To: Pip Cet; +Cc: 75322

[-- Attachment #1: Type: text/plain, Size: 414 bytes --]

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Just want to add that all SAFE_NALLOC uses should be checked in
> scratch/igc. For example, set_overlays_multibyte uses it to store
> itree_node *, and itree_node is in MPS.
>
> I think I'll just make it allocate a root in my Emacs. That's the least
> work.

Please find attached what I'm using now in my Emacs, in addition to your
xstrdup commit.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Don-t-use-SAFE_NALLOCA-for-Lisp_Objects.patch --]
[-- Type: text/x-patch, Size: 1639 bytes --]

From c364d3199a1b3b1a9ec0f9afe421110d46a77769 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerd=20M=C3=B6llmann?= <gerd@gnu.org>
Date: Sat, 4 Jan 2025 19:42:50 +0100
Subject: [PATCH 1/2] Don't use SAFE_NALLOCA for Lisp_Objects

* src/callproc.c (call_process, create_temp_file): Use SAFE_ALLOCA_LISP
instead of SAFE_NALLCA.
---
 src/callproc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/callproc.c b/src/callproc.c
index 459ddbeb4f3..6f358f939de 100644
--- a/src/callproc.c
+++ b/src/callproc.c
@@ -423,7 +423,7 @@ call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd,
 	  val = Qraw_text;
 	else
 	  {
-	    SAFE_NALLOCA (args2, 1, nargs + 1);
+	    SAFE_ALLOCA_LISP (args2, nargs + 1);
 	    args2[0] = Qcall_process;
 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
@@ -741,7 +741,7 @@ call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd,
 	    {
 	      ptrdiff_t i;
 
-	      SAFE_NALLOCA (args2, 1, nargs + 1);
+	      SAFE_ALLOCA_LISP (args2, nargs + 1);
 	      args2[0] = Qcall_process;
 	      for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
 	      coding_systems
@@ -1033,7 +1033,7 @@ create_temp_file (ptrdiff_t nargs, Lisp_Object *args,
       Lisp_Object coding_systems;
       Lisp_Object *args2;
       USE_SAFE_ALLOCA;
-      SAFE_NALLOCA (args2, 1, nargs + 1);
+      SAFE_ALLOCA_LISP (args2, nargs + 1);
       args2[0] = Qcall_process_region;
       memcpy (args2 + 1, args, nargs * sizeof *args);
       coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
-- 
2.47.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-Make-SAFE_NALLOCA-allocate-an-ambiguous-root.patch --]
[-- Type: text/x-patch, Size: 2523 bytes --]

From 5b0a46390dfb5407cb7585bcbb1b7491d81885ea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerd=20M=C3=B6llmann?= <gerd@gnu.org>
Date: Sun, 5 Jan 2025 07:45:51 +0100
Subject: [PATCH 2/2] Make SAFE_NALLOCA allocate an ambiguous root

* src/igc.c (igc_xnmalloc_ambig): New function.
* src/igc.h: Declare it.
* src/lisp.h (SAFE_NALLOCA): Use it.
---
 src/igc.c  |  6 ++++++
 src/igc.h  |  1 +
 src/lisp.h | 19 +++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/src/igc.c b/src/igc.c
index 54131d5fcf3..10f74c66d8f 100644
--- a/src/igc.c
+++ b/src/igc.c
@@ -3144,6 +3144,12 @@ igc_xzalloc_ambig (size_t size)
   return p;
 }
 
+void *
+igc_xnmalloc_ambig (ptrdiff_t nitems, ptrdiff_t item_size)
+{
+  return igc_xzalloc_ambig (nitems * item_size);
+}
+
 void *
 igc_realloc_ambig (void *block, size_t size)
 {
diff --git a/src/igc.h b/src/igc.h
index 2a9bcda58b2..656a384aacf 100644
--- a/src/igc.h
+++ b/src/igc.h
@@ -86,6 +86,7 @@ #define EMACS_IGC_H
 struct Lisp_Buffer_Local_Value *igc_alloc_blv (void);
 void *igc_alloc_handler (void);
 void *igc_xzalloc_ambig (size_t size);
+void *igc_xnmalloc_ambig (ptrdiff_t nitems, ptrdiff_t item_size);
 void *igc_realloc_ambig (void *block, size_t size);
 void igc_xfree (void *p);
 Lisp_Object *igc_xalloc_lisp_objs_exact (size_t n);
diff --git a/src/lisp.h b/src/lisp.h
index 67b574a67b7..32e2886561c 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -6109,6 +6109,23 @@ #define SAFE_ALLOCA(size) ((size) <= sa_avail				\
    NITEMS items, each of the same type as *BUF.  MULTIPLIER must
    positive.  The code is tuned for MULTIPLIER being a constant.  */
 
+# ifdef HAVE_MPS
+void *igc_xnmalloc_ambig (ptrdiff_t nitems, ptrdiff_t item_size);
+void igc_xfree (void *p);
+
+#define SAFE_NALLOCA(buf, multiplier, nitems)				\
+  do {									\
+    if ((nitems) <= sa_avail / sizeof *(buf) / (multiplier))		\
+      (buf) = AVAIL_ALLOCA (sizeof *(buf) * (multiplier) * (nitems));	\
+    else								\
+      {									\
+	(buf) = igc_xnmalloc_ambig (nitems, sizeof *(buf) * (multiplier)); \
+	record_unwind_protect_ptr (igc_xfree, buf);			\
+      }									\
+  } while (false)
+
+#else
+
 #define SAFE_NALLOCA(buf, multiplier, nitems)			 \
   do {								 \
     if ((nitems) <= sa_avail / sizeof *(buf) / (multiplier))	 \
@@ -6120,6 +6137,8 @@ #define SAFE_NALLOCA(buf, multiplier, nitems)			 \
       }								 \
   } while (false)
 
+#endif
+
 /* SAFE_ALLOCA_STRING allocates a C copy of a Lisp string.  */
 
 #define SAFE_ALLOCA_STRING(ptr, string)			\
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  6:32 ` Gerd Möllmann
  2025-01-05  6:59   ` Gerd Möllmann
@ 2025-01-05  7:48   ` Eli Zaretskii
  2025-01-05  8:19     ` Gerd Möllmann
  1 sibling, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05  7:48 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> Cc: 75322@debbugs.gnu.org
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Date: Sun, 05 Jan 2025 07:32:46 +0100
> 
> Just want to add that all SAFE_NALLOC uses should be checked in
> scratch/igc. For example, set_overlays_multibyte uses it to store
> itree_node *, and itree_node is in MPS.

Can you tell more about what should be checked and how?  I'm probably
missing some details of the igc build, because I cannot translate what
you say above into practical actions of auditing the code.

Thanks.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  7:48   ` Eli Zaretskii
@ 2025-01-05  8:19     ` Gerd Möllmann
  2025-01-05 10:33       ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05  8:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 75322@debbugs.gnu.org
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Date: Sun, 05 Jan 2025 07:32:46 +0100
>> 
>> Just want to add that all SAFE_NALLOC uses should be checked in
>> scratch/igc. For example, set_overlays_multibyte uses it to store
>> itree_node *, and itree_node is in MPS.
>
> Can you tell more about what should be checked and how?  I'm probably
> missing some details of the igc build, because I cannot translate what
> you say above into practical actions of auditing the code.
>
> Thanks.

I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored
in the memory allocated. If that is a reference to MPS-allocated memory
(a pointer or Lisp_Object), it should be changed, because it then hides
references from MPS in malloc'd memory. Or can hide, to be more precise,
in the case it doesn't use alloca.

Or alternatively, you could avoid having to do that, and make
SAFE_NALLOCA xmalloc an ambiguous root. A patch for that is in the other
mail I sent. I think the performance impact of that is negligible
because this path is only executed when SAFE_ALLOCA does not use alloca.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 21:15                   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05  8:23                     ` Eli Zaretskii
  2025-01-05  9:04                       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05  8:23 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, 75322

> Date: Sat, 04 Jan 2025 21:15:50 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> > The safe thing is to re-initialize the pointer from the string
> >> > whenever we think GC could happen, and otherwise make sure GC cannot
> >> > happen.
> >>
> >> For the old GC, yes.  For the new GC, the safe thing would be to ensure
> >> MPS has removed all memory barriers, take the arena lock, then call the
> >> syscall and return.  Or use xstrdup.
> >
> > If this is indeed so (and I don't think it is), then we need to
> 
> Which part do you think is wrong?

All of what you describe.  We should be able to use without
restrictions local variables whose values are Lisp strings, and we
should be able to use without restrictions static variables whose
values are Lisp strings.  To use the 'char *' pointer to a Lisp
string's data we should make sure GC cannot happen between the time we
extract the pointer using SDATA or SSDATA and the time we end the use
of the pointer.

If you think anything in the above is not correct with MPS, please
tell.  Please only respond with real practical problems, to keep this
discussion concrete and on-topic.

> > discuss it very thoroughly, because it basically means we cannot do
> > anything with Lisp strings in C.  For example, the display iterator
> > has a member that is a Lisp string, which is used when we produce
> > glyphs for display from a string (such as a display property or an
> > overlay before/after string) -- what you say here basically means that
> > we cannot carry that string around, right?  If not, how is that
> > different?
> 
> Of course we can use Lisp strings.  As long as there's an automatic
> variable pointing to the string data, it'll stay there.  If there's a
> static variable pointing to the string data, it might move, but then the
> static variable will be updated.

Then I don't understand why you seem to be saying that our usual use
patterns of Lisp strings and their textual data is unsafe and needs to
be amended by using xstrdup etc.  We do it already as described above.

> >> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because
> >> >> MPS will reuse the memory for new data, which might be behind a barrier.
> >> >>
> >> >> If we ever do:
> >> >>
> >> >>    static Lisp_Object unmarked;
> >>       ^^^^^^
> >> >>    unmarked = string;
> >> >>    ... trigger GC here ...
> >> >>    puts (SDATA (unmarked);
> >> >>
> >> >> the most likely outcome (thanks to Gerd for the hint) is that
> >> >> nonsensical data is printed
> >> >
> >> > Are you sure?
> >>
> >> With the static keyword, yes.  (Assuming we don't staticpro the static
> >> variable, of course).
> >
> > What does static have to do with this?
> 
> What matters is whether the value is visible to the garbage collector
> (in which case it remains a valid pointer) or isn't (in which case the
> memory it points to is used for something else).
> 
> Automatic variables, residing on the stack or residing in a register
> (which is then spilled to the stack) protect the memory they point to.
> Static variables don't unless we tell GC about them.
> 
> > What matters is the value, not the storage.
> 
> I have no idea what you mean by that.  The value of the variable is a
> tagged pointer.  It won't change during GC, because GC never alters
> automatic variables.  The question is whether this pointer still points
> to the right data area after GC.

The pointer taken before GC might not point to the string's data
after GC.  Which is why, if we know GC could happen between two points
in a program's code, we cannot use the same 'char *' pointer there;
instead, we must recompute the pointer (by using SDATA) after the
possible GC.

> > The value comes from 'string', a different variable.  It
> 
> Which might be in a register and not survive until GC is triggered.

A Lisp_Object variable will survive.  Its pointer will be updated if
needed to point to the new location of the data.  Thus, using the
Lisp_Object variable is always safe, but the pointer to its data must
be updated after a potential GC.

> >> > The below is indeed unsafe:
> >> >
> >> >   char *p = SDATA (unmarked);
> >> >   ... trigger GC here ...
> >> >   puts (p);
> >>
> >> Right now, that's safe for MPS, but not for the old GC, correct?
> >
> > If GC moves string data, then it is not safe, period.  Does MPS move
> > string data?
> 
> MPS does not move string data if there is a stack variable pointing to
> it.  It does in other cases.  This is why it's safe for MPS.  The old
> GC, IIUC, is less forgiving.

The conclusion is that the above is NOT safe, because in some cases
the data could move.  Which was what I said from the get-go.

> >> > Before we agree that this could be the reason, we'd need to find the
> >> > places where GC could reuse the memory of a live string, while that
> >> > string appears in some live data structure, and as long as no GC can
> >> > happen between the time we put the SDATA pointers into the environment
> >> > array and the time posix_spawn returns.
> >>
> >> Calling igc_collect () right before the spawn results in corrupt data:
> >
> > But the code doesn't call igc_collect, so how is this relevant?
> 
> This is all that is relevant to establishing the current code is not
> safe for the intended behavior of scratch/igc, which is to allow garbage
> collection at this point, equivalent to setting a breakpoint there and
> calling igc_collect().

Why would we allow GC at some point where the code clearly does NOT
expect GC to happen?

We are discussing whether some code in callproc.c needs to be changed
or is okay as it is.  If it is okay because GC cannot possibly happen
while some code fragment is being executed, then forcefully injecting
a call to GC there is pointless, because it changes the conditions
under which this code normally runs.  So please keep such arguments
out of this discussion, they just muddy the waters and are not useful.

> I honestly don't know whether hitting a memory barrier can result in
> other data being moved by MPS right now.  I'm assuming it can, and we
> should not assume string data is protected from being moved unless we
> root it explicitly.

We should not assume anything, we should have a clear understanding
what does and what doesn't move the data and the pointers.  We cannot
possibly analyze the code otherwise, or at least I cannot.

To keep this discussion focused, let's please return to looking at
some particular code in callproc.c where you think SAFE_ALLOCA is
unsafe.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  8:23                     ` Eli Zaretskii
@ 2025-01-05  9:04                       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-05  9:32                         ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05  9:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:
> we should be able to use without restrictions static variables whose
> values are Lisp strings.

Without staticpro?  That makes no sense whatsoever to me.  It's not true
for either garbage collector.  Do you expect MPS to scan the entire
data/bss segment ambiguously?

>> Which might be in a register and not survive until GC is triggered.
>
> A Lisp_Object variable will survive.  Its pointer will be updated if
> needed to point to the new location of the data.  Thus, using the

MPS never changes the value of an automatic variable.

> Lisp_Object variable is always safe, but the pointer to its data must
> be updated after a potential GC.
>
>> >> > The below is indeed unsafe:
>> >> >
>> >> >   char *p = SDATA (unmarked);
>> >> >   ... trigger GC here ...
>> >> >   puts (p);
>> >>
>> >> Right now, that's safe for MPS, but not for the old GC, correct?
>> >
>> > If GC moves string data, then it is not safe, period.  Does MPS move
>> > string data?
>>
>> MPS does not move string data if there is a stack variable pointing to
>> it.  It does in other cases.  This is why it's safe for MPS.  The old
>> GC, IIUC, is less forgiving.
>
> The conclusion is that the above is NOT safe, because in some cases
> the data could move.  Which was what I said from the get-go.

The conclusion is incorrect.  The code is safe if MPS is in use, and we
rely on that.

The idea behind MPS is that you write code as above, which is safe when
MPS is in use, rather than attempting to avoid GC, which is fragile at
best (see call_process) and impossible in multi-threaded systems.  We
have to avoid the old GC in the !HAVE_MPS build, but with the MPS
collector, there is no need (and no way) to avoid GC here.

You seem to fundamentally disagree with me about how MPS works.  We need
to resolve that difference one way or another before we can continue any
GC-related discussions.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  9:04                       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05  9:32                         ` Eli Zaretskii
  2025-01-05  9:47                           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05  9:32 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, 75322

> Date: Sun, 05 Jan 2025 09:04:06 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> > we should be able to use without restrictions static variables whose
> > values are Lisp strings.
> 
> Without staticpro?  That makes no sense whatsoever to me.  It's not true
> for either garbage collector.  Do you expect MPS to scan the entire
> data/bss segment ambiguously?

That's not relevant to the issue at hand, which was about the
following snippet:

    static Lisp_Object unmarked;
                       ^^^^^^
    unmarked = string;
    ... trigger GC here ...
    puts (SDATA (unmarked);

That 'unmarked' is a static variable is not relevant, because the
issue is the pointer to the text data of 'string', which value we put
int 'unmarked'.

> >> Which might be in a register and not survive until GC is triggered.
> >
> > A Lisp_Object variable will survive.  Its pointer will be updated if
> > needed to point to the new location of the data.  Thus, using the
> 
> MPS never changes the value of an automatic variable.

Exactly!

> > Lisp_Object variable is always safe, but the pointer to its data must
> > be updated after a potential GC.
> >
> >> >> > The below is indeed unsafe:
> >> >> >
> >> >> >   char *p = SDATA (unmarked);
> >> >> >   ... trigger GC here ...
> >> >> >   puts (p);
> >> >>
> >> >> Right now, that's safe for MPS, but not for the old GC, correct?
> >> >
> >> > If GC moves string data, then it is not safe, period.  Does MPS move
> >> > string data?
> >>
> >> MPS does not move string data if there is a stack variable pointing to
> >> it.  It does in other cases.  This is why it's safe for MPS.  The old
> >> GC, IIUC, is less forgiving.
> >
> > The conclusion is that the above is NOT safe, because in some cases
> > the data could move.  Which was what I said from the get-go.
> 
> The conclusion is incorrect.  The code is safe if MPS is in use, and we
> rely on that.

No, we do NOT, and should NOT, rely on that!  Because you yourself say
above that MPS will move the string data in some cases.

> The idea behind MPS is that you write code as above, which is safe when
> MPS is in use, rather than attempting to avoid GC, which is fragile at
> best (see call_process) and impossible in multi-threaded systems.

I don't suggest to avoid GC.  I suggest that where GC _can_ happen, we
must reinitialize the pointer to string data after GC.

Otherwise, what was all that discussion about what call_process does
with the strings in the args[] array?  If GC cannot change the data
pointer to SDATA, then why should we care about GC in that function?

> You seem to fundamentally disagree with me about how MPS works.  We need
> to resolve that difference one way or another before we can continue any
> GC-related discussions.

I have a much larger issue here: this discussion produces an enormous
number of long messages without actually leading anywhere.  Maybe I'm
too stupid to discuss this, but I cannot keep it going like that, it
eats up all my free time (of which I don't have too much to be gin
with).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  9:32                         ` Eli Zaretskii
@ 2025-01-05  9:47                           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-05 11:04                             ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05  9:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:
>> >> >> > The below is indeed unsafe:
>> >> >> >
>> >> >> >   char *p = SDATA (unmarked);
>> >> >> >   ... trigger GC here ...
>> >> >> >   puts (p);
>> >> >>
>> >> >> Right now, that's safe for MPS, but not for the old GC, correct?
>> >> >
>> >> > If GC moves string data, then it is not safe, period.  Does MPS move
>> >> > string data?
>> >>
>> >> MPS does not move string data if there is a stack variable pointing to
>> >> it.  It does in other cases.  This is why it's safe for MPS.  The old
>> >> GC, IIUC, is less forgiving.
>> >
>> > The conclusion is that the above is NOT safe, because in some cases
>> > the data could move.  Which was what I said from the get-go.
>>
>> The conclusion is incorrect.  The code is safe if MPS is in use, and we
>> rely on that.
>
> No, we do NOT, and should NOT, rely on that!

We do, we should, and we will continue doing so.

> Because you yourself say above that MPS will move the string data in
> some cases.

Not in this case!

>> The idea behind MPS is that you write code as above, which is safe when
>> MPS is in use, rather than attempting to avoid GC, which is fragile at
>> best (see call_process) and impossible in multi-threaded systems.
>
> I don't suggest to avoid GC.  I suggest that where GC _can_ happen, we
> must reinitialize the pointer to string data after GC.

That suggestion is incorrect.  MPS GC can and does happen while we have
SDATA pointers on the stack, and we rely on it to keep those valid.  We
need not reinitialize anything after MPS GC, and we can't do so because
we don't know when and how we are interrupted for MPS GC.

IOW, MPS treats SDATA point the same way it treats Lisp_Objects. That
isn't the problem here.  The problem is that like Lisp_Objects, SDATA
pointers cannot be moved to xmalloc'd memory and retain their validity.

As long as you think MPS requires us to avoid GC in this case (this is
implied by "we must do this or that after GC"), your understanding of
how scratch/igc uses MPS is fundamentally incorrect.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  6:59   ` Gerd Möllmann
@ 2025-01-05 10:21     ` Eli Zaretskii
  2025-01-05 10:30       ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 10:21 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> Cc: 75322@debbugs.gnu.org
> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Date: Sun, 05 Jan 2025 07:59:32 +0100
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > Just want to add that all SAFE_NALLOC uses should be checked in
> > scratch/igc. For example, set_overlays_multibyte uses it to store
> > itree_node *, and itree_node is in MPS.
> >
> > I think I'll just make it allocate a root in my Emacs. That's the least
> > work.
> 
> Please find attached what I'm using now in my Emacs, in addition to your
> xstrdup commit.

These two are alternatives, right?  IOW, we should use one or the
other in each specific case, is that correct?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 10:21     ` Eli Zaretskii
@ 2025-01-05 10:30       ` Gerd Möllmann
  2025-01-05 10:35         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-05 11:29         ` Eli Zaretskii
  0 siblings, 2 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 10:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 75322@debbugs.gnu.org
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Date: Sun, 05 Jan 2025 07:59:32 +0100
>> 
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>> 
>> > Just want to add that all SAFE_NALLOC uses should be checked in
>> > scratch/igc. For example, set_overlays_multibyte uses it to store
>> > itree_node *, and itree_node is in MPS.
>> >
>> > I think I'll just make it allocate a root in my Emacs. That's the least
>> > work.
>> 
>> Please find attached what I'm using now in my Emacs, in addition to your
>> xstrdup commit.
>
> These two are alternatives, right?  IOW, we should use one or the
> other in each specific case, is that correct?

If you mean the two patches I sent with "these two", then no. I prefer
using SAFE_ALLOCA_LISP because that introduces an exact root.

If "these two" means xstrdup and my SAFE_NALLOCA change, then yes, kind
of. It doesn't harm to do both, but it's not strictly necessary to do
xstrdup if the SAFE_NALLOCA change is in. The other way round is not
true, i.e. having the xstrdup change fixes only 1 case.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  8:19     ` Gerd Möllmann
@ 2025-01-05 10:33       ` Eli Zaretskii
  2025-01-05 10:40         ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 10:33 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 09:19:17 +0100
> 
> I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored
> in the memory allocated.

Is only SAFE_NALLOCA a potential problem?  What about the other
members of the SAFE_*ALLOC* family?

> If that is a reference to MPS-allocated memory
> (a pointer or Lisp_Object), it should be changed, because it then hides
> references from MPS in malloc'd memory. Or can hide, to be more precise,
> in the case it doesn't use alloca.

Does using igc_xnmalloc_ambig have any significant adverse effect on
GC?  Like makes it slower or forces it to scan more memory?  Because
if we make SAFE_NALLOCA do this by default, there could be quite a lot
more such ambiguous roots by the time GC starts.

If this could impact performance or has some other adverse effects,
I think I'd prefer to have a new argument to SAFE_NALLOCA telling it
whether to call igc_xnmalloc_ambig, and we will then need to audit our
code to use that argument where what's stored in the memory it
allocates could be a Lisp object.  This is more error-prone, but
performance does count, especially where it comes to GC.

> I think the performance impact of that is negligible because this
> path is only executed when SAFE_ALLOCA does not use alloca.

MAX_ALLOCA is only 16KB (and Paul Eggert says it's dangerous to make
it significantly larger), so I'm not sure your assumption about
negligible performance impact is necessarily correct.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 10:30       ` Gerd Möllmann
@ 2025-01-05 10:35         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-05 10:45           ` Gerd Möllmann
  2025-01-05 11:29         ` Eli Zaretskii
  1 sibling, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 10:35 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, 75322

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> If "these two" means xstrdup and my SAFE_NALLOCA change, then yes, kind
> of. It doesn't harm to do both, but it's not strictly necessary to do
> xstrdup if the SAFE_NALLOCA change is in. The other way round is not

For argv, I agree, but make_environment_block uses xnmalloc and assumes
SDATA pointers in there will survive, which they don't, according to my
GDB log.

It was very brave not to start out by making every xmalloc an ambiguous
root and fixing only those that we know to be safe :-)

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 10:33       ` Eli Zaretskii
@ 2025-01-05 10:40         ` Gerd Möllmann
  2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-06  6:26           ` Gerd Möllmann
  0 siblings, 2 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 10:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 09:19:17 +0100
>> 
>> I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored
>> in the memory allocated.
>
> Is only SAFE_NALLOCA a potential problem?  What about the other
> members of the SAFE_*ALLOC* family?

Haven't checked others yet. I have that on my todo list now.

>
>> If that is a reference to MPS-allocated memory
>> (a pointer or Lisp_Object), it should be changed, because it then hides
>> references from MPS in malloc'd memory. Or can hide, to be more precise,
>> in the case it doesn't use alloca.
>
> Does using igc_xnmalloc_ambig have any significant adverse effect on
> GC?  Like makes it slower or forces it to scan more memory?  Because
> if we make SAFE_NALLOCA do this by default, there could be quite a lot
> more such ambiguous roots by the time GC starts.

More roots = more to scan for GC.

> If this could impact performance or has some other adverse effects,
> I think I'd prefer to have a new argument to SAFE_NALLOCA telling it
> whether to call igc_xnmalloc_ambig, and we will then need to audit our
> code to use that argument where what's stored in the memory it
> allocates could be a Lisp object.  This is more error-prone, but
> performance does count, especially where it comes to GC.
>
>> I think the performance impact of that is negligible because this
>> path is only executed when SAFE_ALLOCA does not use alloca.
>
> MAX_ALLOCA is only 16KB (and Paul Eggert says it's dangerous to make
> it significantly larger), so I'm not sure your assumption about
> negligible performance impact is necessarily correct.

Works well for me so far. 





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 10:35         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 10:45           ` Gerd Möllmann
  0 siblings, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 10:45 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, 75322

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> If "these two" means xstrdup and my SAFE_NALLOCA change, then yes, kind
>> of. It doesn't harm to do both, but it's not strictly necessary to do
>> xstrdup if the SAFE_NALLOCA change is in. The other way round is not
>
> For argv, I agree, but make_environment_block uses xnmalloc and assumes
> SDATA pointers in there will survive, which they don't, according to my
> GDB log.

Ah, okay. Then both xstrdup and SAFE_NALLOCA. Which I have here
so I'm good.

> It was very brave not to start out by making every xmalloc an ambiguous
> root and fixing only those that we know to be safe :-)

The SAFE_ stuff was only 1 out of 6542 things to get going. And I'm
absolutely reckless anyway :-).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05  9:47                           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 11:04                             ` Eli Zaretskii
  2025-01-06 15:54                               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 11:04 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, 75322

> Date: Sun, 05 Jan 2025 09:47:06 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> > I don't suggest to avoid GC.  I suggest that where GC _can_ happen, we
> > must reinitialize the pointer to string data after GC.
> 
> That suggestion is incorrect.  MPS GC can and does happen while we have
> SDATA pointers on the stack, and we rely on it to keep those valid.

Since we've established that MPS works on a single thread, which is
our Lisp thread, then GC cannot happen unless we make some call that
could trigger MPS GC.  Thus, whether or not SSDATA pointers are on the
stack is not relevant to when MPS GC could happen; what matters is the
code that is executed.

> IOW, MPS treats SDATA point the same way it treats Lisp_Objects.

I don't understand what that means.  What is "SDATA point"?  What is
the meaning of "treats the same"?

> The problem is that like Lisp_Objects, SDATA pointers cannot be
> moved to xmalloc'd memory and retain their validity.

No one said they could.  This is trivial.

> As long as you think MPS requires us to avoid GC in this case (this is
> implied by "we must do this or that after GC"), your understanding of
> how scratch/igc uses MPS is fundamentally incorrect.

Maybe my understanding is incorrect, but this style of "discussion",
where you are working hard to prove at all costs that I misunderstand
the issues and to point out my mistakes, contributes nothing at all to
understanding.  It only contributes to confusion.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 10:40         ` Gerd Möllmann
@ 2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-05 11:27             ` Gerd Möllmann
  2025-01-05 11:49             ` Paul Eggert
  2025-01-06  6:26           ` Gerd Möllmann
  1 sibling, 2 replies; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 11:21 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, Paul Eggert, 75322

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>> MAX_ALLOCA is only 16KB (and Paul Eggert says it's dangerous to make
>> it significantly larger), so I'm not sure your assumption about
>> negligible performance impact is necessarily correct.
>
> Works well for me so far.

Paul, this discussion is about changing SAFE_ALLOCA so its memory area
is scanned conservatively for the scratch/igc branch, even if xmalloc is
used (when there isn't enough stack space).

MPS requires this in some cases where the old GC didn't, because it's
enough for the old GC that we know there is a reference to retained
objects elsewhere, but MPS might move the data in that case.

However, given how rare it is for SAFE_ALLOCA to use xmalloc, maybe it
would be a good idea to make the corresponding change for the old GC,
too?

Also, scratch/igc behaves differently with regard to SDATA: references
to string data will keep the string data (but not its metadata) alive.
If scratch/igc is ever merged and people start using it exclusively, we
may start seeing bugs that rely on this additional safety guarantee.

My preference would be to change the traditional GC to also ensure code
like

  char *p = SDATA (string);
  string = Qnil;
  garbage_collect ();
  puts (p);

is safe.  Some memory and performance cost, but no-GC assumptions are
hard to verify.  This may or may not fix bugs in the present code.

A potential example for such a bug is this:

I think that it's possible for call_process to call Lisp via ENCODE_FILE
in a very unusual scenario, which might trigger GC and relocation of
string data which has already been put in the argv and environment
arrays.

However, I don't fully understand which additional protections
traditional GC offers: is it always unsafe to use an SDATA obtained
before a GC run after the GC run completes, or is there a more
complicated set of rules?

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 11:27             ` Gerd Möllmann
  2025-01-05 11:49             ` Paul Eggert
  1 sibling, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 11:27 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, Paul Eggert, 75322

Pip Cet <pipcet@protonmail.com> writes:

> However, I don't fully understand which additional protections
> traditional GC offers: is it always unsafe to use an SDATA obtained
> before a GC run after the GC run completes, or is there a more
> complicated set of rules?

There is this, which is used in the byte interpreter, IIRC, as sort of a
workaround for the problem

#ifndef HAVE_MPS
/* Pin a unibyte string in place so that it won't move during GC.  */
void
pin_string (Lisp_Object string)
{





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 10:30       ` Gerd Möllmann
  2025-01-05 10:35         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 11:29         ` Eli Zaretskii
  2025-01-05 11:37           ` Gerd Möllmann
  1 sibling, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 11:29 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 11:30:11 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Cc: 75322@debbugs.gnu.org
> >> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> >> Date: Sun, 05 Jan 2025 07:59:32 +0100
> >> 
> >> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> >> 
> >> > Just want to add that all SAFE_NALLOC uses should be checked in
> >> > scratch/igc. For example, set_overlays_multibyte uses it to store
> >> > itree_node *, and itree_node is in MPS.
> >> >
> >> > I think I'll just make it allocate a root in my Emacs. That's the least
> >> > work.
> >> 
> >> Please find attached what I'm using now in my Emacs, in addition to your
> >> xstrdup commit.
> >
> > These two are alternatives, right?  IOW, we should use one or the
> > other in each specific case, is that correct?
> 
> If you mean the two patches I sent with "these two", then no. I prefer
> using SAFE_ALLOCA_LISP because that introduces an exact root.

I guess I'm confused, then.  The first patch replaces calls to
SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies
SAFE_NALLOCA to call igc_xnmalloc_ambig.  That's why I thought they
were alternatives.

If they are not alternatives, then why did you replace SAFE_NALLOCA in
the first patch?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 11:29         ` Eli Zaretskii
@ 2025-01-05 11:37           ` Gerd Möllmann
  2025-01-05 12:15             ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 11:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> If you mean the two patches I sent with "these two", then no. I prefer
>> using SAFE_ALLOCA_LISP because that introduces an exact root.
>
> I guess I'm confused, then.  The first patch replaces calls to
> SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies
> SAFE_NALLOCA to call igc_xnmalloc_ambig.  That's why I thought they
> were alternatives.
>
> If they are not alternatives, then why did you replace SAFE_NALLOCA in
> the first patch?

I checked other uses of SAFE_NALLOCA that were not yet mentioned, and
found another problematic case. (Something with struct itree_node *,
don't remember the function name, it's in some other mail). There were
too many grep hits for SAFE_NALLOCA for me, so I shot with a canon :-).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-05 11:27             ` Gerd Möllmann
@ 2025-01-05 11:49             ` Paul Eggert
  1 sibling, 0 replies; 78+ messages in thread
From: Paul Eggert @ 2025-01-05 11:49 UTC (permalink / raw)
  To: Pip Cet, Gerd Möllmann; +Cc: Eli Zaretskii, 75322

On 2025-01-05 03:21, Pip Cet wrote:
> is it always unsafe to use an SDATA obtained
> before a GC run after the GC run completes, or is there a more
> complicated set of rules?

If there's a more complicated set of rules, I don't know them. Gerd's 
citation of pin_string makes sense.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 11:37           ` Gerd Möllmann
@ 2025-01-05 12:15             ` Eli Zaretskii
  2025-01-05 13:21               ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 12:15 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 12:37:42 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> If you mean the two patches I sent with "these two", then no. I prefer
> >> using SAFE_ALLOCA_LISP because that introduces an exact root.
> >
> > I guess I'm confused, then.  The first patch replaces calls to
> > SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies
> > SAFE_NALLOCA to call igc_xnmalloc_ambig.  That's why I thought they
> > were alternatives.
> >
> > If they are not alternatives, then why did you replace SAFE_NALLOCA in
> > the first patch?
> 
> I checked other uses of SAFE_NALLOCA that were not yet mentioned, and
> found another problematic case. (Something with struct itree_node *,
> don't remember the function name, it's in some other mail). There were
> too many grep hits for SAFE_NALLOCA for me, so I shot with a canon :-).

OK.

So can we talk about the relative merits and demerits of using
SAFE_ALLOCA_LISP vs SAFE_NALLOCA?  First, why is it better to have an
exact root than an ambiguous root?  And second, SAFE_ALLOCA_LISP
conses a Lisp vector, which will increase GC pressure, so isn't
SAFE_NALLOCA preferable at least in some cases?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 12:15             ` Eli Zaretskii
@ 2025-01-05 13:21               ` Gerd Möllmann
  2025-01-05 17:31                 ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 13:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 12:37:42 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> If you mean the two patches I sent with "these two", then no. I prefer
>> >> using SAFE_ALLOCA_LISP because that introduces an exact root.
>> >
>> > I guess I'm confused, then.  The first patch replaces calls to
>> > SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies
>> > SAFE_NALLOCA to call igc_xnmalloc_ambig.  That's why I thought they
>> > were alternatives.
>> >
>> > If they are not alternatives, then why did you replace SAFE_NALLOCA in
>> > the first patch?
>> 
>> I checked other uses of SAFE_NALLOCA that were not yet mentioned, and
>> found another problematic case. (Something with struct itree_node *,
>> don't remember the function name, it's in some other mail). There were
>> too many grep hits for SAFE_NALLOCA for me, so I shot with a canon :-).
>
> OK.
>
> So can we talk about the relative merits and demerits of using
> SAFE_ALLOCA_LISP vs SAFE_NALLOCA?  

Let me add a (0): I assume that SAFE_ALLOCA_LISP is the right thing in
the _old_ GC, because it makes sure objects referenced in the xmalloc'd
memory are marked. From my POV, it would require a very good reason to
use something else, which is nowhere mentioned. That's why I suspect
it's a left-over from times where SAFE_ALLOCA_LISP didn't exist.

(And I very much hope it's not the old pattern of "I don't need to GCPRO
this because I know this is already protected because of so-and-so",
which you might still remember from the old times. In which cases I
would've liked to hit people with a GCPRO on their head when I had to
debug that and so-on-so was no longer true  :-).)

> First, why is it better to have an exact root than an ambiguous root?

In the most general case, where an ambiguous root can contain random bit
patterns, say the C stack, I'd say the greatest advantage of exact roots
is avoiding false positives that keep objects alive, or prevent copying
them.

In the specific here case, where a root actually contains only
Lisp_Objects, and not random patterns, I'd say the advantage of exact
roots is that they don't prevent copying.

The "prevent copying" disadvantage is a bit hand-wavy, and depends a lot
on the GC implementation. Maybe a good picture of it that one would like
to have a fully copying collector, with its advantage of reducing
fragmentation, for example, but one can only have a mostly-copying
collector, because of ambiguous references. The more the "mostly" is
true the better for the copying/fragmentation. Does that make sense?

> And second, SAFE_ALLOCA_LISP conses a Lisp vector, which will increase
> GC pressure, so isn't SAFE_NALLOCA preferable at least in some cases?

SAFE_ALLOCA_LISP allocates a Lisp vector, that's true. I think one can
say that allocation is cheap on average. The overhead of freeing it is
not copying it, which is basically zero.

SAFE_NALLOCA, with my patch, requires a xmalloc, creation of a MPS root
object, deletion of that, and xfree.

Let's assume scanning costs are more or less the same because the
number of references is the same in both cases.

I think SAFE_NALLOCA is more expensive,





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-04 10:20               ` Gerd Möllmann
@ 2025-01-05 13:30                 ` Eli Zaretskii
  2025-01-05 14:11                   ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 13:30 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sat, 04 Jan 2025 11:20:41 +0100
> 
> In callproc.c I found two: call_process and create_temp_file both use
> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces
> with SAVE_ALLOCA_LISP.

What are the conditions under which placing Lisp objects into
SAFE_NALLOCA is not safe?

I understand that the first condition is that SAFE_NALLOCA uses
xmalloc instead of alloca.

But what are the other conditions?  Is one of them that GC could
happen while these Lisp objects are in the memory allocated by
SAFE_NALLOCA off the heap?  IOW, if no GC happen, is that still
unsafe?  And if GC _can_ happen, but we don't use the allocated block
again, is that a problem?  For example, in this fragment:

	    SAFE_NALLOCA (args2, 1, nargs + 1);
	    args2[0] = Qcall_process;
	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;

Let's say Ffind_operation_coding_system could trigger GC.  But we
never again use the args2[] array after Ffind_operation_coding_system
returns.  Is the above still unsafe?  If so, could you tell what
could MPS do during GC to make this unsafe?

Also, in some other message you said SAFE_NALLOCA is unsafe if
_pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA
allocates off the heap.  In call_process I see that we only ever put
Lisp objects into the memory allocated by SAFE_NALLOCA.  If that is
unsafe, could you tell what MPS does during GC which makes this
unsafe?

TIA





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 13:30                 ` Eli Zaretskii
@ 2025-01-05 14:11                   ` Gerd Möllmann
  2025-01-05 17:45                     ` Eli Zaretskii
  2025-01-05 21:01                     ` Daniel Colascione
  0 siblings, 2 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 14:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sat, 04 Jan 2025 11:20:41 +0100
>> 
>> In callproc.c I found two: call_process and create_temp_file both use
>> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces
>> with SAVE_ALLOCA_LISP.
>
> What are the conditions under which placing Lisp objects into
> SAFE_NALLOCA is not safe?
>
> I understand that the first condition is that SAFE_NALLOCA uses
> xmalloc instead of alloca.

Right. If it doesn't use xmalloc, the references are on the C stack, and
both old and new GC handle that by scanning the C stack.

> But what are the other conditions?  Is one of them that GC could
> happen while these Lisp objects are in the memory allocated by
> SAFE_NALLOCA off the heap?  

Yes.

> IOW, if no GC happen, is that still unsafe? And if GC _can_ happen,
> but we don't use the allocated block again, is that a problem? For
> example, in this fragment:
>
> 	    SAFE_NALLOCA (args2, 1, nargs + 1);
> 	    args2[0] = Qcall_process;
> 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
> 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
> 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
>
> Let's say Ffind_operation_coding_system could trigger GC.  But we
> never again use the args2[] array after Ffind_operation_coding_system
> returns.  Is the above still unsafe?  If so, could you tell what
> could MPS do during GC to make this unsafe?

Let me first say why I find this unsafe in the old GC, in principle. If
we don't assume anything about the objects referenced from args2, then a
reference in args2 may well be the only one to some object. In this
case, the old GC would sweep it. Or, the other way 'round, by using
SAFE_NALLOCA we make an assumption. And that, from my (GCPRO) POV, needs
a proof, or better yet some check in code.

Not using arg2 after Ffind_operation_coding_system above is not enough.
It would have to be not using args2 after the GC has run. Maybe that's
_in_ Ffind_operation_coding_system.

In the new GC, with MPS, the same is true as above. An object which is
only referenced from args2 may die.

Additionally, objects might not die but may move, assuming that
SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA
makes another assumption in the MPS case: that something else prevents
the objects from moving. Another proof or check required with my GCPRO
hat on.

>
> Also, in some other message you said SAFE_NALLOCA is unsafe if
> _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA
> allocates off the heap.  In call_process I see that we only ever put
> Lisp objects into the memory allocated by SAFE_NALLOCA.  If that is
> unsafe, could you tell what MPS does during GC which makes this
> unsafe?

Not sure, is the question why in MPS both pointers and Lisp_Object count
as "references"?

If it's that, it's basically the same in the old GC. For example, when
marking the C stack, we must recognize both pointers to Lisp_Cons and
Lisp_Objects that look like conses, which contain such a pointer. Which
was the reason I had to add the mem_blocks, the red-black tree and so
on, to be able to determine if some potential pointer can indeed be
pointer to cons, conservatively.






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 13:21               ` Gerd Möllmann
@ 2025-01-05 17:31                 ` Eli Zaretskii
  2025-01-05 17:49                   ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 17:31 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 14:21:04 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So can we talk about the relative merits and demerits of using
> > SAFE_ALLOCA_LISP vs SAFE_NALLOCA?  
> 
> Let me add a (0): I assume that SAFE_ALLOCA_LISP is the right thing in
> the _old_ GC, because it makes sure objects referenced in the xmalloc'd
> memory are marked. From my POV, it would require a very good reason to
> use something else, which is nowhere mentioned. That's why I suspect
> it's a left-over from times where SAFE_ALLOCA_LISP didn't exist.

This surprises me because on the master branch SAFE_ALLOCA_LISP either
calls alloca or xzalloc.  So I don't see why SAFE_ALLOCA_LISP would be
safer in the old GC than SAFE_NALLOCA.  It's only on the igc branch
that you made SAFE_ALLOCA_LISP make a Lisp vector.

But this is a tangent from my POV.  I would like to discuss the new
GC, not the old one.

> > And second, SAFE_ALLOCA_LISP conses a Lisp vector, which will increase
> > GC pressure, so isn't SAFE_NALLOCA preferable at least in some cases?
> 
> SAFE_ALLOCA_LISP allocates a Lisp vector, that's true. I think one can
> say that allocation is cheap on average. The overhead of freeing it is
> not copying it, which is basically zero.

But I presume that if we have more Lisp objects, GC will happen
sooner, no?  So increasing GC pressure is not zero-cost, because more
frequent GC means more CPU processing that stops our application
thread.

> SAFE_NALLOCA, with my patch, requires a xmalloc, creation of a MPS root
> object, deletion of that, and xfree.
> 
> Let's assume scanning costs are more or less the same because the
> number of references is the same in both cases.

But more memory allocated via xmalloc doesn't increase the frequency
of GC, does it?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 14:11                   ` Gerd Möllmann
@ 2025-01-05 17:45                     ` Eli Zaretskii
  2025-01-05 18:17                       ` Gerd Möllmann
  2025-01-05 21:01                     ` Daniel Colascione
  1 sibling, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 17:45 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 15:11:08 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > And if GC _can_ happen,
> > but we don't use the allocated block again, is that a problem? For
> > example, in this fragment:
> >
> > 	    SAFE_NALLOCA (args2, 1, nargs + 1);
> > 	    args2[0] = Qcall_process;
> > 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
> > 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
> > 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
> >
> > Let's say Ffind_operation_coding_system could trigger GC.  But we
> > never again use the args2[] array after Ffind_operation_coding_system
> > returns.  Is the above still unsafe?  If so, could you tell what
> > could MPS do during GC to make this unsafe?
> 
> Let me first say why I find this unsafe in the old GC, in principle. If
> we don't assume anything about the objects referenced from args2, then a
> reference in args2 may well be the only one to some object. In this
> case, the old GC would sweep it.

OK, but in most, if not all of these cases, the objects are referenced
from the stack.  For example, in the above fragment, the args[] array
is on the stack.  Right?

> Not using arg2 after Ffind_operation_coding_system above is not enough.
> It would have to be not using args2 after the GC has run. Maybe that's
> _in_ Ffind_operation_coding_system.

OK, agreed.

> Additionally, objects might not die but may move, assuming that
> SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA
> makes another assumption in the MPS case: that something else prevents
> the objects from moving. Another proof or check required with my GCPRO
> hat on.

What does it mean in detail "the object may move"?  A Lisp object is a
tagged pointer.  Do you mean the pointer should no point to a
different address, i.e. the value of a Lisp object as a number should
change to still be valid?  And if so, is MPS supposed to find all the
copies of that value everywhere in order to update them?  So if I have
several variables which were all assigned a value of the same Lisp
object, they all need to be updated when the object moves?

> > Also, in some other message you said SAFE_NALLOCA is unsafe if
> > _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA
> > allocates off the heap.  In call_process I see that we only ever put
> > Lisp objects into the memory allocated by SAFE_NALLOCA.  If that is
> > unsafe, could you tell what MPS does during GC which makes this
> > unsafe?
> 
> Not sure, is the question why in MPS both pointers and Lisp_Object count
> as "references"?

Yes, if that's the situation.  Earlier you only mentioned pointers to
Lisp objects, something that happens relatively rarely.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 17:31                 ` Eli Zaretskii
@ 2025-01-05 17:49                   ` Gerd Möllmann
  2025-01-05 18:42                     ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 17:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 14:21:04 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > So can we talk about the relative merits and demerits of using
>> > SAFE_ALLOCA_LISP vs SAFE_NALLOCA?  
>> 
>> Let me add a (0): I assume that SAFE_ALLOCA_LISP is the right thing in
>> the _old_ GC, because it makes sure objects referenced in the xmalloc'd
>> memory are marked. From my POV, it would require a very good reason to
>> use something else, which is nowhere mentioned. That's why I suspect
>> it's a left-over from times where SAFE_ALLOCA_LISP didn't exist.
>
> This surprises me because on the master branch SAFE_ALLOCA_LISP either
> calls alloca or xzalloc.  So I don't see why SAFE_ALLOCA_LISP would be
> safer in the old GC than SAFE_NALLOCA.  It's only on the igc branch
> that you made SAFE_ALLOCA_LISP make a Lisp vector.
> But this is a tangent from my POV.  I would like to discuss the new
> GC, not the old one.
>

This in SAFE_ALLOCA_LISP_EXTRA

	(buf) = xzalloc (alloca_nbytes);		       \
	record_unwind_protect_array (buf, nelt);	       \

makes a specpdl entry, which makes mark_specpdl do this:

	case SPECPDL_UNWIND_ARRAY:
	  mark_objects (pdl->unwind_array.array, pdl->unwind_array.nelts);
	  break;

>> > And second, SAFE_ALLOCA_LISP conses a Lisp vector, which will increase
>> > GC pressure, so isn't SAFE_NALLOCA preferable at least in some cases?
>> 
>> SAFE_ALLOCA_LISP allocates a Lisp vector, that's true. I think one can
>> say that allocation is cheap on average. The overhead of freeing it is
>> not copying it, which is basically zero.
>
> But I presume that if we have more Lisp objects, GC will happen
> sooner, no?  So increasing GC pressure is not zero-cost, because more
> frequent GC means more CPU processing that stops our application
> thread.

I don't know what exact consequences that has. MPS is at least an
incremental collector, so it's not normally the case that the GC
_starts_ at some given point.

What certainly happens is that the array is allocated from an allocation
point which has a certain amount of memory pre-allocated. And that
pre-allocated memory will run out sooner, in which case the allocation
point will acquire additional memory, which potentially requires some
amount of GC.

And so on. So it will have some effect, for sure, but I don't remember
the docs spelling out details. Maybe someone else knows.

>
>> SAFE_NALLOCA, with my patch, requires a xmalloc, creation of a MPS root
>> object, deletion of that, and xfree.
>> 
>> Let's assume scanning costs are more or less the same because the
>> number of references is the same in both cases.
>
> But more memory allocated via xmalloc doesn't increase the frequency
> of GC, does it?

Xmalloc no, but as I said above... 





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 17:45                     ` Eli Zaretskii
@ 2025-01-05 18:17                       ` Gerd Möllmann
  2025-01-05 19:07                         ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 18:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 15:11:08 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > And if GC _can_ happen,
>> > but we don't use the allocated block again, is that a problem? For
>> > example, in this fragment:
>> >
>> > 	    SAFE_NALLOCA (args2, 1, nargs + 1);
>> > 	    args2[0] = Qcall_process;
>> > 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
>> > 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
>> > 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
>> >
>> > Let's say Ffind_operation_coding_system could trigger GC.  But we
>> > never again use the args2[] array after Ffind_operation_coding_system
>> > returns.  Is the above still unsafe?  If so, could you tell what
>> > could MPS do during GC to make this unsafe?
>> 
>> Let me first say why I find this unsafe in the old GC, in principle. If
>> we don't assume anything about the objects referenced from args2, then a
>> reference in args2 may well be the only one to some object. In this
>> case, the old GC would sweep it.
>
> OK, but in most, if not all of these cases, the objects are referenced
> from the stack.  For example, in the above fragment, the args[] array
> is on the stack.  Right?

That args is a parameter

  call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd,

So just from this I see only args itself on the stack, not args[0],
args[1] and so on. I would have to look at all callers to determine
that. Not good enough in my book.

>
>> Not using arg2 after Ffind_operation_coding_system above is not enough.
>> It would have to be not using args2 after the GC has run. Maybe that's
>> _in_ Ffind_operation_coding_system.
>
> OK, agreed.
>
>> Additionally, objects might not die but may move, assuming that
>> SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA
>> makes another assumption in the MPS case: that something else prevents
>> the objects from moving. Another proof or check required with my GCPRO
>> hat on.
>
> What does it mean in detail "the object may move"?  A Lisp object is a
> tagged pointer.  Do you mean the pointer should no point to a
> different address, i.e. the value of a Lisp object as a number should
> change to still be valid?  

Exactly. Unless an ambiguous reference prevents the copying that can
happen.

> And if so, is MPS supposed to find all the copies of that value
> everywhere in order to update them? So if I have several variables
> which were all assigned a value of the same Lisp object, they all need
> to be updated when the object moves?

Yes. MPS does that with the help of our dflt_scan and its subroutines
where we call MPS_FIX2 and update the reference.

>> > Also, in some other message you said SAFE_NALLOCA is unsafe if
>> > _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA
>> > allocates off the heap.  In call_process I see that we only ever put
>> > Lisp objects into the memory allocated by SAFE_NALLOCA.  If that is
>> > unsafe, could you tell what MPS does during GC which makes this
>> > unsafe?
>> 
>> Not sure, is the question why in MPS both pointers and Lisp_Object count
>> as "references"?
>
> Yes, if that's the situation.  Earlier you only mentioned pointers to
> Lisp objects, something that happens relatively rarely.

That's the case in MPS. Fixnums aside, Lisp_Object is basically also
only a pointer, with some tag bits added. In that sense it's the same
case.

And every string contains a pointer to it's data, which I consider part
of the Lisp data. And intervals are also Lisp data. The ones from enum
igc_obj_type.






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 17:49                   ` Gerd Möllmann
@ 2025-01-05 18:42                     ` Eli Zaretskii
  2025-01-05 19:02                       ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 18:42 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 18:49:56 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > This surprises me because on the master branch SAFE_ALLOCA_LISP either
> > calls alloca or xzalloc.  So I don't see why SAFE_ALLOCA_LISP would be
> > safer in the old GC than SAFE_NALLOCA.  It's only on the igc branch
> > that you made SAFE_ALLOCA_LISP make a Lisp vector.
> > But this is a tangent from my POV.  I would like to discuss the new
> > GC, not the old one.
> >
> 
> This in SAFE_ALLOCA_LISP_EXTRA
> 
> 	(buf) = xzalloc (alloca_nbytes);		       \
> 	record_unwind_protect_array (buf, nelt);	       \
> 
> makes a specpdl entry, which makes mark_specpdl do this:
> 
> 	case SPECPDL_UNWIND_ARRAY:
> 	  mark_objects (pdl->unwind_array.array, pdl->unwind_array.nelts);
> 	  break;

Thanks.  I guess it's supposed to be "common knowledge" that
record_unwind_protect_* indirectly causes the objects to be marked...

But then why don't we set the .mark member in SAFE_NALLOCA or some
variant thereof?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 18:42                     ` Eli Zaretskii
@ 2025-01-05 19:02                       ` Gerd Möllmann
  0 siblings, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 19:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 18:49:56 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > This surprises me because on the master branch SAFE_ALLOCA_LISP either
>> > calls alloca or xzalloc.  So I don't see why SAFE_ALLOCA_LISP would be
>> > safer in the old GC than SAFE_NALLOCA.  It's only on the igc branch
>> > that you made SAFE_ALLOCA_LISP make a Lisp vector.
>> > But this is a tangent from my POV.  I would like to discuss the new
>> > GC, not the old one.
>> >
>> 
>> This in SAFE_ALLOCA_LISP_EXTRA
>> 
>> 	(buf) = xzalloc (alloca_nbytes);		       \
>> 	record_unwind_protect_array (buf, nelt);	       \
>> 
>> makes a specpdl entry, which makes mark_specpdl do this:
>> 
>> 	case SPECPDL_UNWIND_ARRAY:
>> 	  mark_objects (pdl->unwind_array.array, pdl->unwind_array.nelts);
>> 	  break;
>
> Thanks.  I guess it's supposed to be "common knowledge" that
> record_unwind_protect_* indirectly causes the objects to be marked...
>
> But then why don't we set the .mark member in SAFE_NALLOCA or some
> variant thereof?

Questions upon questions. I have no idea. It seems to be relatively
new, i.e. younger than 20 years :-).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 18:17                       ` Gerd Möllmann
@ 2025-01-05 19:07                         ` Eli Zaretskii
  2025-01-05 20:04                           ` Gerd Möllmann
  2025-01-05 21:15                           ` Daniel Colascione
  0 siblings, 2 replies; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 19:07 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 19:17:37 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > OK, but in most, if not all of these cases, the objects are referenced
> > from the stack.  For example, in the above fragment, the args[] array
> > is on the stack.  Right?
> 
> That args is a parameter
> 
>   call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd,
> 
> So just from this I see only args itself on the stack, not args[0],
> args[1] and so on. I would have to look at all callers to determine
> that. Not good enough in my book.

So what, we will now need to copy every args[] into a Lisp vector
created by SAFE_ALLOCA_LISP, or xstrdup all of them, and do it in
each and every function that gets the args[] array, all the way down
to where the array is finally used (because usually we have 3 or 4
nested levels that pass args[] to one another)?  That's insane!

> > What does it mean in detail "the object may move"?  A Lisp object is a
> > tagged pointer.  Do you mean the pointer should no point to a
> > different address, i.e. the value of a Lisp object as a number should
> > change to still be valid?  
> 
> Exactly. Unless an ambiguous reference prevents the copying that can
> happen.

How can we possibly make sure this works reliably and safely??  For
each variable we have in every function, we will need to analyze
whether the variable is

  . an automatic variable
  . a static variable that is protected by someone
  . a global variable that is protected by someone
  . a result of dereferencing a pointer that is somehow protected

etc. etc., where "protected by someone" means that it is a descendant
of some staticpro, or of some root, or...

And if we cannot prove to ourselves that one of the above happens,
then we'd need to force a copy of the variable to be on the stack?

Does this sound practical?

If this is the price of using MPS, and I'm not missing something
obvious, then it sounds like we should run away from MPS, fast.
Because we will sooner or later have to rewrite every single line of
code we ever wrote.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 19:07                         ` Eli Zaretskii
@ 2025-01-05 20:04                           ` Gerd Möllmann
  2025-01-05 20:24                             ` Eli Zaretskii
  2025-01-05 21:15                           ` Daniel Colascione
  1 sibling, 1 reply; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-05 20:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 19:17:37 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > OK, but in most, if not all of these cases, the objects are referenced
>> > from the stack.  For example, in the above fragment, the args[] array
>> > is on the stack.  Right?
>> 
>> That args is a parameter
>> 
>>   call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd,
>> 
>> So just from this I see only args itself on the stack, not args[0],
>> args[1] and so on. I would have to look at all callers to determine
>> that. Not good enough in my book.
>
> So what, we will now need to copy every args[] into a Lisp vector
> created by SAFE_ALLOCA_LISP, or xstrdup all of them, and do it in
> each and every function that gets the args[] array, all the way down
> to where the array is finally used (because usually we have 3 or 4
> nested levels that pass args[] to one another)?  That's insane!
>
>> > What does it mean in detail "the object may move"?  A Lisp object is a
>> > tagged pointer.  Do you mean the pointer should no point to a
>> > different address, i.e. the value of a Lisp object as a number should
>> > change to still be valid?  
>> 
>> Exactly. Unless an ambiguous reference prevents the copying that can
>> happen.
>
> How can we possibly make sure this works reliably and safely??  For
> each variable we have in every function, we will need to analyze
> whether the variable is
>
>   . an automatic variable
>   . a static variable that is protected by someone
>   . a global variable that is protected by someone
>   . a result of dereferencing a pointer that is somehow protected
>
> etc. etc., where "protected by someone" means that it is a descendant
> of some staticpro, or of some root, or...
>
> And if we cannot prove to ourselves that one of the above happens,
> then we'd need to force a copy of the variable to be on the stack?
>
> Does this sound practical?
>
> If this is the price of using MPS, and I'm not missing something
> obvious, then it sounds like we should run away from MPS, fast.
> Because we will sooner or later have to rewrite every single line of
> code we ever wrote.

I'm bowing out again. It's not worth it.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 20:04                           ` Gerd Möllmann
@ 2025-01-05 20:24                             ` Eli Zaretskii
  2025-01-06  3:57                               ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-05 20:24 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Sun, 05 Jan 2025 21:04:56 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > How can we possibly make sure this works reliably and safely??  For
> > each variable we have in every function, we will need to analyze
> > whether the variable is
> >
> >   . an automatic variable
> >   . a static variable that is protected by someone
> >   . a global variable that is protected by someone
> >   . a result of dereferencing a pointer that is somehow protected
> >
> > etc. etc., where "protected by someone" means that it is a descendant
> > of some staticpro, or of some root, or...
> >
> > And if we cannot prove to ourselves that one of the above happens,
> > then we'd need to force a copy of the variable to be on the stack?
> >
> > Does this sound practical?
> >
> > If this is the price of using MPS, and I'm not missing something
> > obvious, then it sounds like we should run away from MPS, fast.
> > Because we will sooner or later have to rewrite every single line of
> > code we ever wrote.
> 
> I'm bowing out again. It's not worth it.

I don't understand why?  I need to understand the implications to be
able to make decisions, which are part of my job.  So I ask questions,
and I'm grateful for your answers, which clarify the issues for me.
That I sometimes sound overwhelmed by the implications shouldn't be
held against me, it's just a normal human reaction, nothing more.

If I somehow sound impolite, I apologize.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 14:11                   ` Gerd Möllmann
  2025-01-05 17:45                     ` Eli Zaretskii
@ 2025-01-05 21:01                     ` Daniel Colascione
  2025-01-05 23:28                       ` Daniel Colascione
  2025-01-06  4:23                       ` Gerd Möllmann
  1 sibling, 2 replies; 78+ messages in thread
From: Daniel Colascione @ 2025-01-05 21:01 UTC (permalink / raw)
  To: 75322, gerd.moellmann, eliz; +Cc: pipcet

[-- Attachment #1: Type: text/plain, Size: 3876 bytes --]

On January 5, 2025 9:11:08 AM EST, "Gerd Möllmann" <gerd.moellmann@gmail.com> wrote:
>Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>>> Date: Sat, 04 Jan 2025 11:20:41 +0100
>>> 
>>> In callproc.c I found two: call_process and create_temp_file both use
>>> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces
>>> with SAVE_ALLOCA_LISP.
>>
>> What are the conditions under which placing Lisp objects into
>> SAFE_NALLOCA is not safe?
>>
>> I understand that the first condition is that SAFE_NALLOCA uses
>> xmalloc instead of alloca.
>
>Right. If it doesn't use xmalloc, the references are on the C stack, and
>both old and new GC handle that by scanning the C stack.
>
>> But what are the other conditions?  Is one of them that GC could
>> happen while these Lisp objects are in the memory allocated by
>> SAFE_NALLOCA off the heap?  
>
>Yes.
>
>> IOW, if no GC happen, is that still unsafe? And if GC _can_ happen,
>> but we don't use the allocated block again, is that a problem? For
>> example, in this fragment:
>>
>> 	    SAFE_NALLOCA (args2, 1, nargs + 1);
>> 	    args2[0] = Qcall_process;
>> 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
>> 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
>> 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
>>
>> Let's say Ffind_operation_coding_system could trigger GC.  But we
>> never again use the args2[] array after Ffind_operation_coding_system
>> returns.  Is the above still unsafe?  If so, could you tell what
>> could MPS do during GC to make this unsafe?
>
>Let me first say why I find this unsafe in the old GC, in principle. If
>we don't assume anything about the objects referenced from args2, then a
>reference in args2 may well be the only one to some object. In this
>case, the old GC would sweep it.

Gerd is right. This pattern was never safe.

> Or, the other way 'round, by using
>SAFE_NALLOCA we make an assumption. And that, from my (GCPRO) POV, needs
>a proof, or better yet some check in code.
>
>Not using arg2 after Ffind_operation_coding_system above is not enough.
>It would have to be not using args2 after the GC has run. Maybe that's
>_in_ Ffind_operation_coding_system.
>
>In the new GC, with MPS, the same is true as above. An object which is
>only referenced from args2 may die.

Right, because the backing storage for args2 might be in the mallloc heap, and GC doesn't scan the mallloc heap.

>Additionally, objects might not die but may move, assuming that
>SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA
>makes another assumption in the MPS case: that something else prevents
>the objects from moving. Another proof or check required with my GCPRO
>hat on.

Yes. In any system, a reference the GC doesn't know about must be assumed to be garbage the instant it's created. Every object is dead unless the GC can prove it's alive.

>> Also, in some other message you said SAFE_NALLOCA is unsafe if
>> _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA
>> allocates off the heap.  In call_process I see that we only ever put
>> Lisp objects into the memory allocated by SAFE_NALLOCA.  If that is
>> unsafe, could you tell what MPS does during GC which makes this
>> unsafe?
>
>Not sure, is the question why in MPS both pointers and Lisp_Object count
>as "references"?
>
>If it's that, it's basically the same in the old GC. For example, when
>marking the C stack, we must recognize both pointers to Lisp_Cons and
>Lisp_Objects that look like conses, which contain such a pointer. 

And a third case: interior pointers. A native pointer to a Lisp object isn't necessarily pointing to the start of that object.

[-- Attachment #2: Type: text/html, Size: 4494 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 19:07                         ` Eli Zaretskii
  2025-01-05 20:04                           ` Gerd Möllmann
@ 2025-01-05 21:15                           ` Daniel Colascione
  2025-01-06 12:59                             ` Eli Zaretskii
  1 sibling, 1 reply; 78+ messages in thread
From: Daniel Colascione @ 2025-01-05 21:15 UTC (permalink / raw)
  To: 75322, eliz, gerd.moellmann; +Cc: pipcet



On January 5, 2025 2:07:26 PM EST, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 19:17:37 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > OK, but in most, if not all of these cases, the objects are referenced
>> > from the stack.  For example, in the above fragment, the args[] array
>> > is on the stack.  Right?
>> 
>> That args is a parameter
>> 
>>   call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd,
>> 
>> So just from this I see only args itself on the stack, not args[0],
>> args[1] and so on. I would have to look at all callers to determine
>> that. Not good enough in my book.
>
>So what, we will now need to copy every args[] into a Lisp vector
>created by SAFE_ALLOCA_LISP, or xstrdup all of them, and do it in
>each and every function that gets the args[] array, all the way down
>to where the array is finally used (because usually we have 3 or 4
>nested levels that pass args[] to one another)?  That's insane!
>
>> > What does it mean in detail "the object may move"?  A Lisp object is a
>> > tagged pointer.  Do you mean the pointer should no point to a
>> > different address, i.e. the value of a Lisp object as a number should
>> > change to still be valid?  
>> 
>> Exactly. Unless an ambiguous reference prevents the copying that can
>> happen.
>
>How can we possibly make sure this works reliably and safely??  For
>each variable we have in every function, we will need to analyze
>whether the variable is
>
>  . an automatic variable
>  . a static variable that is protected by someone
>  . a global variable that is protected by someone
>  . a result of dereferencing a pointer that is somehow protected
>
>etc. etc., where "protected by someone" means that it is a descendant
>of some staticpro, or of some root, or...

Well, yeah. Every other GC program does this. Emacs can too. There's no great burden: all Lisp objects get traced automatically. Everything on the stack or in a register gets traced automatically, and, because the scanning is conservative, pinned. You only have to take extra steps to tell the GC about something when you're going out of your way to go around the GC.

It's simply not true that to adopt a modern GC every line of code has to change.  I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite.

>And if we cannot prove to ourselves that one of the above happens,
>then we'd need to force a copy of the variable to be on the stack?
>
>Does this sound practical?
>
>If this is the price of using MPS, and I'm not missing something
>obvious, then it sounds like we should run away from MPS, fast.
>Because we will sooner or later have to rewrite every single line of
>code we ever wrote.

No, you do it by adopting a rule that when a function receives a pointer, the caller guarantees the validity of the pointer for the duration of the call. This way, only the level of the stack that spills the array to the heap has to take on the responsibility of keeping the referenced objects alive, and making the spilled array a pointer to the pinned guts of a Lisp vector is an adequate way to do this. 

"Oh, but won't that kill performance?"

In a generational system, allocating small, short-lived objects is actually cheap ---- usually faster than mallloc. I don't recall how MPS does it, but in some garbage collectors, a gen0 allocation of this sort is literally just incrementing a pointer.

Malloc has a substantial cost of its own too.

Yes, these objects make a gen0 GC happen sooner. So what? Young generation GC is cheap. If you could observe a performance problem, you could solve the reference problem by maintaining a thread local linked list of spilled array blocks and teach the GC to scan them, but that's likely overkill.

Anyway, every native code program anywhere that uses GC has to care about lifetimes and references.  Abandoning MPS doesn't magically make the problem go away. The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. 





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 21:01                     ` Daniel Colascione
@ 2025-01-05 23:28                       ` Daniel Colascione
  2025-01-06 13:26                         ` Eli Zaretskii
  2025-01-06  4:23                       ` Gerd Möllmann
  1 sibling, 1 reply; 78+ messages in thread
From: Daniel Colascione @ 2025-01-05 23:28 UTC (permalink / raw)
  To: 75322; +Cc: gerd.moellmann, eliz, pipcet

Daniel Colascione <dancol@dancol.org> writes:

> On January 5, 2025 9:11:08 AM EST, "Gerd Möllmann"
> <gerd.moellmann@gmail.com> wrote:
>>Eli Zaretskii <eliz@gnu.org> writes:
>>
>>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>>>> Date: Sat, 04 Jan 2025 11:20:41 +0100
>>>> 
>>>> In callproc.c I found two: call_process and create_temp_file both use
>>>> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces
>>>> with SAVE_ALLOCA_LISP.
>>>
>>> What are the conditions under which placing Lisp objects into
>>> SAFE_NALLOCA is not safe?
>>>
>>> I understand that the first condition is that SAFE_NALLOCA uses
>>> xmalloc instead of alloca.
>>
>>Right. If it doesn't use xmalloc, the references are on the C stack, and
>>both old and new GC handle that by scanning the C stack.
>>
>>> But what are the other conditions?  Is one of them that GC could
>>> happen while these Lisp objects are in the memory allocated by
>>> SAFE_NALLOCA off the heap?  
>>
>>Yes.
>>
>>> IOW, if no GC happen, is that still unsafe? And if GC _can_ happen,
>>> but we don't use the allocated block again, is that a problem? For
>>> example, in this fragment:
>>>
>>> 	    SAFE_NALLOCA (args2, 1, nargs + 1);
>>> 	    args2[0] = Qcall_process;
>>> 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
>>> 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
>>> 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
>>>
>>> Let's say Ffind_operation_coding_system could trigger GC.  But we
>>> never again use the args2[] array after Ffind_operation_coding_system
>>> returns.  Is the above still unsafe?  If so, could you tell what
>>> could MPS do during GC to make this unsafe?
>>
>>Let me first say why I find this unsafe in the old GC, in principle. If
>>we don't assume anything about the objects referenced from args2, then a
>>reference in args2 may well be the only one to some object. In this
>>case, the old GC would sweep it.
>
> Gerd is right. This pattern was never safe.

Here's a demonstration of the problem.  Run ./emacs -batch -Q --eval
'(acos 0)'.  If you leave demo_crash to true, Emacs will abort quickly
after we detect a use-after-free.  If you set demo_crash to false, Emacs
will run the loop all day.

diff --git a/src/floatfns.c b/src/floatfns.c
index 065ae16e885..a95597beef8 100644
--- a/src/floatfns.c
+++ b/src/floatfns.c
@@ -50,6 +50,8 @@ Copyright (C) 1988, 1993-1994, 1999, 2001-2025 Free Software Foundation,
 
 #include <config.h>
 
+#include <stdlib.h>
+
 #include "lisp.h"
 #include "bignum.h"
 
@@ -86,6 +88,31 @@ DEFUN ("acos", Facos, Sacos, 1, 1, 0,
        doc: /* Return the inverse cosine of ARG.  */)
   (Lisp_Object arg)
 {
+  Lisp_Object* args2;
+  unsigned start = 12345;
+  unsigned counter = start;
+  bool demo_crash = true;
+
+  USE_SAFE_ALLOCA;
+
+  SAFE_NALLOCA (args2, 1, 1 + (demo_crash ? MAX_ALLOCA : 0));
+  args2[0] = Fcons (make_fixnum (counter),
+		    make_fixnum (counter + 1));
+  counter += 2;
+  for (;;)
+    {
+      if (!FIXNUMP (XCAR (args2[0])))
+	emacs_abort ();
+      if (XFIXNUM (XCAR (args2[0])) != 12345)
+	emacs_abort ();
+      Fcons (make_fixnum (counter),
+	     make_fixnum (counter + 1));
+      Fgarbage_collect ();
+      fprintf (stderr, ".");
+      fflush (stderr);
+      counter += 2;
+    }
+
   double d = extract_float (arg);
   d = acos (d);
   return make_float (d);





^ permalink raw reply related	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 20:24                             ` Eli Zaretskii
@ 2025-01-06  3:57                               ` Gerd Möllmann
  2025-01-06  8:25                                 ` Gerd Möllmann
  2025-01-06 14:07                                 ` Eli Zaretskii
  0 siblings, 2 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-06  3:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Sun, 05 Jan 2025 21:04:56 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > How can we possibly make sure this works reliably and safely??  For
>> > each variable we have in every function, we will need to analyze
>> > whether the variable is
>> >
>> >   . an automatic variable
>> >   . a static variable that is protected by someone
>> >   . a global variable that is protected by someone
>> >   . a result of dereferencing a pointer that is somehow protected
>> >
>> > etc. etc., where "protected by someone" means that it is a descendant
>> > of some staticpro, or of some root, or...
>> >
>> > And if we cannot prove to ourselves that one of the above happens,
>> > then we'd need to force a copy of the variable to be on the stack?
>> >
>> > Does this sound practical?
>> >
>> > If this is the price of using MPS, and I'm not missing something
>> > obvious, then it sounds like we should run away from MPS, fast.
>> > Because we will sooner or later have to rewrite every single line of
>> > code we ever wrote.
>> 
>> I'm bowing out again. It's not worth it.
>
> I don't understand why?  I need to understand the implications to be
> able to make decisions, which are part of my job.  So I ask questions,
> and I'm grateful for your answers, which clarify the issues for me.
> That I sometimes sound overwhelmed by the implications shouldn't be
> held against me, it's just a normal human reaction, nothing more.

I don't hold that against you, that's why I'm trying to answer
questions, write stuff up, and so on, but for me your reply before this
one was a leaf node in the thread.

From my POV: So we're talking about things, you want to make it
concrete, we land in call_process, I explain why SAFE_NALLOCA is unsafe
when used with references even with the old GC, you think references are
on the stack because the parameter args is on stack, and I say no.

Next thing I get is a rant. You don't even say "you're right" or "you're
wrong", so I don't know for sure if you accept my argumentation or not.
Instead, you write something that came across here as "unreasonable,
can't be true, we have to change every line of code, let's run from
MPS".

What should I reply to that? Nothing of course.

> If I somehow sound impolite, I apologize.

No worries about politeness. It wasn't impolite, and I'm not very
sensible anyway.






^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 21:01                     ` Daniel Colascione
  2025-01-05 23:28                       ` Daniel Colascione
@ 2025-01-06  4:23                       ` Gerd Möllmann
  1 sibling, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-06  4:23 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: pipcet, 75322, eliz

Daniel Colascione <dancol@dancol.org> writes:

>>If it's that, it's basically the same in the old GC. For example, when
>>marking the C stack, we must recognize both pointers to Lisp_Cons and
>>Lisp_Objects that look like conses, which contain such a pointer. 
>
> And a third case: interior pointers. A native pointer to a Lisp object
> isn't necessarily pointing to the start of that object.

Right. Thanks for the backup, BTW :-).





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 10:40         ` Gerd Möllmann
  2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-06  6:26           ` Gerd Möllmann
  1 sibling, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-06  6:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>>> Date: Sun, 05 Jan 2025 09:19:17 +0100
>>> 
>>> I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored
>>> in the memory allocated.
>>
>> Is only SAFE_NALLOCA a potential problem?  What about the other
>> members of the SAFE_*ALLOC* family?
>
> Haven't checked others yet. I have that on my todo list now.

Checked SAFE_ALLOCA now, and it seems not to be used for holding
references anywhere.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-06  3:57                               ` Gerd Möllmann
@ 2025-01-06  8:25                                 ` Gerd Möllmann
  2025-01-06 14:07                                 ` Eli Zaretskii
  1 sibling, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-06  8:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 75322

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>>> Date: Sun, 05 Jan 2025 21:04:56 +0100
>>> 
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>> 
>>> > How can we possibly make sure this works reliably and safely??  For
>>> > each variable we have in every function, we will need to analyze
>>> > whether the variable is
>>> >
>>> >   . an automatic variable
>>> >   . a static variable that is protected by someone
>>> >   . a global variable that is protected by someone
>>> >   . a result of dereferencing a pointer that is somehow protected
>>> >
>>> > etc. etc., where "protected by someone" means that it is a descendant
>>> > of some staticpro, or of some root, or...
>>> >
>>> > And if we cannot prove to ourselves that one of the above happens,
>>> > then we'd need to force a copy of the variable to be on the stack?
>>> >
>>> > Does this sound practical?
>>> >
>>> > If this is the price of using MPS, and I'm not missing something
>>> > obvious, then it sounds like we should run away from MPS, fast.
>>> > Because we will sooner or later have to rewrite every single line of
>>> > code we ever wrote.
>>> 
>>> I'm bowing out again. It's not worth it.
>>
>> I don't understand why?  I need to understand the implications to be
>> able to make decisions, which are part of my job.  So I ask questions,
>> and I'm grateful for your answers, which clarify the issues for me.
>> That I sometimes sound overwhelmed by the implications shouldn't be
>> held against me, it's just a normal human reaction, nothing more.
>
> I don't hold that against you, that's why I'm trying to answer
> questions, write stuff up, and so on, but for me your reply before this
> one was a leaf node in the thread.
>
>> From my POV: So we're talking about things, you want to make it
> concrete, we land in call_process, I explain why SAFE_NALLOCA is unsafe
> when used with references even with the old GC, you think references are
> on the stack because the parameter args is on stack, and I say no.
>
> Next thing I get is a rant. You don't even say "you're right" or "you're
> wrong", so I don't know for sure if you accept my argumentation or not.
> Instead, you write something that came across here as "unreasonable,
> can't be true, we have to change every line of code, let's run from
> MPS".
>
> What should I reply to that? Nothing of course.
>
>> If I somehow sound impolite, I apologize.
>
> No worries about politeness. It wasn't impolite, and I'm not very
> sensible anyway.

Not being sensible is true also, but s/sensible/sensitive/ :-)





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 21:15                           ` Daniel Colascione
@ 2025-01-06 12:59                             ` Eli Zaretskii
  2025-01-06 14:48                               ` Daniel Colascione
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-06 12:59 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: gerd.moellmann, pipcet, 75322

> Date: Sun, 05 Jan 2025 16:15:47 -0500
> From: Daniel Colascione <dancol@dancol.org>
> CC: pipcet@protonmail.com, 75322@debbugs.gnu.org
> 
> >  . an automatic variable
> >  . a static variable that is protected by someone
> >  . a global variable that is protected by someone
> >  . a result of dereferencing a pointer that is somehow protected
> >
> >etc. etc., where "protected by someone" means that it is a descendant
> >of some staticpro, or of some root, or...
> 
> Well, yeah. Every other GC program does this. Emacs can too. There's no great burden: all Lisp objects get traced automatically. Everything on the stack or in a register gets traced automatically, and, because the scanning is conservative, pinned. You only have to take extra steps to tell the GC about something when you're going out of your way to go around the GC.
> 
> It's simply not true that to adopt a modern GC every line of code has to change.  I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite.

The opinions in this thread are that changes in the code _are_ needed.
Maybe that's not a "rewrite" you had in mind, but if we need to make
such substantial changes in many places, that's a "rewrite" in my
book.

> >And if we cannot prove to ourselves that one of the above happens,
> >then we'd need to force a copy of the variable to be on the stack?
> >
> >Does this sound practical?
> >
> >If this is the price of using MPS, and I'm not missing something
> >obvious, then it sounds like we should run away from MPS, fast.
> >Because we will sooner or later have to rewrite every single line of
> >code we ever wrote.
> 
> No, you do it by adopting a rule that when a function receives a pointer, the caller guarantees the validity of the pointer for the duration of the call. This way, only the level of the stack that spills the array to the heap has to take on the responsibility of keeping the referenced objects alive, and making the spilled array a pointer to the pinned guts of a Lisp vector is an adequate way to do this. 

We are talking about code such as this one:

	    SAFE_NALLOCA (args2, 1, nargs + 1);
	    args2[0] = Qcall_process;
	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;

"Look, ma: no pointers!"

The args[] array is fine: it comes from the caller and is valid.  The
problem being discussed here is the args2[] array, in the case where
SAFE_NALLOCA decides args2[] is too large for the stack and instead
malloc's it.  In that case, args2[] stores on the heap copies of the
objects in args[] (still no pointers!), and the issue is that when GC
happens (say, at some point inside Ffind_operation_coding_system), the
objects in args[] are updated by GC, but their copies in args2[] are
not.

So this code needs to be changed.

And if you look around, we have quite a lot of these in many places.

> "Oh, but won't that kill performance?"

That wasn't my primary bother.  The primary bother is the need to
modify many places.  Which is not necessarily a purely mechanical
refactoring, either.  Consider:

  static Lisp_Object last_coding;
  Lisp_Object coding = Vcoding_system_for_write;
  [...]
  last_coding = coding;
  call_some_function ();
  if (NILP (last_coding))
    do_something ();

If call_some_function can trigger GC, the value of 'coding' after the
call is still valid, but the value of 'last_coding' could be garbage
(unless 'last_coding' is staticpro'd).  We have almost 200 static
Lisp_Object variables, probably not all of them staticpro'd (8 of them
inside functions, like the above example, so definitely not
staticpro'd).  So now we need to examine the uses of all of them and
either staticpro them or do something else (like move the assignment
to 'last_coding' to after call_some_function).

And we will probably find other usage patterns which trip the same
wire.

And the amazing part is not the above, it''s the fact that with all
that "unsafe" code people are using the branch for production, and
most of them report a stable Emacs.  What does that tell us? that
most, if not all, of those places are perfectly valid code, and we are
haunted by the proverbial shadow of a dwarf?  Or maybe it just makes
the job harder, because we now need to identify the places where these
bad things can _really_ happen, and fix only them?  Or something else?

This is what I'm trying to establish.  The problem is not theoretical,
it's entirely practical: where to steer our efforts of stabilizing the
branch and preparing it for landing, given the available resources.

> The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. 

If we allocate on the stack almost all the time, we will keep
allocating on the stack almost all the time.  But anyway, if you are
now saying that the code is broken, then a rewrite, and a significant
one at that, _is_ needed, right?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 23:28                       ` Daniel Colascione
@ 2025-01-06 13:26                         ` Eli Zaretskii
  2025-01-06 15:08                           ` Daniel Colascione
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-06 13:26 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: gerd.moellmann, pipcet, 75322

> From: Daniel Colascione <dancol@dancol.org>
> Cc: gerd.moellmann@gmail.com,  eliz@gnu.org,  pipcet@protonmail.com
> Date: Sun, 05 Jan 2025 18:28:59 -0500
> 
> Daniel Colascione <dancol@dancol.org> writes:
> 
> Here's a demonstration of the problem.  Run ./emacs -batch -Q --eval
> '(acos 0)'.  If you leave demo_crash to true, Emacs will abort quickly
> after we detect a use-after-free.  If you set demo_crash to false, Emacs
> will run the loop all day.

It is a well-known fact that inserting Fgarbage_collect in various
random places can cause bugs.  But expecting every Emacs C-level
hacker to write code that will endure such testing is impractical.  We
routinely let much more easily-spotted blunders slip though.  The
sheer number of subtleties and factoids you need to keep in mind when
writing safe code in Emacs is already inhumanly large.

We only get away because there are many places where GC cannot happen.
Admittedly, with the proliferation of calls into Lisp, there's less
and less of these places each year.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-06  3:57                               ` Gerd Möllmann
  2025-01-06  8:25                                 ` Gerd Möllmann
@ 2025-01-06 14:07                                 ` Eli Zaretskii
  1 sibling, 0 replies; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-06 14:07 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, 75322

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Mon, 06 Jan 2025 04:57:37 +0100
> 
> >From my POV: So we're talking about things, you want to make it
> concrete, we land in call_process, I explain why SAFE_NALLOCA is unsafe
> when used with references even with the old GC, you think references are
> on the stack because the parameter args is on stack, and I say no.
> 
> Next thing I get is a rant.

It wasn't a rant, it was an honest surprise that we'd need to go to
such lengths to accommodate MPS.

I still consider the effort to be huge and not very practical, and
even if someone volunteers to make all those changes, I'd be petrified
to think how many bugs this could introduce on the way.

> You don't even say "you're right" or "you're
> wrong", so I don't know for sure if you accept my argumentation or not.

Sorry for my style, but I usually try not to say things that add
nothing to the discussion.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-06 12:59                             ` Eli Zaretskii
@ 2025-01-06 14:48                               ` Daniel Colascione
  2025-01-06 15:12                                 ` Eli Zaretskii
  0 siblings, 1 reply; 78+ messages in thread
From: Daniel Colascione @ 2025-01-06 14:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, 75322

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sun, 05 Jan 2025 16:15:47 -0500
>> From: Daniel Colascione <dancol@dancol.org>
>> CC: pipcet@protonmail.com, 75322@debbugs.gnu.org
>> 
>> >  . an automatic variable
>> >  . a static variable that is protected by someone
>> >  . a global variable that is protected by someone
>> >  . a result of dereferencing a pointer that is somehow protected
>> >
>> >etc. etc., where "protected by someone" means that it is a descendant
>> >of some staticpro, or of some root, or...
>> 
>> Well, yeah. Every other GC program does this. Emacs can too. There's
>> no great burden: all Lisp objects get traced
>> automatically. Everything on the stack or in a register gets traced
>> automatically, and, because the scanning is conservative,
>> pinned. You only have to take extra steps to tell the GC about
>> something when you're going out of your way to go around the GC.
>> 
>> It's simply not true that to adopt a modern GC every line of code has to change.  I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite.
>
> The opinions in this thread are that changes in the code _are_ needed.
> Maybe that's not a "rewrite" you had in mind, but if we need to make
> such substantial changes in many places, that's a "rewrite" in my
> book.

I wouldn't call it a "rewrite".  If auditing the codebase for memory
safety is a "rewrite", I'm a "duck".  We're talking about a few hundred
lines of changes at the most.  Most of the work is just auditing the
code for problems.  We should be grateful Gerd has done this work
already, not "run away from MPS, fast".

>> >And if we cannot prove to ourselves that one of the above happens,
>> >then we'd need to force a copy of the variable to be on the stack?
>> >
>> >Does this sound practical?
>> >
>> >If this is the price of using MPS, and I'm not missing something
>> >obvious, then it sounds like we should run away from MPS, fast.
>> >Because we will sooner or later have to rewrite every single line of
>> >code we ever wrote.
>> 
>> No, you do it by adopting a rule that when a function receives a
>> pointer, the caller guarantees the validity of the pointer for the
>> duration of the call. This way, only the level of the stack that
>> spills the array to the heap has to take on the responsibility of
>> keeping the referenced objects alive, and making the spilled array a
>> pointer to the pinned guts of a Lisp vector is an adequate way to
>> do this.
>
> We are talking about code such as this one:
>
> 	    SAFE_NALLOCA (args2, 1, nargs + 1);
> 	    args2[0] = Qcall_process;
> 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
> 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
> 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
>
> "Look, ma: no pointers!"

      Lisp_Object val, *args2;

In the C programming language, "*" means "pointer".

> The args[] array is fine: it comes from the caller and is valid.  The
> problem being discussed here is the args2[] array, in the case where
> SAFE_NALLOCA decides args2[] is too large for the stack and instead
> malloc's it.  In that case, args2[] stores on the heap copies of the
> objects in args[] (still no pointers!), and the issue is that when GC
> happens (say, at some point inside Ffind_operation_coding_system), the
> objects in args[] are updated by GC, but their copies in args2[] are
> not.

Each Lisp_Object held across a GC must either:

1) be visible to the GC for precise marking and updating (which is the
case for all references from Lisp object to another as well as
staticpro-protected static storage), or

2) be visible to the GC for conservative marking and pinning instead
of moving, which today means the variable is on the stack.

A Lisp_Object embedded in a block of malloc-heap memory doesn't satisfy
either condition and so is unsafe.

Also, this pattern is perfectly fine:

void
foo()
{
  Lisp_Object c = Fcons(...);
  bar(&c);
}

void
bar(Lisp_Object* c)
{
  eassert(CONSP(c));
  Fgarbage_collect();
  eassert(CONSP(c));
}

because when bar() gets its pointer, it implicitly assumes that it's
*caller* has followed the rule above, which it has.

The Emacs code already works this way.  The changes we're talking about
are not anywhere near as large-scale as what you might have in mind.

> So this code needs to be changed.

The snippet you quoted above can be fixed with a one-liner --- replace
SAFE_NALLOCA with SAFE_ALLOCA_LISP.  (We have to fix
SAFE_ALLOCA_LISP_EXTRA to make it safe with the old GC, but that's not a
change that needs to happen at call sites.)

> And if you look around, we have quite a lot of these in many places.

Sounds like Gerd's spent some time hunting them down.

>> "Oh, but won't that kill performance?"
>
> That wasn't my primary bother.  The primary bother is the need to
> modify many places.  Which is not necessarily a purely mechanical
> refactoring, either.  Consider:
>
>   static Lisp_Object last_coding;
>   Lisp_Object coding = Vcoding_system_for_write;
>   [...]
>   last_coding = coding;
>   call_some_function ();
>   if (NILP (last_coding))
>     do_something ();
>
> If call_some_function can trigger GC, the value of 'coding' after the
> call is still valid, but the value of 'last_coding' could be garbage
> (unless 'last_coding' is staticpro'd).

Correct.  The fix is to move last_coding to file scope and gcpro it.
You can argue that moving this declaration around is not a "mechanical"
fix, but it's straightforward, safe, and makes the code clearer: after
the change, the static storage is at file scope where you can see it,
not buried in a function body.

> We have almost 200 static
> Lisp_Object variables, probably not all of them staticpro'd (8 of them
> inside functions, like the above example, so definitely not
> staticpro'd).  So now we need to examine the uses of all of them and
> either staticpro them or do something else (like move the assignment
> to 'last_coding' to after call_some_function).

Changing eight variables from function statics to file statics hardly
seems like a monumental effort.  The static-storage global-scope
Lisp_Object variables are probably almost all gcproed already.

Gerd has already gone a long way towards identifying the various problem
spots.  In particular, his change to move uses of SAFE_ALLOCA to the
_LISP variant in various places is good and should be applied regardless
of the readiness of the rest of the MPS branch. Even in cases where for
one reason or another the code he's changing is safe under the old GC,
the old code is fragile and can break under modification. 

> And we will probably find other usage patterns which trip the same

Yep.

> And the amazing part is not the above, it''s the fact that with all
> that "unsafe" code people are using the branch for production, and
> most of them report a stable Emacs.  What does that tell us? that
> most, if not all, of those places are perfectly valid code, and we are
> haunted by the proverbial shadow of a dwarf?  Or maybe it just makes
> the job harder, because we now need to identify the places where these
> bad things can _really_ happen, and fix only them?  Or something else?

> This is what I'm trying to establish.  The problem is not theoretical,
> it's entirely practical: where to steer our efforts of stabilizing the
> branch and preparing it for landing, given the available resources.

There are three reasons we might not see code with dubious adherence to
GC rules cause problems in practice:

1) the dubious code is unsafe only in rare edge cases (who has thousands
of call-process parameters?) or the problems are GC-timing dependent,

2) the dubious code is safe only because all the Lisp values in question
are protected by other references and the old GC is non-moving, and

3) the dubious code is safe only because the only Lisp values stored in
these malloc-heap regions talk about objects that are naturally
immortal, e.g. Qnil, and the GC doesn't move immortal values.

A case of #1 is a memory safety problem.  #2 and #3 merely mean we have
fragile code.  Sometimes it's hard to tell the difference, which is why
we should fix all three.

A moving GC like MPS imposes more stringent requirements on a program
than a non-moving one does, but the changes needed to make Emacs safe
under a moving GC are not particularly invasive and make the code safer
and more maintainable overall. Most of the changes should be one- or
few-line tweaks.

We're talking about nothing remotely close to a general rewrite.

>> The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. 
>
> If we allocate on the stack almost all the time, we will keep
> allocating on the stack almost all the time.  But anyway, if you are
> now saying that the code is broken, then a rewrite, and a significant
> one at that, _is_ needed, right?

"Almost" isn't good enough when it comes to memory safety problems and
the security issues they often cause.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-06 13:26                         ` Eli Zaretskii
@ 2025-01-06 15:08                           ` Daniel Colascione
  0 siblings, 0 replies; 78+ messages in thread
From: Daniel Colascione @ 2025-01-06 15:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, 75322



On January 6, 2025 8:26:15 AM EST, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Cc: gerd.moellmann@gmail.com,  eliz@gnu.org,  pipcet@protonmail.com
>> Date: Sun, 05 Jan 2025 18:28:59 -0500
>> 
>> Daniel Colascione <dancol@dancol.org> writes:
>> 
>> Here's a demonstration of the problem.  Run ./emacs -batch -Q --eval
>> '(acos 0)'.  If you leave demo_crash to true, Emacs will abort quickly
>> after we detect a use-after-free.  If you set demo_crash to false, Emacs
>> will run the loop all day.
>
>It is a well-known fact that inserting Fgarbage_collect in various
>random places can cause bugs. 

It's not a "random place". It's demonstrating an unsafe pattern.

> But expecting every Emacs C-level
>hacker to write code that will endure such testing is impractical.  

If someone can't write safe Emacs C code, he shouldn't be writing Emacs C code at all. It is  reasonable to expect people to follow the rules.

Most new features should be in Lisp and changes to C code should be in service of making it possible to put more logic in Lisp.

> We
>routinely let much more easily-spotted blunders slip though.  The
>sheer number of subtleties and factoids you need to keep in mind when
>writing safe code in Emacs is already inhumanly large.

Yes. It's hard enough to write C code as it is. That's why people need to write C code with clear and simple bright-line rules on memory safety. For example, instead of thinking about whether this or that function scope Lisp_Object is safe, just ban function scope Lisp_Object static variables generally. Much lower cognitive burden this way.

>We only get away because there are many places where GC cannot happen.

We also get away with a lot because the old GC is non-moving. The things we get away with are fragile whether or not the old GC being forgiving makes them technically safe.

>Admittedly, with the proliferation of calls into Lisp, there's less
>and less of these places each year.

Yes, which is why it's best to make code GC safe in general. Just like we don't have to sprinkle the register keyword around anymore, we don't have to try to get clever and avoid having to gcpro this or that thing. Protect all the things. Simple and robust.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-06 14:48                               ` Daniel Colascione
@ 2025-01-06 15:12                                 ` Eli Zaretskii
  2025-01-06 15:27                                   ` Daniel Colascione
  0 siblings, 1 reply; 78+ messages in thread
From: Eli Zaretskii @ 2025-01-06 15:12 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: gerd.moellmann, pipcet, 75322

> From: Daniel Colascione <dancol@dancol.org>
> Cc: gerd.moellmann@gmail.com,  pipcet@protonmail.com,  75322@debbugs.gnu.org
> Date: Mon, 06 Jan 2025 09:48:09 -0500
> 
> I wouldn't call it a "rewrite".  If auditing the codebase for memory
> safety is a "rewrite", I'm a "duck".  We're talking about a few hundred
> lines of changes at the most.  Most of the work is just auditing the
> code for problems.  We should be grateful Gerd has done this work
> already, not "run away from MPS, fast".

I _am_ grateful to Gerd (and Helmut, and Pip, and others who work on
this).  I also invested a significant, albeit smaller, effort on my
part into this branch.  However, the potential amount of changes still
bothers me.  I understand it doesn't bother you, so I guess we
disagree in our estimations.

> > 	    SAFE_NALLOCA (args2, 1, nargs + 1);
> > 	    args2[0] = Qcall_process;
> > 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
> > 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
> > 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
> >
> > "Look, ma: no pointers!"
> 
>       Lisp_Object val, *args2;
> 
> In the C programming language, "*" means "pointer".

Are we going to argue about pointers and arrays?

> > So this code needs to be changed.
> 
> The snippet you quoted above can be fixed with a one-liner --- replace
> SAFE_NALLOCA with SAFE_ALLOCA_LISP.

It's just one example, and there are many like it.  So that one-liner
is multiplied many times.

And then we have variations, where args[] gets text of strings or some
other similar stuff.  Etc. etc.

> > And if you look around, we have quite a lot of these in many places.
> 
> Sounds like Gerd's spent some time hunting them down.

Sure, but I'm afraid there are many more.

> > We have almost 200 static
> > Lisp_Object variables, probably not all of them staticpro'd (8 of them
> > inside functions, like the above example, so definitely not
> > staticpro'd).  So now we need to examine the uses of all of them and
> > either staticpro them or do something else (like move the assignment
> > to 'last_coding' to after call_some_function).
> 
> Changing eight variables from function statics to file statics hardly
> seems like a monumental effort.

After you found them, and after you know they should be changed, yes.
It's easy to account for the knowns; the problem is always the
unknowns.  That's why most effort estimations are inaccurate.  I
wonder what are our unknowns here, and how many of them are there.

> The static-storage global-scope
> Lisp_Object variables are probably almost all gcproed already.

Maybe.  But someone needs to verify that, right?





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-06 15:12                                 ` Eli Zaretskii
@ 2025-01-06 15:27                                   ` Daniel Colascione
  0 siblings, 0 replies; 78+ messages in thread
From: Daniel Colascione @ 2025-01-06 15:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, 75322



On January 6, 2025 10:12:53 AM EST, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Cc: gerd.moellmann@gmail.com,  pipcet@protonmail.com,  75322@debbugs.gnu.org
>> Date: Mon, 06 Jan 2025 09:48:09 -0500
>> 
>> I wouldn't call it a "rewrite".  If auditing the codebase for memory
>> safety is a "rewrite", I'm a "duck".  We're talking about a few hundred
>> lines of changes at the most.  Most of the work is just auditing the
>> code for problems.  We should be grateful Gerd has done this work
>> already, not "run away from MPS, fast".
>
>I _am_ grateful to Gerd (and Helmut, and Pip, and others who work on
>this).  I also invested a significant, albeit smaller, effort on my
>part into this branch.  However, the potential amount of changes still
>bothers me.  I understand it doesn't bother you, so I guess we
>disagree in our estimations.
>
>> > 	    SAFE_NALLOCA (args2, 1, nargs + 1);
>> > 	    args2[0] = Qcall_process;
>> > 	    for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
>> > 	    coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
>> > 	    val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
>> >
>> > "Look, ma: no pointers!"
>> 
>>       Lisp_Object val, *args2;
>> 
>> In the C programming language, "*" means "pointer".
>
>Are we going to argue about pointers and arrays?
>
>> > So this code needs to be changed.
>> 
>> The snippet you quoted above can be fixed with a one-liner --- replace
>> SAFE_NALLOCA with SAFE_ALLOCA_LISP.
>
>It's just one example, and there are many like it.  So that one-liner
>is multiplied many times.
>
>And then we have variations, where args[] gets text of strings or some
>other similar stuff.  Etc. etc.
>
>> > And if you look around, we have quite a lot of these in many places.
>> 
>> Sounds like Gerd's spent some time hunting them down.
>
>Sure, but I'm afraid there are many more.
>
>> > We have almost 200 static
>> > Lisp_Object variables, probably not all of them staticpro'd (8 of them
>> > inside functions, like the above example, so definitely not
>> > staticpro'd).  So now we need to examine the uses of all of them and
>> > either staticpro them or do something else (like move the assignment
>> > to 'last_coding' to after call_some_function).
>> 
>> Changing eight variables from function statics to file statics hardly
>> seems like a monumental effort.
>
>After you found them, and after you know they should be changed, yes.
>It's easy to account for the knowns; the problem is always the
>unknowns.  That's why most effort estimations are inaccurate.  I
>wonder what are our unknowns here, and how many of them are there.

I'm a lot less worried than you are about the unknown unknowns. I have a few specific reasons why:

1) we caught most of the movement-unsafe global references when we did pdumper, which, like MPS, moves things around in memory and so has to worry about the kind of unsafe references we're discussing here.

2) when I did my own moving GC a few years ago, I didn't run into serious problems with movement-unsafe references, although, like Gerd and others have done on the MPS branch, I had to fix a few

3) it should be possible (I don't know whether MPC implements this or not, but it could) to arrange for a debugging mode moving GC that moves *everything* not pinned from the from-space to a non-overlapping to-space, then applies PROT_NONE to any part of the from-space not used for a pinned object. This way, any GC-invisible Lisp_Obiects (or other pointers) "left behind" because we forgot to tell the GC about them will produce an immediate SIGSEGV when used. We could even conservatively scan for them. 

#3 is probably overkill, but it's something we could try if we attempted to merge MPS and found chronic problems 

>> The static-storage global-scope
>> Lisp_Object variables are probably almost all gcproed already.
>
>Maybe.  But someone needs to verify that, right?

A few people have already done things like this over the years.





^ permalink raw reply	[flat|nested] 78+ messages in thread

* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-05 11:04                             ` Eli Zaretskii
@ 2025-01-06 15:54                               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2025-01-06 19:16                                 ` Gerd Möllmann
  0 siblings, 1 reply; 78+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-06 15:54 UTC (permalink / raw)
  To: Eli Zaretskii, Daniel Colascione; +Cc: gerd.moellmann, 75322

"Eli Zaretskii" <eliz@gnu.org> writes:

>> As long as you think MPS requires us to avoid GC in this case (this is
>> implied by "we must do this or that after GC"), your understanding of
>> how scratch/igc uses MPS is fundamentally incorrect.
>
> Maybe my understanding is incorrect, but this style of "discussion",
> where you are working hard to prove at all costs that I misunderstand
> the issues and to point out my mistakes, contributes nothing at all to
> understanding.  It only contributes to confusion.

I thought it was the only way for me to get you to even consider the
possibility that your ideas of how GC works with MPS may need to be
corrected.  Given your message, I currently think there is no way for me
to do so at all.

I give up.  (Certainly on this discussion, maybe on scratch/igc, who
knows.)

The MPS-based GC in scratch/igc does not allow or require general Emacs
C code to make "no GC here" assumptions.

Pip






^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
  2025-01-06 15:54                               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-06 19:16                                 ` Gerd Möllmann
  0 siblings, 0 replies; 78+ messages in thread
From: Gerd Möllmann @ 2025-01-06 19:16 UTC (permalink / raw)
  To: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  Cc: Eli Zaretskii, Daniel Colascione, Pip Cet, 75322, Emacs Devel,
	Helmut Eller

(CC to emacs-devel and Helmut)

> I give up.  (Certainly on this discussion, maybe on scratch/igc, who
> knows.)

I propose that we, i.e. you Pip, Helmut, and me, ignore the subject of
a possible merge to master in the future. That's an SEP, costs too much
energy, and what happens is unpredictable.

I'll begin with the ignoring by reverting a revert, adding Helmut's and
my patches, merging master on the weekend. Sorry for any inconvenience
that causes.



^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2025-01-06 19:16 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-03 19:55 ` Gerd Möllmann
2025-01-03 20:34   ` Gerd Möllmann
2025-01-03 20:48     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04  4:40       ` Gerd Möllmann
2025-01-04  7:57         ` Eli Zaretskii
2025-01-04  8:47           ` Gerd Möllmann
2025-01-04  9:56             ` Eli Zaretskii
2025-01-04 10:20               ` Gerd Möllmann
2025-01-05 13:30                 ` Eli Zaretskii
2025-01-05 14:11                   ` Gerd Möllmann
2025-01-05 17:45                     ` Eli Zaretskii
2025-01-05 18:17                       ` Gerd Möllmann
2025-01-05 19:07                         ` Eli Zaretskii
2025-01-05 20:04                           ` Gerd Möllmann
2025-01-05 20:24                             ` Eli Zaretskii
2025-01-06  3:57                               ` Gerd Möllmann
2025-01-06  8:25                                 ` Gerd Möllmann
2025-01-06 14:07                                 ` Eli Zaretskii
2025-01-05 21:15                           ` Daniel Colascione
2025-01-06 12:59                             ` Eli Zaretskii
2025-01-06 14:48                               ` Daniel Colascione
2025-01-06 15:12                                 ` Eli Zaretskii
2025-01-06 15:27                                   ` Daniel Colascione
2025-01-05 21:01                     ` Daniel Colascione
2025-01-05 23:28                       ` Daniel Colascione
2025-01-06 13:26                         ` Eli Zaretskii
2025-01-06 15:08                           ` Daniel Colascione
2025-01-06  4:23                       ` Gerd Möllmann
2025-01-04 11:41               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 11:29         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 12:17           ` Gerd Möllmann
2025-01-04  7:00     ` Eli Zaretskii
2025-01-04  7:17       ` Gerd Möllmann
2025-01-04  8:23         ` Eli Zaretskii
2025-01-04  8:58           ` Gerd Möllmann
2025-01-04 11:08       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 13:47         ` Eli Zaretskii
2025-01-04 14:13           ` Gerd Möllmann
2025-01-04 15:26           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 15:34             ` Gerd Möllmann
2025-01-04 18:19               ` Eli Zaretskii
2025-01-04 18:35                 ` Gerd Möllmann
2025-01-04 19:10                   ` Eli Zaretskii
2025-01-04 19:24                     ` Gerd Möllmann
2025-01-04 18:02             ` Eli Zaretskii
2025-01-04 19:32               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-04 20:31                 ` Eli Zaretskii
2025-01-04 21:15                   ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05  8:23                     ` Eli Zaretskii
2025-01-05  9:04                       ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05  9:32                         ` Eli Zaretskii
2025-01-05  9:47                           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:04                             ` Eli Zaretskii
2025-01-06 15:54                               ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 19:16                                 ` Gerd Möllmann
2025-01-05  6:32 ` Gerd Möllmann
2025-01-05  6:59   ` Gerd Möllmann
2025-01-05 10:21     ` Eli Zaretskii
2025-01-05 10:30       ` Gerd Möllmann
2025-01-05 10:35         ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 10:45           ` Gerd Möllmann
2025-01-05 11:29         ` Eli Zaretskii
2025-01-05 11:37           ` Gerd Möllmann
2025-01-05 12:15             ` Eli Zaretskii
2025-01-05 13:21               ` Gerd Möllmann
2025-01-05 17:31                 ` Eli Zaretskii
2025-01-05 17:49                   ` Gerd Möllmann
2025-01-05 18:42                     ` Eli Zaretskii
2025-01-05 19:02                       ` Gerd Möllmann
2025-01-05  7:48   ` Eli Zaretskii
2025-01-05  8:19     ` Gerd Möllmann
2025-01-05 10:33       ` Eli Zaretskii
2025-01-05 10:40         ` Gerd Möllmann
2025-01-05 11:21           ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 11:27             ` Gerd Möllmann
2025-01-05 11:49             ` Paul Eggert
2025-01-06  6:26           ` Gerd Möllmann

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.