On January 5, 2025 9:11:08 AM EST, "Gerd Möllmann" <gerd.moellmann@gmail.com> wrote:
>Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: pipcet@protonmail.com,  75322@debbugs.gnu.org
>>> Date: Sat, 04 Jan 2025 11:20:41 +0100
>>>
>>> In callproc.c I found two: call_process and create_temp_file both use
>>> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaced
>>> with SAFE_ALLOCA_LISP.
>>
>> What are the conditions under which placing Lisp objects into
>> SAFE_NALLOCA is not safe?
>>
>> I understand that the first condition is that SAFE_NALLOCA uses
>> xmalloc instead of alloca.
>
>Right. If it doesn't use xmalloc, the references are on the C stack, and
>both old and new GC handle that by scanning the C stack.
>
>> But what are the other conditions?  Is one of them that GC could
>> happen while these Lisp objects are in the memory allocated by
>> SAFE_NALLOCA off the heap? 
>
>Yes.
>
>> IOW, if no GC happens, is that still unsafe? And if GC _can_ happen,
>> but we don't use the allocated block again, is that a problem? For
>> example, in this fragment:
>>
>>     SAFE_NALLOCA (args2, 1, nargs + 1);
>>     args2[0] = Qcall_process;
>>     for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
>>     coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
>>     val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil;
>>
>> Let's say Ffind_operation_coding_system could trigger GC.  But we
>> never again use the args2[] array after Ffind_operation_coding_system
>> returns.  Is the above still unsafe?  If so, could you tell what
>> could MPS do during GC to make this unsafe?
>
>Let me first say why I find this unsafe in the old GC, in principle. If
>we don't assume anything about the objects referenced from args2, then a
>reference in args2 may well be the only one to some object. In this
>case, the old GC would sweep it.

Gerd is right. This pattern was never safe.

> Or, the other way 'round, by using
>SAFE_NALLOCA we make an assumption. And that, from my (GCPRO) POV, needs
>a proof, or better yet some check in code.
>
>Not using args2 after Ffind_operation_coding_system above is not enough.
>It would have to be not using args2 after the GC has run. Maybe that's
>_in_ Ffind_operation_coding_system.
>
>In the new GC, with MPS, the same is true as above. An object which is
>only referenced from args2 may die.

Right, because the backing storage for args2 might be in the malloc heap, and GC doesn't scan the malloc heap.
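
To make that concrete, here is a toy mark-and-sweep sketch. It is not Emacs code: toy_object, toy_alloc, toy_gc, and survives_gc are made-up names, and the "C stack" is modeled as an explicit roots array. It only shows that an object whose sole reference lives in a malloc'd block is swept when that block isn't scanned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; none of this is Emacs internals.  */
enum { POOL_SIZE = 4 };

struct toy_object
{
  bool in_use;
  bool marked;
};

static struct toy_object pool[POOL_SIZE];

/* Hand out the first free slot in the pool, or NULL if it is full.  */
static struct toy_object *
toy_alloc (void)
{
  for (size_t i = 0; i < POOL_SIZE; i++)
    if (!pool[i].in_use)
      {
        pool[i].in_use = true;
        pool[i].marked = false;
        return &pool[i];
      }
  return NULL;
}

/* Mark what ROOTS (the pretend C stack) references, then sweep the rest.  */
static void
toy_gc (struct toy_object **roots, size_t nroots)
{
  for (size_t i = 0; i < nroots; i++)
    if (roots[i])
      roots[i]->marked = true;
  for (size_t i = 0; i < POOL_SIZE; i++)
    {
      if (pool[i].in_use && !pool[i].marked)
        pool[i].in_use = false;   /* unreachable from the roots: swept */
      pool[i].marked = false;
    }
}

/* Store the only reference to an object in a malloc'd block (args2),
   run a GC, and report whether the object is still allocated.  */
bool
survives_gc (bool args2_is_scanned)
{
  for (size_t i = 0; i < POOL_SIZE; i++)
    pool[i].in_use = pool[i].marked = false;

  struct toy_object *on_stack = toy_alloc ();
  struct toy_object **args2 = malloc (sizeof *args2);
  assert (on_stack && args2);
  args2[0] = toy_alloc ();
  assert (args2[0]);

  /* The second root models a GC that also scans the args2 block.  */
  struct toy_object *roots[2] = { on_stack, args2[0] };
  toy_gc (roots, args2_is_scanned ? 2 : 1);

  bool survived = args2[0]->in_use;
  free (args2);
  return survived;
}
```

In this model survives_gc (false) returns false and survives_gc (true) returns true, which is the whole difference between the heap path of SAFE_NALLOCA and an allocation the GC knows how to scan.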

>Additionally, objects might not die but may move, assuming that
>SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA
>makes another assumption in the MPS case: that something else prevents
>the objects from moving. Another proof or check required with my GCPRO
>hat on.

Yes. In any system, a reference the GC doesn't know about must be assumed to be garbage the instant it's created. Every object is dead unless the GC can prove it's alive.
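
The moving case can be sketched the same way. Again this is a toy, not MPS: toy_moving_gc and the two fixed-size spaces are assumptions for illustration, there are no forwarding pointers, and each root is assumed to reference a distinct object. The point is only that a root the collector knows about gets updated, while an unregistered copy of the same pointer is left pointing at dead memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy copying collector for illustration only; this is not how MPS works
   internally.  */
enum { SPACE_SIZE = 4 };

struct toy_object
{
  int payload;
};

static struct toy_object from_space[SPACE_SIZE];
static struct toy_object to_space[SPACE_SIZE];

/* Copy each object referenced from an exact root into to_space, update
   the root to the new address, then poison from_space.  ROOTS holds the
   addresses of the root variables so they can be rewritten.  */
static void
toy_moving_gc (struct toy_object **roots[], size_t nroots)
{
  assert (nroots <= SPACE_SIZE);
  size_t next = 0;
  for (size_t i = 0; i < nroots; i++)
    {
      to_space[next] = **roots[i];
      *roots[i] = &to_space[next++];
    }
  for (size_t i = 0; i < SPACE_SIZE; i++)
    from_space[i].payload = -1;   /* old copies are now garbage */
}

/* Return true if a pointer the collector was never told about no longer
   sees the object's payload after a collection.  */
bool
unregistered_reference_is_stale (void)
{
  from_space[0].payload = 42;
  struct toy_object *known = &from_space[0];    /* registered root */
  struct toy_object *unknown = &from_space[0];  /* e.g. copy in a malloc'd block */

  struct toy_object **roots[] = { &known };
  toy_moving_gc (roots, 1);

  assert (known->payload == 42);   /* the root followed the object */
  return unknown->payload != 42;   /* the hidden copy did not */
}
```

Here unregistered_reference_is_stale () returns true: the object is alive, but the reference the collector couldn't see is useless anyway.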

>> Also, in some other message you said SAFE_NALLOCA is unsafe if
>> _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA
>> allocates off the heap.  In call_process I see that we only ever put
>> Lisp objects into the memory allocated by SAFE_NALLOCA.  If that is
>> unsafe, could you tell what MPS does during GC which makes this
>> unsafe?
>
>Not sure, is the question why in MPS both pointers and Lisp_Object count
>as "references"?
>
>If it's that, it's basically the same in the old GC. For example, when
>marking the C stack, we must recognize both pointers to Lisp_Cons and
>Lisp_Objects that look like conses, which contain such a pointer.

And a third case: interior pointers. A native pointer to a Lisp object isn't necessarily pointing to the start of that object.
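
A rough sketch of what a conservative scanner has to do with each word it looks at, covering all three cases. This is a toy under loud assumptions: a fixed array of toy_cons cells stands in for the heap, TOY_TAG is a made-up low-bit tag (not Emacs's actual tagging scheme), and classify_word is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap and tag for illustration only; not Emacs's real layout.  */
enum { NOBJECTS = 8, TOY_TAG = 3 };

struct toy_cons
{
  void *car;
  void *cdr;
};

static struct toy_cons heap[NOBJECTS];

enum ref_kind { NOT_A_REF, RAW_POINTER, TAGGED_WORD, INTERIOR_POINTER };

/* Return the heap object whose storage contains address A, or NULL.  */
static struct toy_cons *
object_containing (uintptr_t a)
{
  uintptr_t base = (uintptr_t) heap;
  if (a < base || a >= base + sizeof heap)
    return NULL;
  return &heap[(a - base) / sizeof heap[0]];
}

/* Decide how word W refers to a heap object, if at all, and store the
   object's start in *OBJ.  Exact starts are checked before tagged words,
   and tagged words before general interior pointers, because in this toy
   a tagged word is also an address inside the object.  */
enum ref_kind
classify_word (uintptr_t w, struct toy_cons **obj)
{
  assert (obj != NULL);

  struct toy_cons *o = object_containing (w);
  if (o && (uintptr_t) o == w)
    {
      *obj = o;
      return RAW_POINTER;
    }

  struct toy_cons *untagged = object_containing (w - TOY_TAG);
  if (untagged && (uintptr_t) untagged == w - TOY_TAG)
    {
      *obj = untagged;
      return TAGGED_WORD;
    }

  if (o)
    {
      *obj = o;
      return INTERIOR_POINTER;
    }

  *obj = NULL;
  return NOT_A_REF;
}
```

With this, the address of heap[2] classifies as a raw pointer, that address plus TOY_TAG as a tagged word, and the address of heap[2].cdr as an interior pointer, and in all three cases the scanner has to keep heap[2] alive.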