* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) @ 2025-01-03 17:20 Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-03 19:55 ` Gerd Möllmann 2025-01-05 6:32 ` Gerd Möllmann 0 siblings, 2 replies; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-03 17:20 UTC (permalink / raw) To: 75322 process.c and callproc.c use ALLOCA to store arrays of Lisp_Object values, or arrays of pointers to string data. With the current GC, that is correct code only if it's provable that these objects are marked through some other reference to them (or we know no GC can happen), because if the last reference is hiding in a SAFE_ALLOCA'd buffer AND that buffer is not on the C stack, it will be collected. With MPS GC, it's even more important to do so, because the object might otherwise be moved, invalidating the pointer. This is a possible explanation for bug#75292. In either case, I don't immediately see that the current code would be safe. SAFE_ALLOCA doesn't call xmalloc very often: it usually uses alloca(), which results in data on the C stack, where it is visible to both garbage collectors. An alternative to fixing this bug in callproc.c and process.c (and reviewing every other use of SAFE_ALLOCA) would be to ensure that in the rare case that SAFE_ALLOCA memory is not on the stack, we'd still conservatively scan it for references. If we decide to retain the current SAFE_ALLOCA behavior, it is very important to test builds where SAFE_ALLOCA always uses xmalloc, so we have a chance to detect such bugs. Unfortunately, currently bidi.c and some other places assume that MAX_ALLOCA is "large enough" so this isn't a simple matter of defining that constant to be 0. A large stack footprint comes at a cost: it needs to be scanned during GC, but more importantly there is the risk of latent bugs because stack pages might not be accessed in the "right" order and the OS might assume that an excessively-offset access is a program bug rather than an attempt to allocate large amounts of stack; GCC on GNU/Linux currently tries to work around this issue by touching each 4096-byte page of the stack at least once, in the right order. Note that call_process already allocates more than 64 KB of stack space unconditionally (in an inner scope, but GCC hoists such allocations to the function frame). It may be a good idea not to use SAFE_ALLOCA at all in this function, unless the 64 KB allocation can be removed. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-03 19:55 ` Gerd Möllmann 2025-01-03 20:34 ` Gerd Möllmann 2025-01-05 6:32 ` Gerd Möllmann 1 sibling, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-03 19:55 UTC (permalink / raw) To: Pip Cet; +Cc: 75322 Pip Cet <pipcet@protonmail.com> writes: > process.c and callproc.c use ALLOCA to store arrays of Lisp_Object > values, or arrays of pointers to string data. > > With the current GC, that is correct code only if it's provable that > these objects are marked through some other reference to them (or we > know no GC can happen), because if the last reference is hiding in a > SAFE_ALLOCA'd buffer AND that buffer is not on the C stack, it will be > collected. > > With MPS GC, it's even more important to do so, because the object might > otherwise be moved, invalidating the pointer. This is a possible > explanation for bug#75292. > > In either case, I don't immediately see that the current code would be > safe. I agree. One should probably check why the places storing Lisp_Objects don't use SAFE_ALLOCA_LISP. If that's possible it would work for old GC and igc. The pointers to string data case probably requires adding yet another macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, possibly and exact one which would be good. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-03 19:55 ` Gerd Möllmann @ 2025-01-03 20:34 ` Gerd Möllmann 2025-01-03 20:48 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 7:00 ` Eli Zaretskii 0 siblings, 2 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-03 20:34 UTC (permalink / raw) To: Pip Cet; +Cc: 75322 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > > The pointers to string data case probably requires adding yet another > macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, > possibly and exact one which would be good. Or one does it as you did in b0a209e9204, that's of course also safe. For both old and new GC. (Don't remember if you mentioned it Pip, but old GC moves string data as well, during string compaction, should GC run). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-03 20:34 ` Gerd Möllmann @ 2025-01-03 20:48 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 4:40 ` Gerd Möllmann 2025-01-04 7:00 ` Eli Zaretskii 1 sibling, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-03 20:48 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 75322 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> >> The pointers to string data case probably requires adding yet another >> macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, >> possibly and exact one which would be good. Note that might still EFAULT if there's a memory barrier. I think. Do we really need to move all arguments to syscalls and libc functions which might use a syscall into non-MPS memory? That would be bad. And which libc functions might use a syscall? I think we can agree fprintf might, and memcpy() doesn't (note to self: destroy all evidence I ever considered making memcpy() use MMU tricks for very large buffers), but what about all the others? Maybe I'm panicking too much and fixing read/write/exec* is good enough? > Or one does it as you did in b0a209e9204, that's of course also safe. > For both old and new GC. (Don't remember if you mentioned it Pip, but > old GC moves string data as well, during string compaction, should GC > run). Ouch. Yes, I remember now. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-03 20:48 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 4:40 ` Gerd Möllmann 2025-01-04 7:57 ` Eli Zaretskii 2025-01-04 11:29 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 2 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 4:40 UTC (permalink / raw) To: Pip Cet; +Cc: 75322 Pip Cet <pipcet@protonmail.com> writes: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> >>> >>> The pointers to string data case probably requires adding yet another >>> macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, >>> possibly and exact one which would be good. > > Note that might still EFAULT if there's a memory barrier. I think. > > Do we really need to move all arguments to syscalls and libc functions > which might use a syscall into non-MPS memory? That would be bad. > > And which libc functions might use a syscall? I think we can agree > fprintf might, and memcpy() doesn't (note to self: destroy all evidence > I ever considered making memcpy() use MMU tricks for very large > buffers), but what about all the others? > > Maybe I'm panicking too much and fixing read/write/exec* is good > enough? Don't Panic! Quote from The Hitchhiker's Guide to Emacs (non-NS edition) :-). TBH, I couldn't follow your thoughts above with the EFAULT, syscalls and so on. >> Or one does it as you did in b0a209e9204, that's of course also safe. >> For both old and new GC. (Don't remember if you mentioned it Pip, but >> old GC moves string data as well, during string compaction, should GC >> run). > > Ouch. Yes, I remember now. > > Pip And today I see you reverted that commit. Is there something wrong with it? I couldn't see something wrong, and for me VALUE(no root) > VALUE(exact) VALUE(ambig). WRT Lisp_Object allocas, please tell if I should do that. (I think this all could also be done on master, but since this is there forever, I don't think it's absolutely necessary. Agree?) ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 4:40 ` Gerd Möllmann @ 2025-01-04 7:57 ` Eli Zaretskii 2025-01-04 8:47 ` Gerd Möllmann 2025-01-04 11:29 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 7:57 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > Cc: 75322@debbugs.gnu.org > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Date: Sat, 04 Jan 2025 05:40:09 +0100 > > And today I see you reverted that commit. Is there something wrong with > it? I couldn't see something wrong, and for me VALUE(no root) > > VALUE(exact) VALUE(ambig). That's my fault. I asked to post the patch and discuss it before committing, as these are delicate issues where I prefer that we all are on the same page before changing this code. > WRT Lisp_Object allocas, please tell if I should do that. Let's discuss this on a case by case basis. Not all uses of alloca are the same or have the same requirements and restrictions. > (I think this all could also be done on master, but since this is there > forever, I don't think it's absolutely necessary. Agree?) Exactly my point, which is why I think we should discuss these changes before installing them on the branch, let alone on master. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 7:57 ` Eli Zaretskii @ 2025-01-04 8:47 ` Gerd Möllmann 2025-01-04 9:56 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 8:47 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> Cc: 75322@debbugs.gnu.org >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Date: Sat, 04 Jan 2025 05:40:09 +0100 >> >> And today I see you reverted that commit. Is there something wrong with >> it? I couldn't see something wrong, and for me VALUE(no root) > >> VALUE(exact) VALUE(ambig). > > That's my fault. I asked to post the patch and discuss it before > committing, as these are delicate issues where I prefer that we all > are on the same page before changing this code. Ah okay, that explains it, I missed that discussion, sorry. >> WRT Lisp_Object allocas, please tell if I should do that. > > Let's discuss this on a case by case basis. Not all uses of alloca > are the same or have the same requirements and restrictions. Okay. For the SDATA case, I find Pip's commit does the right thing. It uses xstrdup for the strings and makes sure these are freed later by registering them in one of the special specpdl entries that exist for that purpose (record_unwind_protect_ptr). Works with both GCs, is always safe, I don't see disadvantages, and we don't have to check if GC runs or not and compact strings (old) or moves string data around (igc). For the Lisp_Object cases, I strongly suspect that these cases are an oversight and were left over when SAFE_ALLOCA_LISP was introduced. The reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC): to make sure that the Lisp_Objects are marked, via specpdl stack entries again, when the specpdl stack(s) are marked. Not doing that looks definitely wrong. Replacing the other ALLOCA macros that are currently used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things for both GCs. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 8:47 ` Gerd Möllmann @ 2025-01-04 9:56 ` Eli Zaretskii 2025-01-04 10:20 ` Gerd Möllmann 2025-01-04 11:41 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 2 replies; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 9:56 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sat, 04 Jan 2025 09:47:43 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > Let's discuss this on a case by case basis. Not all uses of alloca > > are the same or have the same requirements and restrictions. > > Okay. For the SDATA case, I find Pip's commit does the right thing. It > uses xstrdup for the strings and makes sure these are freed later by > registering them in one of the special specpdl entries that exist for > that purpose (record_unwind_protect_ptr). Works with both GCs, is always > safe, I don't see disadvantages, and we don't have to check if GC runs > or not and compact strings (old) or moves string data around (igc). The disadvantage is to xstrdup strings when that is not needed. It increases memory pressure and also costs some CPU time. Not very significant disadvantage, admittedly, but still. If this is needed, it's a small price to pay, but if it isn't needed, it's a waste. So let's see if we agree that it is indeed needed. The way to do that is to explain how GC could happen while the code which assembles the environment in make_environment_block and the code in emacs_spawn afterwards run. Note that we block SIGCHLD and block input while emacs_spawn runs. > For the Lisp_Object cases, I strongly suspect that these cases are an > oversight and were left over when SAFE_ALLOCA_LISP was introduced. The > reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC): > to make sure that the Lisp_Objects are marked, via specpdl stack entries > again, when the specpdl stack(s) are marked. Not doing that looks > definitely wrong. Replacing the other ALLOCA macros that are currently > used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things > for both GCs. Can we first identify those cases, so that we could discuss the specific codes in question? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 9:56 ` Eli Zaretskii @ 2025-01-04 10:20 ` Gerd Möllmann 2025-01-05 13:30 ` Eli Zaretskii 2025-01-04 11:41 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 10:20 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sat, 04 Jan 2025 09:47:43 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > Let's discuss this on a case by case basis. Not all uses of alloca >> > are the same or have the same requirements and restrictions. >> >> Okay. For the SDATA case, I find Pip's commit does the right thing. It >> uses xstrdup for the strings and makes sure these are freed later by >> registering them in one of the special specpdl entries that exist for >> that purpose (record_unwind_protect_ptr). Works with both GCs, is always >> safe, I don't see disadvantages, and we don't have to check if GC runs >> or not and compact strings (old) or moves string data around (igc). > > The disadvantage is to xstrdup strings when that is not needed. It > increases memory pressure and also costs some CPU time. Not very > significant disadvantage, admittedly, but still. If this is needed, > it's a small price to pay, but if it isn't needed, it's a waste. That's true. > > So let's see if we agree that it is indeed needed. The way to do that > is to explain how GC could happen while the code which assembles the > environment in make_environment_block and the code in emacs_spawn > afterwards run. Note that we block SIGCHLD and block input while > emacs_spawn runs. I'm sorry, but I'm afraid that's not something for me. The possible gains are so small that I don't want to spend time checking this. >> For the Lisp_Object cases, I strongly suspect that these cases are an >> oversight and were left over when SAFE_ALLOCA_LISP was introduced. The >> reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC): >> to make sure that the Lisp_Objects are marked, via specpdl stack entries >> again, when the specpdl stack(s) are marked. Not doing that looks >> definitely wrong. Replacing the other ALLOCA macros that are currently >> used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things >> for both GCs. > > Can we first identify those cases, so that we could discuss the > specific codes in question? Okay. Pip talked about callproc.c and process.c. I believe process.c is the one with the SDATA. In callproc.c I found two: call_process and create_temp_file both use SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces with SAVE_ALLOCA_LISP. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 10:20 ` Gerd Möllmann @ 2025-01-05 13:30 ` Eli Zaretskii 2025-01-05 14:11 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 13:30 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sat, 04 Jan 2025 11:20:41 +0100 > > In callproc.c I found two: call_process and create_temp_file both use > SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces > with SAVE_ALLOCA_LISP. What are the conditions under which placing Lisp objects into SAFE_NALLOCA is not safe? I understand that the first condition is that SAFE_NALLOCA uses xmalloc instead of alloca. But what are the other conditions? Is one of them that GC could happen while these Lisp objects are in the memory allocated by SAFE_NALLOCA off the heap? IOW, if no GC happen, is that still unsafe? And if GC _can_ happen, but we don't use the allocated block again, is that a problem? For example, in this fragment: SAFE_NALLOCA (args2, 1, nargs + 1); args2[0] = Qcall_process; for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; coding_systems = Ffind_operation_coding_system (nargs + 1, args2); val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; Let's say Ffind_operation_coding_system could trigger GC. But we never again use the args2[] array after Ffind_operation_coding_system returns. Is the above still unsafe? If so, could you tell what could MPS do during GC to make this unsafe? Also, in some other message you said SAFE_NALLOCA is unsafe if _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA allocates off the heap. In call_process I see that we only ever put Lisp objects into the memory allocated by SAFE_NALLOCA. If that is unsafe, could you tell what MPS does during GC which makes this unsafe? TIA ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 13:30 ` Eli Zaretskii @ 2025-01-05 14:11 ` Gerd Möllmann 2025-01-05 17:45 ` Eli Zaretskii 2025-01-05 21:01 ` Daniel Colascione 0 siblings, 2 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 14:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sat, 04 Jan 2025 11:20:41 +0100 >> >> In callproc.c I found two: call_process and create_temp_file both use >> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces >> with SAVE_ALLOCA_LISP. > > What are the conditions under which placing Lisp objects into > SAFE_NALLOCA is not safe? > > I understand that the first condition is that SAFE_NALLOCA uses > xmalloc instead of alloca. Right. If it doesn't use xmalloc, the references are on the C stack, and both old and new GC handle that by scanning the C stack. > But what are the other conditions? Is one of them that GC could > happen while these Lisp objects are in the memory allocated by > SAFE_NALLOCA off the heap? Yes. > IOW, if no GC happen, is that still unsafe? And if GC _can_ happen, > but we don't use the allocated block again, is that a problem? For > example, in this fragment: > > SAFE_NALLOCA (args2, 1, nargs + 1); > args2[0] = Qcall_process; > for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; > coding_systems = Ffind_operation_coding_system (nargs + 1, args2); > val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; > > Let's say Ffind_operation_coding_system could trigger GC. But we > never again use the args2[] array after Ffind_operation_coding_system > returns. Is the above still unsafe? If so, could you tell what > could MPS do during GC to make this unsafe? Let me first say why I find this unsafe in the old GC, in principle. If we don't assume anything about the objects referenced from args2, then a reference in args2 may well be the only one to some object. In this case, the old GC would sweep it. Or, the other way 'round, by using SAFE_NALLOCA we make an assumption. And that, from my (GCPRO) POV, needs a proof, or better yet some check in code. Not using arg2 after Ffind_operation_coding_system above is not enough. It would have to be not using args2 after the GC has run. Maybe that's _in_ Ffind_operation_coding_system. In the new GC, with MPS, the same is true as above. An object which is only referenced from args2 may die. Additionally, objects might not die but may move, assuming that SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA makes another assumption in the MPS case: that something else prevents the objects from moving. Another proof or check required with my GCPRO hat on. > > Also, in some other message you said SAFE_NALLOCA is unsafe if > _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA > allocates off the heap. In call_process I see that we only ever put > Lisp objects into the memory allocated by SAFE_NALLOCA. If that is > unsafe, could you tell what MPS does during GC which makes this > unsafe? Not sure, is the question why in MPS both pointers and Lisp_Object count as "references"? If it's that, it's basically the same in the old GC. For example, when marking the C stack, we must recognize both pointers to Lisp_Cons and Lisp_Objects that look like conses, which contain such a pointer. Which was the reason I had to add the mem_blocks, the red-black tree and so on, to be able to determine if some potential pointer can indeed be pointer to cons, conservatively. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 14:11 ` Gerd Möllmann @ 2025-01-05 17:45 ` Eli Zaretskii 2025-01-05 18:17 ` Gerd Möllmann 2025-01-05 21:01 ` Daniel Colascione 1 sibling, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 17:45 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 15:11:08 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > And if GC _can_ happen, > > but we don't use the allocated block again, is that a problem? For > > example, in this fragment: > > > > SAFE_NALLOCA (args2, 1, nargs + 1); > > args2[0] = Qcall_process; > > for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; > > coding_systems = Ffind_operation_coding_system (nargs + 1, args2); > > val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; > > > > Let's say Ffind_operation_coding_system could trigger GC. But we > > never again use the args2[] array after Ffind_operation_coding_system > > returns. Is the above still unsafe? If so, could you tell what > > could MPS do during GC to make this unsafe? > > Let me first say why I find this unsafe in the old GC, in principle. If > we don't assume anything about the objects referenced from args2, then a > reference in args2 may well be the only one to some object. In this > case, the old GC would sweep it. OK, but in most, if not all of these cases, the objects are referenced from the stack. For example, in the above fragment, the args[] array is on the stack. Right? > Not using arg2 after Ffind_operation_coding_system above is not enough. > It would have to be not using args2 after the GC has run. Maybe that's > _in_ Ffind_operation_coding_system. OK, agreed. > Additionally, objects might not die but may move, assuming that > SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA > makes another assumption in the MPS case: that something else prevents > the objects from moving. Another proof or check required with my GCPRO > hat on. What does it mean in detail "the object may move"? A Lisp object is a tagged pointer. Do you mean the pointer should no point to a different address, i.e. the value of a Lisp object as a number should change to still be valid? And if so, is MPS supposed to find all the copies of that value everywhere in order to update them? So if I have several variables which were all assigned a value of the same Lisp object, they all need to be updated when the object moves? > > Also, in some other message you said SAFE_NALLOCA is unsafe if > > _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA > > allocates off the heap. In call_process I see that we only ever put > > Lisp objects into the memory allocated by SAFE_NALLOCA. If that is > > unsafe, could you tell what MPS does during GC which makes this > > unsafe? > > Not sure, is the question why in MPS both pointers and Lisp_Object count > as "references"? Yes, if that's the situation. Earlier you only mentioned pointers to Lisp objects, something that happens relatively rarely. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 17:45 ` Eli Zaretskii @ 2025-01-05 18:17 ` Gerd Möllmann 2025-01-05 19:07 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 18:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 15:11:08 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > And if GC _can_ happen, >> > but we don't use the allocated block again, is that a problem? For >> > example, in this fragment: >> > >> > SAFE_NALLOCA (args2, 1, nargs + 1); >> > args2[0] = Qcall_process; >> > for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; >> > coding_systems = Ffind_operation_coding_system (nargs + 1, args2); >> > val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; >> > >> > Let's say Ffind_operation_coding_system could trigger GC. But we >> > never again use the args2[] array after Ffind_operation_coding_system >> > returns. Is the above still unsafe? If so, could you tell what >> > could MPS do during GC to make this unsafe? >> >> Let me first say why I find this unsafe in the old GC, in principle. If >> we don't assume anything about the objects referenced from args2, then a >> reference in args2 may well be the only one to some object. In this >> case, the old GC would sweep it. > > OK, but in most, if not all of these cases, the objects are referenced > from the stack. For example, in the above fragment, the args[] array > is on the stack. Right? That args is a parameter call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd, So just from this I see only args itself on the stack, not args[0], args[1] and so on. I would have to look at all callers to determine that. Not good enough in my book. > >> Not using arg2 after Ffind_operation_coding_system above is not enough. >> It would have to be not using args2 after the GC has run. Maybe that's >> _in_ Ffind_operation_coding_system. > > OK, agreed. > >> Additionally, objects might not die but may move, assuming that >> SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA >> makes another assumption in the MPS case: that something else prevents >> the objects from moving. Another proof or check required with my GCPRO >> hat on. > > What does it mean in detail "the object may move"? A Lisp object is a > tagged pointer. Do you mean the pointer should no point to a > different address, i.e. the value of a Lisp object as a number should > change to still be valid? Exactly. Unless an ambiguous reference prevents the copying that can happen. > And if so, is MPS supposed to find all the copies of that value > everywhere in order to update them? So if I have several variables > which were all assigned a value of the same Lisp object, they all need > to be updated when the object moves? Yes. MPS does that with the help of our dflt_scan and its subroutines where we call MPS_FIX2 and update the reference. >> > Also, in some other message you said SAFE_NALLOCA is unsafe if >> > _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA >> > allocates off the heap. In call_process I see that we only ever put >> > Lisp objects into the memory allocated by SAFE_NALLOCA. If that is >> > unsafe, could you tell what MPS does during GC which makes this >> > unsafe? >> >> Not sure, is the question why in MPS both pointers and Lisp_Object count >> as "references"? > > Yes, if that's the situation. Earlier you only mentioned pointers to > Lisp objects, something that happens relatively rarely. That's the case in MPS. Fixnums aside, Lisp_Object is basically also only a pointer, with some tag bits added. In that sense it's the same case. And every string contains a pointer to it's data, which I consider part of the Lisp data. And intervals are also Lisp data. The ones from enum igc_obj_type. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 18:17 ` Gerd Möllmann @ 2025-01-05 19:07 ` Eli Zaretskii 2025-01-05 20:04 ` Gerd Möllmann 2025-01-05 21:15 ` Daniel Colascione 0 siblings, 2 replies; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 19:07 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 19:17:37 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > OK, but in most, if not all of these cases, the objects are referenced > > from the stack. For example, in the above fragment, the args[] array > > is on the stack. Right? > > That args is a parameter > > call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd, > > So just from this I see only args itself on the stack, not args[0], > args[1] and so on. I would have to look at all callers to determine > that. Not good enough in my book. So what, we will now need to copy every args[] into a Lisp vector created by SAFE_ALLOCA_LISP, or xstrdup all of them, and do it in each and every function that gets the args[] array, all the way down to where the array is finally used (because usually we have 3 or 4 nested levels that pass args[] to one another)? That's insane! > > What does it mean in detail "the object may move"? A Lisp object is a > > tagged pointer. Do you mean the pointer should no point to a > > different address, i.e. the value of a Lisp object as a number should > > change to still be valid? > > Exactly. Unless an ambiguous reference prevents the copying that can > happen. How can we possibly make sure this works reliably and safely?? For each variable we have in every function, we will need to analyze whether the variable is . an automatic variable . a static variable that is protected by someone . a global variable that is protected by someone . a result of dereferencing a pointer that is somehow protected etc. etc., where "protected by someone" means that it is a descendant of some staticpro, or of some root, or... And if we cannot prove to ourselves that one of the above happens, then we'd need to force a copy of the variable to be on the stack? Does this sound practical? If this is the price of using MPS, and I'm not missing something obvious, then it sounds like we should run away from MPS, fast. Because we will sooner or later have to rewrite every single line of code we ever wrote. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 19:07 ` Eli Zaretskii @ 2025-01-05 20:04 ` Gerd Möllmann 2025-01-05 20:24 ` Eli Zaretskii 2025-01-05 21:15 ` Daniel Colascione 1 sibling, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 20:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 19:17:37 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > OK, but in most, if not all of these cases, the objects are referenced >> > from the stack. For example, in the above fragment, the args[] array >> > is on the stack. Right? >> >> That args is a parameter >> >> call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd, >> >> So just from this I see only args itself on the stack, not args[0], >> args[1] and so on. I would have to look at all callers to determine >> that. Not good enough in my book. > > So what, we will now need to copy every args[] into a Lisp vector > created by SAFE_ALLOCA_LISP, or xstrdup all of them, and do it in > each and every function that gets the args[] array, all the way down > to where the array is finally used (because usually we have 3 or 4 > nested levels that pass args[] to one another)? That's insane! > >> > What does it mean in detail "the object may move"? A Lisp object is a >> > tagged pointer. Do you mean the pointer should no point to a >> > different address, i.e. the value of a Lisp object as a number should >> > change to still be valid? >> >> Exactly. Unless an ambiguous reference prevents the copying that can >> happen. > > How can we possibly make sure this works reliably and safely?? For > each variable we have in every function, we will need to analyze > whether the variable is > > . an automatic variable > . a static variable that is protected by someone > . a global variable that is protected by someone > . a result of dereferencing a pointer that is somehow protected > > etc. etc., where "protected by someone" means that it is a descendant > of some staticpro, or of some root, or... > > And if we cannot prove to ourselves that one of the above happens, > then we'd need to force a copy of the variable to be on the stack? > > Does this sound practical? > > If this is the price of using MPS, and I'm not missing something > obvious, then it sounds like we should run away from MPS, fast. > Because we will sooner or later have to rewrite every single line of > code we ever wrote. I'm bowing out again. It's not worth it. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 20:04 ` Gerd Möllmann @ 2025-01-05 20:24 ` Eli Zaretskii 2025-01-06 3:57 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 20:24 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 21:04:56 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > How can we possibly make sure this works reliably and safely?? For > > each variable we have in every function, we will need to analyze > > whether the variable is > > > > . an automatic variable > > . a static variable that is protected by someone > > . a global variable that is protected by someone > > . a result of dereferencing a pointer that is somehow protected > > > > etc. etc., where "protected by someone" means that it is a descendant > > of some staticpro, or of some root, or... > > > > And if we cannot prove to ourselves that one of the above happens, > > then we'd need to force a copy of the variable to be on the stack? > > > > Does this sound practical? > > > > If this is the price of using MPS, and I'm not missing something > > obvious, then it sounds like we should run away from MPS, fast. > > Because we will sooner or later have to rewrite every single line of > > code we ever wrote. > > I'm bowing out again. It's not worth it. I don't understand why? I need to understand the implications to be able to make decisions, which are part of my job. So I ask questions, and I'm grateful for your answers, which clarify the issues for me. That I sometimes sound overwhelmed by the implications shouldn't be held against me, it's just a normal human reaction, nothing more. If I somehow sound impolite, I apologize. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 20:24 ` Eli Zaretskii @ 2025-01-06 3:57 ` Gerd Möllmann 2025-01-06 8:25 ` Gerd Möllmann 2025-01-06 14:07 ` Eli Zaretskii 0 siblings, 2 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-06 3:57 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 21:04:56 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > How can we possibly make sure this works reliably and safely?? For >> > each variable we have in every function, we will need to analyze >> > whether the variable is >> > >> > . an automatic variable >> > . a static variable that is protected by someone >> > . a global variable that is protected by someone >> > . a result of dereferencing a pointer that is somehow protected >> > >> > etc. etc., where "protected by someone" means that it is a descendant >> > of some staticpro, or of some root, or... >> > >> > And if we cannot prove to ourselves that one of the above happens, >> > then we'd need to force a copy of the variable to be on the stack? >> > >> > Does this sound practical? >> > >> > If this is the price of using MPS, and I'm not missing something >> > obvious, then it sounds like we should run away from MPS, fast. >> > Because we will sooner or later have to rewrite every single line of >> > code we ever wrote. >> >> I'm bowing out again. It's not worth it. > > I don't understand why? I need to understand the implications to be > able to make decisions, which are part of my job. So I ask questions, > and I'm grateful for your answers, which clarify the issues for me. > That I sometimes sound overwhelmed by the implications shouldn't be > held against me, it's just a normal human reaction, nothing more. I don't hold that against you, that's why I'm trying to answer questions, write stuff up, and so on, but for me your reply before this one was a leaf node in the thread. From my POV: So we're talking about things, you want to make it concrete, we land in call_process, I explain why SAFE_NALLOCA is unsafe when used with references even with the old GC, you think references are on the stack because the parameter args is on stack, and I say no. Next thing I get is a rant. You don't even say "you're right" or "you're wrong", so I don't know for sure if you accept my argumentation or not. Instead, you write something that came across here as "unreasonable, can't be true, we have to change every line of code, let's run from MPS". What should I reply to that? Nothing of course. > If I somehow sound impolite, I apologize. No worries about politeness. It wasn't impolite, and I'm not very sensible anyway. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-06 3:57 ` Gerd Möllmann @ 2025-01-06 8:25 ` Gerd Möllmann 2025-01-06 14:07 ` Eli Zaretskii 1 sibling, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-06 8:25 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Eli Zaretskii <eliz@gnu.org> writes: > >>> From: Gerd Möllmann <gerd.moellmann@gmail.com> >>> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >>> Date: Sun, 05 Jan 2025 21:04:56 +0100 >>> >>> Eli Zaretskii <eliz@gnu.org> writes: >>> >>> > How can we possibly make sure this works reliably and safely?? For >>> > each variable we have in every function, we will need to analyze >>> > whether the variable is >>> > >>> > . an automatic variable >>> > . a static variable that is protected by someone >>> > . a global variable that is protected by someone >>> > . a result of dereferencing a pointer that is somehow protected >>> > >>> > etc. etc., where "protected by someone" means that it is a descendant >>> > of some staticpro, or of some root, or... >>> > >>> > And if we cannot prove to ourselves that one of the above happens, >>> > then we'd need to force a copy of the variable to be on the stack? >>> > >>> > Does this sound practical? >>> > >>> > If this is the price of using MPS, and I'm not missing something >>> > obvious, then it sounds like we should run away from MPS, fast. >>> > Because we will sooner or later have to rewrite every single line of >>> > code we ever wrote. >>> >>> I'm bowing out again. It's not worth it. >> >> I don't understand why? I need to understand the implications to be >> able to make decisions, which are part of my job. So I ask questions, >> and I'm grateful for your answers, which clarify the issues for me. >> That I sometimes sound overwhelmed by the implications shouldn't be >> held against me, it's just a normal human reaction, nothing more. > > I don't hold that against you, that's why I'm trying to answer > questions, write stuff up, and so on, but for me your reply before this > one was a leaf node in the thread. > >> From my POV: So we're talking about things, you want to make it > concrete, we land in call_process, I explain why SAFE_NALLOCA is unsafe > when used with references even with the old GC, you think references are > on the stack because the parameter args is on stack, and I say no. > > Next thing I get is a rant. You don't even say "you're right" or "you're > wrong", so I don't know for sure if you accept my argumentation or not. > Instead, you write something that came across here as "unreasonable, > can't be true, we have to change every line of code, let's run from > MPS". > > What should I reply to that? Nothing of course. > >> If I somehow sound impolite, I apologize. > > No worries about politeness. It wasn't impolite, and I'm not very > sensible anyway. Not being sensible is true also, but s/sensible/sensitive/ :-) ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-06 3:57 ` Gerd Möllmann 2025-01-06 8:25 ` Gerd Möllmann @ 2025-01-06 14:07 ` Eli Zaretskii 1 sibling, 0 replies; 78+ messages in thread From: Eli Zaretskii @ 2025-01-06 14:07 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Mon, 06 Jan 2025 04:57:37 +0100 > > >From my POV: So we're talking about things, you want to make it > concrete, we land in call_process, I explain why SAFE_NALLOCA is unsafe > when used with references even with the old GC, you think references are > on the stack because the parameter args is on stack, and I say no. > > Next thing I get is a rant. It wasn't a rant, it was an honest surprise that we'd need to go to such lengths to accommodate MPS. I still consider the effort to be huge and not very practical, and even if someone volunteers to make all those changes, I'd be petrified to think how many bugs this could introduce on the way. > You don't even say "you're right" or "you're > wrong", so I don't know for sure if you accept my argumentation or not. Sorry for my style, but I usually try not to say things that add nothing to the discussion. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 19:07 ` Eli Zaretskii 2025-01-05 20:04 ` Gerd Möllmann @ 2025-01-05 21:15 ` Daniel Colascione 2025-01-06 12:59 ` Eli Zaretskii 1 sibling, 1 reply; 78+ messages in thread From: Daniel Colascione @ 2025-01-05 21:15 UTC (permalink / raw) To: 75322, eliz, gerd.moellmann; +Cc: pipcet On January 5, 2025 2:07:26 PM EST, Eli Zaretskii <eliz@gnu.org> wrote: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 19:17:37 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > OK, but in most, if not all of these cases, the objects are referenced >> > from the stack. For example, in the above fragment, the args[] array >> > is on the stack. Right? >> >> That args is a parameter >> >> call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd, >> >> So just from this I see only args itself on the stack, not args[0], >> args[1] and so on. I would have to look at all callers to determine >> that. Not good enough in my book. > >So what, we will now need to copy every args[] into a Lisp vector >created by SAFE_ALLOCA_LISP, or xstrdup all of them, and do it in >each and every function that gets the args[] array, all the way down >to where the array is finally used (because usually we have 3 or 4 >nested levels that pass args[] to one another)? That's insane! > >> > What does it mean in detail "the object may move"? A Lisp object is a >> > tagged pointer. Do you mean the pointer should no point to a >> > different address, i.e. the value of a Lisp object as a number should >> > change to still be valid? >> >> Exactly. Unless an ambiguous reference prevents the copying that can >> happen. > >How can we possibly make sure this works reliably and safely?? For >each variable we have in every function, we will need to analyze >whether the variable is > > . an automatic variable > . a static variable that is protected by someone > . a global variable that is protected by someone > . a result of dereferencing a pointer that is somehow protected > >etc. etc., where "protected by someone" means that it is a descendant >of some staticpro, or of some root, or... Well, yeah. Every other GC program does this. Emacs can too. There's no great burden: all Lisp objects get traced automatically. Everything on the stack or in a register gets traced automatically, and, because the scanning is conservative, pinned. You only have to take extra steps to tell the GC about something when you're going out of your way to go around the GC. It's simply not true that to adopt a modern GC every line of code has to change. I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite. >And if we cannot prove to ourselves that one of the above happens, >then we'd need to force a copy of the variable to be on the stack? > >Does this sound practical? > >If this is the price of using MPS, and I'm not missing something >obvious, then it sounds like we should run away from MPS, fast. >Because we will sooner or later have to rewrite every single line of >code we ever wrote. No, you do it by adopting a rule that when a function receives a pointer, the caller guarantees the validity of the pointer for the duration of the call. This way, only the level of the stack that spills the array to the heap has to take on the responsibility of keeping the referenced objects alive, and making the spilled array a pointer to the pinned guts of a Lisp vector is an adequate way to do this. "Oh, but won't that kill performance?" In a generational system, allocating small, short-lived objects is actually cheap ---- usually faster than mallloc. I don't recall how MPS does it, but in some garbage collectors, a gen0 allocation of this sort is literally just incrementing a pointer. Malloc has a substantial cost of its own too. Yes, these objects make a gen0 GC happen sooner. So what? Young generation GC is cheap. If you could observe a performance problem, you could solve the reference problem by maintaining a thread local linked list of spilled array blocks and teach the GC to scan them, but that's likely overkill. Anyway, every native code program anywhere that uses GC has to care about lifetimes and references. Abandoning MPS doesn't magically make the problem go away. The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 21:15 ` Daniel Colascione @ 2025-01-06 12:59 ` Eli Zaretskii 2025-01-06 14:48 ` Daniel Colascione 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-06 12:59 UTC (permalink / raw) To: Daniel Colascione; +Cc: gerd.moellmann, pipcet, 75322 > Date: Sun, 05 Jan 2025 16:15:47 -0500 > From: Daniel Colascione <dancol@dancol.org> > CC: pipcet@protonmail.com, 75322@debbugs.gnu.org > > > . an automatic variable > > . a static variable that is protected by someone > > . a global variable that is protected by someone > > . a result of dereferencing a pointer that is somehow protected > > > >etc. etc., where "protected by someone" means that it is a descendant > >of some staticpro, or of some root, or... > > Well, yeah. Every other GC program does this. Emacs can too. There's no great burden: all Lisp objects get traced automatically. Everything on the stack or in a register gets traced automatically, and, because the scanning is conservative, pinned. You only have to take extra steps to tell the GC about something when you're going out of your way to go around the GC. > > It's simply not true that to adopt a modern GC every line of code has to change. I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite. The opinions in this thread are that changes in the code _are_ needed. Maybe that's not a "rewrite" you had in mind, but if we need to make such substantial changes in many places, that's a "rewrite" in my book. > >And if we cannot prove to ourselves that one of the above happens, > >then we'd need to force a copy of the variable to be on the stack? > > > >Does this sound practical? > > > >If this is the price of using MPS, and I'm not missing something > >obvious, then it sounds like we should run away from MPS, fast. > >Because we will sooner or later have to rewrite every single line of > >code we ever wrote. > > No, you do it by adopting a rule that when a function receives a pointer, the caller guarantees the validity of the pointer for the duration of the call. This way, only the level of the stack that spills the array to the heap has to take on the responsibility of keeping the referenced objects alive, and making the spilled array a pointer to the pinned guts of a Lisp vector is an adequate way to do this. We are talking about code such as this one: SAFE_NALLOCA (args2, 1, nargs + 1); args2[0] = Qcall_process; for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; coding_systems = Ffind_operation_coding_system (nargs + 1, args2); val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; "Look, ma: no pointers!" The args[] array is fine: it comes from the caller and is valid. The problem being discussed here is the args2[] array, in the case where SAFE_NALLOCA decides args2[] is too large for the stack and instead malloc's it. In that case, args2[] stores on the heap copies of the objects in args[] (still no pointers!), and the issue is that when GC happens (say, at some point inside Ffind_operation_coding_system), the objects in args[] are updated by GC, but their copies in args2[] are not. So this code needs to be changed. And if you look around, we have quite a lot of these in many places. > "Oh, but won't that kill performance?" That wasn't my primary bother. The primary bother is the need to modify many places. Which is not necessarily a purely mechanical refactoring, either. Consider: static Lisp_Object last_coding; Lisp_Object coding = Vcoding_system_for_write; [...] last_coding = coding; call_some_function (); if (NILP (last_coding)) do_something (); If call_some_function can trigger GC, the value of 'coding' after the call is still valid, but the value of 'last_coding' could be garbage (unless 'last_coding' is staticpro'd). We have almost 200 static Lisp_Object variables, probably not all of them staticpro'd (8 of them inside functions, like the above example, so definitely not staticpro'd). So now we need to examine the uses of all of them and either staticpro them or do something else (like move the assignment to 'last_coding' to after call_some_function). And we will probably find other usage patterns which trip the same wire. And the amazing part is not the above, it''s the fact that with all that "unsafe" code people are using the branch for production, and most of them report a stable Emacs. What does that tell us? that most, if not all, of those places are perfectly valid code, and we are haunted by the proverbial shadow of a dwarf? Or maybe it just makes the job harder, because we now need to identify the places where these bad things can _really_ happen, and fix only them? Or something else? This is what I'm trying to establish. The problem is not theoretical, it's entirely practical: where to steer our efforts of stabilizing the branch and preparing it for landing, given the available resources. > The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. If we allocate on the stack almost all the time, we will keep allocating on the stack almost all the time. But anyway, if you are now saying that the code is broken, then a rewrite, and a significant one at that, _is_ needed, right? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-06 12:59 ` Eli Zaretskii @ 2025-01-06 14:48 ` Daniel Colascione 2025-01-06 15:12 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Daniel Colascione @ 2025-01-06 14:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> Date: Sun, 05 Jan 2025 16:15:47 -0500 >> From: Daniel Colascione <dancol@dancol.org> >> CC: pipcet@protonmail.com, 75322@debbugs.gnu.org >> >> > . an automatic variable >> > . a static variable that is protected by someone >> > . a global variable that is protected by someone >> > . a result of dereferencing a pointer that is somehow protected >> > >> >etc. etc., where "protected by someone" means that it is a descendant >> >of some staticpro, or of some root, or... >> >> Well, yeah. Every other GC program does this. Emacs can too. There's >> no great burden: all Lisp objects get traced >> automatically. Everything on the stack or in a register gets traced >> automatically, and, because the scanning is conservative, >> pinned. You only have to take extra steps to tell the GC about >> something when you're going out of your way to go around the GC. >> >> It's simply not true that to adopt a modern GC every line of code has to change. I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite. > > The opinions in this thread are that changes in the code _are_ needed. > Maybe that's not a "rewrite" you had in mind, but if we need to make > such substantial changes in many places, that's a "rewrite" in my > book. I wouldn't call it a "rewrite". If auditing the codebase for memory safety is a "rewrite", I'm a "duck". We're talking about a few hundred lines of changes at the most. Most of the work is just auditing the code for problems. We should be grateful Gerd has done this work already, not "run away from MPS, fast". >> >And if we cannot prove to ourselves that one of the above happens, >> >then we'd need to force a copy of the variable to be on the stack? >> > >> >Does this sound practical? >> > >> >If this is the price of using MPS, and I'm not missing something >> >obvious, then it sounds like we should run away from MPS, fast. >> >Because we will sooner or later have to rewrite every single line of >> >code we ever wrote. >> >> No, you do it by adopting a rule that when a function receives a >> pointer, the caller guarantees the validity of the pointer for the >> duration of the call. This way, only the level of the stack that >> spills the array to the heap has to take on the responsibility of >> keeping the referenced objects alive, and making the spilled array a >> pointer to the pinned guts of a Lisp vector is an adequate way to >> do this. > > We are talking about code such as this one: > > SAFE_NALLOCA (args2, 1, nargs + 1); > args2[0] = Qcall_process; > for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; > coding_systems = Ffind_operation_coding_system (nargs + 1, args2); > val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; > > "Look, ma: no pointers!" Lisp_Object val, *args2; In the C programming language, "*" means "pointer". > The args[] array is fine: it comes from the caller and is valid. The > problem being discussed here is the args2[] array, in the case where > SAFE_NALLOCA decides args2[] is too large for the stack and instead > malloc's it. In that case, args2[] stores on the heap copies of the > objects in args[] (still no pointers!), and the issue is that when GC > happens (say, at some point inside Ffind_operation_coding_system), the > objects in args[] are updated by GC, but their copies in args2[] are > not. Each Lisp_Object held across a GC must either: 1) be visible to the GC for precise marking and updating (which is the case for all references from Lisp object to another as well as staticpro-protected static storage), or 2) be visible to the GC for conservative marking and pinning instead of moving, which today means the variable is on the stack. A Lisp_Object embedded in a block of malloc-heap memory doesn't satisfy either condition and so is unsafe. Also, this pattern is perfectly fine: void foo() { Lisp_Object c = Fcons(...); bar(&c); } void bar(Lisp_Object* c) { eassert(CONSP(c)); Fgarbage_collect(); eassert(CONSP(c)); } because when bar() gets its pointer, it implicitly assumes that it's *caller* has followed the rule above, which it has. The Emacs code already works this way. The changes we're talking about are not anywhere near as large-scale as what you might have in mind. > So this code needs to be changed. The snippet you quoted above can be fixed with a one-liner --- replace SAFE_NALLOCA with SAFE_ALLOCA_LISP. (We have to fix SAFE_ALLOCA_LISP_EXTRA to make it safe with the old GC, but that's not a change that needs to happen at call sites.) > And if you look around, we have quite a lot of these in many places. Sounds like Gerd's spent some time hunting them down. >> "Oh, but won't that kill performance?" > > That wasn't my primary bother. The primary bother is the need to > modify many places. Which is not necessarily a purely mechanical > refactoring, either. Consider: > > static Lisp_Object last_coding; > Lisp_Object coding = Vcoding_system_for_write; > [...] > last_coding = coding; > call_some_function (); > if (NILP (last_coding)) > do_something (); > > If call_some_function can trigger GC, the value of 'coding' after the > call is still valid, but the value of 'last_coding' could be garbage > (unless 'last_coding' is staticpro'd). Correct. The fix is to move last_coding to file scope and gcpro it. You can argue that moving this declaration around is not a "mechanical" fix, but it's straightforward, safe, and makes the code clearer: after the change, the static storage is at file scope where you can see it, not buried in a function body. > We have almost 200 static > Lisp_Object variables, probably not all of them staticpro'd (8 of them > inside functions, like the above example, so definitely not > staticpro'd). So now we need to examine the uses of all of them and > either staticpro them or do something else (like move the assignment > to 'last_coding' to after call_some_function). Changing eight variables from function statics to file statics hardly seems like a monumental effort. The static-storage global-scope Lisp_Object variables are probably almost all gcproed already. Gerd has already gone a long way towards identifying the various problem spots. In particular, his change to move uses of SAFE_ALLOCA to the _LISP variant in various places is good and should be applied regardless of the readiness of the rest of the MPS branch. Even in cases where for one reason or another the code he's changing is safe under the old GC, the old code is fragile and can break under modification. > And we will probably find other usage patterns which trip the same Yep. > And the amazing part is not the above, it''s the fact that with all > that "unsafe" code people are using the branch for production, and > most of them report a stable Emacs. What does that tell us? that > most, if not all, of those places are perfectly valid code, and we are > haunted by the proverbial shadow of a dwarf? Or maybe it just makes > the job harder, because we now need to identify the places where these > bad things can _really_ happen, and fix only them? Or something else? > This is what I'm trying to establish. The problem is not theoretical, > it's entirely practical: where to steer our efforts of stabilizing the > branch and preparing it for landing, given the available resources. There are three reasons we might not see code with dubious adherence to GC rules cause problems in practice: 1) the dubious code is unsafe only in rare edge cases (who has thousands of call-process parameters?) or the problems are GC-timing dependent, 2) the dubious code is safe only because all the Lisp values in question are protected by other references and the old GC is non-moving, and 3) the dubious code is safe only because the only Lisp values stored in these malloc-heap regions talk about objects that are naturally immortal, e.g. Qnil, and the GC doesn't move immortal values. A case of #1 is a memory safety problem. #2 and #3 merely mean we have fragile code. Sometimes it's hard to tell the difference, which is why we should fix all three. A moving GC like MPS imposes more stringent requirements on a program than a non-moving one does, but the changes needed to make Emacs safe under a moving GC are not particularly invasive and make the code safer and more maintainable overall. Most of the changes should be one- or few-line tweaks. We're talking about nothing remotely close to a general rewrite. >> The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. > > If we allocate on the stack almost all the time, we will keep > allocating on the stack almost all the time. But anyway, if you are > now saying that the code is broken, then a rewrite, and a significant > one at that, _is_ needed, right? "Almost" isn't good enough when it comes to memory safety problems and the security issues they often cause. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-06 14:48 ` Daniel Colascione @ 2025-01-06 15:12 ` Eli Zaretskii 2025-01-06 15:27 ` Daniel Colascione 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-06 15:12 UTC (permalink / raw) To: Daniel Colascione; +Cc: gerd.moellmann, pipcet, 75322 > From: Daniel Colascione <dancol@dancol.org> > Cc: gerd.moellmann@gmail.com, pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Mon, 06 Jan 2025 09:48:09 -0500 > > I wouldn't call it a "rewrite". If auditing the codebase for memory > safety is a "rewrite", I'm a "duck". We're talking about a few hundred > lines of changes at the most. Most of the work is just auditing the > code for problems. We should be grateful Gerd has done this work > already, not "run away from MPS, fast". I _am_ grateful to Gerd (and Helmut, and Pip, and others who work on this). I also invested a significant, albeit smaller, effort on my part into this branch. However, the potential amount of changes still bothers me. I understand it doesn't bother you, so I guess we disagree in our estimations. > > SAFE_NALLOCA (args2, 1, nargs + 1); > > args2[0] = Qcall_process; > > for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; > > coding_systems = Ffind_operation_coding_system (nargs + 1, args2); > > val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; > > > > "Look, ma: no pointers!" > > Lisp_Object val, *args2; > > In the C programming language, "*" means "pointer". Are we going to argue about pointers and arrays? > > So this code needs to be changed. > > The snippet you quoted above can be fixed with a one-liner --- replace > SAFE_NALLOCA with SAFE_ALLOCA_LISP. It's just one example, and there are many like it. So that one-liner is multiplied many times. And then we have variations, where args[] gets text of strings or some other similar stuff. Etc. etc. > > And if you look around, we have quite a lot of these in many places. > > Sounds like Gerd's spent some time hunting them down. Sure, but I'm afraid there are many more. > > We have almost 200 static > > Lisp_Object variables, probably not all of them staticpro'd (8 of them > > inside functions, like the above example, so definitely not > > staticpro'd). So now we need to examine the uses of all of them and > > either staticpro them or do something else (like move the assignment > > to 'last_coding' to after call_some_function). > > Changing eight variables from function statics to file statics hardly > seems like a monumental effort. After you found them, and after you know they should be changed, yes. It's easy to account for the knowns; the problem is always the unknowns. That's why most effort estimations are inaccurate. I wonder what are our unknowns here, and how many of them are there. > The static-storage global-scope > Lisp_Object variables are probably almost all gcproed already. Maybe. But someone needs to verify that, right? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-06 15:12 ` Eli Zaretskii @ 2025-01-06 15:27 ` Daniel Colascione 0 siblings, 0 replies; 78+ messages in thread From: Daniel Colascione @ 2025-01-06 15:27 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, 75322 On January 6, 2025 10:12:53 AM EST, Eli Zaretskii <eliz@gnu.org> wrote: >> From: Daniel Colascione <dancol@dancol.org> >> Cc: gerd.moellmann@gmail.com, pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Mon, 06 Jan 2025 09:48:09 -0500 >> >> I wouldn't call it a "rewrite". If auditing the codebase for memory >> safety is a "rewrite", I'm a "duck". We're talking about a few hundred >> lines of changes at the most. Most of the work is just auditing the >> code for problems. We should be grateful Gerd has done this work >> already, not "run away from MPS, fast". > >I _am_ grateful to Gerd (and Helmut, and Pip, and others who work on >this). I also invested a significant, albeit smaller, effort on my >part into this branch. However, the potential amount of changes still >bothers me. I understand it doesn't bother you, so I guess we >disagree in our estimations. > >> > SAFE_NALLOCA (args2, 1, nargs + 1); >> > args2[0] = Qcall_process; >> > for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; >> > coding_systems = Ffind_operation_coding_system (nargs + 1, args2); >> > val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; >> > >> > "Look, ma: no pointers!" >> >> Lisp_Object val, *args2; >> >> In the C programming language, "*" means "pointer". > >Are we going to argue about pointers and arrays? > >> > So this code needs to be changed. >> >> The snippet you quoted above can be fixed with a one-liner --- replace >> SAFE_NALLOCA with SAFE_ALLOCA_LISP. > >It's just one example, and there are many like it. So that one-liner >is multiplied many times. > >And then we have variations, where args[] gets text of strings or some >other similar stuff. Etc. etc. > >> > And if you look around, we have quite a lot of these in many places. >> >> Sounds like Gerd's spent some time hunting them down. > >Sure, but I'm afraid there are many more. > >> > We have almost 200 static >> > Lisp_Object variables, probably not all of them staticpro'd (8 of them >> > inside functions, like the above example, so definitely not >> > staticpro'd). So now we need to examine the uses of all of them and >> > either staticpro them or do something else (like move the assignment >> > to 'last_coding' to after call_some_function). >> >> Changing eight variables from function statics to file statics hardly >> seems like a monumental effort. > >After you found them, and after you know they should be changed, yes. >It's easy to account for the knowns; the problem is always the >unknowns. That's why most effort estimations are inaccurate. I >wonder what are our unknowns here, and how many of them are there. I'm a lot less worried than you are about the unknown unknowns. I have a few specific reasons why: 1) we caught most of the movement-unsafe global references when we did pdumper, which, like MPS, moves things around in memory and so has to worry about the kind of unsafe references we're discussing here. 2) when I did my own moving GC a few years ago, I didn't run into serious problems with movement-unsafe references, although, like Gerd and others have done on the MPS branch, I had to fix a few 3) it should be possible (I don't know whether MPC implements this or not, but it could) to arrange for a debugging mode moving GC that moves *everything* not pinned from the from-space to a non-overlapping to-space, then applies PROT_NONE to any part of the from-space not used for a pinned object. This way, any GC-invisible Lisp_Obiects (or other pointers) "left behind" because we forgot to tell the GC about them will produce an immediate SIGSEGV when used. We could even conservatively scan for them. #3 is probably overkill, but it's something we could try if we attempted to merge MPS and found chronic problems >> The static-storage global-scope >> Lisp_Object variables are probably almost all gcproed already. > >Maybe. But someone needs to verify that, right? A few people have already done things like this over the years. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 14:11 ` Gerd Möllmann 2025-01-05 17:45 ` Eli Zaretskii @ 2025-01-05 21:01 ` Daniel Colascione 2025-01-05 23:28 ` Daniel Colascione 2025-01-06 4:23 ` Gerd Möllmann 1 sibling, 2 replies; 78+ messages in thread From: Daniel Colascione @ 2025-01-05 21:01 UTC (permalink / raw) To: 75322, gerd.moellmann, eliz; +Cc: pipcet [-- Attachment #1: Type: text/plain, Size: 3876 bytes --] On January 5, 2025 9:11:08 AM EST, "Gerd Möllmann" <gerd.moellmann@gmail.com> wrote: >Eli Zaretskii <eliz@gnu.org> writes: > >>> From: Gerd Möllmann <gerd.moellmann@gmail.com> >>> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >>> Date: Sat, 04 Jan 2025 11:20:41 +0100 >>> >>> In callproc.c I found two: call_process and create_temp_file both use >>> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces >>> with SAVE_ALLOCA_LISP. >> >> What are the conditions under which placing Lisp objects into >> SAFE_NALLOCA is not safe? >> >> I understand that the first condition is that SAFE_NALLOCA uses >> xmalloc instead of alloca. > >Right. If it doesn't use xmalloc, the references are on the C stack, and >both old and new GC handle that by scanning the C stack. > >> But what are the other conditions? Is one of them that GC could >> happen while these Lisp objects are in the memory allocated by >> SAFE_NALLOCA off the heap? > >Yes. > >> IOW, if no GC happen, is that still unsafe? And if GC _can_ happen, >> but we don't use the allocated block again, is that a problem? For >> example, in this fragment: >> >> SAFE_NALLOCA (args2, 1, nargs + 1); >> args2[0] = Qcall_process; >> for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; >> coding_systems = Ffind_operation_coding_system (nargs + 1, args2); >> val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; >> >> Let's say Ffind_operation_coding_system could trigger GC. But we >> never again use the args2[] array after Ffind_operation_coding_system >> returns. Is the above still unsafe? If so, could you tell what >> could MPS do during GC to make this unsafe? > >Let me first say why I find this unsafe in the old GC, in principle. If >we don't assume anything about the objects referenced from args2, then a >reference in args2 may well be the only one to some object. In this >case, the old GC would sweep it. Gerd is right. This pattern was never safe. > Or, the other way 'round, by using >SAFE_NALLOCA we make an assumption. And that, from my (GCPRO) POV, needs >a proof, or better yet some check in code. > >Not using arg2 after Ffind_operation_coding_system above is not enough. >It would have to be not using args2 after the GC has run. Maybe that's >_in_ Ffind_operation_coding_system. > >In the new GC, with MPS, the same is true as above. An object which is >only referenced from args2 may die. Right, because the backing storage for args2 might be in the mallloc heap, and GC doesn't scan the mallloc heap. >Additionally, objects might not die but may move, assuming that >SAFE_NALLOCA does not create an ambiguous root. So, using SAFE_NALLOCA >makes another assumption in the MPS case: that something else prevents >the objects from moving. Another proof or check required with my GCPRO >hat on. Yes. In any system, a reference the GC doesn't know about must be assumed to be garbage the instant it's created. Every object is dead unless the GC can prove it's alive. >> Also, in some other message you said SAFE_NALLOCA is unsafe if >> _pointers_ to Lisp objects are placed in the memory SAFE_NALLOCA >> allocates off the heap. In call_process I see that we only ever put >> Lisp objects into the memory allocated by SAFE_NALLOCA. If that is >> unsafe, could you tell what MPS does during GC which makes this >> unsafe? > >Not sure, is the question why in MPS both pointers and Lisp_Object count >as "references"? > >If it's that, it's basically the same in the old GC. For example, when >marking the C stack, we must recognize both pointers to Lisp_Cons and >Lisp_Objects that look like conses, which contain such a pointer. And a third case: interior pointers. A native pointer to a Lisp object isn't necessarily pointing to the start of that object. [-- Attachment #2: Type: text/html, Size: 4494 bytes --] ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 21:01 ` Daniel Colascione @ 2025-01-05 23:28 ` Daniel Colascione 2025-01-06 13:26 ` Eli Zaretskii 2025-01-06 4:23 ` Gerd Möllmann 1 sibling, 1 reply; 78+ messages in thread From: Daniel Colascione @ 2025-01-05 23:28 UTC (permalink / raw) To: 75322; +Cc: gerd.moellmann, eliz, pipcet Daniel Colascione <dancol@dancol.org> writes: > On January 5, 2025 9:11:08 AM EST, "Gerd Möllmann" > <gerd.moellmann@gmail.com> wrote: >>Eli Zaretskii <eliz@gnu.org> writes: >> >>>> From: Gerd Möllmann <gerd.moellmann@gmail.com> >>>> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >>>> Date: Sat, 04 Jan 2025 11:20:41 +0100 >>>> >>>> In callproc.c I found two: call_process and create_temp_file both use >>>> SAFE_NALLOCA to store Lisp_Objects. I think these should be replaces >>>> with SAVE_ALLOCA_LISP. >>> >>> What are the conditions under which placing Lisp objects into >>> SAFE_NALLOCA is not safe? >>> >>> I understand that the first condition is that SAFE_NALLOCA uses >>> xmalloc instead of alloca. >> >>Right. If it doesn't use xmalloc, the references are on the C stack, and >>both old and new GC handle that by scanning the C stack. >> >>> But what are the other conditions? Is one of them that GC could >>> happen while these Lisp objects are in the memory allocated by >>> SAFE_NALLOCA off the heap? >> >>Yes. >> >>> IOW, if no GC happen, is that still unsafe? And if GC _can_ happen, >>> but we don't use the allocated block again, is that a problem? For >>> example, in this fragment: >>> >>> SAFE_NALLOCA (args2, 1, nargs + 1); >>> args2[0] = Qcall_process; >>> for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; >>> coding_systems = Ffind_operation_coding_system (nargs + 1, args2); >>> val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; >>> >>> Let's say Ffind_operation_coding_system could trigger GC. But we >>> never again use the args2[] array after Ffind_operation_coding_system >>> returns. Is the above still unsafe? If so, could you tell what >>> could MPS do during GC to make this unsafe? >> >>Let me first say why I find this unsafe in the old GC, in principle. If >>we don't assume anything about the objects referenced from args2, then a >>reference in args2 may well be the only one to some object. In this >>case, the old GC would sweep it. > > Gerd is right. This pattern was never safe. Here's a demonstration of the problem. Run ./emacs -batch -Q --eval '(acos 0)'. If you leave demo_crash to true, Emacs will abort quickly after we detect a use-after-free. If you set demo_crash to false, Emacs will run the loop all day. diff --git a/src/floatfns.c b/src/floatfns.c index 065ae16e885..a95597beef8 100644 --- a/src/floatfns.c +++ b/src/floatfns.c @@ -50,6 +50,8 @@ Copyright (C) 1988, 1993-1994, 1999, 2001-2025 Free Software Foundation, #include <config.h> +#include <stdlib.h> + #include "lisp.h" #include "bignum.h" @@ -86,6 +88,31 @@ DEFUN ("acos", Facos, Sacos, 1, 1, 0, doc: /* Return the inverse cosine of ARG. */) (Lisp_Object arg) { + Lisp_Object* args2; + unsigned start = 12345; + unsigned counter = start; + bool demo_crash = true; + + USE_SAFE_ALLOCA; + + SAFE_NALLOCA (args2, 1, 1 + (demo_crash ? MAX_ALLOCA : 0)); + args2[0] = Fcons (make_fixnum (counter), + make_fixnum (counter + 1)); + counter += 2; + for (;;) + { + if (!FIXNUMP (XCAR (args2[0]))) + emacs_abort (); + if (XFIXNUM (XCAR (args2[0])) != 12345) + emacs_abort (); + Fcons (make_fixnum (counter), + make_fixnum (counter + 1)); + Fgarbage_collect (); + fprintf (stderr, "."); + fflush (stderr); + counter += 2; + } + double d = extract_float (arg); d = acos (d); return make_float (d); ^ permalink raw reply related [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 23:28 ` Daniel Colascione @ 2025-01-06 13:26 ` Eli Zaretskii 2025-01-06 15:08 ` Daniel Colascione 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-06 13:26 UTC (permalink / raw) To: Daniel Colascione; +Cc: gerd.moellmann, pipcet, 75322 > From: Daniel Colascione <dancol@dancol.org> > Cc: gerd.moellmann@gmail.com, eliz@gnu.org, pipcet@protonmail.com > Date: Sun, 05 Jan 2025 18:28:59 -0500 > > Daniel Colascione <dancol@dancol.org> writes: > > Here's a demonstration of the problem. Run ./emacs -batch -Q --eval > '(acos 0)'. If you leave demo_crash to true, Emacs will abort quickly > after we detect a use-after-free. If you set demo_crash to false, Emacs > will run the loop all day. It is a well-known fact that inserting Fgarbage_collect in various random places can cause bugs. But expecting every Emacs C-level hacker to write code that will endure such testing is impractical. We routinely let much more easily-spotted blunders slip though. The sheer number of subtleties and factoids you need to keep in mind when writing safe code in Emacs is already inhumanly large. We only get away because there are many places where GC cannot happen. Admittedly, with the proliferation of calls into Lisp, there's less and less of these places each year. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-06 13:26 ` Eli Zaretskii @ 2025-01-06 15:08 ` Daniel Colascione 0 siblings, 0 replies; 78+ messages in thread From: Daniel Colascione @ 2025-01-06 15:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, 75322 On January 6, 2025 8:26:15 AM EST, Eli Zaretskii <eliz@gnu.org> wrote: >> From: Daniel Colascione <dancol@dancol.org> >> Cc: gerd.moellmann@gmail.com, eliz@gnu.org, pipcet@protonmail.com >> Date: Sun, 05 Jan 2025 18:28:59 -0500 >> >> Daniel Colascione <dancol@dancol.org> writes: >> >> Here's a demonstration of the problem. Run ./emacs -batch -Q --eval >> '(acos 0)'. If you leave demo_crash to true, Emacs will abort quickly >> after we detect a use-after-free. If you set demo_crash to false, Emacs >> will run the loop all day. > >It is a well-known fact that inserting Fgarbage_collect in various >random places can cause bugs. It's not a "random place". It's demonstrating an unsafe pattern. > But expecting every Emacs C-level >hacker to write code that will endure such testing is impractical. If someone can't write safe Emacs C code, he shouldn't be writing Emacs C code at all. It is reasonable to expect people to follow the rules. Most new features should be in Lisp and changes to C code should be in service of making it possible to put more logic in Lisp. > We >routinely let much more easily-spotted blunders slip though. The >sheer number of subtleties and factoids you need to keep in mind when >writing safe code in Emacs is already inhumanly large. Yes. It's hard enough to write C code as it is. That's why people need to write C code with clear and simple bright-line rules on memory safety. For example, instead of thinking about whether this or that function scope Lisp_Object is safe, just ban function scope Lisp_Object static variables generally. Much lower cognitive burden this way. >We only get away because there are many places where GC cannot happen. We also get away with a lot because the old GC is non-moving. The things we get away with are fragile whether or not the old GC being forgiving makes them technically safe. >Admittedly, with the proliferation of calls into Lisp, there's less >and less of these places each year. Yes, which is why it's best to make code GC safe in general. Just like we don't have to sprinkle the register keyword around anymore, we don't have to try to get clever and avoid having to gcpro this or that thing. Protect all the things. Simple and robust. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 21:01 ` Daniel Colascione 2025-01-05 23:28 ` Daniel Colascione @ 2025-01-06 4:23 ` Gerd Möllmann 1 sibling, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-06 4:23 UTC (permalink / raw) To: Daniel Colascione; +Cc: pipcet, 75322, eliz Daniel Colascione <dancol@dancol.org> writes: >>If it's that, it's basically the same in the old GC. For example, when >>marking the C stack, we must recognize both pointers to Lisp_Cons and >>Lisp_Objects that look like conses, which contain such a pointer. > > And a third case: interior pointers. A native pointer to a Lisp object > isn't necessarily pointing to the start of that object. Right. Thanks for the backup, BTW :-). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 9:56 ` Eli Zaretskii 2025-01-04 10:20 ` Gerd Möllmann @ 2025-01-04 11:41 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 0 replies; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 11:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Gerd Möllmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sat, 04 Jan 2025 09:47:43 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > Let's discuss this on a case by case basis. Not all uses of alloca >> > are the same or have the same requirements and restrictions. >> >> Okay. For the SDATA case, I find Pip's commit does the right thing. It >> uses xstrdup for the strings and makes sure these are freed later by >> registering them in one of the special specpdl entries that exist for >> that purpose (record_unwind_protect_ptr). Works with both GCs, is always >> safe, I don't see disadvantages, and we don't have to check if GC runs >> or not and compact strings (old) or moves string data around (igc). > > The disadvantage is to xstrdup strings when that is not needed. It > increases memory pressure and also costs some CPU time. Not very > significant disadvantage, admittedly, but still. If this is needed, > it's a small price to pay, but if it isn't needed, it's a waste. It's easier to xstrdup rather than figure out which no-GC assumptions are in the subprocess code, which of them are valid (probably not all of them, considering Fexpand_file_name), and how to document them in a way that doesn't turn this area of code into more of a minefield than it is already :-) > So let's see if we agree that it is indeed needed. The way to do that > is to explain how GC could happen while the code which assembles the > environment in make_environment_block and the code in emacs_spawn > afterwards run. Note that we block SIGCHLD and block input while > emacs_spawn runs. I'm not going to explain "how GC could happen" in the MPS case: GC can strike at any time. For traditional GC, if we want to change the code as little as possible, we could indeed try to establish no-GC assumptions, but we don't have to: we can simply start with correct but slow code and maybe drop a note to the usual suspects that there is optimization potential here. >> For the Lisp_Object cases, I strongly suspect that these cases are an >> oversight and were left over when SAFE_ALLOCA_LISP was introduced. The >> reason for introducing SAFE_ALLOCA_LISP is, IMO, what Pip said (old GC): >> to make sure that the Lisp_Objects are marked, via specpdl stack entries >> again, when the specpdl stack(s) are marked. Not doing that looks >> definitely wrong. Replacing the other ALLOCA macros that are currently >> used for holding Lisp_Objects with SAFE_ALLOCA_LISP would solve things >> for both GCs. > > Can we first identify those cases, so that we could discuss the > specific codes in question? TBH, if there is a single bug caused by an incorrect SAFE_ALLOCA which should be a SAFE_ALLOCA_LISP, I think that's reason enough to modify SAFE_ALLOCA to conservatively scan any xmalloc'd memory it uses. No performance impact as this case is nonexistent in practice. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 4:40 ` Gerd Möllmann 2025-01-04 7:57 ` Eli Zaretskii @ 2025-01-04 11:29 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 12:17 ` Gerd Möllmann 1 sibling, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 11:29 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 75322 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Pip Cet <pipcet@protonmail.com> writes: > >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> >>> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >>> >>>> >>>> The pointers to string data case probably requires adding yet another >>>> macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, >>>> possibly and exact one which would be good. >> >> Note that might still EFAULT if there's a memory barrier. I think. >> >> Do we really need to move all arguments to syscalls and libc functions >> which might use a syscall into non-MPS memory? That would be bad. >> >> And which libc functions might use a syscall? I think we can agree >> fprintf might, and memcpy() doesn't (note to self: destroy all evidence >> I ever considered making memcpy() use MMU tricks for very large >> buffers), but what about all the others? >> >> Maybe I'm panicking too much and fixing read/write/exec* is good >> enough? > > Don't Panic! Quote from The Hitchhiker's Guide to Emacs (non-NS edition) > :-). > TBH, I couldn't follow your thoughts above with the EFAULT, syscalls and > so on. My understanding is that if there is a memory barrier in place for a string that a syscall tries to access, we get an -EFAULT from Linux, an EFAULT from glibc, and the syscall won't work. This is what makes valid_pointer_p work, for example. (To the extent it does: valid_pointer_p assumes 16 bytes after the pointer are readable; I don't see why that is true for small objects). What makes this more difficult is that glibc and GCC disagree about what to do with invalid pointers even in the simplest case: glibc documents printf ("%s\n", NULL) to work, but GCC will rewrite it into puts (NULL), which crashes. I'm worried that glibc might wrap a syscall incorrectly wrt EFAULT and SIGSEGV, in this case. Worse, if the syscall is in a fork()ed process, MPS machinery to remove the memory barrier might not be in place after the fork. And who knows about posix_spawn action descriptors? Or vfork? >>> Or one does it as you did in b0a209e9204, that's of course also safe. >>> For both old and new GC. (Don't remember if you mentioned it Pip, but >>> old GC moves string data as well, during string compaction, should GC >>> run). >> >> Ouch. Yes, I remember now. >> >> Pip > > And today I see you reverted that commit. Is there something wrong with > it? I couldn't see something wrong, and for me VALUE(no root) > > VALUE(exact) VALUE(ambig). There were two reasons for the revert: 1. Eli asked me not to push the change right after I pushed. I thought it would be best to restore the "before" state so we could discuss the solution. 2. For the non-MPS case, I rashly assumed it would be okay to remove the no-GC assumption that call_process apparently establishes (even though there is no comment saying so). I'm not sure what I would do now; the old code seems buggy to me because Fexpand_file_name can call Lisp, but that bug affects only argv, not envp. It may be best to fix the argv code but leave the envp code in its (once again) current fragile state, documenting precisely which assumptions are made there. > WRT Lisp_Object allocas, please tell if I should do that. Sorry, I don't understand. Lisp_Objects shouldn't be allocated with SAFE_ALLOCA, but allocating them with SAFE_ALLOCA_LISP_EXTRA is fine. Pointers to string data cannot currently be safely allocated with SAFE_ALLOCA, but I'm not sure whether SAFE_ALLOCA_AMBIGUOUS or SAFE_ALLOCA_EXACT_POINTER would be the right thing to do. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 11:29 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 12:17 ` Gerd Möllmann 0 siblings, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 12:17 UTC (permalink / raw) To: Pip Cet; +Cc: 75322 Pip Cet <pipcet@protonmail.com> writes: >> TBH, I couldn't follow your thoughts above with the EFAULT, syscalls and >> so on. > > My understanding is that if there is a memory barrier in place for a > string that a syscall tries to access, we get an -EFAULT from Linux, an > EFAULT from glibc, and the syscall won't work. Ah, I think I understand now, thanks. Thing is that string data is in our leaf pool (AMCZ). AMCZ doesn't use barriers because there are no references in its objects. Does that make sense? > This is what makes valid_pointer_p work, for example. (To the extent it > does: valid_pointer_p assumes 16 bytes after the pointer are readable; I > don't see why that is true for small objects). > > What makes this more difficult is that glibc and GCC disagree about what > to do with invalid pointers even in the simplest case: glibc documents > printf ("%s\n", NULL) to work, but GCC will rewrite it into puts (NULL), > which crashes. I'm worried that glibc might wrap a syscall incorrectly > wrt EFAULT and SIGSEGV, in this case. > > Worse, if the syscall is in a fork()ed process, MPS machinery to remove > the memory barrier might not be in place after the fork. And who knows > about posix_spawn action descriptors? Or vfork? > >>>> Or one does it as you did in b0a209e9204, that's of course also safe. >>>> For both old and new GC. (Don't remember if you mentioned it Pip, but >>>> old GC moves string data as well, during string compaction, should GC >>>> run). >>> >>> Ouch. Yes, I remember now. >>> >>> Pip >> >> And today I see you reverted that commit. Is there something wrong with >> it? I couldn't see something wrong, and for me VALUE(no root) > >> VALUE(exact) VALUE(ambig). > > There were two reasons for the revert: > > 1. Eli asked me not to push the change right after I pushed. I thought > it would be best to restore the "before" state so we could discuss the > solution. > > 2. For the non-MPS case, I rashly assumed it would be okay to remove the > no-GC assumption that call_process apparently establishes (even though > there is no comment saying so). I'm not sure what I would do now; the > old code seems buggy to me because Fexpand_file_name can call Lisp, but > that bug affects only argv, not envp. It may be best to fix the argv > code but leave the envp code in its (once again) current fragile state, > documenting precisely which assumptions are made there. > >> WRT Lisp_Object allocas, please tell if I should do that. > > Sorry, I don't understand. Lisp_Objects shouldn't be allocated with > SAFE_ALLOCA, but allocating them with SAFE_ALLOCA_LISP_EXTRA is fine. > Pointers to string data cannot currently be safely allocated with > SAFE_ALLOCA, but I'm not sure whether SAFE_ALLOCA_AMBIGUOUS or > SAFE_ALLOCA_EXACT_POINTER would be the right thing to do. My fault: I meant allocas used to store Lisp_Object in them, i.e. Lisp_Object * :-). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-03 20:34 ` Gerd Möllmann 2025-01-03 20:48 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 7:00 ` Eli Zaretskii 2025-01-04 7:17 ` Gerd Möllmann 2025-01-04 11:08 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 2 replies; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 7:00 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > Cc: 75322@debbugs.gnu.org > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Date: Fri, 03 Jan 2025 21:34:07 +0100 > > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > > > > > The pointers to string data case probably requires adding yet another > > macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, > > possibly and exact one which would be good. > > Or one does it as you did in b0a209e9204, that's of course also safe. > For both old and new GC. (Don't remember if you mentioned it Pip, but > old GC moves string data as well, during string compaction, should GC > run). The current code in callproc.c assumes that GC cannot run while we are parked in posix_spawn or vfork. Is that assumption false with MPS? If so, what would trigger GC during that time? Another question is about the global Lisp variables in 'globals'. For example, Vprocess_environment actually globals.f_Vprocess_environment. Is this large struct protected from GC, i.e. can GC ever decide that process-environment is not used and free it? If it's protected, where and how is it protected? And if it is protected, then any members of the list that is the value of process-environment are also protected and cannot be freed by GC. If 'globals' is not protected, I think we should protect it, no? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 7:00 ` Eli Zaretskii @ 2025-01-04 7:17 ` Gerd Möllmann 2025-01-04 8:23 ` Eli Zaretskii 2025-01-04 11:08 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 7:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Helmut Eller, pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> Cc: 75322@debbugs.gnu.org >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Date: Fri, 03 Jan 2025 21:34:07 +0100 >> >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> >> > >> > The pointers to string data case probably requires adding yet another >> > macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, >> > possibly and exact one which would be good. >> >> Or one does it as you did in b0a209e9204, that's of course also safe. >> For both old and new GC. (Don't remember if you mentioned it Pip, but >> old GC moves string data as well, during string compaction, should GC >> run). > > The current code in callproc.c assumes that GC cannot run while we are > parked in posix_spawn or vfork. Is that assumption false with MPS? > If so, what would trigger GC during that time? Okay, so it's safe with the old GC, I assume. Do you know if it is done with an inhibit_garbage_collection? Not that we run into the trap that somewhere on the way to the actual fork maybe_quit is called which can GC and so on, like we had it in the regexp engine a while back. For MPS, I don't know for sure. I have seen in passing while git grepping in their repo recently, that they write something about forking and child processes, but I don't know what they do. Maybe someone having read the code can answer that. (Added Helmut in CC). > > Another question is about the global Lisp variables in 'globals'. For > example, Vprocess_environment actually globals.f_Vprocess_environment. > Is this large struct protected from GC, i.e. can GC ever decide that > process-environment is not used and free it? If it's protected, where > and how is it protected? And if it is protected, then any members of > the list that is the value of process-environment are also protected > and cannot be freed by GC. > > If 'globals' is not protected, I think we should protect it, no? process-environment is a DEFVAR_LISP, and is root in both GCs via the staticpro mechanism (staticvec array). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 7:17 ` Gerd Möllmann @ 2025-01-04 8:23 ` Eli Zaretskii 2025-01-04 8:58 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 8:23 UTC (permalink / raw) To: Gerd Möllmann; +Cc: eller.helmut, pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org, Helmut Eller > <eller.helmut@gmail.com> > Date: Sat, 04 Jan 2025 08:17:34 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > The current code in callproc.c assumes that GC cannot run while we are > > parked in posix_spawn or vfork. Is that assumption false with MPS? > > If so, what would trigger GC during that time? > > Okay, so it's safe with the old GC, I assume. Do you know if it is done > with an inhibit_garbage_collection? No, without. Unless posix_spawn could somehow call back to us in a way that posix_spawn will later return and resume running (as opposed to via a fatal signal), how could GC happen? > Not that we run into the trap that somewhere on the way to the > actual fork maybe_quit is called which can GC and so on, like we had > it in the regexp engine a while back. This can be established by examining the code, right? > For MPS, I don't know for sure. I have seen in passing while git > grepping in their repo recently, that they write something about forking > and child processes, but I don't know what they do. Maybe someone having > read the code can answer that. (Added Helmut in CC). We are talking about vfork, not fork. In our case, the forked process runs a completely separate program, in most cases not even Emacs, so MPS has no bearing on that, I think. > > Another question is about the global Lisp variables in 'globals'. For > > example, Vprocess_environment actually globals.f_Vprocess_environment. > > Is this large struct protected from GC, i.e. can GC ever decide that > > process-environment is not used and free it? If it's protected, where > > and how is it protected? And if it is protected, then any members of > > the list that is the value of process-environment are also protected > > and cannot be freed by GC. > > > > If 'globals' is not protected, I think we should protect it, no? > > process-environment is a DEFVAR_LISP, and is root in both GCs via the > staticpro mechanism (staticvec array). OK, so it's impossible that some member of the process-environment's list will be GCed, right? AFAIU, Pip was considering a situation where some of these member strings are GCed, and then subroutines of posix_spawn use a bad address, to explain the error message Ihor reported. This seems impossible because process-environment is staticpro'd, right? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 8:23 ` Eli Zaretskii @ 2025-01-04 8:58 ` Gerd Möllmann 0 siblings, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 8:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eller.helmut, pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org, Helmut Eller >> <eller.helmut@gmail.com> >> Date: Sat, 04 Jan 2025 08:17:34 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > The current code in callproc.c assumes that GC cannot run while we are >> > parked in posix_spawn or vfork. Is that assumption false with MPS? >> > If so, what would trigger GC during that time? >> >> Okay, so it's safe with the old GC, I assume. Do you know if it is done >> with an inhibit_garbage_collection? > > No, without. Unless posix_spawn could somehow call back to us in a > way that posix_spawn will later return and resume running (as opposed > to via a fatal signal), how could GC happen? > >> Not that we run into the trap that somewhere on the way to the >> actual fork maybe_quit is called which can GC and so on, like we had >> it in the regexp engine a while back. > > This can be established by examining the code, right? In principle yes, one could check if maybe_quit is called in the call trees between the point SDATA is stored and fork happens, but I wouldn't do it because (1) someone might add a maybe_quit at some point, (2) Pip's solution is immune to the problem, and (3) it's potentially a lot of work to figure out if GC can be called in a call tree. >> For MPS, I don't know for sure. I have seen in passing while git >> grepping in their repo recently, that they write something about forking >> and child processes, but I don't know what they do. Maybe someone having >> read the code can answer that. (Added Helmut in CC). > > We are talking about vfork, not fork. In our case, the forked process > runs a completely separate program, in most cases not even Emacs, so MPS > has no bearing on that, I think. Okay. I hope Helmut and/or Pip know more about this than I do. >> > Another question is about the global Lisp variables in 'globals'. For >> > example, Vprocess_environment actually globals.f_Vprocess_environment. >> > Is this large struct protected from GC, i.e. can GC ever decide that >> > process-environment is not used and free it? If it's protected, where >> > and how is it protected? And if it is protected, then any members of >> > the list that is the value of process-environment are also protected >> > and cannot be freed by GC. >> > >> > If 'globals' is not protected, I think we should protect it, no? >> >> process-environment is a DEFVAR_LISP, and is root in both GCs via the >> staticpro mechanism (staticvec array). > > OK, so it's impossible that some member of the process-environment's > list will be GCed, right? Right. > AFAIU, Pip was considering a situation where some of these member > strings are GCed, and then subroutines of posix_spawn use a bad > address, to explain the error message Ihor reported. This seems > impossible because process-environment is staticpro'd, right? I think that scenario cannot happen in this case, right. I'd like his xstrdups better anyway, so that all becomes irrelevant for both GCs. Less to worry about :-). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 7:00 ` Eli Zaretskii 2025-01-04 7:17 ` Gerd Möllmann @ 2025-01-04 11:08 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 13:47 ` Eli Zaretskii 1 sibling, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 11:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Gerd Möllmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: >> Cc: 75322@debbugs.gnu.org >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Date: Fri, 03 Jan 2025 21:34:07 +0100 >> >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> >> > >> > The pointers to string data case probably requires adding yet another >> > macro SAFE_ALLOCA_FIND_A_GOOD_NAME, which, for MPS, allocates a root, >> > possibly and exact one which would be good. >> >> Or one does it as you did in b0a209e9204, that's of course also safe. >> For both old and new GC. (Don't remember if you mentioned it Pip, but >> old GC moves string data as well, during string compaction, should GC >> run). > > The current code in callproc.c assumes that GC cannot run while we are > parked in posix_spawn or vfork. If you're attempting to explain why the current code is safe (if you're saying it is), it assumes much more than that. call_process assumes Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the function calls Lisp, which may do anything, including modifying Vprocess_environment. Regardless of what you're saying, such assumptions need to be spelled out. Where they are made, that is, not in a utility function. Yes, make_environment_block does say its callers can't run GC, but call_process doesn't indicate when and how it establishes a no-GC assumption. > Is that assumption false with MPS? As we agreed, code should be written to assume GC can strike at any time. In the context of MPS, to make things more difficult, "GC can strike" may mean a full GC happens (moving objects) or a memory barrier is lifted. > Another question is about the global Lisp variables in 'globals'. For > example, Vprocess_environment actually globals.f_Vprocess_environment. > Is this large struct protected from GC, i.e. can GC ever decide that > process-environment is not used and free it? If it's protected, where > and how is it protected? It's a global variable. Those are protected. > And if it is protected, then any members of > the list that is the value of process-environment are also protected > and cannot be freed by GC. IIUC, Gerd explained that the old GC can still move the string *data* used in that structure, even if the string metadata stays in place. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 11:08 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 13:47 ` Eli Zaretskii 2025-01-04 14:13 ` Gerd Möllmann 2025-01-04 15:26 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 2 replies; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 13:47 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, 75322 > Date: Sat, 04 Jan 2025 11:08:50 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, 75322@debbugs.gnu.org > > "Eli Zaretskii" <eliz@gnu.org> writes: > > > The current code in callproc.c assumes that GC cannot run while we are > > parked in posix_spawn or vfork. > > If you're attempting to explain why the current code is safe (if you're > saying it is), it assumes much more than that. call_process assumes > Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the > function calls Lisp, which may do anything, including modifying > Vprocess_environment. AFAICT, expand-file-name is called before we start using the SSDATA of strings in the args[] array. Or what did I miss? > Regardless of what you're saying, such assumptions need to be spelled > out. Where they are made, that is, not in a utility function. I'm okay with spelling out these assumptions. > Yes, make_environment_block does say its callers can't run GC, but > call_process doesn't indicate when and how it establishes a no-GC > assumption. What would be needed to indicate that? > > Is that assumption false with MPS? > > As we agreed, code should be written to assume GC can strike at any > time. I don't think we agreed to that. At least I didn't, not in this general form. It would be a huge job to make all of our code comply with this. > > Another question is about the global Lisp variables in 'globals'. For > > example, Vprocess_environment actually globals.f_Vprocess_environment. > > > Is this large struct protected from GC, i.e. can GC ever decide that > > process-environment is not used and free it? If it's protected, where > > and how is it protected? > > It's a global variable. Those are protected. > > > And if it is protected, then any members of > > the list that is the value of process-environment are also protected > > and cannot be freed by GC. > > IIUC, Gerd explained that the old GC can still move the string *data* > used in that structure, even if the string metadata stays in place. If string data is moved, then accessing the old pointer would trigger the barrier and MPS will do its thing, not? To clarify, I was trying to understand whether the error message reported by Ihor in another thread could have happened because of GC in this are of the code. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 13:47 ` Eli Zaretskii @ 2025-01-04 14:13 ` Gerd Möllmann 2025-01-04 15:26 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 14:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Pip Cet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> IIUC, Gerd explained that the old GC can still move the string *data* >> used in that structure, even if the string metadata stays in place. > > If string data is moved, then accessing the old pointer would trigger > the barrier and MPS will do its thing, not? > > To clarify, I was trying to understand whether the error message > reported by Ihor in another thread could have happened because of GC > in this are of the code. So, Ihor saw something in igc. In that case, the string _data_ will never a barrier on it, because it's leaf data, i.e. data that doesn't contain references, and that is allocated from an AMCZ pool. If that helps, don't know. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 13:47 ` Eli Zaretskii 2025-01-04 14:13 ` Gerd Möllmann @ 2025-01-04 15:26 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 15:34 ` Gerd Möllmann 2025-01-04 18:02 ` Eli Zaretskii 1 sibling, 2 replies; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 15:26 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: >> Date: Sat, 04 Jan 2025 11:08:50 +0000 >> From: Pip Cet <pipcet@protonmail.com> >> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, 75322@debbugs.gnu.org >> >> "Eli Zaretskii" <eliz@gnu.org> writes: >> >> > The current code in callproc.c assumes that GC cannot run while we are >> > parked in posix_spawn or vfork. >> >> If you're attempting to explain why the current code is safe (if you're >> saying it is), it assumes much more than that. call_process assumes >> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the >> function calls Lisp, which may do anything, including modifying >> Vprocess_environment. > > AFAICT, expand-file-name is called before we start using the SSDATA of > strings in the args[] array. Or what did I miss? You're right, thanks. I got confused between args and new_argv. The next thing I'd look at is the final call to ENCODE_FILE, called after new_argv is set up; this ends up in encode_coding_object, which calls safe_funcall in what seems to me to be an unlikely but possible code path. I assume that's unsafe (the safe_ refers to redisplay, not GC, IIUC)? While maybe_quit is "safe" because it inhibits GC, I believe it can still call the debugger, which might require more memory than is available until GC is re-enabled. >> Regardless of what you're saying, such assumptions need to be spelled >> out. Where they are made, that is, not in a utility function. > > I'm okay with spelling out these assumptions. Excellent. Now we just need to establish what they are :-) >> Yes, make_environment_block does say its callers can't run GC, but >> call_process doesn't indicate when and how it establishes a no-GC >> assumption. > > What would be needed to indicate that? I'd prefer a pair of macros (no-ops in regular builds) to comments, but there is no obvious best solution here. My proposal would be to remove most (ideally, all, and then we're done) no-GC assumptions, and put the few ones that must remain into separate function bodies for the no-GC / possible-GC cases. Then we can put the no-GC function bodies into a separate section and prohibit references from no-GC to possible-GC functions, and have the linker check that. >> > Is that assumption false with MPS? >> >> As we agreed, code should be written to assume GC can strike at any >> time. > > I don't think we agreed to that. At least I didn't, not in this > general form. It would be a huge job to make all of our code comply > with this. I said "That's what you should think: GC can strike at any time", and you said "The same is true with the old GC". I misread that as agreement. >> IIUC, Gerd explained that the old GC can still move the string *data* >> used in that structure, even if the string metadata stays in place. > > If string data is moved, then accessing the old pointer would trigger > the barrier and MPS will do its thing, not? Sorry, but I think I'm confused here. IIUC, MPS doesn't currently use barriers on data that is moved (it could, so the data is copied lazily, but I don't think that's what you meant), it uses barriers on data that contains references that may not have been fixed. If a pointer to "old" data is ever exposed to Emacs, we lose, because MPS will reuse the memory for new data, which might be behind a barrier. If we ever do: static Lisp_Object unmarked; unmarked = string; ... trigger GC here ... puts (SDATA (unmarked); the most likely outcome (thanks to Gerd for the hint) is that nonsensical data is printed, because the string data was reused for another string; less likely, but possibly, the data is reused for a different pool (with barriers) and we'd get a SIGSEGV; even less likely, puts() will call write() directly and we get an EFAULT, and glibc does whatever glibc does in that case (set errno, I hope). Accessing the old pointer is never okay. MPS can't fix it, and doesn't make catching such errors easy. > To clarify, I was trying to understand whether the error message > reported by Ihor in another thread could have happened because of GC > in this are of the code. I currently think that Ihor's test case calls execve () with nonsensical "environment" strings a lot, and once in a while they'll even be behind the barrier which causes an EFAULT. GNU/Linux seems relatively forgiving about strings without "=" in the envp, for whatever reason. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 15:26 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 15:34 ` Gerd Möllmann 2025-01-04 18:19 ` Eli Zaretskii 2025-01-04 18:02 ` Eli Zaretskii 1 sibling, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 15:34 UTC (permalink / raw) To: Pip Cet; +Cc: Eli Zaretskii, 75322 Pip Cet <pipcet@protonmail.com> writes: > "Eli Zaretskii" <eliz@gnu.org> writes: > >>> Date: Sat, 04 Jan 2025 11:08:50 +0000 >>> From: Pip Cet <pipcet@protonmail.com> >>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, 75322@debbugs.gnu.org >>> >>> "Eli Zaretskii" <eliz@gnu.org> writes: >>> >>> > The current code in callproc.c assumes that GC cannot run while we are >>> > parked in posix_spawn or vfork. >>> >>> If you're attempting to explain why the current code is safe (if you're >>> saying it is), it assumes much more than that. call_process assumes >>> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the >>> function calls Lisp, which may do anything, including modifying >>> Vprocess_environment. >> >> AFAICT, expand-file-name is called before we start using the SSDATA of >> strings in the args[] array. Or what did I miss? > > You're right, thanks. I got confused between args and new_argv. > > The next thing I'd look at is the final call to ENCODE_FILE, called > after new_argv is set up; this ends up in encode_coding_object, which > calls safe_funcall in what seems to me to be an unlikely but possible > code path. I assume that's unsafe (the safe_ refers to redisplay, not > GC, IIUC)? > > While maybe_quit is "safe" because it inhibits GC, I believe it can > still call the debugger, which might require more memory than is > available until GC is re-enabled. > >>> Regardless of what you're saying, such assumptions need to be spelled >>> out. Where they are made, that is, not in a utility function. >> >> I'm okay with spelling out these assumptions. > > Excellent. Now we just need to establish what they are :-) > >>> Yes, make_environment_block does say its callers can't run GC, but >>> call_process doesn't indicate when and how it establishes a no-GC >>> assumption. >> >> What would be needed to indicate that? > > I'd prefer a pair of macros (no-ops in regular builds) to comments, but > there is no obvious best solution here. > > My proposal would be to remove most (ideally, all, and then we're done) > no-GC assumptions, and put the few ones that must remain into separate > function bodies for the no-GC / possible-GC cases. Then we can put the > no-GC function bodies into a separate section and prohibit references > from no-GC to possible-GC functions, and have the linker check that. > >>> > Is that assumption false with MPS? >>> >>> As we agreed, code should be written to assume GC can strike at any >>> time. >> >> I don't think we agreed to that. At least I didn't, not in this >> general form. It would be a huge job to make all of our code comply >> with this. > > I said "That's what you should think: GC can strike at any time", and > you said "The same is true with the old GC". I misread that as > agreement. > >>> IIUC, Gerd explained that the old GC can still move the string *data* >>> used in that structure, even if the string metadata stays in place. >> >> If string data is moved, then accessing the old pointer would trigger >> the barrier and MPS will do its thing, not? > > Sorry, but I think I'm confused here. > > IIUC, MPS doesn't currently use barriers on data that is moved (it > could, so the data is copied lazily, but I don't think that's what you > meant), it uses barriers on data that contains references that may not > have been fixed. > > If a pointer to "old" data is ever exposed to Emacs, we lose, because > MPS will reuse the memory for new data, which might be behind a barrier. > > If we ever do: > > static Lisp_Object unmarked; > unmarked = string; > ... trigger GC here ... > puts (SDATA (unmarked); > > the most likely outcome (thanks to Gerd for the hint) is that > nonsensical data is printed, because the string data was reused for > another string; less likely, but possibly, the data is reused for a > different pool (with barriers) and we'd get a SIGSEGV; even less likely, > puts() will call write() directly and we get an EFAULT, and glibc does > whatever glibc does in that case (set errno, I hope). > > Accessing the old pointer is never okay. MPS can't fix it, and doesn't > make catching such errors easy. > >> To clarify, I was trying to understand whether the error message >> reported by Ihor in another thread could have happened because of GC >> in this are of the code. > > I currently think that Ihor's test case calls execve () with nonsensical > "environment" strings a lot, and once in a while they'll even be behind > the barrier which causes an EFAULT. GNU/Linux seems relatively > forgiving about strings without "=" in the envp, for whatever reason. > > Pip I'm entering a state of confusion again, as usual in this discussion. Can we _pretty please_ just do the xstrdup thing, and forget about it? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 15:34 ` Gerd Möllmann @ 2025-01-04 18:19 ` Eli Zaretskii 2025-01-04 18:35 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 18:19 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, 75322@debbugs.gnu.org > Date: Sat, 04 Jan 2025 16:34:15 +0100 > > I'm entering a state of confusion again, as usual in this discussion. > Can we _pretty please_ just do the xstrdup thing, and forget about it? We could do that, but what does this mean for our protocol of using data from Lisp objects in libc and system calls? Does it mean we cannot use SAFE_ALLOCA? Does it mean we must always xstrdup every string and xmalloc every other object before calling some system API? Are Lisp strings safe when GC can strike, or aren't they? What about Lisp vectors? Etc. etc. We must figure out what are the safe and sound rules for doing this with MPS, like we had with the old GC. Otherwise, we will be unable to write correct and sound code, and we'll be unable to audit code written by others. Right now, it doesn't seem to me like we have a clear idea of what's permitted and what's unsafe. We are still arguing whether GC moves Lisp strings and what exactly does that mean. We still don't understand well enough what, if anything, are the problems with SAFE_ALLOCA and its ilk. We just established that ENCODE_FILE and ENCODE_SYSTEM can trigger GC, and didn't yet review the affected code. And there are many more places and calls to consider (e.g., do you know that redisplay could trigger GC when it calls character composition?). If we don't consider this stuff, if we just "do the xstrdup thing and forget about it", how can we continue cleaning up the igc branch and making its code more stable and reliable? It doesn't sound wise to me. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 18:19 ` Eli Zaretskii @ 2025-01-04 18:35 ` Gerd Möllmann 2025-01-04 19:10 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 18:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: Eli Zaretskii <eliz@gnu.org>, 75322@debbugs.gnu.org >> Date: Sat, 04 Jan 2025 16:34:15 +0100 >> >> I'm entering a state of confusion again, as usual in this discussion. >> Can we _pretty please_ just do the xstrdup thing, and forget about it? > > We could do that, but what does this mean for our protocol of using > data from Lisp objects in libc and system calls? Does it mean we > cannot use SAFE_ALLOCA? Does it mean we must always xstrdup every > string and xmalloc every other object before calling some system API? > Are Lisp strings safe when GC can strike, or aren't they? What about > Lisp vectors? Etc. etc. > > We must figure out what are the safe and sound rules for doing this > with MPS, like we had with the old GC. Otherwise, we will be unable > to write correct and sound code, and we'll be unable to audit code > written by others. > > Right now, it doesn't seem to me like we have a clear idea of what's > permitted and what's unsafe. I'd say I have a clear idea. > We are still arguing whether GC moves Lisp strings and what exactly > does that mean. We still don't understand well enough what, if > anything, are the problems with SAFE_ALLOCA and its ilk. I can't believe you say that. We talked about why xmalloc'd memory has to be a root if it contains references. SAFE_NALLOCA uses xnmalloc. Safe_ALLOCA_LISP does things differently. > We just established that ENCODE_FILE and ENCODE_SYSTEM can trigger GC, > and didn't yet review the affected code. And there are many more > places and calls to consider (e.g., do you know that redisplay could > trigger GC when it calls character composition?). > > If we don't consider this stuff, if we just "do the xstrdup thing and > forget about it", how can we continue cleaning up the igc branch and > making its code more stable and reliable? It doesn't sound wise to > me. I'll bow out of this discussion, sorry. This time for real :-). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 18:35 ` Gerd Möllmann @ 2025-01-04 19:10 ` Eli Zaretskii 2025-01-04 19:24 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 19:10 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sat, 04 Jan 2025 19:35:04 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > We are still arguing whether GC moves Lisp strings and what exactly > > does that mean. We still don't understand well enough what, if > > anything, are the problems with SAFE_ALLOCA and its ilk. > > I can't believe you say that. We talked about why xmalloc'd memory > has to be a root if it contains references. SAFE_NALLOCA uses xnmalloc. > Safe_ALLOCA_LISP does things differently. Sorry, I don't have a clear idea what that means. When can we use SAFE_NALLOCA and friends? when can we or must we use SAFE_ALLOCA_LISP? What are the considerations? etc. etc. I believe you that you have a clear idea, but we need all of us have a clear idea, and we should probably write that up somewhere once we do. > > We just established that ENCODE_FILE and ENCODE_SYSTEM can trigger GC, > > and didn't yet review the affected code. And there are many more > > places and calls to consider (e.g., do you know that redisplay could > > trigger GC when it calls character composition?). > > > > If we don't consider this stuff, if we just "do the xstrdup thing and > > forget about it", how can we continue cleaning up the igc branch and > > making its code more stable and reliable? It doesn't sound wise to > > me. > > I'll bow out of this discussion, sorry. This time for real :-). I'm sorry to hear that, because I think it's important to have you on board in this discussion. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 19:10 ` Eli Zaretskii @ 2025-01-04 19:24 ` Gerd Möllmann 0 siblings, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-04 19:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sat, 04 Jan 2025 19:35:04 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > We are still arguing whether GC moves Lisp strings and what exactly >> > does that mean. We still don't understand well enough what, if >> > anything, are the problems with SAFE_ALLOCA and its ilk. >> >> I can't believe you say that. We talked about why xmalloc'd memory >> has to be a root if it contains references. SAFE_NALLOCA uses xnmalloc. >> Safe_ALLOCA_LISP does things differently. > > Sorry, I don't have a clear idea what that means. When can we use > SAFE_NALLOCA and friends? when can we or must we use SAFE_ALLOCA_LISP? > What are the considerations? etc. etc. I tried to explain that in igc.org, e.g. what roots are, or malloc in one section. That's the best I can come up with. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 15:26 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 15:34 ` Gerd Möllmann @ 2025-01-04 18:02 ` Eli Zaretskii 2025-01-04 19:32 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 18:02 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, 75322 > Date: Sat, 04 Jan 2025 15:26:01 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org > > "Eli Zaretskii" <eliz@gnu.org> writes: > > > AFAICT, expand-file-name is called before we start using the SSDATA of > > strings in the args[] array. Or what did I miss? > > You're right, thanks. I got confused between args and new_argv. > > The next thing I'd look at is the final call to ENCODE_FILE, called > after new_argv is set up; this ends up in encode_coding_object, which > calls safe_funcall in what seems to me to be an unlikely but possible > code path. I assume that's unsafe (the safe_ refers to redisplay, not > GC, IIUC)? ENCODE_FILE can indeed trigger GC in rare cases, but I see only one such call: path = ENCODE_FILE (path); We could simply move this to before the loop that sets up new_argv[]. > While maybe_quit is "safe" because it inhibits GC, I believe it can > still call the debugger, which might require more memory than is > available until GC is re-enabled. maybe_quit is not the problem, the problem AFAIU is that any encoding could have pre-write-conversion function written in Lisp. > >> Yes, make_environment_block does say its callers can't run GC, but > >> call_process doesn't indicate when and how it establishes a no-GC > >> assumption. > > > > What would be needed to indicate that? > > I'd prefer a pair of macros (no-ops in regular builds) to comments, but > there is no obvious best solution here. Sorry, I don't understand: why macros? Do we use something like that anywhere else in Emacs? > My proposal would be to remove most (ideally, all, and then we're done) > no-GC assumptions, and put the few ones that must remain into separate > function bodies for the no-GC / possible-GC cases. Then we can put the > no-GC function bodies into a separate section and prohibit references > from no-GC to possible-GC functions, and have the linker check that. First, the techniques that rely on separate segments are non-portable, so not all platforms will support them. More importantly, I'm afraid the amount of code where we currently don't expect GC is much more than you seem to imagine, so I don't think the above is practical. In any case, the starting point is to audit all the places where GC can happen, and that needs to be done anyway if we want to do it thoroughly and proactively (as opposed to only when someone reports a bug). We can delay the decision what to do with these places to later, when we understand better the magnitude of the problem and its various aspects (in terms of several different effects that GC can have on various objects). > >> As we agreed, code should be written to assume GC can strike at any > >> time. > > > > I don't think we agreed to that. At least I didn't, not in this > > general form. It would be a huge job to make all of our code comply > > with this. > > I said "That's what you should think: GC can strike at any time", and > you said "The same is true with the old GC". I misread that as > agreement. I only meant that MPS is not very different from the old GC in this regard, that's all. > >> IIUC, Gerd explained that the old GC can still move the string *data* > >> used in that structure, even if the string metadata stays in place. > > > > If string data is moved, then accessing the old pointer would trigger > > the barrier and MPS will do its thing, not? > > Sorry, but I think I'm confused here. > > IIUC, MPS doesn't currently use barriers on data that is moved (it > could, so the data is copied lazily, but I don't think that's what you > meant), it uses barriers on data that contains references that may not > have been fixed. The safe thing is to re-initialize the pointer from the string whenever we think GC could happen, and otherwise make sure GC cannot happen. > If a pointer to "old" data is ever exposed to Emacs, we lose, because > MPS will reuse the memory for new data, which might be behind a barrier. > > If we ever do: > > static Lisp_Object unmarked; > unmarked = string; > ... trigger GC here ... > puts (SDATA (unmarked); > > the most likely outcome (thanks to Gerd for the hint) is that > nonsensical data is printed Are you sure? AFAIU, referencing 'unmarked' is safe, even after GC. The below is indeed unsafe: char *p = SDATA (unmarked); ... trigger GC here ... puts (p); > > To clarify, I was trying to understand whether the error message > > reported by Ihor in another thread could have happened because of GC > > in this are of the code. > > I currently think that Ihor's test case calls execve () with nonsensical > "environment" strings a lot, and once in a while they'll even be behind > the barrier which causes an EFAULT. Before we agree that this could be the reason, we'd need to find the places where GC could reuse the memory of a live string, while that string appears in some live data structure, and as long as no GC can happen between the time we put the SDATA pointers into the environment array and the time posix_spawn returns. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 18:02 ` Eli Zaretskii @ 2025-01-04 19:32 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 20:31 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 19:32 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: >> Date: Sat, 04 Jan 2025 15:26:01 +0000 >> From: Pip Cet <pipcet@protonmail.com> >> Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org >> >> "Eli Zaretskii" <eliz@gnu.org> writes: >> >> > AFAICT, expand-file-name is called before we start using the SSDATA of >> > strings in the args[] array. Or what did I miss? >> >> You're right, thanks. I got confused between args and new_argv. >> >> The next thing I'd look at is the final call to ENCODE_FILE, called >> after new_argv is set up; this ends up in encode_coding_object, which >> calls safe_funcall in what seems to me to be an unlikely but possible >> code path. I assume that's unsafe (the safe_ refers to redisplay, not >> GC, IIUC)? > > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one > such call: > > path = ENCODE_FILE (path); > > We could simply move this to before the loop that sets up new_argv[]. Fixing the current code for this super-rare case would be good, but my point was that we cannot prove the current code to be correct; it isn't, technically speaking, even though it's battle-worn. > >> While maybe_quit is "safe" because it inhibits GC, I believe it can >> still call the debugger, which might require more memory than is >> available until GC is re-enabled. > > maybe_quit is not the problem, the problem AFAIU is that any encoding > could have pre-write-conversion function written in Lisp. Agreed, not the problem here. >> >> Yes, make_environment_block does say its callers can't run GC, but >> >> call_process doesn't indicate when and how it establishes a no-GC >> >> assumption. >> > >> > What would be needed to indicate that? >> >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but >> there is no obvious best solution here. > > Sorry, I don't understand: why macros? Do we use something like that > anywhere else in Emacs? How is it different from other easserts? emacs_spawn has eassert (input_blocked_p ()); static_assert (NIL_IS_ZERO) was intended to be a similarly proactive macro marking the code making that assumption, but that didn't work out. >> My proposal would be to remove most (ideally, all, and then we're done) >> no-GC assumptions, and put the few ones that must remain into separate >> function bodies for the no-GC / possible-GC cases. Then we can put the >> no-GC function bodies into a separate section and prohibit references >> from no-GC to possible-GC functions, and have the linker check that. > > First, the techniques that rely on separate segments are non-portable, > so not all platforms will support them. More importantly, I'm afraid Absolutely, debug builds on the main platforms can, though. > the amount of code where we currently don't expect GC is much more > than you seem to imagine, so I don't think the above is practical. That's why I want to remove most non-GC assumptions first. I gave up the last time I tried doing this, FWIW, for the reason you describe. > In any case, the starting point is to audit all the places where GC > can happen, and that needs to be done anyway if we want to do it > thoroughly and proactively (as opposed to only when someone reports a > bug). I don't see how putting a few macros in while we're there could hurt :-) > We can delay the decision what to do with these places to > later, when we understand better the magnitude of the problem and its > various aspects (in terms of several different effects that GC can > have on various objects). Agreed, I won't post a patch for now. >> >> As we agreed, code should be written to assume GC can strike at any >> >> time. >> > >> > I don't think we agreed to that. At least I didn't, not in this >> > general form. It would be a huge job to make all of our code comply >> > with this. >> >> I said "That's what you should think: GC can strike at any time", and >> you said "The same is true with the old GC". I misread that as >> agreement. > > I only meant that MPS is not very different from the old GC in this > regard, that's all. I think I understand now, thanks! >> >> IIUC, Gerd explained that the old GC can still move the string *data* >> >> used in that structure, even if the string metadata stays in place. >> > >> > If string data is moved, then accessing the old pointer would trigger >> > the barrier and MPS will do its thing, not? >> >> Sorry, but I think I'm confused here. >> >> IIUC, MPS doesn't currently use barriers on data that is moved (it >> could, so the data is copied lazily, but I don't think that's what you >> meant), it uses barriers on data that contains references that may not >> have been fixed. > > The safe thing is to re-initialize the pointer from the string > whenever we think GC could happen, and otherwise make sure GC cannot > happen. For the old GC, yes. For the new GC, the safe thing would be to ensure MPS has removed all memory barriers, take the arena lock, then call the syscall and return. Or use xstrdup. > >> If a pointer to "old" data is ever exposed to Emacs, we lose, because >> MPS will reuse the memory for new data, which might be behind a barrier. >> >> If we ever do: >> >> static Lisp_Object unmarked; ^^^^^^ >> unmarked = string; >> ... trigger GC here ... >> puts (SDATA (unmarked); >> >> the most likely outcome (thanks to Gerd for the hint) is that >> nonsensical data is printed > > Are you sure? With the static keyword, yes. (Assuming we don't staticpro the static variable, of course). > The below is indeed unsafe: > > char *p = SDATA (unmarked); > ... trigger GC here ... > puts (p); Right now, that's safe for MPS, but not for the old GC, correct? >> > To clarify, I was trying to understand whether the error message >> > reported by Ihor in another thread could have happened because of GC >> > in this are of the code. >> >> I currently think that Ihor's test case calls execve () with nonsensical >> "environment" strings a lot, and once in a while they'll even be behind >> the barrier which causes an EFAULT. > > Before we agree that this could be the reason, we'd need to find the > places where GC could reuse the memory of a live string, while that > string appears in some live data structure, and as long as no GC can > happen between the time we put the SDATA pointers into the environment > array and the time posix_spawn returns. Calling igc_collect () right before the spawn results in corrupt data: $ gdb --ar ./src/emacs -Q --batch --eval '(while t (with-temp-buffer (shell-command "/usr/bin/env" t) (message "%S" (buffer-string))))' (gdb) b posix_spawn Breakpoint 1 at 0x535e0 (gdb) r Starting program: ... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Breakpoint 1.1, 0x00007ffff5c2b9c0 in posix_spawn () from /lib64/libc.so.6 (gdb) p igc_collect () $1 = void (gdb) p env[1] No symbol "env" in current context. (gdb) up #1 0x00005555558a805d in emacs_spawn (newpid=0x7ffffffeb148, std_in=4, std_out=6, std_err=6, argv=0x7ffffffeb0f0, envp=0x5555560bbae0, cwd=0x7fffe3acb0c8 "/home/pip/emacs-forge/emacs", pty_name=0x0, pty_in=false, pty_out=false, oldset=0x7ffffffeb280) at callproc.c:1490 1490 vfork_error = posix_spawn (&pid, argv[0], &actions, &attributes, (gdb) p envp[1] $2 = 0x7fffe3a5ded0 "\bD\034\344\377\177" (gdb) c Continuing. [Detaching after vfork from child process 5771] "PWD=/home/pip/emacs-forge/emacs SHLVL=0 _=/usr/bin/env " The old pointer pointed to an AMCZ pool (no barriers), but GC might have reassigned the memory to an AMC pool (with barriers), which would cause the EFAULT (and explain why it's so rare). Hardcoding the assumption that string data is never behind a memory barrier is a bad idea. We might want to change it back to an AMC pool (I've done so in local debug builds so I could go back from string data to the string metadata). So the only mystery left is what causes GC to be called on current scratch/igc after the environment block has been created; we know we want to change the code so GC can be triggered from another thread, but it's failing now, and there are no other threads yet! My current theory is that igc_on_grow_specpdl (called indirectly from here: /* Do the unwind-protect now, even though the pid is not known, so that no storage allocation is done in the critical section. The actual PID will be filled in during the critical section. */ record_unwind_protect (call_process_cleanup, Fcurrent_buffer ()); ) releases the arena, and MPS uses that apparent opportunity to call ArenaPoll which might do GC things. But, again, we want to make this code safe for GC at any time. (We might be able to delay mps_arena_release until the next maybe_quit, I guess) Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 19:32 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 20:31 ` Eli Zaretskii 2025-01-04 21:15 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-04 20:31 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, 75322 > Date: Sat, 04 Jan 2025 19:32:01 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org > > > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one > > such call: > > > > path = ENCODE_FILE (path); > > > > We could simply move this to before the loop that sets up new_argv[]. > > Fixing the current code for this super-rare case would be good, but my > point was that we cannot prove the current code to be correct; it isn't, > technically speaking, even though it's battle-worn. I want to be pragmatic and solve practical problems in my life time. Proving that the code is correct I leave to the next generation. > >> >> Yes, make_environment_block does say its callers can't run GC, but > >> >> call_process doesn't indicate when and how it establishes a no-GC > >> >> assumption. > >> > > >> > What would be needed to indicate that? > >> > >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but > >> there is no obvious best solution here. > > > > Sorry, I don't understand: why macros? Do we use something like that > > anywhere else in Emacs? > > How is it different from other easserts? emacs_spawn has > > eassert (input_blocked_p ()); How do you eassert something that should not happen during some code fragment? If you have something specific in mind, do show the code. > >> My proposal would be to remove most (ideally, all, and then we're done) > >> no-GC assumptions, and put the few ones that must remain into separate > >> function bodies for the no-GC / possible-GC cases. Then we can put the > >> no-GC function bodies into a separate section and prohibit references > >> from no-GC to possible-GC functions, and have the linker check that. > > > > First, the techniques that rely on separate segments are non-portable, > > so not all platforms will support them. More importantly, I'm afraid > > Absolutely, debug builds on the main platforms can, though. But then important code that is #ifdef SOME-PLATFORM cannot be handled like that, and we are back at square one, because users of "non-main platforms" still report bugs and we try to investigate and fix them, so we still need to be able to solve this when separate segments are not available. > > the amount of code where we currently don't expect GC is much more > > than you seem to imagine, so I don't think the above is practical. > > That's why I want to remove most non-GC assumptions first. I gave up > the last time I tried doing this, FWIW, for the reason you describe. > > > In any case, the starting point is to audit all the places where GC > > can happen, and that needs to be done anyway if we want to do it > > thoroughly and proactively (as opposed to only when someone reports a > > bug). > > I don't see how putting a few macros in while we're there could hurt :-) We neither saw any macros yet nor have any idea how many of them will be needed. So I don't know why you think it's just a few macros. We are still discussing a tiny static function and haven't arrived at an agreed and safe solution. Let's be a bit more practical here, okay? > > The safe thing is to re-initialize the pointer from the string > > whenever we think GC could happen, and otherwise make sure GC cannot > > happen. > > For the old GC, yes. For the new GC, the safe thing would be to ensure > MPS has removed all memory barriers, take the arena lock, then call the > syscall and return. Or use xstrdup. If this is indeed so (and I don't think it is), then we need to discuss it very thoroughly, because it basically means we cannot do anything with Lisp strings in C. For example, the display iterator has a member that is a Lisp string, which is used when we produce glyphs for display from a string (such as a display property or an overlay before/after string) -- what you say here basically means that we cannot carry that string around, right? If not, how is that different? > >> If a pointer to "old" data is ever exposed to Emacs, we lose, because > >> MPS will reuse the memory for new data, which might be behind a barrier. > >> > >> If we ever do: > >> > >> static Lisp_Object unmarked; > ^^^^^^ > >> unmarked = string; > >> ... trigger GC here ... > >> puts (SDATA (unmarked); > >> > >> the most likely outcome (thanks to Gerd for the hint) is that > >> nonsensical data is printed > > > > Are you sure? > > With the static keyword, yes. (Assuming we don't staticpro the static > variable, of course). What does static have to do with this? What matters is the value, not the storage. The value comes from 'string', a different variable. It points to string data, and it's that string data that is of interest to us in the above snippet. > > The below is indeed unsafe: > > > > char *p = SDATA (unmarked); > > ... trigger GC here ... > > puts (p); > > Right now, that's safe for MPS, but not for the old GC, correct? If GC moves string data, then it is not safe, period. Does MPS move string data? > >> > To clarify, I was trying to understand whether the error message > >> > reported by Ihor in another thread could have happened because of GC > >> > in this are of the code. > >> > >> I currently think that Ihor's test case calls execve () with nonsensical > >> "environment" strings a lot, and once in a while they'll even be behind > >> the barrier which causes an EFAULT. > > > > Before we agree that this could be the reason, we'd need to find the > > places where GC could reuse the memory of a live string, while that > > string appears in some live data structure, and as long as no GC can > > happen between the time we put the SDATA pointers into the environment > > array and the time posix_spawn returns. > > Calling igc_collect () right before the spawn results in corrupt data: But the code doesn't call igc_collect, so how is this relevant? Please see how I described the situation in the paragraph above. > So the only mystery left is what causes GC to be called on current > scratch/igc after the environment block has been created; That's the _only_ mystery, and only if that in fact happens. Someone should explain how GC _can_ happen in that case on the igc branch. If you can, please do, and let's stop arguing about theoretical possibilities. > My current theory is that igc_on_grow_specpdl (called indirectly from > here: > > /* Do the unwind-protect now, even though the pid is not known, so > that no storage allocation is done in the critical section. > The actual PID will be filled in during the critical section. */ > record_unwind_protect (call_process_cleanup, Fcurrent_buffer ()); > > ) releases the arena, and MPS uses that apparent opportunity to call > ArenaPoll which might do GC things. Evidence, please. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 20:31 ` Eli Zaretskii @ 2025-01-04 21:15 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 8:23 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-04 21:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: >> > The safe thing is to re-initialize the pointer from the string >> > whenever we think GC could happen, and otherwise make sure GC cannot >> > happen. >> >> For the old GC, yes. For the new GC, the safe thing would be to ensure >> MPS has removed all memory barriers, take the arena lock, then call the >> syscall and return. Or use xstrdup. > > If this is indeed so (and I don't think it is), then we need to Which part do you think is wrong? > discuss it very thoroughly, because it basically means we cannot do > anything with Lisp strings in C. For example, the display iterator > has a member that is a Lisp string, which is used when we produce > glyphs for display from a string (such as a display property or an > overlay before/after string) -- what you say here basically means that > we cannot carry that string around, right? If not, how is that > different? Of course we can use Lisp strings. As long as there's an automatic variable pointing to the string data, it'll stay there. If there's a static variable pointing to the string data, it might move, but then the static variable will be updated. >> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because >> >> MPS will reuse the memory for new data, which might be behind a barrier. >> >> >> >> If we ever do: >> >> >> >> static Lisp_Object unmarked; >> ^^^^^^ >> >> unmarked = string; >> >> ... trigger GC here ... >> >> puts (SDATA (unmarked); >> >> >> >> the most likely outcome (thanks to Gerd for the hint) is that >> >> nonsensical data is printed >> > >> > Are you sure? >> >> With the static keyword, yes. (Assuming we don't staticpro the static >> variable, of course). > > What does static have to do with this? What matters is whether the value is visible to the garbage collector (in which case it remains a valid pointer) or isn't (in which case the memory it points to is used for something else). Automatic variables, residing on the stack or residing in a register (which is then spilled to the stack) protect the memory they point to. Static variables don't unless we tell GC about them. > What matters is the value, not the storage. I have no idea what you mean by that. The value of the variable is a tagged pointer. It won't change during GC, because GC never alters automatic variables. The question is whether this pointer still points to the right data area after GC. Unless there happens to be another ambiguous reference to the string data (which means MPS cannot move the string data, because it cannot alter ambiguous references), an unprotected static Lisp_Object will most likely point to invalid data after MPS GC runs. > The value comes from 'string', a different variable. It Which might be in a register and not survive until GC is triggered. Or it might be a global/static variable which is in an exact root, which means the data can be moved, 'string' will be updated, 'unmarked' won't. > points to string data, and it's that string data that is of interest > to us in the above snippet. >> > The below is indeed unsafe: >> > >> > char *p = SDATA (unmarked); >> > ... trigger GC here ... >> > puts (p); >> >> Right now, that's safe for MPS, but not for the old GC, correct? > > If GC moves string data, then it is not safe, period. Does MPS move > string data? MPS does not move string data if there is a stack variable pointing to it. It does in other cases. This is why it's safe for MPS. The old GC, IIUC, is less forgiving. >> >> > To clarify, I was trying to understand whether the error message >> >> > reported by Ihor in another thread could have happened because of GC >> >> > in this are of the code. >> >> >> >> I currently think that Ihor's test case calls execve () with nonsensical >> >> "environment" strings a lot, and once in a while they'll even be behind >> >> the barrier which causes an EFAULT. >> > >> > Before we agree that this could be the reason, we'd need to find the >> > places where GC could reuse the memory of a live string, while that >> > string appears in some live data structure, and as long as no GC can >> > happen between the time we put the SDATA pointers into the environment >> > array and the time posix_spawn returns. >> >> Calling igc_collect () right before the spawn results in corrupt data: > > But the code doesn't call igc_collect, so how is this relevant? This is all that is relevant to establishing the current code is not safe for the intended behavior of scratch/igc, which is to allow garbage collection at this point, equivalent to setting a breakpoint there and calling igc_collect(). That we have a specific code path which ends up in MPS code is a bonus. I honestly don't know whether hitting a memory barrier can result in other data being moved by MPS right now. I'm assuming it can, and we should not assume string data is protected from being moved unless we root it explicitly. (Just answering the GC questions for now, sorry). Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-04 21:15 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 8:23 ` Eli Zaretskii 2025-01-05 9:04 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 8:23 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, 75322 > Date: Sat, 04 Jan 2025 21:15:50 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org > > "Eli Zaretskii" <eliz@gnu.org> writes: > > >> > The safe thing is to re-initialize the pointer from the string > >> > whenever we think GC could happen, and otherwise make sure GC cannot > >> > happen. > >> > >> For the old GC, yes. For the new GC, the safe thing would be to ensure > >> MPS has removed all memory barriers, take the arena lock, then call the > >> syscall and return. Or use xstrdup. > > > > If this is indeed so (and I don't think it is), then we need to > > Which part do you think is wrong? All of what you describe. We should be able to use without restrictions local variables whose values are Lisp strings, and we should be able to use without restrictions static variables whose values are Lisp strings. To use the 'char *' pointer to a Lisp string's data we should make sure GC cannot happen between the time we extract the pointer using SDATA or SSDATA and the time we end the use of the pointer. If you think anything in the above is not correct with MPS, please tell. Please only respond with real practical problems, to keep this discussion concrete and on-topic. > > discuss it very thoroughly, because it basically means we cannot do > > anything with Lisp strings in C. For example, the display iterator > > has a member that is a Lisp string, which is used when we produce > > glyphs for display from a string (such as a display property or an > > overlay before/after string) -- what you say here basically means that > > we cannot carry that string around, right? If not, how is that > > different? > > Of course we can use Lisp strings. As long as there's an automatic > variable pointing to the string data, it'll stay there. If there's a > static variable pointing to the string data, it might move, but then the > static variable will be updated. Then I don't understand why you seem to be saying that our usual use patterns of Lisp strings and their textual data is unsafe and needs to be amended by using xstrdup etc. We do it already as described above. > >> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because > >> >> MPS will reuse the memory for new data, which might be behind a barrier. > >> >> > >> >> If we ever do: > >> >> > >> >> static Lisp_Object unmarked; > >> ^^^^^^ > >> >> unmarked = string; > >> >> ... trigger GC here ... > >> >> puts (SDATA (unmarked); > >> >> > >> >> the most likely outcome (thanks to Gerd for the hint) is that > >> >> nonsensical data is printed > >> > > >> > Are you sure? > >> > >> With the static keyword, yes. (Assuming we don't staticpro the static > >> variable, of course). > > > > What does static have to do with this? > > What matters is whether the value is visible to the garbage collector > (in which case it remains a valid pointer) or isn't (in which case the > memory it points to is used for something else). > > Automatic variables, residing on the stack or residing in a register > (which is then spilled to the stack) protect the memory they point to. > Static variables don't unless we tell GC about them. > > > What matters is the value, not the storage. > > I have no idea what you mean by that. The value of the variable is a > tagged pointer. It won't change during GC, because GC never alters > automatic variables. The question is whether this pointer still points > to the right data area after GC. The pointer taken before GC might not point to the string's data after GC. Which is why, if we know GC could happen between two points in a program's code, we cannot use the same 'char *' pointer there; instead, we must recompute the pointer (by using SDATA) after the possible GC. > > The value comes from 'string', a different variable. It > > Which might be in a register and not survive until GC is triggered. A Lisp_Object variable will survive. Its pointer will be updated if needed to point to the new location of the data. Thus, using the Lisp_Object variable is always safe, but the pointer to its data must be updated after a potential GC. > >> > The below is indeed unsafe: > >> > > >> > char *p = SDATA (unmarked); > >> > ... trigger GC here ... > >> > puts (p); > >> > >> Right now, that's safe for MPS, but not for the old GC, correct? > > > > If GC moves string data, then it is not safe, period. Does MPS move > > string data? > > MPS does not move string data if there is a stack variable pointing to > it. It does in other cases. This is why it's safe for MPS. The old > GC, IIUC, is less forgiving. The conclusion is that the above is NOT safe, because in some cases the data could move. Which was what I said from the get-go. > >> > Before we agree that this could be the reason, we'd need to find the > >> > places where GC could reuse the memory of a live string, while that > >> > string appears in some live data structure, and as long as no GC can > >> > happen between the time we put the SDATA pointers into the environment > >> > array and the time posix_spawn returns. > >> > >> Calling igc_collect () right before the spawn results in corrupt data: > > > > But the code doesn't call igc_collect, so how is this relevant? > > This is all that is relevant to establishing the current code is not > safe for the intended behavior of scratch/igc, which is to allow garbage > collection at this point, equivalent to setting a breakpoint there and > calling igc_collect(). Why would we allow GC at some point where the code clearly does NOT expect GC to happen? We are discussing whether some code in callproc.c needs to be changed or is okay as it is. If it is okay because GC cannot possibly happen while some code fragment is being executed, then forcefully injecting a call to GC there is pointless, because it changes the conditions under which this code normally runs. So please keep such arguments out of this discussion, they just muddy the waters and are not useful. > I honestly don't know whether hitting a memory barrier can result in > other data being moved by MPS right now. I'm assuming it can, and we > should not assume string data is protected from being moved unless we > root it explicitly. We should not assume anything, we should have a clear understanding what does and what doesn't move the data and the pointers. We cannot possibly analyze the code otherwise, or at least I cannot. To keep this discussion focused, let's please return to looking at some particular code in callproc.c where you think SAFE_ALLOCA is unsafe. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 8:23 ` Eli Zaretskii @ 2025-01-05 9:04 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 9:32 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 9:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: > we should be able to use without restrictions static variables whose > values are Lisp strings. Without staticpro? That makes no sense whatsoever to me. It's not true for either garbage collector. Do you expect MPS to scan the entire data/bss segment ambiguously? >> Which might be in a register and not survive until GC is triggered. > > A Lisp_Object variable will survive. Its pointer will be updated if > needed to point to the new location of the data. Thus, using the MPS never changes the value of an automatic variable. > Lisp_Object variable is always safe, but the pointer to its data must > be updated after a potential GC. > >> >> > The below is indeed unsafe: >> >> > >> >> > char *p = SDATA (unmarked); >> >> > ... trigger GC here ... >> >> > puts (p); >> >> >> >> Right now, that's safe for MPS, but not for the old GC, correct? >> > >> > If GC moves string data, then it is not safe, period. Does MPS move >> > string data? >> >> MPS does not move string data if there is a stack variable pointing to >> it. It does in other cases. This is why it's safe for MPS. The old >> GC, IIUC, is less forgiving. > > The conclusion is that the above is NOT safe, because in some cases > the data could move. Which was what I said from the get-go. The conclusion is incorrect. The code is safe if MPS is in use, and we rely on that. The idea behind MPS is that you write code as above, which is safe when MPS is in use, rather than attempting to avoid GC, which is fragile at best (see call_process) and impossible in multi-threaded systems. We have to avoid the old GC in the !HAVE_MPS build, but with the MPS collector, there is no need (and no way) to avoid GC here. You seem to fundamentally disagree with me about how MPS works. We need to resolve that difference one way or another before we can continue any GC-related discussions. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 9:04 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 9:32 ` Eli Zaretskii 2025-01-05 9:47 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 9:32 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, 75322 > Date: Sun, 05 Jan 2025 09:04:06 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org > > "Eli Zaretskii" <eliz@gnu.org> writes: > > we should be able to use without restrictions static variables whose > > values are Lisp strings. > > Without staticpro? That makes no sense whatsoever to me. It's not true > for either garbage collector. Do you expect MPS to scan the entire > data/bss segment ambiguously? That's not relevant to the issue at hand, which was about the following snippet: static Lisp_Object unmarked; ^^^^^^ unmarked = string; ... trigger GC here ... puts (SDATA (unmarked); That 'unmarked' is a static variable is not relevant, because the issue is the pointer to the text data of 'string', which value we put int 'unmarked'. > >> Which might be in a register and not survive until GC is triggered. > > > > A Lisp_Object variable will survive. Its pointer will be updated if > > needed to point to the new location of the data. Thus, using the > > MPS never changes the value of an automatic variable. Exactly! > > Lisp_Object variable is always safe, but the pointer to its data must > > be updated after a potential GC. > > > >> >> > The below is indeed unsafe: > >> >> > > >> >> > char *p = SDATA (unmarked); > >> >> > ... trigger GC here ... > >> >> > puts (p); > >> >> > >> >> Right now, that's safe for MPS, but not for the old GC, correct? > >> > > >> > If GC moves string data, then it is not safe, period. Does MPS move > >> > string data? > >> > >> MPS does not move string data if there is a stack variable pointing to > >> it. It does in other cases. This is why it's safe for MPS. The old > >> GC, IIUC, is less forgiving. > > > > The conclusion is that the above is NOT safe, because in some cases > > the data could move. Which was what I said from the get-go. > > The conclusion is incorrect. The code is safe if MPS is in use, and we > rely on that. No, we do NOT, and should NOT, rely on that! Because you yourself say above that MPS will move the string data in some cases. > The idea behind MPS is that you write code as above, which is safe when > MPS is in use, rather than attempting to avoid GC, which is fragile at > best (see call_process) and impossible in multi-threaded systems. I don't suggest to avoid GC. I suggest that where GC _can_ happen, we must reinitialize the pointer to string data after GC. Otherwise, what was all that discussion about what call_process does with the strings in the args[] array? If GC cannot change the data pointer to SDATA, then why should we care about GC in that function? > You seem to fundamentally disagree with me about how MPS works. We need > to resolve that difference one way or another before we can continue any > GC-related discussions. I have a much larger issue here: this discussion produces an enormous number of long messages without actually leading anywhere. Maybe I'm too stupid to discuss this, but I cannot keep it going like that, it eats up all my free time (of which I don't have too much to be gin with). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 9:32 ` Eli Zaretskii @ 2025-01-05 9:47 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 11:04 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 9:47 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: >> >> >> > The below is indeed unsafe: >> >> >> > >> >> >> > char *p = SDATA (unmarked); >> >> >> > ... trigger GC here ... >> >> >> > puts (p); >> >> >> >> >> >> Right now, that's safe for MPS, but not for the old GC, correct? >> >> > >> >> > If GC moves string data, then it is not safe, period. Does MPS move >> >> > string data? >> >> >> >> MPS does not move string data if there is a stack variable pointing to >> >> it. It does in other cases. This is why it's safe for MPS. The old >> >> GC, IIUC, is less forgiving. >> > >> > The conclusion is that the above is NOT safe, because in some cases >> > the data could move. Which was what I said from the get-go. >> >> The conclusion is incorrect. The code is safe if MPS is in use, and we >> rely on that. > > No, we do NOT, and should NOT, rely on that! We do, we should, and we will continue doing so. > Because you yourself say above that MPS will move the string data in > some cases. Not in this case! >> The idea behind MPS is that you write code as above, which is safe when >> MPS is in use, rather than attempting to avoid GC, which is fragile at >> best (see call_process) and impossible in multi-threaded systems. > > I don't suggest to avoid GC. I suggest that where GC _can_ happen, we > must reinitialize the pointer to string data after GC. That suggestion is incorrect. MPS GC can and does happen while we have SDATA pointers on the stack, and we rely on it to keep those valid. We need not reinitialize anything after MPS GC, and we can't do so because we don't know when and how we are interrupted for MPS GC. IOW, MPS treats SDATA point the same way it treats Lisp_Objects. That isn't the problem here. The problem is that like Lisp_Objects, SDATA pointers cannot be moved to xmalloc'd memory and retain their validity. As long as you think MPS requires us to avoid GC in this case (this is implied by "we must do this or that after GC"), your understanding of how scratch/igc uses MPS is fundamentally incorrect. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 9:47 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 11:04 ` Eli Zaretskii 2025-01-06 15:54 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 11:04 UTC (permalink / raw) To: Pip Cet; +Cc: gerd.moellmann, 75322 > Date: Sun, 05 Jan 2025 09:47:06 +0000 > From: Pip Cet <pipcet@protonmail.com> > Cc: gerd.moellmann@gmail.com, 75322@debbugs.gnu.org > > "Eli Zaretskii" <eliz@gnu.org> writes: > > I don't suggest to avoid GC. I suggest that where GC _can_ happen, we > > must reinitialize the pointer to string data after GC. > > That suggestion is incorrect. MPS GC can and does happen while we have > SDATA pointers on the stack, and we rely on it to keep those valid. Since we've established that MPS works on a single thread, which is our Lisp thread, then GC cannot happen unless we make some call that could trigger MPS GC. Thus, whether or not SSDATA pointers are on the stack is not relevant to when MPS GC could happen; what matters is the code that is executed. > IOW, MPS treats SDATA point the same way it treats Lisp_Objects. I don't understand what that means. What is "SDATA point"? What is the meaning of "treats the same"? > The problem is that like Lisp_Objects, SDATA pointers cannot be > moved to xmalloc'd memory and retain their validity. No one said they could. This is trivial. > As long as you think MPS requires us to avoid GC in this case (this is > implied by "we must do this or that after GC"), your understanding of > how scratch/igc uses MPS is fundamentally incorrect. Maybe my understanding is incorrect, but this style of "discussion", where you are working hard to prove at all costs that I misunderstand the issues and to point out my mistakes, contributes nothing at all to understanding. It only contributes to confusion. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 11:04 ` Eli Zaretskii @ 2025-01-06 15:54 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-06 19:16 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-06 15:54 UTC (permalink / raw) To: Eli Zaretskii, Daniel Colascione; +Cc: gerd.moellmann, 75322 "Eli Zaretskii" <eliz@gnu.org> writes: >> As long as you think MPS requires us to avoid GC in this case (this is >> implied by "we must do this or that after GC"), your understanding of >> how scratch/igc uses MPS is fundamentally incorrect. > > Maybe my understanding is incorrect, but this style of "discussion", > where you are working hard to prove at all costs that I misunderstand > the issues and to point out my mistakes, contributes nothing at all to > understanding. It only contributes to confusion. I thought it was the only way for me to get you to even consider the possibility that your ideas of how GC works with MPS may need to be corrected. Given your message, I currently think there is no way for me to do so at all. I give up. (Certainly on this discussion, maybe on scratch/igc, who knows.) The MPS-based GC in scratch/igc does not allow or require general Emacs C code to make "no GC here" assumptions. Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-06 15:54 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-06 19:16 ` Gerd Möllmann 0 siblings, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-06 19:16 UTC (permalink / raw) To: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors Cc: Eli Zaretskii, Daniel Colascione, Pip Cet, 75322, Emacs Devel, Helmut Eller (CC to emacs-devel and Helmut) > I give up. (Certainly on this discussion, maybe on scratch/igc, who > knows.) I propose that we, i.e. you Pip, Helmut, and me, ignore the subject of a possible merge to master in the future. That's an SEP, costs too much energy, and what happens is unpredictable. I'll begin with the ignoring by reverting a revert, adding Helmut's and my patches, merging master on the weekend. Sorry for any inconvenience that causes. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-03 19:55 ` Gerd Möllmann @ 2025-01-05 6:32 ` Gerd Möllmann 2025-01-05 6:59 ` Gerd Möllmann 2025-01-05 7:48 ` Eli Zaretskii 1 sibling, 2 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 6:32 UTC (permalink / raw) To: Pip Cet; +Cc: 75322 Just want to add that all SAFE_NALLOC uses should be checked in scratch/igc. For example, set_overlays_multibyte uses it to store itree_node *, and itree_node is in MPS. I think I'll just make it allocate a root in my Emacs. That's the least work. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 6:32 ` Gerd Möllmann @ 2025-01-05 6:59 ` Gerd Möllmann 2025-01-05 10:21 ` Eli Zaretskii 2025-01-05 7:48 ` Eli Zaretskii 1 sibling, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 6:59 UTC (permalink / raw) To: Pip Cet; +Cc: 75322 [-- Attachment #1: Type: text/plain, Size: 414 bytes --] Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Just want to add that all SAFE_NALLOC uses should be checked in > scratch/igc. For example, set_overlays_multibyte uses it to store > itree_node *, and itree_node is in MPS. > > I think I'll just make it allocate a root in my Emacs. That's the least > work. Please find attached what I'm using now in my Emacs, in addition to your xstrdup commit. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Don-t-use-SAFE_NALLOCA-for-Lisp_Objects.patch --] [-- Type: text/x-patch, Size: 1639 bytes --] From c364d3199a1b3b1a9ec0f9afe421110d46a77769 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Gerd=20M=C3=B6llmann?= <gerd@gnu.org> Date: Sat, 4 Jan 2025 19:42:50 +0100 Subject: [PATCH 1/2] Don't use SAFE_NALLOCA for Lisp_Objects * src/callproc.c (call_process, create_temp_file): Use SAFE_ALLOCA_LISP instead of SAFE_NALLCA. --- src/callproc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/callproc.c b/src/callproc.c index 459ddbeb4f3..6f358f939de 100644 --- a/src/callproc.c +++ b/src/callproc.c @@ -423,7 +423,7 @@ call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd, val = Qraw_text; else { - SAFE_NALLOCA (args2, 1, nargs + 1); + SAFE_ALLOCA_LISP (args2, nargs + 1); args2[0] = Qcall_process; for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; coding_systems = Ffind_operation_coding_system (nargs + 1, args2); @@ -741,7 +741,7 @@ call_process (ptrdiff_t nargs, Lisp_Object *args, int filefd, { ptrdiff_t i; - SAFE_NALLOCA (args2, 1, nargs + 1); + SAFE_ALLOCA_LISP (args2, nargs + 1); args2[0] = Qcall_process; for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; coding_systems @@ -1033,7 +1033,7 @@ create_temp_file (ptrdiff_t nargs, Lisp_Object *args, Lisp_Object coding_systems; Lisp_Object *args2; USE_SAFE_ALLOCA; - SAFE_NALLOCA (args2, 1, nargs + 1); + SAFE_ALLOCA_LISP (args2, nargs + 1); args2[0] = Qcall_process_region; memcpy (args2 + 1, args, nargs * sizeof *args); coding_systems = Ffind_operation_coding_system (nargs + 1, args2); -- 2.47.1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-Make-SAFE_NALLOCA-allocate-an-ambiguous-root.patch --] [-- Type: text/x-patch, Size: 2523 bytes --] From 5b0a46390dfb5407cb7585bcbb1b7491d81885ea Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Gerd=20M=C3=B6llmann?= <gerd@gnu.org> Date: Sun, 5 Jan 2025 07:45:51 +0100 Subject: [PATCH 2/2] Make SAFE_NALLOCA allocate an ambiguous root * src/igc.c (igc_xnmalloc_ambig): New function. * src/igc.h: Declare it. * src/lisp.h (SAFE_NALLOCA): Use it. --- src/igc.c | 6 ++++++ src/igc.h | 1 + src/lisp.h | 19 +++++++++++++++++++ 3 files changed, 26 insertions(+) diff --git a/src/igc.c b/src/igc.c index 54131d5fcf3..10f74c66d8f 100644 --- a/src/igc.c +++ b/src/igc.c @@ -3144,6 +3144,12 @@ igc_xzalloc_ambig (size_t size) return p; } +void * +igc_xnmalloc_ambig (ptrdiff_t nitems, ptrdiff_t item_size) +{ + return igc_xzalloc_ambig (nitems * item_size); +} + void * igc_realloc_ambig (void *block, size_t size) { diff --git a/src/igc.h b/src/igc.h index 2a9bcda58b2..656a384aacf 100644 --- a/src/igc.h +++ b/src/igc.h @@ -86,6 +86,7 @@ #define EMACS_IGC_H struct Lisp_Buffer_Local_Value *igc_alloc_blv (void); void *igc_alloc_handler (void); void *igc_xzalloc_ambig (size_t size); +void *igc_xnmalloc_ambig (ptrdiff_t nitems, ptrdiff_t item_size); void *igc_realloc_ambig (void *block, size_t size); void igc_xfree (void *p); Lisp_Object *igc_xalloc_lisp_objs_exact (size_t n); diff --git a/src/lisp.h b/src/lisp.h index 67b574a67b7..32e2886561c 100644 --- a/src/lisp.h +++ b/src/lisp.h @@ -6109,6 +6109,23 @@ #define SAFE_ALLOCA(size) ((size) <= sa_avail \ NITEMS items, each of the same type as *BUF. MULTIPLIER must positive. The code is tuned for MULTIPLIER being a constant. */ +# ifdef HAVE_MPS +void *igc_xnmalloc_ambig (ptrdiff_t nitems, ptrdiff_t item_size); +void igc_xfree (void *p); + +#define SAFE_NALLOCA(buf, multiplier, nitems) \ + do { \ + if ((nitems) <= sa_avail / sizeof *(buf) / (multiplier)) \ + (buf) = AVAIL_ALLOCA (sizeof *(buf) * (multiplier) * (nitems)); \ + else \ + { \ + (buf) = igc_xnmalloc_ambig (nitems, sizeof *(buf) * (multiplier)); \ + record_unwind_protect_ptr (igc_xfree, buf); \ + } \ + } while (false) + +#else + #define SAFE_NALLOCA(buf, multiplier, nitems) \ do { \ if ((nitems) <= sa_avail / sizeof *(buf) / (multiplier)) \ @@ -6120,6 +6137,8 @@ #define SAFE_NALLOCA(buf, multiplier, nitems) \ } \ } while (false) +#endif + /* SAFE_ALLOCA_STRING allocates a C copy of a Lisp string. */ #define SAFE_ALLOCA_STRING(ptr, string) \ -- 2.47.1 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 6:59 ` Gerd Möllmann @ 2025-01-05 10:21 ` Eli Zaretskii 2025-01-05 10:30 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 10:21 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > Cc: 75322@debbugs.gnu.org > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Date: Sun, 05 Jan 2025 07:59:32 +0100 > > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > > > Just want to add that all SAFE_NALLOC uses should be checked in > > scratch/igc. For example, set_overlays_multibyte uses it to store > > itree_node *, and itree_node is in MPS. > > > > I think I'll just make it allocate a root in my Emacs. That's the least > > work. > > Please find attached what I'm using now in my Emacs, in addition to your > xstrdup commit. These two are alternatives, right? IOW, we should use one or the other in each specific case, is that correct? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 10:21 ` Eli Zaretskii @ 2025-01-05 10:30 ` Gerd Möllmann 2025-01-05 10:35 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 11:29 ` Eli Zaretskii 0 siblings, 2 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 10:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> Cc: 75322@debbugs.gnu.org >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Date: Sun, 05 Jan 2025 07:59:32 +0100 >> >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> >> > Just want to add that all SAFE_NALLOC uses should be checked in >> > scratch/igc. For example, set_overlays_multibyte uses it to store >> > itree_node *, and itree_node is in MPS. >> > >> > I think I'll just make it allocate a root in my Emacs. That's the least >> > work. >> >> Please find attached what I'm using now in my Emacs, in addition to your >> xstrdup commit. > > These two are alternatives, right? IOW, we should use one or the > other in each specific case, is that correct? If you mean the two patches I sent with "these two", then no. I prefer using SAFE_ALLOCA_LISP because that introduces an exact root. If "these two" means xstrdup and my SAFE_NALLOCA change, then yes, kind of. It doesn't harm to do both, but it's not strictly necessary to do xstrdup if the SAFE_NALLOCA change is in. The other way round is not true, i.e. having the xstrdup change fixes only 1 case. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 10:30 ` Gerd Möllmann @ 2025-01-05 10:35 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 10:45 ` Gerd Möllmann 2025-01-05 11:29 ` Eli Zaretskii 1 sibling, 1 reply; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 10:35 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, 75322 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > If "these two" means xstrdup and my SAFE_NALLOCA change, then yes, kind > of. It doesn't harm to do both, but it's not strictly necessary to do > xstrdup if the SAFE_NALLOCA change is in. The other way round is not For argv, I agree, but make_environment_block uses xnmalloc and assumes SDATA pointers in there will survive, which they don't, according to my GDB log. It was very brave not to start out by making every xmalloc an ambiguous root and fixing only those that we know to be safe :-) Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 10:35 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 10:45 ` Gerd Möllmann 0 siblings, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 10:45 UTC (permalink / raw) To: Pip Cet; +Cc: Eli Zaretskii, 75322 Pip Cet <pipcet@protonmail.com> writes: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> If "these two" means xstrdup and my SAFE_NALLOCA change, then yes, kind >> of. It doesn't harm to do both, but it's not strictly necessary to do >> xstrdup if the SAFE_NALLOCA change is in. The other way round is not > > For argv, I agree, but make_environment_block uses xnmalloc and assumes > SDATA pointers in there will survive, which they don't, according to my > GDB log. Ah, okay. Then both xstrdup and SAFE_NALLOCA. Which I have here so I'm good. > It was very brave not to start out by making every xmalloc an ambiguous > root and fixing only those that we know to be safe :-) The SAFE_ stuff was only 1 out of 6542 things to get going. And I'm absolutely reckless anyway :-). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 10:30 ` Gerd Möllmann 2025-01-05 10:35 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 11:29 ` Eli Zaretskii 2025-01-05 11:37 ` Gerd Möllmann 1 sibling, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 11:29 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 11:30:11 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> Cc: 75322@debbugs.gnu.org > >> From: Gerd Möllmann <gerd.moellmann@gmail.com> > >> Date: Sun, 05 Jan 2025 07:59:32 +0100 > >> > >> Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >> > >> > Just want to add that all SAFE_NALLOC uses should be checked in > >> > scratch/igc. For example, set_overlays_multibyte uses it to store > >> > itree_node *, and itree_node is in MPS. > >> > > >> > I think I'll just make it allocate a root in my Emacs. That's the least > >> > work. > >> > >> Please find attached what I'm using now in my Emacs, in addition to your > >> xstrdup commit. > > > > These two are alternatives, right? IOW, we should use one or the > > other in each specific case, is that correct? > > If you mean the two patches I sent with "these two", then no. I prefer > using SAFE_ALLOCA_LISP because that introduces an exact root. I guess I'm confused, then. The first patch replaces calls to SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies SAFE_NALLOCA to call igc_xnmalloc_ambig. That's why I thought they were alternatives. If they are not alternatives, then why did you replace SAFE_NALLOCA in the first patch? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 11:29 ` Eli Zaretskii @ 2025-01-05 11:37 ` Gerd Möllmann 2025-01-05 12:15 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 11:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> If you mean the two patches I sent with "these two", then no. I prefer >> using SAFE_ALLOCA_LISP because that introduces an exact root. > > I guess I'm confused, then. The first patch replaces calls to > SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies > SAFE_NALLOCA to call igc_xnmalloc_ambig. That's why I thought they > were alternatives. > > If they are not alternatives, then why did you replace SAFE_NALLOCA in > the first patch? I checked other uses of SAFE_NALLOCA that were not yet mentioned, and found another problematic case. (Something with struct itree_node *, don't remember the function name, it's in some other mail). There were too many grep hits for SAFE_NALLOCA for me, so I shot with a canon :-). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 11:37 ` Gerd Möllmann @ 2025-01-05 12:15 ` Eli Zaretskii 2025-01-05 13:21 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 12:15 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 12:37:42 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> If you mean the two patches I sent with "these two", then no. I prefer > >> using SAFE_ALLOCA_LISP because that introduces an exact root. > > > > I guess I'm confused, then. The first patch replaces calls to > > SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies > > SAFE_NALLOCA to call igc_xnmalloc_ambig. That's why I thought they > > were alternatives. > > > > If they are not alternatives, then why did you replace SAFE_NALLOCA in > > the first patch? > > I checked other uses of SAFE_NALLOCA that were not yet mentioned, and > found another problematic case. (Something with struct itree_node *, > don't remember the function name, it's in some other mail). There were > too many grep hits for SAFE_NALLOCA for me, so I shot with a canon :-). OK. So can we talk about the relative merits and demerits of using SAFE_ALLOCA_LISP vs SAFE_NALLOCA? First, why is it better to have an exact root than an ambiguous root? And second, SAFE_ALLOCA_LISP conses a Lisp vector, which will increase GC pressure, so isn't SAFE_NALLOCA preferable at least in some cases? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 12:15 ` Eli Zaretskii @ 2025-01-05 13:21 ` Gerd Möllmann 2025-01-05 17:31 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 13:21 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 12:37:42 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> >> If you mean the two patches I sent with "these two", then no. I prefer >> >> using SAFE_ALLOCA_LISP because that introduces an exact root. >> > >> > I guess I'm confused, then. The first patch replaces calls to >> > SAFE_NALLOCA by SAFE_ALLOCA_LISP, the second patch modifies >> > SAFE_NALLOCA to call igc_xnmalloc_ambig. That's why I thought they >> > were alternatives. >> > >> > If they are not alternatives, then why did you replace SAFE_NALLOCA in >> > the first patch? >> >> I checked other uses of SAFE_NALLOCA that were not yet mentioned, and >> found another problematic case. (Something with struct itree_node *, >> don't remember the function name, it's in some other mail). There were >> too many grep hits for SAFE_NALLOCA for me, so I shot with a canon :-). > > OK. > > So can we talk about the relative merits and demerits of using > SAFE_ALLOCA_LISP vs SAFE_NALLOCA? Let me add a (0): I assume that SAFE_ALLOCA_LISP is the right thing in the _old_ GC, because it makes sure objects referenced in the xmalloc'd memory are marked. From my POV, it would require a very good reason to use something else, which is nowhere mentioned. That's why I suspect it's a left-over from times where SAFE_ALLOCA_LISP didn't exist. (And I very much hope it's not the old pattern of "I don't need to GCPRO this because I know this is already protected because of so-and-so", which you might still remember from the old times. In which cases I would've liked to hit people with a GCPRO on their head when I had to debug that and so-on-so was no longer true :-).) > First, why is it better to have an exact root than an ambiguous root? In the most general case, where an ambiguous root can contain random bit patterns, say the C stack, I'd say the greatest advantage of exact roots is avoiding false positives that keep objects alive, or prevent copying them. In the specific here case, where a root actually contains only Lisp_Objects, and not random patterns, I'd say the advantage of exact roots is that they don't prevent copying. The "prevent copying" disadvantage is a bit hand-wavy, and depends a lot on the GC implementation. Maybe a good picture of it that one would like to have a fully copying collector, with its advantage of reducing fragmentation, for example, but one can only have a mostly-copying collector, because of ambiguous references. The more the "mostly" is true the better for the copying/fragmentation. Does that make sense? > And second, SAFE_ALLOCA_LISP conses a Lisp vector, which will increase > GC pressure, so isn't SAFE_NALLOCA preferable at least in some cases? SAFE_ALLOCA_LISP allocates a Lisp vector, that's true. I think one can say that allocation is cheap on average. The overhead of freeing it is not copying it, which is basically zero. SAFE_NALLOCA, with my patch, requires a xmalloc, creation of a MPS root object, deletion of that, and xfree. Let's assume scanning costs are more or less the same because the number of references is the same in both cases. I think SAFE_NALLOCA is more expensive, ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 13:21 ` Gerd Möllmann @ 2025-01-05 17:31 ` Eli Zaretskii 2025-01-05 17:49 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 17:31 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 14:21:04 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > So can we talk about the relative merits and demerits of using > > SAFE_ALLOCA_LISP vs SAFE_NALLOCA? > > Let me add a (0): I assume that SAFE_ALLOCA_LISP is the right thing in > the _old_ GC, because it makes sure objects referenced in the xmalloc'd > memory are marked. From my POV, it would require a very good reason to > use something else, which is nowhere mentioned. That's why I suspect > it's a left-over from times where SAFE_ALLOCA_LISP didn't exist. This surprises me because on the master branch SAFE_ALLOCA_LISP either calls alloca or xzalloc. So I don't see why SAFE_ALLOCA_LISP would be safer in the old GC than SAFE_NALLOCA. It's only on the igc branch that you made SAFE_ALLOCA_LISP make a Lisp vector. But this is a tangent from my POV. I would like to discuss the new GC, not the old one. > > And second, SAFE_ALLOCA_LISP conses a Lisp vector, which will increase > > GC pressure, so isn't SAFE_NALLOCA preferable at least in some cases? > > SAFE_ALLOCA_LISP allocates a Lisp vector, that's true. I think one can > say that allocation is cheap on average. The overhead of freeing it is > not copying it, which is basically zero. But I presume that if we have more Lisp objects, GC will happen sooner, no? So increasing GC pressure is not zero-cost, because more frequent GC means more CPU processing that stops our application thread. > SAFE_NALLOCA, with my patch, requires a xmalloc, creation of a MPS root > object, deletion of that, and xfree. > > Let's assume scanning costs are more or less the same because the > number of references is the same in both cases. But more memory allocated via xmalloc doesn't increase the frequency of GC, does it? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 17:31 ` Eli Zaretskii @ 2025-01-05 17:49 ` Gerd Möllmann 2025-01-05 18:42 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 17:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 14:21:04 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > So can we talk about the relative merits and demerits of using >> > SAFE_ALLOCA_LISP vs SAFE_NALLOCA? >> >> Let me add a (0): I assume that SAFE_ALLOCA_LISP is the right thing in >> the _old_ GC, because it makes sure objects referenced in the xmalloc'd >> memory are marked. From my POV, it would require a very good reason to >> use something else, which is nowhere mentioned. That's why I suspect >> it's a left-over from times where SAFE_ALLOCA_LISP didn't exist. > > This surprises me because on the master branch SAFE_ALLOCA_LISP either > calls alloca or xzalloc. So I don't see why SAFE_ALLOCA_LISP would be > safer in the old GC than SAFE_NALLOCA. It's only on the igc branch > that you made SAFE_ALLOCA_LISP make a Lisp vector. > But this is a tangent from my POV. I would like to discuss the new > GC, not the old one. > This in SAFE_ALLOCA_LISP_EXTRA (buf) = xzalloc (alloca_nbytes); \ record_unwind_protect_array (buf, nelt); \ makes a specpdl entry, which makes mark_specpdl do this: case SPECPDL_UNWIND_ARRAY: mark_objects (pdl->unwind_array.array, pdl->unwind_array.nelts); break; >> > And second, SAFE_ALLOCA_LISP conses a Lisp vector, which will increase >> > GC pressure, so isn't SAFE_NALLOCA preferable at least in some cases? >> >> SAFE_ALLOCA_LISP allocates a Lisp vector, that's true. I think one can >> say that allocation is cheap on average. The overhead of freeing it is >> not copying it, which is basically zero. > > But I presume that if we have more Lisp objects, GC will happen > sooner, no? So increasing GC pressure is not zero-cost, because more > frequent GC means more CPU processing that stops our application > thread. I don't know what exact consequences that has. MPS is at least an incremental collector, so it's not normally the case that the GC _starts_ at some given point. What certainly happens is that the array is allocated from an allocation point which has a certain amount of memory pre-allocated. And that pre-allocated memory will run out sooner, in which case the allocation point will acquire additional memory, which potentially requires some amount of GC. And so on. So it will have some effect, for sure, but I don't remember the docs spelling out details. Maybe someone else knows. > >> SAFE_NALLOCA, with my patch, requires a xmalloc, creation of a MPS root >> object, deletion of that, and xfree. >> >> Let's assume scanning costs are more or less the same because the >> number of references is the same in both cases. > > But more memory allocated via xmalloc doesn't increase the frequency > of GC, does it? Xmalloc no, but as I said above... ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 17:49 ` Gerd Möllmann @ 2025-01-05 18:42 ` Eli Zaretskii 2025-01-05 19:02 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 18:42 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 18:49:56 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > > This surprises me because on the master branch SAFE_ALLOCA_LISP either > > calls alloca or xzalloc. So I don't see why SAFE_ALLOCA_LISP would be > > safer in the old GC than SAFE_NALLOCA. It's only on the igc branch > > that you made SAFE_ALLOCA_LISP make a Lisp vector. > > But this is a tangent from my POV. I would like to discuss the new > > GC, not the old one. > > > > This in SAFE_ALLOCA_LISP_EXTRA > > (buf) = xzalloc (alloca_nbytes); \ > record_unwind_protect_array (buf, nelt); \ > > makes a specpdl entry, which makes mark_specpdl do this: > > case SPECPDL_UNWIND_ARRAY: > mark_objects (pdl->unwind_array.array, pdl->unwind_array.nelts); > break; Thanks. I guess it's supposed to be "common knowledge" that record_unwind_protect_* indirectly causes the objects to be marked... But then why don't we set the .mark member in SAFE_NALLOCA or some variant thereof? ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 18:42 ` Eli Zaretskii @ 2025-01-05 19:02 ` Gerd Möllmann 0 siblings, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 19:02 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 18:49:56 +0100 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > This surprises me because on the master branch SAFE_ALLOCA_LISP either >> > calls alloca or xzalloc. So I don't see why SAFE_ALLOCA_LISP would be >> > safer in the old GC than SAFE_NALLOCA. It's only on the igc branch >> > that you made SAFE_ALLOCA_LISP make a Lisp vector. >> > But this is a tangent from my POV. I would like to discuss the new >> > GC, not the old one. >> > >> >> This in SAFE_ALLOCA_LISP_EXTRA >> >> (buf) = xzalloc (alloca_nbytes); \ >> record_unwind_protect_array (buf, nelt); \ >> >> makes a specpdl entry, which makes mark_specpdl do this: >> >> case SPECPDL_UNWIND_ARRAY: >> mark_objects (pdl->unwind_array.array, pdl->unwind_array.nelts); >> break; > > Thanks. I guess it's supposed to be "common knowledge" that > record_unwind_protect_* indirectly causes the objects to be marked... > > But then why don't we set the .mark member in SAFE_NALLOCA or some > variant thereof? Questions upon questions. I have no idea. It seems to be relatively new, i.e. younger than 20 years :-). ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 6:32 ` Gerd Möllmann 2025-01-05 6:59 ` Gerd Möllmann @ 2025-01-05 7:48 ` Eli Zaretskii 2025-01-05 8:19 ` Gerd Möllmann 1 sibling, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 7:48 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > Cc: 75322@debbugs.gnu.org > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Date: Sun, 05 Jan 2025 07:32:46 +0100 > > Just want to add that all SAFE_NALLOC uses should be checked in > scratch/igc. For example, set_overlays_multibyte uses it to store > itree_node *, and itree_node is in MPS. Can you tell more about what should be checked and how? I'm probably missing some details of the igc build, because I cannot translate what you say above into practical actions of auditing the code. Thanks. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 7:48 ` Eli Zaretskii @ 2025-01-05 8:19 ` Gerd Möllmann 2025-01-05 10:33 ` Eli Zaretskii 0 siblings, 1 reply; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 8:19 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> Cc: 75322@debbugs.gnu.org >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Date: Sun, 05 Jan 2025 07:32:46 +0100 >> >> Just want to add that all SAFE_NALLOC uses should be checked in >> scratch/igc. For example, set_overlays_multibyte uses it to store >> itree_node *, and itree_node is in MPS. > > Can you tell more about what should be checked and how? I'm probably > missing some details of the igc build, because I cannot translate what > you say above into practical actions of auditing the code. > > Thanks. I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored in the memory allocated. If that is a reference to MPS-allocated memory (a pointer or Lisp_Object), it should be changed, because it then hides references from MPS in malloc'd memory. Or can hide, to be more precise, in the case it doesn't use alloca. Or alternatively, you could avoid having to do that, and make SAFE_NALLOCA xmalloc an ambiguous root. A patch for that is in the other mail I sent. I think the performance impact of that is negligible because this path is only executed when SAFE_ALLOCA does not use alloca. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 8:19 ` Gerd Möllmann @ 2025-01-05 10:33 ` Eli Zaretskii 2025-01-05 10:40 ` Gerd Möllmann 0 siblings, 1 reply; 78+ messages in thread From: Eli Zaretskii @ 2025-01-05 10:33 UTC (permalink / raw) To: Gerd Möllmann; +Cc: pipcet, 75322 > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org > Date: Sun, 05 Jan 2025 09:19:17 +0100 > > I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored > in the memory allocated. Is only SAFE_NALLOCA a potential problem? What about the other members of the SAFE_*ALLOC* family? > If that is a reference to MPS-allocated memory > (a pointer or Lisp_Object), it should be changed, because it then hides > references from MPS in malloc'd memory. Or can hide, to be more precise, > in the case it doesn't use alloca. Does using igc_xnmalloc_ambig have any significant adverse effect on GC? Like makes it slower or forces it to scan more memory? Because if we make SAFE_NALLOCA do this by default, there could be quite a lot more such ambiguous roots by the time GC starts. If this could impact performance or has some other adverse effects, I think I'd prefer to have a new argument to SAFE_NALLOCA telling it whether to call igc_xnmalloc_ambig, and we will then need to audit our code to use that argument where what's stored in the memory it allocates could be a Lisp object. This is more error-prone, but performance does count, especially where it comes to GC. > I think the performance impact of that is negligible because this > path is only executed when SAFE_ALLOCA does not use alloca. MAX_ALLOCA is only 16KB (and Paul Eggert says it's dangerous to make it significantly larger), so I'm not sure your assumption about negligible performance impact is necessarily correct. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 10:33 ` Eli Zaretskii @ 2025-01-05 10:40 ` Gerd Möllmann 2025-01-05 11:21 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-06 6:26 ` Gerd Möllmann 0 siblings, 2 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 10:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Eli Zaretskii <eliz@gnu.org> writes: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >> Date: Sun, 05 Jan 2025 09:19:17 +0100 >> >> I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored >> in the memory allocated. > > Is only SAFE_NALLOCA a potential problem? What about the other > members of the SAFE_*ALLOC* family? Haven't checked others yet. I have that on my todo list now. > >> If that is a reference to MPS-allocated memory >> (a pointer or Lisp_Object), it should be changed, because it then hides >> references from MPS in malloc'd memory. Or can hide, to be more precise, >> in the case it doesn't use alloca. > > Does using igc_xnmalloc_ambig have any significant adverse effect on > GC? Like makes it slower or forces it to scan more memory? Because > if we make SAFE_NALLOCA do this by default, there could be quite a lot > more such ambiguous roots by the time GC starts. More roots = more to scan for GC. > If this could impact performance or has some other adverse effects, > I think I'd prefer to have a new argument to SAFE_NALLOCA telling it > whether to call igc_xnmalloc_ambig, and we will then need to audit our > code to use that argument where what's stored in the memory it > allocates could be a Lisp object. This is more error-prone, but > performance does count, especially where it comes to GC. > >> I think the performance impact of that is negligible because this >> path is only executed when SAFE_ALLOCA does not use alloca. > > MAX_ALLOCA is only 16KB (and Paul Eggert says it's dangerous to make > it significantly larger), so I'm not sure your assumption about > negligible performance impact is necessarily correct. Works well for me so far. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 10:40 ` Gerd Möllmann @ 2025-01-05 11:21 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 11:27 ` Gerd Möllmann 2025-01-05 11:49 ` Paul Eggert 2025-01-06 6:26 ` Gerd Möllmann 1 sibling, 2 replies; 78+ messages in thread From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 11:21 UTC (permalink / raw) To: Gerd Möllmann; +Cc: Eli Zaretskii, Paul Eggert, 75322 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Eli Zaretskii <eliz@gnu.org> writes: >> MAX_ALLOCA is only 16KB (and Paul Eggert says it's dangerous to make >> it significantly larger), so I'm not sure your assumption about >> negligible performance impact is necessarily correct. > > Works well for me so far. Paul, this discussion is about changing SAFE_ALLOCA so its memory area is scanned conservatively for the scratch/igc branch, even if xmalloc is used (when there isn't enough stack space). MPS requires this in some cases where the old GC didn't, because it's enough for the old GC that we know there is a reference to retained objects elsewhere, but MPS might move the data in that case. However, given how rare it is for SAFE_ALLOCA to use xmalloc, maybe it would be a good idea to make the corresponding change for the old GC, too? Also, scratch/igc behaves differently with regard to SDATA: references to string data will keep the string data (but not its metadata) alive. If scratch/igc is ever merged and people start using it exclusively, we may start seeing bugs that rely on this additional safety guarantee. My preference would be to change the traditional GC to also ensure code like char *p = SDATA (string); string = Qnil; garbage_collect (); puts (p); is safe. Some memory and performance cost, but no-GC assumptions are hard to verify. This may or may not fix bugs in the present code. A potential example for such a bug is this: I think that it's possible for call_process to call Lisp via ENCODE_FILE in a very unusual scenario, which might trigger GC and relocation of string data which has already been put in the argv and environment arrays. However, I don't fully understand which additional protections traditional GC offers: is it always unsafe to use an SDATA obtained before a GC run after the GC run completes, or is there a more complicated set of rules? Pip ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 11:21 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 11:27 ` Gerd Möllmann 2025-01-05 11:49 ` Paul Eggert 1 sibling, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-05 11:27 UTC (permalink / raw) To: Pip Cet; +Cc: Eli Zaretskii, Paul Eggert, 75322 Pip Cet <pipcet@protonmail.com> writes: > However, I don't fully understand which additional protections > traditional GC offers: is it always unsafe to use an SDATA obtained > before a GC run after the GC run completes, or is there a more > complicated set of rules? There is this, which is used in the byte interpreter, IIRC, as sort of a workaround for the problem #ifndef HAVE_MPS /* Pin a unibyte string in place so that it won't move during GC. */ void pin_string (Lisp_Object string) { ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 11:21 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 11:27 ` Gerd Möllmann @ 2025-01-05 11:49 ` Paul Eggert 1 sibling, 0 replies; 78+ messages in thread From: Paul Eggert @ 2025-01-05 11:49 UTC (permalink / raw) To: Pip Cet, Gerd Möllmann; +Cc: Eli Zaretskii, 75322 On 2025-01-05 03:21, Pip Cet wrote: > is it always unsafe to use an SDATA obtained > before a GC run after the GC run completes, or is there a more > complicated set of rules? If there's a more complicated set of rules, I don't know them. Gerd's citation of pin_string makes sense. ^ permalink raw reply [flat|nested] 78+ messages in thread
* bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) 2025-01-05 10:40 ` Gerd Möllmann 2025-01-05 11:21 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-06 6:26 ` Gerd Möllmann 1 sibling, 0 replies; 78+ messages in thread From: Gerd Möllmann @ 2025-01-06 6:26 UTC (permalink / raw) To: Eli Zaretskii; +Cc: pipcet, 75322 Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Eli Zaretskii <eliz@gnu.org> writes: > >>> From: Gerd Möllmann <gerd.moellmann@gmail.com> >>> Cc: pipcet@protonmail.com, 75322@debbugs.gnu.org >>> Date: Sun, 05 Jan 2025 09:19:17 +0100 >>> >>> I'd grep for SAFE_NALLOCA, and for each occurrence, see what is stored >>> in the memory allocated. >> >> Is only SAFE_NALLOCA a potential problem? What about the other >> members of the SAFE_*ALLOC* family? > > Haven't checked others yet. I have that on my todo list now. Checked SAFE_ALLOCA now, and it seems not to be used for holding references anywhere. ^ permalink raw reply [flat|nested] 78+ messages in thread
end of thread, other threads:[~2025-01-06 19:16 UTC | newest] Thread overview: 78+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-03 17:20 bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-03 19:55 ` Gerd Möllmann 2025-01-03 20:34 ` Gerd Möllmann 2025-01-03 20:48 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 4:40 ` Gerd Möllmann 2025-01-04 7:57 ` Eli Zaretskii 2025-01-04 8:47 ` Gerd Möllmann 2025-01-04 9:56 ` Eli Zaretskii 2025-01-04 10:20 ` Gerd Möllmann 2025-01-05 13:30 ` Eli Zaretskii 2025-01-05 14:11 ` Gerd Möllmann 2025-01-05 17:45 ` Eli Zaretskii 2025-01-05 18:17 ` Gerd Möllmann 2025-01-05 19:07 ` Eli Zaretskii 2025-01-05 20:04 ` Gerd Möllmann 2025-01-05 20:24 ` Eli Zaretskii 2025-01-06 3:57 ` Gerd Möllmann 2025-01-06 8:25 ` Gerd Möllmann 2025-01-06 14:07 ` Eli Zaretskii 2025-01-05 21:15 ` Daniel Colascione 2025-01-06 12:59 ` Eli Zaretskii 2025-01-06 14:48 ` Daniel Colascione 2025-01-06 15:12 ` Eli Zaretskii 2025-01-06 15:27 ` Daniel Colascione 2025-01-05 21:01 ` Daniel Colascione 2025-01-05 23:28 ` Daniel Colascione 2025-01-06 13:26 ` Eli Zaretskii 2025-01-06 15:08 ` Daniel Colascione 2025-01-06 4:23 ` Gerd Möllmann 2025-01-04 11:41 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 11:29 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 12:17 ` Gerd Möllmann 2025-01-04 7:00 ` Eli Zaretskii 2025-01-04 7:17 ` Gerd Möllmann 2025-01-04 8:23 ` Eli Zaretskii 2025-01-04 8:58 ` Gerd Möllmann 2025-01-04 11:08 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 13:47 ` Eli Zaretskii 2025-01-04 14:13 ` Gerd Möllmann 2025-01-04 15:26 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 15:34 ` Gerd Möllmann 2025-01-04 18:19 ` Eli Zaretskii 2025-01-04 18:35 ` Gerd Möllmann 2025-01-04 19:10 ` Eli Zaretskii 2025-01-04 19:24 ` Gerd Möllmann 2025-01-04 18:02 ` Eli Zaretskii 2025-01-04 19:32 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-04 20:31 ` Eli Zaretskii 2025-01-04 21:15 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 8:23 ` Eli Zaretskii 2025-01-05 9:04 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 9:32 ` Eli Zaretskii 2025-01-05 9:47 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 11:04 ` Eli Zaretskii 2025-01-06 15:54 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-06 19:16 ` Gerd Möllmann 2025-01-05 6:32 ` Gerd Möllmann 2025-01-05 6:59 ` Gerd Möllmann 2025-01-05 10:21 ` Eli Zaretskii 2025-01-05 10:30 ` Gerd Möllmann 2025-01-05 10:35 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 10:45 ` Gerd Möllmann 2025-01-05 11:29 ` Eli Zaretskii 2025-01-05 11:37 ` Gerd Möllmann 2025-01-05 12:15 ` Eli Zaretskii 2025-01-05 13:21 ` Gerd Möllmann 2025-01-05 17:31 ` Eli Zaretskii 2025-01-05 17:49 ` Gerd Möllmann 2025-01-05 18:42 ` Eli Zaretskii 2025-01-05 19:02 ` Gerd Möllmann 2025-01-05 7:48 ` Eli Zaretskii 2025-01-05 8:19 ` Gerd Möllmann 2025-01-05 10:33 ` Eli Zaretskii 2025-01-05 10:40 ` Gerd Möllmann 2025-01-05 11:21 ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors 2025-01-05 11:27 ` Gerd Möllmann 2025-01-05 11:49 ` Paul Eggert 2025-01-06 6:26 ` Gerd Möllmann
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).