On 12/21/2015 08:09 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> we should treat OOM exactly like other sorts of error.
>
> Perhaps we should, but currently stack overflow is not treated that way.
>
>> OS signals should go through the usual Emacs event loop, right?
>
> I'm not sure what you mean, but let's put it this way: stack overflow
> can occur while in the low-level handler for an OS signal. And even if
> stack overflow does not occur, if the user types C-g three times when
> inhibit-quit is nil, the OS signal won't go through the Emacs event
> loop; instead, Emacs will invoke (signal 'quit nil).
>
> Perhaps what we need to do is to have stack overflow invoke (signal
> 'stack-overflow nil), or something like that. It's a bit tricky, though,
> as one needs some stack space to call 'signal'.
>
>> The standard requires runtimes reserve enough memory to throw
>> std::bad_alloc. All Emacs has to do is make sure control flow reaches
>> the C++ level.
>
> How does this actually work, when combined with Emacs's C-level stack
> overflow checking? Won't one get in the way of the other?

Let's start over.

Right now, when we get a SIGSEGV, we check the siginfo_t the OS gives
us by calling stack_overflow on it; if that returns true, we longjmp to
toplevel. We configure the sigsegv handler to run on an alternate
stack, so we'll always have space to do that much work. The longjmp
restores the original stack.

On the other side of the longjmp, we resume execution with our regular
stack, but much further up the stack. At this point, we know we have a
stack overflow, because nothing else longjmps to
return_to_command_loop.

Now, if we return normally to a C++ caller with an error indication
set, the C++ caller will almost certainly have enough stack space to
throw its own exception and propagate the exception further.
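
To make the mechanism described above concrete, here is a minimal
sketch in C. It is not Emacs's code: recovery_point, handle_sigsegv,
install_overflow_handler and run_guarded are hypothetical names, the
fault is simulated with raise(SIGSEGV) rather than a real overflow, and
the siginfo_t inspection that stack_overflow performs is left out. It
only shows a handler running on an alternate stack and jumping back to
a recovery point on the regular stack.

```c
#define _XOPEN_SOURCE 700   /* sigaltstack, siginfo_t, sigsetjmp */
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Emacs's return_to_command_loop. */
static sigjmp_buf recovery_point;

static char *alt_stack;                        /* memory the handler runs on */
static const size_t alt_stack_size = 1 << 16;  /* assumed size, 64 KiB */
static volatile sig_atomic_t handler_used_alt_stack;

static void handle_sigsegv(int sig, siginfo_t *info, void *context)
{
    (void) sig; (void) info; (void) context;
    /* A real handler would first inspect INFO to decide whether the
       fault is a stack overflow at all; that check is omitted here. */
    char probe;
    uintptr_t here = (uintptr_t) &probe;
    uintptr_t base = (uintptr_t) alt_stack;
    handler_used_alt_stack = here >= base && here < base + alt_stack_size;
    /* Abandon the handler frame and resume on the regular stack. */
    siglongjmp(recovery_point, 1);
}

/* Returns 0 on success, -1 on failure. */
static int install_overflow_handler(void)
{
    alt_stack = malloc(alt_stack_size);
    if (alt_stack == NULL)
        return -1;

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = alt_stack_size;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = handle_sigsegv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* run on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Returns 0 if FN returned normally, 1 if the handler unwound it. */
static int run_guarded(void (*fn)(void))
{
    /* Saving the signal mask lets SIGSEGV be delivered again later. */
    if (sigsetjmp(recovery_point, 1) != 0)
        return 1;
    fn();
    return 0;
}

static void well_behaved(void) { }
static void simulated_overflow(void) { raise(SIGSEGV); }

/* Usage: returns 0 when both paths behave as described. */
static int recovery_demo(void)
{
    if (install_overflow_handler() != 0)
        return -1;
    int clean = run_guarded(well_behaved);
    int unwound = run_guarded(simulated_overflow);
    return (clean == 0 && unwound == 1 && handler_used_alt_stack) ? 0 : -1;
}
```

After run_guarded returns 1, the caller is back on its ordinary stack
and can hand an error indication to whoever called it, which is the
situation the C++ caller would be in.
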

The only real change we have to make is to have Emacs longjmp not to
return_to_command_loop (which might skip module frames), but to the
most deeply nested entry point from module code into Emacs, which we
can set up in advance whenever a module calls into the Emacs API.

unwind_to_catch isn't really very different from the longjmp to
return_to_command_loop: I don't see any reason we can't run it on the
alternate signal stack. In fact, I don't see why we can't replace
return_to_command_loop generally with Fsignal.

I really don't like the stack overflow protection stuff in general,
though. It's not possible to recover robustly, because the stack
overflow detection turns *any* function call into an operation that
might return non-locally. In that environment --- where, say, XCAR
might end up running lisp --- it's hard to maintain invariants. I'd
rather Emacs just die on C stack overflow, except when we're running
Lisp in such a way that we know we can recover.

(The bad_alloc comment is more about exhausting the heap: even if we
exhaust the malloc heap instead of the stack, we'll still have set
aside enough space to throw a bad_alloc as long as Emacs returns
control to C++.)
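
The "most deeply nested entry point" idea could be sketched roughly as
follows. Everything here is hypothetical and is not the Emacs module
API: entry_frame, innermost_entry, api_call and
unwind_to_innermost_entry are illustrative names, and the overflow is
simulated by calling unwind_to_innermost_entry directly instead of from
a SIGSEGV handler. The point is only that each entry from module code
registers a jump target, so unwinding stops at the nearest one and the
module gets an ordinary error return.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum api_status { API_OK = 0, API_STACK_OVERFLOW = 1 };

/* One frame per entry from module code into the (hypothetical) API. */
struct entry_frame {
    jmp_buf target;
    struct entry_frame *outer;
};

static struct entry_frame *innermost_entry;

/* Stand-in for what the overflow handler would do instead of jumping
   to return_to_command_loop. */
static void unwind_to_innermost_entry(void)
{
    assert(innermost_entry != NULL);
    longjmp(innermost_entry->target, 1);
}

typedef void (*api_body)(void *arg);

/* Every API entry point would be wrapped like this. */
static enum api_status api_call(api_body body, void *arg)
{
    struct entry_frame frame;
    frame.outer = innermost_entry;       /* set before setjmp, never changed */
    if (setjmp(frame.target) != 0) {
        innermost_entry = frame.outer;   /* pop our frame */
        return API_STACK_OVERFLOW;       /* error indication for the module */
    }
    innermost_entry = &frame;            /* push our frame */
    body(arg);
    innermost_entry = frame.outer;       /* normal exit: pop */
    return API_OK;
}

static int module_saw_error;
static int module_continued;

/* Work inside Emacs that "overflows" the stack. */
static void deep_emacs_work(void *arg)
{
    (void) arg;
    unwind_to_innermost_entry();
}

/* Hypothetical module function that calls back into the API. */
static void module_function(void *arg)
{
    (void) arg;
    if (api_call(deep_emacs_work, NULL) == API_STACK_OVERFLOW)
        module_saw_error = 1;   /* a C++ module could throw here */
    module_continued = 1;       /* the module frame was not skipped */
}
```

Calling api_call(module_function, NULL) gives two nested entries; the
simulated overflow unwinds only to the inner one, so module_function
sees the error and keeps running, and the outer call returns API_OK.
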