On 12/21/2015 08:09 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> we should treat OOM exactly like other sorts of error.
>
> Perhaps we should, but currently stack overflow is not treated that way.
>
>> OS signals should go through the usual Emacs event loop, right?
>
> I'm not sure what you mean, but let's put it this way: stack overflow
> can occur while in the low-level handler for an OS signal. And even if
> stack overflow does not occur, if the user types C-g three times when
> inhibit-quit is nil, the OS signal won't go through the Emacs event
> loop; instead, Emacs will invoke (signal 'quit nil).
>
> Perhaps what we need to do is to have stack overflow invoke (signal
> 'stack-overflow nil), or something like that. It's a bit tricky, though,
> as one needs some stack space to call 'signal'.
>
>> The standard requires runtimes reserve enough memory to throw
>> std::bad_alloc. All Emacs has to do is make sure control flow reaches
>> the C++ level.
>
> How does this actually work, when combined with Emacs's C-level stack
> overflow checking? Won't one get in the way of the other?

Let's start over.

Right now, when we get a SIGSEGV, we check the siginfo_t the OS gives us
by calling stack_overflow on it; if that returns true, we longjmp to
toplevel. We configure the sigsegv handler to run on an alternate stack,
so we'll always have space to do that much work. The longjmp restores
the original stack. On the other side of the longjmp, we resume
execution with our regular stack, but much further up the stack. At this
point, we know we have a stack overflow, because nothing else longjmps
to return_to_command_loop.
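
To make that mechanism concrete, here is a minimal standalone sketch of
the same pattern: a SIGSEGV handler running on an alternate stack that
jumps back to a recovery point saved earlier. This is not the Emacs
code. The names (recovery_point, on_segv, run_until_overflow) are made
up for illustration, it assumes a Linux-like POSIX system where running
off the stack raises SIGSEGV, and unlike Emacs it does not inspect the
siginfo_t to confirm that the fault really was a stack overflow.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for return_to_command_loop. */
static sigjmp_buf recovery_point;

/* Memory for the alternate signal stack.  A fixed size is used because
   SIGSTKSZ is not a compile-time constant on every libc. */
static char alt_stack[64 * 1024];

/* Runs on the alternate stack, so it has room to work even though the
   regular stack is exhausted.  A real handler would examine INFO first;
   this sketch treats every SIGSEGV as a stack overflow. */
static void on_segv(int signo, siginfo_t *info, void *context)
{
    (void) signo;
    (void) info;
    (void) context;
    siglongjmp(recovery_point, 1);
}

/* Register the alternate stack and the handler.  Returns 0 on success. */
int install_overflow_handler(void)
{
    stack_t ss;
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

static int recurse(int depth);

/* Calling through a volatile pointer keeps the compiler from turning
   the recursion into a loop. */
static int (*volatile recurse_ptr)(int) = recurse;

/* Unbounded recursion that consumes stack on every call. */
static int recurse(int depth)
{
    volatile char pad[4096];
    pad[0] = (char) depth;
    int r = recurse_ptr(depth + 1);
    return r + pad[0];
}

/* Returns 1 if we came back through the handler, 0 otherwise. */
int run_until_overflow(void)
{
    /* Saving the signal mask lets siglongjmp unblock SIGSEGV again. */
    if (sigsetjmp(recovery_point, 1) != 0)
        return 1;   /* back on the regular stack, far above the overflow */
    recurse_ptr(0);
    return 0;
}
```

Calling install_overflow_handler () once and then run_until_overflow ()
should return 1 under the assumptions above: the siglongjmp abandons
both the alternate stack and all the recursive frames.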

Now, if we return normally to a C++ caller with an error indication set,
the C++ caller will almost certainly have enough stack space to throw
its own exception and propagate the exception further.

The only real change we have to make is to have Emacs longjmp not to
return_to_command_loop (which might skip module frames), but to the
most deeply nested entry point from module code into Emacs, which we
can set up in advance whenever a module calls into the Emacs API.
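
Roughly, I'm picturing something like the sketch below: every call from
a module into the API pushes a jump target, and an error unwinds only
to the innermost one, which then returns normally with an error status.
All of the names here (entry_point, api_call,
unwind_to_innermost_entry, and the two example bodies) are hypothetical
and exist only for illustration; none of this is the actual module API.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* One record per active call from module code into the API. */
struct entry_point {
    jmp_buf env;
    struct entry_point *outer;   /* next enclosing entry point, or NULL */
};

/* Most deeply nested entry point currently active. */
static struct entry_point *innermost_entry;

enum api_status { API_OK = 0, API_ERROR = 1 };

/* What the error path would call instead of jumping to toplevel. */
void unwind_to_innermost_entry(void)
{
    assert(innermost_entry != NULL);
    longjmp(innermost_entry->env, 1);
}

/* Wrapper for an API function: set up the jump target in advance, run
   the body, and turn a non-local exit into an ordinary return value. */
enum api_status api_call(void (*body)(void *), void *arg)
{
    struct entry_point ep;
    ep.outer = innermost_entry;
    innermost_entry = &ep;

    if (setjmp(ep.env) == 0) {
        body(arg);
        innermost_entry = ep.outer;
        return API_OK;
    }

    /* We got here via longjmp: pop our record and report the error to
       the caller, whose own frames were not skipped. */
    innermost_entry = ep.outer;
    return API_ERROR;
}

/* Example body that fails, standing in for an overflow deep in Emacs. */
void failing_body(void *arg)
{
    (void) arg;
    unwind_to_innermost_entry();
}

/* Example module code: makes a nested API call and records its status. */
void module_body(void *arg)
{
    int *seen = arg;
    *seen = api_call(failing_body, NULL);
}
```

With this shape, api_call (module_body, &seen) itself returns API_OK
while seen ends up as API_ERROR: the failure stops at the innermost
entry point, and the module code above it gets to decide what to do
next, for example throw its own C++ exception.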

unwind_to_catch isn't really very different from the longjmp to
return_to_command_loop: I don't see any reason we can't run it on the
alternate signal stack. In fact, I don't see why we can't replace
return_to_command_loop generally with Fsignal.

I really don't like the stack overflow protection stuff in general
though. It's not possible to robustly recover, because the stack
overflow detection turns *any* function call into an operation that
might return non-locally. In that environment, where, say, XCAR might
end up running Lisp, it's hard to maintain invariants. I'd rather
Emacs just die on C stack overflow, except when we're running Lisp in
such a way that we know we can recover.

(The bad_alloc comment is more about exhausting the heap: even if we
exhaust the malloc heap instead of the stack, we'll still have set
aside enough space to throw a bad_alloc as long as Emacs returns
control to C++.)