On 12/22/2015 08:01 AM, Eli Zaretskii wrote:
>> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
>>  emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Mon, 21 Dec 2015 20:38:07 -0800
>>
>> Right now, when we get a SIGSEGV, we check the siginfo_t the OS gives us
>> by calling stack_overflow on it; if that returns true, we longjmp to
>> toplevel. We configure the sigsegv handler to run on an alternate stack,
>> so we'll always have space to do that much work. The longjmp restores
>> the original stack. On the other side of the longjmp, we resume
>> execution with our regular stack, but much further up the stack. At this
>> point, we know we have a stack overflow, because nothing else longjmps
>> to return_to_command_loop.
>>
>> Now, if we return normally to a C++ caller with an error indication set,
>> the C++ caller will almost certainly have enough stack space to throw
>> its own exception and propagate the exception further.
> 
> I very much doubt that.  The alternate stack is quite small (AFAIK,
> the standard value that we are using, SIGSTKSZ, is something like
> 8KB).  Running arbitrary C++ code on such a small stack is not safe.
> (My understanding is that the value of SIGSTKSZ should suffice for
> calling printf, and that's about it.)  There will be high risk of
> hitting yet another stack overflow, this time a fatal one.

We're not talking about running arbitrary C++ code on the small stack.
The longjmp transfers execution back to the original stack, but with
the frames that overflowed it popped off.

Overflow stack: A B C D E F G
Signal stack: 1 2 3 longjmp
Resumption stack: A B C
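
To make that sequence concrete, here is a minimal standalone sketch in C.
It is not the Emacs code: recovery_point stands in for
return_to_command_loop, and recover_from_overflow, recurse, on_segv, and
the 64 KiB ALT_STACK_SIZE are hypothetical names and values chosen for
illustration. It also skips the siginfo_t check that stack_overflow
performs and assumes any SIGSEGV (or SIGBUS) is a stack overflow.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for sigaltstack, sigaction, sigsetjmp */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)  /* illustrative size, not SIGSTKSZ */

/* Hypothetical stand-in for Emacs's return_to_command_loop. */
static sigjmp_buf recovery_point;
static volatile sig_atomic_t overflow_seen;
static volatile sig_atomic_t keep_recursing = 1;

/* Runs on the alternate stack ("1 2 3 longjmp" in the diagram).
   It does almost no work: record the event and jump away. */
static void on_segv(int sig)
{
    (void) sig;
    overflow_seen = 1;
    siglongjmp(recovery_point, 1);
}

/* Deliberately unbounded recursion (frames D E F G ...).  The
   volatile buffer keeps each call from being optimized away. */
static int recurse(int depth)
{
    volatile char pad[4096];
    pad[0] = (char) depth;
    if (!keep_recursing)
        return depth;
    return recurse(depth + 1) + pad[0];
}

/* Returns 1 if the overflow was caught and execution resumed on the
   original stack, 0 if nothing overflowed, -1 on setup failure. */
int recover_from_overflow(void)
{
    static char alt_stack[ALT_STACK_SIZE];
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;       /* run the handler on alt_stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0
        || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    /* Saving the signal mask (second argument 1) lets siglongjmp
       unblock SIGSEGV again when it restores this context. */
    if (sigsetjmp(recovery_point, 1) == 0) {
        recurse(0);                 /* overflows the regular stack */
        return 0;
    }

    /* Back on the regular stack, "A B C" in the diagram. */
    return overflow_seen ? 1 : 0;
}
```

On a typical POSIX system with a finite stack limit, calling
recover_from_overflow() should return 1. Only the handler itself runs on
the alternate stack; everything after the sigsetjmp return runs on the
regular stack again.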

> 
>> unwind_to_catch isn't really very different from the longjmp to
>> return_to_command_loop: I don't see any reason we can't run it on the
>> alternate signal stack. In fact, I don't see why we can't replace
>> return_to_command_loop generally with Fsignal.
> 
> See above: I think running arbitrary Lisp code on an 8KB stack is even
> less safe than with C++ code.  We avoid doing that for a good reason.
> Let me remind you that Emacs on Windows sets up an 8MB stack (as
> opposed to the standard 2MB) because it is necessary in some
> situations, like matching some regexps.  8MB, not 8KB!  A Lisp unwind
> handler can do anything at all, so I think running the unwinding code
> from a stack overflow is not an option, if we want to make sure stack
> overflow recovery will not hit another fatal stack overflow in most
> cases.
> 
>> I really don't like the stack overflow protection stuff in general
>> though. It's not possible to robustly recover, because the stack
>> overflow detection turns *any* function call into an operation that
>> might return non-locally. In that environment --- where, say, XCAR might
>> end up running lisp --- it's hard to maintain invariants.
> 
> It might be less than nice or elegant, but Emacs should give the user
> an opportunity to save their work.
> 
>> I'd rather Emacs just die on C stack overflow, except when we know
>> we're running Lisp in such a way that we know we can recover.
> 
> You are in effect saying the stack overflow recovery code should not
> have been added to Emacs.  But we already decided that an attempt to
> recover is a useful feature, and I see no reason to go back.  Even if
> this works only in some cases, partial recovery is better than a
> hard crash, because it lets users save their work.

Or it actually corrupts their work, because the Emacs core is in a bad
state. We can gracefully recover from stack overflow of Lisp code. We
cannot recover from stack overflow at arbitrary points in the C core.