On 12/24/2015 09:17 AM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
> 
>> Neither you nor Paul have addressed any of the alternatives to this
>> longjmp-from-anywhere behavior. You have not addressed the point that Emacs
>> can crash fatally in numerous ways having nothing to do with stack overflow.
>> You have not addressed the point that we already have robust stack overflow
>> protection at the Lisp level, and so don't need additional workarounds at
>> the C level. You have not even provided any evidence that C-level stack
>> overflow is a problem worth solving.
> 
> Would someone be willing to summarize where we're at at this point with this
> discussion? It has been long and large enough that I'm no longer clear on
> exactly what it is that we do and don't want, and why. Just a summary of our
> major alternatives at this point, and the most significant points for and
> against each would be great.
> 

If the C stack in Emacs overflows, Emacs crashes and terminates.
Normally, we prevent C stack overflow by preventing Lisp evaluation from
getting too deep by bounding it with the variables max-lisp-eval-depth
and max-specpdl-size, but a nasty C function can still overflow the
stack and crash.

In 2014, Emacs gained a new path in the SIGSEGV handler that attempts to
detect C stack overflow and longjmp back to toplevel. It's important to
note that we don't just longjmp when we're in a safe position: we
longjmp from *anywhere*, even if we're, say, in the middle of malloc.
This longjmp can corrupt internal state in Emacs or libc, cause
deadlocks, bypass C++ destructors in module code, or literally cause any
behavior whatsoever, since we're violating invariants of the system. The
longjmp also bypasses unwind-protect handlers and other kinds of
resource cleanup. Everyone acknowledges that this path is very unsafe.
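
To make the hazard concrete, here is a minimal sketch of the pattern. It
is not the Emacs code: the names toplevel, lock_held, recover,
critical_section and demo_longjmp_recovery are hypothetical, the fault is
simulated with raise, and lock_held merely stands in for internal state
such as an allocator lock. The handler jumps out regardless of what was
interrupted, so the cleanup after the fault never runs.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical "return to toplevel" point. */
static sigjmp_buf toplevel;

/* Hypothetical stand-in for internal state, e.g. an allocator lock. */
static volatile sig_atomic_t lock_held;

/* Jump back to toplevel no matter what code was interrupted. */
static void
recover (int sig)
{
  (void) sig;
  siglongjmp (toplevel, 1);
}

/* Stand-in for a library routine that faults in mid-update. */
static void
critical_section (void)
{
  lock_held = 1;
  raise (SIGSEGV);   /* simulated fault; the handler jumps away */
  lock_held = 0;     /* cleanup that the longjmp skips */
}

/* Returns lock_held after "recovery" (nonzero means the lock leaked),
   or -1 if the handler could not be installed.  */
static int
demo_longjmp_recovery (void)
{
  struct sigaction sa;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = recover;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGSEGV, &sa, NULL) != 0)
    return -1;

  if (sigsetjmp (toplevel, 1) == 0)
    critical_section ();
  return lock_held;
}
```

After the jump the program keeps running with lock_held still set. With
a real lock, the next caller would deadlock or operate on half-updated
state.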

Eli and Paul believe that "Emacs should never crash", and that
potentially saving user data is worth the risk of undefined behavior,
which they contend does not occur in practice.

They are wrong. This code is terrible, and we should delete it
immediately. The code is fundamentally flawed and cannot be made to work
properly on any platform. No other program attempts to recover from
stack overflow this way. (I surveyed a few in a previous message.)

In practice, the Lisp stack depth limits provide enough protection, and
the risk of data corruption is too great. The existing auto-save logic
is good enough for data recovery, especially if we run the SIGSEGV
handler on the alternate signal stack (which we can make as large as we
want) when possible.
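
As a rough sketch of the alternate-stack approach, assuming a POSIX
system: the names ALT_STACK_SIZE, on_segv and
install_segv_handler_on_alt_stack are hypothetical, and the 64 KiB size
is arbitrary. The handler runs on its own stack, so it still has room to
work after the main stack is exhausted, and it terminates the process
instead of jumping back into it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical size; unlike the main stack, we get to choose it. */
#define ALT_STACK_SIZE (64 * 1024)

/* Hypothetical handler: report the crash and terminate.  */
static void
on_segv (int sig)
{
  static const char msg[] = "fatal: SIGSEGV (possible C stack overflow)\n";

  (void) sig;
  /* Assumption: emergency data saving would be attempted here,
     before terminating.  */
  (void) write (STDERR_FILENO, msg, sizeof msg - 1);
  _exit (EXIT_FAILURE);
}

/* Returns 0 on success, -1 on failure.  */
static int
install_segv_handler_on_alt_stack (void)
{
  stack_t ss;
  struct sigaction sa;

  /* Allocate the alternate stack and register it with the kernel.  */
  ss.ss_sp = malloc (ALT_STACK_SIZE);
  if (ss.ss_sp == NULL)
    return -1;
  ss.ss_size = ALT_STACK_SIZE;
  ss.ss_flags = 0;
  if (sigaltstack (&ss, NULL) != 0)
    return -1;

  /* SA_ONSTACK asks for the handler to run on the alternate stack.  */
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_segv;
  sa.sa_flags = SA_ONSTACK;
  sigemptyset (&sa.sa_mask);
  return sigaction (SIGSEGV, &sa, NULL);
}
```

The installer would be called once, early in startup, before any deep
recursion can happen.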

C stack overflow is a programming error just as bad as *((char*)1)=2 and
we shouldn't try to recover from it, *especially* not when this recovery
is dangerous and leads to more problems than it solves.

If we keep this code in Emacs, it sets a precedent for other terrible
forms of crash recovery, like silently ignoring writes to NULL,
replacing reads from NULL with zero, longjmping out of SIGABRT, and so
on. If we believe "Emacs should never crash", we should fix its bugs,
not try to paper over them with undefined behavior.