On 12/24/2015 09:17 AM, John Wiegley wrote:
>>>>>> Daniel Colascione writes:
>
>> Neither you nor Paul have addressed any of the alternatives to this
>> longjmp-from-anywhere behavior. You have not addressed the point that Emacs
>> can crash fatally in numerous ways having nothing to do with stack overflow.
>> You have not addressed the point that we already have robust stack overflow
>> protection at the Lisp level, and so don't need additional workarounds at
>> the C level. You have not even provided any evidence that C-level stack
>> overflow is a problem worth solving.
>
> Would someone be willing to summarize where we're at at this point with this
> discussion? It has been long and large enough that I'm no longer clear on
> exactly what it is that we do and don't want, and why. Just a summary of our
> major alternatives at this point, and the most significant points for and
> against each would be great.
>

If the C stack in Emacs overflows, Emacs crashes and terminates. Normally,
we prevent C stack overflow by bounding the depth of Lisp evaluation with
the variables max-lisp-eval-depth and max-specpdl-size, but a nasty C
function can still overflow the stack and crash.

In 2014, Emacs gained a new path in the SIGSEGV handler that attempts to
detect C stack overflow and longjmp back to toplevel. It's important to
note that we don't just longjmp when we're in a safe position: we longjmp
from *anywhere*, even if we're, say, in the middle of malloc. This longjmp
can corrupt internal state in Emacs or libc, cause deadlocks, bypass C++
destructors in module code, or cause literally any behavior whatsoever,
since we're violating invariants of the system. The longjmp also bypasses
unwind-protect handlers and other kinds of resource cleanup. Everyone
acknowledges that this path is very unsafe.

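
To make the cleanup problem concrete, here is a minimal C sketch. It is not
the Emacs code: `toplevel`, `lock_held`, and both function names are made up
for illustration. It performs the longjmp from ordinary code rather than
from a signal handler, so the sketch itself is well defined, but it shows
the same effect: the code that would have released the "lock" never runs.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is an illustration, not Emacs code. */
static jmp_buf toplevel;          /* stands in for the "back to toplevel" target */
static volatile bool lock_held;   /* stands in for an internal lock, e.g. in malloc */

/* Simulates a fault that is "recovered" from inside a critical section. */
static void fault_inside_critical_section(void)
{
    lock_held = true;             /* acquire */
    longjmp(toplevel, 1);         /* jump out; nothing below this line runs */
    lock_held = false;            /* release (never reached) */
}

/* Returns true if the lock is still held after the jump back to toplevel. */
static bool recover_and_check_lock(void)
{
    lock_held = false;
    if (setjmp(toplevel) == 0)
        fault_inside_critical_section();
    /* We arrive here via longjmp, with the release step skipped. */
    return lock_held;
}
```

Calling `recover_and_check_lock()` returns true: control is back at
"toplevel", but the lock is still marked as held, so the next caller that
needs it would deadlock. Jumping out of a real signal handler that
interrupted malloc leaves things in a worse state than this, because the
interrupted code may be halfway through updating its own data structures.
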
Eli and Paul believe that "Emacs should never crash", and that potentially
saving user data is worth the risk of undefined behavior, which they
contend does not occur in practice.

They are wrong. This code is terrible, and we should delete it immediately.
It is fundamentally flawed and cannot be made to work properly on any
platform. None of the programs I surveyed in a previous message attempts to
recover from stack overflow this way. In practice, the Lisp stack depth
limits provide enough protection, and the risk of data corruption is too
great. The existing auto-save logic is good enough for data recovery,
especially if we run the SIGSEGV handler on the alternate signal stack
(which we can make as large as we want) when possible.

C stack overflow is a programming error just as bad as *((char*)1)=2, and
we shouldn't try to recover from it, *especially* not when this recovery is
dangerous and leads to more problems than it solves. If we keep this code
in Emacs, it sets a precedent for other terrible forms of crash recovery,
like silently ignoring writes to NULL, replacing reads from NULL with zero,
longjmping out of SIGABRT, and so on. If we believe "Emacs should never
crash", we should fix its bugs, not try to paper over them with undefined
behavior.