On 12/24/2015 08:10 AM, Eli Zaretskii wrote:
>> From: Paul Eggert
>> Date: Wed, 23 Dec 2015 18:51:23 -0800
>> Cc: Emacs Development
>>
>>> Please stop repeating the false idea that
>>> longjmp from arbitrary points in the program to toplevel is harmless.
>>
>> Neither Eli nor I have said it's harmless. Merely that it works well enough in
>> practice. Let's not make perfection the enemy of functionality.
>
> Right.
>
>> > the current mechanism does not achieve its goal. It's
>> > utterly unsafe even without module code added to the mix.
>>
>> It's safe enough in practice. You're right that in *theory* it's utterly unsafe,
>> but Emacs is a practical program not a theoretical exercise.
>>
>> Really, the idea that we'll let Emacs crash on stack overflow (merely because
>> modules are being used) is a non-starter. We need a better solution.
>
> 100% agreement.
>

You'd prefer Emacs to lock up or corrupt data instead?

Neither you nor Paul have addressed any of the alternatives to this
longjmp-from-anywhere behavior. You have not addressed the point that
Emacs can crash fatally in numerous ways having nothing to do with
stack overflow. You have not addressed the point that we already have
robust stack overflow protection at the Lisp level, and so don't need
additional workarounds at the C level. You have not even provided any
evidence that C-level stack overflow is a problem worth solving.

All I see is an insistence that the longjmp hack stay because "Emacs
must not crash", even though it demonstrably does crash in numerous
exciting ways, and won't stop any time soon. Real programs always have
bugs, and experience shows that failing quickly (while trying to
preserve data) is better than trying to limp along, which just makes
the situation worse.

I know the rebuttal to that last point is that the perfect shouldn't
be the enemy of the good. Believe me, I've debugged enough crashes and
hangs caused by well-intentioned crash recovery code to know that
invoking undefined behavior to recover from a crash is far below
"good" on the scale of things you can do to improve program
reliability.

There is a good reason that no other programs --- not other text
editors [1], not other VMs [2], not web browsers [3], not GCC [4], not
GDB [5] --- use the completely unsafe mechanism Emacs currently uses
to react to stack overflow. (If such programs exist, I haven't seen
them.) Most programs, in fact, don't bother trying to recover from
stack overflow, because most of the time, in practice, their stack use
is bounded statically.

Let me detail a *safe*, *effective* alternative one more time. If you
really want to make Lisp-induced stack overflow less likely, here is
how you do it:

1) Using some mechanism (alloca will work, although OS-specific
options exist), make sure you have X MB of address space dedicated to
the main thread on startup. At this point, we cannot lose data, and
failing to obtain this address space is both unlikely and as harmful
as failing to obtain space for Emacs BSS.

2) Now we know the addresses of the top and bottom of the stack.

3) Each time Lisp calls into C, each time a module calls into the
Emacs core, and on each QUIT, compute the distance between the current
stack pointer and the far end of the reserved stack region. The result
is a lower bound on the amount of stack space available. This
computation is very cheap: it's one load from global storage or TLS
and a subtract instruction.

4) If the amount of stack space available is less than some threshold,
say Y, signal a stack exhaustion error.

5) Require that C code (modules included) not use more than Y of
stack space between QUITs or calls to the module API.

6) Set Y to a reasonable figure like 4MB.

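To make steps 2 through 4 concrete, here is a rough sketch in C. It is
an illustration only, not a patch: every name in it (stack_guard_init,
stack_available, stack_exhausted, recurse_until_limit) is hypothetical
and none of them exists in Emacs. It assumes the stack grows toward
lower addresses, and it assumes the reserved size passed to
stack_guard_init has already been secured as in step 1, which the
sketch does not do itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none of them is Emacs API.
   Assumes a stack that grows toward lower addresses.  */

static uintptr_t stack_limit;  /* lowest address the stack may reach */
static size_t stack_margin;    /* the threshold "Y" from step 4 */

/* Step 2: call once at startup with the address of a local variable
   in main.  RESERVE is "X"; securing that much address space (step 1)
   is assumed to have happened already.  */
static void stack_guard_init(const void *base, size_t reserve, size_t margin)
{
  assert(margin < reserve);
  stack_limit = (uintptr_t) base - reserve;
  stack_margin = margin;
}

/* Step 3: one load and one subtraction.  HERE is the address of any
   local variable in the current frame, standing in for the stack
   pointer.  Returns a lower bound on the remaining stack space.  */
static size_t stack_available(const void *here)
{
  uintptr_t sp = (uintptr_t) here;
  return sp > stack_limit ? (size_t) (sp - stack_limit) : 0;
}

/* Step 4: nonzero means the caller should signal stack exhaustion
   as an ordinary error rather than keep recursing.  */
static int stack_exhausted(const void *here)
{
  return stack_available(here) < stack_margin;
}

/* Demonstration: recurse, checking at every level, and return the
   depth at which the check tripped.  The volatile read after the
   recursive call keeps the compiler from turning this into a loop.  */
static int recurse_until_limit(int depth)
{
  volatile char frame[256];
  frame[0] = (char) depth;
  if (stack_exhausted((const void *) frame))
    return depth;
  return recurse_until_limit(depth + 1) + (frame[0] & 0);
}
```

In a real implementation the check would sit at the Lisp-to-C call
boundary, at module API entry points and in QUIT, and a failed check
would raise a normal Lisp error while there is still Y of stack left
to handle it.
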
Third-party libraries must already be able to run in bounded stack
space because they're usually designed to run off the main thread, and
on both Windows and POSIX systems, non-main thread stacks are sized on
thread startup and cannot grow.

I have no idea why we would prefer the SIGSEGV trap approach to the
scheme I just outlined.

As a practical matter, modules will not adhere to weird Emacs-specific
stack overflow detection schemes. Insisting on them will not help. If
the current longjmp scheme remains in place, the user-visible behavior
will be "Emacs randomly locks up, and the stack in the debugger is
impossible according to the code as written", not "I was able to save
my data".

[1] vim (7.4.712) autosaves and exits on fatal signals, of which
SIGSEGV is one. It uses an alternate signal stack to do it, just as I
proposed in a previous message.

[2] hotspot (openjdk-8-u845-b14) uses a guard page to generate
StackOverflowError when Java code blows the stack, but if the
overflowing frame is C code, it simply lets the program crash.

[3] Firefox 43 uses Breakpad to handle fatal errors (which we really
should do too, but that's a separate discussion).

[4] GCC (current master) turns SIGSEGV into an internal compilation
error.

[5] GDB (current master) just crashes on SIGSEGV, although it does
have a special case for trying to catch crashes in the C++ name
demangler functions.