On 12/24/2015 08:10 AM, Eli Zaretskii wrote:
>> From: Paul Eggert <eggert@cs.ucla.edu>
>> Date: Wed, 23 Dec 2015 18:51:23 -0800
>> Cc: Emacs Development <Emacs-devel@gnu.org>
>>
>>> Please stop repeating the false idea that
>>> longjmp from arbitrary points in the program to toplevel is harmless.
>>
>> Neither Eli nor I have said it's harmless. Merely that it works well enough in 
>> practice. Let's not make perfection the enemy of functionality.
> 
> Right.
> 
>>  > the current mechanism does not achieve its goal. It's
>>  > utterly unsafe even without module code added to the mix.
>>
>> It's safe enough in practice. You're right that in *theory* it's utterly unsafe, 
>> but Emacs is a practical program not a theoretical exercise.
>>
>> Really, the idea that we'll let Emacs crash on stack overflow (merely because 
>> modules are being used) is a non-starter. We need a better solution.
> 
> 100% agreement.
> 

You'd prefer Emacs to lock up or corrupt data instead?

Neither you nor Paul have addressed any of the alternatives to this
longjmp-from-anywhere behavior. You have not addressed the point that
Emacs can crash fatally in numerous ways having nothing to do with stack
overflow. You have not addressed the point that we already have robust
stack overflow protection at the Lisp level, and so don't need
additional workarounds at the C level. You have not even provided any
evidence that C-level stack overflow is a problem worth solving.

All I see is an insistence that the longjmp hack stay because "Emacs
must not crash", even though Emacs demonstrably does crash in numerous
exciting ways and won't stop any time soon, because real programs
always have bugs. Experience shows that failing quickly (while trying
to preserve data) is better than trying to limp along, which just
makes the situation worse.

I know the rebuttal to that last point is that the perfect shouldn't be
the enemy of the good: believe me, I've debugged enough crashes and
hangs caused by well-intentioned crash recovery code to know that
invoking undefined behavior to recover from a crash is far below "good"
on the scale of things you can do to improve program reliability.

There is a good reason that no other program --- not other text editors
[1], not other VMs [2], not web browsers [3], not GCC [4], nor GDB [5]
--- uses the completely unsafe mechanism Emacs currently uses to react
to stack overflow. (If such programs exist, I haven't seen them.) Most
programs, in fact, don't bother trying to recover from stack overflow,
because most of the time, in practice, their stack use is bounded
statically.

Let me detail a *safe*, *effective* alternative one more time. If you
really want to make lisp-induced stack overflow less likely, here is how
you do it:

1) Using some mechanism (alloca will work, although OS-specific options
exist), make sure you have X MB of address space dedicated to the main
thread on startup. At this point, we cannot lose data, and failing to
obtain this address space is both unlikely and no more harmful than
failing to obtain space for Emacs BSS.

2) Now we know the addresses of the top and bottom of the stack.

3) Each time Lisp calls into C, each time a module calls into the
Emacs core, and on each QUIT, subtract the current stack pointer from
the top of the stack. The result is a lower bound on the amount of stack
space available. This computation is very cheap: it's one load from
global storage or TLS and a subtract instruction.

4) If the amount of stack space available is less than some threshold,
say Y, signal a stack exhaustion error.

5) Require that C code (modules included) not use more than Y of
stack space between QUITs or calls to the module API.

6) Set Y to a reasonable figure like 4MB. Third-party libraries must
already be able to run in bounded stack space because they're usually
designed to run off the main thread, and on both Windows and POSIX
systems, non-main thread stacks are sized on thread startup and cannot grow.
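
To make steps 2 through 4 concrete, here is a minimal sketch in C. It
is an illustration under stated assumptions, not code from Emacs: every
name in it (stack_bounds, bounds_from_top, current_sp, stack_remaining,
stack_exhausted) is hypothetical, it assumes the stack grows toward
lower addresses, and it assumes step 1 has already reserved the region.
It measures the space left as the distance from the stack pointer down
to the low end of that region, which is one reading of step 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in Emacs.
   Assumes the stack grows toward lower addresses.  */

typedef struct
{
  uintptr_t top;     /* highest address of the reserved region */
  uintptr_t bottom;  /* lowest address we may still use */
} stack_bounds;

/* Step 2: record the bounds once, after step 1 has reserved
   RESERVED bytes below TOP.  */
static stack_bounds
bounds_from_top (uintptr_t top, size_t reserved)
{
  assert (reserved <= top);
  stack_bounds b = { top, top - reserved };
  return b;
}

/* Approximate the stack pointer by the address of a local.  */
static uintptr_t
current_sp (void)
{
  volatile char marker = 0;
  return (uintptr_t) &marker;
}

/* Step 3: bytes left between SP and the bottom of the region;
   zero if SP is already at or below the bottom.  */
static size_t
stack_remaining (const stack_bounds *b, uintptr_t sp)
{
  return sp > b->bottom ? (size_t) (sp - b->bottom) : 0;
}

/* Step 4: true when fewer than HEADROOM (the Y above) bytes remain,
   i.e. when the caller should signal a stack exhaustion error.  */
static bool
stack_exhausted (const stack_bounds *b, uintptr_t sp, size_t headroom)
{
  return stack_remaining (b, sp) < headroom;
}
```

A check site (a Lisp-to-C call, a module API entry point, or QUIT)
would then call stack_exhausted (&bounds, current_sp (), Y) and signal
the error when it returns true. The whole check is a load, a subtract,
and a compare.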

I have no idea why we would prefer the SIGSEGV trap approach to
the scheme I just outlined.

As a practical matter, modules will not adhere to weird Emacs-specific
stack overflow detection schemes. Insisting on them will not help. If
the current longjmp scheme remains in place, the user-visible behavior
will be "Emacs randomly locks up, and the stack in the debugger is
impossible according to the code as written", not "I was able to save my
data".

[1] vim (7.4.712) autosaves and exits on fatal signals, of which SIGSEGV
is one. It uses an alternate signal stack to do it, just as I proposed
in a previous message.
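
For anyone who hasn't seen the alternate-signal-stack mechanism, here
is a minimal POSIX sketch in C. It is not vim's code: the names
fatal_handler and install_fatal_handler and the 64 KiB stack size are
made up for illustration, and the autosave step is only a comment.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler: report, then exit without returning to the
   faulting code.  Only async-signal-safe calls are used here.  */
static void
fatal_handler (int sig)
{
  static const char msg[] = "fatal signal: saving and exiting\n";
  ssize_t ignored = write (STDERR_FILENO, msg, sizeof msg - 1);
  (void) ignored;
  /* A real editor would write its autosave data here.  */
  _exit (128 + sig);
}

/* Run the SIGSEGV handler on a separate stack, so that it still has
   room to work when the main stack has overflowed.
   Returns 0 on success, -1 on failure.  */
static int
install_fatal_handler (void)
{
  static char altstack[64 * 1024];  /* assumed size for this sketch */
  stack_t ss;
  struct sigaction sa;

  ss.ss_sp = altstack;
  ss.ss_size = sizeof altstack;
  ss.ss_flags = 0;
  if (sigaltstack (&ss, NULL) != 0)
    return -1;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = fatal_handler;
  sa.sa_flags = SA_ONSTACK;   /* deliver on the alternate stack */
  sigemptyset (&sa.sa_mask);
  return sigaction (SIGSEGV, &sa, NULL);
}
```

The point of the design is that the handler never returns into the
program and never longjmps: it saves what it can and exits.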

[2] hotspot (openjdk-8-u845-b14) uses a guard page to generate
StackOverflowError when Java code blows the stack, but if the
overflowing frame is C code, it simply lets the program crash.

[3] Firefox 43 uses Breakpad to handle fatal errors (which we really
should do too, but that's a separate discussion).

[4] GCC (current master) turns SIGSEGV into an internal compilation error.

[5] GDB (current master) just crashes on SIGSEGV, although it does have
a special case for trying to catch crashes in the C++ name demangler
functions.