On 12/22/2015 08:01 AM, Eli Zaretskii wrote:
>> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
>> emacs-devel@gnu.org
>> From: Daniel Colascione
>> Date: Mon, 21 Dec 2015 20:38:07 -0800
>>
>> Right now, when we get a SIGSEGV, we check the siginfo_t the OS gives us
>> by calling stack_overflow on it; if that returns true, we longjmp to
>> toplevel. We configure the sigsegv handler to run on an alternate stack,
>> so we'll always have space to do that much work. The longjmp restores
>> the original stack. On the other side of the longjmp, we resume
>> execution with our regular stack, but much further up the stack. At this
>> point, we know we have a stack overflow, because nothing else longjmps
>> to return_to_command_loop.
>>
>> Now, if we return normally to a C++ caller with an error indication set,
>> the C++ caller will almost certainly have enough stack space to throw
>> its own exception and propagate the exception further.
>
> I very much doubt that. The alternate stack is quite small (AFAIK,
> the standard value that we are using, SIGSTKSZ, is something like
> 8KB). Running arbitrary C++ code on such a small stack is not safe.
> (My understanding is that the value of SIGSTKSZ should suffice for
> calling printf, and that's about it.) There will be high risk of
> hitting yet another stack overflow, this time a fatal one.

We're not talking about running arbitrary C++ code on the small stack.
The longjmp transfers execution to the original stack, but with the
context popped off.

  Overflow stack:   A B C D E F G
  Signal stack:     1 2 3 longjmp
  Resumption stack: A B C

>
>> unwind_to_catch isn't really very different from the longjmp to
>> return_to_command_loop: I don't see any reason we can't run it on the
>> alternate signal stack. In fact, I don't see why we can't replace
>> return_to_command_loop generally with Fsignal.
>
> See above: I think running arbitrary Lisp code on an 8KB stack is even
> less safe than with C++ code.

We avoid doing that for a good reason.

> Let me remind you that Emacs on Windows sets up an 8MB stack (as
> opposed to the standard 2MB) because it is necessary in some
> situations, like matching some regexps. 8MB, not 8KB! A Lisp unwind
> handler can do anything at all, so I think running the unwinding code
> from a stack overflow is not an option, if we want to make sure stack
> overflow recovery will not hit another fatal stack overflow in most
> cases.
>
>> I really don't like the stack overflow protection stuff in general
>> though. It's not possible to robustly recover, because the stack
>> overflow detection turns *any* function call into an operation that
>> might return non-locally. In that environment --- where, say, XCAR might
>> end up running lisp --- it's hard to maintain invariants.
>
> It might be less than nice or elegant, but Emacs should give the user
> an opportunity to save their work.
>
>> I'd rather Emacs just die on C stack overflow, except when we know
>> we're running Lisp in such a way that we know we can recover.
>
> You are in effect saying the stack overflow recovery code should not
> have been added to Emacs. But we already decided that an attempt to
> recover is a useful feature, and I see no reason to go back. Even if
> this works only in some cases, partial recovery is better than a
> hard crash, because it lets users save their work.

Or it actually corrupts their work, because the Emacs core is in a bad
state. We can gracefully recover from stack overflow of Lisp code. We
cannot recover from stack overflow at arbitrary points in the C core.