On 01/03/2016 11:15 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 11:04:08 -0800
>>
>>> Yes, it is.  You would like us to crash rather than try recovering.
>>> That is a very heavy price in Emacs.
>>
>> Why is it uniquely unacceptable in Emacs? Why do other programs that
>> fill the same niche not employ this strategy?
> 
> Not many other programs run for so long and have so much precious data
> for their users.  Besides, who says there are no other programs that
> do this?  libsigsegv wasn't written as an academic exercise.

Many other programs run as long. One example is the Linux kernel, which
panics on stack overflow.

>> Why do we not try to mitigate NULL pointer dereferences (to which
>> all the same arguments apply)?
> 
> We do: we catch SIGSEGV and try to save what can be salvaged.

Invoking auto-save after resetting SIGSEGV is a good application of that
approach. (We should make sure that control flow can't leave the sigsegv
handler.) What's dangerous is allowing Emacs to continue running after
we've detected that it's entered a bad state. I'm not against installing
a sigsegv handler: I'm against returning control flow to toplevel.
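
To make that shape concrete, here is a rough sketch of a handler that
saves and then dies without ever returning control to the interrupted
code. It is not the Emacs implementation: emergency_save, save_path,
unsaved_text, install_crash_handler, and crash_in_child are hypothetical
names invented for illustration, and the "auto-save" is just one write
of a fixed string. It assumes a POSIX system.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for editor state; not real Emacs identifiers. */
static const char *save_path;
static const char unsaved_text[] = "precious unsaved text\n";

/* Stand-in for auto-save.  Uses only async-signal-safe calls. */
static void emergency_save(void)
{
    int fd = open(save_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        if (write(fd, unsaved_text, sizeof unsaved_text - 1) < 0) {
            /* Nothing safe to do here; fall through and die. */
        }
        close(fd);
    }
}

/* SA_RESETHAND has already restored SIG_DFL by the time we run, and
   SA_NODEFER leaves SIGSEGV unblocked, so a second fault in here kills
   the process instead of re-entering the handler. */
static void on_sigsegv(int sig)
{
    emergency_save();
    raise(sig);          /* default action: terminate with SIGSEGV */
    _exit(128 + sig);    /* backstop; control never goes back to the caller */
}

int install_crash_handler(const char *path)
{
    static char alt_stack[1 << 16];  /* handler stack, usable after overflow */
    stack_t ss;
    struct sigaction sa;

    save_path = path;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigsegv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK | SA_RESETHAND | SA_NODEFER;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demo harness: fork a child that installs the handler and then faults.
   Returns the number of the signal that killed the child, or -1. */
int crash_in_child(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_crash_handler(path) != 0)
            _exit(1);
        raise(SIGSEGV);  /* simulate the fault */
        _exit(2);        /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The parent sees the child die from SIGSEGV, and the save file is on disk.
The point of the design is that nothing after the fault runs except the
save itself.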

>>>> My point isn't that memory leaks are disastrous. It's that the
>>>> consequences of this code weren't given due consideration at the time it
>>>> was committed.
>>>
>>> You have absolutely no evidence that this wasn't considered.  It's
>>> factually incorrect.  You don't have to know that it's incorrect, but
>>> I would expect you to give more credit to our collective knowledge and
>>> experience than you evidently do.
>>
>> I searched the mailing list and saw no discussion of the points I
>> raised.
> 
> Who said that considerations must be in public discussions?  On the
> contrary, I'd rather take the lack of discussions as an indication
> that this was considered and no one saw any problem with it.

The absence of discussion is consistent with both my view and widespread,
sagacious approval. Given the concerns I raised, the more parsimonious
explanation is that the code went in without review, because even if you
and Paul are right, it's worth having a conversation about the dangers
of the code, and AFAICT, there was none.

>>>>> You are not objective, so you exaggerate the risks and dismiss the
>>>>> benefits.
>>>>
>>>> I disagree that there *are* significant benefits.
>>>
>>> Of course, you do.  Like I said: your bias affects your judgment.
>>
>> So does yours.
> 
> No, I acknowledge the risks.  You don't acknowledge the benefits.

The benefit is that returning control to toplevel allows the user to
save data in buffers where autosave is not enabled. I think the benefit
is slight.

Autosave is the only mechanism that protects against other failure
modes, like the OOM killer, NULL pointer dereferences, and sudden power
loss. Consequently, I strongly suspect that any truly precious data is
in autosave buffers and that this stack overflow mitigation in practice
allows the recovery of nothing important.

>>> It's not undefined behavior, not in practice.  We know quite well what
>>> can and cannot happen.
>>
>> No you don't, because we can longjmp out of third-party code
> 
> FUD.  What "third-party code"?  Any code we use in Emacs has its
> sources open for scrutiny.

First of all, it's perfectly legal to update libc to a version that
wasn't around for a particular Emacs release, and this libc (which is
perfectly conforming under _legitimate_ API use) might have problems
with the Emacs recovery scheme that we didn't and couldn't anticipate.

Also, third-party libraries are generally written under the assumption
that control isn't yanked from under them partway through delicate
operations. I don't think it's reasonable to expect that every library
Emacs uses be robust under this kind of abuse.
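
A toy illustration of the kind of breakage I mean, under loud
assumptions: lib_append is a made-up "library" call, the flag
lib_lock_held stands in for a library-internal lock, and a plain longjmp
from a callback stands in for the jump back to toplevel. None of these
names come from Emacs or any real library.

```c
#include <setjmp.h>

static jmp_buf toplevel;        /* stand-in for the recovery target */

/* Hypothetical library state.  Invariant when idle:
   lock not held, and lib_item_count == lib_items_written. */
static int lib_lock_held;
static int lib_item_count;
static int lib_items_written;

/* Stand-in for the fault: yank control back to "toplevel". */
static void deep_fault(void)
{
    longjmp(toplevel, 1);
}

/* Made-up library call that is mid-update while the hook runs.
   Returns 0 on success, -1 if the "lock" is already held.  With a real
   mutex the caller would deadlock instead of getting -1. */
int lib_append(void (*hook)(void))
{
    if (lib_lock_held)
        return -1;
    lib_lock_held = 1;
    lib_item_count++;
    if (hook)
        hook();                 /* control may leave here and never return */
    lib_items_written++;
    lib_lock_held = 0;
    return 0;
}

/* Nonzero only if the library's invariant holds. */
int lib_is_consistent(void)
{
    return !lib_lock_held && lib_item_count == lib_items_written;
}

/* Interrupt one library call, then try to use the library again from
   "toplevel".  Returns the result of the second call. */
int demo_interrupted_call(void)
{
    if (setjmp(toplevel) == 0) {
        lib_append(deep_fault);
        return 0;               /* not reached: deep_fault jumps away */
    }
    /* Back at toplevel: the cleanup in lib_append never ran. */
    return lib_append((void (*)(void)) 0);
}
```

After the jump, the second call fails because the library still believes
it is mid-operation, and its counters disagree. The library did nothing
wrong here; it just never got to finish.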

>>> Anyway, saying that "unpleasant things can happen" _is_ FUD.  I want
>>> to see a single bug report about these unpleasant things happening in
>>> real use, then I'll start thinking whether I should reconsider.
>>
>> And I want to see a real bug report about the stack overflow we're
>> trying to defend against.
> 
> We've been through that already: if stack overflow never happens, the
> recovery code can never cause any problems.

Given that stack overflow is rare, we won't get to test the scenario
much. We should err on the side of making Emacs behave predictably
instead of trying to recover using undefined behavior, because if the
recovery causes problems, it'll be hard to tell.

>> The failure mode here wouldn't be obvious either: Emacs could just
>> silently crash, hang, or write a wrong byte or two to a file.
> 
> Neither of which is a disaster.

Neither of which will produce a bug report blaming this code, so the
lack of bug reports is not positive evidence that this code is harmless.

>> You have no idea what might happen, which is especially concerning
>> because Emacs is frequently an internet-facing network program parsing
>> untrusted data.
> 
> All I want is to take every measure to avoid losing work.  Every other
> problem was already there before stack-overflow recovery was added.

I agree that we should avoid losing work. The way to do that is to beef
up autosave so that after a crash, we can recover quickly. That's the
approach other long-running programs with precious user data, like
Office, Visual Studio, Firefox, and vim, use.