On 01/03/2016 01:45 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
> 
>> Ideally, Emacs would, on crash (and after auto-save), spawn a copy of itself
>> with an error report pre-filled. Fork and exec work perfectly fine in signal
>> handlers.
> 
> One problem here is that some of us have extensive configurations that load a
> great deal of saved state between executions. Spawning a new Emacs just to
> send an error report is not something I'd want to see happen.

Are you worried about startup time or correctness? Either way, wouldn't
spawning a new Emacs with -Q solve the problem?

>> But in any case, if we put Emacs into a state where the only thing a user
>> can do is save files, why not just save the files? There's no guarantee that
>> after a crash that we can even display something.
> 
> So, on a detected crash, auto-save all files, and save a text file with the
> crash data before exiting? That sounds pretty safe and reasonable to me.

I'm imagining more of a minidump than a text file, but yes, that's the
basic idea.

> Maybe we could even pop up a window to alert the user, and prompt them to press
> a key, but the only action will be to exit (unless the user is a power user
> and uses recursive edit to attempt to interact with their now-broken Emacs).

That's a reasonable UI, but popping up a window or otherwise displaying
UI in-process might not work. Instead, we can fork and exec a new Emacs
to interact with the user. The crashing Emacs then reads, from a pipe
that the new process inherits, a byte telling it what it should do. All
of that is perfectly legal to do from a signal handler, where only
async-signal-safe calls are allowed.
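
As a rough sketch of the fork-and-pipe handshake described above: the
function name `crash_ask_helper` and the convention that the helper
writes its one-byte answer to stdout are assumptions made for
illustration, not existing Emacs code. It restricts itself to calls
that POSIX.1-2008 lists as async-signal-safe.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: run HELPER with ARGV and return the single byte it
   writes to its stdout, or -1 if it exits without writing one.  */
int
crash_ask_helper (const char *helper, char *const argv[])
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: make the pipe's write end our stdout, then exec.  */
      close (fds[0]);
      if (fds[1] != STDOUT_FILENO)
        {
          if (dup2 (fds[1], STDOUT_FILENO) < 0)
            _exit (127);
          close (fds[1]);
        }
      execv (helper, argv);
      _exit (127);              /* exec failed */
    }

  /* Parent: wait for exactly one byte, or EOF if the helper died.  */
  close (fds[1]);
  unsigned char answer = 0;
  ssize_t n;
  do
    n = read (fds[0], &answer, 1);
  while (n < 0 && errno == EINTR);
  close (fds[0]);

  /* Reap the helper so it does not linger as a zombie.  */
  int status;
  while (waitpid (pid, &status, 0) < 0 && errno == EINTR)
    continue;

  return n == 1 ? answer : -1;
}
```

A real handler would also save and restore errno around this call, and
would need to decide what an EOF (helper crashed or failed to start)
should mean. Treating it as "just save and exit" seems the safest
default.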

The new Emacs has to know *how* to display a message. I think it should
be possible to look at the current frame's window system information.
For NS and Win32, we just need to know whether it's a GUI or a tty. For
X11, we'd just need to extract the display. On every frame switch, we
can record this information in state that can be read in an
async-signal-safe way (say, a global char buffer), so that we can launch
the child Emacs with the right display parameters.
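
One way the "global char buffer" could look, as a sketch only: the
names `record_frame_display` and `crash_display` are hypothetical, and
the two-slot scheme is my assumption about how to keep a signal handler
from ever seeing a half-written string. Normal code fills the inactive
slot and then flips an index; the handler only reads the active slot.

```c
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

enum { DISPLAY_MAX = 256 };

/* Two slots: one is always complete and readable from a handler.  */
static char display_slots[2][DISPLAY_MAX];
static volatile sig_atomic_t display_active;

/* Called from normal code on every frame switch.  DISPLAY is e.g.
   an X11 display string, or "" for a tty frame.  */
void
record_frame_display (const char *display)
{
  int next = !display_active;
  strncpy (display_slots[next], display, DISPLAY_MAX - 1);
  display_slots[next][DISPLAY_MAX - 1] = '\0';
  /* Keep the compiler from moving the copy past the index flip.  */
  atomic_signal_fence (memory_order_release);
  display_active = next;
}

/* Safe to call from a signal handler: reads only static storage.  */
const char *
crash_display (void)
{
  return display_slots[display_active];
}
```

The handler can then pass `crash_display ()` straight into the child's
argument vector without allocating or formatting anything.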

If the user indicates via the new process that she wants to continue
using the broken Emacs, great. We should support doing just that. It'd
be nice also to give that child Emacs support for attaching GDB to its
parent, actually. Of course it's possible to attach GDB manually, but
why not make it convenient?

>> We have no information on how often Emacs crashes in the hands of real users
>> or how it crashes. A wait-and-see approach is just blind faith.
> 
> I prefer to think of it as data gathering. Accepting the words of one person
> about what the future will look like is more in line with the faith approach.
> I'm not hearing a chorus of voices against this feature, and I have the word
> of other seasoned engineers in support of it.
> 
>> One question that neither you, nor Eli, nor Paul have answered is why we
>> would try to recover from stack overflow and not NULL dereferences. Exactly
>> the same arguments apply to both situations.
> 
> Why must it be all or nothing? Some is better than nothing. The error handler
> can evolve after we know just how useful it is (or whether it is).

If we had real data, I'd be more comfortable with the feature. As it is,
we have to rely on user reports, and I suspect that most users won't
bother reporting occasional hangs and crashes if it's any harder than
pushing a button. Given the absence of quantitative information, I'd
rather avoid undefined behavior.

> Eli, Paul: What do you think about just auto-saving as much as possible,
> writing an error trace to a file, and prompting the user to press a key, after
> which we abort the running Emacs? This is in line with what many of my OS X
> applications do when they encounter a fatal error; they're kind enough to tell
> me that it happened, and give me an "OK" button to click before they abort,
> but they don't allow me to continue to operate the application in an unknown
> state.

That works. In particular, on startup, we can create a new, empty file
under ~/.emacs.d and keep a file descriptor to it open. Normally, we'll
never write to the file. If we see a crash of *any* sort, however ---
stack overflow or some other bug --- we'll prompt the user. If the user
elects to continue using Emacs or attach a debugger, fine.

If not, we'll write information about the crash, followed by the
contents of dirty buffers, to the file we've already opened.

On next startup, for each crash file we find that isn't owned by a
running Emacs, we'll

  1) read and parse the crash file,
  2) prompt the user to send a bug report, and
  3) restore the contents of persisted buffers.

To avoid crash loops arising from certain arrangements of buffer
contents, we can restore each buffer in fundamental-mode, and with a
name indicating that it's recovered data.

The advantage of using this scheme instead of the generic auto-save is
that this one is async-signal-safe (and never runs Lisp), can't fail
except through disk space exhaustion or the Emacs process disappearing
(because we've preallocated all other resources), works for
non-file-backed buffers that wouldn't ordinarily be autosaved, and makes
state restoration explicit.

It also works perfectly well for crashes in module code.

Of course, the downside is that the code to do this doesn't exist yet.