On 01/03/2016 01:28 PM, Daniel Colascione wrote:
> On 01/03/2016 01:07 PM, John Wiegley wrote:
>>>>>>> Daniel Colascione writes:
>>
>>> It's not just a theoretical problem: I've spent lots of late nights staring
>>> at stack traces, trying to figure out how a certain deadlock could be
>>> possible, only to realize that the program had already crashed --- or would
>>> have, if a seldom-tested bit of code hadn't checked for NULL and returned
>>> without releasing a lock, causing a hang half an hour later.
>>
>> I see. Isn't what you describe an argument against error handling in general,
>> though? It too can mask the origin of serious problems.
>
> It is. There's a difference between trying to paper over undefined
> behavior generally, however, and reporting well-defined errors using a
> safe mechanism. (The former invalidates the system's own invariants,
> while the latter invalidates only the application's invariants.)
>
> But yes, error handling in general can paper over bugs, and I've
> certainly seen Emacs bugs similarly exacerbated by attempting to ignore
> errors.
>
>> What if we do this:
>>
>> 1. When a serious error occurs that engages crash recovery, we pop up a
>>    window in Emacs describing that a serious error occurred that would have
>>    crashed Emacs --and that *nothing* should be trusted now. All the user
>>    should do is save critical buffers and exit immediately.
>
> The call to Fdo_auto_save tries to do that already. Fdo_auto_save isn't
> async-signal-safe, so I'd rather fork a child process, in the child,
> call Fdo_auto_save and exit, have the parent wait 500ms for the child
> (not forever, in case the child deadlocks), kill the child, and continue
> crashing. That, or provide a less elaborate, async-signal-safe, pure C
> auto-save facility.

I'd also support doing no auto-save at crash time.
Auto-save should happen frequently enough anyway that users shouldn't lose much data when a crash happens, and not auto-saving sidesteps a lot of robustness concerns.
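
For concreteness, the fork-and-wait variant described in the quoted message could look roughly like the sketch below. This is only an illustration under assumptions, not Emacs code: run_with_deadline, fake_save_ok and fake_save_hang are hypothetical names, and the save callback stands in for whatever the child would really run (Fdo_auto_save in the discussion above). The parent side sticks to fork, waitpid, poll, kill and _exit, which I believe POSIX.1-2008 lists as async-signal-safe, but check that against your platform before relying on it.

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type standing in for the real auto-save work.  */
typedef void (*save_fn) (void);

/* Hypothetical stand-ins used only to exercise the sketch.  */
void fake_save_ok (void) { /* pretend the save finished quickly */ }
void fake_save_hang (void) { for (;;) pause (); /* simulate a deadlock */ }

/* Run FN in a forked child and give it at most TIMEOUT_MS milliseconds.
   Return 1 if the child ended on its own, 0 if it had to be killed,
   and -1 if fork or waitpid failed.  */
int
run_with_deadline (save_fn fn, int timeout_ms)
{
  const int step_ms = 10;       /* polling granularity, chosen arbitrarily */
  int waited_ms = 0;
  int status;

  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child: do the risky work, then leave without running atexit
         handlers or flushing stdio buffers inherited from the parent.  */
      fn ();
      _exit (0);
    }

  /* Parent: poll for the child instead of blocking forever.  */
  for (;;)
    {
      pid_t r = waitpid (pid, &status, WNOHANG);
      if (r == pid)
        return 1;               /* child ended before the deadline */
      if (r < 0 && errno != EINTR)
        return -1;
      if (waited_ms >= timeout_ms)
        break;
      poll (NULL, 0, step_ms);  /* sleep about step_ms milliseconds */
      waited_ms += step_ms;
    }

  /* Deadline passed: kill the child and reap it so no zombie is left.  */
  kill (pid, SIGKILL);
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  /* The child may have exited just before the kill arrived.  */
  if (WIFSIGNALED (status) && WTERMSIG (status) == SIGKILL)
    return 0;
  return 1;
}
```

A crash handler would call something like run_with_deadline (save, 500) and then continue crashing whatever the result. The timeout is approximate, since poll can return early when a signal arrives. The sketch also says nothing about whether the child's copy of the heap is in a usable state, which is the robustness concern that makes skipping the crash-time auto-save attractive.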