From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Crash robustness (Was: Re: Dynamic modules:
 MODULE_HANDLE_SIGNALS etc.)
Date: Wed, 23 Dec 2015 08:25:51 -0800
Message-ID: <567ACB0F.9060804@dancol.org>
References: <83mvu1x6t3.fsf@gnu.org> <565779CD.80405@cs.ucla.edu>
 <83io4nuc68.fsf@gnu.org> <83r3iht93x.fsf@gnu.org>
 <838u4psznr.fsf@gnu.org> <56772054.8010401@cs.ucla.edu>
 <83zix4scgf.fsf@gnu.org> <5677DBC9.6030307@cs.ucla.edu>
 <83io3rst2r.fsf@gnu.org> <567841A6.4090408@cs.ucla.edu>
 <567844B9.2050308@dancol.org> <5678CD07.8080209@cs.ucla.edu>
 <5678D3AF.7030101@dancol.org> <83oadiqxq1.fsf@gnu.org>
 <5679B33E.9000804@dancol.org> <83y4cmp5y5.fsf@gnu.org>
 <5679B7F5.9030504@dancol.org> <83twnap4xa.fsf@gnu.org>
 <5679BE1D.5070903@dancol.org> <83poxxp2rl.fsf@gnu.org>
In-Reply-To: <83poxxp2rl.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="4nqgO3QLi1RrVXuDsM8Kf5qWUcPkLxp8i"
To: Eli Zaretskii
Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com,
 eggert@cs.ucla.edu, tzz@lifelogs.com, emacs-devel@gnu.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.4.0
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:196714

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--4nqgO3QLi1RrVXuDsM8Kf5qWUcPkLxp8i
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 12/23/2015 08:07 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>>  p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione
>> Date: Tue, 22 Dec 2015 13:18:21 -0800
>>
>>>> Which is why you setjmp in places where you have a significant stack
>>>> reserve.
>>>
>>> There's no way of doing that portably, or even non-portably on many
>>> platforms. You simply don't _know_ how much stack is left.
>>
>> You can probe at program start and pre-allocate as much as is reasonable.
>
> Pre-allocate what? Are you suggesting that Emacs allocates its own
> stack, instead of relying on the one provided by the linker and the
> OS?

We can alloca, say, 8MB, and write to the start and end of the allocated
region. Then we'll know we have at least that much stack space available.

>>>> Longjmp, by itself, is simple and clear. What's unreliable is longjmping
>>>> to Lisp at completely arbitrary points in the program, even ones marked
>>>> "GC can't happen here" and the like.
>>>
>>> We longjmp to a particular place, not arbitrary place.
>>
>> But we longjmp _from_ anywhere, and "anywhere" might be in the middle of
>> any delicate code sequence, since the compiler can generate code to
>> write to new stack slots at any point.
>
> I simply don't see any trouble this could cause, except leaking some
> memory.
> Can you describe in enough detail a single use case where this could
> have any other adverse effects that we should care about when
> recovering from stack overflow?

What happens if we overflow inside malloc? One possibility is that we'll
longjmp back to toplevel without releasing the heap lock, then deadlock
the next time we try to allocate.

>>>> You say Emacs shouldn't crash. Fine. We can't make that guarantee
>>>> if the crash recovery code breaks program invariants.
>>>
>>> Crash recovery doesn't need to keep invariants. Or maybe I
>>> misunderstand what invariants do you have in mind.
>>
>> Any stack allocation anywhere in the program can longjmp. It's
>> impossible to reason about safety in that situation.
>
> Emacs is not safety-critical software, so there's no requirement to
> reason about safety. Since I think the recovery's only role is to
> allow the user to exit Emacs in a controlled way without losing work,
> I simply don't see any problem that could be caused by longjmping from
> an arbitrary stack allocation. After all, stack allocation is just
> assignment of value to a register, and sometimes grafting a range of
> memory pages into the memory set.
>
>>>> Failing that, we should allocate guard pages, unprotect the guard
>>>> pages on overflow
>>>
>>> That's what the OS is for. It would be wrong for us to start messing
>>> with page protection etc. The exception caused by stack overflow
>>> removes protection from the guard page to let you do something simple,
>>> like run the exception handler -- are you suggesting we catch the
>>> exception and mess with protection bits as well, i.e. replace one of
>>> the core functions of a modern OS? All that because what we have now
>>> is not elegant enough for us? Doesn't sound right to me.
>
>> We have a program that has its own Lisp runtime, has its own memory
>> allocation system, uses its own virtual filesystem access layer, and
>> that brings itself back from the dead.
>> We're well past replicating OS functionality.
>
> Actually, most of the above is simply untrue: we use system allocators
> to allocate memory

We have internal allocators for strings and conses and use the system
allocator only for backing storage.

> use mundane C APIs like 'open' and 'read' to
> access files

We must.

> , and if by "bringing itself from the dead" you allude to
> unexec, then what it does is a subset of what every linker does,
> hardly an OS stuff.

Granted, that's toolchain work, not "OS" work, but it's still outside the
domain of most text editors.

> I think we should strive to distance ourselves from the OS business,
> not the other way around. There was time when doing complex things
> sometimes required messing with low-level functionality like that, but
> that time is long passed. Allocating our own stack, setting up and
> managing our own guard pages and the related exceptions -- we
> shouldn't go back there.

If an OS provides a documented and supported facility, there's no shame
in using it. I'm not sure how worrying about whether that facility is
"OS business" is useful.

>> It's not a matter of elegance: it's a matter of correctness. The current
>> scheme is unsafe.
>
> Emacs is not safety-critical software. It doesn't need to be "safe"
> by your definition, if I understand it correctly.

It's not safety-critical software, but undefined behavior is undefined.
What makes us confident that we can't corrupt buffer data by longjmping
from the wrong place? Anything can happen because we can longjmp from
anywhere.

It's admirable to avoid the loss of user data, but I think there's a way
that's both safer and more general.
Instead of trying to catch stack overflow, let's treat stack overflow as
a normal fatal error and think about how we can preserve buffer contents
on fatal errors generally.

What if we just installed a SIGSEGV handler (or, on Windows, a vectored
exception handler) that wrote buffer contents to a special file on a
fatal signal, then allowed that fatal signal to propagate normally? The
next time Emacs starts, we can restore the buffers we've saved this way
and ask users to save them, just like autosave, but done on demand, at
crash time, in C code, on the alternate signal stack.

>>>> and call out_of_memory so that it's obvious Emacs is in a bad
>>>> state. This way, we don't have to longjmp out of arbitrary code
>>>> sequences.
>>>
>>> There's no problem longjmping out of arbitrary code sequences. When
>>> you debug a program, you do that all the time.
>>
>> In GDB, interrupting normal control flow is not part of standard
>> debugging practice.
>
> ??? Every time a debuggee hits a breakpoint, the normal control flow
> is interrupted, and you in effect have a huge longjmp -- from the
> debuggee to the debugger.

When a program hits a breakpoint, the OS sends it a signal. A debugger
that's ptraced its debuggee will receive that signal, suspend execution,
and give control to the user. If the user opts to continue execution,
the debugger restores the debuggee to the state it was in when it
received the signal, then allows it to resume execution. At no point
does the debugger force a debuggee to longjmp. Debuggers take pains to
make programs behave as if breakpoints weren't there at all. By
contrast, we don't try to resume execution at the point of a stack
overflow.
--4nqgO3QLi1RrVXuDsM8Kf5qWUcPkLxp8i--