From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
Date: Sun, 3 Jan 2016 12:47:42 -0800
Message-ID: <568988EE.3010205@dancol.org>
To: Eli Zaretskii, Paul Eggert, Emacs-devel@gnu.org

On 01/03/2016 12:25 PM, John Wiegley wrote:
>>>>>> Daniel Colascione writes:
>
>> In practice, the Lisp stack depth limits provide enough protection, and the
>> risk of data corruption is too great. The existing auto-save logic is good
>> enough for data recovery, especially if we run the sigsegv handler on the
>> alternate signal stack (which we can make as large as we want) when
>> possible.
>
> OK, I see we have two roads, and I see where your objection is coming from.
>
> You say, "In practice". Can you expound on your practical experience? I'm
> curious if there's a real experience you've had that leads to such a strong
> objection.

I hate to use arguments from experience, but you asked: I worked on
crash reporting for Windows Phone, and I do significant work for crash
reporting on Messenger and Facebook for Android. I've worked extensively
with Breakpad, ACRA, multiplexed unix signal handlers, crash
classification, and so on.

In my experience, attempts to recover from crashes have almost always
made problems worse: they obscure root causes of important bugs by
causing seemingly-impossible downstream crashes and data corruption.

It's not just a theoretical problem: I've spent lots of late nights
staring at stack traces, trying to figure out how a certain deadlock
could be possible, only to realize that the program had already crashed
--- or would have, if a seldom-tested bit of code hadn't checked for
NULL and returned without releasing a lock, causing a hang half an hour
later.
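
To make the lock scenario concrete, here is a minimal sketch in C,
assuming POSIX threads. Every name in it (cache_lock, cache_store_buggy,
cache_store_fixed, cache_lock_is_free) is hypothetical and comes from no
program mentioned in this thread; it only illustrates the pattern of a
defensive NULL check that returns while a lock is still held.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not Emacs code. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static char cache[64];

/* Buggy: the NULL check "recovers" from bad input but returns with
   cache_lock still held, so the failure shows up later as a hang in
   whichever caller next tries to take the lock. */
int cache_store_buggy(const char *src)
{
    pthread_mutex_lock(&cache_lock);
    if (src == NULL)
        return -1;                      /* lock is never released */
    strncpy(cache, src, sizeof cache - 1);
    cache[sizeof cache - 1] = '\0';
    pthread_mutex_unlock(&cache_lock);
    return 0;
}

/* Fixed: a single exit path, so every return releases the lock. */
int cache_store_fixed(const char *src)
{
    int rc = -1;
    pthread_mutex_lock(&cache_lock);
    if (src != NULL) {
        strncpy(cache, src, sizeof cache - 1);
        cache[sizeof cache - 1] = '\0';
        rc = 0;
    }
    pthread_mutex_unlock(&cache_lock);
    return rc;
}

/* Returns 1 if cache_lock is currently free, 0 if something holds it. */
int cache_lock_is_free(void)
{
    if (pthread_mutex_trylock(&cache_lock) != 0)
        return 0;
    pthread_mutex_unlock(&cache_lock);
    return 1;
}
```

After cache_store_buggy(NULL) returns, cache_lock_is_free() reports 0
and any later locker blocks, far from the line that caused the problem.
Letting the NULL dereference crash instead would have pointed straight
at the caller that passed the bad pointer.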

It's even worse with an SEH handler, which allows programmers writing
for Windows to do this:

  for (;;) {
    __try {
      DoSomething();
    } __except(1) {
      // LOL: silently ignore all stack overflow, NULL deref, etc.
    }
  }

The Emacs error recovery code is similar in spirit. Granted, it's not
silent, and we don't try to recover from *all* segfaults, but it's still
essentially ignoring a programming error and trying to continue.

It's because I've wasted so much time debugging these kinds of programs
that I strongly prefer failing fast when something goes wrong and relying
on automatic persistence mechanisms to preserve volatile data. I've seen
dozens of simple bugs (that could have been quickly fixed) turn into
monsters because someone tried to paper over them and keep a program
from crashing.

In the context of stack overflow in Emacs, if we're getting to this
code, it's because we made a mistake [1] in the C core. There's no user
interaction that should cause us to overflow the stack. I'd rather know
about that mistake and get a user back into a working Emacs as soon as
possible.

[1] The GC tracing thing is concerning, but Paul's post actually gives
me an idea for fixing it without completely redoing marking: we can
reserve GC stack at the same time we allocate lisp objects, say in 2MB
chunks, and then just switch stacks as we mark.

> Also, note that other cases of error recovery leading to undefined behavior
> exist in the wild: If a process uses too much memory, Linux's OOM killer will
> terminate arbitrary processes in an attempt to prevent system lockup. There
> are no guarantees that it will not kill something that leaves the system in an
> inconsistent or bad state, since the process it kills may have been in the
> middle of a critical process, and the author might not have written proper
> signal handlers.

Nit: unfortunately, it's not possible for normal processes to even
detect the Linux OOM killer's operation.
Death comes quickly via SIGKILL; if you want to recover, you need a
watchdog. You're right that under Linux, programs need to prepare for
the possibility that they might suddenly cease to exist. We're talking
about something different here, which is the possibility that a program
can *keep running*, but in a damaged and undefined state.

> I'm inclined to leave the stack overflow protection in until it bites us;
> because I know from personal evidence that having Emacs suddenly disappear
> DOES bite people. I'm less sure about "undefined behavior" that I haven't
> experienced yet...

I'm worried that it'll be hard to know if it bites us, particularly
since the problems I'm imagining are infrequent, unreproducible, and
carry no obvious signature that would show up in a user crash report.
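
As a rough illustration of the watchdog point above: since SIGKILL
cannot be caught, the only place to notice it is a separate process. The
sketch below assumes a POSIX system; run_supervised, struct
child_report, and the two demo_ functions are hypothetical names made up
for this example. Note that the parent can see that the child died from
SIGKILL, but not whether the OOM killer or something else sent it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical types and names; a sketch, not a production supervisor. */
enum child_fate { CHILD_EXITED, CHILD_SIGNALED, CHILD_UNKNOWN };

struct child_report {
    enum child_fate fate;
    int code;   /* exit status, or signal number if fate == CHILD_SIGNALED */
};

/* Run `work` in a child process and report how the child ended. */
struct child_report run_supervised(int (*work)(void))
{
    struct child_report report = { CHILD_UNKNOWN, 0 };
    pid_t pid = fork();
    if (pid < 0)
        return report;                  /* fork failed */
    if (pid == 0)
        _exit(work() & 0xff);           /* child: do the work, then exit */

    int status;
    pid_t got;
    do {
        got = waitpid(pid, &status, 0); /* parent: wait, retrying on EINTR */
    } while (got < 0 && errno == EINTR);
    if (got < 0)
        return report;

    if (WIFSIGNALED(status)) {
        report.fate = CHILD_SIGNALED;
        report.code = WTERMSIG(status); /* e.g. SIGKILL */
    } else if (WIFEXITED(status)) {
        report.fate = CHILD_EXITED;
        report.code = WEXITSTATUS(status);
    }
    return report;
}

/* Demo workloads: one exits cleanly, one dies the way an OOM kill looks. */
int demo_exit_ok(void) { return 0; }
int demo_self_kill(void) { kill(getpid(), SIGKILL); return 1; }
```

A real watchdog would restart the child or trigger recovery when it sees
CHILD_SIGNALED with SIGKILL. The child itself never gets a chance to
run any cleanup code.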