From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
Date: Thu, 24 Dec 2015 10:06:15 -0800
Message-ID: <567C3417.5090303@dancol.org>
References: <83mvu1x6t3.fsf@gnu.org> <83r3iht93x.fsf@gnu.org>
 <838u4psznr.fsf@gnu.org> <56772054.8010401@cs.ucla.edu>
 <83zix4scgf.fsf@gnu.org> <5677DBC9.6030307@cs.ucla.edu>
 <83io3rst2r.fsf@gnu.org> <567841A6.4090408@cs.ucla.edu>
 <567844B9.2050308@dancol.org> <5678CD07.8080209@cs.ucla.edu>
 <5678D3AF.7030101@dancol.org> <5678D620.6070000@cs.ucla.edu>
 <83mvt2qxm1.fsf@gnu.org> <56797CD9.8010706@cs.ucla.edu>
 <8337uuqsux.fsf@gnu.org> <5679DC83.70405@cs.ucla.edu>
 <83oadhp2mj.fsf@gnu.org> <567AD556.6020202@cs.ucla.edu>
 <567AD766.3060608@dancol.org> <567B5DAB.2000900@cs.ucla.edu>
 <83fuyromig.fsf@gnu.org> <567C25B1.3020101@dancol.org>
 <831taboike.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs"
X-Trace: ger.gmane.org 1450980409 15684 80.91.229.3 (24 Dec 2015 18:06:49 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 24 Dec 2015 18:06:49 +0000 (UTC)
Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
To: Eli Zaretskii
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Dec 24 19:06:41 2015
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
 by plane.gmane.org with esmtp (Exim 4.69) (envelope-from )
 id 1aCAHf-0008UN-Sf for ged-emacs-devel@m.gmane.org;
 Thu, 24 Dec 2015 19:06:40 +0100
Original-Received: from localhost ([::1]:32946 helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1aCAHf-0001vk-7y for ged-emacs-devel@m.gmane.org;
 Thu, 24 Dec 2015 13:06:39 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:49999)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1aCAHQ-0001uo-BK for Emacs-devel@gnu.org;
 Thu, 24 Dec 2015 13:06:25 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned
 (Exim 4.71) (envelope-from ) id 1aCAHN-0005os-3E
 for Emacs-devel@gnu.org; Thu, 24 Dec 2015 13:06:24 -0500
Original-Received: from dancol.org ([2600:3c01::f03c:91ff:fedf:adf3]:46241)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1aCAHM-0005oL-O9; Thu, 24 Dec 2015 13:06:21 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=dancol.org; s=x;
 h=Content-Type:In-Reply-To:MIME-Version:Date:Message-ID:From:Cc:References:To:Subject;
 bh=NGTV1Xu4NRlRKt9CQYspI3i6MYYHNXQ/CwwD0ahxqn8=;
 b=hl0RwwNKriWseZ1nltyqMuSKM+U342wQMPtvpdiLpcCuwlE19J2sTaXl9vk8DXkcHo3UjuGOZRwZL0c/NeerT8kIx8HCIAiumu+xyLwLJXMhIWolKHjIpx+eV9rNArtKaHkyx5mt92cGJXo6dlU+Yz4HoPpvd3bAqE+gWOJw7zP3nsqNLkVEgc3DlhbFg3386RqHNU/3qbNd3TCLzoVnYa3l9SSNnBmQ1y7P5afKO3EdEJ6uWTHwwlUaQm5Pdh1gXncZwPqyXbCOGh5jkxz8205ql6PR0B+gwfp7QgrCsr39eM/eO2HayiD0iR/D999XPzYIEPd4UkV16u5+AIDz7g==;
Original-Received: from c-67-161-115-4.hsd1.wa.comcast.net ([67.161.115.4] helo=[192.168.1.210])
 by dancol.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.84) (envelope-from ) id 1aCAHL-0004PS-Io;
 Thu, 24 Dec 2015 10:06:19 -0800
X-Enigmail-Draft-Status: N1110
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
In-Reply-To: <831taboike.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2600:3c01::f03c:91ff:fedf:adf3
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:196782
Archived-At:

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit

On 12/24/2015 09:36 AM, Eli Zaretskii wrote:
>> Cc: Emacs-devel@gnu.org
>> From: Daniel Colascione
>> Date: Thu, 24 Dec 2015 09:04:49 -0800
>>
>> You'd prefer Emacs to lock up or corrupt data instead?
>
> Instead of crashing and corrupting data? What's the difference?
>
> Of course, if it would do that all the time, or even most of the time,
> we'd consider the solution a bad one, and remove it or look for ways
> of improving it. But we are not there; in most cases the recovery
> doesn't hang and doesn't corrupt any data.

How would we know? It's not as if we have telemetry from real users
that lets us quantitatively evaluate crash frequency. (Automatically
sending crash reports is something else we should do, although I
suspect that's going to be a very long discussion.)

In any case, I expect the undefined-behavior problem to be worse in a
modules-heavy system, since most of the Emacs core code is written to
use non-local control flow for error reporting already, and since it
uses the GC for resource cleanup. I expect module code to be written
in a style less tolerant of arbitrary non-local control flow.

>> Neither you nor Paul have addressed any of the alternatives to this
>> longjmp-from-anywhere behavior. You have not addressed the point that
>> Emacs can crash fatally in numerous ways having nothing to do with stack
>> overflow. You have not addressed the point that we already have robust
>> stack overflow protection at the Lisp level, and so don't need
>> additional workarounds at the C level. You have not even provided any
>> evidence that C-level stack overflow is a problem worth solving.
>
> I think we did address those, you just didn't like the responses, so
> you don't accept them as responses.

I have seen no evidence that C stack overflow is a real problem that
justifies the risks inherent in the current error handling scheme.

>> All I see is a insistence that we keep the longjmp hack stay because
>> "Emacs must not crash", even though it demonstrably does crash in
>> numerous exciting ways, and won't stop any time soon, because real
>> programs always have bugs, and experience shows that failing quickly
>> (trying to preserve data) is better than trying to limp along, because
>> that just makes the situation worse.
>
> Stack overflow recovery is an attempt to solve some of these crashes.
> Having it means that users will lose their work in a smaller number of
> use cases. So it's an improvement, even if a small one. I fail to
> see in it any cause for such excitement.

I've already outlined a scheme for preventing data loss in most fatal
crash instances, not just those arising from stack overflow.

>> I know the rebuttal to that last point is that the perfect shouldn't be
>> the enemy of the good: believe me, I've debugged enough crashes and
>> hangs caused by well-intentioned crash recovery code to know that
>> invoking undefined behavior to recover from a crash is far below "good"
>> on the scale of things you can do to improve program reliability.
>
> I believe you. Now please believe me and Paul who have slightly
> different experience and have come to slightly different conclusions.
>
>> 1) Using some mechanism (alloca will work, although OS-specific options
>> exist), make sure you have X MB of address space dedicated to the main
>> thread on startup. At this point, we cannot lose data, and failing to
>> obtain this address space is both unlikely and as harmful as failing to
>> obtain space for Emacs BSS.
>>
>> 2) Now we know the addresses of the top and bottom of the stack.
>>
>> 3) On each time Lisp calls into C, each time a module calls into the
>> Emacs core, and on each QUIT, subtract the current stack pointer from
>> the top of the stack. The result is a lower bound on the amount of stack
>> space available. This computation is very cheap: it's one load from
>> global storage or TLS and a subtract instruction.
>>
>> 4) If the amount of stack space available is less than some threshold,
>> say Y, signal a stack exhaustion error.
>>
>> 5) Require that C code (modules included) do not use more than Y MB of
>> stack space between QUITs or calls to the module API
>>
>> 6) Set Y to a reasonable figure like 4MB. Third-party libraries must
>> already be able to run in bounded stack space because they're usually
>> designed to run off the main thread, and on both Windows and POSIX
>> systems, non-main thread stacks are sized on thread startup and cannot grow.
>>
>> I have no idea why we would prefer the SIGSEGV trap approach to
>> the scheme I just outlined.
>
> Your scheme has disadvantages as well. Selecting a good value for Y
> is a hard problem. Choose too much, and you will risk aborting valid
> programs; choose too little, and you will overflow the stack. Making
> sure C doesn't use more than Y is also hard, especially for GC.

The GC stack use problem is a separate bug. The right fix there, I
think, is to use some data structure other than the C stack for
keeping track of the set of objects being marked. Other VMs don't tend
to have this problem: one common approach is to allocate managed
objects from a contiguous range of address space and use a bit vector
to remember all the object-start positions in this range. Then,
instead of recursively marking all objects, the GC can just linearly
scan from the start to the end of the heap, marking objects as it
goes. We can't do that because our backing store is malloc, not a
linear region we can annotate with a few bit vectors. We might be able
to use some kind of cursor into the now-mandatory mem_node tree.

In any case, the possibility of the C stack overflowing during GC
isn't relevant to this discussion, since that case isn't covered by
the current logic anyway.

> It
> sounds like just making the stack larger is a better and easier
> solution.

I'd be perfectly happy deleting the stack overflow code entirely and
increasing the declared stack size (on platforms where we ask for it).

> Threads make this even more complicated. At least on Windows, by
> default each thread gets the same amount of memory reserved for its
> stack as recorded by the linker in the program's header, i.e. 8MB in
> our case. So several threads can easily eat up a large portion of the
> program's address space, and then the actual amount of stack is much
> smaller than you might think.

We don't have to run Emacs on the main thread. We could, instead, with
minimal code changes, call CreateThread on startup, supplying a larger
stack size that applies only to that thread. Or we can let X=8MB and
Y=2MB (the system default).

I'm not clear on what you mean by "stack is smaller than you might
think": on both POSIX systems and on Windows, thread stacks are
address space reservations made at thread creation time. If we can't
fit another thread stack in the current address space, the failure
mode is thread creation failing, not thread stacks being undersized.

> So on balance, I don't see how your proposal is better.

I'm really not sure what's balancing the risk of data corruption and
lockups caused by the stack overflow code. Emacs got along fine for
decades before Dmitry added the stack overflow check late last year.
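
For concreteness, steps 2 to 4 of the scheme quoted above could look
roughly like the following C sketch. It is an illustration only, under
stated assumptions: every identifier is hypothetical (none of them
exists in Emacs), the stack is assumed to grow toward lower addresses,
and the caller is assumed to approximate the stack pointer with the
address of one of its own locals. `reserved` plays the role of X and
`threshold` the role of Y.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All identifiers below are hypothetical; none are existing Emacs names.  */

struct stack_guard
{
  uintptr_t limit;   /* lowest address of the reserved stack region (step 2) */
  size_t threshold;  /* Y: headroom demanded at every check (step 4) */
};

/* Steps 1-2: record the stack bounds once at startup.  BASE is an
   address near the start of the thread's stack and RESERVED is X.
   Assumes the stack grows toward lower addresses.  */
static inline struct stack_guard
stack_guard_make (uintptr_t base, size_t reserved, size_t threshold)
{
  assert (threshold <= reserved);
  assert (reserved <= base);
  struct stack_guard g = { base - reserved, threshold };
  return g;
}

/* Step 3: a lower bound on the bytes still available below SP.
   One load and one subtraction, plus a clamp at zero.  */
static inline size_t
stack_guard_headroom (const struct stack_guard *g, uintptr_t sp)
{
  return sp > g->limit ? (size_t) (sp - g->limit) : 0;
}

/* Step 4: nonzero when the caller should signal stack exhaustion.  */
static inline int
stack_guard_exhausted (const struct stack_guard *g, uintptr_t sp)
{
  return stack_guard_headroom (g, sp) < g->threshold;
}
```

A caller would build the guard once at startup, then at each Lisp-to-C
call, module API entry and QUIT pass in something like
`(uintptr_t) &some_local` and signal the error when
`stack_guard_exhausted` returns nonzero.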

--wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJWfDQXAAoJEN4WImmbpWBl6/wP/jQYI7oj3D6IjcVzY6QBY6HW
AkuvlRX1G7bEsddMbNybSoyhpz6bvHn3wP8K8gAFNrs2P8aCro9KR1a+P287EJDC
y81a6j2O33NgyHrQmOQDuloozWDAt1Tx+r1fnogILYs9A4wDj3zSWXvfrm3qjNS8
n5nITbaUJVad9o7C48jsJ2BoAUH4566vtldIE29qbGSM7BN+9DlyCwW4d/nRtyRG
pCRSD4AtjbbOoEYQZvJ+gVypoPTVg6Yojd5eBnMf1wd39KzgswROcwlJ5j6Rap8E
D7mBlIbXyPSw1CC5XbS3j7yPgxuLwAZZFGMeLgY5Mobqq/XOPq93BEl6G3iwj/MT
ByKUZgpt1tf1c0r+m78yxVmqi54ugyIIHhORIdjT0FKxC8hIMy+YPN8UVrBQLHNx
g/6toNL8THxuQibMW7nFEnp6+RRuu6KfuHM3kIs3LKLIm+1WZT1u/nEhps7EHkPu
0Qi7QlGdR7Qm36ByYwERYFkzKDde3sneBUcnn65YBABH1vRXwotDlBrVRm4kTLG5
MC1Xd2LmEirWf/q48Yq83KG56kJ6bbAZC9r6ll8J1i/JivJaMcmhoU5ilU1Hk/a/
94H+gBj57x4rvUMvGox9O2uAFxXgRZs/xi7g2rpMe5x5Y7ZwzJZvmzaL4fWVCFT+
hXL3jDrakW4K56R4Pwnw
=cpfz
-----END PGP SIGNATURE-----

--wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs--