From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Daniel Colascione <dancol@dancol.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
Date: Thu, 24 Dec 2015 10:06:15 -0800
Message-ID: <567C3417.5090303@dancol.org>
References: <83mvu1x6t3.fsf@gnu.org>
	<CAArVCkQLgbeaU7X4Vczx50MhQrwToZFpQnChiaga1U3eUVAP1w@mail.gmail.com>
	<CAArVCkTitGtzNxJ6DKMWx8cA2+YyfMKA_er_SvNL9iyOhF+1nA@mail.gmail.com>
	<83r3iht93x.fsf@gnu.org>
	<CAArVCkRJd+O9gbWQBpOh8a9zgbWid8jfrfkSwKG9fEnF+y6bwQ@mail.gmail.com>
	<838u4psznr.fsf@gnu.org> <56772054.8010401@cs.ucla.edu>
	<83zix4scgf.fsf@gnu.org> <5677DBC9.6030307@cs.ucla.edu>
	<83io3rst2r.fsf@gnu.org> <567841A6.4090408@cs.ucla.edu>
	<567844B9.2050308@dancol.org> <5678CD07.8080209@cs.ucla.edu>
	<5678D3AF.7030101@dancol.org> <5678D620.6070000@cs.ucla.edu>
	<83mvt2qxm1.fsf@gnu.org> <56797CD9.8010706@cs.ucla.edu>
	<8337uuqsux.fsf@gnu.org> <5679DC83.70405@cs.ucla.edu>
	<83oadhp2mj.fsf@gnu.org>
	<CAArVCkSfvMqzgNaDh4XCcv-PwdTsmK5vrD_6bp1OmmBGAWOnYA@mail.gmail.com>
	<567AD556.6020202@cs.ucla.edu> <567AD766.3060608@dancol.org>
	<567B5DAB.2000900@cs.ucla.edu> <83fuyromig.fsf@gnu.org>
	<567C25B1.3020101@dancol.org> <831taboike.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature";
	boundary="wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs"
X-Trace: ger.gmane.org 1450980409 15684 80.91.229.3 (24 Dec 2015 18:06:49 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 24 Dec 2015 18:06:49 +0000 (UTC)
Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Dec 24 19:06:41 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1aCAHf-0008UN-Sf
	for ged-emacs-devel@m.gmane.org; Thu, 24 Dec 2015 19:06:40 +0100
Original-Received: from localhost ([::1]:32946 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1aCAHf-0001vk-7y
	for ged-emacs-devel@m.gmane.org; Thu, 24 Dec 2015 13:06:39 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:49999)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dancol@dancol.org>) id 1aCAHQ-0001uo-BK
	for Emacs-devel@gnu.org; Thu, 24 Dec 2015 13:06:25 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dancol@dancol.org>) id 1aCAHN-0005os-3E
	for Emacs-devel@gnu.org; Thu, 24 Dec 2015 13:06:24 -0500
Original-Received: from dancol.org ([2600:3c01::f03c:91ff:fedf:adf3]:46241)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dancol@dancol.org>)
	id 1aCAHM-0005oL-O9; Thu, 24 Dec 2015 13:06:21 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=dancol.org;
	s=x; 
	h=Content-Type:In-Reply-To:MIME-Version:Date:Message-ID:From:Cc:References:To:Subject;
	bh=NGTV1Xu4NRlRKt9CQYspI3i6MYYHNXQ/CwwD0ahxqn8=; 
	b=hl0RwwNKriWseZ1nltyqMuSKM+U342wQMPtvpdiLpcCuwlE19J2sTaXl9vk8DXkcHo3UjuGOZRwZL0c/NeerT8kIx8HCIAiumu+xyLwLJXMhIWolKHjIpx+eV9rNArtKaHkyx5mt92cGJXo6dlU+Yz4HoPpvd3bAqE+gWOJw7zP3nsqNLkVEgc3DlhbFg3386RqHNU/3qbNd3TCLzoVnYa3l9SSNnBmQ1y7P5afKO3EdEJ6uWTHwwlUaQm5Pdh1gXncZwPqyXbCOGh5jkxz8205ql6PR0B+gwfp7QgrCsr39eM/eO2HayiD0iR/D999XPzYIEPd4UkV16u5+AIDz7g==;
Original-Received: from c-67-161-115-4.hsd1.wa.comcast.net ([67.161.115.4]
	helo=[192.168.1.210])
	by dancol.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.84) (envelope-from <dancol@dancol.org>)
	id 1aCAHL-0004PS-Io; Thu, 24 Dec 2015 10:06:19 -0800
X-Enigmail-Draft-Status: N1110
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.4.0
In-Reply-To: <831taboike.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2600:3c01::f03c:91ff:fedf:adf3
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:196782
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/196782>

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 12/24/2015 09:36 AM, Eli Zaretskii wrote:
>> Cc: Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Thu, 24 Dec 2015 09:04:49 -0800
>>
>> You'd prefer Emacs to lock up or corrupt data instead?
>
> Instead of crashing and corrupting data?  What's the difference?
>
> Of course, if it would do that all the time, or even most of the time,
> we'd consider the solution a bad one, and remove it or look for ways
> of improving it.  But we are not there; in most cases the recovery
> doesn't hang and doesn't corrupt any data.

How would we know? It's not as if we have telemetry from real users that
lets us quantitatively evaluate crash frequency. (Automatically sending
crash reports is something else we should do, although I suspect that's
going to be a very long discussion.)

In any case, I expect the undefined-behavior problem to be worse in a
modules-heavy system, since most of the Emacs core code is written to
use non-local control flow for error reporting already, and since it
uses the GC for resource cleanup. I expect module code to be written in
a style less tolerant of arbitrary non-local control flow.

>> Neither you nor Paul have addressed any of the alternatives to this
>> longjmp-from-anywhere behavior. You have not addressed the point that
>> Emacs can crash fatally in numerous ways having nothing to do with stack
>> overflow. You have not addressed the point that we already have robust
>> stack overflow protection at the Lisp level, and so don't need
>> additional workarounds at the C level. You have not even provided any
>> evidence that C-level stack overflow is a problem worth solving.
>
> I think we did address those, you just didn't like the responses, so
> you don't accept them as responses.

I have seen no evidence that C stack overflow is a real problem that
justifies the risks inherent in the current error handling scheme.

>> All I see is an insistence that the longjmp hack stay because
>> "Emacs must not crash", even though it demonstrably does crash in
>> numerous exciting ways, and won't stop any time soon, because real
>> programs always have bugs, and experience shows that failing quickly
>> (trying to preserve data) is better than trying to limp along, because
>> that just makes the situation worse.
>
> Stack overflow recovery is an attempt to solve some of these crashes.
> Having it means that users will lose their work in a smaller number of
> use cases.  So it's an improvement, even if a small one.  I fail to
> see in it any cause for such excitement.

I've already outlined a scheme for preventing data loss in most fatal
crash instances, not just those arising from stack overflow.

>> I know the rebuttal to that last point is that the perfect shouldn't be
>> the enemy of the good: believe me, I've debugged enough crashes and
>> hangs caused by well-intentioned crash recovery code to know that
>> invoking undefined behavior to recover from a crash is far below "good"
>> on the scale of things you can do to improve program reliability.
>
> I believe you.  Now please believe me and Paul who have slightly
> different experience and have come to slightly different conclusions.
>
>> 1) Using some mechanism (alloca will work, although OS-specific options
>> exist), make sure you have X MB of address space dedicated to the main
>> thread on startup. At this point, we cannot lose data, and failing to
>> obtain this address space is both unlikely and as harmful as failing to
>> obtain space for Emacs BSS.
>>
>> 2) Now we know the addresses of the top and bottom of the stack.
>>
>> 3) Each time Lisp calls into C, each time a module calls into the
>> Emacs core, and on each QUIT, subtract the current stack pointer from
>> the top of the stack. The result is a lower bound on the amount of stack
>> space available. This computation is very cheap: it's one load from
>> global storage or TLS and a subtract instruction.
>>
>> 4) If the amount of stack space available is less than some threshold,
>> say Y, signal a stack exhaustion error.
>>
>> 5) Require that C code (modules included) do not use more than Y MB of
>> stack space between QUITs or calls to the module API
>>
>> 6) Set Y to a reasonable figure like 4MB. Third-party libraries must
>> already be able to run in bounded stack space because they're usually
>> designed to run off the main thread, and on both Windows and POSIX
>> systems, non-main thread stacks are sized on thread startup and cannot grow.
>>
>> I have no idea why we would prefer the SIGSEGV trap approach to
>> the scheme I just outlined.
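
As a rough illustration of steps 2 through 4, here is a minimal sketch
in C. Every name in it (stack_check_init, stack_check, and so on) is
hypothetical and does not exist in Emacs. The 8MB reservation and 2MB
margin are placeholder figures, the address of a local variable stands
in for the stack pointer, and the stack is assumed to grow toward lower
addresses. Step 1, actually reserving the address space, is assumed to
have happened already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in Emacs.  */

/* Assumed size X of the main thread's stack reservation (step 1).  */
#define STACK_RESERVED_BYTES ((size_t) 8 * 1024 * 1024)
/* Assumed margin Y that C code may use between checks (steps 4-6).  */
#define STACK_MARGIN_BYTES ((size_t) 2 * 1024 * 1024)

/* Address near the start of the stack, recorded at startup (step 2).  */
static uintptr_t stack_base;

/* Approximate the stack pointer with the address of a local.  */
static uintptr_t
approx_sp (void)
{
  volatile char marker = 0;
  uintptr_t addr = (uintptr_t) &marker;
  return addr;
}

/* Bytes left before the used stack reaches RESERVED; 0 if it already
   has.  Assumes the stack grows toward lower addresses.  */
static size_t
stack_headroom (uintptr_t base, uintptr_t sp, size_t reserved)
{
  size_t used = base > sp ? (size_t) (base - sp) : 0;
  return used < reserved ? reserved - used : 0;
}

/* Step 4: nonzero means the caller should signal a stack-exhaustion
   error instead of going deeper.  */
static int
stack_nearly_exhausted (uintptr_t base, uintptr_t sp)
{
  return stack_headroom (base, sp, STACK_RESERVED_BYTES) < STACK_MARGIN_BYTES;
}

/* Call once, as early as possible in the main thread.  */
static void
stack_check_init (void)
{
  stack_base = approx_sp ();
}

/* Step 3: call on each Lisp-to-C entry, module API entry, and QUIT.  */
static int
stack_check (void)
{
  assert (stack_base != 0);	/* stack_check_init must run first */
  return stack_nearly_exhausted (stack_base, approx_sp ());
}
```

The check itself is one load of stack_base, one subtraction, and one
comparison, which is the cost claimed in step 3.
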
>
> Your scheme has disadvantages as well.  Selecting a good value for Y
> is a hard problem.  Choose too much, and you will risk aborting valid
> programs; choose too little, and you will overflow the stack.  Making
> sure C doesn't use more than Y is also hard, especially for GC.

The GC stack use problem is a separate bug. The right fix there, I
think, is to use some data structure other than the C stack for keeping
track of the set of objects being marked.
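
To make that concrete, here is a minimal sketch of marking with an
explicit worklist. The object layout and the name mark_from are
hypothetical and much simpler than real Lisp objects; the point is only
that the pending set lives in heap memory, so C stack use stays constant
however deep the object graph is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object layout, for illustration only.  */
struct obj
{
  bool marked;
  size_t nchildren;
  struct obj **children;
};

/* Mark everything reachable from ROOT using a heap-allocated worklist
   instead of recursion.  Returns false if the worklist cannot grow.  */
static bool
mark_from (struct obj *root)
{
  size_t cap = 16, len = 0;
  struct obj **work = malloc (cap * sizeof *work);
  if (!work)
    return false;
  if (root && !root->marked)
    {
      root->marked = true;
      work[len++] = root;
    }
  while (len > 0)
    {
      struct obj *o = work[--len];
      assert (o->nchildren == 0 || o->children);
      for (size_t i = 0; i < o->nchildren; i++)
        {
          struct obj *c = o->children[i];
          if (!c || c->marked)
            continue;
          c->marked = true;	/* mark on push: each object is queued once */
          if (len == cap)
            {
              struct obj **grown = realloc (work, 2 * cap * sizeof *work);
              if (!grown)
                {
                  free (work);
                  return false;
                }
              work = grown;
              cap *= 2;
            }
          work[len++] = c;
        }
    }
  free (work);
  return true;
}
```

A real collector could not simply give up when the worklist fails to
grow; it would need some fallback, which this sketch leaves out.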

Other VMs don't tend to have this problem: one common approach is to
allocate managed objects from a contiguous range of address space and
use a bit vector to remember all the object-start positions in this
range. Then, instead of recursively marking all objects, the GC can just
linearly scan from the start to the end of the heap, marking objects as
it goes. We can't do that because our backing store is malloc, not a
linear region we can annotate with a few bit vectors.
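
For readers unfamiliar with the technique, here is a toy sketch of an
object-start bit vector over a contiguous heap. It shows only the
bookkeeping and the in-address-order walk, not a full collector. The
names, the 16-byte granule, and the heap size are all assumptions made
for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define GRANULE 16			/* assumed allocation alignment */
#define HEAP_BYTES (1024 * GRANULE)	/* toy heap size */
#define NGRANULES (HEAP_BYTES / GRANULE)
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)

/* One bit per granule; a set bit means "an object starts here".  */
static unsigned long start_bits[(NGRANULES + WORD_BITS - 1) / WORD_BITS];

/* Record that an object begins at byte OFFSET within the heap.  */
static void
note_object_start (size_t offset)
{
  assert (offset % GRANULE == 0 && offset < HEAP_BYTES);
  size_t g = offset / GRANULE;
  start_bits[g / WORD_BITS] |= 1UL << (g % WORD_BITS);
}

/* Nonzero if granule G holds the start of an object.  */
static int
is_object_start (size_t g)
{
  return (start_bits[g / WORD_BITS] >> (g % WORD_BITS)) & 1;
}

/* Return the byte offset of the first object start at or after FROM,
   or HEAP_BYTES if there is none.  Walking the heap from 0 with this
   function visits every object in address order without recursion.  */
static size_t
next_object_start (size_t from)
{
  for (size_t g = (from + GRANULE - 1) / GRANULE; g < NGRANULES; g++)
    if (is_object_start (g))
      return g * GRANULE;
  return HEAP_BYTES;
}
```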

We might be able to use some kind of cursor into the now-mandatory
mem_node tree.

In any case, the possibility of the C stack overflowing during GC isn't
relevant to this discussion, since that case isn't covered by the current
logic anyway.

> It
> sounds like just making the stack larger is a better and easier
> solution.

I'd be perfectly happy deleting the stack overflow code entirely and
increasing the declared stack size (on platforms where we ask for it).

> Threads make this even more complicated.  At least on Windows, by
> default each thread gets the same amount of memory reserved for its
> stack as recorded by the linker in the program's header, i.e. 8MB in
> our case.  So several threads can easily eat up a large portion of the
> program's address space, and then the actual amount of stack is much
> smaller than you might think.

We don't have to run Emacs on the main thread. We could, instead, with
minimal code changes, call CreateThread on startup, supplying a larger
stack size that applies only to that thread. Or we can let X=8MB and
Y=2MB (the system default).

I'm not clear on what you mean by "stack is smaller than you might
think": on both POSIX systems and on Windows, thread stacks are address
space reservations made at thread creation time. If we can't fit another
thread stack in the current address space, the failure mode is thread
creation failing, not thread stacks being undersized.
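
Both points can be sketched with POSIX threads, used here as a stand-in
for the CreateThread call mentioned above. This is a minimal sketch, not
a proposed patch: run_on_big_stack, deep_frame, and the sizes involved
are hypothetical. The stack size is fixed when the thread is created,
and if the reservation cannot be made, the caller gets an error code
from thread creation rather than a thread with a short stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the real program entry point.  */
typedef int (*entry_fn) (void *);

struct thunk
{
  entry_fn fn;
  void *arg;
  int result;
};

static void *
trampoline (void *p)
{
  struct thunk *t = p;
  t->result = t->fn (t->arg);
  return NULL;
}

/* Run FN on a new thread whose stack is STACK_BYTES large and wait for
   it.  Returns 0 and stores FN's result in *RESULT on success;
   otherwise returns the pthread error code and FN's result is lost.  */
static int
run_on_big_stack (entry_fn fn, void *arg, size_t stack_bytes, int *result)
{
  pthread_attr_t attr;
  pthread_t tid;
  struct thunk t = { fn, arg, 0 };

  assert (fn && result);
  int err = pthread_attr_init (&attr);
  if (err)
    return err;
  err = pthread_attr_setstacksize (&attr, stack_bytes);
  if (!err)
    err = pthread_create (&tid, &attr, trampoline, &t);
  pthread_attr_destroy (&attr);
  if (err)
    return err;			/* creation failed; nothing ran */
  err = pthread_join (tid, NULL);
  if (!err)
    *result = t.result;
  return err;
}

/* Example payload: touches both ends of a 16 MiB local buffer, which
   needs far more stack than FN would get from a small default.  */
static int
deep_frame (void *arg)
{
  enum { BIG = 16 * 1024 * 1024 };
  volatile char buf[BIG];
  (void) arg;
  buf[0] = 1;
  buf[BIG - 1] = 2;
  return buf[0] + buf[BIG - 1];
}
```

Calling run_on_big_stack (deep_frame, NULL, 32 * 1024 * 1024, &r) runs
the payload with a 32 MiB stack that applies to that thread only.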

> So on balance, I don't see how your proposal is better.

I'm really not sure what's balancing the risk of data corruption and
lockups caused by the stack overflow code. Emacs got along fine for
decades before Dmitry added the stack overflow check late last year.


--wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJWfDQXAAoJEN4WImmbpWBl6/wP/jQYI7oj3D6IjcVzY6QBY6HW
AkuvlRX1G7bEsddMbNybSoyhpz6bvHn3wP8K8gAFNrs2P8aCro9KR1a+P287EJDC
y81a6j2O33NgyHrQmOQDuloozWDAt1Tx+r1fnogILYs9A4wDj3zSWXvfrm3qjNS8
n5nITbaUJVad9o7C48jsJ2BoAUH4566vtldIE29qbGSM7BN+9DlyCwW4d/nRtyRG
pCRSD4AtjbbOoEYQZvJ+gVypoPTVg6Yojd5eBnMf1wd39KzgswROcwlJ5j6Rap8E
D7mBlIbXyPSw1CC5XbS3j7yPgxuLwAZZFGMeLgY5Mobqq/XOPq93BEl6G3iwj/MT
ByKUZgpt1tf1c0r+m78yxVmqi54ugyIIHhORIdjT0FKxC8hIMy+YPN8UVrBQLHNx
g/6toNL8THxuQibMW7nFEnp6+RRuu6KfuHM3kIs3LKLIm+1WZT1u/nEhps7EHkPu
0Qi7QlGdR7Qm36ByYwERYFkzKDde3sneBUcnn65YBABH1vRXwotDlBrVRm4kTLG5
MC1Xd2LmEirWf/q48Yq83KG56kJ6bbAZC9r6ll8J1i/JivJaMcmhoU5ilU1Hk/a/
94H+gBj57x4rvUMvGox9O2uAFxXgRZs/xi7g2rpMe5x5Y7ZwzJZvmzaL4fWVCFT+
hXL3jDrakW4K56R4Pwnw
=cpfz
-----END PGP SIGNATURE-----

--wgIV25kl52HqshIiuCmc2mV8nsqwSf4gs--