From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Daniel Colascione <dancol@dancol.org>
Newsgroups: gmane.emacs.devel
Subject: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS
	etc.)
Date: Wed, 23 Dec 2015 08:25:51 -0800
Message-ID: <567ACB0F.9060804@dancol.org>
References: <83mvu1x6t3.fsf@gnu.org> <565779CD.80405@cs.ucla.edu>
	<83io4nuc68.fsf@gnu.org>
	<CAArVCkRCdRFkeW7Bzy8dCUJxgCfMAFMt6RGvpn-Pm-ZBsybSkA@mail.gmail.com>
	<CAArVCkTEeOPLL=w0q06ZhVyYkaArqsCMcYP=EavfjE5cGVv5_w@mail.gmail.com>
	<CAArVCkQLgbeaU7X4Vczx50MhQrwToZFpQnChiaga1U3eUVAP1w@mail.gmail.com>
	<CAArVCkTitGtzNxJ6DKMWx8cA2+YyfMKA_er_SvNL9iyOhF+1nA@mail.gmail.com>
	<83r3iht93x.fsf@gnu.org>
	<CAArVCkRJd+O9gbWQBpOh8a9zgbWid8jfrfkSwKG9fEnF+y6bwQ@mail.gmail.com>
	<838u4psznr.fsf@gnu.org> <56772054.8010401@cs.ucla.edu>
	<83zix4scgf.fsf@gnu.org> <5677DBC9.6030307@cs.ucla.edu>
	<83io3rst2r.fsf@gnu.org> <567841A6.4090408@cs.ucla.edu>
	<567844B9.2050308@dancol.org> <5678CD07.8080209@cs.ucla.edu>
	<5678D3AF.7030101@dancol.org> <83oadiqxq1.fsf@gnu.org>
	<5679B33E.9000804@dancol.org> <83y4cmp5y5.fsf@gnu.org>
	<5679B7F5.9030504@dancol.org> <83twnap4xa.fsf@gnu.org>
	<5679BE1D.5070903@dancol.org> <83poxxp2rl.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature";
	boundary="4nqgO3QLi1RrVXuDsM8Kf5qWUcPkLxp8i"
X-Trace: ger.gmane.org 1450887970 11863 80.91.229.3 (23 Dec 2015 16:26:10 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Wed, 23 Dec 2015 16:26:10 +0000 (UTC)
Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, eggert@cs.ucla.edu,
	tzz@lifelogs.com, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Dec 23 17:26:09 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1aBmEp-0007Wi-CC
	for ged-emacs-devel@m.gmane.org; Wed, 23 Dec 2015 17:26:07 +0100
Original-Received: from localhost ([::1]:56797 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1aBmEo-00061i-N2
	for ged-emacs-devel@m.gmane.org; Wed, 23 Dec 2015 11:26:06 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:32857)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dancol@dancol.org>) id 1aBmEk-00061b-7a
	for emacs-devel@gnu.org; Wed, 23 Dec 2015 11:26:03 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dancol@dancol.org>) id 1aBmEh-0003rh-5g
	for emacs-devel@gnu.org; Wed, 23 Dec 2015 11:26:02 -0500
Original-Received: from dancol.org ([2600:3c01::f03c:91ff:fedf:adf3]:38000)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dancol@dancol.org>)
	id 1aBmEg-0003r9-Nk; Wed, 23 Dec 2015 11:25:59 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=dancol.org;
	s=x; 
	h=Content-Type:In-Reply-To:MIME-Version:Date:Message-ID:From:Cc:References:To:Subject;
	bh=CYfnCg+INmJ4yA/6gxF2pdRyInd9oWzvZLZxYsbqmPs=; 
	b=gtrnED1AXM9OOzJICSihlCWhLZC8QwATwpAC+u+lupMf/qU+lF385trUKooAE6rCttlhcqT9jgi8YNNa3g+Sko95Dq03dsd6cGDK7vMHWJyMKIYl7/h2XAic4b7dRq0/0V6ZHU3M1NUGh/ibShvRh5AYo/A5hfFWzC++4yzUcv4Y4lAnjuzpkbL203VGTEXAvfB66M3my9Jw5fMo/i8q2Yv9l9pbBt9eBJrrLpUFZVehLaCsv22RVrel+BAXRMM1c6t/aLW9RW4DZCqGW2nAabqUNG+fXzU6caCPD1Jj3ZohAjrYQuF+2mBzLk32NqMPfQEZdpo4s8+9zWa9JKSR6g==;
Original-Received: from [2620:10d:c090:180::31bb]
	(helo=[IPv6:2620:10d:c081:1103:2ab2:bdff:fe1c:db58])
	by dancol.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.84) (envelope-from <dancol@dancol.org>)
	id 1aBmEf-0005ao-Mi; Wed, 23 Dec 2015 08:25:57 -0800
X-Enigmail-Draft-Status: N1110
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.4.0
In-Reply-To: <83poxxp2rl.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2600:3c01::f03c:91ff:fedf:adf3
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:196714
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/196714>

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--4nqgO3QLi1RrVXuDsM8Kf5qWUcPkLxp8i
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 12/23/2015 08:07 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>>  p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Tue, 22 Dec 2015 13:18:21 -0800
>>
>>>> Which is why you setjmp in places where you have a significant stack
>>>> reserve.
>>>
>>> There's no way of doing that portably, or even non-portably on many
>>> platforms.  You simply don't _know_ how much stack is left.
>>
>> You can probe at program start and pre-allocate as much as is reasonable.
>
> Pre-allocate what?  Are you suggesting that Emacs allocates its own
> stack, instead of relying on the one provided by the linker and the
> OS?

We can alloca, say, 8MB, and write to the start and end of the allocated
region. Then we'll know we have at least that much stack space available.
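
A minimal sketch of such a probe, under some assumptions: probe_stack
and PROBE_PAGE are hypothetical names, <alloca.h> is the glibc header,
and the noinline attribute is GCC/Clang-specific. It touches every page
rather than only the two ends, which is a more conservative variation
on the idea. If the stack is too small, the probe itself faults, so it
only makes sense at startup, before there is any user data to lose.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>

#define PROBE_PAGE 4096   /* assumed page size; query it at runtime in real code */

/* Reserve `bytes` of stack with alloca and write one byte per page,
   starting at the high end of the region and working down.  Returns
   the number of pages touched in the loop.  The function is kept out
   of line so the region is released again when it returns. */
__attribute__((noinline))
size_t probe_stack(size_t bytes)
{
    assert(bytes >= PROBE_PAGE);

    volatile char *region = alloca(bytes);
    size_t touched = 0;

    for (size_t off = bytes; off >= PROBE_PAGE; off -= PROBE_PAGE) {
        region[off - 1] = 0;   /* fault the page in now, not later */
        touched++;
    }
    region[0] = 0;             /* make sure the low end is covered too */

    return touched;
}
```

Calling probe_stack(8u << 20) early in main would correspond to the 8MB
figure above, provided the platform's stack limit allows it.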


>>>> Longjmp, by itself, is simple and clear. What's unreliable is longjmping
>>>> to Lisp at completely arbitrary points in the program, even ones marked
>>>> "GC can't happen here" and the like.
>>>
>>> We longjmp to a particular place, not arbitrary place.
>>
>> But we longjmp _from_ anywhere, and "anywhere" might be in the middle of
>> any delicate code sequence, since the compiler can generate code to
>> write to new stack slots at any point.
>
> I simply don't see any trouble this could cause, except leaking some
> memory.  Can you describe in enough detail a single use case where
> this could have any other adverse effects that we should care about
> when recovering from stack overflow?

What happens if we overflow inside malloc? One possibility is that we'll
longjmp back to toplevel without releasing the heap lock, then deadlock
the next time we try to allocate.
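
To make the failure mode concrete, here is a toy model, not the real
malloc: heap_lock, toplevel, and both function names are hypothetical,
and a C11 atomic_flag stands in for the allocator's lock so that the
second acquisition reports the problem instead of hanging.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag heap_lock = ATOMIC_FLAG_INIT;  /* stand-in for the heap lock */
static jmp_buf toplevel;

/* Stand-in for an allocator call that overflows the stack while it
   holds the lock.  The longjmp models the overflow handler. */
static void toy_alloc_that_overflows(void)
{
    bool was_held = atomic_flag_test_and_set(&heap_lock);
    assert(!was_held);          /* we acquired the lock */

    longjmp(toplevel, 1);       /* "recover" without ever unlocking */
    /* not reached: the matching atomic_flag_clear(&heap_lock) */
}

/* Returns true if the lock is still held once we are back at top level. */
bool lock_leaked_by_longjmp(void)
{
    if (setjmp(toplevel) == 0)
        toy_alloc_that_overflows();

    /* A blocking lock would deadlock here.  test_and_set only tells us
       that the flag was already set. */
    return atomic_flag_test_and_set(&heap_lock);
}
```

With a real mutex in place of the flag, the next allocation after
"recovery" would block forever.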

>>>> You say Emacs shouldn't crash.  Fine. We can't make that guarantee
>>>> if the crash recovery code breaks program invariants.
>>>
>>> Crash recovery doesn't need to keep invariants.  Or maybe I
>>> misunderstand what invariants do you have in mind.
>>
>> Any stack allocation anywhere in the program can longjmp. It's
>> impossible to reason about safety in that situation.
>
> Emacs is not safety-critical software, so there's no requirement to
> reason about safety.  Since I think the recovery's only role is to
> allow the user to exit Emacs in a controlled way without losing work,
> I simply don't see any problem that could be caused by longjmping from
> an arbitrary stack allocation.  After all, stack allocation is just
> assignment of value to a register, and sometimes grafting a range of
> memory pages into the memory set.
>
>>>> Failing that, we should allocate guard pages, unprotect the guard
>>>> pages on overflow
>>>
>>> That's what the OS is for.  It would be wrong for us to start messing
>>> with page protection etc.  The exception caused by stack overflow
>>> removes protection from the guard page to let you do something simple,
>>> like run the exception handler -- are you suggesting we catch the
>>> exception and mess with protection bits as well, i.e. replace one of
>>> the core functions of a modern OS?  All that because what we have now
>>> is not elegant enough for us?  Doesn't sound right to me.
>
>> We have a program that has its own Lisp runtime, has its own memory
>> allocation system, uses its own virtual filesystem access layer, and
>> that brings itself back from the dead. We're well past replicating OS
>> functionality.
>
> Actually, most of the above is simply untrue: we use system allocators
> to allocate memory

We have internal allocators for strings and conses and use the system
allocator only for backing storage.

> use mundane C APIs like 'open' and 'read' to
> access files

We must.

> , and if by "bringing itself from the dead" you allude to
> unexec, then what it does is a subset of what every linker does,
> hardly an OS stuff.

Granted, that's toolchain work, not "OS" work, but it's still outside
the domain of most text editors.

> I think we should strive to distance ourselves from the OS business,
> not the other way around.  There was time when doing complex things
> sometimes required messing with low-level functionality like that, but
> that time is long passed.  Allocating our own stack, setting up and
> managing our own guard pages and the related exceptions -- we
> shouldn't go back there.

If an OS provides a documented and supported facility, there's no shame
in using it. I'm not sure how worrying about whether that facility is
"OS business" is useful.

>> It's not a matter of elegance: it's a matter of correctness. The current
>> scheme is unsafe.
>
> Emacs is not safety-critical software.  It doesn't need to be "safe"
> by your definition, if I understand it correctly.

It's not safety-critical software, but undefined behavior is undefined.
What makes us confident that we can't corrupt buffer data by longjmping
from the wrong place? Anything can happen because we can longjmp from
anywhere.

It's admirable to avoid the loss of user data, but I think there's a way
that's both safer and more general. Instead of trying to catch stack
overflow, let's treat stack overflow as a normal fatal error and instead
think about how we can preserve buffer contents on fatal errors generally.

What if we just installed a SIGSEGV handler (or, on Windows, a vectored
exception handler) that wrote buffer contents to a special file on a
fatal signal, then allowed that fatal signal to propagate normally? The
next time Emacs starts, we can restore the buffers we've saved this way
and ask users to save them --- just like autosave, but done on-demand,
at crash time, in C code, on the alternate signal stack.
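
A rough POSIX-only sketch of that idea follows. Every name in it
(install_crash_saver, crash_demo, the single flat text buffer) is
hypothetical, and real Emacs buffers are not one contiguous string, so
this shows only the shape of the mechanism. The handler runs on an
alternate stack, uses only async-signal-safe calls, and re-raises the
signal so the process still dies the normal way.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char alt_stack[64 * 1024];        /* handler stack, usable after overflow */
static const char *save_path, *save_data;
static size_t save_len;

static void crash_handler(int sig)
{
    int fd = open(save_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        size_t done = 0;
        while (done < save_len) {
            ssize_t n = write(fd, save_data + done, save_len - done);
            if (n <= 0)
                break;
            done += (size_t) n;
        }
        close(fd);
    }
    raise(sig);   /* SA_RESETHAND restored the default action: die normally */
}

int install_crash_saver(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);
    save_path = path;
    save_data = data;
    save_len = len;

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sa.sa_flags = SA_ONSTACK | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demo: crash a child process and check from the parent that the
   signal still killed it and that the text reached the file. */
int crash_demo(const char *path)
{
    static const char text[] = "unsaved text\n";
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_crash_saver(path, text, sizeof text - 1) != 0)
            _exit(2);
        raise(SIGSEGV);          /* stands in for a real fault */
        _exit(3);                /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGSEGV)
        return 0;

    char buf[32] = {0};
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    return n == (ssize_t) (sizeof text - 1) && memcmp(buf, text, sizeof text - 1) == 0;
}
```

Nothing here longjmps anywhere: the handler saves what it can and the
crash proceeds as it would have without the handler.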

>>>> and call out_of_memory so that it's obvious Emacs is in a bad
>>>> state. This way, we don't have to longjmp out of arbitrary code
>>>> sequences.
>>>
>>> There's no problem longjmping out of arbitrary code sequences.  When
>>> you debug a program, you do that all the time.
>>
>> In GDB, interrupting normal control flow is not part of standard
>> debugging practice.
>
> ??? Every time a debuggee hits a breakpoint, the normal control flow
> is interrupted, and you in effect have a huge longjmp -- from the
> debuggee to the debugger.

When a program hits a breakpoint, the OS sends it a signal. A debugger
that's ptraced its debuggee will receive that signal, suspend execution,
and give control to the user. If the user opts to continue execution,
the debugger restores the debuggee to the state it was in when it
received the signal, then allows it to resume execution.

At no point does the debugger force a debuggee to longjmp. Debuggers take
pains to make programs behave as if breakpoints weren't there at all. We
don't try to resume execution at the point of a stack overflow.
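
The stop-and-resume cycle can be sketched on Linux with ptrace. This is
an illustration only: trace_and_resume is a hypothetical name, and a
raised SIGTRAP stands in for a real breakpoint instruction. The point
is that the tracee continues from exactly where it stopped.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a tracee, let it stop on SIGTRAP, resume it, and return its
   exit status (or -1 if it never stopped or did not exit normally). */
int trace_and_resume(void)
{
    pid_t pid = fork();
    assert(pid >= 0);            /* sketch: no real error handling */

    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGTRAP);          /* stand-in for hitting a breakpoint */
        _exit(42);               /* runs only if we resume right here */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;

    /* Continue the tracee and suppress the signal, so it carries on as
       if the stop had never happened. */
    ptrace(PTRACE_CONT, pid, NULL, NULL);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The tracee's control flow is paused and resumed, never redirected,
which is the opposite of what a longjmp out of an overflow handler does.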


--4nqgO3QLi1RrVXuDsM8Kf5qWUcPkLxp8i
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJWessPAAoJEN4WImmbpWBlFZAP/if93/8CaJgcMVJlsYUuWSIq
B0xDH34N32V61qojmOqBNnohteUZHSvPmXcgEoszxB7hArUFc1yQwZBsDlcrtFkh
jZ/s8HYFfg32AdnhMETNb/wba3q2RTHJ1oGqueUZUEhcEBQhaZzcZZbuhwqvVspY
omBlkQHJk1Or7bkfn4a5rvsf8ixnESc/QKiKx+Y9d+4hKBNGtgzBbgI7lyRT6Kks
JbIZhadIB/yPEPugAW3dornVJmcLvZ1PZigHBlyVlKRYCjc38xd+Vjb4h+7K19+9
sO59tSkugk2RQ3nz6Q5BkJmFWqShInp93S+PPpncq8edbWD4ZzVa2MWOeAWs0Fof
fRXAeHk4FNd7iHB26MXIftADHepImce9s4IX7pcd/7+EGRts2ftL4pFduDh2Vahg
Jc9fTejBmwzfCNOBUo/VpxvspMo8pxf9suioEswBGQg9mhtuWqUBQzKImOjLpQFF
P+dbuxI3lVqm8fpp/IigUCObvOxhexYXf8EX84YH2OYJ4u6jUnqG76KYNDWzvDXi
5Brtkz3we4OWbM9rFh6dJskPkAo8ME4mxOMdoxMzYO+SHf60ML2UrZGERF7GooNB
T/x14MxpqPmj+Edr5uvLnT7XR5apV85ZqH/mCwUl9mbMltUZFR4VjJBcc5XhruvM
2Lh99BvQHAcmnRdzTghS
=knJY
-----END PGP SIGNATURE-----

--4nqgO3QLi1RrVXuDsM8Kf5qWUcPkLxp8i--