* Dynamic modules: MODULE_HANDLE_SIGNALS etc.
@ 2015-11-25 18:39 Eli Zaretskii
2015-11-25 18:50 ` Philipp Stephani
2015-11-26 21:29 ` Paul Eggert
0 siblings, 2 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-11-25 18:39 UTC
To: dancol, tzz, p.stephani2, aurelien.aptel+emacs; +Cc: emacs-devel
These macros need more extensive documentation wrt when each one is
needed and why.
I tried to "reverse-engineer" that information from its current usage,
and my best hypothesis is that they should be used if a function
calls some Emacs function that could potentially signal an error or
throw. For example, module_make_function calls list4, module_funcall
calls Ffuncall, module_copy_string_contents calls ENCODE_UTF_8, etc.
Is that correct?
If it is, then I have a few questions:
. Why don't some functions use any of these macros, although they do
call Emacs functions? Examples include module_make_integer (calls
make_number), and module_make_user_ptr (calls make_user_ptr).
. It seems like emacs-module.c assumes something about certain Emacs
functions, and based on that decides not to use these macros even
when calling those Emacs functions. For example, module_vec_get
calls ASIZE and AREF, but doesn't use the MODULE_HANDLE_* macros.
Evidently, it assumes that neither ASIZE nor AREF will ever signal
or throw. But isn't that a fragile assumption? The
implementation of internal Emacs functions is subject to change
without notice, and it would be a maintenance burden to have to
analyze upon each such change whether emacs-module.c needs some
augmentation.
. How to decide whether to use MODULE_HANDLE_SIGNALS or
MODULE_HANDLE_THROW (or both)? Again, it looks like the current
code simply assumes specific knowledge about certain Emacs
functions, knowledge which again can become outdated a year or a
month or a day from now.
So bottom line (again assuming my guesses above are correct), I'd
suggest to use these macros in all the emacs-module.c functions, and
in fact come up with a wrapper around calls to Emacs functions and
macros that will catch signals and throws, and make a point of calling
each such function/macro through that wrapper.
Comments?
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-25 18:39 Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
@ 2015-11-25 18:50 ` Philipp Stephani
2015-11-25 19:24 ` Eli Zaretskii
2015-11-26 21:29 ` Paul Eggert
1 sibling, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2015-11-25 18:50 UTC
To: Eli Zaretskii, dancol, tzz, aurelien.aptel+emacs; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Wed, 25 Nov 2015 at 19:39:
> These macros need more extensive documentation wrt when each one is
> needed and why.
>
> I tried to "reverse-engineer" that information from its current usage,
> and my best hypothesis is that they should be used if a function
> calls some Emacs function that could potentially signal an error or
> throw. For example, module_make_function calls list4, module_funcall
> calls Ffuncall, module_copy_string_contents calls ENCODE_UTF_8, etc.
>
> Is that correct?
>
> If it is, then I have a few questions:
>
> . Why don't some functions use any of these macros, although they do
> call Emacs functions? Examples include module_make_integer (calls
> make_number), and module_make_user_ptr (calls make_user_ptr).
>
> . It seems like emacs-module.c assumes something about certain Emacs
> functions, and based on that decides not to use these macros even
> when calling those Emacs functions. For example, module_vec_get
> calls ASIZE and AREF, but doesn't use the MODULE_HANDLE_* macros.
> Evidently, it assumes that neither ASIZE nor AREF will ever signal
> or throw. But isn't that a fragile assumption? The
> implementation of internal Emacs functions is subject to change
> without notice, and it would be a maintenance burden to have to
> analyze upon each such change whether emacs-module.c needs some
> augmentation.
> . How to decide whether to use MODULE_HANDLE_SIGNALS or
> MODULE_HANDLE_THROW (or both)? Again, it looks like the current
> code simply assumes specific knowledge about certain Emacs
> functions, knowledge which again can become outdated a year or a
> month or a day from now.
>
> So bottom line (again assuming my guesses above are correct), I'd
> suggest to use these macros in all the emacs-module.c functions,
Yes, your thinking is correct. I used these macros based on the current
implementation. If this is too brittle, then these macros should indeed be
added to all environment functions (i.e. those functions that get called
directly from module code).
I left them out purely for performance reasons: these macros both call
setjmp and possibly malloc, which can incur a significant penalty,
especially for environment functions that are very small otherwise (e.g. eq
or is_not_nil).
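To make the cost concrete, here is a minimal, self-contained sketch of the
general pattern being discussed: install a handler with setjmp, call code
that may exit non-locally, and return a sentinel value if it does. The names
active_handler, checked_divide and guarded_divide are hypothetical stand-ins
and not code from emacs-module.c; the real macros work against the Emacs
handler stack rather than a single global pointer.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Emacs handler stack: the innermost
   active handler, or NULL if none is installed.  */
static jmp_buf *active_handler = NULL;

/* Hypothetical stand-in for an Emacs function that may signal.  */
static int
checked_divide (int a, int b)
{
  if (b == 0)
    {
      if (active_handler == NULL)
        abort ();                       /* Uncaught "signal".  */
      longjmp (*active_handler, 1);     /* Non-local exit.  */
    }
  return a / b;
}

/* Analogue of an environment function: every call pays for a setjmp,
   even when nothing signals.  Returns ERROR_RETVAL on a non-local
   exit.  */
static int
guarded_divide (int a, int b, int error_retval)
{
  jmp_buf handler;
  jmp_buf *saved = active_handler;
  if (setjmp (handler) != 0)
    {
      /* We arrive here via longjmp from checked_divide.  */
      active_handler = saved;
      return error_retval;
    }
  active_handler = &handler;
  int result = checked_divide (a, b);
  active_handler = saved;               /* Pop before returning.  */
  return result;
}
```

In this sketch the setjmp and the handler bookkeeping run on every call,
which is the overhead that matters for otherwise tiny functions.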
> and
> in fact come up with a wrapper around calls to Emacs functions and
> macros that will catch signals and throws, and make a point of calling
> each such function/macro through that wrapper.
>
>
My first approach was to create such a wrapper function, but I quickly
realized that it's too much of a hassle. Existing wrapper functions are
e.g. internal_condition_case in eval.c, but you'd need one wrapper function
for each signature, which quickly results in excessive code duplication.
Therefore I used the macros.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-25 18:50 ` Philipp Stephani
@ 2015-11-25 19:24 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-11-25 19:24 UTC
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 25 Nov 2015 18:50:05 +0000
> Cc: emacs-devel@gnu.org
>
> I left them out purely for performance reasons: these macros both call setjmp
> and possibly malloc, which can incur a significant penalty, especially for
> environment functions that are very small otherwise (e.g. eq or is_not_nil).
Be that as it may, I really don't think we have any reasonable
alternative. Anything else will be a bug waiting to happen.
> and
> in fact come up with a wrapper around calls to Emacs functions and
> macros that will catch signals and throws, and make a point of calling
> each such function/macro through that wrapper.
>
> My first approach was to create such a wrapper function, but I quickly realized
> that it's too much of a hassle. Existing wrapper functions are e.g.
> internal_condition_case in eval.c, but you'd need one wrapper function for each
> signature, which quickly results in excessive code duplication. Therefore I
> used the macros.
Maybe we could come up with some clever macro wrappers instead.
Macros are not really bound by function signature limitations, so
maybe there is a way.
Failing that, we will have to clearly document that use of these
macros in every emacs-module.c function is mandatory. And maybe also
have only one macro instead of 2, which catches both signals and
throws.
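As a rough illustration of what such a macro wrapper might look like, the
sketch below guards an arbitrary expression, whatever the signature of the
function being called, and treats signals and throws alike. Everything in
it (GUARDED_CALL, nonlocal_exit, active_handler, and the two toy callees) is
hypothetical and only meant to show that a macro is not tied to one
signature; it is not a proposal for the actual emacs-module.c code.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

enum exit_kind { KIND_NONE, KIND_SIGNAL, KIND_THROW };

/* Hypothetical stand-ins for the Emacs handler stack and for the
   record of the most recent non-local exit.  */
static jmp_buf *active_handler = NULL;
static enum exit_kind last_exit = KIND_NONE;

/* Stand-in for both Fsignal and Fthrow.  */
static void
nonlocal_exit (enum exit_kind kind)
{
  if (active_handler == NULL)
    abort ();
  last_exit = kind;
  longjmp (*active_handler, 1);
}

/* Evaluate EXPR and store it in DEST.  If EXPR exits non-locally, by
   signal or by throw, store FALLBACK instead.  */
#define GUARDED_CALL(dest, fallback, expr)              \
  do {                                                  \
    jmp_buf guard_buf_;                                 \
    jmp_buf *guard_saved_ = active_handler;             \
    if (setjmp (guard_buf_) == 0)                       \
      {                                                 \
        active_handler = &guard_buf_;                   \
        (dest) = (expr);                                \
      }                                                 \
    else                                                \
      (dest) = (fallback);                              \
    active_handler = guard_saved_;                      \
  } while (0)

/* Two toy callees with different signatures.  */
static int
parse_digit (char c)
{
  if (c < '0' || c > '9')
    nonlocal_exit (KIND_SIGNAL);
  return c - '0';
}

static double
reciprocal (double x)
{
  if (x == 0.0)
    nonlocal_exit (KIND_THROW);
  return 1.0 / x;
}

static int
guarded_digit (char c)
{
  int value;
  GUARDED_CALL (value, -1, parse_digit (c));
  return value;
}

static double
guarded_reciprocal (double x)
{
  double value;
  GUARDED_CALL (value, 0.0, reciprocal (x));
  return value;
}
```

The same macro serves both callees, so no per-signature wrapper function is
needed.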
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-25 18:39 Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
2015-11-25 18:50 ` Philipp Stephani
@ 2015-11-26 21:29 ` Paul Eggert
2015-11-27 7:35 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-11-26 21:29 UTC
To: Eli Zaretskii, dancol, tzz, p.stephani2, aurelien.aptel+emacs; +Cc: emacs-devel
Eli Zaretskii wrote:
> it would be a maintenance burden to have to
> analyze upon each such change whether emacs-module.c needs some
> augmentation.
While that's true in general, I think some exceptions are OK. E.g., it's OK if
emacs-module.c assumes that ASIZE is a simple access function or macro that
doesn't throw signals. If we actually changed ASIZE to throw signals, there's a
boatload of other code we'd need to change as well, and changing emacs-module.c
wouldn't add much more to the maintenance burden.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-26 21:29 ` Paul Eggert
@ 2015-11-27 7:35 ` Eli Zaretskii
2015-11-27 19:19 ` Philipp Stephani
2015-11-28 23:20 ` Paul Eggert
0 siblings, 2 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-11-27 7:35 UTC
To: Paul Eggert; +Cc: aurelien.aptel+emacs, tzz, dancol, p.stephani2, emacs-devel
> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 26 Nov 2015 13:29:49 -0800
>
> Eli Zaretskii wrote:
> > it would be a maintenance burden to have to
> > analyze upon each such change whether emacs-module.c needs some
> > augmentation.
>
> While that's true in general, I think some exceptions are OK. E.g., it's OK if
> emacs-module.c assumes that ASIZE is a simple access function or macro that
> doesn't throw signals. If we actually changed ASIZE to throw signals, there's a
> boatload of other code we'd need to change as well, and changing emacs-module.c
> wouldn't add much more to the maintenance burden.
So what are the rules here, exactly? I'd like to write them down in
the commentary to emacs-module.c, so that any future changes there
will have lower probability of breaking things.
E.g., can make_number signal an error? What about make_float or
make_string? And what about accessors like XFLOAT_DATA or AREF?
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-27 7:35 ` Eli Zaretskii
@ 2015-11-27 19:19 ` Philipp Stephani
2015-11-28 10:58 ` Philipp Stephani
2015-11-28 23:20 ` Paul Eggert
1 sibling, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2015-11-27 19:19 UTC
To: Eli Zaretskii, Paul Eggert; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Fri, 27 Nov 2015 at 08:36:
> > Cc: emacs-devel@gnu.org
> > From: Paul Eggert <eggert@cs.ucla.edu>
> > Date: Thu, 26 Nov 2015 13:29:49 -0800
> >
> > Eli Zaretskii wrote:
> > > it would be a maintenance burden to have to
> > > analyze upon each such change whether emacs-module.c needs some
> > > augmentation.
> >
> > While that's true in general, I think some exceptions are OK. E.g., it's OK if
> > emacs-module.c assumes that ASIZE is a simple access function or macro that
> > doesn't throw signals. If we actually changed ASIZE to throw signals, there's a
> > boatload of other code we'd need to change as well, and changing emacs-module.c
> > wouldn't add much more to the maintenance burden.
>
> So what are the rules here, exactly? I'd like to write them down in
> the commentary to emacs-module.c, so that any future changes there
> will have lower probability of breaking things.
>
> E.g., can make_number signal an error? What about make_float or
> make_string? And what about accessors like XFLOAT_DATA or AREF?
>
>
Are there any established rules? If not we should probably be conservative
and assume that everything signals. If we figure out that this introduces
an unacceptably high overhead in some situations we can reconsider later.
I would propose three exceptions: free_global_ref, is_not_nil, eq.
free_global_ref cannot fail in Daniel's design, and implementing it that
way would be consistent with other resource deallocation functions such as
free(3). is_not_nil and eq seem so fundamental that I cannot imagine a
situation where they could ever fail. Documenting that these three cannot
fail would free module authors from the need to check for errors after
calling these functions.
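From the module author's side, the proposal would look roughly like the
sketch below. It uses a toy environment, not the real emacs_env: toy_env,
toy_value, toy_extract_integer, toy_is_not_nil, toy_eq and toy_sum are all
hypothetical names, and the "cannot fail" behaviour of the last two helpers
is the assumption being proposed, not something the current code promises.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_exit { TOY_EXIT_RETURN, TOY_EXIT_SIGNAL };

/* Toy environment that only records a pending non-local exit.  */
struct toy_env { enum toy_exit pending; };

/* Toy value: NULL plays the role of nil.  */
typedef const long *toy_value;

/* Can fail: sets a pending exit and returns the sentinel 0.  */
static long
toy_extract_integer (struct toy_env *env, toy_value v)
{
  if (env->pending != TOY_EXIT_RETURN)
    return 0;
  if (v == NULL)
    {
      env->pending = TOY_EXIT_SIGNAL;
      return 0;
    }
  return *v;
}

/* Assumed never to fail, as proposed for is_not_nil and eq.  */
static bool
toy_is_not_nil (struct toy_env *env, toy_value v)
{
  (void) env;
  return v != NULL;
}

static bool
toy_eq (struct toy_env *env, toy_value a, toy_value b)
{
  (void) env;
  return a == b;
}

/* Module-side code: add A and B, treating a nil B as zero.  */
static bool
toy_sum (struct toy_env *env, toy_value a, toy_value b, long *sum)
{
  long x = toy_extract_integer (env, a);
  if (env->pending != TOY_EXIT_RETURN)   /* Check is required here.  */
    return false;
  /* No check after toy_is_not_nil: it is documented never to fail.  */
  long y = toy_is_not_nil (env, b) ? toy_extract_integer (env, b) : 0;
  if (env->pending != TOY_EXIT_RETURN)
    return false;
  *sum = x + y;
  return true;
}
```

The saving is small per call, but it removes one error check after every
use of the functions that are documented never to fail.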
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-27 19:19 ` Philipp Stephani
@ 2015-11-28 10:58 ` Philipp Stephani
2015-11-28 12:10 ` Eli Zaretskii
2015-12-19 21:03 ` Philipp Stephani
0 siblings, 2 replies; 177+ messages in thread
From: Philipp Stephani @ 2015-11-28 10:58 UTC
To: Eli Zaretskii, Paul Eggert; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Philipp Stephani <p.stephani2@gmail.com> wrote on Fri, 27 Nov 2015 at 20:19:
> Eli Zaretskii <eliz@gnu.org> wrote on Fri, 27 Nov 2015 at 08:36:
>
>> > Cc: emacs-devel@gnu.org
>> > From: Paul Eggert <eggert@cs.ucla.edu>
>> > Date: Thu, 26 Nov 2015 13:29:49 -0800
>> >
>> > Eli Zaretskii wrote:
>> > > it would be a maintenance burden to have to
>> > > analyze upon each such change whether emacs-module.c needs some
>> > > augmentation.
>> >
>> > While that's true in general, I think some exceptions are OK. E.g., it's OK if
>> > emacs-module.c assumes that ASIZE is a simple access function or macro that
>> > doesn't throw signals. If we actually changed ASIZE to throw signals, there's a
>> > boatload of other code we'd need to change as well, and changing emacs-module.c
>> > wouldn't add much more to the maintenance burden.
>>
>> So what are the rules here, exactly? I'd like to write them down in
>> the commentary to emacs-module.c, so that any future changes there
>> will have lower probability of breaking things.
>>
>> E.g., can make_number signal an error? What about make_float or
>> make_string? And what about accessors like XFLOAT_DATA or AREF?
>>
>>
> Are there any established rules? If not we should probably be conservative
> and assume that everything signals. If we figure out that this introduces
> an unacceptably high overhead in some situations we can reconsider later.
> I would propose three exceptions: free_global_ref, is_not_nil, eq.
> free_global_ref cannot fail in Daniel's design, and implementing it that
> way would be consistent with other resource deallocation functions such as
> free(3). is_not_nil and eq seem so fundamental that I cannot imagine a
> situation where they could ever fail. Documenting that these three cannot
> fail would free module authors from the need to check for errors after
> calling these functions.
>
For now I've attached a patch to replace the initial setup of most
environment functions with a single macro.
[-- Attachment #2: 0001-Create-and-use-a-macro-MODULE_FUNCTION_BEGIN.patch --]
[-- Type: application/octet-stream, Size: 13021 bytes --]
From 1c30db558708575635905c48ae2c6fc5649f8ea8 Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Sat, 28 Nov 2015 11:55:02 +0100
Subject: [PATCH] Create and use a macro MODULE_FUNCTION_BEGIN.
This replaces the initial boilerplate for most environment functions.
* emacs-module.c (module_make_global_ref)
(module_free_global_ref, module_make_function, module_funcall)
(module_intern, module_type_of, module_extract_integer)
(module_make_integer, module_extract_float, module_make_float)
(module_copy_string_contents, module_make_string)
(module_make_user_ptr, module_get_user_ptr, module_set_user_ptr)
(module_get_user_finalizer, module_set_user_finalizer)
(module_vec_set, module_vec_get, module_vec_size): Use new helper
macro MODULE_FUNCTION_BEGIN.
---
src/emacs-module.c | 132 +++++++++++++++++++----------------------------------
1 file changed, 46 insertions(+), 86 deletions(-)
diff --git a/src/emacs-module.c b/src/emacs-module.c
index 3686470..fc77765 100644
--- a/src/emacs-module.c
+++ b/src/emacs-module.c
@@ -156,24 +156,14 @@ static void module_wrong_type (emacs_env *, Lisp_Object, Lisp_Object);
passing information to the handler functions. */
/* Place this macro at the beginning of a function returning a number
- or a pointer to handle signals. The function must have an ENV
- parameter. The function will return 0 (or NULL) if a signal is
- caught. */
-#define MODULE_HANDLE_SIGNALS MODULE_HANDLE_SIGNALS_RETURN (0)
-
-/* Place this macro at the beginning of a function returning void to
- handle signals. The function must have an ENV parameter. */
-#define MODULE_HANDLE_SIGNALS_VOID MODULE_HANDLE_SIGNALS_RETURN ()
-
-#define MODULE_HANDLE_SIGNALS_RETURN(retval) \
- MODULE_SETJMP (CONDITION_CASE, module_handle_signal, retval)
-
-/* Place this macro at the beginning of a function returning a pointer
- to handle non-local exits via `throw'. The function must have an
- ENV parameter. The function will return NULL if a `throw' is
- caught. */
-#define MODULE_HANDLE_THROW \
- MODULE_SETJMP (CATCHER_ALL, module_handle_throw, NULL)
+ or a pointer to handle non-local exits. The function must have an
+ ENV parameter. The function will return the specified value if a
+ signal or throw is caught. */
+// TODO: Have Fsignal check for CATCHER_ALL so we only have to install
+// one handler.
+#define MODULE_HANDLE_NONLOCAL_EXIT(retval) \
+ MODULE_SETJMP (CONDITION_CASE, module_handle_signal, retval); \
+ MODULE_SETJMP (CATCHER_ALL, module_handle_throw, retval)
#define MODULE_SETJMP(handlertype, handlerfunc, retval) \
MODULE_SETJMP_1 (handlertype, handlerfunc, retval, \
@@ -190,6 +180,8 @@ static void module_wrong_type (emacs_env *, Lisp_Object, Lisp_Object);
code after the macro may longjmp back into the macro, which means
its local variable C must stay live in later code. */
+// TODO: Make backtraces work if this macros is used.
+
#define MODULE_SETJMP_1(handlertype, handlerfunc, retval, c, dummy) \
if (module_non_local_exit_check (env) != emacs_funcall_exit_return) \
return retval; \
@@ -252,8 +244,8 @@ static Lisp_Object module_call_func;
4. Any function that needs to call Emacs facilities, such as
encoding or decoding functions, or 'intern', or 'make_string',
should protect itself from signals and 'throw' in the called
- Emacs functions, by placing the macros MODULE_HANDLE_SIGNALS
- and/or MODULE_HANDLE_THROW right after the above 2 tests.
+ Emacs functions, by placing the macro
+ MODULE_HANDLE_NONLOCAL_EXIT right after the above 2 tests.
5. Do NOT use 'eassert' for checking validity of user code in the
module. Instead, make those checks part of the code, and if the
@@ -263,6 +255,16 @@ static Lisp_Object module_call_func;
instead of reporting the error back to Lisp, and also because
'eassert' is compiled to nothing in the release version. */
+/* Use MODULE_FUNCTION_BEGIN to implement steps 2 through 4 for most
+ environment functions. On error it will return its argument, which
+ should be a sentinel value. */
+
+#define MODULE_FUNCTION_BEGIN(error_retval) \
+ check_main_thread (); \
+ if (module_non_local_exit_check (env) != emacs_funcall_exit_return) \
+ return error_retval; \
+ MODULE_HANDLE_NONLOCAL_EXIT (error_retval)
+
/* Catch signals and throws only if the code can actually signal or
throw. If checking is enabled, abort if the current thread is not
the Emacs main thread. */
@@ -280,10 +282,7 @@ module_get_environment (struct emacs_runtime *ert)
static emacs_value
module_make_global_ref (emacs_env *env, emacs_value ref)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
- MODULE_HANDLE_SIGNALS;
+ MODULE_FUNCTION_BEGIN (NULL);
struct Lisp_Hash_Table *h = XHASH_TABLE (Vmodule_refs_hash);
Lisp_Object new_obj = value_to_lisp (ref);
EMACS_UINT hashcode;
@@ -312,13 +311,10 @@ module_make_global_ref (emacs_env *env, emacs_value ref)
static void
module_free_global_ref (emacs_env *env, emacs_value ref)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return;
/* TODO: This probably never signals. */
/* FIXME: Wait a minute. Shouldn't this function report an error if
the hash lookup fails? */
- MODULE_HANDLE_SIGNALS_VOID;
+ MODULE_FUNCTION_BEGIN ();
struct Lisp_Hash_Table *h = XHASH_TABLE (Vmodule_refs_hash);
Lisp_Object obj = value_to_lisp (ref);
EMACS_UINT hashcode;
@@ -396,10 +392,7 @@ module_make_function (emacs_env *env, ptrdiff_t min_arity, ptrdiff_t max_arity,
emacs_subr subr, const char *documentation,
void *data)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
- MODULE_HANDLE_SIGNALS;
+ MODULE_FUNCTION_BEGIN (NULL);
if (! (0 <= min_arity
&& (max_arity < 0
@@ -434,11 +427,7 @@ static emacs_value
module_funcall (emacs_env *env, emacs_value fun, ptrdiff_t nargs,
emacs_value args[])
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
- MODULE_HANDLE_SIGNALS;
- MODULE_HANDLE_THROW;
+ MODULE_FUNCTION_BEGIN (NULL);
/* Make a new Lisp_Object array starting with the function as the
first arg, because that's what Ffuncall takes. */
@@ -456,19 +445,14 @@ module_funcall (emacs_env *env, emacs_value fun, ptrdiff_t nargs,
static emacs_value
module_intern (emacs_env *env, const char *name)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
- MODULE_HANDLE_SIGNALS;
+ MODULE_FUNCTION_BEGIN (NULL);
return lisp_to_value (env, intern (name));
}
static emacs_value
module_type_of (emacs_env *env, emacs_value value)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
+ MODULE_FUNCTION_BEGIN (NULL);
return lisp_to_value (env, Ftype_of (value_to_lisp (value)));
}
@@ -493,9 +477,7 @@ module_eq (emacs_env *env, emacs_value a, emacs_value b)
static intmax_t
module_extract_integer (emacs_env *env, emacs_value n)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return 0;
+ MODULE_FUNCTION_BEGIN (0);
Lisp_Object l = value_to_lisp (n);
if (! INTEGERP (l))
{
@@ -508,9 +490,7 @@ module_extract_integer (emacs_env *env, emacs_value n)
static emacs_value
module_make_integer (emacs_env *env, intmax_t n)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
+ MODULE_FUNCTION_BEGIN (NULL);
if (! (MOST_NEGATIVE_FIXNUM <= n && n <= MOST_POSITIVE_FIXNUM))
{
module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
@@ -522,9 +502,7 @@ module_make_integer (emacs_env *env, intmax_t n)
static double
module_extract_float (emacs_env *env, emacs_value f)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return 0;
+ MODULE_FUNCTION_BEGIN (0);
Lisp_Object lisp = value_to_lisp (f);
if (! FLOATP (lisp))
{
@@ -537,10 +515,7 @@ module_extract_float (emacs_env *env, emacs_value f)
static emacs_value
module_make_float (emacs_env *env, double d)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
- MODULE_HANDLE_SIGNALS;
+ MODULE_FUNCTION_BEGIN (NULL);
return lisp_to_value (env, make_float (d));
}
@@ -548,10 +523,7 @@ static bool
module_copy_string_contents (emacs_env *env, emacs_value value, char *buffer,
ptrdiff_t *length)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return false;
- MODULE_HANDLE_SIGNALS;
+ MODULE_FUNCTION_BEGIN (false);
Lisp_Object lisp_str = value_to_lisp (value);
if (! STRINGP (lisp_str))
{
@@ -594,10 +566,7 @@ module_copy_string_contents (emacs_env *env, emacs_value value, char *buffer,
static emacs_value
module_make_string (emacs_env *env, const char *str, ptrdiff_t length)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
- MODULE_HANDLE_SIGNALS;
+ MODULE_FUNCTION_BEGIN (NULL);
if (length > STRING_BYTES_BOUND)
{
module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
@@ -611,18 +580,14 @@ module_make_string (emacs_env *env, const char *str, ptrdiff_t length)
static emacs_value
module_make_user_ptr (emacs_env *env, emacs_finalizer_function fin, void *ptr)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
+ MODULE_FUNCTION_BEGIN (NULL);
return lisp_to_value (env, make_user_ptr (fin, ptr));
}
static void *
module_get_user_ptr (emacs_env *env, emacs_value uptr)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
+ MODULE_FUNCTION_BEGIN (NULL);
Lisp_Object lisp = value_to_lisp (uptr);
if (! USER_PTRP (lisp))
{
@@ -635,6 +600,8 @@ module_get_user_ptr (emacs_env *env, emacs_value uptr)
static void
module_set_user_ptr (emacs_env *env, emacs_value uptr, void *ptr)
{
+ // FIXME: This function should return bool because it can fail.
+ MODULE_FUNCTION_BEGIN ();
check_main_thread ();
if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
return;
@@ -647,9 +614,7 @@ module_set_user_ptr (emacs_env *env, emacs_value uptr, void *ptr)
static emacs_finalizer_function
module_get_user_finalizer (emacs_env *env, emacs_value uptr)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
+ MODULE_FUNCTION_BEGIN (NULL);
Lisp_Object lisp = value_to_lisp (uptr);
if (! USER_PTRP (lisp))
{
@@ -663,9 +628,8 @@ static void
module_set_user_finalizer (emacs_env *env, emacs_value uptr,
emacs_finalizer_function fin)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return;
+ // FIXME: This function should return bool because it can fail.
+ MODULE_FUNCTION_BEGIN ();
Lisp_Object lisp = value_to_lisp (uptr);
if (! USER_PTRP (lisp))
module_wrong_type (env, Quser_ptr, lisp);
@@ -675,9 +639,8 @@ module_set_user_finalizer (emacs_env *env, emacs_value uptr,
static void
module_vec_set (emacs_env *env, emacs_value vec, ptrdiff_t i, emacs_value val)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return;
+ // FIXME: This function should return bool because it can fail.
+ MODULE_FUNCTION_BEGIN ();
Lisp_Object lvec = value_to_lisp (vec);
if (! VECTORP (lvec))
{
@@ -698,9 +661,7 @@ module_vec_set (emacs_env *env, emacs_value vec, ptrdiff_t i, emacs_value val)
static emacs_value
module_vec_get (emacs_env *env, emacs_value vec, ptrdiff_t i)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return NULL;
+ MODULE_FUNCTION_BEGIN (NULL);
Lisp_Object lvec = value_to_lisp (vec);
if (! VECTORP (lvec))
{
@@ -721,9 +682,8 @@ module_vec_get (emacs_env *env, emacs_value vec, ptrdiff_t i)
static ptrdiff_t
module_vec_size (emacs_env *env, emacs_value vec)
{
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return 0;
+ // FIXME: Return a sentinel value (e.g., -1) on error.
+ MODULE_FUNCTION_BEGIN (0);
Lisp_Object lvec = value_to_lisp (vec);
if (! VECTORP (lvec))
{
--
2.6.3
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-28 10:58 ` Philipp Stephani
@ 2015-11-28 12:10 ` Eli Zaretskii
2015-12-19 21:03 ` Philipp Stephani
1 sibling, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-11-28 12:10 UTC
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 28 Nov 2015 10:58:44 +0000
> Cc: dancol@dancol.org, tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com,
> emacs-devel@gnu.org
>
> For now I've attached a patch to replace the initial setup of most environment
> functions with a single macro.
Thanks, pushed.
I think we should require each function in emacs-module.c to be
non-void. That should simplify things even more.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-27 7:35 ` Eli Zaretskii
2015-11-27 19:19 ` Philipp Stephani
@ 2015-11-28 23:20 ` Paul Eggert
1 sibling, 0 replies; 177+ messages in thread
From: Paul Eggert @ 2015-11-28 23:20 UTC
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, dancol, p.stephani2, emacs-devel
Eli Zaretskii wrote:
> So what are the rules here, exactly? I'd like to write them down in
> the commentary to emacs-module.c, so that any future changes there
> will have lower probability of breaking things.
I guess the rules are that if emacs-module.c calls a function without worrying
about whether it can signal, then that function shouldn't signal. I'm not
trying to be flippant here, just realistic. If you like I suppose you could
list in emacs-module.c the functions that you think cannot throw signals. But
perhaps it'd be better to wrap all calls to such functions in a wrapper that is
ordinarily a no-op, but which dumps core if a signal occurs while debugging.
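One possible shape for such a wrapper is sketched below, under the
assumption that a signal is a longjmp to the innermost handler. In a
production build (NDEBUG defined) the wrapper expands to the bare
statement; otherwise it installs a trap that aborts, and so dumps core, if
the supposedly signal-free code exits non-locally. ASSUME_NO_SIGNAL,
active_handler, toy_vector and the toy accessors are hypothetical names
used only for illustration.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Emacs handler stack.  */
static jmp_buf *active_handler = NULL;

struct toy_vector { int size; int items[4]; };

/* Stand-in for a checked accessor: signals if I is out of range.  */
static int
toy_aref_checked (const struct toy_vector *v, int i)
{
  if (i < 0 || i >= v->size)
    {
      if (active_handler == NULL)
        abort ();
      longjmp (*active_handler, 1);
    }
  return v->items[i];
}

#ifdef NDEBUG
/* Production build: no overhead at all.  */
# define ASSUME_NO_SIGNAL(stmt) do { stmt; } while (0)
#else
/* Debugging build: abort if STMT exits non-locally.  */
# define ASSUME_NO_SIGNAL(stmt)                                       \
  do {                                                                \
    jmp_buf trap_;                                                    \
    jmp_buf *trap_saved_ = active_handler;                            \
    if (setjmp (trap_) != 0)                                          \
      {                                                               \
        fputs ("unexpected signal in no-signal region\n", stderr);    \
        abort ();                                                     \
      }                                                               \
    active_handler = &trap_;                                          \
    stmt;                                                             \
    active_handler = trap_saved_;                                     \
  } while (0)
#endif

/* Caller that believes the index was validated earlier, so the
   accessor should not signal.  */
static int
vector_get (const struct toy_vector *v, int i)
{
  int value = 0;
  ASSUME_NO_SIGNAL (value = toy_aref_checked (v, i));
  return value;
}
```

With an in-range index the wrapper is invisible; an out-of-range index in a
debugging build would abort at the point where the assumption was violated
rather than corrupt state later.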
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-11-28 10:58 ` Philipp Stephani
2015-11-28 12:10 ` Eli Zaretskii
@ 2015-12-19 21:03 ` Philipp Stephani
2015-12-19 22:57 ` Philipp Stephani
` (2 more replies)
1 sibling, 3 replies; 177+ messages in thread
From: Philipp Stephani @ 2015-12-19 21:03 UTC
To: Eli Zaretskii, Paul Eggert; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Philipp Stephani <p.stephani2@gmail.com> wrote on Sat, 28 Nov 2015 at 11:58:
> Philipp Stephani <p.stephani2@gmail.com> wrote on Fri, 27 Nov 2015 at 20:19:
>
>> Eli Zaretskii <eliz@gnu.org> wrote on Fri, 27 Nov 2015 at 08:36:
>>
>>> > Cc: emacs-devel@gnu.org
>>> > From: Paul Eggert <eggert@cs.ucla.edu>
>>> > Date: Thu, 26 Nov 2015 13:29:49 -0800
>>> >
>>> > Eli Zaretskii wrote:
>>> > > it would be a maintenance burden to have to
>>> > > analyze upon each such change whether emacs-module.c needs some
>>> > > augmentation.
>>> >
>>> > While that's true in general, I think some exceptions are OK. E.g., it's OK if
>>> > emacs-module.c assumes that ASIZE is a simple access function or macro that
>>> > doesn't throw signals. If we actually changed ASIZE to throw signals, there's a
>>> > boatload of other code we'd need to change as well, and changing emacs-module.c
>>> > wouldn't add much more to the maintenance burden.
>>>
>>> So what are the rules here, exactly? I'd like to write them down in
>>> the commentary to emacs-module.c, so that any future changes there
>>> will have lower probability of breaking things.
>>>
>>> E.g., can make_number signal an error? What about make_float or
>>> make_string? And what about accessors like XFLOAT_DATA or AREF?
>>>
>>>
>> Are there any established rules? If not we should probably be
>> conservative and assume that everything signals. If we figure out that
>> this introduces an unacceptably high overhead in some situations we can
>> reconsider later.
>> I would propose three exceptions: free_global_ref, is_not_nil, eq.
>> free_global_ref cannot fail in Daniel's design, and implementing it that
>> way would be consistent with other resource deallocation functions such as
>> free(3). is_not_nil and eq seem so fundamental that I cannot imagine a
>> situation where they could ever fail. Documenting that these three cannot
>> fail would free module authors from the need to check for errors after
>> calling these functions.
>>
>
> For now I've attached a patch to replace the initial setup of most
> environment functions with a single macro.
>
Here's another patch that replaces some of the custom error handling with
signals. The signals will be immediately caught by the prologue, but some
complexity and duplication is removed.
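The idea can be illustrated with a toy comparison, assuming a prologue that
catches a longjmp-based signal and records it in the environment. Both
functions below leave the environment in the same state on a bad index; the
second one delegates the check to a shared signalling helper instead of
spelling out the error path by hand. All names here (toy_env, toy_signal,
check_in_range, get_manual, get_signalling) are hypothetical and do not
come from the patch.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_env { bool error_pending; };

/* Hypothetical stand-in for the Emacs handler stack.  */
static jmp_buf *active_handler = NULL;

static void
toy_signal (void)
{
  if (active_handler == NULL)
    abort ();
  longjmp (*active_handler, 1);
}

/* Shared CHECK-style helper: signals instead of returning a status.  */
static void
check_in_range (long i, long size)
{
  if (i < 0 || i >= size)
    toy_signal ();
}

/* Custom error handling: each function repeats the report-and-return
   sequence itself.  */
static long
get_manual (struct toy_env *env, const long *items, long size, long i)
{
  if (i < 0 || i >= size)
    {
      env->error_pending = true;
      return 0;
    }
  return items[i];
}

/* Signal-based handling: the prologue catches whatever the shared
   helper signals and turns it into the same pending-error state.  */
static long
get_signalling (struct toy_env *env, const long *items, long size, long i)
{
  jmp_buf prologue;
  jmp_buf *saved = active_handler;
  if (setjmp (prologue) != 0)
    {
      active_handler = saved;
      env->error_pending = true;
      return 0;
    }
  active_handler = &prologue;
  check_in_range (i, size);
  long result = items[i];
  active_handler = saved;
  return result;
}
```

In this sketch the body of the signal-based function has no error path of
its own; that only pays off where a handler is being installed anyway.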
[-- Attachment #2: 0001-Use-standard-checks-whenever-possible.patch --]
[-- Type: application/octet-stream, Size: 9553 bytes --]
From 4d4b6516001163eb72cbcfec480ba878829f668d Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Sat, 19 Dec 2015 22:00:51 +0100
Subject: [PATCH] Use standard checks whenever possible.
This is possible in all functions where we catch signals anyway.
* emacs-module.c (module_make_global_ref, module_funcall)
(module_copy_string_contents, module_make_string): Use xsignal0
and CHECK macros for argument checks.
---
src/emacs-module.c | 123 +++++++++--------------------------------------------
1 file changed, 21 insertions(+), 102 deletions(-)
diff --git a/src/emacs-module.c b/src/emacs-module.c
index ee97644..28b73a5 100644
--- a/src/emacs-module.c
+++ b/src/emacs-module.c
@@ -102,14 +102,12 @@ static enum emacs_funcall_exit module_non_local_exit_check (emacs_env *);
static void check_main_thread (void);
static void finalize_environment (struct emacs_env_private *);
static void initialize_environment (emacs_env *, struct emacs_env_private *priv);
-static void module_args_out_of_range (emacs_env *, Lisp_Object, Lisp_Object);
static void module_handle_signal (emacs_env *, Lisp_Object);
static void module_handle_throw (emacs_env *, Lisp_Object);
static void module_non_local_exit_signal_1 (emacs_env *, Lisp_Object, Lisp_Object);
static void module_non_local_exit_throw_1 (emacs_env *, Lisp_Object, Lisp_Object);
static void module_out_of_memory (emacs_env *);
static void module_reset_handlerlist (const int *);
-static void module_wrong_type (emacs_env *, Lisp_Object, Lisp_Object);
/* We used to return NULL when emacs_value was a different type from
Lisp_Object, but nowadays we just use Qnil instead. Although they
@@ -261,12 +259,9 @@ module_make_global_ref (emacs_env *env, emacs_value ref)
if (i >= 0)
{
Lisp_Object value = HASH_VALUE (h, i);
+ verify (EMACS_INT_MAX > MOST_POSITIVE_FIXNUM);
EMACS_INT refcount = XFASTINT (value) + 1;
- if (refcount > MOST_POSITIVE_FIXNUM)
- {
- module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
- return module_nil;
- }
+ if (FIXNUM_OVERFLOW_P (refcount)) xsignal0 (Qoverflow_error);
value = make_natnum (refcount);
set_hash_value_slot (h, i, value);
}
@@ -405,6 +400,7 @@ module_funcall (emacs_env *env, emacs_value fun, ptrdiff_t nargs,
first arg, because that's what Ffuncall takes. */
Lisp_Object *newargs;
USE_SAFE_ALLOCA;
+ if (nargs == PTRDIFF_MAX) xsignal0 (Qoverflow_error);
SAFE_ALLOCA_LISP (newargs, nargs + 1);
newargs[0] = value_to_lisp (fun);
for (ptrdiff_t i = 0; i < nargs; i++)
@@ -451,11 +447,7 @@ module_extract_integer (emacs_env *env, emacs_value n)
{
MODULE_FUNCTION_BEGIN (0);
Lisp_Object l = value_to_lisp (n);
- if (! INTEGERP (l))
- {
- module_wrong_type (env, Qintegerp, l);
- return 0;
- }
+ CHECK_NUMBER (l);
return XINT (l);
}
@@ -463,11 +455,7 @@ static emacs_value
module_make_integer (emacs_env *env, intmax_t n)
{
MODULE_FUNCTION_BEGIN (module_nil);
- if (! (MOST_NEGATIVE_FIXNUM <= n && n <= MOST_POSITIVE_FIXNUM))
- {
- module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
- return module_nil;
- }
+ if (FIXNUM_OVERFLOW_P (n)) xsignal0 (Qoverflow_error);
return lisp_to_value (make_number (n));
}
@@ -476,11 +464,7 @@ module_extract_float (emacs_env *env, emacs_value f)
{
MODULE_FUNCTION_BEGIN (0);
Lisp_Object lisp = value_to_lisp (f);
- if (! FLOATP (lisp))
- {
- module_wrong_type (env, Qfloatp, lisp);
- return 0;
- }
+ CHECK_TYPE (FLOATP (lisp), Qfloatp, lisp);
return XFLOAT_DATA (lisp);
}
@@ -497,19 +481,11 @@ module_copy_string_contents (emacs_env *env, emacs_value value, char *buffer,
{
MODULE_FUNCTION_BEGIN (false);
Lisp_Object lisp_str = value_to_lisp (value);
- if (! STRINGP (lisp_str))
- {
- module_wrong_type (env, Qstringp, lisp_str);
- return false;
- }
+ CHECK_STRING (lisp_str);
Lisp_Object lisp_str_utf8 = ENCODE_UTF_8 (lisp_str);
ptrdiff_t raw_size = SBYTES (lisp_str_utf8);
- if (raw_size == PTRDIFF_MAX)
- {
- module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
- return false;
- }
+ if (raw_size == PTRDIFF_MAX) xsignal0 (Qoverflow_error);
ptrdiff_t required_buf_size = raw_size + 1;
eassert (length != NULL);
@@ -525,8 +501,7 @@ module_copy_string_contents (emacs_env *env, emacs_value value, char *buffer,
if (*length < required_buf_size)
{
*length = required_buf_size;
- module_non_local_exit_signal_1 (env, Qargs_out_of_range, Qnil);
- return false;
+ xsignal0 (Qargs_out_of_range);
}
*length = required_buf_size;
@@ -539,11 +514,7 @@ static emacs_value
module_make_string (emacs_env *env, const char *str, ptrdiff_t length)
{
MODULE_FUNCTION_BEGIN (module_nil);
- if (length > STRING_BYTES_BOUND)
- {
- module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
- return module_nil;
- }
+ if (length > STRING_BYTES_BOUND) xsignal0 (Qoverflow_error);
Lisp_Object lstr = make_unibyte_string (str, length);
return lisp_to_value (code_convert_string_norecord (lstr, Qutf_8, false));
}
@@ -560,11 +531,7 @@ module_get_user_ptr (emacs_env *env, emacs_value uptr)
{
MODULE_FUNCTION_BEGIN (NULL);
Lisp_Object lisp = value_to_lisp (uptr);
- if (! USER_PTRP (lisp))
- {
- module_wrong_type (env, Quser_ptr, lisp);
- return NULL;
- }
+ CHECK_TYPE (USER_PTRP (lisp), Quser_ptrp, lisp);
return XUSER_PTR (lisp)->p;
}
@@ -573,12 +540,8 @@ module_set_user_ptr (emacs_env *env, emacs_value uptr, void *ptr)
{
/* FIXME: This function should return bool because it can fail. */
MODULE_FUNCTION_BEGIN ();
- check_main_thread ();
- if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
- return;
Lisp_Object lisp = value_to_lisp (uptr);
- if (! USER_PTRP (lisp))
- module_wrong_type (env, Quser_ptr, lisp);
+ CHECK_TYPE (USER_PTRP (lisp), Quser_ptrp, lisp);
XUSER_PTR (lisp)->p = ptr;
}
@@ -587,11 +550,7 @@ module_get_user_finalizer (emacs_env *env, emacs_value uptr)
{
MODULE_FUNCTION_BEGIN (NULL);
Lisp_Object lisp = value_to_lisp (uptr);
- if (! USER_PTRP (lisp))
- {
- module_wrong_type (env, Quser_ptr, lisp);
- return NULL;
- }
+ CHECK_TYPE (USER_PTRP (lisp), Quser_ptrp, lisp);
return XUSER_PTR (lisp)->finalizer;
}
@@ -602,8 +561,7 @@ module_set_user_finalizer (emacs_env *env, emacs_value uptr,
/* FIXME: This function should return bool because it can fail. */
MODULE_FUNCTION_BEGIN ();
Lisp_Object lisp = value_to_lisp (uptr);
- if (! USER_PTRP (lisp))
- module_wrong_type (env, Quser_ptr, lisp);
+ CHECK_TYPE (USER_PTRP (lisp), Quser_ptrp, lisp);
XUSER_PTR (lisp)->finalizer = fin;
}
@@ -613,19 +571,9 @@ module_vec_set (emacs_env *env, emacs_value vec, ptrdiff_t i, emacs_value val)
/* FIXME: This function should return bool because it can fail. */
MODULE_FUNCTION_BEGIN ();
Lisp_Object lvec = value_to_lisp (vec);
- if (! VECTORP (lvec))
- {
- module_wrong_type (env, Qvectorp, lvec);
- return;
- }
- if (! (0 <= i && i < ASIZE (lvec)))
- {
- if (MOST_NEGATIVE_FIXNUM <= i && i <= MOST_POSITIVE_FIXNUM)
- module_args_out_of_range (env, lvec, make_number (i));
- else
- module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
- return;
- }
+ CHECK_VECTOR (lvec);
+ if (FIXNUM_OVERFLOW_P (i)) xsignal0 (Qoverflow_error);
+ CHECK_RANGED_INTEGER (make_number (i), 0, ASIZE (lvec) - 1);
ASET (lvec, i, value_to_lisp (val));
}
@@ -634,19 +582,9 @@ module_vec_get (emacs_env *env, emacs_value vec, ptrdiff_t i)
{
MODULE_FUNCTION_BEGIN (module_nil);
Lisp_Object lvec = value_to_lisp (vec);
- if (! VECTORP (lvec))
- {
- module_wrong_type (env, Qvectorp, lvec);
- return module_nil;
- }
- if (! (0 <= i && i < ASIZE (lvec)))
- {
- if (MOST_NEGATIVE_FIXNUM <= i && i <= MOST_POSITIVE_FIXNUM)
- module_args_out_of_range (env, lvec, make_number (i));
- else
- module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
- return module_nil;
- }
+ CHECK_VECTOR (lvec);
+ if (FIXNUM_OVERFLOW_P (i)) xsignal0 (Qoverflow_error);
+ CHECK_RANGED_INTEGER (make_number (i), 0, ASIZE (lvec) - 1);
return lisp_to_value (AREF (lvec, i));
}
@@ -656,11 +594,7 @@ module_vec_size (emacs_env *env, emacs_value vec)
/* FIXME: Return a sentinel value (e.g., -1) on error. */
MODULE_FUNCTION_BEGIN (0);
Lisp_Object lvec = value_to_lisp (vec);
- if (! VECTORP (lvec))
- {
- module_wrong_type (env, Qvectorp, lvec);
- return 0;
- }
+ CHECK_VECTOR (lvec);
return ASIZE (lvec);
}
@@ -819,14 +753,6 @@ module_non_local_exit_throw_1 (emacs_env *env, Lisp_Object tag,
}
}
-/* Module version of `wrong_type_argument'. */
-static void
-module_wrong_type (emacs_env *env, Lisp_Object predicate, Lisp_Object value)
-{
- module_non_local_exit_signal_1 (env, Qwrong_type_argument,
- list2 (predicate, value));
-}
-
/* Signal an out-of-memory condition to the caller. */
static void
module_out_of_memory (emacs_env *env)
@@ -837,13 +763,6 @@ module_out_of_memory (emacs_env *env)
XCDR (Vmemory_signal_data));
}
-/* Signal arguments are out of range. */
-static void
-module_args_out_of_range (emacs_env *env, Lisp_Object a1, Lisp_Object a2)
-{
- module_non_local_exit_signal_1 (env, Qargs_out_of_range, list2 (a1, a2));
-}
-
\f
/* Value conversion. */
--
2.6.3
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-19 21:03 ` Philipp Stephani
@ 2015-12-19 22:57 ` Philipp Stephani
2015-12-20 15:47 ` Eli Zaretskii
2015-12-20 15:48 ` Eli Zaretskii
2016-02-29 22:48 ` Philipp Stephani
2 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2015-12-19 22:57 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 2686 bytes --]
Philipp Stephani <p.stephani2@gmail.com> wrote on Sat, 19 Dec 2015 at 22:03:
> Philipp Stephani <p.stephani2@gmail.com> wrote on Sat, 28 Nov 2015 at 11:58:
>
>> Philipp Stephani <p.stephani2@gmail.com> wrote on Fri, 27 Nov 2015 at 20:19:
>>
>>> Eli Zaretskii <eliz@gnu.org> wrote on Fri, 27 Nov 2015 at 08:36:
>>>
>>>> > Cc: emacs-devel@gnu.org
>>>> > From: Paul Eggert <eggert@cs.ucla.edu>
>>>> > Date: Thu, 26 Nov 2015 13:29:49 -0800
>>>> >
>>>> > Eli Zaretskii wrote:
>>>> > > it would be a maintenance burden to have to
>>>> > > analyze upon each such change whether emacs-module.c needs some
>>>> > > augmentation.
>>>> >
>>>> > While that's true in general, I think some exceptions are OK. E.g., it's OK
>>>> > if emacs-module.c assumes that ASIZE is a simple access function or macro
>>>> > that doesn't throw signals. If we actually changed ASIZE to throw signals,
>>>> > there's a boatload of other code we'd need to change as well, and changing
>>>> > emacs-module.c wouldn't add much more to the maintenance burden.
>>>>
>>>> So what are the rules here, exactly? I'd like to write them down in
>>>> the commentary to emacs-module.c, so that any future changes there
>>>> will have lower probability of breaking things.
>>>>
>>>> E.g., can make_number signal an error? What about make_float or
>>>> make_string? And what about accessors like XFLOAT_DATA or AREF?
>>>>
>>>>
>>> Are there any established rules? If not we should probably be
>>> conservative and assume that everything signals. If we figure out that
>>> this introduces an unacceptably high overhead in some situations we can
>>> reconsider later.
>>> I would propose three exceptions: free_global_ref, is_not_nil, eq.
>>> free_global_ref cannot fail in Daniel's design, and implementing it that
>>> way would be consistent with other resource deallocation functions such as
>>> free(3). is_not_nil and eq seem so fundamental that I cannot imagine a
>>> situation where they could ever fail. Documenting that these three cannot
>>> fail would free module authors from the need to check for errors after
>>> calling these functions.
>>>
>>
>> For now I've attached a patch to replace the initial setup of most
>> environment functions with a single macro.
>>
>
> Here's another patch that replaces some of the custom error handling with
> signals. The signals will be immediately caught by the prologue, but some
> complexity and duplication is removed.
>
Added another patch that documents some of the assumptions about non-local
exits. I've added them directly to lisp.h so that people changing these
definitions will hopefully notice.
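
As a rough sketch of why it matters to module authors which operations are
documented as never exiting non-locally: a caller has to check for a pending
error after every call that can fail, but can skip that check after one that
is guaranteed not to. All names below (`struct mini_env`, `struct obj`,
`extract_int`, `eq`, `sum_if_distinct`) are hypothetical and only mimic the
shape of the module API; the real module_eq also takes an environment
argument.

```c
#include <stdbool.h>

/* Hypothetical miniature of a module environment.  */
enum exit_kind { EXIT_RETURN, EXIT_SIGNAL };

struct mini_env
{
  enum exit_kind pending;   /* set when a call has failed */
};

/* Hypothetical tagged object standing in for a Lisp value.  */
struct obj { bool is_int; long n; };
typedef const struct obj *value;

/* Fallible: records a pending error if V is not an integer.  */
static long
extract_int (struct mini_env *env, value v)
{
  if (env->pending != EXIT_RETURN)
    return 0;               /* an earlier error is still pending */
  if (! v->is_int)
    {
      env->pending = EXIT_SIGNAL;
      return 0;
    }
  return v->n;
}

/* Assumed infallible: a plain identity comparison with nothing in it
   that could signal.  */
static bool
eq (value a, value b)
{
  return a == b;
}

/* Module-side caller: checks for an error after each fallible call,
   but not after eq.  */
static bool
sum_if_distinct (struct mini_env *env, value a, value b, long *out)
{
  if (eq (a, b))
    return false;
  long x = extract_int (env, a);
  if (env->pending != EXIT_RETURN)
    return false;
  long y = extract_int (env, b);
  if (env->pending != EXIT_RETURN)
    return false;
  *out = x + y;
  return true;
}
```

If the comparison could fail as well, the caller would need a third check, and
forgetting it would silently continue with a meaningless result.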
[-- Attachment #2: 0001-Document-assumptions-about-non-local-exits.patch --]
[-- Type: application/octet-stream, Size: 4240 bytes --]
From 525fec853289dffcabb313605da7262da1f8ccef Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Sat, 19 Dec 2015 23:51:52 +0100
Subject: [PATCH] Document assumptions about non-local exits.
* lisp.h: Document assumptions about EQ and NILP.
* emacs-module.c (module_non_local_exit_get): Document that we
cannot use the current implementation.
(module_is_not_nil, module_eq): Document assumptions about EQ and
NILP.
---
src/emacs-module.c | 8 ++++++--
src/lisp.h | 9 ++++++++-
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/src/emacs-module.c b/src/emacs-module.c
index ee97644..5d1b4dc 100644
--- a/src/emacs-module.c
+++ b/src/emacs-module.c
@@ -325,6 +325,8 @@ module_non_local_exit_get (emacs_env *env, emacs_value *sym, emacs_value *data)
struct emacs_env_private *p = env->private_members;
if (p->pending_non_local_exit != emacs_funcall_exit_return)
{
+ /* FIXME: We cannot call lisp_to_value here because that can
+ exit non-locally. */
*sym = lisp_to_value (p->non_local_exit_symbol);
*data = lisp_to_value (p->non_local_exit_data);
}
@@ -434,6 +436,7 @@ module_is_not_nil (emacs_env *env, emacs_value value)
check_main_thread ();
if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
return false;
+ /* Assume that NILP never exits non-locally. */
return ! NILP (value_to_lisp (value));
}
@@ -443,6 +446,7 @@ module_eq (emacs_env *env, emacs_value a, emacs_value b)
check_main_thread ();
if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
return false;
+ /* Assume that EQ never exits non-locally. */
return EQ (value_to_lisp (a), value_to_lisp (b));
}
@@ -889,7 +893,7 @@ value_to_lisp_bits (emacs_value v)
}
/* If V was computed from lisp_to_value (O), then return O.
- Never fails. */
+ Must never fail or exit non-locally. */
static Lisp_Object
value_to_lisp (emacs_value v)
{
@@ -919,7 +923,7 @@ enum { HAVE_STRUCT_ATTRIBUTE_ALIGNED = 0 };
#endif
/* Convert O to an emacs_value. Allocate storage if needed; this can
- signal if memory is exhausted. */
+ signal if memory is exhausted. Must be injective. */
static emacs_value
lisp_to_value (Lisp_Object o)
{
diff --git a/src/lisp.h b/src/lisp.h
index 995760a..f1d761e 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -329,11 +329,13 @@ error !;
#define lisp_h_CHECK_TYPE(ok, predicate, x) \
((ok) ? (void) 0 : (void) wrong_type_argument (predicate, x))
#define lisp_h_CONSP(x) (XTYPE (x) == Lisp_Cons)
+/* lisp_h_EQ must never exit non-locally. */
#define lisp_h_EQ(x, y) (XLI (x) == XLI (y))
#define lisp_h_FLOATP(x) (XTYPE (x) == Lisp_Float)
#define lisp_h_INTEGERP(x) ((XTYPE (x) & (Lisp_Int0 | ~Lisp_Int1)) == Lisp_Int0)
#define lisp_h_MARKERP(x) (MISCP (x) && XMISCTYPE (x) == Lisp_Misc_Marker)
#define lisp_h_MISCP(x) (XTYPE (x) == Lisp_Misc)
+/* lisp_h_NILP must never exit non-locally. */
#define lisp_h_NILP(x) EQ (x, Qnil)
#define lisp_h_SET_SYMBOL_VAL(sym, v) \
(eassert ((sym)->redirect == SYMBOL_PLAINVAL), (sym)->val.value = (v))
@@ -382,11 +384,13 @@ error !;
# define CHECK_SYMBOL(x) lisp_h_CHECK_SYMBOL (x)
# define CHECK_TYPE(ok, predicate, x) lisp_h_CHECK_TYPE (ok, predicate, x)
# define CONSP(x) lisp_h_CONSP (x)
+/* EQ must never exit non-locally. */
# define EQ(x, y) lisp_h_EQ (x, y)
# define FLOATP(x) lisp_h_FLOATP (x)
# define INTEGERP(x) lisp_h_INTEGERP (x)
# define MARKERP(x) lisp_h_MARKERP (x)
# define MISCP(x) lisp_h_MISCP (x)
+/* NILP must never exit non-locally. */
# define NILP(x) lisp_h_NILP (x)
# define SET_SYMBOL_VAL(sym, v) lisp_h_SET_SYMBOL_VAL (sym, v)
# define SYMBOL_CONSTANT_P(sym) lisp_h_SYMBOL_CONSTANT_P (sym)
@@ -988,7 +992,8 @@ make_natnum (EMACS_INT n)
return USE_LSB_TAG ? make_number (n) : XIL (n + (int0 << VALBITS));
}
-/* Return true if X and Y are the same object. */
+/* Return true if X and Y are the same object. Must never exit
+ non-locally. */
INLINE bool
(EQ) (Lisp_Object x, Lisp_Object y)
@@ -2565,6 +2570,8 @@ enum char_bits
\f
/* Data type checking. */
+/* Checks whether X is null. Must never exit non-locally. */
+
INLINE bool
(NILP) (Lisp_Object x)
{
--
2.6.4
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-19 22:57 ` Philipp Stephani
@ 2015-12-20 15:47 ` Eli Zaretskii
2015-12-20 18:34 ` Philipp Stephani
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-20 15:47 UTC (permalink / raw)
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 19 Dec 2015 22:57:14 +0000
> Cc: dancol@dancol.org, tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com,
> emacs-devel@gnu.org
>
> Added another patch that documents some of the assumptions about non-local
> exits. I've added them directly to lisp.h so that people changing these
> definitions will hopefully notice.
Thanks, but please mention in these comments that emacs-module.c
relies on these requirements.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-19 21:03 ` Philipp Stephani
2015-12-19 22:57 ` Philipp Stephani
@ 2015-12-20 15:48 ` Eli Zaretskii
2015-12-20 18:27 ` Philipp Stephani
2016-02-29 22:48 ` Philipp Stephani
2 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-20 15:48 UTC (permalink / raw)
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 19 Dec 2015 21:03:32 +0000
> Cc: dancol@dancol.org, tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com,
> emacs-devel@gnu.org
>
> Here's another patch that replaces some of the custom error handling with
> signals. The signals will be immediately caught by the prologue, but some
> complexity and duplication is removed.
Thanks. Did you run the mod-test test suite, and did you make sure
that signaling an error from a module still reports the same
information and backtrace as before the change?
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 15:48 ` Eli Zaretskii
@ 2015-12-20 18:27 ` Philipp Stephani
2015-12-20 19:00 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2015-12-20 18:27 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 925 bytes --]
Eli Zaretskii <eliz@gnu.org> wrote on Sun, 20 Dec 2015 at 16:48:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 19 Dec 2015 21:03:32 +0000
> > Cc: dancol@dancol.org, tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com,
> > emacs-devel@gnu.org
> >
> > Here's another patch that replaces some of the custom error handling with
> > signals. The signals will be immediately caught by the prologue, but some
> > complexity and duplication is removed.
>
> Thanks. Did you run the mod-test test suite,
yes
> and did you make sure
> that signaling an error from a module still reports the same
> information and backtrace as before the change?
>
No, but I wouldn't know how the patch should change this. It only changes
an implementation detail in the error reporting code of the module
implementation itself.
If we wanted to check the backtrace, I think we should rather add a unit
test for it.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 15:47 ` Eli Zaretskii
@ 2015-12-20 18:34 ` Philipp Stephani
2015-12-20 19:11 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2015-12-20 18:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 605 bytes --]
Eli Zaretskii <eliz@gnu.org> wrote on Sun, 20 Dec 2015 at 16:47:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 19 Dec 2015 22:57:14 +0000
> > Cc: dancol@dancol.org, tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com,
> > emacs-devel@gnu.org
> >
> > Added another patch that documents some of the assumptions about non-local
> > exits. I've added them directly to lisp.h so that people changing these
> > definitions will hopefully notice.
>
> Thanks, but please mention in these comments that emacs-module.c
> relies on these requirements.
>
>
Done. Attached new patch.
[-- Attachment #2: 0001-Document-assumptions-about-non-local-exits.patch --]
[-- Type: application/octet-stream, Size: 4448 bytes --]
From a5c8911cae67617002a9743e19b6c9ac748fb889 Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Sat, 19 Dec 2015 23:51:52 +0100
Subject: [PATCH] Document assumptions about non-local exits.
* lisp.h: Document assumptions about EQ and NILP.
* emacs-module.c (module_non_local_exit_get): Document that we
cannot use the current implementation.
(module_is_not_nil, module_eq): Document assumptions about EQ and
NILP.
---
src/emacs-module.c | 8 ++++++--
src/lisp.h | 13 ++++++++++++-
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/emacs-module.c b/src/emacs-module.c
index ee97644..5d1b4dc 100644
--- a/src/emacs-module.c
+++ b/src/emacs-module.c
@@ -325,6 +325,8 @@ module_non_local_exit_get (emacs_env *env, emacs_value *sym, emacs_value *data)
struct emacs_env_private *p = env->private_members;
if (p->pending_non_local_exit != emacs_funcall_exit_return)
{
+ /* FIXME: We cannot call lisp_to_value here because that can
+ exit non-locally. */
*sym = lisp_to_value (p->non_local_exit_symbol);
*data = lisp_to_value (p->non_local_exit_data);
}
@@ -434,6 +436,7 @@ module_is_not_nil (emacs_env *env, emacs_value value)
check_main_thread ();
if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
return false;
+ /* Assume that NILP never exits non-locally. */
return ! NILP (value_to_lisp (value));
}
@@ -443,6 +446,7 @@ module_eq (emacs_env *env, emacs_value a, emacs_value b)
check_main_thread ();
if (module_non_local_exit_check (env) != emacs_funcall_exit_return)
return false;
+ /* Assume that EQ never exits non-locally. */
return EQ (value_to_lisp (a), value_to_lisp (b));
}
@@ -889,7 +893,7 @@ value_to_lisp_bits (emacs_value v)
}
/* If V was computed from lisp_to_value (O), then return O.
- Never fails. */
+ Must never fail or exit non-locally. */
static Lisp_Object
value_to_lisp (emacs_value v)
{
@@ -919,7 +923,7 @@ enum { HAVE_STRUCT_ATTRIBUTE_ALIGNED = 0 };
#endif
/* Convert O to an emacs_value. Allocate storage if needed; this can
- signal if memory is exhausted. */
+ signal if memory is exhausted. Must be injective. */
static emacs_value
lisp_to_value (Lisp_Object o)
{
diff --git a/src/lisp.h b/src/lisp.h
index 995760a..bcac4b6 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -329,11 +329,15 @@ error !;
#define lisp_h_CHECK_TYPE(ok, predicate, x) \
((ok) ? (void) 0 : (void) wrong_type_argument (predicate, x))
#define lisp_h_CONSP(x) (XTYPE (x) == Lisp_Cons)
+/* lisp_h_EQ must never exit non-locally; emacs-module.c relies on
+ that. */
#define lisp_h_EQ(x, y) (XLI (x) == XLI (y))
#define lisp_h_FLOATP(x) (XTYPE (x) == Lisp_Float)
#define lisp_h_INTEGERP(x) ((XTYPE (x) & (Lisp_Int0 | ~Lisp_Int1)) == Lisp_Int0)
#define lisp_h_MARKERP(x) (MISCP (x) && XMISCTYPE (x) == Lisp_Misc_Marker)
#define lisp_h_MISCP(x) (XTYPE (x) == Lisp_Misc)
+/* lisp_h_NILP must never exit non-locally; emacs-module.c relies on
+ that. */
#define lisp_h_NILP(x) EQ (x, Qnil)
#define lisp_h_SET_SYMBOL_VAL(sym, v) \
(eassert ((sym)->redirect == SYMBOL_PLAINVAL), (sym)->val.value = (v))
@@ -382,11 +386,14 @@ error !;
# define CHECK_SYMBOL(x) lisp_h_CHECK_SYMBOL (x)
# define CHECK_TYPE(ok, predicate, x) lisp_h_CHECK_TYPE (ok, predicate, x)
# define CONSP(x) lisp_h_CONSP (x)
+/* EQ must never exit non-locally; emacs-module.c relies on that. */
# define EQ(x, y) lisp_h_EQ (x, y)
# define FLOATP(x) lisp_h_FLOATP (x)
# define INTEGERP(x) lisp_h_INTEGERP (x)
# define MARKERP(x) lisp_h_MARKERP (x)
# define MISCP(x) lisp_h_MISCP (x)
+/* NILP must never exit non-locally; emacs-module.c relies on
+ that. */
# define NILP(x) lisp_h_NILP (x)
# define SET_SYMBOL_VAL(sym, v) lisp_h_SET_SYMBOL_VAL (sym, v)
# define SYMBOL_CONSTANT_P(sym) lisp_h_SYMBOL_CONSTANT_P (sym)
@@ -988,7 +995,8 @@ make_natnum (EMACS_INT n)
return USE_LSB_TAG ? make_number (n) : XIL (n + (int0 << VALBITS));
}
-/* Return true if X and Y are the same object. */
+/* Return true if X and Y are the same object. Must never exit
+ non-locally; emacs-module.c relies on that. */
INLINE bool
(EQ) (Lisp_Object x, Lisp_Object y)
@@ -2565,6 +2573,9 @@ enum char_bits
\f
/* Data type checking. */
+/* Checks whether X is null. Must never exit non-locally;
+ emacs-module.c relies on that. */
+
INLINE bool
(NILP) (Lisp_Object x)
{
--
2.6.3
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 18:27 ` Philipp Stephani
@ 2015-12-20 19:00 ` Eli Zaretskii
2015-12-20 21:00 ` Philipp Stephani
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-20 19:00 UTC (permalink / raw)
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 20 Dec 2015 18:27:19 +0000
> Cc: eggert@cs.ucla.edu, dancol@dancol.org, tzz@lifelogs.com,
> aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
>
> > and did you make sure
> > that signaling an error from a module still reports the same
> > information and backtrace as before the change?
>
> No, but I wouldn't know how the patch should change this. It only changes an
> implementation detail in the error reporting code of the module implementation
> itself.
I provided one such test here:
http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02318.html
and the resulting backtrace here:
http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02405.html
> If we wanted to check the backtrace, I think we should rather add a unit test
> for it.
That would be good, thanks.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 18:34 ` Philipp Stephani
@ 2015-12-20 19:11 ` Eli Zaretskii
2015-12-20 21:40 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-20 19:11 UTC (permalink / raw)
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 20 Dec 2015 18:34:03 +0000
> Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, eggert@cs.ucla.edu,
> dancol@dancol.org, emacs-devel@gnu.org
>
> > Thanks, but please mention in these comments that emacs-module.c
> > relies on these requirements.
>
> Done. Attached new patch.
Thanks, pushed.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 19:00 ` Eli Zaretskii
@ 2015-12-20 21:00 ` Philipp Stephani
2017-03-26 20:18 ` Philipp Stephani
0 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2015-12-20 21:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 1022 bytes --]
Eli Zaretskii <eliz@gnu.org> wrote on Sun, 20 Dec 2015 at 19:59:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sun, 20 Dec 2015 18:27:19 +0000
> > Cc: eggert@cs.ucla.edu, dancol@dancol.org, tzz@lifelogs.com,
> > aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> > and did you make sure
> > that signaling an error from a module still reports the same
> > information and backtrace as before the change?
> >
> > No, but I wouldn't know how the patch should change this. It only changes an
> > implementation detail in the error reporting code of the module implementation
> > itself.
>
> I provided one such test here:
>
> http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02318.html
>
> and the resulting backtrace here:
>
> http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02405.html
>
> > If we wanted to check the backtrace, I think we should rather add a unit test
> > for it.
>
> That would be good, thanks.
>
>
Added a patch to that effect.
[-- Attachment #2: 0001-Add-check-for-expected-backtrace-in-module-calls.patch --]
[-- Type: application/octet-stream, Size: 1449 bytes --]
From fbe75ba6089cdceb09507b383c0dcb0b70479c9b Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Sun, 20 Dec 2015 21:59:10 +0100
Subject: [PATCH] Add check for expected backtrace in module calls.
* test.el (mod-test-non-local-exit-signal-test): Compare actual
backtrace to expected backtrace.
---
modules/mod-test/test.el | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/modules/mod-test/test.el b/modules/mod-test/test.el
index 69bf933..f8c16d3 100644
--- a/modules/mod-test/test.el
+++ b/modules/mod-test/test.el
@@ -64,7 +64,21 @@
;;
(ert-deftest mod-test-non-local-exit-signal-test ()
- (should-error (mod-test-signal)))
+ (should-error (mod-test-signal))
+ (let (debugger-args backtrace)
+ (should-error
+ (let ((debugger (lambda (&rest args)
+ (setq debugger-args args
+ backtrace (with-output-to-string (backtrace)))
+ (cl-incf num-nonmacro-input-events)))
+ (debug-on-signal t))
+ (mod-test-signal)))
+ (should (equal debugger-args '(error (error . 56))))
+ (should (string-match-p
+ (rx bol " internal--module-call(" (+ nonl) ?\) ?\n
+ " apply(internal--module-call " (+ nonl) ?\) ?\n
+ " mod-test-signal()" eol)
+ backtrace))))
(ert-deftest mod-test-non-local-exit-throw-test ()
(should (equal
--
2.6.3
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 19:11 ` Eli Zaretskii
@ 2015-12-20 21:40 ` Paul Eggert
2015-12-21 3:33 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-20 21:40 UTC (permalink / raw)
To: Eli Zaretskii, Philipp Stephani
Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Eli Zaretskii wrote:
>> From: Philipp Stephani <p.stephani2@gmail.com>
>> Date: Sun, 20 Dec 2015 18:34:03 +0000
>> Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, eggert@cs.ucla.edu,
>> dancol@dancol.org, emacs-devel@gnu.org
>>
>> Thanks, but please mention in these comments that emacs-module.c
>> relies on these requirements.
>>
>> Done. Attached new patch.
>
> Thanks, pushed.
These newly-added comments are incorrect. Any of these functions can exit
nonlocally due to stack overflow. I assume it's OK to revert them?
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 21:40 ` Paul Eggert
@ 2015-12-21 3:33 ` Eli Zaretskii
2015-12-21 11:00 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-21 3:33 UTC (permalink / raw)
To: Paul Eggert; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, dancol@dancol.org,
> emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 20 Dec 2015 13:40:36 -0800
>
> Eli Zaretskii wrote:
> >> From: Philipp Stephani <p.stephani2@gmail.com>
> >> Date: Sun, 20 Dec 2015 18:34:03 +0000
> >> Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, eggert@cs.ucla.edu,
> >> dancol@dancol.org, emacs-devel@gnu.org
> >>
> >> Thanks, but please mention in these comments that emacs-module.c
> >> relies on these requirements.
> >>
> >> Done. Attached new patch.
> >
> > Thanks, pushed.
>
> These newly-added comments are incorrect. Any of these functions can exit
> nonlocally due to stack overflow. I assume it's OK to revert them?
Then we either cannot use them in emacs-module.c, or should place
comments there to the effect that we disregard that possibility (and
maybe augment the comments in lisp.h saying that stack overflow is the
only allowed reason for non-local exits). Which one do we prefer?
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 3:33 ` Eli Zaretskii
@ 2015-12-21 11:00 ` Paul Eggert
2015-12-21 11:21 ` Yuri Khan
2015-12-21 15:46 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Paul Eggert @ 2015-12-21 11:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
Eli Zaretskii wrote:
> Then we either cannot use them in emacs-module.c,
The lisp.h functions are no different from any other functions in this respect,
so this is not a lisp.h problem. Stack overflow can occur with any function
call, even functions defined within emacs-module.c itself.
> or should place
> comments there to the effect that we disregard that possibility (and
> maybe augment the comments in lisp.h saying that stack overflow is the
> only allowed reason for non-local exits). Which one do we prefer?
If emacs-module.c is incompatible with stack overflow detection, then we should
put a FIXME comment in emacs-module.c saying so, as this is a
recently-introduced bug that should get fixed. I did this just now.
Perhaps someone with C++ expertise can chime in and say how C++ programs are
expected to deal with stack overflow. Whatever mechanism is used, Emacs should
be using it.
Come to think of it, there must be a better way for emacs-module.c to deal with
C++ exceptions. The current approach is complex, confusing and error-prone. If
emacs-module.c or the Emacs exception-handling mechanism really needs to be
rewritten in C++ in order to catch C++ exceptions nicely, then somebody with C++
expertise should do that. (Optionally, of course; it should be OK to have a
C-only Emacs that does not worry about C++ modules.) If done well, that should
simplify the module interface considerably, for both C and C++.
If this simplification cannot be done, one possible workaround would be to say
that C++ modules should not use nontrivial destructors. This would be a
different way to let us simplify emacs-module.c significantly, albeit a way that
is less satisfactory for C++ modules.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 11:00 ` Paul Eggert
@ 2015-12-21 11:21 ` Yuri Khan
2015-12-21 11:34 ` Paul Eggert
2015-12-21 15:46 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Yuri Khan @ 2015-12-21 11:21 UTC (permalink / raw)
To: Paul Eggert
Cc: tzz, Emacs developers, Philipp Stephani, Aurélien Aptel,
Eli Zaretskii, Daniel Colascione
On Mon, Dec 21, 2015 at 5:00 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
> Come to think of it, there must be a better way for emacs-module.c to deal
> with C++ exceptions. The current approach is complex, confusing and
> error-prone. If emacs-module.c or the Emacs exception-handling mechanism
> really needs to be rewritten in C++ in order to catch C++ exceptions nicely,
> then somebody with C++ expertise should do that. (Optionally, of course; it
> should be OK to have a C-only Emacs that does not worry about C++ modules.)
> If done well, that should simplify the module interface considerably, for
> both C and C++.
On the Emacs side of the module API, Emacs should not have to
implement any special treatment for C++ exceptions. (Because if it
does, where do we draw the line? Should Emacs then handle any other
languages’ exceptions?)
A sane approach is to have a C-only API, and module developers who
want to write modules in other languages will devise their own
wrappers/adaptors to make interfacing with Emacs easier and less
error-prone. As part of its contract, such an adaptor shall prevent
any exceptions from crossing the module boundary.
> If this simplification cannot be done, one possible workaround would be to
> say that C++ modules should not use nontrivial destructors. This would be a
> different way to let us simplify emacs-module.c significantly, albeit a way
> that is less satisfactory for C++ modules.
I’m lacking full context here; do you mean not using nontrivial
destructors *at all*? This would amount to banning C++ as a possible
module development language, as nontrivial, deterministically invoked
destructors are its very essence.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 11:21 ` Yuri Khan
@ 2015-12-21 11:34 ` Paul Eggert
0 siblings, 0 replies; 177+ messages in thread
From: Paul Eggert @ 2015-12-21 11:34 UTC (permalink / raw)
To: Yuri Khan
Cc: tzz, Emacs developers, Philipp Stephani, Aurélien Aptel,
Eli Zaretskii, Daniel Colascione
Yuri Khan wrote:
> On the Emacs side of the module API, Emacs should not have to
> implement any special treatment for C++ exceptions. (Because if it
> does, where do we draw the line? Should Emacs then handle any other
> languages’ exceptions?)
This approach would be fine, if we can figure out how to solve the
exception-handling problem.
> A sane approach is to have a C-only API, and module developers who
> want to write modules in other languages will devise their own
> wrappers/adaptors to make interfacing with Emacs easier and less
> error-prone. As part of its contract, such an adaptor shall prevent
> any exceptions from crossing the module boundary.
That sounds reasonable, and should simplify the module API, at least for
C-language modules.
> > If this simplification cannot be done, one possible workaround would be to
> > say that C++ modules should not use nontrivial destructors. This would be a
> > different way to let us simplify emacs-module.c significantly, albeit a way
> > that is less satisfactory for C++ modules.
> I’m lacking full context here; do you mean not using nontrivial
> destructors *at all*?
I'm lacking context too, but I suppose the answer might be "yes", or at least
"yes, unless you can guarantee by some other means that stack overflow cannot
occur". Which does indeed sound unsatisfactory.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 11:00 ` Paul Eggert
2015-12-21 11:21 ` Yuri Khan
@ 2015-12-21 15:46 ` Eli Zaretskii
2015-12-21 18:15 ` Paul Eggert
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-21 15:46 UTC (permalink / raw)
To: Paul Eggert; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, dancol@dancol.org,
> tzz@lifelogs.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 21 Dec 2015 03:00:25 -0800
>
> > Then we either cannot use them in emacs-module.c,
>
> The lisp.h functions are no different from any other functions in this respect, so this is not a lisp.h problem. Stack overflow can occur with any function call, even functions defined within emacs-module.c itself.
This discussion started so long ago that we've lost the perspective,
and its main point is no longer in our sights. Let me restate it.
A long time ago in a galaxy far, far away, I wrote in
http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02190.html:
> > . Why don't some functions use any of these macros, although they do
> > call Emacs functions? Examples include module_make_integer (calls
> > make_number), and module_make_user_ptr (calls make_user_ptr).
> >
> > . It seems like emacs-module.c assumes something about certain Emacs
> > functions, and based on that decides not to use these macros even
> > when calling those Emacs functions. For example, module_vec_get
> > calls ASIZE and AREF, but doesn't use the MODULE_HANDLE_* macros.
> > Evidently, it assumes that neither ASIZE nor AREF will ever signal
> > or throw. But isn't that a fragile assumption? The
> > implementation of internal Emacs functions is subject to change
> > without notice, and it would be a maintenance burden to have to
> > analyze upon each such change whether emacs-module.c needs some
> > augmentation.
And you replied in
http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02270.html:
> While that's true in general, I think some exceptions are OK. E.g., it's OK if emacs-module.c assumes that ASIZE is a simple access function or macro that doesn't throw signals. If we actually changed ASIZE to throw signals, there's a boatload of other code we'd need to change as well, and changing emacs-module.c wouldn't add much more to the maintenance burden.
Philipp's comments that you reverted were the logical consequence of
the apparent conclusion we drew from the above exchange: that it was
okay to assume these functions/macros cannot throw signals. Are you
now saying something different from what you said back then, i.e. that
we cannot rely on any function/macro from lisp.h to be signal-safe?
If so, we should add the necessary protection, in the form of calls to
MODULE_FUNCTION_BEGIN, to emacs-module.c functions that until now
relied on those lisp.h functions/macros to be safe.
> If emacs-module.c is incompatible with stack overflow detection, then we should put a FIXME comment in emacs-module.c saying so, as this is a recently-introduced bug that should get fixed. I did this just now.
> Perhaps someone with C++ expertise can chime in and say how C++ programs are expected to deal with stack overflow. Whatever mechanism is used, Emacs should be using it.
I have no idea how (or even if) C++ code can cope with stack overflow,
except by leaking all the objects that were alive at the point where
we detected stack overflow. AFAIK, proper C++ exception handling
requires non-trivial amounts of stack space that is not available when
there's stack overflow, where you have at most a single guard page to
work with.
> Come to think of it, there must be a better way for emacs-module.c to deal with C++ exceptions. The current approach is complex, confusing and error-prone. If emacs-module.c or the Emacs exception-handling mechanism really needs to be rewritten in C++ in order to catch C++ exceptions nicely, then somebody with C++ expertise should do that. (Optionally, of course; it should be OK to have a C-only Emacs that does not worry about C++ modules.) If done well, that should simplify the module interface considerably, for both C and C++.
I think there is some misunderstanding here, or some confusion,
perhaps mine: emacs-module.c is not supposed to deal with any C++
exceptions. C++ exceptions are supposed to be caught at the C++
level, below emacs-module.c, and handled there. An exception that
isn't caught will be recorded and will cause all the subsequent calls
to Lisp or to emacs-module.c functions to fail, until we return back to
the Lisp interpreter that called the module function. But it will not
cause any C++-specific trouble.
What emacs-module.c does with non-local exits of _any_ kind is record
the first occurrence of such an exit, and silently return to the
caller, thus allowing the C++ objects on the stack to be destroyed
normally. IOW, it defers the exit until internal--module-call is
about to return. What problems do you see with that which cause you
to think it's error-prone, let alone dysfunctional?
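
The record-the-first-exit scheme described above can be illustrated with a minimal, self-contained sketch. This is not the emacs-module.c code: the names struct env, record_exit and checked_div are hypothetical stand-ins, and the sketch assumes only what the paragraph states, namely that the first non-local exit is stored in a slot and that every later call returns normally without doing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; the real emacs-module.c
   types and names differ.  */
enum exit_kind { EXIT_RETURN, EXIT_SIGNAL, EXIT_THROW };

struct env
{
  enum exit_kind pending;  /* EXIT_RETURN means the slot is vacant.  */
  const char *symbol;      /* Error symbol or catch tag of the first exit.  */
};

/* Record a non-local exit, but only if the slot is still vacant,
   so that only the first occurrence is kept.  */
static void
record_exit (struct env *env, enum exit_kind kind, const char *symbol)
{
  assert (kind != EXIT_RETURN);
  if (env->pending == EXIT_RETURN)
    {
      env->pending = kind;
      env->symbol = symbol;
    }
}

/* An API function that can fail.  It always returns by an ordinary
   C return, so destructors in a C++ caller would run as usual.  */
static int
checked_div (struct env *env, int a, int b)
{
  if (env->pending != EXIT_RETURN)
    return 0;                       /* An exit is pending: do nothing.  */
  if (b == 0)
    {
      record_exit (env, EXIT_SIGNAL, "arith-error");
      return 0;                     /* Silent, local return.  */
    }
  return a / b;
}
```

In this sketch the caller that owns the environment would inspect env->pending once the module function has returned and only then perform the deferred exit.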
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 15:46 ` Eli Zaretskii
@ 2015-12-21 18:15 ` Paul Eggert
2015-12-21 18:28 ` Daniel Colascione
` (2 more replies)
0 siblings, 3 replies; 177+ messages in thread
From: Paul Eggert @ 2015-12-21 18:15 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
Eli Zaretskii wrote:
> Are you
> now saying something different from what you said back then, i.e. that
> we cannot rely on any function/macro from lisp.h to be signal-safe?
Yes and no. As I understood it, that old conversation was about functions that
explicitly signal or throw, and it's safe to assume that EQ, NILP, etc. won't do
that. The new conversation is about running out of memory, which is a different
form of non-local exit. There may be other forms, such as operating-system
signals (I don't recall exactly).
> If so, we should add the necessary protection, in the form of calls to
> MODULE_FUNCTION_BEGIN, to emacs-module.c functions that until now
> relied on those lisp.h functions/macros to be safe.
This wouldn't suffice for these other non-local exits, I think; at least, not as
currently constructed.
> AFAIK, proper C++ exception handling
> requires non-trivial amounts of stack space that is not available when
> there's stack overflow, where you have at most a single guard page to
> work with.
There should be workarounds for that. Surely the C++ community has run into this
problem and has solutions. If we want to support C++ modules, we need to employ
them.
> I think there is some misunderstanding here, or some confusion,
> perhaps mine: emacs-module.c is not supposed to deal with any C++
> exceptions. C++ exceptions are supposed to be caught at the C++
> level, below emacs-module.c, and handled there. An exception that
> isn't caught will be recorded and will cause all the subsequent calls
> to Lisp or to emacs-module.c function to fail,
Why bother? If C++ exceptions are supposed to be caught by the C++ module in
question, why does Emacs need to worry about C++ exceptions that are not caught?
> What emacs-module.c does with non-local exits of _any_ kind is record
> the first occurrence of such an exit, and silently return to the
> caller, thus allowing the C++ objects on the stack to be destroyed
> normally. IOW, it defers the exit until internal--module-call is
> about to return. What problems do you see with that which cause you
> to think it's error-prone, let alone dysfunctional?
It uses a different model at the C level from what one sees in Elisp, or from
what one normally sees in C for that matter. I don't feel that I will really
understand the model unless I see some actual modules that do function calls and
exception handling; but it's hard to believe that a model that does silent
returns and that defers returns until later and that records some returns but
not others will be problem-free. Wouldn't it be simpler to have a module invoke
analogs of 'condition-case' and/or 'catch', and to dispense with the
funcall_exit stuff?
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 18:15 ` Paul Eggert
@ 2015-12-21 18:28 ` Daniel Colascione
2015-12-21 19:00 ` Eli Zaretskii
` (2 more replies)
2015-12-21 18:57 ` Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
2015-12-21 20:15 ` Philipp Stephani
2 siblings, 3 replies; 177+ messages in thread
From: Daniel Colascione @ 2015-12-21 18:28 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
On 12/21/2015 10:15 AM, Paul Eggert wrote:
> Eli Zaretskii wrote:
>
>> Are you
>> now saying something different from what you said back then, i.e. that
>> we cannot rely on any function/macro from lisp.h to be signal-safe?
>
> Yes and no. As I understood it, that old conversation was about
> functions that explicitly signal or throw, and it's safe to assume that
> EQ, NILP, etc. won't do that. The new conversation is about running out
> of memory, which is a different form of non-local exit.
IMHO, we should treat OOM exactly like other sorts of error. It's
dangerous to make some functions infallible.
> There may be
> other forms, such as operating-system signals (I don't recall exactly).
OS signals should go through the usual Emacs event loop, right?
>> If so, we should add the necessary protection, in the form of calls to
>> MODULE_FUNCTION_BEGIN, to emacs-module.c functions that until now
>> relied on those lisp.h functions/macros to be safe.
>
> This wouldn't suffice for these other non-local exits, I think; at
> least, not as currently constructed.
>
>> AFAIK, proper C++ exception handling
>> requires non-trivial amounts of stack space that is not available when
>> there's stack overflow, where you have at most a single guard page to
>> work with.
>
> There should be workarounds for that. Surely the C++ community has run
> into this problem and has solutions. If we want to support C++ modules,
> we need to employ them.
The standard requires runtimes to reserve enough memory to throw
std::bad_alloc. All Emacs has to do is make sure control flow reaches
the C++ level.
>> I think there is some misunderstanding here, or some confusion,
>> perhaps mine: emacs-module.c is not supposed to deal with any C++
>> exceptions. C++ exceptions are supposed to be caught at the C++
>> level, below emacs-module.c, and handled there. An exception that
>> isn't caught will be recorded and will cause all the subsequent calls
>> to Lisp or to emacs-module.c function to fail,
>
> Why bother? If C++ exceptions are supposed to be caught by the C++
> module in question, why does Emacs need to worry about C++ exceptions
> that are not caught?
IMHO, it should be the module's job to make sure C++ exceptions don't
propagate through Emacs stack frames. Emacs shouldn't know or care about
C++ exceptions in any way.
You previously wrote that,
> If emacs-module.c or the Emacs exception-handling mechanism really
> needs to be rewritten in C++ in order to catch C++ exceptions nicely,
> then somebody with C++ expertise should do that.
I think there's a fundamental misunderstanding here. Emacs signals and
C++ exceptions are completely separate mechanisms. There's no reason
Emacs has to care about C++ at all. When we say that we want C++
exceptions to work, the Emacs-relevant meaning is that Emacs should
always use conventional local returns, not longjmp, so that the
stack unwinding facilities of other languages (e.g., C++) work properly.
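
To make the "local returns, not longjmp" point concrete, here is a rough sketch in plain C of a boundary function that catches a longjmp-style exit and turns it into an ordinary return value. All names (current_handler, signal_error, harmless, call_protected) are invented for this illustration and are not the Emacs internals; the sketch assumes a single global handler pointer for simplicity.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Innermost active handler (hypothetical; one global for simplicity).  */
static jmp_buf *current_handler;

/* Stand-in for an internal function that exits non-locally.  */
static void
signal_error (void)
{
  assert (current_handler != NULL);
  longjmp (*current_handler, 1);
}

/* Stand-in for an internal function that returns normally.  */
static void
harmless (void)
{
}

/* Boundary function: any longjmp raised inside FN lands here, so the
   caller only ever sees a conventional return.  Returns true if FN
   returned normally, false if it exited via longjmp.  */
static bool
call_protected (void (*fn) (void))
{
  jmp_buf handler;
  jmp_buf *outer = current_handler;  /* Not modified after setjmp.  */
  bool ok;

  current_handler = &handler;
  if (setjmp (handler) == 0)
    {
      fn ();
      ok = true;
    }
  else
    ok = false;                      /* Reached via longjmp.  */
  current_handler = outer;
  return ok;
}
```

Because the jump never travels past call_protected, frames above it (for example C++ frames with destructors in a module) are unwound only by ordinary returns.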
>> What emacs-module.c does with non-local exits of _any_ kind is record
>> the first occurrence of such an exit, and silently return to the
>> caller, thus allowing the C++ objects on the stack to be destroyed
>> normally. IOW, it defers the exit until internal--module-call is
>> about to return. What problems do you see with that which cause you
>> to think it's error-prone, let alone dysfunctional?
>
> It uses a different model at the C level from what one sees in Elisp, or
> from what one normally sees in C for that matter. I don't feel that I
> will really understand the model unless I see some actual modules that
> do function calls and exception handling; but it's hard to believe that
> a model that does silent returns and that defers returns until later and
> that records some returns but not others will be problem-free. Wouldn't
> it be simpler to have a module invoke analogs of 'condition-case' and/or
> 'catch', and to dispense with the funcall_exit stuff?
>
Both the Python and Java extension APIs implement high-level exceptions
with low-level state exactly the way we're talking about here, and the
result has been generally usable.
Of course it's possible for module authors (or Emacs developers) to
introduce bugs with either model, but using an explicit error indication
results in bugs that are easier to notice and easier to fix.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 18:15 ` Paul Eggert
2015-12-21 18:28 ` Daniel Colascione
@ 2015-12-21 18:57 ` Eli Zaretskii
2015-12-21 20:15 ` Philipp Stephani
2 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-21 18:57 UTC (permalink / raw)
To: Paul Eggert; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, dancol@dancol.org,
> tzz@lifelogs.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 21 Dec 2015 10:15:02 -0800
>
> Eli Zaretskii wrote:
>
> > Are you
> > now saying something different from what you said back then, i.e. that
> > we cannot rely on any function/macro from lisp.h to be signal-safe?
>
> Yes and no. As I understood it, that old conversation was about functions that
> explicitly signal or throw, and it's safe to assume that EQ, NILP, etc. won't do
> that. The new conversation is about running out of memory, which is a different
> form of non-local exit. There may be other forms, such as operating-system
> signals (I don't recall exactly).
The old and the new conversations are about the same: non-local exits
in functions defined by lisp.h and other Emacs sources.
> > If so, we should add the necessary protection, in the form of calls to
> > MODULE_FUNCTION_BEGIN, to emacs-module.c functions that until now
> > relied on those lisp.h functions/macros to be safe.
>
> This wouldn't suffice for these other non-local exits, I think; at least, not as
> currently constructed.
So the comments you deleted were correct: emacs-module.c can and does
rely on those functions and macros not to signal or throw.
> > AFAIK, proper C++ exception handling
> > requires non-trivial amounts of stack space that is not available when
> > there's stack overflow, where you have at most a single guard page to
> > work with.
>
> There should be workarounds for that. Surely the C++ community has run into this
> problem and has solutions. If we want to support C++ modules, we need to employ
> them.
I will have an opinion about that when I see such a solution. I'm not
sure it exists. It's a rare C program that can recover from stack
overflow; it wouldn't be a huge surprise to learn that no C++ program
can do that portably, or at all.
> > I think there is some misunderstanding here, or some confusion,
> > perhaps mine: emacs-module.c is not supposed to deal with any C++
> > exceptions. C++ exceptions are supposed to be caught at the C++
> > level, below emacs-module.c, and handled there. An exception that
> > isn't caught will be recorded and will cause all the subsequent calls
> > to Lisp or to emacs-module.c function to fail,
>
> Why bother? If C++ exceptions are supposed to be caught by the C++ module in
> question, why does Emacs need to worry about C++ exceptions that are not caught?
It doesn't. I tried to explain that. It worries about longjmp
etc. called by Lisp or by implementations of Lisp primitives provided
by modules.
> > What emacs-module.c does with non-local exits of _any_ kind is record
> > the first occurrence of such an exit, and silently return to the
> > caller, thus allowing the C++ objects on the stack to be destroyed
> > normally. IOW, it defers the exit until internal--module-call is
> > about to return. What problems do you see with that which cause you
> > to think it's error-prone, let alone dysfunctional?
>
> It uses a different model at the C level from what one sees in Elisp, or from
> what one normally sees in C for that matter.
So what? It's so simple that I'm surprised we waste so much time
discussing it.
> I don't feel that I will really understand the model unless I see
> some actual modules that do function calls and exception handling
There's a test in mod-test that does that, you can look at it and
trace in the debugger what happens in emacs-module.c once the error is
signaled.
> but it's hard to believe that a model that does silent
> returns and that defers returns until later and that records some returns but
> not others will be problem-free.
Why is it hard? It records the info if the slot is vacant, so only
the first instance gets recorded. All the subsequent calls see that
the slot is not vacant and simply return without doing anything. How
complicated is that?
> Wouldn't it be simpler to have a module invoke analogs of
> 'condition-case' and/or 'catch', and to dispense with the
> funcall_exit stuff?
No, not necessarily. Both are simple.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 18:28 ` Daniel Colascione
@ 2015-12-21 19:00 ` Eli Zaretskii
2015-12-21 20:19 ` Philipp Stephani
2015-12-21 19:04 ` Eli Zaretskii
2015-12-22 4:09 ` Paul Eggert
2 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-21 19:00 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
> emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Dec 2015 10:28:09 -0800
>
> IMHO, we should treat OOM exactly like other sorts of error. It's
> dangerous to make some functions infallible.
We do: we signal an error.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 18:28 ` Daniel Colascione
2015-12-21 19:00 ` Eli Zaretskii
@ 2015-12-21 19:04 ` Eli Zaretskii
2015-12-22 4:09 ` Paul Eggert
2 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-21 19:04 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Dec 2015 10:28:09 -0800
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
> emacs-devel@gnu.org
>
> The standard requires runtimes reserve enough memory to throw
> std::bad_alloc.
If they have enough memory for bad_alloc, they will never hit a stack
overflow. If they do hit it, they won't have enough memory. These
two conditions are mutually exclusive.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 18:15 ` Paul Eggert
2015-12-21 18:28 ` Daniel Colascione
2015-12-21 18:57 ` Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
@ 2015-12-21 20:15 ` Philipp Stephani
2 siblings, 0 replies; 177+ messages in thread
From: Philipp Stephani @ 2015-12-21 20:15 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Paul Eggert <eggert@cs.ucla.edu> wrote on Mon, 21 Dec 2015 at 19:15:
> Eli Zaretskii wrote:
>
> > Are you
> > now saying something different from what you said back then, i.e. that
> > we cannot rely on any function/macro from lisp.h to be signal-safe?
>
> Yes and no. As I understood it, that old conversation was about functions
> that explicitly signal or throw, and it's safe to assume that EQ, NILP, etc.
> won't do that. The new conversation is about running out of memory, which is
> a different form of non-local exit. There may be other forms, such as
> operating-system signals (I don't recall exactly).
>
My comment was meant to refer only to signals and throws (i.e. the
constructs described in
https://www.gnu.org/software/emacs/manual/html_node/elisp/Nonlocal-Exits.html
).
>
> > If so, we should add the necessary protection, in the form of calls to
> > MODULE_FUNCTION_BEGIN, to emacs-module.c functions that until now
> > relied on those lisp.h functions/macros to be safe.
>
> This wouldn't suffice for these other non-local exits, I think; at least,
> not as currently constructed.
>
I don't see how such a protection could be written at all. The stack
overflow handler seems to be fixed.
>
> > AFAIK, proper C++ exception handling
> > requires non-trivial amounts of stack space that is not available when
> > there's stack overflow, where you have at most a single guard page to
> > work with.
>
> There should be workarounds for that. Surely the C++ community has run
> into this problem and has solutions. If we want to support C++ modules, we
> need to employ them.
>
The solution in C++ is simple: don't use longjmp/setjmp, and write only
trivial signal handlers (that only set a flag), or use signal fds etc. I
don't think C++ programs regularly try to catch stack overflows; such an
attempt would almost guarantee undefined behavior, so crashing is more
appropriate.
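
As a rough illustration of the "trivial signal handler that only sets a flag" approach, the sketch below uses only standard C facilities (signal, raise, sig_atomic_t). The names interrupted, on_sigint, install_handler and poll_interrupt are made up for this example, and SIGINT is chosen arbitrarily; the point is that the handler does no unwinding and no longjmp, and ordinary code polls the flag at a safe point.

```c
#include <assert.h>
#include <signal.h>

/* Flag written by the handler and read by normal code.  Only
   volatile sig_atomic_t objects are safe to write from a handler.  */
static volatile sig_atomic_t interrupted;

/* Trivial handler: record that the signal arrived, nothing else.  */
static void
on_sigint (int signo)
{
  (void) signo;
  interrupted = 1;
}

/* Install the handler; returns 0 on success, -1 on failure.  */
static int
install_handler (void)
{
  return signal (SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Called from ordinary code at a safe point.  Returns 1 if a signal
   was seen since the last poll, and clears the flag.  A signal that
   arrives between the test and the clear is folded into this poll,
   which is acceptable for a sketch.  */
static int
poll_interrupt (void)
{
  if (interrupted)
    {
      interrupted = 0;
      return 1;
    }
  return 0;
}
```

The caller decides what to do when poll_interrupt reports a signal, for instance returning an error code up the stack by normal returns.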
>
> > I think there is some misunderstanding here, or some confusion,
> > perhaps mine: emacs-module.c is not supposed to deal with any C++
> > exceptions. C++ exceptions are supposed to be caught at the C++
> > level, below emacs-module.c, and handled there. An exception that
> > isn't caught will be recorded and will cause all the subsequent calls
> > to Lisp or to emacs-module.c function to fail,
>
> Why bother? If C++ exceptions are supposed to be caught by the C++ module
> in question, why does Emacs need to worry about C++ exceptions that are not
> caught?
>
It doesn't; in fact, due to noexcept, it is impossible to throw exceptions
across the module interface (and any C++ module had better install a
catch-all handler to avoid crashes).
>
> > What emacs-module.c does with non-local exits of _any_ kind is record
> > the first occurrence of such an exit, and silently return to the
> > caller, thus allowing the C++ objects on the stack to be destroyed
> > normally. IOW, it defers the exit until internal--module-call is
> > about to return. What problems do you see with that which cause you
> > to think it's error-prone, let alone dysfunctional?
>
> It uses a different model at the C level from what one sees in Elisp, or
> from what one normally sees in C for that matter. I don't feel that I will
> really understand the model unless I see some actual modules that do
> function calls and exception handling; but it's hard to believe that a model
> that does silent returns and that defers returns until later and that
> records some returns but not others will be problem-free. Wouldn't it be
> simpler to have a module invoke analogs of 'condition-case' and/or 'catch',
> and to dispense with the funcall_exit stuff?
>
Please see the countless discussions on this topic, starting in February
(!) with Daniel's original design:
https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00960.html
https://lists.gnu.org/archive/html/emacs-devel/2015-09/msg00548.html
https://lists.gnu.org/archive/html/emacs-devel/2015-09/msg00545.html
https://lists.gnu.org/archive/html/emacs-devel/2015-10/msg00416.html
https://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02159.html
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 19:00 ` Eli Zaretskii
@ 2015-12-21 20:19 ` Philipp Stephani
0 siblings, 0 replies; 177+ messages in thread
From: Philipp Stephani @ 2015-12-21 20:19 UTC (permalink / raw)
To: Eli Zaretskii, Daniel Colascione
Cc: aurelien.aptel+emacs, tzz, eggert, emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Mon, 21 Dec 2015 at 20:00:
> > Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
> > emacs-devel@gnu.org
> > From: Daniel Colascione <dancol@dancol.org>
> > Date: Mon, 21 Dec 2015 10:28:09 -0800
> >
> > IMHO, we should treat OOM exactly like other sorts of error. It's
> > dangerous to make some functions infallible.
>
> We do: we signal an error.
>
For heap allocation failures, yes, but I think not for stack overflows.
(And I think handling stack overflows is much harder, so I guess most
people don't bother trying.)
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-21 18:28 ` Daniel Colascione
2015-12-21 19:00 ` Eli Zaretskii
2015-12-21 19:04 ` Eli Zaretskii
@ 2015-12-22 4:09 ` Paul Eggert
2015-12-22 4:38 ` Daniel Colascione
2 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-22 4:09 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
Daniel Colascione wrote:
> we should treat OOM exactly like other sorts of error.
Perhaps we should, but currently stack overflow is not treated that way.
> OS signals should go through the usual Emacs event loop, right?
I'm not sure what you mean, but let's put it this way: stack overflow can occur
while in the low-level handler for an OS signal. And even if stack overflow does
not occur, if the user types C-g three times when inhibit-quit is nil, the OS
signal won't go through the Emacs event loop; instead, Emacs will invoke (signal
'quit nil).
Perhaps what we need to do is to have stack overflow invoke (signal
'stack-overflow nil), or something like that. It's a bit tricky, though, as one
needs some stack space to call 'signal'.
> The standard requires runtimes reserve enough memory to throw
> std::bad_alloc. All Emacs has to do is make sure control flow reaches
> the C++ level.
How does this actually work, when combined with Emacs's C-level stack overflow
checking? Won't one get in the way of the other?
> When we say that we want C++
> exceptions to work, the Emacs-relevant meaning is that Emacs should
> always use conventional local returns, not longjmp, so that the
> stack unwinding facilities of other languages (e.g., C++) work properly.
Clearly we're not doing that, at least for stack overflow and 3x C-g, and
perhaps for other cases (I haven't investigated thoroughly).
> Both the Python and Java extension APIs implement high-level exceptions
> with low-level state exactly the way we're talking about here, and the
> result has been generally usable.
I wonder how they deal with stack overflow, and/or with OS signals.
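For readers unfamiliar with the "high-level exceptions with low-level state" pattern quoted above, here is a minimal sketch in C. It assumes a hypothetical environment struct (api_env) with a pending-error slot; none of these names come from emacs-module.c or from the Python or Java APIs. The point is only that every API function returns locally and the caller inspects the recorded state afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit kinds; the names are invented for this sketch. */
typedef enum { API_EXIT_RETURN = 0, API_EXIT_SIGNAL = 1 } api_exit_kind;

/* Hypothetical per-call environment carrying the pending error. */
typedef struct {
    api_exit_kind pending;   /* API_EXIT_RETURN when nothing is pending */
    const char *message;     /* description of the pending error, or NULL */
} api_env;

/* An API function that never jumps over its caller's frames.
   On failure it records the error in ENV and returns a dummy value. */
int api_divide(api_env *env, int num, int den)
{
    assert(env != NULL);
    if (env->pending != API_EXIT_RETURN)
        return 0;            /* an earlier error is still pending: do nothing */
    if (den == 0) {
        env->pending = API_EXIT_SIGNAL;
        env->message = "arith-error";
        return 0;
    }
    return num / den;
}

/* The caller asks explicitly whether the last calls failed. */
api_exit_kind api_check(const api_env *env)
{
    return env->pending;
}

/* The caller acknowledges the error and resets the state. */
void api_clear(api_env *env)
{
    env->pending = API_EXIT_RETURN;
    env->message = NULL;
}
```

A C++ caller would call api_check after each API call and throw its own exception if the result is not API_EXIT_RETURN, so its destructors run normally. The sketch says nothing about stack overflow or OS signals, which is exactly the open question here.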
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 4:09 ` Paul Eggert
@ 2015-12-22 4:38 ` Daniel Colascione
2015-12-22 4:48 ` Paul Eggert
2015-12-22 16:01 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 4:38 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
On 12/21/2015 08:09 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> we should treat OOM exactly like other sorts of error.
>
> Perhaps we should, but currently stack overflow is not treated that way.
>
>> OS signals should go through the usual Emacs event loop, right?
>
> I'm not sure what you mean, but let's put it this way: stack overflow
> can occur while in the low-level handler for an OS signal. And even if
> stack overflow does not occur, if the user types C-g three times when
> inhibit-quit is nil, the OS signal won't go through the Emacs event
> loop; instead, Emacs will invoke (signal 'quit nil).
>
> Perhaps what we need to do is to have stack overflow invoke (signal
> 'stack-overflow nil), or something like that. It's a bit tricky, though,
> as one needs some stack space to call 'signal'.
>
>> The standard requires runtimes reserve enough memory to throw
>> std::bad_alloc. All Emacs has to do is make sure control flow reaches
>> the C++ level.
>
> How does this actually work, when combined with Emacs's C-level stack
> overflow checking? Won't one get in the way of the other?
Let's start over.
Right now, when we get a SIGSEGV, we check the siginfo_t the OS gives us
by calling stack_overflow on it; if that returns true, we longjmp to
toplevel. We configure the sigsegv handler to run on an alternate stack,
so we'll always have space to do that much work. The longjmp restores
the original stack. On the other side of the longjmp, we resume
execution with our regular stack, but much further up the stack. At this
point, we know we have a stack overflow, because nothing else longjmps
to return_to_command_loop.
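The mechanism just described can be sketched roughly as follows. This is not the Emacs code: recovery_point stands in for return_to_command_loop, the siginfo_t check done by stack_overflow is omitted, and the demo raises SIGSEGV explicitly instead of really exhausting the stack. All function names are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Stands in for return_to_command_loop (hypothetical name). */
static sigjmp_buf recovery_point;

/* Alternate stack for the handler.  A fixed size is used because
   SIGSTKSZ is not a compile-time constant on every libc. */
static unsigned char alt_stack[64 * 1024];

/* Runs on alt_stack.  Emacs would first inspect the siginfo_t to
   decide whether this really is a stack overflow; skipped here. */
static void on_segv(int sig)
{
    (void) sig;
    siglongjmp(recovery_point, 1);   /* switches back to the regular stack */
}

/* Register the alternate stack and the SIGSEGV handler.
   Returns 0 on success, -1 on failure. */
int install_overflow_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;        /* run the handler on alt_stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Run FN; return 1 if a SIGSEGV cut it short, 0 if it finished. */
int run_guarded(void (*fn)(void))
{
    assert(fn != NULL);
    /* Save the signal mask too, so SIGSEGV is unblocked after the jump. */
    if (sigsetjmp(recovery_point, 1) != 0)
        return 1;
    fn();
    return 0;
}

/* Demo bodies: one well-behaved, one that faults. */
void demo_ok(void) { }
void demo_fault(void) { raise(SIGSEGV); }
```

Note that the jump skips every frame between the fault and run_guarded without running any cleanup, which is the property the rest of this message is concerned with.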
Now, if we return normally to a C++ caller with an error indication set,
the C++ caller will almost certainly have enough stack space to throw
its own exception and propagate the exception further.
The only real change we have to make is to have Emacs longjmp not to
return_to_command_loop (which might skip module frames), but to longjmp
instead to the most deeply nested entry point from module code into
Emacs, which we can set up in advance whenever a module calls into the
Emacs API.
unwind_to_catch isn't really very different from the longjmp to
return_to_command_loop: I don't see any reason we can't run it on the
alternate signal stack. In fact, I don't see why we can't replace
return_to_command_loop generally with Fsignal.
I really don't like the stack overflow protection stuff in general
though. It's not possible to robustly recover, because the stack
overflow detection turns *any* function call into an operation that
might return non-locally. In that environment --- where, say, XCAR might
end up running lisp --- it's hard to maintain invariants. I'd rather
Emacs just die on C stack overflow, except when we know we're running
Lisp in such a way that we know we can recover.
(The bad_alloc comment is more about exhausting the heap: even if we
exhaust the malloc heap instead of the stack, we'll still have set
aside enough space to throw a bad_alloc as long as Emacs returns
control to C++.)
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 4:38 ` Daniel Colascione
@ 2015-12-22 4:48 ` Paul Eggert
2015-12-22 4:52 ` Daniel Colascione
2015-12-22 16:03 ` Eli Zaretskii
2015-12-22 16:01 ` Eli Zaretskii
1 sibling, 2 replies; 177+ messages in thread
From: Paul Eggert @ 2015-12-22 4:48 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
Daniel Colascione wrote:
> I'd rather
> Emacs just die on C stack overflow
That would certainly be easier to implement! But as a user I would not find it
acceptable.
> except when we know we're running
> Lisp in such a way that we know we can recover.
I'm afraid that's not good enough, as stack overflow can occur while running C code.
> The only real change we have to make is to have Emacs longjmp not to
> return_to_command_loop (which might skip module frames), but to longjmp
> instead to the most deeply nested entry point from module code into
> Emacs, which we can set up in advance whenever a module calls into the
> Emacs API.
Yes, that looks like something we should do, then, to get stack overflow
checking working with modules.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 4:48 ` Paul Eggert
@ 2015-12-22 4:52 ` Daniel Colascione
2015-12-22 6:09 ` Paul Eggert
2015-12-22 16:03 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 4:52 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
On 12/21/2015 08:48 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> I'd rather
>> Emacs just die on C stack overflow
>
> That would certainly be easier to implement! But as a user I would not
> find it acceptable.
You find it acceptable in most other programs. What problem are we
trying to solve with this overflow checking? We should do something
about unbounded recursion in Lisp, but in C? Why? We expect a much higher
level of robustness in the C code.
Besides, we already crash if we overflow the stack while we're GCing.
>> except when we know we're running
>> Lisp in such a way that we know we can recover.
>
> I'm afraid that's not good enough, as stack overflow can occur while
> running C code.
Yes, and the proper response to stack overflow in C code should be to
crash. We shouldn't be allocating stack without bound in C.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 4:52 ` Daniel Colascione
@ 2015-12-22 6:09 ` Paul Eggert
2015-12-22 6:14 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-22 6:09 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
Daniel Colascione wrote:
>> That would certainly be easier to implement! But as a user I would not
>> find it acceptable.
> You find it acceptable in most other programs.
?! No, I don't find it acceptable for any program to dump core on stack
overflow. And even if it were OK for (say) 'cat' to dump core due to stack
overflow in a typical environment (which it's not), Emacs is more important than
'cat', because people use it as an interactive text editor and do not want to
lose their work.
> we already crash if we overflow the stack while we're GCing.
If so, that's a bug that should get fixed. It's not an excuse to introduce
similar bugs.
Really, the idea that it's OK for Emacs to crash is a nonstarter. Emacs should
not crash.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 6:09 ` Paul Eggert
@ 2015-12-22 6:14 ` Daniel Colascione
2015-12-22 6:33 ` Paul Eggert
2015-12-22 16:12 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 6:14 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
On 12/21/2015 10:09 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>>> That would certainly be easier to implement! But as a user I would not
>>> find it acceptable.
>> You find it acceptable in most other programs.
>
> ?! No, I don't find it acceptable for any program to dump core on stack
> overflow.
Most programs you use don't bother trying to detect or recover from stack
overflow. Why should they? C stack overflow is a programming error. It's
like hitting an assertion failure. Should we try to recover from
assertion failures too?
> And even if it were OK for (say) 'cat' to dump core due to
> stack overflow in a typical environment (which it's not), Emacs is more
> important than 'cat', because people use it as an interactive text
> editor and do not want to lose their work.
>
>> we already crash if we overflow the stack while we're GCing.
>
> If so, that's a bug that should get fixed. It's not an excuse to
> introduce similar bugs.
>
> Really, the idea that it's OK for Emacs to crash is a nonstarter. Emacs
> should not crash.
Ideally, we wouldn't have bugs. But we do, and when we hit them, we
should crash reliably and deterministically if we can't recover
reliably. The current stack overflow scheme can't recover reliably
because it doesn't maintain invariants. There are ways to fix it ---
e.g., we can make our own guard pages, mark them PROT_NONE, and on stack
overflow, mark them PROT_READ | PROT_WRITE long enough to reach eval,
*then* signal, but I don't think it's worth it.
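To make the guard-page idea concrete, here is a minimal sketch. It uses an ordinary anonymous mapping as a stand-in for the C stack; the guarded_region type and the function names are hypothetical. A real implementation would have to do the guard_open step from inside the SIGSEGV handler, on the actual stack's guard area.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* A mapping whose lowest page is a guard page (stacks usually grow
   downward, so the guard sits at the low end). */
typedef struct {
    unsigned char *base;   /* start of the mapping == start of the guard */
    size_t page;           /* page size in bytes */
    size_t total;          /* total mapping size, guard included */
} guarded_region;

/* Map USABLE_PAGES pages plus one guard page, and make the guard
   inaccessible.  Returns 0 on success, -1 on failure. */
int region_create(guarded_region *r, size_t usable_pages)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    assert(r != NULL);
    if (page <= 0)
        return -1;
    r->page = (size_t) page;
    r->total = (usable_pages + 1) * r->page;
    p = mmap(NULL, r->total, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    if (mprotect(r->base, r->page, PROT_NONE) != 0) {
        munmap(r->base, r->total);
        return -1;
    }
    return 0;
}

/* On overflow: temporarily make the guard usable, so execution can
   continue until a safe point (e.g. eval) is reached. */
int guard_open(guarded_region *r)
{
    return mprotect(r->base, r->page, PROT_READ | PROT_WRITE);
}

/* After signaling from the safe point: re-arm the guard. */
int guard_close(guarded_region *r)
{
    return mprotect(r->base, r->page, PROT_NONE);
}

/* Release the whole mapping. */
int region_destroy(guarded_region *r)
{
    return munmap(r->base, r->total);
}
```

Whether a single extra page is "long enough to reach eval" depends on how much stack the code between the fault and the next safe point needs; the sketch does not answer that.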
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 6:14 ` Daniel Colascione
@ 2015-12-22 6:33 ` Paul Eggert
2015-12-22 6:35 ` Daniel Colascione
2015-12-22 16:12 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-22 6:33 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
Daniel Colascione wrote:
> Most programs you use don't bother trying to detect or recover from stack
> overflow. Why should they? C stack overflow is a programming error.
Emacs is different because it's programmable, and it must respond robustly to
programming errors.
> The current stack overflow scheme can't recover reliably
> because it doesn't maintain invariants.
It works well enough now for people to get work done. Introducing modules should
not break this.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 6:33 ` Paul Eggert
@ 2015-12-22 6:35 ` Daniel Colascione
2015-12-22 6:44 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 6:35 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
On 12/21/2015 10:33 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> Most programs you use don't bother trying to detect or recover from stack
>> overflow. Why should they? C stack overflow is a programming error.
>
> Emacs is different because it's programmable, and it must respond
> robustly to programming errors.
It's programmable in Lisp. Lisp stack overflows shouldn't kill Emacs.
I'm suggesting that we shouldn't care about *C* stack overflows.
>> The current stack overflow scheme can't recover reliably
>> because it doesn't maintain invariants.
>
> It works well enough now for people to get work done. Introducing
> modules should not break this.
Modules can just dereference NULL or call abort. Writing a module is
more dangerous than writing a bit of elisp.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 6:35 ` Daniel Colascione
@ 2015-12-22 6:44 ` Paul Eggert
2015-12-22 6:53 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-22 6:44 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
Daniel Colascione wrote:
> It's programmable in Lisp. Lisp stack overflows shouldn't kill Emacs.
> I'm suggesting that we shouldn't care about *C* stack overflows.
The Lisp stack *is* the C stack. There is just one stack, which can overflow in
module code or in Elisp interpreter code (or in library code or whatever).
Whatever technique is used to detect Lisp stack overflows, should be usable to
detect stack overflows in module calls.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 6:44 ` Paul Eggert
@ 2015-12-22 6:53 ` Daniel Colascione
2015-12-22 16:13 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 6:53 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii
Cc: aurelien.aptel+emacs, p.stephani2, tzz, emacs-devel
On 12/21/2015 10:44 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> It's programmable in Lisp. Lisp stack overflows shouldn't kill Emacs.
>> I'm suggesting that we shouldn't care about *C* stack overflows.
>
> The Lisp stack *is* the C stack. There is just one stack, which can
> overflow in module code or in Elisp interpreter code (or in library code
> or whatever). Whatever technique is used to detect Lisp stack overflows,
> should be usable to detect stack overflows in module calls.
It's true that we have a unified stack, but that doesn't mean we have to
treat both Lisp and C stack frames identically. My chief problem with
the current stack overflow detection is that it's completely
asynchronous: we can longjmp on *any* stack enlargement, even in the
middle of a sequence of operations that can't normally be interrupted. A
much more limited approach, where we probe the stack in Feval and
longjmp only in safe positions, would work as well for Lisp, but
would prevent our trying to recover from inherently unrecoverable
situations.
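A minimal sketch of such a synchronous probe, under the assumption that comparing the address of a local variable against a recorded base address is a good enough estimate of stack use (toy_eval stands in for Feval; every name here is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uintptr_t stack_base;    /* address of a local in an outer frame */
static size_t stack_budget;     /* bytes of stack we allow ourselves */

/* Record where the stack roughly starts and how much may be used. */
void probe_init(const void *outer_local, size_t budget)
{
    assert(outer_local != NULL);
    stack_base = (uintptr_t) outer_local;
    stack_budget = budget;
}

/* Nonzero if the current frame is farther than the budget from the
   base.  Taking the absolute distance makes the check independent
   of the direction in which the stack grows. */
int stack_exhausted(void)
{
    volatile char here = 0;
    uintptr_t now = (uintptr_t) &here;
    uintptr_t used = now > stack_base ? now - stack_base : stack_base - now;
    return used > stack_budget;
}

/* Toy evaluator that recurses N levels deep.  It probes only on
   entry, i.e. at a point where no invariant is half-updated, and
   reports exhaustion as -1 instead of being interrupted midway. */
int toy_eval(int n)
{
    volatile char frame_pad[256];   /* gives each frame a known minimum size */
    int inner;

    frame_pad[0] = 0;
    if (stack_exhausted())
        return -1;                  /* safe point: signal the error here */
    if (n == 0)
        return 0;
    inner = toy_eval(n - 1);
    frame_pad[255] = (char) (inner & 0x7f);  /* keeps the frame live across the call */
    return inner < 0 ? -1 : inner + 1;
}
```

The budget has to leave enough headroom for whatever C code runs between two probes, which is the part that cannot be bounded in general.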
But why bother? We have max-lisp-eval-depth to prevent runaway
recursion. Whenever I've written clowny overly-deep code, I've always
hit that limit, or max-specpdl-size, first.
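For comparison, a depth-counter limit in the spirit of max-lisp-eval-depth can be sketched like this. The limit value and the names are hypothetical and this is not how eval.c is written; it only shows why such a limit trips deterministically and at a well-defined point.

```c
#include <assert.h>

/* Hypothetical limit playing the role of max-lisp-eval-depth. */
static int max_eval_depth = 800;
static int eval_depth = 0;

/* Recurse N levels deep.  Returns N on success, or -1 if the depth
   limit was reached somewhere below. */
int counted_eval(int n)
{
    int result;

    assert(n >= 0);
    if (eval_depth >= max_eval_depth)
        return -1;                  /* limit hit: report, don't recurse */
    eval_depth++;
    if (n == 0) {
        result = 0;
    } else {
        int inner = counted_eval(n - 1);
        result = inner < 0 ? -1 : inner + 1;
    }
    eval_depth--;                   /* always balanced, even on failure */
    return result;
}
```

The counter only bounds the number of nested evaluations, not the bytes of C stack each one consumes, so it is a proxy for stack use rather than a measurement of it.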
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 4:38 ` Daniel Colascione
2015-12-22 4:48 ` Paul Eggert
@ 2015-12-22 16:01 ` Eli Zaretskii
2015-12-22 16:32 ` John Wiegley
2015-12-22 20:31 ` Daniel Colascione
1 sibling, 2 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 16:01 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
> emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Dec 2015 20:38:07 -0800
>
> Right now, when we get a SIGSEGV, we check the siginfo_t the OS gives us
> by calling stack_overflow on it; if that returns true, we longjmp to
> toplevel. We configure the sigsegv handler to run on an alternate stack,
> so we'll always have space to do that much work. The longjmp restores
> the original stack. On the other side of the longjmp, we resume
> execution with our regular stack, but much further up the stack. At this
> point, we know we have a stack overflow, because nothing else longjmps
> to return_to_command_loop.
>
> Now, if we return normally to a C++ caller with an error indication set,
> the C++ caller will almost certainly have enough stack space to throw
> its own exception and propagate the exception further.
I very much doubt that. The alternate stack is quite small (AFAIK,
the standard value that we are using, SIGSTKSZ, is something like
8KB). Running arbitrary C++ code on such a small stack is not safe.
(My understanding is that the value of SIGSTKSZ should suffice for
calling printf, and that's about it.) There will be high risk of
hitting yet another stack overflow, this time a fatal one.
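To check the sizes involved on a given system, something like the following could be used. It is only a sketch with hypothetical helper names; the values differ between platforms, and SIGSTKSZ is not even a compile-time constant everywhere.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>

/* Size the platform suggests for an alternate signal stack. */
long alt_stack_bytes(void)
{
    return (long) SIGSTKSZ;
}

/* Soft limit of the main stack in bytes, or -1 if it is unlimited
   or cannot be determined. */
long main_stack_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long) rl.rlim_cur;
}

/* Nonzero if NEEDED bytes would fit on the alternate stack. */
int fits_on_alt_stack(size_t needed)
{
    long avail = alt_stack_bytes();
    assert(avail > 0);
    return needed <= (size_t) avail;
}
```

Printing both numbers side by side shows how much less room a handler has than ordinary code running on the main stack.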
> unwind_to_catch isn't really very different from the longjmp to
> return_to_command_loop: I don't see any reason we can't run it on the
> alternate signal stack. In fact, I don't see why we can't replace
> return_to_command_loop generally with Fsignal.
See above: I think running arbitrary Lisp code on an 8KB stack is even
less safe than with C++ code. We avoid doing that for a good reason.
Let me remind you that Emacs on Windows sets up an 8MB stack (as
opposed to the standard 2MB) because it is necessary in some
situations, like matching some regexps. 8MB, not 8KB! A Lisp unwind
handler can do anything at all, so I think running the unwinding code
from a stack overflow is not an option, if we want to make sure stack
overflow recovery will not hit another fatal stack overflow in most
cases.
> I really don't like the stack overflow protection stuff in general
> though. It's not possible to robustly recover, because the stack
> overflow detection turns *any* function call into an operation that
> might return non-locally. In that environment --- where, say, XCAR might
> end up running lisp --- it's hard to maintain invariants.
It might be less than nice or elegant, but Emacs should give the user
an opportunity to save their work.
> I'd rather Emacs just die on C stack overflow, except when we know
> we're running Lisp in such a way that we know we can recover.
You are in effect saying the stack overflow recovery code should not
have been added to Emacs. But we already decided that an attempt to
recover is a useful feature, and I see no reason to go back. Even if
this works only in some cases, partial recovery is better than a
hard crash, because it lets users save their work.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 4:48 ` Paul Eggert
2015-12-22 4:52 ` Daniel Colascione
@ 2015-12-22 16:03 ` Eli Zaretskii
2015-12-22 16:39 ` Paul Eggert
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 16:03 UTC (permalink / raw)
To: Paul Eggert; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
> emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 21 Dec 2015 20:48:32 -0800
>
> The only real change we have to make is to have Emacs longjmp not to
> return_to_command_loop (which might skip module frames), but to longjmp
> instead to the most deeply nested entry point from module code into
> Emacs, which we can set up in advance whenever a module calls into the
> Emacs API.
>
> Yes, that looks like something we should do, then, to get stack overflow checking working with modules.
If that means letting modules run arbitrary stack unwinding code on
the alternate stack, I'm against that. Doing so will significantly
lower the chances for a successful recovery (which means at least
letting the user safely save their work and exit the session).
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 6:14 ` Daniel Colascione
2015-12-22 6:33 ` Paul Eggert
@ 2015-12-22 16:12 ` Eli Zaretskii
2015-12-22 17:26 ` Philipp Stephani
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 16:12 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
> emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Dec 2015 22:14:38 -0800
>
> > And even if it were OK for (say) 'cat' to dump core due to
> > stack overflow in a typical environment (which it's not), Emacs is more
> > important than 'cat', because people use it as an interactive text
> > editor and do not want to lose their work.
> >
> >> we already crash if we overflow the stack while we're GCing.
> >
> > If so, that's a bug that should get fixed. It's not an excuse to
> > introduce similar bugs.
> >
> > Really, the idea that it's OK for Emacs to crash is a nonstarter. Emacs
> > should not crash.
>
> Ideally, we wouldn't have bugs. But we do, and when we hit them, we
> should crash reliably and deterministically if we can't recover
> reliably.
I'm sorry, Daniel, but that kind of philosophy is a non-starter with
Emacs. Saving the user's work even in some cases is much better than
not saving it at all. And the current scheme is quite reliable, it
works in many scenarios, though not all of them.
As for bugs that cause fatal errors, Emacs always tried to catch them
and at least auto-save. This also doesn't work with 100% reliability,
but it saved me my work many times in the past. It would be a step
backward to give up that, just because it's not bullet-proof.
IOW, crashing reliably is the antithesis of Emacs, has been for as
long as I can remember. And for very good reasons, IMO. We are not
going to change that.
> The current stack overflow scheme can't recover reliably because it
> doesn't maintain invariants. There are ways to fix it --- e.g.,
> we can make our own guard pages, mark them PROT_NONE, and on stack
> overflow, mark them PROT_READ | PROT_WRITE long enough to reach
> eval, *then* signal, but I don't think it's worth it.
Improvements to the current recovery are welcome, although what you
suggest might run afoul of constant changes in the OS internals, so it
could be fragile (or perhaps even impossible on some systems). Still,
if you have patches to propose, please do, and thanks.
Bottom line, I mostly agree with Paul: we should keep the stack
overflow recovery. We should also try improving it as much as
possible, in particular wrt modules. However, I don't think letting
modules run arbitrary stack unwinders, let alone Lisp handlers, when
we detect a stack overflow is a good idea, because the alternate stack
is too small for that, and doing so runs a higher risk of defeating the
recovery.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 6:53 ` Daniel Colascione
@ 2015-12-22 16:13 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 16:13 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Dec 2015 22:53:44 -0800
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
> emacs-devel@gnu.org
>
> But why bother? We have max-lisp-eval-depth to prevent runaway
> recursion. Whenever I've written clowny overly-deep code, I've always
> hit that limit, or max-specpdl-size, first.
In that case, all this discussion is a moot point, since you are in
effect saying that the stack-overflow recovery code will never be
executed in the first place.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 16:01 ` Eli Zaretskii
@ 2015-12-22 16:32 ` John Wiegley
2015-12-22 20:31 ` Daniel Colascione
1 sibling, 0 replies; 177+ messages in thread
From: John Wiegley @ 2015-12-22 16:32 UTC (permalink / raw)
To: Eli Zaretskii
Cc: eggert, tzz, emacs-devel, p.stephani2, aurelien.aptel+emacs,
Daniel Colascione
>>>>> Eli Zaretskii <eliz@gnu.org> writes:
> You are in effect saying the stack overflow recovery code should not have
> been added to Emacs. But we already decided that an attempt to recover is a
> useful feature, and I see no reason to go back. Even if this works only
> in some cases, partial recovery is better than a hard crash, because it lets
> users save their work.
I completely agree with Eli on this point.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 16:03 ` Eli Zaretskii
@ 2015-12-22 16:39 ` Paul Eggert
2015-12-22 17:46 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-22 16:39 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
Eli Zaretskii wrote:
>> The only real change we have to make is to have Emacs longjmp not to
>> return_to_command_loop (which might skip module frames), but to longjmp
>> instead to the most deeply nested entry point from module code into
>> Emacs, which we can set up in advance whenever a module calls into the
>> Emacs API.
>>
>> Yes, that looks like something we should do, then, to get stack overflow checking working with modules.
> If that means letting modules run arbitrary stack unwinding code on
> the alternate stack, I'm against that.
Likewise. My thought was more to require that the module code's most deeply
nested frame must be on the main stack, with some (perhaps configurable) amount
of unused stack space above it, so that any module-specific unwinding would be
done on the main stack. In other words, the module code knows about stack
overflow and has methods (either a runtime check on each call as Daniel
suggests, or guard pages and constraints about module stack frame sizes if this
works and has better performance) to detect it.
We might also want to put in something to catch trivial stack-overflow infinite
loops, if they turn into an issue. This would give up on catching a looping
stack-overflow and would go back to the previous catch. This would break
destructor-based invariants, but that's often better than crashing. It's akin to
the existing 3x C-g signal, which breaks invariants all over the place
(including invariants in module code!) but in my experience is crucial.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 16:12 ` Eli Zaretskii
@ 2015-12-22 17:26 ` Philipp Stephani
2015-12-22 17:51 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2015-12-22 17:26 UTC (permalink / raw)
To: Eli Zaretskii, Daniel Colascione
Cc: aurelien.aptel+emacs, tzz, eggert, emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Tue, 22 Dec 2015 at 17:11:
> > Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com,
> tzz@lifelogs.com,
> > emacs-devel@gnu.org
> > From: Daniel Colascione <dancol@dancol.org>
> > Date: Mon, 21 Dec 2015 22:14:38 -0800
> >
> > > And even if it were OK for (say) 'cat' to dump core due to
> > > stack overflow in a typical environment (which it's not), Emacs is more
> > > important than 'cat', because people use it as an interactive text
> > > editor and do not want to lose their work.
> > >
> > >> we already crash if we overflow the stack while we're GCing.
> > >
> > > If so, that's a bug that should get fixed. It's not an excuse to
> > > introduce similar bugs.
> > >
> > > Really, the idea that it's OK for Emacs to crash is a nonstarter. Emacs
> > > should not crash.
> >
> > Ideally, we wouldn't have bugs. But we do, and when we hit them, we
> > should crash reliably and deterministically if we can't recover
> > reliably.
>
> I'm sorry, Daniel, but that kind of philosophy is a non-starter with
> Emacs. Saving the user's work even in some cases is much better than
> not saving it at all. And the current scheme is quite reliable, it
> works in many scenarios, though not all of them.
>
> As for bugs that cause fatal errors, Emacs always tried to catch them
> and at least auto-save.
>
IIUC this code (terminate_due_to_signal) already auto-saves without calling
Lisp (or module) code, so this should be usable. We just need to make sure
that module code is not run after a stack overflow. Would it be possible to
choose this path if a module function is running?
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 16:39 ` Paul Eggert
@ 2015-12-22 17:46 ` Eli Zaretskii
2015-12-22 23:28 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 17:46 UTC (permalink / raw)
To: Paul Eggert; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, dancol@dancol.org,
> tzz@lifelogs.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 22 Dec 2015 08:39:53 -0800
>
> My thought was more to require that the module code's most deeply
> nested frame must be on the main stack, with some (perhaps
> configurable) amount of unused stack space above it, so that any
> module-specific unwinding would be done on the main stack. In other
> words, the module code knows about stack overflow and has methods
> (either a runtime check on each call as Daniel suggests, or guard
> pages and constraints about module stack frame sizes if this works
> and has better performance) to detect it.
Is that practical? One problem that I, as a module developer, would
face, is to estimate the amount of stack I'd need for unwinding.
Where do I begin?
Then there are all the stack frames below the module, which belong to
some Lisp someone else wrote -- who will be responsible for ensuring
those other unwinders don't need large amounts of stack space that
might be unavailable at stack-overflow point?
Stack overflow detection on modern systems uses hardware assistance
and processor exceptions to detect overflow with no runtime
penalties. Doing the equivalent in application code is bound to incur
additional processing, which will slow down code, right? If you think
about manipulating the guard pages to make them resizable, are we sure
enough of the supported platforms allow that?
IOW, I simply don't see how this could be practical.
Stack overflows, like any other fatal errors, are very infrequent events
in Emacs. Hitting stack overflow when some module runs is even rarer.
Why is it a disaster to fail to invoke the unwinders in those cases?
My warm advice to users is to save and exit immediately after the
recovery anyway, so I fail to see why some resource leak is a grave
problem. Anyone who continues a session that recovered from stack
overflow is playing with fire anyway.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 17:26 ` Philipp Stephani
@ 2015-12-22 17:51 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 17:51 UTC (permalink / raw)
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel, eggert
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 22 Dec 2015 17:26:14 +0000
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com,
> emacs-devel@gnu.org
>
> IIUC this code (terminate_due_to_signal) already auto-saves without calling
> Lisp (or module) code, so this should be usable. We just need to make sure that
> module code is not run after a stack overflow. Would it be possible to choose
> this path if a module function is running?
I'm not sure I understand your proposal in enough detail to answer.
Are you suggesting to call terminate_due_to_signal where we detect
stack overflow, without unwinding the stack by longjmp? If so, that
runs a high risk of crashing, due to several reasons (I can explain
them if you indeed suggested something similar to that). We must
release enough stack space before we call any non-trivial code.
That's what longjmp to top level does.
If you are suggesting something else, please elaborate.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 16:01 ` Eli Zaretskii
2015-12-22 16:32 ` John Wiegley
@ 2015-12-22 20:31 ` Daniel Colascione
2015-12-22 20:46 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 20:31 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/22/2015 08:01 AM, Eli Zaretskii wrote:
>> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, tzz@lifelogs.com,
>> emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Mon, 21 Dec 2015 20:38:07 -0800
>>
>> Right now, when we get a SIGSEGV, we check the siginfo_t the OS gives us
>> by calling stack_overflow on it; if that returns true, we longjmp to
>> toplevel. We configure the sigsegv handler to run on an alternate stack,
>> so we'll always have space to do that much work. The longjmp restores
>> the original stack. On the other side of the longjmp, we resume
>> execution with our regular stack, but much further up the stack. At this
>> point, we know we have a stack overflow, because nothing else longjmps
>> to return_to_command_loop.
>>
>> Now, if we return normally to a C++ caller with an error indication set,
>> the C++ caller will almost certainly have enough stack space to throw
>> its own exception and propagate the exception further.
>
> I very much doubt that. The alternate stack is quite small (AFAIK,
> the standard value that we are using, SIGSTKSZ, is something like
> 8KB). Running arbitrary C++ code on such a small stack is not safe.
> (My understanding is that the value of SIGSTKSZ should suffice for
> calling printf, and that's about it.) There will be high risk of
> hitting yet another stack overflow, this time a fatal one.
We're not talking about running arbitrary C++ code on the small stack.
The longjmp transfers execution to the original stack, but with the
context popped off.
Overflow stack: A B C D E F G
Signal stack: 1 2 3 longjmp
Resumption stack: A B C
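As a rough illustration of the mechanism described above (a handler that runs on an alternate stack and then jumps back onto the regular stack), here is a minimal C sketch. It is not the Emacs code: every name in it is made up, the 64 KiB alternate stack size is arbitrary, and it uses raise (SIGUSR1) as a stand-in for the SIGSEGV that a real overflow would deliver.

```c
/* Ask for sigaltstack/SA_ONSTACK declarations under strict -std modes.  */
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; none is taken from Emacs.  */
enum { ALT_STACK_SIZE = 64 * 1024 };   /* arbitrary size for this sketch */

static sigjmp_buf recovery_point;
static char *alt_stack;
static volatile sig_atomic_t handler_on_alt_stack;

/* Runs on the alternate stack.  It records where it ran and then
   jumps back to the frame that called sigsetjmp.  */
static void
on_signal (int sig)
{
  char marker;
  uintptr_t here = (uintptr_t) &marker;
  uintptr_t lo = (uintptr_t) alt_stack;
  (void) sig;
  handler_on_alt_stack = (lo <= here && here < lo + ALT_STACK_SIZE);
  siglongjmp (recovery_point, 1);
}

/* Register an alternate stack and a handler for SIGNUM that uses it.
   Returns 0 on success, -1 on failure.  */
static int
install_alt_stack_handler (int signum)
{
  stack_t ss;
  struct sigaction sa;

  assert (signum > 0);
  alt_stack = malloc (ALT_STACK_SIZE);
  if (alt_stack == NULL)
    return -1;

  memset (&ss, 0, sizeof ss);
  ss.ss_sp = alt_stack;
  ss.ss_size = ALT_STACK_SIZE;
  ss.ss_flags = 0;
  if (sigaltstack (&ss, NULL) != 0)
    return -1;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_signal;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_ONSTACK;     /* deliver on the alternate stack */
  return sigaction (signum, &sa, NULL);
}

/* Returns 1 if control came back through the handler's siglongjmp,
   0 if the signal was not delivered to our handler.  */
static int
recover_from_signal (int signum)
{
  if (sigsetjmp (recovery_point, 1) != 0)
    return 1;                   /* back on the regular stack */
  raise (signum);               /* stand-in for the real fault */
  return 0;
}
```

After install_alt_stack_handler (SIGUSR1) succeeds, recover_from_signal (SIGUSR1) should return 1 with handler_on_alt_stack set. The sketch says nothing about how much regular stack is free at the recovery point.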
>
>> unwind_to_catch isn't really very different from the longjmp to
>> return_to_command_loop: I don't see any reason we can't run it on the
>> alternate signal stack. In fact, I don't see why we can't replace
>> return_to_command_loop generally with Fsignal.
>
> See above: I think running arbitrary Lisp code on an 8KB stack is even
> less safe than with C++ code. We avoid doing that for a good reason.
> Let me remind you that Emacs on Windows sets up an 8MB stack (as
> opposed to the standard 2MB) because it is necessary in some
> situations, like matching some regexps. 8MB, not 8KB! A Lisp unwind
> handler can do anything at all, so I think running the unwinding code
> from a stack overflow is not an option, if we want to make sure stack
> overflow recovery will not hit another fatal stack overflow in most
> cases.
>
>> I really don't like the stack overflow protection stuff in general
>> though. It's not possible to robustly recover, because the stack
>> overflow detection turns *any* function call into an operation that
>> might return non-locally. In that environment --- where, say, XCAR might
>> end up running lisp --- it's hard to maintain invariants.
>
> It might be less than nice or elegant, but Emacs should give the user
> an opportunity to save their work.
>
>> I'd rather Emacs just die on C stack overflow, except when we know
>> we're running Lisp in such a way that we know we can recover.
>
> You are in effect saying the stack overflow recovery code should not
> have been added to Emacs. But we already decided that an attempt to
> recover is a useful feature, and I see no reason to go back. Even if
> this works only in some cases, partial recovery is better than a
> hard crash, because it lets users save their work.
Or it actually corrupts their work, because the Emacs core is in a bad
state. We can gracefully recover from stack overflow of Lisp code. We
cannot recover from stack overflow at arbitrary points in the C core.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 20:31 ` Daniel Colascione
@ 2015-12-22 20:46 ` Eli Zaretskii
2015-12-22 20:52 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 20:46 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Tue, 22 Dec 2015 12:31:58 -0800
>
> > I very much doubt that. The alternate stack is quite small (AFAIK,
> > the standard value that we are using, SIGSTKSZ, is something like
> > 8KB). Running arbitrary C++ code on such a small stack is not safe.
> > (My understanding is that the value of SIGSTKSZ should suffice for
> > calling printf, and that's about it.) There will be high risk of
> > hitting yet another stack overflow, this time a fatal one.
>
> We're not talking about running arbitrary C++ code on the small stack.
> The longjmp transfers execution to the original stack, but with the
> context popped off.
>
> Overflow stack: A B C D E F G
> Signal stack: 1 2 3 longjmp
> Resumption stack: A B C
If you only longjmp a short while, you have no idea how much stack you
freed. You might as well be just 200 bytes below the level where the
stack overflow hit.
That's why we jump to the lowest level we can: there, we _know_ we
have enough stack to do any kind of stuff.
> > You are in effect saying the stack overflow recovery code should not
> > have been added to Emacs. But we already decided that an attempt to
> > recover is a useful feature, and I see no reason to go back. Even if
> > this works only in some cases, partial recovery is better than a
> > hard crash, because it lets users save their work.
>
> Or it actually corrupts their work, because the Emacs core is in a bad
> state.
No, the core isn't in a bad state. Longjmp is not an abnormal path,
its semantics is very simple and clear.
> We can gracefully recover from stack overflow of Lisp code. We
> cannot recover from stack overflow at arbitrary points in the C core.
We can and we do. The recovery has only to be good enough to allow
saving your work and exiting. That's the only goal of that
protection: allow the user to exit normally, after saving their work.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 20:46 ` Eli Zaretskii
@ 2015-12-22 20:52 ` Daniel Colascione
2015-12-22 21:08 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 20:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/22/2015 12:46 PM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Tue, 22 Dec 2015 12:31:58 -0800
>>
>>> I very much doubt that. The alternate stack is quite small (AFAIK,
>>> the standard value that we are using, SIGSTKSZ, is something like
>>> 8KB). Running arbitrary C++ code on such a small stack is not safe.
>>> (My understanding is that the value of SIGSTKSZ should suffice for
>>> calling printf, and that's about it.) There will be high risk of
>>> hitting yet another stack overflow, this time a fatal one.
>>
>> We're not talking about running arbitrary C++ code on the small stack.
>> The longjmp transfers execution to the original stack, but with the
>> context popped off.
>>
>> Overflow stack: A B C D E F G
>> Signal stack: 1 2 3 longjmp
>> Resumption stack: A B C
>
> If you only longjmp a short while, you have no idea how much stack you
> freed. You might as well be just 200 bytes below the level where the
> stack overflow hit.
Which is why you setjmp in places where you have a significant stack
reserve.
> That's why we jump to the lowest level we can: there, we _know_ we
> have enough stack to do any kind of stuff.
>
>>> You are in effect saying the stack overflow recovery code should not
>>> have been added to Emacs. But we already decided that an attempt to
>>> recover is a useful feature, and I see no reason to go back. Even if
>>> this works only in some cases, partial recovery is better than a
>>> hard crash, because it lets users save their work.
>>
>> Or it actually corrupts their work, because the Emacs core is in a bad
>> state.
>
> No, the core isn't in a bad state. Longjmp is not an abnormal path,
> its semantics is very simple and clear.
Longjmp, by itself, is simple and clear. What's unreliable is longjmping
to Lisp at completely arbitrary points in the program, even ones marked
"GC can't happen here" and the like. You say Emacs shouldn't crash.
Fine. We can't make that guarantee if the crash recovery code breaks
program invariants. If you want to longjmp, you need to do it at certain
well-defined points only. The current approach is a bug machine.
>> We can gracefully recover from stack overflow of Lisp code. We
>> cannot recover from stack overflow at arbitrary points in the C core.
>
> We can and we do. The recovery has only to be good enough to allow
> saving your work and exiting. That's the only goal of that
> protection: allow the user to exit normally, after saving their work.
I'd rather rely on autosaves. Failing that, we should allocate guard
pages, unprotect the guard pages on overflow, and call out_of_memory so
that it's obvious Emacs is in a bad state. This way, we don't have to
longjmp out of arbitrary code sequences.
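To make the guard-page idea a bit more concrete, here is a minimal sketch, assuming a POSIX system with mmap and mprotect. It works on a private anonymous mapping rather than on the real C stack, it does no fault handling at all, and the struct and function names are invented for the example.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1   /* MAP_ANONYMOUS with glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical type: a writable region with one guard page at its
   low end (the end a downward-growing stack would run into).  */
struct guarded_region
{
  char *base;          /* start of the guard page */
  size_t page_size;
  size_t total_size;   /* guard page plus usable pages */
};

/* Map USABLE_PAGES writable pages plus one inaccessible guard page.
   Returns 0 on success, -1 on failure.  */
static int
guarded_region_create (struct guarded_region *r, size_t usable_pages)
{
  assert (r != NULL);
  long page = sysconf (_SC_PAGESIZE);
  if (page <= 0 || usable_pages == 0)
    return -1;

  size_t total = (usable_pages + 1) * (size_t) page;
  void *p = mmap (NULL, total, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  /* Any access to the first page now faults.  */
  if (mprotect (p, (size_t) page, PROT_NONE) != 0)
    {
      munmap (p, total);
      return -1;
    }

  r->base = p;
  r->page_size = (size_t) page;
  r->total_size = total;
  return 0;
}

/* Make the guard page usable, as one would after detecting an
   overflow, to gain one extra page of emergency space.  */
static int
guarded_region_unprotect_guard (struct guarded_region *r)
{
  assert (r != NULL);
  return mprotect (r->base, r->page_size, PROT_READ | PROT_WRITE);
}

static int
guarded_region_destroy (struct guarded_region *r)
{
  assert (r != NULL);
  return munmap (r->base, r->total_size);
}
```

Before guarded_region_unprotect_guard is called, only bytes from base + page_size upward may be touched; afterwards base[0] is writable too. Doing the same for the process's main stack would additionally require knowing where that stack ends, which this sketch does not attempt.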
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 20:52 ` Daniel Colascione
@ 2015-12-22 21:08 ` Eli Zaretskii
2015-12-22 21:18 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-22 21:08 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Tue, 22 Dec 2015 12:52:05 -0800
>
> > If you only longjmp a short while, you have no idea how much stack you
> > freed. You might as well be just 200 bytes below the level where the
> > stack overflow hit.
>
> Which is why you setjmp in places where you have a significant stack
> reserve.
There's no way of doing that portably, or even non-portably on many
platforms. You simply don't _know_ how much stack is left.
> > No, the core isn't in a bad state. Longjmp is not an abnormal path,
> > its semantics is very simple and clear.
>
> Longjmp, by itself, is simple and clear. What's unreliable is longjmping
> to Lisp at completely arbitrary points in the program, even ones marked
> "GC can't happen here" and the like.
We longjmp to a particular place, not an arbitrary place.
> You say Emacs shouldn't crash. Fine. We can't make that guarantee
> if the crash recovery code breaks program invariants.
Crash recovery doesn't need to keep invariants. Or maybe I
misunderstand what invariants you have in mind.
> If you want to longjmp, you need to do it at certain well-defined
> points only. The current approach is a bug machine.
No, it isn't. There's a small number of places (2?) to which we
jump. That's all.
> >> We can gracefully recover from stack overflow of Lisp code. We
> >> cannot recover from stack overflow at arbitrary points in the C core.
> >
> > We can and we do. The recovery has only to be good enough to allow
> > saving your work and exiting. That's the only goal of that
> > protection: allow the user to exit normally, after saving their work.
>
> I'd rather rely on autosaves.
It's not reliable enough, because it happens relatively rarely. Doing
it much more frequently will be an annoyance with many buffers. And
then there are buffers that don't autosave, but the user might still
want to do something with them if she needs to abandon ship.
> Failing that, we should allocate guard pages, unprotect the guard
> pages on overflow
That's what the OS is for. It would be wrong for us to start messing
with page protection etc. The exception caused by stack overflow
removes protection from the guard page to let you do something simple,
like run the exception handler -- are you suggesting we catch the
exception and mess with protection bits as well, i.e. replace one of
the core functions of a modern OS? All that because what we have now
is not elegant enough for us? Doesn't sound right to me.
> and call out_of_memory so that it's obvious Emacs is in a bad
> state. This way, we don't have to longjmp out of arbitrary code
> sequences.
There's no problem longjmping out of arbitrary code sequences. When
you debug a program, you do that all the time.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 21:08 ` Eli Zaretskii
@ 2015-12-22 21:18 ` Daniel Colascione
2015-12-23 16:07 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-22 21:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/22/2015 01:08 PM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Tue, 22 Dec 2015 12:52:05 -0800
>>
>>> If you only longjmp a short while, you have no idea how much stack you
>>> freed. You might as well be just 200 bytes below the level where the
>>> stack overflow hit.
>>
>> Which is why you setjmp in places where you have a significant stack
>> reserve.
>
> There's no way of doing that portably, or even non-portably on many
> platforms. You simply don't _know_ how much stack is left.
You can probe at program start and pre-allocate as much as is reasonable.
>>> No, the core isn't in a bad state. Longjmp is not an abnormal path,
>>> its semantics is very simple and clear.
>>
>> Longjmp, by itself, is simple and clear. What's unreliable is longjmping
>> to Lisp at completely arbitrary points in the program, even ones marked
>> "GC can't happen here" and the like.
>
> We longjmp to a particular place, not arbitrary place.
But we longjmp _from_ anywhere, and "anywhere" might be in the middle of
any delicate code sequence, since the compiler can generate code to
write to new stack slots at any point.
>> You say Emacs shouldn't crash. Fine. We can't make that guarantee
>> if the crash recovery code breaks program invariants.
>
> Crash recovery doesn't need to keep invariants. Or maybe I
> misunderstand what invariants do you have in mind.
Any stack allocation anywhere in the program can longjmp. It's
impossible to reason about safety in that situation.
>
>> If you want to longjmp, you need to do it at certain well-defined
>> points only. The current approach is a bug machine.
>
> No, it isn't. There's a small number of places (2?) to which we
> jump. That's all.
>
>>>> We can gracefully recover from stack overflow of Lisp code. We
>>>> cannot recover from stack overflow at arbitrary points in the C core.
>>>
>>> We can and we do. The recovery has only to be good enough to allow
>>> saving your work and exiting. That's the only goal of that
>>> protection: allow the user to exit normally, after saving their work.
>>
>> I'd rather rely on autosaves.
>
> It's not reliable enough, because it happens relatively rarely. Doing
> it much more frequently will be an annoyance with many buffers. And
> then there are buffers that don't autosave, but the user might still
> want to do something with them if she needs to abandon ship.
>
>> Failing that, we should allocate guard pages, unprotect the guard
>> pages on overflow
>
> That's what the OS is for. It would be wrong for us to start messing
> with page protection etc. The exception caused by stack overflow
> removes protection from the guard page to let you do something simple,
> like run the exception handler -- are you suggesting we catch the
> exception and mess with protection bits as well, i.e. replace one of
> the core functions of a modern OS? All that because what we have now
> is not elegant enough for us? Doesn't sound right to me.
We have a program that has its own Lisp runtime, has its own memory
allocation system, uses its own virtual filesystem access layer, and
that brings itself back from the dead. We're well past replicating OS
functionality.
It's not a matter of elegance: it's a matter of correctness. The current
scheme is unsafe.
>
>> and call out_of_memory so that it's obvious Emacs is in a bad
>> state. This way, we don't have to longjmp out of arbitrary code
>> sequences.
>
> There's no problem longjmping out of arbitrary code sequences. When
> you debug a program, you do that all the time.
In GDB, interrupting normal control flow is not part of standard
debugging practice. Lisp-level debugging can return to toplevel from
anywhere, but only between lisp form evaluations, and returning to
toplevel still runs unwind-protect handlers. The longjmp on stack
overflow does not.
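A minimal sketch of that last point, in plain C with invented names (it is not Emacs code and does not involve unwind-protect itself): a longjmp past a function skips whatever cleanup that function would have run on its normal return path.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins for "top level" and for a resource count.  */
static jmp_buf top_level;
static int open_resources;

static void
acquire (void)
{
  open_resources++;
}

static void
release (void)
{
  assert (open_resources > 0);
  open_resources--;
}

/* Acquire a resource, optionally bail out non-locally, then release.
   When BAIL_OUT is nonzero the release call is never reached.  */
static void
work (int bail_out)
{
  acquire ();
  if (bail_out)
    longjmp (top_level, 1);     /* skips the cleanup below */
  release ();
}

/* Run work () under a setjmp and report how many resources are still
   open afterwards.  */
static int
leaked_after_run (int bail_out)
{
  open_resources = 0;
  if (setjmp (top_level) == 0)
    work (bail_out);
  return open_resources;
}
```

leaked_after_run (0) reports 0 and leaked_after_run (1) reports 1: the non-local exit leaves the resource acquired. Whether such a leak matters in a session that is about to be saved and closed is exactly the point under discussion.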
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 17:46 ` Eli Zaretskii
@ 2015-12-22 23:28 ` Paul Eggert
2015-12-23 16:10 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-22 23:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
Eli Zaretskii wrote:
> Is that practical? One problem that I, as a module developer would
> face, is to estimate the amount of stack I'd need for unwinding.
> Where do I begin?
We could merely require that any module needing recursion must call a new
stack-size-checking function at each level of recursion, in order to detect
stack overflow.
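One way such a check could look, as a sketch only: the helper names are hypothetical (nothing like them exists in the module API), and the code assumes a contiguous stack on which the address of a local variable approximates the stack pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; they are not part of emacs-module.h.  */
static uintptr_t stack_origin;
static size_t stack_budget;
static bool stack_check_ready;

/* Record the current stack position and how many bytes may be used
   beyond it.  Meant to be called once on entry to the module.  */
static void
stack_check_init (size_t budget_bytes)
{
  char marker;
  stack_origin = (uintptr_t) &marker;
  stack_budget = budget_bytes;
  stack_check_ready = true;
}

/* Return true while the distance from the recorded origin is within
   the budget.  Works for stacks growing in either direction.  */
static bool
stack_check_ok (void)
{
  char marker;
  uintptr_t here = (uintptr_t) &marker;
  assert (stack_check_ready);
  size_t used = here < stack_origin ? stack_origin - here
                                    : here - stack_origin;
  return used < stack_budget;
}

/* Example recursion that checks at every level.  Returns N on
   success, or -1 if some level refused to recurse further.  */
static long
checked_depth (long n)
{
  if (!stack_check_ok ())
    return -1;                  /* report overflow instead of crashing */
  if (n == 0)
    return 0;
  long below = checked_depth (n - 1);
  return below < 0 ? -1 : below + 1;
}
```

With a generous budget, for example stack_check_init (1 << 20), checked_depth (10) returns 10; with a budget of 0 every check fails and it returns -1. The cost is one comparison per recursion level.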
Also, any module with an unbounded amount of computation should call the
equivalent of QUIT every now and then. If the module API doesn't let (or
ideally, require) modules to do that now, it should. Otherwise it's an Emacs
freeze waiting to happen.
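A sketch of what periodic polling might look like from the module side. The callback type, the polling interval and all names are assumptions made for the example; the thread does not specify what the module-facing equivalent of QUIT would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: the host reports whether the user asked to
   quit.  HOST_DATA is passed through unchanged.  */
typedef bool (*quit_requested_fn) (void *host_data);

enum { POLL_INTERVAL = 1024 };  /* iterations between polls, arbitrary */

/* Sum the integers 0 .. N-1, polling for a quit request every
   POLL_INTERVAL iterations.  Returns true and stores the sum in
   *RESULT on completion, false if the computation was abandoned.  */
static bool
sum_with_quit_polling (size_t n, quit_requested_fn quit_requested,
                       void *host_data, unsigned long long *result)
{
  assert (quit_requested != NULL && result != NULL);
  unsigned long long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i % POLL_INTERVAL == 0 && quit_requested (host_data))
        return false;           /* give control back to the host */
      total += i;
    }
  *result = total;
  return true;
}

/* Example callback for experiments: request a quit once a fixed
   number of polls has been answered with "no".  */
struct poll_budget
{
  size_t polls_left;
};

static bool
quit_when_budget_spent (void *host_data)
{
  struct poll_budget *budget = host_data;
  if (budget->polls_left == 0)
    return true;
  budget->polls_left--;
  return false;
}
```

With n = 5000 the loop polls five times (at i = 0, 1024, 2048, 3072 and 4096), so a budget of 100 polls lets it finish with the sum 12497500, while a budget of 2 makes it stop at the third poll.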
> Then there are all the stack frames below the module, which belong to
> some Lisp someone else wrote -- who will be responsible for ensuring
> those other unwinders don't need large amounts of stack space that
> might be unavailable at stack-overflow point?
Each stack-overflow unwinder should use only a small amount of stack space, in
order to prevent an infinite loop. We could partially enforce this by detecting
trivial infinite stack-overflow loops.
> Stack overflow detection on modern systems uses hardware assistance
> and processor exceptions to detect overflow with no runtime
> penalties. Doing the equivalent in application code is bound to incur
> additional processing, which will slow down code, right?
Yes, but it's not a big deal.
> If you think
> about manipulating the guard pages to make them resizable, are we sure
> enough of the supported platforms allow that?
In platforms that don't support guard pages we'll need to have the run-time
checks. Either approach should be adequate.
> Hitting stack overflow when some module runs is even rarer.
> Why is it a disaster to fail to invoke the unwinders in those cases?
Often I expect it wouldn't be a disaster; it'd just be a memory leak. I suppose
there could be some modules where it would be a disaster. But perhaps we can
just ask people to not write such modules, by making it a part of the Emacs API
that unwinders might not be invoked.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 21:18 ` Daniel Colascione
@ 2015-12-23 16:07 ` Eli Zaretskii
2015-12-23 16:25 ` Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-23 16:07 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Tue, 22 Dec 2015 13:18:21 -0800
>
> >> Which is why you setjmp in places where you have a significant stack
> >> reserve.
> >
> > There's no way of doing that portably, or even non-portably on many
> > platforms. You simply don't _know_ how much stack is left.
>
> You can probe at program start and pre-allocate as much as is reasonable.
Pre-allocate what? Are you suggesting that Emacs allocates its own
stack, instead of relying on the one provided by the linker and the
OS?
> >> Longjmp, by itself, is simple and clear. What's unreliable is longjmping
> >> to Lisp at completely arbitrary points in the program, even ones marked
> >> "GC can't happen here" and the like.
> >
> > We longjmp to a particular place, not arbitrary place.
>
> But we longjmp _from_ anywhere, and "anywhere" might be in the middle of
> any delicate code sequence, since the compiler can generate code to
> write to new stack slots at any point.
I simply don't see any trouble this could cause, except leaking some
memory. Can you describe in enough detail a single use case where
this could have any other adverse effects that we should care about
when recovering from stack overflow?
> >> You say Emacs shouldn't crash. Fine. We can't make that guarantee
> >> if the crash recovery code breaks program invariants.
> >
> > Crash recovery doesn't need to keep invariants. Or maybe I
> > misunderstand what invariants do you have in mind.
>
> Any stack allocation anywhere in the program can longjmp. It's
> impossible to reason about safety in that situation.
Emacs is not safety-critical software, so there's no requirement to
reason about safety. Since I think the recovery's only role is to
allow the user to exit Emacs in a controlled way without losing work,
I simply don't see any problem that could be caused by longjmping from
an arbitrary stack allocation. After all, stack allocation is just
assignment of value to a register, and sometimes grafting a range of
memory pages into the memory set.
> >> Failing that, we should allocate guard pages, unprotect the guard
> >> pages on overflow
> >
> > That's what the OS is for. It would be wrong for us to start messing
> > with page protection etc. The exception caused by stack overflow
> > removes protection from the guard page to let you do something simple,
> > like run the exception handler -- are you suggesting we catch the
> > exception and mess with protection bits as well, i.e. replace one of
> > the core functions of a modern OS? All that because what we have now
> > is not elegant enough for us? Doesn't sound right to me.
> We have a program that has its own Lisp runtime, has its own memory
> allocation system, uses its own virtual filesystem access layer, and
> that brings itself back from the dead. We're well past replicating OS
> functionality.
Actually, most of the above is simply untrue: we use system allocators
to allocate memory, use mundane C APIs like 'open' and 'read' to
access files, and if by "bringing itself from the dead" you allude to
unexec, then what it does is a subset of what every linker does,
hardly OS stuff.
I think we should strive to distance ourselves from the OS business,
not the other way around. There was a time when doing complex things
sometimes required messing with low-level functionality like that, but
that time has long passed. Allocating our own stack, setting up and
managing our own guard pages and the related exceptions -- we
shouldn't go back there.
> It's not a matter of elegance: it's a matter of correctness. The current
> scheme is unsafe.
Emacs is not safety-critical software. It doesn't need to be "safe"
by your definition, if I understand it correctly.
> >> and call out_of_memory so that it's obvious Emacs is in a bad
> >> state. This way, we don't have to longjmp out of arbitrary code
> >> sequences.
> >
> > There's no problem longjmping out of arbitrary code sequences. When
> > you debug a program, you do that all the time.
>
> In GDB, interrupting normal control flow is not part of standard
> debugging practice.
??? Every time a debuggee hits a breakpoint, the normal control flow
is interrupted, and you in effect have a huge longjmp -- from the
debuggee to the debugger.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-22 23:28 ` Paul Eggert
@ 2015-12-23 16:10 ` Eli Zaretskii
2015-12-23 16:20 ` Philipp Stephani
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-23 16:10 UTC (permalink / raw)
To: Paul Eggert; +Cc: aurelien.aptel+emacs, p.stephani2, dancol, tzz, emacs-devel
> Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com, dancol@dancol.org,
> tzz@lifelogs.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 22 Dec 2015 15:28:03 -0800
>
> > Is that practical? One problem that I, as a module developer would
> > face, is to estimate the amount of stack I'd need for unwinding.
> > Where do I begin?
>
> We could merely require that any module needing recursion must call a
> new stack-size-checking function at each level of recursion, in order
> to detect stack overflow.
That's a performance penalty. I don't think we should incur that, on
behalf of a problem that really should "never" happen.
> Also, any module with an unbounded amount of computation should call
> the equivalent of QUIT every now and then. If the module API doesn't
> let (or ideally, require) modules to do that now, it should. Otherwise
> it's an Emacs freeze waiting to happen.
I agree, but I think it's unrelated to stack overflow. (Though I
think we will need to have a module-specific version of QUIT, as long
as we keep the constraint of not directly longjmping from module
code.)
> > Then there are all the stack frames below the module, which belong to
> > some Lisp someone else wrote -- who will be responsible for ensuring
> > those other unwinders don't need large amounts of stack space that
> > might be unavailable at stack-overflow point?
>
> Each stack-overflow unwinder should use only a small amount of stack
> space, in order to prevent an infinite loop. We could partially enforce
> this by detecting trivial infinite stack-overflow loops.
I don't think it's practical to require that unwinders use "only a
small amount of stack space". Verifying that is non-trivial even if
you know how the particular internals work; it's downright impossible
if you are a Lisp programmer with only a minimal knowledge of C and
the Emacs C levels, or none at all.
> > Stack overflow detection on modern systems uses hardware assistance
> > and processor exceptions to detect overflow with no runtime
> > penalties. Doing the equivalent in application code is bound to incur
> > additional processing, which will slow down code, right?
>
> Yes, but it's not a big deal.
I disagree. Compilers' settings to allow run-time stack-smashing
detection are off by default for a good reason.
> > If you think
> > about manipulating the guard pages to make them resizable, are we sure
> > enough of the supported platforms allow that?
>
> In platforms that don't support guard pages we'll need to have the
> run-time checks. Either approach should be adequate.
I didn't mean platforms that don't support guard pages. I meant
platforms that do, but don't allow you to mess with them. For
starters, you need to know the address of the page at the end of the
stack, which is not trivial. Then you need to change its protection,
which I'm not sure all platforms will let you do, certainly in a
future-proof manner. (Don't modern hardened systems prevent messing
with page protection bits?)
> > Hitting stack overflow when some module runs is even rarer.
> > Why is it a disaster to fail to invoke the unwinders in those cases?
>
> Often I expect it wouldn't be a disaster; it'd just be a memory leak.
> I suppose there could be some modules where it would be a disaster.
> But perhaps we can just ask people to not write such modules, by
> making it a part of the Emacs API that unwinders might not be invoked.
I think if we adopt the view that the user should save and exit ASAP
following recovery from stack overflow (similar to what we expect when
we run out of memory), the probability of a "disaster" in this
scenario becomes much lower, perhaps negligible.
All in all, I think the current recovery is "good enough", and
investing significant efforts into making it slightly better will
waste resources that are better applied to more important problems.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-23 16:10 ` Eli Zaretskii
@ 2015-12-23 16:20 ` Philipp Stephani
2015-12-23 16:46 ` Eli Zaretskii
2015-12-23 17:09 ` Paul Eggert
0 siblings, 2 replies; 177+ messages in thread
From: Philipp Stephani @ 2015-12-23 16:20 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Eli Zaretskii <eliz@gnu.org> schrieb am Mi., 23. Dez. 2015 um 17:10 Uhr:
> > Cc: aurelien.aptel+emacs@gmail.com, p.stephani2@gmail.com,
> dancol@dancol.org,
> > tzz@lifelogs.com, emacs-devel@gnu.org
> > From: Paul Eggert <eggert@cs.ucla.edu>
> > Date: Tue, 22 Dec 2015 15:28:03 -0800
> >
> > Is that practical? One problem that I, as a module developer would
> > face, is to estimate the amount of stack I'd need for unwinding.
> > Where do I begin?
> >
> > We could merely require that any module needing recursion must call a
> new stack-size-checking function at each level of recursion, in order to
> detect stack overflow.
>
> That's a performance penalty. I don't think we should incur that, on
> behalf of a problem that really should "never" happen.
>
It's also impossible. Modules will be written in arbitrary languages and
make use of arbitrary libraries, some of which might not even know about
Emacs's existence. We cannot impose any constraints on module code.
>
> > Also, any module with an unbounded amount of computation should call the
> equivalent of QUIT every now and then. If the module API doesn't let (or
> ideally, require) modules to do that now, it should. Otherwise it's an
> Emacs freeze waiting to happen.
>
> I agree, but I think it's unrelated to stack overflow. (Though I
> think we will need to have a module-specific version of QUIT, as long
> as we keep the constraint of not directly longjmping from module
> code.)
>
There's no harm in providing such a functionality (but of course we can't
enforce its use). Alternatively modules can run computations in background
threads and update Emacs state in timer functions.
>
>
> > Hitting stack overflow when some module runs is even rarer.
> > Why is it a disaster to fail to invoke the unwinders in those cases?
> >
> > Often I expect it wouldn't be a disaster; it'd just be a memory leak. I
> suppose there could be some modules where it would be a disaster. But
> perhaps we can just ask people to not write such modules, by making it a
> part of the Emacs API that unwinders might not be invoked.
>
> I think if we adopt the view that the user should save and exit ASAP
> following recovery from stack overflow (similar to what we expect when
> we run out of memory), the probability of a "disaster" in this
> scenario becomes much lower, perhaps negligible.
>
The disaster is that longjmps cause undefined behavior in C++ if
destructors would be skipped (and I don't even want to imagine what would
happen to higher-level languages such as Java or Haskell). Even if we
assume "benign UB" (which is dangerous because UB tends to become more
malign as compilers evolve), it will cause internal state to become
inconsistent, which is truly disastrous. Resuming operation in such a
situation is a non-starter.
> All in all, I think the current recovery is "good enough", and
> investing significant efforts into making it slightly better will
> waste resources that are better applied to more important problems.
>
>
handle_sigsegv already has several cases where the longjmp is skipped: when
in a GC, when the signal is received from the wrong thread, and when it's
not guaranteed to be a SO. In those cases Emacs aborts (still autosaving).
Why can't we do the same if SO is detected while a module function is
active?
* Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 16:07 ` Eli Zaretskii
@ 2015-12-23 16:25 ` Daniel Colascione
2015-12-23 17:30 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-23 16:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/23/2015 08:07 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Tue, 22 Dec 2015 13:18:21 -0800
>>
>>>> Which is why you setjmp in places where you have a significant stack
>>>> reserve.
>>>
>>> There's no way of doing that portably, or even non-portably on many
>>> platforms. You simply don't _know_ how much stack is left.
>>
>> You can probe at program start and pre-allocate as much as is reasonable.
>
> Pre-allocate what? Are you suggesting that Emacs allocates its own
> stack, instead of relying on the one provided by the linker and the
> OS?
We can alloca, say, 8MB, and write to the start and end of the allocated
region. Then we'll know we have at least that much stack space available.
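A minimal sketch of such a startup probe, assuming a platform that provides alloca via <alloca.h>. The name probe_stack is hypothetical and nothing like it exists in Emacs. Note that a failed probe does not return an error: the process simply dies on the spot, which is tolerable only because no user data exists yet at startup.

```c
#include <alloca.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs: reserve N bytes with alloca
   and write to both ends of the region, so that the stack must really
   extend that far.  If it cannot, the process crashes right here.  */
static int
probe_stack (size_t n)
{
  if (n == 0)
    return 0;
  volatile char *p = alloca (n);
  p[0] = 1;                  /* one end of the region */
  p[n - 1] = 1;              /* the other end */
  return p[0] + p[n - 1];    /* read back so the stores are not elided */
}
```

The probe only proves that N bytes were available at the moment of the call; the size to request is the caller's choice and has to stay below whatever stack limit the system imposes.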
>>>> Longjmp, by itself, is simple and clear. What's unreliable is longjmping
>>>> to Lisp at completely arbitrary points in the program, even ones marked
>>>> "GC can't happen here" and the like.
>>>
>>> We longjmp to a particular place, not arbitrary place.
>>
>> But we longjmp _from_ anywhere, and "anywhere" might be in the middle of
>> any delicate code sequence, since the compiler can generate code to
>> write to new stack slots at any point.
>
> I simply don't see any trouble this could cause, except leaking some
> memory. Can you describe in enough detail a single use case where
> this could have any other adverse effects that we should care about
> when recovering from stack overflow?
What happens if we overflow inside malloc? One possibility is that we'll
longjmp back to toplevel without releasing the heap lock, then deadlock
the next time we try to allocate.
>>>> You say Emacs shouldn't crash. Fine. We can't make that guarantee
>>>> if the crash recovery code breaks program invariants.
>>>
>>> Crash recovery doesn't need to keep invariants. Or maybe I
>>> misunderstand what invariants you have in mind.
>>
>> Any stack allocation anywhere in the program can longjmp. It's
>> impossible to reason about safety in that situation.
>
> Emacs is not safety-critical software, so there's no requirement to
> reason about safety. Since I think the recovery's only role is to
> allow the user to exit Emacs in a controlled way without losing work,
> I simply don't see any problem that could be caused by longjmping from
> an arbitrary stack allocation. After all, stack allocation is just
> assignment of value to a register, and sometimes grafting a range of
> memory pages into the memory set.
>
>>>> Failing that, we should allocate guard pages, unprotect the guard
>>>> pages on overflow
>>>
>>> That's what the OS is for. It would be wrong for us to start messing
>>> with page protection etc. The exception caused by stack overflow
>>> removes protection from the guard page to let you do something simple,
>>> like run the exception handler -- are you suggesting we catch the
>>> exception and mess with protection bits as well, i.e. replace one of
>>> the core functions of a modern OS? All that because what we have now
>>> is not elegant enough for us? Doesn't sound right to me.
>
>> We have a program that has its own Lisp runtime, has its own memory
>> allocation system, uses its own virtual filesystem access layer, and
>> that brings itself back from the dead. We're well past replicating OS
>> functionality.
>
> Actually, most of the above is simply untrue: we use system allocators
> to allocate memory
We have internal allocators for strings and conses and use the system
allocator only for backing storage.
> use mundane C APIs like 'open' and 'read' to
> access files
We must.
> , and if by "bringing itself from the dead" you allude to
> unexec, then what it does is a subset of what every linker does,
> hardly an OS stuff.
Granted, that's toolchain work, not "OS" work, but it's still outside
the domain of most text editors.
> I think we should strive to distance ourselves from the OS business,
> not the other way around. There was a time when doing complex things
> sometimes required messing with low-level functionality like that, but
> that time is long past. Allocating our own stack, setting up and
> managing our own guard pages and the related exceptions -- we
> shouldn't go back there.
If an OS provides a documented and supported facility, there's no shame
in using it. I'm not sure how worrying about whether that facility is
"OS business" is useful.
>> It's not a matter of elegance: it's a matter of correctness. The current
>> scheme is unsafe.
>
> Emacs is not safety-critical software. It doesn't need to be "safe"
> by your definition, if I understand it correctly.
It's not safety-critical software, but undefined behavior is undefined.
What makes us confident that we can't corrupt buffer data by longjmping
from the wrong place? Anything can happen because we can longjmp from
anywhere.
It's admirable to avoid the loss of user data, but I think there's a way
that's both safer and more general. Instead of trying to catch stack
overflow, let's treat stack overflow as a normal fatal error and instead
think about how we can preserve buffer contents on fatal errors generally.
What if we just installed a SIGSEGV handler (or, on Windows, a vectored
exception handler) that wrote buffer contents to a special file on a
fatal signal, then allowed that fatal signal to propagate normally? The
next time Emacs starts, we can restore the buffers we've saved this way
and ask users to save them --- just like autosave, but done on-demand,
at crash time, in C code, on the alternate signal stack.
>>>> and call out_of_memory so that it's obvious Emacs is in a bad
>>>> state. This way, we don't have to longjmp out of arbitrary code
>>>> sequences.
>>>
>>> There's no problem longjmping out of arbitrary code sequences. When
>>> you debug a program, you do that all the time.
>>
>> In GDB, interrupting normal control flow is not part of standard
>> debugging practice.
>
> ??? Every time a debuggee hits a breakpoint, the normal control flow
> is interrupted, and you in effect have a huge longjmp -- from the
> debuggee to the debugger.
When a program hits a breakpoint, the OS sends it a signal. A debugger
that's ptraced its debuggee will receive that signal, suspend execution,
and give control to the user. If the user opts to continue execution,
the debugger restores the debuggee to the state it was in when it
received the signal, then allows it to resume execution.
At no point does the debugger force a debuggee to longjmp. Debuggers take
pains to make programs behave as if breakpoints weren't there at all. We
don't try to resume execution at the point of a stack overflow.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-23 16:20 ` Philipp Stephani
@ 2015-12-23 16:46 ` Eli Zaretskii
2015-12-23 17:09 ` Paul Eggert
1 sibling, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-23 16:46 UTC (permalink / raw)
To: Philipp Stephani; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 23 Dec 2015 16:20:45 +0000
> Cc: aurelien.aptel+emacs@gmail.com, dancol@dancol.org, tzz@lifelogs.com,
> emacs-devel@gnu.org
>
> > Also, any module with an unbounded amount of computation should call the
> equivalent of QUIT every now and then. If the module API doesn't let (or
> ideally, require) modules to do that now, it should. Otherwise it's an
> Emacs freeze waiting to happen.
>
> I agree, but I think it's unrelated to stack overflow. (Though I
> think we will need to have a module-specific version of QUIT, as long
> as we keep the constraint of not directly longjmping from module
> code.)
>
> There's no harm in providing such a functionality (but of course we can't
> enforce its use).
We will document that in the manual and comment on non-complying code.
> Alternatively modules can run computations in background
> threads and update Emacs state in timer functions.
I'd be very wary of going that way. Emacs's design doesn't support
such techniques very well, and they will of course multiply
complexity.
> I think if we adopt the view that the user should save and exit ASAP
> following recovery from stack overflow (similar to what we expect when
> we run out of memory), the probability of a "disaster" in this
> scenario becomes much lower, perhaps negligible.
>
> The disaster is that longjmps cause undefined behavior in C++ if destructors
> would be skipped (and I don't even want to imagine what would happen to
> higher-level languages such as Java or Haskell).
It is not a disaster if the user saves and exits right after the
recovery, instead of continuing to use the session that hit stack
overflow. Nothing bad will happen.
> All in all, I think the current recovery is "good enough", and
> investing significant efforts into making it slightly better will
> waste resources that are better applied to more important problems.
>
> handle_sigsegv already has several cases where the longjmp is skipped: when in
> a GC, when the signal is received from the wrong thread, and when it's not
> guaranteed to be a SO. In those cases Emacs aborts (still autosaving). Why
> can't we do the same if SO is detected while a module function is active?
It won't auto-save in those cases, and if we try, we will most
probably crash anyway. Those cases are when we have no better
alternative, and if someone comes up with an idea for solving some of
them, we will definitely consider it.
Making stack overflow do the same is a step backward, because it will
increase the probability of losing work.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-23 16:20 ` Philipp Stephani
2015-12-23 16:46 ` Eli Zaretskii
@ 2015-12-23 17:09 ` Paul Eggert
2015-12-23 17:18 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2015-12-23 17:09 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii
Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Philipp Stephani wrote:
>>> We could merely require that any module needing recursion must call a
>> new stack-size-checking function at each level of recursion, in order to
>> detect stack overflow.
>>
>> That's a performance penalty. I don't think we should incur that, on
>> behalf of a problem that really should "never" happen.
>>
>
> It's also impossible. Modules will be written in arbitrary languages and
> make use of arbitrary libraries, some of which might not even know about
> Emacs's existence. We cannot impose any constraints on module code.
There's also the issue that other languages may have their own assumptions about
how stack overflow is detected, assumptions that may disagree with Emacs's
method or even prevent Emacs's method from working. My own preference also is
to rely on the usual VM techniques for ordinary C-code modules, and hope that
other languages don't get in the way. Still, it may not be possible to do that,
and we may be forced to impose a bit of software overflow-checking on modules
implemented via recursion, for module authors who care about reliability.
Does anybody know how C++ and/or Java and/or Python modules detect and recover
from stack overflow on GNU/Linux? That would be a reasonable sanity check here.
(And if nobody knows, that's not a good sign....)
>>> Also, any module with an unbounded amount of computation should call the
>> equivalent of QUIT every now and then. If the module API doesn't let (or
>> ideally, require) modules to do that now, it should. Otherwise it's an
>> Emacs freeze waiting to happen.
>>
>> I agree, but I think it's unrelated to stack overflow. (Though I
>> think we will need to have a module-specific version of QUIT, as long
>> as we keep the constraint of not directly longjmping from module
>> code.)
>>
>
> There's no harm in providing such a functionality (but of course we can't
> enforce its use).
We *could* enforce the use by requiring at least one call to QUIT every (say)
100 ms, and by doing the equivalent of 3x C-g when the time limit is exceeded.
That'd be useful even without modules.
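For concreteness, a sketch of what cooperative polling could look like inside a long-running module function. Everything here is an assumption made for illustration: quit_requested is a hypothetical flag that the host would set on a quit request, and sum_with_quit is a made-up computation; neither is part of the module API discussed in this thread.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical flag that the host sets when the user asks to quit
   (the role C-g plays for Lisp code).  */
static volatile sig_atomic_t quit_requested;

/* Sum the integers below N, polling the flag every 1024 iterations.
   Returns 0 and stores the sum on completion, or -1 if interrupted.
   The function gives up by ordinary return, never by longjmp.  */
static int
sum_with_quit (unsigned long n, unsigned long *result)
{
  unsigned long acc = 0;
  for (unsigned long i = 0; i < n; i++)
    {
      if ((i & 1023) == 0 && quit_requested)
        return -1;
      acc += i;
    }
  *result = acc;
  return 0;
}
```

The polling interval trades responsiveness against overhead; a fixed iteration count like this says nothing about wall-clock time, so it does not by itself meet a deadline such as 100 ms.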
> Even if we
> assume "benign UB" (which is dangerous because UB tends to become more
> malign as compilers evolve), it will cause internal state to become
> inconsistent, which is truly disastrous.
Emacs has been relying on this sort of "benign UB" for years, in areas just as
disaster-prone as C++ cleanup, and it has worked well enough. I don't see why
C++ would be any different. Typically C++ cleanup is just recovering memory, so
we'll have a memory leak; big deal.
> handle_sigsegv already has several cases where the longjmp is skipped: when
> in a GC, when the signal is received from the wrong thread, and when it's
> not guaranteed to be a SO. In those cases Emacs aborts (still autosaving).
1. You're right about the GC, but that aspect of GC is considered to be a bug
(it's called "hard GC error" in the handle_sigsegv comment) and should not be
precedent for us installing similar bugs elsewhere.
2. The wrong-thread check is because we are attempting to detect stack overflow
only within Emacs (which can occur due to a user program error, which should not
crash Emacs); stack-overflow errors in other threads are considered to be
internal errors in Emacs and indeed never should happen, so it's OK to treat
them as fatal errors (I expect this should be true even if the other threads
were created by a module).
> Why can't we do the same if SO is detected while a module function is
> active?
It might be reasonable to do that if the stack overflow were entirely due to the
module -- that'd be like stack overflow in some other thread created by the
module. But it would not work when Lisp code almost overflows the stack, then
calls the module code, which overflows; in this case Emacs needs to recover
rather than aborting.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-23 17:09 ` Paul Eggert
@ 2015-12-23 17:18 ` Daniel Colascione
2015-12-24 2:51 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-23 17:18 UTC (permalink / raw)
To: Paul Eggert, Philipp Stephani, Eli Zaretskii
Cc: aurelien.aptel+emacs, tzz, emacs-devel
On 12/23/2015 09:09 AM, Paul Eggert wrote:
> Philipp Stephani wrote:
>>>> We could merely require that any module needing recursion must call a
>>> new stack-size-checking function at each level of recursion, in order to
>>> detect stack overflow.
>>>
>>> That's a performance penalty. I don't think we should incur that, on
>>> behalf of a problem that really should "never" happen.
>>>
>>
>> It's also impossible. Modules will be written in arbitrary languages and
>> make use of arbitrary libraries, some of which might not even know about
>> Emacs's existence. We cannot impose any constraints on module code.
>
> There's also the issue that other languages may have their own
> assumptions about how stack overflow is detected, assumptions that may
> disagree with Emacs's method or even prevent Emacs's method from
> working. My own preference also is to rely on the usual VM techniques
> for ordinary C-code modules, and hope that other languages don't get in
> the way. Still, it may not be possible to do that, and we may be forced
> to impose a bit of software overflow-checking on modules implemented via
> recursion, for module authors who care about reliability.
>
> Does anybody know how C++ and/or Java and/or Python modules detect and
> recover from stack overflow on GNU/Linux? That would be a reasonable
> sanity check here. (And if nobody knows, that's not a good sign....)
Python uses explicit runtime checks, IIRC. HotSpot uses a guard page.
C++ is just the C ABI and is indifferent to stack overflow. Most C-ABI
programs treat stack overflow as another variety of fatal memory error,
and I think that's the right approach. I don't see why Emacs needs to be
special here.
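To illustrate the "explicit runtime check" style, here is a small sketch of a recursive function that counts its own depth and refuses to go past a fixed limit. It is a generic illustration, not a description of how any particular runtime implements its check; MAX_DEPTH, struct node and chain_length are names invented for the example.

```c
#include <stddef.h>

#define MAX_DEPTH 1000   /* illustrative limit, not taken from any runtime */

struct node
{
  struct node *child;
};

/* Return the length of the chain starting at N, or -1 if following it
   would take the recursion deeper than MAX_DEPTH levels.  The caller
   passes 0 for DEPTH.  */
static long
chain_length (const struct node *n, int depth)
{
  if (n == NULL)
    return 0;
  if (depth >= MAX_DEPTH)
    return -1;               /* report "overflow" instead of crashing */
  long rest = chain_length (n->child, depth + 1);
  return rest < 0 ? -1 : rest + 1;
}
```

The check costs one comparison per call and only protects code that was written to perform it, which is exactly the limitation raised above for unmodified third-party libraries.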
>>>> Also, any module with an unbounded amount of computation should call
>>>> the
>>> equivalent of QUIT every now and then. If the module API doesn't let (or
>>> ideally, require) modules to do that now, it should. Otherwise it's an
>>> Emacs freeze waiting to happen.
>>>
>>> I agree, but I think it's unrelated to stack overflow. (Though I
>>> think we will need to have a module-specific version of QUIT, as long
>>> as we keep the constraint of not directly longjmping from module
>>> code.)
>>>
>>
>> There's no harm in providing such a functionality (but of course we can't
>> enforce its use).
>
> We *could* enforce the use by requiring at least one call to QUIT every
> (say) 100 ms, and by doing the equivalent of 3x C-g when the time limit
> is exceeded. That'd be useful even without modules.
That's grossly unacceptable. Individual page faults (over which programs
have little control) can take longer than that.
Besides, modifying all code to fit into Emacs' idiosyncratic model of
stack overflow detection is unreasonable. Modules exist in large part to
call into pre-existing, unmodified libraries.
>> Even if we
>> assume "benign UB" (which is dangerous because UB tends to become more
>> malign as compilers evolve), it will cause internal state to become
>> inconsistent, which is truly disastrous.
>
> Emacs has been relying on this sort of "benign UB" for years, in areas
> just as disaster-prone as C++ cleanup, and it has worked well enough. I
> don't see why C++ would be any different. Typically C++ cleanup is just
> recovering memory, so we'll have a memory leak; big deal.
Emacs is a monolith and has been written with non-local returns in mind.
Most programs are not. The existing scheme is completely unsafe; you can
do a lot worse than leak. Please stop repeating the false idea that
longjmp from arbitrary points in the program to toplevel is harmless.
>
>> handle_sigsegv already has several cases where the longjmp is skipped:
>> when
>> in a GC, when the signal is received from the wrong thread, and when it's
>> not guaranteed to be a SO. In those cases Emacs aborts (still
>> autosaving).
>
> 1. You're right about the GC, but that aspect of GC is considered to be
> a bug (it's called "hard GC error" in the handle_sigsegv comment) and
> should not be precedent for us installing similar bugs elsewhere.
>
> 2. The wrong-thread check is because we are attempting to detect stack
> overflow only within Emacs (which can occur due to a user program error,
> which should not crash Emacs); stack-overflow errors in other threads
> are considered to be internal errors in Emacs and indeed never should
> happen, so it's OK to treat them as fatal errors (I expect this should
> be true even if the other threads were created by a module).
>
>> Why can't we do the same if SO is detected while a module function is
>> active?
>
> It might be reasonable to do that if the stack overflow were entirely
> due to the module -- that'd be like stack overflow in some other thread
> created by the module. But it would not work when Lisp code almost
> overflows the stack, then calls the module code, which overflows; in
> this case Emacs needs to recover rather than aborting.
The _only_ thing Emacs can safely do is abort. The right approach is to
make sure aborting minimizes data loss. There is no reason for Emacs to
be more concerned than other programs about this rare error case.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 16:25 ` Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) Daniel Colascione
@ 2015-12-23 17:30 ` Eli Zaretskii
2015-12-23 17:41 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-23 17:30 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Wed, 23 Dec 2015 08:25:51 -0800
>
> We can alloca, say, 8MB, and write to the start and end of the allocated
> region.
How do you know the alloca won't trigger stack overflow?
> Then we'll know we have at least that much stack space available.
At that point, yes. But you need to know that at many other points,
when some of the stack is already used up.
> > I simply don't see any trouble this could cause, except leaking some
> > memory. Can you describe in enough detail a single use case where
> > this could have any other adverse effects that we should care about
> > when recovering from stack overflow?
>
> What happens if we overflow inside malloc? One possibility is that we'll
> longjmp back to toplevel without releasing the heap lock, then deadlock
> the next time we try to allocate.
I very much doubt anything like that can happen. A malloc
implementation which behaves like that won't last long. Lots of C
programs longjmp from signal handlers, so interrupting malloc with,
say, SIGINT, must work. I think even Emacs did something like that in
the past, at least on a TTY, where C-g triggers SIGINT.
> >> We have a program that has its own Lisp runtime, has its own memory
> >> allocation system, uses its own virtual filesystem access layer, and
> >> that brings itself back from the dead. We're well past replicating OS
> >> functionality.
> >
> > Actually, most of the above is simply untrue: we use system allocators
> > to allocate memory
>
> We have internal allocators for strings and conses and use the system
> allocator only for backing storage.
On some systems. Not on all of them.
> , and if by "bringing itself from the dead" you allude to
> > unexec, then what it does is a subset of what every linker does,
> > hardly an OS stuff.
>
> Granted, that's toolchain work, not "OS" work, but it's still outside
> the domain of most text editors.
Sure. But a linker is still an application that reads and writes
files. It doesn't futz with OS-level features like page protection
and processor exceptions.
> > Emacs is not safety-critical software. It doesn't need to be "safe"
> > by your definition, if I understand it correctly.
>
> It's not safety-critical software, but undefined behavior is undefined.
> What makes us confident that we can't corrupt buffer data by longjmping
> from the wrong place?
Nothing makes us confident. Recovery from stack overflow is not
guaranteed to work in all cases. But if it works in some of them, it
is already better than always crashing, IMO.
> Anything can happen because we can longjmp from anywhere.
Yes. But if we hit a stack overflow, we are already in deep trouble.
> What if we just installed a SIGSEGV handler (or, on Windows, a vectored
> exception handler) that wrote buffer contents to a special file on a
> fatal signal, then allowed that fatal signal to propagate normally?
I presume you mean auto-save, not save.
We could try calling shut_down_emacs from the signal handler, but I'm
not sure if the small alternate stack will be enough for write-region.
Something to investigate, I guess.
> The next time Emacs starts, we can restore the buffers we've saved
> this way and ask users to save them --- just like autosave, but done
> on-demand, at crash time, in C code, on the alternate signal stack.
Why "like autosave"? What will be different from actually
auto-saving? shut_down_emacs does that automatically.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 17:30 ` Eli Zaretskii
@ 2015-12-23 17:41 ` Daniel Colascione
2015-12-23 17:55 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-23 17:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/23/2015 09:30 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Wed, 23 Dec 2015 08:25:51 -0800
>>
>> We can alloca, say, 8MB, and write to the start and end of the allocated
>> region.
>
> How do you know the alloca won't trigger stack overflow?
We don't know that, but at program startup, we have no data to lose. How
do you know Emacs BSS requirements won't run the system out of memory?
>> Then we'll know we have at least that much stack space available.
>
> At that point, yes. But you need to know that at many other points,
> when some of the stack is already used up.
Sure. But now Emacs can ask itself, "do I have at least X KB of stack
space available?", and if the answer is "no", signal if Y KB of stack is
available (Y<X, of course), or abort if not.
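The decision itself is simple once a "bytes of stack remaining" figure is available. The sketch below shows only that decision; how to measure the remaining stack is platform-specific and deliberately left out, and the names stack_action and check_stack_budget are hypothetical.

```c
#include <stddef.h>

enum stack_action
{
  STACK_OK,       /* at least X bytes left: carry on */
  STACK_SIGNAL,   /* fewer than X but at least Y: signal a Lisp error */
  STACK_ABORT     /* fewer than Y: not even enough to signal safely */
};

/* REMAINING is the number of stack bytes believed to be left, NEED is
   X and RESERVE is Y from the text above, with RESERVE < NEED.  */
static enum stack_action
check_stack_budget (size_t remaining, size_t need, size_t reserve)
{
  if (remaining >= need)
    return STACK_OK;
  if (remaining >= reserve)
    return STACK_SIGNAL;
  return STACK_ABORT;
}
```

With X = 64 KB and Y = 16 KB, for instance, 32 KB of remaining stack would yield STACK_SIGNAL and 8 KB would yield STACK_ABORT.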
>>> I simply don't see any trouble this could cause, except leaking some
>>> memory. Can you describe in enough detail a single use case where
>>> this could have any other adverse effects that we should care about
>>> when recovering from stack overflow?
>>
>> What happens if we overflow inside malloc? One possibility is that we'll
>> longjmp back to toplevel without releasing the heap lock, then deadlock
>> the next time we try to allocate.
>
> I very much doubt anything like that can happen. A malloc
> implementation which behaves like that won't last long. Lots of C
> programs longjmp from signal handlers, so interrupting malloc with,
> say, SIGINT, must work. I think even Emacs did something like that in
> the past, at least on a TTY, where C-g triggers SIGINT.
These programs are all unsafe. If they work, it's by luck alone. In
fact, it's not possible to write a malloc that behaves the way you'd
like, since malloc can legitimately take locks, and the system provides
no way to release them on non-local exit from a signal handler.
You're essentially claiming that programs using pthread_mutex_lock won't
last long. There are a few existence proofs here and there to the contrary.
The problem isn't limited to locks. Malloc could be in the middle of
updating internal data structures when you longjmp out of it. The next
allocation could scribble over arbitrary memory.
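The lock half of this argument can be shown in a few lines. In the sketch below, heap_lock is a stand-in for an allocator's internal lock and fake_alloc_overflowing models "overflow inside malloc" with a plain longjmp instead of a real signal; both names are hypothetical, and no real malloc is involved.

```c
#include <errno.h>
#include <pthread.h>
#include <setjmp.h>

/* Stand-in for an allocator's internal lock.  */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static jmp_buf toplevel;

/* Pretend allocator: takes the lock, then "overflows the stack", which
   is modeled here by jumping straight back to toplevel.  */
static void
fake_alloc_overflowing (void)
{
  pthread_mutex_lock (&heap_lock);
  longjmp (toplevel, 1);
  /* The matching pthread_mutex_unlock would go here; it is never
     reached.  */
}

/* Return what pthread_mutex_trylock reports once control is back at
   toplevel after the non-local exit.  */
static int
lock_state_after_longjmp (void)
{
  if (setjmp (toplevel) == 0)
    fake_alloc_overflowing ();
  return pthread_mutex_trylock (&heap_lock);
}
```

On a default (non-recursive) mutex the trylock reports EBUSY: the lock is still held and nothing is left that knows to release it. A blocking pthread_mutex_lock at that point would simply hang, which is the deadlock described above.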
>>>> We have a program that has its own Lisp runtime, has its own memory
>>>> allocation system, uses its own virtual filesystem access layer, and
>>>> that brings itself back from the dead. We're well past replicating OS
>>>> functionality.
>>>
>>> Actually, most of the above is simply untrue: we use system allocators
>>> to allocate memory
>>
>> We have internal allocators for strings and conses and use the system
>> allocator only for backing storage.
>
> On some systems. Not on all of them.
>
>> , and if by "bringing itself from the dead" you allude to
>>> unexec, then what it does is a subset of what every linker does,
>>> hardly an OS stuff.
>>
>> Granted, that's toolchain work, not "OS" work, but it's still outside
>> the domain of most text editors.
>
> Sure. But a linker is still an application that reads and writes
> files. It doesn't futz with OS-level features like page protection
> and processor exceptions.
What's so scary about page protection? I've yet to see a coherent
argument for why we shouldn't take advantage of the facility where it's
available.
>>> Emacs is not safety-critical software. It doesn't need to be "safe"
>>> by your definition, if I understand it correctly.
>>
>> It's not safety-critical software, but undefined behavior is undefined.
>> What makes us confident that we can't corrupt buffer data by longjmping
>> from the wrong place?
>
> Nothing makes us confident. Recovery from stack overflow is not
> guaranteed to work in all cases. But if it works in some of them, it
> is already better than always crashing, IMO.
Why? If we can prevent data loss, I'd rather reliably crash than enter
some frankenstate where anything can happen.
>> Anything can happen because we can longjmp from anywhere.
>
> Yes. But if we hit a stack overflow, we are already in deep trouble.
And it's because we're in deep trouble that we should kill the program
as quickly as possible.
>> What if we just installed a SIGSEGV handler (or, on Windows, a vectored
>> exception handler) that wrote buffer contents to a special file on a
>> fatal signal, then allowed that fatal signal to propagate normally?
>
> I presume you mean auto-save, not save.
>
> We could try calling shut_down_emacs from the signal handler, but I'm
> not sure if the small alternate stack will be enough for write-region.
> Something to investigate, I guess.
We can make the alternate signal stack as large as we want.
>> The next time Emacs starts, we can restore the buffers we've saved
>> this way and ask users to save them --- just like autosave, but done
>> on-demand, at crash time, in C code, on the alternate signal stack.
>
> Why "like autosave"? What will be different from actually
> auto-saving? shut_down_emacs does that automatically.
>
Er, yes, I noticed after I wrote the email that we already do what
I propose, more or less. In this case, we don't lose very much by just
deleting the stack overflow code and relying on autosave.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 17:41 ` Daniel Colascione
@ 2015-12-23 17:55 ` Eli Zaretskii
2015-12-23 17:56 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-23 17:55 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Wed, 23 Dec 2015 09:41:20 -0800
>
> > We could try calling shut_down_emacs from the signal handler, but I'm
> > not sure if the small alternate stack will be enough for write-region.
> > Something to investigate, I guess.
>
> We can make the alternate signal stack as large as we want.
Not as large as is safe to run arbitrary Lisp.
> >> The next time Emacs starts, we can restore the buffers we've saved
> >> this way and ask users to save them --- just like autosave, but done
> >> on-demand, at crash time, in C code, on the alternate signal stack.
> >
> > Why "like autosave"? What will be different from actually
> > auto-saving? shut_down_emacs does that automatically.
> >
>
> Er, yes, I noticed after I wrote the email that we already do what
> I propose, more or less. In this case, we don't lose very much by just
> deleting the stack overflow code and relying on autosave.
Auto-save will only save buffers that visit files.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 17:55 ` Eli Zaretskii
@ 2015-12-23 17:56 ` Daniel Colascione
2015-12-23 18:09 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-23 17:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/23/2015 09:55 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Wed, 23 Dec 2015 09:41:20 -0800
>>
>>> We could try calling shut_down_emacs from the signal handler, but I'm
>>> not sure if the small alternate stack will be enough for write-region.
>>> Something to investigate, I guess.
>>
>> We can make the alternate signal stack as large as we want.
>
> Not as large as is safe to run arbitrary Lisp.
Then don't run arbitrary lisp after we've segfaulted.
>
>>>> The next time Emacs starts, we can restore the buffers we've saved
>>>> this way and ask users to save them --- just like autosave, but done
>>>> on-demand, at crash time, in C code, on the alternate signal stack.
>>>
>>> Why "like autosave"? What will be different from actually
>>> auto-saving? shut_down_emacs does that automatically.
>>>
>>
>> Er, yes, I noticed after I wrote the email that we already do what
>> I propose, more or less. In this case, we don't lose very much by just
>> deleting the stack overflow code and relying on autosave.
>
> Auto-save will only save buffers that visit files.
>
A minimal, C-only autosave run in the signal-handling path could save
all buffer content.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 17:56 ` Daniel Colascione
@ 2015-12-23 18:09 ` Eli Zaretskii
2015-12-23 18:19 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-23 18:09 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Wed, 23 Dec 2015 09:56:31 -0800
>
> >> We can make the alternate signal stack as large as we want.
> >
> > Not as large as is safe to run arbitrary Lisp.
>
> Then don't run arbitrary lisp after we've segfaulted.
It's out of your control.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 18:09 ` Eli Zaretskii
@ 2015-12-23 18:19 ` Daniel Colascione
2015-12-23 18:45 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-23 18:19 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/23/2015 10:09 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Wed, 23 Dec 2015 09:56:31 -0800
>>
>>>> We can make the alternate signal stack as large as we want.
>>>
>>> Not as large as is safe to run arbitrary Lisp.
>>
>> Then don't run arbitrary lisp after we've segfaulted.
>
> It's out of your control.
>
No it isn't. We don't have to run the generic auto-save logic: in fact,
we probably shouldn't run arbitrary lisp, because a fatal signal
indicates that the process is in a bad state. Instead, if we really want
to minimize the possibility of data loss, we should use a pure-C
autosave system directly from the crash handler, not longjmp from
arbitrary parts of the program to toplevel.
The other option is to use a guard page: on stack overflow, unprotect
the guard page (allowing program execution to proceed normally for a
little while longer --- again, no longjmp), Fsignal at the next
opportunity to QUIT, invoke out_of_memory after the signal, and let
users save at that point.
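To make the guard-page option concrete, here is a minimal POSIX sketch. It is only an illustration under stated assumptions: every name in it (reserve_with_guard, install_guard_handler, overflow_was_signalled) is hypothetical and nothing like it exists in Emacs today; it protects a separately mapped region standing in for the real stack; and it calls mprotect from the handler, which POSIX does not list as async-signal-safe even though it is commonly used this way. Real code would also need sigaltstack so the handler has somewhere to run after a genuine stack overflow.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of these exist in Emacs.  */
static char *guard_start;                       /* first byte of the guard page(s) */
static size_t guard_len;                        /* size of the guard, in bytes */
static volatile sig_atomic_t overflow_pending;  /* polled at the next QUIT */

static void guard_fault_handler(int sig, siginfo_t *info, void *ctx)
{
  (void) ctx;
  uintptr_t addr = (uintptr_t) info->si_addr;
  uintptr_t lo = (uintptr_t) guard_start;

  if (guard_start != NULL && addr >= lo && addr - lo < guard_len
      && mprotect(guard_start, guard_len, PROT_READ | PROT_WRITE) == 0) {
    /* The fault hit the guard: open it up, remember that it happened,
       and return so the faulting instruction is retried.  No longjmp.  */
    overflow_pending = 1;
    return;
  }
  /* Not our fault address: restore the default action and return, so
     the retried instruction crashes the process in the normal way.  */
  signal(sig, SIG_DFL);
}

/* Map USABLE_LEN + GUARD_BYTES and make the lowest GUARD_BYTES
   inaccessible.  GUARD_BYTES must be a multiple of the page size.  */
char *reserve_with_guard(size_t usable_len, size_t guard_bytes)
{
  size_t total = usable_len + guard_bytes;
  char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return NULL;
  if (mprotect(base, guard_bytes, PROT_NONE) != 0) {
    munmap(base, total);
    return NULL;
  }
  guard_start = base;
  guard_len = guard_bytes;
  overflow_pending = 0;
  return base;
}

/* Install the handler for SIGSEGV and SIGBUS (some systems report
   protection faults as SIGBUS).  Returns 0 on success.  */
int install_guard_handler(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = guard_fault_handler;
  sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, NULL) != 0)
    return -1;
  return sigaction(SIGBUS, &sa, NULL);
}

/* What a QUIT-like check would poll before signalling a Lisp error.  */
int overflow_was_signalled(void)
{
  return overflow_pending != 0;
}
```

The first touch of the guard flips the flag and then proceeds normally; the code that polls overflow_was_signalled() is where the Lisp-level error would be raised.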
You're against installing a guard page because it's something the OS
does. I don't see how that's relevant, since the OS gives us APIs to
achieve exactly the behavior we want. The other objection to using a
guard page is that it requires determining in advance the maximum amount
of stack space we want to give Emacs, but if we limit it to 8MB on
Windows, 8MB of stack space should be enough anywhere.
Regardless, the current mechanism does not achieve its goal. It's
utterly unsafe even without module code added to the mix. A mechanism
that invokes arbitrary undefined behavior is *worse* than useless.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 18:19 ` Daniel Colascione
@ 2015-12-23 18:45 ` Eli Zaretskii
2015-12-24 3:26 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-23 18:45 UTC (permalink / raw)
To: Daniel Colascione
Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Wed, 23 Dec 2015 10:19:15 -0800
>
> >>>> We can make the alternate signal stack as large as we want.
> >>>
> >>> Not as large as is safe to run arbitrary Lisp.
> >>
> >> Then don't run arbitrary lisp after we've segfaulted.
> >
> > It's out of your control.
>
> No it isn't. We don't have to run the generic auto-save logic: in fact,
> we probably shouldn't run arbitrary lisp, because a fatal signal
> indicates that the process is in a bad state. Instead, if we really want
> to minimize the possibility of data loss, we should use a pure-C
> autosave system directly from the crash handler, not longjmp from
> arbitrary parts of the program to toplevel.
auto-save is implemented in C anyway. But it calls functions that
might call Lisp out of your control. We attempt to disable that when
in emergency shutdown, but it's not bullet-proof.
And there still is a problem of buffers that don't visit files.
> The other option is to use a guard page: on stack overflow, unprotect
> the guard page (allowing program execution to proceed normally for a
> little while longer --- again, no longjmp), Fsignal at the next
> opportunity to QUIT, invoke out_of_memory after the signal, and let
> users save at that point.
The guard page is too small for any serious code.
> Regardless, the current mechanism does not achieve its goal.
Of course, it does.
> A mechanism that invokes arbitrary undefined behavior is *worse*
> than useless.
I cannot disagree more.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-23 17:18 ` Daniel Colascione
@ 2015-12-24 2:51 ` Paul Eggert
2015-12-24 3:11 ` Daniel Colascione
2015-12-24 16:10 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Paul Eggert @ 2015-12-24 2:51 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Emacs Development
Daniel Colascione wrote:
> Python uses explicit runtime checks, IIRC. HotSpot uses a guard page.
This suggests that Emacs could profitably use either approach to detecting stack
overflow.
>> We *could* enforce the use by requiring at least one call to QUIT every
>> (say) 100 ms, and by doing the equivalent of 3x C-g when the time limit
>> is exceeded. That'd be useful even without modules.
>
> That's grossly unacceptable. Individual page faults (over which programs
> have little control) can take longer than that.
We are straying into a different topic here, but this is a problem that needs to
be addressed. If 100 ms is too small, make it 1 s, or enable the timeout only on
C-g (C-g C-g C-g already can cause longjmp from arbitrary code, so this isn't
much of a stretch). The point is that Emacs should not freeze indefinitely.
> Besides, modifying all code to fit into Emacs' idiosyncratic model of
> stack overflow detection is unreasonable.
There should be no need to modify library code. We should be able to get this to
work by having the library wrapper deal with the issue one way or another.
> Please stop repeating the false idea that
> longjmp from arbitrary points in the program to toplevel is harmless.
Neither Eli nor I have said it's harmless. Merely that it works well enough in
practice. Let's not make perfection the enemy of functionality.
> the current mechanism does not achieve its goal. It's
> utterly unsafe even without module code added to the mix.
It's safe enough in practice. You're right that in *theory* it's utterly unsafe,
but Emacs is a practical program not a theoretical exercise.
Really, the idea that we'll let Emacs crash on stack overflow (merely because
modules are being used) is a non-starter. We need a better solution.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 2:51 ` Paul Eggert
@ 2015-12-24 3:11 ` Daniel Colascione
2015-12-24 16:10 ` Eli Zaretskii
1 sibling, 0 replies; 177+ messages in thread
From: Daniel Colascione @ 2015-12-24 3:11 UTC (permalink / raw)
To: Paul Eggert; +Cc: Emacs Development
On 12/23/2015 06:51 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>
>> Python uses explicit runtime checks, IIRC. HotSpot uses a guard page.
>
> This suggests that Emacs could profitably use either approach to
> detecting stack overflow.
For managed code, yes. That's the key distinction that you're missing:
we can prevent runaway stack use in *lisp* code. Doing it for C code is
a mind-bogglingly bad idea, especially given the fundamental
misunderstanding of the dangers involved expressed on this thread. (The
idea that "the worst we can do is leak" reflects a dangerous
misunderstanding of what can go wrong.)
>
>>> We *could* enforce the use by requiring at least one call to QUIT every
>>> (say) 100 ms, and by doing the equivalent of 3x C-g when the time limit
>>> is exceeded. That'd be useful even without modules.
>>
>> That's grossly unacceptable. Individual page faults (over which programs
>> have little control) can take longer than that.
>
> We are straying into a different topic here, but this is a problem that
> needs to be addressed. If 100 ms is too small, make it 1 s, or enable
> the timeout only on C-g (C-g C-g C-g already can cause longjmp from
> arbitrary code, so this isn't much of a stretch). The point is that
> Emacs should not freeze indefinitely.
So, what, we should longjmp out of pthread_mutex_lock if we think it's
taking too long? You can't arbitrarily break program semantics this way,
especially if we're going to run third-party code via modules. Where are
you getting this idea that we need to provide hacky, broken, and
unnecessary crash prevention facilities that other programs don't
(because they're all bad ideas)?
>
>> Besides, modifying all code to fit into Emacs' idiosyncratic model of
>> stack overflow detection is unreasonable.
>
> There should be no need to modify library code. We should be able to get
> this to work by having the library wrapper deal with the issue one way
> or another.
>
>> Please stop repeating the false idea that
>> longjmp from arbitrary points in the program to toplevel is harmless.
>
> Neither Eli nor I have said it's harmless. Merely that it works well
> enough in practice. Let's not make perfection the enemy of functionality.
I've already explained the correct way to avoid data loss. Keeping a
damaged Emacs instance alive does nobody any good. You're willfully
invoking undefined behavior to achieve something that's not really
necessary.
>
>> the current mechanism does not achieve its goal. It's
>> utterly unsafe even without module code added to the mix.
>
> It's safe enough in practice. You're right that in *theory* it's utterly
> unsafe, but Emacs is a practical program not a theoretical exercise.
>
> Really, the idea that we'll let Emacs crash on stack overflow (merely
> because modules are being used) is a non-starter. We need a better
> solution.
Next, we'll be talking about some way to make Emacs not crash on NULL
pointer dereference. This line of reasoning is very scary. We should
prevent *managed* code from crashing at all costs. Trying to impose the
same safety guarantees on C code will do more harm than good. There's a
very good reason other programs don't try to do it. Emacs is alone in
this very bad idea.
* Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2015-12-23 18:45 ` Eli Zaretskii
@ 2015-12-24 3:26 ` Daniel Colascione
0 siblings, 0 replies; 177+ messages in thread
From: Daniel Colascione @ 2015-12-24 3:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, p.stephani2, eggert, tzz, emacs-devel
On 12/23/2015 10:45 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, aurelien.aptel+emacs@gmail.com,
>> p.stephani2@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Wed, 23 Dec 2015 10:19:15 -0800
>>
>>>>>> We can make the alternate signal stack as large as we want.
>>>>>
>>>>> Not as large as is safe to run arbitrary Lisp.
>>>>
>>>> Then don't run arbitrary lisp after we've segfaulted.
>>>
>>> It's out of your control.
>>
>> No it isn't. We don't have to run the generic auto-save logic: in fact,
>> we probably shouldn't run arbitrary lisp, because a fatal signal
>> indicates that the process is in a bad state. Instead, if we really want
>> to minimize the possibility of data loss, we should use a pure-C
>> autosave system directly from the crash handler, not longjmp from
>> arbitrary parts of the program to toplevel.
>
> auto-save is implemented in C anyway. But it calls functions that
> might call Lisp out of your control. We attempt to disable that when
> in emergency shutdown, but it's not bullet-proof.
>
> And there still is a problem of buffers that don't visit files.
So make it bullet-proof and very dumb: add a bit of C code that visits
all buffers and writes their contents to a file we've pre-opened (so we
have a file descriptor handy). On the next startup, read that file and
restore the buffers.
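As a rough illustration of how dumb this can be, here is a POSIX sketch. It is a sketch under assumptions, not Emacs code: all names (write_all, install_crash_saver, crash_text) are hypothetical, and a single flat string stands in for "all buffer contents". The handler runs on an alternate signal stack, uses only write, signal and raise, and then lets the fatal signal take its default action.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of these exist in Emacs.  */
static int crash_fd = -1;       /* opened long before any crash */
static const char *crash_text;  /* stand-in for the buffer contents */
static size_t crash_len;

/* Write LEN bytes using only write(2), which is async-signal-safe.
   Returns 0 on success, -1 on any error (including EINTR).  */
int write_all(int fd, const char *buf, size_t len)
{
  while (len > 0) {
    ssize_t n = write(fd, buf, len);
    if (n <= 0)
      return -1;
    buf += n;
    len -= (size_t) n;
  }
  return 0;
}

static void crash_handler(int sig)
{
  if (crash_fd >= 0)
    write_all(crash_fd, crash_text, crash_len);
  /* Restore the default action and re-raise, so the process still
     dies from the original signal.  */
  signal(sig, SIG_DFL);
  raise(sig);
}

/* Pre-open PATH, set up an alternate stack of ALTSTACK_SIZE bytes, and
   install the SIGSEGV handler on it.  Returns 0 on success.  */
int install_crash_saver(const char *path, const char *text, size_t len,
                        size_t altstack_size)
{
  crash_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (crash_fd < 0)
    return -1;
  crash_text = text;
  crash_len = len;

  stack_t ss;
  memset(&ss, 0, sizeof ss);
  ss.ss_sp = malloc(altstack_size);
  ss.ss_size = altstack_size;
  if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) != 0)
    return -1;

  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = crash_handler;
  sa.sa_flags = SA_ONSTACK;     /* run the handler on the alternate stack */
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, NULL);
}
```

Because the file descriptor is opened at startup, the handler itself never needs open or malloc; what it can safely read from a possibly corrupted heap is a separate question this sketch does not answer.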
I don't think that measure is necessary though, since we already deal
with stack overflow of Lisp in other ways. What convinces you that stack
overflow of C code is a real problem?
>> The other option is to use a guard page: on stack overflow, unprotect
>> the guard page (allowing program execution to proceed normally for a
>> little while longer --- again, no longjmp), Fsignal at the next
>> opportunity to QUIT, invoke out_of_memory after the signal, and let
>> users save at that point.
>
> The guard page is too small for any serious code.
It depends on how many of them you want to have. After all, until
they're used, they consume only address space. (That's true on Windows
as well.)
>> Regardless, the current mechanism does not achieve its goal.
>
> Of course, it does.
It does not.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 2:51 ` Paul Eggert
2015-12-24 3:11 ` Daniel Colascione
@ 2015-12-24 16:10 ` Eli Zaretskii
2015-12-24 17:04 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-24 16:10 UTC (permalink / raw)
To: Paul Eggert; +Cc: dancol, Emacs-devel
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 23 Dec 2015 18:51:23 -0800
> Cc: Emacs Development <Emacs-devel@gnu.org>
>
> > Please stop repeating the false idea that
> > longjmp from arbitrary points in the program to toplevel is harmless.
>
> Neither Eli nor I have said it's harmless. Merely that it works well enough in
> practice. Let's not make perfection the enemy of functionality.
Right.
> > the current mechanism does not achieve its goal. It's
> > utterly unsafe even without module code added to the mix.
>
> It's safe enough in practice. You're right that in *theory* it's utterly unsafe,
> but Emacs is a practical program not a theoretical exercise.
>
> Really, the idea that we'll let Emacs crash on stack overflow (merely because
> modules are being used) is a non-starter. We need a better solution.
100% agreement.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 16:10 ` Eli Zaretskii
@ 2015-12-24 17:04 ` Daniel Colascione
2015-12-24 17:17 ` John Wiegley
2015-12-24 17:36 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2015-12-24 17:04 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert; +Cc: Emacs-devel
On 12/24/2015 08:10 AM, Eli Zaretskii wrote:
>> From: Paul Eggert <eggert@cs.ucla.edu>
>> Date: Wed, 23 Dec 2015 18:51:23 -0800
>> Cc: Emacs Development <Emacs-devel@gnu.org>
>>
>>> Please stop repeating the false idea that
>>> longjmp from arbitrary points in the program to toplevel is harmless.
>>
>> Neither Eli nor I have said it's harmless. Merely that it works well enough in
>> practice. Let's not make perfection the enemy of functionality.
>
> Right.
>
>> > the current mechanism does not achieve its goal. It's
>> > utterly unsafe even without module code added to the mix.
>>
>> It's safe enough in practice. You're right that in *theory* it's utterly unsafe,
>> but Emacs is a practical program not a theoretical exercise.
>>
>> Really, the idea that we'll let Emacs crash on stack overflow (merely because
>> modules are being used) is a non-starter. We need a better solution.
>
> 100% agreement.
>
You'd prefer Emacs to lock up or corrupt data instead?
Neither you nor Paul have addressed any of the alternatives to this
longjmp-from-anywhere behavior. You have not addressed the point that
Emacs can crash fatally in numerous ways having nothing to do with stack
overflow. You have not addressed the point that we already have robust
stack overflow protection at the Lisp level, and so don't need
additional workarounds at the C level. You have not even provided any
evidence that C-level stack overflow is a problem worth solving.
All I see is an insistence that the longjmp hack stay because
"Emacs must not crash", even though it demonstrably does crash in
numerous exciting ways, and won't stop any time soon, because real
programs always have bugs, and experience shows that failing quickly
(trying to preserve data) is better than trying to limp along, because
that just makes the situation worse.
I know the rebuttal to that last point is that the perfect shouldn't be
the enemy of the good: believe me, I've debugged enough crashes and
hangs caused by well-intentioned crash recovery code to know that
invoking undefined behavior to recover from a crash is far below "good"
on the scale of things you can do to improve program reliability.
There is a good reason that no other program --- not other text editors
[1], not other VMs [2], not web browsers [3], not GCC [4], not GDB [5]
--- uses the completely unsafe mechanism Emacs currently uses to react
to stack overflow. (If such programs exist, I haven't seen them.) Most
programs, in fact, don't bother trying to recover from stack overflow,
because most of the time, in practice, their stack use is bounded
statically.
Let me detail a *safe*, *effective* alternative one more time. If you
really want to make lisp-induced stack overflow less likely, here is how
you do it:
1) Using some mechanism (alloca will work, although OS-specific options
exist), make sure you have X MB of address space dedicated to the main
thread on startup. At this point, we cannot lose data, and failing to
obtain this address space is both unlikely and as harmful as failing to
obtain space for Emacs BSS.
2) Now we know the addresses of the top and bottom of the stack.
3) Each time Lisp calls into C, each time a module calls into the
Emacs core, and on each QUIT, subtract the current stack pointer from
the top of the stack. The result is a lower bound on the amount of stack
space available. This computation is very cheap: it's one load from
global storage or TLS and a subtract instruction.
4) If the amount of stack space available is less than some threshold,
say Y, signal a stack exhaustion error.
5) Require that C code (modules included) do not use more than Y MB of
stack space between QUITs or calls to the module API
6) Set Y to a reasonable figure like 4MB. Third-party libraries must
already be able to run in bounded stack space because they're usually
designed to run off the main thread, and on both Windows and POSIX
systems, non-main thread stacks are sized on thread startup and cannot grow.
I have no idea why we would prefer the SIGSEGV trap approach to
the scheme I just outlined.
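Steps 2 through 4 could look roughly like the sketch below. The names are hypothetical and not part of Emacs, and it assumes that the address of a local variable is a good enough approximation of the stack pointer. It measures how much stack has been used since startup and compares what remains of the budget X against the threshold Y, which amounts to the same check as measuring the distance to the far end of the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in Emacs.  */
static uintptr_t stack_base;    /* address recorded near the start of the stack */
static size_t stack_budget;     /* X: bytes of stack assumed to be reserved */
static size_t stack_threshold;  /* Y: minimum headroom we insist on */

/* Step 2: record where the stack starts.  Call once, early in main.  */
void stack_check_init(size_t budget, size_t threshold)
{
  char marker;
  stack_base = (uintptr_t) &marker;
  stack_budget = budget;
  stack_threshold = threshold;
}

/* Step 3: distance between the current frame and the recorded base.
   Taking the absolute difference makes this work whichever way the
   stack grows.  */
size_t stack_bytes_used(void)
{
  char marker;
  uintptr_t here = (uintptr_t) &marker;
  return here < stack_base ? stack_base - here : here - stack_base;
}

/* Step 4: true when fewer than Y bytes of the budget remain.  */
bool stack_nearly_exhausted(void)
{
  size_t used = stack_bytes_used();
  return used >= stack_budget || stack_budget - used < stack_threshold;
}
```

Each Lisp-to-C transition, module API entry point, and QUIT would call stack_nearly_exhausted() and signal a Lisp error when it returns true; the cost is one load, one subtraction and a compare.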
As a practical matter, modules will not adhere to weird Emacs-specific
stack overflow detection schemes. Insisting on them will not help. If
the current longjmp scheme remains in place, the user-visible behavior
will be "Emacs randomly locks up, and the stack in the debugger is
impossible according to the code as written", not "I was able to save my
data".
[1] vim (7.4.712) autosaves and exits on fatal signals, of which SIGSEGV
is one. It uses an alternate signal stack to do it, just as I proposed
in a previous message.
[2] hotspot (openjdk-8-u845-b14) uses a guard page to generate
StackOverflowError when Java code blows the stack, but if the
overflowing frame is C code, it simply lets the program crash
[3] Firefox 43 uses Breakpad to handle fatal errors (which we really
should do too, but that's a separate discussion)
[4] GCC (current master) turns SIGSEGV into an internal compilation error
[5] GDB (current master) just crashes on SIGSEGV, although it does have
a special case for trying to catch crashes in the C++ name demangler
functions
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 17:04 ` Daniel Colascione
@ 2015-12-24 17:17 ` John Wiegley
2016-01-03 14:27 ` Daniel Colascione
2015-12-24 17:36 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: John Wiegley @ 2015-12-24 17:17 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> Neither you nor Paul have addressed any of the alternatives to this
> longjmp-from-anywhere behavior. You have not addressed the point that Emacs
> can crash fatally in numerous ways having nothing to do with stack overflow.
> You have not addressed the point that we already have robust stack overflow
> protection at the Lisp level, and so don't need additional workarounds at
> the C level. You have not even provided any evidence that C-level stack
> overflow is a problem worth solving.
Would someone be willing to summarize where we are at this point in this
discussion? It has been long and large enough that I'm no longer clear on
exactly what it is that we do and don't want, and why. Just a summary of our
major alternatives at this point, and the most significant points for and
against each would be great.
Thanks,
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 17:04 ` Daniel Colascione
2015-12-24 17:17 ` John Wiegley
@ 2015-12-24 17:36 ` Eli Zaretskii
2015-12-24 18:06 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-24 17:36 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Thu, 24 Dec 2015 09:04:49 -0800
>
> You'd prefer Emacs to lock up or corrupt data instead?
Instead of crashing and corrupting data? What's the difference?
Of course, if it would do that all the time, or even most of the time,
we'd consider the solution a bad one, and remove it or look for ways
of improving it. But we are not there; in most cases the recovery
doesn't hang and doesn't corrupt any data.
> Neither you nor Paul have addressed any of the alternatives to this
> longjmp-from-anywhere behavior. You have not addressed the point that
> Emacs can crash fatally in numerous ways having nothing to do with stack
> overflow. You have not addressed the point that we already have robust
> stack overflow protection at the Lisp level, and so don't need
> additional workarounds at the C level. You have not even provided any
> evidence that C-level stack overflow is a problem worth solving.
I think we did address those, you just didn't like the responses, so
you don't accept them as responses.
> All I see is an insistence that the longjmp hack stay because
> "Emacs must not crash", even though it demonstrably does crash in
> numerous exciting ways, and won't stop any time soon, because real
> programs always have bugs, and experience shows that failing quickly
> (trying to preserve data) is better than trying to limp along, because
> that just makes the situation worse.
Stack overflow recovery is an attempt to solve some of these crashes.
Having it means that users will lose their work in a smaller number of
use cases. So it's an improvement, even if a small one. I fail to
see in it any cause for such excitement.
> I know the rebuttal to that last point is that the perfect shouldn't be
> the enemy of the good: believe me, I've debugged enough crashes and
> hangs caused by well-intentioned crash recovery code to know that
> invoking undefined behavior to recover from a crash is far below "good"
> on the scale of things you can do to improve program reliability.
I believe you. Now please believe me and Paul who have slightly
different experience and have come to slightly different conclusions.
> 1) Using some mechanism (alloca will work, although OS-specific options
> exist), make sure you have X MB of address space dedicated to the main
> thread on startup. At this point, we cannot lose data, and failing to
> obtain this address space is both unlikely and as harmful as failing to
> obtain space for Emacs BSS.
>
> 2) Now we know the addresses of the top and bottom of the stack.
>
> 3) Each time Lisp calls into C, each time a module calls into the
> Emacs core, and on each QUIT, subtract the current stack pointer from
> the top of the stack. The result is a lower bound on the amount of stack
> space available. This computation is very cheap: it's one load from
> global storage or TLS and a subtract instruction.
>
> 4) If the amount of stack space available is less than some threshold,
> say Y, signal a stack exhaustion error.
>
> 5) Require that C code (modules included) do not use more than Y MB of
> stack space between QUITs or calls to the module API
>
> 6) Set Y to a reasonable figure like 4MB. Third-party libraries must
> already be able to run in bounded stack space because they're usually
> designed to run off the main thread, and on both Windows and POSIX
> systems, non-main thread stacks are sized on thread startup and cannot grow.
>
> I have no idea why we would prefer the SIGSEGV trap approach to
> the scheme I just outlined.
Your scheme has disadvantages as well. Selecting a good value for Y
is a hard problem. Choose too much, and you will risk aborting valid
programs; choose too little, and you will overflow the stack. Making
sure C doesn't use more than Y is also hard, especially for GC. It
sounds like just making the stack larger is a better and easier
solution.
Threads make this even more complicated. At least on Windows, by
default each thread gets the same amount of memory reserved for its
stack as recorded by the linker in the program's header, i.e. 8MB in
our case. So several threads can easily eat up a large portion of the
program's address space, and then the actual amount of stack is much
smaller than you might think.
So on balance, I don't see how your proposal is better.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 17:36 ` Eli Zaretskii
@ 2015-12-24 18:06 ` Daniel Colascione
2015-12-24 19:15 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2015-12-24 18:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
On 12/24/2015 09:36 AM, Eli Zaretskii wrote:
>> Cc: Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Thu, 24 Dec 2015 09:04:49 -0800
>>
>> You'd prefer Emacs to lock up or corrupt data instead?
>
> Instead of crashing and corrupting data? What's the difference?
>
> Of course, if it would do that all the time, or even most of the time,
> we'd consider the solution a bad one, and remove it or look for ways
> of improving it. But we are not there; in most cases the recovery
> doesn't hang and doesn't corrupt any data.
How would we know? It's not as if we have telemetry from real users that
lets us quantitatively evaluate crash frequency. (Automatically sending
crash reports is something else we should do, although I suspect that's
going to be a very long discussion.)
In any case, I expect the undefined-behavior problem to be worse in a
modules-heavy system, since most of the Emacs core code is written to
use non-local control flow for error reporting already, and since it
uses the GC for resource cleanup. I expect module code to be written in
a style less tolerant of arbitrary non-local control flow.
>> Neither you nor Paul have addressed any of the alternatives to this
>> longjmp-from-anywhere behavior. You have not addressed the point that
>> Emacs can crash fatally in numerous ways having nothing to do with stack
>> overflow. You have not addressed the point that we already have robust
>> stack overflow protection at the Lisp level, and so don't need
>> additional workarounds at the C level. You have not even provided any
>> evidence that C-level stack overflow is a problem worth solving.
>
> I think we did address those, you just didn't like the responses, so
> you don't accept them as responses.
I have seen no evidence that C stack overflow is a real problem that
justifies the risks inherent in the current error handling scheme.
>> All I see is an insistence that the longjmp hack stay because
>> "Emacs must not crash", even though it demonstrably does crash in
>> numerous exciting ways, and won't stop any time soon, because real
>> programs always have bugs, and experience shows that failing quickly
>> (trying to preserve data) is better than trying to limp along, because
>> that just makes the situation worse.
>
> Stack overflow recovery is an attempt to solve some of these crashes.
> Having it means that users will lose their work in a smaller number of
> use cases. So it's an improvement, even if a small one. I fail to
> see in it any cause for such excitement.
I've already outlined a scheme for preventing data loss in most fatal
crash instances, not just those arising from stack overflow.
>> I know the rebuttal to that last point is that the perfect shouldn't be
>> the enemy of the good: believe me, I've debugged enough crashes and
>> hangs caused by well-intentioned crash recovery code to know that
>> invoking undefined behavior to recover from a crash is far below "good"
>> on the scale of things you can do to improve program reliability.
>
> I believe you. Now please believe me and Paul who have slightly
> different experience and have come to slightly different conclusions.
>
>> 1) Using some mechanism (alloca will work, although OS-specific options
>> exist), make sure you have X MB of address space dedicated to the main
>> thread on startup. At this point, we cannot lose data, and failing to
>> obtain this address space is both unlikely and as harmful as failing to
>> obtain space for Emacs BSS.
>>
>> 2) Now we know the addresses of the top and bottom of the stack.
>>
>> 3) Each time Lisp calls into C, each time a module calls into the
>> Emacs core, and on each QUIT, subtract the current stack pointer from
>> the top of the stack. The result is a lower bound on the amount of stack
>> space available. This computation is very cheap: it's one load from
>> global storage or TLS and a subtract instruction.
>>
>> 4) If the amount of stack space available is less than some threshold,
>> say Y, signal a stack exhaustion error.
>>
>> 5) Require that C code (modules included) do not use more than Y MB of
>> stack space between QUITs or calls to the module API
>>
>> 6) Set Y to a reasonable figure like 4MB. Third-party libraries must
>> already be able to run in bounded stack space because they're usually
>> designed to run off the main thread, and on both Windows and POSIX
>> systems, non-main thread stacks are sized on thread startup and cannot grow.
>>
>> I have no idea why we would prefer the SIGSEGV trap approach to
>> the scheme I just outlined.
>
> Your scheme has disadvantages as well. Selecting a good value for Y
> is a hard problem. Choose too much, and you will risk aborting valid
> programs; choose too little, and you will overflow the stack. Making
> sure C doesn't use more than Y is also hard, especially for GC.
The GC stack use problem is a separate bug. The right fix there, I
think, is to use some data structure other than the C stack for keeping
track of the set of objects being marked.
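To make that concrete, here is a minimal sketch of marking with an explicit, heap-allocated work list instead of recursion. The `struct obj` type and the function name are invented for illustration; this is not the Emacs GC, only the general technique:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object with up to two children; not an Emacs type.  */
struct obj
{
  bool marked;
  struct obj *child[2];
};

/* Mark everything reachable from ROOT using a heap-allocated work
   list, so C stack use stays constant however deep the graph is.
   Return false if the work list cannot be allocated or grown.  */
bool
mark_iteratively (struct obj *root)
{
  size_t cap = 64, len = 0;
  struct obj **work = malloc (cap * sizeof *work);
  if (!work)
    return false;
  if (root)
    work[len++] = root;

  while (len > 0)
    {
      struct obj *o = work[--len];
      if (o->marked)
        continue;
      o->marked = true;
      for (int i = 0; i < 2; i++)
        {
          struct obj *c = o->child[i];
          if (!c || c->marked)
            continue;
          if (len == cap)
            {
              /* Grow the work list; failure here is reported to the
                 caller instead of overflowing anything.  */
              struct obj **bigger = realloc (work, 2 * cap * sizeof *work);
              if (!bigger)
                {
                  free (work);
                  return false;
                }
              work = bigger;
              cap *= 2;
            }
          work[len++] = c;
        }
    }

  free (work);
  return true;
}
```

The trade-off is that running out of memory for the work list becomes an ordinary, reportable failure rather than a stack overflow.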
Other VMs don't tend to have this problem: one common approach is to
allocate managed objects from a contiguous range of address space and
use a bit vector to remember all the object-start positions in this
range. Then, instead of recursively marking all objects, the GC can just
linearly scan from the start to the end of the heap, marking objects as
it goes. We can't do that because our backing store is malloc, not a
linear region we can annotate with a few bit vectors.
We might be able to use some kind of cursor into the now-mandatory
mem_node tree.
In any case, the possibility of the C stack overflowing during GC isn't
relevant to this discussion, since that isn't covered by the current
logic anyway.
> It
> sounds like just making the stack larger is a better and easier
> solution.
I'd be perfectly happy deleting the stack overflow code entirely and
increasing the declared stack size (on platforms where we ask for it).
> Threads make this even more complicated. At least on Windows, by
> default each thread gets the same amount of memory reserved for its
> stack as recorded by the linker in the program's header, i.e. 8MB in
> our case. So several threads can easily eat up a large portion of the
> program's address space, and then the actual amount of stack is much
> smaller than you might think.
We don't have to run Emacs on the main thread. We could, instead, with
minimal code changes, call CreateThread on startup, supplying a larger
stack size that applies only to that thread. Or we can let X=8MB and
Y=2MB (the system default).
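For concreteness, the check in steps 2 through 4 of the scheme could look roughly like the sketch below. All names here are hypothetical, and the address of a local variable is used as an approximation of the stack pointer, which is an assumption about typical implementations rather than anything the C standard guarantees:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical globals, recorded once at startup.  */
static uintptr_t stack_base;   /* address near the start of the stack */
static size_t stack_total;     /* X: bytes reserved for the stack */
static size_t stack_reserve;   /* Y: headroom required at each check */

/* Call early in main, before deep call chains exist.  */
void
stack_check_init (size_t total, size_t reserve)
{
  char marker;
  stack_base = (uintptr_t) &marker;
  stack_total = total;
  stack_reserve = reserve;
}

/* Return true if at least stack_reserve bytes still appear to be
   available.  The absolute difference is used so the sketch does
   not depend on the direction of stack growth.  */
bool
stack_headroom_ok (void)
{
  char here;
  uintptr_t sp = (uintptr_t) &here;
  size_t used = stack_base > sp ? stack_base - sp : sp - stack_base;
  return used <= stack_total && stack_total - used >= stack_reserve;
}
```

A caller at a safe point (a QUIT-like check or a module API entry) would signal a stack exhaustion error when `stack_headroom_ok` returns false.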
I'm not clear on what you mean by "stack is smaller than you might
think": on both POSIX systems and on Windows, thread stacks are address
space reservations made at thread creation time. If we can't fit another
thread stack in the current address space, the failure mode is thread
creation failing, not thread stacks being undersized.
> So on balance, I don't see how your proposal is better.
I'm really not sure what's balancing the risk of data corruption and
lockups caused by the stack overflow code. Emacs got along fine for
decades before Dmitry added the stack overflow check late last year.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 18:06 ` Daniel Colascione
@ 2015-12-24 19:15 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2015-12-24 19:15 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Thu, 24 Dec 2015 10:06:15 -0800
>
> >> You'd prefer Emacs to lock up or corrupt data instead?
> >
> > Instead of crashing and corrupting data? What's the difference?
> >
> > Of course, if it would do that all the time, or even most of the time,
> > we'd consider the solution a bad one, and remove it or look for ways
> > of improving it. But we are not there; in most cases the recovery
> > doesn't hang and doesn't corrupt any data.
>
> How would we know?
I was talking based on my own testing of the feature, back when I
implemented it for MS-Windows.
> In any case, I expect the undefined-behavior problem to be worse in a
> modules-heavy system, since most of the Emacs core code is written to
> use non-local control flow for error reporting already, and since it
> uses the GC for resource cleanup. I expect module code to be written in
> a style less tolerant of arbitrary non-local control flow.
Maybe you are right, it remains to be seen. If indeed this is what
will happen, we will have to deal with that.
> I have seen no evidence that C stack overflow is a real problem that
> justifies the risks inherent in the current error handling scheme.
If it's not a real problem, then this entire discussion was moot,
since the code we are discussing will never run.
> In any case, the possibility of the C stack overflowing during GC isn't
> relevant to this discussion, since that isn't covered by the current
> logic anyway.
Yes, but by artificially reducing available stack space, we might make
such irrecoverable problems more frequent.
> > It
> > sounds like just making the stack larger is a better and easier
> > solution.
>
> I'd be perfectly happy deleting the stack overflow code entirely and
> increasing the declared stack size (on platforms where we ask for it).
If we think the current stack size is borderline, we could do that
regardless.  The lower the probability that stack overflow recovery
will be needed, the better.
> > Threads make this even more complicated. At least on Windows, by
> > default each thread gets the same amount of memory reserved for its
> > stack as recorded by the linker in the program's header, i.e. 8MB in
> > our case. So several threads can easily eat up a large portion of the
> > program's address space, and then the actual amount of stack is much
> > smaller than you might think.
>
> We don't have to run Emacs on the main thread. We could, instead, with
> minimal code changes, call CreateThread on startup, supplying a larger
> stack size that applies only to that thread. Or we can let X=8MB and
> Y=2MB (the system default).
Yes, we could. Assuming that Someone™ does the work of changing the
code that assumes that the main thread runs Lisp etc.
> I'm not clear on what you mean by "stack is smaller than you might
> think": on both POSIX systems and on Windows, thread stacks are address
> space reservations made at thread creation time. If we can't fit another
> thread stack in the current address space, the failure mode is thread
> creation failing, not thread stacks being undersized.
I think lack of address space might mean the thread starts, but then
hits stack overflow, because the stack couldn't be expanded.
Also, at least on Windows the expression of this failure might be
non-obvious.  For example, we had a bug a couple of years ago where
the file selection dialog wouldn't pop up, or popped up and then behaved
incorrectly, for this very reason. Windows sometimes starts threads
on behalf of a process outside of our control.
> > So on balance, I don't see how your proposal is better.
>
> I'm really not sure what's balancing the risk of data corruption and
> lockups caused by the stack overflow code. Emacs got along fine for
> decades before Dmitry added the stack overflow check late last year.
And I'm sure we will get along fine for many years hence.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-24 17:17 ` John Wiegley
@ 2016-01-03 14:27 ` Daniel Colascione
2016-01-03 15:46 ` Eli Zaretskii
` (2 more replies)
0 siblings, 3 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 14:27 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3222 bytes --]
On 12/24/2015 09:17 AM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> Neither you nor Paul have addressed any of the alternatives to this
>> longjmp-from-anywhere behavior. You have not addressed the point that Emacs
>> can crash fatally in numerous ways having nothing to do with stack overflow.
>> You have not addressed the point that we already have robust stack overflow
>> protection at the Lisp level, and so don't need additional workarounds at
>> the C level. You have not even provided any evidence that C-level stack
>> overflow is a problem worth solving.
>
> Would someone be willing to summarize where we're at at this point with this
> discussion? It has been long and large enough that I'm no longer clear on
> exactly what it is that we do and don't want, and why. Just a summary of our
> major alternatives at this point, and the most significant points for and
> against each would be great.
>
If the C stack in Emacs overflows, Emacs crashes and terminates.
Normally, we prevent C stack overflow by preventing Lisp evaluation from
getting too deep by bounding it with the variables max-lisp-eval-depth
and max-specpdl-size, but a nasty C function can still overflow the
stack and crash.
In 2014, Emacs gained a new path in the SIGSEGV handler that attempts to
detect C stack overflow and longjmp back to toplevel. It's important to
note that we don't just longjmp when we're in a safe position: we
longjmp from *anywhere*, even if we're, say, in the middle of malloc.
This longjmp can corrupt internal state in Emacs or libc, cause
deadlocks, bypass C++ destructors in module code, or literally cause any
behavior whatsoever, since we're violating invariants of the system. The
longjmp also bypasses unwind-protect handlers and other kinds of
resource cleanup. Everyone acknowledges that this path is very unsafe.
Eli and Paul believe that "Emacs should never crash", and that
potentially saving user data is worth the risk of undefined behavior,
which they contend does not occur in practice.
They are wrong. This code is terrible and we should delete it
immediately. The code is fundamentally flawed and cannot be made to work
properly on any platform. No other program attempts to recover from
stack overflow this way. (I surveyed a few in a previous message.)
In practice, the Lisp stack depth limits provide enough protection, and
the risk of data corruption is too great. The existing auto-save logic
is good enough for data recovery, especially if we run the sigsegv
handler on the alternate signal stack (which we can make as large as we
want) when possible.
C stack overflow is a programming error just as bad as *((char*)1)=2 and
we shouldn't try to recover from it, *especially* not when this recovery
is dangerous and leads to more problems than it solves.
If we keep this code in Emacs, it sets a precedent for other terrible
forms of crash recovery, like silently ignoring writes to NULL,
replacing reads from NULL with zero, longjmping out of SIGABRT, and so
on. If we believe "Emacs should never crash", we should fix its bugs,
not try to paper over them with undefined behavior.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 14:27 ` Daniel Colascione
@ 2016-01-03 15:46 ` Eli Zaretskii
2016-01-03 15:49 ` Daniel Colascione
2016-01-03 16:31 ` Paul Eggert
2016-01-03 20:25 ` John Wiegley
2 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 15:46 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 06:27:34 -0800
>
> > Would someone be willing to summarize where we're at at this point with this
> > discussion? It has been long and large enough that I'm no longer clear on
> > exactly what it is that we do and don't want, and why. Just a summary of our
> > major alternatives at this point, and the most significant points for and
> > against each would be great.
> >
> [...]
> Eli and Paul believe that "Emacs should never crash", and that
> potentially saving user data is worth the risk of undefined behavior,
> which they contend does not occur in practice.
>
> They are wrong. This code is terrible and we should delete it
> immediately. The code is fundamentally flawed and cannot be made to work
> properly on any platform. No other program attempts to recover from
> stack overflow this way. (I surveyed a few in a previous message.)
This is not a summary, this is propaganda. If you cannot summarize an
issue objectively, please don't summarize at all. I deliberately
avoided replying for fear of being too involved to write an objective
summary. I wish you exercised the same self-restraint.
John, please disregard this "summary".
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 15:46 ` Eli Zaretskii
@ 2016-01-03 15:49 ` Daniel Colascione
2016-01-03 16:40 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 15:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1436 bytes --]
On 01/03/2016 07:46 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 06:27:34 -0800
>>
>>> Would someone be willing to summarize where we're at at this point with this
>>> discussion? It has been long and large enough that I'm no longer clear on
>>> exactly what it is that we do and don't want, and why. Just a summary of our
>>> major alternatives at this point, and the most significant points for and
>>> against each would be great.
>>>
>> [...]
>> Eli and Paul believe that "Emacs should never crash", and that
>> potentially saving user data is worth the risk of undefined behavior,
>> which they contend does not occur in practice.
>>
>> They are wrong. This code is terrible and we should delete it
>> immediately. The code is fundamentally flawed and cannot be made to work
>> properly on any platform. No other program attempts to recover from
>> stack overflow this way. (I surveyed a few in a previous message.)
>
> This is not a summary, this is propaganda. If you cannot summarize an
> issue objectively, please don't summarize at all. I deliberately
> avoided replying for fear of being too involved to write an objective
> summary. I wish you exercised the same self-restraint.
>
> John, please disregard this "summary".
Nice trick, shutting down discussion so that the code stays by default.
I see nothing wrong with advocacy.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 14:27 ` Daniel Colascione
2016-01-03 15:46 ` Eli Zaretskii
@ 2016-01-03 16:31 ` Paul Eggert
2016-01-03 16:48 ` Daniel Colascione
2016-01-03 17:16 ` Eli Zaretskii
2016-01-03 20:25 ` John Wiegley
2 siblings, 2 replies; 177+ messages in thread
From: Paul Eggert @ 2016-01-03 16:31 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii, Emacs-devel
Daniel Colascione wrote:
> In 2014, Emacs gained a new path in the SIGSEGV handler that attempts to
> detect C stack overflow and longjmp back to toplevel. It's important to
> note that we don't just longjmp when we're in a safe position: we
> longjmp from *anywhere*, even if we're, say, in the middle of malloc.
Although that particular code path may have been introduced recently, for
decades Emacs has longjmped from arbitrary locations due to other signals, so
adding a longjmp for SIGSEGV does not introduce new issues.
> The code is fundamentally flawed and cannot be made to work
> properly on any platform.
The code is part of Emacs 24.5 and does not appear to be causing problems; at
least, I don't recall any bug reports from the field. The other longjmps, which
are fundamentally flawed in the same way, have been in Emacs for decades, and
also seem to work well enough in practice.
> No other program attempts to recover from
> stack overflow this way.
True, not *exactly* in this way, but Emacs is pretty special.
> In practice, the Lisp stack depth limits provide enough protection
That won't be true once people link in dynamic modules, since the modules
may not use Lisp and may exhaust the C stack. And even without modules, I recall
people running into stack-overflow issues in the regular-expression code. Did
that ever get fixed? Even if so, most likely other such issues are still lurking
in the regexp code.
> The existing auto-save logic
> is good enough for data recovery, especially if we run the sigsegv
> handler on the alternate signal stack (which we can make as large as we
> want) when possible.
Something like that could help, yes. But the existing auto-save logic can
longjmp out, so I don't see how this would address your concern about longjmp.
One possible way forward here is the approach recommended by GNU libsigsegv. See
<https://www.gnu.org/software/libsigsegv/> "About stack overflow handlers". In
the past we've avoided libsigsegv's approach because it was considered to be too
heavyweight, but it would be safer to do something along the lines that it
suggests, or perhaps even to use libsigsegv if available.
> If we keep this code in Emacs, it sets a precedent for other terrible
> forms of crash recovery, like silently ignoring writes to NULL ...
Naah. We're pragmatic, not stupid.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 15:49 ` Daniel Colascione
@ 2016-01-03 16:40 ` Eli Zaretskii
2016-01-03 16:50 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 16:40 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 07:49:44 -0800
>
> > This is not a summary, this is propaganda. If you cannot summarize an
> > issue objectively, please don't summarize at all. I deliberately
> > avoided replying for fear of being too involved to write an objective
> > summary. I wish you exercised the same self-restraint.
> >
> > John, please disregard this "summary".
>
> Nice trick, shutting down discussion so that the code stays by default.
I didn't shut down any discussion. You are free to continue writing,
on this subject as on any other.
> I see nothing wrong with advocacy.
Neither do I. Just don't submit advocacy as a "summary" when a
decision-maker requests one in order to make up his/her mind.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 16:31 ` Paul Eggert
@ 2016-01-03 16:48 ` Daniel Colascione
2016-01-03 18:07 ` Paul Eggert
2016-01-03 17:16 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 16:48 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 4224 bytes --]
On 01/03/2016 08:31 AM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> In 2014, Emacs gained a new path in the SIGSEGV handler that attempts to
>> detect C stack overflow and longjmp back to toplevel. It's important to
>> note that we don't just longjmp when we're in a safe position: we
>> longjmp from *anywhere*, even if we're, say, in the middle of malloc.
>
> Although that particular code path may have been introduced recently,
> for decades Emacs has longjmped from arbitrary locations due to other
> signals, so adding a longjmp for SIGSEGV does not introduce new issues.
What do you mean? We can lisp-signal from lots of places (but not
arbitrary values of the program counter), but we don't respond to *unix*
signals by longjmping.
When I added the SIGUSR2 debug-break support, I made handle_user_signal
set Vquit_flag instead of longjmping in order to avoid exactly the
problem I'm highlighting here.
We don't longjmp from SIGIO either, by the way: we set a flag that we
inspect later, at a safe place.
>> The code is fundamentally flawed and cannot be made to work
>> properly on any platform.
>
> The code is part of Emacs 24.5 and does not appear to be causing
> problems; at least, I don't recall any bug reports from the field. The
> other longjmps, which are fundamentally flawed in the same way, have
> been in Emacs for decades, and also seem to work well enough in practice.
>
>> No other program attempts to recover from
>> stack overflow this way.
>
> True, not *exactly* in this way, but Emacs is pretty special.
Not in this respect. In particular, Emacs has no special magic that
makes it safe to longjmp out of arbitrary C program sequences.
>> In practice, the Lisp stack depth limits provide enough protection
>
> That won't be true once people link in dynamic modules, since the
> modules may not use Lisp and may exhaust the C stack. And even without
> modules, I recall people running into stack-overflow issues in the
> regular-expression code. Did that ever get fixed? Even if so, most
> likely other such issues are still lurking in the regexp code.
>
>> The existing auto-save logic
>> is good enough for data recovery, especially if we run the sigsegv
>> handler on the alternate signal stack (which we can make as large as we
>> want) when possible.
>
> Something like that could help, yes. But the existing auto-save logic
> can longjmp out, so I don't see how this would address your concern
> about longjmp.
It can Fsignal. That's not the same thing. Fsignal is a supported,
controlled mechanism with defined semantics, not a free-for-all like the
SIGSEGV handler.
> One possible way forward here is the approach recommended by GNU
> libsigsegv. See <https://www.gnu.org/software/libsigsegv/> "About stack
> overflow handlers". In the past we've avoided libsigsegv's approach
> because it was considered to be too heavyweight, but it would be safer
> to do something along the lines that it suggests, or perhaps even to use
> libsigsegv if available.
The libsigsegv approach is a problem too: pthread cancellation in
practice is unsupported on some platforms, does not execute C++
destructors on some platforms even where it's supported, is lightly
tested, and conflicts with other uses of sigsegv. In the case of Emacs,
pthread cancellation is additionally risky because we don't even try to
PTHREAD_CANCEL_DISABLE around critical code sequences.
Neither you nor Eli have provided any evidence that we need to worry
about C stack overflow at all and that any of the proposed recovery
schemes is worth the technical complexity and risk. We should treat C
stack overflow exactly like we treat a NULL dereference.
I've previously proposed alternate, safe, ways we can recover from C
stack overflow, but I don't think we should use any of these, because I
don't think it's worth attempting to recover from this bad state at all.
>> If we keep this code in Emacs, it sets a precedent for other terrible
>> forms of crash recovery, like silently ignoring writes to NULL ...
>
> Naah. We're pragmatic, not stupid.
People who call themselves pragmatic have had worse ideas.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 16:40 ` Eli Zaretskii
@ 2016-01-03 16:50 ` Daniel Colascione
2016-01-03 17:20 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 16:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]
On 01/03/2016 08:40 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 07:49:44 -0800
>>
>>> This is not a summary, this is propaganda. If you cannot summarize an
>>> issue objectively, please don't summarize at all. I deliberately
>>> avoided replying for fear of being too involved to write an objective
>>> summary. I wish you exercised the same self-restraint.
>>>
>>> John, please disregard this "summary".
>>
>> Nice trick, shutting down discussion so that the code stays by default.
>
> I didn't shut down any discussion. You are free to continue writing,
> on this subject as on any other.
>
>> I see nothing wrong with advocacy.
>
> Neither do I. Just don't submit advocacy as a "summary" when a
> decision-maker requests one in order to make up his/her mind.
I presented your argument (such as it is), then presented mine. If you
don't feel as though I've adequately presented your position, you can
elaborate.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 16:31 ` Paul Eggert
2016-01-03 16:48 ` Daniel Colascione
@ 2016-01-03 17:16 ` Eli Zaretskii
2016-01-03 17:22 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 17:16 UTC (permalink / raw)
To: Paul Eggert; +Cc: dancol, Emacs-devel
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 3 Jan 2016 08:31:35 -0800
>
> Daniel Colascione wrote:
> > In 2014, Emacs gained a new path in the SIGSEGV handler that attempts to
> > detect C stack overflow and longjmp back to toplevel. It's important to
> > note that we don't just longjmp when we're in a safe position: we
> > longjmp from *anywhere*, even if we're, say, in the middle of malloc.
>
> Although that particular code path may have been introduced recently, for
> decades Emacs has longjmped from arbitrary locations due to other signals, so
> adding a longjmp for SIGSEGV does not introduce new issues.
>
> > The code is fundamentally flawed and cannot be made to work
> > properly on any platform.
>
> The code is part of Emacs 24.5 and does not appear to be causing problems; at
> least, I don't recall any bug reports from the field. The other longjmps, which
> are fundamentally flawed in the same way, have been in Emacs for decades, and
> also seem to work well enough in practice.
All true.
But we are reiterating a long discussion, where all of this was
already said, and said again, and again, and again. There's nothing
new left to be said here.
Daniel thinks that Emacs should be designed and implemented as
safety-critical software, where any such techniques are a definite
no-no. But Emacs is not a safety-critical program, it is allowed to
crash from time to time, even in nasty ways. It is therefore okay for
such a program to use techniques that make the probability of losing
work lower. My analysis of this discussion is that this is the
crucial point that Daniel refuses to understand and/or agree to --
that being a non safety-critical piece of software means Emacs can do
stuff that it otherwise would have been prohibited from doing.
IOW, a requirement as fundamental as safety-criticality _does_ affect
the design and the techniques allowed during implementation. I submit
that this is a fundamental software engineering issue which cannot be
cast away, and as long as Daniel misinterprets it, we can never agree
on anything. Because in safety-critical software, even a single nasty
crash can be fatal, something that is very far from what Emacs can do.
> One possible way forward here is the approach recommended by GNU libsigsegv. See
> <https://www.gnu.org/software/libsigsegv/> "About stack overflow handlers". In
> the past we've avoided libsigsegv's approach because it was considered to be too
> heavyweight, but it would be safer to do something along the lines that it
> suggests, or perhaps even to use libsigsegv if available.
Just for the record: the MS-Windows implementation of stack overflow
recovery is a simplification of the Windows code in libsigsegv (the
simplification is possible because libsigsegv, as a general-purpose
library, allows more flexible stuff to be done in the handler). So at
least on MS-Windows, we already do what libsigsegv does.
> > If we keep this code in Emacs, it sets a precedent for other terrible
> > forms of crash recovery, like silently ignoring writes to NULL ...
>
> Naah. We're pragmatic, not stupid.
Indeed.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 16:50 ` Daniel Colascione
@ 2016-01-03 17:20 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 17:20 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 08:50:00 -0800
>
> >> I see nothing wrong with advocacy.
> >
> > Neither do I. Just don't submit advocacy as a "summary" when a
> > decision-maker requests one in order to make up his/her mind.
>
> I presented your argument (such as it is), then presented mine. If you
> don't feel as though I've adequately presented your position, you can
> elaborate.
What you wrote after your presentation left that presentation tainted.
John obviously asked for a balanced summary, where each argument is
accompanied by its pros and cons. What you wrote was a far cry
from that.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 17:16 ` Eli Zaretskii
@ 2016-01-03 17:22 ` Daniel Colascione
2016-01-03 17:39 ` Eli Zaretskii
2016-01-03 17:43 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 17:22 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert; +Cc: Emacs-devel
On 01/03/2016 09:16 AM, Eli Zaretskii wrote:
>> From: Paul Eggert <eggert@cs.ucla.edu>
>> Date: Sun, 3 Jan 2016 08:31:35 -0800
>>
>> Daniel Colascione wrote:
>>> In 2014, Emacs gained a new path in the SIGSEGV handler that attempts to
>>> detect C stack overflow and longjmp back to toplevel. It's important to
>>> note that we don't just longjmp when we're in a safe position: we
>>> longjmp from *anywhere*, even if we're, say, in the middle of malloc.
>>
>> Although that particular code path may have been introduced recently, for
>> decades Emacs has longjmped from arbitrary locations due to other signals, so
>> adding a longjmp for SIGSEGV does not introduce new issues.
>>
>>> The code is fundamentally flawed and cannot be made to work
>>> properly on any platform.
>>
>> The code is part of Emacs 24.5 and does not appear to be causing problems; at
>> least, I don't recall any bug reports from the field. The other longjmps, which
>> are fundamentally flawed in the same way, have been in Emacs for decades, and
>> also seem to work well enough in practice.
>
> All true.
Untrue. Which jumps in particular can come from inside malloc?
> But we are reiterating a long discussion, where all of this was
> already said, and said again, and again, and again. There's nothing
> new left to be said here.
>
> Daniel thinks that Emacs should be designed and implemented as
> safety-critical software, where any such techniques are a definite
> no-no. But Emacs is not a safety-critical program, it is allowed to
> crash from time to time, even in nasty ways. It is therefore okay for
> such a program to use techniques that make the probability of losing
> work lower. My analysis of this discussion is that this is the
> crucial point that Daniel refuses to understand and/or agree to --
> that being a non safety-critical piece of software means Emacs can do
> stuff that it otherwise would have been prohibited from doing.
It's not about whether Emacs is "safety critical" --- it's about whether
you're making the robustness situation worse than it already is by
adding dubious workarounds for a problem we don't actually have.
The Linux kernel doesn't bill itself as safety critical either, and this
kind of reckless sloppiness wouldn't be acceptable there either.
> IOW, a requirement as fundamental as safety-criticality _does_ affect
> the design and the techniques allowed during implementation. I submit
> that this is a fundamental software engineering issue which cannot be
> cast away, and as long as Daniel misinterprets it, we can never agree
> on anything. Because in safety-critical software, even a single nasty
> crash can be fatal, something that is very far from what Emacs can do.
You're creating a false dichotomy between safety-critical software and
everything else. That Emacs is not avionics-grade software does not
excuse the use of techniques that are both inherently incorrect and that
add no real value and quite a bit of real danger.
You have *still* not presented any evidence, not one shred, that we have
a real stack overflow problem that makes it worth relying on more than
the auto-save functionality and that makes it worth reaching for unsafe
and completely undefined behavior.
All you have is your assertion that, since Emacs is not safety-critical
software, we can and should use this technique, which you have not
demonstrated saves anyone anything and which I have demonstrated is
completely unsafe.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 17:22 ` Daniel Colascione
@ 2016-01-03 17:39 ` Eli Zaretskii
2016-01-03 17:49 ` Daniel Colascione
2016-01-03 17:43 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 17:39 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 09:22:32 -0800
>
> > IOW, a requirement as fundamental as safety-criticality _does_ affect
> > the design and the techniques allowed during implementation. I submit
> > that this is a fundamental software engineering issue which cannot be
> > cast away, and as long as Daniel misinterprets it, we can never agree
> > on anything. Because in safety-critical software, even a single nasty
> > crash can be fatal, something that is very far from what Emacs can do.
>
> You're creating a false dichotomy between safety-critical software and
> everything else. That Emacs is not avionics-grade software does not
> excuse the use of techniques that are both inherently incorrect and that
> add no real value and quite a bit of real danger.
It's not a false dichotomy, it's real. That you misunderstand this
crucial issue is the root cause of this dispute and of our fundamental
disagreement. You are applying theory outside of its domain of
applicability.
> You have *still* not presented any evidence, not one shred, that we have
> a real stack overflow problem that makes it worth relying on more than
> the auto-save functionality and that makes it worth reaching for unsafe
> and completely undefined behavior.
Not sure what evidence you are looking for. Does the fact that 2 not
entirely stupid Emacs developers, each one with years of hacking Emacs
on their record, disagree with you constitute such evidence?
> All you have is your assertion that, since Emacs is not safety-critical
> software, we can and should use this technique, which you have not
> demonstrated saves anyone anything and which I have demonstrated is
> completely unsafe.
We are not looking for safe techniques. That's exactly your mistake.
We are looking for pragmatically helpful techniques.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 17:22 ` Daniel Colascione
2016-01-03 17:39 ` Eli Zaretskii
@ 2016-01-03 17:43 ` Eli Zaretskii
1 sibling, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 17:43 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 09:22:32 -0800
>
> > All true.
>
> Untrue. Which jumps in particular can come from inside malloc?
I'll let Paul explain this. I think I know what he has in mind, but
since his level of expertise on Posix signal handling is much higher
than mine, I will leave the floor to Paul.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 17:39 ` Eli Zaretskii
@ 2016-01-03 17:49 ` Daniel Colascione
2016-01-03 18:08 ` Eli Zaretskii
2016-01-03 18:17 ` Paul Eggert
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 17:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
On 01/03/2016 09:39 AM, Eli Zaretskii wrote:
>> Cc: Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 09:22:32 -0800
>>
>>> IOW, a requirement as fundamental as safety-criticality _does_ affect
>>> the design and the techniques allowed during implementation. I submit
>>> that this is a fundamental software engineering issue which cannot be
>>> cast away, and as long as Daniel misinterprets it, we can never agree
>>> on anything. Because in safety-critical software, even a single nasty
>>> crash can be fatal, something that is very far from what Emacs can do.
>>
>> You're creating a false dichotomy between safety-critical software and
>> everything else. That Emacs is not avionics-grade software does not
>> excuse the use of techniques that are both inherently incorrect and that
>> add no real value and quite a bit of real danger.
>
> It's not a false dichotomy, it's real. That you misunderstand this
> crucial issue is the root cause of this dispute and of our fundamental
> disagreement. You are applying theory outside of its domain of
> applicability.
You're not seeing that robustness applies to all software, not just
"safety-critical" (however you define that) software, because users
depend on software being predictable.
>> You have *still* not presented any evidence, not one shred, that we have
>> a real stack overflow problem that makes it worth relying on more than
>> the auto-save functionality and that makes it worth reaching for unsafe
>> and completely undefined behavior.
>
> Not sure what evidence you are looking for. Does the fact that 2 not
> entirely stupid Emacs developers, each one with years of hacking Emacs
> on their record, disagree with you constitute such evidence?
That's not evidence. It's the opinion of two people, one of whom
previously said that the worst side effect of this scheme is a potential
memory leak, a statement that suggests that the dangers of this scheme
are not being appreciated.
>> All you have is your assertion that, since Emacs is not safety-critical
>> software, we can and should use this technique, which you have not
>> demonstrated saves anyone anything and which I have demonstrated is
>> completely unsafe.
>
> We are not looking for safe techniques. That's exactly your mistake.
> We are looking for pragmatically helpful techniques.
I don't think this technique is even helpful. Quite the opposite,
actually, if we start to pollute the module API with some facility for
dealing with the result of this awful stack overflow scheme.
The trouble with unsafe mitigations like this one (which inhabits the
same robustness tier as "#define pthread_mutex_lock(l) (void)0 /*LOL
FAST*/") is that errors compound, and once you let undefined behavior
leak in somewhere, you can no longer reason about how the system
operates. It's essential to kill and restart software as soon as you
notice anything going wrong, because only then does reason still apply
to the system.
*Anything* can happen, and there's no guarantee that what happens is
better for the user than an immediate crash. Hell, you can even cause
security problems with schemes of this sort.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 16:48 ` Daniel Colascione
@ 2016-01-03 18:07 ` Paul Eggert
2016-01-03 18:22 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2016-01-03 18:07 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii, Emacs-devel
Daniel Colascione wrote:
> we don't respond to *unix* signals by longjmping.
Not true. A code path is deliver_interrupt_signal → deliver_process_signal →
handle_interrupt_signal → handle_interrupt → quit_throw_to_read_char →
sys_longjmp. For what it's worth, Emacs can also lisp-signal from Unix signal
handlers if immediate_quit is true. This code has been in Emacs for many years.
> The libsigsegv approach is a problem too: pthread cancellation in
> practice is unsupported on some platforms, does not execute C++
> destructors on some platforms even where it's supported, is lightly
> tested, and conflicts with other uses of sigsegv. In the case of Emacs,
> pthread cancellation is additionally risky because we don't even try to
> PTHREAD_CANCEL_DISABLE around critical code sequences.
Yes, of course we'd have to do some work to take the libsigsegv approach. We
would need to use pthread cancellation only on platforms where it works, and we
would need to defer cancellation during critical sections. On platforms that
lack pthread cancellation or where it doesn't work, we'd be no worse off than we
are now. If we do the longjmp ourselves I assume we can work around the C++
destructor problem the same way we do now.
From what Eli writes, some of this work (I don't know how much) has already
been done for the MS-Windows port. It would be helpful to do something similar
for GNU/Linux and similar platforms, and to do it right by marking critical
sections etc. All this would make Emacs more bullet-proof, if someone has the
time to undertake the job.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 17:49 ` Daniel Colascione
@ 2016-01-03 18:08 ` Eli Zaretskii
2016-01-03 18:24 ` Daniel Colascione
2016-01-04 3:17 ` Richard Stallman
2016-01-03 18:17 ` Paul Eggert
1 sibling, 2 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 18:08 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 09:49:03 -0800
>
> >> You're creating a false dichotomy between safety-critical software and
> >> everything else. That Emacs is not avionics-grade software does not
> >> excuse the use of techniques that are both inherently incorrect and that
> >> add no real value and quite a bit of real danger.
> >
> > It's not a false dichotomy, it's real. That you misunderstand this
> > crucial issue is the root cause of this dispute and of our fundamental
> > disagreement. You are applying theory outside of its domain of
> > applicability.
>
> You're not seeing that robustness applies to all software, not just
> "safety-critical" (however you define that) software, because users
> depend on software being predictable.
Robustness comes at a price. You are asking Emacs and its users to
pay a heavy price that they don't need to pay, because there are no
requirements for Emacs to be as robust as safety-critical software.
Engineering is about compromises: you design and implement your
systems to meet the requirements with some reasonable margin, but you
do not implement non-essential features that exert a significant
impact on what the product can or cannot do. Doing so is bad
engineering.
> >> You have *still* not presented any evidence, not one shred, that we have
> >> a real stack overflow problem that makes it worth relying on more than
> >> the auto-save functionality and that makes it worth reaching for unsafe
> >> and completely undefined behavior.
> >
> > Not sure what evidence you are looking for. Does the fact that 2 not
> > entirely stupid Emacs developers, each one with years of hacking Emacs
> > on their record, disagree with you constitute such evidence?
>
> That's not evidence. It's the opinion of two people
The argument is about assessments. There could be no facts here, only
opinions. What else did you expect?
> one of whom previously said that the worst side effect of this
> scheme is a potential memory leak, a statement that suggests that
> the dangers of this scheme are not being appreciated.
Only if you think about Emacs as a safety-critical piece of software
that must operate continuously, 24x7. Otherwise, memory leaks when
recovering from a disaster that happens very rarely are quite
acceptable, if it brings other benefits (such as not losing work).
> >> All you have is your assertion that, since Emacs is not safety-critical
> >> software, we can and should use this technique, which you have not
> >> demonstrated saves anyone anything and which I have demonstrated is
> >> completely unsafe.
> >
> > We are not looking for safe techniques. That's exactly your mistake.
> > We are looking for pragmatically helpful techniques.
>
> I don't think this technique is even helpful. Quite the opposite,
> actually, if we start to pollute the module API with some facility for
> dealing with the result of this awful stack overflow scheme.
You are not objective, so you exaggerate the risks and dismiss the
benefits.
> *Anything* can happen, and there's no guarantee that what happens is
> better for the user than an immediate crash. Hell, you can even cause
> security problems with schemes of this sort.
Sorry, that's FUD.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 17:49 ` Daniel Colascione
2016-01-03 18:08 ` Eli Zaretskii
@ 2016-01-03 18:17 ` Paul Eggert
1 sibling, 0 replies; 177+ messages in thread
From: Paul Eggert @ 2016-01-03 18:17 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii; +Cc: Emacs-devel
Daniel Colascione wrote:
> It's the opinion of two people, one of whom
> previously said that the worst side effect of this scheme is a potential
> memory leak
I don't recall anyone making such a statement. For what it's worth, I agree with
you that Emacs's current approach can result in hangs and crashes and in theory
even worse behavior, and that it'd be nice to clean this up. We disagree about
how likely these disasters are in practice, and therefore how high a priority it
is to fix this, and whether we should remove otherwise-useful features in order
to avert the disasters.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 18:07 ` Paul Eggert
@ 2016-01-03 18:22 ` Daniel Colascione
2016-01-03 21:02 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 18:22 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
On 01/03/2016 10:07 AM, Paul Eggert wrote:
> Daniel Colascione wrote:
>
>> we don't respond to *unix* signals by longjmping.
>
> Not true. A code path is deliver_interrupt_signal →
> deliver_process_signal → handle_interrupt_signal → handle_interrupt →
> quit_throw_to_read_char → sys_longjmp.
Not the case. handle_interrupt can call quit_throw_to_read_char only
when waiting_for_input is true, which it is only when, well, we're
waiting for input, not at arbitrary points in the program.
It can't be the case that we can longjmp from arbitrary points in Emacs
in response to a SIGINT, since if we did, C-g would be unsafe and could
crash Emacs, which it doesn't.
> For what it's worth, Emacs can
> also lisp-signal from Unix signal handlers if immediate_quit is true.
> This code has been in Emacs for many years.
Likewise. Any call to malloc (or, in general, any async-signal-unsafe
function) with immediate_quit or waiting_for_input true is a bug that we
need to fix.
Longjmp from a signal handler isn't a bad approach. I use it in my own
programs. It becomes abominable when the longjmp can happen from
*anywhere*, as it can with the stack overflow handler.
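To illustrate the distinction, here is a minimal sketch in C of a handler that jumps only when a flag says the program is parked at a known-safe wait point, and otherwise just records a pending quit. All names are hypothetical and only loosely echo the ones mentioned in this thread; this is not the Emacs implementation, and it assumes a POSIX system with sigsetjmp/siglongjmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical state, named after the variables discussed in the thread. */
static sigjmp_buf read_char_jmp;
static volatile sig_atomic_t waiting_for_input;
static volatile sig_atomic_t pending_quit;

/* Jump only if we are at the safe wait point; otherwise defer the quit. */
void handle_interrupt(int sig)
{
  (void) sig;
  if (waiting_for_input)
    {
      waiting_for_input = 0;
      siglongjmp(read_char_jmp, 1);
    }
  pending_quit = 1;   /* to be polled later at a safe point */
}

int install_interrupt_handler(void)
{
  struct sigaction sa;
  sa.sa_handler = handle_interrupt;
  sa.sa_flags = 0;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGINT, &sa, NULL);
}

/* Run BODY as the "waiting" step.  Returns 1 if an interrupt jumped
   out of it, 0 if it finished normally.  BODY must restrict itself
   to async-signal-safe calls for the jump to be safe. */
int wait_for_input(void (*body)(void))
{
  assert(body != NULL);
  if (sigsetjmp(read_char_jmp, 1))
    return 1;
  waiting_for_input = 1;
  body();
  waiting_for_input = 0;
  return 0;
}

/* Demo helpers: one raises the signal synchronously, one does nothing. */
void simulate_interrupt(void) { raise(SIGINT); }
void idle_input(void) { }
```

An interrupt raised outside wait_for_input only sets pending_quit, while one raised inside it unwinds to the sigsetjmp. The scheme is only as safe as the code that runs while the flag is set, which is the point about malloc above.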
>> The libsigsegv approach is a problem too: pthread cancellation in
>> practice is unsupported on some platforms, does not execute C++
>> destructors on some platforms even where it's supported, is lightly
>> tested, and conflicts with other uses of sigsegv. In the case of Emacs,
>> pthread cancellation is additionally risky because we don't even try to
>> PTHREAD_CANCEL_DISABLE around critical code sequences.
>
> Yes, of course we'd have to do some work to take the libsigsegv
> approach. We would need to use pthread cancellation only on platforms
> where it works, and we would need to defer cancellation during critical
> sections. On platforms that lack pthread cancellation or where it
> doesn't work, we'd be no worse off than we are now. If we do the longjmp
> ourselves I assume we can work around the C++ destructor problem the
> same way we do now.
Or we can use a stack guard region [1]: in the signal handler, unprotect
the guard region and set a global variable; check the variable on QUIT;
and at toplevel, reprotect the guard region. If we
segfault again without having reached toplevel, just die. Would that
make you happy? I'd much rather see that approach, which is safe, than
our current one, which isn't.
[1] If I say "guard page", the conventional term, you'll complain that a
single page isn't sufficient.
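A rough sketch of that guard-region idea, under loud assumptions: Linux-like behavior (a write to a PROT_NONE mapping raises SIGSEGV and the instruction is retried when the handler returns), a heap mapping standing in for the real stack guard, and hypothetical function names throughout. A real version would also need sigaltstack so the handler can run after a genuine stack overflow, and mprotect is not on the POSIX async-signal-safe list even though it is commonly called from such handlers.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guard;                 /* start of the stand-in guard region */
static size_t guard_size;
static volatile sig_atomic_t overflow_pending;

/* First hit in the guard: unprotect it, set the flag, and resume.
   Any other fault, or a second one before toplevel: just die. */
void on_segv(int sig, siginfo_t *info, void *ctx)
{
  char *addr = (char *) info->si_addr;
  (void) ctx;
  if (!overflow_pending && addr >= guard && addr < guard + guard_size
      && mprotect(guard, guard_size, PROT_READ | PROT_WRITE) == 0)
    {
      overflow_pending = 1;
      return;                       /* faulting access is retried */
    }
  signal(sig, SIG_DFL);             /* retry now kills the process */
}

int guard_init(void)
{
  struct sigaction sa;
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  guard_size = (size_t) page * 4;   /* more than one page, per [1] */
  guard = mmap(NULL, guard_size, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (guard == MAP_FAILED)
    return -1;
  sa.sa_sigaction = on_segv;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, NULL);
}

/* Polled at safe points, in the spirit of QUIT. */
int quit_check(void) { return overflow_pending != 0; }

/* Called once control is back at toplevel: re-arm the guard. */
int guard_rearm(void)
{
  if (!overflow_pending)
    return 0;
  if (mprotect(guard, guard_size, PROT_NONE) != 0)
    return -1;
  overflow_pending = 0;
  return 0;
}

/* Demo helper: touch the guard the way runaway recursion would. */
void touch_guard(void) { ((volatile char *) guard)[0] = 1; }
```

The handler never jumps. It only buys enough room for the code to reach the next quit_check, which then unwinds through the ordinary, safe path.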
> From what Eli writes, some of this work (I don't know how much) has
> already been done for the MS-Windows port. It would be helpful to do
> something similar for GNU/Linux and similar platforms, and to do it
> right by marking critical sections etc. All this would make Emacs more
> bullet-proof, if someone has the time to undertake the job.
I haven't looked as much at the Windows stack overflow implementation,
but there's clearly nothing in the code right now that establishes the
critical regions necessary for this scheme to work. Besides, this scheme
*still* leaves you vulnerable to stack overflow, since if you overflow
in a cancellation-disable region, your only option is to crash.
Again, neither you nor Eli have demonstrated in any way that all this
complexity is necessary, that we actually have a C stack overflow
problem, or that we have special needs that other text editors and user
programs don't, needs that justify an elaborate stack overflow recovery
scheme.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 18:08 ` Eli Zaretskii
@ 2016-01-03 18:24 ` Daniel Colascione
2016-01-03 18:51 ` Eli Zaretskii
2016-01-04 3:17 ` Richard Stallman
1 sibling, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 18:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
On 01/03/2016 10:08 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 09:49:03 -0800
>>
>>>> You're creating a false dichotomy between safety-critical software and
>>>> everything else. That Emacs is not avionics-grade software does not
>>>> excuse the use of techniques that are both inherently incorrect and that
>>>> add no real value and quite a bit of real danger.
>>>
>>> It's not a false dichotomy, it's real. That you misunderstand this
>>> crucial issue is the root cause of this dispute and of our fundamental
>>> disagreement. You are applying theory outside of its domain of
>>> applicability.
>>
>> You're not seeing that robustness applies to all software, not just
>> "safety-critical" (however you define that) software, because users
>> depend on software being predictable.
>
> Robustness comes at a price. You are asking Emacs and its users to
> pay a heavy price that they don't need to pay, because there are no
> requirements for Emacs to be as robust as safety-critical software.
It's not a heavy price at all. We already protect against runaway lisp
code. If this stack overflow recovery were so important, you'd see other
programs in the same niche (e.g., vim) do it. Why is Emacs alone here?
> Engineering is about compromises: you design and implement your
> systems to meet the requirements with some reasonable margin, but you
> do not implement non-essential features that exert a significant
> impact on what the product can or cannot do. Doing so is bad
> engineering.
>
>>>> You have *still* not presented any evidence, not one shred, that we have
>>>> a real stack overflow problem that makes it worth relying on more than
>>>> the auto-save functionality and that makes it worth reaching for unsafe
>>>> and completely undefined behavior.
>>>
>>> Not sure what evidence you are looking for. Does the fact that 2 not
>>> entirely stupid Emacs developers, each one with years of hacking Emacs
>>> on their record, disagree with you constitute such evidence?
>>
>> That's not evidence. It's the opinion of two people
>
> The argument is about assessments. There could be no facts here, only
> opinions. What else did you expect?
>
>> one of whom previously said that the worst side effect of this
>> scheme is a potential memory leak, a statement that suggests that
>> the dangers of this scheme are not being appreciated.
>
> Only if you think about Emacs as a safety-critical piece of software
> that must operate continuously, 24x7. Otherwise, memory leaks when
> recovering from a disaster that happens very rarely are quite
> acceptable, if it brings other benefits (such as not losing work).
My point isn't that memory leaks are disastrous. It's that the
consequences of this code weren't given due consideration at the time it
was committed.
>>>> All you have is your assertion that, since Emacs is not safety-critical
>>>> software, we can and should use this technique, which you have not
>>>> demonstrated saves anyone anything and which I have demonstrated is
>>>> completely unsafe.
>>>
>>> We are not looking for safe techniques. That's exactly your mistake.
>>> We are looking for pragmatically helpful techniques.
>>
>> I don't think this technique is even helpful. Quite the opposite,
>> actually, if we start to pollute the module API with some facility for
>> dealing with the result of this awful stack overflow scheme.
>
> You are not objective, so you exaggerate the risks and dismiss the
> benefits.
I disagree that there *are* significant benefits.
>> *Anything* can happen, and there's no guarantee that what happens is
>> better for the user than an immediate crash. Hell, you can even cause
>> security problems with schemes of this sort.
>
> Sorry, that's FUD.
No it isn't. When you invoke undefined behavior, anything unpleasant can
happen, and at scale, everything unpleasant will.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 18:24 ` Daniel Colascione
@ 2016-01-03 18:51 ` Eli Zaretskii
2016-01-03 19:04 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 18:51 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 10:24:26 -0800
>
> > Robustness comes at a price. You are asking Emacs and its users to
> > pay a heavy price that they don't need to pay, because there are no
> > requirements for Emacs to be as robust as safety-critical software.
>
> It's not a heavy price at all.
Yes, it is. You would like us to crash rather than try recovering.
That is a very heavy price in Emacs.
> > Only if you think about Emacs as a safety-critical piece of software
> > that must operate continuously, 24x7. Otherwise, memory leaks when
> > recovering from a disaster that happens very rarely are quite
> > acceptable, if it brings other benefits (such as not losing work).
>
> My point isn't that memory leaks are disastrous. It's that the
> consequences of this code weren't given due consideration at the time it
> was committed.
You have absolutely no evidence that this wasn't considered. It's
factually incorrect. You don't have to know that it's incorrect, but
I would expect you to give more credit to our collective knowledge and
experience than you evidently do.
> > You are not objective, so you exaggerate the risks and dismiss the
> > benefits.
>
> I disagree that there *are* significant benefits.
Of course, you do. Like I said: your bias affects your judgment.
> >> *Anything* can happen, and there's no guarantee that what happens is
> >> better for the user than an immediate crash. Hell, you can even cause
> >> security problems with schemes of this sort.
> >
> > Sorry, that's FUD.
>
> No it isn't. When you invoke undefined behavior, anything unpleasant can
> happen, and at scale, everything unpleasant will.
It's not undefined behavior, not in practice. We know quite well what
can and cannot happen.
Anyway, saying that "unpleasant things can happen" _is_ FUD. I want
to see a single bug report about these unpleasant things happening in
real use, then I'll start thinking whether I should reconsider.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 18:51 ` Eli Zaretskii
@ 2016-01-03 19:04 ` Daniel Colascione
2016-01-03 19:15 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 19:04 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
On 01/03/2016 10:51 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 10:24:26 -0800
>>
>>> Robustness comes at a price. You are asking Emacs and its users to
>>> pay a heavy price that they don't need to pay, because there are no
>>> requirements for Emacs to be as robust as safety-critical software.
>>
>> It's not a heavy price at all.
>
> Yes, it is. You would like us to crash rather than try recovering.
> That is a very heavy price in Emacs.
Why is it uniquely unacceptable in Emacs? Why do other programs that
fill the same niche not employ this strategy? Why do we not try to
mitigate NULL pointer dereferences (to which all the same arguments apply)?
You haven't addressed any of these points.
>>> Only if you think about Emacs as a safety-critical piece of software
>>> that must operate continuously, 24x7. Otherwise, memory leaks when
>>> recovering from a disaster that happens very rarely are quite
>>> acceptable, if it brings other benefits (such as not losing work).
>>
>> My point isn't that memory leaks are disastrous. It's that the
>> consequences of this code weren't given due consideration at the time it
>> was committed.
>
> You have absolutely no evidence that this wasn't considered. It's
> factually incorrect. You don't have to know that it's incorrect, but
> I would expect you to give more credit to our collective knowledge and
> experience than you evidently do.
I searched the mailing list and saw no discussion of the points I
raised. The rebuttals to my concerns ("so what if some memory leaks?",
"emacs has longjmped from arbitrary points forever") have been
inadequate and incorrect.
>>> You are not objective, so you exaggerate the risks and dismiss the
>>> benefits.
>>
>> I disagree that there *are* significant benefits.
>
> Of course, you do. Like I said: your bias affects your judgment.
So does yours.
>
>>>> *Anything* can happen, and there's no guarantee that what happens is
>>>> better for the user than an immediate crash. Hell, you can even cause
>>>> security problems with schemes of this sort.
>>>
>>> Sorry, that's FUD.
>>
>> No it isn't. When you invoke undefined behavior, anything unpleasant can
>> happen, and at scale, everything unpleasant will.
>
> It's not undefined behavior, not in practice. We know quite well what
> can and cannot happen.
No you don't, because we can longjmp out of third-party code, and unless
you have a crystal ball, you're not going to be able to predict
everything that code can do. That we know what can happen here is simply
false.
>
> Anyway, saying that "unpleasant things can happen" _is_ FUD. I want
> to see a single bug report about these unpleasant things happening in
> real use, then I'll start thinking whether I should reconsider.
And I want to see a real bug report about the stack overflow we're
trying to defend against. The failure mode here wouldn't be obvious
either: Emacs could just silently crash, hang, or write a wrong byte or
two to a file.
You have no idea what might happen, which is especially concerning
because Emacs is frequently an internet-facing network program parsing
untrusted data.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 19:04 ` Daniel Colascione
@ 2016-01-03 19:15 ` Eli Zaretskii
2016-01-03 19:26 ` Daniel Colascione
2016-01-03 19:49 ` John Wiegley
0 siblings, 2 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 19:15 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 11:04:08 -0800
>
> > Yes, it is. You would like us to crash rather than try recovering.
> > That is a very heavy price in Emacs.
>
> Why is it uniquely unacceptable in Emacs? Why do other programs that
> fill the same niche not employ this strategy?
Not many other programs run for so long and have so much precious data
for their users. Besides, who says there are no other programs that
do this? libsigsegv wasn't written as an academic exercise.
> Why do we not try to mitigate NULL pointer dereferences (to which
> all the same arguments apply)?
We do: we catch SIGSEGV and try to save what can be salvaged.
> >> My point isn't that memory leaks are disastrous. It's that the
> >> consequences of this code weren't given due consideration at the time it
> >> was committed.
> >
> > You have absolutely no evidence that this wasn't considered. It's
> > factually incorrect. You don't have to know that it's incorrect, but
> > I would expect you to give more credit to our collective knowledge and
> > experience than you evidently do.
>
> I searched the mailing list and saw no discussion of the points I
> raised.
Who said that considerations must be in public discussions? On the
contrary, I'd rather take the lack of discussions as an indication
that this was considered and no one saw any problem with it.
> >>> You are not objective, so you exaggerate the risks and dismiss the
> >>> benefits.
> >>
> >> I disagree that there *are* significant benefits.
> >
> > Of course, you do. Like I said: your bias affects your judgment.
>
> So does yours.
No, I acknowledge the risks. You don't acknowledge the benefits.
> > It's not undefined behavior, not in practice. We know quite well what
> > can and cannot happen.
>
> No you don't, because we can longjmp out of third-party code
FUD. What "third-party code"? Any code we use in Emacs has its
sources open for scrutiny.
> > Anyway, saying that "unpleasant things can happen" _is_ FUD. I want
> > to see a single bug report about these unpleasant things happening in
> > real use, then I'll start thinking whether I should reconsider.
>
> And I want to see a real bug report about the stack overflow we're
> trying to defend against.
We've been through that already: if stack overflow never happens, the
recovery code can never cause any problems.
> The failure mode here wouldn't be obvious either: Emacs could just
> silently crash, hang, or write a wrong byte or two to a file.
Neither of which is a disaster.
> You have no idea what might happen, which is especially concerning
> because Emacs is frequently an internet-facing network program parsing
> untrusted data.
All I want is to take every measure to avoid losing work. Every other
problem was already there before stack-overflow recovery was added.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 19:15 ` Eli Zaretskii
@ 2016-01-03 19:26 ` Daniel Colascione
2016-01-03 19:46 ` Eli Zaretskii
2016-01-03 19:49 ` John Wiegley
1 sibling, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 19:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
On 01/03/2016 11:15 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 11:04:08 -0800
>>
>>> Yes, it is. You would like us to crash rather than try recovering.
>>> That is a very heavy price in Emacs.
>>
>> Why is it uniquely unacceptable in Emacs? Why do other programs that
>> fill the same niche not employ this strategy?
>
> Not many other programs run for so long and have so much precious data
> for their users. Besides, who says there are no other programs that
> do this? libsigsegv wasn't written as an academic exercise.
Many other programs run as long. One example is the Linux kernel, which
panics on stack overflow.
>> Why do we not try to mitigate NULL pointer dereferences (to which
>> all the same arguments apply)?
>
> We do: we catch SIGSEGV and try to save what can be salvaged.
Invoking auto-save after resetting SIGSEGV is a good application of that
approach. (We should make sure that control flow can't leave the sigsegv
handler.) What's dangerous is allowing Emacs to continue running after
we've detected that it's entered a bad state. I'm not against installing
a sigsegv handler: I'm against returning control flow to toplevel.
>>>> My point isn't that memory leaks are disastrous. It's that the
>>>> consequences of this code weren't given due consideration at the time it
>>>> was committed.
>>>
>>> You have absolutely no evidence that this wasn't considered. It's
>>> factually incorrect. You don't have to know that it's incorrect, but
>>> I would expect you to give more credit to our collective knowledge and
>>> experience than you evidently do.
>>
>> I searched the mailing list and saw no discussion of the points I
>> raised.
>
> Who said that considerations must be in public discussions? On the
> contrary, I'd rather take the lack of discussions as an indication
> that this was considered and no one saw any problem with it.
The lack of discussion is consistent with both my view and widespread, sagacious
approval. Given the concerns I raised, the more parsimonious explanation
is that the code went in without review, because even if you and Paul
are right, it's worth having a conversation about the dangers of the
code, and AFAICT, there was none.
>>>>> You are not objective, so you exaggerate the risks and dismiss the
>>>>> benefits.
>>>>
>>>> I disagree that there *are* significant benefits.
>>>
>>> Of course, you do. Like I said: your bias affects your judgment.
>>
>> So does yours.
>
> No, I acknowledge the risks. You don't acknowledge the benefits.
The benefit is that returning control to toplevel allows the user to
save data in buffers where autosave is not enabled. I think the benefit
is slight.
Autosave is the only mechanism that protects against other failure
modes, like the OOM killer, NULL pointer dereferences, and sudden power
loss. Consequently, I strongly suspect that any truly precious data is
in autosave buffers and that this stack overflow mitigation in practice
allows the recovery of nothing important.
>>> It's not undefined behavior, not in practice. We know quite well what
>>> can and cannot happen.
>>
>> No you don't, because we can longjmp out of third-party code
>
> FUD. What "third-party code"? Any code we use in Emacs has its
> sources open for scrutiny.
First of all, it's perfectly legal to update libc to a version that
wasn't around for a particular Emacs release, and this libc (which is
perfectly conforming under _legitimate_ API use) might have problems
with the Emacs recovery scheme that we didn't and couldn't anticipate.
Also, third-party libraries are generally written under the assumption
that control isn't yanked from under them partway through delicate
operations. I don't think it's reasonable to expect that every library
Emacs uses be robust under this kind of abuse.
>>> Anyway, saying that "unpleasant things can happen" _is_ FUD. I want
>>> to see a single bug report about these unpleasant things happening in
>>> real use, then I'll start thinking whether I should reconsider.
>>
>> And I want to see a real bug report about the stack overflow we're
>> trying to defend against.
>
> We've been through that already: if stack overflow never happens, the
> recovery code can never cause any problems.
Given that stack overflow is rare, we won't get to test the scenario
much. We should err on the side of making Emacs behave predictably
instead of trying to recover using undefined behavior, because if the
recovery causes problems, it'll be hard to tell.
>> The failure mode here wouldn't be obvious either: Emacs could just
>> silently crash, hang, or write a wrong byte or two to a file.
>
> Neither of which is a disaster.
Neither of which will produce a bug report blaming this code, so the
lack of bug reports is not positive evidence that this code is harmless.
>> You have no idea what might happen, which is especially concerning
>> because Emacs is frequently an internet-facing network program parsing
>> untrusted data.
>
> All I want is to take every measure to avoid losing work. Every other
> problem was already there before stack-overflow recovery was added.
I agree that we should avoid losing work. The way to do that is to beef
up autosave so that after a crash, we can recover quickly. That's the
approach other long-running programs with precious user data, like
Office, Visual Studio, Firefox, and vim, use.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 19:26 ` Daniel Colascione
@ 2016-01-03 19:46 ` Eli Zaretskii
2016-01-03 19:47 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-03 19:46 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 11:26:26 -0800
>
> [...]
This is going nowhere. We have circled around the same arguments for
at least the 3rd time. We evidently have too different perspectives
on what's involved, and no hope of ever agreeing.
We should stop. Well, I am.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 19:46 ` Eli Zaretskii
@ 2016-01-03 19:47 ` Daniel Colascione
0 siblings, 0 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 19:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
On 01/03/2016 11:46 AM, Eli Zaretskii wrote:
>> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 11:26:26 -0800
>>
>> [...]
>
> This is going nowhere. We have circled around the same arguments for
> at least the 3rd time. We evidently have too different perspectives
> on what's involved, and no hope of ever agreeing.
>
> We should stop. Well, I am.
I'd like johnw to weigh in on the code. That was the entire point of
replying to his request for a summary.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 19:15 ` Eli Zaretskii
2016-01-03 19:26 ` Daniel Colascione
@ 2016-01-03 19:49 ` John Wiegley
2016-01-03 20:14 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 19:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Daniel Colascione, Emacs-devel, eggert
>>>>> Eli Zaretskii <eliz@gnu.org> writes:
> All I want is to take every measure to avoid losing work. Every other
> problem was already there before stack-overflow recovery was added.
I asked last week for a summary of the options available for moving forward.
Can someone please summarize so we can end this discussion with a decision?
Thank you,
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 19:49 ` John Wiegley
@ 2016-01-03 20:14 ` Daniel Colascione
0 siblings, 0 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 20:14 UTC (permalink / raw)
To: Eli Zaretskii, eggert, Emacs-devel
On 01/03/2016 11:49 AM, John Wiegley wrote:
>>>>>> Eli Zaretskii <eliz@gnu.org> writes:
>
>> All I want is to take every measure to avoid losing work. Every other
>> problem was already there before stack-overflow recovery was added.
>
> I asked last week for a summary of the options available for moving forward.
> Can someone please summarize so we can end this discussion with a decision?
Well, I tried. I guess it'll have to be someone who isn't Eli, Paul, or
me. Alternatively, reading the thread from today's first message should
give you an idea of what it's about.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 14:27 ` Daniel Colascione
2016-01-03 15:46 ` Eli Zaretskii
2016-01-03 16:31 ` Paul Eggert
@ 2016-01-03 20:25 ` John Wiegley
2016-01-03 20:47 ` Daniel Colascione
2 siblings, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 20:25 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> In practice, the Lisp stack depth limits provide enough protection, and the
> risk of data corruption is too great. The existing auto-save logic is good
> enough for data recovery, especially if we run the sigsegv handler on the
> alternate signal stack (which we can make as large as we want) when
> possible.
OK, I see we have two roads, and I see where your objection is coming from.
You say, "In practice". Can you expound on your practical experience? I'm
curious if there's a real experience you've had that leads to such a strong
objection.
Also, note that other cases of error recovery leading to undefined behavior
exist in the wild: If a process uses too much memory, Linux's OOM killer will
terminate arbitrary processes in an attempt to prevent system lockup. There
are no guarantees that it will not kill something that leaves the system in an
inconsistent or bad state, since the process it kills may have been in the
middle of a critical process, and the author might not have written proper
signal handlers.
I'm inclined to leave the stack overflow protection in until it bites us;
because I know from personal evidence that having Emacs suddenly disappear
DOES bite people. I'm less sure about "undefined behavior" that I haven't
experienced yet...
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 20:25 ` John Wiegley
@ 2016-01-03 20:47 ` Daniel Colascione
2016-01-03 21:07 ` John Wiegley
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 20:47 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
On 01/03/2016 12:25 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> In practice, the Lisp stack depth limits provide enough protection, and the
>> risk of data corruption is too great. The existing auto-save logic is good
>> enough for data recovery, especially if we run the sigsegv handler on the
>> alternate signal stack (which we can make as large as we want) when
>> possible.
>
> OK, I see we have two roads, and I see where your objection is coming from.
>
> You say, "In practice". Can you expound on your practical experience? I'm
> curious if there's a real experience you've had that leads to such a strong
> objection.
I hate to use arguments from experience, but you asked: I worked on
crash reporting for Windows Phone, and I do significant work for crash
reporting on Messenger and Facebook for Android. I've worked extensively
with Breakpad, ACRA, multiplexed unix signal handlers, crash
classification, and so on. In my experience, attempts to recover from
crashes have almost always made problems worse: they obscure root causes
of important bugs by causing seemingly-impossible downstream crashes and
data corruption.
It's not just a theoretical problem: I've spent lots of late nights
staring at stack traces, trying to figure out how a certain deadlock
could be possible, only to realize that the program had already crashed
--- or would have, if a seldom-tested bit of code hadn't checked for
NULL and returned without releasing a lock, causing a hang half an hour
later. It's even worse with an SEH handler, which allows programmers
writing for Windows to do this:
    for(;;) {
      __try {
        DoSomething();
      } __except(1) {
        // LOL: silently ignore all stack overflow, NULL deref, etc.
      }
    }
The Emacs error recovery code is similar in spirit. Granted, it's not
silent, and we don't try to recover from *all* segfaults, but it's still
essentially ignoring a programming error and trying to continue.
It's because I've wasted so much time debugging these kinds of programs
that I strongly prefer failing fast when something goes wrong and relying
on automatic persistence mechanisms to preserve volatile data. I've seen
dozens of simple bugs (that could have been quickly fixed) turn into
monsters because someone tried to paper over them and keep a program
from crashing.
In the context of stack overflow in Emacs, if we're reaching this code,
it's because we made a mistake [1] in the C core. There's no user
interaction that should cause us to overflow the stack. I'd rather know
about that mistake and get a user back into a working Emacs as soon as
possible.
[1] The GC tracing thing is concerning, but Paul's post actually gives
me an idea for fixing it without completely redoing marking: we can
reserve GC stack at the same time we allocate lisp objects, say in 2MB
chunks, and then just switch stacks as we mark.
> Also, note that other cases of error recovery leading to undefined behavior
> exist in the wild: If a process uses too much memory, Linux's OOM killer will
> terminate arbitrary processes in an attempt to prevent system lockup. There
> are no guarantees that it will not kill something that leaves the system in an
> inconsistent or bad state, since the process it kills may have been in the
> middle of a critical process, and the author might not have written proper
> signal handlers.
Nit: unfortunately, it's not possible for normal processes to even
detect the Linux OOM killer's operation. Death comes quickly via
SIGKILL; if you want to recover, you need a watchdog.
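As a rough sketch of what such a watchdog could look like: a parent process forks the real worker and inspects how it ended, since the worker itself never gets a chance to react to SIGKILL. The names watch_worker, worker_ok and worker_killed are made up for this example, and raise(SIGKILL) merely stands in for the OOM killer.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run WORK in a child process and report how it ended:
   0 if it exited on its own, the signal number if a signal killed it,
   -1 if the child could not be started or waited for. */
int watch_worker(void (*work)(void))
{
  int status;
  pid_t pid = fork();

  if (pid < 0)
    return -1;
  if (pid == 0) {
    work();
    _exit(0);
  }
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR)
      return -1;
  }
  if (WIFSIGNALED(status))
    return WTERMSIG(status); /* e.g. SIGKILL from the OOM killer */
  return 0;
}

/* Demo workers, purely illustrative. */
void worker_ok(void) { }
void worker_killed(void) { raise(SIGKILL); } /* simulates sudden death */
```

A caller that sees SIGKILL come back from watch_worker knows the worker died without running any cleanup, and can start recovery from whatever was persisted earlier.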
You're right that under Linux, programs need to prepare for the
possibility that they might suddenly cease to exist. We're talking about
something different here, which is the possibility that a program can
*keep running*, but in a damaged and undefined state.
> I'm inclined to leave the stack overflow protection in until it bites us;
> because I know from personal evidence that having Emacs suddenly disappear
> DOES bite people. I'm less sure about "undefined behavior" that I haven't
> experienced yet...
I'm worried that it'll be hard to know if it bites us, particularly
since the problems I'm imagining are infrequent, unreproducible, and
carry no obvious signature that would show up in a user crash report.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 18:22 ` Daniel Colascione
@ 2016-01-03 21:02 ` Paul Eggert
2016-01-03 21:12 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2016-01-03 21:02 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii, Emacs-devel
Daniel Colascione wrote:
> neither you nor Eli have demonstrated in any way that all this
> complexity is necessary, that we actually have a C stack overflow
> problem,
I mentioned (1) stack overflow in the regexp code, and (2) stack overflow in C
modules; did you miss that? The counterargument for (2) that C modules can crash
Emacs in countless ways so let's not worry about stack overflow is not all that
convincing. It can be useful for the suspenders of stack-overflow checking to go
along with the belt of must-be-perfect modules.
> handle_interrupt can call quit_throw_to_read_char only
> when waiting_for_input is true, which it is only when, well, we're
> waiting for input, not at arbitrary points in the program.
Ah, good point, so that part of the code should be OK. Still, a few lines
earlier we see things like Fdo_auto_save () and fflush (stdout) that can be
executed from a Unix signal handler while quit-flag is non-nil. Although this
has undefined behavior too, this code has been around for quite some time and I
use it more often than I like to admit.
> Or can we use a stack guard region [1], and in the signal handler,
> unprotect the set a global variable in the signal handler, and check the
> variable on QUIT, and at toplevel, reprotect the guard region. If we
> segfault again without having reached toplevel, just die. Would that
> make you happy?
I think something along those lines would suffice, yes. Admittedly I didn't quite
follow what you wrote (perhaps some text got elided?). But the main point, as I
understand it, is that we needn't worry about having a stack-overflow check
inside the stack-overflow handler, because we can insist that the stack-overflow
handler be tightly-enough controlled so that it won't recurse indefinitely.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 20:47 ` Daniel Colascione
@ 2016-01-03 21:07 ` John Wiegley
2016-01-03 21:28 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 21:07 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> It's not just a theoretical problem: I've spent lots of late nights staring
> at stack traces, trying to figure out how a certain deadlock could be
> possible, only to realize that the program had already crashed --- or would
> have, if a seldom-tested bit of code hadn't checked for NULL and returned
> without releasing a lock, causing a hang half an hour later.
I see. Isn't what you describe an argument against error handling in general,
though? It too can mask the origin of serious problems.
What if we do this:
1. When a serious error occurs that engages crash recovery, we pop up a
window in Emacs describing that a serious error occurred that would have
crashed Emacs --and that *nothing* should be trusted now. All the user
should do is save critical buffers and exit immediately.
2. When in such a state, M-x report-emacs-bug automatically includes a trace
for the location where the crash occurred. Of course, this assumes Emacs
is still functional enough to send e-mail.
> You're right that under Linux, programs need to prepare for the possibility
> that they might suddenly cease to exist. We're talking about something
> different here, which is the possibility that a program can *keep running*,
> but in a damaged and undefined state.
I was thinking the system itself is now running in a damaged and undefined
state. When that happens, I often reboot since I can't really trust it
anymore.
> I'm worried that it'll be hard to know if it bites us, particularly since
> the problems I'm imagining are infrequent, unreproducible, and carry no
> obvious signature that would show up in a user crash report.
If we use a window to pop up an alarm indicating, boldly, that Emacs is now
UNSTABLE and should only be used to save files and exit -- maybe even noting
how to abort Emacs to avoid typical cleanup actions -- we can start getting
feedback on whether this feature really helps or hurts.
I understand error handlers can mask problems, and that they've made your life
more difficult as an engineer concerned with uncovering such causes. However,
I'm disinclined to accept, a priori, that it will hurt before trying it out.
When Emacs isn't being run under gdb (which it almost never is) it also
doesn't give much useful information about what happened, and loses data. With
the crash recovery logic, we should at least be able to provide a trace of
where we were when the crash was detected, plus give the user a chance of
reporting that data back to us. I see this as possibly *increasing* the amount
of error information we receive, and not just masking or eliminating it.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:02 ` Paul Eggert
@ 2016-01-03 21:12 ` Daniel Colascione
2016-01-03 23:11 ` Paul Eggert
2016-01-04 15:24 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 21:12 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
On 01/03/2016 01:02 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>
>> neither you nor Eli have demonstrated in any way that all this
>> complexity is necessary, that we actually have a C stack overflow
>> problem,
>
> I mentioned (1) stack overflow in the regexp code, and (2) stack
> overflow in C modules; did you miss that? The counterargument for (2)
> that C modules can crash Emacs in countless ways so let's not worry
> about stack overflow is not all that convincing.
I think it's very convincing, and it's even less defensible to try to
fix overflows in modules. It's *critical* not to violate the invariants
of code. You have no idea what might happen when you do. It's not the
job of the Emacs core to try to fix the bugs in modules that happen to
be loaded into the process.
> It can be useful for
> the suspenders of stack-overflow checking to go along with the belt of
> must-be-perfect modules.
As I've previously written, I don't believe in trying to paper over
bugs. Just crash. Loading native code in Emacs is a dangerous operation;
trying to hide that danger by attempting to fix certain classes of
module bugs will just make all problems harder to find.
>> handle_interrupt can call quit_throw_to_read_char only
>> when waiting_for_input is true, which it is only when, well, we're
>> waiting for input, not at arbitrary points in the program.
>
> Ah, good point, so that part of the code should be OK. Still, a few
> lines earlier we see things like Fdo_auto_save () and fflush (stdout)
> that can be executed from a Unix signal handler while quit-flag is
> non-nil. Although this has undefined behavior too, this code has been
> around for quite some time and I use it more often than I like to admit.
I don't like this either. It should be possible to replace the printfs
in this instance with calls to write(1, "message") (which will bypass
any output buffering) and restore async-signal-safety.
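To make the idea concrete, a small sketch of a stdio-free message routine follows. The full call is write(fd, buf, len); signal_safe_puts is a hypothetical helper name, and the length is computed by hand so that only write(2) is relied on inside a handler.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the NUL-terminated MSG to FD using only write(2), retrying on
   short writes and EINTR.  Returns 0 on success, -1 on error.
   A real handler should also save and restore errno around this call. */
int signal_safe_puts(int fd, const char *msg)
{
  size_t len = 0;

  while (msg[len] != '\0')
    len++;
  while (len > 0) {
    ssize_t n = write(fd, msg, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    msg += n;
    len -= (size_t) n;
  }
  return 0;
}
```

Unlike printf followed by fflush, nothing here touches a stdio buffer or its lock, so interrupting a half-finished printf in the main program cannot deadlock or corrupt the output state.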
If a user elects to attempt auto-save in this situation, that's on him.
Ideally, we'd make autosave async-signal-safe, which will help in this
handler and in the segfault handler.
>> Or can we use a stack guard region [1], and in the signal handler,
>> unprotect the set a global variable in the signal handler, and check the
>> variable on QUIT, and at toplevel, reprotect the guard region. If we
>> segfault again without having reached toplevel, just die. Would that
>> make you happy?
>
> I think something along those lines would suffice, yes. Admittedly I
> didn't quite follow what you wrote (perhaps some text got elided?). But
> the main point, as I understand it, is that we needn't worry about
> having a stack-overflow check inside the stack-overflow handler, because
> we can insist that the stack-overflow handler be tightly-enough
> controlled so that it won't recurse indefinitely.
Yes: do as little as possible in the segfault handler and signal an
error the normal way at the next safe opportunity, if one arises before
we fully exhaust the stack.
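One possible reading of that scheme, sketched under loud assumptions: the "guard region" below is an ordinary mmap'd page rather than the real end of the C stack, all names (guard_setup, guard_touch, overflow_check, guard_rearm) are invented for the example, and mprotect is called from the handler even though POSIX does not list it as async-signal-safe. A real stack-overflow handler would additionally need SA_ONSTACK and an alternate signal stack.

```c
#define _DEFAULT_SOURCE 1
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guard_page;                       /* stand-in guard region */
static size_t guard_size;
static volatile sig_atomic_t overflow_pending; /* polled at safe points */

/* Do as little as possible: open the guard, record the event, return. */
static void on_fault(int sig, siginfo_t *info, void *context)
{
  (void) context;
  uintptr_t addr = (uintptr_t) info->si_addr;
  uintptr_t start = (uintptr_t) guard_page;

  if (addr < start || addr >= start + guard_size) {
    /* Not our guard: restore the default action so that the retried
       access kills the process ("just die"). */
    signal(sig, SIG_DFL);
    return;
  }
  mprotect(guard_page, guard_size, PROT_READ | PROT_WRITE);
  overflow_pending = 1;
}

int guard_setup(void)
{
  struct sigaction sa;
  long page = sysconf(_SC_PAGESIZE);
  void *p;

  if (page <= 0)
    return -1;
  guard_size = (size_t) page;
  p = mmap(NULL, guard_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  guard_page = p;

  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = on_fault;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, NULL) != 0)
    return -1;
  return sigaction(SIGBUS, &sa, NULL); /* some systems report SIGBUS */
}

/* Simulates running into the guard region. */
void guard_touch(void) { *(volatile char *) guard_page = 1; }

/* Called at a safe point (the analogue of QUIT): returns 1 once per
   recorded overflow so the caller can signal an error the normal way. */
int overflow_check(void)
{
  if (!overflow_pending)
    return 0;
  overflow_pending = 0;
  return 1;
}

/* Called at toplevel: protect the guard region again. */
int guard_rearm(void)
{
  return mprotect(guard_page, guard_size, PROT_NONE);
}
```

The handler itself never unwinds anything; the faulting access is simply retried once the guard is open, and the error is raised later from ordinary code that calls overflow_check.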
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:07 ` John Wiegley
@ 2016-01-03 21:28 ` Daniel Colascione
2016-01-03 21:31 ` Daniel Colascione
` (2 more replies)
0 siblings, 3 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 21:28 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
On 01/03/2016 01:07 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> It's not just a theoretical problem: I've spent lots of late nights staring
>> at stack traces, trying to figure out how a certain deadlock could be
>> possible, only to realize that the program had already crashed --- or would
>> have, if a seldom-tested bit of code hadn't checked for NULL and returned
>> without releasing a lock, causing a hang half an hour later.
>
> I see. Isn't what you describe an argument against error handling in general,
> though? It too can mask the origin of serious problems.
It is. There's a difference between trying to paper over undefined
behavior generally, however, and reporting well-defined errors using a
safe mechanism. (The former invalidates the system's own invariants,
while the latter invalidates only the application's invariants.)
But yes, error handling in general can paper over bugs, and I've
certainly seen Emacs bugs similarly exacerbated by attempting to ignore
errors.
> What if we do this:
>
> 1. When a serious error occurs that engages crash recovery, we pop up a
> window in Emacs describing that a serious error occurred that would have
> crashed Emacs --and that *nothing* should be trusted now. All the user
> should do is save critical buffers and exit immediately.
The call to Fdo_auto_save tries to do that already. Fdo_auto_save isn't
async-signal-safe, so I'd rather fork a child process, in the child,
call Fdo_auto_save and exit, have the parent wait 500ms for the child
(not forever, in case the child deadlocks), kill the child, and continue
crashing. That, or provide a less elaborate, async-signal-safe, pure C
auto-save facility.
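A minimal sketch of the fork-and-wait idea above, assuming a POSIX system. The names save_in_child, save_ok and save_hang are hypothetical; in Emacs the child would call Fdo_auto_save instead of the stand-in callbacks. The parent uses only pipe, fork, poll, kill, waitpid and close, which POSIX.1-2008 lists as async-signal-safe; the child is free to do anything because the parent never waits on it for longer than the timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run SAVE in a forked child and give it at most TIMEOUT_MS
   milliseconds.  Return 0 if the child exited with status 0 in
   time, -1 otherwise.  The child is always reaped.  */
static int
save_in_child (void (*save) (void), int timeout_ms)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t child = fork ();
  if (child < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (child == 0)
    {
      close (fds[0]);
      save ();
      _exit (0);		/* Closes fds[1], which wakes the parent.  */
    }

  /* The read end reports a hangup once the child has exited.  */
  close (fds[1]);
  struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
  int ready = poll (&pfd, 1, timeout_ms);
  close (fds[0]);

  if (ready <= 0)
    kill (child, SIGKILL);	/* Timed out or interrupted: give up.  */

  int status;
  if (waitpid (child, &status, 0) != child)
    return -1;
  return (ready > 0 && WIFEXITED (status) && WEXITSTATUS (status) == 0
	  ? 0 : -1);
}

/* Stand-ins for the real save routine, for illustration only.  */
static void save_ok (void) { }
static void save_hang (void) { for (;;) pause (); }
```

Calling save_in_child (save_hang, 500) models the deadlocked-child case: the parent kills the child after the timeout and carries on crashing.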
In any case, control flow shouldn't leave the signal handler when the
application is in an unpredictable state.
> 2. When in such a state, M-x report-emacs-bug automatically includes a trace
> for the location where the crash occurred. Of course, this assumes Emacs
> is still functional enough to send e-mail.
>
>> You're right that under Linux, programs need to prepare for the possibility
>> that they might suddenly cease to exist. We're talking about something
>> different here, which is the possibility that a program can *keep running*,
>> but in a damaged and undefined state.
Ideally, Emacs would, on crash (and after auto-save), spawn a copy of
itself with an error report pre-filled. Fork and exec work perfectly
fine in signal handlers.
> I was thinking the system itself is now running in a damaged and undefined
> state. When that happens, I often reboot since I can't really trust it
> anymore.
>
>> I'm worried that it'll be hard to know if it bites us, particularly since
>> the problems I'm imagining are infrequent, unreproducible, and carry no
>> obvious signature that would show up in a user crash report.
>
> If we use a window to pop up an alarm indicating, boldly, that Emacs is now
> UNSTABLE and should only be used to save files and exit -- maybe even noting
> how to abort Emacs to avoid typical cleanup actions -- we can start getting
> feedback on whether this feature really helps or hurts.
I think we need better crash reporting generally. Stack overflow is only
one instance of the general class of things that can go wrong.
But in any case, if we put Emacs into a state where the only thing a
user can do is save files, why not just save the files? There's no
guarantee that after a crash that we can even display something.
> I understand error handlers can mask problems, and that they've made your life
> more difficult as an engineer concerned with uncovering such causes. However,
> I'm disinclined to accept, a priori, that it will hurt before trying it out.
We have no information on how often Emacs crashes in the hands of real
users or how it crashes. A wait-and-see approach is just blind faith.
Nobody has brought up why other programs don't work this way, either. Other
programs avoid this kind of hackery for good reasons, which I've
detailed. We shouldn't ignore the lessons of everyone else. It's not for
lack of inspiration that nobody else does this.
One question that neither you, nor Eli, nor Paul have answered is why we
would try to recover from stack overflow and not NULL dereferences.
Exactly the same arguments apply to both situations.
> When Emacs isn't being run under gdb (which it almost never is) it also
> doesn't give much useful information about what happened, and loses data. With
> the crash recovery logic, we should at least be able to provide a trace of
> where we were when the crash was detected, plus give the user a chance of
> reporting that data back to us. I see this as possibly *increasing* the amount
> of error information we receive, and not just masking or eliminating it.
Emacs should report its own crashes somehow *generally*, probably with
Breakpad.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:28 ` Daniel Colascione
@ 2016-01-03 21:31 ` Daniel Colascione
2016-01-04 15:27 ` Eli Zaretskii
2016-01-03 21:45 ` John Wiegley
2016-01-04 15:26 ` Eli Zaretskii
2 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 21:31 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2090 bytes --]
On 01/03/2016 01:28 PM, Daniel Colascione wrote:
> On 01/03/2016 01:07 PM, John Wiegley wrote:
>>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>>
>>> It's not just a theoretical problem: I've spent lots of late nights staring
>>> at stack traces, trying to figure out how a certain deadlock could be
>>> possible, only to realize that the program had already crashed --- or would
>>> have, if a seldom-tested bit of code hadn't checked for NULL and returned
>>> without releasing a lock, causing a hang half an hour later.
>>
>> I see. Isn't what you describe an argument against error handling in general,
>> though? It too can mask the origin of serious problems.
>
> It is. There's a difference between trying to paper over undefined
> behavior generally, however, and reporting well-defined errors using a
> safe mechanism. (The former invalidates the system's own invariants,
> while the latter invalidates only the application's invariants.)
>
> But yes, error handling in general can paper over bugs, and I've
> certainly seen Emacs bugs similarly exacerbated by attempting to ignore
> errors.
>
>> What if we do this:
>>
>> 1. When a serious error occurs that engages crash recovery, we pop up a
>> window in Emacs describing that a serious error occurred that would have
>> crashed Emacs --and that *nothing* should be trusted now. All the user
>> should do is save critical buffers and exit immediately.
>
> The call to Fdo_auto_save tries to do that already. Fdo_auto_save isn't
> async-signal-safe, so I'd rather fork a child process, in the child,
> call Fdo_auto_save and exit, have the parent wait 500ms for the child
> (not forever, in case the child deadlocks), kill the child, and continue
> crashing. That, or provide a less elaborate, async-signal-safe, pure C
> auto-save facility.
I'd also support doing no auto-save at crash time. Auto-save should
happen frequently enough anyway that users shouldn't lose much data when
a crash happens, and not auto-saving sidesteps a lot of robustness concerns.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:28 ` Daniel Colascione
2016-01-03 21:31 ` Daniel Colascione
@ 2016-01-03 21:45 ` John Wiegley
2016-01-03 22:20 ` Daniel Colascione
2016-01-04 15:29 ` Eli Zaretskii
2016-01-04 15:26 ` Eli Zaretskii
2 siblings, 2 replies; 177+ messages in thread
From: John Wiegley @ 2016-01-03 21:45 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2466 bytes --]
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> Ideally, Emacs would, on crash (and after auto-save), spawn a copy of itself
> with an error report pre-filled. Fork and exec work perfectly fine in signal
> handlers.
One problem here is that some of us have extensive configurations that load a
great deal of saved state between executions. Spawning a new Emacs just to
send an error report is not something I'd want to see happen.
> But in any case, if we put Emacs into a state where the only thing a user
> can do is save files, why not just save the files? There's no guarantee that
> after a crash that we can even display something.
So, on a detected crash, auto-save all files, and save a text file with the
crash data before exiting? That sounds pretty safe and reasonable to me.
Maybe we could even popup a window to alert the user, and prompt them to press
a key, but the only action will be to exit (unless the user is a power user
and uses recursive edit to attempt to interact with their now-broken Emacs).
> We have no information on how often Emacs crashes in the hands of real users
> or how it crashes. A wait-and-see approach is just blind faith.
I prefer to think of it as data gathering. Accepting the words of one person
about what the future will look like is more in line with the faith approach.
I'm not hearing a chorus of voices against this feature, and I have the word
of other seasoned engineers in support of it.
> One question that neither you, nor Eli, nor Paul have answered is why we
> would try to recover from stack overflow and not NULL dereferences. Exactly
> the same arguments apply to both situations.
Why must it be all or nothing? Some is better than nothing. The error handler
can evolve after we know just how useful it is (or whether it is).
Eli, Paul: What do you think about just auto-saving as much as possible,
writing an error trace to a file, and prompting the user to press a key, after
which we abort the running Emacs? This is in line with what many of my OS X
applications do when they encounter a fatal error; they're kind enough to tell
me that it happened, and give me an "OK" button to click before they abort,
but they don't allow me to continue to operate the application in an unknown
state.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:45 ` John Wiegley
@ 2016-01-03 22:20 ` Daniel Colascione
2016-01-03 22:43 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) John Wiegley
2016-01-04 15:31 ` Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
2016-01-04 15:29 ` Eli Zaretskii
1 sibling, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 22:20 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 5687 bytes --]
On 01/03/2016 01:45 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> Ideally, Emacs would, on crash (and after auto-save), spawn a copy of itself
>> with an error report pre-filled. Fork and exec work perfectly fine in signal
>> handlers.
>
> One problem here is that some of us have extensive configurations that load a
> great deal of saved state between executions. Spawning a new Emacs just to
> send an error report is not something I'd want to see happen.
Are you worried about startup time or correctness? Either way, wouldn't
spawning a new emacs with -Q solve the problem?
>> But in any case, if we put Emacs into a state where the only thing a user
>> can do is save files, why not just save the files? There's no guarantee that
>> after a crash that we can even display something.
>
> So, on a detected crash, auto-save all files, and save a text file with the
> crash data before exiting? That sounds pretty safe and reasonable to me.
I'm imagining more a minidump than a text file, but yes, that's the basic idea.
> Maybe we could even popup a window to alert the user, and prompt them to press
> a key, but the only action will be to exit (unless the user is a power user
> and uses recursive edit to attempt to interact with their now-broken Emacs).
That's a reasonable UI, but popping up a window or otherwise displaying
UI in-process might not work. Instead, we can fork and exec a new Emacs
to interact with the user, then read from a pipe inherited by that process
a byte telling the crashing Emacs what it should do. All that's perfectly
legal to do from a context restricted to async-signal-safe calls.
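The fork/exec/pipe exchange can be sketched as follows, assuming a POSIX system. The function name ask_helper is hypothetical, and so is the convention that the helper writes its one-byte answer to standard output; the real command line for the helper Emacs is left to the caller. Everything on the parent side (pipe, fork, read, waitpid, close) and the child's dup2, execv and _exit are on the POSIX.1-2008 list of async-signal-safe functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program ARGV[0] with its standard output connected to a
   pipe, and return the first byte it writes.  Return -1 if the
   helper cannot be started or exits without writing anything, so
   that the caller can fall back to a default action.  */
static int
ask_helper (char *const argv[])
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t child = fork ();
  if (child < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (child == 0)
    {
      close (fds[0]);
      if (dup2 (fds[1], STDOUT_FILENO) < 0)
	_exit (127);
      if (fds[1] != STDOUT_FILENO)
	close (fds[1]);
      execv (argv[0], argv);
      _exit (127);		/* exec failed; the parent sees EOF.  */
    }

  close (fds[1]);
  unsigned char answer;
  ssize_t n;
  do
    n = read (fds[0], &answer, 1);
  while (n < 0 && errno == EINTR);
  close (fds[0]);

  /* Reap the helper so it does not linger as a zombie.  */
  int status;
  while (waitpid (child, &status, 0) < 0 && errno == EINTR)
    continue;

  return n == 1 ? answer : -1;
}
```

A return value of -1 covers the "can't launch the process" case, where the crashing Emacs would take its default action. The sketch clobbers errno, which matters only if control ever returns to the interrupted code.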
The new Emacs has to know *how* to display a message. I think it should
be possible to look at the current frame's window system information.
For NS and Win32, we just need to know whether it's GUI or a tty. For
X11, we'd just need to extract display. On every frame switch, we can
record this information in a simple variable we can read in any
async-signal-safe way.
Of course the child Emacs has to display something to the user somehow,
but we can record the current window-system parameters on every frame
switch into async-signal-safe state (say, a global char buffer), so that
we can launch the child Emacs with the right display parameters.
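One way to keep that display information readable from a signal handler is a double-buffered global, sketched below. The names record_display and crash_display_name are hypothetical, and the sketch assumes C11 and that the handler runs on the same thread that records frame switches; a multi-threaded reader would need real atomics.

```c
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

enum { DISPLAY_MAX = 256 };

/* Two slots: the writer fills the inactive one, then publishes it
   by flipping the index, so a handler that interrupts the update
   still sees the previous, complete string.  */
static char crash_display[2][DISPLAY_MAX];
static volatile sig_atomic_t crash_display_index;

/* Call from normal (non-handler) code on every frame switch.
   Names longer than DISPLAY_MAX - 1 bytes are truncated.  */
void
record_display (const char *display)
{
  int next = !crash_display_index;
  size_t len = strlen (display);
  if (len >= DISPLAY_MAX)
    len = DISPLAY_MAX - 1;
  memcpy (crash_display[next], display, len);
  crash_display[next][len] = '\0';
  /* Keep the compiler from moving the copy past the publish.  */
  atomic_signal_fence (memory_order_seq_cst);
  crash_display_index = next;
}

/* Safe to call from a signal handler on the recording thread.  */
const char *
crash_display_name (void)
{
  int current = crash_display_index;
  atomic_signal_fence (memory_order_seq_cst);
  return crash_display[current];
}
```

The handler would pass the returned string to the helper Emacs it launches; how that helper receives it is outside this sketch.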
If the user indicates via the new process that she wants to continue
using the broken Emacs, great. We should support doing just that. It'd
be nice also to give that child Emacs support for attaching GDB to its
parent, actually. Of course it's possible to attach GDB manually, but
why not make it convenient?
>> We have no information on how often Emacs crashes in the hands of real users
>> or how it crashes. A wait-and-see approach is just blind faith.
>
> I prefer to think of it as data gathering. Accepting the words of one person
> about what the future will look like is more in line with the faith approach.
> I'm not hearing a chorus of voices against this feature, and I have the word
> of other seasoned engineers in support of it.
>
>> One question that neither you, nor Eli, nor Paul have answered is why we
>> would try to recover from stack overflow and not NULL dereferences. Exactly
>> the same arguments apply to both situations.
>
> Why must it be all or nothing? Some is better than nothing. The error handler
> can evolve after we know just how useful it is (or whether it is).
If we had real data, I'd be more comfortable with the feature. As it is,
we have to rely on user reports, and I suspect that most users won't
bother reporting occasional hangs and crashes if it's any harder than
pushing a button. Given the absence of quantitative information, I'd
rather avoid undefined behavior.
> Eli, Paul: What do you think about just auto-saving as much as possible,
> writing an error trace to a file, and prompting the user to press a key, after
> which we abort the running Emacs? This is in line with what many of my OS X
> applications do when they encounter a fatal error; they're kind enough to tell
> me that it happened, and give me an "OK" button to click before they abort,
> but they don't allow me to continue to operate the application in an unknown
> state.
That works. In particular, on startup, we can create a new, empty file
under ~/.emacs.d and keep a file descriptor to it open. Normally, we'll
never write to the file. If we see a crash of *any* sort, however ---
stack overflow or some other bug --- we'll prompt the user. If the user
elects to continue using Emacs or attach a debugger, fine.
If not, we'll save to the file we've already opened information about
the crash, followed by the contents of dirty buffers.
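A rough sketch of the preallocated crash file, assuming POSIX. The names crash_file_open and crash_file_dump, the file layout (a reason line followed by raw text), and the path handling are all hypothetical; a real format would need per-buffer framing. The dump path uses only write and fsync plus strlen, which POSIX.1-2008 TC2 added to the async-signal-safe list.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Descriptor opened at startup and left untouched until a crash.  */
static int crash_fd = -1;

/* Call once at startup, from normal code.  Fails if PATH exists.  */
int
crash_file_open (const char *path)
{
  crash_fd = open (path, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
  return crash_fd < 0 ? -1 : 0;
}

/* Write all LEN bytes of BUF, retrying on EINTR and short writes.  */
static int
write_all (int fd, const char *buf, size_t len)
{
  while (len > 0)
    {
      ssize_t n = write (fd, buf, len);
      if (n < 0)
	{
	  if (errno == EINTR)
	    continue;
	  return -1;
	}
      buf += n;
      len -= (size_t) n;
    }
  return 0;
}

/* Callable from the crash handler: no allocation, no stdio, no Lisp.  */
int
crash_file_dump (const char *reason, const char *text, size_t text_len)
{
  if (crash_fd < 0)
    return -1;
  if (write_all (crash_fd, reason, strlen (reason)) != 0
      || write_all (crash_fd, "\n", 1) != 0
      || write_all (crash_fd, text, text_len) != 0)
    return -1;
  return fsync (crash_fd);
}
```

Because the descriptor already exists when the crash happens, the handler needs no path lookup and no memory allocation; the remaining failure modes are I/O errors such as a full disk.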
On next startup, for each crash file we find that isn't owned by a
running Emacs, we'll
1) read and parse the crash file,
2) prompt the user to send a bug report, and
3) restore the contents of persisted buffers.
To avoid crash loops arising from certain arrangements of buffer
contents, we can restore each buffer in fundamental-mode, and with a
name indicating that it's recovered data.
The advantage of using this scheme instead of the generic auto-save is
that this one is async-signal-safe (and never runs Lisp), can't fail
(except due to disk space exhaustion and the Emacs process disappearing
--- because we've preallocated all other resources), works for
non-file-backed buffers that wouldn't ordinarily be autosaved, and makes
state restoration explicit.
It also works perfectly well for crashes in module code.
Of course, the downside is that the code to do this doesn't exist yet.
* Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2016-01-03 22:20 ` Daniel Colascione
@ 2016-01-03 22:43 ` John Wiegley
2016-01-03 22:55 ` Crash recovery strategies Daniel Colascione
` (2 more replies)
2016-01-04 15:31 ` Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
1 sibling, 3 replies; 177+ messages in thread
From: John Wiegley @ 2016-01-03 22:43 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> Are you worried about startup time or correctness? Either way, wouldn't
> spawning a new emacs with -Q solve the problem?
Except that it wouldn't be configured to use the right mail sender. :(
> That's a reasonable UI, but popping up a window or otherwise displaying UI
> in-process might not work. Instead, we can fork and exec a new Emacs to
> interact with the user, and read from a pipe that process inherits a byte
> telling the crashing Emacs what it should do. All that's perfectly legal to
> do from an async-signal-unsafe context.
I'm OK if sometimes it just doesn't work. The new async Emacs idea sounds like
it has a host of unforeseen complications waiting behind it.
> If we had real data, I'd be more comfortable with the feature. As it is, we
> have to rely on user reports, and I suspect that most users won't bother
> reporting occasional hangs and crashes if it's any harder than pushing a
> button. Given the absence of quantitative information, I'd rather avoid
> undefined behavior.
OK, your objection is duly noted.
> That works. In particular, on startup, we can create a new, empty file under
> ~/.emacs.d and keep a file descriptor to it open. Normally, we'll never
> write to the file. If we see a crash of *any* sort, however --- stack
> overflow or some other bug --- we'll prompt the user. If the user elects to
> continue using Emacs or attach a debugger, fine.
Ah, showing the report on the *next* Emacs invocation is also something that
OS X applications do (as an example of prior art). I like that idea.
> On next startup, for each crash file we find that isn't owned by a running
> Emacs, we'll
> 1) read and parse the crash file,
> 2) prompt the user to send a bug report, and
> 3) restore the contents of persisted buffers.
> To avoid crash loops arising from certain arrangements of buffer contents,
> we can restore each buffer in fundamental-mode, and with a name indicating
> that it's recovered data.
I like this idea. It's like Windows booting into safe mode, except it's safe
buffers.
> Of course, the downside is that the code to do this doesn't exist yet.
I'd like to know what others think (maybe they are ignoring this thread now,
so I've changed the subject). But I feel like there is some convergence now.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Crash recovery strategies
2016-01-03 22:43 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) John Wiegley
@ 2016-01-03 22:55 ` Daniel Colascione
2016-01-03 22:59 ` John Wiegley
` (2 more replies)
2016-01-03 23:21 ` Paul Eggert
2016-01-04 15:32 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) Eli Zaretskii
2 siblings, 3 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 22:55 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3083 bytes --]
On 01/03/2016 02:43 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> Are you worried about startup time or correctness? Either way, wouldn't
>> spawning a new emacs with -Q solve the problem?
>
> Except that it wouldn't be configured to use the right mail sender. :(
But the emacs -Q process won't be doing any sending. The next
regular Emacs process will do that. The process we spawn from the
sigsegv handler is just for asking the user what to do about the crash.
If we can't launch the process, we can do the default thing, the right
choice for which is probably to write the crash file.
>> That's a reasonable UI, but popping up a window or otherwise displaying UI
>> in-process might not work. Instead, we can fork and exec a new Emacs to
>> interact with the user, and read from a pipe that process inherits a byte
>> telling the crashing Emacs what it should do. All that's perfectly legal to
>> do from an async-signal-unsafe context.
>
> I'm OK if sometimes it just doesn't work. The new async Emacs idea sounds like
> it has a host of unforeseen complications waiting behind it.
The problem is that we can't *tell* whether it works. If we try to
display UI in-process, that code can just silently fail to execute.
>> If we had real data, I'd be more comfortable with the feature. As it is, we
>> have to rely on user reports, and I suspect that most users won't bother
>> reporting occasional hangs and crashes if it's any harder than pushing a
>> button. Given the absence of quantitative information, I'd rather avoid
>> undefined behavior.
>
> OK, your objection is duly noted.
>
>> That works. In particular, on startup, we can create a new, empty file under
>> ~/.emacs.d and keep a file descriptor to it open. Normally, we'll never
>> write to the file. If we see a crash of *any* sort, however --- stack
>> overflow or some other bug --- we'll prompt the user. If the user elects to
>> continue using Emacs or attach a debugger, fine.
>
> Ah, showing the report on the *next* Emacs invocation is also something that
> OS X applications do (as an example of prior art). I like that idea.
>
>> On next startup, for each crash file we find that isn't owned by a running
>> Emacs, we'll
>
>> 1) read and parse the crash file,
>> 2) prompt the user to send a bug report, and
>> 3) restore the contents of persisted buffers.
>
>> To avoid crash loops arising from certain arrangements of buffer contents,
>> we can restore each buffer in fundamental-mode, and with a name indicating
>> that it's recovered data.
>
> I like this idea. It's like Windows booting into safe mode, except it's safe
> buffers.
>
>> Of course, the downside is that the code to do this doesn't exist yet.
>
> I'd like to know what others think (maybe they are ignoring this thread now,
> so I've changed the subject). But I feel like there is some convergence now.
If implementing a scheme like this is what it takes to kill the stack
overflow code, I think I can implement it.
* Re: Crash recovery strategies
2016-01-03 22:55 ` Crash recovery strategies Daniel Colascione
@ 2016-01-03 22:59 ` John Wiegley
2016-01-03 23:04 ` Daniel Colascione
2016-01-03 23:47 ` John Wiegley
2016-01-04 15:33 ` Eli Zaretskii
2 siblings, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 22:59 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> But the emacs -Q process won't be doing any sending. The next regular
> Emacs process will do that. The process we spawn from the sigsegv handler is
> just for asking the user what to do about the crash. If we can't launch the
> process, we can do the default thing, the right choice for which is probably
> to write the crash file.
Hmm... something about this approach just doesn't feel right, but I'm not sure
what it is that I don't like. I'll have to sleep on it.
>> I'm OK if sometimes it just doesn't work. The new async Emacs idea sounds like
>> it has a host of unforeseen complications waiting behind it.
> The problem is that we can't *tell* whether it doesn't work. If we try to do
> that, we can just silently not execute.
This isn't going to be a 100% solution to any problem, so I'm OK if this is a
scatter-gun approach.
>>> On next startup, for each crash file we find that isn't owned by a running
>>> Emacs, we'll
>>
>>> 1) read and parse the crash file,
>>> 2) prompt the user to send a bug report, and
>>> 3) restore the contents of persisted buffers.
>>
>>> To avoid crash loops arising from certain arrangements of buffer contents,
>>> we can restore each buffer in fundamental-mode, and with a name indicating
>>> that it's recovered data.
> If implementing a scheme like this is what it takes to kill the stack
> overflow code, I think I can implement it.
Wouldn't the stack overflow code still exist, to catch the error? Maybe I
haven't understood something... Can you explain how this approach removes that
code?
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Crash recovery strategies
2016-01-03 22:59 ` John Wiegley
@ 2016-01-03 23:04 ` Daniel Colascione
2016-01-03 23:20 ` John Wiegley
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 23:04 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2410 bytes --]
On 01/03/2016 02:59 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> But the emacs -Q process won't be doing any sending. The next regular
>> Emacs process will do that. The process we spawn from the sigsegv handler is
>> just for asking the user what to do about the crash. If we can't launch the
>> process, we can do the default thing, the right choice for which is probably
>> to write the crash file.
>
> Hmm... something about this approach just doesn't feel right, but I'm not sure
> what it is that I don't like. I'll have to sleep on it.
Fair enough. I don't think in core we've had any features that rely on
Emacs re-executing itself. I've wanted to make the byte compiler do the
same for years though, so it might be worth seeing how hard it would be
to make this work.
>>> I'm OK if sometimes it just doesn't work. The new async Emacs idea sounds like
>>> it has a host of unforeseen complications waiting behind it.
>
>> The problem is that we can't *tell* whether it doesn't work. If we try to do
>> that, we can just silently not execute.
>
> This isn't going to be a 100% solution to any problem, so I'm OK if this is a
> scatter-gun approach.
The problem is that displaying a GUI dialog box requires re-entering the
main event loop, which I think risks too much undefined behavior. A
separate process sidesteps the problem.
>>>> On next startup, for each crash file we find that isn't owned by a running
>>>> Emacs, we'll
>>>
>>>> 1) read and parse the crash file,
>>>> 2) prompt the user to send a bug report, and
>>>> 3) restore the contents of persisted buffers.
>>>
>>>> To avoid crash loops arising from certain arrangements of buffer contents,
>>>> we can restore each buffer in fundamental-mode, and with a name indicating
>>>> that it's recovered data.
>
>> If implementing a scheme like this is what it takes to kill the stack
>> overflow code, I think I can implement it.
>
> Wouldn't the stack overflow code still exist, to catch the error? Maybe I
> haven't understood something... Can you explain how this approach removes that
> code?
The code that bothers me is the code that longjmps *out* of the sigsegv
handler that catches stack overflow. The handler being there is fine.
Under my proposed scheme, we won't longjmp out of the handler unless the
user tells us to do that.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:12 ` Daniel Colascione
@ 2016-01-03 23:11 ` Paul Eggert
2016-01-03 23:22 ` Daniel Colascione
2016-01-04 15:24 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2016-01-03 23:11 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]
Daniel Colascione wrote:
> It's *critical* not to violate the invariants of code.
Sure, but we are discussing what those invariants should be; they are not carved
in stone. One possible invariant is (A) "stack overflow never happens". Another
is (B) "if stack overflow happens, callers must tolerate being longjmped
through". Either invariant is reasonable per se. It is a judgment call as to
which invariant is better for Emacs. Possibly some modules will prefer (A) and
others (B).
Take the regular expression code as an example. Suppose it has unusual
worst-case behavior that can grow the stack in arbitrary ways (which I think it
does though I'm not going to investigate the details right now). One way to
address this is to rewrite the code so that it doesn't have the behavior, but
that would be a pain; the code has been that way for decades and is crufty at
this point and a lot of Emacs depends on its quirks. Another way to address it
is to use a guard page or whatever on the halfway-decent platforms that support
that sort of thing. We've chosen the latter, i.e., we've chosen invariant (B),
and yes there are problems with this approach but it beats doing nothing and it
beats doing (A) because nobody has had the time to do (A), assuming it's doable
at all.
> It should be possible to replace the printfs
> in this instance with calls to write(1, "message") (which will bypass
> any output buffering) and restore async-signal-safety.
Good point. I did that with the attached patch to emacs-25. However, this
doesn't address the Fdo_auto_save () issue in the same neighborhood.
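For readers who want the general shape of the change, here is a standalone sketch of a SIGINT handler that reports through write rather than stdio. It is not the code from the attached patch: write_str, on_sigint, install_sigint_handler, notice_fd and the message text are all made up for illustration. The real write takes a descriptor, a buffer and a length, so the helper computes the length and loops over short writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Where the handler reports; standard output by default.  */
static int notice_fd = STDOUT_FILENO;

/* Write the whole string S to FD, bypassing stdio buffering.
   Retries on EINTR and short writes; gives up on other errors.  */
static void
write_str (int fd, const char *s)
{
  size_t len = strlen (s);
  while (len > 0)
    {
      ssize_t n = write (fd, s, len);
      if (n < 0)
	{
	  if (errno == EINTR)
	    continue;
	  return;
	}
      s += n;
      len -= (size_t) n;
    }
}

/* Handler that touches no stdio state and preserves errno for the
   code it interrupted.  */
static void
on_sigint (int sig)
{
  int saved_errno = errno;
  (void) sig;
  write_str (notice_fd, "interrupted\n");
  errno = saved_errno;
}

/* Install the handler; returns 0 on success, -1 on failure.  */
static int
install_sigint_handler (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_sigint;
  sigemptyset (&sa.sa_mask);
  return sigaction (SIGINT, &sa, NULL);
}
```

The point of avoiding printf is that a signal can arrive while stdio holds its internal lock or has a half-updated buffer; write goes straight to the descriptor and has no such shared state.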
> If a user elects to attempt auto-save in this situation, that's on him.
Sure, and Emacs already asks the user whether to auto-save in that situation, so
this should be OK already.
> Ideally, we'd make autosave async-signal-safe, which will help in this
> handler and in the segfault handler.
Yes, that'd be good, if we didn't lose functionality thereby.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Avoid-stdio-in-SIGINT-handler.patch --]
[-- Type: text/x-diff; name="0001-Avoid-stdio-in-SIGINT-handler.patch", Size: 11744 bytes --]
From d8a33374cfa7deaf1e2fd4762c59bc6607850f65 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 3 Jan 2016 15:00:49 -0800
Subject: [PATCH] Avoid stdio in SIGINT handler
* admin/merge-gnulib (GNULIB_MODULES): Add ignore-value.
* lib/gnulib.mk, m4/gnulib-comp.m4: Regenerate.
* lib/ignore-value.h: New file, from gnulib.
* src/keyboard.c: Include it.
(write_stdout, read_stdin): New functions.
(handle_interrupt): Use them instead of printf and getchar,
and avoid fflush when handling signals.
---
admin/merge-gnulib | 2 +-
lib/gnulib.mk | 9 ++++++-
lib/ignore-value.h | 50 +++++++++++++++++++++++++++++++++++++
m4/gnulib-comp.m4 | 2 ++
src/keyboard.c | 73 +++++++++++++++++++++++++++++++++++-------------------
5 files changed, 108 insertions(+), 28 deletions(-)
create mode 100644 lib/ignore-value.h
diff --git a/admin/merge-gnulib b/admin/merge-gnulib
index 363bb23..40b5b78 100755
--- a/admin/merge-gnulib
+++ b/admin/merge-gnulib
@@ -32,7 +32,7 @@ GNULIB_MODULES='
dtoastr dtotimespec dup2 environ execinfo faccessat
fcntl fcntl-h fdatasync fdopendir filemode fstatat fsync
getloadavg getopt-gnu gettime gettimeofday gitlog-to-changelog
- intprops largefile lstat
+ ignore-value intprops largefile lstat
manywarnings memrchr mkostemp mktime
pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat
sig2str socklen stat-time stdalign stddef stdio
diff --git a/lib/gnulib.mk b/lib/gnulib.mk
index 97ed5b1..b920cbb 100644
--- a/lib/gnulib.mk
+++ b/lib/gnulib.mk
@@ -21,7 +21,7 @@
# the same distribution terms as the rest of that program.
#
# Generated by gnulib-tool.
-# Reproduce by: gnulib-tool --import --lib=libgnu --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --avoid=close --avoid=dup --avoid=fchdir --avoid=flexmember --avoid=fstat --avoid=malloc-posix --avoid=msvc-inval --avoid=msvc-nothrow --avoid=open --avoid=openat-die --avoid=opendir --avoid=raise --avoid=save-cwd --avoid=select --avoid=setenv --avoid=sigprocmask --avoid=stdarg --avoid=stdbool --avoid=threadlib --avoid=unsetenv --makefile-name=gnulib.mk --conditional-dependencies --no-libtool --macro-prefix=gl --no-vc-files alloca-opt binary-io byteswap c-ctype c-strcase careadlinkat close-stream count-one-bits count-trailing-zeros crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512 dtoastr dtotimespec dup2 environ execinfo faccessat fcntl fcntl-h fdatasync fdopendir filemode fstatat fsync getloadavg getopt-gnu gettime gettimeofday gitlog-to-changelog intprops largefile lstat manywarnings memrchr mkostemp mktime pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat sig2str socklen stat-time stdalign stddef stdio stpcpy strftime strtoimax strtoumax symlink sys_stat sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub unsetenv update-copyright utimens vla warnings
+# Reproduce by: gnulib-tool --import --lib=libgnu --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --avoid=close --avoid=dup --avoid=fchdir --avoid=flexmember --avoid=fstat --avoid=malloc-posix --avoid=msvc-inval --avoid=msvc-nothrow --avoid=open --avoid=openat-die --avoid=opendir --avoid=raise --avoid=save-cwd --avoid=select --avoid=setenv --avoid=sigprocmask --avoid=stdarg --avoid=stdbool --avoid=threadlib --avoid=unsetenv --makefile-name=gnulib.mk --conditional-dependencies --no-libtool --macro-prefix=gl --no-vc-files alloca-opt binary-io byteswap c-ctype c-strcase careadlinkat close-stream count-one-bits count-trailing-zeros crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512 dtoastr dtotimespec dup2 environ execinfo faccessat fcntl fcntl-h fdatasync fdopendir filemode fstatat fsync getloadavg getopt-gnu gettime gettimeofday gitlog-to-changelog ignore-value intprops largefile lstat manywarnings memrchr mkostemp mktime pipe2 pselect pthread_sigmask putenv qcopy-acl readlink readlinkat sig2str socklen stat-time stdalign stddef stdio stpcpy strftime strtoimax strtoumax symlink sys_stat sys_time time time_r time_rz timegm timer-time timespec-add timespec-sub unsetenv update-copyright utimens vla warnings
MOSTLYCLEANFILES += core *.stackdump
@@ -567,6 +567,13 @@ EXTRA_libgnu_a_SOURCES += group-member.c
## end gnulib module group-member
+## begin gnulib module ignore-value
+
+
+EXTRA_DIST += ignore-value.h
+
+## end gnulib module ignore-value
+
## begin gnulib module intprops
diff --git a/lib/ignore-value.h b/lib/ignore-value.h
new file mode 100644
index 0000000..6713d96
--- /dev/null
+++ b/lib/ignore-value.h
@@ -0,0 +1,50 @@
+/* ignore a function return without a compiler warning. -*- coding: utf-8 -*-
+
+ Copyright (C) 2008-2016 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+/* Written by Jim Meyering, Eric Blake and Pádraig Brady. */
+
+/* Use "ignore_value" to avoid a warning when using a function declared with
+ gcc's warn_unused_result attribute, but for which you really do want to
+ ignore the result. Traditionally, people have used a "(void)" cast to
+ indicate that a function's return value is deliberately unused. However,
+ if the function is declared with __attribute__((warn_unused_result)),
+ gcc issues a warning even with the cast.
+
+ Caution: most of the time, you really should heed gcc's warning, and
+ check the return value. However, in those exceptional cases in which
+ you're sure you know what you're doing, use this function.
+
+ For the record, here's one of the ignorable warnings:
+ "copy.c:233: warning: ignoring return value of 'fchown',
+ declared with attribute warn_unused_result". */
+
+#ifndef _GL_IGNORE_VALUE_H
+#define _GL_IGNORE_VALUE_H
+
+/* Normally casting an expression to void discards its value, but GCC
+ versions 3.4 and newer have __attribute__ ((__warn_unused_result__))
+ which may cause unwanted diagnostics in that case. Use __typeof__
+ and __extension__ to work around the problem, if the workaround is
+ known to be needed. */
+#if 3 < __GNUC__ + (4 <= __GNUC_MINOR__)
+# define ignore_value(x) \
+ (__extension__ ({ __typeof__ (x) __x = (x); (void) __x; }))
+#else
+# define ignore_value(x) ((void) (x))
+#endif
+
+#endif
diff --git a/m4/gnulib-comp.m4 b/m4/gnulib-comp.m4
index 69920a8..27ca70a 100644
--- a/m4/gnulib-comp.m4
+++ b/m4/gnulib-comp.m4
@@ -91,6 +91,7 @@ AC_DEFUN([gl_EARLY],
# Code from module gettimeofday:
# Code from module gitlog-to-changelog:
# Code from module group-member:
+ # Code from module ignore-value:
# Code from module include_next:
# Code from module intprops:
# Code from module inttypes-incomplete:
@@ -905,6 +906,7 @@ AC_DEFUN([gl_FILE_LIST], [
lib/gettimeofday.c
lib/gl_openssl.h
lib/group-member.c
+ lib/ignore-value.h
lib/intprops.h
lib/inttypes.in.h
lib/lstat.c
diff --git a/src/keyboard.c b/src/keyboard.c
index fcafd0b..6bdfc1a 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -64,6 +64,8 @@ along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. */
#include <unistd.h>
#include <fcntl.h>
+#include <ignore-value.h>
+
#ifdef HAVE_WINDOW_SYSTEM
#include TERM_HEADER
#endif /* HAVE_WINDOW_SYSTEM */
@@ -10206,6 +10208,21 @@ deliver_interrupt_signal (int sig)
deliver_process_signal (sig, handle_interrupt_signal);
}
+/* Output MSG directly to standard output, without buffering. Ignore
+ failures. This is safe in a signal handler. */
+static void
+write_stdout (char const *msg)
+{
+ ignore_value (write (STDOUT_FILENO, msg, strlen (msg)));
+}
+
+/* Read a byte from stdin, without buffering. Safe in signal handlers. */
+static int
+read_stdin (void)
+{
+ char c;
+ return read (STDIN_FILENO, &c, 1) == 1 ? c : EOF;
+}
/* If Emacs is stuck because `inhibit-quit' is true, then keep track
of the number of times C-g has been requested. If C-g is pressed
@@ -10242,9 +10259,9 @@ handle_interrupt (bool in_signal_handler)
sigemptyset (&blocked);
sigaddset (&blocked, SIGINT);
pthread_sigmask (SIG_BLOCK, &blocked, 0);
+ fflush (stdout);
}
- fflush (stdout);
reset_all_sys_modes ();
#ifdef SIGTSTP
@@ -10260,8 +10277,9 @@ handle_interrupt (bool in_signal_handler)
/* Perhaps should really fork an inferior shell?
But that would not provide any way to get back
to the original shell, ever. */
- printf ("No support for stopping a process on this operating system;\n");
- printf ("you can continue or abort.\n");
+ write_stdout ("No support for stopping a process"
+ " on this operating system;\n"
+ "you can continue or abort.\n");
#endif /* not SIGTSTP */
#ifdef MSDOS
/* We must remain inside the screen area when the internal terminal
@@ -10272,46 +10290,49 @@ handle_interrupt (bool in_signal_handler)
the code used for auto-saving doesn't cope with the mark bit. */
if (!gc_in_progress)
{
- printf ("Auto-save? (y or n) ");
- fflush (stdout);
- if (((c = getchar ()) & ~040) == 'Y')
+ write_stdout ("Auto-save? (y or n) ");
+ c = read_stdin ();
+ if ((c & ~040) == 'Y')
{
Fdo_auto_save (Qt, Qnil);
#ifdef MSDOS
- printf ("\r\nAuto-save done");
-#else /* not MSDOS */
- printf ("Auto-save done\n");
-#endif /* not MSDOS */
+ write_stdout ("\r\nAuto-save done");
+#else
+ write_stdout ("Auto-save done\n");
+#endif
}
- while (c != '\n') c = getchar ();
+ while (c != '\n')
+ c = read_stdin ();
}
else
{
/* During GC, it must be safe to reenable quitting again. */
Vinhibit_quit = Qnil;
+ write_stdout
+ (
#ifdef MSDOS
- printf ("\r\n");
-#endif /* not MSDOS */
- printf ("Garbage collection in progress; cannot auto-save now\r\n");
- printf ("but will instead do a real quit after garbage collection ends\r\n");
- fflush (stdout);
+ "\r\n"
+#endif
+ "Garbage collection in progress; cannot auto-save now\r\n"
+ "but will instead do a real quit"
+ " after garbage collection ends\r\n");
}
#ifdef MSDOS
- printf ("\r\nAbort? (y or n) ");
-#else /* not MSDOS */
- printf ("Abort (and dump core)? (y or n) ");
-#endif /* not MSDOS */
- fflush (stdout);
- if (((c = getchar ()) & ~040) == 'Y')
+ write_stdout ("\r\nAbort? (y or n) ");
+#else
+ write_stdout ("Abort (and dump core)? (y or n) ");
+#endif
+ c = read_stdin ();
+ if ((c & ~040) == 'Y')
emacs_abort ();
- while (c != '\n') c = getchar ();
+ while (c != '\n')
+ c = read_stdin ();
#ifdef MSDOS
- printf ("\r\nContinuing...\r\n");
+ write_stdout ("\r\nContinuing...\r\n");
#else /* not MSDOS */
- printf ("Continuing...\n");
+ write_stdout ("Continuing...\n");
#endif /* not MSDOS */
- fflush (stdout);
init_all_sys_modes ();
}
else
--
2.5.0
^ permalink raw reply related [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:04 ` Daniel Colascione
@ 2016-01-03 23:20 ` John Wiegley
0 siblings, 0 replies; 177+ messages in thread
From: John Wiegley @ 2016-01-03 23:20 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> The problem is that displaying a GUI dialog box requires re-entering the
> main event loop, which I think risks too much undefined behavior. A separate
> process sidesteps the problem.
It does do that. But maybe this is better as a v2 feature. Let's try just
prompting before shutdown first.
> The code that bothers me is the code that longjmps *out* of the sigsegv
> handler that catches stack overflow. The handler being there is fine. Under
> my proposed scheme, we won't longjmp out of the handler unless the user
> tells us to do that.
OK, that's easy enough to understand. I'll let Eli and Paul weigh in now
before continuing.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 22:43 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) John Wiegley
2016-01-03 22:55 ` Crash recovery strategies Daniel Colascione
@ 2016-01-03 23:21 ` Paul Eggert
2016-01-03 23:24 ` Daniel Colascione
2016-01-03 23:27 ` John Wiegley
2016-01-04 15:32 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) Eli Zaretskii
2 siblings, 2 replies; 177+ messages in thread
From: Paul Eggert @ 2016-01-03 23:21 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii, Emacs-devel
John Wiegley wrote:
> I'd like to know what others think (maybe they are ignoring this thread now,
> so I've changed the subject). But I feel like there is some convergence now.
I didn't understand all the details of the proposal. That being said, I'm not
sure I like the idea of losing my Emacs session merely because some regular
expression got too long. Sure, Emacs would create a bunch of autosave files that
I can recover file data from, but if my session has been going for hours or days
it'll have quite a bit of state that isn't in any buffer associated with a file,
and I'd rather not lose that state. So although it'd be fine to have
async-signal-safe autosave as an option, as a user I'd like to also retain the
option for Emacs to keep doing the same old unsafe stuff that it's been doing
for decades, at least until after we get in-the-field experience with the new
approach.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 23:11 ` Paul Eggert
@ 2016-01-03 23:22 ` Daniel Colascione
2016-01-03 23:29 ` John Wiegley
` (2 more replies)
0 siblings, 3 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 23:22 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3142 bytes --]
On 01/03/2016 03:11 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> It's *critical* not to violate the invariants of code.
>
> Sure, but we are discussing what those invariants should be; they are
> not carved in stone. One possible invariant is (A) "stack overflow
> never happens". Another is (B) "if stack overflow happens, callers must
> tolerate being longjmped through". Either invariant is reasonable per
> se. It is a judgment call as to which invariant is better for Emacs.
> Possibly some modules will prefer (A) and others (B).
>
> Take the regular expression code as an example. Suppose it has unusual
> worst-case behavior that can grow the stack in arbitrary ways (which I
> think it does though I'm not going to investigate the details right
> now). One way to address this is to rewrite the code so that it doesn't
> have the behavior, but that would be a pain; the code has been that way
> for decades and is crufty at this point and a lot of Emacs depends on
> its quirks. Another way to address it is to use a guard page or whatever
> on the halfway-decent platforms that support that sort of thing. We've
> chosen the latter, i.e., we've chosen invariant (B), and yes there are
> problems with this approach but it beats doing nothing and it beats
> doing (A) because nobody has had the time to do (A), assuming it's
> doable at all.
The regexp code is a good example of a benign use of longjmp. That's our
code and we know it pretty well. We can set a global variable that says
"longjmp if there's a stack overflow in *this* piece of code" without
longjmping out of arbitrary pieces of code we don't own. GDB uses a
similar approach to suppress crashes from C++ demangling code.
>> It should be possible to replace the printfs
>> in this instance with calls to write(1, "message") (which will bypass
>> any output buffering) and restore async-signal-safety.
>
> Good point. I did that with the attached patch to emacs-25. However,
> this doesn't address the Fdo_auto_save () issue in the same neighborhood.
Thanks. The quick and dirty fix for Fdo_auto_save is to run
Fdo_auto_save in a forked child, where it's less likely to hurt
something, and put a limit on the time we're prepared to spend waiting
for that child. I've implemented Breakpad extensions that use a similar
approach to good effect. Of course, this approach won't work for
Windows, DOS, etc., but we're talking about quick and dirty.
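A rough sketch of the fork-with-a-time-limit idea, under stated assumptions: the helper name and the 10 ms polling interval are made up for illustration, the work is passed as a plain function pointer rather than Fdo_auto_save, and the sketch shows only the time limit. It makes no claim about being safe to call from a signal handler.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not Emacs code.  Run WORK in a forked child
   and wait at most about TIMEOUT_MS milliseconds for it.  Return the
   child's exit status, or -1 if fork failed, the child died
   abnormally, or it was still running at the limit and was killed.  */
static int
run_in_child_with_timeout (void (*work) (void), int timeout_ms)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      work ();
      _exit (0);
    }

  struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
  int status;
  for (int waited = 0; waited < timeout_ms; waited += 10)
    {
      pid_t r = waitpid (pid, &status, WNOHANG);
      if (r == pid)
        return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
      if (r < 0)
        return -1;
      nanosleep (&tick, NULL);
    }

  /* Out of time: kill the child and reap it.  */
  kill (pid, SIGKILL);
  waitpid (pid, &status, 0);
  return -1;
}

/* Stand-ins for a save that finishes and one that hangs.  */
static void
quick_work (void)
{
}

static void
stuck_work (void)
{
  for (;;)
    pause ();
}
```

Here run_in_child_with_timeout (quick_work, 1000) returns 0 as soon as the child exits, and run_in_child_with_timeout (stuck_work, 50) gives up after roughly 50 ms and returns -1.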
>> If a user elects to attempt auto-save in this situation, that's on him.
>
> Sure, and Emacs already asks the user whether to auto-save in that
> situation, so this should be OK already.
I'm not sure users on window systems actually see these prompts. IME,
that's the majority of users.
>> Ideally, we'd make autosave async-signal-safe, which will help in this
>> handler and in the segfault handler.
>
> Yes, that'd be good, if we didn't lose functionality thereby.
It's the functionality loss that prompts me to propose a simpler,
pure-C, non-Fdo_auto_save approach to saving data when we're crashed;
see the other branch of this thread.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:21 ` Paul Eggert
@ 2016-01-03 23:24 ` Daniel Colascione
2016-01-03 23:28 ` John Wiegley
2016-01-04 0:51 ` Paul Eggert
2016-01-03 23:27 ` John Wiegley
1 sibling, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 23:24 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1232 bytes --]
On 01/03/2016 03:21 PM, Paul Eggert wrote:
> John Wiegley wrote:
>> I'd like to know what others think (maybe they are ignoring this
>> thread now,
>> so I've changed the subject). But I feel like there is some
>> convergence now.
>
> I didn't understand all the details of the proposal. That being said,
> I'm not sure I like the idea of losing my Emacs session merely because
> some regular expression got too long. Sure, Emacs would create a bunch
> of autosave files that I can recover file data from, but if my session
> has been going for hours or days it'll have quite a bit of state that
> isn't in any buffer associated with a file, and I'd rather not lose that
> state. So although it'd be fine to have async-signal-safe autosave as an
> option, as a user I'd like to also retain the option for Emacs to keep
> doing the same old unsafe stuff that it's been doing for decades, at
> least until after we get in-the-field experience with the new approach.
As I've mentioned elsewhere, I'm perfectly happy making the regular
expression code a special case, since we own it and know what it does. I
agree that Emacs absolutely shouldn't crash on any Lisp input to
re-search-forward, looking-at, etc.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:21 ` Paul Eggert
2016-01-03 23:24 ` Daniel Colascione
@ 2016-01-03 23:27 ` John Wiegley
2016-01-03 23:29 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 23:27 UTC (permalink / raw)
To: Paul Eggert; +Cc: Eli Zaretskii, Daniel Colascione, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]
>>>>> Paul Eggert <eggert@cs.ucla.edu> writes:
> So although it'd be fine to have async-signal-safe autosave as an option, as
> a user I'd like to also retain the option for Emacs to keep doing the same
> old unsafe stuff that it's been doing for decades, at least until after we
> get in-the-field experience with the new approach.
Agreed. There should absolutely be an option to retain the unsafe behavior,
since at the very least it can be useful to have Emacs stick around while
exploring the behavior of a bug in situ. So we won't delete the longjmp code,
we just might change the default behavior to something more consistent with
safer practices.
Also completely agree on the in-the-field experience part. I'm not even sure
we'll change the default behavior right away; we might give it a few releases
to see how it goes, and to hear back from people who've decided to turn it on.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 629 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:24 ` Daniel Colascione
@ 2016-01-03 23:28 ` John Wiegley
2016-01-04 0:51 ` Paul Eggert
1 sibling, 0 replies; 177+ messages in thread
From: John Wiegley @ 2016-01-03 23:28 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> As I've mentioned elsewhere, I'm perfectly happy making the regular
> expression code a special case, since we own it and know what it does. I
> agree that Emacs absolutely shouldn't crash on any Lisp input to
> re-search-forward, looking-at, etc.
Sounds like a useful compromise.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 23:22 ` Daniel Colascione
@ 2016-01-03 23:29 ` John Wiegley
2016-01-04 1:05 ` Paul Eggert
2016-01-04 15:38 ` Eli Zaretskii
2 siblings, 0 replies; 177+ messages in thread
From: John Wiegley @ 2016-01-03 23:29 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> see the other branch of this thread.
Isn't a thread with multiple branches a split-end? A frayed thread at best. :)
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:27 ` John Wiegley
@ 2016-01-03 23:29 ` Daniel Colascione
2016-01-03 23:33 ` Sending automatic crash reports to the FSF (was: Crash recovery strategies) John Wiegley
2016-01-04 1:34 ` Crash recovery strategies Drew Adams
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 23:29 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1174 bytes --]
On 01/03/2016 03:27 PM, John Wiegley wrote:
>>>>>> Paul Eggert <eggert@cs.ucla.edu> writes:
>
>> So although it'd be fine to have async-signal-safe autosave as an option, as
>> a user I'd like to also retain the option for Emacs to keep doing the same
>> old unsafe stuff that it's been doing for decades, at least until after we
>> get in-the-field experience with the new approach.
>
> Agreed. There should absolutely be an option to retain the unsafe behavior,
> since at the very least it can be useful to have Emacs stick around while
> exploring the behavior of a bug in situ. So we won't delete the longjmp code,
> we just might change the default behavior to something more consistent with
> safer practices.
>
> Also completely agree on the in-the-field experience part. I'm not even sure
> we'll change the default behavior right away; we might give it a few releases
> to see how it goes, and to hear back from people who've decided to turn it on.
More generally, how do we feel about automatically sending crash reports
to the FSF? Of course, we'd send reports stripped of personally
identifiable information, as is standard practice.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Sending automatic crash reports to the FSF (was: Crash recovery strategies)
2016-01-03 23:29 ` Daniel Colascione
@ 2016-01-03 23:33 ` John Wiegley
2016-01-03 23:36 ` Sending automatic crash reports to the FSF Daniel Colascione
2016-01-04 1:34 ` Crash recovery strategies Drew Adams
1 sibling, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 23:33 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> More generally, how do we feel about automatically sending crash reports to
> the FSF? Of course, we'd send reports stripped of personally identifiable
> information, as is standard practice.
This is a really good question. The number of users we have, compared to the
number of people able to field such reports, is what makes me wary. Perhaps we
could have a separate mailing list for such reports, without turning them into
bugs automatically, just to keep us abreast of possible recurring patterns. It
might even become a useful way to gather some statistics.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Sending automatic crash reports to the FSF
2016-01-03 23:33 ` Sending automatic crash reports to the FSF (was: Crash recovery strategies) John Wiegley
@ 2016-01-03 23:36 ` Daniel Colascione
2016-01-03 23:39 ` John Wiegley
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 23:36 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]
On 01/03/2016 03:33 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> More generally, how do we feel about automatically sending crash reports to
>> the FSF? Of course, we'd send reports stripped of personally identifiable
>> information, as is standard practice.
>
> This is a really good question. The number of users we have, compared to the
> number of people able to field such reports, is what makes me wary. Perhaps we
> could have a separate mailing list for such reports, without turning them into
> bugs automatically, just to keep us abreast of possible recurring patterns. It
> might even become a useful way to gather some statistics.
Do you have any idea who would run the servers that would receive these
reports? At both Facebook and Microsoft, the resources necessary to
store user crash reports are quite significant, and I wouldn't impose
that burden on the FSF. We'd definitely need some kind of aggregation
and sampling approach.
I'm not aware of any free software that handles the server-side aspect
of automated crash reporting. Does anyone here know of any? I wonder what Mozilla is
using.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Sending automatic crash reports to the FSF
2016-01-03 23:36 ` Sending automatic crash reports to the FSF Daniel Colascione
@ 2016-01-03 23:39 ` John Wiegley
2016-01-03 23:48 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 23:39 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> Do you have any idea who would run the servers that would receive these
> reports? At both Facebook and Microsoft, the resources necessary to store
> user crash reports are quite significant, and I wouldn't impose that burden
> on the FSF. We'd definitely need some kind of aggregation and sampling
> approach.
I rent a VPS that doesn't do a whole lot (mostly just forwards mail and runs
my personal weblogs). I administer dovecot and postfix on that box, and would
be happy to receive and store crash reports there.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 22:55 ` Crash recovery strategies Daniel Colascione
2016-01-03 22:59 ` John Wiegley
@ 2016-01-03 23:47 ` John Wiegley
2016-01-03 23:51 ` Daniel Colascione
2016-01-04 15:33 ` Eli Zaretskii
2 siblings, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-03 23:47 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> If implementing a scheme like this is what it takes to kill the stack
> overflow code, I think I can implement it.
Even if we don't kill the stack overflow code, would you be interested in
trying out your async spawn idea? That might have other useful applications
too. In fact, I've wanted to move async.el's ideas into the C level for a
while (to avoid text-based marshalling between parent and child Emacsen), and
this could dovetail beautifully with that desire...
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Sending automatic crash reports to the FSF
2016-01-03 23:39 ` John Wiegley
@ 2016-01-03 23:48 ` Daniel Colascione
0 siblings, 0 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 23:48 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1339 bytes --]
On 01/03/2016 03:39 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> Do you have any idea who would run the servers that would receive these
>> reports? At both Facebook and Microsoft, the resources necessary to store
>> user crash reports are quite significant, and I wouldn't impose that burden
>> on the FSF. We'd definitely need some kind of aggregation and sampling
>> approach.
>
> I rent a VPS that doesn't do a whole lot (mostly just forwards mail and runs
> my personal weblogs). I administer dovecot and postfix on that box, and would
> be happy to receive and store crash reports there.
I run a similar machine. I suspect that both servers would promptly
melt, implode, and then, through sheer energy density, collapse into
black holes. Even in this decadent age, there are a *lot* of Emacs users.
I think we can start with the Emacs-next master branch and get a feeling
for what kind of workload this project would involve.
By the way: Ubuntu (my distribution of choice these days) has its own
crash reporting system, this apport/whoopsie thing. I wonder whether we
can sample their existing Emacs crash reports somehow.
Microsoft also has a way for software vendors to get crash reports from
Windows Error Reporting.
https://wiki.ubuntu.com/ErrorTracker
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:47 ` John Wiegley
@ 2016-01-03 23:51 ` Daniel Colascione
2016-01-04 0:12 ` John Wiegley
2016-01-04 15:40 ` Eli Zaretskii
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-03 23:51 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1303 bytes --]
On 01/03/2016 03:47 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <dancol@dancol.org> writes:
>
>> If implementing a scheme like this is what it takes to kill the stack
>> overflow code, I think I can implement it.
>
> Even if we don't kill the stack overflow code, would you be interested in
> trying out your async spawn idea? That might have other useful applications
> too. In fact, I've wanted to move async.el's ideas into the C level for a
> while (to avoid text-based marshalling between parent and child Emacsen), and
> this could dovetail beautifully with that desire...
How would moving to C help? Keep in mind that we're talking about fork
*and exec*, so the parent and child don't share memory. They can,
however, communicate over a pipe. Lisp already has read and print, so
given the choice, I'd rather implement cross-process functionality in
Lisp. Sure, that's text-based marshaling, but it's general, isn't it?
The only difficulty I have in mind is making sure that the child Emacs
can display UI to the user, which involves matching window-system
parameters like DISPLAY.
That said, if we do end up implementing async-spawn, and we need to be
able to do it from C *anyway*, we can certainly expose that C
functionality to regular Lisp callers.
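For illustration, a small sketch of the fork-and-exec-plus-pipe arrangement described above. Everything in it is an assumption made for the example: the helper name is hypothetical, the child is a /bin/sh command standing in for a child Emacs, and the text it prints stands in for a printed Lisp form that the parent would then read.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not Emacs code.  Run COMMAND through /bin/sh
   in a freshly exec'ed child and copy what it prints on stdout into
   BUF (at most BUFSIZE - 1 bytes, NUL-terminated).  Return the number
   of bytes read, or -1 on failure.  */
static ssize_t
read_from_child (char const *command, char *buf, size_t bufsize)
{
  int fds[2];
  if (bufsize == 0 || pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid == 0)
    {
      /* Child: stdout becomes the write end of the pipe.  After exec
         it shares no memory with the parent, only this descriptor.  */
      close (fds[0]);
      dup2 (fds[1], STDOUT_FILENO);
      close (fds[1]);
      execl ("/bin/sh", "sh", "-c", command, (char *) NULL);
      _exit (127);
    }

  /* Parent: read until EOF or until the buffer is full.  */
  close (fds[1]);
  size_t used = 0;
  while (used < bufsize - 1)
    {
      ssize_t n = read (fds[0], buf + used, bufsize - 1 - used);
      if (n <= 0)
        break;
      used += n;
    }
  buf[used] = '\0';
  close (fds[0]);

  int status;
  if (waitpid (pid, &status, 0) != pid
      || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
    return -1;
  return used;
}
```

Calling read_from_child ("printf '(result 42)'", buf, sizeof buf) leaves the text (result 42) in buf; the parent would hand that text to a reader, which is the text-based marshaling discussed above.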
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:51 ` Daniel Colascione
@ 2016-01-04 0:12 ` John Wiegley
2016-01-04 15:40 ` Eli Zaretskii
1 sibling, 0 replies; 177+ messages in thread
From: John Wiegley @ 2016-01-04 0:12 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, Paul Eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> How would moving to C help? Keep in mind that we're talking about fork *and
> exec*, so the parent and child don't share memory. They can, however,
> communicate over a pipe. Lisp already has read and print, so given the
> choice, I'd rather implement cross-process functionality in Lisp. Sure,
> that's text-based marshaling, but it's general, isn't it?
Ah, in that case, async.el is just what you want, since it does exactly what
you describe above.
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:24 ` Daniel Colascione
2016-01-03 23:28 ` John Wiegley
@ 2016-01-04 0:51 ` Paul Eggert
1 sibling, 0 replies; 177+ messages in thread
From: Paul Eggert @ 2016-01-04 0:51 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii, Emacs-devel
Daniel Colascione wrote:
> I'm perfectly happy making the regular
> expression code a special case, since we own it and know what it does.
Some modules will be like the regexp code, which means it won't be that special
a case. Suppose I add PCRE as a module, for example? PCRE has problems with
stack overflow.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 23:22 ` Daniel Colascione
2016-01-03 23:29 ` John Wiegley
@ 2016-01-04 1:05 ` Paul Eggert
2016-01-04 1:07 ` Daniel Colascione
2016-01-04 15:38 ` Eli Zaretskii
2 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2016-01-04 1:05 UTC (permalink / raw)
To: Daniel Colascione, Eli Zaretskii, Emacs-devel
Daniel Colascione wrote:
> The quick and dirty fix for Fdo_auto_save is to run
> Fdo_auto_save in a forked child, where it's less likely to hurt
> something, and put a limit on the time we're prepared to spend waiting
> for that child.
fork isn't async-signal-safe in glibc, so it's not clear this is safer there.
Also, even assuming POSIX compliance, suppose a library establishes an
async-signal-unsafe handler via pthread_atfork? Either way, although the
quick-and-dirty fix could be better than what we have now, it also could be worse.
See:
https://sourceware.org/bugzilla/show_bug.cgi?id=4737
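
To make the pthread_atfork concern concrete, here is a minimal sketch
(not Emacs code; lib_prepare, prepare_calls and
fork_runs_atfork_handlers are hypothetical names).  It only shows that
a handler registered by some library runs inside fork itself, so
whatever that handler does would also run inside a signal handler that
calls fork.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for library state touched by the handler.  */
static volatile int prepare_calls = 0;

/* Runs in the parent, inside fork, just before the child is created.
   A real library might take a lock or allocate memory here.  */
static void
lib_prepare (void)
{
  prepare_calls++;
}

/* Register the handler, fork once, reap the child, and return how
   many times the prepare handler ran during that fork (-1 on error).  */
int
fork_runs_atfork_handlers (void)
{
  if (pthread_atfork (lib_prepare, NULL, NULL) != 0)
    return -1;

  int before = prepare_calls;
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (0);                  /* Child: nothing to do.  */

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return prepare_calls - before;
}
```

If lib_prepare took a mutex that the interrupted code already held,
the fork call would never return, which is the failure mode the
message above worries about.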
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 1:05 ` Paul Eggert
@ 2016-01-04 1:07 ` Daniel Colascione
0 siblings, 0 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 1:07 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 518 bytes --]
On 01/03/2016 05:05 PM, Paul Eggert wrote:
> Daniel Colascione wrote:
>> The quick and dirty fix for Fdo_auto_save is to run
>> Fdo_auto_save in a forked child, where it's less likely to hurt
>> something, and put a limit on the time we're prepared to spend waiting
>> for that child.
>
> fork isn't async-signal-safe in glibc,
Then glibc violates POSIX. Some days, I can't stand how shoddy the
POSIX FOSS software world is.
vfork, however, should be safe enough if all we're going to do is exec.
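
A rough sketch of the "vfork, then only exec" pattern, assuming a
POSIX-like system that still provides vfork; spawn_and_wait is a
hypothetical helper, not an Emacs function.  The child does nothing
except execv or _exit.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run PROGRAM with ARGV via vfork + execv and wait for it.
   Returns the child's exit status, or -1 on any failure.  */
int
spawn_and_wait (const char *program, char *const argv[])
{
  pid_t pid = vfork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      /* In a vfork child the only safe moves are exec and _exit.  */
      execv (program, argv);
      _exit (127);              /* exec failed.  */
    }

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Whether vfork is actually safe to call from a signal handler on a
given libc is the open question here; the sketch does not settle it.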
^ permalink raw reply [flat|nested] 177+ messages in thread
* RE: Crash recovery strategies
2016-01-03 23:29 ` Daniel Colascione
2016-01-03 23:33 ` Sending automatic crash reports to the FSF (was: Crash recovery strategies) John Wiegley
@ 2016-01-04 1:34 ` Drew Adams
1 sibling, 0 replies; 177+ messages in thread
From: Drew Adams @ 2016-01-04 1:34 UTC (permalink / raw)
To: Daniel Colascione, Paul Eggert, Eli Zaretskii, Emacs-devel
> More generally, how do we feel about automatically sending crash reports
> to the FSF? Of course, we'd send reports stripped of personally
> identifiable information, as is standard practice.
I, for one, might be against that, unless the user explicitly
agrees to it ahead of time somehow, and we let the user know
exactly what (kinds of) info will be sent.
What one person considers personal information (whether or not
it identifies the person) another person might consider not to
be so.
IOW, it might depend on just what is meant by stripping.
For example, the info that is gathered automatically now for
inclusion in a bug report could be seen by a user to include
more info than s?he might want.
Is there a reason to send the report automatically? Is it just
to avoid interacting with the user because any further interaction
might interfere with the state to be reported (in which case, take
the state snapshot before asking)? Or is it because such
interaction could itself be problematic (in which case just do
without that particular report)?
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 18:08 ` Eli Zaretskii
2016-01-03 18:24 ` Daniel Colascione
@ 2016-01-04 3:17 ` Richard Stallman
1 sibling, 0 replies; 177+ messages in thread
From: Richard Stallman @ 2016-01-04 3:17 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: dancol, Emacs-devel, eggert
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Robustness comes at a price. You are asking Emacs and its users to
> pay a heavy price that they don't need to pay, because there are no
> requirements for Emacs to be as robust as safety-critical software.
That's the crucial point. With unlimited resources we could make
bugs in Emacs vanishingly unlikely. But if we put so many resources
into that, we would get nothing else done. We need to use our users
to find the bugs.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:12 ` Daniel Colascione
2016-01-03 23:11 ` Paul Eggert
@ 2016-01-04 15:24 ` Eli Zaretskii
2016-01-04 15:28 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:24 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 13:12:29 -0800
>
> > I think something along those lines would suffice, yes. Admittedly I
> > didn't quite follow what you wrote (perhaps some text got elided?). But
> > the main point, as I understand it, is that we needn't worry about
> > having a stack-overflow check inside the stack-overflow handler, because
> > we can insist that the stack-overflow handler be tightly-enough
> > controlled so that it won't recurse indefinitely.
>
> Yes: do as little as possible in the segfault handler and signal an
> error the normal way at the next safe opportunity, if one arises before
> we fully exhaust the stack.
You cannot do that when the stack is exhausted. You will almost
certainly crash if you try.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:28 ` Daniel Colascione
2016-01-03 21:31 ` Daniel Colascione
2016-01-03 21:45 ` John Wiegley
@ 2016-01-04 15:26 ` Eli Zaretskii
2 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:26 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 13:28:24 -0800
>
> One question that neither you, nor Eli, nor Paul have answered is why we
> would try to recover from stack overflow and not NULL dereferences.
> Exactly the same arguments apply to both situations.
The difference should be obvious: stack overflow is not (necessarily)
a bug, it is just an exhaustion of some resource. Like getting a NULL
return value from memory allocation routines. Would you claim that
getting a NULL from malloc also requires us to "crash quickly and
predictably"? No, I don't think so.
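
The malloc analogy, as a small sketch (try_make_buffer and the
alloc_result values are made-up names): running out of the resource is
reported to the caller as an ordinary error instead of ending the
process.

```c
#include <stdlib.h>
#include <string.h>

enum alloc_result { ALLOC_OK = 0, ALLOC_EXHAUSTED = 1 };

/* Try to allocate a zeroed buffer of NBYTES.  On success store it in
   *OUT and return ALLOC_OK; if malloc returns NULL, store NULL and
   return ALLOC_EXHAUSTED so the caller can recover.  */
enum alloc_result
try_make_buffer (size_t nbytes, char **out)
{
  /* malloc (0) may legitimately return NULL, so ask for at least 1.  */
  char *p = malloc (nbytes ? nbytes : 1);
  if (p == NULL)
    {
      *out = NULL;
      return ALLOC_EXHAUSTED;
    }
  memset (p, 0, nbytes);
  *out = p;
  return ALLOC_OK;
}
```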
> Emacs should report its own crashes somehow *generally*, probably with
> Breakpad.
Maybe. But given the current trend of removing more and more data
from what report-emacs-bug sends, I rather doubt that many users would
want to allow such reports.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:31 ` Daniel Colascione
@ 2016-01-04 15:27 ` Eli Zaretskii
2016-01-04 15:29 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:27 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 13:31:49 -0800
>
> I'd also support doing no auto-save at crash time. Auto-save should
> happen frequently enough anyway that users shouldn't lose much data when
> a crash happens, and not auto-saving sidesteps a lot of robustness concerns.
I have my auto-save-timeout customized to 15 since about forever, but
I don't know what others will say about that. We should ask users
before changing this.
In any case, disabling auto-save at crash time is out of the question,
IMO. Precious stuff could come into a buffer from some fast source,
such as a program or a network connection, and in that case even
several seconds worth of data could be a lot. Removing auto-save from
the emergency shut down would be a terrible mistake.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:24 ` Eli Zaretskii
@ 2016-01-04 15:28 ` Daniel Colascione
2016-01-04 16:00 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 15:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 902 bytes --]
On 01/04/2016 07:24 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 13:12:29 -0800
>>
>>> I think something along those lines would suffice, yes. Admittedly I
>>> didn't quite follow what you wrote (perhaps some text got elided?). But
>>> the main point, as I understand it, is that we needn't worry about
>>> having a stack-overflow check inside the stack-overflow handler, because
>>> we can insist that the stack-overflow handler be tightly-enough
>>> controlled so that it won't recurse indefinitely.
>>
>> Yes: do as little as possible in the segfault handler and signal an
>> error the normal way at the next safe opportunity, if one arises before
>> we fully exhaust the stack.
>
> You cannot do that when the stack is exhausted. You will almost
> certainly crash if you try.
That's why we have the alternate signal stack.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:27 ` Eli Zaretskii
@ 2016-01-04 15:29 ` Daniel Colascione
2016-01-04 16:01 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 15:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]
On 01/04/2016 07:27 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 13:31:49 -0800
>>
>> I'd also support doing no auto-save at crash time. Auto-save should
>> happen frequently enough anyway that users shouldn't lose much data when
>> a crash happens, and not auto-saving sidesteps a lot of robustness concerns.
>
> I have my auto-save-timeout customized to 15 since about forever, but
> I don't know what others will say about that. We should ask users
> before changing this.
>
> In any case, disabling auto-save at crash time is out of the question,
> IMO. Precious stuff could come into a buffer from some fast source,
> such as a program or a network connection, and in that case even
> several seconds worth of data could be a lot. Removing auto-save from
> the emergency shut down would be a terrible mistake.
Emacs is not a safety-critical system. If you need guaranteed recording
of a fast data stream, use something else.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 21:45 ` John Wiegley
2016-01-03 22:20 ` Daniel Colascione
@ 2016-01-04 15:29 ` Eli Zaretskii
1 sibling, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:29 UTC (permalink / raw)
To: John Wiegley; +Cc: dancol, Emacs-devel, eggert
> From: John Wiegley <jwiegley@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, Paul Eggert <eggert@cs.ucla.edu>, Emacs-devel@gnu.org
> Date: Sun, 03 Jan 2016 13:45:15 -0800
>
> So, on a detected crash, auto-save all files, and save a text file with the
> crash data before exiting? That sounds pretty safe and reasonable to me.
Which is why we already do that. See shut_down_emacs.
> Maybe we could even popup a window to alert the user, and prompt them to press
> a key, but the only action will be to exit (unless the user is a power user
> and uses recursive edit to attempt to interact with their now-broken Emacs).
We already do that as well, as part of so-called "emergency exit".
See handle_interrupt.
> Why must it be all or nothing? Some is better than nothing. The error handler
> can evolve after we know just how useful it is (or whether it is).
That's exactly the crux of our disagreement with Daniel: we think
partial recovery is better (for users) than no recovery, while he
thinks that no recovery is better (because it makes the job of
debugging easier).
> Eli, Paul: What do you think about just auto-saving as much as possible,
> writing an error trace to a file, and prompting the user to press a key, after
> which we abort the running Emacs?
We already do all that, we just don't prompt in most cases. We simply
auto-save and then commit suicide. But in the case of stack overflow,
we can do better, so we try that as well.
> This is in line with what many of my OS X applications do when they
> encounter a fatal error; they're kind enough to tell me that it
> happened, and give me an "OK" button to click before they abort, but
> they don't allow me to continue to operate the application in an
> unknown state.
On MS-Windows, when we abort or succeed in catching a fatal signal
early enough, we also pop up an abort dialog, which even allows
attaching a debugger, not just exiting.
IOW, all of these reasonable things we already do. The argument is
not about them, it's about going some extra mile in the specific case
of stack overflow. In that case, we can do better in most cases:
avoid crashing entirely, leaving Emacs in a state where the user can
shut down the session in a more orderly fashion. Admittedly, this
won't work in all cases, but that's not required.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 22:20 ` Daniel Colascione
2016-01-03 22:43 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) John Wiegley
@ 2016-01-04 15:31 ` Eli Zaretskii
2016-01-04 15:41 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:31 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 14:20:28 -0800
>
> That's a reasonable UI, but popping up a window or otherwise displaying
> UI in-process might not work. Instead, we can fork and exec a new Emacs
> to interact with the user, and read from a pipe that process inherits a
> byte telling the crashing Emacs what it should do. All that's perfectly
> legal to do from an async-signal-unsafe context.
>
> The new Emacs has to know *how* to display a message. I think it should
> be possible to look at the current frame's window system information.
> For NS and Win32, we just need to know whether it's GUI or a tty. For
> X11, we'd just need to extract display. On every frame switch, we can
> record this information in a simple variable we can read in any
> async-signal-safe way.
>
> Of course the child Emacs has to display something to the user somehow,
> but we can record the current window-system parameters on every frame
> switch into async-signal-safe state (say, a global char buffer), so that
> we can launch the child Emacs with the right display parameters.
>
> If the user indicates via the new process that she wants to continue
> using the broken Emacs, great. We should support doing just that. It'd
> be nice also to give that child Emacs support for attaching GDB to its
> parent, actually. Of course it's possible to attach GDB manually, but
> why not make it convenient?
I think this will emerge as a tremendously complex feature, whose
design and implementation will become more and more complicated as new
aspects of this come into view. Complex backup and recovery
procedures are generally a bad idea, because they tend to make the
overall reliability lower, not higher, due to problems inherent in the
recovery code itself. So I think doing this is not a good idea. It
certainly isn't a good use of our time and scarce resources.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
2016-01-03 22:43 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) John Wiegley
2016-01-03 22:55 ` Crash recovery strategies Daniel Colascione
2016-01-03 23:21 ` Paul Eggert
@ 2016-01-04 15:32 ` Eli Zaretskii
2016-01-04 15:35 ` Crash recovery strategies Daniel Colascione
2 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:32 UTC (permalink / raw)
To: John Wiegley; +Cc: dancol, Emacs-devel, eggert
> From: John Wiegley <jwiegley@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, Paul Eggert <eggert@cs.ucla.edu>, Emacs-devel@gnu.org
> Date: Sun, 03 Jan 2016 14:43:13 -0800
>
> Ah, showing the report on the *next* Emacs invocation is also something that
> OS X applications do (as an example of prior art). I like that idea.
We already do that: when Emacs comes up after a crash, it offers to
recover-session. We can extend that by adding a feature that would
send the details of the crash somewhere (but I very much doubt that
users will want to use this, as I already wrote).
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 22:55 ` Crash recovery strategies Daniel Colascione
2016-01-03 22:59 ` John Wiegley
2016-01-03 23:47 ` John Wiegley
@ 2016-01-04 15:33 ` Eli Zaretskii
2016-01-04 15:34 ` Daniel Colascione
2 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:33 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 14:55:26 -0800
>
> The code that bothers me is the code that longjmps *out* of the sigsegv
> handler that catches stack overflow. The handler being there is fine.
> Under my proposed scheme, we won't longjmp out of the handler unless the
> user tells us to do that.
You won't be able to do anything except call printf, without jumping
out of the handler. The stack space left after stack overflow is too
small for anything fancier. You will almost certainly crash, that's
all.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-04 15:33 ` Eli Zaretskii
@ 2016-01-04 15:34 ` Daniel Colascione
2016-01-04 16:02 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 15:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 761 bytes --]
On 01/04/2016 07:33 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 14:55:26 -0800
>>
>> The code that bothers me is the code that longjmps *out* of the sigsegv
>> handler that catches stack overflow. The handler being there is fine.
>> Under my proposed scheme, we won't longjmp out of the handler unless the
>> user tells us to do that.
>
> You won't be able to do anything except call printf, without jumping
> out of the handler. The stack space left after stack overflow is too
> small for anything fancier. You will almost certainly crash, that's
> all.
There's no guarantee that you can even call printf. And you continue to
ignore the existence of the alternate signal stack.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-04 15:32 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) Eli Zaretskii
@ 2016-01-04 15:35 ` Daniel Colascione
2016-01-04 16:04 ` Eli Zaretskii
2016-01-05 4:48 ` Richard Stallman
0 siblings, 2 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 15:35 UTC (permalink / raw)
To: Eli Zaretskii, John Wiegley; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 514 bytes --]
On 01/04/2016 07:32 AM, Eli Zaretskii wrote:
>> From: John Wiegley <jwiegley@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>, Paul Eggert <eggert@cs.ucla.edu>, Emacs-devel@gnu.org
>> Date: Sun, 03 Jan 2016 14:43:13 -0800
>>
>> Ah, showing the report on the *next* Emacs invocation is also something that
>> OS X applications do (as an example of prior art). I like that idea.
>
> We already do that: when Emacs comes up after a crash, it offers to
> recover-session.
Not on my Emacs we don't.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-03 23:22 ` Daniel Colascione
2016-01-03 23:29 ` John Wiegley
2016-01-04 1:05 ` Paul Eggert
@ 2016-01-04 15:38 ` Eli Zaretskii
2016-01-04 15:40 ` Daniel Colascione
2 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:38 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 15:22:48 -0800
>
> The quick and dirty fix for Fdo_auto_save is to run
> Fdo_auto_save in a forked child
Forking a child from a damaged parent is likely to leave us with the
same problem in the child. Also, if the original session uses up a
lot of memory, the fork could simply fail. (You later suggested a
vfork + exec, but that doesn't allow auto-saving from the child,
AFAIU.)
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:38 ` Eli Zaretskii
@ 2016-01-04 15:40 ` Daniel Colascione
2016-01-04 16:07 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 15:40 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]
On 01/04/2016 07:38 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 15:22:48 -0800
>>
>> The quick and dirty fix for Fdo_auto_save is to run
>> Fdo_auto_save in a forked child
>
> Forking a child from a damaged parent is likely to leave us with the
> same problem in the child. Also, if the original session uses up a
> lot of memory, the fork could simply fail. (You later suggested a
> vfork + exec, but that doesn't allow auto-saving from the child,
> AFAIU.)
The idea is that we can wait a bounded amount of time for the child,
then kill it if it misbehaves. This way, we at least won't deadlock
(although the child might). The reason I suggested vfork is that glibc
(the quality of which continues to disappoint me) doesn't provide an
async-signal-safe fork. With the scheme I proposed, we don't *need* an
async-signal-safe fork, since we can perform the entire emergency save
operation inside the signal handler, without invoking lisp. We'd just
vfork and exec to ask the user what we want to do, which is an optional
process.
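
The "wait a bounded amount of time, then kill" part could look
something like the sketch below.  All names are hypothetical, the
child runs a plain C callback instead of a real save routine, and the
polling loop is only an illustration of the bounded wait, not code
vetted for use inside a signal handler.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Two sample workloads for the demo.  */
void quick_work (void) { }
void stuck_work (void) { for (;;) pause (); }

/* Fork, run WORK in the child, and wait at most about TIMEOUT_MS.
   Return 0 if the child exited cleanly in time, 1 if it had to be
   killed, -1 on error or abnormal child exit.  */
int
run_child_with_deadline (void (*work) (void), int timeout_ms)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      work ();
      _exit (0);
    }

  struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms.  */
  for (int waited = 0; waited <= timeout_ms; waited += 10)
    {
      int status;
      pid_t r = waitpid (pid, &status, WNOHANG);
      if (r == pid)
        return (WIFEXITED (status) && WEXITSTATUS (status) == 0) ? 0 : -1;
      if (r < 0)
        return -1;
      nanosleep (&tick, NULL);  /* Child still running; poll again.  */
    }

  /* Deadline passed: the parent gives up and kills the child.  */
  kill (pid, SIGKILL);
  waitpid (pid, NULL, 0);
  return 1;
}
```

The parent cannot deadlock on the child this way; a stuck child just
costs the timeout.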
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-03 23:51 ` Daniel Colascione
2016-01-04 0:12 ` John Wiegley
@ 2016-01-04 15:40 ` Eli Zaretskii
2016-01-04 15:44 ` Daniel Colascione
1 sibling, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 15:40 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Sun, 3 Jan 2016 15:51:13 -0800
>
> How would moving to C help? Keep in mind that we're talking about fork
> *and exec*, so the parent and child don't share memory. They can,
> however, communicate over a pipe.
Communicating over a pipe means encoding/decoding stuff, which
allocates memory, temporary buffers, and can potentially run Lisp.
It's IMO unwise to do that in a crashed session.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:31 ` Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
@ 2016-01-04 15:41 ` Daniel Colascione
2016-01-04 16:13 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 15:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]
On 01/04/2016 07:31 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 14:20:28 -0800
>>
>> That's a reasonable UI, but popping up a window or otherwise displaying
>> UI in-process might not work. Instead, we can fork and exec a new Emacs
>> to interact with the user, and read from a pipe that process inherits a
>> byte telling the crashing Emacs what it should do. All that's perfectly
>> legal to do from an async-signal-unsafe context.
>>
>> The new Emacs has to know *how* to display a message. I think it should
>> be possible to look at the current frame's window system information.
>> For NS and Win32, we just need to know whether it's GUI or a tty. For
>> X11, we'd just need to extract display. On every frame switch, we can
>> record this information in a simple variable we can read in any
>> async-signal-safe way.
>>
>> Of course the child Emacs has to display something to the user somehow,
>> but we can record the current window-system parameters on every frame
>> switch into async-signal-safe state (say, a global char buffer), so that
>> we can launch the child Emacs with the right display parameters.
>>
>> If the user indicates via the new process that she wants to continue
>> using the broken Emacs, great. We should support doing just that. It'd
>> be nice also to give that child Emacs support for attaching GDB to its
>> parent, actually. Of course it's possible to attach GDB manually, but
>> why not make it convenient?
>
> I think this will emerge as a tremendously complex feature, whose
> design and implementation will become more and more complicated as new
> aspects of this come into view.
Either we already do most of this (as you've discussed previously) or
it's incredibly complex. You can't have it both ways. The problem is
that the current approach is completely broken, and you refuse to
acknowledge that it might be causing severe problems in a way we'd never
hear about.
> Complex backup and recovery
> procedures are generally a bad idea, because they tend to make the
> overall reliability lower, not higher, due to problems inherent in the
> recovery code itself. So I think doing this is not a good idea. It
> certainly isn't a good use of our time and scarce resources.
What's complex is running arbitrary Lisp code and longjmping to the main
loop when we *know* Emacs might be in the middle of arbitrary library or
module code that really might not like its invariants being violated.
You're attempting to shoot down my proposal, which *is* simple,
reliable, and safe, on the grounds that it's *complex*, when the
existing scheme is both unreliable and *infinitely* complex?
From a software engineering perspective, that's just baffling. It's
preferring to use an industrial milling machine with anthrax-coated
turbo-blades instead of a potato peeler. Both can peel a potato, but I
know which one I'd want to use.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-04 15:40 ` Eli Zaretskii
@ 2016-01-04 15:44 ` Daniel Colascione
0 siblings, 0 replies; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 15:44 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, Emacs-devel
[-- Attachment #1: Type: text/plain, Size: 876 bytes --]
On 01/04/2016 07:40 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Date: Sun, 3 Jan 2016 15:51:13 -0800
>>
>> How would moving to C help? Keep in mind that we're talking about fork
>> *and exec*, so the parent and child don't share memory. They can,
>> however, communicate over a pipe.
>
> Communicating over a pipe means encoding/decoding stuff, which
> allocates memory, temporary buffers, and can potentially run Lisp.
> It's IMO unwise to do that in a crashed session.
Did you read my proposal? I suggested reading a *single byte*. POSIX
guarantees we can do that, even from async-signal-unsafe context. The
byte tells us whether to save, crash, die, or attempt to recover.
man 7 signal
But never mind. This whole ask-the-user-what-to-do thing is too
confusing. Just imagine I'd said we'd always write the save file.
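
For what the single-byte idea amounts to, a sketch under stated
assumptions: ask_helper_for_action is a made-up name, and the child
here just writes the byte itself where the proposal would exec a
fresh Emacs.  The parent side needs only read on a pipe, and read and
write are on the POSIX async-signal-safe list.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a helper that replies with the single byte ANSWER over a
   pipe.  Return the byte the parent read, or -1 on failure.  */
int
ask_helper_for_action (char answer)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  if (pid == 0)
    {
      /* Child: stand-in for the exec'ed helper process.  */
      close (fds[0]);
      ssize_t n = write (fds[1], &answer, 1);
      _exit (n == 1 ? 0 : 1);
    }

  /* Parent: no decoding, no allocation, just one byte.  */
  close (fds[1]);
  char action;
  ssize_t got;
  do
    got = read (fds[0], &action, 1);
  while (got < 0 && errno == EINTR);
  close (fds[0]);
  waitpid (pid, NULL, 0);

  return got == 1 ? (unsigned char) action : -1;
}
```

The returned byte would then select among the options listed above
(save, crash, die, or attempt to recover).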
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:28 ` Daniel Colascione
@ 2016-01-04 16:00 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 16:00 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 4 Jan 2016 07:28:07 -0800
>
> >> Yes: do as little as possible in the segfault handler and signal an
> >> error the normal way at the next safe opportunity, if one arises before
> >> we fully exhaust the stack.
> >
> > You cannot do that when the stack is exhausted. You will almost
> > certainly crash if you try.
>
> That's why we have the alternate signal stack.
Which is also quite small, and from which you cannot run Lisp safely
(AFAIU).
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:29 ` Daniel Colascione
@ 2016-01-04 16:01 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 16:01 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 4 Jan 2016 07:29:09 -0800
>
> > In any case, disabling auto-save at crash time is out of the question,
> > IMO. Precious stuff could come into a buffer from some fast source,
> > such as a program or a network connection, and in that case even
> > several seconds worth of data could be a lot. Removing auto-save from
> > the emergency shut down would be a terrible mistake.
>
> Emacs is not a safety-critical system. If you need guaranteed recording
> of a fast data stream, use something else.
Emacs is not a safety-critical system, but it already auto-saves at
crash time, so you are suggesting to make the situation with this
worse than it already is. Not acceptable.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-04 15:34 ` Daniel Colascione
@ 2016-01-04 16:02 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 16:02 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 4 Jan 2016 07:34:24 -0800
>
> > You won't be able to do anything except call printf, without jumping
> > out of the handler. The stack space left after stack overflow is too
> > small for anything fancier. You will almost certainly crash, that's
> > all.
>
> There's no guarantee that you can even call printf.
The guard page is usually one or two pages, which is enough for most
uses of printf.
> And you continue to ignore the existence of the alternate signal
> stack.
No, I don't.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Crash recovery strategies
2016-01-04 15:35 ` Crash recovery strategies Daniel Colascione
@ 2016-01-04 16:04 ` Eli Zaretskii
2016-01-05 4:48 ` Richard Stallman
1 sibling, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 16:04 UTC (permalink / raw)
To: Daniel Colascione; +Cc: jwiegley, eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 4 Jan 2016 07:35:02 -0800
>
> > We already do that: when Emacs comes up after a crash, it offers to
> > recover-session.
>
> Not on my Emacs we don't.
We do, see startup.el.
^ permalink raw reply [flat|nested] 177+ messages in thread
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:40 ` Daniel Colascione
@ 2016-01-04 16:07 ` Eli Zaretskii
2016-01-04 20:32 ` John Wiegley
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 16:07 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 4 Jan 2016 07:40:48 -0800
>
> >> The quick and dirty fix for Fdo_auto_save is to run
> >> Fdo_auto_save in a forked child
> >
> > Forking a child from a damaged parent is likely to leave us with the
> > same problem in the child. Also, if the original session uses up a
> > lot of memory, the fork could simply fail. (You later suggested a
> > vfork + exec, but that doesn't allow auto-saving from the child,
> > AFAIU.)
>
> The idea is that we can wait a bounded amount of time for the child,
> then kill it if it misbehaves.
Yes, but then there's no auto-saving, so this trick seems pointless to
me.
> The reason I suggested vfork is that glibc (the quality of which
> continues to disappoint me) doesn't provide an async-signal-safe
> fork. With the scheme I proposed, we don't *need* an
> async-signal-safe fork, since we can perform the entire emergency
> save operation inside the signal handler, without invoking
> lisp. We'd just vfork and exec to ask the user what we want to do,
> which is an optional process.
Why do we need to vfork+exec just to ask a question? We do this today
with very low-level I/O calls, and that code has worked since about
forever (I remember studying it 25 years ago).
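The "wait a bounded amount of time for the child, then kill it" idea quoted above can be sketched as follows. This is a hedged illustration only, assuming POSIX fork, waitpid and kill; run_bounded, quick_work and stuck_work are hypothetical names, and the sketch is meant to be called from ordinary code, not from inside a signal handler (which is exactly where the async-signal-safety concern about fork arises).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical workloads: one finishes at once, one never returns.  */
static void quick_work (void) { }
static void stuck_work (void) { for (;;) pause (); }

/* Run WORK in a forked child and wait at most TIMEOUT_MS for it.
   Returns 0 if the child exited successfully in time, 1 if it had to
   be killed, and -1 on any other failure.  */
static int
run_bounded (void (*work) (void), int timeout_ms)
{
  assert (work != NULL && timeout_ms >= 0);

  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      work ();
      _exit (0);
    }

  struct timespec tick = { 0, 10 * 1000 * 1000 };	/* poll every 10 ms */
  for (int waited = 0; waited < timeout_ms; waited += 10)
    {
      int status;
      pid_t r = waitpid (pid, &status, WNOHANG);
      if (r == pid)
	return (WIFEXITED (status) && WEXITSTATUS (status) == 0) ? 0 : -1;
      if (r < 0)
	return -1;
      nanosleep (&tick, NULL);
    }

  kill (pid, SIGKILL);		/* the child misbehaved: give up on it */
  waitpid (pid, NULL, 0);	/* reap it so no zombie is left behind */
  return 1;
}
```

For example, run_bounded (quick_work, 1000) returns 0, while run_bounded (stuck_work, 100) kills the child and returns 1. The sketch says nothing about whether a child forked from a damaged parent can do useful work, which is the objection raised in this message.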
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 15:41 ` Daniel Colascione
@ 2016-01-04 16:13 ` Eli Zaretskii
0 siblings, 0 replies; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-04 16:13 UTC (permalink / raw)
To: Daniel Colascione; +Cc: eggert, Emacs-devel
> Cc: eggert@cs.ucla.edu, Emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 4 Jan 2016 07:41:27 -0800
>
> > I think this will emerge as a tremendously complex feature, whose
> > design and implementation will become more and more complicated as new
> > aspects of this come into view.
>
> Either we already do most of this (as you've discussed previously) or
> it's incredibly complex. You can't have it both ways.
We do have the _functionality_, but its _design_ and _implementation_
are very different: it's much simpler and therefore much more robust
and much less prone to design and implementation bugs that will
definitely lower the overall reliability, instead of making it higher.
> The problem is that the current approach is completely broken, and
> you refuse to acknowledge that it might be causing severe problems
> in a way we'd never hear about.
It's not broken. It has worked for many years. What you say is simply
not true.
> > Complex backup and recovery
> > procedures are generally a bad idea, because they tend to make the
> > overall reliability lower, not higher, due to problems inherent in the
> > recovery code itself. So I think doing this is not a good idea. It
> > certainly isn't a good use of our time and scarce resources.
>
> What's complex is running arbitrary Lisp code and longjmping to the main
> loop when we *know* Emacs might be in the middle of arbitrary library or
> module code that really might not like its invariants being violated.
No, that's dead simple. Just look at the code. You think it's risky,
but risky and complex are two very different things.
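The mechanism under debate, jumping from a signal handler back to a recovery point, looks roughly like this in isolation. This is a minimal sketch, assuming POSIX sigsetjmp and siglongjmp; it uses SIGUSR1 as a harmless stand-in for a real fault, and every name in it is hypothetical. It is not the Emacs implementation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;

/* Abandon whatever was running and resume at the recovery point.  */
static void
on_signal (int sig)
{
  (void) sig;
  siglongjmp (recovery_point, 1);
}

/* Hypothetical workloads for the sketch.  */
static void raise_sigusr1 (void) { raise (SIGUSR1); }
static void do_nothing (void) { }

/* Run RISKY.  Returns 0 if it returned normally, 1 if control came
   back through the signal handler, and -1 if setup failed.  */
static int
run_with_recovery (void (*risky) (void))
{
  struct sigaction sa;

  assert (risky != NULL);
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_signal;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGUSR1, &sa, NULL) != 0)
    return -1;

  /* Passing 1 saves the signal mask, so the signal is unblocked
     again once siglongjmp lands here.  */
  if (sigsetjmp (recovery_point, 1) != 0)
    return 1;

  risky ();
  return 0;
}
```

The code itself is short, which is the "simple" side of the argument; the "risky" side is that anything RISKY was in the middle of is abandoned at an arbitrary point, with no chance to restore its invariants.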
> You're attempting to shoot down my proposal
No, I am not. And even if I did, I can't: you are free to code
whatever you like.
What I'm doing is voicing my opinions on your ideas, which is what
John requested. He _wanted_ to hear my opinions, after hearing yours.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 16:07 ` Eli Zaretskii
@ 2016-01-04 20:32 ` John Wiegley
2016-01-04 20:34 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: John Wiegley @ 2016-01-04 20:32 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Daniel Colascione, Emacs-devel, eggert
OK, I've heard the concerns and input from all parties, and my decision is
that we will include the stack overflow recovery logic, as it is now
implemented, while keeping our ears open for any problems this causes to
users.
Daniel, Eli, Paul, thank you for spending the time to be vocal and deliberate
in enumerating your concerns. To Daniel specifically: I appreciate your
experience in this area, and that you are not speaking from your imagination,
but I'd like to give the current recovery approach a try before calling it a
non-starter, or branching out into more complex solutions.
We *will* have the freedom to reverse this decision in a future release if
things become worse rather than better. But there is no more benefit in
debating this future.
Thank you,
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 20:32 ` John Wiegley
@ 2016-01-04 20:34 ` Daniel Colascione
2016-01-04 20:35 ` Daniel Colascione
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 20:34 UTC (permalink / raw)
To: Eli Zaretskii, eggert, Emacs-devel
On 01/04/2016 12:32 PM, John Wiegley wrote:
> OK, I've heard the concerns and input from all parties, and my decision is
> that we will include the stack overflow recovery logic, as it is now
> implemented, while keeping our ears open for any problems this causes to
> users.
>
> Daniel, Eli, Paul, thank you for spending the time to be vocal and deliberate
> in enumerating your concerns. To Daniel specifically: I appreciate your
> experience in this area, and that you are not speaking from your imagination,
> but I'd like to give the current recovery approach a try before calling it a
> non-starter, or branching out into more complex solutions.
>
> We *will* have the freedom to reverse this decision in a future release if
> things become worse rather than better. But there is no more benefit in
> debating this future.
Thanks for considering the issue. I still think this decision does not
bode well at all for Emacs robustness, and I'm terribly disappointed.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 20:34 ` Daniel Colascione
@ 2016-01-04 20:35 ` Daniel Colascione
2016-01-04 22:06 ` John Wiegley
0 siblings, 1 reply; 177+ messages in thread
From: Daniel Colascione @ 2016-01-04 20:35 UTC (permalink / raw)
To: Eli Zaretskii, eggert, Emacs-devel
On 01/04/2016 12:34 PM, Daniel Colascione wrote:
> On 01/04/2016 12:32 PM, John Wiegley wrote:
>> OK, I've heard the concerns and input from all parties, and my decision is
>> that we will include the stack overflow recovery logic, as it is now
>> implemented, while keeping our ears open for any problems this causes to
>> users.
>>
>> Daniel, Eli, Paul, thank you for spending the time to be vocal and deliberate
>> in enumerating your concerns. To Daniel specifically: I appreciate your
>> experience in this area, and that you are not speaking from your imagination,
>> but I'd like to give the current recovery approach a try before calling it a
>> non-starter, or branching out into more complex solutions.
>>
>> We *will* have the freedom to reverse this decision in a future release if
>> things become worse rather than better. But there is no more benefit in
>> debating this future.
>
> Thanks for considering the issue. I still think this decision does not
> bode well at all for Emacs robustness, and I'm terribly disappointed.
I take it you're not interested in other improvements to crash recovery?
That's a shame. I was going to do some work there. As it stands, I'm
forced to conclude that we don't give a damn about making Emacs robust.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-01-04 20:35 ` Daniel Colascione
@ 2016-01-04 22:06 ` John Wiegley
0 siblings, 0 replies; 177+ messages in thread
From: John Wiegley @ 2016-01-04 22:06 UTC (permalink / raw)
To: Daniel Colascione; +Cc: Eli Zaretskii, eggert, Emacs-devel
>>>>> Daniel Colascione <dancol@dancol.org> writes:
> I take it you're not interested in other improvements to crash recovery?
> That's a shame. I was going to do some work there. As it stands, I'm forced
> to conclude that we don't give a damn about making Emacs robust.
Please move further discussion on making Emacs as stable as boiling
nitroglycerin to emacs-tangents.
Thank you,
--
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
* Re: Crash recovery strategies
2016-01-04 15:35 ` Crash recovery strategies Daniel Colascione
2016-01-04 16:04 ` Eli Zaretskii
@ 2016-01-05 4:48 ` Richard Stallman
2016-01-05 15:52 ` Eli Zaretskii
1 sibling, 1 reply; 177+ messages in thread
From: Richard Stallman @ 2016-01-05 4:48 UTC (permalink / raw)
To: Daniel Colascione; +Cc: jwiegley, eliz, eggert, Emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > We already do that: when Emacs comes up after a crash, it offers to
> > recover-session.
> Not on my Emacs we don't.
Is that a bug? It sounds like one. Would you like to investigate,
and/or send a bug report?
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: Crash recovery strategies
2016-01-05 4:48 ` Richard Stallman
@ 2016-01-05 15:52 ` Eli Zaretskii
2016-01-05 16:37 ` Clément Pit--Claudel
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-05 15:52 UTC (permalink / raw)
To: rms; +Cc: jwiegley, dancol, Emacs-devel, eggert
> From: Richard Stallman <rms@gnu.org>
> CC: eliz@gnu.org, jwiegley@gmail.com, eggert@cs.ucla.edu,
> Emacs-devel@gnu.org
> Date: Mon, 04 Jan 2016 23:48:53 -0500
>
> > > We already do that: when Emacs comes up after a crash, it offers to
> > > recover-session.
>
> > Not on my Emacs we don't.
>
> Is that a bug? It sounds like one. Would you like to investigate,
> and/or send a bug report?
If Emacs doesn't propose to recover crashed sessions, it's indeed a
bug. But it does do that for me, I just double-checked to be sure.
* Re: Crash recovery strategies
2016-01-05 15:52 ` Eli Zaretskii
@ 2016-01-05 16:37 ` Clément Pit--Claudel
2016-01-05 17:08 ` Eli Zaretskii
0 siblings, 1 reply; 177+ messages in thread
From: Clément Pit--Claudel @ 2016-01-05 16:37 UTC (permalink / raw)
To: emacs-devel
On 01/05/2016 10:52 AM, Eli Zaretskii wrote:
>> From: Richard Stallman <rms@gnu.org>
>> CC: eliz@gnu.org, jwiegley@gmail.com, eggert@cs.ucla.edu,
>> Emacs-devel@gnu.org
>> Date: Mon, 04 Jan 2016 23:48:53 -0500
>>
>> > > We already do that: when Emacs comes up after a crash, it offers to
>> > > recover-session.
>>
>> > Not on my Emacs we don't.
>>
>> Is that a bug? It sounds like one. Would you like to investigate,
>> and/or send a bug report?
>
> If Emacs doesn't propose to recover crashed sessions, it's indeed a
> bug. But it does do that for me, I just double-checked to be sure.
I never heard about session recovery before reading this thread. When my Emacs crashes (so about twice a week) I use M-x recover-this-file after reopening the files that I was editing. Is Emacs supposed to do more than the recover-this-file prompts? Did I misconfigure something?
Clément.
* Re: Crash recovery strategies
2016-01-05 16:37 ` Clément Pit--Claudel
@ 2016-01-05 17:08 ` Eli Zaretskii
2016-01-05 17:38 ` Clément Pit--Claudel
0 siblings, 1 reply; 177+ messages in thread
From: Eli Zaretskii @ 2016-01-05 17:08 UTC (permalink / raw)
To: Clément Pit--Claudel; +Cc: emacs-devel
> From: Clément Pit--Claudel <clement.pit@gmail.com>
> Date: Tue, 5 Jan 2016 11:37:06 -0500
>
> I never heard about session recovery before reading this thread.
See the node "Recover" in the Emacs manual, it is described there.
> When my Emacs crashes (so about twice a week) I use M-x recover-this-file after reopening the files that I was editing. Is Emacs supposed to do more than the recover-this-file prompts? Did I misconfigure something?
When you start Emacs as "emacs -q" (lower-case 'q'), it should show a
prominent message saying this:
Auto-save file lists were found. If an Emacs session crashed recently,
type M-x recover-session RET to recover the files you were editing.
* Re: Crash recovery strategies
2016-01-05 17:08 ` Eli Zaretskii
@ 2016-01-05 17:38 ` Clément Pit--Claudel
0 siblings, 0 replies; 177+ messages in thread
From: Clément Pit--Claudel @ 2016-01-05 17:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 01/05/2016 12:08 PM, Eli Zaretskii wrote:
>> From: Clément Pit--Claudel <clement.pit@gmail.com>
>> Date: Tue, 5 Jan 2016 11:37:06 -0500
>>
>> I never heard about session recovery before reading this thread.
>
> See the node "Recover" in the Emacs manual, it is described there.
>
>> When my Emacs crashes (so about twice a week) I use M-x recover-this-file after reopening the files that I was editing. Is Emacs supposed to do more than the recover-this-file prompts? Did I misconfigure something?
>
> When you start Emacs as "emacs -q" (lower-case 'q'), it should show a
> prominent message saying this:
>
> Auto-save file lists were found. If an Emacs session crashed recently,
> type M-x recover-session RET to recover the files you were editing.
I see, thanks! I'll try next time I see a crash.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-19 21:03 ` Philipp Stephani
2015-12-19 22:57 ` Philipp Stephani
2015-12-20 15:48 ` Eli Zaretskii
@ 2016-02-29 22:48 ` Philipp Stephani
2016-03-01 16:41 ` Paul Eggert
2 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2016-02-29 22:48 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Philipp Stephani <p.stephani2@gmail.com> wrote on Sat, 19 Dec 2015 at
22:03:
> Philipp Stephani <p.stephani2@gmail.com> wrote on Sat, 28 Nov 2015 at
> 11:58:
>
>> Philipp Stephani <p.stephani2@gmail.com> wrote on Fri, 27 Nov 2015
>> at 20:19:
>>
>>> Eli Zaretskii <eliz@gnu.org> wrote on Fri, 27 Nov 2015 at 08:36:
>>>
>>>> > Cc: emacs-devel@gnu.org
>>>> > From: Paul Eggert <eggert@cs.ucla.edu>
>>>> > Date: Thu, 26 Nov 2015 13:29:49 -0800
>>>> >
>>>> > Eli Zaretskii wrote:
>>>> > > it would be a maintenance burden to have to
>>>> > > analyze upon each such change whether emacs-module.c needs some
>>>> > > augmentation.
>>>> >
>>>> > While that's true in general, I think some exceptions are OK. E.g.,
>>>> it's OK if
>>>> > emacs-module.c assumes that ASIZE is a simple access function or
>>>> macro that
>>>> > doesn't throw signals. If we actually changed ASIZE to throw
>>>> signals, there's a
>>>> > boatload of other code we'd need to change as well, and changing
>>>> emacs-module.c
>>>> > wouldn't add much more to the maintenance burden.
>>>>
>>>> So what are the rules here, exactly? I'd like to write them down in
>>>> the commentary to emacs-module.c, so that any future changes there
>>>> will have lower probability of breaking things.
>>>>
>>>> E.g., can make_number signal an error? What about make_float or
>>>> make_string? And what about accessors like XFLOAT_DATA or AREF?
>>>>
>>>>
>>> Are there any established rules? If not we should probably be
>>> conservative and assume that everything signals. If we figure out that
>>> this introduces an unacceptably high overhead in some situations we can
>>> reconsider later.
>>> I would propose three exceptions: free_global_ref, is_not_nil, eq.
>>> free_global_ref cannot fail in Daniel's design, and implementing it that
>>> way would be consistent with other resource deallocation functions such as
>>> free(3). is_not_nil and eq seem so fundamental that I cannot imagine a
>>> situation where they could ever fail. Documenting that these three cannot
>>> fail would free module authors from the need to check for errors after
>>> calling these functions.
>>>
>>
>> For now I've attached a patch to replace the initial setup of most
>> environment functions with a single macro.
>>
>
> Here's another patch that replaces some of the custom error handling with
> signals. The signals will be immediately caught by the prologue, but some
> complexity and duplication is removed.
>
Could this patch please be reviewed and/or applied? Thanks.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-02-29 22:48 ` Philipp Stephani
@ 2016-03-01 16:41 ` Paul Eggert
2016-03-01 21:43 ` Philipp Stephani
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2016-03-01 16:41 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii
Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
On 02/29/2016 02:48 PM, Philipp Stephani wrote:
> Could this patch please be reviewed and/or applied? Thanks.
That was a long thread and I see multiple patches in it. Could you
please summarize the current proposal as you see it? For reference:
https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00871.html
https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00881.html
https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00902.html
(I could well have missed some.)
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-03-01 16:41 ` Paul Eggert
@ 2016-03-01 21:43 ` Philipp Stephani
2016-03-02 18:54 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2016-03-01 21:43 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Paul Eggert <eggert@cs.ucla.edu> wrote on Tue, 1 Mar 2016 at 17:41:
> On 02/29/2016 02:48 PM, Philipp Stephani wrote:
> > Could this patch please be reviewed and/or applied? Thanks.
>
> That was a long thread and I see multiple patches in it. Could you
> please summarize the current proposal as you see it? For reference:
>
> https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00871.html
I think this patch can still be applied as-is, it just simplifies the code
a bit without changing any behavior.
>
>
> https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00881.html
>
> https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00902.html
>
>
The second one is newer, so the first one should be discarded in favor of
the second one. Given the decision to allow longjmp on segfault, it is no
longer correct, but a variant (replacing "non-local exits" with "non-local
exits due to Lisp `signal' and `throw'") should still be correct and useful.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-03-01 21:43 ` Philipp Stephani
@ 2016-03-02 18:54 ` Paul Eggert
2016-03-31 18:44 ` Philipp Stephani
0 siblings, 1 reply; 177+ messages in thread
From: Paul Eggert @ 2016-03-02 18:54 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii
Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]
On 03/01/2016 01:43 PM, Philipp Stephani wrote:
>
>
> https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00871.html
>
>
> I think this patch can still be applied as-is, it just simplifies the
> code a bit without changing any behavior.
OK, thanks, I applied it to master, along with the attached followup
which cleans things up a bit more and removes unnecessary runtime checks.
>
>
> https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00881.html
>
> https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00902.html
>
>
> The second one is newer, so the first one should be discarded in favor
> of the second one. Given the decision to allow longjmp on segfault, it
> is no longer correct, but a variant (replacing "non-local exits" with
> "non-local exits due to Lisp `signal' and `throw'") should still be
> correct and useful.
This one looks more dubious. As I mentioned in the containing thread,
the comments are somewhat distracting where they are. Perhaps they could
be collected together into a comment in emacs-module.c that summarizes
the assumptions it makes.
Also, we really should fix the stack-overflow mess. But that's a bigger
topic....
[-- Attachment #2: 0001-emacs-module.c-simplification-and-tuneup.patch --]
[-- Type: application/x-patch, Size: 5258 bytes --]
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-03-02 18:54 ` Paul Eggert
@ 2016-03-31 18:44 ` Philipp Stephani
2016-04-01 8:29 ` Paul Eggert
0 siblings, 1 reply; 177+ messages in thread
From: Philipp Stephani @ 2016-03-31 18:44 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
Paul Eggert <eggert@cs.ucla.edu> wrote on Wed, 2 Mar 2016 at 19:54:
> On 03/01/2016 01:43 PM, Philipp Stephani wrote:
> >
> >
> > https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00871.html
> >
> >
> > I think this patch can still be applied as-is, it just simplifies the
> > code a bit without changing any behavior.
>
> OK, thanks, I applied it to master, along with the attached followup
> which cleans things up a bit more and removes unnecessary runtime checks.
>
Thanks. Why did you remove the checks? I think all of them are necessary
and lead to undefined behavior if they are violated.
>
> >
> >
> > https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00881.html
> >
> > https://lists.gnu.org/archive/html/emacs-devel/2015-12/msg00902.html
> >
> >
> > The second one is newer, so the first one should be discarded in favor
> > of the second one. Given the decision to allow longjmp on segfault, it
> > is no longer correct, but a variant (replacing "non-local exits" with
> > "non-local exits due to Lisp `signal' and `throw'") should still be
> > correct and useful.
>
> This one looks more dubious. As I mentioned in the containing thread,
> the comments are somewhat distracting where they are. Perhaps they could
> be collected together into a comment in emacs-module.c that summarizes
> the assumptions it makes.
>
>
The issue with that approach is that if one of the used functions is
changed to throw signals, then the change author might not know to update
emacs-module.c. In the worst case, for every function that starts throwing
signals emacs-module.c would have to be investigated. It's less fragile to
document the requirements in the functions themselves.
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2016-03-31 18:44 ` Philipp Stephani
@ 2016-04-01 8:29 ` Paul Eggert
0 siblings, 0 replies; 177+ messages in thread
From: Paul Eggert @ 2016-04-01 8:29 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii
Cc: aurelien.aptel+emacs, tzz, dancol, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2043 bytes --]
Philipp Stephani wrote:
> Why did you remove the checks? I think all of them are necessary
> and lead to undefined behavior if they are violated.
Looking at my patch:
> - verify (EMACS_INT_MAX > MOST_POSITIVE_FIXNUM);
EMACS_INT_MAX cannot possibly be less than MOST_POSITIVE_FIXNUM, since
MOST_POSITIVE_FIXNUM is an EMACS_INT.
> EMACS_INT refcount = XFASTINT (value) + 1;
> - if (FIXNUM_OVERFLOW_P (refcount)) xsignal0 (Qoverflow_error);
> + if (MOST_POSITIVE_FIXNUM < refcount)
> + xsignal0 (Qoverflow_error);
Adding 1 to the result of XFASTINT cannot possibly yield an integer that is less
than MOST_NEGATIVE_FIXNUM, so there's no need for the extra runtime check that
FIXNUM_OVERFLOW_P would impose.
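The reasoning above, that incrementing a non-negative count can only violate the upper bound, can be checked with a toy model. This is a sketch under stated assumptions: TOY_MOST_POSITIVE_FIXNUM and toy_bump_refcount are invented stand-ins, and the bound's value is arbitrary; they are not the Emacs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy upper bound standing in for MOST_POSITIVE_FIXNUM; the value is
   an assumption of this sketch.  */
#define TOY_MOST_POSITIVE_FIXNUM (INT64_MAX >> 2)

/* Increment a non-negative reference count.  On success store the
   new count in *OUT and return true; return false, leaving *OUT
   alone, if the result would exceed the bound.  */
static bool
toy_bump_refcount (int64_t value, int64_t *out)
{
  /* Precondition mirroring a non-negative fixnum.  */
  assert (0 <= value && value <= TOY_MOST_POSITIVE_FIXNUM);

  /* value + 1 is at least 1, so a lower-bound test can never fire,
     and it cannot overflow int64_t because the bound is far below
     INT64_MAX.  Only the upper bound needs checking.  */
  int64_t refcount = value + 1;
  if (TOY_MOST_POSITIVE_FIXNUM < refcount)
    return false;
  *out = refcount;
  return true;
}
```

A two-sided overflow test would give the same answers here; it would just spend an extra comparison on a case that cannot occur.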
> ptrdiff_t raw_size = SBYTES (lisp_str_utf8);
> - if (raw_size == PTRDIFF_MAX) xsignal0 (Qoverflow_error);
raw_size cannot possibly be PTRDIFF_MAX, since SBYTES always returns a value no
greater than STRING_BYTES_BOUND, and STRING_BYTES_BOUND is less than PTRDIFF_MAX.
> - if (length > STRING_BYTES_BOUND) xsignal0 (Qoverflow_error);
> Lisp_Object lstr = make_unibyte_string (str, length);
make_unibyte_string already checks for string length overflow, so the caller
need not check this.
> - if (FIXNUM_OVERFLOW_P (i)) xsignal0 (Qoverflow_error);
> CHECK_RANGED_INTEGER (make_number (i), 0, ASIZE (lvec) - 1);
CHECK_RANGED_INTEGER already checks that the integer is in range, so the caller
doesn't need to check that again. Hmm, here, though, there is a problem, in that
make_number can silently overflow. Sorry about that. I fixed this by installing
the attached further patch.
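To make the silent-overflow problem concrete, here is a toy model of it. This is a hedged sketch, not Emacs code: toy_make_number mimics a tagging scheme that keeps only the low TOY_VALBITS bits of its argument, and all toy_* and check_* names, as well as the 16-bit width, are assumptions made for illustration. It shows why range-checking the converted value can accept an out-of-range index, and why comparing the raw index first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of value bits in the toy fixnum; far fewer than the 64-bit
   index type, so truncation is easy to trigger.  */
enum { TOY_VALBITS = 16 };

/* Keep only the low TOY_VALBITS bits of N, sign-extended, the way a
   tagging scheme that silently drops high bits would.  */
static int32_t
toy_make_number (int64_t n)
{
  uint32_t low = (uint32_t) n & ((1u << TOY_VALBITS) - 1);
  if (low & (1u << (TOY_VALBITS - 1)))
    return (int32_t) low - (1 << TOY_VALBITS);
  return (int32_t) low;
}

/* Not strict enough: the range check sees the truncated value.  */
static bool
check_after_conversion (int64_t i, int64_t size)
{
  assert (size >= 0);
  int32_t n = toy_make_number (i);
  return 0 <= n && n < size;
}

/* Strict: compare the raw index before any conversion.  */
static bool
check_before_conversion (int64_t i, int64_t size)
{
  assert (size >= 0);
  return 0 <= i && i < size;
}
```

With a vector of size 10, the index 65538 truncates to 2 in this toy model, so check_after_conversion accepts it while check_before_conversion rejects it.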
> if one of the used functions is
> changed to throw signals
We shouldn't have to worry about that. Changing core functions to throw signals
would break lots of other code. We don't need to burden readers of these common
core functions with every single design constraint that affects them. It'd be OK
to put this commentary somewhere else, just not in a place where it clutters up
the main code.
[-- Attachment #2: 0001-Fix-check-for-subscript-errors-in-module-calls.patch --]
[-- Type: text/x-diff, Size: 1647 bytes --]
From 08b456856baa9e033ee1b210a03373589dd7765a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 1 Apr 2016 01:24:55 -0700
Subject: [PATCH] Fix check for subscript errors in module calls
* src/emacs-module.c (check_vec_index): New function.
(module_vec_set, module_vec_get): Use it instead of a
not-strict-enough check.
---
src/emacs-module.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/src/emacs-module.c b/src/emacs-module.c
index f9ede84..b57636e 100644
--- a/src/emacs-module.c
+++ b/src/emacs-module.c
@@ -588,13 +588,21 @@ module_set_user_finalizer (emacs_env *env, emacs_value uptr,
}
static void
+check_vec_index (Lisp_Object lvec, ptrdiff_t i)
+{
+ CHECK_VECTOR (lvec);
+ if (! (0 <= i && i < ASIZE (lvec)))
+ args_out_of_range_3 (make_fixnum_or_float (i),
+ make_number (0), make_number (ASIZE (lvec) - 1));
+}
+
+static void
module_vec_set (emacs_env *env, emacs_value vec, ptrdiff_t i, emacs_value val)
{
/* FIXME: This function should return bool because it can fail. */
MODULE_FUNCTION_BEGIN ();
Lisp_Object lvec = value_to_lisp (vec);
- CHECK_VECTOR (lvec);
- CHECK_RANGED_INTEGER (make_number (i), 0, ASIZE (lvec) - 1);
+ check_vec_index (lvec, i);
ASET (lvec, i, value_to_lisp (val));
}
@@ -603,8 +611,7 @@ module_vec_get (emacs_env *env, emacs_value vec, ptrdiff_t i)
{
MODULE_FUNCTION_BEGIN (module_nil);
Lisp_Object lvec = value_to_lisp (vec);
- CHECK_VECTOR (lvec);
- CHECK_RANGED_INTEGER (make_number (i), 0, ASIZE (lvec) - 1);
+ check_vec_index (lvec, i);
return lisp_to_value (AREF (lvec, i));
}
--
2.5.5
* Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
2015-12-20 21:00 ` Philipp Stephani
@ 2017-03-26 20:18 ` Philipp Stephani
0 siblings, 0 replies; 177+ messages in thread
From: Philipp Stephani @ 2017-03-26 20:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: aurelien.aptel+emacs, tzz, eggert, dancol, emacs-devel
Philipp Stephani <p.stephani2@gmail.com> wrote on Sun, 20 Dec 2015 at
22:00:
> Eli Zaretskii <eliz@gnu.org> wrote on Sun, 20 Dec 2015 at 19:59:
>
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sun, 20 Dec 2015 18:27:19 +0000
> > Cc: eggert@cs.ucla.edu, dancol@dancol.org, tzz@lifelogs.com,
> > aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> > and did you make sure
> > that signaling an error from a module still reports the same
> > information and backtrace as before the change?
> >
> > No, but I wouldn't know how the patch should change this. It only
> changes an
> > implementation detail in the error reporting code of the module
> implementation
> > itself.
>
> I provided one such test here:
>
> http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02318.html
>
> and the resulting backtrace here:
>
> http://lists.gnu.org/archive/html/emacs-devel/2015-11/msg02405.html
>
> > If we wanted to check the backtrace, I think we should rather add a unit
> test
> > for it.
>
> That would be good, thanks.
>
>
> Added a patch to that effect.
>
Pushed to master as 05bfebfc91.
Thread overview: 177+ messages
2015-11-25 18:39 Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
2015-11-25 18:50 ` Philipp Stephani
2015-11-25 19:24 ` Eli Zaretskii
2015-11-26 21:29 ` Paul Eggert
2015-11-27 7:35 ` Eli Zaretskii
2015-11-27 19:19 ` Philipp Stephani
2015-11-28 10:58 ` Philipp Stephani
2015-11-28 12:10 ` Eli Zaretskii
2015-12-19 21:03 ` Philipp Stephani
2015-12-19 22:57 ` Philipp Stephani
2015-12-20 15:47 ` Eli Zaretskii
2015-12-20 18:34 ` Philipp Stephani
2015-12-20 19:11 ` Eli Zaretskii
2015-12-20 21:40 ` Paul Eggert
2015-12-21 3:33 ` Eli Zaretskii
2015-12-21 11:00 ` Paul Eggert
2015-12-21 11:21 ` Yuri Khan
2015-12-21 11:34 ` Paul Eggert
2015-12-21 15:46 ` Eli Zaretskii
2015-12-21 18:15 ` Paul Eggert
2015-12-21 18:28 ` Daniel Colascione
2015-12-21 19:00 ` Eli Zaretskii
2015-12-21 20:19 ` Philipp Stephani
2015-12-21 19:04 ` Eli Zaretskii
2015-12-22 4:09 ` Paul Eggert
2015-12-22 4:38 ` Daniel Colascione
2015-12-22 4:48 ` Paul Eggert
2015-12-22 4:52 ` Daniel Colascione
2015-12-22 6:09 ` Paul Eggert
2015-12-22 6:14 ` Daniel Colascione
2015-12-22 6:33 ` Paul Eggert
2015-12-22 6:35 ` Daniel Colascione
2015-12-22 6:44 ` Paul Eggert
2015-12-22 6:53 ` Daniel Colascione
2015-12-22 16:13 ` Eli Zaretskii
2015-12-22 16:12 ` Eli Zaretskii
2015-12-22 17:26 ` Philipp Stephani
2015-12-22 17:51 ` Eli Zaretskii
2015-12-22 16:03 ` Eli Zaretskii
2015-12-22 16:39 ` Paul Eggert
2015-12-22 17:46 ` Eli Zaretskii
2015-12-22 23:28 ` Paul Eggert
2015-12-23 16:10 ` Eli Zaretskii
2015-12-23 16:20 ` Philipp Stephani
2015-12-23 16:46 ` Eli Zaretskii
2015-12-23 17:09 ` Paul Eggert
2015-12-23 17:18 ` Daniel Colascione
2015-12-24 2:51 ` Paul Eggert
2015-12-24 3:11 ` Daniel Colascione
2015-12-24 16:10 ` Eli Zaretskii
2015-12-24 17:04 ` Daniel Colascione
2015-12-24 17:17 ` John Wiegley
2016-01-03 14:27 ` Daniel Colascione
2016-01-03 15:46 ` Eli Zaretskii
2016-01-03 15:49 ` Daniel Colascione
2016-01-03 16:40 ` Eli Zaretskii
2016-01-03 16:50 ` Daniel Colascione
2016-01-03 17:20 ` Eli Zaretskii
2016-01-03 16:31 ` Paul Eggert
2016-01-03 16:48 ` Daniel Colascione
2016-01-03 18:07 ` Paul Eggert
2016-01-03 18:22 ` Daniel Colascione
2016-01-03 21:02 ` Paul Eggert
2016-01-03 21:12 ` Daniel Colascione
2016-01-03 23:11 ` Paul Eggert
2016-01-03 23:22 ` Daniel Colascione
2016-01-03 23:29 ` John Wiegley
2016-01-04 1:05 ` Paul Eggert
2016-01-04 1:07 ` Daniel Colascione
2016-01-04 15:38 ` Eli Zaretskii
2016-01-04 15:40 ` Daniel Colascione
2016-01-04 16:07 ` Eli Zaretskii
2016-01-04 20:32 ` John Wiegley
2016-01-04 20:34 ` Daniel Colascione
2016-01-04 20:35 ` Daniel Colascione
2016-01-04 22:06 ` John Wiegley
2016-01-04 15:24 ` Eli Zaretskii
2016-01-04 15:28 ` Daniel Colascione
2016-01-04 16:00 ` Eli Zaretskii
2016-01-03 17:16 ` Eli Zaretskii
2016-01-03 17:22 ` Daniel Colascione
2016-01-03 17:39 ` Eli Zaretskii
2016-01-03 17:49 ` Daniel Colascione
2016-01-03 18:08 ` Eli Zaretskii
2016-01-03 18:24 ` Daniel Colascione
2016-01-03 18:51 ` Eli Zaretskii
2016-01-03 19:04 ` Daniel Colascione
2016-01-03 19:15 ` Eli Zaretskii
2016-01-03 19:26 ` Daniel Colascione
2016-01-03 19:46 ` Eli Zaretskii
2016-01-03 19:47 ` Daniel Colascione
2016-01-03 19:49 ` John Wiegley
2016-01-03 20:14 ` Daniel Colascione
2016-01-04 3:17 ` Richard Stallman
2016-01-03 18:17 ` Paul Eggert
2016-01-03 17:43 ` Eli Zaretskii
2016-01-03 20:25 ` John Wiegley
2016-01-03 20:47 ` Daniel Colascione
2016-01-03 21:07 ` John Wiegley
2016-01-03 21:28 ` Daniel Colascione
2016-01-03 21:31 ` Daniel Colascione
2016-01-04 15:27 ` Eli Zaretskii
2016-01-04 15:29 ` Daniel Colascione
2016-01-04 16:01 ` Eli Zaretskii
2016-01-03 21:45 ` John Wiegley
2016-01-03 22:20 ` Daniel Colascione
2016-01-03 22:43 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) John Wiegley
2016-01-03 22:55 ` Crash recovery strategies Daniel Colascione
2016-01-03 22:59 ` John Wiegley
2016-01-03 23:04 ` Daniel Colascione
2016-01-03 23:20 ` John Wiegley
2016-01-03 23:47 ` John Wiegley
2016-01-03 23:51 ` Daniel Colascione
2016-01-04 0:12 ` John Wiegley
2016-01-04 15:40 ` Eli Zaretskii
2016-01-04 15:44 ` Daniel Colascione
2016-01-04 15:33 ` Eli Zaretskii
2016-01-04 15:34 ` Daniel Colascione
2016-01-04 16:02 ` Eli Zaretskii
2016-01-03 23:21 ` Paul Eggert
2016-01-03 23:24 ` Daniel Colascione
2016-01-03 23:28 ` John Wiegley
2016-01-04 0:51 ` Paul Eggert
2016-01-03 23:27 ` John Wiegley
2016-01-03 23:29 ` Daniel Colascione
2016-01-03 23:33 ` Sending automatic crash reports to the FSF (was: Crash recovery strategies) John Wiegley
2016-01-03 23:36 ` Sending automatic crash reports to the FSF Daniel Colascione
2016-01-03 23:39 ` John Wiegley
2016-01-03 23:48 ` Daniel Colascione
2016-01-04 1:34 ` Crash recovery strategies Drew Adams
2016-01-04 15:32 ` Crash recovery strategies (was: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) Eli Zaretskii
2016-01-04 15:35 ` Crash recovery strategies Daniel Colascione
2016-01-04 16:04 ` Eli Zaretskii
2016-01-05 4:48 ` Richard Stallman
2016-01-05 15:52 ` Eli Zaretskii
2016-01-05 16:37 ` Clément Pit--Claudel
2016-01-05 17:08 ` Eli Zaretskii
2016-01-05 17:38 ` Clément Pit--Claudel
2016-01-04 15:31 ` Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
2016-01-04 15:41 ` Daniel Colascione
2016-01-04 16:13 ` Eli Zaretskii
2016-01-04 15:29 ` Eli Zaretskii
2016-01-04 15:26 ` Eli Zaretskii
2015-12-24 17:36 ` Eli Zaretskii
2015-12-24 18:06 ` Daniel Colascione
2015-12-24 19:15 ` Eli Zaretskii
2015-12-22 16:01 ` Eli Zaretskii
2015-12-22 16:32 ` John Wiegley
2015-12-22 20:31 ` Daniel Colascione
2015-12-22 20:46 ` Eli Zaretskii
2015-12-22 20:52 ` Daniel Colascione
2015-12-22 21:08 ` Eli Zaretskii
2015-12-22 21:18 ` Daniel Colascione
2015-12-23 16:07 ` Eli Zaretskii
2015-12-23 16:25 ` Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.) Daniel Colascione
2015-12-23 17:30 ` Eli Zaretskii
2015-12-23 17:41 ` Daniel Colascione
2015-12-23 17:55 ` Eli Zaretskii
2015-12-23 17:56 ` Daniel Colascione
2015-12-23 18:09 ` Eli Zaretskii
2015-12-23 18:19 ` Daniel Colascione
2015-12-23 18:45 ` Eli Zaretskii
2015-12-24 3:26 ` Daniel Colascione
2015-12-21 18:57 ` Dynamic modules: MODULE_HANDLE_SIGNALS etc Eli Zaretskii
2015-12-21 20:15 ` Philipp Stephani
2015-12-20 15:48 ` Eli Zaretskii
2015-12-20 18:27 ` Philipp Stephani
2015-12-20 19:00 ` Eli Zaretskii
2015-12-20 21:00 ` Philipp Stephani
2017-03-26 20:18 ` Philipp Stephani
2016-02-29 22:48 ` Philipp Stephani
2016-03-01 16:41 ` Paul Eggert
2016-03-01 21:43 ` Philipp Stephani
2016-03-02 18:54 ` Paul Eggert
2016-03-31 18:44 ` Philipp Stephani
2016-04-01 8:29 ` Paul Eggert
2015-11-28 23:20 ` Paul Eggert
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git