@Paul Eggert: I am cc-ing you because
you are the author of commit f0a1e9ec and may be more familiar
with this topic.
Please ignore my previous email. I thought condition-case WAS able
to catch C stack overflow before commit f0a1e9ec, but that seems not
to be the case, or at least it is not related to this bug.
After some code reading and debugging, I found the problem: in
commit f0a1e9ec, the read_buffer for read1 was moved from a static
variable to an array, stackbuf, of size MAX_ALLOCA located on the
stack. MAX_ALLOCA is defined to be 16 * 1024, so every recursion of
read1 eats up 16KB of stack, and thousands of recursions (not
uncommon for a deeply nested structure) quickly use up the whole
stack and cause a stack overflow.
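To put rough numbers on this, here is a minimal sketch of the
arithmetic. It assumes an 8 MiB stack, which is a common default on
GNU/Linux but depends on the system and on "ulimit -s". The function
name max_recursion_depth is made up for illustration, and real frames
are somewhat larger than the buffer alone, so the true limit is lower.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the per-call buffer described above (16 * 1024 bytes).  */
enum { STACKBUF_SIZE = 16 * 1024 };

/* Hypothetical helper: upper bound on how many nested calls fit in
   STACK_BYTES of stack when each call needs FRAME_BYTES.  */
static size_t
max_recursion_depth (size_t stack_bytes, size_t frame_bytes)
{
  assert (frame_bytes > 0);
  return stack_bytes / frame_bytes;
}
```

Under that assumption, max_recursion_depth (8 * 1024 * 1024,
STACKBUF_SIZE) gives 512 levels, while a 64-byte buffer would allow
131072 levels before the buffer alone exhausts the stack.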
One solution is to make stackbuf much smaller. I set it to 16, and
this bug disappeared. Though 16 may be too aggressive, 16 * 1024
is way too big for a stack-based buffer in a function that may
recur thousands of times. To make things worse, the buffer is
a complete waste of space when read1 is dealing with anything
("[", "]", "(", ")", "#", "=", numbers, etc.) other than the name
of a symbol (usually tens of characters) or a string, which are the
only cases where we would need a really long buffer. A conservative
choice would be a size somewhat above 40 or 80, making the buffer
long enough to hold almost any symbol, as people usually do not use
symbols longer than half the width of a terminal. A more
aggressive choice is to remove the buffer entirely and only
allocate it on the heap. This comes at the cost of a possible
slowdown, because memory allocation on the heap is usually slower
than on the stack.
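The middle ground between these two choices (a small buffer on the
stack that moves to the heap only when a token outgrows it) could be
sketched roughly as follows. This is not the code in read1; the type
token_buf, its functions, and the size 64 are all hypothetical names
and values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { SMALL_BUF_SIZE = 64 };   /* hypothetical; fits most symbol names */

struct token_buf
{
  char small[SMALL_BUF_SIZE];   /* lives in the caller's stack frame */
  char *data;                   /* points at small[] or at a heap block */
  size_t len;
  size_t cap;
};

static void
token_buf_init (struct token_buf *b)
{
  b->data = b->small;
  b->len = 0;
  b->cap = SMALL_BUF_SIZE;
}

static bool
token_buf_on_heap (const struct token_buf *b)
{
  return b->data != b->small;
}

/* Append C, moving to the heap the first time the small buffer fills
   up.  Returns false if memory allocation fails.  */
static bool
token_buf_push (struct token_buf *b, char c)
{
  assert (b->len <= b->cap);
  if (b->len == b->cap)
    {
      size_t new_cap = b->cap * 2;
      char *p;
      if (token_buf_on_heap (b))
        p = realloc (b->data, new_cap);
      else
        {
          p = malloc (new_cap);
          if (p)
            memcpy (p, b->small, b->len);
        }
      if (!p)
        return false;
      b->data = p;
      b->cap = new_cap;
    }
  b->data[b->len++] = c;
  return true;
}

/* Release any heap block and reset B to its small buffer.  */
static void
token_buf_free (struct token_buf *b)
{
  if (token_buf_on_heap (b))
    free (b->data);
  token_buf_init (b);
}
```

With this layout each recursion costs about SMALL_BUF_SIZE bytes of
stack for the buffer, and only long symbols or strings pay for a heap
allocation. The struct must not be copied by value, since data may
point into small[].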
This was not a problem before commit f0a1e9ec because the buffer
was a static variable, so every recursion of read1 reused the same
buffer instead of allocating a new one.
For reference, MAX_ALLOCA is defined in src/lisp.h for
SAFE_ALLOCA, which allocates memory on the stack if its size is less
than MAX_ALLOCA, and on the heap otherwise. The usage of
SAFE_ALLOCA and its preparation macro USE_SAFE_ALLOCA seems
pretty complicated, and I was not able to figure it out.
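The general idea behind such a size-threshold allocator can be
illustrated with a simplified stand-in. The DEMO_* macros below are
hypothetical and are not the definitions from src/lisp.h; they only
mimic the "stack if small, heap otherwise" behavior described above,
support a single heap allocation per function, and assume a system
that provides <alloca.h> (for example glibc).

```c
#include <alloca.h>
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { DEMO_MAX_ALLOCA = 16 * 1024 };  /* same value as in the text */

/* Hypothetical stand-ins, NOT the macros from src/lisp.h.
   DEMO_USE_SAFE_ALLOCA declares the bookkeeping variable,
   DEMO_SAFE_ALLOCA picks stack or heap by size, and
   DEMO_SAFE_FREE releases the heap block, if there is one.  */
#define DEMO_USE_SAFE_ALLOCA  void *demo_heap_block = NULL
#define DEMO_SAFE_ALLOCA(size)                  \
  ((size) < DEMO_MAX_ALLOCA                     \
   ? alloca (size)                              \
   : (demo_heap_block = malloc (size)))
#define DEMO_SAFE_FREE()  free (demo_heap_block)

/* Fill a scratch buffer of N bytes and report whether it came from
   the heap.  */
static bool
scratch_used_heap (size_t n)
{
  DEMO_USE_SAFE_ALLOCA;
  char *buf = DEMO_SAFE_ALLOCA (n);
  assert (buf != NULL);
  memset (buf, 0, n);
  bool on_heap = demo_heap_block != NULL;
  DEMO_SAFE_FREE ();   /* free (NULL) is a no-op for stack buffers */
  return on_heap;
}
```

In this sketch a 100-byte request stays on the stack and a 64KB
request goes to the heap. The allocation has to be a macro rather
than a function, because memory from alloca is released when the
function that called it returns.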
On 07/11/2018 10:46 PM, Sheng Yang (杨圣) wrote:
condition-case was able to catch C
stack overflow before commit f0a1e9ec. I understand that
recovering from C stack overflow is magical and can be tricky,
but emacs is capable of this thanks to all of your efforts. The
only part missing is re-throwing this as a lisp exception, which
should not be as hard as recovering from C stack overflow.
Here is why this feature can be important. When we open a file,
find-file-hook will call many functions, including but not
limited to undo-tree. These functions read additional files
(undo-tree, project file, dir-local, etc.) and perform tasks. To
guard against file corruption and other problems, all reads are
wrapped in some try-catch clause. However, the trust in these
try-catch clauses is let down, and a single file corruption (or
a file that can cause C stack overflow) ruins the whole process
of loading the file with a mysterious message of "Recovered from C
stack overflow". I don't think this is acceptable.
From a lisp programmer's perspective, if exceptions should
occur, they should be caught. This is exactly the behavior that
condition-case and other try-catch clauses promise.
I am not an expert in C, and debugging the C part of Emacs can be
painful for me. Therefore I bisected and found the offending
commits (see my original bug report). I hope this can help you
pinpoint the problem and fix the bug.
On 07/11/2018 02:48 PM, Noam Postavsky wrote:
retitle 31995 Condition-case can't catch C stack overflow
tags 31995 + wontfix
quit
Sheng Yang (杨圣) <yangsheng6810@gmail.com> writes:
It seems that the function call ~(read (current-buffer))~ causes C stack
overflow. Though I personally believe the undo-tree file is not
corrupted, I assume this error should be caught by condition-case even
if the file to read is indeed corrupted.
The file is not corrupted, it's just that the recursion goes too deep
during reading. However, I don't think condition-case can reasonably
catch C stack overflow. As it is, recovering from C stack overflow at
all is a bit controversial, which is why we have the
attempt-stack-overflow-recovery variable which you can set to nil in
order to reliably segfault instead.
--
Sheng Yang(杨圣)
PhD student
Computer Science Department
University of Maryland, College Park
E-mail:yangsheng6810@gmail.com