From: Gemini Lasswell <gazally@runbox.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 33014@debbugs.gnu.org, schwab@linux-m68k.org
Subject: bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
Date: Fri, 19 Oct 2018 12:32:32 -0700
Message-ID: <87ftx1bulr.fsf@runbox.com>
In-Reply-To: <87woqebx9v.fsf@runbox.com> (Gemini Lasswell's message of "Thu, 18 Oct 2018 17:22:36 -0700")
Gemini Lasswell <gazally@runbox.com> writes:
> I set up a single-threaded situation where I could redefine a function
> while exec_byte_code was running it, and got a segfault. I've gained
> some insights from debugging this version of the bug which I will put
> into a separate email.
Here's a gdb transcript going through the single-threaded version of
this bug. It uses the file 'repro.el', attached at the end of this
message, which is the same as the one in my last message.
Start gdb with a breakpoint at Fredraw_display:
$ gdb --args ./emacs -Q
...
(gdb) b Fredraw_display
(gdb) r
In Emacs, find the file repro.el and load it with byte-compile-file,
then go back to *scratch* and run my-loop:
C-x C-f repro.el RET
C-u M-x byte-compile-file RET repro.el RET
C-x b RET
M-x my-loop RET
This gets me to the gdb prompt, at a point in execution where the next
function called will be my-loop-1, so I set a breakpoint in
funcall_lambda, where I can see the bytecode object for my-loop-1 (I
edited out the bytestring):
Thread 1 "emacs" hit Breakpoint 3, Fredraw_display () at dispnew.c:3027
3027 {
(gdb) br funcall_lambda
Breakpoint 4 at 0x5cdb00: file eval.c, line 3016.
(gdb) c
Continuing.
Thread 1 "emacs" hit Breakpoint 4, funcall_lambda (fun=XIL(0x31c0235),
nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x7fffffff01c0)
at eval.c:3016
3016 {
(gdb) clear
Deleted breakpoint 4
(gdb) p fun
$1 = XIL(0x1630fc5)
(gdb) pr
#[0 "..." [my-var 0 "Now in recursive edit
" recursive-edit format "Leaving recursive edit: %s
" (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3] 6]
Then I skip ahead into exec_byte_code:
(gdb) br exec_byte_code
Breakpoint 5 at 0x611bb0: file bytecode.c, line 342.
(gdb) c
Continuing.
Thread 1 "emacs" hit Breakpoint 5, exec_byte_code (bytestr=XIL(0x3571d24),
vector=XIL(0x31c0195), maxdepth=make_number(4),
args_template=args_template@entry=XIL(0), nargs=nargs@entry=0,
args=args@entry=0x0) at bytecode.c:342
342 {
Here's what's in the register $rbp, and the constants vector:
(gdb) clear
Deleted breakpoint 5
(gdb) p $rbp
$2 = (void *) 0xb0201
(gdb) pr
#<INVALID_LISP_OBJECT 0x000b0201>
(gdb) p vector
$3 = XIL(0x1630f35)
(gdb) pr
[my-var 0 "Now in recursive edit
" recursive-edit format "Leaving recursive edit: %s
" (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3]
Skip ahead, to get to where exec_byte_code has a value for vectorp:
(gdb) n 12
366 USE_SAFE_ALLOCA;
(gdb) p vectorp
$4 = (Lisp_Object *) 0x1630f38 <bss_sbrk_buffer+9164248>
(gdb) p *vectorp
$5 = XIL(0x2327d80)
(gdb) pr
my-var
(gdb) break mark_vectorlike if ptr->contents == $4
Breakpoint 6 at 0x5ad400: file alloc.c, line 6036.
(gdb) c
Continuing.
The idea is to break when garbage collection finds the constants vector.
(I first tried setting a conditional breakpoint in mark_object, which
made garbage collection either hang or take more time than I had
patience for.)
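For readers following the addresses: the breakpoint condition works
because the tagged value gdb prints as XIL(0x1630f35), the struct
pointer 0x1630f30 that mark_vectorlike receives, and the contents
pointer 0x1630f38 in $4 all refer to the same vector. The sketch below
is a minimal model of that arithmetic, not Emacs's own code. It assumes
a 64-bit build with a 3-bit tag in the low bits, tag 4 for strings, tag
5 for vectorlike objects, and a one-word vector header; these constants
are inferred from the values in this transcript, and the names tag_of,
untag and vector_contents are hypothetical.

```c
#include <stdint.h>

/* Assumed layout, inferred from the transcript: 3 tag bits stored in
   the low bits of each Lisp_Object word.  */
enum { TAG_BITS = 3, TAG_MASK = (1 << TAG_BITS) - 1 };

/* Assumed tag values; only the ones visible in the transcript.  */
enum { TAG_SYMBOL = 0, TAG_STRING = 4, TAG_VECTORLIKE = 5 };

/* Extract the type tag from a tagged word.  */
static unsigned
tag_of (uintptr_t word)
{
  return (unsigned) (word & TAG_MASK);
}

/* Clear the tag bits to recover the object's address.  */
static uintptr_t
untag (uintptr_t word)
{
  return word & ~(uintptr_t) TAG_MASK;
}

/* Address of the first element of a vector, assuming an 8-byte header
   precedes the contents.  */
static uintptr_t
vector_contents (uintptr_t word)
{
  return untag (word) + 8;
}
```

Under those assumptions, untag (0x1630f35) gives 0x1630f30 and
vector_contents (0x1630f35) gives 0x1630f38, matching ptr and $4. The
same model explains why gdb rejects the $rbp value 0xb0201: its low
bits are 1, and whatever type that tag selects, the word does not point
at a valid object.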
In Emacs type C-x b RET. This causes a gc and a breakpoint hit:
Thread 1 "emacs" hit Breakpoint 6, mark_vectorlike (ptr=0x31c0190) at alloc.c:6036
6036 eassert (!VECTOR_MARKED_P (ptr));
(gdb) bt 20
#0 mark_vectorlike (ptr=0x1630f30 <bss_sbrk_buffer+9164240>) at alloc.c:6036
#1 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#2 0x00000000005ad45e in mark_vectorlike (
ptr=0x1611fd0 <bss_sbrk_buffer+9037424>) at alloc.c:6046
#3 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#4 0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
#5 0x00000000005acae4 in mark_object (arg=...) at alloc.c:6434
#6 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a8e00 <bss_sbrk_buffer+8606880>) at alloc.c:6046
#7 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a9c30 <bss_sbrk_buffer+8610512>) at alloc.c:6046
#8 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#9 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a7c30 <bss_sbrk_buffer+8602320>) at alloc.c:6046
#10 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#11 0x00000000005ad45e in mark_vectorlike (
ptr=0x15a6e80 <bss_sbrk_buffer+8598816>) at alloc.c:6046
#12 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#13 0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
#14 0x00000000005acaa5 in mark_object (arg=...) at alloc.c:6431
#15 0x00000000005ad45e in mark_vectorlike (
ptr=0x15fbed0 <bss_sbrk_buffer+8947056>) at alloc.c:6046
#16 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#17 0x00000000005ad45e in mark_vectorlike (
ptr=0x15fbf50 <bss_sbrk_buffer+8947184>) at alloc.c:6046
#18 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
#19 0x00000000005ad45e in mark_vectorlike (
ptr=0x15fcc80 <bss_sbrk_buffer+8950560>) at alloc.c:6046
(More stack frames follow...)
Lisp Backtrace:
"Automatic GC" (0x0)
"eldoc-pre-command-refresh-echo-area" (0xfffefbb0)
"recursive-edit" (0xfffeffd8)
"my-loop-1" (0xffff0250)
"my-loop" (0xffff0650)
"funcall-interactively" (0xffff0648)
"call-interactively" (0xffff07d0)
"command-execute" (0xffff0ab8)
"execute-extended-command" (0xffff0ea0)
"funcall-interactively" (0xffff0e98)
"call-interactively" (0xffff11d0)
"command-execute" (0xffff1488)
There are 279 frames in the backtrace, and mark_stack and mark_memory
aren't there. So I'm guessing the constants vector is getting found via
the function definition of 'my-loop-1'. Keep going:
(gdb) c
Continuing.
Now in Emacs do this:
M-x eval-buffer RET
C-x b RET
M-x my-gc RET
Execution does not stop at the breakpoint. In Emacs type C-M-c.
Result:
Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffd8,
message=<optimized out>) at editfns.c:3129
3129 unsigned char format_char = *format++;
What's happened to the constants vector and its contents?
(gdb) p $3
$6 = XIL(0x1630f35)
(gdb) pr
#<INVALID_LISP_OBJECT 0x01630f35>
(gdb) p *$4
$7 = XIL(0x2327d80)
(gdb) pr
my-var
(gdb) p *($4+5)
$8 = XIL(0x359a6f4)
(gdb) pr
#<INVALID_LISP_OBJECT 0x0359a6f4>
(gdb) p *($4+4)
$9 = XIL(0x6390)
(gdb) pr
format
It looks like the constants vector was freed. Its slots haven't been
overwritten (yet), but the format string in slot 5 has also been freed,
which leads to the crash in styled_format.
While I was developing this method of reproducing this bug, I went
through this exercise without lexical-binding set in repro.el. In that
version, the register $rbp when exec_byte_code is called contains the
bytecode Lisp_Object (instead of the non-Lisp-object value it contains
in the transcript above), and the first thing exec_byte_code does is
save it on the stack (presumably because the System V AMD64 ABI calling
convention says that called functions which use $rbp should save and
restore it).
Here's the beginning of the disassembly of exec_byte_code from
"objdump -S bytecode.o":
0000000000000020 <exec_byte_code>:
executing BYTESTR. */
Lisp_Object
exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
Lisp_Object args_template, ptrdiff_t nargs, Lisp_Object *args)
{
20: 55 push %rbp
21: 48 89 e5 mov %rsp,%rbp
24: 41 57 push %r15
26: 41 56 push %r14
28: 41 55 push %r13
2a: 41 54 push %r12
2c: 49 89 ce mov %rcx,%r14
2f: 53 push %rbx
So in the non-lexical-binding case, the first instruction in
exec_byte_code writes the bytecode Lisp_Object to the stack. Then,
during the execution of 'my-gc', the breakpoint in mark_vectorlike stops
at a point with a much shorter backtrace, one that includes mark_stack
and mark_memory, with mark_memory's pp pointing to the location on the
stack where $rbp was written. The bytecode object and constants vector
are consequently not freed, and no segfault happens.
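The difference between the two cases can be sketched with a toy
collector. This is a deliberately simplified model, not the code in
alloc.c: it has one precise root standing in for the function cell of
'my-loop-1', plus a conservative scan of an array standing in for the C
stack. All names (toy_object, toy_mark_memory, toy_collect) are
hypothetical, and unlike the real mark_memory it only recognizes exact,
untagged addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap object: MARKED is the GC mark bit, FREED records a sweep.  */
struct toy_object
{
  bool marked;
  bool freed;
};

/* Conservative scan: any word in WORDS[0..N) that equals the address
   of a heap object marks that object.  */
static void
toy_mark_memory (const uintptr_t *words, size_t n,
                 struct toy_object **heap, size_t heap_len)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < heap_len; j++)
      if (words[i] == (uintptr_t) heap[j])
        heap[j]->marked = true;
}

/* Sweep: free every unmarked object and clear the survivors' marks.  */
static void
toy_sweep (struct toy_object **heap, size_t heap_len)
{
  for (size_t j = 0; j < heap_len; j++)
    {
      if (!heap[j]->marked)
        heap[j]->freed = true;
      heap[j]->marked = false;
    }
}

/* Run one collection over a heap containing only OBJ.  FUNCTION_CELL
   is the precise root (NULL once the function has been redefined);
   STACK is the conservatively scanned memory.  Returns true if OBJ
   survived.  */
static bool
toy_collect (struct toy_object *obj, struct toy_object *function_cell,
             const uintptr_t *stack, size_t stack_len)
{
  struct toy_object *heap[] = { obj };
  if (function_cell)
    function_cell->marked = true;
  toy_mark_memory (stack, stack_len, heap, 1);
  toy_sweep (heap, 1);
  return !obj->freed;
}
```

In this model the object survives while the function cell still points
to it, and it also survives after redefinition if its address happens
to be in the scanned stack words (the non-lexical-binding case). It is
freed only when both references are gone, which corresponds to the
lexical-binding transcript above.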
I don't follow everything going on in the disassembly of funcall_lambda.
But I did figure out (by comparison with a debug session in the
multithreaded situation) that the value in $rbp when funcall_lambda
calls exec_byte_code depends on which code path is taken after the test
of whether the first element of the bytecode object vector (the "args
template", as funcall_lambda's comment calls it) is an integer. That in
turn depends on whether my-loop-1 was compiled with lexical-binding on.
Here is 'repro.el':
;;; -*- lexical-binding: t -*-
(defvar my-var "ok")
(defun my-loop-1 ()
  (let ((val 0))
    (while t
      (insert "Now in recursive edit\n")
      (recursive-edit)
      (insert (format "Leaving recursive edit: %s\n" my-var))
      (let ((things '(a b c d e)))
        (cond ;
         ((= val 0) (message "foo: %s" (last things)))
         ((= val 1) (message "bar: %s" things))
         ((= val 2) (message "baz: %s" (car things)))
         (t (message "bop: %s" (nth 2 things))))
        (setq val (mod (1+ val) 3))))))
(defun my-loop ()
  (interactive)
  (redraw-display)
  (my-loop-1))
(defun my-gc-1 ()
  (garbage-collect))
(defun my-gc ()
  (interactive)
  (my-gc-1))
(provide 'repro)