bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
@ 2018-10-11  5:30 Gemini Lasswell
  2018-10-12  8:12 ` Eli Zaretskii
  2018-10-31  4:49 ` Paul Eggert
  0 siblings, 2 replies; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-11  5:30 UTC (permalink / raw)
  To: 33014

[-- Attachment #1: Type: text/plain, Size: 684 bytes --]

When I run some byte-compiled code which creates some threads, and then,
while a thread is blocked, interactively evaluate the function which
was used to create that thread, Emacs has a fatal error or segmentation
fault when the thread becomes unblocked.

To reproduce:

  Build Emacs from master with this patch, in which I've pasted some
  excerpts from my current project onto the end of lisp/thread.el.  It's
  going to be like ERT but designed to run benchmarks instead of tests,
  but right now all it does is to create a buffer and three threads, set
  the threads up to communicate with each other, log their progress to
  *Messages*, and update the buffer when they finish:


[-- Attachment #2: 0001-Reproduce-Bswitch-segfault.patch --]
[-- Type: text/plain, Size: 17777 bytes --]

From 4b9e3ccb2376ea532051d5ff7d2d308df56576c9 Mon Sep 17 00:00:00 2001
From: Gemini Lasswell <gazally@runbox.com>
Date: Tue, 9 Oct 2018 09:27:54 -0700
Subject: [PATCH] Reproduce Bswitch segfault

---
 lisp/thread.el | 454 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 454 insertions(+)

diff --git a/lisp/thread.el b/lisp/thread.el
index 7974a2603c..9c4774e961 100644
--- a/lisp/thread.el
+++ b/lisp/thread.el
@@ -196,5 +196,459 @@ thread-list--name
       (and (eq thread main-thread) "Main")
       (prin1-to-string thread)))
 
+;;; Thread-safe messages
+
+(cl-defstruct
+    (thread--message
+     (:constructor
+      thread-make-message (&optional name
+                                     &aux
+                                     (mutex (make-mutex name))
+                                     (condition
+                                      (make-condition-variable mutex name)))))
+  name mutex value condition)
+
+(defun thread-message-available-p (message)
+  "Return the value of MESSAGE."
+  (thread--message-value message))
+
+(defun thread-message-send (message value)
+  "Set the VALUE of MESSAGE, and awaken all threads waiting for it."
+  (with-mutex (thread--message-mutex message)
+    (setf (thread--message-value message) value)
+    (condition-notify (thread--message-condition message) t)))
+
+(defun thread-message-cancel (message)
+  "Cancel MESSAGE by setting its value to nil."
+  (with-mutex (thread--message-mutex message)
+    (setf (thread--message-value message) nil)))
+
+(defun thread-message-wait (message &optional cancel)
+  "If MESSAGE's value is nil, block until it is set to something else.
+Return the value of MESSAGE.  If CANCEL is non-nil, clear MESSAGE
+by setting its value to nil.  If multiple threads are waiting on
+the same message, and all pass a non-nil CANCEL, then only one
+thread will unblock and receive the message's value, and the
+others will continue to block."
+  (with-mutex (thread--message-mutex message)
+    (while (not (thread--message-value message))
+      (condition-wait (thread--message-condition message)))
+    (let ((value (thread--message-value message)))
+      (when cancel
+        (setf (thread--message-value message) nil))
+      value)))
+
+;;; Thread-safe queues
+
+(cl-defstruct (thread--queue
+               (:constructor
+                thread-make-queue (&optional
+                                   size-limit
+                                   type
+                                   &aux
+                                   (fifo (eq type 'fifo))
+                                   (limit (when (natnump size-limit) size-limit))
+                                   (mutex (make-mutex))
+                                   (not-full (make-condition-variable mutex))
+                                   (not-empty (make-condition-variable mutex)))))
+  fifo
+  limit
+  items
+  mutex
+  not-full
+  not-empty)
+
+(defun thread-queue-empty-p (queue)
+  "Return non-nil if QUEUE is empty.
+There is no guarantee that QUEUE will contain the same number of
+items the next time you access it."
+  (with-mutex (thread--queue-mutex queue)
+    (null (thread--queue-items queue))))
+
+(defun thread-queue-full-p (queue)
+  "Return non-nil if QUEUE is full.
+There is no guarantee that QUEUE will contain the same number of
+items the next time you access it."
+  (when (thread--queue-limit queue)
+    (with-mutex (thread--queue-mutex queue)
+      (= (length (thread--queue-items queue)) (thread--queue-limit queue)))))
+
+(defun thread-queue-length (queue)
+  "Return the number of items in QUEUE.
+There is no guarantee that QUEUE will contain the same number of
+items the next time you access it."
+  (with-mutex (thread--queue-mutex queue)
+    (length (thread--queue-items queue))))
+
+(defun thread-queue-remove-all (queue)
+  "Discard any items in QUEUE."
+  (with-mutex (thread--queue-mutex queue)
+    (setf (thread--queue-items queue) nil)
+    (condition-notify (thread--queue-not-full queue))))
+
+(defun thread-queue-put (item queue)
+  "Put ITEM into QUEUE.
+If QUEUE was created with a size limit, and already contains that many items,
+block until one is removed."
+  (with-mutex (thread--queue-mutex queue)
+    (while (and (thread--queue-limit queue)
+                (= (length (thread--queue-items queue)) (thread--queue-limit queue)))
+      (condition-wait (thread--queue-not-full queue)))
+    (if (thread--queue-fifo queue)
+        (setf (thread--queue-items queue)
+              (nconc (thread--queue-items queue) (list item)))
+      (push item (thread--queue-items queue)))
+    (condition-notify (thread--queue-not-empty queue))))
+
+(defun thread-queue-get (queue)
+  "Remove an item from QUEUE and return it.
+If there are no items in QUEUE, block until one is added."
+  (with-mutex (thread--queue-mutex queue)
+    (while (null (thread--queue-items queue))
+      (condition-wait (thread--queue-not-empty queue)))
+    (let ((item (pop (thread--queue-items queue))))
+      (condition-notify (thread--queue-not-full queue))
+      item)))
+
+;;; Bswitch bug
+
+(require 'seq)
+
+(defvar erb--job (thread-make-message)
+  "This contains all the information needed about what benchmark job to run.
+It is created by `erb-run-start' and cleared when the benchmark job is
+finished by `erb-run--benchmarker'.")
+
+(defvar erb--status nil
+  "The state of the benchmark runner.
+Possible values are `idle', `building', `benchmarking' and
+`done'.  Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-commits nil
+  "The commits being processed by the benchmark runner.")
+
+(defvar erb--status-waiting-to-build nil
+  "The list of commits that have not yet started building.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-building nil
+  "The list of commits that are currently being built.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-built nil
+  "The list of commits that have been built.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-failed-builds nil
+  "The list of commits that failed to build.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-waiting-to-benchmark nil
+  "The commits that are built and waiting to be benchmarked.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-benchmarking nil
+  "The commit that is currently being benchmarked.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-benchmark-failures nil
+  "The commits that had task failures during benchmarking.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb--status-finished nil
+  "The commits that are finished benchmarking.
+Maintained by `erb-run--monitor' and used to update the
+`erb-run' buffer.")
+
+(defvar erb-run--cancel-request (thread-make-message)
+  "Signal the user's request that a benchmark job be stopped.
+Cleared when the benchmark job is cleaned up, by
+`erb-run-benchmarker'.")
+
+
+;;; ERB Run Benchmarks mode
+
+(defvar erb-run-refresh-seconds 0.2
+  "Delay between updates of `erb-run' buffers.")
+
+;; Options settable in the erb-run-mode buffer.
+(defvar-local erb-run--commit-range '("commitA" "commitB" "commitC"))
+(defvar-local erb-run--count-to-select nil)
+
+(defvar erb-run-mode-map
+  (let ((map (copy-keymap special-mode-map)))
+    (set-keymap-parent map button-buffer-map)
+    (define-key map "n" 'next-line)
+    (define-key map "p" 'previous-line)
+    (define-key map "s" 'erb-run-start)
+    (define-key map "c" 'erb-run-cancel)
+    (define-key map [follow-link] 'mouse-face)
+    (define-key map [mouse-2] 'mouse-select-window)
+
+    map)
+  "Local keymap for `erb-run-mode' buffers.")
+
+(define-derived-mode erb-run-mode special-mode "ERB-run"
+  "Mode for configuring and running benchmarks.
+\\<erb-run-mode-map>
+ERB is documented in info node `(erb)'."
+  :group 'erb-mode
+  (setq-local revert-buffer-function #'erb-run-revert-buffer))
+
+;; TODO make only one buffer?
+(defun erb-run-generate-new-buffer ()
+  (let* ((name (format "*ERB-run: %s*" (file-name-nondirectory
+                                        (directory-file-name default-directory))))
+         (buffer (generate-new-buffer name)))
+    (with-current-buffer buffer
+      (erb-run-mode)
+      (run-at-time erb-run-refresh-seconds nil
+                   #'erb-run--timer-func buffer)
+      (add-to-list 'uniquify-list-buffers-directory-modes 'erb-run-mode))
+    buffer))
+
+;; TODO autoload is just for now
+;;;###autoload
+(defun erb-summary-run ()
+  "Switch to or create an `erb-run-mode' buffer for running benchmarks."
+  (interactive)
+  (let* ((dir default-directory)
+         (buffer (or (seq-find (lambda (buf)
+                                 (with-current-buffer buf
+                                   (eq major-mode 'erb-run-mode)))
+                               (buffer-list))
+                     (erb-run-generate-new-buffer))))
+    (switch-to-buffer buffer)
+    (erb-run-revert-buffer buffer)))
+
+(defun erb-run-revert-buffer (&rest _ignored)
+  (let ((inhibit-read-only t))
+    (erase-buffer)
+    ;; (erb--update-config-cache)
+    (insert
+     (format "Commit range: %s\n" erb-run--commit-range)
+     (format "Number to select: %s\n" (if erb-run--count-to-select
+                                          erb-run--count-to-select "All"))
+     (if (memq erb--status '(idle done))
+         "\nStart running benchmarks\n"
+       "\nCancel\n")
+
+     "\n"
+     (format "Benchmarked: %s\n" (if erb--status-finished
+                                     (length erb--status-finished) ""))
+     (format "Build Failures: %s\n" (if (not (eq erb--status 'idle))
+                                        (length erb--status-failed-builds) ""))
+     (format "Task Failures: %s\n" (if (not (eq erb--status 'idle))
+                                       (length erb--status-benchmark-failures) ""))
+     "Total: %s/%s\n\n"
+
+     "Started at: \n"
+     (cl-case erb--status
+       (building "Building.")
+       (benchmarking "Benchmarking.")
+       (idle "Ready.")
+       (done "Finished.")
+       (t (format "erb--status=%s" erb--status)))
+     "\nFinished at: \n\n")
+
+    (unless (eq erb--status 'idle)
+      (dolist (commit erb--status-commits)
+        (insert
+         (cond
+          ((memq commit erb--status-finished)           ".")
+          ((memq commit erb--status-building)           "B")
+          ((memq commit erb--status-benchmark-failures) "F")
+          ((memq commit erb--status-failed-builds)      "E")
+          ((memq commit erb--status-benchmarking)       "@")
+          ((memq commit erb--status-built)              "b")
+          (t "w"))))
+      (insert "\n"))
+
+    (set-buffer-modified-p nil)))
+
+(defun erb-run--timer-func (buffer)
+  "Revert BUFFER and set a timer to do it again."
+  (when (buffer-live-p buffer)
+    (with-current-buffer buffer
+      (revert-buffer))
+    (run-at-time erb-run-refresh-seconds nil
+                 #'erb-run--timer-func buffer)))
+
+(defun erb-run-start ()
+  "Start running benchmarks."
+  (interactive)
+  (when (thread-message-available-p erb--job)
+    (user-error "Benchmarks are already running"))
+
+  (erb--start-benchmark-controller-thread)
+  (erb--start-builder-threads)
+  (erb--start-benchmark-monitor-thread)
+
+  (unless erb-run--commit-range
+    (user-error "Choose a commit or range of commits to benchmark"))
+
+  (erb--status-clear)
+  (thread-message-send erb--job erb-run--commit-range))
+
+(defun erb-run-cancel ()
+  "Stop running benchmarks."
+  (interactive)
+  (erb--status-clear))
+
+;;; The benchmark runner: the controller thread
+
+(defvar erb-simultaneous-build-count 1
+  "The number of builds to run simultaneously.")
+
+(defvar erb--builders nil
+  "The list of threads which have been created to run builds.")
+
+(defvar erb--unbuilt-commits (thread-make-queue)
+  "A thread-safe queue of commits waiting to be built.")
+(defvar erb--built-commits (thread-make-queue)
+  "A thread-safe queue of commits which have been built.")
+
+(defvar erb--benchmark-controller nil)
+
+(defun erb--start-benchmark-controller-thread ()
+  "Start the benchmark controller thread if it is not already started."
+  (unless erb--benchmark-controller
+    (setq erb--benchmark-controller
+          (make-thread #'erb--benchmark-control-func "control"))))
+
+(defun erb--benchmark-control-func ()
+  "Process benchmark jobs.
+Watch for incoming jobs arriving by a thread-safe message in
+`erb--job'.  When a job becomes available, build and
+benchmark all the commits and then clear the message."
+  (while t
+    (condition-case err
+        (let* ((job (thread-message-wait erb--job))
+               (count (length job))
+               builds)
+
+          ;; First, do all the builds.  There may be more than one
+          ;; builder thread.
+          (erb--status-change 'erb--status 'building)
+          (dolist (commit job)
+            (erb--status-add commit 'erb--status-waiting-to-build)
+            (thread-queue-put commit erb--unbuilt-commits))
+
+          (while (> count 0)
+            (let ((build (thread-queue-get erb--built-commits)))
+              (push build builds))
+            (cl-decf count))
+
+          (erb--status-change 'erb--status 'benchmarking)
+          ;; Then benchmark the build results, one at a time.
+
+          (erb--status-change 'erb--status 'done)
+          (thread-message-cancel erb--job))
+      ((error quit) (message "Error in ERB benchmark control thread: %s" err)))))
+
+;;; The benchmark runner: the builder threads
+
+;; TODO  Watch for a signal to shut down
+
+(defun erb--start-builder-threads ()
+  "Create the desired number of commit-building threads.
+Get the number from `erb-run-simultaneous-build-count'.
+TODO: adjust the number down as well as up."
+  (let ((needed (- erb-simultaneous-build-count
+                   (length erb--builders))))
+    (while (> needed 0)
+      (push (make-thread #'erb--builder-func
+                         (format "builder %s" (length erb--builders)))
+            erb--builders)
+      (cl-decf needed))))
+
+(defun erb--builder-func ()
+  "Build commits from `erb-run--commmits-to-build'."
+  (while t
+    (condition-case err
+        (let ((commit (thread-queue-get erb--unbuilt-commits)))
+          (message "building %s" commit)
+          (erb--status-remove commit 'erb--status-waiting-to-build)
+          (erb--status-add commit 'erb--status-building)
+
+          (unwind-protect
+              (sleep-for (random 10))
+
+            (erb--status-remove commit 'erb--status-building)
+            (erb--status-add commit 'erb--status-built)
+            (thread-queue-put commit erb--built-commits)))
+
+      ((error quit) (message "Error in ERB benchmark build thread: %s" err)))))
+
+;;; The benchmark runner: the status monitor thread
+
+(defvar erb--benchmark-monitor nil
+  "The thread which keeps track of build and benchmark job status.")
+(defvar erb--status-updates (thread-make-queue nil 'fifo)
+  "A thread-safe queue of status updates yet to be processed.")
+
+(defun erb--status-clear ()
+  (thread-queue-put '(clear) erb--status-updates))
+(defun erb--status-change (symbol value)
+  (thread-queue-put `(change ,symbol ,value) erb--status-updates))
+(defun erb--status-add (value symbol)
+  (thread-queue-put `(add ,value ,symbol) erb--status-updates))
+(defun erb--status-remove (value symbol)
+  (thread-queue-put `(remove ,value ,symbol) erb--status-updates))
+
+(defun erb--start-benchmark-monitor-thread ()
+  "Start the benchmark monitor thread if it is not already started."
+  (unless erb--benchmark-monitor
+    (setq erb--benchmark-monitor
+          (make-thread #'erb--benchmark-monitor-func "monitor"))))
+
+(defvar erb--logging t)
+(defun erb--benchmark-monitor-func ()
+  "Process benchmark job status update.
+Collect status update requests from `erb--status-updates'
+and update the various global variables accordingly."
+  (while t
+    (condition-case err
+        (pcase (thread-queue-get erb--status-updates)
+          (`(clear)
+           (when erb--logging (message "** Clear **"))
+           (erb--clear-status))
+          (`(add ,value ,variable)
+           (when erb--logging
+             (message "** Add %s to %s **" value variable))
+           (push value (symbol-value variable)))
+          (`(remove ,value ,variable)
+           (when erb--logging
+             (message "** Remove %s from %s **" value variable))
+           (setf (symbol-value variable)
+                 (delq value (symbol-value variable))))
+          (`(change ,variable ,value)
+           (when erb--logging
+             (message "** Set %s to %s **" variable value))
+           (setf (symbol-value variable) value))
+          (update (error "Unrecognized status update: %s" update)))
+      ((error quit) (message "Error in ERB benchmark control thread: %s" err)))))
+
+(defun erb--clear-status ()
+  "Reset all the ERB benchmarking status variables to their initial state."
+  (setq erb--status                      'idle
+        erb--status-waiting-to-build     nil
+        erb--status-building             nil
+        erb--status-built                nil
+        erb--status-failed-builds        nil
+        erb--status-waiting-to-benchmark nil
+        erb--status-benchmarking         nil
+        erb--status-finished             nil))
+
+
+
 (provide 'thread)
 ;;; thread.el ends here
-- 
2.16.4


[-- Attachment #3: Type: text/plain, Size: 5589 bytes --]


  Run Emacs with -Q, and then type:
    M-x erb-summary-run RET
    s

  Wait several seconds for the second to the last line in the buffer to
  change to "Finished".  There will also be an echo area message about
  erb--status being set to done.

  Navigate to lisp/thread.el, select everything from the definition of
  erb--benchmark-monitor to the end of the file, and use:
    M-x eval-region RET

  Return to the buffer created by erb-summary-run, and type 's' again.

Result:

lisp.h:2241: Emacs fatal error: assertion failed: HASH_TABLE_P (a)

My suspicion is that the garbage collector is freeing something needed
by the blocked thread.  Setting gc-cons-threshold to 500M before doing
the steps above stops the error from happening.

Here's the backtrace.  While trying to sort out how to reproduce this, I
also saw it segfault in Ffuncall, in styled_format, and in the Bswitch
case of exec_byte_code just past where this error occurs, when it tries
to access h->count.

Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
#0  terminate_due_to_signal (sig=sig@entry=6,
    backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
#1  0x00000000005a4d99 in die (msg=msg@entry=0x678d52 "HASH_TABLE_P (a)",
    file=file@entry=0x6768a5 "lisp.h", line=line@entry=2241) at alloc.c:7094
#2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
#3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=...,
    nargs=nargs@entry=0, args=<optimized out>,
    args@entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:1403
#4  0x00000000005cb972 in funcall_lambda (fun=..., nargs=nargs@entry=0,
    arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>,
    arg_vector@entry=0x158ec58 <bss_sbrk_buffer+8500664>) at eval.c:3057
#5  0x00000000005c818b in Ffuncall (nargs=nargs@entry=1,
    args=args@entry=0x158ec50 <bss_sbrk_buffer+8500656>) at eval.c:2870
#6  0x000000000064443b in invoke_thread_function () at thread.c:684
#7  0x00000000005c728f in internal_condition_case (
    bfun=bfun@entry=0x644400 <invoke_thread_function>, handlers=...,
    handlers@entry=XIL(0xc3c0), hfun=hfun@entry=0x644320 <record_thread_error>)
    at eval.c:1373
#8  0x0000000000644dd1 in run_thread (state=0x158ec30 <bss_sbrk_buffer+8500624>)
    at thread.c:723
#9  0x00007f1cebf602a7 in start_thread ()
   from /nix/store/hwwqshlmazzjzj7yhrkyjydxamvvkfd3-glibc-2.26-131/lib/libpthread.so.0
#10 0x00007f1ceb5fd57f in clone ()
   from /nix/store/hwwqshlmazzjzj7yhrkyjydxamvvkfd3-glibc-2.26-131/lib/libc.so.6

Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
"erb--benchmark-monitor-func" (0x158ec58)


In GNU Emacs 27.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.28)
 of 2018-10-09 built on sockeye
Repository revision: 708444efad7a2ce1e309532898b844527e2d9c64
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: NixOS 18.03.git.bd06547 (Impala)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --prefix=/home/gem/src/emacs/master/bin --with-modules
 --with-x-toolkit=gtk3 --with-xft --config-cache
 --enable-checking=yes,glyphs --enable-check-lisp-object-type'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND DBUS GSETTINGS GLIB NOTIFY LIBSELINUX
GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM
MODULES THREADS GMP

Important settings:
  value of $EMACSLOADPATH:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822 mml
easymenu mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs time-date mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib sendmail
rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads dbusbind
inotify dynamic-setting system-font-setting font-render-setting
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 94967 9472)
 (symbols 48 20045 1)
 (strings 32 28456 1769)
 (string-bytes 1 816313)
 (vectors 16 14265)
 (vector-slots 8 504082 12268)
 (floats 8 47 70)
 (intervals 56 213 0)
 (buffers 992 11))

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-11  5:30 bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function Gemini Lasswell
@ 2018-10-12  8:12 ` Eli Zaretskii
  2018-10-12 20:02   ` Gemini Lasswell
  2018-10-31  4:49 ` Paul Eggert
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-12  8:12 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014

> From: Gemini Lasswell <gazally@runbox.com>
> Date: Wed, 10 Oct 2018 22:30:29 -0700
> 
> When I run some byte-compiled code which creates some threads, and then,
> while a thread is blocked, interactively evaluate the function which
> was used to create that thread, Emacs has a fatal error or segmentation
> fault when the thread becomes unblocked.

Can you please make a smaller stand-alone test case, which doesn't
require patching Emacs?  That will make it much easier to try
reproducing the problem.

> Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
> #0  terminate_due_to_signal (sig=sig@entry=6,
>     backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
> #1  0x00000000005a4d99 in die (msg=msg@entry=0x678d52 "HASH_TABLE_P (a)",
>     file=file@entry=0x6768a5 "lisp.h", line=line@entry=2241) at alloc.c:7094
> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
> #3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=...,
>     nargs=nargs@entry=0, args=<optimized out>,
>     args@entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:1403
> #4  0x00000000005cb972 in funcall_lambda (fun=..., nargs=nargs@entry=0,
>     arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>,
>     arg_vector@entry=0x158ec58 <bss_sbrk_buffer+8500664>) at eval.c:3057
> #5  0x00000000005c818b in Ffuncall (nargs=nargs@entry=1,
>     args=args@entry=0x158ec50 <bss_sbrk_buffer+8500656>) at eval.c:2870
> #6  0x000000000064443b in invoke_thread_function () at thread.c:684
> #7  0x00000000005c728f in internal_condition_case (
>     bfun=bfun@entry=0x644400 <invoke_thread_function>, handlers=...,
>     handlers@entry=XIL(0xc3c0), hfun=hfun@entry=0x644320 <record_thread_error>)
>     at eval.c:1373
> #8  0x0000000000644dd1 in run_thread (state=0x158ec30 <bss_sbrk_buffer+8500624>)

Can you show the Lisp backtrace of this thread?  Also, what is the
offending object 'a' in this frame:

> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241

and what was its parent object in the calling frame?

Thanks.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-12  8:12 ` Eli Zaretskii
@ 2018-10-12 20:02   ` Gemini Lasswell
  2018-10-13  6:23     ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-12 20:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014

Eli Zaretskii <eliz@gnu.org> writes:

> Can you please make a smaller stand-alone test case, which doesn't
> require patching Emacs?  That will make it much easier to try
> reproducing the problem.

I've tried to do that without success.  The bug won't reproduce if I put
all the code added to thread.el by the patch into its own file and load
it with C-u M-x byte-compile-file, and it also doesn't work to put the
resulting .elc on my load-path and load it with require.

I've determined today that having -O2 in CFLAGS is necessary to
reproduce the bug, and that -O1 or -O0 won't do it.

> Can you show the Lisp backtrace of this thread?  Also, what is the
> offending object 'a' in this frame:

The Lisp backtrace is really short:

Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
"erb--benchmark-monitor-func" (0x158ec58)

>> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
>
> and what was its parent object in the calling frame?

Those are both optimized out with -O2.  I recompiled bytecode.c with
"volatile" on the declaration of jmp_table, and got this:

(gdb) up 3
#3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., 
    nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x16eacf8 <bss_sbrk_buffer+9926232>) at bytecode.c:1403
1403	            struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
(gdb) p jmp_table
$1 = make_number(514)
(gdb) p *top
$3 = XIL(0x42b4d0)
(gdb) pp *top
remove

Then I started looking at other variables in exec_byte_code, and found
this which didn't look right:

(gdb) p *vectorp
$13 = XIL(0x7f4934009523)
(gdb) pr
(((help-menu "Help" keymap (emacs-tutorial menu-item "Emacs Tutorial" help-with-tutorial :help "Lear
?\207" [yank-menu kill-ring buffer-read-only gui-backend-selection-exists-p CLIPBOARD featurep ns] 2
\205^Q^@ÅÆ!\207" [visual-line-mode word-wrap truncate-lines 0 nil toggle-truncate-lines -1] 2 nil ni

(I've truncated the result of printing *vectorp since each line is over
5000 characters long.)

Since that looked like it was unlikely to be the original value of
*vectorp, I started a new debugging session and stepped through
Thread 7's call to exec_byte_code for erb--benchmark-monitor-func, and
determined that *vectorp's initial value was erb--status-updates, which
matches the first element of the constants vector in
(symbol-function 'erb--benchmark-monitor-func).

The value of vectorp was 0x16eac38 so I set a watchpoint on
  *(EMACS_INT *) 0x16eac38
and continued, and then during the execution of eval-region
it triggered here:

Thread 1 "monitor" hit Hardware watchpoint 7: *(EMACS_INT *) 0x16eac38

Old value = 60897760
New value = 24075314
setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
    nbytes=nbytes@entry=272) at alloc.c:3060
3060	  total_free_vector_slots += nbytes / word_size;
(gdb) bt 10
#0  setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
    nbytes=nbytes@entry=272) at alloc.c:3060
#1  0x00000000005a9a24 in sweep_vectors () at alloc.c:3297
#2  0x00000000005adb2e in gc_sweep () at alloc.c:6872
#3  garbage_collect_1 (end=<optimized out>) at alloc.c:5860
#4  Fgarbage_collect () at alloc.c:5989
#5  0x00000000005ca478 in maybe_gc () at lisp.h:4804
#6  Ffuncall (nargs=4, args=args@entry=0x7fff210a3bc8) at eval.c:2838
#7  0x0000000000611e00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., 
    args_template=..., nargs=nargs@entry=2, args=<optimized out>, 
    args@entry=0x9bd128 <pure+781288>) at bytecode.c:632
#8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fff210a3bc8), 
    nargs=nargs@entry=2, arg_vector=0x9bd128 <pure+781288>, 
    arg_vector@entry=0x7fff210a3f00) at eval.c:3057
#9  0x00000000005ca54b in Ffuncall (nargs=3, args=args@entry=0x7fff210a3ef8)
    at eval.c:2870
(More stack frames follow...)

Note that just as was happening when we were working through bug#32357,
the thread names which gdb prints are wrong, which I verified with:

(gdb) p current_thread
$21 = (struct thread_state *) 0xd73480 <main_thread>
(gdb) p current_thread->name
$22 = XIL(0)

Am I correct that the next step is to figure out why the garbage
collector is not marking this vector?  Presumably it's no longer
attached to the function definition for erb--benchmark-monitor-func by
the time the garbage collector runs, but it's supposed to be found by
mark_stack when called from mark_one_thread for Thread 7, right?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-12 20:02   ` Gemini Lasswell
@ 2018-10-13  6:23     ` Eli Zaretskii
  2018-10-13 17:17       ` Gemini Lasswell
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-13  6:23 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org
> Date: Fri, 12 Oct 2018 13:02:56 -0700
> 
> I've tried to do that without success.  The bug won't reproduce if I put
> all the code added to thread.el by the patch into its own file and load
> it with C-u M-x byte-compile-file, and it also doesn't work to put the
> resulting .elc on my load-path and load it with require.

Did you try loading it as a .el file?

Anyway, it's too bad that the reproduction is so Heisenbug-like.  It
probably won't reproduce on my system anyway.

> I've determined today that having -O2 in CFLAGS is necessary to
> reproduce the bug, and that -O1 or -O0 won't do it.

One more reason why reproduction elsewhere is probably hard.

> The Lisp backtrace is really short:
> 
> Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
> "erb--benchmark-monitor-func" (0x158ec58)

If you succeed in reproducing this when this code is loaded
uncompiled, the backtrace might be more helpful.

> >> #2  0x00000000006122b5 in XHASH_TABLE (a=...) at lisp.h:2241
> >
> > and what was its parent object in the calling frame?
> 
> Those are both optimized out with -O2.  I recompiled bytecode.c with
> "volatile" on the declaration of jmp_table, and got this:
> 
> (gdb) up 3
> #3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., 
>     nargs=nargs@entry=0, args=<optimized out>, 
>     args@entry=0x16eacf8 <bss_sbrk_buffer+9926232>) at bytecode.c:1403
> 1403	            struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
> (gdb) p jmp_table
> $1 = make_number(514)
> (gdb) p *top
> $3 = XIL(0x42b4d0)
> (gdb) pp *top
> remove

Which one of these is the one that triggers the assertion violation?

> Thread 1 "monitor" hit Hardware watchpoint 7: *(EMACS_INT *) 0x16eac38
> 
> Old value = 60897760
> New value = 24075314
> setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
>     nbytes=nbytes@entry=272) at alloc.c:3060
> 3060	  total_free_vector_slots += nbytes / word_size;
> (gdb) bt 10
> #0  setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
>     nbytes=nbytes@entry=272) at alloc.c:3060
> #1  0x00000000005a9a24 in sweep_vectors () at alloc.c:3297
> #2  0x00000000005adb2e in gc_sweep () at alloc.c:6872
> #3  garbage_collect_1 (end=<optimized out>) at alloc.c:5860
> #4  Fgarbage_collect () at alloc.c:5989
> #5  0x00000000005ca478 in maybe_gc () at lisp.h:4804
> #6  Ffuncall (nargs=4, args=args@entry=0x7fff210a3bc8) at eval.c:2838
> #7  0x0000000000611e00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., 
>     args_template=..., nargs=nargs@entry=2, args=<optimized out>, 
>     args@entry=0x9bd128 <pure+781288>) at bytecode.c:632
> #8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fff210a3bc8), 
>     nargs=nargs@entry=2, arg_vector=0x9bd128 <pure+781288>, 
>     arg_vector@entry=0x7fff210a3f00) at eval.c:3057
> #9  0x00000000005ca54b in Ffuncall (nargs=3, args=args@entry=0x7fff210a3ef8)
>     at eval.c:2870
> (More stack frames follow...)

Can you show the Lisp backtrace for the above?

> Note that just as was happening when we were working through bug#32357,
> the thread names which gdb prints are wrong, which I verified with:

Looks like a bug in pthreads version of sys_thread_create: it calls
prctl with first arg PR_SET_NAME, but my reading of the documentation
is that such a call gives the name to the _calling_ thread, which is
not the thread just created.  We should instead call
pthread_setname_np, I think (but I'm not an expert on pthreads).

> Am I correct that the next step is to figure out why the garbage
> collector is not marking this vector?  Presumably it's no longer
> attached to the function definition for erb--benchmark-monitor-func by
> the time the garbage collector runs, but it's supposed to be found by
> mark_stack when called from mark_one_thread for Thread 7, right?

Is this vector the byte-code of erb--benchmark-monitor-func?  If so,
how come it is no longer attached to the function, as long as the
function does exist?

And if this vector isn't the byte-code of erb--benchmark-monitor-func,
then what is it?

IMO, we cannot reason about what GC does or doesn't do until we
understand what data structure it processes, and what is the relation
of that data structure to the symbols in your program and in Emacs.

Thanks.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-13  6:23     ` Eli Zaretskii
@ 2018-10-13 17:17       ` Gemini Lasswell
  2018-10-13 18:04         ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-13 17:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014

[-- Attachment #1: Type: text/plain, Size: 553 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> Did you try loading it as a .el file?

Yes, but I couldn't reproduce the bug.

>> The Lisp backtrace is really short:
>> 
>> Thread 7 (Thread 0x7f1cd4dec700 (LWP 21837)):
>> "erb--benchmark-monitor-func" (0x158ec58)
>
> If you succeed in reproducing this when this code is loaded
> uncompiled, the backtrace might be more helpful.

The assertion happens in the Bswitch case of exec_byte_code when it's
running erb--benchmark-monitor-func, and there's only one 'switch' in
the disassembled bytecode, see line 16:


[-- Attachment #2: bytecode.txt --]
[-- Type: text/plain, Size: 4386 bytes --]

byte code for erb--benchmark-monitor-func:
  doc:  Process benchmark job status update. ...
  args: nil
0:1	constant  (error quit)
1	pushconditioncase 34
4	constant  thread-queue-get
5	varref	  erb--status-updates
6	call	  1
7	dup
8	consp
9	goto-if-nil 32
12	dup
13	car
14	dup
15	constant  <jump-table-eq (clear 2 add 6 remove 14 change 22)>
16	switch
17	goto	  30
20:2	stack-ref 1
21	cdr
22	dup
23	goto-if-not-nil 4
26	varref	  erb--logging
27	goto-if-nil 3
30	constant  message
31	constant  "** Clear **"
32	call	  1
33	discard
34:3	constant  erb--clear-status
35	call	  0
36	goto	  5
39:4	stack-ref 2
40	constant  error
41	constant  "Unrecognized status update: %s"
42	stack-ref 2
43	call	  2
44	stack-set 1
46:5	stack-set 1
48	goto	  31
51:6	stack-ref 1
52	cdr
53	dup
54	consp
55	goto-if-nil 12
58	dup
59	car
60	stack-ref 1
61	cdr
62	dup
63	consp
64	goto-if-nil 10
67	dup
68	car
69	stack-ref 1
70	cdr
71	dup
72	goto-if-not-nil 8
75	stack-ref 1
76	stack-ref 4
77	varref	  erb--logging
78	goto-if-nil 7
81	constant  message
82	constant  "** Add %s to %s **"
83	stack-ref 2
84	stack-ref 4
85	call	  3
86	discard
87:7	stack-ref 1
88	dup
89	stack-ref 2
90	stack-ref 2
91	symbol-value
92	cons
93	set
94	stack-set 1
96	discardN-preserve-tos 2
98	goto	  9
101:8	stack-ref 6
103	constant  error
104	constant  "Unrecognized status update: %s"
105	stack-ref 2
106	call	  2
107	stack-set 1
109:9	discardN-preserve-tos 2
111	goto	  11
114:10	stack-ref 4
115	constant  error
116	constant  "Unrecognized status update: %s"
117	stack-ref 2
118	call	  2
119	stack-set 1
121:11	discardN-preserve-tos 2
123	goto	  13
126:12	stack-ref 2
127	constant  error
128	constant  "Unrecognized status update: %s"
129	stack-ref 2
130	call	  2
131	stack-set 1
133:13	stack-set 1
135	goto	  31
138:14	stack-ref 1
139	cdr
140	dup
141	consp
142	goto-if-nil 20
145	dup
146	car
147	stack-ref 1
148	cdr
149	dup
150	consp
151	goto-if-nil 18
154	dup
155	car
156	stack-ref 1
157	cdr
158	dup
159	goto-if-not-nil 16
162	stack-ref 1
163	stack-ref 4
164	varref	  erb--logging
165	goto-if-nil 15
168	constant  message
169	constant  "** Remove %s from %s **"
170	stack-ref 2
171	stack-ref 4
172	call	  3
173	discard
174:15	stack-ref 1
175	dup
176	constant  delq
177	stack-ref 3
178	stack-ref 5
179	symbol-value
180	call	  2
181	set
182	stack-set 1
184	discardN-preserve-tos 2
186	goto	  17
189:16	stack-ref 6
191	constant  error
192	constant  "Unrecognized status update: %s"
193	stack-ref 2
194	call	  2
195	stack-set 1
197:17	discardN-preserve-tos 2
199	goto	  19
202:18	stack-ref 4
203	constant  error
204	constant  "Unrecognized status update: %s"
205	stack-ref 2
206	call	  2
207	stack-set 1
209:19	discardN-preserve-tos 2
211	goto	  21
214:20	stack-ref 2
215	constant  error
216	constant  "Unrecognized status update: %s"
217	stack-ref 2
218	call	  2
219	stack-set 1
221:21	stack-set 1
223	goto	  31
226:22	stack-ref 1
227	cdr
228	dup
229	consp
230	goto-if-nil 28
233	dup
234	car
235	stack-ref 1
236	cdr
237	dup
238	consp
239	goto-if-nil 26
242	dup
243	car
244	stack-ref 1
245	cdr
246	dup
247	goto-if-not-nil 24
250	stack-ref 1
251	stack-ref 4
252	varref	  erb--logging
253	goto-if-nil 23
256	constant  message
257	constant  "** Set %s to %s **"
258	stack-ref 2
259	stack-ref 4
260	call	  3
261	discard
262:23	dup
263	dup
264	stack-ref 3
265	set
266	stack-set 1
268	discardN-preserve-tos 2
270	goto	  25
273:24	stack-ref 6
275	constant  error
276	constant  "Unrecognized status update: %s"
277	stack-ref 2
278	call	  2
279	stack-set 1
281:25	discardN-preserve-tos 2
283	goto	  27
286:26	stack-ref 4
287	constant  error
288	constant  "Unrecognized status update: %s"
289	stack-ref 2
290	call	  2
291	stack-set 1
293:27	discardN-preserve-tos 2
295	goto	  29
298:28	stack-ref 2
299	constant  error
300	constant  "Unrecognized status update: %s"
301	stack-ref 2
302	call	  2
303	stack-set 1
305:29	stack-set 1
307	goto	  31
310:30	stack-ref 1
311	constant  error
312	constant  "Unrecognized status update: %s"
313	stack-ref 2
314	call	  2
315	stack-set 1
317:31	stack-set 1
319	goto	  33
322:32	dup
323	constant  error
324	constant  "Unrecognized status update: %s"
325	stack-ref 2
326	call	  2
327	stack-set 1
329:33	stack-set 1
331	pophandler
332	goto	  35
335:34	constant  message
336	constant  "Error in ERB benchmark control thread: %s"
337	stack-ref 2
338	call	  2
339	stack-set 1
341:35	discard
342	goto	  1
345	return

[-- Attachment #3: Type: text/plain, Size: 3123 bytes --]


>> (gdb) p jmp_table
>> $1 = make_number(514)
>> (gdb) p *top
>> $3 = XIL(0x42b4d0)
>> (gdb) pp *top
>> remove
>
> Which one of these is the one that triggers the assertion violation?

jmp_table.  The assertion violation is at line 1403 in bytecode.c:

            struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);

>> Thread 1 "monitor" hit Hardware watchpoint 7: *(EMACS_INT *) 0x16eac38
>> 
>> Old value = 60897760
>> New value = 24075314
>> setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
>>     nbytes=nbytes@entry=272) at alloc.c:3060
>> 3060	  total_free_vector_slots += nbytes / word_size;
>> (gdb) bt 10
>> #0  setup_on_free_list (v=v@entry=0x16eac30 <bss_sbrk_buffer+9926032>, 
>>     nbytes=nbytes@entry=272) at alloc.c:3060
>> #1  0x00000000005a9a24 in sweep_vectors () at alloc.c:3297
>> #2  0x00000000005adb2e in gc_sweep () at alloc.c:6872
>> #3  garbage_collect_1 (end=<optimized out>) at alloc.c:5860
>> #4  Fgarbage_collect () at alloc.c:5989
>> #5  0x00000000005ca478 in maybe_gc () at lisp.h:4804
>> #6  Ffuncall (nargs=4, args=args@entry=0x7fff210a3bc8) at eval.c:2838
>> #7  0x0000000000611e00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., 
>>     args_template=..., nargs=nargs@entry=2, args=<optimized out>, 
>>     args@entry=0x9bd128 <pure+781288>) at bytecode.c:632
>> #8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fff210a3bc8), 
>>     nargs=nargs@entry=2, arg_vector=0x9bd128 <pure+781288>, 
>>     arg_vector@entry=0x7fff210a3f00) at eval.c:3057
>> #9  0x00000000005ca54b in Ffuncall (nargs=3, args=args@entry=0x7fff210a3ef8)
>>     at eval.c:2870
>> (More stack frames follow...)
>
> Can you show the Lisp backtrace for the above?

(gdb) xbacktrace
"Automatic GC" (0x0)
"string-match" (0x210a3bd0)
"completion-pcm--string->pattern" (0x210a3f00)
"completion-pcm--find-all-completions" (0x210a43a0)
"completion-pcm-try-completion" (0x210a4668)
0x1723c30 PVEC_COMPILED
"completion--some" (0x210a4b60)
"completion--nth-completion" (0x210a4e68)
"completion-try-completion" (0x210a50f0)
"execute-extended-command--shorter" (0x210a5390)
"execute-extended-command" (0x210a5760)
"funcall-interactively" (0x210a5758)
"call-interactively" (0x210a5a90)
"command-execute" (0x210a5d48)

>> Am I correct that the next step is to figure out why the garbage
>> collector is not marking this vector?  Presumably it's no longer
>> attached to the function definition for erb--benchmark-monitor-func by
>> the time the garbage collector runs, but it's supposed to be found by
>> mark_stack when called from mark_one_thread for Thread 7, right?
>
> Is this vector the byte-code of erb--benchmark-monitor-func?  If so,
> how come it is no longer attached to the function, as long as the
> function does exist?

This vector is the constants vector for the byte-code of
erb--benchmark-monitor-func.

When eval-region evaluates the defun for erb--benchmark-monitor-func, it
replaces the symbol's function definition, so it removes that reference
to the byte-code.  AFAIK the only other reference to the byte-code
is on the stack of Thread 7, which is running the byte-code.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-13 17:17       ` Gemini Lasswell
@ 2018-10-13 18:04         ` Eli Zaretskii
  2018-10-14 19:29           ` Gemini Lasswell
  2018-10-14 19:46           ` Andreas Schwab
  0 siblings, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-13 18:04 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org
> Date: Sat, 13 Oct 2018 10:17:10 -0700
> 
> When eval-region evaluates the defun for erb--benchmark-monitor-func, it
> replaces the symbol's function definition, so it removes that reference
> to the byte-code.  AFAIK the only other reference to the byte-code
> is on the stack of Thread 7, which is running the byte-code.

So you are saying that the call to mark_stack inside mark_one_thread
doesn't do its job well enough?  AFAIU, it's supposed to scan the
stack of each and every thread, and mark Lisp objects referenced from
those stacks.

How do we know there's a reference to that vector on thread 7's stack?
Could it be that there is no reference at all?





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-13 18:04         ` Eli Zaretskii
@ 2018-10-14 19:29           ` Gemini Lasswell
  2018-10-15  2:37             ` Eli Zaretskii
  2018-10-14 19:46           ` Andreas Schwab
  1 sibling, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-14 19:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014

Eli Zaretskii <eliz@gnu.org> writes:

> How do we know there's a reference to that vector on thread 7's stack?
> Could it be that there is no reference at all?

Yes it could be that the reference is getting optimized out.  I asked
gdb for more detail on the stack frames for exec_byte_code and
funcall_lambda, and the arguments referring to the byte-code object and
its components do appear to be optimized out, see below.  I also tried
adding 'volatile' to the declaration of the local variable 'fun' in
Ffuncall, and that made the bug go away.

Is there anything else I should be looking at before concluding that
this is the problem?  And if it is, what is the best way to fix it?


lisp.h:2241: Emacs fatal error: assertion failed: HASH_TABLE_P (a)
[Switching to Thread 0x7f6eca4b9700 (LWP 17729)]

Thread 7 "builder 0" hit Breakpoint 1, terminate_due_to_signal (sig=sig@entry=6, 
    backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
369	{
(gdb) bt
#0  terminate_due_to_signal (sig=sig@entry=6, 
    backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:369
#1  0x00000000005a7159 in die (msg=msg@entry=0x67b132 "HASH_TABLE_P (a)", 
    file=file@entry=0x678c85 "lisp.h", line=line@entry=2241) at alloc.c:7094
#2  0x0000000000614685 in XHASH_TABLE (a=...) at lisp.h:2241
#3  exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., 
    nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:1403
#4  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7f6eca4b8470), 
    nargs=nargs@entry=0, arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>, 
    arg_vector@entry=0x1574c58 <bss_sbrk_buffer+8394168>) at eval.c:3057
#5  0x00000000005ca54b in Ffuncall (nargs=nargs@entry=1, 
    args=args@entry=0x1574c50 <bss_sbrk_buffer+8394160>) at eval.c:2870
#6  0x000000000064680b in invoke_thread_function () at thread.c:684
#7  0x00000000005c964f in internal_condition_case (
    bfun=bfun@entry=0x6467d0 <invoke_thread_function>, handlers=..., 
    handlers@entry=XIL(0xc3c0), hfun=hfun@entry=0x6466f0 <record_thread_error>)
    at eval.c:1373
#8  0x00000000006471a1 in run_thread (state=0x1574c30 <bss_sbrk_buffer+8394128>)
    at thread.c:723
#9  0x00007f6eea5cb5a7 in start_thread ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#10 0x00007f6ee9c6622f in clone ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libc.so.6
(gdb) set print frame-arguments all
(gdb) info frame 3
Stack frame at 0x7f6eca4b87d0:
 rip = 0x614685 in exec_byte_code (bytecode.c:1403); saved rip = 0x5cdd32
 called by frame at 0x7f6eca4b87d0, caller of frame at 0x7f6eca4b87d0
 source language c.
 Arglist at 0x7f6eca4b87c0, args: bytestr=<optimized out>, 
    vector=<optimized out>, maxdepth=<optimized out>, 
    args_template=<optimized out>, nargs=nargs@entry=0, args=<optimized out>, 
    args@entry=0x16eac38 <bss_sbrk_buffer+9926040>
 Locals at 0x7f6eca4b87c0, Previous frame's sp is 0x7f6eca4b87c8
 Saved registers:
  rbx at 0x7f6eca4b8798, rbp at 0x7f6eca4b87c0, r12 at 0x7f6eca4b87a0,
  r13 at 0x7f6eca4b87a8, r14 at 0x7f6eca4b87b0, r15 at 0x7f6eca4b87b8
(gdb) info frame 4
Stack frame at 0x7f6eca4b87d0:
 rip = 0x5cdd32 in funcall_lambda (eval.c:3057); saved rip = 0x64680b
 tail call frame, caller of frame at 0x7f6eca4b87d0
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp is 0x7f6eca4b87d0





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-14 19:29           ` Gemini Lasswell
@ 2018-10-15  2:37             ` Eli Zaretskii
  0 siblings, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-15  2:37 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org
> Date: Sun, 14 Oct 2018 12:29:42 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > How do we know there's a reference to that vector on thread 7's stack?
> > Could it be that there is no reference at all?
> 
> Yes it could be that the reference is getting optimized out.  I asked
> gdb for more detail on the stack frames for exec_byte_code and
> funcall_lambda, and the arguments referring to the byte-code object and
> its components do appear to be optimized out, see below.  I also tried
> adding 'volatile' to the declaration of the local variable 'fun' in
> Ffuncall, and that made the bug go away.

"Optimized out" is GDB's way of saying it's confused by the complex
way a variable's location changes as the program counter advances.  It
doesn't mean the variable is lost, just that GDB lost its track.

> Is there anything else I should be looking at before concluding that
> this is the problem?  And if it is, what is the best way to fix it?

There's the question Andreas asked, we should look into that, I think.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-13 18:04         ` Eli Zaretskii
  2018-10-14 19:29           ` Gemini Lasswell
@ 2018-10-14 19:46           ` Andreas Schwab
  2018-10-15 14:59             ` Eli Zaretskii
  1 sibling, 1 reply; 34+ messages in thread
From: Andreas Schwab @ 2018-10-14 19:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gemini Lasswell, 33014

On Okt 13 2018, Eli Zaretskii <eliz@gnu.org> wrote:

> So you are saying that the call to mark_stack inside mark_one_thread
> doesn't do its job well enough?  AFAIU, it's supposed to scan the
> stack of each and every thread, and mark Lisp objects referenced from
> those stacks.
>
> How do we know there's a reference to that vector on thread 7's stack?
> Could it be that there is no reference at all?

Do we actually mark the registers of the threads as gc roots?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-14 19:46           ` Andreas Schwab
@ 2018-10-15 14:59             ` Eli Zaretskii
  2018-10-15 16:22               ` Gemini Lasswell
  2018-10-16 18:46               ` Gemini Lasswell
  0 siblings, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-15 14:59 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gazally, 33014

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Gemini Lasswell <gazally@runbox.com>,  33014@debbugs.gnu.org
> Date: Sun, 14 Oct 2018 21:46:35 +0200
> 
> Do we actually mark the registers of the threads as gc roots?

According to my reading of the code, we do.  Each time a running
thread is about to release the global lock, we call
flush_stack_call_func, which is supposed to flush relevant registers
to the stack of that thread.  And mark_one_thread marks the stack of
each thread, so it should be able to see the Lisp objects on that
stack.

In this case, the function whose bytecode seems to be GC'ed is the
thread function itself.  That function is also marked, as part of
marking the thread object itself, although, of course, re-evaluating
the function will redefine the function.  But, if my reading of
exec_byte_code is correct, the bytecode should be on the stack and in
registers while we execute it, so even though the bytecode gets
disconnected from the function, it is still reachable from the stack,
and should have been marked...

Could this be some bug in the implementation of __builtin_unwind_init
etc., which causes it not to save some registers under some
conditions?  Gemini, what version of the compiler are you using?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-15 14:59             ` Eli Zaretskii
@ 2018-10-15 16:22               ` Gemini Lasswell
  2018-10-15 16:41                 ` Eli Zaretskii
  2018-10-16 18:46               ` Gemini Lasswell
  1 sibling, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-15 16:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014, Andreas Schwab

Eli Zaretskii <eliz@gnu.org> writes:

> Could this be some bug in the implementation of __builtin_unwind_init
> etc., which causes it not to save some registers under some
> conditions?  Gemini, what version of the compiler are you using?

7.3.0






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-15 16:22               ` Gemini Lasswell
@ 2018-10-15 16:41                 ` Eli Zaretskii
  0 siblings, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-15 16:41 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014, schwab

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: Andreas Schwab <schwab@linux-m68k.org>,  33014@debbugs.gnu.org
> Date: Mon, 15 Oct 2018 09:22:46 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Could this be some bug in the implementation of __builtin_unwind_init
> > etc., which causes it not to save some registers under some
> > conditions?  Gemini, what version of the compiler are you using?
> 
> 7.3.0

Then it's unlikely, I think.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-15 14:59             ` Eli Zaretskii
  2018-10-15 16:22               ` Gemini Lasswell
@ 2018-10-16 18:46               ` Gemini Lasswell
  2018-10-16 19:25                 ` Eli Zaretskii
  2018-10-17 16:21                 ` Eli Zaretskii
  1 sibling, 2 replies; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-16 18:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014, Andreas Schwab

Eli Zaretskii <eliz@gnu.org> writes:

> In this case, the function whose bytecode seems to be GC'ed is the
> thread function itself.  That function is also marked, as part of
> marking the thread object itself, although, of course, re-evaluating
> the function will redefine the function.  But, if my reading of
> exec_byte_code is correct, the bytecode should be on the stack and in
> registers while we execute it, so even though the bytecode gets
> disconnected from the function, it is still reachable from the stack,
> and should have been marked...

My knowledge of what gcc does and how the code it generates works is
superficial, but I don't see why an optimizer would find it necessary to
save the following values:

- The value of 'fun' in Ffuncall after it is used as an argument for
  funcall_lambda.

- The value of 'fun' in funcall_lambda after it is used to calculate
  the arguments to exec_byte_code.

- The value of 'vector' in exec_byte_code after the calculation of
  vectorp.

These three are the only variables that I see in Thread 7 from which the
garbage collector could find the constants vector which it's not
finding.  If gcc's optimizer puts them all in registers instead of on
the stack because it knows it won't need them later, those registers
will be overwritten with other values by recursive calls before
flush_stack_call_func is called.  Here's the backtrace of where Thread 7
is stopped while Thread 1 is running garbage collection, in which the
three frames I'm talking about above are 10, 11 and 12:

(gdb) thread apply 7 bt

Thread 7 (Thread 0x7fecdacdd700 (LWP 5509)):
#0  0x00007fecf771f592 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#1  0x0000000000648389 in sys_cond_wait (cond=cond@entry=0x16e9c48 <bss_sbrk_buffer+9921960>, mutex=mutex@entry=0xd73440 <global_lock>) at systhread.c:163
#2  0x0000000000647747 in condition_wait_callback (arg=0x16e9c30 <bss_sbrk_buffer+9921936>) at thread.c:410
#3  0x00000000005a9608 in flush_stack_call_func (func=func@entry=0x647630 <condition_wait_callback>, arg=<optimized out>) at alloc.c:5021
#4  0x0000000000646e1d in Fcondition_wait (cond=<optimized out>) at thread.c:449
#5  0x00000000005cc49e in funcall_subr (subr=0xcdc1c0 <Scondition_wait>, numargs=numargs@entry=1, args=args@entry=0x7fecdacdc1c0) at eval.c:2931
#6  0x00000000005ca661 in Ffuncall (nargs=2, args=args@entry=0x7fecdacdc1b8) at eval.c:2856
#7  0x0000000000611e00 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=1, 
    args=<optimized out>, args@entry=0x11bde68 <bss_sbrk_buffer+4499400>) at bytecode.c:632
#8  0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fecdacdc1b8), nargs=nargs@entry=1, arg_vector=0x11bde68 <bss_sbrk_buffer+4499400>, arg_vector@entry=0x7fecdacdc470)
    at eval.c:3057
#9  0x00000000005ca54b in Ffuncall (nargs=2, args=args@entry=0x7fecdacdc468) at eval.c:2870
#10 0x0000000000611e00 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=nargs@entry=0, 
    args=<optimized out>, args@entry=0x16eac38 <bss_sbrk_buffer+9926040>) at bytecode.c:632
#11 0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fecdacdc468), nargs=nargs@entry=0, arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>, 
    arg_vector@entry=0x1578c58 <bss_sbrk_buffer+8410552>) at eval.c:3057
#12 0x00000000005ca54b in Ffuncall (nargs=nargs@entry=1, args=args@entry=0x1578c50 <bss_sbrk_buffer+8410544>) at eval.c:2870
#13 0x000000000064680b in invoke_thread_function () at thread.c:684
#14 0x00000000005c964f in internal_condition_case (bfun=bfun@entry=0x6467d0 <invoke_thread_function>, handlers=<optimized out>, handlers@entry=XIL(0xc3c0), 
    hfun=hfun@entry=0x6466f0 <record_thread_error>) at eval.c:1373
#15 0x00000000006471a1 in run_thread (state=0x1578c30 <bss_sbrk_buffer+8410512>) at thread.c:723
#16 0x00007fecf77195a7 in start_thread () from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#17 0x00007fecf6db422f in clone () from
/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libc.so.6

gdb shows a value for fun in frame 11, but when I try to print
XIL(0x7fecdacdc468) it complains about it being an invalid lisp object,
and then the result of "info frame 11" shows some similar values,
so I'm thinking gdb is confused:

(gdb) thread apply 7 info frame 11

Thread 7 (Thread 0x7fecdacdd700 (LWP 5509)):
Stack frame at 0x7fecdacdc7d0:
 rip = 0x5cdd32 in funcall_lambda (eval.c:3057); saved rip = 0x64680b
 tail call frame, caller of frame at 0x7fecdacdc7d0
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp is 0x7fecdacdc7d0

I haven't figured out how to get gdb to print the Lisp backtrace of one
thread while execution is stopped in a different one.  But I expect
Thread 7's Lisp backtrace looks like this:

condition-wait
thread-queue-get
erb--benchmark-monitor-func






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-16 18:46               ` Gemini Lasswell
@ 2018-10-16 19:25                 ` Eli Zaretskii
  2018-10-16 19:38                   ` Eli Zaretskii
  2018-10-19  0:22                   ` Gemini Lasswell
  2018-10-17 16:21                 ` Eli Zaretskii
  1 sibling, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-16 19:25 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014, schwab

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: Andreas Schwab <schwab@linux-m68k.org>,  33014@debbugs.gnu.org
> Date: Tue, 16 Oct 2018 11:46:36 -0700
> 
> My knowledge of what gcc does and how the code it generates works is
> superficial, but I don't see why an optimizer would find it necessary to
> save the following values:
> 
> - The value of 'fun' in Ffuncall after it is used as an argument for
>   funcall_lambda.
> 
> - The value of 'fun' in funcall_lambda after it is used to calculate
>   the arguments to exec_byte_code.
> 
> - The value of 'vector' in exec_byte_code after the calculation of
>   vectorp.

There are calling frames as well.  For GC to pay attention to a Lisp
object, it is enough to have that object _somewhere_ on the stack.

Anyway, are you saying that stack marking doesn't work in optimized
code?  We've been using this technique for the last 17 years without
problems; why would the fact that we have more than one thread change
that?  The same arguments you submit are valid for a single-threaded
Emacs, right?

I think the chance of something like what you describe to happen here
are small, and we shouldn't throw in the towel so quickly.  I don't
think we've exhausted all the other possibilities, not yet.

> gdb shows a value for fun in frame 11, but when I try to print
> XIL(0x7fecdacdc468) it complains about it being an invalid lisp object,
> and then the result of "info frame 11" shows some similar values,
> so I'm thinking gdb is confused:

It's quite possible that GDB is not confused, and you've found some
evidence of the problem.

How did you try to print XIL(0x7fecdacdc468)?  Maybe we should take a
good look at this object.

> I haven't figured out how to get gdb to print the Lisp backtrace of one
> thread while execution is stopped in a different one.

You can't, AFAIR.  The code that helps us produce a Lisp backtrace
doesn't work in that case.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-16 19:25                 ` Eli Zaretskii
@ 2018-10-16 19:38                   ` Eli Zaretskii
  2018-10-19  0:22                   ` Gemini Lasswell
  1 sibling, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-16 19:38 UTC (permalink / raw)
  To: gazally; +Cc: 33014, schwab

> Date: Tue, 16 Oct 2018 22:25:21 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 33014@debbugs.gnu.org, schwab@linux-m68k.org
> 
> > - The value of 'fun' in Ffuncall after it is used as an argument for
> >   funcall_lambda.
> > 
> > - The value of 'fun' in funcall_lambda after it is used to calculate
> >   the arguments to exec_byte_code.
> > 
> > - The value of 'vector' in exec_byte_code after the calculation of
> >   vectorp.
> 
> There are calling frames as well.  For GC to pay attention to a Lisp
> object, it is enough to have that object _somewhere_ on the stack.

And btw, 'fun' is not the object we should be tracing in this case.
We should be tracing the bytecode that is being run, either the entire
vector or some of its elements.  AFAIU, that is the bytecode of the
thread function, which I think is the one called here:

> #11 0x00000000005cdd32 in funcall_lambda (fun=XIL(0x7fecdacdc468), nargs=nargs@entry=0, arg_vector=0x16eac38 <bss_sbrk_buffer+9926040>, 
>     arg_vector@entry=0x1578c58 <bss_sbrk_buffer+8410552>) at eval.c:3057
> #12 0x00000000005ca54b in Ffuncall (nargs=nargs@entry=1, args=args@entry=0x1578c50 <bss_sbrk_buffer+8410544>) at eval.c:2870
> #13 0x000000000064680b in invoke_thread_function () at thread.c:684





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-16 19:25                 ` Eli Zaretskii
  2018-10-16 19:38                   ` Eli Zaretskii
@ 2018-10-19  0:22                   ` Gemini Lasswell
  2018-10-19  8:44                     ` Eli Zaretskii
  2018-10-19 19:32                     ` Gemini Lasswell
  1 sibling, 2 replies; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-19  0:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014, schwab

Eli Zaretskii <eliz@gnu.org> writes:

> Anyway, are you saying that stack marking doesn't work in optimized
> code?  We've been using this technique for the last 17 years without
> problems; why would the fact that we have more than one thread change
> that?  The same arguments you submit are valid for a single-threaded
> Emacs, right?

Apparently so.  I set up a single-threaded situation where I could
redefine a function while exec_byte_code was running it, and got a
segfault.  I've gained some insights from debugging this version of the
bug which I will put into a separate email.

Here are steps which consistently reproduce it for me:

Save the following code to the file 'repro.el', and then run
emacs -Q (I'm using master built with -O2 in CFLAGS):

;;;  -*- lexical-binding: t -*-
(defvar my-var "ok")
(defun my-loop-1 ()
  (let ((val 0))
    (while t
      (insert "Now in recursive edit\n")
      (recursive-edit)
      (insert (format "Leaving recursive edit: %s\n" my-var))
      (let ((things '(a b c d e)))
	(cond 
         ((= val 0) (message "foo: %s" (last things)))
         ((= val 1) (message "bar: %s" things))
         ((= val 2) (message "baz: %s" (car things)))
         (t (message "bop: %s" (nth 2 things))))
	(setq val (mod (1+ val) 3))))))

(defun my-loop ()
  (interactive)
  (redraw-display)
  (my-loop-1))

(defun my-gc-1 ()
  (garbage-collect))

(defun my-gc ()
  (interactive)
  (my-gc-1))

(provide 'repro)

Then, from emacs -Q:

C-x C-f repro.el RET
C-u M-x byte-compile-file RET repro.el RET
C-x b RET
M-x my-loop RET
C-x b RET
M-x eval-buffer RET
C-x b RET
M-x my-gc RET
C-M-c

Result:

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffc8, 
    message=<optimized out>) at editfns.c:3129
3129	      unsigned char format_char = *format++;
(gdb) bt
#0  0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffc8, message=<optimized out>) at editfns.c:3129
#1  0x00000000005ca771 in Ffuncall (nargs=3, args=args@entry=0x7ffffffeffc0) at eval.c:2859
#2  0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs@entry=0, args=<optimized out>, args@entry=0x31bda38)
    at bytecode.c:632
#3  0x00000000005cde82 in funcall_lambda (fun=XIL(0x7ffffffeffc0), nargs=nargs@entry=0, arg_vector=0x31bda38, arg_vector@entry=0x7fffffff0240) at eval.c:3060
#4  0x00000000005ca65b in Ffuncall (nargs=1, args=args@entry=0x7fffffff0238) at eval.c:2873
#5  0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs@entry=0, args=<optimized out>, args@entry=0x31bdaf8)
    at bytecode.c:632
#6  0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff0238), nargs=nargs@entry=0, arg_vector=0x31bdaf8, arg_vector@entry=0x7fffffff0640) at eval.c:3060
#7  0x00000000005ca65b in Ffuncall (nargs=nargs@entry=1, args=args@entry=0x7fffffff0638) at eval.c:2873
#8  0x00000000005c6653 in Ffuncall_interactively (nargs=1, args=0x7fffffff0638) at callint.c:253
#9  0x00000000005ca771 in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fffffff0630) at eval.c:2859
#10 0x00000000005cab2c in Fapply (nargs=nargs@entry=3, args=args@entry=0x7fffffff0630) at eval.c:2432
#11 0x00000000005c6de1 in Fcall_interactively (function=..., record_flag=..., keys=...) at callint.c:340
#12 0x00000000005cc5d7 in funcall_subr (subr=0xcd63c0 <Scall_interactively>, numargs=numargs@entry=3, args=args@entry=0x7fffffff07c0) at eval.c:2939
#13 0x00000000005ca771 in Ffuncall (nargs=4, args=args@entry=0x7fffffff07b8) at eval.c:2859
#14 0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs@entry=2, args=<optimized out>, 
    args@entry=0x9c3cc8 <pure+808136>) at bytecode.c:632
#15 0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff07b8), nargs=nargs@entry=2, arg_vector=0x9c3cc8 <pure+808136>, arg_vector@entry=0x7fffffff0aa8) at eval.c:3060
#16 0x00000000005ca65b in Ffuncall (nargs=3, args=args@entry=0x7fffffff0aa0) at eval.c:2873
#17 0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs@entry=3, args=<optimized out>, 
    args@entry=0x9c3978 <pure+807288>) at bytecode.c:632
#18 0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff0aa0), nargs=nargs@entry=3, arg_vector=0x9c3978 <pure+807288>, arg_vector@entry=0x7fffffff0e90) at eval.c:3060
#19 0x00000000005ca65b in Ffuncall (nargs=nargs@entry=4, args=args@entry=0x7fffffff0e88) at eval.c:2873
#20 0x00000000005c6653 in Ffuncall_interactively (nargs=4, args=0x7fffffff0e88) at callint.c:253
#21 0x00000000005ca771 in Ffuncall (nargs=nargs@entry=5, args=0x7fffffff0e80) at eval.c:2859
#22 0x00000000005caa3a in Fapply (nargs=nargs@entry=3, args=args@entry=0x7fffffff1030) at eval.c:2479
#23 0x00000000005c6de1 in Fcall_interactively (function=..., record_flag=..., keys=...) at callint.c:340
#24 0x00000000005cc5d7 in funcall_subr (subr=0xcd63c0 <Scall_interactively>, numargs=numargs@entry=3, args=args@entry=0x7fffffff11c0) at eval.c:2939
#25 0x00000000005ca771 in Ffuncall (nargs=4, args=args@entry=0x7fffffff11b8) at eval.c:2859
#26 0x0000000000611f00 in exec_byte_code (bytestr=..., vector=..., maxdepth=..., args_template=..., nargs=nargs@entry=1, args=<optimized out>, 
    args@entry=0x9c3cc8 <pure+808136>) at bytecode.c:632
#27 0x00000000005cde82 in funcall_lambda (fun=XIL(0x7fffffff11b8), nargs=nargs@entry=1, arg_vector=0x9c3cc8 <pure+808136>, arg_vector@entry=0x7fffffff1478) at eval.c:3060
#28 0x00000000005ca65b in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x7fffffff1470) at eval.c:2873
#29 0x00000000005ca83a in call1 (fn=..., fn@entry=XIL(0x3ff0), arg1=...) at eval.c:2710
#30 0x000000000054f597 in command_loop_1 () at keyboard.c:1451
#31 0x00000000005c975f in internal_condition_case (bfun=bfun@entry=0x54f080 <command_loop_1>, handlers=..., handlers@entry=XIL(0x53a0), 
    hfun=hfun@entry=0x541d60 <cmd_error>) at eval.c:1373
#32 0x000000000053db88 in command_loop_2 (ignore=..., ignore@entry=XIL(0)) at keyboard.c:1079
#33 0x00000000005c9683 in internal_catch (tag=..., func=func@entry=0x53db60 <command_loop_2>, arg=..., arg@entry=XIL(0)) at eval.c:1136
#34 0x000000000053ddeb in command_loop () at keyboard.c:1058
#35 0x0000000000541864 in recursive_edit_1 () at keyboard.c:703
#36 0x0000000000541c23 in Frecursive_edit () at keyboard.c:774
#37 0x000000000041e727 in main (argc=<optimized out>, argv=<optimized out>) at emacs.c:1731

Lisp Backtrace:
"format" (0xfffeffc8)
"my-loop-1" (0xffff0240)
"my-loop" (0xffff0640)
"funcall-interactively" (0xffff0638)
"call-interactively" (0xffff07c0)
"command-execute" (0xffff0aa8)
"execute-extended-command" (0xffff0e90)
"funcall-interactively" (0xffff0e88)
"call-interactively" (0xffff11c0)
"command-execute" (0xffff1478)





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-19  0:22                   ` Gemini Lasswell
@ 2018-10-19  8:44                     ` Eli Zaretskii
  2018-10-19 20:05                       ` Gemini Lasswell
  2018-10-19 19:32                     ` Gemini Lasswell
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-19  8:44 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org,  schwab@linux-m68k.org
> Date: Thu, 18 Oct 2018 17:22:36 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Anyway, are you saying that stack marking doesn't work in optimized
> > code?  We've been using this technique for the last 17 years without
> > problems; why would the fact that we have more than one thread change
> > that?  The same arguments you submit are valid for a single-threaded
> > Emacs, right?
> 
> Apparently so.  I set up a single-threaded situation where I could
> redefine a function while exec_byte_code was running it, and got a
> segfault.  I've gained some insights from debugging this version of the
> bug which I will put into a separate email.

If this is the case, then I think we should protect the definition of
a running function from GC, in some way, either by making sure it is
referenced by some stack-based Lisp object, even in heavily optimized
code (e.g., by using 'volatile' qualifiers); or by some other method
that will ensure that definition is marked and not swept.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-19  8:44                     ` Eli Zaretskii
@ 2018-10-19 20:05                       ` Gemini Lasswell
  2018-10-20  6:41                         ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-19 20:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014

[-- Attachment #1: Type: text/plain, Size: 1166 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> > Anyway, are you saying that stack marking doesn't work in optimized
>> > code?  We've been using this technique for the last 17 years without
>> > problems; why would the fact that we have more than one thread change
>> > that?  The same arguments you submit are valid for a single-threaded
>> > Emacs, right?
>> 
>> Apparently so.  I set up a single-threaded situation where I could
>> redefine a function while exec_byte_code was running it, and got a
>> segfault.  I've gained some insights from debugging this version of the
>> bug which I will put into a separate email.
>
> If this is the case, then I think we should protect the definition of
> a running function from GC, in some way, either by making sure it is
> referenced by some stack-based Lisp object, even in heavily optimized
> code (e.g., by using 'volatile' qualifiers); or by some other method
> that will ensure that definition is marked and not swept.

Maybe code optimizers have improved over the last 17 years?

I have patched Emacs with a 'volatile' on the definition of 'fun' in
Ffuncall, and so far haven't managed to reproduce the bug with it:


[-- Attachment #2: 0001-src-eval.c-Ffuncall-Make-local-variable-fun-volatile.patch --]
[-- Type: text/plain, Size: 801 bytes --]

From a1fc2dfd392e0ba8754159d855da231a56ca275b Mon Sep 17 00:00:00 2001
From: Gemini Lasswell <gazally@runbox.com>
Date: Sun, 14 Oct 2018 12:12:04 -0700
Subject: [PATCH] * src/eval.c (Ffuncall): Make local variable 'fun' volatile
 (bug#33014)

---
 src/eval.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/eval.c b/src/eval.c
index 5e25caaa84..75b30f9c7d 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -2817,8 +2817,8 @@ Thus, (funcall \\='cons \\='x \\='y) returns (x . y).
 usage: (funcall FUNCTION &rest ARGUMENTS)  */)
   (ptrdiff_t nargs, Lisp_Object *args)
 {
-  Lisp_Object fun, original_fun;
-  Lisp_Object funcar;
+  Lisp_Object volatile fun;
+  Lisp_Object original_fun, funcar;
   ptrdiff_t numargs = nargs - 1;
   Lisp_Object val;
   ptrdiff_t count;
-- 
2.16.4


[-- Attachment #3: Type: text/plain, Size: 354 bytes --]


I'll go back now to working on my benchmarking project which I hope
someday will make it easy to see if that 'volatile' causes measurable
harm to performance.  I'll also keep using 'eval-region' and 'eval-buffer'
while I have threads running byte-compiled functions which get redefined
by doing that, and report back here if I encounter this bug again.

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-19 20:05                       ` Gemini Lasswell
@ 2018-10-20  6:41                         ` Eli Zaretskii
  2018-10-20  8:23                           ` Andreas Schwab
  2018-10-29 18:24                           ` Gemini Lasswell
  0 siblings, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-20  6:41 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org
> Date: Fri, 19 Oct 2018 13:05:19 -0700
> 
> > If this is the case, then I think we should protect the definition of
> > a running function from GC, in some way, either by making sure it is
> > referenced by some stack-based Lisp object, even in heavily optimized
> > code (e.g., by using 'volatile' qualifiers); or by some other method
> > that will ensure that definition is marked and not swept.
> 
> Maybe code optimizers have improved over the last 17 years?

I think a much more significant factor is that modern processors have
many more registers to use.

> I have patched Emacs with a 'volatile' on the definition of 'fun' in
> Ffuncall, and so far haven't managed to reproduce the bug with it:

Thanks.  This needs a comment for why we do something strange like
that, but otherwise, if no one has better ideas in a week's time,
let's install this.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-20  6:41                         ` Eli Zaretskii
@ 2018-10-20  8:23                           ` Andreas Schwab
  2018-10-20 10:20                             ` Eli Zaretskii
  2018-10-29 18:24                           ` Gemini Lasswell
  1 sibling, 1 reply; 34+ messages in thread
From: Andreas Schwab @ 2018-10-20  8:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gemini Lasswell, 33014

On Okt 20 2018, Eli Zaretskii <eliz@gnu.org> wrote:

>> Maybe code optimizers have improved over the last 17 years?
>
> I think a much more significant factor is that modern processors have
> many more registers to use.

I think an important factor is that they pass arguments in registers, so
it is more likely that the original value of an argument is lost.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-20  8:23                           ` Andreas Schwab
@ 2018-10-20 10:20                             ` Eli Zaretskii
  2018-10-20 11:30                               ` Andreas Schwab
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-20 10:20 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gazally, 33014

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Gemini Lasswell <gazally@runbox.com>,  33014@debbugs.gnu.org
> Date: Sat, 20 Oct 2018 10:23:37 +0200
> 
> On Okt 20 2018, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> >> Maybe code optimizers have improved over the last 17 years?
> >
> > I think a much more significant factor is that modern processors have
> > many more registers to use.
> 
> I think an important factor is that they pass arguments in registers, so
> it is more likely that the original value of an argument is lost.

Agreed.  That's part of what I meant by "have many more registers".





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-20 10:20                             ` Eli Zaretskii
@ 2018-10-20 11:30                               ` Andreas Schwab
  0 siblings, 0 replies; 34+ messages in thread
From: Andreas Schwab @ 2018-10-20 11:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gazally, 33014

On Okt 20 2018, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: Gemini Lasswell <gazally@runbox.com>,  33014@debbugs.gnu.org
>> Date: Sat, 20 Oct 2018 10:23:37 +0200
>> 
>> On Okt 20 2018, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> >> Maybe code optimizers have improved over the last 17 years?
>> >
>> > I think a much more significant factor is that modern processors have
>> > many more registers to use.
>> 
>> I think an important factor is that they pass arguments in registers, so
>> it is more likely that the original value of an argument is lost.
>
> Agreed.  That's part of what I meant by "have many more registers".

You can pass by register even without more of them, and the likelihood
that the original value is lost is even higher then.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-20  6:41                         ` Eli Zaretskii
  2018-10-20  8:23                           ` Andreas Schwab
@ 2018-10-29 18:24                           ` Gemini Lasswell
  2018-10-29 19:41                             ` Eli Zaretskii
  1 sibling, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-29 18:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014

Eli Zaretskii <eliz@gnu.org> writes:

>> I have patched Emacs with a 'volatile' on the definition of 'fun' in
>> Ffuncall, and so far haven't managed to reproduce the bug with it:
>
> Thanks.  This needs a comment for why we do something strange like
> that, but otherwise, if no one has better ideas in a week's time,
> let's install this.

Pushed to master, along with a new test which should fail if some future
optimizing compiler removes the reference from the stack in spite of the
'volatile'.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-29 18:24                           ` Gemini Lasswell
@ 2018-10-29 19:41                             ` Eli Zaretskii
  0 siblings, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-29 19:41 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org
> Date: Mon, 29 Oct 2018 11:24:10 -0700
> 
> Pushed to master, along with a new test which should fail if some future
> optimizing compiler removes the reference from the stack in spite of the
> 'volatile'.

Thanks.

Any reason not to cherry-pick this to emacs-26?





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-19  0:22                   ` Gemini Lasswell
  2018-10-19  8:44                     ` Eli Zaretskii
@ 2018-10-19 19:32                     ` Gemini Lasswell
  1 sibling, 0 replies; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-19 19:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014, schwab

[-- Attachment #1: Type: text/plain, Size: 8770 bytes --]

Gemini Lasswell <gazally@runbox.com> writes:

> I set up a single-threaded situation where I could redefine a function
> while exec_byte_code was running it, and got a segfault.  I've gained
> some insights from debugging this version of the bug which I will put
> into a separate email.

Here's a gdb transcript going through the single-threaded version of
this bug.  In this transcript I use a file 'repro.el' which I've
attached to the end of this message, and is the same as the one in my
last message.

Start gdb with a breakpoint at Fredraw_display:

$ gdb --args ./emacs -Q
...
(gdb) b Fredraw_display
(gdb) r

In Emacs, find the file repro.el and load it with byte-compile-file,
then go back to *scratch* and run my-loop:

  C-x C-f repro.el RET
  C-u M-x byte-compile-file RET repro.el RET
  C-x b RET
  M-x my-loop RET

This gets me to the gdb prompt, at a point in execution where the next
function called will be my-loop-1, so I set a breakpoint in
funcall_lambda, where I can see the bytecode object for my-loop-1 (I
edited out the bytestring):

  Thread 1 "emacs" hit Breakpoint 3, Fredraw_display () at dispnew.c:3027
  3027	{
  (gdb) br funcall_lambda
  Breakpoint 4 at 0x5cdb00: file eval.c, line 3016.
  (gdb) c
  Continuing.

  Thread 1 "emacs" hit Breakpoint 4, funcall_lambda (fun=XIL(0x31c0235),
      nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x7fffffff01c0)
      at eval.c:3016
  3016	{
  (gdb) clear
  Deleted breakpoint 4
  (gdb) p fun
  $1 = XIL(0x1630fc5)
  (gdb) pr
  #[0 "..." [my-var 0 "Now in recursive edit
  " recursive-edit format "Leaving recursive edit: %s
  " (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3] 6]

Then I skip ahead into exec-byte-code:

  (gdb) br exec_byte_code
  Breakpoint 5 at 0x611bb0: file bytecode.c, line 342.
  (gdb) c
  Continuing.

  Thread 1 "emacs" hit Breakpoint 5, exec_byte_code (bytestr=XIL(0x3571d24),
      vector=XIL(0x31c0195), maxdepth=make_number(4),
      args_template=args_template@entry=XIL(0), nargs=nargs@entry=0,
      args=args@entry=0x0) at bytecode.c:342
  342	{

Here's what's in the register $rbp, and the constants vector:

  (gdb) clear
  Deleted breakpoint 5
  (gdb) p $rbp
  $2 = (void *) 0xb0201
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x000b0201>
  (gdb) p vector
  $3 = XIL(0x1630f35)
  (gdb) pr
  [my-var 0 "Now in recursive edit
  " recursive-edit format "Leaving recursive edit: %s
  " (a b c d e) message "foo: %s" last 1 "bar: %s" 2 "baz: %s" "bop: %s" mod 3]

Skip ahead, to get to where exec_byte_code has a value for vectorp:

  (gdb) n 12
  366	  USE_SAFE_ALLOCA;
  (gdb) p vectorp
  $4 = (Lisp_Object *) 0x1630f38 <bss_sbrk_buffer+9164248>
  (gdb) p *vectorp
  $5 = XIL(0x2327d80)
  (gdb) pr
  my-var
  (gdb) break mark_vectorlike if ptr->contents == $4
  Breakpoint 6 at 0x5ad400: file alloc.c, line 6036.
  (gdb) c
  Continuing.

The idea is to break when garbage collection finds the constants vector.
(I first tried setting a conditional breakpoint in mark_object, which
made garbage collection either hang or take more time than I had
patience for.)

In Emacs type C-x b RET.  This causes a gc and a breakpoint hit:

  Thread 1 "emacs" hit Breakpoint 6, mark_vectorlike (ptr=0x31c0190) at alloc.c:6036
  6036	  eassert (!VECTOR_MARKED_P (ptr));

  (gdb) bt 20
  #0  mark_vectorlike (ptr=0x1630f30 <bss_sbrk_buffer+9164240>) at alloc.c:6036
  #1  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #2  0x00000000005ad45e in mark_vectorlike (
      ptr=0x1611fd0 <bss_sbrk_buffer+9037424>) at alloc.c:6046
  #3  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #4  0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
  #5  0x00000000005acae4 in mark_object (arg=...) at alloc.c:6434
  #6  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a8e00 <bss_sbrk_buffer+8606880>) at alloc.c:6046
  #7  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a9c30 <bss_sbrk_buffer+8610512>) at alloc.c:6046
  #8  0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #9  0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a7c30 <bss_sbrk_buffer+8602320>) at alloc.c:6046
  #10 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #11 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15a6e80 <bss_sbrk_buffer+8598816>) at alloc.c:6046
  #12 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #13 0x00000000005acdf4 in mark_object (arg=...) at alloc.c:6477
  #14 0x00000000005acaa5 in mark_object (arg=...) at alloc.c:6431
  #15 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fbed0 <bss_sbrk_buffer+8947056>) at alloc.c:6046
  #16 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #17 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fbf50 <bss_sbrk_buffer+8947184>) at alloc.c:6046
  #18 0x00000000005aca9c in mark_object (arg=...) at alloc.c:6430
  #19 0x00000000005ad45e in mark_vectorlike (
      ptr=0x15fcc80 <bss_sbrk_buffer+8950560>) at alloc.c:6046
  (More stack frames follow...)

  Lisp Backtrace:
  "Automatic GC" (0x0)
  "eldoc-pre-command-refresh-echo-area" (0xfffefbb0)
  "recursive-edit" (0xfffeffd8)
  "my-loop-1" (0xffff0250)
  "my-loop" (0xffff0650)
  "funcall-interactively" (0xffff0648)
  "call-interactively" (0xffff07d0)
  "command-execute" (0xffff0ab8)
  "execute-extended-command" (0xffff0ea0)
  "funcall-interactively" (0xffff0e98)
  "call-interactively" (0xffff11d0)
  "command-execute" (0xffff1488)

There are 279 frames in the backtrace, and mark_stack and mark_memory
aren't there.  So I'm guessing the constants vector is getting found via
the function definition of 'my-loop-1'.  Keep going:

  (gdb) c
  Continuing.

Now in Emacs do this:
  M-x eval-buffer RET
  C-x b RET
  M-x my-gc RET

Execution does not stop at the breakpoint.  In Emacs type C-M-c.
Result:

  Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
  0x00000000005bca1b in styled_format (nargs=2, args=0x7ffffffeffd8,
      message=<optimized out>) at editfns.c:3129
  3129	      unsigned char format_char = *format++;

What's happened to the constants vector and its contents?

  (gdb) p $3
  $6 = XIL(0x1630f35)
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x01630f35>
  (gdb) p *$4
  $7 = XIL(0x2327d80)
  (gdb) pr
  my-var
  (gdb) p *($4+5)
  $8 = XIL(0x359a6f4)
  (gdb) pr
  #<INVALID_LISP_OBJECT 0x0359a6f4>
  (gdb) p *($4+4)
  $9 = XIL(0x6390)
  (gdb) pr
  format

Looks like the constants vector was freed, and its contents haven't been
overwritten (yet) but the format string has been freed leading to the
crash in styled_format.

While I was developing this method of reproducing this bug, I went
through this exercise without lexical-binding set in repro.el.  In that
version, the register $rbp when exec_byte_code is called contains the
bytecode Lisp_Object (instead of the non-Lisp-object value it contains
in the transcript above), and the first thing exec_byte_code does is
save it on the stack (presumably because the System V AMD64 ABI calling
convention says that called functions which use $rbp should save and
restore it).

Here's the beginning of the disassembly of exec_byte_code from
  "objdump -S bytecode.o":

  0000000000000020 <exec_byte_code>:
     executing BYTESTR.  */

  Lisp_Object
  exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
  		Lisp_Object args_template, ptrdiff_t nargs, Lisp_Object *args)
  {
        20:	55                   	push   %rbp
        21:	48 89 e5             	mov    %rsp,%rbp
        24:	41 57                	push   %r15
        26:	41 56                	push   %r14
        28:	41 55                	push   %r13
        2a:	41 54                	push   %r12
        2c:	49 89 ce             	mov    %rcx,%r14
        2f:	53                   	push   %rbx

So in the non-lexical-binding case the bytecode Lisp_Object is written
to the stack by the first instruction in exec_byte_code, and then during
the execution of 'my-gc' the breakpoint in mark_vectorlike stops at a
point with a much shorter backtrace which includes mark_stack and
mark_memory, and mark_memory's pp is pointing to the location on the
stack where $rbp was written. The bytecode object and constants vector
are consequently not freed, and no segfault happens.

I don't follow everything going on in the disassembly of funcall_lambda,
but I did figure out (by comparison with a debug session in the
multithreaded situation) that the different values in $rbp when
funcall_lambda calls exec_byte_code depend on the different code paths
following the test of whether the first element of the bytecode object
vector (the "args template" as funcall_lambda's comment calls it) is an
integer, which in turn depends on whether my-loop-1 was compiled with
lexical-binding on.

Here is 'repro.el':

[-- Attachment #2: repro.el --]
[-- Type: text/plain, Size: 685 bytes --]

;;;  -*- lexical-binding: t -*-
(defvar my-var "ok")
(defun my-loop-1 ()
  (let ((val 0))
    (while t
      (insert "Now in recursive edit\n")
      (recursive-edit)
      (insert (format "Leaving recursive edit: %s\n" my-var))
      (let ((things '(a b c d e)))
	(cond ;
         ((= val 0) (message "foo: %s" (last things)))
         ((= val 1) (message "bar: %s" things))
         ((= val 2) (message "baz: %s" (car things)))
         (t (message "bop: %s" (nth 2 things))))
	(setq val (mod (1+ val) 3))))))

(defun my-loop ()
  (interactive)
  (redraw-display)
  (my-loop-1))

(defun my-gc-1 ()
  (garbage-collect))

(defun my-gc ()
  (interactive)
  (my-gc-1))

(provide 'repro)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-16 18:46               ` Gemini Lasswell
  2018-10-16 19:25                 ` Eli Zaretskii
@ 2018-10-17 16:21                 ` Eli Zaretskii
  2018-10-18  1:07                   ` Gemini Lasswell
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-17 16:21 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014, schwab

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: Andreas Schwab <schwab@linux-m68k.org>,  33014@debbugs.gnu.org
> Date: Tue, 16 Oct 2018 11:46:36 -0700
> 
> My knowledge of what gcc does and how the code it generates works is
> superficial, but I don't see why an optimizer would find it necessary to
> save the following values:
> 
> - The value of 'fun' in Ffuncall after it is used as an argument for
>   funcall_lambda.
> 
> - The value of 'fun' in funcall_lambda after it is used to calculate
>   the arguments to exec_byte_code.
> 
> - The value of 'vector' in exec_byte_code after the calculation of
>   vectorp.

After thinking about this a bit, I don't really agree with the last
one: the compiler could indeed stop tracking 'vector', but not
XVECTOR (vector)->contents, and we are interested in the latter.

One other thought is that, if worse comes to worst, we may consider
disallowing redefinition of a function that is currently being
executed (in another thread).

However, I'm still not convinced we are there.  Can we establish which
element(s) of the bytecode vector are GC'ed in this scenario?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-17 16:21                 ` Eli Zaretskii
@ 2018-10-18  1:07                   ` Gemini Lasswell
  2018-10-18 17:04                     ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-18  1:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014, schwab

Eli Zaretskii <eliz@gnu.org> writes:

> After thinking about this a bit, I don't really agree with the last
> one: the compiler could indeed stop tracking 'vector', but not
> XVECTOR (vector)->contents, and we are interested in the latter.

If the compiler stops tracking 'vector', and the garbage collector frees
it, doesn't that cause XVECTOR (vector)->contents to be overwritten?  In
the debugging session in my second message in this thread I had a
hardware watchpoint on what vectorp was pointing at and it went off in
setup_on_free_list.

> However, I'm still not convinced we are there.  Can we establish which
> element(s) of the bytecode vector are GC'ed in this scenario?

I'll see if I can figure that out.

Is there an easy way to print the function binding of a Lisp symbol from
gdb?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-18  1:07                   ` Gemini Lasswell
@ 2018-10-18 17:04                     ` Eli Zaretskii
  2018-10-19  0:39                       ` Gemini Lasswell
  2018-10-29 18:56                       ` Stefan Monnier
  0 siblings, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-18 17:04 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014, schwab

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org,  schwab@linux-m68k.org
> Date: Wed, 17 Oct 2018 18:07:39 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > After thinking about this a bit, I don't really agree with the last
> > one: the compiler could indeed stop tracking 'vector', but not
> > XVECTOR (vector)->contents, and we are interested in the latter.
> 
> If the compiler stops tracking 'vector', and the garbage collector frees
> it, doesn't that cause XVECTOR (vector)->contents to be overwritten?

Hmmm... could be.

> > However, I'm still not convinced we are there.  Can we establish which
> > element(s) of the bytecode vector are GC'ed in this scenario?
> 
> I'll see if I can figure that out.
> 
> Is there an easy way to print the function binding of a Lisp symbol from
> gdb?

Not sure what you mean by "the function binding" in this context.
I hope something like the following will do:

  (gdb) p fun
  (gdb) xpr

Let me know if this isn't what you meant.

Thanks.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-18 17:04                     ` Eli Zaretskii
@ 2018-10-19  0:39                       ` Gemini Lasswell
  2018-10-19  8:38                         ` Eli Zaretskii
  2018-10-29 18:56                       ` Stefan Monnier
  1 sibling, 1 reply; 34+ messages in thread
From: Gemini Lasswell @ 2018-10-19  0:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 33014, schwab

Eli Zaretskii <eliz@gnu.org> writes:

>   (gdb) p fun
>   (gdb) xpr
>
> Let me know if this isn't what you meant.

I meant something like 'pv', as in:

(gdb) pv emacs-version
"27.0.50"

but which I could use to find out what the bytecode object for
erb--benchmark-monitor-func is.
      





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-19  0:39                       ` Gemini Lasswell
@ 2018-10-19  8:38                         ` Eli Zaretskii
  0 siblings, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-19  8:38 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: 33014, schwab

> From: Gemini Lasswell <gazally@runbox.com>
> Cc: 33014@debbugs.gnu.org,  schwab@linux-m68k.org
> Date: Thu, 18 Oct 2018 17:39:54 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >   (gdb) p fun
> >   (gdb) xpr
> >
> > Let me know if this isn't what you meant.
> 
> I meant something like 'pv', as in:
> 
> (gdb) pv emacs-version
> "27.0.50"
> 
> but which I could use to find out what the bytecode object for
> erb--benchmark-monitor-func is.

But a function doesn't have to be byte-compiled, in which case there's
no bytecode.  Look at the implementation of funcall, and you will see
how Emacs deals with this.  It seemed to me that xpr reflects that, in
that it shows you what object to look at for a given function symbol.

If you are sure the function is already compiled, then funcall_lambda
will show you how it invokes exec_byte_code for such a function, and
you will see there how to access the bytecode of such a function.

HTH

P.S. Patches to .gdbinit to provide such a functionality in a new
command, called, say, "xfunc", will be most welcome, of course.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-18 17:04                     ` Eli Zaretskii
  2018-10-19  0:39                       ` Gemini Lasswell
@ 2018-10-29 18:56                       ` Stefan Monnier
  1 sibling, 0 replies; 34+ messages in thread
From: Stefan Monnier @ 2018-10-29 18:56 UTC (permalink / raw)
  To: Gemini Lasswell; +Cc: schwab, 33014

>> > After thinking about this a bit, I don't really agree with the last
>> > one: the compiler could indeed stop tracking 'vector', but not
>> > XVECTOR (vector)->contents, and we are interested in the latter.
>> If the compiler stops tracking 'vector', and the garbage collector frees
>> it, doesn't that cause XVECTOR (vector)->contents to be overwritten?
> Hmmm... could be.

Indeed, the conservative GC doesn't try to handle "pointers into the
middle of objects", so a pointer to `XVECTOR (vector)->contents` won't
be sufficient to keep `vector` alive.

> From: Gemini Lasswell <gazally@runbox.com>
> Date: Sun, 14 Oct 2018 12:12:04 -0700
> Subject: [PATCH] * src/eval.c (Ffuncall): Make local variable 'fun' volatile
>  (bug#33014)

Shouldn't we do that in exec_byte_code instead (it probably doesn't
matter that much in the end, but I think conceptually that would be the
more correct place)?

E.g. if you change your test to

    (defun eval-tests-33014-redefine ()
      "Remove the Lisp reference to the byte-compiled object."
      (aset (symbol-function #'eval-tests-33014-func) 1 nil)
      (aset (symbol-function #'eval-tests-33014-func) 2 nil))

you won't get a crash but only because these `aset` will fail (bytecode
objects are luckily read-only).  Moving the volatile thingies to
exec_byte_code should let our code work correctly against the above test
even if we changed aset to allow modifying bytecode objects.

        Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-11  5:30 bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function Gemini Lasswell
  2018-10-12  8:12 ` Eli Zaretskii
@ 2018-10-31  4:49 ` Paul Eggert
  2018-10-31 15:33   ` Eli Zaretskii
  2018-11-01 23:15   ` Gemini Lasswell
  1 sibling, 2 replies; 34+ messages in thread
From: Paul Eggert @ 2018-10-31  4:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gemini Lasswell, 33014, Andreas Schwab, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 923 bytes --]

> Any reason not to cherry-pick this to emacs-26?

No, once we fix it up. Although adding 'volatile' happened to work for Gemini's 
compiler, it won't suffice in general as the C standard does not require 
volatile variables to survive their last access (which is what the patch was 
assuming). Furthermore, Fbyte_code bypasses that patch, so the bug could still 
occur even if 'volatile' cured the bug in the more-common code path.

A simple way to ensure that the constant vector survives GC is to have 
exec_byte_code put the vector into a GC-visible slot. As it happens, there's a 
spare slot that we can appropriate, so this won't cost us stack (or heap) space. 
I installed the first attached patch into master to do that, and backported the 
patch series into emacs-26 via the last two attached patches.

Thanks, Gemini, for the good work in debugging this problem and writing that 
test case. GC bugs can be nasty.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Improve-fix-for-Bug-33014.patch --]
[-- Type: text/x-patch; name="0001-Improve-fix-for-Bug-33014.patch", Size: 2262 bytes --]

From cf486a7a920d3d95fa9aa98d7b03ebc61b17518a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 30 Oct 2018 20:57:46 -0700
Subject: [PATCH] Improve fix for Bug#33014
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Although the previously-applied fix worked for its platform,
it doesn’t suffice in general.
* src/bytecode.c (exec_byte_code): Save VECTOR into stack slot
so that it survives GC.  The stack slot was otherwise unused,
so this doesn’t cost us memory, only a store insn.
* src/eval.c (Ffuncall): Do not make FUN volatile, reverting
2018-10-14T19:12:04Z!gazally@runbox.com.  Adding ‘volatile’
does not suffice, since storage for a volatile local can be
reclaimed after its last access (e.g., by tail recursion
elimination), which would make VECTOR invisible to GC.
---
 src/bytecode.c | 1 +
 src/eval.c     | 7 ++-----
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/bytecode.c b/src/bytecode.c
index 17457fc574..40389e08f0 100644
--- a/src/bytecode.c
+++ b/src/bytecode.c
@@ -369,6 +369,7 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
   ptrdiff_t item_bytes = stack_items * word_size;
   Lisp_Object *stack_base = ptr_bounds_clip (alloc, item_bytes);
   Lisp_Object *top = stack_base;
+  *top = vector; /* Ensure VECTOR survives GC (Bug#33014).  */
   Lisp_Object *stack_lim = stack_base + stack_items;
   unsigned char *bytestr_data = alloc;
   bytestr_data = ptr_bounds_clip (bytestr_data + item_bytes, bytestr_length);
diff --git a/src/eval.c b/src/eval.c
index 32cfda24d8..a51d0c9083 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -2820,11 +2820,8 @@ Thus, (funcall \\='cons \\='x \\='y) returns (x . y).
 usage: (funcall FUNCTION &rest ARGUMENTS)  */)
   (ptrdiff_t nargs, Lisp_Object *args)
 {
-  /* Use 'volatile' here to cause optimizing compilers to keep a
-     reference on the stack to the function's bytecode object.  See
-     Bug#33014.  */
-  Lisp_Object volatile fun;
-  Lisp_Object original_fun, funcar;
+  Lisp_Object fun, original_fun;
+  Lisp_Object funcar;
   ptrdiff_t numargs = nargs - 1;
   Lisp_Object val;
   ptrdiff_t count;
-- 
2.17.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0001-Refer-to-bytecode-constant-vectors-Bug-33014.patch --]
[-- Type: text/x-patch; name="0001-Refer-to-bytecode-constant-vectors-Bug-33014.patch", Size: 1200 bytes --]

From 1ad2903a48b682985a2bd0709ec05f67a1351a8e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 30 Oct 2018 21:14:10 -0700
Subject: [PATCH 1/2] Refer to bytecode constant vectors (Bug#33014)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Backport from master.
* src/bytecode.c (exec_byte_code): Save VECTOR into stack slot
so that it survives GC.  The stack slot was otherwise unused,
so this doesn’t cost us memory, only a store insn.
---
 src/bytecode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bytecode.c b/src/bytecode.c
index e51f9095b3..538cd4f3ca 100644
--- a/src/bytecode.c
+++ b/src/bytecode.c
@@ -367,6 +367,7 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
   SAFE_ALLOCA_LISP_EXTRA (stack_base, stack_items, bytestr_length);
   Lisp_Object *stack_lim = stack_base + stack_items;
   Lisp_Object *top = stack_base;
+  *top = vector; /* Ensure VECTOR survives GC (Bug#33014).  */
   memcpy (stack_lim, SDATA (bytestr), bytestr_length);
   void *void_stack_lim = stack_lim;
   unsigned char const *bytestr_data = void_stack_lim;
-- 
2.17.1


[-- Attachment #4: 0002-Add-regression-test-for-Bug-33014.patch --]
[-- Type: text/x-patch, Size: 2169 bytes --]

From c3cf85b1c186e13c2d588aa35ffa57981ca481d7 Mon Sep 17 00:00:00 2001
From: Gemini Lasswell <gazally@runbox.com>
Date: Tue, 30 Oct 2018 21:15:51 -0700
Subject: [PATCH 2/2] Add regression test for Bug#33014

Backport from master.
* test/src/eval-tests.el:
(eval-tests-byte-code-being-evaluated-is-protected-from-gc): New test.
(eval-tests-33014-var): New variable.
(eval-tests-33014-func, eval-tests-33014-redefine): New functions.
---
 test/src/eval-tests.el | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/test/src/eval-tests.el b/test/src/eval-tests.el
index e68fd13611..eeb98b0994 100644
--- a/test/src/eval-tests.el
+++ b/test/src/eval-tests.el
@@ -99,4 +99,34 @@ eval-tests--exceed-specbind-limit
         (signal-hook-function #'ignore))
     (should-error (eval-tests--exceed-specbind-limit))))
 
+(ert-deftest eval-tests-byte-code-being-evaluated-is-protected-from-gc ()
+  "Regression test for Bug#33014.
+Check that byte-compiled objects being executed by exec-byte-code
+are found on the stack and therefore not garbage collected."
+  (should (string= (eval-tests-33014-func)
+                   "before after: ok foo: (e) bar: (a b c d e) baz: a bop: c")))
+
+(defvar eval-tests-33014-var "ok")
+(defun eval-tests-33014-func ()
+  "A function which has a non-trivial constants vector when byte-compiled."
+  (let ((result "before "))
+    (eval-tests-33014-redefine)
+    (garbage-collect)
+    (setq result (concat result (format "after: %s" eval-tests-33014-var)))
+    (let ((vals '(0 1 2 3))
+          (things '(a b c d e)))
+      (dolist (val vals)
+        (setq result
+              (concat result " "
+                      (cond
+                       ((= val 0) (format "foo: %s" (last things)))
+                       ((= val 1) (format "bar: %s" things))
+                       ((= val 2) (format "baz: %s" (car things)))
+                       (t (format "bop: %s" (nth 2 things))))))))
+    result))
+
+(defun eval-tests-33014-redefine ()
+  "Remove the Lisp reference to the byte-compiled object."
+  (setf (symbol-function #'eval-tests-33014-func) nil))
+
 ;;; eval-tests.el ends here
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-31  4:49 ` Paul Eggert
@ 2018-10-31 15:33   ` Eli Zaretskii
  2018-11-01 23:15   ` Gemini Lasswell
  1 sibling, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2018-10-31 15:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: gazally, 33014, schwab, monnier

> Cc: Gemini Lasswell <gazally@runbox.com>, 33014@debbugs.gnu.org,
>  Stefan Monnier <monnier@iro.umontreal.ca>,
>  Andreas Schwab <schwab@linux-m68k.org>
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 30 Oct 2018 21:49:46 -0700
> 
> > Any reason not to cherry-pick this to emacs-26?
> 
> No, once we fix it up. Although adding 'volatile' happened to work for Gemini's 
> compiler, it won't suffice in general as the C standard does not require 
> volatile variables to survive their last access (which is what the patch was 
> assuming). Furthermore, Fbyte_code bypasses that patch, so the bug could still 
> occur even if 'volatile' cured the bug in the more-common code path.
> 
> A simple way to ensure that the constant vector survives GC is to have 
> exec_byte_code put the vector into a GC-visible slot. As it happens, there's a 
> spare slot that we can appropriate, so this won't cost us stack (or heap) space. 
> I installed the first attached patch into master to do that, and backported the 
> patch series into emacs-26 via the last two attached patches.

Thanks.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function
  2018-10-31  4:49 ` Paul Eggert
  2018-10-31 15:33   ` Eli Zaretskii
@ 2018-11-01 23:15   ` Gemini Lasswell
  1 sibling, 0 replies; 34+ messages in thread
From: Gemini Lasswell @ 2018-11-01 23:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Andreas Schwab, 33014, Stefan Monnier

Paul Eggert <eggert@cs.ucla.edu> writes:

> A simple way to ensure that the constant vector survives GC is to have
> exec_byte_code put the vector into a GC-visible slot. As it happens,
> there's a spare slot that we can appropriate, so this won't cost us
> stack (or heap) space. I installed the first attached patch into
> master to do that, and backported the patch series into emacs-26 via
> the last two attached patches.

Thanks.  I ran through all my methods of reproducing this bug on both
emacs-26 and master, and saw no more problems.





^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-11-01 23:15 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-11  5:30 bug#33014: 26.1.50; 27.0.50; Fatal error after re-evaluating a thread's function Gemini Lasswell
2018-10-12  8:12 ` Eli Zaretskii
2018-10-12 20:02   ` Gemini Lasswell
2018-10-13  6:23     ` Eli Zaretskii
2018-10-13 17:17       ` Gemini Lasswell
2018-10-13 18:04         ` Eli Zaretskii
2018-10-14 19:29           ` Gemini Lasswell
2018-10-15  2:37             ` Eli Zaretskii
2018-10-14 19:46           ` Andreas Schwab
2018-10-15 14:59             ` Eli Zaretskii
2018-10-15 16:22               ` Gemini Lasswell
2018-10-15 16:41                 ` Eli Zaretskii
2018-10-16 18:46               ` Gemini Lasswell
2018-10-16 19:25                 ` Eli Zaretskii
2018-10-16 19:38                   ` Eli Zaretskii
2018-10-19  0:22                   ` Gemini Lasswell
2018-10-19  8:44                     ` Eli Zaretskii
2018-10-19 20:05                       ` Gemini Lasswell
2018-10-20  6:41                         ` Eli Zaretskii
2018-10-20  8:23                           ` Andreas Schwab
2018-10-20 10:20                             ` Eli Zaretskii
2018-10-20 11:30                               ` Andreas Schwab
2018-10-29 18:24                           ` Gemini Lasswell
2018-10-29 19:41                             ` Eli Zaretskii
2018-10-19 19:32                     ` Gemini Lasswell
2018-10-17 16:21                 ` Eli Zaretskii
2018-10-18  1:07                   ` Gemini Lasswell
2018-10-18 17:04                     ` Eli Zaretskii
2018-10-19  0:39                       ` Gemini Lasswell
2018-10-19  8:38                         ` Eli Zaretskii
2018-10-29 18:56                       ` Stefan Monnier
2018-10-31  4:49 ` Paul Eggert
2018-10-31 15:33   ` Eli Zaretskii
2018-11-01 23:15   ` Gemini Lasswell

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.