unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#33198: 27.0.50; emacs_abort on EBADF during accept-process-output in non-main thread
From: Gemini Lasswell @ 2018-10-29 22:10 UTC
  To: 33198

[-- Attachment #1: Type: text/plain, Size: 1498 bytes --]

I've hit the emacs_abort at line 5510 in process.c a few times in the
last week.  I haven't found a way to make it reproduce on demand.  I
tried to narrow the code it's happening in down to a smaller test case,
without success.  I'd appreciate suggestions for how to track down what
is going wrong.

I'm working on a Lisp program whose work can be done in parallel, and
I'm implementing it using threads.  My code has 4 worker
threads which pick jobs to do off of a queue (which is made thread-safe
with a mutex and condition variables).  The jobs consist of an argument
to a shell script, which the threads run asynchronously using
start-file-process and accept-process-output.  This allows the worker
threads to be responsive to a user command to cancel the work in
progress, although I haven't been using that cancel command when the bug
happens.  When it has happened, it's been after I run a command which
adds 6 jobs to the queue for the 4 threads to process.

The crash has happened with two different shell scripts, one which just
consists of "exit 1" and another which makes a directory and a symlink.
Neither script prints anything to standard output.

I've tried using the process object instead of nil as the first argument
to accept-process-output and have seen the same crash both ways.

Here are the two main functions in my worker threads,
'erb--builder-func' which is passed to 'make-thread' to create the
threads, and 'erb--build' which runs the child processes.


[-- Attachment #2: erb-build.el --]
[-- Type: text/plain, Size: 2884 bytes --]

(defun erb--builder-func ()
  "Build commits from `erb--unbuilt-commits'."
  (catch 'stop
    (while t
      (condition-case err
          (let ((commit (thread-queue-get erb--unbuilt-commits))
                build-result)
            (when (eq commit 'stop)
              (message "ERB builder thread stopping")
              (throw 'stop nil))

            (erb--status-remove commit 'waiting-to-build)
            (erb--status-add commit 'building)

            (unwind-protect
                (let ((job (thread-message-value erb--job)))
                  (unless (eq job 'cancel)
                    (with-current-buffer (erb--job-buffer job)
                      (setq build-result (erb--build commit)))))

              (erb--status-remove commit 'building)
              (if build-result
                  (erb--status-add commit 'built)
                (erb--status-add commit 'failed-builds))
              (thread-queue-put (list commit build-result)
                                erb--built-commits)))

        ((error quit) (message "Error in ERB benchmark build thread: %s" err))))))

(defun erb--build (commit)
  "Build Emacs from COMMIT.
Run the build in an asynchronous process in a temporary directory.
Save the directory name if the build is successful.  If the build
fails, save the output of the build script in the file COMMIT.log
in the results/MACHINE/failed-builds directory of
`erb-suite-directory'."
  (let* ((temp-dir (file-name-as-directory (make-temp-file "erb" t)))
         (default-directory temp-dir)
         (name (format "ERB-build-%s" commit))
         (outbuf (generate-new-buffer name))
         (build-script (erb--get-build-script-filename))
         process success)

    (unwind-protect
        (map-let (project-repo) erb--config
          (setq process
                (condition-case _err
                    (start-file-process name outbuf build-script project-repo
                                        commit)
                  ((error quit) nil)))
          (if (null process)
              (progn
                (message "Failed to start build process for commit `%s'"
                         commit)
                (erb-run--record-failure commit "Failed to start build process"))
            (catch 'quit
              (while (process-live-p process)
                (accept-process-output nil 0.5)
                (when (erb--cancel-now-p)
                  (delete-process process)
                  (throw 'quit nil)))
              (if (= (process-exit-status process) 0)
                  (progn
                    (setq success temp-dir)
                    (erb-run--remove-old-failure commit))
                (message "Building commit `%s' failed" commit)
                (erb-run--record-failure commit outbuf)))))
      (unless success
        (delete-directory temp-dir t))
      (kill-buffer outbuf))
    success))

[-- Attachment #3: Type: text/plain, Size: 5709 bytes --]


Thread 6 "ERB control" hit Breakpoint 1, terminate_due_to_signal (
    sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=40) at emacs.c:369
369	{
(gdb) bt
#0  terminate_due_to_signal (sig=sig@entry=6,
    backtrace_limit=backtrace_limit@entry=40) at emacs.c:369
#1  0x0000000000511a23 in emacs_abort () at sysdep.c:2429
#2  0x00000000005b68c1 in wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0,
    do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0),
    wait_proc=<optimized out>, just_wait_proc=0) at process.c:5510
#3  0x00000000005b6eea in Faccept_process_output (process=XIL(0),
    seconds=<optimized out>, millisec=<optimized out>, just_this_one=XIL(0))
    at process.c:4677
#4  0x000000000056e815 in Ffuncall (nargs=3, args=args@entry=0x7fffd366d360)
    at eval.c:2856
#5  0x00000000005aa740 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=nargs@entry=1, args=<optimized out>,
    args@entry=0x15dede8 <bss_sbrk_buffer+10104552>) at bytecode.c:632
#6  0x0000000000571416 in funcall_lambda (fun=XIL(0x7fffd366d360),
    nargs=nargs@entry=1, arg_vector=0x15dede8 <bss_sbrk_buffer+10104552>,
    arg_vector@entry=0x7fffd366d600) at eval.c:3057
#7  0x000000000056e793 in Ffuncall (nargs=2, args=args@entry=0x7fffd366d5f8)
    at eval.c:2870
#8  0x00000000005aa740 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=nargs@entry=0, args=<optimized out>,
    args@entry=0x15deca8 <bss_sbrk_buffer+10104232>) at bytecode.c:632
#9  0x0000000000571416 in funcall_lambda (fun=XIL(0x7fffd366d5f8),
    nargs=nargs@entry=0, arg_vector=0x15deca8 <bss_sbrk_buffer+10104232>,
    arg_vector@entry=0x1423c58 <bss_sbrk_buffer+8289624>) at eval.c:3057
#10 0x000000000056e793 in Ffuncall (nargs=nargs@entry=1,
    args=args@entry=0x1423c50 <bss_sbrk_buffer+8289616>) at eval.c:2870
#11 0x00000000005d425b in invoke_thread_function () at thread.c:684
#12 0x000000000056d9ef in internal_condition_case (
    bfun=bfun@entry=0x5d4220 <invoke_thread_function>,
    handlers=handlers@entry=XIL(0xc3c0),
    hfun=hfun@entry=0x5d3ae0 <record_thread_error>) at eval.c:1373
#13 0x00000000005d414b in run_thread (state=0x1423c30 <bss_sbrk_buffer+8289584>)
    at thread.c:723
#14 0x00007ffff15a65a7 in start_thread ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#15 0x00007ffff0c4122f in clone ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libc.so.6

Lisp Backtrace:
"accept-process-output" (0xd366d368)
"erb--build" (0xd366d600)
"erb--builder-func" (0x1423c58)
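
For readers unfamiliar with this failure mode, here is a minimal sketch of how a select-style wait reports EBADF when one of the descriptors in its set has already been closed.  It is a standalone illustration, not Emacs code, and it makes no claim about which descriptor went bad in this crash or why; the function name is made up for the example.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Illustration only (hypothetical helper, not from Emacs).
   Return the errno that select() reports when asked to wait on a
   descriptor that was closed before the call.  Return 0 if select()
   unexpectedly succeeds, or -1 if the pipe could not be created.  */
int
errno_from_select_on_closed_fd (void)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  int stale = fds[0];
  close (fds[0]);               /* STALE no longer names an open file.  */

  fd_set readfds;
  FD_ZERO (&readfds);
  FD_SET (stale, &readfds);     /* ...but it is still in the wait set.  */

  struct timeval timeout = { 0, 0 };    /* Poll; do not block.  */
  int rc = select (stale + 1, &readfds, NULL, NULL, &timeout);
  int saved_errno = errno;

  close (fds[1]);
  return rc < 0 ? saved_errno : 0;
}
```

With several threads each building their own wait sets, a descriptor closed by one thread but still present in another thread's set would produce the same error.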


In GNU Emacs 27.0.50 (build 8, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
 of 2018-10-28 built on sockeye
Repository revision: f7638edcb06fac3b58b986062ea679f6919d81d7
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: NixOS 18.09.git.ad56635 (Jellyfish)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --prefix=/home/gem/src/emacs/master/bin --with-modules
 --with-x-toolkit=gtk3 --with-xft --config-cache'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND DBUS GSETTINGS GLIB NOTIFY LIBSELINUX
GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM
MODULES THREADS GMP

Important settings:
  value of $EMACSLOADPATH:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822 mml
easymenu mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs time-date mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib sendmail
rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads dbusbind
inotify dynamic-setting system-font-setting font-render-setting
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 95415 9749)
 (symbols 48 20031 1)
 (strings 32 28349 1783)
 (string-bytes 1 753921)
 (vectors 16 14931)
 (vector-slots 8 508718 9684)
 (floats 8 47 70)
 (intervals 56 209 0)
 (buffers 992 11))


* bug#33198: 27.0.50; emacs_abort on EBADF during accept-process-output in non-main thread
From: Eli Zaretskii @ 2018-10-30  6:50 UTC
  To: Gemini Lasswell; +Cc: 33198

> From: Gemini Lasswell <gazally@runbox.com>
> Date: Mon, 29 Oct 2018 15:10:39 -0700
> 
> I've hit the emacs_abort at line 5510 in process.c a few times in the
> last week.  I haven't found a way to make it reproduce on demand.  I
> tried to narrow the code it's happening in down to a smaller test case,
> without success.  I'd appreciate suggestions for how to track down what
> is going wrong.

I suggest instrumenting the code that determines which thread will
listen to which descriptors in its pselect call.  This happens inside
compute_input_wait_mask and compute_non_keyboard_wait_mask, and the
data those use is set by several add_*_fd functions.  The
instrumentation should output the descriptor, the thread ID, and what
it is used for.  Then I think you will be able to see where the
bad descriptor came from, and how it came to be bad.
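
A minimal sketch of what such instrumentation could look like, assuming a small logging helper that is called wherever a descriptor is added to or removed from a wait set.  The names format_fd_trace and trace_fd, the trace format, and the choice of an unsigned long as thread identifier are all assumptions made for the example; none of them exist in Emacs.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Emacs.  Format one trace record
   naming the thread, the descriptor, and what the descriptor is used
   for.  Return what snprintf returns.  */
int
format_fd_trace (char *buf, size_t size, unsigned long thread_id,
                 int fd, const char *purpose)
{
  return snprintf (buf, size, "fd-trace: thread=%lu fd=%d purpose=%s",
                   thread_id, fd, purpose);
}

/* Hypothetical helper, not part of Emacs.  Write one trace record to
   stderr so the records can be compared against the descriptor that
   later turns out to be bad.  */
void
trace_fd (unsigned long thread_id, int fd, const char *purpose)
{
  char line[128];
  format_fd_trace (line, sizeof line, thread_id, fd, purpose);
  fprintf (stderr, "%s\n", line);
}
```

A call such as trace_fd (id, fd, "process-read") at each place that touches the wait masks would give a per-thread history of every descriptor.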

You will also need to determine which descriptor is the bad one; the
usual paradigm to do that is by calling 'fcntl (fd, F_GETFD)' on each
descriptor on which pselect was asked to wait, and see which ones
return -1 with errno = EBADF.
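
The check described above can be sketched as follows.  This is a standalone illustration, not a patch against process.c: the function name find_bad_fds and its parameters are made up for the example, and it assumes the caller still has the fd_set that was passed to pselect.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>

/* Hypothetical helper, not part of Emacs.  Probe every descriptor in
   SET, up to and including MAX_FD, with fcntl (fd, F_GETFD).  Store
   the ones that fail with EBADF in BAD, which has room for BAD_SIZE
   entries.  Return the number of bad descriptors found, which may
   exceed BAD_SIZE; only the first BAD_SIZE are stored.  */
int
find_bad_fds (fd_set *set, int max_fd, int *bad, int bad_size)
{
  int count = 0;
  for (int fd = 0; fd <= max_fd && fd < FD_SETSIZE; fd++)
    {
      if (!FD_ISSET (fd, set))
        continue;
      if (fcntl (fd, F_GETFD) == -1 && errno == EBADF)
        {
          if (count < bad_size)
            bad[count] = fd;
          count++;
        }
    }
  return count;
}
```

Running such a probe just before the abort, while the wait set is still intact, would identify the offending descriptor number so it can be matched against the instrumentation output.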






* bug#33198: 27.0.50; emacs_abort on EBADF during accept-process-output in non-main thread
From: Lars Ingebrigtsen @ 2021-02-02 15:02 UTC
  To: Eli Zaretskii; +Cc: Gemini Lasswell, 33198

Eli Zaretskii <eliz@gnu.org> writes:

> You will also need to determine which descriptor is the bad one; the
> usual paradigm to do that is by calling 'fcntl (fd, F_GETFD)' on each
> descriptor on which pselect was asked to wait, and see which ones
> return -1 with errno = EBADF.

This was two years ago, so I'm guessing there's little chance of there
being any progress with this crash, and I'm closing this bug report.  If
this is a problem that persists, please respond to the debbugs address
and we'll reopen.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




