From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: dick.r.chiang@gmail.com Newsgroups: gmane.emacs.bugs Subject: bug#36609: 27.0.50; Possible race-condition in threading implementation Date: Wed, 09 Jun 2021 17:40:28 -0400 Message-ID: <87wnr2lnsj.fsf@dick> References: <87muhks3b5.fsf@hochschule-trier.de> <87fsxv8182.fsf@dick> <83wnr7gdd8.fsf@gnu.org> <875yyqg66k.fsf@dick> <83k0n6hjym.fsf@gnu.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="27443"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.14 (Gnus v5.14pre) Emacs/28.0.50 (gnu/linux) Cc: larsi@gnus.org, politza@hochschule-trier.de, pipcet@gmail.com, 36609@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Wed Jun 09 23:41:10 2021 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lr5wk-00071T-KE for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 09 Jun 2021 23:41:10 +0200 Original-Received: from localhost ([::1]:48956 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lr5wi-0004wf-W3 for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 09 Jun 2021 17:41:09 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:60878) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lr5wc-0004wR-8i for bug-gnu-emacs@gnu.org; Wed, 09 Jun 2021 17:41:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:51481) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1lr5wc-0005cd-18 for bug-gnu-emacs@gnu.org; Wed, 09 Jun 2021 17:41:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1lr5wb-0003Uf-Uh for bug-gnu-emacs@gnu.org; Wed, 09 Jun 2021 17:41:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: dick.r.chiang@gmail.com Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 09 Jun 2021 21:41:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 36609 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: fixed Original-Received: via spool by 36609-submit@debbugs.gnu.org id=B36609.162327484113394 (code B ref 36609); Wed, 09 Jun 2021 21:41:01 +0000 Original-Received: (at 36609) by debbugs.gnu.org; 9 Jun 2021 21:40:41 +0000 Original-Received: from localhost ([127.0.0.1]:34794 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lr5wG-0003Ty-RS for submit@debbugs.gnu.org; Wed, 09 Jun 2021 17:40:41 -0400 Original-Received: from mail-qk1-f172.google.com ([209.85.222.172]:40875) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lr5wC-0003Tg-Ja for 36609@debbugs.gnu.org; Wed, 09 Jun 2021 17:40:39 -0400 Original-Received: by mail-qk1-f172.google.com with SMTP id u30so25260743qke.7 for <36609@debbugs.gnu.org>; Wed, 09 Jun 2021 14:40:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=JVtouAO+wsqgDxIC6mqjcOiWtk6odQlpL0hRdvzXPgk=; b=ml7hwuLraUKMbi73BCJdkiFBB1y1gPc9wfbfASSqgf1dYFEVTJvOukvkQsthAegdBA kXjfSZ5tizLxpqIUekQzK2EpooZzqBbzU09norTml3rLxN+Ni1SOUZYniVW3IuQIKlIw gor5WQ5HIxaGrppgDwgVIterU7eGhPL1f0HHjSna78rEOIPboWCc48bw9Cj//kYSU8Yz w0pwmJJjHhL9jR6OL+AkbVvWWMsDMX0zZbrvb7GnTB4P6IxuBR50Qj+BTUs1CGKwB97R h5nwc/yf79R/GE+4+PIt0B64xm/3xGKJ0LrzL2SfC5jWOTgnALkuX5Xsrk7vFdxBm7pW 3ntg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version; bh=JVtouAO+wsqgDxIC6mqjcOiWtk6odQlpL0hRdvzXPgk=; b=kghYXCq7TU3DDlj8aST0zK5nhay+tOdJOv8E9BdND1NSbqlLDymnZSg9V+ca3eRTlZ LMVBRHlvl7hGrUb8zRVreLoDuC02CUlGsXJqn6UUiiHevZMpoa7SV5RKdWKjGasyk2jc VIv4/I+8Kk4pWX8zgb+OAXtZN/DT3unYAMRImluZRc8Abv0QVF5yOpx5hdyu/S4EGC8S bTrF1HOu6AU+CB8GWJVGxHCymNnUmNKez6kIbZ+TfYZcvuqYM3/zIFic8VVy7I0mIONW jcX7I7GPf5BQhKSw8TPB8F5xPVMtWoRL9niS7ghxQ/imdDmlbWQ6iR9dGHaElBd71yYc 84cA== X-Gm-Message-State: AOAM530st2xQccfzqfycTMzFRRuNXsgMFf2oxTpUJ0haxA88ZVkw+5le uaCZwkyurD8xSOzhD4nhcCg= X-Google-Smtp-Source: ABdhPJyNfHa5S0OgYUO/FtPnk4dY3zkrOCi3RTgAMf0iZG6n397zCQvFXb73VhmvBvhlhtMHIP5Q2w== X-Received: by 2002:a37:848:: with SMTP id 69mr1705148qki.411.1623274830830; Wed, 09 Jun 2021 14:40:30 -0700 (PDT) Original-Received: from localhost (pool-71-190-212-171.nycmny.fios.verizon.net. [71.190.212.171]) by smtp.gmail.com with ESMTPSA id u123sm873846qkh.83.2021.06.09.14.40.29 (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 09 Jun 2021 14:40:29 -0700 (PDT) In-Reply-To: <83k0n6hjym.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 06 Jun 2021 22:27:45 +0300") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:208307 Archived-At: --=-=-= Content-Type: text/plain EZ> You will have to provide a more detailed analysis, sorry. There is no guarantee the static variable `threads_holding_glib_lock` introduced in xgselect.c in commit 9c62ffb is synchronized across threads. As such, relying on it becoming zero in time for release_select_lock() is fraught. (If you add print statements, this particular heisenbug disappears, at least on Linux kernel 4.15.0-99-generic). Four attachments: 1. Desired patch to master (reverts 9c62ffb, adds main-thread-p check). --=-=-= Content-Type: text/x-diff Content-Disposition: attachment; filename=0001-Only-acquire-glib-context-if-main-thread.patch >From 5b30b55f60844ada43b119640e052f8ed5ee6e9c Mon Sep 17 00:00:00 2001 From: dickmao Date: Wed, 9 Jun 2021 17:11:48 -0400 Subject: [PATCH] Only acquire glib context if main thread * src/thread.c (really_call_select): revert 9c62ffb * src/xgselect.c (xg_select): revert 9c62ffb, only acquire glib if main --- src/thread.c | 8 -------- src/xgselect.c | 44 ++++++++++++++++---------------------------- src/xgselect.h | 2 -- 3 files changed, 16 insertions(+), 38 deletions(-) diff --git a/src/thread.c b/src/thread.c index f74f611148..609cd7c5fc 100644 --- a/src/thread.c +++ b/src/thread.c @@ -28,12 +28,6 @@ Copyright (C) 2012-2021 Free Software Foundation, Inc. #include "pdumper.h" #include "keyboard.h" -#if defined HAVE_GLIB && ! defined (HAVE_NS) -#include -#else -#define release_select_lock() do { } while (0) -#endif - union aligned_thread_state { struct thread_state s; @@ -592,8 +586,6 @@ really_call_select (void *arg) sa->result = (sa->func) (sa->max_fds, sa->rfds, sa->wfds, sa->efds, sa->timeout, sa->sigmask); - release_select_lock (); - block_interrupt_signal (&oldset); /* If we were interrupted by C-g while inside sa->func above, the signal handler could have called maybe_reacquire_global_lock, in diff --git a/src/xgselect.c b/src/xgselect.c index 0d91d55bad..b40f75cb50 100644 --- a/src/xgselect.c +++ b/src/xgselect.c @@ -28,27 +28,7 @@ Copyright (C) 2009-2021 Free Software Foundation, Inc. #include "lisp.h" #include "blockinput.h" #include "systime.h" - -static ptrdiff_t threads_holding_glib_lock; -static GMainContext *glib_main_context; - -void release_select_lock (void) -{ - if (--threads_holding_glib_lock == 0) - g_main_context_release (glib_main_context); -} - -static void acquire_select_lock (GMainContext *context) -{ - if (threads_holding_glib_lock++ == 0) - { - glib_main_context = context; - while (!g_main_context_acquire (context)) - { - /* Spin. */ - } - } -} +#include "thread.h" /* `xg_select' is a `pselect' replacement. Why do we need a separate function? 1. Timeouts. Glib and Gtk rely on timer events. If we did pselect @@ -75,19 +55,27 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds, GPollFD *gfds = gfds_buf; int gfds_size = ARRAYELTS (gfds_buf); int n_gfds, retval = 0, our_fds = 0, max_fds = fds_lim - 1; + bool context_acquired = false; int i, nfds, tmo_in_millisec, must_free = 0; bool need_to_dispatch; context = g_main_context_default (); - acquire_select_lock (context); + if (main_thread_p (current_thread)) + context_acquired = g_main_context_acquire (context); + /* FIXME: If we couldn't acquire the context, we just silently proceed + because this function handles more than just glib file descriptors. + Note that, as implemented, this failure is completely silent: there is + no feedback to the caller. */ if (rfds) all_rfds = *rfds; else FD_ZERO (&all_rfds); if (wfds) all_wfds = *wfds; else FD_ZERO (&all_wfds); - n_gfds = g_main_context_query (context, G_PRIORITY_LOW, &tmo_in_millisec, - gfds, gfds_size); + n_gfds = (context_acquired + ? g_main_context_query (context, G_PRIORITY_LOW, &tmo_in_millisec, + gfds, gfds_size) + : -1); if (gfds_size < n_gfds) { @@ -165,10 +153,8 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds, #else need_to_dispatch = true; #endif - if (need_to_dispatch) + if (need_to_dispatch && context_acquired) { - acquire_select_lock (context); - int pselect_errno = errno; /* Prevent g_main_dispatch recursion, that would occur without block_input wrapper, because event handlers call @@ -178,9 +164,11 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds, g_main_context_dispatch (context); unblock_input (); errno = pselect_errno; - release_select_lock (); } + if (context_acquired) + g_main_context_release (context); + /* To not have to recalculate timeout, return like this. */ if ((our_fds > 0 || (nfds == 0 && tmop == &tmo)) && (retval == 0)) { diff --git a/src/xgselect.h b/src/xgselect.h index 2142a236b2..e00dce1283 100644 --- a/src/xgselect.h +++ b/src/xgselect.h @@ -29,6 +29,4 @@ #define XGSELECT_H fd_set *rfds, fd_set *wfds, fd_set *efds, struct timespec *timeout, sigset_t *sigmask); -extern void release_select_lock (void); - #endif /* XGSELECT_H */ -- 2.26.2 --=-=-= Content-Type: text/plain 2. OP's original "my code doesn't work, here you figure it out", obfuscated MRE Run as: for i in 1 2 3 ; do src/emacs -Q -l ./report-orig.el & done Fails if: One of the emacsen stops responding --=-=-= Content-Type: application/emacs-lisp Content-Disposition: attachment; filename=report-orig.el Content-Transfer-Encoding: quoted-printable ;; -*- lexical-binding: t -*- (require 'threads) (require 'eieio) (require 'cl-lib) (require 'ring) (defun debug (fmt &rest args) (princ (apply #'format fmt args) #'external-debugging-output) (terpri #'external-debugging-output)) (define-error 'thread-utils-thread-interrupted "Thread was interrupted" 'error) (defun thread-utils-main-thread-p (&optional object) (let ((object (or object (current-thread)))) (and (threadp object) (eq object (car (all-threads)))))) (defun thread-utils-quitable-apply (fn &rest args) (let ((quit-thread (make-thread (lambda nil (condition-case nil (cl-loop (sleep-for 3)) (thread-utils-thread-interrupted nil)))))) (unwind-protect (apply fn args) (thread-signal quit-thread 'thread-utils-thread-interrupted nil)))) (defun thread-utils-condition-quitable-wait (condition) (cl-check-type condition condition-variable) (thread-utils-quitable-apply #'condition-wait condition)) (defun thread-utils-condition-wait (condition) (if (thread-utils-main-thread-p) (thread-utils-condition-quitable-wait condition) (condition-wait condition))) (defconst channel-default-capacity 16) (defclass channel-terminal nil ((mutex :initarg :mutex :type mutex) (condition :initarg :condition :type condition-variable) (msg-queue :initarg :msg-queue :type ring) (closed-p :initform nil) (other-terminal :type (or null channel-terminal)))) (defclass channel-source (channel-terminal) nil) (defclass channel-sink (channel-terminal) nil) (define-error 'channel-closed "Trying to send/recv from a closed channel") (defun make-channel (&optional capacity) (unless capacity (setq capacity channel-default-capacity)) (cl-check-type capacity (integer 1 *)) (let* ((mutex (make-mutex "channel")) (condition (make-condition-variable mutex "channel")) (msg-queue (make-ring capacity)) (source (channel-source :mutex mutex :condition condition :msg-queue msg-queue)) (sink (channel-sink :mutex mutex :condition condition :msg-queue msg-queue))) (oset source other-terminal sink) (oset sink other-terminal source) (cons source sink))) (cl-defgeneric channel-send ((source channel-source) message) (with-mutex (oref source mutex) (with-slots (condition msg-queue) source (while (and (not (channel-closed-p source)) (=3D (ring-size msg-queue) (ring-length msg-queue))) (thread-utils-condition-wait condition)) (when (channel-closed-p source) (signal 'channel-closed (list source))) (let ((inhibit-quit t)) (ring-insert msg-queue message) (when (=3D 1 (ring-length msg-queue)) (condition-notify condition t))) nil))) (cl-defgeneric channel-recv ((sink channel-terminal)) (with-mutex (oref sink mutex) (with-slots (condition msg-queue) sink (while (and (not (channel-closed-p sink)) (ring-empty-p msg-queue)) (thread-utils-condition-wait condition)) (when (channel-closed-p sink) (signal 'channel-closed (list sink))) (let ((inhibit-quit t)) (prog1 (ring-remove msg-queue) (when (=3D 1 (- (ring-size msg-queue) (ring-length msg-queue))) (condition-notify condition t))))))) (cl-defgeneric channel-peek ((sink channel-terminal)) (with-mutex (oref sink mutex) (with-slots (condition msg-queue) sink (while (and (not (channel-closed-p sink)) (ring-empty-p msg-queue)) (thread-utils-condition-wait condition)) (when (channel-closed-p sink) (signal 'channel-closed (list sink))) (ring-ref msg-queue -1)))) (cl-defgeneric channel-close ((terminal channel-terminal)) (with-mutex (oref terminal mutex) (with-slots (closed-p condition) terminal (setq closed-p t) (condition-notify condition t)) nil)) (cl-defmethod channel-closed-p ((source channel-source)) (with-mutex (oref source mutex) (with-slots (closed-p other-terminal) source (or closed-p (oref other-terminal closed-p))))) (cl-defmethod channel-closed-p ((sink channel-sink)) (with-mutex (oref sink mutex) (with-slots (closed-p other-terminal msg-queue) sink (or closed-p (and (oref other-terminal closed-p) (ring-empty-p msg-queue)))))) (defclass future nil ((channel :initform (make-channel 1)))) (defun make-future () (make-instance 'future)) (cl-defgeneric future-set ((future future) value) (with-slots (channel) future (let ((inhibit-quit t)) (condition-case nil (progn (debug "Sending future") (channel-send (car channel) value) (debug "Future send")) (channel-closed (signal 'error (list future)))) (debug "Closing future") (channel-close (car channel)) (debug "Future closed")))) (cl-defgeneric future-get ((future future)) (with-slots (channel) future (debug "Getting future") (channel-peek (cdr channel)) (debug "Future got"))) (defclass future-deferred (future) ((producer :initarg :producer :type function) (value-produced-p :initform nil) (mutex :initform (make-mutex "future-deferred")))) (defun make-deferred-future (producer) (make-instance 'future-deferred :producer producer)) (cl-defmethod future-get :before ((future future-deferred)) (with-slots (mutex value-produced-p producer) future (with-mutex mutex (unless value-produced-p (unwind-protect (make-thread (lambda nil (debug "Setting Future") (future-set future (funcall producer)) (debug "Future set")) "XEmacs") (setq value-produced-p t)))))) (ignore-errors (enable-command 'list-threads)) (call-interactively #'list-threads) (let ((future (make-deferred-future (lambda nil 42)))) (future-get future)) --=-=-= Content-Type: text/plain 3. "Affine" redaction of OP's MRE *infrequently* breaks on tip of master (succeeds after patch) Run as: for i in 1 2 3 ; do src/emacs -Q -l ./report.el & done Fails if: One of the emacsen stops responding --=-=-= Content-Type: application/emacs-lisp Content-Disposition: attachment; filename=report.el Content-Transfer-Encoding: quoted-printable ;; -*- lexical-binding: t -*- (call-interactively #'list-threads) (let* ((cv (make-condition-variable (make-mutex) "CV")) condition (notify (lambda () (sleep-for 1) ;; let wait() start spinning first (with-mutex (condition-mutex cv) (setq condition t) (condition-notify cv)))) (wait (lambda () (with-mutex (condition-mutex cv) (while (not condition) (condition-wait cv))))) (herring (make-thread (apply-partially #'sleep-for 1000) "unrelated"= ))) (make-thread notify "notify") (funcall wait) (thread-signal herring 'quit nil)) --=-=-= Content-Type: text/plain 4. My MRE *frequently* breaks tip of master (succeeds after patch) --=-=-= Content-Type: application/emacs-lisp Content-Disposition: attachment; filename=42.el Content-Transfer-Encoding: quoted-printable ;; -*- lexical-binding: t -*- (require 'eieio) (defclass channel () ((condition :initarg :condition :type condition-variable) (msg-queue :initarg :msg-queue :type ring))) (cl-defgeneric channel-send ((channel channel) message) (with-slots (condition msg-queue) channel (with-mutex (condition-mutex condition) (while (<=3D (ring-size msg-queue) (ring-length msg-queue)) (condition-wait condition)) (ring-insert msg-queue message) (condition-notify condition t)))) (cl-defgeneric channel-recv ((channel channel)) (with-slots (condition msg-queue) channel (with-mutex (condition-mutex condition) (while (ring-empty-p msg-queue) (condition-wait condition)) (prog1 (ring-remove msg-queue) (condition-notify condition t))))) (defmacro run-thread (name what) `(make-thread (lambda () (sleep-for (1+ (random 3))) (funcall ,what) (princ (format "%s finished\n" ,name) #'external-debugging-output)) ,name)) (call-interactively #'list-threads) (let* ((n 3) (capacity 1) (channel (make-instance 'channel :condition (make-condition-variable (make-mutex) "channel") :msg-queue (make-ring capacity)))) (dotimes (i n) (let ((send-name (format "send-%d" (1+ i))) (recv-name (format "recv-%d" (- n i)))) (run-thread send-name (lambda () (channel-send channel 42))) (run-thread recv-name (lambda () (channel-recv channel)))))) --=-=-= Content-Type: text/plain Run as: for i in 1 2 3 ; do src/emacs -Q -l ./42.el & done Fails if: One of the emacsen stops responding --=-=-=--