From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#36609: 27.0.50; Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 09:50:02 +0300
Message-ID: <83wogm1l9h.fsf@gnu.org>
References: <87muhks3b5.fsf@hochschule-trier.de> <83k1cn2z0l.fsf@gnu.org> <83ftnb2wf9.fsf@gnu.org> <83blxz2t43.fsf@gnu.org>
Cc: 36609@debbugs.gnu.org, politza@hochschule-trier.de
To: Pip Cet
In-reply-to: (message from Pip Cet on Fri, 12 Jul 2019 19:30:34 +0000)
X-GNU-PR-Message: followup 36609
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:173304

> From: Pip Cet
> Date: Fri, 12 Jul 2019 19:30:34 +0000
> Cc: politza@hochschule-trier.de, 36609@debbugs.gnu.org
>
> > > > We should either release the global lock before the thread exits, or
> > > > defer the acting upon the signal until later.  We cannot disable the
> > > > signal handling altogether because it is entirely legitimate to signal
> > > > another thread, and when we do, that other thread will _always_ be
> > > > inside thread_select.
> > >
> > > Really? What about thread-yield?
> >
> > What about it?
> >
> > You are asking whether, when thread-signal is executed, the thread
> > which we are signaling is necessarily parked inside thread_select?  If
> > so, I don't understand your surprise: only one thread can ever be
> > running, and that is by definition the thread which calls
> > thread-signal.  All the other threads cannot be running, which means
> > they are parked either in thread_select or in sys_mutex_lock called
> > from acquire_global_lock.  Right?
>
> No, they might also be in the sys_thread_yield syscall, having
> released the global lock but not yet reacquired it:
>
>   release_global_lock ();
>   sys_thread_yield (); <<<<< here
>   acquire_global_lock (self);

OK, but that, too, means the thread being signaled is not running,
right?  And I still think that a very frequent case, perhaps the most
frequent, is that the thread being signaled is inside thread_select.

> I'm not sure how it's relevant to assert that "that other thread will
> _always_ be inside thread_select".

OK, we've now established that the other thread could also be in
acquire_global_lock or (for a very short time) in sys_thread_yield.

> I have an idea where you might be going with that

I was merely pointing out that we cannot disable the signal handling
as a means to solve the problem.

> but that idea wouldn't work (to release the lock from the signalling
> thread, not the signalled thread that holds it).

Maybe we have a misunderstanding here.  I was talking about this part
of post_acquire_global_lock:

  /* We could have been signaled while waiting to grab the global lock
     for the first time since this thread was created, in which case
     we didn't yet have the opportunity to set up the handlers.  Delay
     raising the signal in that case (it will be actually raised when
     the thread comes here after acquiring the lock the next time).  */
  if (!NILP (current_thread->error_symbol) && handlerlist)
    {
      Lisp_Object sym = current_thread->error_symbol;
      Lisp_Object data = current_thread->error_data;

      current_thread->error_symbol = Qnil;
      current_thread->error_data = Qnil;
      Fsignal (sym, data);
    }

In this part, we have already switched to the thread that has been
signaled, so we are in the signaled thread, not in the signaling
thread.  I meant to release the lock before the call to Fsignal here.

> > If the problem is with missing events,
> > then which events are those, and what bad things will happen if we
> > miss them?
>
> All events that glib knows about but Emacs doesn't. For example, a
> glib timeout is apparently used to achieve some kind of scroll effect
> on GTK menus, which is why we call xg_select from xmenu.c.
>
> I don't know which libraries use glib-based threads, but I think dbus
> does, too.
>
> I believe, but am not certain, that this also includes X events when
> using GTK. That would explain the frozen sessions.

So is the problem that the Glib context is locked "forever", or is the
problem that it's locked by another Lisp thread, even if this lock is
short-lived?  If the former, then arranging for the release of that
lock when the signaled thread exits would solve the problem, right?
And if the problem is the latter one, then why didn't we hear about
this much earlier?  Can you show the bad effect from missing these
events without signaling a thread?

Thanks.
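
For illustration only, here is a rough standalone sketch of the
"release the lock before the call to Fsignal" idea, using plain
pthreads.  Nothing in it is Emacs code: the_lock, struct toy_thread,
toy_post_acquire and toy_demo are hypothetical names, a single mutex
stands in for whichever lock is meant above, and "raising" the error
is modeled by storing a value rather than by a non-local exit.  It
assumes only that a thread which exits while owning a default pthread
mutex leaves that mutex locked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lock under discussion.  */
static pthread_mutex_t the_lock = PTHREAD_MUTEX_INITIALIZER;

struct toy_thread
{
  int pending_error;    /* 0 plays the role of a nil error_symbol */
  bool handlers_ready;  /* plays the role of a non-null handlerlist */
  int raised;           /* records the "raised" error; 0 if none */
};

/* Called with the_lock held.  If an error is pending and handlers are
   set up, clear it, drop the lock, and only then "raise" it.  Returns
   true when the lock was released here.  */
static bool
toy_post_acquire (struct toy_thread *self)
{
  if (self->pending_error != 0 && self->handlers_ready)
    {
      int sym = self->pending_error;

      self->pending_error = 0;
      pthread_mutex_unlock (&the_lock);  /* release before raising */
      self->raised = sym;                /* stands in for Fsignal */
      return true;
    }
  return false;  /* delivery is deferred; the lock is still held */
}

static void *
toy_thread_body (void *arg)
{
  struct toy_thread *self = arg;

  pthread_mutex_lock (&the_lock);
  if (!toy_post_acquire (self))
    pthread_mutex_unlock (&the_lock);
  return NULL;  /* the thread exits without owning the lock */
}

/* Runs one signaled thread to completion.  Returns the error it
   raised (0 if delivery was deferred), or -1 if the lock was left
   locked after the thread exited.  */
int
toy_demo (bool handlers_ready)
{
  struct toy_thread t = { 42, handlers_ready, 0 };
  pthread_t tid;
  int rc = pthread_create (&tid, NULL, toy_thread_body, &t);

  assert (rc == 0);
  pthread_join (tid, NULL);
  if (pthread_mutex_trylock (&the_lock) != 0)
    return -1;
  pthread_mutex_unlock (&the_lock);
  return t.raised;
}
```

Calling toy_demo (true) models a thread whose handlers are set up: the
error is raised only after the lock has been dropped, so the caller
can take the lock again afterwards.  toy_demo (false) models the
deferred case described in the comment quoted from
post_acquire_global_lock.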