From: Pip Cet <pipcet@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 36609@debbugs.gnu.org, politza@hochschule-trier.de
Subject: bug#36609: 27.0.50; Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 14:37:25 +0000
Message-ID: <CAOqdjBeMgjZapif+2bBVnoGU--r9H57r=7PvRBzG4PjidS9PQA@mail.gmail.com>
In-Reply-To: <8336jb2fhq.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2523 bytes --]

On Fri, Jul 12, 2019 at 7:57 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > I'm now convinced that there simply is no safe way to call select()
> > from two threads at once when using glib.
>
> I hope not, although GTK with its idiosyncrasies caused a lot of pain
> for the threads implementation in Emacs.

Well, I think we're going to have to do one or more of the following:

- have a race condition
- access glib-locked data from the "wrong" thread (another Emacs thread)
- release the glib lock from the "wrong" thread (another Emacs thread)

Of these, the second is the best alternative, I think: we simply grab
the g_main_context lock globally, acting for all Emacs threads, and
the last thread to require it releases it when it leaves xg_select. As
long as there's at least one thread in the critical section of
xg_select, we hold the lock, but access to the context isn't
necessarily from the thread which locked it.
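
As a rough illustration of that scheme, here is a minimal,
single-threaded sketch of a lock shared by counting the threads inside
the critical section. Everything in it is hypothetical and only for
illustration: `context_locked` stands in for the real
g_main_context_acquire/g_main_context_release calls, and
`shared_lock_enter`/`shared_lock_leave` are made-up names, not
functions from Emacs or glib. It also assumes something else
serializes access to the counter, which real code would have to
guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the glib main context lock; real code
   would call g_main_context_acquire and g_main_context_release.  */
static bool context_locked;

/* Number of threads currently inside the critical section.  This
   sketch assumes accesses are serialized by some other means.  */
static ptrdiff_t threads_holding_lock;

struct thread_state
{
  bool holding_lock;  /* True while this thread is counted above.  */
};

/* Enter the critical section: the first thread in takes the lock.  */
static void
shared_lock_enter (struct thread_state *t)
{
  if (t->holding_lock)
    return;
  if (threads_holding_lock++ == 0)
    context_locked = true;
  t->holding_lock = true;
}

/* Leave the critical section: the last thread out drops the lock,
   even if a different thread originally took it.  */
static void
shared_lock_leave (struct thread_state *t)
{
  if (!t->holding_lock)
    return;
  t->holding_lock = false;
  if (--threads_holding_lock == 0)
    context_locked = false;
}
```

With two threads A and B, A entering takes the lock, B entering only
bumps the counter, and whichever of them leaves last releases the
lock. Note that the per-thread flag is cleared on every exit, not only
when the counter reaches zero.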

> > I think our options are hacking around it and hoping nothing breaks
> > (this is what the attached patch does; it releases the main context
> > glib lock from the wrong thread soon "after" the other thread called
> > select, but there's actually no way to ensure that "after" is
> > accurate), or rewriting things so we have a single thread that does
> > all the select()ing.
>
> Hmm... how would this work with your patch?  Suppose one thread calls
> xg_select, acquires the Glib lock, sets its holding_glib_lock flag,
> then releases the global Lisp lock and calls pselect.  Since the
> global Lisp lock is now up for grabs, some other Lisp thread can
> acquire it and start running.

And when it starts running, it releases the Glib lock.

> If that other thread then calls
> xg_select, it will hang forever trying to acquire the Glib lock,
> because the first thread that holds it is stuck in pselect.

The first thread no longer holds the Glib lock; it was released when
we switched threads.

> I know very little about GTK and the Glib context lock, but AFAIR we
> really must treat that lock as a global one, not a thread-local one.
> So I think it's okay for one thread to take the Glib lock, and another
> to release it, because Glib just wants to know whether the "rest of
> the program" has it; it doesn't care which thread holds the lock.

Okay, that sounds like option #2 above. The attached patch exposes
glib internals to the generic code, but it appears to work. If you
think the approach is okay, I'll move the glib-specific parts to
xgselect.c.

[-- Attachment #2: glib-hack-002.diff --]
[-- Type: text/x-patch, Size: 5894 bytes --]

diff --git a/src/thread.c b/src/thread.c
index e2deadd7a8..0ddb79460b 100644
--- a/src/thread.c
+++ b/src/thread.c
@@ -19,6 +19,9 @@ Copyright (C) 2012-2019 Free Software Foundation, Inc.
 
 #include <config.h>
 #include <setjmp.h>
+#ifdef HAVE_GLIB
+#include <glib.h>
+#endif
 #include "lisp.h"
 #include "character.h"
 #include "buffer.h"
@@ -82,7 +85,7 @@ post_acquire_global_lock (struct thread_state *self)
 
   /* Do this early on, so that code below could signal errors (e.g.,
      unbind_for_thread_switch might) correctly, because we are already
-     running in the context of the thread pointed by SELF.  */
+     running in the context of the thread pointed to by SELF.  */
   current_thread = self;
 
   if (prev_thread != current_thread)
@@ -586,6 +589,17 @@ really_call_select (void *arg)
   sa->result = (sa->func) (sa->max_fds, sa->rfds, sa->wfds, sa->efds,
 			   sa->timeout, sa->sigmask);
 
+#ifdef HAVE_GLIB
+  /* Release the Glib lock, if there are no other threads in the
+     critical section.  */
+  if (current_thread != NULL && current_thread->holding_glib_lock)
+    {
+      current_thread->holding_glib_lock = false;
+      if (--threads_holding_glib_lock == 0)
+	g_main_context_release (glib_main_context);
+    }
+#endif
+
   block_interrupt_signal (&oldset);
   /* If we were interrupted by C-g while inside sa->func above, the
      signal handler could have called maybe_reacquire_global_lock, in
@@ -756,6 +770,13 @@ run_thread (void *state)
       }
   }
 
+#ifdef HAVE_GLIB
+  /* Remember to release the Glib lock we might still be holding
+     (?)  */
+  if (current_thread->holding_glib_lock)
+    if (--threads_holding_glib_lock == 0)
+      g_main_context_release (glib_main_context);
+#endif
   current_thread = NULL;
   sys_cond_broadcast (&self->thread_condvar);
 
diff --git a/src/thread.h b/src/thread.h
index 498b9909c9..1a58f65c88 100644
--- a/src/thread.h
+++ b/src/thread.h
@@ -29,9 +29,18 @@ #define THREAD_H
 #include <signal.h>		/* sigset_t */
 #endif
 
+#ifdef HAVE_GLIB
+#include <glib.h>
+#endif
+
 #include "sysselect.h"		/* FIXME */
 #include "systhread.h"
 
+#ifdef HAVE_GLIB
+extern ptrdiff_t threads_holding_glib_lock;
+extern GMainContext *glib_main_context;
+#endif
+
 struct thread_state
 {
   union vectorlike_header header;
@@ -169,6 +178,9 @@ #define getcjmp (current_thread->m_getcjmp)
      interrupter should broadcast to this condition.  */
   sys_cond_t *wait_condvar;
 
+#ifdef HAVE_GLIB
+  bool holding_glib_lock;
+#endif
   /* This thread might have released the global lock.  If so, this is
      non-zero.  When a thread runs outside thread_select with this
      flag non-zero, it means it has been interrupted by SIGINT while
diff --git a/src/xgselect.c b/src/xgselect.c
index 9982a1f0e9..0c95857ef9 100644
--- a/src/xgselect.c
+++ b/src/xgselect.c
@@ -29,6 +29,9 @@ Copyright (C) 2009-2019 Free Software Foundation, Inc.
 #include "blockinput.h"
 #include "systime.h"
 
+ptrdiff_t threads_holding_glib_lock;
+GMainContext *glib_main_context;
+
 /* `xg_select' is a `pselect' replacement.  Why do we need a separate function?
    1. Timeouts.  Glib and Gtk rely on timer events.  If we did pselect
       with a greater timeout then the one scheduled by Glib, we would
@@ -54,26 +57,28 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds,
   GPollFD *gfds = gfds_buf;
   int gfds_size = ARRAYELTS (gfds_buf);
   int n_gfds, retval = 0, our_fds = 0, max_fds = fds_lim - 1;
-  bool context_acquired = false;
   int i, nfds, tmo_in_millisec, must_free = 0;
   bool need_to_dispatch;
 
   context = g_main_context_default ();
-  context_acquired = g_main_context_acquire (context);
-  /* FIXME: If we couldn't acquire the context, we just silently proceed
-     because this function handles more than just glib file descriptors.
-     Note that, as implemented, this failure is completely silent: there is
-     no feedback to the caller.  */
+  /* Acquire the lock.  This is a busy wait for testing.  */
+  if (current_thread != NULL && !current_thread->holding_glib_lock)
+    {
+      if (threads_holding_glib_lock++ == 0)
+	while (!g_main_context_acquire (context))
+	  {
+	  }
+      current_thread->holding_glib_lock = true;
+      glib_main_context = context;
+    }
 
   if (rfds) all_rfds = *rfds;
   else FD_ZERO (&all_rfds);
   if (wfds) all_wfds = *wfds;
   else FD_ZERO (&all_wfds);
 
-  n_gfds = (context_acquired
-	    ? g_main_context_query (context, G_PRIORITY_LOW, &tmo_in_millisec,
-				    gfds, gfds_size)
-	    : -1);
+  n_gfds = g_main_context_query (context, G_PRIORITY_LOW, &tmo_in_millisec,
+				 gfds, gfds_size);
 
   if (gfds_size < n_gfds)
     {
@@ -151,8 +156,19 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds,
 #else
   need_to_dispatch = true;
 #endif
-  if (need_to_dispatch && context_acquired)
+  if (need_to_dispatch)
     {
+      /* Acquire the lock.  This is a busy wait for testing.  */
+      glib_main_context = context;
+      if (!current_thread->holding_glib_lock)
+	{
+	  if (threads_holding_glib_lock++ == 0)
+	    while (!g_main_context_acquire (context))
+	      {
+	      }
+	  current_thread->holding_glib_lock = true;
+	}
+
       int pselect_errno = errno;
       /* Prevent g_main_dispatch recursion, that would occur without
          block_input wrapper, because event handlers call
@@ -164,8 +180,12 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds,
       errno = pselect_errno;
     }
 
-  if (context_acquired)
-    g_main_context_release (context);
+  if (current_thread != NULL && current_thread->holding_glib_lock)
+    {
+      current_thread->holding_glib_lock = false;
+      if (--threads_holding_glib_lock == 0)
+	g_main_context_release (context);
+    }
 
   /* To not have to recalculate timeout, return like this.  */
   if ((our_fds > 0 || (nfds == 0 && tmop == &tmo)) && (retval == 0))
