From: Pip Cet <pipcet@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 36609@debbugs.gnu.org, politza@hochschule-trier.de
Subject: bug#36609: 27.0.50; Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 14:37:25 +0000
Message-ID: <CAOqdjBeMgjZapif+2bBVnoGU--r9H57r=7PvRBzG4PjidS9PQA@mail.gmail.com>
In-Reply-To: <8336jb2fhq.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 2523 bytes --]
On Fri, Jul 12, 2019 at 7:57 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > I'm now convinced that there simply is no safe way to call select()
> > from two threads at once when using glib.
>
> I hope not, although GTK with its idiosyncrasies caused a lot of pain
> for the threads implementation in Emacs.
Well, I think we're going to have to do one or more of the following:
- have a race condition
- access glib-locked data from the "wrong" thread (another Emacs thread)
- release the glib lock from the "wrong" thread (another Emacs thread)
Of these, the second is the best alternative, I think: we simply grab
the g_main_context lock globally, acting for all Emacs threads, and
the last thread that needs it releases it when it leaves xg_select. As
long as there's at least one thread in the critical section of
xg_select, we hold the lock, but access to the context isn't
necessarily from the thread which locked it.
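A minimal sketch of the "first thread in acquires, last thread out releases" scheme described above, in plain C with POSIX threads. All names here are hypothetical: a boolean stands in for the g_main_context lock, and a pthread mutex plays the role the global Lisp lock plays in Emacs (serializing updates to the shared count). It is an illustration under those assumptions, not the code in the attached patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Glib main context lock: we only
   track whether it is currently held.  */
static bool underlying_held = false;

static void
underlying_acquire (void)
{
  assert (!underlying_held);
  underlying_held = true;
}

static void
underlying_release (void)
{
  assert (underlying_held);
  underlying_held = false;
}

/* Number of threads currently inside the critical section.  In Emacs
   the global Lisp lock would serialize these updates; here a mutex
   plays that role.  */
static long threads_inside = 0;
static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;

/* The first thread to enter acquires the underlying lock on behalf
   of all threads.  */
static void
shared_lock_enter (void)
{
  pthread_mutex_lock (&count_mutex);
  if (threads_inside++ == 0)
    underlying_acquire ();
  pthread_mutex_unlock (&count_mutex);
}

/* The last thread to leave releases it, whichever thread that is,
   which need not be the thread that acquired it.  */
static void
shared_lock_leave (void)
{
  pthread_mutex_lock (&count_mutex);
  assert (threads_inside > 0);
  if (--threads_inside == 0)
    underlying_release ();
  pthread_mutex_unlock (&count_mutex);
}
```

With two threads A and B, the lock stays held while either is inside: A enters (lock taken), B enters, A leaves (lock still held), and only when B leaves is the lock released.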
> > I think our options are hacking around it and hoping nothing breaks
> > (this is what the attached patch does; it releases the main context
> > glib lock from the wrong thread soon "after" the other thread called
> > select, but there's actually no way to ensure that "after" is
> > accurate), or rewriting things so we have a single thread that does
> > all the select()ing.
>
> Hmm... how would this work with your patch? Suppose one thread calls
> xg_select, acquires the Glib lock, sets its holding_glib_lock flag,
> then releases the global Lisp lock and calls pselect. Since the
> global Lisp lock is now up for grabs, some other Lisp thread can
> acquire it and start running.
And when it starts running, it releases the Glib lock.
> If that other thread then calls
> xg_select, it will hang forever trying to acquire the Glib lock,
> because the first thread that holds it is stuck in pselect.
The first thread no longer holds the Glib lock; it was released when
we switched threads.
> I know very little about GTK and the Glib context lock, but AFAIR we
> really must treat that lock as a global one, not a thread-local one.
> So I think it's okay for one thread to take the Glib lock, and another
> to release it, because Glib just wants to know whether the "rest of
> the program" has it, it doesn't care which thread is that which holds
> the lock.
Okay, that sounds like option #2 above. The attached patch exposes
glib externals to the generic code, but it appears to work. If you
think the approach is okay, I'll move the glib-specific parts into
xgselect.c.
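The patch keeps a per-thread flag (holding_glib_lock) next to the shared count, so that each Emacs thread is counted at most once and releasing is harmless when repeated. A minimal sketch of that bookkeeping follows, assuming every call is already serialized by one global lock (as the global Lisp lock does in Emacs). `struct sketch_thread`, `thread_enter`, `thread_leave`, and `context_locked` are hypothetical names, and a boolean replaces the real Glib calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread state, analogous to the holding_glib_lock
   member of struct thread_state.  */
struct sketch_thread
{
  bool holding_lock;
};

/* Shared state.  ASSUMPTION: every call below is made while holding
   a global lock, so no extra synchronization is used here.  */
static ptrdiff_t threads_holding = 0;
static bool context_locked = false;   /* stand-in for the Glib lock */

static void
thread_enter (struct sketch_thread *t)
{
  if (t->holding_lock)
    return;                  /* already counted; don't count twice */
  if (threads_holding++ == 0)
    context_locked = true;   /* first thread in takes the lock */
  t->holding_lock = true;
}

/* Safe to call more than once for the same thread, e.g. once from
   the select wrapper around a thread switch and again when leaving
   xg_select.  The flag is cleared whether or not the count drops to
   zero, so a thread is never subtracted twice.  */
static void
thread_leave (struct sketch_thread *t)
{
  if (!t->holding_lock)
    return;
  t->holding_lock = false;
  if (--threads_holding == 0)
    context_locked = false;  /* last thread out releases it */
}
```

The design point is that the flag, not the count, decides whether a thread still owes a release; clearing it only when the count reaches zero would let the same thread decrement the count twice.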
[-- Attachment #2: glib-hack-002.diff --]
[-- Type: text/x-patch, Size: 5894 bytes --]
diff --git a/src/thread.c b/src/thread.c
index e2deadd7a8..0ddb79460b 100644
--- a/src/thread.c
+++ b/src/thread.c
@@ -19,6 +19,9 @@ Copyright (C) 2012-2019 Free Software Foundation, Inc.
#include <config.h>
#include <setjmp.h>
+#ifdef HAVE_GLIB
+#include <glib.h>
+#endif
#include "lisp.h"
#include "character.h"
#include "buffer.h"
@@ -82,7 +85,7 @@ post_acquire_global_lock (struct thread_state *self)
/* Do this early on, so that code below could signal errors (e.g.,
unbind_for_thread_switch might) correctly, because we are already
- running in the context of the thread pointed by SELF. */
+ running in the context of the thread pointed to by SELF. */
current_thread = self;
if (prev_thread != current_thread)
@@ -586,6 +589,17 @@ really_call_select (void *arg)
sa->result = (sa->func) (sa->max_fds, sa->rfds, sa->wfds, sa->efds,
sa->timeout, sa->sigmask);
+#ifdef HAVE_GLIB
+ /* Release the Glib lock, if there are no other threads in the
+ critical section. */
+ if (current_thread != NULL && current_thread->holding_glib_lock)
+ {
+ current_thread->holding_glib_lock = false;
+ if (--threads_holding_glib_lock == 0)
+ g_main_context_release (glib_main_context);
+ }
+#endif
+
block_interrupt_signal (&oldset);
/* If we were interrupted by C-g while inside sa->func above, the
signal handler could have called maybe_reacquire_global_lock, in
@@ -756,6 +770,13 @@ run_thread (void *state)
}
}
+#ifdef HAVE_GLIB
+ /* Remember to release the Glib lock we might still be holding
+ (?) */
+ if (current_thread->holding_glib_lock)
+ if (--threads_holding_glib_lock == 0)
+ g_main_context_release (glib_main_context);
+#endif
current_thread = NULL;
sys_cond_broadcast (&self->thread_condvar);
diff --git a/src/thread.h b/src/thread.h
index 498b9909c9..1a58f65c88 100644
--- a/src/thread.h
+++ b/src/thread.h
@@ -29,9 +29,18 @@ #define THREAD_H
#include <signal.h> /* sigset_t */
#endif
+#ifdef HAVE_GLIB
+#include <glib.h>
+#endif
+
#include "sysselect.h" /* FIXME */
#include "systhread.h"
+#ifdef HAVE_GLIB
+extern ptrdiff_t threads_holding_glib_lock;
+extern GMainContext *glib_main_context;
+#endif
+
struct thread_state
{
union vectorlike_header header;
@@ -169,6 +178,9 @@ #define getcjmp (current_thread->m_getcjmp)
interrupter should broadcast to this condition. */
sys_cond_t *wait_condvar;
+#ifdef HAVE_GLIB
+ bool holding_glib_lock;
+#endif
/* This thread might have released the global lock. If so, this is
non-zero. When a thread runs outside thread_select with this
flag non-zero, it means it has been interrupted by SIGINT while
diff --git a/src/xgselect.c b/src/xgselect.c
index 9982a1f0e9..0c95857ef9 100644
--- a/src/xgselect.c
+++ b/src/xgselect.c
@@ -29,6 +29,9 @@ Copyright (C) 2009-2019 Free Software Foundation, Inc.
#include "blockinput.h"
#include "systime.h"
+ptrdiff_t threads_holding_glib_lock;
+GMainContext *glib_main_context;
+
/* `xg_select' is a `pselect' replacement. Why do we need a separate function?
1. Timeouts. Glib and Gtk rely on timer events. If we did pselect
with a greater timeout then the one scheduled by Glib, we would
@@ -54,26 +57,28 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds,
GPollFD *gfds = gfds_buf;
int gfds_size = ARRAYELTS (gfds_buf);
int n_gfds, retval = 0, our_fds = 0, max_fds = fds_lim - 1;
- bool context_acquired = false;
int i, nfds, tmo_in_millisec, must_free = 0;
bool need_to_dispatch;
context = g_main_context_default ();
- context_acquired = g_main_context_acquire (context);
- /* FIXME: If we couldn't acquire the context, we just silently proceed
- because this function handles more than just glib file descriptors.
- Note that, as implemented, this failure is completely silent: there is
- no feedback to the caller. */
+ /* Acquire the lock. This is a busy wait for testing. */
+ if (current_thread != NULL && !current_thread->holding_glib_lock)
+ {
+ if (threads_holding_glib_lock++ == 0)
+ while (!g_main_context_acquire (context))
+ {
+ }
+ current_thread->holding_glib_lock = true;
+ glib_main_context = context;
+ }
if (rfds) all_rfds = *rfds;
else FD_ZERO (&all_rfds);
if (wfds) all_wfds = *wfds;
else FD_ZERO (&all_wfds);
- n_gfds = (context_acquired
- ? g_main_context_query (context, G_PRIORITY_LOW, &tmo_in_millisec,
- gfds, gfds_size)
- : -1);
+ n_gfds = g_main_context_query (context, G_PRIORITY_LOW, &tmo_in_millisec,
+ gfds, gfds_size);
if (gfds_size < n_gfds)
{
@@ -151,8 +156,19 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds,
#else
need_to_dispatch = true;
#endif
- if (need_to_dispatch && context_acquired)
+ if (need_to_dispatch)
{
+ /* Acquire the lock. This is a busy wait for testing. */
+ glib_main_context = context;
+ if (!current_thread->holding_glib_lock)
+ {
+ if (threads_holding_glib_lock++ == 0)
+ while (!g_main_context_acquire (context))
+ {
+ }
+ current_thread->holding_glib_lock = true;
+ }
+
int pselect_errno = errno;
/* Prevent g_main_dispatch recursion, that would occur without
block_input wrapper, because event handlers call
@@ -164,8 +180,12 @@ xg_select (int fds_lim, fd_set *rfds, fd_set *wfds, fd_set *efds,
errno = pselect_errno;
}
- if (context_acquired)
- g_main_context_release (context);
+ if (current_thread != NULL && current_thread->holding_glib_lock)
+ {
+ current_thread->holding_glib_lock = false;
+ if (--threads_holding_glib_lock == 0)
+ g_main_context_release (context);
+ }
/* To not have to recalculate timeout, return like this. */
if ((our_fds > 0 || (nfds == 0 && tmop == &tmo)) && (retval == 0))