* [PATCH] Final: thread lock nesting debugging
@ 2008-11-17 4:42 Linas Vepstas
2008-11-17 13:21 ` Han-Wen Nienhuys
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Linas Vepstas @ 2008-11-17 4:42 UTC (permalink / raw)
To: bug-guile
[-- Attachment #1: Type: text/plain, Size: 12796 bytes --]
I've been seeing all sorts of deadlocks in guile, and so I wrote a small
debugging utility to try to track down the problems. I'd like to see
this patch included in future versions of guile.
I found one bug with this tool, and have submitted a patch for that
already. It looks like there's another bug involving signals --
there is a probable deadlock in garbage collection if a signal
is sent at just the wrong time. The bug can be seen by enabling
this patch, and then running 'make check'. I'm going to ignore
this, as I'm not worried about signals right now.
This is my "final" version of this patch, I'd sent a beta version
a few days ago. It's "final" because I'm not anticipating any
further changes.
--linas
Add a deadlock debugging facility to guile.
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
---
config.h.in | 3
configure.in | 11 +++
libguile/Makefile.am | 2
libguile/debug-locks.c | 159 +++++++++++++++++++++++++++++++++++++++++++++
libguile/pthread-threads.h | 15 ++++
libguile/threads.c | 53 +++++++++++++++
libguile/threads.h | 8 ++
7 files changed, 250 insertions(+), 1 deletion(-)
Index: guile-1.8.5/libguile/pthread-threads.h
===================================================================
--- guile-1.8.5.orig/libguile/pthread-threads.h 2008-11-16 18:57:19.000000000 -0600
+++ guile-1.8.5/libguile/pthread-threads.h 2008-11-16 18:57:48.000000000 -0600
@@ -91,6 +91,21 @@ extern pthread_mutexattr_t scm_i_pthread
#define scm_i_scm_pthread_cond_wait scm_pthread_cond_wait
#define scm_i_scm_pthread_cond_timedwait scm_pthread_cond_timedwait
+#ifdef GUILE_DEBUG_LOCKS
+#undef scm_i_pthread_mutex_lock
+#define scm_i_pthread_mutex_lock(ARG) scm_i_pthread_mutex_lock_dbg(ARG, #ARG)
+
+#undef scm_i_pthread_mutex_unlock
+#define scm_i_pthread_mutex_unlock(ARG) scm_i_pthread_mutex_unlock_dbg(ARG, #ARG)
+
+int scm_i_pthread_mutex_lock_dbg(pthread_mutex_t *, const char *);
+int scm_i_pthread_mutex_unlock_dbg(pthread_mutex_t *, const char *);
+
+void prt_lockholders(void);
+void prt_this_lockholder(void);
+
+#endif
+
#endif /* SCM_PTHREADS_THREADS_H */
/*
Index: guile-1.8.5/libguile/threads.c
===================================================================
--- guile-1.8.5.orig/libguile/threads.c 2008-11-16 18:57:19.000000000 -0600
+++ guile-1.8.5/libguile/threads.c 2008-11-16 18:57:48.000000000 -0600
@@ -441,6 +441,24 @@ guilify_self_1 (SCM_STACKITEM *base)
SCM_SET_FREELIST_LOC (scm_i_freelist, &t->freelist);
SCM_SET_FREELIST_LOC (scm_i_freelist2, &t->freelist2);
+#ifdef GUILE_DEBUG_LOCKS
+ int i, j;
+ for(i=0; i<LOCK_STACK_DEPTH; i++)
+ {
+ t->lockname[i] = NULL;
+ t->lockmutex[i] = NULL;
+ for(j=0; j<TRACE_STACK_DEPTH; j++)
+ {
+ t->lockholder[i][j] = NULL;
+ }
+ }
+ if (scm_initialized_p == 0)
+ {
+ t->lockname[0] = "&scm_i_init_mutex";
+ t->lockmutex[0] = &scm_i_init_mutex;
+ }
+#endif
+
scm_i_pthread_setspecific (scm_i_thread_key, t);
scm_i_pthread_mutex_lock (&t->heap_mutex);
@@ -1624,8 +1642,21 @@ scm_i_thread_wake_up ()
scm_i_thread *t;
scm_i_pthread_cond_broadcast (&wake_up_cond);
+#ifndef GUILE_DEBUG_LOCKS
for (t = all_threads; t; t = t->next_thread)
scm_i_pthread_mutex_unlock (&t->heap_mutex);
+#else
+ /* Unlock in reverse order from locking */
+ scm_i_thread *tt = NULL;
+ while (tt != all_threads)
+ {
+ scm_i_thread *tp = NULL;
+ for (t = all_threads; t != tt; t = t->next_thread)
+ tp = t;
+ scm_i_pthread_mutex_unlock (&tp->heap_mutex);
+ tt = tp;
+ }
+#endif
scm_i_pthread_mutex_unlock (&thread_admin_mutex);
scm_enter_guile ((scm_t_guile_ticket) SCM_I_CURRENT_THREAD);
}
@@ -1721,6 +1752,28 @@ scm_init_threads_default_dynamic_state (
scm_i_default_dynamic_state = scm_permanent_object (state);
}
+#ifdef GUILE_DEBUG_LOCKS
+extern int guile_do_abort_on_badlock;
+
+void prt_one_lockholder(scm_i_thread *);
+void prt_lockholders(void)
+{
+ scm_i_thread *t;
+
+ if (!guile_do_abort_on_badlock) return;
+
+ for (t = all_threads; t; t = t->next_thread)
+ {
+ prt_one_lockholder(t);
+ }
+}
+
+void prt_this_lockholder(void)
+{
+ prt_one_lockholder(SCM_I_CURRENT_THREAD);
+}
+#endif
+
void
scm_init_thread_procs ()
{
Index: guile-1.8.5/libguile/threads.h
===================================================================
--- guile-1.8.5.orig/libguile/threads.h 2008-11-16 18:57:19.000000000 -0600
+++ guile-1.8.5/libguile/threads.h 2008-11-16 18:57:48.000000000 -0600
@@ -108,6 +108,14 @@ typedef struct scm_i_thread {
SCM_STACKITEM *top;
jmp_buf regs;
+#ifdef GUILE_DEBUG_LOCKS
+#define LOCK_STACK_DEPTH 250
+#define TRACE_STACK_DEPTH 4
+ const char *lockname[LOCK_STACK_DEPTH];
+ char *lockholder[LOCK_STACK_DEPTH][TRACE_STACK_DEPTH];
+ pthread_mutex_t *lockmutex[LOCK_STACK_DEPTH];
+#endif
+
} scm_i_thread;
#define SCM_I_IS_THREAD(x) SCM_SMOB_PREDICATE (scm_tc16_thread, x)
Index: guile-1.8.5/configure.in
===================================================================
--- guile-1.8.5.orig/configure.in 2008-11-16 18:57:19.000000000 -0600
+++ guile-1.8.5/configure.in 2008-11-16 18:57:48.000000000 -0600
@@ -122,6 +122,13 @@ AC_ARG_ENABLE(debug-malloc,
[Define this if you want to debug scm_must_malloc/realloc/free calls.])
fi)
+AC_ARG_ENABLE(debug-locks,
+ [ --enable-debug-locks include thread lock debugging code],
+ if test "$enable_debug_locks" = y || test "$enable_debug_locks" = yes; then
+ AC_DEFINE(GUILE_DEBUG_LOCKS, 1,
+ [Define this if you want to debug pthread lock nesting and deadlock trouble.])
+ fi)
+
SCM_I_GSC_GUILE_DEBUG=0
AC_ARG_ENABLE(guile-debug,
[AC_HELP_STRING([--enable-guile-debug],
@@ -263,6 +270,10 @@ if test "$enable_debug_malloc" = yes; th
AC_LIBOBJ([debug-malloc])
fi
+if test "$enable_debug_locks" = yes; then
+ AC_LIBOBJ([debug-locks])
+fi
+
if test "$enable_elisp" = yes; then
SCM_I_GSC_ENABLE_ELISP=1
else
Index: guile-1.8.5/config.h.in
===================================================================
--- guile-1.8.5.orig/config.h.in 2008-11-16 18:57:19.000000000 -0600
+++ guile-1.8.5/config.h.in 2008-11-16 18:57:48.000000000 -0600
@@ -42,6 +42,9 @@ Boston, MA 02110-1301, USA.
/* Define this if you want to debug scm_must_malloc/realloc/free calls. */
#undef GUILE_DEBUG_MALLOC
+/* Define this if you want to debug thread locking and deadlocks. */
+#undef GUILE_DEBUG_LOCKS
+
/* The imaginary unit (positive square root of -1). */
#undef GUILE_I
Index: guile-1.8.5/libguile/Makefile.am
===================================================================
--- guile-1.8.5.orig/libguile/Makefile.am 2008-11-16 18:57:19.000000000 -0600
+++ guile-1.8.5/libguile/Makefile.am 2008-11-16 18:57:48.000000000 -0600
@@ -159,7 +159,7 @@ EXTRA_libguile_la_SOURCES = _scm.h \
inet_aton.c memmove.c putenv.c strerror.c \
dynl.c regex-posix.c \
filesys.c posix.c net_db.c socket.c \
- debug-malloc.c mkstemp.c \
+ debug-locks.c debug-malloc.c mkstemp.c \
win32-uname.c win32-dirent.c win32-socket.c
## delete guile-snarf.awk from the installation bindir, in case it's
Index: guile-1.8.5/libguile/debug-locks.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ guile-1.8.5/libguile/debug-locks.c 2008-11-16 20:36:22.000000000 -0600
@@ -0,0 +1,159 @@
+/* Copyright (C) 2008 Free Software Foundation, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+/*
+ * Utilities for tracing/debugging deadlocks. Conditionally compiled by
+ * requesting ./configure --enable-debug-locks. The functions here
+ * replace the pthread mutex lock and unlock routines to record where
+ * a lock is taken, by recording a short snippet of the stack. The
+ * logs of held locks are on a thread-by-thread basis, so that deadlocks
+ * across different threads can be debugged.
+ *
+ * The primary check is to make sure that locks are always unlocked in
+ * reverse order (nested order), as out-of-sequence locks typically
+ * result in deadlocks. One exception is made: if the lock is the
+ * thread heap_mutex lock (i.e. the lock that defines "guile mode"),
+ * it's allowed to be unlocked out of sequence, as a special case.
+ * If a bad unlock sequence is detected, then abort is called, to put
+ * the system into the debugger.
+ *
+ * If guile still deadlocks, without triggering an error, you might
+ * find the prt_lockholders() function to be useful: from within gdb,
+ * just say "call prt_lockholders()" to list all locks held by all
+ * threads.
+ *
+ * CAUTION: Turning this on leads to a *severe* performance degradation.
+ */
+
+#include <pthread.h>
+#include <execinfo.h>
+#include "_scm.h"
+
+extern void prt_lockholders(void);
+
+int guile_do_abort_on_badlock = 0;
+
+void prt_one_lockholder(scm_i_thread *);
+void prt_one_lockholder(scm_i_thread *t)
+{
+ int i, j;
+
+ fprintf (stderr, "\nThread %p\n", (void *) t->pthread);
+ for (i=0; i<LOCK_STACK_DEPTH; i++)
+ {
+ if (NULL == t->lockname[i]) break;
+ fprintf(stderr, "%d: %s (%p) in:\n", i, t->lockname[i], t->lockmutex[i]);
+ for (j=0; j<TRACE_STACK_DEPTH; j++)
+ {
+ if (NULL == t->lockholder[i][j]) break;
+ fprintf(stderr, "\t%s\n", t->lockholder[i][j]);
+ }
+ }
+}
+
+
+int scm_i_pthread_mutex_lock_dbg(pthread_mutex_t *mtx, const char *lockstr)
+{
+ int i,j;
+ int rc = pthread_mutex_lock(mtx);
+ scm_i_thread *tp = SCM_I_CURRENT_THREAD;
+
+ if (NULL == tp)
+ return rc;
+
+ for (i=0; i<LOCK_STACK_DEPTH; i++)
+ {
+ if (0x0 == tp->lockname[i])
+ {
+ void * b[TRACE_STACK_DEPTH+1];
+ int sz = backtrace(b, TRACE_STACK_DEPTH+1);
+ char **s = backtrace_symbols(b, sz);
+ for (j=0; j<sz-1; j++)
+ {
+ tp->lockholder[i][j] = strdup(s[j+1]);
+ }
+ tp->lockname[i] = lockstr;
+ tp->lockmutex[i] = mtx;
+ free (s);
+ break;
+ }
+ }
+
+ if (LOCK_STACK_DEPTH <= i)
+ {
+ fprintf(stderr, "Error: thread is holding too many locks\n");
+ prt_lockholders();
+ if (guile_do_abort_on_badlock) abort();
+ }
+
+ return rc;
+}
+
+int scm_i_pthread_mutex_unlock_dbg(pthread_mutex_t *mtx, const char * lockstr)
+{
+ int i,j;
+
+ scm_i_thread *tp = SCM_I_CURRENT_THREAD;
+ if (NULL == tp)
+ return pthread_mutex_unlock(mtx);
+
+ for(i=LOCK_STACK_DEPTH-1; i>=0; i--)
+ {
+ if (0x0 == tp->lockname[i])
+ continue;
+
+ /* Allows the crazy nested two-step involving the heap_mutex */
+ if ((tp->lockmutex[i] != mtx) &&
+ ((tp->lockmutex[i] != &tp->heap_mutex) || (0 == i) ||
+ (tp->lockmutex[i-1] != mtx)))
+ {
+ fprintf(stderr, "Error: unlocking is badly nested: "
+ "Attempting to unlock %s (%p)\n",
+ lockstr, mtx);
+ prt_one_lockholder(tp);
+ // prt_lockholders();
+ if (guile_do_abort_on_badlock) abort();
+ }
+ if (0 < i && tp->lockmutex[i-1] == mtx)
+ {
+ tp->lockname[i-1] = tp->lockname[i];
+ tp->lockmutex[i-1] = tp->lockmutex[i];
+ for (j=0; j<TRACE_STACK_DEPTH; j++)
+ {
+ char * tmp = tp->lockholder[i-1][j];
+ tp->lockholder[i-1][j] = tp->lockholder[i][j];
+ tp->lockholder[i][j] = tmp;
+ }
+ }
+ tp->lockname[i] = NULL;
+ tp->lockmutex[i] = NULL;
+ for (j=0; j<TRACE_STACK_DEPTH; j++)
+ {
+ if(tp->lockholder[i][j]) free(tp->lockholder[i][j]);
+ tp->lockholder[i][j] = NULL;
+ }
+ break;
+ }
+
+ if (0 > i)
+ {
+ fprintf(stderr, "Error: unlocking a lock that's not held\n");
+ prt_lockholders();
+ if (guile_do_abort_on_badlock) abort();
+ }
+
+ return pthread_mutex_unlock(mtx);
+}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Final: thread lock nesting debugging
2008-11-17 4:42 [PATCH] Final: thread lock nesting debugging Linas Vepstas
@ 2008-11-17 13:21 ` Han-Wen Nienhuys
2008-11-18 13:26 ` Ludovic Courtès
2008-11-19 23:14 ` Neil Jerram
2 siblings, 0 replies; 8+ messages in thread
From: Han-Wen Nienhuys @ 2008-11-17 13:21 UTC (permalink / raw)
To: bug-guile
Linas Vepstas escreveu:
> I've been seeing all sorts of deadlocks in guile, and so I wrote a small
> debugging utility to try to track down the problems. I'd like to see
> this patch included in future versions of guile.
>
> I found one bug with this tool, and have submitted a patch for that
> already. It looks like there's another bug involving signals --
> there is a probably deadlock in garbage collection if a signal
> is sent at just the wrong time. The bug can be seen by enabling
> this patch, and then running 'make check'. I'm going to ignore
> this, as I'm not worried about signals right now.
>
> This is my "final" version of this patch, I'd sent a beta version
> a few days ago. Its "final" because I'm not anticipating any
> further changes.
Would it be possible to have this be a runtime switch? My experience
is that conditional code in GUILE (especially the debugging kind) tends
to bitrot very quickly. Also, it doesn't help anyone who is not compiling
their own GUILE.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Final: thread lock nesting debugging
2008-11-17 4:42 [PATCH] Final: thread lock nesting debugging Linas Vepstas
2008-11-17 13:21 ` Han-Wen Nienhuys
@ 2008-11-18 13:26 ` Ludovic Courtès
2008-11-19 23:14 ` Neil Jerram
2 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2008-11-18 13:26 UTC (permalink / raw)
To: bug-guile
Hello,
"Linas Vepstas" <linasvepstas@gmail.com> writes:
> I've been seeing all sorts of deadlocks in guile, and so I wrote a small
> debugging utility to try to track down the problems. I'd like to see
> this patch included in future versions of guile.
FWIW, I'm reluctant to commit this patch for various reasons:
intrusiveness, non-genericity, bitrot of conditional code as Han-Wen noted,
non-C89ness, pollution outside of the `scm_' namespace, and incorrect style.
Another reason is that pthread libraries are sometimes able to debug
deadlocks by themselves. NetBSD's performs several run-time checks by
default [0]. Another solution is the explicit use of error-checking
mutexes, as with the patch at [1]. What do you think?
Thanks,
Ludo'.
[0] http://thread.gmane.org/gmane.lisp.guile.devel/6656
[1] http://thread.gmane.org/gmane.lisp.guile.devel/6733
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Final: thread lock nesting debugging
2008-11-17 4:42 [PATCH] Final: thread lock nesting debugging Linas Vepstas
2008-11-17 13:21 ` Han-Wen Nienhuys
2008-11-18 13:26 ` Ludovic Courtès
@ 2008-11-19 23:14 ` Neil Jerram
2008-11-19 23:21 ` Neil Jerram
2008-11-20 16:03 ` Linas Vepstas
2 siblings, 2 replies; 8+ messages in thread
From: Neil Jerram @ 2008-11-19 23:14 UTC (permalink / raw)
To: linasvepstas; +Cc: bug-guile
2008/11/17 Linas Vepstas <linasvepstas@gmail.com>:
> I've been seeing all sorts of deadlocks in guile, and so I wrote a small
> debugging utility to try to track down the problems.
Interesting patch!
One query; I may be being a bit dumb, I'm only just recovering from a
bad cold, but anyway... Your patch checks for a thread unlocking
mutexes in the reverse order that it locked them in (let's call this
point "A"). But I thought your recent investigations had shown that
the problem was threads doing locking in inconsistent order, e.g.
thread 1 locks M1 and then M2, while thread 2 locks M2 and then M1
(point "B"). Are points "A" and "B" equivalent? (It isn't obvious to
me if so.)
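Just so we're comparing the same things, here's a minimal sketch of the
two patterns as I understand them -- hypothetical mutexes M1 and M2,
nothing taken from your patch:

#include <pthread.h>

static pthread_mutex_t M1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t M2 = PTHREAD_MUTEX_INITIALIZER;

/* Point "A": one thread releasing out of nested order.  On its own this
   cannot deadlock, but (if I read it right) your checker flags it. */
static void point_a (void)
{
  pthread_mutex_lock (&M1);
  pthread_mutex_lock (&M2);
  pthread_mutex_unlock (&M1);   /* released before M2: badly nested */
  pthread_mutex_unlock (&M2);
}

/* Point "B": two threads acquiring in inconsistent order.  If thread_1
   holds M1 while thread_2 holds M2, each blocks forever on the other. */
static void *thread_1 (void *arg)
{
  pthread_mutex_lock (&M1);
  pthread_mutex_lock (&M2);
  pthread_mutex_unlock (&M2);
  pthread_mutex_unlock (&M1);
  return arg;
}

static void *thread_2 (void *arg)
{
  pthread_mutex_lock (&M2);
  pthread_mutex_lock (&M1);
  pthread_mutex_unlock (&M1);
  pthread_mutex_unlock (&M2);
  return arg;
}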
Regards,
Neil
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Final: thread lock nesting debugging
2008-11-19 23:14 ` Neil Jerram
@ 2008-11-19 23:21 ` Neil Jerram
2008-11-20 0:22 ` Neil Jerram
2008-11-20 16:03 ` Linas Vepstas
1 sibling, 1 reply; 8+ messages in thread
From: Neil Jerram @ 2008-11-19 23:21 UTC (permalink / raw)
To: linasvepstas; +Cc: bug-guile
2008/11/19 Neil Jerram <neiljerram@googlemail.com>:
> 2008/11/17 Linas Vepstas <linasvepstas@gmail.com>:
>> I've been seeing all sorts of deadlocks in guile, and so I wrote a small
>> debugging utility to try to track down the problems.
>
> Interesting patch!
>
> One query; I may be being a bit dumb, I'm only just recovering from a
> bad cold, but anyway... Your patch checks for a thread unlocking
> mutexes in the reverse order that it locked them in (let's call this
> point "A"). But I thought your recent investigations had shown that
> the problem was threads doing locking in inconsistent order, e.g.
> thread 1 locks M1 and then M2, while thread 2 locks M2 and then M1
> (point "B"). Are points "A" and "B" equivalent? (It isn't obvious to
> me if so.)
Also I wondered if there are already tools to debug this kind of thing
(without new Guile code), and a quick search finds this [1], which
suggests that helgrind could catch bad lock ordering for us.
[1] http://www.network-theory.co.uk/docs/valgrind/valgrind_113.html
Neil
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Final: thread lock nesting debugging
2008-11-19 23:14 ` Neil Jerram
2008-11-19 23:21 ` Neil Jerram
@ 2008-11-20 16:03 ` Linas Vepstas
2008-11-20 18:33 ` Linas Vepstas
1 sibling, 1 reply; 8+ messages in thread
From: Linas Vepstas @ 2008-11-20 16:03 UTC (permalink / raw)
To: Neil Jerram; +Cc: bug-guile
2008/11/19 Neil Jerram <neiljerram@googlemail.com>:
> 2008/11/17 Linas Vepstas <linasvepstas@gmail.com>:
>> I've been seeing all sorts of deadlocks in guile, and so I wrote a small
>> debugging utility to try to track down the problems.
>
> Interesting patch!
>
> One query; I may be being a bit dumb, I'm only just recovering from a
> bad cold, but anyway... Your patch checks for a thread unlocking
> mutexes in the reverse order that it locked them in (let's call this
> point "A"). But I thought your recent investigations had shown that
> the problem was threads doing locking in inconsistent order, e.g.
> thread 1 locks M1 and then M2, while thread 2 locks M2 and then M1
> (point "B"). Are points "A" and "B" equivalent? (It isn't obvious to
> me if so.)
Hi Neil,
There is (should be) only one lock in guile that is "inconsistent"
in its locking order, and this is the t->heap_mutex lock.
My guess is that valgrind is tripping over this one. I guess
I should argue that this is why one needs a custom patch,
instead of using valgrind (which is otherwise pretty fantastic
for mem corruption and the like).
The t->heap_mutex lock is held whenever a thread is
"guilified" or is "in guile mode". Its primary reason for
being is to keep the garbage collector from running
until all threads have been halted. (This is done by
scm_i_thread_put_to_sleep)
After applying my set of patches, the only inconsistent
(and widespread!) lock ordering problem that I'm seeing
stems from the asymmetric way in which scm_i_scm_pthread_mutex_lock is
used to take a lock,
and then drop it. If you follow the #define for
scm_i_scm_pthread_mutex_lock, you find that it's of
the form:
drop (thread->heap_mutex)
take(some lock)
take (thread->heap_mutex)
Whereas the unlock is just
drop(some lock)
You can see this, for example, in ports.c line 728
scm_i_scm_pthread_mutex_lock (&scm_i_port_table_mutex);
scm_i_remove_port (port);
scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
To be "correctly" nested, the unlock should have dropped
the heap mutex first, and then reacquired it. I believe that
doing this would be enough to quiet down helgrind. (at least
for most cases ... what remains would be interesting)
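Spelled out as code -- and this is only a sketch in terms of the existing
names, not a quote of the guile source; the two function names below are
just for illustration -- today's shape versus a correctly nested one
would be roughly:

/* Sketch only, assuming the usual libguile declarations are in scope;
   "t" is the current scm_i_thread.  What the ports.c:728 sequence
   amounts to today: */
static void remove_port_today (SCM port, scm_i_thread *t)
{
  scm_i_pthread_mutex_unlock (&t->heap_mutex);          /* leave guile mode */
  scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
  scm_i_pthread_mutex_lock (&t->heap_mutex);            /* re-enter guile mode */

  scm_i_remove_port (port);

  scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex); /* out of nested order */
}

/* The symmetric variant: unlocks mirror the locks, with the heap_mutex
   dropped and re-taken around the inner unlock. */
static void remove_port_nested (SCM port, scm_i_thread *t)
{
  scm_i_pthread_mutex_unlock (&t->heap_mutex);
  scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
  scm_i_pthread_mutex_lock (&t->heap_mutex);

  scm_i_remove_port (port);

  scm_i_pthread_mutex_unlock (&t->heap_mutex);
  scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
  scm_i_pthread_mutex_lock (&t->heap_mutex);
}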
OK, the above was just facts; below, some random comments
which might be incorrect (reasoning about locks can be
deceptive; I've certainly mis-reasoned several times w.r.t guile)
-- I had decided that the way that the dropping of the
lock is done is OK, and that it would be silly (and a
performance hit) to try to "fix" the unlocking order.
For debugging with helgrind, you may want to do this
anyway, but for production, it seemed unnecessary.
Prod me, and perhaps I can reconstruct why I came to
this conclusion.
-- The reason for dropping the heap_mutex before grabbing
the other lock (for example scm_i_port_table_mutex),
is somewhat obscure, but at one point I decided that this
was OK, and arguably correct. As I write this, I've
forgotten why. However, this should be a focus of attention,
and re-thought-out. If you are willing to think about it, prod
me and maybe I can say something intelligent. Changing
this would also quiet helgrind (and boost performance).
It might be safe, I need to rethink this.
Anyway, my patch allows for this single occurrence of the
out-of-sequence heap_mutex unlock.
--linas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Final: thread lock nesting debugging
2008-11-20 16:03 ` Linas Vepstas
@ 2008-11-20 18:33 ` Linas Vepstas
0 siblings, 0 replies; 8+ messages in thread
From: Linas Vepstas @ 2008-11-20 18:33 UTC (permalink / raw)
To: Neil Jerram; +Cc: bug-guile
2008/11/20 Linas Vepstas <linasvepstas@gmail.com>:
> -- The reason for dropping the heap_mutex before grabbing
> the other lock (for example scm_i_port_table_mutex),
> is somewhat obscure, but at one point I decided that this
> was OK, and arguably correct. As I write this, I've
> forgotten why. However, this should be a focus of attention,
> and re-thought-out.
Well, a quick look reminds me of the situation: in many/most
cases, locked sections might trigger garbage collection.
Thus, the heap_mutex *must* be dropped before the lock
is taken.
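The failure mode I have in mind, as an interleaving sketch -- generic
pthread code, not guile source; "heap_mutex_A" and "some_lock" are
placeholder names standing in for a thread's heap_mutex and for
something like scm_i_port_table_mutex:

#include <pthread.h>

static pthread_mutex_t heap_mutex_A = PTHREAD_MUTEX_INITIALIZER; /* thread A's heap_mutex */
static pthread_mutex_t some_lock    = PTHREAD_MUTEX_INITIALIZER; /* e.g. a port-table-style lock */

/* Thread A: in guile mode (holds its heap_mutex) and takes some_lock
   without first dropping the heap_mutex. */
static void *thread_A (void *arg)
{
  pthread_mutex_lock (&heap_mutex_A);
  pthread_mutex_lock (&some_lock);      /* blocks: B holds some_lock */
  pthread_mutex_unlock (&some_lock);
  pthread_mutex_unlock (&heap_mutex_A);
  return arg;
}

/* Thread B: holds some_lock; an allocation inside the locked section
   triggers GC, which (like scm_i_thread_put_to_sleep) must acquire
   every other thread's heap_mutex before collecting. */
static void *thread_B (void *arg)
{
  pthread_mutex_lock (&some_lock);
  pthread_mutex_lock (&heap_mutex_A);   /* blocks: A holds it -- deadlock */
  pthread_mutex_unlock (&heap_mutex_A);
  pthread_mutex_unlock (&some_lock);
  return arg;
}

/* A waits on some_lock (held by B), B waits on A's heap_mutex (held by
   A).  Dropping the heap_mutex before blocking on some_lock is exactly
   what breaks this cycle. */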
My gut impression is that this is a poor design point; and
that the correct thing to do would be make locks fine-grained,
so that there is never a need to run GC while a lock is held.
This would require extensive auditing of the guile code.
--linas
^ permalink raw reply [flat|nested] 8+ messages in thread