unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* bug#22152: fat_mutex owner corruption (fmoc) inside fat_mutex_unlock (guile-v2.0.11)
@ 2015-12-12 17:28 Iwan Aucamp
  2016-01-07  0:13 ` Mark H Weaver
  0 siblings, 1 reply; 3+ messages in thread
From: Iwan Aucamp @ 2015-12-12 17:28 UTC (permalink / raw)
  To: 22152; +Cc: Iwan Armand Aucamp, Donovan Hutcheon, Martin Cooper, Waqar Ali

[-- Attachment #1: Type: text/plain, Size: 6224 bytes --]

Hi

We sporadically get "mutex not locked" and "mutex not locked by current thread"
exceptions on Solaris 10u10 with guile-2.0.11.

This problem can be reproduced with following scheme scripts:



  guile-fmoc-test-mnl.scm (for "mutex not locked")
      Two threads, one of them (reader-000) waits on a condition variable that
      nothing will trigger and the other thread (writer-000) locks and unlocks
      the mutex used with the condition variable. This code causes "mutex not
      locked" exception with some consistency.

      Output for this is in guile-fmoc-test-mnl-problem_output.txt (referenced
      as mnl-problem_output.txt)


      * owner id for reader-000 is 12593872

      * owner id for writer-000 is 12595040



  guile-fmoc-test-mnlbct.scm (for "mutex not locked by current thread")
      Same as guile-fmoc-test-mnl.scm except here writer-000 signals condition
      variable. This code causes "mutex not locked" and "mutex not locked by
      current thread" errors.

      Output for this, showing "mutex not locked by current thread", is in
      guile-fmoc-test-mnlbct-problem_output.txt (referenced as mnlbct-
      problem_output.txt)


      * owner id for reader-000 is 14535648

      * owner id for writer-000 is 14536400



To track down this issue we have added some debug printfs (see guile-2.0.11-
with-debug.patch). Given that this changes the line numbers I have referenced
original line numbers as o:file:line and line numbers with patch as d[:file]:
line. Also, these printfs have resulted in some irrelevant output (for internal
and verbose logging mutexes) which has been filtered out.

There is various scenarios leading to these errors that we have found but all
caused by same problem. The a detailed analysis for "mutex not locked by
current thread" scenario that can be seen in mnlbct-problem_output.txt is
included below and detailed analysis for other scenarios will be shared if
required.

Scenario from mnlbct-problem_output.txt:


  1. [writer-000:14536400] unlocks fat_mutex[14512880] and queues reader-000:
     14535648

     at o:threads.c:1664 - d:1681

     before mnlbct-problem_output.txt:4079

  2. [reader-000:14535648] locks fat_mutex[14512880] with fat_mutex_lock

     at o:threads.c:1394 - d:1401

     before mnlbct-problem_output.txt:4080

  3. [reader-000:14535648] enters wait-condition-variable and changes fat_mutex
     [14512880].owner to writer-000:14536400

     at o:threads.c:1616 - d:1631

     before mnlbct-problem_output.txt:4083

  4. [reader-000:14535648] goes into block_self and starts waiting on ptheead
     condition variable inside fat_mutex_unlock > block_self. This unlocks
     fat_mutex[14512880].mutex, allowing some other thread to lock fat_mutex
     [14512880]

     at o:threads.c:452 - d:456

     before mnlbct-problem_output.txt:4083

  5. [writer-000:14536400] locks fat_mutex[14512880] with fat_mutex_lock

     at o:threads.c:1394 - d:1401

     before mnlbct-problem_output.txt:4082

  6. [reader-000:14535648] spurious wake-up occurs for condition variable which
     causes block_self to return EINTR to fat_mutex_unlock

     at o:threads.c:1621 - d:1636

     before mnlbct-problem_output.txt:4084

  7. [reader-000:14535648] loops and sets fat_mutex[14512880].owner=4 (i.e. not
     locked) while writer-000:14536400 should still be owner of fat_mutex
     [14512880]. Since it was spurious wake-up reader-000:14535648 continues to
     wait for condition to be notified again.

     at o:threads.c:1616 - d:1631

     before mnlbct-problem_output.txt:4086

  8. [writer-000:14536400] completes signal-condition-variable

     before mnlbct-problem_output.txt:4088

  9. [reader-000:14535648] now gets actual notification and block_self returns
     0. This causes fat_mutex:14512880 to be locked again - which works cos
     fat_mutex:14512880.owner is 4. This changes fat_mutex:14512880.owner from
     4 to 14535648.

     at o:threads.c:1643 - d:1660

     before mnlbct-problem_output.txt:4086

 10. [writer-000:14536400] tries to unlock the mutex, this fails though as
     reader-000:14535648 now owns the mutex - resulting in "mutex not locked by
     current thread" exception.

     at o:threads.c:1599 - d:1614

     before mnlbct-problem_output.txt:4089


Briefly, for mnl-problem_output.txt:


  1. reader-000 locks fat_mutex and unlocks it again as it starts waiting for
     condition to be notified.

  2. writer-000 locks mutex

  3. Spurious wake-up occurs for reader-000 which causes reader to change
     fat_mutex.owner from writer-000 id to 4 and then resumes waiting on
     condition variable

  4. Writer tries to unlock fat_mutex but now owner is 4 and this results in
     "mutex not locked" exception


The cause of these problems seems to be related to fat_mutex_unlock changing
fat_mutex.owner inside the while loop that is intended for checking condition
variable predicate which is problematic if spurious wake-ups from
pthread_cond_wait occur. Spurious wake-ups from pthread_cond_wait seems less
common on Linux, which is why we have only been observing the issue on Solaris.
It does however look like this problem will occur on any platform when a
spurious wake-up does occur.

As far as we can tell there is no reason for the fat_mutex.owner assignment to
happen inside the loop. It seems more appropriate that this happens only once
before the loop and not again. To this extent we moved owner reassignment out
of the loop and this seems to have resolved our issues. The patch for this is
in guile-2.0.11-with-fmoc_fix.patch.

We have ran the test suite with this on Linux and everything passes. There is
however other issues with test suite on Solaris that prevents it from
completing (both with and without the patch) which needs further investigation.


* All files related to this can be found at
https://gitlab.com/concurrent-systems/osp-issues-1512/tree/master/guile-fmoc

* Source with guile-2.0.11-with-debug.patch can be found at
https://gitlab.com/concurrent-systems/guile/tree/v2.0.11-with-debug

* Source with guile-2.0.11-with-fmoc_fix.patch can be found at
https://gitlab.com/concurrent-systems/guile/tree/v2.0.11-with-fmoc_fix


Regards
-- 
Iwan Aucamp

[-- Attachment #2: guile-2.0.11-with-debug.patch --]
[-- Type: text/x-patch, Size: 5757 bytes --]

diff --git a/libguile/debug.c b/libguile/debug.c
index 107b5d4..6f47354 100644
--- a/libguile/debug.c
+++ b/libguile/debug.c
@@ -263,3 +263,4 @@ scm_init_debug ()
   c-file-style: "gnu"
   End:
 */
+int dpe = 0;
diff --git a/libguile/debugx.h b/libguile/debugx.h
new file mode 100644
index 0000000..b39a7bb
--- /dev/null
+++ b/libguile/debugx.h
@@ -0,0 +1,35 @@
+/*
+vim: set ts=8 sts=2 sw=2 noexpandtab filetype=c:
+*/
+#include <unistd.h>
+#include <time.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <stdlib.h>
+
+extern int dpe;
+
+#define DEBUG_PRINTF( cond, format, args... ) \
+  if ( cond )\
+  { \
+    struct timespec l_timespec; \
+    struct tm l_tm; \
+    char date_time_buf[32];\
+    char __db_buffer[8*1024]; \
+    int written = -1; \
+    clock_gettime( CLOCK_REALTIME, &l_timespec ); \
+    \
+    localtime_r(&(l_timespec.tv_sec), &l_tm); \
+    \
+    strftime( date_time_buf, 32, "%Y-%m-%dT%H:%M:%S", &l_tm );\
+    \
+    \
+    written = snprintf( __db_buffer, 8*1024, "%s.%09li [%08u:%08u:%010u] DEBUG_VAR [%s:%i:%s] "format"\n", \
+      date_time_buf, l_timespec.tv_nsec, \
+      ( unsigned int )( getpid() ), ( unsigned int )( getppid() ), ( unsigned int )( pthread_self() ), \
+      __FILE__, __LINE__, __FUNCTION__, \
+      ##args ); \
+    write( fileno( stderr ), __db_buffer, written ); \
+  } \
+
+
diff --git a/libguile/init.c b/libguile/init.c
index b320360..33340ce 100644
--- a/libguile/init.c
+++ b/libguile/init.c
@@ -17,6 +17,9 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
  * 02110-1301 USA
  */
+/*
+vim: set ts=8 sts=2 sw=2 noexpandtab filetype=c:
+*/
 
 
 \f
@@ -304,6 +307,7 @@ static void *invoke_main_func(void *body_data);
    allocate.  So, scm_boot_guile function exits, rather than
    returning, to discourage people from making that mistake.  */
 
+#include "libguile/debugx.h"
 
 void
 scm_boot_guile (int argc, char ** argv, void (*main_func) (), void *closure)
@@ -380,6 +384,7 @@ scm_i_init_guile (void *base)
 {
   if (scm_initialized_p)
     return;
+  dpe = ( getenv( "DEBUG_PRINTF_ENABLE" ) != NULL );
 
   scm_storage_prehistory ();
   scm_threads_prehistory (base);  /* requires storage_prehistory */
diff --git a/libguile/threads.c b/libguile/threads.c
index 15e4919..702ce37 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -17,6 +17,9 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
  * 02110-1301 USA
  */
+/*
+vim: set ts=8 sts=2 sw=2 noexpandtab filetype=c:
+*/
 
 
 \f
@@ -64,6 +67,7 @@
 #include "libguile/scmsigs.h"
 #include "libguile/strings.h"
 #include "libguile/weaks.h"
+#include "libguile/debugx.h"
 
 #include <full-read.h>
 
@@ -458,7 +462,10 @@ block_self (SCM queue, SCM sleep_object, scm_i_pthread_mutex_t *mutex,
 	 happened above.
       */
       if (remqueue (queue, q_handle) && err == 0)
+        {
+          DEBUG_PRINTF( dpe, "[ waittime = %p, errno = %i, errno_string = %s ]", waittime, err, strerror( err ) );
 	err = EINTR;
+        }
       t->block_asyncs--;
       scm_i_reset_sleep (t);
     }
@@ -1406,6 +1413,8 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
     {
       if (m->level == 0)
 	{
+          DEBUG_PRINTF( dpe, "mutex = %lu : m->level = %i : owner = %lu -> %lu",
+            SCM_UNPACK( mutex ), m->level, SCM_UNPACK( m->owner ), SCM_UNPACK( new_owner ) );
 	  m->owner = new_owner;
 	  m->level++;
 
@@ -1432,6 +1441,8 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
 	}
       else if (SCM_I_IS_THREAD (m->owner) && scm_c_thread_exited_p (m->owner))
 	{
+          DEBUG_PRINTF( dpe, "mutex = %lu : m->level = %i : owner = %lu -> %lu",
+            SCM_UNPACK( mutex ), m->level, SCM_UNPACK( m->owner ), SCM_UNPACK( new_owner ) );
 	  m->owner = new_owner;
 	  err = scm_cons (scm_abandoned_mutex_error_key,
 			  scm_from_locale_string ("lock obtained on abandoned "
@@ -1589,6 +1600,8 @@ fat_mutex_unlock (SCM mutex, SCM cond,
 	  if (!m->unchecked_unlock)
 	    {
 	      scm_i_pthread_mutex_unlock (&m->lock);
+              DEBUG_PRINTF( dpe, "mutex = %lu : t->handle = %lu : m->level = %i : owner = %lu -> -1",
+                SCM_UNPACK( mutex ), SCM_UNPACK( t->handle ), m->level, SCM_UNPACK( owner ) );
 	      scm_misc_error (NULL, "mutex not locked", SCM_EOL);
 	    }
 	  owner = t->handle;
@@ -1596,6 +1609,8 @@ fat_mutex_unlock (SCM mutex, SCM cond,
       else if (!m->allow_external_unlock)
 	{
 	  scm_i_pthread_mutex_unlock (&m->lock);
+          DEBUG_PRINTF( dpe, "mutex = %lu : t->handle = %lu : m->level = %i : owner = %lu -> -1",
+            SCM_UNPACK( mutex ), SCM_UNPACK( t->handle ), m->level, SCM_UNPACK( owner ) );
 	  scm_misc_error (NULL, "mutex not locked by current thread", SCM_EOL);
 	}
     }
@@ -1619,6 +1634,8 @@ fat_mutex_unlock (SCM mutex, SCM cond,
 	  t->block_asyncs++;
 
 	  err = block_self (c->waiting, cond, &m->lock, waittime);
+          DEBUG_PRINTF( dpe, "mutex = %lu : t->handle = %lu : m->level = %i : owner = %lu -> %lu [ relock = %i, err = %i, errstr = %s ]",
+            SCM_UNPACK( mutex ), SCM_UNPACK( t->handle ), m->level, SCM_UNPACK( owner ), SCM_UNPACK( m->owner ), relock, err, strerror( err ) );
 	  scm_i_pthread_mutex_unlock (&m->lock);
 
 	  if (err == 0)
@@ -1662,6 +1679,8 @@ fat_mutex_unlock (SCM mutex, SCM cond,
 	  /* Change the owner of MUTEX.  */
 	  t->mutexes = scm_delq_x (mutex, t->mutexes);
 	  m->owner = unblock_from_queue (m->waiting);
+          DEBUG_PRINTF( dpe, "mutex = %lu : t->handle = %lu : m->level = %i : owner = %lu -> %lu [ relock = %i ]",
+            SCM_UNPACK( mutex ), SCM_UNPACK( t->handle ), m->level, SCM_UNPACK( owner ), SCM_UNPACK( m->owner ), relock );
 	}
 
       scm_i_pthread_mutex_unlock (&m->lock);

[-- Attachment #3: guile-2.0.11-with-fmoc_fix.patch --]
[-- Type: text/x-patch, Size: 877 bytes --]

diff --git a/libguile/threads.c b/libguile/threads.c
index 15e4919..417c225 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1603,19 +1603,20 @@ fat_mutex_unlock (SCM mutex, SCM cond,
   if (! (SCM_UNBNDP (cond)))
     {
       c = SCM_CONDVAR_DATA (cond);
+
+      if (m->level > 0)
+        m->level--;
+      if (m->level == 0)
+        {
+          /* Change the owner of MUTEX.  */
+          t->mutexes = scm_delq_x (mutex, t->mutexes);
+          m->owner = unblock_from_queue (m->waiting);
+        }
+
       while (1)
 	{
 	  int brk = 0;
 
-	  if (m->level > 0)
-	    m->level--;
-	  if (m->level == 0)
-	    {
-	      /* Change the owner of MUTEX.  */
-	      t->mutexes = scm_delq_x (mutex, t->mutexes);
-	      m->owner = unblock_from_queue (m->waiting);
-	    }
-
 	  t->block_asyncs++;
 
 	  err = block_self (c->waiting, cond, &m->lock, waittime);

[-- Attachment #4: guile-fmoc-test-mnl.scm --]
[-- Type: text/x-scheme, Size: 2723 bytes --]

; vim: set ts=4 sw=4 noexpandtab filetype=scheme:

(format (current-error-port) "(current-filename) = ~a\n" (current-filename))
(format (current-error-port) "%load-path = ~a\n" %load-path)

(define-module (test-notlocked aab)
	#:use-module (ice-9 format)
	#:use-module (system foreign)
	#:use-module (ice-9 threads)
	#:use-module (oop goops)
)


(define-syntax-rule (verbose-format format-string ...) (verbose-format-with-location (current-source-location) format-string ...))

(define verbose-port (current-error-port))
(define verbose-mutex (make-mutex))

(define (verbose-format-with-location location format-string . format-args)
	(let ((tod (gettimeofday)) (csl (current-source-location)))
		(with-mutex verbose-mutex (apply format (append (list verbose-port
				(string-append "~a.~6,'0d ~6,'0d:~6,'0d:~a ~a:~5,'0d " format-string "\n")
				(strftime "%Y:%m:%d %H:%M:%S" (localtime (car tod))) (cdr tod)
				(getpid) (getppid) #f ;(thread-name (current-thread))
				(basename (assq-ref location 'filename)) (assq-ref location 'line))
				format-args
				)
			))
		)
		(force-output verbose-port)
	)


(define exit-status #f)

(define (exit-wait)
	(do () (exit-status)
		(usleep 500000))
	(verbose-format "ending ..." exit-status))

(define (signal-handler signal)
	(verbose-format "got signal ~a" signal)
	(set! exit-status #t)
	(verbose-format "exit-status = ~a" exit-status))

(define (writer id mutex condition-variable)
	(verbose-format "~a entry" id)
	(let loop ( (i 0) )
		(with-mutex mutex
			(verbose-format "~a writing ~a" id i)
			;(signal-condition-variable condition-variable)
			(verbose-format "~a wrote ~a" id i)
			(usleep 20000)
			)
		(usleep 100)
		(loop (+ 1 i))
		)
	(verbose-format "~a end" id)
	)

(define (reader id mutex condition-variable)
	(verbose-format "~a entry" id)
	(let loop ( (i 0) )
		(with-mutex mutex
			(verbose-format "~a reading ~a" id i)
			(wait-condition-variable condition-variable mutex)
			(usleep 20000)
			(verbose-format "~a read ~a" id i)
			)
		(usleep 100)
		(loop (+ 1 i))
		)
	(verbose-format "~a end" id)
	)

(define (main)
		(verbose-format "entry")
		(verbose-format "entry ~a" (current-source-location))
		(sigaction SIGQUIT signal-handler SA_RESTART)
		(sigaction SIGINT signal-handler SA_RESTART)
		(sigaction SIGTERM signal-handler SA_RESTART)
		(let (
				(mutex (make-mutex))
				(condition-variable (make-condition-variable))
			)
			(call-with-new-thread (lambda () (writer "writer-000" mutex condition-variable)))
			(call-with-new-thread (lambda () (reader "reader-000" mutex condition-variable)))
			(verbose-format "waiting for exit ...")
			(exit-wait)
			(verbose-format "done ...")
		)
		(verbose-format "end")
		(primitive-exit 0)
		)

(main)

[-- Attachment #5: guile-fmoc-test-mnlbct.scm --]
[-- Type: text/x-scheme, Size: 2721 bytes --]

; vim: set ts=4 sw=4 noexpandtab filetype=scheme:

(format (current-error-port) "(current-filename) = ~a\n" (current-filename))
(format (current-error-port) "%load-path = ~a\n" %load-path)

(define-module (test-notlocked aab)
	#:use-module (ice-9 format)
	#:use-module (system foreign)
	#:use-module (ice-9 threads)
	#:use-module (oop goops)
)


(define-syntax-rule (verbose-format format-string ...) (verbose-format-with-location (current-source-location) format-string ...))

(define verbose-port (current-error-port))
(define verbose-mutex (make-mutex))

(define (verbose-format-with-location location format-string . format-args)
	(let ((tod (gettimeofday)) (csl (current-source-location)))
		(with-mutex verbose-mutex (apply format (append (list verbose-port
				(string-append "~a.~6,'0d ~6,'0d:~6,'0d:~a ~a:~5,'0d " format-string "\n")
				(strftime "%Y:%m:%d %H:%M:%S" (localtime (car tod))) (cdr tod)
				(getpid) (getppid) #f ;(thread-name (current-thread))
				(basename (assq-ref location 'filename)) (assq-ref location 'line))
				format-args
				)
			))
		)
		(force-output verbose-port)
	)

(define exit-status #f)

(define (exit-wait)
	(do () (exit-status)
		(usleep 500000))
	(verbose-format "ending ..." exit-status))

(define (signal-handler signal)
	(verbose-format "got signal ~a" signal)
	(set! exit-status #t)
	(verbose-format "exit-status = ~a" exit-status))

(define (writer id mutex condition-variable)
	(verbose-format "~a entry" id)
	(let loop ( (i 0) )
		(with-mutex mutex
			(verbose-format "~a writing ~a" id i)
			(signal-condition-variable condition-variable)
			(verbose-format "~a wrote ~a" id i)
			(usleep 20000)
			)
		(usleep 100)
		(loop (+ 1 i))
		)
	(verbose-format "~a end" id)
	)

(define (reader id mutex condition-variable)
	(verbose-format "~a entry" id)
	(let loop ( (i 0) )
		(with-mutex mutex
			(verbose-format "~a reading ~a" id i)
			(wait-condition-variable condition-variable mutex)
			(usleep 20000)
			(verbose-format "~a read ~a" id i)
			)
		(usleep 100)
		(loop (+ 1 i))
		)
	(verbose-format "~a end" id)
	)

(define (main)
		(verbose-format "entry")
		(verbose-format "entry ~a" (current-source-location))
		(sigaction SIGQUIT signal-handler SA_RESTART)
		(sigaction SIGINT signal-handler SA_RESTART)
		(sigaction SIGTERM signal-handler SA_RESTART)
		(let (
				(mutex (make-mutex))
				(condition-variable (make-condition-variable))
			)
			(call-with-new-thread (lambda () (writer "writer-000" mutex condition-variable)))
			(call-with-new-thread (lambda () (reader "reader-000" mutex condition-variable)))
			(verbose-format "waiting for exit ...")
			(exit-wait)
			(verbose-format "done ...")
		)
		(verbose-format "end")
		(primitive-exit 0)
		)

(main)

[-- Attachment #6: guile-fmoc-test-mnlbct-problem_output.txt.bz2 --]
[-- Type: application/x-bzip2, Size: 25881 bytes --]

[-- Attachment #7: guile-fmoc-test-mnl-problem_output.txt.bz2 --]
[-- Type: application/x-bzip2, Size: 3419 bytes --]

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* bug#22152: fat_mutex owner corruption (fmoc) inside fat_mutex_unlock (guile-v2.0.11)
  2015-12-12 17:28 bug#22152: fat_mutex owner corruption (fmoc) inside fat_mutex_unlock (guile-v2.0.11) Iwan Aucamp
@ 2016-01-07  0:13 ` Mark H Weaver
  2016-06-20 20:05   ` Mark H Weaver
  0 siblings, 1 reply; 3+ messages in thread
From: Mark H Weaver @ 2016-01-07  0:13 UTC (permalink / raw)
  To: Iwan Aucamp
  Cc: Iwan Armand Aucamp, 22152, Donovan Hutcheon, Martin Cooper,
	Waqar Ali

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

Iwan Aucamp <aucampia@gmail.com> writes:
> We sporadically get "mutex not locked" and "mutex not locked by current thread"
> exceptions on Solaris 10u10 with guile-2.0.11.

Thanks very much for your detailed analysis and proposed fix.

I've attached a patch that hopefully fixes this bug and also refactors
the code to hopefully be somewhat more clear.  Can you please test it on
Solaris and verify that it works for your use cases?

    Thanks,
      Mark


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Preliminary patch to fix and refactor fat_mutex_unlock --]
[-- Type: text/x-patch, Size: 2754 bytes --]

From 1d945f638033e7649ea61fa1c4050b30da891d6b Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Fri, 18 Dec 2015 02:29:48 -0500
Subject: [PATCH] PRELIMINARY: Fix bug in fat_mutex_unlock.

---
 libguile/threads.c | 87 ++++++++++++++++++++++--------------------------------
 1 file changed, 36 insertions(+), 51 deletions(-)

diff --git a/libguile/threads.c b/libguile/threads.c
index 6ae6818..dcbd24d 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1607,70 +1607,55 @@ fat_mutex_unlock (SCM mutex, SCM cond,
 	}
     }
 
+  if (m->level > 0)
+    m->level--;
+  if (m->level == 0)
+    {
+      /* Change the owner of MUTEX.  */
+      t->mutexes = scm_delq_x (mutex, t->mutexes);
+      m->owner = unblock_from_queue (m->waiting);
+    }
+
   if (! (SCM_UNBNDP (cond)))
     {
       c = SCM_CONDVAR_DATA (cond);
-      while (1)
-	{
-	  int brk = 0;
-
-	  if (m->level > 0)
-	    m->level--;
-	  if (m->level == 0)
-	    {
-	      /* Change the owner of MUTEX.  */
-	      t->mutexes = scm_delq_x (mutex, t->mutexes);
-	      m->owner = unblock_from_queue (m->waiting);
-	    }
-
-	  t->block_asyncs++;
+      t->block_asyncs++;
 
+      do
+	{
 	  err = block_self (c->waiting, cond, &m->lock, waittime);
 	  scm_i_pthread_mutex_unlock (&m->lock);
 
-	  if (err == 0)
-	    {
-	      ret = 1;
-	      brk = 1;
-	    }
-	  else if (err == ETIMEDOUT)
-	    {
-	      ret = 0;
-	      brk = 1;
-	    }
-	  else if (err != EINTR)
-	    {
-	      errno = err;
-	      scm_syserror (NULL);
-	    }
-
-	  if (brk)
-	    {
-	      if (relock)
-		scm_lock_mutex_timed (mutex, SCM_UNDEFINED, owner);
-	      t->block_asyncs--;
-	      break;
-	    }
-
-	  t->block_asyncs--;
-	  scm_async_click ();
+          if (err == EINTR)
+            {
+              t->block_asyncs--;
+              scm_async_click ();
 
-	  scm_remember_upto_here_2 (cond, mutex);
+              scm_remember_upto_here_2 (cond, mutex);
 
-	  scm_i_scm_pthread_mutex_lock (&m->lock);
+              scm_i_scm_pthread_mutex_lock (&m->lock);
+              t->block_asyncs++;
+            }
 	}
+      while (err == EINTR);
+
+      if (err == 0)
+        ret = 1;
+      else if (err == ETIMEDOUT)
+        ret = 0;
+      else
+        {
+          t->block_asyncs--;
+          errno = err;
+          scm_syserror (NULL);
+        }
+
+      if (relock)
+        scm_lock_mutex_timed (mutex, SCM_UNDEFINED, owner);
+      t->block_asyncs--;
     }
   else
     {
-      if (m->level > 0)
-	m->level--;
-      if (m->level == 0)
-	{
-	  /* Change the owner of MUTEX.  */
-	  t->mutexes = scm_delq_x (mutex, t->mutexes);
-	  m->owner = unblock_from_queue (m->waiting);
-	}
-
       scm_i_pthread_mutex_unlock (&m->lock);
       ret = 1;
     }
-- 
2.6.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* bug#22152: fat_mutex owner corruption (fmoc) inside fat_mutex_unlock (guile-v2.0.11)
  2016-01-07  0:13 ` Mark H Weaver
@ 2016-06-20 20:05   ` Mark H Weaver
  0 siblings, 0 replies; 3+ messages in thread
From: Mark H Weaver @ 2016-06-20 20:05 UTC (permalink / raw)
  To: Iwan Aucamp
  Cc: Iwan Armand Aucamp, Donovan Hutcheon, 22152-done, Waqar Ali,
	Martin Cooper

Mark H Weaver <mhw@netris.org> writes:

> Iwan Aucamp <aucampia@gmail.com> writes:
>> We sporadically get "mutex not locked" and "mutex not locked by current thread"
>> exceptions on Solaris 10u10 with guile-2.0.11.
>
> Thanks very much for your detailed analysis and proposed fix.
>
> I've attached a patch that hopefully fixes this bug and also refactors
> the code to hopefully be somewhat more clear.  Can you please test it on
> Solaris and verify that it works for your use cases?

I went ahead and pushed commit 1e86dc32a42af549fc9e4721ad48cdd7d296c042
to stable-2.0, which will soon become guile-2.0.12.  I hope it fixes the
issue, although unfortunately my patch was never tested on Solaris.  I'm
going to close this bug, but feel free to reopen it if there are still
issues.

     Thanks,
       Mark





^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-06-20 20:05 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-12 17:28 bug#22152: fat_mutex owner corruption (fmoc) inside fat_mutex_unlock (guile-v2.0.11) Iwan Aucamp
2016-01-07  0:13 ` Mark H Weaver
2016-06-20 20:05   ` Mark H Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).