From: "Linas Vepstas" <linasvepstas@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guile-user@gnu.org
Subject: Re: guile threading deadlock
Date: Sat, 8 Nov 2008 12:29:25 -0600
Message-ID: <3ae3aa420811081029y2d6334a1y84cef351e69b7de1@mail.gmail.com>
In-Reply-To: <87fxm25rrg.fsf@gnu.org>

Hi,

2008/11/8 Ludovic Courtès <ludo@gnu.org>:
> Hello!
>
> "Linas Vepstas" <linasvepstas@gmail.com> writes:
>
>> I've got a little deadlock problem w/ guile.  Here's the pseudocode:

I've got a much, much simpler case, see below.

> Can you try to provide actual code to reproduce the problem?  :-)
> That would be great.

Sure, but you won't enjoy debugging it.  Or even building it.
Go to https://code.launchpad.net/~opencog-dev,
check out the branch called "staging", and build it.
Then cd to the directory opencog/scm and run load.sh.

> Did you compile Guile with thread support

Yes. As you can see, the stack traces I sent were
deadlocked in garbage collection.  Without thread
support, how could things possibly deadlock?

Anyway, I have an even simpler variant, with only *one*
thread deadlocked in gc.    Here's the scenario:

thread A:
   scm_init_guile();
   does some other stuff, then
   goes to sleep in select, waiting on socket input
   (as expected).

thread B:
   scm_init_guile() -- hangs here.

B deadlocks with the stack trace below:

#0  0xffffe425 in __kernel_vsyscall ()
#1  0xf7e60589 in __lll_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2  0xf7e5bba6 in _L_lock_95 () from /lib/tls/i686/cmov/libpthread.so.0
#3  0xf7e5b58a in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
#4  0xf7844464 in scm_i_thread_put_to_sleep () at threads.c:1615
#5  0xf77eeca9 in scm_i_gc (what=0xf786422e "cells") at gc.c:552
#6  0xf77eefed in scm_gc_for_newcell (freelist=0xf787984c, free_cells=0x99fa25c)
    at gc.c:509
#7  0xf7843bff in guilify_self_2 (parent=0xf76b0e70) at ../libguile/inline.h:115
#8  0xf7845a9b in scm_i_init_thread_for_guile (base=0xf3b8a000,
    parent=0xf76b0e70) at threads.c:578
#9  0xf7845d82 in scm_init_guile () at threads.c:682
#10 0xf796f928 in opencog::SchemeEval::thread_init (this=0x995bc38)
    at ...

I built guile, and added debug prints:  one to
scm_enter_guile, which takes a lock, and one
to scm_leave_guile, which drops a lock.  I also
put prints into scm_i_thread_put_to_sleep ().

The behaviour is very clear, and very simple:
when scm_init_guile is called in thread A, it returns
with the thread in "guile mode", i.e. holding the lock --
the thread is created holding the lock.  After that
there's a series of leave..enter calls, always paired
up.  Anyway, when thread A finally goes to
sleep waiting on input, it does so with its lock held.
Read libguile/threads.c:scm_enter_guile() to see
what I mean:

static void
scm_enter_guile (scm_t_guile_ticket ticket)
{
  scm_i_thread *t = (scm_i_thread *)ticket;
  if (t)
    {
      scm_i_pthread_mutex_lock (&t->heap_mutex);
      resume (t);
    }
}

while scm_leave_guile() does the symmetric opposite.

Anyway, at this point, thread A is sleeping, holding the
above lock, because the last guile-thing it did was to
call scm_enter_guile().

Then *later on*, thread B calls scm_init_guile(), and
hangs, very clearly in scm_i_thread_put_to_sleep().
The printfs reveal that the hang happens here:

      /* Signal all threads to go to sleep */
      scm_i_thread_go_to_sleep = 1;
      for (t = all_threads; t; t = t->next_thread)
        {
          scm_i_pthread_mutex_lock (&t->heap_mutex);
        }

Well, the for loop then gets stuck, waiting for the lock.
But the lock will never be granted, because thread A
is holding it, and is in permanent sleep. As a result,
thread B is blocked, forever, thus a deadlock.
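
To make the mechanism concrete, here is a toy model in plain
pthreads.  It is not libguile code: every name in it (toy_thread,
toy_thread_a, toy_b_would_block, the leave_before_sleep flag) is
made up for illustration, and a condition-variable wait stands in
for the sleep in select.  "Thread B" uses pthread_mutex_trylock
rather than a blocking lock, so the model reports the hang instead
of reproducing it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for one entry of guile's thread list. */
typedef struct {
  pthread_mutex_t heap_mutex;   /* models t->heap_mutex */
  pthread_mutex_t ctl;          /* protects the two flags below */
  pthread_cond_t cond;
  int leave_before_sleep;       /* 1: drop heap_mutex before sleeping */
  int asleep;                   /* set once A is about to sleep */
  int wake;                     /* set by B to let A finish */
} toy_thread;

/* Models thread A: enter "guile mode", then sleep waiting for input. */
static void *
toy_thread_a (void *arg)
{
  toy_thread *t = arg;

  pthread_mutex_lock (&t->heap_mutex);        /* "enter guile" */
  if (t->leave_before_sleep)
    pthread_mutex_unlock (&t->heap_mutex);    /* "leave guile" first */

  pthread_mutex_lock (&t->ctl);
  t->asleep = 1;
  pthread_cond_broadcast (&t->cond);
  while (!t->wake)                            /* stands in for select() */
    pthread_cond_wait (&t->cond, &t->ctl);
  pthread_mutex_unlock (&t->ctl);

  if (!t->leave_before_sleep)
    pthread_mutex_unlock (&t->heap_mutex);
  return NULL;
}

/* Models thread B's attempt to take A's heap_mutex.
   Returns 1 if a blocking lock would hang, 0 if it would succeed. */
static int
toy_b_would_block (int leave_before_sleep)
{
  toy_thread a;
  pthread_t tid;
  int rc, blocked;

  pthread_mutex_init (&a.heap_mutex, NULL);
  pthread_mutex_init (&a.ctl, NULL);
  pthread_cond_init (&a.cond, NULL);
  a.leave_before_sleep = leave_before_sleep;
  a.asleep = 0;
  a.wake = 0;

  rc = pthread_create (&tid, NULL, toy_thread_a, &a);
  assert (rc == 0);

  /* Wait until A has gone to sleep. */
  pthread_mutex_lock (&a.ctl);
  while (!a.asleep)
    pthread_cond_wait (&a.cond, &a.ctl);
  pthread_mutex_unlock (&a.ctl);

  /* B's side of the put-to-sleep loop, with trylock so we can report. */
  rc = pthread_mutex_trylock (&a.heap_mutex);
  blocked = (rc == EBUSY);
  if (rc == 0)
    pthread_mutex_unlock (&a.heap_mutex);

  /* Wake A so the model can clean up. */
  pthread_mutex_lock (&a.ctl);
  a.wake = 1;
  pthread_cond_broadcast (&a.cond);
  pthread_mutex_unlock (&a.ctl);
  pthread_join (tid, NULL);

  pthread_cond_destroy (&a.cond);
  pthread_mutex_destroy (&a.ctl);
  pthread_mutex_destroy (&a.heap_mutex);
  return blocked;
}
```

In this model, toy_b_would_block (0) corresponds to what I'm seeing:
A sleeps with its heap_mutex held, so B's lock attempt cannot succeed
and the function returns 1.  toy_b_would_block (1) returns 0, because
A dropped the mutex before going to sleep.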

I'm somewhat stumped, because I can't imagine how
this code *ever* could have worked in the first place.
The deadlock seems really blatant to me.  It seems
criminal for guile to *ever* return to the caller while
still holding a lock of any sort. But very clearly,
scm_init_guile() (and I guess most other calls) returns
to C code with a lock held.  This is just begging for
deadlocks!

--linas
