unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
From: "Linas Vepstas" <linasvepstas@gmail.com>
To: "Neil Jerram" <neiljerram@googlemail.com>
Cc: bug-guile@gnu.org
Subject: Re: [PATCH] Final: thread lock nesting debugging
Date: Thu, 20 Nov 2008 10:03:43 -0600	[thread overview]
Message-ID: <3ae3aa420811200803u485a8fa7ia486c281da54497a@mail.gmail.com> (raw)
In-Reply-To: <49dd78620811191514m6e73438esd4f1474c9926880d@mail.gmail.com>

2008/11/19 Neil Jerram <neiljerram@googlemail.com>:
> 2008/11/17 Linas Vepstas <linasvepstas@gmail.com>:
>> I've been seeing all sorts of deadlocks in guile, and so I wrote a small
>> debugging utility to try to track down the problems.
>
> Interesting patch!
>
> One query; I may be being a bit dumb, I'm only just recovering from a
> bad cold, but anyway...  Your patch checks for a thread unlocking
> mutexes in the reverse order that it locked them in (let's call this
> point "A").  But I thought your recent investigations had shown that
> the problem was threads doing locking in inconsistent order, e.g.
> thread 1 locks M1 and then M2, while thread 2 locks M2 and then M1
> (point "B").  Are points "A" and "B" equivalent?  (It isn't obvious to
> me if so.)

Hi Neil,

There is (should be) only one lock in guile that is "inconsistent"
in its locking order, and this is the t->heap_mutex lock.

My guess is that valgrind is tripping over this one. I guess
I should argue that this is why one needs a custom patch,
instead of using valgrind (which is otherwise pretty fantastic
for mem corruption and the like).

The  t->heap_mutex lock is heled whenever a thread is
"guilified" or is "in guile mode".  Its primary reason for
being is to keep the garbage collector from running
until all threads have been halted. (This is done by
scm_i_thread_put_to_sleep)

After applying my set of patches, the only inconsistent
(and widespread!) lock ordering problem that I'm seeing
stems from the asymmetric way in which scm_i_scm_pthread_mutex_lock is
used to take a lock,
and then drop it.  If you follow the #define for
scm_i_scm_pthread_mutex_lock, you find that its of
the form:

   drop (thread->heap_mutex)
   take(some lock)
   take (thread->heap_mutex)

Whereas the unlock is just

   drop(some lock)

You can see this, for example, in ports.c line 728

  scm_i_scm_pthread_mutex_lock (&scm_i_port_table_mutex);
  scm_i_remove_port (port);
  scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);

Tto be "correctly" nested, the unlock should have droped
the heap mutex first, and then reacquired it.  I believe that
doing this would be enough to quiet down helgrind. (at least
for most cases ... what remains would be interesting)

OK, the above was just facts; below, some random comments
which might be incorrect (reasoning about locks can be
deceptive; I've certainly mis-reasoned several times w.r.t guile)

-- I had decided that the way that the dropping of the
lock is done is OK, and that it would be silly (and a
performance hit) to try to "fix" the unlocking order.
For debugging with helgrind, you may want to do this
anyway, but for production, it seemed un-necessary.
Prod me, and perhaps I can reconstruct why I cam to
this conclusion.

-- The reason for dropping the heap_mutex before grabbing
the other lock (for example  scm_i_port_table_mutex),
is somewhat obscure, but at one point I decided that this
was OK, and arguably correct.  As I write this, I've
forgotten why. However, this should be a focus of attention,
and re-thought-out.  If you are willing to think about it, prod
me and maybe I can say something intelligent.  Changing
this would also quiet helgrind (and boost performance).
It might be safe, I need to rethink this.

Anyway, my patch allows for this single occurance of the
out-of-sequence heap_mutex unlock.

--linas




  parent reply	other threads:[~2008-11-20 16:03 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-17  4:42 [PATCH] Final: thread lock nesting debugging Linas Vepstas
2008-11-17 13:21 ` Han-Wen Nienhuys
2008-11-18 13:26 ` Ludovic Courtès
2008-11-19 23:14 ` Neil Jerram
2008-11-19 23:21   ` Neil Jerram
2008-11-20  0:22     ` Neil Jerram
2008-11-20 16:03   ` Linas Vepstas [this message]
2008-11-20 18:33     ` Linas Vepstas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ae3aa420811200803u485a8fa7ia486c281da54497a@mail.gmail.com \
    --to=linasvepstas@gmail.com \
    --cc=bug-guile@gnu.org \
    --cc=neiljerram@googlemail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).