From: Neil Jerram
To: "Julian Graham"
Cc: Ludovic Courtès, guile-devel@gnu.org
Newsgroups: gmane.lisp.guile.devel
Subject: Re: srfi-18 requirements
Date: Thu, 17 Jan 2008 01:48:28 +0000
Message-ID: <874pddcjdf.fsf@ossau.uklinux.net>
In-Reply-To: <2bc5f8210801101839w2b6ab7f8h3d99b6db35620a6@mail.gmail.com> (Julian Graham's message of "Thu, 10 Jan 2008 21:39:00 -0500")

"Julian Graham" writes:

> Hi Neil,
>
>> 1. Embarrassingly - given that I already said "Nice fix" to this - I'm
>> afraid I can't now see exactly why this is needed.
>
> Argh, you're right -- when I first noticed this behavior, I was so
> astonished to see my logs showing threads entering and leaving guile
> mode during GC that my first move was to try and prevent this.  When
> my changes got rid of this behavior, I assumed everything was
> hunky-dory.
> However, when pressed to explain the details, I added
> more logging, which showed that the errant thread ultimately did go to
> sleep at the proper time -- it just never woke up when the
> wake_up_cond was broadcast on.
>
> My current inclination is that the problem lies with sleeping on the
> global wake_up_cond -- each thread calls pthread_cond_wait with its
> own, thread-specific heap_mutex, the result of which is undefined, or
> so say the glibc docs.

Agreed.  All the examples I've seen have the same mutex for all
threads that wait on the cond var.

> I'm testing a fix now that uses a mutex reserved for this purpose
> instead.

OK.  While looking through the docs, though, and playing with possible
solutions, I noted a couple of other pitfalls (which the current code
also appears to suffer from).

1. pthread_cond_wait() returning does not necessarily mean that the
   cond var was signalled.  Apparently pthread_cond_wait() can return
   early because of an interrupt (a spurious wakeup).

2. If two threads are using pthread_cond_wait and pthread_cond_signal
   to communicate, and using the cond_var itself as a state
   indication, they have to be certain that the pthread_cond_wait
   starts before the pthread_cond_signal; otherwise the signal is
   lost and the waiter never wakes up.

The practical impact of these is that one shouldn't use the cond_var
itself as an indication of "reached so-and-so state".  Instead, one
can represent the state using an explicit variable, which is protected
by the associated mutex, and then interpret the cond_var as indicating
simply that the variable _might_ have changed.

In our case, I think the state variable could be
scm_i_thread_go_to_sleep, protected by thread_admin_mutex.  Here's a
possible solution based on this, but it isn't yet complete, because it
doesn't explain how num_guile_threads_awake is calculated.  (And I
have to go to bed!)

    void
    scm_i_thread_sleep_for_gc ()
    {
      scm_i_thread *t = suspend ();
      pthread_mutex_lock (&thread_admin_mutex);
      if (scm_i_thread_go_to_sleep)
        {
          /* Tell the collecting thread that we are now asleep.  */
          num_guile_threads_awake--;
          pthread_cond_signal (&going_to_sleep_cond);
          /* Re-check the state variable after every wakeup.  */
          while (scm_i_thread_go_to_sleep)
            {
              pthread_cond_wait (&wake_up_cond, &thread_admin_mutex);
            }
          num_guile_threads_awake++;
        }
      pthread_mutex_unlock (&thread_admin_mutex);
      resume (t);
    }

    void
    scm_i_thread_put_to_sleep ()
    {
      /* thread_admin_mutex stays locked until scm_i_thread_wake_up ().  */
      pthread_mutex_lock (&thread_admin_mutex);
      scm_i_thread_go_to_sleep = 1;
      while (num_guile_threads_awake > 0)
        {
          pthread_cond_wait (&going_to_sleep_cond, &thread_admin_mutex);
        }
    }

    void
    scm_i_thread_wake_up ()
    {
      scm_i_thread_go_to_sleep = 0;
      pthread_mutex_unlock (&thread_admin_mutex);
      pthread_cond_broadcast (&wake_up_cond);
    }

> So why hasn't this been reported before?  I'm not really sure, except
> that based on my logs, a GC involving more than two threads (one
> thread stays awake, of course, to manage the collection) is kind of
> rare.  It doesn't even necessarily happen during an entire run of my
> SRFI-18 test suite, which lasts for several seconds and is fairly
> multi-threaded.

Not sure what you mean here.  Surely if there are >2 threads, they all
have to go to sleep before GC can proceed?

>> Is that right?  I think you suggested in one of your previous emails
>> that it might be possible for thread A to enter and leave guile mode
>> multiple times, but I don't see how that is possible.
>
> It *is* possible, because a thread can enter and leave guile mode and
> do a fair number of things without SCM_TICK getting called.  I don't
> know if that's significant or not.

That may mean that we need some more SCM_TICK calls.  What kind of
processing was the thread doing?

>> 2. Should admin_mutex be locked in scm_c_thread_exited_p()?  I think
>> it should.  (This was equally wrong when using thread_admin_mutex, of
>> course; your patch doesn't make anything worse, but it's worth fixing
>> this in passing if you agree.)
>
> Sure -- wouldn't hurt.
> I'll include that with whatever ends up in the
> final "bug" patch.

Thanks.

> Apologies that it takes me so long to reply to these emails.  Blame
> the overhead of looping my test code all night?

No need to apologize there!  My time at the moment is pretty limited
too, so if you replied any quicker, you'd then just be waiting for me
(even more)!

Regards,
     Neil
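
For anyone who wants to try the state-variable idea outside Guile, here
is a minimal standalone sketch of the handshake discussed in this
message.  It is not Guile code: run_sleep_wake_cycle, worker,
admin_mutex, go_to_sleep, threads_awake, threads_resumed and
MAX_WORKERS are all hypothetical names made up for the illustration.
It also assumes the awake count can simply be seeded with the number
of workers before they start, which sidesteps the open question of
how num_guile_threads_awake would really be calculated.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16          /* hypothetical limit for this sketch */

/* One mutex guards all shared state and is used with both cond vars. */
static pthread_mutex_t admin_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t going_to_sleep_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t wake_up_cond = PTHREAD_COND_INITIALIZER;

static int go_to_sleep;         /* explicit state variable */
static int threads_awake;       /* workers not yet asleep */
static int threads_resumed;     /* workers that slept and woke again */

static void *
worker (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&admin_mutex);
  if (go_to_sleep)
    {
      threads_awake--;
      pthread_cond_signal (&going_to_sleep_cond);
      /* Loop on the state variable, so a spurious return from
         pthread_cond_wait just waits again.  */
      while (go_to_sleep)
        pthread_cond_wait (&wake_up_cond, &admin_mutex);
      threads_awake++;
      threads_resumed++;
    }
  pthread_mutex_unlock (&admin_mutex);
  return NULL;
}

/* Put N workers to sleep, wake them, and return how many resumed.  */
int
run_sleep_wake_cycle (int n)
{
  pthread_t tids[MAX_WORKERS];
  int created = 0;

  if (n < 0)
    n = 0;
  if (n > MAX_WORKERS)
    n = MAX_WORKERS;

  /* Request sleep before any worker exists, so every worker sees it.  */
  pthread_mutex_lock (&admin_mutex);
  go_to_sleep = 1;
  threads_awake = n;
  threads_resumed = 0;
  pthread_mutex_unlock (&admin_mutex);

  while (created < n
         && pthread_create (&tids[created], NULL, worker, NULL) == 0)
    created++;

  pthread_mutex_lock (&admin_mutex);
  threads_awake -= n - created; /* discount workers that never started */
  /* Workers may already have signalled; the counter records that, so
     a signal sent before this wait begins is not lost.  */
  while (threads_awake > 0)
    pthread_cond_wait (&going_to_sleep_cond, &admin_mutex);
  assert (threads_awake == 0);  /* the "GC" would run here */
  go_to_sleep = 0;
  pthread_cond_broadcast (&wake_up_cond);
  pthread_mutex_unlock (&admin_mutex);

  for (int i = 0; i < created; i++)
    pthread_join (tids[i], NULL);

  return threads_resumed;       /* safe to read: all workers joined */
}
```

Calling run_sleep_wake_cycle (4) from a main function should return 4
once all four workers have slept and been woken.  On toolchains where
the threads library is separate from libc, it needs to be built with
-pthread.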