From: "Julian Graham"
Newsgroups: gmane.lisp.guile.devel
Subject: Re: srfi-18 requirements
Date: Sat, 19 Jan 2008 15:10:48 -0500
To: "Neil Jerram"
Cc: Ludovic Courtès, guile-devel@gnu.org
Message-ID: <2bc5f8210801191210h72903a37q1c8f60e3638bfdba@mail.gmail.com>
In-Reply-To: <874pddcjdf.fsf@ossau.uklinux.net>

Hi Neil,

> OK. While looking through the docs, though, and playing with possible
> solutions, I noted a couple of other pitfalls (which the current code
> also appears to suffer from).
>
> 1. pthread_cond_wait() returning does not necessarily mean that the
>    cond var was signalled. Apparently pthread_cond_wait() can return
>    early because of an interrupt.

Yes, the pthreads docs refer to this as a "spurious wakeup."

> 2. If two threads are using pthread_cond_wait and pthread_cond_signal
>    to communicate, and using the cond_var itself as a state
>    indication, they have to be certain that the pthread_cond_wait
>    starts before the pthread_cond_signal, otherwise it won't work.

Right -- holding the right mutex when you signal or broadcast is pretty
important.

> The practical impact of these is that one shouldn't use the cond_var
> itself as an indication of "reached so-and-so state". Instead, one
> can represent the state using an explicit variable, which is protected
> by the associated mutex, and then interpret the cond_var as indicating
> simply that the variable _might_ have changed.
>
> In our case, I think the state variable could be
> scm_i_thread_go_to_sleep, protected by thread_admin_mutex. Here's a
> possible solution based on this, but it isn't yet complete, because it
> doesn't explain how num_guile_threads_awake is calculated. (And I
> have to go to bed!)

I've come up with something similar that seems to work decently and is
a bit simpler.
See what you think (apologies for the formatting):

  static scm_i_pthread_cond_t wake_up_cond;
  static scm_i_pthread_mutex_t wake_up_mutex;
  static int wake_up_flag = 0;   /* the state variable; read under wake_up_mutex */
  int scm_i_thread_go_to_sleep;

  /* Called by the thread that wants to GC: stop every other Guile thread. */
  void
  scm_i_thread_put_to_sleep ()
  {
    if (threads_initialized_p)
      {
        scm_i_thread *t;

        /* Release our own heap mutex so the loop below can take every
           thread's, ours included. */
        scm_leave_guile ();
        scm_i_pthread_mutex_lock (&thread_admin_mutex);

        wake_up_flag = 0;
        scm_i_thread_go_to_sleep = 1;
        /* Each lock blocks until that thread has released its heap mutex
           (e.g. by sleeping in scm_i_thread_sleep_for_gc). */
        for (t = all_threads; t; t = t->next_thread)
          scm_i_pthread_mutex_lock (&t->heap_mutex);
        scm_i_thread_go_to_sleep = 0;
      }
  }

  /* Called by the GC thread once collection is finished. */
  void
  scm_i_thread_wake_up ()
  {
    if (threads_initialized_p)
      {
        scm_i_thread *t;

        /* Set the state variable and broadcast while holding the mutex,
           so no sleeper can miss the change. */
        scm_i_pthread_mutex_lock (&wake_up_mutex);
        wake_up_flag = 1;
        scm_i_pthread_cond_broadcast (&wake_up_cond);
        scm_i_pthread_mutex_unlock (&wake_up_mutex);

        for (t = all_threads; t; t = t->next_thread)
          scm_i_pthread_mutex_unlock (&t->heap_mutex);
        scm_i_pthread_mutex_unlock (&thread_admin_mutex);
        scm_enter_guile ((scm_t_guile_ticket) SCM_I_CURRENT_THREAD);
      }
  }

  /* Called by a Guile thread that notices scm_i_thread_go_to_sleep. */
  void
  scm_i_thread_sleep_for_gc ()
  {
    scm_i_thread *t = suspend ();

    scm_i_pthread_cleanup_push ((void (*)(void *)) scm_i_pthread_mutex_unlock,
                                &wake_up_mutex);
    /* Take wake_up_mutex before releasing heap_mutex: the GC thread cannot
       broadcast until we are actually waiting, so the wakeup is not lost. */
    scm_i_pthread_mutex_lock (&wake_up_mutex);
    scm_i_pthread_mutex_unlock (&t->heap_mutex);
    /* Loop on the flag, since pthread_cond_wait may wake up spuriously. */
    do
      {
        scm_i_pthread_cond_wait (&wake_up_cond, &wake_up_mutex);
      }
    while (!wake_up_flag);
    scm_i_pthread_mutex_lock (&t->heap_mutex);
    scm_i_pthread_mutex_unlock (&wake_up_mutex);
    scm_i_pthread_cleanup_pop (0);
    resume (t);
  }

> > So why hasn't this been reported before? I'm not really sure, except
> > that based on my logs, a GC involving more than two threads (one
> > thread stays awake, of course, to manage the collection) is kind of
> > rare. It doesn't even necessarily happen during an entire run of my
> > SRFI-18 test suite, which lasts for several seconds and is fairly
> > multi-threaded.
>
> Not sure what you mean here. Surely if there are >2 threads, they all
> have to go to sleep before GC can proceed?
Of course -- all I meant by this was that in the existing thread tests
(and in much of the SRFI-18 test code I wrote), the lifespans of threads
other than the main thread (and the signal delivery thread) are usually
short enough that they don't end up participating in this whole co-op GC
process. Maybe we need some test code for longer-running, guile-mode
threads. (Perhaps developers with multi-threaded Guile application
development under their belts would care to chime in here?)

> > It *is* possible, because a thread can enter and leave guile mode and
> > do a fair number of things without SCM_TICK getting called. I don't
> > know if that's significant or not.
>
> That may mean that we need some more SCM_TICK calls. What kind of
> processing was the thread doing?

I'm not totally sure -- I'll have to add some more logs and get back to
you. I think there are definitely some places where an extra SCM_TICK
might do some good (fat_cond_timedwait, for example).

Regards,
Julian