* Locks and threads @ 2009-02-11 22:31 Neil Jerram 2009-02-11 23:05 ` Neil Jerram ` (4 more replies) 0 siblings, 5 replies; 51+ messages in thread From: Neil Jerram @ 2009-02-11 22:31 UTC (permalink / raw) To: Guile Development; +Cc: Linas Vepstas [-- Attachment #1: Type: text/plain, Size: 727 bytes --] I've started working through the lock ordering and threading issues that we have. My plan is (with 1.8.x) - first to address problems reported by helgrind (since I think we should take advantage of external tools before adding debug code to Guile internally) - then to run Linas's define-race program, and address the problems that that shows up (including thread-safe define, hopefully) - then (or maybe as part of the previous step) to look again at Linas's lock order debugging patch, to see if it would make sense to apply some of that. Does that sound sensible; have I missed anything? A patch for the first helgrind problem is attached; any comments would be appreciated, as ever. Regards, Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Lock-ordering-don-t-lock-heap_mutex-and-then-thread.patch --] [-- Type: text/x-diff, Size: 2882 bytes --] From 3dcf473e9e535314fc6dedc99ff8d9132e7a3e3e Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 11 Feb 2009 21:22:57 +0000 Subject: [PATCH] Lock ordering: don't lock heap_mutex and then thread_admin_mutex This fixes the following helgrind report. Thread #1: lock order "0x4325084 before 0x4105328" violated at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40D01EA: scm_i_thread_put_to_sleep (threads.c:1622) by 0x4078958: scm_make_fluid (fluids.c:114) by 0x40778D6: scm_init_feature (feature.c:101) by 0x408C62E: scm_i_init_guile (init.c:464) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:583) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:725) by 0x40D19BD: scm_with_guile (threads.c:714) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) Required order was established by acquisition of lock at 0x4325084 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40CFFA4: guilify_self_1 (threads.c:468) by 0x408C58B: scm_i_init_guile (init.c:421) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:583) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:725) by 0x40D19BD: scm_with_guile (threads.c:714) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) followed by a later acquisition of lock at 0x4105328 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40CFFAC: guilify_self_1 (threads.c:470) by 0x408C58B: scm_i_init_guile (init.c:421) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:583) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:725) by 0x40D19BD: scm_with_guile (threads.c:714) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) * threads.c (guilify_self_1): Add self to global thread list _before_ entering Guile mode. --- libguile/threads.c | 10 ++++++++-- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index ba17746..05e01e3 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -465,13 +465,19 @@ guilify_self_1 (SCM_STACKITEM *base) scm_i_pthread_setspecific (scm_i_thread_key, t); - scm_i_pthread_mutex_lock (&t->heap_mutex); - + /* As soon as this thread adds itself to the global thread list, the + GC may think that it has a stack that needs marking. Therefore + initialize t->top to be the same as t->base, just in case GC runs + before the thread can lock its heap_mutex for the first time. */ + t->top = t->base; scm_i_pthread_mutex_lock (&thread_admin_mutex); t->next_thread = all_threads; all_threads = t; thread_count++; scm_i_pthread_mutex_unlock (&thread_admin_mutex); + + /* Enter Guile mode. */ + scm_enter_guile (t); } /* Perform second stage of thread initialisation, in guile mode. -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 22:31 Locks and threads Neil Jerram @ 2009-02-11 23:05 ` Neil Jerram 2009-02-11 23:32 ` Ludovic Courtès 2009-02-11 23:30 ` Linas Vepstas ` (3 subsequent siblings) 4 siblings, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-02-11 23:05 UTC (permalink / raw) To: Guile Development [-- Attachment #1: Type: text/plain, Size: 241 bytes --] Neil Jerram <neil@ossau.uklinux.net> writes: > - first to address problems reported by helgrind (since I think we > should take advantage of external tools before adding debug code to > Guile internally) Here's the next one. Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Don-t-leave-Guile-mode-to-lock-async_mutex.patch --] [-- Type: text/x-diff, Size: 3403 bytes --] From 76f55c5796f1fc7aca6c36bc57f06bab72300a94 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 11 Feb 2009 23:03:16 +0000 Subject: [PATCH] Don't leave Guile mode to lock async_mutex This fixes lots of helgrind-reported problems, such as: Thread #1: lock order "0x4325084 before 0x4108928" violated at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40CFD37: scm_enter_guile (threads.c:377) by 0x40D0284: scm_pthread_mutex_lock (threads.c:1485) by 0x405A736: scm_async_click (async.c:155) by 0x406F9EE: deval (eval.c:4080) by 0x40761D9: scm_primitive_eval_x (eval.c:5921) by 0x40AD20E: install_handler (scmsigs.c:113) by 0x40AD402: scm_sigaction_for_thread (scmsigs.c:394) by 0x4087D1F: scm_gsubr_apply (gsubr.c:223) by 0x406DF55: scm_dapply (eval.c:4930) by 0x407147C: deval (eval.c:4378) by 0x406E1BD: scm_dapply (eval.c:5012) Required order was established by acquisition of lock at 0x4325084 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40CFD37: scm_enter_guile (threads.c:377) by 0x408C58B: scm_i_init_guile (init.c:421) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:589) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:731) by 0x40D19BD: scm_with_guile (threads.c:720) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) followed by a later acquisition of lock at 0x4108928 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40AFD61: scm_make_smob_type (smob.c:294) by 0x40AFF19: scm_smob_prehistory (smob.c:512) by 0x408C595: scm_i_init_guile (init.c:423) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:589) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:731) by 0x40D19BD: scm_with_guile (threads.c:720) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) * libguile/async.c (scm_async_click): Don't leave Guile mode when locking async_mutex. We don't need to, because none of the code that has async_mutex locked can block, and doing so may lead to lock ordering problems between async_mutex and a thread's heap_mutex. --- libguile/async.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/libguile/async.c b/libguile/async.c index a9763da..56634d5 100644 --- a/libguile/async.c +++ b/libguile/async.c @@ -152,7 +152,7 @@ scm_async_click () invoked even when pending_asyncs is zero. */ - scm_i_scm_pthread_mutex_lock (&async_mutex); + scm_i_pthread_mutex_lock (&async_mutex); t->pending_asyncs = 0; if (t->block_asyncs == 0) { @@ -197,7 +197,7 @@ scm_i_queue_async_cell (SCM c, scm_i_thread *t) int sleep_fd; SCM p; - scm_i_scm_pthread_mutex_lock (&async_mutex); + scm_i_pthread_mutex_lock (&async_mutex); p = t->active_asyncs; SCM_SETCDR (c, SCM_EOL); if (!scm_is_pair (p)) @@ -263,7 +263,7 @@ scm_i_setup_sleep (scm_i_thread *t, { int pending; - scm_i_scm_pthread_mutex_lock (&async_mutex); + scm_i_pthread_mutex_lock (&async_mutex); pending = t->pending_asyncs; if (!pending) { @@ -278,7 +278,7 @@ scm_i_setup_sleep (scm_i_thread *t, void scm_i_reset_sleep (scm_i_thread *t) { - scm_i_scm_pthread_mutex_lock (&async_mutex); + scm_i_pthread_mutex_lock (&async_mutex); t->sleep_object = SCM_BOOL_F; t->sleep_mutex = NULL; t->sleep_fd = -1; -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 23:05 ` Neil Jerram @ 2009-02-11 23:32 ` Ludovic Courtès 0 siblings, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-02-11 23:32 UTC (permalink / raw) To: guile-devel Neil Jerram <neil@ossau.uklinux.net> writes: > * libguile/async.c (scm_async_click): Don't leave Guile mode when > locking async_mutex. We don't need to, because none of the code > that has async_mutex locked can block, and doing so may lead to lock > ordering problems between async_mutex and a thread's heap_mutex. [...] > - scm_i_scm_pthread_mutex_lock (&async_mutex); > + scm_i_pthread_mutex_lock (&async_mutex); This one was tricky. I took me a fair amount of M-. to grasp the difference between these two variants. Thanks! Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 22:31 Locks and threads Neil Jerram 2009-02-11 23:05 ` Neil Jerram @ 2009-02-11 23:30 ` Linas Vepstas 2009-02-11 23:53 ` Neil Jerram 2009-02-11 23:30 ` Ludovic Courtès ` (2 subsequent siblings) 4 siblings, 1 reply; 51+ messages in thread From: Linas Vepstas @ 2009-02-11 23:30 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/2/11 Neil Jerram <neil@ossau.uklinux.net>: > I've started working through the lock ordering and threading issues > that we have. My plan is (with 1.8.x) > > - first to address problems reported by helgrind (since I think we > should take advantage of external tools before adding debug code to > Guile internally) > > - then to run Linas's define-race program, and address the problems > that that shows up (including thread-safe define, hopefully) > > - then (or maybe as part of the previous step) to look again at > Linas's lock order debugging patch, to see if it would make sense to > apply some of that. > > Does that sound sensible; have I missed anything? Err, sort of, yes, unless I misunderstand. Guile 1.8 makes a certain basic assumption that is splattered throughout the code; it rather intentionally re-orders the order in which one of the locks is taken. If I remember correctly, its the "in guile mode" lock. So if you just go looking for locks that are released out-of-order, you'll find lots of these. At the time, I had decided that 1) it would be a lot of work to get these in order, and the patch would likely be rejected, and 2) the reordering is essentially harmless (since its consistently done). 3) there might have even been a performance hit (I don't remember) by trying to get these into order. This made using valgrind impossible, and that's why I created the custom patch -- it made a point of ignoring this one reversed-order, while checking for badness in everything else. --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 23:30 ` Linas Vepstas @ 2009-02-11 23:53 ` Neil Jerram 2009-02-12 0:18 ` Linas Vepstas 2009-02-12 20:51 ` Ludovic Courtès 0 siblings, 2 replies; 51+ messages in thread From: Neil Jerram @ 2009-02-11 23:53 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development Linas Vepstas <linasvepstas@gmail.com> writes: > Err, sort of, yes, unless I misunderstand. Guile 1.8 makes > a certain basic assumption that is splattered throughout > the code; it rather intentionally re-orders the order in which > one of the locks is taken. If I remember correctly, its the > "in guile mode" lock. So if you just go looking for locks > that are released out-of-order, you'll find lots of these. Yes, I think I understand this now (having seen it myself). The pattern is - thread holding its heap_mutex - which is the normal state in guile mode - thread calls scm_i_scm_pthread_mutex_lock to lock some other mutex - scm_i_scm_pthread_mutex_lock: - unlocks the heap_mutex - locks the other mutex - locks the heap mutex again. That in itself doesn't actually cause an ordering problem, but then the thread releases the other mutex without releasing the heap mutex first - which is perceived (by helgrind at least) as a problem. (Is something like this actually _ever_ a problem? If locks are always _acquired_ in the right order, how can the order of _releasing_ ever cause a problem?) The async_mutex handling (that I've posted a patch for) is one example of this. > At the time, I had decided that > 1) it would be a lot of work to get these in order, and the > patch would likely be rejected, and > 2) the reordering is essentially harmless (since its > consistently done). > 3) there might have even been a performance hit (I don't > remember) by trying to get these into order. The other thing to bear in mind is that 99% of this will just evaporate if we move to BDW-GC for 2.0.x; so - assuming we do end up doing that - it makes sense to take a slightly more pragmatic approach than normal for 1.8.x. > This made using valgrind impossible, and that's why I created > the custom patch -- it made a point of ignoring this one > reversed-order, while checking for badness in everything else. Thanks. I understand this much better now! On the other hand, after the async_mutex patch, my helgrind output [1] is only reporting a couple of problems now, so it looks like helgrind-cleanliness might be achievable. [1] I am only running a basic startup test, though: "valgrind --tool=helgrind guile -q <<EOF". Were you running something a lot more complex than that? Regards, Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 23:53 ` Neil Jerram @ 2009-02-12 0:18 ` Linas Vepstas 2009-02-12 20:51 ` Ludovic Courtès 1 sibling, 0 replies; 51+ messages in thread From: Linas Vepstas @ 2009-02-12 0:18 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development [-- Attachment #1: Type: text/plain, Size: 2619 bytes --] 2009/2/11 Neil Jerram <neil@ossau.uklinux.net>: > Linas Vepstas <linasvepstas@gmail.com> writes: > >> Err, sort of, yes, unless I misunderstand. Guile 1.8 makes >> a certain basic assumption that is splattered throughout >> the code; it rather intentionally re-orders the order in which >> one of the locks is taken. If I remember correctly, its the >> "in guile mode" lock. So if you just go looking for locks >> that are released out-of-order, you'll find lots of these. > > Yes, I think I understand this now (having seen it myself). The > pattern is > > - thread holding its heap_mutex - which is the normal state in guile Yes, that's the one. > That in itself doesn't actually cause an ordering problem, but then > the thread releases the other mutex without releasing the heap mutex > first - which is perceived (by helgrind at least) as a problem. > > (Is something like this actually _ever_ a problem? If locks are > always _acquired_ in the right order, how can the order of _releasing_ > ever cause a problem?) Yes, it can be a problem; I don't want to dream up some particular scenario (this stuff destroys brain cells) ... but I do vaguely remember skimming some wikipedia article on locking that discussed this. I think the scenario involves three locks, though. This is why helgrind checks lock order -- its one of the locking problems it can actually detect. However, for the case of guile, the heap mutex is not visible to anything that isn't guile, and thus, its safe in this particular case. If there were outside users, things would be different. > The other thing to bear in mind is that 99% of this will just > evaporate if we move to BDW-GC for 2.0.x; so - assuming we do end up > doing that - it makes sense to take a slightly more pragmatic approach > than normal for 1.8.x. Sure. As a reminder ... the only real remaining problem that remained with a race to update some hash table, when define was being used from several threads. *thats* the bug that needs attention (but is hard to fix). > [1] I am only running a basic startup test, though: "valgrind > --tool=helgrind guile -q <<EOF". Were you running something a lot > more complex than that? I had written some simple test case, which I think sprouted a bunch of threads, and then did simple scheme things in each .. e.g. just adding numbers, or whatever. I'm attaching some kind of simple test case to this email however, it is very hacked, so I don't know if it actually will find bugs, and its probably doesn't do what it claims to do. I provide it only as a short-cut for creating a new test case. ... --linas [-- Attachment #2: thread-exp.c --] [-- Type: text/x-csrc, Size: 4552 bytes --] /** * Guile threading bug/unexpected behaviour. * * Two posix threads are created. The first thread runs, and defines * something. The second thread tries to access the thing defined by * the first. But, due to some "unexpected" threading behaviour, the * things defined in the first thread are not visible to the first. * I'm pretty convinced this is a bug, and am hunting it down now. * * The printed output of this program, for me, under guile1.8.5, is * * HEllo, thread one is running * Goodbye, thread one is exiting * HEllo, thread two is running * ERROR: Unbound variable: x * Goodbye, thread two is exiting * Main is exiting * * To get a better idea of what's going on, it seems to have something * to do with the current module. By turning on print debugging, the * output, for me, becomes: * * HEllo, thread one is running * thread one had output: * the root module is #<module (guile) f73f9e80> * the current module is #<directory (guile-user) f73fc600> * * Goodbye, thread one is exiting * HEllo, thread two is running * thread two had output: * the root module is #<module (guile) f73f9e80> * the current module is #<module (guile) f73f9e80> * * ERROR: Unbound variable: x * Goodbye, thread two is exiting * Main is exiting */ #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <libguile.h> SCM outport; void prtdbg(const char * id) { #define SHOW_STUFF 1 #ifdef SHOW_STUFF char buff[100]; sprintf(buff, "(display \"duuude id %s\\n\")\n", id); scm_c_eval_string (buff); scm_c_eval_string ("(display \"the root module is \")\n"); scm_c_eval_string ("(display the-root-module)\n"); scm_c_eval_string ("(newline)\n"); scm_c_eval_string ("(display \"the current module is \")\n"); scm_c_eval_string ("(display (current-module))\n"); scm_c_eval_string ("(newline)\n"); SCM out = scm_get_output_string(outport); char * str = scm_to_locale_string(out); scm_truncate_file(outport, scm_from_uint16(0)); printf("%s had output:\n%s\n", id, str); #endif } void * scm_one (void *p) { outport = scm_open_output_string(); scm_set_current_output_port(outport); prtdbg("thread one"); // scm_c_eval_string ("(set-current-module the-root-module)\n"); // prtdbg("thread one again"); scm_c_eval_string ("(define x \"asddf\")\n"); } void * scm_two (void *p) { scm_set_current_output_port(outport); prtdbg("thread two"); // scm_c_eval_string ("(display x)\n"); scm_c_eval_string ("(display \"duuuuuuuuuuuuauuuuuuuuuuuude\")\n"); scm_c_eval_string ("(define (fact n) (if (= n 1) 1 (* n (fact (- n 1)))))\n" "(display \"wahazzzuppp\\n\")\n" "(display (fact 369))\n" "(display \"yeah\\n\")\n"); } void * scm_three (void *p) { scm_set_current_output_port(outport); prtdbg("thread three"); scm_c_eval_string ("(display x)\n"); } void * scm_four (void *p) { prtdbg("thread four"); scm_c_eval_string ("(display x)\n"); } static void * thread_zero (void * arg) { printf("Thread zeroooooo \n"); scm_with_guile(scm_one, NULL); printf("Thread exit \n"); return NULL; } static void * thread_one (void * arg) { printf("HEllo, thread one is running\n"); pthread_attr_t attr; pthread_t t0; pthread_attr_init(&attr); // pthread_create(&t0, &attr, thread_zero, NULL); scm_with_guile(scm_one, NULL); printf("Goodbye, thread one is exiting\n"); return NULL; } static void * thread_two (void * arg) { printf("HEllo, thread two is running\n"); scm_with_guile(scm_two, NULL); printf("Goodbye, thread two is exiting\n"); return NULL; } static void * thread_three (void * arg) { printf("HEllo, thread three is running\n"); scm_with_guile(scm_three, NULL); printf("Goodbye, thread three is exiting\n"); return NULL; } static void * thread_four (void * arg) { printf("HEllo, thread four is running\n"); scm_with_guile(scm_four, NULL); printf("Goodbye, thread four is exiting\n"); return NULL; } main() { int rc; pthread_attr_t attr; pthread_t t1, t2, t3, t4; pthread_attr_init(&attr); rc = pthread_create(&t1, &attr, thread_one, NULL); if(rc) { fprintf(stderr, "Fatal error: can't create thread one\n"); exit(1); } sleep(1); rc = pthread_create(&t2, &attr, thread_two, NULL); if(rc) { fprintf(stderr, "Fatal error: can't create thread two\n"); exit(1); } sleep(1); rc = pthread_create(&t3, &attr, thread_three, NULL); if(rc) { fprintf(stderr, "Fatal error: can't create thread three\n"); exit(1); } sleep(1); rc = pthread_create(&t4, &attr, thread_four, NULL); sleep(5); printf("Main is exiting\n"); } ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 23:53 ` Neil Jerram 2009-02-12 0:18 ` Linas Vepstas @ 2009-02-12 20:51 ` Ludovic Courtès 1 sibling, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-02-12 20:51 UTC (permalink / raw) To: guile-devel Hello, Neil Jerram <neil@ossau.uklinux.net> writes: > (Is something like this actually _ever_ a problem? If locks are > always _acquired_ in the right order, how can the order of _releasing_ > ever cause a problem?) It can't be a problem, AFAIUI. I don't think Helgrind check the releasing order, though; the manual doesn't mention it (http://valgrind.org/docs/manual/hg-manual.html#hg-manual.lock-orders). Thanks, Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 22:31 Locks and threads Neil Jerram 2009-02-11 23:05 ` Neil Jerram 2009-02-11 23:30 ` Linas Vepstas @ 2009-02-11 23:30 ` Ludovic Courtès 2009-02-12 12:55 ` Greg Troxel 2009-03-04 23:49 ` Neil Jerram 4 siblings, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-02-11 23:30 UTC (permalink / raw) To: guile-devel Hi Neil, Neil Jerram <neil@ossau.uklinux.net> writes: > I've started working through the lock ordering and threading issues > that we have. My plan is (with 1.8.x) > > - first to address problems reported by helgrind (since I think we > should take advantage of external tools before adding debug code to > Guile internally) > > - then to run Linas's define-race program, and address the problems > that that shows up (including thread-safe define, hopefully) > > - then (or maybe as part of the previous step) to look again at > Linas's lock order debugging patch, to see if it would make sense to > apply some of that. > > Does that sound sensible; have I missed anything? Yes. I think Helgrind can be very helpful. > diff --git a/libguile/threads.c b/libguile/threads.c > index ba17746..05e01e3 100644 > --- a/libguile/threads.c > +++ b/libguile/threads.c > @@ -465,13 +465,19 @@ guilify_self_1 (SCM_STACKITEM *base) > > scm_i_pthread_setspecific (scm_i_thread_key, t); > > - scm_i_pthread_mutex_lock (&t->heap_mutex); > - > + /* As soon as this thread adds itself to the global thread list, the > + GC may think that it has a stack that needs marking. Therefore > + initialize t->top to be the same as t->base, just in case GC runs > + before the thread can lock its heap_mutex for the first time. */ > + t->top = t->base; > scm_i_pthread_mutex_lock (&thread_admin_mutex); > t->next_thread = all_threads; > all_threads = t; > thread_count++; > scm_i_pthread_mutex_unlock (&thread_admin_mutex); > + > + /* Enter Guile mode. */ > + scm_enter_guile (t); Looks good. `scm_enter_guile ()' does a bit more than the `scm_i_pthread_mutex_lock ()' call that was removed, but it appears to be OK. Thanks for looking into this! Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 22:31 Locks and threads Neil Jerram ` (2 preceding siblings ...) 2009-02-11 23:30 ` Ludovic Courtès @ 2009-02-12 12:55 ` Greg Troxel 2009-02-12 18:00 ` Ken Raeburn 2009-03-05 20:41 ` Neil Jerram 2009-03-04 23:49 ` Neil Jerram 4 siblings, 2 replies; 51+ messages in thread From: Greg Troxel @ 2009-02-12 12:55 UTC (permalink / raw) To: Neil Jerram; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 557 bytes --] Does that sound sensible; have I missed anything? Also run tests on other than Linux, with as many different OS threading implementations as possible. For all systems, set the threading libraries to the most restrictive settings, specifically aborting on any behavior the standard says is undefined. If the define-race program gets added as a check that make check runs, my autobuild machine will run it daily. I am not familiar with helgrind; not sure if it's useful to run that on other platforms once it's ok on one, or what platforms it runs on. [-- Attachment #2: Type: application/pgp-signature, Size: 193 bytes --] ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-12 12:55 ` Greg Troxel @ 2009-02-12 18:00 ` Ken Raeburn 2009-02-12 21:14 ` Ludovic Courtès 2009-03-05 20:41 ` Neil Jerram 1 sibling, 1 reply; 51+ messages in thread From: Ken Raeburn @ 2009-02-12 18:00 UTC (permalink / raw) To: Greg Troxel; +Cc: guile-devel, Neil Jerram On Feb 12, 2009, at 07:55, Greg Troxel wrote: > Does that sound sensible; have I missed anything? > > Also run tests on other than Linux, with as many different OS > threading > implementations as possible. For all systems, set the threading > libraries to the most restrictive settings, specifically aborting on > any > behavior the standard says is undefined. Yeah, definitely NetBSD is a good one for this. > If the define-race program gets added as a check that make check runs, > my autobuild machine will run it daily. > > I am not familiar with helgrind; not sure if it's useful to run that > on > other platforms once it's ok on one, or what platforms it runs on. It's part of the valgrind suite, which currently only runs on Linux, though there is porting work in progress. The helgrind tool checks for possible synchronization issues. As helgrind and NetBSD's pthread checking code work by instrumenting the executable and watching progress, I'd suggest also trying out static analysis tools if anyone has them handy. I see guile is already listed with Coverity's open-source scanner project; who's reviewing that data, if anyone? Ken ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-12 18:00 ` Ken Raeburn @ 2009-02-12 21:14 ` Ludovic Courtès 2009-02-14 1:25 ` Ken Raeburn 0 siblings, 1 reply; 51+ messages in thread From: Ludovic Courtès @ 2009-02-12 21:14 UTC (permalink / raw) To: guile-devel Hello, Ken Raeburn <raeburn@raeburn.org> writes: > As helgrind and NetBSD's pthread checking code work by instrumenting > the executable and watching progress, I'd suggest also trying out > static analysis tools if anyone has them handy. I see guile is > already listed with Coverity's open-source scanner project; who's > reviewing that data, if anyone? I don't feel like supporting that tool. I noticed that other free software supporters are doubtful, too: http://blog.josefsson.org/2007/04/02/boycott-scancoveritycom/ . We could use tools like Splint, but it seems to be quite intrusive and difficult to use when not used from the beginning of the project (to get an idea it gives 97 warnings on `alist.c'...). Thanks, Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-12 21:14 ` Ludovic Courtès @ 2009-02-14 1:25 ` Ken Raeburn 2009-02-14 16:09 ` Ludovic Courtès 0 siblings, 1 reply; 51+ messages in thread From: Ken Raeburn @ 2009-02-14 1:25 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel On Feb 12, 2009, at 16:14, Ludovic Courtès wrote: > Ken Raeburn <raeburn@raeburn.org> writes: >> [... Coverity ...] > I don't feel like supporting that tool. I noticed that other free > software supporters are doubtful, too: > http://blog.josefsson.org/2007/04/02/boycott-scancoveritycom/ . The license situation might be a problem for some, sure; that's up to the individuals to decide. It looks okay by me, but I don't have time to deal. Quite a number of the projects at scan.coverity.com are GNU projects, including emacs and gcc; any idea if anyone from the GNU Project has talked to Coverity in an official capacity about getting the terms changed? > We could use tools like Splint, but it seems to be quite intrusive and > difficult to use when not used from the beginning of the project (to > get > an idea it gives 97 warnings on `alist.c'...). I used to use Splint, and it did find a few problems for me; between the intrusiveness and the relative lack of work on it these days, and the difficulty in expressing some constructs to it, I don't bother any more. I'm not convinced it was worth the time and annoyance to annotate the code. Ken ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-14 1:25 ` Ken Raeburn @ 2009-02-14 16:09 ` Ludovic Courtès 0 siblings, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-02-14 16:09 UTC (permalink / raw) To: guile-devel Hi, Ken Raeburn <raeburn@raeburn.org> writes: > Quite a number of the projects at scan.coverity.com are GNU projects, > including emacs and gcc; any idea if anyone from the GNU Project has > talked to Coverity in an official capacity about getting the terms > changed? No idea. > I used to use Splint, and it did find a few problems for me; between > the intrusiveness and the relative lack of work on it these days, and > the difficulty in expressing some constructs to it, I don't bother any > more. I'm not convinced it was worth the time and annoyance to > annotate the code. Yes, that's exactly what I feared. Then there's "cilly" [0], part of the CIL framework, but I've never used it and it's unclear to me what it's good at. We could also use more GCC warnings. Thanks, Ludo'. [0] http://manju.cs.berkeley.edu/cil/cil007.html#sec-driver ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-12 12:55 ` Greg Troxel 2009-02-12 18:00 ` Ken Raeburn @ 2009-03-05 20:41 ` Neil Jerram 1 sibling, 0 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-05 20:41 UTC (permalink / raw) To: Greg Troxel; +Cc: guile-devel Greg Troxel <gdt@ir.bbn.com> writes: > Does that sound sensible; have I missed anything? > > Also run tests on other than Linux, with as many different OS threading > implementations as possible. For all systems, set the threading > libraries to the most restrictive settings, specifically aborting on any > behavior the standard says is undefined. I can't (or don't want to) do that myself. We already have some coverage here from Simon Josefsson's autobuilder, the Debian builds, and Guile users running on interesting platforms. If anyone sees opportunities for feeding Guile into more build systems, please go ahead! > If the define-race program gets added as a check that make check runs, > my autobuild machine will run it daily. Yes, I will try to incorporate define-race in this way. Thanks for your comments! Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-02-11 22:31 Locks and threads Neil Jerram ` (3 preceding siblings ...) 2009-02-12 12:55 ` Greg Troxel @ 2009-03-04 23:49 ` Neil Jerram 2009-03-05 3:54 ` Linas Vepstas ` (2 more replies) 4 siblings, 3 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-04 23:49 UTC (permalink / raw) To: Guile Development; +Cc: Linas Vepstas [-- Attachment #1: Type: text/plain, Size: 474 bytes --] Neil Jerram <neil@ossau.uklinux.net> writes: > - first to address problems reported by helgrind (since I think we > should take advantage of external tools before adding debug code to > Guile internally) Here's another lock ordering fix. (For 1.8.x at least, I haven't checked if it's relevant to master yet.) 1.8.x is then helgrind-clean (apart from one unimportant-looking termination issue), so next it's on to the real problem, threadsafe define. Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Lock-ordering-don-t-allocate-when-in-critical-secti.patch --] [-- Type: text/x-diff, Size: 6292 bytes --] From 588edc4cd1bfea7d51c9aed463e8119482e7a3f0 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 4 Mar 2009 23:45:11 +0000 Subject: [PATCH] Lock ordering: don't allocate when in critical section (scm_sigaction_for_thread) This fixes the following helgrind report. Thread #1: lock order "0x4114748 before 0x4331084" violated at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x407A55F: deval (eval.c:4077) by 0x407A071: scm_dapply (eval.c:5012) by 0x407888A: scm_apply (eval.c:4811) by 0x407E20C: scm_call_0 (eval.c:4666) by 0x406BDF7: scm_dynamic_wind (dynwind.c:111) by 0x407620C: ceval (eval.c:4571) by 0x407E8D9: scm_primitive_eval_x (eval.c:5921) by 0x407E934: scm_eval_x (eval.c:5956) by 0x40B9140: scm_shell (script.c:737) by 0x4095AC5: invoke_main_func (init.c:367) by 0x4065A81: c_body (continuations.c:349) Required order was established by acquisition of lock at 0x4114748 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40B7BE1: scm_sigaction_for_thread (scmsigs.c:339) by 0x40911DE: scm_gsubr_apply (gsubr.c:223) by 0x4079E36: scm_dapply (eval.c:4930) by 0x407AB68: deval (eval.c:4378) by 0x407A071: scm_dapply (eval.c:5012) by 0x407888A: scm_apply (eval.c:4811) by 0x407E1D0: scm_call_1 (eval.c:4672) by 0x407FEEE: scm_map (eval.c:5489) by 0x407BDCC: deval (eval.c:4367) by 0x407D197: deval (eval.c:3698) by 0x407A071: scm_dapply (eval.c:5012) followed by a later acquisition of lock at 0x4331084 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40DC397: scm_enter_guile (threads.c:377) by 0x40DC8F4: scm_pthread_mutex_lock (threads.c:1485) by 0x4083FAA: scm_gc_for_newcell (gc.c:484) by 0x40A9781: scm_cons (inline.h:115) by 0x4079867: scm_dapply (eval.c:4850) by 0x407888A: scm_apply (eval.c:4811) by 0x40796D6: scm_call_3 (eval.c:4684) by 0x409A7C2: module_variable (modules.c:302) by 0x409A7EE: module_variable (modules.c:312) by 0x409A962: scm_sym2var (modules.c:466) by 0x40738F4: scm_lookupcar1 (eval.c:2874) * libguile/scmsigs.c (close_1): Renamed `handler_to_async'; also handle #f case and wrapping the async in a cons, if necessary. (install_handler): Pass in async instead of constructing it; combine two branches into one. (scm_sigaction_for_thread): Allocate async upfront instead of inside the critical section, and pass it to install_handler calls. Leave critical section before signaling out-of-range error. --- libguile/scmsigs.c | 53 ++++++++++++++++++++++++++++----------------------- 1 files changed, 29 insertions(+), 24 deletions(-) diff --git a/libguile/scmsigs.c b/libguile/scmsigs.c index 9433273..e15bbf3 100644 --- a/libguile/scmsigs.c +++ b/libguile/scmsigs.c @@ -108,10 +108,20 @@ static SIGRETTYPE (*orig_handlers[NSIG])(int); #endif static SCM -close_1 (SCM proc, SCM arg) +handler_to_async (SCM handler, int signum) { - return scm_primitive_eval_x (scm_list_3 (scm_sym_lambda, SCM_EOL, - scm_list_2 (proc, arg))); + if (scm_is_false (handler)) + return SCM_BOOL_F; + else + { + SCM async = scm_primitive_eval_x (scm_list_3 (scm_sym_lambda, SCM_EOL, + scm_list_2 (handler, + scm_from_int (signum)))); +#if !SCM_USE_PTHREAD_THREADS + async = scm_cons (async, SCM_BOOL_F); +#endif + return async; + } } #if SCM_USE_PTHREAD_THREADS @@ -239,23 +249,10 @@ ensure_signal_delivery_thread () #endif /* !SCM_USE_PTHREAD_THREADS */ static void -install_handler (int signum, SCM thread, SCM handler) +install_handler (int signum, SCM thread, SCM handler, SCM async) { - if (scm_is_false (handler)) - { - SCM_SIMPLE_VECTOR_SET (*signal_handlers, signum, SCM_BOOL_F); - SCM_SIMPLE_VECTOR_SET (signal_handler_asyncs, signum, SCM_BOOL_F); - } - else - { - SCM async = close_1 (handler, scm_from_int (signum)); -#if !SCM_USE_PTHREAD_THREADS - async = scm_cons (async, SCM_BOOL_F); -#endif - SCM_SIMPLE_VECTOR_SET (*signal_handlers, signum, handler); - SCM_SIMPLE_VECTOR_SET (signal_handler_asyncs, signum, async); - } - + SCM_SIMPLE_VECTOR_SET (*signal_handlers, signum, handler); + SCM_SIMPLE_VECTOR_SET (signal_handler_asyncs, signum, async); SCM_SIMPLE_VECTOR_SET (signal_handler_threads, signum, thread); } @@ -308,6 +305,7 @@ SCM_DEFINE (scm_sigaction_for_thread, "sigaction", 1, 3, 0, int save_handler = 0; SCM old_handler; + SCM async; csig = scm_to_signed_integer (signum, 0, NSIG-1); @@ -334,6 +332,10 @@ SCM_DEFINE (scm_sigaction_for_thread, "sigaction", 1, 3, 0, SCM_MISC_ERROR ("thread has already exited", SCM_EOL); } + /* Allocate upfront, as we can't do it inside the critical + section. */ + async = handler_to_async (handler, csig); + ensure_signal_delivery_thread (); SCM_CRITICAL_SECTION_START; @@ -351,10 +353,13 @@ SCM_DEFINE (scm_sigaction_for_thread, "sigaction", 1, 3, 0, #else chandler = (SIGRETTYPE (*) (int)) handler_int; #endif - install_handler (csig, SCM_BOOL_F, SCM_BOOL_F); + install_handler (csig, SCM_BOOL_F, SCM_BOOL_F, async); } else - SCM_OUT_OF_RANGE (2, handler); + { + SCM_CRITICAL_SECTION_END; + SCM_OUT_OF_RANGE (2, handler); + } } else if (scm_is_false (handler)) { @@ -366,7 +371,7 @@ SCM_DEFINE (scm_sigaction_for_thread, "sigaction", 1, 3, 0, { action = orig_handlers[csig]; orig_handlers[csig].sa_handler = SIG_ERR; - install_handler (csig, SCM_BOOL_F, SCM_BOOL_F); + install_handler (csig, SCM_BOOL_F, SCM_BOOL_F, async); } #else if (orig_handlers[csig] == SIG_ERR) @@ -375,7 +380,7 @@ SCM_DEFINE (scm_sigaction_for_thread, "sigaction", 1, 3, 0, { chandler = orig_handlers[csig]; orig_handlers[csig] = SIG_ERR; - install_handler (csig, SCM_BOOL_F, SCM_BOOL_F); + install_handler (csig, SCM_BOOL_F, SCM_BOOL_F, async); } #endif } @@ -391,7 +396,7 @@ SCM_DEFINE (scm_sigaction_for_thread, "sigaction", 1, 3, 0, if (orig_handlers[csig] == SIG_ERR) save_handler = 1; #endif - install_handler (csig, thread, handler); + install_handler (csig, thread, handler, async); } /* XXX - Silently ignore setting handlers for `program error signals' -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-04 23:49 ` Neil Jerram @ 2009-03-05 3:54 ` Linas Vepstas 2009-03-05 19:46 ` Neil Jerram 2009-03-05 21:35 ` Ludovic Courtès 2009-03-10 23:57 ` Neil Jerram 2 siblings, 1 reply; 51+ messages in thread From: Linas Vepstas @ 2009-03-05 3:54 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/3/4 Neil Jerram <neil@ossau.uklinux.net>: > Here's another lock ordering fix. (For 1.8.x at least, I haven't > checked if it's relevant to master yet.) I skimmed it quickly, looks reasonable, except for this: else - SCM_OUT_OF_RANGE (2, handler); + { + SCM_CRITICAL_SECTION_END; + SCM_OUT_OF_RANGE (2, handler); + } The matching SCM_CRITICAL_SECTION_START; is not in an if block, ... before your change, where was the matching END? Why is there now an END when one did not used to be needed? Was this a bug you fixed, unrelated to the rest of the patch? In the past, without this END, there should have been a deadlock, so I guess this demonstrates that the else branch is never taken? All this is rather confusing. Now, if this was the linux kernel, people would ask you to split this patch into two patches: one to fix this unbalanced start/end problem, and another to pull the alloc out of the critical section. That makes it easier to review the correctness of the changes ... --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 3:54 ` Linas Vepstas @ 2009-03-05 19:46 ` Neil Jerram 2009-03-05 20:05 ` Neil Jerram 0 siblings, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-05 19:46 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development Linas Vepstas <linasvepstas@gmail.com> writes: > I skimmed it quickly, looks reasonable, Thanks for reviewing! > except for this: > > else > - SCM_OUT_OF_RANGE (2, handler); > + { > + SCM_CRITICAL_SECTION_END; > + SCM_OUT_OF_RANGE (2, handler); > + } > > The matching SCM_CRITICAL_SECTION_START; > is not in an if block, ... before your change, where > was the matching END? Why is there now an END > when one did not used to be needed? Was this a > bug you fixed, unrelated to the rest of the patch? Yes, it's an unrelated bug. All of the places that raise errors (and so exit non-locally) should exit the critical section first. > In the past, without this END, there should have been > a deadlock, so I guess this demonstrates that the else > branch is never taken? Yes. But that's expected; the else branch is only used if `sigaction' is called with incorrect parameters. > Now, if this was the linux kernel, people would ask you > to split this patch into two patches: one to fix this > unbalanced start/end problem, and another to pull the > alloc out of the critical section. That makes it easier > to review the correctness of the changes ... You're absolutely right. I'll leave this part out, and generate a separate patch for it. Thanks again, Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 19:46 ` Neil Jerram @ 2009-03-05 20:05 ` Neil Jerram 2009-03-05 20:40 ` Linas Vepstas ` (2 more replies) 0 siblings, 3 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-05 20:05 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development [-- Attachment #1: Type: text/plain, Size: 320 bytes --] Neil Jerram <neil@ossau.uklinux.net> writes: > Yes, it's an unrelated bug. All of the places that raise errors (and > so exit non-locally) should exit the critical section first. > You're absolutely right. I'll leave this part out, and generate a > separate patch for it. Here's the separate patch... Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Avoid-throw-from-critical-section-given-invalid-sig.patch --] [-- Type: text/x-diff, Size: 2861 bytes --] From 4d78f2447a5a1e3cb1acf596929b1bfe4fbe3884 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Thu, 5 Mar 2009 20:03:33 +0000 Subject: [PATCH] Avoid throw from critical section, given invalid sigaction call * libguile/scmsigs.c (scm_sigaction_for_thread): Exit critical section before raising out-of-range error. * test-suite/Makefile.am (SCM_TESTS): Add signals.test. * test-suite/tests/signals.test: New file. --- libguile/scmsigs.c | 5 ++++- test-suite/Makefile.am | 1 + test-suite/tests/signals.test | 29 +++++++++++++++++++++++++++++ 3 files changed, 34 insertions(+), 1 deletions(-) create mode 100644 test-suite/tests/signals.test diff --git a/libguile/scmsigs.c b/libguile/scmsigs.c index c406007..e15bbf3 100644 --- a/libguile/scmsigs.c +++ b/libguile/scmsigs.c @@ -356,7 +356,10 @@ SCM_DEFINE (scm_sigaction_for_thread, "sigaction", 1, 3, 0, install_handler (csig, SCM_BOOL_F, SCM_BOOL_F, async); } else - SCM_OUT_OF_RANGE (2, handler); + { + SCM_CRITICAL_SECTION_END; + SCM_OUT_OF_RANGE (2, handler); + } } else if (scm_is_false (handler)) { diff --git a/test-suite/Makefile.am b/test-suite/Makefile.am index df8c7bc..5492137 100644 --- a/test-suite/Makefile.am +++ b/test-suite/Makefile.am @@ -63,6 +63,7 @@ SCM_TESTS = tests/alist.test \ tests/reader.test \ tests/receive.test \ tests/regexp.test \ + tests/signals.test \ tests/socket.test \ tests/srcprop.test \ tests/srfi-1.test \ diff --git a/test-suite/tests/signals.test b/test-suite/tests/signals.test new file mode 100644 index 0000000..7cab85e --- /dev/null +++ b/test-suite/tests/signals.test @@ -0,0 +1,29 @@ +;;;; signals.test --- test suite for Guile's signal functions -*- scheme -*- +;;;; +;;;; Copyright (C) 2009 Free Software Foundation, Inc. +;;;; +;;;; This program is free software; you can redistribute it and/or modify +;;;; it under the terms of the GNU General Public License as published by +;;;; the Free Software Foundation; either version 2, or (at your option) +;;;; any later version. +;;;; +;;;; This program is distributed in the hope that it will be useful, +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;;;; GNU General Public License for more details. +;;;; +;;;; You should have received a copy of the GNU General Public License +;;;; along with this software; see the file COPYING. If not, write to +;;;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, +;;;; Boston, MA 02110-1301 USA + + +(use-modules (test-suite lib)) + +(with-test-prefix "sigaction" + + (pass-if-exception "handler arg is an invalid integer" + exception:out-of-range + (sigaction SIGINT 51)) + + ) -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 20:05 ` Neil Jerram @ 2009-03-05 20:40 ` Linas Vepstas 2009-03-05 20:49 ` Neil Jerram 2009-03-05 20:57 ` Linas Vepstas 2009-03-25 19:00 ` Neil Jerram 2 siblings, 1 reply; 51+ messages in thread From: Linas Vepstas @ 2009-03-05 20:40 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/3/5 Neil Jerram <neil@ossau.uklinux.net>: >> Yes, it's an unrelated bug. All of the places that raise errors (and >> so exit non-locally) should exit the critical section first. > >> You're absolutely right. I'll leave this part out, and generate a >> separate patch for it. > > Here's the separate patch... Looks good, except for the test case: + (pass-if-exception "handler arg is an invalid integer" + exception:out-of-range + (sigaction SIGINT 51)) whereas my bits/signum.h shows: #define SIGUNUSED 31 #define _NSIG 65 /* Biggest signal number + 1 (including real-time signals). */ I presume that in the era of 64-bit CPU's we might start seeing new sigs in the 32-64 range. Or maybe the RT kernels already do ... ? --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 20:40 ` Linas Vepstas @ 2009-03-05 20:49 ` Neil Jerram 0 siblings, 0 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-05 20:49 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development Linas Vepstas <linasvepstas@gmail.com> writes: > 2009/3/5 Neil Jerram <neil@ossau.uklinux.net>: >>> Yes, it's an unrelated bug. All of the places that raise errors (and >>> so exit non-locally) should exit the critical section first. >> >>> You're absolutely right. I'll leave this part out, and generate a >>> separate patch for it. >> >> Here's the separate patch... > > Looks good, except for the test case: > > + (pass-if-exception "handler arg is an invalid integer" > + exception:out-of-range > + (sigaction SIGINT 51)) > > whereas my bits/signum.h shows: > > #define SIGUNUSED 31 > > #define _NSIG 65 /* Biggest signal number + 1 > (including real-time signals). */ > > I presume that in the era of 64-bit CPU's we might > start seeing new sigs in the 32-64 range. Or maybe > the RT kernels already do ... ? Right, but the 51 is the handler arg, for which the only valid integer values are SIG_DFL (0) and SIG_IGN (1). I'll add a comment to explain this, and code to make the test fail in case SIG_DFL or SIG_IGN is (on some platform) 51. Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 20:05 ` Neil Jerram 2009-03-05 20:40 ` Linas Vepstas @ 2009-03-05 20:57 ` Linas Vepstas 2009-03-05 21:25 ` Neil Jerram 2009-03-25 19:00 ` Neil Jerram 2 siblings, 1 reply; 51+ messages in thread From: Linas Vepstas @ 2009-03-05 20:57 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/3/5 Neil Jerram <neil@ossau.uklinux.net>: >> Yes, it's an unrelated bug. All of the places that raise errors (and >> so exit non-locally) should exit the critical section first. > >> You're absolutely right. I'll leave this part out, and generate a >> separate patch for it. > > Here's the separate patch... Err, OK, so I thought I'd look at the code more carefully. I don't understand the patch. libguile/scmsigs.c has a SCM_CRITICAL_SECTION_START at line 339, which seems to be balanced by SCM_CRITICAL_SECTION_END; at lines 442 and 461, right before the return from the subroutine. So why insert this seemingly un-needed SCM_CRITICAL_SECTION_END, and worse, nest it in an else block? --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 20:57 ` Linas Vepstas @ 2009-03-05 21:25 ` Neil Jerram 2009-03-05 21:56 ` Linas Vepstas 0 siblings, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-05 21:25 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development Linas Vepstas <linasvepstas@gmail.com> writes: > I don't understand the patch. > > libguile/scmsigs.c has a SCM_CRITICAL_SECTION_START > at line 339, which seems to be balanced by > SCM_CRITICAL_SECTION_END; > at lines 442 and 461, right before the return from > the subroutine. > > So why insert this seemingly un-needed SCM_CRITICAL_SECTION_END, > and worse, nest it in an else block? Because when scm_sigaction_for_thread decides to throw an out-of-range error (by the `SCM_OUT_OF_RANGE' call), it will exit non-locally at that point (by calling `longjmp'). So, in this case, it won't pass through (either of) the SCM_CRITICAL_SECTION_ENDs just before the return statement. Does that make sense now? Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 21:25 ` Neil Jerram @ 2009-03-05 21:56 ` Linas Vepstas 2009-03-06 11:01 ` Andy Wingo 2009-03-08 22:04 ` Neil Jerram 0 siblings, 2 replies; 51+ messages in thread From: Linas Vepstas @ 2009-03-05 21:56 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/3/5 Neil Jerram <neil@ossau.uklinux.net>: > Linas Vepstas <linasvepstas@gmail.com> writes: > >> I don't understand the patch. >> >> libguile/scmsigs.c has a SCM_CRITICAL_SECTION_START >> at line 339, which seems to be balanced by >> SCM_CRITICAL_SECTION_END; >> at lines 442 and 461, right before the return from >> the subroutine. >> >> So why insert this seemingly un-needed SCM_CRITICAL_SECTION_END, >> and worse, nest it in an else block? > > Because when scm_sigaction_for_thread decides to throw an out-of-range > error (by the `SCM_OUT_OF_RANGE' call), it will exit non-locally at > that point (by calling `longjmp'). So, in this case, it won't pass > through (either of) the SCM_CRITICAL_SECTION_ENDs just before the > return statement. > > Does that make sense now? Ah. I suspected as much, and started reading the code, but clearly I didn't go deep enough. Just now I double checked and eventually found the longjump in throw.c after digging fairly deep. Perhaps I'm naive, perhaps some naming convention could be used to indicate that SCM_OUT_OF_RANGE will never return? None of the functions in the call stack gave any real hint that they might now return; they mostly looked liked ordinary functions. --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 21:56 ` Linas Vepstas @ 2009-03-06 11:01 ` Andy Wingo 2009-03-06 12:36 ` Linas Vepstas 2009-03-08 22:04 ` Neil Jerram 1 sibling, 1 reply; 51+ messages in thread From: Andy Wingo @ 2009-03-06 11:01 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development, Neil Jerram Hi Linas, On Thu 05 Mar 2009 22:56, Linas Vepstas <linasvepstas@gmail.com> writes: > Perhaps I'm naive, perhaps some naming convention > could be used to indicate that SCM_OUT_OF_RANGE > will never return? None of the functions in the call stack > gave any real hint that they might now return; they mostly > looked liked ordinary functions. Changing the names would be quite an undertaking. Perhaps we should be more clear about this point in the docs, though, as it is fundamental to Guile programming in C. Cheers, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-06 11:01 ` Andy Wingo @ 2009-03-06 12:36 ` Linas Vepstas 2009-03-06 22:05 ` Ludovic Courtès 0 siblings, 1 reply; 51+ messages in thread From: Linas Vepstas @ 2009-03-06 12:36 UTC (permalink / raw) To: Andy Wingo; +Cc: Guile Development, Neil Jerram 2009/3/6 Andy Wingo <wingo@pobox.com>: > Hi Linas, > > On Thu 05 Mar 2009 22:56, Linas Vepstas <linasvepstas@gmail.com> writes: > >> Perhaps I'm naive, perhaps some naming convention >> could be used to indicate that SCM_OUT_OF_RANGE >> will never return? None of the functions in the call stack >> gave any real hint that they might now return; they mostly >> looked liked ordinary functions. > > Changing the names would be quite an undertaking. Perhaps we should be > more clear about this point in the docs, though, as it is fundamental to > Guile programming in C. The Linux kernel uses the gcc markup __attribute__((noreturn)) to indicate a function that does not return; so e.g. void blahfunc(SCM args) __attribute__((noreturn)); It used to be common to see #define NORET __attribute__((noreturn)) and void NORET blafunc(...); put that style passed on ... I have no clue what the compiler does with this. perhaps it handles the stack differently, knowing its never unwound, or maybe it doesn't generate the function epilogue. Even if the compiler does nothing, it would still be a handy clue to the programmer reading the code! --linas /usr/src/linux-2.6.28.4# grep -r noreturn * |wc 92 439 7488 ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-06 12:36 ` Linas Vepstas @ 2009-03-06 22:05 ` Ludovic Courtès 0 siblings, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-03-06 22:05 UTC (permalink / raw) To: guile-devel Hi, Linas Vepstas <linasvepstas@gmail.com> writes: > /usr/src/linux-2.6.28.4# grep -r noreturn * |wc > 92 439 7488 $ grep SCM_NORETURN libguile/*.[ch] | wc 25 155 1758 Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 21:56 ` Linas Vepstas 2009-03-06 11:01 ` Andy Wingo @ 2009-03-08 22:04 ` Neil Jerram 1 sibling, 0 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-08 22:04 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development Linas Vepstas <linasvepstas@gmail.com> writes: > Perhaps I'm naive, perhaps some naming convention > could be used to indicate that SCM_OUT_OF_RANGE > will never return? None of the functions in the call stack > gave any real hint that they might now return; they mostly > looked liked ordinary functions. As Andy has said, this kind of thing is quite pervasive in Guile, and so I think it's something that developers become used to very quickly. (Incidentally, it happens in your define-race code too. The first time that the program hits an undefined variable, the error (from within scm_c_eval_string) causes a jump back out to scm_with_guile. For the thread concerned, therefore, s->count is never incremented anymore, and the thread just keeps hitting the same undefined variable over and over again...) In this code (scm_sigaction_for_thread) it would probably be better to use the scm_dynwind_* API, and in particular scm_dynwind_critical_section, as that would cover the mainline case and all the error cases in one go. But I'm surprised to see scm_dynwind_critical_section using a fat mutex; need to look into that a bit more... Regards, Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-05 20:05 ` Neil Jerram 2009-03-05 20:40 ` Linas Vepstas 2009-03-05 20:57 ` Linas Vepstas @ 2009-03-25 19:00 ` Neil Jerram 2009-03-25 22:08 ` Ludovic Courtès 2 siblings, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-25 19:00 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development Neil Jerram <neil@ossau.uklinux.net> writes: >> Yes, it's an unrelated bug. All of the places that raise errors (and >> so exit non-locally) should exit the critical section first. > >> You're absolutely right. I'll leave this part out, and generate a >> separate patch for it. > > Here's the separate patch... > Subject: [PATCH] Avoid throw from critical section, given invalid sigaction call > > * libguile/scmsigs.c (scm_sigaction_for_thread): Exit critical section > before raising out-of-range error. > > * test-suite/Makefile.am (SCM_TESTS): Add signals.test. > > * test-suite/tests/signals.test: New file. I've committed this now too. (Also to 1.8.x. I'll come back later to look at which of all these lock fixes are needed in master.) Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-25 19:00 ` Neil Jerram @ 2009-03-25 22:08 ` Ludovic Courtès 0 siblings, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-03-25 22:08 UTC (permalink / raw) To: guile-devel Hi Neil, Neil Jerram <neil@ossau.uklinux.net> writes: > Neil Jerram <neil@ossau.uklinux.net> writes: >> Subject: [PATCH] Avoid throw from critical section, given invalid sigaction call >> >> * libguile/scmsigs.c (scm_sigaction_for_thread): Exit critical section >> before raising out-of-range error. >> >> * test-suite/Makefile.am (SCM_TESTS): Add signals.test. >> >> * test-suite/tests/signals.test: New file. > > I've committed this now too. (Also to 1.8.x. I'll come back later to > look at which of all these lock fixes are needed in master.) Perfect! This one can go in `master' too, I think. Neil Jerram <neil@ossau.uklinux.net> writes: > Neil Jerram <neil@ossau.uklinux.net> writes: > >> I've been running Linas's define-race test program, and hitting the >> `throw from within critical section' quite a lot. >> >> We've already discussed this a couple of times: >> http://www.mail-archive.com/bug-guile@gnu.org/msg04613.html >> http://www.mail-archive.com/guile-devel@gnu.org/msg03016.html >> >> Andy proposed just removing the check for 1.8.x, but I would prefer to >> keep it if possible, as I think there probably still are genuine >> problems where the check could help. What about the patch below? > > I don't believe anyone commented on this, but I've committed it now to > 1.8.x. In case of any concerns, please let me know. Sorry for not commenting earlier. This looks good to me. I suppose it can't hurt in `master'; the future of critical sections is uncertain, though. Thanks! Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-04 23:49 ` Neil Jerram 2009-03-05 3:54 ` Linas Vepstas @ 2009-03-05 21:35 ` Ludovic Courtès 2009-03-10 23:57 ` Neil Jerram 2 siblings, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-03-05 21:35 UTC (permalink / raw) To: guile-devel Hello Neil, Thanks for the patches and tests! Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-04 23:49 ` Neil Jerram 2009-03-05 3:54 ` Linas Vepstas 2009-03-05 21:35 ` Ludovic Courtès @ 2009-03-10 23:57 ` Neil Jerram 2009-03-12 0:07 ` Neil Jerram 2009-03-25 18:57 ` Neil Jerram 2 siblings, 2 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-10 23:57 UTC (permalink / raw) To: Guile Development; +Cc: Linas Vepstas [-- Attachment #1: Type: text/plain, Size: 617 bytes --] Neil Jerram <neil@ossau.uklinux.net> writes: > next it's on to the real problem, threadsafe define. I've been running Linas's define-race test program, and hitting the `throw from within critical section' quite a lot. We've already discussed this a couple of times: http://www.mail-archive.com/bug-guile@gnu.org/msg04613.html http://www.mail-archive.com/guile-devel@gnu.org/msg03016.html Andy proposed just removing the check for 1.8.x, but I would prefer to keep it if possible, as I think there probably still are genuine problems where the check could help. What about the patch below? Thanks, Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Fix-spurious-throw-from-within-critical-section-er.patch --] [-- Type: text/x-diff, Size: 4178 bytes --] From 2daf18f36c6428cbc883c0eb1a381ffbba59614b Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Tue, 10 Mar 2009 23:55:31 +0000 Subject: [PATCH] Fix spurious `throw from within critical section' errors The crux of this problem was that the thread doing a throw, and so checking scm_i_critical_section_level, was different from the thread that was in a critical section. * libguile/async.h (scm_i_critical_section_level): Removed, replaced by per-thread critical_section_level. (SCM_CRITICAL_SECTION_START, SCM_CRITICAL_SECTION_END): Use per-thread critical_section_level. * libguile/continuations.c (scm_dynthrow): Check per-thread critical_section_level. * libguile/threads.c (guilify_self_1): Init per-thread critical_section_level. (scm_i_critical_section_level): Removed. * libguile/threads.h (scm_i_thread): New critical_section_level field. * libguile/throw.c (scm_ithrow): Check per-thread critical_section_level. --- libguile/async.h | 8 +++----- libguile/continuations.c | 2 +- libguile/threads.c | 2 +- libguile/threads.h | 3 +++ libguile/throw.c | 2 +- 5 files changed, 9 insertions(+), 8 deletions(-) diff --git a/libguile/async.h b/libguile/async.h index a81a98d..3c1725f 100644 --- a/libguile/async.h +++ b/libguile/async.h @@ -58,20 +58,18 @@ void scm_dynwind_unblock_asyncs (void); the manual. */ -/* Defined in threads.c. scm_i_critical_section_level is only used - for error checking and will go away eventually. */ +/* Defined in threads.c. */ extern scm_i_pthread_mutex_t scm_i_critical_section_mutex; -extern int scm_i_critical_section_level; #define SCM_CRITICAL_SECTION_START \ do { \ scm_i_pthread_mutex_lock (&scm_i_critical_section_mutex);\ SCM_I_CURRENT_THREAD->block_asyncs++; \ - scm_i_critical_section_level++; \ + SCM_I_CURRENT_THREAD->critical_section_level++; \ } while (0) #define SCM_CRITICAL_SECTION_END \ do { \ - scm_i_critical_section_level--; \ + SCM_I_CURRENT_THREAD->critical_section_level--; \ SCM_I_CURRENT_THREAD->block_asyncs--; \ scm_i_pthread_mutex_unlock (&scm_i_critical_section_mutex); \ scm_async_click (); \ diff --git a/libguile/continuations.c b/libguile/continuations.c index 74bb911..69d2569 100644 --- a/libguile/continuations.c +++ b/libguile/continuations.c @@ -256,7 +256,7 @@ scm_dynthrow (SCM cont, SCM val) SCM_STACKITEM *dst = thread->continuation_base; SCM_STACKITEM stack_top_element; - if (scm_i_critical_section_level) + if (thread->critical_section_level) { fprintf (stderr, "continuation invoked from within critical section.\n"); abort (); diff --git a/libguile/threads.c b/libguile/threads.c index 05e01e3..fc3e607 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -422,6 +422,7 @@ guilify_self_1 (SCM_STACKITEM *base) t->active_asyncs = SCM_EOL; t->block_asyncs = 1; t->pending_asyncs = 1; + t->critical_section_level = 0; t->last_debug_frame = NULL; t->base = base; #ifdef __ia64__ @@ -1667,7 +1668,6 @@ scm_i_thread_sleep_for_gc () /* This mutex is used by SCM_CRITICAL_SECTION_START/END. */ scm_i_pthread_mutex_t scm_i_critical_section_mutex; -int scm_i_critical_section_level = 0; static SCM dynwind_critical_section_mutex; diff --git a/libguile/threads.h b/libguile/threads.h index 9417b88..2b0e067 100644 --- a/libguile/threads.h +++ b/libguile/threads.h @@ -113,6 +113,9 @@ typedef struct scm_i_thread { scm_t_contregs *pending_rbs_continuation; #endif + /* Whether this thread is in a critical section. */ + int critical_section_level; + } scm_i_thread; #define SCM_I_IS_THREAD(x) SCM_SMOB_PREDICATE (scm_tc16_thread, x) diff --git a/libguile/throw.c b/libguile/throw.c index 7e2803f..92c5a1a 100644 --- a/libguile/throw.c +++ b/libguile/throw.c @@ -692,7 +692,7 @@ scm_ithrow (SCM key, SCM args, int noreturn SCM_UNUSED) SCM dynpair = SCM_UNDEFINED; SCM winds; - if (scm_i_critical_section_level) + if (SCM_I_CURRENT_THREAD->critical_section_level) { fprintf (stderr, "throw from within critical section.\n"); abort (); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-10 23:57 ` Neil Jerram @ 2009-03-12 0:07 ` Neil Jerram 2009-03-12 0:53 ` Neil Jerram 2009-03-25 18:57 ` Neil Jerram 1 sibling, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-12 0:07 UTC (permalink / raw) To: Guile Development; +Cc: Linas Vepstas [-- Attachment #1: Type: text/plain, Size: 1001 bytes --] Neil Jerram <neil@ossau.uklinux.net> writes: >> next it's on to the real problem, threadsafe define. > > I've been running Linas's define-race test program, and hitting the > `throw from within critical section' quite a lot. [...] After the patch that I posted yesterday, we finally get to the real problem. I've tried to address it by allowing a hashtable to have a mutex associated with it - resulting in the patch below. But unfortunately it's still not good; a sample run of test-define-race gives: #<module (guile) 40376f00> #<directory (guile-user) 40379680> ERROR: ERROR: Unbound variable: x1-100499 Unbound variable: define ERROR: Unbound variable: x4-100596 ERROR: Unbound variable: define ERROR: Unbound variable: define ERROR: Unbound variable: define guile-define test case: good-bye! test-define-race: 2 error(s) FAIL: test-define-race I'm off to sleep now, so I thought I'd post what I've done in case others have thoughts on it or can see something wrong. Regards, Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Work-on-define-race.patch --] [-- Type: text/x-diff, Size: 16184 bytes --] From fd6931d5694f336aac9b9c5179dc0a4fc6f2c4c5 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 11 Mar 2009 23:54:38 +0000 Subject: [PATCH] Work on define race --- libguile/hashtab.c | 130 +++++++++++++++++++++--- libguile/hashtab.h | 4 + libguile/modules.c | 1 - libguile/throw.c | 5 +- test-suite/standalone/Makefile.am | 5 + test-suite/standalone/test-define-race.c | 163 ++++++++++++++++++++++++++++++ 6 files changed, 291 insertions(+), 17 deletions(-) create mode 100644 test-suite/standalone/test-define-race.c diff --git a/libguile/hashtab.c b/libguile/hashtab.c index ea7fc69..098b610 100644 --- a/libguile/hashtab.c +++ b/libguile/hashtab.c @@ -103,6 +103,7 @@ make_hash_table (int flags, unsigned long k, const char *func_name) t->upper = 9 * n / 10; t->flags = flags; t->hash_fn = NULL; + t->mutex = SCM_BOOL_F; if (flags) { SCM_NEWSMOB3 (table, scm_tc16_hashtable, vector, t, weak_hashtables); @@ -199,6 +200,13 @@ scm_i_rehash (SCM table, } +static SCM +hashtable_mark (SCM table) +{ + scm_gc_mark (SCM_HASHTABLE_MUTEX (table)); + return SCM_CELL_OBJECT_1 (table); +} + static int hashtable_print (SCM exp, SCM port, scm_print_state *pstate SCM_UNUSED) { @@ -417,18 +425,34 @@ scm_hash_fn_get_handle (SCM table, SCM obj, unsigned long (*hash_fn)(), SCM (*as #define FUNC_NAME "scm_hash_fn_get_handle" { unsigned long k; - SCM h; + SCM h, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - table = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + table = SCM_HASHTABLE_VECTOR (table); + } else SCM_VALIDATE_VECTOR (1, table); if (SCM_SIMPLE_VECTOR_LENGTH (table) == 0) - return SCM_BOOL_F; + { + h = SCM_BOOL_F; + goto end; + } k = hash_fn (obj, SCM_SIMPLE_VECTOR_LENGTH (table), closure); if (k >= SCM_SIMPLE_VECTOR_LENGTH (table)) scm_out_of_range ("hash_fn_get_handle", scm_from_ulong (k)); h = assoc_fn (obj, SCM_SIMPLE_VECTOR_REF (table, k), closure); + + end: + if (!scm_is_false (mutex)) + scm_dynwind_end (); + return h; } #undef FUNC_NAME @@ -440,10 +464,18 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init, unsigned long (*hash_ #define FUNC_NAME "scm_hash_fn_create_handle_x" { unsigned long k; - SCM buckets, it; + SCM buckets, it, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else { SCM_ASSERT (scm_is_simple_vector (table), @@ -458,7 +490,7 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init, unsigned long (*hash_ scm_out_of_range ("hash_fn_create_handle_x", scm_from_ulong (k)); it = assoc_fn (obj, SCM_SIMPLE_VECTOR_REF (buckets, k), closure); if (scm_is_pair (it)) - return it; + ; else if (scm_is_true (it)) scm_wrong_type_arg_msg (NULL, 0, it, "a pair"); else @@ -492,8 +524,13 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init, unsigned long (*hash_ || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table)) scm_i_rehash (table, hash_fn, closure, FUNC_NAME); } - return SCM_CAR (new_bucket); + it = SCM_CAR (new_bucket); } + + if (!scm_is_false (mutex)) + scm_dynwind_end (); + + return it; } #undef FUNC_NAME @@ -531,10 +568,18 @@ scm_hash_fn_remove_x (SCM table, SCM obj, void *closure) { unsigned long k; - SCM buckets, h; + SCM buckets, h, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else { SCM_ASSERT (scm_is_simple_vector (table), table, @@ -542,7 +587,10 @@ scm_hash_fn_remove_x (SCM table, SCM obj, buckets = table; } if (SCM_SIMPLE_VECTOR_LENGTH (table) == 0) - return SCM_EOL; + { + h = SCM_EOL; + goto end; + } k = hash_fn (obj, SCM_SIMPLE_VECTOR_LENGTH (buckets), closure); if (k >= SCM_SIMPLE_VECTOR_LENGTH (buckets)) @@ -559,6 +607,9 @@ scm_hash_fn_remove_x (SCM table, SCM obj, scm_i_rehash (table, hash_fn, closure, "scm_hash_fn_remove_x"); } } + end: + if (!scm_is_false (mutex)) + scm_dynwind_end (); return h; } @@ -569,8 +620,16 @@ SCM_DEFINE (scm_hash_clear_x, "hash-clear!", 1, 0, 0, { if (SCM_HASHTABLE_P (table)) { + SCM mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } scm_vector_fill_x (SCM_HASHTABLE_VECTOR (table), SCM_EOL); SCM_SET_HASHTABLE_N_ITEMS (table, 0); + if (!scm_is_false (mutex)) + scm_dynwind_end (); } else scm_vector_fill_x (table, SCM_EOL); @@ -917,10 +976,18 @@ SCM scm_internal_hash_fold (SCM (*fn) (), void *closure, SCM init, SCM table) { long i, n; - SCM buckets, result = init; + SCM buckets, result = init, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else buckets = table; @@ -940,6 +1007,9 @@ scm_internal_hash_fold (SCM (*fn) (), void *closure, SCM init, SCM table) } } + if (!scm_is_false (mutex)) + scm_dynwind_end (); + return result; } @@ -955,10 +1025,18 @@ void scm_internal_hash_for_each_handle (SCM (*fn) (), void *closure, SCM table) { long i, n; - SCM buckets; + SCM buckets, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else buckets = table; @@ -977,6 +1055,9 @@ scm_internal_hash_for_each_handle (SCM (*fn) (), void *closure, SCM table) ls = SCM_CDR (ls); } } + + if (!scm_is_false (mutex)) + scm_dynwind_end (); } SCM_DEFINE (scm_hash_fold, "hash-fold", 3, 0, 0, @@ -1065,6 +1146,25 @@ SCM_DEFINE (scm_hash_map_to_list, "hash-map->list", 2, 0, 0, } #undef FUNC_NAME +SCM_DEFINE (scm_hash_use_mutex_x, "hash-use-mutex!", 2, 0, 0, + (SCM table, SCM mutex), + "Use @var{mutex} to serialize operations on @var{table} from multiple threads.") +#define FUNC_NAME s_scm_hash_use_mutex_x +{ + /* Must be a real (i.e. not a vector) and non-weak hash table. */ + SCM_VALIDATE_HASHTABLE (1, table); + if (SCM_HASHTABLE_WEAK_P (table)) + SCM_MISC_ERROR ("can't use mutex with a weak hash table", SCM_EOL); + + if (!scm_is_false (mutex)) + SCM_VALIDATE_MUTEX (2, mutex); + + SCM_SET_HASHTABLE_MUTEX (table, mutex); + + return SCM_UNSPECIFIED; +} +#undef FUNC_NAME + \f @@ -1072,7 +1172,7 @@ void scm_hashtab_prehistory () { scm_tc16_hashtable = scm_make_smob_type (s_hashtable, 0); - scm_set_smob_mark (scm_tc16_hashtable, scm_markcdr); + scm_set_smob_mark (scm_tc16_hashtable, hashtable_mark); scm_set_smob_print (scm_tc16_hashtable, hashtable_print); scm_set_smob_free (scm_tc16_hashtable, hashtable_free); scm_c_hook_add (&scm_after_gc_c_hook, rehash_after_gc, 0, 0); diff --git a/libguile/hashtab.h b/libguile/hashtab.h index 1017354..250f59a 100644 --- a/libguile/hashtab.h +++ b/libguile/hashtab.h @@ -58,6 +58,8 @@ SCM_API scm_t_bits scm_tc16_hashtable; #define SCM_HASHTABLE_DECREMENT(x) (SCM_HASHTABLE_N_ITEMS(x)--) #define SCM_HASHTABLE_UPPER(x) (SCM_HASHTABLE (x)->upper) #define SCM_HASHTABLE_LOWER(x) (SCM_HASHTABLE (x)->lower) +#define SCM_HASHTABLE_MUTEX(x) (SCM_HASHTABLE (x)->mutex) +#define SCM_SET_HASHTABLE_MUTEX(x, m) (SCM_HASHTABLE (x)->mutex = m) #define SCM_HASHTABLE_N_BUCKETS(h) \ SCM_SIMPLE_VECTOR_LENGTH (SCM_HASHTABLE_VECTOR (h)) @@ -74,6 +76,7 @@ typedef struct scm_t_hashtable { int size_index; /* index into hashtable_size */ int min_size_index; /* minimum size_index */ unsigned long (*hash_fn) (); /* for rehashing after a GC. */ + SCM mutex; /* mutex for thread safety */ } scm_t_hashtable; \f @@ -132,6 +135,7 @@ SCM_API SCM scm_hash_fold (SCM proc, SCM init, SCM hash); SCM_API SCM scm_hash_for_each (SCM proc, SCM hash); SCM_API SCM scm_hash_for_each_handle (SCM proc, SCM hash); SCM_API SCM scm_hash_map_to_list (SCM proc, SCM hash); +SCM_API SCM scm_hash_use_mutex_x (SCM hash, SCM mutex); SCM_API void scm_hashtab_prehistory (void); SCM_API void scm_init_hashtab (void); diff --git a/libguile/modules.c b/libguile/modules.c index 444ece1..da397d8 100644 --- a/libguile/modules.c +++ b/libguile/modules.c @@ -279,7 +279,6 @@ SCM_DEFINE (scm_env_module, "env-module", 1, 0, 0, * C level implementation of the standard eval closure * * This increases loading speed substantially. - * The code will be replaced by the low-level environments in next release. */ static SCM module_make_local_var_x_var; diff --git a/libguile/throw.c b/libguile/throw.c index 92c5a1a..b2ec371 100644 --- a/libguile/throw.c +++ b/libguile/throw.c @@ -264,6 +264,7 @@ scm_c_with_throw_handler (SCM tag, { SCM pre_unwind, answer; struct pre_unwind_data c; + SCM new_dynwinds; c.handler = handler; c.handler_data = handler_data; @@ -272,7 +273,9 @@ scm_c_with_throw_handler (SCM tag, pre_unwind = make_pre_unwind_data (&c); SCM_CRITICAL_SECTION_START; - scm_i_set_dynwinds (scm_acons (tag, pre_unwind, scm_i_dynwinds ())); + new_dynwinds = scm_acons (tag, pre_unwind, SCM_BOOL_F); + SCM_SETCDR (new_dynwinds, scm_i_dynwinds ()); + scm_i_set_dynwinds (new_dynwinds); SCM_CRITICAL_SECTION_END; answer = (*body) (body_data); diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am index 356573f..bb2f089 100644 --- a/test-suite/standalone/Makefile.am +++ b/test-suite/standalone/Makefile.am @@ -137,6 +137,11 @@ test_scm_with_guile_LDADD = ${top_builddir}/libguile/libguile.la check_PROGRAMS += test-scm-with-guile TESTS += test-scm-with-guile +test_define_race_CFLAGS = ${test_cflags} +test_define_race_LDADD = ${top_builddir}/libguile/libguile.la +check_PROGRAMS += test-define-race +TESTS += test-define-race + else EXTRA_DIST += test-with-guile-module.c test-scm-with-guile.c diff --git a/test-suite/standalone/test-define-race.c b/test-suite/standalone/test-define-race.c new file mode 100644 index 0000000..a79dcee --- /dev/null +++ b/test-suite/standalone/test-define-race.c @@ -0,0 +1,163 @@ +/* Copyright (C) 2009 Free Software Foundation, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* + * test-define-race.c + * + * Program to exhibit a race in the guile-1.8.x define code. + * See https://savannah.gnu.org/bugs/index.php?24867 for general + * status and description. + * + * Summary: variable definition and lookup is not thread-safe in guile; + * attempting to look up a variable while another thread is defining + * a variable can sometimes lead to the first thread loosing, and not + * seeing an existing, defined variable. Alternately, difining two + * different variables at the same time can result in one of them + * failing to be defined; on rarer occasions, a seg-fault results. + * + * Compile as: + * cc test-define-race.c -lpthread -lguile + * + * May need to run several times to see the bug(s). + * + * Linas Vepstas <linasvepstas@gmail.com> December 2008 + */ + +#include <libguile.h> +#include <pthread.h> +#include <stdio.h> +#include <stdlib.h> + +typedef struct +{ + int id; + int count; + int do_exit; + int error_count; + int last_ok; + +} state; + +static void * guile_mode_definer(void * ud) +{ + SCM result; + int val; + + char buff[1000]; + state * s = (state *) ud; + + /* Increment before evaluation, in case evaluation raises an + error. */ + s->count ++; + + if (s->last_ok) + { + /* See if the previous definition holds the expected value. If + the expected value is undefined, scm_c_eval_string will raise + an error. */ + s->error_count ++; + s->last_ok = 0; + sprintf (buff, "x%d-%d\n", s->id, s->count - 1); + result = scm_c_eval_string (buff); + val = scm_to_int(result); + + if (val != s->count - 1) + printf ("Define mismatch on thread %d\n", s->id); + else + s->error_count --; + } + + /* Define a new variable with a new value. */ + sprintf (buff, "(define x%d-%d %d)\n", s->id, s->count, s->count); + scm_c_eval_string (buff); + + /* If we reach here, the definition was apparently successful, so we + can check it on the next iteration. */ + s->last_ok = 1; + + return NULL; +} + +static void * definer (void *ud) +{ + int i; + state * s = (state *) ud; + + while(!s->do_exit) + for (i=0; i<4000; i++) + { + scm_with_guile (guile_mode_definer, ud); + sched_yield(); /* try to get the threads to inter-leave a lot */ + } + return NULL; +} + +static void init_ctr(state *s, int val) +{ + s->id = val; + s->count = 0; + s->do_exit = 0; + s->error_count = 0; + s->last_ok = 0; +} + +static void * be_threadsafe(void * ud) +{ + scm_c_eval_string ("(begin (write the-root-module) (newline))"); + scm_c_eval_string ("(begin (write (module-obarray the-root-module)) (newline))"); + scm_c_eval_string ("(hash-use-mutex! (module-obarray the-root-module) (make-mutex))"); + scm_c_eval_string ("(begin (write (current-module)) (newline))"); + scm_c_eval_string ("(begin (write (module-obarray (current-module))) (newline))"); + scm_c_eval_string ("(hash-use-mutex! (module-obarray (current-module)) (make-mutex))"); + return NULL; +} + +int main(int argc, char ** argv) +{ + pthread_t th1, th2, th3, th4; + state counter1, counter2, counter3, counter4; + int error_total; + + scm_with_guile (be_threadsafe, NULL); + + init_ctr (&counter1, 1); + init_ctr (&counter2, 2); + init_ctr (&counter3, 3); + init_ctr (&counter4, 4); + + pthread_create(&th1, NULL, definer, (void *) &counter1); + pthread_create(&th2, NULL, definer, (void *) &counter2); + pthread_create(&th3, NULL, definer, (void *) &counter3); + pthread_create(&th4, NULL, definer, (void *) &counter4); + + sleep(200); + counter1.do_exit = 1; + counter2.do_exit = 1; + counter3.do_exit = 1; + counter4.do_exit = 1; + printf ("guile-define test case: good-bye!\n"); + + pthread_join(th1, NULL); + pthread_join(th2, NULL); + pthread_join(th3, NULL); + pthread_join(th4, NULL); + + error_total = (counter1.error_count + counter2.error_count + + counter3.error_count + counter4.error_count); + printf("test-define-race: %d error(s)\n", error_total); + exit (error_total); +} -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-12 0:07 ` Neil Jerram @ 2009-03-12 0:53 ` Neil Jerram 2009-03-12 1:29 ` Linas Vepstas 2009-03-12 22:13 ` Andy Wingo 0 siblings, 2 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-12 0:53 UTC (permalink / raw) To: Guile Development; +Cc: Linas Vepstas Neil Jerram <neil@ossau.uklinux.net> writes: > #<module (guile) 40376f00> > #<directory (guile-user) 40379680> > ERROR: ERROR: Unbound variable: x1-100499 > Unbound variable: define > ERROR: Unbound variable: x4-100596 > ERROR: Unbound variable: define > ERROR: Unbound variable: define > ERROR: Unbound variable: define > guile-define test case: good-bye! > test-define-race: 2 error(s) > FAIL: test-define-race > > I'm off to sleep now, so I thought I'd post what I've done in case > others have thoughts on it or can see something wrong. Thanks to a hint from helgrind, I think the problem might be that the symbols obarray is not thread-safe. But surely using a mutex for every symbol that Guile reads would be terrible for performance... so I hope there is an alternative. Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-12 0:53 ` Neil Jerram @ 2009-03-12 1:29 ` Linas Vepstas 2009-03-12 3:09 ` Clinton Ebadi 2009-03-25 22:13 ` Neil Jerram 2009-03-12 22:13 ` Andy Wingo 1 sibling, 2 replies; 51+ messages in thread From: Linas Vepstas @ 2009-03-12 1:29 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/3/11 Neil Jerram <neil@ossau.uklinux.net>: > Neil Jerram <neil@ossau.uklinux.net> writes: > >> #<module (guile) 40376f00> >> #<directory (guile-user) 40379680> >> ERROR: ERROR: Unbound variable: x1-100499 >> Unbound variable: define >> ERROR: Unbound variable: x4-100596 >> ERROR: Unbound variable: define >> ERROR: Unbound variable: define >> ERROR: Unbound variable: define >> guile-define test case: good-bye! >> test-define-race: 2 error(s) >> FAIL: test-define-race >> >> I'm off to sleep now, so I thought I'd post what I've done in case >> others have thoughts on it or can see something wrong. > > Thanks to a hint from helgrind, I think the problem might be that the > symbols obarray is not thread-safe. But surely using a mutex for > every symbol that Guile reads would be terrible for performance... so > I hope there is an alternative. Well, once you identify the section that needs locking, you'll want to use an rwlock instead of a mutex. The rwlock (pthread_rwlock_rdlock) allows multiple simultaneous readers. The writers, however, get exclusive access. (pthread_rwlock_wrlock) I have no clue as to what the overhead is. I know that the powerpc architecture implements locking primitives that are extremely fast (a few cycles); I assume Intel does as well. These are used liberally within the kernel to implement all sorts of fast atomic ops. Every now and then, some newcomer reinvents a locking system using these primitives, "for performance reasons", and posts it to some mailing list. The standard, canonical response is "you're an idiot", followed by pointers to 5 different bugs in the poster's code, and an explanation that the pthreads implementation is faster than his, and that its also debugged. (The response is less blunt when the poster is a customer of yours, and the email was delivered via some bushy-tailed executive). An alternative idea is to try to apply some principles from functional programming, e.g. "copy on write" (COW): when the obarray needs to be updated, make a copy of it. Any readers in other threads continue to safely use the original version. When the new obarray is fully constructed, new readers will use it. The old obarray is garbage-collected in due time. I dunno if this could be applied for this case; at least we have garbage collection, which removes one major stumbling block for using COW. --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-12 1:29 ` Linas Vepstas @ 2009-03-12 3:09 ` Clinton Ebadi 2009-03-25 22:13 ` Neil Jerram 1 sibling, 0 replies; 51+ messages in thread From: Clinton Ebadi @ 2009-03-12 3:09 UTC (permalink / raw) To: Guile Development; +Cc: linasvepstas Linas Vepstas <linasvepstas@gmail.com> writes: > An alternative idea is to try to apply some principles from > functional programming, e.g. "copy on write" (COW): when > the obarray needs to be updated, make a copy of it. Any > readers in other threads continue to safely use the > original version. When the new obarray is fully constructed, > new readers will use it. The old obarray is garbage-collected > in due time. I dunno if this could be applied for this case; > at least we have garbage collection, which removes one > major stumbling block for using COW. If the obarray were implemented as a persistent tree instead of hash table you wouldn't even need to copy it--the only operation that would need to be atomic would be committing the change--swap the pointer to the root, but only if the current root is the same as when you started modifying the obarray; otherwise merge your changes into what is presently the root if it changed in the meantime (alternatively, lock around the entire insertion/replace block). Lookup time would be increased from amortized O(1) to O(logN) and generate a bit of garbage on every insert (depth(tree) nodes), but minimizes potentially expensive locking (e.g. if multiple concurrent readers are allowed while writing to the hash table you have to handle locking everyone out when the table needs rehashing--with the persistent tree approach readers *never* block). I suppose the best way to find out would be to implement it, and I may do so as an experiment (it would require modifying some C, however, which I am not so inclined to use for a mere thought experiment). -- Lindsay (Carlton): should i eat more post its ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-12 1:29 ` Linas Vepstas 2009-03-12 3:09 ` Clinton Ebadi @ 2009-03-25 22:13 ` Neil Jerram 2009-03-25 22:34 ` Linas Vepstas 1 sibling, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-25 22:13 UTC (permalink / raw) To: linasvepstas; +Cc: Guile Development Linas Vepstas <linasvepstas@gmail.com> writes: > Well, once you identify the section that needs locking, > you'll want to use an rwlock instead of a mutex. The > rwlock (pthread_rwlock_rdlock) allows multiple > simultaneous readers. The writers, however, get > exclusive access. (pthread_rwlock_wrlock) I forgot to comment on this before... In my view the important case to optimize is the one with only a single thread - i.e. I don't want to add any significant performance penalty in that case. And for that case, IIUC, there is no difference between a mutex and an rwlock. Also I would guess that we would hit more portability issues in future with a rwlock than with a mutex. Of course, we could look at configure time for rwlock support, and fall back to a mutex - but that has a cost too, in code complexity. On balance, therefore, I don't think using a rwlock rather than a mutex is worthwhile. Regards, Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-25 22:13 ` Neil Jerram @ 2009-03-25 22:34 ` Linas Vepstas 0 siblings, 0 replies; 51+ messages in thread From: Linas Vepstas @ 2009-03-25 22:34 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/3/25 Neil Jerram <neil@ossau.uklinux.net>: > Linas Vepstas <linasvepstas@gmail.com> writes: > >> Well, once you identify the section that needs locking, >> you'll want to use an rwlock instead of a mutex. The >> rwlock (pthread_rwlock_rdlock) allows multiple >> simultaneous readers. The writers, however, get >> exclusive access. (pthread_rwlock_wrlock) > > I don't want to add any significant performance > penalty in that case. And for that case, IIUC, there is no difference > between a mutex and an rwlock. If there's no perf difference, then isn't that an argument *for* rwlock, instead of against it? > Also I would guess that we would hit more portability issues in future > with a rwlock than with a mutex. Well, since rwlock is a part of posix threads, wouldn't it be available universally (wherever posix threads are available)? Are you voicing some suspicion that its somehow buggier, or less tested? (I have no direct experience on these two perf/portability issues myself). --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-12 0:53 ` Neil Jerram 2009-03-12 1:29 ` Linas Vepstas @ 2009-03-12 22:13 ` Andy Wingo 2009-03-13 19:13 ` Neil Jerram 2009-03-14 14:23 ` Ludovic Courtès 1 sibling, 2 replies; 51+ messages in thread From: Andy Wingo @ 2009-03-12 22:13 UTC (permalink / raw) To: Neil Jerram; +Cc: Linas Vepstas, Guile Development Hey Neil, On Thu 12 Mar 2009 01:53, Neil Jerram <neil@ossau.uklinux.net> writes: > Thanks to a hint from helgrind, I think the problem might be that the > symbols obarray is not thread-safe. But surely using a mutex for > every symbol that Guile reads would be terrible for performance... Dunno, in GStreamer we found that uncontended locks are cheap cheap cheap. AFAIU they don't even cause context switches. And the reader will be less important in terms of e.g. startup time once the VM lands. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-12 22:13 ` Andy Wingo @ 2009-03-13 19:13 ` Neil Jerram 2009-03-25 23:19 ` Neil Jerram 2009-03-14 14:23 ` Ludovic Courtès 1 sibling, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-13 19:13 UTC (permalink / raw) To: Andy Wingo, Linas Vepstas; +Cc: Clinton Ebadi, Guile Development Andy Wingo <wingo@pobox.com> writes: > Hey Neil, > > On Thu 12 Mar 2009 01:53, Neil Jerram <neil@ossau.uklinux.net> writes: > >> Thanks to a hint from helgrind, I think the problem might be that the >> symbols obarray is not thread-safe. But surely using a mutex for >> every symbol that Guile reads would be terrible for performance... > > Dunno, in GStreamer we found that uncontended locks are cheap cheap > cheap. AFAIU they don't even cause context switches. And the reader will > be less important in terms of e.g. startup time once the VM lands. Hi Linas, Andy, Many thanks for your input on this. I'll go ahead with a mutex or rwlock. First thing is to see if it fixes the problem; if it does, I'll try to measure the performance impact. Neil PS. And thanks Clinton too - although I'd prefer not to have to do such a big rewrite if we can manage without it. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-13 19:13 ` Neil Jerram @ 2009-03-25 23:19 ` Neil Jerram 2009-03-26 3:40 ` Linas Vepstas ` (2 more replies) 0 siblings, 3 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-25 23:19 UTC (permalink / raw) To: Andy Wingo; +Cc: Clinton Ebadi, Linas Vepstas, Guile Development [-- Attachment #1: Type: text/plain, Size: 1774 bytes --] Neil Jerram <neil@ossau.uklinux.net> writes: > Many thanks for your input on this. I'll go ahead with a mutex or > rwlock. First thing is to see if it fixes the problem; if it does, > I'll try to measure the performance impact. Attached are 3 patches relating to thread-safe define. #1 is Linas's test-define-race test, modified by me. This test reliably fails (without the following fix) if run for 200 seconds, but I don't think we can have a test in Guile's "make check" that takes 200s. Therefore I've coded it so that - it runs for a duration specified by environment variable GUILE_TEST_DEFINE_RACE_DURATION - it defaults to just 10s if that variable isn't defined. So people who want to run this test meaningfully on their platforms can do "GUILE_TEST_DEFINE_RACE_DURATION=200 make check". #2 makes the symbols hash thread-safe, and it appears that this completely fixes the define-race problem. I don't understand why we apparently don't need patch #3 as well - but that's what my results indicate. #3 allows a non-weak hash table to be automatically thread-safe, by associating a fat mutex with it. #3 is what I tried first, applying it to the module-obarray of the (guile) and (guile-user) modules. But test-define-race still gave errors. Then I added #2, and that fixed the errors. Then I checked #2 without #3, and that fixes all the test-define-race errors too. So apparently #3 is not needed. It looks like we should commit #1 and #2 (although maybe with rwlock instead of mutex), so please send any comments you have on those. I'd also appreciate thoughts on #3 as a possible hash table enhancement (for master), and on why we don't apparently need to make the module obarray hash tables thread-safe in this way. Regards, Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-New-test-for-thread-safe-define.patch --] [-- Type: text/x-diff, Size: 6167 bytes --] From 1f2707cc9473548863b483fd12d97afe7d0c94c2 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 25 Mar 2009 20:55:37 +0000 Subject: [PATCH] New test for thread-safe define Written by Linas Vepstas; modified by Neil Jerram. * test-suite/standalone/Makefile.am: Add test-define-race test. * test-suite/standalone/test-define-race.c: New test. --- test-suite/standalone/Makefile.am | 5 + test-suite/standalone/test-define-race.c | 168 ++++++++++++++++++++++++++++++ 2 files changed, 173 insertions(+), 0 deletions(-) create mode 100644 test-suite/standalone/test-define-race.c diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am index e7cfd82..766e447 100644 --- a/test-suite/standalone/Makefile.am +++ b/test-suite/standalone/Makefile.am @@ -145,6 +145,11 @@ test_scm_with_guile_LDADD = ${top_builddir}/libguile/libguile.la check_PROGRAMS += test-scm-with-guile TESTS += test-scm-with-guile +test_define_race_CFLAGS = ${test_cflags} +test_define_race_LDADD = ${top_builddir}/libguile/libguile.la +check_PROGRAMS += test-define-race +TESTS += test-define-race + else EXTRA_DIST += test-with-guile-module.c test-scm-with-guile.c diff --git a/test-suite/standalone/test-define-race.c b/test-suite/standalone/test-define-race.c new file mode 100644 index 0000000..f375d55 --- /dev/null +++ b/test-suite/standalone/test-define-race.c @@ -0,0 +1,168 @@ +/* Copyright (C) 2009 Free Software Foundation, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* + * test-define-race.c + * + * Program to exhibit a race in the guile-1.8.x define code. + * See https://savannah.gnu.org/bugs/index.php?24867 for general + * status and description. + * + * Summary: variable definition and lookup is not thread-safe in guile; + * attempting to look up a variable while another thread is defining + * a variable can sometimes lead to the first thread loosing, and not + * seeing an existing, defined variable. Alternately, difining two + * different variables at the same time can result in one of them + * failing to be defined; on rarer occasions, a seg-fault results. + * + * Compile as: + * cc test-define-race.c -lpthread -lguile + * + * May need to run several times to see the bug(s). + * + * Linas Vepstas <linasvepstas@gmail.com> December 2008 + */ + +#include <libguile.h> +#include <pthread.h> +#include <stdio.h> +#include <stdlib.h> + +typedef struct +{ + int id; + int count; + int do_exit; + int error_count; + int last_ok; + +} state; + +static void * guile_mode_definer(void * ud) +{ + SCM result; + int val; + + char buff[1000]; + state * s = (state *) ud; + + /* Increment before evaluation, in case evaluation raises an + error. */ + s->count ++; + + if (s->last_ok) + { + /* See if the previous definition holds the expected value. If + the expected value is undefined, scm_c_eval_string will raise + an error. */ + s->error_count ++; + s->last_ok = 0; + sprintf (buff, "x%d-%d\n", s->id, s->count - 1); + result = scm_c_eval_string (buff); + val = scm_to_int(result); + + if (val != s->count - 1) + printf ("Define mismatch on thread %d\n", s->id); + else + s->error_count --; + } + + /* Define a new variable with a new value. */ + sprintf (buff, "(define x%d-%d %d)\n", s->id, s->count, s->count); + scm_c_eval_string (buff); + + /* If we reach here, the definition was apparently successful, so we + can check it on the next iteration. */ + s->last_ok = 1; + + return NULL; +} + +static void * definer (void *ud) +{ + int i; + state * s = (state *) ud; + + while(!s->do_exit) + for (i=0; i<4000; i++) + { + scm_with_guile (guile_mode_definer, ud); + sched_yield(); /* try to get the threads to inter-leave a lot */ + } + return NULL; +} + +static void init_ctr(state *s, int val) +{ + s->id = val; + s->count = 0; + s->do_exit = 0; + s->error_count = 0; + s->last_ok = 0; +} + +static void * setup(void * ud) +{ + int *duration = (int *)ud; + + /* Query an environment variable to find out how long to run this + test for, defaulting to 10s. */ + *duration = scm_to_int (scm_c_eval_string ("(catch #t " + "(lambda () " + " (round (string->number (string-append \"#e\" " + "(or (getenv \"GUILE_TEST_DEFINE_RACE_DURATION\") \"10\"))))) " + "(lambda _ " + " (write _) (newline) 10))")); + + return NULL; +} + +int main(int argc, char ** argv) +{ + pthread_t th1, th2, th3, th4; + state counter1, counter2, counter3, counter4; + int error_total; + int duration; + + scm_with_guile (setup, &duration); + + init_ctr (&counter1, 1); + init_ctr (&counter2, 2); + init_ctr (&counter3, 3); + init_ctr (&counter4, 4); + + pthread_create(&th1, NULL, definer, (void *) &counter1); + pthread_create(&th2, NULL, definer, (void *) &counter2); + pthread_create(&th3, NULL, definer, (void *) &counter3); + pthread_create(&th4, NULL, definer, (void *) &counter4); + + sleep(duration); + counter1.do_exit = 1; + counter2.do_exit = 1; + counter3.do_exit = 1; + counter4.do_exit = 1; + + pthread_join(th1, NULL); + pthread_join(th2, NULL); + pthread_join(th3, NULL); + pthread_join(th4, NULL); + + error_total = (counter1.error_count + counter2.error_count + + counter3.error_count + counter4.error_count); + printf("test-define-race: %d error(s) in %ds\n", error_total, duration); + exit (error_total); +} -- 1.5.6.5 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-Make-the-interned-symbols-hash-thread-safe.patch --] [-- Type: text/x-diff, Size: 13228 bytes --] From 7b6b81cfef3705d20715b978b51004a8c4a6ddf6 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 25 Mar 2009 22:34:23 +0000 Subject: [PATCH] Make the interned symbols hash thread-safe These changes introduce a mutex that protects all accesses to and modifications of the symbols hash. They allow the test-define-race test to run without errors for a long time. (Well, for 200s anyway.) * libguile/environments.c (obarray_enter, obarray_replace): Extra parameter on scm_i_rehash calls. * libguile/hashtab.c: Include symbols.h. (scm_i_rehash): New mutex parameter. When non-NULL, unlock and relock this mutex around allocations, and recheck new size calculation (in case another thread has changed it). (rehash_after_gc): Special treatment for the symbol hash. (scm_hash_fn_create_handle_x, scm_hash_fn_remove_x): Extra parameter on scm_i_rehash calls. * libguile/hashtab.h (scm_i_rehash): New mutex parameter. * libguile/symbols.c (symbols): Rename scm_i_symbols and make non-static. (symbols_mutex): New variable. (scm_sys_symbols): symbols -> scm_i_symbols. (lookup_interned_symbol): Lock symbols_mutex while performing lookup. (intern_symbol): Add name, len and raw_hash parameters. Lock symbols_mutex while checking and modifying the symbols hash. Once the mutex is locked, recheck in case another thread has already interned the symbol. Return interned symbol, in order to tell callers about this last case. (scm_i_c_mem2symbol, scm_i_mem2symbol, scm_take_locale_symboln): Update intern_symbol calls. (scm_i_rehash_symbols_after_gc): New function. (scm_symbols_prehistory): symbols -> scm_i_symbols. Initialize symbols_mutex. * libguile/symbols.h (scm_i_symbols, scm_i_rehash_symbols_after_gc): New declarations. --- libguile/environments.c | 4 +- libguile/hashtab.c | 53 +++++++++++++++++++-------- libguile/hashtab.h | 3 +- libguile/symbols.c | 92 +++++++++++++++++++++++++++++++++++++---------- libguile/symbols.h | 2 + 5 files changed, 117 insertions(+), 37 deletions(-) diff --git a/libguile/environments.c b/libguile/environments.c index 13d63c0..077a397 100644 --- a/libguile/environments.c +++ b/libguile/environments.c @@ -516,7 +516,7 @@ obarray_enter (SCM obarray, SCM symbol, SCM data) SCM_SET_HASHTABLE_BUCKET (obarray, hash, slot); SCM_HASHTABLE_INCREMENT (obarray); if (SCM_HASHTABLE_N_ITEMS (obarray) > SCM_HASHTABLE_UPPER (obarray)) - scm_i_rehash (obarray, scm_i_hash_symbol, 0, "obarray_enter"); + scm_i_rehash (obarray, scm_i_hash_symbol, 0, "obarray_enter", NULL); return entry; } @@ -550,7 +550,7 @@ obarray_replace (SCM obarray, SCM symbol, SCM data) SCM_SET_HASHTABLE_BUCKET (obarray, hash, slot); SCM_HASHTABLE_INCREMENT (obarray); if (SCM_HASHTABLE_N_ITEMS (obarray) > SCM_HASHTABLE_UPPER (obarray)) - scm_i_rehash (obarray, scm_i_hash_symbol, 0, "obarray_replace"); + scm_i_rehash (obarray, scm_i_hash_symbol, 0, "obarray_replace", NULL); return SCM_BOOL_F; } diff --git a/libguile/hashtab.c b/libguile/hashtab.c index ea7fc69..0292012 100644 --- a/libguile/hashtab.c +++ b/libguile/hashtab.c @@ -30,6 +30,7 @@ #include "libguile/root.h" #include "libguile/vectors.h" #include "libguile/ports.h" +#include "libguile/symbols.h" #include "libguile/validate.h" #include "libguile/hashtab.h" @@ -117,13 +118,15 @@ void scm_i_rehash (SCM table, unsigned long (*hash_fn)(), void *closure, - const char* func_name) + const char* func_name, + scm_i_pthread_mutex_t *mutex) { - SCM buckets, new_buckets; - int i; + SCM buckets, new_buckets = SCM_BOOL_F; + int i = 0; unsigned long old_size; unsigned long new_size; + restart: if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table)) { /* rehashing is not triggered when i <= min_size */ @@ -133,7 +136,7 @@ scm_i_rehash (SCM table, while (i > SCM_HASHTABLE (table)->min_size_index && SCM_HASHTABLE_N_ITEMS (table) < hashtable_size[i] / 4); } - else + else if (SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table)) { i = SCM_HASHTABLE (table)->size_index + 1; if (i >= HASHTABLE_SIZE_N) @@ -157,12 +160,28 @@ scm_i_rehash (SCM table, SCM_HASHTABLE (table)->upper = 9 * new_size / 10; buckets = SCM_HASHTABLE_VECTOR (table); - if (SCM_HASHTABLE_WEAK_P (table)) - new_buckets = scm_i_allocate_weak_vector (SCM_HASHTABLE_FLAGS (table), - scm_from_ulong (new_size), - SCM_EOL); - else - new_buckets = scm_c_make_vector (new_size, SCM_EOL); + if (scm_is_false (new_buckets) || + (SCM_SIMPLE_VECTOR_LENGTH (new_buckets) != new_size)) + { + /* Need to allocate or reallocate the new_buckets vector. */ + if (mutex) + scm_i_pthread_mutex_unlock (mutex); + + if (SCM_HASHTABLE_WEAK_P (table)) + new_buckets = scm_i_allocate_weak_vector (SCM_HASHTABLE_FLAGS (table), + scm_from_ulong (new_size), + SCM_EOL); + else + new_buckets = scm_c_make_vector (new_size, SCM_EOL); + + if (mutex) + scm_i_pthread_mutex_lock (mutex); + + /* Another thread could have messed with the hashtable while the + mutex was unlocked. So we now have to recalculate the rehash + size from scratch. */ + goto restart; + } /* When this is a weak hashtable, running the GC might change it. We need to cope with this while rehashing its elements. We do @@ -271,11 +290,15 @@ rehash_after_gc (void *dummy1 SCM_UNUSED, h = first; do { - /* Rehash only when we have a hash_fn. + /* Special treatment for the symbol hash, as it is protected + by a mutex. */ + if (scm_is_eq (h, scm_i_symbols)) + scm_i_rehash_symbols_after_gc (); + /* Otherwise, rehash only when we have a hash_fn. */ - if (SCM_HASHTABLE (h)->hash_fn) + else if (SCM_HASHTABLE (h)->hash_fn) scm_i_rehash (h, SCM_HASHTABLE (h)->hash_fn, NULL, - "rehash_after_gc"); + "rehash_after_gc", NULL); last = h; h = SCM_HASHTABLE_NEXT (h); } while (!scm_is_null (h)); @@ -490,7 +513,7 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init, unsigned long (*hash_ SCM_HASHTABLE_INCREMENT (table); if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table) || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table)) - scm_i_rehash (table, hash_fn, closure, FUNC_NAME); + scm_i_rehash (table, hash_fn, closure, FUNC_NAME, NULL); } return SCM_CAR (new_bucket); } @@ -556,7 +579,7 @@ scm_hash_fn_remove_x (SCM table, SCM obj, { SCM_HASHTABLE_DECREMENT (table); if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table)) - scm_i_rehash (table, hash_fn, closure, "scm_hash_fn_remove_x"); + scm_i_rehash (table, hash_fn, closure, "scm_hash_fn_remove_x", NULL); } } return h; diff --git a/libguile/hashtab.h b/libguile/hashtab.h index 1017354..3eca04c 100644 --- a/libguile/hashtab.h +++ b/libguile/hashtab.h @@ -96,7 +96,8 @@ SCM_API SCM scm_weak_key_hash_table_p (SCM h); SCM_API SCM scm_weak_value_hash_table_p (SCM h); SCM_API SCM scm_doubly_weak_hash_table_p (SCM h); -SCM_API void scm_i_rehash (SCM table, unsigned long (*hash_fn)(), void *closure, const char*func_name); +SCM_API void scm_i_rehash (SCM table, unsigned long (*hash_fn)(), + void *closure, const char *func_name, scm_i_pthread_mutex_t *mutex); SCM_API void scm_i_scan_weak_hashtables (void); SCM_API SCM scm_hash_fn_get_handle (SCM table, SCM obj, unsigned long (*hash_fn) (), SCM (*assoc_fn) (), void * closure); diff --git a/libguile/symbols.c b/libguile/symbols.c index b1329fa..a233fe3 100644 --- a/libguile/symbols.c +++ b/libguile/symbols.c @@ -46,7 +46,8 @@ \f -static SCM symbols; +SCM scm_i_symbols; +static scm_i_pthread_mutex_t symbols_mutex; #ifdef GUILE_DEBUG SCM_DEFINE (scm_sys_symbols, "%symbols", 0, 0, 0, @@ -54,7 +55,7 @@ SCM_DEFINE (scm_sys_symbols, "%symbols", 0, 0, 0, "Return the system symbol obarray.") #define FUNC_NAME s_scm_sys_symbols { - return symbols; + return scm_i_symbols; } #undef FUNC_NAME #endif @@ -90,9 +91,13 @@ lookup_interned_symbol (const char *name, size_t len, { /* Try to find the symbol in the symbols table */ SCM l; - unsigned long hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (symbols); + unsigned long hash; + + scm_i_pthread_mutex_lock (&symbols_mutex); - for (l = SCM_HASHTABLE_BUCKET (symbols, hash); + hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (scm_i_symbols); + + for (l = SCM_HASHTABLE_BUCKET (scm_i_symbols, hash); !scm_is_null (l); l = SCM_CDR (l)) { @@ -110,31 +115,69 @@ lookup_interned_symbol (const char *name, size_t len, goto next_symbol; } + scm_i_pthread_mutex_unlock (&symbols_mutex); return sym; } next_symbol: ; } + scm_i_pthread_mutex_unlock (&symbols_mutex); return SCM_BOOL_F; } /* Intern SYMBOL, an uninterned symbol. */ -static void -intern_symbol (SCM symbol) +static SCM +intern_symbol (SCM symbol, const char *name, size_t len, size_t raw_hash) { - SCM slot, cell; + SCM slot, new_bucket, l; unsigned long hash; - hash = scm_i_symbol_hash (symbol) % SCM_HASHTABLE_N_BUCKETS (symbols); - slot = SCM_HASHTABLE_BUCKET (symbols, hash); - cell = scm_cons (symbol, SCM_UNDEFINED); + /* Allocate new cell and bucket before locking the mutex. */ + new_bucket = scm_acons (symbol, SCM_UNDEFINED, SCM_BOOL_F); + + scm_i_pthread_mutex_lock (&symbols_mutex); + + hash = raw_hash % SCM_HASHTABLE_N_BUCKETS (scm_i_symbols); + slot = SCM_HASHTABLE_BUCKET (scm_i_symbols, hash); + + /* We have to check again here that the symbol isn't already hashed; + another thread could have interned it inbetween when we tried to + look up the symbol and the scm_i_pthread_mutex_lock call + above. */ + for (l = slot; !scm_is_null (l); l = SCM_CDR (l)) + { + SCM sym = SCM_CAAR (l); + if ((scm_i_symbol_hash (sym) == raw_hash) && + (scm_i_symbol_length (sym) == len)) + { + const char *chrs = scm_i_symbol_chars (sym); + size_t i = len; + + while (i != 0) + { + --i; + if (name[i] != chrs[i]) + goto next_symbol; + } + + scm_i_pthread_mutex_unlock (&symbols_mutex); + return sym; + } + next_symbol: + ; + } + + SCM_SETCDR (new_bucket, slot); + SCM_SET_HASHTABLE_BUCKET (scm_i_symbols, hash, new_bucket); + SCM_HASHTABLE_INCREMENT (scm_i_symbols); + if (SCM_HASHTABLE_N_ITEMS (scm_i_symbols) > SCM_HASHTABLE_UPPER (scm_i_symbols)) + scm_i_rehash (scm_i_symbols, scm_i_hash_symbol, 0, "intern_symbol", + &symbols_mutex); - SCM_SET_HASHTABLE_BUCKET (symbols, hash, scm_cons (cell, slot)); - SCM_HASHTABLE_INCREMENT (symbols); + scm_i_pthread_mutex_unlock (&symbols_mutex); - if (SCM_HASHTABLE_N_ITEMS (symbols) > SCM_HASHTABLE_UPPER (symbols)) - scm_i_rehash (symbols, scm_i_hash_symbol, 0, "intern_symbol"); + return symbol; } static SCM @@ -149,7 +192,7 @@ scm_i_c_mem2symbol (const char *name, size_t len) /* The symbol was not found, create it. */ symbol = scm_i_c_make_symbol (name, len, 0, raw_hash, scm_cons (SCM_BOOL_F, SCM_EOL)); - intern_symbol (symbol); + symbol = intern_symbol (symbol, name, len, raw_hash); } return symbol; @@ -169,12 +212,22 @@ scm_i_mem2symbol (SCM str) /* The symbol was not found, create it. */ symbol = scm_i_make_symbol (str, 0, raw_hash, scm_cons (SCM_BOOL_F, SCM_EOL)); - intern_symbol (symbol); + symbol = intern_symbol (symbol, name, len, raw_hash); } return symbol; } +void +scm_i_rehash_symbols_after_gc () +{ + scm_i_pthread_mutex_lock (&symbols_mutex); + + scm_i_rehash (scm_i_symbols, scm_i_hash_symbol, 0, "rehash_after_gc", + &symbols_mutex); + + scm_i_pthread_mutex_unlock (&symbols_mutex); +} static SCM scm_i_mem2uninterned_symbol (SCM str) @@ -417,7 +470,7 @@ scm_take_locale_symboln (char *sym, size_t len) { res = scm_i_c_take_symbol (sym, len, 0, raw_hash, scm_cons (SCM_BOOL_F, SCM_EOL)); - intern_symbol (res); + res = intern_symbol (res, sym, len, raw_hash); } else free (sym); @@ -434,8 +487,9 @@ scm_take_locale_symbol (char *sym) void scm_symbols_prehistory () { - symbols = scm_make_weak_key_hash_table (scm_from_int (2139)); - scm_permanent_object (symbols); + scm_i_symbols = scm_make_weak_key_hash_table (scm_from_int (2139)); + scm_permanent_object (scm_i_symbols); + scm_i_pthread_mutex_init (&symbols_mutex, NULL); } diff --git a/libguile/symbols.h b/libguile/symbols.h index f70d655..30c3005 100644 --- a/libguile/symbols.h +++ b/libguile/symbols.h @@ -34,6 +34,7 @@ #define SCM_I_F_SYMBOL_UNINTERNED 0x100 \f +extern SCM scm_i_symbols; #ifdef GUILE_DEBUG SCM_API SCM scm_sys_symbols (void); @@ -63,6 +64,7 @@ SCM_API SCM scm_take_locale_symboln (char *sym, size_t len); SCM_API unsigned long scm_i_hash_symbol (SCM obj, unsigned long n, void *closure); +void scm_i_rehash_symbols_after_gc (void); SCM_API void scm_symbols_prehistory (void); SCM_API void scm_init_symbols (void); -- 1.5.6.5 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #4: 0003-Allow-non-weak-hash-tables-to-be-automatically-threa.patch --] [-- Type: text/x-diff, Size: 9989 bytes --] From 7627abf6ecf1770367ea2c5f8d61e14fa4843c8a Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 25 Mar 2009 22:50:46 +0000 Subject: [PATCH] Allow non-weak hash tables to be automatically thread-safe This patch allows a fat mutex to be associated with a hash table. When a hash table has an associated mutex, that mutex will be automatically locked and unlocked around all operations (include lookups) on the hash table's internal data. * libguile/hashtab.c (make_hash_table): Init mutex field to #f. (hashtable_mark): New function. (scm_hash_fn_get_handle, scm_hash_fn_create_handle_x, scm_hash_fn_remove_x, scm_hash_clear_x, scm_internal_hash_fold, scm_internal_hash_for_each_handle): If the hash table has a mutex, lock and unlock it around all accessing of the hash table's internals. (scm_hash_use_mutex_x): New function. (scm_hashtab_prehistory): Use hashtable_mark as hash table mark function. * libguile/hashtab.h (SCM_HASHTABLE_MUTEX, SCM_SET_HASHTABLE_MUTEX, scm_hash_use_mutex_x): New declarations. (scm_t_hashtable): New mutex field. --- libguile/hashtab.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++------ libguile/hashtab.h | 4 ++ 2 files changed, 119 insertions(+), 15 deletions(-) diff --git a/libguile/hashtab.c b/libguile/hashtab.c index 0292012..7621314 100644 --- a/libguile/hashtab.c +++ b/libguile/hashtab.c @@ -104,6 +104,7 @@ make_hash_table (int flags, unsigned long k, const char *func_name) t->upper = 9 * n / 10; t->flags = flags; t->hash_fn = NULL; + t->mutex = SCM_BOOL_F; if (flags) { SCM_NEWSMOB3 (table, scm_tc16_hashtable, vector, t, weak_hashtables); @@ -218,6 +219,13 @@ scm_i_rehash (SCM table, } +static SCM +hashtable_mark (SCM table) +{ + scm_gc_mark (SCM_HASHTABLE_MUTEX (table)); + return SCM_CELL_OBJECT_1 (table); +} + static int hashtable_print (SCM exp, SCM port, scm_print_state *pstate SCM_UNUSED) { @@ -440,18 +448,34 @@ scm_hash_fn_get_handle (SCM table, SCM obj, unsigned long (*hash_fn)(), SCM (*as #define FUNC_NAME "scm_hash_fn_get_handle" { unsigned long k; - SCM h; + SCM h, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - table = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + table = SCM_HASHTABLE_VECTOR (table); + } else SCM_VALIDATE_VECTOR (1, table); if (SCM_SIMPLE_VECTOR_LENGTH (table) == 0) - return SCM_BOOL_F; + { + h = SCM_BOOL_F; + goto end; + } k = hash_fn (obj, SCM_SIMPLE_VECTOR_LENGTH (table), closure); if (k >= SCM_SIMPLE_VECTOR_LENGTH (table)) scm_out_of_range ("hash_fn_get_handle", scm_from_ulong (k)); h = assoc_fn (obj, SCM_SIMPLE_VECTOR_REF (table, k), closure); + + end: + if (!scm_is_false (mutex)) + scm_dynwind_end (); + return h; } #undef FUNC_NAME @@ -463,10 +487,18 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init, unsigned long (*hash_ #define FUNC_NAME "scm_hash_fn_create_handle_x" { unsigned long k; - SCM buckets, it; + SCM buckets, it, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else { SCM_ASSERT (scm_is_simple_vector (table), @@ -481,7 +513,7 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init, unsigned long (*hash_ scm_out_of_range ("hash_fn_create_handle_x", scm_from_ulong (k)); it = assoc_fn (obj, SCM_SIMPLE_VECTOR_REF (buckets, k), closure); if (scm_is_pair (it)) - return it; + ; else if (scm_is_true (it)) scm_wrong_type_arg_msg (NULL, 0, it, "a pair"); else @@ -515,8 +547,13 @@ scm_hash_fn_create_handle_x (SCM table, SCM obj, SCM init, unsigned long (*hash_ || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table)) scm_i_rehash (table, hash_fn, closure, FUNC_NAME, NULL); } - return SCM_CAR (new_bucket); + it = SCM_CAR (new_bucket); } + + if (!scm_is_false (mutex)) + scm_dynwind_end (); + + return it; } #undef FUNC_NAME @@ -554,10 +591,18 @@ scm_hash_fn_remove_x (SCM table, SCM obj, void *closure) { unsigned long k; - SCM buckets, h; + SCM buckets, h, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else { SCM_ASSERT (scm_is_simple_vector (table), table, @@ -565,7 +610,10 @@ scm_hash_fn_remove_x (SCM table, SCM obj, buckets = table; } if (SCM_SIMPLE_VECTOR_LENGTH (table) == 0) - return SCM_EOL; + { + h = SCM_EOL; + goto end; + } k = hash_fn (obj, SCM_SIMPLE_VECTOR_LENGTH (buckets), closure); if (k >= SCM_SIMPLE_VECTOR_LENGTH (buckets)) @@ -582,6 +630,9 @@ scm_hash_fn_remove_x (SCM table, SCM obj, scm_i_rehash (table, hash_fn, closure, "scm_hash_fn_remove_x", NULL); } } + end: + if (!scm_is_false (mutex)) + scm_dynwind_end (); return h; } @@ -592,8 +643,16 @@ SCM_DEFINE (scm_hash_clear_x, "hash-clear!", 1, 0, 0, { if (SCM_HASHTABLE_P (table)) { + SCM mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } scm_vector_fill_x (SCM_HASHTABLE_VECTOR (table), SCM_EOL); SCM_SET_HASHTABLE_N_ITEMS (table, 0); + if (!scm_is_false (mutex)) + scm_dynwind_end (); } else scm_vector_fill_x (table, SCM_EOL); @@ -940,10 +999,18 @@ SCM scm_internal_hash_fold (SCM (*fn) (), void *closure, SCM init, SCM table) { long i, n; - SCM buckets, result = init; + SCM buckets, result = init, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else buckets = table; @@ -963,6 +1030,9 @@ scm_internal_hash_fold (SCM (*fn) (), void *closure, SCM init, SCM table) } } + if (!scm_is_false (mutex)) + scm_dynwind_end (); + return result; } @@ -978,10 +1048,18 @@ void scm_internal_hash_for_each_handle (SCM (*fn) (), void *closure, SCM table) { long i, n; - SCM buckets; + SCM buckets, mutex = SCM_BOOL_F; if (SCM_HASHTABLE_P (table)) - buckets = SCM_HASHTABLE_VECTOR (table); + { + mutex = SCM_HASHTABLE_MUTEX (table); + if (!scm_is_false (mutex)) + { + scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE); + scm_dynwind_lock_mutex (mutex); + } + buckets = SCM_HASHTABLE_VECTOR (table); + } else buckets = table; @@ -1000,6 +1078,9 @@ scm_internal_hash_for_each_handle (SCM (*fn) (), void *closure, SCM table) ls = SCM_CDR (ls); } } + + if (!scm_is_false (mutex)) + scm_dynwind_end (); } SCM_DEFINE (scm_hash_fold, "hash-fold", 3, 0, 0, @@ -1088,6 +1169,25 @@ SCM_DEFINE (scm_hash_map_to_list, "hash-map->list", 2, 0, 0, } #undef FUNC_NAME +SCM_DEFINE (scm_hash_use_mutex_x, "hash-use-mutex!", 2, 0, 0, + (SCM table, SCM mutex), + "Use @var{mutex} to serialize operations on @var{table} from multiple threads.") +#define FUNC_NAME s_scm_hash_use_mutex_x +{ + /* Must be a real (i.e. not a vector) and non-weak hash table. */ + SCM_VALIDATE_HASHTABLE (1, table); + if (SCM_HASHTABLE_WEAK_P (table)) + SCM_MISC_ERROR ("can't use mutex with a weak hash table", SCM_EOL); + + if (!scm_is_false (mutex)) + SCM_VALIDATE_MUTEX (2, mutex); + + SCM_SET_HASHTABLE_MUTEX (table, mutex); + + return SCM_UNSPECIFIED; +} +#undef FUNC_NAME + \f @@ -1095,7 +1195,7 @@ void scm_hashtab_prehistory () { scm_tc16_hashtable = scm_make_smob_type (s_hashtable, 0); - scm_set_smob_mark (scm_tc16_hashtable, scm_markcdr); + scm_set_smob_mark (scm_tc16_hashtable, hashtable_mark); scm_set_smob_print (scm_tc16_hashtable, hashtable_print); scm_set_smob_free (scm_tc16_hashtable, hashtable_free); scm_c_hook_add (&scm_after_gc_c_hook, rehash_after_gc, 0, 0); diff --git a/libguile/hashtab.h b/libguile/hashtab.h index 3eca04c..79cb30c 100644 --- a/libguile/hashtab.h +++ b/libguile/hashtab.h @@ -58,6 +58,8 @@ SCM_API scm_t_bits scm_tc16_hashtable; #define SCM_HASHTABLE_DECREMENT(x) (SCM_HASHTABLE_N_ITEMS(x)--) #define SCM_HASHTABLE_UPPER(x) (SCM_HASHTABLE (x)->upper) #define SCM_HASHTABLE_LOWER(x) (SCM_HASHTABLE (x)->lower) +#define SCM_HASHTABLE_MUTEX(x) (SCM_HASHTABLE (x)->mutex) +#define SCM_SET_HASHTABLE_MUTEX(x, m) (SCM_HASHTABLE (x)->mutex = m) #define SCM_HASHTABLE_N_BUCKETS(h) \ SCM_SIMPLE_VECTOR_LENGTH (SCM_HASHTABLE_VECTOR (h)) @@ -74,6 +76,7 @@ typedef struct scm_t_hashtable { int size_index; /* index into hashtable_size */ int min_size_index; /* minimum size_index */ unsigned long (*hash_fn) (); /* for rehashing after a GC. */ + SCM mutex; /* mutex for thread safety */ } scm_t_hashtable; \f @@ -133,6 +136,7 @@ SCM_API SCM scm_hash_fold (SCM proc, SCM init, SCM hash); SCM_API SCM scm_hash_for_each (SCM proc, SCM hash); SCM_API SCM scm_hash_for_each_handle (SCM proc, SCM hash); SCM_API SCM scm_hash_map_to_list (SCM proc, SCM hash); +SCM_API SCM scm_hash_use_mutex_x (SCM hash, SCM mutex); SCM_API void scm_hashtab_prehistory (void); SCM_API void scm_init_hashtab (void); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-25 23:19 ` Neil Jerram @ 2009-03-26 3:40 ` Linas Vepstas 2009-03-26 8:02 ` Neil Jerram 2009-03-26 9:10 ` Ludovic Courtès 2009-03-26 22:51 ` Neil Jerram 2 siblings, 1 reply; 51+ messages in thread From: Linas Vepstas @ 2009-03-26 3:40 UTC (permalink / raw) To: Neil Jerram; +Cc: Andy Wingo, Clinton Ebadi, Guile Development 2009/3/25 Neil Jerram <neil@ossau.uklinux.net>: > #2 makes the symbols hash thread-safe, and it appears that this > completely fixes the define-race problem. I reviewed patch2 as best I could, but I'm cross-eyed cause its late at night, and I don't know guile internals well. So I'm not sure my review means much. However, isn't the following a mem leak: +static SCM +intern_symbol (SCM symbol, const char *name, size_t len, size_t raw_hash) + /* Allocate new cell and bucket before locking the mutex. */ + new_bucket = scm_acons (symbol, SCM_UNDEFINED, SCM_BOOL_F); + for (l = slot; !scm_is_null (l); l = SCM_CDR (l)) ... + + scm_i_pthread_mutex_unlock (&symbols_mutex); + return sym; it looks to me like new_bucket is never released. I assume that the scm_acons did a malloc, but maybe not I'm to tired to say anything meaningful about the rest ... --linas p.s. many thanks for chasing this stuff down and fixing it ! It is all a very pleasant surprise! ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-26 3:40 ` Linas Vepstas @ 2009-03-26 8:02 ` Neil Jerram 2009-03-26 18:39 ` Linas Vepstas 0 siblings, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-26 8:02 UTC (permalink / raw) To: linasvepstas; +Cc: Andy Wingo, Clinton Ebadi, Guile Development Linas Vepstas <linasvepstas@gmail.com> writes: > I reviewed patch2 as best I could, Thanks! > it looks to me like new_bucket is never released. > I assume that the scm_acons did a malloc, but > maybe not Yes, but the garbage collection will find and free it, so it's OK. Regards, Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-26 8:02 ` Neil Jerram @ 2009-03-26 18:39 ` Linas Vepstas 0 siblings, 0 replies; 51+ messages in thread From: Linas Vepstas @ 2009-03-26 18:39 UTC (permalink / raw) To: Neil Jerram; +Cc: Andy Wingo, Clinton Ebadi, Guile Development 2009/3/26 Neil Jerram <neil@ossau.uklinux.net>: > Linas Vepstas <linasvepstas@gmail.com> writes: > >> I reviewed patch2 as best I could, > > Thanks! > >> it looks to me like new_bucket is never released. >> I assume that the scm_acons did a malloc, but >> maybe not > > Yes, but the garbage collection will find and free it, so it's OK. Duhh. Clearly, my brain inhabits the non-GC world. --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-25 23:19 ` Neil Jerram 2009-03-26 3:40 ` Linas Vepstas @ 2009-03-26 9:10 ` Ludovic Courtès 2009-03-26 22:01 ` Neil Jerram 2009-03-26 22:51 ` Neil Jerram 2 siblings, 1 reply; 51+ messages in thread From: Ludovic Courtès @ 2009-03-26 9:10 UTC (permalink / raw) To: guile-devel Hi Neil, Neil Jerram <neil@ossau.uklinux.net> writes: > - it runs for a duration specified by environment variable > GUILE_TEST_DEFINE_RACE_DURATION > > - it defaults to just 10s if that variable isn't defined. Even 10s is going to be slightly annoying. Another option would be to create tens of guile-definer threads instead of just 4; it might trigger the problem earlier. > #2 makes the symbols hash thread-safe, and it appears that this > completely fixes the define-race problem. I don't understand why we > apparently don't need patch #3 as well - but that's what my results > indicate. Roughly #2 is a specialized version of #3 as it addresses only the case of the symbol hash table. So it should be enough for this very problem. #3 is probably the way to go in the longer run, but it changes the API/ABI so it can't go in 1.8. I'll skip #3 for now. > * test-suite/standalone/test-define-race.c: New test. Could you run `indent' on this file? > +TESTS += test-define-race It needs to be conditionalized on `--with-threads'. > scm_i_rehash (SCM table, > unsigned long (*hash_fn)(), > void *closure, > - const char* func_name) > + const char* func_name, > + scm_i_pthread_mutex_t *mutex) This function assumes that MUTEX is locked. I would either write it prominently in a comment above or change the semantics so that MUTEX is acquired in `scm_i_rehash ()', not in the caller. The latter might be achieved by integrating tests about whether TABLE needs to be rehashed into `scm_i_rehash ()' itself. I.e., this: if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table) || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table)) scm_i_rehash (...); would become: scm_i_rehash (...); I haven't reviewed all uses to see whether it's actually possible and whether it would lead to race conditions, though. > -static SCM symbols; > +SCM scm_i_symbols; I don't think this is needed, is it? > +static SCM > +intern_symbol (SCM symbol, const char *name, size_t len, size_t raw_hash) Since all users of `intern_symbol ()' perform a `lookup_interned_symbol ()' right before, I'd rather change users so that they lock before `lookup' and unlock after `intern': lock (&symbols_mutex); sym = lookup_interned_symbol (name); if (scm_is_false (sym)) { sym = scm_i_c_make_symbol (...); intern_symbol (sym); } unlock (&symbols_mutex); Is it doable? Thanks, Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-26 9:10 ` Ludovic Courtès @ 2009-03-26 22:01 ` Neil Jerram 2009-03-26 23:12 ` Ludovic Courtès 0 siblings, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-26 22:01 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel ludo@gnu.org (Ludovic Courtès) writes: > Hi Neil, Hi Ludovic, Many thanks for your detailed comments! > Neil Jerram <neil@ossau.uklinux.net> writes: > >> - it runs for a duration specified by environment variable >> GUILE_TEST_DEFINE_RACE_DURATION >> >> - it defaults to just 10s if that variable isn't defined. > > Even 10s is going to be slightly annoying. Another option would be to > create tens of guile-definer threads instead of just 4; it might trigger > the problem earlier. That's an interesting idea, but to be honest I'd prefer to finish this work and move onto something else. Would it be OK just to reduce the default runtime to 2 seconds? (It wouldn't make any practical difference, because 10 seconds isn't normally enough to see a define race anyway. It'll just be enough to catch any obvious bitrotting, and people who want to run a real test can use GUILE_TEST_DEFINE_RACE_DURATION.) >> #2 makes the symbols hash thread-safe, and it appears that this >> completely fixes the define-race problem. I don't understand why we >> apparently don't need patch #3 as well - but that's what my results >> indicate. > > Roughly #2 is a specialized version of #3 as it addresses only the case > of the symbol hash table. So it should be enough for this very problem. Actually #2 and #3 are a bit different (and probably I should have explained more about this in my previous email). The key points about #2 are that it uses a straight pthreads mutex to protect the symbol hash, and that the symbol hash is a weak hash table. Using a pthreads mutex means that we have to take care not to do any allocations (or anything else that could block) while the mutex is locked. The weakness means that as well as the obvious lookup and intern operations, we have to allow for the hash table being accessed and resized from an after-GC hook. #3, on the other hand, uses a fat mutex, and can be applied to any non-weak hash table. Using a fat mutex means that it's fine to allocate or block while the mutex is locked, and means that the code can use scm_dynwind_lock_mutex, which is quite neat. On the other hand, it would cause a problem if the hash table concerned could be accessed during GC, because - if the GC didn't try to lock the fat mutex itself, it would be modifying the hash table under the feet of the thread that has the mutex locked - if the GC does try to lock the fat mutex, it won't be able to and so will block... deadlock! Hence the restriction of #3 to non-weak hash tables. > #3 is probably the way to go in the longer run, but it changes the > API/ABI so it can't go in 1.8. If #3 was needed, I wouldn't see any problem with its API/ABI changes. The API change is addition of one new primitive, and the ABI change is to a structure that isn't intended for user code to access. Am I missing something? On the other hand, if it isn't needed - as appears to be true - there's no case for adding it to 1.8. >> * test-suite/standalone/test-define-race.c: New test. > > Could you run `indent' on this file? Sure, will do. >> +TESTS += test-define-race > > It needs to be conditionalized on `--with-threads'. It's already inside an "if BUILD_PTHREAD_SUPPORT". Do I need to do more than that? >> scm_i_rehash (SCM table, >> unsigned long (*hash_fn)(), >> void *closure, >> - const char* func_name) >> + const char* func_name, >> + scm_i_pthread_mutex_t *mutex) > > This function assumes that MUTEX is locked. I would either write it > prominently in a comment above or change the semantics so that MUTEX is > acquired in `scm_i_rehash ()', not in the caller. I think I prefer the prominent comment. My first thought was for scm_i_rehash () to acquire the mutex. I changed to the current code because then all of the lock..unlock pairs are in symbols.c - which seemed more consistent to me. > The latter might be achieved by integrating tests about whether TABLE > needs to be rehashed into `scm_i_rehash ()' itself. I.e., this: > > if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table) > || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table)) > scm_i_rehash (...); > > would become: > > scm_i_rehash (...); > > I haven't reviewed all uses to see whether it's actually possible and > whether it would lead to race conditions, though. I think that could work, but it would add another lock/unlock to the mainline. (We need to hold the lock even just to evaluate the rehashing condition.) The change doesn't feel compelling enough to me to justify that. >> -static SCM symbols; >> +SCM scm_i_symbols; > > I don't think this is needed, is it? Yes, so that rehash_after_gc () (in hashtab.c) can identify the symbol hash and call scm_i_rehash_symbols_after_gc () for it. >> +static SCM >> +intern_symbol (SCM symbol, const char *name, size_t len, size_t raw_hash) > > Since all users of `intern_symbol ()' perform a > `lookup_interned_symbol ()' right before, I'd rather change users so > that they lock before `lookup' and unlock after `intern': > > lock (&symbols_mutex); > > sym = lookup_interned_symbol (name); > if (scm_is_false (sym)) > { > sym = scm_i_c_make_symbol (...); > intern_symbol (sym); > } > > unlock (&symbols_mutex); > > Is it doable? The problem with this is allocation (scm_i_c_make_symbol) while holding the mutex. The overall arrangement of the symbol code is lookup - allocate - intern and I assume that is so as to avoid allocation in the case where the symbol is already interned; otherwise the arrangement could be allocate - lookup/intern with the lookup/intern being implemented as a single operation. I've made my changes so as to leave this overall pattern the same. The code certainly could be simpler if we always allocated upfront, but I think it is better (for performance) not to do that. On the other hand, I suspect the code could still be simplified to remove the duplication between lookup_interned_symbol and intern_symbol. I'll have a look at that. Thanks again for the review! Regards, Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-26 22:01 ` Neil Jerram @ 2009-03-26 23:12 ` Ludovic Courtès 0 siblings, 0 replies; 51+ messages in thread From: Ludovic Courtès @ 2009-03-26 23:12 UTC (permalink / raw) To: guile-devel Hi Neil, Neil Jerram <neil@ossau.uklinux.net> writes: > That's an interesting idea, but to be honest I'd prefer to finish this > work and move onto something else. Would it be OK just to reduce the > default runtime to 2 seconds? (It wouldn't make any practical > difference, because 10 seconds isn't normally enough to see a define > race anyway. It'll just be enough to catch any obvious bitrotting, > and people who want to run a real test can use > GUILE_TEST_DEFINE_RACE_DURATION.) Yes, that'd be OK. > The key points about #2 are that it uses a straight pthreads mutex to > protect the symbol hash, and that the symbol hash is a weak hash > table. Using a pthreads mutex means that we have to take care not to > do any allocations (or anything else that could block) while the mutex > is locked. The weakness means that as well as the obvious lookup and > intern operations, we have to allow for the hash table being accessed > and resized from an after-GC hook. OK, got it. (A limitation that vanishes with BDW-GC.) > #3, on the other hand, uses a fat mutex, and can be applied to any > non-weak hash table. Using a fat mutex means that it's fine to > allocate or block while the mutex is locked, and means that the code > can use scm_dynwind_lock_mutex, which is quite neat. On the other > hand, it would cause a problem if the hash table concerned could be > accessed during GC, because > > - if the GC didn't try to lock the fat mutex itself, it would be > modifying the hash table under the feet of the thread that has the > mutex locked > > - if the GC does try to lock the fat mutex, it won't be able to and so > will block... deadlock! > > Hence the restriction of #3 to non-weak hash tables. OK. > If #3 was needed, I wouldn't see any problem with its API/ABI changes. > The API change is addition of one new primitive, and the ABI change is > to a structure that isn't intended for user code to access. Am I > missing something? I was concerned about the `scm_t_hashtable' change, but it doesn't appear to be accessed by public inlines/macros, so that should be OK. > On the other hand, if it isn't needed - as appears to be true - > there's no case for adding it to 1.8. Agreed. >> It needs to be conditionalized on `--with-threads'. > > It's already inside an "if BUILD_PTHREAD_SUPPORT". Do I need to do > more than that? No, I had overlooked this. >>> scm_i_rehash (SCM table, >>> unsigned long (*hash_fn)(), >>> void *closure, >>> - const char* func_name) >>> + const char* func_name, >>> + scm_i_pthread_mutex_t *mutex) >> >> This function assumes that MUTEX is locked. I would either write it >> prominently in a comment above or change the semantics so that MUTEX is >> acquired in `scm_i_rehash ()', not in the caller. > > I think I prefer the prominent comment. My first thought was for > scm_i_rehash () to acquire the mutex. I changed to the current code > because then all of the lock..unlock pairs are in symbols.c - which > seemed more consistent to me. Fair enough. >> The latter might be achieved by integrating tests about whether TABLE >> needs to be rehashed into `scm_i_rehash ()' itself. I.e., this: >> >> if (SCM_HASHTABLE_N_ITEMS (table) < SCM_HASHTABLE_LOWER (table) >> || SCM_HASHTABLE_N_ITEMS (table) > SCM_HASHTABLE_UPPER (table)) >> scm_i_rehash (...); >> >> would become: >> >> scm_i_rehash (...); >> >> I haven't reviewed all uses to see whether it's actually possible and >> whether it would lead to race conditions, though. > > I think that could work, but it would add another lock/unlock to the > mainline. (We need to hold the lock even just to evaluate the > rehashing condition.) The change doesn't feel compelling enough to me > to justify that. Contention-free locks are "cheap" in that there's no syscall involved (when using futexes); OTOH there's at least one function call, which we'd rather avoid. >>> -static SCM symbols; >>> +SCM scm_i_symbols; >> >> I don't think this is needed, is it? > > Yes, so that rehash_after_gc () (in hashtab.c) can identify the symbol > hash and call scm_i_rehash_symbols_after_gc () for it. Oops, right. >>> +static SCM >>> +intern_symbol (SCM symbol, const char *name, size_t len, size_t raw_hash) >> >> Since all users of `intern_symbol ()' perform a >> `lookup_interned_symbol ()' right before, I'd rather change users so >> that they lock before `lookup' and unlock after `intern': >> >> lock (&symbols_mutex); >> >> sym = lookup_interned_symbol (name); >> if (scm_is_false (sym)) >> { >> sym = scm_i_c_make_symbol (...); >> intern_symbol (sym); >> } >> >> unlock (&symbols_mutex); >> >> Is it doable? > > The problem with this is allocation (scm_i_c_make_symbol) while > holding the mutex. The overall arrangement of the symbol code is > > lookup - allocate - intern > > and I assume that is so as to avoid allocation in the case where the > symbol is already interned; otherwise the arrangement could be > > allocate - lookup/intern > > with the lookup/intern being implemented as a single operation. > > I've made my changes so as to leave this overall pattern the same. > The code certainly could be simpler if we always allocated upfront, > but I think it is better (for performance) not to do that. Yes, you're right. I hadn't grasped all the implications here. One nitpick: `intern_symbol ()' shouldn't need NAME, LEN and RAW_HASH as it can get them "for free" from SYMBOL. > On the other hand, I suspect the code could still be simplified to > remove the duplication between lookup_interned_symbol and > intern_symbol. I'll have a look at that. Possible. Thanks for looking into this! Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-25 23:19 ` Neil Jerram 2009-03-26 3:40 ` Linas Vepstas 2009-03-26 9:10 ` Ludovic Courtès @ 2009-03-26 22:51 ` Neil Jerram 2009-03-27 3:15 ` Linas Vepstas 2 siblings, 1 reply; 51+ messages in thread From: Neil Jerram @ 2009-03-26 22:51 UTC (permalink / raw) To: Linas Vepstas, Guile Development Neil Jerram <neil@ossau.uklinux.net> writes: > #2 makes the symbols hash thread-safe, and it appears that this > completely fixes the define-race problem. I don't understand why we > apparently don't need patch #3 as well - but that's what my results > indicate. Maybe it's because the fragments of code passed to scm_c_eval_string in the test-define-race test are very simple - just (define x1-100 100) and x1-100 - and that scm_c_eval_string consists of reading (=> symbol lookup) followed by evaluation (=> module obarray lookup), and that each scm_c_eval_string call is fast enough to be completed well within one time slice. Then the mutex locking at the start of each symbol lookup would be enough to make sure that scm_c_eval_string calls on different threads never overlap with each other. Does that sound feasible? (If so, a test like test-define-race, with patch #2 in place, should still show errors on a multi-core system, or if the code that it evaluates was made more complex; and use of patch #3 should then fix those errors.) Regards, Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-26 22:51 ` Neil Jerram @ 2009-03-27 3:15 ` Linas Vepstas 0 siblings, 0 replies; 51+ messages in thread From: Linas Vepstas @ 2009-03-27 3:15 UTC (permalink / raw) To: Neil Jerram; +Cc: Guile Development 2009/3/26 Neil Jerram <neil@ossau.uklinux.net>: > > - and that scm_c_eval_string consists of reading (=> symbol lookup) > followed by evaluation (=> module obarray lookup), and that each > scm_c_eval_string call is fast enough to be completed well within one > time slice. One way to stress test such situations is to call pthread_yield right after the unlock. i.e. temporarily change scm_i_pthread_mutex_unlock to also called yield. This will trigger a thread switch even if the time-slice is not expired, and can often expose these kinds of bugs. --linas ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-12 22:13 ` Andy Wingo 2009-03-13 19:13 ` Neil Jerram @ 2009-03-14 14:23 ` Ludovic Courtès 2009-03-16 22:57 ` Andy Wingo 1 sibling, 1 reply; 51+ messages in thread From: Ludovic Courtès @ 2009-03-14 14:23 UTC (permalink / raw) To: guile-devel Hello! Andy Wingo <wingo@pobox.com> writes: > Dunno, in GStreamer we found that uncontended locks are cheap cheap > cheap. AFAIU they don't even cause context switches. On GNU/Linux they are implemented using futexes and the kernel is not invoked when there's no contention. Still, I would measure the impact on `module-define!' and `module-ref' because that's at least an additional function call. Thanks, Ludo'. ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-14 14:23 ` Ludovic Courtès @ 2009-03-16 22:57 ` Andy Wingo 0 siblings, 0 replies; 51+ messages in thread From: Andy Wingo @ 2009-03-16 22:57 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel On Sat 14 Mar 2009 15:23, ludo@gnu.org (Ludovic Courtès) writes: > Still, I would measure the impact on `module-define!' and `module-ref' > because that's at least an additional function call. You probably know, but at least in the VM (and I thought in the interpreter too), the locations are computed once, and the resulting variable is cached. So module-ref is not called repeatedly for the same binding. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Locks and threads 2009-03-10 23:57 ` Neil Jerram 2009-03-12 0:07 ` Neil Jerram @ 2009-03-25 18:57 ` Neil Jerram 1 sibling, 0 replies; 51+ messages in thread From: Neil Jerram @ 2009-03-25 18:57 UTC (permalink / raw) To: Guile Development; +Cc: Linas Vepstas Neil Jerram <neil@ossau.uklinux.net> writes: > I've been running Linas's define-race test program, and hitting the > `throw from within critical section' quite a lot. > > We've already discussed this a couple of times: > http://www.mail-archive.com/bug-guile@gnu.org/msg04613.html > http://www.mail-archive.com/guile-devel@gnu.org/msg03016.html > > Andy proposed just removing the check for 1.8.x, but I would prefer to > keep it if possible, as I think there probably still are genuine > problems where the check could help. What about the patch below? I don't believe anyone commented on this, but I've committed it now to 1.8.x. In case of any concerns, please let me know. Neil ^ permalink raw reply [flat|nested] 51+ messages in thread
end of thread, other threads:[~2009-03-27 3:15 UTC | newest] Thread overview: 51+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-02-11 22:31 Locks and threads Neil Jerram 2009-02-11 23:05 ` Neil Jerram 2009-02-11 23:32 ` Ludovic Courtès 2009-02-11 23:30 ` Linas Vepstas 2009-02-11 23:53 ` Neil Jerram 2009-02-12 0:18 ` Linas Vepstas 2009-02-12 20:51 ` Ludovic Courtès 2009-02-11 23:30 ` Ludovic Courtès 2009-02-12 12:55 ` Greg Troxel 2009-02-12 18:00 ` Ken Raeburn 2009-02-12 21:14 ` Ludovic Courtès 2009-02-14 1:25 ` Ken Raeburn 2009-02-14 16:09 ` Ludovic Courtès 2009-03-05 20:41 ` Neil Jerram 2009-03-04 23:49 ` Neil Jerram 2009-03-05 3:54 ` Linas Vepstas 2009-03-05 19:46 ` Neil Jerram 2009-03-05 20:05 ` Neil Jerram 2009-03-05 20:40 ` Linas Vepstas 2009-03-05 20:49 ` Neil Jerram 2009-03-05 20:57 ` Linas Vepstas 2009-03-05 21:25 ` Neil Jerram 2009-03-05 21:56 ` Linas Vepstas 2009-03-06 11:01 ` Andy Wingo 2009-03-06 12:36 ` Linas Vepstas 2009-03-06 22:05 ` Ludovic Courtès 2009-03-08 22:04 ` Neil Jerram 2009-03-25 19:00 ` Neil Jerram 2009-03-25 22:08 ` Ludovic Courtès 2009-03-05 21:35 ` Ludovic Courtès 2009-03-10 23:57 ` Neil Jerram 2009-03-12 0:07 ` Neil Jerram 2009-03-12 0:53 ` Neil Jerram 2009-03-12 1:29 ` Linas Vepstas 2009-03-12 3:09 ` Clinton Ebadi 2009-03-25 22:13 ` Neil Jerram 2009-03-25 22:34 ` Linas Vepstas 2009-03-12 22:13 ` Andy Wingo 2009-03-13 19:13 ` Neil Jerram 2009-03-25 23:19 ` Neil Jerram 2009-03-26 3:40 ` Linas Vepstas 2009-03-26 8:02 ` Neil Jerram 2009-03-26 18:39 ` Linas Vepstas 2009-03-26 9:10 ` Ludovic Courtès 2009-03-26 22:01 ` Neil Jerram 2009-03-26 23:12 ` Ludovic Courtès 2009-03-26 22:51 ` Neil Jerram 2009-03-27 3:15 ` Linas Vepstas 2009-03-14 14:23 ` Ludovic Courtès 2009-03-16 22:57 ` Andy Wingo 2009-03-25 18:57 ` Neil Jerram
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).