unofficial mirror of guile-user@gnu.org 
* open-pipe deadlocked
@ 2011-07-29 13:56 rixed
  2011-08-02 21:24 ` Andreas Rottmann
  2011-08-10 18:31 ` rixed
  0 siblings, 2 replies; 12+ messages in thread
From: rixed @ 2011-07-29 13:56 UTC (permalink / raw)
  To: guile-user

Hello !

Sometimes, my program calls open-input-pipe and the forked child hangs
waiting for a lock (so after the fork but before executing the command).

So I read the code for open-process, especially what happens between the
fork and the execlp, and I noticed several potential problems:

- all ports are closed, but what about other open files that are not
  ports? My application opens many files in C that are not known to
  Guile. Shouldn't these be closed as well?

- what if, when forking, some other Guile thread holds one of the internal
  locks (for instance, the lock protecting the port table)? Then the
  code between the fork and the exec (which loops over all ports, amongst
  other things) may try to grab this internal mutex and deadlock.
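
To make the second point concrete, here is a minimal plain-POSIX sketch
(not Guile code) of the scenario I have in mind; whether the child
actually hangs depends on whether the fork lands while the helper thread
holds the lock:

/* fork-deadlock.c -- a helper thread holds a mutex when the main thread
 * forks; the child inherits the locked mutex but not the thread that
 * would unlock it, so its lock attempt can block forever.
 * Build with: cc -pthread fork-deadlock.c */
#include <pthread.h>
#include <unistd.h>
#include <sys/wait.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *holder (void *arg)
{
  (void) arg;
  for (;;) {                        /* stands in for Guile's port-table users */
    pthread_mutex_lock (&lock);
    usleep (100);                   /* hold the lock for a little while */
    pthread_mutex_unlock (&lock);
  }
  return NULL;
}

int main (void)
{
  pthread_t t;
  pthread_create (&t, NULL, holder, NULL);
  usleep (1000);                    /* let the holder start looping */

  pid_t pid = fork ();
  if (pid == 0) {
    pthread_mutex_lock (&lock);     /* may block forever in the child */
    _exit (0);
  }
  waitpid (pid, NULL, 0);
  return 0;
}

If the same thing happens to one of Guile's internal mutexes between the
fork and the exec, the child would hang exactly the way I observe.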

Any thoughts?





* Re: open-pipe deadlocked
  2011-07-29 13:56 open-pipe deadlocked rixed
@ 2011-08-02 21:24 ` Andreas Rottmann
  2011-08-04  7:18   ` rixed
  2011-08-10 18:31 ` rixed
  1 sibling, 1 reply; 12+ messages in thread
From: Andreas Rottmann @ 2011-08-02 21:24 UTC (permalink / raw)
  To: rixed; +Cc: guile-user

rixed@happyleptic.org writes:

> Hello !
>
> Sometimes, my program calls open-input-pipe and the forked child hangs
> waiting for a lock (so after the fork but before executing the command).
>
> So I read the code for open-process, especially what happens between the
> fork and the execlp, and I noticed several potential problems:
>
> - all ports are closed, but what about other open files that are not
>   ports? My application opens many files in C that are not known to
>   Guile. Shouldn't these be closed as well?
>
From this, I gather you have a C application that has Guile embedded,
right?  Or do you rather have a Guile application that uses
third-party C code via language bindings?  In the first case, it could
be a workable solution (since you control all the C code) to just open
all files with the FD_CLOEXEC flag set, assuming you don't want to share
these file descriptors with the child.  In the latter case, you are (in
general) a bit out of luck; perhaps this[0] LWN discussion can shed some
light on the general issue here.

[0] http://lwn.net/Articles/292559/

Personally, I think FD_CLOEXEC being set by default would be a good
thing, but that's not going to happen, so one can either keep track of
all FDs to close between fork() and exec(), or mark all FDs as
FD_CLOEXEC manually after their creation.  However, the former is not
possible with the current implementation of `open-process', as there's
no user-defined code being executed between fork() and exec() to do the
actual closing.
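
For the embedded-C case, the FD_CLOEXEC route could look roughly like
the following minimal sketch (`open_private_file' is only an
illustrative helper name; O_CLOEXEC needs a reasonably recent kernel and
libc, hence the fcntl() fallback):

/* Sketch: open application-private files close-on-exec so they don't
 * leak into children spawned by open-pipe & co.  O_CLOEXEC sets the
 * flag atomically at open() time; the fcntl() form handles descriptors
 * you already have or systems without O_CLOEXEC. */
#include <fcntl.h>

int open_private_file (const char *path)
{
#ifdef O_CLOEXEC
  return open (path, O_RDONLY | O_CLOEXEC);
#else
  int fd = open (path, O_RDONLY);
  if (fd >= 0)
    fcntl (fd, F_SETFD, fcntl (fd, F_GETFD) | FD_CLOEXEC);
  return fd;
#endif
}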

> - what if when forking some other guile thread hold one of the internal
>   lock (for instance, the lock protecting the port table) ? Then the
>   code between the fork and the exec (which loop on all ports, amongst
>   other things) may try to grab this internal mutex, deadlocking.
>
I've not yet done multithreading with Guile, but I think you are right;
Guile should use pthread_atfork() as explained in [1], but it apparently
does not (or at least that's what "git grep" indicated).  This sounds
like a bug; could you come up with an example program that has a good
chance of running into this suspected issue?

[1] http://pubs.opengroup.org/onlinepubs/007904975/functions/pthread_atfork.html
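
For reference, the kind of pthread_atfork() arrangement I mean would
look roughly like this generic sketch (`the_mutex' is a stand-in for an
internal lock such as the port-table mutex; this is not what libguile
actually does today):

/* Take the lock before fork() and release it in both parent and child
 * afterwards, so the child never inherits it in a locked state. */
#include <pthread.h>

static pthread_mutex_t the_mutex = PTHREAD_MUTEX_INITIALIZER;

static void prepare (void)      { pthread_mutex_lock (&the_mutex); }
static void parent_after (void) { pthread_mutex_unlock (&the_mutex); }
static void child_after (void)  { pthread_mutex_unlock (&the_mutex); }

void install_fork_handlers (void)   /* call once, early at startup */
{
  pthread_atfork (prepare, parent_after, child_after);
}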

Regards, Rotty
-- 
Andreas Rottmann -- <http://rotty.yi.org/>




* Re: open-pipe deadlocked
  2011-08-02 21:24 ` Andreas Rottmann
@ 2011-08-04  7:18   ` rixed
  2011-09-01 21:45     ` Ludovic Courtès
  0 siblings, 1 reply; 12+ messages in thread
From: rixed @ 2011-08-04  7:18 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 8968 bytes --]

> > - all ports are closed, but what about other open files that are not
> >   ports? My application opens many files in C that are not known to
> >   Guile. Shouldn't these be closed as well?
> >
> From this, I gather you have a C application that has Guile embedded,
> right?

Yes.

> (...) it could
> be a workable solution (since you control all the C code) to just open
> all files with the FD_CLOEXEC flag set, assuming you don't want to share
> these file descriptors with the child.  In the latter case, you are (in
> general) a bit out of luck; perhaps this[0] LWN discussion can shed some
> light on the general issue here.

Very interesting reading, but:

- If there is no better way, then the Guile documentation should
  definitely state this.

- Why not use the "good old way" of execing, i.e. fork, then close
  everything but stdin/stdout/stderr, then exec?  I don't have R. Stevens'
  book at hand, but see [0] for an example and the sketch below.

[0] http://www.enderunix.org/docs/eng/daemon.php
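
Something along these lines, i.e. a rough sketch of that "good old way"
(the sysconf() bound is only the usual approximation; descriptors above
the current limit would be missed):

/* Classic "close everything but stdin/stdout/stderr" loop, to be run in
 * the child between fork() and exec().  close() on unused slots just
 * fails with EBADF, which is harmless here. */
#include <unistd.h>

static void close_inherited_fds (void)
{
  long fd, max_fd = sysconf (_SC_OPEN_MAX);
  if (max_fd < 0)
    max_fd = 1024;                /* arbitrary fallback */
  for (fd = 3; fd < max_fd; fd++)
    close ((int) fd);
}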

> > - what if, when forking, some other Guile thread holds one of the internal
> >   locks (for instance, the lock protecting the port table)? Then the
> >   code between the fork and the exec (which loops over all ports, amongst
> >   other things) may try to grab this internal mutex and deadlock.
> >
> I've not yet done multithreading with Guile, but I think you are right;
> Guile should use pthread_atfork() as explained in [1], but it apparently
> does not (or at least that's what "git grep" indicated).  This sounds
> like a bug; could you come up with an example program that has a good
> chance of running into this suspected issue?

I attached 2 files:

- "guile deadlock.scm > /tmp/log" deadlocks after around 12k lines of output for me

- more surprisingly, "guile crash.scm > /dev/log" segfaults with this
  backtrace:


Thread 2 (Thread 30144):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007fccbe1d30e9 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007fccbe1d2f0b in __pthread_mutex_lock (mutex=0x7fccbf01ec60) at pthread_mutex_lock.c:61
#3  0x00007fccbedd017c in scm_pthread_mutex_lock (mutex=0x7fccbf01ec60) at threads.c:1482
#4  0x00007fccbed70c61 in scm_gc_for_newcell (freelist=0x7fccbf0262a0, free_cells=0xb2c0b8) at gc.c:484
#5  0x00007fccbed86201 in scm_cell (car=140517355806736, cdr=140517355823216) at ../libguile/inline.h:122
#6  0x00007fccbed44b63 in scm_acons (key=0x7fccbf142100, value=0x7fccbf13e020, alist=0x7fccbf142070) at alist.c:43
#7  0x00007fccbed5fbed in ceval (x=0x7fccbf142010, env=0x7fccbf13e070) at eval.c:4300
#8  0x00007fccbed63b15 in scm_i_eval_x (exp=0x7fccbf148750, env=0x7fccbf1486e0) at eval.c:5900
#9  0x00007fccbed63c8c in scm_primitive_eval_x (exp=0x7fccbf148750) at eval.c:5921
#10 0x00007fccbed890f6 in scm_primitive_load (filename=0x7fccbf1e2c60) at load.c:109
#11 0x00007fccbed5f7a5 in ceval (x=0x404, env=0x7fccbf182b30) at eval.c:4232
#12 0x00007fccbed63c28 in scm_i_eval (exp=0x7fccbf182b90, env=0x7fccbf182b30) at eval.c:5910
#13 0x00007fccbed4b088 in scm_start_stack (id=0x7fccbf1cc2c0, exp=0x7fccbf206270, env=0x7fccbf182b30) at debug.c:457
#14 0x00007fccbed4b14a in scm_m_start_stack (exp=0x7fccbf206280, env=0x7fccbf182b30) at debug.c:473
#15 0x00007fccbed611b9 in scm_apply (proc=0x7fccbf2156a0, arg1=0x7fccbf2062e0, args=0x7fccbf182b30) at eval.c:4882
#16 0x00007fccbed5efcc in ceval (x=0x7fccbf2062e0, env=0x7fccbf182b30) at eval.c:4059
#17 0x00007fccbed61adb in scm_apply (proc=0x7fccbf182aa0, arg1=0x404, args=0x7fccbf182b30) at eval.c:5012
#18 0x00007fccbed60cf7 in scm_call_0 (proc=0x7fccbf182ac0) at eval.c:4666
#19 0x00007fccbed6e04c in apply_thunk (thunk=0x7fccbf182ac0) at fluids.c:400
#20 0x00007fccbed6e250 in scm_c_with_fluid (fluid=0x7fccbf1c1e20, value=0x4, cproc=0x7fccbed6e034 <apply_thunk>, cdata=0x7fccbf182ac0) at fluids.c:463
#21 0x00007fccbed6e202 in scm_with_fluid (fluid=0x7fccbf1c1e20, value=0x4, thunk=0x7fccbf182ac0) at fluids.c:450
#22 0x00007fccbed60589 in ceval (x=0x7fccbf206230, env=0x7fccbf182a40) at eval.c:4547
#23 0x00007fccbed61adb in scm_apply (proc=0x7fccbf182630, arg1=0x404, args=0x7fccbf1828d0) at eval.c:5012
#24 0x00007fccbed60cf7 in scm_call_0 (proc=0x7fccbf182650) at eval.c:4666
#25 0x00007fccbed4e980 in scm_dynamic_wind (in_guard=0x7fccbf1827e0, thunk=0x7fccbf182650, out_guard=0x7fccbf182820) at dynwind.c:111
#26 0x00007fccbed60589 in ceval (x=0x7fccbf1abdc0, env=0x7fccbf182770) at eval.c:4547
#27 0x00007fccbed5b46c in ceval (x=0x7fccbf1821b0, env=0x7fccbf1825a0) at eval.c:3368
#28 0x00007fccbed63b15 in scm_i_eval_x (exp=0x7fccbf182540, env=0x7fccbf1825a0) at eval.c:5900
#29 0x00007fccbed63c8c in scm_primitive_eval_x (exp=0x7fccbf182540) at eval.c:5921
#30 0x00007fccbed63d37 in scm_eval_x (exp=0x7fccbf182540, module_or_state=0x7fccbf1e3120) at eval.c:5956
#31 0x00007fccbedacd1e in scm_shell (argc=2, argv=0x7fff6d32e308) at script.c:737
#32 0x00000000004006b8 in inner_main (closure=0x0, argc=2, argv=0x7fff6d32e308) at guile.c:53
#33 0x00007fccbed85eba in invoke_main_func (body_data=0x7fff6d32e1c0) at init.c:367
#34 0x00007fccbed497f6 in c_body (d=0x7fff6d32e100) at continuations.c:349
#35 0x00007fccbedd0e1f in scm_c_catch (tag=0x104, body=0x7fccbed497ce <c_body>, body_data=0x7fff6d32e100, handler=0x7fccbed49805 <c_handler>, 
    handler_data=0x7fff6d32e100, pre_unwind_handler=0x7fccbedd1763 <scm_handle_by_message_noexit>, pre_unwind_handler_data=0x0) at throw.c:203
#36 0x00007fccbed49792 in scm_i_with_continuation_barrier (body=0x7fccbed497ce <c_body>, body_data=0x7fff6d32e100, handler=0x7fccbed49805 <c_handler>, 
    handler_data=0x7fff6d32e100, pre_unwind_handler=0x7fccbedd1763 <scm_handle_by_message_noexit>, pre_unwind_handler_data=0x0) at continuations.c:325
#37 0x00007fccbed49881 in scm_c_with_continuation_barrier (func=0x7fccbed85e62 <invoke_main_func>, data=0x7fff6d32e1c0) at continuations.c:367
#38 0x00007fccbedced99 in scm_i_with_guile_and_parent (func=0x7fccbed85e62 <invoke_main_func>, data=0x7fff6d32e1c0, parent=0x0) at threads.c:733
#39 0x00007fccbedced5a in scm_with_guile (func=0x7fccbed85e62 <invoke_main_func>, data=0x7fff6d32e1c0) at threads.c:721
#40 0x00007fccbed85e43 in scm_boot_guile (argc=2, argv=0x7fff6d32e308, main_func=0x400694 <inner_main>, closure=0x0) at init.c:350
#41 0x00000000004006e5 in main (argc=2, argv=0x7fff6d32e308) at guile.c:63

Thread 1 (Thread 30145):
#0  0x00007fccbed72f69 in scm_i_sweep_some_segments (fl=0x7fccbf0262a0) at gc-segment.c:350
#1  0x00007fccbed70c87 in scm_gc_for_newcell (freelist=0x7fccbf0262a0, free_cells=0xb45498) at gc.c:487
#2  0x00007fccbed86201 in scm_cell (car=140517355786256, cdr=140517355823216) at ../libguile/inline.h:122
#3  0x00007fccbed44b63 in scm_acons (key=0x7fccbf142100, value=0x7fccbf139020, alist=0x7fccbf142070) at alist.c:43
#4  0x00007fccbed5fbed in ceval (x=0x7fccbf142010, env=0x7fccbf139070) at eval.c:4300
#5  0x00007fccbed61adb in scm_apply (proc=0x7fccbf148780, arg1=0x404, args=0x7fccbf14bc60) at eval.c:5012
#6  0x00007fccbed60cf7 in scm_call_0 (proc=0x7fccbf148760) at eval.c:4666
#7  0x00007fccbedd13bf in scm_body_thunk (body_data=0x7fccbdc63b50) at throw.c:355
#8  0x00007fccbedd0e1f in scm_c_catch (tag=0x104, body=0x7fccbedd139b <scm_body_thunk>, body_data=0x7fccbdc63b50, 
    handler=0x7fccbedd13c1 <scm_handle_by_proc>, handler_data=0x7fccbdc63b38, pre_unwind_handler=0, pre_unwind_handler_data=0x7fccbdc63b30) at throw.c:203
#9  0x00007fccbedd18e4 in scm_catch_with_pre_unwind_handler (key=0x104, thunk=0x7fccbf148760, handler=0x7fccbf13ae50, pre_unwind_handler=0x204) at throw.c:587
#10 0x00007fccbedd191c in scm_catch (key=0x104, thunk=0x7fccbf148760, handler=0x7fccbf13ae50) at throw.c:601
#11 0x00007fccbedceea6 in really_launch (d=0x7fff6d32ca30) at threads.c:778
#12 0x00007fccbed497f6 in c_body (d=0x7fccbdc63e40) at continuations.c:349
#13 0x00007fccbedd0e1f in scm_c_catch (tag=0x104, body=0x7fccbed497ce <c_body>, body_data=0x7fccbdc63e40, handler=0x7fccbed49805 <c_handler>, 
    handler_data=0x7fccbdc63e40, pre_unwind_handler=0x7fccbedd1763 <scm_handle_by_message_noexit>, pre_unwind_handler_data=0x0) at throw.c:203
#14 0x00007fccbed49792 in scm_i_with_continuation_barrier (body=0x7fccbed497ce <c_body>, body_data=0x7fccbdc63e40, handler=0x7fccbed49805 <c_handler>, 
    handler_data=0x7fccbdc63e40, pre_unwind_handler=0x7fccbedd1763 <scm_handle_by_message_noexit>, pre_unwind_handler_data=0x0) at continuations.c:325
#15 0x00007fccbed49881 in scm_c_with_continuation_barrier (func=0x7fccbedcedf4 <really_launch>, data=0x7fff6d32ca30) at continuations.c:367
#16 0x00007fccbedced99 in scm_i_with_guile_and_parent (func=0x7fccbedcedf4 <really_launch>, data=0x7fff6d32ca30, parent=0xb2f380) at threads.c:733
#17 0x00007fccbedceef0 in launch_thread (d=0x7fff6d32ca30) at threads.c:788
#18 0x00007fccbe1d08ba in start_thread (arg=<value optimized out>) at pthread_create.c:300
#19 0x00007fccbdf3802d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#20 0x0000000000000000 in ?? ()


[-- Attachment #2: deadlock.scm --]
[-- Type: text/plain, Size: 481 bytes --]

; Show how a deadlock can occur when open-input-pipe forks a new process
; while another thread keeps writing to a port.

(use-modules (ice-9 popen)
             (ice-9 threads))

(define (repeat n f)
  (if (> n 0)
      (begin
        (f)
        (repeat (- n 1) f))))

(define (forever f)
  (f)
  (forever f))

(display "Spawn a thread that performs some writes\n")
(make-thread forever (lambda ()
                       (display "write...\n")))

(display "Now exec some processes...\n")
(forever (lambda ()
           (let ((pipe (open-input-pipe "sleep 0")))
             (close-pipe pipe))))


[-- Attachment #3: crash.scm --]
[-- Type: text/plain, Size: 263 bytes --]

; A small variation that crashes: two threads write to the same port.

(use-modules (ice-9 threads))

(define (forever f)
  (f)
  (forever f))

; Spawn a thread that performs some writes
(make-thread forever (lambda ()
                       (display "write...\n")))

(forever (lambda () (display "new pipe...\n")))



* Re: open-pipe deadlocked
  2011-07-29 13:56 open-pipe deadlocked rixed
  2011-08-02 21:24 ` Andreas Rottmann
@ 2011-08-10 18:31 ` rixed
  1 sibling, 0 replies; 12+ messages in thread
From: rixed @ 2011-08-10 18:31 UTC (permalink / raw)
  To: guile-user

Should I file a bug report at Savannah, then?





* Re: open-pipe deadlocked
  2011-08-04  7:18   ` rixed
@ 2011-09-01 21:45     ` Ludovic Courtès
  2011-09-02  8:13       ` rixed
  2011-09-02  9:26       ` rixed
  0 siblings, 2 replies; 12+ messages in thread
From: Ludovic Courtès @ 2011-09-01 21:45 UTC (permalink / raw)
  To: guile-user

Hi Cédric,

rixed@happyleptic.org skribis:

> I attached 2 files:
>
> - "guile deadlock.scm > /tmp/log" deadlocks after around 12k lines of output for me

AFAICS the problem does not occur with Guile 2.0.

For 1.8, could you try running Helgrind and see what happens?

> - more surprisingly, "guile crash.scm > /dev/log" segfaults with this
>   backtrace:

Well yes, ports still aren’t thread-safe ;-) so this is bound to crash
in unexpected ways.

With Guile 2.0, it always ends up with:

--8<---------------cut here---------------start------------->8---
ERROR: In procedure display:
ERROR: In procedure fport_write: Bad address
--8<---------------cut here---------------end--------------->8---

Thanks,
Ludo’.





* Re: open-pipe deadlocked
  2011-09-01 21:45     ` Ludovic Courtès
@ 2011-09-02  8:13       ` rixed
  2011-09-02  9:26       ` rixed
  1 sibling, 0 replies; 12+ messages in thread
From: rixed @ 2011-09-02  8:13 UTC (permalink / raw)
  To: guile-user

-[ Thu, Sep 01, 2011 at 11:45:37PM +0200, Ludovic Courtès ]----
> Hi Cédric,
> 
> rixed@happyleptic.org skribis:
> 
> > I attached 2 files:
> >
> > - "guile deadlock.scm > /tmp/log" deadlocks after around 12k lines of output for me
> 
> AFAICS the problem does not occur with Guile 2.0.
> For 1.8, could you try running Helgrind and see what happens?

Will do.

> > - more surprisingly, "guile crash.scm > /dev/log" segfaults with this
> >   backtrace:
> 
> Well yes, ports still aren't thread-safe ;-) so this is bound to crash
> in unexpected ways.
> 
> With Guile 2.0, it always ends up with:
> 
> --8<---------------cut here---------------start------------->8---
> ERROR: In procedure display:
> ERROR: In procedure fport_write: Bad address
> --8<---------------cut here---------------end--------------->8---

With Guile 2.0.2 it sometimes ends with this and sometimes segfaults.  See
the ticket I opened at Savannah for a backtrace:

https://savannah.gnu.org/bugs/?33996






* Re: open-pipe deadlocked
  2011-09-01 21:45     ` Ludovic Courtès
  2011-09-02  8:13       ` rixed
@ 2011-09-02  9:26       ` rixed
  2011-09-02 10:58         ` rixed
                           ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: rixed @ 2011-09-02  9:26 UTC (permalink / raw)
  To: guile-user

> For 1.8, could you try running Helgrind and see what happens?

Helgrind complains about loads of 'possible data race' but does not
detect anything wrong when the actual deadlock occurs.  When I exit
the program it does report that a thread still holds a lock, but it
does not reveal the addresses of those locks in a meaningful way for me:

==26762== Thread #1: Exiting thread still holds 1 lock
==26762==    at 0x5A81B4D: waitpid (waitpid.c:41)
==26762==    by 0x4F0A289: scm_waitpid (posix.c:560)
==27182==    by 0x5A7BF09: pthread_mutex_lock (pthread_mutex_lock.c:61)
==26762==    by 0x4E8FCBF: deval (eval.c:4229)
==27182==    by 0x4C25BEF: pthread_mutex_lock (hg_intercepts.c:488)
==27182==    by 0x4EF6606: scm_i_thread_put_to_sleep (threads.c:1676)
==26762==    by 0x4E89B4F: scm_i_eval_x (eval.c:5900)
==27182==    by 0x4E96D93: scm_i_gc (gc.c:550)
==27182==    by 0x4E96CBC: scm_gc_for_newcell (gc.c:507)
==26762==    by 0x4E8FCED: deval (eval.c:4232)
==27182==    by 0x4EAC1B8: scm_cell (inline.h:122)
==26762==    by 0x4E89C62: scm_i_eval (eval.c:5910)
==26762==    by 0x4E710D7: scm_start_stack (debug.c:457)
==26762==    by 0x4E71199: scm_m_start_stack (debug.c:473)
==26762==
==27182==    by 0x4E91F5E: scm_dapply (eval.c:5012)
==27182==

(how pthread_mutex_lock appears to call scm_waitpid is not clear to me)

I don't know how Helgrind works exactly, and thus cannot be sure it is
supposed to detect when a thread locks a mutex it already owns
(especially after a fork).

As to why it does not happen with Guile 2, this is still a mystery.  My
theory about this deadlock is that the thread that calls open-process
already owns scm_i_port_table_mutex when open-process is called, and
thus the port-for-each call deadlocks.  But Guile 2's open-process does
the same fork (not vfork) and takes the same scm_i_port_table_mutex in
port-for-each, and that mutex is still not recursive, yet it does not
deadlock; so maybe my theory is wrong in the first place.  Or maybe the
code path that calls open-process while scm_i_port_table_mutex is locked
disappeared in Guile 2, perhaps due to the change of garbage collector
(since the GC also grabs this lock, I believe). Or maybe the deadlock
involves another lock in addition to this one. I'm going to turn
scm_i_port_table_mutex into a recursive mutex in order to try to
invalidate my theory.  Sorry, I'm thinking aloud, but maybe this can
give you a better idea?
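
For clarity, the change I have in mind amounts roughly to the following,
in plain pthreads terms (a generic sketch, not the actual libguile
patch):

/* Initialize the mutex with PTHREAD_MUTEX_RECURSIVE instead of the
 * default type, so the same thread can lock it twice without
 * deadlocking itself. */
#include <pthread.h>

static pthread_mutex_t port_table_mutex;

static void init_port_table_mutex (void)
{
  pthread_mutexattr_t attr;
  pthread_mutexattr_init (&attr);
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init (&port_table_mutex, &attr);
  pthread_mutexattr_destroy (&attr);
}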





* Re: open-pipe deadlocked
  2011-09-02  9:26       ` rixed
@ 2011-09-02 10:58         ` rixed
  2011-09-02 10:58         ` rixed
  2011-09-03 20:32         ` Ludovic Courtès
  2 siblings, 0 replies; 12+ messages in thread
From: rixed @ 2011-09-02 10:58 UTC (permalink / raw)
  To: guile-user

-[ Fri, Sep 02, 2011 at 11:26:42AM +0200, rixed@happyleptic.org ]----
> > For 1.8, could you try running Helgrind and see what happens?
> ==26762== Thread #1: Exiting thread still holds 1 lock
> ==26762==    at 0x5A81B4D: waitpid (waitpid.c:41)
> ==26762==    by 0x4F0A289: scm_waitpid (posix.c:560)
> ==27182==    by 0x5A7BF09: pthread_mutex_lock (pthread_mutex_lock.c:61)
> ==26762==    by 0x4E8FCBF: deval (eval.c:4229)
> ==27182==    by 0x4C25BEF: pthread_mutex_lock (hg_intercepts.c:488)
> ==27182==    by 0x4EF6606: scm_i_thread_put_to_sleep (threads.c:1676)
> ==26762==    by 0x4E89B4F: scm_i_eval_x (eval.c:5900)
> ==27182==    by 0x4E96D93: scm_i_gc (gc.c:550)
> ==27182==    by 0x4E96CBC: scm_gc_for_newcell (gc.c:507)
> ==26762==    by 0x4E8FCED: deval (eval.c:4232)
> ==27182==    by 0x4EAC1B8: scm_cell (inline.h:122)
> ==26762==    by 0x4E89C62: scm_i_eval (eval.c:5910)
> ==26762==    by 0x4E710D7: scm_start_stack (debug.c:457)
> ==26762==    by 0x4E71199: scm_m_start_stack (debug.c:473)
> ==26762==
> ==27182==    by 0x4E91F5E: scm_dapply (eval.c:5012)
> ==27182==
> 
> (how pthread_mutex_lock apears to call scm_waitpid is not clear to me)

I'll answer myself: there are actually two interleaved stack traces here
(note the two different ==PID== prefixes).  They should be read as:

> ==26762== Thread #1: Exiting thread still holds 1 lock
> ==26762==    at 0x5A81B4D: waitpid (waitpid.c:41)
> ==26762==    by 0x4F0A289: scm_waitpid (posix.c:560)
> ==26762==    by 0x4E8FCBF: deval (eval.c:4229)
> ==26762==    by 0x4E89B4F: scm_i_eval_x (eval.c:5900)
> ==26762==    by 0x4E8FCED: deval (eval.c:4232)
> ==26762==    by 0x4E89C62: scm_i_eval (eval.c:5910)
> ==26762==    by 0x4E710D7: scm_start_stack (debug.c:457)
> ==26762==    by 0x4E71199: scm_m_start_stack (debug.c:473)
> ==26762==

and

> ==27182==    by 0x5A7BF09: pthread_mutex_lock (pthread_mutex_lock.c:61)
> ==27182==    by 0x4C25BEF: pthread_mutex_lock (hg_intercepts.c:488)
> ==27182==    by 0x4EF6606: scm_i_thread_put_to_sleep (threads.c:1676)
> ==27182==    by 0x4E96D93: scm_i_gc (gc.c:550)
> ==27182==    by 0x4E96CBC: scm_gc_for_newcell (gc.c:507)
> ==27182==    by 0x4EAC1B8: scm_cell (inline.h:122)
> ==27182==    by 0x4E91F5E: scm_dapply (eval.c:5012)
> ==27182==

Which makes more sense.
(I suppose the first stack trace is where the thread exited and the
second is where it took the lock.)





* Re: open-pipe deadlocked
  2011-09-02  9:26       ` rixed
  2011-09-02 10:58         ` rixed
@ 2011-09-02 10:58         ` rixed
  2011-09-03 20:32         ` Ludovic Courtès
  2 siblings, 0 replies; 12+ messages in thread
From: rixed @ 2011-09-02 10:58 UTC (permalink / raw)
  To: guile-user

-[ Fri, Sep 02, 2011 at 11:26:42AM +0200, rixed@happyleptic.org ]----
> Or maybe the deadlock involves another lock in addition to
> this one. I'm going to turn scm_i_port_table_mutex into a recursive
> mutex in order to try to invalidate my theory.

Initializing scm_i_port_table_mutex as a recursive mutex does not
prevent the deadlock, so the initial theory is dead: it is not the
forking thread deadlocking itself by locking the mutex twice.




* Re: open-pipe deadlocked
  2011-09-02  9:26       ` rixed
  2011-09-02 10:58         ` rixed
  2011-09-02 10:58         ` rixed
@ 2011-09-03 20:32         ` Ludovic Courtès
  2011-09-04 10:18           ` Andy Wingo
  2 siblings, 1 reply; 12+ messages in thread
From: Ludovic Courtès @ 2011-09-03 20:32 UTC (permalink / raw)
  To: guile-user

Hi!

rixed@happyleptic.org skribis:

>> For 1.8, could you try running Helgrind and see what happens?
>
> Helgrind complains about loads of 'possible data race'

Actually 1.8 has a serious problem when it comes to multi-threading:
memoization, which modifies the source code tree structure, is not
thread-safe.

I’m not sure if this could explain your deadlock, but it could
potentially lead to unexpected behavior.

Thanks,
Ludo’.





* Re: open-pipe deadlocked
  2011-09-03 20:32         ` Ludovic Courtès
@ 2011-09-04 10:18           ` Andy Wingo
  2011-09-05  7:30             ` rixed
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Wingo @ 2011-09-04 10:18 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user

Hi,

On Sat 03 Sep 2011 22:32, ludo@gnu.org (Ludovic Courtès) writes:

> rixed@happyleptic.org skribis:
>
>>> For 1.8, could you try running Helgrind and see what happens?
>>
>> Helgrind complains about loads of 'possible data race'
>
> Actually 1.8 has a serious problem when it comes to multi-threading:
> memoization, which modifies the source code tree structure, is not
> thread-safe.

Yeah, at this point I think that you really should be using 2.0 if you
are using threads.  Some things work in 1.8 but we really can't help
debugging there, because 1.8 has some more serious problems that are
already fixed in 2.0 and could cause all sorts of undefined behavior.

Andy
-- 
http://wingolog.org/




* Re: open-pipe deadlocked
  2011-09-04 10:18           ` Andy Wingo
@ 2011-09-05  7:30             ` rixed
  0 siblings, 0 replies; 12+ messages in thread
From: rixed @ 2011-09-05  7:30 UTC (permalink / raw)
  To: guile-user

-[ Sun, Sep 04, 2011 at 12:18:15PM +0200, Andy Wingo ]----
> > Actually 1.8 has a serious problem when it comes to multi-threading:
> > memoization, which modifies the source code tree structure, is not
> > thread-safe.
> 
> Yeah, at this point I think that you really should be using 2.0 if you
> are using threads.  Some things work in 1.8 but we really can't help
> debugging there, because 1.8 has some more serious problems that are
> already fixed in 2.0 and could cause all sorts of undefined behavior.

I already encountered the memoization bug and dealt with it by... hum...
<slow voice> delaying the start of the thread so that the same code
is never memoized several times simultaneously </slow voice>, and it has
worked so far.  That's why I believe this is a different problem.

But yes, we are indeed transitioning to Guile 2.





End of thread (newest message: 2011-09-05 7:30 UTC)

Thread overview: 12+ messages
2011-07-29 13:56 open-pipe deadlocked rixed
2011-08-02 21:24 ` Andreas Rottmann
2011-08-04  7:18   ` rixed
2011-09-01 21:45     ` Ludovic Courtès
2011-09-02  8:13       ` rixed
2011-09-02  9:26       ` rixed
2011-09-02 10:58         ` rixed
2011-09-02 10:58         ` rixed
2011-09-03 20:32         ` Ludovic Courtès
2011-09-04 10:18           ` Andy Wingo
2011-09-05  7:30             ` rixed
2011-08-10 18:31 ` rixed
