unofficial mirror of guile-user@gnu.org 
* Threads deadlock in select
From: Andrew Gaylard @ 2011-09-02 14:29 UTC (permalink / raw)
  To: guile-user


Hi,

My application occasionally hangs on startup (about 1 time in 5).
There are two threads: one is C++'s main(), and the other is created
directly from Scheme.  The Scheme thread implements a TCP listener,
using conventional socket/bind/listen/accept logic:

(define (run-user-interface)
        (log "starting user interface...")
        (let* ( (sm     (fluid-ref *state-machine*))
                (s      (socket AF_INET SOCK_STREAM 0))
                (c      (cluster (fluid-ref *computer*)))
                (port   (port c))
                (ip     (cluster-ip-address c))
              )
        (setsockopt s SOL_SOCKET SO_REUSEADDR 1)
        (bind s AF_INET (inet-pton AF_INET ip) port)

        (listen s 5)
        (log "telnet cli on ~a:~d" ip port)

        (while #t
                (let* ( (client-connection (accept s))  ; block here
                        (socket (car client-connection))
                        (port-open? #t)
                      )
                (write-line "Welcome to the UI Server" socket)

                (while port-open?
                        (display "$ " socket)
                        (let ( (line (read-delimited "\r" socket)) )
                        (if (eof-object? line)
                                (set! port-open? #f)
                                (begin
                                        (read-delimited "\n" socket)    ; throw away
                                        (set! port-open? (evaluate socket line))) )
                ))
                (shutdown socket 2)
        )) )
)
(define (start-user-interface)
        (call-with-new-thread run-user-interface)
)

It appears that this Scheme code causes the main thread to block on a mutex.
The GDB backtrace is shown below.  The backtrace of thread-2 clearly shows
that scm_accept called scm_std_select, which looks OK to me.  Thread-1,
however, is stuck in scm_gc_for_newcell, which called scm_pthread_mutex.

I'd be grateful for any pointers on what I'm doing wrong.  This is
Guile-1.8.8, if that's important.  We can't upgrade to 2.0.x, as I
can't get it to build on our platform (Solaris-10, SPARC and x86-64).
Tips on that'd be helpful too.

Many thanks,
- Andrew

(gdb) bt
#0  0xfffffd7ffe89c257 in __lwp_park () from /lib/64/libc.so.1
#1  0xfffffd7ffe8941f6 in mutex_lock_queue () from /lib/64/libc.so.1
#2  0xfffffd7ffe894ca8 in mutex_lock_impl () from /lib/64/libc.so.1
#3  0xfffffd7ffe894d9b in pthread_mutex_lock () from /lib/64/libc.so.1
#4  0xfffffd7ffec20bf8 in scm_pthread_mutex_lock
(mutex=0xfffffd7ffec62090) at threads.c:1499
#5  0xfffffd7ffebc1d2e in scm_gc_for_newcell
(freelist=0xfffffd7ffec686e0, free_cells=0x478e08) at gc.c:484
#6  0xfffffd7ffebd7169 in scm_cell (car=8112048, cdr=1028) at
../libguile/inline.h:122
#7  0xfffffd7ffebd8461 in scm_list_1 (e1=0x7bc7b0) at list.c:47
#8  0xfffffd7ffebef712 in scm_remove_from_port_table (port=0x7bc7b0)
at ports.c:564
#9  0xfffffd7ffebc545b in scm_i_sweep_card (p=0x7bc7b0,
free_list=0xfffffd7fffdfd2d8, seg=0x4ccde0) at gc-card.c:212
#10 0xfffffd7ffebc3a7a in scm_i_sweep_some_cards (seg=0x4ccde0) at
gc-segment.c:168
#11 0xfffffd7ffebc4042 in scm_i_sweep_some_segments
(fl=0xfffffd7ffec686e0) at gc-segment.c:353
#12 0xfffffd7ffebc1d54 in scm_gc_for_newcell
(freelist=0xfffffd7ffec686e0, free_cells=0x478e08) at gc.c:487
#13 0xfffffd7ffebd7169 in scm_cell (car=6192256, cdr=1028) at
../libguile/inline.h:122
#14 0xfffffd7ffebee29b in scm_cons (x=0x5e7c80, y=0x404) at pairs.c:62
#15 0xfffffd7ffebb45cc in scm_closure (code=0x5e7c80, env=0x7b8560) at
eval.c:5601
#16 0xfffffd7ffebb7eef in deval (x=0x5e7c90, env=0x7b8560) at eval.c:3674
#17 0xfffffd7ffebb558f in eval_letrec_inits (env=0x7b8560,
init_forms=0x69a3a0, init_values_eol=0xfffffd7fffdfdc38) at
eval.c:3189
#18 0xfffffd7ffebb54bd in eval_letrec_inits (env=0x7b8560,
init_forms=0x69a420, init_values_eol=0xfffffd7fffdfdc38) at
eval.c:3186
#19 0xfffffd7ffebb54bd in eval_letrec_inits (env=0x7b8560,
init_forms=0x69a5a0, init_values_eol=0xfffffd7fffdfdc38) at
eval.c:3186
#20 0xfffffd7ffebb54bd in eval_letrec_inits (env=0x7b8560,
init_forms=0x69a740, init_values_eol=0xfffffd7fffdfdc38) at
eval.c:3186
#21 0xfffffd7ffebb54bd in eval_letrec_inits (env=0x7b8560,
init_forms=0x69a940, init_values_eol=0xfffffd7fffdfdc38) at
eval.c:3186
#22 0xfffffd7ffebb54bd in eval_letrec_inits (env=0x7b8560,
init_forms=0x69aa80, init_values_eol=0xfffffd7fffdfdc38) at
eval.c:3186
#23 0xfffffd7ffebb54bd in eval_letrec_inits (env=0x7b8560,
init_forms=0x69abe0, init_values_eol=0xfffffd7fffdfdc38) at
eval.c:3186
#24 0xfffffd7ffebae18b in ceval (x=0x69a340, env=0x7b8560) at eval.c:3630
#25 0xfffffd7ffebb1054 in ceval (x=0x738740, env=0x7ba8e0) at eval.c:4342
#26 0xfffffd7ffebaca4f in ceval (x=0x776120, env=0x7a8cf0) at eval.c:3397
#27 0xfffffd7ffebac3b4 in scm_eval_body (code=0x629220, env=0x7a8d60)
at eval.c:3000
#28 0xfffffd7ffebb3622 in call_closure_1 (proc=0x7a8dc0,
arg1=0x6d6960) at eval.c:5261
#29 0xfffffd7ffe4d5729 in scm_srfi1_for_each (proc=0x7a8dc0,
arg1=0x5fe6a0, args=0x404) at srfi-1.c:1516
#30 0xfffffd7ffebb11cf in ceval (x=0x404, env=0x7a8e10) at eval.c:4367
#31 0xfffffd7ffebb2d47 in scm_apply (proc=0x629690, arg1=0x404,
args=0x7a8e10) at eval.c:5012
#32 0xfffffd7ffebb1f63 in scm_call_0 (proc=0x7a8eb0) at eval.c:4666
#33 0xfffffd7ffebbf09e in apply_thunk (thunk=0x7a8eb0) at fluids.c:400
#34 0xfffffd7ffebbf2a2 in scm_c_with_fluid (fluid=0x713700,
value=0x6d6ac0, cproc=0xfffffd7ffebbf086 <apply_thunk>,
cdata=0x7a8eb0) at fluids.c:463
#35 0xfffffd7ffebbf254 in scm_with_fluid (fluid=0x713700,
value=0x6d6ac0, thunk=0x7a8eb0) at fluids.c:450
#36 0xfffffd7ffebb17f4 in ceval (x=0x6296b0, env=0x7a8ed0) at eval.c:4547
#37 0xfffffd7ffebb2d47 in scm_apply (proc=0x734c20, arg1=0x404,
args=0x7a97b0) at eval.c:5012
#38 0xfffffd7ffebb1fd9 in scm_call_2 (proc=0x734b70, arg1=0x6d6ac0,
arg2=0x9476e0) at eval.c:4678
#39 0x000000000043c3d7 in driver::process_trampoline (this=0x9b82c0,
event=0x9476e0) at driver.cpp:498
#40 0x000000000043a583 in trampoline::process (this=0x641cf0) at event.cpp:71
#41 0x000000000043bb36 in driver::run () at driver.cpp:307
#42 0x0000000000445e46 in main (argc=2, argv=0xfffffd7fffdfec48) at
monitor.cpp:34
(gdb)
(gdb)
(gdb) thread 2
[Switching to thread 2 (LWP    2        )]#0  0xfffffd7ffe8a162a in
__pollsys () from /lib/64/libc.so.1
(gdb) bt
#0  0xfffffd7ffe8a162a in __pollsys () from /lib/64/libc.so.1
#1  0xfffffd7ffe88fa45 in _pollsys () from /lib/64/libc.so.1
#2  0xfffffd7ffe848334 in pselect () from /lib/64/libc.so.1
#3  0xfffffd7ffe848602 in select () from /lib/64/libc.so.1
#4  0xfffffd7ffec20ad4 in scm_std_select (nfds=13,
readfds=0xfffffd7ffe3dba20, writefds=0x0,
exceptfds=0xfffffd7ffe3d9a20, timeout=0x0) at threads.c:1465
#5  0xfffffd7ffec395e9 in scm_accept (sock=0x7cec80) at socket.c:1346
#6  0xfffffd7ffebb0a3e in ceval (x=0x404, env=0x83bdc0) at eval.c:4232
#7  0xfffffd7ffebae30c in ceval (x=0x83c510, env=0x83bdc0) at eval.c:3648
#8  0xfffffd7ffebad9ed in ceval (x=0x83be00, env=0x83bdc0) at eval.c:3558
#9  0xfffffd7ffebaca4f in ceval (x=0x83c1d0, env=0x83be60) at eval.c:3397
#10 0xfffffd7ffebb2d47 in scm_apply (proc=0x83c050, arg1=0x404,
args=0x83bf50) at eval.c:5012
#11 0xfffffd7ffebb1f63 in scm_call_0 (proc=0x83c030) at eval.c:4666
#12 0xfffffd7ffec21e29 in scm_body_thunk
(body_data=0xfffffd7ffe3de650) at throw.c:355
#13 0xfffffd7ffec21895 in scm_c_catch (tag=0x6e64e0,
body=0xfffffd7ffec21e05 <scm_body_thunk>,
body_data=0xfffffd7ffe3de650, handler=0xfffffd7ffec21e2b
<scm_handle_by_proc>, handler_data=0xfffffd7ffe3de638,
    pre_unwind_handler=0, pre_unwind_handler_data=0xfffffd7ffe3de630)
at throw.c:203
#14 0xfffffd7ffec2234b in scm_catch_with_pre_unwind_handler
(key=0x6e64e0, thunk=0x83c030, handler=0x83bfe0,
pre_unwind_handler=0x204) at throw.c:587
#15 0xfffffd7ffebd264b in scm_gsubr_apply (args=0x404) at gsubr.c:223
#16 0xfffffd7ffebb27da in scm_apply (proc=0x47d4f0, arg1=0x4a7190,
args=0x83bfb0) at eval.c:4932
#17 0xfffffd7ffebb12fd in ceval (x=0x4996f0, env=0x83c070) at eval.c:4382
#18 0xfffffd7ffebad8bd in ceval (x=0x83c0d0, env=0x83c070) at eval.c:3537
#19 0xfffffd7ffebb2d47 in scm_apply (proc=0x7a1b70, arg1=0x404,
args=0x785f40) at eval.c:5012
#20 0xfffffd7ffebb1f63 in scm_call_0 (proc=0x7a1ae0) at eval.c:4666
#21 0xfffffd7ffec1f997 in really_launch (d=0xfffffd7fffdfcd10) at threads.c:793
#22 0xfffffd7ffeb9ad2b in c_body (d=0xfffffd7ffe3def20) at continuations.c:349
#23 0xfffffd7ffec21895 in scm_c_catch (tag=0x104,
body=0xfffffd7ffeb9ad03 <c_body>, body_data=0xfffffd7ffe3def20,
handler=0xfffffd7ffeb9ad3a <c_handler>,
handler_data=0xfffffd7ffe3def20,
    pre_unwind_handler=0xfffffd7ffec221cd
<scm_handle_by_message_noexit>, pre_unwind_handler_data=0x0) at
throw.c:203
#24 0xfffffd7ffeb9acc7 in scm_i_with_continuation_barrier
(body=0xfffffd7ffeb9ad03 <c_body>, body_data=0xfffffd7ffe3def20,
handler=0xfffffd7ffeb9ad3a <c_handler>,
handler_data=0xfffffd7ffe3def20,
    pre_unwind_handler=0xfffffd7ffec221cd
<scm_handle_by_message_noexit>, pre_unwind_handler_data=0x0) at
continuations.c:325
#25 0xfffffd7ffeb9adb6 in scm_c_with_continuation_barrier
(func=0xfffffd7ffec1f904 <really_launch>, data=0xfffffd7fffdfcd10) at
continuations.c:367
#26 0xfffffd7ffec1f8a9 in scm_i_with_guile_and_parent
(func=0xfffffd7ffec1f904 <really_launch>, data=0xfffffd7fffdfcd10,
parent=0x4c1380) at threads.c:750
#27 0xfffffd7ffec1f9ff in launch_thread (d=0xfffffd7fffdfcd10) at threads.c:805
#28 0xfffffd7ffe89bfbb in _thr_setup () from /lib/64/libc.so.1
#29 0xfffffd7ffe89c1e0 in ?? () from /lib/64/libc.so.1
#30 0x0000000000000000 in ?? ()


* Re: Threads deadlock in select
From: Andy Wingo @ 2012-01-09 15:50 UTC (permalink / raw)
  To: Andrew Gaylard; +Cc: guile-user

Hi Andrew,

Sorry for the late reply.

On Fri 02 Sep 2011 16:29, Andrew Gaylard <ag@computer.org> writes:

> (gdb) bt
> #0  0xfffffd7ffe89c257 in __lwp_park () from /lib/64/libc.so.1
> #1  0xfffffd7ffe8941f6 in mutex_lock_queue () from /lib/64/libc.so.1
> #2  0xfffffd7ffe894ca8 in mutex_lock_impl () from /lib/64/libc.so.1
> #3  0xfffffd7ffe894d9b in pthread_mutex_lock () from /lib/64/libc.so.1
> #4  0xfffffd7ffec20bf8 in scm_pthread_mutex_lock (mutex=0xfffffd7ffec62090) at threads.c:1499
> #5  0xfffffd7ffebc1d2e in scm_gc_for_newcell (freelist=0xfffffd7ffec686e0, free_cells=0x478e08) at gc.c:484
> #6  0xfffffd7ffebd7169 in scm_cell (car=8112048, cdr=1028) at ../libguile/inline.h:122
> #7  0xfffffd7ffebd8461 in scm_list_1 (e1=0x7bc7b0) at list.c:47
> #8  0xfffffd7ffebef712 in scm_remove_from_port_table (port=0x7bc7b0) at ports.c:564
> #9  0xfffffd7ffebc545b in scm_i_sweep_card (p=0x7bc7b0, free_list=0xfffffd7fffdfd2d8, seg=0x4ccde0) at gc-card.c:212
> #10 0xfffffd7ffebc3a7a in scm_i_sweep_some_cards (seg=0x4ccde0) at gc-segment.c:168
> #11 0xfffffd7ffebc4042 in scm_i_sweep_some_segments (fl=0xfffffd7ffec686e0) at gc-segment.c:353
> #12 0xfffffd7ffebc1d54 in scm_gc_for_newcell (freelist=0xfffffd7ffec686e0, free_cells=0x478e08) at gc.c:487
> #13 0xfffffd7ffebd7169 in scm_cell (car=6192256, cdr=1028) at ../libguile/inline.h:122
> #14 0xfffffd7ffebee29b in scm_cons (x=0x5e7c80, y=0x404) at pairs.c:62
...

> #0  0xfffffd7ffe8a162a in __pollsys () from /lib/64/libc.so.1
> #1  0xfffffd7ffe88fa45 in _pollsys () from /lib/64/libc.so.1
> #2  0xfffffd7ffe848334 in pselect () from /lib/64/libc.so.1
> #3  0xfffffd7ffe848602 in select () from /lib/64/libc.so.1
> #4  0xfffffd7ffec20ad4 in scm_std_select (nfds=13, readfds=0xfffffd7ffe3dba20, writefds=0x0, exceptfds=0xfffffd7ffe3d9a20, timeout=0x0) at threads.c:1465
> #5  0xfffffd7ffec395e9 in scm_accept (sock=0x7cec80) at socket.c:1346
> #6  0xfffffd7ffebb0a3e in ceval (x=0x404, env=0x83bdc0) at eval.c:4232

It seems that `accept' in 1.8 is not wrapped in scm_without_guile, so
Guile thinks that thread is still active, and thus the code that tries
to shut down all threads can't grab its thread lock.

Some solutions that I can think of:

  * Patch scm_accept to leave Guile.  (Then submit the patch :)

  * Do a `select' on the socket before accepting, to be sure there's a
    client ready, and that `accept' won't block.

  * Port to Guile 2.0.
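
The second option could be sketched roughly as follows.  This is only
an illustration: `accept-when-ready' is a hypothetical helper name, not
something Guile provides, and the one-second timeout is an arbitrary
choice.  It relies only on the standard `select' procedure, which
returns a list of three lists (ports ready for reading, ready for
writing, and with exceptional conditions).  Whether this actually
avoids the hang on 1.8.8 is untested.

```scheme
;; Hypothetical helper (not part of Guile): wait until the listening
;; socket S has a pending connection, then accept it.
(define (accept-when-ready s)
  (let loop ()
    ;; Wait up to 1 second for S to become readable.  The timeout
    ;; value is an assumption; pick whatever suits the application.
    (let ((ready (select (list s) '() '() 1)))
      (if (null? (car ready))
          (loop)            ; no client pending yet, poll again
          (accept s)))))    ; a client is waiting, so accept it
```

In the original listener, the `(accept s)' call in the `let*' would
then become `(accept-when-ready s)'.  Because of the timeout, the
thread comes back to Scheme code at least once a second instead of
sitting in a single unbounded wait.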

> We can't upgrade to 2.0.x, as I can't get it to build on our platform
> (Solaris-10, SPARC and x86-64).  Tips on that'd be helpful too.

I think we have had some reports of Guile 2.0 building on this platform,
but it's not a regular build platform.  If you are interested in getting
this to work, send build logs for Guile 2.0.3 to bug-guile@gnu.org.

Thanks,

Andy
-- 
http://wingolog.org/
