unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* bug#25386: guile-2.0 and 2.2 thread leakage+crash; very simple test attached.
@ 2017-01-07 21:47 Linas Vepstas
  2017-01-08  2:57 ` bug#25386: Manual gc helps Linas Vepstas
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Linas Vepstas @ 2017-01-07 21:47 UTC (permalink / raw)
  To: 25386

The (very simple) program below leaks ... something, very rapidly, and
then crashes after about 15-30 seconds.  Last thing printed before
crash:

rate=194.80519560944032
num threads=2
((gc-time-taken . 2791348254) (heap-size . 7532883968) (heap-free-size
. 2449408) (heap-total-allocated . 23912882640)
(heap-allocated-since-gc . 1073995264) (protected-objects . 90)
(gc-times . 87))

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
Aborted

Similar issue in guile-2.2 except it takes longer (8 minutes) and
crashes in gc somewhere.  I assume that some sort of
continuation is left lying about, even though the thread has
exited.

(define junk 0)
(define halt #f)

(define (wtf-thr)
   (define start (- (current-time) 0.1))

   ; Create thread that does junk and exits.  Yes, the increment
   ; of `junk` is not protected, and its racey, but so what.
   (define (mkthr v) (call-with-new-thread (lambda () (set! junk (+ junk
1)))))

   ; thread arguments
   (define thrarg (make-list 10 0))

   (define cnt 0)
   (define (mke)
      ; Create a limited number of threads
      (define thr-list (map mkthr thrarg))
      ; (display (length (all-threads)))
      (map join-thread thr-list)

      ; Some handy debug printing.
      (set! cnt (+ cnt 1))
      (if (eq? 0 (modulo cnt 500))
         (begin
            (display "rate=")
            (display (/ cnt (- (current-time) start))) (newline)
            (display "num threads=")
            (display (length (all-threads))) (newline)
            (display (gc-stats)) (newline) (newline)
         )))

   ; tail recursive infinite loop.
   (define (aloop) (mke) (if (not halt) (aloop)))

   ; while forever.
   (aloop)
)

; Run elsewhere, so that we have a shell prompt
; (not required for the bug)
(call-with-new-thread wtf-thr)

; halt if desired.
; (set! halt #t)





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#25386: Manual gc helps
  2017-01-07 21:47 bug#25386: guile-2.0 and 2.2 thread leakage+crash; very simple test attached Linas Vepstas
@ 2017-01-08  2:57 ` Linas Vepstas
  2017-01-08 14:57 ` bug#25386: test case update Linas Vepstas
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2017-01-08  2:57 UTC (permalink / raw)
  To: 25386

I did a fairly through review of the thread-creation and thread-join
code in the git master branch, and it looks to be just fine. Thus,
some experimentation is in order:

Going back to guile-2.0, I see this behavior:
guile -v
guile (GNU Guile) 2.0.11
Packaged by Debian (2.0.11-deb+1-10)

If I add a manual gc to the exit of the thread, like so:

   (define (mkthr v) (call-with-new-thread (lambda ()
         (set! junk (+ junk 1)) (gc) )))

then the heap blows up, in minutes, to about 180MB but then stops
growing, even after hours and millions of thread creates:
(heap-size . 183734272) (gc-times . 1957954)

If I gc only every third thread create, it quickly blows up to about
400MB, and then stablizes, for hours:
(heap-size . 428638208) (gc-times . 1292663)

If I gc every 17th thread, it blows up to about 1.8GB and then is stable:
(heap-size . 1875902464) (gc-times . 327462)

This last one after about 5.5 million thread creates and joins.
The counting is done like so:

   (define (mkthr v) (call-with-new-thread (lambda ()
                (lock-mutex mtx)
                (if (eq? 0 (modulo junk 17)) (gc))
                (set! junk (+ junk 1))
                (unlock-mutex mtx)
                )))

In each case, it seems to hit a plateau at about (n+1)*100MB when gc
is done on one out of every n threads.  This seems quite bizarre to
me: why does this inverted relation on number of gcs vs number of
thread creates? What's magic about 100MB? Clearly 100MB is wayyy too
large for this very simple program.  I mean, even if I gc at *every*
thread-exit ...

(I have not yet explored above in guile-2.2)

Since I cannot find any 'obvious' bugs in guile, this suggests some
strange stochastic behavior in bdw-gc?





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#25386: test case update
  2017-01-07 21:47 bug#25386: guile-2.0 and 2.2 thread leakage+crash; very simple test attached Linas Vepstas
  2017-01-08  2:57 ` bug#25386: Manual gc helps Linas Vepstas
@ 2017-01-08 14:57 ` Linas Vepstas
  2017-01-08 21:55 ` bug#25386: calling gc too often triggers a crash Linas Vepstas
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2017-01-08 14:57 UTC (permalink / raw)
  To: 25386

Ran the three test case overnight; saw mostly no increase in mem usage.

-- gc before every thread exit: up to 9M thread exits, no change in heap-size

-- gc before every third thread exit: 25M thread exits, no change.

-- gc before every 17th thread exit: 44M thread exits, relatively
small increase, from (heap-size . 1875902464) to (heap-size .
2068840448)





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#25386: calling gc too often triggers a crash
  2017-01-07 21:47 bug#25386: guile-2.0 and 2.2 thread leakage+crash; very simple test attached Linas Vepstas
  2017-01-08  2:57 ` bug#25386: Manual gc helps Linas Vepstas
  2017-01-08 14:57 ` bug#25386: test case update Linas Vepstas
@ 2017-01-08 21:55 ` Linas Vepstas
  2017-01-09  1:28 ` bug#25386: Please ignore previous stack trace Linas Vepstas
  2017-01-11  6:14 ` bug#25386: This can be closed Linas Vepstas
  4 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2017-01-08 21:55 UTC (permalink / raw)
  To: 25386

Since above works so swimmingly in the bove example, I tried it in a
production system.  Calling gc shortly before thread exit results in a
crash, always the same crash, always in under 20 minutes:

guile: hashtab.c:137: vacuum_weak_hash_table: Assertion `removed <= len'
failed.
Aborted

again, this is for guile -v
guile (GNU Guile) 2.0.11
Packaged by Debian (2.0.11-deb+1-10)

Perhaps this is fixed in 2.0.13 ???

guile: hashtab.c:137: vacuum_weak_hash_table: Assertion `removed <= len' failed.

Thread 1416 "guile" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffe7b7fe700 (LWP 29883)]
0x00007ffff749e428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff749e428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff74a002a in __GI_abort () at abort.c:89
#2  0x00007ffff7496bd7 in __assert_fail_base (fmt=<optimized out>,
    assertion=assertion@entry=0x7ffff7b5f7a2 "removed <= len",
    file=file@entry=0x7ffff7b5f798 "hashtab.c", line=line@entry=137,
    function=function@entry=0x7ffff7b5ff60 "vacuum_weak_hash_table")
    at assert.c:92
#3  0x00007ffff7496c82 in __GI___assert_fail (
    assertion=0x7ffff7b5f7a2 "removed <= len",
    file=0x7ffff7b5f798 "hashtab.c", line=137,
    function=0x7ffff7b5ff60 "vacuum_weak_hash_table") at assert.c:101
#4  0x00007ffff7ac3108 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#5  0x00007ffff7ac31af in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#6  0x00007ffff7ac5b1c in scm_c_hook_run ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#7  0x00007ffff7207ff5 in GC_try_to_collect_inner ()
   from /usr/lib/x86_64-linux-gnu/libgc.so.1
#8  0x00007ffff72082aa in GC_try_to_collect_general ()
   from /usr/lib/x86_64-linux-gnu/libgc.so.1
#9  0x00007ffff720838d in GC_gcollect ()
   from /usr/lib/x86_64-linux-gnu/libgc.so.1
#10 0x00007ffff7ab9109 in scm_gc ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#11 0x00007ffff7b3402b in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#12 0x00007ffff7aab107 in scm_call_1 ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#13 0x00007ffff7b34093 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#14 0x00007ffff7aab21e in scm_call_3 ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#15 0x00007ffff7b34093 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#16 0x00007ffff7aab283 in scm_call_4 ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#17 0x00007fffefb90d79 in
opencog::SchemeEval::do_eval(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&) (
    this=0x7ffe74000980, expr=...)
    at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:564
#18 0x00007fffefb90e2a in opencog::SchemeEval::c_wrap_eval(void*) (
    p=0x7ffe74000980)
    at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:493
#19 0x00007ffff7aa158a in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#20 0x00007ffff7b34093 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#21 0x00007ffff7aab283 in scm_call_4 ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#22 0x00007ffff7aa1d21 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#23 0x00007ffff7aa1e05 in scm_c_with_continuation_barrier ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#24 0x00007ffff72198e7 in GC_call_with_gc_active ()
   from /usr/lib/x86_64-linux-gnu/libgc.so.1
#25 0x00007ffff7b21c01 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#26 0x00007ffff7213952 in GC_call_with_stack_base ()
   from /usr/lib/x86_64-linux-gnu/libgc.so.1
#27 0x00007ffff7b21fe8 in scm_with_guile ()
   from /usr/lib/x86_64-linux-gnu/libguile-2.0.so.22
#28 0x00007fffefb90eae in
opencog::SchemeEval::eval_expr(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&) (
    this=0x7ffe74000980, expr=...)
    at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:465
#29 0x00007fffe386cc86 in opencog::GenericShell::eval_loop (
    this=0x7ffe4c001380)
    at /home/ubuntu/src/opencog/opencog/cogserver/shell/GenericShell.cc:446
#30 0x00007fffee768c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#31 0x00007ffff78396ba in start_thread (arg=0x7ffe7b7fe700)
    at pthread_create.c:333
#32 0x00007ffff756f82d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#25386: Please ignore previous stack trace.
  2017-01-07 21:47 bug#25386: guile-2.0 and 2.2 thread leakage+crash; very simple test attached Linas Vepstas
                   ` (2 preceding siblings ...)
  2017-01-08 21:55 ` bug#25386: calling gc too often triggers a crash Linas Vepstas
@ 2017-01-09  1:28 ` Linas Vepstas
  2017-01-11  6:14 ` bug#25386: This can be closed Linas Vepstas
  4 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2017-01-09  1:28 UTC (permalink / raw)
  To: 25386

Sorry, please ignore the previous stack trace.  I updated to version
2.0.13 (manually compiled) and it crashed with a zillion messages:

guile: warning: weak hash table corruption (https://bugs.gnu.org/19180)

so that's a done deal, then.





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#25386: This can be closed
  2017-01-07 21:47 bug#25386: guile-2.0 and 2.2 thread leakage+crash; very simple test attached Linas Vepstas
                   ` (3 preceding siblings ...)
  2017-01-09  1:28 ` bug#25386: Please ignore previous stack trace Linas Vepstas
@ 2017-01-11  6:14 ` Linas Vepstas
  2017-03-01 15:31   ` Andy Wingo
  4 siblings, 1 reply; 7+ messages in thread
From: Linas Vepstas @ 2017-01-11  6:14 UTC (permalink / raw)
  To: 25386

This can be closed as 'fixed'; I tested on today's git

guile (GNU Guile) 2.1.5.19-7e9395

and the worst of it seems to be over.  mem usage growth on the
original test case:

(heap-size . 7921664) (gc-times . 40)
(heap-size . 14344192) (gc-times . 953)
(heap-size . 14344192) (gc-times . 5219)  ; after 4 minutes CPU
(heap-size . 26419200) (gc-times . 64975) ; after 77 minutes CPU
(heap-size . 26419200) (gc-times . 133346) ; after 154 mins CPU
(heap-size . 26419200) (gc-times . 170083) ; after 192 mins CPU
(heap-size . 26419200) (gc-times . 249102) ; after 283 mins cpu
(heap-size . 26419200) (gc-times . 420031) ; after 468 min cpu
(heap-size . 26419200) (gc-times . 557039) ; after 804 mins CPU

i.e. 26MBytes - larger than it needs to be, but acceptable.

The last entry was, in full,
((gc-time-taken . 355210357) (heap-size . 26419200) (heap-free-size .
20336640) (heap-total-allocated . 2522568563312)
(heap-allocated-since-gc . 57648) (protected-objects . 0) (gc-times .
557619))

i.e. of the 26MB, only 6MB is in use, the rest is free.  The 6MB is
close to what it starts with.  2522 GB were chewed through in the
process, so this is OK, I guess.

A variant test case, create 510 threads before calling join:  (change
10 to 510 in above test)

(heap-size . 10604544) (gc-times . 32)
(heap-size . 19505152) (gc-times . 484)
(heap-size . 35926016) (gc-times . 1761)
(heap-size . 48238592) (gc-times . 4217)  ; after 8 minutes cpu time
(heap-size . 48238592) (gc-times . 47902) ; after 76 mins CPU
(heap-size . 48238592) (gc-times . 73063) ; after 114 mins CPU
(heap-size . 65540096) (gc-times . 128094) ; after 209 mins cpu
(heap-size . 65540096) (gc-times . 248321) ; after 399 mins
(heap-size . 65540096) (gc-times . 344197) ; after 546 min

i.e. 65MBytes .. acceptable, I guess.

The last one was:
((gc-time-taken . 218714374) (heap-size . 65540096) (heap-free-size .
54419456) (heap-total-allocated . 2057186203744)
(heap-allocated-since-gc . 4553872) (protected-objects . 0) (gc-times
. 344799))

so of the 65MB, only 11MB is in-use.


My production server is doing this:

(heap-size . 652918784) (gc-times . 233) ; about 8 mins CPU
(heap-size . 737722368) (gc-times . 339) ; 12 mins CPU
(heap-size . 1332973568) (gc-times . 1797) ; 120 mins CPU
(heap-size . 1441443840) (gc-times . 2221) ; 151 min CPU
(heap-size . 1521213440) (gc-times . 2441) ; 168 min cpu
(heap-size . 1595101184) (gc-times . 3061) ; 218 min cpu
(heap-size . 1726119936) (gc-times . 3292) ; 237 min
(heap-size . 1960865792) (gc-times . 6698) ; 510 minn
(heap-size . 1960865792) (gc-times . 10383) ; 805 min
(heap-size . 2931556352) (gc-times . 14211) ; 1199 min

about 3GB --

Last one is, in  full:
(gc-stats)
((gc-time-taken . 19818394581722) (heap-size . 2931556352)
(heap-free-size . 1767579648) (heap-total-allocated . 731393991040)
(heap-allocated-since-gc . 4063680) (protected-objects . 318)
(gc-times . 14211))

so of the 3GB, 1.8GB is free, and 1.2GB in use which is surprisingly
high for my app, but I can live with that.

Thanks!

--linas





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#25386: This can be closed
  2017-01-11  6:14 ` bug#25386: This can be closed Linas Vepstas
@ 2017-03-01 15:31   ` Andy Wingo
  0 siblings, 0 replies; 7+ messages in thread
From: Andy Wingo @ 2017-03-01 15:31 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: 25386-done

On Wed 11 Jan 2017 07:14, Linas Vepstas <linasvepstas@gmail.com> writes:

> This can be closed as 'fixed'; I tested on today's git
>
> guile (GNU Guile) 2.1.5.19-7e9395
>
> and the worst of it seems to be over.  mem usage growth on the
> original test case:

OK, closing :)

Andy





^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-01 15:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-07 21:47 bug#25386: guile-2.0 and 2.2 thread leakage+crash; very simple test attached Linas Vepstas
2017-01-08  2:57 ` bug#25386: Manual gc helps Linas Vepstas
2017-01-08 14:57 ` bug#25386: test case update Linas Vepstas
2017-01-08 21:55 ` bug#25386: calling gc too often triggers a crash Linas Vepstas
2017-01-09  1:28 ` bug#25386: Please ignore previous stack trace Linas Vepstas
2017-01-11  6:14 ` bug#25386: This can be closed Linas Vepstas
2017-03-01 15:31   ` Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).