unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* out-of-control GC
@ 2017-09-09 19:47 Linas Vepstas
  2017-09-10  4:14 ` Linas Vepstas
  2017-09-11 18:13 ` Hans Åberg
  0 siblings, 2 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-09 19:47 UTC (permalink / raw)
  To: guile-user@gnu.org

I've been experiencing problems with guile GC for a while now, that I've
mostly been able to ignore, but that are becoming debilitating, and I need
help, or need to find some solution.

Background: I have a C++ library that has guile bindings; it is wrapped up
as a guile module. It manages a lot of data, anywhere from a few gigabytes,
to a over a hundred gigabytes.  Almost all this RAM usage should be
invisible, opaque to guile: so, for example,  while my library is managing
5 or 10 GBytes, guile itself is managing maybe 20 or 50 or 100MBytes,
according to (gc-stats). This is normal and expected. Some of my programs
have guile use considerably more than this, in the multi-GByte range

However, the larger the RAM usage, the more frequently guile GC seems to
run, and the longer it takes, so that by the time my dataset is 100GBytes,
guile is using maybe 1GBytes, and is spending 99% of its time in GC. My app
slows to a crawl, and nothing gets done.  I need to get GC under control,
because it is running pointlessly, because its clear that (almost) nothing
can be reclaimed, (usually) nothing is being reclaimed.

Here's my tool:

   ; Report the average amount of time spent in GC
   (define avg-gc-cpu-time
      (let ((last-gc (gc-stats))
            (start-time (get-internal-real-time))
            (run-time (get-internal-run-time)))
         (lambda ()
            (define now (get-internal-real-time))
            (define run (get-internal-run-time))
            (define cur (gc-stats))
            (define gc-time-taken (* 1.0e-9 (- (cdar cur) (cdar last-gc))))
            (define elapsed-time (* 1.0e-9 (- now start-time)))
            (define cpu-time (* 1.0e-9 (- run run-time)))
            (format #t "Elapsed time: ~5f secs. GC: ~5f%  cpu-usage: ~5f%\n"
               elapsed-time
               (* 100 (/ gc-time-taken elapsed-time))
               (* 100 (/ cpu-time elapsed-time))
            )
            (set! last-gc cur)
            (set! start-time now)
            (set! run-time run))))

Here's what it prints:

guile-en-r> (avg-gc-cpu-time)
Elapsed time: 46.03 secs. GC: 342.8%  cpu-usage: 346.4%

guile-en-r> (avg-gc-cpu-time)
Elapsed time: 7288. secs. GC: 329.5%  cpu-usage: 330.9%

guile-en-r> (avg-gc-cpu-time)
Elapsed time: 3205. secs. GC: 318.0%  cpu-usage: 319.4%

Here's a typical gc from the start/middle of this.

> (gc-stats)
((gc-time-taken . 634825386689) (heap-size . 16199680) (heap-free-size .
1925120) (heap-total-allocated . 15424779760) (heap-allocated-since-gc .
7692512) (protected-objects . 7) (gc-times . 3786))

In this case, the heap is 16MBytes, little or no work is being done in
guile, almost all is done in the C++ code. GC has taken 634 seconds ... a
lot but acceptable.  After this point, the number of GC's skyrockets -- a
gc ever few seconds, sometimes tens of them per second.  It sucks up cpu
time, and makes the guile prompt be very laggy.  I'd give more examples,
except that now, my app has moved to the stage where it really is creating
giant-sized scheme lists:

> (gc-stats)
((gc-time-taken . 281465535649189) (heap-size . 5175664640) (heap-free-size
. 1538387968) (heap-total-allocated . 67649385200) (heap-allocated-since-gc
. 6112) (protected-objects . 7) (gc-times . 11319))

That's right: 5GBytes of scheme stuff, and 281465 seconds = 78 cpu-hours
spent in GC.  top shows 116GBytes total in the app, which is normal and
expected.

When my app is about 50GBytes in size, it will run for about 12 hours, with
an outrageous amount of time spent in GC, which I just ignore out of habit,
because I can wait 12 hours.  This one, a bigger dataset, appears to be
stalled -- after 24 hours, less than 1% of the computations have been
completed.  I'm screwed until I can somehow disable guile GC long enough
for the thing to make forward progress.

I'm sort-of willing to hack on guile internals, except that my project is
already late and I am panicking ...

--linas

-- 
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *


^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: out-of-control GC
@ 2017-09-11  7:42 Kjetil Matheussen
  2017-09-14 18:41 ` Linas Vepstas
  0 siblings, 1 reply; 29+ messages in thread
From: Kjetil Matheussen @ 2017-09-11  7:42 UTC (permalink / raw)
  To: guile-user, linasvepstas

> linasvepstas@gmail.com:
>

> I've been experiencing problems with guile GC for a while now, that I've
> mostly been able to ignore, but that are becoming debilitating, and I need
> help, or need to find some solution.
>
> Background: I have a C++ library that has guile bindings; it is wrapped up
> as a guile module. It manages a lot of data, anywhere from a few gigabytes,
> to a over a hundred gigabytes.  Almost all this RAM usage should be
> invisible, opaque to guile: so, for example,  while my library is managing
> 5 or 10 GBytes, guile itself is managing maybe 20 or 50 or 100MBytes,
> according to (gc-stats). This is normal and expected. Some of my programs
> have guile use considerably more than this, in the multi-GByte range
>
> However, the larger the RAM usage, the more frequently guile GC seems to
> run, and the longer it takes, so that by the time my dataset is 100GBytes,
> guile is using maybe 1GBytes, and is spending 99% of its time in GC. My app
> slows to a crawl, and nothing gets done.  I need to get GC under control,
> because it is running pointlessly, because its clear that (almost) nothing
> can be reclaimed, (usually) nothing is being reclaimed.
>
>
Hi Lineas,

This is probably not the reason, but could it be that your program has a
lot of global/static data,
or that you are dynamically loading more and more libraries that has
global/static data?
(global/static data are scanned by bdw-gc, unless explicitly prevented from
doing so)

To test this, you can add a "static_roots_callback" before calling GC_INIT
when your
program starts up. Here's an example of a "static_roots_callback":

https://github.com/kmatheussen/radium/blob/master/Qt/Qt_Main.cpp#L3028
(change "#if 0" at line 3094 to "#if 1" to print out info every time you
collect garbage)

It's initialized like this:

int main(){
   GC_register_has_static_roots_callback(gc_has_static_roots_func);
   GC_INIT();
   ...
}


Also, maybe you should ask on the bdw-gc mailing list too. The question is
probably more relevant there if it's a gc tuning issue (which it might not
be though).


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-09-14 20:45 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-09 19:47 out-of-control GC Linas Vepstas
2017-09-10  4:14 ` Linas Vepstas
2017-09-10 16:50   ` Arne Babenhauserheide
2017-09-10 18:03     ` David Kastrup
2017-09-10 20:23     ` Linas Vepstas
2017-09-10 20:36       ` Marko Rauhamaa
2017-09-10 21:47         ` Linas Vepstas
2017-09-10 22:38           ` Marko Rauhamaa
2017-09-11 16:17             ` Arne Babenhauserheide
2017-09-14 18:31             ` Linas Vepstas
2017-09-14 18:58               ` Marko Rauhamaa
2017-09-14 20:15                 ` Linas Vepstas
2017-09-14 20:36                   ` David Kastrup
2017-09-14 20:45                 ` Arne Babenhauserheide
2017-09-11  8:22           ` tomas
2017-09-14 20:19             ` Linas Vepstas
2017-09-13 17:13         ` David Kastrup
2017-09-10 22:17       ` Arne Babenhauserheide
2017-09-10 22:36       ` Arne Babenhauserheide
2017-09-14  7:44       ` Mark H Weaver
2017-09-10 16:53   ` Arne Babenhauserheide
2017-09-13  2:24   ` Mark H Weaver
2017-09-13 19:38     ` Arne Babenhauserheide
2017-09-13 19:57       ` David Kastrup
2017-09-13 20:37         ` Arne Babenhauserheide
2017-09-14 19:14     ` Linas Vepstas
2017-09-11 18:13 ` Hans Åberg
  -- strict thread matches above, loose matches on Subject: below --
2017-09-11  7:42 Kjetil Matheussen
2017-09-14 18:41 ` Linas Vepstas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).