* out-of-control GC
@ 2017-09-09 19:47 Linas Vepstas
2017-09-10 4:14 ` Linas Vepstas
2017-09-11 18:13 ` Hans Åberg
0 siblings, 2 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-09 19:47 UTC (permalink / raw)
To: guile-user@gnu.org
I've been experiencing problems with guile GC for a while now. I've mostly
been able to ignore them, but they are becoming debilitating, and I need
help, or need to find some solution.
Background: I have a C++ library that has guile bindings; it is wrapped up
as a guile module. It manages a lot of data, anywhere from a few gigabytes
to over a hundred gigabytes. Almost all of this RAM usage should be
invisible, opaque to guile: so, for example, while my library is managing
5 or 10 GBytes, guile itself is managing maybe 20 or 50 or 100 MBytes,
according to (gc-stats). This is normal and expected. Some of my programs
have guile use considerably more than this, in the multi-GByte range.
However, the larger the RAM usage, the more frequently guile GC seems to
run, and the longer it takes, so that by the time my dataset is 100 GBytes,
guile is using maybe 1 GByte and is spending 99% of its time in GC. My app
slows to a crawl, and nothing gets done. I need to get GC under control,
because it is running pointlessly: it's clear that (almost) nothing can be
reclaimed, and (usually) nothing is being reclaimed.
Here's my tool:
; Report the average amount of time spent in GC
(define avg-gc-cpu-time
  (let ((last-gc (gc-stats))
        (start-time (get-internal-real-time))
        (run-time (get-internal-run-time)))
    (lambda ()
      (define now (get-internal-real-time))
      (define run (get-internal-run-time))
      (define cur (gc-stats))
      (define gc-time-taken (* 1.0e-9 (- (cdar cur) (cdar last-gc))))
      (define elapsed-time (* 1.0e-9 (- now start-time)))
      (define cpu-time (* 1.0e-9 (- run run-time)))
      (format #t "Elapsed time: ~5f secs. GC: ~5f% cpu-usage: ~5f%\n"
              elapsed-time
              (* 100 (/ gc-time-taken elapsed-time))
              (* 100 (/ cpu-time elapsed-time)))
      (set! last-gc cur)
      (set! start-time now)
      (set! run-time run))))
Here's what it prints:
guile-en-r> (avg-gc-cpu-time)
Elapsed time: 46.03 secs. GC: 342.8% cpu-usage: 346.4%
guile-en-r> (avg-gc-cpu-time)
Elapsed time: 7288. secs. GC: 329.5% cpu-usage: 330.9%
guile-en-r> (avg-gc-cpu-time)
Elapsed time: 3205. secs. GC: 318.0% cpu-usage: 319.4%
Here's a typical gc from the start/middle of this.
> (gc-stats)
((gc-time-taken . 634825386689) (heap-size . 16199680)
 (heap-free-size . 1925120) (heap-total-allocated . 15424779760)
 (heap-allocated-since-gc . 7692512) (protected-objects . 7) (gc-times . 3786))
In this case, the heap is 16 MBytes, little or no work is being done in
guile, and almost all of it is done in the C++ code. GC has taken 634
seconds ... a lot, but acceptable. After this point, the number of GCs
skyrockets -- a GC every few seconds, sometimes tens of them per second. It
sucks up cpu time, and makes the guile prompt very laggy. I'd give more
examples, except that now my app has moved to the stage where it really is
creating giant-sized scheme lists:
> (gc-stats)
((gc-time-taken . 281465535649189) (heap-size . 5175664640)
 (heap-free-size . 1538387968) (heap-total-allocated . 67649385200)
 (heap-allocated-since-gc . 6112) (protected-objects . 7) (gc-times . 11319))
That's right: 5GBytes of scheme stuff, and 281465 seconds = 78 cpu-hours
spent in GC. top shows 116GBytes total in the app, which is normal and
expected.
When my app is about 50GBytes in size, it will run for about 12 hours, with
an outrageous amount of time spent in GC, which I just ignore out of habit,
because I can wait 12 hours. This one, a bigger dataset, appears to be
stalled -- after 24 hours, less than 1% of the computations have been
completed. I'm screwed until I can somehow disable guile GC long enough
for the thing to make forward progress.
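For reference, a minimal sketch of that workaround, assuming Guile's
gc-disable and gc-enable primitives: wrap the hot region in dynamic-wind so
the collector is re-enabled even if the computation throws. The
run-the-big-batch call is a hypothetical stand-in for the real workload.
(define (with-gc-disabled thunk)
  (dynamic-wind
    gc-disable
    thunk
    (lambda () (gc-enable) (gc))))   ; catch up with one collection afterwards
; e.g. (with-gc-disabled (lambda () (run-the-big-batch)))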
I'm sort-of willing to hack on guile internals, except that my project is
already late and I am panicking ...
--linas
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
* Re: out-of-control GC
2017-09-09 19:47 out-of-control GC Linas Vepstas
@ 2017-09-10 4:14 ` Linas Vepstas
2017-09-10 16:50 ` Arne Babenhauserheide
` (2 more replies)
2017-09-11 18:13 ` Hans Åberg
1 sibling, 3 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-10 4:14 UTC (permalink / raw)
To: guile-user@gnu.org
To make this conversation less crazy, and more down-to-earth, here is a
demo that seems to spend most of its time in GC. It's hard for me to say that
this is "wrong", right now, but it illustrates the scandalous nature of the
problem.
It's a simple loop, about 6 lines of code total; it mostly just wastes cpu
time. There's also a longer progress-report printer, similar to the
previous one I posted.
It runs at about 1 million iterations/minute for me, of which 15 seconds
are spent "doing real work" and 45-75 seconds in GC. Note that GC runs in
parallel, usually using about 3-4 cores, so the fraction-cpu-time is the
fraction relative to wall-clock, so 350% cpu time is typical (i.e. 3.5
cpu cores are running).
This seems to be a reasonable model of the behavior that I am seeing.
--linas
; Report the average amount of time spent in GC
(define avg-gc-cpu-time
  (let ((last-gc (gc-stats))
        (start-time (get-internal-real-time))
        (run-time (get-internal-run-time)))
    (lambda ()
      (define now (get-internal-real-time))
      (define run (get-internal-run-time))
      (define cur (gc-stats))
      (define gc-time-taken (* 1.0e-9 (- (cdar cur) (cdar last-gc))))
      (define elapsed-time (* 1.0e-9 (- now start-time)))
      (define cpu-time (* 1.0e-9 (- run run-time)))
      (define non-gc-cpu (- cpu-time gc-time-taken))
      (define ngc (- (assoc-ref cur 'gc-times)
                     (assoc-ref last-gc 'gc-times)))
      (format #t "Elapsed: ~5f secs. non-gc: ~5f secs Rate: ~5f gc/min %GC: ~5f% %cpu: ~5f%\n"
              elapsed-time
              non-gc-cpu
              (/ (* ngc 60) elapsed-time)
              (* 100 (/ gc-time-taken elapsed-time))
              (* 100 (/ cpu-time elapsed-time)))
      (set! last-gc cur)
      (set! start-time now)
      (set! run-time run))))
; a tail-recursive loop, seems to waste a lot of time in gc.
(define (make-bgl lis cnt)
  ; a bogus longer list
  (define longer (cons (format #f "~A" cnt) lis))
  ; periodically report statistics
  (if (eq? 0 (modulo cnt (* 1000 1000)))
      (begin (format #t "~A " cnt) (avg-gc-cpu-time)))
  ; periodically trim the list
  (if (eq? 0 (modulo cnt (* 4123 1123)))
      (set! longer '()))
  ; loop for a while.
  (if (eq? 0 cnt) lis
      (make-bgl longer (- cnt 1))))
; (define foo (make-bgl '() (* 1235 1000 1000)))
On Sat, Sep 9, 2017 at 2:47 PM, Linas Vepstas <linasvepstas@gmail.com>
wrote:
>
> I've been experiencing problems with guile GC for a while now, that I've
> mostly been able to ignore, but that are becoming debilitating, and I need
> help, or need to find some solution.
>
> Background: I have a C++ library that has guile bindings; it is wrapped up
> as a guile module. It manages a lot of data, anywhere from a few gigabytes,
> to a over a hundred gigabytes. Almost all this RAM usage should be
> invisible, opaque to guile: so, for example, while my library is managing
> 5 or 10 GBytes, guile itself is managing maybe 20 or 50 or 100MBytes,
> according to (gc-stats). This is normal and expected. Some of my programs
> have guile use considerably more than this, in the multi-GByte range
>
> However, the larger the RAM usage, the more frequently guile GC seems to
> run, and the longer it takes, so that by the time my dataset is 100GBytes,
> guile is using maybe 1GBytes, and is spending 99% of its time in GC. My app
> slows to a crawl, and nothing gets done. I need to get GC under control,
> because it is running pointlessly, because its clear that (almost) nothing
> can be reclaimed, (usually) nothing is being reclaimed.
>
> Here's my tool:
>
> ; Report the average amount of time spent in GC
> (define avg-gc-cpu-time
> (let ((last-gc (gc-stats))
> (start-time (get-internal-real-time))
> (run-time (get-internal-run-time)))
> (lambda ()
> (define now (get-internal-real-time))
> (define run (get-internal-run-time))
> (define cur (gc-stats))
> (define gc-time-taken (* 1.0e-9 (- (cdar cur) (cdar last-gc))))
> (define elapsed-time (* 1.0e-9 (- now start-time)))
> (define cpu-time (* 1.0e-9 (- run run-time)))
> (format #t "Elapsed time: ~5f secs. GC: ~5f% cpu-usage:
> ~5f%\n"
> elapsed-time
> (* 100 (/ gc-time-taken elapsed-time))
> (* 100 (/ cpu-time elapsed-time))
> )
> (set! last-gc cur)
> (set! start-time now)
> (set! run-time run))))
>
> Here's what it prints:
>
> guile-en-r> (avg-gc-cpu-time)
> Elapsed time: 46.03 secs. GC: 342.8% cpu-usage: 346.4%
>
> guile-en-r> (avg-gc-cpu-time)
> Elapsed time: 7288. secs. GC: 329.5% cpu-usage: 330.9%
>
> guile-en-r> (avg-gc-cpu-time)
> Elapsed time: 3205. secs. GC: 318.0% cpu-usage: 319.4%
>
> Here's a typical gc from the start/middle of this.
>
> > (gc-stats)
> ((gc-time-taken . 634825386689) (heap-size . 16199680) (heap-free-size .
> 1925120) (heap-total-allocated . 15424779760) (heap-allocated-since-gc .
> 7692512) (protected-objects . 7) (gc-times . 3786))
>
> In this case, the heap is 16MBytes, little or no work is being done in
> guile, almost all is done in the C++ code. GC has taken 634 seconds ... a
> lot but acceptable. After this point, the number of GC's skyrockets -- a
> gc ever few seconds, sometimes tens of them per second. It sucks up cpu
> time, and makes the guile prompt be very laggy. I'd give more examples,
> except that now, my app has moved to the stage where it really is creating
> giant-sized scheme lists:
>
> > (gc-stats)
> ((gc-time-taken . 281465535649189) (heap-size . 5175664640)
> (heap-free-size . 1538387968) (heap-total-allocated . 67649385200)
> (heap-allocated-since-gc . 6112) (protected-objects . 7) (gc-times . 11319))
>
> That's right: 5GBytes of scheme stuff, and 281465 seconds = 78 cpu-hours
> spent in GC. top shows 116GBytes total in the app, which is normal and
> expected.
>
> When my app is about 50GBytes in size, it will run for about 12 hours,
> with an outrageous amount of time spent in GC, which I just ignore out of
> habit, because I can wait 12 hours. This one, a bigger dataset, appears to
> be stalled -- after 24 hours, less than 1% of the computations have been
> completed. I'm screwed until I can somehow disable guile GC long enough
> for the thing to make forward progress.
>
> I'm sort-of willing to hack on guile internals, except that my project is
> already late and I am panicking ...
>
> --linas
>
> --
> *"The problem is not that artificial intelligence will get too smart and
> take over the world," computer scientist Pedro Domingos writes, "the
> problem is that it's too stupid and already has." *
>
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
* Re: out-of-control GC
2017-09-10 4:14 ` Linas Vepstas
@ 2017-09-10 16:50 ` Arne Babenhauserheide
2017-09-10 18:03 ` David Kastrup
2017-09-10 20:23 ` Linas Vepstas
2017-09-10 16:53 ` Arne Babenhauserheide
2017-09-13 2:24 ` Mark H Weaver
2 siblings, 2 replies; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-10 16:50 UTC (permalink / raw)
To: linasvepstas; +Cc: guile-user@gnu.org
Hi Linas,
Linas Vepstas <linasvepstas@gmail.com> writes:
> To make this conversation less crazy, and more down-to-earth, here is a
> demo that seems to spend most of its time in GC. Hard for me to to say that
> this is "wrong", right now, but it illustrates the scandalous nature of the
> problem.
Do you have many string operations in your code?
If I replace
(cons (format #f "~A" cnt) lis)
with
(cons cnt lis)
GC is down to ~20% (from around 130%).
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
* Re: out-of-control GC
2017-09-10 16:50 ` Arne Babenhauserheide
@ 2017-09-10 18:03 ` David Kastrup
2017-09-10 20:23 ` Linas Vepstas
1 sibling, 0 replies; 29+ messages in thread
From: David Kastrup @ 2017-09-10 18:03 UTC (permalink / raw)
To: guile-user
Arne Babenhauserheide <arne_bab@web.de> writes:
> Hi Linas,
>
>
> Linas Vepstas <linasvepstas@gmail.com> writes:
>
>> To make this conversation less crazy, and more down-to-earth, here is a
>> demo that seems to spend most of its time in GC. Hard for me to to say that
>> this is "wrong", right now, but it illustrates the scandalous nature of the
>> problem.
>
> Do you have many string operations in your code?
>
> If I replace
>
> (cons (format #f "~A" cnt) lis)
>
> with
>
> (cons cnt lis)
>
> GC is down to ~20% (from around 130%).
But the main point of using format (rather than string appends) is to
avoid creating lots of garbage-collected temporary expressions. A
factor of 6 seems to suggest that this expectation is not actually being
accommodated.
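One way to check that expectation directly (a rough probe, assuming the
(gc-stats) alist shown earlier in the thread; absolute numbers vary by Guile
version and count allocation from all threads) is to compare how many bytes
each formatting approach allocates for the same job:
(define (bytes-allocated-by thunk)
  (let ((before (assoc-ref (gc-stats) 'heap-total-allocated)))
    (thunk)
    (- (assoc-ref (gc-stats) 'heap-total-allocated) before)))
(define (repeat n thunk)
  (let loop ((i 0))
    (when (< i n) (thunk) (loop (+ i 1)))))
; compare the two approaches discussed above:
(bytes-allocated-by (lambda () (repeat 100000 (lambda () (format #f "~A" 42)))))
(bytes-allocated-by (lambda () (repeat 100000 (lambda () (number->string 42)))))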
--
David Kastrup
* Re: out-of-control GC
2017-09-10 16:50 ` Arne Babenhauserheide
2017-09-10 18:03 ` David Kastrup
@ 2017-09-10 20:23 ` Linas Vepstas
2017-09-10 20:36 ` Marko Rauhamaa
` (3 more replies)
1 sibling, 4 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-10 20:23 UTC (permalink / raw)
To: Arne Babenhauserheide; +Cc: guile-user@gnu.org
Hi Arne,
On Sun, Sep 10, 2017 at 11:50 AM, Arne Babenhauserheide <arne_bab@web.de>
wrote:
> Hi Linas,
>
>
> Linas Vepstas <linasvepstas@gmail.com> writes:
>
> > To make this conversation less crazy, and more down-to-earth, here is a
> > demo that seems to spend most of its time in GC. Hard for me to to say
> that
> > this is "wrong", right now, but it illustrates the scandalous nature of
> the
> > problem.
>
> Do you have many string operations in your code?
>
> If I replace
>
> (cons (format #f "~A" cnt) lis)
>
> with
>
> (cons cnt lis)
>
> GC is down to ~20% (from around 130%).
>
That demo tried to capture the spirit of a more complex system. The cons
that I have is actually (cons (wrapper-for-some-c++-code foobar) lis) or
maybe (cons (scheme-that-calls-other-scheme-that-calls-c++ foo) lis) ...
but it's not just cons -- it can also be for-each, map, or fold.
The actual printouts in my instrumented code look like this:
guile-en-r> (avg-gc-cpu-time)
Elapsed: 4461. secs. Rate: 4.236 gc/min %cpu-GC: 280.8% %cpu-use: 284.0%
guile-en-r> (avg-gc-cpu-time)
Elapsed: 1501. secs. Rate: 4.077 gc/min %cpu-GC: 280.4% %cpu-use: 283.8%
guile-en-r> (gc-stats)
((gc-time-taken . 353452428102078) (heap-size . 6485340160)
 (heap-free-size . 3497791488) (heap-total-allocated . 100778445248)
 (heap-allocated-since-gc . 9248) (protected-objects . 7) (gc-times . 12894))
which suggests that the vast majority of time is spent in GC, and not in
either scheme/guile, or my wrapped c++ code.
4 GCs/minute means one every 15 seconds, which sounds reasonable, except
that RAM usage is so huge that each GC takes many seconds to complete. By
examining heap-free-size before and after, there is almost no change --
i.e. there was very little to collect. Hmm ... perhaps I should monitor
that more closely, to see how true that really is ...
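A quick sketch of that closer monitoring, assuming the same (gc-stats)
fields used above: force a collection and report how much of the in-use
heap it actually gave back.
(define (gc-and-report)
  (define (in-use stats)
    (- (assoc-ref stats 'heap-size) (assoc-ref stats 'heap-free-size)))
  (let ((before (in-use (gc-stats))))
    (gc)
    (let ((after (in-use (gc-stats))))
      (format #t "in-use: ~a -> ~a bytes (reclaimed ~a)\n"
              before after (- before after)))))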
Anyway, this demo gives a scenario where gc seems to run too often. I have
other scenarios where it does not run often enough -- I have one where the
working set is about 200MBytes, but guile happily grows to 10GB or 20GB RAM
over a few hours. My solution for that other case is to manually GC
whenever gc-stats gets above 1GB, because I know a priori my working set is
smaller than that.
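That manual workaround is essentially this sketch, with the 1GB threshold
being the a-priori bound on the working set (heap-size is the same field
reported by gc-stats above):
(define gc-threshold (* 1024 1024 1024))   ; a-priori bound on the working set
(define (maybe-gc)
  (when (> (assoc-ref (gc-stats) 'heap-size) gc-threshold)
    (gc)))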
The overall message seems to be that gc either runs too often, or not often
enough. Some sort of better planning is needed.
I'm motivated to explore this, as it actually impacts my work.
--linas
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
* Re: out-of-control GC
2017-09-10 20:23 ` Linas Vepstas
@ 2017-09-10 20:36 ` Marko Rauhamaa
2017-09-10 21:47 ` Linas Vepstas
2017-09-13 17:13 ` David Kastrup
2017-09-10 22:17 ` Arne Babenhauserheide
` (2 subsequent siblings)
3 siblings, 2 replies; 29+ messages in thread
From: Marko Rauhamaa @ 2017-09-10 20:36 UTC (permalink / raw)
To: Linas Vepstas; +Cc: guile-user@gnu.org
Linas Vepstas <linasvepstas@gmail.com>:
> which suggests that the vast majority of time is spent in GC, and not
> in either scheme/guile, or my wrapped c++ code.
That may well be normal.
> 4 gc's/minute means one every 15 seconds, which sounds reasonable,
> except that RAM usage is so huge, that each GC takes many seconds to
> complete.
That may or may not be normal. However, it should not be a function of
how much RAM has been allocated outside Guile's heap.
> Anyway, this demo gives a scenario where gc seems to run too often.
How often is too often? I would imagine the objective is to minimize the
stop-the-world periods. IOW, you usually want the program to run very
steadily.
> I have other scenarios where it does not run often enough -- I have
> one where the working set is about 200MBytes, but guile happily grows
> to 10GB or 20GB RAM over a few hours.
You mean the size of Guile's heap? That seems pathological. Maybe there
are lots of values in your data that accidentally look like heap
addresses.
> My solution for that other case is to manually GC whenever gc-stats
> gets above 1GB, because I know a priori my working set is smaller than
> that.
>
> The overall message seems to be that gc either runs too often, or not
> often enough. Some sort of better planning is needed.
>
> I'm motivated to explore this, as it actually impacts my work.
Absolutely, although as an application programmer you should be able to
trust that GC just works.
Marko
* Re: out-of-control GC
2017-09-10 20:36 ` Marko Rauhamaa
@ 2017-09-10 21:47 ` Linas Vepstas
2017-09-10 22:38 ` Marko Rauhamaa
2017-09-11 8:22 ` tomas
2017-09-13 17:13 ` David Kastrup
1 sibling, 2 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-10 21:47 UTC (permalink / raw)
To: Marko Rauhamaa; +Cc: guile-user@gnu.org
On Sun, Sep 10, 2017 at 3:36 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Linas Vepstas <linasvepstas@gmail.com>:
> > which suggests that the vast majority of time is spent in GC, and not
> > in either scheme/guile, or my wrapped c++ code.
>
> That may well be normal.
>
It might be "normal", but its objectionable. If 90% of the cpu time is
spent in GC, and almost nothing at all was actually collected, then its too
much. On the Amazon cloud, you get to pay by the minute. On laptops with
batteries, people complain. Especially when they are apple users, not linux
users.
If the compute job takes a week, and your user has discovered this GC
thing, then the user will conclude that it should take less than 24 hours,
and will tell you that you are stupid, and that guile sucks. I have been
told I'm stupid, and that guile sucks too many times.
CPU cycles aren't free.
> > 4 gc's/minute means one every 15 seconds, which sounds reasonable,
> > except that RAM usage is so huge, that each GC takes many seconds to
> > complete.
>
> That may or may not be normal. However, it should not be a function of
> how much RAM has been allocated ourside Guile's heap.
>
The bigger the RAM usage, the slower it seems to be. It might be due to
fragmentation. It might be due to memory-bandwidth effects: if GC touches
every page or every other page, out of 100GB, it takes time to move those
pages from RAM through 3rd, 2nd and 1st-level cache. DDR3 has a bandwidth
of maybe 25GB/sec if you are lucky; less, depending on the northbridge and
cpu design.
It's possible/likely that mixed c++ and guile code results in heavily
fragmented memory, where every 4K page has 100 bytes of guile, and 3900
bytes of non-collectible memory. The TLB will get severely thrashed in
this scenario. TLB's are infamously under-sized, and have been for decades.
> > Anyway, this demo gives a scenario where gc seems to run too often.
>
> How often is too often? I would imagine the objective is to minimize the
> stop-the-world periods. IOW, you usually want the program to run very
> steadily.
>
It's too often if GC collected less than 10% of the current in-use heap.
In the current case I am looking at, stop-the-world lasts about 5 seconds,
and the guile in-use heap changes by less than 1% before and after.
>
> > I have other scenarios where it does not run often enough -- I have
> > one where the working set is about 200MBytes, but guile happily grows
> > to 10GB or 20GB RAM over a few hours.
>
> You mean the size of Guile's heap?
Yes.
> That seems pathological.
Yes.
> Maybe there
> are lots of values in your data that accidentally look like heap
> addresses.
>
Maybe. But probably not. For this particular scenario, there are vast
quantities of 1MB to 10MB sized C strings that are sent to
scm_eval_string(), and are then freed. These C strings are approx 10%
scheme s-exps, and about 90% utf8 strings in various natural languages
(French, Chinese, etc.).
However, guile uses GC_malloc_atomic for these strings, and so GC should
never-ever peer into their contents. So I don't think it's "accidental
heap addresses". It's probably fragmentation, and the very heavy churn of
large malloc chunks.
> > My solution for that other case is to manually GC whenever gc-stats
> > gets above 1GB, because I know a priori my working set is smaller than
> > that.
> >
> > The overall message seems to be that gc either runs too often, or not
> > often enough. Some sort of better planning is needed.
> >
> > I'm motivated to explore this, as it actually impacts my work.
>
> Absolutely, although as an application programmer you should be able to
> trust that GC just works.
>
Well, yes, but I also have to handle user feedback about where I should
stick it.
The c++ code that I manage uses smart pointers for memory management. I
have no idea what fraction of cpu time is spent in that code, as there is
no simple, practical way of instrumenting that code and measuring it. It
might be a sloppy disaster for all I know -- but I don't know.
By contrast, the guile gc can be directly observed, and some of my users
observe it. Myself, I'm willing to partly ignore it, but it keeps coming up
as a possible bottleneck.
So let me redirect: is there some central spot within guile, where the
decision to run GC is made? If so, what file would that be in? If not, is
there some string I should grep for?
--linas
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
* Re: out-of-control GC
2017-09-10 21:47 ` Linas Vepstas
@ 2017-09-10 22:38 ` Marko Rauhamaa
2017-09-11 16:17 ` Arne Babenhauserheide
2017-09-14 18:31 ` Linas Vepstas
2017-09-11 8:22 ` tomas
1 sibling, 2 replies; 29+ messages in thread
From: Marko Rauhamaa @ 2017-09-10 22:38 UTC (permalink / raw)
To: Linas Vepstas; +Cc: guile-user@gnu.org
Linas Vepstas <linasvepstas@gmail.com>:
> On Sun, Sep 10, 2017 at 3:36 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>
>> Linas Vepstas <linasvepstas@gmail.com>:
>> > which suggests that the vast majority of time is spent in GC, and not
>> > in either scheme/guile, or my wrapped c++ code.
>>
>> That may well be normal.
>
> It might be "normal", but its objectionable. If 90% of the cpu time is
> spent in GC, and almost nothing at all was actually collected, then
> its too much.
Hard to say. I don't know how much garbage Guile generates as part of
its normal execution model, but for example Python generates garbage
with each method call. It is very natural for each application of a Lisp
function to create a throw-away data structure through substitution.
> On the Amazon cloud, you get to pay by the minute. On laptops with
> batteries, people complain. Especially when they are apple users, not
> linux users.
GC overhead is like RAM cache overhead or C function call overhead. It's
the price of doing business.
> If the compute job takes a week, and your user has discovered this GC
> thing, then the user will conclude that it should take less than 24
> hours, and will tell you that you are stupid, and that guile sucks. I
> have been told I'm stupid, and that guile sucks too many times.
>
> CPU cycles aren't free.
I have no idea how good Guile's performance is. Python's performance is
probably about 100 times worse than C, yet it's fast becoming the
most popular programming language in the world.
Not everything in the world is done for performance. The #1 purpose for
a programming language is managing complexity. Relatively few
computation tasks require excellent performance, and there are other
programming languages for that: C, C++, Java, C#, Go.
Note that Java, C# and Go are performant despite GC. GC generally
doesn't have to be all that costly throughput-wise. The biggest
performance drain for high-level programming languages such as Scheme or
Python is the absence of compile-time typing. In fact, Python is adding
optional static type annotation (which I'm afraid might be a big
mistake).
> However, guile uses gc_malloc_atomic for these strings, and so GC
> should never-ever peer into the contents of them. So I don't think its
> "accidental heap addresses". Its probably fragmentation, and the very
> heavy churn of large malloc chunks.
You aren't thrashing by any chance...?
> So let me redirect: is there some central spot within guile, where the
> decision to run GC is made? If so, what file would that be in? If not,
> is there some string I should grep for?
I believe that decision is left for GC_malloc(1L):
GC_malloc will attempt to reclaim inaccessible space automatically by
invoking a conservative garbage collector at appropriate points.
Marko
* Re: out-of-control GC
2017-09-10 22:38 ` Marko Rauhamaa
@ 2017-09-11 16:17 ` Arne Babenhauserheide
2017-09-14 18:31 ` Linas Vepstas
1 sibling, 0 replies; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-11 16:17 UTC (permalink / raw)
To: Marko Rauhamaa; +Cc: guile-user@gnu.org
Marko Rauhamaa <marko@pacujo.net> writes:
> I have no idea how good Guile's performance is. Python's performance is
> probably about 100 times worse than C, yet it's the fast becoming the
> most popular programming language in the world.
One of the reasons for that is that you can get native C performance for
most non-trivial tasks by writing an extension in C.
If I understand it right, Linas is doing just that for Guile, and that
shows problems, which means that something is blocking the path to
much wider use cases. So getting that fixed would be great!
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
* Re: out-of-control GC
2017-09-10 22:38 ` Marko Rauhamaa
2017-09-11 16:17 ` Arne Babenhauserheide
@ 2017-09-14 18:31 ` Linas Vepstas
2017-09-14 18:58 ` Marko Rauhamaa
1 sibling, 1 reply; 29+ messages in thread
From: Linas Vepstas @ 2017-09-14 18:31 UTC (permalink / raw)
To: Marko Rauhamaa; +Cc: guile-user@gnu.org
Hi Marko,
A casual reply, then ...
On Sun, Sep 10, 2017 at 5:38 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Linas Vepstas <linasvepstas@gmail.com>:
>
> > On Sun, Sep 10, 2017 at 3:36 PM, Marko Rauhamaa <marko@pacujo.net>
> wrote:
> >
>
> Not everything in the world is done for performance. The #1 purpose for
> a programming language is managing complexity. Relatively few
> computation tasks require excellent performance, and there are other
> programming languages for that: C, C++, Java, C#, go
>
I agree in principle; the guts of my system are in C++, the casual
computations that string together the pipeline are in guile.
There are some 50+ dialects of scheme, and somewhere on the net is a giant
page of benchmarks comparing them. Guile ranks about 10th or 11th in that
list, which isn't bad, except that many of the #1 and #2 spots are
something like 10x faster. I'd like to see guile improve in those
rankings.
>
> Note that Java, C# and go are performant despite GC.
Well, some decades ago, there were a couple of PhD's down the hall from me,
and they were full-time assigned to improving java GC performance. That's
all they did, for years. And this was a long time ago. If Java weren't good
at this by now, it would be incomprehensible.
> The biggest
> performance drain for high-level programming languages such as Scheme or
> Python is the absence of compile-time typing. In fact, Python is adding
> optional static type annotation (which I'm afraid might be a big
> mistake).
>
Well that's a can of worms. Aside from static typing to benefit the
compiler, there's static typing to help the programmer understand what the
heck is being passed as an argument to some poorly documented, overly-long
function. This has always been a kind-of bugaboo of reading other people's
scheme code -- it's too often too hard to understand.
This is also why python is popular: any shmo feels they can code in it; it's
the new visual-basic of the 21st century, with all of the culture and
code-quality that implies.
One "obvious" solution to adding types in scheme is to go to caml/haskell,
but this does not obviously improve readability. (Although types are
effing fantastic from the math/theoretical side: my system makes very, very
heavy use of types and type constructors, but it's unrelated to scheme.)
For a certain class of problems, coding in python simply doesn't work:
graph data structures are naturally recursive and functional; they're a
natural fit for scheme, and are hard/painful/nearly impossible to
manipulate in python.
>
> You aren't thrashing by any chance...?
>
No.
>
>
> I believe that decision is left for GC_malloc(1L):
>
OK thanks
--linas
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
* Re: out-of-control GC
2017-09-14 18:31 ` Linas Vepstas
@ 2017-09-14 18:58 ` Marko Rauhamaa
2017-09-14 20:15 ` Linas Vepstas
2017-09-14 20:45 ` Arne Babenhauserheide
0 siblings, 2 replies; 29+ messages in thread
From: Marko Rauhamaa @ 2017-09-14 18:58 UTC (permalink / raw)
To: Linas Vepstas; +Cc: guile-user@gnu.org
Linas Vepstas <linasvepstas@gmail.com>:
> Well that's a can of worms. Aside from static typing to benefit the
> compiler, there's static typing to help the programmer understand what
> the heck is being passed as an argument to some poorly documented,
> overly-long function. This has always been a kind-of bugaboo of
> reading other people's scheme code -- its too often too hard to
> understand.
>
> This is also why python is popular: any shmo feels they can code in
> it; its the new visual-basic of the 21st century, with all of the
> culture and code-quality that implies.
I strongly disagree. Python is popular because its level of abstraction
is optimal for a whole slew of use cases. I see no reason why Scheme
couldn't surpass Python some day in the future for the same reason.
Not everybody can write software in good style. In fact, in my
experience very few programmers can do it. That is not a property of the
programming language.
Static type declarations have undeniable quality benefits, but in my
experience the noise they introduce is also a quality problem. One of
the prime quality boosters for high-level programming languages is the
clear but concise expressivity. You can see the forest for the trees.
> For a certain class of problems, coding in python simply doesn't work:
> graph data structures are naturally recursive and functional; they're
> a natural fit for scheme, and are hard/painful/nearly impossible to
> manipulate in python.
I don't see that.
The greatest benefit of Scheme (and other Lisps) is the data/code
unification. All code is data and all data is code. Apart from that,
Python has adopted almost everything from Lisp. Now, Scheme and Lisp
famously allow you to expand the language syntax dynamically, but I'd
say you most often shouldn't use that power. Good Scheme code embraces
the lambda and leaves ad-hoc syntax aside.
Marko
* Re: out-of-control GC
2017-09-14 18:58 ` Marko Rauhamaa
@ 2017-09-14 20:15 ` Linas Vepstas
2017-09-14 20:36 ` David Kastrup
2017-09-14 20:45 ` Arne Babenhauserheide
1 sibling, 1 reply; 29+ messages in thread
From: Linas Vepstas @ 2017-09-14 20:15 UTC (permalink / raw)
To: Marko Rauhamaa; +Cc: guile-user@gnu.org
Hi Marko, long off-topic reply about the nature of programming languages.
On Thu, Sep 14, 2017 at 1:58 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Linas Vepstas <linasvepstas@gmail.com>:
>
> > Well that's a can of worms. Aside from static typing to benefit the
> > compiler, there's static typing to help the programmer understand what
> > the heck is being passed as an argument to some poorly documented,
> > overly-long function. This has always been a kind-of bugaboo of
> > reading other people's scheme code -- its too often too hard to
> > understand.
> >
> > This is also why python is popular: any shmo feels they can code in
> > it; its the new visual-basic of the 21st century, with all of the
> > culture and code-quality that implies.
>
> I strongly disagree. Python is popular because its level of abstraction
> is optimal for a whole slew of use cases. I see no reason why Scheme
> couldn't surpass Python some day in the future for the same reason.
>
What would that level of abstraction be? Racket (ex PLT Scheme) seems to
have taken steps in that direction; I can't tell if they've been successful
at this. Should guile move in that direction?
>
> Not everybody can write software in good style. In fact, in my
> experience very few programmers can do it. That is not a property of the
> programming language.
>
Perhaps. Back in the day, when young programmers were learning C++ for the
first time, they felt obligated to use each of the nifty new tricks they
just learned: multiple inheritance, overloaded operators, virtual methods,
virtual base classes, templates: not just use, but abuse these, even when
something simpler was available. It was like working with quick-set
concrete: after the code was written, it seemed like it was impossible to
make any significant changes to it that would even compile, much less do
the right thing. This kind of code was hard to read because it was badly
written.
Sometimes, with scheme, I get the sense that if you haven't read SICP you
can't understand what the code does: that you can't understand the code
because it's well-written.
>
> Static type declarations have undeniable quality benefits, but in my
> experience the noise they introduce is also a quality problem. One of
> the prime quality boosters for high-level programming languages is the
> clear but concise expressivity. You can see the forest for the trees.
>
Speaking of SICP -- one of the early examples they have there is for
working with 2D numeric data, and using either (x,y) under the covers, or
using (r,theta). Which is all very spiffy, except that, for the user who
just glances at the function signature, it's not clear exactly what that
function was expecting to get as arguments.
Somehow, that doesn't happen in Python, even in its current untyped state.
But I cannot put my finger on it; it's a gut-sense thing.
>
> > For a certain class of problems, coding in python simply doesn't work:
> > graph data structures are naturally recursive and functional; they're
> > a natural fit for scheme, and are hard/painful/nearly impossible to
> > manipulate in python.
>
> I don't see that.
>
Hmm. I have C++ code with guile, cython and haskell bindings. It is used
to work with data structures that resemble json-like nested, recursive
tree-structures. By json-like I mean they are trees, with various
different types at the vertexes. A graph (directed or non-directed, with
loops in general) is specified by specifying the various trees that partly
span the graph. I kind of wish I had javascript bindings for this thing,
but can't figure out how.
The guile bindings for this are very elegant, and are about as readable as
json is. The python bindings are real clunkers -- they're much more
verbose, by factors of 3x to 5x. To add insult to injury, the insanely
stupid python whitespace rules mean that it is impossible to indent them
into the natural tree structure that they naturally "want" to have.
Perhaps our python bindings are mis-designed. They were created by a python
fanatic to be "pythonic" which I think means "I don't know how to code"
:-( Our guile bindings are also mis-designed (by me), I regret many of
the decisions I made... but they're still nicer than the python...
> The greatest benefit of Scheme (and other Lisps) is the data/code
> unification. All code is data and all data is code. Apart from that,
> Python has adopted almost everything from Lisp. Now, Scheme and Lisp
> famously allow you to expand the language syntax dynamically, but I'd
> say you most often shouldn't use that power. Good Scheme code embraces
> the lambda and leaves ad-hoc syntax aside.
>
Since we are off-topic, I want to mention two things that scheme doesn't
have that I'm supremely interested in. Of course, it would not be scheme,
any more, once these were added, but these are things important for some
unspecified future.
One is a knowledge-representation language subset. Something like the
datalog subset of prolog, or something like SQL: some subset of scheme that
can be manipulated and queried and persisted to a file. Think pattern
matching .. but not the current anemic, super-simplistic model used in
scheme, but something full-featured, like SQL. Applied not to tables, but
to s-exps.
Another neato trick would be some kind of term-rewriting system. Something
kind-of-ish like the scheme macro subsystem, but instead, the macros could
be invoked repeatedly, ad infinitum, on arbitrary data, instead of just
once on fixed data. Again, kind-of-like SQL query statements: you can
re-use them over and over, even as the table of data underneath it all
changes. Pattern-matching, all grown up.
If you had these two, you'd have a graph DB that far exceeded what the
tinkerpop folks can do, and with it being scheme ... or at least s-exps ...
--linas
>
>
> Marko
>
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
* Re: out-of-control GC
2017-09-14 20:15 ` Linas Vepstas
@ 2017-09-14 20:36 ` David Kastrup
0 siblings, 0 replies; 29+ messages in thread
From: David Kastrup @ 2017-09-14 20:36 UTC (permalink / raw)
To: guile-user
Linas Vepstas <linasvepstas@gmail.com> writes:
> Perhaps. Back in the day, when young programmers were learning C++
> for the first time, they felt obligated to use each of the nifty new
> tricks they just learned: multiple inheritance, overloaded operators,
> virtual methods, virtual base classes, templates: not just use, but
> abuse these, even when something simpler was available. It was like
> working with quick-set concrete: after the code was written, it seemed
> like it was impossible to make any significant changes to it that
> would even compile, much less do the right thing. This kind of code
> was hard to read cause it was badly written.
The language C++ has had a long-running inferiority complex towards
FORTRAN, basically with the motto "we can do the same, but user-defined,
and still get native performance of a statically typed system". This
gave it a monstrous, and actually unpredictably contorted type
conversion lattice in order to simulate Fortran's builtin complex types.
It also gave birth to a legion of attempts to do efficient
runtime-dimensioned multidimensional arrays, with the result that there
are no numeric libraries equivalent in usability and efficiency to
half-century old FORTRAN libraries like BLAS and LINPACK since there
still is no agreed-upon efficient multidimensional array type, let alone
one where the compiler can do good strength reduction.
This "we can do the same, but user-defined, and still get native
performance of a statically typed system" mantra pervades all language
standardization processes and basically leads to a Turing-complete
template, type and ambiguity resolution system that is a wholly
independent layer akin to the C preprocessor but syntactically
structure-preserving.
Scheme (and its overall LISP family) has no language syntax to speak of
and its input _is_ already a data structure, so "structure-preserving"
has always been a given for its preprocessing. It also never really was
into static typing, but "JIT compilation" allows to reduce the impact of
that decision considerably.
> Sometimes, with scheme, I get the sense that if you haven't read SICP
> you can't understand what the code does: that you can't understand the
> code cause it's well-written.
Well, that does not sound like a good language design goal either.
--
David Kastrup
* Re: out-of-control GC
2017-09-14 18:58 ` Marko Rauhamaa
2017-09-14 20:15 ` Linas Vepstas
@ 2017-09-14 20:45 ` Arne Babenhauserheide
1 sibling, 0 replies; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-14 20:45 UTC (permalink / raw)
To: Marko Rauhamaa; +Cc: guile-user@gnu.org
Marko Rauhamaa <marko@pacujo.net> writes:
> Linas Vepstas <linasvepstas@gmail.com>:
>> This is also why python is popular: any shmo feels they can code in
>> it; its the new visual-basic of the 21st century, with all of the
>> culture and code-quality that implies.
>
> I strongly disagree. Python is popular because its level of abstraction
> is optimal for a whole slew of use cases. I see no reason why Scheme
> couldn't surpass Python some day in the future for the same reason.
I somewhat disagree: Python became popular because code written in
Python is typically easy to understand, even for non-programmers. This
is aided by the Zen of Python, which is a seriously awesome basis for
creating a programming language: https://www.python.org/dev/peps/pep-0020/
Now that so many people are using Python, it is also popular because
there are so many people using it and building things for it.
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
* Re: out-of-control GC
2017-09-10 21:47 ` Linas Vepstas
2017-09-10 22:38 ` Marko Rauhamaa
@ 2017-09-11 8:22 ` tomas
2017-09-14 20:19 ` Linas Vepstas
1 sibling, 1 reply; 29+ messages in thread
From: tomas @ 2017-09-11 8:22 UTC (permalink / raw)
To: Linas Vepstas; +Cc: guile-user@gnu.org
On Sun, Sep 10, 2017 at 04:47:09PM -0500, Linas Vepstas wrote:
[...]
> The bigger the RAM usage, the slower it seems to be. It might be due to
> fragmentation [...]
Your guesses all make much sense (gc and non-gc data mixed at page level,
forcing the gc to touch many pages, etc.).
To just add one more: total memory usage (gc and non-gc) seems to enter
the gc triggering heuristics [1] (I couldn't follow in detail how, and I
don't know whether this is still valid in the 2.2 branch).
It just might be that Guile is seeing the process's outrageous memory
footprint and tries desperately to free some memory (which of course
can't succeed in your case because most memory usage isn't "guile's").
That might explain the gc being called far too often (and without
success).
Take this with two pounds of salt. My knowledge of Guile internals is
pathetic at best.
Cheers
[1] http://git.savannah.gnu.org/cgit/guile.git/tree/libguile/gc.c?h=v2.0.13#n770
-- tomás
* Re: out-of-control GC
2017-09-11 8:22 ` tomas
@ 2017-09-14 20:19 ` Linas Vepstas
0 siblings, 0 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-14 20:19 UTC (permalink / raw)
To: tomas; +Cc: guile-user@gnu.org
Tomas, Hans, Mark,
Thanks, looking at it now.
--linas
On Mon, Sep 11, 2017 at 3:22 AM, <tomas@tuxteam.de> wrote:
>
> To just add one more: total memory usage (gc and non-gc) seems to enter
> the gc triggering heuristics [1] (I couldn't follow in detail how, and I
> don't know whether this is still valid in the 2.2 branch).
>
> [1] http://git.savannah.gnu.org/cgit/guile.git/tree/libguile/gc.c?h=v2.0.13#n770
>
Hans wrote:
> Does it have to do with this issue [1], that the Boehm GC initially uses
> a small amount of heap? One can use, say,
> export GC_INITIAL_HEAP_SIZE=50G
> on the shell command line before running the program.
> 1. https://github.com/Macaulay2/M2/issues/500
Mark wrote:
> If you'd like to learn how Boehm GC decides when to run, I would
> encourage you to read <http://www.hboehm.info/gc/gcdescr.html>. I've
> selected some relevant excerpts below.
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
* Re: out-of-control GC
2017-09-10 20:36 ` Marko Rauhamaa
2017-09-10 21:47 ` Linas Vepstas
@ 2017-09-13 17:13 ` David Kastrup
1 sibling, 0 replies; 29+ messages in thread
From: David Kastrup @ 2017-09-13 17:13 UTC (permalink / raw)
To: guile-user
Marko Rauhamaa <marko@pacujo.net> writes:
> Linas Vepstas <linasvepstas@gmail.com>:
>
>> The overall message seems to be that gc either runs too often, or not
>> often enough. Some sort of better planning is needed.
>>
>> I'm motivated to explore this, as it actually impacts my work.
>
> Absolutely, although as an application programmer you should be able
> to trust that GC just works.
s/although/because/
--
David Kastrup
* Re: out-of-control GC
2017-09-10 20:23 ` Linas Vepstas
2017-09-10 20:36 ` Marko Rauhamaa
@ 2017-09-10 22:17 ` Arne Babenhauserheide
2017-09-10 22:36 ` Arne Babenhauserheide
2017-09-14 7:44 ` Mark H Weaver
3 siblings, 0 replies; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-10 22:17 UTC (permalink / raw)
To: linasvepstas; +Cc: guile-user@gnu.org
Hi Linas,
Linas Vepstas <linasvepstas@gmail.com> writes:
>> If I replace
>>
>> (cons (format #f "~A" cnt) lis)
>>
>> with
>>
>> (cons cnt lis)
>>
>> GC is down to ~20% (from around 130%).
>>
>
> That demo tried to capture the spirit of a more complex system.
That’s great, by the way!
I did that experiment because the string formatting seems to create
lots of values the GC has to collect. If this triggers an investigation
of all memory, any reduction of disposable value creation should help.
Just try putting a (gc-disable) just before you run make-bgl and see the
memory usage shoot through the roof almost instantly.
For the real-code: throw in a
> The cons
> that I have is actually (cons (wrapper-for-some-c++-code foobar) lis) or
> maybe (cons (scheme-that-calls-other-scheme-that-calls-c++ foo) lis) ...
> but its not just cons -- it can also be for-each, map, or fold.
>
> The actual printouts, in my instrumented code look like this:
>
> guile-en-r> (avg-gc-cpu-time)
> Elapsed: 4461. secs. Rate: 4.236 gc/min %cpu-GC: 280.8% %cpu-use: 284.0%
This indeed looks pretty painful.
> Anyway, this demo gives a scenario where gc seems to run too often. I have
> other scenarios where it does not run often enough -- I have one where the
> working set is about 200MBytes, but guile happily grows to 10GB or 20GB RAM
> over a few hours. My solution for that other case is to manually GC
> whenever gc-stats gets above 1GB, because I know a priori my working set is
> smaller than that.
>
> The overall message seems to be that gc either runs too often, or not often
> enough. Some sort of better planning is needed.
A short term solution could be to disable automatic triggering of GC in
cases where it fires too often (compared to the total memory
consumption).
> I'm motivated to explore this, as it actually impacts my work.
That’s great! And it’s awesome that you’re pushing Guile that far. This
is what it needs to become a reliable real-world battered workhorse for
many more people.
A start might be to look at the problem description: GC is triggered too
often when your C++ datastructures are so huge that the growth in
Guile-managed memory is irrelevant, and it is triggered too seldom
when you only manage a small amount of data.
So maybe GC could benefit from a kind of hint from C++ about the size of
the datastructures which allows tuning the GC frequency to the total
amount of memory.
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
* Re: out-of-control GC
2017-09-10 20:23 ` Linas Vepstas
2017-09-10 20:36 ` Marko Rauhamaa
2017-09-10 22:17 ` Arne Babenhauserheide
@ 2017-09-10 22:36 ` Arne Babenhauserheide
2017-09-14 7:44 ` Mark H Weaver
3 siblings, 0 replies; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-10 22:36 UTC (permalink / raw)
To: linasvepstas; +Cc: guile-user@gnu.org
Hi Linas,
Linas Vepstas <linasvepstas@gmail.com> writes:
> On Sun, Sep 10, 2017 at 11:50 AM, Arne Babenhauserheide <arne_bab@web.de>
>> Linas Vepstas <linasvepstas@gmail.com> writes:
>>
>> > To make this conversation less crazy, and more down-to-earth, here
>> > is a demo that seems to spend most of its time in GC. Hard for me
>> > to to say that this is "wrong", right now, but it illustrates the
>> > scandalous nature of the problem.
>>
>> Do you have many string operations in your code?
>>
>> If I replace
>>
>> (cons (format #f "~A" cnt) lis)
>>
>> with
>>
>> (cons cnt lis)
>>
>> GC is down to ~20% (from around 130%).
>>
>
> That demo tried to capture the spirit of a more complex system. The cons
> that I have is actually (cons (wrapper-for-some-c++-code foobar) lis) or
> maybe (cons (scheme-that-calls-other-scheme-that-calls-c++ foo) lis) ...
> but its not just cons -- it can also be for-each, map, or fold.
I looked at format, because it seems to create a lot of values which
need to be collected.
Just try to run your function like this:
(gc-disable)
(define foo (make-bgl '() (* 1235 1000 1000)))
… and see memory usage go through the roof almost instantly.
Then try the same without the format to get a mostly tame memory
behavior.
> The actual printouts, in my instrumented code look like this:
>
> guile-en-r> (avg-gc-cpu-time)
> Elapsed: 4461. secs. Rate: 4.236 gc/min %cpu-GC: 280.8% %cpu-use: 284.0%
That looks painful indeed.
> Anyway, this demo gives a scenario where gc seems to run too often. I have
> other scenarios where it does not run often enough -- I have one where the
> working set is about 200MBytes, but guile happily grows to 10GB or 20GB RAM
> over a few hours. My solution for that other case is to manually GC
> whenever gc-stats gets above 1GB, because I know a priori my working set is
> smaller than that.
>
> The overall message seems to be that gc either runs too often, or not often
> enough. Some sort of better planning is needed.
Maybe Guile needs some additional information from the C++
datastructures about the total size of the managed memory (and the
actual size of datastructures in foreign code) so it can judge whether
the added memory usage it sees is actually already relevant.
> I'm motivated to explore this, as it actually impacts my work.
That’s great! And it is also awesome that you’re pushing Guile this far!
This helps Guile become a reliable tool for many more people.
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
* Re: out-of-control GC
2017-09-10 20:23 ` Linas Vepstas
` (2 preceding siblings ...)
2017-09-10 22:36 ` Arne Babenhauserheide
@ 2017-09-14 7:44 ` Mark H Weaver
3 siblings, 0 replies; 29+ messages in thread
From: Mark H Weaver @ 2017-09-14 7:44 UTC (permalink / raw)
To: Linas Vepstas; +Cc: guile-user
Linas Vepstas <linasvepstas@gmail.com> writes:
> The overall message seems to be that gc either runs too often, or not often
> enough. Some sort of better planning is needed.
There are several variables available to adjust libgc's heuristics, e.g.
GC_use_entire_heap, GC_free_space_divisor, GC_full_frequency, with
associated environment variables, described in:
<https://github.com/ivmai/bdwgc/blob/master/doc/README.environment>
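Untested sketch, not a documented Guile interface: if you want to experiment
from a running image rather than via the environment variables above, the
corresponding libgc setter can be reached through the FFI, assuming your
Guile's libgc exports GC_set_free_space_divisor (and that GC_word is
word-sized, so size_t is used here):
(use-modules (system foreign))
(define gc-set-free-space-divisor
  (pointer->procedure void
                      (dynamic-func "GC_set_free_space_divisor" (dynamic-link))
                      (list size_t)))
; Per the excerpts below, a smaller divisor makes libgc prefer heap growth
; over collection (fewer GCs); a larger one collects more eagerly.
; (gc-set-free-space-divisor 1)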
> I'm motivated to explore this, as it actually impacts my work.
If you'd like to learn how Boehm GC decides when to run, I would
encourage you to read <http://www.hboehm.info/gc/gcdescr.html>. I've
selected some relevant excerpts below.
Mark
http://www.hboehm.info/gc/gcdescr.html
[...]
In non-incremental mode, we make a decision about whether to garbage
collect whenever an allocation would otherwise have failed with the
current heap size. If the total amount of allocation since the last
collection is less than the heap size divided by GC_free_space_divisor,
we try to expand the heap. Otherwise, we initiate a garbage
collection. This ensures that the amount of garbage collection work per
allocated byte remains constant.
The above is in fact an oversimplification of the real heap expansion
and GC triggering heuristic, which adjusts slightly for root size and
certain kinds of fragmentation. In particular:
* Programs with a large root set size and little live heap memory will
expand the heap to amortize the cost of scanning the roots.
* Versions 5.x of the collector actually collect more frequently in
nonincremental mode. The large block allocator usually refuses to
split large heap blocks once the garbage collection threshold is
reached. This often has the effect of collecting well before the heap
fills up, thus reducing fragmentation and working set size at the
expense of GC time. Versions 6.x choose an intermediate strategy
depending on how much large object allocation has taken place in the
past. (If the collector is configured to unmap unused pages, versions
6.x use the 5.x strategy.)
* In calculating the amount of allocation since the last collection we
give partial credit for objects we expect to be explicitly
deallocated. Even if all objects are explicitly managed, it is often
desirable to collect on rare occasion, since that is our only
mechanism for coalescing completely empty chunks.
It has been suggested that this should be adjusted so that we favor
expansion if the resulting heap still fits into physical memory. In many
cases, that would no doubt help. But it is tricky to do this in a way
that remains robust if multiple application are contending for a single
pool of physical memory.
[...]
In incremental mode, the heap is always expanded when we encounter
insufficient space for an allocation. Garbage collection is triggered
whenever we notice that more than GC_heap_size/2 *
GC_free_space_divisor bytes of allocation have taken place. After
GC_full_freq minor collections a major collection is started.
* Re: out-of-control GC
2017-09-10 4:14 ` Linas Vepstas
2017-09-10 16:50 ` Arne Babenhauserheide
@ 2017-09-10 16:53 ` Arne Babenhauserheide
2017-09-13 2:24 ` Mark H Weaver
2 siblings, 0 replies; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-10 16:53 UTC (permalink / raw)
To: linasvepstas; +Cc: guile-user@gnu.org
[-- Attachment #1: Type: text/plain, Size: 492 bytes --]
Linas Vepstas <linasvepstas@gmail.com> writes:
> To make this conversation less crazy, and more down-to-earth, here is a
> demo that seems to spend most of its time in GC. Hard for me to say that
> this is "wrong", right now, but it illustrates the scandalous nature of the
> problem.
My approach was to optimize the function (in REPL: ,optimize <func>) and
then to look for boxed values.
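For instance, at the Guile REPL (a hypothetical session; the optimizer's
output is elided here):
scheme@(guile-user)> ,optimize (let loop ((n 5) (acc '()))
                                 (if (zero? n) acc (loop (- n 1) (cons n acc))))
Boxed locals and leftover allocations tend to stand out in the optimized form.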
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
2017-09-10 4:14 ` Linas Vepstas
2017-09-10 16:50 ` Arne Babenhauserheide
2017-09-10 16:53 ` Arne Babenhauserheide
@ 2017-09-13 2:24 ` Mark H Weaver
2017-09-13 19:38 ` Arne Babenhauserheide
2017-09-14 19:14 ` Linas Vepstas
2 siblings, 2 replies; 29+ messages in thread
From: Mark H Weaver @ 2017-09-13 2:24 UTC (permalink / raw)
To: Linas Vepstas; +Cc: guile-user
Hi Linas,
Linas Vepstas <linasvepstas@gmail.com> writes:
> To make this conversation less crazy, and more down-to-earth, here is a
> demo that seems to spend most of its time in GC. Hard for me to say that
> this is "wrong", right now, but it illustrates the scandalous nature of the
> problem.
>
> It's a simple loop, about 6 lines of code total; it mostly just wastes cpu
> time. There's also a longer progress-report printer, similar to the
> previous one I posted.
>
> It runs at about 1 million iterations/minute, for me, of which 15 seconds
> are spent "doing real work", and 45-75 seconds in gc. Note that GC runs in
> parallel, usually using about 3-4 cores, so the fraction-cpu-time is the
> fraction relative to wall-clock, so that 350% cpu time is typical (i.e. 3.5
> cpu cores are running)
>
> This seems to be a reasonable model of the behavior that I am seeing.
The indentation was stripped from your mail, so I re-indented the
following code:
> ; a tail-recursive loop, seems to waste a lot of time in gc.
> (define (make-bgl lis cnt)
>   ; a bogus longer list
>   (define longer (cons (format #f "~A" cnt) lis))
>   ; periodically report statistics
>   (if (eq? 0 (modulo cnt (* 1000 1000)))
>       (begin (format #t "~A " cnt) (avg-gc-cpu-time)))
>   ; periodically trim the list
>   (if (eq? 0 (modulo cnt (* 4123 1123)))
>       (set! longer '()))
>   ; loop for a while.
>   (if (eq? 0 cnt) lis
>       (make-bgl longer (- cnt 1))))
>
> ; (define foo (make-bgl '() (* 1235 1000 1000)))
A few observations:
* The (format #f ...) allocates a temporary string output port, and that
turns out to be quite a lot of allocation, especially in Guile 2.2.
Glancing at the relevant code, I would estimate that it's at least 2.5
kilobytes spread over at least 10 heap blocks of various sizes. It
also involves registering a finalizer for the port, and adding to a
weak set, both of which are relatively expensive for the GC to deal
with. Using 'number->string' there would reduce memory allocation of
the loop above by an order of magnitude or more.
In theory, string ports could (and IMO should) be made a *lot*
lighter, but we're not there yet.
* The fact that you're using 'set!' to mutate 'longer' forces a
heap-allocated variable object to be allocated for 'longer', whereas
otherwise it could be allocated on the stack. See section 9.3.4
(Variables and the VM) in the Guile 2.2 manual for more on this, and
how 'set!' often makes things much less efficient.
* Since 'longer' is allocated within the loop, a new copy of that
(heap-allocated) variable object is created for each iteration.
Since you're running this loop about 1.2 billion times, and each
iteration is allocating at least 2.5k bytes, that comes out to around 3
terabytes of allocation during this loop. All of that must be reclaimed
by the garbage collector.
Also, since the port objects have finalizers, they are not deallocated
during the first collection that finds them to be unreachable. They are
added to a queue of objects to be finalized, and only after they have
been finalized and the next GC runs can they actually be deallocated.
Considering all of this, it's not surprising to me that the cost of this
loop is dominated by time spent in GC. The remaining things to do are
very lightweight operations by comparison.
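For a rough feel for the difference, the two allocation profiles can be
compared in isolation. A minimal sketch (the loop count is arbitrary, and
exact numbers will vary by machine and Guile version):
;; Each call to (format #f ...) allocates a temporary string output port;
;; number->string allocates only the result string.
(define (with-format n)
  (let loop ((i n) (s ""))
    (if (zero? i) s (loop (- i 1) (format #f "~A" i)))))
(define (with-number->string n)
  (let loop ((i n) (s ""))
    (if (zero? i) s (loop (- i 1) (number->string i)))))
;; One way to compare: read heap-total-allocated from (gc-stats)
;; before and after each run, e.g.
;;   (assq-ref (gc-stats) 'heap-total-allocated)
;;   (with-format (* 1000 1000))
;;   (assq-ref (gc-stats) 'heap-total-allocated)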
Mark
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
2017-09-13 2:24 ` Mark H Weaver
@ 2017-09-13 19:38 ` Arne Babenhauserheide
2017-09-13 19:57 ` David Kastrup
2017-09-14 19:14 ` Linas Vepstas
1 sibling, 1 reply; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-13 19:38 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-user
[-- Attachment #1: Type: text/plain, Size: 1746 bytes --]
Hi Mark,
Thank you for your great writeup!
Mark H Weaver <mhw@netris.org> writes:
> * The (format #f ...) allocates a temporary string output port, and that
…
> with. Using 'number->string' there would reduce memory allocation of
> the loop above by an order of magnitude or more.
…
> * The fact that you're using 'set!' to mutate 'longer' forces a
> heap-allocated variable object to be allocated for 'longer', whereas
> otherwise it could be allocated on the stack. See section 9.3.4
> (Variables and the VM) in the Guile 2.2 manual for more on this, and
> how 'set!' often makes things much less efficient.
>
> * Since 'longer' is allocated within the loop, a new copy of that
> (heap-allocated) variable object is created for each iteration.
I implemented a version which avoids both points via let-recursion and
number->string:
(define (make-bgl lis cnt)
  "a bogus longer list"
  (let loop ((longer (cons (number->string cnt) lis))
             (cnt cnt))
    ;; periodically report statistics
    (when (eq? 0 (modulo cnt (* 1000 1000)))
      (begin (format #t "~A " cnt) (avg-gc-cpu-time)))
    (cond
     ;; periodically trim the list
     ((and (eq? 0 (modulo cnt (* 4123 1123)) (> (length longer) 0)))
      (loop (list)
            cnt))
     ;; end recursion
     ((eq? 0 cnt)
      lis)
     (else
      (loop (cons (number->string cnt) longer)
            (- cnt 1))))))
This does indeed spend much less time in GC, but its memory usage is
still rising rapidly. Did I unknowingly create a memory leak?
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
2017-09-13 19:38 ` Arne Babenhauserheide
@ 2017-09-13 19:57 ` David Kastrup
2017-09-13 20:37 ` Arne Babenhauserheide
0 siblings, 1 reply; 29+ messages in thread
From: David Kastrup @ 2017-09-13 19:57 UTC (permalink / raw)
To: guile-user
Arne Babenhauserheide <arne_bab@web.de> writes:
> I implemented a version which avoids both points via let-recursion and
> number->string:
>
> (define (make-bgl lis cnt)
> "a bogus longer list"
> (let loop ((longer (cons (number->string cnt) lis))
> (cnt cnt))
> ;; periodically report statistics
> (when (eq? 0 (modulo cnt (* 1000 1000)))
> (begin (format #t "~A " cnt) (avg-gc-cpu-time)))
> (cond
> ;; periodically trim the list
> ((and (eq? 0 (modulo cnt (* 4123 1123)) (> (length longer) 0)))
You probably mean
((and (eq? 0 (modulo cnt (* 4123 1123))) (> (length longer) 0))
instead, and (eq? 0 ...) is undefined behavior anyway even disregarding
that it let an error go unnoticed that would have bombed out had you
been using (= 0 ...) or equivalently (zero? ...) instead.
And (> (length longer) 0) is just horribly inefficient anyway (O(n) with
a turtle-hare hiding in the constant factor) compared to (pair? longer).
> (loop (list)
> cnt))
> ;; end recursion
> ((eq? 0 cnt)
> lis)
> (else
> (loop (cons (number->string cnt) longer)
> (- cnt 1))))))
>
> This does indeed spend much less time in GC, but its memory usage is
> still rising rapidly. Did I unknowingly create a memory leak?
>
> Best wishes,
> Arne
--
David Kastrup
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
2017-09-13 19:57 ` David Kastrup
@ 2017-09-13 20:37 ` Arne Babenhauserheide
0 siblings, 0 replies; 29+ messages in thread
From: Arne Babenhauserheide @ 2017-09-13 20:37 UTC (permalink / raw)
To: David Kastrup; +Cc: guile-user
[-- Attachment #1: Type: text/plain, Size: 1544 bytes --]
David Kastrup <dak@gnu.org> writes:
> Arne Babenhauserheide <arne_bab@web.de> writes:
>
>> I implemented a version which avoids both points via let-recursion and
>> number->string:
>>
>> (define (make-bgl lis cnt)
>> "a bogus longer list"
>> (let loop ((longer (cons (number->string cnt) lis))
>> (cnt cnt))
>> ;; periodically report statistics
>> (when (eq? 0 (modulo cnt (* 1000 1000)))
>> (begin (format #t "~A " cnt) (avg-gc-cpu-time)))
>> (cond
>> ;; periodically trim the list
>> ((and (eq? 0 (modulo cnt (* 4123 1123)) (> (length longer) 0)))
>
> You probably mean
>
> ((and (eq? 0 (modulo cnt (* 4123 1123))) (> (length longer) 0))
>
> instead, and (eq? 0 ...) is undefined behavior anyway even disregarding
> that it let an error go unnoticed that would have bombed out had you
> been using (= 0 ...) or equivalently (zero? ...) instead.
Ah, yes, there's the memory leak. Thank you!
> And (> (length longer) 0) is just horribly inefficient anyway (O(n) with
> a turtle-hare hiding in the constant factor) compared to (pair? longer).
I didn’t know that this works — thank you!
(pair? (list)) → #f
(pair? (list 1)) → #t
>> This does indeed spend much less time in GC, but its memory usage is
>> still rising rapidly. Did I unknowingly create a memory leak?
Thanks to David this is answered now.
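For completeness, a version of the loop with both corrections applied
(zero? instead of eq?, the closing paren moved so that both tests sit under
the and, and pair? instead of length) might look like this -- an untested
sketch, assuming the avg-gc-cpu-time helper from earlier in the thread:
(define (make-bgl lis cnt)
  "a bogus longer list"
  (let loop ((longer (cons (number->string cnt) lis))
             (cnt cnt))
    ;; periodically report statistics
    (when (zero? (modulo cnt (* 1000 1000)))
      (format #t "~A " cnt)
      (avg-gc-cpu-time))
    (cond
     ;; periodically trim the list
     ((and (zero? (modulo cnt (* 4123 1123))) (pair? longer))
      (loop (list) cnt))
     ;; end recursion
     ((zero? cnt)
      lis)
     (else
      (loop (cons (number->string cnt) longer)
            (- cnt 1))))))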
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
2017-09-13 2:24 ` Mark H Weaver
2017-09-13 19:38 ` Arne Babenhauserheide
@ 2017-09-14 19:14 ` Linas Vepstas
1 sibling, 0 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-14 19:14 UTC (permalink / raw)
To: Mark H Weaver; +Cc: Guile User
Hi Mark,
Thanks... I'm not sure how to reply. So below I write "Yes, but...". My
alternative was to ignore your email, but that seems rude.
On Tue, Sep 12, 2017 at 9:24 PM, Mark H Weaver <mhw@netris.org> wrote:
> Hi Linas,
>
> Linas Vepstas <linasvepstas@gmail.com> writes:
>
>
> A few observations:
>
> * The (format #f ...) allocates a temporary string output port, and that
> turns out to be quite a lot of allocation, especially in Guile 2.2.
> Glancing at the relevant code, I would estimate that it's at least 2.5
> kilobytes spread over at least 10 heap blocks of various sizes. It
> also involves registering a finalizer for the port, and adding to a
> weak set, both of which are relatively expensive for the GC to deal
> with. Using 'number->string' there would reduce memory allocation of
> the loop above by an order of magnitude or more.
>
Yes, but ... that code was meant to be an approximation of the "real" code.
In the "real" code, there are maybe a dozen scheme calls, so the estimate
of 2.5KB spread over 10 blocks might be in the right range. In the "real"
code, there is also maybe 30KB of C++ mallocs per iteration. Again, those
mallocs should be "invisible" to bdwgc, but they might serve to fragment
RAM badly.
>
>
> * The fact that you're using 'set!' to mutate 'longer' forces a
> heap-allocated variable object to be allocated for 'longer', whereas
> otherwise it could be allocated on the stack. See section 9.3.4
> (Variables and the VM) in the Guile 2.2 manual for more on this, and
> how 'set!' often makes things much less efficient.
>
Heh. I was using set! for this demo only to clear out the long list.
Otherwise, you'd get a multi-billion-entry list, which clearly blows out
the stack.
>
> * Since 'longer' is allocated within the loop, a new copy of that
> (heap-allocated) variable object is created for each iteration.
>
Hmm. That's interesting. Is that really right? The thing was meant to be a
tail-recursive loop, designed to run for some arbitrarily long time.
Clearly a heap allocation in a loop would be a disaster.
In my "real" code, I don't think I am using set! in this way (but I will
now have to pay closer attention). It's not clear that this isn't some kind
of guile design flaw -- I can't imagine a technical reason why a set!
to a local variable would force it to go on the heap.
> Since you're running this loop about 1.2 billion times, and each
> iteration is allocating at least 2.5k bytes, that comes out to around 3
> terabytes of allocation during this loop. All of that must be reclaimed
> by the garbage collector.
>
Yeah, the 1.2 billion was to get the demo to run some 12+ hours, so that I
could verify whether it slowly degrades over time (it didn't; it was stable
after the first few minutes). My "real" loop ran about 25 million times, and
it took some 12 or 20 hours to finish; that one did seem to slowly degrade,
but I'm not quite sure.
> Also, since the port objects have finalizers,
I don't think that any of my code is using ports in my loop, or that
there's anything else in the loop that needs finalization.
I need to create some better instrumentation. But at 1-3 days per run,
debugging gets tedious very quickly. One is not motivated to revisit the
problem once one finally comes out the other end. Until, of course, next
time.
--linas
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
2017-09-09 19:47 out-of-control GC Linas Vepstas
2017-09-10 4:14 ` Linas Vepstas
@ 2017-09-11 18:13 ` Hans Åberg
1 sibling, 0 replies; 29+ messages in thread
From: Hans Åberg @ 2017-09-11 18:13 UTC (permalink / raw)
To: linasvepstas; +Cc: guile-user@gnu.org
> On 9 Sep 2017, at 21:47, Linas Vepstas <linasvepstas@gmail.com> wrote:
>
> I've been experiencing problems with guile GC for a while now, that I've
> mostly been able to ignore, but that are becoming debilitating, and I need
> help, or need to find some solution.
>
> Background: I have a C++ library that has guile bindings; it is wrapped up
> as a guile module. It manages a lot of data, anywhere from a few gigabytes,
> to a over a hundred gigabytes. Almost all this RAM usage should be
> invisible, opaque to guile: so, for example, while my library is managing
> 5 or 10 GBytes, guile itself is managing maybe 20 or 50 or 100MBytes,
> according to (gc-stats). This is normal and expected. Some of my programs
> have guile use considerably more than this, in the multi-GByte range
>
> However, the larger the RAM usage, the more frequently guile GC seems to
> run, and the longer it takes, so that by the time my dataset is 100GBytes,
> guile is using maybe 1GBytes, and is spending 99% of its time in GC. My app
> slows to a crawl, and nothing gets done. I need to get GC under control,
> because it is running pointlessly, because its clear that (almost) nothing
> can be reclaimed, (usually) nothing is being reclaimed.
Does it have to do with this issue [1], where the Boehm GC initially uses a small amount of heap? One can use, say,
export GC_INITIAL_HEAP_SIZE=50G
on the shell command line before running the program.
1. https://github.com/Macaulay2/M2/issues/500
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
@ 2017-09-11 7:42 Kjetil Matheussen
2017-09-14 18:41 ` Linas Vepstas
0 siblings, 1 reply; 29+ messages in thread
From: Kjetil Matheussen @ 2017-09-11 7:42 UTC (permalink / raw)
To: guile-user, linasvepstas
> linasvepstas@gmail.com:
>
> I've been experiencing problems with guile GC for a while now, that I've
> mostly been able to ignore, but that are becoming debilitating, and I need
> help, or need to find some solution.
>
> Background: I have a C++ library that has guile bindings; it is wrapped up
> as a guile module. It manages a lot of data, anywhere from a few gigabytes,
> to a over a hundred gigabytes. Almost all this RAM usage should be
> invisible, opaque to guile: so, for example, while my library is managing
> 5 or 10 GBytes, guile itself is managing maybe 20 or 50 or 100MBytes,
> according to (gc-stats). This is normal and expected. Some of my programs
> have guile use considerably more than this, in the multi-GByte range
>
> However, the larger the RAM usage, the more frequently guile GC seems to
> run, and the longer it takes, so that by the time my dataset is 100GBytes,
> guile is using maybe 1GBytes, and is spending 99% of its time in GC. My app
> slows to a crawl, and nothing gets done. I need to get GC under control,
> because it is running pointlessly, because its clear that (almost) nothing
> can be reclaimed, (usually) nothing is being reclaimed.
>
>
Hi Linas,
This is probably not the reason, but could it be that your program has a
lot of global/static data,
or that you are dynamically loading more and more libraries that have
global/static data?
(global/static data are scanned by bdw-gc, unless explicitly prevented from
doing so)
To test this, you can add a "static_roots_callback" before calling GC_INIT
when your
program starts up. Here's an example of a "static_roots_callback":
https://github.com/kmatheussen/radium/blob/master/Qt/Qt_Main.cpp#L3028
(change "#if 0" at line 3094 to "#if 1" to print out info every time you
collect garbage)
It's initialized like this:
int main(){
  GC_register_has_static_roots_callback(gc_has_static_roots_func);
  GC_INIT();
  ...
}
Also, maybe you should ask on the bdw-gc mailing list too. The question is
probably more relevant there if it's a gc tuning issue (which it might not
be though).
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: out-of-control GC
2017-09-11 7:42 Kjetil Matheussen
@ 2017-09-14 18:41 ` Linas Vepstas
0 siblings, 0 replies; 29+ messages in thread
From: Linas Vepstas @ 2017-09-14 18:41 UTC (permalink / raw)
To: k.s.matheussen; +Cc: Guile User
On Mon, Sep 11, 2017 at 2:42 AM, Kjetil Matheussen <k.s.matheussen@gmail.com
> wrote:
>
>
> This is probably not the reason, but could it be that your program has a
> lot of global/static data,
> or that you are dynamically loading more and more libraries that has
> global/static data?
> (global/static data are scanned by bdw-gc, unless explicitly prevented
> from doing so)
>
Thanks for the suggestion, but I don't think so. There are a few dozen
shared libs; I doubt they have more than a few KB of .data and .bss
segments. It's pretty "normal" C++ code.
Now, there are lots of threads, each of which has thread-local sections, but
it's hard to imagine how this would add up to more than a few megabytes.
--linas
--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2017-09-14 20:45 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-09 19:47 out-of-control GC Linas Vepstas
2017-09-10 4:14 ` Linas Vepstas
2017-09-10 16:50 ` Arne Babenhauserheide
2017-09-10 18:03 ` David Kastrup
2017-09-10 20:23 ` Linas Vepstas
2017-09-10 20:36 ` Marko Rauhamaa
2017-09-10 21:47 ` Linas Vepstas
2017-09-10 22:38 ` Marko Rauhamaa
2017-09-11 16:17 ` Arne Babenhauserheide
2017-09-14 18:31 ` Linas Vepstas
2017-09-14 18:58 ` Marko Rauhamaa
2017-09-14 20:15 ` Linas Vepstas
2017-09-14 20:36 ` David Kastrup
2017-09-14 20:45 ` Arne Babenhauserheide
2017-09-11 8:22 ` tomas
2017-09-14 20:19 ` Linas Vepstas
2017-09-13 17:13 ` David Kastrup
2017-09-10 22:17 ` Arne Babenhauserheide
2017-09-10 22:36 ` Arne Babenhauserheide
2017-09-14 7:44 ` Mark H Weaver
2017-09-10 16:53 ` Arne Babenhauserheide
2017-09-13 2:24 ` Mark H Weaver
2017-09-13 19:38 ` Arne Babenhauserheide
2017-09-13 19:57 ` David Kastrup
2017-09-13 20:37 ` Arne Babenhauserheide
2017-09-14 19:14 ` Linas Vepstas
2017-09-11 18:13 ` Hans Åberg
-- strict thread matches above, loose matches on Subject: below --
2017-09-11 7:42 Kjetil Matheussen
2017-09-14 18:41 ` Linas Vepstas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).