unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Possible Memory Leak with stream-for-each
@ 2010-07-19 18:08 Abhijeet More
  2010-07-20 20:36 ` Andy Wingo
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Abhijeet More @ 2010-07-19 18:08 UTC (permalink / raw)
  To: guile-user

Hi All,
I've been trying to use streams as defined in SICP using guile.
A little googling showed that an implementation had already been suggested here:
http://lists.gnu.org/archive/html/guile-user/2001-04/msg00220.html

However, when I use this to iterate through the stream I see that
guile's memory utilization keeps growing until the iteration is
complete . I'm using guile 1.6.8 (also tested 1.8.7) on linux. I
observe the memory utilization under top.
I tried the same thing with plt-scheme/racket and it did not show a
similar leak i.e .the memory growth was capped at a certain point
during the iteration. It did not grow beyond that point.

From a little more googling, it appears that a similar memory leak has
been discussed before but that investigation was not completed. Here
is the thread:
 http://sources.redhat.com/ml/guile/2000-03/msg00568.html

So my questions are:
1. Can it be confirmed that this is a leak in guile's garbage collection?
2. Are there any workarounds (for instance doing an explicit "(gc)"
somewhere in the definitions?
3. Any pointers on fixing the underlying issue?
4. I noticed that streams in guile (ice-9 streams) were not
implemented in the SICP way. In-fact they were implemented in a way
that makes recursive definitions impossible. Was this intentional?

Some code to illustrate what I'm trying to do:

Simply print all s-expressions in a file to another as follows :
(let* ((outport (open-output-file <OUT-FILE-NAME>)))
  (stream-for-each (lambda (x) (pretty-print x outport))
                   (port->stream (open-input-file <IN-FILE-NAME>) read)))

where port->stream is:
(define (port->stream port readproc)
  (cons-stream (readproc port) (port->stream port readproc)))

(defmacro cons-stream (a b)
  `(cons ,a (delay ,b)))

(define stream-null? null?)
(define the-empty-stream '())
(define (stream-car stream) (car stream))
(define (stream-cdr stream) (force (cdr stream)))

(define (stream-for-each proc s)
  (if (not (stream-null? s))
      (begin (proc (stream-car s))
             (stream-for-each proc (stream-cdr s)))))

I get the same behavior with the following definition:

(define-syntax cons-stream
  (syntax-rules ()
    ((_ ?car ?cdr) (cons ?car (delay ?cdr)))))

Thanks
Abhijeet



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-19 18:08 Possible Memory Leak with stream-for-each Abhijeet More
@ 2010-07-20 20:36 ` Andy Wingo
  2010-07-21  7:00   ` Tristan Colgate
  2010-07-24 16:13 ` Ludovic Courtès
  2010-09-02 12:49 ` Possible Memory Leak with stream-for-each Ludovic Courtès
  2 siblings, 1 reply; 19+ messages in thread
From: Andy Wingo @ 2010-07-20 20:36 UTC (permalink / raw)
  To: Abhijeet More; +Cc: guile-user

On Mon 19 Jul 2010 20:08, Abhijeet More <abhijeet.more@gmail.com> writes:

> 1. Can it be confirmed that this is a leak in guile's garbage
> collection?

Hi,

I can confirm this for Guile 1.9/2.0 at least. Gross... The code that I
used was, to first generate a test file:

  (with-output-to-file "/tmp/test"
    (lambda ()
      (let lp ((n 0))
        (if (< n 10000000)
            (begin
              (write '(foo))
              (lp (1+ n)))))))

Then execute the following code:

  (define stream-null? null?)
  (define the-empty-stream '())
  (define (stream-car stream) (car stream))
  (define (stream-cdr stream) (force (cdr stream)))
  (define-syntax cons-stream
    (syntax-rules ()
      ((_ ?car ?cdr) (cons ?car (delay ?cdr)))))

  (define (stream-for-each proc s)
    (if (not (stream-null? s))
        (begin (proc (stream-car s))
               (stream-for-each proc (stream-cdr s)))))

  (define (port->stream port readproc)
    (cons-stream (readproc port) (port->stream port readproc)))

  (stream-for-each
   identity
   (port->stream (open-input-file "/tmp/test") read))

And I see memory usage explode, yes, at the REPL, even if I disable
position recording via (read-disable 'positions).

> 2. Are there any workarounds (for instance doing an explicit "(gc)"
> somewhere in the definitions?
> 3. Any pointers on fixing the underlying issue?

I don't know. Ludovic? :) You have certainly found a bug, though. We
probably won't look into it for 1.8, but we will certainly try to fix it
for 2.0 (soon!).

> 4. I noticed that streams in guile (ice-9 streams) were not
> implemented in the SICP way. In-fact they were implemented in a way
> that makes recursive definitions impossible. Was this intentional?

I don't know TBH. SICP streams do have a problem, amply explored in
http://www.cs.rice.edu/~taha/publications/conference/sml98.pdf; but
beyond that, I don't know.

Perturbedly yours,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-20 20:36 ` Andy Wingo
@ 2010-07-21  7:00   ` Tristan Colgate
  0 siblings, 0 replies; 19+ messages in thread
From: Tristan Colgate @ 2010-07-21  7:00 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-user

FWIW, the code in srfi-41 now works out of the box (thanks to some
recent r6rs library fixes), with the exception of all the negative
unit tests, due to some exception handling issue I didn't really look
at. I don't know if it still shows the memory leak though.

On 20 July 2010 21:36, Andy Wingo <wingo@pobox.com> wrote:
> On Mon 19 Jul 2010 20:08, Abhijeet More <abhijeet.more@gmail.com> writes:
>
>> 1. Can it be confirmed that this is a leak in guile's garbage
>> collection?
>
> Hi,
>
> I can confirm this for Guile 1.9/2.0 at least. Gross... The code that I
> used was, to first generate a test file:
>
>  (with-output-to-file "/tmp/test"
>    (lambda ()
>      (let lp ((n 0))
>        (if (< n 10000000)
>            (begin
>              (write '(foo))
>              (lp (1+ n)))))))
>
> Then execute the following code:
>
>  (define stream-null? null?)
>  (define the-empty-stream '())
>  (define (stream-car stream) (car stream))
>  (define (stream-cdr stream) (force (cdr stream)))
>  (define-syntax cons-stream
>    (syntax-rules ()
>      ((_ ?car ?cdr) (cons ?car (delay ?cdr)))))
>
>  (define (stream-for-each proc s)
>    (if (not (stream-null? s))
>        (begin (proc (stream-car s))
>               (stream-for-each proc (stream-cdr s)))))
>
>  (define (port->stream port readproc)
>    (cons-stream (readproc port) (port->stream port readproc)))
>
>  (stream-for-each
>   identity
>   (port->stream (open-input-file "/tmp/test") read))
>
> And I see memory usage explode, yes, at the REPL, even if I disable
> position recording via (read-disable 'positions).
>
>> 2. Are there any workarounds (for instance doing an explicit "(gc)"
>> somewhere in the definitions?
>> 3. Any pointers on fixing the underlying issue?
>
> I don't know. Ludovic? :) You have certainly found a bug, though. We
> probably won't look into it for 1.8, but we will certainly try to fix it
> for 2.0 (soon!).
>
>> 4. I noticed that streams in guile (ice-9 streams) were not
>> implemented in the SICP way. In-fact they were implemented in a way
>> that makes recursive definitions impossible. Was this intentional?
>
> I don't know TBH. SICP streams do have a problem, amply explored in
> http://www.cs.rice.edu/~taha/publications/conference/sml98.pdf; but
> beyond that, I don't know.
>
> Perturbedly yours,
>
> Andy
> --
> http://wingolog.org/
>
>



-- 
Tristan Colgate-McFarlane
----
  "You can get all your daily vitamins from 52 pints of guiness, and a
glass of milk"



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-19 18:08 Possible Memory Leak with stream-for-each Abhijeet More
  2010-07-20 20:36 ` Andy Wingo
@ 2010-07-24 16:13 ` Ludovic Courtès
  2010-07-24 16:32   ` Abhijeet More
  2010-09-02 12:49 ` Possible Memory Leak with stream-for-each Ludovic Courtès
  2 siblings, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2010-07-24 16:13 UTC (permalink / raw)
  To: guile-user

Hi!

The leak you describe seems to be that depicted under “Why the space
leak occurs” at <http://srfi.schemers.org/srfi-45/srfi-45.html>.

Am I right?

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-24 16:13 ` Ludovic Courtès
@ 2010-07-24 16:32   ` Abhijeet More
  2010-07-24 16:46     ` Abhijeet More
  0 siblings, 1 reply; 19+ messages in thread
From: Abhijeet More @ 2010-07-24 16:32 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user

On Sat, Jul 24, 2010 at 12:13 PM, Ludovic Courtès <ludo@gnu.org> wrote:
> Hi!
>
> The leak you describe seems to be that depicted under “Why the space
> leak occurs” at <http://srfi.schemers.org/srfi-45/srfi-45.html>.
>
> Am I right?

I don't think so. The leak described in SRFI-45 is due to a naive
implementation of stream-filter on SICP-like streams (odd or even).
SRFI-45 describes a way to implement streams so that the leak does not
occur with the naive implementation of stream-filter.

The leak I'm referring to happens with *stream-for-each*.
*stream-filter* is not involved in any way.
I doubt if that is the problem I'm describing - especially since the
same code doesn't cause a leak in PLT Scheme/Racket.

Thanks
Abhijeet



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-24 16:32   ` Abhijeet More
@ 2010-07-24 16:46     ` Abhijeet More
  2010-07-26  9:36       ` Andy Wingo
  0 siblings, 1 reply; 19+ messages in thread
From: Abhijeet More @ 2010-07-24 16:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user

On Sat, Jul 24, 2010 at 12:32 PM, Abhijeet More <abhijeet.more@gmail.com> wrote:
>
> I don't think so. The leak described in SRFI-45 is due to a naive
> implementation of stream-filter on SICP-like streams (odd or even).
> SRFI-45 describes a way to implement streams so that the leak does not
> occur with the naive implementation of stream-filter.
[snip]

Now that I think about it though - it may be worth investigating
whether the stream-for-each implementation causes a similar
ever-growing sequence of pending-promises.
I'll see if this is the case.
The reason I feel strongly that this is not the problem is that the
same implementation (including cons-stream) does *not* cause a leak
with PLT Scheme/racket.
Thanks
Abhijeet



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-24 16:46     ` Abhijeet More
@ 2010-07-26  9:36       ` Andy Wingo
  2010-07-30  0:38         ` Abhijeet More
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Wingo @ 2010-07-26  9:36 UTC (permalink / raw)
  To: Abhijeet More; +Cc: Ludovic Courtès, guile-user

On Sat 24 Jul 2010 18:46, Abhijeet More <abhijeet.more@gmail.com> writes:

> The reason I feel strongly that this is not the problem is that the
> same implementation (including cons-stream) does *not* cause a leak
> with PLT Scheme/racket.

I think it's probably our bug, but I haven't had time to examine it
thoroughly.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-26  9:36       ` Andy Wingo
@ 2010-07-30  0:38         ` Abhijeet More
  2010-07-31 11:48           ` Andy Wingo
  0 siblings, 1 reply; 19+ messages in thread
From: Abhijeet More @ 2010-07-30  0:38 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-user

On Mon, Jul 26, 2010 at 5:36 AM, Andy Wingo <wingo@pobox.com> wrote:
>
> I think it's probably our bug, but I haven't had time to examine it
> thoroughly.

FWIW - I tried using the SRFI-45 reference implementation for delay
and force (as-is from the SRFI reference implementation) and got the
same behavior.

To debug this further are there any useful places in the
stream-for-each implementation that I could do a gc-stats from? Or
would we need to look at the C implementation?

Can anyone suggest good starting points for this?
Thanks
Abhijeet



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-30  0:38         ` Abhijeet More
@ 2010-07-31 11:48           ` Andy Wingo
  2010-07-31 20:16             ` Abhijeet More
                               ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Andy Wingo @ 2010-07-31 11:48 UTC (permalink / raw)
  To: Abhijeet More; +Cc: Ludovic Courtès, guile-user, Tibi Turbureanu

Hi Abhijeet,

On Fri 30 Jul 2010 02:38, Abhijeet More <abhijeet.more@gmail.com> writes:

> To debug this further are there any useful places in the
> stream-for-each implementation that I could do a gc-stats from? Or
> would we need to look at the C implementation?
>
> Can anyone suggest good starting points for this?

We started poking this a little at the GNU Hackers Meeting last
weekend. I say "we" but it was really Tibi (copied on the mail) who was
doing most of the work.

We printed out the object-address of the stream at the start of the
iteration, then ctrl-C'd in GDB somewhere in the middle. We then called
GC_print_heap_obj on that address and it printed as a two-word
object. We scm_display'd the address, and it was not a stream. (It was
used for something else.)

So, it's not the case that the beginning of the stream was being held on
to. Which is a bad thing -- it means that somehow something in the
middle was being held on to.

To really track this down we need a heap profiler. To make a heap
profiler, we need to hack libgc. libgc has some of the things we need
already, but some are only present in debug builds, which is
ridiculous -- one should always have the ability to profile the heap. We
need to figure out which value is being misidentified as a pointer to a
heap-allocated stream-pair.

That is my analysis anyway. The next step is to be able to expose the
back-pointer graph from libgc, and write analysis tools to figure out
which non-stream objects point to a stream.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-31 11:48           ` Andy Wingo
@ 2010-07-31 20:16             ` Abhijeet More
  2010-08-11 16:57               ` Abhijeet More
  2010-08-02  3:29             ` Tibi Turbureanu
  2010-08-15 15:12             ` Heap profiler Ludovic Courtès
  2 siblings, 1 reply; 19+ messages in thread
From: Abhijeet More @ 2010-07-31 20:16 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-user, Tibi Turbureanu

On Sat, Jul 31, 2010 at 7:48 AM, Andy Wingo <wingo@pobox.com> wrote:

[snip]

> To really track this down we need a heap profiler. To make a heap
> profiler, we need to hack libgc. libgc has some of the things we need
> already, but some are only present in debug builds, which is
> ridiculous -- one should always have the ability to profile the heap. We
> need to figure out which value is being misidentified as a pointer to a
> heap-allocated stream-pair.
>
> That is my analysis anyway. The next step is to be able to expose the
> back-pointer graph from libgc, and write analysis tools to figure out
> which non-stream objects point to a stream.

[snip]

Andy,
Thanks! That's a great answer!
I'll see what I can do with these guidelines.
Abhijeet



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-31 11:48           ` Andy Wingo
  2010-07-31 20:16             ` Abhijeet More
@ 2010-08-02  3:29             ` Tibi Turbureanu
  2010-08-15 15:12             ` Heap profiler Ludovic Courtès
  2 siblings, 0 replies; 19+ messages in thread
From: Tibi Turbureanu @ 2010-08-02  3:29 UTC (permalink / raw)
  To: guile-user; +Cc: Andy Wingo, Ludovic Courtès

Hello guys,

On Sat, Jul 31, 2010 at 01:48:27PM +0200, Andy Wingo wrote:
> So, it's not the case that the beginning of the stream was being held on
> to. Which is a bad thing -- it means that somehow something in the
> middle was being held on to.

Andy, I remember you said at GHM that we might not use libgc correctly.

Related to this, I found some tips in README.QUICK:

[quote]
Replace calls to malloc by calls to GC_MALLOC, and calls to realloc 
by calls to GC_REALLOC.
Define GC_DEBUG before including gc.h for additional checking.
[\quote]

We also have GC_malloc_* calls in Guile and maybe we should have only
GC_MALLOC_* calls so that the debugging mode can work (just defining
GC_DEBUG before including gc.h gives a segfault).

There is also a warning:

[quote]
Do not store the only pointer to an object in memory allocated
with system malloc, since the collector usually does not scan
memory allocated in this way.
[/quote]

But I haven't studied the code yet to tell if we respect this.

What do you guys think?

Tibi



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-31 20:16             ` Abhijeet More
@ 2010-08-11 16:57               ` Abhijeet More
  0 siblings, 0 replies; 19+ messages in thread
From: Abhijeet More @ 2010-08-11 16:57 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-user, Tibi Turbureanu

On Sat, Jul 31, 2010 at 4:16 PM, Abhijeet More <abhijeet.more@gmail.com> wrote:

> Andy,
> Thanks! That's a great answer!
> I'll see what I can do with these guidelines.


Apologies - I haven't had a chance yet to look into using the libgc
library as you suggested.
Would it be possible to use valgrind massif instead of changing the
code to see what we need?

Also: I was trying to write an implementation that did not leak and
came across the following.
When I redefine "delay" and "force" as follows:
(defmacro delay (expr)
  `(make-promise (lambda() ,expr)))

(define (make-promise proc)
  (lambda() (proc)))

(define force
  (lambda (promise)
    (promise)))

there is no memory leak. I see a leak again when I introduce
memoization into the "make-promise" implementation above.
To me it looks like that the leak that we see is caused by memoization
of the "promise" objects created by guile's default version of force
and delay. At what point does guile decide that a memoization isn't
required anymore?
At this point I don't know guile code well enough to say this for sure.

Thanks
Abhijeet



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Heap profiler
  2010-07-31 11:48           ` Andy Wingo
  2010-07-31 20:16             ` Abhijeet More
  2010-08-02  3:29             ` Tibi Turbureanu
@ 2010-08-15 15:12             ` Ludovic Courtès
  2 siblings, 0 replies; 19+ messages in thread
From: Ludovic Courtès @ 2010-08-15 15:12 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-user, Tibi Turbureanu

Hello!

Andy Wingo <wingo@pobox.com> writes:

> To really track this down we need a heap profiler. To make a heap
> profiler, we need to hack libgc.

As a start, I suggest looking at these:

  - http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/3392

  - “Understanding Memory allocations of Scheme Programs”, Serrano &
    Boehm, 2000, http://www-sop.inria.fr/mimosa/fp/Bigloo/bigloo-5.html

I miss ‘gc-live-object-stats’...

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-07-19 18:08 Possible Memory Leak with stream-for-each Abhijeet More
  2010-07-20 20:36 ` Andy Wingo
  2010-07-24 16:13 ` Ludovic Courtès
@ 2010-09-02 12:49 ` Ludovic Courtès
  2010-09-02 16:20   ` Julian Graham
  2010-09-02 18:46   ` Andy Wingo
  2 siblings, 2 replies; 19+ messages in thread
From: Ludovic Courtès @ 2010-09-02 12:49 UTC (permalink / raw)
  To: guile-user; +Cc: guile-devel

Hello!

I believe this patch fixes the problem:

  http://git.sv.gnu.org/cgit/guile.git/commit/?id=f57fdf07d6374028f35bcb1ee748a94022deda6d

Basically ‘force’ was leaking memory because it uses ‘lock-mutex’, which
was the culprit (!).

Julian: Could you please review this patch?

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-09-02 12:49 ` Possible Memory Leak with stream-for-each Ludovic Courtès
@ 2010-09-02 16:20   ` Julian Graham
  2010-09-02 18:46   ` Andy Wingo
  1 sibling, 0 replies; 19+ messages in thread
From: Julian Graham @ 2010-09-02 16:20 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user, guile-devel

Hey Ludovic,


On Thu, Sep 2, 2010 at 8:49 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> I believe this patch fixes the problem:
>
>  http://git.sv.gnu.org/cgit/guile.git/commit/?id=f57fdf07d6374028f35bcb1ee748a94022deda6d
>
> Basically ‘force’ was leaking memory because it uses ‘lock-mutex’, which
> was the culprit (!).
>
> Julian: Could you please review this patch?

Sure.  That looks alright to me, especially in light of our discussion
on IRC.  Summary: The list of held mutexes is not weak; it's a list of
weak-car pairs.  So if a particular mutex in the list goes away, the
pair's still hanging around.  The patch introduces more explicit
cleanup of that mutex list.  (Thanks for clarifying that!)


Regards,
Julian



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Possible Memory Leak with stream-for-each
  2010-09-02 12:49 ` Possible Memory Leak with stream-for-each Ludovic Courtès
  2010-09-02 16:20   ` Julian Graham
@ 2010-09-02 18:46   ` Andy Wingo
  1 sibling, 0 replies; 19+ messages in thread
From: Andy Wingo @ 2010-09-02 18:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user, guile-devel

On Thu 02 Sep 2010 05:49, ludo@gnu.org (Ludovic Courtès) writes:

> Basically ‘force’ was leaking memory because it uses ‘lock-mutex’, which
> was the culprit (!).

You are a superhero, Dr. Courtès!!

A
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Heap profiler
@ 2022-11-07 11:03 Ludovic Courtès
  2022-11-12  9:24 ` zimoun
  0 siblings, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2022-11-07 11:03 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 2888 bytes --]

Hello Guilers,

While desperately chasing <https://issues.guix.gnu.org/59021> and
related memory leak issues, I came up with the attached rudimentary heap
profiler.  You can load it and invoking it in a running process:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (profile-heap)
  %   type                               self    avg obj size
 19.7 pair                                  720,864    16.0
 16.3 unknown                               594,832   600.8
 14.7 struct                                536,784    48.2
 12.6 bytevector                            461,824  1110.2
  7.7 stringbuf                             281,136   117.8
  6.8 pointer                               248,688    16.0
  5.5 vector                                202,815    35.4
  4.1 symbol                                148,640    32.0
  3.1 program                               113,824    40.0
  1.6 heap-number                            59,680    31.8
  1.5 string                                 54,960    32.0
  1.4 smob                                   52,736    38.0
  1.3 variable                               49,328    22.6
  0.8 weak-table                             30,144    30.4
  0.8 atomic-box                             28,528    32.1
  0.8 vm-continuation                        27,680    32.0
  0.7 hash-table                             26,736    32.1
  0.2 syntax                                  6,144    48.0
  0.1 dynamic-state                           4,208  1052.0
  0.1 primitive                               2,880    16.0
  0.1 weak-vector                             1,984    18.0
  0.0 keyword                                   752    16.7
  0.0 bitvector                                 672    35.4
  0.0 frame                                     624    39.0
  0.0 primitive-generic                         608    32.0
  0.0 continuation                              576   576.0
  0.0 fluid                                     208    29.7
  0.0 array                                      96    48.0
  0.0 weak-set                                   96    48.0
  0.0 port                                       64    32.0
sampled heap: 3.48865 MiB (heap size: 12.78906 MiB)
$5 = #t
--8<---------------cut here---------------end--------------->8---

It samples the GC-managed heap and counts the number and size of objects
of each type.  The “unknown” bit is anything that lacks a type tag, such
as stacks allocated for delimited continuations by ‘capture_stack’ in
libguile.

It gives a rough idea of what’s going on but of course it’s intrusive:
the profiling process itself allocates memory.  The next step will be to
run it from GDB so that it’s non-intrusive.

I’d be curious to know if people have developed similar tools in this
area.

Ludo’.


[-- Attachment #2: the heap profiler --]
[-- Type: text/plain, Size: 9089 bytes --]

;;; Copyright © 2022 Ludovic Courtès <ludo@gnu.org>
;;;
;;; Distributed under the GNU Lesser General Public License, version 3 or (at
;;; your option) any later version.

(use-modules (system foreign)
             (system base types internal)
             ;; ((system base types) #:select (scm->object))
             (srfi srfi-1)
             (srfi srfi-9 gnu)
             (ice-9 match)
             (ice-9 control)
             (ice-9 format)
             (ice-9 rdelim)
             (ice-9 regex))

(define-immutable-record-type <memory-mapping>
  (memory-mapping start end permissions name)
  memory-mapping?
  (start       memory-mapping-start)
  (end         memory-mapping-end)
  (permissions memory-mapping-permissions)
  (name        memory-mapping-name))

(define (memory-mappings pid)              ;based on Guile's 'gc-profile.scm'
  "Return an list of alists, each of which contains information about a memory
mapping of process @var{pid}.  This information is obtained by reading
@file{/proc/PID/maps} on Linux.  See `procs(5)' for details."

  (define mapping-line-rx
    ;; As of Linux 2.6.32.28, an `maps' line looks like this:
    ;; "00400000-0041d000 r--p 00000000 fd:00 7926441  /bin/cat".
    (make-regexp
     "^([[:xdigit:]]+)-([[:xdigit:]]+) ([rwx-]{3}[ps]) ([[:xdigit:]]+) (fd|[[:xdigit:]]{2}):[[:xdigit:]]{2} [0-9]+[[:blank:]]+(.*)$"))

  (call-with-input-file (format #f "/proc/~a/maps" pid)
    (lambda (port)
      (let loop ((result '()))
        (match (read-line port)
          ((? eof-object?)
           (reverse result))
          (line
           (cond ((regexp-exec mapping-line-rx line)
                  =>
                  (lambda (match)
                    (let ((start (string->number (match:substring match 1)
                                                 16))
                          (end   (string->number (match:substring match 2)
                                                 16))
                          (perms (match:substring match 3))
                          (name  (match:substring match 6)))
                      (loop (cons (memory-mapping
                                   start end perms
                                   (if (string=? name "")
                                       #f
                                       name))
                                  result)))))
                 (else
                  (loop result)))))))))

;; (define random-valid-address
;;   ;; XXX: This is only in libgc with back pointers.
;;   (let ((ptr (false-if-exception
;;               (dynamic-func "GC_generate_random_valid_address" (dynamic-link)))))
;;     (if ptr
;;         (pointer->procedure '* ptr '())
;;         (const #f))))

(define (heap-sections)
  (filter (lambda (mapping)
            (and (not (memory-mapping-name mapping))
                 (string=? "rw-p" (memory-mapping-permissions mapping))))
          (memory-mappings (getpid))))

(define (random-valid-address heap-sections)
  ;; Mimic 'GC_generate_random_valid_address', which is only available with
  ;; '-DBACK_PTRS' builds of libgc.
  (define heap-size
    (fold (lambda (mapping size)
            (+ size (- (memory-mapping-end mapping)
                       (memory-mapping-start mapping))))
          0
          heap-sections))

  (let loop ((sections heap-sections)
             (size     0)
             (offset   (random heap-size)))
    (match sections
      (() #f)
      ((section . rest)
       (let* ((start (memory-mapping-start section))
              (end   (memory-mapping-end section))
              (section-size  (- end start)))
         (if (< offset section-size)
             (let ((result (base-pointer (+ start offset))))
               ;; (pk 'p (number->string (+ start offset) 16) result)
               (if (null-pointer? result)
                   (loop heap-sections 0 (random heap-size)) ;retry
                   result))
             (loop rest
                   (+ size section-size)
                   (- offset section-size))))))))

(define object-size
  (pointer->procedure size_t
                      (dynamic-func "GC_size" (dynamic-link))
                      '(*)))

(define base-pointer
  (pointer->procedure '*
                      (dynamic-func "GC_base" (dynamic-link))
                      (list uintptr_t)))

(define (heap-tag->type-name word)
  "Return the type name as a symbol corresponding to the tag WORD."
  (match (let/ec return
           (let-syntax ((tag-name (syntax-rules ()
                                    ((_ name pred mask tag)
                                     (when (= (logand word mask) tag)
                                       (return 'name))))))
             (visit-heap-tags tag-name)
             'unknown))
    ('program
     (cond ((= (logand word #x1000) #x1000)
            'partial-continuation)
           ((= (logand word #x2000) #x2000)
            'foreign-program)
           ((= (logand word #x800) #x800)
            'continuation)
           ((= (logand word #x400) #x400)
            'primitive-generic)
           ((= (logand word #x200) #x200)
            'primitive)
           ((= (logand word #x100) #x100)
            'boot-program)
           (else
            'program)))
    (type
     type)))

(define* (profile-heap #:key (sample-count 100000))
  "Pick SAMPLE-COUNT addresses in the GC-managed heap and display a profile
of this sample per data type."
  (define heap-size
    (assoc-ref (gc-stats) 'heap-size))

  (define heap
    (heap-sections))

  (let ((objects (make-hash-table 57))
        (visited (make-hash-table)))
    (let loop ((i sample-count))
      (unless (zero? i)
        (let ((address (random-valid-address heap)))
          (if (hashv-ref visited (pointer-address address))
              (loop i)
              (begin
                (hashv-set! visited (pointer-address address) #t)
                (let* ((tag  (pointer-address (dereference-pointer address)))
                       (type (heap-tag->type-name tag))
                       (size (match type
                               ('pair (* 2 (sizeof '*)))
                               ('vector
                                (min (ash tag -8)
                                     (object-size address)))
                               (_ (object-size address)))))
                  ;; (when (eq? 'unknown type)
                  ;;   (pk (object-size address)))
                  ;; (when (eq? 'vector type)
                  ;;   (pk 'vector size 'tag tag 'addr address 'vs (object-size address)))
                  (hashq-set! objects type
                              (match (hashq-ref objects type '(0 . 0))
                                ((count . total)
                                 (cons (+ count 1) (+ total size))))))
                (loop (- i 1)))))))
    (let ((grand-total (hash-fold (lambda (type stats result)
                                    (match stats
                                      ((_ . total)
                                       (+ total result))))
                                  0
                                  objects)))
      (format #t "  %   type                               self    avg obj size~%")
      (for-each (match-lambda
                  ((type . (count . total))
                   (format #t "~5,1f ~30a ~14h ~7,1f~%"
                           (* 100. (/ total grand-total))
                           type total
                           (/ total count 1.))))
                (sort (hash-map->list cons objects)
                      (match-lambda*
                        (((_ . (count1 . total1)) (_ . (count2 . total2)))
                         (or (> total1 total2)
                             (and (= total1 total2)
                                  (> count1 count2)))))))
      (format #t "sampled heap: ~h MiB (heap size: ~h MiB)~%"
              (/ grand-total (expt 2. 20))
              (/ heap-size (expt 2. 20))))))

(define (heap-samples type count)
  "Sample COUNT objects of the given TYPE, a symbol such as 'vector, and
return them.

WARNING: This can crash your application as this could pick bogus or
finalized objects."
  (define heap
    (heap-sections))

  (let ((visited (make-hash-table)))
    (let loop ((i count)
               (objects '()))
      (if (zero? i)
          objects
          (let ((address (random-valid-address heap)))
            (if (hashv-ref visited (pointer-address address))
                (loop i objects)
                (begin
                  (hashv-set! visited (pointer-address address) #t)
                  (let ((tag (pointer-address (dereference-pointer address))))
                    (if (eq? type (heap-tag->type-name tag))
                        (loop (- i 1)
                              (cons (pointer->scm address) objects))
                        (loop i objects))))))))))

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Heap profiler
  2022-11-07 11:03 Heap profiler Ludovic Courtès
@ 2022-11-12  9:24 ` zimoun
  2022-11-13  3:29   ` Maxim Cournoyer
  0 siblings, 1 reply; 19+ messages in thread
From: zimoun @ 2022-11-12  9:24 UTC (permalink / raw)
  To: Ludovic Courtès, guile-user

Hi,

On Mon, 07 Nov 2022 at 12:03, Ludovic Courtès <ludo@gnu.org> wrote:

> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (profile-heap)

[...]

> sampled heap: 3.48865 MiB (heap size: 12.78906 MiB)
> $5 = #t
> --8<---------------cut here---------------end--------------->8---
>
> It samples the GC-managed heap and counts the number and size of objects
> of each type.  The “unknown” bit is anything that lacks a type tag, such
> as stacks allocated for delimited continuations by ‘capture_stack’ in
> libguile.

It could nice to have it by default. For instance under ’,help profile’
with something like ’,heap [,ph] - Profile heap memory executation’.  Or
elsewhere as ’,help system’ close to ,gc.


Cheers,
simon



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Heap profiler
  2022-11-12  9:24 ` zimoun
@ 2022-11-13  3:29   ` Maxim Cournoyer
  0 siblings, 0 replies; 19+ messages in thread
From: Maxim Cournoyer @ 2022-11-13  3:29 UTC (permalink / raw)
  To: zimoun; +Cc: Ludovic Courtès, guile-user

Hi,

zimoun <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Mon, 07 Nov 2022 at 12:03, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> --8<---------------cut here---------------start------------->8---
>> scheme@(guile-user)> (profile-heap)
>
> [...]
>
>> sampled heap: 3.48865 MiB (heap size: 12.78906 MiB)
>> $5 = #t
>> --8<---------------cut here---------------end--------------->8---
>>
>> It samples the GC-managed heap and counts the number and size of objects
>> of each type.  The “unknown” bit is anything that lacks a type tag, such
>> as stacks allocated for delimited continuations by ‘capture_stack’ in
>> libguile.
>
> It could nice to have it by default. For instance under ’,help profile’
> with something like ’,heap [,ph] - Profile heap memory executation’.  Or
> elsewhere as ’,help system’ close to ,gc.

+1 :-)

-- 
Thanks,
Maxim



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-11-13  3:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-19 18:08 Possible Memory Leak with stream-for-each Abhijeet More
2010-07-20 20:36 ` Andy Wingo
2010-07-21  7:00   ` Tristan Colgate
2010-07-24 16:13 ` Ludovic Courtès
2010-07-24 16:32   ` Abhijeet More
2010-07-24 16:46     ` Abhijeet More
2010-07-26  9:36       ` Andy Wingo
2010-07-30  0:38         ` Abhijeet More
2010-07-31 11:48           ` Andy Wingo
2010-07-31 20:16             ` Abhijeet More
2010-08-11 16:57               ` Abhijeet More
2010-08-02  3:29             ` Tibi Turbureanu
2010-08-15 15:12             ` Heap profiler Ludovic Courtès
2010-09-02 12:49 ` Possible Memory Leak with stream-for-each Ludovic Courtès
2010-09-02 16:20   ` Julian Graham
2010-09-02 18:46   ` Andy Wingo
  -- strict thread matches above, loose matches on Subject: below --
2022-11-07 11:03 Heap profiler Ludovic Courtès
2022-11-12  9:24 ` zimoun
2022-11-13  3:29   ` Maxim Cournoyer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).