unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Heap profiler
@ 2022-11-07 11:03 Ludovic Courtès
  2022-11-12  9:24 ` zimoun
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2022-11-07 11:03 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 2888 bytes --]

Hello Guilers,

While desperately chasing <https://issues.guix.gnu.org/59021> and
related memory leak issues, I came up with the attached rudimentary heap
profiler.  You can load it and invoking it in a running process:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (profile-heap)
  %   type                               self    avg obj size
 19.7 pair                                  720,864    16.0
 16.3 unknown                               594,832   600.8
 14.7 struct                                536,784    48.2
 12.6 bytevector                            461,824  1110.2
  7.7 stringbuf                             281,136   117.8
  6.8 pointer                               248,688    16.0
  5.5 vector                                202,815    35.4
  4.1 symbol                                148,640    32.0
  3.1 program                               113,824    40.0
  1.6 heap-number                            59,680    31.8
  1.5 string                                 54,960    32.0
  1.4 smob                                   52,736    38.0
  1.3 variable                               49,328    22.6
  0.8 weak-table                             30,144    30.4
  0.8 atomic-box                             28,528    32.1
  0.8 vm-continuation                        27,680    32.0
  0.7 hash-table                             26,736    32.1
  0.2 syntax                                  6,144    48.0
  0.1 dynamic-state                           4,208  1052.0
  0.1 primitive                               2,880    16.0
  0.1 weak-vector                             1,984    18.0
  0.0 keyword                                   752    16.7
  0.0 bitvector                                 672    35.4
  0.0 frame                                     624    39.0
  0.0 primitive-generic                         608    32.0
  0.0 continuation                              576   576.0
  0.0 fluid                                     208    29.7
  0.0 array                                      96    48.0
  0.0 weak-set                                   96    48.0
  0.0 port                                       64    32.0
sampled heap: 3.48865 MiB (heap size: 12.78906 MiB)
$5 = #t
--8<---------------cut here---------------end--------------->8---

It samples the GC-managed heap and counts the number and size of objects
of each type.  The “unknown” bit is anything that lacks a type tag, such
as stacks allocated for delimited continuations by ‘capture_stack’ in
libguile.

It gives a rough idea of what’s going on but of course it’s intrusive:
the profiling process itself allocates memory.  The next step will be to
run it from GDB so that it’s non-intrusive.

I’d be curious to know if people have developed similar tools in this
area.

Ludo’.


[-- Attachment #2: the heap profiler --]
[-- Type: text/plain, Size: 9089 bytes --]

;;; Copyright © 2022 Ludovic Courtès <ludo@gnu.org>
;;;
;;; Distributed under the GNU Lesser General Public License, version 3 or (at
;;; your option) any later version.

(use-modules (system foreign)
             (system base types internal)
             ;; ((system base types) #:select (scm->object))
             (srfi srfi-1)
             (srfi srfi-9 gnu)
             (ice-9 match)
             (ice-9 control)
             (ice-9 format)
             (ice-9 rdelim)
             (ice-9 regex))

(define-immutable-record-type <memory-mapping>
  (memory-mapping start end permissions name)
  memory-mapping?
  (start       memory-mapping-start)
  (end         memory-mapping-end)
  (permissions memory-mapping-permissions)
  (name        memory-mapping-name))

(define (memory-mappings pid)              ;based on Guile's 'gc-profile.scm'
  "Return an list of alists, each of which contains information about a memory
mapping of process @var{pid}.  This information is obtained by reading
@file{/proc/PID/maps} on Linux.  See `procs(5)' for details."

  (define mapping-line-rx
    ;; As of Linux 2.6.32.28, an `maps' line looks like this:
    ;; "00400000-0041d000 r--p 00000000 fd:00 7926441  /bin/cat".
    (make-regexp
     "^([[:xdigit:]]+)-([[:xdigit:]]+) ([rwx-]{3}[ps]) ([[:xdigit:]]+) (fd|[[:xdigit:]]{2}):[[:xdigit:]]{2} [0-9]+[[:blank:]]+(.*)$"))

  (call-with-input-file (format #f "/proc/~a/maps" pid)
    (lambda (port)
      (let loop ((result '()))
        (match (read-line port)
          ((? eof-object?)
           (reverse result))
          (line
           (cond ((regexp-exec mapping-line-rx line)
                  =>
                  (lambda (match)
                    (let ((start (string->number (match:substring match 1)
                                                 16))
                          (end   (string->number (match:substring match 2)
                                                 16))
                          (perms (match:substring match 3))
                          (name  (match:substring match 6)))
                      (loop (cons (memory-mapping
                                   start end perms
                                   (if (string=? name "")
                                       #f
                                       name))
                                  result)))))
                 (else
                  (loop result)))))))))

;; (define random-valid-address
;;   ;; XXX: This is only in libgc with back pointers.
;;   (let ((ptr (false-if-exception
;;               (dynamic-func "GC_generate_random_valid_address" (dynamic-link)))))
;;     (if ptr
;;         (pointer->procedure '* ptr '())
;;         (const #f))))

(define (heap-sections)
  (filter (lambda (mapping)
            (and (not (memory-mapping-name mapping))
                 (string=? "rw-p" (memory-mapping-permissions mapping))))
          (memory-mappings (getpid))))

(define (random-valid-address heap-sections)
  ;; Mimic 'GC_generate_random_valid_address', which is only available with
  ;; '-DBACK_PTRS' builds of libgc.
  (define heap-size
    (fold (lambda (mapping size)
            (+ size (- (memory-mapping-end mapping)
                       (memory-mapping-start mapping))))
          0
          heap-sections))

  (let loop ((sections heap-sections)
             (size     0)
             (offset   (random heap-size)))
    (match sections
      (() #f)
      ((section . rest)
       (let* ((start (memory-mapping-start section))
              (end   (memory-mapping-end section))
              (section-size  (- end start)))
         (if (< offset section-size)
             (let ((result (base-pointer (+ start offset))))
               ;; (pk 'p (number->string (+ start offset) 16) result)
               (if (null-pointer? result)
                   (loop heap-sections 0 (random heap-size)) ;retry
                   result))
             (loop rest
                   (+ size section-size)
                   (- offset section-size))))))))

(define object-size
  (pointer->procedure size_t
                      (dynamic-func "GC_size" (dynamic-link))
                      '(*)))

(define base-pointer
  (pointer->procedure '*
                      (dynamic-func "GC_base" (dynamic-link))
                      (list uintptr_t)))

(define (heap-tag->type-name word)
  "Return the type name as a symbol corresponding to the tag WORD."
  (match (let/ec return
           (let-syntax ((tag-name (syntax-rules ()
                                    ((_ name pred mask tag)
                                     (when (= (logand word mask) tag)
                                       (return 'name))))))
             (visit-heap-tags tag-name)
             'unknown))
    ('program
     (cond ((= (logand word #x1000) #x1000)
            'partial-continuation)
           ((= (logand word #x2000) #x2000)
            'foreign-program)
           ((= (logand word #x800) #x800)
            'continuation)
           ((= (logand word #x400) #x400)
            'primitive-generic)
           ((= (logand word #x200) #x200)
            'primitive)
           ((= (logand word #x100) #x100)
            'boot-program)
           (else
            'program)))
    (type
     type)))

(define* (profile-heap #:key (sample-count 100000))
  "Pick SAMPLE-COUNT addresses in the GC-managed heap and display a profile
of this sample per data type."
  (define heap-size
    (assoc-ref (gc-stats) 'heap-size))

  (define heap
    (heap-sections))

  (let ((objects (make-hash-table 57))
        (visited (make-hash-table)))
    (let loop ((i sample-count))
      (unless (zero? i)
        (let ((address (random-valid-address heap)))
          (if (hashv-ref visited (pointer-address address))
              (loop i)
              (begin
                (hashv-set! visited (pointer-address address) #t)
                (let* ((tag  (pointer-address (dereference-pointer address)))
                       (type (heap-tag->type-name tag))
                       (size (match type
                               ('pair (* 2 (sizeof '*)))
                               ('vector
                                (min (ash tag -8)
                                     (object-size address)))
                               (_ (object-size address)))))
                  ;; (when (eq? 'unknown type)
                  ;;   (pk (object-size address)))
                  ;; (when (eq? 'vector type)
                  ;;   (pk 'vector size 'tag tag 'addr address 'vs (object-size address)))
                  (hashq-set! objects type
                              (match (hashq-ref objects type '(0 . 0))
                                ((count . total)
                                 (cons (+ count 1) (+ total size))))))
                (loop (- i 1)))))))
    (let ((grand-total (hash-fold (lambda (type stats result)
                                    (match stats
                                      ((_ . total)
                                       (+ total result))))
                                  0
                                  objects)))
      (format #t "  %   type                               self    avg obj size~%")
      (for-each (match-lambda
                  ((type . (count . total))
                   (format #t "~5,1f ~30a ~14h ~7,1f~%"
                           (* 100. (/ total grand-total))
                           type total
                           (/ total count 1.))))
                (sort (hash-map->list cons objects)
                      (match-lambda*
                        (((_ . (count1 . total1)) (_ . (count2 . total2)))
                         (or (> total1 total2)
                             (and (= total1 total2)
                                  (> count1 count2)))))))
      (format #t "sampled heap: ~h MiB (heap size: ~h MiB)~%"
              (/ grand-total (expt 2. 20))
              (/ heap-size (expt 2. 20))))))

(define (heap-samples type count)
  "Sample COUNT objects of the given TYPE, a symbol such as 'vector, and
return them.

WARNING: This can crash your application as this could pick bogus or
finalized objects."
  (define heap
    (heap-sections))

  (let ((visited (make-hash-table)))
    (let loop ((i count)
               (objects '()))
      (if (zero? i)
          objects
          (let ((address (random-valid-address heap)))
            (if (hashv-ref visited (pointer-address address))
                (loop i objects)
                (begin
                  (hashv-set! visited (pointer-address address) #t)
                  (let ((tag (pointer-address (dereference-pointer address))))
                    (if (eq? type (heap-tag->type-name tag))
                        (loop (- i 1)
                              (cons (pointer->scm address) objects))
                        (loop i objects))))))))))

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Heap profiler
  2022-11-07 11:03 Heap profiler Ludovic Courtès
@ 2022-11-12  9:24 ` zimoun
  2022-11-13  3:29   ` Maxim Cournoyer
  0 siblings, 1 reply; 4+ messages in thread
From: zimoun @ 2022-11-12  9:24 UTC (permalink / raw)
  To: Ludovic Courtès, guile-user

Hi,

On Mon, 07 Nov 2022 at 12:03, Ludovic Courtès <ludo@gnu.org> wrote:

> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (profile-heap)

[...]

> sampled heap: 3.48865 MiB (heap size: 12.78906 MiB)
> $5 = #t
> --8<---------------cut here---------------end--------------->8---
>
> It samples the GC-managed heap and counts the number and size of objects
> of each type.  The “unknown” bit is anything that lacks a type tag, such
> as stacks allocated for delimited continuations by ‘capture_stack’ in
> libguile.

It could nice to have it by default. For instance under ’,help profile’
with something like ’,heap [,ph] - Profile heap memory executation’.  Or
elsewhere as ’,help system’ close to ,gc.


Cheers,
simon



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Heap profiler
  2022-11-12  9:24 ` zimoun
@ 2022-11-13  3:29   ` Maxim Cournoyer
  0 siblings, 0 replies; 4+ messages in thread
From: Maxim Cournoyer @ 2022-11-13  3:29 UTC (permalink / raw)
  To: zimoun; +Cc: Ludovic Courtès, guile-user

Hi,

zimoun <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Mon, 07 Nov 2022 at 12:03, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> --8<---------------cut here---------------start------------->8---
>> scheme@(guile-user)> (profile-heap)
>
> [...]
>
>> sampled heap: 3.48865 MiB (heap size: 12.78906 MiB)
>> $5 = #t
>> --8<---------------cut here---------------end--------------->8---
>>
>> It samples the GC-managed heap and counts the number and size of objects
>> of each type.  The “unknown” bit is anything that lacks a type tag, such
>> as stacks allocated for delimited continuations by ‘capture_stack’ in
>> libguile.
>
> It could nice to have it by default. For instance under ’,help profile’
> with something like ’,heap [,ph] - Profile heap memory executation’.  Or
> elsewhere as ’,help system’ close to ,gc.

+1 :-)

-- 
Thanks,
Maxim



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Heap profiler
  2010-07-31 11:48           ` Andy Wingo
@ 2010-08-15 15:12             ` Ludovic Courtès
  0 siblings, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2010-08-15 15:12 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-user, Tibi Turbureanu

Hello!

Andy Wingo <wingo@pobox.com> writes:

> To really track this down we need a heap profiler. To make a heap
> profiler, we need to hack libgc.

As a start, I suggest looking at these:

  - http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/3392

  - “Understanding Memory allocations of Scheme Programs”, Serrano &
    Boehm, 2000, http://www-sop.inria.fr/mimosa/fp/Bigloo/bigloo-5.html

I miss ‘gc-live-object-stats’...

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-11-13  3:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-07 11:03 Heap profiler Ludovic Courtès
2022-11-12  9:24 ` zimoun
2022-11-13  3:29   ` Maxim Cournoyer
  -- strict thread matches above, loose matches on Subject: below --
2010-07-19 18:08 Possible Memory Leak with stream-for-each Abhijeet More
2010-07-24 16:13 ` Ludovic Courtès
2010-07-24 16:32   ` Abhijeet More
2010-07-24 16:46     ` Abhijeet More
2010-07-26  9:36       ` Andy Wingo
2010-07-30  0:38         ` Abhijeet More
2010-07-31 11:48           ` Andy Wingo
2010-08-15 15:12             ` Heap profiler Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).