From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: =?utf-8?Q?Ludovic_Court=C3=A8s?= Newsgroups: gmane.lisp.guile.user Subject: Heap profiler Date: Mon, 07 Nov 2022 12:03:46 +0100 Message-ID: <87k047uiul.fsf@inria.fr> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="12667"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) To: guile-user@gnu.org Original-X-From: guile-user-bounces+guile-user=m.gmane-mx.org@gnu.org Mon Nov 07 12:04:45 2022 Return-path: Envelope-to: guile-user@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1orzvp-00031m-AX for guile-user@m.gmane-mx.org; Mon, 07 Nov 2022 12:04:45 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1orzv1-0007BU-Pe; Mon, 07 Nov 2022 06:03:55 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1orzuw-0007AW-JO for guile-user@gnu.org; Mon, 07 Nov 2022 06:03:50 -0500 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1orzuw-0002PU-8w for guile-user@gnu.org; Mon, 07 Nov 2022 06:03:50 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:Date:Subject:To:From:in-reply-to: references; bh=lUjeu8asFW8yugmfknVHH9DVNZ1UStk8n/dYAITSviY=; b=cBTBe3MRPZc8s2 DTnbweDzfdYOsVqWO9ZiZSXZgQ9rHkqyaMZ3LIPowMTvaXgu4s4Grf+OKYar7dts0NWQNyMoGrrx2 bx5Oewd5UZ81ajEUt5axW1VkSgiuYLL3CZ26MHKynnlsetNiUCICxC34A3ZmAJg99c4G1hp2YYxsB IrF0S5N7+Xjmxr+QJtRJd+L7FE7luoppMlCrSXZ0eRHkp3tsgJssX5gLh62z3MIMG8B8Gf4jDJOzE rtLGVjPqPru3TFWC92Lf73I/CoBhttvgDYbhVMRlLMx7WjKmh2AdTgWAElAO5klvBD2CgclLxDSmJ iaqMdmcwkE/VQ6jElxSg==; Original-Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1orzuv-0003YQ-CY for guile-user@gnu.org; Mon, 07 Nov 2022 06:03:49 -0500 X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Septidi 17 Brumaire an 231 de la =?utf-8?Q?R=C3=A9vo?= =?utf-8?Q?lution=2C?= jour du Cresson X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: General Guile related discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane-mx.org@gnu.org Original-Sender: guile-user-bounces+guile-user=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.lisp.guile.user:18696 Archived-At: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hello Guilers, While desperately chasing and related memory leak issues, I came up with the attached rudimentary heap profiler. You can load it and invoking it in a running process: --8<---------------cut here---------------start------------->8--- scheme@(guile-user)> (profile-heap) % type self avg obj size 19.7 pair 720,864 16.0 16.3 unknown 594,832 600.8 14.7 struct 536,784 48.2 12.6 bytevector 461,824 1110.2 7.7 stringbuf 281,136 117.8 6.8 pointer 248,688 16.0 5.5 vector 202,815 35.4 4.1 symbol 148,640 32.0 3.1 program 113,824 40.0 1.6 heap-number 59,680 31.8 1.5 string 54,960 32.0 1.4 smob 52,736 38.0 1.3 variable 49,328 22.6 0.8 weak-table 30,144 30.4 0.8 atomic-box 28,528 32.1 0.8 vm-continuation 27,680 32.0 0.7 hash-table 26,736 32.1 0.2 syntax 6,144 48.0 0.1 dynamic-state 4,208 1052.0 0.1 primitive 2,880 16.0 0.1 weak-vector 1,984 18.0 0.0 keyword 752 16.7 0.0 bitvector 672 35.4 0.0 frame 624 39.0 0.0 primitive-generic 608 32.0 0.0 continuation 576 576.0 0.0 fluid 208 29.7 0.0 array 96 48.0 0.0 weak-set 96 48.0 0.0 port 64 32.0 sampled heap: 3.48865 MiB (heap size: 12.78906 MiB) $5 =3D #t --8<---------------cut here---------------end--------------->8--- It samples the GC-managed heap and counts the number and size of objects of each type. The =E2=80=9Cunknown=E2=80=9D bit is anything that lacks a t= ype tag, such as stacks allocated for delimited continuations by =E2=80=98capture_stack= =E2=80=99 in libguile. It gives a rough idea of what=E2=80=99s going on but of course it=E2=80=99s= intrusive: the profiling process itself allocates memory. The next step will be to run it from GDB so that it=E2=80=99s non-intrusive. I=E2=80=99d be curious to know if people have developed similar tools in th= is area. Ludo=E2=80=99. --=-=-= Content-Type: text/plain; charset=utf-8 Content-Disposition: inline; filename=heap-profiler.scm Content-Transfer-Encoding: quoted-printable Content-Description: the heap profiler ;;; Copyright =C2=A9 2022 Ludovic Court=C3=A8s ;;; ;;; Distributed under the GNU Lesser General Public License, version 3 or (= at ;;; your option) any later version. (use-modules (system foreign) (system base types internal) ;; ((system base types) #:select (scm->object)) (srfi srfi-1) (srfi srfi-9 gnu) (ice-9 match) (ice-9 control) (ice-9 format) (ice-9 rdelim) (ice-9 regex)) (define-immutable-record-type (memory-mapping start end permissions name) memory-mapping? (start memory-mapping-start) (end memory-mapping-end) (permissions memory-mapping-permissions) (name memory-mapping-name)) (define (memory-mappings pid) ;based on Guile's 'gc-profile.sc= m' "Return an list of alists, each of which contains information about a mem= ory mapping of process @var{pid}. This information is obtained by reading @file{/proc/PID/maps} on Linux. See `procs(5)' for details." (define mapping-line-rx ;; As of Linux 2.6.32.28, an `maps' line looks like this: ;; "00400000-0041d000 r--p 00000000 fd:00 7926441 /bin/cat". (make-regexp "^([[:xdigit:]]+)-([[:xdigit:]]+) ([rwx-]{3}[ps]) ([[:xdigit:]]+) (fd|= [[:xdigit:]]{2}):[[:xdigit:]]{2} [0-9]+[[:blank:]]+(.*)$")) (call-with-input-file (format #f "/proc/~a/maps" pid) (lambda (port) (let loop ((result '())) (match (read-line port) ((? eof-object?) (reverse result)) (line (cond ((regexp-exec mapping-line-rx line) =3D> (lambda (match) (let ((start (string->number (match:substring match 1) 16)) (end (string->number (match:substring match 2) 16)) (perms (match:substring match 3)) (name (match:substring match 6))) (loop (cons (memory-mapping start end perms (if (string=3D? name "") #f name)) result))))) (else (loop result))))))))) ;; (define random-valid-address ;; ;; XXX: This is only in libgc with back pointers. ;; (let ((ptr (false-if-exception ;; (dynamic-func "GC_generate_random_valid_address" (dynamic-= link))))) ;; (if ptr ;; (pointer->procedure '* ptr '()) ;; (const #f)))) (define (heap-sections) (filter (lambda (mapping) (and (not (memory-mapping-name mapping)) (string=3D? "rw-p" (memory-mapping-permissions mapping)))) (memory-mappings (getpid)))) (define (random-valid-address heap-sections) ;; Mimic 'GC_generate_random_valid_address', which is only available with ;; '-DBACK_PTRS' builds of libgc. (define heap-size (fold (lambda (mapping size) (+ size (- (memory-mapping-end mapping) (memory-mapping-start mapping)))) 0 heap-sections)) (let loop ((sections heap-sections) (size 0) (offset (random heap-size))) (match sections (() #f) ((section . rest) (let* ((start (memory-mapping-start section)) (end (memory-mapping-end section)) (section-size (- end start))) (if (< offset section-size) (let ((result (base-pointer (+ start offset)))) ;; (pk 'p (number->string (+ start offset) 16) result) (if (null-pointer? result) (loop heap-sections 0 (random heap-size)) ;retry result)) (loop rest (+ size section-size) (- offset section-size)))))))) (define object-size (pointer->procedure size_t (dynamic-func "GC_size" (dynamic-link)) '(*))) (define base-pointer (pointer->procedure '* (dynamic-func "GC_base" (dynamic-link)) (list uintptr_t))) (define (heap-tag->type-name word) "Return the type name as a symbol corresponding to the tag WORD." (match (let/ec return (let-syntax ((tag-name (syntax-rules () ((_ name pred mask tag) (when (=3D (logand word mask) tag) (return 'name)))))) (visit-heap-tags tag-name) 'unknown)) ('program (cond ((=3D (logand word #x1000) #x1000) 'partial-continuation) ((=3D (logand word #x2000) #x2000) 'foreign-program) ((=3D (logand word #x800) #x800) 'continuation) ((=3D (logand word #x400) #x400) 'primitive-generic) ((=3D (logand word #x200) #x200) 'primitive) ((=3D (logand word #x100) #x100) 'boot-program) (else 'program))) (type type))) (define* (profile-heap #:key (sample-count 100000)) "Pick SAMPLE-COUNT addresses in the GC-managed heap and display a profile of this sample per data type." (define heap-size (assoc-ref (gc-stats) 'heap-size)) (define heap (heap-sections)) (let ((objects (make-hash-table 57)) (visited (make-hash-table))) (let loop ((i sample-count)) (unless (zero? i) (let ((address (random-valid-address heap))) (if (hashv-ref visited (pointer-address address)) (loop i) (begin (hashv-set! visited (pointer-address address) #t) (let* ((tag (pointer-address (dereference-pointer address)= )) (type (heap-tag->type-name tag)) (size (match type ('pair (* 2 (sizeof '*))) ('vector (min (ash tag -8) (object-size address))) (_ (object-size address))))) ;; (when (eq? 'unknown type) ;; (pk (object-size address))) ;; (when (eq? 'vector type) ;; (pk 'vector size 'tag tag 'addr address 'vs (object-= size address))) (hashq-set! objects type (match (hashq-ref objects type '(0 . 0)) ((count . total) (cons (+ count 1) (+ total size)))))) (loop (- i 1))))))) (let ((grand-total (hash-fold (lambda (type stats result) (match stats ((_ . total) (+ total result)))) 0 objects))) (format #t " % type self avg obj = size~%") (for-each (match-lambda ((type . (count . total)) (format #t "~5,1f ~30a ~14h ~7,1f~%" (* 100. (/ total grand-total)) type total (/ total count 1.)))) (sort (hash-map->list cons objects) (match-lambda* (((_ . (count1 . total1)) (_ . (count2 . total2))) (or (> total1 total2) (and (=3D total1 total2) (> count1 count2))))))) (format #t "sampled heap: ~h MiB (heap size: ~h MiB)~%" (/ grand-total (expt 2. 20)) (/ heap-size (expt 2. 20)))))) (define (heap-samples type count) "Sample COUNT objects of the given TYPE, a symbol such as 'vector, and return them. WARNING: This can crash your application as this could pick bogus or finalized objects." (define heap (heap-sections)) (let ((visited (make-hash-table))) (let loop ((i count) (objects '())) (if (zero? i) objects (let ((address (random-valid-address heap))) (if (hashv-ref visited (pointer-address address)) (loop i objects) (begin (hashv-set! visited (pointer-address address) #t) (let ((tag (pointer-address (dereference-pointer address)= ))) (if (eq? type (heap-tag->type-name tag)) (loop (- i 1) (cons (pointer->scm address) objects)) (loop i objects)))))))))) --=-=-=--