all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* Identifying sources of allocations in a toy Mandelbrot package
@ 2024-01-17 12:39 Psionic K
  2024-01-17 12:58 ` Eli Zaretskii
  2024-01-17 13:25 ` Emanuel Berg
  0 siblings, 2 replies; 15+ messages in thread
From: Psionic K @ 2024-01-17 12:39 UTC (permalink / raw)
  To: help-gnu-emacs

I was toying around with compilation speed trying to eliminate
allocations and discovered that my package was spending about 40% of
its runtime in GC  (observed in `gc-elapsed').  I didn't suspect the
situation could be this intense until turning on GC messages.

I wrapped the command's body in some temporary GC settings and then
found values I was comfortable with in terms of tradeoffs for
resulting resident memory consumption and time spent GC'ing.

By setting absurd values that wouldn't GC until the program finished,
I was able to reproduce freezes I had experienced in the past where
Emacs was evidently GC'ing forever.

All that was fun, but what I don't understand is where this package
manages to consume up to 7GB of memory, which Emacs will not readily
give back afterwards.  How can I identify causes of allocations until
I gain familiarity with which expressions typically cause them?  To my
naive eyes, there are no allocations in the hot section of the code,
but that is very untrue given the behavior.

I want to also remark that `gc-cons-percentage' does not seem to be a
percentage but a factor.  If it is a percentage, I don't understand
it's behavior because any value larger than 2.0 seems to lead Emacs to
using all of the available memory.

Package follows.  Enjoy, but as configured, Emacs will be hovering at
around 1.3GB after running:

    ;;; mandelbrot.el --- Mandelbrot -*- lexical-binding: t; -*-

    ;;; Commentary:

    ;; Render's Mandelbrot set.

    ;;; Code:

    (eval-when-compile (require 'cl-lib))

    ;;;###autoload
    (defun mandelbrot ()
      "Render a mandelbrot."
      (declare (speed 3))
      (interactive)
      (pop-to-buffer (get-buffer-create "*mandelbrot*"))
      (let ((gc-cons-threshold (* 800000 1024))
            (gc-cons-percentage 1.0))
        (let* ((cx) (cy) (zr) (zi)
               (inhibit-modification-hooks t)
               (w 1600) (h 1200) (d 256)
               (output (make-vector (* w h 3) 0))
               (idx 0)
               (x0 -2.5) (y0 -1.5) (dx 4.0) (dy 3.0)
               (dxp (/ dx w)) (dyp (/ dy h)))
          (fundamental-mode) (erase-buffer)
          (set-buffer-multibyte nil)
          (insert (format "P6\n%d %d\n255\n" w h))
          (dotimes (y h)
            (dotimes (x w)
              (setq cx (+ x0 (* dxp x))
                    cy (+ y0 (* dyp y))
                    zr 0
                    zi 0)
              (let ((v (cl-dotimes (v d d)
                         (if (> (+ (* zr zr) (* zi zi)) 4) (cl-return v)
                           (cl-psetq
                            zr (+ (* zr zr) (- (* zi zi)) cx)
                            zi (+ (* (* zr zi) 2) cy))))))
                ;; exponential 0.8 enhances contrast a bit
                (let ((out (floor (* 256 (expt (/ (float v) d) 0.8)))))
                  (aset output idx out)
                  (aset output (+ idx 1) out)
                  (aset output (+ idx 2) out)
                  (setq idx (+ idx 3))))))
          (insert (concat output))
          (image-mode))))

    (provide 'mandelbrot)
    ;;; mandelbrot.el ends here.



^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Identifying sources of allocations in a toy Mandelbrot package
@ 2024-01-19  9:19 Psionic K
  2024-01-19 15:33 ` Tomas Hlavaty
  2024-01-19 15:44 ` Eli Zaretskii
  0 siblings, 2 replies; 15+ messages in thread
From: Psionic K @ 2024-01-19  9:19 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: incal, Eli Zaretskii

Most importantly to my question, how can I identify sources of consing
without just knowing which functions and macros pathologically cons?
Is there a way to allow usage of stack by the byte or native compiler?

> You have a pretty big vector

The vector is allocated once and not on the same scale as usage,
measured by profiling or by resident memory in top.

If I let Emacs run with no GC, it will consume about 10GB before it
starts swapping and becomes unresponsive until I kill it.

> and a lot of nested iteration

I would have expected this to consume only stack memory, but it seems
that neither native compiled nor bytecode can use memory in a
stack-like way?  I'm not sure what the Elisp manual on stack allocated
objects means for native or byte compiled functions.

> cl-psetq conses.

After removing this for a combination of setq's and then removing two
unnecessary `cl-dotimes` expressions that used no CL features, I've
seen a second-run reduction down to about 8-9s.

The first run is quite a bit longer.  Although both runs GC 4-5 times,
the first run of an Emacs session is about 35s versus 8s for a second
run.  `gc-elapsed' after the first run goes from near zero to 33.7s.

The result is still on the same order of magnitude in terms of memory
usage.  The only behavior consistent with earlier memory profiling I
did is if every single floating point value ever calculated lives on a
unique piece of heap until GC'd.  It as if heap is consumed in a
straight line and there is no stack memory.

Perhaps concerning is that Emacs continues to consume the same
resident memory, about 2.1GB, indefinitely after the first run.  Is
this simply expected from a sprinkling of objects that Emacs cannot GC
and therefore never give back memory?

Is this unique to floating points?  Do I still have latent consing in
this version?  Thanks in advance.

    ;;; mandelbrot.el --- Mandelbrot -*- lexical-binding: t; -*-

    ;;; Commentary:

    ;; Render's Mandelbrot set.

    ;;; Code:

    (eval-when-compile (require 'cl-lib))

    ;;;###autoload
    (defun mandelbrot ()
      "Render a mandelbrot."
      (declare (speed 3))
      (interactive)
      (pop-to-buffer (get-buffer-create "*mandelbrot*"))
      (let ((gc-cons-threshold (* 800000 2048))
            (gc-cons-percentage 1.0))
        (let* ((cx) (cy) (zr) (zi) (v) (out) (zrt)
               (inhibit-modification-hooks t)
               (w 1600) (h 1200) (d 256)
               (output (make-vector (* w h 3) 0))
               (idx 0)
               (x0 -2.5) (y0 -1.5) (dx 4.0) (dy 3.0)
               (dxp (/ dx w)) (dyp (/ dy h)))
          (fundamental-mode) (erase-buffer)
          (set-buffer-multibyte nil)
          (insert (format "P6\n%d %d\n255\n" w h))
          (dotimes (y h)
            (dotimes (x w)
              (setf cx (+ x0 (* dxp x)))
              (setf cy (+ y0 (* dyp y)))
              (setq zr 0)
              (setq zi 0)
              (setq v (cl-dotimes (v d d)
                        (if (> (+ (* zr zr) (* zi zi)) 4) (cl-return v)
                          (setq zrt (+ (* zr zr) (- (* zi zi)) cx))
                          (setq zi (+ (* (* zr zi) 2) cy))
                          (setq zr zrt))))
              ;; samples that hit 256 wrap in the graphic to display as zero
              ;; exponential 0.8 enhances contrast a bit
              (setq out (floor (* 256 (expt (/ (float v) d) 0.8))))
              (aset output idx out)
              (aset output (+ idx 1) out)
              (aset output (+ idx 2) out)
              (setq idx (+ idx 3))))
          (insert (concat output))
          (image-mode))))

    (provide 'mandelbrot)
    ;;; mandelbrot.el ends here.



^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Identifying sources of allocations in a toy Mandelbrot package
@ 2024-01-27  9:25 Psionic K
  0 siblings, 0 replies; 15+ messages in thread
From: Psionic K @ 2024-01-27  9:25 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: Tomas Hlavaty

[-- Attachment #1: Type: text/plain, Size: 2015 bytes --]

I wanted to dig into the CPU versus memory limitation a bit.

I switched back to writing to a vector, not touching any buffers to
wrap the profile around the hot section.  CPU + mem results:

fixed point byte-compiled:       195 samples     3,890,391 bytes
fixed point native compiled:     250 samples     4,205,335 bytes
floating point byte-compiled:    560 samples   221,858,239 bytes
floating point native compiled:  409 samples   221,732,687 bytes

* There is no typo in the native & byte compiled fixed point results.
I made a few runs.  It is slower when native compiled.
* I found no other combination of `defsubst' `defun' or declaring
speed or not that could get the native compiled fixed point version to
be faster than byte compiled.
* The floating point version was only measured when the run did not
trigger GC.  This inflates its performance because GC cost, about 30%,
is not shown.

While the fixed point is fast considering it has to do more work,
perhaps if the native compiler didn't manage to slow it down, it could
be faster.

Regarding my previous statements about memory bandwidth being the
limiting factor, while the memory used is on the lower end of an order
of magnitude within the random write bandwidth available.  If that
amount was limiting, the compilation method would not be expected to
affect performance much.  Since the native compiled floating point is
faster, it is likely CPU bound.  Consumption is just a proxy for
bandwidth used.  Runtimes use bandwidth without new allocation.
Better tools are needed to answer this conclusively.

I may continue this exercise.  My initial idea to create clarity is by
writing a function that just throws away conses that hold values that
require 1, 10, 100, or 1000 steps to compute.  If the performance
scales inversely by conses but not by steps, we can isolate memory
versus computation as the bottleneck for some workflows and provide a
benchmark for the compiler and GC to be smarter about not generating
garbage in the first place.

[-- Attachment #2: fixed.el --]
[-- Type: text/x-emacs-lisp, Size: 2093 bytes --]

;;; -*- lexical-binding: t -*-

;; samples that hit 256 wrap in the graphic to display as zero
;; exponential 0.8 enhances contrast a bit
(let ((a (make-vector 257 0)))
  (dotimes (v 256)
    (aset a v (floor (* 256 (expt (/ v 256.0) 0.8)))))
  (define-inline contrast (v)
    (aref a v)))

(defconst +1fx+ (expt 2 12))

(define-inline fx (x)
  (* x +1fx+))

(define-inline fx2 (n d)
  (/ (* n +1fx+) d))

(defconst +4fx+ (fx 4))

(define-inline fx* (l r)
  (/ (* l r) +1fx+))

(define-inline fx^2 (x)
  (fx* x x))

(define-inline mandelbrot1 (cx cy)
  (let ((zr 0)
	(zi 0)
	(v 0)
	zr^2
	zi^2)
    (while (and (< v 256) (<= (+ (setq zr^2 (fx^2 zr)) (setq zi^2 (fx^2 zi))) +4fx+))
      (setq v (1+ v))
      (let ((tmp (+ zr^2 (- zi^2) cx)))
	(setq zi (+ (* (fx* zr zi) 2) cy))
	(setq zr tmp)))
    v))

(defun mandelbrot (&optional w h)
  (declare (speed 3))
  (interactive)
  (let ((output (make-vector (* w h) 0.0))
        (idx 0))
    (let ((w (or w 1600))
          (h (or h 1200))
          (x0 (fx2 -25 10))
          (y0 (fx2 -15 10)))
      (let ((dxp (/ (fx 4) w))
	    (dyp (/ (fx 3) h)))
        (dotimes (y h)
	  (let ((cy (+ y0 (* dyp y))))
	    (dotimes (x w)
	      (let ((v (contrast (mandelbrot1 (+ x0 (* dxp x)) cy))))
	        (aset output idx v)
                (setq idx (1+ idx))))))))
    output))

(defun mandelbrot-bench ()
  (interactive)
  (let (output)
    (profiler-start 'cpu+mem)
    (setq output (mandelbrot 320 200))
    (profiler-stop)
    (profiler-report)
    (message "%S" output)))

;;(mandelbrot 320 200)
;;(mandelbrot 640 480)
;;(mandelbrot 1024 768)
;;(mandelbrot)

;; (profiler-start 'mem)
;; (mandelbrot 320 200)
;; (profiler-stop)
;; (profiler-report)

;;(profiler-start 'cpu)
;;(mandelbrot 320 200)
;;(profiler-stop)
;;(profiler-report)

;;(profiler-start 'cpu+mem)
;;(mandelbrot)
;;(profiler-stop)
;;(profiler-report)

;; (benchmark-run 1 (mandelbrot 320 200))










































































































;;(32.898215598 2 0.09366582000001245)
;;(51.629763289 2 0.09165389300000015)

[-- Attachment #3: floating.el --]
[-- Type: text/x-emacs-lisp, Size: 1730 bytes --]

;;; -*- lexical-binding: t -*-

(define-inline mandelbrot1 (cx cy)
  (let ((zr 0)
	(zi 0)
	(v 0))
    (while (and (< v 256) (<= (+ (* zr zr) (* zi zi)) 4))
      (setq v (1+ v))
      (let ((tmp (+ (* zr zr) (- (* zi zi)) cx)))
	(setq zi (+ (* (* zr zi) 2) cy))
	(setq zr tmp)))
    ;; samples that hit 256 wrap in the graphic to display as zero
    ;; exponential 0.8 enhances contrast a bit
    (floor (* 256 (expt (/ v 256.0) 0.8)))))

(defun mandelbrot (&optional w h)
  (declare (speed 3))
  (interactive)
  (let ((output (make-vector (* w h) 0.0))
        (idx 0))
    (let ((w (or w 1600))
	  (h (or h 1200))
          (x0 -2.5)
	  (y0 -1.5)
	  (dx 4.0)
	  (dy 3.0))
      (let ((dxp (/ dx w))
	    (dyp (/ dy h)))
        (dotimes (y h)
	  (let ((cy (+ y0 (* dyp y))))
	    (dotimes (x w)
	      (let ((v (mandelbrot1 (+ x0 (* dxp x)) cy)))
	        (aset output idx v)
                (setq idx (1+ idx))))))))
    output))

(defun mandelbrot-bench ()
  (interactive)
  (let (output)
    (profiler-start 'cpu+mem)
    (setq output (mandelbrot 320 200))
    (profiler-stop)
    (profiler-report)
    (message "%S" output)))

;;(mandelbrot 320 200)
;;(mandelbrot 640 480)
;;(mandelbrot 1024 768)
;;(mandelbrot)

;; (profiler-start 'mem)
;; (mandelbrot 320 200)
;; (profiler-stop)
;; (profiler-report)

;; (profiler-start 'cpu+mem)
;; (mandelbrot 320 200)
;; (profiler-stop)
;; (profiler-report)

;; (profiler-start 'cpu)
;; (mandelbrot 320 200)
;; (profiler-stop)
;; (profiler-report)

;;(profiler-start 'cpu+mem)
;;(mandelbrot)
;;(profiler-stop)
;;(profiler-report)

;; (benchmark-run 1 (mandelbrot))
;; (15.033354357 15 3.9534469799999954)
;;(120.40541861 1947 93.45048212499998)
;;(128.362728323 1942 93.44881820700004)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-01-27  9:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-17 12:39 Identifying sources of allocations in a toy Mandelbrot package Psionic K
2024-01-17 12:58 ` Eli Zaretskii
2024-01-17 13:25 ` Emanuel Berg
  -- strict thread matches above, loose matches on Subject: below --
2024-01-19  9:19 Psionic K
2024-01-19 15:33 ` Tomas Hlavaty
2024-01-20  3:14   ` Psionic K
2024-01-20  3:37     ` Psionic K
2024-01-20  7:29     ` Eli Zaretskii
2024-01-20  9:09     ` Tomas Hlavaty
2024-01-20 10:03       ` Psionic K
2024-01-20 10:31         ` Eli Zaretskii
2024-01-26 23:36         ` Tomas Hlavaty
2024-01-27  1:07           ` Psionic K
2024-01-19 15:44 ` Eli Zaretskii
2024-01-27  9:25 Psionic K

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.