Hi Guilers,

after reading that performance gets much better when the heap is bigger (trading increased memory usage), I was intrigued and ran the r7rs benchmarks by ecraven¹ with increased heap sizes. Then I took the geometric mean of the slowdowns compared to the fastest run to get an estimate of the effect.²³

First off: this is not a shot against whippet. I just saw that we might not have optimized what we have for current machines (the last change to that part of the code was in 2013). Since the goal is to see whether whippet is a better algorithm and not just better parameter tuning, I think we should optimize what we have as well as we can.

Long story short: below is the geometric mean slowdown of each configuration I tried. For each test I divide the runtime by the runtime of the fastest run; then I multiply all these ratios together and take the nth root (n being the number of tests). A small sketch of that computation follows.
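For concreteness, this is roughly that computation in Guile Scheme; the procedure name and the shape of the input alists are mine, just for illustration, not part of the benchmark harness:

    ;; Geometric mean of per-test slowdowns for one configuration.
    ;; `runtimes` is an alist of (test-name . seconds) for this run;
    ;; `fastest` is an alist mapping each test-name to the fastest
    ;; runtime seen across all configurations.
    (define (geometric-mean-slowdown runtimes fastest)
      (let* ((slowdowns (map (lambda (entry)
                               (/ (cdr entry)
                                  (assoc-ref fastest (car entry))))
                             runtimes))
             (n (length slowdowns)))
        ;; nth root of the product of the slowdowns
        (expt (apply * slowdowns) (/ 1 n))))

    ;; e.g. (geometric-mean-slowdown '(("ctak" . 2.0) ("fibc" . 3.0))
    ;;                               '(("ctak" . 1.0) ("fibc" . 2.0)))
    ;; => ~1.73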
You can see the different runs speed up from 1.36 to 1.08 (so the fastest saves 20% of the runtime compared to the slowest):

Guile-3.0.8-Release-Guix                 1.3635622029749705
Guile-3.0.8.32-C7465-Initial-Heap-Same   1.28468658287013
Guile-3.0.8.32-C7465-Initial-Heap-X2     1.2026211277628647
Guile-3.0.8.32-C7465-Initial-Heap-X03    1.1420637378011338
Guile-3.0.8.32-C7465-Initial-Heap-Y4     1.115306570523714
Guile-3.0.8.32-C7465-Initial-Heap-X4     1.1152983530832015
Guile-3.0.8.32-C7465-Initial-Heap-Y8     1.1010439505355267
Guile-3.0.8.32-C7465-Initial-Heap-X8     1.0963485672806603
Guile-3.0.8.32-C7465-Initial-Heap-Y16    1.0865560807057923
Guile-3.0.8.32-C7465-Initial-Heap-X32    1.0794187284545695

The "Initial-Heap-Same" run uses the same default initial heap size as the release, but is compiled on my machine in the same way as the later runs, to have a fair comparison. So the realistic saving is from 1.28 to 1.08: about 15% of the runtime.

The biggest slowdowns with the original setting (Guile-3.0.8.32-C7465-Initial-Heap-Same) are

(("string:500000:100" . 2.804365797853303)
 ("pi:50:500:50:100" . 1.329592551153582)
 ("fibc:30:10" . 1.9593681770110587)
 ("ctak:27:16:8:1" . 1.7223180947713728)
 ("puzzle:1000" . 1.2507733865679516)
 ("divrec:1000:1000000" . 1.770196079435263)
 ("diviter:1000:1000000" . 1.6470513953889132)
 ("destruc:600:50:4000" . 1.4069984209535633)
 ("deriv:10000000" . 1.5446282315616486)
 ("browse:2000" . 1.3562114814384538))

With the Guix release, these are worse; it might just be the missing CPU optimization:

(("string:500000:100" . 3.8170164716409203)
 ("pi:50:500:50:100" . 10.019772974391747)
 ("fibc:30:10" . 1.7577252524431337)
 ("ctak:27:16:8:1" . 2.189032714421155)
 ("puzzle:1000" . 1.3845551513952115)
 ("divrec:1000:1000000" . 1.5390117526035674)
 ("diviter:1000:1000000" . 1.6519778833202012)
 ("destruc:600:50:4000" . 1.3818820105624317)
 ("deriv:10000000" . 1.4957537102777774)
 ("browse:2000" . 1.3667013607333778))

(The difference in pi in particular might be explained by my build being compiled for my specific CPU?)

For X2 (doubled default initial heap size), these are

(("string:500000:100" . 3.0659126833785826)
 ("pi:50:500:50:100" . 1.070977435388164)
 ("fibc:30:10" . 1.5787505744881367)
 ("ctak:27:16:8:1" . 1.9081576466641825)
 ("puzzle:1000" . 1.4067164190079613)
 ("divrec:1000:1000000" . 1.2700549704431958)
 ("diviter:1000:1000000" . 1.2804407603111705)
 ("destruc:600:50:4000" . 1.2165780508774755)
 ("deriv:10000000" . 1.2148018600354562)
 ("browse:2000" . 1.1392901841291523))

For X3:

(("string:500000:100" . 2.0856082228400274)
 ("pi:50:500:50:100" . 1.0)
 ("fibc:30:10" . 1.1895650934858277)
 ("ctak:27:16:8:1" . 1.2368626373551983)
 ("puzzle:1000" . 1.1086368282261354)
 ("divrec:1000:1000000" . 1.1689640018971026)
 ("diviter:1000:1000000" . 1.214446692814547)
 ("destruc:600:50:4000" . 1.1610110879917723)
 ("deriv:10000000" . 1.1918170314242649)
 ("browse:2000" . 1.085672027420832))

For X4:

(("string:500000:100" . 1.442325198861453)
 ("pi:50:500:50:100" . 1.0392308205992176)
 ("fibc:30:10" . 1.11921913729854)
 ("ctak:27:16:8:1" . 1.0730039630449966)
 ("puzzle:1000" . 1.0601949486754898)
 ("divrec:1000:1000000" . 1.114130496950625)
 ("diviter:1000:1000000" . 1.120809472425336)
 ("destruc:600:50:4000" . 1.1238746226148861)
 ("deriv:10000000" . 1.03309750321746)
 ("browse:2000" . 1.0503206501571072))

The inverse effect shows up in memory usage, as reported by the Gnome system monitor (the memory column; not shared, not reserved, not virtual). I only checked this by eye, so take it with a grain of salt. With the original setting, most benchmarks take around 4-6 MB of memory:

- x1:  4-6 MB
- x2:  6-9 MB
- x3:  4-10 MB
- x4:  ~11 MB
- x32: 4-74 MB

So this is a tradeoff, and x3 looks good to me: about 50% more memory usage for long-running services in exchange for saving 15% of the geometric mean runtime, with some cases saving much more. For example, ctak (a typical hard case) saves 44% of the runtime, and divrec and fibc save 30%.

When I start Guile just with sleep, it takes 2.9 MB of memory regardless of the default initial heap size, and it is already at 6 MB of shared memory, so I think the usual CPU caches should not be impacted negatively:

guile -q -c "(sleep 30)"

So I would suggest increasing the default initial heap size by a factor of 3.

Another good test of whether this is the right path to follow could be to run a Guix derivation and see how its runtime changes. One more good test would be to run the LilyPond benchmarks with this change, because those have some quite performance-sensitive cases that people actually depend on for their daily work.

What do you think? Here's the change I'd make:

diff --git a/libguile/gc.c b/libguile/gc.c
index 7717e9bef..5a4ab5c6d 100644
--- a/libguile/gc.c
+++ b/libguile/gc.c
@@ -65,10 +65,12 @@
 /* Size in bytes of the initial heap.  This should be about the size of
-   result of 'guile -c "(display (assq-ref (gc-stats)
-   'heap-total-allocated))"'.  */
+   3 x the result of 'guile -c "(display (assq-ref (gc-stats)
+   'heap-total-allocated))"'.
+   Increased from the minimum heap by factor 3 to avoid collections
+   during common short-running tasks. */
-#define DEFAULT_INITIAL_HEAP_SIZE (256 * 1024 * SIZEOF_UINTPTR_T)
+#define DEFAULT_INITIAL_HEAP_SIZE (3 * 256 * 1024 * SIZEOF_UINTPTR_T)
 
 /* Set this to != 0 if every cell that is accessed shall be checked: */

I attached the full output of the evaluation and the benchmarks, in case you want to look into the details.
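P.S.: For anyone who wants to sanity-check the factor on their own workload, here is a small sketch; it relies only on the standard gc-stats procedure that the comment in gc.c already references (the choice of which keys to print is mine):

    ;; Print the current heap size and the total bytes allocated so far;
    ;; both keys are part of the alist returned by Guile's gc-stats.
    (use-modules (ice-9 format))
    (for-each (lambda (key)
                (format #t "~a: ~a~%" key (assq-ref (gc-stats) key)))
              '(heap-size heap-total-allocated))

On 64-bit, the patched default works out to 3 * 256 * 1024 * 8 bytes = 6 MiB, so a short-running task whose heap-total-allocated stays below that figure should, roughly, finish without triggering a collection.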