Hello, This is a followup to the compiler issue, this time focusing on execution time, trying to see why the compiler takes so long to compile gnu/packages/*.scm, which should be trivial. As a test bed, I built this from the Guix tree: GUILE_LOAD_COMPILED_PATH=$PWD guild compile \ -L $PWD -o t.cps --to=cps gnu/packages/python.scm This results in a 428,185-line file, with that many CPS labels. If we profile the ‘emit-bytecode’ procedure of (language cps compile-bytecode), we get this: --8<---------------cut here---------------start------------->8--- $ guile --debug profile-cps.scm ;;; (go!) GC Warning: Repeated allocation of very large block (appr. size 268439552): May lead to memory leak and poor performance % cumulative self time seconds seconds procedure 15.41 54.63 10.25 language/cps/intset.scm:468:5:visit-branch 10.72 7.13 7.13 language/cps/intmap.scm:394:0:intmap-ref 7.97 20.31 5.30 language/cps/intset.scm:547:2:union 6.63 4.41 4.41 vector-copy 5.96 3.96 3.96 language/cps/intmap.scm:184:0:intmap-add! 4.91 236177.43 3.27 language/cps/intset.scm:470:5:visit-branch 4.50 23.45 3.00 language/cps/intmap.scm:247:2:adjoin 3.13 3.89 2.08 language/cps/intset.scm:269:0:intset-add 2.72 12.75 1.81 language/cps/intmap.scm:246:0:intmap-add 2.31 384.90 1.54 language/cps/intmap.scm:515:5:visit-branch 2.12 1.41 1.41 language/cps/intset.scm:149:0:add-level 1.94 5.67 1.29 language/cps/intmap.scm:104:0:clone-branch-and-set 1.86 1.24 1.24 language/cps/intset.scm:349:0:intset-ref 1.79 2.15 1.19 language/cps/intset.scm:270:2:adjoin 1.79 1.24 1.19 language/cps/intset.scm:204:0:intset-add! 1.64 1.41 1.09 ice-9/vlist.scm:539:0:vhash-assq 1.60 1.06 1.06 language/cps/intset.scm:466:2:intset-fold 1.45 4.23 0.97 language/cps/intset.scm:382:2:visit-node 1.08 2.53 0.72 language/cps/intset.scm:316:2:remove 1.01 6.76 0.67 language/cps/intset.scm:496:0:intset-union 0.89 0.64 0.59 language/cps/intmap.scm:290:0:intmap-replace! 0.82 0.54 0.54 system/vm/linker.scm:176:0:string-table-intern! 0.78 0.59 0.52 language/cps/intset.scm:579:0:intset-intersect 0.74 0.50 0.50 language/cps/intset.scm:155:0:make-intset/prune 0.67 0.59 0.45 language/cps/intset.scm:725:2:subtract-nodes 0.63 0.54 0.42 ice-9/vlist.scm:534:0:vhash-assoc 0.56 8.82 0.37 language/cps/utils.scm:514:0:solve-flow-equations 0.52 0.35 0.35 put-u8 0.52 0.35 0.35 hashq 0.41 6.31 0.27 srfi/srfi-1.scm:458:2:fold 0.41 0.27 0.27 language/cps/utils.scm:381:15 0.34 236148.90 0.22 language/cps/utils.scm:364:2:visit 0.34 1.04 0.22 language/cps/intset.scm:315:0:intset-remove 0.30 1.93 0.20 language/cps/utils.scm:394:4:visit-scc 0.26 3.32 0.17 language/cps/utils.scm:241:3:visit-cont 0.26 2.10 0.17 language/cps/compile-bytecode.scm:539:4:compile-cont 0.26 0.82 0.17 language/cps/slot-allocation.scm:156:4 0.26 0.35 0.17 language/cps/slot-allocation.scm:792:3 0.26 0.17 0.17 language/cps/utils.scm:383:18 0.22 3.44 0.15 language/cps/slot-allocation.scm:1015:4:allocate-cont 0.22 0.77 0.15 language/cps/intset.scm:694:0:intset-subtract 0.22 0.64 0.15 language/cps/slot-allocation.scm:278:16 0.22 0.45 0.15 language/cps/intset.scm:755:8:lp 0.22 0.27 0.15 language/cps/slot-allocation.scm:325:3 0.22 0.17 0.15 language/cps/slot-allocation.scm:400:16 0.22 0.15 0.15 gensym 0.22 0.15 0.15 assq-ref 0.22 0.15 0.15 hash 0.19 4.19 0.12 language/cps/utils.scm:127:16 0.19 1.21 0.12 language/cps/slot-allocation.scm:636:2:compute-shuffles 0.19 1.09 0.12 language/cps/intset.scm:367:0:intset-next 0.19 1.02 0.12 system/vm/assembler.scm:2019:2:write-arity 0.19 0.69 0.12 language/cps/slot-allocation.scm:874:19 0.19 0.17 0.12 ice-9/vlist.scm:449:0:vhash-cons 0.15 8.84 0.10 language/cps/utils.scm:306:4:visit 0.15 2.77 0.10 language/cps/intmap.scm:524:5:visit-branch 0.15 1.63 0.10 system/vm/linker.scm:357:0:add-symbols 0.15 0.89 0.10 language/cps/utils.scm:367:23 0.15 0.45 0.10 system/vm/assembler.scm:2249:0:link-debug 0.15 0.30 0.10 language/cps/slot-allocation.scm:677:2:measure-cont 0.15 0.20 0.10 language/cps/slot-allocation.scm:873:4:compute-live-slots* 0.15 0.10 0.10 system/vm/assembler.scm:1722:18 0.15 0.10 0.10 language/cps/intset.scm:114:0:leaf-empty? 0.15 0.10 0.10 language/cps/intmap.scm:127:0:add-level 0.11 16.02 0.07 language/cps/intmap.scm:517:5:visit-branch 0.11 1.06 0.07 language/cps/slot-allocation.scm:415:17 0.11 0.52 0.07 srfi/srfi-1.scm:634:2:for-each 0.11 0.22 0.07 language/cps/intmap.scm:530:0:intmap-union 0.11 0.15 0.07 language/cps/utils.scm:273:5 0.11 0.12 0.07 system/vm/assembler.scm:1970:0:put-uleb128 0.11 0.10 0.07 language/cps/intmap.scm:578:2:union 0.11 0.07 0.07 language/cps/utils.scm:388:27 0.11 0.07 0.07 language/cps/compile-bytecode.scm:72:16 0.11 0.07 0.07 language/cps/utils.scm:425:17 0.11 0.07 0.07 language/cps/intmap.scm:167:0:persistent-intmap 0.11 0.07 0.07 list? 0.11 0.07 0.07 length+ 0.11 0.07 0.07 hashq-set! 0.11 0.07 0.07 system/vm/assembler.scm:1302:0:emit-definition 0.11 0.07 0.07 language/cps/intset.scm:632:2:intersect 0.11 0.07 0.07 system/vm/assembler.scm:957:0:immediate-bits 0.07 2.28 0.05 language/cps/utils.scm:432:2:component-successors 0.07 2.25 0.05 language/cps/utils.scm:509:0:intset-pop 0.07 0.72 0.05 language/cps/slot-allocation.scm:904:4:allocate 0.07 0.62 0.05 language/cps/slot-allocation.scm:768:2:allocate-lazy 0.07 0.35 0.05 language/cps/utils.scm:395:6:visit 0.07 0.35 0.05 language/cps/utils.scm:402:5 0.07 0.25 0.05 language/cps/utils.scm:434:18 0.07 0.17 0.05 language/cps/slot-allocation.scm:726:17 0.07 0.07 0.05 language/cps/slot-allocation.scm:555:2:compute-live-slots 0.07 0.07 0.05 ice-9/boot-9.scm:1038:0:compose 0.07 0.05 0.05 language/cps/intset.scm:178:0:transient-intset 0.07 0.05 0.05 language/cps/slot-allocation.scm:436:4:add 0.07 0.05 0.05 system/vm/assembler.scm:1299:0:emit-source 0.07 0.05 0.05 max 0.07 0.05 0.05 logbit? 0.07 0.05 0.05 reverse 0.07 0.05 0.05 language/cps/slot-allocation.scm:303:35 0.07 0.05 0.05 language/cps/utils.scm:206:6 0.07 0.05 0.05 language/cps/utils.scm:231:19 0.04 33421.25 0.02 srfi/srfi-1.scm:590:5:map1 0.04 26.55 0.02 language/cps/slot-allocation.scm:195:0:compute-reverse-control-flow-order 0.04 3.19 0.02 system/vm/assembler.scm:1024:0:intern-constant 0.04 1.54 0.02 language/cps/slot-allocation.scm:289:16 0.04 0.89 0.02 language/cps/utils.scm:531:31 0.04 0.40 0.02 language/cps/slot-allocation.scm:454:21 0.04 0.25 0.02 language/cps/slot-allocation.scm:918:4:allocate* 0.04 0.25 0.02 language/cps/slot-allocation.scm:202:25 0.04 0.22 0.02 language/cps/compile-bytecode.scm:567:19 0.04 0.22 0.02 language/cps/utils.scm:436:21 0.04 0.17 0.02 language/cps/slot-allocation.scm:284:15 0.04 0.17 0.02 language/cps/slot-allocation.scm:787:2:get-defs 0.04 0.15 0.02 language/cps/slot-allocation.scm:556:17 0.04 0.15 0.02 language/cps/utils.scm:191:16 0.04 0.12 0.02 language/cps/slot-allocation.scm:868:6:lp 0.04 0.10 0.02 language/cps/slot-allocation.scm:547:4:lp 0.04 0.10 0.02 ice-9/boot-9.scm:261:2:for-each 0.04 0.07 0.02 system/vm/assembler.scm:1387:0:link-data 0.04 0.07 0.02 language/cps/slot-allocation.scm:482:0:integers 0.04 0.07 0.02 language/cps/slot-allocation.scm:487:0:solve-parallel-move 0.04 0.05 0.02 language/cps/slot-allocation.scm:725:2:compute-live-slots 0.04 0.05 0.02 language/cps/slot-allocation.scm:463:0:find-first-zero 0.04 0.05 0.02 language/cps/intmap.scm:431:2:visit-node 0.04 0.05 0.02 language/cps/slot-allocation.scm:823:20 0.04 0.05 0.02 language/cps/compile-bytecode.scm:498:4:skip-elided-conts 0.04 0.02 0.02 language/cps/slot-allocation.scm:130:0:lookup-call-proc-slot 0.04 0.02 0.02 language/cps/slot-allocation.scm:283:0:rename-intset 0.04 0.02 0.02 language/cps/intmap.scm:366:0:intmap-remove 0.04 0.02 0.02 language/cps/compile-bytecode.scm:92:49 0.04 0.02 0.02 language/cps/utils.scm:314:34 0.04 0.02 0.02 language/cps/slot-allocation.scm:765:23 0.04 0.02 0.02 language/cps/intset.scm:138:0:branch-empty? 0.04 0.02 0.02 system/vm/assembler.scm:701:0:encode-X8_S24-R32-N32-N32-B1_X31 0.04 0.02 0.02 language/cps/utils.scm:458:10 0.04 0.02 0.02 system/vm/linker.scm:418:0:process-reloc 0.04 0.02 0.02 system/vm/assembler.scm:701:0:encode-X8_F24-B1_X7_C24 0.04 0.02 0.02 language/cps/utils.scm:134:16 0.04 0.02 0.02 language/cps/slot-allocation.scm:696:15 0.04 0.02 0.02 system/vm/assembler.scm:1361:0:write-immediate 0.04 0.02 0.02 language/cps/slot-allocation.scm:267:3 0.04 0.02 0.02 %after-gc-thunk 0.04 0.02 0.02 make-bytevector 0.04 0.02 0.02 language/cps/slot-allocation.scm:108:0:lookup-representation 0.04 0.02 0.02 system/vm/assembler.scm:936:12 0.04 0.02 0.02 language/cps/compile-bytecode.scm:105:4:from-sp 0.04 0.02 0.02 language/cps/intset.scm:187:0:persistent-intset 0.04 0.02 0.02 min 0.04 0.02 0.02 system/vm/assembler.scm:1689:3 0.04 0.02 0.02 language/cps/compile-bytecode.scm:463:31 0.04 0.02 0.02 system/vm/elf.scm:929:0:write-elf64-symbol 0.04 0.02 0.02 language/cps/slot-allocation.scm:470:0:find-first-trailing-zero 0.04 0.02 0.02 srfi/srfi-1.scm:687:0:filter-map 0.04 0.02 0.02 bytevector-u64-set! 0.04 0.02 0.02 language/cps/slot-allocation.scm:102:0:lookup-maybe-slot 0.04 0.02 0.02 system/vm/assembler.scm:2000:2:write-header 0.04 0.02 0.02 language/cps/intmap.scm:513:2:intmap-fold 0.04 0.02 0.02 language/cps/slot-allocation.scm:153:10 0.04 0.02 0.02 language/cps/utils.scm:426:32 0.04 0.02 0.02 system/vm/assembler.scm:1112:0:intern-cache-cell 0.04 0.02 0.02 cons 0.04 0.02 0.02 language/cps/intmap.scm:290:53 0.04 0.02 0.02 system/vm/assembler.scm:1296:0:emit-label 0.00 66.52 0.00 language/cps/compile-bytecode.scm:585:0:emit-bytecode 0.00 64.56 0.00 language/cps/compile-bytecode.scm:50:15 0.00 60.03 0.00 language/cps/compile-bytecode.scm:84:0:compile-function 0.00 59.66 0.00 language/cps/slot-allocation.scm:841:0:allocate-slots 0.00 25.11 0.00 language/cps/utils.scm:408:0:compute-sorted-strongly-connected-components 0.00 23.45 0.00 language/cps/slot-allocation.scm:386:0:compute-lazy-vars 0.00 23.10 0.00 language/cps/slot-allocation.scm:296:0:compute-live-variables 0.00 13.32 0.00 language/cps/utils.scm:390:0:compute-strongly-connected-components 0.00 5.65 0.00 language/cps/utils.scm:125:0:intmap-map 0.00 1.81 0.00 language/cps/slot-allocation.scm:287:0:rename-graph 0.00 1.78 0.00 language/cps/slot-allocation.scm:276:0:rename-keys 0.00 1.73 0.00 system/vm/assembler.scm:2578:0:link-objects 0.00 1.73 0.00 system/vm/assembler.scm:2609:0:link-assembly 0.00 1.71 0.00 system/vm/linker.scm:706:0:link-elf 0.00 1.63 0.00 system/vm/linker.scm:635:0:allocate-elf 0.00 1.63 0.00 system/vm/linker.scm:371:0:allocate-segment 0.00 1.26 0.00 language/cps/slot-allocation.scm:536:0:compute-shuffles 0.00 1.09 0.00 language/cps/utils.scm:378:0:invert-graph 0.00 1.04 0.00 language/cps/utils.scm:304:0:compute-successors 0.00 1.02 0.00 system/vm/assembler.scm:2086:0:link-arities 0.00 1.02 0.00 system/vm/assembler.scm:1999:0:write-arities 0.00 0.84 0.00 language/cps/slot-allocation.scm:149:0:compute-defs-and-uses 0.00 0.72 0.00 language/cps/slot-allocation.scm:724:0:allocate-lazy-vars 0.00 0.67 0.00 language/cps/utils.scm:131:0:intmap-keys 0.00 0.62 0.00 anon #x7f790ba74034 0.00 0.62 0.00 language/cps/utils.scm:295:17 0.00 0.54 0.00 language/cps/utils.scm:202:0:compute-constant-values 0.00 0.52 0.00 system/vm/assembler.scm:1144:0:emit-load-constant 0.00 0.42 0.00 language/cps/utils.scm:239:0:compute-function-body 0.00 0.35 0.00 language/cps/utils.scm:183:0:compute-defining-expressions 0.00 0.30 0.00 language/cps/compile-bytecode.scm:587:21 0.00 0.30 0.00 language/cps/compile-bytecode.scm:52:0:intmap-select 0.00 0.27 0.00 language/cps/compile-bytecode.scm:62:0:compute-forwarding-labels 0.00 0.12 0.00 language/cps/compile-bytecode.scm:55:4 0.00 0.12 0.00 system/vm/assembler.scm:1726:0:link-text-object 0.00 0.12 0.00 language/cps/slot-allocation.scm:157:6:get-defs 0.00 0.12 0.00 system/vm/assembler.scm:1645:0:link-constants 0.00 0.10 0.00 language/cps/slot-allocation.scm:663:2:max-size 0.00 0.10 0.00 language/cps/utils.scm:170:2:fixpoint 0.00 0.10 0.00 hash-map->list 0.00 0.07 0.00 system/vm/assembler.scm:1318:0:emit-cached-module-box 0.00 0.05 0.00 system/vm/assembler.scm:2533:4:write-die 0.00 0.05 0.00 system/vm/assembler.scm:1312:0:emit-cached-toplevel-box 0.00 0.05 0.00 system/vm/assembler.scm:1160:0:emit-load-static-procedure 0.00 0.05 0.00 srfi/srfi-1.scm:586:2:map 0.00 0.05 0.00 language/cps/compile-bytecode.scm:65:18 0.00 0.05 0.00 language/cps/intmap.scm:336:2:adjoin 0.00 0.02 0.00 system/vm/assembler.scm:1371:0:emit-init-constants 0.00 0.02 0.00 system/vm/assembler.scm:701:0:encode-X8_S8_I16 0.00 0.02 0.00 anon #x7f790ba74000 0.00 0.02 0.00 language/cps/slot-allocation.scm:894:7 0.00 0.02 0.00 system/vm/assembler.scm:1308:0:emit-cache-current-module! 0.00 0.02 0.00 language/cps/intmap.scm:424:0:intmap-next 0.00 0.02 0.00 srfi/srfi-1.scm:606:7:map2 0.00 0.02 0.00 system/vm/assembler.scm:1868:0:link-symtab 0.00 0.02 0.00 language/cps/slot-allocation.scm:579:2:compute-receive-shuffles 0.00 0.02 0.00 language/cps/intmap.scm:331:0:intmap-replace --- Sample count: 2686 Total time: 66.515425859 seconds (31.859444991 seconds in GC) --8<---------------cut here---------------end--------------->8--- The intmap ‘big-cps’ has those 428K elements distributed in 13,814 33-element vectors (per ‘intmap-vector-count’ in the attached file.) I fail to see why we’re GC’ing so much (‘visit-branch’ is also first in the GC profile). It looks like (1) something’s not working as intended GC-wise (unintended vector retention somewhere?), and (2) the O(log N) complexity, IIUC, of intmap/intset lookups may be penalizing on large input programs. Andy, WDYT? Thanks, Ludo’.