Hey,

> That’s over SSH, right?

Correct, the worst possible case: inside two VMs on a laptop, with SSH
transport between them and /gnu and /var/guix on an NFS share (nfsd is
in the same VM as guix-daemon).

> Probably what’s killing us is the round-trip time for all these small
> RPCs. We would need pipelining but the RPC protocol is not designed to
> make that easy.

That would have been my best guess too, but it does not seem to be the
biggest problem right now. Looking at the numbers again (both patches
applied) with the attached manifest[1], I see that:

---snip---
Local UNIX socket with and without --no-grafts:
    N           Min           Max        Median           Avg        Stddev
x  10          6.07          6.35         6.145          6.16    0.08232726
+  10         17.47         17.89        17.545        17.602    0.14351152
Difference at 99.0% confidence
        11.442 +/- 0.150576
        185.747% +/- 4.07133%

Local UNIX socket vs. guix://localhost transport:
    N           Min           Max        Median           Avg        Stddev
x  10         17.47         17.89        17.545        17.602    0.14351152
+  10         17.43          18.1         17.61        17.642    0.20131788
No difference proven at 99.0% confidence

Local UNIX socket vs. ssh://localhost transport:
    N           Min           Max        Median           Avg        Stddev
x  10         17.47         17.89        17.545        17.602    0.14351152
+  10         33.46         35.27        34.315        34.359    0.53873205
Difference at 99.0% confidence
        16.757 +/- 0.5074
        95.1994% +/- 3.13957%
---snap---

So I would conclude:

1) Grafting still takes a lot of time and needs more work
2) Linux optimizes localhost networking pretty well
3) Our SSH transport is terribly slow

Moving to non-localhost communication between two VMs:

---snip---
guix://localhost vs. guix://remote-host transport:
    N           Min           Max        Median           Avg        Stddev
x  10         17.43          18.1         17.61        17.642    0.20131788
+  10         20.88         22.58        21.095        21.222    0.49689704
Difference at 99.0% confidence
        3.58 +/- 0.487934
        20.2925% +/- 2.85159%

guix://remote-host vs. ssh://remote-host:
    N           Min           Max        Median           Avg        Stddev
x  10         20.88         22.58        21.095        21.222    0.49689704
+  10          30.1         32.56        31.005        31.093    0.70740606
Difference at 99.0% confidence
        9.871 +/- 0.786769
        46.5131% +/- 4.35326%
---snap---

The conclusion here is the same: networking/NFS does not have a lot of
impact, and the SSH transport is still terribly slow. (Confusingly, it
is faster than over localhost though.)

> Perhaps you could “strace -Tt” the thing to check whether this
> hypothesis is correct by looking at the time we spend waiting for
> replies?

I’m not sure this will yield meaningful data for SSH, so I analyzed it
for guix://localhost vs. guix://remote-host. The takeaway is: yes, of
course there is a statistically significant difference, and it’s about
40%±50%, which means this method is pretty useless, because we can’t
bin RPCs by type.

So, I guess it would make sense for me to look at the SSH transport
itself again and see if there is any other low-hanging fruit. Not sure
how much I can help with profiling Guile/Guix itself. A different/better
RPC protocol is probably GSoC/v2.0-worthy?

Sorry for all the lengthy emails,
Lars

[1] You’ll need this channel: https://github.com/leibniz-psychology/guix-zpid
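
P.S. In case someone wants to check the interval arithmetic in the tables above, here is a minimal Python sketch. It assumes the comparison is a pooled-variance Student's t interval, which does reproduce the "Difference at 99.0% confidence" lines from the per-run averages and standard deviations. The function name `compare` and the hard-coded critical value 2.878 (two-sided 99%, 18 degrees of freedom) are my own choices for illustration, not taken from any tool.

```python
import math

# Two-sided 99% Student's t critical value for 18 degrees of freedom
# (10 + 10 samples minus 2). Hard-coded assumption for this sketch; it is
# only valid when both runs have 10 samples each.
T_99_DF18 = 2.878


def compare(n_ref, avg_ref, sd_ref, n_new, avg_new, sd_new, t=T_99_DF18):
    """Return (difference, half_width, percent_difference) for two runs."""
    dof = n_ref + n_new - 2
    # Pooled variance of the two samples.
    pooled_var = ((n_ref - 1) * sd_ref ** 2 + (n_new - 1) * sd_new ** 2) / dof
    # Standard error of the difference of the two means.
    std_err = math.sqrt(pooled_var) * math.sqrt(1.0 / n_ref + 1.0 / n_new)
    diff = avg_new - avg_ref
    return diff, t * std_err, 100.0 * diff / avg_ref


# Averages and standard deviations copied from the tables above.
grafts = compare(10, 6.16, 0.08232726, 10, 17.602, 0.14351152)
ssh_local = compare(10, 17.602, 0.14351152, 10, 34.359, 0.53873205)

print("grafts:    %.3f +/- %.3f (%.1f%%)" % grafts)
print("ssh local: %.3f +/- %.3f (%.1f%%)" % ssh_local)
```

This only reproduces the absolute interval and the percentage difference, not the error bound on the percentage.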