Hi Magali,

(This is a slightly edited version of what I sent you.  Since the aim of
Outreachy is also to interact with the community, let us enjoy the French
proverb: «the more crazy people we are, the more fun we have». :-))

On Thu, 28 Jan 2021 at 00:53, Magali wrote:

> Another thing is that the command is a bit slower than 'git log' itself.
> Thoughts on how that could be improved?

The command is “slow”.  Here is a first quick analysis of what “slow”
means.  Basically, I ran this twice:

--8<---------------cut here---------------start------------->8---
for n in 60000 10000 5000 1000 500 100 50 10 5 1; do
    time ./pre-inst-env guix git log --oneline \
        | head -n $n > /dev/null ;
done
--8<---------------cut here---------------end--------------->8---

to have a kind of warm cache, and then did the same for the equivalent Git
command.  Then, a bit of Emacs editing transformed the output in the
’M-x shell’ buffer into something in the Python file:

  tguix = np.array([2.871, …
  tgit = np.array([0.013, …

(Well, to be correct, it was not twice but a couple of times, to have an
average.)

Let us normalize by removing the additive constant and run a classic
linear regression:

  t ~ B n ^ a  =>  log(t) ~ a log(n) + b

where ’a’ and ’b’ have to be estimated (with b = log(B)).

Well, I am surprised by ’a ~ 0.5’, which means that “guix git log” runs
with sublinear time complexity, whereas “git log” is linear.  Maybe I am
doing something wrong.  Initially I thought the tree was badly walked, but
a quick:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix git log --channel-cache-path
guix /home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq

$ git -C /home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq \
      log --oneline | wc -l
72791

$ ./pre-inst-env guix git log --oneline | wc -l
72791
--8<---------------cut here---------------end--------------->8---

shows it is correct.  Hum?!

Well, I suspect noise in the data, and the normalization is probably bad
here.
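
To make the fit above concrete, here is a minimal sketch of the log-log
regression.  It is not the attached script: the timings are synthetic
(generated from a known power law, not the measurements of this thread),
the additive constant is assumed to be known, and the helper name
’fit_power_law’ is made up for the illustration.  It also shows one
possible way a badly estimated constant could pull the exponent below 1.

```python
import math


def fit_power_law(ns, ts, constant):
    """Least-squares fit of log(t - constant) = a*log(n) + b; return (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t - constant) for t in ts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope: the exponent of n
    b = mean_y - a * mean_x    # intercept: log(B)
    return a, b


# Same values of n as in the shell loop above.
ns = [60000, 10000, 5000, 1000, 500, 100, 50, 10, 5, 1]

# SYNTHETIC timings (assumption): t = C + B * n^A with a linear exponent.
C, B, A = 2.9, 1e-4, 1.0
ts = [C + B * n ** A for n in ns]

# Removing the right constant recovers the exponent used to build the data.
a_good, b_good = fit_power_law(ns, ts, C)

# Removing a slightly too small constant flattens the log-log curve for
# small n, so the fitted exponent drops below the true one.
a_bad, _ = fit_power_law(ns, ts, C - 0.1)

print("exponent with the right constant:", a_good)
print("exponent with a bad constant:   ", a_bad)
```

With real measurements the constant itself has to be estimated, so the
fitted exponent is only as good as that estimate.
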
Running the experiment in batches of 10 and then averaging them should
give a more meaningful analysis.  Hey, that’s a quick one. :-)

The conclusion here: it scales well enough… for now.  Therefore, the real
question is about the «additive constant».  On my machine, it is ~2.9s and
this is where there is room for improvement, I guess.

I exported ’get-commits’ to have it in the REPL and I did:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix repl
GNU Guile 3.0.5
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,use(guix scripts git log)
scheme@(guix-user)> (define (compute) (begin (get-commits) 'ok))
scheme@(guix-user)> ,time (compute)
$1 = ok
;; 2.533936s real time, 3.099156s run time.  0.901027s spent in GC.
scheme@(guix-user)>
--8<---------------cut here---------------end--------------->8---

And I let you run “,profile (compute)”.

Well, this function should be optimized, IMHO.  Initially, I thought about
“stream” but I do not think it is the issue here.  Rather, I think that
the ’repo’ is opened at each commit when folding.  Instead, it should be
opened before, kept alive, and closed at the end of the ’fold’.  Somehow.
WDYT?

I will take a closer look at ’commit-closure’ because I am not convinced
it is useful here, either.

Bonus: attached are the Python script and the plot. :-)

Cheers,
simon