From: zimoun <zimon.toutoune@gmail.com>
To: Magali <magalilemes00@gmail.com>,
"Gábor Boskovits" <boskovits@gmail.com>,
"Guix Devel" <guix-devel@gnu.org>
Subject: Re: [Outreachy] 'guix git log' slowness?
Date: Fri, 29 Jan 2021 01:04:12 +0100 [thread overview]
Message-ID: <86eei4hamb.fsf@gmail.com> (raw)
In-Reply-To: <a8834240-0d7f-94db-fd68-10a51837f1f4@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3844 bytes --]
Hi Magali,
(It is a slightly edited version I sent you. Since the aim of Outreachy
is also to interact with Community, let enjoy the French proverb: «more
crazy people we are, more fun we have». :-))
On Thu, 28 Jan 2021 at 00:53, Magali <magalilemes00@gmail.com> wrote:
> Another thing is that the command is a bit slower than 'git log' itself.
> Thoughts on how that could be improved?
The command is “slow”. A first quick analysis about the meaning of “slow”.
Basically, I have run twice:
--8<---------------cut here---------------start------------->8---
for n in 60000 10000 5000 1000 500 100 50 10 5 1;
do
time ./pre-inst-env guix git log --oneline \
| head -n $n > /dev/null ;
done
--8<---------------cut here---------------end--------------->8---
to have kind of warm cache. And again for the equivalent Git command.
Then, bit of Emacs edit processing to transform the output in the buffer
of ’M-x shell’ to something in the Python file:
tguix = np.array([2.871, …
tgit = np.array([0.013, …
(Well, to be correct, it is not twice but a couple of times to have an
average.)
Let normalize by removing the additive constant and run a classic
linear regression:
t ~ B n ^ a => log(t) ~ a log(n) + b
where ’a’ and ’b’ have to be estimated.
Well, I am surprised by ’a ~ 0.5’ which means that “guix git log” runs
with a sublinear time complexity. However, “git log” is linear. Maybe
I am doing something wrong.
Initially I thought the tree was badly walked but a quick:
--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix git log --channel-cache-path
guix
/home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
$ git -C /home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq \
log --oneline | wc -l
72791
$ ./pre-inst-env guix git log --oneline | wc -l
72791
--8<---------------cut here---------------end--------------->8---
shows it is correct. Hum?! Well, I suspect noise on the data and the
normalization is bad here. Running the experiment in a batch of 10
times then averaging them should give an analysis more meaningful. Hey,
that’s a quick one. :-)
The conclusion here, it scales well enough… for now.
Therefore, the real the question is about the «additive constant». On
my machine, it is ~2.9s and this is where there is room of improvement,
I guess.
I exported ’get-commits’ to have it in the REPL and I did:
--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix repl
GNU Guile 3.0.5
Copyright (C) 1995-2021 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guix-user)> ,use(guix scripts git log)
scheme@(guix-user)> (define (compute) (begin (get-commits) 'ok))
scheme@(guix-user)> ,time (compute)
$1 = ok
;; 2.533936s real time, 3.099156s run time. 0.901027s spent in GC.
scheme@(guix-user)>
--8<---------------cut here---------------end--------------->8---
And I let you run “,profile (compute)”. Well, this function should be
optimized. IMHO.
Initially, I thought about “stream” but I do no think it is the issue
here. Well, I think that the ’repo’ is open at each commit when
folding. Instead, it should be open before, keep alive and close at the
end of the ’fold’. Somehow. WDYT?
I will give a closer look to ’commit-closure’ because I am not convinced
it is useful here, neither.
Bonus, attached the Python script and the plot. :-)
Cheers,
simon
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: quick-analysis.py --]
[-- Type: text/x-python, Size: 951 bytes --]
# coding: utf-8
#
import numpy as np
import matplotlib.pyplot as plt
n = np.array([1., 5., 10., 50., 100., 500., 1000., 5000., 10000., 60000.])
tguix = np.array([2.871, 3.006, 2.987, 2.892, 2.991, 3.038, 3.088, 3.315, 3.565, 5.634])
tgit = np.array([0.013, 0.006, 0.013, 0.016, 0.014, 0.029, 0.047, 0.12, 0.233, 1.283])
normalize = lambda x: x - x[0]
logify = lambda x: np.log10(x[3:])
x = logify(n)
y = logify(normalize(tguix))
z = logify(normalize(tgit))
X = np.concatenate((x.reshape((len(x), 1)), np.ones((len(x), 1))), axis=1)
guix, res, rank, s = np.linalg.lstsq(X, y, rcond=None)
git, res, rank, s = np.linalg.lstsq(X, z, rcond=None)
predict = lambda sol: sol[0] * x + sol[1]
plt.plot(x, y, 'bo', label='guix')
plt.plot(x, z, 'ro', label='git')
plt.plot(x, predict(guix), 'bx-')
plt.plot(x, predict(git), 'rx-')
plt.legend(loc='lower right')
plt.xlabel("log(N) where N in `| head -n N`")
plt.ylabel("log(real time)")
plt.show()
[-- Attachment #3: scale.png --]
[-- Type: image/png, Size: 45025 bytes --]
prev parent reply other threads:[~2021-01-29 0:08 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-28 3:53 [Outreachy] Feedback on 'guix git log' subcommand Magali
2021-01-28 23:07 ` zimoun
2021-01-30 1:30 ` Magali
2021-02-01 7:27 ` zimoun
2021-02-01 15:18 ` Ludovic Courtès
2021-01-29 0:04 ` zimoun [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86eei4hamb.fsf@gmail.com \
--to=zimon.toutoune@gmail.com \
--cc=boskovits@gmail.com \
--cc=guix-devel@gnu.org \
--cc=magalilemes00@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.