all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: zimoun <zimon.toutoune@gmail.com>
To: Magali <magalilemes00@gmail.com>,
	"Gábor Boskovits" <boskovits@gmail.com>,
	"Guix Devel" <guix-devel@gnu.org>
Subject: Re: [Outreachy] 'guix git log' slowness?
Date: Fri, 29 Jan 2021 01:04:12 +0100	[thread overview]
Message-ID: <86eei4hamb.fsf@gmail.com> (raw)
In-Reply-To: <a8834240-0d7f-94db-fd68-10a51837f1f4@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3844 bytes --]

Hi Magali,

(It is a slightly edited version I sent you.  Since the aim of Outreachy
is also to interact with Community, let enjoy the French proverb: «more
crazy people we are, more fun we have». :-))

On Thu, 28 Jan 2021 at 00:53, Magali <magalilemes00@gmail.com> wrote:

> Another thing is that the command is a bit slower than 'git log' itself.
> Thoughts on how that could be improved?

The command is “slow”.  A first quick analysis about the meaning of “slow”.

Basically, I have run twice:

--8<---------------cut here---------------start------------->8---
for n in 60000 10000 5000 1000 500 100 50 10 5 1;
do
   time ./pre-inst-env guix git log --oneline \
       | head -n $n > /dev/null ;
done
--8<---------------cut here---------------end--------------->8---

to have kind of warm cache.  And again for the equivalent Git command.
Then, bit of Emacs edit processing to transform the output in the buffer
of ’M-x shell’ to something in the Python file:

tguix = np.array([2.871, …
tgit  = np.array([0.013, …

(Well, to be correct, it is not twice but a couple of times to have an
average.)

Let normalize by removing the additive constant and run a classic
linear regression:

  t ~ B n ^ a => log(t) ~ a log(n) + b

where ’a’ and ’b’ have to be estimated.

Well, I am surprised by ’a ~ 0.5’ which means that “guix git log” runs
with a sublinear time complexity.  However, “git log” is linear.  Maybe
I am doing something wrong.

Initially I thought the tree was badly walked but a quick:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix git log --channel-cache-path
guix
  /home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
$ git -C /home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq \
      log --oneline | wc -l
72791
$ ./pre-inst-env guix git log --oneline | wc -l
72791
--8<---------------cut here---------------end--------------->8---

shows it is correct.  Hum?!  Well, I suspect noise on the data and the
normalization is bad here.  Running the experiment in a batch of 10
times then averaging them should give an analysis more meaningful.  Hey,
that’s a quick one. :-)

The conclusion here, it scales well enough… for now.

Therefore, the real the question is about the «additive constant».  On
my machine, it is ~2.9s and this is where there is room of improvement,
I guess.

I exported ’get-commits’ to have it in the REPL and I did:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix repl
GNU Guile 3.0.5
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,use(guix scripts git log)
scheme@(guix-user)> (define (compute) (begin (get-commits) 'ok))
scheme@(guix-user)> ,time (compute)
$1 = ok
;; 2.533936s real time, 3.099156s run time.  0.901027s spent in GC.
scheme@(guix-user)>
--8<---------------cut here---------------end--------------->8---

And I let you run “,profile (compute)”.  Well, this function should be
optimized.  IMHO.

Initially, I thought about “stream” but I do no think it is the issue
here.  Well, I think that the ’repo’ is open at each commit when
folding.  Instead, it should be open before, keep alive and close at the
end of the ’fold’.  Somehow.  WDYT?

I will give a closer look to ’commit-closure’ because I am not convinced
it is useful here, neither.


Bonus, attached the Python script and the plot. :-)


Cheers,
simon


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: quick-analysis.py --]
[-- Type: text/x-python, Size: 951 bytes --]

# coding: utf-8


#
import numpy as np
import matplotlib.pyplot as plt

n = np.array([1., 5., 10., 50., 100., 500., 1000., 5000., 10000., 60000.])
tguix = np.array([2.871, 3.006, 2.987, 2.892, 2.991, 3.038, 3.088, 3.315, 3.565, 5.634])
tgit  = np.array([0.013, 0.006, 0.013, 0.016, 0.014, 0.029, 0.047, 0.12, 0.233, 1.283])

normalize = lambda x: x - x[0]
logify    = lambda x: np.log10(x[3:])

x = logify(n)
y = logify(normalize(tguix))
z = logify(normalize(tgit))

X = np.concatenate((x.reshape((len(x), 1)), np.ones((len(x), 1))), axis=1)

guix, res, rank, s = np.linalg.lstsq(X, y,  rcond=None)
git,  res, rank, s = np.linalg.lstsq(X, z, rcond=None)

predict = lambda sol: sol[0] * x + sol[1]

plt.plot(x, y, 'bo', label='guix')
plt.plot(x, z, 'ro', label='git')
plt.plot(x, predict(guix), 'bx-')
plt.plot(x, predict(git),  'rx-')
plt.legend(loc='lower right')
plt.xlabel("log(N) where N in `| head -n N`")
plt.ylabel("log(real time)")
plt.show()

[-- Attachment #3: scale.png --]
[-- Type: image/png, Size: 45025 bytes --]

      parent reply	other threads:[~2021-01-29  0:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-28  3:53 [Outreachy] Feedback on 'guix git log' subcommand Magali
2021-01-28 23:07 ` zimoun
2021-01-30  1:30   ` Magali
2021-02-01  7:27     ` zimoun
2021-02-01 15:18   ` Ludovic Courtès
2021-01-29  0:04 ` zimoun [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86eei4hamb.fsf@gmail.com \
    --to=zimon.toutoune@gmail.com \
    --cc=boskovits@gmail.com \
    --cc=guix-devel@gnu.org \
    --cc=magalilemes00@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.