From: zimoun <zimon.toutoune@gmail.com>
To: Jacob Hrbek <kreyren@rixotstudio.cz>
Cc: "guix-devel@gnu.org" <guix-devel@gnu.org>
Subject: Re: Proposal: Build timers
Date: Tue, 23 Nov 2021 12:56:35 +0100
Message-ID: <865ysjw0ek.fsf@gmail.com>
In-Reply-To: <HktDVm3xNb7mUmY6ZZ_ebsG4s9SJ1LFVYGbce9pIRfjvKli4phmJ1P-D19z7FrsZxav708l7cKkFkT2QrN8btGgNFRzDIqnRgZ5-WNTPC10=@rixotstudio.cz>

Hi,

On Tue, 23 Nov 2021 at 06:21, Jacob Hrbek <kreyren@rixotstudio.cz> wrote:

> 1. locally: Storing the value somewhere on the system and adding up to
> it each build to provide more accurate average.

Timing is already stored; see “guix build --log-file” and have a look
at ‘/var/log/guix/drvs’.  For instance,

--8<---------------cut here---------------start------------->8---
$ bzcat /var/log/guix/drvs/aq/abymi9yk7pv89614dcdfll3hh4i5mc-julia-1.5.3.drv.bz2 | grep phase | grep seconds
phase `set-SOURCE-DATE-EPOCH' succeeded after 0.0 seconds
phase `set-paths' succeeded after 0.0 seconds
phase `install-locale' succeeded after 0.0 seconds
phase `unpack' succeeded after 1.1 seconds
phase `use-system-libwhich' succeeded after 0.0 seconds
phase `disable-documentation' succeeded after 0.0 seconds
phase `prepare-deps' succeeded after 0.0 seconds
phase `bootstrap' succeeded after 0.0 seconds
phase `patch-usr-bin-file' succeeded after 0.0 seconds
phase `patch-source-shebangs' succeeded after 0.2 seconds
phase `patch-generated-file-shebangs' succeeded after 0.0 seconds
phase `fix-include-and-link-paths' succeeded after 0.0 seconds
phase `replace-default-shell' succeeded after 0.0 seconds
phase `fix-precompile' succeeded after 0.0 seconds
phase `build' succeeded after 354.3 seconds
phase `set-home' succeeded after 0.0 seconds
phase `disable-broken-tests' succeeded after 0.0 seconds
phase `check' succeeded after 7428.8 seconds
phase `install' succeeded after 16.0 seconds
phase `make-wrapper' succeeded after 0.0 seconds
phase `patch-shebangs' succeeded after 0.0 seconds
phase `strip' succeeded after 0.0 seconds
phase `validate-runpath' succeeded after 0.0 seconds
phase `validate-documentation-location' succeeded after 0.0 seconds
phase `delete-info-dir-file' succeeded after 0.0 seconds
phase `patch-dot-desktop-files' succeeded after 0.0 seconds
phase `install-license-files' succeeded after 0.0 seconds
phase `reset-gzip-timestamps' succeeded after 0.0 seconds
phase `compress-documentation' succeeded after 0.0 seconds
--8<---------------cut here---------------end--------------->8---

Therefore, you would need to extract that information somehow.
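
As a minimal sketch of such an extraction, the snippet below reads a
bzip2-compressed build log and collects the per-phase timings.  It
assumes the log lines have exactly the form shown above; the names
`parse_phase_timings` and `read_log` are made up for illustration and
are not part of Guix.

```python
import bz2
import re

# Matches lines such as: phase `build' succeeded after 354.3 seconds
PHASE_RE = re.compile(
    r"^phase `(?P<name>[^']+)' succeeded after "
    r"(?P<secs>[0-9]+(?:\.[0-9]+)?) seconds"
)


def parse_phase_timings(lines):
    """Return a list of (phase name, seconds) pairs in log order."""
    timings = []
    for line in lines:
        match = PHASE_RE.match(line)
        if match:
            timings.append((match.group("name"), float(match.group("secs"))))
    return timings


def read_log(path):
    """Parse a bzip2-compressed build log (hypothetical helper)."""
    with bz2.open(path, mode="rt", errors="replace") as log:
        return parse_phase_timings(log)
```

Calling `read_log` on a file under the log directory mentioned above
would return the same pairs that the `grep` pipeline prints, ready to be
summed or stored.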


> **optionally** This local database can be shared across multiple
> systems that add values to it like simple listener waiting for POST
> requests.

It would be better to use a content-addressed distribution such as IPFS
or GNUnet, IMHO.


> - within the guix repo: Since we are already building the package we
> can take the time and then do the provided math in reverse to
> calculate the time: 
>
>     Build took 100 sec on system with 8 threads at 2.4 Ghz max cpu frequency:
>
>     100 * (2.4 * 8) = 1920 (build time value per one thread at 1 Ghz)
>
>     Building the package on system with 2 threads at 2.4 Ghz max cpu frequency:
>
>     1920 / (2 * 2.4) = 400
>
>     We can then assume that the build will take 1920/400=4.8 -> 4.8
>     times longer on this system. 

Are you assuming here that the two machines are the same?  Or are they
different?

This approximation would not be accurate enough even for the same
machine.  For instance, the test suite of the julia package runs mainly
sequentially, using one thread.  If you go back to the numbers above,
build=354.3 seconds and check=7428.8 seconds, so the number of threads
only changes the timing of the build phase, which has little impact on
the overall timing.
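
To make this concrete, here is a rough sketch that rescales only the
build phase and leaves every other phase untouched.  It assumes, purely
for illustration, that the log above came from an 8-thread machine, that
the build phase scales perfectly with the thread count, and that nothing
else scales at all; none of that is verified.

```python
# Phase timings in seconds, taken from the julia log above
# (phases reported as 0.0 seconds are omitted).
PHASES = {
    "unpack": 1.1,
    "patch-source-shebangs": 0.2,
    "build": 354.3,
    "check": 7428.8,
    "install": 16.0,
}


def rescale_total(phases, parallel_phases, threads_from, threads_to):
    """Estimate the total time for another thread count.

    Assumption: phases named in parallel_phases scale perfectly with
    the number of threads; all other phases are purely sequential.
    """
    factor = threads_from / threads_to
    return sum(
        secs * factor if name in parallel_phases else secs
        for name, secs in phases.items()
    )


total = sum(PHASES.values())
# Hypothetical scenario: measured with 8 threads, predicted for 2 threads.
estimate = rescale_total(PHASES, {"build"}, threads_from=8, threads_to=2)
print(f"measured total:  {total:.1f} s")
print(f"estimated total: {estimate:.1f} s ({estimate / total:.2f}x)")
```

Under these assumptions the total grows by roughly 14%, whereas scaling
the whole build time by the thread ratio would predict a fourfold
increase.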

IIUC, your proposal is this: given timings for a set of packages on
machine A, timings for the same set of packages on machine B, and the
timing of package foo on machine B, extrapolate the timing of package
foo on machine A.  The maths for that are not linear, IMHO, and require
“complicated” heuristics.  It is not that complicated, it “just”
requires some statistical regression, though it is not straightforward
either. :-)
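
As a minimal sketch of what such a regression could look like, the
snippet below fits a straight line to log-transformed timings of
packages built on both machines, then uses it to extrapolate.  The
numbers are invented placeholders, not measurements, and a single
log-log line is an assumption made for illustration; a real model would
likely need more predictors.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_seconds(model, seconds_on_b):
    """Extrapolate the timing on machine A from the timing on machine B."""
    slope, intercept = model
    return math.exp(slope * math.log(seconds_on_b) + intercept)


# Placeholder timings (seconds) for the same packages on machines B and A.
timings_b = [10.0, 60.0, 300.0, 1200.0]
timings_a = [25.0, 140.0, 650.0, 2500.0]

model = fit_line(
    [math.log(t) for t in timings_b],
    [math.log(t) for t in timings_a],
)
print(f"predicted on A for 500 s on B: {predict_seconds(model, 500.0):.0f} s")
```

How well such a fit predicts unseen packages is exactly what would need
to be checked against real data.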

BTW, why not directly substitute package foo from machine B?


> The math might need to be adjusted, but it seems to be sufficiently
> accurate through my testing, fwiw I see that `(max cpu frequency * cpu
> threads)` is an important component of the equation using the analogy
> of a (possibly silly) "pokemon battle" assuming interpreting a
> mathematical equation to define the Health Points of the package and
> damage per second of the CPU then simply subtracting these two values
> to determine how long it will take to build, e.g. a package that has
> 500 HP -> Needs a CPU that deals 100 HP to complete in 5 sec or CPU
> that deals 50 HP to finish in 10 sec.

I will be happy if I am wrong.  I guess this back-of-the-envelope
estimate would not be accurate enough, for two reasons.  As I said
elsewhere, your example value of 100 seconds carries a strong
variability, which depends, one, on how the package itself scales at
build time (more often than not this scaling is not linear in the
number of threads, in my experience) and, two, on the load of the
machine where the build happens.


> About accuracy: I highly doubt that we need to worry about the system
> noise as that will be mitigated after enough systems submit average
> build time with their max CPU frequency and threads used.. we
> shouldn't really bother past that at the current stage and we can
> always add additional metadata if needed.

An average is not meaningful by itself.  It provides a first-order
approximation and generally it is not sufficient; the second-order is
also required, especially when drawing a model for prediction.  From
what I remember about stats, and assuming the distribution is Gaussian,
both the mean and the standard deviation are required to capture that
information.  My guess is that, because of the standard deviation, the
mean alone would not provide a useful prediction that can be shared
across heterogeneous machines.
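
A small illustration of why the mean alone hides information: the two
samples below are invented placeholders, chosen so that both have the
same mean while their spread differs widely.

```python
import statistics

# Placeholder build times in seconds, not real measurements.
steady = [98.0, 100.0, 102.0, 99.0, 101.0]
noisy = [40.0, 160.0, 70.0, 130.0, 100.0]

for label, sample in (("steady", steady), ("noisy", noisy)):
    mean = statistics.mean(sample)
    stdev = statistics.stdev(sample)  # sample standard deviation
    print(f"{label}: mean={mean:.1f} s, stdev={stdev:.1f} s")
```

Both samples average 100 seconds, yet a prediction of “100 seconds” is
far less useful for the second one.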

I will be happy to be wrong and only numbers can answer this question.
If you are interested in building a model or verifying your
assumptions, I am sure it is possible to dump the current Cuirass
PostgreSQL database and then do some analytics.  It would be a starting
point to evaluate whether the effort implied by your proposal is worth
it.

I am not convinced such a model would be doable for practical use
across heterogeneous machines, but it would help for monitoring CI.


Cheers,
simon



