unofficial mirror of guix-devel@gnu.org 
From: Jacob Hrbek <kreyren@rixotstudio.cz>
To: zimoun <zimon.toutoune@gmail.com>
Cc: "guix-devel@gnu.org" <guix-devel@gnu.org>
Subject: Re: Proposal: Build timers
Date: Thu, 25 Nov 2021 04:00:28 +0000
Message-ID: <XpYiBZ1iP9f-V6E0wysN608KsoIhCYJzSuioeywqECck8QxqeqpuhnD38wJKyLZsZ1geT7HOpzTtF1QDHrZCAFnS-3-QKmHfOqc-ozV_HaY=@rixotstudio.cz>
In-Reply-To: <86r1b5u6pm.fsf@gmail.com>



> The “pokémon-battle” model is a simple linear model
> (cross-multiplication); using Jacob’s “notation” -- zimoun

It's not designed to be linear, as the HP variable could be adjusted in real time, e.g. by recalculating it every X amount of time as needed, which can include calculations for noise that influences the task.

It currently seems linear because I developed it as a workable platform on which we can build a more robust solution in case the simple linear calculations are not sufficiently accurate (which I think they will be if we get a sufficient amount of data to calculate it).

>  - HP: time to build on machine A  -- zimoun

Not time, but the **COMPLEXITY** of the package. I see that as an important distinction, since by design it is never meant to store time, but a "time value" **that is converted into time**.

> based on some experiments.  Last, on machine B, knowing both nthread' and cpufreq' for that machine B, you are expecting to evaluate HP' for that machine B applying the formula:
>   HP' = a * nthread' * cpufreq' + b -- zimoun

In this context I would describe it as:

CPU strength = nthread * cpufreq * "other things that make the CPU deal more damage"
HP = "CPU strength" * "time it took to build in sec"

Which is linear, but the components used to work out this linear function are non-linear, e.g. the amount of RAM will most likely appear exponential, but it will eventually flatten to a constant once the CPU's memory requirements are satisfied.
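
A minimal sketch of the two formulas above, assuming frequency in GHz and a single catch-all `multiplier` for the "other things". The function names and all numbers are made up for illustration and are not part of the proposal:

```python
def cpu_strength(nthread: int, cpufreq_ghz: float, multiplier: float = 1.0) -> float:
    """'Damage' the machine deals: threads * frequency * other factors."""
    return nthread * cpufreq_ghz * multiplier


def package_hp(strength: float, build_seconds: float) -> float:
    """Complexity ('HP') of a package, derived from one measured build."""
    return strength * build_seconds


def predicted_seconds(hp: float, strength: float) -> float:
    """Convert the stored 'time value' back into time for a given machine."""
    return hp / strength


# Hypothetical machine A: 4 threads at 3.0 GHz, the build took 600 s.
strength_a = cpu_strength(4, 3.0)
hp = package_hp(strength_a, 600.0)

# Hypothetical machine B: 8 threads at 3.0 GHz.
strength_b = cpu_strength(8, 3.0)
print(predicted_seconds(hp, strength_b))  # 300.0
```

The HP value is stored once per package and only turned into seconds when a concrete machine is known.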

Also, the calculation should never contain system values from systems a, b, c, ...; rather, the hardware resources should be interpreted into an equation that can be integrated to calculate the unknowns.

The issue in that theory is figuring out the "time it took to build" and the "CPU strength", which I think can be mitigated by determining how the hardware affects the build: change its variables between builds and compare the results, e.g.

4 threads = 5 min
3 threads = 10 min
2 threads = 15 min

-> 1 thread will take 20 min.
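
The extrapolation above can be sketched as a least-squares line fitted through the three measurements. This is only an illustration using the example numbers; `fit_line` is a hypothetical helper, not an existing Guix function:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


threads = [4, 3, 2]
minutes = [5.0, 10.0, 15.0]

slope, intercept = fit_line(threads, minutes)
print(slope * 1 + intercept)  # 20.0 minutes for 1 thread
```

A straight line only holds over a limited range: the same fit reaches 0 min at 5 threads, so more measurements would be needed to pick a better curve.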

So it literally follows a Pokémon battle system:

A Pokémon has 100 HP and you deal 10 HP per turn -> it will take you 10 turns to win the battle.
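
As a small sketch of that analogy, assuming damage is dealt in whole turns (`turns_to_win` is a made-up name for illustration):

```python
def turns_to_win(hp: int, damage_per_turn: int) -> int:
    """Number of turns needed to bring `hp` down to zero (rounded up)."""
    if damage_per_turn <= 0:
        raise ValueError("damage_per_turn must be positive")
    # Ceiling division without floating point.
    return -(-hp // damage_per_turn)


print(turns_to_win(100, 10))  # 10
```

In the build-timer reading, `hp` is the package complexity, `damage_per_turn` is the "CPU strength", and a turn is one second.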

---

Btw., the components capable of becoming a bottleneck, such as the amount of RAM, should probably be calculated as a value in <0.0, 1.0> so that they can be applied as **multipliers** to the "CPU strength": following "the rule of compilation" of 2 GB of RAM per thread, a CPU with 4 threads and 4 GB of RAM will function at half efficiency (0.5), since its requirements for fast memory are not satisfied.
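
A minimal sketch of such a multiplier, assuming it grows linearly with the available RAM and is capped at 1.0 once the 2 GB-per-thread requirement is met. The linear ramp is an assumption chosen to reproduce the 0.5 example; the real curve may differ, and the names are hypothetical:

```python
RAM_PER_THREAD_GB = 2.0  # the "rule of compilation" figure from the text


def ram_multiplier(nthread: int, ram_gb: float,
                   per_thread_gb: float = RAM_PER_THREAD_GB) -> float:
    """Bottleneck factor in <0.0, 1.0>: 1.0 means RAM is not limiting."""
    needed_gb = nthread * per_thread_gb
    return min(1.0, ram_gb / needed_gb)


def effective_strength(nthread: int, cpufreq_ghz: float, ram_gb: float) -> float:
    """CPU strength scaled down by the RAM bottleneck (assumed model)."""
    return nthread * cpufreq_ghz * ram_multiplier(nthread, ram_gb)


print(ram_multiplier(4, 4.0))   # 0.5 -> half efficiency
print(ram_multiplier(4, 16.0))  # 1.0 -> requirement satisfied
```

Other bottlenecks (IO, cache, and so on) could be modelled the same way and multiplied together.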

-- Jacob "Kreyren" Hrbek


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Wednesday, November 24th, 2021 at 11:35 AM, zimoun <zimon.toutoune@gmail.com> wrote:

> Hi,
>
> On Tue, 23 Nov 2021 at 14:39, Jacob Hrbek <kreyren@rixotstudio.cz> wrote:
>
> > > This approximation would not even be accurate enough for the same
> > > machine. For instance, the test suite of the julia package runs
> > > mainly sequential using one thread...
> >
> > I am aware of this scenario and I adapted the equasion for it, but I
> > recognize that this exponentially increases the inaccuracy with more
> > threads and I don't believe that there is a mathematical way with the
> > provided values to handle that scenario so we would have to adjust the
> > calculation for those packages.
>
> What I am trying to explain is that the model cannot work to be
> predictable enough with what I consider a meaningful accuracy.
> Obviously, relaxing the precision, it is easy to infer a rule of thumb;
> a simple cross-multiplication fits the job. ;-)
>
> The “pokémon-battle” model is a simple linear model
> (cross-multiplication); using Jacob’s “notation”:
>
> - HP: time to build on machine A
> - DPS = nthread * cpufreq : “power” of machine
>
> Then it is expected to evaluate ’a’ and ’b’ on average such that:
>
>     HP = a * DPS + b
>
> based on some experiments. Last, on machine B, knowing both nthread'
> and cpufreq' for that machine B, you are expecting to evaluate HP' for
> that machine B applying the formula:
>
>     HP' = a * nthread' * cpufreq' + b
>
> Jacob, do I correctly understand the model?
>
> In any case, that’s what LFS is doing, instead HP is named SBU. And
> instead DPS, they use a reference package. And this normalization is
> better, IMHO. Other said, for one specific package considered as
> reference, they compute HP1 (resp. HP2) for machine A (resp. B), then
> for machine A, they know HP for another package and they deduce,
>
>     HP' = HP2/HP1 * HP
>
> All this is trivial. :-) The key is the accuracy, i.e., the error
> between the prediction HP' and the real time. Here, the issue is that
> HP1 and HP2 capture for one specific package the overall time; which
> depends on hidden parameters as nthread, cpufreq, IO, and other
> parameters from hardware. But that a strong assumption when considering
> these hidden parameters (evaluated for one specific package) are equally
> the same for any other package.
>
> It is a strong assumption because the hidden parameters depends on
> hardware specifications (nthread, cpufreq, etc.) and how the package
> itself exploits them.
>
> Therefore, the difference between the prediction and the real time is
> highly variable, and thus personally I am not convince the effort is
> worth; for local build. That’s another story. ;-)
>
> LSF is well-aware of the issue and it is documented [1,2].
>
> The root of the issue is the model based on a strong assumption; both
> (model and assumption) do not fit how the reality concrete works, IMHO.
>
> One straightforward way — requiring some work though – for improving the
> accuracy is to use statistical regressions. We cannot do really better
> to capture the hardware specification – noticing that the machine stress
> (what the machine is currently doing when the build happens) introduces
> a variability hard to estimate beforehand. However, it is possible to
> do better when dealing with packages. Other said, exploit the data from
> the build farms.
>
> Well, I stop here because it rings a bell: model could be discussed at
> length if it is never applied to concrete numbers. :-)
>
> Let keep it pragmatic! :-)
>
> Using the simple LFS model and SBU, what would be the typical error?
>
> For instance, I propose that we collectively send the timings of
> packages: bash, gmsh, julia, emacs, vim; or any other 5 packages for
> x86_64 architecture. Then we can compare typical errors between
> prediction and real, i.e., evaluate “accuracy“ for SBU and then decide
> if it is acceptable or not. :-)
>
> Cheers,
> simon
>
> 1: https://www.linuxfromscratch.org/lfs/view/stable/chapter04/aboutsbus.html
> 2: https://www.linuxfromscratch.org/~bdubbs/about.html
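
For reference, the SBU-style cross-multiplication from the quoted message (HP' = HP2/HP1 * HP) and the error check it proposes can be sketched as follows. All timings are invented placeholders, not measurements, and the function names are hypothetical:

```python
def predict_build_time(ref_a: float, ref_b: float, pkg_a: float) -> float:
    """HP' = HP2 / HP1 * HP.

    ref_a, ref_b: build time of the reference package on machines A and B.
    pkg_a: build time of another package on machine A.
    """
    return ref_b / ref_a * pkg_a


def relative_error(predicted: float, measured: float) -> float:
    """Relative gap between the prediction and the real build time."""
    return abs(predicted - measured) / measured


# Made-up numbers: the reference package takes 100 s on A and 150 s on B;
# another package takes 400 s on A.
predicted = predict_build_time(100.0, 150.0, 400.0)
print(predicted)  # 600.0 s expected on B

# If the real build on B took 750 s, the prediction is off by 20%.
print(round(relative_error(predicted, 750.0), 3))  # 0.2
```

Collecting `relative_error` over a handful of packages would give the "typical error" asked about in the quoted message.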


Thread overview: 19+ messages
2021-11-22 22:02 Proposal: Build timers Jacob Hrbek
2021-11-23  1:06 ` zimoun
2021-11-23  6:21   ` Jacob Hrbek
2021-11-23 11:56     ` zimoun
2021-11-23 14:39       ` Jacob Hrbek
2021-11-24 11:35         ` zimoun
2021-11-25  4:00           ` Jacob Hrbek [this message]
2021-11-23 12:05     ` Julien Lepiller
2021-11-23 16:23       ` zimoun
2021-11-23 20:09 ` Liliana Marie Prikler
2021-11-23 21:31   ` Jacob Hrbek
2021-11-23 21:35   ` Jacob Hrbek
2021-11-23 23:50     ` Julien Lepiller
2021-11-24 11:31       ` zimoun
2021-11-24 20:23         ` Vagrant Cascadian
2021-11-24 21:50           ` zimoun
2021-11-25  4:03           ` Jacob Hrbek
2021-11-25  5:21             ` Liliana Marie Prikler
2021-11-25 10:23             ` zimoun
