unofficial mirror of bug-guix@gnu.org 
* bug#32548: Cuirass: Performance monitoring
@ 2018-08-27 22:33 Ludovic Courtès
  2020-09-06 14:42 ` Mathieu Othacehe
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2018-08-27 22:33 UTC (permalink / raw)
  To: 32548

As discussed earlier today on IRC with Clément, we could add performance
monitoring capabilities to Cuirass.  Interesting metrics would be:

  • time of push to time of evaluation completion;

  • time of evaluation completion to time of build completion.

We could visualize that per job over time.  Perhaps these are also stats
that ‘guix weather’ could display.

Ludo’.


* bug#32548: Cuirass: Performance monitoring
  2018-08-27 22:33 bug#32548: Cuirass: Performance monitoring Ludovic Courtès
@ 2020-09-06 14:42 ` Mathieu Othacehe
  2020-09-06 18:51   ` Christopher Baines
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Othacehe @ 2020-09-06 14:42 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 32548


Hello,

> As discussed earlier today on IRC with Clément, we could add performance
> monitoring capabilities to Cuirass.  Interesting metrics would be:
>
>   • time of push to time of evaluation completion;
>
>   • time of evaluation completion to time of build completion.

Small update on that one. With Cuirass commit
154232bc767d002f69aa6bb1cdddfd108b98584b, we now have the following
timestamps:

* Checkout commit time.
* Evaluation creation.
* Evaluation checkouts completion.
* Evaluation completion.

For the first timestamp, I'm using Guile-Git to extract the commit time,
which is not the commit push time. In fact, I think there is no such
thing as "commit push time" in git.
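
For reference, extracting that timestamp with Guile-Git boils down to
something like the following sketch (an illustration only, with a
made-up procedure name; the actual Cuirass code differs):

--8<---------------cut here---------------start------------->8---
(use-modules (git))

;; Hypothetical helper: return the commit time (seconds since the
;; epoch) of COMMIT-SHA in the Git checkout at DIRECTORY.
(define (checkout-commit-time directory commit-sha)
  (let* ((repository (repository-open directory))
         (commit     (commit-lookup repository (string->oid commit-sha))))
    (commit-time commit)))
--8<---------------cut here---------------end--------------->8---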

We can still compute the metric 'time of commit to time of evaluation
completion', but it's less relevant than the proposed 'time of push to
time of evaluation completion'.

The other proposed metric, 'time of evaluation completion to time of
build completion' can now be computed.
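
Concretely, once the timestamps are in the database, both durations are
plain subtractions; a rough sketch (made-up procedure and argument
names, not the actual Cuirass code):

--8<---------------cut here---------------start------------->8---
;; All arguments are Unix timestamps, in seconds.
(define (evaluation-durations commit-time eval-done builds-done)
  ;; Return 1) time of commit to time of evaluation completion and
  ;; 2) time of evaluation completion to time of build completion.
  (values (- eval-done commit-time)
          (- builds-done eval-done)))
--8<---------------cut here---------------end--------------->8---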

Regarding the actual computation and reporting of those metrics, I'm
still considering different options. I'd like to have a look at
Guile-prometheus, which Christopher wrote.

Thanks,

Mathieu





* bug#32548: Cuirass: Performance monitoring
  2020-09-06 14:42 ` Mathieu Othacehe
@ 2020-09-06 18:51   ` Christopher Baines
  2020-09-07  8:11     ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Baines @ 2020-09-06 18:51 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 32548



Mathieu Othacehe <othacehe@gnu.org> writes:

> Hello,
>
>> As discussed earlier today on IRC with Clément, we could add performance
>> monitoring capabilities to Cuirass.  Interesting metrics would be:
>>
>>   • time of push to time of evaluation completion;
>>
>>   • time of evaluation completion to time of build completion.
>
> Small update on that one. With Cuirass commit
> 154232bc767d002f69aa6bb1cdddfd108b98584b, we now have the following
> timestamps:
>
> * Checkout commit time.
> * Evaluation creation.
> * Evaluation checkouts completion.
> * Evaluation completion.
>
> For the first timestamp, I'm using Guile-Git to extract the commit time,
> which is not the commit push time. In fact, I think there is no such
> thing as "commit push time" in git.

I had this issue with the Guix Data Service as well; it uses the
timestamp in the email sent by the Savannah git hook, which is the
closest I've got to "commit push time".

> We can still compute the metric 'time of commit to time of evaluation
> completion', but it's less relevant than the proposed 'time of push to
> time of evaluation completion'.

As someone can commit, then potentially push those commits hours later,
assuming no one else has pushed, this data might be a bit noisy. The
time between Cuirass noticing the new commit and the evaluation
completing might be cleaner.



* bug#32548: Cuirass: Performance monitoring
  2020-09-06 18:51   ` Christopher Baines
@ 2020-09-07  8:11     ` Ludovic Courtès
  2020-09-10 13:26       ` Mathieu Othacehe
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2020-09-07  8:11 UTC (permalink / raw)
  To: Christopher Baines; +Cc: Mathieu Othacehe, 32548

Hi,

Christopher Baines <mail@cbaines.net> skribis:

> Mathieu Othacehe <othacehe@gnu.org> writes:
>
>> Hello,
>>
>>> As discussed earlier today on IRC with Clément, we could add performance
>>> monitoring capabilities to Cuirass.  Interesting metrics would be:
>>>
>>>   • time of push to time of evaluation completion;
>>>
>>>   • time of evaluation completion to time of build completion.
>>
>> Small update on that one. With Cuirass commit
>> 154232bc767d002f69aa6bb1cdddfd108b98584b, we now have the following
>> timestamps:
>>
>> * Checkout commit time.
>> * Evaluation creation.
>> * Evaluation checkouts completion.
>> * Evaluation completion.
>>
>> For the first timestamp, I'm using Guile-Git to extract the commit time,
>> which is not the commit push time. In fact, I think there is no such
>> thing as "commit push time" in git.
>
> I had this issue with the Guix Data Service as well; it uses the
> timestamp in the email sent by the Savannah git hook, which is the
> closest I've got to "commit push time".

Neat.

>> We can still compute the metric 'time of commit to time of evaluation
>> completion', but it's less relevant than the proposed 'time of push to
>> time of evaluation completion'.
>
> As someone can commit, then potentially push those commits hours later,
> assuming no one else has pushed, this data might be a bit noisy. The
> time between Cuirass noticing the new commit and the evaluation
> completing might be cleaner.

Agreed.  We regularly push commits that are weeks or months old
(sometimes years), so there might be too many outliers when looking at
the commit time.

Thanks for pushing this, Mathieu!

Ludo’.





* bug#32548: Cuirass: Performance monitoring
  2020-09-07  8:11     ` Ludovic Courtès
@ 2020-09-10 13:26       ` Mathieu Othacehe
  2020-09-14 13:34         ` Mathieu Othacehe
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Othacehe @ 2020-09-10 13:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 32548


Hello,

> Agreed.  We regularly push commits that are weeks or months old
> (sometimes years), so there might be too many outliers when looking at
> the commit time.

Yes, so I used checkout time instead of commit time with
af12a80599346968fb9f52edb33b48dd26852788.

I also turned the Evaluation 'in_progress' field into a 'status' field.
This makes it much easier to create metrics on evaluations, and it also
makes it possible to distinguish between 'aborted' and 'failed'
evaluations.
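
For instance, with a status per evaluation, a failed-evaluation
percentage becomes straightforward to compute; here is a sketch using
symbolic statuses (the actual database encoding differs):

--8<---------------cut here---------------start------------->8---
(use-modules (srfi srfi-1))

;; STATUSES is a list of symbols such as 'succeeded, 'failed or
;; 'aborted.  Return the percentage of failed evaluations.
(define (failed-evaluation-percentage statuses)
  (if (null? statuses)
      0
      (exact->inexact
       (* 100 (/ (count (lambda (s) (eq? s 'failed)) statuses)
                 (length statuses))))))
--8<---------------cut here---------------end--------------->8---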

Thanks,

Mathieu





* bug#32548: Cuirass: Performance monitoring
  2020-09-10 13:26       ` Mathieu Othacehe
@ 2020-09-14 13:34         ` Mathieu Othacehe
  2020-09-14 14:10           ` zimoun
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mathieu Othacehe @ 2020-09-14 13:34 UTC (permalink / raw)
  To: Ludovic Courtès, mail; +Cc: 32548


Hello,

I just pushed support for computing and displaying metrics in Cuirass. I
started with two metrics:

* Builds per day
* Average evaluation speed per specification.

Those metrics can now be seen at:

https://ci.guix.gnu.org/metrics

and are updated every hour.

I plan to add more metrics such as:

- Evaluation completion percentage.
- Evaluation completion speed.
- Failed evaluations percentage.
- Pending builds per day.

Don't hesitate to comment or propose other metrics.

Thanks,

Mathieu





* bug#32548: Cuirass: Performance monitoring
  2020-09-14 13:34         ` Mathieu Othacehe
@ 2020-09-14 14:10           ` zimoun
  2020-09-16  2:21             ` Bonface M. K.
  2020-09-14 19:27           ` Ludovic Courtès
  2020-09-16 15:56           ` Andreas Enge
  2 siblings, 1 reply; 14+ messages in thread
From: zimoun @ 2020-09-14 14:10 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 32548

Hi Mathieu,

Really cool!

On Mon, 14 Sep 2020 at 15:35, Mathieu Othacehe <othacehe@gnu.org> wrote:

> * Builds per day
> * Average evaluation speed per specification.

Something interesting could be: min and max (of 100 evaluations).
The standard deviation too, but I am not sure it is easy to
interpret at a quick glance.  Instead, the median could be
interesting.

For example, consider these 2 evaluations:

https://ci.guix.gnu.org/build/2094496/details
https://ci.guix.gnu.org/build/3035986/details

Well, if there are, say, 99 evaluations of the first "kind" and 1 of
the second kind, the average is:
(99*849 + 1_595_796_252) / 100 = 15_958_803.03
which does not really represent the effective workload.
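
To make that concrete, here is a quick sketch in plain Guile (not
Cuirass code) of how the median stays put while the mean explodes:

--8<---------------cut here---------------start------------->8---
(define (mean durations)
  (/ (apply + durations) (length durations)))

(define (median durations)
  (let* ((sorted (sort durations <))
         (n      (length sorted)))
    (if (odd? n)
        (list-ref sorted (quotient n 2))
        (/ (+ (list-ref sorted (- (quotient n 2) 1))
              (list-ref sorted (quotient n 2)))
           2))))

;; 99 "normal" durations plus one outlier.
(define durations (append (make-list 99 849) (list 1595796252)))
(exact->inexact (mean durations))   ;=> 15958803.03
(median durations)                  ;=> 849
--8<---------------cut here---------------end--------------->8---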

Well, I will try to take a look if I can find a moment. :-)


> Those metrics can now be seen at:
>
> https://ci.guix.gnu.org/metrics

Nice plot!


> I plan to add more metrics such as:
>
> - Evaluation completion percentage.
> - Evaluation completion speed.
> - Failed evaluations percentage.
> - Pending builds per day.

Cool!
Maybe time between commit time (not author time) and start of the build.


Cheers,
simon





* bug#32548: Cuirass: Performance monitoring
  2020-09-14 13:34         ` Mathieu Othacehe
  2020-09-14 14:10           ` zimoun
@ 2020-09-14 19:27           ` Ludovic Courtès
  2020-09-17 10:07             ` Mathieu Othacehe
  2020-09-16 15:56           ` Andreas Enge
  2 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2020-09-14 19:27 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 32548

Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

> I just pushed support for computing and displaying metrics in Cuirass. I
> started with two metrics:
>
> * Builds per day
> * Average evaluation speed per specification.
>
> Those metrics can now be seen at:
>
> https://ci.guix.gnu.org/metrics
>
> and are updated every hour.

This is very cool, thumbs up!

> I plan to add more metrics such as:
>
> - Evaluation completion percentage.
> - Evaluation completion speed.
> - Failed evaluations percentage.
> - Pending builds per day.

That’d be awesome.

As discussed on IRC, builds per day should be compared to new
derivations per day.  For example, if on a day there’s 100 new
derivations and we only manage to build 10 of them, we have a problem.
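
As a back-of-the-envelope sketch (a made-up procedure, not Cuirass
code), the number to watch is the ratio between the two:

--8<---------------cut here---------------start------------->8---
;; Return the fraction of NEW-DERIVATIONS that were built on a given
;; day, or #f when no new derivations were registered.
(define (daily-build-ratio builds-completed new-derivations)
  (and (positive? new-derivations)
       (exact->inexact (/ builds-completed new-derivations))))

(daily-build-ratio 10 100)   ;=> 0.1, i.e. we are falling behind
--8<---------------cut here---------------end--------------->8---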

BTW, in cuirass.log I noticed this:

--8<---------------cut here---------------start------------->8---
2020-09-14T21:16:21 Updating metric average-eval-duration-per-spec (guix-modular-master) to 414.8085106382979.
2020-09-14T21:16:21 Failed to compute metric average-10-last-eval-duration-per-spec (kernel-updates).
2020-09-14T21:16:21 Failed to compute metric average-100-last-eval-duration-per-spec (kernel-updates).
2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (kernel-updates).
2020-09-14T21:16:21 Failed to compute metric average-10-last-eval-duration-per-spec (staging-staging).
2020-09-14T21:16:21 Failed to compute metric average-100-last-eval-duration-per-spec (staging-staging).
2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (staging-staging).
2020-09-14T21:16:21 Failed to compute metric average-10-last-eval-duration-per-spec (version-1.0.1).
2020-09-14T21:16:21 Failed to compute metric average-100-last-eval-duration-per-spec (version-1.0.1).
2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (version-1.0.1).
2020-09-14T21:16:21 Failed to compute metric average-10-last-eval-duration-per-spec (version-1.1.0).
2020-09-14T21:16:21 Failed to compute metric average-100-last-eval-duration-per-spec (version-1.1.0).
2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (version-1.1.0).
2020-09-14T21:16:21 Failed to compute metric average-10-last-eval-duration-per-spec (wip-desktop).
2020-09-14T21:16:21 Failed to compute metric average-100-last-eval-duration-per-spec (wip-desktop).
2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (wip-desktop).
--8<---------------cut here---------------end--------------->8---

Perhaps it can’t compute an average yet for these jobsets?

Thanks!

Ludo’.





* bug#32548: Cuirass: Performance monitoring
  2020-09-14 14:10           ` zimoun
@ 2020-09-16  2:21             ` Bonface M. K.
  0 siblings, 0 replies; 14+ messages in thread
From: Bonface M. K. @ 2020-09-16  2:21 UTC (permalink / raw)
  To: zimoun; +Cc: Mathieu Othacehe, 32548


Hi all.

zimoun <zimon.toutoune@gmail.com> writes:

> Hi Mathieu,
>
> Really cool!
>
> On Mon, 14 Sep 2020 at 15:35, Mathieu Othacehe <othacehe@gnu.org> wrote:
>
>> * Builds per day
>> * Average evaluation speed per specification.
>
> Something interesting could be: min and max (of 100 evaluations).
> The standard deviation too, but I am not sure it is easy to
> interpret at a quick glance.  Instead, the median could be
> interesting.
>
> For example, consider these 2 evaluations:
>
> https://ci.guix.gnu.org/build/2094496/details
> https://ci.guix.gnu.org/build/3035986/details
>

I'm getting a 504 Gateway Time-out error when
visiting the above links (at the time of sending
this email).

> Well, if there are, say, 99 evaluations of the first "kind" and 1 of
> the second kind, the average is:
> (99*849 + 1_595_796_252) / 100 = 15_958_803.03
> which does not really represent the effective workload.
>
> Well, I will try to take a look if I can find a moment. :-)
>
>
>> Those metrics can now be seen at:
>>
>> https://ci.guix.gnu.org/metrics
>
> Nice plot!
>
>
>> I plan to add more metrics such as:
>>
>> - Evaluation completion percentage.
>> - Evaluation completion speed.
>> - Failed evaluations percentage.
>> - Pending builds per day.
>
> Cool!
> Maybe time between commit time (not author time) and start of the build.
>
>
> Cheers,
> simon
>
>
>
>

-- 
Bonface M. K. (https://www.bonfacemunyoki.com)
Chief Emacs Mchochezi
GPG key = D4F09EB110177E03C28E2FE1F5BBAE1E0392253F



* bug#32548: Cuirass: Performance monitoring
  2020-09-14 13:34         ` Mathieu Othacehe
  2020-09-14 14:10           ` zimoun
  2020-09-14 19:27           ` Ludovic Courtès
@ 2020-09-16 15:56           ` Andreas Enge
  2020-09-17  7:10             ` Mathieu Othacehe
  2 siblings, 1 reply; 14+ messages in thread
From: Andreas Enge @ 2020-09-16 15:56 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 32548

On Mon, Sep 14, 2020 at 03:34:17PM +0200, Mathieu Othacehe wrote:
> I just pushed support for computing and displaying metrics in Cuirass. I
> started with two metrics:
> * Builds per day
> * Average evaluation speed per specification.
> Those metrics can now be seen at:
> https://ci.guix.gnu.org/metrics

Congratulations, that looks like a very useful start already!
(And the number of builds has doubled since yesterday, so someone already
put it to good use!)

How about also adding metrics per build machine? I have the impression,
for instance, that the aarch64 machine in my living room is not used.
If this is confirmed, we could take appropriate action (uncomment it in
/etc/machines.scm :-), compare to other used machines, change the scheduling
in the daemon, or even turn it off to conserve energy should it turn out
that we have too much build power...).

Andreas






* bug#32548: Cuirass: Performance monitoring
  2020-09-16 15:56           ` Andreas Enge
@ 2020-09-17  7:10             ` Mathieu Othacehe
  2020-09-18 12:21               ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Othacehe @ 2020-09-17  7:10 UTC (permalink / raw)
  To: Andreas Enge; +Cc: 32548


Hello Andreas,

> Congratulations, that looks like a very useful start already!
> (And the number of builds has doubled since yesterday, so someone already
> put it to good use!)

Thanks for your feedback :)

> How about also adding metrics per build machine? I have the impression,
> for instance, that the aarch64 machine in my living room is not used.
> If this is confirmed, we could take appropriate action (uncomment it in
> /etc/machines.scm :-), compare to other used machines, change the scheduling
> in the daemon, or even turn it off to conserve energy should it turn out
> that we have too much build power...).

Yes I would really like to have something like:
https://hydra.nixos.org/machines, with a build rate for every machine.

However, it cannot be done without structural changes to how offloading
is handled. For now it's working this way:

Cuirass -> guix-daemon -> guix offload -> build machines

This means that Cuirass has almost no information about offloaded
builds. We are currently starting discussions about inviting the Guix
Build Coordinator to the party.

That could maybe help us implement what you are proposing, among other
things.

Thanks,

Mathieu





* bug#32548: Cuirass: Performance monitoring
  2020-09-14 19:27           ` Ludovic Courtès
@ 2020-09-17 10:07             ` Mathieu Othacehe
  2020-09-17 20:22               ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Othacehe @ 2020-09-17 10:07 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 32548-done


Hey Ludo,

> As discussed on IRC, builds per day should be compared to new
> derivations per day.  For example, if on a day there’s 100 new
> derivations and we only manage to build 10 of them, we have a problem.

I added this line, and they sadly do not overlap :(

> 2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (version-1.1.0).
> 2020-09-14T21:16:21 Failed to compute metric average-10-last-eval-duration-per-spec (wip-desktop).
> 2020-09-14T21:16:21 Failed to compute metric average-100-last-eval-duration-per-spec (wip-desktop).
> 2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (wip-desktop).
>
> Perhaps it can’t compute an average yet for these jobsets?

Yes, as soon as those evaluations are repaired, we should be able to
compute those metrics. I chose to keep the error messages as a
reminder.

I added various other metrics and updated the "/metrics" page. Once we
have a better view, we should think of adding thresholds on those
metrics.

Closing this one!

Thanks,

Mathieu

-- 
https://othacehe.org





* bug#32548: Cuirass: Performance monitoring
  2020-09-17 10:07             ` Mathieu Othacehe
@ 2020-09-17 20:22               ` Ludovic Courtès
  0 siblings, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2020-09-17 20:22 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 32548-done

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> As discussed on IRC, builds per day should be compared to new
>> derivations per day.  For example, if on a day there’s 100 new
>> derivations and we only manage to build 10 of them, we have a problem.
>
> I added this line, and they sadly do not overlap :(

It seems less bad than I thought though, and the rendering is pretty.
:-)

>> 2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (version-1.1.0).
>> 2020-09-14T21:16:21 Failed to compute metric average-10-last-eval-duration-per-spec (wip-desktop).
>> 2020-09-14T21:16:21 Failed to compute metric average-100-last-eval-duration-per-spec (wip-desktop).
>> 2020-09-14T21:16:21 Failed to compute metric average-eval-duration-per-spec (wip-desktop).
>>
>> Perhaps it can’t compute an average yet for these jobsets?
>
> Yes, as soon as those evaluations are repaired, we should be able to
> compute those metrics. I chose to keep the error messages as a
> reminder.

Makes sense.

> I added various other metrics and updated the "/metrics" page. Once we
> have a better view, we should think of adding thresholds on those
> metrics.

Excellent.

Thanks a lot for closing this gap!

Ludo’.





* bug#32548: Cuirass: Performance monitoring
  2020-09-17  7:10             ` Mathieu Othacehe
@ 2020-09-18 12:21               ` Ludovic Courtès
  0 siblings, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2020-09-18 12:21 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 32548

Hi Mathieu!

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> How about also adding metrics per build machine? I have the impression,
>> for instance, that the aarch64 machine in my living room is not used.
>> If this is confirmed, we could take appropriate action (uncomment it in
>> /etc/machines.scm :-), compare to other used machines, change the scheduling
>> in the daemon, or even turn it off to conserve energy should it turn out
>> that we have too much build power...).
>
> Yes I would really like to have something like:
> https://hydra.nixos.org/machines, with a build rate for every machine.

+1!

> However, it cannot be done without structural changes to how offloading
> is handled. For now it's working this way:
>
> Cuirass -> guix-daemon -> guix offload -> build machines
>
> This means that Cuirass has almost no information about offloaded
> builds.

In practice, it could parse the offload events that it gets; a bit of a
hack, but good enough.  However…

> We are currently starting discussions about inviting the Guix Build
> Coordinator to the party.

… this sounds like the better option longer-term.

Ludo’.





end of thread

Thread overview: 14+ messages
2018-08-27 22:33 bug#32548: Cuirass: Performance monitoring Ludovic Courtès
2020-09-06 14:42 ` Mathieu Othacehe
2020-09-06 18:51   ` Christopher Baines
2020-09-07  8:11     ` Ludovic Courtès
2020-09-10 13:26       ` Mathieu Othacehe
2020-09-14 13:34         ` Mathieu Othacehe
2020-09-14 14:10           ` zimoun
2020-09-16  2:21             ` Bonface M. K.
2020-09-14 19:27           ` Ludovic Courtès
2020-09-17 10:07             ` Mathieu Othacehe
2020-09-17 20:22               ` Ludovic Courtès
2020-09-16 15:56           ` Andreas Enge
2020-09-17  7:10             ` Mathieu Othacehe
2020-09-18 12:21               ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
