* Guix build coordinator and cgroups
@ 2024-06-07 14:08 Andreas Enge
From: Andreas Enge @ 2024-06-07 14:08 UTC (permalink / raw)
To: guix-devel
Hello,
when trying to run a Guix build agent in a Docker container on OpenShift
with a colleague, with 8 of the 128 cores of the physical machine assigned
to the container, the agent was completely choked, since it started all
builds with commands such as "make -j 128". The value 128 is determined by
a call to the Guile function current-processor-count, which calls nproc
from coreutils (see "man nproc"). This works on bare metal and in virtual
machines, but not in containers or, more generally, when cgroups are used
to limit the number of cores. Additionally, but less crucially, this
probably renders the max-1min-load-average parameter of
guix-build-coordinator-agent-configuration completely useless: in the
example, the machine could have a load of 120 on the other cores while
the part attached to the build agent is idle.
This can be worked around by passing extra arguments by hand, such as
"--cores=8" to the Guix daemon service, and by adapting max-parallel-builds
of the build agent service. Still, it would be nice to have a more
automated approach (for instance, when changing the number of assigned
cores in OpenShift, one does not want to recreate the Docker container
with new manual parameters).
Here is how far we got concerning a potential solution.
When cgroups are available, the file
  /sys/fs/cgroup/cpu.pressure
contains some measure of load congestion:
  some avg10=8.28 avg60=5.50 avg300=2.11 total=365519361
  full avg10=0.00 avg60=0.00 avg300=0.00 total=0
Its contents are described here:
  https://www.kernel.org/doc/html/latest/accounting/psi.html#psi
The "full" line is meaningless. I am not exactly sure what is measured
by the "some" line: it is not the load, but the percentage of time during
which "some tasks are stalled on a given resource". It looks like the
max-1min-load-average of the build agent service could be replaced by
a threshold for the avg60 value of the "some" line of this file.
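
To make the idea concrete, here is a minimal sketch of such a threshold
check. It is written in Python purely for illustration (the agent itself is
written in Guile), and the names parse_pressure, may_start_build and
max_avg60 are hypothetical, not part of the build coordinator. It only
assumes the "some"/"full" line format shown above.

```python
def parse_pressure(text):
    """Parse PSI-formatted text into a nested dict,
    e.g. {"some": {"avg10": 8.28, "avg60": 5.5, ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        kind, pairs = fields[0], fields[1:]
        # Each remaining field looks like "avg60=5.50" or "total=365519361".
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def may_start_build(text, max_avg60):
    """Hypothetical policy: allow a new build only while the avg60 value
    of the "some" line stays at or below the threshold max_avg60."""
    return parse_pressure(text)["some"]["avg60"] <= max_avg60


# The sample contents quoted in this message.
SAMPLE = (
    "some avg10=8.28 avg60=5.50 avg300=2.11 total=365519361\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

if __name__ == "__main__":
    # In a real agent the text would be read from
    # /sys/fs/cgroup/cpu.pressure instead of SAMPLE.
    print(may_start_build(SAMPLE, max_avg60=10.0))
```

What a sensible threshold would be is an open question; the value 10.0
above is only a placeholder.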
To obtain the current value, the libcgroup library, which is already
available in guix, can be used; we may need to write guile bindings.
I suppose that the number of available cores can be determined in a
similar manner.
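
As a rough sketch of what determining the core count could look like, the
following Python code reads the file directly instead of going through
libcgroup. It assumes cgroups v2, where the file cpu.max next to
cpu.pressure contains a quota and a period in microseconds (or "max" when
there is no limit). The function names cores_from_cpu_max and
available_cores are hypothetical, and rounding the quota up to whole cores
is a choice of this sketch, not something prescribed anywhere.

```python
import math
import os


def cores_from_cpu_max(text, fallback):
    """Derive a core count from the contents of a cgroup v2 cpu.max file.

    The file holds "QUOTA PERIOD" in microseconds, or "max PERIOD" when
    no limit is set.  Returns fallback when no usable limit is found."""
    fields = text.split()
    if len(fields) != 2 or fields[0] == "max":
        return fallback
    quota, period = int(fields[0]), int(fields[1])
    if quota <= 0 or period <= 0:
        return fallback
    # Round fractional quotas up, but never report more than fallback
    # and never less than one core.
    return max(1, min(fallback, math.ceil(quota / period)))


def available_cores(path="/sys/fs/cgroup/cpu.max"):
    """Number of cores usable by this process, honouring the CPU affinity
    mask and, if the file exists, the cgroup quota."""
    try:
        fallback = len(os.sched_getaffinity(0))  # Linux only
    except AttributeError:
        fallback = os.cpu_count() or 1
    try:
        with open(path) as f:
            text = f.read()
    except OSError:
        return fallback  # no cgroup v2 file at this path
    return cores_from_cpu_max(text, fallback)


if __name__ == "__main__":
    print(available_cores())
```

With 8 of 128 cores assigned through a quota, cpu.max would contain
something like "800000 100000", and the sketch would return 8.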
What do you think?
Andreas