* bug#63412: Topological sorting in cuirass
@ 2023-05-10 10:06 Andreas Enge
2023-05-10 13:59 ` Dr. Arne Babenhauserheide
2023-10-29 17:01 ` Ludovic Courtès
0 siblings, 2 replies; 3+ messages in thread
From: Andreas Enge @ 2023-05-10 10:06 UTC (permalink / raw)
To: 63412
This is a wishlist bug, but it is important for architectures where we
are currently short on build power, and where this issue can stall builds
and waste an arbitrary amount of build power.
Cuirass should sort builds and only offload derivations for which all
inputs are available.
In my current understanding, cuirass offloads arbitrary derivations, and
the machine to which they are offloaded then starts building recursively
all inputs. If this is true, then it is possible that at some point in time,
all build slots are taken by the same package built as many times as there
are machines; I have seen something like this when working on core-updates,
where several machines were building the main gcc compiler at the same time.
At worse, if cuirass asks every machine to build a leaf package, this may
result in a simultaneous full bootstrap on all of them.
The situation becomes worse when the package in question fails. Then as
I understand it, each machine may receive a request to build something
depending on the failing package and try the failing build and thus waste
build power that will not be available to build other packages successfully.
Solving this problem may also make reports of build failures more accurate
and legible. For instance, doxygen currently fails to build on aarch64:
https://ci.guix.gnu.org/build/969427/details
and is reported as "Failed", and not as "Failed (dependency)".
However, looking at the build log
https://ci.guix.gnu.org/build/969427/log/raw
shows this:
...
building path(s) `/gnu/store/p5vqrwywz053r1vkiyw54dp9gj7vw9xd-ninja-1.11.1'
...
builder for `/gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv' failed with exit code 1
@ build-failed /gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv - 1 builder for `/gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv' failed with exit code 1
cannot build derivation `/gnu/store/hlscqram59id51hxg0fj15041v52h1kw-meson-1.1.0.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/w8qxkrwpffd9qs5w1jggy1yi27ycm0xr-jsoncpp-1.9.5.drv': 2 dependencies couldn't be built
cannot build derivation `/gnu/store/mss4yv015cil1vnjnglq506m83b7n3dy-cmake-bootstrap-3.24.2.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/w0irp6xn30nlmpizhcbjnvhqmsba41jn-cmake-minimal-3.24.2.drv': 2 dependencies couldn't be built
cannot build derivation `/gnu/store/rqk2rbnpjpcnqswz8hqari1rnw6r8v1m-doxygen-1.9.5.drv': 1 dependencies couldn't be built
So it is indeed a different package that fails (and the last few lines
give a list of dependencies between ninja and doxygen, each of which may
or may not fail once ninja is fixed).
Notice that this could be solved without a topological sorting of the
dependency graph: It would be enough to keep an array deriv in which
deriv[i] contains a list of derivations requiring i more inputs to be built,
together with the list of inputs; elements in deriv[0] are ready to be sent
to a build machine, and upon completion of a build, all derivations
depending on it should be moved from deriv[i] to deriv[i-1] if the input
has been built successfully, or marked as "Failed (dependency)" if the
input has failed. (But this could be expensive, and may require appropriate
data structures.)
Alternatively, build jobs could be sorted topologically and then be kept
in a list; then before sending out a job, all its inputs have been tried
to be built; the job should then be sent if all inputs are available, or
be marked as "Failed (dependency)" if any of them has failed.
Andreas
^ permalink raw reply [flat|nested] 3+ messages in thread
* bug#63412: Topological sorting in cuirass
2023-05-10 10:06 bug#63412: Topological sorting in cuirass Andreas Enge
@ 2023-05-10 13:59 ` Dr. Arne Babenhauserheide
2023-10-29 17:01 ` Ludovic Courtès
1 sibling, 0 replies; 3+ messages in thread
From: Dr. Arne Babenhauserheide @ 2023-05-10 13:59 UTC (permalink / raw)
To: Andreas Enge; +Cc: 63412
[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]
Andreas Enge <andreas@enge.fr> writes:
> Cuirass should sort builds and only offload derivations for which all
> inputs are available.
...
> Alternatively, build jobs could be sorted topologically and then be kept
> in a list; then before sending out a job, all its inputs have been tried
> to be built; the job should then be sent if all inputs are available, or
> be marked as "Failed (dependency)" if any of them has failed.
If you want to try this out quickly, you could use code from the
topological sorting SRFI I’m slowly finalizing:
https://srfi.schemers.org/srfi-234/srfi-234.html
For usage see: https://github.com/scheme-requests-for-implementation/srfi-234/blob/main/srfi-234-test.scm
Code: https://github.com/scheme-requests-for-implementation/srfi-234/blob/main/srfi/234-impl.scm
edgelist->graph should do the conversion from inputs per package to the
input format.
Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein,
ohne es zu merken.
draketo.de
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1125 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
* bug#63412: Topological sorting in cuirass
2023-05-10 10:06 bug#63412: Topological sorting in cuirass Andreas Enge
2023-05-10 13:59 ` Dr. Arne Babenhauserheide
@ 2023-10-29 17:01 ` Ludovic Courtès
1 sibling, 0 replies; 3+ messages in thread
From: Ludovic Courtès @ 2023-10-29 17:01 UTC (permalink / raw)
To: Andreas Enge; +Cc: 63412
Hi,
Andreas Enge <andreas@enge.fr> skribis:
> This is a wishlist bug, but it is important for architectures where we
> are currently short on build power, and where this issue can stall builds
> and waste an arbitrary amount of build power.
>
> Cuirass should sort builds and only offload derivations for which all
> inputs are available.
Cuirass has two build backends: talking to the build daemon, and using
the ZeroMQ-based “remote worker” protocol. In practice we use the
latter, which fixes scalability issues with the former.
The worker protocol implements work stealing: workers periodically send
messages to “remote server” asking for work. Said server replies with a
derivation for one of the systems the worker supports; that derivation
is chosen among the “builds” whose dependencies have all been
successfully built.
And here’s the trick: the server is doing the right thing, but it has a
partial view. Namely, the server sees “builds” rather than
“derivations”. “Builds” are the things explicitly declared in the
jobset. If you have a declared build for GCC, but no build for MPC and
MPFR, then the server will consider that GCC has zero non-built
dependencies, even though MPFR and MPC may still need to be built.
Having ‘remote-worker’ operate on derivations rather than builds would
address this impedance mismatch, though there are complications (there
are bits of the database schema that amalgamate build/derivation).
Food for thought!
Ludo’.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-10-29 17:02 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-10 10:06 bug#63412: Topological sorting in cuirass Andreas Enge
2023-05-10 13:59 ` Dr. Arne Babenhauserheide
2023-10-29 17:01 ` Ludovic Courtès
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).