From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bug-guix-bounces+larch=yhetil.org@gnu.org>
Received: from mp11.migadu.com ([2001:41d0:2:bcc0::])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	by ms9.migadu.com with LMTPS
	id 4DAnMeJsW2SwNwAASxT56A
	(envelope-from <bug-guix-bounces+larch=yhetil.org@gnu.org>)
	for <larch@yhetil.org>; Wed, 10 May 2023 12:07:30 +0200
Received: from aspmx1.migadu.com ([2001:41d0:2:bcc0::])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	by mp11.migadu.com with LMTPS
	id MDsxMeJsW2QSvgAA9RJhRA
	(envelope-from <bug-guix-bounces+larch=yhetil.org@gnu.org>)
	for <larch@yhetil.org>; Wed, 10 May 2023 12:07:30 +0200
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by aspmx1.migadu.com (Postfix) with ESMTPS id 53974ABCD
	for <larch@yhetil.org>; Wed, 10 May 2023 12:07:30 +0200 (CEST)
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <bug-guix-bounces@gnu.org>)
	id 1pwgiu-0002cz-MT; Wed, 10 May 2023 06:07:04 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
 id 1pwgis-0002co-MN
 for bug-guix@gnu.org; Wed, 10 May 2023 06:07:02 -0400
Received: from debbugs.gnu.org ([209.51.188.43])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
 id 1pwgis-0003fQ-EP
 for bug-guix@gnu.org; Wed, 10 May 2023 06:07:02 -0400
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
 (envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1pwgis-0002dt-9x
 for bug-guix@gnu.org; Wed, 10 May 2023 06:07:02 -0400
X-Loop: help-debbugs@gnu.org
Subject: bug#63412: Topological sorting in cuirass
Resent-From: Andreas Enge <andreas@enge.fr>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-guix@gnu.org
Resent-Date: Wed, 10 May 2023 10:07:02 +0000
Resent-Message-ID: <handler.63412.B.168371321210136@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: report 63412
X-GNU-PR-Package: guix
X-GNU-PR-Keywords: 
To: 63412@debbugs.gnu.org
X-Debbugs-Original-To: bug-guix@gnu.org
Received: via spool by submit@debbugs.gnu.org id=B.168371321210136
 (code B ref -1); Wed, 10 May 2023 10:07:02 +0000
Received: (at submit) by debbugs.gnu.org; 10 May 2023 10:06:52 +0000
Received: from localhost ([127.0.0.1]:45276 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
 id 1pwgih-0002dP-PW
 for submit@debbugs.gnu.org; Wed, 10 May 2023 06:06:52 -0400
Received: from lists.gnu.org ([209.51.188.17]:34622)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <andreas@enge.fr>) id 1pwgif-0002dH-Is
 for submit@debbugs.gnu.org; Wed, 10 May 2023 06:06:50 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <andreas@enge.fr>) id 1pwgid-0002c0-Nf
 for bug-guix@gnu.org; Wed, 10 May 2023 06:06:47 -0400
Received: from hera.aquilenet.fr ([185.233.100.1])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <andreas@enge.fr>) id 1pwgiW-0003XH-Hx
 for bug-guix@gnu.org; Wed, 10 May 2023 06:06:42 -0400
Received: from localhost (localhost [127.0.0.1])
 by hera.aquilenet.fr (Postfix) with ESMTP id 03CD6354;
 Wed, 10 May 2023 12:06:32 +0200 (CEST)
X-Virus-Scanned: Debian amavisd-new at hera.aquilenet.fr
Received: from hera.aquilenet.fr ([127.0.0.1])
 by localhost (hera.aquilenet.fr [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 7Ed9LQSqCVIA; Wed, 10 May 2023 12:06:30 +0200 (CEST)
Received: from jurong (unknown [IPv6:2001:861:c4:f2f0::c64])
 by hera.aquilenet.fr (Postfix) with ESMTPSA id 44FE45B;
 Wed, 10 May 2023 12:06:30 +0200 (CEST)
Date: Wed, 10 May 2023 12:06:28 +0200
From: Andreas Enge <andreas@enge.fr>
Message-ID: <ZFtspPexmg3YM/ug@jurong>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Received-SPF: pass client-ip=185.233.100.1; envelope-from=andreas@enge.fr;
 helo=hera.aquilenet.fr
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001,
 SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-BeenThere: bug-guix@gnu.org
List-Id: Bug reports for GNU Guix <bug-guix.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-guix>,
 <mailto:bug-guix-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/bug-guix>
List-Post: <mailto:bug-guix@gnu.org>
List-Help: <mailto:bug-guix-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-guix>,
 <mailto:bug-guix-request@gnu.org?subject=subscribe>
Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org
Sender: bug-guix-bounces+larch=yhetil.org@gnu.org
X-Migadu-Flow: FLOW_IN
X-Migadu-Country: US
ARC-Seal: i=1; s=key1; d=yhetil.org; t=1683713250; a=rsa-sha256; cv=none;
	b=aZ6Rt8l68n8f4ARWbUG5w1hG2EENuVX0rZrZW8FuJ/cNPKaVo83SPNH6dJiSIi9g75N9qi
	3hf5NXhpiEuWehyqfMuANWlYqeV1QHaaAj6IufKXhlwNEy6mk7T/FvwgyCdFYOnafBRbqU
	0cHLvwIKORVNm+1Xuu2xLYzp3hsyGo2Arph6X+B/Qo5iUpN9OpA+MZjAV7JLnXHlt6XfSo
	PCT0GeChNY/zMB2Er6M3XXDM5Dr3SHtwDt6QCMsdKrVukcszuUQgAVUchtntsM4WjfYl9o
	n3KZT4w5OWebt6asEEnFoA2Rgxz8n1c1A4bkIeX0Sm93GEm8zqSF09LmGwWRZw==
ARC-Authentication-Results: i=1;
	aspmx1.migadu.com;
	dkim=none;
	dmarc=none;
	spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org"
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org;
	s=key1; t=1683713250;
	h=from:from:sender:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:mime-version:mime-version:
	 content-type:content-type:resent-cc:resent-from:resent-sender:
	 resent-message-id:list-id:list-help:list-unsubscribe:list-subscribe:
	 list-post; bh=9Z+CTlh8rI2U5g/8hsl3a+GETPEtBg+E4WQLca1ztVQ=;
	b=gDUvpuKAEA3ufJpkjto/VNuFuWRL6Q8grfnqLeWSexY/j/frRegCmt3eirFtlMo/is1iOr
	20JavDkYjrgGh03sh9eta66DA8a9+xjGSLpqOyE6F+EP4lXp4+GcmeSZBStsXkhSjwbrdQ
	iTL6wzy2qISeepVkwfU9Csc8P/ZEnV12gX/1Jw23I94R7I8PJ+TY9u3qiw1q0YxYjykrfM
	AZV1/5zknEwYX6V6tGVlk8T2SoMJmINIDgJYqogpjUTtBQbCFovtli34QV3GdLDMmx/oF5
	mP/X4Suj+FJKTCUQO2DXyFy9lObW4VkaH71MAJG/cAPToAG5C0f8PUUjstK6KQ==
X-Migadu-Scanner: scn1.migadu.com
Authentication-Results: aspmx1.migadu.com;
	dkim=none;
	dmarc=none;
	spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org"
X-Migadu-Spam-Score: -2.88
X-Spam-Score: -2.88
X-Migadu-Queue-Id: 53974ABCD
X-TUID: kUMaQ+FGWOwY

This is a wishlist bug, but it is important for architectures where we
are currently short on build power, and where this issue can stall builds
and waste an arbitrary amount of build power.

Cuirass should sort builds and only offload derivations for which all
inputs are available.

In my current understanding, cuirass offloads arbitrary derivations, and
the machine to which they are offloaded then starts building recursively
all inputs. If this is true, then it is possible that at some point in time,
all build slots are taken by the same package built as many times as there
are machines; I have seen something like this when working on core-updates,
where several machines were building the main gcc compiler at the same time.
At worst, if cuirass asks every machine to build a leaf package, this may
result in a simultaneous full bootstrap on all of them.

The situation becomes worse when the package in question fails. Then, as
I understand it, each machine may receive a request to build something
that depends on the failing package, retry the failing build, and thus
waste build power that is then unavailable for packages that could build
successfully.

Solving this problem may also make reports of build failures more accurate
and legible. For instance, doxygen currently fails to build on aarch64:
   https://ci.guix.gnu.org/build/969427/details
and is reported as "Failed", and not as "Failed (dependency)".
However, looking at the build log
   https://ci.guix.gnu.org/build/969427/log/raw
shows this:
...
building path(s) `/gnu/store/p5vqrwywz053r1vkiyw54dp9gj7vw9xd-ninja-1.11.1'
...
builder for `/gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv' failed with exit code 1
@ build-failed /gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv - 1 builder for `/gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv' failed with exit code 1
cannot build derivation `/gnu/store/hlscqram59id51hxg0fj15041v52h1kw-meson-1.1.0.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/w8qxkrwpffd9qs5w1jggy1yi27ycm0xr-jsoncpp-1.9.5.drv': 2 dependencies couldn't be built
cannot build derivation `/gnu/store/mss4yv015cil1vnjnglq506m83b7n3dy-cmake-bootstrap-3.24.2.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/w0irp6xn30nlmpizhcbjnvhqmsba41jn-cmake-minimal-3.24.2.drv': 2 dependencies couldn't be built
cannot build derivation `/gnu/store/rqk2rbnpjpcnqswz8hqari1rnw6r8v1m-doxygen-1.9.5.drv': 1 dependencies couldn't be built

So it is indeed a different package that fails (and the last few lines
give a list of dependencies between ninja and doxygen, each of which may
or may not fail once ninja is fixed).
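
To illustrate the distinction, here is a minimal Python sketch that tells
the two cases apart using only the two line shapes visible in the excerpt
above ("builder for ... failed with exit code" and "cannot build
derivation ..."). The function name classify and the returned "Unknown"
label are hypothetical; this is not how cuirass computes its statuses,
and real logs may contain other forms.

```python
import re

# Line shapes taken from the log excerpt quoted in this report.
BUILDER_FAILED = re.compile(r"^builder for `([^']+)' failed with exit code \d+")
CANNOT_BUILD = re.compile(r"^cannot build derivation `([^']+)':")

def classify(log, target):
    """Return a status for the derivation `target` based on the raw log."""
    failed, blocked = set(), set()
    for line in log.splitlines():
        if (m := BUILDER_FAILED.match(line)):
            failed.add(m.group(1))      # its own builder exited non-zero
        elif (m := CANNOT_BUILD.match(line)):
            blocked.add(m.group(1))     # never started, an input was missing
    if target in failed:
        return "Failed"
    if target in blocked:
        return "Failed (dependency)"
    return "Unknown"
```

Applied to the excerpt above, the ninja derivation would come out as
"Failed" and the doxygen derivation as "Failed (dependency)".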

Notice that this could be solved without a topological sorting of the
dependency graph: It would be enough to keep an array deriv in which
deriv[i] contains a list of derivations requiring i more inputs to be built,
together with the list of inputs; elements in deriv[0] are ready to be sent
to a build machine, and upon completion of a build, all derivations
depending on it should be moved from deriv[i] to deriv[i-1] if the input
has been built successfully, or marked as "Failed (dependency)" if the
input has failed. (But this could be expensive, and may require appropriate
data structures.)
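
The bookkeeping described above could look roughly like the following
Python sketch. It is only an illustration under assumptions: the names
schedule, inputs and build are hypothetical, every input is assumed to
appear as a key of the graph, and builds run one after another instead of
being offloaded. Rather than a literal array deriv[i], it keeps a counter
of missing inputs per derivation, which amounts to the same thing.

```python
from collections import defaultdict

def schedule(inputs, build):
    """inputs: dict mapping each derivation to the set of its inputs.
    build: callable taking a derivation, returning True on success.
    Returns a dict mapping each derivation to its final status."""
    missing = {d: len(deps) for d, deps in inputs.items()}
    dependents = defaultdict(list)
    for d, deps in inputs.items():
        for dep in deps:
            dependents[dep].append(d)

    status = {}
    ready = [d for d, n in missing.items() if n == 0]   # plays the role of deriv[0]
    while ready:
        d = ready.pop()
        if build(d):
            status[d] = "Succeeded"
            for user in dependents[d]:                  # deriv[i] -> deriv[i-1]
                missing[user] -= 1
                if missing[user] == 0 and user not in status:
                    ready.append(user)
        else:
            status[d] = "Failed"
            stack = list(dependents[d])                 # everything above d is doomed
            while stack:
                user = stack.pop()
                if user not in status:
                    status[user] = "Failed (dependency)"
                    stack.extend(dependents[user])
    return status

# Simplified, hypothetical graph loosely modelled on the log above.
graph = {"ninja": set(), "meson": {"ninja"}, "doxygen": {"meson"}, "hello": set()}
result = schedule(graph, lambda d: d != "ninja")
```

In this toy run only ninja and hello are ever handed to the builder; meson
and doxygen are marked "Failed (dependency)" without using a build slot.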

Alternatively, build jobs could be sorted topologically and then be kept
in a list; by the time a job is due to be sent out, builds of all its
inputs have then already been attempted; the job should be sent if all
inputs are available, or be marked as "Failed (dependency)" if any of
them has failed.
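
A minimal Python sketch of this second variant, again under assumptions:
run_in_order, inputs and build are hypothetical names, builds are done
sequentially, and the ordering comes from the standard library's
graphlib.TopologicalSorter (Python 3.9 or later), not from anything in
cuirass.

```python
from graphlib import TopologicalSorter

def run_in_order(inputs, build):
    """inputs: dict mapping each derivation to the set of its inputs.
    build: callable taking a derivation, returning True on success.
    Returns a dict mapping each derivation to its final status."""
    status = {}
    # static_order() yields every derivation after all of its inputs.
    for d in TopologicalSorter(inputs).static_order():
        deps = inputs.get(d, ())
        if all(status[dep] == "Succeeded" for dep in deps):
            status[d] = "Succeeded" if build(d) else "Failed"
        else:
            # Some input failed (directly or transitively): do not send it out.
            status[d] = "Failed (dependency)"
    return status
```

The sort is done once up front, so the per-job check is just a lookup of
the input statuses; the price is that the order is fixed before anything
is known about which builds will fail.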

Andreas