From: Mathieu Othacehe
To: Ludovic Courtès
Cc: Guix-devel
Subject: Re: Improving CI throughput
Date: Tue, 25 Aug 2020 15:32:50 +0200
Message-ID: <87eenuu9z1.fsf@gnu.org>
In-Reply-To: <877dto2jhw.fsf_-_@gnu.org> ("Ludovic Courtès"'s message of "Mon, 24 Aug 2020 16:42:19 +0200")
References: <3308cccb-0f9f-6499-b948-3062a8a81ec8@web.de> <874kpriytq.fsf@gnu.org> <877dto2jhw.fsf_-_@gnu.org>

Hey,

> Yeah, this is a ridiculous situation. We should do a hackathon to get
> better monitoring of useful metrics (machine load,
> time-of-push-to-time-to-build-completion, etc.), to clearly identify the
> bottlenecks (crashes? inefficient protocol? scheduling issues? Cuirass
> or offload or guix-daemon issue?), and to address as many of them as we
> can.
>
> Any volunteers? :-)

I'd really like to improve the situation! A hackathon seems like a nice
idea. As a matter of fact, I have already spent some time improving the
stability of the Cuirass web interface[1].

Now I can see multiple topics that could be approached in parallel:

* Add metrics to Cuirass as you suggested. There's an open ticket about
  that here[2].

* Investigate offloading issues[3].

* Fix database contention[4].

* Fix guix-daemon deadlocking[5].

* Monitor closely what's happening on Berlin and decide whether it makes
  sense to add a build scheduler mechanism somewhere. See what Hydra is
  doing[6] and what Chris is proposing[7].

As most of the issues are only observed on the Berlin machines, to which
access is restricted, we will also have to find a way to reproduce them
locally.

Anyway, if some people are motivated, we could try to plan a day or
weekend to work on those topics :).

Thanks,

Mathieu

[1]: https://issues.guix.gnu.org/42548
[2]: https://issues.guix.gnu.org/32548
[3]: https://issues.guix.gnu.org/34033
[4]: https://issues.guix.gnu.org/42001
[5]: https://issues.guix.gnu.org/31785
[6]: https://github.com/NixOS/hydra/blob/master/src/hydra-queue-runner/dispatcher.cc
[7]: https://lists.gnu.org/archive/html/guix-devel/2020-04/msg00323.html