From: Ludovic Courtès
To: Mathieu Othacehe
Cc: Guix-devel
Subject: Re: Improving CI throughput
Date: Fri, 28 Aug 2020 15:51:28 +0200
Message-ID: <87imd2q3of.fsf@gnu.org>
In-Reply-To: <87eenuu9z1.fsf@gnu.org> (Mathieu Othacehe's message of "Tue, 25 Aug 2020 15:32:50 +0200")

Hello!

Mathieu Othacehe wrote:

>> Yeah, this is a ridiculous situation.  We should do a hackathon to get
>> better monitoring of useful metrics (machine load,
>> time-of-push-to-time-to-build-completion, etc.), to clearly identify the
>> bottlenecks (crashes? inefficient protocol? scheduling issues? Cuirass
>> or offload or guix-daemon issue?), and to address as many of them as we
>> can.
>>
>> Any volunteers?  :-)
>
> I'd really like to improve the situation! A hackathon seems like a
> nice idea.
>
> As a matter of fact, I already spent some time improving the stability
> of the Cuirass web interface[1].

Much appreciated!

> Now I can see multiple topics that could be approached in parallel:
>
> * Add metrics to Cuirass as you suggested. There's an open ticket about
> that here[2].
>
> * Investigate offloading issues[3].
>
> * Fix database contention[4].
>
> * Fix guix-daemon deadlocking[5].
>
> * Monitor closely what's happening on Berlin and decide if it is
> opportune to add a build scheduler mechanism somewhere. See what Hydra
> is doing[6] and what Chris is proposing[7].

I’m happy to help tackle daemon/offload issues, but I’ll be more
motivated if others join.  :-)

> As most of the issues are only observed on Berlin machines, to which
> access is restricted, we will also have to find a way to reproduce them
> locally.

Yeah, and these are usually non-deterministic issues and not that
frequent.

> Anyway, if some people are motivated, we could try to plan a day or
> weekend to work on those topics :).

I can try and spend some time on it this weekend.  I suggest that
people join the IRC channel and shout “CI!” as a way to rally, and then
share what they’re looking at and how they feel.  How does that sound?

Thanks for cooking up this list of issues!

Ludo’.