Date: Sat, 26 Feb 2022 16:54:02 +0000
To: Christopher Baines
From: Kaelyn
Cc: Ricardo Wurmus, guix-devel@gnu.org
Subject: Re: llvm on aarch64 builds very slowly
In-Reply-To: <87wnhltq4k.fsf@cbaines.net>
References: <87wnhmbhpy.fsf@elephly.net> <877d9lbkll.fsf@elephly.net> <87wnhltq4k.fsf@cbaines.net>

On Wednesday, February 23rd, 2022 at 9:49 AM, Christopher Baines wrote:

> Ricardo Wurmus rekado@elephly.net writes:
>
> > Ricardo Wurmus rekado@elephly.net writes:
> >
> > > Hi Guix,
> > >
> > > I had to manually run the build of llvm 11 on aarch64, because it would
> > > keep timing out:
> > >
> > > time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
> > >
> > > After more than two days it finally built. This seems a little
> > > excessive. Towards the end of the build I saw a 1% point progress
> > > increase for every hour that passed.
> > >
> > > Is there something wrong with the build nodes, are we building llvm 11
> > > wrong, or is this just the way it is on aarch64 systems?
> >
> > I now see that gfortran 10 also takes a very long time to build. It’s
> > on kreuzberg (10.0.0.9) and I see that out of the 16 cores only one is
> > really busy. Other cores sometimes come in with a tiny bit of work, but
> > you might miss it if you blink.
> >
> > Guix ran “make -j 16” at the top level, but the other make processes
> > that have been spawned as children do not have “-j 16”. There are
> > probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
> > at 100% while the others are at 0.
> >
> > What’s up with that?
>
> Regarding the llvm derivation you mentioned [1], it looks like for
> bordeaux.guix.gnu.org, the build completed in around a couple of hours,
> this was on the 4 core Overdrive machine though.
>
> 1: https://data.guix.gnu.org/gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv
>
> On the subject of the HoneyComb machines, I haven't noticed anything
> like you describe with the one (hatysa) running behind
> bordeaux.guix.gnu.org. Most cores are fully occupied most of the time,
> with the 15m load average sitting around 16.
>
> Some things to check though, what does the load average look like when
> you think the system should be using all its cores? If it's high but
> there's not much CPU utilisation, that suggests there's a bottleneck
> somewhere else.
>
> Also, what does the memory and swap usage look like? Hatysa has 32GB of
> memory and swap, and ideally it would actually have 64GB, since that
> would avoid swapping more often.

One thing I remember about building LLVM a number of years ago when I
was working on it through my job (though only for x86-64, not aarch64)
is that the build is very memory intensive. In particular, linking the
various binaries would each be quite slow and consume a lot of memory,
causing significant, intense swapping with less than 64GB of memory in
a parallel build (and sometimes eventually trigger the OOM killer). As
I recall, using ld.bfd for the build was by far the slowest, ld.gold
was noticeably better, and ld.lld was showing promise for doing better
than ld.gold. Just my $0.02 of past experiences, in case they help to
understand the slow aarch64 build with LLVM 11.

Cheers,
Kaelyn

> One problem I have observed with hatysa is storage
> instability/performance issues. Looking in /var/log/messages, I see
> things like the following. Maybe check /var/log/messages for anything
> similar?
>
> nvme nvme0: I/O 0 QID 6 timeout, aborting
> nvme nvme0: I/O 1 QID 6 timeout, aborting
> nvme nvme0: I/O 2 QID 6 timeout, aborting
> nvme nvme0: I/O 3 QID 6 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
>
> Lastly, I'm not quite sure what thermal problems look like on ARM, but
> maybe check the CPU temps. I see between 60 and 70 degrees as reported
> by the sensors command, this is with a different CPU cooler though.
>
> Chris
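
A minimal sketch of the load-average check suggested in the quoted message (a high load average with little CPU utilisation pointing at a bottleneck elsewhere). The function names, the labels, and the 0.5 thresholds are illustrative assumptions, not something from the thread or from Guix; the CPU utilisation figure is assumed to come from a tool such as top.

```python
import os


def load_per_core(load_average, cores):
    """Normalise a load average by the number of cores."""
    return load_average / cores


def classify(load_15m, cpu_busy_fraction, cores):
    """Rough rule of thumb; the 0.5 thresholds are arbitrary assumptions.

    load_15m          -- 15 minute load average
    cpu_busy_fraction -- observed CPU utilisation across all cores, 0.0 to 1.0
    cores             -- number of cores on the machine
    """
    per_core = load_per_core(load_15m, cores)
    if per_core < 0.5:
        # Few runnable tasks: the build is not fanning out across cores.
        return "idle"
    if cpu_busy_fraction < 0.5:
        # Many tasks queued but CPUs mostly idle: likely waiting on
        # something else, such as storage or swap.
        return "stalled"
    return "busy"


def current_load_15m():
    """Return the 15 minute load average of this machine (Unix only)."""
    return os.getloadavg()[2]
```

For example, `classify(16.0, 0.95, 16)` gives "busy", which matches a 16 core machine with a 15m load average around 16 and most cores occupied, while `classify(16.0, 0.06, 16)` gives "stalled".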