* llvm on aarch64 builds very slowly
From: Ricardo Wurmus @ 2022-02-22 23:22 UTC
To: guix-devel

Hi Guix,

I had to manually run the build of llvm 11 on aarch64, because it would
keep timing out:

    time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999

After more than two days it finally built.  This seems a little
excessive.  Towards the end of the build I saw a 1% point progress
increase for every hour that passed.

Is there something wrong with the build nodes, are we building llvm 11
wrong, or is this just the way it is on aarch64 systems?

--
Ricardo

* Re: llvm on aarch64 builds very slowly
From: Ricardo Wurmus @ 2022-02-23 16:33 UTC
To: guix-devel

Ricardo Wurmus <rekado@elephly.net> writes:

> Hi Guix,
>
> I had to manually run the build of llvm 11 on aarch64, because it would
> keep timing out:
>
>     time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
>
> After more than two days it finally built.  This seems a little
> excessive.  Towards the end of the build I saw a 1% point progress
> increase for every hour that passed.
>
> Is there something wrong with the build nodes, are we building llvm 11
> wrong, or is this just the way it is on aarch64 systems?

I now see that gfortran 10 also takes a very long time to build.  It’s
on kreuzberg (10.0.0.9) and I see that out of the 16 cores only *one* is
really busy.  Other cores sometimes come in with a tiny bit of work, but
you might miss it if you blink.

Guix ran “make -j 16” at the top level, but the other make processes
that have been spawned as children do not have “-j 16”.  There are
probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
at 100% while the others are at 0.

What’s up with that?

--
Ricardo
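
The "only one of 16 cores is busy" observation above can be put into numbers by sampling /proc/stat twice and comparing the per-core counters. What follows is a minimal sketch, not something anyone in the thread ran: the function names are made up for illustration, and it assumes the usual Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_jiffies, total_jiffies)} for per-core lines."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and
        # unrelated lines such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # First eight counters only; guest time is already counted in user.
        values = [int(v) for v in fields[1:9]]
        idle = sum(values[3:5])  # idle + iowait
        total = sum(values)
        cores[fields[0]] = (total - idle, total)
    return cores


def busy_fractions(before, after):
    """Fraction of time each core was busy between two snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        result[name] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample_busy_fractions(interval=1.0, path="/proc/stat"):
    """Hypothetical helper: take two snapshots `interval` seconds apart."""
    with open(path) as handle:
        before = parse_proc_stat(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_proc_stat(handle.read())
    return busy_fractions(before, after)
```

Calling `sample_busy_fractions()` on the build machine while cc1plus is running would give one number per core; a single value near 1.0 with the rest near 0.0 would match what is described above.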

* Re: llvm on aarch64 builds very slowly
From: Christopher Baines @ 2022-02-23 17:49 UTC
To: Ricardo Wurmus; +Cc: guix-devel

Ricardo Wurmus <rekado@elephly.net> writes:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> Hi Guix,
>>
>> I had to manually run the build of llvm 11 on aarch64, because it would
>> keep timing out:
>>
>>     time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
>>
>> After more than two days it finally built.  This seems a little
>> excessive.  Towards the end of the build I saw a 1% point progress
>> increase for every hour that passed.
>>
>> Is there something wrong with the build nodes, are we building llvm 11
>> wrong, or is this just the way it is on aarch64 systems?
>
> I now see that gfortran 10 also takes a very long time to build.  It’s
> on kreuzberg (10.0.0.9) and I see that out of the 16 cores only *one* is
> really busy.  Other cores sometimes come in with a tiny bit of work, but
> you might miss it if you blink.
>
> Guix ran “make -j 16” at the top level, but the other make processes
> that have been spawned as children do not have “-j 16”.  There are
> probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
> at 100% while the others are at 0.
>
> What’s up with that?

Regarding the llvm derivation you mentioned [1], it looks like for
bordeaux.guix.gnu.org the build completed in around a couple of hours;
this was on the 4-core Overdrive machine, though.

1: https://data.guix.gnu.org/gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv

On the subject of the HoneyComb machines, I haven't noticed anything
like you describe with the one (hatysa) running behind
bordeaux.guix.gnu.org.  Most cores are fully occupied most of the time,
with the 15m load average sitting around 16.

Some things to check, though: what does the load average look like when
you think the system should be using all its cores?  If it's high but
there's not much CPU utilisation, that suggests there's a bottleneck
somewhere else.

Also, what does the memory and swap usage look like?  Hatysa has 32GB of
memory and swap, and ideally it would actually have 64GB, since that
would mean swapping less often.

One problem I have observed with hatysa is storage
instability/performance issues.  Looking in /var/log/messages, I see
things like the following.  Maybe check /var/log/messages for anything
similar?

  nvme nvme0: I/O 0 QID 6 timeout, aborting
  nvme nvme0: I/O 1 QID 6 timeout, aborting
  nvme nvme0: I/O 2 QID 6 timeout, aborting
  nvme nvme0: I/O 3 QID 6 timeout, aborting
  nvme nvme0: Abort status: 0x0
  nvme nvme0: Abort status: 0x0
  nvme nvme0: Abort status: 0x0
  nvme nvme0: Abort status: 0x0

Lastly, I'm not quite sure what thermal problems look like on ARM, but
maybe check the CPU temps.  I see between 60 and 70 degrees as reported
by the sensors command; this is with a different CPU cooler, though.

Chris
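
Two of the checks suggested above (load average versus CPU utilisation, and NVMe timeouts in the system log) are easy to script. The sketch below is only an illustration: the function names, the return labels and the 0.5 utilisation threshold are assumptions, and the regular expression is modelled on the log lines quoted in the message rather than on any kernel documentation.

```python
import re

# Pattern modelled on the quoted lines, e.g.
# "nvme nvme0: I/O 0 QID 6 timeout, aborting".
NVME_TIMEOUT = re.compile(r"nvme (nvme\d+): I/O \d+ QID \d+ timeout, aborting")


def count_nvme_timeouts(lines):
    """Count I/O timeout messages per NVMe device in an iterable of log lines."""
    counts = {}
    for line in lines:
        match = NVME_TIMEOUT.search(line)
        if match:
            device = match.group(1)
            counts[device] = counts.get(device, 0) + 1
    return counts


def classify_load(load_average, cores, cpu_busy_fraction, busy_threshold=0.5):
    """Rough reading of the heuristic: high load with idle CPUs points elsewhere.

    busy_threshold is an arbitrary cut-off chosen for this sketch.
    """
    if load_average >= cores and cpu_busy_fraction < busy_threshold:
        return "bottleneck elsewhere"  # e.g. storage or swap, per the message
    if load_average >= cores:
        return "cpu bound"
    return "underloaded"
```

On a live system the inputs could come from `os.getloadavg()` and `os.cpu_count()`, with the log lines read from /var/log/messages as mentioned above.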

* Re: llvm on aarch64 builds very slowly
From: Kaelyn @ 2022-02-26 16:54 UTC
To: Christopher Baines; +Cc: Ricardo Wurmus, guix-devel

On Wednesday, February 23rd, 2022 at 9:49 AM, Christopher Baines <mail@cbaines.net> wrote:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
> > Ricardo Wurmus <rekado@elephly.net> writes:
> >
> > > Hi Guix,
> > >
> > > I had to manually run the build of llvm 11 on aarch64, because it would
> > > keep timing out:
> > >
> > >     time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
> > >
> > > After more than two days it finally built.  This seems a little
> > > excessive.  Towards the end of the build I saw a 1% point progress
> > > increase for every hour that passed.
> > >
> > > Is there something wrong with the build nodes, are we building llvm 11
> > > wrong, or is this just the way it is on aarch64 systems?
> >
> > I now see that gfortran 10 also takes a very long time to build.  It’s
> > on kreuzberg (10.0.0.9) and I see that out of the 16 cores only *one* is
> > really busy.  Other cores sometimes come in with a tiny bit of work, but
> > you might miss it if you blink.
> >
> > Guix ran “make -j 16” at the top level, but the other make processes
> > that have been spawned as children do not have “-j 16”.  There are
> > probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
> > at 100% while the others are at 0.
> >
> > What’s up with that?
>
> Regarding the llvm derivation you mentioned [1], it looks like for
> bordeaux.guix.gnu.org the build completed in around a couple of hours;
> this was on the 4-core Overdrive machine, though.
>
> 1: https://data.guix.gnu.org/gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv
>
> On the subject of the HoneyComb machines, I haven't noticed anything
> like you describe with the one (hatysa) running behind
> bordeaux.guix.gnu.org.  Most cores are fully occupied most of the time,
> with the 15m load average sitting around 16.
>
> Some things to check, though: what does the load average look like when
> you think the system should be using all its cores?  If it's high but
> there's not much CPU utilisation, that suggests there's a bottleneck
> somewhere else.
>
> Also, what does the memory and swap usage look like?  Hatysa has 32GB of
> memory and swap, and ideally it would actually have 64GB, since that
> would mean swapping less often.

One thing I remember about building LLVM a number of years ago, when I
was working on it through my job (though only for x86-64, not aarch64),
is that the build is very memory intensive.  In particular, linking each
of the various binaries would be quite slow and consume a lot of memory,
causing significant, intense swapping with less than 64GB of memory in a
parallel build (and sometimes eventually triggering the OOM killer).  As
I recall, using ld.bfd for the build was by far the slowest, ld.gold was
noticeably better, and ld.lld was showing promise for doing better than
ld.gold.

Just my $0.02 of past experiences, in case they help to understand the
slow aarch64 build with LLVM 11.

Cheers,
Kaelyn

> One problem I have observed with hatysa is storage
> instability/performance issues.  Looking in /var/log/messages, I see
> things like the following.  Maybe check /var/log/messages for anything
> similar?
>
>   nvme nvme0: I/O 0 QID 6 timeout, aborting
>   nvme nvme0: I/O 1 QID 6 timeout, aborting
>   nvme nvme0: I/O 2 QID 6 timeout, aborting
>   nvme nvme0: I/O 3 QID 6 timeout, aborting
>   nvme nvme0: Abort status: 0x0
>   nvme nvme0: Abort status: 0x0
>   nvme nvme0: Abort status: 0x0
>   nvme nvme0: Abort status: 0x0
>
> Lastly, I'm not quite sure what thermal problems look like on ARM, but
> maybe check the CPU temps.  I see between 60 and 70 degrees as reported
> by the sensors command; this is with a different CPU cooler, though.
>
> Chris

* Re: llvm on aarch64 builds very slowly
From: Maxim Cournoyer @ 2022-02-24 2:23 UTC
To: Ricardo Wurmus; +Cc: guix-devel

Hi Ricardo,

Ricardo Wurmus <rekado@elephly.net> writes:

> Hi Guix,
>
> I had to manually run the build of llvm 11 on aarch64, because it would
> keep timing out:
>
>     time guix build
>     /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv
>     --timeout=999999 --max-silent-time=999999
>
> After more than two days it finally built.  This seems a little
> excessive.  Towards the end of the build I saw a 1% point progress
> increase for every hour that passed.
>
> Is there something wrong with the build nodes, are we building llvm 11
> wrong, or is this just the way it is on aarch64 systems?

I'd ask in #llvm on libera.chat; hopefully someone there is used to
developing on aarch64 and would know.

Thanks,

Maxim
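
Kaelyn's point earlier in the thread, that parallel link steps can exhaust memory and push a machine into swap, amounts to simple arithmetic: the number of links that can safely run at once is bounded by memory as well as by cores. The sketch below illustrates that bound only. The 4 GiB-per-link figure in the usage note is a made-up placeholder, not a measurement from the thread, and the function name is hypothetical.

```python
def max_link_jobs(memory_gib, per_link_gib, cores):
    """Upper bound on concurrent link jobs that fit in memory.

    per_link_gib is an assumed peak memory use of one link step; it has to
    be measured for the linker and build type actually in use.
    """
    if per_link_gib <= 0:
        raise ValueError("per_link_gib must be positive")
    fits_in_memory = int(memory_gib // per_link_gib)
    # Always allow at least one job, and never more than there are cores.
    return max(1, min(cores, fits_in_memory))
```

With the 32GB mentioned for hatysa and the placeholder 4 GiB per link, `max_link_jobs(32, 4, 16)` gives 8, while 64GB would allow all 16. LLVM's CMake build documents an `LLVM_PARALLEL_LINK_JOBS` option (for the Ninja generator) that takes such a limit; whether it would help the Guix package is not established anywhere in this thread.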

Thread overview: 5+ messages

2022-02-22 23:22 llvm on aarch64 builds very slowly Ricardo Wurmus
2022-02-23 16:33 ` Ricardo Wurmus
2022-02-23 17:49   ` Christopher Baines
2022-02-26 16:54     ` Kaelyn
2022-02-24  2:23 ` Maxim Cournoyer