unofficial mirror of guix-devel@gnu.org 
* llvm on aarch64 builds very slowly
@ 2022-02-22 23:22 Ricardo Wurmus
  2022-02-23 16:33 ` Ricardo Wurmus
  2022-02-24  2:23 ` Maxim Cournoyer
  0 siblings, 2 replies; 5+ messages in thread
From: Ricardo Wurmus @ 2022-02-22 23:22 UTC (permalink / raw)
  To: guix-devel

Hi Guix,

I had to manually run the build of llvm 11 on aarch64, because it would
keep timing out:

    time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999

After more than two days it finally built.  This seems a little
excessive.  Towards the end of the build I saw a 1% point progress
increase for every hour that passed.

Is there something wrong with the build nodes, are we building llvm 11
wrong, or is this just the way it is on aarch64 systems?

-- 
Ricardo



* Re: llvm on aarch64 builds very slowly
  2022-02-22 23:22 llvm on aarch64 builds very slowly Ricardo Wurmus
@ 2022-02-23 16:33 ` Ricardo Wurmus
  2022-02-23 17:49   ` Christopher Baines
  2022-02-24  2:23 ` Maxim Cournoyer
  1 sibling, 1 reply; 5+ messages in thread
From: Ricardo Wurmus @ 2022-02-23 16:33 UTC (permalink / raw)
  To: guix-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> Hi Guix,
>
> I had to manually run the build of llvm 11 on aarch64, because it would
> keep timing out:
>
>     time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
>
> After more than two days it finally built.  This seems a little
> excessive.  Towards the end of the build I saw a 1% point progress
> increase for every hour that passed.
>
> Is there something wrong with the build nodes, are we building llvm 11
> wrong, or is this just the way it is on aarch64 systems?

I now see that gfortran 10 also takes a very long time to build.  It’s
on kreuzberg (10.0.0.9) and I see that out of the 16 cores only *one* is
really busy.  Other cores sometimes come in with a tiny bit of work, but
you might miss it if you blink.

Guix ran “make -j 16” at the top level, but the other make processes
that have been spawned as children do not have “-j 16”.  There are
probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
at 100% while the others are at 0.

What’s up with that?
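
As far as I understand, sub-makes normally get their parallelism through
the jobserver settings in MAKEFLAGS rather than an explicit “-j 16” on
their command line, so the environment is probably more telling than the
ps output.  A rough sketch of the generic GNU/Linux commands I would use
to check (<pid> is a placeholder for one of the sub-make or cc1plus
PIDs, not an actual PID from kreuzberg):

    # which core (PSR) each make/cc1plus process is running on, and its CPU usage
    ps -eo pid,psr,pcpu,args | grep -E '[m]ake|[c]c1plus'

    # did this sub-make inherit the parallel/jobserver flags?
    tr '\0' '\n' < /proc/<pid>/environ | grep MAKEFLAGS

    # is the process pinned to CPU0 by an affinity mask?
    taskset -cp <pid>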

-- 
Ricardo



* Re: llvm on aarch64 builds very slowly
  2022-02-23 16:33 ` Ricardo Wurmus
@ 2022-02-23 17:49   ` Christopher Baines
  2022-02-26 16:54     ` Kaelyn
  0 siblings, 1 reply; 5+ messages in thread
From: Christopher Baines @ 2022-02-23 17:49 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel



Ricardo Wurmus <rekado@elephly.net> writes:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> Hi Guix,
>>
>> I had to manually run the build of llvm 11 on aarch64, because it would
>> keep timing out:
>>
>>     time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
>>
>> After more than two days it finally built.  This seems a little
>> excessive.  Towards the end of the build I saw a 1% point progress
>> increase for every hour that passed.
>>
>> Is there something wrong with the build nodes, are we building llvm 11
>> wrong, or is this just the way it is on aarch64 systems?
>
> I now see that gfortran 10 also takes a very long time to build.  It’s
> on kreuzberg (10.0.0.9) and I see that out of the 16 cores only *one* is
> really busy.  Other cores sometimes come in with a tiny bit of work, but
> you might miss it if you blink.
>
> Guix ran “make -j 16” at the top level, but the other make processes
> that have been spawned as children do not have “-j 16”.  There are
> probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
> at 100% while the others are at 0.
>
> What’s up with that?

Regarding the llvm derivation you mentioned [1], it looks like for
bordeaux.guix.gnu.org the build completed in around a couple of hours,
though that was on the 4-core Overdrive machine.

1: https://data.guix.gnu.org/gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv

On the subject of the HoneyComb machines, I haven't noticed anything
like you describe with the one (hatysa) running behind
bordeaux.guix.gnu.org. Most cores are fully occupied most of the time,
with the 15m load average sitting around 16.

Some things to check, though: what does the load average look like when
you think the system should be using all its cores? If it's high but
there's not much CPU utilisation, that suggests there's a bottleneck
somewhere else.

Also, what does the memory and swap usage look like? Hatysa has 32GB of
memory and swap; ideally it would have 64GB, since that would mean it
swaps less often.
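
A sketch of the commands I would use to check both of these on a
GNU/Linux system (nothing here is specific to hatysa or kreuzberg):

    uptime        # 1, 5 and 15 minute load averages
    free -h       # memory and swap usage
    vmstat 5 3    # run queue, swapping and I/O wait, sampled three times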

One problem I have observed with hatysa is storage
instability/performance issues. Looking in /var/log/messages, I see
things like the following. Maybe check /var/log/messages for anything
similar?

  nvme nvme0: I/O 0 QID 6 timeout, aborting
  nvme nvme0: I/O 1 QID 6 timeout, aborting
  nvme nvme0: I/O 2 QID 6 timeout, aborting
  nvme nvme0: I/O 3 QID 6 timeout, aborting
  nvme nvme0: Abort status: 0x0
  nvme nvme0: Abort status: 0x0
  nvme nvme0: Abort status: 0x0
  nvme nvme0: Abort status: 0x0

Lastly, I'm not quite sure what thermal problems look like on ARM, but
maybe check the CPU temps. I see between 60 and 70 degrees as reported
by the sensors command, though this is with a different CPU cooler.
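
Again just a sketch of the commands (the sensors tool comes from
lm-sensors and needs to be installed):

    grep -i nvme /var/log/messages    # storage timeouts/aborts like the above
    sensors                           # CPU temperature readings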

Chris



* Re: llvm on aarch64 builds very slowly
  2022-02-22 23:22 llvm on aarch64 builds very slowly Ricardo Wurmus
  2022-02-23 16:33 ` Ricardo Wurmus
@ 2022-02-24  2:23 ` Maxim Cournoyer
  1 sibling, 0 replies; 5+ messages in thread
From: Maxim Cournoyer @ 2022-02-24  2:23 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hi Ricardo,

Ricardo Wurmus <rekado@elephly.net> writes:

> Hi Guix,
>
> I had to manually run the build of llvm 11 on aarch64, because it would
> keep timing out:
>
>     time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
>
> After more than two days it finally built.  This seems a little
> excessive.  Towards the end of the build I saw a 1% point progress
> increase for every hour that passed.
>
> Is there something wrong with the build nodes, are we building llvm 11
> wrong, or is this just the way it is on aarch64 systems?

I'd ask in #llvm on libera.chat; hopefully someone there is used to
developing on aarch64 and would know.

Thanks,

Maxim



* Re: llvm on aarch64 builds very slowly
  2022-02-23 17:49   ` Christopher Baines
@ 2022-02-26 16:54     ` Kaelyn
  0 siblings, 0 replies; 5+ messages in thread
From: Kaelyn @ 2022-02-26 16:54 UTC (permalink / raw)
  To: Christopher Baines; +Cc: Ricardo Wurmus, guix-devel

On Wednesday, February 23rd, 2022 at 9:49 AM, Christopher Baines <mail@cbaines.net> wrote:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
> > Ricardo Wurmus <rekado@elephly.net> writes:
> >
> > > Hi Guix,
> > >
> > > I had to manually run the build of llvm 11 on aarch64, because it would
> > > keep timing out:
> > >
> > >     time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
> > >
> > > After more than two days it finally built.  This seems a little
> > > excessive.  Towards the end of the build I saw a 1% point progress
> > > increase for every hour that passed.
> > >
> > > Is there something wrong with the build nodes, are we building llvm 11
> > > wrong, or is this just the way it is on aarch64 systems?
> >
> > I now see that gfortran 10 also takes a very long time to build.  It’s
> > on kreuzberg (10.0.0.9) and I see that out of the 16 cores only *one* is
> > really busy.  Other cores sometimes come in with a tiny bit of work, but
> > you might miss it if you blink.
> >
> > Guix ran “make -j 16” at the top level, but the other make processes
> > that have been spawned as children do not have “-j 16”.  There are
> > probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
> > at 100% while the others are at 0.
> >
> > What’s up with that?
>
> Regarding the llvm derivation you mentioned [1], it looks like for
> bordeaux.guix.gnu.org the build completed in around a couple of hours,
> though that was on the 4-core Overdrive machine.
>
> 1: https://data.guix.gnu.org/gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv
>
> On the subject of the HoneyComb machines, I haven't noticed anything
> like you describe with the one (hatysa) running behind
> bordeaux.guix.gnu.org. Most cores are fully occupied most of the time,
> with the 15m load average sitting around 16.
>
> Some things to check, though: what does the load average look like when
> you think the system should be using all its cores? If it's high but
> there's not much CPU utilisation, that suggests there's a bottleneck
> somewhere else.
>
> Also, what does the memory and swap usage look like? Hatysa has 32GB of
> memory and swap; ideally it would have 64GB, since that would mean it
> swaps less often.

One thing I remember about building LLVM a number of years ago, when I
was working on it through my job (though only for x86-64, not aarch64),
is that the build is very memory intensive.  In particular, linking the
various binaries was quite slow and consumed a lot of memory, causing
intense swapping with less than 64GB of memory in a parallel build (and
sometimes eventually triggering the OOM killer).  As I recall, using
ld.bfd for the build was by far the slowest, ld.gold was noticeably
better, and ld.lld was showing promise for doing better than ld.gold.
Just my $0.02 of past experience, in case it helps to understand the
slow aarch64 build of LLVM 11.
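
For illustration only, the kind of knobs that helped back then were
limiting the number of concurrent link jobs and picking a faster linker.
These are upstream LLVM CMake options, not flags I know the Guix llvm
package to be passing:

    # assumes an LLVM source checkout with a build directory next to llvm/;
    # LLVM_PARALLEL_LINK_JOBS is honoured by the Ninja generator
    cmake -G Ninja ../llvm \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_PARALLEL_LINK_JOBS=2 \
      -DLLVM_USE_LINKER=gold
    ninja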

Cheers,
Kaelyn

> One problem I have observed with hatysa is storage
> instability/performance issues. Looking in /var/log/messages, I see
> things like the following. Maybe check /var/log/messages for anything
> similar?
>
>   nvme nvme0: I/O 0 QID 6 timeout, aborting
>   nvme nvme0: I/O 1 QID 6 timeout, aborting
>   nvme nvme0: I/O 2 QID 6 timeout, aborting
>   nvme nvme0: I/O 3 QID 6 timeout, aborting
>   nvme nvme0: Abort status: 0x0
>   nvme nvme0: Abort status: 0x0
>   nvme nvme0: Abort status: 0x0
>   nvme nvme0: Abort status: 0x0
>
> Lastly, I'm not quite sure what thermal problems look like on ARM, but
> maybe check the CPU temps. I see between 60 and 70 degrees as reported
> by the sensors command, though this is with a different CPU cooler.
>
> Chris



