From: Kaelyn <kaelyn.alexi@protonmail.com>
To: Christopher Baines <mail@cbaines.net>
Cc: Ricardo Wurmus <rekado@elephly.net>, guix-devel@gnu.org
Subject: Re: llvm on aarch64 builds very slowly
Date: Sat, 26 Feb 2022 16:54:02 +0000
Message-ID: <q9qTt4nwDNzpPmkYmOkHp805og8qkTZrlo3oSnsmFkrG_jYbNk_jcI18LRNxX_boWoIsZqL3KrrCwnkqkxPtPuhdCkzPO_cZHfy0z3-XunI=@protonmail.com>
In-Reply-To: <87wnhltq4k.fsf@cbaines.net>

On Wednesday, February 23rd, 2022 at 9:49 AM, Christopher Baines <mail@cbaines.net> wrote:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
> > Ricardo Wurmus <rekado@elephly.net> writes:
> >
> > > Hi Guix,
> > >
> > > I had to manually run the build of llvm 11 on aarch64, because it would
> > > keep timing out:
> > >
> > > time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
> > >
> > > After more than two days it finally built. This seems a little
> > > excessive. Towards the end of the build I saw a 1% point progress
> > > increase for every hour that passed.
> > >
> > > Is there something wrong with the build nodes, are we building llvm 11
> > > wrong, or is this just the way it is on aarch64 systems?
> >
> > I now see that gfortran 10 also takes a very long time to build. It’s
> > on kreuzberg (10.0.0.9) and I see that out of the 16 cores only one is
> > really busy. Other cores sometimes come in with a tiny bit of work, but
> > you might miss it if you blink.
> >
> > Guix ran “make -j 16” at the top level, but the other make processes
> > that have been spawned as children do not have “-j 16”. There are
> > probably 16 or so invocations of cc1plus, but only CPU0 seems to be busy
> > at 100% while the others are at 0.
> >
> > What’s up with that?
>
> Regarding the llvm derivation you mentioned [1], it looks like for
> bordeaux.guix.gnu.org the build completed in around a couple of hours,
> though this was on the 4-core Overdrive machine.
>
> 1: https://data.guix.gnu.org/gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv
>
> On the subject of the HoneyComb machines, I haven't noticed anything
> like you describe with the one (hatysa) running behind
> bordeaux.guix.gnu.org. Most cores are fully occupied most of the time,
> with the 15m load average sitting around 16.
>
> Some things to check, though: what does the load average look like when
> you think the system should be using all its cores? If it's high but
> there's not much CPU utilisation, that suggests there's a bottleneck
> somewhere else.
>
> Also, what does the memory and swap usage look like? Hatysa has 32GB of
> memory and swap; ideally it would have 64GB, since that would make
> swapping less frequent.

One thing I remember from building LLVM a number of years ago for work (though only for x86-64, not aarch64) is that the build is very memory intensive. In particular, linking each of the various binaries would be quite slow and consume a lot of memory; in a parallel build with less than 64GB of memory this caused intense swapping, and sometimes eventually triggered the OOM killer. As I recall, using ld.bfd for the build was by far the slowest, ld.gold was noticeably better, and ld.lld was showing promise of doing better still. Just my $0.02 of past experience, in case it helps explain the slow LLVM 11 build on aarch64.
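
If memory pressure during linking is indeed the bottleneck here, one rough and untested idea would be a package variant that asks LLVM's CMake build to link with ld.gold and to cap the number of parallel link jobs. The sketch below is only from memory: LLVM_USE_LINKER and LLVM_PARALLEL_LINK_JOBS are upstream LLVM CMake options as I recall them, the value 2 is an arbitrary guess, and the quoting may need adjusting to match the current llvm-11 definition.

;; Untested sketch, e.g. saved as llvm-gold.scm and built with
;; "guix build -f llvm-gold.scm".
(use-modules (guix packages)
             (guix utils)
             (gnu packages llvm))

(define llvm-11/gold-links
  (package
    (inherit llvm-11)
    (arguments
     (substitute-keyword-arguments (package-arguments llvm-11)
       ;; Ask CMake to link with ld.gold and to run at most two link
       ;; jobs at a time, trading build time for lower peak memory use.
       ((#:configure-flags flags ''())
        `(append '("-DLLVM_USE_LINKER=gold"
                   "-DLLVM_PARALLEL_LINK_JOBS=2")
                 ,flags))))))

;; Return the package so "guix build -f" picks it up.
llvm-11/gold-links

One open question with this is whether ld.gold is actually visible in the build environment; it may need something like binutils-gold added to the native inputs, which I have left out here.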

Cheers,
Kaelyn

>
> One problem I have observed with hatysa is storage
> instability/performance issues. Looking in /var/log/messages, I see
> things like the following. Maybe check /var/log/messages for anything
> similar?
>
> nvme nvme0: I/O 0 QID 6 timeout, aborting
> nvme nvme0: I/O 1 QID 6 timeout, aborting
> nvme nvme0: I/O 2 QID 6 timeout, aborting
> nvme nvme0: I/O 3 QID 6 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
>
> Lastly, I'm not quite sure what thermal problems look like on ARM, but
> maybe check the CPU temps. I see between 60 and 70 degrees as reported
> by the sensors command, though this is with a different CPU cooler.
>
> Chris


Thread overview: 5+ messages
2022-02-22 23:22 llvm on aarch64 builds very slowly Ricardo Wurmus
2022-02-23 16:33 ` Ricardo Wurmus
2022-02-23 17:49   ` Christopher Baines
2022-02-26 16:54     ` Kaelyn [this message]
2022-02-24  2:23 ` Maxim Cournoyer
