I noticed that many Common Lisp or SBCL-related packages are failing to
build on the aarch64 platform on our build farm, due to the failure to
build SBCL:

From the log of <https://ci.guix.gnu.org/build/180326/details>:

------
//entering make-target-2.sh
//doing warm init - compilation phase
This is SBCL 2.1.0, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Initial page table:
Gen Boxed Code Raw LgBox LgCode LgRaw Pin    Alloc Waste    Trig  WP GCs Mem-age
  6   397  250   0     0      0     0   0 42335440 66352 2000000 647   0  0.0000
Total bytes allocated = 42335440
Dynamic-space-size bytes = 3221225472
COLD-INIT... (Length(TLFs)= 9736)
Disassembler: 72 printers, 0 prefilters, 4 labelers
CORRUPTION WARNING in SBCL pid 1774 tid 1774:
Memory fault at 0xfffffffffffffffa (pc=0x1002199f70)
The integrity of this image is possibly compromised.
Exiting.
Error opening /dev/tty: No such device or address
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
real 0m6.120s
user 0m5.958s
sys 0m0.137s
command "sh" "make.sh" "clisp" "--prefix=/gnu/store/j1ciw4dc8iskd5fdcw0s1ba08kkg7vx6-sbcl-2.1.0" "--dynamic-space-size=3072" "--with-sb-core-compression" "--with-sb-xref-for-internals" failed with status 1
------

It appears that SBCL can support this platform. However, until we make
it work, I plan to remove aarch64 from the "supported-systems" of sbcl,
to avoid attempting these builds.
Leo Famulari <leo@famulari.name> skribis:

> I noticed that many Common Lisp or SBCL-related packages are failing to
> build on the aarch64 platform on our build farm, due to the failure to
> build SBCL:
>
> From the log of <https://ci.guix.gnu.org/build/180326/details>:
>
> ------
> //entering make-target-2.sh
> //doing warm init - compilation phase
> This is SBCL 2.1.0, an implementation of ANSI Common Lisp.
> More information about SBCL is available at <http://www.sbcl.org/>.
>
> SBCL is free software, provided as is, with absolutely no warranty.
> It is mostly in the public domain; some portions are provided under
> BSD-style licenses. See the CREDITS and COPYING files in the
> distribution for more information.
> Initial page table:
> Gen Boxed Code Raw LgBox LgCode LgRaw Pin    Alloc Waste    Trig  WP GCs Mem-age
>   6   397  250   0     0      0     0   0 42335440 66352 2000000 647   0  0.0000
> Total bytes allocated = 42335440
> Dynamic-space-size bytes = 3221225472
> COLD-INIT... (Length(TLFs)= 9736)
> Disassembler: 72 printers, 0 prefilters, 4 labelers
> CORRUPTION WARNING in SBCL pid 1774 tid 1774:
> Memory fault at 0xfffffffffffffffa (pc=0x1002199f70)
> The integrity of this image is possibly compromised.
> Exiting.
> Error opening /dev/tty: No such device or address
> Welcome to LDB, a low-level debugger for the Lisp runtime environment.
> ldb>
> real 0m6.120s
> user 0m5.958s
> sys 0m0.137s
> command "sh" "make.sh" "clisp" "--prefix=/gnu/store/j1ciw4dc8iskd5fdcw0s1ba08kkg7vx6-sbcl-2.1.0" "--dynamic-space-size=3072" "--with-sb-core-compression" "--with-sb-xref-for-internals" failed with status 1
> ------
>
> It appears that SBCL can support this platform. However, until we make
> it work, I plan to remove aarch64 from the "supported-systems" of sbcl,
> to avoid attempting these builds.
I tried to bootstrap sbcl using ecl instead of clisp, using
"guix build -s aarch64-linux sbcl" on an x86-64 machine because I don't
have any arm64 hardware, but it failed with the same memory fault.
On Wed, Jan 13, 2021 at 10:03:47PM +0100, Guillaume Le Vaillant wrote:
> I tried to bootstrap sbcl using ecl instead of clisp, using
> "guix build -s aarch64-linux sbcl" on an x86-64 machine because I don't
> have any arm64 hardware, but it failed with the same memory fault.
Thanks! On #guix, Efraim reported that the builds were succeeding on his
aarch64 hardware. So, I'm not sure what's going on :/
Leo Famulari <leo@famulari.name> skribis:

> On Wed, Jan 13, 2021 at 10:03:47PM +0100, Guillaume Le Vaillant wrote:
>> I tried to bootstrap sbcl using ecl instead of clisp, using
>> "guix build -s aarch64-linux sbcl" on an x86-64 machine because I don't
>> have any arm64 hardware, but it failed with the same memory fault.
>
> Thanks! On #guix, Efraim reported that the builds were succeeding on his
> aarch64 hardware. So, I'm not sure what's going on :/

When taking a look at the logs of failing builds of sbcl-* packages on
aarch64-linux (for example at [1] for master or at [2] for staging),
I saw that the build jobs try to build sbcl (which is currently failing)
for every package.

I would have expected the builds for sbcl-* packages to be marked as
"failed because missing dependency" given that the main dependency of
the asdf-build-system/sbcl failed to build. Instead the build farm is
trying to rebuild sbcl over and over, which wastes quite some time and
resources.

It looks like the dependencies of the build system are not considered as
dependencies for the packages that use this build system.
Am I missing something?

[1] https://ci.guix.gnu.org/eval/31355?status=failed
[2] https://ci.guix.gnu.org/eval/20644?status=failed
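The behaviour Guillaume expected, where dependents of a failed build are marked instead of retried, can be sketched roughly as follows. This is only an illustration in Python, not Cuirass code: the function name, the graph layout, and the package name `sbcl-example-lib` are hypothetical.

```python
from collections import deque


def mark_dependency_failures(dependents, failed):
    """Return every derivation that transitively depends on a failed one.

    `dependents` maps a derivation to the derivations that use it.
    """
    blocked = set()
    queue = deque(failed)
    while queue:
        drv = queue.popleft()
        for user in dependents.get(drv, ()):
            if user not in blocked and user not in failed:
                blocked.add(user)
                queue.append(user)
    return blocked


# Hypothetical graph: sbcl is used by two packages, one of which
# is also used by the other.
dependents = {
    "sbcl": ["sbcl-cambl", "sbcl-example-lib"],
    "sbcl-example-lib": ["sbcl-cambl"],
}
blocked = mark_dependency_failures(dependents, {"sbcl"})
print(sorted(blocked))  # ['sbcl-cambl', 'sbcl-example-lib']
```

With something like this, a single sbcl failure would flag both dependents as "failed because of a dependency" without sbcl being attempted again for each of them.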
On Sat, Jan 16, 2021 at 10:42:58AM +0100, Guillaume Le Vaillant wrote:
> When taking a look at the logs of failing builds of sbcl-* packages on
> aarch64-linux (for example at [1] for master or at [2] for staging),
> I saw that the build jobs try to build sbcl (which is currently failing)
> for every package.
>
> I would have expected the builds for sbcl-* packages to be marked as
> "failed because missing dependency" given that the main dependency of
> the asdf-build-system/sbcl failed to build. Instead the build farm is
> trying to rebuild sbcl over and over, which wastes quite some time and
> resources.
>
> It looks like the dependencies of the build system are not considered as
> dependencies for the packages that use this build system.
> Am I missing something?

That's a good observation. I hadn't thought of it.

I'm CC-ing Mathieu Othacehe and guix-sysadmin so that we can disable
these builds until we can fix the bug for real. Mathieu: this might
explain why the build farm is spending all its effort on aarch64.

By the way, SBCL can be built for aarch64 according to Efraim:

http://logs.guix.gnu.org/guix/2021-01-13.log#105652

So, the original point of this bug is obviated, and I'm re-titling it to
address your observation, Guillaume.
Hello Leo & Guillaume,
> That's a good observation. I hadn't thought of it.
>
> I'm CC-ing Mathieu Othacehe and guix-sysadmin so that we can disable
> these builds until we can fix the bug for real. Mathieu: this might
> explain why the build farm is spending all its effort on aarch64.
If we want to disable SBCL builds temporarily we can do something
similar to what I did to disable Rust builds on non-x86_64 architectures
here: 0ed631866cc0b7cece2b0a0b50e39b37ae91bb67.
Regarding the rebuilding, that is a limitation of the new Cuirass remote
building mechanism that I am aware of and need to solve. As build
failures are only cached on the machine performing the build and the
builds are now distributed to all the machines across the build farm, we
need to find a way to centralize the build failures cache.
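To make the idea of a centralized failure cache concrete, here is a minimal sketch in Python. It is not how Cuirass or guix-daemon implement failure caching; `FailureCache`, `run_build`, and the worker names are hypothetical, and the `build` callback stands in for an actual remote build.

```python
class FailureCache:
    """Coordinator-side record of derivations known to fail."""

    def __init__(self):
        self._failed = set()

    def record(self, drv):
        self._failed.add(drv)

    def is_known_failure(self, drv):
        return drv in self._failed


def run_build(worker, drv, cache, build):
    """Consult the shared cache before asking a worker to build drv."""
    if cache.is_known_failure(drv):
        return "cached-failure"
    if build(worker, drv):
        return "succeeded"
    cache.record(drv)
    return "failed"


attempts = []


def always_fails(worker, drv):
    # Stand-in for a real build; records the attempt and fails.
    attempts.append((worker, drv))
    return False


cache = FailureCache()
first = run_build("worker-1", "sbcl.drv", cache, always_fails)
second = run_build("worker-2", "sbcl.drv", cache, always_fails)
print(first, second, len(attempts))  # failed cached-failure 1
```

Because the cache lives with the coordinator rather than on each worker, the second worker never attempts the build that already failed on the first one.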
It would also be nice to optionally publish the build failures cache so
that the user doesn't try to build a package that is known to be failing
on the build farm. Chris, have you encountered this issue with the Build
Coordinator?
Thanks,
Mathieu
Mathieu Othacehe <othacehe@gnu.org> writes:

> Hello Leo & Guillaume,
>
>> That's a good observation. I hadn't thought of it.
>>
>> I'm CC-ing Mathieu Othacehe and guix-sysadmin so that we can disable
>> these builds until we can fix the bug for real. Mathieu: this might
>> explain why the build farm is spending all its effort on aarch64.
>
> If we want to disable SBCL builds temporarily we can do something
> similar to what I did to disable Rust builds on non-x86_64 architectures
> here: 0ed631866cc0b7cece2b0a0b50e39b37ae91bb67.
>
> Regarding the rebuilding, that is a limitation of the new Cuirass remote
> building mechanism that I am aware of and need to solve. As build
> failures are only cached on the machine performing the build and the
> builds are now distributed to all the machines across the build farm, we
> need to find a way to centralize the build failures cache.
>
> It would also be nice to optionally publish the build failures cache so
> that the user doesn't try to build a package that is known to be failing
> on the build farm. Chris, have you encountered this issue with the Build
> Coordinator?

Not really. The first thing to note is that I'm currently running the
Guix Build Coordinator without the guix-daemon --cache-failures option.
In fact, it's probably unwise to enable it, as it would mean that rather
than some builds taking place, the guix-daemon could just return a
cached failure. I should probably mention this in the README.

The way this situation is dealt with in the Guix Build Coordinator is
simplified by the agents not attempting builds where the derivation
inputs aren't present. If an agent is unable to ensure all the inputs
are present, it just reports this to the coordinator.

The behaviour is configurable, but the default missing inputs hook will
submit a new build for a missing input, but only if one doesn't already
exist. Because of this, you don't get the behaviour where some missing
prerequisite that fails to build is built over and over again, every
time you try and build a derivation that uses it.
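The missing-inputs behaviour described above can be sketched as follows. This is a toy model in Python and not the Guix Build Coordinator's actual API; the class, the method names, and the `sbcl-a` to `sbcl-d` package names are all hypothetical.

```python
class Coordinator:
    """Toy coordinator that keeps one build record per derivation."""

    def __init__(self):
        self.builds = {}  # derivation name -> status

    def submit_build(self, drv):
        """Schedule drv unless a build for it already exists."""
        if drv in self.builds:
            return False
        self.builds[drv] = "scheduled"
        return True

    def missing_inputs_hook(self, missing):
        """Submit each missing input at most once; return the new ones."""
        return [drv for drv in missing if self.submit_build(drv)]


def agent_attempt(coordinator, drv, inputs, store):
    """Report missing inputs instead of attempting the build."""
    missing = [i for i in inputs if i not in store]
    if missing:
        coordinator.missing_inputs_hook(missing)
        return "missing-inputs"
    return "ready-to-build"


coordinator = Coordinator()
store = set()  # the agent has no sbcl output available
results = [agent_attempt(coordinator, pkg, ["sbcl"], store)
           for pkg in ["sbcl-a", "sbcl-b", "sbcl-c"]]
coordinator.builds["sbcl"] = "failed"  # the single sbcl build fails
agent_attempt(coordinator, "sbcl-d", ["sbcl"], store)
print(coordinator.builds)  # {'sbcl': 'failed'}
```

In this model the failed sbcl build is never resubmitted, because a build record for it already exists, which matches the "only if one doesn't already exist" condition.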
On Sun, Jan 17, 2021 at 10:29:49AM +0100, Mathieu Othacehe wrote:
> If we want to disable SBCL builds temporarily we can do something
> similar to what I did to disable Rust builds on non-x86_64 architectures
> here: 0ed631866cc0b7cece2b0a0b50e39b37ae91bb67.

------
diff --git a/gnu/packages/rust.scm b/gnu/packages/rust.scm
index 35a96b5754..91b5d6b6ec 100644
--- a/gnu/packages/rust.scm
+++ b/gnu/packages/rust.scm
@@ -452,6 +452,7 @@ test = { path = \"../libtest\" }
(variable "LIBRARY_PATH")
(files '("lib" "lib64")))))

+ (supported-systems '("x86_64-linux"))
(synopsis "Compiler for the Rust programming language")
(description "Rust is a systems programming language that provides memory
safety and thread safety guarantees.")
------

Hm, this would also prevent users from building the packages on their
own machines. We need a way to make changes like this but limit them to
the build farm.
On Sun, Jan 17, 2021 at 02:11:11PM -0500, Leo Famulari wrote:
> On Sun, Jan 17, 2021 at 10:29:49AM +0100, Mathieu Othacehe wrote:
> > If we want to disable SBCL builds temporarily we can do something
> > similar to what I did to disable Rust builds on non-x86_64 architectures
> > here: 0ed631866cc0b7cece2b0a0b50e39b37ae91bb67.
>
> ------
> diff --git a/gnu/packages/rust.scm b/gnu/packages/rust.scm
> index 35a96b5754..91b5d6b6ec 100644
> --- a/gnu/packages/rust.scm
> +++ b/gnu/packages/rust.scm
> @@ -452,6 +452,7 @@ test = { path = \"../libtest\" }
> (variable "LIBRARY_PATH")
> (files '("lib" "lib64")))))
>
> + (supported-systems '("x86_64-linux"))
> (synopsis "Compiler for the Rust programming language")
> (description "Rust is a systems programming language that provides memory
> safety and thread safety guarantees.")
> ------
>
> Hm, this would also prevent users from building the packages on their
> own machines. We need a way to make changes like this but limit them to
> the build farm.

Would marking it unsubstitutable work? Or would we need build-local

-- 
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
Hey Chris,
> The behaviour is configurable, but the default missing inputs hook will
> submit a new build for a missing input, but only if one doesn't already
> exist. Because of this, you don't get the behaviour where some missing
> prerequisite that fails to build is built over and over again, every
> time you try and build a derivation that uses it.
Oh, makes sense, thanks for explaining. I also had a look at Hydra on
that matter. They are breaking a "Build" into "BuildSteps" corresponding
more or less to the build dependencies, or "Inputs" in the
Coordinator. This way they make sure not to submit the same build to the
workers multiple times.
I guess it also means that caching build failures is no longer needed.
I think it would be really nice to port this mechanism to Cuirass.
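As a rough illustration of breaking builds into deduplicated steps, the sketch below flattens a set of requested builds into an ordered list in which each derivation appears once, dependencies first. It is written in Python and is not Hydra's implementation; `plan_steps` and the small dependency graph are hypothetical, and the graph is assumed to be acyclic.

```python
def plan_steps(roots, inputs):
    """Return each needed derivation once, dependencies before dependents.

    `inputs` maps a derivation to the derivations it needs.
    """
    order = []
    seen = set()

    def visit(drv):
        if drv in seen:
            return
        seen.add(drv)
        for dep in inputs.get(drv, ()):
            visit(dep)
        order.append(drv)

    for root in roots:
        visit(root)
    return order


# Hypothetical graph: two packages share sbcl, which needs clisp.
inputs = {
    "sbcl-a": ["sbcl"],
    "sbcl-b": ["sbcl"],
    "sbcl": ["clisp"],
}
steps = plan_steps(["sbcl-a", "sbcl-b"], inputs)
print(steps)  # ['clisp', 'sbcl', 'sbcl-a', 'sbcl-b']
```

Since sbcl shows up as a single step, a scheduler working from this list would submit it to the workers once, however many packages depend on it.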
Thanks,
Mathieu
Hi,
Efraim Flashner <efraim@flashner.co.il> skribis:
> On Sun, Jan 17, 2021 at 02:11:11PM -0500, Leo Famulari wrote:
>> On Sun, Jan 17, 2021 at 10:29:49AM +0100, Mathieu Othacehe wrote:
>> > If we want to disable SBCL builds temporarily we can do something
>> > similar to what I did to disable Rust builds on non-x86_64 architectures
>> > here: 0ed631866cc0b7cece2b0a0b50e39b37ae91bb67.
>>
>> ------
>> diff --git a/gnu/packages/rust.scm b/gnu/packages/rust.scm
>> index 35a96b5754..91b5d6b6ec 100644
>> --- a/gnu/packages/rust.scm
>> +++ b/gnu/packages/rust.scm
>> @@ -452,6 +452,7 @@ test = { path = \"../libtest\" }
>> (variable "LIBRARY_PATH")
>> (files '("lib" "lib64")))))
>>
>> + (supported-systems '("x86_64-linux"))
>> (synopsis "Compiler for the Rust programming language")
>> (description "Rust is a systems programming language that provides memory
>> safety and thread safety guarantees.")
>> ------
>>
>> Hm, this would also prevent users from building the packages on their
>> own machines. We need a way to make changes like this but limit them to
>> the build farm.
>
> Would marking it unsubstitutable work? Or would we need build-local
If you mark it as #:substitutable? #f, then CI won’t try to build it but
users can still build it.
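The difference between the two approaches discussed here can be modelled with a small Python sketch. It only encodes the behaviour as described in this thread and is not Guix code: the `Package` class, its fields, and the two predicates are hypothetical stand-ins for the `supported-systems` field and the `#:substitutable?` argument.

```python
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    supported_systems: tuple = ("x86_64-linux", "aarch64-linux")
    substitutable: bool = True


def user_can_build(pkg, system):
    """Users are only limited by the list of supported systems."""
    return system in pkg.supported_systems


def ci_builds(pkg, system):
    """In this model CI also skips packages marked non-substitutable."""
    return system in pkg.supported_systems and pkg.substitutable


restricted = Package("sbcl", supported_systems=("x86_64-linux",))
unsubstitutable = Package("sbcl", substitutable=False)

for pkg in (restricted, unsubstitutable):
    print(user_can_build(pkg, "aarch64-linux"), ci_builds(pkg, "aarch64-linux"))
# False False
# True False
```

Note that, under this model, the non-substitutable variant is skipped by CI on every system and not only on aarch64.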
Now, I think we should avoid papering over CI configuration issues (did
I get that right?) by changing package definitions.
HTH!
Ludo’.
On Tue, Jan 19, 2021 at 02:13:58PM +0100, Ludovic Courtès wrote:
> Now, I think we should avoid papering over CI configuration issues (did
> I get that right?) by changing package definitions.
Yes, that's idiomatic English, if that is what you were asking about.
And I agree, the package definitions shouldn't include workarounds for
CI problems.
Leo Famulari <leo@famulari.name> skribis:
> On Tue, Jan 19, 2021 at 02:13:58PM +0100, Ludovic Courtès wrote:
>> Now, I think we should avoid papering over CI configuration issues (did
>> I get that right?) by changing package definitions.
>
> Yes, that's idiomatic English, if that is what you were asking about.
>
> And I agree, the package definitions shouldn't include workarounds for
> CI problems.
This time I wasn’t asking about English :-), but rather about whether
the patch Efraim provided was indeed to paper over our CI problems.
Ludo’.
On Mon, Feb 01, 2021 at 09:56:05PM +0100, Ludovic Courtès wrote:
> This time I wasn’t asking about English :-), but rather about whether
> the patch Efraim provided was indeed to paper over our CI problems.

Well, that patch does indeed "paper over" the CI problems.

Christopher Baines has graciously given me access to his Overdrive 1000
(monokuma), so I am able to test building packages on bare-metal
aarch64.

I'm testing building sbcl-cambl on it now. That is the package that
spurred this bug report.

Unfortunately, the aarch64 emulation we are using on berlin does cause a
lot of failures that can't be reproduced on real hardware. We are using
QEMU, right? If so, there is a new major QEMU update available on
guix-patches, and maybe that would fix some problems:

https://bugs.gnu.org/45014

As I mentioned in that patch submission, there are some minor issues
with the update, but maybe we should just go for it.
On Mon, Feb 01, 2021 at 04:13:28PM -0500, Leo Famulari wrote:
> I'm testing building sbcl-cambl on it now. That is the package that
> spurred this bug report.
This build [0] succeeds on the overdrive.
So, we should prioritize getting the overdrives back into the build
farm, and we can also try the newer QEMU.
[0] /gnu/store/spjz8fh5kx6fh4qx69d4f21r9km77v59-sbcl-cambl-4.0.0-1.7016d1a.drv