unofficial mirror of bug-guix@gnu.org 
* bug#55848: [cuirass] workers stalled
@ 2022-06-08 15:31 Mathieu Othacehe
  2022-06-08 19:07 ` Greg Hogan
  2022-06-19  2:07 ` Tom Fitzhenry
  0 siblings, 2 replies; 11+ messages in thread
From: Mathieu Othacehe @ 2022-06-08 15:31 UTC
  To: 55848


Hello,

The aarch64 workers were all idle whereas 70k builds were
available. Once restarted, they started building again.

The problem might be that when the server is unavailable for a while the
worker connections expire and cannot be resumed once the server is
available again.
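
If the hypothesis holds, the behaviour one would want from a worker is to keep retrying the connection with a growing delay instead of giving up. The following is only a generic sketch of such a loop, not Cuirass code: `retry_with_backoff` and the command it wraps are hypothetical names chosen for illustration.

```shell
# Run a command until it succeeds, doubling the wait between attempts.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 once MAX_ATTEMPTS have failed.
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))        # exponential backoff
        attempt=$((attempt + 1))
    done
}
```

A caller would wrap whatever command re-establishes the connection, for example `retry_with_backoff 8 5 reconnect_to_server`, where `reconnect_to_server` is a placeholder.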

Thanks,

Mathieu





* bug#55848: [cuirass] workers stalled
  2022-06-08 15:31 bug#55848: [cuirass] workers stalled Mathieu Othacehe
@ 2022-06-08 19:07 ` Greg Hogan
  2022-06-11 10:44   ` Tom Fitzhenry
  2022-06-19  2:07 ` Tom Fitzhenry
  1 sibling, 1 reply; 11+ messages in thread
From: Greg Hogan @ 2022-06-08 19:07 UTC
  To: Mathieu Othacehe; +Cc: 55848

On Wed, Jun 8, 2022 at 11:32 AM Mathieu Othacehe <othacehe@gnu.org> wrote:
>
>
> Hello,
>
> The aarch64 workers were all idle whereas 70k builds were
> available. Once restarted, they started building again.
>
> The problem might be that when the server is unavailable for a while the
> worker connections expire and cannot be resumed once the server is
> available again.
>
> Thanks,
>
> Mathieu

The recent aarch64 builds look to all be failing with the following message.

===== <cut> =====
substitute:
substitute: updating substitutes from 'https://ci.guix.gnu.org'...
 0.0%guix substitute: error: TLS error in procedure 'handshake': Error
in the pull function.
===== </cut> =====





* bug#55848: [cuirass] workers stalled
  2022-06-08 19:07 ` Greg Hogan
@ 2022-06-11 10:44   ` Tom Fitzhenry
  2022-06-12 13:33     ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Fitzhenry @ 2022-06-11 10:44 UTC
  To: Greg Hogan; +Cc: Mathieu Othacehe, 55848

Greg Hogan <code@greghogan.com> writes:

> On Wed, Jun 8, 2022 at 11:32 AM Mathieu Othacehe <othacehe@gnu.org> wrote:
>> The aarch64 workers were all idle whereas 70k builds were
>> available. Once restarted, they started building again.

From following the builds on http://ci.guix.gnu.org/workers , many
(all?) builds are failing on the following workers:

* grunewald
* kreuzberg
* pankow

The builds are failing with the same error:

"substitute: updating substitutes from 'https://ci.guix.gnu.org'...
0.0%guix substitute: error: TLS error in procedure 'handshake': Error in
the pull function."

Here are some examples:
* http://ci.guix.gnu.org/build/998403/details
* http://ci.guix.gnu.org/build/978678/details
* http://ci.guix.gnu.org/build/978243/details


On worker overdrive1, in the raw log of
http://ci.guix.gnu.org/build/875908/details we can see this
rust-async-mutex build managing to pull substitutes, but it 
seems to be compiling rust-1.57 itself.





* bug#55848: [cuirass] workers stalled
  2022-06-11 10:44   ` Tom Fitzhenry
@ 2022-06-12 13:33     ` Ludovic Courtès
  2022-06-12 16:10       ` Ricardo Wurmus
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2022-06-12 13:33 UTC
  To: Tom Fitzhenry; +Cc: Mathieu Othacehe, 55848, Greg Hogan, guix-sysadmin

Hi,

(+Cc: guix-sysadmin)

Tom Fitzhenry <tom@tom-fitzhenry.me.uk> skribis:

> From following the builds on http://ci.guix.gnu.org/workers , many
> (all?) builds are failing on the following workers:
>
> * grunewald
> * kreuzberg
> * pankow
>
> The builds are failing with the same error:
>
> "substitute: updating substitutes from 'https://ci.guix.gnu.org'...
> 0.0%guix substitute: error: TLS error in procedure 'handshake': Error in
> the pull function."

On these machines, https://ci.guix.gnu.org (among others) is unavailable
for some reason (firewall I guess):

--8<---------------cut here---------------start------------->8---
ludo@grunewald ~$ wget --debug -O/dev/null https://ci.guix.gnu.org
Setting --output-document (outputdocument) to /dev/null
DEBUG output created by Wget 1.21.1 on linux-gnu.

Reading HSTS entries from /home/ludo/.wget-hsts
URI encoding = ‘UTF-8’
--2022-06-11 22:38:59--  https://ci.guix.gnu.org/
Certificates loaded: 444
Resolving ci.guix.gnu.org (ci.guix.gnu.org)... 141.80.181.40
Caching ci.guix.gnu.org => 141.80.181.40
Connecting to ci.guix.gnu.org (ci.guix.gnu.org)|141.80.181.40|:443... connected.
Created socket 4.
Releasing 0x000000001fd26b50 (new refcount 1).

[Sits there forever…]
--8<---------------cut here---------------end--------------->8---
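
Because the request hangs after the TCP connection is established, a probe of this kind is easier to script with a hard deadline around it. This is a minimal sketch, assuming GNU coreutils `timeout` and `wget` are installed; the function names and the 15-second default are arbitrary choices, not anything used on the build machines.

```shell
# Probe an HTTPS URL with a hard deadline, so that a stalled TLS
# handshake cannot block the caller forever.
# Usage: probe_https URL [DEADLINE_SECONDS]
probe_https() {
    url=$1
    deadline=${2:-15}   # arbitrary default, in seconds
    timeout "$deadline" wget --quiet --tries=1 --output-document=/dev/null "$url"
}

# Turn the exit status of probe_https into a label.
# coreutils timeout exits with status 124 when the deadline is hit.
describe_probe() {
    case $1 in
        0)   echo "reachable" ;;
        124) echo "timed out" ;;
        *)   echo "failed" ;;
    esac
}
```

For instance, `probe_https https://ci.guix.gnu.org 15; describe_probe $?` would report a hang like the one above as "timed out" rather than sitting there.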

These machines are configured using ‘honeycomb-system’ from (sysadmin
honeycomb) in maintenance.git.

guix-daemon is configured to use the default substitute URLs,
https://ci.guix.gnu.org and https://bordeaux.guix.gnu.org, which we know
are unreachable.

I’ve theoretically addressed this here:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=99bd9dc9001d6bea7480a7ce0e0e10ff78adb787
  https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=b0661cc7d6dd74b0aeac3b052a80a8a2fef2af9c

I tried to reconfigure those boxes with ‘guix deploy’, but this is
currently on hold because ci.guix has run out of inodes…

To be continued!

Ludo’.





* bug#55848: [cuirass] workers stalled
  2022-06-12 13:33     ` Ludovic Courtès
@ 2022-06-12 16:10       ` Ricardo Wurmus
  2022-06-12 20:22         ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Ricardo Wurmus @ 2022-06-12 16:10 UTC
  To: Ludovic Courtès
  Cc: Mathieu Othacehe, guix-sysadmin, 55848, Greg Hogan, Tom Fitzhenry


Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> (+Cc: guix-sysadmin)
>
> Tom Fitzhenry <tom@tom-fitzhenry.me.uk> skribis:
>
>> From following the builds on http://ci.guix.gnu.org/workers , many
>> (all?) builds are failing on the following workers:
>>
>> * grunewald
>> * kreuzberg
>> * pankow
>>
>> The builds are failing with the same error:
>>
>> "substitute: updating substitutes from 'https://ci.guix.gnu.org'...
>> 0.0%guix substitute: error: TLS error in procedure 'handshake': Error in
>> the pull function."
>
> On these machines, https://ci.guix.gnu.org (among others) is unavailable
> for some reason (firewall I guess):

They should be using the local IP instead of routing through the
internet, so /etc/hosts should contain an entry for

141.80.167.131 ci.guix.gnu.org

(We have the same entry on the other build nodes hosted at the MDC.)
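
Whether a node already carries that entry can be checked with a small lookup over the hosts file. This is only a sketch: `hosts_lookup` and `hosts_points_to` are hypothetical helper names, and the parsing assumes the usual "address name [aliases...]" layout with `#` comments.

```shell
# Print the address that a hosts file maps NAME to (first match wins).
# Usage: hosts_lookup NAME [HOSTS_FILE]
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }           # skip comment-only lines
        {
            sub(/#.*/, "")                  # drop trailing comments
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}

# Succeed only if NAME resolves to ADDRESS according to the hosts file.
# Usage: hosts_points_to NAME ADDRESS [HOSTS_FILE]
hosts_points_to() {
    [ "$(hosts_lookup "$1" "${3:-/etc/hosts}")" = "$2" ]
}
```

On a correctly configured node, `hosts_points_to ci.guix.gnu.org 141.80.167.131` would succeed.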

“guix deploy” did not work on these nodes due to a serious problem: they
were given *some* x86_64 binaries to execute, so deployed systems were
unbootable.  Since we don’t have a serial interface through which you
could debug this remotely, please make sure not to deploy a broken
system.  I’d like to avoid trips to the data centre.
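
One way to catch a wrong-architecture binary before deploying is to read the machine field of its ELF header. The sketch below is an illustration, not part of any existing deployment tooling: `elf_machine` is a hypothetical helper, and it assumes little-endian ELF files, which holds for both x86_64-linux and aarch64-linux.

```shell
# Print the target architecture recorded in an ELF file's header.
# The e_machine field is the 2-byte value at offset 18:
# 0x3e is x86-64 and 0xb7 is AArch64 (bytes are swapped on disk
# because the files are little-endian).
# Usage: elf_machine FILE
elf_machine() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    case $(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n') in
        3e00) echo "x86_64" ;;
        b700) echo "aarch64" ;;
        *)    echo "other" ;;
    esac
}
```

Running it over the binaries of a system about to be deployed to an aarch64 node and flagging anything that prints `x86_64` would catch the problem described above.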

-- 
Ricardo





* bug#55848: [cuirass] workers stalled
  2022-06-12 16:10       ` Ricardo Wurmus
@ 2022-06-12 20:22         ` Ludovic Courtès
  0 siblings, 0 replies; 11+ messages in thread
From: Ludovic Courtès @ 2022-06-12 20:22 UTC
  To: Ricardo Wurmus
  Cc: Mathieu Othacehe, guix-sysadmin, 55848, Greg Hogan, Tom Fitzhenry

Ricardo Wurmus <rekado@elephly.net> skribis:

> They should be using the local IP instead of routing through the
> internet, so /etc/hosts should contain an entry for
>
> 141.80.167.131 ci.guix.gnu.org

Good idea.

> “guix deploy” did not work on these nodes due to a serious problem: they
> were given *some* x86_64 binaries to execute, so deployed systems were
> unbootable.  Since we don’t have a serial interface through which you
> could debug this remotely, please make sure not to deploy a broken
> system.  I’d like to avoid trips to the data centre.

Ooooh right, thanks for the reminder!

Ludo’.





* bug#55848: [cuirass] workers stalled
  2022-06-08 15:31 bug#55848: [cuirass] workers stalled Mathieu Othacehe
  2022-06-08 19:07 ` Greg Hogan
@ 2022-06-19  2:07 ` Tom Fitzhenry
  2022-06-20  2:39   ` Maxim Cournoyer
  1 sibling, 1 reply; 11+ messages in thread
From: Tom Fitzhenry @ 2022-06-19  2:07 UTC
  To: Mathieu Othacehe; +Cc: 55848

Mathieu Othacehe <othacehe@gnu.org> writes:

Substitutes for aarch64 are a lot healthier now. Thanks Ludovic!

* kreuzberg is now successfully building and has been for a while.
* ci.guix.gnu.org has 41% of substitutes (a low percentage, but likely a
  high percentage of toolchains). 0 jobs are queued, presumably because Cuirass
  believes it's up to date. This should increase over time, as packages
  are updated.
* bordeaux has 83.8% of substitutes.

A few issues remain for aarch64:

* grunewald and kreuzberg are not on <https://ci.guix.gnu.org/workers>.
  Perhaps they were taken down while the substitute ratio was low to
  avoid each worker independently recompiling expensive toolchains?
* rust@1.39.0 (and thus all of Rust) is missing from ci and bordeaux. I
  had expected this would have been working. I'll take a look and raise
  a separate issue.

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix weather -s aarch64-linux -c2000
computing 15514 package derivations for aarch64-linux...
looking for 16265 store items on https://ci.guix.gnu.org...
https://ci.guix.gnu.org
  41.0% substitutes available (6668 out of 16265)
  at least 34188.1 MiB of nars (compressed)
  45362.5 MiB on disk (uncompressed)
  0.015 seconds per request (144.9 seconds in total)
  66.2 requests per second

  0.0% (0 out of 9597) of the missing items are queued
  at least 1000 queued builds
      aarch64-linux: 110 (11.0%)
      powerpc64le-linux: 890 (89.0%)
  build rate: 36.81 builds per hour
      aarch64-linux: 17.23 builds per hour
      x86_64-linux: 14.25 builds per hour
      powerpc64le-linux: 1.01 builds per hour
      i686-linux: 4.83 builds per hour
1871 packages are missing from 'https://ci.guix.gnu.org' for 'aarch64-linux', among which:
  3479	rust@1.39.0	/gnu/store/xxlgndidxvhdd391k35vcmviixq5d9b0-rust-1.39.0-cargo /gnu/store/cfy1p8q4bwwy1i01cjfssfry21kpljz3-rust-1.39.0 
  2111	cairomm@1.14.2	/gnu/store/bxknxn3nbmmvavf537k0pggrynhrgsaf-cairomm-1.14.2-doc /gnu/store/3sn66mgr29v73zpp93c2v09a0rj87l3w-cairomm-1.14.2 
  2101	texlive-latex-pgf@59745	/gnu/store/l6jr7v8ygn3ybj4gxcwskf8ifsjcj6x1-texlive-latex-pgf-59745 
looking for 16265 store items on https://bordeaux.guix.gnu.org...
https://bordeaux.guix.gnu.org
  83.8% substitutes available (13624 out of 16265)
  35138.6 MiB of nars (compressed)
  109501.6 MiB on disk (uncompressed)
  0.060 seconds per request (699.4 seconds in total)
  16.7 requests per second
  (continuous integration information unavailable)
579 packages are missing from 'https://bordeaux.guix.gnu.org' for 'aarch64-linux', among which:
  3479	rust@1.39.0	/gnu/store/xxlgndidxvhdd391k35vcmviixq5d9b0-rust-1.39.0-cargo /gnu/store/cfy1p8q4bwwy1i01cjfssfry21kpljz3-rust-1.39.0
--8<---------------cut here---------------end--------------->8---
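
The percentages in the report follow directly from the "N out of M" counts. A small sketch to recompute them, using hypothetical helper names:

```shell
# Percentage of store items that have a substitute, to one decimal place.
# Usage: coverage_pct AVAILABLE TOTAL
coverage_pct() {
    awk -v have="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * have / total }'
}

# Number of store items without a substitute.
# Usage: missing_items AVAILABLE TOTAL
missing_items() {
    echo $(( $2 - $1 ))
}
```

With the figures above, `coverage_pct 6668 16265` gives 41.0 for ci.guix.gnu.org and `coverage_pct 13624 16265` gives 83.8 for bordeaux; `missing_items 6668 16265` gives the 9597 missing items that the report mentions.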



> Hello,
>
> The aarch64 workers were all idle whereas 70k builds were
> available. Once restarted, they started building again.
>
> The problem might be that when the server is unavailable for a while the
> worker connections expire and cannot be resumed once the server is
> available again.
>
> Thanks,
>
> Mathieu





* bug#55848: [cuirass] workers stalled
  2022-06-19  2:07 ` Tom Fitzhenry
@ 2022-06-20  2:39   ` Maxim Cournoyer
  2022-06-20  2:44     ` Tom Fitzhenry
  2022-06-20 13:02     ` Maxime Devos
  0 siblings, 2 replies; 11+ messages in thread
From: Maxim Cournoyer @ 2022-06-20  2:39 UTC
  To: Tom Fitzhenry; +Cc: Mathieu Othacehe, 55848

Hi Mathieu!

[...]

> A few issues remain for aarch64:
>
> * grunewald and kreuzberg are not on <https://ci.guix.gnu.org/workers>.
>   Perhaps they were taken down while the substitute ratio was low to
>   avoid each worker independently recompiling expensive toolchains?
> * rust@1.39.0 (and thus all of Rust) is missing from ci and bordeaux. I
>   had expected this would have been working. I'll take a look and raise
>   a separate issue.

That's a known issue with mrustc; it only succeeds with x86_64; the
other architectures have problems.  That's a bug the mrustc author would
like to fix, so perhaps in time it will improve (especially if
interested parties can lend a hand).

There was also an attempt to cross-compile a rust/cargo bootstrap seed
for other architectures (branch: wip-cross-built-rust) but due to
complications with building rust as a static archive (it relies on
dynamic linking for its macro expand crates), the effort stalled.

Thanks,

Maxim





* bug#55848: [cuirass] workers stalled
  2022-06-20  2:39   ` Maxim Cournoyer
@ 2022-06-20  2:44     ` Tom Fitzhenry
  2022-06-20 13:02     ` Maxime Devos
  1 sibling, 0 replies; 11+ messages in thread
From: Tom Fitzhenry @ 2022-06-20  2:44 UTC
  To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 55848

On Mon, 20 Jun 2022, at 12:39 PM, Maxim Cournoyer wrote:
> That's a known issue with mrustc; it only succeeds with x86_64; the
> other architectures have problems.  That's a bug the mrustc author would
> like to fix, so perhaps in time it will improve (especially if
> interested parties can lend a hand).

mrustc was fixed on aarch64 in https://issues.guix.gnu.org/54580 on staging, which was recently merged to master.

I had tested that mrustc and rust-1.39 compile on aarch64 on staging, but now I see rust-1.39 failing.

I'll take a closer look, maybe I'm missing something.





* bug#55848: [cuirass] workers stalled
  2022-06-20  2:39   ` Maxim Cournoyer
  2022-06-20  2:44     ` Tom Fitzhenry
@ 2022-06-20 13:02     ` Maxime Devos
  2022-06-21  5:32       ` Maxim Cournoyer
  1 sibling, 1 reply; 11+ messages in thread
From: Maxime Devos @ 2022-06-20 13:02 UTC
  To: Maxim Cournoyer, Tom Fitzhenry; +Cc: Mathieu Othacehe, 55848


Maxim Cournoyer schreef op zo 19-06-2022 om 22:39 [-0400]:
> There was also an attempt to cross-compile a rust/cargo bootstrap seed
> for other architectures (branch: wip-cross-built-rust) but due to
> complications with building rust as a static archive (it relies on
> dynamic linking for its macro expand crates), the effort stalled.

FWIW, has it been considered to cross-compile rust non-statically
(not as a seed, just as an input cross-compiled from another system)?
Doesn't help for people that cannot offload to x86_64 and don't have
substitutes from ci.guix.gnu.org or such enabled, but could still be an
improvement.

Greetings,
Maxime.



* bug#55848: [cuirass] workers stalled
  2022-06-20 13:02     ` Maxime Devos
@ 2022-06-21  5:32       ` Maxim Cournoyer
  0 siblings, 0 replies; 11+ messages in thread
From: Maxim Cournoyer @ 2022-06-21  5:32 UTC
  To: Maxime Devos; +Cc: Mathieu Othacehe, 55848, Tom Fitzhenry

Hi Maxime,

Maxime Devos <maximedevos@telenet.be> writes:

> Maxim Cournoyer schreef op zo 19-06-2022 om 22:39 [-0400]:
>> There was also an attempt to cross-compile a rust/cargo bootstrap seed
>> for other architectures (branch: wip-cross-built-rust) but due to
>> complications with building rust as a static archive (it relies on
>> dynamic linking for its macro expand crates), the effort stalled.
>
> FWIW, has it been considered to cross-compile rust non-statically
> (not as a seed, just as an input cross-compiled from another system)?
> Doesn't help for people that cannot offload to x86_64 and don't have
> substitutes from ci.guix.gnu.org or such enabled, but could still be an
> improvement.

This already works, on the branch.  One of the patches carried there
that made it possible has been merged upstream too.  The issue is that
to offer a useful cross-compiled rust on non-x86_64 systems, you need to
move it from system domains; the clean way to do this is to archive a
static binary that depends on nothing else somewhere, and extract it in
a package for the target architecture.

Currently it's not cleanly self-contained because it still references
GCC libraries.

Maxim




