* ci.guix.gnu.org is getting back to life
@ 2024-07-11 10:26 Ludovic Courtès
2024-07-11 13:26 ` Ricardo Wurmus
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Ludovic Courtès @ 2024-07-11 10:26 UTC (permalink / raw)
To: Guix Devel
Hello!
Last month, we discussed¹ slow progress with builds (and ‘core-updates’
in particular) on ci.guix, especially on AArch64 and POWER9. Those
turned out to be mostly due to scalability issues in Cuirass. Likewise,
the front page at https://ci.guix.gnu.org was timing out for almost two
weeks².
I’m happy to report that the first class of problems is mostly fixed;
timeouts are not entirely gone, but they’re less frequent. Some
details about the work done:
• I learned a lot from Chris about all things PostgreSQL (I even learned
that phrases like “database administrator” are a thing). Chris
provided invaluable suggestions to optimize SQL queries that were
taking too long, as was the case on the front page, and to tweak
PostgreSQL configuration.
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=f60e73b7b1e906349d2355d37807514c6e667f0c
https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=d98e1f76501d368e67a8e57455195590880283f8
• The infamous “missing derivation” issue that has been causing
spurious build failures may be coming to an end: as suggested by
Chris, ‘cuirass remote-worker’ now has an explicit step to
substitute .drv store items and it keeps retrying for a while when
that fails:
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=2365ba786c805477fcbae6eaeb358b0dd0501598
• ‘cuirass remote-server’ doesn’t use the database anymore to store
transient worker information (“last seen” time), which reduces
pressure on the database and increases throughput:
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=e9f83e43f066cdc8bb4bec6ba221ade4ef7cab7b
• A bunch of corner cases (stuck builds, etc.) are now better handled
by restarting, rescheduling, or canceling as makes most sense.
• We upgraded the Honeycombs (AArch64) and POWER9 build machines. At
this stage 3 AArch64 and 2 POWER9 build machines are fully
operational behind ci.guix:
https://ci.guix.gnu.org/workers
Another Honeycomb, grunewald, is undergoing maintenance at the MDC
and should be back soon.
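To illustrate the retry behavior described in the “missing derivation” item above, here is a minimal Python sketch. It is not Cuirass code (Cuirass is written in Guile Scheme), and every name in it is hypothetical: `substitute_with_retry`, the `fetch` callable standing in for whatever actually substitutes the .drv store item, and the `attempts` and `delay` defaults, which are not the values Cuirass uses.

```python
import time
from typing import Callable


def substitute_with_retry(
    fetch: Callable[[str], bool],
    drv: str,
    attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to obtain DRV up to ATTEMPTS times, pausing DELAY seconds between tries.

    FETCH is a hypothetical callable that returns True once the .drv item
    is available locally. Return True on success, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if fetch(drv):
            return True
        if attempt < attempts:
            # Give the substitute server some time before trying again.
            sleep(delay)
    return False
```

The point of the explicit step is that a transient failure to fetch the derivation is retried for a bounded time instead of being reported right away as a build failure.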
Substitute availability is back to 94% on x86_64 for ‘core-updates’; other
architectures are still lacking but that’ll hopefully improve over the
coming days:
https://qa.guix.gnu.org/branch/core-updates
These things require constant attention. If you notice anything
suspicious, feel free to bring it up here or on IRC!
Ludo’.
¹ https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00149.html
² https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00312.html
* Re: ci.guix.gnu.org is getting back to life
From: Ricardo Wurmus @ 2024-07-11 13:26 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Guix Devel
Ludovic Courtès <ludo@gnu.org> writes:
> Last month, we discussed¹ slow progress with builds (and ‘core-updates’
> in particular) on ci.guix, especially on AArch64 and POWER9. Those
> turned out to be mostly due to scalability issues in Cuirass. Likewise,
> the front page at https://ci.guix.gnu.org was timing out for almost two
> weeks².
>
> I’m happy to report that the first class of problems is mostly fixed,
> and timeouts are not entirely gone, but they’re less frequent.
Thank you for the update and for driving the investigations and working
towards fixes!
--
Ricardo
* Re: ci.guix.gnu.org is getting back to life
From: Simon Tournier @ 2024-07-12 9:41 UTC (permalink / raw)
To: Ludovic Courtès, Guix Devel
Hi,
On Thu, 11 Jul 2024 at 12:26, Ludovic Courtès <ludo@gnu.org> wrote:
> These things require constant attention.
Thank you for such attention! Thanks Chris for playing the role of
“database administrator”. :-)
Well, it reminds me that our discussion [1] “Sustainable funding and
maintenance for our infrastructure” has been focused on “funding” and
“hardware” more than “maintenance”.
Cheers,
simon
1: Sustainable funding and maintenance for our infrastructure
Ludovic Courtès <ludo@gnu.org>
Tue, 02 Jul 2024 16:24:06 +0200
id:87sewr98jd.fsf@gnu.org
https://lists.gnu.org/archive/html/guix-devel/2024-07
https://yhetil.org/guix/87sewr98jd.fsf@gnu.org
* Re: ci.guix.gnu.org is getting back to life
From: Ludovic Courtès @ 2024-07-21 13:04 UTC (permalink / raw)
To: Guix Devel
Hello!
Ludovic Courtès <ludo@gnu.org> skribis:
> • We upgraded the Honeycombs (AArch64) and POWER9 build machines. At
> this stage 3 AArch64 and 2 POWER9 build machines are fully
> operational behind ci.guix:
>
> https://ci.guix.gnu.org/workers
>
> Another Honeycomb, grunewald, is undergoing maintenance at the MDC
> and should be back soon.
grunewald has been back to work for a week, so we’re making progress!
There was a regression in Cuirass: when a worker was found
unresponsive, all the builds ever performed on that worker would be
rescheduled (instead of just those that were running on the worker at
that time!). It took me a while to notice, but it certainly slowed
things down, as workers would occasionally end up processing the same
builds again (not actually rebuilding when substitutes from a previous
build were available, but still).
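The intended behavior can be sketched as follows in Python. This is only an illustration under assumptions, not the actual Cuirass fix: the `Build` record, the `Status` values, and the function name are all hypothetical. The key point is the status check, which restricts rescheduling to builds that were running on the unresponsive worker.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    SCHEDULED = "scheduled"
    STARTED = "started"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Build:
    id: int
    worker: Optional[str]  # name of the worker the build was assigned to
    status: Status


def reschedule_for_unresponsive_worker(builds: List[Build], worker: str) -> List[int]:
    """Put back in the queue only the builds currently running on WORKER.

    Return the ids of the rescheduled builds. Builds that already finished
    on WORKER are left untouched; the regression amounted to dropping the
    status check below.
    """
    rescheduled = []
    for build in builds:
        if build.worker == worker and build.status is Status.STARTED:
            build.status = Status.SCHEDULED
            build.worker = None
            rescheduled.append(build.id)
    return rescheduled
```

With a finished build and a running build both assigned to the same unresponsive worker, only the running one goes back to the queue.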
The Arm workers have been busy, which has improved substitute
availability for aarch64-linux and armhf-linux, but there’s still a lot
of progress to be made:
https://qa.guix.gnu.org/branch/master
https://qa.guix.gnu.org/branch/core-updates
So far they’ve been mostly processing relatively old ‘master’ builds and
recent ‘gnome-team’ builds; I hope they can start working on
‘core-updates’ now.
Ludo’.