* ci.guix.gnu.org is getting back to life
@ 2024-07-11 10:26 Ludovic Courtès
  2024-07-11 13:26 ` Ricardo Wurmus
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Ludovic Courtès @ 2024-07-11 10:26 UTC (permalink / raw)
  To: Guix Devel

Hello!

Last month, we discussed¹ slow progress with builds (and ‘core-updates’
in particular) on ci.guix, especially on AArch64 and POWER9.  Those
turned out to be mostly due to scalability issues in Cuirass.  Likewise,
the front page at https://ci.guix.gnu.org was timing out for almost two
weeks².

I’m happy to report that the first class of problems is mostly fixed,
and while timeouts are not entirely gone, they’re less frequent.  Some
details about the work done:

  • I learned a lot from Chris about all things PostgreSQL (I even learned
    that phrases like “database administrator” are a thing).  Chris
    provided invaluable suggestions to optimize SQL queries that were
    taking too long, as was the case on the front page, and to tweak
    PostgreSQL configuration.

      https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=f60e73b7b1e906349d2355d37807514c6e667f0c
      https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=d98e1f76501d368e67a8e57455195590880283f8

  • The infamous “missing derivation” issue that has been causing
    spurious build failures may be coming to an end: as suggested by
    Chris, ‘cuirass remote-worker’ now has an explicit step to
    substitute .drv store items and it keeps retrying for a while when
    that fails:

      https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=2365ba786c805477fcbae6eaeb358b0dd0501598

  • ‘cuirass remote-server’ doesn’t use the database anymore to store
    transient worker information (“last seen” time), which reduces
    pressure on the database and increases throughput:

      https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=e9f83e43f066cdc8bb4bec6ba221ade4ef7cab7b

  • A bunch of corner cases (stuck builds, etc.) are now better handled
    by restarting, rescheduling, or canceling, whichever makes the most
    sense.

  • We upgraded the Honeycombs (AArch64) and POWER9 build machines.  At
    this stage 3 AArch64 and 2 POWER9 build machines are fully
    operational behind ci.guix:

      https://ci.guix.gnu.org/workers

    Another Honeycomb, grunewald, is undergoing maintenance at the MDC
    and should be back soon.
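
To illustrate the “keep retrying for a while” step that ‘cuirass
remote-worker’ now performs for .drv store items, here is a minimal
Python sketch.  It is not the actual Cuirass code (Cuirass is written in
Guile Scheme); the function name, the `fetch` callable, the attempt
count, and the delay are all assumptions made for illustration.

```python
import time
from typing import Callable


def substitute_with_retries(
    fetch: Callable[[str], bool],
    drv: str,
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to substitute DRV, giving up after ATTEMPTS tries.

    FETCH is a hypothetical callable that returns True once the .drv
    store item is available locally; it stands in for whatever the
    worker really does to obtain the substitute.
    """
    for attempt in range(1, attempts + 1):
        if fetch(drv):
            return True
        if attempt < attempts:
            # Wait before asking again; the substitute server may simply
            # not have the item yet.
            sleep(delay)
    return False
```

The point of bounding the retries is that a derivation that is briefly
unavailable no longer fails the build right away, while one that never
shows up still fails eventually instead of blocking the worker forever.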

Substitute availability is back to 94% for x86_64 on ‘core-updates’; other
architectures are still lacking but that’ll hopefully improve over the
coming days:

  https://qa.guix.gnu.org/branch/core-updates

These things require constant attention.  If you notice anything
suspicious, feel free to bring it up here or on IRC!

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00149.html
² https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00312.html



* Re: ci.guix.gnu.org is getting back to life
  2024-07-11 10:26 ci.guix.gnu.org is getting back to life Ludovic Courtès
@ 2024-07-11 13:26 ` Ricardo Wurmus
  2024-07-12  9:41 ` Simon Tournier
  2024-07-21 13:04 ` Ludovic Courtès
  2 siblings, 0 replies; 4+ messages in thread
From: Ricardo Wurmus @ 2024-07-11 13:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel

Ludovic Courtès <ludo@gnu.org> writes:

> Last month, we discussed¹ slow progress with builds (and ‘core-updates’
> in particular) on ci.guix, especially on AArch64 and POWER9.  Those
> turned out to be mostly due to scalability issues in Cuirass.  Likewise,
> the front page at https://ci.guix.gnu.org was timing out for almost two
> weeks².
>
> I’m happy to report that the first class of problems is mostly fixed,
> and while timeouts are not entirely gone, they’re less frequent.

Thank you for the update and for driving the investigations and working
towards fixes!

-- 
Ricardo



* Re: ci.guix.gnu.org is getting back to life
  2024-07-11 10:26 ci.guix.gnu.org is getting back to life Ludovic Courtès
  2024-07-11 13:26 ` Ricardo Wurmus
@ 2024-07-12  9:41 ` Simon Tournier
  2024-07-21 13:04 ` Ludovic Courtès
  2 siblings, 0 replies; 4+ messages in thread
From: Simon Tournier @ 2024-07-12  9:41 UTC (permalink / raw)
  To: Ludovic Courtès, Guix Devel

Hi,

On Thu, 11 Jul 2024 at 12:26, Ludovic Courtès <ludo@gnu.org> wrote:

> These things require constant attention.

Thank you for such attention!  Thanks Chris for playing the role of
“database administrator”. :-)

Well, it reminds me that our discussion [1] “Sustainable funding and
maintenance for our infrastructure” has been focused on “funding” and
“hardware” more than “maintenance”.

Cheers,
simon



1: Sustainable funding and maintenance for our infrastructure
Ludovic Courtès <ludo@gnu.org>
Tue, 02 Jul 2024 16:24:06 +0200
id:87sewr98jd.fsf@gnu.org
https://lists.gnu.org/archive/html/guix-devel/2024-07
https://yhetil.org/guix/87sewr98jd.fsf@gnu.org



* Re: ci.guix.gnu.org is getting back to life
  2024-07-11 10:26 ci.guix.gnu.org is getting back to life Ludovic Courtès
  2024-07-11 13:26 ` Ricardo Wurmus
  2024-07-12  9:41 ` Simon Tournier
@ 2024-07-21 13:04 ` Ludovic Courtès
  2 siblings, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2024-07-21 13:04 UTC (permalink / raw)
  To: Guix Devel

Hello!

Ludovic Courtès <ludo@gnu.org> skribis:

>   • We upgraded the Honeycombs (AArch64) and POWER9 build machines.  At
>     this stage 3 AArch64 and 2 POWER9 build machines are fully
>     operational behind ci.guix:
>
>       https://ci.guix.gnu.org/workers
>
>     Another Honeycomb, grunewald, is undergoing maintenance at the MDC
>     and should be back soon.

grunewald has been back to work for a week, so we’re making progress!

There was a regression in Cuirass: when a worker was found
unresponsive, all the builds performed on that worker would be
rescheduled (instead of just those that were running on the worker at
that time!).  It took me a while to notice it, but that certainly slowed
things down, as workers would occasionally end up rebuilding the same
things again (not actually rebuilding when substitutes from a previous
build were available, but still).
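
To make the difference concrete, here is a minimal Python sketch of the
intended behavior.  It is not the actual Cuirass code: the `Build`
record, its status strings, the in-memory “last seen” table, and the
timeout are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Build(NamedTuple):
    id: int
    worker: str
    status: str  # illustrative values: "running", "succeeded", "failed"


def unresponsive_workers(last_seen: Dict[str, float],
                         now: float, timeout: float) -> Set[str]:
    """Return the workers not heard from for more than TIMEOUT seconds.

    LAST_SEEN maps a worker name to the time of its last ping and is
    kept in memory rather than in the database.
    """
    return {name for name, seen in last_seen.items() if now - seen > timeout}


def builds_to_reschedule(builds: Iterable[Build], worker: str) -> List[Build]:
    """Return only the builds that were running on WORKER.

    The regression amounted to dropping the status check, so builds that
    had already completed on WORKER were rescheduled too.
    """
    return [b for b in builds
            if b.worker == worker and b.status == "running"]
```

With this filter, a build that already succeeded on the unresponsive
worker is left alone, and only the work that was actually in flight
gets rescheduled.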

The Arm workers have been busy, which has improved substitute
availability for aarch64-linux and armhf-linux, but there’s still a lot
of progress to be made:

  https://qa.guix.gnu.org/branch/master
  https://qa.guix.gnu.org/branch/core-updates

So far they’ve been mostly processing relatively old ‘master’ builds and
recent ‘gnome-team’ builds; I hope they can start working on
‘core-updates’ now.

Ludo’.

