* May update on bordeaux.guix.gnu.org
From: Christopher Baines @ 2022-05-20 11:34 UTC
  To: guix-devel

Hi!

The last update was sent out in February [1], so this update covers
roughly the last 3 months.

1: https://lists.gnu.org/archive/html/guix-devel/2022-02/msg00099.html

### Summary

bordeaux.guix.gnu.org is one of the default sources of substitutes, the
other one being ci.guix.gnu.org.

I haven't had much time to spend on Guix, so the little time I've had
has gone into general maintenance. There have also been some problems
that have come up in the last few months.

Those problems have been overcome, and I'm looking forward to doing more
with mirrors and continuing to work on building non-master branches and
packages affected by patches.

If you want the details, read on.

### Some problems, and addressing them

data.guix.gnu.org has had some issues recently processing revisions,
which delayed builds starting. This has been addressed for now by
clearing out cached connections within the inferior process used to
gather information about the revision being processed. As the memory
required for this processing is probably going to increase in the
future, this area will probably require further work.

In other Guix Data Service news, data.qa.guix.gnu.org now exists. This
is just a hopefully stable name for a Guix Data Service instance that
processes non-master branches and branches related to patches. There
have also been some improvements which specifically benefit
data.qa.guix.gnu.org, including fixing a locking issue that prevented
build events from being processed and a memory management improvement to
close the inferior processes when loading revisions before waiting on
the big database lock.

Another problem was a disk failure on lakeside.guix.gnu.org, the machine
that should have been storing all the built nars and supporting serving
requests to bordeaux.guix.gnu.org. There's now a new machine,
bishan.guix.gnu.org, which has more storage space, and hopefully working
hard disks.

It's the nar-herder that is meant to assist with managing all the nars,
and generally I think this problem was well handled. Even though one
machine involved had a hardware issue, builds were still happening and
substitutes were still available. I was able to set up a new machine,
have the nar-herder download all the nars, and then switch over to using
the new machine.

One nar [2] was lost because it was only stored on lakeside, and the
file could no longer be read from the disk (it's now been built again,
so it should be available).

2: https://bordeaux.guix.gnu.org/nar/lzip/phcqx40viymdxlfaa5fpbx43np8qhzpn-qtbase-6.1.1-debug

Out of this, there have been a bunch of improvements made to the
nar-herder:

 - Logging has improved, and Prometheus style metrics are now available

 - The nar-herder supports handling /nar requests. The nars are still
   served by nginx, but the nar-herder metrics are updated.

 - The nar-herder now supports removing nars, which came in useful when
   removing the irretrievable nar mentioned above.

 - The "and" component of the storage nar removal criteria now works,
   and this is now used on bordeaux.guix.gnu.org to ensure that nars are
   only removed from bayfront once they're stored on two machines
   (rather than just one as was the case previously)

 - Mirroring nars has also been parallelised in the case where there is
   no storage limit. This helped speed up bishan downloading all the
   nars.
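
As a rough illustration of the "and" removal criteria mentioned above,
here is a minimal Python sketch. It is not the nar-herder's actual code
or configuration format. The helper names (stored_on, all_of) are
hypothetical, and the choice of bishan and hatysa as the two required
machines is an assumption made for the example.

```python
from typing import Callable, Set

# A criterion takes the set of machines that currently store a nar and
# says whether that nar may be removed from the local machine.
Criterion = Callable[[Set[str]], bool]


def stored_on(machine: str) -> Criterion:
    """Satisfied when the nar is stored on the named machine."""
    return lambda locations: machine in locations


def all_of(*criteria: Criterion) -> Criterion:
    """The "and" component: every sub-criterion must be satisfied."""
    return lambda locations: all(c(locations) for c in criteria)


# Hypothetical policy for bayfront: only remove a nar once two other
# machines both hold a copy (machine names assumed for illustration).
can_remove_from_bayfront = all_of(stored_on("bishan"), stored_on("hatysa"))

print(can_remove_from_bayfront({"bayfront", "bishan"}))            # False
print(can_remove_from_bayfront({"bayfront", "bishan", "hatysa"}))  # True
```

With a single criterion, a nar could be removed as soon as one other
machine had it, so one disk failure on that machine could lose the only
copy. Requiring both criteria avoids that case.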

The other problem has been I/O performance and probably related issues
on bayfront, the machine that runs the coordinator and serves
substitutes. Bayfront also does a bunch more things, with only two hard
drives in RAID 1, so in some ways it's a good test that the Guix Build
Coordinator can work with less performant hardware.

One particular issue was that Garbage Collection (GC) in /gnu/store on
bayfront was taking days to run, and while it was running, the
coordinator couldn't substitute derivations from data.guix.gnu.org,
which happens as part of submitting new builds. This meant that builds
were delayed.

To address this, if derivations aren't in the local store, the
coordinator can now read them directly from a substitute server
(data.guix.gnu.org in this case). This avoids unnecessarily adding
them to the store, and possibly waiting for garbage collection to
finish before that can happen. It also means that there will be fewer
things in the store to garbage collect.
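
The fallback described above could look roughly like the following
Python sketch. This is not the coordinator's actual implementation
(which is written in Guile). The function read_derivation and the
fetch_remote callback are hypothetical names, and how the content is
actually fetched from the substitute server is left to the caller.

```python
import os
from typing import Callable

STORE_DIR = "/gnu/store"


def read_derivation(
    drv_name: str,
    fetch_remote: Callable[[str], bytes],
    store_dir: str = STORE_DIR,
) -> bytes:
    """Return the contents of a derivation file.

    Prefer the local store; otherwise fetch the content from a
    substitute server without adding anything to the store.
    """
    local_path = os.path.join(store_dir, drv_name)
    if os.path.isfile(local_path):
        with open(local_path, "rb") as f:
            return f.read()
    # Not in the local store: read directly from the remote. Nothing is
    # written to the store, so this path never has to wait for a
    # running garbage collection to finish.
    return fetch_remote(drv_name)
```

The important property is that the remote path only reads data into
memory, so a long-running GC on the store no longer blocks submitting
new builds.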

### Looking forward

I mentioned last time [1] that nar capacity was something that needed
work soon. The new bishan machine has ~6TB of free storage, which should
be enough for a while, but hatysa, which also stores all the nars, only
has 610GB of free space. One way or another, I'll try to add 6TB more
storage at some point soon.

Also mentioned last time was mirrors in different geographical
locations. I've now set up a mirror in the US [3] to enable testing of
this, and I'll send out a specific email about this shortly.

3: https://bordeaux-us-east-mirror.cbaines.net/

Time permitting, I want to try and keep progressing the work to build
non-master branches and packages affected by patches on
bordeaux.guix.gnu.org. Currently, data.qa.guix.gnu.org is processing
revisions too slowly, but once that's fixed, it should be possible to
look at submitting builds.

Let me know if you have any questions or comments!

Thanks,

Chris
