* Solstice infrastructure hackathon
From: Ludovic Courtès @ 2021-12-16 9:46 UTC
To: Guix Devel; +Cc: guix-sysadmin
Hello Guix!
This week, the node behind {ci,issues,disarchive}.guix.gnu.org and
guix.gnu.org was down twice for a few hours—nothing terrible in the end,
but it reminded us that, even though Guix doesn’t rely on any particular
machine, we can definitely feel the inconvenience when it’s down.
We were unlucky enough that it happened days after the other build farm,
bordeaux.guix.gnu.org, ran out of disk space and had its CI stopped,
right before the big merge—so it doesn’t have substitutes for current
master.
While discussing this on IRC the other day, we thought that perhaps it
was time to have an infrastructure hackathon. How about Tuesday,
Dec. 21st? (Probably with a followup in January.)
Here are tasks that were brought up:
• Set up a backup server for berlin.guix.gnu.org, the head node of the
ci.guix.gnu.org, possibly moving some services such as the web site
there.
• Add DNS redundancy for guix.gnu.org so it can point to one of two
hosts (need to figure out certbot challenges so both machines can
update their certificates).
• Set up status.guix.gnu.org with sysadmin status updates (possibly
using Prometheus?).
• Come up with a plan to add disks to the RAID array on bayfront, the
head node of bordeaux.guix.gnu.org.
• Work on a plan to back up the Disarchive database currently on
berlin.guix.
• Work on a plan to mirror nars from ci.guix and bordeaux.guix, using
plain rsync or <https://git.cbaines.net/guix/nar-herder/about/>.
• Have a documented procedure to set up substitute mirrors, such as
the one in .cn (I can’t find the URL), ideally with plain rsync
access.
Am I forgetting something?
Some of these tasks require root or physical access for the final steps,
but most of them are about (1) coming up with a plan, and (2) adjusting
the system configuration at
<https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/>.
Anyone with sysadmin experience to share and Guix System knowledge can
join! We’ll communicate over #guix on irc.libera.chat.
Who’s in? :-)
Cheers,
Ludo’.
* Re: Solstice infrastructure hackathon
From: Blake Shaw @ 2021-12-16 10:13 UTC
To: Ludovic Courtès; +Cc: Guix Devel, guix-sysadmin
This sounds great! I'm in, marking it off in my calendar :)
--
“In girum imus nocte et consumimur igni”
* Re: Solstice infrastructure hackathon
From: zimoun @ 2021-12-16 11:30 UTC
To: Ludovic Courtès, Guix Devel; +Cc: guix-sysadmin
Hi Ludo,
On Thu, 16 Dec 2021 at 10:46, Ludovic Courtès <ludo@gnu.org> wrote:
> Here are tasks that were brought up:
>
> • Set up a backup server for berlin.guix.gnu.org, the head node of the
> ci.guix.gnu.org, possibly moving some services such as the web site
> there.
About backups: the Git repositories on Savannah need backing up too, at
least the Guix channel. See past discussions [1].
1: <https://lists.gnu.org/archive/html/guix-devel/2019-12/msg00148.html>
> • Work on a plan to mirror nars from ci.guix and bordeaux.guix, using
> plain rsync or <https://git.cbaines.net/guix/nar-herder/about/>.
Well, that reminds me of the CDN experiment; see [2,3] for the most
recent discussions. The outcome of that experiment was very positive,
IIRC. Therefore, it seems worthwhile to me to go in this CDN direction
while waiting for the Right Thing.
2: <https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00192.html>
3: <https://lists.gnu.org/archive/html/guix-devel/2019-03/msg00135.html>
> • Have a documented procedure to set up substitute mirrors, such as
> the one in .cn (I can’t find the URL), ideally with plain rsync
> access.
Maybe that URL
<https://mirrors.sjtug.sjtu.edu.cn/guix>
from this message [4].
4: <https://yhetil.org/guix/87czz24ilu.fsf@riseup.net/>
Cheers,
simon
* Re: Solstice infrastructure hackathon
From: Christopher Baines @ 2021-12-17 1:57 UTC
To: Ludovic Courtès; +Cc: guix-devel, guix-sysadmin
Ludovic Courtès <ludo@gnu.org> writes:
> We were unlucky enough that it happened days after the other build farm,
> bordeaux.guix.gnu.org, ran out of disk space and had its CI stopped,
> right before the big merge—so it doesn’t have substitutes for current
> master.
Builds effectively stopped on the 29th of November, which is more than a
few days I'd say, although this is maybe not the biggest issue. Since
the build coordinator instance behind bordeaux.guix.gnu.org wasn't
building things from core-updates-frozen prior to the merge, even if
builds hadn't stopped due to the space issues on bayfront, it still
wouldn't have had many substitutes.
As part of testing patches and branches [1], I think it would be good to
get builds for things like core-updates-frozen happening; that should
improve the substitute availability from bordeaux.guix.gnu.org on
average.
1: https://lists.gnu.org/archive/html/guix-devel/2021-08/msg00001.html
> • Add DNS redundancy for guix.gnu.org so it can point to one of two
> hosts (need to figure out certbot challenges so both machines can
> update their certificates).
This (in general) is something I'm interested in working out, since
it'll be helpful for setting up mirrors for substitutes as well (in the
case where you want the mirrors to respond to one common DNS name with
working TLS).
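A minimal sketch of how one might verify such a setup once it exists: resolve the common name, connect to each address individually, and report how many days each host's certificate has left. It uses only the Python standard library; the function names are illustrative, and nothing here reflects how the Guix machines are actually configured.

```python
import socket
import ssl


def days_remaining(not_after: str, now: float) -> float:
    """Days between `now` (Unix seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def certificate_expiry(address: str, name: str, port: int = 443) -> str:
    """Connect to one specific address but validate the certificate for `name`."""
    context = ssl.create_default_context()
    with socket.create_connection((address, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=name) as tls:
            return tls.getpeercert()["notAfter"]


def check_all_hosts(name: str, now: float) -> dict:
    """Map each address behind `name` to the days left on its certificate."""
    infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return {address: days_remaining(certificate_expiry(address, name), now)
            for address in addresses}
```

Calling `check_all_hosts("guix.gnu.org", time.time())` (after `import time`) would flag a host whose certificate renewal has fallen behind the others, which is the failure mode to watch for when several machines answer to one name.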
> • Come up with a plan to add disks to the RAID array on bayfront, the
> head node of bordeaux.guix.gnu.org.
The space issue on bayfront that led to builds not happening has now
been effectively resolved (see [2]). There's definitely lots of tidying
up to do, but I think the situation for storing the nars is much better
now.
2: https://lists.gnu.org/archive/html/guix-devel/2021-12/msg00140.html
That's not to say there's nothing to be gained by upgrading the
bayfront hardware; some SSD storage would be ideal to speed up the
coordinator and builds.
> • Work on a plan to mirror nars from ci.guix and bordeaux.guix, using
> plain rsync or <https://git.cbaines.net/guix/nar-herder/about/>.
I'm interested in getting bordeaux.guix.gnu.org into a state where
there's less of a discrepancy in performance depending on where in the
world it's accessed from. I'm assuming there is some difference in
performance; checking that assumption is one part of the problem. If it
turns out there are some gains to be had, the next step is investigating
how this could be approached. Mirrors plus GeoIP-based DNS is the
approach I currently have in mind.
Anyway, even if there isn't a meaningful performance difference, maybe
it's worth setting up distributed mirrors for reliability.
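The performance assumption mentioned above could be checked with something as simple as timing repeated downloads of the same file from machines in different regions. The sketch below is one hypothetical way to do that in Python; the helper names are made up for illustration, and the `fetch` and `clock` parameters exist only so the timing logic can be exercised without network access.

```python
import statistics
import time
import urllib.request


def fetch(url: str) -> int:
    """Download `url` fully and return the number of bytes read."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return len(response.read())


def time_fetches(url, runs=5, fetch=fetch, clock=time.perf_counter):
    """Return the wall-clock duration, in seconds, of each of `runs` downloads."""
    durations = []
    for _ in range(runs):
        start = clock()
        fetch(url)
        durations.append(clock() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to its median and worst case."""
    return {"median": statistics.median(durations),
            "worst": max(durations)}
```

Running `summarize(time_fetches(url))` from several locations, with `url` pointing at the same small file and then at the same large nar, would give a first idea of whether latency or throughput differs enough to justify mirrors. Which URL to use is left open here; any file the server is known to publish will do.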
> • Have a documented procedure to set up substitute mirrors, such as
> the one in .cn (I can’t find the URL), ideally with plain rsync
> access.
Getting the nar-herder into a state where other people might be able to
use it is definitely on my list of things to do. I'm assuming here that
it's something that people might want to use, and again that's probably
worth investigating. If it turns out that people just want to use rsync,
it's probably worth assisting with getting that kind of setup working.
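For the plain rsync option, a mirror operator's side could be as small as the following sketch. It assumes an rsync endpoint exists on the source side, which the thread does not confirm; the source and destination in the usage note are placeholders, and only standard rsync options (`--archive`, `--partial`, `--verbose`, `--dry-run`) are used.

```python
import subprocess


def rsync_command(source: str, destination: str, dry_run: bool = True) -> list:
    """Build the rsync invocation for copying nars from `source` to `destination`."""
    command = ["rsync", "--archive", "--partial", "--verbose"]
    if dry_run:
        # Only report what would be transferred; change nothing on disk.
        command.append("--dry-run")
    return command + [source, destination]


def mirror(source: str, destination: str, dry_run: bool = True):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    return subprocess.run(rsync_command(source, destination, dry_run), check=True)
```

For instance, `mirror("rsync://mirror.example.org/nars/", "/var/cache/nars/")` would perform a dry run against a hypothetical endpoint; passing `dry_run=False` does the real transfer. `--delete` is deliberately left out so that a mirror never drops nars just because the source has garbage-collected them.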
> Who’s in? :-)
Not sure how much time I'll have, but I'll try to be around :)
Thanks,
Chris
* Re: Solstice infrastructure hackathon
From: Mathieu Othacehe @ 2021-12-17 8:12 UTC
To: Christopher Baines; +Cc: guix-devel, guix-sysadmin
Hello,
> That's not to say there's not something to be gained by upgrading the
> bayfront hardware, some SSD storage would be ideal to speed up the
> coordinator and builds.
While we are talking infrastructure, a short digression on the current
situation. The Berlin build farm is providing substitutes for all
branches and development branches. It is providing system images that
are made available on the Guix website, on GNOME Boxes and probably
elsewhere.
It is giving a frequent status on the system tests. It is also checking
that all our sources are available and feeding the Disarchive
database. When this build farm is experiencing troubles, all those
important services are suspended or limited.
As you may know, Berlin is also experiencing storage issues that will
soon be really problematic. Having Bordeaux as another build farm,
monopolizing resources and requiring new hardware, when maintaining
Berlin is already a huge burden, seems untenable to me.
I personally think that having Bordeaux as an alternative build farm,
running alternative software is a wrong direction for the project. I
would personally prefer to put a stop to that situation.
We should clearly host some services such as the Guix website on
Berlin & Bordeaux to bring some redundancy. However, as far as
substitutes building is concerned, redundancy is premature when
maintaining a single system, with our limited human and hardware
resources, proves to be so complex.
What do other people think?
Thanks,
Mathieu
* Re: Solstice infrastructure hackathon
From: pukkamustard @ 2021-12-18 10:57 UTC
To: Ludovic Courtès; +Cc: guix-devel, guix-sysadmin
I'm in!
Ludovic Courtès <ludo@gnu.org> writes:
> Hello Guix!
>
> This week, the node behind {ci,issues,disarchive}.guix.gnu.org and
> guix.gnu.org was down twice for a few hours—nothing terrible in the end,
> but it reminded us that, even though Guix doesn’t rely on any particular
> machine, we can definitely feel the inconvenience when it’s down.
>
> We were unlucky enough that it happened days after the other build farm,
> bordeaux.guix.gnu.org, ran out of disk space and had its CI stopped,
> right before the big merge—so it doesn’t have substitutes for current
> master.
>
> While discussing this on IRC the other day, we thought that perhaps it
> was time to have an infrastructure hackathon. How about Tuesday,
> Dec. 21st? (Probably with a followup in January.)
>
> Here are tasks that were brought up:
>
> • Set up a backup server for berlin.guix.gnu.org, the head node of the
> ci.guix.gnu.org, possibly moving some services such as the web site
> there.
>
> • Add DNS redundancy for guix.gnu.org so it can point to one of two
> hosts (need to figure out certbot challenges so both machines can
> update their certificates).
>
> • Set up status.guix.gnu.org with sysadmin status updates (possibly
> using Prometheus?).
>
> • Come up with a plan to add disks to the RAID array on bayfront, the
> head node of bordeaux.guix.gnu.org.
>
> • Work on a plan to back up the Disarchive database currently on
> berlin.guix.
>
> • Work on a plan to mirror nars from ci.guix and bordeaux.guix, using
> plain rsync or <https://git.cbaines.net/guix/nar-herder/about/>.
>
> • Have a documented procedure to set up substitute mirrors, such as
> the one in .cn (I can’t find the URL), ideally with plain rsync
> access.
>
> Am I forgetting something?
>
> Some of these tasks require root or physical access for the final steps,
> but most of them are about (1) coming up with a plan, and (2) adjusting
> the system configuration at
> <https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/>.
> Anyone with sysadmin experience to share and Guix System knowledge can
> join! We’ll communicate over #guix on irc.libera.chat.
>
> Who’s in? :-)
>
> Cheers,
> Ludo’.
* Re: Solstice infrastructure hackathon
From: Ricardo Wurmus @ 2021-12-18 16:08 UTC
To: Mathieu Othacehe; +Cc: guix-devel, guix-sysadmin
Mathieu Othacehe <othacehe@gnu.org> writes:
> We should clearly host some services such as the Guix website on
> Berlin & Bordeaux to bring some redundancy. However, as far as
> substitutes building is concerned, redundancy is premature when
> maintaining a single system, with our limited human and hardware
> resources, proves to be so complex.
>
> What do other people think?
I have a very unqualified opinion, which is directly related to the fact
that I’m interacting with ci.guix.gnu.org (and the software on it)
daily, but I have no idea what bordeaux is running. I never contributed
to the build coordinator, never configured it, never fetched substitutes
from there either.
So to me the practical value of Cuirass as it exists now is quite
obvious, because I’m pretty familiar with it, have run my own instance
in the past, and many of our services (like the aforementioned installer
images) depend on it.
I feel strongly that maintenance and improvements to Cuirass should not
fall exclusively on Mathieu’s shoulders, so it would be wonderful if we
had more people hack on Cuirass. That said, I don’t see exploratory
work on an alternative way to build substitutes as a redirection of
resources that would be needed elsewhere. Motivation is not fungible.
The existence of these two systems affects our resources in that build
machines are added exclusively to either one or the other system. This
primarily has implications for our limited aarch64 build nodes. For
x86_64 we’ve got the vast majority connected to ci.guix.gnu.org.
Personally, I consider the continued development and improvement of
Cuirass to be essential. When it comes to scarcity of build nodes I
think the solution should be to buy more.
--
Ricardo
* Re: Solstice infrastructure hackathon
From: Ludovic Courtès @ 2021-12-20 22:58 UTC
To: Mathieu Othacehe; +Cc: guix-devel, guix-sysadmin
Hello,
Mathieu Othacehe <othacehe@gnu.org> skribis:
> I personally think that having Bordeaux as an alternative build farm,
> running alternative software is a wrong direction for the project. I
> would personally prefer to put a stop to that situation.
>
> We should clearly host some services such as the Guix website on
> Berlin & Bordeaux to bring some redundancy. However, as far as
> substitutes building is concerned, redundancy is premature when
> maintaining a single system, with our limited human and hardware
> resources, proves to be so complex.
>
> What do other people think?
I agree with what Ricardo wrote, in particular that motivation is not
fungible. In that sense, an extra effort is not a “waste of resources”
in that it doesn’t take anything from other efforts in terms of
workforce.
Now, I’d certainly like to see more collective thinking around the
infrastructure we’re building—just like we think and work collectively
on Guix and its packages.
We have one set of problems to solve: long-term storage, GC scalability,
distribution, mirroring, continuous integration, etc. While it’s
valuable to approach them from different angles, I also find it more
fruitful when we can iterate collectively on actual solutions, and
follow our shared set of processes—review, tests, doc, integration in
Guix proper when applicable, and so on.
Tomorrow is a day where we can hopefully take advantage of that
collective work to address part of our infrastructure needs.
Thanks,
Ludo’.
* Re: Solstice infrastructure hackathon
From: Ludovic Courtès @ 2021-12-22 0:44 UTC
To: Guix Devel; +Cc: guix-sysadmin
Hello Guix!
Ludovic Courtès <ludo@gnu.org> skribis:
> Here are tasks that were brought up:
>
> • Set up a backup server for berlin.guix.gnu.org, the head node of the
> ci.guix.gnu.org, possibly moving some services such as the web site
> there.
>
> • Add DNS redundancy for guix.gnu.org so it can point to one of two
> hosts (need to figure out certbot challenges so both machines can
> update their certificates).
>
> • Set up status.guix.gnu.org with sysadmin status updates (possibly
> using Prometheus?).
>
> • Come up with a plan to add disks to the RAID array on bayfront, the
> head node of bordeaux.guix.gnu.org.
>
> • Work on a plan to back up the Disarchive database currently on
> berlin.guix.
>
> • Work on a plan to mirror nars from ci.guix and bordeaux.guix, using
> plain rsync or <https://git.cbaines.net/guix/nar-herder/about/>.
>
> • Have a documented procedure to set up substitute mirrors, such as
> the one in .cn (I can’t find the URL), ideally with plain rsync
> access.
A small but dedicated bunch of people made progress on several of these
items today, in a loosely coordinated fashion on IRC, which perhaps made
it hard to get started; let us know what you think would help you join!
Most of the progress so far is visible in the commit log of the
maintenance repository:
https://git.savannah.gnu.org/cgit/guix/maintenance.git/log/?id=e19f6d92b0b9a743c5b3cad236e51b8dd9d7c5e9
There’s IPv6, use of nar-herder to distribute bordeaux.guix substitutes,
I/O performance testing on a possible ci.guix head node
replacement/backup, web site replication, backups over rsync from berlin
to bordeaux, and more.
We have yet to complete support for web site replication: adding nginx
rules on the backup, having guix.gnu.org point to the two hosts, setting
up Let’s Encrypt. This should be within reach quickly.
Other items above are yet to be addressed. Our next priority should be
to have an off-site copy of the ci.guix substitutes.
Overall I think we need to aim for complete redundancy of the main
services. The good news is that this Guix System thing greatly
simplifies the work!
To be continued with a second session sometime in January!
Thanks,
Ludo’.