From: Giovanni Biscuolo <g@xelera.eu>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: plz is there a roadmap for a more resilient substitutes infrastructure?
Date: Sun, 11 Nov 2018 19:56:45 +0100
Message-ID: <87o9av4f4i.fsf@roquette.mug.biscuolo.net>
In-Reply-To: <87tvku5u1y.fsf@gnu.org>
Hi!
sorry for my late reply
I confess I still haven't read the whole Guix/GuixSD Reference Manual,
so my apologies if I'm asking something already documented :-S
ludo@gnu.org (Ludovic Courtès) writes:
[...]
> We Guix developers don’t have control over the physical hardware behind
> hydra.gnu.org; for this machine, we rely on the work of the FSF
> sysadmins for all things hardware/networking.
OK, thanks for this info
> Unfortunately in this case, this maintenance period was rather
> unprepared: it wasn’t supposed to last a whole week, rather a few hours
> or a day at most. Most of the time it took was about copying data to a
> new disk (!).
are the minimum hardware and disk requirements for a complete GuixSD
distribution build server published somewhere?
> Had this been prepared, we could have arranged to keep
> hydra.gnu.org up until the replacement was ready. We Guix developers
> didn’t have much visibility over what was going on though, and we just
> didn’t anticipate this.
sorry about that, I'm a sysadmin and I know how much my work
impacts others :-)
> It is clear that this prolonged downtime was harmful to many users and
> to the project’s reputation.
GuixSD does not deserve this kind of harm :-(
> What to do from here?
I once saw the existence of
https://git.savannah.gnu.org/cgit/guix/maintenance.git [1], which you
pointed me to (below), but I did not read the entire tree
now I see we have
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/doc/1.0.org
should we add a new "super" task named "resilience of substitutes
network"?
looking at
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/machines.scm
it seems that some degree of resilience for hydra.gnu.org is already in
place, but it does not seem to work as a distributed set of
substitute servers, "just" as a way to offload build jobs to the defined
list of build servers
could servers in "machines.scm" also be used as substitutes servers?
> Our main focus is on making berlin.guixsd.org the primary build farm of
> the project. It has the advantage that one Guix dev has physical access
> to it (Ricardo); it’s also much more powerful than hydra.gnu.org and the
> associated build machines.
OK, I see it
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/doc/1.0.org#n30
more details could help fix related issues
IMHO a public guixsd.org Sysadmins Manual should be in the roadmap (as
MAYBE): that could help the core team do its job, show the community how
the job is done *and* help others build on our best practices
Guix/GuixSD is *the* perfect tool for IaC (infrastructure as code), so
it could be *very* interesting to develop a "Literate GuixSD IaC package"
as a meta-project :-)
maybe we could (slowly) build a reproducible IaC literate devops
document, based on org-mode babel, so we'd have both tangled code and
exported documentation
> Yet, there’s more work to do: berlin has just 1T of disk space. Ricardo
> started looking on growing it but was stuck on software issues IIRC. I
> think fixing this should be a priority, so I think we should help
> Ricardo fix the software issues as much as we can.
I realize I'm pretty new to this community and you can't trust me since
we do not even know each other... but I could help if needed, just tell
me (in private if more appropriate) what's the hardware issue
> That alone doesn’t fix the resilience issue: berlin.guixsd.org could go
> down at some point for some time.
>
> To address that, a possibility that was discussed recently on
> guix-sysadmin is use bayfront.guixsd.org has a separate build farm
guess you meant "use bayfront.guixsd.org *as* a separate build farm"
> and/or mirror of berlin.
[...]
>> given the prolonged issue, please also consider writing an *official*
>> blog post explaining the current situation and steps adopted to prevent
>> similar issues in the future
>
> We set up the info-guix mailing list with that in mind (but too late for
> this incident). Posting blog posts is also a good idea; we should have
> done that, with instructions on how to switch to berlin.guixsd.org.
given the impact on the project's reputation, please consider a
"post-mortem" blog post on what happened: something in line with Ludo's
reply to me
not all interested users and observers read this (and other) mailing
list archives
>> 1. is there a method to "replicate the whole store of an official server
>> (e.g. hydra.gnu.org once healed)" so we can just "guix publish" a
>> *complete* mirror? In this case a ready to use official
>> mirror-config.scm could be useful
>
> mirror.hydra.gnu.org is a simple nginx proxy to hydra.gnu.org. You can
> find its config here:
>
> https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf
OK, so it's a caching proxy
I'll see if and how I can build a similar one
sorry but I still don't understand why mirror.hydra.gnu.org failed
to serve substitutes during a 0.15 installation started from the install
CD: was it a cache size problem?
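to make the question concrete, here is a toy model (in Python, *not* the
actual nginx setup) of a caching proxy in front of an origin that goes
down: whatever is still in the cache keeps being served, anything that
was never cached or was evicted fails. `CachingProxy`, `OriginDown`,
`fetch_origin` and `max_entries` are made-up names for illustration, and
whether this is what really happened to mirror.hydra.gnu.org is exactly
what I'm asking

```python
from collections import OrderedDict


class OriginDown(Exception):
    """Raised by the (hypothetical) origin fetcher when upstream is unreachable."""


class CachingProxy:
    """Toy LRU caching proxy: a sketch, not how nginx is implemented."""

    def __init__(self, fetch_origin, max_entries):
        self.fetch_origin = fetch_origin  # callable: path -> bytes, may raise OriginDown
        self.max_entries = max_entries    # stand-in for the cache size limit
        self.cache = OrderedDict()        # path -> bytes, least recently used first

    def get(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)  # mark as recently used
            return self.cache[path]
        body = self.fetch_origin(path)    # cache miss: the origin must be reachable
        self.cache[path] = body
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return body
```

in this model a small cache plus a long origin downtime means only the
most recently requested items survive, so a fresh installation asking
for many different substitutes would mostly hit misses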
> In the past a few people set up their own mirrors using a similar
> configuration.
we should build a network of organizations and individuals for this
>> 2. is there an official mirrors directory users can look at when needed?
>
> No.
I volunteer to keep such a list and coordinate the "volunteers network",
if you want
>> 3. is there a plan to build a service similar to
>> http://httpredir.debian.org/? (I looked on the web but did not find any
>> reference to such plan)
>
> Like I wrote, there’s no concrete plan at this point, which means it’s
> an opportunity for you and anyone else to chime in and give a hand!
I have no experience in building such a service but it definitely fits
in my professional enhancement plan, so I'm not yet able to lead such
a project but I can help
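as a starting point for discussion, here is a rough sketch (Python) of
what the core of such a service might do: walk a list of mirrors and
return the first one that answers for a given narinfo. The two hosts are
the ones named in this thread (the https:// scheme is my assumption),
the /<hash>.narinfo path is the layout `guix publish` serves, and
`pick_mirror`, `http_probe` and the `probe` parameter are hypothetical
names; a real redirector would also need health checks, geography,
signatures and so on

```python
from urllib.error import URLError
from urllib.request import Request, urlopen

# Hosts mentioned in this thread; the list itself is only an example.
MIRRORS = [
    "https://berlin.guixsd.org",
    "https://mirror.hydra.gnu.org",
]


def http_probe(url, timeout=5):
    """Return True if a HEAD request to url answers with status 200."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Covers HTTP errors (e.g. 404), DNS failures and timeouts.
        return False


def pick_mirror(store_hash, mirrors=MIRRORS, probe=http_probe):
    """Return the first mirror that has <store_hash>.narinfo, or None."""
    for base in mirrors:
        if probe(f"{base}/{store_hash}.narinfo"):
            return base
    return None
```

the `probe` argument is there so the selection logic can be tried
without touching the network, e.g. `pick_mirror("abc", probe=lambda u: False)`
returns None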
ciao
Giovanni
--
Giovanni Biscuolo
Xelera IT Infrastructures