From: Giovanni Biscuolo
Subject: Re: plz is there a roadmap for a more resilient substitutes infrastructure?
Date: Sun, 11 Nov 2018 19:56:45 +0100
Message-ID: <87o9av4f4i.fsf@roquette.mug.biscuolo.net>
References: <87wopv7jzw.fsf@roquette.mug.biscuolo.net> <87tvku5u1y.fsf@gnu.org>
In-Reply-To: <87tvku5u1y.fsf@gnu.org>
To: Ludovic Courtès
Cc: guix-devel@gnu.org

Hi!

Sorry for my late reply.

I confess I still haven't read the whole Guix/GuixSD Reference Manual, so
my apologies if I'm asking something already documented :-S

ludo@gnu.org (Ludovic Courtès) writes:

[...]

> We Guix developers don’t have control over the physical hardware behind
> hydra.gnu.org; for this machine, we rely on the work of the FSF
> sysadmins for all things hardware/networking.

OK, thanks for this info.

> Unfortunately in this case, this maintenance period was rather
> unprepared: it wasn’t supposed to last a whole week, rather a few hours
> or a day at most.  Most of the time it took was about copying data to a
> new disk (!).

Is it published somewhere what the minimum hardware and disk requirements
are for a complete GuixSD distribution build server?

> Had this been prepared, we could have arranged to keep
> hydra.gnu.org up until the replacement was ready.  We Guix developers
> didn’t have much visibility over what was going on though, and we just
> didn’t anticipate this.

Sorry about that. I'm a sysadmin and I know how much my work impacts
others :-)

> It is clear that this prolonged downtime was harmful to many users and
> to the project’s reputation.

GuixSD does not deserve this kind of harm :-(

> What to do from here?

I once saw the existence of
https://git.savannah.gnu.org/cgit/guix/maintenance.git [1], which you
pointed me to (below), but I did not read the entire tree.

Now I see we have
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/doc/1.0.org

Should we add a new "super" task named "resilience of substitutes
network"?

Looking at
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/machines.scm
it seems that some degree of resilience for hydra.gnu.org is already in
place, but this does not seem to work as a distributed source of
substitute servers, "just" to offload build jobs to the defined list of
build servers.

Could the servers in "machines.scm" also be used as substitute servers?

> Our main focus is on making berlin.guixsd.org the primary build farm of
> the project.  It has the advantage that one Guix dev has physical access
> to it (Ricardo); it’s also much more powerful than hydra.gnu.org and the
> associated build machines.

OK, I see it:
https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/doc/1.0.org#n30

More details could help fix related issues.

IMHO a public guixsd.org Sysadmins Manual should be in the roadmap (as
MAYBE): that could help the core team's job, show the community how the
job is done *and* help others build on our best practices.

Guix/GuixSD is *the* perfect tool for IaC (infrastructure as code); it
could be *very* interesting to develop a "Literate GuixSD IaC package" as
a meta-project :-)

Maybe we could (slowly) build a reproducible IaC literate devops
document, based on org-mode babel, so we'd have both tangled code and
exported documentation.

> Yet, there’s more work to do: berlin has just 1T of disk space.  Ricardo
> started looking on growing it but was stuck on software issues IIRC.  I
> think fixing this should be a priority, so I think we should help
> Ricardo fix the software issues as much as we can.

I realize I'm pretty new in this community and you can't trust me since
we do not even know each other... but I could help if needed; just tell
me (in private if more appropriate) what's the hardware issue.

> That alone doesn’t fix the resilience issue: berlin.guixsd.org could go
> down at some point for some time.
>
> To address that, a possibility that was discussed recently on
> guix-sysadmin is use bayfront.guixsd.org has a separate build farm

I guess you meant "use bayfront.guixsd.org *as* a separate build farm".

> and/or mirror of berlin.

[...]

>> given the prolonged issue, please also consider writing an *official*
>> blog post explaining the current situation and steps adopted to prevent
>> similar issues in the future
>
> We set up the info-guix mailing list with that in mind (but too late for
> this incident).  Posting blog posts is also a good idea; we should have
> done that, with instructions on how to switch to berlin.guixsd.org.

Given the impact on the project's reputation, please consider a
"post-mortem" blog post on what happened: something in line with Ludo's
reply to me.

Not all interested users and observers read the archives of this (and
other) mailing lists.

>> 1. is there a method to "replicate the whole store of an official server
>> (e.g. hydra.gnu.org once healed)" so we can just "guix publish" a
>> *complete* mirror? In this case a ready to use official
>> mirror-config.scm could be useful
>
> mirror.hydra.gnu.org is a simple nginx proxy to hydra.gnu.org.  You can
> find its config here:
>
>   https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf

OK, so it's a caching proxy.

I'll see if and how I can build a similar one.

Sorry, but I still don't understand why mirror.hydra.gnu.org failed to
serve substitutes during a 0.15 installation started from the install CD:
was it a cache size problem?

> In the past a few people set up their own mirrors using a similar
> configuration.

We should build a network of organizations and individuals for this.

>> 2. is there an official mirrors directory users can look at when needed?
>
> No.

I volunteer to keep such a list and coordinate the "volunteers network",
if you want.

>> 3. is there a plan to build a service similar to
>> http://httpredir.debian.org/? (I looked on the web but did not find any
>> reference to such plan)
>
> Like I wrote, there’s no concrete plan at this point, which means it’s
> an opportunity for you and anyone else to chime in and give a hand!

I have no experience in building such a service, but it definitely fits
in my professional enhancement plan: I'm not yet able to lead such a
project, but I can help.

ciao
Giovanni

-- 
Giovanni Biscuolo

Xelera IT Infrastructures