From: Carlo Zancanaro
To: Ludovic Courtès
Cc: 33508@debbugs.gnu.org
Subject: [bug#33508] [PATCH] gnu: Add ability to restart services on system reconfigure
Date: Tue, 01 Jan 2019 22:25:30 +1100
Message-ID: <87h8eszkk5.fsf@zancanaro.id.au>
In-reply-to: <87lg4yws9a.fsf@gnu.org>
References: <87efb8m5gy.fsf@zancanaro.id.au> <87lg4yws9a.fsf@gnu.org>

Hey Ludo’,

Sorry for not responding to this email for so long. I've been
trying to think through some of the issues around this, and I'm
not confident that I have thought through the issues well enough
to actually decide on a good course of action, beyond what I have
already written. I'll respond to a few specific things in your
message, but I don't even know what a good solution would look
like, let alone how to build it.

On Mon, Dec 10 2018, Ludovic Courtès wrote:
> In what sense is guix-daemon “always safe to restart”? It’s
> actually a difficult question for me.

I agree it's tricky. I had mostly intended that as an example,
because I used guix-daemon for my testing, but ...

> You could argue that its child guix-daemon processes will remain
> live when we restart it, meaning that client connections remain
> active and valid. I believe this is indeed the case, though it
> would be worth double-checking.

... this is what I was thinking. I'm fairly sure this is the case,
given my observations while I was testing these patches.

> Now, if safe-to-restart means that we automatically invoke the
> “restart” action on guix-daemon, that means that anything that
> depends on it (‘guix-publish’, ‘cuirass’, ‘hpcguix-web’, etc.)
> would be restarted as well (even though I *think* we don’t have
> to in this case.) But these may not be safe to restart: for
> example, one may want ‘guix-publish’ to run uninterrupted.

At the moment we have no way to capture this, particularly in the
Shepherd: there's no way to restart a service without also
restarting the services that depend on it. I particularly want to
pick up on the "uninterrupted" point by talking about nginx below.

> ...
> sshd, nginx, and maybe guix-daemon can more or less be
> live-upgraded, meaning that (1) existing connections are
> preserved but future connections will talk to the new daemon,
> and as a corollary, (2) dependent services do not need to be
> stopped & restarted.

I did some research into nginx, and it turns out that it is
possible to upgrade nginx with zero downtime by running two
daemons simultaneously listening on the same port(s), then
shutting down the old daemon after the new one has successfully
started[1].
This allows for an "uninterrupted" upgrade, but I'm
not confident that I would be able to implement it within our
current framework.

In all, I haven't done anything with this in the last month. I've
thought about it a few times, but it just feels a bit
overwhelming.

Carlo

[1]: https://nginx.org/en/docs/control.html#upgrade
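
For reference, the procedure at [1] comes down to a short sequence of
signals sent to the nginx master processes. The sketch below only
builds that sequence as data so the ordering is easy to see; it is not
part of these patches and says nothing about how the Shepherd would
drive it. The names `upgrade_plan` and `send_plan` and the PIDs are
hypothetical, and the rollback branch follows my reading of the
nginx documentation.

```python
import os
import signal


def upgrade_plan(old_master_pid, new_master_pid, new_master_ok=True):
    """Return the ordered (pid, signal name, purpose) steps for an
    nginx on-the-fly binary upgrade, as described at [1].

    Hypothetical helper: it only describes the steps, it sends nothing.
    """
    plan = [
        # The old master starts the new executable; both now accept
        # connections on the same listening sockets.
        (old_master_pid, "SIGUSR2", "start new master next to the old one"),
        # The old master's workers finish their requests and exit.
        (old_master_pid, "SIGWINCH", "gracefully stop old workers"),
    ]
    if new_master_ok:
        plan.append((old_master_pid, "SIGQUIT", "retire old master"))
    else:
        # Roll back: the old master is still alive and can take over.
        plan.append((old_master_pid, "SIGHUP", "restart old workers"))
        plan.append((new_master_pid, "SIGQUIT", "shut down new master"))
    return plan


def send_plan(plan):
    """Send each signal in order (Unix only). A real caller would wait
    and check the new daemon between steps; that is omitted here."""
    for pid, name, _purpose in plan:
        os.kill(pid, getattr(signal, name))
```

The point relevant to this thread is that the old master stays around
until the final step, so there is always a process able to serve
requests, which is exactly the part that does not map onto a plain
stop-then-start "restart" action.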