From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54837) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1frrYw-0003sx-Gy for guix-patches@gnu.org; Mon, 20 Aug 2018 17:18:11 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1frrYo-0000cy-Rp for guix-patches@gnu.org; Mon, 20 Aug 2018 17:18:10 -0400 Received: from debbugs.gnu.org ([208.118.235.43]:50952) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1frrYo-0000cq-O6 for guix-patches@gnu.org; Mon, 20 Aug 2018 17:18:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1frrYo-0003wN-E8 for guix-patches@gnu.org; Mon, 20 Aug 2018 17:18:02 -0400 Subject: [bug#32408] [PATCH shepherd] Allow replacement of services Resent-Message-ID: References: <87wosz4spx.fsf@zancanaro.id.au> <87h8jobwwp.fsf@gnu.org> From: Carlo Zancanaro In-reply-to: <87h8jobwwp.fsf@gnu.org> Date: Tue, 21 Aug 2018 07:16:48 +1000 Message-ID: <87k1okzqkf.fsf@zancanaro.id.au> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+kyle=kyleam.com@gnu.org Sender: "Guix-patches" To: Ludovic =?UTF-8?Q?Court=C3=A8s?= Cc: 32408@debbugs.gnu.org --=-=-= Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Hey Ludo, On Tue, Aug 21 2018, Ludovic Court=C3=A8s wrote: > We=E2=80=99ll still want to be able to special-case things like nginx=20 > that can be =E2=80=9Chot-replaced=E2=80=9D, though. So perhaps, in addit= ion to=20 > this patch on the Shepherd side, we=E2=80=99ll need extra stuff in (gnu=20 > services shepherd). Yeah, if we expose the replacement field directly, then we'll need=20 some supporting code in (gnu services shepherd), even if it's just=20 to detect whether the service is stopped or not before doing a=20 replacement. Although ideally our interface wouldn't introduce=20 race conditions like that. (See below for more thoughts on this.) > For instance, the =E2=80=98actions=E2=80=99 field of c= ould,=20 > by default, include an =E2=80=9Cupgrade=E2=80=9D action that simply sets = the=20 > =E2=80=98replacement=E2=80=99 slot. For nginx, we=E2=80=99d provide a cu= stom =E2=80=9Cupgrade=E2=80=9D=20 > action that does =E2=80=9Cnginx -s restart=E2=80=9D or whatever it is tha= t needs=20 > to be done. > > =E2=80=98guix system reconfigure=E2=80=99 would automatically invoke the= =20 > =E2=80=98upgrade=E2=80=99 action for each new service. > > WDYT? How many services can we meaningfully upgrade like this? My=20 understanding is that most of our system services a fed an=20 immutable configuration file, and thus restarting/reloading won't=20 actually upgrade them. In order to make an upgrade action work the=20 service would have to mutate itself into a new correct=20 configuration, as well as restarting/reloading the underlying=20 daemon. It's even trickier if the daemon itself has been upgraded,=20 because then the process will have to be restarted anyway. At any rate, I think the replacement mechanism (this patch) is=20 just one way that a service can be reloaded. It would probably be=20 a good idea to create a higher-level abstraction over it. I think=20 other mechanisms (like a upgrade/reload action) should be handled=20 on the Guix side of things. >> + (let ((replacement (slot-ref service 'replacement))) >> + (define (copy-slot! slot) >> + (slot-set! service slot (slot-ref replacement slot))) >> + (when replacement >> + (copy-slot! 'provides) >> + (copy-slot! 'requires) >> + (copy-slot! 'respawn?) >> + (copy-slot! 'start) >> + (copy-slot! 'stop) >> + (copy-slot! 'actions) >> + (copy-slot! 'running) >> + (copy-slot! 'docstring)) >> + service)) > > Having a hardcoded list of slots sounds error-prone=E2=80=94surely we=E2= =80=99ll=20 > forget to update it down the road. I wonder what else could be=20 > done. > > One option would be to grab the block asyncs and atomically=20 > replace the service in the =E2=80=98%services=E2=80=99 hash table. Then = we only=20 > need to copy the =E2=80=98last-respawns=E2=80=99 slot to the new service,= I=20 > believe. (This changes the object identity of the service but I=20 > think its OK.) > > Another option would be to use GOOPS tricks to iterate over the=20 > list of slots and have a list of slots *not* to copy. I=E2=80=99m not a= =20 > big fan of this option, though. My favourite option for this would be to separate the =20 object into an immutable and a mutable .=20 The object would have a reference to a =20 object in order to invoke actions on it, and it could also hold a=20 second object as a replacement. Then the swap would be=20 much more straightforward. I haven't done any real work towards=20 this, though. In the short term, I'd rather replace it in the %services hash=20 table. I did it by copying slots because I wasn't sure I would get=20 the details of the swap right and didn't have time to properly=20 work out how to do it. I'll give it a go! >> +(let ((service (lookup-running 'test))) >> + (slot-set! service 'replacement >> + (make > > I wonder if we should let users fiddle with =E2=80=98replacement=E2=80=99= =20 > directly, or if we should provide a higher-level construct. > > For instance, =E2=80=98register-services=E2=80=99 could transparently set= the=20 > =E2=80=98replacement=E2=80=99 field for services already registered inste= ad of=20 > doing: > > (assert (null? (lookup-services (canonical-name new)))) > > Not sure if there are cases where this behavior would be=20 > undesirable, though. > > Thoughts? With this current patch the replacement field is only checked at=20 the point when the service is stopped, so the field could only be=20 set when the service is actually running. I think it makes the=20 most sense to just replace the service directly if it's not=20 stopped. I can't think of any undesirable cases, but having a higher-level=20 interface is a good idea. At the very least we need to control the=20 inherent race condition involved in (if running? do-x do-y) for if=20 the service is stopped after the running? check. At the moment I=20 think the only thing we have to worry about there is signals, but=20 if we're going to move to have more parallelism through fibers=20 then we might need to be even more careful. I'll try to send through an updated patch later this week. Carlo --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEE1lpncq7JnOkt+LaeqdyPv9awIbwFAlt7L8AACgkQqdyPv9aw IbyKtw//X2xkrVY9HAtGQ7ROk1k7PMGAEFpCJyxEFbcMh3BkbmJxbhdX5Gov1+3M U67UX/t2D239Q07W28FPw5psm2PPowSZMx+QIcl+CA2H3efPuRmgdXlAPuMzn8gn CZxLcDoS9lhB1vFGHIG6AkHVzGFj6OB2HhDNG/id/KyumUOPMRrqrNDTtt9+60pB X/At8EpNp1QGtgq4ZOVXvmEufDX4TWzSdAFYh82AiiTuZZFeDhqZ7zDEk8jB/ng9 MPtyt8RdWfnqmxgzwHWf/Dniiow8nHw9Z9mYX+P4g7/6ZRWvDTLeGFKn5fjG9fVz Ntg71O7DDyuM7bm24R2/JW7cN1Qv9gP4AYOAfrYxJgzNntNmnOVniZyFRS+dtm0p z6iF/LqsDkit9eRnWW5K8MuEu9AwtV8HTANNmGgOnU4Oc7soVzBJnbvJYuxTPEm7 rfliFke/HNGBnvSfoXa5JzGPYJz8wVlnMdUc7CvATVobUqty4sJ4NbSLQDxNDlSE 9uTZH5irxE5E2RY1aPgsl+4uU1KbMw+rah5d40pe0ddVu1WI0llSmij0IpJTaFm9 +IzMoaiJhrk0ceocs8aEcmx9gjl09ELcUiEIvoorz/Z9NMdLLJ04eJHlWNXgTgu7 QA37arySGg6zxnQOiExJC+/B6l+7hw341+pG+YDrA6gBFIpp9ys= =5vP+ -----END PGP SIGNATURE----- --=-=-=--