From: Alex Sassmannshausen
Subject: Thoughts on stateful services in Guix
Date: Sat, 01 Feb 2020 21:38:28 +0100
Message-ID: <87r1zexcu3.fsf@gmail.com>
To: Guix-devel

Hello,

As a result of FOSDEM conversations today I felt inspired to put some thoughts on paper about an area where I think we currently run into complications.

Not sure if it is appropriate to write this blog-post-ish contribution here, but as I don't have a blog, and it's about Guix development, I figured it might be OK.
Best wishes,

Alex


1 Introduction
══════════════

Guix is amazing. A large part of why it continues to be amazing is that it provides strong guarantees to the end user and the developer. As such it makes reasoning about packages and deployments relatively straightforward.

The functional paradigm clearly works fantastically for packages. Unfortunately it is not as clear that it works just as well for services. The reason for this is that too many useful services are inherently stateful. Their statefulness means they have side effects, which in turn cause issues when relying on Guix features such as roll-back, automated deployment and guaranteed reproducibility.

Disciplined developers can implement many services in such a way that their statefulness is delegated to other, dedicated services. In this way the problem of statefulness can be isolated. However, many existing useful services /have not/ been implemented in such a way.

These notes are an attempt to think through ways of formalising how we mitigate stateful services in Guix.

The notes below are organised around the example of a popular PHP content management system, Drupal. Similar problems will apply to many other end-user services.

My intent here is to communicate my thinking in the hope that others can point out obvious flaws in my reasoning, or to stimulate conversation about the topic. I hope at the very least that it is an interesting read!
2 Learning by Doing: Packaging & Deploying Drupal
═════════════════════════════════════════════════

2.1 Problem (1): the Drupal tarball is a binary blob!
─────────────────────────────────────────────────────

Released Drupal tarballs ship with a bunch of PHP dependencies, as well as compiled JS files. A fully source-distributed installation of Drupal would:

1) delete all shipped PHP dependencies;
2) independently build all PHP dependencies and make them available to Drupal (through its vendor directory, as symlinks?);
3) delete all compiled JS files and libraries;
4) independently build the JS dependencies and make them available to Drupal.


2.2 Packages must always be stateless!
──────────────────────────────────────

What does this mean in practice? Let's look at Drupal again.
When you download Drupal, the resulting tarball contains the source code and an empty sites/ folder. The sites/ folder is intended to contain /state/ files. The normal installation procedure is simply to drop the Drupal distribution into your web root directory, so that state files live underneath sites/ within your web root.

In Guix, Drupal is packaged so that it is installed in the store. The store is read-only, hence no /state/ files can live under the sites/ directory in the store. How do we get around this?

1) Either we patch Drupal to expect the sites/ directory outside of its own folder tree,
2) or we symlink /gnu/store/…drupal…/sites/ to a different location on the filesystem (e.g. /var/lib/drupal/sites/).

The latter solution, while easy, means that a *successful* installation of the Drupal package in Guix results in an installation in the store with a *broken symlink* pointing outside the store (at least until something creates the target directory). But the package itself has been rendered stateless!


2.3 Drupal: a stateful service
──────────────────────────────

Services in Guix are rich and multidimensional entities. At their core they are promises of things that will have happened when a system is up and running. These promises can be arbitrary, like generating a configuration file on every boot; or they can be an extension of other, already existing services. A particularly popular service to extend is the shepherd service, which ensures that particular daemons are started as soon as possible after the system has started.

What would a Drupal service look like? Essentially the software is just a bunch of files in a web root, so at its heart it simply extends a web service (e.g. nginx) with new location directives.
On the other hand it also requires that the web server, an SQL backend and php-fpm are running, so that Drupal can actually function. These two requirements are easily met with the usual service infrastructure: simply extend nginx with a location definition pointing to the Drupal folder in the store as the web root, and extend shepherd to require mysql, nginx and php-fpm.


2.3.1 Enter the state dragon
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

But hang on… if the MySQL service is a dependency, and we store state in a MySQL database, what happens when we upgrade to a newer version of Drupal with a different schema? Would this service instantiation change the state of our system? Could we still roll back?

And remember that broken symlink we introduced in the package? That sites/ subdirectory contains /at least/ a stateful configuration file, but also Drupal modules, themes, uploaded files and caches. What happens when we upgrade Drupal? Can we guarantee they won't be irrevocably changed as part of the upgrade? What happens when we roll back?

The truth is that we have introduced state into our deployment, and hence we lose a lot of the beautiful hard guarantees that Guix provides us with.


2.3.2 Fakin' it til we're makin' it
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

Can we get those guarantees back? I fear not. But we can add some shims and mitigations that might take the edge off. We've elaborated, above, the extent of the state problem in Drupal.
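To make the roll-back question from 2.3.1 concrete, here is a deliberately simplified Python model. All names are hypothetical, and this is not how Guix represents system generations. Switching generations only moves a pointer between immutable package sets. The database lives outside that pointer, so a roll-back leaves the old code facing a migrated schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySystem:
    """Hypothetical model: generations are immutable, the database is not."""

    generations: List[int] = field(default_factory=list)  # Drupal version per generation
    current: int = -1                                      # index of the active generation
    db_schema: int = 0                                     # live state shared by all generations

    def reconfigure(self, drupal_version: int) -> None:
        """Add and activate a new generation; activation migrates the database."""
        self.generations.append(drupal_version)
        self.current = len(self.generations) - 1
        # Schema migrations only ever move forward.
        self.db_schema = max(self.db_schema, drupal_version)

    def roll_back(self) -> None:
        """Switch to the previous generation; nothing touches the database."""
        if self.current <= 0:
            raise RuntimeError("no previous generation")
        self.current -= 1

    @property
    def running_version(self) -> int:
        return self.generations[self.current]


system = ToySystem()
system.reconfigure(1)
system.reconfigure(2)
system.roll_back()
# The code is back at version 1, but the schema stayed at version 2.
print(system.running_version, system.db_schema)  # prints: 1 2
```

A state shim has to close exactly this gap between the generation pointer and the live state.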
What would a /state shim/ look like? Working backwards: to ensure we can roll back, we need access to the state at the point in history we want to roll back to. When we roll back from version 2 of Drupal to version 1, we must make sure we still have the state of version 1. Therefore we simply need to make an addressable state dump just before we activate a new service instance, so that when we roll back, we can flip back to that state dump. In the case of our Drupal example, this means that we need to dump the sites/ directory and the MySQL databases.

This state dump needs to fulfil the following criteria:

1) It needs to be reversible: a transaction into a new state needs to make a dump before the transition and then carry out state changes; a transaction into a past state must make a dump before the transition and then load the old state's dump as its new state.
2) It (optionally) needs to be secured: state will likely contain personal information, and this information needs to be protected.


2.4 The stateful service shimmy
───────────────────────────────

Here's the proposed general shim, then:

1) We define a new type of shepherd service, a /stateful service/, which has additional actions: state-dump-shim and state-load-shim.
2) Each stateful service (e.g. drupal) is responsible for providing state dumpers/restorers, through imperative code snippets in the state-dump-shim and state-load-shim fields.
3) Whenever a new service revision is installed:
   1) before it is installed, state-dump-shim is called.
      It should dump the service's state, tagged with the store hash of the revision being replaced, so that whenever state-load-shim is later called with that hash, it can locate the state dumps that were stored for it.
   2) after it is installed, state-load-shim is called with the store hash of the new service revision. It should then be able to locate and load any state associated with this hash. If no state is associated (e.g. when we install a wholly new revision), nothing is done.

What does this give us? When a migration to a new service revision causes a deployed piece of software to stop working as expected, we can run `guix system roll-back`. This will then also restore the service's state to the previous version and thus *should* restore operations.

*BEWARE*: any state changes that occurred since the previous state dump /will/ be lost. We are restoring from a backup, so user-generated state that was added after the backup will not be present after restoration!


3 A generalisation: the Stateful-Service Service
════════════════════════════════════════════════

State dumping and restoration *should* be generalisable. It should normally consist of one or more of the following operations:

• running a special program to dump data, encrypting the dump, and storing it, together with the service revision hash, in a known location;
• tarring up a directory tree, encrypting the archive, and storing it, together with the service revision hash, in a known location.
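Here is a minimal sketch of the dump-before/load-after protocol from 2.4, using the directory-tree operation listed above. It is written in Python purely for illustration; a real implementation would be Guile code behind shepherd actions. `StateStore`, `transition` and the archive naming scheme are hypothetical. Encryption is deliberately omitted; a real store would encrypt each archive with the configured key. Dumps are keyed by the hash of the revision being left. So the first upgrade to a new revision finds nothing to load, while a roll-back finds the dump taken when that revision was last active.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


class StateStore:
    """Hypothetical dump store: one archive per (service, revision hash).

    Encryption is omitted; a real store would encrypt each archive
    (e.g. with a configured GPG key) before writing it.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _archive(self, service: str, revision: str) -> Path:
        return self.root / f"{service}-{revision}.tar.gz"

    def dump(self, service: str, revision: str, state_dir: Path) -> Path:
        """state-dump-shim: archive the contents of STATE_DIR, keyed by REVISION."""
        archive = self._archive(service, revision)
        with tarfile.open(archive, "w:gz") as tar:
            for child in state_dir.iterdir():
                tar.add(str(child), arcname=child.name)  # recursive for directories
        return archive

    def load(self, service: str, revision: str, state_dir: Path) -> bool:
        """state-load-shim: replace STATE_DIR's contents with the dump for REVISION.

        Returns False and leaves STATE_DIR untouched when no dump exists,
        e.g. the first time a wholly new revision is installed.
        """
        archive = self._archive(service, revision)
        if not archive.exists():
            return False
        for child in list(state_dir.iterdir()):
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safe "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(state_dir, filter="data")
            else:
                tar.extractall(state_dir)
        return True


def transition(store: StateStore, service: str, old_rev: str,
               new_rev: str, state_dir: Path) -> bool:
    """Dump under the outgoing revision, activate NEW_REV, then load its state."""
    store.dump(service, old_rev, state_dir)
    # ... activating the new service revision would happen here ...
    return store.load(service, new_rev, state_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sites = Path(tmp) / "sites"
        sites.mkdir()
        settings = sites / "settings.php"
        settings.write_text("schema 1")
        store = StateStore(Path(tmp) / "dumps")

        transition(store, "drupal", "rev-1", "rev-2", sites)  # upgrade: nothing to load
        settings.write_text("schema 2")                        # the upgrade migrates state
        transition(store, "drupal", "rev-2", "rev-1", sites)  # roll-back
        print(settings.read_text())                            # prints: schema 1
```

Replacing the whole state directory, rather than merging into it, is what makes the *BEWARE* caveat above unavoidable. Anything written after the dump is discarded on restore.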
These operations could be provided by a daemon, managed by shepherd, which can be configured with a GPG key for encryption, a mechanism for decryption (interactive or programmatic), a location for storing and retrieving dumps (local or remote?), and a DSL for mapping data dump/restoration program invocations, or file-system locations, to the data dumps it already knows of.
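The configuration for such a daemon might look roughly like the following Python sketch. It only *plans* the dump pipelines and runs nothing. Every class, field, path and key name is hypothetical, and storage is local only. A Guix version would express the same mapping as Scheme records. The `tar` and `gpg` options used (`-C`, `-cf -`, `--encrypt`, `--recipient`, `--output`, `--batch`, `--yes`) are standard. The `mysqldump` database name and the revision hash are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Argv = Tuple[str, ...]


@dataclass(frozen=True)
class CommandSource:
    """State produced by a dump program that writes to stdout."""
    name: str
    dump: Argv
    restore: Argv  # reads the dump back from stdin


@dataclass(frozen=True)
class TreeSource:
    """State that lives in a directory tree."""
    name: str
    path: str


Source = Union[CommandSource, TreeSource]


@dataclass
class StateDaemonConfig:
    gpg_recipient: str  # key every dump is encrypted to
    storage_dir: str    # local storage only in this sketch
    services: Dict[str, List[Source]] = field(default_factory=dict)

    def dump_pipelines(self, service: str, revision: str) -> List[Tuple[Argv, Argv]]:
        """Plan `producer | gpg` pipelines for SERVICE at REVISION (nothing is run)."""
        plans = []
        for src in self.services.get(service, []):
            if isinstance(src, CommandSource):
                producer = src.dump
            else:
                # Archive the tree relative to its root so it restores in place.
                producer = ("tar", "-C", src.path, "-cf", "-", ".")
            target = f"{self.storage_dir}/{service}-{revision}-{src.name}.gpg"
            encrypt = ("gpg", "--batch", "--yes", "--encrypt",
                       "--recipient", self.gpg_recipient, "--output", target)
            plans.append((producer, encrypt))
        return plans


# Hypothetical mapping for the Drupal example: the database plus sites/.
config = StateDaemonConfig(
    gpg_recipient="admin@example.org",
    storage_dir="/var/lib/state-dumps",
    services={
        "drupal": [
            CommandSource("db", ("mysqldump", "--databases", "drupal"), ("mysql",)),
            TreeSource("sites", "/var/lib/drupal/sites"),
        ],
    },
)

for producer, encrypt in config.dump_pipelines("drupal", "abc123"):
    print(" ".join(producer), "|", " ".join(encrypt))
```

Restoration would mirror this. The daemon would pipe `gpg --decrypt` into each source's `restore` program, or into `tar -x -C` for directory trees.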