From: Christopher Baines
Subject: Re: Experiment in generating multi-layer Docker images with guix pack
Date: Thu, 26 Mar 2020 20:15:09 +0000
Message-ID: <87wo76kh4i.fsf@cbaines.net>
References: <20200321232428.31832-1-mail@cbaines.net> <87zhc39vcs.fsf@gnu.org>
In-reply-to: <87zhc39vcs.fsf@gnu.org>
List-Id: "Development of GNU Guix and the GNU System distribution."
To: Ludovic Courtès
Cc: guix-devel@gnu.org
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"

--=-=-=
Content-Type: text/plain; charset=utf-8

Ludovic Courtès writes:

> Christopher Baines skribis:
>
>> I think it could be useful to support multiple different strategies for
>> generating layers for Docker images, with different trade-offs. This approach
>> using two layers should make the resulting images more efficient to use in the
>> case where like the guile example above, where the packages you run guix pack
>> with have exactly matching inputs.
>
> Did you read ?
> They came up with a pretty smart algorithm that would be worth copying.

I'm aware of it, but I haven't read it in detail yet.

>> As well as these behaviour changes, these patches also modify the
>> implementation.
>> Rather than having some build side code that's used in the
>> pack and vm module gexpressions, these patches introduce two new record types:
>> and . This at least structures the
>> derivations so that each layer is represented by a derivation, and then
>> there's a derivation for the image itself, which is a little more efficient in
>> terms of computation.
>
> Nice.
>
> I think a layering algorithm like Graham Christensen’s above requires
> knowledge of the reference graph, meaning that layering can only be
> computed on the build side, using #:references-graphs. In that case, it
> could be that you can’t have a host-side record.

As I understand it, you only have to do the computation on the build
side if you're restricted to doing a single set of builds. If you first
build the store items you want to put in the image, then look at their
references and compute the derivation for building the image, then you
could do this kind of computation on the client side.

But yeah, this is important to work out, since how image generation
should work, and which behaviours we want, should define the structure
of the code.

I went with records to represent layers partly because I'm familiar
with that approach, but also because it allows for easier manipulation
of layers on the client side. Representing different layers as different
derivations also allows them to potentially be built in parallel,
although I'm not sure how beneficial this might be.

Related to this, at the moment Docker V1 images can be generated; it
would be good in the future to also support Docker V2 images and OCI
images. All three container formats use a layered approach to managing
the files, but they are all different (as far as I'm aware).
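
To make the client-side idea above a bit more concrete, here is a
minimal sketch in Python (not Guile, and not the code from the patches).
It assumes the store items have already been built and that their
references are available as a plain mapping. The item names, the
`references` data and the `two_layers` helper are all made up for
illustration, and the split shown (dependencies in one layer, requested
items in another) is just one simple strategy, not the algorithm
mentioned earlier in the thread.

```python
from typing import Dict, List, Set


def closure(roots: List[str], references: Dict[str, List[str]]) -> Set[str]:
    """Return every item reachable from the roots by following references."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        item = stack.pop()
        if item in seen:
            continue  # also copes with items that reference themselves
        seen.add(item)
        stack.extend(references.get(item, []))
    return seen


def two_layers(roots: List[str],
               references: Dict[str, List[str]]) -> List[List[str]]:
    """Split a closure into [dependencies, requested items]."""
    requested = set(roots)
    dependencies = closure(roots, references) - requested
    return [sorted(dependencies), sorted(requested)]


# Hypothetical reference data, as it might look after building the items.
references = {
    "guile": ["guile", "glibc", "gcc-lib"],
    "gcc-lib": ["gcc-lib", "glibc"],
    "glibc": ["glibc"],
}

layers = two_layers(["guile"], references)
print(layers)
```

Because the references are read after the first set of builds, the
layering decision here is ordinary client-side code, and its result
could then be turned into whatever derivations build the image.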

In my mind there are three architectural approaches:

 - Image generation entirely on the build side

   - The layers and the image are constructed through one derivation

   - The code for building images is in a module available at build
     time

   - Different approaches for layering are implemented in the module
     available at build time, and parameters are passed in as
     data/gexpressions

 - Image generation entirely on the client side

   - Each layer is a derivation, and the image is an additional
     derivation that takes the layers as an input

   - The code for building images is inside gexp compilers for the
     record types representing the images and layers

   - Different approaches for layering manipulate the layer records on
     the client side

 - Image generation can be done both build and client side

   - Depending on the parameters, the layers and image can be a single
     derivation, or one for each layer, and another for the image

   - The code for building images is in a module available at build
     time, and this is also used by gexp compilers

   - Different approaches for layering have the option of either being
     on the build side, or the client side

What are people's thoughts?
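
As a rough illustration of how the derivation structure differs between
these options, here is a small Python model. It only captures which
build steps exist and what each takes as input; `Layer`, `Image`, `Step`
and `plan` are hypothetical names for this sketch and do not correspond
to anything in Guix.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    items: Tuple[str, ...]  # store items placed in this layer


@dataclass(frozen=True)
class Image:
    name: str
    layers: Tuple[Layer, ...]


@dataclass(frozen=True)
class Step:
    """Stand-in for a derivation: one output built from some inputs."""
    output: str
    inputs: Tuple[str, ...]


def plan(image: Image, per_layer: bool) -> List[Step]:
    """Return the build steps for an image.

    per_layer=False models the single-derivation (build side) option;
    per_layer=True models one step per layer plus one for the image.
    """
    if not per_layer:
        items = tuple(i for layer in image.layers for i in layer.items)
        return [Step(image.name, items)]
    steps = [Step(layer.name, layer.items) for layer in image.layers]
    steps.append(Step(image.name, tuple(l.name for l in image.layers)))
    return steps


image = Image("image", (Layer("layer-0", ("glibc",)),
                        Layer("layer-1", ("guile",))))

for step in plan(image, per_layer=True):
    print(step.output, "<-", ", ".join(step.inputs))
```

In this model the third option is simply the `per_layer` parameter: the
same description of the image can yield either a single step or one
step per layer, and in the latter case the layer steps do not depend on
each other, so they could be built in parallel.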

Thanks,

Chris

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQKTBAEBCgB9FiEEPonu50WOcg2XVOCyXiijOwuE9XcFAl59DU1fFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcACgkQXiijOwuE
9XcB8Q/+NBHacCtXw0G17G0HYNWW66wHr47MSq8bBzxuAT8sv6lANAEyY8OcyDUU
6f4rtJf9wvnmt1zCYoWghtxmtYwNDW2FBVwzPqV0IechCZ1PFTpJ3wbUMzFkTvau
o4U13kcP5Jd6T7g+bIOs3M8z1rArnAt6BoQzoPenJSi1L2tVdnDfx8QKTsorOfns
qpa7z2+1Q2xDxnVOwJnz5ANgtQ/yvLZ1QP0158umiEfsxSFH9TMFaFgiwuuWxFyg
O1mUD6kbeZkV61BClwzaiDiQiT/pVhfUtk1zIffLBUrO5SlPXiiuWbiuvQGEIhD+
rmA3rGd9fUJrEjSf/OZSqM2L3hqLMR5kAcDd8Te93YxUOWAfzpjfJBNMJER3w2aE
MZnG8bumhlwx5TylTgQt/I1k98zxctcbf1cun1AfgM4lFfuX1TpLvRWBReH51S0x
d/2X1s8zS2cHuT8t2wcuXPFx4vF9hAI/VPoF6l/aeHaL6e4tQfM1YuYu1kk8XW5u
brshB7dun1CSUrEPiNuQfBWmKqnGCP0V/w8ZV+Ee/mCx4X2H+USTRjQ283MIa75Q
K4uLCP78DDYFDhNwJ89PpyDun6W/vpjUx/DhjBlUZgmIUftg/IwHAoK5E70872ay
OTSncrcvxLtYNtKct6XJSqACbEHdr4NcvCWSl1Wo0GqsE8AlD8k=
=m/Iu
-----END PGP SIGNATURE-----
--=-=-=--