From: Gábor Boskovits
Subject: Content addressable store
Date: Wed, 15 May 2019 10:33:18 +0200
To: Guix-devel

On IRC on May 14, 2019, the idea of a content-addressable store was discussed.

It is also discussed here:
page 143 (or 135) of https://nixos.org/~eelco/pubs/phd-thesis.pdf, on the intensional model,
and https://github.com/NixOS/nix/issues/296.

Thanks for the links, roptat.

So, after reading this, an initial idea came up, which looks like this:

1. solve the content addressability problem as proposed in the thesis:
- build the derivation as we do now
- rewrite the self-references to a known constant
- compute the hash after the rewrite
- relocate the package to the store path indicated by the new hash

2. after the packager builds the package, the content address can be added to the definition

3. fail the package build if it has a content address that does not match the produced artifact

4. use flags to allow installing to the original path and to the content-addressed path.
I propose defaults such that the package installs to the original path if no content
address is specified, and to the content-addressed path if the content address is specified.
(This might come in handy in the transitional period, so that we can install the package to both locations.)
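
To make steps 1, 3 and 4 concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not how Guix or Nix implement it: an output is modelled as an in-memory dict of file contents, the "known constant" is 32 "e" characters, and the address is a truncated SHA-256 hex digest rather than the store's real hash encoding. All function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional

# Assumption: the placeholder has the same length as the hash part of a
# store path, so rewriting never changes file sizes or offsets.
PLACEHOLDER = b"e" * 32


def content_address(files: Dict[str, bytes], self_hash: bytes) -> str:
    """Step 1: hash the output after rewriting self-references to PLACEHOLDER."""
    if len(self_hash) != len(PLACEHOLDER):
        raise ValueError("self hash and placeholder must have equal length")
    digest = hashlib.sha256()
    for name in sorted(files):  # fixed order keeps the hash deterministic
        normalized = files[name].replace(self_hash, PLACEHOLDER)
        digest.update(name.encode() + b"\0")
        digest.update(len(normalized).to_bytes(8, "big"))
        digest.update(normalized)
    # Truncated so the address has the same length as the hash it replaces.
    return digest.hexdigest()[:32]


def relocate(files: Dict[str, bytes], self_hash: bytes, address: str) -> Dict[str, bytes]:
    """Step 1, last bullet: point self-references at the content-addressed path."""
    return {name: data.replace(self_hash, address.encode())
            for name, data in files.items()}


def check_declared_address(declared: Optional[str], actual: str) -> None:
    """Step 3: fail the build when a declared address does not match."""
    if declared is not None and declared != actual:
        raise RuntimeError(
            f"content address mismatch: declared {declared}, built {actual}")


def install_hash(derivation_hash: str, declared: Optional[str]) -> str:
    """Step 4 default: original path unless a content address is declared."""
    return declared if declared is not None else derivation_hash
```

In this sketch, two builds of the same package that differ only in the hash part of their own store path get the same content address, because the self-references are normalized before hashing.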

There are two issues with the approach:
1. only reproducible packages can be content addressed
2. when a package has a content address, dependents resolve to that address, which opens up the possibility that the package points to the output of a derivation other than the one defined in the package. In the discussion it was concluded that, since a user of a channel trusts the channel code, malicious injection can be ignored. What might still happen is that a package is updated but its content address is not, so the dependents still resolve to the old content address and have no way of knowing that the package definition does not actually build. With proper workflow support this might be manageable.

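The stale-address case in issue 2, and one possible shape of the "workflow support", could look like the following Python sketch. It is purely illustrative: the `Package` class, `build_address`, `resolve` and `stale_packages` are hypothetical names, and hashing the `source` field merely stands in for building the package and computing its content address.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Package:
    name: str
    source: bytes  # stands in for everything the build depends on
    content_address: Optional[str] = None


def build_address(pkg: Package) -> str:
    """Hypothetical stand-in for 'build the package and compute its address'."""
    return hashlib.sha256(pkg.source).hexdigest()[:32]


def resolve(pkg: Package) -> str:
    """What a dependent references: the declared address if there is one."""
    return pkg.content_address or build_address(pkg)


def stale_packages(packages: List[Package]) -> List[str]:
    """Names of packages whose declared address differs from a fresh build."""
    return [p.name for p in packages
            if p.content_address is not None
            and p.content_address != build_address(p)]


hello = Package("hello", b"hello-2.10")
hello.content_address = build_address(hello)  # step 2: packager records it
hello.source = b"hello-2.11"                  # update, address forgotten
old_address = resolve(hello)                  # dependents still get the old one
needs_attention = stale_packages([hello])     # a lint-style check notices
```

A check like `stale_packages` would have to rebuild every addressed package, so in practice it would probably run in CI or a lint step rather than on every user's machine.
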
Benefits of this approach:
- the content addresses do not need a centralized database
- the complications resulting from derivations building to different outputs are eliminated
- a very good reproducibility indicator is gained
- it can peacefully coexist with our current store.

Wdyt?
