From: Simon Tournier
To: Maxime Devos, Guix Devel
Subject: Re: intrinsic vs extrinsic identifier: toward more robustness?
In-Reply-To: <09d3d861-0390-3ce6-30c7-22a1e2685787@telenet.be>
References: <87jzzxd7z8.fsf@gmail.com> <09d3d861-0390-3ce6-30c7-22a1e2685787@telenet.be>
Date: Sun, 05 Mar 2023 21:21:18 +0100
Message-ID: <86sfej0x1d.fsf@gmail.com>
List-Id: "Development of GNU Guix and the GNU System distribution."

Hi Maxime,

Thanks for your comments.

On Sat, 04 Mar 2023 at 01:08, Maxime Devos wrote:

> To my understanding, there is only one 'real' identifier in Guix: the
> (sha256sum (base32 ...)) (*).  Those other identifiers like the URL in
> url-fetch and git-fetch are just hints on where to find the object --
> very important hints without which finding the object is much more
> likely to fail, but just hints nonetheless.

I am not sure I understand what you mean by “hint”.  I would not call
URLs something like “just hints on where to find the object”.
NAR+SHA256 is only the ’real’ identifier when you allow substitutes.
Otherwise, Guix fetches using the ’uri’ field of the ’origin’.  And
that’s the scenario I am envisioning here: if, for whatever reason, all
the data in the Bordeaux and Berlin stores is gone, then “guix
time-machine” is in for a hard time.

>> Intrinsic identifier also relies on a (trusted) map but collisions are
>> avoided as much as possible.  Somehow it strongly reduces the power of
>> the authority and it is often more robust.
>
> Who is 'the authority' here, how does the absence of collision reduces
> the power of the authority, and what is your point about reducing the
> power of the authority?

With an intrinsic identifier, the “authority” is, somehow, the data
itself.  In content-addressed systems, the “authority” is diluted or
absent.

>> Whatever the intrinsic identifier we consider – even ones based on very
>> weak cryptographic hash function as MD5, or based on non-crytographic
>> hash function as Pearson hashing, etc. – the integrity check is
>> currently done by SHA256.
>
> How about using the hash of the integrity check as an intrinsic
> identifier, like is done currently?  I mean, we hash it anyway with
> sha256 for the integrity check anyway, might as reuse it.

Maybe ask the GNUnet folks to address by NAR+SHA256 in their
specification instead. ;-)

Kidding aside, your comment raises two points of view:

 1. Guix is fetching data from elsewhere and this elsewhere is not using
    the NAR+SHA256 intrinsic identifier.  Therefore, the question is: how
    to adapt the source origin to take this elsewhere into account?

 2. Replace the NAR+SHA256 integrity checksum by what content-addressed
    systems use as intrinsic identifier.  IMHO, that’s a bad idea for
    two reasons: (a) security, for instance SHA1 as used by SWH is not
    secure, and (b) it will be unmanageable in practice.

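To make the first point of view concrete, here is a minimal sketch in
Python (used only for illustration; this is not Guix code and not a
proposed design).  It assumes a source carries several identifiers and
one integrity checksum: each identifier is tried in turn, and whatever
comes back is accepted only if it matches that checksum.  The STORES
dictionary, the identifiers ending in EXAMPLE and the function names are
all hypothetical, and a plain SHA-256 over bytes stands in for
NAR+SHA256.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple

CONTENT = b"hello, world\n"

# Hypothetical stand-ins for remote services, keyed by identifier scheme.
STORES: Dict[str, Dict[str, bytes]] = {
    "https": {},                                        # upstream URL vanished
    "swh": {"swh:1:cnt:EXAMPLE": b"unexpected bytes"},  # wrong content served
    "gnunet": {"gnunet://fs/chk/EXAMPLE": CONTENT},     # still available
}


def fetch(identifier: str) -> Optional[bytes]:
    """Look the identifier up in the store named by its scheme, if any."""
    scheme = identifier.split(":", 1)[0]
    return STORES.get(scheme, {}).get(identifier)


def fetch_with_fallback(identifiers: Iterable[str],
                        expected_sha256: str) -> Tuple[str, bytes]:
    """Try each identifier in order; accept the first content whose
    SHA-256 matches the single checksum recorded with the source."""
    for identifier in identifiers:
        data = fetch(identifier)
        if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256:
            return identifier, data
    raise LookupError("no identifier yielded content matching the checksum")


IDENTIFIERS = [
    "https://example.org/hello.tar.gz",
    "swh:1:cnt:EXAMPLE",
    "gnunet://fs/chk/EXAMPLE",
]
EXPECTED = hashlib.sha256(CONTENT).hexdigest()
used, data = fetch_with_fallback(IDENTIFIERS, EXPECTED)
print(used)
```

In this toy setup the URL is gone and the SWH-style lookup returns bytes
that fail the check, so the content is finally obtained through the
GNUnet-style identifier.  The identifiers only say where and how to
look; the one checksum decides what is accepted.
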
>> All that’s said, Guix uses extrinsic identifiers for almost all origins,
>> if not all.  Even for ’git-fetch’ method.
>
> For git-fetch, the value of the 'commit' field is intrinsic (except when
> it's a tag instead).

No, that is imprecise.  The exception is *not* a tag label as the value
of the ’commit’ field; the exception is a Git commit hash as the value.

> This can be solved by placing the actual commit in the 'commit' field of
> git-reference, instead of the tag name, then things are completely
> unambiguous -- this and its opposite were discussed in ‘On raw strings
> in commit field’ (*), IIRC.

The thread you are referencing [1] is based on misunderstandings.  I
would like to move forward, hence my detailed email. :-)

1:

> (*) Also maybe that thread about tricking peer review.
>
> I didn't understand the position that commit field should contain the
> (indirect, fragile) tag instead of the (direct, robust) commit, but
> those differences could be sidestepped by having both a 'tag' field and
> a 'commit' field, IIUC.

I would not frame it this way.  My view is not to replace something by
something else; instead, it is to add one or several things.

> The problem then was to somehow map the NAR hash to the FS identifier.

Yes, that’s the problem. :-) The GNUnet FS identifier is one case.  And
my question here is: could we augment the source origin so that it can
deal with various identifiers?

> A straightforward solution would be to just replace the https:// by
> gnunet:// in the origin (like in https://issues.guix.gnu.org/44199,
> except that patch doesn't support fallbacks to other URLs like url-fetch
> does).

Somehow, your proposal would be to have a list as the URI, right?

    ;; Hypothetical: 'uri' as a list mixing a URL with other identifiers.
    (origin
      (method gnunet-fetch)
      (uri (list (string-append "mirror://gnu/hello/hello-" version ".tar.gz")
                 "gnunet://fs/chk/TY48PGS5RVX643NT2B7GDNFCBT4DWG692PF4YNHERR96K6MSFRZ4ZWRPQ4KVKZV29MGRZTWAMY9ETTST4B6VFM47JR2JS5PWBTPVXB0.8A9HRYABJ7HDA7B0"
                 "swh:1:dir:9c1eecffa866f7cb9ffdd56c32ad0cecb11fcf2a"))
      (file-name "gnunet-hello-2.10.tar.gz")
      (sha256
       (base32
        "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i")))

>> It is not affordable, neither wanted, to switch from the current
>> extrinsic identification to a complete intrinsic one.  Although it would
>> fix many issues. ;-)
>
> How about in-between: include both an intrinsic identifier (the
> sha256sum) and an extrinsic identifier (the URLs to locate the object
> at), like the status quo.

That’s what I am proposing between the lines. :-) The question is which
design.  For instance, it could go under the ’properties’ field,
similarly to “upstream name” or potentially other “metadata”.  Or it
could go under the source origin field.

However, as you pointed out, having it as a ’properties’ entry would not
be as easy.  And, as you also pointed out, the integrity field could be
something other than ’sha256’, so maybe we could have a list here.

>> The discussion could also fit how to distribute using ERIS.
>
> ERIS is not a method on its own; you need to combine it with a P2P
> network that uses ERIS.  I do not understand the special focus on ERIS.

Yes, indeed.  However, to my knowledge, each P2P network can use its own
identifiers and, from my understanding, ERIS can rely on whatever P2P
network.  Therefore, wanting guix-daemon to be able to use ERIS somehow
implies a discussion about the identifiers used by the P2P networks.  Am
I missing something?

>> At some point, I was thinking to have something like “guix freeze -m
>> manifest.scm” returning a map of all the sources from the deep bootstrap
>> to the leaf packages described in manifest.scm.
>> However, maybe something is poor in the metadata we collect at
>> package time.
>
> That sounds like "guix build --sources=transitive' to me, except for
> being even more transitive.  I propose making this an additional option
> for the --sources argument instead.

No.  “guix build --sources=transitive” fetches and returns all the
sources themselves.  Instead, I would like all the various identifiers
(URL, NAR, SWHID, GNUnet, etc.) of all the transitive sources.

Cheers,
simon

PS:

>> However the fields ’swhid’ and the other SHA256 ’digest’ are different
>> from above.  That’s because the dots [...] part.  It probably comes from
>> the normalization process.  Well, I am not sure to deeply understand why
>> it is different but that’s another story. :-)
>
> The reason for the normalisation was something about SWH only providing
> tarballs whose contents are equal to the ingested tarball; the tarballs
> are not bit-for-bit identical to the ingested tarball.  But Guix needs
> bit-for-bit identical tarballs, so Disarchive contains the information
> that was stripped-out by SWH to complement the tarballs provided by
> Disarchive.

SWH is not in the picture with the example I provided. :-)

Yes, the dots part is related to some normalization and “metadata”.
What I do not understand is this: if the output of “guix build hello -S”
is manually uncompressed and untarred, the content corresponds to:

    $ guix hash -S git -H sha256 -f hex hello-2.12.1
    cc7d5c45cfa1f5fba96c8b32d933734b24377a3c1ac776650044e497469affd4

The tool ’disarchive’ disassembles the compressed archive: first it
provides the hash of the compressed archive (.tar.gz), then stores
metadata about compression level, algorithm, etc., then provides the
hash of the uncompressed archive (.tar), then stores metadata about
files and, last, provides the hash of the tree.  It reads,

                        (input
                          (directory-ref
                            (version 0)
                            (name "3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1")
                            (addresses
                              (swhid "swh:1:dir:9c1eecffa866f7cb9ffdd56c32ad0cecb11fcf2a"))
                            (digest
                              (sha256 "1cb6effd40736b441a2a6dd49e56b3dfd4f6550e8ae1a8ac34ed4b1674097bc0"))))))))

and I do not understand why it is not the same as the manually computed
one; see above.  Well, that’s a detail and not relevant to the current
discussion since it is part of how Disarchive works internally.
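
For readers who want to see the layering described above, here is a
small Python sketch (Python 3.8 or later; an illustration only, not how
Disarchive is implemented).  It hashes the same toy tree at three
levels: compressed archive, uncompressed archive and tree.  The two
tree digests, flat_digest and nested_digest, are made-up serializations
and are neither the NAR format nor the Git or SWH directory format;
they are there only to show that a tree digest depends on how the tree
is serialized before hashing.  Whether that explains the mismatch above
is not established here.

```python
import gzip
import hashlib
import io
import tarfile
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_tar(tree: Dict[str, bytes]) -> bytes:
    """Serialize the tree as an uncompressed tarball with fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(tree):
            info = tarfile.TarInfo(name)  # mtime, uid and gid default to 0
            info.size = len(tree[name])
            tar.addfile(info, io.BytesIO(tree[name]))
    return buffer.getvalue()


def flat_digest(tree: Dict[str, bytes]) -> str:
    """Made-up tree digest: hash names and contents in sorted order."""
    h = hashlib.sha256()
    for name in sorted(tree):
        h.update(name.encode() + b"\0" + tree[name])
    return h.hexdigest()


def nested_digest(tree: Dict[str, bytes]) -> str:
    """Another made-up tree digest: hash names and per-file hashes."""
    h = hashlib.sha256()
    for name in sorted(tree):
        h.update(name.encode() + b"\0" + hashlib.sha256(tree[name]).digest())
    return h.hexdigest()


TREE = {
    "hello/README": b"Hello\n",
    "hello/hello.c": b"int main(void) { return 0; }\n",
}
TAR = make_tar(TREE)
# Same tarball, different compression metadata (level and timestamp).
GZ_A = gzip.compress(TAR, compresslevel=9, mtime=0)
GZ_B = gzip.compress(TAR, compresslevel=1, mtime=1)

print("tar.gz hashes equal:", sha256_hex(GZ_A) == sha256_hex(GZ_B))
print("tar hashes equal:   ", sha256_hex(gzip.decompress(GZ_A)) == sha256_hex(gzip.decompress(GZ_B)))
print("tree digests equal: ", flat_digest(TREE) == nested_digest(TREE))
```

The compressed archives differ because the gzip header records the
timestamp, while the uncompressed archive is identical in both cases;
that is the kind of metadata that has to be stored aside to rebuild a
bit-for-bit identical .tar.gz.  The two tree digests differ although
they describe exactly the same files.
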