Message-ID: <3d448fe42f0c43574db96fa26aecd7da5fd5a95d.camel@gmail.com>
Subject: Re: On raw strings in commit field
From: Liliana Marie Prikler
To: zimoun, guix-devel@gnu.org
Date: Fri, 31 Dec 2021 01:02:43 +0100
In-Reply-To: <867dbmi7pf.fsf@gmail.com>

On Thursday, 30.12.2021 at 13:43 +0100, zimoun wrote:
> Hi Liliana,
>
> On Wed, 29 Dec 2021 at 21:25, Liliana Marie Prikler wrote:
> > On Wednesday, 29.12.2021 at 09:39 +0100, zimoun wrote:
> > > On Tue, 28 Dec 2021 at 21:55, Liliana Marie Prikler wrote:
> >
> > The notion of equivalence I am using here is the same as in the
> > statement "5 ≡ 2 mod 3", wherein the ≡ symbol is ironically called
> > IDENTICAL TO in Unicode despite being used very differently in
> > mathematics.  Perhaps there is a language barrier here; in German
> > we read that as "5 is equivalent to 2 modulo 3" and logic
> > equivalence functions similarly.
>
> I do not understand against what you are arguing so I skip it. :-)

I was under the impression that you and I used the word "equivalent"
differently, so I wanted to clear that up.

> > For the record, one could argue that I should have used that symbol
> > for comparing Guix "1.2.3" to upstream "v1.2.3" because they are in
> > fact not equal, only equivalent, but that's besides the point.  The
> > point is, with an upstream behaving as we want upstreams to behave
> > (not just git ones, url-fetch suffers from the same issue with
> > moving tarballs for instance), you can substitute one for the other
> > without a change in meaning; both will fetch the same commit.
>
> If I understand you correctly:
>
>  - Guix "1.2.3" means the field ’version’
>  - upstream “v1.2.3” means the upstream tag used by the field
>    ’commit’ of ’git-reference’.
>
> and yes it is strongly expected that these both fields matches. :-)

Well, at least we agree on something.
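
To make the relationship between Guix "1.2.3" and upstream "v1.2.3"
concrete, here is a minimal sketch in Python rather than Guile.  The
helper names `tag_for_version` and `version_for_tag` and the default
"v" prefix are assumptions made up for illustration; they are not part
of Guix, and real upstreams use all sorts of version markers.

```python
def tag_for_version(version: str, prefix: str = "v") -> str:
    """Derive the upstream tag from a Guix-style version (hypothetical helper)."""
    return prefix + version


def version_for_tag(tag: str, prefix: str = "v") -> str:
    """Invert the mapping; fail loudly if the tag lacks the expected prefix."""
    if not tag.startswith(prefix):
        raise ValueError(f"tag {tag!r} does not start with {prefix!r}")
    return tag[len(prefix):]


print(tag_for_version("1.2.3"))   # v1.2.3
print(version_for_tag("v1.2.3"))  # 1.2.3
```

The two strings are not equal, but each determines the other, which is
the sense of "equivalent" meant above.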

> But it is irrelevant, IMHO, to your initial message «commit tags are
> in principle mutable and hence can not be relied on when fetching
> sources. I do have a few issues with that explanation».  It is
> fortunate and not robust that ’commit’ matches ’version’ via upstream
> ’tag’.

It is in fact very relevant to the issue at hand.  Consider the
analogous claim: in principle, versioned URLs are not robust, hence we
can't have a single package using url-fetch.  A statement like that is
obviously silly, not just because tarballs that are updated in-place
are exceedingly rare, but also because they violate how we think about
versions.  The same holds for git, with the difference being that we no
longer generate a URL from the version, but a tag.  If that tag can't
serve as a bridge here, the version field loses the meaning it had from
the strong expectation that the two of them match.

> Because how ’commit’ and ’tag’ are defined is different.
>
> I cannot tell it differently than: Git commit depends only on the
> content, although ’tag’ not.
>
> Version (or tag) is convenient names for humans.  It is easier to
> tell version 0.23.1 than
> 09rdbcr8dinzijyx9h940ann91yjlbg0fangx365llhvy354n840.  And we can
> deduce that 0.22.3 is older than 0.23.1, when it is impossible for
> commits.

Git commit hashes do not just depend on the content.  They also depend
on how much effort you put into solving a proof of work challenge that
won't ever earn you crypto coins [1].

> If you prefer to keep the frame: «you can substitute one for the
> other without a change in meaning», then, for what my opinion is
> worth on that matter, my probably wrong understanding of your words
> is that perhaps you are missing a point about content-addressability.

To be fair, I did not consider content-addressability here, because my
main concern is natural-intelligence-based verification.

> > > From the content to the hash, three keys: 1) how to serialize and
> > > 2) how to hash and 3) how to represent the hash.
> > > For #1, Git uses their own serializer and Guix, inheriting from
> > > Nix, uses another (Nar); although the difference is minor.  For
> > > #2, Git uses by default SHA-1 as hash function, although Guix
> > > uses SHA-256.  And for #3, Git uses hexadecimal format and Guix
> > > uses nix-base32.
> >
> > [...]
> >
> > > To make it explicit, the checksum hash of ’git-reference’ could
> > > be removed because it is somehow redundant with the commit hash.
> > > Obviously, it cannot because security reason (SHA-1 is considered
> > > as weak).
> >
> > The other way also works.  If Git used a secure hashing function
> > such as SHA-256 (or SHA-512 or Keccak) and Guix supported that
> > hash, we could generate a git hash from the Guix hash (assuming
> > also we allow the origin serializer to be configured, which would
> > be required either way).
>
> Yes somehow.  To be on the same wavelength, we need to be precise
> when we speak about hash here because hash means:
>
>  - serializer: how to deal with all the bits making the full content
>    (files, folder, tree, etc.)
>  - hashing function
>  - format
>
> So yes, on principles, instead of NAR + SHA-256 + Nix-base32, the
> Guix project could have chosen Git + SHA-1 + Hex, or Git + SHA-512 +
> Base64 or any other combinations.
>
> (I think this choice inherited from Nix is rooted in daemon
> implementations and another triplet would have been more changes when
> starting Guix, I guess.)
>
> However, knowing only the final Guix checksum hash (NAR + SHA-256 +
> Nix-base32), say
> 09rdbcr8dinzijyx9h940ann91yjlbg0fangx365llhvy354n840,
> you can easily replace by any other formats (Hex or Base64), but it
> is not straightforward to compute the Git commit hash (here
> c78b91edb7c17c6fbf3b294452f44e91d75e3c67) from this Guix checksum
> hash, because the serializer NAR and Git have minor differences, and
> mainly because one uses SHA-256 and the other SHA-1 – and it is
> generally not possible to convert the hash from one hashing function
> to another hashing function.
>
> To make it short, my point is: a) a Git commit hash owns the same
> properties as any checksum hash and b) a string tag is obviously not
> a checksum.

Ad b), I never claimed that a string tag is a checksum.  All I'm
claiming is that *under normal circumstances* we would expect it to
point to just one commit over time, similar to how we expect a mirror
URL to point to the same tarball no matter who ends up delivering it.
Or how we expect the same substitutes from different servers ;)

Ad a), given that the Git hash (or checksum if you will) is weaker than
other checksums used in Guix, I simply wanted to reassert that it ought
to be the first to vanish if any of them vanishes, not the last.  I
understand my crypto well enough to know that we can't simply change
serializers post hash creation; if we wanted to encode our origin
hashes in an SWH-friendly fashion, we would need to change the API
accordingly.

> > I don't know too much about Disarchive here, so please enlighten
> > me. If it used a pair of origin file name + hash, whether or not
> > the git-reference uses tags would be irrelevant, no?  Do we have to
> > take values from the uri field?
>
> I am not sure to understand the questions.  Maybe the thread starting
> here is worth:
>
> Otherwise, could you explain more what you have in mind?

I'mma quote Ludo for a change.

> SWH records the “history of the history”.
> It can tell you what the tag pointed to at the time of a specific
> snapshot.

This just reiterates my point of Guix not trying hard enough with
fallbacks.  Let's say I archive git.evil.org/malicious-repo at version
0.1.0 a fair 64 times because I just keep changing the initial release
and Guix still refers to it by tag because I am also in charge of
updating the Guix package and have not yet caught up to the fact that
revision/commit pairs are good, actually.  Since each of those 64
archives has a different NAR hash, we could try fetching all of them
from SWH and pick the one that fits.

Now obviously, in the real world, we would probably switch to a
version/revision pair for an upstream that violated our expectations
once, perhaps twice, so the overhead would not be as dramatic outside
of constructed examples.  Still, for the sake of "robustness", we might
want to decide what's a robust number of retries to not get into a DoS
loop.

> > > To me, robustness means make a map from intrinsic values to
> > > content; as Disarchive is doing for instance.
> >
> > See above, I don't understand why Disarchive would need more than
> > the content hash as an intrinsic value to do so.
>
> Basically nothing more, so nothing to understand. :-)
>
> Your initial messages started with:
>
>         [...]
>
> and my intent was to point the reason is not really the “mutable”
> part but the reason is because it is better to rely on intrinsic
> values (discussed in link above).

By content hash, I meant NAR hash or Guix hash, not commit hash.  Sorry
for the confusion.

> Obviously, intrinsic value is immutable but, IMHO, intrinsic value is
> somehow a key-point for lookup in content-address systems.  Git-commit
> hash is one way, SWH-ID is another, IPFS uses another, GNUnet
> another, etc.  The recent ERIS [1,2] is an attempt to bridge, IIUC.
>
> Addressing ’origin’ by intrinsic values implies which ones and The
> Right Thing is really hard to predict.
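
The bounded fallback described earlier in this message (try the
archived snapshots of a tag until one matches the recorded hash, but
give up after a fixed number of attempts) could be sketched roughly as
follows.  This is Python for illustration only, under loud assumptions:
the snapshots are plain byte strings standing in for archived
checkouts, a plain SHA-256 over those bytes stands in for the NAR hash,
and `pick_matching_snapshot` and its `max_tries` default of 8 are
invented names and values, not anything Guix or SWH provides.

```python
import hashlib
from itertools import islice
from typing import Iterable, Optional


def pick_matching_snapshot(
    snapshots: Iterable[bytes],
    expected_sha256_hex: str,
    max_tries: int = 8,  # arbitrary cap, chosen only for this sketch
) -> Optional[bytes]:
    """Return the first candidate whose SHA-256 matches the expected hash.

    At most max_tries candidates are examined, so a tag that was moved
    many times cannot drag the fetcher into an unbounded loop.
    """
    for content in islice(snapshots, max_tries):
        if hashlib.sha256(content).hexdigest() == expected_sha256_hex:
            return content
    return None
```

With 64 candidate snapshots and a cap of 8, the lookup would simply
fail once the cap is hit; what cap counts as "robust" is exactly the
open question.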

I don't think I agree with that assessment.  "Guix for Racket packages"
(it was called Xiden back then, but appears to have changed to denxi)
had the insane idea of allowing more than one hash in a package
definition and the source would have to match all of them.  We could do
the same in Guix, but it'd be another core-updates cycle until then.

> My opinion is that robust long-term – i.e., near future I want – is
> to rely on more intrinsic values in ’source’ or ’origin’ and less
> tags, urls, etc.  Well, I am fine if we disagree.  You asked «What do
> y'all think?», now you know what I think. :-)
>
> Last, sorry if I am misunderstanding you, back to your initial
> message. You provided ’guile-aiscm’ as one example of something that
> confused you.  Instead of the current definition, you would like this
> definition
>
> --8<---------------cut here---------------start------------->8---
> 1 file changed, 1 insertion(+), 1 deletion(-)
> gnu/packages/machine-learning.scm | 2 +-
>
> modified   gnu/packages/machine-learning.scm
> @@ -299,7 +299,7 @@ (define-public guile-aiscm
>                (method git-fetch)
>                (uri (git-reference
>                      (url "https://github.com/wedesoft/aiscm")
> -                    (commit "c78b91edb7c17c6fbf3b294452f44e91d75e3c67")))
> +                    (commit (string-append "v" version))))
>                (file-name (git-file-name name version))
>                (sha256
>                 (base32
> --8<---------------cut here---------------end--------------->8---

That would have been a perfectly fine definition in my opinion, yes.

> ?
> Or something like along these lines,
>
> --8<---------------cut here---------------start------------->8---
> (define-public guile-aiscm
>   (let ((version "0.23.1")
>         (commit "c78b91edb7c17c6fbf3b294452f44e91d75e3c67")
>         (revision "0"))
>     (package
>       (name "guile-aiscm")
>       (version (git-version version revision commit))
>       (source (origin
>                 (method git-fetch)
>                 (uri (git-reference
>                       (url "https://github.com/wedesoft/aiscm")
>                       (commit commit)))
>                 (file-name (git-file-name name version))
>                 (sha256
>                  (base32
>                   "09rdbcr8dinzijyx9h940ann91yjlbg0fangx365llhvy354n840"))))
> [..]
> --8<---------------cut here---------------end--------------->8---
>
> ?  And your point is that “0.23.1” is redundant with
> “c78b91edb7c17c6fbf3b294452f44e91d75e3c67” because Git so why not
> just use “0.23.1” in ’origin’.  Right?

We typically don't let-bind version (i.e. we only bind revision and
commit, which is probably a wise idea as version is syntax inside
package), but sure, that's also a fine definition.  I would wonder why
you are doing that for a commit that is itself a release, but if you're
explaining to me "Well, I don't trust this weird wedesoft fellow, they
sound like the kind of person/company to change their tags more often
than their underwear" or even better had evidence of such a change, I'd
agree and push.

> In the current matter of facts, I do not think any rationale can be
> made in favor of one of the three main possible definitions
> (addressing by tag, by commit, using let).  The only weak
> justification for addressing using commit hash is that the lookup
> when fallbacking to SWH is easier, i.e., it is easier when the
> Git-commit hash is known instead of URL+tag.

In my personal opinion, the version+raw commit style can be discredited
using Cantor's diagonal argument.
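
For reference, the let-bound variant quoted above ends up with a
version string that carries both the release number and the commit.  As
far as I understand git-version, it joins the version, the revision and
the first seven characters of the commit; the Python function below
only mirrors that behaviour for illustration and is not Guix's code.

```python
def git_version(version: str, revision: str, commit: str) -> str:
    """Mimic the shape of Guix's git-version (illustrative re-implementation)."""
    # version, a dash, the revision, a dot, then the abbreviated commit
    return f"{version}-{revision}.{commit[:7]}"


print(git_version("0.23.1", "0",
                  "c78b91edb7c17c6fbf3b294452f44e91d75e3c67"))
# 0.23.1-0.c78b91e
```

So a reader of the package still sees "0.23.1", with the commit prefix
tacked on as a machine-checkable suffix.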

> These 200 packages can also be seen as real-world experiments
> complementing the other ways of addressing in order to find The Right
> Way for robust addressing.

If a comment spanning four lines is the most reasonable way of
explaining said style to others in the source code, that alone serves
as an argument for let-binding.

> My personal preference, for what it is worth, is an explicit
> reference to the commit, i.e., the current definition or the ’let’
> one.  Note it was also discussed this: have convenient things as
> url+tag for ’uri’ and use checksum coupled to an external service as
> disarchive.guix.gnu.org; but the definitions would be not
> self-consistent anymore.  Heh, The Right Thing is not obvious. :-)

I have trouble understanding this.  Using origin file-names and hashes
for computing fallbacks would be a good thing, no?  We could completely
decouple that from anything related to the method; if we have a backup
elsewhere, we can use it.

> Other said, version and tag are currently first-class when commit is
> second-class, somehow.  As you said «it allows us to derive commit
> from tag» (tag is mine).  And I think it is inherited from the long
> history about releasing software which is now somehow inadequate
> these days. Obviously, I do not know how to do but it should be the
> contrary: commit first-class which allows us to derive version
> second-class.

Let's put humans before machines, they're not our overlords (yet).

> PS: You said in initial email «(1) is more convenient; it allows us
> to derive commit from version, which is often done through an affine
> mapping.».
>
> I do not understand the “affine mapping”.  Why would it be an affine
> mapping?  Well, I miss what is the affine space here, I am able to
> imagine the set but what would be the vector space?  Bah you are
> probably referring to maths I have never studied. :-)

I thought affine mappings were a fine substitute for bijective ones,
but it turns out this time it was I who sucks at maths.  The original
point I was making, though, is that we often just have to prepend "v"
or some other version marker to get from the Guix version to the tag,
for which it doesn't matter if that's an affine mapping or a bijective
one, as it's both affine and bijective.

Cheers