Message-ID: <899587fb6a76ddfa37d197d3d0fd23cdc7ad8592.camel@gmail.com>
Subject: Re: On raw strings in commit field
From: Liliana Marie Prikler
To: zimoun, guix-devel@gnu.org
Date: Wed, 29 Dec 2021 21:25:07 +0100
In-Reply-To: <86y243kdoo.fsf@gmail.com>
References: <6e451a878b749d4afb6eede9b476e5faabb0d609.camel@gmail.com> <86y243kdoo.fsf@gmail.com>
List-Id: "Development of GNU Guix and the GNU System distribution."

Hi,

On Wednesday, 29 Dec 2021 at 09:39 +0100, zimoun wrote:
> Hi,
>
> On Tue, 28 Dec 2021 at 21:55, Liliana Marie Prikler wrote:
>
> > Consider a package being added or updated in Guix.  At the time of
> > commit, we have the tag v1.2.3 pointing towards commit deadbeef.  We
> > therefore create a guix package with version "1.2.3" pointing to
> > said commit (either directly or indirectly).  At this point, one of
> > the following holds:
> >   (1) Guix "1.2.3" -> upstream "v1.2.3" -> upstream "deadbeef"
> >   (2) Guix "1.2.3" -> upstream "deadbeef" <- upstream "v1.2.3"
> > From either, we can follow that Guix "1.2.3" = upstream "v1.2.3".  If
> > upstream keeps their tags around, then both forms are equivalent, but
> > (1) is more convenient; it allows us to derive commit from version,
> > which is often done through an affine mapping.
>
> No, tags and hash commit are not equivalent.  Hash commit is intrinsic:
> it only depends on the content.  Whereas, tags are extrinsic, they
> depend on external choice.

The notion of equivalence I am using here is the same as in the
statement "5 ≡ 2 mod 3", wherein the ≡ symbol is ironically called
IDENTICAL TO in Unicode despite being used very differently in
mathematics.  Perhaps there is a language barrier here; in German we
read that as "5 is equivalent to 2 modulo 3", and logical equivalence
functions similarly.

For the record, one could argue that I should have used that symbol
for comparing Guix "1.2.3" to upstream "v1.2.3" because they are in
fact not equal, only equivalent, but that's beside the point.
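
To illustrate that notion of equivalence, here is a toy Python sketch
(not Guix code).  The names `upstream_tags`, `tag_for`, `resolve` and
`equivalent` are made up for illustration, as is the commit id
"cafebabe"; "deadbeef" is the placeholder from the quoted example.  Two
references count as equivalent when they resolve to the same commit,
just as 5 and 2 leave the same remainder modulo 3.

```python
# Toy model of an upstream repository: tag names mapped to commit ids.
# "deadbeef" is the placeholder commit from the quoted example.
upstream_tags = {"v1.2.3": "deadbeef"}


def tag_for(version):
    """Derive the upstream tag from the package version (a "v" prefix here)."""
    return "v" + version


def resolve(ref, tags):
    """Return the commit a reference points to.

    Anything that is not a known tag is treated as a raw commit id and
    resolves to itself (a simplification for this sketch).
    """
    return tags.get(ref, ref)


def equivalent(a, b, tags):
    """Two references are equivalent when they resolve to the same commit."""
    return resolve(a, tags) == resolve(b, tags)


# Same idea as 5 ≡ 2 (mod 3): different values, same image under the map.
print(5 % 3 == 2 % 3)                                           # True
print(equivalent(tag_for("1.2.3"), "deadbeef", upstream_tags))  # True

# Once upstream moves the tag (hypothetical new commit "cafebabe"),
# the two references are no longer equivalent.
moved_tags = {"v1.2.3": "cafebabe"}
print(equivalent(tag_for("1.2.3"), "deadbeef", moved_tags))     # False
```
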
The point is, with an upstream behaving as we want upstreams to behave
(not just git ones, url-fetch suffers from the same issue with moving
tarballs for instance), you can substitute one for the other without a
change in meaning; both will fetch the same commit.

> From the content to the hash, three keys: 1) how to serialize and 2)
> how to hash and 3) how to represent the hash.  For #1, Git uses their
> own serializer and Guix, inheriting from Nix, uses another (Nar);
> although the difference is minor.  For #2, Git uses by default SHA-1 as
> hash function, although Guix uses SHA-256.  And for #3, Git uses
> hexadecimal format and Guix uses nix-base32.
>
> The subcommand “guix hash” with the options ’-S, -H’ and ’-f’ exposes
> these 3 keys.  For instance:
>
>         $ cat /tmp/foo.txt | git hash-object --stdin
>         557db03de997c86a4a028e1ebd3a1ceb225be238
>         $ ./pre-inst-env guix hash -S git -H sha1 -f hex /tmp/foo.txt
>         557db03de997c86a4a028e1ebd3a1ceb225be238
>
> To make it explicit, the checksum hash of ’git-reference’ could be
> removed because it is somehow redundant with the commit hash.
> Obviously, it cannot because security reason (SHA-1 is considered as
> weak).

The other way also works.  If Git used a secure hashing function such
as SHA-256 (or SHA-512 or Keccak) and Guix supported that hash, we
could generate a git hash from the Guix hash (assuming also we allow
the origin serializer to be configured, which would be required either
way).

The weakness of SHA-1 also flies in the face of the robustness
argument.  One could maliciously push a commit that replaces an
existing one with the same hash, though it would also break the repo in
doing so.  That holds at least in theory, as no such attack has been
carried out yet.  Note to self: theoretical attacks on Git are probably
off-topic as well.

> > Problems arise, when upstreams move or delete tags.  At this
> > point, guix packages that use them break and are no longer able to
> > fetch their source code.
> > Raw commits are in principle resilient to
> > this kind of denial of service; instead upstreams would have to
> > actually delete the commits themselves, including also possible
> > backups such as SWH to break it.  There is certainly an argument
> > for robustness to be made here, particularly concerning
> > `guix time-machine', though as noted it is not infallible.
>
> SWH provides ’swh:id’ which is another triplet (really close to Git).
> Basically, content means data and metadata and to make it short, SWH
> deals their way with metadata for reason of large scale.  And SWH
> does snapshots of Git repositories.
>
> Therefore, to have something really robust, Guix has to rely on a map
> from package definition to SWH.
>
> Using Git commit hash instead of tag makes this map.  For tag, to
> have something robust, we need an external map from checksum hash to
> SWH hash via Git commit hash.  This “external” is done by Disarchive.

I don't know too much about Disarchive here, so please enlighten me.
If it used a pair of origin file name + hash, whether or not the
git-reference uses tags would be irrelevant, no?  Do we have to take
values from the uri field?

> > Long-term, we might want to support having multiple <git-references>
> > in git-fetch -- if the first one fails due to a hash
> > mismatch, we would warn about that instead of producing an error
> > and thereafter continue with the second, third, etc.
> > similar to how we currently have mirror:// urls for some well-known
> > mirrored repositories.  That way, we have a system to warn us about
> > naughty upstreams while also providing robustness for the time
> > machine.
>
> I think the long term is to completely remove tag and only use commit
> hash; as done for ’guile-aiscm’.  But it will not happen for
> convenience reasons, I guess.
>
> What you are proposing is to mix extrinsic (tag, URL, etc.) with
> intrinsic (commit hash, checksum hash, etc.).
> Well, I do not know if
> this proposed fallback mechanism would ease the maintenance and would
> make Guix more robust.

I'm not sure the distinction between extrinsic and intrinsic values is
a useful one here.  The only important intrinsic value here is the
content hash, which is unlikely to break [1].  We're using extrinsic
values such as URLs all over the place, including the very line
preceding the commit value of a git-reference (almost) every time --
I'm leaving room here for some person to put the commit before the URL.

> To me, robustness means make a map from intrinsic values to content;
> as Disarchive is doing for instance.

See above, I don't understand why Disarchive would need more than the
content hash as an intrinsic value to do so.

Cheers,
Liliana

[1] "Briefly stated, if you find SHA-256 collisions scary then your
priorities are wrong."
https://stackoverflow.com/a/4014407
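
For illustration, the multi-reference fallback discussed in this
message could look roughly like the following.  This is a Python
sketch under loud assumptions, not Guix code: `fetch_with_fallback`,
`toy_fetch` and the `upstream` dictionary are hypothetical, and the
content hash is a plain SHA-256 over bytes rather than a hash over a
Nar serialization.  It only shows the control flow: warn on a mismatch
or a missing reference, move on to the next one, and accept whatever
matches the content hash.

```python
import hashlib
import warnings


def fetch_with_fallback(references, fetch, expected_sha256):
    """Try each reference in order; return the first one whose content
    matches the expected hash.

    `fetch` is a caller-supplied function mapping a reference to bytes
    and raising LookupError when the reference no longer exists.
    """
    for ref in references:
        try:
            content = fetch(ref)
        except LookupError:
            warnings.warn(f"{ref}: reference not found upstream")
            continue
        if hashlib.sha256(content).hexdigest() == expected_sha256:
            return ref, content
        # Warn instead of failing, then fall through to the next reference.
        warnings.warn(f"{ref}: hash mismatch, trying next reference")
    raise RuntimeError("no reference produced the expected content")


# Toy upstream: the tag was moved, the raw commit still has the old content.
upstream = {"v1.2.3": b"moved tag content", "deadbeef": b"original source"}
expected = hashlib.sha256(b"original source").hexdigest()


def toy_fetch(ref):
    # A missing key raises KeyError, which is a subclass of LookupError.
    return upstream[ref]


ref, content = fetch_with_fallback(["v1.2.3", "deadbeef"], toy_fetch, expected)
print(ref)  # deadbeef
```

In this sketch the content hash alone decides what is accepted; tags
and commits merely say where to look.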