From: Simon Tournier
To: Ludovic Courtès
Cc: Guix Devel
Subject: content-address hint? (was Re: intrinsic vs extrinsic identifier: toward more robustness?)
In-Reply-To: <87a60cbnf7.fsf@gnu.org>
References: <87jzzxd7z8.fsf@gmail.com> <87a60cbnf7.fsf@gnu.org>
Date: Wed, 04 Oct 2023 10:52:58 +0200
Message-ID: <87a5sy3h1x.fsf@gmail.com>

Hi Ludo,

On Thu, 16 Mar 2023 at 18:45, Ludovic Courtès wrote:

> Thanks for starting this discussion!

I feel this discussion is still pending, so I am resuming. :-)  If
context is missing, the thread starts here.

        intrinsic vs extrinsic identifier: toward more robustness?
        Simon Tournier
        Fri, 03 Mar 2023 19:07:23 +0100
        id:87jzzxd7z8.fsf@gmail.com
        https://lists.gnu.org/archive/html/guix-devel/2023-03
        https://yhetil.org/guix/87jzzxd7z8.fsf@gmail.com

> Sources (fixed-output derivations) are already content-addressed, by
> definition (I prefer “content addressing” over “intrinsic
> identification” because that’s a more widely recognized term).

From my understanding, this is correct only when the sources live in the
Guix project infrastructure.

I agree that if the source is substitutable (= the source exists on one
of the substitute servers, i.e., Guix project servers), then the
fixed-output derivation is content-addressed.  For instance, let us
consider this fixed-output derivation:

--8<---------------cut here---------------start------------->8---
Derive
([("out","/gnu/store/n1k6jppyasn20zr6m8sfyv5ll07ibyf1-asciidoc-8.6.10.tar.gz","sha256","9e52f8578d891beaef25730a92a6e723596ddbd07bfe0d2a56486fcf63a0b983")]
 ,[]
 ,["/gnu/store/5iw2ivjw5njyyvi7avyphfcibgbqdbsc-mirrors","/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors"]
 ,"x86_64-linux","builtin:download",[]
 ,[("content-addressed-mirrors","/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors")
  ,("impureEnvVars","http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS")
  ,("mirrors","/gnu/store/5iw2ivjw5njyyvi7avyphfcibgbqdbsc-mirrors")
  ,("out","/gnu/store/n1k6jppyasn20zr6m8sfyv5ll07ibyf1-asciidoc-8.6.10.tar.gz")
  ,("preferLocalBuild","1")
  ,("url","\"https://github.com/asciidoc/asciidoc/archive/8.6.10.tar.gz\"")])
--8<---------------cut here---------------end--------------->8---

I agree that the “url” field is useless as long as the content exists on
one of the servers of the “content-addressed-mirrors” list.  If one opens
that file, then the code reads:

--8<---------------cut here---------------start------------->8---
(begin
  (use-modules (guix base32))
  (define (guix-publish host)
    (lambda (file algo hash)
      (string-append "https://" host "/file/" file "/"
                     (symbol->string algo) "/"
                     (bytevector->nix-base32-string hash))))
  (module-autoload!
   (current-module)
   (quote (guix base16))
   (quote (bytevector->base16-string)))
  (list (guix-publish "ci.guix.gnu.org")
        (lambda (file algo hash)
          (string-append "https://tarballs.nixos.org/"
                         (symbol->string algo) "/"
                         (bytevector->nix-base32-string hash)))
        (lambda (file algo hash)
          (string-append "https://archive.softwareheritage.org/api/1/content/"
                         (symbol->string algo) ":"
                         (bytevector->base16-string hash) "/raw/"))))
--8<---------------cut here---------------end--------------->8---

Therefore, the look-up is content-addressed and goes through these 3
servers.

> In a way, like Maxime way saying, the URL/URI is just a hint; what
> matters it the content hash that appears in the origin.

However, from my understanding, it is incorrect to speak of content
addressing when the source (fixed-output derivation) does not exist, for
whatever reason, on any substitute server.

The URL/URI is not “just a hint”.  It *is* the location from which the
data are fetched.  And it is not content-addressed.

If I am incorrect, could you please explain?

Please note that if only one source is missing, then the whole castle
falls down.  In other words, robustness means hunting the corner
cases. :-)

If I want to time-machine to d63ee94d63c667e0c63651d6b775460f4c67497d
from Sat Jan 4 2020, and need Git, then it fails because:

    sha256 hash mismatch for /gnu/store/n1k6jppyasn20zr6m8sfyv5ll07ibyf1-asciidoc-8.6.10.tar.gz:
      expected hash: 10xrl1iwyvs8aqm0vzkvs3dnsn93wyk942kk4ppyl6w9imbzhlly
      actual hash:   1sh341j7ripkdb2wn6yf3rciln8ll89351b3d55gpkj89wypkmi2

Game over. )-:

Do we share the same understanding?

> What’s missing, both in SWH and in Guix, is the ability to store
> multiple hashes.  SWH could certainly store several hashes, computed
> using different serialization and hash algorithm combinations.

[...]

> The other option—storing multiple hashes for each origin in Guix—doesn’t
> sound practical: I can’t imagine packages storing and updating more than
> one content hash per package.  That doesn’t sound reasonable.  Plus it
> would be a long-term solution and wouldn’t help today.

Yes, the core question is where to store the database mapping these
multiple hashes.

Software Heritage (SWH) is one option, although 1. it has not yet been
discussed how the Nar hashes would be publicly exposed, if they are, and
2. it is not known whether SWH will implement a resolver Nar -> SWHID.

On the other hand, on the Guix side, we are already building a database
mapping multiple hashes: the Disarchive database. :-)

The question with the Disarchive database is its redundancy, IMHO.
Concretely, if disarchive.guix.gnu.org is down, game over.  I wish a
long life to the Guix project :-) but it appears to me more robust to
have a counter-measure.

The big picture is: if I publish a paper that details some numerical
processing using Guix, then having a Guix installation at hand should be
the only requirement for redoing it.

Last, please note that Guix is already storing multiple hashes for some
origins.  It is the case for the ‘git-fetch’ method, for example.  All
the packages using a plain Git commit hash are, in a way, storing two
content-addressed hashes (Git and Nar).

If one needs examples of how badly upstream can manage their mutable Git
tags, here are two recent cases:

        bug#66015: Removal of python-pyxel
        Simon Tournier
        Fri, 15 Sep 2023 21:09:59 +0200
        id:874jjv9rso.fsf@gmail.com
        https://issues.guix.gnu.org/66015
        https://issues.guix.gnu.org/msgid/874jjv9rso.fsf@gmail.com
        https://yhetil.org/guix/874jjv9rso.fsf@gmail.com

and

        [bug#66013] [PATCH 0/4] gnu: bap, python-glcontext: Fix hash and update.
        Simon Tournier
        Fri, 15 Sep 2023 20:38:34 +0200
        id:cover.1694800551.git.zimon.toutoune@gmail.com
        https://issues.guix.gnu.org/66013
        https://issues.guix.gnu.org/msgid/cover.1694800551.git.zimon.toutoune@gmail.com
        https://yhetil.org/guix/cover.1694800551.git.zimon.toutoune@gmail.com

All in all, I think we will have more robustness if the Guix I am
running implements on its own some builtin features for content
addressing instead of relying on external databases.  It is not clear to
me how exactly, hence the discussion. :-)

Another angle on the problem of multiple hashes is the use of IPFS,
GNUnet and friends.

(I leave aside long-term vs today because the time frame I am interested
in is: “guarantees” that I will be able to redo, 3 years from now, what
I am doing in the very near future.  And right now I am trying to redo
something from 3 years back to spot the potential problems and fix or
improve them.  I do not really care about redoing Guix as it was 3 years
ago, because almost no one published papers using Guix 3 years ago. ;-)
Guix is becoming popular in scientific contexts, yeah!  So my interest
in this robustness is for when Guix will be just a bit more popular.)

Cheers,
simon
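
PS: To make the content-addressed look-up concrete, here is a minimal
Python sketch (not Guix code) that rebuilds the three mirror URLs from
the sha256 of the derivation quoted in this message.  The names
`nix_base32` and `mirror_urls` are hypothetical helpers; the base32
routine follows my understanding of the Nix/Guix alphabet and bit
ordering, and passing the bare file name `asciidoc-8.6.10.tar.gz` as the
`file` argument is an assumption.  The encoded string should correspond
to the “expected hash” of the error message, and none of the three URLs
involves the upstream “url” field, which is the point being discussed.

```python
# Illustrative sketch only: it mirrors the three URL templates of the
# content-addressed-mirrors file quoted in this message.

# Nix/Guix base32 alphabet: digits and lowercase letters without e, o, t, u.
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(data: bytes) -> str:
    """Encode DATA in the base32 variant used to print Nix/Guix hashes."""
    length = (len(data) * 8 - 1) // 5 + 1  # number of 5-bit groups
    chars = []
    # Groups are emitted from the most significant one down to the least.
    for n in range(length - 1, -1, -1):
        byte_index, bit_offset = divmod(n * 5, 8)
        value = data[byte_index] >> bit_offset
        if byte_index + 1 < len(data):
            # The group may straddle two bytes; pull in the next one.
            value |= data[byte_index + 1] << (8 - bit_offset)
        chars.append(NIX_BASE32_ALPHABET[value & 0x1F])
    return "".join(chars)


def mirror_urls(file_name: str, algo: str, hex_hash: str) -> list:
    """Return the candidate URLs, in the order of the quoted mirror list."""
    digest = bytes.fromhex(hex_hash)
    base32 = nix_base32(digest)
    base16 = digest.hex()
    return [
        f"https://ci.guix.gnu.org/file/{file_name}/{algo}/{base32}",
        f"https://tarballs.nixos.org/{algo}/{base32}",
        f"https://archive.softwareheritage.org/api/1/content/{algo}:{base16}/raw/",
    ]


# sha256 taken from the "out" field of the derivation quoted in this message.
SHA256_HEX = "9e52f8578d891beaef25730a92a6e723596ddbd07bfe0d2a56486fcf63a0b983"

if __name__ == "__main__":
    for url in mirror_urls("asciidoc-8.6.10.tar.gz", "sha256", SHA256_HEX):
        print(url)
```

If all three URLs fail to serve the content, the only remaining location
is the upstream URL, and nothing guarantees that what it serves still
matches the hash.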