From: Simon Tournier <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: Guix Devel <guix-devel@gnu.org>
Subject: Re: intrinsic vs extrinsic identifier: toward more robustness?
Date: Thu, 06 Apr 2023 14:15:56 +0200
Message-ID: <87lej5mcjn.fsf@gmail.com>
In-Reply-To: <87a60cbnf7.fsf@gnu.org>

Hi,

On Thu, 16 Mar 2023 at 18:45, Ludovic Courtès <ludo@gnu.org> wrote:

>> For sure, we have to fix the holes and bugs. :-)  However, I am asking
>> what we could add for having more robustness on the long term.

> Sources (fixed-output derivations) are already content-addressed, by
> definition (I prefer “content addressing” over “intrinsic
> identification” because that’s a more widely recognized term).

This is the case when you consider that the result of the fixed-output
derivation is already inside the Guix “ecosystem”…

> In a way, like Maxime was saying, the URL/URI is just a hint; what
> matters is the content hash that appears in the origin.

…but otherwise the URL/URI is not just a “hint”.  Or could you explain
what you mean by a “hint”?

Maybe I misunderstand something, but as I understand it, the URL/URI is
a “hint” only when substitutes are available; otherwise Guix relies on
the plain URL/URI for fetching data.

--8<---------------cut here---------------start------------->8---
$ guix build hello -S --no-substitutes --check
The following derivation will be built:
  /gnu/store/3hxraqxb0zklq065zjrxcs199ynmvicy-hello-2.12.1.tar.gz.drv
building /gnu/store/3hxraqxb0zklq065zjrxcs199ynmvicy-hello-2.12.1.tar.gz.drv...

Starting download of /gnu/store/1s6xba6nafkxb242kafkg3x10jkdn2n9-hello-2.12.1.tar.gz
From https://ftpmirror.gnu.org/gnu/hello/hello-2.12.1.tar.gz...
following redirection to `https://mirror.cyberbits.eu/gnu/hello/hello-2.12.1.tar.gz'...
downloading from https://ftpmirror.gnu.org/gnu/hello/hello-2.12.1.tar.gz ...

warning: rewriting hashes in `/gnu/store/3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1.tar.gz'; cross fingers
--8<---------------cut here---------------end--------------->8---

Put differently, when speaking about robustness (in a broad sense), I
think we cannot assume that the “content addressing” provided by the
derivation,

--8<---------------cut here---------------start------------->8---
Derive
([("out","/gnu/store/3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1.tar.gz","sha256","8d99142afd92576f30b0cd7cb42a8dc6809998bc5d607d88761f512e26c7db20")]
 ,[]
 ,["/gnu/store/0mxnx8l4fgigvd7gakwdk6hc6im4wnai-disarchive-mirrors","/gnu/store/ckxc05iflc8jagdxwh4z1cxc23mb6i6q-mirrors","/gnu/store/wg1yp2vx8gb7qmcgyibqnwblahpp4bjg-content-addressed-mirrors"]
 ,"x86_64-linux","builtin:download",[]
 ,[("content-addressed-mirrors","/gnu/store/wg1yp2vx8gb7qmcgyibqnwblahpp4bjg-content-addressed-mirrors")
   ,("disarchive-mirrors","/gnu/store/0mxnx8l4fgigvd7gakwdk6hc6im4wnai-disarchive-mirrors")
   ,("impureEnvVars","http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS")
   ,("mirrors","/gnu/store/ckxc05iflc8jagdxwh4z1cxc23mb6i6q-mirrors")
   ,("out","/gnu/store/3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1.tar.gz")
   ,("preferLocalBuild","1")
   ,("url","\"mirror://gnu/hello/hello-2.12.1.tar.gz\"")])
--8<---------------cut here---------------end--------------->8---

is still there; when it is not, Guix has to rely on another system
(here ’url’).  In a way, I am proposing to optionally add more “content
addressing” identifiers than the current NAR+SHA256 (and URL/URI), so
that Guix can exploit other “content addressing” systems.
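
To illustrate why some bridge is needed at all: the same bytes get
unrelated identifiers under different content-addressing schemes.  Here
is a minimal Python sketch, assuming a single file rather than an
unpacked tree.  The function names are made up for illustration; only
the flat SHA-256, the Git blob hash and the ’swh:1:cnt:’ form derived
from it are covered (a NAR hash or a ’swh:1:dir:’ identifier would need
a tree serializer that is not written here).

```python
import hashlib


def flat_sha256(data: bytes) -> str:
    """SHA-256 over the raw bytes, with no serialization step."""
    return hashlib.sha256(data).hexdigest()


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 as Git computes it for a blob: header, NUL byte, content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def swhid_content(data: bytes) -> str:
    """SWHID of a single file: the Git blob hash with a 'swh:1:cnt:' prefix."""
    return "swh:1:cnt:" + git_blob_sha1(data)


def identifiers(data: bytes) -> dict:
    """Collect every identifier this sketch knows how to compute."""
    return {
        "sha256": flat_sha256(data),
        "git-blob-sha1": git_blob_sha1(data),
        "swhid": swhid_content(data),
    }


if __name__ == "__main__":
    for scheme, value in identifiers(b"hello\n").items():
        print(scheme, value)
```

None of these values can be derived from another without the content
itself, which is why either a lookup service or a stored list of
identifiers is needed to move from one system to the next.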


> So it seems to me that the basics are already in place.

Well, there are two possible choices: (1) rely on an external service
that would bridge the different content addressing systems (such as
extending the Disarchive database, or hoping SWH will do it :-)), but
this external service needs to be always available; or (2) extend the
information carried by packages (optional fields, etc.).

Moreover, with (1), all third-party channels would have to be ingested
by this external service.  For SWH, that’s possible.  For the Disarchive
database, it would mean registering each third-party channel, or having
each channel maintain its own database.  With (2), by contrast, the
identifier would optionally be part of the package definition.


> What’s missing, both in SWH and in Guix, is the ability to store
> multiple hashes.  SWH could certainly store several hashes, computed
> using different serialization and hash algorithm combinations.

Please note that Guix currently relies on a “hint” when SWH is used as
a fallback.  For instance, in most git-fetch cases, Guix provides the
context (URL and Git tag) to the SWH API and lets SWH resolve it in
order to find the content addressing identifier.  It works in many
cases, but it fails for history-of-history cases, e.g., when upstream
does in-place tag replacement.
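
The failure mode can be shown with a toy model.  This Python sketch is
not how SWH works internally; ’ToyArchive’ and its methods are
hypothetical, and it assumes the lookup by (URL, tag) only sees the
latest state of the tag.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Intrinsic identifier: depends only on the bytes themselves."""
    return hashlib.sha256(data).hexdigest()


class ToyArchive:
    """Hypothetical archive with an intrinsic and an extrinsic index."""

    def __init__(self):
        self.by_id = {}    # content id -> bytes, never overwritten
        self.by_tag = {}   # (url, tag) -> content id, latest visit wins

    def ingest(self, url: str, tag: str, data: bytes) -> str:
        cid = content_id(data)
        self.by_id[cid] = data
        self.by_tag[(url, tag)] = cid  # an in-place re-tag overwrites this
        return cid

    def lookup_by_tag(self, url: str, tag: str) -> bytes:
        """Extrinsic lookup: resolve the context, then fetch."""
        return self.by_id[self.by_tag[(url, tag)]]

    def lookup_by_id(self, cid: str) -> bytes:
        """Intrinsic lookup: fetch directly by content identifier."""
        return self.by_id[cid]


if __name__ == "__main__":
    url = "https://example.org/repo.git"  # placeholder URL
    archive = ToyArchive()
    pinned = archive.ingest(url, "v1.0", b"original release")
    archive.ingest(url, "v1.0", b"re-tagged release")  # upstream moves the tag

    via_tag = archive.lookup_by_tag(url, "v1.0")
    print(content_id(via_tag) == pinned)                    # hash mismatch
    print(archive.lookup_by_id(pinned) == b"original release")
```

The old content is still archived; what is lost is the path from the
extrinsic context to it, unless the package also records an identifier
the archive can be queried with directly.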

And this strategy does not work with Subversion (svn-fetch), Mercurial
(hg-fetch) or other methods.  It requires more work on our side (parse
the result of the query, extract the relevant information, etc.).
Nothing impossible, but far from done, IMHO. :-)

Well, I still have mixed feelings about the SWH fallback robustness. :-)


> This is what you suggested at
> <https://gitlab.softwareheritage.org/swh/meta/-/issues/4538>; it was
> also discussed in the thread at
> <https://sympa.inria.fr/sympa/arc/swh-devel/2016-07/msg00019.html>.  It
> would be awesome if SWH would store Nar hashes; that would solve all our
> problems, as you explained.

Yeah, that’s nice. :-)  The progress is tracked at

    https://gitlab.softwareheritage.org/swh/meta/-/issues/4979

and the first part for computing NAR is now merged, IIUC, with:

    https://gitlab.softwareheritage.org/swh/devel/swh-loader-core/-/merge_requests/459

However, exposing this NAR hash via their API and then bridging
NAR -> swhid is not planned on the SWH side yet, AFAIK.


> The other option—storing multiple hashes for each origin in Guix—doesn’t
> sound practical: I can’t imagine packages storing and updating more than
> one content hash per package.  That doesn’t sound reasonable.  Plus it
> would be a long-term solution and wouldn’t help today.

Storing a list of content addressing identifiers (NAR+SHA256, Git+SHA1,
GNUnet, IPFS, etc.) would add robustness, IMHO.

Put differently, it is not practical to have a ’gnunet-fetch’ method as
proposed in [1], but we could optionally have,

     (origin
       (method url-fetch)
       (uri (string-append "mirror://gnu/hello/hello-" version
                           ".tar.gz"))
       (sha256
        (base32
         "086vqwk2wl8zfs47sq2xpjc9k066ilmb8z6dn0q6ymwjzlm196cd"))
       (identifiers
        (list
         (gnunet "Y48PGS5RVX643NT2B7GDNFCBT4DWG692PF4YNHERR96K6MSFRZ4ZWRPQ4KVKZV29MGRZTWAMY9ETTST4B6VFM47JR2JS5PWBTPVXB0.8A9HRYABJ7HDA7B0")
         (git+sha1 "swh:1:dir:013573086777370b558b1a9ecb6d0dca9bb8ea18")
         (none+sha1 "8f261739d33d31867ab9c5fa26f973c37da26ca5"))))

And we could also have the Git commit hash (for packages using the
git-fetch method), etc.

An optional ’identifiers’ field would help today for all fetch methods
other than url-fetch and git-fetch.

For sure, it is not straightforward.  For instance, how to ensure
consistency?  Via “guix lint”?  Something else?
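
As a rough idea of what such a check could look like, here is a Python
sketch that recomputes the identifiers it knows about from the fetched
bytes and reports the others as unverified.  This is not an existing
“guix lint” checker; ’check_identifiers’ and the scheme names are
assumptions, with ’none+sha1’ read as a plain SHA-1 over the file.  The
base32 encoder follows the Nix-style alphabet and bit order used for
the ’sha256’ field of origins.

```python
import hashlib

# Nix-style base32 alphabet: no 'e', 'o', 't' or 'u'.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode a digest the way the (base32 "...") strings are written."""
    length = (len(digest) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = digest[i] >> j
        if i + 1 < len(digest):
            c |= digest[i + 1] << (8 - j)
        chars.append(NIX_BASE32[c & 0x1F])
    return "".join(chars)


# Schemes this sketch can recompute from the bytes alone.
RECOMPUTE = {
    "sha256": lambda data: nix_base32(hashlib.sha256(data).digest()),
    "none+sha1": lambda data: hashlib.sha1(data).hexdigest(),
}


def check_identifiers(declared: dict, data: bytes) -> dict:
    """Return 'ok', 'mismatch' or 'unverified' for each declared scheme."""
    report = {}
    for scheme, value in declared.items():
        recompute = RECOMPUTE.get(scheme)
        if recompute is None:
            report[scheme] = "unverified"  # e.g. gnunet: needs its own tooling
        elif recompute(data) == value:
            report[scheme] = "ok"
        else:
            report[scheme] = "mismatch"
    return report
```

Incidentally, passing the hexadecimal ’sha256’ from the derivation
quoted earlier through ’nix_base32’ gives the “086vqwk2…” string of the
origin above: both are the same digest in two encodings.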

Well, on the other hand, sometimes I would like to have a list of
sources using different fetch methods, say try first this url-fetch,
then this git-fetch, then this SWH fallback, etc.
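
In pseudo-real terms, such a list boils down to an ordered loop where
the content hash is the only thing trusted.  A Python sketch, assuming
each source is just a callable returning bytes; ’fetch_with_fallbacks’
and the source names are hypothetical and do not correspond to existing
Guix procedures.

```python
import hashlib
from typing import Callable, Iterable, Tuple

Fetcher = Callable[[], bytes]


def fetch_with_fallbacks(
    sources: Iterable[Tuple[str, Fetcher]], expected_sha256: str
) -> Tuple[str, bytes]:
    """Try each source in order; accept the first whose hash matches."""
    errors = []
    for name, fetch in sources:
        try:
            data = fetch()
        except Exception as exc:  # network error, missing tag, etc.
            errors.append(f"{name}: {exc}")
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return name, data
        errors.append(f"{name}: hash mismatch")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


if __name__ == "__main__":
    wanted = hashlib.sha256(b"tarball bytes").hexdigest()

    def dead_mirror() -> bytes:
        raise OSError("404 Not Found")

    sources = [
        ("url-fetch", dead_mirror),               # upstream vanished
        ("git-fetch", lambda: b"re-tagged"),      # content changed
        ("swh-fallback", lambda: b"tarball bytes"),
    ]
    print(fetch_with_fallbacks(sources, wanted)[0])
```

The order only expresses preference; since every candidate is checked
against the same hash, a bad or stale source costs time but cannot
change the result.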


To me, the other viable option would be to extend the Disarchive
database and the services around it.

Thoughts?

Cheers,
simon

1: https://issues.guix.gnu.org/44199#0-lineno68

