From: "Ludovic Courtès" <ludo@gnu.org>
To: 44187@debbugs.gnu.org
Subject: bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones
Date: Fri, 10 Sep 2021 16:34:12 +0200
Message-ID: <20210910143415.14783-1-ludo@gnu.org>
In-Reply-To: <87pn0dk61v.fsf@gnu.org>

Hi!

A bit of context: we already had automatic SWH fallback for Git checkouts,
which is to say that any origin that uses ‘git-fetch’ would have its
checkout transparently fetched from SWH if upstream vanished (this
dates back to commit 608d3dca89d73fe7260e97a284a8aeea756a3e11, Nov. 2018).

What this patch series provides is SWH fallback for full Git clones (as
opposed to flat checkouts).  It works for anything that uses (guix git).
That includes <git-checkout>, used by transformation options:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build footswitch --with-git-url=footswitch=http://example.org/sdf --with-commit=footswitch=1eabc563ca5692b3e08d84f1f0e6fd2283284469 -n
updating checkout of 'http://example.org/sdf'...
SWH: found revision 1eabc563ca5692b3e08d84f1f0e6fd2283284469 with directory at 'https://archive.softwareheritage.org/api/1/directory/ad8976564375ee55f645387bbcdf4b66e6582fbf/'
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/HEAD
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/branches/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/config
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/description
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/applypatch-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/commit-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/fsmonitor-watchman.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/post-update.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-applypatch.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-commit.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-push.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-rebase.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-receive.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/prepare-commit-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/update.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/exclude
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/refs
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/info/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/info/packs
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/pack-ed28f44a2599fe2d0a5f1b1a84c247c43afd14a1.idx
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/pack-ed28f44a2599fe2d0a5f1b1a84c247c43afd14a1.pack
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/heads/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/heads/master
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/tags/
retrieved commit 1eabc563ca5692b3e08d84f1f0e6fd2283284469
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bayfront.guix.gnu.org'... 100.0%
The following derivation would be built:
   /gnu/store/39kzsy5kgj5150q6zgckc2hbxp999adw-footswitch-git.1eabc56.drv
--8<---------------cut here---------------end--------------->8---

In the example above, we pass a bogus Git URL, but since the target
commit is known, (guix git) automatically fetches a bare Git repository
from the SWH vault.
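
To make the control flow concrete, here is a minimal sketch of the fallback
described above, written in Python for illustration only.  The real code
lives in (guix git) and is written in Scheme; every name below
(`clone_with_fallback`, `clone_upstream`, `clone_from_archive`, `CloneError`,
`swhid_for_revision`) is hypothetical.  The only detail taken from the log
is the `swh:1:rev:<commit>` identifier format.

```python
import re
from typing import Callable, Optional

# A full Git commit ID: 40 lowercase hexadecimal digits.
_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


class CloneError(Exception):
    """Raised by a fetcher that cannot produce a repository (hypothetical)."""


def swhid_for_revision(commit: str) -> str:
    """Build the 'swh:1:rev:...' identifier that appears in the log."""
    if not _SHA1_HEX.fullmatch(commit):
        raise ValueError(f"not a full SHA1 commit ID: {commit!r}")
    return f"swh:1:rev:{commit}"


def clone_with_fallback(
    url: str,
    commit: Optional[str],
    clone_upstream: Callable[[str], str],
    clone_from_archive: Callable[[str], str],
) -> str:
    """Return the clone's location, trying upstream first, then the archive."""
    try:
        return clone_upstream(url)
    except CloneError:
        if commit is None:
            # Without a known commit there is nothing to look up in the
            # archive, so the original failure is propagated.
            raise
        return clone_from_archive(swhid_for_revision(commit))
```

The two fetchers are passed in as plain callables so that the decision logic
can be exercised without network access.  Note that the fallback is only
possible because the commit is known: the URL alone says nothing about what
to ask the archive for.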

It also works for channels, which is what zimoun reported here:

--8<---------------cut here---------------start------------->8---
$ cat /tmp/chan.scm
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "f91ae9425bb385b60396a544afe27933896b8fa3")
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
             "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
      (channel
       (name 'guix-past)
       (url "https://does-not-exist.inria.fr/guix-hpc/guix-past")
       (commit "77e183dc7ade307ad3409fad4b71f12e266de910")
       #;(introduction
        (make-channel-introduction
         "0c119db2ea86a389769f4d2b9c6f5c41c027e336"
         (openpgp-fingerprint
          "3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5")))))
$ ./pre-inst-env guix time-machine -C /tmp/chan.scm -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Updating channel 'guix-past' from Git repository at 'https://does-not-exist.inria.fr/guix-hpc/guix-past'...
SWH: found revision 77e183dc7ade307ad3409fad4b71f12e266de910 with directory at 'https://archive.softwareheritage.org/api/1/directory/7c6aa10e1e0fa54199566145c6a453731872b87d/'
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/HEAD
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/branches/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/config
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/description
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/hooks/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/exclude
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/refs
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/info/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/info/packs
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/pack-e6c0a4813509178eed735708dd60503353a50b9c.idx
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/pack-e6c0a4813509178eed735708dd60503353a50b9c.pack
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/heads/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/heads/master
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/tags/
Computing Guix derivation for 'x86_64-linux'... \  C-c C-c
--8<---------------cut here---------------end--------------->8---

Here, the ‘guix-past’ channel is transparently cloned from SWH.  This
is pretty cool, because having the whole repo around is what permits
things like downgrade prevention¹ and news support².

  Finally we can enjoy content-addressability and brittle URLs
  are becoming a thing of the past!*


Limitations
~~~~~~~~~~~

Yes, there are a few of them.

First, fallback is implemented only for fresh clones, not for updates.
Thus, if I rerun the first example with a different commit, now that the
clone is in ~/.cache/guix/checkouts, I get:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build footswitch --with-git-url=footswitch=http://example.org/sdf --with-commit=footswitch=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -n
updating checkout of 'http://example.org/sdf'...
guix build: error: Git failure while fetching http://example.org/sdf: unexpected http status code: 404
--8<---------------cut here---------------end--------------->8---

Second, clones from SWH only contain the one branch that the revision
is on.  For channels, that means that the ‘keyring’ branch is not fetched,
which is why I commented out ‘introduction’ in /tmp/chan.scm above.
If I uncomment it, I get:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix time-machine -C /tmp/chan.scm -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Updating channel 'guix-past' from Git repository at 'https://does-not-exist.inria.fr/guix-hpc/guix-past'...
guix time-machine: error: Git error: cannot locate remote-tracking branch 'origin/keyring'
--8<---------------cut here---------------end--------------->8---

The SWH folks tell me it’ll eventually be possible to map a revision
to its containing snapshot(s) via the HTTP API, and to obtain entire
snapshots (i.e., the repo and all its branches) from the vault.  That’s
what we need to fix this issue.

*Third, and this answers the asterisk above, we must keep in mind that
this is content-addressability *with SHA1*.  Generating a chosen-prefix
collision is becoming affordable³, so users absolutely need an additional
mechanism to authenticate code they fetched.
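
As a reminder of why the hash function matters here, the sketch below shows
how Git derives an object ID: it is the SHA1 of a short header followed by
the object's bytes.  This is an illustration in Python, not code from the
patch series, and the function name `git_object_id` is made up.  Commit IDs
are computed the same way with a "commit" header, so an ID identifies content
only as strongly as SHA1 resists collisions.

```python
import hashlib


def git_object_id(kind: str, data: bytes) -> str:
    """Return the Git object ID for DATA stored as an object of type KIND."""
    # Git hashes "<type> <size in bytes>\0" followed by the raw content.
    header = f"{kind} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known ID of the empty blob
print(git_object_id("blob", b""))
```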

For origins, we have the content SHA256, so we’re fine.  For channels,
we have Guix’s authentication mechanism¹, except it’s not available yet
via SWH, as I wrote above.  For the footswitch example above using
‘--with-commit’, we don’t have any authentication method, but in fact,
that’s the situation of Git repositories in general: they can rarely be
authenticated.
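
The origin case can be pictured as a check of the fetched bytes against a
SHA256 recorded in advance.  The Python sketch below is a simplification
under stated assumptions: `verify_sha256` is a hypothetical helper, it hashes
a flat byte string, and it compares hexadecimal digests, whereas Guix may
hash a serialization of a whole directory and prints hashes in base32.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True if DATA hashes to EXPECTED_HEX under SHA256."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_hex.lower())
```

With such a check in place, it does not matter whether the bytes came from
upstream or from an archive: content that does not match the recorded hash
is rejected either way.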

Overall, I think it’s a step in the right direction.

Thoughts?

Thanks to vlorentz and olasd on #swh-devel for their support!

Thanks,
Ludo’.

¹ https://guix.gnu.org/en/blog/2020/securing-updates/
² https://guix.gnu.org/en/blog/2019/spreading-the-news/
³ https://sha-mbles.github.io/

Ludovic Courtès (3):
  swh: Support downloads of bare Git repositories.
  git: 'update-cached-checkout' can fall back to SWH when cloning.
  git: 'reference-available?' recognizes 'tag-or-commit'.

 guix/git.scm | 45 +++++++++++++++++++++++++++++++++++++++++++--
 guix/swh.scm | 52 ++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 83 insertions(+), 14 deletions(-)

-- 
2.33.0




