unofficial mirror of guix-patches@gnu.org 
From: "Ludovic Courtès" <ludo@gnu.org>
To: 68741@debbugs.gnu.org
Cc: "Ludovic Courtès" <ludo@gnu.org>,
	"Christopher Baines" <guix@cbaines.net>,
	"Josselin Poiret" <dev@jpoiret.xyz>,
	"Ludovic Courtès" <ludo@gnu.org>,
	"Mathieu Othacehe" <othacehe@gnu.org>,
	"Ricardo Wurmus" <rekado@elephly.net>,
	"Simon Tournier" <zimon.toutoune@gmail.com>,
	"Tobias Geerinckx-Rice" <me@tobias.gr>
Subject: [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage
Date: Fri, 26 Jan 2024 18:16:40 +0100
Message-ID: <cover.1706287537.git.ludo@gnu.org>

Hello Guix!

For those who’ve been following along, you might remember that the
main impedance mismatch between SWH and Guix is that SWH uses Git
tree SHA1 hashes to identify directories whereas Guix uses nar SHA256
hashes (and possibly other hash functions in the future):

  https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/

Because of this, the SWH fallback path for ‘git-download’ had two
options:

  1. If ‘git-reference’ specifies a full SHA1 commit ID, Guix would
     look it up on SWH and fetch it.

  2. If ‘git-reference’ specifies a tag, which is perhaps the
     majority of cases, Guix would ask SWH for the commit that once
     corresponded to that tag at that URL, and then fetch it.

Case #1 is ideal: it’s content-addressed.  Case #2 is brittle: we’re
hoping that the tag hasn’t been modified and that the URL hasn’t been
reused for something else; if that’s not the case, SWH might return
the “wrong” commit and we end up fetching something unrelated.

The good news is that our friends at SWH have just deployed a new
version of their code that lets us look up directories by some
“external identifier” (“ExtID”), among which there’s ‘nar-sha256’:

  https://archive.softwareheritage.org/api/1/extid/doc/
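
To make the lookup concrete, here is a rough sketch of how such a
request URL could be built.  It assumes the URL layout described in
the documentation linked above, namely /api/1/extid/TYPE/FORMAT:VALUE/
with “hex” as the format; ‘extid-url’ and ‘%swh-base-url’ are
hypothetical names used for illustration, not the bindings added by
the patches.

```scheme
;; Sketch only: `extid-url' is a hypothetical helper, not part of (guix swh).
(define %swh-base-url
  "https://archive.softwareheritage.org")

(define (extid-url type hex-value)
  "Return the ExtID lookup URL for HEX-VALUE, a hexadecimal string, of the
given TYPE, a string such as \"nar-sha256\"."
  ;; Assumed layout: /api/1/extid/TYPE/FORMAT:VALUE/ with FORMAT = "hex".
  (string-append %swh-base-url "/api/1/extid/" type
                 "/hex:" hex-value "/"))

;; Print the lookup URL for the nar-sha256 hash of guile-sqlite3 0.1.3.
(display
 (extid-url "nar-sha256"
            "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63"))
(newline)
```

A GET request on that URL is expected to return a JSON record that
includes the SWHID of the matching directory.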

And that, my friends, makes a huge difference: the impedance mismatch
is gone, we can now use content-addressing to fetch our stuff from SWH!!
And that works not just for Git, but also for Mercurial, SVN, CVS, etc.

Well, there’s a caveat: currently the ‘nar-sha256’ ExtID is added only
on new visits, and it’s apparently not being added yet for Mercurial,
for reasons that are unclear.  So right now, we can get guile-sqlite3
0.1.3 (Git) by nar-sha256, but we cannot get guile-wisp (hg), nor in
fact most things.  That’ll improve over time though, and SWH comrades
are open to adding those ExtIDs retroactively.

The patches that follow do several things:

  1. Follow redirects in the Vault: (guix swh) previously did not
     do that (oops!) but the newly-deployed Vault now responds with
     302 redirects so we have to handle that.

  2. Add bindings for the ExtID HTTP interface.

  3. Add ‘swh-download-directory-by-nar-hash’, which does what it
     says.

  4. Use that as the preferred fallback method for ‘git-fetch’.
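
The “preferred fallback” idea in item 4 can be sketched as trying the
download methods in order and stopping at the first one that succeeds.
This is only an illustration: ‘try-in-order’, ‘download-by-nar-hash’
and ‘download-by-commit’ are hypothetical stand-ins, and the actual
ordering and procedures in the patches may differ.

```scheme
(use-modules (srfi srfi-1))               ;for `any'

;; Hypothetical stand-ins for the real download procedures: each takes the
;; output directory and returns #t on success, #f on failure.
(define (download-by-nar-hash output) #f) ;pretend SWH lacks the ExtID
(define (download-by-commit output) #t)   ;pretend the commit lookup works

(define (try-in-order methods output)
  "Call each procedure in METHODS, an alist of (NAME . PROC) pairs, with
OUTPUT until one returns true.  Return the name of the method that
succeeded, or #f if all of them failed."
  (any (lambda (entry)
         (let ((name (car entry))
               (proc (cdr entry)))
           (and (proc output) name)))
       methods))

(define %fallbacks
  `((nar-hash . ,download-by-nar-hash)    ;content-addressed, tried first
    (commit . ,download-by-commit)))      ;commit or tag lookup, tried next

(display (try-in-order %fallbacks "/tmp/example"))
(newline)
```

With the stand-ins above, the nar-hash method fails and the sketch
prints ‘commit’; if the ExtID is known to SWH, the content-addressed
method would win and the later ones would never run.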

Here’s a REPLshot:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (lookup-external-id "nar-sha256" (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))))
$43 = #<<external-id> value: "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63" type: "nar-sha256" version: 0 target: "swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153" target-url: "https://archive.softwareheritage.org/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153">
scheme@(guile-user)> (swh-download-directory-by-nar-hash (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))) 'sha256 "/tmp/gsql")
SWH: found directory with nar-sha256 hash 0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63 at 'swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153'
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/.gitignore
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/AUTHORS
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING.LESSER
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/ChangeLog
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/Makefile.am
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/NEWS
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/README
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/guile.am
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/test-driver.scm
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/configure.ac
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/env.in
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/sqlite3.scm.in
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/
swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/basic.scm
$46 = #t
--8<---------------cut here---------------end--------------->8---

Huge thanks to everyone over at #swh-devel for helping me out
over the past few days!

Next tasks: implement download fallback for ‘hg-fetch’, change
‘guix lint -c archival’ to make ‘save-origin’ requests not just
for Git repos, and assess the situation with SVN and sub-directories
to see what can be done.

Thoughts?

Ludo’.

PS: Apologies for the wall of text!

Ludovic Courtès (6):
  swh: ‘vault-fetch’ follows redirects.
  swh: Add bindings for the “ExtID” API.
  swh: Add ‘swh-download-directory-by-nar-hash’.
  lint: archival: Check with ‘lookup-directory-by-nar-hash’.
  git-download: Download from SWH by nar hash when possible.
  swh: Fix docstring of ‘lookup-directory’.

 guix/build/git.scm                |  20 ++++--
 guix/git-download.scm             |   4 +-
 guix/lint.scm                     |  28 +++++---
 guix/scripts/perform-download.scm |   4 +-
 guix/swh.scm                      | 113 ++++++++++++++++++++++++++----
 tests/lint.scm                    |  33 +++++++--
 tests/swh.scm                     |  21 +++++-
 7 files changed, 189 insertions(+), 34 deletions(-)


base-commit: 8bee6bb9aaaf35c36fe325675d1eb2daebd69c25
-- 
2.41.0





