From: "Ludovic Courtès" <ludo@gnu.org>
To: 68741@debbugs.gnu.org
Cc: Timothy Sample <samplet@ngyro.com>, Antoine R. Dumont (@ardumont)
Subject: [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage
Date: Fri, 26 Jan 2024 18:25:37 +0100
Message-ID: <87y1ccm2ge.fsf@gnu.org>
In-Reply-To: <cover.1706287537.git.ludo@gnu.org> ("Ludovic Courtès"'s message of "Fri, 26 Jan 2024 18:16:40 +0100")

Oops, I forgot to Cc: the fine people for the cover letter; fixed!

See <https://issues.guix.gnu.org/68741>.

Ludovic Courtès <ludo@gnu.org> writes:

> Hello Guix!
>
> Those of you who’ve been following along might remember that the
> main impedance mismatch between SWH and Guix is that SWH uses Git
> tree SHA1 hashes to identify directories whereas Guix uses nar SHA256
> hashes (and possibly other hash functions in the future):
>
> https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
>
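To make the mismatch concrete, here is a minimal sketch of the two ways
of hashing the same directory.  It is written in Python rather than Guile
only so that it runs without a Guix checkout.  The nar layout follows the
format as I understand it (length-prefixed strings padded to 8 bytes,
entries sorted by name).  The helper names (nar_string, nar_node,
nar_sha256, git_sha1) are made up for this note and are not Guix's, and
the in-memory tree handles only regular non-executable files and
directories, so symlinks and executable bits are left out.

```python
import hashlib
import struct


def nar_string(data: bytes) -> bytes:
    """Encode DATA nar-style: 64-bit little-endian length, bytes, zero padding to 8."""
    padding = (8 - len(data) % 8) % 8
    return struct.pack("<Q", len(data)) + data + b"\0" * padding


def nar_node(node) -> bytes:
    """Serialize NODE: bytes is a regular file, dict (name -> node) is a directory."""
    out = nar_string(b"(") + nar_string(b"type")
    if isinstance(node, bytes):
        out += nar_string(b"regular") + nar_string(b"contents") + nar_string(node)
    else:
        out += nar_string(b"directory")
        for name in sorted(node):  # directory entries are sorted by name
            out += (nar_string(b"entry") + nar_string(b"(")
                    + nar_string(b"name") + nar_string(name.encode())
                    + nar_string(b"node") + nar_node(node[name])
                    + nar_string(b")"))
    return out + nar_string(b")")


def nar_sha256(node) -> str:
    """Hex SHA256 of the nar serialization of NODE."""
    return hashlib.sha256(nar_string(b"nix-archive-1") + nar_node(node)).hexdigest()


def git_object_sha1(kind: bytes, body: bytes) -> bytes:
    """Raw SHA1 of a Git object: '<kind> <size>' header, NUL, then the body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def git_sha1(node) -> bytes:
    """Raw SHA1 of NODE as a Git blob (bytes) or Git tree (dict)."""
    if isinstance(node, bytes):
        return git_object_sha1(b"blob", node)

    def sort_key(name):
        # Git sorts directory names as if they ended with "/".
        return (name + "/" if isinstance(node[name], dict) else name).encode()

    body = b""
    for name in sorted(node, key=sort_key):
        mode = b"40000" if isinstance(node[name], dict) else b"100644"
        body += mode + b" " + name.encode() + b"\0" + git_sha1(node[name])
    return git_object_sha1(b"tree", body)


# Hypothetical two-file checkout, used only for illustration.
tree = {"README": b"hello\n", "src": {"main.scm": b"(display 1)\n"}}
print("nar-sha256:   ", nar_sha256(tree))
print("git tree sha1:", git_sha1(tree).hex())
```

The two digests are computed over different serializations of the same
files, so neither can be derived from the other without the content
itself; that is why a server-side mapping is needed.
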
> Because of this, the SWH fallback path for ‘git-download’ had two
> options:
>
>   1. If ‘git-reference’ specifies a full SHA1 commit ID, Guix would
>      look it up on SWH and fetch it.
>
>   2. If ‘git-reference’ specifies a tag, which is perhaps the
>      majority of cases, Guix would ask SWH for the commit that once
>      corresponded to that tag at that URL, and then fetch it.
>
> Case #1 is ideal: it’s content-addressed. Case #2 is brittle: we’re
> hoping that the tag hasn’t been modified and that the URL hasn’t been
> reused for something else; if that’s not the case, SWH might return
> the “wrong” commit and we end up fetching something unrelated.
>
> The good news is that our friends at SWH have just deployed a new
> version of their code that lets us look up directories by some
> “external identifier” (“ExtID”), among which there’s ‘nar-sha256’:
>
> https://archive.softwareheritage.org/api/1/extid/doc/
>
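For readers who want to poke at the endpoint by hand, here is a rough
Python sketch of how a lookup URL could be built and how the answer could
be checked.  The URL layout (type, then "hex:" plus the hash, then an
"extid_version" query parameter) and the "target" field name are my
reading of the documentation linked above, so treat them as assumptions;
extid_url and directory_swhid are illustrative names, not part of
(guix swh).  Nothing here touches the network.

```python
from urllib.parse import quote, urlencode

SWH_API = "https://archive.softwareheritage.org/api/1"


def extid_url(extid_type: str, hex_value: str, version: int = 0) -> str:
    """Return the (assumed) ExtID lookup URL for a hex-encoded identifier."""
    path = f"/extid/{quote(extid_type, safe='')}/hex:{quote(hex_value, safe='')}/"
    return SWH_API + path + "?" + urlencode({"extid_version": version})


def directory_swhid(response: dict) -> str:
    """Return the target of a decoded JSON response, insisting it is a directory."""
    target = response["target"]  # field name assumed from the API documentation
    if not target.startswith("swh:1:dir:"):
        raise ValueError(f"unexpected ExtID target: {target}")
    return target


# nar-sha256 of guile-sqlite3 0.1.3, taken from the REPL transcript in this message.
nar_hash = "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63"
print(extid_url("nar-sha256", nar_hash))
```

Checking that the target is a "swh:1:dir:" identifier is a cheap guard:
a nar hash describes a directory, so any other kind of object would
indicate a mix-up.
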
> And that, my friends, makes a huge difference: the impedance mismatch
> is gone, we can now use content-addressing to fetch our stuff from SWH!!
> And that works not just for Git, but also for Mercurial, SVN, CVS, etc.
>
> Well, there’s a caveat: currently the ‘nar-sha256’ ExtID is added only
> on new visits, and it’s apparently not being added yet for Mercurial,
> for unclear reasons. So right now, we can get guile-sqlite3 0.1.3 (Git)
> by nar-sha256, but we cannot get guile-wisp (hg) nor, in fact, most
> things. That’ll improve over time though, and SWH comrades are open to
> adding those ExtIDs retroactively.
>
> The patches that follow do several things:
>
>   1. Follow redirects in the Vault: (guix swh) previously did not
>      do that (oops!) but the newly-deployed Vault now responds with
>      302 redirects so we have to handle that.
>
>   2. Add bindings for the ExtID HTTP interface.
>
>   3. Add ‘swh-download-directory-by-nar-hash’, which does what it
>      says.
>
>   4. Use that as the preferred fallback method for ‘git-fetch’.
>
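The resulting order of SWH fallbacks can be pictured with the small
Python sketch below.  It only models the decision, not the downloads:
swh_fallback_plan and the strategy labels are hypothetical names, and
the actual logic lives in the Guile code touched by the patches.

```python
import re
from typing import List, Optional, Tuple

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def swh_fallback_plan(commit_or_tag: str,
                      nar_sha256: Optional[str]) -> List[Tuple[str, str]]:
    """List the SWH lookups to attempt, most trustworthy first."""
    plan = []
    if nar_sha256 is not None:
        # Content-addressed: the archive either has exactly this directory or not.
        plan.append(("by-nar-hash", nar_sha256))
    if FULL_SHA1.fullmatch(commit_or_tag):
        # Also content-addressed, via the commit ID.
        plan.append(("by-commit", commit_or_tag))
    else:
        # Brittle: relies on the tag and the URL not having changed meaning.
        plan.append(("by-tag", commit_or_tag))
    return plan


for step in swh_fallback_plan("v0.1.3", "0b56ba94c2b8"):  # truncated hash, demo only
    print(step)
```

Whatever path succeeds, the result is still checked against the origin's
declared hash, so the ordering affects how often we get the right thing
on the first try rather than what is ultimately accepted.
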
> Here’s a REPLshot:
>
> scheme@(guile-user)> (lookup-external-id "nar-sha256" (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))))
> $43 = #<<external-id> value: "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63" type: "nar-sha256" version: 0 target: "swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153" target-url: "https://archive.softwareheritage.org/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153">
> scheme@(guile-user)> (swh-download-directory-by-nar-hash (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))) 'sha256 "/tmp/gsql")
> SWH: found directory with nar-sha256 hash 0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63 at 'swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153'
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/.gitignore
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/AUTHORS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING.LESSER
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/ChangeLog
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/Makefile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/NEWS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/README
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/guile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/test-driver.scm
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/configure.ac
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/env.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/sqlite3.scm.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/basic.scm
> $46 = #t
>
> Huge thanks to everyone over at #swh-devel for helping me out
> over the past few days!
>
> Next tasks: implement download fallback for ‘hg-fetch’, change
> ‘guix lint -c archival’ to make ‘save-origin’ requests not just
> for Git repos, and assess the situation with SVN and sub-directories
> to see what can be done.
>
> Thoughts?
>
> Ludo’.
>
> PS: Apologies for the wall of text!
>
> Ludovic Courtès (6):
>   swh: ‘vault-fetch’ follows redirects.
>   swh: Add bindings for the “ExtID” API.
>   swh: Add ‘swh-download-directory-by-nar-hash’.
>   lint: archival: Check with ‘lookup-directory-by-nar-hash’.
>   git-download: Download from SWH by nar hash when possible.
>   swh: Fix docstring of ‘lookup-directory’.
>
>  guix/build/git.scm                |  20 ++++--
>  guix/git-download.scm             |   4 +-
>  guix/lint.scm                     |  28 +++++---
>  guix/scripts/perform-download.scm |   4 +-
>  guix/swh.scm                      | 113 ++++++++++++++++++++++++++----
>  tests/lint.scm                    |  33 +++++++--
>  tests/swh.scm                     |  21 +++++-
>  7 files changed, 189 insertions(+), 34 deletions(-)
>
>
> base-commit: 8bee6bb9aaaf35c36fe325675d1eb2daebd69c25