unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: 68741@debbugs.gnu.org
Cc: Timothy Sample <samplet@ngyro.com>, Antoine R. Dumont (@ardumont)
Subject: [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage
Date: Fri, 26 Jan 2024 18:25:37 +0100	[thread overview]
Message-ID: <87y1ccm2ge.fsf@gnu.org> (raw)
In-Reply-To: <cover.1706287537.git.ludo@gnu.org> ("Ludovic Courtès"'s message of "Fri, 26 Jan 2024 18:16:40 +0100")

Oops, I forgot to Cc: the fine people for the cover letter; fixed!

See <https://issues.guix.gnu.org/68741>.

Ludovic Courtès <ludo@gnu.org> skribis:

> Hello Guix!
>
> For those who’ve been following along, you might remember that the
> main impedance mismatch between SWH and Guix is that SWH uses Git
> tree SHA1 hashes to identify directories whereas Guix uses nar SHA256
> hashes (and possibly other hash functions in the future):
>
>   https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
>
> Because of this, the SWH fallback path for ‘git-download’ had two
> options:
>
>   1. If ‘git-reference’ specifies a full SHA1 commit ID, it would
>      look it up on SWH and fetch it.
>
>   2. If ‘git-reference’ specifies a tag, which is perhaps the
>      majority of cases, Guix would ask SWH the commit that once
>      corresponded to that tag at that URL, and then fetch it.
>
> Case #1 is ideal: it’s content-addressed.  Case #2 is brittle: we’re
> hoping that the tag hasn’t been modified and that the URL hasn’t been
> reused for something else; if that’s not the case, SWH might return
> the “wrong” commit and we end up fetching something unrelated.
>
> The good news is that our friends at SWH have just deployed a new
> version of their code that lets us look up directories by some
> “external identifier” (“ExtID”), among which there’s ‘nar-sha256’:
>
>   https://archive.softwareheritage.org/api/1/extid/doc/
>
> And that, my friends, makes a huge difference: the impedance mismatch
> is gone, we can now use content-addressing to fetch our stuff from SWH!!
> And that works not just for Git, but also for Mercurial, SVN, CVS, etc.
>
> Well, there’s a caveat: currently the ‘nar-sha256’ is added only on
> new visits and it’s apparently not being added yet for Mercurial for
> unclear reasons.  So right now, we can get guile-sqlite3 0.1.3 (Git) by
> nar-sha256, but we cannot get guile-wisp (hg) nor in fact most things.
> That’ll improve over time though, and SWH comrades are open to adding
> those ExtIDs retroactively.
>
> The patches that follow do several things:
>
>   1. Follow redirects in the Vault: (guix swh) previously did not
>      do that (oops!) but the newly-deployed Vault now responds with
>      302 redirects so we have to handle that.
>
>   2. Add bindings for the ExtID HTTP interface.
>
>   3. Add ‘swh-download-directory-by-nar-hash’, which does what it
>      says.
>
>   4. Use that as the preferred fallback method for ‘git-fetch’.
>
> Here’s a REPLshot:
>
> scheme@(guile-user)> (lookup-external-id "nar-sha256" (content-hash-value(origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))) )
> $43 = #<<external-id> value: "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63" type: "nar-sha256" version: 0 target: "swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153" target-url: "https://archive.softwareheritage.org/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153">
> scheme@(guile-user)> (swh-download-directory-by-nar-hash (content-hash-value(origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))) 'sha256 "/tmp/gsql")
> SWH: found directory with nar-sha256 hash 0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63 at 'swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153'
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/.gitignore
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/AUTHORS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING.LESSER
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/ChangeLog
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/Makefile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/NEWS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/README
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/guile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/test-driver.scm
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/configure.ac
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/env.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/sqlite3.scm.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/basic.scm
> $46 = #t
>
> Huge thanks to everyone over at #swh-devel for helping me out
> over the past few days!
>
> Next tasks: implement download fallback for ‘hg-fetch’, change
> ‘guix lint -c archival’ to make ‘save-origin’ requests not just
> for Git repos, assess the situation with SVN and sub-directories
> to see what can be done.
>
> Thoughts?
>
> Ludo’.
>
> PS: Apologies for the wall of text!
>
> Ludovic Courtès (6):
>   swh: ‘vault-fetch’ follows redirects.
>   swh: Add bindings for the “ExtID” API.
>   swh: Add ‘swh-download-directory-by-nar-hash’.
>   lint: archival: Check with ‘lookup-directory-by-nar-hash’.
>   git-download: Download from SWH by nar hash when possible.
>   swh: Fix docstring of ‘lookup-directory’.
>
>  guix/build/git.scm                |  20 ++++--
>  guix/git-download.scm             |   4 +-
>  guix/lint.scm                     |  28 +++++---
>  guix/scripts/perform-download.scm |   4 +-
>  guix/swh.scm                      | 113 ++++++++++++++++++++++++++----
>  tests/lint.scm                    |  33 +++++++--
>  tests/swh.scm                     |  21 +++++-
>  7 files changed, 189 insertions(+), 34 deletions(-)
>
>
> base-commit: 8bee6bb9aaaf35c36fe325675d1eb2daebd69c25




  parent reply	other threads:[~2024-01-26 17:27 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-26 17:16 [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage Ludovic Courtès
2024-01-26 17:25 ` [bug#68741] [PATCH 1/6] swh: ‘vault-fetch’ follows redirects Ludovic Courtès
2024-01-26 17:25 ` [bug#68741] [PATCH 2/6] swh: Add bindings for the “ExtID” API Ludovic Courtès
2024-01-26 17:25 ` [bug#68741] [PATCH 3/6] swh: Add ‘swh-download-directory-by-nar-hash’ Ludovic Courtès
2024-01-26 17:25 ` [bug#68741] [PATCH 4/6] lint: archival: Check with ‘lookup-directory-by-nar-hash’ Ludovic Courtès
2024-01-26 17:25 ` [bug#68741] [PATCH 5/6] git-download: Download from SWH by nar hash when possible Ludovic Courtès
2024-01-26 17:25 ` [bug#68741] [PATCH 6/6] swh: Fix docstring of ‘lookup-directory’ Ludovic Courtès
2024-01-26 17:25 ` Ludovic Courtès [this message]
2024-02-12 11:23 ` bug#68741: [PATCH 0/6] Content-addressed downloads from Software Heritage Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y1ccm2ge.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=68741@debbugs.gnu.org \
    --cc=samplet@ngyro.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).