* [bug#68741] [PATCH 1/6] swh: ‘vault-fetch’ follows redirects.
2024-01-26 17:16 [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage Ludovic Courtès
@ 2024-01-26 17:25 ` Ludovic Courtès
2024-01-26 17:25 ` [bug#68741] [PATCH 2/6] swh: Add bindings for the “ExtID” API Ludovic Courtès
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2024-01-26 17:25 UTC (permalink / raw)
To: 68741
Cc: Ludovic Courtès, Christopher Baines, Josselin Poiret,
Ludovic Courtès, Mathieu Othacehe, Ricardo Wurmus,
Simon Tournier, Tobias Geerinckx-Rice
Today, URLs like
https://archive.softwareheritage.org/api/1/vault/flat/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/raw/
redirect to https://swhvaultstorage.blob.core.windows.net/…. This
change fixes ‘vault-fetch’ to follow these.
* guix/swh.scm (http-get/follow): New procedure.
(vault-fetch): Use it instead of ‘http-get*’.
Change-Id: Id6b9585a9ce6699a2274b99c9a6d4edda1018b02
---
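Note for readers: the redirect handling described above hinges on resolving
the 'Location' header against the URI that was just requested.  Here is a
minimal standalone sketch of that step, assuming Guile's (web uri) module.
'resolve-location' is a hypothetical name and example.org is a placeholder
host; the logic mirrors 'resolve-uri-reference' in the diff below and is not
meant as a general RFC 3986 resolver.

```scheme
(use-modules (web uri))

;; Hypothetical helper: resolve TARGET (the parsed 'Location' header) against
;; BASE (the URI that was just requested).  An absolute TARGET is returned
;; as is; otherwise only its path is grafted onto BASE's scheme, host, and
;; port, so a query string on a relative TARGET would be dropped.
(define (resolve-location base target)
  (if (and (uri-scheme target) (uri-host target))
      target
      (build-uri (uri-scheme base)
                 #:host (uri-host base)
                 #:port (uri-port base)
                 #:path (uri-path target))))

;; Placeholder request URI (assumption, not a real endpoint).
(define base
  (string->uri "https://example.org/api/1/vault/flat/xyz/raw/"))

;; Relative reference: scheme and host are taken from BASE.
(define relative
  (resolve-location base (string->uri-reference "/blobs/xyz.tar.gz")))

;; Absolute URI: returned unchanged.
(define absolute
  (resolve-location base (string->uri "https://storage.example.org/xyz")))
```

Evaluating (uri->string relative) and (uri->string absolute) at the REPL shows
the two cases side by side.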
guix/swh.scm | 52 +++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 41 insertions(+), 11 deletions(-)
diff --git a/guix/swh.scm b/guix/swh.scm
index c7c1c873a2..4e71bdb045 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018-2021, 2024 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2020 Jakub Kądziołka <kuba@kadziolka.net>
;;; Copyright © 2021 Xinglu Chen <public@yoctocell.xyz>
;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
@@ -583,6 +583,41 @@ (define* (request-cooking id #:optional kind #:key (archive-type 'flat))
json->vault-reply
http-post*))
+(define* (http-get/follow url
+ #:key
+ (verify-certificate? (%verify-swh-certificate?)))
+ "Like 'http-get' but follow redirects (HTTP 30x). On success, return two
+values: an input port to read the response body and its 'Content-Length'. On
+failure return #f and #f."
+ (define uri
+ (if (string? url) (string->uri url) url))
+
+ (let loop ((uri uri))
+ (define (resolve-uri-reference target)
+ (if (and (uri-scheme target) (uri-host target))
+ target
+ (build-uri (uri-scheme uri) #:host (uri-host uri)
+ #:port (uri-port uri)
+ #:path (uri-path target))))
+
+ (let*-values (((response port)
+ (http-get* uri #:streaming? #t
+ #:verify-certificate? verify-certificate?))
+ ((code)
+ (response-code response)))
+ (case code
+ ((200)
+ (values port (response-content-length response)))
+ ((301 ; moved permanently
+ 302 ; found (redirection)
+ 303 ; see other
+ 307 ; temporary redirection
+ 308) ; permanent redirection
+ (close-port port)
+ (loop (resolve-uri-reference (response-location response))))
+ (else
+ (values #f #f))))))
+
(define* (vault-fetch id
#:optional kind
#:key
@@ -604,16 +639,11 @@ (define* (vault-fetch id
(match (vault-reply-status reply)
('done
;; Fetch the bundle.
- (let-values (((response port)
- (http-get* (swh-url (vault-reply-fetch-url reply))
- #:streaming? #t
- #:verify-certificate?
- (%verify-swh-certificate?))))
- (if (= (response-code response) 200)
- port
- (begin ;shouldn't happen
- (close-port port)
- #f))))
+ (let-values (((port length)
+ (http-get/follow (swh-url (vault-reply-fetch-url reply))
+ #:verify-certificate?
+ (%verify-swh-certificate?))))
+ port))
('failed
;; Upon failure, we're supposed to try again.
(format log-port "SWH vault: failure: ~a~%"
--
2.41.0
* [bug#68741] [PATCH 2/6] swh: Add bindings for the “ExtID” API.
From: Ludovic Courtès @ 2024-01-26 17:25 UTC (permalink / raw)
To: 68741
Cc: Ludovic Courtès, Christopher Baines, Josselin Poiret,
Ludovic Courtès, Mathieu Othacehe, Ricardo Wurmus,
Simon Tournier, Tobias Geerinckx-Rice
This interface was deployed at archive.softwareheritage.org a few days
ago. Our main use case will be looking up directories by “nar-sha256”
hashes.
* guix/swh.scm (<external-id>): New JSON-mapped record type.
(lookup-external-id, lookup-directory-by-nar-hash): New procedures.
* tests/swh.scm (%external-id): New variable.
("lookup-directory-by-nar-hash"): New test.
Change-Id: Ib671c7798aeb6f8132ac78f2b06b9285da8e7bd5
---
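Note for readers: 'lookup-external-id' queries a URL whose last component is
the hash in hexadecimal, prefixed by "hex:".  Here is a minimal sketch of that
URL construction using only core Guile.  'bytevector->hex' and 'extid-url'
are hypothetical stand-ins for Guix's 'bytevector->base16-string' and
'swh-url', and the trailing slash is an assumption based on the example URL
in the diff below.

```scheme
(use-modules (rnrs bytevectors))

;; Stand-in for Guix's 'bytevector->base16-string': lower-case hexadecimal,
;; two digits per byte.
(define (bytevector->hex bv)
  (string-concatenate
   (map (lambda (byte)
          (let ((digits (number->string byte 16)))
            (if (< byte 16)
                (string-append "0" digits)
                digits)))
        (bytevector->u8-list bv))))

;; Hypothetical helper: build the ExtID lookup URL for ID, a bytevector, of
;; the given TYPE (a string such as "nar-sha256").
(define* (extid-url type id
                    #:optional
                    (base "https://archive.softwareheritage.org"))
  (string-append base "/api/1/extid/" type
                 "/hex:" (bytevector->hex id) "/"))

;; Example with a short, made-up identifier.
(define example-url
  (extid-url "nar-sha256" #vu8(0 255 16)))
```

A real nar-sha256 identifier is 32 bytes long; the three-byte value here only
keeps the example readable.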
guix/swh.scm | 35 +++++++++++++++++++++++++++++++++++
tests/swh.scm | 21 ++++++++++++++++++++-
2 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/guix/swh.scm b/guix/swh.scm
index 4e71bdb045..60e97c6d38 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -78,6 +78,14 @@ (define-module (guix swh)
lookup-revision
lookup-origin-revision
+ external-id?
+ external-id-value
+ external-id-type
+ external-id-version
+ external-id-target
+ lookup-external-id
+ lookup-directory-by-nar-hash
+
content?
content-checksums
content-data-url
@@ -382,6 +390,15 @@ (define-json-mapping <directory-entry> make-directory-entry directory-entry?
(permissions directory-entry-permissions "perms")
(target-url directory-entry-target-url "target_url"))
+;; <https://archive.softwareheritage.org/api/1/extid/doc/>
+(define-json-mapping <external-id> make-external-id external-id?
+ json->external-id
+ (value external-id-value "extid")
+ (type external-id-type "extid_type")
+ (version external-id-version "extid_version")
+ (target external-id-target)
+ (target-url external-id-target-url "target_url"))
+
;; <https://archive.softwareheritage.org/api/1/origin/save/>
(define-json-mapping <save-reply> make-save-reply save-reply?
json->save-reply
@@ -436,6 +453,24 @@ (define (json->directory-entries port)
(map json->directory-entry
(vector->list (json->scm port))))
+(define (lookup-external-id type id)
+ "Return the external ID record for ID, a bytevector, of the given TYPE
+(currently one of: \"bzr-nodeid\", \"hg-nodeid\", \"nar-sha256\",
+\"checksum-sha512\")."
+ (call (swh-url "/api/1/extid" type
+ (string-append "hex:" (bytevector->base16-string id)))
+ json->external-id))
+
+(define* (lookup-directory-by-nar-hash hash #:optional (algorithm 'sha256))
+ "Return the SWHID of a directory---i.e., prefixed by \"swh:1:dir\"---for the
+directory with the given HASH (a bytevector), assuming nar serialization
+and use of ALGORITHM."
+ ;; example:
+ ;; https://archive.softwareheritage.org/api/1/extid/nar-sha256/base64url:0jD6Z4TLMm5g1CviuNNuVNP31KWyoT_oevfr8TQwc3Y/
+ (and=> (lookup-external-id (string-append "nar-" (symbol->string algorithm))
+ hash)
+ external-id-target))
+
(define (origin-visits origin)
"Return the list of visits of ORIGIN, a record as returned by
'lookup-origin'."
diff --git a/tests/swh.scm b/tests/swh.scm
index a36f951241..e7ced6b50c 100644
--- a/tests/swh.scm
+++ b/tests/swh.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2019-2021, 2024 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -18,6 +18,7 @@
(define-module (test-swh)
#:use-module (guix swh)
+ #:use-module (guix base32)
#:use-module (guix tests http)
#:use-module (web response)
#:use-module (srfi srfi-19)
@@ -56,6 +57,16 @@ (define %directory-entries
\"length\": 456,
\"dir_id\": 2 } ]")
+(define %external-id
+ "{ \"extid_type\": \"nar-sha256\",
+ \"extid\":
+\"0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63\",
+ \"extid_version\": 0,
+ \"target\": \"swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153\",
+ \"target_url\":
+\"https://archive.softwareheritage.org/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153\"
+ }")
+
(define-syntax-rule (with-json-result str exp ...)
(with-http-server `((200 ,str))
(parameterize ((%swh-base-url (%local-url)))
@@ -98,6 +109,14 @@ (define-syntax-rule (with-json-result str exp ...)
(directory-entry-length entry)))
(lookup-directory "123"))))
+(test-equal "lookup-directory-by-nar-hash"
+ "swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153"
+ (with-json-result %external-id
+ (lookup-directory-by-nar-hash
+ (nix-base32-string->bytevector
+ "0qqygvlpz63phdi2p5p8ncp80dci230qfa3pwds8yfxqqaablmhb")
+ 'sha256)))
+
(test-equal "rate limit reached"
3000000000
(let ((too-many (build-response
--
2.41.0
* [bug#68741] [PATCH 3/6] swh: Add ‘swh-download-directory-by-nar-hash’.
From: Ludovic Courtès @ 2024-01-26 17:25 UTC (permalink / raw)
To: 68741
Cc: Ludovic Courtès, Christopher Baines, Josselin Poiret,
Ludovic Courtès, Mathieu Othacehe, Ricardo Wurmus,
Simon Tournier, Tobias Geerinckx-Rice
This allows us to take advantage of content addressing by giving SWH the
expected nar hash.
* guix/swh.scm (swh-download-directory-by-nar-hash): New procedure.
Change-Id: I0494ee15a3cde390a22552de7c2246e0314ba7b5
---
guix/swh.scm | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/guix/swh.scm b/guix/swh.scm
index 60e97c6d38..be1eb7d151 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -123,6 +123,7 @@ (define-module (guix swh)
commit-id?
swh-download-directory
+ swh-download-directory-by-nar-hash
swh-download))
;;; Commentary:
@@ -805,3 +806,26 @@ (define* (swh-download url reference output
"SWH: revision ~s originating from ~a could not be found~%"
reference url)
#f)))
+
+(define* (swh-download-directory-by-nar-hash hash algorithm output
+ #:key
+ (log-port (current-error-port)))
+ "Download from Software Heritage the directory with the given nar HASH for
+ALGORITHM (a symbol such as 'sha256), and unpack it in OUTPUT. Return #t on
+success and #f on failure.
+
+This procedure uses the \"vault\", which contains \"cooked\" directories in
+the form of tarballs. If the requested directory is not cooked yet, it will
+wait until it becomes available, which could take several minutes."
+ (match (lookup-directory-by-nar-hash hash algorithm)
+ (#f
+ (format log-port
+ "SWH: directory with nar-~a hash ~a not found~%"
+ algorithm (bytevector->base16-string hash))
+ #f)
+ (swhid
+ (format log-port "SWH: found directory with nar-~a hash ~a at '~a'~%"
+ algorithm (bytevector->base16-string hash) swhid)
+ (swh-download-archive swhid output
+ #:archive-type 'flat ;SWHID denotes a directory
+ #:log-port log-port))))
--
2.41.0
* [bug#68741] [PATCH 4/6] lint: archival: Check with ‘lookup-directory-by-nar-hash’.
From: Ludovic Courtès @ 2024-01-26 17:25 UTC (permalink / raw)
To: 68741
Cc: Ludovic Courtès, Christopher Baines, Josselin Poiret,
Ludovic Courtès, Mathieu Othacehe, Ricardo Wurmus,
Simon Tournier, Tobias Geerinckx-Rice
While this method is new and nar-sha256 ExtIDs are currently available
only for new visits, it is fundamentally more reliable than the other
methods, which is why it comes first.
* guix/lint.scm (check-archival)[lookup-by-nar-hash]: New procedure.
Call ‘lookup-by-nar-hash’ before the other lookup methods.
* tests/lint.scm ("archival: content available")
("archival: content unavailable but disarchive available")
("archival: missing revision")
("archival: revision available"): Add a 404 response corresponding to
the ‘lookup-external-id’ request.
* tests/lint.scm ("archival: nar-sha256 extid available"): New test.
Change-Id: I4a81d6e022a3b72e6484726549d7fbae627f8e73
---
guix/lint.scm | 28 ++++++++++++++++++----------
tests/lint.scm | 33 ++++++++++++++++++++++++++++-----
2 files changed, 46 insertions(+), 15 deletions(-)
diff --git a/guix/lint.scm b/guix/lint.scm
index 861e352b93..c95de85e69 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -1,7 +1,7 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2014 Cyril Roelandt <tipecaml@gmail.com>
;;; Copyright © 2014, 2015 Eric Bavier <bavier@member.fsf.org>
-;;; Copyright © 2013-2023 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2013-2024 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2015, 2016 Mathieu Lirzin <mthl@gnu.org>
;;; Copyright © 2016 Danny Milosavljevic <dannym+a@scratchpost.org>
;;; Copyright © 2016 Hartmut Goebel <h.goebel@crazy-compilers.com>
@@ -1658,24 +1658,31 @@ (define (check-archival package)
(or (not (request-rate-limit-reached? url method))
(throw skip-key #t)))
+ (define (lookup-by-nar-hash hash)
+ (lookup-directory-by-nar-hash (content-hash-value hash)
+ (content-hash-algorithm hash)))
+
(parameterize ((%allow-request? skip-when-limit-reached))
(catch #t
(lambda ()
(match (package-source package)
(#f ;no source
'())
- ((and (? origin?)
+ ((and (? origin? origin)
(= origin-uri (? git-reference? reference)))
(define url
(git-reference-url reference))
(define commit
(git-reference-commit reference))
+ (define hash
+ (origin-hash origin))
- (match (if (commit-id? commit)
- (or (lookup-revision commit)
- (lookup-origin-revision url commit))
- (lookup-origin-revision url commit))
- ((? revision? revision)
+ (match (or (lookup-by-nar-hash hash)
+ (if (commit-id? commit)
+ (or (lookup-revision commit)
+ (lookup-origin-revision url commit))
+ (lookup-origin-revision url commit)))
+ ((or (? string?) (? revision?))
'())
(#f
;; Revision is missing from the archive, attempt to save it.
@@ -1704,9 +1711,10 @@ (define (check-archival package)
(if (and=> (origin-hash origin) ;XXX: for ungoogled-chromium
content-hash-value) ;& icecat
(let ((hash (origin-hash origin)))
- (match (lookup-content (content-hash-value hash)
- (symbol->string
- (content-hash-algorithm hash)))
+ (match (or (lookup-by-nar-hash hash)
+ (lookup-content (content-hash-value hash)
+ (symbol->string
+ (content-hash-algorithm hash))))
(#f
;; If SWH doesn't have HASH as is, it may be because it's
;; a hand-crafted tarball. In that case, check whether
diff --git a/tests/lint.scm b/tests/lint.scm
index a52a82237b..87213fcc78 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -1,7 +1,7 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2012, 2013 Cyril Roelandt <tipecaml@gmail.com>
;;; Copyright © 2014, 2015, 2016 Eric Bavier <bavier@member.fsf.org>
-;;; Copyright © 2014-2023 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2014-2024 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2015, 2016 Mathieu Lirzin <mthl@gnu.org>
;;; Copyright © 2016 Hartmut Goebel <h.goebel@crazy-compilers.com>
;;; Copyright © 2017 Alex Kost <alezost@gmail.com>
@@ -1358,7 +1358,8 @@ (define (package-with-phase-changes changes)
;; https://archive.softwareheritage.org/api/1/content/
(content "{ \"checksums\": {}, \"data_url\": \"xyz\",
\"length\": 42 }"))
- (with-http-server `((200 ,content))
+ (with-http-server `((404 "") ;extid
+ (200 ,content))
(parameterize ((%swh-base-url (%local-url)))
(check-archival (dummy-package "x" (source origin)))))))
@@ -1378,7 +1379,8 @@ (define (package-with-phase-changes changes)
\"type\": \"file\",
\"name\": \"README\"
\"length\": 42 } ]"))
- (with-http-server `((404 "") ;lookup-content
+ (with-http-server `((404 "") ;lookup-directory-by-nar-hash
+ (404 "") ;lookup-content
(200 ,disarchive) ;Disarchive database lookup
(200 ,directory)) ;lookup-directory
(mock ((guix download) %disarchive-mirrors (list (%local-url)))
@@ -1397,7 +1399,8 @@ (define (package-with-phase-changes changes)
\"save_request_date\": \"2014-11-17T22:09:38+01:00\",
\"save_request_status\": \"accepted\",
\"save_task_status\": \"scheduled\" }")
- (warnings (with-http-server `((404 "No revision.") ;lookup-revision
+ (warnings (with-http-server `((404 "No extid.") ;lookup-directory-by-nar-hash
+ (404 "No revision.") ;lookup-revision
(404 "No origin.") ;lookup-origin
(200 ,save)) ;save-origin
(parameterize ((%swh-base-url (%local-url)))
@@ -1415,7 +1418,27 @@ (define (package-with-phase-changes changes)
;; https://archive.softwareheritage.org/api/1/revision/
(revision "{ \"author\": {}, \"parents\": [],
\"date\": \"2014-11-17T22:09:38+01:00\" }"))
- (with-http-server `((200 ,revision))
+ (with-http-server `((404 "No directory.") ;lookup-directory-by-nar-hash
+ (200 ,revision))
+ (parameterize ((%swh-base-url (%local-url)))
+ (check-archival (dummy-package "x" (source origin)))))))
+
+(test-equal "archival: nar-sha256 extid available"
+ '()
+ (let* ((origin (origin
+ (method git-fetch)
+ (uri (git-reference
+ (url "http://example.org/foo.git")
+ (commit "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")))
+ (sha256 (make-bytevector 32))))
+ ;; https://archive.softwareheritage.org/api/1/extid/doc/
+ (extid "{ \"extid_type\": \"nar-sha256\",
+ \"extid\": \"1234\",
+ \"extid_version\": 0,
+ \"target\": \"swh:1:dir:cabba93\",
+ \"target_url\": \"boo\"
+ }"))
+ (with-http-server `((200 ,extid))
(parameterize ((%swh-base-url (%local-url)))
(check-archival (dummy-package "x" (source origin)))))))
--
2.41.0
* [bug#68741] [PATCH 5/6] git-download: Download from SWH by nar hash when possible.
From: Ludovic Courtès @ 2024-01-26 17:25 UTC (permalink / raw)
To: 68741
Cc: Ludovic Courtès, Christopher Baines, Josselin Poiret,
Ludovic Courtès, Mathieu Othacehe, Ricardo Wurmus,
Simon Tournier, Tobias Geerinckx-Rice
From: Ludovic Courtès <ludovic.courtes@inria.fr>
* guix/build/git.scm (git-fetch-with-fallback): Add #:hash
and #:hash-algorithm. Try ‘swh-download-directory-by-nar-hash’ before
‘swh-download’ when #:hash is provided.
* guix/git-download.scm (git-fetch/in-band*): Pass #:hash
and #:hash-algorithm to ‘git-fetch-with-fallback’.
* guix/scripts/perform-download.scm (perform-git-download): Likewise.
Change-Id: Ic875a7022fd78c9fac32e92ad4f8ce4d81646ec5
---
guix/build/git.scm | 20 ++++++++++++++++----
guix/git-download.scm | 4 +++-
guix/scripts/perform-download.scm | 4 +++-
3 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/guix/build/git.scm b/guix/build/git.scm
index 867cade2c4..4c69365a7b 100644
--- a/guix/build/git.scm
+++ b/guix/build/git.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2014, 2016, 2019, 2023 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2014, 2016, 2019, 2023-2024 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2023 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;;
;;; This file is part of GNU Guix.
@@ -20,7 +20,9 @@
(define-module (guix build git)
#:use-module (guix build utils)
#:autoload (guix build download-nar) (download-nar)
- #:autoload (guix swh) (%verify-swh-certificate? swh-download)
+ #:autoload (guix swh) (%verify-swh-certificate?
+ swh-download
+ swh-download-directory-by-nar-hash)
#:use-module (srfi srfi-34)
#:use-module (ice-9 format)
#:export (git-fetch
@@ -91,10 +93,13 @@ (define* (git-fetch url commit directory
(define* (git-fetch-with-fallback url commit directory
#:key (git-command "git")
+ hash hash-algorithm
lfs? recursive?)
"Like 'git-fetch', fetch COMMIT from URL into DIRECTORY, but fall back to
alternative methods when fetching from URL fails: attempt to download a nar,
-and if that also fails, download from the Software Heritage archive."
+and if that also fails, download from the Software Heritage archive. When
+HASH and HASH-ALGORITHM are provided, they are interpreted as the nar hash of
+the directory of interest and are used as its content address at SWH."
(or (git-fetch url commit directory
#:lfs? lfs?
#:recursive? recursive?
@@ -110,7 +115,14 @@ (define* (git-fetch-with-fallback url commit directory
(format (current-error-port)
"Trying to download from Software Heritage...~%")
- (swh-download url commit directory)
+ ;; First try to look up and download the directory corresponding
+ ;; to HASH: this is fundamentally more reliable than looking up
+ ;; COMMIT, especially when COMMIT denotes a tag.
+ (or (and hash hash-algorithm
+ (swh-download-directory-by-nar-hash hash hash-algorithm
+ directory))
+ (swh-download url commit directory))
+
(when (file-exists?
(string-append directory "/.gitattributes"))
;; Perform CR/LF conversion and other changes
diff --git a/guix/git-download.scm b/guix/git-download.scm
index 3de6ae970d..aadcbd234c 100644
--- a/guix/git-download.scm
+++ b/guix/git-download.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2014-2021, 2023 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2014-2021, 2023-2024 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2017 Mathieu Lirzin <mthl@gnu.org>
;;; Copyright © 2017 Christopher Baines <mail@cbaines.net>
;;; Copyright © 2020 Jakub Kądziołka <kuba@kadziolka.net>
@@ -165,6 +165,8 @@ (define* (git-fetch/in-band* ref hash-algo hash
(git-fetch-with-fallback (getenv "git url") (getenv "git commit")
#$output
+ #:hash #$hash
+ #:hash-algorithm '#$hash-algo
#:lfs? lfs?
#:recursive? recursive?
#:git-command "git")))))
diff --git a/guix/scripts/perform-download.scm b/guix/scripts/perform-download.scm
index 9aa0e61e9d..e7eb3b2a1f 100644
--- a/guix/scripts/perform-download.scm
+++ b/guix/scripts/perform-download.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2016-2018, 2020, 2023 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2016-2018, 2020, 2023-2024 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -115,6 +115,8 @@ (define* (perform-git-download drv output
(setenv "PATH" "/run/current-system/profile/bin:/bin:/usr/bin")
(git-fetch-with-fallback url commit output
+ #:hash hash
+ #:hash-algorithm algo
#:recursive? recursive?
#:git-command %git))))
--
2.41.0
* [bug#68741] [PATCH 6/6] swh: Fix docstring of ‘lookup-directory’.
From: Ludovic Courtès @ 2024-01-26 17:25 UTC (permalink / raw)
To: 68741
Cc: Ludovic Courtès, Christopher Baines, Josselin Poiret,
Ludovic Courtès, Mathieu Othacehe, Ricardo Wurmus,
Simon Tournier, Tobias Geerinckx-Rice
* guix/swh.scm (lookup-directory): Fix docstring.
Change-Id: Ia1fd9b2bc9184364cebbd30ee84c9fdea4ba897c
---
guix/swh.scm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/guix/swh.scm b/guix/swh.scm
index be1eb7d151..04cecd854c 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -446,7 +446,7 @@ (define-query (lookup-revision id)
json->revision)
(define-query (lookup-directory id)
- "Return the directory with the given ID."
+ "Return the list of entries of the directory with the given ID."
(path "/api/1/directory" id)
json->directory-entries)
--
2.41.0
* [bug#68741] [PATCH 0/6] Content-addressed downloads from Software Heritage
From: Ludovic Courtès @ 2024-01-26 17:25 UTC (permalink / raw)
To: 68741; +Cc: Timothy Sample, Antoine
Oops, I forgot to Cc: the fine people for the cover letter; fixed!
See <https://issues.guix.gnu.org/68741>.
Ludovic Courtès <ludo@gnu.org> skribis:
> Hello Guix!
>
> For those who’ve been following along, you might remember that the
> main impedance mismatch between SWH and Guix is that SWH uses Git
> tree SHA1 hashes to identify directories whereas Guix uses nar SHA256
> hashes (and possibly other hash functions in the future):
>
> https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
>
> Because of this, the SWH fallback path for ‘git-download’ had two
> options:
>
> 1. If ‘git-reference’ specifies a full SHA1 commit ID, it would
> look it up on SWH and fetch it.
>
> 2. If ‘git-reference’ specifies a tag, which is perhaps the
> majority of cases, Guix would ask SWH for the commit that once
> corresponded to that tag at that URL, and then fetch it.
>
> Case #1 is ideal: it’s content-addressed. Case #2 is brittle: we’re
> hoping that the tag hasn’t been modified and that the URL hasn’t been
> reused for something else; if that’s not the case, SWH might return
> the “wrong” commit and we end up fetching something unrelated.
>
> The good news is that our friends at SWH have just deployed a new
> version of their code that lets us look up directories by some
> “external identifier” (“ExtID”), among which there’s ‘nar-sha256’:
>
> https://archive.softwareheritage.org/api/1/extid/doc/
>
> And that, my friends, makes a huge difference: the impedance mismatch
> is gone, we can now use content-addressing to fetch our stuff from SWH!!
> And that works not just for Git, but also for Mercurial, SVN, CVS, etc.
>
> Well, there’s a caveat: currently the ‘nar-sha256’ ExtID is added only on
> new visits and it’s apparently not being added yet for Mercurial for
> unclear reasons. So right now, we can get guile-sqlite3 0.1.3 (Git) by
> nar-sha256, but we cannot get guile-wisp (hg) nor in fact most things.
> That’ll improve over time though, and SWH comrades are open to adding
> those ExtIDs retroactively.
>
> The patches that follow do several things:
>
> 1. Follow redirects in the Vault: (guix swh) previously did not
> do that (oops!) but the newly-deployed Vault now responds with
> 302 redirects so we have to handle that.
>
> 2. Add bindings for the ExtID HTTP interface.
>
> 3. Add ‘swh-download-directory-by-nar-hash’, which does what it
> says.
>
> 4. Use that as the preferred fallback method for ‘git-fetch’.
>
> Here’s a REPLshot:
>
> scheme@(guile-user)> (lookup-external-id "nar-sha256" (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))))
> $43 = #<<external-id> value: "0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63" type: "nar-sha256" version: 0 target: "swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153" target-url: "https://archive.softwareheritage.org/swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153">
> scheme@(guile-user)> (swh-download-directory-by-nar-hash (content-hash-value (origin-hash (package-source (@ (gnu packages guile) guile-sqlite3)))) 'sha256 "/tmp/gsql")
> SWH: found directory with nar-sha256 hash 0b56ba94c2b83b8f74e3772887c1109135802eb3e8962b628377987fe97e1e63 at 'swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153'
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/.gitignore
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/AUTHORS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/COPYING.LESSER
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/ChangeLog
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/Makefile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/NEWS
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/README
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/guile.am
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/build-aux/test-driver.scm
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/configure.ac
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/env.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/sqlite3.scm.in
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/
> swh:1:dir:84a8b34591712c0a90bab0af604188bcd1fe3153/tests/basic.scm
> $46 = #t
>
> Huge thanks to everyone over at #swh-devel for helping me out
> over the past few days!
>
> Next tasks: implement download fallback for ‘hg-fetch’, change
> ‘guix lint -c archival’ to make ‘save-origin’ requests not just
> for Git repos, assess the situation with SVN and sub-directories
> to see what can be done.
>
> Thoughts?
>
> Ludo’.
>
> PS: Apologies for the wall of text!
>
> Ludovic Courtès (6):
> swh: ‘vault-fetch’ follows redirects.
> swh: Add bindings for the “ExtID” API.
> swh: Add ‘swh-download-directory-by-nar-hash’.
> lint: archival: Check with ‘lookup-directory-by-nar-hash’.
> git-download: Download from SWH by nar hash when possible.
> swh: Fix docstring of ‘lookup-directory’.
>
> guix/build/git.scm | 20 ++++--
> guix/git-download.scm | 4 +-
> guix/lint.scm | 28 +++++---
> guix/scripts/perform-download.scm | 4 +-
> guix/swh.scm | 113 ++++++++++++++++++++++++++----
> tests/lint.scm | 33 +++++++--
> tests/swh.scm | 21 +++++-
> 7 files changed, 189 insertions(+), 34 deletions(-)
>
>
> base-commit: 8bee6bb9aaaf35c36fe325675d1eb2daebd69c25
* bug#68741: [PATCH 0/6] Content-addressed downloads from Software Heritage
From: Ludovic Courtès @ 2024-02-12 11:23 UTC (permalink / raw)
To: 68741-done
Cc: Josselin Poiret, Simon Tournier, Mathieu Othacehe,
Tobias Geerinckx-Rice, Ricardo Wurmus, Christopher Baines
Hi,
Ludovic Courtès <ludo@gnu.org> skribis:
> swh: ‘vault-fetch’ follows redirects.
> swh: Add bindings for the “ExtID” API.
> swh: Add ‘swh-download-directory-by-nar-hash’.
> lint: archival: Check with ‘lookup-directory-by-nar-hash’.
> git-download: Download from SWH by nar hash when possible.
> swh: Fix docstring of ‘lookup-directory’.
Pushed as 5a61ce6bcfbd0882956e40457232da737776abe7.
> Next tasks: implement download fallback for ‘hg-fetch’, change
> ‘guix lint -c archival’ to make ‘save-origin’ requests not just
> for Git repos, assess the situation with SVN and sub-directories
> to see what can be done.
Let’s make it happen!
Ludo’.