unofficial mirror of bug-guix@gnu.org 
* bug#47336: Disarchive as a fallback for downloads
@ 2021-03-23  4:42 Timothy Sample
  2021-03-23  4:52 ` bug#47336: [PATCH 1/2] swh: Add a directory download procedure Timothy Sample
  2021-03-23  5:11 ` bug#47336: Disarchive as a fallback for downloads Timothy Sample
  0 siblings, 2 replies; 4+ messages in thread
From: Timothy Sample @ 2021-03-23  4:42 UTC (permalink / raw)
  To: 47336

Hello,

This patch series adds Disarchive assembly (backed by SWH lookup) as a
fallback for downloads.

To try it, make sure you are running the daemon in an environment with
Disarchive available:

    $ ./pre-inst-env guix environment --ad-hoc guile disarchive
    # ./pre-inst-env guix-daemon --build-users-group=guixbuild

Don’t forget to stop your existing guix-daemon first.  :)

You also need to make sure that regular downloads are unavailable.  I do
this by adjusting the “try” loop at the end of “url-fetch” in
“guix/build/download.scm”.  I replace the usual list of URLs with ‘()’:

    (let try ((uri (append uri content-addressed-uris)))
      (match '() ; uri
        ...))

Now you can ask Guix for a recent .tar.gz source package:

    $ ./pre-inst-env guix build --no-substitutes -S python-httpretty

You should see:

    Trying to use Disarchive to assemble /gnu/store/kbcnm57y2q1jvhvd8zw1g5vdiwlv19y9-httpretty-1.0.5.tar.gz
    Assembling the directory httpretty-1.0.5
    Downloading from Software Heritage...
    7903d608efc89c14afb4d692a3721156e31a43e2/
    7903d608efc89c14afb4d692a3721156e31a43e2/httpretty-1.0.5/
    7903d608efc89c14afb4d692a3721156e31a43e2/httpretty-1.0.5/COPYING
    [...]
    Checking httpretty-1.0.5 digest... ok
    Assembling the tarball httpretty-1.0.5.tar
    Checking httpretty-1.0.5.tar digest... ok
    Assembling the Gzip file httpretty-1.0.5.tar.gz
    Checking httpretty-1.0.5.tar.gz digest... ok
    Copying result to /gnu/store/kbcnm57y2q1jvhvd8zw1g5vdiwlv19y9-httpretty-1.0.5.tar.gz
    successfully built /gnu/store/k0b3c7kgzyn1nlyhx192pcbcgbfnhnwa-httpretty-1.0.5.tar.gz.drv
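The layered digests in that transcript are the heart of the approach:
Disarchive rebuilds the directory, then the tarball, then the Gzip
file, checking a digest at each layer.  Here is a toy shell
illustration of that layered, reproducible-digest idea (not Disarchive
itself; the file and contents are made up):

```shell
# Toy illustration of layered digest checking (not Disarchive itself).
# Create a file, compress it reproducibly, and record a digest for each
# layer; rebuilding from the same input must reproduce both digests.
set -e
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/file"
inner=$(sha256sum "$workdir/file" | cut -d ' ' -f 1)
gzip -n -k "$workdir/file"     # -n omits the timestamp, for reproducibility
outer=$(sha256sum "$workdir/file.gz" | cut -d ' ' -f 1)
echo "inner: $inner"
echo "outer: $outer"
```

Because ‘gzip -n’ stores neither name nor timestamp, recompressing the
same input yields the same outer digest, which is what makes checking
the compressed layer feasible at all.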

There’s lots to talk about, though.

First, the fallback looks up the Disarchive metadata on my server.  This
is fine for a demo, but not what we want long-term.  The patch series
supports adding several
mirrors for looking up the metadata.  In the past, we talked about
putting everything on one or a few of the big Git hosting platforms like
GitHub or GitLab.  That way, it would be easily picked up by SWH and
archived “forever”.  Right now, I have Cuirass set up to build the
metadata, and a little script that moves it from the build server to my
Web server.  It would be simple enough to adjust that script to push it
to a remote Git repo.  (Of course, the next step is to move this setup
to Guix infrastructure.)  Thoughts?
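For what it’s worth, the “push it to a remote Git repo” step could be a
script along these lines.  This is only a hedged sketch: every path
here is a stand-in (a real version would use the actual Cuirass build
output and a clone of the hosted repository, and finish with a
‘git push’):

```shell
# Hypothetical sketch: copy freshly built Disarchive metadata into a
# clone of the metadata repository and commit it.  Both directories are
# stand-ins created just for this demo.
set -e
build_output=$(mktemp -d)   # stands in for the Cuirass build output
repo=$(mktemp -d)           # stands in for a clone of the metadata repo
mkdir -p "$build_output/sha256"
printf '(disarchive ...)\n' > "$build_output/sha256/0000"  # placeholder spec
git -C "$repo" init -q
cp -R "$build_output/." "$repo/"
git -C "$repo" add -A
git -C "$repo" -c user.name=cuirass -c user.email=cuirass@example.org \
    commit -q -m 'Update Disarchive metadata'
git -C "$repo" log --oneline -1
# A real script would end with: git -C "$repo" push origin main
```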

On the code level, there were two things I couldn’t figure out for
myself.

I made the mirror list just simple strings.  AIUI, the client and the
daemon have to agree about the format of the mirror list.  Given that
running old daemons is common, changing the format is difficult.  Is it
worth copying the more flexible interface used by the content-addressed
mirrors?  If so, do I have to do the same ‘module-autoload!’
dance to use ‘bytevector->base16-string’?  :)  (I probably would have
just copied it, but that part confused me a bit.)
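For reference, the lookup URI the daemon ends up constructing (see
‘disarchive-uris’ in patch 2/2) is just mirror + hash algorithm + “/” +
base16 hash of the expected output.  In shell, with a placeholder
digest:

```shell
# How a Disarchive spec URI is derived from a mirror and the expected
# output hash.  The digest below is a placeholder, not a real hash.
mirror="https://disarchive.ngyro.com/"
algo="sha256"
digest="0000000000000000000000000000000000000000000000000000000000000000"
uri="${mirror}${algo}/${digest}"
echo "$uri"
```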

I imported some modules into “guix/build/download.scm” (well, just
“(guix base16)” and “(guix swh)”).  It feels weird to use host-side
modules in what’s nominally a “guix/build” module.  This is okay because
“guix/build/download.scm” is not /really/ build-side code.  It’s more
like daemon (-ish) code that just happens to live in “guix/build”, which
is why importing host-side modules is OK... right?

Hopefully everything else is more-or-less fine.  :)


-- Tim





* bug#47336: [PATCH 1/2] swh: Add a directory download procedure.
  2021-03-23  4:42 bug#47336: Disarchive as a fallback for downloads Timothy Sample
@ 2021-03-23  4:52 ` Timothy Sample
  2021-03-23  4:52   ` bug#47336: [PATCH 2/2] download: Use Disarchive as a last resort Timothy Sample
  2021-03-23  5:11 ` bug#47336: Disarchive as a fallback for downloads Timothy Sample
  1 sibling, 1 reply; 4+ messages in thread
From: Timothy Sample @ 2021-03-23  4:52 UTC (permalink / raw)
  To: 47336

* guix/swh.scm (swh-download-directory): New procedure (with
implementation extracted from 'swh-download').
(swh-download): Use it to download the revision directory.
---
 guix/swh.scm | 65 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/guix/swh.scm b/guix/swh.scm
index f11b7ea2d5..2402ec98e6 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -108,6 +108,7 @@
 
             commit-id?
 
+            swh-download-directory
             swh-download))
 
 ;;; Commentary:
@@ -558,12 +559,6 @@ requested bundle cooking, waiting for completion...~%"))
 ;;; High-level interface.
 ;;;
 
-(define (commit-id? reference)
-  "Return true if REFERENCE is likely a commit ID, false otherwise---e.g., if
-it is a tag name.  This is based on a simple heuristic so use with care!"
-  (and (= (string-length reference) 40)
-       (string-every char-set:hex-digit reference)))
-
 (define (call-with-temporary-directory proc)      ;FIXME: factorize
   "Call PROC with a name of a temporary directory; close the directory and
 delete it when leaving the dynamic extent of this call."
@@ -577,6 +572,39 @@ delete it when leaving the dynamic extent of this call."
       (lambda ()
         (false-if-exception (delete-file-recursively tmp-dir))))))
 
+(define* (swh-download-directory id output
+                                 #:key (log-port (current-error-port)))
+  "Download from Software Heritage the directory with the given ID, and
+unpack it to OUTPUT.  Return #t on success and #f on failure"
+  (call-with-temporary-directory
+   (lambda (directory)
+     (match (vault-fetch id 'directory #:log-port log-port)
+       (#f
+        (format log-port
+                "SWH: directory ~a could not be fetched from the vault~%"
+                id)
+        #f)
+       ((? port? input)
+        (let ((tar (open-pipe* OPEN_WRITE "tar" "-C" directory "-xzvf" "-")))
+          (dump-port input tar)
+          (close-port input)
+          (let ((status (close-pipe tar)))
+            (unless (zero? status)
+              (error "tar extraction failure" status)))
+
+          (match (scandir directory)
+            (("." ".." sub-directory)
+             (copy-recursively (string-append directory "/" sub-directory)
+                               output
+                               #:log (%make-void-port "w"))
+             #t))))))))
+
+(define (commit-id? reference)
+  "Return true if REFERENCE is likely a commit ID, false otherwise---e.g., if
+it is a tag name.  This is based on a simple heuristic so use with care!"
+  (and (= (string-length reference) 40)
+       (string-every char-set:hex-digit reference)))
+
 (define* (swh-download url reference output
                        #:key (log-port (current-error-port)))
   "Download from Software Heritage a checkout of the Git tag or commit
@@ -593,28 +621,7 @@ wait until it becomes available, which could take several minutes."
      (format log-port "SWH: found revision ~a with directory at '~a'~%"
              (revision-id revision)
              (swh-url (revision-directory-url revision)))
-     (call-with-temporary-directory
-      (lambda (directory)
-        (match (vault-fetch (revision-directory revision) 'directory
-                            #:log-port log-port)
-          (#f
-           (format log-port
-                   "SWH: directory ~a could not be fetched from the vault~%"
-                   (revision-directory revision))
-           #f)
-          ((? port? input)
-           (let ((tar (open-pipe* OPEN_WRITE "tar" "-C" directory "-xzvf" "-")))
-             (dump-port input tar)
-             (close-port input)
-             (let ((status (close-pipe tar)))
-               (unless (zero? status)
-                 (error "tar extraction failure" status)))
-
-             (match (scandir directory)
-               (("." ".." sub-directory)
-                (copy-recursively (string-append directory "/" sub-directory)
-                                  output
-                                  #:log (%make-void-port "w"))
-                #t))))))))
+     (swh-download-directory (revision-directory revision) output
+                             #:log-port log-port))
     (#f
      #f)))
-- 
2.31.0






* bug#47336: [PATCH 2/2] download: Use Disarchive as a last resort.
  2021-03-23  4:52 ` bug#47336: [PATCH 1/2] swh: Add a directory download procedure Timothy Sample
@ 2021-03-23  4:52   ` Timothy Sample
  0 siblings, 0 replies; 4+ messages in thread
From: Timothy Sample @ 2021-03-23  4:52 UTC (permalink / raw)
  To: 47336

* guix/download.scm (%disarchive-mirrors): New variable.
(%disarchive-mirror-file): New variable.
(built-in-download): Add 'disarchive-mirrors' keyword argument and
pass its value along to the 'builtin:download' derivation.
(url-fetch): Pass '%disarchive-mirror-file' to 'built-in-download'.
* guix/scripts/perform-download.scm (perform-download): Read
Disarchive mirrors from the environment and pass them to
'url-fetch'.
* guix/build/download.scm (disarchive-fetch/any): New procedure.
(url-fetch): Add 'disarchive-mirrors' keyword argument, use it to
make a list of URIs, and use the new procedure to fetch the file if
all other methods fail.
---
 guix/build/download.scm           | 77 +++++++++++++++++++++++++++----
 guix/download.scm                 | 19 ++++++--
 guix/scripts/perform-download.scm |  7 ++-
 3 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/guix/build/download.scm b/guix/build/download.scm
index a22d4064ca..f476d0f8ec 100644
--- a/guix/build/download.scm
+++ b/guix/build/download.scm
@@ -2,6 +2,7 @@
 ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
 ;;; Copyright © 2017 Tobias Geerinckx-Rice <me@tobias.gr>
+;;; Copyright © 2021 Timothy Sample <samplet@ngyro.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -23,10 +24,12 @@
   #:use-module (web http)
   #:use-module ((web client) #:hide (open-socket-for-uri))
   #:use-module (web response)
+  #:use-module (guix base16)
   #:use-module (guix base64)
   #:use-module (guix ftp-client)
   #:use-module (guix build utils)
   #:use-module (guix progress)
+  #:use-module (guix swh)
   #:use-module (rnrs io ports)
   #:use-module (rnrs bytevectors)
   #:use-module (srfi srfi-1)
@@ -626,10 +629,50 @@ Return a list of URIs."
     (else
      (list uri))))
 
+(define* (disarchive-fetch/any uris file
+                               #:key (timeout 10))
+  "Fetch a Disarchive specification from any of URIS, assemble it,
+and write the output to FILE."
+  (define (fetch-specification uris)
+    (any (lambda (uri)
+           (false-if-exception*
+            (let-values (((port size) (http-fetch uri
+                                                  #:verify-certificate? #t
+                                                  #:timeout timeout)))
+              (let ((specification (read port)))
+                (close-port port)
+                specification))))
+         uris))
+
+  (define (resolve addresses output)
+    (any (match-lambda
+           (('swhid swhid)
+            (match (string-split swhid #\:)
+              (("swh" "1" "dir" id)
+               (format #t "Downloading from Software Heritage...~%")
+               (false-if-exception*
+                (swh-download-directory id output)))
+              (_ #f)))
+           (_ #f))
+         addresses))
+
+  (match (and=> (resolve-module '(disarchive) #:ensure #f)
+                (lambda (disarchive)
+                  (cons (module-ref disarchive '%disarchive-log-port)
+                        (module-ref disarchive 'disarchive-assemble))))
+    (#f #f)
+    ((%disarchive-log-port . disarchive-assemble)
+     (format #t "Trying to use Disarchive to assemble ~a~%" file)
+     (match (fetch-specification uris)
+       (#f #f)
+       (spec (parameterize ((%disarchive-log-port (current-output-port)))
+               (disarchive-assemble spec file #:resolver resolve)))))))
+
 (define* (url-fetch url file
                     #:key
                     (timeout 10) (verify-certificate? #t)
                     (mirrors '()) (content-addressed-mirrors '())
+                    (disarchive-mirrors '())
                     (hashes '())
                     print-build-trace?)
   "Fetch FILE from URL; URL may be either a single string, or a list of
@@ -693,6 +736,17 @@ otherwise simply ignore them."
                               hashes))
                 content-addressed-mirrors))
 
+  (define disarchive-uris
+    (append-map (lambda (mirror)
+                  (map (match-lambda
+                         ((hash-algo . hash)
+                          (string->uri
+                           (string-append mirror
+                                          (symbol->string hash-algo) "/"
+                                          (bytevector->base16-string hash)))))
+                       hashes))
+                disarchive-mirrors))
+
   ;; Make this unbuffered so 'progress-report/file' works as expected.  'line
   ;; means '\n', not '\r', so it's not appropriate here.
   (setvbuf (current-output-port) 'none)
@@ -705,15 +759,18 @@ otherwise simply ignore them."
        (or (fetch uri file)
            (try tail)))
       (()
-       (format (current-error-port) "failed to download ~s from ~s~%"
-               file url)
-
-       ;; Remove FILE in case we made an incomplete download, for example due
-       ;; to ENOSPC.
-       (catch 'system-error
-         (lambda ()
-           (delete-file file))
-         (const #f))
-       #f))))
+       ;; If we are looking for a software archive, one last thing we
+       ;; can try is to use Disarchive to assemble it.
+       (or (disarchive-fetch/any disarchive-uris file #:timeout timeout)
+           (begin
+             (format (current-error-port) "failed to download ~s from ~s~%"
+                     file url)
+             ;; Remove FILE in case we made an incomplete download, for
+             ;; example due to ENOSPC.
+             (catch 'system-error
+               (lambda ()
+                 (delete-file file))
+               (const #f))
+             #f))))))
 
 ;;; download.scm ends here
diff --git a/guix/download.scm b/guix/download.scm
index 30f69c0325..72094e7318 100644
--- a/guix/download.scm
+++ b/guix/download.scm
@@ -406,12 +406,19 @@
   (plain-file "content-addressed-mirrors"
               (object->string %content-addressed-mirrors)))
 
+(define %disarchive-mirrors
+  '("https://disarchive.ngyro.com/"))
+
+(define %disarchive-mirror-file
+  (plain-file "disarchive-mirrors" (object->string %disarchive-mirrors)))
+
 (define built-in-builders*
   (store-lift built-in-builders))
 
 (define* (built-in-download file-name url
                             #:key system hash-algo hash
                             mirrors content-addressed-mirrors
+                            disarchive-mirrors
                             executable?
                             (guile 'unused))
   "Download FILE-NAME from URL using the built-in 'download' builder.  When
@@ -422,13 +429,16 @@ explicitly depend on Guile, GnuTLS, etc.  Instead, the daemon performs the
 download by itself using its own dependencies."
   (mlet %store-monad ((mirrors (lower-object mirrors))
                       (content-addressed-mirrors
-                       (lower-object content-addressed-mirrors)))
+                       (lower-object content-addressed-mirrors))
+                      (disarchive-mirrors (lower-object disarchive-mirrors)))
     (raw-derivation file-name "builtin:download" '()
                     #:system system
                     #:hash-algo hash-algo
                     #:hash hash
                     #:recursive? executable?
-                    #:sources (list mirrors content-addressed-mirrors)
+                    #:sources (list mirrors
+                                    content-addressed-mirrors
+                                    disarchive-mirrors)
 
                     ;; Honor the user's proxy and locale settings.
                     #:leaked-env-vars '("http_proxy" "https_proxy"
@@ -439,6 +449,7 @@ download by itself using its own dependencies."
                                  ("mirrors" . ,mirrors)
                                  ("content-addressed-mirrors"
                                   . ,content-addressed-mirrors)
+                                 ("disarchive-mirrors" . ,disarchive-mirrors)
                                  ,@(if executable?
                                        '(("executable" . "1"))
                                        '()))
@@ -492,7 +503,9 @@ name in the store."
                              #:executable? executable?
                              #:mirrors %mirror-file
                              #:content-addressed-mirrors
-                             %content-addressed-mirror-file)))))
+                             %content-addressed-mirror-file
+                             #:disarchive-mirrors
+                             %disarchive-mirror-file)))))
 
 (define* (url-fetch/executable url hash-algo hash
                                #:optional name
diff --git a/guix/scripts/perform-download.scm b/guix/scripts/perform-download.scm
index 8d409092ba..6889bcef79 100644
--- a/guix/scripts/perform-download.scm
+++ b/guix/scripts/perform-download.scm
@@ -54,7 +54,8 @@ actual output is different from that when we're doing a 'bmCheck' or
                        (output* "out")
                        (executable "executable")
                        (mirrors "mirrors")
-                       (content-addressed-mirrors "content-addressed-mirrors"))
+                       (content-addressed-mirrors "content-addressed-mirrors")
+                       (disarchive-mirrors "disarchive-mirrors"))
     (unless url
       (leave (G_ "~a: missing URL~%") (derivation-file-name drv)))
 
@@ -79,6 +80,10 @@ actual output is different from that when we're doing a 'bmCheck' or
                              (lambda (port)
                                (eval (read port) %user-module)))
                            '())
+                       #:disarchive-mirrors
+                       (if disarchive-mirrors
+                           (call-with-input-file disarchive-mirrors read)
+                           '())
                        #:hashes `((,algo . ,hash))
 
                        ;; Since DRV's output hash is known, X.509 certificate
-- 
2.31.0






* bug#47336: Disarchive as a fallback for downloads
  2021-03-23  4:42 bug#47336: Disarchive as a fallback for downloads Timothy Sample
  2021-03-23  4:52 ` bug#47336: [PATCH 1/2] swh: Add a directory download procedure Timothy Sample
@ 2021-03-23  5:11 ` Timothy Sample
  1 sibling, 0 replies; 4+ messages in thread
From: Timothy Sample @ 2021-03-23  5:11 UTC (permalink / raw)
  To: 47336; +Cc: control

reassign 47336 guix-patches
thanks

Oops!  I sent this to the wrong list.  My apologies.






Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
