unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
From: Julien Lepiller <julien@lepiller.eu>
To: 39530@debbugs.gnu.org
Subject: [bug#39530] [PATCH] guix: Support partial download
Date: Sun, 9 Feb 2020 20:23:58 +0100	[thread overview]
Message-ID: <20200209202358.17bb4a39@tachikoma.lepiller.eu> (raw)

[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]

Hi Guix!

This patch adds support for partial download of fixed-output
derivations. I'm not very confident it can be pushed as-is, and it has
some shortcomings.

First, I make sure that the guix daemon will not remove previously
failed attempts when trying to build something again, when that is a
fixed-output derivation. Then, I add a Range HTTP header when
performing an HTTP fetch; this ensures that we only query for the part
we don't already have, and append it to the target file.

If a partial download fails, the same mirror/url is tried again, but
the partial file is removed first, ensuring we do a complete fetch this
time around. If that failed too, we try with the following url. If we
only perform a complete fetch, we proceed as usual. The next url will
be a partial fetch if there is already something locally.

The use-case is: I have a very unreliable wifi currently and when
downloading a big source (or substitute, but this patch doesn't address
that use-case), the connection is sometimes dropped in the middle and i
have to fetch everything from scratch. With this patch, the download
resumes.

Some issues that might need to be fixed: progress only shows for the
rest of the file, it would be nicer if it could start again where it
was before (say the connection dropped at 34%, then the progress bar
should start from 34%). When there are at least two urls it goes like
that: fetch a partial file, connection drops. Remove the file and try
again, connection drops. Go to the next mirror, fetch a partial file.
The first mirror restarted the download from the beginning, but we'd
like it not to, and skip to the following mirror instead. When there is
a hash mismatch, the file is fetched twice on a further attempt.

When testing locally with guix build -S ghostscript (and running the
daemon from ./pre-inst-env), the download went fine. Cancelling it in
the middle and restarting it did continue the download instead of
starting again, which is nice :).

However, with that daemon there was a lot of new builds required to run
guix environment guix as my user (and nothing was substituted, which
is weird), whereas with the system's daemon, there was nothing to
build. Maybe there's something fishy in that patch...

[-- Attachment #2: 0001-guix-download-Add-partial-download-support.patch --]
[-- Type: text/x-patch, Size: 5205 bytes --]

From 332793b7f29ea68ac9a1af22e3d1c4745200da7e Mon Sep 17 00:00:00 2001
From: Julien Lepiller <julien@lepiller.eu>
Date: Sun, 9 Feb 2020 19:47:27 +0100
Subject: [PATCH] guix: download: Add partial download support.

* nix/libstore/build.cc (tryToBuild): Do not remove invalid fixed-output
derivations.
* guix/build/download.scm (http-fetch): Add a range argument.
(url-fetch): Performa partial download if a file already exists.
---
 guix/build/download.scm | 36 +++++++++++++++++++++++++++---------
 nix/libstore/build.cc   |  1 +
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/guix/build/download.scm b/guix/build/download.scm
index 0f2d5f402a..c2b710becc 100644
--- a/guix/build/download.scm
+++ b/guix/build/download.scm
@@ -654,11 +654,13 @@ Return the resulting target URI."
                     #:query    (uri-query    ref)
                     #:fragment (uri-fragment ref)))))
 
-(define* (http-fetch uri #:key timeout (verify-certificate? #t))
+(define* (http-fetch uri #:key timeout (verify-certificate? #t) range)
   "Return an input port containing the data at URI, and the expected number of
 bytes available or #f.  When TIMEOUT is true, bail out if the connection could
 not be established in less than TIMEOUT seconds.  When VERIFY-CERTIFICATE? is
-true, verify HTTPS certificates; otherwise simply ignore them."
+true, verify HTTPS certificates; otherwise simply ignore them.  When RANGE is
+a number, it is the number of bytes we want to skip from the data at URI;
+otherwise the full document is requested."
 
   (define headers
     `(;; Some web sites, such as http://dist.schmorp.de, would block you if
@@ -670,6 +672,10 @@ true, verify HTTPS certificates; otherwise simply ignore them."
       ;; Acceptable" when not explicitly told that everything is accepted.
       (Accept . "*/*")
 
+      ,@(if range
+            `((Range . ,(string-append "bytes=" (number->string range) "-")))
+            '())
+
       ;; Basic authentication, if needed.
       ,@(match (uri-userinfo uri)
           ((? string? str)
@@ -690,7 +696,8 @@ true, verify HTTPS certificates; otherwise simply ignore them."
                 ((code)
                  (response-code resp)))
     (case code
-      ((200)                                      ; OK
+      ((200                                       ; OK
+        206)                                      ; Partial content
        (values port (response-content-length resp)))
       ((301                                       ; moved permanently
         302                                       ; found (redirection)
@@ -703,7 +710,8 @@ true, verify HTTPS certificates; otherwise simply ignore them."
          (close connection)
          (http-fetch uri
                      #:timeout timeout
-                     #:verify-certificate? verify-certificate?)))
+                     #:verify-certificate? verify-certificate?
+                     #:range range)))
       (else
        (error "download failed" (uri->string uri)
               code (response-reason-phrase resp))))))
@@ -775,10 +783,15 @@ otherwise simply ignore them."
       ((http https)
        (false-if-exception*
         (let-values (((port size)
-                      (http-fetch uri
-                                  #:verify-certificate? verify-certificate?
-                                  #:timeout timeout)))
-          (call-with-output-file file
+                      (if (file-exists? file)
+                        (http-fetch uri
+                                    #:verify-certificate? verify-certificate?
+                                    #:timeout timeout
+                                    #:range (stat:size (stat file)))
+                        (http-fetch uri
+                                    #:verify-certificate? verify-certificate?
+                                    #:timeout timeout))))
+          (call-with-port (open-file file (if (file-exists? file) "a" "w"))
             (lambda (output)
               (dump-port* port output
                           #:buffer-size %http-receive-buffer-size
@@ -817,7 +830,12 @@ otherwise simply ignore them."
   (let try ((uri (append uri content-addressed-uris)))
     (match uri
       ((uri tail ...)
-       (or (fetch uri file)
+       (or (if (file-exists? file)
+	       ;; If the file already exists, we do a partial fetch for the
+	       ;; remainder.  If something went wrong, we remove the file
+	       ;; and try to fetch the whole file instead.
+               (or (fetch uri file) (begin (delete-file file) (fetch uri file)))
+               (fetch uri file))
            (try tail)))
       (()
        (format (current-error-port) "failed to download ~s from ~s~%"
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index 17e92c68a7..176ab40226 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -1320,6 +1320,7 @@ void DerivationGoal::tryToBuild()
         Path path = i->second.path;
         if (worker.store.isValidPath(path)) continue;
         if (!pathExists(path)) continue;
+	if (fixedOutput) continue;
         debug(format("removing invalid path `%1%'") % path);
         deletePath(path);
     }
-- 
2.24.0


             reply	other threads:[~2020-02-09 19:25 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-09 19:23 Julien Lepiller [this message]
2020-02-19 16:04 ` [bug#39530] [PATCH] guix: Support partial download Ludovic Courtès
2020-02-19 16:25   ` Julien Lepiller
2020-02-19 22:07     ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200209202358.17bb4a39@tachikoma.lepiller.eu \
    --to=julien@lepiller.eu \
    --cc=39530@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).