From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [bug#44193] [PATCH 1/1] publish: Add '--cache-bypass-threshold'.
References: <20201024144929.4529-1-ludo@gnu.org>
In-Reply-To: <20201024144929.4529-1-ludo@gnu.org>
To: 44193@debbugs.gnu.org
Cc: Ludovic Courtès
From: Ludovic Courtès
Date: Sat, 24 Oct 2020 16:54:16 +0200
Message-Id: <20201024145416.4691-1-ludo@gnu.org>
X-Mailer: git-send-email 2.28.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

* guix/scripts/publish.scm (show-help, %options): Add
'--cache-bypass-threshold'.
(low-compression): New procedure.
(cache-bypass-threshold): New parameter.
(bypass-cache?): New procedure.
(render-narinfo/cached): Call 'render-narinfo' when 'bypass-cache?'
returns true.
(render-nar/cached): Call 'render-nar' when 'bypass-cache?' returns
true.
(guix-publish): Parameterize 'cache-bypass-threshold'.
* tests/publish.scm ("with cache", "with cache, lzip + gzip")
("with cache, uncompressed"): Pass '--cache-bypass-threshold=0'.
("with cache, vanishing item"): Expect 200 for RESPONSE.
("with cache, cache bypass"): New test.
---
 doc/guix.texi            | 24 +++++++++++-
 guix/scripts/publish.scm | 85 ++++++++++++++++++++++++++++++++--------
 tests/publish.scm        | 43 ++++++++++++++++++--
 3 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index b5061877e2..633c974562 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -11977,13 +11977,20 @@ in advance, so @command{guix publish} does not add a
 prevents clients from knowing the amount of data being downloaded.

 Conversely, when @option{--cache} is used, the first request for a store
-item (@i{via} a @code{.narinfo} URL) returns 404 and triggers a
+item (@i{via} a @code{.narinfo} URL) triggers a
 background process to @dfn{bake} the archive---computing its
 @code{.narinfo} and compressing the archive, if needed. Once the
 archive is cached in @var{directory}, subsequent requests succeed and
 are served directly from the cache, which guarantees that clients get
 the best possible bandwidth.

+That first @code{.narinfo} request nonetheless returns 200, provided the
+requested store item is ``small enough'', below the cache bypass
+threshold---see @option{--cache-bypass-threshold} below. That way,
+clients do not have to wait until the archive is baked. For larger
+store items, the first @code{.narinfo} request returns 404, meaning that
+clients have to wait until the archive is baked.
+
 The ``baking'' process is performed by worker threads. By default, one
 thread per CPU core is created, but this can be customized. See
 @option{--workers} below.
@@ -12009,6 +12016,21 @@ Additionally, when @option{--cache} is used, cached entries that have not
 been accessed for @var{ttl} and that no longer have a corresponding item
 in the store, may be deleted.

+@item --cache-bypass-threshold=@var{size}
+When used in conjunction with @option{--cache}, store items smaller than
+@var{size} are immediately available, even when they are not yet in
+cache. @var{size} is a size in bytes, or it can be suffixed by @code{M}
+for megabytes and so on. The default is @code{10M}.
+
+``Cache bypass'' allows you to reduce the publication delay for clients
+at the expense of possibly additional I/O and CPU use on the server
+side: depending on the client access patterns, those store items can end
+up being baked several times until a copy is available in cache.
+
+Increasing the threshold may be useful for sites that have few users, or
+to guarantee that users get substitutes even for store items that are
+not popular.
+
 @item --nar-path=@var{path}
 Use @var{path} as the prefix for the URLs of ``nar'' files
 (@pxref{Invoking guix archive, normalized archives}).
diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm
index 4eaf961ab2..c0150c74da 100644
--- a/guix/scripts/publish.scm
+++ b/guix/scripts/publish.scm
@@ -81,6 +81,9 @@ Publish ~a over HTTP.\n") %store-directory)
                          compress archives with METHOD at LEVEL"))
   (display (G_ "
   -c, --cache=DIRECTORY  cache published items to DIRECTORY"))
+  (display (G_ "
+      --cache-bypass-threshold=SIZE
+                         serve store items below SIZE even when not cached"))
   (display (G_ "
       --workers=N        use N workers to bake items"))
   (display (G_ "
@@ -134,6 +137,12 @@ if ITEM is already compressed."
       (list %no-compression)
       requested))

+(define (low-compression c)
+  "Return <compression> of the same type as C, but optimized for low CPU
+usage."
+  (compression (compression-type c)
+               (min (compression-level c) 2)))
+
 (define %options
   (list (option '(#\h "help") #f #f
                 (lambda _
@@ -184,6 +193,10 @@ if ITEM is already compressed."
         (option '(#\c "cache") #t #f
                 (lambda (opt name arg result)
                   (alist-cons 'cache arg result)))
+        (option '("cache-bypass-threshold") #t #f
+                (lambda (opt name arg result)
+                  (alist-cons 'cache-bypass-threshold (size->number arg)
+                              result)))
         (option '("workers") #t #f
                 (lambda (opt name arg result)
                   (alist-cons 'workers (string->number* arg)
@@ -434,7 +447,7 @@ items. Failing that, we could eventually have to recompute them and return
                                          (expiration-time file))))))

 (define (hash-part->path* store hash cache)
-  "Like 'hash-part->path' but cached results under CACHE. This ensures we can
+  "Like 'hash-part->path' but cache results under CACHE. This ensures we can
 still map HASH to the corresponding store file name, even if said store item
 vanished from the store in the meantime."
   (let ((cached (hash-part-mapping-cache-file cache hash)))
@@ -454,6 +467,18 @@ vanished from the store in the meantime."
                 result))
           (apply throw args))))))

+(define cache-bypass-threshold
+  ;; Maximum size of a store item that may be served by the '/cached' handlers
+  ;; below even when not in cache.
+  (make-parameter (* 10 (expt 2 20))))
+
+(define (bypass-cache? store item)
+  "Return true if we allow ITEM to be downloaded before it is cached. ITEM is
+interpreted as the basename of a store item."
+  (guard (c ((store-error? c) #f))
+    (< (path-info-nar-size (query-path-info store item))
+       (cache-bypass-threshold))))
+
 (define* (render-narinfo/cached store request hash
                                 #:key ttl (compressions (list %no-compression))
                                 (nar-path "nar")
@@ -513,9 +538,20 @@ requested using POOL."
                                                  (nar-expiration-time ttl)
                                                  #:delete-entry delete-entry
                                                  #:cleanup-period ttl))))
-             (not-found request
-                        #:phrase "We're baking it"
-                        #:ttl 300)) ;should be available within 5m
+
+             ;; If ITEM passes 'bypass-cache?', render a temporary narinfo right
+             ;; away, with a short TTL. The narinfo is temporary because it
+             ;; lacks 'FileSize', for instance, which the cached narinfo will
+             ;; have. Chances are that the nar will be baked by the time the
+             ;; client asks for it.
+             (if (bypass-cache? store item)
+                 (render-narinfo store request hash
+                                 #:ttl 300 ;temporary
+                                 #:nar-path nar-path
+                                 #:compressions compressions)
+                 (not-found request
+                            #:phrase "We're baking it"
+                            #:ttl 300))) ;should be available within 5m
             (else
              (not-found request #:phrase "")))))

@@ -627,19 +663,31 @@ return it; otherwise, return 404. When TTL is true, use it as the
 'Cache-Control' expiration time."
   (let ((cached (nar-cache-file cache store-item
                                 #:compression compression)))
-    (if (file-exists? cached)
-        (values `((content-type . (application/octet-stream
-                                   (charset . "ISO-8859-1")))
-                  ,@(if ttl
-                        `((cache-control (max-age . ,ttl)))
-                        '())
+    (cond ((file-exists? cached)
+           (values `((content-type . (application/octet-stream
+                                      (charset . "ISO-8859-1")))
+                     ,@(if ttl
+                           `((cache-control (max-age . ,ttl)))
+                           '())

-                  ;; XXX: We're not returning the actual contents, deferring
-                  ;; instead to 'http-write'. This is a hack to work around
-                  ;; .
-                  (x-raw-file .
,cached))
-                #f)
-        (not-found request))))
+                     ;; XXX: We're not returning the actual contents, deferring
+                     ;; instead to 'http-write'. This is a hack to work around
+                     ;; .
+                     (x-raw-file . ,cached))
+                   #f))
+          ((let* ((hash (and=> (string-index store-item #\-)
+                               (cut string-take store-item <>)))
+                  (item (and hash
+                             (guard (c ((store-error? c) #f))
+                               (hash-part->path store hash)))))
+             (and item (bypass-cache? store item)))
+           ;; Render STORE-ITEM live. We reach this because STORE-ITEM is
+           ;; being baked but clients are already asking for it. Thus, we're
+           ;; duplicating work, but doing so allows us to reduce delays.
+           (render-nar store request store-item
+                       #:compression (low-compression compression)))
+          (else
+           (not-found request)))))


 (define (render-content-addressed-file store request name algo hash)
@@ -1061,7 +1109,10 @@ methods, return the applicable compression."
 consider using the '--user' option!~%")))

         (parameterize ((%public-key public-key)
-                       (%private-key private-key))
+                       (%private-key private-key)
+                       (cache-bypass-threshold
+                        (or (assoc-ref opts 'cache-bypass-threshold)
+                            (cache-bypass-threshold))))
           (info (G_ "publishing ~a on ~a, port ~d~%")
                 %store-directory
                 (inet-ntop (sockaddr:fam address) (sockaddr:addr address))
diff --git a/tests/publish.scm b/tests/publish.scm
index 1c3b2785fb..f081d016d3 100644
--- a/tests/publish.scm
+++ b/tests/publish.scm
@@ -412,7 +412,8 @@ References: ~%"
                        (call-with-new-thread
                         (lambda ()
                           (guix-publish "--port=6797" "-C2"
-                                        (string-append "--cache=" cache)))))))
+                                        (string-append "--cache=" cache)
+                                        "--cache-bypass-threshold=0"))))))
       (wait-until-ready 6797)
       (let* ((base "http://localhost:6797/")
              (part (store-path-hash-part %item))
@@ -461,7 +462,8 @@ References: ~%"
                        (call-with-new-thread
                         (lambda ()
                           (guix-publish "--port=6794" "-Cgzip:2" "-Clzip:2"
-                                        (string-append "--cache=" cache)))))))
+                                        (string-append "--cache=" cache)
+                                        "--cache-bypass-threshold=0"))))))
       (wait-until-ready 6794)
       (let* ((base "http://localhost:6794/")
              (part
(store-path-hash-part %item))
@@ -516,7 +518,8 @@ References: ~%"
                        (call-with-new-thread
                         (lambda ()
                           (guix-publish "--port=6796" "-C2" "--ttl=42h"
-                                        (string-append "--cache=" cache)))))))
+                                        (string-append "--cache=" cache)
+                                        "--cache-bypass-threshold=0"))))))
       (wait-until-ready 6796)
       (let* ((base "http://localhost:6796/")
              (part (store-path-hash-part item))
@@ -580,12 +583,44 @@ References: ~%"
                                            (basename item)
                                            ".narinfo"))
                (response (http-get url)))
-          (and (= 404 (response-code response))
+          (and (= 200 (response-code response)) ;we're below the threshold
                (wait-for-file cached)
                (begin
                  (delete-paths %store (list item))
                  (response-code (pk 'response (http-get url))))))))))

+(test-equal "with cache, cache bypass"
+  200
+  (call-with-temporary-directory
+   (lambda (cache)
+     (let ((thread (with-separate-output-ports
+                    (call-with-new-thread
+                     (lambda ()
+                       (guix-publish "--port=6788" "-C" "gzip"
+                                     (string-append "--cache=" cache)))))))
+       (wait-until-ready 6788)
+
+       (let* ((base "http://localhost:6788/")
+              (item (add-text-to-store %store "random" (random-text)))
+              (part (store-path-hash-part item))
+              (narinfo (string-append base part ".narinfo"))
+              (nar (string-append base "nar/gzip/" (basename item)))
+              (cached (string-append cache "/gzip/" (basename item)
+                                     ".narinfo")))
+         ;; We're below the default cache bypass threshold, so NAR and NARINFO
+         ;; should immediately return 200. The NARINFO request should trigger
+         ;; caching, and the next request to NAR should return 200 as well.
+         (and (let ((response (pk 'r1 (http-get nar))))
+                (and (= 200 (response-code response))
+                     (not (response-content-length response)))) ;not known
+              (= 200 (response-code (http-get narinfo)))
+              (begin
+                (wait-for-file cached)
+                (let ((response (pk 'r2 (http-get nar))))
+                  (and (> (response-content-length response)
+                          (stat:size (stat item)))
+                       (response-code response))))))))))
+
 (test-equal "/log/NAME"
   `(200 #t application/x-bzip2)
   (let ((drv (run-with-store %store
-- 
2.28.0
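
For readers who want to try the bypass rule outside of Guix, here is a minimal stand-alone sketch in plain Guile. It is not the code from the patch: the <compression> record below is a hypothetical stand-in for the one in publish.scm, 'bypass-cache?' takes a nar size directly instead of calling 'query-path-info' on a store, and 'first-request-status' is a name made up for this illustration. Only the 10 MiB default and the cap at compression level 2 come from the patch itself.

```scheme
;; Stand-alone sketch; it does not load any (guix ...) module.
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for the compression record used by publish.scm.
(define-record-type <compression>
  (compression type level)
  compression?
  (type  compression-type)
  (level compression-level))

(define (low-compression c)
  ;; Same type as C, with the level capped at 2, as in the patch.
  (compression (compression-type c)
               (min (compression-level c) 2)))

;; Default threshold taken from the patch: 10 MiB.
(define cache-bypass-threshold
  (make-parameter (* 10 (expt 2 20))))

(define (bypass-cache? nar-size)
  ;; Assumption: NAR-SIZE replaces the store lookup done in the patch;
  ;; #f models an item that is missing from the store.
  (and nar-size
       (< nar-size (cache-bypass-threshold))))

(define (first-request-status nar-size)
  ;; HTTP status of a request for an item that is not cached yet.
  (if (bypass-cache? nar-size) 200 404))

(first-request-status 4096)                   ;200: below the threshold
(first-request-status (* 50 (expt 2 20)))     ;404: wait for baking
(parameterize ((cache-bypass-threshold 0))    ;like '--cache-bypass-threshold=0'
  (first-request-status 4096))                ;404
```

The last form mirrors what the updated tests do: with the threshold set to 0, no item is ever smaller than it, so the pre-patch behavior (404 until the item is baked) is restored.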