From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp10.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms5.migadu.com with LMTPS id AMw2G6BRrGOeUQAAbAwnHQ (envelope-from ) for ; Wed, 28 Dec 2022 15:24:32 +0100 Received: from aspmx1.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp10.migadu.com with LMTPS id QMxWGqBRrGPfHgAAG6o9tA (envelope-from ) for ; Wed, 28 Dec 2022 15:24:32 +0100 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 03D882B335 for ; Wed, 28 Dec 2022 15:24:31 +0100 (CET) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pAXLi-0008Us-Ku; Wed, 28 Dec 2022 09:24:06 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pAXLe-0008Uc-Rt for bug-guix@gnu.org; Wed, 28 Dec 2022 09:24:02 -0500 Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pAXLe-0002xd-JJ for bug-guix@gnu.org; Wed, 28 Dec 2022 09:24:02 -0500 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pAXLd-0003zD-Vv for bug-guix@gnu.org; Wed, 28 Dec 2022 09:24:02 -0500 X-Loop: help-debbugs@gnu.org Subject: bug#48468: substitute server connection timeout Resent-From: Mathieu Othacehe Original-Sender: "Debbugs-submit" Resent-CC: bug-guix@gnu.org Resent-Date: Wed, 28 Dec 2022 14:24:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 48468 X-GNU-PR-Package: guix X-GNU-PR-Keywords: To: Ludovic =?UTF-8?Q?Court=C3=A8s?= Cc: 48468@debbugs.gnu.org Received: via spool by 48468-submit@debbugs.gnu.org id=B48468.167223741115280 (code B ref 48468); Wed, 28 Dec 2022 14:24:01 +0000 Received: (at 48468) by debbugs.gnu.org; 28 Dec 2022 14:23:31 +0000 Received: from localhost ([127.0.0.1]:57315 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pAXL8-0003yO-Ei for submit@debbugs.gnu.org; Wed, 28 Dec 2022 09:23:31 -0500 Received: from eggs.gnu.org ([209.51.188.92]:51524) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pAXL7-0003yA-5R for 48468@debbugs.gnu.org; Wed, 28 Dec 2022 09:23:29 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pAXL1-0002uC-Im; Wed, 28 Dec 2022 09:23:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:Date:References:In-Reply-To:Subject:To: From; bh=QOPIJnp44LyWwhKxBw3SXZP166pG8HblXCwrDfAzN5Q=; b=QGqLtaL9TJccFG9KgUtQ 7rjgDYC6oBJxK5x/OCq9oan9dwVfYkCAhaB/g+hDaGoXjYaAYqzivPOHOU3zOWHPgydG5V7E80ZK3 6PKv4QG0i4TqwTBxtzaEn6DaqCCJwd7Qbo7m0n94J8mCMwXZ/gcnHZkcSdjWZNuoQ34SbC79vjk4d ZeGMmRjZKrsP4jyy6t7wnGuDHeW2OtOP+nEx9KoJiATj0Yw/zbGOQ/rfjYVVreENyUVhvBQb1w7Yx AvRAhM1zkoQ5nzKwIW9WDT5aXA3J8McGYnNnq7FXMQcfnU2O+uuUsa3M66Mjmh6V/lViu0WmAD66v WT2Rz30v7REHNw==; Received: from [2a01:cb18:832e:5f00:3563:417e:2a38:86d8] (helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pAXL1-00056O-64; Wed, 28 Dec 2022 09:23:23 -0500 From: Mathieu Othacehe In-Reply-To: <87wn72fay3.fsf@gnu.org> ("Ludovic =?UTF-8?Q?Court=C3=A8s?="'s message of "Thu, 08 Dec 2022 11:26:28 +0100") References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) Date: Wed, 28 Dec 2022 15:23:21 +0100 Message-ID: <87v8lvsj3a.fsf@gnu.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-guix@gnu.org List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org Sender: bug-guix-bounces+larch=yhetil.org@gnu.org X-Migadu-Country: US X-Migadu-Flow: FLOW_IN ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1672237472; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:resent-cc:resent-from:resent-sender: resent-message-id:in-reply-to:in-reply-to:references:references: list-id:list-help:list-unsubscribe:list-subscribe:list-post: dkim-signature; bh=QOPIJnp44LyWwhKxBw3SXZP166pG8HblXCwrDfAzN5Q=; b=YMnVZK+X5l+z6+8cHmKxDcfKTKjbM97NCz8GlqyjlU578Czkz3xC2AyLPPv9Ozc0QLDBNJ WCF077TKYP6HcOHksfZwdr7L2t7lRPfzkhfTdHNRyUjb9HO6mfPnVYt95Yj9LH1CthIuUj vkWEMHAHIAbKvMiqoKAbV4YtEZuW30O7UgiD+OGCCtASNkUwbZFgm4ZJ8hsp4oI5Cewb+4 SaD8VRNgyzsJY9e1q44KJ0hZUsnqDCtPE1yq8kccSlGexdO/sIAsTE0J/dmcviZ8LUqgZn 7y+LeiTf7gA/DxheK77AfHXoUIIx2II3sc7UAHlKrBE87r+TFq5smcJvEQQKLg== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=gnu.org header.s=fencepost-gnu-org header.b=QGqLtaL9; spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org"; dmarc=pass (policy=none) header.from=gnu.org ARC-Seal: i=1; s=key1; d=yhetil.org; t=1672237472; a=rsa-sha256; cv=none; b=bh37NBotKapldKr0E7QkJ/cxh0h+9kXUEsdxGh8ciISzvaLMQ9mEIan850+GQDirHY91Oe OQykJQAA27zYybNRuiuGP27Jy3bRI2ZsMG64jQTLjqRXifCnHxDkwI6dQ+BPWIFBtLeM4B Wp4pIclCUbDUuf3BuBx4mhziuv4N0jKItnIs4MgCa+UICLD9oqwqzkZLRVd2z5krQbyjWs OVzt3t7rs3+C+H+xyDJzDVciXwwa6nDmvycukbCdeKuqS1vE8H5srtHFSPCu/GyrXlIIF7 czUPdiNDI/uKwTfNxJEUUCGovdQg5PmtqC7MdPE+chUPAA+AftajInT3fhcR/g== X-Spam-Score: -2.43 X-Migadu-Queue-Id: 03D882B335 Authentication-Results: aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=gnu.org header.s=fencepost-gnu-org header.b=QGqLtaL9; spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org"; dmarc=pass (policy=none) header.from=gnu.org X-Migadu-Scanner: scn1.migadu.com X-Migadu-Spam-Score: -2.43 X-TUID: UnkArDxMbe3V --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hello, So I had an other look to that one, > (From .) > > Plausible explanations that come to mind: > > 1. =E2=80=98guix publish=E2=80=99 returning 404, but not due to baking.= Instead the > .drv is simply not in store, hence 404. > > 2. Client timeout (=E2=80=98guix publish=E2=80=99 fails to reply on tim= e). > > 3. Cached 404 in nginx (though we=E2=80=99re not supposed to cache thos= e I > think?), or timeout in nginx (again due to =E2=80=98guix publish=E2= =80=99 being too > slow and =E2=80=98proxy_read_timeout=E2=80=99 is reached, currently = 10s). I focused on https://ci.guix.gnu.org/build/308493/details which is similar to the one linked above. Some derivations are successfully substituted then, one is not and it aborts. This build stopped at 2022/12/24 00:25:38 CET. At that time the nginx error log on Berlin looks like: --8<---------------cut here---------------start------------->8--- 2022/12/24 00:25:23 [info] 128755#0: *178076209 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:24 [info] 128742#0: *178076816 client 174.59.221.51 closed= keepalive connection 2022/12/24 00:25:25 [info] 128755#0: *178076830 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /vx7lgvrcw6i6r0inw513qn6fj3cb27mm.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/vx7lgvrcw6i6r0inw513qn6fj3cb27mm.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:26 [info] 128742#0: *178077436 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /b1g8vyqp8c52yk6vscdvjflci5mnladq.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/b1g8vyqp8c52yk6vscdvjflci5mnladq.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:28 [info] 128742#0: *178078045 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /007zgflsl5xkr377wpakbsis5c2yqh1q.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/007zgflsl5xkr377wpakbsis5c2yqh1q.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:32 [info] 128742#0: *178078659 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /xdfyi8jki350i8njk1glrzg9p0va18vy.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/xdfyi8jki350i8njk1glrzg9p0va18vy.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:33 [info] 128741#0: *178079271 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /rzs7jiwhyls2bg8zv2g1vmmgcfwkvaax.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/rzs7jiwhyls2bg8zv2g1vmmgcfwkvaax.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:36 [info] 128741#0: *178080049 client 157.55.39.212 closed= keepalive connection 2022/12/24 00:25:38 [info] 128741#0: *178080047 peer closed connection in S= SL handshake while SSL handshaking, client: 127.0.0.1, server: 0.0.0.0:443 --8<---------------cut here---------------end--------------->8--- That would, in that specific case, invalidate your hypothesis number 2. When 'guix publish' timeouts, we have messages like: --8<---------------cut here---------------start------------->8--- 2022/12/24 00:45:36 [error] 128749#0: *178225211 upstream timed out (110: C= onnection timed out) while reading response header from upstream, client: 8= 6.87.253.49, server: ci.guix.gnu.org, request: "GET /m0hqyvj8di41xlraz087km= pr38zaw90r.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/m0hqyvj8di41= xlraz087kmpr38zaw90r.narinfo", host: "ci.guix.gnu.org" --8<---------------cut here---------------end--------------->8--- However, like suggested in your hypothesis number 1, it seems instead that we are replying 404 to the worker which resets the connection. As we have put aside the baking thing, the question is now why are those derivations not available? Are they not part of Berlin's store at that time? Or is the publish server erroneously returns 404? It looks like the requested derivations are still absent, as of right now: --8<---------------cut here---------------start------------->8--- mathieu@berlin /var/log/nginx$ ls /gnu/store/vx7lgvrcw6i6r0inw513qn6fj3cb27= mm* ls: cannot access '/gnu/store/vx7lgvrcw6i6r0inw513qn6fj3cb27mm*': No such f= ile or directory mathieu@berlin /var/log/nginx$ ls /gnu/store/b1g8vyqp8c52yk6vscdvjflci5mnla= dq* ls: cannot access '/gnu/store/b1g8vyqp8c52yk6vscdvjflci5mnladq*': No such f= ile or directory mathieu@berlin /var/log/nginx$ ls /gnu/store/007zgflsl5xkr377wpakbsis5c2yqh= 1q* ls: cannot access '/gnu/store/007zgflsl5xkr377wpakbsis5c2yqh1q*': No such f= ile or directory --8<---------------cut here---------------end--------------->8--- As I don't have much clue about what are those derivations, I think we should instrument a bit the publish server and maybe the substitute script like proposed in the attachments. WDYT? Thanks, Mathieu --=-=-= Content-Type: text/x-patch; charset=utf-8 Content-Disposition: inline; filename=0001-scripts-publish-Add-a-custom-baking-header.patch Content-Transfer-Encoding: quoted-printable >From 9f9c839937ac2edd1b5901b2262c4be0954fa20c Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe Date: Wed, 28 Dec 2022 15:12:46 +0100 Subject: [PATCH 1/2] scripts: publish: Add a custom baking header. Log the not-found responses and their reason (baking or not) to stdout. Also send the X-Baking custom header so that the client can be informed of the cause of the failure. * guix/scripts/publish.scm (not-found): Add a baking? argument to add the X-Baking HTTP header to the response if baking is in progress. Also, log t= he 404 responses to stdout, indicating if it is due to baking or not. (render-narinfo/cached): Pass the baking? argument. --- guix/scripts/publish.scm | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm index 3bf3bd9c7c..11fedf092e 100644 --- a/guix/scripts/publish.scm +++ b/guix/scripts/publish.scm @@ -4,7 +4,7 @@ ;;; Copyright =C2=A9 2015-2022 Ludovic Court=C3=A8s ;;; Copyright =C2=A9 2020 Maxim Cournoyer ;;; Copyright =C2=A9 2021 Simon Tournier -;;; Copyright =C2=A9 2021 Mathieu Othacehe +;;; Copyright =C2=A9 2021, 2022 Mathieu Othacehe ;;; ;;; This file is part of GNU Guix. ;;; @@ -375,14 +375,28 @@ (define* (narinfo-string store store-path compression))) compressions)))) =20 +;; Custom header to indicate that baking is in progress. +(declare-opaque-header! "X-Baking") + (define* (not-found request - #:key (phrase "Resource not found") + #:key + baking? + (phrase "Resource not found") ttl) "Render 404 response for REQUEST." + (format #t (G_ "=E2=86=B3 ~a ~a: 404~a~%") + (request-method request) + (uri-path (request-uri request)) + (if baking? " (baking)" "")) (values (build-response #:code 404 - #:headers (if ttl - `((cache-control (max-age . ,ttl))) - '())) + #:headers + (append + (if ttl + `((cache-control (max-age . ,ttl))) + '()) + (if baking? + '((x-baking . "1")) + '()))) (string-append phrase ": " (uri-path (request-uri request))))) =20 @@ -587,6 +601,7 @@ (define (delete-entry narinfo) #:nar-path nar-path #:compressions compressions) (not-found request + #:baking? #t #:phrase "We're baking it" #:ttl 300))) ;should be available withi= n 5m (else --=20 2.38.1 --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0002-substitutes-Log-the-failing-queries.patch >From 25ffc57864dbf34ca58741f89c1f790dbde6702f Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe Date: Wed, 28 Dec 2022 15:19:29 +0100 Subject: [PATCH 2/2] substitutes: Log the failing queries. * guix/substitutes.scm (%debug?): New variable. (handle-narinfo-response): Log the failing queries if the %debug? parameter is set. --- guix/substitutes.scm | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/guix/substitutes.scm b/guix/substitutes.scm index 9014cf61ec..819eb2c73e 100644 --- a/guix/substitutes.scm +++ b/guix/substitutes.scm @@ -90,6 +90,12 @@ (define %narinfo-cache-directory (string-append %state-directory "/substitute/cache")) (string-append (cache-directory #:ensure? #f) "/substitute"))) +(define %debug? + ;; Enable debug mode by setting the GUIX_SUBSTITUTE_DEBUG environmnent + ;; variable. + (make-parameter + (getenv "GUIX_SUBSTITUTE_DEBUG"))) + (define (narinfo-cache-file cache-url path) "Return the name of the local file that contains an entry for PATH. The entry is stored in a sub-directory specific to CACHE-URL." @@ -224,6 +230,15 @@ (define (handle-narinfo-response request response port result) (let* ((path (uri-path (request-uri request))) (hash-part (basename (string-drop-right path 8)))) ;drop ".narinfo" + ;; Log the failing queries and indicate if it failed because the + ;; narinfo is being baked. + (when (%debug?) + (let ((baking? + (assoc-ref (response-headers response) 'x-baking))) + (display + (format #f "could not fetch ~a~a ~a~a~%" + url path code + (if baking? " (baking)" ""))))) (if len (get-bytevector-n port len) (read-to-eof port)) -- 2.38.1 --=-=-=--