From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id wD0EHBVkoWC8IgAAgWs5BA (envelope-from ) for ; Sun, 16 May 2021 20:27:33 +0200 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id mMCtFxVkoWAOdQAAB5/wlQ (envelope-from ) for ; Sun, 16 May 2021 18:27:33 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id C676113D94 for ; Sun, 16 May 2021 20:27:32 +0200 (CEST) Received: from localhost ([::1]:39006 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1liLUA-0002AS-30 for larch@yhetil.org; Sun, 16 May 2021 14:27:30 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51386) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1liLTi-0002AB-2V for bug-guix@gnu.org; Sun, 16 May 2021 14:27:02 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:39912) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1liLTh-0005dv-Rg for bug-guix@gnu.org; Sun, 16 May 2021 14:27:01 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1liLTh-0002nq-OI for bug-guix@gnu.org; Sun, 16 May 2021 14:27:01 -0400 X-Loop: help-debbugs@gnu.org Subject: bug#48468: substitute server connection timeout Resent-From: Christopher Baines Original-Sender: "Debbugs-submit" Resent-CC: bug-guix@gnu.org Resent-Date: Sun, 16 May 2021 18:27:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 48468 X-GNU-PR-Package: guix X-GNU-PR-Keywords: To: Mathieu Othacehe X-Debbugs-Original-Cc: bug-guix@gnu.org, 48468@debbugs.gnu.org Received: via spool by submit@debbugs.gnu.org id=B.162118959210739 (code B ref -1); Sun, 16 May 2021 18:27:01 +0000 Received: (at submit) by debbugs.gnu.org; 16 May 2021 18:26:32 +0000 Received: from localhost ([127.0.0.1]:51458 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1liLTE-0002n9-5G for submit@debbugs.gnu.org; Sun, 16 May 2021 14:26:32 -0400 Received: from lists.gnu.org ([209.51.188.17]:40602) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1liLTD-0002n2-2G for submit@debbugs.gnu.org; Sun, 16 May 2021 14:26:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51284) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1liLTC-00028q-TL for bug-guix@gnu.org; Sun, 16 May 2021 14:26:30 -0400 Received: from mira.cbaines.net ([212.71.252.8]:56692) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1liLTA-0005Ms-Op; Sun, 16 May 2021 14:26:30 -0400 Received: from localhost (unknown [IPv6:2a02:8010:68c1:0:8ac0:b4c7:f5c8:7caa]) by mira.cbaines.net (Postfix) with ESMTPSA id 64FE527BC78; Sun, 16 May 2021 19:26:26 +0100 (BST) Received: from capella (localhost [127.0.0.1]) by localhost (OpenSMTPD) with ESMTP id 835715d7; Sun, 16 May 2021 18:26:25 +0000 (UTC) References: <87lf8e4l42.fsf@gnu.org> User-agent: mu4e 1.4.15; emacs 27.1 From: Christopher Baines In-reply-to: <87lf8e4l42.fsf@gnu.org> Date: Sun, 16 May 2021 19:26:22 +0100 Message-ID: <87eee6cz75.fsf@cbaines.net> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" Received-SPF: pass client-ip=212.71.252.8; envelope-from=mail@cbaines.net; helo=mira.cbaines.net X-Spam_score_int: -6 X-Spam_score: -0.7 X-Spam_bar: / X-Spam_report: (-0.7 / 5.0 requ) BAYES_00=-1.9, NORMAL_HTTP_TO_IP=0.001, NUMERIC_HTTP_ADDR=1.242, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, WEIRD_PORT=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-guix@gnu.org List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 48468@debbugs.gnu.org Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org Sender: "bug-Guix" X-Migadu-Flow: FLOW_IN ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1621189653; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:resent-cc:resent-from:resent-sender: resent-message-id:in-reply-to:in-reply-to:references:references: list-id:list-help:list-unsubscribe:list-subscribe:list-post; bh=GZHkPFUAh91GyRMzxoOtAZLHQasd89AUszVzD7Ope3s=; b=JFQCy+JciFfa/INrAbIsPzFfcpakpc5CekA39hjPnrLZ8jv+oXqcABKtVAPxwpMKjhYDDL P42APJ24RbkTSBG1GVUi+kVCMseiX5NYYEaSezO0vKmmQn/5cFAdht2yOT23iapkkwgStj daZptDdEYOnhakepSr83qCei2ajL6+etR9qP0MihrAzFI9dqlgyN/EHQBC2ABRLdcE9xXd i7Ut6rHwnlrwmwCVR0DRBaO4mX9m7otGlEZ2AkJVRujG48iltBto6AhJahsep3Q1MOs+h/ JkOFpNbz++e7WXPfxnUEIsnXeGl60f1QGahrNjbG2NAMwmJClw7SPzPNPL/EHA== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1621189653; a=rsa-sha256; cv=none; b=YbBg84r0v8dTJqriFMzjc0lq2U8jquS6UejoGH9B/nKyxOguCGT3x+uKamqTOMhzoLJVvd 3JQDdV7dp+lGKJr/gVyOMFn17ntMkNCxsxd4N+a6itpDjgmF/rZBO0UHWTEhwxU98zuoOz wR1JvMRYVCxi4N79Xbrb5AY9bu+03wxxWs+DhvRaxNujeyZ0QhxUFf8RZnnZFACQk+Okw4 66nfJS7O8P09540GFEC+QyPXHh5/XrWDV2OhliZPiq9PJMWxi36iMNO3/BwEdMhcRVuVQd HulN0VC9DisyqlSTlUaJSIFxh6QBHAjmcugAG/eY9oCoihxPBrfZPGgJT+qPfQ== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of bug-guix-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=bug-guix-bounces@gnu.org X-Migadu-Spam-Score: -3.05 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of bug-guix-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=bug-guix-bounces@gnu.org X-Migadu-Queue-Id: C676113D94 X-Spam-Score: -3.05 X-Migadu-Scanner: scn0.migadu.com X-TUID: dk8lzFYRv2/C --=-=-= Content-Type: text/plain Mathieu Othacehe writes: > Hello, > > We recently have a lot of those errors on Cuirass: > > --8<---------------cut here---------------start------------->8--- > guix substitute: warning: while fetching http://141.80.167.131:5557/nar/g7ka09613k5v1vlznh87yg35905ggw51-python2-scipy-1.2.2-guile-builder: server is somewhat slow > guix substitute: warning: try `--no-substitutes' if the problem persists > guix substitute: error: connect*: Connection timed out > --8<---------------cut here---------------end--------------->8--- > > which means that the workers are failing to connect to the Cuirass > remote-server publish server on berlin at 141.80.167.131:5557. > > Stracing this publish server shows that connection reuse seems to be > broken: > > --8<---------------cut here---------------start------------->8--- > accept4(9, {sa_family=AF_INET, sin_port=htons(41742), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41744), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41746), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25 > accept4(9, {sa_family=AF_INET, sin_port=htons(41748), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24 > accept4(9, {sa_family=AF_INET, sin_port=htons(41750), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41752), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41754), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25 > accept4(9, {sa_family=AF_INET, sin_port=htons(41756), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41758), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 26 > accept4(9, {sa_family=AF_INET, sin_port=htons(41760), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24 > accept4(9, {sa_family=AF_INET, sin_port=htons(41762), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41764), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41766), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41768), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 22 > accept4(9, {sa_family=AF_INET, sin_port=htons(41770), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41772), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41774), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41776), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41778), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41780), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41782), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > --8<---------------cut here---------------end--------------->8--- > > Investigating it, I found that the connection is closed and opened > multiple times in the call-with-cached-connection procedure of the (guix > script substitute) module. > > It looks like its because a 'bad-headers exception is raised when trying > to parse an eof object: > > --8<---------------cut here---------------start------------->8--- > ;;; (error bad-header (read-header-line #)) > --8<---------------cut here---------------end--------------->8--- > > I'm not sure where this eof comes from. There is this comment in the > http-multiple-get procedure in (guix http-client): > > --8<---------------cut here---------------start------------->8--- > ;; Swallow networking errors that could occur due to connection reuse > ;; and the like; they will be handled down the road when trying to > ;; read responses. > (false-if-networking-error > (begin > (for-each (cut write-request <> buffer) batch) > (put-bytevector p (get)) > (force-output p)))) > --8<---------------cut here---------------end--------------->8--- > > which would suggest that connection reuse could cause networking errors? > > What also puzzles me it that the main guix publish server on berlin does > not seem to present this issue. That would indicate that this error is > caused by how the Cuirass remote-server publish server is started or > configured. > > Ludo, Chris, any idea? While I've been working in this area, I've actually been trying to pick apart the connection caching, since the single thread assumption doesn't hold in the Guix Build Coordinator. Anyway, I do have a theory. Assuming I'm correct in saying that there's no nginx between the client and publish server here, I think that's your configuration difference. For ci.guix.gnu.org, as well as data.guix.gnu.org, it's NGinx which is keeping connections alive. I'm not sure the Guile code for the publish server does similarly, so talking to it directly might be different. That's on the server side, the actual problem is probably on the client side, as I guess there are possibly places where closed connections aren't handled properly. This reminds me I sent some patches relating to closing connections, this could well be related [1]. 1: https://issues.guix.gnu.org/47174 --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmChY89fFIAAAAAALgAo aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh aW5lcy5uZXQACgkQXiijOwuE9XfG9w/+KvzCHwfnB+eeEd6i0xTvTfc8+oLxb+GS j1TSbXSGR02ByGMGIbJeAMmNPV20V9Lzi24a49451fskUZbCGeg6aS2rO3c9QpUX jbF7//NW8zwgmhO/Hh9dvB4v+q01+VwrzE3c6ssAVO6772ZrTJqGA7oCU9UqD1w6 dQYUto8iSxa95UhFcNFMtGcBAevdoN4Ku9OpkqIt8uViCED+YM6a9Ej4d9f9oC8h vxw6EU0YsCrobJQJ8KW56N9uR4oeI4IqWevou2LbSAmcHEWmGHAF6XDgxu0594Ip YOFuqso4nuRxU3ffzgEupMbJA07yrwf0EB4lreqsSfU8gtZupFMLgyc8PviyXjw3 XPpo3PAukKhsVZyQlnTc6ldatWWVHIeTPy8vxdr6GvSQnzliaQmzpHj7xVqkfS3f R1gz5hEXoS0o/CXpVkpnaYMC8m+Y2ujM5AbEUBVo6itR6mmUCCF5D9q/BtosjQvu 8wrLes42iAqwlnNeg8UGKra/cwJ0YGVrOInsOyXX97jz3UVjkxV7jq53+vkw3fGZ eyHDugAJ8DKw9js8BX88AmXaV/Art4M0qG6D6Zp+MvIUbU9z+oNeTjAXgGewxntw sfIdxHju+DNs5x5je3pTzyRSDPEfle8Xvd58DbGyAIoP8v7C7LYPmxVID7R0D/cs ccIQKCj/kqg= =yVQQ -----END PGP SIGNATURE----- --=-=-=--