From: Maxim Cournoyer
To: Ludovic Courtès
Cc: 41625@debbugs.gnu.org
Subject: bug#41625: [PATCH v2] offload: Handle a possible EOF response from read-repl-response.
Date: Thu, 27 May 2021 07:49:22 -0400
Message-ID: <877djkl7lp.fsf@gmail.com>
In-Reply-To: <87fsy9x3ev.fsf@gnu.org>

Hi Ludovic,

Ludovic Courtès writes:

[...]

> I see. So I’d say it’s a prerequisite (a patch that must come before)
> but not entirely the same thing. I’m nitpicking!

Eh, it's okay :-). Splitting changes into the right unit is a problem
that is akin to naming things; it's hard! I welcome your suggestion.

> We should make sure it doesn’t trigger thread-safety issues in libssh or
> anything like that (running it repeatedly on a large machines.scm should
> give us some confidence).

It seems fine so far, but I've only tested in a loop with 4 build
machines. When it nears completion I'll give it a shot on berlin.

[...]

> Yes, but note that this is just for ‘guix offload test’. The actual
> code run while offloading will still fail badly.

Ah, thanks for pointing that out; I somehow thought that this machine
status checking code was a prelude to every offloaded build.

[...]

>> I don't have a password set for my user on overdrive1, so can't attach
>> strace to sshd, but yeah, we could try to capture it and see if we can
>> understand what's going on.
>
> OK.

I'd be happy to try strace when you are available. You can ping me on
the chat. It's been more than 8 hours since I tried, so I should be
able to trigger the problem :-).

[...]

> Perhaps worth adding an ‘inferior’ and/or ‘port’ field. That would
> allow the handler to present more information as to which inferior is
> failing.
>
> Maybe ‘premature-eof’ would be more accurate than ‘connection-lost’.

Good suggestions. I'll implement them.

>> +             (format (current-error-port)
>> +                     (G_ "connection to machine '~a' lost; retrying~%")
>> +                     (build-machine-name machine))
>
> You can use ‘info’ instead of ‘format’.

That also. Thanks!

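For reference, here is a minimal standalone sketch of how a condition
carrying a 'port' field and a bounded retry could fit together. It is
only an illustration under assumptions: the names &premature-eof,
premature-eof-port, read-response and call-with-eof-retry are made up
here and are not the ones from the patch, and it prints with plain
'format' because 'info' from (guix ui) is not available outside Guix.

```scheme
(use-modules (srfi srfi-34)    ;guard, raise
             (srfi srfi-35))   ;define-condition-type, condition

;; Hypothetical condition type; the 'port' field records which port
;; reached end-of-file before a complete response was read.
(define-condition-type &premature-eof &error
  premature-eof?
  (port premature-eof-port))

;; Hypothetical stand-in for reading one REPL response from PORT:
;; raise &premature-eof instead of returning the EOF object.
(define (read-response port)
  (let ((response (read port)))
    (if (eof-object? response)
        (raise (condition (&premature-eof (port port))))
        response)))

;; Call THUNK and return its value.  When it raises &premature-eof,
;; report it and try again, for at most MAX-ATTEMPTS calls in total;
;; once the attempts are exhausted the condition propagates.
(define (call-with-eof-retry thunk max-attempts)
  (let loop ((attempt 1))
    (guard (c ((and (premature-eof? c) (< attempt max-attempts))
               (format (current-error-port)
                       "premature EOF on ~a; retrying~%"
                       (premature-eof-port c))
               (loop (+ attempt 1))))
      (thunk))))
```

Bounding the number of attempts keeps a machine that is really down
from being retried forever, and the 'port' field lets the handler say
which connection failed.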
On another note, I was able to 'exercise' the fix: the exception is
raised, but instead of the operation being retried, something fails
with the following backtrace:

--8<---------------cut here---------------start------------->8---
guix offload: Testing 1 build machines defined in '/etc/guix/machines.scm'...
connection to machine 'overdrive1.guix.gnu.org' lost; retrying
Backtrace:
In ice-9/boot-9.scm:
  1752:10 10 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In unknown file:
           9 (apply-smob/0 #)
In ice-9/boot-9.scm:
    724:2  8 (call-with-prompt _ _ #)
In ice-9/eval.scm:
    619:8  7 (_ #(#(#)))
In guix/ui.scm:
  2161:12  6 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10  5 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
  1747:15  4 (with-exception-handler # _ # _ # …)
In srfi/srfi-1.scm:
    634:9  3 (for-each # (#< name: "overdriv…>))
In ice-9/eval.scm:
   191:35  2 (_ #(#(#(# 3 #< na…> …) …) …))
Exception thrown while printing backtrace:
In procedure frame-local-ref: Argument 2 out of range: 1

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Wrong type to apply: 2
--8<---------------cut here---------------end--------------->8---

I haven't been able to pinpoint what fails yet. Note that for the run
above I replaced par-for-each with plain for-each, suspecting it might
have something to do with it, but it appears unrelated.

Thanks,

Maxim