From: Maxim Cournoyer
To: Ludovic Courtès
Cc: 41625@debbugs.gnu.org
Subject: bug#41625: Sporadic guix-offload crashes due to EOF errors
Date: Fri, 24 Sep 2021 00:53:22 -0400
Message-ID: <87czoyk20t.fsf@gmail.com>
In-Reply-To: <878s2lw330.fsf_-_@gnu.org> (Ludovic Courtès's message of "Mon, 05 Jul 2021 10:57:07 +0200")
List-Id: Bug reports for GNU Guix

Hello!

Ludovic Courtès writes:

> Hi,
>
> Maxim Cournoyer skribis:
>
>> Now that I have root access to overdrive1, I could strace the sshd
>> process (I just did 'strace -p340', noting the process of sshd displayed
>> with 'herd status sshd'):
>>
>> pselect6(87, [3 4], NULL, NULL, NULL, NULL) = 1 (in [3])
>> accept(3, {sa_family=AF_INET, sin_port=htons(33262), sin_addr=inet_addr("66.158.152.121")}, [128->16]) = 5
>> fcntl(5, F_GETFL) = 0x2 (flags O_RDWR)
>> pipe2([6, 7], 0) = 0
>> socketpair(AF_UNIX, SOCK_STREAM, 0, [8, 9]) = 0
>> clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xffff8e0ef0e0) = 644
>> close(7) = 0
>> close(9) = 0
>> write(8, "\0\0\1\245\0", 5) = 5
>> write(8, "\0\0\1\234\nPort 22\nPermitRootLogin no\n"..., 420) = 420
>> close(8) = 0
>> close(5) = 0
>> getpid() = 340
>> getpid() = 340
>> getpid() = 340
>> getpid() = 340
>> getpid() = 340
>> getpid() = 340
>> getpid() = 340
>> pselect6(87, [3 4 6], NULL, NULL, NULL, NULL) = 1 (in [6])
>> read(6, "\0", 1) = 1
>> pselect6(87, [3 4 6], NULL, NULL, NULL, NULL) = 1 (in [6])
>> read(6, "", 1) = 0
>
> OK, so it looks as if the client disconnected right away.  Hard to tell
> exactly what that happened.  :-/  Perhaps turning libssh debugging on on
> the client side could help (by uncommenting “#:log-verbosity 'protocol”
> in (guix ssh)).

I was able to better understand the problem after encountering it on
another low-power ARM board.  It comes down to the guile-ssh/libssh
timeout causing a channel read to return EOF.  I have one example here
where it hangs at the '(inferior-eval '(use-modules (gnu)) result)'
step; Guix runs for about 1m30s, apparently loading all the package
modules.  Perhaps my GUILE_LOAD_COMPILED_PATH is not set correctly and
things are slower than expected.  Not sure.

But what happens is that no output arrives before the 15 s timeout that
we set for the SSH session elapses, and libssh's ssh_channel_read
returns 0, which is the same value it returns when it encounters EOF.
Guile's peek_byte_or_eof turns that zero into an EOF.  I've shared my
analysis on the guile-ssh bug tracker [0].

So information is lost at libssh's level, which is not so nice.  Knowing
exactly how that EOF comes into the picture, we can handle it and
produce a better diagnostic, though.  I'll try reworking my original
patch in that direction.

Thanks,

Maxim

[0] https://github.com/artyom-poptsov/guile-ssh/issues/29
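
P.S. To make the ambiguity described above concrete, here is a minimal
sketch in Python.  It is only a simulation, not guile-ssh or libssh
code: FakeChannel, ReadResult and classify_read are hypothetical names
made up for illustration.  The idea is that a zero-length read cannot
tell a timeout from EOF by itself, so the caller has to ask the channel
separately whether the remote end really closed (libssh has
ssh_channel_is_eof, which might serve that purpose, but whether it fits
here is an assumption on my part).

```python
import enum


class ReadResult(enum.Enum):
    DATA = "data"
    TIMEOUT = "timeout"
    EOF = "eof"


class FakeChannel:
    """Hypothetical stand-in for an SSH channel (not a real API).

    `events` is a list of byte strings (data that arrives in time) or
    None (the read timed out with no data).
    """

    def __init__(self, events, remote_closed):
        self._events = list(events)
        self._remote_closed = remote_closed
        self._eof = False

    def read(self):
        # Mimics the behaviour described in the message: an empty
        # result both when the timeout elapses and when EOF is reached.
        if self._events:
            event = self._events.pop(0)
            return b"" if event is None else event
        if self._remote_closed:
            self._eof = True
        return b""

    def is_eof(self):
        # Separate query that tells whether EOF was really seen.
        return self._eof


def classify_read(channel):
    """Read once and report whether we got data, a timeout, or EOF."""
    data = channel.read()
    if data:
        return ReadResult.DATA, data
    if channel.is_eof():
        return ReadResult.EOF, b""
    return ReadResult.TIMEOUT, b""


if __name__ == "__main__":
    channel = FakeChannel([b"hello", None], remote_closed=True)
    for _ in range(3):
        kind, payload = classify_read(channel)
        print(kind.name, payload)
```

With a distinction like this, a slow remote could be reported as "timed
out waiting for output" instead of surfacing as an unexplained EOF.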