From: Mathieu Othacehe
To: Ludovic Courtès
Cc: 34033@debbugs.gnu.org
Subject: bug#34033: Offloading sometimes hangs
Date: Thu, 02 Jul 2020 16:20:23 +0200
Message-ID: <87pn9ec82g.fsf@gnu.org>
In-Reply-To: <87fttuq2mz.fsf@gnu.org> ("Ludovic Courtès"'s message of "Mon, 14 Jan 2019 23:45:56 +0100")
References: <87o98obikk.fsf@gnu.org> <87fttuq2mz.fsf@gnu.org>
Hello,

> (That still doesn’t tell us why our ‘guix offload’ processes would
> occasionally be stuck but at least it ensures the build farm keeps
> making progress even when that happens.)

I'm still not sure it's directly related to this bug, but I observed
several offloading hangs on Berlin today. For instance, in the Cuirass
logs:

--8<---------------cut here---------------start------------->8---
2020-07-02T09:59:45 '/gnu/store/rm8ndiichxhwybaizis5pgck77952ilp-halt.drv' offloaded to '141.80.167.164'
2020-07-02T09:54:30 '/gnu/store/dxczkbf5wa6qr37gm7wr995hcxs8s0ya-motion-4.2.2.drv' offloaded to '141.80.167.170'
--8<---------------cut here---------------end--------------->8---

Those two builds were offloaded around 10:00 today, and there's still
no report from them at 16:00.

On 141.80.167.164 there's a matching build log:

--8<---------------cut here---------------start------------->8---
-rw-r--r-- 1 root root 1735 Jul 2 10:00 /var/log/guix/drvs/rm/8ndiichxhwybaizis5pgck77952ilp-halt.drv.bz2
--8<---------------cut here---------------end--------------->8---

Same on 141.80.167.170:

--8<---------------cut here---------------start------------->8---
-rw-r--r-- 1 root root 6344 Jul 2 09:56 /var/log/guix/drvs/dx/czkbf5wa6qr37gm7wr995hcxs8s0ya-motion-4.2.2.drv.bz2
--8<---------------cut here---------------end--------------->8---

Having those builds "unfinished" keeps the rest of the evaluation
hanging.
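The log locations in these two listings follow a visible pattern, which could help when checking other hung builds directly on the build machines. Below is a minimal Python sketch; `build_log_path` is a hypothetical helper (not part of Guix), and the layout it encodes (first two characters of the store file name as a subdirectory, the rest plus `.bz2` as the file name, under `/var/log/guix/drvs`) is an assumption inferred only from the two listings above.

```python
from pathlib import PurePosixPath

# Assumption: this root and layout are inferred from the two build log
# listings above; they may differ on machines with another log directory.
LOG_ROOT = PurePosixPath("/var/log/guix/drvs")


def build_log_path(drv: str, log_root: PurePosixPath = LOG_ROOT) -> PurePosixPath:
    """Map a store derivation path to its (assumed) compressed build log path."""
    base = PurePosixPath(drv).name  # e.g. "rm8ndi...-halt.drv"
    if not base.endswith(".drv"):
        raise ValueError(f"not a derivation: {drv}")
    # First two characters name the subdirectory; the remainder, with a
    # ".bz2" suffix appended, names the log file itself.
    return log_root / base[:2] / (base[2:] + ".bz2")
```

On a build machine, passing the result to `os.path.exists` or `os.stat` would then tell whether the daemon wrote a log for a given offloaded derivation, and when it last touched it.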
Running this SQL command in the Cuirass database:

--8<---------------cut here---------------start------------->8---
sqlite> select derivation, datetime(starttime, 'unixepoch', 'localtime'),stoptime from Builds where status=-1 and evaluation=14771;
/gnu/store/ncp59nyidli4lm3ff2lkfjym25yb18j5-guix-1.1.0-14.5bd8033.drv|2020-07-02 09:33:04|0
/gnu/store/rm8ndiichxhwybaizis5pgck77952ilp-halt.drv|2020-07-02 09:59:28|0
/gnu/store/71wnjgm2waqgw3fqmxmc4r3f1ifd1l92-cups-test.drv|2020-07-02 10:00:26|0
/gnu/store/9qsqd7jfwnaw9sm323y45cwymn98kyjl-exim-test.drv|2020-07-02 10:00:51|0
/gnu/store/vhcww4fw4qxw0hl1009npd26b22gfj3c-bitlbee-test.drv|2020-07-02 10:00:24|0
/gnu/store/92jrd6dfzgdifr107hwi64s8hf4mls47-iptables.drv|2020-07-02 09:59:49|0
/gnu/store/380nq6sjphd0agrvl43sr6ypli1yraz4-gnunet-0.12.2.drv|2020-07-02 09:51:32|0
/gnu/store/lqs22nbc6vy2z2524rmkcsmbh5mllm62-cuirass-0.0.1-37.882393d.drv|2020-07-02 10:34:37|0
/gnu/store/dxczkbf5wa6qr37gm7wr995hcxs8s0ya-motion-4.2.2.drv|2020-07-02 09:54:02|0
/gnu/store/5ln3r997ycr7rd6fqahd2d426mjw0rxb-gzochi-0.12.drv|2020-07-02 09:53:51|0
--8<---------------cut here---------------end--------------->8---

shows that the evaluation has been pretty much pending since 10:00.

According to the Cuirass logs again, all those builds were offloaded.
Three of them, "/gnu/store/380nq6sjphd0agrvl43sr6ypli1yraz4-gnunet-0.12.2.drv",
"/gnu/store/lqs22nbc6vy2z2524rmkcsmbh5mllm62-cuirass-0.0.1-37.882393d.drv"
and "/gnu/store/5ln3r997ycr7rd6fqahd2d426mjw0rxb-gzochi-0.12.drv", are
reported as failed, and all the others are still hanging.

Something is going wrong here! I'll keep investigating.

Thanks,

Mathieu