From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ludovic Courtès
Subject: bug#35181: Hydra offloads often get stuck while exporting build requisites
Date: Tue, 09 Apr 2019 12:54:20 +0200
Message-ID: <87ftqrh2jn.fsf@gnu.org>
References: <87mul17oo2.fsf@netris.org> <87imvp7ogv.fsf@netris.org> <20190407173105.GB1337@macbook41> <87ef6d6mdn.fsf@netris.org> <87pnpw29kp.fsf@gnu.org> <87o95g5lpd.fsf@netris.org>
In-Reply-To: <87o95g5lpd.fsf@netris.org> (Mark H. Weaver's message of "Mon, 08 Apr 2019 15:40:51 -0400")
To: Mark H Weaver
Cc: 35181@debbugs.gnu.org

Hi Mark,

Mark H Weaver writes:

> Ludovic Courtès writes:
>
>> Mark H Weaver writes:
>>
>>> The source checkout currently being transferred for build 3432472
>>> (/gnu/store/…-font-google-material-design-icons-3.0.1-checkout) is 176
>>> megabytes uncompressed, as measured by "du -s --si", which is not
>>> precisely same as NAR size, but hopefully close enough for a rough
>>> estimate. As I write this, build 3432472 been stuck here for 24 hours
>>> 15 minutes. Even if the average transfer rate were 4 kilobytes per
>>> second, it should have been done in half that time.
>>
>> This is weird, could it be that data transfers get stuck somehow?
>
> As far as I can tell, that's what seems to happen.
>
>> Did you try to check the status of the ‘nix-store’ and ‘guix offload’
>> processes on the head node?
>
> Here are the corresponding 'guix offload' processes:
>
> hydra@20121227-hydra:~$ ps auxwwf | head -1; ps auxwwf | egrep -B1 'off()load'

[...]

> root 14769 0.0 0.2 145668 10912 ? SLsl Apr07 0:16 | | \_ /gnu/store/yihvhxv3xyyvl1m2cy1lnf1lyi9h76fk-guile-2.2.2/bin/guile --no-auto-compile /gnu/store/fkkjhida23k612naa9d4q6avqj5v3b28-guix-0.13.0-8.357ab93/bin/.guix-real offload x86_64-linux 3600 1 72000

The problem is that this is an ancient Guix. In the meantime,
offloading has seen relevant changes, in particular changes such as
commit ed7b44370f71126087eb953f36aad8dc4c44109f, which address stability
issues with the Guile-SSH (ssh dist node) module that was previously
used.

I think we should upgrade Guix on hydra.gnu.org; otherwise we’re likely
to end up chasing old bugs.

> The 'nix-store' processes seem to be stuck sleeping in 'read', if I'm
> interpreting the 'strace' output correctly:
>
> root@20121227-hydra:~# strace -p 8983
> Process 8983 attached - interrupt to quit
> read(3, ^C
> Process 8983 detached
> root@20121227-hydra:~# strace -p 14767
> Process 14767 attached - interrupt to quit
> read(3, ^C
> Process 14767 detached
>
>
> "netstat --inet --program" shows that the SSH connections are still
> open:
>
> root@20121227-hydra:~# netstat --inet --program | grep 'hydra\.net\.in\.tum\.'
> tcp 0 0 20121227-hydra.gn:53216 hydra.net.in.tum.de:ssh ESTABLISHED 14769/guile
> tcp 0 0 20121227-hydra.gn:52434 hydra.net.in.tum.de:ssh ESTABLISHED 8985/guile
> tcp 0 0 20121227-hydra.gnu.:www hydra.net.in.tum.:52104 TIME_WAIT -
> tcp 0 0 20121227-hydra.gnu.:www hydra.net.in.tum.:52103 TIME_WAIT -

This could be the kind of issue that we had with (ssh dist node). It’s
hard to tell.

> I could easily believe that this problem is specific to
> hydra.gnunet.org, but even if that's the case, it would be good if
> offloading would reliably time out before days have passed.

That’s the case with commit a708de151c255712071e42e5c8284756b51768cd,
but again, the Guix installation on hydra may predate that. :-/

Thanks,
Ludo’.