From: Mathieu Othacehe
To: 52182@debbugs.gnu.org
Subject: bug#52182: [cuirass] remote-worker process freeze
Date: Mon, 29 Nov 2021 16:19:48 +0100
Message-ID: <87ilwbj8ff.fsf@gnu.org>

Hello,

On the newly installed honeycomb machines, some Cuirass remote-worker
processes freeze completely and stop communicating with the
remote-server. This has been observed before, but for some reason it is
more reproducible on those machines.

Here is the information I could collect from one such frozen process
using GDB:

--8<---------------cut here---------------start------------->8---
(gdb) attach 5660 ;frozen cuirass-remote-worker PID
(gdb) info thr
  Id   Target Id                                       Frame
* 1    Thread 0xffffafd32e20 (LWP 5660) "yHg3r3fS"     0x0000ffffafb3fa80 in do_futex_wait.constprop () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libpthread.so.0
  2    Thread 0xffffa6c1c1d0 (LWP 5666) "ZMQbg/Reaper" 0x0000ffffaf7ec294 in epoll_pwait () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
  3    Thread 0xffffaf0071d0 (LWP 5667) "ZMQbg/IO/0"   0x0000ffffaf7ec294 in epoll_pwait () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
  4    Thread 0xffffa641b1d0 (LWP 5674) "yHg3r3fS"     0x0000ffffaf7b9d04 in clock_nanosleep@@GLIBC_2.17 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
(gdb) bt
#0  0x0000ffffafb3fa80 in do_futex_wait.constprop () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libpthread.so.0
#1  0x0000ffffafb3fb78 in __new_sem_wait_slow.constprop.0 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libpthread.so.0
#2  0x0000ffffafb80318 in GC_stop_world () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#3  0x0000ffffafb6c020 in GC_stopped_mark () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#4  0x0000ffffafb6c8dc in GC_try_to_collect_inner () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#5  0x0000ffffafb6d598 in GC_collect_or_expand () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#6  0x0000ffffafb73b4c in GC_alloc_large () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#7  0x0000ffffafb74038 in GC_generic_malloc () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#8  0x0000ffffafb74298 in GC_malloc_kind_global () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#9  0x0000ffffafc11fa8 in scm_make_bytevector () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
#10 0x0000ffffacacc418 in ?? ()
#11 0x0000ffffacc2ef2c in ?? ()
(gdb) thr 4
[Switching to thread 4 (Thread 0xffffa641b1d0 (LWP 5674))]
#0  0x0000ffffaf7b9d04 in clock_nanosleep@@GLIBC_2.17 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
(gdb) bt
#0  0x0000ffffaf7b9d04 in clock_nanosleep@@GLIBC_2.17 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
#1  0x0000ffffaf7bf55c in nanosleep () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
#2  0x0000ffffafb7e844 in GC_lock () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#3  0x0000ffffafb7ecdc in GC_do_blocking_inner () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#4  0x0000ffffafb73998 in GC_with_callee_saves_pushed () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#5  0x0000ffffafb79654 in GC_do_blocking () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
#6  0x0000ffffafc96d94 in scm_without_guile () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
#7  0x0000ffffafc97050 in scm_std_select () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
#8  0x0000ffffafc97b5c in scm_std_sleep () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
#9  0x0000ffffafc75918 in scm_sleep () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
#10 0x0000ffffa6c50d94 in ?? ()
#11 0x0000ffffacc2ee0c in ?? ()
--8<---------------cut here---------------end--------------->8---

Threads 2 and 3 are managed internally by ZMQ. Threads 1 and 4 are,
respectively, the thread pinging the remote-server and the thread
actually building things. It looks like both are stuck in the GC:
thread 1 is waiting on a semaphore in GC_stop_world, reached from an
allocation in scm_make_bytevector, while thread 4 is sleeping in
GC_lock, reached through GC_do_blocking from scm_sleep.

Thanks,

Mathieu
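
To collect the same data from other frozen workers without an interactive session, something along these lines might help. It is only a sketch: the helper names `gdb_batch_command` and `gc_frames_by_thread` are made up for illustration, the parser assumes the usual `thread apply all bt` layout ("Thread N (...)" headers followed by "#n" frames), and `SAMPLE` is an abridged copy of the trace above with the store paths shortened to the library name.

```python
import re

# Header printed by "thread apply all bt", e.g. "Thread 4 (Thread 0x... (LWP 5674)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Frame line, e.g. "#2  0x0000ffffafb7e844 in GC_lock () from ..."
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def gdb_batch_command(pid):
    """Build a gdb invocation that dumps every thread's backtrace and exits."""
    return ["gdb", "-p", str(pid), "-batch",
            "-ex", "info threads",
            "-ex", "thread apply all bt"]


def gc_frames_by_thread(gdb_output):
    """Map each thread number to the libgc (GC_*) functions on its stack."""
    frames = {}
    current = None
    for raw in gdb_output.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(1).startswith("GC_"):
            frames.setdefault(current, []).append(frame.group(1))
    return frames


# Abridged from the report; paths shortened for readability.
SAMPLE = """\
Thread 4 (Thread 0xffffa641b1d0 (LWP 5674)):
#0  0x0000ffffaf7b9d04 in clock_nanosleep@@GLIBC_2.17 () from libc.so.6
#1  0x0000ffffaf7bf55c in nanosleep () from libc.so.6
#2  0x0000ffffafb7e844 in GC_lock () from libgc.so.1
#3  0x0000ffffafb7ecdc in GC_do_blocking_inner () from libgc.so.1

Thread 1 (Thread 0xffffafd32e20 (LWP 5660)):
#0  0x0000ffffafb3fa80 in do_futex_wait.constprop () from libpthread.so.0
#1  0x0000ffffafb3fb78 in __new_sem_wait_slow.constprop.0 () from libpthread.so.0
#2  0x0000ffffafb80318 in GC_stop_world () from libgc.so.1
"""

print(" ".join(gdb_batch_command(5660)))
print(gc_frames_by_thread(SAMPLE))
```

In practice the output of the gdb command (run through `subprocess.run(..., capture_output=True, text=True)`, for instance) would be passed to `gc_frames_by_thread` in place of `SAMPLE`; any thread that shows up in the result has libgc frames on its stack at the moment of the dump.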