From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maxim Cournoyer
To: Mathieu Othacehe
Cc: 52182@debbugs.gnu.org
Subject: bug#52182: [cuirass] remote-worker process freeze
References: <87ilwbj8ff.fsf@gnu.org>
Date: Mon, 07 Feb 2022 14:39:04 -0500
In-Reply-To: <87ilwbj8ff.fsf@gnu.org> (Mathieu Othacehe's message of "Mon, 29 Nov 2021 16:19:48 +0100")
Message-ID: <874k5a1n6v.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: Bug reports for GNU Guix

Hello Mathieu,

Mathieu Othacehe writes:

> Hello,
>
> On the newly installed honeycomb machines, some Cuirass remote-worker
> process freeze completely and stop communicating with the
> remote-server.
>
> This has already been observed, but is for some reason more repeatable
> on those machines.
>
> Here are the information I could collect on such a frozen process using
> GDB:
>
> (gdb) attach 5660 ;frozen cuirass-remote-worker PID
> (gdb) info thr
>   Id   Target Id                                       Frame
> * 1    Thread 0xffffafd32e20 (LWP 5660) "yHg3r3fS"     0x0000ffffafb3fa80 in do_futex_wait.constprop () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libpthread.so.0
>   2    Thread 0xffffa6c1c1d0 (LWP 5666) "ZMQbg/Reaper" 0x0000ffffaf7ec294 in epoll_pwait () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
>   3    Thread 0xffffaf0071d0 (LWP 5667) "ZMQbg/IO/0"   0x0000ffffaf7ec294 in epoll_pwait () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
>   4    Thread 0xffffa641b1d0 (LWP 5674) "yHg3r3fS"     0x0000ffffaf7b9d04 in clock_nanosleep@@GLIBC_2.17 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
> (gdb) bt
> #0  0x0000ffffafb3fa80 in do_futex_wait.constprop () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libpthread.so.0
> #1  0x0000ffffafb3fb78 in __new_sem_wait_slow.constprop.0 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libpthread.so.0
> #2  0x0000ffffafb80318 in GC_stop_world () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #3  0x0000ffffafb6c020 in GC_stopped_mark () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #4  0x0000ffffafb6c8dc in GC_try_to_collect_inner () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #5  0x0000ffffafb6d598 in GC_collect_or_expand () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #6  0x0000ffffafb73b4c in GC_alloc_large () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #7  0x0000ffffafb74038 in GC_generic_malloc () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #8  0x0000ffffafb74298 in GC_malloc_kind_global () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #9  0x0000ffffafc11fa8 in scm_make_bytevector () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
> #10 0x0000ffffacacc418 in ?? ()
> #11 0x0000ffffacc2ef2c in ?? ()
> (gdb) thr 4
> [Switching to thread 4 (Thread 0xffffa641b1d0 (LWP 5674))]
> #0  0x0000ffffaf7b9d04 in clock_nanosleep@@GLIBC_2.17 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
> (gdb) bt
> #0  0x0000ffffaf7b9d04 in clock_nanosleep@@GLIBC_2.17 () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
> #1  0x0000ffffaf7bf55c in nanosleep () from /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/libc.so.6
> #2  0x0000ffffafb7e844 in GC_lock () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #3  0x0000ffffafb7ecdc in GC_do_blocking_inner () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #4  0x0000ffffafb73998 in GC_with_callee_saves_pushed () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #5  0x0000ffffafb79654 in GC_do_blocking () from /gnu/store/jsda4njqwjp4kb60fwa7n4mlfi1aanpq-libgc-7.6.12/lib/libgc.so.1
> #6  0x0000ffffafc96d94 in scm_without_guile () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
> #7  0x0000ffffafc97050 in scm_std_select () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
> #8  0x0000ffffafc97b5c in scm_std_sleep () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
> #9  0x0000ffffafc75918 in scm_sleep () from /gnu/store/7g3nbnf2kf31jk696k0nyz9ck55b11a0-guile-3.0.7/lib/libguile-3.0.so.1
> #10 0x0000ffffa6c50d94 in ?? ()
> #11 0x0000ffffacc2ee0c in ?? ()
>
> So the threads 2 and 3 are managed internally by ZMQ. The threads 1 and
> 4 are respectively the thread pinging the remote-server and the thread
> actually building stuff.

After asking about this issue on #guix, cbaines pointed to this
relevant thread:
https://lists.gnu.org/archive/html/bug-guile/2021-12/msg00011.html.

Ludovic mentioned that forking processes in threads is known to be
undefined behavior in POSIX, but that doesn't match what the worker code
currently does, which is to fork *then* spawn threads.

Christopher suggested that one thing to try is calling execlp in the new
process created by primitive-fork; this seems to work reliably for them
in the guix-data-service.

Thanks,

Maxim
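
P.S. For reference, here is a minimal sketch of the fork-then-exec
pattern Christopher describes. It is written in Python rather than Guile
only so that it runs standalone; run_in_fresh_process is a made-up name,
not something from Cuirass or the guix-data-service, and I have not
checked whether this approach avoids the freeze in the worker.

```python
import os


def run_in_fresh_process(argv):
    """Fork, exec argv in the child right away, and return its exit code.

    Hypothetical helper for illustration only.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image immediately, so that none of
        # the state inherited from the parent (locks, GC bookkeeping) is
        # relied upon after the fork.
        try:
            os.execlp(argv[0], *argv)
        finally:
            # Only reached when exec itself failed; exit without running
            # any of the parent's cleanup handlers.
            os._exit(127)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)
```

Calling run_in_fresh_process(["guix", "--version"]), for instance, would
run that command in a child whose memory image is replaced right after
the fork, and return its exit code to the caller.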