From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: bug#63982: Shepherd can crash when a user service fails to start
To: Maxim Cournoyer
Cc: 63982@debbugs.gnu.org
From: Ludovic =?UTF-8?Q?Court=C3=A8s?=
References: <87mt18blug.fsf@gmail.com> <87ilbsvlql.fsf@gnu.org>
 <87pm60wpr1.fsf@gmail.com> <87o7lirq9a.fsf@gnu.org>
 <87zg4wrzvi.fsf_-_@gmail.com> <878rcb41vb.fsf_-_@gnu.org>
Date: Wed, 12 Jul 2023 19:46:56 +0200
In-Reply-To: <878rcb41vb.fsf_-_@gnu.org> ("Ludovic =?UTF-8?Q?Court=C3=A8s?="'s message of "Thu, 22 Jun 2023 23:35:04
 +0200")
Message-ID: <87ttu910q7.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
List-Id: Bug reports for GNU Guix

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Hi!

Ludovic Courtès skribis:

> Turns out that this happens when calling the ‘daemonize’ action on
> ‘root’.  I have a reproducer now and am investigating…

Good news: this is fixed in Shepherd commit
f4272d2f0f393d2aa3e9d76b36ab6aa5f2fc72c2!

The root cause is inconsistent semantics when mixing epoll, signalfd,
and fork, specifically this part from signalfd(2):

   epoll(7) semantics
       If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
       epoll(7) instance, then epoll_wait(2) returns events only for signals
       sent to that process.  In particular, if the process then uses fork(2)
       to create a child process, then the child will be able to read(2)
       signals that are sent to it using the signalfd file descriptor, but
       epoll_wait(2) will not indicate that the signalfd file descriptor is
       ready.  In this scenario, a possible workaround is that after the
       fork(2), the child process can close the signalfd file descriptor that
       it inherited from the parent process and then create another signalfd
       file descriptor and add it to the epoll instance.  […]

The C program below illustrates this behavior:

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline; filename=signalfd+epoll.c
Content-Transfer-Encoding: 8bit
Content-Description: The C program.

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/epoll.h>

int
main ()
{
  int ep, sfd;
  sigset_t signals;

  /* Block SIGINT and SIGHUP so they are only delivered via the signalfd.  */
  sigemptyset (&signals);
  sigaddset (&signals, SIGINT);
  sigaddset (&signals, SIGHUP);
  sigprocmask (SIG_BLOCK, &signals, NULL);

  sfd = signalfd (-1, &signals, SFD_CLOEXEC);
  ep = epoll_create1 (EPOLL_CLOEXEC);

  /* Register the signalfd in the parent, before forking.  */
  struct epoll_event events =
    { .events = EPOLLIN | EPOLLONESHOT, .data = NULL };
  epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &events);
  epoll_wait (ep, &events, 1, 123);

  if (fork () == 0)
    {
      /* Quoth signalfd(2):

         If a process adds (via epoll_ctl(2)) a signalfd file descriptor
         to an epoll(7) instance, then epoll_wait(2) returns events only
         for signals sent to that process.  In particular, if the process
         then uses fork(2) to create a child process, then the child will
         be able to read(2) signals that are sent to it using the
         signalfd file descriptor, but epoll_wait(2) will not indicate
         that the signalfd file descriptor is ready.  */
      printf ("try this: kill -INT %i\n", getpid ());
      while (1)
        {
          struct signalfd_siginfo info;

          /* In the child, this keeps timing out even after a signal
             has been sent to it.  */
          if (epoll_wait (ep, &events, 1, 777) > 0)
            {
              read (sfd, &info, sizeof info);
              printf ("got signal %i!\n", info.ssi_signo);
              /* Re-arm the one-shot registration.  */
              epoll_ctl (ep, EPOLL_CTL_MOD, sfd, &events);
            }
        }
    }

  /* The parent exits right away; the child keeps running.  */
  return 0;
}

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Of course it took me a while to find out about this; I first looked at
things individually and didn’t expect the mixture to behave
inconsistently.

Maxim, let me know if it works for you!

Thanks,
Ludo’.

--=-=-=--
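
For reference, the workaround described in the signalfd(2) excerpt quoted in the message (close the inherited signalfd in the child, create a new one, add it to the epoll instance) could look roughly like the sketch below. This is only an illustration under stated assumptions, not necessarily what the Shepherd commit does: the helper names reopen_signalfd and child_sees_signal are hypothetical, and SIGUSR1 and the 500 ms timeout are arbitrary choices made for the demonstration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not taken from Shepherd): in a freshly forked
   child, close the inherited signalfd OLD_SFD, create a new one for
   SIGNALS, and register it in epoll instance EP.  Return the new file
   descriptor, or -1 on error.  */
int
reopen_signalfd (int ep, int old_sfd, const sigset_t *signals)
{
  struct epoll_event event = { .events = EPOLLIN };
  int sfd;

  close (old_sfd);
  sfd = signalfd (-1, signals, SFD_CLOEXEC);
  if (sfd < 0)
    return -1;

  event.data.fd = sfd;
  if (epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &event) < 0)
    {
      close (sfd);
      return -1;
    }
  return sfd;
}

/* Hypothetical demo: register a signalfd with epoll, fork, and let the
   child send itself SIGUSR1 and wait for it through epoll.  When REOPEN
   is nonzero the child first applies the workaround.  Return 1 if the
   child's epoll_wait reported the signal, 0 if it timed out, and -1 on
   error.  */
int
child_sees_signal (int reopen)
{
  struct epoll_event event = { .events = EPOLLIN };
  sigset_t signals, old_mask;
  int ep, sfd, status = 0, result = -1;
  pid_t pid = -1;

  sigemptyset (&signals);
  sigaddset (&signals, SIGUSR1);
  sigprocmask (SIG_BLOCK, &signals, &old_mask);

  sfd = signalfd (-1, &signals, SFD_CLOEXEC);
  ep = epoll_create1 (EPOLL_CLOEXEC);
  event.data.fd = sfd;
  if (sfd >= 0 && ep >= 0
      && epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &event) == 0)
    pid = fork ();

  if (pid == 0)
    {
      struct signalfd_siginfo info;

      if (reopen)
        sfd = reopen_signalfd (ep, sfd, &signals);
      if (sfd < 0)
        _exit (2);

      /* SIGUSR1 is blocked, so it stays pending until read(2).  */
      kill (getpid (), SIGUSR1);
      if (epoll_wait (ep, &event, 1, 500) <= 0)
        _exit (1);              /* epoll did not report the signal */
      if (read (sfd, &info, sizeof info) != (ssize_t) sizeof info
          || info.ssi_signo != (uint32_t) SIGUSR1)
        _exit (2);
      _exit (0);
    }

  if (pid > 0 && waitpid (pid, &status, 0) == pid && WIFEXITED (status))
    {
      if (WEXITSTATUS (status) == 0)
        result = 1;
      else if (WEXITSTATUS (status) == 1)
        result = 0;
    }

  /* Clean up in the parent and restore its signal mask.  */
  if (sfd >= 0)
    close (sfd);
  if (ep >= 0)
    close (ep);
  sigprocmask (SIG_SETMASK, &old_mask, NULL);
  return result;
}
```

Calling child_sees_signal (1) exercises the workaround. Calling it with 0 skips the workaround, in which case the man page excerpt suggests the child's epoll_wait should time out instead of reporting the signal.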