From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id /mz0NXYps2HVwAAAgWs5BA (envelope-from ) for ; Fri, 10 Dec 2021 11:18:30 +0100 Received: from aspmx1.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id 6GsWMXYps2FFawAA1q6Kng (envelope-from ) for ; Fri, 10 Dec 2021 10:18:30 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 5BAE413F83 for ; Fri, 10 Dec 2021 11:18:30 +0100 (CET) Received: from localhost ([::1]:57188 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mvcyy-0006Vv-T7 for larch@yhetil.org; Fri, 10 Dec 2021 05:18:28 -0500 Received: from eggs.gnu.org ([209.51.188.92]:51018) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvcyX-0006QA-Vm for bug-guix@gnu.org; Fri, 10 Dec 2021 05:18:02 -0500 Received: from debbugs.gnu.org ([209.51.188.43]:34035) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mvcyX-0001Wr-Mn for bug-guix@gnu.org; Fri, 10 Dec 2021 05:18:01 -0500 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1mvcyX-0008CK-J7 for bug-guix@gnu.org; Fri, 10 Dec 2021 05:18:01 -0500 X-Loop: help-debbugs@gnu.org Subject: bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out) Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Original-Sender: "Debbugs-submit" Resent-CC: bug-guix@gnu.org Resent-Date: Fri, 10 Dec 2021 10:18:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 52051 X-GNU-PR-Package: guix X-GNU-PR-Keywords: To: Maxim Cournoyer Received: via spool by 52051-submit@debbugs.gnu.org id=B52051.163913146531487 (code B ref 52051); Fri, 10 Dec 2021 10:18:01 +0000 Received: (at 52051) by debbugs.gnu.org; 10 Dec 2021 10:17:45 +0000 Received: from localhost ([127.0.0.1]:45581 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mvcyG-0008Bn-OU for submit@debbugs.gnu.org; Fri, 10 Dec 2021 05:17:45 -0500 Received: from eggs.gnu.org ([209.51.188.92]:50046) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mvcyB-0008BX-9M for 52051@debbugs.gnu.org; Fri, 10 Dec 2021 05:17:43 -0500 Received: from [2001:470:142:3::e] (port=50608 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvcy6-0001S2-0f; Fri, 10 Dec 2021 05:17:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=lpPLSkWVFOT/rO5pMBOmC/e3PpMZkzytW7o8XiZn4FQ=; b=GWuXf8OBW4+KTe/RUkNL nEVIJKlt+4fmTzBIV4RmjtMTTVFzj46ZZ5OJyolS6xrO0OgqrWc9LVT7UfHHngg6ANeCjwkmxWEwZ ctWp0Pd+aFmzhcHyEOB/Txmg23hFMQlw25df2yCGxsclXV+Hj35t3Uqx/ibQ4V0fG9ayhxFTijXBX 1cqor9t87rYnbxHqsnCDzyuf6CQXcaJ7qu8EflnXCcI8AGKp+CllP5qs+iPDbSKP+Fd0FkrmfsELA eb/EaaOmXlCdGmNaC55g/Tyt6nJq0Ks9W0OSN4UoAKA6io5X1IYW92s0G3nKhIvNGourQ8SDqWi2S DyC+zBaVyU7htQ==; Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201]:57333 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvcy5-00034d-UK; Fri, 10 Dec 2021 05:17:34 -0500 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <87pmqr1i8k.fsf@gmail.com> <87czmqorjb.fsf@jpoiret.xyz> <87tug0c1zi.fsf@gmail.com> <87wnkvp8cx.fsf@jpoiret.xyz> <87mtlcun1h.fsf@gmail.com> <87a6hahha0.fsf@gnu.org> <87bl1plv4q.fsf@gmail.com> <87v8zxctn5.fsf@gnu.org> <875yrxyrjr.fsf@gmail.com> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 20 Frimaire an 230 de la =?UTF-8?Q?R=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Fri, 10 Dec 2021 11:17:30 +0100 In-Reply-To: <875yrxyrjr.fsf@gmail.com> (Maxim Cournoyer's message of "Fri, 10 Dec 2021 00:09:28 -0500") Message-ID: <87k0gcbw79.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-guix@gnu.org List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 52051@debbugs.gnu.org Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org Sender: "bug-Guix" X-Migadu-Flow: FLOW_IN ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1639131510; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding:resent-cc: resent-from:resent-sender:resent-message-id:in-reply-to:in-reply-to: references:references:list-id:list-help:list-unsubscribe: list-subscribe:list-post:dkim-signature; bh=lpPLSkWVFOT/rO5pMBOmC/e3PpMZkzytW7o8XiZn4FQ=; b=ZxtJf374I7fmWHo62OWG4A/tTkglfdiYd4kqXHHjMJMzavwwFmNhWwvJmtPqGiWoqkoSN/ g3jYf7pLfwrza9U/9s08RTyu4c2FZxLk7rtLSrXoetjrY8lGuVlOAS51oEUrxkHt//B4Ax Vn/Ymq1tHzPfyEv55qIxGj3tLrpyqjVQZrS2XHXNBKXqfLpmave7IRvTseYwX0LmoKiXbG 2/BWGoqrfruqgyz6TELJsp1gRr+lMhJFZrh/qwliItnZ6+k5I0venonwz5Sr6b2ViJqMw9 CP85C04feEVjiSwc4EljaNf1SwpDc3WztDpunKoQBw0SBnmpVaKtXygFSKIfew== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1639131510; a=rsa-sha256; cv=none; b=ACwQ43tBNrzpuWKzD7gd+WEbHSZ5fnN0e5tTAYbz5YaVHnYoyy/HlulmzgCfo7nJcMiIXw 7sCRMszBzBTwvGZETm+ov65SvycSGf7Oz6LlYdrefAFYUg37FNTI/yWa41FAPQtZ5PAYOJ cMgS+ZLFblQ32UjbgtgTPzJOsPTKDOWqCVYSjMgWaCollBvhx6FGrPN9SY5f3tWul+V+kq 6mDlinD87GvBO8+Ruu+8z59l+Ju5lMjJCPbeKO4dADsiuRJzZ69QnN0AapTHuD+yde1fcC uIn5nkvAtsKDcCQV5MdDOHICokdaXRcNf6DfAUs7FlTT6o7a+wc8g7XoPmMBCg== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=gnu.org header.s=fencepost-gnu-org header.b=GWuXf8OB; dmarc=pass (policy=none) header.from=gnu.org; spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org" X-Migadu-Spam-Score: -3.66 Authentication-Results: aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=gnu.org header.s=fencepost-gnu-org header.b=GWuXf8OB; dmarc=pass (policy=none) header.from=gnu.org; spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org" X-Migadu-Queue-Id: 5BAE413F83 X-Spam-Score: -3.66 X-Migadu-Scanner: scn1.migadu.com X-TUID: 1JzuyKrBMzAS Hi Maxim! Maxim Cournoyer skribis: [...] > Oh! Marius found this interesting issue [0] that they shared on IRC > today; I wonder if it may be related. sd-bus apparently pipeline things > aggressively, which is not always handled well by other D-bus > participants. > > [0] https://github.com/systemd/systemd/issues/16610 Interesting. >> Anyway, the =E2=80=9CHello=E2=80=9D message is sent to the system bus as= ynchronously in >> =E2=80=98sd-bus.c=E2=80=99: [...] >> static int hello_callback(sd_bus_message *reply, void *userdata, sd_bus_= error *error) { >> >> [...] >> >> fail: >> /* When Hello() failed, let's propagate this in two ways: first = we return the error immediately here, >> * which is the propagated up towards the event loop. Let's also= invalidate the connection, so that >> * if the user then calls back into us again we won't wait any l= onger. */ >> >> bus_set_state(bus, BUS_CLOSING); >> return r; >> } >> >> >> It=E2=80=99s not clear from that whether the authors intended for the th= ing to >> keep going in case of failure. In our case it=E2=80=99s not helpful. > > If we picture this in the systemd use case, I believe sd-bus must be > used as *the* d-bus daemon, perhaps? So it should never die, and expect > users to call back into it to retry things? In our case, it acts as a > D-Bus client, not a server (IIUC), so perhaps its behavior is not tuned > right for our use case. I think systemd-logind is a separate process, just like elogind, that talks over D-Bus; I don=E2=80=99t think there=E2=80=99s any difference here= . It just seems that this corner case, where we don=E2=80=99t get a reply for =E2=80= =98Hello=E2=80=99, is not correctly handled. > Interesting that you mention this; that's what I worked on yesterday > (see attached patches). I managed to get elogind panicking the kernel > during a guix system reconfigure, which corrupted GRUB, so had to chroot > and reconfigure from there [1]. Not sure what happened, but it seems > that killing and restarting elogind is susceptible to cause hard locks. Ouch! This is weird. Note that =E2=80=9Ceverything=E2=80=9D depends on el= ogind, as can be seen with =E2=80=98guix system shepherd-graph=E2=80=99. So =E2=80=9Cher= d stop elogind=E2=80=9D is not something you should try at home. Incidentally, can you reproduce the bug in a VM? That would be much nicer. >> We know that elogind is started after dbus-daemon has written its PID >> file (there=E2=80=99s a Shepherd service dependency). Is there a possib= ility >> that dbus-daemon writes its PID file before it has set >> =E2=80=98new_connection_function=E2=80=99? > > Are PID files conventionally agreed to be synchronization primitives? Yes. Daemons are supposed to write their PID file once they=E2=80=99re rea= dy to do their job. > I'm sorry for my lack of deep analysis here, I've spent most of my > evening attempting to fix my system just to boot ^^'. I've at least > managed to collect the following verbose D-Bus log (54 MiB uncompressed) > which hopefully can shed some light onto how this failure came to be. Excerpt: --8<---------------cut here---------------start------------->8--- 366: 0x7f28f396e740: 1639108961.938358 [dbus-sysdeps-util-unix.c(237):_dbus= _write_pid_to_file_and_pipe] writing pid file /var/run/dbus/pid 366: 0x7f28f396e740: 1639108961.938438 [dbus-sysdeps-util-unix.c(291):_dbus= _write_pid_to_file_and_pipe] No pid pipe to write to 366: 0x7f28f396e740: 1639108961.938474 [dbus-userdb.c(177):_dbus_user_datab= ase_lookup] Using cache for UID 986 information 366: 0x7f28f396e740: 1639108961.938665 [dbus-sysdeps-unix.c(3514):_dbus_soc= ketpair] full-duplex pipe 6 <-> 7 366: 0x7f28f396e740: 1639108961.938700 [main.c(719):main] We are on D-Bus... 366: 0x7f28f396e740: 1639108961.938725 [dbus-mainloop.c(884):_dbus_loop_run= ] Running main loop, depth 0 -> 1 366: 0x7f28f396e740: 1639108962.566557 [dbus-server-socket.c(182):socket_ha= ndle_watch] Handling client connection, flags 0x1 366: 0x7f28f396e740: 1639108962.566623 [dbus-sysdeps-unix.c(2344):_dbus_acc= ept] client fd 8 accepted 366: 0x7f28f396e740: 1639108962.566665 [dbus-server-socket.c(94):handle_new= _client_fd_and_unlock] Creating new client connection with fd 8 366: 0x7f28f396e740: 1639108962.566779 [dbus-connection.c(1360):_dbus_conne= ction_new_for_transport] LOCK 366: 0x7f28f396e740: 1639108962.566824 [dbus-transport-socket.c(180):check_= read_watch] fd =3D 8 366: 0x7f28f396e740: 1639108962.566869 [dbus-transport-socket.c(227):check_= read_watch] setting read watch enabled =3D 1 [...] 366: 0x7f28f396e740: 1639108962.568765 [dbus-transport-socket.c(974):socket= _handle_watch] handling read watch 0x23c2b00 flags =3D 9 366: 0x7f28f396e740: 1639108962.568807 [dbus-transport-socket.c(348):exchan= ge_credentials] exchange_credentials: do_reading =3D 1, do_writing =3D 0 366: 0x7f28f396e740: 1639108962.568862 [dbus-transport-socket.c(383):exchan= ge_credentials] Failed to read credentials Failed to read credentials byte = (zero-length read) 366: 0x7f28f396e740: 1639108962.568904 [dbus-transport.c(510):_dbus_transpo= rt_disconnect] start 366: 0x7f28f396e740: 1639108962.568945 [dbus-transport-socket.c(1041):socke= t_disconnect]=20 366: 0x7f28f396e740: 1639108962.568986 [dbus-transport-socket.c(76):free_wa= tches] start 366: 0x7f28f396e740: 1639108962.569029 [dbus-watch.c(423):_dbus_watch_list_= remove_watch] Removing watch on fd 8 --8<---------------cut here---------------end--------------->8--- We see this sequence: 1. Writes PID file. 2. Accepts incoming connection. 3. =E2=80=9CFailed to read credentials=E2=80=9D. 4. Close connection. So we need to understand #3. Perhaps going back to strace logs could help this time=E2=80=A6 Thanks, Ludo=E2=80=99.