all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: 52051@debbugs.gnu.org
Subject: bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out)
Date: Fri, 10 Dec 2021 11:17:30 +0100	[thread overview]
Message-ID: <87k0gcbw79.fsf@gnu.org> (raw)
In-Reply-To: <875yrxyrjr.fsf@gmail.com> (Maxim Cournoyer's message of "Fri, 10 Dec 2021 00:09:28 -0500")

Hi Maxim!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

[...]

> Oh!  Marius found this interesting issue [0] that they shared on IRC
> today; I wonder if it may be related.  sd-bus apparently pipeline things
> aggressively, which is not always handled well by other D-bus
> participants.
>
> [0]  https://github.com/systemd/systemd/issues/16610

Interesting.

>> Anyway, the “Hello” message is sent to the system bus asynchronously in
>> ‘sd-bus.c’:

[...]

>> static int hello_callback(sd_bus_message *reply, void *userdata, sd_bus_error *error) {
>>
>> [...]
>>
>> fail:
>>         /* When Hello() failed, let's propagate this in two ways: first we return the error immediately here,
>>          * which is the propagated up towards the event loop. Let's also invalidate the connection, so that
>>          * if the user then calls back into us again we won't wait any longer. */
>>
>>         bus_set_state(bus, BUS_CLOSING);
>>         return r;
>> }
>>
>>
>> It’s not clear from that whether the authors intended for the thing to
>> keep going in case of failure.  In our case it’s not helpful.
>
> If we picture this in the systemd use case, I believe sd-bus must be
> used as *the* d-bus daemon, perhaps?  So it should never die, and expect
> users to call back into it to retry things?  In our case, it acts as a
> D-Bus client, not a server (IIUC), so perhaps its behavior is not tuned
> right for our use case.

I think systemd-logind is a separate process, just like elogind, that
talks over D-Bus; I don’t think there’s any difference here.  It just
seems that this corner case, where we don’t get a reply for ‘Hello’, is
not correctly handled.

> Interesting that you mention this; that's what I worked on yesterday
> (see attached patches).  I managed to get elogind panicking the kernel
> during a guix system reconfigure, which corrupted GRUB, so had to chroot
> and reconfigure from there [1].  Not sure what happened, but it seems
> that killing and restarting elogind is susceptible to cause hard locks.

Ouch!  This is weird.  Note that “everything” depends on elogind, as can
be seen with ‘guix system shepherd-graph’.  So “herd stop elogind” is
not something you should try at home.

Incidentally, can you reproduce the bug in a VM?  That would be much
nicer.

>> We know that elogind is started after dbus-daemon has written its PID
>> file (there’s a Shepherd service dependency).  Is there a possibility
>> that dbus-daemon writes its PID file before it has set
>> ‘new_connection_function’?
>
> Are PID files conventionally agreed to be synchronization primitives?

Yes.  Daemons are supposed to write their PID file once they’re ready to
do their job.

> I'm sorry for my lack of deep analysis here, I've spent most of my
> evening attempting to fix my system just to boot ^^'.  I've at least
> managed to collect the following verbose D-Bus log (54 MiB uncompressed)
> which hopefully can shed some light onto how this failure came to be.

Excerpt:

--8<---------------cut here---------------start------------->8---
366: 0x7f28f396e740: 1639108961.938358 [dbus-sysdeps-util-unix.c(237):_dbus_write_pid_to_file_and_pipe] writing pid file /var/run/dbus/pid
366: 0x7f28f396e740: 1639108961.938438 [dbus-sysdeps-util-unix.c(291):_dbus_write_pid_to_file_and_pipe] No pid pipe to write to
366: 0x7f28f396e740: 1639108961.938474 [dbus-userdb.c(177):_dbus_user_database_lookup] Using cache for UID 986 information
366: 0x7f28f396e740: 1639108961.938665 [dbus-sysdeps-unix.c(3514):_dbus_socketpair] full-duplex pipe 6 <-> 7
366: 0x7f28f396e740: 1639108961.938700 [main.c(719):main] We are on D-Bus...
366: 0x7f28f396e740: 1639108961.938725 [dbus-mainloop.c(884):_dbus_loop_run] Running main loop, depth 0 -> 1
366: 0x7f28f396e740: 1639108962.566557 [dbus-server-socket.c(182):socket_handle_watch] Handling client connection, flags 0x1
366: 0x7f28f396e740: 1639108962.566623 [dbus-sysdeps-unix.c(2344):_dbus_accept] client fd 8 accepted
366: 0x7f28f396e740: 1639108962.566665 [dbus-server-socket.c(94):handle_new_client_fd_and_unlock] Creating new client connection with fd 8
366: 0x7f28f396e740: 1639108962.566779 [dbus-connection.c(1360):_dbus_connection_new_for_transport] LOCK
366: 0x7f28f396e740: 1639108962.566824 [dbus-transport-socket.c(180):check_read_watch] fd = 8
366: 0x7f28f396e740: 1639108962.566869 [dbus-transport-socket.c(227):check_read_watch]   setting read watch enabled = 1

[...]

366: 0x7f28f396e740: 1639108962.568765 [dbus-transport-socket.c(974):socket_handle_watch] handling read watch 0x23c2b00 flags = 9
366: 0x7f28f396e740: 1639108962.568807 [dbus-transport-socket.c(348):exchange_credentials] exchange_credentials: do_reading = 1, do_writing = 0
366: 0x7f28f396e740: 1639108962.568862 [dbus-transport-socket.c(383):exchange_credentials] Failed to read credentials Failed to read credentials byte (zero-length read)
366: 0x7f28f396e740: 1639108962.568904 [dbus-transport.c(510):_dbus_transport_disconnect] start
366: 0x7f28f396e740: 1639108962.568945 [dbus-transport-socket.c(1041):socket_disconnect] 
366: 0x7f28f396e740: 1639108962.568986 [dbus-transport-socket.c(76):free_watches] start
366: 0x7f28f396e740: 1639108962.569029 [dbus-watch.c(423):_dbus_watch_list_remove_watch] Removing watch on fd 8
--8<---------------cut here---------------end--------------->8---

We see this sequence:

  1. Writes PID file.

  2. Accepts incoming connection.

  3. “Failed to read credentials”.

  4. Close connection.

So we need to understand #3.  Perhaps going back to strace logs could
help this time…

Thanks,
Ludo’.




  reply	other threads:[~2021-12-10 10:18 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23  6:47 bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out) Maxim Cournoyer
2021-11-24  9:02 ` Josselin Poiret via Bug reports for GNU Guix
2021-11-25  3:03   ` Maxim Cournoyer
2021-11-25 22:13     ` Ludovic Courtès
2021-11-25  4:07   ` Maxim Cournoyer
2021-11-25 22:07     ` Maxim Cournoyer
2021-11-26  9:35     ` Josselin Poiret via Bug reports for GNU Guix
2021-12-07 21:23       ` Maxim Cournoyer
2021-12-07 22:15         ` Maxim Cournoyer
2021-12-08 22:18         ` Ludovic Courtès
2021-12-09 14:18           ` Maxim Cournoyer
2021-12-09 22:15             ` Ludovic Courtès
2021-12-10  5:09               ` Maxim Cournoyer
2021-12-10 10:17                 ` Ludovic Courtès [this message]
2021-12-16 18:24                   ` Josselin Poiret via Bug reports for GNU Guix
2021-12-21  9:31                     ` Timothy Sample
2021-12-21 16:13                       ` Leo Famulari
2021-12-21 16:36                         ` Timothy Sample
2021-12-21 16:52                           ` Maxime Devos
2021-12-21 17:32                             ` Leo Famulari
2021-12-21 17:51                               ` Maxime Devos
2021-12-21 18:11                                 ` Leo Famulari
2021-12-21 17:37                           ` Leo Famulari
2021-12-21 17:44                             ` Leo Famulari
2021-12-08  2:12       ` Maxim Cournoyer
2021-12-15 19:27 ` Michael Rohleder
2021-12-16  3:36 ` Abhiseck Paira via Bug reports for GNU Guix
2021-12-16 13:09 ` ison
2021-12-16 17:59   ` bug#52051: [core-updates-frozen] cannot login Guillaume Le Vaillant
2021-12-16 18:53     ` bug#52051: [core-updates-frozen] cannot login ('org.freedesktop.login1' service times out) Michael Rohleder
2021-12-20  0:40 ` Caleb Herbert
2021-12-22  1:46 ` Luis Felipe via Bug reports for GNU Guix
2021-12-22  2:19   ` Leo Famulari
2021-12-22 21:16     ` Leo Famulari
2021-12-22 23:38       ` Leo Famulari

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k0gcbw79.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=52051@debbugs.gnu.org \
    --cc=maxim.cournoyer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.