unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#55444: elogind startup race between shepherd and dbus-daemon
@ 2022-05-16  8:26 Ludovic Courtès
  2022-05-24  2:27 ` Maxim Cournoyer
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Ludovic Courtès @ 2022-05-16  8:26 UTC (permalink / raw)
  To: 55444

Hello!

Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
dbus-daemon compete to start elogind: shepherd tries to start it
eagerly, and dbus-daemon starts it on-demand upon bus activation.

Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
it anyway, leading to the infamous:

  elogind is already running as PID 123

(elogind checks whether its PID file exists.  Note that you may see that
message also when shepherd wins, because dbus-daemon tries to start it
anyway.)  Eventually, shepherd considers that elogind cannot be started
and disables it.

In addition to being ridiculous, it’s harmful: the ‘xorg-server’ service
(from ‘gdm-service-type’ and ‘sddm-service-type’ depends on ‘elogind’),
so if shepherd loses the race, Xorg isn’t started (on my laptop,
shepherd never loses the race it seems, but i’ve seen it lose half of
the time on a slower machine).

The reason elogind is started by shepherd is explained in this comment:

       ;; Start elogind from the Shepherd rather than waiting
       ;; for bus activation.  This ensures that it can handle
       ;; events like lid close, etc.

This comes from 94a881178af9a9a918ce6de55641daa245c92e73, which was a
fix for <http://issues.guix.gnu.org/27580>.  I believe the justification
still holds.

So it would seem that the solution to this is to prevent dbus-daemon
from starting elogind.  We can do that by changing
org.freedesktop.login1.service so that it has “Exec=true” instead of
“Exec=elogind --daemon”.

“Exec=true” is a bit crude because it doesn’t guarantee that elogind is
really started; if that isn’t good enough, we could instead wait for the
PID file or something (as of Shepherd 0.9.0, invoking ‘herd start
elogind’ potentially leads shepherd to start a second instance if the
first one is still being started, so we can’t really do that).

Depending on what we end up with, we might also revisit whether
xorg-server needs to explicitly depend on elogind.

Thoughts?

Ludo’.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-16  8:26 bug#55444: elogind startup race between shepherd and dbus-daemon Ludovic Courtès
@ 2022-05-24  2:27 ` Maxim Cournoyer
  2022-05-25 12:45   ` Ludovic Courtès
  2022-05-24 19:25 ` Liliana Marie Prikler
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Maxim Cournoyer @ 2022-05-24  2:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 55444

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hello!
>
> Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
> dbus-daemon compete to start elogind: shepherd tries to start it
> eagerly, and dbus-daemon starts it on-demand upon bus activation.
>
> Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
> it anyway, leading to the infamous:
>
>   elogind is already running as PID 123

Do we have a system test that sometimes reproduce it, or at least the
above message?  I have some branch where I introduce some D-Bus
synchronization primitives I had started to fix
https://issues.guix.gnu.org/issue/52051, which ended up being fixed
differently (bumping the timeout value); perhaps it could be of use
here.

Thanks,

Maxim




^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-16  8:26 bug#55444: elogind startup race between shepherd and dbus-daemon Ludovic Courtès
  2022-05-24  2:27 ` Maxim Cournoyer
@ 2022-05-24 19:25 ` Liliana Marie Prikler
  2022-05-25 12:26   ` Ludovic Courtès
  2022-05-27 13:54 ` Ludovic Courtès
  2022-05-27 20:54 ` Ludovic Courtès
  3 siblings, 1 reply; 9+ messages in thread
From: Liliana Marie Prikler @ 2022-05-24 19:25 UTC (permalink / raw)
  To: Ludovic Courtès, 55444

Hi,

Am Montag, dem 16.05.2022 um 10:26 +0200 schrieb Ludovic Courtès:
> [...]
> So it would seem that the solution to this is to prevent dbus-daemon
> from starting elogind.  We can do that by changing
> org.freedesktop.login1.service so that it has “Exec=true” instead of
> “Exec=elogind --daemon”.
> 
> “Exec=true” is a bit crude because it doesn’t guarantee that elogind
> is really started; if that isn’t good enough, we could instead wait
> for the PID file or something (as of Shepherd 0.9.0, invoking ‘herd
> start elogind’ potentially leads shepherd to start a second instance
> if the first one is still being started, so we can’t really do that).
Why does shepherd race with itself here?  That sounds like a very evil
bug.  Rather than waiting for a log file, I'd suggest writing an ad-hoc
Guile script that communicates with shepherd and blocks until shepherd
signals that elogind has been started, but this script too would have
to work around shepherd racing against itself.

> Depending on what we end up with, we might also revisit whether
> xorg-server needs to explicitly depend on elogind.
At least in the case of GDM I think it does heavily depend on elogind.
For the future, I think we also should take over dbus-daemon's
autostart in the same way systemd already has.

Cheers




^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-24 19:25 ` Liliana Marie Prikler
@ 2022-05-25 12:26   ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2022-05-25 12:26 UTC (permalink / raw)
  To: Liliana Marie Prikler; +Cc: 55444

Hi,

Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:

> Am Montag, dem 16.05.2022 um 10:26 +0200 schrieb Ludovic Courtès:
>> [...]
>> So it would seem that the solution to this is to prevent dbus-daemon
>> from starting elogind.  We can do that by changing
>> org.freedesktop.login1.service so that it has “Exec=true” instead of
>> “Exec=elogind --daemon”.
>> 
>> “Exec=true” is a bit crude because it doesn’t guarantee that elogind
>> is really started; if that isn’t good enough, we could instead wait
>> for the PID file or something (as of Shepherd 0.9.0, invoking ‘herd
>> start elogind’ potentially leads shepherd to start a second instance
>> if the first one is still being started, so we can’t really do that).
> Why does shepherd race with itself here?  That sounds like a very evil
> bug.  Rather than waiting for a log file, I'd suggest writing an ad-hoc
> Guile script that communicates with shepherd and blocks until shepherd
> signals that elogind has been started, but this script too would have
> to work around shepherd racing against itself.

Right.  Currently services have two states: stopped, and started.
Fixing that needs non-trivial changes to how shepherd handles state.

We’ll have to do that (the way I see it, we’ll move state out of
<service> and have a fiber explicitly handle state, including
distinguishing between “stopped” and “starting”), but I think/hope we
can fix this bug without first addressing this issue.

>> Depending on what we end up with, we might also revisit whether
>> xorg-server needs to explicitly depend on elogind.
> At least in the case of GDM I think it does heavily depend on elogind.
> For the future, I think we also should take over dbus-daemon's
> autostart in the same way systemd already has.

Agreed, though that one is trickier: we’d need an implementation of the
D-Bus protocol.  There’s guile-ac-d-bus but it’s probably under-tested.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-24  2:27 ` Maxim Cournoyer
@ 2022-05-25 12:45   ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2022-05-25 12:45 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 55444

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hello!
>>
>> Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
>> dbus-daemon compete to start elogind: shepherd tries to start it
>> eagerly, and dbus-daemon starts it on-demand upon bus activation.
>>
>> Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
>> it anyway, leading to the infamous:
>>
>>   elogind is already running as PID 123
>
> Do we have a system test that sometimes reproduce it, or at least the
> above message?

Any system along the lines of gnu/system/examples/desktop.tmpl gets this
message, though usually Xorg starts without any problem.

There’s an elogind system test but it doesn’t catch the problem because
it’s a non-deterministic and rather rare issue.

> I have some branch where I introduce some D-Bus synchronization
> primitives I had started to fix
> https://issues.guix.gnu.org/issue/52051, which ended up being fixed
> differently (bumping the timeout value); perhaps it could be of use
> here.

Hmm I had forgotten about that bug, and I wonder if it’s the same bug I
was seeing, and that in fact the shepherd/dbus-daemon race isn’t the
root cause.  The machine where I saw elogind startup failures was
extremely slow to start for other reasons.

Anyhow, getting rid of this race seems like the right thing to do.

Ludo’.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-16  8:26 bug#55444: elogind startup race between shepherd and dbus-daemon Ludovic Courtès
  2022-05-24  2:27 ` Maxim Cournoyer
  2022-05-24 19:25 ` Liliana Marie Prikler
@ 2022-05-27 13:54 ` Ludovic Courtès
  2022-05-27 20:54 ` Ludovic Courtès
  3 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2022-05-27 13:54 UTC (permalink / raw)
  To: 55444

Ludovic Courtès <ludo@gnu.org> skribis:

> Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
> dbus-daemon compete to start elogind: shepherd tries to start it
> eagerly, and dbus-daemon starts it on-demand upon bus activation.
>
> Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
> it anyway, leading to the infamous:

Here’s an example where dbus-daemon wins:

--8<---------------cut here---------------start------------->8---
$ sudo grep --color -E '^May 27 .*(dbus.*login|elogind)' /var/log/messages
May 27 15:06:36 localhost dbus-daemon[326]: [system] Activating service name='org.freedesktop.login1' requested by ':1.0' (uid=0 pid=307 comm="/gnu/store/2lis8khrdk0zzjzs5ydi8rs5h6f6wjr7-shadow") (using servicehelper)
May 27 15:06:37 localhost elogind-daemon[335]: New seat seat0.
May 27 15:06:37 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event2 (Power Button)
May 27 15:06:37 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event0 (Power Button)
May 27 15:06:37 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event1 (Sleep Button)
May 27 15:06:41 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event4 (Darfon HP USB Keyboard)
May 27 15:06:43 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event5 (Darfon HP USB Keyboard Consumer Control)
May 27 15:06:48 localhost dbus-daemon[326]: [system] Successfully activated service 'org.freedesktop.login1'
May 27 15:06:54 localhost elogind-daemon[335]: New session c1 of user ludo.
May 27 15:07:45 localhost shepherd[1]: Service elogind has been started. 
May 27 15:08:02 localhost shepherd[1]: Respawning elogind. 
May 27 15:08:14 localhost shepherd[1]: Service elogind has been started. 
May 27 15:08:32 localhost shepherd[1]: Respawning elogind. 
May 27 15:08:43 localhost shepherd[1]: Service elogind has been started. 
May 27 15:08:47 localhost vmunix: [   25.123255] shepherd[1]: Service file-system-/sys/fs/cgroup/elogind has been started.
May 27 15:08:53 localhost shepherd[1]: Respawning elogind. 
May 27 15:08:59 localhost shepherd[1]: Service elogind has been started. 
May 27 15:09:00 localhost vmunix: [   79.976531] elogind[348]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [  107.880971] elogind[364]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [  109.160864] elogind-daemon[335]: New session c2 of user ludo.
May 27 15:09:00 localhost vmunix: [  135.017068] elogind[369]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [  159.849027] elogind[370]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [  181.608889] elogind[371]: elogind is already running as PID 335
May 27 15:09:00 localhost shepherd[1]: Respawning elogind. 
May 27 15:09:00 localhost shepherd[1]: Service elogind has been started. 
May 27 15:09:04 localhost shepherd[1]: Respawning elogind. 
May 27 15:09:04 localhost shepherd[1]: Service elogind has been started. 
May 27 15:09:04 localhost elogind[410]: elogind is already running as PID 335
May 27 15:09:04 localhost shepherd[1]: Respawning elogind. 
May 27 15:09:04 localhost shepherd[1]: Service elogind has been started. 
May 27 15:09:05 localhost elogind[411]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Respawning elogind. 
May 27 15:09:05 localhost shepherd[1]: Service elogind has been started. 
May 27 15:09:05 localhost elogind[412]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Respawning elogind. 
May 27 15:09:05 localhost shepherd[1]: Service elogind has been started. 
May 27 15:09:05 localhost elogind[416]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Respawning elogind. 
May 27 15:09:05 localhost shepherd[1]: Service elogind has been started. 
May 27 15:09:05 localhost elogind[417]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Service elogind has been disabled. 
May 27 15:09:08 localhost elogind-daemon[335]: New session c3 of user gdm.
May 27 15:12:08 localhost elogind-daemon[335]: New session c4 of user ludo.
--8<---------------cut here---------------end--------------->8---

(In this case ‘xorg-server’ started but ‘herd status elogind’ shows it
is stopped and disabled.)

Contrast with a successful startup where shepherd wins:

--8<---------------cut here---------------start------------->8---
May 27 11:03:55 localhost shepherd[1]: Service elogind has been started. 
May 27 11:03:54 localhost elogind[476]: New seat seat0.
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event3 (Power Button)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event1 (Power Button)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event0 (Lid Switch)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event2 (Sleep Button)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event5 (Dell Dell USB Keyboard)
May 27 11:03:54 localhost dbus-daemon[470]: [system] Activating service name='org.freedesktop.login1' requested by ':1.2' (uid=0 pid=477 comm="/gnu/store/qpaw2f734zlsq153fkn5afcv4k4fk63z-upower") (using servicehelper)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event4 (AT Translated Set 2 keyboard)
May 27 11:03:55 localhost elogind[496]: elogind is already running as PID 476
May 27 11:03:55 localhost dbus-daemon[470]: [system] Successfully activated service 'org.freedesktop.login1'
May 27 11:04:03 localhost vmunix: [ 2089.808033] shepherd[1]: Service file-system-/sys/fs/cgroup/elogind has been started.
May 27 11:04:04 localhost elogind[476]: New session c1 of user gdm.
--8<---------------cut here---------------end--------------->8---

Besides, Gentoo recommends starting it from runit rather than on-demand:

  https://wiki.gentoo.org/wiki/Elogind#Service
  https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-auth/elogind/elogind-241.1.ebuild?id=bea47cee314829edbb41453d1e89fa1d1d3f9993

They don’t seem to be doing anything to avoid the race though.

Ludo’.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-16  8:26 bug#55444: elogind startup race between shepherd and dbus-daemon Ludovic Courtès
                   ` (2 preceding siblings ...)
  2022-05-27 13:54 ` Ludovic Courtès
@ 2022-05-27 20:54 ` Ludovic Courtès
  2022-05-28  8:13   ` Josselin Poiret via Bug reports for GNU Guix
  3 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2022-05-27 20:54 UTC (permalink / raw)
  To: 55444

[-- Attachment #1: Type: text/plain, Size: 1510 bytes --]

Hey there,

Ludovic Courtès <ludo@gnu.org> skribis:

> So it would seem that the solution to this is to prevent dbus-daemon
> from starting elogind.  We can do that by changing
> org.freedesktop.login1.service so that it has “Exec=true” instead of
> “Exec=elogind --daemon”.
>
> “Exec=true” is a bit crude because it doesn’t guarantee that elogind is
> really started; if that isn’t good enough, we could instead wait for the
> PID file or something (as of Shepherd 0.9.0, invoking ‘herd start
> elogind’ potentially leads shepherd to start a second instance if the
> first one is still being started, so we can’t really do that).

The patch below address that: it changes the “Exec=” line of
‘org.freedesktop.login1’ to refer to a wrapper.  That wrapper connects
to shepherd and waits until ‘elogind’ is started.

That way, if dbus-daemon comes first, it won’t actually launch anything
and instead wait for the Shepherd ‘elogind’ service to be up.  (And if
it comes second, dbus-daemon won’t try to launch anything, so no
spurious “already running” messages.)

I tested it in a ‘desktop.tmpl’ VM, quickly logging in on tty1.  On
/var/log/messages, you can see the “Activating ….login1” message from
dbus-daemon, followed by “Service elogind started” from shepherd,
followed by “Successfully activated ….login1” from dbus-daemon.

The “elogind” system test passes too.

Thoughts?  Objections?

Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: the patch --]
[-- Type: text/x-patch, Size: 5173 bytes --]

From 7ef63d7426677961afd2bd937af19b08209c5b70 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Fri, 27 May 2022 22:41:55 +0200
Subject: [PATCH] services: elogind: When started by dbus-daemon, wait for the
 Shepherd service.

Fixes <https://issues.guix.gnu.org/55444>.

Previously shepherd and dbus-daemon would race to start elogind.  In
some cases (for instance if one logs in quickly enough on the tty),
dbus-daemon would "win" and start elogind before shepherd has had a
chance to do it.  Consequently, shepherd would fail to start elogind and
mark it as stopped and disabled, in turn preventing services that depend
on it such as 'xorg-server' from starting.

* gnu/services/desktop.scm (elogind-dbus-service): Rewrite to refer to a
wrapper that waits for the 'elogind' Shepherd service.
---
 gnu/services/desktop.scm | 79 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 75 insertions(+), 4 deletions(-)

diff --git a/gnu/services/desktop.scm b/gnu/services/desktop.scm
index 24fd43a207..318107a2ca 100644
--- a/gnu/services/desktop.scm
+++ b/gnu/services/desktop.scm
@@ -1075,10 +1075,81 @@ (define-syntax-rule (ini-file config file clause ...)
    ("HybridSleepMode" (sleep-list elogind-hybrid-sleep-mode))))
 
 (define (elogind-dbus-service config)
-  (list (wrapped-dbus-service (elogind-package config)
-                              "libexec/elogind/elogind"
-                              `(("ELOGIND_CONF_FILE"
-                                 ,(elogind-configuration-file config))))))
+  "Return a @file{org.freedesktop.login1.service} file that tells D-Bus how to
+\"start\" elogind.  In practice though, our elogind is started when booting by
+shepherd.  Thus, the @code{Exec} line of this @file{.service} file does not
+explain how to start elogind; instead, it spawns a wrapper that waits for the
+@code{elogind} shepherd service.  This avoids a race condition where both
+@command{shepherd} and @command{dbus-daemon} would attempt to start elogind."
+  ;; For more info on the elogind startup race, see
+  ;; <https://issues.guix.gnu.org/55444>.
+
+  (define elogind
+    (elogind-package config))
+
+  (define wrapper
+    (program-file "elogind-dbus-shepherd-sync"
+                  (with-imported-modules '((gnu services herd))
+                    #~(begin
+                        (use-modules (gnu services herd)
+                                     (srfi srfi-1)
+                                     (ice-9 match))
+
+                        (define (elogind-service? service)
+                          (memq 'elogind (live-service-provision service)))
+
+                        (define max-attempts
+                          ;; Number of attempts before assuming elogind failed
+                          ;; to start.
+                          20)
+
+                        ;; Repeatedly check whether the 'elogind' shepherd
+                        ;; service is up and running.  (As of Shepherd 0.9.1,
+                        ;; we cannot just call the 'start' method and wait for
+                        ;; it: it would spawn an additional elogind process.)
+                        (let loop ((attempts 0))
+                          (define services
+                            (current-services))
+
+                          (when (>= attempts max-attempts)
+                            (format (current-error-port)
+                                    "elogind shepherd service not started~%")
+                            (exit 2))
+
+                          (match (find elogind-service? services)
+                            (#f
+                             (format (current-error-port)
+                                     "no elogind shepherd service~%")
+                             (exit 1))
+                            (service
+                             (unless (live-service-running service)
+                               (sleep 1)
+                               (loop (+ attempts 1))))))))))
+
+  (define build
+    (with-imported-modules '((guix build utils))
+      #~(begin
+          (use-modules (guix build utils)
+                       (ice-9 match))
+
+          (define service-directory
+            "/share/dbus-1/system-services")
+
+          (mkdir-p (dirname (string-append #$output service-directory)))
+          (copy-recursively (string-append #$elogind service-directory)
+                            (string-append #$output service-directory))
+          (symlink (string-append #$elogind "/etc") ;for etc/dbus-1
+                   (string-append #$output "/etc"))
+
+          ;; Replace the "Exec=" line of the 'org.freedesktop.login1.service'
+          ;; file with one that refers to WRAPPER instead of elogind.
+          (match (find-files #$output "\\.service$")
+            ((file)
+             (substitute* file
+               (("Exec[[:blank:]]*=.*" _)
+                (string-append "Exec=" #$wrapper "\n"))))))))
+
+  (list (computed-file "elogind-dbus-service-wrapper" build)))
 
 (define (pam-extension-procedure config)
   "Return an extension for PAM-ROOT-SERVICE-TYPE that ensures that all the PAM
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-27 20:54 ` Ludovic Courtès
@ 2022-05-28  8:13   ` Josselin Poiret via Bug reports for GNU Guix
  2022-05-28 21:26     ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Josselin Poiret via Bug reports for GNU Guix @ 2022-05-28  8:13 UTC (permalink / raw)
  To: Ludovic Courtès, 55444

Hello Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> The patch below address that: it changes the “Exec=” line of
> ‘org.freedesktop.login1’ to refer to a wrapper.  That wrapper connects
> to shepherd and waits until ‘elogind’ is started.
>
> That way, if dbus-daemon comes first, it won’t actually launch anything
> and instead wait for the Shepherd ‘elogind’ service to be up.  (And if
> it comes second, dbus-daemon won’t try to launch anything, so no
> spurious “already running” messages.)
>
> I tested it in a ‘desktop.tmpl’ VM, quickly logging in on tty1.  On
> /var/log/messages, you can see the “Activating ….login1” message from
> dbus-daemon, followed by “Service elogind started” from shepherd,
> followed by “Successfully activated ….login1” from dbus-daemon.
>
> The “elogind” system test passes too.
>
> Thoughts?  Objections?
>
> Ludo’.

Great idea!  The patch LGTM, although I'd argue that most of the wrapper
code could belong in (gnu services herd), in something like
(wait-for-services #:select? select? #:retries (n 20)) so that someone
else doesn't end up recoding this for another service.

Best,
-- 
Josselin Poiret




^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#55444: elogind startup race between shepherd and dbus-daemon
  2022-05-28  8:13   ` Josselin Poiret via Bug reports for GNU Guix
@ 2022-05-28 21:26     ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2022-05-28 21:26 UTC (permalink / raw)
  To: Josselin Poiret; +Cc: 55444-done

Hello,

Josselin Poiret <dev@jpoiret.xyz> skribis:

> Great idea!  The patch LGTM, although I'd argue that most of the wrapper
> code could belong in (gnu services herd), in something like
> (wait-for-services #:select? select? #:retries (n 20)) so that someone
> else doesn't end up recoding this for another service.

Good idea, done here:

  f383838a09 services: elogind: When started by dbus-daemon, wait for the Shepherd service.
  b04ae71def services: herd: Add 'wait-for-service'.

Let me know what you think!

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-05-28 21:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-16  8:26 bug#55444: elogind startup race between shepherd and dbus-daemon Ludovic Courtès
2022-05-24  2:27 ` Maxim Cournoyer
2022-05-25 12:45   ` Ludovic Courtès
2022-05-24 19:25 ` Liliana Marie Prikler
2022-05-25 12:26   ` Ludovic Courtès
2022-05-27 13:54 ` Ludovic Courtès
2022-05-27 20:54 ` Ludovic Courtès
2022-05-28  8:13   ` Josselin Poiret via Bug reports for GNU Guix
2022-05-28 21:26     ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).