unofficial mirror of bug-guix@gnu.org 
* bug#74279: Shepherd service is not getting respawned.
From: Tomas Volf @ 2024-11-09 14:58 UTC
  To: 74279

Hi,

I wrote a Shepherd service that checks whether networking is actually
up, but it does not get respawned when it fails, and I do not
understand why.

This is the service in my operating-system:

--8<---------------cut here---------------start------------->8---
(simple-service
 'network-online
 shepherd-root-service-type
 (list (shepherd-service
        (requirement '(networking))
        (provision '(network-online))
        (documentation "Wait for the network to come up.")
        (start #~(lambda _
                   (let* ((cmd "/run/privileged/bin/ping -qc1 -W1 1.1.1.1")
                          (status (system cmd)))
                     (= 0 (status:exit-val status)))))
        (one-shot? #t)
        ;; Try every second.
        (respawn-delay 1)
        ;; Retry forever.  Double-quoting is intentional.
        (respawn-limit ''(5 . 5)))))
--8<---------------cut here---------------end--------------->8---

Now, when I reboot the machine, I see in the log that the service did
start:

--8<---------------cut here---------------start------------->8---
Nov  7 00:18:20 localhost shepherd[1]: Starting service network-online...
[..]
Nov  7 00:18:20 localhost shepherd[1]: [sh] PING 192.168.0.110 (192.168.0.110): 56 data bytes
Nov  7 00:18:20 localhost shepherd[1]: [sh] /run/privileged/bin/ping: sending packet: Network is unreachable
Nov  7 00:18:20 localhost shepherd[1]: Service network-online could not be started.
Nov  7 00:18:20 localhost shepherd[1]: Service network-online failed to start.
--8<---------------cut here---------------end--------------->8---

The failure on the first run is expected; the problem is that the
service starts exactly once.  I do not see any attempts to respawn it
in /var/log/messages, but based on the documentation the service
*should* get respawned, since it failed.  What am I doing wrong?  Does
anyone have suggestions, either about what is wrong with the code above
or about how to approach this another way?

Have a nice day,
Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


* bug#74279: Shepherd service is not getting respawned.
From: Ludovic Courtès @ 2024-11-10 11:32 UTC
  To: 74279

Hi Tomas,

Tomas Volf <~@wolfsden.cz> skribis:

>         (start #~(lambda _
>                    (let* ((cmd "/run/privileged/bin/ping -qc1 -W1 1.1.1.1")
>                           (status (system cmd)))
>                      (= 0 (status:exit-val status)))))
>         (one-shot? #t)
>         ;; Try every second.
>         (respawn-delay 1)
>         ;; Retry forever.  Double-quoting is intentional.
>         (respawn-limit ''(5 . 5)))))

[...]

> Nov  7 00:18:20 localhost shepherd[1]: Starting service network-online...
> [..]
> Nov  7 00:18:20 localhost shepherd[1]: [sh] PING 192.168.0.110 (192.168.0.110): 56 data bytes
> Nov  7 00:18:20 localhost shepherd[1]: [sh] /run/privileged/bin/ping: sending packet: Network is unreachable
> Nov  7 00:18:20 localhost shepherd[1]: Service network-online could not be started.
> Nov  7 00:18:20 localhost shepherd[1]: Service network-online failed to start.

I think there’s a misunderstanding here: ‘respawn?’ is about respawning
a service that, once it is running, terminates prematurely.

In your case, the service does not start (its ‘start’ method returns
#f).
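
For contrast, here is a minimal sketch of a service to which ‘respawn?’
would apply: a long-running process that shepherd restarts if it exits
after having started successfully.  The service name and the daemon path
are hypothetical placeholders, not taken from the report.

--8<---------------cut here---------------start------------->8---
(shepherd-service
 (provision '(my-daemon))               ;hypothetical service name
 (requirement '(networking))
 (documentation "Run a hypothetical long-running daemon.")
 ;; 'start' succeeds as soon as the process has been spawned.
 (start #~(make-forkexec-constructor
           (list "/path/to/my-daemon"))) ;hypothetical path
 (stop #~(make-kill-destructor))
 ;; Respawning covers the case where the process dies *after* a
 ;; successful start; it does not retry a 'start' that returned #f.
 (respawn? #t))
--8<---------------cut here---------------end--------------->8---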

Now, it would probably make sense to have a mechanism to retry starting
services.

In the specific case of ‘network-online’ though, you could use a
different approach: the ‘start’ method could itself retry pinging
the network several times and fail only if it has not reached the
network after, say, 10s.  (Remember that ‘start’ and ‘stop’ must
complete in a timely fashion.)
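
A minimal, untested sketch of that approach follows.  It reuses the ping
command from the original report; the limit of ten attempts and the
one-second pause between them are illustrative assumptions.  Whether
‘system’ and ‘sleep’ block the rest of shepherd while the loop runs
depends on the Shepherd version, so treat this as a starting point only.

--8<---------------cut here---------------start------------->8---
(simple-service
 'network-online
 shepherd-root-service-type
 (list (shepherd-service
        (requirement '(networking))
        (provision '(network-online))
        (documentation "Wait for the network to come up.")
        (start #~(lambda _
                   (define cmd
                     "/run/privileged/bin/ping -qc1 -W1 1.1.1.1")
                   ;; Try up to 10 times (assumed limit); return #t on
                   ;; the first successful ping, #f once we give up.
                   (let loop ((attempts 10))
                     (cond ((zero? attempts) #f)
                           ;; 'status:exit-val' returns #f if ping was
                           ;; killed by a signal, hence 'eqv?' not '='.
                           ((eqv? 0 (status:exit-val (system cmd))) #t)
                           (else
                            (sleep 1)   ;pause before the next attempt
                            (loop (- attempts 1)))))))
        (one-shot? #t))))
--8<---------------cut here---------------end--------------->8---

The ‘respawn-delay’ and ‘respawn-limit’ fields are left out because the
retrying now happens inside ‘start’ itself.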

HTH,
Ludo’.
