* shepherd respawn frequency
From: Attila Lendvai @ 2023-05-27 8:34 UTC
To: guix-devel
dear guix,
the issue at hand:
i have a daemon that simply quits when one of its running conditions is not satisfied. this can depend on unpredictable external factors, like the temporary unreachability of a remote service.
shepherd respawns it immediately in RESPAWN-SERVICE, without any delay, which leads to a kind of busy loop (i noticed this through the fan noise of the machine). i know that there's a stopgap measure to disable such services, but:
1) some of these daemons struggle long enough before quitting that
they do not trigger the default RESPAWN-LIMIT-HIT? stopgap
measure
2) i *do* want shepherd to keep restarting them indefinitely, but
not immediately after their premature exit
proposed solution:
would the shepherd maintainers (looking at you, Ludo :) accept a change that introduces a new field into <service> called RESPAWN-DELAY, and issues a fiber sleep in RESPAWN-SERVICE when the field is not #false and the daemon process quits unexpectedly?
in an initial commit i'd also turn the global variable called RESPAWN-LIMIT into a field of <service>, and make it take its default value from a properly named %RESPAWN-LIMIT global variable.
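to make the idea concrete, here is a minimal sketch of the control flow in Python. it is not shepherd code: `supervise`, `start`, `max_respawns` and the injectable `sleep` are hypothetical names made up for the illustration, and the loop is bounded only so that the example terminates.

```python
import time
from typing import Callable, List, Optional


def supervise(start: Callable[[], int],
              respawn_delay: Optional[float] = 5.0,
              max_respawns: int = 3,
              sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Run start() and respawn it after each unexpected (non-zero) exit.

    respawn_delay=None stands in for #false: respawn immediately.
    Returns the exit codes that were observed, in order.
    """
    exit_codes: List[int] = []
    for attempt in range(max_respawns + 1):
        code = start()              # blocks until the "daemon" exits
        exit_codes.append(code)
        if code == 0:
            break                   # clean exit, nothing to respawn
        # Pause before the next respawn, but not after the last attempt.
        if attempt < max_respawns and respawn_delay is not None:
            sleep(respawn_delay)
    return exit_codes
```

passing a fake `sleep` (for instance `some_list.append`) makes it possible to check the delays without actually waiting.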
open questions:
- what should be the default value of the respawn delay? i suggest 5
seconds, and i'd argue against it being disabled by default:
- premature exits happen more frequently at startup than in an
already running process
- an unwanted default respawn delay causes less headache than an
unwanted busy loop.
- if the respawn delay is set, then should respawn-limit be ignored?
IOW, should the logic treat them as two independent variables, or
should it not? and should there be some logic in how/where they
take their defaults from?
my pick: treat them as two independent variables, but when the user
explicitly specifies a respawn delay for the service object, then
there shouldn't be any respawn limit, unless the user also
explicitly specifies it on the <service> object.
corollary: the handling of defaults should be implemented so that
the fields of <service> hold #false as default value, in which
case the logic takes the default value from a global variable in
shepherd.
- should i bother with detecting whether this is the first respawn in
a given past period (of e.g. 1 minute), and not apply any delay in
that case? this adds extra complexity, which may not be worth it.
i'd go with a pass here.
- after a cursory look, i don't understand the relationship between
RESPAWNS and FAILURES. the former seems to be an endlessly growing
list of timestamps, while the latter is a ring buffer of
timestamps. it's not crucial for me to understand it, but i wonder
if there's a bug lurking there that eats up the heap when a service
keeps respawning without any delay?
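the defaulting rules from my pick above can be sketched like this, again in Python rather than in shepherd's own code. all names (`Service`, `effective_delay`, `effective_limit`, `limit_hit`) and the default values are assumptions for the sake of the example; `None` plays the role of #false.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical global defaults; the numbers are placeholders.
DEFAULT_RESPAWN_DELAY = 5.0         # seconds
DEFAULT_RESPAWN_LIMIT = (5, 7.0)    # (respawn count, window in seconds)

Limit = Tuple[int, float]


@dataclass
class Service:
    # None means "not set by the user" (the #false of the proposal).
    respawn_delay: Optional[float] = None
    respawn_limit: Optional[Limit] = None


def effective_delay(service: Service) -> float:
    """An explicit per-service delay wins, else the global default."""
    if service.respawn_delay is not None:
        return service.respawn_delay
    return DEFAULT_RESPAWN_DELAY


def effective_limit(service: Service) -> Optional[Limit]:
    """An explicit limit wins; an explicit delay alone means no limit."""
    if service.respawn_limit is not None:
        return service.respawn_limit
    if service.respawn_delay is not None:
        return None                 # the user asked for a delay only
    return DEFAULT_RESPAWN_LIMIT


def limit_hit(respawn_times: Sequence[float],
              limit: Optional[Limit],
              now: float) -> bool:
    """True when at least `count` respawns fall within the last `window` seconds."""
    if limit is None:
        return False
    count, window = limit
    recent = [t for t in respawn_times if now - t <= window]
    return len(recent) >= count
```

as for the heap question: if the timestamps were kept in a bounded container (in Python, `collections.deque(maxlen=count)`), the history could not grow without bound, because only the last `count` entries matter for this check.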
i'm all ears for suggestions, and i'm also happy to hand over the implementation to someone else, who already had plans to do it, and knows the internals of shepherd better than me.
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Looking back on a 30-year teaching career full of rewards and prizes, somehow I can't completely believe that I spent my time on earth institutionalized; I can't believe that centralized schooling is allowed to exist at all as a gigantic indoctrination and sorting machine, robbing people of their children. Did it really happen? Was this my life? God help me.”
— John Taylor Gatto (1935–2018), Teacher of the Year, both in New York City and State, multiple times
* Re: shepherd respawn frequency
From: Ludovic Courtès @ 2023-07-02 20:01 UTC
To: Attila Lendvai; +Cc: guix-devel
Hi!
Attila Lendvai <attila@lendvai.name> skribis:
> would the shepherd maintainers (looking at you Ludo :) accept a change that introduces a new field into <service> called RESPAWN-DELAY, and issue a fiber sleep in RESPAWN-SERVICE when it is not #false, and the daemon process quits unexpectedly?
We could do that. It never occurred to me that this is something one
would want to have though. My reasoning is that if you mark a service
as respawnable, then you really want it to be respawned as soon as it
fails, not 5 seconds later.
Do you have a motivating example in mind (a daemon) to share?
Thanks,
Ludo’.
* Re: shepherd respawn frequency
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2023-07-02 20:11 UTC
To: Ludovic Courtès; +Cc: Attila Lendvai, guix-devel
Hi Ludo',
On Sun, Jul 2, 2023 at 1:01 PM Ludovic Courtès <ludo@gnu.org> wrote:
>
> We could do that.
Without advocating that the Shepherd draw any inspiration from
systemd, please allow me to mention, sotto-voce, that the default may
be 100 milliseconds over there. [1]
Kind regards
Felix
[1] https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=
* Re: shepherd respawn frequency
From: Attila Lendvai @ 2023-07-02 20:25 UTC
To: Ludovic Courtès; +Cc: guix-devel
> We could do that. It never occurred to me that this is something one
> would want to have though. My reasoning is that if you mark a service
> as respawnable, then you really want it to be respawned as soon as it
> fails, not 5 seconds later.
there are a large number of different applications that users may want to run as a service. i'm pretty sure a nontrivial subset of them can get into such an error loop.
this specific one is the Bee client of ethswarm.org. multiple threads are active during its startup. one of them tries to connect to the blockchain node. when the node is unreachable, the client eventually gives up and quits.
arguably, it would be a better behavior to keep trying indefinitely instead of quitting, but let's assume that its behavior is practically beyond our control.
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
The real voyage of discovery consists not in seeking new landscapes, but in looking with new eyes.
* Re: shepherd respawn frequency
From: Efraim Flashner @ 2023-07-03 9:08 UTC
To: Ludovic Courtès; +Cc: Attila Lendvai, guix-devel
On Sun, Jul 02, 2023 at 10:01:26PM +0200, Ludovic Courtès wrote:
> Hi!
>
> Attila Lendvai <attila@lendvai.name> skribis:
>
> > would the shepherd maintainers (looking at you Ludo :) accept a change that introduces a new field into <service> called RESPAWN-DELAY, and issue a fiber sleep in RESPAWN-SERVICE when it is not #false, and the daemon process quits unexpectedly?
>
> We could do that. It never occurred to me that this is something one
> would want to have though. My reasoning is that if you mark a service
> as respawnable, then you really want it to be respawned as soon as it
> fails, not 5 seconds later.
>
> Do you have a motivating example in mind (a daemon) to share?
>
> Thanks,
> Ludo’.
>
If you want to make sure that something is really gone and cleaned up
before trying again. Or an artificial delay, say while waiting for the
correct network interface to come up. Or even just
`mbsync -a && exit 1 || exit 1`
--
Efraim Flashner <efraim@flashner.co.il> רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted