From: Attila Lendvai <attila@lendvai.name>
To: guix-devel <guix-devel@gnu.org>
Subject: shepherd respawn frequency
Date: Sat, 27 May 2023 08:34:51 +0000
Message-ID: <bGCU7KHyHA2JPBvuFtyzn7MHs52dXt2SAUHjyZ6vQUhohqCmpmqPsjh7WDYbHQz_VgB1ujdxjapLxNR1UYTx4io-je-56QjmnqR-0alO4rc=@lendvai.name>
dear guix,
the issue at hand:
i have a daemon that simply quits when one of its runtime conditions is not satisfied. this can depend on unpredictable external factors, such as a remote service being temporarily unreachable.
shepherd respawns it immediately in RESPAWN-SERVICE, without any delay, which leads to a kind of busy loop (i noticed this through the fan noise of the machine). i know that there's a stopgap measure that disables such services, but:
1) some of these daemons struggle long enough before quitting that
they do not trigger the default RESPAWN-LIMIT-HIT? stopgap
measure
2) i *do* want shepherd to keep restarting them indefinitely, but
not immediately after their premature exit
proposed solution:
would the shepherd maintainers (looking at you, Ludo :) accept a change that introduces a new field in <service> called RESPAWN-DELAY, and makes RESPAWN-SERVICE issue a fiber sleep when that field is not #false and the daemon process quits unexpectedly?
in an initial commit i'd also turn the global variable called RESPAWN-LIMIT into a field of <service>, and make it take its default value from a properly named %RESPAWN-LIMIT global variable.
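a rough sketch of the proposed behaviour, written in python rather than shepherd's guile and purely for illustration. the names `supervise`, `run_once`, `max_respawns` and the injectable `sleep` are all hypothetical; `max_respawns` exists only so that the demo terminates, and every return of `run_once` is treated as a premature exit:

```python
import time
from typing import Callable, Optional


def supervise(run_once: Callable[[], None],
              respawn_delay: Optional[float] = 5.0,
              max_respawns: int = 3,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Restart `run_once` each time it returns, pausing in between.

    respawn_delay: seconds to wait before a respawn; None (standing in
                   for #false) means respawn immediately.
    max_respawns:  demo-only cap so that this sketch terminates.
    Returns the number of respawns performed.
    """
    respawns = 0
    while True:
        run_once()                      # the daemon runs, then quits
        if respawns >= max_respawns:    # demo-only exit condition
            return respawns
        if respawn_delay is not None:
            sleep(respawn_delay)        # the proposed pause before respawning
        respawns += 1


if __name__ == "__main__":
    pauses = []
    count = supervise(lambda: None, respawn_delay=5.0,
                      max_respawns=3, sleep=pauses.append)
    print(count, pauses)
```

passing `sleep` as a parameter keeps the sketch testable without actually waiting; in shepherd the pause would be a fiber sleep, so other services would not be blocked.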
open questions:
- what should be the default value of the respawn delay? i suggest 5
seconds, and i'd argue against it being disabled by default:
- premature exits happen more frequently at startup than in an
already running process
- an unwanted default respawn delay causes less headache than an
unwanted busy loop.
- if the respawn delay is set, then should respawn-limit be ignored?
IOW, should the logic treat them as two independent variables, or
should it not? and should there be some logic in how/where they
take their defaults from?
my pick: treat them as two independent variables, but when the user
explicitly specifies a respawn delay for the service object, then
there shouldn't be any respawn limit, unless the user also
explicitly specifies it on the <service> object.
corollary: the handling of defaults should be implemented so that
the fields of <service> hold #false as default value, in which
case the logic takes the default value from a global variable in
shepherd.
- should i bother with detecting whether a respawn is the first one in
a given past period (e.g. 1 minute), and skip the delay in that
case? this adds extra complexity, which may not be worth it. i'd
pass on this.
- after a cursory look, i don't understand the relationship between
RESPAWNS and FAILURES. the former seems to be an endlessly growing
list of timestamps, while the latter is a ring buffer of
timestamps. it's not crucial for me to understand it, but i wonder
if there's a bug lurking there that eats up the heap when a service
keeps respawning without any delay?
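to make my pick on defaults concrete, here is a minimal sketch in python (not shepherd's guile, and not its actual data structures). `Service`, `effective_policy`, `limit_hit` and the two `DEFAULT_*` globals are hypothetical names, `None` stands in for #false, and the default limit value is only a placeholder. the bounded `deque` shows how a history of timestamps can be kept from growing without limit:

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple

# hypothetical global defaults; the limit value is a placeholder
DEFAULT_RESPAWN_DELAY = 5.0        # seconds, the default suggested above
DEFAULT_RESPAWN_LIMIT = (5, 7.0)   # (count, seconds)

Limit = Tuple[int, float]


@dataclass
class Service:
    respawn_delay: Optional[float] = None   # None: not set by the user
    respawn_limit: Optional[Limit] = None   # None: not set by the user


def effective_policy(service: Service) -> Tuple[float, Optional[Limit]]:
    """Return (delay, limit); a limit of None means respawn indefinitely."""
    if service.respawn_delay is not None:
        delay = service.respawn_delay
    else:
        delay = DEFAULT_RESPAWN_DELAY

    if service.respawn_limit is not None:
        limit = service.respawn_limit       # explicit limit always wins
    elif service.respawn_delay is not None:
        limit = None                        # explicit delay, no explicit limit
    else:
        limit = DEFAULT_RESPAWN_LIMIT       # nothing set: global default
    return delay, limit


def limit_hit(history: Deque[float], limit: Optional[Limit], now: float) -> bool:
    """True if at least `count` recorded respawns fall within `seconds` of now."""
    if limit is None:
        return False
    count, seconds = limit
    return sum(1 for t in history if now - t <= seconds) >= count


if __name__ == "__main__":
    print(effective_policy(Service()))
    print(effective_policy(Service(respawn_delay=10.0)))
    history: Deque[float] = deque(maxlen=5)  # oldest timestamps are dropped
    for t in range(100):
        history.append(float(t))
    print(len(history))
```

with `maxlen` set to the limit's count, the history never holds more timestamps than the limit check can use, no matter how often the service respawns.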
i'm all ears for suggestions, and i'm also happy to hand over the implementation to someone else, who already had plans to do it, and knows the internals of shepherd better than me.
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Looking back on a 30-year teaching career full of rewards and prizes, somehow I can't completely believe that I spent my time on earth institutionalized; I can't believe that centralized schooling is allowed to exist at all as a gigantic indoctrination and sorting machine, robbing people of their children. Did it really happen? Was this my life? God help me.”
— John Taylor Gatto (1935–2018), Teacher of the Year, both in New York City and State, multiple times