unofficial mirror of guix-devel@gnu.org 
* shepherd: failing test: should `herd stop` stop a respawning process?
@ 2024-10-06 17:02 Attila Lendvai
  2024-10-06 17:31 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  2024-10-23 20:03 ` Ludovic Courtès
  0 siblings, 2 replies; 5+ messages in thread
From: Attila Lendvai @ 2024-10-06 17:02 UTC (permalink / raw)
  To: guix-devel


i have a daemon process that quits soon after it starts (when it has some issue with its configuration).

then i usually fix the config, and do a `guix system reconfigure`. but i have noticed from the logs that this process often remains in a respawn loop, even if i `herd stop` and `herd disable` it after the reconfigure (i.e. a shepherd service upgrade).

i have attached a respawn2.sh that when put under tests/ reproduces the issue in a shepherd checkout (see the TODO notes):

$ guix shell
$ make check TESTS="tests/respawn2.sh"

what's wrong?
 - is my expectation wrong that `herd stop` should stop the respawning loop?

 - do i have a bug in my test script?

 - is this a shepherd bug? if so, then shall i finish up this test case as a proper patch for shepherd?

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“In all history there is no war which was not hatched by the governments, the governments alone, independent of the interests of the people, to whom war is always pernicious even when successful.”
	— Leo Tolstoy (1828–1910), 'On Patriotism' (1894)

[-- Attachment #2: respawn2.sh --]
[-- Type: application/x-shellscript, Size: 4298 bytes --]


* Re: shepherd: failing test: should `herd stop` stop a respawning process?
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-10-06 17:31 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: guix-devel

Hi Attila,

On Sun, Oct 06 2024, Attila Lendvai wrote:

>  - is my expectation wrong that `herd stop` should stop the respawning loop?

My sense is that the Shepherd and systemd always restart services.  I
don't think they know whether a failure relates to configuration or to
something else, but they could probably measure the rate of failure and
stop at some point.

You can mark services as "one-shot."  They are not restarted.  Maybe
that setting is helpful until your configuration is working.

Kind regards
Felix



* Re: shepherd: failing test: should `herd stop` stop a respawning process?
From: Ludovic Courtès @ 2024-10-23 20:03 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: guix-devel

Hi,

Attila Lendvai <attila@lendvai.name> skribis:

> i have a daemon process that quits soon after it starts (when it has some issue with its configuration).
>
> then i usually fix the config, and do a `guix system reconfigure`. but i have noticed from the logs that this process often remains in a respawn loop, even if i `herd stop` and `herd disable` it after the reconfigure (i.e. a shepherd service upgrade).

If it’s in a respawn loop, the problem is that ‘herd stop’ may or may
not happen at the right moment, because the service oscillates between
the stopped/starting/running/stopping statuses.
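
The timing problem can be pictured with a toy shell model. This is only a sketch under assumptions: the function names and the `status` variable are hypothetical, no real processes are started, and it does not reproduce how the Shepherd actually tracks statuses. It only shows why a stop request that lands between two respawns has nothing to act on.

```shell
# Toy model; names are hypothetical and no real processes are involved.
status=stopped

start_service() { status=running; }
process_exited() { status=stopped; }   # the daemon quit on its own

# Stop request: only has something to act on while the service is running.
stop_service() {
    if [ "$status" = running ]; then
        status=stopped
        echo "stop: service stopped"
    else
        echo "stop: nothing to do (status: $status)"
    fi
}

start_service
process_exited      # daemon dies right after starting
stop_service        # request lands between two respawns: no effect
start_service       # respawn brings the service back
echo "final status: $status"
```

In this model the stop request prints `stop: nothing to do (status: stopped)` and the final status is `running`, because the respawn happens after the request was already handled.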

However, ‘herd disable’ should prevent it from being respawned.
(Respawning calls ‘start-service’, which cannot start a service marked
as disabled.)
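
The intended behaviour described in this paragraph can be sketched as a toy shell model. All names here (`start_service`, `disable_service`, `respawn_service`, the `enabled` flag) are hypothetical stand-ins chosen for illustration, not the Shepherd's actual procedures, and state is kept in plain shell variables rather than real processes.

```shell
# Toy model only: these names are hypothetical, not the Shepherd's API.
enabled=yes      # flipped to "no" by disable_service
starts=0         # how many times the service was (re)started

start_service() {
    # Refuse to start a service that has been marked as disabled.
    if [ "$enabled" != yes ]; then
        return 1
    fi
    starts=$((starts + 1))
    return 0
}

disable_service() {
    enabled=no
}

# Called each time the supervised process exits unexpectedly.
respawn_service() {
    start_service || echo "respawn refused: service is disabled"
}

start_service       # initial start
respawn_service     # process died once, gets restarted
disable_service     # operator runs the equivalent of `herd disable`
respawn_service     # process died again, restart is refused
echo "starts=$starts"
```

Run as written, this prints `respawn refused: service is disabled` followed by `starts=2`: once the flag is cleared, the respawn path goes through the same start check and stops there.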

But I don’t know, there could be a bug.  Could you come up with a
reduced test case (I looked at the one attached but I’m not sure which
part to focus on), or do you have logs of the problem?

Thanks,
Ludo’.



* Re: shepherd: failing test: should `herd stop` stop a respawning process?
From: Attila Lendvai @ 2024-11-22 20:43 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel


> But I don’t know, there could be a bug. Could you come up with a
> reduced test case (I looked at the one attached but I’m not sure which
> part to focus on), or do you have logs of the problem?


ok, i've attached a stripped down version of the test case. it hopefully reproduces the same situation i'm observing on my servers.

which seems to be the following:

  1. i have a service that keeps respawning (typically due to a config
     mistake)

  2. said service is upgraded/replaced in a `guix system reconfigure`

  3. v1 of the service keeps respawning forever, and there's nothing i
     can do to stop it at this point. `herd disable` operates on v2 of
     the service, while some fiber, or some signal handler of v1 is
     still in a respawn loop.
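
The three steps above can be pictured with a toy shell model. It is a sketch of the symptom as described, under loud assumptions: every name (`current`, `enabled_v1`, `enabled_v2`, `disable_current`, `respawn_v1`) is hypothetical, nothing here is taken from the Shepherd's code, and it makes no claim about the actual root cause.

```shell
# Toy model; every name here is hypothetical, none is the Shepherd's API.
current=v1            # which version the registry hands out for the service name
enabled_v1=yes
enabled_v2=yes
respawns_v1=0

# `herd disable` equivalent: looks the service up by name, so it only
# reaches whatever version is currently registered.
disable_current() {
    eval "enabled_$current=no"
}

# Respawn handler that was set up while v1 was registered; it keeps
# consulting v1's own flag.
respawn_v1() {
    if [ "$enabled_v1" = yes ]; then
        respawns_v1=$((respawns_v1 + 1))
    fi
}

respawn_v1            # v1 crashes and is restarted
current=v2            # reconfigure replaces v1 with v2
disable_current       # operator disables the service: only v2 is affected
respawn_v1            # v1 crashes again and is still restarted
respawn_v1

echo "enabled_v1=$enabled_v1 enabled_v2=$enabled_v2 respawns_v1=$respawns_v1"
```

This prints `enabled_v1=yes enabled_v2=no respawns_v1=3`: the disable request reaches only the replacement, so the old handler never sees a reason to stop.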

HTH,

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Our scientific power has outrun our spiritual power. We have guided missiles and misguided men.”
	— Martin Luther King, Jr. (1929–1968, assassinated)

[-- Attachment #2: respawn2.sh --]
[-- Type: application/x-shellscript, Size: 3618 bytes --]


* Re: shepherd: failing test: should `herd stop` stop a respawning process?
From: Ludovic Courtès @ 2024-11-30 18:29 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: guix-devel

Hi Attila,

Attila Lendvai <attila@lendvai.name> skribis:

> ok, i've attached a stripped down version of the test case. it hopefully reproduces the same situation i'm observing on my servers.
>
> which seems to be the following:
>
>   1. i have a service that keeps respawning (typically due to a config
>      mistake)
>
>   2. said service is upgraded/replaced in a `guix system reconfigure`
>
>   3. v1 of the service keeps respawning forever, and there's nothing i
>      can do to stop it at this point. `herd disable` operates on v2 of
>      the service, while some fiber, or some signal handler of v1 is
>      still in a respawn loop.

Thanks for the detailed bug report and test case.  It’s a pretty nasty
bug that you found here.

Commit 5fe594d593e6dcb19e23029bf3ff5f4a77a92523 should fix it.  Let me
know if you notice anything wrong!

Ludo’.



