* Upgrading Shepherd services
@ 2024-05-16 23:26 Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-05-17 10:59 ` Attila Lendvai
2024-05-17 15:20 ` Ludovic Courtès
0 siblings, 2 replies; 10+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-16 23:26 UTC
To: guix-devel
Hi,
I have a lot of custom Shepherd services. Every so often I make a
mistake that stalls the step in 'guix deploy' that upgrades Shepherd
services, but without any error messages.
Unfortunately, I can also no longer run 'herd status', which likewise
hangs, or 'reboot'. How may I debug such issues in my operating-system
declaration, please?
Kind regards
Felix
* Re: Upgrading Shepherd services
From: Attila Lendvai @ 2024-05-17 10:59 UTC
To: Felix Lechner; +Cc: guix-devel
> I have a lot of custom Shepherd services. Every so often I make a
> mistake that stalls the step in 'guix deploy' that upgrades Shepherd
> services, but without any error messages.
>
> Unfortunately, I can also no longer run 'herd status', which likewise
> hangs, or 'reboot'. How may I debug such issues in my operating-system
> declaration, please?
Ludo,
this is the kind of issue for which extensive logging is needed. i.e. there's no self-contained reproducer (or is there, Felix?), and it requires a live environment to experience it.
and i suspect that i may even have fixed this in one of the commits that cleans up shepherd's error handling. one of the issues i remember is that an exception from the start (or stop?) GEXP of a service sometimes brought shepherd into a non-responsive state (without any sign of it in its logs).
Felix,
i'm planning to rebase my branch on Ludo's devel branch. it's not trivial because Ludo continues hacking shepherd, but i'll hopefully do it in the next few days. after that you may give it a try and see if you experience this issue again, and if you do then you can have plenty of logs to give you a clue why/how it happens.
if you do have a reproducer, then i'd be interested in adding it as a test in the shepherd codebase.
https://codeberg.org/attila-lendvai-patches/shepherd/commits/branch/various
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“It is humiliating to realize that when you drive yourself underground, when you fake who you are, often you do so for people you do not even like or respect.”
— Nathaniel Branden (1930–2014)
* Re: Upgrading Shepherd services
From: Ludovic Courtès @ 2024-05-17 15:20 UTC
To: Felix Lechner via Development of GNU Guix and the GNU System distribution.
Cc: Felix Lechner
Hi Felix,
Felix Lechner via "Development of GNU Guix and the GNU System
distribution." <guix-devel@gnu.org> skribis:
> I have a lot of custom Shepherd services. Every so often I make a
> mistake that stalls the step in 'guix deploy' that upgrades Shepherd
> services, but without any error messages.
>
> Unfortunately, I can also no longer run 'herd status', which likewise
> hangs, or 'reboot'. How may I debug such issues in my operating-system
> declaration, please?
The standard service constructors and destructors cannot block
shepherd entirely (at least not AFAIK). So my suggestion would be to
first look at any service you’re using that has a custom ‘start’ or
‘stop’ method doing weird things; make sure none of them can block.
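To illustrate the point about custom ‘start’ methods, here is a minimal sketch of moving the work into a separate process so that it cannot block shepherd (PID 1) itself. The service name ‘my-job’ and its script are hypothetical placeholders, not taken from anyone's configuration; ‘make-forkexec-constructor’ and ‘make-kill-destructor’ are the standard Shepherd constructor and destructor.

```scheme
(use-modules (gnu services shepherd)    ; shepherd-service record
             (guix gexp))               ; #~, #$ and program-file

;; Hypothetical script: the actual work runs in its own process,
;; so an error or a long wait here stays outside of shepherd.
(define my-job-script
  (program-file "my-job"
                #~(begin
                    (display "doing the work\n")
                    (sleep 5))))

;; Hypothetical service: 'start' only forks and execs the script,
;; it does not run the job's code inside the shepherd process.
(define my-job-shepherd-service
  (shepherd-service
   (provision '(my-job))
   (documentation "Run a placeholder job outside of the shepherd process.")
   (respawn? #f)                        ; the script exits on its own
   (start #~(make-forkexec-constructor (list #$my-job-script)))
   (stop #~(make-kill-destructor))))
```

The record can then be handed to ‘shepherd-root-service-type’ in the usual way, for instance through ‘simple-service’.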
Another thing: when shepherd is blocked, try “sudo strace -p1” to see
what syscall it’s waiting to complete (that’s the likely problem).
In addition, check the last lines of /var/log/messages to see what
shepherd was trying to do before blocking.
HTH!
Ludo’.
* Re: Upgrading Shepherd services
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-19 15:55 UTC
To: Ludovic Courtès,
Felix Lechner via Development of GNU Guix and the GNU System distribution.,
Attila Lendvai
Hi Ludo' (and Attila)
On Fri, May 17 2024, Ludovic Courtès wrote:
> look at any service you’re using that has a custom ‘start’ or ‘stop’
> method doing weird things; make sure none of them can block.
Okay, that's probably the source of my problems. I do a lot of
things in Guile in my operating-system declaration (without even
a program-file). It's just too convenient!
The resulting lack of isolation probably causes my issues, although
there seems to be a class of runtime errors causing me trouble that are
not blocking behaviors. (Remember my time with the days of the week
starting with zero instead of one?)
Some newbie errors are hard to debug with Shepherd. In fairness, that's
probably true for all of Guile.
A better way to develop services is probably to use the Shepherd's REPL.
I have done so one time before and am now reading the manual.
The Shepherd may become a real sensation when folks outside Guix become
aware of it. It's a wonderful piece of software.
Please also allow me to address Attila's comments. I cannot say whether
I encountered a bug in Shepherd, or whether Attila and I saw the same
bug. I am sure, however, that the Shepherd's behavior sometimes deviates
from my expectations.
It's probably because I'm not using it right but it can be a real source
of frustration and anxiety at times.
And Attila, as for your interaction with Ludo' I am not sure there is
great value in venting about Ludo' making changes that are difficult to
rebase upon. It is the privilege of a maintainer.
You are not the only one to have felt that frustration.
At the same time, your contributions to the Shepherd could be very
valuable. You are talented and committed to excellence. All you have
to do---if it's not an overreach for me to say so here---is to get
yourself on the same page with Ludo'.
Please forgive my professorial tone.
For example, if Ludo' doesn't want debugging statements all over the
place there must be another plan to capture the output. (Ludo' has not
said how, or I read over it.) There is no point to litigate the details
here, but I would be happy to offer my help to mediate so that your
contributions become more acceptable upstream.
As a rule, I do not contribute to projects where my own direction
diverges too much, unless I offer features that are universally
attractive. Life is too short.
Fortunately, I do not see irreconcilable differences between your
direction and Ludo's but you have to keep an open mind.
I write in peace.
Felix
P.S. I'm looking out for a reproducer!
* Re: Upgrading Shepherd services
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-23 2:04 UTC
To: Attila Lendvai; +Cc: guix-devel
Hi Attila,
On Fri, May 17 2024, Attila Lendvai wrote:
> if you do have a reproducer
Here is a small one for not booting, although the service activation
during 'guix deploy' succeeds.
Please try the Guix timer below with the Shepherd development branch.
My equipment does not boot when the apparently erroneous (actions ...)
field in the shepherd-service record is present.
Kind regards,
Felix
P.S. Advice on how to access the trigger would be welcome.
* * *
(define (garbage-collector-shepherd-service config)
  (shepherd-service
   (provision '(garbage-collector))
   (requirement '(guix-daemon))
   (modules '((shepherd service timer)))
   (start #~(make-timer-constructor
             ;; Five minutes after midnight every day.
             (calendar-event #:hours '(0) #:minutes '(5))
             (command (list "guix" "gc" "--free-space=1G"))))
   (stop #~(make-timer-destructor))
   (actions
    (list (shepherd-action
           (name 'trigger)
           (documentation "Trigger the action associated with this timer.")
           (procedure #~(identity trigger-timer)))))
   (documentation "Maintain minimum free space by cleaning up Guix garbage")))
(define garbage-collector-service-type
  (service-type
   (name 'garbage-collector)
   (description
    "Maintain minimum free space by cleaning up Guix garbage")
   (extensions
    (list
     (service-extension shepherd-root-service-type
                        (compose list garbage-collector-shepherd-service))))
   (default-value #f)))
* Re: Upgrading Shepherd services
From: Attila Lendvai @ 2024-05-23 19:25 UTC
To: Felix Lechner; +Cc: guix-devel
hi Felix,
> Here is a small one for not booting, although the service activation
> during 'guix deploy' succeeds.
>
> Please try the Guix timer below with the Shepherd development branch.
> My equipment does not boot when the apparently erroneous (actions ...)
> field in the shepherd-service record is present.
i cannot reproduce this.
maybe it fails for you due to some missing modules that are available in my test env?
this below is with my shepherd branch, but later i double checked with vanilla 'devel', and it works the same.
# herd trigger garbage-collector
Triggering timer.
#
herd[210]: [debug] Got a reply, processing it
shepherd[1]: [debug] fork+exec-command for (guix gc --free-space=1G), user #f, group #f, supplementary-groups (), log-file #f
shepherd[1]: [debug] exec-command for (guix gc --free-space=1G), user #f, group #f, supplementary-groups (), log-file #f, log-port #
shepherd[1]: Timer 'garbage-collector' spawned process 212.
shepherd[1]: [debug] query-service-controller; message status, service #<<service> provision: (garbage-collector) requirement: (guix
shepherd[1]: [debug] query-service-controller; message running, service #<<service> provision: (garbage-collector) requirement: (gui
shepherd[1]: [guix] guix gc: already 30082.59 MiBs available on /gnu/store, nothing to do
shepherd[1]: Process 212 of timer 'garbage-collector' terminated with status 0 after 1 seconds.
HTH,
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“There are two ways to be fooled. One is to believe what isn't true; the other is to refuse to believe what is true.”
— Søren Kierkegaard (1813–1855)
* Re: Upgrading Shepherd services
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-24 4:17 UTC
To: Attila Lendvai; +Cc: guix-devel
Hi Attila,
On Thu, May 23 2024, Attila Lendvai wrote:
> maybe it fails for you due to some missing modules that are available
> in my test env?
Thanks for trying that out locally! Still no go here. I can restart
the upgraded services and trigger the timers, but my system won't boot.
> this below is with my shepherd branch
I see some services starting but no errors on the console. Also, there
is absolutely nothing in /var/log/messages. Would it help to diagnose
it using your Shepherd branch?
Kind regards
Felix
* Re: Upgrading Shepherd services
From: Attila Lendvai @ 2024-05-24 17:19 UTC
To: Felix Lechner; +Cc: guix-devel
> I see some services starting but no errors on the console. Also, there
> is absolutely nothing in /var/log/messages. Would it help to diagnose
> it using your Shepherd branch?
yep, in two ways: my branch has extensive logging (and currently its default level is set to debug), and i also reworked and extended the error handling.
my expectation is that your machine should both start up, and also emit some useful log why that specific service is failing.
if that is not the case, then i'd really love to see a self-contained reproducer.
if you want to dig deeper towards a reproducer, then one option is to try to write a guix system test that reproduces it (see gnu/tests/ for examples, and `make check-system`).
to use my shepherd channel:
(channel
 (name 'shepherd)
 (url "https://codeberg.org/attila-lendvai-patches/shepherd.git")
 (branch "attila")
 (introduction
  (make-channel-introduction
   ;; note that this commit id changes whenever i rebase and force-push my commits
   "13557ba988f4976f6581149ecdc06fce031258c7"
   (openpgp-fingerprint
    "69DA 8D74 F179 7AD6 7806 EE06 FEFA 9FE5 5CF6 E3CD"))))
and in your OS definition follow the instructions that are now in the shepherd README.
HTH,
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Gradualism in theory is perpetuity in practice.”
— Jared Howe
* Re: Upgrading Shepherd services
From: Attila Lendvai @ 2024-05-25 9:02 UTC
To: Felix Lechner
Cc: Ludovic Courtès,
Felix Lechner via Development of GNU Guix and the GNU System distribution.
hi Felix,
> And Attila, as for your interaction with Ludo' I am not sure there is
> great value in venting about Ludo' making changes that are difficult to
> rebase upon. It is the privilege of a maintainer.
>
> You are not the only one to have felt that frustration.
well, i have two hats on in this situation:
when it's my developer hat on, then i agree with you.
but when i have the enthusiastic guix user hat on... then i'm a bit concerned that shepherd seems to be a one-bus project (https://chaoss.community/kb/metric-bus-factor/). and it has issues that are stopping me from using guix in ways that i'd like to... which is why i sometimes put the developer hat on, and then send my contributions... which are then met with... well... a moderate level of enthusiasm.
now, i, the dev, understand Ludo's perspective: i also prefer spending my free time hacking ahead on the joyous path of my own plans and inspirations, instead of reviewing contributions.
but one of these contributions was a fix for a long-standing, and rather hard to find bug (that, BTW, also caused the recent, multi-day outage of several guix services). and the rest of the commits in my branch are mostly "just" the means to finding bugs like that, including the ones in my own services. and it's reasonable to expect that these commits will be useful for finding future bugs, too. and i, the user, am somewhat concerned about the way such contributions are greeted.
now, the situation is tricky here, because i'm both guys... :) and the concerned voice of the enthusiastic user sure sounds like the whining of a self-righteous, misunderstood genius... so, yeah. but here we are nevertheless.
> At the same time, your contributions to the Shepherd could be very
> valuable. You are talented and committed to excellence. All you have
> to do---if it's not an overreach for me to say so here---is to get
> yourself on the same page with Ludo'.
that sounds like a monarchy, but my preferred locales are meritocracies... ;)
yet, i think i'm still going the extra mile for now, and i'm jumping even those hoops that i find arbitrary (even if i argue against them in the process).
> Please forgive my professorial tone.
no, it's welcome, i appreciate your feedback! it has helped me to understand my internal dev vs. user conflict.
> For example, if Ludo' doesn't want debugging statements all over the
> place there must be another plan to capture the output. (Ludo' has not
> said how, or I read over it.) There is no point to litigate the details
ultimately, you can't escape the fact that only the programmer knows what state is useful in a sequential log for understanding the dynamic behavior of a codebase. and "log statements scattered around the codebase" are exactly those annotations. and in addition they also serve as comments, only "smart" ones that are also observable at runtime when needed.
> here, but I would be happy to offer my help to mediate so that your
> contributions become more acceptable upstream.
>
> As a rule, I do not contribute to projects where my own direction
> diverges too much, unless I offer features that are universally
> attractive. Life is too short.
sure, i get it. and with only my programmer hat on, i wouldn't even be here writing this mail... but with my enthusiastic user hat on, i'm all the more concerned about that sentiment!
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“The true test of intelligence is not how much we know how to do, but how we behave when we don't know what to do.”
— John Holt (1923–1985), 'How Children Fail' (1964)
* Re: Upgrading Shepherd services
From: Ludovic Courtès @ 2024-06-01 13:24 UTC
To: Felix Lechner via Development of GNU Guix and the GNU System distribution.
Cc: Attila Lendvai, Felix Lechner
Hello!
Felix Lechner via "Development of GNU Guix and the GNU System
distribution." <guix-devel@gnu.org> skribis:
> (define (garbage-collector-shepherd-service config)
FWIW, if it can help, I have something very similar in my config:
--8<---------------cut here---------------start------------->8---
(define %gc-service
  (simple-service
   'gc shepherd-root-service-type
   (list (shepherd-service
          (provision '(gc))
          (requirement '(user-processes))
          (modules '((shepherd service timer)))
          (start #~(make-timer-constructor
                    (calendar-event #:minutes '(0))
                    (command '("/run/current-system/profile/bin/guix" "gc" "-F2G")
                             #:user "ludo")))
          (stop #~(make-timer-destructor))
          (documentation "Run the garbage collector (GC).")
          (actions
           (list (shepherd-action
                  (name 'trigger)
                  (documentation "GC!")
                  (procedure #~trigger-timer))))))))
--8<---------------cut here---------------end--------------->8---
… and I add ‘%gc-service’ to my ‘services’ field.
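As a sketch of that last step, assuming a configuration that otherwise builds on ‘%base-services’ (the name ‘%my-services’ is made up for illustration):

```scheme
(use-modules (gnu services base))       ; for %base-services

;; Hypothetical list to pass as the 'services' field of an
;; operating-system declaration; %gc-service is the definition above.
(define %my-services
  (cons %gc-service %base-services))
```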
Works as advertised:
--8<---------------cut here---------------start------------->8---
$ sudo /run/current-system/profile/bin/herd status gc
Password:
Status of gc:
It is running since Mon 27 May 2024 08:39:28 AM CEST (5 days ago).
Timed service.
Periodically running as "ludo": /run/current-system/profile/bin/guix gc -F2G.
It is enabled.
Provides: gc.
Requires: user-processes.
Will be respawned.
Recent runs:
2024-05-31 21:00:00 Process exited successfully after 0 seconds.
2024-05-31 22:00:00 Process exited successfully after 0 seconds.
2024-05-31 23:00:00 Process exited successfully after 0 seconds.
2024-06-01 00:00:00 Process exited successfully after 0 seconds.
2024-06-01 01:00:00 Process exited successfully after 0 seconds.
2024-06-01 11:00:00 Process exited successfully after 0 seconds.
2024-06-01 12:00:00 Process exited successfully after 0 seconds.
2024-06-01 13:00:00 Process exited successfully after 0 seconds.
2024-06-01 14:00:00 Process exited successfully after 0 seconds.
2024-06-01 15:00:00 Process exited successfully after 0 seconds.
Recent messages:
2024-06-01 15:00:00 guix gc: already 4169.06 MiBs available on /gnu/store, nothing to do
Upcoming timer alarms:
04:00:00 PM (in 37 minutes)
05:00:00 PM (in 97 minutes)
06:00:00 PM (in 3 hours)
07:00:00 PM (in 4 hours)
08:00:00 PM (in 5 hours)
--8<---------------cut here---------------end--------------->8---
HTH!
Ludo’.