* bug#58926: Shepherd becomes unresponsive after an interrupt
@ 2022-10-31 12:44 Mathieu Othacehe
2022-11-10 9:59 ` Ludovic Courtès
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Mathieu Othacehe @ 2022-10-31 12:44 UTC
To: 58926
Hello,
When running the following command:
--8<---------------cut here---------------start------------->8---
sudo herd restart service-that-hangs-upon-restart
--8<---------------cut here---------------end--------------->8---
then hitting C-c, Shepherd becomes totally unresponsive:
--8<---------------cut here---------------start------------->8---
sudo herd status
--8<---------------cut here---------------end--------------->8---
and all further Shepherd commands hang forever. I was able to reproduce
it in two different configurations:
1. On my laptop with a Wireguard service trying to reach a non-existing
DNS server.
--8<---------------cut here---------------start------------->8---
(service wireguard-service-type
         (wireguard-configuration
          (addresses (list "10.0.0.2/24"))
          (dns '("10.0.0.50")) ;does not exist
--8<---------------cut here---------------end--------------->8---
2. On Berlin, while trying to restart nginx.
In both situations, the "reboot" command was also hanging.
Thanks,
Mathieu
* bug#58926: Shepherd becomes unresponsive after an interrupt
From: Ludovic Courtès @ 2022-11-10 9:59 UTC
To: Mathieu Othacehe; +Cc: 58926
Hi,
Mathieu Othacehe <othacehe@gnu.org> skribis:
> sudo herd restart service-that-hangs-upon-restart
>
>
> then hitting C-c, Shepherd becomes totally unresponsive:
>
> sudo herd status
>
>
> and all further Shepherd commands hang forever. I was able to reproduce
> it in two different configurations:
>
> 1. On my laptop with a Wireguard service trying to reach a non-existing
> DNS server.
>
> (service wireguard-service-type
> (wireguard-configuration
> (addresses (list "10.0.0.2/24"))
> (dns '("10.0.0.50")) ;does not exist
>
> 2. On Berlin, while trying to restart nginx.
I experienced case #2: in that case ‘strace -p1’ showed that shepherd
was stuck on waitpid of the nginx process, which was not terminating.
Killing that process would unlock shepherd.
This might be <https://issues.guix.gnu.org/56674>.
Would be good to see what’s up with WireGuard.
Ludo’.
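The behaviour described in this message (a parent stuck in waitpid until the child is killed) can be reproduced in isolation. The following is a minimal Python sketch, not Shepherd code: `sleep 30` stands in for a daemon that will not terminate, a timer stands in for the administrator killing it, and the function name `wait_until_killed` is hypothetical.

```python
import os
import signal
import threading
import time


def wait_until_killed(kill_after=0.5):
    """Spawn a long-running child, block in waitpid, and kill it from a timer."""
    # "sleep 30" stands in for a daemon that refuses to terminate.
    pid = os.posix_spawnp("sleep", ["sleep", "30"], os.environ)

    # The timer plays the role of someone killing the stuck process by hand.
    timer = threading.Timer(kill_after, os.kill, args=(pid, signal.SIGKILL))
    timer.start()

    start = time.monotonic()
    _, status = os.waitpid(pid, 0)  # blocks: this thread can do nothing else
    blocked_for = time.monotonic() - start
    timer.join()
    return blocked_for, status


if __name__ == "__main__":
    blocked_for, status = wait_until_killed()
    print(f"waitpid returned after {blocked_for:.2f}s, "
          f"child killed by signal {os.WTERMSIG(status)}")
```

While the thread sits in `os.waitpid(pid, 0)` it cannot serve any other request, which matches the hang seen from the herd side.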
* bug#58926: Shepherd becomes unresponsive after an interrupt
From: Ludovic Courtès @ 2022-11-12 18:10 UTC
To: Mathieu Othacehe; +Cc: 53225, 58926
Mathieu Othacehe <othacehe@gnu.org> skribis:
> 1. On my laptop with a Wireguard service trying to reach a non-existing
> DNS server.
>
> (service wireguard-service-type
> (wireguard-configuration
> (addresses (list "10.0.0.2/24"))
> (dns '("10.0.0.50")) ;does not exist
This one is similar to:
https://issues.guix.gnu.org/53225
https://issues.guix.gnu.org/53381
It has to do with the fact that “wg-quick up” blocks until it succeeds
and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
terminates.
The solution will be to use something non-blocking instead of ‘invoke’;
I’m looking into it.
Ludo’.
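To make the contrast concrete, here is a small Python sketch of the two waiting strategies. It is an illustration only and assumes nothing about how Shepherd implements its fix: `run_blocking` mimics the blocking behaviour attributed to ‘invoke’ above, while `run_polling` checks the child with `WNOHANG` and hands control back to the caller between checks. Both function names and the `on_idle` callback are hypothetical.

```python
import os
import time


def run_blocking(argv):
    """Run a command and block in waitpid until it terminates."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)  # no other work can happen here
    return os.waitstatus_to_exitcode(status)


def run_polling(argv, on_idle, interval=0.05):
    """Run a command, calling on_idle() between non-blocking waitpid checks."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    while True:
        # With WNOHANG, waitpid returns (0, 0) while the child is still running.
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.waitstatus_to_exitcode(status)
        on_idle()  # the caller stays free to handle other requests
        time.sleep(interval)


if __name__ == "__main__":
    ticks = []
    code = run_polling(["sleep", "1"], lambda: ticks.append(1))
    print(f"exit code {code}, handled {len(ticks)} idle ticks while waiting")
```

With `run_blocking`, a command that never exits stalls the caller forever; with `run_polling`, the caller keeps getting a chance to do other work, at the cost of a polling interval.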
* bug#58926: Shepherd becomes unresponsive after an interrupt
From: Ludovic Courtès @ 2022-11-12 18:28 UTC
To: Mathieu Othacehe; +Cc: 58926
Mathieu Othacehe <othacehe@gnu.org> skribis:
> then hitting C-c, Shepherd becomes totally unresponsive:
>
> sudo herd status
>
>
> and all further Shepherd commands hang forever. I was able to reproduce
> it in two different configurations:
[...]
> 2. On Berlin, while trying to restart nginx.
I can’t reproduce it in a VM.
Before I try it on a production system :-), does anyone have a tip on
how to reproduce it? Or perhaps strace output from a system that
exhibits this bug?
TIA!
Ludo’.
* bug#58926: Shepherd becomes unresponsive after an interrupt
2022-11-13 23:16 ` Ludovic Courtès
@ 2022-11-14 16:32 ` Ludovic Courtès
0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-11-14 16:32 UTC
To: 56674; +Cc: Mathieu Othacehe, 58926
Hello!
Ludovic Courtès <ludo@gnu.org> skribis:
> These fresh Shepherd commits install a non-blocking ‘system*’ replacement:
>
> 975b0aa service: Provide a non-blocking replacement of 'system*'.
> 039c7a8 service: Spawn a fiber responsible for process monitoring.
>
> We’ll have to do more testing and probably go for a 0.9.3 release soon.
Shepherd commit ada88074f0ab7551fd0f3dce8bf06de971382e79 passes my
tests. It definitely solves the wireguard example and similar things
(uses of ‘system*’ in service constructors/destructors); I can’t tell
for sure about nginx because I haven’t been able to reproduce it in a
VM. I’m interested in ways to reproduce it.
It does look like we could go with 0.9.3 real soon now.
Ludo’.
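The idea behind a non-blocking ‘system*’ replacement can be sketched with cooperative tasks. The Python example below uses asyncio as a rough analogue of fibers; it is not the Shepherd implementation, and `spawn_command`, `handle_status_requests` and `main` are hypothetical names. One task waits for a child process while another keeps answering "status" requests on the same thread.

```python
import asyncio


async def spawn_command(*argv):
    """Start a command and wait for it without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(*argv)
    return await proc.wait()  # suspends only this task


async def handle_status_requests(log, stop):
    """Stand-in for the daemon answering requests while a command runs."""
    while not stop.is_set():
        log.append("status ok")
        await asyncio.sleep(0.05)


async def main():
    log = []
    stop = asyncio.Event()
    responder = asyncio.create_task(handle_status_requests(log, stop))
    code = await spawn_command("sleep", "1")  # slow service constructor
    stop.set()
    await responder
    return code, log


if __name__ == "__main__":
    code, log = asyncio.run(main())
    print(f"command exited with {code}; answered {len(log)} status requests")
```

Because waiting for the child only suspends one task, a slow or stuck command no longer prevents the other task from responding.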
* bug#53225: bug#58926: Shepherd becomes unresponsive after an interrupt
From: Ludovic Courtès @ 2022-11-17 10:23 UTC
To: Mathieu Othacehe; +Cc: 53225-done, 58926-done
Hi,
Ludovic Courtès <ludo@gnu.org> skribis:
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> 1. On my laptop with a Wireguard service trying to reach a non-existing
>> DNS server.
>>
>> (service wireguard-service-type
>> (wireguard-configuration
>> (addresses (list "10.0.0.2/24"))
>> (dns '("10.0.0.50")) ;does not exist
>
> This one is similar to:
>
> https://issues.guix.gnu.org/53225
> https://issues.guix.gnu.org/53381
>
> It has to do with the fact that “wg-quick up” blocks until it succeeds
> and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
> terminates.
>
> The solution will be to use something non-blocking instead of ‘invoke’;
> I’m looking into it.
This is fixed in Shepherd 0.9.3, which landed in Guix commit
283d7318c5b312d7129adb6dbeea6ad205ce89d1.
As I wrote, I’m not sure whether it fixes the nginx situation since I
could not reproduce it. I’m closing and let’s open a new issue
specifically for nginx if it comes up again with 0.9.3.
Thanks,
Ludo’.
-- strict thread matches above, loose matches on Subject: below --
2022-07-20 21:39 bug#56674: [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks Ludovic Courtès
2022-11-13 23:16 ` Ludovic Courtès
2022-11-14 16:32 ` bug#58926: Shepherd becomes unresponsive after an interrupt Ludovic Courtès