unofficial mirror of bug-guix@gnu.org 
* bug#58926: Shepherd becomes unresponsive after an interrupt
@ 2022-10-31 12:44 Mathieu Othacehe
  2022-11-10  9:59 ` Ludovic Courtès
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Mathieu Othacehe @ 2022-10-31 12:44 UTC (permalink / raw)
  To: 58926


Hello,

When running the following command:

--8<---------------cut here---------------start------------->8---
sudo herd restart service-that-hangs-upon-restart
--8<---------------cut here---------------end--------------->8---

then hitting C-c, Shepherd becomes totally unresponsive:

--8<---------------cut here---------------start------------->8---
sudo herd status
--8<---------------cut here---------------end--------------->8---

and all further Shepherd commands hang forever. I was able to reproduce
it in two different configurations:

1. On my laptop with a Wireguard service trying to reach a non-existing
DNS server.

--8<---------------cut here---------------start------------->8---
            (service wireguard-service-type
                     (wireguard-configuration
                      (addresses (list "10.0.0.2/24"))
                      (dns '("10.0.0.50")) ; does not exist
--8<---------------cut here---------------end--------------->8---

2. On Berlin, while trying to restart nginx.

In both situations, the "reboot" command was also hanging.

Thanks,

Mathieu





* bug#58926: Shepherd becomes unresponsive after an interrupt
  2022-10-31 12:44 bug#58926: Shepherd becomes unresponsive after an interrupt Mathieu Othacehe
@ 2022-11-10  9:59 ` Ludovic Courtès
  2022-11-12 18:10 ` Ludovic Courtès
  2022-11-12 18:28 ` Ludovic Courtès
  2 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-11-10  9:59 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 58926

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

> sudo herd restart service-that-hangs-upon-restart
>
>
> then hitting C-c, Shepherd becomes totally unresponsive:
>
> sudo herd status
>
>
> and all further Shepherd commands hang forever. I was able to reproduce
> it in two different configurations:
>
> 1. On my laptop with a Wireguard service trying to reach a non-existing
> DNS server.
>
>             (service wireguard-service-type
>                      (wireguard-configuration
>                       (addresses (list "10.0.0.2/24"))
>                       (dns '("10.0.0.50")) ; does not exist
>
> 2. On Berlin, while trying to restart nginx.

I experienced case #2: in that case ‘strace -p1’ showed that shepherd
was stuck on waitpid of the nginx process, which was not terminating.
Killing that process would unlock shepherd.

This might be <https://issues.guix.gnu.org/56674>.

Would be good to see what’s up with WireGuard.

Ludo’.





* bug#58926: Shepherd becomes unresponsive after an interrupt
  2022-10-31 12:44 bug#58926: Shepherd becomes unresponsive after an interrupt Mathieu Othacehe
  2022-11-10  9:59 ` Ludovic Courtès
@ 2022-11-12 18:10 ` Ludovic Courtès
  2022-11-17 10:23   ` bug#53225: " Ludovic Courtès
  2022-11-12 18:28 ` Ludovic Courtès
  2 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2022-11-12 18:10 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 53225, 58926

Mathieu Othacehe <othacehe@gnu.org> skribis:

> 1. On my laptop with a Wireguard service trying to reach a non-existing
> DNS server.
>
>             (service wireguard-service-type
>                      (wireguard-configuration
>                       (addresses (list "10.0.0.2/24"))
>                       (dns '("10.0.0.50")) ; does not exist

This one is similar to:

  https://issues.guix.gnu.org/53225
  https://issues.guix.gnu.org/53381

It has to do with the fact that “wg-quick up” blocks until it succeeds
and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
terminates.

The solution will be to use something non-blocking instead of ‘invoke’;
I’m looking into it.
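
To illustrate the difference between the two styles, here is a rough
shell sketch. It is only an analogy and not how the Shepherd implements
the fix: 'slow_command', 'run_blocking' and 'run_polling' are
hypothetical names, and 'sleep 1' merely stands in for a command such as
"wg-quick up" that takes a while to return.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that takes a while to finish,
# such as "wg-quick up" waiting on an unreachable DNS server.
slow_command() { sleep 1; }

# Blocking style: the caller is stuck in 'wait' until the child exits,
# so it cannot answer any other request in the meantime.
run_blocking() {
    slow_command &
    wait "$!"
}

# Non-blocking style: start the child, then poll it, staying free to do
# other work between polls.
run_polling() {
    slow_command &
    child=$!
    ticks=0
    while kill -0 "$child" 2>/dev/null; do
        ticks=$((ticks + 1))    # placeholder for "serve other requests"
        sleep 1
    done
    wait "$child"               # collect the child's exit status
    status=$?
    echo "child exited with status $status after $ticks poll(s)"
}

run_polling
```

With 'run_blocking', a child that never terminates would hold up the
caller indefinitely, which matches the symptom in this report; with
'run_polling', the caller keeps getting control back between polls.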

Ludo’.





* bug#58926: Shepherd becomes unresponsive after an interrupt
  2022-10-31 12:44 bug#58926: Shepherd becomes unresponsive after an interrupt Mathieu Othacehe
  2022-11-10  9:59 ` Ludovic Courtès
  2022-11-12 18:10 ` Ludovic Courtès
@ 2022-11-12 18:28 ` Ludovic Courtès
  2 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-11-12 18:28 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 58926

Mathieu Othacehe <othacehe@gnu.org> skribis:

> then hitting C-c, Shepherd becomes totally unresponsive:
>
> sudo herd status
>
>
> and all further Shepherd commands hang forever. I was able to reproduce
> it in two different configurations:

[...]

> 2. On Berlin, while trying to restart nginx.

I can’t reproduce it in a VM.

Before I try it on a production system :-), does anyone have a tip on
how to reproduce it?  Or perhaps strace output from a system that
exhibits this bug?

TIA!

Ludo’.





* bug#53225: bug#58926: Shepherd becomes unresponsive after an interrupt
  2022-11-12 18:10 ` Ludovic Courtès
@ 2022-11-17 10:23   ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-11-17 10:23 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 53225-done, 58926-done

Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> 1. On my laptop with a Wireguard service trying to reach a non-existing
>> DNS server.
>>
>>             (service wireguard-service-type
>>                      (wireguard-configuration
>>                       (addresses (list "10.0.0.2/24"))
>>                       (dns '("10.0.0.50")) ; does not exist
>
> This one is similar to:
>
>   https://issues.guix.gnu.org/53225
>   https://issues.guix.gnu.org/53381
>
> It has to do with the fact that “wg-quick up” blocks until it succeeds
> and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
> terminates.
>
> The solution will be to use something non-blocking instead of ‘invoke’;
> I’m looking into it.

This is fixed in the Shepherd 0.9.3, which landed in Guix commit
283d7318c5b312d7129adb6dbeea6ad205ce89d1.

As I wrote, I’m not sure whether it fixes the nginx situation since I
could not reproduce it.  I’m closing this; if the nginx problem comes up
again with 0.9.3, let’s open a new issue specifically for it.

Thanks,
Ludo’.





* bug#58926: Shepherd becomes unresponsive after an interrupt
  2022-11-13 23:16 ` Ludovic Courtès
@ 2022-11-14 16:32   ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-11-14 16:32 UTC (permalink / raw)
  To: 56674; +Cc: Mathieu Othacehe, 58926

Hello!

Ludovic Courtès <ludo@gnu.org> skribis:

> These fresh Shepherd commits install a non-blocking ‘system*’ replacement:
>
>   975b0aa service: Provide a non-blocking replacement of 'system*'.
>   039c7a8 service: Spawn a fiber responsible for process monitoring.
>
> We’ll have to do more testing and probably go for a 0.9.3 release soon.

Shepherd commit ada88074f0ab7551fd0f3dce8bf06de971382e79 passes my
tests.  It definitely solves the wireguard example and similar things
(uses of ‘system*’ in service constructors/destructors); I can’t tell
for sure about nginx because I haven’t been able to reproduce it in a
VM.  I’m interested in ways to reproduce it.

It does look like we could go with 0.9.3 real soon now.

Ludo’.





Thread overview: 6+ messages
2022-10-31 12:44 bug#58926: Shepherd becomes unresponsive after an interrupt Mathieu Othacehe
2022-11-10  9:59 ` Ludovic Courtès
2022-11-12 18:10 ` Ludovic Courtès
2022-11-17 10:23   ` bug#53225: " Ludovic Courtès
2022-11-12 18:28 ` Ludovic Courtès
  -- strict thread matches above, loose matches on Subject: below --
2022-07-20 21:39 bug#56674: [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks Ludovic Courtès
2022-11-13 23:16 ` Ludovic Courtès
2022-11-14 16:32   ` bug#58926: Shepherd becomes unresponsive after an interrupt Ludovic Courtès
