unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#67839: shepherd: sometimes hangs on `guix system reconfigure`
@ 2023-12-15 19:20 Attila Lendvai
  2023-12-15 19:37 ` bug#67839: [PATCH 1/2] shepherd: Move root-service start under with-process-monitor Attila Lendvai
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Attila Lendvai @ 2023-12-15 19:20 UTC (permalink / raw)
  To: 67839

my fellow hackers,

i'm going to attach two patches that is essentially just adding a couple of asserts that trigger a test failure (tests/replacement.sh).

my current codebase (https://codeberg.org/attila-lendvai-patches/shepherd/commits/branch/attila) logs a whole lot more information, and has a more sophisticated error handling. triggering the same error on that codebase shows that the first assert is already failing (the one that is before spawning the new fiber for the controller of the replacement).

maybe the root cause is this: https://github.com/wingo/fibers/issues/29

HTH,

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Angry people want you to see how powerful they are… loving people want you to see how powerful You are.”
	— Chief Red Eagle





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#67839: [PATCH 1/2] shepherd: Move root-service start under with-process-monitor.
  2023-12-15 19:20 bug#67839: shepherd: sometimes hangs on `guix system reconfigure` Attila Lendvai
@ 2023-12-15 19:37 ` Attila Lendvai
  2023-12-15 19:37   ` bug#67839: [PATCH 2/2] service: Add two asserts that will make tests/replacement.sh fail Attila Lendvai
  2023-12-17  0:06 ` bug#67839: [PATCH 3/2] shepherd: Fix tests/replacement.sh Attila Lendvai
  2023-12-17  0:44 ` bug#67839: [PATCH v2 1/2] shepherd: Make sure with-process-monitor covers everything needed Attila Lendvai
  2 siblings, 1 reply; 7+ messages in thread
From: Attila Lendvai @ 2023-12-15 19:37 UTC (permalink / raw)
  To: 67839; +Cc: Attila Lendvai

* modules/shepherd.scm (main): move the (start-service root-service) under the
dynamic extent of with-process-monitor, so that (current-process-monitor) is
valid for the root-service, too.
---
 modules/shepherd.scm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index efc5517..77c6d18 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -451,12 +451,12 @@ fork in the child process."
         (run-fibers
          (lambda ()
            (with-service-registry
+             (with-process-monitor
 
-             ;; Register and start the 'root' service.
-             (register-services (list root-service))
-             (start-service root-service)
+               ;; Register and start the 'root' service.
+               (register-services (list root-service))
+               (start-service root-service)
 
-             (with-process-monitor
                ;; Replace the default 'system*' binding with one that
                ;; cooperates instead of blocking on 'waitpid'.  Replace
                ;; 'primitive-load' (in C as of 3.0.9) with one that does
-- 
2.41.0





^ permalink raw reply related	[flat|nested] 7+ messages in thread

* bug#67839: [PATCH 2/2] service: Add two asserts that will make tests/replacement.sh fail.
  2023-12-15 19:37 ` bug#67839: [PATCH 1/2] shepherd: Move root-service start under with-process-monitor Attila Lendvai
@ 2023-12-15 19:37   ` Attila Lendvai
  0 siblings, 0 replies; 7+ messages in thread
From: Attila Lendvai @ 2023-12-15 19:37 UTC (permalink / raw)
  To: 67839; +Cc: Attila Lendvai

* modules/shepherd/service.scm (spawn-service-controller): Add two asserts.
This is the bug that causes `guix system reconfigure ...` to sometimes hang,
and subsequently all shepherd commands, because a match-error flies out from
the service-controller of a replaced service, and thus its fiber dies.
---
 modules/shepherd/service.scm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index c3bdf44..0ee6929 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -382,9 +382,11 @@ denoting what the service provides."
 
 (define (spawn-service-controller service)
   "Return a channel over which @var{service} may be controlled."
+  (assert (current-process-monitor))
   (let ((channel (make-channel)))
     (spawn-fiber
      (lambda ()
+       (assert (current-process-monitor))
        ;; The controller writes to its current output port via 'local-output'.
        ;; Make sure that goes to the right port.  If the controller got a
        ;; wrong output port, it could crash and stop responding just because a
-- 
2.41.0





^ permalink raw reply related	[flat|nested] 7+ messages in thread

* bug#67839: [PATCH 3/2] shepherd: Fix tests/replacement.sh
  2023-12-15 19:20 bug#67839: shepherd: sometimes hangs on `guix system reconfigure` Attila Lendvai
  2023-12-15 19:37 ` bug#67839: [PATCH 1/2] shepherd: Move root-service start under with-process-monitor Attila Lendvai
@ 2023-12-17  0:06 ` Attila Lendvai
  2023-12-17  0:44 ` bug#67839: [PATCH v2 1/2] shepherd: Make sure with-process-monitor covers everything needed Attila Lendvai
  2 siblings, 0 replies; 7+ messages in thread
From: Attila Lendvai @ 2023-12-17  0:06 UTC (permalink / raw)
  To: 67839; +Cc: Attila Lendvai

* modules/shepherd.scm (main): Switch with-service-registry and
with-process-monitor.  Fix proposed by @emixa-d at
https://github.com/wingo/fibers/issues/29#issuecomment-1858922276.  This way
the parameterize of the process monitor covers everything else.
---
 modules/shepherd.scm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 77c6d18..3303de3 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -450,8 +450,8 @@ fork in the child process."
         ;; because POSIX threads and 'fork' cannot be used together.
         (run-fibers
          (lambda ()
-           (with-service-registry
-             (with-process-monitor
+           (with-process-monitor
+             (with-service-registry
 
                ;; Register and start the 'root' service.
                (register-services (list root-service))
-- 
2.41.0





^ permalink raw reply related	[flat|nested] 7+ messages in thread

* bug#67839: [PATCH v2 1/2] shepherd: Make sure with-process-monitor covers everything needed.
  2023-12-15 19:20 bug#67839: shepherd: sometimes hangs on `guix system reconfigure` Attila Lendvai
  2023-12-15 19:37 ` bug#67839: [PATCH 1/2] shepherd: Move root-service start under with-process-monitor Attila Lendvai
  2023-12-17  0:06 ` bug#67839: [PATCH 3/2] shepherd: Fix tests/replacement.sh Attila Lendvai
@ 2023-12-17  0:44 ` Attila Lendvai
  2023-12-17  0:44   ` bug#67839: [PATCH v2 2/2] service: Add asserts that used to make tests/replacement.sh fail Attila Lendvai
  2023-12-17 21:59   ` bug#67839: shepherd: sometimes hangs on `guix system reconfigure` Ludovic Courtès
  2 siblings, 2 replies; 7+ messages in thread
From: Attila Lendvai @ 2023-12-17  0:44 UTC (permalink / raw)
  To: 67839; +Cc: Attila Lendvai

* modules/shepherd.scm (main): Switch with-service-registry and
with-process-monitor.  This way the parameterize of the process monitor covers
everything else.  This fixes the bug that caused `guix system reconfigure` to
hang in certain situations.  Fix proposed by @emixa-d at:
https://github.com/wingo/fibers/issues/29#issuecomment-1858922276.
---
 modules/shepherd.scm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index efc5517..3303de3 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -450,13 +450,13 @@ fork in the child process."
         ;; because POSIX threads and 'fork' cannot be used together.
         (run-fibers
          (lambda ()
-           (with-service-registry
+           (with-process-monitor
+             (with-service-registry
 
-             ;; Register and start the 'root' service.
-             (register-services (list root-service))
-             (start-service root-service)
+               ;; Register and start the 'root' service.
+               (register-services (list root-service))
+               (start-service root-service)
 
-             (with-process-monitor
                ;; Replace the default 'system*' binding with one that
                ;; cooperates instead of blocking on 'waitpid'.  Replace
                ;; 'primitive-load' (in C as of 3.0.9) with one that does
-- 
2.41.0





^ permalink raw reply related	[flat|nested] 7+ messages in thread

* bug#67839: [PATCH v2 2/2] service: Add asserts that used to make tests/replacement.sh fail.
  2023-12-17  0:44 ` bug#67839: [PATCH v2 1/2] shepherd: Make sure with-process-monitor covers everything needed Attila Lendvai
@ 2023-12-17  0:44   ` Attila Lendvai
  2023-12-17 21:59   ` bug#67839: shepherd: sometimes hangs on `guix system reconfigure` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Attila Lendvai @ 2023-12-17  0:44 UTC (permalink / raw)
  To: 67839; +Cc: Attila Lendvai

* modules/shepherd/service.scm (spawn-service-controller): Add two asserts.
This is the bug that causes `guix system reconfigure ...` to sometimes hang,
and subsequently all shepherd commands, because a match-error flies out from
the service-controller of a replaced service, and thus its fiber dies.  These
asserts get triggered without the previous commit that fixes the issue.
---
 modules/shepherd/service.scm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index c3bdf44..0ee6929 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -382,9 +382,11 @@ denoting what the service provides."
 
 (define (spawn-service-controller service)
   "Return a channel over which @var{service} may be controlled."
+  (assert (current-process-monitor))
   (let ((channel (make-channel)))
     (spawn-fiber
      (lambda ()
+       (assert (current-process-monitor))
        ;; The controller writes to its current output port via 'local-output'.
        ;; Make sure that goes to the right port.  If the controller got a
        ;; wrong output port, it could crash and stop responding just because a
-- 
2.41.0





^ permalink raw reply related	[flat|nested] 7+ messages in thread

* bug#67839: shepherd: sometimes hangs on `guix system reconfigure`
  2023-12-17  0:44 ` bug#67839: [PATCH v2 1/2] shepherd: Make sure with-process-monitor covers everything needed Attila Lendvai
  2023-12-17  0:44   ` bug#67839: [PATCH v2 2/2] service: Add asserts that used to make tests/replacement.sh fail Attila Lendvai
@ 2023-12-17 21:59   ` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2023-12-17 21:59 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: Attila Lendvai, 67839

Hi Attila,

Attila Lendvai <attila.lendvai@gmail.com> skribis:

> * modules/shepherd.scm (main): Switch with-service-registry and
> with-process-monitor.  This way the parameterize of the process monitor covers
> everything else.  This fixes the bug that caused `guix system reconfigure` to
> hang in certain situations.  Fix proposed by @emixa-d at:
> https://github.com/wingo/fibers/issues/29#issuecomment-1858922276.

[...]

> * modules/shepherd/service.scm (spawn-service-controller): Add two asserts.
> This is the bug that causes `guix system reconfigure ...` to sometimes hang,
> and subsequently all shepherd commands, because a match-error flies out from
> the service-controller of a replaced service, and thus its fiber dies.  These
> asserts get triggered without the previous commit that fixes the issue.

Good catch!!

I pushed these patches with small edits, in particular adding a test
that reproduces the bug without relying on assertion failures:

  5dbde1c support: ‘assert’ logs source location information.
  0bcf02a Update NEWS.
  c07f0a8 service: Add asserts to ensure a process monitor is running.
  9be0b7e shepherd: Make sure ‘with-process-monitor’ covers everything needed.

Thanks for the tedious but fruitful debugging work!

Ludo’.




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-12-17 22:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-15 19:20 bug#67839: shepherd: sometimes hangs on `guix system reconfigure` Attila Lendvai
2023-12-15 19:37 ` bug#67839: [PATCH 1/2] shepherd: Move root-service start under with-process-monitor Attila Lendvai
2023-12-15 19:37   ` bug#67839: [PATCH 2/2] service: Add two asserts that will make tests/replacement.sh fail Attila Lendvai
2023-12-17  0:06 ` bug#67839: [PATCH 3/2] shepherd: Fix tests/replacement.sh Attila Lendvai
2023-12-17  0:44 ` bug#67839: [PATCH v2 1/2] shepherd: Make sure with-process-monitor covers everything needed Attila Lendvai
2023-12-17  0:44   ` bug#67839: [PATCH v2 2/2] service: Add asserts that used to make tests/replacement.sh fail Attila Lendvai
2023-12-17 21:59   ` bug#67839: shepherd: sometimes hangs on `guix system reconfigure` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).