unofficial mirror of guix-patches@gnu.org 
* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
@ 2018-03-26 11:16 Carlo Zancanaro
  2018-03-26 23:39 ` Carlo Zancanaro
  2018-03-29 20:07 ` Ludovic Courtès
  0 siblings, 2 replies; 8+ messages in thread
From: Carlo Zancanaro @ 2018-03-26 11:16 UTC
  To: 30948


[-- Attachment #1.1: Type: text/plain, Size: 576 bytes --]

When working on the Shepherd, I found that in the build containers, 
processes don't get reaped by pid 1. See 
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30637#29. This 
caused (and will cause) the Shepherd's tests to fail on some 
systems.

Our guile-builder script should handle SIGCHLD and then use 
waitpid to reap the child processes. Here's my attempt at a patch 
to do that.

I haven't been able to build anything with it because the computer 
I'm currently on is laughably slow. If someone else can check that 
you can still build with it, I'd really appreciate it.
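The approach described above can be sketched as a small standalone Guile program. This is a minimal illustration, not the code from either patch in this thread: the names `reap-children` and `reaped` are made up for the example, and three short-lived forked children stand in for a real build. Two details matter for a handler like this: Guile calls a signal handler with the signal number as its argument, so the handler must accept one argument, and `waitpid` raises a `system-error` exception (typically ECHILD) once no children are left, so that exception has to be caught.

```scheme
;; Minimal sketch (illustration only): reap terminated children from a
;; SIGCHLD handler.  `reap-children' and `reaped' are hypothetical names.
(use-modules (ice-9 match))

(define reaped 0)                      ;children reaped so far, for the demo

(define (reap-children signum)
  ;; Guile passes the signal number; it is unused here.  Several SIGCHLDs
  ;; may be merged into one, so keep reaping until nothing is left.
  (let loop ()
    (match (catch 'system-error
             (lambda ()
               (waitpid WAIT_ANY WNOHANG))
             (lambda args
               ;; Typically ECHILD: there are no children at all.
               '(0 . #f)))
      ((0 . _) #f)                     ;children exist, but none has exited
      ((pid . status)
       (set! reaped (+ reaped 1))
       (loop)))))

;; Install the handler before forking so that no SIGCHLD is missed.
;; SA_NOCLDSTOP: do not deliver SIGCHLD when a child is merely stopped.
(sigaction SIGCHLD reap-children SA_NOCLDSTOP)

(define children
  (map (lambda (i)
         (let ((pid (primitive-fork)))
           (if (zero? pid)
               (primitive-_exit 0)     ;child: exit immediately
               pid)))                  ;parent: keep the child's PID
       '(1 2 3)))

;; Handlers run asynchronously, so poll for up to about two seconds.
(let retry ((n 0))
  (when (and (< reaped (length children)) (< n 200))
    (usleep 10000)
    (retry (+ n 1))))

(format #t "reaped ~a of ~a children~%" reaped (length children))
```

Because signals can be coalesced, the handler does not assume one SIGCHLD per child: it keeps calling `waitpid` with `WNOHANG` until it reports PID 0 (live children, none terminated) or fails.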


[-- Attachment #1.2: 0001-guix-Reap-finished-child-processes-in-build-containe.patch --]
[-- Type: text/x-patch, Size: 1457 bytes --]

From 7c66818570a139fc4e7b11de34d07c76ebdc6bac Mon Sep 17 00:00:00 2001
From: Carlo Zancanaro <carlo@zancanaro.id.au>
Date: Mon, 26 Mar 2018 22:08:26 +1100
Subject: [PATCH] guix: Reap finished child processes in build containers.

* guix/derivations.scm (build-expression->derivation)[prologue]: Handle SIGCHLD
  and reap child processes when they finish.
---
 guix/derivations.scm | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/guix/derivations.scm b/guix/derivations.scm
index da686e89e..80787e99e 100644
--- a/guix/derivations.scm
+++ b/guix/derivations.scm
@@ -1180,6 +1180,17 @@ ALLOWED-REFERENCES, DISALLOWED-REFERENCES, LOCAL-BUILD?, and SUBSTITUTABLE?."
                            (filter module-form? exp))
                           (_ `(,exp)))
 
+                      ;; The root process in the build container should reap
+                      ;; processes that die, so handle SIGCHLD.
+                      (sigaction SIGCHLD
+                        (lambda ()
+                          (let loop ()
+                            (match (waitpid WAIT_ANY WNOHANG)
+                              ((0 . _) #f)
+                              ((pid . _) (loop))
+                              (_ #f))))
+                        SA_NOCLDSTOP)
+
                       (define %output (getenv "out"))
                       (define %outputs
                         (map (lambda (o)
-- 
2.16.2




* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
  2018-03-26 11:16 [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers Carlo Zancanaro
@ 2018-03-26 23:39 ` Carlo Zancanaro
  2018-03-29 20:07 ` Ludovic Courtès
  1 sibling, 0 replies; 8+ messages in thread
From: Carlo Zancanaro @ 2018-03-26 23:39 UTC
  To: 30948


[-- Attachment #1.1: Type: text/plain, Size: 301 bytes --]

Okay, it turns out my previous patch was very wrong. I tried to 
start a build and it broke pretty significantly.

I've attached a new patch that at least starts building. My 
computer takes too long to actually build anything, but I'm 
slightly more confident that my change won't break everything.


[-- Attachment #1.2: 0001-guix-Reap-finished-child-processes-in-build-containe.patch --]
[-- Type: text/x-patch, Size: 1693 bytes --]

From c57b2fe19865afc21fd1fd9a7cad3286b05a9b22 Mon Sep 17 00:00:00 2001
From: Carlo Zancanaro <carlo@zancanaro.id.au>
Date: Mon, 26 Mar 2018 22:08:26 +1100
Subject: [PATCH] guix: Reap finished child processes in build containers.

* guix/derivations.scm (build-expression->derivation)[prologue]: Handle SIGCHLD
  and reap child processes when they finish.
---
 guix/derivations.scm | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/guix/derivations.scm b/guix/derivations.scm
index da686e89e..27ab3e420 100644
--- a/guix/derivations.scm
+++ b/guix/derivations.scm
@@ -1201,6 +1201,21 @@ ALLOWED-REFERENCES, DISALLOWED-REFERENCES, LOCAL-BUILD?, and SUBSTITUTABLE?."
                                           (else drv))))))
                                inputs))
 
+                      ;; The root process in the build container should reap
+                      ;; processes that die, so handle SIGCHLD.
+                      (use-modules (ice-9 match))
+                      (sigaction SIGCHLD
+                        (lambda _
+                          (let loop ()
+                            (match (catch 'system-error
+                                     (lambda ()
+                                       (waitpid WAIT_ANY WNOHANG))
+                                     (lambda args
+                                       '(0 . -)))
+                              ((0 . _) #f)
+                              ((pid . _) (loop)))))
+                        SA_NOCLDSTOP)
+
                       ,@(if (null? modules)
                             '()
                             ;; Remove our own settings.
-- 
2.16.2




* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
  2018-03-26 11:16 [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers Carlo Zancanaro
  2018-03-26 23:39 ` Carlo Zancanaro
@ 2018-03-29 20:07 ` Ludovic Courtès
  2018-03-29 21:15   ` Carlo Zancanaro
  1 sibling, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-03-29 20:07 UTC
  To: Carlo Zancanaro; +Cc: 30948

[-- Attachment #1: Type: text/plain, Size: 3231 bytes --]

Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> When working on the Shepherd, I found that in the build containers
> processes don't get reaped by pid 1. See
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30637#29. This caused
> (and will cause) the Shepherd's tests to fail on some systems.
>
> Our guile-builder script should handle SIGCHLD and then use waitpid to
> reap the child processes. Here's my attempt at a patch to do that.

I would rather install the handler as a phase in gnu-build-system: this
leaves ‘build-expression->derivation’ generic, and also gives us more
flexibility (e.g., we can disable that phase without doing a full
rebuild if needed.)  See the patch below.

WDYT?

My first attempt, with:

  ./pre-inst-env guix build -e '(@@ (gnu packages commencement) findutils-boot0)'

quickly failed:

--8<---------------cut here---------------start------------->8---
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... Backtrace:
In ice-9/boot-9.scm:
yes
checking for working vfork... (cached) yes
checking for strcasecmp...  157: 13 [catch #t #<catch-closure c900a0> ...]
In unknown file:
   ?: 12 [apply-smob/1 #<catch-closure c900a0>]
In ice-9/boot-9.scm:
  63: 11 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 10 [eval # #]
In ice-9/boot-9.scm:
2320: 9 [save-module-excursion #<procedure cc1b80 at ice-9/boot-9.scm:3961:3 ()>]
3966: 8 [#<procedure cc1b80 at ice-9/boot-9.scm:3961:3 ()>]
1645: 7 [%start-stack load-stack #<procedure cbd2c0 at ice-9/boot-9.scm:3957:10 ()>]
1650: 6 [#<procedure cc3060 ()>]
In unknown file:
   ?: 5 [primitive-load "/gnu/store/pz3jy89ax5jg0j6fnp5n42x4vznga8s3-make-boot0-4.2.1-guile-builder"]
In ice-9/eval.scm:
 387: 4 [eval # ()]
In srfi/srfi-1.scm:
 619: 3 [for-each #<procedure 1217560 at /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/gnu-build-system.scm:815:12 (expr)> ...]
In /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/gnu-build-system.scm:
 819: 2 [#<procedure 1217560 at /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/gnu-build-system.scm:815:12 (expr)> #]
In /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/utils.scm:
 614: 1 [invoke "/gnu/store/g34swjqyw205d15pyra39j56qvyxq9w9-bootstrap-binaries-0/bin/bash" ...]
In unknown file:
   ?: 0 [system* "/gnu/store/g34swjqyw205d15pyra39j56qvyxq9w9-bootstrap-binaries-0/bin/bash" ...]

ERROR: In procedure system*:
ERROR: In procedure system*: Interrupted system call
builder for `/gnu/store/hc96d5dcshbdgavpp0j01qnsjf0yf9z5-make-boot0-4.2.1.drv' failed with exit code 1
--8<---------------cut here---------------end--------------->8---

This is why ‘install-SIGCHLD-handler’ in the patch does nothing on Guile
<= 2.0.9.
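To make the EINTR problem concrete: with Guile <= 2.0.9, a system call interrupted by a signal could surface as a `system-error` exception whose errno is EINTR, so every system call in the builder (such as the `system*` call in the backtrace above) would have had to retry on its own. The sketch below shows what such a retry wrapper might look like; `call-with-EINTR-retry` is a hypothetical helper written only for this illustration, not something provided by Guix or Guile as far as this thread shows.

```scheme
;; Hypothetical helper (illustration only): call THUNK, and call it again
;; whenever it raises a 'system-error' whose errno is EINTR.
(define (call-with-EINTR-retry thunk)
  (let loop ()
    (catch 'system-error
      thunk
      (lambda args
        (if (= EINTR (system-error-errno args))
            (loop)                      ;interrupted by a signal: retry
            (apply throw args))))))     ;any other error: re-raise it

;; Usage: a thunk that simulates two interrupted calls, then succeeds.
(define attempts 0)
(define result
  (call-with-EINTR-retry
   (lambda ()
     (set! attempts (+ attempts 1))
     (if (< attempts 3)
         (throw 'system-error "demo" "~A"
                (list (strerror EINTR)) (list EINTR))
         'done))))

(format #t "~a after ~a attempts~%" result attempts)
;; prints "done after 3 attempts"
```

Wrapping every call site like this is impractical for arbitrary build code, which is consistent with the patch simply not installing the handler on those Guile versions.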

Now, we’d need to test it for real with Guile 2.2.  I suppose one way to
test without rebuilding it all would be to add this phase explicitly in
a package and try building it with --rounds=10 or something.  Would you
like to try that?

Note that we have only a couple of days left before the ‘core-updates’
freeze.

Thanks,
Ludo’.


[-- Attachment #2: Type: text/x-patch, Size: 1878 bytes --]

diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index be5ad78b9..2c6cb4ad2 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -51,6 +51,28 @@
    (define time-monotonic time-tai))
   (else #t))
 
+(define* (install-SIGCHLD-handler #:rest _)
+  "Handle SIGCHLD signals.  Since this code is usually running as PID 1 in the
+build daemon, it has to reap dead processes, hence this procedure."
+  ;; In Guile <= 2.0.9, syscalls could throw EINTR.  With these versions,
+  ;; installing a SIGCHLD handler is not safe because we could have uncaught
+  ;; 'system-error' exceptions at any time.
+  (when (or (not (string=? (effective-version) "2.0"))
+            (> (string->number (micro-version)) 9))
+    (format #t "installing SIGCHLD handler in PID ~a\n" (getpid))
+    (sigaction SIGCHLD
+      (lambda _
+        (let loop ()
+          (match (catch 'system-error
+                   (lambda ()
+                     (waitpid WAIT_ANY WNOHANG))
+                   (lambda args
+                     '(0 . -)))
+            ((0 . _) #f)
+            ((pid . _) (loop)))))
+      SA_NOCLDSTOP))
+  #t)
+
 (define* (set-SOURCE-DATE-EPOCH #:rest _)
   "Set the 'SOURCE_DATE_EPOCH' environment variable.  This is used by tools
 that incorporate timestamps as a way to tell them to use a fixed timestamp.
@@ -758,7 +780,8 @@ which cannot be found~%"
   ;; Standard build phases, as a list of symbol/procedure pairs.
   (let-syntax ((phases (syntax-rules ()
                          ((_ p ...) `((p . ,p) ...)))))
-    (phases set-SOURCE-DATE-EPOCH set-paths install-locale unpack
+    (phases install-SIGCHLD-handler
+            set-SOURCE-DATE-EPOCH set-paths install-locale unpack
             bootstrap
             patch-usr-bin-file
             patch-source-shebangs configure patch-generated-file-shebangs


* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
  2018-03-29 20:07 ` Ludovic Courtès
@ 2018-03-29 21:15   ` Carlo Zancanaro
  2018-03-30  8:16     ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Carlo Zancanaro @ 2018-03-29 21:15 UTC
  To: Ludovic Courtès; +Cc: 30948

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

Hey Ludo,

On Thu, Mar 29 2018, Ludovic Courtès wrote:
> I would rather install the handler as a phase in 
> gnu-build-system: this leaves ‘build-expression->derivation’ 
> generic, and also gives us more flexibility (e.g., we can 
> disable that phase without doing a full rebuild if needed.)  See 
> the patch below.
>
> WDYT?

What do you mean by "generic"? From what I can understand it's one 
of pid 1's responsibilities to reap child processes, so I would 
expect this to be set up for every builder, before the builder is 
run. Given it's not specific to the gnu-build-system, I don't 
think it really fits there.

> On my first attempt with:
>
>   ./pre-inst-env guix build -e '(@@ (gnu packages commencement) findutils-boot0)'
>
> quickly failed:
>
> ...
>
> This is why ‘install-SIGCHLD-handler’ in the patch does nothing 
> on Guile <= 2.0.9.

From what I understand, Guix depends on Guile 2.0.13 or later, so 
I didn't think it needed to work with 2.0.9. From my quick check, 
though, our bootstrap binaries are Guile 2.0.9? I can see how that 
might cause a problem. In what sense does Guix require 2.0.13 (as 
the manual claims) rather than 2.0.9?

> Now, we’d need to test it for real with Guile 2.2.  I suppose one way
> to test without rebuilding it all would be to add this phase
> explicitly in a package and try building it with --rounds=10 or
> something.  Would you like to try that?

Yeah, I'll give it a go.

Carlo



* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
  2018-03-29 21:15   ` Carlo Zancanaro
@ 2018-03-30  8:16     ` Ludovic Courtès
  2018-03-30 11:17       ` Carlo Zancanaro
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-03-30  8:16 UTC
  To: Carlo Zancanaro; +Cc: 30948

Heya,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Thu, Mar 29 2018, Ludovic Courtès wrote:
>> I would rather install the handler as a phase in gnu-build-system:
>> this leaves ‘build-expression->derivation’ generic, and also gives
>> us more flexibility (e.g., we can disable that phase without doing a
>> full rebuild if needed.)  See the patch below.
>>
>> WDYT?
>
> What do you mean by "generic"?

I want as little magic as possible around the expression that’s passed
to ‘build-expression->derivation’.

> From what I can understand it's one of pid 1's responsiblities to reap
> child processes, so I would expect this to be set up for every
> builder, before the builder is run.

True, but for derivations it’s also “optional” because eventually
guix-daemon terminates all its child processes.

> Given it's not specific to the gnu-build-system, I don't think it
> really fits there.

Yes, but note that it would be inherited by all the build systems.

>> On my first attempt with:
>>
>>   ./pre-inst-env guix build -e '(@@ (gnu packages commencement) findutils-boot0)'
>>
>> quickly failed:
>>
>> ...
>>
>> This is why ‘install-SIGCHLD-handler’ in the patch does nothing on
>> Guile <= 2.0.9.
>
> From what I understand, Guix depends on Guile 2.0.13 or later, so I
> didn't think it needed to work with 2.0.9. From my quick check,
> though, our bootstrap binaries are Guile 2.0.9?

Exactly.

> I can see how that might cause a problem. In what sense does Guix
> require 2.0.13 (as the manual claims) rather than 2.0.9?

There’s the “host side” (the ‘guix’ commands and related modules), and
there’s the “build side” (code used in the build environment when
building derivations.)

The “build side” is fully specified: ‘guix graph’ shows exactly what
Guile is used where, and you can see with, say:

  guix graph -t derivation \
    -e '(@@ (gnu packages commencement) findutils-boot0)'

that the early derivations run on Guile 2.0.9.

For “host side” code, users can use any Guile >= 2.0.13.

See also
<https://gnu.org/software/guix/manual/html_node/G_002dExpressions.html>.

I hope this clarifies a bit!

Ludo’.


* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
  2018-03-30  8:16     ` Ludovic Courtès
@ 2018-03-30 11:17       ` Carlo Zancanaro
  2018-03-30 15:17         ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Carlo Zancanaro @ 2018-03-30 11:17 UTC
  To: Ludovic Courtès; +Cc: 30948

[-- Attachment #1: Type: text/plain, Size: 1769 bytes --]

Hey,

On Fri, Mar 30 2018, Ludovic Courtès wrote:
>> From what I can understand it's one of pid 1's responsibilities 
>> to reap child processes, so I would expect this to be set up 
>> for every builder, before the builder is run.
>
> True, but for derivations it’s also “optional” because 
> eventually guix-daemon terminates all its child processes.

As long as the build process doesn't rely on behaviour that, 
strictly speaking, it should be allowed to rely on. It's not an 
issue of resource leaking, it's an issue of correctness.

>> Given it's not specific to the gnu-build-system, I don't think 
>> it really fits there.
>
> Yes, but note that it would be inherited by all the build 
> systems.

Except for trivial-build-system, which is probably fine. I still 
don't think it fits in a specific build system, given it's a 
behaviour that transcends the specific action happening within the 
container.

Putting it in gnu-build-system will solve the problem in all 
realistic cases, so that's probably fine. It's still subtly 
incorrect, but will only be a problem if something using the 
trivial build system relies on pid 1 to reap a process, or if we 
make a new build system not deriving from gnu-build-system (which 
seems unlikely, but not impossible).

> The “build side” is fully specified: ‘guix graph’ shows exactly 
> what Guile is used where, and you can see with, say:
>
>   guix graph -t derivation \
>     -e '(@@ (gnu packages commencement) findutils-boot0)'
>
> that the early derivations run on Guile 2.0.9.
>
> For “host side” code, users can use any Guile >= 2.0.13.

Yeah, okay. That makes sense. I guess I just expected 2.0.13 to be 
the minimum version throughout.

Carlo



* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
  2018-03-30 11:17       ` Carlo Zancanaro
@ 2018-03-30 15:17         ` Ludovic Courtès
  2022-11-24 16:40           ` Maxim Cournoyer
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-03-30 15:17 UTC
  To: Carlo Zancanaro; +Cc: 30948

Hello,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Fri, Mar 30 2018, Ludovic Courtès wrote:
>>> From what I can understand it's one of pid 1's responsibilities to
>>> reap child processes, so I would expect this to be set up for every
>>> builder, before the builder is run.
>>
>> True, but for derivations it’s also “optional” because eventually
>> guix-daemon terminates all its child processes.
>
> As long as the build process doesn't rely on behaviour that, strictly
> speaking, it should be allowed to rely on. It's not an issue of
> resource leaking, it's an issue of correctness.

Right.

>>> Given it's not specific to the gnu-build-system, I don't think it
>>> really fits there.
>>
>> Yes, but note that it would be inherited by all the build systems.
>
> Except for trivial-build-system, which is probably fine. I still don't
> think it fits in a specific build system, given it's a behaviour that
> transcends the specific action happening within the container.
>
> Putting it in gnu-build-system will solve the problem in all realistic
> cases, so that's probably fine. It's still subtly incorrect, but will
> only be a problem if something using the trivial build system relies
> on pid 1 to reap a process, or if we make a new build system not
> deriving from gnu-build-system (which seems unlikely, but not
> impossible).

I agree, every Guile process running as PID 1 should reap processes.

My view is just that this mechanism belongs in “user code”, not in the
low-level mechanisms such as ‘build-expression->derivation’ and
‘gexp->derivation’.  It’s a matter of separation of concerns.

Of course we don’t want to duplicate that code every time, but the way
we should factorize it, IMO, is by putting it in a “normal” module that
people will use.

Putting it in gnu-build-system is an admittedly hacky but easy way to
have it widely shared.

Thanks,
Ludo’.


* [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
  2018-03-30 15:17         ` Ludovic Courtès
@ 2022-11-24 16:40           ` Maxim Cournoyer
  0 siblings, 0 replies; 8+ messages in thread
From: Maxim Cournoyer @ 2022-11-24 16:40 UTC
  To: 30948; +Cc: Carlo Zancanaro, GNU Debbugs, Ludovic Courtès

reassign 30948 guix
thanks
--
Hi,

I'm moving this from 'guix-patches' to 'guix', so that it's more
discoverable as a *bug*.  It still bites us every now and then (grep the
Guix source code for usages of tini to find some occurrences).

Thanks,

Maxim





end of thread

Thread overview: 8+ messages
2018-03-26 11:16 [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers Carlo Zancanaro
2018-03-26 23:39 ` Carlo Zancanaro
2018-03-29 20:07 ` Ludovic Courtès
2018-03-29 21:15   ` Carlo Zancanaro
2018-03-30  8:16     ` Ludovic Courtès
2018-03-30 11:17       ` Carlo Zancanaro
2018-03-30 15:17         ` Ludovic Courtès
2022-11-24 16:40           ` Maxim Cournoyer

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
