Update on GuixSD containers

unofficial mirror of guix-devel@gnu.org 

* Update on GuixSD containers
@ 2015-06-08 15:20 Thompson, David
  2015-06-09 21:28 ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Thompson, David @ 2015-06-08 15:20 UTC (permalink / raw)
  To: guix-devel

Hey folks,

I'd like to give a quick update on the state of the wip-container
branch.  As of this morning, one can run the commands below and have a
somewhat functional GuixSD container:

  # Hardcoded /tmp/container as the container root directory until I
  # add a command line switch.
  mkdir /tmp/container
  guix system container container-config.scm

Where 'container-config.scm' is:

    (use-modules (gnu))

    ;; Minimal GuixSD configuration suitable for a Linux container.
    (operating-system
      (host-name "container-test")
      (timezone "America/New_York")
      (locale "en_US.UTF-8")
      ;; Unused
      (bootloader (grub-configuration (device "/dev/sdX")))
      ;; Dummy FS
      (file-systems (cons (file-system
                            (mount-point "/")
                            (device "dummy")
                            (type "dummy"))
                          %base-file-systems))

      (users (cons (user-account
                    (name "alice")
                    (comment "Bob's sister")
                    (group "users")
                    (supplementary-groups '("wheel" "audio" "video"))
                    (home-directory "/home/alice"))
                   %base-user-accounts)))

The activation and boot scripts for the system have been tweaked to
DTRT for a container, and DMD is able to start successfully and start
all of the base services, sans the console-font-tty services for some
reason.

So, this is cool, but much work remains to be done.  Our containers
operate in 5 of 6 possible Linux namespaces: mount, PID, UTS, IPC, and
network.  The remaining namespace to get working is the user
namespace, which is especially tricky.  I don't think even Docker can
use user namespaces properly yet, but I might be wrong.  Additionally,
our containers have a loopback device, but have no way of accessing an
outside network such as your LAN or a virtual network on the host
system.  There's also no support for cgroups, which would allow us to
limit the resource usage of containers like you can with a VM
hypervisor.
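
For reference, each of the namespaces listed above corresponds to one
flag bit passed to clone(2).  The following is a minimal sketch, not
code from the wip-container branch: the flag values come from
<linux/sched.h>, while the names %container-namespaces and
uses-namespace? are made up for illustration.

```scheme
;; Flag values copied from <linux/sched.h>.  The variable names mirror
;; the C macros and are defined here only for illustration.
(define CLONE_NEWNS   #x00020000)   ; mount points
(define CLONE_NEWUTS  #x04000000)   ; host name and domain name
(define CLONE_NEWIPC  #x08000000)   ; System V IPC, POSIX message queues
(define CLONE_NEWUSER #x10000000)   ; user and group IDs
(define CLONE_NEWPID  #x20000000)   ; process IDs
(define CLONE_NEWNET  #x40000000)   ; network devices, addresses, routes

;; The five namespaces the message says containers currently use
;; (hypothetical name).
(define %container-namespaces
  (logior CLONE_NEWNS CLONE_NEWPID CLONE_NEWUTS CLONE_NEWIPC
          CLONE_NEWNET))

(define (uses-namespace? flags namespace)
  "Return #t if the bit NAMESPACE is set in FLAGS."
  (not (zero? (logand flags namespace))))
```

Adding user namespace support would amount to also setting
CLONE_NEWUSER in the mask, plus the uid/gid mapping work that makes
it tricky.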

For the long term, we'll need a container daemon to keep track of all
containers on the system to allow for easily starting and stopping
them (right now you have to 'sudo kill -9 <dmd pid>'), spawning new
processes within them (for example, launching bash for an interactive
environment), and whatever else we might want.

In closing, things aren't exactly usable, but I encourage
brave/curious people to take 'guix system container' for a spin and
hack on it to make Guix the best container management tool yet!  Also,
I think the code is very easy to follow (unlike Docker's
libcontainer), so if you want to understand what containers *really*
are beyond a buzzword, have a look at gnu/build/linux-container.scm
and gnu/system/linux-container.scm.

Happy hacking,

- Dave


* Re: Update on GuixSD containers
  2015-06-08 15:20 Update on GuixSD containers Thompson, David
@ 2015-06-09 21:28 ` Ludovic Courtès
  2015-06-11 14:51   ` Thompson, David
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2015-06-09 21:28 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> skribis:

> I'd like to give a quick update on the state of the wip-container
> branch.  As of this morning, one can run the commands below and have a
> somewhat functional GuixSD container:
>
>   # Hardcoded /tmp/container as the container root directory until I
>   # add a command line switch.
>   mkdir /tmp/container
>   guix system container container-config.scm

Wonderful!  I’ve given it a try, and it works as advertised. ;-)
I was a bit afraid the first time I ran the ‘run-container’ script as
root, but everything went like a charm.

I tried adding this dummy service:

  (define (bash-service)
    (with-monad %store-monad
      (return (service
               (documentation "Run Bash from PID 1.")
               (provision '(shell))
               (start #~(make-forkexec-constructor
                         (string-append #$bash "/bin/bash")))
               (stop #~(make-kill-destructor))
               (respawn? #t)))))

... but it dies for some reason.  So no shell prompt.

> So, this is cool, but much work remains to be done.  Our containers
> operate in 5 of 6 possible Linux namespaces: mount, PID, UTS, IPC, and
> network.  The remaining namespace to get working is the user
> namespace, which is especially tricky.  I don't think even Docker can
> use user namespaces properly yet, but I might be wrong.  Additionally,
> our containers have a loopback device, but have no way of accessing an
> outside network such as your LAN or a virtual network on the host
> system.  There's also no support for cgroups, which would allow us to
> limit the resource usage of containers like you can with a VM
> hypervisor.

OK.

> For the long term, we'll need a container daemon to keep track of all
> containers on the system to allow for easily starting and stopping
> them (right now you have to 'sudo kill -9 <dmd pid>'), spawning new
> processes within them (for example, launching bash for an interactive
> environment), and whatever else we might want.

Having launched a bunch of containers and then hacked to kill all the
dmds, I can see why keeping track of containers matters.  :-)

Until there’s a daemon to keep track of containers, “guix system
container” could return the PID of the container’s PID1, to make it
easier to kill it later?

> In closing, things aren't exactly usable, but I encourage
> brave/curious people to take 'guix system container' for a spin and
> hack on it to make Guix the best container management tool yet!  Also,
> I think the code is very easy to follow (unlike Docker's
> libcontainer), so if you want to understand what containers *really*
> are beyond a buzzword, have a look at gnu/build/linux-container.scm
> and gnu/system/linux-container.scm.

Indeed I find the new code easy to read and well integrated; I like it!

It’s a shame that only CLONE_NEWUSER is available to non-root users.  I
wonder what the rationale was.  AIUI, Docker’s daemon performs clone(2)
on behalf of clients, right?

Thanks for the great work!

Ludo’.


* Re: Update on GuixSD containers
  2015-06-09 21:28 ` Ludovic Courtès
@ 2015-06-11 14:51   ` Thompson, David
  2015-06-12 15:08     ` Ludovic Courtès
  2015-06-12 15:12     ` Ludovic Courtès
  0 siblings, 2 replies; 14+ messages in thread
From: Thompson, David @ 2015-06-11 14:51 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Tue, Jun 9, 2015 at 5:28 PM, Ludovic Courtès <ludo@gnu.org> wrote:
> "Thompson, David" <dthompson2@worcester.edu> skribis:
>
>> I'd like to give a quick update on the state of the wip-container
>> branch.  As of this morning, one can run the commands below and have a
>> somewhat functional GuixSD container:
>>
>>   # Hardcoded /tmp/container as the container root directory until I
>>   # add a command line switch.
>>   mkdir /tmp/container
>>   guix system container container-config.scm
>
> Wonderful!  I’ve given it a try, and it works as advertised. ;-)
> I was a bit afraid the first time I ran the ‘run-container’ script as
> root, but everything went like a charm.

Yeah, running as root is a bit scary.  With working user namespaces it
should become less scary.  I just don't know how to reasonably start a
system with users of its own that are allowed to write to the file
system.  Everything I've tried thus far has failed.  I thought that
mapping the uid/gid 0 in the namespace to uid/gid 0 outside of the
namespace would be enough to boot the system, but it didn't work.

> I tried adding this dummy service:
>
>   (define (bash-service)
>     (with-monad %store-monad
>       (return (service
>                (documentation "Run Bash from PID 1.")
>                (provision '(shell))
>                (start #~(make-forkexec-constructor
>                          (string-append #$bash "/bin/bash")))
>                (stop #~(make-kill-destructor))
>                (respawn? #t)))))
>
> ... but it dies for some reason.  So no shell prompt.

I wouldn't expect that to work because bash isn't actually run in your
tty.  To create an interactive environment within the container (or
run any arbitrary program), we need a tool that calls setns() with
open file descriptors for all of the container's namespaces and then
exec() the desired command.  I threw together a tool to do this
quickly, but for some reason joining the mount namespace fails with
EINVAL.  I have no idea why.  Joining the IPC, UTS, PID, and network
namespaces isn't a problem.  Enlightenment needed!
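
The approach described above can be sketched with Guile's FFI.  This
is not the tool mentioned in the message: the procedure names
(namespace-file, join-namespace, exec-in-container) are hypothetical,
and the sketch assumes Guile 2.2 or later for the #:return-errno?
option of pointer->procedure.  It needs root and a Linux kernel to do
anything useful, and it does not explain the EINVAL reported above.

```scheme
(use-modules (system foreign))

;; Bind setns(2) from the C library.  With #:return-errno? the
;; procedure returns two values: the C return value and errno.
(define %setns
  (pointer->procedure int
                      (dynamic-func "setns" (dynamic-link))
                      (list int int)
                      #:return-errno? #t))

(define (namespace-file pid name)
  "Return the /proc file that names namespace NAME of process PID."
  (string-append "/proc/" (number->string pid) "/ns/" name))

(define (join-namespace pid name)
  "Move the calling process into namespace NAME of process PID."
  (let ((fd (open-fdes (namespace-file pid name) O_RDONLY)))
    ;; A second argument of 0 means "accept any namespace type".
    (call-with-values (lambda () (%setns fd 0))
      (lambda (ret err)
        (close-fdes fd)
        (unless (zero? ret)
          (error "setns failed" name (strerror err)))))))

(define (exec-in-container pid program . args)
  "Join the namespaces of PID, run PROGRAM there, return its exit code."
  (for-each (lambda (name) (join-namespace pid name))
            '("ipc" "uts" "net" "pid" "mnt"))
  ;; Joining a PID namespace only affects children created afterwards,
  ;; so fork before exec'ing the command.
  (let ((child (primitive-fork)))
    (if (zero? child)
        (apply execlp program program args)
        (status:exit-val (cdr (waitpid child))))))
```

With the container's init PID in hand, usage would look like
(exec-in-container 1234 "bash"), where 1234 is a placeholder.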

>> For the long term, we'll need a container daemon to keep track of all
>> containers on the system to allow for easily starting and stopping
>> them (right now you have to 'sudo kill -9 <dmd pid>'), spawning new
>> processes within them (for example, launching bash for an interactive
>> environment), and whatever else we might want.
>
> Having launched a bunch of containers and then hacked to kill all the
> dmds, I can see why keeping track of containers matters.  :-)
>
> Until there’s a daemon to keep track of containers, “guix system
> container” could return the PID of the container’s PID1, to make it
> easier to kill it later?

I'm actually unsure how to acquire the PID of the container's init
process since I clone and exec.  Any ideas?

> It’s a shame that only CLONE_NEWUSER is available to non-root users.  I
> wonder what the rationale was.  AIUI, Docker’s daemon performs clone(2)
> on behalf of clients, right?

Yeah, our daemon would do the same thing.  We could maybe even have a
little Guile library that allows one to evaluate arbitrary scheme code
from within the container. :)

- Dave


* Re: Update on GuixSD containers
  2015-06-11 14:51   ` Thompson, David
@ 2015-06-12 15:08     ` Ludovic Courtès
  2015-06-13  3:41       ` Thompson, David
  2018-07-24 22:22       ` Christopher Lemmer Webber
  2015-06-12 15:12     ` Ludovic Courtès
  1 sibling, 2 replies; 14+ messages in thread
From: Ludovic Courtès @ 2015-06-12 15:08 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> skribis:

> On Tue, Jun 9, 2015 at 5:28 PM, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>> I tried adding this dummy service:
>>
>>   (define (bash-service)
>>     (with-monad %store-monad
>>       (return (service
>>                (documentation "Run Bash from PID 1.")
>>                (provision '(shell))
>>                (start #~(make-forkexec-constructor
>>                          (string-append #$bash "/bin/bash")))
>>                (stop #~(make-kill-destructor))
>>                (respawn? #t)))))
>>
>> ... but it dies for some reason.  So no shell prompt.
>
> I wouldn't expect that to work because bash isn't actually run in your
> tty.  To create an interactive environment within the container (or
> run any arbitrary program), we need a tool that calls setns() with
> open file descriptors for all of the container's namespaces and then
> exec() the desired command.  I threw together a tool to do this
> quickly, but for some reason joining the mount namespace fails with
> EINVAL.  I have no idea why.  Joining the IPC, UTS, PID, and network
> namespaces isn't a problem.  Enlightenment needed!

Oh, I see.  setns(2) specifies 6 reasons for EINVAL...

>> Until there’s a daemon to keep track of containers, “guix system
>> container” could return the PID of the container’s PID1, to make it
>> easier to kill it later?
>
> I'm actually unsure how to acquire the PID of the container's init
> process since I clone and exec.  Any ideas?

Isn’t it the return value of ‘clone’?

>> It’s a shame that only CLONE_NEWUSER is available to non-root users.  I
>> wonder what the rationale was.  AIUI, Docker’s daemon performs clone(2)
>> on behalf of clients, right?
>
> Yeah, our daemon would do the same thing.  We could maybe even have a
> little Guile library that allows one to evaluate arbitrary scheme code
> from within the container. :)

Definitely.  Another application I’ve always wanted is a least-authority
shell, like Plash [0].

(Speaking of which, I just found Shill [1], which seems similar to Plash
and even has a to-do item regarding package management [2] and is
written in Racket; unfortunately it runs on FreeBSD, for Capsicum.)

Thanks,
Ludo’.

[0] http://plash.beasts.org/contents.html
[1] http://shill.seas.harvard.edu/
[2] http://shill.seas.harvard.edu/projects.html


* Re: Update on GuixSD containers
  2015-06-11 14:51   ` Thompson, David
  2015-06-12 15:08     ` Ludovic Courtès
@ 2015-06-12 15:12     ` Ludovic Courtès
  2015-06-13  1:41       ` Thompson, David
  1 sibling, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2015-06-12 15:12 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> skribis:

> Yeah, our daemon would do the same thing.  We could maybe even have a
> little Guile library that allows one to evaluate arbitrary scheme code
> from within the container. :)

Actually, something quite easily feasible would be this:

  (eval-in-container #~(system* #$evil-program
                                #$(local-file "important-data.txt"))
                     #:networking? #f)

... where the container’s store would be populated with just
EVIL-PROGRAM and the local file.

Food for thought...

Ludo’.


* Re: Update on GuixSD containers
  2015-06-12 15:12     ` Ludovic Courtès
@ 2015-06-13  1:41       ` Thompson, David
  2015-06-13 13:06         ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Thompson, David @ 2015-06-13  1:41 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Fri, Jun 12, 2015 at 11:12 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> "Thompson, David" <dthompson2@worcester.edu> skribis:
>
>> Yeah, our daemon would do the same thing.  We could maybe even have a
>> little Guile library that allows one to evaluate arbitrary scheme code
>> from within the container. :)
>
> Actually, something quite easily feasible would be this:
>
>   (eval-in-container #~(system* #$evil-program
>                                 #$(local-file "important-data.txt"))
>                      #:networking? #f)
>
> ... where the container’s store would be populated with just
> EVIL-PROGRAM and the local file.
>
> Food for thought...

Ooooh yeah!  That would be cool.  Though I think we should still spawn
a dmd process as PID 1 to deal with reaping zombie processes.  We
could generate a single service that runs the gexp script.  How does
that sound?

Thanks for this good idea!

- Dave


* Re: Update on GuixSD containers
  2015-06-12 15:08     ` Ludovic Courtès
@ 2015-06-13  3:41       ` Thompson, David
  2018-07-24 22:22       ` Christopher Lemmer Webber
  1 sibling, 0 replies; 14+ messages in thread
From: Thompson, David @ 2015-06-13  3:41 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Fri, Jun 12, 2015 at 11:08 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> "Thompson, David" <dthompson2@worcester.edu> skribis:
>
>> On Tue, Jun 9, 2015 at 5:28 PM, Ludovic Courtès <ludo@gnu.org> wrote:
>>
>>> Until there’s a daemon to keep track of containers, “guix system
>>> container” could return the PID of the container’s PID1, to make it
>>> easier to kill it later?
>>
>> I'm actually unsure how to acquire the PID of the container's init
>> process since I clone and exec.  Any ideas?
>
> Isn’t it the return value of ‘clone’?

Oh, you're right.  I forgot that the exec() *replaces* the process,
rather than spawning a new one.  The script now outputs the PID.
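
The point is easy to check from Guile.  In the sketch below,
primitive-fork stands in for clone(2) (an assumption made so the
example runs without root), and spawn-and-report-pid is a made-up
name.  The child execs a shell that prints its own PID, and the
parent compares that with the value the fork returned.

```scheme
(use-modules (ice-9 rdelim))

(define (spawn-and-report-pid)
  "Fork, exec a shell that prints its own PID, and return two values:
the PID returned by the fork and the PID printed by the exec'd shell."
  (let* ((channel (pipe))               ; (read-port . write-port)
         (pid     (primitive-fork)))
    (if (zero? pid)
        ;; Child: send stdout to the pipe, then replace this process.
        (catch #t
          (lambda ()
            (dup2 (fileno (cdr channel)) 1)
            (execlp "sh" "sh" "-c" "echo $$"))
          (lambda _ (primitive-exit 127)))
        ;; Parent: PID already identifies whatever the child execs.
        (begin
          (close-port (cdr channel))
          (let ((line (read-line (car channel))))
            (close-port (car channel))
            (waitpid pid)
            (values pid (string->number line)))))))
```

Both values are the same number, since exec keeps the process and
only swaps the program running in it.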

>>> It’s a shame that only CLONE_NEWUSER is available to non-root users.  I
>>> wonder what the rationale was.  AIUI, Docker’s daemon performs clone(2)
>>> on behalf of clients, right?
>>
>> Yeah, our daemon would do the same thing.  We could maybe even have a
>> little Guile library that allows one to evaluate arbitrary scheme code
>> from within the container. :)
>
> Definitely.  Another application I’ve always wanted is a least-authority
> shell, like Plash [0].
>
> (Speaking of which, I just found Shill [1], which seems similar to Plash
> and even has a to-do item regarding package management [2] and is
> written in Racket; unfortunately it runs on FreeBSD, for Capsicum.)

That's really cool.  Using a container + user-specified shared
directories we can achieve something like this, I think.

- Dave


* Re: Update on GuixSD containers
  2015-06-13  1:41       ` Thompson, David
@ 2015-06-13 13:06         ` Ludovic Courtès
  2015-06-13 13:14           ` Thompson, David
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2015-06-13 13:06 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> skribis:

> On Fri, Jun 12, 2015 at 11:12 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>> "Thompson, David" <dthompson2@worcester.edu> skribis:
>>
>>> Yeah, our daemon would do the same thing.  We could maybe even have a
>>> little Guile library that allows one to evaluate arbitrary scheme code
>>> from within the container. :)
>>
>> Actually, something quite easily feasible would be this:
>>
>>   (eval-in-container #~(system* #$evil-program
>>                                 #$(local-file "important-data.txt"))
>>                      #:networking? #f)
>>
>> ... where the container’s store would be populated with just
>> EVIL-PROGRAM and the local file.
>>
>> Food for thought...
>
> Ooooh yeah!  That would be cool.  Though I think we should still spawn
> a dmd process as PID 1 to deal with reaping zombie processes.  We
> could generate a single service that runs the gexp script.  How does
> that sound?

Wouldn’t it be enough to have the Guile process that evaluates the
expression be PID 1 in the container, as is the case in guix-daemon
containers?

Ludo’.


* Re: Update on GuixSD containers
  2015-06-13 13:06         ` Ludovic Courtès
@ 2015-06-13 13:14           ` Thompson, David
  2015-06-13 20:19             ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Thompson, David @ 2015-06-13 13:14 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Sat, Jun 13, 2015 at 9:06 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> "Thompson, David" <dthompson2@worcester.edu> skribis:
>
>> On Fri, Jun 12, 2015 at 11:12 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>>> "Thompson, David" <dthompson2@worcester.edu> skribis:
>>>
>>>> Yeah, our daemon would do the same thing.  We could maybe even have a
>>>> little Guile library that allows one to evaluate arbitrary scheme code
>>>> from within the container. :)
>>>
>>> Actually, something quite easily feasible would be this:
>>>
>>>   (eval-in-container #~(system* #$evil-program
>>>                                 #$(local-file "important-data.txt"))
>>>                      #:networking? #f)
>>>
>>> ... where the container’s store would be populated with just
>>> EVIL-PROGRAM and the local file.
>>>
>>> Food for thought...
>>
>> Ooooh yeah!  That would be cool.  Though I think we should still spawn
>> a dmd process as PID 1 to deal with reaping zombie processes.  We
>> could generate a single service that runs the gexp script.  How does
>> that sound?
>
> Wouldn’t it be enough to have the Guile process that evaluates the
> expression be PID 1 in the container, as is the case in guix-daemon
> containers?

Sure, it would work, but my concern is that a long-running process on
a user's machine could create and orphan tons of child processes and
nothing would be able to clean them up until the PID namespace is
garbage collected.

- Dave


* Re: Update on GuixSD containers
  2015-06-13 13:14           ` Thompson, David
@ 2015-06-13 20:19             ` Ludovic Courtès
  2015-06-16 16:39               ` Thompson, David
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2015-06-13 20:19 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> skribis:

> On Sat, Jun 13, 2015 at 9:06 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>> "Thompson, David" <dthompson2@worcester.edu> skribis:
>>
>>> On Fri, Jun 12, 2015 at 11:12 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>>>> "Thompson, David" <dthompson2@worcester.edu> skribis:
>>>>
>>>>> Yeah, our daemon would do the same thing.  We could maybe even have a
>>>>> little Guile library that allows one to evaluate arbitrary scheme code
>>>>> from within the container. :)
>>>>
>>>> Actually, something quite easily feasible would be this:
>>>>
>>>>   (eval-in-container #~(system* #$evil-program
>>>>                                 #$(local-file "important-data.txt"))
>>>>                      #:networking? #f)
>>>>
>>>> ... where the container’s store would be populated with just
>>>> EVIL-PROGRAM and the local file.
>>>>
>>>> Food for thought...
>>>
>>> Ooooh yeah!  That would be cool.  Though I think we should still spawn
>>> a dmd process as PID 1 to deal with reaping zombie processes.  We
>>> could generate a single service that runs the gexp script.  How does
>>> that sound?
>>
>> Wouldn’t it be enough to have the Guile process that evaluates the
>> expression be PID 1 in the container, as is the case in guix-daemon
>> containers?
>
> Sure, it would work, but my concern is that a long-running process on
> a user's machine could create and orphan tons of child processes and
> nothing would be able to clean them up until the PID namespace is
> garbage collected.

My understanding was that killing a container’s PID 1 (from the outside)
effectively killed all the processes of that PID name space.  Isn’t it
the case?

(The daemon works around that by running processes under a separate UID
and doing kill(-1, SIGKILL) under that UID.)

Ludo’.


* Re: Update on GuixSD containers
  2015-06-13 20:19             ` Ludovic Courtès
@ 2015-06-16 16:39               ` Thompson, David
  2015-06-19 12:08                 ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Thompson, David @ 2015-06-16 16:39 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Sat, Jun 13, 2015 at 4:19 PM, Ludovic Courtès <ludo@gnu.org> wrote:
> "Thompson, David" <dthompson2@worcester.edu> skribis:
>
>> [...]
>>
>> Sure, it would work, but my concern is that a long-running process on
>> a user's machine could create and orphan tons of child processes and
>> nothing would be able to clean them up until the PID namespace is
>> garbage collected.
>
> My understanding was that killing a container’s PID 1 (from the outside)
> effectively killed all the processes of that PID name space.  Isn’t it
> the case?

Yes, that is the case.  That triggers the "garbage collection" of that
namespace, if you will.  My point is that, without a proper PID 1 that
can DTRT with orphaned processes, a long running process in a
container could potentially create a ton of orphaned child processes
with no way for them to be reaped without killing PID 1.  I wouldn't
be very happy if a program that I was running in a sandbox was
polluting the process list.  I don't think this is a concern for the
build daemon because the build process is a (relatively) short-lived
process, but running something like a web browser could go on for
days, weeks, etc.
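
To make "DTRT with orphaned processes" concrete: orphans are
reparented to PID 1 of their PID namespace, which then has to wait
for them.  The following is a minimal sketch of that duty, not dmd's
actual code; reap-zombies and install-reaper! are hypothetical names.

```scheme
(use-modules (ice-9 match))

(define (reap-zombies)
  "Collect every child that has already exited, without blocking.
Return the list of PIDs that were reaped."
  (let loop ((reaped '()))
    (match (catch 'system-error
             (lambda () (waitpid WAIT_ANY WNOHANG))
             (lambda _ '(0 . 0)))       ; ECHILD: no children at all
      ((0 . _)   (reverse reaped))      ; nothing more to reap for now
      ((pid . _) (loop (cons pid reaped))))))

;; An init-like process would run the loop whenever a child exits.
(define (install-reaper!)
  (sigaction SIGCHLD (lambda (signal) (reap-zombies))))
```

A PID 1 that never does this leaves exited orphans in the process
table as zombies until the whole namespace goes away.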

> (The daemon works around that by running processes under a separate UID
> and doing kill(-1, SIGKILL) under that UID.)

So, PID 1 in the build container forks and changes the UID or
something?  Sorry, I'm a bit lost right now.

Thanks for trying to explain.

- Dave


* Re: Update on GuixSD containers
  2015-06-16 16:39               ` Thompson, David
@ 2015-06-19 12:08                 ` Ludovic Courtès
  2015-06-19 12:29                   ` Thompson, David
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2015-06-19 12:08 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> skribis:

> On Sat, Jun 13, 2015 at 4:19 PM, Ludovic Courtès <ludo@gnu.org> wrote:
>> "Thompson, David" <dthompson2@worcester.edu> skribis:
>>
>>> [...]
>>
>> My understanding was that killing a container’s PID 1 (from the outside)
>> effectively killed all the processes of that PID name space.  Isn’t it
>> the case?
>
> Yes, that is the case.  That triggers the "garbage collection" of that
> namespace, if you will.  My point is that, without a proper PID 1 that
> can DTRT with orphaned processes, a long running process in a
> container could potentially create a ton of orphaned child processes
> with no way for them to be reaped without killing PID 1.  I wouldn't
> be very happy if a program that I was running in a sandbox was
> polluting the process list.  I don't think this is a concern for the
> build daemon because the build process is a (relatively) short-lived
> process, but running something like a web browser could go on for
> days, weeks, etc.

Yes, I understand.  This is definitely an important concern for full
GuixSD containers.

However, ‘eval-in-container’ would be much simpler, synchronous, and
typically for short-lived processes.  So I guess the process that runs
‘eval-in-container’ would clone(2) (via ‘call-with-container’) and
simply waitpid(2) the child process (which is PID 1 in its container).

When the parent process gets a SIGINT or SIGHUP, it could send SIGKILL
to the child, thereby terminating the container.
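
The scheme just described can be sketched as follows.  This is only
an illustration: primitive-fork stands in for clone(2) with the
namespace flags, call-in-child is a made-up name (not the
'call-with-container' mentioned above), and no namespaces are
actually created.

```scheme
(define (call-in-child thunk)
  "Run THUNK in a child process and wait for it synchronously.  Return
its exit code, or #f if it was terminated by a signal."
  (let ((pid     (primitive-fork))      ; stand-in for clone(2)
        (signals (list SIGINT SIGHUP)))
    (if (zero? pid)
        ;; Child: in a real container this would be PID 1.
        (catch #t
          (lambda () (thunk) (primitive-exit 0))
          (lambda _ (primitive-exit 1)))
        (begin
          ;; Parent: on SIGINT or SIGHUP, kill the child, which in a
          ;; PID namespace would take the whole container with it.
          (for-each (lambda (signal)
                      (sigaction signal
                        (lambda _ (kill pid SIGKILL))))
                    signals)
          (let ((status (cdr (waitpid pid))))
            ;; Reset to the default disposition; a fuller version
            ;; would restore the handlers that were there before.
            (for-each (lambda (signal) (sigaction signal SIG_DFL))
                      signals)
            (status:exit-val status))))))
```

For example, (call-in-child (lambda () (primitive-exit 3))) returns
3 once the child has exited.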

Does that make sense?

>> (The daemon works around that by running processes under a separate UID
>> and doing kill(-1, SIGKILL) under that UID.)
>
> So, PID 1 in the build container forks and changes the UID or
> something?

Yes, with setuid (see build.cc:2180.)

Thanks,
Ludo’.


* Re: Update on GuixSD containers
  2015-06-19 12:08                 ` Ludovic Courtès
@ 2015-06-19 12:29                   ` Thompson, David
  0 siblings, 0 replies; 14+ messages in thread
From: Thompson, David @ 2015-06-19 12:29 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Fri, Jun 19, 2015 at 8:08 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> "Thompson, David" <dthompson2@worcester.edu> skribis:
>
>> [...]
>>
>> Yes, that is the case.  That triggers the "garbage collection" of that
>> namespace, if you will.  My point is that, without a proper PID 1 that
>> can DTRT with orphaned processes, a long running process in a
>> container could potentially create a ton of orphaned child processes
>> with no way for them to be reaped without killing PID 1.  I wouldn't
>> be very happy if a program that I was running in a sandbox was
>> polluting the process list.  I don't think this is a concern for the
>> build daemon because the build process is a (relatively) short-lived
>> process, but running something like a web browser could go on for
>> days, weeks, etc.
>
> Yes, I understand.  This is definitely an important concern for full
> GuixSD containers.
>
> However, ‘eval-in-container’ would be much simpler, synchronous, and
> typically for short-lived processes.  So I guess the process that runs
> ‘eval-in-container’ would clone(2) (via ‘call-with-container’) and
> simply waitpid(2) the child process (which is PID 1 in its container).
>
> When the parent process gets a SIGINT or SIGHUP, it could send SIGKILL
> to the child, thereby terminating the container.
>
> Does that make sense?

Yes, crystal clear now.  Thanks for bearing with me.

>>> (The daemon works around that by running processes under a separate UID
>>> and doing kill(-1, SIGKILL) under that UID.)
>>
>> So, PID 1 in the build container forks and changes the UID or
>> something?
>
> Yes, with setuid (see build.cc:2180.)

Awesome, thank you.

My current container work is figuring out how to spawn interactive
processes in a container, such as bash or a Guile REPL.  Seems I need
to learn how to make a pty and maybe do some dup/dup2 calls to pipe
stdin in the parent process to the child container process.  Any
wisdom you have (or anyone else reading this) would be most welcome.
:)

- Dave


* Re: Update on GuixSD containers
  2015-06-12 15:08     ` Ludovic Courtès
  2015-06-13  3:41       ` Thompson, David
@ 2018-07-24 22:22       ` Christopher Lemmer Webber
  1 sibling, 0 replies; 14+ messages in thread
From: Christopher Lemmer Webber @ 2018-07-24 22:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Ludovic Courtès writes:

> Definitely.  Another application I’ve always wanted is a least-authority
> shell, like Plash [0].
>
> (Speaking of which, I just found Shill [1], which seems similar to Plash
> and even has a to-do item regarding package management [2] and is
> written in Racket; unfortunately it runs on FreeBSD, for Capsicum.)

As a side note, yesterday I learned about Capsicum for Linux:

  https://github.com/google/capsicum-linux

Unfortunately it has not seen commits this last year.  A shame; it would
really be nice to get such ocap support in GNU/Linux.

