unofficial mirror of guix-devel@gnu.org 
* [PATCH 0/15] Add preliminary support for Linux containers
@ 2015-07-06 13:01 David Thompson
  2015-07-07 10:28 ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: David Thompson @ 2015-07-06 13:01 UTC (permalink / raw)
  To: guix-devel

Greetings Guix hackers,

The following (large) patch set implements the basic building blocks of
a Linux container implementation in pure Scheme.  There's an awful lot of
marketing buzz around the word "container" these days due to Docker, but
containers are a generally useful concept that we can use (and already use
in the build daemon) to build isolated environments for, say, a web browser,
to prevent it from being able to read/write to every file in the user's
home directory.  Additionally, one can create special GuixSD
configurations suited for running in a container, for a lightweight
alternative to virtual machines.  Probably the best part of all of this
is that the interface is accessible to unprivileged users, with some
caveats.

The main interface to this functionality is the 'call-with-container'
procedure in the (gnu build linux-container) module:

    (call-with-container
      (lambda ()
        (sethostname "guix-0.8.3")))

There is also a 'container-excursion' procedure for evaluating code in
the context of an existing container process:

    (container-excursion 9999
      (lambda ()
        (mkdir "/foo")))

To run a command in the context of a running container, there's a new
'guix container exec' command:

    guix container exec 9999 /run/current-system/profile/bin/bash --login

If that's not exciting enough, how about launching a new development
environment inside a container?

    guix environment --container emacs

Or, how about launching a GuixSD system in a container?

    (use-modules (gnu))
    (use-package-modules linux)
    (use-service-modules networking)
    
    ;; Minimal GuixSD configuration suitable for a Linux container.
    (operating-system
      (host-name "container-test")
      (timezone "America/New_York")
      (locale "en_US.UTF-8")
      ;; Unused
      (bootloader (grub-configuration (device "/dev/sdX")))
      (file-systems %container-file-systems)
    
      (users (cons (user-account
                    (name "alice")
                    (comment "Bob's sister")
                    (group "users")
                    (supplementary-groups '("wheel" "audio" "video"))
                    (home-directory "/home/alice"))
                   %base-user-accounts))
    
      (packages (cons* strace %base-packages))
    
      (services (list (static-networking-service "lo" "127.0.0.1"
                                                 #:provision '(loopback)))))

Here's how you build it:

    guix system container container.scm

Now that the cool stuff is out of the way, here are the drawbacks:

There is currently no support for "control groups" (cgroups) or
networking via the Linux netlink interface.  Unprivileged users cannot
map more than a single uid/gid to the host system, so multi-user
containers *must* be created with root privileges.
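
For context, the single-ID restriction comes from the kernel's uid_map
and gid_map files: each line has the form "inside-ID outside-ID count",
and an unprivileged process may only write a single line that maps its
own ID.  A minimal Guile sketch of building such a line follows.
'single-id-mapping' is a hypothetical helper used for illustration only;
it is not necessarily how the patches do it:

```scheme
;; Sketch only: 'single-id-mapping' is a hypothetical helper, not Guix API.
(define (single-id-mapping inside-id outside-id)
  "Return a uid_map/gid_map line that maps OUTSIDE-ID on the host to
INSIDE-ID in the container, for exactly one ID."
  (format #f "~a ~a 1" inside-id outside-id))

;; Map the current host user to root (UID 0) inside the container.
(single-id-mapping 0 (getuid))
```

The resulting string is what would be written to /proc/PID/uid_map for
the container process; mapping a range of IDs needs privileges on the
host.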

Unfortunately, there is still one blocker bug that I know of: The unit
test for 'container-excursion' is non-deterministic.  Once out of every
10 to 20 test runs, it fails, but I can't figure out why.  For anyone
interested, here are some strace snippets:

Command:

    strace -q -e trace=readlink,setns,clone,chdir -f make check TESTS=tests/containers.scm

Failure:

    [pid 10608] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 10622
    [pid 10622] chdir("/")                  = 0
    [pid 10622] +++ exited with 0 +++
    [pid 10608] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=10622, si_status=0, si_utime=0, si_stime=0} ---
    [pid 10608] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 10623
    [pid 10608] readlink("/proc/10623/ns/user", "user:[4026532287]", 100) = 17
    [pid 10608] readlink("/proc/10623/ns/ipc", "ipc:[4026532290]", 100) = 16
    [pid 10608] readlink("/proc/10623/ns/uts", "uts:[4026532289]", 100) = 16
    [pid 10608] readlink("/proc/10623/ns/net", "net:[4026532292]", 100) = 16
    [pid 10608] readlink("/proc/10623/ns/pid", "pid:[4026532344]", 100) = 16
    [pid 10608] readlink("/proc/10623/ns/mnt", "mnt:[4026532288]", 100) = 16
    [pid 10608] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcc058aa10) = 10624
    [pid 10624] readlink("/proc/10624/ns/user", "user:[4026531837]", 4095) = 17
    [pid 10624] readlink("/proc/10623/ns/user", "user:[4026532287]", 4095) = 17
    [pid 10624] setns(16, 0)                = 0
    [pid 10624] readlink("/proc/10624/ns/ipc", "ipc:[4026531839]", 4095) = 16
    [pid 10624] readlink("/proc/10623/ns/ipc", "ipc:[4026532290]", 4095) = 16
    [pid 10624] setns(17, 0)                = 0
    [pid 10624] readlink("/proc/10624/ns/uts", "uts:[4026531838]", 4095) = 16
    [pid 10624] readlink("/proc/10623/ns/uts", "uts:[4026532289]", 4095) = 16
    [pid 10624] setns(18, 0)                = 0
    [pid 10624] readlink("/proc/10624/ns/net", "net:[4026531969]", 4095) = 16
    [pid 10624] readlink("/proc/10623/ns/net", "net:[4026532292]", 4095) = 16
    [pid 10624] setns(19, 0)                = 0
    [pid 10624] readlink("/proc/10624/ns/pid", "pid:[4026531836]", 4095) = 16
    [pid 10624] readlink("/proc/10623/ns/pid", "pid:[4026532344]", 4095) = 16
    [pid 10624] setns(20, 0)                = 0
    [pid 10624] readlink("/proc/10624/ns/mnt", "mnt:[4026531840]", 4095) = 16
    [pid 10624] readlink("/proc/10623/ns/mnt", "mnt:[4026532288]", 4095) = 16
    [pid 10624] setns(21, 0)                = 0
    [pid 10624] chdir("/")                  = 0
    [pid 10624] clone( <unfinished ...>
    [pid 10623] chdir("/")                  = 0
    [pid 10624] <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcc058aa10) = 10625
    [pid 10625] readlink("/proc/2/ns/user", 0x8ed0d0, 100) = -1 EACCES (Permission denied)

Success:

    [pid 12387] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 12402
    [pid 12402] chdir("/")                  = 0
    [pid 12402] +++ exited with 0 +++
    [pid 12387] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12402, si_status=0, si_utime=0, si_stime=0} ---
    [pid 12387] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 12403
    [pid 12387] readlink("/proc/12403/ns/user", "user:[4026532287]", 100) = 17
    [pid 12387] readlink("/proc/12403/ns/ipc", "ipc:[4026532290]", 100) = 16
    [pid 12387] readlink("/proc/12403/ns/uts", "uts:[4026532289]", 100) = 16
    [pid 12387] readlink("/proc/12403/ns/net", "net:[4026532292]", 100) = 16
    [pid 12387] readlink("/proc/12403/ns/pid", "pid:[4026532344]", 100) = 16
    [pid 12387] readlink("/proc/12403/ns/mnt", "mnt:[4026532288]", 100) = 16
    [pid 12387] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4d67124a10) = 12404
    [pid 12404] readlink("/proc/12404/ns/user", "user:[4026531837]", 4095) = 17
    [pid 12404] readlink("/proc/12403/ns/user", "user:[4026532287]", 4095) = 17
    [pid 12404] setns(16, 0)                = 0
    [pid 12404] readlink("/proc/12404/ns/ipc", "ipc:[4026531839]", 4095) = 16
    [pid 12404] readlink("/proc/12403/ns/ipc", "ipc:[4026532290]", 4095) = 16
    [pid 12404] setns(17, 0)                = 0
    [pid 12404] readlink("/proc/12404/ns/uts", "uts:[4026531838]", 4095) = 16
    [pid 12404] readlink("/proc/12403/ns/uts", "uts:[4026532289]", 4095) = 16
    [pid 12404] setns(18, 0)                = 0
    [pid 12404] readlink("/proc/12404/ns/net", "net:[4026531969]", 4095) = 16
    [pid 12404] readlink("/proc/12403/ns/net", "net:[4026532292]", 4095) = 16
    [pid 12404] setns(19, 0)                = 0
    [pid 12404] readlink("/proc/12404/ns/pid", "pid:[4026531836]", 4095) = 16
    [pid 12404] readlink("/proc/12403/ns/pid", "pid:[4026532344]", 4095) = 16
    [pid 12404] setns(20, 0)                = 0
    [pid 12404] readlink("/proc/12404/ns/mnt", "mnt:[4026531840]", 4095) = 16
    [pid 12404] readlink("/proc/12403/ns/mnt", "mnt:[4026532288]", 4095) = 16
    [pid 12404] setns(21, 0)                = 0
    [pid 12403] chdir("/")                  = 0
    [pid 12404] chdir("/")                  = 0
    [pid 12404] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4d67124a10) = 12406
    [pid 12406] readlink("/proc/2/ns/user", "user:[4026532287]", 100) = 17
    [pid 12406] readlink("/proc/2/ns/ipc", "ipc:[4026532290]", 100) = 16
    [pid 12406] readlink("/proc/2/ns/uts", "uts:[4026532289]", 100) = 16
    [pid 12406] readlink("/proc/2/ns/net", "net:[4026532292]", 100) = 16
    [pid 12406] readlink("/proc/2/ns/pid", "pid:[4026532344]", 100) = 16
    [pid 12406] readlink("/proc/2/ns/mnt", "mnt:[4026532288]", 100) = 16

In both cases, all of the 'setns' system calls succeed, but the EACCES
error leads me to believe that the excursion process is somehow *not* a
member of the necessary mount namespace.  I haven't seen this failure
when running the 'guix container exec' command, which uses
'container-excursion', so I suspect that there is a race condition to
address.

tl;dr: Containers!  There's a bug in a test!  Help!

Happy hacking,

-- 
David Thompson
GPG Key: 0FF1D807


* Re: [PATCH 0/15] Add preliminary support for Linux containers
  2015-07-06 13:01 [PATCH 0/15] Add preliminary support for Linux containers David Thompson
@ 2015-07-07 10:28 ` Ludovic Courtès
  2015-07-07 22:35   ` Thompson, David
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2015-07-07 10:28 UTC (permalink / raw)
  To: David Thompson; +Cc: guix-devel

Howdy!

In short, this is awesome!

Here are random notes I took as I was playing with all this.

David Thompson <dthompson2@worcester.edu> wrote:

> The main interface to this functionality is the 'call-with-container'
> procedure in the (gnu build linux-container) module:
>
>     (call-with-container
                          ^^
Missing list of mounts here.

>       (lambda ()
>         (sethostname "guix-0.8.3"))

Surprisingly, calling ‘getpid’ in the thunk returns the PID of the
parent (I was expecting it to return 1.)  Not sure why that is the
case.  I’m still amazed that this works as non-root, BTW.

There’s an issue when the parent’s Guile is not mapped into the
container’s file system: ‘use-modules’ forms and auto-loading will fail.
For instance, I did (use-modules (ice-9 ftw)) in the parent and called
‘scandir’ in the child, but that failed because of an attempt to
auto-load (ice-9 i18n), which is unavailable in the container.

> There is also a 'container-excursion' procedure for evaluating code in
> the context of an existing container process:
>
>     (container-excursion 9999
>       (lambda ()
>         (mkdir "/foo"))
>
> To run a command in the context of a running container, there's a new
> 'guix container exec' command for that:
>
>     guix container exec 9999 /run/current-system/profile/bin/bash --login

I failed to get that to work, both with ‘guix environment --container’
and ‘guix system container’.  For instance, with a GuixSD container
running as root as PID 29532, I got this:

--8<---------------cut here---------------start------------->8---
$ sudo ./pre-inst-env guix container exec 29532 ls
Backtrace:
In ice-9/boot-9.scm:
 155: 14 [catch #t #<catch-closure 1be1bc0> ...]
In unknown file:
   ?: 13 [apply-smob/1 #<catch-closure 1be1bc0>]
In ice-9/boot-9.scm:
  61: 12 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 11 [eval # #]
In ice-9/boot-9.scm:
2401: 10 [save-module-excursion #<procedure 1bff980 at ice-9/boot-9.scm:4045:3 ()>]
4050: 9 [#<procedure 1bff980 at ice-9/boot-9.scm:4045:3 ()>]
1724: 8 [%start-stack load-stack ...]
1729: 7 [#<procedure 1c16e70 ()>]
In unknown file:
   ?: 6 [primitive-load "/home/ludo/src/guix/scripts/guix"]
In guix/ui.scm:
1015: 5 [run-guix-command container "exec" "29532" "ls"]
In gnu/build/linux-container.scm:
  36: 4 [call-with-clean-exit #<procedure 240cb10 at gnu/build/linux-container.scm:278:3 ()>]
 279: 3 [#<procedure 240cb10 at gnu/build/linux-container.scm:278:3 ()>]
In ice-9/boot-9.scm:
 768: 2 [for-each #<procedure 2408000 at gnu/build/linux-container.scm:279:15 (ns)> ...]
 867: 1 [call-with-input-file "/proc/29779/ns/user" ...]
 867: 0 [call-with-input-file "/proc/29532/ns/user" ...]

ice-9/boot-9.scm:867:17: In procedure call-with-input-file:
ice-9/boot-9.scm:867:17: In procedure setns: 11 0: Invalid argument
--8<---------------cut here---------------end--------------->8---

What am I missing?

> If that's not exciting enough, how about launching a new development
> environment inside a container?
>
>     guix environment --container emacs

This is wonderful.  :-)

Currently, $PWD is mapped to /env in the container.  I think the default
should be to map $PWD to $PWD, because often build systems record
$top_srcdir and $top_builddir and would be confused if you work on a
given build tree both inside and outside the container.

Also, I think we should add --expose and --share as for ‘guix system’,
though that can come later.

Last, I wonder if there should be an option to use a UID other than 0.
Then perhaps we’d need to create fake /etc/group and /etc/passwd, as
done in build.cc.

WDYT?

> Here's how you build it:
>
>     guix system container container.scm

Very neat.  I wonder if that should automatically override the
‘file-systems’ field to be ‘%container-file-systems’, so that one can
reuse existing OS declarations unmodified.  WDYT?

> Unfortunately, there is still one blocker bug that I know of: The unit
> test for 'container-excursion' is non-deterministic.  Once out of every
> 10 to 20 test runs, it fails, but I can't figure out why.  For anyone
> interested, here are some strace snippets:

Ouch, this one looks more difficult.  :-)

I’ll comment on the individual patches.

Thank you for the nice code!

Ludo’.


* Re: [PATCH 0/15] Add preliminary support for Linux containers
  2015-07-07 10:28 ` Ludovic Courtès
@ 2015-07-07 22:35   ` Thompson, David
  2015-07-08 12:46     ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Thompson, David @ 2015-07-07 22:35 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Tue, Jul 7, 2015 at 6:28 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> Howdy!
>
> In short, this is awesome!
>
> Here are random notes I took as I was playing with all this.
>
> David Thompson <dthompson2@worcester.edu> wrote:
>
>> The main interface to this functionality is the 'call-with-container'
>> procedure in the (gnu build linux-container) module:
>>
>>     (call-with-container
>                           ^^
> Missing list of mounts here.

Oof.  Oversight while I was typing all this up.  Sorry!

>>       (lambda ()
>>         (sethostname "guix-0.8.3"))
>
> Surprisingly, calling ‘getpid’ in the thunk returns the PID of the
> parent (I was expecting it to return 1.)  Not sure why that is the
> case.  I’m still amazed that this works as non-root, BTW.

The first process created inside the PID namespace gets the honor of
being PID 1, not the process created with the 'clone' call.

For more information, see: https://lwn.net/Articles/532748/

> There’s an issue when the parent’s Guile is not mapped into the
> container’s file system: ‘use-modules’ forms and auto-loading will fail.
> For instance, I did (use-modules (ice-9 ftw)) in the parent and called
> ‘scandir’ in the child, but that failed because of an attempt to
> auto-load (ice-9 i18n), which is unavailable in the container.

Hmm, I don't know of a way to deal with that other than the user being
careful to bind-mount in the Guile modules they need.

Hmm, there are various reasons that EINVAL would be thrown.  Could you
readlink "those" files, that is /proc/<pid-outside-container>/ns/user
and /proc/<pid-inside-container>/ns/user, and tell me if the contents
are the same?  They shouldn't be, but this will eliminate one of the
possible causes of EINVAL.
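
A quick way to do that comparison from a Guile REPL is sketched below.
'namespace-link' and 'same-namespace?' are hypothetical helper names
made up for this example; only 'readlink', 'format', and 'getpid' are
standard Guile procedures:

```scheme
;; Sketch only: these helpers are illustrative, not part of Guix.
(define (namespace-link pid ns)
  "Return the target of /proc/PID/ns/NS, e.g. \"user:[4026532287]\"."
  (readlink (format #f "/proc/~a/ns/~a" pid ns)))

(define (same-namespace? pid1 pid2 ns)
  "Return #t if PID1 and PID2 are members of the same namespace of
type NS, where NS is a string such as \"user\" or \"mnt\"."
  (string=? (namespace-link pid1 ns)
            (namespace-link pid2 ns)))

;; Example: compare the current process with itself.
(same-namespace? (getpid) (getpid) "user")
```

Replacing the second PID with the PID of the container process as seen
from the host should give #f for the "user" namespace if the container
really has its own user namespace.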

>> If that's not exciting enough, how about launching a new development
>> environment inside a container?
>>
>>     guix environment --container emacs
>
> This is wonderful.  :-)
>
> Currently, $PWD is mapped to /env in the container.  I think the default
> should be to map $PWD to $PWD, because often build systems record
> $top_srcdir and $top_builddir and would be confused if you work on a
> given build tree both inside and outside the container.

Sure, I didn't think of that.  I will change it.

> Also, I think we should add --expose and --share as for ‘guix system’,
> though that can come later.

Yes, I also really want that, but it's a task for another time.

> Last, I wonder if there should be an option to use a UID other than 0.
> Then perhaps we’d need to create fake /etc/group and /etc/passwd, as
> done in build.cc.
>
> WDYT?
>
>> Here's how you build it:
>>
>>     guix system container container.scm
>
> Very neat.  I wonder if that should automatically override the
> ‘file-systems’ field to be ‘%container-file-systems’, so that one can
> reuse existing OS declarations unmodified.  WDYT?

This would be a better user experience, for sure.  I thought about
this, but I don't know how to do it in a way that isn't surprising or
just broken.  Ideas?

>> Unfortunately, there is still one blocker bug that I know of: The unit
>> test for 'container-excursion' is non-deterministic.  Once out of every
>> 10 to 20 test runs, it fails, but I can't figure out why.  For anyone
>> interested, here are some strace snippets:
>
> Ouch, this one looks more difficult.  :-)

Yes, I have no idea what's wrong.  Sapping... my... hack... energy...

> I’ll comment on the individual patches.

Much appreciated.

> Thank you for the nice code!

Thanks for sifting through all of this code!

- Dave


* Re: [PATCH 0/15] Add preliminary support for Linux containers
  2015-07-07 22:35   ` Thompson, David
@ 2015-07-08 12:46     ` Ludovic Courtès
  2015-07-08 13:00       ` Thompson, David
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2015-07-08 12:46 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> wrote:

> On Tue, Jul 7, 2015 at 6:28 AM, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>>>       (lambda ()
>>>         (sethostname "guix-0.8.3"))
>>
>> Surprisingly, calling ‘getpid’ in the thunk returns the PID of the
>> parent (I was expecting it to return 1.)  Not sure why that is the
>> case.  I’m still amazed that this works as non-root, BTW.
>
> The first process created inside the PID namespace gets the honor of
> being PID 1, not the process created with the 'clone' call.
>
> For more information, see: https://lwn.net/Articles/532748/

To me, the thunk above is just like ‘childFunc’ in
<https://lwn.net/Articles/533492/>–i.e., it’s the procedure that ‘clone’
calls in the first child process of the new PID name space.

What am I missing?

>> There’s an issue when the parent’s Guile is not mapped into the
>> container’s file system: ‘use-modules’ forms and auto-loading will fail.
>> For instance, I did (use-modules (ice-9 ftw)) in the parent and called
>> ‘scandir’ in the child, but that failed because of an attempt to
>> auto-load (ice-9 i18n), which is unavailable in the container.
>
> Hmm, I don't know of a way to deal with that other than the user being
> careful to bind-mount in the Guile modules they need.

Right.  Maybe the best we can do is to add a word of caution in the
docstring or something.

> Hmm, there's various reasons that EINVAL would be thrown.  Could you
> readlink "those" files, that is /proc/<pid-outside-container>/ns/user
> and /proc/<pid-inside-container>/ns/user, and tell me if the contents
> are the same?  They shouldn't be, but this will eliminate one of the
> possible causes of EINVAL.

It turns out I was targeting the wrong PID.

>> Also, I think we should add --expose and --share as for ‘guix system’,
>> though that can come later.
>
> Yes, I also really want that, but it's a task for another time.

Sure.

>>> Here's how you build it:
>>>
>>>     guix system container container.scm
>>
>> Very neat.  I wonder if that should automatically override the
>> ‘file-systems’ field to be ‘%container-file-systems’, so that one can
>> reuse existing OS declarations unmodified.  WDYT?
>
> This would be a better user experience, for sure.  I thought about
> this, but I don't know how to do it in a way that isn't surprising or
> just broken.  Ideas?

IMO it’d be fine to simply override the subset of ‘file-systems’ that
clashes with ‘%container-file-systems’, similar to what
‘virtualized-operating-system’ does in (gnu system vm).
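
Roughly, the idea could look like the sketch below.  It uses plain
(mount-point . device) pairs as stand-ins for real <file-system>
records, and both '%example-container-file-systems' and
'override-file-systems' are made-up names for illustration; this is not
the actual code in (gnu system vm):

```scheme
(use-modules (srfi srfi-1))             ;for 'remove'

;; Stand-ins for <file-system> records: (mount-point . device) pairs.
;; The entries here are assumptions, not the real %container-file-systems.
(define %example-container-file-systems
  '(("/proc" . "none")
    ("/sys"  . "none")
    ("/dev"  . "none")))

(define (override-file-systems user-file-systems container-file-systems)
  "Return CONTAINER-FILE-SYSTEMS followed by the entries of
USER-FILE-SYSTEMS whose mount point does not clash with any of them."
  (append container-file-systems
          (remove (lambda (fs)
                    ;; Drop user entries that share a mount point with
                    ;; a container file system.
                    (assoc (car fs) container-file-systems))
                  user-file-systems)))

(override-file-systems '(("/" . "/dev/sda1")
                         ("/proc" . "proc")
                         ("/home" . "/dev/sda2"))
                       %example-container-file-systems)
```

With this approach the user's "/proc" entry is dropped in favor of the
container's, while "/" and "/home" are kept as declared.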

WDYT?

Thanks,
Ludo’.


* Re: [PATCH 0/15] Add preliminary support for Linux containers
  2015-07-08 12:46     ` Ludovic Courtès
@ 2015-07-08 13:00       ` Thompson, David
  2015-07-08 21:59         ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Thompson, David @ 2015-07-08 13:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Wed, Jul 8, 2015 at 8:46 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> "Thompson, David" <dthompson2@worcester.edu> wrote:
>
>> On Tue, Jul 7, 2015 at 6:28 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>
> [...]
>
>>>>       (lambda ()
>>>>         (sethostname "guix-0.8.3"))
>>>
>>> Surprisingly, calling ‘getpid’ in the thunk returns the PID of the
>>> parent (I was expecting it to return 1.)  Not sure why that is the
>>> case.  I’m still amazed that this works as non-root, BTW.
>>
>> The first process created inside the PID namespace gets the honor of
>> being PID 1, not the process created with the 'clone' call.
>>
>> For more information, see: https://lwn.net/Articles/532748/
>
> To me, the thunk above is just like ‘childFunc’ in
> <https://lwn.net/Articles/533492/>–i.e., it’s the procedure that ‘clone’
> calls in the first child process of the new PID name space.
>
> What am I missing?

It's non-intuitive because PID namespaces are given special treatment.
The cloned process is like PID 1 in the sense that if you fork, the
new process is PID 2.  However, if you call 'getpid' in the cloned
process, it returns the PID in the context of the parent PID
namespace, and you are expecting PID 1.

In that example from LWN, 'childFunc' calls 'execvp', and *that* new
process becomes PID 1 (and 'getpid' agrees).  This is the usual
pattern I see in all container implementations:  The process that
calls clone sets up the environment and then execs the real init
system.

Is it more clear now?

>>> There’s an issue when the parent’s Guile is not mapped into the
>>> container’s file system: ‘use-modules’ forms and auto-loading will fail.
>>> For instance, I did (use-modules (ice-9 ftw)) in the parent and called
>>> ‘scandir’ in the child, but that failed because of an attempt to
>>> auto-load (ice-9 i18n), which is unavailable in the container.
>>
>> Hmm, I don't know of a way to deal with that other than the user being
>> careful to bind-mount in the Guile modules they need.
>
> Right.  Maybe the best we can do is to add a word of caution in the
> docstring or something.

Okay, I will do that.

>> Hmm, there's various reasons that EINVAL would be thrown.  Could you
>> readlink "those" files, that is /proc/<pid-outside-container>/ns/user
>> and /proc/<pid-inside-container>/ns/user, and tell me if the contents
>> are the same?  They shouldn't be, but this will eliminate one of the
>> possible causes of EINVAL.
>
> It turns out I was targeting the wrong PID.

Glad it's not totally broken on machines other than mine. :)

>>> Also, I think we should add --expose and --share as for ‘guix system’,
>>> though that can come later.
>>
>> Yes, I also really want that, but it's a task for another time.
>
> Sure.
>
>>>> Here's how you build it:
>>>>
>>>>     guix system container container.scm
>>>
>>> Very neat.  I wonder if that should automatically override the
>>> ‘file-systems’ field to be ‘%container-file-systems’, so that one can
>>> reuse existing OS declarations unmodified.  WDYT?
>>
>> This would be a better user experience, for sure.  I thought about
>> this, but I don't know how to do it in a way that isn't surprising or
>> just broken.  Ideas?
>
> IMO it’d be fine to simply override the subset of ‘file-systems’ that
> clashes with ‘%container-file-systems’, similar to what
> ‘virtualized-operating-system’ does in (gnu system vm).

I will implement that.

Thanks!

- Dave


* Re: [PATCH 0/15] Add preliminary support for Linux containers
  2015-07-08 13:00       ` Thompson, David
@ 2015-07-08 21:59         ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2015-07-08 21:59 UTC (permalink / raw)
  To: Thompson, David; +Cc: guix-devel

"Thompson, David" <dthompson2@worcester.edu> wrote:

> On Wed, Jul 8, 2015 at 8:46 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>> "Thompson, David" <dthompson2@worcester.edu> wrote:
>>
>>> On Tue, Jul 7, 2015 at 6:28 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>>
>> [...]
>>
>>>>>       (lambda ()
>>>>>         (sethostname "guix-0.8.3"))
>>>>
>>>> Surprisingly, calling ‘getpid’ in the thunk returns the PID of the
>>>> parent (I was expecting it to return 1.)  Not sure why that is the
>>>> case.  I’m still amazed that this works as non-root, BTW.
>>>
>>> The first process created inside the PID namespace gets the honor of
>>> being PID 1, not the process created with the 'clone' call.
>>>
>>> For more information, see: https://lwn.net/Articles/532748/
>>
>> To me, the thunk above is just like ‘childFunc’ in
>> <https://lwn.net/Articles/533492/>–i.e., it’s the procedure that ‘clone’
>> calls in the first child process of the new PID name space.
>>
>> What am I missing?
>
> It's non-intuitive because PID namespaces are given special treatment.
> The cloned process is like PID 1 in the sense that if you fork, the
> new process is PID 2.  However, if you call 'getpid' in the cloned
> process, it returns the PID in the context of the parent PID
> namespace, and you are expecting PID 1.
>
> In that example from LWN, 'childFunc' calls 'execvp', and *that* new
> process becomes PID 1 (and 'getpid' agrees).  This is the usual
> pattern I see in all container implementations:  The process that
> calls clone sets up the environment and then execs the real init
> system.
>
> Is it more clear now?

Yes, indeed.  The weird part is that ‘exec’ does not create a new
process, so it’s not supposed to change the return value of ‘getpid’.
But I guess it’s just an artifact of the whole name space hack.  ;-)

Thanks!

Ludo’.


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
