From: David Thompson <dthompson2@worcester.edu>
To: guix-devel@gnu.org
Subject: [PATCH 0/15] Add preliminary support for Linux containers
Date: Mon, 06 Jul 2015 09:01:43 -0400
Message-ID: <87lhetcudk.fsf@izanagi.i-did-not-set--mail-host-address--so-tickle-me>
Greetings Guix hackers,

The following (large) patch set implements the basic building blocks of
a Linux container implementation in pure Scheme. There's an awful lot of
marketing buzz around the word "container" these days due to Docker, but
containers are a generally useful concept that we can use (and already
use in the build daemon) to build isolated environments for, say, a web
browser, to prevent it from being able to read or write every file in
the user's home directory. Additionally, one can create special GuixSD
configurations suited for running in a container, as a lightweight
alternative to virtual machines. Probably the best part of all of this
is that the interface is accessible to unprivileged users, with some
caveats.
The main interface to this functionality is the 'call-with-container'
procedure in the (gnu build linux-container) module:
(call-with-container
  (lambda ()
    (sethostname "guix-0.8.3")))
There is also a 'container-excursion' procedure for evaluating code in
the context of an existing container process:
(container-excursion 9999
  (lambda ()
    (mkdir "/foo")))
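For readers unfamiliar with how a process joins an existing container, here is a rough sketch inferred from the strace output later in this message. It is not the code from the patch set: the names '%setns', 'namespace-file', 'join-namespaces', 'wait-for-exit' and 'excursion-sketch' are hypothetical, and the FFI binding assumes a libc that exports setns(2). Joining namespaces generally requires the appropriate privileges over the target process.

```scheme
(use-modules (system foreign)
             (ice-9 match))

;; Namespace order taken from the strace output: "user" first, "mnt" last.
(define %namespaces '("user" "ipc" "uts" "net" "pid" "mnt"))

;; Hypothetical binding to setns(2) through Guile's FFI.
(define %setns
  (pointer->procedure int
                      (dynamic-func "setns" (dynamic-link))
                      (list int int)))

(define (namespace-file pid ns)
  (format #f "/proc/~a/ns/~a" pid ns))

(define (join-namespaces pid)
  ;; Open every namespace file before switching, since the host's /proc
  ;; paths may no longer resolve once the mount namespace changes.
  (let ((fds (map (lambda (ns)
                    (open-fdes (namespace-file pid ns) O_RDONLY))
                  %namespaces)))
    (for-each (lambda (fd)
                (unless (zero? (%setns fd 0))
                  (error "setns failed for file descriptor" fd))
                (close-fdes fd))
              fds)))

(define (wait-for-exit pid)
  ;; Exit value of PID, or 1 if it did not exit normally.
  (or (status:exit-val (cdr (waitpid pid))) 1))

(define (excursion-sketch pid thunk)
  (match (primitive-fork)
    (0
     (catch #t
       (lambda ()
         (join-namespaces pid)
         (chdir "/")
         ;; Joining a PID namespace only affects children created
         ;; afterwards, so fork once more to actually run THUNK in it.
         (match (primitive-fork)
           (0 (thunk) (primitive-exit 0))
           (child (primitive-exit (wait-for-exit child)))))
       (lambda _ (primitive-exit 1))))
    (child (wait-for-exit child))))
```

The second fork corresponds to the last 'clone' call made by the excursion process in the traces.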
To run a command in the context of a running container, there's a new
'guix container exec' command:
guix container exec 9999 /run/current-system/profile/bin/bash --login
If that's not exciting enough, how about launching a new development
environment inside a container?
guix environment --container emacs
Or, how about launching a GuixSD system in a container?
(use-modules (gnu))
(use-package-modules linux)
(use-service-modules networking)

;; Minimal GuixSD configuration suitable for a Linux container.
(operating-system
  (host-name "container-test")
  (timezone "America/New_York")
  (locale "en_US.UTF-8")
  ;; Unused
  (bootloader (grub-configuration (device "/dev/sdX")))
  (file-systems %container-file-systems)
  (users (cons (user-account
                (name "alice")
                (comment "Bob's sister")
                (group "users")
                (supplementary-groups '("wheel" "audio" "video"))
                (home-directory "/home/alice"))
               %base-user-accounts))
  (packages (cons* strace %base-packages))
  (services (list (static-networking-service "lo" "127.0.0.1"
                                             #:provision '(loopback)))))
Here's how you build it:
guix system container container.scm
Now that the cool stuff is out of the way, here are the drawbacks:
There is currently no support for "control groups" (cgroups) or
networking via the Linux netlink interface. Unprivileged users cannot
map more than a single uid/gid to the host system, so multi-user
containers *must* be created with root privileges.
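To make the single uid/gid limitation concrete, the following is a minimal sketch of what such a mapping looks like at the /proc level. The procedure names 'id-map-entry' and 'write-single-id-maps' are hypothetical, and mapping the host IDs to 0 inside the container is an assumption made for illustration. On Linux 3.19 and later, an unprivileged process must write "deny" to the 'setgroups' file before it may write 'gid_map'.

```scheme
;; One line of a uid_map/gid_map file: "<inside> <outside> <count>".
(define (id-map-entry inside outside count)
  (format #f "~a ~a ~a\n" inside outside count))

(define (write-single-id-maps pid uid gid)
  ;; Map host UID to 0 inside the user namespace of PID (assumption:
  ;; the container user appears as root inside).
  (call-with-output-file (format #f "/proc/~a/uid_map" pid)
    (lambda (port)
      (display (id-map-entry 0 uid 1) port)))
  ;; Required for unprivileged callers on Linux >= 3.19.
  (call-with-output-file (format #f "/proc/~a/setgroups" pid)
    (lambda (port)
      (display "deny" port)))
  ;; Map host GID to 0 inside the user namespace.
  (call-with-output-file (format #f "/proc/~a/gid_map" pid)
    (lambda (port)
      (display (id-map-entry 0 gid 1) port))))
```

An unprivileged process may write only one such line, mapping its own uid or gid, which is why a container with several distinct users needs root on the host.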
Unfortunately, there is still one blocker bug that I know of: The unit
test for 'container-excursion' is non-deterministic. Roughly once in
every 10 to 20 test runs it fails, and I can't figure out why. For
anyone interested, here are some strace snippets:
Command:
strace -q -e trace=readlink,setns,clone,chdir -f make check TESTS=tests/containers.scm
Failure:
[pid 10608] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 10622
[pid 10622] chdir("/") = 0
[pid 10622] +++ exited with 0 +++
[pid 10608] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=10622, si_status=0, si_utime=0, si_stime=0} ---
[pid 10608] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 10623
[pid 10608] readlink("/proc/10623/ns/user", "user:[4026532287]", 100) = 17
[pid 10608] readlink("/proc/10623/ns/ipc", "ipc:[4026532290]", 100) = 16
[pid 10608] readlink("/proc/10623/ns/uts", "uts:[4026532289]", 100) = 16
[pid 10608] readlink("/proc/10623/ns/net", "net:[4026532292]", 100) = 16
[pid 10608] readlink("/proc/10623/ns/pid", "pid:[4026532344]", 100) = 16
[pid 10608] readlink("/proc/10623/ns/mnt", "mnt:[4026532288]", 100) = 16
[pid 10608] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcc058aa10) = 10624
[pid 10624] readlink("/proc/10624/ns/user", "user:[4026531837]", 4095) = 17
[pid 10624] readlink("/proc/10623/ns/user", "user:[4026532287]", 4095) = 17
[pid 10624] setns(16, 0) = 0
[pid 10624] readlink("/proc/10624/ns/ipc", "ipc:[4026531839]", 4095) = 16
[pid 10624] readlink("/proc/10623/ns/ipc", "ipc:[4026532290]", 4095) = 16
[pid 10624] setns(17, 0) = 0
[pid 10624] readlink("/proc/10624/ns/uts", "uts:[4026531838]", 4095) = 16
[pid 10624] readlink("/proc/10623/ns/uts", "uts:[4026532289]", 4095) = 16
[pid 10624] setns(18, 0) = 0
[pid 10624] readlink("/proc/10624/ns/net", "net:[4026531969]", 4095) = 16
[pid 10624] readlink("/proc/10623/ns/net", "net:[4026532292]", 4095) = 16
[pid 10624] setns(19, 0) = 0
[pid 10624] readlink("/proc/10624/ns/pid", "pid:[4026531836]", 4095) = 16
[pid 10624] readlink("/proc/10623/ns/pid", "pid:[4026532344]", 4095) = 16
[pid 10624] setns(20, 0) = 0
[pid 10624] readlink("/proc/10624/ns/mnt", "mnt:[4026531840]", 4095) = 16
[pid 10624] readlink("/proc/10623/ns/mnt", "mnt:[4026532288]", 4095) = 16
[pid 10624] setns(21, 0) = 0
[pid 10624] chdir("/") = 0
[pid 10624] clone( <unfinished ...>
[pid 10623] chdir("/") = 0
[pid 10624] <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbcc058aa10) = 10625
[pid 10625] readlink("/proc/2/ns/user", 0x8ed0d0, 100) = -1 EACCES (Permission denied)
Success:
[pid 12387] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 12402
[pid 12402] chdir("/") = 0
[pid 12402] +++ exited with 0 +++
[pid 12387] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=12402, si_status=0, si_utime=0, si_stime=0} ---
[pid 12387] clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 12403
[pid 12387] readlink("/proc/12403/ns/user", "user:[4026532287]", 100) = 17
[pid 12387] readlink("/proc/12403/ns/ipc", "ipc:[4026532290]", 100) = 16
[pid 12387] readlink("/proc/12403/ns/uts", "uts:[4026532289]", 100) = 16
[pid 12387] readlink("/proc/12403/ns/net", "net:[4026532292]", 100) = 16
[pid 12387] readlink("/proc/12403/ns/pid", "pid:[4026532344]", 100) = 16
[pid 12387] readlink("/proc/12403/ns/mnt", "mnt:[4026532288]", 100) = 16
[pid 12387] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4d67124a10) = 12404
[pid 12404] readlink("/proc/12404/ns/user", "user:[4026531837]", 4095) = 17
[pid 12404] readlink("/proc/12403/ns/user", "user:[4026532287]", 4095) = 17
[pid 12404] setns(16, 0) = 0
[pid 12404] readlink("/proc/12404/ns/ipc", "ipc:[4026531839]", 4095) = 16
[pid 12404] readlink("/proc/12403/ns/ipc", "ipc:[4026532290]", 4095) = 16
[pid 12404] setns(17, 0) = 0
[pid 12404] readlink("/proc/12404/ns/uts", "uts:[4026531838]", 4095) = 16
[pid 12404] readlink("/proc/12403/ns/uts", "uts:[4026532289]", 4095) = 16
[pid 12404] setns(18, 0) = 0
[pid 12404] readlink("/proc/12404/ns/net", "net:[4026531969]", 4095) = 16
[pid 12404] readlink("/proc/12403/ns/net", "net:[4026532292]", 4095) = 16
[pid 12404] setns(19, 0) = 0
[pid 12404] readlink("/proc/12404/ns/pid", "pid:[4026531836]", 4095) = 16
[pid 12404] readlink("/proc/12403/ns/pid", "pid:[4026532344]", 4095) = 16
[pid 12404] setns(20, 0) = 0
[pid 12404] readlink("/proc/12404/ns/mnt", "mnt:[4026531840]", 4095) = 16
[pid 12404] readlink("/proc/12403/ns/mnt", "mnt:[4026532288]", 4095) = 16
[pid 12404] setns(21, 0) = 0
[pid 12403] chdir("/") = 0
[pid 12404] chdir("/") = 0
[pid 12404] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4d67124a10) = 12406
[pid 12406] readlink("/proc/2/ns/user", "user:[4026532287]", 100) = 17
[pid 12406] readlink("/proc/2/ns/ipc", "ipc:[4026532290]", 100) = 16
[pid 12406] readlink("/proc/2/ns/uts", "uts:[4026532289]", 100) = 16
[pid 12406] readlink("/proc/2/ns/net", "net:[4026532292]", 100) = 16
[pid 12406] readlink("/proc/2/ns/pid", "pid:[4026532344]", 100) = 16
[pid 12406] readlink("/proc/2/ns/mnt", "mnt:[4026532288]", 100) = 16
In both cases, all of the 'setns' system calls succeed, but the EACCES
error leads me to believe that the excursion process is somehow *not* a
member of the necessary mount namespace. I haven't seen this failure
when running the 'guix container exec' command, which uses
'container-excursion', so I suspect that there may be a race
condition to address.
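For anyone who wants to poke at this, a small diagnostic along the following lines might help narrow it down. It is only a sketch: 'namespace-link' and 'namespace-mismatches' are hypothetical names, not procedures from the patch set. It performs the same readlink comparisons visible in the traces and reports which namespaces differ or cannot be read.

```scheme
(use-modules (srfi srfi-1))

(define %namespaces '("user" "ipc" "uts" "net" "pid" "mnt"))

(define (namespace-link pid ns)
  ;; Return a string such as "mnt:[4026532288]", or #f when the link
  ;; cannot be read (for example on EACCES).
  (catch 'system-error
    (lambda ()
      (readlink (format #f "/proc/~a/ns/~a" pid ns)))
    (lambda args #f)))

(define (namespace-mismatches pid1 pid2)
  ;; Return a list of (NS LINK1 LINK2) for every namespace where the two
  ;; processes differ or where either link is unreadable.
  (filter-map (lambda (ns)
                (let ((a (namespace-link pid1 ns))
                      (b (namespace-link pid2 ns)))
                  (and (not (and a b (string=? a b)))
                       (list ns a b))))
              %namespaces))
```

Calling (namespace-mismatches container-pid excursion-pid) from a process that can read both /proc entries should return the empty list when the excursion process has joined every namespace; an entry containing #f points at a link that could not be read.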
tl;dr: Containers! There's a bug in a test! Help!
Happy hacking,
--
David Thompson
GPG Key: 0FF1D807