unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Latest guile-daemon changes and bewilderment
@ 2017-07-24 10:18 Caleb Ristvedt
  2017-07-25  8:44 ` Ludovic Courtès
  2017-07-25 20:05 ` Mark H Weaver
  0 siblings, 2 replies; 6+ messages in thread
From: Caleb Ristvedt @ 2017-07-24 10:18 UTC (permalink / raw)
  To: guix-devel

Hello guix!

After a pileup of changes and testing, I finally got around to pushing
my changes to the guile-daemon to the branch on savannah. The big
addition is half-working derivation-building (just one at a time, for
now, and only chroot builds for now) in the form of
guix/store/build-derivations.scm. More on the not-working half later.

This includes reference-scanning, which I think pretty much works, but
which I regret to say I spent too much time thinking about. Premature
optimization and all that. It takes a rather different approach to the
way the C++ code does it - it uses a trie to recognize references out of
a list of possibilities (namely, anything thrown in the build
environment's store). The idea is that additional pre-calculation can be
performed on each node and it can become a sort of branching skip table
of the sort used in that one string-matching algorithm whose name I
can't remember. At least, that's the way it works in my head - the parts
to make it fast haven't been implemented yet. I should probably split
the scanner out into a separate module.

The build environment currently is isolated by creating new pid, mount,
ipc, and uts namespaces, remounting everything as MS_PRIVATE, using
pivot-root to make a generated directory the new root, and unmounting
the old root.

So, does it work? Well... sort of. Which brings me to a really bizarre
problem to debug. *deep breath*

The way I've been going about testing this has been to first delete
test-tmp completely, then run (as root, since I'm testing chroot stuff):

./test-env guix build --dry-run hello
./test-env guile
scheme@(guile-user)> (use-modules (ice-9 readline))
scheme@(guile-user)> (activate-readline)
scheme@(guile-user)> ,m (guix store build-derivations)
scheme@(guile-user)> (build-derivation
                      (read-derivation-from-file
                       "path/to/guix/test-tmp/store/...hello-2.10.drv"))

This eventually fails trying to build the bootstrap make. This is the
only error message I get: http://paste.lisp.org/display/351519. So it
seems that some low-level guile thing is failing. I know from
print-statement debugging that the exact place it fails is when the
gnu-build-system calls apply to start the set-paths phase, but have no
indication of why this is. It doesn't make it to the set-paths
procedure, though. Perhaps someone more familiar with guile's internals
knows more?

Builder: http://paste.lisp.org/display/351521.
Environment variables: http://paste.lisp.org/display/351523.

You'll notice that among the environment variables is
GUILE_AUTO_COMPILE=0. This is actually something I added myself because
for some reason the bootstrap guile wasn't honoring the
--no-auto-compile flag, but does honor the environment
variable. Strange.

Stranger still, attempting to run the bootstrap guile interactively from
the build environment (replacing the execl with a system* so the build
environment can be explored from the REPL) causes it to print the
copyright notice and immediately exit with an exit code of 0. So
debugging that way isn't really an option. Notably, running the same
guile from outside of the build environment works fine.

Copy+pasting the builder code into the guile-2.2 REPL in the build
environment after adding the module-import to the load path works just
fine.

Confusing. I know there are several non-essential parts of the build
process that aren't implemented yet, but none of them are used by this
particular derivation as far as I can tell.

- reepca

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Latest guile-daemon changes and bewilderment
  2017-07-24 10:18 Latest guile-daemon changes and bewilderment Caleb Ristvedt
@ 2017-07-25  8:44 ` Ludovic Courtès
  2017-07-28 11:19   ` Caleb Ristvedt
  2017-07-25 20:05 ` Mark H Weaver
  1 sibling, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2017-07-25  8:44 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

Hi reepca!

Caleb Ristvedt <caleb.ristvedt@cune.org> skribis:

> After a pileup of changes and testing, I finally got around to pushing
> my changes to the guile-daemon to the branch on savannah. The big
> addition is half-working derivation-building (just one at a time, for
> now, and only chroot builds for now) in the form of
> guix/store/build-derivations.scm. More on the not-working half later.
>
> This includes reference-scanning, which I think pretty much works,

Woow!

> but which I regret to say I spent too much time thinking
> about. Premature optimization and all that. It takes a rather
> different approach to the way the C++ code does it - it uses a trie to
> recognize references out of a list of possibilities (namely, anything
> thrown in the build environment's store). The idea is that additional
> pre-calculation can be performed on each node and it can become a sort
> of branching skip table of the sort used in that one string-matching
> algorithm whose name I can't remember. At least, that's the way it
> works in my head - the parts to make it fast haven't been implemented
> yet. I should probably split the scanner out into a separate module.

I haven’t put as much thought into the scanning algorithm as you did, so
I’ll defer to you.  Great that you already have a good prototype of
that!

And yes, I agree it’s probably better to have the scanner in its own
module.

> The build environment currently is isolated by creating new pid, mount,
> ipc, and uts namespaces, remounting everything as MS_PRIVATE, using
> pivot-root to make a generated directory the new root, and unmounting
> the old root.

Sounds good.

> So, does it work? Well... sort of. Which brings me to a really bizarre
> problem to debug. *deep breath*
>
> The way I've been going about testing this has been to first delete
> test-tmp completely, then run (as root, since I'm testing chroot stuff):
>
> ./test-env guix build --dry-run hello
> ./test-env guile
> scheme@(guile-user)> (use-modules (ice-9 readline))
> scheme@(guile-user)> (activate-readline)
> scheme@(guile-user)> ,m (guix store build-derivations)
> scheme@(guile-user)> (build-derivation
>                       (read-derivation-from-file
>                        "path/to/guix/test-tmp/store/...hello-2.10.drv"))
>
> This eventually fails trying to build the bootstrap make. This is the
> only error message I get: http://paste.lisp.org/display/351519. So it
> seems that some low-level guile thing is failing. I know from
> print-statement debugging that the exact place it fails is when the
> gnu-build-system calls apply to start the set-paths phase, but have no
> indication of why this is. It doesn't make it to the set-paths
> procedure, though. Perhaps someone more familiar with guile's internals
> knows more?
>
> Builder: http://paste.lisp.org/display/351521.
> Environment variables: http://paste.lisp.org/display/351523.

Is there a line above or below the backtrace mentioning the uncaught
exception?  Could you ‘strace -f’ the daemon process?

Another test would be to manually run the very command that appears in
gnu-make-boot0.drv “by hand”, first outside of a chroot, then within a
chroot with /gnu/store bind-mounted in it (or you could use PRoot for
that.)


BTW, I see the code uses ‘clone’ directly.  It would be safer to use
‘call-with-container’, which already handles bind mounts, non-local
exits, and so on.  Would it be an option?

Regarding scanning, (guix build grafts) contains a special-purpose
reference scanner that Mark carefully optimized; it might be worth
looking at.

> You'll notice that among the environment variables is
> GUILE_AUTO_COMPILE=0. This is actually something I added myself because
> for some reason the bootstrap guile wasn't honoring the
> --no-auto-compile flag, but does honor the environment
> variable. Strange.

Yeah, we shouldn’t add this environment variable to derivations because
that would influence everything that goes in there.

Could it be that ‘argv’ lacked the executable name as argv[0], and thus
the argument list was shifted to the left?

Let’s maybe try to further debug this interactively on IRC.

Thanks for the update!

Ludo’.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Latest guile-daemon changes and bewilderment
  2017-07-24 10:18 Latest guile-daemon changes and bewilderment Caleb Ristvedt
  2017-07-25  8:44 ` Ludovic Courtès
@ 2017-07-25 20:05 ` Mark H Weaver
  1 sibling, 0 replies; 6+ messages in thread
From: Mark H Weaver @ 2017-07-25 20:05 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

Caleb Ristvedt <caleb.ristvedt@cune.org> writes:

> This includes reference-scanning, which I think pretty much works, but
> which I regret to say I spent too much time thinking about. Premature
> optimization and all that. It takes a rather different approach to the
> way the C++ code does it - it uses a trie to recognize references out of
> a list of possibilities (namely, anything thrown in the build
> environment's store). The idea is that additional pre-calculation can be
> performed on each node and it can become a sort of branching skip table
> of the sort used in that one string-matching algorithm whose name I
> can't remember.

Maybe you're thinking of the Boyer-Moore string search algorithm:

  https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

You might also want to take a look at the simple algorithm I cooked up
to make grafting faster, in (guix build graft).  In theory it might be
improved by building some kind of lookup table based on the set of
expected hashes, as you suggest, but I wasn't sure if it would be a net
win, given the added code complexity and larger lookup tables.  We can
skip quite a lot simply based on the fact that Nix hashes include only
1/8 of the possible byte values, and exclude the most common letters
from English text.  More investigation is needed!

    Regards,
      Mark

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Latest guile-daemon changes and bewilderment
  2017-07-25  8:44 ` Ludovic Courtès
@ 2017-07-28 11:19   ` Caleb Ristvedt
  2017-08-01 12:46     ` Ludovic Courtès
  2017-08-20  6:30     ` Ricardo Wurmus
  0 siblings, 2 replies; 6+ messages in thread
From: Caleb Ristvedt @ 2017-07-28 11:19 UTC (permalink / raw)
  Cc: guix-devel


> Is there a line above or below the backtrace mentioning the uncaught
> exception?  Could you ‘strace -f’ the daemon process?

No, no line above or below. Very strange.

> BTW, I see the code uses ‘clone’ directly.  It would be safer to use
> ‘call-with-container’, which already handles bind mounts, non-local
> exits, and so on.  Would it be an option?

There are a couple of issues with using call-with-container. In
decreasing order of perceived difficulty to solve:

1. Copying the output from the build to the real store. How would that
work? It wouldn't work with call-with-container - the container can't
access the outside world, and the chroot directory is automatically
deleted once the scope of the container is left. On top of that, there's
no guarantee that anything inside the chroot directory is visible
outside of the container namespace, since it's on a separate filesystem
mounted in that separate namespace.

The primary solution I have in mind for this is to add a separate
procedure argument to call-with-container (maybe "use-output"?) to be
run after the main thunk has finished running but before the temporary
directory has been deleted and which takes the chroot directory as its
only argument. Also to change the mounting of a tmpfs on the chroot dir
to happen before the clone, and explicitly unmount it after running the
aforementioned use-output thunk.

2. Some MS_PRIVATE stuff. Cleaning up will fail when whatever filesystem
the chroot directory is on is mounted as MS_SHARED, which according to a
comment in build.cc is what systemd mounts stuff as. (I'm curious why we
don't seem to have run into this issue yet, perhaps I have misunderstood
something)

My solution here is to change the propagation type of the chroot
directory inside the namespace to MS_PRIVATE. Anything mounted under it
will inherit that propagation type and not appear mounted in the
original namespace, and unmounting the chroot directory should work
fine.

3. Miscellaneous order changes. I don't know enough about the inner
workings of linux to be completely sure to what extent the order of some
operations is significant.

4. Minor differences. For example, the C++ daemon doesn't currently
bind-mount /dev/ptmx or /dev/fuse or /dev/console. I don't think those
would be a problem, but I dunno.

... and then I paused writing this for 2 days while I checked whether my
in-theory solutions would work in practice. And it seems like they
actually do (see recent branch update). Mostly. I need to figure out why
it fails when a new user namespace is created - for some reason
pivot-root fails when new-root was mounted from a different user
namespace.

But on the bright side, it somehow solved the bug I described earlier. I
still haven't managed to make it all the way to building hello (a
libunistring test hangs), but it's getting much farther.


> Regarding scanning, (guix build grafts) contains a special-purpose
> reference scanner that Mark carefully optimized; it might be worth
> looking at.

Huh. I did not know that. In hindsight, it seems obvious that there must
have been something like that for grafts to function. I'll look into
that.

>> You'll notice that among the environment variables is
>> GUILE_AUTO_COMPILE=0. This is actually something I added myself because
>> for some reason the bootstrap guile wasn't honoring the
>> --no-auto-compile flag, but does honor the environment
>> variable. Strange.
>
> Yeah, we shouldn’t add this environment variable to derivations because
> that would influence everything that goes in there.

Aye, it was mostly just for debugging. That problem has also disappeared
with the switch to using call-with-container. It's nice and all, but I
do wish I knew what caused it.

> Could it be that ‘argv’ lacked the executable name as argv[0], and thus
> the argument list was shifted to the left?

That's what I thought too, but the same behavior happened when adding
the executable name explicitly as the first argument both to system* and
execl.

> Let’s maybe try to further debug this interactively on IRC.

... and then I promptly fell asleep and spent the next few days
(nights?) tinkering. Oops.

Oh well, there will be no shortage of debugging to be done. It seems
like that's going to be the pattern for the near future - add an
isolation mechanism or something that conforms better to what is
currently done, try to build stuff, get a bit farther, look for more
stuff to fix.

- reepca

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Latest guile-daemon changes and bewilderment
  2017-07-28 11:19   ` Caleb Ristvedt
@ 2017-08-01 12:46     ` Ludovic Courtès
  2017-08-20  6:30     ` Ricardo Wurmus
  1 sibling, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2017-08-01 12:46 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

Hi reepca,

Caleb Ristvedt <caleb.ristvedt@cune.org> skribis:

>> Is there a line above or below the backtrace mentioning the uncaught
>> exception?  Could you ‘strace -f’ the daemon process?
>
> No, no line above or below. Very strange.

And strace?  :-)  It could be that it fails to load some of the
(system *) helper modules when displaying the error, something like
that.

>> BTW, I see the code uses ‘clone’ directly.  It would be safer to use
>> ‘call-with-container’, which already handles bind mounts, non-local
>> exits, and so on.  Would it be an option?
>
> There are a couple of issues with using call-with-container. In
> decreasing order of perceived difficulty to solve:

[...]

> ... and then I paused writing this for 2 days while I checked whether my
> in-theory solutions would work in practice. And it seems like they
> actually do (see recent branch update). Mostly. I need to figure out why
> it fails when a new user namespace is created - for some reason
> pivot-root fails when new-root was mounted from a different user
> namespace.

Heheh.  :-)

The commit “build-derivations: use call-with-container” does two things:
adjust ‘call-with-container’, and actually use it in the new daemon
code.  I think it would be great to split this into two logical changes.

I don’t get the use of ‘try-umount’ in an unwind handler of
‘call-with-container’.  Since ‘call-with-container’ uses
‘call-with-temporary-directory’, ROOT is never a mount point, no?

> But on the bright side, it somehow solved the bug I described earlier. I
> still haven't managed to make it all the way to building hello (a
> libunistring test hangs), but it's getting much farther.

If the hanging test is ‘test-lock’, that’s a “known problem” on machines
with more than ~8 cores, and it’s fixed in core-updates.

>> Let’s maybe try to further debug this interactively on IRC.
>
> ... and then I promptly fell asleep and spent the next few days
> (nights?) tinkering. Oops.

No problem.  :-)

Next time feel free to ping me when you’re at the peak of your debugging
activity, in case I can help out!

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Latest guile-daemon changes and bewilderment
  2017-07-28 11:19   ` Caleb Ristvedt
  2017-08-01 12:46     ` Ludovic Courtès
@ 2017-08-20  6:30     ` Ricardo Wurmus
  1 sibling, 0 replies; 6+ messages in thread
From: Ricardo Wurmus @ 2017-08-20  6:30 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel


Hi,

> There are a couple of issues with using call-with-container. In
> decreasing order of perceived difficulty to solve:
>
> 1. Copying the output from the build to the real store. How would that
> work? It wouldn't work with call-with-container - the container can't
> access the outside world, and the chroot directory is automatically
> deleted once the scope of the container is left.

The usual way to deal with this is to bind mount outside directories
into a local directory in the container.  Have you tried that or is this
not applicable?

-- 
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-08-20  6:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-24 10:18 Latest guile-daemon changes and bewilderment Caleb Ristvedt
2017-07-25  8:44 ` Ludovic Courtès
2017-07-28 11:19   ` Caleb Ristvedt
2017-08-01 12:46     ` Ludovic Courtès
2017-08-20  6:30     ` Ricardo Wurmus
2017-07-25 20:05 ` Mark H Weaver

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).