From: Caleb Ristvedt
Subject: Re: Latest guile-daemon changes and bewilderment
Date: Fri, 28 Jul 2017 06:19:33 -0500
Message-ID: <87wp6sbygq.fsf@cune.org>
References: <8760eicf4m.fsf@cune.org> <87a83snbw5.fsf@gnu.org>
In-Reply-To: <87a83snbw5.fsf@gnu.org> (Ludovic Courtès's message of "Tue, 25 Jul 2017 10:44:58 +0200")
Cc: guix-devel@gnu.org

> Is there a line above or below the backtrace mentioning the uncaught
> exception? Could you ‘strace -f’ the daemon process?

No, no line above or below. Very strange.

> BTW, I see the code uses ‘clone’ directly.
> It would be safer to use
> ‘call-with-container’, which already handles bind mounts, non-local
> exits, and so on. Would it be an option?

There are several issues with using call-with-container. In decreasing order of perceived difficulty to solve:

1. Copying the output from the build to the real store.
   How would that work? It wouldn't work with call-with-container as it stands: the container can't access the outside world, and the chroot directory is deleted automatically once control leaves the container. On top of that, there's no guarantee that anything inside the chroot directory is visible outside the container's namespace, since it's on a separate filesystem mounted in that separate namespace.

   The main solution I have in mind is to add another procedure argument to call-with-container (maybe "use-output"?). It would take the chroot directory as its only argument and would run after the main thunk has finished but before the temporary directory is deleted. I would also move the mounting of the tmpfs on the chroot directory to before the clone, and explicitly unmount it after use-output has run.

2. Mount propagation (MS_PRIVATE).
   Cleanup will fail if the filesystem the chroot directory is on is mounted MS_SHARED, which, according to a comment in build.cc, is how systemd mounts things. (I'm curious why we don't seem to have run into this issue yet; perhaps I've misunderstood something.)

   My solution here is to change the propagation type of the chroot directory inside the namespace to MS_PRIVATE. Anything mounted under it will inherit that propagation type and won't appear mounted in the original namespace, so unmounting the chroot directory should then work fine.

3. Miscellaneous order changes.
   I don't know enough about the inner workings of Linux to be completely sure to what extent the order of some operations matters.

4. Minor differences.
   For example, the C++ daemon doesn't currently bind-mount /dev/ptmx, /dev/fuse, or /dev/console. I don't think those would be a problem, but I dunno.

... and then I paused writing this for two days while I checked whether my theoretical solutions would work in practice. It seems they actually do (see the recent branch update). Mostly. I still need to figure out why things fail when a new user namespace is created: for some reason, pivot-root fails when new-root was mounted from a different user namespace. On the bright side, the changes somehow fixed the bug I described earlier. I still haven't managed to get all the way to building hello (a libunistring test hangs), but it's getting much farther.

> Regarding scanning, (guix build grafts) contains a special-purpose
> reference scanner that Mark carefully optimized; it might be worth
> looking at.

Huh. I didn't know that. In hindsight, it seems obvious that something like that must exist for grafts to work. I'll look into it.

>> You'll notice that among the environment variables is
>> GUILE_AUTO_COMPILE=0. This is actually something I added myself because
>> for some reason the bootstrap guile wasn't honoring the
>> --no-auto-compile flag, but does honor the environment
>> variable. Strange.
>
> Yeah, we shouldn’t add this environment variable to derivations because
> that would influence everything that goes in there.

Aye, it was mostly just for debugging. That problem has also disappeared with the switch to call-with-container. It's nice and all, but I do wish I knew what caused it.

> Could it be that ‘argv’ lacked the executable name as argv[0], and thus
> the argument list was shifted to the left?

That's what I thought too, but the same thing happened when I explicitly passed the executable name as the first argument to both system* and execl.

> Let’s maybe try to further debug this interactively on IRC.

... and then I promptly fell asleep and spent the next few days (nights?) tinkering. Oops. Oh well, there will be no shortage of debugging to do. That seems to be the pattern for the near future: add an isolation mechanism or something else that conforms more closely to what is currently done, try to build stuff, get a bit farther, look for more things to fix.

- reepca