My old university email address started demanding a phone number to "verify the security of my account", which was pretty funny considering it never had a phone number to begin with, so I'm locked out of that. Same GPG public key, though.

Christopher Baines writes:

> Hey!
>
> I think this has been talked about for a while [1], but I want to make it
> happen. Currently the guix-daemon is still similar to the nix-daemon
> that it was forked from, and is implemented in C++. I think that a Guile
> implementation of the guix-daemon will simplify Guix and better support
> hacking on and around the daemon to add new features and move Guix
> forward.

I'd like to help with this if at all possible.

> Still though, I'd like to hear what people think about which direction
> the implementation should go, and what features they'd like to see. Even
> if those are not essential to make the Guile implementation viable, it
> still might inform the direction to take.

Okay, brain dump time:

I think that using fibers has a lot of potential, but there are obstacles that need to be worked around. In the single-threaded case, we risk a big slowdown if multiple clients are active at once, since we're doing with one single thread what used to be done by n processes. It would be especially noticeable during big disk reads and writes, since those basically ignore O_NONBLOCK, and most procedures that act on entire files at once would therefore probably not hit many yield points. The worst situation would be multiple worker fibers attempting to do reference scanning at the same time. POSIX asynchronous disk I/O could be used, but glibc currently implements it... using a thread pool. There is the RWF_NOWAIT flag to preadv2, though it's only available on newer kernels and has bugs in 5.9 and 5.10.

Additionally, being single-threaded means use of F_SETLKW is a no-go, so you're stuck with polling there. Granted, that's not such a big issue if in 99% of use cases there is only one process doing the locking, so it can all be managed internally.

Speaking of file locks, the risk of accidentally clobbering locks jumps way up once it's all moved into one process, and IIRC we already have bugs of that sort. You can get a half-decent interface by doing what SQLite does, which is a combination of intra-process locking and holding on to open file descriptors until all locks on the underlying file are released. There are some subtle pathological cases there that are a lot more likely in the guix daemon than in SQLite, though. For example, suppose you open a file twice to get ports p1 and p2, acquire read locks on both of them, then close p1, then open the file again to get p3, acquire a read lock on it, close p2, get p4, acquire a read lock on it, close p3, get p5... and so on. This causes unbounded file descriptor usage, and eventually you'll run out. There is no workaround in this model other than "hope that usage pattern doesn't come up much". Additionally, you need to ensure that every close of a potentially-locked file goes through a special close-wrapper.
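To make that concrete, here's a rough sketch of the SQLite-style approach (the names and the data layout are invented for illustration; this isn't actual Guix or SQLite code): an in-process table keyed by device and inode numbers, with closed-but-still-locked ports retained until the last lock on the file is released.

(define %lock-table (make-hash-table))   ; (device . inode) -> #(readers retained-ports)

(define (file-key port)
  (let ((st (stat port)))
    (cons (stat:dev st) (stat:ino st))))

(define (acquire-read-lock! port)
  ;; The first reader would also take the real fcntl lock on the descriptor
  ;; here; later readers only bump the in-process count.  (Omitted.)
  (let* ((key (file-key port))
         (entry (or (hash-ref %lock-table key)
                    (let ((entry (vector 0 '())))
                      (hash-set! %lock-table key entry)
                      entry))))
    (vector-set! entry 0 (+ 1 (vector-ref entry 0)))
    key))

(define (close-locked-port! port)
  ;; Every close of a potentially-locked file has to go through this wrapper:
  ;; if locks remain we can only stash the port, keeping its descriptor open.
  (let ((entry (hash-ref %lock-table (file-key port))))
    (if (and entry (> (vector-ref entry 0) 0))
        (vector-set! entry 1 (cons port (vector-ref entry 1)))
        (close-port port))))

(define (release-read-lock! key)
  (let ((entry (hash-ref %lock-table key)))
    (vector-set! entry 0 (- (vector-ref entry 0) 1))
    (when (zero? (vector-ref entry 0))
      ;; Last lock released: only now is it safe to close the retained ports.
      (for-each close-port (vector-ref entry 1))
      (hash-remove! %lock-table key))))

The stash in close-locked-port! is exactly where the unbounded descriptor usage in the p1/p2/p3 scenario comes from; retaining the port is unavoidable, because closing any descriptor for a file drops all of the process's POSIX locks on that file.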
I'm actually in the middle of working on a solution for this that involves a separate locker process that gets passed file descriptors to lock via a Unix socket.

Speaking of file descriptors, running the entire daemon in one process is going to mean much higher pressure on file descriptor usage. IIRC, while building a derivation, the closure of its inputs needs to be locked, and that means a file descriptor for each and every store item in its input closure, simultaneously. The separate locker process would make it possible to retain those locks without keeping them open in the main process.

Another issue that will need to be addressed, whether single-threaded or not, is the use of memoization caches in various places. These aren't weak hash tables, so they are not thread-safe, and they retain strong references to both the cached results and the arguments used to obtain them for as long as the memoized procedure itself is alive. In a long-running server process, this is less than ideal. One approach could be to put a bound on how large they can grow, with some eviction policy for deciding what gets discarded first. If memoization is used to ensure pointer equality as a matter of correctness, though, that probably won't work well. The simplest solution would probably be to change them to use weak hash tables, perhaps with an option to bring back non-weak hash tables on the client side.

In the multithreaded case, fork() and clone() become concerns, since they can no longer be safely run from Guile. One way around this would be to use posix_spawn to produce a single-threaded Guile process, then have that do the fork or clone as necessary. The fork case shouldn't actually be necessary, though, as the child process can just exec directly. In the clone case, CLONE_PARENT can be used to make the resulting process a child of the original, main process, though I don't know how portable that is to the Hurd (really, I don't know how namespace setup works in general on the Hurd). Instead of doing this spawn-two-processes-to-spawn-one routine every time we want to set up a container, we could create a spawner-helper process once and just keep it around. If we can do that before any threads are created, we don't even need posix_spawn (though it is nice to have around, and I do have bindings to it). I remember reading that that's what the Apache web server did. This would, however, mean that some places would need to use interfaces like "eval-with-container" instead of "call-with-container", which is somewhat less convenient. But code staging shouldn't be a terribly foreign concept to most guixers.

Another concern is child process management: a blocking waitpid will of course block the calling thread, so something like a SIGCHLD handler or a dedicated reaper thread would be needed in order to simulate a blocking waitpid. Personally I think it would be a good idea to go with something like Shepherd's process-monitor approach, but with some changes. First, move child reaping into the process monitor itself, so that all the SIGCHLD handler does is send a notification to the process monitor (and it should do this via a condition variable, not a channel, so that it doesn't block, since asyncs run on whatever fiber happens to be current on that thread's scheduler at the time, which means a signal handler could end up running from within the process monitor fiber itself). Second, wrap all process-spawning procedures so that they return process objects instead of raw PIDs. Each such object contains a PID, a condition variable that is signaled when the process terminates, and a slot for holding the exit status. Immediately before spawning a process, send a message to the process monitor temporarily disabling reaping; then spawn the process, create the process object, and register it with the process monitor, resuming reaping at the same time. A waitpid replacement can then very easily operate on these process objects.
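Here's a rough sketch of that shape using Fibers (the record and procedure names are invented, and the pause/resume-reaping handshake, error handling, and the posix_spawn helper discussed above are all omitted):

(use-modules (fibers)
             (fibers conditions)
             (ice-9 match)
             (srfi srfi-9))

(define-record-type <process>
  (make-process pid terminated status)
  process?
  (pid        process-pid)
  (terminated process-terminated)             ; condition, signaled on exit
  (status     process-status set-process-status!))

(define %child-exited (make-condition))       ; poked by the SIGCHLD handler
(define %processes (make-hash-table))         ; pid -> process object

(sigaction SIGCHLD
  (lambda (signum)
    ;; Runs as an async on whatever fiber happens to be current, so it must
    ;; not block; signaling a condition never blocks, sending on a channel
    ;; might.
    (signal-condition! %child-exited)))

(define (spawn-monitored thunk)
  ;; In the real daemon this would go through the posix_spawn/helper-process
  ;; route discussed above, with reaping paused around the registration;
  ;; primitive-fork just keeps the sketch short.
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        (begin (thunk) (primitive-exit 0))
        (let ((process (make-process pid (make-condition) #f)))
          (hash-set! %processes pid process)
          process))))

(define (process-monitor)
  ;; Dedicated fiber: wait for a SIGCHLD notification, reap every exited
  ;; child, and wake up whoever is waiting on the corresponding object.
  (let loop ()
    (wait %child-exited)
    ;; Fibers conditions are one-shot, so install a fresh one before reaping;
    ;; a SIGCHLD arriving in the meantime signals the new condition and we
    ;; simply go around the loop once more.
    (set! %child-exited (make-condition))
    (let reap ()
      (match (catch 'system-error
               (lambda () (waitpid WAIT_ANY WNOHANG))
               (lambda _ '(0 . #f)))          ; ECHILD: no children at all
        ((0 . _) #f)                          ; nothing left to reap right now
        ((pid . status)
         (let ((process (hash-ref %processes pid)))
           (when process
             (set-process-status! process status)
             (signal-condition! (process-terminated process))
             (hash-remove! %processes pid)))
         (reap))))
    (loop)))

(define (process-wait process)
  ;; waitpid replacement: blocks only the calling fiber.
  (wait (process-terminated process))
  (process-status process))

The monitor would be spawned once with (spawn-fiber process-monitor) inside run-fibers, and anything that currently blocks in waitpid on a raw PID would call process-wait on the returned object instead.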
SQLite is yet another concern. I haven't yet looked at how you've handled this in the build coordinator, but I'm curious. Any blocking interface it has, such as a busy handler, isn't going to work very well. We could wrap the sqlite procedures with ones that retry with exponential backoff (which is what "PRAGMA busy_timeout = ..." does internally). That would work, though not optimally. I'm not sure of a better way, though - https://www.sqlite.org/c3ref/unlock_notify.html looks sort of right, but the documentation says that's just for "shared cache mode". It seems a bit of a shame to keep guessing at when the database "might" next be available, when the other entities accessing the database may well be in the very same process and so could just tell us directly the earliest point at which it's worth checking again.

Those are all the design concerns I had off the top of my head; I might recall some more later on. Personally I think it would be prudent to design as if for multiple threads.

On the subject of features, I would like it if downloaders (fixed-output derivations) had access to /gnu/store/.links so that they can easily look up whether a file with a given hash already exists, and copy it over if so. Often when writing a package definition I'll run "guix download" to get the hash, and that will put the file in the store as a side effect, but then when it comes time to build the package it will re-download it all over again because the file name is different. Technically that should already be achievable just by tweaking chroot-dirs and the downloaders. It would also be nice if the same concept could be applied to directories, such as git repositories - perhaps a /gnu/store/.dirlinks with symbolic links? Of course, those wouldn't be used for deduplication, just for easy access for fixed-output derivations. A similar approach could also include e.g. a mounted DVD with The GNU Source Distribution on it.

Oh, another feature that just occurred to me: the ability to automatically retry a failed derivation with fewer builds in parallel, or depending on system load or available memory. It's quite annoying to run a guix command that is supposed to take multiple days and have it end up taking more than a week because it fails several times each day - often taking several hours' worth of progress down with it, due to very large derivations - because of some small package's flaky tests that fail under high load.

- reepca