From mboxrd@z Thu Jan 1 00:00:00 1970
From: Caleb Ristvedt via "Development of GNU Guix and the GNU System distribution."
Reply-to: Caleb Ristvedt
To: Christopher Baines
Cc: guix-devel@gnu.org
Subject: Re: Implementing the guix-dameon in Guile
References: <875y4et4up.fsf@cbaines.net>
Date: Thu, 14 Sep 2023 12:31:41 -0500
In-Reply-To: <875y4et4up.fsf@cbaines.net> (Christopher Baines's message of "Wed, 13 Sep 2023 16:36:54 +0100")
Message-ID: <87r0n0r79e.fsf@a.a>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
List-Id: "Development of GNU Guix and the GNU System distribution."
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature"

--=-=-=
Content-Type: text/plain

My old university email address started demanding a phone number to
"verify the security of my account", which was pretty funny considering
it never had a phone number to begin with, so I'm locked out of that.
Same GPG public key, though.

Christopher Baines writes:

> Hey!
>
> I think this has been talked about for a while [1], but I want to make it
> happen. Currently the guix-daemon is still similar to the nix-daemon
> that it was forked from, and is implemented in C++. I think that a Guile
> implementation of the guix-daemon will simplify Guix and better support
> hacking on and around the daemon to add new features and move Guix
> forward.

I'd like to help with this if at all possible.

> Still though, I'd like to hear what people think about which direction
> the implementation should go, and what features they'd like to see. Even
> if those are not essential to make the Guile implementation viable, it
> still might inform the direction to take.

Okay, brain dump time:

I think that using fibers has a lot of potential, but there are
obstacles that need to be worked around. In the single-threaded case, we
risk a big slowdown if multiple clients are active at once, since we're
doing what used to be done by n processes with one single thread. It
would be especially noticeable during big disk reads and writes, since
those basically ignore O_NONBLOCK, and most procedures that act on
entire files at once would therefore probably not hit many yield
points. The worst situation would be where multiple worker fibers are
attempting to do reference scanning at the same time. POSIX asynchronous
disk I/O could be used, but glibc currently implements it... using a
thread pool. There is the RWF_NOWAIT flag to preadv2, though it's only
available on newer Linux kernels and has bugs in 5.9 and 5.10.

Additionally, being single-threaded means use of F_SETLKW is a no-go, so
you're stuck with polling there. Granted, that's not such a big issue
if in 99% of use cases there is only one process doing the locking, so
it can all be managed internally.

Speaking of file locks, the risk of accidental clobbering of locks jumps
way up once it's all moved into one process, and IIRC we already have
bugs with accidental clobbering of locks. You can get a half-decent
interface by doing what SQLite does, which is a combination of
intra-process locking and holding on to open file descriptors until all
locks on the underlying file are released. There are some subtle
pathological cases there that are a lot more likely in the Guix daemon
than in SQLite, though.
For example, suppose you open a file twice to get ports p1 and p2,
acquire read locks on both of them, then close p1, then open the file
again to get p3, acquire a read lock on it, close p2, get p4, acquire a
read lock on it, close p3, get p5... and so on. This will cause
unbounded file descriptor usage, and eventually you'll run out. There is
no workaround in this model other than "hope that usage pattern doesn't
come up much". Additionally, you need to ensure that every close of a
potentially-locked file goes through a special close-wrapper. I'm
actually in the middle of working on a solution for this that involves a
separate locker process that gets passed file descriptors to lock via a
Unix socket.

Speaking of file descriptors, running the entire daemon in one process
is going to mean much higher pressure on file descriptor resource
usage. IIRC, while building a derivation, the closure of its inputs
needs to be locked, and that means a file descriptor for each and every
store item in its input closure, simultaneously. The separate locker
process would make it possible to retain those locks while not having
them open in the main process.

Another issue that will need to be addressed, whether single-threaded or
not, is the use of memoization caches in various places. These aren't
weak hash tables: they are not thread-safe, and they retain strong
references to both the cached results and the arguments used to obtain
them for as long as the procedure they are based on remains. In a
long-running server process, this is less than ideal. One approach could
be to put a bound on how large they can grow, with some eviction policy
for deciding what gets discarded first. If memoization is used to ensure
pointer equality as a matter of correctness, though, that probably won't
work well. The simplest solution would probably be to change them to use
weak hash tables, though perhaps with an option available to bring back
non-weak hash tables on the client side.
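
A minimal sketch of the bounded-cache idea above, written in Python
rather than Guile purely for illustration. The names bounded_memoize,
max_entries and square are hypothetical and not taken from Guix, and
least-recently-used eviction is only one possible policy. A lock guards
the table so that the cache itself is safe to share between threads.

```python
import threading
from collections import OrderedDict


def bounded_memoize(max_entries):
    """Memoize a function, keeping at most max_entries results (LRU)."""
    def decorate(func):
        cache = OrderedDict()      # args tuple -> result, oldest first
        lock = threading.Lock()    # guards every access to `cache`

        def wrapper(*args):
            with lock:
                if args in cache:
                    cache.move_to_end(args)    # mark as most recently used
                    return cache[args]
            result = func(*args)               # computed outside the lock
            with lock:
                cache[args] = result
                cache.move_to_end(args)
                while len(cache) > max_entries:
                    cache.popitem(last=False)  # evict least recently used
            return result

        wrapper.cache = cache      # exposed only so the sketch can be inspected
        return wrapper
    return decorate


calls = []                         # records every real (non-cached) call


@bounded_memoize(max_entries=2)
def square(n):
    calls.append(n)
    return n * n


# 1 and 2 are cached, 1 is refreshed by the hit, so adding 3 evicts 2.
square(1); square(2); square(1); square(3)
```

Note that an evicted entry is recomputed on the next call and yields a
fresh object, which is exactly why this approach does not fit callers
that rely on memoization for pointer equality.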

In the multithreaded case, fork() and clone() become concerns, since
they can no longer be safely run from Guile. One way around this would
be to use posix_spawn to produce a single-threaded Guile process, then
have that do the fork or clone as necessary. The fork case shouldn't
actually be necessary, though, as the child process can just exec
directly. In the clone case, CLONE_PARENT can be used to make the
resulting process a child of the original, main process, though I don't
know how portable that is to the Hurd (really, I don't know how
namespace setup works in general on the Hurd).

Instead of doing this spawn-two-processes-to-spawn-one routine every
time we want to set up a container, we could create a spawner-helper
process once and just keep it around. If we can do that before any
threads are created, we don't even need posix_spawn (though it is nice
to have around, and I do have bindings to it). I remember reading that
that's what the Apache web server did. This would however mean some
places would need to use interfaces like "eval-with-container" instead
of "call-with-container", which is somewhat less convenient. But code
staging shouldn't be a terribly foreign concept to most guixers.

Another concern is child process management; a blocking waitpid will of
course block the calling thread, so something like a SIGCHLD handler or
a dedicated reaper thread would be needed in order to simulate a
blocking waitpid.

Personally I think it would be a good idea to go with something like
Shepherd's process monitor approach, but with some changes. First, move
child reaping into the process monitor itself, so that all the SIGCHLD
handler does is send a notification to the process monitor (and it
should do this via a condition variable, not a channel, so that it
doesn't block, since asyncs run on whatever fiber happens to be current
on that thread's scheduler at the time, and this means it is possible
for a signal's handler to be run from within the process monitor fiber).
Second, wrap all process-spawning procedures such that they now return
process objects instead of PIDs. A process object contains a PID, a
condition variable signaled when the process is terminated, and a slot
for holding the exit status. Immediately before spawning a process, send
a message to the process monitor temporarily disabling reaping, then
spawn the process, create the process object, and register it with the
process monitor, resuming reaping at the same time. Then a waitpid
replacement can very easily operate on these process objects.

SQLite is yet another concern. I haven't yet looked at how you've
handled this in the build coordinator, but I'm curious. Any blocking
interface it has, such as a busy handler, isn't going to work very
well. We could wrap the SQLite procedures with ones that retry with
backoff (which is roughly what "PRAGMA busy_timeout = ..." does
internally). That would work, though not optimally. I'm not sure of a
better way, though - https://www.sqlite.org/c3ref/unlock_notify.html
looks sort of right, but the documentation says that's just for "shared
cache mode". It seems a bit of a shame to keep guessing at when the
database "might" next be available when the other entities accessing the
database may well be in the very same process and so could just give us
an earliest-possible checking point directly.

Those are all the design concerns that I had off the top of my head; I
might recall some more later on. Personally I think it would be prudent
to design as if for multiple threads.

On the subject of features, I would like it if downloaders (fixed-output
derivations) had access to /gnu/store/.links so that they can easily
look up whether a file with a given hash already exists, and copy it
over if so. Often when writing a package definition I'll run "guix
download" to get the hash, and that will put it in the store as a side
effect, but then when it comes time to build the package it will
re-download it all over again because the file name is different.
Technically that should already be achievable just by tweaking
chroot-dirs and the downloaders. It would also be nice if the same
concept could be applied to directories, such as git repositories -
perhaps a /gnu/store/.dirlinks with symbolic links? Of course, those
wouldn't be used for deduplication, just for easy access for
fixed-output derivations. A similar approach could also include e.g. a
mounted DVD with The GNU Source Distribution on it.

Oh, another feature that just occurred to me - the ability to
automatically retry a failed derivation with fewer builds in parallel,
or depending on system load or available memory. It's quite annoying to
run a guix command that is supposed to take multiple days, and have it
end up taking more than a week because it fails several times each day -
often bringing down several hours' worth of progress with it due to very
large derivations - because of some small package's flaky tests that
fail under high load.

- reepca

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQFKBAEBCAA0FiEEdNapMPRLm4SepVYGwWaqSV9/GJwFAmUDQ4oWHGNhbGViQHJ1
c3NlbHN0ZWluLnh5egAKCRDBZqpJX38YnEvwB/0RXHuQEqQNM14qSKmgThv/xE6U
NPfLKiSQxDYLfTUNn5eZJllC+lSKQqM/q/4ZT0knLtW/9pfjq6kw1X+CtD/lUD3V
sKLspnbZlM2U9vqaOI3jKILrFLlDWMyKdZ8vSfSqxWEmHqChgIWrDOKODFBCxl/7
pMEFCgcP0YG/MD8QeL7jkUPE3I+xHFYNYcFPkDKMMiXrqpESojT3blBpI+T1fQxt
62me8VwBAo0vEnbzw+IFNH+KCoIeazcHs+fvBkr98P3mI6arvazpCuIOJl+QfPXP
bTsMa0A846ljz346YhIk1KEznivmseUHiCAHdLv2lTk6XHEdTOTVo+hEpZP1
=FzZ/
-----END PGP SIGNATURE-----
--=-=-=--