Hey!

I sent out the last update about a month ago in March [1].

1: https://lists.gnu.org/archive/html/guix-devel/2023-03/msg00262.html

## Numbers

bordeaux.guix.gnu.org currently provides ~2.4 million nars, which take up ~10TiB to store.

## Storage

There are two machines with all the nars, hatysa and bishan.

On the plus side, I recently installed a new hard drive in hatysa so it now has plenty of storage for more nars.

However, space on bishan is running out: it now has less than 1TiB of free space and will probably run out of space within the next 4 to 8 weeks.

## Problems and bug fixes

In the past, I've seen the coordinator memory usage unexpectedly spike and I haven't really understood why. This started happening again recently, and I managed to make some progress on tracking it down.

It seems that the problem happens during calls to (backtrace). I was able to reproduce this with the error handling for hooks, but I haven't been able to reproduce it in a more standalone manner yet. I've opened a Guile bug here [2], and for now, I've been trying to work around this issue by removing backtrace calls from the coordinator. Obviously this isn't ideal, but I'd also like to avoid this problem.

2: https://issues.guix.gnu.org/62623

Another odd issue that I've been coming up against for a while is some port encoding issue; I've filed a bug about that too [3]. I've been working around this one by adding silent error handling to logging in critical areas, so that things keep working even if the logging raises an exception.

3: https://issues.guix.gnu.org/62590

I also investigated why there were problems substituting derivations on the childhurds. This was broken by some timeout-related changes I made that don't work on the Hurd. This is now fixed, and the error reporting in that area is improved.

I recently spotted a crash in the build coordinator when building anthy-9100h [4], which turned out to be due to a bug in Guile with handling invalid Unicode when using suspendable ports. This is now fixed upstream [5] and the relevant Guile package in Guix includes the patch.

4: https://issues.guix.gnu.org/62240
5: https://issues.guix.gnu.org/62290

I think I made some progress on the write_wait_fd errors I've been seeing from the coordinator agents. Luckily Ludovic seems to have done most of the work, so I was able to send a patch for guile-gnutls [6].

6: https://gitlab.com/gnutls/guile/-/merge_requests/10

## Progress

The Git repositories for the guix-build-coordinator and nar-herder are now on Savannah [7], which is great as it means other committers can now easily push to these repositories.

7: https://git.savannah.gnu.org/cgit/guix

The big new feature I've been working on is support for listening for events from the coordinator. This has only recently become possible, thanks to the support for streaming responses in the guile-fibers web server. While the build coordinator isn't intended as a web service, it does make use of HTTP for talking to clients and agents. I've followed the standard for server-sent events [8] for this new functionality.

8: https://html.spec.whatwg.org/multipage/server-sent-events.html

The client interface for the coordinator isn't exposed since there's no authentication mechanism. However, I've also got a prototype for a web frontend [9] for the build farm up and running. This does expose the events stream, along with the state information that's needed to make sense of it.

The result so far is this activity page [10], which shows information about the agents and the builds allocated to them. It's still a very rough prototype though, and there's more work needed to make it reliable and include more information to make it more useful.

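For anyone curious what consuming an events stream like the one mentioned above involves, here is a minimal sketch of a parser for the server-sent events text format. It is only an illustration: it's written in Python rather than Guile, it follows the field rules from the standard [8] rather than anything specific to the coordinator, and the event name and payload in the sample are made up.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    event: str  # event type; "message" when the stream doesn't name one
    data: str   # all "data" lines of the event, joined with newlines


def parse_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield events from an iterable of text lines in the SSE format."""
    event_type = ""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without any
            # data field are dropped, as the standard describes.
            if data:
                yield Event(event_type or "message", "\n".join(data))
            event_type, data = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        # The "id" and "retry" fields are ignored in this sketch.


# Hypothetical sample stream: the event name and JSON payload are
# invented for illustration and are not taken from the coordinator.
SAMPLE = (
    ": keep-alive\n"
    "event: build-started\n"
    'data: {"build": "hypothetical-id"}\n'
    "\n"
    "data: first line\n"
    "data: second line\n"
    "\n"
)

events = list(parse_events(io.StringIO(SAMPLE)))
for ev in events:
    print(ev.event, repr(ev.data))
```

With a real stream, the lines would come from an HTTP response that is read incrementally and decoded as UTF-8, rather than from a string held in memory.
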
9: https://git.cbaines.net/guix/bffe/
10: https://bordeaux.guix.gnu.org/activity

I've also made some improvements to the build coordinator in terms of cancelling builds, the combined post publish hook (to enable validating the availability of referenced nars), parallel hook processing in the coordinator and prioritising post build actions in the agents (which helps mitigate congested uploading).

## Next steps

The bishan storage problem is drawing ever closer, and there's still a need to come up with a plan, ideally one which reduces the amount of hardware I'm personally renting.

I still need to do a bit more work on validating nar reference availability when asking the nar-herder to import nars.

Now that the bordeaux build farm is being used for QA, there will be some nars that no longer need to be kept (as they correspond to some derivation that didn't end up on the master branch). It would be good to start automatically removing these to free up space.

As above, I think a good start has been made on making the build coordinator behind bordeaux more observable, but there's still lots of room for improvement with that.

I also have some sysadmin things to do. The Overdrive (ARM) machine monokuma has some btrfs issue with its drive, so I need to reinstall Guix on it to get it working again. I also have a RISC-V board that I've had for ages, which I should get connected up to the build farm to start building things.

If you're interested in working on any of this, do let me know, and let me know if you have any comments or questions!

Thanks,

Chris