Ludovic Courtès writes:

> Christopher Baines skribis:
>
>> I've been doing some performance tuning: submitting builds is now more
>> parallelised, a source of slowness when fetching builds has been
>> addressed, and one of the long queries involved in allocating builds has
>> been removed, which also improved handling of the WAL (SQLite write-ahead
>> log).
>>
>> There are also a few new features. Agents can be deactivated, which means
>> they won't get any builds allocated. The coordinator now checks the
>> hashes of submitted outputs, a safeguard I added because the coordinator
>> now also supports resuming the uploads of outputs. This is particularly
>> important when trying to upload large (> 1 GiB) outputs over slow
>> connections.
>>
>> I also added a new x86_64 build machine. It's a 4-core Intel NUC that I
>> had sitting around; I cleaned it up and got it building things. This
>> was particularly useful as I was able to use it to retry building
>> guile@3.0.7, which is extremely hard to build [2]. This was blocking the
>> building of the channel instance derivations for x86_64-linux.
>>
>> 2: https://data.guix.gnu.org/gnu/store/7k6s13bzbz5fd72ha1gx9rf6rrywhxzz-guile-3.0.7.drv
>
> Neat! (Though I wouldn’t say building Guile is “extremely hard”,
> especially on x86_64. :-)) The ability to keep retrying is much
> welcome.

To rephrase: I found it extremely hard to get that particular Guile
derivation to build successfully. It failed to build 12 times, and only
succeeded when I added new hardware to attempt it on (I'm guessing the
particular issue I was encountering was exacerbated by more cores).

Unfortunately, I also think that Guile being easy to build for you
actually contributes to the problem here, since it makes issues like
this harder to find and address.

>> Space is running out on bayfront, the machine that runs the coordinator,
>> stores all the nars and build logs, and serves the substitutes. I knew
>> this was probably going to be an issue (bayfront didn't have much space
>> to begin with), but I had hoped I'd be further along in developing some
>> way to move the nars around between multiple machines, removing the
>> need to store all of them on bayfront. I have a plan (there are some
>> ideas I mentioned back in February [4]), but I haven't got around to
>> implementing anything yet. The disk space usage trend is pretty much
>> linear, so if things continue without any change, I think it will be
>> necessary to pause the agents within a month to avoid filling up
>> bayfront entirely.
>
> Ah, bummer. I hope we can find a solution one way or another.
> Certainly we could replicate nars on another machine with more disk,
> possibly buying the necessary hardware with the project funds.

Since this email was delayed in sending, things have moved on a bit.
90% disk usage was the threshold I had in mind for bayfront; that has
now pretty much been reached, so I've paused all the agents.

My plans for addressing this have also developed a bit, but it's still
going to take at least a month to get things going again.

Chris