From: Christopher Baines
To: Ludovic Courtès
Cc: guix-devel@gnu.org
Subject: Re: Thoughts on building things for substitutes and the Guix Build Coordinator
Date: Wed, 18 Nov 2020 07:56:39 +0000
Message-ID: <87r1orksi0.fsf@cbaines.net>
In-reply-to: <87blfvocrn.fsf@gnu.org>
References: <87tutnlnjy.fsf@cbaines.net> <87blfvocrn.fsf@gnu.org>

Ludovic Courtès writes:

> Christopher Baines skribis:
>
>> Because you aren't copying the store items back in to a single store, or
>> serving substitutes from the store, you don't need to scale the store to
>> serve more substitutes. You've still got a bunch of nars + narinfos to
>> store, but I think that is an easier problem to tackle.
>
> Yes, this is good for the use case of providing substitutes and it would
> certainly help on a big build farm like berlin.
>
> I see a lot could be shared with (guix scripts publish) and (guix
> scripts substitute). We should extract the relevant bits and move them
> to new modules explicitly meant for more general consumption.
> I think it’s important to reduce duplication.

Yeah, that would be good.

>> Another feature supported by the Guix Build Coordinator is retries. If a
>> build fails, the Guix Build Coordinator can automatically retry it. In a
>> perfect world, everything would succeed first time, but because the
>> world isn't perfect, there still can be intermittent build
>> failures. Retrying failed builds even once can help reduce the chance
>> that a failure leads to no substitutes for that build as well as any
>> builds that depend on that output.
>
> That’s nice too; it’s one of the practical issues we have with Cuirass
> and that’s tempting to ignore because “hey it’s all functional!”, but
> then reality gets in the way.

One further benefit related to this is that if you want to manually
retry building a derivation, you just submit a new build for that
derivation.

The Guix Build Coordinator also has no concept of "Failed (dependency)";
it never gives up. This avoids the situation where spurious failures
block other builds.

>> Because the build results don't end up in a store (they could, but as
>> set out above, not being in the store is a feature I think), you can't
>> use `guix gc` to get rid of old store entries/substitutes. I have some
>> ideas about what to implement to provide some kind of GC approach over a
>> bunch of nars + narinfos, but I haven't implemented anything yet.
>
> ‘guix publish’ has support for that via (guix cache), so if we could
> share code, that’d be great.

guix publish does time-based deletion, based on when the files were
first created, right? If that works for people, that's fine I guess.

Personally, I'm thinking about GC as in: don't delete nar A if you want
to keep nar B and nar B references nar A. It's perfectly possible that
someone could still fetch and use nar B after nar A was deleted, but
it's also possible that they couldn't, because of that missing
substitute.
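
To illustrate the reference-aware GC idea, here is a minimal sketch. It
is written in Python rather than Guile purely for brevity, and it is not
code from the Guix Build Coordinator: the function names and the
in-memory `references` map are assumptions made up for this example. In
practice the map would presumably be derived from the References field
of each narinfo.

```python
def nars_to_keep(roots, references):
    """Return the roots plus everything they reference, transitively.

    `roots` are the nars we explicitly want to keep; `references` maps a
    nar to the nars it refers to (a hypothetical in-memory structure).
    """
    keep = set()
    pending = list(roots)
    while pending:
        nar = pending.pop()
        if nar in keep:
            continue  # already visited, also guards against cycles
        keep.add(nar)
        pending.extend(references.get(nar, ()))
    return keep


def nars_to_delete(all_nars, roots, references):
    """Everything that is not needed by any root can be deleted."""
    return set(all_nars) - nars_to_keep(roots, references)


# Made-up data: nar B references nar A, nar C stands alone.
references = {"B": ["A"], "C": []}
all_nars = {"A", "B", "C"}

# Keeping B protects A; only C is eligible for deletion.
print(sorted(nars_to_delete(all_nars, ["B"], references)))  # prints ['C']
```

A time-based policy could be layered on top of this by choosing the
roots by age and only then computing what they keep alive.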
Maybe I'm overthinking this though?

The Cuirass + guix publish approach does something similar, because
Cuirass creates GC roots that expire. guix gc wouldn't delete a store
item if it's needed by something that's protected by a Cuirass-created
GC root.

Another complexity here, which I didn't set out initially, is that there
are places where the Guix Build Coordinator makes decisions based on the
belief that if its database says a build has succeeded for an output,
that output will be available. If a build needed an output that had been
successfully built but then deleted, I think the coordinator would get
stuck forever trying that build, with it never starting because of the
missing store item. My thinking on this at the moment is that maybe what
you'd want to do is tell the Guix Build Coordinator that you've deleted
a store item and it's truly missing, but that would complicate the setup
to some degree.

> One option would be to populate /var/cache/guix/publish and to let ‘guix
> publish’ serve it from there.

That's probably pretty easy to do, although I haven't looked at the
details.

>> There could be issues with the implementation… I'd like to think it's
>> relatively simple, but that doesn't mean there aren't issues. For some
>> reason or another, getting backtraces for exceptions rarely works. Most
>> of the time the coordinator tries to print a backtrace, the part of
>> Guile doing that raises an exception. I've managed to cause it to
>> segfault, through using SQLite incorrectly, which hasn't been obvious to
>> fix at least for me. Additionally, there are some places where I'm
>> fighting against bits of Guix, things like checking for substitutes
>> without caching, or substituting a derivation without starting to build
>> it.
>
> I haven’t yet watched your talk but I’ve watched Mathieu’s, where he
> admits to being concerned about the reliability of code involving Fibers
> and/or SQLite (which I can understand given his/our experience, although
> I’m maybe less pessimistic). What’s your experience, how do you feel
> about it?

The coordinator does use Fibers, plus a lot of different threads for
different things.

Regarding reliability, it's hard to say really. Given I set out to build
something that works across an (unreliable) network, I've built in
reliability by making sure things retry upon failure, among other
things.

I definitely haven't chased any blocked fibers, although there could be
some of those lurking in the code, which I might not have noticed
because things sort themselves out eventually.

One of the problems I did have recently was that some hooks would just
stop getting processed. Each type of hook has a thread, which just
checked every second whether there were any events to process, and
processed them if there were. I'm not sure what was wrong, but I changed
the code to be smarter: it is now triggered when new events are actually
entered into the database, and it still polls every so often just in
case. I haven't seen hooks get stuck since then, but what I'm trying to
convey here is that I'm not quite sure how to track down issues that
occur in specific threads.

Another thing to mention here is that implementing support for
PostgreSQL through Guile Squee is still a thing I have in mind, and that
might be more appropriate for larger databases. It's still prone to the
fibers blocking problem, but at least it's harder to cause segfaults
with Squee compared to SQLite.
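
The hook change described above (wake the worker when an event is
recorded, and still poll occasionally as a safety net) can be sketched
roughly as follows. This is an illustration in Python rather than Guile,
and not the coordinator's actual code: `HookWorker`, `fetch_events`,
`handle_event` and the 60 second default are all names and values
invented for the example.

```python
import threading


class HookWorker:
    """Process hook events when notified, with a periodic fallback poll."""

    def __init__(self, fetch_events, handle_event, poll_interval=60.0):
        # fetch_events() returns the events that still need processing;
        # handle_event(event) runs the hook for one event. Both are
        # hypothetical callables supplied by the caller.
        self._fetch_events = fetch_events
        self._handle_event = handle_event
        self._poll_interval = poll_interval
        self._wakeup = threading.Event()
        self._stopping = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def notify(self):
        """Call this right after a new event has been stored."""
        self._wakeup.set()

    def stop(self):
        self._stopping.set()
        self._wakeup.set()  # wake the worker so it can exit promptly
        self._thread.join()

    def _run(self):
        while not self._stopping.is_set():
            # Sleep until notified, or until the fallback interval passes.
            self._wakeup.wait(timeout=self._poll_interval)
            # Clear before fetching, so a notification that arrives while
            # events are being processed triggers another pass.
            self._wakeup.clear()
            for event in self._fetch_events():
                self._handle_event(event)
```

The code that records an event would call `notify()` after storing it.
If a notification is ever lost, the timeout on `wait` means the event is
still picked up within one polling interval.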