From: Ludovic Courtès
To: Christopher Baines
Subject: Re: Handling nars/narinfos at scale, some ideas...
Cc: guix-devel@gnu.org
Date: Wed, 10 Feb 2021 22:04:01 +0100
Message-ID: <87y2fv4ou6.fsf@gnu.org>
In-Reply-To: <87h7moyhv9.fsf@cbaines.net> (Christopher Baines's message of "Sat, 06 Feb 2021 22:02:50 +0000")
References: <87h7moyhv9.fsf@cbaines.net>
List-Id: "Development of GNU Guix and the GNU System distribution."

Hi Chris!

Christopher Baines wrote:

> When serving from a store, you can use guix gc to remove items, and gc
> roots to protect the items you want to keep. I'm not aware of similar
> tooling when you just have a bunch of nars+narinfo files. This means you
> either just delete files based on when you generated them, or don't
> delete anything and potentially have an ever growing collection of nars.

Nitpick: ‘guix publish’ has a simple LRU policy for its cache, based on
the atime of cached narinfos, which allows it to eventually reclaim
unpopular items.
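
To make the atime-based LRU idea concrete, here is a minimal sketch in
Python. It is not how ‘guix publish’ is implemented; the function name
`evict_lru`, the `*.narinfo` file layout, and the `max_entries` limit are
all assumptions made for illustration.

```python
from pathlib import Path


def evict_lru(cache_dir, max_entries):
    """Delete the least recently accessed *.narinfo files beyond max_entries.

    Hypothetical helper: sorts cached narinfos by access time (atime),
    oldest first, and removes the excess. Returns the removed paths.
    """
    entries = sorted(Path(cache_dir).glob("*.narinfo"),
                     key=lambda p: p.stat().st_atime)
    excess = max(0, len(entries) - max_entries)
    removed = entries[:excess]
    for path in removed:
        path.unlink()
    return removed
```

Note that this only works if the file system actually records access
times; on a file system mounted with ‘noatime’, the atime is not updated
on reads and the ordering would be meaningless.
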
> When serving the substitutes, there's advantages to having low latency
> access to the narinfo files, since they're very small. If you're trying
> to serve the whole world, one way of doing this would be to store the
> narinfos on several machines around the world, and direct requests for
> them to a machine that's close in terms of network latency. The relevant
> bit here is storing the narinfos on multiple machines, and keeping them
> in sync. This also may improve resilience if through this there's not a
> single point of failure with the one machine storing the narinfo files.
>
> I think these needs: doing garbage collection across narinfo data and
> storing narinfo data on multiple machines can be met with one
> approach. I'm also thinking this might be a good place to try and store
> analytics about the fetching of nars+narinfos.

I think what’s appropriate here is “cache eviction” rather than “garbage
collection”: in the former case, temporal locality is the driving factor
to determine what to remove, whereas in the latter case, reachability
from some roots is what matters. That’s the difference between
/var/cache/guix/publish and /gnu/store.

I believe here you’d typically want policies similar to that of ‘guix
publish’: LRU + minimum time-to-live. When things are distributed, it’s
a bit harder though: do you need to gather usage stats from all the
mirrors to the head? Or do you perform cache eviction on each mirror
with purely local knowledge?

In any case, you need to make sure that the ‘Cache-Control’ header sent
to the client with its narinfo reply is honored, i.e., that the nar will
remain available for the specified time, no matter which replica the
client ends up talking to.

> This new tool/service would be a standalone thing, but I'm very much
> thinking about deploying it alongside a Guix Build Coordinator
> instance.
> Again, while the Guix Build Coordinator can help with serving
> substitutes, that approach doesn't stretch yet to doing the things
> above.
>
> Note that while this does similar things to guix publish, it's not
> designed to replace it. This approach is probably only worth it if you
> want to store/serve nars+narinfos from more than one machine.
>
> I also don't see this as something to do instead of things like IPFS
> distribution for substitutes, but I do think it would be good to have a
> way of providing substitutes over HTTP which is reliable and works at a
> global scale.

Agreed on all points.

> The architecture I'm currently thinking about for this is to store the
> narinfo data in a PostgreSQL database. This will allow for storing the
> equivalent of "roots" in the graph, using SQL queries to traverse the
> graph to find the "garbage" and using logical replication to sync the
> data between multiple machines. Additionally, I'm thinking that the
> narinfos can be served directly from the database, and maybe analytics
> data (counts of narinfo requests) can be saved back to the database.

What about nars, BTW? :-)

> My testbed for this will probably be guix.cbaines.net, so I'll probably
> need to look at doing something to direct requests to different servers
> (maybe GeoIP with knot) and getting Let's Encrypt to work across multiple
> servers, but that can come later.
>
> Anyway, I haven't actually implemented this yet, but maybe after sending
> this email I'll be one step closer...
>
> Please let me know if you have any thoughts or questions!

That’s a pretty exciting project, and if it can address the
single-point-of-failure issue with ci.guix.gnu.org and also provide a
general solution to mirroring (rather than the ad-hoc solutions
discussed so far), that’s great!

Thanks,
Ludo’.
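
Note on the ‘Cache-Control’ guarantee discussed above: one way to read
the "gather usage stats at the head" option is sketched below in Python.
It is only an illustration under stated assumptions, not a proposed
design. The function name `evictable`, the per-mirror timestamp mapping,
and the rule that a never-served item may be evicted are all hypothetical.

```python
def evictable(last_served_by_mirror, ttl, now):
    """Return True if a nar may be evicted without breaking the advertised TTL.

    last_served_by_mirror maps a mirror name to the time (in seconds) at
    which that mirror last sent the corresponding narinfo to a client.
    The nar must stay available until `ttl` seconds after the most recent
    of those replies, whichever mirror sent it.
    """
    if not last_served_by_mirror:
        # Assumption: a narinfo that was never served carries no promise.
        return True
    return now - max(last_served_by_mirror.values()) > ttl
```

With purely local knowledge, a mirror would only see its own entry in
that mapping, so it could evict a nar that another mirror had promised a
client moments earlier; that is the trade-off behind the question above.
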