From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mp1 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id OAw5FCESH2CBDQAA0tVLHw (envelope-from ) for ; Sat, 06 Feb 2021 22:03:13 +0000
Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1 with LMTPS id 0KMDECESH2A/LwAAbx9fmQ (envelope-from ) for ; Sat, 06 Feb 2021 22:03:13 +0000
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id CC14E9401C0 for ; Sat, 6 Feb 2021 22:03:12 +0000 (UTC)
Received: from localhost ([::1]:42320 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1l8Vfb-0006Mk-OM for larch@yhetil.org; Sat, 06 Feb 2021 17:03:11 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:54680) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1l8VfO-0006Mc-J5 for guix-devel@gnu.org; Sat, 06 Feb 2021 17:02:58 -0500
Received: from mira.cbaines.net ([212.71.252.8]:55264) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1l8VfM-0007HT-Ka for guix-devel@gnu.org; Sat, 06 Feb 2021 17:02:58 -0500
Received: from localhost (unknown [IPv6:2a02:8010:68c1:0:8ac0:b4c7:f5c8:7caa]) by mira.cbaines.net (Postfix) with ESMTPSA id 08A4327BC1E for ; Sat, 6 Feb 2021 22:02:55 +0000 (GMT)
Received: from capella (localhost [127.0.0.1]) by localhost (OpenSMTPD) with ESMTP id a5b781db for ; Sat, 6 Feb 2021 22:02:54 +0000 (UTC)
User-agent: mu4e 1.4.14; emacs 27.1
From: Christopher Baines
To: guix-devel@gnu.org
Subject: Handling nars/narinfos at scale, some ideas...
Date: Sat, 06 Feb 2021 22:02:50 +0000
Message-ID: <87h7moyhv9.fsf@cbaines.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature"
Received-SPF: pass client-ip=212.71.252.8; envelope-from=mail@cbaines.net; helo=mira.cbaines.net
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: guix-devel@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Development of GNU Guix and the GNU System distribution."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: guix-devel-bounces+larch=yhetil.org@gnu.org
Sender: "Guix-devel"
X-Migadu-Flow: FLOW_IN
X-Migadu-Spam-Score: -4.46
Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of guix-devel-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-devel-bounces@gnu.org
X-Migadu-Queue-Id: CC14E9401C0
X-Spam-Score: -4.46
X-Migadu-Scanner: scn1.migadu.com
X-TUID: dFbpBfy5cGII

--=-=-=
Content-Type: text/plain

Hey,

This is something I've been thinking about for a while, and I ended up
setting out some of these ideas on IRC a few days ago [1].

1: https://logs.guix.gnu.org/guix/2021-02-01.log#222156

I think the approach taken in the Guix Build Coordinator for serving
substitutes for built outputs, generating the nar+narinfo files upfront
and storing them, is the way to go when you're trying to serve lots of
substitutes. There are still some areas for improvement with this
approach, though.

When serving from a store, you can use guix gc to remove items, and gc
roots to protect the items you want to keep. I'm not aware of similar
tooling when you just have a bunch of nar+narinfo files.
This means you either delete files based on when you generated them, or
don't delete anything and potentially end up with an ever-growing
collection of nars.

When serving substitutes, there are advantages to having low-latency
access to the narinfo files, since they're very small. If you're trying
to serve the whole world, one way of doing this would be to store the
narinfos on several machines around the world, and direct requests for
them to a machine that's close in terms of network latency. The relevant
bit here is storing the narinfos on multiple machines and keeping them
in sync. This may also improve resilience, since no one machine storing
the narinfo files would be a single point of failure.

I think these two needs, doing garbage collection across narinfo data
and storing narinfo data on multiple machines, can be met with one
approach. I'm also thinking this might be a good place to try and store
analytics about the fetching of nars+narinfos.

This new tool/service would be a standalone thing, but I'm very much
thinking about deploying it alongside a Guix Build Coordinator
instance. Again, while the Guix Build Coordinator can help with serving
substitutes, that approach doesn't yet stretch to doing the things
above.

Note that while this does similar things to guix publish, it's not
designed to replace it. This approach is probably only worth it if you
want to store/serve nars+narinfos from more than one machine. I also
don't see this as something to do instead of things like IPFS
distribution for substitutes, but I do think it would be good to have a
way of providing substitutes over HTTP which is reliable and works at a
global scale.

The architecture I'm currently thinking about for this is to store the
narinfo data in a PostgreSQL database.
This will allow for storing the equivalent of "roots" in the graph,
using SQL queries to traverse the graph to find the "garbage", and
using logical replication to sync the data between multiple
machines. Additionally, I'm thinking that the narinfos can be served
directly from the database, and maybe analytics data (counts of narinfo
requests) can be saved back to the database.

My testbed for this will probably be guix.cbaines.net, so I'll probably
need to look at doing something to direct requests to different servers
(maybe GeoIP with Knot) and getting Let's Encrypt to work across
multiple servers, but that can come later.

Anyway, I haven't actually implemented this yet, but maybe after sending
this email I'll be one step closer...

Please let me know if you have any thoughts or questions!

Chris

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmAfEgtfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9Xe+AhAAngwzfdijYuIrFBRxQnvUgDqfVmSXVb4q
V6O1HJhMGHTFCuBPQNTPn3IZ7PdI2Aq6th7/xAS3MkS+3tUfses75qlAlzzqLAqa
ULoEW126k4EADuiX9GWkZuAi/hOrrQ8XhImzxYmivId+5FjMIEndnkIhipXhwoGU
GCe+/iXTNmP/j56/0aGTybAIVwbcdIap4X1BGuYi6YpkBQ7GmuaOYGHB+jvowgj4
j71FA740PVCdaHkKlHhK98OlPIKwBhCXByXb5D9tvZTFU7bDlkWUe/UTEvr0OK1M
LZItBttKO8qlz9NzbTjI1t0HnGcvzsSWfGOJ6L4LskO2AJTQb5wvDQrc7M0wronF
Pypj4piduBEO+NzPYO861ixu/MeQFwW2FTxjXacrI2tzG/AtppmCxUH3MoQKMBYA
1uOrXnghSQZhjvUsLNBgfAFznF/NWoVNkIU4BHCDaPM0ro6vPi1r7aVua2L+ZY0Z
o59fFtifs5HLXD2KqR1oZHqaGbEsdisQUJW8aeXC/jJ9LicRXJoLsvnG5WUAG33u
65kyiCZaWjf40dNvCLDmidFJIc81wtBYQcXcfLg+hMo57D7Zg2Q75LLvkex8v0Kf
1DyXjm6AZGAE13Y0Lb8KOGN0aUCSCUuXiRdNbTSfJFWShaZ3hB06lSwFhvw1RwEZ
K17xyHAvw9o=
=QmUg
-----END PGP SIGNATURE-----
--=-=-=--
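
The garbage collection idea in the message (store "roots", then use SQL
to walk the reference graph and find what is unreachable) could look
roughly like the sketch below. The message says nothing had been
implemented yet, so none of this is the author's design. The table names
(narinfos, narinfo_refs, roots), the column names, and the shortened
store item names are all hypothetical. SQLite stands in for PostgreSQL
only so the example runs without a server; the WITH RECURSIVE query is
written to work in both.

```python
import sqlite3

# Hypothetical schema: one row per narinfo, one row per reference edge,
# and one row per "root" that must be kept (similar in spirit to gc roots).
SCHEMA = """
CREATE TABLE narinfos (path TEXT PRIMARY KEY);
CREATE TABLE narinfo_refs (path TEXT NOT NULL, ref_path TEXT NOT NULL);
CREATE TABLE roots (path TEXT NOT NULL);
"""

# Recursive query: start from the roots, follow references, and report
# every narinfo that was never reached. UNION (rather than UNION ALL)
# drops duplicates, so self-references and cycles terminate.
GARBAGE_QUERY = """
WITH RECURSIVE live(path) AS (
    SELECT path FROM roots
    UNION
    SELECT r.ref_path
    FROM narinfo_refs AS r
    JOIN live ON r.path = live.path
)
SELECT path FROM narinfos
WHERE path NOT IN (SELECT path FROM live)
ORDER BY path
"""


def build_example():
    """Create an in-memory database with a small, made-up reference graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO narinfos VALUES (?)",
        [("hello",), ("glibc",), ("gcc-lib",), ("old-pkg",), ("old-dep",)],
    )
    conn.executemany(
        "INSERT INTO narinfo_refs VALUES (?, ?)",
        [
            ("hello", "glibc"),
            ("hello", "gcc-lib"),
            ("gcc-lib", "glibc"),
            ("glibc", "glibc"),      # store items may reference themselves
            ("old-pkg", "old-dep"),
            ("old-pkg", "glibc"),
        ],
    )
    # Only "hello" is protected; everything it needs stays live.
    conn.execute("INSERT INTO roots VALUES (?)", ("hello",))
    conn.commit()
    return conn


def find_garbage(conn):
    """Return the narinfo paths that no root reaches, sorted by path."""
    return [row[0] for row in conn.execute(GARBAGE_QUERY)]


if __name__ == "__main__":
    print(find_garbage(build_example()))
```

In this made-up graph only "old-pkg" and "old-dep" come back as garbage:
"glibc" stays because the rooted "hello" still references it, even
though the unrooted "old-pkg" references it too.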