unofficial mirror of guix-devel@gnu.org 
* Handling nars/narinfos at scale, some ideas...
@ 2021-02-06 22:02 Christopher Baines
  2021-02-10 21:04 ` Ludovic Courtès
  0 siblings, 1 reply; 2+ messages in thread
From: Christopher Baines @ 2021-02-06 22:02 UTC (permalink / raw)
  To: guix-devel


Hey,

This is something I've been thinking about for a while; I also ended up
setting out some of these ideas on IRC a few days ago [1].

1: https://logs.guix.gnu.org/guix/2021-02-01.log#222156

While I think the approach taken in the Guix Build Coordinator for
serving substitutes for built outputs (generating the nar+narinfo files
upfront and storing them) is the way to go when you're trying to serve
lots of substitutes, there are some areas where this approach could be
improved.

When serving from a store, you can use guix gc to remove items, and gc
roots to protect the items you want to keep. I'm not aware of similar
tooling when you just have a bunch of nars+narinfo files. This means you
either just delete files based on when you generated them, or don't
delete anything, and potentially have an ever-growing collection of nars.

When serving substitutes, there are advantages to having low-latency
access to the narinfo files, since they're very small. If you're trying
to serve the whole world, one way of doing this would be to store the
narinfos on several machines around the world, and direct requests for
them to a machine that's close in terms of network latency. The relevant
bit here is storing the narinfos on multiple machines, and keeping them
in sync. This may also improve resilience, since there would no longer
be a single point of failure in the one machine storing the narinfo files.

I think both of these needs (garbage collection across narinfo data,
and storing narinfo data on multiple machines) can be met with one
approach. I'm also thinking this might be a good place to try to store
analytics about the fetching of nars+narinfos.

This new tool/service would be a standalone thing, but I'm very much
thinking about deploying it alongside a Guix Build Coordinator
instance. Again, while the Guix Build Coordinator can help with serving
substitutes, that approach doesn't stretch yet to doing the things
above.

Note that while this does similar things to guix publish, it's not
designed to replace it. This approach is probably only worth it if you
want to store/serve nars+narinfos from more than one machine.

I also don't see this as something to do instead of things like IPFS
distribution for substitutes, but I do think it would be good to have a
way of providing substitutes over HTTP which is reliable and works at a
global scale.

The architecture I'm currently thinking about for this is to store the
narinfo data in a PostgreSQL database. This will allow for storing the
equivalent of "roots" in the graph, using SQL queries to traverse the
graph to find the "garbage" and using logical replication to sync the
data between multiple machines. Additionally, I'm thinking that the
narinfos can be served directly from the database, and maybe analytics
data (counts of narinfo requests) can be saved back to the database.
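
To make that concrete, here's a rough sketch of what the "find the
garbage" query could look like, using SQLite in place of PostgreSQL and
a made-up schema (the narinfos, narinfo_references and roots tables are
assumptions for illustration, not a real design):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE narinfos (store_path TEXT PRIMARY KEY);
CREATE TABLE narinfo_references (referrer TEXT, referenced TEXT);
CREATE TABLE roots (store_path TEXT PRIMARY KEY);
""")

# A tiny graph: a rooted item referencing a library, plus one
# unreferenced item.
db.executemany("INSERT INTO narinfos VALUES (?)",
               [("/gnu/store/aaa-app",),
                ("/gnu/store/bbb-lib",),
                ("/gnu/store/ccc-old",)])
db.execute("INSERT INTO narinfo_references VALUES (?, ?)",
           ("/gnu/store/aaa-app", "/gnu/store/bbb-lib"))
db.execute("INSERT INTO roots VALUES (?)", ("/gnu/store/aaa-app",))

# Recursive CTE: everything reachable from the roots is live; any
# narinfo outside that set is garbage and its nar can be deleted.
garbage = db.execute("""
WITH RECURSIVE live(store_path) AS (
  SELECT store_path FROM roots
  UNION
  SELECT r.referenced FROM narinfo_references r
  JOIN live ON live.store_path = r.referrer
)
SELECT store_path FROM narinfos
WHERE store_path NOT IN (SELECT store_path FROM live)
""").fetchall()

print(garbage)  # -> [('/gnu/store/ccc-old',)]
```

The same WITH RECURSIVE query works unchanged in PostgreSQL, which is
part of the appeal of keeping the narinfo data in a relational database.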

My testbed for this will probably be guix.cbaines.net, so I'll probably
need to look at doing something to direct requests to different servers
(maybe GeoIP with Knot) and getting Let's Encrypt to work across multiple
servers, but that can come later.

Anyway, I haven't actually implemented this yet, but maybe after sending
this email I'll be one step closer...

Please let me know if you have any thoughts or questions!

Chris



* Re: Handling nars/narinfos at scale, some ideas...
  2021-02-06 22:02 Handling nars/narinfos at scale, some ideas Christopher Baines
@ 2021-02-10 21:04 ` Ludovic Courtès
  0 siblings, 0 replies; 2+ messages in thread
From: Ludovic Courtès @ 2021-02-10 21:04 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hi Chris!

Christopher Baines <mail@cbaines.net> skribis:

> When serving from a store, you can use guix gc to remove items, and gc
> roots to protect the items you want to keep. I'm not aware of similar
> tooling when you just have a bunch of nars+narinfo files. This means you
> either just delete files based on when you generated them, or don't
> delete anything, and potentially have an ever-growing collection of nars.

Nitpick: ‘guix publish’ has a simple LRU policy for its cache, based on
the atime of cached narinfos, which allows it to eventually reclaim
unpopular items.
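
A minimal sketch of that kind of policy, assuming a plain directory of
cached narinfo files (the helper name and the size budget are made up):

```python
import os
import pathlib
import tempfile
import time

def evict_lru(cache_dir, max_files):
    # Sort cache entries by last access time, oldest first, and delete
    # enough of them to get back under the file-count budget.
    entries = sorted(pathlib.Path(cache_dir).iterdir(),
                     key=lambda p: p.stat().st_atime)
    for victim in entries[:max(0, len(entries) - max_files)]:
        victim.unlink()

cache = tempfile.mkdtemp()
now = time.time()
for i, name in enumerate(["old.narinfo", "mid.narinfo", "new.narinfo"]):
    path = pathlib.Path(cache, name)
    path.write_text("StorePath: ...")
    os.utime(path, (now + i, now + i))  # set atime/mtime explicitly

evict_lru(cache, max_files=2)
print(sorted(p.name for p in pathlib.Path(cache).iterdir()))
# -> ['mid.narinfo', 'new.narinfo']: the least recently used file is gone
```

One caveat: on filesystems mounted with noatime (or relatime), atime may
not reflect actual reads, so a real deployment might track access times
itself rather than rely on the filesystem.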

> When serving substitutes, there are advantages to having low-latency
> access to the narinfo files, since they're very small. If you're trying
> to serve the whole world, one way of doing this would be to store the
> narinfos on several machines around the world, and direct requests for
> them to a machine that's close in terms of network latency. The relevant
> bit here is storing the narinfos on multiple machines, and keeping them
> in sync. This may also improve resilience, since there would no longer
> be a single point of failure in the one machine storing the narinfo files.
>
> I think both of these needs (garbage collection across narinfo data,
> and storing narinfo data on multiple machines) can be met with one
> approach. I'm also thinking this might be a good place to try to store
> analytics about the fetching of nars+narinfos.

I think what’s appropriate here is “cache eviction” rather than “garbage
collection”: in the former case, time locality is the driving factor to
determine what to remove, whereas in the latter case, reachability from
some roots is what matters.  That’s the difference between
/var/cache/guix/publish and /gnu/store.

I believe here you’d typically want policies similar to that of ‘guix
publish’: LRU + minimum time-to-live.  When things are distributed, it’s
a bit harder though: do you need to gather usage stats from all the
mirrors to the head? or do you perform cache eviction on each mirror
with purely local knowledge?

In any case, you need to make sure that the ‘Cache-Control’ header sent
to the client with its narinfo reply is honored—that the nar will remain
available for the specified time, no matter which replica the client
ends up talking to.
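
One way to combine those two constraints, sketched here with a made-up
record layout: evict in LRU order, but only among entries whose
advertised TTL has already expired, so a client holding a narinfo reply
is never told about a nar that has since been removed:

```python
def evictable(entries, now, min_ttl):
    # Only entries whose advertised Cache-Control TTL has expired are
    # candidates for eviction; among those, least recently used first.
    expired = [name for name, e in entries.items()
               if now - e["published"] >= min_ttl]
    return sorted(expired, key=lambda n: entries[n]["last_access"])

now = 1000.0
entries = {
    "fresh.narinfo":   {"published": now - 10,  "last_access": now - 1},
    "popular.narinfo": {"published": now - 500, "last_access": now - 5},
    "cold.narinfo":    {"published": now - 500, "last_access": now - 400},
}
print(evictable(entries, now, min_ttl=300))
# fresh.narinfo is protected by its TTL; cold.narinfo goes first
```

In the distributed case this suggests eviction with purely local
knowledge is safe for the TTL guarantee, as long as every replica uses
at least the TTL that was advertised to clients.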

> This new tool/service would be a standalone thing, but I'm very much
> thinking about deploying it alongside a Guix Build Coordinator
> instance. Again, while the Guix Build Coordinator can help with serving
> substitutes, that approach doesn't stretch yet to doing the things
> above.
>
> Note that while this does similar things to guix publish, it's not
> designed to replace it. This approach is probably only worth it if you
> want to store/serve nars+narinfos from more than one machine.
>
> I also don't see this as something to do instead of things like IPFS
> distribution for substitutes, but I do think it would be good to have a
> way of providing substitutes over HTTP which is reliable and works at a
> global scale.

Agreed on all points.

> The architecture I'm currently thinking about for this is to store the
> narinfo data in a PostgreSQL database. This will allow for storing the
> equivalent of "roots" in the graph, using SQL queries to traverse the
> graph to find the "garbage" and using logical replication to sync the
> data between multiple machines. Additionally, I'm thinking that the
> narinfos can be served directly from the database, and maybe analytics
> data (counts of narinfo requests) can be saved back to the database.

What about nars, BTW?  :-)

> My testbed for this will probably be guix.cbaines.net, so I'll probably
> need to look at doing something to direct requests to different servers
> (maybe GeoIP with Knot) and getting Let's Encrypt to work across multiple
> servers, but that can come later.
>
> Anyway, I haven't actually implemented this yet, but maybe after sending
> this email I'll be one step closer...
>
> Please let me know if you have any thoughts or questions!

That’s a pretty exciting project, and if it can address the
single-point-of-failure issue with ci.guix.gnu.org and also provide a
general solution to mirroring (rather than the ad-hoc solutions
discussed so far), that’s great!

Thanks,
Ludo’.


