unofficial mirror of guix-devel@gnu.org 
* Experimental nar-herder support for serving fixed output files by hash
@ 2022-06-24  8:10 Christopher Baines
  2022-06-24  8:31 ` Maxime Devos
  2022-06-27  8:52 ` Ludovic Courtès
  0 siblings, 2 replies; 7+ messages in thread
From: Christopher Baines @ 2022-06-24  8:10 UTC (permalink / raw)
  To: guix-devel


Hey!

The nar-herder helps with managing a collection of nars. There's some
overlap with the functionality of guix publish in that both tools can
serve narinfo files which is a key part of providing substitutes.

One thing that guix publish does aside from serving narinfo files is
providing access to files in the store produced by fixed output
derivations. Package sources are fixed output derivations, so this
basically means single file package sources, like tar files.

Using ci.guix.gnu.org as an example, this looks like:

  https://ci.guix.gnu.org/file/0ad-0.0.25b-alpha.tar.xz/sha256/1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp

You can request a file from the store if you know its name and
hash. In guix publish, this works by computing the
/gnu/store/... filename for a file with this name and hash, and then
serving it if it exists. Additionally, on ci.guix.gnu.org, there's some
NGinx caching in front, so some files may still be available even if
they've been removed from the store.
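
To illustrate the URL layout, here is a minimal Python sketch that builds
such a request URL from a base URL, a file name and a hash. The function
name content_addressed_url is made up for this example, and the check
assumes the hash is a SHA-256 in the 52-character nix-base32 form seen in
the URL above; it is not taken from guix publish itself.

```python
from urllib.parse import quote

# Alphabet of the nix-base32 encoding (the letters e, o, t and u are unused).
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def content_addressed_url(base_url, name, sha256_base32):
    """Build a /file/NAME/sha256/HASH URL (hypothetical helper)."""
    # Assumption: a SHA-256 hash in nix-base32 is 52 characters long.
    if len(sha256_base32) != 52:
        raise ValueError("expected a 52-character nix-base32 SHA-256 hash")
    if any(char not in NIX_BASE32 for char in sha256_base32):
        raise ValueError("hash contains characters outside nix-base32")
    # Percent-encode the file name so it stays a single path component.
    return f"{base_url.rstrip('/')}/file/{quote(name, safe='')}/sha256/{sha256_base32}"


if __name__ == "__main__":
    print(content_addressed_url(
        "https://ci.guix.gnu.org",
        "0ad-0.0.25b-alpha.tar.xz",
        "1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp"))
```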

With the nar-herder, the implementation is a little trickier. Since the
nar-herder manages a collection of nars, rather than serving things from
the store, it might have the file being requested but it's inside a
probably compressed nar file. So, to respond to these requests, the
nar-herder has to take the relevant nar file and then read the file out
of it. I've now got an initial implementation of this:

  https://git.cbaines.net/guix/nar-herder/commit/?id=042f49e5fb52ea844ed5d29c17b26fbc8ad49f0e

The code isn't great, and there's some difficulty in extracting the single
file from the nar, but the biggest problem is a limitation in the Guile
Fibers web server. Currently, responses have to be read into memory,
which is fine for web pages, but not great if you're trying to serve
files which can be multiple gigabytes in size. This also means that the
first byte of the response is only available once all the bytes are
available, so the download is slow to start.
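
As a rough illustration of the extraction step (not the nar-herder code,
which is written in Guile), the Python sketch below reads a nar that holds
a single regular file and yields its contents chunk by chunk, so nothing
larger than one chunk has to sit in memory. All function names are
hypothetical, and the input is assumed to be an already decompressed
byte stream.

```python
import io
import struct


def encode_nar_string(data):
    """Encode bytes as a nar string: 64-bit little-endian length,
    the data, then zero padding up to a multiple of 8 bytes."""
    return struct.pack("<Q", len(data)) + data + b"\0" * (-len(data) % 8)


def _read_exact(stream, count):
    """Read exactly COUNT bytes, looping over short reads."""
    parts = []
    while count:
        part = stream.read(count)
        if not part:
            raise ValueError("truncated nar")
        parts.append(part)
        count -= len(part)
    return b"".join(parts)


def _read_length(stream):
    return struct.unpack("<Q", _read_exact(stream, 8))[0]


def _read_string(stream):
    length = _read_length(stream)
    data = _read_exact(stream, length)
    _read_exact(stream, -length % 8)  # discard the padding
    return data


def _expect(stream, token):
    found = _read_string(stream)
    if found != token:
        raise ValueError(f"expected {token!r}, found {found!r}")


def iter_single_file_nar(stream, chunk_size=64 * 1024):
    """Yield the contents of a nar holding one regular file."""
    for token in (b"nix-archive-1", b"(", b"type", b"regular"):
        _expect(stream, token)
    tag = _read_string(stream)
    if tag == b"executable":  # optional marker, followed by an empty string
        _expect(stream, b"")
        tag = _read_string(stream)
    if tag != b"contents":
        raise ValueError(f"unexpected field {tag!r}")
    size = _read_length(stream)
    remaining = size
    while remaining:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("truncated nar")
        remaining -= len(chunk)
        yield chunk
    _read_exact(stream, -size % 8)  # padding after the contents
    _expect(stream, b")")
```

For a compressed nar, the stream would first be wrapped in a matching
decompressor (for example lzma.open for an xz-compressed file); which
compression a given collection uses is an assumption to check.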

With all of that said though, it does seem to work. For testing, I've
enabled it on bishan, which serves the bordeaux.guix.gnu.org collection
of nars. It only has IPv6 connectivity, so you'll only be able to try
this out if you've got IPv6 support locally:

  https://bishan.guix.gnu.org/file/0ad-0.0.25b-alpha.tar.xz/sha256/1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp

In terms of next steps, there are some things to do to improve the
implementation, but it would be good to hear whether this is actually
worthwhile.

ci.guix.gnu.org is already used as a content addressed mirror, although
given that there's a push to keep the store on berlin small, I'm not
sure how many files are actually available, or will be available in the
future. There's a 50G NGinx cache, of which I think 7G is used, so this
feature is probably being used a bit at least.

In terms of what enabling this for the bordeaux.guix.gnu.org collection
of nars would look like, I think there are roughly 50,000 tarballs taking
up at least a tebibyte of space [1] which would be downloadable. These are
available as substitutes, but maybe there's value in making them
available this way as well?

Let me know what you think.

Thanks,

Chris


1:
sqlite> SELECT SUM(size) FROM narinfo_files WHERE url LIKE '%.tar.%';
1102376493623
sqlite> SELECT COUNT(*) FROM narinfo_files WHERE url LIKE '%.tar.%';
48326
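
For reference, the two query results above can be converted into the
"roughly 50,000 tarballs" and "at least a tebibyte" figures with a little
arithmetic. The Python snippet below is only a worked calculation; it
assumes the size column is in bytes.

```python
# Results of the two SQL queries above (size assumed to be in bytes).
TOTAL_BYTES = 1102376493623
TARBALL_COUNT = 48326

TIB = 2 ** 40  # one tebibyte
MIB = 2 ** 20  # one mebibyte

total_tib = TOTAL_BYTES / TIB
mean_mib = TOTAL_BYTES / TARBALL_COUNT / MIB

print(f"{total_tib:.3f} TiB in total, about {mean_mib:.1f} MiB per tarball")
```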



* Re: Experimental nar-herder support for serving fixed output files by hash
  2022-06-24  8:10 Experimental nar-herder support for serving fixed output files by hash Christopher Baines
@ 2022-06-24  8:31 ` Maxime Devos
  2022-06-27  8:52 ` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Maxime Devos @ 2022-06-24  8:31 UTC (permalink / raw)
  To: Christopher Baines, guix-devel


Christopher Baines wrote on Fri 24-06-2022 at 09:10 [+0100]:
> [...]
> In terms of next steps, there are some things to do to improve the
> implementation, but it would be good to hear whether this is actually
> worthwhile.

I wouldn't know about the Guix part, but supporting streaming reading
responses in Guile(-Fibers) sounds useful outside Guix as well.  IIRC,
at some point in the past, I tried doing some streaming of reading
responses (or was it writing responses?) and it didn't work out ...

I'm not sure what ‘this’ refers to here: supporting fixed-output
derivations in general, or improving the implementation?  I don't know
the answer to the latter (except for a generic ‘less memory/latency =
good’ answer), but I'd like to say that ci.guix.gnu.org's support for
fixed-output derivations makes it effectively act as a mirror for all
source code used in Guix, so I'd like to keep that behaviour (*).
(There's always SWH, but avoiding a single point of failure would be
nice ...)

(*) I don't know if a lack of support for fixed-output derivations in
nar-herder would affect this ...

Greetings,
Maxime.



* Re: Experimental nar-herder support for serving fixed output files by hash
  2022-06-24  8:10 Experimental nar-herder support for serving fixed output files by hash Christopher Baines
  2022-06-24  8:31 ` Maxime Devos
@ 2022-06-27  8:52 ` Ludovic Courtès
  2022-06-27 11:58   ` Christopher Baines
  1 sibling, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2022-06-27  8:52 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hello,

Christopher Baines <mail@cbaines.net> writes:

> With the nar-herder, the implementation is a little trickier. Since the
> nar-herder manages a collection of nars, rather than serving things from
> the store, it might have the file being requested but it's inside a
> probably compressed nar file. So, to respond to these requests, the
> nar-herder has to take the relevant nar file and then read the file out
> of it. I've now got an initial implementation of this:
>
>   https://git.cbaines.net/guix/nar-herder/commit/?id=042f49e5fb52ea844ed5d29c17b26fbc8ad49f0e

Interesting.

> The code isn't great, and there's some difficulty in extracting the single
> file from the nar, but the biggest problem is a limitation in the Guile
> Fibers web server. Currently, responses have to be read into memory,
> which is fine for web pages, but not great if you're trying to serve
> files which can be multiple gigabytes in size. This also means that the
> first byte of the response is only available once all the bytes are
> available, so the download is slow to start.

That, and in practice a cache (with some eviction mechanism) would be
necessary so nars don’t need to be extracted every time and so we can
use sendfile(2).
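
One possible shape for such a cache, sketched in Python purely for
illustration: extracted files are kept on disk under a key (for example
the hash from the request) and the least recently used ones are deleted
once a size budget is exceeded. The class name, its methods and the
extract callback are all hypothetical and not part of the nar-herder or
‘guix publish’.

```python
import os
from collections import OrderedDict


class ExtractedFileCache:
    """Size-bounded LRU cache of files extracted from nars (a sketch)."""

    def __init__(self, directory, max_bytes):
        self.directory = directory
        self.max_bytes = max_bytes
        # key -> size in bytes, ordered from least to most recently used
        self._sizes = OrderedDict()

    def _path(self, key):
        return os.path.join(self.directory, key)

    def get(self, key, extract):
        """Return the path of the cached file for KEY.

        On a miss, call extract(path), which must write the file."""
        path = self._path(key)
        if key in self._sizes:
            self._sizes.move_to_end(key)  # mark as recently used
            return path
        extract(path)
        self._sizes[key] = os.path.getsize(path)
        self._evict()
        return path

    def _evict(self):
        # Drop least recently used entries, but always keep the newest one.
        while sum(self._sizes.values()) > self.max_bytes and len(self._sizes) > 1:
            oldest, _ = self._sizes.popitem(last=False)
            os.remove(self._path(oldest))
```

Once the file exists on disk, a server can hand its file descriptor to
sendfile(2) (exposed in Python as os.sendfile) instead of copying the
bytes through user space.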

> With all of that said though, it does seem to work. For testing, I've
> enabled it on bishan, which serves the bordeaux.guix.gnu.org collection
> of nars. It only has IPv6 connectivity, so you'll only be able to try
> this out if you've got IPv6 support locally:
>
>   https://bishan.guix.gnu.org/file/0ad-0.0.25b-alpha.tar.xz/sha256/1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp

Nice!

> In terms of next steps, there are some things to do to improve the
> implementation, but it would be good to hear whether this is actually
> worthwhile.

IWBN to share as much code as possible with ‘guix publish’, which has
great test suite coverage and is being hammered every day.  Clearly the
bit about extracting nars is specific to the nar-herder though, so that
may prove difficult.

Thoughts?

Ludo’.



* Re: Experimental nar-herder support for serving fixed output files by hash
  2022-06-27  8:52 ` Ludovic Courtès
@ 2022-06-27 11:58   ` Christopher Baines
  2022-06-27 17:53     ` Maxim Cournoyer
  2022-06-30 11:40     ` Ludovic Courtès
  0 siblings, 2 replies; 7+ messages in thread
From: Christopher Baines @ 2022-06-27 11:58 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel



Ludovic Courtès <ludo@gnu.org> writes:

>> The code isn't great, and there's some difficulty in extracting the single
>> file from the nar, but the biggest problem is a limitation in the Guile
>> Fibers web server. Currently, responses have to be read into memory,
>> which is fine for web pages, but not great if you're trying to serve
>> files which can be multiple gigabytes in size. This also means that the
>> first byte of the response is only available once all the bytes are
>> available, so the download is slow to start.
>
> That, and in practice a cache (with some eviction mechanism) would be
> necessary so nars don’t need to be extracted every time and so we can
> use sendfile(2).

I'd actually imagined that this would be used infrequently, but yeah, if
the decompression does become a bottleneck, then some caching reverse
proxy could help reduce that.

>> With all of that said though, it does seem to work. For testing, I've
>> enabled it on bishan, which serves the bordeaux.guix.gnu.org collection
>> of nars. It only has IPv6 connectivity, so you'll only be able to try
>> this out if you've got IPv6 support locally:
>>
>>   https://bishan.guix.gnu.org/file/0ad-0.0.25b-alpha.tar.xz/sha256/1p9fa8f7sjb9c5wl3mawzyfqvgr614kdkhrj2k4db9vkyisws3fp
>
> Nice!
>
>> In terms of next steps, there are some things to do to improve the
>> implementation, but it would be good to hear whether this is actually
>> worthwhile.
>
> IWBN to share as much code as possible with ‘guix publish’, which has
> great test suite coverage and is being hammered every day.  Clearly the
> bit about extracting nars is specific to the nar-herder though, so that
> may prove difficult.

I'm going to look at the Guile Fibers web server, hopefully that can be
improved to support streaming responses, which would allow removing a
lot of custom code from guix publish.

There isn't all that much code to the nar-herder though, and most of
what is there is doing different things to guix publish, so I'm not sure
there's all that much to share.

What I was getting at here though, ignoring the implementation, was
whether this is worthwhile to do? As in, is there benefit to having this
and being able to extend the content addressed mirrors that Guix uses?

Thanks,

Chris



* Re: Experimental nar-herder support for serving fixed output files by hash
  2022-06-27 11:58   ` Christopher Baines
@ 2022-06-27 17:53     ` Maxim Cournoyer
  2022-06-30 11:40     ` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Maxim Cournoyer @ 2022-06-27 17:53 UTC (permalink / raw)
  To: Christopher Baines; +Cc: Ludovic Courtès, guix-devel

Hi,

[...]

> What I was getting at here though, ignoring the implementation, was
> whether this is worthwhile to do? As in, is there benefit to having this
> and being able to extend the content addressed mirrors that Guix uses?

It's an effort/reward analysis that you are in the best position to
judge, but as an external observer there's definitely value in having
redundancy also at the level of serving content-addressable items.

So if you have the motivation to do it, I encourage you to do so!  And
if you can improve Guile-Fibers in the process, that'll benefit more
than just one code base, which would also be nice.

Thanks,

Maxim



* Re: Experimental nar-herder support for serving fixed output files by hash
  2022-06-27 11:58   ` Christopher Baines
  2022-06-27 17:53     ` Maxim Cournoyer
@ 2022-06-30 11:40     ` Ludovic Courtès
  2022-06-30 18:28       ` Christopher Baines
  1 sibling, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2022-06-30 11:40 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hi,

Christopher Baines <mail@cbaines.net> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>>> The code isn't great, and there's some difficulty in extracting the single
>>> file from the nar, but the biggest problem is a limitation in the Guile
>>> Fibers web server. Currently, responses have to be read into memory,
>>> which is fine for web pages, but not great if you're trying to serve
>>> files which can be multiple gigabytes in size. This also means that the
>>> first byte of the response is only available once all the bytes are
>>> available, so the download is slow to start.
>>
>> That, and in practice a cache (with some eviction mechanism) would be
>> necessary so nars don’t need to be extracted every time and so we can
>> use sendfile(2).
>
> I'd actually imagined that this would be used infrequently, but yeah, if
> the decompression does become a bottleneck, then some caching reverse
> proxy could help reduce that.

Relying on a proxy may be insufficient, because you still have incoming
requests that can trigger unbounded peaks of I/O and CPU usage, and
these requests may not be satisfied in time (the client may hang up
before the server is done processing the nar.)
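
One way to put a bound on those peaks, sketched here in Python as an
illustration only, is to cap how many extractions may run at once and to
turn away further requests (for instance with a 503 response) while all
slots are taken. The ExtractionLimiter class and its try_run method are
invented for this example.

```python
import threading


class ExtractionLimiter:
    """Allow at most MAX_CONCURRENT extractions at a time (a sketch)."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_run(self, extract):
        """Run extract() if a slot is free.

        Returns (True, result) when the work ran, or (False, None) when
        the limit was reached, so the caller can reply with an error
        instead of queueing unbounded work."""
        if not self._slots.acquire(blocking=False):
            return False, None
        try:
            return True, extract()
        finally:
            self._slots.release()
```

This limits CPU and I/O use but does not by itself help a client whose
request takes too long; a cache of extracted files addresses that part.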

>> IWBN to share as much code as possible with ‘guix publish’, which has
>> great test suite coverage and is being hammered every day.  Clearly the
>> bit about extracting nars is specific to the nar-herder though, so that
>> may prove difficult.
>
> I'm going to look at the Guile Fibers web server, hopefully that can be
> improved to support streaming responses, which would allow removing a
> lot of custom code from guix publish.

By “streaming responses”, do you mean pipelining?

How would that affect ‘guix publish’?

> There isn't all that much code to the nar-herder though, and most of
> what is there is doing different things to guix publish, so I'm not sure
> there's all that much to share.
>
> What I was getting at here though, ignoring the implementation, was
> whether this is worthwhile to do? As in, is there benefit to having this
> and being able to extend the content addressed mirrors that Guix uses?

Having more content-addressed mirrors is worthwhile IMO, yes.

Having two different implementations of the same interfaces may not be
ideal, though, in terms of long-term maintenance cost.

Thanks,
Ludo’.



* Re: Experimental nar-herder support for serving fixed output files by hash
  2022-06-30 11:40     ` Ludovic Courtès
@ 2022-06-30 18:28       ` Christopher Baines
  0 siblings, 0 replies; 7+ messages in thread
From: Christopher Baines @ 2022-06-30 18:28 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel



Ludovic Courtès <ludo@gnu.org> writes:

>>> IWBN to share as much code as possible with ‘guix publish’, which has
>>> great test suite coverage and is being hammered every day.  Clearly the
>>> bit about extracting nars is specific to the nar-herder though, so that
>>> may prove difficult.
>>
>> I'm going to look at the Guile Fibers web server, hopefully that can be
>> improved to support streaming responses, which would allow removing a
>> lot of custom code from guix publish.
>
> By “streaming responses”, do you mean pipelining?

Streaming might not be the best word, but I'm referring to not having to
have the entire response body in memory before sending the first byte.

I've now done an initial implementation of this:

  https://github.com/wingo/fibers/pull/63
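
Separately from that pull request, the general idea can be shown with a
small Python WSGI sketch: the application returns an iterator of chunks
rather than one large byte string, so the server can send the first chunk
before the rest has been read. This is not how Guile Fibers does it; the
names file_chunks and make_streaming_app are made up for the example.

```python
import io


def file_chunks(stream, chunk_size=64 * 1024):
    """Yield the stream's contents one chunk at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def make_streaming_app(open_stream, length, chunk_size=64 * 1024):
    """Return a WSGI application that streams open_stream() to the client.

    LENGTH is the body size in bytes, sent as Content-Length."""
    def app(environ, start_response):
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(length)),
        ])
        # Returning a generator means the body is produced lazily,
        # one chunk per iteration, instead of being built up front.
        return file_chunks(open_stream(), chunk_size)
    return app


if __name__ == "__main__":
    data = b"example body " * 1000
    app = make_streaming_app(lambda: io.BytesIO(data), len(data), chunk_size=4096)
    body = app({}, lambda status, headers: None)
    print(len(next(body)))  # size of the first chunk, read before the rest
```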

> How would that affect ‘guix publish’?

My reading of the concurrent-http-server in the publish script suggests
it could be replaced by the fibers web server, if it supports
"streaming" response bodies, as I've described above.

That might introduce some fibers related issues as I'm guessing there
might be some native code used for compression, but it would be a way to
remove the workarounds related to the web server part.

>> There isn't all that much code to the nar-herder though, and most of
>> what is there is doing different things to guix publish, so I'm not sure
>> there's all that much to share.
>>
>> What I was getting at here though, ignoring the implementation, was
>> whether this is worthwhile to do? As in, is there benefit to having this
>> and being able to extend the content addressed mirrors that Guix uses?
>
> Having more content-addressed mirrors is worthwhile IMO, yes.

Cool :)

> Having two different implementations of the same interfaces may not be
> ideal, though, in terms of long-term maintenance cost.

Indeed, and I'm not set on keeping the nar-herder separate from guix
publish, but the nar-herder does seem to be doing a good job, and I
haven't seen a way of unifying the tooling yet.

Thanks,

Chris


