From: "Ludovic Courtès" <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: guix-devel@gnu.org
Subject: Re: March update on bordeaux.guix.gnu.org
Date: Tue, 09 Apr 2024 17:37:29 +0200
Message-ID: <87edbemts6.fsf@gnu.org>
In-Reply-To: <87a5mhs1gi.fsf@cbaines.net> (Christopher Baines's message of "Fri, 29 Mar 2024 10:53:27 +0000")

Hello,

Christopher Baines <mail@cbaines.net> skribis:

> The max-age of that narinfo is currently based on the scheduled removal
> of the zstd compressed nar, which is going to happen quite far in the
> future.
>
> I did think of a number of ways to approach this, and I'm not sure I've
> settled on the right one yet. Maybe the TTL should be capped at 600, and
> then drop to 0 as the time to remove the zstd nar approaches?

Not sure what you mean by removal.  To me, it’s the other way around:
when the server advertises a TTL, it must guarantee that the associated
nars remain available until that TTL has expired.  (It can also keep them
longer.)

The way ‘guix publish’ honors that commitment is by keeping track of the
last time each narinfo was published and protecting the narinfo and nars
from deletion until the TTL has expired.
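A minimal shell sketch of both sides of that commitment, assuming the server knows when each narinfo was last published and when its nars are scheduled for removal.  The function names `advertised_ttl` and `may_delete`, and the 600-second cap (borrowed from the suggestion quoted above), are hypothetical; this is not how ‘guix publish’ or the nar-herder is actually implemented.

```shell
# Hypothetical helper: the TTL (in seconds) to advertise for a narinfo,
# capped at a maximum and never extending past the nars' scheduled removal.
#   $1: current Unix time
#   $2: Unix time at which the nars are scheduled for removal
#   $3: maximum TTL to advertise (e.g. 600)
advertised_ttl() {
    remaining=$(($2 - $1))
    if [ "$remaining" -le 0 ]; then
        echo 0                # removal is due: nothing may be cached
    elif [ "$remaining" -lt "$3" ]; then
        echo "$remaining"     # removal approaching: shrink toward 0
    else
        echo "$3"             # far from removal: use the cap
    fi
}

# Hypothetical retention check: a nar may be deleted only once the TTL
# advertised the last time its narinfo was published has expired.
#   $1: Unix time the narinfo was last published
#   $2: TTL advertised at that time, in seconds
#   $3: current Unix time
# Exit status 0 means deletion is allowed.
may_delete() {
    [ "$3" -ge $(($1 + $2)) ]
}
```

For example, `advertised_ttl "$(date +%s)" "$removal_time" 600` (with `$removal_time` a hypothetical variable) prints 600 until removal is less than ten minutes away, then counts down to 0, so the advertised TTL never outlives the nars.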

>> But then again, that’s the advertised TTL; the real TTL is still
>> infinite, right?
>
> As you probably know, the situation is more complex.
>
> The problems caused when the nar-herder started removing zstd-compressed
> nars show the difference between retaining the nar in some form and
> whether a cached narinfo response can be considered fresh or stale.
>
> Users might also not notice the availability of zstd nars if they cache
> responses forever, since currently there will be a lag between the nar
> becoming available and a zstd-compressed version being created (although
> we could generate zstd-compressed nars for everything).

I think we should arrange to never advertise nars that are not actually
available, if that’s what you meant.
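One way to picture that rule, as a minimal sketch: build the list of compressions advertised in a narinfo only from nar files that actually exist.  The directory layout (DIR/COMPRESSION/ITEM.nar) and the function name are hypothetical, not the actual layout used by ‘guix publish’ or the nar-herder.

```shell
# Print the compressions for which a nar file is actually present, so
# that only those would be listed in the narinfo.
# Hypothetical layout: DIR/COMPRESSION/ITEM.nar
#   $1: cache directory
#   $2: store item base name
#   remaining arguments: candidate compressions (e.g. gzip lzip zstd)
available_compressions() {
    dir=$1
    item=$2
    shift 2
    for comp in "$@"; do
        if [ -f "$dir/$comp/$item.nar" ]; then
            echo "$comp"
        fi
    done
}
```

A zstd variant that has not been generated yet (or has already been removed) then simply does not appear, rather than being advertised and failing on download.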

>> It’s a cache.  It’s useful to have this cache because in “typical” Guix
>> usage you’re likely to ask repeatedly for the same substitutes.
>>
>> Regarding the cost, 3f5e14182931f123c10513a3e1e2abaebfb52279 made things
>> more reasonable by putting a higher bound on narinfo retention.  On my
>> laptop, I have:
>>
>> $ ls -lrt /var/guix/substitute/cache/4refhwxbjmeua2kwg2nmzhv4dg4d3dorpjefq7kiciw2pfhaf26a/ |wc -l
>> 11549
>> $ du -h /var/guix/substitute/cache/4refhwxbjmeua2kwg2nmzhv4dg4d3dorpjefq7kiciw2pfhaf26a/ 
>> 50M     /var/guix/substitute/cache/4refhwxbjmeua2kwg2nmzhv4dg4d3dorpjefq7kiciw2pfhaf26a/
>>
>> Maybe that’s still excessive and we could further reduce the maximum
>> caching time.
>
> Having played around with this a bit (e.g. hacking guix weather not to
> cache), I'm a bit sceptical. Given that maintaining the cache takes time
> that could be spent on network I/O, and involves potentially slow disk
> I/O, I think it would be interesting to work out in which situations the
> cache speeds things up overall, and in which it slows things down.

Yes, we can surely fine-tune that to make sure it remains beneficial.
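As a first rough data point for choosing a lower maximum caching time, one could count how many cached narinfo entries are older than a given number of days.  This sketch assumes that each entry's modification time reflects when it was cached, which may not hold exactly; the function name is hypothetical.

```shell
# Count regular files under a cache directory whose modification time is
# more than DAYS days in the past (find truncates the age to whole days
# before comparing).
#   $1: cache directory
#   $2: age threshold in days
count_stale_entries() {
    find "$1" -type f -mtime +"$2" | wc -l | tr -d ' '
}
```

For instance, `count_stale_entries /var/guix/substitute/cache/4refhwxbjmeua2kwg2nmzhv4dg4d3dorpjefq7kiciw2pfhaf26a/ 7`, compared with the total count shown above, gives a rough idea of how much a one-week bound would trim.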

>>> 6: https://lists.gnu.org/archive/html/guix-devel/2023-05/msg00290.html
>>
>> BTW, should we document this mirror somewhere (and also ensure that Guix
>> Foundation pays the bills), or do you view it more as an experiment for
>> now?
>
> If the project does want to provide mirrors, I think that would be
> great. From this experiment, I think we have some evidence that there
> are people using Guix outside of Europe, and in some cases they struggle
> with the European based infrastructure. It also seems like these mirrors
> do help, and the monetary cost isn't too high in my view.
>
> I think we should probably wait until the project starts managing them
> before documenting/advertising them more widely though.

*We* are “the project”.  :-)

By that I mean that we should discuss it with people at the Guix
Foundation and with the broader community, but it seems pretty clear to
me that there’s interest in having mirrors up and running, especially
outside Europe.  Then of course we need to be able to scale it according
to available funds, but at least the goal itself is clear.

[...]

>> Do you think the Data Service or another source of info would let us
>> make such decisions?
>>
>> If we take it to the extreme, we could have a sophisticated retention
>> policy like: drop all fixed-output derivations known to be available
>> from disarchive.guix + SWH, drop substitutes for packages that have less
>> than 100 dependents, etc.
>
> I think the Data Service (specifically data.guix.gnu.org) might be
> really helpful here, as it makes it quicker to work out what a nar or
> derivation relates to.
>
> Additionally, the nar-herder can tag narinfos (associate key=value pairs
> with them), and that's intended to help you manage the nars. So we
> should probably start tagging the nars with potentially useful
> information now, so that we can use that data later to make decisions.

Really good.
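Once such information is recorded, applying a rule like the “less than 100 dependents” one above could be as simple as a filter.  This minimal sketch assumes a hypothetical tab-separated export with one line per nar (path, then dependent count); it does not reflect the nar-herder's actual tag names or database schema.

```shell
# Read "NAR-PATH<TAB>DEPENDENT-COUNT" lines on stdin and print the paths
# of nars with fewer than THRESHOLD dependents (removal candidates).
#   $1: threshold, e.g. 100
removal_candidates() {
    awk -F '\t' -v threshold="$1" '$2 < threshold { print $1 }'
}
```

The output is only a candidate list; other criteria, such as availability from disarchive.guix and SWH for fixed-output derivations, would still need to be checked before deleting anything.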

> We're storing 17.5TiB of nars currently, and this increases linearly, so
> it would be good to understand how this can be broken down. The
> nar-herder should help here as well: provided you can download the
> 11G database, it should contain all the information you need to start
> digging into this.
>
>   wget https://bordeaux.guix.gnu.org/latest-database-dump -O bordeaux.db

Neat, thanks!

Ludo’.


