Building and caching old Guix derivations for a faster time machine

unofficial mirror of guix-devel@gnu.org 

* Building and caching old Guix derivations for a faster time machine
@ 2023-11-10  9:29 Ricardo Wurmus
  2023-11-16  9:59 ` Simon Tournier
  2023-11-16 15:39 ` Ludovic Courtès
  0 siblings, 2 replies; 11+ messages in thread
From: Ricardo Wurmus @ 2023-11-10  9:29 UTC (permalink / raw)
  To: guix-devel

Hi Guix,

to me the biggest downside of using “guix time-machine” is that it has
to do a lot of boring work before the interesting work begins.  The
boring work includes building Guix derivations for the given channels,
most of which have long been collected as garbage on ci.guix.gnu.org.

It would be helpful, I think, to more aggressively cache these
derivations and their outputs, and to go back in time and build the
derivations for past revisions of Guix.  I would expect there to be a
lot of overlap in the produced files, so perhaps it won’t cost all that
much in terms of storage.

What do you think?

-- 
Ricardo


* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-10  9:29 Building and caching old Guix derivations for a faster time machine Ricardo Wurmus
@ 2023-11-16  9:59 ` Simon Tournier
  2023-11-16 15:39 ` Ludovic Courtès
  1 sibling, 0 replies; 11+ messages in thread
From: Simon Tournier @ 2023-11-16  9:59 UTC (permalink / raw)
  To: Ricardo Wurmus, guix-devel; +Cc: Christopher Baines

Hi Ricardo,

On Fri, 10 Nov 2023 at 10:29, Ricardo Wurmus <rekado@elephly.net> wrote:

> to me the biggest downside of using “guix time-machine” is that it has
> to do a lot of boring work before the interesting work begins.  The
> boring work includes building Guix derivations for the given channels,
> most of which have long been collected as garbage on ci.guix.gnu.org.
>
> It would be helpful, I think, to more aggressively cache these
> derivations and their outputs, and to go back in time and build the
> derivations for past revisions of Guix.  I would expect there to be a
> lot of overlap in the produced files, so perhaps it won’t cost all that
> much in terms of storage.

I agree.  And it rings a bell about a discussion on the private
guix-sysadmin mailing list, subject: Backup for substitutes.

Back in 2022, Alexandre from Univ. Montpellier proposed storing the
artifacts the project would like to keep for the longer term.  Well, I
do not know what the current status is.

That said, I am in favor of the following:

 1. keep all the derivations and their outputs required by Guix itself.

 2. keep all the outputs for some specific revisions; say v1.0, v1.1,
 v1.2, v1.3, v1.4, and some other chosen points in time.

About #1, it will help when running “guix time-machine”.  About #2, it
will help with concrete issues such as time bombs when running “guix
time-machine -- shell”.

The questions are: which server?  Who maintains it? :-)

Cheers,
simon


* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-10  9:29 Building and caching old Guix derivations for a faster time machine Ricardo Wurmus
  2023-11-16  9:59 ` Simon Tournier
@ 2023-11-16 15:39 ` Ludovic Courtès
  2023-11-18  4:27   ` Maxim Cournoyer
  1 sibling, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2023-11-16 15:39 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Ricardo Wurmus <rekado@elephly.net> writes:

> to me the biggest downside of using “guix time-machine” is that it has
> to do a lot of boring work before the interesting work begins.  The
> boring work includes building Guix derivations for the given channels,
> most of which have long been collected as garbage on ci.guix.gnu.org.
>
> It would be helpful, I think, to more aggressively cache these
> derivations and their outputs, and to go back in time and build the
> derivations for past revisions of Guix.  I would expect there to be a
> lot of overlap in the produced files, so perhaps it won’t cost all that
> much in terms of storage.
>
> What do you think?

I agree.  The ‘guix publish’ TTL¹ at ci.guix was increased to 180 days
following <https://issues.guix.gnu.org/48926> in 2021.  That’s still not
that much, and right now we have 84 TiB free at ci.guix.

I guess we can afford to increase the TTL, probably starting with, say,
300 days, and monitoring disk usage.

WDYT?

For longer-term storage though, we’ll need a solution like what Simon
described, offered by university colleagues.  I’m not sure why this
particular effort stalled; we should check with whoever spearheaded it
and see if we can resume.

Thanks,
Ludo’.

¹ That’s the time-to-live, which denotes the minimum time a substitute
  is kept.  Anytime a substitute is queried, its “age” is reset; if
  nobody asks for it, it may be reclaimed after its TTL has expired.
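The footnote's rule can be modelled in a few lines.  This is a minimal
sketch, not the actual ‘guix publish’ code: `CacheEntry` and
`reclaimable` are hypothetical names, and time is plain seconds.  It
shows why the TTL is a *minimum* retention time: every query pushes the
expiry date forward.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass
class CacheEntry:
    """A cached substitute (hypothetical model); last_access in seconds."""
    name: str
    last_access: float

    def touch(self, now: float) -> None:
        # Querying a substitute resets its "age".
        self.last_access = now


def reclaimable(entry: CacheEntry, now: float, ttl_days: int = 180) -> bool:
    # An entry may only be reclaimed once nobody has asked for it
    # for longer than the TTL.
    return now - entry.last_access > ttl_days * DAY


if __name__ == "__main__":
    e = CacheEntry("guix-module-union", last_access=0)
    print(reclaimable(e, now=200 * DAY))  # unused for 200 days
    e.touch(now=150 * DAY)
    print(reclaimable(e, now=200 * DAY))  # queried 50 days ago
```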


* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-16 15:39 ` Ludovic Courtès
@ 2023-11-18  4:27   ` Maxim Cournoyer
  2023-11-22 18:27     ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Cournoyer @ 2023-11-18  4:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Ricardo Wurmus, guix-devel

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> to me the biggest downside of using “guix time-machine” is that it has
>> to do a lot of boring work before the interesting work begins.  The
>> boring work includes building Guix derivations for the given channels,
>> most of which have long been collected as garbage on ci.guix.gnu.org.
>>
>> It would be helpful, I think, to more aggressively cache these
>> derivations and their outputs, and to go back in time and build the
>> derivations for past revisions of Guix.  I would expect there to be a
>> lot of overlap in the produced files, so perhaps it won’t cost all that
>> much in terms of storage.
>>
>> What do you think?
>
> I agree.  The ‘guix publish’ TTL¹ at ci.guix was increased to 180 days
> following <https://issues.guix.gnu.org/48926> in 2021.  That’s still not
> that much, and right now we have 84 TiB free at ci.guix.
>
> I guess we can afford to increase the TTL, probably starting with, say,
> 300 days, and monitoring disk usage.
>
> WDYT?

While the 84 TiB we have at our disposal is indeed a lot, I'd rather we
keep the TTL at 180 days, to keep things more manageable for backup/sync
purposes.  Our current TTL yields 7 TiB of compressed NARs, which fits
nicely into the 10 TiB slice on hydra-guix-129 available for
local/simple redundancy (it's still on my TODO; the copy step is missing).

I've been meaning to document an easy mirroring setup for that
/var/cache/guix/publish directory, and having 14 TiB instead of 7 TiB
there would hurt such setups.

Perhaps, as a compromise, we could drop yet another compression format?
We carry both Zstd and lzip (LZMA) archives on Berlin, which I see little
value in; if we carried only Zstd archives, we could probably keep NARs
under 10 TiB with a TTL of 360 days (although having only 3.5 TiB of
NARs to sync around for mirrors would be great too!).
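The size estimates above follow from simple proportionality.  This
sketch assumes the cache grows linearly with the TTL and that the Zstd
and lzip copies of a nar are roughly the same size; both are
simplifications for illustration, not measurements.

```python
def estimated_cache_tib(current_tib: float, current_ttl_days: int,
                        current_formats: int, new_ttl_days: int,
                        new_formats: int) -> float:
    """Scale the cache size by TTL and by number of compression formats.

    Assumes linear growth with the TTL and equally sized archives per format.
    """
    per_format_per_day = current_tib / current_formats / current_ttl_days
    return per_format_per_day * new_formats * new_ttl_days


if __name__ == "__main__":
    # Figures from the thread: 7 TiB for a 180-day TTL with two formats.
    for ttl, formats in [(180, 1), (300, 2), (360, 2), (360, 1)]:
        size = estimated_cache_tib(7, 180, 2, ttl, formats)
        print(f"TTL {ttl:3d} days, {formats} format(s): ~{size:.1f} TiB")
```

Under these assumptions this gives about 3.5, 11.7, 14 and 7 TiB
respectively, consistent with the 14 TiB and 3.5 TiB figures in the
message.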

What do you think?

-- 
Thanks,
Maxim


* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-18  4:27   ` Maxim Cournoyer
@ 2023-11-22 18:27     ` Ludovic Courtès
  2023-11-29 16:34       ` Simon Tournier
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2023-11-22 18:27 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Ricardo Wurmus, guix-devel

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

>> I agree.  The ‘guix publish’ TTL¹ at ci.guix was increased to 180 days
>> following <https://issues.guix.gnu.org/48926> in 2021.  That’s still not
>> that much, and right now we have 84 TiB free at ci.guix.
>>
>> I guess we can afford to increase the TTL, probably starting with, say,
>> 300 days, and monitoring disk usage.
>>
>> WDYT?
>
> While the 84 TiB we have at our disposal is indeed a lot, I'd rather we
> keep the TTL at 180 days, to keep things more manageable for backup/sync
> purposes.  Our current TTL yields 7 TiB of compressed NARs, which fits
> nicely into the 10 TiB slice on hydra-guix-129 available for
> local/simple redundancy (it's still on my TODO; the copy step is missing).
>
> I've been meaning to document an easy mirroring setup for that
> /var/cache/guix/publish directory, and having 14 TiB instead of 7 TiB
> there would hurt such setups.

Maybe we should learn from what Chris has been doing with the
Nar-Herder, too.  Ideally, the build farm front-end (‘berlin’ in this
case) would be merely a cache for recently-built artifacts, and we’d
have long-term storage elsewhere where we could keep nars for several
years.

The important thing being: we need to decouple the build farm from
(long-term) nar provision.

> Perhaps, as a compromise, we could drop yet another compression format?
> We carry both Zstd and lzip (LZMA) archives on Berlin, which I see little
> value in; if we carried only Zstd archives, we could probably keep NARs
> under 10 TiB with a TTL of 360 days (although having only 3.5 TiB of
> NARs to sync around for mirrors would be great too!).
>
> What do you think?

For compatibility¹ and performance² reasons, I would refrain from
removing either lzip or zstd substitutes, at least for “current”
substitutes.

For long-term storage though, we could choose to keep lzip only (because
it compresses better).  Not something we can really do with the current
‘guix publish’ setup though.

Thoughts?

Ludo’.

¹ Zstd support was added relatively recently.  Older daemons may support
  lzip but not zstd.

² https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/


* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-22 18:27     ` Ludovic Courtès
@ 2023-11-29 16:34       ` Simon Tournier
  2023-11-30 13:28         ` Maxim Cournoyer
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Tournier @ 2023-11-29 16:34 UTC (permalink / raw)
  To: Ludovic Courtès, Maxim Cournoyer; +Cc: Ricardo Wurmus, guix-devel

Hi,

On Wed, 22 Nov 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote:

> For long-term storage though, we could choose to keep lzip only (because
> it compresses better).  Not something we can really do with the current
> ‘guix publish’ setup though.

It looks good to me.  For me, the priority list looks like:

 1. Keep, for as long as we can, everything required to run Guix
 itself, e.g., “guix time-machine”: all the dependencies and all the
 outputs of derivations, at least for all the ones the build farms are
 already building.

 2. Keep for 3-5 years all the outputs of specific Guix revisions, such
 as v1.0, v1.1, v1.2, v1.3, v1.4, and a few others.
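These two priorities could be expressed as a tiered retention policy.
This is a hypothetical sketch: the `StoreItem` fields, the release tag
names and the day counts (5 years for release closures, the current
180-day TTL otherwise) are illustrative assumptions, not an existing
Cuirass or ‘guix publish’ setting.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Assumed tag names for the releases listed above.
RELEASE_TAGS = {"v1.0.0", "v1.1.0", "v1.2.0", "v1.3.0", "v1.4.0"}
DEFAULT_TTL_DAYS = 180            # current ci.guix TTL, per the thread
RELEASE_RETENTION_DAYS = 5 * 365  # upper end of "3-5 years"


@dataclass
class StoreItem:
    store_path: str
    # True if the item is needed to run Guix itself (e.g. time-machine).
    needed_by_guix_itself: bool = False
    # Release tags whose closure contains this item.
    release_tags: Set[str] = field(default_factory=set)


def retention_days(item: StoreItem) -> Optional[int]:
    """Days to keep ITEM, or None to keep it indefinitely."""
    if item.needed_by_guix_itself:
        return None                       # priority 1
    if item.release_tags & RELEASE_TAGS:
        return RELEASE_RETENTION_DAYS     # priority 2
    return DEFAULT_TTL_DAYS               # everything else
```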

Cheers,
simon


* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-29 16:34       ` Simon Tournier
@ 2023-11-30 13:28         ` Maxim Cournoyer
  2023-11-30 14:05           ` Guillaume Le Vaillant
  2024-01-12  9:56           ` Simon Tournier
  0 siblings, 2 replies; 11+ messages in thread
From: Maxim Cournoyer @ 2023-11-30 13:28 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Ludovic Courtès, Ricardo Wurmus, guix-devel

Hi Simon,

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Wed, 22 Nov 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> For long-term storage though, we could choose to keep lzip only (because
>> it compresses better).  Not something we can really do with the current
>> ‘guix publish’ setup though.
>
> It looks good to me.  For me, the priority list looks like:

I'd like to have a single archive type as well in the future, but I'd
settle on Zstd, not lzip, because it's faster to compress and
decompress, and its compression ratio is not that different when using
its highest level (19).

>  1. Keep, for as long as we can, everything required to run Guix
>  itself, e.g., “guix time-machine”: all the dependencies and all the
>  outputs of derivations, at least for all the ones the build farms are
>  already building.
>
>  2. Keep for 3-5 years all the outputs of specific Guix revisions, such
>  as v1.0, v1.1, v1.2, v1.3, v1.4, and a few others.

That'd be nice, but not presently doable, as we can't fine-tune retention
for a particular 'derivation' and its inputs in the Cuirass
configuration, unless I've missed it.
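Retaining "a particular derivation and its inputs" amounts to computing
the transitive closure of its references and pinning every item in it.
A minimal sketch over a hypothetical in-memory reference graph; in
practice the edges would come from the store database or from the
`References:` field of narinfos.

```python
from collections import deque
from typing import Dict, List, Set


def closure(roots: List[str], references: Dict[str, List[str]]) -> Set[str]:
    """Return every store item reachable from ROOTS (breadth-first)."""
    seen: Set[str] = set(roots)
    queue = deque(roots)
    while queue:
        item = queue.popleft()
        for ref in references.get(item, []):
            if ref not in seen:   # also guards against cycles/self-references
                seen.add(ref)
                queue.append(ref)
    return seen


if __name__ == "__main__":
    # Hypothetical, heavily simplified reference graph.
    refs = {
        "guix-1.4.0": ["guile-3.0", "guile-gcrypt"],
        "guile-gcrypt": ["guile-3.0", "libgcrypt"],
        "libgcrypt": ["libgpg-error"],
        "unrelated": ["glibc"],
    }
    print(sorted(closure(["guix-1.4.0"], refs)))
```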

-- 
Thanks,
Maxim



* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-30 13:28         ` Maxim Cournoyer
@ 2023-11-30 14:05           ` Guillaume Le Vaillant
  2023-12-05  1:18             ` Maxim Cournoyer
  2024-01-12  9:56           ` Simon Tournier
  1 sibling, 1 reply; 11+ messages in thread
From: Guillaume Le Vaillant @ 2023-11-30 14:05 UTC (permalink / raw)
  To: Maxim Cournoyer
  Cc: Simon Tournier, Ludovic Courtès, Ricardo Wurmus, guix-devel


Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Simon,
>
> Simon Tournier <zimon.toutoune@gmail.com> writes:
>
>> Hi,
>>
>> On Wed, 22 Nov 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote:
>>
>>> For long-term storage though, we could choose to keep lzip only (because
>>> it compresses better).  Not something we can really do with the current
>>> ‘guix publish’ setup though.
>>
>> It looks good to me.  For me, the priority list looks like:
>
> I'd like to have a single archive type as well in the future, but I'd
> settle on Zstd, not lzip, because it's faster to compress and
> decompress, and its compression ratio is not that different when using
> its highest level (19).

Last time I checked, zstd with max compression (zstd --ultra -22) was
a little slower and had a little lower compression ratio than lzip with
max compression (lzip -9).
Zstd is however much faster for decompression.
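Measurements like these are easy to re-check on a sample nar.  The
sketch below shells out to whatever `lzip` and `zstd` binaries are on
`PATH`, using the levels mentioned in this thread (`lzip -9`, `zstd -19`
and `zstd --ultra -22`); the sample file is a placeholder you supply,
and results will vary with the data and the machine.

```python
import shutil
import subprocess
import sys
import time
from typing import List, Tuple


def bench(compress: List[str], decompress: List[str],
          data: bytes) -> Tuple[float, float, float]:
    """Return (compression ratio, compress seconds, decompress seconds).

    Both commands must read stdin and write stdout.
    """
    t0 = time.perf_counter()
    packed = subprocess.run(compress, input=data, stdout=subprocess.PIPE,
                            check=True).stdout
    t1 = time.perf_counter()
    unpacked = subprocess.run(decompress, input=packed,
                              stdout=subprocess.PIPE, check=True).stdout
    t2 = time.perf_counter()
    if unpacked != data:
        raise RuntimeError("round trip failed")
    return len(data) / max(len(packed), 1), t1 - t0, t2 - t1


CANDIDATES = {
    "lzip -9": (["lzip", "-9", "-c"], ["lzip", "-d", "-c"]),
    "zstd -19": (["zstd", "-19", "-c"], ["zstd", "-d", "-c"]),
    "zstd --ultra -22": (["zstd", "--ultra", "-22", "-c"],
                         ["zstd", "-d", "-c"]),
}

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:   # e.g. an uncompressed nar
        sample = f.read()
    for label, (comp, decomp) in CANDIDATES.items():
        if shutil.which(comp[0]) is None:
            print(f"{label}: not installed, skipped")
            continue
        ratio, tc, td = bench(comp, decomp, sample)
        print(f"{label}: ratio {ratio:.2f}, "
              f"compress {tc:.2f}s, decompress {td:.2f}s")
```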

Another thing that could be useful to consider is that lzip was designed
for long-term storage, so it has some redundancy that allows fixing or
recovering a corrupt archive (e.g. using lziprecover) if there has been
some bit rot in the hardware storing the file.
By contrast, as far as I know, zstd will just tell you
"error: bad checksum" and will have no way to fix the archive.



* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-30 14:05           ` Guillaume Le Vaillant
@ 2023-12-05  1:18             ` Maxim Cournoyer
  0 siblings, 0 replies; 11+ messages in thread
From: Maxim Cournoyer @ 2023-12-05  1:18 UTC (permalink / raw)
  To: Guillaume Le Vaillant
  Cc: Simon Tournier, Ludovic Courtès, Ricardo Wurmus, guix-devel

Hi Guillaume,

Guillaume Le Vaillant <glv@posteo.net> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hi Simon,
>>
>> Simon Tournier <zimon.toutoune@gmail.com> writes:
>>
>>> Hi,
>>>
>>> On Wed, 22 Nov 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote:
>>>
>>>> For long-term storage though, we could choose to keep lzip only (because
>>>> it compresses better).  Not something we can really do with the current
>>>> ‘guix publish’ setup though.
>>>
>>> It looks good to me.  For me, the priority list looks like:
>>
>> I'd like to have a single archive type as well in the future, but I'd
>> settle on Zstd, not lzip, because it's faster to compress and
>> decompress, and its compression ratio is not that different when using
>> its highest level (19).
>
> Last time I checked, zstd with max compression (zstd --ultra -22) was
> a little slower and had a little lower compression ratio than lzip with
> max compression (lzip -9).
> Zstd is however much faster for decompression.

I think when we talk about the performance of NARs, we mean it in the
context of a Guix user installing them (decompressing) more than in the
context of the CI producing them, so zstd beats lzip here.

> Another thing that could be useful to consider is that lzip was designed
> for long-term storage, so it has some redundancy that allows fixing or
> recovering a corrupt archive (e.g. using lziprecover) if there has been
> some bit rot in the hardware storing the file.
> By contrast, as far as I know, zstd will just tell you
> "error: bad checksum" and will have no way to fix the archive.

That's an interesting aspect of lzip, but in this age of checksumming
file systems like Btrfs, we have other means of ensuring data integrity
(and recovery, assuming we have backups available).

I'm still of the opinion that carrying a single set of zstd-only NARs
makes the most sense in the long run.

-- 
Thanks,
Maxim



* Re: Building and caching old Guix derivations for a faster time machine
  2023-11-30 13:28         ` Maxim Cournoyer
  2023-11-30 14:05           ` Guillaume Le Vaillant
@ 2024-01-12  9:56           ` Simon Tournier
  2024-01-15  4:02             ` Maxim Cournoyer
  1 sibling, 1 reply; 11+ messages in thread
From: Simon Tournier @ 2024-01-12  9:56 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Ludovic Courtès, Ricardo Wurmus, guix-devel

Hi Maxim,

On Thu, 30 Nov 2023 at 08:28, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:

> I'd like to have a single archive type as well in the future, but I'd
> settle on Zstd, not lzip, because it's faster to compress and
> decompress, and its compression ratio is not that different when using
> its highest level (19).

When running an inferior (a past revision), the Guile code as it was in
that past revision is launched.  Hmm, I have never checked: does the
substitution mechanism depend on the present revision's code (Guile and
daemon) or on the past revision's?

In other words, what are the requirements for backward compatibility?
Being able to run a past Guix from a recent Guix, somehow.

>>  1. Keep, for as long as we can, everything required to run Guix
>>  itself, e.g., “guix time-machine”: all the dependencies and all the
>>  outputs of derivations, at least for all the ones the build farms are
>>  already building.
>>
>>  2. Keep for 3-5 years all the outputs of specific Guix revisions, such
>>  as v1.0, v1.1, v1.2, v1.3, v1.4, and a few others.
>
> That'd be nice, but not presently doable, as we can't fine-tune retention
> for a particular 'derivation' and its inputs in the Cuirass
> configuration, unless I've missed it.

That’s an implementation detail, a bug or a feature request, pick the
one you prefer. ;-)

IMHO, we could imagine various paths for these next steps.  For
instance, we could move these outputs to dedicated stores independent of
the current ones (ci.guix and bordeaux.guix).  Or we could have “cold”
storage with some “bakery” to make items hot again on demand, instead of
keeping everything hot.  And so on. :-)
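The “cold” storage idea boils down to a lookup order: serve from the hot
cache, otherwise rehydrate from cold storage and keep a copy hot.  A
minimal sketch in which `TieredNarStore`, the dict-backed tiers and the
capacity are all hypothetical; a simple LRU eviction rule stands in for
whatever policy (such as the TTL) a real deployment would use.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredNarStore:
    """Hot LRU cache in front of a cold archive (both simulated in memory)."""

    def __init__(self, cold: Dict[str, bytes], hot_capacity: int = 2):
        self.cold = cold
        self.hot: OrderedDict = OrderedDict()
        self.hot_capacity = hot_capacity
        self.rehydrations = 0               # how often the slow path ran

    def get(self, name: str) -> Optional[bytes]:
        if name in self.hot:
            self.hot.move_to_end(name)      # mark as recently used
            return self.hot[name]
        data = self.cold.get(name)
        if data is None:
            return None                     # never archived
        self.rehydrations += 1              # slow path: "bake" it hot again
        self.hot[name] = data
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)    # evict least recently used
        return data
```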

Well, I have not thought about it much and am just thinking out loud:
Cuirass (and the Build Coordinator) are the builders, and I would not
rely on them for NAR “archiving”; instead, maybe “we” could put some love
into the nar-herder tool, so as to extract the specific NARs that the
project would like to keep longer than the current, unpredictable
mechanism allows.

Cheers,
simon


* Re: Building and caching old Guix derivations for a faster time machine
  2024-01-12  9:56           ` Simon Tournier
@ 2024-01-15  4:02             ` Maxim Cournoyer
  0 siblings, 0 replies; 11+ messages in thread
From: Maxim Cournoyer @ 2024-01-15  4:02 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Ludovic Courtès, Ricardo Wurmus, guix-devel

Hi Simon,

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi Maxim,
>
> On Thu, 30 Nov 2023 at 08:28, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:
>
>> I'd like to have a single archive type as well in the future, but I'd
>> settle on Zstd, not lzip, because it's faster to compress and
>> decompress, and its compression ratio is not that different when using
>> its highest level (19).
>
> When running an inferior (a past revision), the Guile code as it was in
> that past revision is launched.  Hmm, I have never checked: does the
> substitution mechanism depend on the present revision's code (Guile and
> daemon) or on the past revision's?
>
> In other words, what are the requirements for backward compatibility?
> Being able to run a past Guix from a recent Guix, somehow.

We're only impacting the future, not the past, I think.  The inferior
mechanism still relies on the same daemon, as far as I know, and the
currently available gzipped nars would remain available according to
their current retention policy (6 months when unused).

>>>  1. Keep, for as long as we can, everything required to run Guix
>>>  itself, e.g., “guix time-machine”: all the dependencies and all the
>>>  outputs of derivations, at least for all the ones the build farms are
>>>  already building.
>>>
>>>  2. Keep for 3-5 years all the outputs of specific Guix revisions, such
>>>  as v1.0, v1.1, v1.2, v1.3, v1.4, and a few others.
>>
>> That'd be nice, but not presently doable, as we can't fine-tune retention
>> for a particular 'derivation' and its inputs in the Cuirass
>> configuration, unless I've missed it.
>
> That’s an implementation detail, a bug or a feature request, pick the
> one you prefer. ;-)

I'd say it's a feature request :-).

> IMHO, we could imagine various paths for these next steps.  For
> instance, we could move these outputs to dedicated stores independent of
> the current ones (ci.guix and bordeaux.guix).  Or we could have “cold”
> storage with some “bakery” to make items hot again on demand, instead of
> keeping everything hot.  And so on. :-)
>
> Well, I have not thought about it much and am just thinking out loud:
> Cuirass (and the Build Coordinator) are the builders, and I would not
> rely on them for NAR “archiving”; instead, maybe “we” could put some love
> into the nar-herder tool, so as to extract the specific NARs that the
> project would like to keep longer than the current, unpredictable
> mechanism allows.

It seems the nar-herder would perhaps be well suited for this, if
someone is inclined to implement it, given that it keeps each nar in a
database, which should make it fast to query for all the 'guix' package
substitutes.  Perhaps it even has (or could have) hooks run when
registering a new nar, which could define what is done with it (e.g.
sending it to another server).

Otherwise, good old 'find' could be used to locate the 'guix'-named nars
and their .narinfo metadata files and rsync them to a different location,
but that'd probably be less efficient (IO-intensive) on the huge
multi-terabyte collection of nars we carry.
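The 'find'-based alternative amounts to walking the cache, keeping the
narinfos whose store item is named after 'guix', and listing them along
with the nars they point to, for `rsync --files-from`.  A minimal
sketch, assuming narinfo files end in `.narinfo` and that their `URL:`
field is a path relative to the cache root; the exact layout of
/var/cache/guix/publish is not spelled out in this thread, so check it
first.  Like 'find', this still opens every narinfo, which is the IO
cost mentioned above.

```python
import os
import sys
from typing import Dict, Iterator


def narinfo_fields(path: str) -> Dict[str, str]:
    fields = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.rstrip("\n").partition(": ")
            if sep:
                fields[key] = value
    return fields


def is_guix_name(store_path: str) -> bool:
    # /gnu/store/<hash>-<name>: keep 'guix' and 'guix-...' items.
    name = os.path.basename(store_path).split("-", 1)[-1]
    return name == "guix" or name.startswith("guix-")


def guix_files(cache_root: str) -> Iterator[str]:
    """Yield cache-relative paths of guix narinfos and their nars."""
    for dirpath, _dirs, files in os.walk(cache_root):
        for fname in files:
            if not fname.endswith(".narinfo"):
                continue
            full = os.path.join(dirpath, fname)
            fields = narinfo_fields(full)
            if is_guix_name(fields.get("StorePath", "")):
                yield os.path.relpath(full, cache_root)
                if "URL" in fields:
                    yield fields["URL"]   # assumed relative to cache_root


if __name__ == "__main__" and len(sys.argv) == 2:
    # python select_guix_nars.py /var/cache/guix/publish > files.txt
    # rsync -a --files-from=files.txt /var/cache/guix/publish/ dest:/archive/
    for rel in guix_files(sys.argv[1]):
        print(rel)
```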

-- 
Thanks,
Maxim




Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
