* Building and caching old Guix derivations for a faster time machine @ 2023-11-10 9:29 Ricardo Wurmus 2023-11-16 9:59 ` Simon Tournier 2023-11-16 15:39 ` Ludovic Courtès 0 siblings, 2 replies; 11+ messages in thread From: Ricardo Wurmus @ 2023-11-10 9:29 UTC (permalink / raw) To: guix-devel Hi Guix, to me the biggest downside of using “guix time-machine” is that it has to do a lot of boring work before the interesting work begins. The boring work includes building Guix derivations for the given channels, most of which have long been collected as garbage on ci.guix.gnu.org. It would be helpful, I think, to more aggressively cache these derivations and their outputs, and to go back in time and build the derivations for past revisions of Guix. I would expect there to be a lot of overlap in the produced files, so perhaps it won’t cost all that much in terms of storage. What do you think? -- Ricardo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-10 9:29 Building and caching old Guix derivations for a faster time machine Ricardo Wurmus @ 2023-11-16 9:59 ` Simon Tournier 2023-11-16 15:39 ` Ludovic Courtès 1 sibling, 0 replies; 11+ messages in thread From: Simon Tournier @ 2023-11-16 9:59 UTC (permalink / raw) To: Ricardo Wurmus, guix-devel; +Cc: Christopher Baines Hi Ricardo, On Fri, 10 Nov 2023 at 10:29, Ricardo Wurmus <rekado@elephly.net> wrote: > to me the biggest downside of using “guix time-machine” is that it has > to do a lot of boring work before the interesting work begins. The > boring work includes building Guix derivations for the given channels, > most of which have long been collected as garbage on ci.guix.gnu.org. > > It would be helpful, I think, to more aggressively cache these > derivations and their outputs, and to go back in time and build the > derivations for past revisions of Guix. I would expect there to be a > lot of overlap in the produced files, so perhaps it won’t cost all that > much in terms of storage. I agree. And it rings a bell about a discussion on the private guix-sysadmin mailing list, subject: Backup for substitutes. Back in 2022, Alexandre from Univ. Montpellier was proposing to store the artifacts the project would like to keep for a longer term. Well, I do not know what the current status is. That said, I am in favor of the following: 1. keep all the derivations and their outputs required by Guix itself. 2. keep all the outputs for some specific revisions; say v1.0, v1.1, v1.2, v1.3, v1.4, and some other chosen points in time. About #1, it will help when running “guix time-machine”. About #2, it will help with concrete issues such as time bombs when running “guix time-machine -- shell”. The questions are: which server? Who maintains it? :-) Cheers, simon ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-10 9:29 Building and caching old Guix derivations for a faster time machine Ricardo Wurmus 2023-11-16 9:59 ` Simon Tournier @ 2023-11-16 15:39 ` Ludovic Courtès 2023-11-18 4:27 ` Maxim Cournoyer 1 sibling, 1 reply; 11+ messages in thread From: Ludovic Courtès @ 2023-11-16 15:39 UTC (permalink / raw) To: Ricardo Wurmus; +Cc: guix-devel Hi, Ricardo Wurmus <rekado@elephly.net> writes: > to me the biggest downside of using “guix time-machine” is that it has > to do a lot of boring work before the interesting work begins. The > boring work includes building Guix derivations for the given channels, > most of which have long been collected as garbage on ci.guix.gnu.org. > > It would be helpful, I think, to more aggressively cache these > derivations and their outputs, and to go back in time and build the > derivations for past revisions of Guix. I would expect there to be a > lot of overlap in the produced files, so perhaps it won’t cost all that > much in terms of storage. > > What do you think? I agree. The ‘guix publish’ TTL¹ at ci.guix was increased to 180 days following <https://issues.guix.gnu.org/48926> in 2021. That’s still not that much, and right now we have 84 TiB free at ci.guix. I guess we can afford increasing the TTL, probably starting with, say, 300 days, and monitoring disk usage. WDYT? For longer-term storage though, we’ll need a solution like what Simon described, offered by university colleagues. I’m not sure why this particular effort stalled; we should check with whoever spearheaded it and see if we can resume. Thanks, Ludo’. ¹ That’s the time-to-live, which denotes the minimum time a substitute is kept. Anytime a substitute is queried, its “age” is reset; if nobody asks for it, it may be reclaimed after its TTL has expired. ^ permalink raw reply [flat|nested] 11+ messages in thread
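To make the TTL footnote above concrete, here is a minimal Python sketch of the described policy: every query resets a substitute's age, and only items that nobody asked for during the whole TTL become eligible for reclamation. The class and method names (`SubstituteCache`, `bake`, `query`, `reclaim`) are hypothetical and only model the behaviour; this is not how ‘guix publish’ is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SubstituteCache:
    """Toy model of a TTL-based substitute cache (names are hypothetical)."""

    ttl_days: float
    # Maps an item name to the day on which it was last baked or queried.
    last_access: dict = field(default_factory=dict)

    def bake(self, item: str, now: float) -> None:
        """Record a freshly produced substitute."""
        self.last_access[item] = now

    def query(self, item: str, now: float) -> bool:
        """Return True on a cache hit; a hit resets the item's age."""
        if item in self.last_access:
            self.last_access[item] = now
            return True
        return False

    def reclaim(self, now: float) -> list:
        """Delete and return the items whose age exceeds the TTL."""
        expired = sorted(
            item
            for item, seen in self.last_access.items()
            if now - seen > self.ttl_days
        )
        for item in expired:
            del self.last_access[item]
        return expired


cache = SubstituteCache(ttl_days=180)
cache.bake("guix-a", now=0)
cache.bake("guix-b", now=0)
cache.query("guix-a", now=100)   # resets the age of guix-a only
print(cache.reclaim(now=200))    # guix-b was never queried, so it expires
```

Under this model, raising the TTL only delays reclamation of items nobody queries; anything that is still requested regularly stays cached regardless of the TTL value.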
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-16 15:39 ` Ludovic Courtès @ 2023-11-18 4:27 ` Maxim Cournoyer 2023-11-22 18:27 ` Ludovic Courtès 0 siblings, 1 reply; 11+ messages in thread From: Maxim Cournoyer @ 2023-11-18 4:27 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Ricardo Wurmus, guix-devel Hi, Ludovic Courtès <ludo@gnu.org> writes: > Hi, > > Ricardo Wurmus <rekado@elephly.net> skribis: > >> to me the biggest downside of using “guix time-machine” is that it has >> to do a lot of boring work before the interesting work begins. The >> boring work includes building Guix derivations for the given channels, >> most of which have long been collected as garbage on ci.guix.gnu.org. >> >> It would be helpful, I think, to more aggressively cache these >> derivations and their outputs, and to go back in time and build the >> derivations for past revisions of Guix. I would expect there to be a >> lot of overlap in the produced files, so perhaps it won’t cost all that >> much in terms of storage. >> >> What do you think? > > I agree. The ‘guix publish’ TTL¹ at ci.guix was increased to 180 days > following <https://issues.guix.gnu.org/48926> in 2021. That’s still not > that much and these days and right now we have 84 TiB free at ci.guix. > > I guess we can afford increasing the TTL, probably starting with, say, > 300 days, and monitoring disk usage. > > WDYT? While the 84 TiB we have at our disposal is indeed a lot, I'd rather we keep the TTL at 180 days, to keep things more manageable for backup/sync purposes. Our current TTL yields 7 TiB of compressed NARs, which fits nicely into the hydra-guix-129 10 TiB slice available for local/simple redundancy (it's still on my TODO, missing the copy bit). I've been meaning to document an easy mirroring setup for that /var/cache/guix/publish directory, and having 14 TiB instead of 7 TiB there would hurt such setups. 
Perhaps a compromise we could do is drop yet another compression format? We carry both Zstd and LZMA for Berlin, which I see little value in; if we carried only ZSTD archives we could probably continue having < 10 TiB of NARs for a TTL of 360 days (although having only 3.5 TiB of NARs to sync around for mirrors would be great too!). What do you think? -- Thanks, Maxim ^ permalink raw reply [flat|nested] 11+ messages in thread
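The storage figures in this message can be reproduced with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that cache size grows linearly with the TTL and that the two compression formats each account for half of the 7 TiB; neither assumption is stated in the thread, and the function name `estimated_nar_tib` is hypothetical.

```python
def estimated_nar_tib(ttl_days, formats,
                      baseline_tib=7.0, baseline_ttl_days=180,
                      baseline_formats=2):
    """Scale the reported 7 TiB (180-day TTL, two formats) linearly.

    Assumptions (not from the thread): size is proportional to the TTL,
    and every compression format takes an equal share of the space.
    """
    return (baseline_tib
            * (ttl_days / baseline_ttl_days)
            * (formats / baseline_formats))


for ttl_days, formats in [(180, 2), (300, 2), (360, 2), (180, 1), (360, 1)]:
    size = estimated_nar_tib(ttl_days, formats)
    print(f"TTL {ttl_days:3d} days, {formats} format(s): ~{size:.1f} TiB")
```

With these assumptions, a single format at 180 days gives the 3.5 TiB mentioned above, and a single format at 360 days gives about 7 TiB, which stays under the 10 TiB slice; two formats at 360 days give the 14 TiB figure.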
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-18 4:27 ` Maxim Cournoyer @ 2023-11-22 18:27 ` Ludovic Courtès 2023-11-29 16:34 ` Simon Tournier 0 siblings, 1 reply; 11+ messages in thread From: Ludovic Courtès @ 2023-11-22 18:27 UTC (permalink / raw) To: Maxim Cournoyer; +Cc: Ricardo Wurmus, guix-devel Hi, Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis: >> I agree. The ‘guix publish’ TTL¹ at ci.guix was increased to 180 days >> following <https://issues.guix.gnu.org/48926> in 2021. That’s still not >> that much and these days and right now we have 84 TiB free at ci.guix. >> >> I guess we can afford increasing the TTL, probably starting with, say, >> 300 days, and monitoring disk usage. >> >> WDYT? > > While the 84 TiB we have at our disposal is indeed lot, I'd rather we > keep the TTL at 180 days, to keep things more manageable for backup/sync > purposes. Our current TTL currently yields 7 TiB of compressed NARs, > which fits nicely into the hydra-guix-129 10 TiB slice available for > local/simple redundancy (it's still on my TODO, missing the copy bit). > > I've been meaning to document an easy mirroring setup for that > /var/cache/guix/publish directory, and having 14 TiB instead of 7 TiB > there would hurt such setups. Maybe we should learn from what Chris has been doing with the Nar-Herder, too. Ideally, the build farm front-end (‘berlin’ in this case) would be merely a cache for recently-built artifacts, and we’d have long-term storage elsewhere where we could keep nars for several years. The important thing being: we need to decouple the build farm from (long-term) nar provision. > Perhaps a compromise we could do is drop yet another compression format? > We carry both Zstd and LZMA for Berlin, which I see little value in; if > we carried only ZSTD archives we could probably continue having < 10 TiB > of NARs for a TTL of 360 days (although having only 3.5 TiB of NARs to > sync around for mirrors would be great too!). 
> > What do you think? For compatibility reasons¹ and performance reasons², I would refrain from removing lzip or zstd substitutes, at least for “current” substitutes. For long-term storage though, we could choose to keep lzip only (because it compresses better). Not something we can really do with the current ‘guix publish’ setup though. Thoughts? Ludo’. ¹ Zstd support was added relatively recently. Older daemons may support lzip but not zstd. ² https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-22 18:27 ` Ludovic Courtès @ 2023-11-29 16:34 ` Simon Tournier 2023-11-30 13:28 ` Maxim Cournoyer 0 siblings, 1 reply; 11+ messages in thread From: Simon Tournier @ 2023-11-29 16:34 UTC (permalink / raw) To: Ludovic Courtès, Maxim Cournoyer; +Cc: Ricardo Wurmus, guix-devel Hi, On Wed, 22 Nov 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote: > For long-term storage though, we could choose to keep lzip only (because > it compresses better). Not something we can really do with the current > ‘guix publish’ setup though. It looks good to me. For me, the priority list looks like: 1. Keep for as long as we can all the requirements for running Guix itself, e.g., “guix time-machine”. Keep all the dependencies and all the outputs of derivations. At least, for all the ones the build farms are already building. 2. Keep for 3-5 years all the outputs for specific Guix revisions, such as v1.0, v1.1, v1.2, v1.3, v1.4, and a few others. Cheers, simon ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-29 16:34 ` Simon Tournier @ 2023-11-30 13:28 ` Maxim Cournoyer 2023-11-30 14:05 ` Guillaume Le Vaillant 2024-01-12 9:56 ` Simon Tournier 0 siblings, 2 replies; 11+ messages in thread From: Maxim Cournoyer @ 2023-11-30 13:28 UTC (permalink / raw) To: Simon Tournier; +Cc: Ludovic Courtès, Ricardo Wurmus, guix-devel Hi Simon, Simon Tournier <zimon.toutoune@gmail.com> writes: > Hi, > > On mer., 22 nov. 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote: > >> For long-term storage though, we could choose to keep lzip only (because >> it compresses better). Not something we can really do with the current >> ‘guix publish’ setup though. > > It looks good to me. For me, the priority list looks like: I'd like to have a single archive type as well in the future, but I'd settle on Zstd, not lzip, because it's faster to compress and decompress, and its compression ratio is not that different when using its highest level (19). > 1. Keep for as longer as we can all the requirements for running Guix > itself, e.g., “guix time-machine”. Keep all the dependencies and all > the outputs of derivations. At least, for all the ones the build farms > are already building. > > 2. Keep for 3-5 years all the outputs for specific Guix revision, as > v1.0, v1.1, v1.2, v1.3, v1.4. And some few others. That'd be nice, but not presently doable as we can't fine tune retention for a particular 'derivation' and its inputs in the Cuirass configuration, unless I've missed it. -- Thanks, Maxim ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-30 13:28 ` Maxim Cournoyer @ 2023-11-30 14:05 ` Guillaume Le Vaillant 2023-12-05 1:18 ` Maxim Cournoyer 2024-01-12 9:56 ` Simon Tournier 1 sibling, 1 reply; 11+ messages in thread From: Guillaume Le Vaillant @ 2023-11-30 14:05 UTC (permalink / raw) To: Maxim Cournoyer Cc: Simon Tournier, Ludovic Courtès, Ricardo Wurmus, guix-devel Maxim Cournoyer <maxim.cournoyer@gmail.com> writes: > Hi Simon, > > Simon Tournier <zimon.toutoune@gmail.com> writes: > >> Hi, >> >> On mer., 22 nov. 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote: >> >>> For long-term storage though, we could choose to keep lzip only (because >>> it compresses better). Not something we can really do with the current >>> ‘guix publish’ setup though. >> >> It looks good to me. For me, the priority list looks like: > > I'd like to have a single archive type as well in the future, but I'd > settle on Zstd, not lzip, because it's faster to compress and > decompress, and its compression ratio is not that different when using > its highest level (19). Last time I checked, zstd with max compression (zstd --ultra -22) was a little slower and had a little lower compression ratio than lzip with max compression (lzip -9). Zstd is however much faster for decompression. Another thing that could be useful to consider is that lzip was designed for long term storage, so it has some redundancy allowing fixing/recovering a corrupt archive (e.g. using lziprecover) if there has been some bit rot in the hardware storing the file. Whereas as far as I know zstd will just tell you "error: bad checksum" and will have no way to fix the archive. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-30 14:05 ` Guillaume Le Vaillant @ 2023-12-05 1:18 ` Maxim Cournoyer 0 siblings, 0 replies; 11+ messages in thread From: Maxim Cournoyer @ 2023-12-05 1:18 UTC (permalink / raw) To: Guillaume Le Vaillant Cc: Simon Tournier, Ludovic Courtès, Ricardo Wurmus, guix-devel Hi Guillaume, Guillaume Le Vaillant <glv@posteo.net> writes: > Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis: > >> Hi Simon, >> >> Simon Tournier <zimon.toutoune@gmail.com> writes: >> >>> Hi, >>> >>> On mer., 22 nov. 2023 at 19:27, Ludovic Courtès <ludo@gnu.org> wrote: >>> >>>> For long-term storage though, we could choose to keep lzip only (because >>>> it compresses better). Not something we can really do with the current >>>> ‘guix publish’ setup though. >>> >>> It looks good to me. For me, the priority list looks like: >> >> I'd like to have a single archive type as well in the future, but I'd >> settle on Zstd, not lzip, because it's faster to compress and >> decompress, and its compression ratio is not that different when using >> its highest level (19). > > Last time I checked, zstd with max compression (zstd --ultra -22) was > a little slower and had a little lower compression ratio than lzip with > max compression (lzip -9). > Zstd is however much faster for decompression. I think when we talk about performance of NARs, we mean it in the context of a Guix user installing them (decompressing) more than in the context of the CI producing them, so zstd beats lzip here. > Another thing that could be useful to consider is that lzip was designed > for long term storage, so it has some redundancy allowing fixing/recovering > a corrupt archive (e.g. using lziprecover) if there has been some bit > rot in the hardware storing the file. > Whereas as far as I know zstd will just tell you "error: bad checksum" > and will have no way to fix the archive. 
That's an interesting aspect of lzip, but in this age of CRC-check file systems like Btrfs, we have other means of ensuring data integrity (and recovery, assuming we have backups available). I'm still of the opinion that carrying a single set of zstd-only NARs makes the most sense in the long run. -- Thanks, Maxim ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2023-11-30 13:28 ` Maxim Cournoyer 2023-11-30 14:05 ` Guillaume Le Vaillant @ 2024-01-12 9:56 ` Simon Tournier 2024-01-15 4:02 ` Maxim Cournoyer 1 sibling, 1 reply; 11+ messages in thread From: Simon Tournier @ 2024-01-12 9:56 UTC (permalink / raw) To: Maxim Cournoyer; +Cc: Ludovic Courtès, Ricardo Wurmus, guix-devel Hi Maxim, On Thu, 30 Nov 2023 at 08:28, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote: > I'd like to have a single archive type as well in the future, but I'd > settle on Zstd, not lzip, because it's faster to compress and > decompress, and its compression ratio is not that different when using > its highest level (19). When running an inferior (past revision), some past Guile code as it was in this past revision is launched. Hum, I have never checked: does the substitution mechanism depend on the present revision’s code (Guile and daemon) or on the past revision’s? In other words, what are the requirements for backward compatibility? Being able to run a past Guix from a recent Guix, somehow. >> 1. Keep for as longer as we can all the requirements for running Guix >> itself, e.g., “guix time-machine”. Keep all the dependencies and all >> the outputs of derivations. At least, for all the ones the build farms >> are already building. >> >> 2. Keep for 3-5 years all the outputs for specific Guix revision, as >> v1.0, v1.1, v1.2, v1.3, v1.4. And some few others. > > That'd be nice, but not presently doable as we can't fine tune retention > for a particular 'derivation' and its inputs in the Cuirass > configuration, unless I've missed it. That’s an implementation detail, a bug or a feature request, pick the one you prefer. ;-) We could imagine various paths for these next steps, IMHO. For instance, we could move these outputs to some specific stores independent of the current ones (ci.guix and bordeaux.guix). 
For instance, we could have “cold” storage with some cooking bakery for making hot again, instead of keeping all hot. For instance, we could imagine etc. :-) Well, I have not thought much about this and am just thinking out loud: Cuirass (and the Build Coordinator) are the builders, and I would not rely on them for NAR “archiving”; instead, maybe “we” could put some love into the nar-herder tool. Somehow, extract the specific NARs that the project would like to keep longer than the unpredictable current mechanism allows. Cheers, simon ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building and caching old Guix derivations for a faster time machine 2024-01-12 9:56 ` Simon Tournier @ 2024-01-15 4:02 ` Maxim Cournoyer 0 siblings, 0 replies; 11+ messages in thread From: Maxim Cournoyer @ 2024-01-15 4:02 UTC (permalink / raw) To: Simon Tournier; +Cc: Ludovic Courtès, Ricardo Wurmus, guix-devel Hi Simon, Simon Tournier <zimon.toutoune@gmail.com> writes: > Hi Maxim, > > On Thu, 30 Nov 2023 at 08:28, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote: > >> I'd like to have a single archive type as well in the future, but I'd >> settle on Zstd, not lzip, because it's faster to compress and >> decompress, and its compression ratio is not that different when using >> its highest level (19). > > When running an inferior (past revision), some past Guile code as it was > in this past revision is launched. Hum, I have never checked: the > substitution mechanism depends on present revision code (Guile and > daemon) or on past revision? > > Other said, what are the requirements for the backward compatibility? > Being able to run past Guix from a recent Guix, somehow. We're only impacting the future, not the past, I think. The inferior mechanism still relies on the same daemon, as far as I know, and the currently available gzipped nars would remain available according to their current retention policy (6 months when unused). >>> 1. Keep for as longer as we can all the requirements for running Guix >>> itself, e.g., “guix time-machine”. Keep all the dependencies and all >>> the outputs of derivations. At least, for all the ones the build farms >>> are already building. >>> >>> 2. Keep for 3-5 years all the outputs for specific Guix revision, as >>> v1.0, v1.1, v1.2, v1.3, v1.4. And some few others. >> >> That'd be nice, but not presently doable as we can't fine tune retention >> for a particular 'derivation' and its inputs in the Cuirass >> configuration, unless I've missed it. 
> > That’s an implementation detail, a bug or a feature request, pick the > one you prefer. ;-) I'd say it's a feature request :-). > We could imagine various paths for these next steps, IMHO. For > instance, we could move these outputs to some specific stores > independent of the current ones (ci.guix and bordeaux.guix). For > instance, we could have “cold” storage with some cooking bakery for > making hot again, instead of keeping all hot. For instance, we could > imagine etc. :-) > > Well, I do not have think much and I just speak loud: Cuirass (and Build > Coordinator) are the builders, and I would not rely on them for some NAR > “archiving“, instead maybe “we” could put some love into the tool > nar-herder. Somehow, extract specific NAR that the project would like > to keep longer than the unpredictable current mechanism. It seems the nar-herder would perhaps be well suited for this, if someone is inclined to implement it, given that it keeps each nar in a database, which should make it fast to query for all the 'guix' package substitutes. Perhaps it even has (or could have) hooks that run when a new nar is registered and define what is done with it (e.g. send it to another server). Otherwise good old 'find' could be used to rsync the 'guix' named nars and their .narinfo metadata files to a different location, but that'd probably be less efficient (IO-intensive) on the huge multi-terabyte collection of nars we carry. -- Thanks, Maxim ^ permalink raw reply [flat|nested] 11+ messages in thread
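The 'find'-style fallback mentioned above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes the cache directory contains `*.narinfo` files carrying the standard `StorePath:` field, and it treats any store item whose name starts with `guix-` as relevant, which would also match unrelated packages such as other tools whose names begin with "guix-". The function name `guix_narinfos` and the `name_prefix` parameter are hypothetical.

```python
import re
from pathlib import Path

# Matches e.g. "StorePath: /gnu/store/<32-char hash>-guix-packages-base".
STORE_PATH_RE = re.compile(
    r"^StorePath: /gnu/store/[0-9a-z]{32}-(?P<name>.+)$", re.MULTILINE
)


def guix_narinfos(cache_dir, name_prefix="guix-"):
    """Return the narinfo files under CACHE_DIR whose store item name
    starts with NAME_PREFIX (hypothetical selection rule)."""
    selected = []
    for narinfo in sorted(Path(cache_dir).rglob("*.narinfo")):
        text = narinfo.read_text(encoding="utf-8", errors="replace")
        match = STORE_PATH_RE.search(text)
        if match and match.group("name").startswith(name_prefix):
            selected.append(narinfo)
    return selected
```

The resulting list could then be handed to a copy tool to mirror the selected metadata and the nars they point to. As noted in the message, walking the whole tree like this reads every narinfo and is IO-intensive on a multi-terabyte cache, which is why a database-backed query in the nar-herder would likely scale better.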