unofficial mirror of guix-devel@gnu.org 
* Proposition to streamline our NAR collection to just zstd-compressed ones
@ 2024-01-10  2:32 Maxim Cournoyer
  2024-01-10 11:36 ` Ludovic Courtès
  2024-01-17 16:32 ` Simon Tournier
  0 siblings, 2 replies; 5+ messages in thread
From: Maxim Cournoyer @ 2024-01-10  2:32 UTC
  To: guix-devel, guix-sysadmin

Hello Guix, and Happy New Year!

It's been on my mind for quite some time (about 2 years, according
to [0]) to streamline our offering of cached nars.  Letting go of gzip
2 years ago, along with a more aggressive garbage collection policy,
allowed us to reduce our storage needs by at least 6.5 TiB.  I'm
proposing to do the same with our lzip-compressed nars, to let go of an
additional 3.9 TiB:

--8<---------------cut here---------------start------------->8---
$ du -sh /var/cache/guix/publish/{lzip,zstd}
3.9T    /var/cache/guix/publish/lzip
4.1T    /var/cache/guix/publish/zstd

$ find /var/cache/guix/publish/lzip -name '*.nar' | wc -l
4484645
$ find /var/cache/guix/publish/zstd -name '*.nar' | wc -l
4461195
--8<---------------cut here---------------end--------------->8---

The above suggests that zstd compressed nars are about 5% larger than
the lzip ones, which is not big enough to justify carrying both, in my
opinion.  In exchange for a little bit more bandwidth, users would have
the nars decompressed much faster with less CPU overhead locally.
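As a rough check of the "about 5%" figure, the relative difference can be computed from the du totals above.  This is a minimal sketch: pct_larger is a hypothetical helper name, and since du -h rounds its output (and the two directories hold slightly different numbers of nars), the result is only approximate.

```shell
# pct_larger BASE OTHER: print by how many percent OTHER exceeds BASE,
# rounded to one decimal place.  Hypothetical helper, not part of Guix.
pct_larger() {
    awk -v base="$1" -v other="$2" \
        'BEGIN { printf "%.1f\n", (other - base) / base * 100 }'
}

# Totals in TiB from the du output above: lzip first, then zstd.
pct_larger 3.9 4.1    # prints 5.1
```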

Having our complete nars collection fit in around 4 TiB would also open
the door for simple rsync-based mirroring, which I have started working
on.

What do you think?  Should we go ahead and effect the following simple
change for the Berlin build farm?

--8<---------------cut here---------------start------------->8---
modified   hydra/modules/sysadmin/services.scm
@@ -683,7 +683,7 @@ to a selected directory.")
                    ;; <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00097.html>
                    ;; for the compression ratio/decompression speed
                    ;; tradeoffs.
-                   (compression '(("lzip" 9) ("zstd" 19)))
+                   (compression '(("zstd" 19)))
                    (cache-bypass-threshold cache-bypass-threshold)
                    (workers publish-workers)))
--8<---------------cut here---------------end--------------->8---

-- 
Thanks,
Maxim


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Proposition to streamline our NAR collection to just zstd-compressed ones
  2024-01-10  2:32 Proposition to streamline our NAR collection to just zstd-compressed ones Maxim Cournoyer
@ 2024-01-10 11:36 ` Ludovic Courtès
  2024-01-15  8:31   ` Efraim Flashner
  2024-01-18 10:13   ` Giovanni Biscuolo
  2024-01-17 16:32 ` Simon Tournier
  1 sibling, 2 replies; 5+ messages in thread
From: Ludovic Courtès @ 2024-01-10 11:36 UTC
  To: Maxim Cournoyer; +Cc: guix-devel, guix-sysadmin

Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> It's been on my mind for quite some time (about 2 years, according
> to [0]) to streamline our offering of cached nars.  Letting go of gzip
> 2 years ago, along with a more aggressive garbage collection policy,
> allowed us to reduce our storage needs by at least 6.5 TiB.  I'm
> proposing to do the same with our lzip-compressed nars, to let go of an
> additional 3.9 TiB:

Those space savings would be welcome.

> The above suggests that zstd compressed nars are about 5% larger than
> the lzip ones, which is not big enough to justify carrying both, in my
> opinion.  In exchange for a little bit more bandwidth, users would have
> the nars decompressed much faster with less CPU overhead locally.

The difference is slightly higher, with lzip being 8% smaller, for a big
package like ungoogled-chromium or icecat:

--8<---------------cut here---------------start------------->8---
$ wget -qO- https://ci.guix.gnu.org/7n95j1zlnwzc44azjs7nj8givnzdfs87.narinfo|grep -B1 ^FileSize
Compression: lzip
FileSize: 85783483
--
Compression: zstd
FileSize: 92796393
$ wget -qO- https://ci.guix.gnu.org/prpjnnnhay0alanmkgjh66vfwjlb98kq.narinfo|grep -B1 ^FileSize
Compression: lzip
FileSize: 295991
--
Compression: zstd
FileSize: 323456
--8<---------------cut here---------------end--------------->8---
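To compare the two variants of a given nar without eyeballing grep output, the narinfo can be parsed directly.  A minimal sketch: zstd_vs_lzip is a hypothetical helper, and it assumes, as the grep -B1 output above suggests, that each Compression: line is followed by the FileSize: line of the same variant.

```shell
# zstd_vs_lzip: read a narinfo on stdin and print by how many percent the
# zstd nar is larger than the lzip one.  Hypothetical helper; assumes each
# "Compression:" line precedes the "FileSize:" line of that variant.
zstd_vs_lzip() {
    awk '
        /^Compression:/ { comp = $2 }
        /^FileSize:/    { size[comp] = $2 }
        END {
            if (!("lzip" in size) || !("zstd" in size)) {
                print "narinfo lacks an lzip or zstd entry" > "/dev/stderr"
                exit 1
            }
            printf "%.1f\n", (size["zstd"] - size["lzip"]) / size["lzip"] * 100
        }'
}

# Usage against the build farm:
#   wget -qO- https://ci.guix.gnu.org/7n95j1zlnwzc44azjs7nj8givnzdfs87.narinfo | zstd_vs_lzip
# Offline, with the figures quoted above:
zstd_vs_lzip <<'EOF'
Compression: lzip
FileSize: 85783483
Compression: zstd
FileSize: 92796393
EOF
# prints 8.2
```

With the two narinfos above it reports zstd as about 8.2% and 9.3% larger; seen from the other side, lzip is about 7.6% and 8.5% smaller, consistent with the ~8% estimate.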

But yeah, although adaptive compression selection on the client is an
improvement, it is a minor one, and whether it warrants the extra space
is debatable.

> What do you think?  Should we go ahead and effect the following simple
> change for the Berlin build farm?
>
> modified   hydra/modules/sysadmin/services.scm
> @@ -683,7 +683,7 @@ to a selected directory.")
>                     ;; <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00097.html>
>                     ;; for the compression ratio/decompression speed
>                     ;; tradeoffs.
> -                   (compression '(("lzip" 9) ("zstd" 19)))
> +                   (compression '(("zstd" 19)))

No objection from me, but…

… an important consideration: zstd support was added in 1.3.0, released
in May 2021.
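One way a script might decide whether a daemon is new enough for zstd substitutes is a plain version comparison against 1.3.0.  This is a sketch under assumptions: version_at_least is a hypothetical helper, the version string is an example value rather than one read from a real daemon, and a daemon upgraded through guix pull may not report a release-style version at all; probing for a daemon feature, as suggested in this message, avoids that problem.

```shell
# version_at_least HAVE WANT: succeed when version string HAVE is
# greater than or equal to WANT.  Relies on "sort -V" (version sort,
# as provided by GNU coreutils).  Hypothetical helper.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# zstd support was added in Guix 1.3.0 (May 2021).
daemon_version=1.2.0    # example value, not read from a real daemon
if version_at_least "$daemon_version" 1.3.0; then
    echo "zstd substitutes supported"
else
    echo "daemon too old for zstd substitutes; please upgrade it" >&2
fi
```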

From experience we know that users on foreign distros rarely, if ever,
upgrade the daemon (on top of that, upgrading the daemon is non-trivial
for someone who initially installed the Debian package, from what I’ve
seen, because one needs to fiddle with the .service file to adjust file
names and the like), and we can be sure that many are still running an
old daemon.  We spent a lot of time on user support after gzip
substitutes had been removed (‘guix substitute’ would just crash) and we
must avoid that.

(guix store) emits a warning when connecting to an “old” daemon, but
only for daemons older than 2018.  We could emit a warning based on
whether or not “builtin:git-download” is available, but maybe that’s too
early?

In addition to the warning, we should communicate in advance and make
sure our instructions on how to upgrade the daemon are accurate and
clear.

Thoughts?

Ludo’.



* Re: Proposition to streamline our NAR collection to just zstd-compressed ones
  2024-01-10 11:36 ` Ludovic Courtès
@ 2024-01-15  8:31   ` Efraim Flashner
  2024-01-18 10:13   ` Giovanni Biscuolo
  1 sibling, 0 replies; 5+ messages in thread
From: Efraim Flashner @ 2024-01-15  8:31 UTC
  To: Ludovic Courtès; +Cc: Maxim Cournoyer, guix-devel, guix-sysadmin

On Wed, Jan 10, 2024 at 12:36:51PM +0100, Ludovic Courtès wrote:
> Hello,
> 
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> 
> > It's been on my mind for quite some time (about 2 years, according
> > to [0]) to streamline our offering of cached nars.  Letting go of gzip
> > 2 years ago, along with a more aggressive garbage collection policy,
> > allowed us to reduce our storage needs by at least 6.5 TiB.  I'm
> > proposing to do the same with our lzip-compressed nars, to let go of an
> > additional 3.9 TiB:
> 
> Those space savings would be welcome.
> 
> > The above suggests that zstd compressed nars are about 5% larger than
> > the lzip ones, which is not big enough to justify carrying both, in my
> > opinion.  In exchange for a little bit more bandwidth, users would have
> > the nars decompressed much faster with less CPU overhead locally.
> 
> The difference is slightly higher, with lzip being 8% smaller, for a big
> package like ungoogled-chromium or icecat:
> 
> --8<---------------cut here---------------start------------->8---
> $ wget -qO- https://ci.guix.gnu.org/7n95j1zlnwzc44azjs7nj8givnzdfs87.narinfo|grep -B1 ^FileSize
> Compression: lzip
> FileSize: 85783483
> --
> Compression: zstd
> FileSize: 92796393
> $ wget -qO- https://ci.guix.gnu.org/prpjnnnhay0alanmkgjh66vfwjlb98kq.narinfo|grep -B1 ^FileSize
> Compression: lzip
> FileSize: 295991
> --
> Compression: zstd
> FileSize: 323456
> --8<---------------cut here---------------end--------------->8---
> 
> But yeah, although adaptive compression selection on the client is an
> improvement, it is a minor one, and whether it warrants the extra space
> is debatable.

There's another zstd flag that we should probably add: --rsyncable.

--rsyncable: zstd will periodically synchronize the compression state to
make the compressed file more rsync-friendly.  There is a negligible
impact to compression ratio, and a potential impact to compression
speed, perceptible at higher speeds, for example when combining
--rsyncable with many parallel worker threads.  This feature does
not work with --single-thread. You probably don't want to use it with
long range mode, since it will decrease the effectiveness of the
synchronization points, but your mileage may vary.
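For illustration, this is what the flag looks like with the zstd command-line tool at the level used in the Berlin configuration.  It is only a sketch: compress_rsyncable is a hypothetical helper, and the thread does not establish whether guix publish can pass an equivalent option when it compresses nars.

```shell
# compress_rsyncable IN OUT: compress IN into OUT at level 19 with
# --rsyncable.  -T0 lets zstd choose the number of worker threads from
# the detected cores, since --rsyncable does not work with
# --single-thread.  Hypothetical helper around the zstd CLI.
compress_rsyncable() {
    zstd -q -f -19 -T0 --rsyncable -o "$2" "$1"
}

# Example: compress a nar and check that it round-trips.
#   compress_rsyncable foo.nar foo.nar.zst
#   zstd -q -d -c foo.nar.zst | cmp - foo.nar
```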


> > What do you think?  Should we go ahead and effect the following simple
> > change for the Berlin build farm?
> >
> > modified   hydra/modules/sysadmin/services.scm
> > @@ -683,7 +683,7 @@ to a selected directory.")
> >                     ;; <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00097.html>
> >                     ;; for the compression ratio/decompression speed
> >                     ;; tradeoffs.
> > -                   (compression '(("lzip" 9) ("zstd" 19)))
> > +                   (compression '(("zstd" 19)))
> 
> No objection from me, but…
> 
> … an important consideration: zstd support was added in 1.3.0, released
> in May 2021.
> 
> From experience we know that users on foreign distros rarely, if ever,
> upgrade the daemon (on top of that, upgrading the daemon is non-trivial
> for someone who initially installed the Debian package, from what I’ve
> seen, because one needs to fiddle with the .service file to adjust file
> names and the like), and we can be sure that many are still running an
> old daemon.  We spent a lot of time on user support after gzip
> substitutes had been removed (‘guix substitute’ would just crash) and we
> must avoid that.
> 
> (guix store) emits a warning when connecting to an “old” daemon, but
> only for daemons older than 2018.  We could emit a warning based on
> whether or not “builtin:git-download” is available, but maybe that’s too
> early?

builtin:git-download sometimes bites me on my machines since I don't
upgrade my aarch64/riscv64 installs that often.

Also, 2018 is now about 5 years ago.  It might be a good idea to just
have a rolling YEAR-3 warning that the daemon is getting old and they
might be missing out on features present in newer daemon versions.
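The rolling cutoff suggested above could be expressed as follows.  This is only a shell sketch of the rule with year granularity; daemon_is_old is a hypothetical name, and the existing warning mentioned in this thread is emitted from (guix store), i.e. Scheme code, so a real change would be written there.

```shell
# daemon_is_old RELEASE_YEAR [CURRENT_YEAR]: succeed when the daemon was
# released before CURRENT_YEAR - 3.  CURRENT_YEAR defaults to the current
# calendar year.  Hypothetical helper sketching a rolling "YEAR-3" cutoff.
daemon_is_old() {
    cur_year=${2:-$(date +%Y)}
    [ "$1" -lt $((cur_year - 3)) ]
}

if daemon_is_old 2018 2024; then
    echo "warning: your guix-daemon is more than 3 years old; consider upgrading it" >&2
fi
```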

> In addition to the warning, we should communicate in advance and make
> sure our instructions on how to upgrade the daemon are accurate and
> clear.
> 
> Thoughts?
> 
> Ludo’.
> 

-- 
Efraim Flashner   <efraim@flashner.co.il>   רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted



* Re: Proposition to streamline our NAR collection to just zstd-compressed ones
  2024-01-10  2:32 Proposition to streamline our NAR collection to just zstd-compressed ones Maxim Cournoyer
  2024-01-10 11:36 ` Ludovic Courtès
@ 2024-01-17 16:32 ` Simon Tournier
  1 sibling, 0 replies; 5+ messages in thread
From: Simon Tournier @ 2024-01-17 16:32 UTC
  To: Maxim Cournoyer, guix-devel, guix-sysadmin

Hi,

On Tue, 09 Jan 2024 at 21:32, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:

> What do you think?  Should we go ahead and effect the following simple
> change for the Berlin build farm?
>
> --8<---------------cut here---------------start------------->8---
> modified   hydra/modules/sysadmin/services.scm
> @@ -683,7 +683,7 @@ to a selected directory.")
>                     ;; <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00097.html>
>                     ;; for the compression ratio/decompression speed
>                     ;; tradeoffs.
> -                   (compression '(("lzip" 9) ("zstd" 19)))
> +                   (compression '(("zstd" 19)))
>                     (cache-bypass-threshold cache-bypass-threshold)
>                     (workers publish-workers)))
> --8<---------------cut here---------------end--------------->8---

I think it is a good idea, but the change is more than a one-liner. ;-)

I agree with Ludo: the change requires communication.  Something like:

 1. Blog post.  Something like [1], extended a bit with a Migration
    section.

 2. A news entry (guix pull --news) announcing the sunset date, probably
    pointing to the blog post (or elsewhere) to help with the migration.

 3. Optionally emit a warning when the daemon is “too” old.


I agree that the extra space can be annoying.  At the same time, user
experience matters more, IMHO.

Cheers,
simon

1: https://guix.gnu.org/en/blog/2022/sunsetting-gzip-substitutes-availability/



* Re: Proposition to streamline our NAR collection to just zstd-compressed ones
  2024-01-10 11:36 ` Ludovic Courtès
  2024-01-15  8:31   ` Efraim Flashner
@ 2024-01-18 10:13   ` Giovanni Biscuolo
  1 sibling, 0 replies; 5+ messages in thread
From: Giovanni Biscuolo @ 2024-01-18 10:13 UTC
  To: Ludovic Courtès, Maxim Cournoyer; +Cc: guix-devel, guix-sysadmin

Hello,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> From experience we know that users on foreign distros rarely, if ever,
> upgrade the daemon (on top of that, upgrading the daemon is non-trivial
> for someone who initially installed the Debian package, from what I’ve
> seen, because one needs to fiddle with the .service file to adjust file
> names and the like),

The upgrade instructions are in (info "(guix) Upgrading Guix").

I run the daemon on Debian but installed it with the install script, not
with the Debian package: I'm going to test the Debian package
installation on a VM and see/document what a user should do to upgrade
a daemon installed that way.

My /etc/systemd/system/guix-daemon.service is:

--8<---------------cut here---------------start------------->8---

# This is a "service unit file" for the systemd init system to launch
# 'guix-daemon'.  Drop it in /etc/systemd/system or similar to have
# 'guix-daemon' automatically started.

[Unit]
Description=Build daemon for GNU Guix

[Service]
ExecStart=/var/guix/profiles/per-user/root/current-guix/bin/guix-daemon --build-users-group=guixbuild --substitute-urls='https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'
Environment=GUIX_LOCPATH=/var/guix/profiles/per-user/root/guix-profile/lib/locale LC_ALL=en_US.utf8
Environment=TMPDIR=/home/guix-builder
RemainAfterExit=yes
StandardOutput=syslog
StandardError=syslog

# See <https://lists.gnu.org/archive/html/guix-devel/2016-04/msg00608.html>.
# Some package builds (for example, go@1.8.1) may require even more than
# 1024 tasks.
TasksMax=8192

[Install]
WantedBy=multi-user.target

--8<---------------cut here---------------end--------------->8---

I tweaked it a little bit to add "--substitute-urls" to ExecStart and
"LC_ALL" to Environment, but the Guix-provided one should work.

AFAIU, following the official daemon upgrade instructions should do the
job, right?

If this is not the case with the Debian package, IMO it's a bug in the
Debian package (its .service file): we should add a footnote to (info
"(guix) Upgrading Guix") and file a bug upstream if needed, no?
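Whether a given unit file would pick up a daemon upgraded with guix pull can be checked by looking at its ExecStart= line.  A minimal sketch, assuming (as in the unit file above) that the pulled daemon lives under /var/guix/profiles/per-user/root/current-guix; uses_pulled_daemon is a hypothetical helper, and /usr/bin/guix-daemon in the test stands in for a distro-packaged binary path, which is an assumption.

```shell
# uses_pulled_daemon UNIT_FILE: succeed when the first ExecStart= line of
# UNIT_FILE runs guix-daemon from root's current-guix profile, i.e. the
# daemon that "sudo -i guix pull" updates.  Hypothetical helper.
uses_pulled_daemon() {
    exec_line=$(grep -m 1 '^ExecStart=' "$1") || return 1
    case ${exec_line#ExecStart=} in
        /var/guix/profiles/per-user/root/current-guix/bin/guix-daemon*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example (run on the machine hosting the daemon):
#   uses_pulled_daemon /etc/systemd/system/guix-daemon.service ||
#       echo "ExecStart does not point at root's current-guix daemon" >&2
```

If the check fails, following the systemd steps in (info "(guix) Upgrading Guix"), `sudo -i guix pull` then `systemctl restart guix-daemon.service`, would presumably keep starting the old binary until the unit file is adjusted.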

[...]

> In addition to the warning, we should communicate in advance and make
> sure our instructions on how to upgrade the daemon are accurate and
> clear.

IMO the instructions in (info "(guix) Upgrading Guix") are clear; they
only cover systemd-based distros, but users should be able to easily
"transpose" them to a different init system... or not?

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures


