unofficial mirror of guix-devel@gnu.org 
* When substitute download + decompression is CPU-bound
@ 2020-12-14 22:20 Ludovic Courtès
  2020-12-14 22:29 ` Julien Lepiller
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Ludovic Courtès @ 2020-12-14 22:20 UTC (permalink / raw)
  To: guix-devel

Hi Guix!

Consider these two files:

  https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
  https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92

Quick decompression bench:

--8<---------------cut here---------------start------------->8---
$ du -h /tmp/uc.nar.[gl]z
103M	/tmp/uc.nar.gz
71M	/tmp/uc.nar.lz
$ gunzip -c < /tmp/uc.nar.gz| wc -c
350491552
$ time lzip -d </tmp/uc.nar.lz >/dev/null

real	0m6.040s
user	0m5.950s
sys	0m0.036s
$ time gunzip -c < /tmp/uc.nar.gz >/dev/null

real	0m2.009s
user	0m1.977s
sys	0m0.032s
--8<---------------cut here---------------end--------------->8---

The decompression throughput (compressed bytes read in the first column,
uncompressed bytes written in the second column) is:

           input   |  output
  gzip:  52 MB/s   | 167 MiB/s
  lzip:  11 MB/s   |  56 MiB/s
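
As a sanity check, these rates can be recomputed from the transcript
above (a quick Python calculation; `du -h` rounds the compressed sizes,
hence the small differences from the figures quoted):

```python
# Sizes and times copied from the transcript above.
uncompressed = 350491552                     # bytes, from `wc -c`
gz_size, lz_size = 103 * 2**20, 71 * 2**20   # approximate, from `du -h`
gz_time, lz_time = 2.009, 6.040              # wall-clock seconds, from `time`

MB, MiB = 1e6, 2**20
print(f"gzip: {gz_size / gz_time / MB:3.0f} MB/s in | {uncompressed / gz_time / MiB:3.0f} MiB/s out")
print(f"lzip: {lz_size / lz_time / MB:3.0f} MB/s in | {uncompressed / lz_time / MiB:3.0f} MiB/s out")
```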

Indeed, if you run this from a computer on your LAN:

  wget -O - … | gunzip > /dev/null

you’ll find that wget caps at 50 MB/s with gunzip, whereas with lunzip it
caps at 11 MB/s.

From my place I get a peak download bandwidth of 30+ MB/s from
ci.guix.gnu.org, thus substitute downloads are CPU-bound (I can’t go
beyond 11 MB/s due to decompression).  I must say it never occurred to me
it could be the case when we introduced lzip substitutes.

I’d get faster substitute downloads with gzip (I would download more but
the time-to-disk would be smaller.)  Specifically, download +
decompression of ungoogled-chromium from the LAN completes in 2.4s for
gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
get 32s (gzip) vs. 53s (lzip).

Where to go from here?  Several options:

  0. Lzip decompression speed increases with compression ratio, but
     we’re already using ‘--best’ on ci.  The only way we could gain is
     by using “multi-member archives” and then parallel decompression as
     done in plzip, but that’s probably not supported in lzlib.  So
     we’re probably stuck here.

  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
     ‘guix substitute’ could automatically pick one or the other
     depending on the CPU and bandwidth.  Perhaps a simple trick would
     be to check the user/wall-clock time ratio and switch to gzip for
     subsequent downloads if that ratio is close to one.  How well would
     that work?

  2. Use Zstd like all the cool kids since it seems to have a much
     higher decompression speed: <https://facebook.github.io/zstd/>.
     630 MB/s on ungoogled-chromium on my laptop.  Woow.

  3. Allow for parallel downloads (really: parallel decompression) as
     Julien did in <https://issues.guix.gnu.org/39728>.
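
The ratio trick in option 1 could look roughly like this (a hypothetical
sketch, not the actual ‘guix substitute’ code, which is Scheme;
`cpu_wall_ratio`, `pick_compression`, and the 0.9 threshold are made up
for illustration):

```python
import os
import time

def cpu_wall_ratio(thunk):
    """Run THUNK and return this process's CPU time divided by
    wall-clock time.  A ratio close to 1 means the work was CPU-bound."""
    wall0, cpu0 = time.monotonic(), os.times()
    thunk()
    wall = time.monotonic() - wall0
    cpu1 = os.times()
    cpu = (cpu1.user + cpu1.system) - (cpu0.user + cpu0.system)
    return cpu / wall if wall > 0 else 0.0

def pick_compression(last_ratio, threshold=0.9):
    """Switch to the cheap-to-decompress format when the previous
    download + decompression pegged the CPU."""
    return "gzip" if last_ratio > threshold else "lzip"

# A download that mostly waits on the network looks I/O-bound:
ratio = cpu_wall_ratio(lambda: time.sleep(0.1))
print(f"ratio {ratio:.2f} -> {pick_compression(ratio)}")
```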

My preference would be #2, #1, and #3, in this order.  #2 is great but
it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m not
fond of #3 because it just papers over the underlying issue and could be
counterproductive if the number of jobs is wrong.

Thoughts?

Ludo’.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
@ 2020-12-14 22:29 ` Julien Lepiller
  2020-12-14 22:59 ` Nicolò Balzarotti
  2020-12-15 10:40 ` Jonathan Brielmaier
  2 siblings, 0 replies; 23+ messages in thread
From: Julien Lepiller @ 2020-12-14 22:29 UTC (permalink / raw)
  To: guix-devel, Ludovic Courtès

[-- Attachment #1: Type: text/plain, Size: 3502 bytes --]

My proposed changes to allow for parallel download assume downloads are network-bound, so they can be separate from other jobs. If downloads are actually CPU-bound, then it has indeed no merit at all :)

On 14 December 2020 at 17:20:17 GMT-05:00, "Ludovic Courtès" <ludo@gnu.org> wrote:
> [...]

[-- Attachment #2: Type: text/html, Size: 4247 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
  2020-12-14 22:29 ` Julien Lepiller
@ 2020-12-14 22:59 ` Nicolò Balzarotti
  2020-12-15  7:52   ` Pierre Neidhardt
  2020-12-15 11:36   ` Ludovic Courtès
  2020-12-15 10:40 ` Jonathan Brielmaier
  2 siblings, 2 replies; 23+ messages in thread
From: Nicolò Balzarotti @ 2020-12-14 22:59 UTC (permalink / raw)
  To: Ludovic Courtès, guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Guix!
>
Hi Ludo

> Quick decompression bench:

I guess this benchmark follows the distri talk, doesn't it? :)

File size with zstd vs zstd -9 vs current lzip:
- 71M uc.nar.lz
- 87M uc.nar.zst-9
- 97M uc.nar.zst-default

> Where to go from here?  Several options:

>   1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>      ‘guix substitute’ could automatically pick one or the other
>      depending on the CPU and bandwidth.  Perhaps a simple trick would
>      be to check the user/wall-clock time ratio and switch to gzip for
>      subsequent downloads if that ratio is close to one.  How well would
>      that work?

I'm not sure using heuristics (i.e., guessing what should work better,
like in option 1) is the way to go, as temporary slowdowns of the
network/CPU during the first download would affect the decision.

>   2. Use Zstd like all the cool kids since it seems to have a much
>      higher decompression speed: <https://facebook.github.io/zstd/>.
>      630 MB/s on ungoogled-chromium on my laptop.  Woow.

I know this means more work to do, but it seems to be the best
alternative.  However, if we go that way, will we keep lzip substitutes?
The 20% difference in size between lzip/zstd would mean a lot with slow
(mobile) network connections.
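
Back-of-the-envelope, using the sizes above (rough: it treats MiB and
MB/s interchangeably), the extra data and time per ungoogled-chromium
download on a slow link would be:

```python
lz, zst = 71, 87                     # MiB, uc.nar.lz vs uc.nar.zst-9 above
extra_fraction = (zst - lz) / lz     # ~22% more data with zstd -9
for bw in (0.5, 1, 2):               # MB/s, plausible slow mobile links
    print(f"{bw} MB/s: ~{(zst - lz) / bw:.0f} s longer per download")
```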

Nicolò


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:59 ` Nicolò Balzarotti
@ 2020-12-15  7:52   ` Pierre Neidhardt
  2020-12-15  9:45     ` Nicolò Balzarotti
  2020-12-15 11:42     ` Ludovic Courtès
  2020-12-15 11:36   ` Ludovic Courtès
  1 sibling, 2 replies; 23+ messages in thread
From: Pierre Neidhardt @ 2020-12-15  7:52 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

Another option is plzip (parallel Lzip, an official part of Lzip).

> decompression of ungoogled-chromium from the LAN completes in 2.4s for
> gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
> get 32s (gzip) vs. 53s (lzip).

With four cores, plzip would beat gzip in the first case.
With only 2 cores, plzip would beat gzip in the second case.
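
The arithmetic behind those two claims, using the download +
decompression totals quoted above (a crude model: it optimistically
assumes decompression dominates the total and parallelizes linearly):

```python
# Download + decompression totals from earlier in the thread (seconds).
gzip_x86, lzip_x86 = 2.4, 7.1    # laptop on the LAN
gzip_arm, lzip_arm = 32, 53      # low-end ARMv7 device, also on the LAN

# Crude model: plzip divides the lzip time by the core count.
assert lzip_x86 / 4 < gzip_x86   # 4 cores: ~1.8 s beats 2.4 s
assert lzip_arm / 2 < gzip_arm   # 2 cores: ~26.5 s beats 32 s
```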

What's left to do to implement plzip support?  That's the good news:
almost nothing!

- On the Lzip binding side, we need to add support for multi-member
  archives.  It's a bit of work but not that much.
- On the Guix side, there is nothing to do.

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15  7:52   ` Pierre Neidhardt
@ 2020-12-15  9:45     ` Nicolò Balzarotti
  2020-12-15  9:54       ` Pierre Neidhardt
  2020-12-15 11:42     ` Ludovic Courtès
  1 sibling, 1 reply; 23+ messages in thread
From: Nicolò Balzarotti @ 2020-12-15  9:45 UTC (permalink / raw)
  To: Pierre Neidhardt, Ludovic Courtès, guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Another option is plzip (parallel Lzip, an official part of Lzip).

Wouldn't that become a problem once we have parallel downloads (and
thus, at times, parallel decompressions)?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15  9:45     ` Nicolò Balzarotti
@ 2020-12-15  9:54       ` Pierre Neidhardt
  2020-12-15 10:03         ` Nicolò Balzarotti
  0 siblings, 1 reply; 23+ messages in thread
From: Pierre Neidhardt @ 2020-12-15  9:54 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 615 bytes --]

Nicolò Balzarotti <anothersms@gmail.com> writes:

> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>
>> Another option is plzip (parallel Lzip, an official part of Lzip).
>
> Wouldn't that become a problem once we have parallel downloads (and
> thus, at times, parallel decompressions)?

What do you mean?

Parallel decompression is unrelated to downloads as far as I
understand.  Once the archive (or just archive chunks?) is available,
plzip can decompress multiple segments at the same time if enough cores
are available.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15  9:54       ` Pierre Neidhardt
@ 2020-12-15 10:03         ` Nicolò Balzarotti
  2020-12-15 10:13           ` Pierre Neidhardt
  0 siblings, 1 reply; 23+ messages in thread
From: Nicolò Balzarotti @ 2020-12-15 10:03 UTC (permalink / raw)
  To: Pierre Neidhardt, Ludovic Courtès, guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

>
> What do you mean?
>
If you download multiple files at a time, you might end up decompressing
them simultaneously.  Plzip won't help on a dual-core machine then: you
might end up CPU-bound again.  Is this right?

If it is, reducing overall CPU usage seems to be the better approach in
the long term.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 10:03         ` Nicolò Balzarotti
@ 2020-12-15 10:13           ` Pierre Neidhardt
  2020-12-15 10:14             ` Pierre Neidhardt
  0 siblings, 1 reply; 23+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 10:13 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

Nicolò Balzarotti <anothersms@gmail.com> writes:

> If you download multiple files at a time, you might end up decompressing
> them simultaneously.  Plzip won't help then on a dual core machine,
> where you might end up being cpu bound again then. Is this right?
>
> If it is, reducing the overall cpu usage seems to be a better approach
> in the long term.

An answer to this may be in pipelining the process.

The parallel downloads would feed the archives to the pipeline and the
parallel decompressor would pop the archives out of the pipeline one by
one.

If I'm not mistaken, this should yield optimal results regardless of the
network or CPU performance.
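
A minimal sketch of that pipeline (hypothetical, in Python; in Guix
itself something like Guile Fibers channels would play this role): a
bounded queue as the channel, downloads as producers, and a single
consumer decompressing archives one by one.

```python
import queue
import threading

# Bounded channel: applies back-pressure so downloads pause when the
# decompressor falls behind, and decompression never fans out even with
# many concurrent downloads.
channel = queue.Queue(maxsize=4)
DONE = object()                          # sentinel marking end of stream

def downloader(urls):
    for url in urls:
        blob = f"<archive from {url}>"   # stand-in for the fetched bytes
        channel.put(blob)                # blocks when the pipeline is full
    channel.put(DONE)

def decompressor(results):
    while True:
        item = channel.get()
        if item is DONE:
            break
        results.append(f"decompressed {item}")

results = []
urls = [f"https://example.org/nar/{i}" for i in range(3)]
t = threading.Thread(target=downloader, args=(urls,))
t.start()
decompressor(results)
t.join()
```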

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 10:13           ` Pierre Neidhardt
@ 2020-12-15 10:14             ` Pierre Neidhardt
  0 siblings, 0 replies; 23+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 10:14 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 130 bytes --]

Here the "pipeline" could be a CSP channel.
Not sure what the term is in Guile.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
  2020-12-14 22:29 ` Julien Lepiller
  2020-12-14 22:59 ` Nicolò Balzarotti
@ 2020-12-15 10:40 ` Jonathan Brielmaier
  2020-12-15 19:43   ` Joshua Branson
  2 siblings, 1 reply; 23+ messages in thread
From: Jonathan Brielmaier @ 2020-12-15 10:40 UTC (permalink / raw)
  To: guix-devel

Super interesting findings!

On 14.12.20 23:20, Ludovic Courtès wrote:
>    2. Use Zstd like all the cool kids since it seems to have a much
>       higher decompression speed: <https://facebook.github.io/zstd/>.
>       630 MB/s on ungoogled-chromium on my laptop.  Woow.

Not only is decompression fast, compression is as well:

size	file			time for compression (lower is better)
335M	uc.nar
104M	uc.nar.gz		  8
71M	uc.nar.lz.level9	120
74M	uc.nar.lz.level6	 80
82M	uc.nar.lz.level3	 30
89M	uc.nar.lz.level1	 16
97M	uc.nar.zst		  1

So I am sold on zstd, as a user and as a substitute-server caretaker :)

For mobile users and users without flat-rate Internet access, the
increased nar size is a problem.  Although I think the real problem here
is not gzip vs. lzip vs. zstd: it's the fact that we download the new
package in full even if it differs by just some 100 lines of diffoscope
diff[0], most of which are due to the changed /gnu/store name...

[0] diffoscope --max-diff-block-lines 0
/gnu/store/zvcn2r352wxnmq7jayz5myg23gh9s17q-icedove-78.5.1
/gnu/store/dzjym6y7b9z4apgvvydj9lf0kbaa8qbv-icedove-78.5.1
lines: 783
size: 64k


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:59 ` Nicolò Balzarotti
  2020-12-15  7:52   ` Pierre Neidhardt
@ 2020-12-15 11:36   ` Ludovic Courtès
  2020-12-15 11:45     ` Nicolò Balzarotti
  1 sibling, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2020-12-15 11:36 UTC (permalink / raw)
  To: Nicolò Balzarotti; +Cc: guix-devel

Hi,

Nicolò Balzarotti <anothersms@gmail.com> skribis:

> I guess this benchmark follows the distri talk, doesn't it? :)

Yes, that and my own quest for optimization opportunities.  :-)

> File size with zstd vs zstd -9 vs current lzip:
> - 71M uc.nar.lz
> - 87M uc.nar.zst-9
> - 97M uc.nar.zst-default
>
>> Where to go from here?  Several options:
>
>>   1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>>      ‘guix substitute’ could automatically pick one or the other
>>      depending on the CPU and bandwidth.  Perhaps a simple trick would
>>      be to check the user/wall-clock time ratio and switch to gzip for
>>      subsequent downloads if that ratio is close to one.  How well would
>>      that work?
>
> I'm not sure using heuristics (i.e., guessing what should work better,
> like in option 1) is the way to go, as temporary slowdowns of the
> network/CPU during the first download would affect the decision.

I suppose we could time each substitute download and adjust the choice
continually.

It might be better to provide a command-line flag to choose between
optimizing for bandwidth usage (users with limited Internet access may
prefer that) or for speed.

>>   2. Use Zstd like all the cool kids since it seems to have a much
>>      higher decompression speed: <https://facebook.github.io/zstd/>.
>>      630 MB/s on ungoogled-chromium on my laptop.  Woow.
>
> I know this means more work to do, but it seems to be the best
> alternative.  However, if we go that way, will we keep lzip substitutes?
> The 20% difference in size between lzip/zstd would mean a lot with slow
> (mobile) network connections.

A lot in what sense?  In terms of bandwidth usage, right?

In terms of speed, zstd would probably reduce the time-to-disk as soon
as you have ~15 MB/s peak bandwidth or more.
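
The ~15 MB/s figure falls out of a simple overlapped-pipeline model (an
assumption, not a measurement: download and decompression run
concurrently, so time-to-disk is roughly the max of the two), using
sizes and speeds quoted in this thread:

```python
# Sizes (bytes) and decompression times quoted in this thread.
lz_size, zst_size = 71e6, 87e6        # uc.nar.lz vs uc.nar.zst-9
lz_decomp = 6.04                      # s, lzip -d on the laptop
zst_decomp = 350491552 / 630e6        # ~0.56 s at 630 MB/s

def time_to_disk(size, decomp, bandwidth):
    # Overlapped: the slower of download and decompression dominates.
    return max(size / bandwidth, decomp)

# Above ~12 MB/s lzip becomes CPU-bound (stuck at 6.04 s); zstd's
# bigger download then wins once bandwidth exceeds
# zst_size / lz_decomp, i.e. about 14.4 MB/s.
crossover = zst_size / lz_decomp
print(f"crossover at about {crossover / 1e6:.1f} MB/s")
```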

Anyway, we’re not there yet, but I suppose if we get zstd support, we
could configure berlin to keep lzip and zstd (rather than lzip and gzip
as is currently the case).

Ludo’.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15  7:52   ` Pierre Neidhardt
  2020-12-15  9:45     ` Nicolò Balzarotti
@ 2020-12-15 11:42     ` Ludovic Courtès
  2020-12-15 12:31       ` Pierre Neidhardt
  1 sibling, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2020-12-15 11:42 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel, Nicolò Balzarotti

Hi,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Another option is plzip (parallel Lzip, an official part of Lzip).
>
>> decompression of ungoogled-chromium from the LAN completes in 2.4s for
>> gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
>> get 32s (gzip) vs. 53s (lzip).
>
> With four cores, plzip would beat gzip in the first case.
> With only 2 cores, plzip would beat gzip in the second case.
>
> What's left to do to implement plzip support?  That's the good news:
> almost nothing!
>
> - On the Lzip binding side, we need to add support for multi-member
>   archives.  It's a bit of work but not that much.
> - On the Guix side, there is nothing to do.

Well, ‘guix publish’ would first need to create multi-member archives,
right?

Also, lzlib (which is what we use) does not implement parallel
decompression, AIUI.

Even if it did, would we be able to take advantage of it?  Currently
‘restore-file’ expects to read an archive stream sequentially.

Even if I’m wrong :-), decompression speed would at best be doubled on
multi-core machines (wouldn’t help much on low-end ARM devices), and
that’s very little compared to the decompression speed achieved by zstd.

Ludo’.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 11:36   ` Ludovic Courtès
@ 2020-12-15 11:45     ` Nicolò Balzarotti
  0 siblings, 0 replies; 23+ messages in thread
From: Nicolò Balzarotti @ 2020-12-15 11:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

> A lot in what sense?  In terms of bandwidth usage, right?

Yep, I think most mobile data plans are still limited.  Even though here
in Italy it's easy to get 50 GB+ per month, I think it's not the same
worldwide.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 11:42     ` Ludovic Courtès
@ 2020-12-15 12:31       ` Pierre Neidhardt
  2020-12-18 14:59         ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 12:31 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, Nicolò Balzarotti

[-- Attachment #1: Type: text/plain, Size: 1716 bytes --]

Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Well, ‘guix publish’ would first need to create multi-member archives,
> right?

Correct, but it's trivial once the bindings have been implemented.

> Also, lzlib (which is what we use) does not implement parallel
> decompression, AIUI.

Yes it does: multi-member archives are a non-optional part of the Lzip
spec, and lzlib implements the whole spec.

> Even if it did, would we be able to take advantage of it?  Currently
> ‘restore-file’ expects to read an archive stream sequentially.

Yes it works, I just tried this:

--8<---------------cut here---------------start------------->8---
cat big-file.lz | plzip -d -o big-file -
--8<---------------cut here---------------end--------------->8---

Decompression happens in parallel.

> Even if I’m wrong :-), decompression speed would at best be doubled on
> multi-core machines (wouldn’t help much on low-end ARM devices), and
> that’s very little compared to the decompression speed achieved by zstd.

Why doubled?  If the archive has more than CORE-NUMBER segments, then
the decompression duration can be divided by CORE-NUMBER.

All that said, I think we should have both:

- Parallel lzip support is the easiest to add at this point.
  It's the best option for people with low bandwidth.  This can benefit
  most of the planet I suppose.

- zstd is best for users with high bandwidth (or with slow hardware).
  We need to write the necessary bindings though, so it will take a bit
  more time.

Then the users can choose which compression they prefer, mostly
depending on their hardware and bandwidth.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 10:40 ` Jonathan Brielmaier
@ 2020-12-15 19:43   ` Joshua Branson
  2021-01-07 10:45     ` Guillaume Le Vaillant
  0 siblings, 1 reply; 23+ messages in thread
From: Joshua Branson @ 2020-12-15 19:43 UTC (permalink / raw)
  To: Jonathan Brielmaier; +Cc: guix-devel


Looking at the Zstandard website (https://facebook.github.io/zstd/), it
mentions Google's Snappy compression library
(https://github.com/google/snappy).  Snappy has some fairly good
benchmarks too:

Compressor 	Ratio 	Compression 	Decompress.
zstd    	2.884   500 MB/s 	1660 MB/s
snappy  	2.073   560 MB/s 	1790 MB/s

Would snappy be easier to use than Zstandard?
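
Plugging those ratios into the 335 MB uc.nar from earlier in the thread
gives a rough idea of the cost (hypothetical arithmetic; the ratios are
from zstd's benchmark corpus, not from nars):

```python
nar = 335                          # MiB, uncompressed uc.nar
zstd_ratio, snappy_ratio = 2.884, 2.073   # from the table above
zstd_size = nar / zstd_ratio       # ~116 MiB
snappy_size = nar / snappy_ratio   # ~162 MiB
print(f"snappy archives ~{snappy_size / zstd_size - 1:.0%} larger than zstd's")
```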

--
Joshua Branson
Sent from Emacs and Gnus
  https://gnucode.me
  https://video.hardlimit.com/accounts/joshua_branson/video-channels
  https://propernaming.org
  "You can have whatever you want, as long as you help

enough other people get what they want." - Zig Ziglar


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 12:31       ` Pierre Neidhardt
@ 2020-12-18 14:59         ` Ludovic Courtès
  2020-12-18 15:33           ` Pierre Neidhardt
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2020-12-18 14:59 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel, Nicolò Balzarotti

Hi Pierre,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Well, ‘guix publish’ would first need to create multi-member archives,
>> right?
>
> Correct, but it's trivial once the bindings have been implemented.

OK.

>> Also, lzlib (which is what we use) does not implement parallel
>> decompression, AIUI.
>
> Yes it does: multi-member archives are a non-optional part of the Lzip
> spec, and lzlib implements the whole spec.

Nice.

>> Even if it did, would we be able to take advantage of it?  Currently
>> ‘restore-file’ expects to read an archive stream sequentially.
>
> Yes it works, I just tried this:
>
> cat big-file.lz | plzip -d -o big-file -
>
> Decompression happens in parallel.
>
>> Even if I’m wrong :-), decompression speed would at best be doubled on
>> multi-core machines (wouldn’t help much on low-end ARM devices), and
>> that’s very little compared to the decompression speed achieved by zstd.
>
> Why doubled?  If the archive has more than CORE-NUMBER segments, then
> the decompression duration can be divided by CORE-NUMBER.

My laptop has 4 cores, so at best I’d get a 4x speedup, compared to the
10x speedup with zstd that also comes with much lower resource usage,
etc.

> All that said, I think we should have both:
>
> - Parallel lzip support is the easiest to add at this point.
>   It's the best option for people with low bandwidth.  This can benefit
>   most of the planet I suppose.
>
> - zstd is best for users with high bandwidth (or with slow hardware).
>   We need to write the necessary bindings though, so it will take a bit
>   more time.
>
> Then the users can choose which compression they prefer, mostly
> depending on their hardware and bandwidth.

Would you like to give parallel lzip a try?

Thanks!

Ludo’.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-18 14:59         ` Ludovic Courtès
@ 2020-12-18 15:33           ` Pierre Neidhardt
  0 siblings, 0 replies; 23+ messages in thread
From: Pierre Neidhardt @ 2020-12-18 15:33 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, Nicolò Balzarotti

[-- Attachment #1: Type: text/plain, Size: 704 bytes --]

Ludovic Courtès <ludo@gnu.org> writes:

> My laptop has 4 cores, so at best I’d get a 4x speedup, compared to the
> 10x speedup with zstd that also comes with much lower resource usage,
> etc.

Of course, it's a trade-off between high compression and high speed :)

Since there is no universal best option, I think it's best to support both.

> Would you like to give parallel lzip a try?

It shouldn't be too hard for me considering I already have experience
with Lzip, but I can only reasonably do this after FOSDEM, so in 1.5
months from now.

If I forget, please ping me ;)

If there is any taker before that, please go ahead! :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 19:43   ` Joshua Branson
@ 2021-01-07 10:45     ` Guillaume Le Vaillant
  2021-01-07 11:00       ` Pierre Neidhardt
  2021-01-14 21:51       ` Ludovic Courtès
  0 siblings, 2 replies; 23+ messages in thread
From: Guillaume Le Vaillant @ 2021-01-07 10:45 UTC (permalink / raw)
  To: Joshua Branson; +Cc: guix-devel


[-- Attachment #1.1: Type: text/plain, Size: 353 bytes --]


I compared gzip, lzip and zstd when compressing a 580 MB pack (therefore
containing "substitutes" for several packages) with different
compression levels.  Maybe the results can be of some use to someone.

Note that the plots only show results for 1 thread and the standard
compression levels, and that the speed axis uses a logarithmic scale.


[-- Attachment #1.2: compression-benchmark.org --]
[-- Type: text/x-org, Size: 9159 bytes --]

Machine used for the tests:
 - CPU: Intel i7-3630QM
 - RAM: 16 GiB

Programs:
 - gzip 1.10
 - pigz 2.4
 - lzip 1.21
 - plzip 1.8
 - zstd 1.4.4
 - pzstd 1.4.4

Uncompressed file:
 - name: monero-0.17.1.5-pack.tar
 - size: 582707200 bytes

#+PLOT: script:"compression-benchmark.plot"
| Comp. command  | Comp. time | Comp. size | Comp. speed | Comp. ratio | Decomp. time | Decomp. speed |
|----------------+------------+------------+-------------+-------------+--------------+---------------|
| gzip -1        |      7.999 |  166904534 |    72847506 |       3.491 |        3.292 |      50700041 |
| gzip -2        |      8.469 |  161859128 |    68804723 |       3.600 |        3.214 |      50360650 |
| gzip -3        |     10.239 |  157839772 |    56910558 |       3.692 |        3.144 |      50203490 |
| gzip -4        |     11.035 |  151039457 |    52805365 |       3.858 |        3.104 |      48659619 |
| gzip -5        |     13.767 |  146693142 |    42326375 |       3.972 |        3.143 |      46672969 |
| gzip -6        |     19.707 |  144364588 |    29568539 |       4.036 |        3.001 |      48105494 |
| gzip -7        |     24.014 |  143727357 |    24265312 |       4.054 |        2.993 |      48021168 |
| gzip -8        |     43.219 |  143062985 |    13482663 |       4.073 |        2.969 |      48185579 |
| gzip -9        |     70.930 |  142803637 |     8215243 |       4.080 |        2.964 |      48179365 |
| pigz -1 -p 4   |      2.247 |  165745308 |   259326747 |       3.516 |        1.919 |      86370666 |
| pigz -2 -p 4   |      2.394 |  160661935 |   243403175 |       3.627 |        1.862 |      86284605 |
| pigz -3 -p 4   |      2.776 |  156696382 |   209908934 |       3.719 |        1.817 |      86239065 |
| pigz -4 -p 4   |      3.045 |  150539955 |   191365255 |       3.871 |        1.787 |      84241721 |
| pigz -5 -p 4   |      3.855 |  146289903 |   151156213 |       3.983 |        1.732 |      84462992 |
| pigz -6 -p 4   |      5.378 |  143967093 |   108350167 |       4.048 |        1.721 |      83653163 |
| pigz -7 -p 4   |      6.579 |  143350506 |    88570786 |       4.065 |        1.702 |      84224739 |
| pigz -8 -p 4   |      11.76 |  142738270 |    49549932 |       4.082 |        1.720 |      82987366 |
| pigz -9 -p 4   |     19.878 |  142479078 |    29314176 |       4.090 |        1.691 |      84257290 |
| lzip -0        |     16.686 |  130302649 |    34921923 |       4.472 |        9.981 |      13055070 |
| lzip -1        |     42.011 |  118070414 |    13870348 |       4.935 |        8.669 |      13619842 |
| lzip -2        |     51.395 |  112769303 |    11337819 |       5.167 |        8.368 |      13476255 |
| lzip -3        |     69.344 |  106182860 |     8403138 |       5.488 |        8.162 |      13009417 |
| lzip -4        |     89.781 |  100072461 |     6490318 |       5.823 |        7.837 |      12769231 |
| lzip -5        |    119.626 |   95033235 |     4871075 |       6.132 |        7.586 |      12527450 |
| lzip -6        |    155.740 |   83063613 |     3741538 |       7.015 |        6.856 |      12115463 |
| lzip -7        |    197.485 |   78596381 |     2950640 |       7.414 |        6.586 |      11933857 |
| lzip -8        |    238.076 |   72885403 |     2447568 |       7.995 |        6.227 |      11704738 |
| lzip -9        |    306.368 |   72279340 |     1901985 |       8.062 |        6.203 |      11652320 |
| plzip -0 -n 4  |      4.821 |  131211238 |   120868533 |       4.441 |        2.829 |      46380784 |
| plzip -1 -n 4  |     13.453 |  120565830 |    43314294 |       4.833 |        2.604 |      46300242 |
| plzip -2 -n 4  |     15.695 |  114874773 |    37126932 |       5.073 |        2.398 |      47904409 |
| plzip -3 -n 4  |     20.563 |  108896468 |    28337655 |       5.351 |        2.486 |      43803889 |
| plzip -4 -n 4  |     26.871 |  102285879 |    21685356 |       5.697 |        2.375 |      43067739 |
| plzip -5 -n 4  |     35.220 |   97402840 |    16544781 |       5.982 |        2.448 |      39788742 |
| plzip -6 -n 4  |     45.812 |   89260273 |    12719532 |       6.528 |        2.145 |      41613181 |
| plzip -7 -n 4  |     62.723 |   82944080 |     9290168 |       7.025 |        2.080 |      39876962 |
| plzip -8 -n 4  |     71.928 |   78477272 |     8101257 |       7.425 |        2.120 |      37017581 |
| plzip -9 -n 4  |    103.744 |   75648923 |     5616780 |       7.703 |        2.578 |      29344035 |
| zstd -1        |      2.057 |  145784609 |   283280117 |       3.997 |        0.639 |     228144928 |
| zstd -2        |      2.316 |  136049621 |   251600691 |       4.283 |        0.657 |     207077049 |
| zstd -3        |      2.733 |  127702753 |   213211562 |       4.563 |        0.650 |     196465774 |
| zstd -4        |      3.269 |  126224007 |   178252432 |       4.616 |        0.658 |     191829798 |
| zstd -5        |      5.136 |  122024478 |   113455452 |       4.775 |        0.680 |     179447762 |
| zstd -6        |      6.394 |  120035201 |    91133438 |       4.854 |        0.652 |     184103069 |
| zstd -7        |      8.510 |  116048780 |    68473231 |       5.021 |        0.612 |     189622190 |
| zstd -8        |      9.875 |  114821611 |    59008324 |       5.075 |        0.593 |     193628349 |
| zstd -9        |     12.478 |  113868149 |    46698766 |       5.117 |        0.588 |     193653315 |
| zstd -10       |     14.982 |  111113753 |    38893819 |       5.244 |        0.578 |     192238327 |
| zstd -11       |     16.391 |  110674252 |    35550436 |       5.265 |        0.583 |     189835767 |
| zstd -12       |     21.008 |  110031164 |    27737395 |       5.296 |        0.570 |     193037130 |
| zstd -13       |     51.259 |  109262475 |    11367900 |       5.333 |        0.561 |     194763770 |
| zstd -14       |     58.897 |  108632734 |     9893665 |       5.364 |        0.562 |     193296680 |
| zstd -15       |     82.514 |  107956132 |     7061919 |       5.398 |        0.557 |     193817113 |
| zstd -16       |     78.935 |  105533404 |     7382114 |       5.522 |        0.576 |     183217715 |
| zstd -17       |     89.832 |   94165409 |     6486633 |       6.188 |        0.565 |     166664441 |
| zstd -18       |    115.663 |   91124039 |     5037974 |       6.395 |        0.614 |     148410487 |
| zstd -19       |    157.008 |   90229137 |     3711322 |       6.458 |        0.614 |     146952992 |
| zstd -20       |    162.499 |   80742922 |     3585913 |       7.217 |        0.605 |     133459375 |
| zstd -21       |    207.122 |   79619348 |     2813353 |       7.319 |        0.611 |     130309899 |
| zstd -22       |    277.177 |   78652901 |     2102293 |       7.409 |        0.634 |     124058203 |
| pzstd -1 -p 4  |      0.621 |  146665510 |   938336876 |       3.973 |        0.196 |     748293418 |
| pzstd -2 -p 4  |      0.720 |  137416958 |   809315556 |       4.240 |        0.227 |     605361048 |
| pzstd -3 -p 4  |      1.180 |  128748806 |   493819661 |       4.526 |        0.231 |     557354139 |
| pzstd -4 -p 4  |      1.786 |  127373154 |   326263830 |       4.575 |        0.240 |     530721475 |
| pzstd -5 -p 4  |      2.635 |  123216422 |   221141252 |       4.729 |        0.240 |     513401758 |
| pzstd -6 -p 4  |      3.774 |  121257316 |   154400424 |       4.806 |        0.251 |     483096876 |
| pzstd -7 -p 4  |      3.988 |  117361187 |   146115145 |       4.965 |        0.263 |     446240255 |
| pzstd -8 -p 4  |      4.540 |  116172098 |   128349604 |       5.016 |        0.240 |     484050408 |
| pzstd -9 -p 4  |      5.083 |  115237287 |   114638442 |       5.057 |        0.268 |     429989877 |
| pzstd -10 -p 4 |      5.630 |  112359994 |   103500391 |       5.186 |        0.226 |     497168115 |
| pzstd -11 -p 4 |      5.991 |  111969711 |    97263762 |       5.204 |        0.246 |     455161427 |
| pzstd -12 -p 4 |      8.001 |  111326376 |    72829296 |       5.234 |        0.227 |     490424564 |
| pzstd -13 -p 4 |     16.035 |  110525395 |    36339707 |       5.272 |        0.259 |     426738977 |
| pzstd -14 -p 4 |     18.145 |  109957500 |    32113927 |       5.299 |        0.253 |     434614625 |
| pzstd -15 -p 4 |     24.791 |  109358520 |    23504788 |       5.328 |        0.224 |     488207679 |
| pzstd -16 -p 4 |     23.940 |  106888588 |    24340317 |       5.452 |        0.234 |     456788838 |
| pzstd -17 -p 4 |     29.099 |   97393935 |    20024991 |       5.983 |        0.266 |     366142613 |
| pzstd -18 -p 4 |     37.124 |   94273955 |    15696240 |       6.181 |        0.284 |     331950546 |
| pzstd -19 -p 4 |     48.798 |   93531545 |    11941211 |       6.230 |        0.262 |     356990630 |
| pzstd -20 -p 4 |     54.860 |   82067608 |    10621713 |       7.100 |        0.302 |     271747046 |
| pzstd -21 -p 4 |     64.179 |   79735488 |     9079406 |       7.308 |        0.389 |     204975548 |
| pzstd -22 -p 4 |    256.242 |   78688788 |     2274050 |       7.405 |        0.585 |     134510749 |
#+TBLFM: $4='(format "%d" (round (/ 582707200.0 $2)));N :: $5='(format "%.3f" (/ 582707200.0 $3));N :: $7='(format "%d" (round (/ $3 $6)));N
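For anyone re-deriving the table outside Org mode, the #+TBLFM line above computes columns 4, 5 and 7 from the raw measurements (582707200 uncompressed bytes; compression time in column 2, compressed size in column 3, decompression time in column 6). A rough Python equivalent, checked against the "gzip -4" row (note the decompression speed column is in *compressed* bytes per second):

```python
ORIGINAL_SIZE = 582_707_200  # uncompressed size of the pack, in bytes

def derived(comp_time, comp_size, decomp_time):
    """Mirror the #+TBLFM formulas: compression speed (B/s),
    compression ratio, and decompression speed (compressed B/s)."""
    comp_speed = round(ORIGINAL_SIZE / comp_time)
    ratio = ORIGINAL_SIZE / comp_size
    decomp_speed = round(comp_size / decomp_time)
    return comp_speed, ratio, decomp_speed

# "gzip -4" row: 11.035 s to compress, 151039457 B compressed, 3.104 s to decompress.
speed, ratio, dspeed = derived(11.035, 151_039_457, 3.104)
print(speed, f"{ratio:.3f}", dspeed)  # 52805365 3.858 48659619
```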

[-- Attachment #1.3: compression-benchmark.plot --]
[-- Type: text/plain, Size: 1631 bytes --]

set terminal png size 1920, 1080
set style data linespoints
set logscale y
set xlabel "Compression ratio"
set ylabel "Compression speed (MB/s)"
set output "compression.png"
plot '$datafile' every ::0::8 using 5:($4 / 1000000) linecolor "dark-violet" title "gzip", \
     '$datafile' every ::0::8 using 5:($4 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "dark-violet" offset -1, -1 notitle, \
     '$datafile' every ::18::27 using 5:($4 / 1000000) linecolor "navy" title "lzip", \
     '$datafile' every ::18::27 using 5:($4 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "navy" offset 0, -1 notitle, \
     '$datafile' every ::38::56 using 5:($4 / 1000000) linecolor "olive" title "zstd", \
     '$datafile' every ::38::56 using 5:($4 / 1000000):(substr(stringcolumn(1), 7, 9)) with labels textcolor "olive" offset 1, 1 notitle

set ylabel "Decompression speed (MB/s)"
set output "decompression.png"
plot '$datafile' every ::0::8 using 5:($7 / 1000000) linecolor "dark-violet" title "gzip", \
     '$datafile' every ::0::8 using 5:($7 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "dark-violet" offset 0, -1 notitle, \
     '$datafile' every ::18::27 using 5:($7 / 1000000) linecolor "navy" title "lzip", \
     '$datafile' every ::18::27 using 5:($7 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "navy" offset 0, -1 notitle, \
     '$datafile' every ::38::56 using 5:($7 / 1000000) linecolor "olive" title "zstd", \
     '$datafile' every ::38::56 using 5:($7 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "olive" offset 0, -1 notitle

[-- Attachment #1.4: compression.png --]
[-- Type: image/png, Size: 16056 bytes --]

[-- Attachment #1.5: decompression.png --]
[-- Type: image/png, Size: 12804 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2021-01-07 10:45     ` Guillaume Le Vaillant
@ 2021-01-07 11:00       ` Pierre Neidhardt
  2021-01-07 11:33         ` Guillaume Le Vaillant
  2021-01-14 21:51       ` Ludovic Courtès
  1 sibling, 1 reply; 23+ messages in thread
From: Pierre Neidhardt @ 2021-01-07 11:00 UTC (permalink / raw)
  To: Guillaume Le Vaillant, Joshua Branson; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 392 bytes --]

Wow, impressive! :)

Guillaume Le Vaillant <glv@posteo.net> writes:

> Note that the plots only show the results using only 1 thread and

Doesn't 1 thread defeat the purpose of parallel compression / decompression?

> Machine used for the tests:
>  - CPU: Intel i7-3630QM
>  - RAM: 16 MiB

I suppose you meant 16 GiB ;)

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2021-01-07 11:00       ` Pierre Neidhardt
@ 2021-01-07 11:33         ` Guillaume Le Vaillant
  0 siblings, 0 replies; 23+ messages in thread
From: Guillaume Le Vaillant @ 2021-01-07 11:33 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]


Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Wow, impressive! :)
>
> Guillaume Le Vaillant <glv@posteo.net> writes:
>
>> Note that the plots only show the results using only 1 thread and
>
> Doesn't 1 thread defeat the purpose of parallel compression / decompression?
>

It was just to get a better idea of the relative compression and
decompression speeds of the algorithms. When using n threads, if the
file is big enough, the speeds are almost multiplied by n and the
compression ratio is a little lower.
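That scaling is visible in the table itself, e.g. comparing the single-threaded lzip -6 row with plzip -6 -n 4 (a quick check; the numbers are hard-coded from the benchmark above):

```python
# (compression speed B/s, decompression speed B/s, ratio), from the table.
lzip_6 = (3_741_538, 12_115_463, 7.015)    # lzip -6, single thread
plzip_6 = (12_719_532, 41_613_181, 6.528)  # plzip -6 -n 4

comp_scaling = plzip_6[0] / lzip_6[0]    # ~3.4x with 4 threads
decomp_scaling = plzip_6[1] / lzip_6[1]  # ~3.4x
ratio_loss = 1 - plzip_6[2] / lzip_6[2]  # ~7% worse ratio
print(f"{comp_scaling:.1f}x {decomp_scaling:.1f}x {ratio_loss:.0%}")  # 3.4x 3.4x 7%
```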

>> Machine used for the tests:
>>  - CPU: Intel i7-3630QM
>>  - RAM: 16 MiB
>
> I suppose you meant 16 GiB ;)

Yes, of course :)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2021-01-07 10:45     ` Guillaume Le Vaillant
  2021-01-07 11:00       ` Pierre Neidhardt
@ 2021-01-14 21:51       ` Ludovic Courtès
  2021-01-14 22:08         ` Nicolò Balzarotti
  2021-01-15  8:10         ` Pierre Neidhardt
  1 sibling, 2 replies; 23+ messages in thread
From: Ludovic Courtès @ 2021-01-14 21:51 UTC (permalink / raw)
  To: Guillaume Le Vaillant; +Cc: guix-devel

Hi Guillaume,

Guillaume Le Vaillant <glv@posteo.net> skribis:

> I compared gzip, lzip and zstd when compressing a 580 MB pack (therefore
> containing "substitutes" for several packages) with different compression
> levels. Maybe the results can be of some use to someone.

It’s insightful, thanks a lot!

One takeaway for me is that zstd decompression remains an order of
magnitude faster than the others, regardless of the compression level.

Another one is that at level 10 and higher zstd achieves compression
ratios that are more in the ballpark of lzip.

If we are to change the compression methods used at ci.guix.gnu.org, we
could use zstd >= 10.

We could also drop gzip, but there are probably pre-1.1 daemons out
there that understand nothing but gzip¹, so perhaps that’ll have to
wait.  Now, compressing substitutes three times may be somewhat
unreasonable.

Thoughts?

Ludo’.

¹ https://guix.gnu.org/en/blog/2020/gnu-guix-1.1.0-released/


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2021-01-14 21:51       ` Ludovic Courtès
@ 2021-01-14 22:08         ` Nicolò Balzarotti
  2021-01-15  8:10         ` Pierre Neidhardt
  1 sibling, 0 replies; 23+ messages in thread
From: Nicolò Balzarotti @ 2021-01-14 22:08 UTC (permalink / raw)
  To: Ludovic Courtès, Guillaume Le Vaillant; +Cc: guix-devel

Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> We could also drop gzip, but there are probably pre-1.1 daemons out
> there that understand nothing but gzip¹, so perhaps that’ll have to
> wait.  Now, compressing substitutes three times may be somewhat
> unreasonable.
>
> Thoughts?
>
Is there a request log where we can check whether this is true?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2021-01-14 21:51       ` Ludovic Courtès
  2021-01-14 22:08         ` Nicolò Balzarotti
@ 2021-01-15  8:10         ` Pierre Neidhardt
  1 sibling, 0 replies; 23+ messages in thread
From: Pierre Neidhardt @ 2021-01-15  8:10 UTC (permalink / raw)
  To: Ludovic Courtès, Guillaume Le Vaillant; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]

Ludovic Courtès <ludo@gnu.org> writes:

> One takeaway for me is that zstd decompression remains an order of
> magnitude faster than the others, regardless of the compression level.
>
> Another one is that at level 10 and higher zstd achieves compression
> ratios that are more in the ballpark of lzip.

Hmmm, this is roughly true for lzip < level 6, but as soon as lzip hits level 6
(the default!) it compresses up to twice as much!

> If we are to change the compression methods used at ci.guix.gnu.org, we
> could use zstd >= 10.

On Guillaume's graph, the compression speed at the default level 3 is
about 110 MB/s, while at level 10 it's about 40 MB/s, which is
approximately the gzip speed.

If server compression time does not matter, then I agree, level >= 10
would be a good option.

What about zstd level 19 then?  It's as slow as lzip to compress, but
still decompresses blazingly fast, which is what we are trying to
achieve here, _while_ offering a compression ratio in the ballpark of
lzip level 6 (though still not that of lzip level 9).
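To make that concrete, a back-of-the-envelope time-to-disk model for Guillaume's 580 MB pack, assuming the 30 MB/s peak download bandwidth quoted at the start of the thread and the sizes/times from the table (a crude sketch that treats download and decompression as strictly sequential):

```python
BANDWIDTH = 30e6  # bytes/s, peak download rate quoted earlier in the thread

# (compressed size in bytes, decompression time in s), from the table.
candidates = {
    "gzip -9": (142_803_637, 2.964),
    "lzip -9": (72_279_340, 6.203),
    "zstd -19": (90_229_137, 0.614),
}

for name, (size, decomp) in candidates.items():
    total = size / BANDWIDTH + decomp  # download, then decompress
    print(f"{name}: {total:.1f} s")  # roughly 7.7 s, 8.6 s and 3.6 s
```

Under this model zstd -19 roughly halves the time-to-disk relative to either gzip -9 or lzip -9, despite its slow compression; overlapping download with decompression would shrink the gap but not change the ordering.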

> We could also drop gzip, but there are probably pre-1.1 daemons out
> there that understand nothing but gzip¹, so perhaps that’ll have to
> wait.  Now, compressing substitutes three times may be somewhat
> unreasonable.

Agreed, maybe release an announcement and give it a few months / 1 year?

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-01-15  8:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
2020-12-14 22:29 ` Julien Lepiller
2020-12-14 22:59 ` Nicolò Balzarotti
2020-12-15  7:52   ` Pierre Neidhardt
2020-12-15  9:45     ` Nicolò Balzarotti
2020-12-15  9:54       ` Pierre Neidhardt
2020-12-15 10:03         ` Nicolò Balzarotti
2020-12-15 10:13           ` Pierre Neidhardt
2020-12-15 10:14             ` Pierre Neidhardt
2020-12-15 11:42     ` Ludovic Courtès
2020-12-15 12:31       ` Pierre Neidhardt
2020-12-18 14:59         ` Ludovic Courtès
2020-12-18 15:33           ` Pierre Neidhardt
2020-12-15 11:36   ` Ludovic Courtès
2020-12-15 11:45     ` Nicolò Balzarotti
2020-12-15 10:40 ` Jonathan Brielmaier
2020-12-15 19:43   ` Joshua Branson
2021-01-07 10:45     ` Guillaume Le Vaillant
2021-01-07 11:00       ` Pierre Neidhardt
2021-01-07 11:33         ` Guillaume Le Vaillant
2021-01-14 21:51       ` Ludovic Courtès
2021-01-14 22:08         ` Nicolò Balzarotti
2021-01-15  8:10         ` Pierre Neidhardt
