unofficial mirror of guix-devel@gnu.org 
From: "Ludovic Courtès" <ludo@gnu.org>
To: guix-devel <guix-devel@gnu.org>
Subject: When substitute download + decompression is CPU-bound
Date: Mon, 14 Dec 2020 23:20:17 +0100
Message-ID: <87im94qbby.fsf@gnu.org>

Hi Guix!

Consider these two files:

  https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
  https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92

Quick decompression bench:

--8<---------------cut here---------------start------------->8---
$ du -h /tmp/uc.nar.[gl]z
103M	/tmp/uc.nar.gz
71M	/tmp/uc.nar.lz
$ gunzip -c < /tmp/uc.nar.gz| wc -c
350491552
$ time lzip -d </tmp/uc.nar.lz >/dev/null

real	0m6.040s
user	0m5.950s
sys	0m0.036s
$ time gunzip -c < /tmp/uc.nar.gz >/dev/null

real	0m2.009s
user	0m1.977s
sys	0m0.032s
--8<---------------cut here---------------end--------------->8---

The decompression throughput (compressed bytes read in the first column,
uncompressed bytes written in the second column) is:

          input   |  output
  gzip:  52 MB/s  | 167 MiB/s
  lzip:  11 MB/s  |  56 MiB/s
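
As a sanity check, these rates can be recomputed from the sizes and
timings in the benchmark above.  The following is only a rough sketch: it
treats the ‘du -h’ sizes as MiB and uses the “real” times, and the name
‘throughput_mib_s’ is made up for illustration.

```python
MIB = 1024 * 1024

# Figures taken from the benchmark transcript.
UNCOMPRESSED_BYTES = 350491552  # output of `wc -c`
BENCH = {
    # name: (compressed size in MiB per `du -h`, wall-clock seconds)
    "gzip": (103, 2.009),
    "lzip": (71, 6.040),
}


def throughput_mib_s(compressed_mib, seconds):
    """Return (input, output) decompression throughput in MiB/s."""
    input_rate = compressed_mib / seconds
    output_rate = UNCOMPRESSED_BYTES / MIB / seconds
    return input_rate, output_rate


for name, (size, secs) in BENCH.items():
    inp, out = throughput_mib_s(size, secs)
    print(f"{name}: {inp:5.1f} MiB/s in | {out:5.1f} MiB/s out")
```

This gives roughly 51 MiB/s in and 166 MiB/s out for gzip, and roughly
11.8 MiB/s in and 55 MiB/s out for lzip.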

Indeed, if you run this from a computer on your LAN:

  wget -O - … | gunzip > /dev/null

you’ll find that wget caps at 50 MB/s with gunzip, whereas with lunzip it
caps at 11 MB/s.

From my place I get a peak download bandwidth of 30+ MB/s from
ci.guix.gnu.org, thus substitute downloads are CPU-bound (I can’t go
beyond 11 MB/s due to decompression).  I must say it never occurred to me
that this could be the case when we introduced lzip substitutes.

I’d get faster substitute downloads with gzip: I would download more,
but the time-to-disk would be smaller.  Specifically, download +
decompression of ungoogled-chromium from the LAN completes in 2.4s for
gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
get 32s (gzip) vs. 53s (lzip).
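
The CPU-bound argument can be made concrete with a simple model, assuming
download and decompression are pipelined so that the effective rate is
that of the slower stage.  This is a back-of-the-envelope sketch, not how
‘guix substitute’ measures anything; ‘time_to_disk’ and the rounded rates
are illustrative assumptions drawn from the numbers above.

```python
def time_to_disk(compressed_mb, network_mb_s, decompress_mb_s):
    """Estimate seconds to fetch and unpack an archive.

    Assumes a pipeline (as in a `wget | gunzip` pipe): the compressed
    stream is consumed at the rate of the slowest stage.
    """
    effective_rate = min(network_mb_s, decompress_mb_s)
    return compressed_mb / effective_rate


NETWORK = 30.0  # MB/s, the peak download bandwidth quoted in the text

# name: (compressed size in MB, decompressor input rate in MB/s)
ARCHIVES = {"gzip": (103, 50.0), "lzip": (71, 11.0)}

for name, (size, rate) in ARCHIVES.items():
    bound = "CPU" if rate < NETWORK else "network"
    secs = time_to_disk(size, NETWORK, rate)
    print(f"{name}: {secs:.1f} s, {bound}-bound")
```

Under these assumptions, on a 30 MB/s link the gzip archive is
network-bound (about 3.4 s) while the smaller lzip archive is CPU-bound
(about 6.5 s).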

Where to go from here?  Several options:

  0. Lzip decompression speed increases with compression ratio, but
     we’re already using ‘--best’ on ci.  The only way we could gain is
     by using “multi-member archives” and then parallel decompression as
     done in plzip, but that’s probably not supported in lzlib.  So
     we’re probably stuck here.

  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
     ‘guix substitute’ could automatically pick one or the other
     depending on the CPU and bandwidth.  Perhaps a simple trick would
     be to check the user/wall-clock time ratio and switch to gzip for
     subsequent downloads if that ratio is close to one.  How well would
     that work?

  2. Use Zstd like all the cool kids since it seems to have a much
     higher decompression speed: <https://facebook.github.io/zstd/>.
     630 MB/s on ungoogled-chromium on my laptop.  Woow.

  3. Allow for parallel downloads (really: parallel decompression) as
     Julien did in <https://issues.guix.gnu.org/39728>.
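
The heuristic in option 1 could look roughly like the sketch below.  It
is written in Python purely for illustration (Guix itself is written in
Scheme), and everything in it is hypothetical: the names ‘timed’,
‘is_cpu_bound’ and ‘next_compression’, the 0.9 threshold, and the policy
of falling back to the first available format.

```python
import time


def timed(func, *args):
    """Run func(*args); return (result, cpu_seconds, wall_seconds)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return result, cpu, wall


def is_cpu_bound(cpu_seconds, wall_seconds, threshold=0.9):
    """True when CPU time is close to wall-clock time, i.e. the process
    was mostly computing rather than waiting for the network."""
    if wall_seconds <= 0:
        return False
    return cpu_seconds / wall_seconds >= threshold


def next_compression(cpu_seconds, wall_seconds, available=("lzip", "gzip")):
    """Pick the compression for subsequent downloads (assumed policy)."""
    if is_cpu_bound(cpu_seconds, wall_seconds) and "gzip" in available:
        return "gzip"  # cheaper to decompress, larger download
    return available[0]  # keep the preferred (smaller) format


# Example with the lzip timings from the benchmark: user + sys vs. real.
print(next_compression(5.950 + 0.036, 6.040))
```

In a real implementation the two times would be measured around an actual
download plus decompression, for instance with a wrapper like ‘timed’;
how noisy that ratio is in practice is exactly the open question.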

My preference would be #2, #1, and #3, in this order.  #2 is great but
it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m not
fond of #3 because it just papers over the underlying issue and could be
counterproductive if the number of jobs is wrong.

Thoughts?

Ludo’.


