unofficial mirror of guix-devel@gnu.org 
From: Julien Lepiller <julien@lepiller.eu>
To: guix-devel@gnu.org, "Ludovic Courtès" <ludo@gnu.org>
Subject: Re: When substitute download + decompression is CPU-bound
Date: Mon, 14 Dec 2020 17:29:39 -0500
Message-ID: <B4418E22-548C-436A-9F91-D7A4F25D4CC0@lepiller.eu>
In-Reply-To: <87im94qbby.fsf@gnu.org>

My proposed changes to allow parallel downloads assume that downloads are network-bound, so they can run separately from other jobs. If downloads are actually CPU-bound, then the proposal indeed has no merit at all :)

On 14 December 2020 17:20:17 GMT-05:00, "Ludovic Courtès" <ludo@gnu.org> wrote:
>Hi Guix!
>
>Consider these two files:
>
>https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>
>Quick decompression bench:
>
>--8<---------------cut here---------------start------------->8---
>$ du -h /tmp/uc.nar.[gl]z
>103M	/tmp/uc.nar.gz
>71M	/tmp/uc.nar.lz
>$ gunzip -c < /tmp/uc.nar.gz| wc -c
>350491552
>$ time lzip -d </tmp/uc.nar.lz >/dev/null
>
>real	0m6.040s
>user	0m5.950s
>sys	0m0.036s
>$ time gunzip -c < /tmp/uc.nar.gz >/dev/null
>
>real	0m2.009s
>user	0m1.977s
>sys	0m0.032s
>--8<---------------cut here---------------end--------------->8---
>
>The decompression throughput (compressed bytes read in the first column,
>uncompressed bytes written in the second column) is:
>
>          input   |  output
>  gzip:  52 MB/s  | 167 MiB/s
>  lzip:  11 MB/s  |  56 MiB/s
>
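The throughput figures can be derived from the `du`, `wc` and `time` output above. Below is a minimal sketch of that arithmetic in Python. It assumes the `du -h` sizes are in MiB, and because `du` rounds them the results are approximate. It uses the `real` times. `BENCH` and `throughput` are illustrative names only.

```python
"""Recompute decompression throughput from the benchmark transcript.

Assumes the `du -h` sizes are MiB (rounded) and uses the `real` times.
"""

MIB = 1024 * 1024

UNCOMPRESSED_BYTES = 350_491_552  # output of `gunzip -c ... | wc -c`

# name: (compressed size in MiB as shown by `du -h`, wall-clock seconds)
BENCH = {
    "gzip": (103, 2.009),
    "lzip": (71, 6.040),
}


def throughput(compressed_mib: float, seconds: float) -> tuple[float, float]:
    """Return (input MiB/s, output MiB/s) for one decompression run."""
    input_rate = compressed_mib / seconds             # compressed bytes read
    output_rate = UNCOMPRESSED_BYTES / MIB / seconds  # uncompressed bytes written
    return input_rate, output_rate


if __name__ == "__main__":
    for name, (size_mib, secs) in BENCH.items():
        inp, out = throughput(size_mib, secs)
        print(f"{name}: {inp:5.1f} MiB/s in | {out:5.1f} MiB/s out")
```

With these inputs the sketch gives roughly 51 MiB/s in and 166 MiB/s out for gzip, and roughly 12 MiB/s in and 55 MiB/s out for lzip. The input column is what limits how fast a `wget | decompressor` pipeline can usefully pull compressed bytes off the network.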
>Indeed, if you run this from a computer on your LAN:
>
>  wget -O - … | gunzip > /dev/null
>
>you’ll find that wget caps at 50 MB/s with gunzip, whereas with lunzip it
>caps at 11 MB/s.
>
>From my place I get a peak download bandwidth of 30+ MB/s from
>ci.guix.gnu.org, thus substitute downloads are CPU-bound (I can’t go
>beyond 11 MB/s due to decompression).  I must say it never occurred to
>me that this could be the case when we introduced lzip substitutes.
>
>I’d get faster substitute downloads with gzip (I would download more,
>but the time-to-disk would be smaller).  Specifically, download +
>decompression of ungoogled-chromium from the LAN completes in 2.4s for
>gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
>get 32s (gzip) vs. 53s (lzip).
>
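Whether a transfer is CPU-bound depends on how the network bandwidth compares with the decompressor's input rate. Below is a rough model of that comparison. It assumes download and decompression run as a pipeline, so the slower stage sets the pace, and it ignores latency, TLS and disk I/O. The 30 MB/s figure comes from the message. `pipeline_time` and `is_cpu_bound` are hypothetical helpers.

```python
MIB = 1024 * 1024


def pipeline_time(compressed_bytes: float, bandwidth: float,
                  decomp_in_rate: float) -> float:
    """Seconds to fetch and decompress an archive.

    Both rates are in compressed bytes per second; in a `wget | gunzip`
    style pipeline the slower stage determines the overall pace.
    """
    return compressed_bytes / min(bandwidth, decomp_in_rate)


def is_cpu_bound(bandwidth: float, decomp_in_rate: float) -> bool:
    """True when the decompressor cannot consume bytes as fast as they arrive."""
    return decomp_in_rate < bandwidth


if __name__ == "__main__":
    bandwidth = 30e6  # "30+ MB/s" peak from ci.guix.gnu.org, per the message
    archives = {
        # name: (compressed bytes, decompressor input rate from the benchmark)
        "gzip": (103 * MIB, 103 * MIB / 2.009),
        "lzip": (71 * MIB, 71 * MIB / 6.040),
    }
    for name, (size, rate) in archives.items():
        seconds = pipeline_time(size, bandwidth, rate)
        bound = "CPU" if is_cpu_bound(bandwidth, rate) else "network"
        print(f"{name}: ~{seconds:.1f} s, {bound}-bound")
```

Under these assumptions gzip is network-bound at about 3.6 s, while lzip is CPU-bound at about 6.0 s. The larger gzip archive therefore still reaches the disk first, which matches the ordering of the LAN measurements quoted above.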
>Where to go from here?  Several options:
>
>  0. Lzip decompression speed increases with compression ratio, but
>     we’re already using ‘--best’ on ci.  The only way we could gain is
>     by using “multi-member archives” and then parallel decompression as
>     done in plzip, but that’s probably not supported in lzlib.  So
>     we’re probably stuck here.
>
>  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>     ‘guix substitute’ could automatically pick one or the other
>     depending on the CPU and bandwidth.  Perhaps a simple trick would
>     be to check the user/wall-clock time ratio and switch to gzip for
>     subsequent downloads if that ratio is close to one.  How well would
>     that work?
>
>  2. Use Zstd like all the cool kids since it seems to have a much
>     higher decompression speed: <https://facebook.github.io/zstd/>.
>     630 MB/s on ungoogled-chromium on my laptop.  Woow.
>
>  3. Allow for parallel downloads (really: parallel decompression) as
>     Julien did in <https://issues.guix.gnu.org/39728>.
>
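The heuristic in option 1 is to time each transfer in both CPU time and wall-clock time, then switch to gzip once their ratio gets close to one. Below is a hypothetical Python sketch of that idea; it is not the `guix substitute` implementation. The 0.9 threshold, `TransferTimer` and `next_compression` are all assumptions. `time.process_time()` counts only the current process, so the sketch assumes decompression happens in-process; a separate `lzip` child process would need its own accounting.

```python
import time
import zlib

# Hypothetical threshold for "user/wall-clock ratio close to one".
CPU_BOUND_RATIO = 0.9


class TransferTimer:
    """Record CPU time and wall-clock time spent inside a `with` block."""

    def __enter__(self):
        self._cpu0 = time.process_time()   # CPU time of this process
        self._wall0 = time.perf_counter()  # monotonic wall-clock time
        return self

    def __exit__(self, *exc):
        self.cpu = time.process_time() - self._cpu0
        self.wall = time.perf_counter() - self._wall0
        return False  # do not swallow exceptions


def next_compression(cpu: float, wall: float, current: str) -> str:
    """Choose the format to request for the next substitute.

    If the process was busy computing for nearly the whole transfer,
    decompression (not the network) was the bottleneck: switch to gzip,
    which is cheaper to decode.  Otherwise keep the current format.
    """
    if wall <= 0:
        return current
    return "gzip" if cpu / wall >= CPU_BOUND_RATIO else current


if __name__ == "__main__":
    # Stand-in for "fetch + decompress": an in-memory zlib round trip.
    payload = zlib.compress(b"guix " * 2_000_000)
    fmt = "lzip"
    with TransferTimer() as t:
        zlib.decompress(payload)
    fmt = next_compression(t.cpu, t.wall, fmt)
    print(f"cpu/wall = {t.cpu / max(t.wall, 1e-9):.2f}, next format: {fmt}")
```

For example, the `lzip -d` run in the benchmark has user/real ≈ 5.95/6.04 ≈ 0.99, which this rule treats as CPU-bound. With multi-threaded decompression the ratio can exceed one, and the rule also counts that as CPU-bound.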
>My preference would be #2, #1, and #3, in this order.  #2 is great but
>it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m not
>fond of #3 because it just papers over the underlying issue and could be
>counterproductive if the number of jobs is wrong.
>
>Thoughts?
>
>Ludo’.

Thread overview: 43+ messages
2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
2020-12-14 22:29 ` Julien Lepiller [this message]
2020-12-14 22:59 ` Nicolò Balzarotti
2020-12-15  7:52   ` Pierre Neidhardt
2020-12-15  9:45     ` Nicolò Balzarotti
2020-12-15  9:54       ` Pierre Neidhardt
2020-12-15 10:03         ` Nicolò Balzarotti
2020-12-15 10:13           ` Pierre Neidhardt
2020-12-15 10:14             ` Pierre Neidhardt
2020-12-15 11:42     ` Ludovic Courtès
2020-12-15 12:31       ` Pierre Neidhardt
2020-12-18 14:59         ` Ludovic Courtès
2020-12-18 15:33           ` Pierre Neidhardt
2020-12-15 11:36   ` Ludovic Courtès
2020-12-15 11:45     ` Nicolò Balzarotti
2020-12-15 10:40 ` Jonathan Brielmaier
2020-12-15 19:43   ` Joshua Branson
2021-01-07 10:45     ` Guillaume Le Vaillant
2021-01-07 11:00       ` Pierre Neidhardt
2021-01-07 11:33         ` Guillaume Le Vaillant
2021-01-14 21:51       ` Ludovic Courtès
2021-01-14 22:08         ` Nicolò Balzarotti
2021-01-28 17:53           ` Are gzip-compressed substitutes still used? Ludovic Courtès
2021-03-17 17:12             ` Ludovic Courtès
2021-03-17 17:33               ` Léo Le Bouter
2021-03-17 18:08                 ` Vagrant Cascadian
2021-03-18  0:03                   ` zimoun
2021-03-18 16:00                     ` Vagrant Cascadian
2021-03-18 18:53                       ` Leo Famulari
2021-03-20 11:23                   ` Ludovic Courtès
2021-03-17 18:06               ` zimoun
2021-03-17 18:20               ` Jonathan Brielmaier
2021-03-18 17:25               ` Pierre Neidhardt
2021-01-15  8:10         ` When substitute download + decompression is CPU-bound Pierre Neidhardt
2021-01-28 17:58           ` Ludovic Courtès
2021-01-29  9:45             ` Pierre Neidhardt
2021-01-29 11:23               ` Guillaume Le Vaillant
2021-01-29 11:55                 ` Nicolò Balzarotti
2021-01-29 12:13                   ` Pierre Neidhardt
2021-01-29 13:06                     ` Guillaume Le Vaillant
2021-01-29 14:55                     ` Nicolò Balzarotti
2021-02-01 22:18                 ` Ludovic Courtès
2021-01-29 13:33             ` zimoun
