My proposed changes to allow for parallel downloads assume that downloads are network-bound, so they can run separately from other jobs.  If downloads are actually CPU-bound, then the change indeed has no merit at all :)

On December 14, 2020 17:20:17 GMT-05:00, "Ludovic Courtès" wrote:
>Hi Guix!
>
>Consider these two files:
>
>https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>
>Quick decompression bench:
>
>--8<---------------cut here---------------start------------->8---
>$ du -h /tmp/uc.nar.[gl]z
>103M    /tmp/uc.nar.gz
>71M     /tmp/uc.nar.lz
>$ gunzip -c < /tmp/uc.nar.gz | wc -c
>350491552
>$ time lzip -d < /tmp/uc.nar.lz > /dev/null
>
>real    0m6.040s
>user    0m5.950s
>sys     0m0.036s
>$ time gunzip -c < /tmp/uc.nar.gz > /dev/null
>
>real    0m2.009s
>user    0m1.977s
>sys     0m0.032s
>--8<---------------cut here---------------end--------------->8---
>
>The decompression throughput (compressed bytes read in the first
>column, uncompressed bytes written in the second column) is:
>
>          input    |  output
>  gzip:  52 MB/s   |  167 MiB/s
>  lzip:  11 MB/s   |   56 MiB/s
>
>Indeed, if you run this from a computer on your LAN:
>
>  wget -O - … | gunzip > /dev/null
>
>you’ll find that wget caps at 50 MB/s with gunzip, whereas with lunzip
>it caps at 11 MB/s.
>
>From my place I get a peak download bandwidth of 30+ MB/s from
>ci.guix.gnu.org, so substitute downloads are CPU-bound (I can’t go
>beyond 11 MB/s because of decompression).  I must say it never occurred
>to me that this could be the case when we introduced lzip substitutes.
>
>I’d get faster substitute downloads with gzip (I would download more
>bytes, but the time-to-disk would be smaller).  Specifically, download +
>decompression of ungoogled-chromium from the LAN completes in 2.4s for
>gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
>get 32s (gzip) vs. 53s (lzip).
>
>Where to go from here?  Several options:
>
>  0. Lzip decompression speed increases with the compression ratio, but
>     we’re already using ‘--best’ on ci.  The only way we could gain is
>     by using “multi-member archives” and then parallel decompression as
>     done in plzip, but that’s probably not supported in lzlib.  So
>     we’re probably stuck here.
>
>  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>     ‘guix substitute’ could automatically pick one or the other
>     depending on the CPU and bandwidth.  Perhaps a simple trick would
>     be to check the user/wall-clock time ratio and switch to gzip for
>     subsequent downloads if that ratio is close to one.  How well would
>     that work?
>
>  2. Use Zstd like all the cool kids, since it seems to have a much
>     higher decompression speed: 630 MB/s on ungoogled-chromium on my
>     laptop.  Woow.
>
>  3. Allow for parallel downloads (really: parallel decompression), as
>     Julien did.
>
>My preference would be #2, #1, and #3, in this order.  #2 is great but
>it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m not
>fond of #3 because it just papers over the underlying issue and could be
>counterproductive if the number of jobs is wrong.
>
>Thoughts?
>
>Ludo’.
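
As a sanity check, the throughput table quoted above can be rederived from the archive sizes and wall-clock times in the benchmark; the one-liner below just redoes that arithmetic (everything in MiB/s, so small rounding and unit differences against the quoted figures are expected).

--8<---------------cut here---------------start------------->8---
# Rederive the table: input rate = compressed size / wall time,
# output rate = uncompressed size / wall time.  Sizes and times are
# taken from the benchmark in the quoted message.
awk 'BEGIN {
  uncompressed = 350491552 / (1024 * 1024)                 # ~334 MiB
  printf "gzip: %3.0f MiB/s in | %4.0f MiB/s out\n", 103 / 2.009, uncompressed / 2.009
  printf "lzip: %3.0f MiB/s in | %4.0f MiB/s out\n",  71 / 6.040, uncompressed / 6.040
}'
--8<---------------cut here---------------end--------------->8---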
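
On option 1, the user/wall-clock ratio trick can be tried from the shell before touching ‘guix substitute’ at all.  The sketch below is only an illustration: the 0.8 threshold is arbitrary, and the URL is simply the lzip nar from the quoted message.

--8<---------------cut here---------------start------------->8---
#!/bin/bash
# Sketch of the user/wall-clock heuristic from option 1: time a whole
# download + decompression pipeline, then compare CPU time to elapsed
# time.  The 0.8 cut-off is an arbitrary illustration.
url="https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92"

TIMEFORMAT='%R %U'   # make Bash's `time` print "real user" in seconds
read -r real user < <( { time wget -q -O - "$url" | lzip -d > /dev/null; } 2>&1 )

# If the pipeline used close to one full core for its whole duration,
# decompression was the bottleneck; a cheaper codec (gzip) would likely
# lower the time-to-disk even though more bytes get transferred.
if awk -v r="$real" -v u="$user" 'BEGIN { exit !(u / r > 0.8) }'; then
  echo "CPU-bound ($user s CPU over $real s elapsed): prefer gzip next time"
else
  echo "network-bound: keep lzip and its better compression ratio"
fi
--8<---------------cut here---------------end--------------->8---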
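
On option 2, ci.guix.gnu.org only served gzip and lzip nars at that point, but the claimed zstd decompression speed is easy to check locally by recompressing the nar with the zstd tool (assuming it is installed); roughly:

--8<---------------cut here---------------start------------->8---
# Local zstd comparison for option 2: recompress the nar ourselves,
# since the server only offers gzip and lzip, then time decompression.
gunzip -c < /tmp/uc.nar.gz > /tmp/uc.nar
zstd -19 -T0 /tmp/uc.nar -o /tmp/uc.nar.zst   # high level, all cores
du -h /tmp/uc.nar.*                           # size vs. the .gz and .lz nars
time zstd -dc /tmp/uc.nar.zst > /dev/null     # decompression throughput
--8<---------------cut here---------------end--------------->8---

A nice property of zstd here is that its decompression speed stays roughly constant across compression levels, so the server can spend more CPU on compression without slowing clients down.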