From: Ludovic Courtès
To: guix-devel
Subject: When substitute download + decompression is CPU-bound
Date: Mon, 14 Dec 2020 23:20:17 +0100
Message-ID: <87im94qbby.fsf@gnu.org>

Hi Guix!

Consider these two files:

  https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
  https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92

Quick decompression bench:

--8<---------------cut here---------------start------------->8---
$ du -h /tmp/uc.nar.[gl]z
103M	/tmp/uc.nar.gz
71M	/tmp/uc.nar.lz
$ gunzip -c < /tmp/uc.nar.gz | wc -c
350491552
$ time lzip -d < /tmp/uc.nar.lz >/dev/null

real	0m6.040s
user	0m5.950s
sys	0m0.036s
$ time gunzip -c < /tmp/uc.nar.gz >/dev/null

real	0m2.009s
user	0m1.977s
sys	0m0.032s
--8<---------------cut here---------------end--------------->8---

The decompression throughput (compressed bytes read in the first column,
uncompressed bytes written in the second column) is:

          input   |  output
  gzip:  52 MB/s  | 167 MiB/s
  lzip:  11 MB/s  |  56 MiB/s

Indeed, if you run this from a computer on your LAN:

  wget -O - … | gunzip > /dev/null

you’ll find that wget caps at
50 MB/s with gunzip, whereas with lunzip it caps at 11 MB/s.

From my place I get a peak download bandwidth of 30+ MB/s from
ci.guix.gnu.org, thus substitute downloads are CPU-bound (I can’t go
beyond 11 MB/s due to decompression).  I must say it never occurred to
me it could be the case when we introduced lzip substitutes.

I’d get faster substitute downloads with gzip (I would download more,
but the time-to-disk would be smaller).  Specifically, download +
decompression of ungoogled-chromium from the LAN completes in 2.4s for
gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
get 32s (gzip) vs. 53s (lzip).

Where to go from here?  Several options:

  0. Lzip decompression speed increases with compression ratio, but
     we’re already using ‘--best’ on ci.  The only way we could gain is
     by using “multi-member archives” and then parallel decompression as
     done in plzip, but that’s probably not supported in lzlib.  So
     we’re probably stuck here.

  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
     ‘guix substitute’ could automatically pick one or the other
     depending on the CPU and bandwidth.  Perhaps a simple trick would
     be to check the user/wall-clock time ratio and switch to gzip for
     subsequent downloads if that ratio is close to one.  How well would
     that work?

  2. Use Zstd like all the cool kids since it seems to have a much
     higher decompression speed: 630 MB/s on ungoogled-chromium on my
     laptop.  Woow.

  3. Allow for parallel downloads (really: parallel decompression) as
     Julien did.

My preference would be #2, #1, and #3, in this order.  #2 is great but
it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m not
fond of #3 because it just papers over the underlying issue and could be
counterproductive if the number of jobs is wrong.

Thoughts?

Ludo’.
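
PS: A back-of-the-envelope sketch of the reasoning above, in Python.  It
is not code from ‘guix substitute’.  It assumes that download and
decompression overlap perfectly, as in a ‘wget | gunzip’ pipeline, so
the slower stage sets the total time.  The names ‘Nar’,
‘transfer_seconds’ and ‘is_cpu_bound’ and the 0.9 threshold are made up
for illustration, and the sizes are the rounded ‘du -h’ figures from the
bench.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class Nar:
    """One compressed archive, described by the bench figures (hypothetical type)."""
    name: str
    compressed_bytes: int      # bytes that travel over the network
    decompress_seconds: float  # wall-clock time of local decompression


def transfer_seconds(nar: Nar, bandwidth_bytes_per_s: float) -> float:
    """Estimate download + decompression time for a pipelined transfer.

    Assumption: both stages run concurrently, so the total is the
    maximum of the pure download time and the pure decompression time.
    """
    download_seconds = nar.compressed_bytes / bandwidth_bytes_per_s
    return max(download_seconds, nar.decompress_seconds)


def is_cpu_bound(user_seconds: float, wall_seconds: float,
                 threshold: float = 0.9) -> bool:
    """Option 1 heuristic: a user/wall-clock ratio close to one means the
    process was computing nearly the whole time instead of waiting for
    the network.  The 0.9 threshold is an arbitrary illustrative value.
    """
    return wall_seconds > 0 and user_seconds / wall_seconds >= threshold


# Sizes and timings taken from the bench (sizes rounded by 'du -h').
gz = Nar("gzip", 103 * MIB, 2.009)
lz = Nar("lzip", 71 * MIB, 6.040)

if __name__ == "__main__":
    bandwidth = 30e6  # 30 MB/s, the peak download bandwidth quoted above
    for nar, user in ((gz, 1.977), (lz, 5.950)):
        wall = transfer_seconds(nar, bandwidth)
        print(f"{nar.name}: ~{wall:.1f}s, CPU-bound: {is_cpu_bound(user, wall)}")
```

Under these assumptions, at 30 MB/s the estimate is about 3.6s for gzip
(network-bound) and about 6.0s for lzip (CPU-bound), so the ratio check
would switch to gzip.  On a slow link, say 5 MB/s, the smaller lzip
archive wins again, which is why the choice has to depend on both CPU
and bandwidth.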