From: Ludovic Courtès
To: Maxim Cournoyer
Cc: guix-devel
Subject: Re: Profiling of man-db database generation with zlib vs zstd
Date: Tue, 29 Mar 2022 12:22:43 +0200
Message-ID: <87k0cd12j0.fsf@gnu.org>
In-Reply-To: <87czi8vz3n.fsf@gmail.com> (Maxim Cournoyer's message of "Sat, 26 Mar 2022 23:44:12 -0400")
References: <875yo53iuq.fsf@gmail.com> <87ee2r9gms.fsf@gnu.org> <87zgldwg87.fsf@gmail.com> <87czi8vz3n.fsf@gmail.com>

Howdy,

Maxim Cournoyer wrote:

> OK, I've now compiled guix/man-db.scm with -O3, generated a profile with
> ~5300 manuals (guix shell --no-grafts --pure man-db perl@5.14 man-pages
> libx11:doc gnutls-dane:doc my-tcl) on core-updates with my local changes
> to compress man pages with zstd and profiled:
>
>   scheme@(guix man-db)> ,profile (define a (mandb-entries "/gnu/store/jp1kjkz5m116r960gvjk1sj4b0fkb0ip-profile/share/man"))
>   %     cumulative   self
>   time   seconds     seconds  procedure
>    69.37     16.19     16.15  %read-line
>    14.14      3.31      3.29  string-tokenize
>     5.76      1.34      1.34  gdbm.scm:122:11

[...]

> Total time: 23.282459186 seconds (7.318256931 seconds in GC)
>
> It still shows that parsing the files is much more expensive than
> decompressing them. This is also true of zlib-compressed manuals;
> here's the same experiment on master:
>
>   $ guix shell --no-grafts --pure man-db perl@5.14 man-pages libx11:doc gnutls-dane:doc tcl
>   [env]$ echo $GUIX_ENVIRONMENT && exit
>   /gnu/store/qqd7d22wf9d220prkm682yypybpr7df4-profile
>   $ guix shell -D guix guile-gdbm-ffi guile-zstd
>   [env]$ guild compile -O3 guix/man-db.scm
>   `/home/maxim/.cache/guile/ccache/3.0-LE-8-4.6/home/maxim/src/guix-master/guix/man-db.scm.go'
>   [env]$ ./pre-inst-env guix repl
>
>   scheme@(guix-user)> ,m (guix man-db)
>   scheme@(guix man-db)> ,profile (define a (mandb-entries "/gnu/store/qqd7d22wf9d220prkm682yypybpr7df4-profile/share/man"))
>   %     cumulative   self
>   time   seconds     seconds  procedure
>    49.15      2.62      2.56  string-tokenize
>    15.93      0.87      0.83  %read-line

[...]

> Total time: 5.217801086 seconds (1.528583927 seconds in GC)
>
> Hum, OK, so if I understand guile-zstd causes an almost 5x
> slowdown... that doesn't make much sense unless the guile-zstd library
> is much slower itself (zstd-decompression should be about 3.5x faster
> than that of zlib).

Zstd decompression is indeed faster than zlib decompression. It could
be that there are other factors interfering though. For example, since
we decompress only the first few bytes of each file, it could be that
allocation/initialization of the zlib/zstd decompression ports outweighs
the actual decompression cost. It might be possible to visualize that
with a C-level profile, using ‘perf’.

> Supposing guile-zstd was a 1:1 equivalent to guile-zlib, then we should
> be able to see the above ~5 s time shrink down to around 1.5 s (best
> case), which would be nice.
>
> I'm impressed the man-db hook is that fast already though; I seem to
> recall it would take like 30 s (or more?) for such a large amount of man
> pages on this machine.

Are you doing this on a hot cache? I/O would normally dominate that
whole process when running ‘guix install foo’, because in that case
you’d be reading all these man pages for the first time (they’re not in
cache).

HTH,
Ludo’.
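
The hypothesis above, that per-file setup of a decompression stream can
outweigh the cost of inflating only the first few bytes, can be explored
in isolation. What follows is a minimal sketch in Python using only the
standard zlib module. It is an analogy, not a measurement of guile-zlib
or guile-zstd, and it says nothing about zstd. The payload built by
make_page, the 256-byte prefix size, and the helper names are all
hypothetical choices made for illustration.

```python
import time
import zlib


def make_page(n_lines=2000):
    """Build a synthetic, man-page-like payload (made-up data)."""
    header = b".TH EXAMPLE 1\n.SH NAME\nexample \\- a made-up manual page\n"
    body = b"".join(b".PP\nLine %d of filler text.\n" % i for i in range(n_lines))
    return header + body


def read_prefix(compressed, n_bytes=256):
    """Create a fresh decompressor and inflate at most n_bytes of output."""
    decompressor = zlib.decompressobj()
    return decompressor.decompress(compressed, n_bytes)


def time_calls(fn, arg, repeat=5000):
    """Return the average wall-clock time of fn(arg) over `repeat` calls."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(arg)
    return (time.perf_counter() - start) / repeat


compressed = zlib.compress(make_page())

# Cost of allocating and initializing a decompression stream, nothing else.
setup_only = time_calls(lambda _: zlib.decompressobj(), None)
# Cost of setup plus inflating only the start of the file.
prefix = time_calls(read_prefix, compressed)
# Cost of inflating the whole file, for comparison.
full = time_calls(zlib.decompress, compressed, repeat=500)

print(f"setup only:      {setup_only * 1e6:8.2f} us per call")
print(f"setup + prefix:  {prefix * 1e6:8.2f} us per call")
print(f"full decompress: {full * 1e6:8.2f} us per call")
```

If the first two figures come out close to each other, setup accounts
for most of the per-file cost in this toy setting. Whether the same
holds for the Guile decompression ports would still have to be checked
there, for instance with the ‘perf’ profile suggested above.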