From: Maxim Cournoyer
To: guix-devel
Subject: Profiling of man-db database generation with zlib vs zstd
Date: Tue, 22 Mar 2022 15:09:01 -0400
Message-ID: <875yo53iuq.fsf@gmail.com>

Hello Guix,

I've spent some time looking at whether the manual database generation
could be sped up by compressing the manual pages with zstd instead of
zlib; zstd is reported to be about 3.5 times faster to decompress.

I thought decompression would play a role because the hook is
CPU-intensive and thousands of files (depending on your profile) need to
be decompressed in order to be parsed.

But my experiment suggests otherwise: switching from zlib to zstd
currently wouldn't buy us any gain. Profiling the mandb-entries
procedure from (guix man-db) shows something like this (current, zlib
compression):

--8<---------------cut here---------------start------------->8---
scheme@(guix man-db)> ,profile (define a (mandb-entries "/gnu/store/jgc63dxvpd8zq0p8al71x07a02qj8i1b-man-pages-5.13/share/man"))
%     cumulative   self
time   seconds     seconds  procedure
 20.95      1.98      1.75  gdbm.scm:122:11
 20.95      1.75      1.75  string-tokenize
 19.37      3.61      1.62  set-procedure-property!
  6.72      0.56      0.56  ice-9/eval.scm:604:6
  4.35      0.36      0.36  %read-line
  4.35      0.36      0.36  anon #xa8e0b0
  2.77      0.23      0.23  apply-smob/1
  1.58     11.51      0.13  ice-9/eval.scm:292:11
  1.58      0.20      0.13  open-file
  1.58      0.13      0.13  ice-9/eval.scm:333:13
  1.19      0.20      0.10  ice-9/eval.scm:263:9
  1.19      0.13      0.10  ice-9/eval.scm:297:11
  0.79      0.66      0.07  ice-9/eval.scm:159:9
  0.79      0.07      0.07  ice-9/eval.scm:187:12
  0.79      0.07      0.07  ice-9/popen.scm:183:0:reap-pipes
  0.79      0.07      0.07  stat
  0.79      0.07      0.07  ice-9/eval.scm:273:7
  0.79      0.07      0.07  make-custom-binary-input-port
  0.79      0.07      0.07  anon #xa8e078
  0.79      0.07      0.07  ice-9/eval.scm:123:11
  0.40      0.76      0.03  ice-9/rdelim.scm:193:0:read-line
  0.40      0.20      0.03  ice-9/ftw.scm:445:2:loop
  0.40      0.17      0.03  zlib.scm:158:0:make-gzip-input-port
  0.40      0.03      0.03  ice-9/eval.scm:182:7
  0.40      0.03      0.03  ice-9/eval.scm:329:11
  0.40      0.03      0.03  string-trim-both
  0.40      0.03      0.03  ice-9/eval.scm:125:11
  0.40      0.03      0.03  dup->fdes
  0.40      0.03      0.03  ice-9/eval.scm:226:7
  0.40      0.03      0.03  string:11:9
  0.00      3.64      0.00  ice-9/eval.scm:586:29
  0.00      2.05      0.00  %after-gc-thunk
  0.00      2.05      0.00  anon #xa7d750
  0.00      1.98      0.00  ice-9/eval.scm:212:12
  0.00      1.46      0.00  string-map
  0.00      0.36      0.00  %read-line
  0.00      0.36      0.00  zlib.scm:99:4
  0.00      0.20      0.00  ice-9/eval.scm:618:6
  0.00      0.10      0.00  guix/build/utils.scm:476:0:find-files
  0.00      0.10      0.00  ice-9/eval.scm:244:10
  0.00      0.10      0.00  anon #xa7d6dc
  0.00      0.10      0.00  ice-9/eval.scm:625:6
  0.00      0.10      0.00  srfi/srfi-1.scm:452:2:fold
  0.00      0.07      0.00  zlib.scm:87:4
  0.00      0.07      0.00  ice-9/boot-9.scm:1971:6
  0.00      0.03      0.00  sort
  0.00      0.03      0.00  ice-9/eval.scm:263:9
  0.00      0.03      0.00  guix/build/utils.scm:492:28
---
Sample count: 253
Total time: 8.368360886 seconds (5.356567327 seconds in GC)
--8<---------------cut here---------------end--------------->8---

When rebuilding the man-pages package with zstd-compressed manuals and
using guile-zstd to read them, we instead get:

--8<---------------cut here---------------start------------->8---
scheme@(guix man-db)> ,profile (define a (mandb-entries "/gnu/store/dsm4wkzgzq6i00xc765vpgdzb0aq99fa-man-pages-5.13/share/man"))
%     cumulative   self
time   seconds     seconds  procedure
 28.93      2.38      2.34  gdbm.scm:122:11
 18.60      3.82      1.51  set-procedure-property!
 12.81      1.04      1.04  string-tokenize
  7.02      0.57      0.57  ice-9/eval.scm:604:6
  4.55      0.37      0.37  make-bytevector
  4.13      0.33      0.33  %read-line
  3.31      0.27      0.27  anon #x1bb01e0
  1.65      0.17      0.13  ice-9/eval.scm:263:9
  1.65      0.13      0.13  stat
  1.24      0.90      0.10  ice-9/rdelim.scm:193:0:read-line
  1.24      0.10      0.10  ice-9/eval.scm:342:13
  1.24      0.10      0.10  ice-9/eval.scm:125:11
  1.24      0.10      0.10  regexp-exec
  0.83      0.10      0.07  ice-9/eval.scm:155:9
  0.83      0.07      0.07  ice-9/eval.scm:333:13
  0.83      0.07      0.07  pointer->bytevector
  0.83      0.07      0.07  anon #x1bb01a8
  0.83      0.07      0.07  ice-9/eval.scm:187:12
  0.83      0.07      0.07  string:5:9
  0.00      3.82      0.00  ice-9/eval.scm:586:29
  0.00      2.38      0.00  anon #x1b9f6c0
  0.00      2.38      0.00  %after-gc-thunk
  0.00      1.57      0.00  string-map
  0.00      0.57      0.00  ice-9/eval.scm:159:9
  0.00      0.47      0.00  %read-line
  0.00      0.47      0.00  zstd.scm:234:2:read!
  0.00      0.44      0.00  zstd.scm:216:0:make-zstd-input-port
  0.00      0.27      0.00  ice-9/ftw.scm:445:2:loop
  0.00      0.17      0.00  srfi/srfi-1.scm:452:2:fold
  0.00      0.13      0.00  ice-9/boot-9.scm:1971:6
  0.00      0.13      0.00  ice-9/eval.scm:196:12
  0.00      0.13      0.00  ice-9/eval.scm:618:6
  0.00      0.13      0.00  guix/build/utils.scm:487:0:find-files
  0.00      0.13      0.00  anon #x1b9f64c
  0.00      0.13      0.00  ice-9/eval.scm:297:11
  0.00      0.10      0.00  ice-9/eval.scm:244:10
  0.00      0.10      0.00  system/foreign.scm:187:0:parse-c-struct
  0.00      0.10      0.00  guix/build/utils.scm:503:28
  0.00      0.07      0.00  ice-9/eval.scm:278:9
  0.00      0.07      0.00  zstd.scm:208:4
  0.00      0.07      0.00  filter
  0.00      0.07      0.00  ice-9/eval.scm:625:6
  0.00      0.07      0.00  sort
  0.00      0.07      0.00  ice-9/eval.scm:259:9
  0.00      0.03      0.00  open-file
  0.00      0.03      0.00  close-port
  0.00      0.03      0.00  ice-9/eval.scm:191:12
  0.00      0.03      0.00  bytevector->pointer
---
Sample count: 242
Total time: 8.104984533 seconds (5.253193735 seconds in GC)
--8<---------------cut here---------------end--------------->8---

So it seems that parsing the manual files in Guile (to retrieve their
name, synopsis, etc.) is what uses most of the time (CPU-bound).
Perhaps this can be optimized?

Maxim
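
A quick cross-check of the two "Total time" lines may help when reading
the profiles: in both runs roughly two thirds of the wall time is
reported as GC, and the zlib and zstd totals differ by only a few
percent. The sketch below just redoes that arithmetic in Guile. The
numbers are copied from the profiler summaries above; the names `runs`
and `gc-share` are made up for illustration and are not part of
(guix man-db).

```scheme
(use-modules (ice-9 format))   ; full `format' with ~,Nf directives

;; (name total-seconds gc-seconds), copied from the two profiles.
(define runs
  '((zlib 8.368360886 5.356567327)
    (zstd 8.104984533 5.253193735)))

;; Percentage of the total wall time that was spent in GC.
(define (gc-share total gc)
  (* 100 (/ gc total)))

;; Print the GC share and the time left outside GC for each run.
(for-each
 (lambda (run)
   (let ((name  (car run))
         (total (cadr run))
         (gc    (caddr run)))
     (format #t "~a: ~,1f% in GC, ~,2f s outside GC~%"
             name (gc-share total gc) (- total gc))))
 runs)
```

If that reading is right, reducing allocation in the parsing code might
matter more than the choice of compression format, but that is only an
assumption drawn from these two runs.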