From: Maxim Cournoyer
To: Ludovic Courtès
Cc: guix-devel
Subject: Re: Profiling of man-db database generation with zlib vs zstd
Date: Wed, 30 Mar 2022 10:49:06 -0400
Message-ID: <875ynvv6l9.fsf@gmail.com>
In-Reply-To: <87czi5126h.fsf@gnu.org>
References: <875yo53iuq.fsf@gmail.com> <87ee2r9gms.fsf@gnu.org> <87o81qviqg.fsf@gmail.com> <87czi5126h.fsf@gnu.org>
List-Id: "Development of GNU Guix and the GNU System distribution."

Hi Ludovic,

Ludovic Courtès writes:

[...]

> To isolate the problem, you could allocate the 4 MiB buffer outside of
> the loop and use ‘get-bytevector-n!’, and also remove code that writes
> to ‘output’.

I've adjusted the benchmark like so:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 binary-ports)
             (ice-9 match)
             (rnrs bytevectors)
             (zstd))

(define MiB (expt 2 20))
(define block-size (* 4 MiB))
;; Allocate the 4 MiB read buffer once, outside the loop.
(define bv (make-bytevector block-size))
(define input-file "/tmp/chromium-98.0.4758.102.tar.zst")

;; Decompress the whole file, discarding the output.
(define (run)
  (call-with-input-file input-file
    (lambda (port)
      (call-with-zstd-input-port port
        (lambda (input)
          (while (not (eof-object?
                       (get-bytevector-n! input bv 0 block-size)))))))))

(run)
--8<---------------cut here---------------end--------------->8---

It now runs much faster:

--8<---------------cut here---------------start------------->8---
$ time+ zstd -cdk /tmp/chromium-98.0.4758.102.tar.zst > /dev/null
cpu: 98%, mem: 10560 KiB, wall: 0:09.56, sys: 0.37, usr: 9.06
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ guile ~/src/guile-zstd/benchmark.scm
cpu: 100%, mem: 25152 KiB, wall: 0:11.69, sys: 0.38, usr: 11.30
--8<---------------cut here---------------end--------------->8---

So guile-zstd was about 20% slower than the zstd command (11.69 s
vs. 9.56 s wall time), which is not too far off.
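
As a sanity check on the "about 20%" figure, here is a minimal sketch
that computes the relative slowdown from the two wall-clock times
above.  The helper name 'percent-slower' is made up for illustration;
it is not part of guile-zstd.

```scheme
;; Wall-clock times in seconds, copied from the two time+ runs above.
(define zstd-cli-wall 9.56)
(define guile-zstd-wall 11.69)

;; Hypothetical helper: how much longer MEASURED took than BASELINE,
;; expressed as a percentage of BASELINE.
(define (percent-slower baseline measured)
  (* 100 (/ (- measured baseline) baseline)))

(display (percent-slower zstd-cli-wall guile-zstd-wall))
(newline)
```

This prints a value a little above 22, so "about 20%" is a fair
rounding of the gap between guile-zstd and the zstd command.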

For completeness, here's the same benchmark adjusted for guile-zlib:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 binary-ports)
             (ice-9 match)
             (rnrs bytevectors)
             (zlib))

(define MiB (expt 2 20))
(define block-size (* 4 MiB))
;; Allocate the 4 MiB read buffer once, outside the loop.
(define bv (make-bytevector block-size))
(define input-file "/tmp/chromium-98.0.4758.102.tar.gz")

;; Decompress the whole file, discarding the output.
(define (run)
  (call-with-input-file input-file
    (lambda (port)
      (call-with-gzip-input-port port
        (lambda (input)
          (while (not (eof-object?
                       (get-bytevector-n! input bv 0 block-size)))))))))

(run)
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ guile ~/src/guile-zstd/benchmark-zlib.scm
cpu: 86%, mem: 14552 KiB, wall: 0:23.50, sys: 1.09, usr: 19.15
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ gunzip -ck /tmp/chromium-98.0.4758.102.tar.gz > /dev/null
cpu: 98%, mem: 2304 KiB, wall: 0:35.99, sys: 0.60, usr: 34.99
--8<---------------cut here---------------end--------------->8---

Surprisingly, guile-zlib appears to be faster here than the 'gunzip'
command (23.50 s vs. 35.99 s wall time).  guile-zstd is in turn about
twice as fast as guile-zlib at decompressing this 4-GiB-something
archive (compressed with zstd at level 19).

So, it seems the foundation we're building on is sane after all.  This
suggests that compression is not the bottleneck when generating the man
pages database, probably because it only needs to read the first few
bytes of each compressed man page to gather the information it needs.
The rest of the work, such as string-tokenize'ing the lines read to
parse the data, is likely more expensive by comparison.

To be continued...

Thanks!

Maxim