From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel <guix-devel@gnu.org>
Subject: Re: Profiling of man-db database generation with zlib vs zstd
Date: Wed, 30 Mar 2022 10:49:06 -0400
Message-ID: <875ynvv6l9.fsf@gmail.com>
In-Reply-To: <87czi5126h.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 29 Mar 2022 12:30:14 +0200")

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> To isolate the problem, you could allocate the 4 MiB buffer outside of
> the loop and use ‘get-bytevector-n!’, and also remove code that writes
> to ‘output’.

I've adjusted the benchmark like so:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 binary-ports)
             (ice-9 match)
             (rnrs bytevectors)
             (zstd))

(define MiB (expt 2 20))
(define block-size (* 4 MiB))
;; Allocate the read buffer once, outside of the loop.
(define bv (make-bytevector block-size))
(define input-file "/tmp/chromium-98.0.4758.102.tar.zst")

(define (run)
  (call-with-input-file input-file
    (lambda (port)
      (call-with-zstd-input-port port
        (lambda (input)
          ;; Read and discard decompressed data until EOF; nothing is
          ;; written to an output port.
          (while (not (eof-object?
                       (get-bytevector-n! input bv 0 block-size)))))))))

(run)
--8<---------------cut here---------------end--------------->8---

It now runs much faster:

--8<---------------cut here---------------start------------->8---
$ time+ zstd -cdk /tmp/chromium-98.0.4758.102.tar.zst > /dev/null
cpu: 98%, mem: 10560 KiB, wall: 0:09.56, sys: 0.37, usr: 9.06
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ guile ~/src/guile-zstd/benchmark.scm
cpu: 100%, mem: 25152 KiB, wall: 0:11.69, sys: 0.38, usr: 11.30
--8<---------------cut here---------------end--------------->8---

So guile-zstd was about 20% slower than the zstd command (11.69 s
versus 9.56 s of wall time), which is not too far off.

For completeness, here's the same benchmark adjusted for guile-zlib:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 binary-ports)
             (ice-9 match)
             (rnrs bytevectors)
             (zlib))

(define MiB (expt 2 20))
(define block-size (* 4 MiB))
;; Same setup as the zstd benchmark: one buffer, reused for every read.
(define bv (make-bytevector block-size))
(define input-file "/tmp/chromium-98.0.4758.102.tar.gz")

(define (run)
  (call-with-input-file input-file
    (lambda (port)
      (call-with-gzip-input-port port
        (lambda (input)
          ;; Read and discard decompressed data until EOF.
          (while (not (eof-object?
                       (get-bytevector-n! input bv 0 block-size)))))))))

(run)
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ guile ~/src/guile-zstd/benchmark-zlib.scm
cpu: 86%, mem: 14552 KiB, wall: 0:23.50, sys: 1.09, usr: 19.15
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ gunzip -ck /tmp/chromium-98.0.4758.102.tar.gz > /dev/null
cpu: 98%, mem: 2304 KiB, wall: 0:35.99, sys: 0.60, usr: 34.99
--8<---------------cut here---------------end--------------->8---

Surprisingly, guile-zlib appears to be faster here than the 'gunzip'
command.  guile-zstd is in turn about twice as fast as guile-zlib at
decompressing this archive of a bit over 4 GiB (the .zst file was
compressed with zstd at level 19).
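
For reference, the ratios quoted above can be recomputed from the wall
times reported by time+.  This is only a small sketch; the variable
names and the 'ratio' helper are mine and are not part of either
benchmark script.

```scheme
;; Wall-clock times in seconds, copied from the time+ output above.
(define zstd-cli   9.56)    ; zstd -cdk
(define guile-zstd 11.69)   ; guile-zstd benchmark
(define guile-zlib 23.50)   ; guile-zlib benchmark
(define gunzip-cli 35.99)   ; gunzip -ck

;; Hypothetical helper, for illustration only.
(define (ratio a b)
  "Return A divided by B, rounded to two decimal places."
  (/ (round (* 100 (/ a b))) 100))

(format #t "guile-zstd / zstd:       ~a~%" (ratio guile-zstd zstd-cli))
(format #t "guile-zlib / guile-zstd: ~a~%" (ratio guile-zlib guile-zstd))
(format #t "guile-zlib / gunzip:     ~a~%" (ratio guile-zlib gunzip-cli))
```

The first ratio is where the "about 20% slower" comes from, the second
one is the "about twice as fast", and the third one being below 1 is
the surprising part.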

So, it seems the foundation we're building on is sane after all.  This
suggests that decompression is not the bottleneck when generating the
man pages database, probably because it only needs to read the first
few bytes of each compressed man page to gather the information it
needs.  The rest of the work, such as string-tokenize'ing the lines
read to parse the data, is likely more expensive in comparison.
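
One way to get a rough feel for the string-tokenize cost in isolation
would be a micro-benchmark along these lines.  It is a sketch under
assumptions: the sample line is synthetic, the helper names are made
up, and it does not exercise the actual database code, so the numbers
would only be indicative.

```scheme
;; Hypothetical input: a synthetic line standing in for text read from
;; a man page.  The real inputs are not shown in this message.
(define sample-line
  ".TH EXAMPLE 1 \"2022-03-30\" \"example 1.0\" \"User Commands\"")

(define (time-thunk thunk)
  "Call THUNK and return the elapsed wall-clock time in seconds."
  (let ((start (get-internal-real-time)))
    (thunk)
    (exact->inexact
     (/ (- (get-internal-real-time) start)
        internal-time-units-per-second))))

(define (tokenize-many n)
  "Tokenize SAMPLE-LINE N times; return the token count of the last call."
  (let loop ((i 0) (count 0))
    (if (= i n)
        count
        ;; With no token set given, string-tokenize splits on
        ;; non-graphic characters such as whitespace.
        (loop (+ i 1) (length (string-tokenize sample-line))))))

(define elapsed
  (time-thunk (lambda () (tokenize-many 100000))))

(format #t "100000 calls to string-tokenize: ~a s~%" elapsed)
```

Comparing that figure with the time needed to read a single small
block from a compressed port would show which side dominates.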

To be continued...

Thanks!

Maxim

