all messages for Guix-related lists mirrored at yhetil.org
From: "Ludovic Courtès" <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: guix-devel <guix-devel@gnu.org>
Subject: Re: Profiling of man-db database generation with zlib vs zstd
Date: Tue, 29 Mar 2022 12:30:14 +0200
Message-ID: <87czi5126h.fsf@gnu.org>
In-Reply-To: <87o81qviqg.fsf@gmail.com> (Maxim Cournoyer's message of "Sun, 27 Mar 2022 23:49:59 -0400")

Hi!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> You'll need to generate the tar.zst and tar.gz yourself, but the script
> that was used is:
>
> ;; decompress-zstd.scm
> (use-modules (ice-9 binary-ports)
>              (ice-9 match)
>              (statprof)
>              (zstd))
>
> (define MiB (expt 2 20))
> (define input-file "/tmp/chromium-98.0.4758.102.tar.zst")
> (define output-file "/dev/null")
>
> (define (decompression-test)
>   (call-with-input-file input-file
>     (lambda (port)
>       (call-with-zstd-input-port port
>         (lambda (input)
>           (call-with-output-file output-file
>             (lambda (output)
>               (let loop ((bv (get-bytevector-n input (* 4 MiB))))
>                 (match bv
>                   ((? eof-object?)
>                    #t)
>                   (bv
>                    (put-bytevector output bv)
>                    (loop (get-bytevector-n input (* 4 MiB)))))))))))))

To isolate the problem, you could allocate the 4 MiB buffer once,
outside of the loop, and fill it with ‘get-bytevector-n!’; you could
also remove the code that writes to ‘output’, so that only decompression
is being measured.
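
Something along these lines, as an untested sketch: ‘drain-port’ is a
hypothetical helper (not part of guile-zstd) that reads a port to the
end into one preallocated buffer and throws the data away, so what is
left to time is essentially the decompression itself.

```scheme
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

(define MiB (expt 2 20))

;; Hypothetical helper: read PORT until EOF into a single buffer that is
;; allocated once, up front, and discard the data.  Nothing is written
;; anywhere.  Returns the total number of bytes read.
(define* (drain-port port #:key (buffer-size (* 4 MiB)))
  (let ((buffer (make-bytevector buffer-size)))
    (let loop ((total 0))
      ;; get-bytevector-n! returns a byte count, or the EOF object.
      (let ((n (get-bytevector-n! port buffer 0 buffer-size)))
        (if (eof-object? n)
            total
            (loop (+ total n)))))))

;; Example use with guile-zstd, reusing the file name from the quoted
;; script (commented out so this runs without guile-zstd):
;; (call-with-input-file "/tmp/chromium-98.0.4758.102.tar.zst"
;;   (lambda (port)
;;     (call-with-zstd-input-port port drain-port)))
```

Doing the same with guile-zlib on the .tar.gz would then give directly
comparable numbers.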

> This confirms that guile-zstd is not noticeably faster than guile-zlib,
> which is unexpected.

Uh, surprising.

Note that ‘statprof’ incurs overhead, so in general if you want timings,
get them without ‘statprof’.
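
For plain timings, a sketch like this should do; ‘time-thunk’ is a
made-up name, and it only relies on Guile’s core
‘get-internal-real-time’ and ‘internal-time-units-per-second’:

```scheme
;; Hypothetical helper: call THUNK once and return the elapsed
;; wall-clock time in seconds, with no profiler involved.
(define (time-thunk thunk)
  (let ((start (get-internal-real-time)))
    (thunk)
    (exact->inexact
     (/ (- (get-internal-real-time) start)
        internal-time-units-per-second))))

;; Example, with the procedure from the quoted script:
;;   (time-thunk decompression-test)
;; Replace get-internal-real-time with get-internal-run-time in the
;; procedure above to measure CPU time instead of wall-clock time.
```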

> Compare to the command line tools:
>
> $ time+ zstd -cdk /tmp/chromium-98.0.4758.102.tar.zst > /dev/null
> cpu: 99%, mem: 10548 KiB, wall: 0:09.37, sys: 0.30, usr: 9.05
>
> $ time+ gunzip -ck /tmp/chromium-98.0.4758.102.tar.gz > /dev/null
> cpu: 99%, mem: 2908 KiB, wall: 0:22.29, sys: 0.31, usr: 21.98
>
> where zstd is about 2.3x faster.
>
> It's unfortunate that the bulk of the time is spent in "anon" (anonymous
> proc?), which doesn't say much.

It’s likely one of the lambdas.
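
One way to find out, assuming statprof reports procedures that have no
name as “anon”: turn the inner lambdas into named procedures, so that
samples get attributed to them by name.  For instance, the copy loop
could become the following (‘copy-chunks’ is a made-up name):

```scheme
(use-modules (ice-9 binary-ports))

(define chunk-size (* 4 (expt 2 20)))   ;4 MiB, as in the quoted script

;; The copy loop from the quoted script, lifted into a named top-level
;; procedure so that a profiler can report it as `copy-chunks' rather
;; than as an anonymous procedure.
(define (copy-chunks input output)
  (let loop ((bv (get-bytevector-n input chunk-size)))
    (unless (eof-object? bv)
      (put-bytevector output bv)
      (loop (get-bytevector-n input chunk-size)))))
```

The innermost lambda of the script then reduces to
‘(lambda (output) (copy-chunks input output))’, and time spent copying
should show up under ‘copy-chunks’ and its callees.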

> Perhaps I should open an issue with the guile-zstd project.

Yes, or we can continue here.  :-)

From there I think we should first fully isolate the thing we’re
measuring, as discussed above, to gain confidence.

If the code using guile-zstd is slower than the CLI, then it could be
that guile-zstd doesn’t initialize the library properly, or that it gets
buffering wrong or something.

I’ll see if I can give it a try too.

Thanks for investigating!

Ludo’.



Thread overview: 10+ messages
2022-03-22 19:09 Profiling of man-db database generation with zlib vs zstd Maxim Cournoyer
2022-03-24 21:37 ` Ludovic Courtès
2022-03-26  3:22   ` Maxim Cournoyer
2022-03-27  3:44     ` Maxim Cournoyer
2022-03-29 10:22       ` Ludovic Courtès
2022-03-28  3:49   ` Maxim Cournoyer
2022-03-29 10:30     ` Ludovic Courtès [this message]
2022-03-30 14:49       ` Maxim Cournoyer
2022-03-30 16:16       ` Jonathan McHugh
2022-03-31 17:13     ` Ludovic Courtès
