unofficial mirror of guix-devel@gnu.org 
From: "Ludovic Courtès" <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: guix-devel <guix-devel@gnu.org>
Subject: Re: Profiling of man-db database generation with zlib vs zstd
Date: Tue, 29 Mar 2022 12:22:43 +0200
Message-ID: <87k0cd12j0.fsf@gnu.org>
In-Reply-To: <87czi8vz3n.fsf@gmail.com> (Maxim Cournoyer's message of "Sat, 26 Mar 2022 23:44:12 -0400")

Howdy,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> OK, I've now compiled guix/man-db.scm with -O3, generated a profile with
> ~5300 manuals (guix shell --no-grafts --pure man-db perl@5.14 man-pages
> libx11:doc gnutls-dane:doc my-tcl) on core-updates with my local changes
> to compress man pages with zstd and profiled:
>
> scheme@(guix man-db)> ,profile (define a (mandb-entries "/gnu/store/jp1kjkz5m116r960gvjk1sj4b0fkb0ip-profile/share/man"))
> %     cumulative   self
> time   seconds     seconds  procedure
>  69.37     16.19     16.15  %read-line
>  14.14      3.31      3.29  string-tokenize
>   5.76      1.34      1.34  gdbm.scm:122:11

[...]

> Total time: 23.282459186 seconds (7.318256931 seconds in GC)
>
>
> It still shows that parsing the files is much more expensive than
> decompressing them.  This is also true of zlib-compressed manuals;
> here's the same experiment on master:
>
> $ guix shell --no-grafts --pure man-db perl@5.14 man-pages libx11:doc gnutls-dane:doc tcl
> [env]$ echo $GUIX_ENVIRONMENT && exit
> /gnu/store/qqd7d22wf9d220prkm682yypybpr7df4-profile
> $ guix shell -D guix guile-gdbm-ffi guile-zstd
> [env]$ guild compile -O3 guix/man-db.scm
> `/home/maxim/.cache/guile/ccache/3.0-LE-8-4.6/home/maxim/src/guix-master/guix/man-db.scm.go'
> [env]$ ./pre-inst-env guix repl
>
> scheme@(guix-user)> ,m (guix man-db)
> scheme@(guix man-db)> ,profile (define a (mandb-entries "/gnu/store/qqd7d22wf9d220prkm682yypybpr7df4-profile/share/man"))
> %     cumulative   self
> time   seconds     seconds  procedure
>  49.15      2.62      2.56  string-tokenize
>  15.93      0.87      0.83  %read-line

[...]

> Total time: 5.217801086 seconds (1.528583927 seconds in GC)
>
> Hum, OK, so if I understand guile-zstd causes an almost 5x
> slowdown... that doesn't make much sense unless the guile-zstd library
> is much slower itself (zstd-decompression should be about 3.5x faster
> than that of zlib).

Zstd decompression is indeed faster than zlib decompression.  Other
factors could be interfering here, though.

For example, since we decompress only the first few bytes of each file,
it could be that allocation/initialization of the zlib/zstd
decompression ports outweighs the actual decompression cost.  It might
be possible to visualize that with a C-level profile, using ‘perf’.
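
As a rough sketch of what such a ‘perf’ run could look like (this
assumes a Guix checkout built in place so that ./pre-inst-env works,
‘perf’ being installed, and ‘mandb-entries’ being exported by
(guix man-db); the helper names ‘mandb_expr’ and ‘profile_mandb’ are
made up for illustration):

```shell
# Print the Scheme expression that builds the man-db entries for the
# directory given as $1.  Assumes (guix man-db) exports 'mandb-entries'.
mandb_expr() {
  printf '((@ (guix man-db) mandb-entries) "%s")' "$1"
}

# Record a call-graph profile of that expression under 'perf'.
# $1: man directory; $2: perf data file (defaults to perf.data).
profile_mandb() {
  perf record -g -o "${2:-perf.data}" -- \
    ./pre-inst-env guile -c "$(mandb_expr "$1")"
}
```

Calling ‘profile_mandb’ on the profile’s share/man directory and then
running ‘perf report -i perf.data --sort dso,symbol’ groups samples by
shared object, which should give a hint as to whether time goes to the
decompression libraries proper or to the setup code around them.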

> Supposing guile-zstd was a 1:1 equivalent to guile-zlib, then we should
> be able to see the above ~5 s time shrink down to around 1.5 s (best
> case), which would be nice.
>
> I'm impressed the man-db hook is that fast already though; I seem to
> recall it would take like 30 s (or more?) for such a large amount of man
> pages on this machine.

Are you doing this on a hot cache?  I/O would normally dominate the
whole process when running ‘guix install foo’, because in that case
you’d be reading all these man pages for the first time (they’re not
yet in the page cache).
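
One way to compare the two situations, as a sketch only: it is
Linux-specific, needs root to drop caches, and again assumes an
in-place Guix checkout with ‘mandb-entries’ exported by (guix man-db).
The function names are hypothetical:

```shell
# Flush dirty pages, then ask the kernel to drop the page cache along
# with dentries and inodes.  Linux-specific; needs root (sudo here).
drop_page_cache() {
  sync
  echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
}

# Time one pass over the man directory given as $1.
time_mandb() {
  time ./pre-inst-env guile -c "((@ (guix man-db) mandb-entries) \"$1\")"
}
```

Running ‘drop_page_cache’ followed by ‘time_mandb DIR’ gives a
cold-cache figure, and running ‘time_mandb DIR’ again right after gives
a hot-cache one.  Note that the cold figure also includes reloading
Guile and its .go files from disk, so it overstates the cost of reading
the man pages alone.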

HTH,
Ludo’.

