unofficial mirror of guix-patches@gnu.org 
From: Arne Babenhauserheide <arne_bab@web.de>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 36630@debbugs.gnu.org
Subject: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Tue, 16 Jul 2019 01:32:46 +0200
Message-ID: <877e8ig9gh.fsf@web.de>
In-Reply-To: <878sszl1jo.fsf@gnu.org>


Hi Ludo’,

Ludovic Courtès <ludo@gnu.org> writes:
>> * guix/profiles.scm (manual-database): par-map over the entries.  This
>> distributes the load roughly equally over all cores and avoids blocking on
>> I/O.  The order of the entries stays the same since write-mandb-database sorts
>> them.
>
> I would think the whole process is largely I/O-bound.  Did you try
> measuring differences?

I did not measure the difference in build time, but I did check the
system load. Without this patch, one of my cores is under full
load. With this patch, all 12 hyperthreads have a mean load of 50%.

> I picked the manual-database derivation returned for:
>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
> (It has 3,046 entries.)

How exactly did you run the derivation? I’d like to check this myself if
you can give me the exact command line (a command I can run repeatedly).


> On a SSD and with a hot cache, on my 4-core laptop, I get 74s with
> ‘master’, and 53s with this patch.

I’m using a machine with 6 physical cores, hyperthreading, and an NVMe
M.2 disk, so it is likely that it would not be disk-bound for me at 4
threads.

> However, it will definitely not scale linearly, so we should probably
> cap at 2 or 4 threads.  WDYT?

Looking at the underlying action, this seems to be a task that scales
pretty well. It just unpacks files into the disk-cache.

It should also not consume much memory, so I don’t see a reason to
artificially limit the number of threads.
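
To make the two options concrete, here is a rough sketch in Python (not
the Guix code; capped_map and fake_entry are hypothetical names made up
for this illustration). It maps a function over a list either with one
worker per core or with an explicit cap. In Guile, n-par-map from
(ice-9 threads) plays the role of the capped variant.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def capped_map(func, items, max_workers=None):
    """Map func over items with at most max_workers threads.

    With max_workers=None, one worker per reported CPU is used.
    Results come back in input order, as with a plain map.
    """
    if max_workers is None:
        max_workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def fake_entry(name):
    # Hypothetical stand-in for the per-page work.
    return name.upper()


pages = ["ls.1", "cat.1", "grep.1"]
uncapped = capped_map(fake_entry, pages)              # one worker per core
capped = capped_map(fake_entry, pages, max_workers=2) # capped at 2 workers
```

Both calls return the same list; only the number of worker threads
differs, so a cap can be added or dropped without touching the callers.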

> Another issue with the patch is that the [n/total] counter does not grow
> monotonically now: it might temporarily go backwards.  Consequently, at
> -v1, users will see a progress bar that hesitates and occasionally goes
> backward, which isn’t great.

It typically jumps forward in the beginning and then stalls until the
first manual page is finished.

Since par-map uses a global queue of futures to process, and since the
output is the first part of (compute-entry …), I don’t expect the
progress to move backwards in ways a user sees: It could only move
backwards during the initial step where all threads start at the same
time, and there the initial output should be overwritten fast enough to
not be noticeable.
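
The distinction between result order and progress order can be sketched
as follows. This is a Python analogy, not the Guix code: build_all and
the inner compute_entry are hypothetical stand-ins, and the list
`started` merely records where an "[n/total]" message would be printed.

```python
from concurrent.futures import ThreadPoolExecutor


def build_all(entries, workers=4):
    """Process entries in parallel and record the order of progress output."""
    started = []  # indices in the order the workers announced them

    def compute_entry(indexed):
        index, name = indexed
        # Progress output is the first step of each task, so its order
        # depends on thread scheduling (list.append is atomic in CPython).
        started.append(index)
        return (name, index)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever the scheduling.
        results = list(pool.map(compute_entry, enumerate(entries, start=1)))
    return results, started


results, started = build_all(["a.1", "b.1", "c.1", "d.1"])
```

Here `results` always matches the input order, while `started` contains
every index exactly once but is not guaranteed to be sorted.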

> This would need to fix it with a mutex-protected global counter.

A global counter would be pretty bad for scaling. As it is, this code
needs no communication between threads besides returning the final
result, so it behaves exactly like a normal map, aside from being
faster. So I’d prefer to accept the forward-jumping.
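
For comparison, the mutex-protected counter suggested in the quoted
text could look roughly like this. Again a Python sketch with made-up
names (ProgressCounter, build_with_counter), not the Guix
implementation; the list `reported` stands in for printing "[n/total]".

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ProgressCounter:
    """Shared counter whose reported values only ever increase."""

    def __init__(self, total):
        self.total = total
        self._done = 0
        self._lock = threading.Lock()
        self.reported = []  # stands in for the printed progress lines

    def tick(self):
        # Increment and report under one lock, so no thread can report
        # a smaller number after a larger one has been reported.
        with self._lock:
            self._done += 1
            self.reported.append(self._done)
            return self._done


def build_with_counter(entries, workers=4):
    counter = ProgressCounter(len(entries))

    def compute_entry(name):
        result = name.upper()  # hypothetical stand-in for the real work
        counter.tick()         # the only point where threads synchronize
        return result

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(compute_entry, entries))
    return results, counter.reported


results, reported = build_with_counter(["ls.1", "cat.1", "grep.1"])
```

The lock is held only for the increment and the report, so each task
synchronizes once. How much that costs compared to the lock-free version
is not measured here.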

> All in all, I’m not sure this is worth the complexity.
>
> WDYT?

Given that building manual pages is the most time-consuming part when
installing a small tool into my profile, I think it is worth the
complexity, especially because most of the complexity is taken care of
by par-map from (ice-9 threads).

Best wishes,
Arne
--
To be apolitical
means to be political
without noticing it



