From: "Ludovic Courtès" <ludo@gnu.org>
To: Arne Babenhauserheide <arne_bab@web.de>
Cc: 36630@debbugs.gnu.org
Subject: [bug#36630] [PATCH] guix: parallelize building the manual-database
Date: Tue, 16 Jul 2019 23:14:48 +0200
Message-ID: <87zhldel6f.fsf@gnu.org>
In-Reply-To: <877e8ig9gh.fsf@web.de> (Arne Babenhauserheide's message of "Tue, 16 Jul 2019 01:32:46 +0200")
Hello,

Arne Babenhauserheide <arne_bab@web.de> wrote:
> Ludovic Courtès <ludo@gnu.org> writes:
[...]
>> I picked the manual-database derivation returned for:
>> guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
>> (It has 3,046 entries.)
>
> How exactly did you run the derivation? I’d like to check it if you can
> give me the exact command line to run (a command I can run repeatedly).
If you run the command above, it’ll list
/gnu/store/…-manual-database.drv, so you can just run:

  guix build /gnu/store/…-manual-database.drv

or, if it has already been built:

  guix build /gnu/store/…-manual-database.drv --check
>> On an SSD and with a hot cache, on my 4-core laptop, I get 74s with
>> ‘master’, and 53s with this patch.
>
> I’m using a machine with 6 physical cores, hyperthreading, and an NVMe
> M.2 disk, so it is likely that it would not be disk-bound for me at 4
> threads.
The result may be entirely different with a spinning disk. :-)
I’m not saying we should optimize for spinning disks, just that what you
see is at one end of the spectrum.
>> However, it will definitely not scale linearly, so we should probably
>> cap at 2 or 4 threads. WDYT?
>
> Looking at the underlying action, this seems to be a task that scales
> pretty well. It just unpacks files into the disk-cache.
>
> It should also not consume much memory, so I don’t see a reason to
> artificially limit the number of threads.
On a many-core machine like we have in our build farm, with spinning
disks, I believe that using one thread per core would be
counterproductive.
>> Another issue with the patch is that the [n/total] counter does not grow
>> monotonically now: it might temporarily go backwards. Consequently, at
>> -v1, users will see a progress bar that hesitates and occasionally goes
>> backward, which isn’t great.
>
> It typically jumps forward in the beginning and then stalls until the
> first manual page is finished.
>
> Since par-map uses a global queue of futures to process, and since the
> output is the first part of (compute-entry …), I don’t expect the
> progress to move backwards in ways a user sees: It could only move
> backwards during the initial step where all threads start at the same
> time, and there the initial output should be overwritten fast enough to
> not be noticeable.
Hmm, maybe. I’m sure we’ll get reports saying this looks weird and
Something Must Absolutely Be Done About It. :-)
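A minimal Guile sketch of one way to keep the [n/total] counter monotonic, assuming the count is taken when a page finishes rather than when it starts: the increment and the report happen while holding a single mutex, so the reported values can only grow. `process-page` and `process-with-progress` are hypothetical names, not code from the patch; `n-par-for-each` and `with-mutex` come from (ice-9 threads).

```scheme
(use-modules (ice-9 threads))   ; n-par-for-each, with-mutex

;; Hypothetical stand-in for the per-page work (compute-entry in the patch).
(define (process-page page)
  (string-length page))

;; Process PAGES on at most THREADS threads, reporting "[n/total]" after
;; each page.  The counter is incremented and reported while holding LOCK,
;; so the reported values are strictly increasing.  Returns the reported
;; values in the order they were emitted.
(define (process-with-progress pages threads)
  (let ((total    (length pages))
        (done     0)
        (reported '())
        (lock     (make-mutex)))
    (n-par-for-each
     threads
     (lambda (page)
       (process-page page)              ; the actual work runs unlocked
       (with-mutex lock
         (set! done (+ done 1))
         (set! reported (cons done reported))
         (display (string-append "[" (number->string done) "/"
                                 (number->string total) "]\n"))))
     pages)
    (reverse reported)))
```

The trade-off is that the counter reports completed pages, so it may pause while a slow page is processed, but it never moves backwards.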
But anyway, another issue is that we would need to honor
‘parallel-job-count’, which means using ‘n-par-map’, which doesn’t use
futures.
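A minimal sketch of honoring a job count with `n-par-map` while also capping the number of threads. `job-count` is a hypothetical stand-in for Guix's `parallel-job-count`, `%max-threads` is an assumed cap of 4 taken from the suggestion above, and `compute-entry*` is a hypothetical stand-in for `compute-entry`.

```scheme
(use-modules (ice-9 threads))   ; n-par-map

;; Hypothetical stand-in for Guix's parallel-job-count parameter; here it
;; simply defaults to the number of usable cores.
(define job-count
  (make-parameter (current-processor-count)))

;; Assumed cap: beyond a few threads the work is likely to be disk-bound.
(define %max-threads 4)

;; Honor the job count, but never exceed the cap or go below one thread.
(define (thread-count)
  (max 1 (min (job-count) %max-threads)))

;; Hypothetical stand-in for compute-entry.
(define (compute-entry* page)
  (string-upcase page))

(define (compute-entries pages)
  ;; Unlike par-map, n-par-map takes an explicit thread count; like map,
  ;; it returns the results in the same order as PAGES.
  (n-par-map (thread-count) compute-entry* pages))
```

With `(parameterize ((job-count 16)) ...)` this runs 4 threads, and with a job count of 1 it degrades to a sequential map.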
> Given that building manual pages is the most time-consuming part when
> installing a small tool into my profile, I think it is worth the
> complexity. Especially because most of the complexity is being taken
> care of by (ice-9 threads par-map).
Just today I realized that the example above (with Jupyter) has so many
entries because of propagated inputs; in particular, libxext alone brings
1,000+ man pages. We should definitely do something about these
packages.
Needs more thought…
Thanks,
Ludo’.
Thread overview: 14+ messages
2019-07-12 21:42 [bug#36630] [PATCH] guix: parallelize building the manual-database Arne Babenhauserheide
2019-07-15 16:12 ` Ludovic Courtès
2019-07-15 23:32 ` Arne Babenhauserheide
2019-07-16 21:14 ` Ludovic Courtès [this message]
2019-07-17 22:06 ` Arne Babenhauserheide
2019-07-18 8:55 ` Ludovic Courtès
2019-07-18 8:57 ` Ludovic Courtès
2019-07-18 10:59 ` Arne Babenhauserheide
2019-07-18 13:46 ` Ludovic Courtès
2019-07-18 20:03 ` Arne Babenhauserheide
2019-10-23 20:01 ` Arne Babenhauserheide
2020-03-31 13:02 ` bug#36630: " Ludovic Courtès
2019-10-23 19:59 ` [bug#36630] [PATCH] use two threads to build man-pages and secure output msgs with mutex Arne Babenhauserheide
2019-10-27 16:27 ` Arne Babenhauserheide