Hi Ludo’,

Ludovic Courtès writes:

>> * guix/profiles.scm (manual-database): par-map over the entries.
>> This distributes the load roughly equally over all cores and avoids
>> blocking on I/O. The order of the entries stays the same since
>> write-mandb-database sorts them.
>
> I would think the whole process is largely I/O-bound. Did you try
> measuring differences?

I did not measure the difference in build time, but I did check the
system load. Without this patch, one of my cores is under full load;
with this patch, all 12 hyperthreads have a mean load of about 50%.

> I picked the manual-database derivation returned for:
>
>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
>
> (It has 3,046 entries.)

How exactly did you run the derivation? I would like to check it
myself if you can give me the exact command line (a command I can run
repeatedly).

> On an SSD and with a hot cache, on my 4-core laptop, I get 74s with
> ‘master’, and 53s with this patch.

I am using a machine with 6 physical cores, hyperthreading, and an
NVMe M.2 disk, so it is likely not disk-bound for me at 4 threads.

> However, it will definitely not scale linearly, so we should probably
> cap at 2 or 4 threads. WDYT?

Looking at the underlying action, this seems to be a task that scales
pretty well: it just unpacks files into the disk cache. It should also
not consume much memory, so I do not see a reason to artificially
limit the number of threads. (If a cap turns out to be needed after
all, see the n-par-map sketch in the postscript.)

> Another issue with the patch is that the [n/total] counter does not
> grow monotonically now: it might temporarily go backwards.
> Consequently, at -v1, users will see a progress bar that hesitates
> and occasionally goes backward, which isn’t great.

It typically jumps forward in the beginning and then stalls until the
first manual page is finished. Since par-map uses a global queue of
futures to process, and since the output happens in the first part of
(compute-entry …), I do not expect the progress to move backwards in a
way users can see: it could only move backwards during the initial
step, where all threads start at the same time, and there the initial
output should be overwritten fast enough not to be noticeable.

> This would need to be fixed with a mutex-protected global counter.

A global counter would be pretty bad for scaling (see the sketch in
the postscript). As it is, this code needs no communication between
threads besides returning the final result, so it behaves exactly like
a normal map, aside from being faster. So I would prefer to accept the
forward jumping.

> All in all, I’m not sure this is worth the complexity.
>
> WDYT?

Given that building the manual page database is the most
time-consuming part of installing a small tool into my profile, I
think it is worth the complexity, especially because most of that
complexity is taken care of by par-map from (ice-9 threads).

Best wishes,
Arne

--
Unpolitisch sein heißt politisch sein, ohne es zu merken.
(Being apolitical means being political without noticing it.)
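
P.S. A few untested sketches to make the discussion above concrete.
First, the shape of the change itself; apart from par-map and the
compute-entry name used above, the names here are made up for
illustration:

  (use-modules (ice-9 threads))

  ;; Stand-in for the real per-entry work in manual-database; the
  ;; actual compute-entry prints the [n/total] progress line and then
  ;; does the I/O-heavy work for one manifest entry.
  (define (compute-entry entry)
    entry)

  ;; par-map runs COMPUTE-ENTRY on one thread per core and preserves
  ;; the input order, so write-mandb-database can sort the results
  ;; exactly as before.
  (define (process-entries entries)
    (par-map compute-entry entries))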
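
Should a cap turn out to be necessary after all, (ice-9 threads)
already provides n-par-map, which behaves like par-map but limits the
number of concurrent threads:

  ;; Same as above, but with at most 4 threads, as you suggested.
  (n-par-map 4 compute-entry entries)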
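
And this is roughly what the mutex-protected counter would look like;
every entry would then take the lock once, which is exactly the kind
of cross-thread communication I would like to avoid:

  (define progress-mutex (make-mutex))
  (define done 0)

  ;; with-mutex serializes the counter update and the output, making
  ;; [n/total] monotonic again at the cost of lock contention.
  (define (compute-entry/counted total entry)
    (with-mutex progress-mutex
      (set! done (+ done 1))
      (format #t "[~a/~a]~%" done total))
    (compute-entry entry))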