Subject: [bug#36630] [PATCH] guix: parallelize building the manual-database
From: Ludovic Courtès
To: Arne Babenhauserheide
Cc: 36630@debbugs.gnu.org
Date: Tue, 16 Jul 2019 23:14:48 +0200
Message-ID: <87zhldel6f.fsf@gnu.org>
In-Reply-To: <877e8ig9gh.fsf@web.de> (Arne Babenhauserheide's message of "Tue, 16 Jul 2019 01:32:46 +0200")
References: <20190712214245.23857-1-arne_bab@web.de> <878sszl1jo.fsf@gnu.org> <877e8ig9gh.fsf@web.de>

Hello,

Arne Babenhauserheide wrote:

> Ludovic Courtès writes:

[...]

>> I picked the manual-database derivation returned for:
>>
>>   guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
>>
>> (It has 3,046 entries.)
>
> How exactly did you run the derivation? I’d like to check it if you can
> give me the exact commandline to run (a command I can run repeatedly).

If you run the command above, it’ll list
/gnu/store/…-manual-database.drv.  So you can just run:

  guix build /gnu/store/…-manual-database.drv

or:

  guix build /gnu/store/…-manual-database.drv --check

if it has already been built before.

>> On an SSD and with a hot cache, on my 4-core laptop, I get 74s with
>> ‘master’, and 53s with this patch.
>
> I’m using a machine with 6 physical cores, hyperthreading, and an NVMe
> M.2 disk, so it is likely that it would not be disk-bound for me at 4
> threads.

The result may be entirely different with a spinning disk.  :-)

I’m not saying we should optimize for spinning disks, just that what you
see is at one end of the spectrum.

>> However, it will definitely not scale linearly, so we should probably
>> cap at 2 or 4 threads.  WDYT?
>
> Looking at the underlying action, this seems to be a task that scales
> pretty well. It just unpacks files into the disk-cache.
>
> It should also not consume much memory, so I don’t see a reason to
> artificially limit the number of threads.

On a many-core machine like we have in our build farm, with spinning
disks, I believe that using one thread per core would be
counterproductive.

>> Another issue with the patch is that the [n/total] counter does not grow
>> monotonically now: it might temporarily go backwards.  Consequently, at
>> -v1, users will see a progress bar that hesitates and occasionally goes
>> backward, which isn’t great.
>
> It typically jumps forward in the beginning and then stalls until the
> first manual page is finished.
>
> Since par-map uses a global queue of futures to process, and since the
> output is the first part of (compute-entry …), I don’t expect the
> progress to move backwards in ways a user sees: It could only move
> backwards during the initial step where all threads start at the same
> time, and there the initial output should be overwritten fast enough to
> not be noticeable.

Hmm, maybe.  I’m sure we’ll get reports saying this looks weird and
Something Must Absolutely Be Done About It.  :-)

But anyway, another issue is that we would need to honor
‘parallel-job-count’, which means using ‘n-par-map’, which doesn’t use
futures.

> Given that building manual pages is the most time-consuming part when
> installing a small tool into my profile, I think it is worth the
> complexity. Especially because most of the complexity is being taken
> care of by (ice-9 threads par-map).

Just today I realized that the example above (with Jupyter) has so many
entries because of propagated inputs; in particular libxext alone brings
1,000+ man pages.  We should definitely do something about these
packages.

Needs more thought…

Thanks,
Ludo’.
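
Editorial sketch (not part of the original message): the two concerns raised above, capping the number of worker threads and keeping the [n/total] counter from going backwards, can be illustrated together. The block below is a Python stand-in, not Guix or Guile code. The names `capped_workers` and `parallel_map_with_progress` and the default cap of 4 are assumptions made for illustration. Unlike the patch under discussion, which prints at the start of each entry, this sketch reports an entry only once it has completed, under a lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def capped_workers(requested, cap=4):
    """Clamp the requested job count to the range [1, cap] (cap is hypothetical)."""
    return max(1, min(requested, cap))


def parallel_map_with_progress(func, items, requested_workers, cap=4, report=print):
    """Map FUNC over ITEMS with a bounded thread pool.

    Progress is reported as "[done/total]" when an item completes.  The
    counter is incremented and reported while holding a lock, so the
    reported values never decrease, whatever order the items finish in.
    """
    total = len(items)
    done = 0
    lock = threading.Lock()

    def run(item):
        nonlocal done
        result = func(item)
        with lock:
            done += 1
            report(f"[{done}/{total}]")
        return result

    workers = capped_workers(requested_workers, cap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return list(pool.map(run, items))
```

Reporting on completion trades the early "jump forward" for a counter that only ever grows; whether that is the right choice for the manual-database build is left open here, as in the message.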