From: Maxim Cournoyer
To: Ludovic Courtès
Cc: 27284@debbugs.gnu.org
Subject: bug#27284: Memory leak in 'guix pull' or 'make' in guix source
Date: Tue, 19 Sep 2017 22:40:02 -0400
Message-ID: <87y3paccu5.fsf@gmail.com>
References: <20170608083935.izw747zaetkaxv4o@abyayala> <87poamv2i7.fsf@gnu.org>
In-Reply-To: <87poamv2i7.fsf@gnu.org> (Ludovic Courtès's message of "Tue, 19 Sep 2017 22:48:16 +0200")

ludo@gnu.org (Ludovic Courtès) writes:

> Hello Guix!
>
> A heads-up to share the cognitive burden related to this topic.  :-)
>
> So, we have two problems: compilation time, and memory consumption.  I
> *think* I’ve identified one of the major causes for both in Guile,
> though it’s too early to say exactly how much this will impact resource
> consumption for a full Guix compilation.  See
>
> for details.
>
> When that is fixed, we’ll still have a performance problem: building all
> of Guix will still take more time than we’d like, and it won’t get
> better as we add new files.  So we need to address this.
>
> This has been discussed informally many times, and here’s a summary of
> the ideas I’m aware of:
>
>   1. Build Guix as separate derivations: the first derivation builds the

[...]

>
>   2. Build all of Guix like the ‘guix’ package does, and hope that we
>      can get a substitute.

This seems like the best option to me.  It strikes me as odd that Guix
isn't distributed as its own substitutable package that can be
efficiently downloaded and unpacked to the store, instead of going
through the current, resource-intensive build-it-all process.

>      Bootstrapping issue: to do that, we first need compute the
>      derivation of this new ‘guix’ package.  Thus, we at least need to
>      build the closure of (guix packages), which should take a minute or
>      so, after which we can compute the derivation, which could take a
>      couple of minutes maybe.

>      The problem is that building all of Guix (including running the
>      test suite) takes some time, potentially more than the interval
>      between two subsequent pushes to the repo.  Thus, it’s quite likely
>      that the build farm would always be lagging behind.

Do we really have to build it *all*, every time?  There are two
optimizations I can think of:

1. Only rebuild the files that changed.  I guess the challenge here is
   keeping the build deterministic.  We would need to detect when any
   input in Guix's closure has changed; when that happens, we'd have to
   rebuild Guix fully.  Otherwise, with an unchanged closure, it should
   be safe to simply 'git pull' and run 'make' to rebuild only the
   modified files, which could speed things up a bit.
   A smarter Guile that recompiled only the changed or referenced parts
   of massive files, instead of the whole file (think python.scm), would
   also improve things a lot.  This would also be a must when working on
   packages directly in the tree: updating the hash of a Python package
   shouldn't require minutes of compilation.

2. Remove the --no-auto-compile arg from (guix scripts guix).  I'm not
   sure why that argument is desirable there.  Is it because the cached
   compiled copies would otherwise be susceptible to ABI breakages?
   This could be caught and handled by the build script, which would
   then retry with '--fresh-auto-compile'.  IIUC, these cached files
   would never end up in the package to be distributed anyway, but
   having them would greatly speed up computing the closure of guix.

Even with such optimizations, building Guix could still take longer
than the interval between commits hitting master.  Maybe a cron job
could build the latest Guix every, say, 5 or 10 minutes (only when new
commits have appeared since the last build)?  We would then distribute
the latest built Guix package when users run 'guix pull'.  There could
be a 'guix pull' option to force building the package from the latest
sources, for those who prefer the current behavior.

I think most users would happily trade compiling Guix entirely from
scratch every time they run 'guix pull' for being 5 minutes' worth of
commits behind master's tip!

Maxim
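
P.S. To make the cron-job idea and optimization 1 a bit more concrete,
here is a rough Python sketch of the decision such a job could make on
each run.  Everything in it is hypothetical: the names BuildState and
plan_build, and the notion of a "closure hash" fingerprinting the inputs
in Guix's closure, are made up for illustration and do not correspond to
anything that exists in Guix today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildState:
    """What the (hypothetical) build job remembers from its last successful run."""
    commit: str        # commit that was last built
    closure_hash: str  # assumed fingerprint of the inputs in Guix's closure


def plan_build(previous: Optional[BuildState],
               tip_commit: str,
               closure_hash: str) -> str:
    """Return 'skip', 'incremental' or 'full' for the current run."""
    if previous is None:
        return "full"         # nothing has been built yet
    if previous.commit == tip_commit:
        return "skip"         # no new commits since the last build
    if previous.closure_hash != closure_hash:
        return "full"         # an input of the closure changed: rebuild everything
    return "incremental"      # same closure: 'git pull' and 'make' should suffice


if __name__ == "__main__":
    last = BuildState(commit="abc123", closure_hash="h1")
    print(plan_build(last, "abc123", "h1"))
    print(plan_build(last, "def456", "h1"))
    print(plan_build(last, "def456", "h2"))
```

How the closure fingerprint would actually be computed is the hard part
and is left open here; the sketch only shows that the job needs two
pieces of remembered state to choose between the three outcomes.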