From: myglc2
Subject: Re: Guix on clusters and in HPC
Date: Mon, 31 Oct 2016 20:11:45 -0400
To: Ludovic Courtès
Cc: Guix-devel

On 10/26/2016 at 14:00 Ludovic Courtès writes:

> myglc2 skribis:
>
>> The scheduler that I am most familiar with, SGE, supports the
>> proposition that compute hosts are heterogeneous and that they each
>> have a fixed software and/or hardware configuration. As a result,
>> users need to specify resources, such as SW packages &/or #CPUs &/or
>> memory needed for a given job. These requirements in turn control
>> where a given job can run. QMAKE, the integration of GNU Make with
>> the SGE scheduler, further allows a make recipe step to specify
>> specific resources for an SGE job to process the make step.
>
> I see.
>
>> While SGE is dated and can be a bear to use, it provides a useful
>> yardstick for HPC/cluster functionality. So it is useful to consider
>> how Guix(SD) might impact this model. Presumably a defining
>> characteristic of GuixSD clusters is that the software configuration
>> of compute hosts no longer needs to be fixed and the user can "dial
>> in" a specific SW configuration for each job step. This is in many
>> ways a good thing. But it also generates new requirements. How does
>> one specify the SW config for a given job or recipe step:
>>
>> 1) VM image?
>>
>> 2) VM?
>>
>> 3) Installed system packages?
>>
>> 4) Installed (user) packages?
>
> The ultimate model here would be that of offloading¹: users would use
> Guix on their machine, compute the derivation they want to build
> locally, and offload the actual build to the cluster. In turn, the
> cluster would schedule builds on the available and matching compute
> nodes. But of course, this is quite sophisticated.
>
> ¹ https://www.gnu.org/software/guix/manual/html_node/Daemon-Offload-Setup.html

Thanks for pointing me to this. I hadn't internalized (and probably
don't yet understand) just how cool the Offload Facility is. Sorry if
my earlier comments were pedantic or uninformed as a result :-(

Considering the Offload Facility as an SGE replacement, it would be
interesting to make a Venn diagram of the SGE and Guix Offload Facility
functions and to study the usability issues of each. Having failed that
assignment, here are a few thoughts:

I guess we would see the QMAKE makefile mentioned above as being
replaced by Guile recipe(s)? Maybe we need a cheat sheet showing how to
map between the two sets of functions/concepts?
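
As a concrete point of comparison (a minimal sketch based on my reading
of the manual; the host name, key, and numbers below are made up), the
offload facility seems to describe compute nodes in
/etc/guix/machines.scm roughly like this:

    ;; /etc/guix/machines.scm -- machines the daemon may offload to
    (list (build-machine
            (name "node01.example.org")   ; hypothetical compute node
            (system "x86_64-linux")       ; architecture it builds for
            (user "guixbuilder")          ; SSH account used for offloading
            (host-key "ssh-rsa AAAA... node01") ; node's SSH host key
            (speed 1.5)                   ; relative speed, for scheduling
            (parallel-builds 4)))         ; max concurrent builds

In SGE terms, each 'build-machine' looks like an execution host, with
'parallel-builds' playing roughly the role of a slot count.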
I am a little unclear about the implications of placing all analysis
results into the store. In labs where I have worked, data sources and
destinations are typically managed by project. What are the pros and
cons of putting everything in the store? What are the
management/maintenance issues? E.g., when any result can be reproducibly
derived, only the inputs are precious, but maybe we want to "protect"
from GC those results that were computationally more expensive?

In Grid Engine, "submit hosts" are the machines that a user logs into
to gain access to the cluster. Usually there are one or two such hosts,
often used, in part, to simplify cluster access control. I guess you
are saying that, when every user machine is set up to offload builds,
it becomes like a "submit host." In general, this would be "nicer" but
not sufficient. Grid Engine also provides a 'qrsh' command that allows
users to log into compute host(s), reserving the same resources as
required by a given job. This is useful when debugging a process that
is failing, or when prototyping a process that requires memory, CPUs,
or other resources not available on the user machine. Can the offload
facility be extended to support something like this?

> A more directly usable approach is to simply let users manage profiles
> on the cluster using ‘guix package’ or ‘guix environment’. Then they
> can specify the right profile or the right ‘guix environment’ command
> in their jobs.

This seems quite powerful. How would one reproducibly specify "which
guix" version [to use | was used]? Does this fit within the offload
facility harness?
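
For what it's worth, here is a sketch of how I imagine the
profile/environment approach looking in an SGE submit script (the job
name, resource requests, and package choices are hypothetical, and this
assumes guix is installed on the compute nodes):

    #!/bin/sh
    #$ -N align-sample        # hypothetical job name
    #$ -l mem_free=8G         # resources are requested from SGE as usual
    #$ -pe smp 4

    # Record which guix built the environment, as a partial answer to
    # the "which guix" question:
    guix --version > guix-version.txt

    # Build the per-job software environment on the compute node, then
    # run the job step inside it:
    guix environment --ad-hoc bwa samtools -- \
      sh -c 'bwa mem -t 4 ref.fa reads.fq | samtools sort -o out.bam'

So SGE would still manage the hardware resources while Guix "dials in"
the software configuration per job, as discussed above.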