From: Ludovic Courtès <ludo@gnu.org>
Subject: Re: Guix on clusters and in HPC
Date: Wed, 26 Oct 2016 14:00:11 +0200
To: myglc2
Cc: Guix-devel

Hi,

myglc2 skribis:

> The scheduler that I am most familiar with, SGE, supports the
> proposition that compute hosts are heterogeneous and that they each
> have a fixed software and/or hardware configuration.  As a result,
> users need to specify the resources, such as software packages,
> number of CPUs, and/or memory, needed for a given job.  These
> requirements in turn control where a given job can run.  QMAKE, the
> integration of GNU Make with the SGE scheduler, further allows a make
> recipe step to specify the resources for the SGE job that processes
> that step.

I see.

> While SGE is dated and can be a bear to use, it provides a useful
> yardstick for HPC/cluster functionality.  So it is useful to consider
> how Guix(SD) might impact this model.  Presumably a defining
> characteristic of GuixSD clusters is that the software configuration
> of compute hosts no longer needs to be fixed and the user can “dial
> in” a specific software configuration for each job step.  This is in
> many ways a good thing.  But it also generates new requirements.  How
> does one specify the software config for a given job or recipe step:
>
>   1) VM image?
>
>   2) VM?
>
>   3) Installed system packages?
>
>   4) Installed (user) packages?

The ultimate model here would be that of offloading¹: users would use
Guix on their machine, compute the derivation they want to build
locally, and offload the actual build to the cluster.  In turn, the
cluster would schedule builds on the available and matching compute
nodes.  But of course, this is quite sophisticated.  (A sketch of the
offload configuration follows below, after the quoted text.)

¹ https://www.gnu.org/software/guix/manual/html_node/Daemon-Offload-Setup.html

A more directly usable approach is to simply let users manage profiles
on the cluster using ‘guix package’ or ‘guix environment’.  Then they
can specify the right profile or the right ‘guix environment’ command
in their jobs.  (See the job script sketch below.)

> Based on my experiments with Guix/Debian, GuixSD, VMs, and VM images
> it is not obvious to me which of these levels of abstraction is
> appropriate.  Perhaps any mix should be supported.  In any case,
> tools to manage this aspect of a GuixSD cluster are needed.  And they
> need to be integrated with the cluster scheduler to produce a
> manageable GuixSD HPC cluster.
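To make the offloading model above concrete, here is a minimal sketch
of what the daemon’s machine file might look like on the submission
host.  This is only an illustration: the host name, user account, and
key path are made up, and the exact ‘build-machine’ fields are those
documented in the manual section linked above.

    ;; /etc/guix/machines.scm (hypothetical): each ‘build-machine’
    ;; describes a compute node that the local guix-daemon may
    ;; offload derivation builds to.
    (list (build-machine
            (name "node01.cluster.example.org")  ; made-up host name
            (system "x86_64-linux")
            (user "guix-offload")                ; made-up account
            (private-key "/root/.ssh/id_rsa")))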
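Likewise, a sketch of the second approach: an SGE job script that
requests no pre-installed software and instead lets ‘guix environment’
instantiate the exact tool set for the step.  Package and file names
here are purely illustrative.

    #!/bin/sh
    # Hypothetical SGE job script; ‘-N’ names the job and ‘-cwd’
    # runs it in the submission directory.
    #$ -N align-sample
    #$ -cwd
    # Build a throw-away environment containing exactly the tools
    # this step needs, then run the pipeline inside it.
    guix environment --ad-hoc bwa samtools -- \
      sh -c 'bwa mem ref.fa reads.fq | samtools sort -o out.bam -'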
Note that I’m focusing on the use of Guix on a cluster on top of
whatever ancient distro is already running, as a replacement for
home-made “modules” and such, and as opposed to running GuixSD on all
the compute nodes.  Running GuixSD on all the nodes of a cluster would
certainly be valuable from a sysadmin viewpoint, but it’s also
something that’s much harder to do in practice today.

> The most forward-thinking group that I know discarded their cluster
> hardware a year ago to replace it with StarCluster
> (http://star.mit.edu/cluster/).  StarCluster automates the creation,
> care, and feeding of HPC clusters on AWS using the Grid Engine
> scheduler and AMIs.  The group has a full-time “StarCluster jockey”
> who manages their cluster and they seem quite happy with the
> approach.  So you may want to consider StarCluster as a model when
> you think of cluster management requirements.

Hmm, OK.

Thanks for your feedback,
Ludo’.