From: Roel Janssen
Subject: Re: Guix on clusters and in HPC
Date: Tue, 18 Oct 2016 18:47:40 +0200
Message-ID: <8737jteh8z.fsf@gnu.org>
In-reply-to: <87r37divr8.fsf@gnu.org>
To: Ludovic Courtès
Cc: Guix-devel

Ludovic Courtès writes:

> Hello,
>
> I’m trying to gather a “wish list” of things to be done to facilitate
> the use of Guix on clusters and for high-performance computing (HPC).
>
> Ricardo and I wrote about the advantages, shortcomings, and perspectives
> before:
>
>   http://elephly.net/posts/2015-04-17-gnu-guix.html
>   https://hal.inria.fr/hal-01161771/en
>
> I know that Pjotr, Roel, Ben, Eric and maybe others also have experience
> and ideas on what should be done (and maybe even code? :-)).
>
> So I’ve come up with an initial list of work items going from the
> immediate needs to crazy ideas (batch scheduler integration!) that
> hopefully make sense to cluster/HPC people.  I’d be happy to get
> feedback, suggestions, etc. from whoever is interested!
>
> (The reason I’m asking is that I’m considering submitting a proposal at
> Inria to work on some of these things.)
>
> TIA!  :-)

Here are some aspects I think we need:

* Network-aware guix-daemon

From a user's point of view it would be cool to have a network-aware
guix-daemon.  In our cluster we have shared storage on which the store
lives, but manipulating the store through guix-daemon is currently
limited to a single node (and a single request per profile).  Having
`guix' talk to `guix-daemon' over the network would allow users to
install software from any node, instead of from one specific node.

* Profile management

The abstraction of profiles is an awesome feature of functional package
management, but the user interface around it is missing.  We could do
better here.

Switch the default profile (and prepend its values to the current
environment variable values):

  $ guix profile --switch=/path/to/shared/profile

Reset to the default profile (and restore the environment variable
values without the profile we just unset):

  $ guix profile --reset

Create an isolated environment based on a profile:

  $ guix environment --profile=/path/to/profile --pure --ad-hoc
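To make the `--switch' idea a bit more concrete: most of what it would
do can already be approximated by hand, since profiles created by `guix
package' carry an etc/profile file that prepends their search paths to
the environment.  A rough sketch (the shared profile path is of course
just an example):

  # Manual approximation of what `guix profile --switch' would automate:
  $ export GUIX_PROFILE=/path/to/shared/profile
  $ source "$GUIX_PROFILE/etc/profile"

There is no comparable manual equivalent of `--reset' short of starting
a fresh shell, which is exactly why a reversible, first-class
sub-command would be valuable.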
* Workflow management/execution

Add automatic program execution with its own vocabulary.  I think
"workflow management" boils down to the execution of a G-expression,
except that the results do not necessarily need to be stored in the
store (because the data it works on is probably managed by an external
data management system).

A powerful feature of GNU Guix is its domain-specific language for
describing software packages.  We could add domain-specific parts for
workflow management (a `workflow' data type and a `task' or `process'
data type get us there more or less).  With workflow management we are
only interested in the "build function", not the "source code" or the
"build output".

You are probably aware that I have worked on this for some time, so I
could share the data types and the execution engine parts I have (see
the sketch at the end of this message).  The HPC-specific part of this
is compatibility with existing job scheduling systems and data
management systems.

* Document on why we need super user privileges for the Guix daemon

Probably an infamous point by now.  By design, the Linux kernel keeps
control over all processes.  With GNU Guix, we need some control over
the environment in which a process runs (disable network access, change
the user that executes the process) and over the environment in which
the output lives (chown, chmod, so that multiple users can use the
build output).

Instead of hitting the wall of "we are not going to run this thing with
root privileges", we could present our sysadmins with a document
explaining the reasons, the design decisions, and the actual code
involved in the parts that require super user privileges.  This is
something I am working on as well, but help is always welcome :-).

Kind regards,
Roel Janssen
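P.S.  To give a rough idea of the `process'/`workflow' data types
mentioned above, a minimal sketch in Guile could look like the
following.  This is an illustration, not the exact code I have; the
record and field names (and the `samtools' example) are only meant to
show the shape of the thing.

  (use-modules (srfi srfi-9)
               (guix gexp)
               (gnu packages bioinformatics))  ; for `samtools' (example only)

  ;; A single computational step: the packages it needs and a
  ;; G-expression describing what to run.
  (define-record-type <process>
    (make-process name packages procedure)
    process?
    (name      process-name)       ; string, e.g. "index-bam"
    (packages  process-packages)   ; list of <package> inputs
    (procedure process-procedure)) ; G-expression to execute

  ;; A workflow groups processes and the ordering constraints between them.
  (define-record-type <workflow>
    (make-workflow name processes restrictions)
    workflow?
    (name         workflow-name)
    (processes    workflow-processes)     ; list of <process>
    (restrictions workflow-restrictions)) ; alist: process -> its dependencies

  ;; Hypothetical example step: index a BAM file with samtools.
  (define index-bam
    (make-process
     "index-bam"
     (list samtools)
     #~(system* #$(file-append samtools "/bin/samtools")
                "index" "/data/sample.bam")))

An execution engine would then lower each process' G-expression to a
script and run it either locally or by submitting it to a job
scheduler, which is where the HPC-specific integration comes in.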