From: Ricardo Wurmus
Subject: Re: Workflow management with GNU Guix
Date: Mon, 16 May 2016 14:22:02 +0200
Message-ID: <87twhyp505.fsf@mdc-berlin.de>
References: <87wpmzhdk2.fsf@gnu.org>
To: Roel Janssen
Cc: guix-devel@gnu.org

(Resending this as it could not be delivered.)

Ricardo Wurmus writes:

> Hi Roel,
>
>> With GNU Guix we are able to install programs to our machines with an amazing
>> level of control over the dependency graph of the programs. We can now know
>> what code will run when we invoke a program. We can now know what the impact
>> of an upgrade will be. And we can now safely roll-back to previous states.
>>
>> What seems to be a common practice in research involving data analysis, is
>> running multiple programs in a chain to transform data from raw to specific.
>> This is often referred to as a "pipeline" or a "workflow". Because data sets
>> can be quite large in comparison to the computing power of our laptops, the
>> data analysis is performed on computing clusters instead of single machines.
>>
>> The usage of a pipeline/workflow is somewhat different from the package
>> construction, because we want to run the sequence of commands on different data
>> sets (as opposed to running it on the same source code).
>> Plus, I would like to integrate it with existing computing clusters that
>> have a job scheduling system in place.
>>
>> The reason I think this should be possible with Guix is that it has
>> everything in place to do software deployment and run-time isolation
>> (containers). From there it is a small step to executing programs in an
>> automated way.
>>
>> So, I would like to propose a new Guix subcommand and an extension to
>> the package management language to add workflow management features.
>
> I probably don’t understand your idea well enough, but from what I
> understand it doesn’t really have much to do with packages (other than
> using them) and store manipulation per se (produced artifacts are not
> added to the store). Exactly what features of Guix do you want to build
> on?
>
> My perspective on pipelines is that they should be developed like any
> other software package, treating individual tools as you would treat
> libraries. This means that a pipeline would have a configuration step
> in which it checks for the paths of all tools it needs internally, and
> then use the full paths rather than assume all tools to be in a
> directory listed in the PATH variable.
>
> Distributing jobs to clusters would be the responsibility of the
> pipeline, e.g. by using DRMAA, which supports several resource
> management backends and has bindings for a wide range of programming
> languages.
>
>> Would this be a feature you are interested in adding to GNU Guix?
>
> Even if it wasn’t part of Guix itself, you could develop it separately
> and still add it as a Guix command, much like it is currently done for
> “guix web” (which I think should eventually be part of Guix).
>
>> I'm currently working on a proof-of-concept implementation that has three
>> record types/levels of abstraction:
>> <workflow>: Describes which <process>es should be run, and concerns itself with
>> the order of execution.
>>
>> <process>: Describes what packages are needed to run the programs involved,
>> and its relationship to other processes. Processes take input and
>> generate output much like the package construction process.
>>
>>