From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 18 May 2020 08:11:48 -0500
From: Pjotr Prins
To: guix-devel
Subject: Re: Slurm with containers (i.e., orchestration)
Message-ID: <20200518131148.mquee56ukhr7swk2@thebird.nl>
References: <20200518124900.jkr5rts5bnslrkqg@thebird.nl>
In-Reply-To: <20200518124900.jkr5rts5bnslrkqg@thebird.nl>
List-Id: "Development of GNU Guix and the GNU System distribution."

Ricardo added slurm-drmaa in the past (I can't believe it was almost 4
years ago that we packaged Slurm!), which may also help in addressing
some points:

http://www.drmaa.org/

Pj.

On Mon, May 18, 2020 at 07:49:00AM -0500, Pjotr Prins wrote:
> I am looking into some light-weight style orchestration. One
> possibility is to use Slurm with Guix containers - on a cluster with
> Guix that is almost trivial (we use Guix containers a lot! They are
> great) and would also allow non-container jobs.
>
> Once we have containers and Slurm it should also be possible to deploy
> in some cloud infrastructure, provided there are no dependencies on
> the cluster itself. I think it would make a terrific BLOG story if we
> put something like that together.
>
> Bcbio describes an architecture that uses the common workflow language
> (CWL) to run pipelines with containers
>
> https://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html#running-with-cromwell-local-hpc
>
> I am not promoting the use of this, but it shows that infrastructure
> exists that can deploy workflows on containers in different setups
> (Bcbio supports Slurm). I know the Guix infrastructure uses Guix
> deploy to achieve similar roll-outs. What that lacks is the
> orchestration mechanism itself which should handle dependencies
> between jobs (i.e. a workflow). The GNU Workflow Language goes some
> way, but it does not handle orchestration itself.
>
> In other words, we almost have the pieces, but one thing is missing
> :). Thoughts? I know I have brought this up before in different
> guises, but we start to really need something here.
>
> What makes orchestration? I guess it concerns a dynamic database of
> machines that can execute jobs and some type of software registry
> (Guix). Next it should be able to schedule and execute jobs using
> some constraint specifiers (like network/CPU/RAM). It could be a
> 'dynamic' Slurm that makes use of real machines and VMs. Or hook into
> an existing cloud service. A slurm job could monitor sending a
> container into a cloud service.
>
> I think we can build this up a step at a time.
>
> Thoughts?
>
> Pj.
>
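
To make the "what makes orchestration" paragraph above a bit more
concrete, here is a minimal sketch in Python of the two pieces it names:
a registry of machines and a planner that orders jobs by their
dependencies and matches each one against CPU/RAM constraints. Everything
in it (Machine, Job, plan, the node and job names) is hypothetical and
only for illustration; it is not Slurm, Guix or GWL code, and it only
plans an assignment, it does not launch containers or track machines that
are already busy.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Machine:
    # One entry in the (hypothetical) machine database.
    name: str
    cpus: int
    ram_gb: int


@dataclass
class Job:
    # A job with its resource constraints and the jobs it depends on.
    name: str
    cpus: int
    ram_gb: int
    needs: tuple = ()  # names of jobs that must finish first


def plan(jobs, machines):
    """Return [(job name, machine name)] in an order that respects dependencies."""
    by_name = {job.name: job for job in jobs}
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    graph = {job.name: set(job.needs) for job in jobs}
    schedule = []
    for name in TopologicalSorter(graph).static_order():
        job = by_name[name]
        fits = [m for m in machines
                if m.cpus >= job.cpus and m.ram_gb >= job.ram_gb]
        if not fits:
            raise RuntimeError(f"no machine satisfies the constraints of {name}")
        # Pick the smallest machine that fits, keeping large ones free.
        best = min(fits, key=lambda m: (m.cpus, m.ram_gb))
        schedule.append((name, best.name))
    return schedule


# Example: a three-step workflow on one small node and one large VM.
machines = [Machine("node1", cpus=4, ram_gb=16),
            Machine("vm1", cpus=32, ram_gb=256)]
jobs = [Job("align", cpus=16, ram_gb=64, needs=("fetch",)),
        Job("fetch", cpus=1, ram_gb=2),
        Job("report", cpus=2, ram_gb=4, needs=("align",))]
schedule = plan(jobs, machines)
```

A real orchestrator would keep the machine list dynamic (real machines,
VMs, cloud nodes coming and going) and hand each planned job to something
like Slurm for execution; the sketch only shows the dependency and
constraint handling.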