From: Roel Janssen
To: Ricardo Wurmus
Cc: gwl-devel@gnu.org
Subject: Re: Getting started with GWL 0.3.0
Date: Tue, 23 Mar 2021 21:30:14 +0100
Message-ID: <54a5378e98cc233dbd93be59dbd5cf861230d9fa.camel@gnu.org>
In-Reply-To: <87k0pxvd9f.fsf@elephly.net>
References: <87pmzpvknf.fsf@elephly.net> <17010cba54fe3607be33eecceeb23dd8fffb1ab5.camel@gnu.org> <87k0pxvd9f.fsf@elephly.net>

On Tue, 2021-03-23 at 21:14 +0100, Ricardo Wurmus wrote:
>
> Roel Janssen writes:
>
> > On Tue, 2021-03-23 at 18:34 +0100, Ricardo Wurmus wrote:
> > >
> > > Before you get too enthusiastic about the GWL, though, I’d like to
> > > note that 0.3.0 has a few known bugs that are already fixed in the
> > > repository.  I’ve been putting off making a new release until
> > > either Guile-AWS or Guile-DRMAA are ready and usable with the GWL.
> >
> > Is there a feature-branch to try out GWL with Guile-DRMAA? :)
>
> Unfortunately not yet.
>
> I haven’t been 100% successful with the only DRMAA-enabled cluster that
> I have access to, and it turns out that it’s not as simple as SGE’s
> “hold_jid”.
>
> It’s no longer “fire and forget”, which is a bit sad, but that’s how
> DRMAA works.  We need a run-time component that keeps track of
> submitted jobs and their status and actively starts held jobs when the
> prerequisites have finished.

That's unfortunate, but I believe having a daemon that keeps track of
the workflow opens possibilities for "cloud" "orchestration".

> It’s not clear to me if and how we should persist workflow state.  The
> GWL will submit all jobs to the scheduler in a held state and then
> change their status when its their turn.  I wonder if and how we should
> handle the case where the GWL runtime monitor dies and is restarted.
> The easiest way is to simply kill all queued up jobs, but I don’t know
> if there’s a better approach.
>
> Ideas?

I find killing/removing queued jobs upon exiting the runtime monitor a
good idea!

Maybe not suitable anymore, but I wrote a "qsub" command that
translates to "squeue" here:
https://github.com/roelj/qsub-slurm

Could we use the same approach?
It works because jobs are submitted in order.  The look-up mechanism
can be found here:
https://github.com/roelj/qsub-slurm/blob/master/qsub.in#L233-L253

I have access to a SLURM cluster (I don't know which version of DRMAA
it supports), but I can test it.

Kind regards,
Roel Janssen
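
A minimal sketch of the runtime monitor discussed in this thread, under
stated assumptions.  It is not GWL or Guile-DRMAA code: FakeScheduler,
Job and Monitor are hypothetical names, and the scheduler is an
in-memory stand-in rather than a real DRMAA or SLURM binding.  It only
illustrates the three steps mentioned above: submit every job in a held
state, release a job once its prerequisites have finished, and cancel
whatever is still held when the monitor exits.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    needs: tuple = ()     # names of prerequisite jobs (hypothetical field)
    state: str = "held"   # held -> running -> done, or cancelled


class FakeScheduler:
    """Hypothetical in-memory stand-in for a cluster scheduler."""

    def __init__(self):
        self.jobs = {}

    def submit_held(self, job):
        # Every job enters the queue in the held state.
        self.jobs[job.name] = job

    def release(self, name):
        self.jobs[name].state = "running"

    def finish(self, name):
        # Simulates the cluster reporting that a job has completed.
        self.jobs[name].state = "done"

    def cancel(self, name):
        self.jobs[name].state = "cancelled"


class Monitor:
    """Tracks submitted jobs and releases them when they are ready."""

    def __init__(self, scheduler, jobs):
        self.scheduler = scheduler
        for job in jobs:
            scheduler.submit_held(job)

    def step(self):
        # Release each held job whose prerequisites are all done.
        jobs = self.scheduler.jobs
        released = []
        for job in jobs.values():
            ready = all(jobs[n].state == "done" for n in job.needs)
            if job.state == "held" and ready:
                self.scheduler.release(job.name)
                released.append(job.name)
        return released

    def shutdown(self):
        # On exit, cancel every job that is still held in the queue.
        cancelled = []
        for job in self.scheduler.jobs.values():
            if job.state == "held":
                self.scheduler.cancel(job.name)
                cancelled.append(job.name)
        return cancelled


# Usage: a three-step chain where each job waits for the previous one.
scheduler = FakeScheduler()
monitor = Monitor(scheduler, [
    Job("align"),
    Job("sort", needs=("align",)),
    Job("report", needs=("sort",)),
])
monitor.step()             # releases "align", the only job without prerequisites
scheduler.finish("align")
monitor.step()             # releases "sort"
monitor.shutdown()         # cancels "report", which is still held
```

Cancelling only held jobs on shutdown keeps no state that would have to
be persisted across a restart, which matches the "easiest way" named in
the quoted message.  Whether running jobs should also be killed is left
open here.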