From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ricardo Wurmus
To: Roel Janssen
Cc: gwl-devel@gnu.org
Subject: Re: Getting started with GWL 0.3.0
Date: Fri, 26 Mar 2021 22:01:09 +0100
Message-ID: <874kgxtysq.fsf@elephly.net>
In-reply-to: <54a5378e98cc233dbd93be59dbd5cf861230d9fa.camel@gnu.org>
References: <87pmzpvknf.fsf@elephly.net> <17010cba54fe3607be33eecceeb23dd8fffb1ab5.camel@gnu.org> <87k0pxvd9f.fsf@elephly.net> <54a5378e98cc233dbd93be59dbd5cf861230d9fa.camel@gnu.org>
User-agent: mu4e 1.4.14; emacs 27.1

Hi Roel,

>> > Is there a feature-branch to try out GWL with Guile-DRMAA? :)
>>
>> Unfortunately not yet.
>>
>> I haven’t been 100% successful with the only DRMAA-enabled cluster that
>> I have access to, and it turns out that it’s not as simple as SGE’s
>> “hold_jid”.
>>
>> It’s no longer “fire and forget”, which is a bit sad, but that’s how
>> DRMAA works.  We need a run-time component that keeps track of
>> submitted jobs and their status and actively starts held jobs when the
>> prerequisites have finished.
>
> That's unfortunate, but I believe having a daemon that keeps track of
> the workflow opens possibilities for "cloud" "orchestration".

Yes, it’s pretty much the same mechanism, except that for the “cloud” we
generally don’t have a ready-made “select” or “wait” equivalent.  There
we would either need to write code that lets the instances contact a
coordination service or let the GWL process poll their status.

With DRMAA it’s pretty simple: we submit all jobs in hold state, then
start the first layer, and then we use the “wait” call to be notified of
any completed job.  The docstring in Guile DRMAA says:

--8<---------------cut here---------------start------------->8---
"Wait for the completion of a job with identifier JOB-ID.  If the
JOB-ID is the special symbol '*, wait for the completion of any job that
has been submitted during this session.

TIMEOUT (an integer) specifies the number of seconds to block.  If it is
not provided or is #FALSE this procedure will block forever.

This procedure returns three values: the identifier of the job that has
completed, the status code of the job (an opaque value), and an alist of
resource usage statistics."
--8<---------------cut here---------------end--------------->8---

The GWL already knows the graph of processes and each process
corresponds to a submitted job, so with the return values of this
procedure it should really not be complicated to implement.

>> It’s not clear to me if and how we should persist workflow state.  The
>> GWL will submit all jobs to the scheduler in a held state and then
>> change their status when its their turn.  I wonder if and how we should
>> handle the case where the GWL runtime monitor dies and is restarted.
>> The easiest way is to simply kill all queued up jobs, but I don’t know
>> if there’s a better approach.
>>
>> Ideas?
>
> I find killing/removing queued jobs upon exiting the runtime monitor a
> good idea!

With DRMAA this is very easy.  The “control” procedure allows us to kill
all jobs that were enqueued in the current session.  In Guile DRMAA
that’s

    (control '* 'terminate)

> I have access to a SLURM cluster (I don't know which version of DRMAA
> it supports), but I can test it.

SLURM has an external DRMAA 1.0 implementation; it is not included by
default.  In Guix that’s provided by the slurm-drmaa package.

-- 
Ricardo
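
As a rough illustration of the run-time loop described above (submit
everything held, start the first layer, then react to each “wait”
result), here is a minimal sketch in Python.  It is a simulation only
and does not use Guile DRMAA or any real scheduler: FakeSession,
run_workflow and the job names are hypothetical, and wait_any merely
mimics the three return values mentioned in the docstring.  Failure
handling and persistence are left out.

```python
from collections import deque


class FakeSession:
    """Hypothetical stand-in for a scheduler session (not a DRMAA binding)."""

    def __init__(self):
        self.released = deque()

    def release(self, job_id):
        # A real monitor would take the held job out of hold state here.
        self.released.append(job_id)

    def wait_any(self):
        # Pretend the oldest released job has just completed.  Returns
        # (job id, status code, resource usage), like the quoted docstring.
        return self.released.popleft(), 0, {}


def run_workflow(deps, session):
    """Run jobs in dependency order.

    deps maps each job id to the set of job ids it depends on.
    Returns the job ids in the order in which they completed.
    """
    pending = {job: set(pre) for job, pre in deps.items()}
    running = set()
    finished = []

    # Start the first layer: jobs without prerequisites.
    for job, pre in pending.items():
        if not pre:
            session.release(job)
            running.add(job)

    # React to each completion and release jobs that became ready.
    while running:
        done, _status, _usage = session.wait_any()
        running.discard(done)
        finished.append(done)
        for job, pre in pending.items():
            if done in pre:
                pre.discard(done)
                if not pre:
                    session.release(job)
                    running.add(job)

    return finished
```

For a graph such as {"align": set(), "index": set(), "call": {"align",
"index"}, "report": {"call"}}, the loop releases "call" only after both
of its prerequisites have been reported as complete, and "report" after
"call".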