From: Ricardo Wurmus <rekado@elephly.net>
To: Roel Janssen <roel@gnu.org>
Cc: gwl-devel@gnu.org
Subject: Re: Getting started with GWL 0.3.0
Date: Fri, 26 Mar 2021 22:01:09 +0100
Message-ID: <874kgxtysq.fsf@elephly.net>
In-Reply-To: <54a5378e98cc233dbd93be59dbd5cf861230d9fa.camel@gnu.org>


Hi Roel,

>> > Is there a feature-branch to try out GWL with Guile-DRMAA? :)
>> 
>> Unfortunately not yet.
>> 
>> I haven’t been 100% successful with the only DRMAA-enabled cluster that
>> I have access to, and it turns out that it’s not as simple as SGE’s
>> “hold_jid”.
>> 
>> It’s no longer “fire and forget”, which is a bit sad, but that’s how
>> DRMAA works.  We need a run-time component that keeps track of
>> submitted jobs and their status and actively starts held jobs when the
>> prerequisites have finished.
>
> That's unfortunate, but I believe having a daemon that keeps track of
> the workflow opens possibilities for "cloud" "orchestration".

Yes, it’s pretty much the same mechanism, except that for the “cloud” we
generally don’t have a ready-made “select” or “wait” equivalent.  There
we would either need to write code that lets the instances contact a
coordination service or let the GWL process poll their status.

With DRMAA it’s pretty simple: we submit all jobs in hold state, then
start the first layer, and then we use the “wait” call to be notified of
any completed job.  The docstring in Guile DRMAA says:

--8<---------------cut here---------------start------------->8---
   "Wait for the completion of a job with identifier JOB-ID.  If the
JOB-ID is the special symbol '*, wait for the completion of any job that
has been submitted during this session.

TIMEOUT (an integer) specifies the number of seconds to block.  If it
is not provided or is #FALSE this procedure will block forever.

This procedure returns three values: the identifier of the job that
has completed, the status code of the job (an opaque value), and an
alist of resource usage statistics."
--8<---------------cut here---------------end--------------->8---

The GWL already knows the graph of processes and each process
corresponds to a submitted job, so with the return values of this
procedure it should really not be complicated to implement.
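
A minimal sketch of that loop, as an in-memory simulation rather than real
DRMAA code.  Everything here is hypothetical: `FakeSession`, `submit_held`,
`release` and `wait_any` are stand-ins invented for illustration and are not
the Guile DRMAA API, and the job names are made up.  The fake `wait_any`
pretends that jobs finish in the order they were released and always succeed.

```python
from collections import deque


class FakeSession:
    """Hypothetical in-memory stand-in for a DRMAA session (not a real API)."""

    def __init__(self):
        self.held = set()
        self.running = deque()

    def submit_held(self, job_id):
        # Every job enters the queue in hold state.
        self.held.add(job_id)

    def release(self, job_id):
        # Move a held job into the running queue.
        self.held.remove(job_id)
        self.running.append(job_id)

    def wait_any(self):
        # Assumption: jobs complete in release order with status 0.
        # Raises IndexError if nothing is running (e.g. a dependency cycle).
        return self.running.popleft(), 0


def run_workflow(deps, session):
    """DEPS maps each job to the set of jobs it depends on."""
    remaining = {job: set(pre) for job, pre in deps.items()}

    # Submit everything in hold state.
    for job in remaining:
        session.submit_held(job)

    # Start the first layer: jobs without prerequisites.
    for job in sorted(j for j, pre in remaining.items() if not pre):
        session.release(job)

    finished = []
    while len(finished) < len(remaining):
        job, _status = session.wait_any()  # status is ignored in this sketch
        finished.append(job)
        # Release every held job whose last prerequisite just completed.
        for other, pre in sorted(remaining.items()):
            if job in pre:
                pre.discard(job)
                if not pre:
                    session.release(other)
    return finished


deps = {
    "align": set(),
    "index": set(),
    "call": {"align", "index"},
    "report": {"call"},
}
order = run_workflow(deps, FakeSession())
```

A real monitor would also have to look at the status code and decide what
to do with the dependents of a failed job; the sketch skips that.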

>> It’s not clear to me if and how we should persist workflow state.  The
>> GWL will submit all jobs to the scheduler in a held state and then
>> change their status when it’s their turn.  I wonder if and how we should
>> handle the case where the GWL runtime monitor dies and is restarted.
>> The easiest way is to simply kill all queued up jobs, but I don’t know
>> if there’s a better approach.
>> 
>> Ideas?
>
> I find killing/removing queued jobs upon exiting the runtime monitor a
> good idea!

With DRMAA this is very easy.  The “control” procedure allows us to kill
all jobs that were enqueued in the current session.  In Guile DRMAA that’s:

   (control '* 'terminate)
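
To illustrate the "kill everything when the monitor exits" idea, here is a
small simulated sketch.  The names `FakeSession`, `submit_held`,
`terminate_all` and `monitor` are hypothetical and only model the behaviour
in memory; they are not part of any DRMAA binding.  Note the assumption that
the monitor exits through normal control flow or an exception: a cleanup
handler like this cannot run if the process is killed outright.

```python
class FakeSession:
    """Hypothetical in-memory session that tracks job states by name."""

    def __init__(self):
        self.jobs = {}

    def submit_held(self, job_id):
        self.jobs[job_id] = "held"

    def terminate_all(self):
        # Models terminating every job enqueued in this session
        # that has not already finished.
        for job_id, state in self.jobs.items():
            if state != "done":
                self.jobs[job_id] = "terminated"


def monitor(session, job_ids, step):
    """Submit JOB_IDS held, run STEP, and always clean up afterwards."""
    for job_id in job_ids:
        session.submit_held(job_id)
    try:
        step(session)
    finally:
        # Runs on normal return and on exceptions alike.
        session.terminate_all()


def crashing_step(session):
    # Simulate the monitor dying halfway through the workflow.
    session.jobs["a"] = "done"
    raise RuntimeError("monitor crashed")


session = FakeSession()
try:
    monitor(session, ["a", "b", "c"], crashing_step)
except RuntimeError:
    pass
```

After the simulated crash, the finished job keeps its state and the two
jobs that were still held are marked as terminated.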

> I have access to a SLURM cluster (I don't know which version of DRMAA
> it supports), but I can test it.

SLURM has an external DRMAA 1.0 implementation; it is not included by
default.  In Guix that’s provided by the slurm-drmaa package.

-- 
Ricardo

