From: Ricardo Wurmus <rekado@elephly.net>
To: gwl-devel@gnu.org
Subject: Re: Preparing for a new release
Date: Sun, 09 Feb 2020 14:00:53 +0100
Message-ID: <877e0vq5iy.fsf@elephly.net>
In-Reply-To: <87a75spx3i.fsf@elephly.net>

While playing with a real-world workflow I found a few problems:

* inputs and outputs are not validated

  When a process declares an output but doesn’t actually produce it,
  the next process fails with a nasty error message.  This is
  especially confusing when using containerization, because the error
  is about failing to map the input into the container.

  Processes should automatically validate their inputs and outputs.
  Since inputs and outputs could technically be something other than
  files, I’m not sure exactly how to do this.

  @Roel: can you give an example of inputs / outputs that are not files?
  I remember that you suggested that inputs might be database queries,
  for example.  I wonder if we should mark inputs and outputs with
  types, so that the GWL can know if something is supposed to be a file
  or something else.  …just how would “something else” be used in a
  process?
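
  For the file case, the check could be as simple as the sketch below.
  It is written in Python purely for illustration and is not GWL code;
  the name “missing_files” is made up, and it assumes every declared
  input or output is a plain file name.

```python
import os
import tempfile


def missing_files(declared):
    """Return the declared file names that do not exist on disk."""
    return [name for name in declared if not os.path.exists(name)]


# Usage: one declared output was produced, the other was not.
with tempfile.TemporaryDirectory() as tmp:
    produced = os.path.join(tmp, "result")
    absent = os.path.join(tmp, "never-written")
    open(produced, "w").close()
    report = missing_files([produced, absent])
```

  Running the same check on the inputs before a process starts and on
  the outputs after it finishes would let the failing process be named
  directly, instead of the error surfacing in the next one.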

* The --output option has no effect

  I think the “--output” option should cause all generated files to end
  up somewhere in the given directory.  I wonder if this should affect
  *all* generated files or just the final output.  If all outputs should
  be affected then all *inputs* must be adjusted as well.  Maybe
  “--output” is the wrong name.  Should it be “--prefix” instead?
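
  To illustrate the “all generated files” variant: the sketch below
  (Python, illustration only, not GWL code) moves every generated file
  below a prefix and adjusts the inputs that refer to those files.  The
  dictionary layout, the process names “align” and “call”, and the
  function name “relocate” are all hypothetical.

```python
import posixpath


def relocate(processes, prefix):
    """Move every generated file below prefix; leave other inputs alone."""
    # A file counts as generated if some process declares it as an output.
    generated = {out for p in processes for out in p["outputs"]}

    def move(path):
        return posixpath.join(prefix, path) if path in generated else path

    return [
        {
            "name": p["name"],
            "inputs": [move(i) for i in p["inputs"]],
            "outputs": [move(o) for o in p["outputs"]],
        }
        for p in processes
    ]


workflow = [
    {"name": "align", "inputs": ["hg19.fa"], "outputs": ["aligned.bam"]},
    {"name": "call", "inputs": ["aligned.bam"], "outputs": ["result"]},
]
moved = relocate(workflow, "out")
```

  Here “hg19.fa” stays where it is because no process generates it,
  while “aligned.bam” is rewritten both as an output of the first
  process and as an input of the second.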

* It’s not possible to select more than one tagged item

  In my test workflow I’m generating a bunch of inputs by mapping over
  an argument list.  Now the problem is that I can’t select all of these
  inputs easily in a code snippet.  With the syntax we have I can only
  select the first item following a tag.

  To address this I’ve extended the accessor syntax, so this works now:

--8<---------------cut here---------------start------------->8---
process frobnicate
  packages "frobnicator"
  inputs
    . genome: "hg19.fa"
    . samples: "a" "b" "c"
  outputs
    . "result"
  # {
    frobnicate -g {{inputs:genome}} --files {{inputs::samples}} > {{outputs}}
  }
--8<---------------cut here---------------end--------------->8---

  Note how {{inputs::samples}} is substituted with “a b c”.  With just a
  single colon it would be just “a”.  Single colon = single item; double
  colon = more than one item.
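
  The substitution rule can be modelled with the sketch below.  It is
  Python for illustration only, not how the GWL implements it.  It
  assumes tags are represented as strings ending in a colon, and that an
  accessor without a tag expands to all untagged items; “pick” and
  “expand” are made-up names.

```python
import re


def is_tag(item):
    """Assumption: a tag is modelled as a string ending in a colon."""
    return item.endswith(":")


def pick(items, tag, many):
    """Items following `tag:`: only the first, or all up to the next tag."""
    start = items.index(tag + ":") + 1
    found = []
    for item in items[start:]:
        if is_tag(item):
            break
        found.append(item)
    return found if many else found[:1]


def expand(template, fields):
    """Replace {{field}}, {{field:tag}} and {{field::tag}} placeholders."""
    def substitute(match):
        field, colons, tag = match.group(1), match.group(2), match.group(3)
        items = fields[field]
        if tag is None:
            chosen = [i for i in items if not is_tag(i)]
        else:
            chosen = pick(items, tag, many=(colons == "::"))
        return " ".join(chosen)

    return re.sub(r"\{\{(\w+)(?:(::?)(\w+))?\}\}", substitute, template)


fields = {
    "inputs": ["genome:", "hg19.fa", "samples:", "a", "b", "c"],
    "outputs": ["result"],
}
command = expand(
    "frobnicate -g {{inputs:genome}} --files {{inputs::samples}} > {{outputs}}",
    fields,
)
single = expand("{{inputs:samples}}", fields)
```

  With these fields, “command” becomes “frobnicate -g hg19.fa --files
  a b c > result”, and “single” is just “a”.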

* Containerization and directories

  A container cannot be created for a process that writes its output
  files to directories that don’t exist yet.  That’s because the
  “containerize” procedure tries to map the directories of input and
  output files into the container, and the output directory doesn’t
  exist yet.

  How should this be handled?  We could ignore non-existing output
  directories when creating containers, I suppose.  I think that’s the
  best option, because we can’t just create them lest we break
  procedures that don’t deal well with existing directories.
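
  The “ignore non-existing output directories” option could look
  roughly like the sketch below.  It is Python for illustration only and
  does not reflect what “containerize” actually does internally; the
  name “directories_to_map” is hypothetical.

```python
import os
import tempfile


def directories_to_map(inputs, outputs):
    """Directories of inputs and outputs, skipping missing output directories."""
    input_dirs = {os.path.dirname(os.path.abspath(f)) for f in inputs}
    output_dirs = {os.path.dirname(os.path.abspath(f)) for f in outputs}
    # Only output directories that already exist can be mapped.
    existing_output_dirs = {d for d in output_dirs if os.path.isdir(d)}
    return sorted(input_dirs | existing_output_dirs)


# Usage: the output goes into a directory that has not been created.
with tempfile.TemporaryDirectory() as tmp:
    genome = os.path.join(tmp, "hg19.fa")
    result = os.path.join(tmp, "not-yet", "result")
    mapped = directories_to_map([genome], [result])
```

  Input directories are deliberately not filtered here, so a missing
  input would still be reported as an error.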


--
Ricardo
