unofficial mirror of gwl-devel@gnu.org
From: Konrad Hinsen <konrad.hinsen@fastmail.net>
To: gwl-devel@gnu.org
Subject: Managing data files in workflows
Date: Thu, 25 Mar 2021 10:57:27 +0100
Message-ID: <m18s6bk12w.fsf@ordinateur-de-catherine--konrad.home>

Hi everyone,

Coming from make-like workflow systems, I wonder how data files are best
managed in GWL workflows. GWL is clearly less file-centric than make
(which is a Good Thing in my opinion), but on a first reading of the
manual, it doesn't seem to care about files at all, except for
auto-connect.

A simple example:

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # { wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv }

workflow influenza-incidence
  processes download
==================================================

This works fine the first time, but the second time it fails because the
output file of the process already exists. That doesn't seem very
useful. The two behaviors I do see as potentially useful are:

 1) Always replace the file.
 2) Don't run the process if the output file already exists
    (as make would do by default)
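For comparison, make's default rule is slightly more than an existence
check: a target is remade when it is missing or older than any of its
prerequisites, and with no prerequisites (as for a download) this reduces
to behavior 2. A minimal Bash sketch of that test follows; the function
name `needs_update` is hypothetical and is not part of GWL or make.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GWL): mimics make's default staleness rule.
# Returns 0 ("rebuild") if TARGET is missing or older than any PREREQ,
# and 1 ("up to date") otherwise.
needs_update() {
  local target=$1
  shift
  # A missing target always has to be built.
  [ -e "$target" ] || return 0
  local prereq
  for prereq in "$@"; do
    # -nt ("newer than") is supported by Bash's test builtin.
    if [ "$prereq" -nt "$target" ]; then
      return 0
    fi
  done
  # Target exists and no prerequisite is newer; with no prerequisites
  # at all, mere existence is enough (behavior 2).
  return 1
}
```

A process body could then guard its command with something like
`needs_update out.csv in.csv && some-command`, although the guard still
has to be repeated in every process.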

I can handle this in my Bash code, of course, but that becomes lengthy
even for this trivial case. Here is one version for each behavior:

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
      # -f: don't fail on the first run, when the file doesn't exist yet
      rm -f {{outputs}}
      wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
    }
==================================================

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
      test -f {{outputs}} || wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
    }
==================================================
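The two variants above could be factored into one reusable shell function
so that each process body shrinks to a single line. This is only a sketch,
assuming the function is made available to every process (for example by
sourcing a shared script); the name `produce` and its calling convention
are hypothetical, not a GWL feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GWL): run a command that produces FILE,
# applying one of the two policies from the list above.
#   produce replace FILE CMD...  -> always regenerate FILE
#   produce skip    FILE CMD...  -> run CMD only if FILE does not exist yet
produce() {
  local policy=$1 file=$2
  shift 2
  case $policy in
    skip)
      # make-like: an existing output is treated as up to date.
      [ -e "$file" ] && return 0
      ;;
    replace)
      # Remove any old copy first; -f keeps this quiet when there is none.
      rm -f -- "$file"
      ;;
    *)
      echo "produce: unknown policy '$policy'" >&2
      return 2
      ;;
  esac
  # Run the remaining arguments as the producing command.
  "$@"
}
```

With it, the download body would become
`produce skip {{outputs}} wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv`,
assuming `{{outputs}}` expands to the single output file as in the
examples above.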

Is there a better solution?

Cheers,
  Konrad.

