unofficial mirror of gwl-devel@gnu.org
* Managing data files in workflows
@ 2021-03-25  9:57 Konrad Hinsen
  2021-03-26  7:02 ` zimoun
  2021-03-26  8:47 ` Ricardo Wurmus
From: Konrad Hinsen @ 2021-03-25  9:57 UTC
  To: gwl-devel

Hi everyone,

Coming from make-like workflow systems, I wonder how data files are best
managed in a GWL workflow. GWL is clearly less file-centric than make
(which is a Good Thing in my opinion), but on a first reading of the
manual, it doesn't seem to care about files at all, except for
auto-connect.

A simple example:

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # { wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv }

workflow influenza-incidence
  processes download
==================================================

This works fine the first time, but the second time it fails because the
output file of the process already exists. This doesn't look very
useful. The two behaviors I do see as potentially useful are

 1) Always replace the file.
 2) Don't run the process if the output file already exists
    (as make would do by default)

I can handle this in my bash code of course, but that becomes lengthy
even for this trivial case:

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
      rm -f {{outputs}}
      wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
    }
==================================================

==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
      test -f {{outputs}} || wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
    }
==================================================
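
A sketch of how the two behaviors above could be factored into one shell
helper, so that each process body stays a single line. The function name
`produce` and its `replace`/`keep` policy argument are hypothetical and
not part of GWL; the helper only assumes that the wrapped command takes
its output file as the last argument.

```shell
#!/bin/sh
# Hypothetical helper, not a GWL feature.
#   produce replace OUT CMD...   always regenerate OUT
#   produce keep    OUT CMD...   skip CMD when OUT already exists
# CMD is invoked with a temporary file name appended as its last argument.
produce() {
    policy=$1
    out=$2
    shift 2
    if [ "$policy" = keep ] && [ -f "$out" ]; then
        return 0    # make-like: an existing output is reused
    fi
    mkdir -p "$(dirname "$out")" || return 1
    tmp="$out.tmp.$$"
    # Write to a temporary file first so that a failed run never leaves
    # a partial output behind, then rename it into place.
    if "$@" "$tmp"; then
        mv -f "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Possible use for the download above (needs network access):
# produce keep data/weekly-incidence.csv \
#     wget http://www.sentiweb.fr/datasets/incidence-PAY-3.csv -O
```

With `keep`, a second run leaves the existing file untouched; with
`replace`, the old file is only overwritten once the command has
succeeded.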

Is there a better solution?

Cheers,
  Konrad.



Thread overview: 15+ messages
2021-03-25  9:57 Managing data files in workflows Konrad Hinsen
2021-03-26  7:02 ` zimoun
2021-03-26 12:46   ` Konrad Hinsen
2021-03-26  8:47 ` Ricardo Wurmus
2021-03-26 12:30   ` Konrad Hinsen
2021-03-26 12:54     ` Konrad Hinsen
2021-03-26 13:13     ` Ricardo Wurmus
2021-03-26 15:36       ` Konrad Hinsen
2021-04-01 13:27         ` Ricardo Wurmus
2021-04-02  8:41           ` Konrad Hinsen
2021-04-07 11:38             ` Ricardo Wurmus
2021-04-08  7:28               ` Konrad Hinsen
2021-05-03  9:18                 ` Ricardo Wurmus
2021-05-03 11:58                   ` zimoun
2021-05-03 13:47                     ` Ricardo Wurmus
