From: Konrad Hinsen <konrad.hinsen@fastmail.net>
To: gwl-devel@gnu.org
Subject: Managing data files in workflows
Date: Thu, 25 Mar 2021 10:57:27 +0100
Message-ID: <m18s6bk12w.fsf@ordinateur-de-catherine--konrad.home>

Hi everyone,

Coming from make-like workflow systems, I wonder how data files are best
managed in GWL workflows. GWL is clearly less file-centric than make
(which is a Good Thing in my opinion), but on a first reading of the
manual, it doesn't seem to care about files at all, except for
auto-connect.
A simple example:
==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # { wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv }

workflow influenza-incidence
  processes download
==================================================
This works fine the first time, but on the second run it fails because
the output file of the process already exists. That doesn't look very
useful. The two behaviors I do see as potentially useful are:

 1) Always replace the file.
 2) Don't run the process if the output file already exists
    (as make would do by default).
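For reference, make's default rule behind behavior 2 can be sketched in bash: a target is rebuilt when it is missing or older than any of its prerequisites, so a target with no prerequisites, like the download above, is rebuilt only when it is missing. The function name `is_up_to_date` is hypothetical; it is not part of GWL or make, and this is an illustration of the rule, not how either tool implements it.

```shell
# Hypothetical helper (not part of GWL or make): make's default freshness rule.
# Succeeds (exit 0) when TARGET exists and no prerequisite is newer than it.
# With no prerequisites it only checks that TARGET exists (behavior 2).
is_up_to_date() {
    local target=$1
    shift
    [ -e "$target" ] || return 1          # missing target: must be built
    local prereq
    for prereq in "$@"; do
        # -nt is supported by bash's test builtin (not strictly POSIX)
        if [ "$prereq" -nt "$target" ]; then
            return 1                      # a prerequisite changed since
        fi
    done
    return 0
}
```

With it, the second workaround would read `is_up_to_date data/weekly-incidence.csv || wget -O data/weekly-incidence.csv http://www.sentiweb.fr/datasets/incidence-PAY-3.csv`; listing input files after the target gives make's timestamp behavior as well.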
I can handle this in my bash code, of course, but that becomes lengthy
even for this trivial case:
==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
    rm -f {{outputs}}
    wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
  }
==================================================
==================================================
process download
  packages "wget"
  outputs
    file "data/weekly-incidence.csv"
  # {
    test -f {{outputs}} || wget -O {{outputs}} http://www.sentiweb.fr/datasets/incidence-PAY-3.csv
  }
==================================================
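A possible refinement of the first workaround, sketched under the assumption that the process script runs in bash: write the download into a temporary file next to the output and rename it over the output only when the command succeeds, so the file is always replaced but never left truncated by a failed download. `replace_output` is a hypothetical helper name. It only changes what happens inside the script, not how GWL itself treats an existing output file.

```shell
# Hypothetical helper (not part of GWL): behavior 1 without a half-written file.
# Runs CMD with stdout redirected into a temporary file next to OUTPUT and
# renames it over OUTPUT only if CMD succeeds; on failure the old OUTPUT stays.
# Note: mktemp creates the temporary file with mode 0600.
replace_output() {
    local output=$1
    shift
    local tmp
    tmp=$(mktemp "$output.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        mv -f "$tmp" "$output"            # same directory, so a plain rename
    else
        local status=$?                   # exit status of the failed CMD
        rm -f "$tmp"
        return "$status"
    fi
}
```

Assuming `{{outputs}}` expands to the single output file name, the snippet body would become `replace_output {{outputs}} wget -q -O - http://www.sentiweb.fr/datasets/incidence-PAY-3.csv`, where `-O -` makes wget write to standard output.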
Is there a better solution?
Cheers,
Konrad.