unofficial mirror of gwl-devel@gnu.org
From: Ricardo Wurmus <rekado@elephly.net>
To: Konrad Hinsen <konrad.hinsen@fastmail.net>
Cc: gwl-devel@gnu.org
Subject: Re: Managing data files in workflows
Date: Mon, 03 May 2021 11:18:18 +0200
Message-ID: <87a6pckwf9.fsf@elephly.net>
In-Reply-To: <m1lf9tfd5u.fsf@ordinateur-de-catherine--konrad.home>


Konrad Hinsen <konrad.hinsen@fastmail.net> writes:

> Hi Ricardo,
>
>> We can fix the problem with symlinks by restoring the target of the
>> link instead of the link itself, but I feel that we need to take a
>> step back and consider what this cache is really to be used for.
>
> Indeed, and I have to admit that this isn't clear to me yet. What is
> it supposed to protect against? Modification of files by other
> processes of the workflow? Modification of files outside of the
> workflow? Both?
>
> For the second situation (modification outside of the workflow), I
> think it would be sufficient to store a checksum, and terminate the
> workflow with an error if it detects such tampering.
>
> The first situation is more difficult. There are actually two cases:
>  1. The workflow intentionally updates files as it proceeds.
>  2. The workflow modifies a file by mistake.
>
> Only the workflow author can make the distinction, so this needs some
> specific input syntax. Case 2 could then again be handled by a simple
> checksum test for signalling an error.
>
> This leaves case 1, for which the only good solution is to make a copy
> of the file at the end of each process, and restore it in later runs.

Yes, you are right.  On wip-drmaa I changed the cache to never 
symlink.  It either hardlinks or copies.  This solves the 
immediate problem.
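
As an illustration only (this is not the code on wip-drmaa), the
"hardlink, else copy, never symlink" behaviour could be sketched in
Python as follows.  The function name `cache_store`, its arguments and
the fallback logic are all assumptions made for the example:

```python
import os
import shutil
from pathlib import Path


def cache_store(source: Path, cache_entry: Path) -> str:
    """Put `source` into the cache and report which method was used.

    Hypothetical helper: the cache never ends up holding a symlink.
    """
    # Resolve symlinks first, so the cache holds the link's target
    # rather than the link itself.
    real_source = source.resolve(strict=True)
    cache_entry.parent.mkdir(parents=True, exist_ok=True)

    # Replace any stale entry left over from an earlier run.
    if cache_entry.exists() or cache_entry.is_symlink():
        cache_entry.unlink()

    try:
        # Cheap path: a hardlink shares the data with the original file.
        os.link(real_source, cache_entry)
        return "hardlink"
    except OSError:
        # Hardlinks fail across devices or on filesystems that do not
        # support them; fall back to an independent copy.
        shutil.copy2(real_source, cache_entry)
        return "copy"
```

Either way the cache entry is a regular file, so restoring it later
cannot produce a dangling link.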

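Yes, the semantics of hardlinks and copies differ (a hardlinked 
cache entry shares its data with the original file, so an in-place 
modification shows up in both, whereas a copy is independent), but 
since our assumption is that intermediate files are reproducible, 
we can ignore this at this point.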

I want to make the cache store/restore actions configurable, 
though, so that you can implement whatever caching method you want 
(including caching by copying to AWS S3).  
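
To make the idea of configurable store/restore actions concrete, here
is a rough Python sketch.  Nothing in it is GWL syntax or an existing
API: `CacheBackend`, `copying_backend` and the `(file, key)` calling
convention are assumptions chosen for the example.  A backend is just
a pair of procedures, so a remote one (S3 or anything else) would
supply its own pair:

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class CacheBackend:
    """A hypothetical pluggable cache: one action to store, one to restore."""
    store: Callable[[Path, str], None]     # (file, key) -> None
    restore: Callable[[str, Path], bool]   # (key, destination) -> found?


def copying_backend(cache_dir: Path) -> CacheBackend:
    """Build a backend that caches by copying into a local directory."""

    def store(file: Path, key: str) -> None:
        cache_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, cache_dir / key)

    def restore(key: str, destination: Path) -> bool:
        entry = cache_dir / key
        if not entry.is_file():
            return False  # cache miss: the process has to run
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(entry, destination)
        return True

    return CacheBackend(store=store, restore=restore)
```

The workflow engine would only ever call `backend.store` and
`backend.restore`, without knowing where the data ends up.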

I’d like to introduce modifiers “immutable” and “mutable”, so that 
you can write “immutable file "whatever" you "want"” etc. 
“immutable” would take care of recording hashes and checking 
previously recorded hashes in a local state directory.
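
A minimal Python sketch of what "recording hashes and checking
previously recorded hashes" might amount to.  The choice of SHA-256,
the `hashes.json` file name, and the names `check_immutable` and
`ImmutableFileChanged` are assumptions for illustration, not a
description of how GWL does or will do it:

```python
import hashlib
import json
from pathlib import Path


class ImmutableFileChanged(Exception):
    """Raised when a file declared immutable no longer matches its record."""


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so that large data files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_immutable(path: Path, state_dir: Path) -> None:
    """Record the hash on first sight; fail if it differs later on."""
    state_dir.mkdir(parents=True, exist_ok=True)
    record_file = state_dir / "hashes.json"  # hypothetical state file
    records = json.loads(record_file.read_text()) if record_file.exists() else {}

    key = str(path.resolve())
    current = sha256_of(path)
    recorded = records.get(key)

    if recorded is None:
        # First run: remember the hash for future checks.
        records[key] = current
        record_file.write_text(json.dumps(records, indent=2))
    elif recorded != current:
        raise ImmutableFileChanged(
            f"{path}: expected {recorded}, found {current}")
```

Calling `check_immutable` before a process reads its input would turn
silent modification into an error that stops the workflow.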

-- 
Ricardo



