From: Ricardo Wurmus <rekado@elephly.net>
To: Konrad Hinsen <konrad.hinsen@fastmail.net>
Cc: gwl-devel@gnu.org
Subject: Re: Managing data files in workflows
Date: Fri, 26 Mar 2021 14:13:16 +0100
Message-ID: <87czvmt5w3.fsf@elephly.net>
In-Reply-To: <m1k0puhzbg.fsf@ordinateur-de-catherine--konrad.home>


Konrad Hinsen <konrad.hinsen@fastmail.net> writes:

>> This works for me correctly:
>
> Thanks for looking into this! For me, your change makes no difference.
> Nor should it, because in my setup the "data" directory already exists.
> I still get an error message about the already existing file.
>
> Maybe it's time to switch to the development version of GWL!

Hmm, I don’t see any commits since 0.3.0 that would affect the cache
implementation.  GWL computes cache hashes for all processes and the
processes they depend on.  In your case it’s trivial: there’s just one
process.  The process definition is hashed and looked up in the cache
to see if there is any output for the given process hash.

In my test case this file exists:

    /tmp/gwl/lf6uca7zcyyldkcrxn3zwc275ax3ip676aqgjo75ybwojtl4emoq/data/weekly-incidence.csv

/tmp/gwl is the cache prefix, and the hash corresponds to the process.
Since data/weekly-incidence.csv exists and that’s the only declared
output, GWL decides not to compute the output again.

At least that happens in my case.  I wonder why it doesn’t work in your
case.
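
To make the lookup concrete, here is a rough sketch in Python.  It is
not GWL's actual code (GWL is written in Guile): the function names are
made up, and plain SHA-256 over the process definition text is only a
stand-in for however GWL really derives the hash.  It shows the decision
described above: build PREFIX/HASH/OUTPUT and skip the process if every
declared output is already there.

```python
import hashlib
import os
import tempfile


def process_hash(definition: str) -> str:
    """Hypothetical stand-in for the process hash: SHA-256 of the definition text."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()


def cached_outputs_exist(prefix: str, definition: str, outputs: list[str]) -> bool:
    """Return True if every declared output already exists under PREFIX/HASH/."""
    root = os.path.join(prefix, process_hash(definition))
    return all(os.path.exists(os.path.join(root, out)) for out in outputs)


# Demo with a throwaway cache prefix (assumed layout: PREFIX/HASH/OUTPUT).
prefix = tempfile.mkdtemp()
definition = "process download: outputs data/weekly-incidence.csv"
outputs = ["data/weekly-incidence.csv"]

# Nothing has been computed yet, so this is a cache miss.
miss_before = cached_outputs_exist(prefix, definition, outputs)

# Simulate a finished run by creating the declared output in the cache.
target = os.path.join(prefix, process_hash(definition), outputs[0])
os.makedirs(os.path.dirname(target), exist_ok=True)
open(target, "w").close()

# Same definition: hit.  Edited definition: different hash, so a miss.
hit_after = cached_outputs_exist(prefix, definition, outputs)
miss_changed = cached_outputs_exist(prefix, definition + " # edited", outputs)

print(miss_before, hit_after, miss_changed)
```

Note that in this sketch an existing "data" directory in the working
tree plays no role; only the file under the cache prefix matters.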

> However, what I had in mind with my question is the management of
> intermediate results in my workflow, especially in its development
> phase. If I change my workflow file, or a script that it calls,
> I'd want only the affected steps to be recomputed. That's not much
> of an issue for my current test case, but I have bigger dreams for
> the future ;-)

Yes, that’s the way it’s supposed to work already.  GWL computes a hash
for each chain of processes; the hash covers the generated process
script, its inputs, and the hashes of all processes that lead up to this
process.  Any change in the chain produces a new hash and thus a cache
miss, so GWL recomputes the affected processes.
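
As an illustration only, the chaining can be sketched like this in
Python.  Again this is not GWL's implementation: the dictionary layout,
the field names "script", "inputs" and "after", and the use of SHA-256
are all assumptions made for the example.  The point is just that a
process hash folds in the hashes of its upstream processes, so an edit
invalidates that process and everything downstream, but nothing
upstream.

```python
import hashlib


def chain_hash(name, processes, memo=None):
    """Hash a process together with the hashes of everything it depends on."""
    memo = {} if memo is None else memo
    if name not in memo:
        proc = processes[name]
        h = hashlib.sha256()
        h.update(proc["script"].encode("utf-8"))
        for inp in sorted(proc["inputs"]):
            h.update(b"\0" + inp.encode("utf-8"))
        # Folding in upstream hashes makes any upstream change propagate.
        for dep in sorted(proc["after"]):
            h.update(b"\0" + chain_hash(dep, processes, memo).encode("utf-8"))
        memo[name] = h.hexdigest()
    return memo[name]


# Hypothetical three-step workflow: download -> clean -> plot.
workflow = {
    "download": {"script": "fetch data", "inputs": [], "after": []},
    "clean": {"script": "drop bad rows", "inputs": ["raw.csv"], "after": ["download"]},
    "plot": {"script": "draw figure", "inputs": ["clean.csv"], "after": ["clean"]},
}

# Edit only the script of the middle step.
edited = {name: dict(proc) for name, proc in workflow.items()}
edited["clean"]["script"] = "drop bad rows and duplicates"

changed = [n for n in workflow
           if chain_hash(n, workflow) != chain_hash(n, edited)]
print(changed)  # ['clean', 'plot']
```

Here "download" keeps its hash and would be served from the cache,
while "clean" and "plot" get new hashes and would be recomputed.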

-- 
Ricardo


