unofficial mirror of gwl-devel@gnu.org
From: Konrad Hinsen <konrad.hinsen@fastmail.net>
To: Ricardo Wurmus <rekado@elephly.net>
Cc: gwl-devel@gnu.org
Subject: Re: Managing data files in workflows
Date: Fri, 02 Apr 2021 10:41:35 +0200
Message-ID: <m1lfa1f58g.fsf@ordinateur-de-catherine--konrad.home>
In-Reply-To: <87h7kq2kzy.fsf@elephly.net>

Hi Ricardo,

> Maybe.  You could run with “--dry-run” to see what GWL claims it would
> do to confirm that it considers the file to be “not cached”.
>
> Also enable more log events (in particular cache events) with
>
> “--log-events=error,info,execute,cache,debug”

Thanks, I think I made progress with those nice debugging aids.

When I run my workflow for the first time, I see

  cache: Caching `./data/weekly-incidence.csv' as
  `/tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra/./data/weekly-incidence.csv'

The '.' in there looks suspect. Let's see what I got:

   $ ls -lR /tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra
   /tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra:
   total 4
   drwxrwxr-x 2 hinsen hinsen 4096  2 avril 10:13 data

   /tmp/gwl/mwmeuuhnu7sv4mpouj7o5x4se4qp5n5auzhpkb7y7oxidoxzc6ra/data:
   total 0
   lrwxrwxrwx 1 hinsen hinsen 27  2 avril 10:13 weekly-incidence.csv -> ./data/weekly-incidence.csv

That's an invalid symbolic link, so it's not surprising that a second
run doesn't find the cached file.
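
The dangling link can be reproduced outside GWL. The following is a
minimal Python sketch, not GWL code: the helper names
(cache_with_symlink, demo) and the directory layout are hypothetical.
It assumes a POSIX system, where a relative symlink target is resolved
against the directory containing the link, not against the directory
the process was started in.

```python
import os
import tempfile


def cache_with_symlink(cache_dir, target, absolute):
    """Create a cache entry for `target` as a symlink and return its path.

    Hypothetical helper: it mirrors the layout seen in the log
    (cache_dir + target path), not GWL's actual implementation.
    """
    link = os.path.join(cache_dir, os.path.normpath(target).lstrip(os.sep))
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # The link content is either the path as given or its absolute form.
    destination = os.path.abspath(target) if absolute else target
    os.symlink(destination, link)
    return link


def demo():
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        project = os.path.join(root, "project")
        cache = os.path.join(root, "cache")
        os.makedirs(os.path.join(project, "data"))
        os.makedirs(cache)
        os.chdir(project)
        try:
            target = "./data/weekly-incidence.csv"
            with open(target, "w") as f:
                f.write("week,incidence\n")
            rel = cache_with_symlink(os.path.join(cache, "rel"), target, False)
            abs_ = cache_with_symlink(os.path.join(cache, "abs"), target, True)
            return {
                # os.path.exists follows symlinks, so False means "dangling".
                "relative_resolves": os.path.exists(rel),
                "absolute_resolves": os.path.exists(abs_),
                "relative_points_to": os.readlink(rel),
            }
        finally:
            os.chdir(old_cwd)


if __name__ == "__main__":
    print(demo())
```

In this sketch the relative link is looked up as
<cache>/rel/data/./data/weekly-incidence.csv, which does not exist,
while the link created from the absolute path resolves.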

When I use an absolute filename to refer to my download target, the
symlink in the cache is valid and points to the downloaded file. And
when I run the workflow a second time, it skips the "download" process
as expected. But then, it fails trying to "restore" the file:

   run: Skipping process "download" (cached at /tmp/gwl/ubvscxwoezl63qmvyfszlf6azmuc655h7gbbtosqshlm5r6ckyhq/).
   cache: Restoring `/tmp/gwl/ubvscxwoezl63qmvyfszlf6azmuc655h7gbbtosqshlm5r6ckyhq//home/hinsen/projects/mooc-workflows/influenza-analysis/data/weekly-incidence.csv' to `/home/hinsen/projects/mooc-workflows/influenza-analysis/data/weekly-incidence.csv'
   Backtrace:
              6 (primitive-load "/home/hinsen/.config/guix/current/bin/guix")
   In guix/ui.scm:
     2164:12  5 (run-guix-command _ . _)
   In srfi/srfi-1.scm:
      460:18  4 (fold #<procedure 7f45ba1d5c40 at gwl/workflows.scm:388:2 (ite…> …)
      460:18  3 (fold #<procedure 7f45ba1d5c20 at gwl/workflows.scm:391:13 (pr…> …)
   In gwl/workflows.scm:
      392:21  2 (_ #<process download> ())
   In srfi/srfi-1.scm:
       634:9  1 (for-each #<procedure 7f45ba1d57e0 at gwl/workflows.scm:549:26…> …)
   In guix/ui.scm:
       566:4  0 (_ system-error "symlink" _ _ _)

   guix/ui.scm:566:4: In procedure symlink: Operation not permitted: "/home/hinsen/projects/mooc-workflows/influenza-analysis/data/weekly-incidence.csv"

Looking at the source code in (gwl cache), restoring means making the
target file a symlink to the cached file, which can't work given that
the cache entry is already a symlink to the target file.
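
The circularity can be illustrated with a small Python sketch. This is
not GWL code; the function name demo_restore and the file names are
hypothetical, and a POSIX system is assumed. The exact error differs
from the "Operation not permitted" in the backtrace above; the sketch
only shows the structural problem: with the cache entry pointing at the
target, a restore either collides with the existing target or, if the
target is removed first, produces two links pointing at each other with
no data behind them.

```python
import errno
import os
import tempfile


def demo_restore():
    """Cache a file as a symlink, then try to 'restore' it by symlinking back."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        target = os.path.join(root, "weekly-incidence.csv")
        cached = os.path.join(root, "cache-entry.csv")
        with open(target, "w") as f:
            f.write("week,incidence\n")

        # "Caching": the cache entry is a symlink to the target file.
        os.symlink(target, cached)

        outcome = {}

        # "Restoring", attempt 1: link target -> cache entry while the
        # target still exists. symlink() refuses to overwrite it.
        try:
            os.symlink(cached, target)
            outcome["plain"] = "ok"
        except FileExistsError:
            outcome["plain"] = "target exists"

        # Attempt 2: remove the target first. The link can now be
        # created, but the data is gone and the two links form a loop.
        os.remove(target)
        os.symlink(cached, target)
        try:
            with open(target) as f:
                f.read()
            outcome["forced"] = "ok"
        except OSError as e:
            outcome["forced"] = errno.errorcode.get(e.errno, str(e.errno))

        return outcome


if __name__ == "__main__":
    print(demo_restore())
```

Either the cache has to hold a real copy (or hard link) of the file, in
which case restoring by symlink makes sense, or it holds symlinks, in
which case there is nothing to restore.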

So... I don't understand how the cache is supposed to work. If it stores
symlinks, there is no need to restore anything. If it is supposed to
store copies, then that's not what it does. My original issue with the
relative filename is a detail that should be easy to fix.

Cheers,
  Konrad.

