Re: [GWL] (random) next steps?

From: Ricardo Wurmus <rekado@elephly.net>
To: zimoun <zimon.toutoune@gmail.com>
Cc: Guix Devel <guix-devel@gnu.org>, gwl-devel@gnu.org
Subject: Re: [GWL] (random) next steps?
Date: Fri, 21 Dec 2018 21:06:08 +0100
Message-ID: <87va3mr6fl.fsf@elephly.net>
In-Reply-To: <CAJ3okZ3DxZz017f4W2=Of2WcP79BXnYXg3D3tUitpgHFSaxr_w@mail.gmail.com>

Hi Simon,

>> > 6.
>> > The graph of dependencies between the processes/units/rules is written
>> > by hand. What would be the best strategy to capture it? By files "à
>> > la" Snakemake? Something else?
>>
>> The GWL currently does not use the input information provided by the
>> user in the data-inputs field.  For the content-addressable store we
>> will need to change this.  The GWL will then be able to determine that
>> data-inputs are in fact the outputs of other processes.
>
> Hmm? Nice, but how?
> I mean, the graph cannot be deduced; it needs to be written by
> hand somehow, doesn't it?

We can build the graph by matching the declared inputs of one process
against the declared outputs of another.
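
As a rough sketch of that matching step, in plain Guile rather than the
GWL API: the <proc> record, its fields, and the example file names below
are made up for illustration, and the real process type would look
different.

```scheme
(use-modules (srfi srfi-1) (srfi srfi-9))

;; Hypothetical stand-in for a GWL process: a name plus the files it
;; declares as inputs and outputs.
(define-record-type <proc>
  (make-proc name inputs outputs)
  proc?
  (name    proc-name)
  (inputs  proc-inputs)
  (outputs proc-outputs))

;; Return a list of (PRODUCER . CONSUMER) name pairs.  An edge exists
;; whenever a declared input of the consumer is also a declared output
;; of the producer.
(define (dependency-edges procs)
  (append-map
   (lambda (consumer)
     (filter-map
      (lambda (producer)
        (and (not (eq? producer consumer))
             (any (lambda (file)
                    (member file (proc-outputs producer)))
                  (proc-inputs consumer))
             (cons (proc-name producer) (proc-name consumer))))
      procs))
   procs))

;; Illustrative processes; none of these names come from the GWL.
(define align
  (make-proc 'align '("reads.fastq") '("aligned.bam")))
(define sort-bam
  (make-proc 'sort '("aligned.bam") '("sorted.bam")))
(define report
  (make-proc 'report '("sorted.bam" "aligned.bam") '("report.html")))

(define edges (dependency-edges (list align sort-bam report)))
;; edges => ((align . sort) (align . report) (sort . report))
```

Nothing here is written by hand except the per-process declarations;
the edges fall out of comparing them.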

With a content-addressed store we would run processes in isolation and
map the declared data inputs into the environment.  Instead of working
in the global namespace of the shared file system we can learn from Guix
and strictly control the execution environment.  After a process has run
to completion, only the files that were declared as outputs end up in the
content-addressed store.
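
The last step could be pictured like this.  It is only a sketch in
plain Guile: `select-outputs' and the file names are hypothetical, and
a real implementation would walk the isolated build directory instead
of taking a list.

```scheme
(use-modules (srfi srfi-1))

;; Split the files found in the isolated environment after a run into
;; those that were declared as outputs (kept) and everything else
;; (discarded).  Returns two values.
(define (select-outputs declared produced)
  (partition (lambda (file) (member file declared)) produced))

(define selection
  (call-with-values
      (lambda ()
        (select-outputs '("result.bam" "meta.xml")
                        '("tmp/scratch.dat" "result.bam"
                          "meta.xml" "log.txt")))
    (lambda (kept discarded)
      (list kept discarded))))
;; selection => (("result.bam" "meta.xml") ("tmp/scratch.dat" "log.txt"))
```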

A process could declare outputs like this:

    (define the-process
      (process
        (name 'foo)
        (outputs
         '((result "path/to/result.bam")
           (meta   "path/to/meta.xml")))))

Other processes can then access these files with:

    (output the-process 'result)

i.e. the file corresponding to the declared output “result” of the
process named by the variable “the-process”.
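
To make the lookup concrete, here is one way such an accessor might
behave.  This is a sketch only: the process is modelled as a plain
association list, which is an assumption for illustration and not how
the GWL represents processes.

```scheme
;; Hypothetical representation: an association list with a `name' and
;; an `outputs' entry, mirroring the declaration above.
(define the-process
  '((name . foo)
    (outputs . ((result "path/to/result.bam")
                (meta   "path/to/meta.xml")))))

;; Return the file declared under KEY in the outputs of PROC, or signal
;; an error if PROC declares no such output.
(define (output proc key)
  (let ((entry (assq key (assq-ref proc 'outputs))))
    (if entry
        (cadr entry)
        (error "process declares no such output:" key))))

(output the-process 'result)   ; => "path/to/result.bam"
(output the-process 'meta)     ; => "path/to/meta.xml"
```

Referring to outputs by key rather than by path means a consumer never
needs to know where the file ends up in the store.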

The question here is just how far we want to take the idea of “content
addressed” – is it enough to take the hash of all inputs or do we need
to compute the output hash, which could be much more expensive?
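
The two options might be sketched as follows.  Everything here is an
assumption for illustration: file contents are passed in as strings,
and `digest' uses Guile's built-in, non-cryptographic `string-hash'
purely to keep the example dependency-free; a real store would use
something like SHA-256 over the actual files.

```scheme
;; Stand-in for a cryptographic hash; NOT suitable for a real store.
(define (digest str)
  (number->string (string-hash str) 16))

;; Option 1: derive the key from the process name and the hashes of
;; its inputs.  This can be computed before the process runs.
(define (input-key name input-contents)
  (digest (string-join (cons (symbol->string name)
                             (map digest input-contents))
                       ":")))

;; Option 2: derive the key from the contents of the outputs.  This is
;; only available after the process has run, and every output has to
;; be read in full.
(define (output-key output-contents)
  (digest (string-join (map digest output-contents) ":")))

(define key-before-run (input-key 'foo '("input data 1" "input data 2")))
(define key-after-run  (output-key '("result data" "meta data")))
```

With the first option an existing result can be found without running
anything; the second also detects two different processes producing
identical files, at the price of hashing potentially large outputs.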

--
Ricardo
