References: <CAJ3okZ1Wy8eOGgnvFQN-ay-j37HCjFbYoT3EobkvRNULq0eJHA@mail.gmail.com>
	<874lbfxijq.fsf@elephly.net>
	<CAJ3okZ3DxZz017f4W2=Of2WcP79BXnYXg3D3tUitpgHFSaxr_w@mail.gmail.com>
From: Ricardo Wurmus <rekado@elephly.net>
Subject: Re: [GWL] (random) next steps?
Message-ID: <87va3mr6fl.fsf@elephly.net>
In-reply-to: <CAJ3okZ3DxZz017f4W2=Of2WcP79BXnYXg3D3tUitpgHFSaxr_w@mail.gmail.com>
Date: Fri, 21 Dec 2018 21:06:08 +0100
To: zimoun <zimon.toutoune@gmail.com>
Cc: Guix Devel <guix-devel@gnu.org>, gwl-devel@gnu.org


Hi simon,

>> > 6.
>> > The graph of dependencies between the processes/units/rules is written
>> > by hand. What should be the best strategy to capture it ? By files "à
>> > la" Snakemake ? Other ?
>>
>> The GWL currently does not use the input information provided by the
>> user in the data-inputs field.  For the content-addressable store we
>> will need to change this.  The GWL will then be able to determine that
>> data-inputs are in fact the outputs of other processes.
>
> Hum? nice but how?
> I mean, the graph cannot be deduced and it needs to be written by
> hand, somehow. Isn't it?

We can build the graph by joining the declared inputs of one process
with the declared outputs of another: if process B consumes a file that
process A produces, then B depends on A.
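
A minimal sketch of that join in Guile Scheme, assuming a hypothetical
`<proc>` record that only carries a name and plain lists of input and
output file names (the real GWL process record has more fields and
different accessors).  An edge is recorded whenever one of a process's
inputs appears among another process's outputs:

```scheme
(use-modules (srfi srfi-1)   ; any, append-map, filter-map
             (srfi srfi-9))  ; define-record-type

;; Hypothetical, stripped-down process: a name plus the files it reads
;; and writes.  This is NOT the GWL process record.
(define-record-type <proc>
  (make-proc name inputs outputs)
  proc?
  (name    proc-name)
  (inputs  proc-inputs)    ; list of file names
  (outputs proc-outputs))  ; list of file names

;; Return one edge (PRODUCER . CONSUMER) for every pair of distinct
;; processes where some input of CONSUMER is an output of PRODUCER.
(define (dependency-edges procs)
  (append-map
   (lambda (consumer)
     (filter-map
      (lambda (producer)
        (and (not (eq? producer consumer))
             (any (lambda (file) (member file (proc-outputs producer)))
                  (proc-inputs consumer))
             (cons (proc-name producer) (proc-name consumer))))
      procs))
   procs))

(define align    (make-proc 'align    '("reads.fq")    '("aligned.bam")))
(define quantify (make-proc 'quantify '("aligned.bam") '("counts.tsv")))
(define plot     (make-proc 'plot     '("counts.tsv")  '("plot.pdf")))

(dependency-edges (list align quantify plot))
;; ⇒ ((align . quantify) (quantify . plot))
```

No edge is written by hand here; they all follow from the declared file
names, which is why those declarations have to be accurate.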

With a content-addressed store we would run each process in isolation
and map only its declared data inputs into its environment.  Instead of
working on the global namespace of the shared file system, we can learn
from Guix and strictly control the execution environment.  After a
process has run to completion, only the files that were declared as
outputs end up in the content-addressed store.
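
The filtering step at the end of a run could be sketched like this in
Guile Scheme.  `collect-outputs` is a hypothetical name; it takes an
association list of declared outputs as (label file) entries and the
list of files an isolated run left in its scratch directory, and splits
the latter into files to keep and files to drop.  Treating a missing
declared output as an error is an assumption of this sketch, not
settled GWL behavior:

```scheme
(use-modules (srfi srfi-1))   ; any, partition

;; Hypothetical: DECLARED is an alist of (label file) entries, PRESENT
;; the files an isolated run left behind.  Returns two values: the
;; files to move into the store and the files to discard.
(define (collect-outputs declared present)
  ;; Assumption of this sketch: every declared output must exist.
  (for-each (lambda (entry)
              (unless (member (cadr entry) present)
                (error "declared output was not produced" (car entry))))
            declared)
  ;; Keep declared files, drop everything else (temporary files etc.).
  (partition (lambda (file)
               (any (lambda (entry) (equal? file (cadr entry))) declared))
             present))

(call-with-values
    (lambda ()
      (collect-outputs '((result "path/to/result.bam")
                         (meta   "path/to/meta.xml"))
                       '("path/to/result.bam" "tmp/scratch.sam"
                         "path/to/meta.xml")))
  list)
;; ⇒ (("path/to/result.bam" "path/to/meta.xml") ("tmp/scratch.sam"))
```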

A process could declare outputs like this:

    (define the-process
      (process
        (name 'foo)
        (outputs
         '((result "path/to/result.bam")
           (meta   "path/to/meta.xml")))))

Other processes can then access these files with:

    (output the-process 'result)

i.e. the file corresponding to the declared output “result” of the
process named by the variable “the-process”.
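
To make the lookup concrete, here is a hypothetical stand-in in Guile
Scheme: `sketch-process` is a plain record holding a name and the
outputs association list, and `output` looks up a label in it.  This is
not how GWL implements `process` or `output`; it only illustrates the
intended semantics:

```scheme
(use-modules (srfi srfi-9))   ; define-record-type

;; Hypothetical stand-in for the proposed process: a name and an alist
;; mapping output labels to file names.
(define-record-type <sketch-process>
  (sketch-process name outputs)
  sketch-process?
  (name    sketch-process-name)
  (outputs sketch-process-outputs))

;; Return the file declared under LABEL; an undeclared label is an
;; error, so a typo is caught at lookup time.
(define (output proc label)
  (let ((entry (assq label (sketch-process-outputs proc))))
    (if entry
        (cadr entry)
        (error "undeclared output" (sketch-process-name proc) label))))

(define the-process
  (sketch-process 'foo
                  '((result "path/to/result.bam")
                    (meta   "path/to/meta.xml"))))

(output the-process 'result)   ; ⇒ "path/to/result.bam"
```

Because a consumer obtains the file name through the producer object
rather than by repeating the string, its dependency on `the-process` is
explicit.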

The question here is just how far we want to take the idea of “content
addressed”: is it enough to take the hash of all inputs, or do we need
to compute the hash of the outputs, which could be much more expensive?
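
A sketch of the two options, under loud assumptions: it uses a toy
64-bit FNV-1a hash only so that the code runs without extra libraries
(a real store would use a cryptographic hash such as SHA-256), and
`input-key` and `output-key` are hypothetical names.  The input key is
computed from the process code and the keys of its inputs before
anything runs; the output key requires reading every produced byte
after the run:

```scheme
(use-modules (rnrs bytevectors)   ; string->utf8, bytevector->u8-list
             (srfi srfi-1))       ; fold

;; Toy 64-bit FNV-1a hash.  NOT cryptographic; used only to keep the
;; sketch self-contained.
(define (fnv1a-64 bv)
  (fold (lambda (byte h)
          (modulo (* (logxor h byte) 1099511628211)
                  (expt 2 64)))
        14695981039346656037
        (bytevector->u8-list bv)))

(define (hex n) (number->string n 16))

;; Hypothetical input-addressed key: hash of the process code plus the
;; keys of its inputs.  Known before the run; its cost does not depend
;; on the size of the data.
(define (input-key code input-keys)
  (hex (fnv1a-64 (string->utf8 (string-join (cons code input-keys) "\n")))))

;; Hypothetical content-addressed key: hash of the bytes the process
;; actually produced.  Known only after the run; its cost grows with
;; the size of every output.
(define (output-key output-bytes)
  (hex (fnv1a-64 output-bytes)))

;; A downstream key built from an upstream key, without touching data.
(define upstream   (input-key "align reads" '()))
(define downstream (input-key "count features" (list upstream)))
```

Input keys are cheap and known in advance, but identical results from
different code end up under different keys; output keys deduplicate
byte-identical results at the price of hashing potentially large files
such as BAMs.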

--
Ricardo