From: Ricardo Wurmus
To: zimoun
Cc: gwl-devel@gnu.org
Date: Wed, 16 Jan 2019 23:08:34 +0100
Subject: Re: [GWL] (random) next steps?

Hi simon,

[- guix-devel@gnu.org]

I wrote:

> We can connect a graph by joining the inputs of one process with the
> outputs of another.
>
> With a content addressed store we would run processes in isolation and
> map the declared data inputs into the environment. Instead of working
> on the global namespace of the shared file system we can learn from Guix
> and strictly control the execution environment. After a process has run
> to completion, only files that were declared as outputs end up in the
> content addressed store.
>
> A process could declare outputs like this:
>
>     (define the-process
>       (process
>         (name 'foo)
>         (outputs
>          '((result "path/to/result.bam")
>            (meta "path/to/meta.xml")))))
>
> Other processes can then access these files with:
>
>     (output the-process 'result)
>
> i.e.
> the file corresponding to the declared output “result” of the
> process named by the variable “the-process”.

You wrote:

> From my point of view, there are 2 different paths:
>  1- the inputs-outputs are attached to the process/rule/unit
>  2- the processes/rules/units are pure functions and the
>     `workflow' describes how to glue them together.
[…]
> On one hand, with path 1-, it is hard to reuse a process/rule
> because the composition is hard-coded in the inputs-outputs
> (duplication of the same process/rule with different inputs-outputs).
> The graph is written by the user when they write the inputs-outputs
> chain.
> On the other hand, with path 2-, it is difficult to provide both
> the inputs-outputs to the function and the graph without
> duplicating some code.

I agree with this assessment. I would like to note, though, that at
least the declaration of outputs works in both systems. Only when an
exact input is tightly attached to a process/rule do we limit ourselves
to the first path, where composition is inflexible.

> Last, is it useful to write the intermediate files to disk if they are
> not stored?
> In the thread [0], we discussed the possibility of streaming through pipes.
> Let's say, the simple case:
>     filter input > filtered
>     quality filtered > output
> and the piped version is better if you do not care about the filtered file:
>     filter input | quality > output
>
> However, the classic pipe does not fit this case:
>     filter input_R1 > R1_filtered
>     filter input_R2 > R2_filtered
>     align R1_filtered R2_filtered > output_aligned
> In general, one is not interested in keeping the files
> R{1,2}_filtered. So why spend time writing them to disk and hashing
> them?
>
> In other words, is it doable to stream the `processes' at the process
> level?
[…]
> [0] http://lists.gnu.org/archive/html/guix-devel/2018-07/msg00231.html

For this to work at all, inputs and outputs must be declared.
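To make the paired-read case from the quoted question concrete: outside of any workflow language, a POSIX shell can already stream it through named pipes (FIFOs), so R1_filtered and R2_filtered never exist as regular files. This is only a sketch under stated assumptions: filter and align are hypothetical stand-ins, defined here as toy shell functions so the snippet runs, and it relies on align reading each input once, front to back.

```shell
# Sketch: stream two filtered read files into an aligner through named
# pipes instead of intermediate files on disk.
# ASSUMPTION: `filter` and `align` are HYPOTHETICAL stand-ins for real
# tools, defined as tiny shell functions so the example is runnable.
filter() { grep -v '^#' "$1"; }   # stand-in: drop comment lines
align()  { paste "$1" "$2"; }     # stand-in: pair up lines of R1 and R2

workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Toy inputs, made up for illustration.
printf '#header\nACGT\nTTGA\n' > input_R1
printf '#header\nTGCA\nCCAT\n' > input_R2

# Named pipes take the place of the intermediate files.
mkfifo R1_filtered R2_filtered

# Each producer blocks until the consumer opens its pipe for reading.
filter input_R1 > R1_filtered &
filter input_R2 > R2_filtered &

# The consumer reads both pipes; the data passes through the kernel
# pipe buffers rather than being written to the file system.
align R1_filtered R2_filtered > output_aligned
wait

# The FIFO entries hold no file data of their own; remove them.
rm R1_filtered R2_filtered
cat output_aligned
```

In bash, process substitution does the same with anonymous pipes: align <(filter input_R1) <(filter input_R2) > output_aligned. Either way the trick only works because the consumer happens to read its inputs sequentially; a tool that seeks in or re-reads an input file cannot consume a pipe.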
The need to declare inputs and outputs wasn’t mentioned before, but it
could of course be done in the workflow declaration rather than in the
individual process descriptions.

But even then it isn’t clear to me how to do this in a general fashion.
It may work fine for tools that write to I/O streams, but we would
probably need mechanisms to declare this behaviour. It cannot be
generally inferred, nor can a process automatically change the behaviour
of its procedure to switch between generating intermediate files and
writing its output to a stream.

The GWL examples show the use of the “(system "foo > out.file")” idiom,
which I don’t like very much. I’d prefer to use "foo" directly and
declare the output to be a stream.

> Last, could we add a GWL session to the before-FOSDEM days?

The Guix Days are what we make of them, so yes, we can have a GWL
session there :)

--
Ricardo