From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from list by lists.gnu.org with archive (Exim 4.71)
	id 1glrkq-0006mt-3f
	for mharc-gwl-devel@gnu.org; Tue, 22 Jan 2019 03:49:56 -0500
Received: from eggs.gnu.org ([209.51.188.92]:51503)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zimon.toutoune@gmail.com>) id 1glrkn-0006ml-7V
	for gwl-devel@gnu.org; Tue, 22 Jan 2019 03:49:54 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <zimon.toutoune@gmail.com>) id 1glrkl-0005LV-Ep
	for gwl-devel@gnu.org; Tue, 22 Jan 2019 03:49:52 -0500
Received: from mail-qt1-x834.google.com ([2607:f8b0:4864:20::834]:36733)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <zimon.toutoune@gmail.com>)
	id 1glrkl-0005Kn-A4
	for gwl-devel@gnu.org; Tue, 22 Jan 2019 03:49:51 -0500
Received: by mail-qt1-x834.google.com with SMTP id t13so26708511qtn.3
	for <gwl-devel@gnu.org>; Tue, 22 Jan 2019 00:49:51 -0800 (PST)
MIME-Version: 1.0
References: <87bm4df2ld.fsf@elephly.net>
	<CAJ3okZ1enxGxXxjAVdgNjzU8mwEs8aDquZfLKAaLaT1WLYXUTg@mail.gmail.com>
	<878szgg9bi.fsf@elephly.net>
	<CAJ3okZ1Yt_F=CqX5SmZ7e0O1F_TXvm8X1ahVHnvqOTf0uc+DbA@mail.gmail.com>
	<871s58fk0z.fsf@elephly.net>
	<CAJ3okZ2F2_C_GtXLnsFbHpOs23uiA7QFomSxzm4LaKC_=QmERQ@mail.gmail.com>
	<871s55eiae.fsf@elephly.net>
In-Reply-To: <871s55eiae.fsf@elephly.net>
From: zimoun <zimon.toutoune@gmail.com>
Date: Tue, 22 Jan 2019 09:49:37 +0100
Message-ID: <CAJ3okZ32AccbethmeC80zmm15O-ipReYNN9x41AwvrF6_K-5yQ@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: =?UTF-8?B?UmU6IG1lcmdpbmcg4oCccHJvY2Vzc2Vz4oCdIGFuZCDigJxyZXN0?=
 =?UTF-8?B?cmljdGlvbnPigJ0=?=
List-Id: <gwl-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/gwl-devel>,
	<mailto:gwl-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/gwl-devel/>
List-Post: <mailto:gwl-devel@gnu.org>
List-Help: <mailto:gwl-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/gwl-devel>,
	<mailto:gwl-devel-request@gnu.org?subject=subscribe>
To: Ricardo Wurmus <rekado@elephly.net>
Cc: gwl-devel@gnu.org

Hi Ricardo,

On Mon, 21 Jan 2019 at 23:51, Ricardo Wurmus <rekado@elephly.net> wrote:

> > Is it possible to turn off the test (make check) when building hello ?
>
> This is not supported in Guix, so there’s nothing I can do in the GWL.

Ok.

>
> > Cosmetic comment. :-)
> > About the `A -> B' which means A depends on B.
> > To me, the arrow is counterintuitive, notationally speaking. :-)
> > Because the data flow is going from B to A.
> > Even if this notation is usual when speaking of dependencies and graph.
>
> The arrow is read as “depends on”.  If you want to we could just as well
> support an arrow in the opposite direction, as it really has no
> meaning.  But I think that would be more confusing.

In the Snakemake doc about graphs and the DAG [1], they choose: "A -> B"
means B depends on A, because it expresses how the data flows, i.e. the
output of A is the input of B.
It is the same for CWL [2].
I agree that it is not the usual way to express dependencies (e.g. UML).
If we choose the snakemake/cwl meaning for `->' then it will not be
consistent with the meaning of the arrow in `guix graph'.

To me, the snakemake/cwl way is more intuitive. But what is intuitive
for one person is not necessarily so for another. :-)


If we are talking cosmetics, take the example from the graph [3]. I
find this more readable:

1.
 (graph
   (samsort samindex -> bcftools_call))

than
2.
 (graph
   (bcftools_call <- samsort samindex))

than
3.
 (graph
   (bcftools_call -> samsort samindex))


I do not know, I feel like I am splitting hairs here. :-)
(In French: cutting a hair in four pieces. :-)
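
To show that the two readings carry the same information, here is a
minimal sketch in plain Guile Scheme. It is only an illustration, not
GWL code: the pair representation of edges and the names `depends-on',
`flip' and `data-flow' are my own assumptions.

```scheme
;; Hypothetical sketch: an edge is a pair (FROM . TO).
;; "Depends on" reading, as in `guix graph': bcftools_call -> samsort.
(define depends-on
  '((bcftools_call . samsort)
    (bcftools_call . samindex)))

(define (flip edges)
  ;; Reverse every edge, turning one reading into the other.
  (map (lambda (edge) (cons (cdr edge) (car edge))) edges))

;; Data-flow reading, as in Snakemake/CWL: samsort -> bcftools_call.
(define data-flow (flip depends-on))

;; Print each edge of the data-flow reading as "FROM -> TO".
(for-each (lambda (edge)
            (format #t "~a -> ~a~%" (car edge) (cdr edge)))
          data-flow)
```

So whichever direction is chosen for `->', the other one can be derived
mechanically; the choice is only about what reads best.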

[1] https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#step-4-indexing-read-alignments-and-visualizing-the-dag-of-jobs
[2] https://view.commonwl.org/workflows/github.com/common-workflow-language/cwltool/blob/master/cwltool/schemas/v1.0/v1.0/step-valuefrom3-wf.cwl
[3] https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#id1


> > From a simple user perspective, I find more readable the current
> > version with `graph'. Because I am able to see the flow even if I do
> > not know about the processes fry, bake and take.
>
> Right.  I also prefer the explicit “graph” syntax.  With “link”
> (formerly “connect”) it’s *possible* but not required to automatically
> link up all of the processes.  I suspect that this is more in line with
> what Snakemake users might expect.

Instead of `link', why not `auto-link'?


> > From my point of view, the `let' part fixes the entry point or some
> > specific location of outputs (for debugging purpose?).
> >
> > (define (eat input output)
> >  (process
> >   (name "Eat")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (define (cook input output)
> >  (process
> >   (name "Cook")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (define (take input output)
> >  (process
> >   (name "Take")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (workflow
> >   (processes
> >     (let ((take-choc (inputs take "/path/to/chocolate"))
> >           (take-cake (outputs take "/path/to/store/cake"))
> >           (miam (outputs eat "/path/to/my/mouth")))
> >       (graph
> >         (cook -> take-choc)
> >         (take-cake -> cook)
> >         (miam -> take-cake)))))
> >
> > If the inputs/outputs are not specified in the `let' part, then they
> > are automatically stored somewhere in /tmp/ or elsewhere and then
> > (optionally) removed when all the workflow is done.
> >
> > I imagine `inputs'/`outputs' returning a curried process, somehow.
> >
> > And similarly about options, e.g,
> >  (define* (cook input output #:optional temp-woven)
> >      blah)
> >
> >
> > Does it make sense ?
>
> This seems to be from the perspective of data flow as you indicated
> earlier.  I’m not sure I fully understand it, but I give it a try.  (To
> me it seems similar to continuations.)

I am not at ease with continuations but yes, it seems similar now that
you say it. :-)


Thank you for taking the time to give it a try.


> Expressed as a data flow the workflow looks like this:
>
>   (take "chocolate") => cook => (take "cake") => miam
>
> At each step we generate a value that can be processed by the next
> step.  This looks suspiciously like an Arrow[1].

You expressed my thoughts better than I did. :-)

>
> [1]: https://www.haskell.org/arrows/syntax.html
>
>   (push "chocolate"
>     (>>> take cook take miam))
>
> i.e. we push the value “chocolate” into a chain where a procedure’s
> outputs are connected to the next procedure’s inputs.
>
> The example makes it a bit hard to think about this clearly -- what about
> the second invocation of “take”?  What about multiple inputs?  Isn’t
> this just function composition and application?

I agree, multiple inputs or outputs would be an issue when composing.

Say that `take' takes 2 inputs, `a' and `b'. We could require that they
be packed as a list (a b), and the process's writer would have to
unpack them.
Now say that `cook' returns 3 outputs, `x', `y' and `z'. They
are also packed as a list.
But then how do we encode the fact that `a' corresponds to `z', and `b'
to `y'?

You somehow need a dummy process that unpacks and repacks, so that the
"types" of consecutive processes agree.

(push
 (>>> take cook dumb take miam))

(define (dumb input output)
  ;; input is the list (x y z) returned by cook
  (data-inputs ((u (cadr input))     ; u = y
                (v (caddr input))))  ; v = z
  (outputs (v u)))                   ; repacked as (a b) = (z y)


I do not know whether it makes sense, or whether it is usable or better.
I just find it more "functional".
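
To make the repacking idea concrete, here is a minimal sketch in plain
Guile Scheme, with no GWL involved. Everything in it is hypothetical:
`>>>' is defined here as simple left-to-right composition, and `cook',
`dumb' and `take' are ordinary procedures on lists standing in for
processes.

```scheme
;; Hypothetical sketch: processes are plain procedures, not GWL records.
(define (>>> . procs)
  ;; Left-to-right composition: ((>>> f g) x) = (g (f x)).
  (lambda (x)
    (let loop ((value x) (rest procs))
      (if (null? rest)
          value
          (loop ((car rest) value) (cdr rest))))))

;; cook returns three outputs packed as a list (x y z).
(define (cook input)
  (list (string-append input "-x")
        (string-append input "-y")
        (string-append input "-z")))

;; take expects two inputs packed as a list (a b).
(define (take inputs)
  (string-append "took " (car inputs) " and " (cadr inputs)))

;; dumb repacks (x y z) as (a b) = (z y), so that cook and take agree.
(define (dumb outputs)
  (list (caddr outputs) (cadr outputs)))

(define pipeline (>>> cook dumb take))

(display (pipeline "choc"))   ; prints: took choc-z and choc-y
(newline)
```

The price is that every mismatch between two processes needs its own
small adapter like `dumb'.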


>
> x >- A -> B -> C -> E -> F
>      |    `--> D ------/
>      `-------/
>
> x is the input to the data flow.
>
>     (flow (x)
>       (a <- (A x))     ; apply A and bind output to “a”
>       (b <- (B a))     ; apply B and bind output to “b”
>       (e <- (>>> C E)) ; apply C and then E, bind the output to “e”
>       (d <- (D a b))   ; apply D and bind the output to “d”
>       (-> (F e d)))    ; return F applied to “e” and “d”
>
> “flow” would somehow figure out in what order to run things.  I feel
> that there should be a better way to express this, but I haven’t found
> one.

Yes. This is already nice! :-)


And the user does not have to manage by hand the names of all the outputs.
In other words, say the user has already computed your workflow with
`x' set to /path/to/my-file.
Then this user writes another flow:
 (flow (x)
  (z <- ((>>> A B) x))
  (-> (G z)))
When this second flow is applied to /path/to/my-file, the result `z'
is already in the CAS (see `b') and only (G z) is computed.
The dream would be:
 (flow (x)
   (-> ((>>> A B G) x)))
and to automatically detect that the composition `B . A' is already
computed for the value /path/to/my-file.
Well, I am dreaming... :-)
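
To illustrate the dream, here is a minimal sketch in plain Guile Scheme.
It is not how the GWL or the CAS work: the hash table `cache' only
stands in for the CAS, the wrapper `cached' and the counter `run-count'
are hypothetical, and the key is the pair (name input) where a real
store would rather hash the contents.

```scheme
;; Hypothetical sketch: a hash table standing in for the CAS.
(define cache (make-hash-table))
(define run-count 0)   ; counts how many steps are really executed

(define (cached name proc)
  ;; Wrap PROC so that its result is stored under the key (NAME INPUT).
  (lambda (input)
    (let* ((key (list name input))
           (hit (hash-ref cache key)))   ; #f when the key is absent
      (or hit
          (let ((result (proc input)))
            (set! run-count (+ run-count 1))
            (hash-set! cache key result)
            result)))))

(define (>>> . procs)
  ;; Left-to-right composition: ((>>> f g) x) = (g (f x)).
  (lambda (x)
    (let loop ((value x) (rest procs))
      (if (null? rest)
          value
          (loop ((car rest) value) (cdr rest))))))

;; Toy processes: each one just appends a suffix to its input.
(define A (cached 'A (lambda (x) (string-append x ".a"))))
(define B (cached 'B (lambda (x) (string-append x ".b"))))
(define G (cached 'G (lambda (x) (string-append x ".g"))))

(define first-result  ((>>> A B)   "/path/to/my-file"))  ; runs A and B
(define second-result ((>>> A B G) "/path/to/my-file"))  ; runs only G

(display run-count)   ; prints: 3
(newline)
```

Because each step is looked up separately, the prefix `B . A' is found
again without any special detection of the composition itself.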


All the best,
simon