unofficial mirror of gwl-devel@gnu.org
* merging “processes” and “restrictions”
@ 2019-01-19  8:55 Ricardo Wurmus
  2019-01-19 10:26 ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-19  8:55 UTC (permalink / raw)
  To: gwl-devel

Hi,

I think it is unfortunate that workflows have two fields for specifying
processes: one is “processes” the other is “restrictions”.  It seems to
me that we can do without “restrictions” by relaxing the requirements
for the “processes” value.

The use of both fields sometimes makes it necessary to wrap the whole
workflow definition in a let binding so that both fields can access
identical values:

  (let ((eat-fruit (eat "fruit"))
        (eat-veges (eat "vegetables")))
    (workflow
     (name "simple")
     (processes
      (list greet
            eat-fruit
            eat-veges
            sleep
            bye))
     (restrictions
      `((,eat-fruit ,greet)
        (,eat-veges ,greet)
        (,sleep ,eat-fruit ,eat-veges)
        (,bye ,sleep)))))

The following looks like a minor improvement to me, because the let can
go where it’s needed:

    (workflow
     (name "simple")
     (processes
      (let ((eat-fruit (eat "fruit"))
            (eat-veges (eat "veges")))
        (list (list eat-fruit greet)
              (list eat-veges greet)
              (list sleep eat-fruit eat-veges)
              (list bye sleep)))))

Taken together, the elements of the list are equivalent to the list of
processes.  The “processes” field now also doubles as a “restrictions”
field, as the value can be an adjacency list mapping processes to their
dependencies.
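
A minimal sketch of how the flat list of processes might be recovered
from such an adjacency list.  The name `adjacency-list->processes` is
hypothetical and not part of the GWL, and plain symbols stand in for
process records:

```scheme
(use-modules (srfi srfi-1))   ; delete-duplicates, concatenate

;; Hypothetical helper: collect every process that appears anywhere in
;; an adjacency list, keeping the first occurrence of each.
(define (adjacency-list->processes adjacency)
  ;; Each entry has the form (process dependency ...).
  (delete-duplicates (concatenate adjacency) eq?))

;; Symbols stand in for process records here.
(define restrictions
  '((eat-fruit greet)
    (eat-veges greet)
    (sleep eat-fruit eat-veges)
    (bye sleep)))

(adjacency-list->processes restrictions)
;; => (eat-fruit greet eat-veges sleep bye)
```

Under these assumptions a separate “processes” list carries no
information that the adjacency list does not already contain.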

For trivial workflows where none of the processes depend on each other
it would look like this:

    (workflow
     (name "simple")
     (processes
       (list (list A)
             (list B)
             (list C))))

With just a little bit of extra processing before storing the value it
could become this instead:

    (workflow
     (name "simple")
     (processes A B C))
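
The “extra processing” could be as small as the following sketch.  The
procedure name `normalize-processes` is an assumption made for
illustration, and symbols again stand in for process records:

```scheme
;; Hypothetical normalisation step: a bare process becomes a
;; one-element entry (a process without dependencies); an entry that
;; is already a list is kept as it is.
(define (normalize-processes . items)
  (map (lambda (item)
         (if (list? item) item (list item)))
       items))

(normalize-processes 'A 'B 'C)        ; => ((A) (B) (C))
(normalize-processes 'A '(B A) 'C)    ; => ((A) (B A) (C))
```

This only works as long as a process value is never itself a list,
which holds for record instances.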

If you’re like me you’ll find that the restrictions syntax looks rather
verbose with all those “list”s.  Using quoting doesn’t make this any
more readable, unfortunately:

    (workflow
     (name "simple")
     (processes
      (let ((eat-fruit (eat "fruit"))
            (eat-veges (eat "veges")))
        `((,eat-fruit ,greet)
          (,eat-veges ,greet)
          (,sleep ,eat-fruit ,eat-veges)
          (,bye ,sleep)))))

Can we use macros to clarify the syntax?

    (workflow
     (name "simple")
     (processes
      (let ((eat-fruit (eat "fruit"))
            (eat-veges (eat "veges")))
        (graph (eat-fruit -> greet)
               (eat-veges -> greet)
               (sleep     -> eat-fruit eat-veges)
               (bye       -> sleep)))))

“graph” would be a macro that takes any number of node-to-node
associations, each of which is expected to be in the form

    (node -> nodes …)
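
One possible shape for such a macro, as a sketch only; it is not
necessarily how the GWL implements it.  It treats `->` as a literal
and expands to the same list-of-lists that the explicit “list” version
builds:

```scheme
;; Sketch of a `graph' macro: each (node -> dependency ...) clause
;; becomes one adjacency-list entry (node dependency ...).
(define-syntax graph
  (syntax-rules (->)
    ((_ (node -> dependency ...) ...)
     (list (list node dependency ...) ...))))

;; Symbols stand in for process records.
(define greet 'greet)
(define sleep 'sleep)
(define bye 'bye)
(define eat-fruit 'eat-fruit)
(define eat-veges 'eat-veges)

(graph (eat-fruit -> greet)
       (eat-veges -> greet)
       (sleep     -> eat-fruit eat-veges)
       (bye       -> sleep))
;; => ((eat-fruit greet) (eat-veges greet)
;;     (sleep eat-fruit eat-veges) (bye sleep))
```

Because the nodes are ordinary expressions, a clause such as
((eat "fruit") -> greet) would also be accepted by this pattern.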

“graph” isn’t a great name.  Maybe you can suggest a different name or
even a character…

What do you think?

--
Ricardo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: merging “processes” and “restrictions”
  2019-01-19  8:55 merging “processes” and “restrictions” Ricardo Wurmus
@ 2019-01-19 10:26 ` zimoun
  2019-01-19 11:45   ` Ricardo Wurmus
  0 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2019-01-19 10:26 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: gwl-devel

Dear Ricardo,

I agree with your "graph proposal".

I reached the same conclusion as you: there is inconvenient duplication.

As we discussed elsewhere, I see the processes as "pure functions" and
the aim of the workflow is to glue them together by writing the
graph. I am not clear yet about how to manage the inputs/outputs
(fixed in the definition of the process or fixed in the workflow) and
from my point of view your proposal is better than the restriction
way.

I am still failing to write a macro that implements my "view":
 - write the graph
 - collect the inputs/outputs
which somehow is similar to your proposal. I picked the name `dataflow'
for this not-yet-implemented macro.

Well, instead of your graph name, I propose dataflow or stream or datastream.


What do you think?


All the best,
simon



* Re: merging “processes” and “restrictions”
  2019-01-19 10:26 ` zimoun
@ 2019-01-19 11:45   ` Ricardo Wurmus
  2019-01-19 17:55     ` zimoun
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-19 11:45 UTC (permalink / raw)
  To: zimoun; +Cc: gwl-devel


Hi simon,

> I am not clear yet about how to manage the inputs/outputs
> (fixed in the definition of the process or fixed in the workflow)
[…]
> I am still failing to write a macro that implements my "view":
>  - write the graph
>  - collect the inputs/outputs

This is interesting and it might be a solution to this conundrum.  If
the processes can declare their inputs without referring to other
processes then we have a solution: the graph can be built from the
inputs and outputs of the provided processes without having to specify
any dependencies manually.

We need a procedure that takes any number of processes as inputs and
matches inputs with outputs to generate an adjacency list of processes.
This shouldn’t be difficult.

> I pick the name `dataflow' for this not-yet-implemented macro name.
>
> Well, instead of your graph name, I propose dataflow or stream or datastream.

I’d like this to be a short name if possible.  In fact, I’d prefer if it
was completely invisible like this:

   (workflow
    (name "simple")
    (processes
      ((eat "fruit") -> greet)
      ((eat "veges") -> greet)
      (sleep         -> (eat "fruit") (eat "veges"))
      (bye           -> sleep)))

Or like this assuming that all of the processes declare inputs and
outputs *somehow*:

  (workflow
   (name "simple")
   (processes
     (eat "fruit") (eat "veges") greet sleep bye))

I’ll play around with this today.

--
Ricardo


* Re: merging “processes” and “restrictions”
  2019-01-19 11:45   ` Ricardo Wurmus
@ 2019-01-19 17:55     ` zimoun
  2019-01-19 20:51       ` Ricardo Wurmus
  2019-01-21 14:43     ` Ricardo Wurmus
  2019-01-21 15:32     ` Ricardo Wurmus
  2 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2019-01-19 17:55 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: gwl-devel

Hi Ricardo,


On Sat, 19 Jan 2019 at 12:45, Ricardo Wurmus <rekado@elephly.net> wrote:
>
> > I am not clear yet about how to manage the inputs/outputs
> > (fixed in the definition of the process or fixed in the workflow)
> […]
> > I am still failing to write a macro that implements my "view":
> >  - write the graph
> >  - collect the inputs/outputs
>
> This is interesting and it might be a solution to this conundrum.  If
> the processes can declare their inputs without referring to other
> processes then we have a solution: the graph can be built from the
> inputs and outputs of the provided processes without having to specify
> any dependencies manually.
>
> We need a procedure that takes any number of processes as inputs and
> matches inputs with outputs to generate an adjacency list of processes.

I agree, even if I am not sure I am fully clear. :-)
And your proposal with `let' is already better than the current duplication.

> This shouldn’t be difficult.

I trust you. :-)
I am not skilled enough in Scheme to succeed.


>
> > I pick the name `dataflow' for this not-yet-implemented macro name.
> >
> > Well, instead of your graph name, I propose dataflow or stream or datastream.
>
> I’d like this to be a short name if possible.  In fact, I’d prefer if it
> was completely invisible like this:
>
>    (workflow
>     (name "simple")
>     (processes
>       ((eat "fruit") -> greet)
>       ((eat "veges") -> greet)
>       (sleep         -> (eat "fruit") (eat "veges"))
>       (bye           -> sleep)))

Is it possible to make it invisible?

Or why not `->>' instead of your previous `graph'?


> Or like this assuming that all of the processes declare inputs and
> outputs *somehow*:
>
>   (workflow
>    (name "simple")
>    (processes
>      (eat "fruit") (eat "veges") greet sleep bye))

With this, I do not see how the graph could be deduced; without
specifying the inputs-outputs relationship and without specifying the
processes relationship.
I prefer the `->' version. :-)


Thank you for your attempt to improve. :-)


Cheers,
simon


* Re: merging “processes” and “restrictions”
  2019-01-19 17:55     ` zimoun
@ 2019-01-19 20:51       ` Ricardo Wurmus
  2019-01-21 18:45         ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-19 20:51 UTC (permalink / raw)
  To: zimoun; +Cc: gwl-devel


zimoun <zimon.toutoune@gmail.com> writes:

>> I’d like this to be a short name if possible.  In fact, I’d prefer if it
>> was completely invisible like this:
>>
>>    (workflow
>>     (name "simple")
>>     (processes
>>       ((eat "fruit") -> greet)
>>       ((eat "veges") -> greet)
>>       (sleep         -> (eat "fruit") (eat "veges"))
>>       (bye           -> sleep)))
>
> Is it possible to make it invisible?

Sure, it just requires more macrology.

>> Or like this assuming that all of the processes declare inputs and
>> outputs *somehow*:
>>
>>   (workflow
>>    (name "simple")
>>    (processes
>>      (eat "fruit") (eat "veges") greet sleep bye))
>
> With this, I do not see how the graph could be deduced; without
> specifying the inputs-outputs relationship and without specifying the
> processes relationship.

This will only work if these processes declare inputs and outputs and
they can be matched up.  Otherwise all of these processes would be
deemed independent.

I still wonder how processes should declare inputs.  The easiest and
possibly least useful way I can think of is to have them declare
abstract symbols.

--8<---------------cut here---------------start------------->8---
(process: bake
  (data-inputs '(flour eggs))
  (procedure '(display "baking"))
  (outputs '(cake)))

(process: fry
  (data-inputs '(flour eggs))
  (procedure '(display "frying"))
  (outputs '(pancake)))

(process: (take thing)
  (procedure `(format #t "taking ~a." ',thing))
  (outputs (list thing)))

(workflow: dinner
  (processes
    (list (take 'flour) (take 'eggs) fry bake)))
--8<---------------cut here---------------end--------------->8---

Here all of the dinner processes have outputs:

  (map process-outputs (workflow-processes dinner))
  => (list '(flour) '(eggs) '(pancake) '(cake))

And here are the inputs:

  (map process-data-inputs (workflow-processes dinner))
  => (list #f #f '(flour eggs) '(flour eggs))

Given this information we can deduce the adjacency list:

  (graph
   (fry  -> (take 'flour) (take 'eggs))
   (bake -> (take 'flour) (take 'eggs)))

In this case “outputs” would mean “provides”, and “data-inputs” would be
“requires”.  There could be more than one process “providing” a certain
kind of output.
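
The matching step could be sketched as follows.  Everything here is an
assumption made for illustration: processes are modelled as plain
(name inputs outputs) lists rather than GWL records, and
`link-processes` is a hypothetical name:

```scheme
(use-modules (srfi srfi-1))   ; any, first, second, third

;; Stand-in for GWL process records: (name inputs outputs).
(define (make-proc name inputs outputs) (list name inputs outputs))
(define proc-name first)
(define proc-inputs second)
(define proc-outputs third)

;; For each process, list the processes whose outputs provide at
;; least one of its inputs ("requires" matched against "provides").
(define (link-processes . procs)
  (map (lambda (proc)
         (cons (proc-name proc)
               (map proc-name
                    (filter (lambda (other)
                              (any (lambda (input)
                                     (member input (proc-outputs other)))
                                   (proc-inputs proc)))
                            procs))))
       procs))

(define dinner
  (list (make-proc 'take-flour '() '(flour))
        (make-proc 'take-eggs  '() '(eggs))
        (make-proc 'fry  '(flour eggs) '(pancake))
        (make-proc 'bake '(flour eggs) '(cake))))

(apply link-processes dinner)
;; => ((take-flour) (take-eggs)
;;     (fry take-flour take-eggs)
;;     (bake take-flour take-eggs))
```

If several processes provide the same output, this sketch makes the
consumer depend on all of them.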

I’m not sure how useful this is as a *generic* mechanism, though.  One
could also use this as a very specific mechanism, for example to have a
process declare that it outputs a certain file, and another that it
takes this very same file as an input.

(I don’t know how this would relate to the content addressable data
store.  Maybe it doesn’t at all.)

--
Ricardo


* Re: merging “processes” and “restrictions”
  2019-01-19 11:45   ` Ricardo Wurmus
  2019-01-19 17:55     ` zimoun
@ 2019-01-21 14:43     ` Ricardo Wurmus
  2019-01-21 18:53       ` zimoun
  2019-01-21 15:32     ` Ricardo Wurmus
  2 siblings, 1 reply; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-21 14:43 UTC (permalink / raw)
  To: zimoun; +Cc: gwl-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> Hi simon,
>
>> I am not clear yet about how to manage the inputs/outputs
>> (fixed in the definition of the process or fixed in the workflow)
> […]
>> I am still failing to write a macro that implements my "view":
>>  - write the graph
>>  - collect the inputs/outputs
>
> This is interesting and it might be a solution to this conundrum.  If
> the processes can declare their inputs without referring to other
> processes then we have a solution: the graph can be built from the
> inputs and outputs of the provided processes without having to specify
> any dependencies manually.
>
> We need a procedure that takes any number of processes as inputs and
> matches inputs with outputs to generate an adjacency list of processes.
> This shouldn’t be difficult.

This works now:

--8<---------------cut here---------------start------------->8---
define-module : simple-wisp

use-modules
  gwl workflows
  gwl processes
  gwl sugar

process: hello
  package-inputs
    list "hello"
  synopsis "Run hello"
  procedure '(system* "hello")

process: python-test
  package-inputs
    list "python2"
  data-inputs
    list "sample.bam" "hg38.fa" "abc"
  synopsis "Run Python"
  description
    . "Run Python and demonstrate that it can access process information via environment variables."
  procedure ## python
import os

def hello():
  print "hello from python 2"
  print os.environ["_GWL_PROCESS_DATA_INPUTS"]
  print os.environ["_GWL_PROCESS_NAME"]

hello()
##

process: bash-test
  package-inputs : list "bash"
  synopsis "Run Bash"
  description
    . "Run Bash and demonstrate that it can access process information via environment variables."
  procedure ## /bin/bash -c
echo "${_GWL_PROCESS_DATA_INPUTS}"
echo "${_GWL_PROCESS_NAME}"
##

workflow: simple-wisp
  processes
    graph
      python-test -> hello
      bash-test   -> hello
--8<---------------cut here---------------end--------------->8---

With a little more macrology the “graph” call could disappear, but that
would make it impossible to specify an adjacency list as a simple alist,
which I’d like to keep supporting as it is useful when combining
workflows.

--
Ricardo


* Re: merging “processes” and “restrictions”
  2019-01-19 11:45   ` Ricardo Wurmus
  2019-01-19 17:55     ` zimoun
  2019-01-21 14:43     ` Ricardo Wurmus
@ 2019-01-21 15:32     ` Ricardo Wurmus
  2019-01-21 18:55       ` zimoun
  2019-01-21 19:33       ` Ricardo Wurmus
  2 siblings, 2 replies; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-21 15:32 UTC (permalink / raw)
  To: zimoun; +Cc: gwl-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> We need a procedure that takes any number of processes as inputs and
> matches inputs with outputs to generate an adjacency list of processes.
> This shouldn’t be difficult.

This procedure is called “connect” and it is now available.  With
connect and well-specified inputs and outputs one can now do this:

    (workflow
     (name "pipeline")
     (processes
      (connect compress-files create-files move-archives)))

All of these processes declare inputs and outputs and the correct
adjacency list is produced by “connect” by matching them up.

-- 
Ricardo


* Re: merging “processes” and “restrictions”
  2019-01-19 20:51       ` Ricardo Wurmus
@ 2019-01-21 18:45         ` zimoun
  2019-01-21 22:51           ` Ricardo Wurmus
  0 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2019-01-21 18:45 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: gwl-devel

Hi Ricardo,

I have just updated the repo.
Wow!!

For example, I run:

 guix gc
 GUILE_AUTO_COMPILE=0 GUIX_WORKFLOW_PATH=./doc/examples/ \
  ./pre-inst-env guix workflow -r simple

and all the dance with the store shows up. Beautiful! :-)

Is it possible to turn off the test (make check) when building hello ?


Cosmetic comment. :-)
About the `A -> B' which means A depends on B.
To me, the arrow is counterintuitive, notationally speaking. :-)
Because the data flow is going from B to A.
Even if this notation is usual when speaking of dependencies and graphs.


> >> Or like this assuming that all of the processes declare inputs and
> >> outputs *somehow*:
> >>
> >>   (workflow
> >>    (name "simple")
> >>    (processes
> >>      (eat "fruit") (eat "veges") greet sleep bye))
> >
> > With this, I do not see how the graph could be deduced; without
> > specifying the inputs-outputs relationship and without specifying the
> > processes relationship.
>
> This will only work if these processes declare inputs and outputs and
> they can be matched up.  Otherwise all of these processes would be
> deemed independent.
>
> I still wonder how processes should declare inputs.  The easiest and
> possibly least useful way I can think of is to have them declare
> abstract symbols.
>
> --8<---------------cut here---------------start------------->8---
> (process: 'bake
>   (data-inputs '(flour eggs))
>   (procedure '(display "baking"))
>   (outputs '(cake)))
>
> (process: fry
>   (data-inputs '(flour eggs))
>   (procedure '(display "frying"))
>   (outputs '(pancake)))
>
> (process: (take thing)
>   (procedure '(format #t "taking ~a." thing))
>   (outputs (list thing)))
>
> (workflow: dinner
>   (processes
>     (list (take 'flour) (take 'eggs) fry bake)))
> --8<---------------cut here---------------end--------------->8---
>
[...]
> Given this information we can deduce the adjacency list:
>
>   (graph
>    (fry  -> (take 'flour) (take 'eggs))
>    (bake -> (take 'flour) (take 'eggs)))
>
[...]
> I’m not sure how useful this is as a *generic* mechanism, though.  One
> could also use this as a very specific mechanism, for example to have a
> process declare that it outputs a certain file, and another that it
> takes this very same file as an input.

From a simple user's perspective, I find the current version with
`graph' more readable, because I am able to see the flow even if I do
not know about the processes fry, bake and take.
With:
 (graph
   (fry -> (take 'flour) (take 'eggs))
   (bake -> (take 'flour) (take 'cheese)))
the dependency graph is clear even if I have no idea about all the processes.
With:
  (list (take 'flour) (take 'eggs) fry bake)
I need to know how the process `fry' is built to deduce what this
workflow will do.

From my point of view, the `let' part fixes the entry point or some
specific location of outputs (for debugging purpose?).

(define (eat input output)
 (process
  (name "Eat")
  (data-inputs input)
  (outputs output)))

(define (cook input output)
 (process
  (name "Cook")
  (data-inputs input)
  (outputs output)))

(define (take input output)
 (process
  (name "Take")
  (data-inputs input)
  (outputs output)))

(workflow
  (processes
    (let ((take-choc (inputs take "/path/to/chocolate"))
          (take-cake (outputs take "/path/to/store/cake"))
          (miam (outputs eat "/path/to/my/mouth")))
      (graph
        (cook -> take-choc)
        (take-cake -> cook)
        (miam -> take-cake)))))

If the inputs/outputs are not specified in the `let' part, then they
are automatically stored somewhere in /tmp/ or elsewhere and then
(optionally) removed when all the workflow is done.

I imagine `inputs'/`outputs' returning a curried process, somehow.

And similarly about options, e.g,
 (define* (cook input output #:optional temp-woven)
     blah)


Does it make sense ?


> (I don’t know how this would relate to the content addressable data
> store.  Maybe it doesn’t at all.)

I do not know either. :-)


All the best,
simon


* Re: merging “processes” and “restrictions”
  2019-01-21 14:43     ` Ricardo Wurmus
@ 2019-01-21 18:53       ` zimoun
  0 siblings, 0 replies; 15+ messages in thread
From: zimoun @ 2019-01-21 18:53 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: gwl-devel

Hi Ricardo,


> This works now:

Awesome !!
Even if Python and Bash are less fun than cookies and fruit ;-)

>
> --8<---------------cut here---------------start------------->8---
> define-module : simple-wisp
>
> use-modules
>   gwl workflows
>   gwl processes
>   gwl sugar
>
> process: hello
>   package-inputs
>     list "hello"
>   synopsis "Run hello"
>   procedure '(system* "hello")
>
> process: python-test
>   package-inputs
>     list "python2"
>   data-inputs
>     list "sample.bam" "hg38.fa" "abc"
>   synopsis "Run Python"
>   description
>     . "Run Python and demonstrate that it can access process information via environment variables."
>   procedure ## python
> import os
>
> def hello():
>   print "hello from python 2"
>   print os.environ["_GWL_PROCESS_DATA_INPUTS"]
>   print os.environ["_GWL_PROCESS_NAME"]
>
> hello()
> ##
>
> process: bash-test
>   package-inputs : list "bash"
>   synopsis "Run Bash"
>   description
>     . "Run Bash and demonstrate that it can access process information via environment variables."
>   procedure ## /bin/bash -c
> echo "${_GWL_PROCESS_DATA_INPUTS}"
> echo "${_GWL_PROCESS_NAME}"
> ##
>
> workflow: simple-wisp
>   processes
>     graph
>       python-test -> hello
>       bash-test   -> hello
> --8<---------------cut here---------------end--------------->8---


How do I try that?


> With a little more macrology the “graph” call could disappear, but that
> would make it impossible to specify an adjacency list as a simple alist,
> which I’d like to keep supporting as it is useful when combining
> workflows.

I am not sure that `graph' should disappear.
Because it provides useful information when reading what is going on.

I mean that if the aim of the workflow is understandable from the graph,
then I can easily explain it to colleagues and/or collaborators.
For example, Bob writes processes and Alice writes the workflow based
on the description without diving in all the details to compose them.


Thank you for all your work!

Cheers,
simon


* Re: merging “processes” and “restrictions”
  2019-01-21 15:32     ` Ricardo Wurmus
@ 2019-01-21 18:55       ` zimoun
  2019-01-21 19:33       ` Ricardo Wurmus
  1 sibling, 0 replies; 15+ messages in thread
From: zimoun @ 2019-01-21 18:55 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: gwl-devel

Hi Ricardo,

On Mon, 21 Jan 2019 at 16:32, Ricardo Wurmus <rekado@elephly.net> wrote:
> Ricardo Wurmus <rekado@elephly.net> writes:
>
> > We need a procedure that takes any number of processes as inputs and
> > matches inputs with outputs to generate an adjacency list of processes.
> > This shouldn’t be difficult.
>
> This procedure is called “connect” and it is now available.  With
> connect and well-specified inputs and outputs one can now do this:
>
>     (workflow
>      (name "pipeline")
>      (processes
>       (connect compress-files create-files move-archives)))
>
> All of these processes declare inputs and outputs and the correct
> adjacency list is produced by “connect” by matching them up.

Nice !!

I am playing... :-)

--


* Re: merging “processes” and “restrictions”
  2019-01-21 15:32     ` Ricardo Wurmus
  2019-01-21 18:55       ` zimoun
@ 2019-01-21 19:33       ` Ricardo Wurmus
  2019-01-21 19:59         ` zimoun
  1 sibling, 1 reply; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-21 19:33 UTC (permalink / raw)
  To: zimoun; +Cc: gwl-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> We need a procedure that takes any number of processes as inputs and
>> matches inputs with outputs to generate an adjacency list of processes.
>> This shouldn’t be difficult.
>
> This procedure is called “connect” and it is now available.  With
> connect and well-specified inputs and outputs one can now do this:
>
>     (workflow
>      (name "pipeline")
>      (processes
>       (connect compress-files create-files move-archives)))
>
> All of these processes declare inputs and outputs and the correct
> adjacency list is produced by “connect” by matching them up.

Eh, I just noticed that “connect” is a Guile core binding.  To avoid
shadowing a potentially useful feature we had better find a new short name
for “connect”.

How about “link”?

--
Ricardo


* Re: merging “processes” and “restrictions”
  2019-01-21 19:33       ` Ricardo Wurmus
@ 2019-01-21 19:59         ` zimoun
  2019-01-26 21:49           ` Ricardo Wurmus
  0 siblings, 1 reply; 15+ messages in thread
From: zimoun @ 2019-01-21 19:59 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: gwl-devel

> How about “link”?

connect was nice :-)

combine?


* Re: merging “processes” and “restrictions”
  2019-01-21 18:45         ` zimoun
@ 2019-01-21 22:51           ` Ricardo Wurmus
  2019-01-22  8:49             ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-21 22:51 UTC (permalink / raw)
  To: zimoun; +Cc: gwl-devel


Hi simon,

> For example, I run:
>
>  guix gc
>  GUILE_AUTO_COMPILE=0 GUIX_WORKFLOW_PATH=./doc/examples/ \
>   ./pre-inst-env guix workflow -r simple
>
> and all the dance with the store shows up. Beautiful! :-)
>
> Is it possible to turn off the test (make check) when building hello ?

This is not supported in Guix, so there’s nothing I can do in the GWL.

> Cosmetic comment. :-)
> About the `A -> B' which means A depends on B.
> To me, the arrow is counterintuitive, notationally speaking. :-)
> Because the data flow is going from B to A.
> Even if this notation is usual when speaking of dependencies and graph.

The arrow is read as “depends on”.  If you want, we could just as well
support an arrow in the opposite direction, as the direction itself has
no inherent meaning.  But I think that would be more confusing.

>> >> Or like this assuming that all of the processes declare inputs and
>> >> outputs *somehow*:
>> >>
>> >>   (workflow
>> >>    (name "simple")
>> >>    (processes
>> >>      (eat "fruit") (eat "veges") greet sleep bye))
>> >
>> > With this, I do not see how the graph could be deduced; without
>> > specifying the inputs-outputs relationship and without specifying the
>> > processes relationship.
>>
>> This will only work if these processes declare inputs and outputs and
>> they can be matched up.  Otherwise all of these processes would be
>> deemed independent.
>>
>> I still wonder how processes should declare inputs.  The easiest and
>> possibly least useful way I can think of is to have them declare
>> abstract symbols.
>>
>> --8<---------------cut here---------------start------------->8---
>> (process: 'bake
>>   (data-inputs '(flour eggs))
>>   (procedure '(display "baking"))
>>   (outputs '(cake)))
>>
>> (process: fry
>>   (data-inputs '(flour eggs))
>>   (procedure '(display "frying"))
>>   (outputs '(pancake)))
>>
>> (process: (take thing)
>>   (procedure '(format #t "taking ~a." thing))
>>   (outputs (list thing)))
>>
>> (workflow: dinner
>>   (processes
>>     (list (take 'flour) (take 'eggs) fry bake)))
>> --8<---------------cut here---------------end--------------->8---
>>
> [...]
>> Given this information we can deduce the adjacency list:
>>
>>   (graph
>>    (fry  -> (take 'flour) (take 'eggs))
>>    (bake -> (take 'flour) (take 'eggs)))
>>
> [...]
>> I’m not sure how useful this is as a *generic* mechanism, though.  One
>> could also use this as a very specific mechanism, for example to have a
>> process declare that it outputs a certain file, and another that it
>> takes this very same file as an input.
>
> From a simple user perspective, I find more readable the current
> version with `graph'. Because I am able to see the flow even if I do
> not know about the processes fry, bake and take.

Right.  I also prefer the explicit “graph” syntax.  With “link”
(formerly “connect”) it’s *possible* but not required to automatically
link up all of the processes.  I suspect that this is more in line with
what Snakemake users might expect.

Luckily, we can offer both ways without problems.

> From my point of view, the `let' part fixes the entry point or some
> specific location of outputs (for debugging purpose?).
>
> (define (eat input output)
>  (process
>   (name "Eat")
>   (data-inputs input)
>   (outputs output)))
>
> (define (cook input output)
>  (process
>   (name "Cook")
>   (data-inputs input)
>   (outputs output)))
>
> (define (take input output)
>  (process
>   (name "Take")
>   (data-inputs input)
>   (outputs output)))
>
> (workflow
>   (processes
>     (let ((take-choc (inputs take "/path/to/chocolate"))
>           (take-cake (outputs take "/path/to/store/cake"))
>           (miam (outputs eat "/path/to/my/mouth")))
>     (graph
>        (cook -> take-choc)
>        (take-cake -> cook)
>        (miam -> take-cake)))
>
> If the inputs/outputs are not specified in the `let' part, then they
> are automatically stored somewhere in /tmp/ or elsewhere and then
> (optionally) removed when all the workflow is done.
>
> I imagine `inputs'/`outputs' returning a curried process, somehow.
>
> And similarly about options, e.g,
>  (define* (cook input output #:optional temp-woven)
>      blah)
>
>
> Does it make sense ?

This seems to be from the perspective of data flow as you indicated
earlier.  I’m not sure I fully understand it, but I’ll give it a try.  (To
me it seems similar to continuations.)

Expressed as a data flow the workflow looks like this:

  (take "chocolate") => cook => (take "cake") => miam

At each step we generate a value that can be processed by the next
step.  This looks suspiciously like an Arrow[1].

[1]: https://www.haskell.org/arrows/syntax.html

  (push "chocolate"
    (>>> take cook take miam))

i.e. we push the value “chocolate” into a chain where a procedure’s
outputs are connected to the next procedure’s inputs.

The example makes it a bit hard to think about this clearly — what about
the second invocation of “take”?  What about multiple inputs?  Isn’t
this just function composition and application?

  ((>>> take cook take miam) "chocolate")

  ((compose miam take cook take) "chocolate")
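
To check that the two spellings really agree, here is a small sketch
in plain Guile.  The ">>>" procedure is an assumed definition (nothing
like it exists in the GWL as far as this thread shows), and "take",
"cook" and "miam" are toy stand-ins that only record the order in which
they were applied.

```scheme
;; Hypothetical stand-ins for the processes: each one just wraps the
;; value it received in its own name.
(define (step name)
  (lambda (value) (string-append name "(" value ")")))

(define take (step "take"))
(define cook (step "cook"))
(define miam (step "miam"))

;; Assumed definition of ">>>": left-to-right composition, i.e.
;; Guile's built-in "compose" with the argument order reversed.
(define (>>> . procs)
  (apply compose (reverse procs)))

(define chained  ((>>> take cook take miam) "chocolate"))
(define composed ((compose miam take cook take) "chocolate"))
```

Both "chained" and "composed" evaluate to the string
"miam(take(cook(take(chocolate))))", so with a single input and a single
output per step the arrow chain is ordinary function composition.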

I don’t really know what to do with the output field of a process in
this case.  Is it really needed at all?  I guess it is needed when the
data flow is more complex and named outputs can be used.

x >– A –> B —> C –> E –> F
     |    `––> D ––––––/
     `–––––––/

x is the input to the data flow.

    (flow (x)
      (a <- (A x))     ; apply A and bind output to “a”
      (b <- (B a))     ; apply B and bind output to “b”
      (e <- ((>>> C E) b)) ; apply C and then E to “b”, bind the output to “e”
      (d <- (D a b))   ; apply D and bind the output to “d”
      (-> (F e d)))    ; return F applied to “e” and “d”

“flow” would somehow figure out in what order to run things.  I feel
that there should be a better way to express this, but I haven’t found
one.
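
One possible way for “flow” to figure out the order is a topological
sort over the bindings.  The sketch below is only that, a sketch: the
bindings are written by hand as a list of (step dependency ...) entries
instead of being extracted from the “flow” syntax, "run-order" is a
hypothetical name, and cycles are not detected.

```scheme
(use-modules (srfi srfi-1))

;; Each entry is (step dependency ...), mirroring the bindings in the
;; "flow" example: a needs x, b needs a, and so on.
(define dependencies
  '((a x)
    (b a)
    (e b)
    (d a b)
    (f e d)))

;; Depth-first topological sort.  Returns the names in an order in
;; which every step comes after all of its dependencies.  Names
;; without an entry (such as the input x) have no dependencies.
(define (run-order graph)
  ;; "done" holds the names scheduled so far, most recent first.
  (define (visit node done)
    (if (memq node done)
        done
        (let ((entry (assq node graph)))
          ;; Schedule all dependencies first, then the node itself.
          (cons node
                (fold visit done (if entry (cdr entry) '()))))))
  (reverse (fold visit '() (map car graph))))

(define order (run-order dependencies))
```

For the example above "order" is (x a b e d f).  Other valid orders
exist (d could run before e); this one simply follows the order in
which the bindings were written.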

--
Ricardo


* Re: merging “processes” and “restrictions”
  2019-01-21 22:51           ` Ricardo Wurmus
@ 2019-01-22  8:49             ` zimoun
  0 siblings, 0 replies; 15+ messages in thread
From: zimoun @ 2019-01-22  8:49 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: gwl-devel

Hi Ricardo,

On Mon, 21 Jan 2019 at 23:51, Ricardo Wurmus <rekado@elephly.net> wrote:

> > Is it possible to turn off the test (make check) when building hello ?
>
> This is not supported in Guix, so there’s nothing I can do in the GWL.

Ok.

>
> > Cosmetic comment. :-)
> > About the `A -> B' which means A depends on B.
> > To me, the arrow is counterintuitive, notationally speaking. :-)
> > Because the data flow is going from B to A.
> > Even if this notation is usual when speaking of dependencies and graph.
>
> The arrow is read as “depends on”.  If you want to we could just as well
> support an arrow in the opposite direction, as it really has no
> meaning.  But I think that would be more confusing.

From the Snakemake doc about graph and DAG [1]: they chose "A -> B" to
mean that B depends on A, because it expresses how the data flows,
i.e. the output of A is the input of B.
It is the same for CWL [2].
I agree that it is not the usual way to express dependencies (e.g. in UML).
If we choose the Snakemake/CWL meaning for `->' then it will not be
consistent with the meaning of the arrow in `guix graph'.

From my perspective, the Snakemake/CWL way is more intuitive. But
what is intuitive for one person is not for someone else. :-)


Speaking of cosmetics, take the example from the graph [3]. I
find more readable:

1.
(graph
   (samsort samindex -> bcftools_call))

than
2.
(graph
   (bcftools_call <- samsort samindex))

than
3.
(graph
   (bcftools_call -> samsort samindex))


I do not know, I feel like I am splitting a hair in four. :-)
(French expression :-)

[1] https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#step-4-indexing-read-alignments-and-visualizing-the-dag-of-jobs
[2] https://view.commonwl.org/workflows/github.com/common-workflow-language/cwltool/blob/master/cwltool/schemas/v1.0/v1.0/step-valuefrom3-wf.cwl
[3] https://snakemake.readthedocs.io/en/stable/tutorial/basics.html#id1


> > From a simple user perspective, I find the current version with
> > `graph' more readable, because I can see the flow even if I do not
> > know about the processes fry, bake and take.
>
> Right.  I also prefer the explicit “graph” syntax.  With “link”
> (formerly “connect”) it’s *possible* but not required to automatically
> link up all of the processes.  I suspect that this is more in line with
> what Snakemake users might expect.

Instead of `link', why not `auto-link'?


> > From my point of view, the `let' part fixes the entry point or some
> > specific location of outputs (for debugging purpose?).
> >
> > (define (eat input output)
> >  (process
> >   (name "Eat")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (define (cook input output)
> >  (process
> >   (name "Cook")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (define (take input output)
> >  (process
> >   (name "Take")
> >   (data-inputs input)
> >   (outputs output)))
> >
> > (workflow
> >   (processes
> >     (let ((take-choc (inputs take "/path/to/chocolate"))
> >           (take-cake (outputs take "/path/to/store/cake"))
> >           (miam (outputs eat "/path/to/my/mouth")))
> >       (graph
> >        (cook -> take-choc)
> >        (take-cake -> cook)
> >        (miam -> take-cake)))))
> >
> > If the inputs/outputs are not specified in the `let' part, then they
> > are automatically stored somewhere in /tmp/ or elsewhere and then
> > (optionally) removed when all the workflow is done.
> >
> > I imagine `inputs'/`outputs' returning a curried process, somehow.
> >
> > And similarly about options, e.g,
> >  (define* (cook input output #:optional temp-woven)
> >      blah)
> >
> >
> > Does it make sense ?
>
> This seems to be from the perspective of data flow as you indicated
> earlier.  I’m not sure I fully understand it, but I’ll give it a try.  (To
> me it seems similar to continuations.)

I am not familiar with continuations, but yes, it seems similar now
that you say it. :-)


Thank you for taking the time to give it a try.


> Expressed as a data flow the workflow looks like this:
>
>   (take "chocolate") => cook => (take "cake") => miam
>
> At each step we generate a value that can be processed by the next
> step.  This looks suspiciously like an Arrow[1].

You expressed my thoughts better than I did. :-)

>
> [1]: https://www.haskell.org/arrows/syntax.html
>
>   (push "chocolate"
>     (>>> take cook take miam))
>
> i.e. we push the value “chocolate” into a chain where a procedure’s
> outputs are connected to the next procedure’s inputs.
>
> The example makes it a bit hard to think about this clearly — what about
> the second invocation of “take”?  What about multiple inputs?  Isn’t
> this just function composition and application?

I agree, multiple inputs or outputs would be an issue when composing.

Say that `take' takes 2 inputs, `a' and `b'. We could require them to
be packed as a list (a b), and the process writer would have to unpack
them.
Now say that `cook' returns 3 outputs, `x', `y' and `z'. They are also
packed as a list.
But how do we encode the fact that `a' corresponds to `z', and `b' to `y'?

You somehow need a dummy process that unpacks and repacks, so that the
"types" of adjacent processes agree.

(push
 (>>> take cook dumb take miam))

(define (dumb input output)
  (data-inputs ((u (cadr input))     ; second output of cook, i.e. `y'
                (v (caddr input))))  ; third output of cook, i.e. `z'
  (outputs (v u)))


I do not know if it makes sense, if it is usable and better.
I just find that more "functional".
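
To make the repacking idea concrete, here is a minimal sketch with
plain procedures instead of processes.  Everything in it is a stand-in
invented for the example: `cook' just returns three symbols, `take'
just labels its two inputs, and `dumb' is the adapter in between.

```scheme
;; Hypothetical stand-ins: "cook" returns three outputs packed in a
;; list, "take" expects two inputs packed in a list.
(define (cook input) (list 'x 'y 'z))
(define (take input) (list 'a= (car input) 'b= (cadr input)))

;; The adapter: pick z (third element) for a and y (second element)
;; for b, and repack them in the order that "take" expects.
(define (dumb outputs)
  (list (caddr outputs) (cadr outputs)))

;; cook runs first, then the adapter, then take.
(define result ((compose take dumb cook) "chocolate"))
```

Here `result' is (a= z b= y), i.e. `a' received `z' and `b' received
`y'.  The price is one such adapter for every pair of processes whose
inputs and outputs do not line up.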


>
> x >– A –> B —> C –> E –> F
>      |    `––> D ––––––/
>      `–––––––/
>
> x is the input to the data flow.
>
>     (flow (x)
>       (a <- (A x))     ; apply A and bind output to “a”
>       (b <- (B a))     ; apply B and bind output to “b”
>       (e <- ((>>> C E) b)) ; apply C and then E to “b”, bind the output to “e”
>       (d <- (D a b))   ; apply D and bind the output to “d”
>       (-> (F e d)))    ; return F applied to “e” and “d”
>
> “flow” would somehow figure out in what order to run things.  I feel
> that there should be a better way to express this, but I haven’t found
> one.

Yes. This is already nice! :-)


And the user does not have to manage the names of all the outputs by hand.
In other words, say the user has already computed your workflow with
`x' set to /path/to/my-file.
Then this user writes another flow:
 (flow (x)
  (z <- ((>>> A B) x))
  (-> (G z)))
When this second flow is applied to /path/to/my-file, the result `z'
is already in the CAS (see `b') and only (G z) is computed.
The dream would be:
 (flow (x)
   (-> ((>>> A B G) x)))
And to automatically detect that the composition `B . A' is already
computed for the value /path/to/my-file.
Well, I am dreaming... :-)
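
A toy version of that dream, as a sketch only: the "store" below is an
in-memory hash table standing in for the CAS, keyed by step name and
input, and `cached' and `>>>' are hypothetical helpers.  Each step is
looked up individually, which has the same effect as detecting that
`B . A' was already computed for a given input.

```scheme
;; Stand-in for the content-addressed store: a table from
;; (step-name . input) to the output computed earlier.
(define store (make-hash-table))

;; Count real executions so that cache hits are visible.
(define runs 0)

;; Wrap a procedure so that its result is looked up in the store
;; before it is computed.
(define (cached name proc)
  (lambda (input)
    (let* ((key (cons name input))
           (hit (hash-ref store key)))
      (or hit
          (let ((output (proc input)))
            (set! runs (+ runs 1))
            (hash-set! store key output)
            output)))))

;; Assumed left-to-right composition.
(define (>>> . procs) (apply compose (reverse procs)))

(define A (cached 'A (lambda (x) (string-append x ".a"))))
(define B (cached 'B (lambda (a) (string-append a ".b"))))
(define G (cached 'G (lambda (b) (string-append b ".g"))))

;; First flow: A and B are really computed.
((>>> A B) "/path/to/my-file")
(define runs-after-first runs)

;; Second flow: A and B are found in the store, only G runs.
(define result ((>>> A B G) "/path/to/my-file"))
(define runs-after-second runs)
```

After the first flow two steps have run; after the second flow the
counter is at three, because only G had to be executed.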


All the best,
simon


* Re: merging “processes” and “restrictions”
  2019-01-21 19:59         ` zimoun
@ 2019-01-26 21:49           ` Ricardo Wurmus
  0 siblings, 0 replies; 15+ messages in thread
From: Ricardo Wurmus @ 2019-01-26 21:49 UTC (permalink / raw)
  To: zimoun; +Cc: gwl-devel


zimoun <zimon.toutoune@gmail.com> writes:

>> How about “link”?
>
> connect was nice :-)
>
> combine?

I ended up picking “auto-connect”.

-- 
Ricardo


end of thread, other threads:[~2019-01-26 21:49 UTC | newest]

Thread overview: 15+ messages
2019-01-19  8:55 merging “processes” and “restrictions” Ricardo Wurmus
2019-01-19 10:26 ` zimoun
2019-01-19 11:45   ` Ricardo Wurmus
2019-01-19 17:55     ` zimoun
2019-01-19 20:51       ` Ricardo Wurmus
2019-01-21 18:45         ` zimoun
2019-01-21 22:51           ` Ricardo Wurmus
2019-01-22  8:49             ` zimoun
2019-01-21 14:43     ` Ricardo Wurmus
2019-01-21 18:53       ` zimoun
2019-01-21 15:32     ` Ricardo Wurmus
2019-01-21 18:55       ` zimoun
2019-01-21 19:33       ` Ricardo Wurmus
2019-01-21 19:59         ` zimoun
2019-01-26 21:49           ` Ricardo Wurmus
