unofficial mirror of gwl-devel@gnu.org
From: Ricardo Wurmus <rekado@elephly.net>
To: zimoun <zimon.toutoune@gmail.com>
Cc: gwl-devel@gnu.org
Subject: Re: merging “processes” and “restrictions”
Date: Mon, 21 Jan 2019 23:51:21 +0100
Message-ID: <871s55eiae.fsf@elephly.net>
In-Reply-To: <CAJ3okZ2F2_C_GtXLnsFbHpOs23uiA7QFomSxzm4LaKC_=QmERQ@mail.gmail.com>


Hi Simon,

> For example, I run:
>
>  guix gc
>  GUILE_AUTO_COMPILE=0 GUIX_WORKFLOW_PATH=./doc/examples/ \
>   ./pre-inst-env guix workflow -r simple
>
> and all the dance with the store shows up. Beautiful! :-)
>
> Is it possible to turn off the test (make check) when building hello ?

This is not supported in Guix, so there’s nothing I can do in the GWL.

> Cosmetic comment. :-)
> About the `A -> B' which means A depends on B.
> To me, the arrow is counterintuitive, notationally speaking. :-)
> Because the data flow is going from B to A.
> Even if this notation is usual when speaking of dependencies and graph.

The arrow is read as “depends on”.  If you want, we could just as well
support an arrow in the opposite direction, since the direction itself
carries no inherent meaning.  But I think that would be more confusing.

>> >> Or like this assuming that all of the processes declare inputs and
>> >> outputs *somehow*:
>> >>
>> >>   (workflow
>> >>    (name "simple")
>> >>    (processes
>> >>      (eat "fruit") (eat "veges") greet sleep bye))
>> >
>> > With this, I do not see how the graph could be deduced; without
>> > specifying the inputs-outputs relationship and without specifying the
>> > processes relationship.
>>
>> This will only work if these processes declare inputs and outputs and
>> they can be matched up.  Otherwise all of these processes would be
>> deemed independent.
>>
>> I still wonder how processes should declare inputs.  The easiest and
>> possibly least useful way I can think of is to have them declare
>> abstract symbols.
>>
>> --8<---------------cut here---------------start------------->8---
>> (process: bake
>>   (data-inputs '(flour eggs))
>>   (procedure '(display "baking"))
>>   (outputs '(cake)))
>>
>> (process: fry
>>   (data-inputs '(flour eggs))
>>   (procedure '(display "frying"))
>>   (outputs '(pancake)))
>>
>> (process: (take thing)
>>   (procedure '(format #t "taking ~a." thing))
>>   (outputs (list thing)))
>>
>> (workflow: dinner
>>   (processes
>>     (list (take 'flour) (take 'eggs) fry bake)))
>> --8<---------------cut here---------------end--------------->8---
>>
> [...]
>> Given this information we can deduce the adjacency list:
>>
>>   (graph
>>    (fry  -> (take 'flour) (take 'eggs))
>>    (bake -> (take 'flour) (take 'eggs)))
>>
> [...]
>> I’m not sure how useful this is as a *generic* mechanism, though.  One
>> could also use this as a very specific mechanism, for example to have a
>> process declare that it outputs a certain file, and another that it
>> takes this very same file as an input.
>
> From a simple user perspective, I find more readable the current
> version with `graph'. Because I am able to see the flow even if I do
> not know about the processes fry, bake and take.

Right.  I also prefer the explicit “graph” syntax.  With “link”
(formerly “connect”) it’s *possible* but not required to automatically
link up all of the processes.  I suspect that this is more in line with
what Snakemake users might expect.

Luckily, we can offer both ways without problems.
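
To make the automatic variant concrete, here is a minimal sketch of how
the adjacency list from the “dinner” example could be deduced by
matching declared outputs against declared inputs.  It uses plain lists
instead of GWL records; “make-proc”, “feeds?” and “deduce-graph” are
hypothetical names for illustration only, not part of the GWL.

```scheme
;; Hypothetical stand-in for a GWL process: a plain list of
;; (name data-inputs outputs).  This is NOT the real GWL record type.
(define (make-proc name inputs outputs) (list name inputs outputs))
(define proc-name car)
(define proc-inputs cadr)
(define proc-outputs caddr)

;; Does PRODUCER output anything that CONSUMER declares as an input?
(define (feeds? producer consumer)
  (let loop ((outs (proc-outputs producer)))
    (cond ((null? outs) #f)
          ((memq (car outs) (proc-inputs consumer)) #t)
          (else (loop (cdr outs))))))

;; Return an adjacency list; each entry reads "process depends on ...".
(define (deduce-graph procs)
  (map (lambda (consumer)
         (cons (proc-name consumer)
               (let loop ((rest procs) (deps '()))
                 (cond ((null? rest) (reverse deps))
                       ((and (not (eq? (car rest) consumer))
                             (feeds? (car rest) consumer))
                        (loop (cdr rest)
                              (cons (proc-name (car rest)) deps)))
                       (else (loop (cdr rest) deps))))))
       procs))

(define dinner
  (list (make-proc 'take-flour '() '(flour))
        (make-proc 'take-eggs  '() '(eggs))
        (make-proc 'fry  '(flour eggs) '(pancake))
        (make-proc 'bake '(flour eggs) '(cake))))

(deduce-graph dinner)
;; => ((take-flour) (take-eggs)
;;     (fry take-flour take-eggs) (bake take-flour take-eggs))
```

Processes whose inputs match nothing end up with an empty dependency
list, i.e. they are treated as independent.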

> From my point of view, the `let' part fixes the entry point or some
> specific location of outputs (for debugging purpose?).
>
> (define (eat input output)
>  (process
>   (name "Eat")
>   (data-inputs input)
>   (outputs output)))
>
> (define (cook input output)
>  (process
>   (name "Cook")
>   (data-inputs input)
>   (outputs output)))
>
> (define (take input output)
>  (process
>   (name "Take")
>   (data-inputs input)
>   (outputs output)))
>
> (workflow
>   (processes
>     (let ((take-choc (inputs take "/path/to/chocolate"))
>           (take-cake (outputs take "/path/to/store/cake"))
>           (miam (outputs eat "/path/to/my/mouth")))
>     (graph
>        (cook -> take-choc)
>        (take-cake -> cook)
>        (miam -> take-cake)))))
>
> If the inputs/outputs are not specified in the `let' part, then they
> are automatically stored somewhere in /tmp/ or elsewhere and then
> (optionally) removed when all the workflow is done.
>
> I imagine `inputs'/`outputs' returning a curryfied process, somehow.
>
> And similarly about options, e.g,
>  (define* (cook input output #:optional temp-woven)
>      blah)
>
>
> Does it make sense ?

This seems to be from the perspective of data flow, as you indicated
earlier.  I’m not sure I fully understand it, but I’ll give it a try.
(To me it seems similar to continuations.)

Expressed as a data flow the workflow looks like this:

  (take "chocolate") => cook => (take "cake") => miam

At each step we generate a value that can be processed by the next
step.  This looks suspiciously like an Arrow[1].

[1]: https://www.haskell.org/arrows/syntax.html

  (push "chocolate"
    (>>> take cook take miam))

i.e. we push the value “chocolate” into a chain where a procedure’s
outputs are connected to the next procedure’s inputs.

The example makes it a bit hard to think about this clearly — what about
the second invocation of “take”?  What about multiple inputs?  Isn’t
this just function composition and application?

  ((>>> take cook take miam) "chocolate")

  ((compose miam take cook take) "chocolate")
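
A minimal sketch of that equivalence, assuming “>>>” is nothing more
than left-to-right composition.  The definition of “>>>” and the toy
“step” helper are assumptions made up for illustration; “compose” is
Guile’s built-in procedure.

```scheme
;; Left-to-right composition: ((>>> f g h) x) = (h (g (f x))).
;; The name is borrowed from the message above; this definition is an
;; assumption, not something the GWL provides.
(define (>>> . procs)
  (lambda (x)
    (let loop ((procs procs) (value x))
      (if (null? procs)
          value
          (loop (cdr procs) ((car procs) value))))))

;; Toy steps that only append their own name to the value they are
;; given, so that the order of application stays visible.
(define (step name)
  (lambda (value) (append value (list name))))

(define take (step 'take))
(define cook (step 'cook))
(define miam (step 'miam))

((>>> take cook take miam) '(chocolate))
;; => (chocolate take cook take miam)

;; Guile's "compose" applies right to left, so the argument order is
;; reversed but the result is the same.
((compose miam take cook take) '(chocolate))
;; => (chocolate take cook take miam)
```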

I don’t really know what to do with the output field of a process in
this case.  Is it really needed at all?  I guess it is needed when the
data flow is more complex and named outputs can be used.

x >- A -> B -> C -> E -> F
     |    `--> D ------/
     `-------/

x is the input to the data flow.

    (flow (x)
      (a <- (A x))           ; apply A and bind output to “a”
      (b <- (B a))           ; apply B and bind output to “b”
      (e <- ((>>> C E) b))   ; apply C and then E to “b”, bind output to “e”
      (d <- (D a b))         ; apply D and bind the output to “d”
      (-> (F e d)))          ; return F applied to “e” and “d”

“flow” would somehow figure out in what order to run things.  I feel
that there should be a better way to express this, but I haven’t found
one.
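
As a rough sketch of the ordering part only: if every binding is
reduced to its name and the names it depends on, a run order falls out
of a simple topological sort.  The representation, the procedure names
and the name “result” for the final F step are all hypothetical; the
dependencies are read off the diagram above.

```scheme
;; Each entry is (name dependency ...), read off the diagram above.
;; "result" is a hypothetical name for the final (F e d) step.  The
;; entries are deliberately listed out of order.
(define bindings
  '((result e d)
    (d a b)
    (e b)
    (b a)
    (a x)))

;; Are all NAMES already present in DONE?
(define (all-in? names done)
  (or (null? names)
      (and (memq (car names) done)
           (all-in? (cdr names) done))))

;; Remove the binding called NAME from BINDINGS.
(define (remove-binding name bindings)
  (cond ((null? bindings) '())
        ((eq? (caar bindings) name) (cdr bindings))
        (else (cons (car bindings)
                    (remove-binding name (cdr bindings))))))

;; Return the binding names in an order where each name comes after
;; everything it depends on.  AVAILABLE lists the names that are bound
;; from the start, i.e. the inputs of the flow.
(define (run-order bindings available)
  (let loop ((pending bindings) (done available) (order '()))
    (if (null? pending)
        (reverse order)
        ;; Pick the first pending binding whose dependencies are done.
        (let scan ((candidates pending))
          (cond ((null? candidates)
                 (error "cycle or unbound input among" (map car pending)))
                ((all-in? (cdar candidates) done)
                 (let ((name (caar candidates)))
                   (loop (remove-binding name pending)
                         (cons name done)
                         (cons name order))))
                (else (scan (cdr candidates))))))))

(run-order bindings '(x))
;; => (a b d e result)
```

This only decides the order; it says nothing yet about how values are
actually passed from one step to the next.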

--
Ricardo

