unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Roel Janssen <roel@gnu.org>
To: zimoun <zimon.toutoune@gmail.com>
Cc: Guix Devel <guix-devel@gnu.org>
Subject: Re: GWL pipelined process composition ?
Date: Thu, 19 Jul 2018 10:15:24 +0200	[thread overview]
Message-ID: <87muun8wvf.fsf@gnu.org> (raw)
In-Reply-To: <CAJ3okZ39W5vikMP0_zjooO-8VmobaXoLCZN-nCk7ygNe29bSyg@mail.gmail.com>


zimoun <zimon.toutoune@gmail.com> writes:

> Hi Roel,
>
> Thank you for all your comments.
>
>
>> Maybe we can come up with a convenient way to combine two processes
>> using a shell pipe.  But this needs more thought!
>
> Yes, from my point of view, the classic shell pipe `|` has two strong
> limitations for workflows:
>  1. it does not compose at the 'process' level but at the procedure 'level'
>  2. it cannot deal with two inputs.

Yes, and this strongly suggests that shell pipes are indeed limited to
the procedures *the shell* can combine.  So we can only use them at the
procedure level.  They weren't designed to deal with two (or more)
inputs, and if they were, that would make it vastly more complex.

> As an illustration for the point 1., it appears to me more "functional
> spirit" to write one process/task/unit corresponding to "samtools
> view" and another one about compressing "gzip -c". Then, if you have a
> process that filters some fastq, you can easily reuse the compress
> process, and composes it. For more complicated workflows, such as
> RNAseq or other, the composition seems an advantage.

Maybe we could solve this at the symbolic (programming) level instead.

So if we were to try to avoid using "| gzip -c > ..." all over our code,
we could define a function to wrap this.  Here's a simple example:

(define (with-compressed-output command output-file)
  (system (string-append command " | gzip -c > " output-file)))

And then you could use it in a procedure like so:

(define-public A
  (process
    (name "A")
    (package-inputs (list samtools gzip))
    (data-inputs "/tmp/sample.sam")
    (outputs "/tmp/sample.sam.gz")
    (procedure
     #~(with-compressed-output
         (string-append "samtools view " #$data-inputs)
         #$outputs))))

This isn't perfect, because we still need to include “gzip” in the
‘package-inputs’.  It doesn't allow multiple input files, nor does it
split the “gzip” command from the “samtools” command on the process
level.  However, it does allow us to express the idea that we want to
compress the output of a command and save that in a file without having
to explicitely provide the commands to do that.

>
> As an illustration for the point 2., I do not do with shell pipe:
>
>   dd if=/dev/urandom of=file1 bs=1024 count=1k
>   dd if=/dev/urandom of=file2 bs=1024 count=2k
>   tar -cvf file.tar file1 file2
>
> or whatever process instead of `dd` which is perhaps not the right example here.
> To be clear,
>   process that outputs fileA
>   process that outputs fileB
>   process that inputs fileA *and* fileB
> without write on disk fileA and fileB.

Given the ‘dd’ example, I don't see how that could work without
reinventing the way filesystems work.

> All the best,
> simon

Thanks!

Kind regards,
Roel Janssen

      parent reply	other threads:[~2018-07-19  8:15 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 11:20 GWL pipelined process composition ? zimoun
2018-07-18 17:29 ` Roel Janssen
2018-07-18 21:55   ` zimoun
2018-07-19  7:13     ` Pjotr Prins
2018-07-19 11:44       ` zimoun
2018-07-19  8:15     ` Roel Janssen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87muun8wvf.fsf@gnu.org \
    --to=roel@gnu.org \
    --cc=guix-devel@gnu.org \
    --cc=zimon.toutoune@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).