From: Pjotr Prins <pjotr.public12@thebird.nl>
To: zimoun <zimon.toutoune@gmail.com>
Cc: Guix Devel <guix-devel@gnu.org>
Subject: Re: GWL pipelined process composition ?
Date: Thu, 19 Jul 2018 09:13:11 +0200
Message-ID: <20180719071311.u7fydagta7wrwr3h@thebird.nl>
In-Reply-To: <CAJ3okZ39W5vikMP0_zjooO-8VmobaXoLCZN-nCk7ygNe29bSyg@mail.gmail.com>
On Wed, Jul 18, 2018 at 11:55:25PM +0200, zimoun wrote:
> Hi Roel,
>
> Thank you for all your comments.
>
> > Maybe we can come up with a convenient way to combine two processes
> > using a shell pipe. But this needs more thought!
>
> Yes, from my point of view, the classic shell pipe `|` has two strong
> limitations for workflows:
> 1. it does not compose at the 'process' level but at the procedure 'level'
> 2. it cannot deal with two inputs.
>
> As an illustration for the point 1., it appears to me more "functional
> spirit" to write one process/task/unit corresponding to "samtools
> view" and another one about compressing "gzip -c". Then, if you have a
> process that filters some fastq, you can easily reuse the compress
> process, and composes it. For more complicated workflows, such as
> RNAseq or other, the composition seems an advantage.
Yes, but the original question was whether you could stream data
without writing to disk, right? Unix pipes are the system way of
providing that functionality - with the added advantage of parallel
processing between Unix processes. The downside, as you say, is that
it is not very composable.
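To make the streaming point concrete, here is a minimal sketch (plain
Python, not GWL code) that connects two Unix processes with a pipe so
that no intermediate file is written. Both stages are hypothetical
stand-ins started through `sys.executable`; in practice they would be
something like "samtools view" and "gzip -c".

```python
import subprocess
import sys

# Hypothetical stand-ins for real tools such as "samtools view" and "gzip -c".
PRODUCER = [sys.executable, "-c", "for i in range(5): print('read', i)"]
UPPERCASE = [sys.executable, "-c",
             "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())"]


def stream(producer_cmd, consumer_cmd):
    """Run two commands concurrently, connected by an OS pipe (no temp file)."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close the parent's copy of the pipe so the producer is notified
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode()


result = stream(PRODUCER, UPPERCASE)
```

Both processes run at the same time, and the kernel's pipe buffer is
the only thing between them.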
To make it composable you'd have to manage process communication -
using some network/socket protocol - and GWL would have to fire up the
processes in parallel so they can communicate - preferably on one box.
That is a significant and fragile piece of functionality to add ;).
Error/failure handling in particular will be hard.
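Even with plain pipes the failure handling is easy to get wrong. The
sketch below (plain Python with hypothetical stages, not something GWL
provides) chains commands and collects the exit code of every stage.
The first stage dies after writing partial output, yet the last stage
still exits with 0.

```python
import subprocess
import sys


def run_pipeline(commands):
    """Chain commands with pipes; return (stdout bytes, list of exit codes)."""
    procs = []
    prev_stdout = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            prev_stdout.close()  # the child holds its own copy of the pipe end
        prev_stdout = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    return output, [p.wait() for p in procs]


# Hypothetical stages: the first fails after partial output, the second copies stdin.
FAILING = [sys.executable, "-c", "import sys; print('partial'); sys.exit(3)"]
COPY = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

output, codes = run_pipeline([FAILING, COPY])
# Only the per-stage exit codes reveal that the data is truncated.
failed = [i for i, code in enumerate(codes) if code != 0]
```

A caller that looks only at the final exit status would accept the
truncated output, so every stage has to be checked.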
Unsurprisingly, there are no systems that handle that well, as far as
I am aware. The best you'll get today is composable containers that
'talk' with each other. But that is ad hoc network programming.
The great thing about the GWL is that it describes pipelines
deterministically and makes great use of GNU Guix. I think those are
killer features.
Adding composable pipes will inflate the code base and make it fragile
at the same time. Besides, network transport layers will add another
potential source of I/O bottlenecks. It is a whole project in itself ;)
Probably a good idea to keep it simple.
I'd stick with pipes when possible. All pipelines can be described as
a combination of sequential processing and scatter/gather processing:
there will always be inefficiencies. To address these you'll need to
rewrite the software tools you want to run (as we did with sambamba at
the time, to replace samtools).
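As a rough illustration of the scatter/gather pattern, here is a
minimal Python sketch. The chunking helper and the per-chunk step
`count_gc` are hypothetical examples, not part of GWL or of any tool
mentioned here.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(items, n_chunks):
    """Split items into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_gc(chunk):
    """Hypothetical per-chunk step: count G and C bases in a list of reads."""
    return sum(read.count("G") + read.count("C") for read in chunk)


def scatter_gather(reads, n_chunks=2):
    chunks = scatter(reads, n_chunks)                # scatter
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(count_gc, chunks))   # independent steps
    return sum(partial)                              # gather


reads = ["ACGT", "GGCC", "ATAT", "CCCG"]
total = scatter_gather(reads)
```

Threads keep the sketch short. For CPU-bound steps in Python a
ProcessPoolExecutor would be the more usual choice, and the gather
step has to wait for the slowest chunk, which is one of the
inefficiencies mentioned above.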
In FP you are working in one space, so it is much easier to compose
(functions).
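For comparison, a minimal sketch of composition inside one process,
in Python. The `compose` helper and the two steps are hypothetical
names chosen for illustration. Generators let the data stream from
step to step without any pipe or socket.

```python
from functools import reduce


def compose(*steps):
    """Return a function that applies steps left to right, like a pipeline."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def keep_long(reads, min_len=4):
    """Hypothetical filter step: lazily drop reads shorter than min_len."""
    return (r for r in reads if len(r) >= min_len)


def upper(reads):
    """Hypothetical transform step: lazily upper-case each read."""
    return (r.upper() for r in reads)


pipeline = compose(keep_long, upper, list)
result = pipeline(["acgt", "ac", "ggccat"])
```

Each step is an ordinary function, so reusing it in another pipeline
is just another call to `compose`.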
Pj.