unofficial mirror of gwl-devel@gnu.org
* How do I support building a guix package over multiple machines in a cloud environment?
@ 2019-12-02 17:36 Josh Marshall
  2019-12-02 19:00 ` zimoun
  2019-12-02 23:39 ` Ricardo Wurmus
  0 siblings, 2 replies; 11+ messages in thread
From: Josh Marshall @ 2019-12-02 17:36 UTC
  To: gwl-devel

Hello all,

Simon directed me from help-guix to wake up this list.  I have a use
case for a colleague: articulating his work as a Guix package.  I'm
still very new, and would like to use this as a learning opportunity.
He uses it as a bioinformatics workflow to generate an analysis.  It
typically runs with 6 threads, 42 GB peak memory, and 100 GB of
on-disk files, and it currently uses Nextflow.  This seems like a
perfect use case for trying to do more with Guix/GWL.  His workflow is
more analogous to software packaging than to a dumb software pipeline,
or even a rerun-optimized bioinformatics pipeline.

Is this kind of use case supported?  If so, how?  Is Nextflow not
practical to keep?  Please, someone catch me up here so I can start
writing code to help him out.  If this goes well, my company could
integrate GWL/Guix into our work, which would be amazing.

* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 17:36 How do I support building a guix package over multiple machines in a cloud environment? Josh Marshall
@ 2019-12-02 19:00 ` zimoun
  2019-12-02 20:02   ` pjotr.public12
  2019-12-02 20:03   ` Josh Marshall
  2019-12-02 23:39 ` Ricardo Wurmus
  1 sibling, 2 replies; 11+ messages in thread
From: zimoun @ 2019-12-02 19:00 UTC
  To: Josh Marshall; +Cc: gwl-devel

Hi (again) Josh :-)

On Mon, 2 Dec 2019 at 19:38, Josh Marshall
<joshua.r.marshall.1991@gmail.com> wrote:

> He uses it as a bioinformatics workflow to generate some analysis.  It

GWL should work for this use case. o/

> Is this kind of use case supported?  If so, how so?  Is nextflow not
> practical to keep?  Please, someone catch me up here so I can start to
> write code to help him out.  If this goes well, my company could
> integrate for gwl/guix in our work, which would be amazing.

Nextflow [1] provides a domain-specific language (DSL): you write
"rules" and describe how these rules are combined together. In the
bioinformatics field, Snakemake [2] seems more popular. Other
alternatives are CWL [3], WDL [4], etc.

Basically, you describe:
 - what the inputs are
 - what the outputs are
 - how to process the inputs to produce the outputs
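To make those three pieces concrete, here is a minimal sketch in Python (not GWL, Nextflow, or Snakemake syntax) of a rule as a record of inputs, outputs, and a command, and of how a workflow engine can derive the execution order purely by matching each rule's inputs to another rule's outputs. The `Rule` type, the rule names, and the `align`/`call-variants` commands are hypothetical; the commands are never executed.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Rule:
    """A hypothetical workflow rule: what it reads, what it writes, how."""
    name: str
    inputs: list[str]
    outputs: list[str]
    command: str  # shell command; only stored here, never run


def execution_order(rules: list[Rule]) -> list[str]:
    """Order rules so each one runs after the rules producing its inputs."""
    # Map every output file to the rule that produces it.
    producer = {out: r.name for r in rules for out in r.outputs}
    # A rule depends on the producers of its inputs; inputs that no rule
    # produces (raw data such as reads.fastq) add no dependency.
    graph = {
        r.name: {producer[i] for i in r.inputs if i in producer}
        for r in rules
    }
    return list(TopologicalSorter(graph).static_order())


rules = [
    Rule("call", ["aligned.bam"], ["variants.vcf"],
         "call-variants aligned.bam > variants.vcf"),
    Rule("align", ["reads.fastq", "ref.fa"], ["aligned.bam"],
         "align ref.fa reads.fastq > aligned.bam"),
]
print(execution_order(rules))  # ['align', 'call']
```

The rules are listed out of order on purpose: the ordering comes from the declared inputs and outputs, not from the order in which the rules are written.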

You can find examples there [*]. They use the Wisp syntax [5], but
plain Scheme syntax works perfectly well if you prefer parentheses. ;-)

[*] https://guixwl.org/


You may also be interested in this blog post [#] by Pjotr, which uses
Guix, CWL, and other niceties!

[#] https://hpc.guix.info/blog/2019/01/creating-a-reproducible-workflow-with-cwl/

AFAIK, Nextflow is not yet packaged in Guix. One direction is to
package it and then run the workflow written in the Nextflow DSL, in
the spirit of [#]. Another direction is to rewrite the workflow in
the GWL DSL. It depends a bit on your final aim.


Hope that helps.
simon

[1] https://www.nextflow.io/
[2] https://snakemake.readthedocs.io/en/stable/
[3] https://www.commonwl.org/
[4] http://www.openwdl.org/
[5] https://srfi.schemers.org/srfi-119/srfi-119.html

* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 19:00 ` zimoun
@ 2019-12-02 20:02   ` pjotr.public12
  2019-12-02 20:16     ` Josh Marshall
  2019-12-02 20:03   ` Josh Marshall
  1 sibling, 1 reply; 11+ messages in thread
From: pjotr.public12 @ 2019-12-02 20:02 UTC
  To: zimoun; +Cc: gwl-devel, Josh Marshall

Yeah, if you read my blog post (linked in Simon's message), you can
see how to simply split packaging (read: deployment) from the pipeline
(runner). It would already be worthwhile to show how Guix can be mixed
with Nextflow and how the same method can create reproducible
containers.

That is a real win!
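A minimal sketch of that split, in Python: Guix is responsible only for the software environment (deployment), and the runner (here Nextflow) is responsible only for the pipeline. The helpers just build command lines. `guix environment --ad-hoc PACKAGES -- COMMAND` was the usual form at the time of this thread (newer Guix releases provide `guix shell` for this), and `guix pack --format=docker` builds a container image from the same package set. The package list and `main.nf` are assumptions; in particular, Nextflow was not packaged in Guix when this was written.

```python
import shlex

# Hypothetical package set; "nextflow" assumes a Guix package that did
# not exist at the time of this thread.
PACKAGES = ["nextflow", "samtools"]


def run_in_guix_environment(packages: list[str], command: list[str]) -> list[str]:
    """Deployment: Guix provides exactly `packages`; the runner named in
    `command` only orchestrates the pipeline inside that environment."""
    return ["guix", "environment", "--ad-hoc", *packages, "--", *command]


def docker_pack(packages: list[str]) -> list[str]:
    """The same package set, packed into a Docker image."""
    return ["guix", "pack", "--format=docker", *packages]


print(shlex.join(run_in_guix_environment(PACKAGES, ["nextflow", "run", "main.nf"])))
print(shlex.join(docker_pack(PACKAGES)))
```

The design point is that the pipeline definition never mentions versions or install steps. Pinning the environment is Guix's job, so the same package list can feed both the local run and the container.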

As a next step, try to package Nextflow itself.

Or, alternatively, port pipelines to GWL. Mind you, GWL still needs
development. Nextflow is an amazing (cloud) tool that can benefit from
reproducible deployment.

Pj.



* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 19:00 ` zimoun
  2019-12-02 20:02   ` pjotr.public12
@ 2019-12-02 20:03   ` Josh Marshall
  2019-12-02 20:59     ` zimoun
  1 sibling, 1 reply; 11+ messages in thread
From: Josh Marshall @ 2019-12-02 20:03 UTC
  To: gwl-devel

Looking at https://lists.gnu.org/archive/html/gwl-devel/2019-01/msg00034.html
the use case I'm looking at explicitly requires the input files to be
hashed and tracked manually, as if they were a package.  The actual
pipeline doesn't change much, if at all, but those large data files
must be tracked.  Nextflow is the current fad pipeline, but it would be
nice to have some fully magical, reproducible way to just reuse any
DSL, as there are a ton in use and it would be nice not to replicate
xkcd.com/927 .  Still reading over everything.  I'm going to draw up a
concrete plan for supporting this use case today.
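One way to picture "hashed and tracked as if a package" is a content-hash manifest: record a SHA-256 digest for every input file once, then check before each run whether any input changed. The Python sketch below is only an illustration of that idea. It is not how Guix or GWL track inputs (the thread notes that this is still a work in progress), and the function names and the `reads.fastq` file are hypothetical. Files are hashed in chunks so that 100 GB inputs never need to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large inputs are streamed, not loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(paths: list[Path]) -> dict[str, str]:
    """Record the current content hash of every input file."""
    return {str(p): sha256_of(p) for p in paths}


def changed_inputs(manifest: dict[str, str]) -> list[str]:
    """Return recorded inputs that are missing or whose content changed."""
    return [
        p for p, h in manifest.items()
        if not Path(p).exists() or sha256_of(Path(p)) != h
    ]


# Usage: a tiny throwaway file stands in for a large sequencing input.
with tempfile.TemporaryDirectory() as tmp:
    reads = Path(tmp) / "reads.fastq"
    reads.write_text("@r1\nACGT\n+\nIIII\n")
    manifest = make_manifest([reads])
    before = changed_inputs(manifest)  # nothing changed yet
    reads.write_text("@r1\nACGA\n+\nIIII\n")
    after = changed_inputs(manifest)   # the edited file is reported
```

The manifest itself is small and can be committed next to the pipeline definition, which gives the large data files a versioned identity without storing them in the repository.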

I wish I had work time for this rather than vacation time.  The
technology is fascinating.


* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 20:02   ` pjotr.public12
@ 2019-12-02 20:16     ` Josh Marshall
  2019-12-02 21:00       ` zimoun
  0 siblings, 1 reply; 11+ messages in thread
From: Josh Marshall @ 2019-12-02 20:16 UTC
  To: pjotr.public12; +Cc: gwl-devel

I'll start with packaging Nextflow.  That seems like the easiest
starting point, it will be needed anyway if I want to switch my
personal work systems to Guix, and I shouldn't get deep into Guix
development challenges yet since it is all still new to me.


* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 20:03   ` Josh Marshall
@ 2019-12-02 20:59     ` zimoun
  0 siblings, 0 replies; 11+ messages in thread
From: zimoun @ 2019-12-02 20:59 UTC
  To: Josh Marshall; +Cc: gwl-devel

On Mon, 2 Dec 2019 at 21:03, Josh Marshall
<joshua.r.marshall.1991@gmail.com> wrote:
>
> Looking at https://lists.gnu.org/archive/html/gwl-devel/2019-01/msg00034.html
> the use case I'm looking at explicitly requires the input files to be
> hashed and tracked manually, as if a package.

Currently, how to track inputs/outputs is still a work in progress.

As you can see, the last email on this mailing list was back in July.
Since then, life has intervened a bit, and I will not have enough time
to contribute improvements until next January. (I will not speak for
Ricardo, but I know he is currently a bit busy with real life. :-))

> The actual pipeline
> doesn't change much if at all, but those large data files must be
> tracked.  Nextflow is the current fad pipeline, but it would be nice
> to have some fully magical reproducible way to just re-use any DSL, as

It is a hard topic...

Back in January 2018, I was thinking of writing a CWL front end, and
the original author of GWL gave this insightful answer [1].

[1] https://lists.gnu.org/archive/html/guix-devel/2018-01/msg00390.html



Thank you for keeping the interest in this project alive. :-)
I am sure there is room in the bioinformatics field for such a tool.


All the best,
simon

* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 20:16     ` Josh Marshall
@ 2019-12-02 21:00       ` zimoun
  0 siblings, 0 replies; 11+ messages in thread
From: zimoun @ 2019-12-02 21:00 UTC
  To: Josh Marshall; +Cc: gwl-devel

On Mon, 2 Dec 2019 at 21:16, Josh Marshall
<joshua.r.marshall.1991@gmail.com> wrote:

> I'll start with packaging nextflow.  That seems like the easy starting
> point, will be needed if I want to switch my personal work systems to
> guix, and I still shouldn't get deep into guix development challenges
> since it is all still new.

Feel free to ask questions on help-guix@gnu.org and/or in the #guix
IRC channel, because I presume you will have a good time with Java. ;-)

Thank you for working on this.

All the best,
simon

* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 17:36 How do I support building a guix package over multiple machines in a cloud environment? Josh Marshall
  2019-12-02 19:00 ` zimoun
@ 2019-12-02 23:39 ` Ricardo Wurmus
  2019-12-03  3:00   ` Josh Marshall
  2019-12-03 11:03   ` zimoun
  1 sibling, 2 replies; 11+ messages in thread
From: Ricardo Wurmus @ 2019-12-02 23:39 UTC
  To: joshua.r.marshall.1991; +Cc: gwl-devel


Hi Josh,

I would not use the GWL to automate *building* software in a distributed
fashion across separate nodes.  There are tools that have been designed
specifically for distributed compilation of software that would be more
appropriate here.

The GWL is capable of running jobs in a distributed environment.  I
started work on an AWS library for Guile[1], which would allow us to
spawn EC2 instances as needed.  The library works in that it provides a
DSL for interacting with AWS, but it needs testing, polishing, and
integration into the GWL.
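To illustrate what "running jobs in a distributed environment" buys for a workflow like this one, the sketch below groups jobs into waves. Every job in a wave has all of its dependencies satisfied by earlier waves, so the jobs of one wave could be dispatched to separate nodes (for example, separate cloud instances) at the same time. This is generic Python, not the GWL scheduler and not the guile-aws API. The job names are hypothetical, and actually spawning instances and shipping jobs to them is deliberately left out.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves of mutually independent jobs.

    `deps` maps each job to the set of jobs it depends on.  All jobs in
    one wave could be sent to separate nodes concurrently.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished
    return waves


jobs = {
    "align-sample-1": {"index-reference"},
    "align-sample-2": {"index-reference"},
    "merge": {"align-sample-1", "align-sample-2"},
}
print(parallel_waves(jobs))
```

Here the two alignments land in the same wave, which is where spawning extra instances would pay off. The indexing and merge steps are inherently sequential.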

Currently, I’m on parental leave and don’t really get to do any hacking
for a few months.  I would be happy if someone else could take a look at
the AWS library and its integration into the GWL; these tasks are not
trivial, but there are only a few unknowns and it really just requires
time to implement the missing bits.

~~ Ricardo

[1]: https://git.elephly.net/?p=software/guile-aws.git;a=summary

* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 23:39 ` Ricardo Wurmus
@ 2019-12-03  3:00   ` Josh Marshall
  2019-12-03 11:03   ` zimoun
  1 sibling, 0 replies; 11+ messages in thread
From: Josh Marshall @ 2019-12-03  3:00 UTC
  To: Ricardo Wurmus; +Cc: gwl-devel

Thank you for the reply.

I'm not prepared to sink in the development time needed to get native
spawning of cloud instances, but thank you for the information.  This
probably puts the nail in the coffin for using Guix for my coworker's
work, but I'm still interested, if only for my own projects.


* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-02 23:39 ` Ricardo Wurmus
  2019-12-03  3:00   ` Josh Marshall
@ 2019-12-03 11:03   ` zimoun
  2019-12-03 13:47     ` Pjotr Prins
  1 sibling, 1 reply; 11+ messages in thread
From: zimoun @ 2019-12-03 11:03 UTC
  To: Ricardo Wurmus; +Cc: gwl-devel, Josh Marshall

Hi Ricardo,

On Tue, 3 Dec 2019 at 00:39, Ricardo Wurmus <rekado@elephly.net> wrote:

> The GWL is capable of running jobs in a distributed environment.  I
> started work on an AWS library for Guile[1], which would allow us to
> spawn EC2 instances as needed.  The library works in that it provides a
> DSL for interacting with AWS, but it needs testing, polishing, and
> integration into the GWL.

This is really great!
Let's talk about that in the near future. :-)


> Currently, I’m on parental leave and don’t really get to do any hacking
> for a few months.

Take care! :-)

> I would be happy if someone else could take a look at
> the AWS library and its integration into the GWL; these tasks are not
> trivial, but there are only few unknowns and it really just requires
> time to implement the missing bits.

I will schedule some time to take a look at your library.
If you are coming to the Guix Days, we could discuss these tasks and
others. :-)


Cheers,
simon

* Re: How do I support building a guix package over multiple machines in a cloud environment?
  2019-12-03 11:03   ` zimoun
@ 2019-12-03 13:47     ` Pjotr Prins
  0 siblings, 0 replies; 11+ messages in thread
From: Pjotr Prins @ 2019-12-03 13:47 UTC
  To: zimoun; +Cc: gwl-devel, Josh Marshall

On Tue, Dec 03, 2019 at 12:03:47PM +0100, zimoun wrote:
> I will schedule some time to give a look at your library.
> If you are coming at the Guix Days, we could discussed these tasks and
> others. :-)

We should make it into a working group. ChrisM may be interested too.

Pj.

Thread overview: 11+ messages
2019-12-02 17:36 How do I support building a guix package over multiple machines in a cloud environment? Josh Marshall
2019-12-02 19:00 ` zimoun
2019-12-02 20:02   ` pjotr.public12
2019-12-02 20:16     ` Josh Marshall
2019-12-02 21:00       ` zimoun
2019-12-02 20:03   ` Josh Marshall
2019-12-02 20:59     ` zimoun
2019-12-02 23:39 ` Ricardo Wurmus
2019-12-03  3:00   ` Josh Marshall
2019-12-03 11:03   ` zimoun
2019-12-03 13:47     ` Pjotr Prins
