Slurm with containers (i.e., orchestration)


* Slurm with containers (i.e., orchestration)
From: Pjotr Prins @ 2020-05-18 12:49 UTC
  To: guix-devel

I am looking into some light-weight style orchestration. One
possibility is to use Slurm with Guix containers - on a cluster with
Guix that is almost trivial (we use Guix containers a lot! They are
great) and would also allow non-container jobs.
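As a rough sketch of what "Slurm with Guix containers" could look like in practice, the helper below builds an `sbatch` command line that wraps a `guix shell --container` invocation. The function name, its defaults, and the `hello` package are illustrative assumptions, not anything agreed in this thread; on older Guix releases the equivalent inner command would be `guix environment --container --ad-hoc`.

```python
import shlex


def sbatch_guix_container(packages, command, cpus=1, mem="1G", job_name="guix-job"):
    """Return an argv list that submits COMMAND to Slurm inside a Guix container.

    Hypothetical helper: it only builds the command line, it does not run it.
    """
    # Inner command: run COMMAND in an isolated Guix environment
    # that contains only the listed packages.
    inner = ["guix", "shell", "--container", *packages, "--", *command]
    # sbatch --wrap expects one shell string, so quote the inner argv.
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--cpus-per-task={cpus}",
        f"--mem={mem}",
        f"--wrap={shlex.join(inner)}",
    ]


argv = sbatch_guix_container(["hello"], ["hello"])
print(shlex.join(argv))
```

Dropping the `guix shell --container` wrapper from the inner command gives a plain, non-container Slurm job, which matches the point that both kinds of job can coexist.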

Once we have containers and Slurm it should also be possible to deploy
in some cloud infrastructure, provided there are no dependencies on
the cluster itself. I think it would make a terrific BLOG story if we
put something like that together. 

Bcbio describes an architecture that uses the Common Workflow Language
(CWL) to run pipelines with containers:

  https://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html#running-with-cromwell-local-hpc

I am not promoting the use of this, but it shows that infrastructure
exists that can deploy workflows on containers in different setups
(Bcbio supports Slurm). I know the Guix infrastructure uses Guix
deploy to achieve similar roll-outs. What that lacks is the
orchestration mechanism itself which should handle dependencies
between jobs (i.e. a workflow). The Guix Workflow Language (GWL) goes
some way, but it does not handle orchestration itself.
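To make "handling dependencies between jobs" concrete, here is a minimal sketch that orders a workflow so every job runs after the jobs it depends on. The job names are made up for illustration, and using Python's standard `graphlib` module is an assumption of this sketch, not something Guix, GWL, or Slurm prescribes.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "fetch": set(),
    "align": {"fetch"},
    "call-variants": {"align"},
    "report": {"align", "call-variants"},
}


def run_order(deps):
    """Return the jobs in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the workflow has a cycle.
    return list(TopologicalSorter(deps).static_order())


order = run_order(workflow)
print(order)
```

An orchestrator would submit jobs in this order, or translate each edge into a scheduler-level dependency such as Slurm's `--dependency=afterok:<jobid>`.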

In other words, we almost have the pieces, but one thing is missing
:). Thoughts? I know I have brought this up before in different
guises, but we are really starting to need something here.

What makes orchestration? I guess it concerns a dynamic database of
machines that can execute jobs and some type of software registry
(Guix). Next, it should be able to schedule and execute jobs using
some constraint specifiers (like network/CPU/RAM). It could be a
'dynamic' Slurm that makes use of real machines and VMs, or it could
hook into an existing cloud service. A Slurm job could send a
container into a cloud service and monitor it.
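The "database of machines plus constraint specifiers" idea can be sketched in a few lines. The example below is a toy first-fit scheduler over CPU and RAM only; the `Machine` and `Job` types, the machine names, and the first-fit policy are all assumptions made for illustration, and a real scheduler such as Slurm does far more.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpus: int
    ram_gb: int


@dataclass
class Job:
    name: str
    cpus: int
    ram_gb: int


def schedule(jobs, machines):
    """First-fit: place each job on the first machine with enough free CPU and RAM."""
    # Track remaining capacity per machine as [cpus, ram_gb].
    free = {m.name: [m.cpus, m.ram_gb] for m in machines}
    placement = {}
    for job in jobs:
        placement[job.name] = None  # stays None if nothing fits
        for name, (cpus, ram) in free.items():
            if job.cpus <= cpus and job.ram_gb <= ram:
                free[name][0] -= job.cpus
                free[name][1] -= job.ram_gb
                placement[job.name] = name
                break
    return placement


machines = [Machine("metal-1", 8, 32), Machine("vm-1", 2, 4)]
jobs = [Job("assemble", 8, 16), Job("qc", 2, 2), Job("huge", 64, 512)]
placement = schedule(jobs, machines)
print(placement)
```

Making the machine list dynamic (VMs or cloud instances that come and go) is the part that a static cluster configuration does not cover.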

I think we can build this up a step at a time. 

Thoughts?

Pj.


* Re: Slurm with containers (i.e., orchestration)
From: Pjotr Prins @ 2020-05-18 13:11 UTC
  To: guix-devel

Ricardo added slurm-drmaa in the past (I can't believe it was almost
4 years ago that we packaged Slurm!), which may also help in
addressing some points:

  http://www.drmaa.org/

Pj.



* Re: Slurm with containers (i.e., orchestration)
From: Begley Brothers Inc @ 2020-05-19 22:33 UTC
  To: Pjotr Prins; +Cc: guix-devel

On Mon, May 18, 2020 at 7:50 AM Pjotr Prins <pjotr.public12@thebird.nl> wrote:
>
> I am looking into some light-weight style orchestration. One

We think there is such a niche, the 80/20 rule.
We think containers are too limiting and a bad idea to target - but we
use them and they have mind share.
We also have some other ideas in mind, related to this context, but
we'll keep this on-topic.

Compromise:
Cast the issue in terms of a VM and let the 'hello world' MVP/example
be a VM that grabs a container from a registry, runs it to
completion, and shuts down.
Then demonstrate the use of Guix to build the VM and show that you can
not only discard the container overhead, cruft and headaches, but
you also get a more powerful Dockerfile, and all the other Guix
features.
Then show that you can easily repurpose that VM workflow to a Metal Machine.

Of course in real world cases there are many scenarios where people
are looking for the reverse incrementalist pathway:
1.) Legacy-App + MM
2.) Legacy MM + (Guix + Legacy-App)
3.) Legacy MM + Guix + workflow
4.) Guix + workflow + (VM or MM or both)

> possibility is to use Slurm with Guix containers - on a cluster with
> Guix that is almost trivial (we use Guix containers a lot! They are
> great) and would also allow non-container jobs.

Hmm, doesn't Slurm break the opening objective of 'light-weight'?
Maybe better to write a VM abstraction/adapter for something like
Tinkerbell/tink[1]; it's Apache-2.0, and some project context is
here[5].

Define the use case as: VMs that run a task launched by init and shut
themselves down when done - many of course have open-ended run times.
For multiple VM use cases:
There are a multitude of distributed computing tools that Guix leaves
the user free to choose among to build into their VM - Guix could take
no position on whether Condor, Nomad, etc. are better
suited to someone's problem.

With those constraints in mind, and lightweight being primary, it
is simple to imagine Guix generating a VM version of such a
workflow[2] and delegating the workflow heavy lifting to
Tinkerbell/tink:

```bash
guix light ~/src/project/hello-world.tmpl
```

> Once we have containers and Slurm it should also be possible to deploy

slurm: -1
containers: -1

> in some cloud infrastructure, provided there are no dependencies on

I think you could get there and beyond with some relatively minor
(compared to Slurm) contributions to Tinkerbell (Apache-2.0).
This setup[3] targets an AWS instance, so you could likely leverage
`guix deploy` too.

> the cluster itself. I think it would make a terrific BLOG story if we
> put something like that together.
>
> Bcbio describes an architecture that uses the common workflow language
> (CWL) to run pipelines with containers
>
>   https://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html#running-with-cromwell-local-hpc
>
> I am not promoting the use of this, but it shows that infrastructure
> exists that can deploy workflows on containers in different setups

Again, we believe that if you think in terms of VMs (rather than
containers) there is a wider set of possible use cases.
If you build on Tinkerbell/tink or re-implement its logic - not clear
what you have in mind - you could also expand the Guix use cases to
workflows that include metal machine (MM) users/managers.

> (Bcbio supports Slurm). I know the Guix infrastructure uses Guix
> deploy to achieve similar roll-outs. What that lacks is the
> orchestration mechanism itself which should handle dependencies
> between jobs (i.e. a workflow).

We're not familiar with the Guix Workflow Language, so with that caveat:
We think that for a lightweight implementation you might be better off
defining the scope more narrowly but still expanding the universe of
Guix use cases as we outlined.
Once that is in place, experience might lead the project to be
opinionated on how 'jobs' are handled between machines (VM and MM).

Maybe even getting to the point of a Guix entry in such Awesome-Metal lists[4].

> The Guix Workflow Language (GWL) goes some
> way, but it does not handle orchestration itself.
>
> In other words, we almost have the pieces, but one thing is missing
> :).

Agreed.  This does seem to be a gap.
The challenge is keeping the solution elegant, focussed, yet general.
Tinkerbell targets the “17 Unix Rules” so they may be interested in
accommodating a VM use case?
If not, it would still be possible to 'define' a VM that looks to
Tinkerbell like an MM.

> Thoughts? I know I have brought this up before in different
> guises, but we start to really need something here.
>
> What makes orchestration? I guess it concerns a dynamic database of
> machines that can execute jobs and some type of software registry
> (Guix).

That seems a reasonable initial scope definition, especially if you
recognize VM and MM as two distinct categories of machines to apply
Guix to.

> Next it should be able to schedule and execute jobs using
> some constraint specifiers (like network/CPU/RAM). It could be a
> 'dynamic' Slurm that makes use of real machines and VMs. Or hook into
> an existing cloud service. A slurm job could monitor sending a
> container into a cloud service.

Agreed. Those also strike us as 2nd order/stage scope elements - once
orchestrated VMs and MMs are running a Guix-deployed OS and apps.

> I think we can build this up a step at a time.

It is not clear, but it does sound like you could intend to implement
everything in Guix? Or take a more "build small, build modular, and
build simple" approach where Guix connects some pre-existing elements?
We've identified one in Tinkerbell/tink, where Guix would get the
benefit of "four microservices that take you from a powered off server
to a high-level execution environment running your very special custom
thingamabobber", but there may be others better suited?

>
> Thoughts?

Very interesting. Thanks for sharing.

[1]: https://github.com/tinkerbell/tink
[2]: https://github.com/tinkerbell/tink/blob/master/docs/hello-world.md
[3]: https://github.com/tinkerbell/tink/blob/master/docs/setup.md
[4]: https://github.com/alexellis/awesome-baremetal
[5]: https://www.packet.com/blog/open-sourcing-tinkerbell

-- 
Kind Regards

Begley Brothers Inc.

The content of this email is confidential and intended for the
recipient specified in message only. It is strictly forbidden to share
any part of this message with any third party, without a written
consent of the sender. If you received this message by mistake, please
reply to this message and follow with its deletion, so that we can
ensure such a mistake does not occur in the future.
This message has been sent as a part of discussion between Begley
Brothers Inc. and the addressee whose name is specified above. Should
you receive this message by mistake, we would be most grateful if you
informed us that the message has been sent to you. In this case, we
also ask that you delete this message from your mailbox, and do not
forward it or any part of it to anyone else. Thank you for your
cooperation and understanding.
Begley Brothers Inc. puts the security of the client at a high
priority. Therefore, we have put efforts into ensuring that the
message is error and virus-free. Unfortunately, full security of the
email cannot be ensured as, despite our efforts, the data included in
emails could be infected, intercepted, or corrupted. Therefore, the
recipient should check the email for threats with proper software, as
the sender does not accept liability for any damage inflicted by
viewing the content of this email.
The views and opinions included in this email belong to their author
and do not necessarily mirror the views and opinions of the company.
Our employees are obliged not to make any defamatory clauses,
infringe, or authorize infringement of any legal right. Therefore, the
company will not take any liability for such statements included in
emails. In case of any damages or other liabilities arising, employees
are fully responsible for the content of their emails.


* Re: Slurm with containers (i.e., orchestration)
From: Begley Brothers Inc @ 2020-05-20  2:13 UTC
  To: Pjotr Prins; +Cc: guix-devel

P.S
Just to keep things interesting - GWL: A workflow management language
extension for GNU Guix

The Guix Workflow Language (GWL) provides a scientific computing
extension to GNU Guix's declarative language for package management
for the declaration of scientific workflows.

https://www.guixwl.org/tutorial

-- 
Kind Regards

Begley Brothers Inc.
