unofficial mirror of bug-guix@gnu.org 
* bug#44387: SLURM client version must match daemon version
@ 2020-11-02  9:10 Ludovic Courtès
  2020-11-02 14:36 ` Ludovic Courtès
From: Ludovic Courtès @ 2020-11-02  9:10 UTC
  To: 44387

Hello,

We’ve noticed the problem below on clusters running a foreign distro
when slurmd is version 19.x and our clients are version 20.x:

--8<---------------cut here---------------start------------->8---
[courtes@devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- package -A slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
python-slurm-magic      0.0-0.73dd1a2   out     gnu/packages/parallel.scm:225:4
slurm   20.02.5 out     gnu/packages/parallel.scm:109:2
slurm-drmaa     1.1.1   out     gnu/packages/parallel.scm:194:2
[courtes@devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- environment --ad-hoc slurm -- squeue
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
slurm_load_jobs error: Zero Bytes were transmitted or received
[courtes@devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- package -A slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
python-slurm-magic      0.0-0.73dd1a2   out     gnu/packages/parallel.scm:225:4
slurm   19.05.3-2       out     gnu/packages/parallel.scm:109:2
slurm-drmaa     1.1.1   out     gnu/packages/parallel.scm:194:2
[courtes@devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- environment --ad-hoc slurm -- squeue
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[courtes@devel01 ~]$ /usr/bin/squeue --version
slurm 19.05.2
--8<---------------cut here---------------end--------------->8---

It means that we cannot generally use the Guix-provided SLURM on
clusters running foreign distros.

<https://slurm.schedmd.com/troubleshoot.html#network> reads:

  Slurm daemons will support RPCs and state files from the two previous
  major releases (e.g. a version 17.11.x SlurmDBD will support slurmctld
  daemons and commands with a version of 17.11.x, 17.02.x or 16.05.x).

Looking at <https://download.schedmd.com/slurm/>, there have been quite a
few releases between 19.05.3-2 and 20.02.5, which may explain the
problem I described.
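
To make the quoted rule concrete, here is a minimal Python sketch of one
reading of it: a daemon accepts clients from its own release series or from
the two previous series. The series list and the names `SLURM_SERIES`,
`series` and `daemon_supports` are illustrative assumptions, not part of
SLURM. Under this reading a 20.02 client is newer than what a 19.05 daemon
accepts, which would be consistent with the failure shown above.

```python
# Major Slurm release series in chronological order.  Illustrative list,
# not exhaustive: extend it to cover the versions you care about.
SLURM_SERIES = ["16.05", "17.02", "17.11", "18.08", "19.05", "20.02"]


def series(version):
    """Return the release series of a version string: '19.05.3-2' -> '19.05'."""
    return ".".join(version.split(".")[:2])


def daemon_supports(daemon_version, client_version, window=2):
    """Return True if the client is from the daemon's own series or from one
    of the `window` previous series (assumed reading of the quoted rule)."""
    d = SLURM_SERIES.index(series(daemon_version))
    c = SLURM_SERIES.index(series(client_version))
    return 0 <= d - c <= window


if __name__ == "__main__":
    # Daemon is the foreign distro's 19.05.2, as reported by /usr/bin/squeue.
    for client in ("19.05.3-2", "20.02.5"):
        print(client, daemon_supports("19.05.2", client))
```

A series missing from the list raises `ValueError`, which seems preferable
to guessing at compatibility.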


Apparently the only .so in Open MPI linked against SLURM is
‘lib/openmpi/mca_pmix_s1.so’.  The diff suggests that the two versions are
not ABI-compatible, so one wouldn’t be able to use ‘--with-graft’ to
graft one version in lieu of the other:

--8<---------------cut here---------------start------------->8---
[courtes@devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- build slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
/gnu/store/37b7qnwck4pg51qia4w002i62g156xgw-slurm-19.05.3-2
[courtes@devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- build slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
/gnu/store/7n6aks2wcmn2pxv03q8ij38hsj9zfzk9-slurm-20.02.5
[courtes@devel01 ~]$ abidiff --stat /gnu/store/37b7qnwck4pg51qia4w002i62g156xgw-slurm-19.05.3-2/lib/slurm/libslurmfull.so /gnu/store/7n6aks2wcmn2pxv03q8ij38hsj9zfzk9-slurm-20.02.5/lib/slurm/libslurmfull.so
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 80 Removed, 162 Added function symbols not referenced by debug info
Variable symbols changes summary: 3 Removed, 0 Added variable symbols not referenced by debug info
--8<---------------cut here---------------end--------------->8---
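
For what it's worth, the summary above can be reduced to a single number
with a few lines of Python. This is only a sketch under a stated
assumption: removed or changed symbols are treated as ABI breaks, while
added symbols are treated as harmless. The function name
`removed_or_changed` is made up for illustration.

```python
import re

# Summary lines copied from the abidiff --stat output above.
SUMMARY = """\
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 80 Removed, 162 Added function symbols not referenced by debug info
Variable symbols changes summary: 3 Removed, 0 Added variable symbols not referenced by debug info
"""


def removed_or_changed(summary):
    """Sum the 'Removed' and 'Changed' counts over all summary lines."""
    total = 0
    for count, kind in re.findall(r"(\d+) (Removed|Changed|Added)", summary):
        if kind != "Added":  # assumption: additions do not break consumers
            total += int(count)
    return total


if __name__ == "__main__":
    broken = removed_or_changed(SUMMARY)
    print(broken, "removed or changed symbols")
    print("graftable" if broken == 0 else "not ABI-compatible under this heuristic")
```

With the figures above, the 80 removed function symbols and 3 removed
variable symbols are what rule out grafting under this heuristic.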

What can we do about it?

At least, we should package several known-useful versions, so that
people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
refer to the version they want in their profile.  I’ll work on that.

Anything else?

I heard that PMIx, a scheduler-independent API, will eventually
supersede SLURM in Open MPI.  Let’s see if that loosens version
requirements.

Thanks,
Ludo’.

* bug#44387: SLURM client version must match daemon version
  2020-11-02  9:10 bug#44387: SLURM client version must match daemon version Ludovic Courtès
@ 2020-11-02 14:36 ` Ludovic Courtès
  2020-11-02 16:27   ` Ricardo Wurmus
From: Ludovic Courtès @ 2020-11-02 14:36 UTC
  To: 44387

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> At least, we should package several known-useful versions, so that
> people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
> refer to the version they want in their profile.  I’ll work on that.

I’ve reintroduced version 19.05:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=e1bd62eb5ce0f2410b2607f157989588791b43e0


* bug#44387: SLURM client version must match daemon version
  2020-11-02 14:36 ` Ludovic Courtès
@ 2020-11-02 16:27   ` Ricardo Wurmus
From: Ricardo Wurmus @ 2020-11-02 16:27 UTC
  To: Ludovic Courtès; +Cc: 44387


Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludovic.courtes@inria.fr> writes:
>
>> At least, we should package several known-useful versions, so that
>> people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
>> refer to the version they want in their profile.  I’ll work on that.
>
> I’ve reintroduced version 19.05:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=e1bd62eb5ce0f2410b2607f157989588791b43e0

Good call.  It seems like a good idea to keep older major versions
around.

There’s a similar problem with postgres, which needs (or used to need)
more than one version to upgrade existing data from an older version.

-- 
Ricardo