all messages for Guix-related lists mirrored at yhetil.org
* Guix and openmpi in a container environment
@ 2020-01-19 10:25 Todor Kondić
  2020-01-27 10:54 ` Todor Kondić
  2020-02-12 14:10 ` Ludovic Courtès
  0 siblings, 2 replies; 4+ messages in thread
From: Todor Kondić @ 2020-01-19 10:25 UTC (permalink / raw)
  To: help-guix@gnu.org

I am getting mpirun errors when trying to execute a simple

mpirun -np 1 program

(where program is e.g. 'ls') command in a container environment.

The error is usually:

All nodes which are allocated for this job are already filled.

which makes no sense, as I am trying this on my workstation (single socket, four cores -- your off-the-shelf i5 CPU) with no scheduling system enabled.


I set up the container with this command:

guix environment -C -N --ad-hoc -m default.scm

where default.scm:

(use-modules (guix packages))
(specifications->manifest
 `(;; Utilities
   "less"
   "bash"
   "make"
   "openssh"
   "guile"
   "nano"
   "glibc-locales"
   "gcc-toolchain@7.4.0"
   "gfortran-toolchain@7.4.0"
   "python"
   "openmpi"
   "fftw"
   "fftw-openmpi"
   ,@(map package-name %base-packages)))



Simply installing openmpi (guix package -i openmpi) in my usual Guix profile works out of the box. So there must be some quirk by which the openmpi installation inside the container is blind to settings present in the usual environment.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Guix and openmpi in a container environment
  2020-01-19 10:25 Guix and openmpi in a container environment Todor Kondić
@ 2020-01-27 10:54 ` Todor Kondić
  2020-01-27 12:48   ` Todor Kondić
  2020-02-12 14:10 ` Ludovic Courtès
  1 sibling, 1 reply; 4+ messages in thread
From: Todor Kondić @ 2020-01-27 10:54 UTC (permalink / raw)
  To: Todor Kondić; +Cc: help-guix@gnu.org

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, 19 January 2020 11:25, Todor Kondić <tk.code@protonmail.com> wrote:

> I am getting mpirun errors when trying to execute a simple
>
> mpirun -np 1 program
>
> (where program is e.g. 'ls') command in a container environment.
>
> The error is usually:
>
> All nodes which are allocated for this job are already filled.
>
> which makes no sense, as I am trying this on my workstation (single socket, four cores -- your off-the-shelf i5 CPU) with no scheduling system enabled.
>
> I set up the container with this command:
>
> guix environment -C -N --ad-hoc -m default.scm
>
> where default.scm:
>
> (use-modules (guix packages))
> (specifications->manifest
> `(;; Utilities
> "less"
> "bash"
> "make"
> "openssh"
> "guile"
> "nano"
> "glibc-locales"
> "gcc-toolchain@7.4.0"
> "gfortran-toolchain@7.4.0"
> "python"
> "openmpi"
> "fftw"
> "fftw-openmpi"
> ,@(map package-name %base-packages)))
>
> Simply installing openmpi (guix package -i openmpi) in my usual Guix profile works out of the box. So there must be some quirk by which the openmpi installation inside the container is blind to settings present in the usual environment.

For the environment above, if the mpirun invocation is changed to provide the hostname explicitly,

mpirun --host $HOSTNAME:4 -np 4 ls

ls is executed in four processes and the output is four times the contents of the current directory as expected.

Of course, ls is not an MPI program. However, this elementary Fortran MPI program,

---
program testrun2
  use mpi
  implicit none
  integer :: ierr

  call mpi_init(ierr)
  call mpi_finalize(ierr)

end program testrun2
---

fails with runtime errors on any number of processes.


The compilation line was:
mpif90 test2.f90 -o testrun2

The mpirun command:
mpirun --host $HOSTNAME:4 -np 4 ./testrun2


Let me reiterate: in the normal user environment there is no need to declare the host and its maximal number of slots, and the runtime errors are gone.

Could it be that the openmpi package needs a few other basic dependencies that are not present in the package declaration for the particular case of a single-node (normal PC) machine?

Also, I noticed that gfortran/mpif90 ignores the CPATH and LIBRARY_PATH environment variables; I had to specify the paths explicitly via -I and -L flags to the compiler.
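For reference, the explicit flags looked along these lines (a sketch only: GUIX_ENVIRONMENT and the include/lib subdirectories are assumptions about the profile layout, adjust to your setup):

```shell
# Sketch: build the -I/-L flags by hand, since mpif90 here ignored
# CPATH and LIBRARY_PATH.  GUIX_ENVIRONMENT is set by `guix environment`;
# fall back to the default profile outside such an environment.
MPI_PREFIX="${GUIX_ENVIRONMENT:-$HOME/.guix-profile}"
FCFLAGS="-I$MPI_PREFIX/include"
LDFLAGS="-L$MPI_PREFIX/lib"
# Print the compile command rather than running it, so the sketch
# works even where mpif90 is absent:
echo mpif90 "$FCFLAGS" "$LDFLAGS" test2.f90 -o testrun2
```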

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Guix and openmpi in a container environment
  2020-01-27 10:54 ` Todor Kondić
@ 2020-01-27 12:48   ` Todor Kondić
  0 siblings, 0 replies; 4+ messages in thread
From: Todor Kondić @ 2020-01-27 12:48 UTC (permalink / raw)
  To: Todor Kondić; +Cc: help-guix@gnu.org

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, 27 January 2020 11:54, Todor Kondić <tk.code@protonmail.com> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, 19 January 2020 11:25, Todor Kondić tk.code@protonmail.com wrote:
>
> > I am getting mpirun errors when trying to execute a simple
> > mpirun -np 1 program
> > (where program is e.g. 'ls') command in a container environment.
> > The error is usually:
> > All nodes which are allocated for this job are already filled.
> > which makes no sense, as I am trying this on my workstation (single socket, four cores -- your off-the-shelf i5 CPU) with no scheduling system enabled.
> > I set up the container with this command:
> > guix environment -C -N --ad-hoc -m default.scm
> > where default.scm:
> > (use-modules (guix packages))
> > (specifications->manifest
> > `(;; Utilities
> > "less"
> > "bash"
> > "make"
> > "openssh"
> > "guile"
> > "nano"
> > "glibc-locales"
> > "gcc-toolchain@7.4.0"
> > "gfortran-toolchain@7.4.0"
> > "python"
> > "openmpi"
> > "fftw"
> > "fftw-openmpi"
> > ,@(map package-name %base-packages)))
> > Simply installing openmpi (guix package -i openmpi) in my usual Guix profile works out of the box. So there must be some quirk by which the openmpi installation inside the container is blind to settings present in the usual environment.
>
> For the environment above,
>
> if the mpirun invocation is changed to provide the hostname
>
> mpirun --host $HOSTNAME:4 -np 4 ls
>
> ls is executed in four processes and the output is four times the contents of the current directory as expected.
>
> Of course, ls is not an MPI program. However, this elementary Fortran MPI program,
>
> ---
>
> program testrun2
> use mpi
> implicit none
> integer :: ierr
>
> call mpi_init(ierr)
> call mpi_finalize(ierr)
>
> end program testrun2
>
> ---
>
> fails with runtime errors on any number of processes.
>
> The compilation line was:
> mpif90 test2.f90 -o testrun2
>
> The mpirun command:
> mpirun --host $HOSTNAME:4 -np 4 ./testrun2
>
> Let me reiterate: in the normal user environment there is no need to declare the host and its maximal number of slots, and the runtime errors are gone.
>
> Could it be that the openmpi package needs a few other basic dependencies that are not present in the package declaration for the particular case of a single-node (normal PC) machine?
>
> Also, I noticed that gfortran/mpif90 ignores the CPATH and LIBRARY_PATH environment variables; I had to specify the paths explicitly via -I and -L flags to the compiler.



After playing around a bit more, I can confirm that a pure Guix environment does work. Therefore, my solution is to drop the -C flag and use --pure when developing and testing MPI code on my workstation.

Of course, it would be interesting to find out why Open MPI stops working inside the "-C" environment. The closest related problem I could find online concerns friction between Open MPI's new vader shared-memory module and Docker containers (https://github.com/open-mpi/ompi/issues/4948). The recommended circumvention technique did not work for me, but the problem feels related.
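For reference, the circumvention discussed in that issue amounts to disabling vader's single-copy (CMA) mechanism; a sketch of what that looks like (the MCA parameter name is taken from that issue, and as said above it did not help in this case):

```shell
# Workaround sketch from the linked ompi issue: disable vader's
# CMA single-copy mechanism, which container runtimes often block.
export OMPI_MCA_btl_vader_single_copy_mechanism=none
# Equivalent command-line form (printed, not run, for illustration):
echo "mpirun --mca btl_vader_single_copy_mechanism none -np 4 ./testrun2"
```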

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Guix and openmpi in a container environment
  2020-01-19 10:25 Guix and openmpi in a container environment Todor Kondić
  2020-01-27 10:54 ` Todor Kondić
@ 2020-02-12 14:10 ` Ludovic Courtès
  1 sibling, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2020-02-12 14:10 UTC (permalink / raw)
  To: Todor Kondić; +Cc: help-guix@gnu.org

Hello Todor,

Todor Kondić <tk.code@protonmail.com> skribis:

> guix environment -C -N --ad-hoc -m default.scm

> Simply installing openmpi (guix package -i openmpi) in my usual Guix profile works out of the box. So there must be some quirk by which the openmpi installation inside the container is blind to settings present in the usual environment.

Open MPI and its “drivers” (UCX, PSM, etc.) browse /sys, /proc, and /dev
to determine what devices are available.  Could it be that one of these
things is missing or different inside the container?  Does ‘strace’
reveal anything?
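A sketch of such an strace session (the syscall filter is illustrative, and the sample trace below is fabricated for demonstration, not actual Open MPI output):

```shell
# Sketch: run mpirun under strace, then filter the trace for the /sys,
# /proc and /dev paths Open MPI probes.  A real trace would come from:
#   strace -f -e trace=openat -o mpi.strace mpirun -np 1 ./testrun2
# Here a tiny fabricated trace stands in for it:
cat > mpi.strace <<'EOF'
openat(AT_FDCWD, "/sys/class/net", O_RDONLY) = 3
openat(AT_FDCWD, "/proc/cpuinfo", O_RDONLY) = 4
openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 5
EOF
# Keep only the device/sysfs/procfs lookups, deduplicated:
grep -E '"/(sys|proc|dev)/' mpi.strace | sort -u
```

Comparing that filtered list between the container and the normal environment should show which probes fail inside -C.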

The article at
<https://hpc.guix.info/blog/2019/12/optimized-and-portable-open-mpi-packaging/>
might shed some light on some of these things.

HTH,
Ludo’.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-02-12 14:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-19 10:25 Guix and openmpi in a container environment Todor Kondić
2020-01-27 10:54 ` Todor Kondić
2020-01-27 12:48   ` Todor Kondić
2020-02-12 14:10 ` Ludovic Courtès

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.