* Guix and openmpi in a container environment

From: Todor Kondić <tk.code@protonmail.com>
Date: 2020-01-19 10:25 UTC
To: help-guix@gnu.org

I am getting mpirun errors when trying to execute a simple

    mpirun -np 1 program

(where program is e.g. 'ls') in a container environment. The error is usually:

    All nodes which are allocated for this job are already filled.

This makes no sense, as I am trying this on my workstation (single socket, four cores, an off-the-shelf i5 CPU) with no scheduling system enabled.

I set up the container with this command:

    guix environment -C -N --ad-hoc -m default.scm

where default.scm is:

    (use-modules (guix packages))
    (specifications->manifest
     `(;; Utilities
       "less"
       "bash"
       "make"
       "openssh"
       "guile"
       "nano"
       "glibc-locales"
       "gcc-toolchain@7.4.0"
       "gfortran-toolchain@7.4.0"
       "python"
       "openmpi"
       "fftw"
       "fftw-openmpi"
       ,@(map (lambda (x) (package-name x)) %base-packages)))

Simply installing openmpi (guix package -i openmpi) in my usual Guix profile works out of the box. So there has to be some quirk that leaves the openmpi installation in the container blind to settings present in the usual environment.
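[Editor's note: a quick way to see how many slots Open MPI believes it has inside the container is mpirun's --display-allocation option. The transcript below is a sketch: the flags are standard Open MPI and Guix options, but the exact output depends on the Open MPI version.]

```shell
# Enter the container as described in the message above
guix environment -C -N --ad-hoc -m default.scm

# Inside the container: print the node/slot allocation Open MPI computes
# before launching; compare it against the same command run on the host
mpirun --display-allocation -np 1 true

# ompi_info summarizes the build configuration and detected components
ompi_info | less
```

If the allocation shows zero or one slot for the node inside the container but four on the host, the slot-detection logic (not the application) is what differs between the two environments.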
* Re: Guix and openmpi in a container environment

From: Todor Kondić <tk.code@protonmail.com>
Date: 2020-01-27 10:54 UTC
To: help-guix@gnu.org

On Sunday, 19 January 2020 11:25, Todor Kondić <tk.code@protonmail.com> wrote:

> I am getting mpirun errors when trying to execute a simple
>
>     mpirun -np 1 program
>
> (where program is e.g. 'ls') in a container environment. The error is usually:
>
>     All nodes which are allocated for this job are already filled.
>
> [...]
>
> Simply installing openmpi (guix package -i openmpi) in my usual Guix
> profile works out of the box. So there has to be some quirk that leaves
> the openmpi installation in the container blind to settings present in
> the usual environment.

For the environment above, if the mpirun invocation is changed to provide the hostname,

    mpirun --host $HOSTNAME:4 -np 4 ls

ls is executed in four processes and the output is, as expected, four copies of the contents of the current directory. Of course, ls is not an MPI program.

However, this elementary Fortran MPI program

---
program testrun2
  use mpi
  implicit none
  integer :: ierr

  call mpi_init(ierr)
  call mpi_finalize(ierr)

end program testrun2
---

fails with runtime errors on any number of processes.

The compilation line was:

    mpif90 test2.f90 -o testrun2

The mpirun command:

    mpirun --host $HOSTNAME:4 -np 4 ./testrun2

Let me reiterate: in the normal user environment there is no need to declare the host and its maximum number of slots, and the runtime errors disappear.

Could it be that, for the particular case of a single-node machine (a normal PC), the openmpi package needs a few basic dependencies that are not present in the package declaration?

Also, I noticed that gfortran/mpif90 ignores the CPATH and LIBRARY_PATH environment variables; I had to pass the paths explicitly via the -I and -L compiler flags.
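[Editor's note: a slightly more informative variant of the test program above prints each rank, which distinguishes "processes fail to launch" from "MPI fails to initialize". This is a sketch using only the standard MPI Fortran API; compile and run lines mirror those in the message.]

```fortran
program testrank
  use mpi
  implicit none
  integer :: ierr, rank, nprocs

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
  print '(a,i0,a,i0)', 'rank ', rank, ' of ', nprocs
  call mpi_finalize(ierr)
end program testrank
```

Compile and run as before, e.g. `mpif90 testrank.f90 -o testrank && mpirun --host $HOSTNAME:4 -np 4 ./testrank`. Four distinct rank lines would show that both process launch and MPI initialization work.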
* Re: Guix and openmpi in a container environment

From: Todor Kondić <tk.code@protonmail.com>
Date: 2020-01-27 12:48 UTC
To: help-guix@gnu.org

On Monday, 27 January 2020 11:54, Todor Kondić <tk.code@protonmail.com> wrote:

> For the environment above, if the mpirun invocation is changed to
> provide the hostname,
>
>     mpirun --host $HOSTNAME:4 -np 4 ls
>
> ls is executed in four processes and the output is, as expected, four
> copies of the contents of the current directory. Of course, ls is not
> an MPI program. However, this elementary Fortran MPI program [...]
> fails with runtime errors on any number of processes.
>
> [...]
>
> Also, I noticed that gfortran/mpif90 ignores the CPATH and LIBRARY_PATH
> environment variables; I had to pass the paths explicitly via the -I
> and -L compiler flags.

After playing around a bit more, I can confirm that a pure guix environment does work. Therefore, my solution is to drop the -C flag and use --pure instead when developing and testing MPI code on my workstation.

Of course, it would be interesting to find out why Open MPI stops working inside the "-C" environment. The closest solved problem I could find on the net concerned friction between Open MPI's new vader shared-memory module and Docker containers (https://github.com/open-mpi/ompi/issues/4948). The workaround recommended there did not help, but the problem feels related.
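[Editor's note: for reference, the workaround discussed in the linked Open MPI issue disables vader's single-copy (CMA) mechanism, which is known to fail across container namespace boundaries; a blunter variant excludes the vader BTL entirely. The MCA parameter names are standard Open MPI options; the command lines are a sketch.]

```shell
# Disable vader's CMA single-copy mechanism (the workaround from ompi#4948)
mpirun --mca btl_vader_single_copy_mechanism none \
       --host $HOSTNAME:4 -np 4 ./testrun2

# Or exclude the vader BTL altogether, forcing self/tcp transports
mpirun --mca btl self,tcp --host $HOSTNAME:4 -np 4 ./testrun2
```

If the second form succeeds where the first fails, the problem lies in shared-memory transport inside the container rather than in slot detection.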
* Re: Guix and openmpi in a container environment

From: Ludovic Courtès
Date: 2020-02-12 14:10 UTC
To: Todor Kondić; Cc: help-guix@gnu.org

Hello Todor,

Todor Kondić <tk.code@protonmail.com> skribis:

>     guix environment -C -N --ad-hoc -m default.scm
>
> Simply installing openmpi (guix package -i openmpi) in my usual Guix
> profile just works out of the box. So, there has to be some quirk where
> the openmpi container installation is blind to some settings within the
> usual environment.

Open MPI and its “drivers” (UCX, PSM, etc.) browse /sys, /proc, and /dev to determine what devices are available. Could it be that one of these things is missing or different inside the container? Does ‘strace’ reveal anything?

The article at <https://hpc.guix.info/blog/2019/12/optimized-and-portable-open-mpi-packaging/> might shed some light on some of these things.

HTH,
Ludo’.
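[Editor's note: following this suggestion, one way to test the hypothesis is to re-enter the container with the host's /sys shared into it, and to trace file accesses during a failing run. `--expose` is a documented option of guix environment containers, and strace's syscall filter is illustrative; paths that matter may vary by Open MPI version.]

```shell
# Re-create the container, sharing the host's /sys (read-only) with it
guix environment -C -N --expose=/sys --ad-hoc -m default.scm

# Inside the container: record which files mpirun tries to open
strace -f -e trace=openat -o mpirun.trace mpirun -np 1 true

# Look for failed accesses under /sys, /proc, or /dev
grep -E '/(sys|proc|dev)/' mpirun.trace | grep -- '-1 E' | head
```

A flood of ENOENT results under /sys on the unmodified container, disappearing once /sys is exposed, would confirm that topology discovery is what breaks inside "-C".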