From: Todor Kondić
Subject: Re: Guix and openmpi in a container environment
Date: Mon, 27 Jan 2020 12:48:04 +0000
To: Todor Kondić
Cc: help-guix@gnu.org

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, 27 January 2020 11:54, Todor Kondić wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, 19 January 2020 11:25, Todor Kondić tk.code@protonmail.com wrote:
>
> > I am getting mpirun errors when trying to execute a simple
> >
> > mpirun -np 1 program
> >
> > (where program is e.g. 'ls') command in a container environment.
> > The error is usually:
> >
> > All nodes which are allocated for this job are already filled.
> >
> > which makes no sense, as I am trying this on my workstation (single
> > socket, four cores -- your off-the-shelf i5 CPU) with no scheduling
> > system enabled.
> >
> > I set up the container with this command:
> >
> > guix environment -C -N --ad-hoc -m default.scm
> >
> > where default.scm is:
> >
> > (use-modules (guix packages))
> >
> > (specifications->manifest
> >  `(;; Utilities
> >    "less"
> >    "bash"
> >    "make"
> >    "openssh"
> >    "guile"
> >    "nano"
> >    "glibc-locales"
> >    "gcc-toolchain@7.4.0"
> >    "gfortran-toolchain@7.4.0"
> >    "python"
> >    "openmpi"
> >    "fftw"
> >    "fftw-openmpi"
> >    ,@(map (lambda (x) (package-name x)) %base-packages)))
> >
> > Simply installing openmpi (guix package -i openmpi) in my usual Guix
> > profile just works out of the box. So, there has to be some quirk
> > where the openmpi container installation is blind to some settings
> > within the usual environment.
> >
> > For the environment above, if the mpirun invocation is changed to
> > provide the hostname,
> >
> > mpirun --host $HOSTNAME:4 -np 4 ls
> >
> > ls is executed in four processes and the output is four times the
> > contents of the current directory, as expected.
>
> Of course, ls is not an MPI program.
> However, testing this elementary Fortran MPI code,
>
> ----------------------------------------------------------------------
>
> program testrun2
>   use mpi
>   implicit none
>   integer :: ierr
>
>   call mpi_init(ierr)
>   call mpi_finalize(ierr)
>
> end program testrun2
>
> ----------------------------------------------------------------------
>
> fails with runtime errors on any number of processes.
>
> The compilation line was:
>
> mpif90 test2.f90 -o testrun2
>
> The mpirun command:
>
> mpirun --host $HOSTNAME:4 -np 4 ./testrun2
>
> Let me reiterate: in the normal user environment there is no need to
> declare the host and its maximal number of slots, and the runtime
> errors are gone there.
>
> Could it be that the openmpi package needs a few other basic
> dependencies, not present in the package declaration, for the
> particular case of a single-node (normal PC) machine?
>
> Also, I noted that gfortran/mpif90 ignores the CPATH and LIBRARY_PATH
> environment variables. I had to specify these explicitly via -I and -L
> flags to the compiler.

After playing around a bit more, I can confirm that a pure guix
environment does work. Therefore, my solution is to drop the -C flag and
use --pure when developing and testing the MPI code on my workstation.
Of course, it would be interesting to find out why OpenMPI stops working
inside the "-C" environment. The closest solved problem I could find on
the net concerned friction between OpenMPI's new vader shared-memory
module and Docker containers
(https://github.com/open-mpi/ompi/issues/4948). The recommended
circumvention technique did not work, but it feels related.
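For anyone who wants to dig into this further: the circumvention
discussed in that issue amounts to disabling vader's single-copy
mechanism. What I tried was along these lines, reusing the testrun2
binary from above (a sketch only; the exact MCA parameter spelling may
differ between OpenMPI versions):

mpirun --mca btl_vader_single_copy_mechanism none --host $HOSTNAME:4 -np 4 ./testrun2

Excluding the vader BTL altogether (mpirun --mca btl ^vader ...) might be
another experiment worth running inside the "-C" container, but I have
not verified whether it changes anything.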