From: Ludovic Courtès
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
To: Maurice Brémond
Cc: 39588@debbugs.gnu.org, zimoun
Date: Fri, 21 Feb 2020 12:32:44 +0100

Hi,

I actually managed to reproduce it with a minimal test case (attached):

  $ guix build -f mpich-test.scm
  substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
  The following derivation will be built:
     /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv
  building /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv...
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Fatal error in PMPI_Init: Other MPI error, error stack:
  MPIR_Init_thread(586)..............:
  MPID_Init(224).....................: channel initialization failed
  MPIDI_CH3_Init(105)................:
  MPID_nem_init(324).................:
  MPID_nem_tcp_init(175).............:
  MPID_nem_tcp_get_business_card(401):
  MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
  Backtrace:
             1 (primitive-load "/gnu/store/iykxzg1n018sigd4c23kx1c4ngz?")
  In guix/build/utils.scm:
      652:6  0 (invoke _ . _)

  guix/build/utils.scm:652:6: In procedure invoke:
  Throw to key `srfi-34' with args `(#)'.
  builder for `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed with exit code 1
  build of /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv failed
  View build log at '/var/log/guix/drvs/rg/r7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv.bz2'.
  guix build: error: build of `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed

The same program outside the container works just fine:

  $ guix environment --ad-hoc mpich -- mpiexec -np 2 "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init"
  np = 2, rank = 0
  np = 2, rank = 1

‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.  Interestingly,
‘getaddrinfo’ fails in the build environment when passed the flags that
‘MPL_get_sockaddr’ uses:

  (computed-file "getaddrinfo"
                 #~(pk #$output
                       (getaddrinfo "localhost" #f
                                    (logior AI_ADDRCONFIG AI_V4MAPPED)
                                    AF_INET SOCK_STREAM IPPROTO_TCP)))

However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, the call
succeeds.

Now we need to see why the ‘ai_family’ hint is causing trouble in glibc, and
perhaps, in parallel, try to work around it in MPICH…

Ludo’.

PS: I’ll be mostly away from keyboard in the coming days.
(use-modules (guix) (gnu))

(define code
  (plain-file "mpi.c" "
#include <mpi.h>
#include <stdio.h>
#include <assert.h>

int
main (int argc, char *argv[])
{
  int err, np, rank;

  err = MPI_Init (&argc, &argv);
  assert (err == 0);

  err = MPI_Comm_size(MPI_COMM_WORLD, &np);
  assert (err == 0);
  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  assert (err == 0);

  printf (\"np = %i, rank = %i\\n\", np, rank);
  return 0;
}
"))

(define toolchain
  (specification->package "gcc-toolchain"))

(define mpich
  (specification->package "mpich"))

(computed-file "mpi-init"
               (with-imported-modules '((guix build utils))
                 #~(begin
                     (use-modules (guix build utils))

                     (setenv "PATH"
                             (string-append #$(file-append toolchain "/bin")
                                            ":" #$(file-append mpich "/bin")))
                     (setenv "CPATH" #$(file-append mpich "/include"))
                     (setenv "LIBRARY_PATH"
                             (string-append #$(file-append mpich "/lib")
                                            ":" #$(file-append toolchain "/lib")))
                     (invoke "mpicc" "-o" #$output "-Wall" "-g" #$code)

                     ;; Run the MPI code in the build environment.
                     (invoke "mpiexec" "-np" "2" #$output))))