From: Maurice Brémond
To: zimoun
Cc: 39588@debbugs.gnu.org, Ludovic Courtès
Subject: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
Date: Mon, 19 Oct 2020 15:46:20 +0200
Message-ID: <87lfg2pbv7.fsf@inria.fr>
In-Reply-To: (zimoun's message of "Fri, 16 Oct 2020 13:46:16 +0200")
References: <87blq2rclk.fsf@inria.fr> <87o8tx3z2q.fsf@gnu.org> <87eeupd3t1.fsf@gnu.org> <861rhz1d7b.fsf@gmail.com> <87o8l28qjh.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Hello,

A build of mumps-openmpi with mpich fails:

  guix time-machine -- build mumps-openmpi --with-input=openmpi=mpich

  [...]
  mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Fatal error in PMPI_Init: Other MPI error, error stack:
  MPIR_Init_thread(586)..............:
  MPID_Init(224).....................: channel initialization failed
  MPIDI_CH3_Init(105)................:
  MPID_nem_init(324).................:
  MPID_nem_tcp_init(175).............:
  MPID_nem_tcp_get_business_card(401):
  MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)

This is what Ludo reproduced:

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Description: test case

From: Ludovic Courtès
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
To: Maurice Brémond
Cc: 39588@debbugs.gnu.org, zimoun
Date: Fri, 21 Feb 2020 12:32:44 +0100 (34 weeks, 3 days, 2 hours ago)

Hi,

I actually managed to reproduce it with a minimal test case (attached):

  $ guix build -f mpich-test.scm
  substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
  La jena derivo estos konstruata:
     /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv
  building /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv...
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  /gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Fatal error in PMPI_Init: Other MPI error, error stack:
  MPIR_Init_thread(586)..............:
  MPID_Init(224).....................: channel initialization failed
  MPIDI_CH3_Init(105)................:
  MPID_nem_init(324).................:
  MPID_nem_tcp_init(175).............:
  MPID_nem_tcp_get_business_card(401):
  MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
  Backtrace:
             1 (primitive-load "/gnu/store/iykxzg1n018sigd4c23kx1c4ngz?")
  In guix/build/utils.scm:
      652:6  0 (invoke _ . _)

  guix/build/utils.scm:652:6: In procedure invoke:
  Throw to key `srfi-34' with args `(#)'.
  builder for `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed with exit code 1
  build of /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv failed
  View build log at '/var/log/guix/drvs/rg/r7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv.bz2'.
  guix build: error: build of `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed

The same program outside the container works just fine:

  $ guix environment --ad-hoc mpich -- mpiexec -np 2 "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init"
  np = 2, rank = 0
  np = 2, rank = 1

‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
Interestingly, ‘getaddrinfo’ fails in the build environment when passed
the flags that ‘MPL_get_sockaddr’ uses:

  (computed-file "getaddrinfo"
                 #~(pk #$output
                       (getaddrinfo "localhost" #f
                                    (logior AI_ADDRCONFIG AI_V4MAPPED)
                                    AF_INET
                                    SOCK_STREAM
                                    IPPROTO_TCP)))

However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it
works.

Now we need to see why the ‘ai_family’ hint is causing trouble in
glibc, and perhaps in parallel try to work around it in MPICH…

Ludo’.

PS: I’ll be mostly away from keyboard in the coming days.

(use-modules (guix) (gnu))

(define code
  (plain-file "mpi.c" "
#include <mpi.h>
#include <assert.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
  int err, np, rank;

  err = MPI_Init (&argc, &argv);
  assert (err == 0);

  err = MPI_Comm_size(MPI_COMM_WORLD, &np);
  assert (err == 0);

  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  assert (err == 0);

  printf (\"np = %i, rank = %i\\n\", np, rank);
  return 0;
}
"))

(define toolchain
  (specification->package "gcc-toolchain"))

(define mpich
  (specification->package "mpich"))

(computed-file "mpi-init"
               (with-imported-modules '((guix build utils))
                 #~(begin
                     (use-modules (guix build utils))

                     (setenv "PATH"
                             (string-append #$(file-append toolchain "/bin") ":"
                                            #$(file-append mpich "/bin")))
                     (setenv "CPATH" #$(file-append mpich "/include"))
                     (setenv "LIBRARY_PATH"
                             (string-append #$(file-append mpich "/lib") ":"
                                            #$(file-append toolchain "/lib")))
                     (invoke "mpicc" "-o" #$output "-Wall" "-g" #$code)

                     ;; Run the MPI code in the build environment.
                     (invoke "mpiexec" "-np" "2" #$output))))

--=-=-=
Content-Type: text/plain

Note that it works with the raw mpich patch:

  guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 \
    -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=add-mpich \
    -- build mumps-openmpi --with-input=openmpi=mpich

I tried a build with hwloc at commit
f7b08df258c2e7d04ca2035ddd55a1de91f806d4 (the HEAD used for the hwloc
embedded in mpich), but the result is the same:

  guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 \
    -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=test-mpich \
    -- build mumps-openmpi --with-input=openmpi=mpich

(Why two time-machine steps are needed is another question...)

Maurice

--=-=-=--