From: Efraim Flashner
To: Ludovic Courtès
Cc: Florent Pruvost, 46229@debbugs.gnu.org, Greg Hogan
Subject: bug#46229: rdma-core 33.x breaks InfiniBand support in Open MPI
Date: Mon, 1 Feb 2021 11:13:17 +0200
In-Reply-To: <87r1m0i2vc.fsf@inria.fr>
References: <87r1m0i2vc.fsf@inria.fr>
List-Id: Bug reports for GNU Guix

On Mon, Feb 01, 2021 at 09:55:19AM +0100, Ludovic Courtès wrote:
> Hello,
>
> We noticed that the recent rdma-core upgrade to 33.1¹ leads to segfaults
> in InfiniBand related routines:
>
> --8<---------------cut here---------------start------------->8---
> $ guix time-machine --commit=23a5dcce1d893b8f5c5301ae3c1af863776ed3cf -- environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks --with-debug-info=rdma-core -- mpiexec -np 2 IMB-MPI1 PingPong
> --------------------------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 1 with PID 0 on node devel02 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.20879
> core.20879: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'IMB-MPI1 PingPong', real uid: 10218, effective uid: 10218, real gid: 11018, effective gid: 11018, execfn: '/gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1', platform: 'x86_64'
> $ gdb /gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1 core.20879
> (gdb) bt
> #0  0x00007f93b2789e88 in ibv_cmd_create_cq ()
>    from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
> #1  0x00007f93b28c57bb in hfi1_create_cq ()
>    from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs/libhfi1verbs-rdmav33.so
> #2  0x00007f93b2796331 in ibv_create_cq@@IBVERBS_1.1 ()
>    from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
> #3  0x00007f93b27c0a55 in opal_common_verbs_qp_test ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmca_common_verbs.so.40
> #4  0x00007f93b27f4e83 in btl_openib_component_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_btl_openib.so
> #5  0x00007f93b4516aaf in mca_btl_base_select ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libopen-pal.so.40
> #6  0x00007f93b29552c2 in mca_bml_r2_component_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_bml_r2.so
> #7  0x00007f93b4b81b54 in mca_bml_base_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
> #8  0x00007f93b4bc4ef8 in ompi_mpi_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
> #9  0x00007f93b4b5ee55 in PMPI_Init_thread ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
> #10 0x0000000000405b55 in main ()
> --8<---------------cut here---------------end--------------->8---
>
> Conversely, a pre-upgrade commit works fine:
>
> --8<---------------cut here---------------start------------->8---
> $ guix time-machine --commit=c2538db5617032788ac2f140496d00d8107579c8 -- environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks -- mpiexec -np 2 IMB-MPI1 PingPong
> --8<---------------cut here---------------end--------------->8---
>
> Does that ring a bell?
>
> Thanks,
> Ludo’.
>
> ¹ https://git.savannah.gnu.org/cgit/guix.git/commit/?id=c2739c0801ebc5461564e862ce8f08405e2782dc
>

I thought I built everything that depended on rdma-core, and
unfortunately I don't have a way to test it. As an actual user of the
package I trust you to revert the change if necessary.

I don't see anything on their mailing list pointing to this, or any
other bugs really.

http://vger.kernel.org/vger-lists.html#linux-rdma

-- 
Efraim Flashner      אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted