From: Ludovic Courtès
Subject: bug#46229: rdma-core 33.x breaks InfiniBand support in Open MPI
To: 46229@debbugs.gnu.org
Cc: Florent Pruvost, Greg Hogan
Date: Mon, 01 Feb 2021 09:55:19 +0100
Message-ID: <87r1m0i2vc.fsf@inria.fr>

Hello,

We noticed that the recent rdma-core upgrade to 33.1¹ leads to segfaults
in InfiniBand-related routines:

--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=23a5dcce1d893b8f5c5301ae3c1af863776ed3cf -- environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks --with-debug-info=rdma-core -- mpiexec -np 2 IMB-MPI1 PingPong
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node devel02 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
$ file core.20879
core.20879: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'IMB-MPI1 PingPong', real uid: 10218, effective uid: 10218, real gid: 11018, effective gid: 11018, execfn: '/gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1', platform: 'x86_64'
$ gdb /gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1 core.20879
(gdb) bt
#0  0x00007f93b2789e88 in ibv_cmd_create_cq ()
   from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
#1  0x00007f93b28c57bb in hfi1_create_cq ()
   from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs/libhfi1verbs-rdmav33.so
#2  0x00007f93b2796331 in ibv_create_cq@@IBVERBS_1.1 ()
   from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
#3  0x00007f93b27c0a55 in opal_common_verbs_qp_test ()
   from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmca_common_verbs.so.40
#4  0x00007f93b27f4e83 in btl_openib_component_init ()
   from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_btl_openib.so
#5  0x00007f93b4516aaf in mca_btl_base_select ()
   from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libopen-pal.so.40
#6  0x00007f93b29552c2 in mca_bml_r2_component_init ()
   from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_bml_r2.so
#7  0x00007f93b4b81b54 in mca_bml_base_init ()
   from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
#8  0x00007f93b4bc4ef8 in ompi_mpi_init ()
   from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
#9  0x00007f93b4b5ee55 in PMPI_Init_thread ()
   from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
#10 0x0000000000405b55 in main ()
--8<---------------cut here---------------end--------------->8---

Conversely, a pre-upgrade commit works fine:

--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=c2538db5617032788ac2f140496d00d8107579c8 -- environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks -- mpiexec -np 2 IMB-MPI1 PingPong
--8<---------------cut here---------------end--------------->8---

Does that ring a bell?

Thanks,
Ludo’.

¹ https://git.savannah.gnu.org/cgit/guix.git/commit/?id=c2739c0801ebc5461564e862ce8f08405e2782dc