From: Efraim Flashner <efraim@flashner.co.il>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>
Cc: Florent Pruvost <florent.pruvost@inria.fr>,
	46229@debbugs.gnu.org, Greg Hogan <code@greghogan.com>
Subject: bug#46229: rdma-core 33.x breaks InfiniBand support in Open MPI
Date: Mon, 1 Feb 2021 11:13:17 +0200
Message-ID: <YBfGLcBzFiLpPkh7@3900XT>
In-Reply-To: <87r1m0i2vc.fsf@inria.fr>

On Mon, Feb 01, 2021 at 09:55:19AM +0100, Ludovic Courtès wrote:
> Hello,
> 
> We noticed that the recent rdma-core upgrade to 33.1¹ leads to segfaults
> in InfiniBand related routines:
> 
> --8<---------------cut here---------------start------------->8---
> $ guix time-machine --commit=23a5dcce1d893b8f5c5301ae3c1af863776ed3cf --  environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks --with-debug-info=rdma-core -- mpiexec -np 2 IMB-MPI1 PingPong
> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 1 with PID 0 on node devel02 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.20879 
> core.20879: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'IMB-MPI1 PingPong', real uid: 10218, effective uid: 10218, real gid: 11018, effective gid: 11018, execfn: '/gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1', platform: 'x86_64'
> $ gdb /gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1 core.20879 
> (gdb) bt
> #0  0x00007f93b2789e88 in ibv_cmd_create_cq ()
>    from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
> #1  0x00007f93b28c57bb in hfi1_create_cq ()
>    from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs/libhfi1verbs-rdmav33.so
> #2  0x00007f93b2796331 in ibv_create_cq@@IBVERBS_1.1 ()
>    from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
> #3  0x00007f93b27c0a55 in opal_common_verbs_qp_test ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmca_common_verbs.so.40
> #4  0x00007f93b27f4e83 in btl_openib_component_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_btl_openib.so
> #5  0x00007f93b4516aaf in mca_btl_base_select ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libopen-pal.so.40
> #6  0x00007f93b29552c2 in mca_bml_r2_component_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_bml_r2.so
> #7  0x00007f93b4b81b54 in mca_bml_base_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
> #8  0x00007f93b4bc4ef8 in ompi_mpi_init ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
> #9  0x00007f93b4b5ee55 in PMPI_Init_thread ()
>    from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
> #10 0x0000000000405b55 in main ()
> --8<---------------cut here---------------end--------------->8---
> 
> Conversely, a pre-upgrade commit works fine:
> 
> --8<---------------cut here---------------start------------->8---
> $ guix time-machine --commit=c2538db5617032788ac2f140496d00d8107579c8 --  environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks -- mpiexec -np 2 IMB-MPI1 PingPong
> --8<---------------cut here---------------end--------------->8---
> 
> Does that ring a bell?
> 
> Thanks,
> Ludo’.
> 
> ¹ https://git.savannah.gnu.org/cgit/guix.git/commit/?id=c2739c0801ebc5461564e862ce8f08405e2782dc
> 

I thought I built everything that depended on rdma-core, and
unfortunately I don't have a way to test it. As an actual user of the
package I trust you to revert the change if necessary.

I don't see anything on the linux-rdma mailing list pointing to this,
or any other bugs really.
http://vger.kernel.org/vger-lists.html#linux-rdma

-- 
Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
