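The performance penalty for MPI_THREAD_MULTIPLE is supposed to be mitigated in the most recent Open MPI releases, but not in this version. Most applications are happy with MPI_THREAD_FUNNELED, where the process may be multithreaded but only the thread that initialized MPI makes MPI calls.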