From: Dave Love
Subject: Re: OpenBLAS and performance
Date: Thu, 21 Dec 2017 16:17:52 +0000
Message-ID: <87h8sk3vin.fsf@albion.it.manchester.ac.uk>
References: <20171219104956.GB806@thebird.nl> <87tvwl7h4w.fsf@albion.it.manchester.ac.uk> <87h8sl78vp.fsf@albion.it.manchester.ac.uk> <20171220172215.GA7926@thebird.nl> <87d139xo3v.fsf@elephly.net>
In-Reply-To: <87d139xo3v.fsf@elephly.net> (Ricardo Wurmus's message of "Wed, 20 Dec 2017 19:15:16 +0100")
List-Id: "Development of GNU Guix and the GNU System distribution."
To: Ricardo Wurmus
Cc: guix-devel@gnu.org

Ricardo Wurmus writes:

> Hi Pjotr,
>
>> I was just stating that the default openblas package does not perform
>> well (it is single threaded, for one).
>
> Is it really single-threaded?  I remember having a couple of problems
> with OpenBLAS on our cluster when it is used with Numpy as both would
> spawn lots of threads.  The solution was to limit OpenBLAS to at most
> two threads.

Yes, it's symlinked from the libopenblasp variant, which is linked
against libpthread, and I'd expect such problems.
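
For what it's worth, the two-thread limit Ricardo mentions can be sketched roughly as below.  This assumes the cap is applied through the OPENBLAS_NUM_THREADS environment variable, which OpenBLAS reads when the library is loaded; the helper name limit_openblas_threads is made up for illustration, and the thread does not say how the limit was actually applied on that cluster.

```python
import os


def limit_openblas_threads(max_threads=2):
    """Cap the OpenBLAS thread pool via its environment variable.

    Hypothetical helper: it must run before anything loads OpenBLAS
    (for example before importing numpy), because the variable is read
    at library load time.
    """
    os.environ["OPENBLAS_NUM_THREADS"] = str(max_threads)
    return os.environ["OPENBLAS_NUM_THREADS"]


# Assumed value, taken from the "at most two threads" remark quoted above.
limit_openblas_threads(2)

# Only import numpy (or any other OpenBLAS user) after this point.
```

Setting the variable in the job script or shell before starting Python should have the same effect and avoids the import-order dependency.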

Anyhow, there's something badly wrong if it doesn't perform roughly
equivalently to MKL on SIMD other than AVX512.  If I recall correctly,
the single-threaded DGEMM performance per core for HPC-type Sandy Bridge
is in the high 20s of GFLOPS, and roughly double that for AVX2
({Has,Broad}well).  I don't think the bad L2 cache value that is
currently used for Haswell has much effect in that case, but it does in
other benchmarks.  I'll supply a patch for that.

Another point about the OpenBLAS package is that it excludes LAPACK for
some reason that doesn't seem to be recorded.  I think LAPACK should be
included, partly for convenience, and partly because OpenBLAS optimizes
some of LAPACK.
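
As a rough sanity check on the DGEMM figures recalled earlier, here is a back-of-the-envelope peak calculation.  It assumes 256-bit vectors on both microarchitectures, with Sandy Bridge issuing one add and one multiply per cycle and Haswell/Broadwell issuing two FMAs per cycle.  The 3.5 GHz clock is purely an assumption for illustration (real sustained clocks vary by part and load), and the function name peak_dp_gflops is made up.

```python
def peak_dp_gflops(clock_ghz, vector_bits, fp_units, fma):
    """Theoretical double-precision peak per core, in GFLOPS."""
    lanes = vector_bits // 64           # doubles per vector register
    flops_per_op = 2 if fma else 1      # an FMA counts as multiply + add
    return clock_ghz * lanes * fp_units * flops_per_op


ASSUMED_CLOCK_GHZ = 3.5  # illustrative assumption, not a measured value

# Sandy Bridge (AVX): one add unit plus one multiply unit, no FMA.
sandybridge = peak_dp_gflops(ASSUMED_CLOCK_GHZ, 256, 2, fma=False)

# Haswell/Broadwell (AVX2): two FMA units.
haswell = peak_dp_gflops(ASSUMED_CLOCK_GHZ, 256, 2, fma=True)

print(f"Sandy Bridge peak: {sandybridge:.1f} GFLOPS/core")
print(f"Haswell peak:      {haswell:.1f} GFLOPS/core")
```

A well-tuned DGEMM normally gets close to this peak, so a result far below it on either microarchitecture would point at the build or the kernel selection rather than the hardware.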