From: Dave Love
Subject: Re: OpenBLAS and performance
Date: Fri, 22 Dec 2017 14:35:22 +0000
Message-ID: <87h8si25lh.fsf@albion.it.manchester.ac.uk>
References: <20171219104956.GB806@thebird.nl> <87tvwl7h4w.fsf@albion.it.manchester.ac.uk> <87h8sl78vp.fsf@albion.it.manchester.ac.uk> <20171220172215.GA7926@thebird.nl> <87d139xo3v.fsf@elephly.net> <20171220192802.GA8426@thebird.nl>
To: guix-devel@gnu.org

For what it's worth, I get 37000 Mflops from the dgemm.goto benchmark using the current Guix openblas and OPENBLAS_NUM_THREADS=1 at a size of 7000 on a laptop with "i5-6200U CPU @ 2.30GHz" (avx2). That looks about right, and it should more-or-less plateau at that size. For comparison, I get 44000 on a cluster node "E5-2690 v3 @ 2.60GHz" with its serial build of 0.2.19. (I mis-remembered the sandybridge figures, which should be low 20s, not high 20s.)
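As a rough sanity check on figures like these, the sketch below turns a DGEMM timing into Mflops and compares it with a single-core peak estimate. It is not taken from the benchmark source. The function names are made up for illustration, the 2*n^3 flop count is the conventional one for a square matrix multiply, and the default of 16 double-precision flops per cycle is an assumption (two 256-bit FMA units, 4 doubles each, 2 flops per FMA) that should hold for AVX2 cores of this generation but is worth checking for other hardware.

```python
def dgemm_mflops(n: int, seconds: float) -> float:
    """Mflops for one n x n x n DGEMM, using the conventional 2*n**3 flop count."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * n**3 / seconds / 1e6


def peak_mflops(clock_ghz: float, flops_per_cycle: int = 16) -> float:
    """Single-core peak estimate in Mflops.

    flops_per_cycle=16 is an assumption: 2 FMA units x 4 doubles x 2 flops.
    """
    return clock_ghz * 1e3 * flops_per_cycle


if __name__ == "__main__":
    # Figures quoted in the message; clocks are the nominal ones from the CPU names.
    for name, ghz, measured in [
        ("i5-6200U", 2.3, 37000.0),
        ("E5-2690 v3", 2.6, 44000.0),
    ]:
        peak = peak_mflops(ghz)
        print(f"{name}: measured {measured:.0f} Mflops, "
              f"nominal-clock peak {peak:.0f} Mflops, ratio {measured / peak:.2f}")
```

A ratio near or slightly above 1 against the nominal clock would be consistent with the core running above its base frequency during the benchmark, though that is an inference, not something measured here.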
If you see something much different, perhaps the performance counters give a clue, e.g. with Guix' scorep/cube, oprofile, or perf. I've sent a patch for the correct cache size on haswell, but I don't think it makes much difference in this case.