From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Bavier
Subject: fftw runtime cpu detection
Date: Thu, 5 Apr 2018 17:13:29 -0500
Message-ID: <20180405221329.GT105827@pe06.us.cray.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="ZPt4rx8FFjLCG7dd"
Content-Disposition: inline
List-Id: "Development of GNU Guix and the GNU System distribution."
Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org
Sender: "Guix-devel"
To: guix-devel@gnu.org

--ZPt4rx8FFjLCG7dd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hello Guix,

I recently discovered that the FFTW library can do runtime CPU
detection. In order to do this, the package needs to be configured to
build SIMD "codelets", like our 'fftw-avx' currently does. Then, based
on the instruction support detected at runtime, the library makes those
kernels available to the FFTW "planner" for execution.

I tested this on two systems: 1) a system with sse2, and 2) a system
with avx2.
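For reference, a minimal sketch of how one might check which of these
SIMD extensions a Linux/x86 machine advertises before comparing plans.
It was not part of the test setup described here; the helper name
`simd_flags` is hypothetical, and it assumes the kernel exposes a
"flags" line in /proc/cpuinfo (other architectures use different field
names, in which case it prints nothing).

```shell
#!/bin/sh
# Hypothetical helper: print, one per line, which of the SIMD extensions
# of interest appear in a space-separated CPU flags string.
# Usage: simd_flags "fpu sse2 avx ..."
simd_flags() {
    for want in sse2 avx avx2 avx512f; do
        # Pad with spaces so that "avx" does not match inside "avx2".
        case " $1 " in
            *" $want "*) printf '%s\n' "$want" ;;
        esac
    done
}

# On Linux/x86, take the first "flags" line of /proc/cpuinfo (assumption:
# all cores report the same flags).
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1)
    simd_flags "$cpu_flags"
fi
```

This only reports what the CPU advertises; which codelets the planner
actually picks is visible in the bench output.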
I configured the library with "--enable-sse2 --enable-avx
--enable-avx2", then ran the following on both systems:

1)
$ ./tests/bench --verbose=3 --verify 'ibcd11x7x6v10'
Planning ibcd11x7x6v10...
using plan_many_dft
estimate-planner time: 0.004355 s
using plan_many_dft
planner time: 0.035684 s
(dft-rank>=2/1
  (dft-vrank>=1-x11/1
    (dft-rank>=2/1
      (dft-vrank>=1-x7/1
        (dft-direct-6-x10 "n1bv_6_sse2"))
      (dft-direct-7-x60 "n1bv_7_sse2")))
  (dft-direct-11-x420 "n1bv_11_sse2"))
flops: 36800 add, 9700 mul, 26260 fma
estimated cost: 99057.699080, pcost = 115706.000000
ibcd11x7x6v10 4.33362e-16 7.27264e-16 8.46842e-16

2)
$ ./tests/bench --verbose=3 --verify 'ibcd11x7x6v10'
Planning ibcd11x7x6v10...
using plan_many_dft
estimate-planner time: 0.001485 s
using plan_many_dft
planner time: 0.025788 s
(dft-rank>=2/1
  (dft-rank>=2/1
    (dft-vrank>=1-x77/1
      (dft-direct-6-x10 "n1bv_6_sse2"))
    (dft-vrank>=1-x11/1
      (dft-direct-7-x60 "n1bv_7_avx")))
  (dft-direct-11-x420 "n1bv_11_avx"))
flops: 12280 add, 2810 mul, 6950 fma
estimated cost: 28996.283180, pcost = 40767.000000
ibcd11x7x6v10 2.24601e-07 3.90447e-07 2.42548e-07

The attached patch is a WIP.

-- 
Eric Bavier, Scientific Libraries, Cray Inc.

--ZPt4rx8FFjLCG7dd
Content-Type: text/x-patch; charset=us-ascii
Content-Disposition: attachment; filename="guix-fftw-codelets.patch"

diff --git a/gnu/packages/algebra.scm b/gnu/packages/algebra.scm
index 2aa1777db..96c78ea81 100644
--- a/gnu/packages/algebra.scm
+++ b/gnu/packages/algebra.scm
@@ -533,17 +533,26 @@ a C program.")
     (build-system gnu-build-system)
     (arguments
      '(#:configure-flags
-       '("--enable-shared" "--enable-openmp" "--enable-threads")
-       #:phases (alist-cons-before
-                 'build 'no-native
-                 (lambda _
-                   ;; By default '-mtune=native' is used. However, that may
-                   ;; cause the use of ISA extensions (SSE2, etc.) that are
-                   ;; not necessarily available on the user's machine when
-                   ;; that package is built on a different machine.
-                   (substitute* (find-files "." "Makefile$")
-                     (("-mtune=native") "")))
-                 %standard-phases)))
+       `("--enable-shared" "--enable-openmp" "--enable-threads"
+         ,@(let ((system (or (%current-target-system) (%current-system))))
+             (cond
+               ((or (string-prefix? "x86_64" system)
+                    (string-prefix? "i686" system))
+                ;; Enable AVX & co. for codelets. See details at:
+                ;; .
+                '("--enable-avx" "--enable-avx2"
+                  "--enable-avx512" "--enable-avx-128-fma"))
+               ((string-prefix? "aarch64" system)
+                '("--enable-neon" "--enable-armv8-cntvct-el0"))
+               ((string-prefix? "armv7" system)
+                '("--enable-neon" "--enable-armv7a-cntvct"))
+               ((string-prefix? "mips" system)
+                '("--enable-mips-zbus-timer"))))
+         ;; By default '-mtune=native' is used. However, that may cause the
+         ;; use of ISA extensions (e.g. AVX) that are not necessarily
+         ;; available on the user's machine when that package is built on a
+         ;; different machine.
+         "ax_cv_c_flags__mtune_native=no")))
     (native-inputs `(("perl" ,perl)))
     (home-page "http://fftw.org")
     (synopsis "Computing the discrete Fourier transform")
@@ -560,7 +569,7 @@ cosine/ sine transforms or DCT/DST).")
     (arguments
      (substitute-keyword-arguments (package-arguments fftw)
        ((#:configure-flags cf)
-        `(cons "--enable-float" ,cf))))
+        `(cons* "--enable-float" "--enable-sse" ,cf))))
     (description
      (string-append (package-description fftw)
                     " Single-precision version."))))
@@ -592,29 +601,6 @@ cosine/ sine transforms or DCT/DST).")
         (base32
          "0wsms8narnbhfsa8chdflv2j9hzspvflblnqdn7hw8x5xdzrnq1v"))))))
 
-(define-public fftw-avx
-  (package
-    (inherit fftw-3.3.7)
-    (name "fftw-avx")
-    (arguments
-     (substitute-keyword-arguments (package-arguments fftw-3.3.7)
-       ((#:configure-flags flags ''())
-        ;; Enable AVX & co. See details at:
-        ;; .
-        `(append '("--enable-avx" "--enable-avx2" "--enable-avx512"
-                   "--enable-avx-128-fma")
-                 ,flags))
-       ((#:substitutable? _ #f)
-        ;; To run the tests, we must have a CPU that supports all these
-        ;; extensions. Since we cannot be sure that machines in the build
-        ;; farm support them, disable substitutes altogether.
-        #f)
-       ((#:phases _)
-        ;; Since we're not providing binaries, let '-mtune=native' through.
-        '%standard-phases)))
-    (synopsis "Computing the discrete Fourier transform (AVX2-optimized)")
-    (supported-systems '("x86_64-linux"))))
-
 (define-public java-la4j
   (package
     (name "java-la4j")

--ZPt4rx8FFjLCG7dd--