Hello Guix,

I recently discovered that the FFTW library can do runtime CPU
detection.  To enable this, the package needs to be configured to build
the SIMD "codelets", as our 'fftw-avx' package currently does.  Then,
based on the instruction support detected at runtime, the library makes
those codelets available to the fftw "planner" for execution.

I tested this on two systems: 1) a system with sse2, and 2) a system
with avx2.  I configured the library with "--enable-sse2 --enable-avx
--enable-avx2", then ran the following on both systems:

1)
  $ ./tests/bench --verbose=3 --verify 'ibcd11x7x6v10'
  Planning ibcd11x7x6v10...
  using plan_many_dft
  estimate-planner time: 0.004355 s
  using plan_many_dft
  planner time: 0.035684 s
  (dft-rank>=2/1
    (dft-vrank>=1-x11/1
      (dft-rank>=2/1
        (dft-vrank>=1-x7/1
          (dft-direct-6-x10 "n1bv_6_sse2"))
        (dft-direct-7-x60 "n1bv_7_sse2")))
    (dft-direct-11-x420 "n1bv_11_sse2"))
  flops: 36800 add, 9700 mul, 26260 fma
  estimated cost: 99057.699080, pcost = 115706.000000
  ibcd11x7x6v10 4.33362e-16 7.27264e-16 8.46842e-16

2)
  $ ./tests/bench --verbose=3 --verify 'ibcd11x7x6v10'
  Planning ibcd11x7x6v10...
  using plan_many_dft
  estimate-planner time: 0.001485 s
  using plan_many_dft
  planner time: 0.025788 s
  (dft-rank>=2/1
    (dft-rank>=2/1
      (dft-vrank>=1-x77/1
        (dft-direct-6-x10 "n1bv_6_sse2"))
      (dft-vrank>=1-x11/1
        (dft-direct-7-x60 "n1bv_7_avx")))
    (dft-direct-11-x420 "n1bv_11_avx"))
  flops: 12280 add, 2810 mul, 6950 fma
  estimated cost: 28996.283180, pcost = 40767.000000
  ibcd11x7x6v10 2.24601e-07 3.90447e-07 2.42548e-07

The attached patch is a WIP.

-- 
Eric Bavier, Scientific Libraries, Cray Inc.