From: Dave Love
Subject: Re: Optionally using more advanced CPU features
Date: Fri, 01 Sep 2017 11:46:16 +0100
Message-ID: <87inh2zog7.fsf@albion.it.manchester.ac.uk>
References: <87inhhw1ms.fsf@elephly.net> <87a82s9cw3.fsf@gnu.org> <87wp5ufkqs.fsf@albion.it.manchester.ac.uk> <87k21nerqa.fsf@inria.fr>
In-Reply-To: <87k21nerqa.fsf@inria.fr> ("Ludovic Courtès"'s message of "Mon, 28 Aug 2017 15:48:00 +0200")
List-Id: "Development of GNU Guix and the GNU System distribution."
To: Ludovic Courtès
Cc: guix-devel

Ludovic Courtès writes:

>> That may be the best way to handle it, but it's not widely available,
>> and isn't possible generally (as far as I know), e.g. for Fortran code.
>> See also below.  This issue surfaced again recently in Fedora.
>
> Right.  Do you have examples of Fortran packages in mind?

Not much off-hand because, shall we say, there's a shortage of the sort
of profiling information that's necessary for system performance
engineering and procurement.

It's not in Guix, but cp2k is a (mainly) Fortran program that is, or
was, used as a performance regression test for GCC.  I only know about
its profile for cases where time in MPI or fftw is most relevant.
However, two of its kernels, ELPA and libsmm (as libxsmm), have
low-level optimized versions for x86_64, but only Fortran
implementations for other architectures as far as I know.  Otherwise,
BLAS/LAPACK for any micro-architectures that don't have support in free
optimized variants like OpenBLAS.

>> In cases that don't dispatch on cpuid (or whatever), I think the
>> relevant missing OS/tool support is SIMD-specific hwcaps in the loader.
>> Hwcaps seem to be essentially undocumented, but there is, or has been,
>> support for instruction set capabilities on some architectures, just not
>> x86_64 apparently.  (An ancient example was for missing instructions on
>> some SPARC systems which greatly affected crypto operations in ssh et
>> al.)
>
> But that sounds similar to IFUNC in that application code would need to
> actually use hwcap info to select the right implementation at load time,
> right?

As far as I know, it's a loader feature.  See "Hardware capabilities" in
ld.so(8).

>>> There’s probably scientific software out there that can benefit from
>>> using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
>>> the tricks above.  When we find such a piece of software, I think we
>>> should investigate and (1) see whether it actually benefits from those
>>> ISA extensions, and (2) see whether it would be feasible to just use
>>> ‘target_clones’ or similar on the hot spots.
>>
>> One example which has been investigated, and you can't, is BLIS.  You
>
> (Why “you can’t?”  It’s free software AFAICS on
> .)

Well, you could embark on some sort of (GCC-specific?) re-write, but it
would be better to work on .
I don't think there's anywhere you can just attach GCC attributes, and
certainly no magic will happen for currently-unsupported architectures.

>> need it for vaguely competitive avx512 linear algebra.  (OpenBLAS is
>> basically fine for previous Intel and AMD SIMD.)  See, e.g.,
>>
>> et seq.  I don't know if there's any good reason to, but if you want
>> ATLAS you have the same issue -- along with extra issues building it.
>
> ATLAS is a problem because it does build-time ISA selection (and maybe
> profile-guided optimization?).

Yes, that's what I meant.  (I can't remember to what extent you can just
specify the architecture and build it without the parameter sweep.)

> I sympathize with the idea of having several ABI-compatible BLAS
> implementations for the reasons you give.  That somewhat conflicts with
> the idea of reproducibility, but after all we can have our cake and eat
> it too: the user can decide to have LD_LIBRARY_PATH point to an
> alternate ABI-compatible BLAS, or they can keep using the one that
> appears in RUNPATH.
>
> Thoughts?

Right, about the cake -- as with other packaging systems -- and
LD_LIBRARY_PATH/LD_PRELOAD are important for debugging and measurement
anyway.  [I know too much about computing and experimental science to
believe in reproducibility as it's normally talked about, though
facilities for reproducible builds and environment components are good.]
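
A minimal sketch of the LD_LIBRARY_PATH approach discussed above, in
Python, for running the same program against an alternate
ABI-compatible BLAS when measuring or debugging.  The helper names and
the /opt/blis/lib directory are hypothetical and only for illustration.
It assumes the glibc loader, which searches LD_LIBRARY_PATH before a
binary's RUNPATH; a binary that carries the older DT_RPATH instead would
not be overridden this way.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def env_with_library_dir(
    directory: str, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with `directory` searched first.

    `base` defaults to the current process environment; it is never
    modified in place.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Avoid a trailing ":" because the loader treats an empty entry as
    # the current working directory.
    env["LD_LIBRARY_PATH"] = directory + (":" + existing if existing else "")
    return env


def run_with_library_dir(command: List[str], directory: str) -> int:
    """Run `command` with `directory` prepended to LD_LIBRARY_PATH.

    Returns the exit status of the command.
    """
    result = subprocess.run(command, env=env_with_library_dir(directory))
    return result.returncode
```

For example, run_with_library_dir(["./bench"], "/opt/blis/lib") would
start a hypothetical ./bench with that directory searched first, while
running ./bench unchanged keeps whatever library RUNPATH points to.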