From: ludovic.courtes@inria.fr (Ludovic Courtès)
Subject: Re: Optionally using more advanced CPU features
Date: Mon, 28 Aug 2017 15:48:00 +0200
Message-ID: <87k21nerqa.fsf@inria.fr>
References: <87inhhw1ms.fsf@elephly.net> <87a82s9cw3.fsf@gnu.org> <87wp5ufkqs.fsf@albion.it.manchester.ac.uk>
In-Reply-To: <87wp5ufkqs.fsf@albion.it.manchester.ac.uk> (Dave Love's message of "Wed, 23 Aug 2017 14:59:23 +0100")
To: Dave Love
Cc: guix-devel

Hi Dave,

Dave Love writes:

> ludovic.courtes@inria.fr (Ludovic Courtès) writes:

[...]

>> To some extent, I think this is a compiler/OS/upstream issue.  By that I
>> mean that the best way to achieve use of extra CPU features is by using
>> the “IFUNC” feature of GNU ld.so, which is what libc does (it has
>> variants of strcmp etc. tweaked for various CPU extensions like SSE, and
>> the right one gets picked up at load time.)  Software like GMP, Nettle,
>> or MPlayer also does this kind of selection at run time, but using
>> custom mechanisms.
>
> That may be the best way to handle it, but it's not widely available,
> and isn't possible generally (as far as I know), e.g. for Fortran code.
> See also below.  This issue surfaced again recently in Fedora.

Right.
Do you have examples of Fortran packages in mind?

> In cases that don't dispatch on cpuid (or whatever), I think the
> relevant missing OS/tool support is SIMD-specific hwcaps in the loader.
> Hwcaps seem to be essentially undocumented, but there is, or has been,
> support for instruction set capabilities on some architectures, just not
> x86_64 apparently.  (An ancient example was for missing instructions on
> some SPARC systems which greatly affected crypto operations in ssh et
> al.)

But that sounds similar to IFUNC in that application code would need to
actually use hwcap info to select the right implementation at load time,
right?

>> There’s probably scientific software out there that can benefit from
>> using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
>> the tricks above.  When we find such a piece of software, I think we
>> should investigate and (1) see whether it actually benefits from those
>> ISA extensions, and (2) see whether it would be feasible to just use
>> ‘target_clones’ or similar on the hot spots.
>
> One example which has been investigated, and you can't, is BLIS.  You

(Why “you can’t?”  It’s free software AFAICS on .)

> need it for vaguely competitive avx512 linear algebra.  (OpenBLAS is
> basically fine for previous Intel and AMD SIMD.)  See, e.g.,
>
> et seq.  I don't know if there's any good reason to, but if you want
> ATLAS you have the same issue -- along with extra issues building it.

ATLAS is a problem because it does build-time ISA selection (and maybe
profile-guided optimization?).

> Related, I argue, as on the Fedora list, that like BLAS (and LAPACK)
> should handled the way they are in Debian, with shared libraries built
> compatibly with the reference BLAS.
> They should be selectable at run
> time, typically according to compute node type by flipping the ld.so
> search path; you should be able to substitute BLIS or a GPU
> implementation for OpenBLAS.  That likely applies in other cases, but
> I'm most familiar with the linear algebra ones.

I sympathize with the idea of having several ABI-compatible BLAS
implementations for the reasons you give.  That somewhat conflicts with
the idea of reproducibility, but after all we can have our cake and eat
it too: the user can decide to have LD_LIBRARY_PATH point to an
alternate ABI-compatible BLAS, or they can keep using the one that
appears in RUNPATH.

Thoughts?

Ludo’.
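
PS: To make the LD_LIBRARY_PATH vs. RUNPATH point concrete, here is a
rough Python model of the lookup order.  It is only a sketch: it assumes
the usual GNU ld.so behavior where LD_LIBRARY_PATH entries are searched
before DT_RUNPATH, and it ignores DT_RPATH, the ld.so cache, hwcaps
subdirectories, and setuid restrictions.  The function names, the
directory names, and the "libblas.so.3" soname are made up for
illustration; nothing here is taken from an actual package.

```python
import os
import tempfile


def resolve(soname, ld_library_path, runpath, default_dirs=()):
    """Return the first existing file named SONAME, or None.

    Simplified model of the loader's search order (an assumption, see
    above): LD_LIBRARY_PATH first, then DT_RUNPATH, then default dirs.
    Empty LD_LIBRARY_PATH elements are dropped here, whereas the real
    loader treats them as the current directory.
    """
    search = [d for d in ld_library_path.split(":") if d]
    search += list(runpath)
    search += list(default_dirs)
    for directory in search:
        candidate = os.path.join(directory, soname)
        if os.path.exists(candidate):
            return candidate
    return None


def demo():
    """Build two fake library directories and resolve against them."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical directories: one recorded in RUNPATH, one that
        # the user opts into through LD_LIBRARY_PATH.
        store = os.path.join(root, "store-openblas")
        alt = os.path.join(root, "alt-blis")
        for directory in (store, alt):
            os.makedirs(directory)
            # An empty file stands in for an ABI-compatible library.
            open(os.path.join(directory, "libblas.so.3"), "w").close()

        # No override: the RUNPATH entry wins (the reproducible default).
        default = resolve("libblas.so.3", "", [store])
        # With an override: the LD_LIBRARY_PATH entry is found first.
        overridden = resolve("libblas.so.3", alt, [store])

        def parent(path):
            return os.path.basename(os.path.dirname(path))

        return parent(default), parent(overridden)


if __name__ == "__main__":
    print(demo())
```

With no override, the lookup lands in the RUNPATH directory; with the
variable set, it lands in the alternate directory.  That is the
opt-in behavior described above: the default stays reproducible, and
swapping the BLAS is an explicit user decision.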