From: Dave Love
Subject: Re: Optionally using more advanced CPU features
Date: Wed, 23 Aug 2017 14:59:23 +0100
Message-ID: <87wp5ufkqs.fsf@albion.it.manchester.ac.uk>
References: <87inhhw1ms.fsf@elephly.net> <87a82s9cw3.fsf@gnu.org>
In-Reply-To: <87a82s9cw3.fsf@gnu.org> (Ludovic Courtès's message of "Tue, 22 Aug 2017 11:21:00 +0200")
List-Id: "Development of GNU Guix and the GNU System distribution."
To: Ludovic Courtès
Cc: guix-devel

ludovic.courtes@inria.fr (Ludovic Courtès) writes:

> Hi,
>
> Ricardo Wurmus writes:
>
>> I was wondering how we should go about optionally building software for
>> more advanced CPU features.  Currently, we build software for the lowest
>> common feature set among x86_64 CPUs.  That’s good for portability, but
>> not so good for performance.
>>
>> Enabling CPU features often happens through configure flags, but
>> expressing support at that level in our package definitions seems bad.
>> How can we make it possible for users to build their software for
>> different CPUs?
>
> To some extent, I think this is a compiler/OS/upstream issue.  By that I
> mean that the best way to achieve use of extra CPU features is by using
> the “IFUNC” feature of GNU ld.so, which is what libc does (it has
> variants of strcmp etc. tweaked for various CPU extensions like SSE, and
> the right one gets picked up at load time.)  Software like GMP, Nettle,
> or MPlayer also does this kind of selection at run time, but using
> custom mechanisms.

That may be the best way to handle it, but it's not widely available,
and isn't possible generally (as far as I know), e.g. for Fortran code.
See also below.  This issue surfaced again recently in Fedora.

In cases that don't dispatch on cpuid (or whatever), I think the
relevant missing OS/tool support is SIMD-specific hwcaps in the loader.
Hwcaps seem to be essentially undocumented, but there is, or has been,
support for instruction set capabilities on some architectures, just
not x86_64 apparently.  (An ancient example was for missing
instructions on some SPARC systems which greatly affected crypto
operations in ssh et al.)

>> We can cross-compile for other architectures on the command line with
>> “--target” and “--system”; can we allow for compilation with special CPU
>> features across the graph with “--features”?  Build system abstractions
>> or package definitions would then be changed to recognize these features
>> and modify the corresponding flags as needed.
>
> I’ve considered this, but designing this would be tricky, and not quite
> right IMO.
>
> There’s probably scientific software out there that can benefit from
> using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
> the tricks above.
> When we find such a piece of software, I think we
> should investigate and (1) see whether it actually benefits from those
> ISA extensions, and (2) see whether it would be feasible to just use
> ‘target_clones’ or similar on the hot spots.

One example which has been investigated, and where you can't do that,
is BLIS.  You need it for vaguely competitive avx512 linear algebra.
(OpenBLAS is basically fine for previous Intel and AMD SIMD.)  See,
e.g., et seq.  I don't know if there's any good reason to, but if you
want ATLAS you have the same issue -- along with extra issues building
it.

Related, I argue, as on the Fedora list, that libraries like BLAS (and
LAPACK) should be handled the way they are in Debian, with shared
libraries built compatibly with the reference BLAS.  They should be
selectable at run time, typically according to compute node type by
flipping the ld.so search path; you should be able to substitute BLIS
or a GPU implementation for OpenBLAS.  That likely applies in other
cases, but I'm most familiar with the linear algebra ones.

[By the way, you do have to be careful with ISA-specific libraries on
heterogeneous systems if you use checkpoint-restart, as you probably
should on an HPC cluster -- you need to restart on compatible hardware.]
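
To make the run-time selection idea above concrete, here is a minimal
sketch of choosing a BLAS implementation per compute node type by
putting its directory first on the ld.so search path.  It is an
illustration under stated assumptions, not how Debian or Guix does it:
the directory layout, the node-type names, and the helper functions
are all hypothetical.  Only the LD_LIBRARY_PATH environment variable
is a real ld.so feature, and the sketch assumes every directory holds
a shared library built compatibly with the reference BLAS.

```python
import os

# Hypothetical layout: one directory per implementation, each holding a
# shared BLAS library with the same soname as the reference BLAS.
BLAS_DIRS = {
    "openblas": "/opt/blas/openblas",
    "blis": "/opt/blas/blis",
}

# Hypothetical mapping from compute node type to preferred implementation.
NODE_PREFERENCE = {
    "avx512": "blis",
    "avx2": "openblas",
}


def blas_search_path(node_type, current=""):
    """Return a search path with the chosen BLAS directory first.

    Unknown node types fall back to OpenBLAS.  Existing entries in
    `current` are kept, minus empty ones and any copy of the chosen one.
    """
    impl = NODE_PREFERENCE.get(node_type, "openblas")
    chosen = BLAS_DIRS[impl]
    rest = [p for p in current.split(":") if p and p != chosen]
    return ":".join([chosen] + rest)


def environment_for(node_type, environ=None):
    """Return a copy of the environment with LD_LIBRARY_PATH adjusted."""
    env = dict(os.environ if environ is None else environ)
    env["LD_LIBRARY_PATH"] = blas_search_path(
        node_type, env.get("LD_LIBRARY_PATH", "")
    )
    return env
```

A job launcher could pass environment_for(node_type) as the env
argument of subprocess.run, so the same binary picks up BLIS on one
node type and OpenBLAS on another without being rebuilt.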