unofficial mirror of guix-devel@gnu.org 
* Optionally using more advanced CPU features
@ 2017-08-21 12:23 Ricardo Wurmus
  2017-08-22  9:21 ` Ludovic Courtès
  2017-08-26  3:39 ` Optionally using more advanced CPU features Ben Woodcroft
  0 siblings, 2 replies; 12+ messages in thread
From: Ricardo Wurmus @ 2017-08-21 12:23 UTC (permalink / raw)
  To: guix-devel

Hi Guix,

I was wondering how we should go about optionally building software for
more advanced CPU features.  Currently, we build software for the lowest
common feature set among x86_64 CPUs.  That’s good for portability, but
not so good for performance.

Enabling CPU features often happens through configure flags, but
expressing support at that level in our package definitions seems bad.
How can we make it possible for users to build their software for
different CPUs?

We can cross-compile for other architectures on the command line with
“--target” and “--system”; can we allow for compilation with special CPU
features across the graph with “--features”?  Build system abstractions
or package definitions would then be changed to recognize these features
and modify the corresponding flags as needed.

If we had a larger build farm we could also offer substitutes for more
modern CPUs.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net


* Re: Optionally using more advanced CPU features
  2017-08-21 12:23 Optionally using more advanced CPU features Ricardo Wurmus
@ 2017-08-22  9:21 ` Ludovic Courtès
  2017-08-23 13:59   ` Dave Love
  2017-08-26  3:39 ` Optionally using more advanced CPU features Ben Woodcroft
  1 sibling, 1 reply; 12+ messages in thread
From: Ludovic Courtès @ 2017-08-22  9:21 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Ricardo Wurmus <rekado@elephly.net> skribis:

> I was wondering how we should go about optionally building software for
> more advanced CPU features.  Currently, we build software for the lowest
> common feature set among x86_64 CPUs.  That’s good for portability, but
> not so good for performance.
>
> Enabling CPU features often happens through configure flags, but
> expressing support at that level in our package definitions seems bad.
> How can we make it possible for users to build their software for
> different CPUs?

To some extent, I think this is a compiler/OS/upstream issue.  By that I
mean that the best way to achieve use of extra CPU features is by using
the “IFUNC” feature of GNU ld.so, which is what libc does (it has
variants of strcmp etc. tweaked for various CPU extensions like SSE, and
the right one gets picked up at load time.)  Software like GMP, Nettle,
or MPlayer also does this kind of selection at run time, but using
custom mechanisms.

GCC now has a ‘target_clones’ function attribute, which instructs it to
generate several variants of a function and use IFUNC to pick up the
right one (info "(gcc) Common Function Attributes").  Ideally, upstream
would use this.

When upstream does that, we have portable-yet-efficient “fat” binaries,
and there’s nothing to do on our side.  :-)
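
A minimal sketch for checking whether a shared library already exports IFUNC-dispatched symbols, assuming binutils' readelf and a POSIX awk are available. The helper name list_ifunc_symbols is made up for illustration; it only filters the symbol listing that readelf prints.

```shell
#!/bin/sh
# Sketch only: list_ifunc_symbols is an illustrative helper, not an existing tool.
# It reads the output of `readelf --dyn-syms --wide` on stdin and prints the
# names of symbols whose type column is IFUNC.
list_ifunc_symbols() {
    # Columns: Num Value Size Type Bind Vis Ndx Name
    awk '$4 == "IFUNC" { sub(/@.*/, "", $8); print $8 }'
}

# Usage (the library path is a placeholder):
#   readelf --dyn-syms --wide /path/to/libfoo.so | list_ifunc_symbols
```

With glibc this should list functions such as strcmp. An empty result suggests the library does no IFUNC-based selection, although it may still dispatch through a custom mechanism.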

> We can cross-compile for other architectures on the command line with
> “--target” and “--system”; can we allow for compilation with special CPU
> features across the graph with “--features”?  Build system abstractions
> or package definitions would then be changed to recognize these features
> and modify the corresponding flags as needed.

I’ve considered this, but designing this would be tricky, and not quite
right IMO.

There’s probably scientific software out there that can benefit from
using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
the tricks above.  When we find such a piece of software, I think we
should investigate and (1) see whether it actually benefits from those
ISA extensions, and (2) see whether it would be feasible to just use
‘target_clones’ or similar on the hot spots.

If it turns out that this approach doesn’t scale or isn’t suitable, then
we can think more about what you suggest.  But before starting such an
endeavor, I would really like to get a better understanding of the
software we’re talking about and the options that we have.

WDYT?

Thanks,
Ludo’.


* Re: Optionally using more advanced CPU features
  2017-08-22  9:21 ` Ludovic Courtès
@ 2017-08-23 13:59   ` Dave Love
  2017-08-28 13:48     ` Ludovic Courtès
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Love @ 2017-08-23 13:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

ludovic.courtes@inria.fr (Ludovic Courtès) writes:

> Hi,
>
> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> I was wondering how we should go about optionally building software for
>> more advanced CPU features.  Currently, we build software for the lowest
>> common feature set among x86_64 CPUs.  That’s good for portability, but
>> not so good for performance.
>>
>> Enabling CPU features often happens through configure flags, but
>> expressing support at that level in our package definitions seems bad.
>> How can we make it possible for users to build their software for
>> different CPUs?
>
> To some extent, I think this is a compiler/OS/upstream issue.  By that I
> mean that the best way to achieve use of extra CPU features is by using
> the “IFUNC” feature of GNU ld.so, which is what libc does (it has
> variants of strcmp etc. tweaked for various CPU extensions like SSE, and
> the right one gets picked up at load time.)  Software like GMP, Nettle,
> or MPlayer also does this kind of selection at run time, but using
> custom mechanisms.

That may be the best way to handle it, but it's not widely available,
and isn't possible generally (as far as I know), e.g. for Fortran code.
See also below.  This issue surfaced again recently in Fedora.

In cases that don't dispatch on cpuid (or whatever), I think the
relevant missing OS/tool support is SIMD-specific hwcaps in the loader.
Hwcaps seem to be essentially undocumented, but there is, or has been,
support for instruction set capabilities on some architectures, just not
x86_64 apparently.  (An ancient example was for missing instructions on
some SPARC systems which greatly affected crypto operations in ssh et
al.)

>> We can cross-compile for other architectures on the command line with
>> “--target” and “--system”; can we allow for compilation with special CPU
>> features across the graph with “--features”?  Build system abstractions
>> or package definitions would then be changed to recognize these features
>> and modify the corresponding flags as needed.
>
> I’ve considered this, but designing this would be tricky, and not quite
> right IMO.
>
> There’s probably scientific software out there that can benefit from
> using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
> the tricks above.  When we find such a piece of software, I think we
> should investigate and (1) see whether it actually benefits from those
> ISA extensions, and (2) see whether it would be feasible to just use
> ‘target_clones’ or similar on the hot spots.

One example which has been investigated, and you can't, is BLIS.  You
need it for vaguely competitive avx512 linear algebra.  (OpenBLAS is
basically fine for previous Intel and AMD SIMD.)  See, e.g.,
<https://github.com/xianyi/OpenBLAS/issues/991#issuecomment-273631173>
et seq.  I don't know if there's any good reason to, but if you want
ATLAS you have the same issue -- along with extra issues building it.

Related, I argue, as on the Fedora list, that libraries like BLAS (and LAPACK)
should be handled the way they are in Debian, with shared libraries built
compatibly with the reference BLAS.  They should be selectable at run
time, typically according to compute node type by flipping the ld.so
search path; you should be able to substitute BLIS or a GPU
implementation for OpenBLAS.  That likely applies in other cases, but
I'm most familiar with the linear algebra ones.

[By the way, you do have to be careful with ISA-specific libraries on
heterogeneous systems if you use checkpoint-restart, as you probably
should on an HPC cluster -- you need to restart on compatible hardware.]


* Re: Optionally using more advanced CPU features
  2017-08-21 12:23 Optionally using more advanced CPU features Ricardo Wurmus
  2017-08-22  9:21 ` Ludovic Courtès
@ 2017-08-26  3:39 ` Ben Woodcroft
  2017-08-26  5:14   ` Pjotr Prins
  2017-09-04 14:50   ` Ludovic Courtès
  1 sibling, 2 replies; 12+ messages in thread
From: Ben Woodcroft @ 2017-08-26  3:39 UTC (permalink / raw)
  To: Ricardo Wurmus, guix-devel


Hi,


On 21/08/17 22:23, Ricardo Wurmus wrote:
> Hi Guix,
>
> I was wondering how we should go about optionally building software for
> more advanced CPU features.  Currently, we build software for the lowest
> common feature set among x86_64 CPUs.  That’s good for portability, but
> not so good for performance.
In many cases we can set the --with-arch flag when configuring GCC, so 
that packages built with that GCC are optimised for that architecture by 
default.

We have discussed this in the past, 
(https://lists.gnu.org/archive/html/guix-devel/2016-10/msg00005.html) 
but as you say individual packages sometimes need individual attention.

Anyway, to move forward I created a repo so that package recipes can be 
modified to use a GCC that has been optimised for a particular 
architecture. I put it out there so that it is more than just a patch on 
this ML, but I'd be happy to incorporate it into Guix proper if that is 
desired.
https://github.com/wwood/cpu-specific-guix

For instance, to build DIAMOND optimised for sandybridge:

  GUILE_LOAD_PATH=/path/to/cpu-specific-guix:$GUILE_LOAD_PATH \
    guix build -e '(begin
                     (use-modules (cpu-specific-guix)
                                  (gnu packages bioinformatics))
                     (cpu-specific-package diamond "sandybridge"))'



HTH, ben


* Re: Optionally using more advanced CPU features
  2017-08-26  3:39 ` Optionally using more advanced CPU features Ben Woodcroft
@ 2017-08-26  5:14   ` Pjotr Prins
  2017-09-04 14:50   ` Ludovic Courtès
  1 sibling, 0 replies; 12+ messages in thread
From: Pjotr Prins @ 2017-08-26  5:14 UTC (permalink / raw)
  To: Ben Woodcroft; +Cc: guix-devel

On Sat, Aug 26, 2017 at 11:39:41AM +0800, Ben Woodcroft wrote:
>> I was wondering how we should go about optionally building software for
>> more advanced CPU features.  Currently, we build software for the lowest
>> common feature set among x86_64 CPUs.  That’s good for portability, but
>> not so good for performance.
>
> In many cases we can set the --with-arch flag when configuring GCC, so
> that packages built with that GCC are optimised for that architecture
> by default.
>
> We have discussed this in the past
> (https://lists.gnu.org/archive/html/guix-devel/2016-10/msg00005.html),
> but as you say individual packages sometimes need individual
> attention.
>
> Anyway, to move forward I created a repo so that package recipes can be
> modified to use a GCC that has been optimised for a particular
> architecture. I put it out there so that it is more than just a patch
> on this ML, but I'd be happy to incorporate it into Guix proper if that
> is desired.
> https://github.com/wwood/cpu-specific-guix
>
> For instance, to build DIAMOND optimised for sandybridge:
>
> GUILE_LOAD_PATH=/path/to/cpu-specific-guix:$GUILE_LOAD_PATH \
>   guix build -e '(begin
>                    (use-modules (cpu-specific-guix)
>                                 (gnu packages bioinformatics))
>                    (cpu-specific-package diamond "sandybridge"))'
>
> HTH, ben

Pretty cool. This works for leaf packages. For libraries we'll need
to have something that goes deeper into the graph. OpenBLAS/ATLAS/GSL
are prime examples that would benefit a wide range of applications.

I am working on GEMMA these days and I will target supercomputing
architectures. Having math libraries that target vectorization
optimizations for gcc and LLVM would be very useful. The current
deployment strategy is 'one-offs' on the GUIX_PACKAGE_PATH.

Just as a note, it makes no sense to optimize all Guix packages. In
fact I prefer that we keep the non-optimized builds as the default, since
that is what everyone is using and is (arguably) well tested. Only about
1 in a thousand libraries is worth specializing/optimizing aggressively.

Pj.


* Re: Optionally using more advanced CPU features
  2017-08-23 13:59   ` Dave Love
@ 2017-08-28 13:48     ` Ludovic Courtès
  2017-09-01 10:46       ` Dave Love
  0 siblings, 1 reply; 12+ messages in thread
From: Ludovic Courtès @ 2017-08-28 13:48 UTC (permalink / raw)
  To: Dave Love; +Cc: guix-devel

Hi Dave,

Dave Love <fx@gnu.org> skribis:

> ludovic.courtes@inria.fr (Ludovic Courtès) writes:

[...]

>> To some extent, I think this is a compiler/OS/upstream issue.  By that I
>> mean that the best way to achieve use of extra CPU features is by using
>> the “IFUNC” feature of GNU ld.so, which is what libc does (it has
>> variants of strcmp etc. tweaked for various CPU extensions like SSE, and
>> the right one gets picked up at load time.)  Software like GMP, Nettle,
>> or MPlayer also does this kind of selection at run time, but using
>> custom mechanisms.
>
> That may be the best way to handle it, but it's not widely available,
> and isn't possible generally (as far as I know), e.g. for Fortran code.
> See also below.  This issue surfaced again recently in Fedora.

Right.  Do you have examples of Fortran packages in mind?

> In cases that don't dispatch on cpuid (or whatever), I think the
> relevant missing OS/tool support is SIMD-specific hwcaps in the loader.
> Hwcaps seem to be essentially undocumented, but there is, or has been,
> support for instruction set capabilities on some architectures, just not
> x86_64 apparently.  (An ancient example was for missing instructions on
> some SPARC systems which greatly affected crypto operations in ssh et
> al.)

But that sounds similar to IFUNC in that application code would need to
actually use hwcap info to select the right implementation at load time,
right?

>> There’s probably scientific software out there that can benefit from
>> using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
>> the tricks above.  When we find such a piece of software, I think we
>> should investigate and (1) see whether it actually benefits from those
>> ISA extensions, and (2) see whether it would be feasible to just use
>> ‘target_clones’ or similar on the hot spots.
>
> One example which has been investigated, and you can't, is BLIS.  You

(Why “you can’t?”  It’s free software AFAICS on
<https://github.com/flame/blis/tree/master>.)

> need it for vaguely competitive avx512 linear algebra.  (OpenBLAS is
> basically fine for previous Intel and AMD SIMD.)  See, e.g.,
> <https://github.com/xianyi/OpenBLAS/issues/991#issuecomment-273631173>
> et seq.  I don't know if there's any good reason to, but if you want
> ATLAS you have the same issue -- along with extra issues building it.

ATLAS is a problem because it does build-time ISA selection (and maybe
profile-guided optimization?).

> Related, I argue, as on the Fedora list, that libraries like BLAS (and LAPACK)
> should be handled the way they are in Debian, with shared libraries built
> compatibly with the reference BLAS.  They should be selectable at run
> time, typically according to compute node type by flipping the ld.so
> search path; you should be able to substitute BLIS or a GPU
> implementation for OpenBLAS.  That likely applies in other cases, but
> I'm most familiar with the linear algebra ones.

I sympathize with the idea of having several ABI-compatible BLAS
implementations for the reasons you give.  That somewhat conflicts with
the idea of reproducibility, but after all we can have our cake and eat
it too: the user can decide to have LD_LIBRARY_PATH point to an
alternate ABI-compatible BLAS, or they can keep using the one that
appears in RUNPATH.
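
A minimal sketch of the first option, assuming the alternate directory provides a BLAS with the same soname as the one the program was linked against. run_with_blas and the paths in the usage lines are illustrative, not existing tools or locations.

```shell
#!/bin/sh
# Sketch only: run_with_blas is an illustrative helper, not an existing tool.
# It runs a command with an alternate library directory searched first,
# keeping any LD_LIBRARY_PATH entries that were already set.
run_with_blas() {
    blas_dir=$1
    shift
    env LD_LIBRARY_PATH="$blas_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Usage (placeholder paths):
#   run_with_blas /path/to/alternate-blas/lib ./my-program
# To see which search path the program itself records:
#   readelf -d ./my-program | grep -E 'RUNPATH|RPATH'
```

Because the dynamic linker consults LD_LIBRARY_PATH before DT_RUNPATH, the override applies to that invocation only; running the program directly keeps the library found through RUNPATH.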

Thoughts?

Ludo’.


* Re: Optionally using more advanced CPU features
  2017-08-28 13:48     ` Ludovic Courtès
@ 2017-09-01 10:46       ` Dave Love
  2017-09-04 12:38         ` Ludovic Courtès
  2017-09-07 15:51         ` Packaging BLIS Ludovic Courtès
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Love @ 2017-09-01 10:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

>> That may be the best way to handle it, but it's not widely available,
>> and isn't possible generally (as far as I know), e.g. for Fortran code.
>> See also below.  This issue surfaced again recently in Fedora.
>
> Right.  Do you have examples of Fortran packages in mind?

Not much off-hand because, shall we say, there's a shortage of the sort
of profiling information that's necessary for system performance
engineering and procurement.  It's not in Guix, but cp2k is a (mainly)
Fortran program that is, or was, used as a performance regression test for
GCC.  I only know about its profile for cases where time in MPI or fftw
is most relevant.  However, two of its kernels, ELPA, and libsmm (as
libxsmm) have low-level optimized versions for x86_64, but only Fortran
implementations for other architectures as far as I know.

Otherwise, BLAS/LAPACK for any micro-architectures that don't have
support in free optimized variants like OpenBLAS.

>> In cases that don't dispatch on cpuid (or whatever), I think the
>> relevant missing OS/tool support is SIMD-specific hwcaps in the loader.
>> Hwcaps seem to be essentially undocumented, but there is, or has been,
>> support for instruction set capabilities on some architectures, just not
>> x86_64 apparently.  (An ancient example was for missing instructions on
>> some SPARC systems which greatly affected crypto operations in ssh et
>> al.)
>
> But that sounds similar to IFUNC in that application code would need to
> actually use hwcap info to select the right implementation at load time,
> right?

As far as I know, it's a loader feature.  See "Hardware capabilities" in
ld.so(8).

>>> There’s probably scientific software out there that can benefit from
>>> using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
>>> the tricks above.  When we find such a piece of software, I think we
>>> should investigate and (1) see whether it actually benefits from those
>>> ISA extensions, and (2) see whether it would be feasible to just use
>>> ‘target_clones’ or similar on the hot spots.
>>
>> One example which has been investigated, and you can't, is BLIS.  You
>
> (Why “you can’t?”  It’s free software AFAICS on
> <https://github.com/flame/blis/tree/master>.)

Well, you could embark on some sort of (GCC-specific?) re-write, but it
would be better to work on <https://github.com/flame/blis/issues/129>.
I don't think there's anywhere you can just attach GCC attributes, and
certainly no magic will happen for currently-unsupported architectures.

>> need it for vaguely competitive avx512 linear algebra.  (OpenBLAS is
>> basically fine for previous Intel and AMD SIMD.)  See, e.g.,
>> <https://github.com/xianyi/OpenBLAS/issues/991#issuecomment-273631173>
>> et seq.  I don't know if there's any good reason to, but if you want
>> ATLAS you have the same issue -- along with extra issues building it.
>
> ATLAS is a problem because it does build-time ISA selection (and maybe
> profile-guided optimization?).

Yes, that's what I meant.  (I can't remember to what extent you can just
specify the architecture and build it without the parameter sweep.)

> I sympathize with the idea of having several ABI-compatible BLAS
> implementations for the reasons you give.  That somewhat conflicts with
> the idea of reproducibility, but after all we can have our cake and eat
> it too: the user can decide to have LD_LIBRARY_PATH point to an
> alternate ABI-compatible BLAS, or they can keep using the one that
> appears in RUNPATH.
>
> Thoughts?

Right, about the cake -- as with other packaging systems -- and
LD_LIBRARY_PATH/LD_PRELOAD are important for debugging and measurement
anyway.  [I know too much about computing and experimental science to
believe in reproducibility as it's normally talked about, though
facilities for reproducible builds and environment components are good.]


* Re: Optionally using more advanced CPU features
  2017-09-01 10:46       ` Dave Love
@ 2017-09-04 12:38         ` Ludovic Courtès
  2017-09-07 15:51         ` Packaging BLIS Ludovic Courtès
  1 sibling, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2017-09-04 12:38 UTC (permalink / raw)
  To: Dave Love; +Cc: guix-devel

Hello,

Dave Love <fx@gnu.org> skribis:

> Ludovic Courtès <ludovic.courtes@inria.fr> writes:

[...]

>> But that sounds similar to IFUNC in that application code would need to
>> actually use hwcap info to select the right implementation at load time,
>> right?
>
> As far as I know, it's a loader feature.  See "Hardware capabilities" in
> ld.so(8).

Indeed.

I’ve looked at the newish libmvec along with the “Vector ABI” in the
toolchain and it’s really the kind of thing we’re looking for.

> Well, you could embark on some sort of (GCC-specific?) re-write, but it
> would be better to work on <https://github.com/flame/blis/issues/129>.
> I don't think there's anywhere you can just attach GCC attributes, and
> certainly no magic will happen for currently-unsupported architectures.

Agreed, adjusting BLIS to do some load-time configuration seems like the
right thing.  Thanks for the reference, we’ll see how it goes.

Ludo’.


* Re: Optionally using more advanced CPU features
  2017-08-26  3:39 ` Optionally using more advanced CPU features Ben Woodcroft
  2017-08-26  5:14   ` Pjotr Prins
@ 2017-09-04 14:50   ` Ludovic Courtès
  1 sibling, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2017-09-04 14:50 UTC (permalink / raw)
  To: Ben Woodcroft; +Cc: guix-devel

Hi Ben,

Ben Woodcroft <b.woodcroft@uq.edu.au> skribis:

> Anyway, to move forward I created a repo so that package recipes can
> be modified to use a GCC that has been optimised for a particular
> architecture. I put it out there so that it is more than just a patch
> on this ML, but I'd be happy to incorporate it into Guix proper if
> that is desired.
> https://github.com/wwood/cpu-specific-guix
>
> For instance, to build DIAMOND optimised for sandybridge:
>
> GUILE_LOAD_PATH=/path/to/cpu-specific-guix:$GUILE_LOAD_PATH \
>   guix build -e '(begin
>                    (use-modules (cpu-specific-guix)
>                                 (gnu packages bioinformatics))
>                    (cpu-specific-package diamond "sandybridge"))'

That’s a neat hack!

It’s a bit of a sledgehammer, in that we could achieve this without
rebuilding GCC, I think.  For instance, we could create a ‘gcc’ wrapper
that automatically passes “-march=foo” on the command line of the real
‘gcc’, no?
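
A minimal sketch of such a wrapper, under the assumption that the real compiler's absolute path is known. make_gcc_wrapper and its arguments are hypothetical names, and how the wrapper would be wired into a package's build environment is left open.

```shell
#!/bin/sh
# Sketch only: make_gcc_wrapper is an illustrative helper, not part of Guix.
# It writes a small script that calls the real compiler with -march added
# in front of whatever arguments the build passes.
make_gcc_wrapper() {
    wrapper=$1   # path of the wrapper script to create
    real=$2      # absolute path of the real compiler
    march=$3     # value for -march, e.g. sandybridge
    cat > "$wrapper" <<EOF
#!/bin/sh
# Forward every argument to the real compiler after adding -march.
exec "$real" -march=$march "\$@"
EOF
    chmod +x "$wrapper"
}

# Usage (placeholder paths):
#   make_gcc_wrapper "$HOME/wrappers/gcc" /path/to/real/gcc sandybridge
#   PATH="$HOME/wrappers:$PATH" make
```

The flag is placed first so that an explicit -march given by a package's own build system comes later on the command line; as far as I know GCC honours the last occurrence, so such packages would keep their own choice.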

Thanks for sharing!

Ludo’.


* Packaging BLIS
  2017-09-01 10:46       ` Dave Love
  2017-09-04 12:38         ` Ludovic Courtès
@ 2017-09-07 15:51         ` Ludovic Courtès
  2017-09-08 22:36           ` Dave Love
  1 sibling, 1 reply; 12+ messages in thread
From: Ludovic Courtès @ 2017-09-07 15:51 UTC (permalink / raw)
  To: Dave Love; +Cc: guix-devel

Hello,

Dave Love <fx@gnu.org> skribis:

> Ludovic Courtès <ludovic.courtes@inria.fr> writes:

[...]

>>> One example which has been investigated, and you can't, is BLIS.  You
>>
>> (Why “you can’t?”  It’s free software AFAICS on
>> <https://github.com/flame/blis/tree/master>.)
>
> Well, you could embark on some sort of (GCC-specific?) re-write, but it
> would be better to work on <https://github.com/flame/blis/issues/129>.
> I don't think there's anywhere you can just attach GCC attributes, and
> certainly no magic will happen for currently-unsupported architectures.

That caught my attention so I packaged BLIS:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=5a7deb117424ff4d430b771b50e534cf065c0ba1

There are several “flavors” of BLIS, so you can always rebuild your
favorite program with:

  --with-input=openblas=blis-haswell

and similar (or even ‘--with-graft=blis=blis-haswell’ where applicable).
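
A minimal sketch of picking the option from the build machine's CPU flags, assuming the blis-haswell flavor needs AVX2 and FMA (an assumption, not something stated in this message). blis_rewrite_option and some-package are placeholders.

```shell
#!/bin/sh
# Sketch only: blis_rewrite_option is an illustrative helper, not a Guix command.
# $1 is a space-separated list of CPU flags, e.g. the "flags" line of
# /proc/cpuinfo. It prints a package-rewriting option when the assumed
# requirements of blis-haswell (avx2 and fma) are met, and nothing otherwise.
blis_rewrite_option() {
    case " $1 " in
        *" avx2 "*)
            case " $1 " in
                *" fma "*) printf '%s\n' "--with-input=openblas=blis-haswell" ;;
            esac
            ;;
    esac
}

# Usage (some-package stands for any package that depends on openblas):
#   flags=$(grep -m1 '^flags' /proc/cpuinfo)
#   guix build some-package $(blis_rewrite_option "$flags")
```

When the helper prints nothing, the command falls back to the stock openblas input, which keeps the portable build as the default.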

Hopefully the issue you linked to above will be fixed in future versions
of BLIS, at which point we can probably provide a single “blis” package.

Ludo’.


* Re: Packaging BLIS
  2017-09-07 15:51         ` Packaging BLIS Ludovic Courtès
@ 2017-09-08 22:36           ` Dave Love
  2017-09-11  7:12             ` Ludovic Courtès
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Love @ 2017-09-08 22:36 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

ludovic.courtes@inria.fr (Ludovic Courtès) writes:

> That caught my attention so I packaged BLIS:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=5a7deb117424ff4d430b771b50e534cf065c0ba1
>
> There are several “flavors” of BLIS, so you can always rebuild your
> favorite program with:
>
>   --with-input=openblas=blis-haswell
>
> and similar (or even ‘--with-graft=blis=blis-haswell’ where applicable).

I'm not aware of an advantage of BLIS over OpenBLAS except for KNL (and
maybe Skylake Xeon, if that's covered).  My interest was specifically
for avx512.  (BLIS is documented as basically a research effort, which I
think explains the way it's built.)

> Hopefully the issue you linked to above will be fixed in future versions
> of BLIS, at which point we can probably provide a single “blis” package.
>
> Ludo’.

I still think it's important to be able to select the BLAS at run time
with dynamic linking.  I suppose the trick I used for rpm will work
similarly, though I'm not sure how ld.so behaves in Guix.


* Re: Packaging BLIS
  2017-09-08 22:36           ` Dave Love
@ 2017-09-11  7:12             ` Ludovic Courtès
  0 siblings, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2017-09-11  7:12 UTC (permalink / raw)
  To: Dave Love; +Cc: guix-devel

Dave Love <fx@gnu.org> skribis:

> ludovic.courtes@inria.fr (Ludovic Courtès) writes:
>
>> That caught my attention so I packaged BLIS:
>>
>>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=5a7deb117424ff4d430b771b50e534cf065c0ba1
>>
>> There are several “flavors” of BLIS, so you can always rebuild your
>> favorite program with:
>>
>>   --with-input=openblas=blis-haswell
>>
>> and similar (or even ‘--with-graft=blis=blis-haswell’ where applicable).
>
> I'm not aware of an advantage of BLIS over OpenBLAS except for KNL (and
> maybe Skylake Xeon, if that's covered).  My interest was specifically
> for avx512.  (BLIS is documented as basically a research effort, which I
> think explains the way it's built.)

OK, I haven’t done any benchmarking yet.

>> Hopefully the issue you linked to above will be fixed in future versions
>> of BLIS, at which point we can probably provide a single “blis” package.
>>
>> Ludo’.
>
> I still think it's important to be able to select the BLAS at run time
> with dynamic linking.  I suppose the trick I used for rpm will work
> similarly, though I'm not sure how ld.so behaves in Guix.

ld.so does not honor /etc/ld.so.{conf,cache}, but apart from that it
works as usual.

HTH,
Ludo’.


Thread overview: 12+ messages
2017-08-21 12:23 Optionally using more advanced CPU features Ricardo Wurmus
2017-08-22  9:21 ` Ludovic Courtès
2017-08-23 13:59   ` Dave Love
2017-08-28 13:48     ` Ludovic Courtès
2017-09-01 10:46       ` Dave Love
2017-09-04 12:38         ` Ludovic Courtès
2017-09-07 15:51         ` Packaging BLIS Ludovic Courtès
2017-09-08 22:36           ` Dave Love
2017-09-11  7:12             ` Ludovic Courtès
2017-08-26  3:39 ` Optionally using more advanced CPU features Ben Woodcroft
2017-08-26  5:14   ` Pjotr Prins
2017-09-04 14:50   ` Ludovic Courtès
