* PyTorch with ROCm
@ 2024-03-23 23:02 David Elsing
  2024-03-24 15:41 ` Ricardo Wurmus
  2024-03-28 22:21 ` Ludovic Courtès
  0 siblings, 2 replies; 7+ messages in thread
From: David Elsing @ 2024-03-23 23:02 UTC (permalink / raw)
  To: guix-devel; +Cc: ludo, rekado

Hello,

after seeing that ROCm packages [1] are available in the Guix-HPC
channel, I decided to try and package PyTorch 2.2.1 with ROCm 6.0.2.

For this, I first unbundled the (many) remaining dependencies of the
python-pytorch package and updated it to 2.2.1, the patch series for
which can be found here [2,3].

For the ROCm packages and the remaining dependencies, I did not apply
the same quality standard as for python-pytorch and just tried to get
them working at all with ROCm 6.0.2. To reduce the build time, I also
only tested them for gfx1101, as set in the %amdgpu-targets variable in
amd/rocm-base.scm (which needs to be adjusted for other GPUs). With
that setting, everything seemed to work fine on my GPU.
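
For reference, the target selection boils down to a list of GPU
architecture strings; a minimal sketch of what I set (the actual
definition is in amd/rocm-base.scm in [4]):

  ;; GPU architectures to build kernels for; I only tested gfx1101.
  (define %amdgpu-targets
    (list "gfx1101"))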

The changes for the ROCm packages are here [4] as a modification of
Guix-HPC. There, the python-pytorch-rocm package in
amd/machine-learning.scm depends on the python-pytorch-avx package in
[2,3]. Both python-pytorch and python-pytorch-avx support AVX2 / AVX-512
instructions, but the latter additionally supports fbgemm and nnpack. I
used it over python-pytorch because a CPU with PCIe atomics (which ROCm
requires) should support AVX2 or AVX-512 anyway.

For some packages, such as composable-kernel, the build time and memory
requirements are already very high when building only for one GPU
architecture, so maybe it would be best to make a separate package for
each architecture?
I'm not sure they can be combined however, as the GPU code is included
in the shared libraries. Thus all dependent packages like
python-pytorch-rocm would need to be built for each architecture as
well, which is a large duplication for the non-GPU parts.
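
If separate packages per architecture are the way to go, a procedure
along these lines could generate the variants (an untested sketch,
assuming the package honors a GPU_TARGETS CMake flag):

  (define (composable-kernel-for-target target)
    "Return a composable-kernel variant built only for TARGET,
  e.g. \"gfx1101\"."
    (package
      (inherit composable-kernel)
      (name (string-append "composable-kernel-" target))
      (arguments
       (substitute-keyword-arguments (package-arguments composable-kernel)
         ((#:configure-flags flags #~'())
          #~(cons #$(string-append "-DGPU_TARGETS=" target)
                  #$flags))))))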

There were a few other issues as well, some of which should probably be
addressed upstream:
- Many tests assume a GPU to be present, so they need to be disabled.
- For several packages (e.g. rocfft), I had to disable the
  validate-runpath? phase, as there was an error when reading ELF
  files. It is however possible that I also disabled it for packages
  where it was not necessary, but it was the case for rocblas at
  least. Here, the generated kernels are contained in ELF files, which are
  detected by elf-file? in guix/build/utils.scm, but rejected by
  has-elf-header? in guix/elf.scm, which leads to an error.
- Dependencies of python-tensile copy source files out of the store and
  later copy them again with shutil.copy, sometimes twice. Because
  shutil.copy preserves the read-only permissions of files in the
  store, this leads to permission errors, so I patched them to use
  shutil.copyfile instead, which copies only the file contents (see the
  sketch after this list).
- There were a few errors due to using the GCC 11 system headers with
  rocm-toolchain (which is based on Clang+LLVM). For roctracer,
  replacing std::experimental::filesystem by std::filesystem suffices,
  but for rocthrust, the placement new operator is not found. I
  applied the patch from Gentoo [5], where it is replaced by a simple
  assignment. It looks like UB to me though, even if it happens to
  work. The question is whether this is a bug in libstdc++, clang or
  amdclang++...
- rocMLIR also contains a fork of the LLVM source tree, and it is not
  clear at first glance how exactly it differs from the main ROCm
  fork of LLVM or upstream LLVM.
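
For reference, the workarounds for the rocblas and Tensile issues look
roughly like this as a package variant (a sketch using current Guix
idioms rather than the exact code from [4]; there, the shutil change is
applied to the Tensile dependencies directly):

  ;; Needs (guix packages), (guix gexp), (guix utils) and, on the
  ;; build side, (guix build utils).
  (define rocblas-with-workarounds
    (package
      (inherit rocblas)              ;rocblas from the Guix-HPC channel
      (arguments
       (substitute-keyword-arguments (package-arguments rocblas)
         ;; The generated kernel files pass elf-file? but fail
         ;; has-elf-header?, which aborts the validate-runpath phase.
         ((#:validate-runpath? _ #t) #f)
         ((#:phases phases '%standard-phases)
          #~(modify-phases #$phases
              (add-after 'unpack 'copy-without-permissions
                (lambda _
                  ;; shutil.copy also copies the read-only permission
                  ;; bits of files in the store; shutil.copyfile
                  ;; copies only the file contents.
                  (substitute* (find-files "." "\\.py$")
                    (("shutil\\.copy\\(") "shutil.copyfile("))))))))))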

It would be really great to have these packages in Guix proper, but
first of course the base ROCm packages need to be added after deciding
how to deal with the different architectures. Also, are several ROCm
versions necessary or would only one (the current latest) version
suffice?

Cheers,
David

[1] https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix/
[2] https://issues.guix.gnu.org/69591
[3] https://codeberg.org/dtelsing/Guix/src/branch/pytorch
[4] https://codeberg.org/dtelsing/Guix-HPC/src/branch/pytorch-rocm
[5] https://gitweb.gentoo.org/repo/gentoo.git/tree/sci-libs/rocThrust/files/rocThrust-4.0-operator_new.patch



* Re: PyTorch with ROCm
  2024-03-23 23:02 PyTorch with ROCm David Elsing
@ 2024-03-24 15:41 ` Ricardo Wurmus
  2024-03-24 18:13   ` David Elsing
  2024-03-28 22:21 ` Ludovic Courtès
  1 sibling, 1 reply; 7+ messages in thread
From: Ricardo Wurmus @ 2024-03-24 15:41 UTC (permalink / raw)
  To: David Elsing; +Cc: guix-devel, ludo


Hi David,

> after seeing that ROCm packages [1] are available in the Guix-HPC
> channel, I decided to try and package PyTorch 2.2.1 with ROCm 6.0.2.

Excellent initiative!

> For this, I first unbundled the (many) remaining dependencies of the
> python-pytorch package and updated it to 2.2.1, the patch series for
> which can be found here [2,3].

Oh, commit 8429f25ecd83594e80676a67ad9c54f0d6cf3f16 added
python-pytorch2 at version 2.2.1.  Do you think you could adjust your
patches to modify that one instead?

> It would be really great to have these packages in Guix proper, but
> first of course the base ROCm packages need to be added after deciding
> how to deal with the different architectures. Also, are several ROCm
> versions necessary or would only one (the current latest) version
> suffice?

As for the ROCm-specific work, I'm not qualified to comment.  I do
support a move of the ROCm packages from Guix HPC to Guix proper.

I think it is sufficient to only have the current version of ROCm; other
versions could be added if there is reasonable demand.

-- 
Ricardo



* Re: PyTorch with ROCm
  2024-03-24 15:41 ` Ricardo Wurmus
@ 2024-03-24 18:13   ` David Elsing
  0 siblings, 0 replies; 7+ messages in thread
From: David Elsing @ 2024-03-24 18:13 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel, ludo

Hi Ricardo,

thanks for the information!

Ricardo Wurmus <rekado@elephly.net> writes:

> Oh, commit 8429f25ecd83594e80676a67ad9c54f0d6cf3f16 added
> python-pytorch2 at version 2.2.1.  Do you think you could adjust your
> patches to modify that one instead?

I already adjusted the patches yesterday to remove the python-pytorch2
package you added, as the patch series updates the main python-pytorch
package to version 2.2.1.
The new inputs of your package were already included, with the
exception of python-opt-einsum; I had overlooked it before and have
included it now. :)
Is there a reason to keep version 1 around? If so, I could adjust the
patches again. Otherwise, it makes sense to me to move the
python-pytorch package to version 2.2.1 and keep a package variant at
2.0.1 for r-torch (which I kept and adjusted).

Due to problems when building dependencies, the new package only builds
successfully on x86_64. As I explained in the patch series, asmjit
fails on armhf because GCC runs out of memory (it reaches 4 GB, I
think, and more is of course not possible there), and cpuinfo has a
known bug on aarch64 [1], which causes the tests to fail and, AFAICT,
also breaks PyTorch at runtime. Through python-pytorch ->
python-expecttest -> poetry -> python-keyring -> python-secretstorage
-> python-cryptography, the python-pytorch package now depends on rust,
which currently requires too much memory to build on 32-bit systems, so
i686 is not supported either.
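
One way to encode this restriction for now (a sketch; maybe there is a
better place for it) is the supported-systems field:

  (package
    (inherit python-pytorch)
    ;; asmjit exhausts memory on armhf, cpuinfo is broken on aarch64
    ;; [1], and the rust dependency cannot be built on 32-bit systems.
    (supported-systems '("x86_64-linux")))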

What do you think should be done here?

I added all packages required for the core tests to native-inputs, but
decided to disable the tests, as they take a long time to run. If I
remove the test inputs (in particular python-expecttest), the package
could probably also be built for i686. Would it be acceptable to keep
them as a comment for reference?

> I think it is sufficient to only have the current version of ROCm; other
> versions could be added if there is reasonable demand.

That sounds good to me.

Best,
David

[1] https://github.com/pytorch/cpuinfo/issues/14



* Re: PyTorch with ROCm
  2024-03-23 23:02 PyTorch with ROCm David Elsing
  2024-03-24 15:41 ` Ricardo Wurmus
@ 2024-03-28 22:21 ` Ludovic Courtès
  2024-03-31 22:21   ` David Elsing
  1 sibling, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2024-03-28 22:21 UTC (permalink / raw)
  To: David Elsing; +Cc: guix-devel, rekado

Hello!

David Elsing <david.elsing@posteo.net> skribis:

> after seeing that ROCm packages [1] are available in the Guix-HPC
> channel, I decided to try and package PyTorch 2.2.1 with ROCm 6.0.2.

Nice!

> The changes for the ROCm packages are here [4] as a modification of
> Guix-HPC. There, the python-pytorch-rocm package in
> amd/machine-learning.scm depends on the python-pytorch-avx package in
> [2,3]. Both python-pytorch and python-pytorch-avx support AVX2 / AVX-512
> instructions, but the latter additionally supports fbgemm and nnpack. I
> used it over python-pytorch because a CPU with PCIe atomics (which ROCm
> requires) should support AVX2 or AVX-512 anyway.

I’m happy to merge your changes in the ‘guix-hpc’ channel for the time
being (I can create you an account there if you wish so you can create
merge requests etc.).  Let me know!

I agree with Ricardo that this should be merged into Guix proper
eventually.  This is still in flux and we’d need to check what Kjetil
and Thomas at AMD think, in particular wrt. versions, so no ETA so far.

> For some packages, such as composable-kernel, the build time and
> memory requirements are already very high when building only for one GPU
> architecture, so maybe it would be best to make a separate package for
> each architecture?

Is PyTorch able to build code for several GPU architectures and pick the
right one at run time?  If it does, that would seem like the better
option to me, unless that is indeed so computationally expensive that
it’s not affordable.

> I'm not sure they can be combined however, as the GPU code is included
> in the shared libraries. Thus all dependent packages like
> python-pytorch-rocm would need to be built for each architecture as
> well, which is a large duplication for the non-GPU parts.

Yeah, but maybe that’s OK if we keep the number of supported GPU
architectures to a minimum?

> There were a few other issues as well, some of which should probably be
> addressed upstream:
> - Many tests assume a GPU to be present, so they need to be disabled.

Yes.  I/we’d like to eventually support that.  (There’d need to be some
annotation in derivations or packages specifying what hardware is
required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
to honor that.)

> - For several packages (e.g. rocfft), I had to disable the
>   validate-runpath? phase, as there was an error when reading ELF
>   files. It is however possible that I also disabled it for packages
>   where it was not necessary, but it was the case for rocblas at
>   least. Here, the generated kernels are contained in ELF files, which are
>   detected by elf-file? in guix/build/utils.scm, but rejected by
>   has-elf-header? in guix/elf.scm, which leads to an error.

Weird.  We’d need to look more closely into the errors you got.

[...]

> - There were a few errors due to using the GCC 11 system headers with
>   rocm-toolchain (which is based on Clang+LLVM). For roctracer,
>   replacing std::experimental::filesystem by std::filesystem suffices,
>   but for rocthrust, the placement new operator is not found. I
>   applied the patch from Gentoo [5], where it is replaced by a simple
>   assignment. It looks like UB to me though, even if it happens to
>   work. The question is whether this is a bug in libstdc++, clang or
>   amdclang++...
> - rocMLIR also contains a fork of the LLVM source tree, and it is not
>   clear at first glance how exactly it differs from the main ROCm
>   fork of LLVM or upstream LLVM.

Oh, I just noticed your patches bring a lot of things beyond PyTorch
itself!  I think there’s some overlap with
<https://gitlab.inria.fr/guix-hpc/guix-hpc/-/merge_requests/38>, we
should synchronize.

Thanks!

Ludo’.



* Re: PyTorch with ROCm
  2024-03-28 22:21 ` Ludovic Courtès
@ 2024-03-31 22:21   ` David Elsing
  2024-04-02 14:00     ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: David Elsing @ 2024-03-31 22:21 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, rekado

Hi!

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> I’m happy to merge your changes in the ‘guix-hpc’ channel for the time
> being (I can create you an account there if you wish so you can create
> merge requests etc.).  Let me know!

Ok sure, that sounds good! I made the packages only for ROCm 6.0.2 so
far though.

> I agree with Ricardo that this should be merged into Guix proper
> eventually.  This is still in flux and we’d need to check what Kjetil
> and Thomas at AMD think, in particular wrt. versions, so no ETA so far.

Yes, I agree; the ROCm packages are not ready to be merged yet.

> Is PyTorch able to build code for several GPU architectures and pick the
> right one at run time?  If it does, that would seem like the better
> option to me, unless that is indeed so computationally expensive that
> it’s not affordable.

It is the same as for other HIP/ROCm libraries, so the GPU architectures
chosen at build time are all available at runtime and automatically
picked. For reference, the Arch Linux package for PyTorch [1] enables 12
architectures. I think the architectures which can be chosen at compile
time also depend on the ROCm version.

>> I'm not sure they can be combined however, as the GPU code is included
>> in the shared libraries. Thus all dependent packages like
>> python-pytorch-rocm would need to be built for each architecture as
>> well, which is a large duplication for the non-GPU parts.
>
> Yeah, but maybe that’s OK if we keep the number of supported GPU
> architectures to a minimum?

If it's no issue for the build farm, it would probably be good to
include a set of default architectures (the officially supported
ones?) like you suggested, and to make it easy to recompile all
dependent packages for other architectures. Maybe this can be done with
a package transformation, like for '--tune'? IIRC, building
composable-kernel for the default architectures with 16 threads
exceeded 32 GB of memory before I cancelled the build and set it to
only one architecture.

>> - Many tests assume a GPU to be present, so they need to be disabled.
>
> Yes.  I/we’d like to eventually support that.  (There’d need to be some
> annotation in derivations or packages specifying what hardware is
> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
> to honor that.)

That sounds like a good idea; could this also include CPU ISA
extensions, such as AVX2 and AVX-512?

>> - For several packages (e.g. rocfft), I had to disable the
>>   validate-runpath? phase, as there was an error when reading ELF
>>   files. It is however possible that I also disabled it for packages
>>   where it was not necessary, but it was the case for rocblas at
>>   least. Here, the generated kernels are contained in ELF files, which are
>>   detected by elf-file? in guix/build/utils.scm, but rejected by
>>   has-elf-header? in guix/elf.scm, which leads to an error.
>
> Weird.  We’d need to look more closely into the errors you got.

I think the issue is simply that elf-file? just checks the magic bytes
and has-elf-header? checks for the entire header. If the former returns
#t and the latter #f, an error is raised by parse-elf in guix/elf.scm.
It seems some ROCm (or Tensile?) ELF files have another header format.
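
The mismatch can be reproduced from `guix repl' (a sketch; the file
name is just an example from my rocblas build):

  (use-modules ((guix build utils) #:select (elf-file?))
               (guix elf)
               (ice-9 binary-ports))

  (define file "lib/rocblas/library/TensileLibrary_gfx1101.hsaco")

  ;; elf-file? only compares the first bytes against the ELF magic...
  (elf-file? file)                                          ;=> #t
  ;; ...while has-elf-header? validates the full header, so parse-elf
  ;; raises an error.
  (has-elf-header?
   (call-with-input-file file get-bytevector-all #:binary #t)) ;=> #f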

> Oh, I just noticed your patches bring a lot of things beyond PyTorch
> itself!  I think there’s some overlap with
> <https://gitlab.inria.fr/guix-hpc/guix-hpc/-/merge_requests/38>, we
> should synchronize.

Ah, I had not seen this before; the overlap seems to be Tensile,
roctracer, and rocblas. For rocblas, I saw that they set
"-DAMDGPU_TARGETS=gfx1030;gfx90a", probably for testing?

Thank you!
David

[1] https://gitlab.archlinux.org/archlinux/packaging/packages/python-pytorch/-/blob/ae90c1e8bdb99af458ca0a545c5736950a747690/PKGBUILD



* Re: PyTorch with ROCm
  2024-03-31 22:21   ` David Elsing
@ 2024-04-02 14:00     ` Ludovic Courtès
  2024-04-03 22:21       ` David Elsing
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2024-04-02 14:00 UTC (permalink / raw)
  To: David Elsing; +Cc: guix-devel, rekado, Romain GARBAGE

Hello!

(Cc’ing my colleague Romain who may work on related things soon.)

David Elsing <david.elsing@posteo.net> skribis:

> It is the same as for other HIP/ROCm libraries, so the GPU architectures
> chosen at build time are all available at runtime and automatically
> picked. For reference, the Arch Linux package for PyTorch [1] enables 12
> architectures. I think the architectures which can be chosen at compile
> time also depend on the ROCm version.

Nice.  We’d have to check what the size and build time tradeoff is, but
it makes sense to enable a bunch of architectures.

>>> I'm not sure they can be combined however, as the GPU code is included
>>> in the shared libraries. Thus all dependent packages like
>>> python-pytorch-rocm would need to be built for each architecture as
>>> well, which is a large duplication for the non-GPU parts.
>>
>> Yeah, but maybe that’s OK if we keep the number of supported GPU
>> architectures to a minimum?
>
> If it's no issue for the build farm, it would probably be good to
> include a set of default architectures (the officially supported
> ones?) like you suggested, and to make it easy to recompile all
> dependent packages for other architectures. Maybe this can be done with
> a package transformation, like for '--tune'? IIRC, building
> composable-kernel for the default architectures with 16 threads
> exceeded 32 GB of memory before I cancelled the build and set it to
> only one architecture.

Yeah, we could think about a transformation option.  Maybe
‘--with-configure-flags=python-pytorch=-DAMDGPU_TARGETS=xyz’ would work,
and if not, we can come up with a specific transformation and/or a
procedure that takes a list of architectures and returns a package.
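
Such a procedure could look roughly like this (an untested sketch,
assuming the packages all honor an ‘AMDGPU_TARGETS’ CMake flag):

  (define (package-for-amdgpu-targets p targets)
    "Return a variant of P built for TARGETS, a list of strings such
  as (\"gfx1030\" \"gfx90a\")."
    (package
      (inherit p)
      (arguments
       (substitute-keyword-arguments (package-arguments p)
         ((#:configure-flags flags #~'())
          #~(cons #$(string-append "-DAMDGPU_TARGETS="
                                   (string-join targets ";"))
                  #$flags))))))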

>>> - Many tests assume a GPU to be present, so they need to be disabled.
>>
>> Yes.  I/we’d like to eventually support that.  (There’d need to be some
>> annotation in derivations or packages specifying what hardware is
>> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
>> to honor that.)
>
> That sounds like a good idea; could this also include CPU ISA
> extensions, such as AVX2 and AVX-512?

That’d be great, yes.  Don’t hold your breath though as I/we haven’t
scheduled work on this yet.  If you’re interested in working on it, we
can discuss it of course.

> I think the issue is simply that elf-file? just checks the magic bytes
> and has-elf-header? checks for the entire header. If the former returns
> #t and the latter #f, an error is raised by parse-elf in guix/elf.scm.
> It seems some ROCm (or Tensile?) ELF files have another header format.

Uh, never came across such a situation.  What’s so special about those
ELF files?  How are they created?

>> Oh, I just noticed your patches bring a lot of things beyond PyTorch
>> itself!  I think there’s some overlap with
>> <https://gitlab.inria.fr/guix-hpc/guix-hpc/-/merge_requests/38>, we
>> should synchronize.
> Ah, I had not seen this before; the overlap seems to be Tensile,
> roctracer, and rocblas. For rocblas, I saw that they set
> "-DAMDGPU_TARGETS=gfx1030;gfx90a", probably for testing?

Could be, we’ll see.

Thanks,
Ludo’.



* Re: PyTorch with ROCm
  2024-04-02 14:00     ` Ludovic Courtès
@ 2024-04-03 22:21       ` David Elsing
  0 siblings, 0 replies; 7+ messages in thread
From: David Elsing @ 2024-04-03 22:21 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, rekado, Romain GARBAGE

Hello,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> Yeah, we could think about a transformation option.  Maybe
> ‘--with-configure-flags=python-pytorch=-DAMDGPU_TARGETS=xyz’ would work,
> and if not, we can come up with a specific transformation and/or an
> procedure that takes a list of architectures and returns a package.

I think that would work for python-pytorch itself, but it would need to
be set for all ROCm dependencies as well. It would be good to make sure
that the targets for a package are a subset of the intersection of the
targets specified for its dependencies.
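
The subset check itself is simple, e.g. with SRFI-1 (a sketch,
assuming we can recover the configured target list of each package):

  (use-modules (srfi srfi-1))

  ;; TARGETS is consistent if it is a subset of every dependency's
  ;; target list, i.e. of their intersection.
  (define (targets-consistent? targets dep-targets)
    (every (lambda (deps) (lset<= string=? targets deps))
           dep-targets))

  (targets-consistent? '("gfx1101")
                       '(("gfx1030" "gfx1101") ("gfx90a" "gfx1101")))
  ;; => #t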

>>>> - Many tests assume a GPU to be present, so they need to be disabled.
>>>
>>> Yes.  I/we’d like to eventually support that.  (There’d need to be some
>>> annotation in derivations or packages specifying what hardware is
>>> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
>>> to honor that.)
>>
>> That sounds like a good idea; could this also include CPU ISA
>> extensions, such as AVX2 and AVX-512?
>
> That’d be great, yes.  Don’t hold your breath though as I/we haven’t
> scheduled work on this yet.  If you’re interested in working on it, we
> can discuss it of course.

I am definitely interested, but I am not familiar with Cuirass. Would
this also require support from the build daemon to determine which
hardware is available?

>> I think the issue is simply that elf-file? just checks the magic bytes
>> and has-elf-header? checks for the entire header. If the former returns
>> #t and the latter #f, an error is raised by parse-elf in guix/elf.scm.
>> It seems some ROCm (or Tensile?) ELF files have another header format.
>
> Uh, never came across such a situation.  What’s so special about those
> ELF files?  How are they created?

After checking again, I noticed that the error actually only occurs for
rocblas. :)
Here, the problematic ELF files are generated by Tensile [1] and are
installed in lib/rocblas/library (by library/src/CMakeLists.txt, which
calls a CMake function from the Tensile package). They are shared
object libraries for the GPU architecture(s) [2]. Tensile uses
`clang-offload-bundler` (from rocm-toolchain) to create the files, and
it seems to me that the "ELF" header comes from there, but I don't know
why it is special.

Thanks,
David

[1] https://github.com/ROCm/Tensile/blob/17df881bde80fc20f997dfb290f4bb4b0e05a7e9/Tensile/TensileCreateLibrary.py#L283
[2] https://github.com/ROCm/Tensile/wiki/TensileCreateLibrary#code-object-libraries


