Re: PyTorch with ROCm - Ludovic Courtès

From: "Ludovic Courtès" <ludovic.courtes@inria.fr>
To: David Elsing <david.elsing@posteo.net>
Cc: guix-devel@gnu.org,  rekado@elephly.net,
	Romain GARBAGE <romain.garbage@inria.fr>
Subject: Re: PyTorch with ROCm
Date: Tue, 02 Apr 2024 16:00:34 +0200
Message-ID: <874jcj6f0d.fsf@inria.fr>
In-Reply-To: <7ymsqe9h5l.fsf@posteo.net> (David Elsing's message of "Sun, 31 Mar 2024 22:21:26 +0000")

Hello!

(Cc’ing my colleague Romain who may work on related things soon.)

David Elsing <david.elsing@posteo.net> skribis:

> It is the same as for other HIP/ROCm libraries, so the GPU architectures
> chosen at build time are all available at runtime and automatically
> picked. For reference, the Arch Linux package for PyTorch [1] enables 12
> architectures. I think the architectures which can be chosen at compile
> time also depend on the ROCm version.

Nice.  We’d have to check what the size and build time tradeoff is, but
it makes sense to enable a bunch of architectures.

>>> I'm not sure they can be combined however, as the GPU code is included
>>> in the shared libraries. Thus all dependent packages like
>>> python-pytorch-rocm would need to be built for each architecture as
>>> well, which is a large duplication for the non-GPU parts.
>>
>> Yeah, but maybe that’s OK if we keep the number of supported GPU
>> architectures to a minimum?
>
> If it's no issue for the build farm it would probably be good to include
> a set of default architectures (the officially supported ones?) like you
> suggested, and make it easy to recompile all dependent packages for
> other architectures. Maybe this can be done with a package
> transformation like for '--tune'? IIRC, building composable-kernel for
> the default architectures with 16 threads exceeded 32 GB of memory
> before I cancelled the build and set it to only one architecture.

Yeah, we could think about a transformation option.  Maybe
‘--with-configure-flags=python-pytorch=-DAMDGPU_TARGETS=xyz’ would work,
and if not, we can come up with a specific transformation and/or a
procedure that takes a list of architectures and returns a package.
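
To make the idea concrete, here is a minimal sketch of how such a
command line could be assembled from a list of architectures.  It is
only an illustration: the helper names are hypothetical, the
semicolon-separated value format is taken from the rocblas flag quoted
further down, and whether ‘--with-configure-flags’ actually has the
desired effect on python-pytorch is untested.

```python
def amdgpu_targets_flag(architectures):
    """Return a CMake-style flag listing the given GPU architectures.

    Assumption: the build system accepts a semicolon-separated list,
    as in "-DAMDGPU_TARGETS=gfx1030;gfx90a".
    """
    if not architectures:
        raise ValueError("at least one GPU architecture is required")
    return "-DAMDGPU_TARGETS=" + ";".join(architectures)


def guix_build_command(package, architectures):
    """Return an argv list for 'guix build' with the flag applied.

    Hypothetical helper: it only builds the argument list and does not
    run anything.
    """
    flag = amdgpu_targets_flag(architectures)
    return ["guix", "build", package,
            f"--with-configure-flags={package}={flag}"]


if __name__ == "__main__":
    print(guix_build_command("python-pytorch", ["gfx1030", "gfx90a"]))
```

Building an argv list rather than a single shell string sidesteps the
quoting that the semicolons would otherwise need in a shell.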

>>> - Many tests assume a GPU to be present, so they need to be disabled.
>>
>> Yes.  I/we’d like to eventually support that.  (There’d need to be some
>> annotation in derivations or packages specifying what hardware is
>> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
>> to honor that.)
>
> That sounds like a good idea, could this also include CPU ISA
> extensions, such as AVX2 and AVX-512?

That’d be great, yes.  Don’t hold your breath though as I/we haven’t
scheduled work on this yet.  If you’re interested in working on it, we
can discuss it of course.

> I think the issue is simply that elf-file? just checks the magic bytes
> and has-elf-header? checks for the entire header. If the former returns
> #t and the latter #f, an error is raised by parse-elf in guix/elf.scm.
> It seems some ROCm (or tensile?) ELF files have another header format.

Uh, never came across such a situation.  What’s so special about those
ELF files?  How are they created?
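
For illustration, the difference between a magic-bytes check and a
fuller header check can be sketched as below.  This is not the code in
guix/elf.scm: the function names are hypothetical, and the fields
validated here (class, data encoding, version, minimum header size) are
an assumption about what a stricter check might look at.

```python
ELF_MAGIC = b"\x7fELF"


def looks_like_elf(data: bytes) -> bool:
    """Cheap check: only compare the four magic bytes."""
    return data[:4] == ELF_MAGIC


def has_plausible_elf_header(data: bytes) -> bool:
    """Stricter check: magic bytes plus basic e_ident sanity checks."""
    if len(data) < 16 or not looks_like_elf(data):
        return False
    ei_class, ei_data, ei_version = data[4], data[5], data[6]
    if ei_class not in (1, 2):    # ELFCLASS32, ELFCLASS64
        return False
    if ei_data not in (1, 2):     # ELFDATA2LSB, ELFDATA2MSB
        return False
    if ei_version != 1:           # EV_CURRENT
        return False
    # The full ELF header is 52 bytes for 32-bit and 64 bytes for 64-bit.
    header_size = 52 if ei_class == 1 else 64
    return len(data) >= header_size


if __name__ == "__main__":
    # Correct magic but nothing valid after it: passes the cheap check only.
    odd = ELF_MAGIC + b"\x00" * 12
    print(looks_like_elf(odd), has_plausible_elf_header(odd))
```

A file like ‘odd’ above is accepted by the first predicate and rejected
by the second, which is the kind of mismatch described in the quoted
paragraph.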

>> Oh, just noticed your patch brings a lot of things beyond PyTorch itself!
>> I think there’s some overlap with
>> <https://gitlab.inria.fr/guix-hpc/guix-hpc/-/merge_requests/38>, we
>> should synchronize.
> Ah, I did not see this before, the overlap seems to be tensile,
> roctracer and rocblas. For rocblas, I saw that they set
> "-DAMDGPU_TARGETS=gfx1030;gfx90a", probably for testing?

Could be, we’ll see.

Thanks,
Ludo’.
