unofficial mirror of guix-science@gnu.org 
From: Zacchaeus Scheffer <zaccysc@gmail.com>
To: Lars-Dominik Braun <lars@6xq.net>
Cc: guix-science@gnu.org
Subject: Re: Freeing Machine Learning with ROCm
Date: Tue, 26 Apr 2022 00:45:13 -0400
Message-ID: <CAJejy7=rjPKrkD=YBf+Xtwz0a7tdz6437un7yspdZk2QZRyrQQ@mail.gmail.com>
In-Reply-To: <YmZALT6hxKX5MWxI@noor.fritz.box>

>
> > Based on the fact that many ROCm packages exist in guix, and
> > that I don't see people complain, it seems it must have worked in the
> > past.
> Indeed, I am using Guix’ darktable and rocm-opencl-runtime packages
> for OpenCL-accelerated photo editing. But I’m also doing this on a
> foreign distribution with a custom kernel (5.15) – not Guix System.
>
I tried kernel version 5.15 before I tried 5.4.  Is there anything else
special about your kernel version?

> > ROCk module is loaded
> > > Unable to open /dev/kfd read-write: No such file or directory
> > > <my username here> is member of video group
> Which GPU are you using? Can you see it with `lspci` and does it have the
> `amdgpu` driver attached? Is the firmware loaded (`dmesg | grep amdgpu`,
> I’m guessing no, since you use linux-libre)?
>
I have an AMD Radeon Instinct MI60, one of the few officially supported
GPUs.  `lspci | grep -i amd` gives:
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a0
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a1
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20
so it seems to be detected. `dmesg | grep amdgpu` gives:
[   12.446826] [drm] amdgpu kernel modesetting enabled.
[   12.485012] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[   12.522503] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[   12.522538] amdgpu: ATOM BIOS: 113-D1630600-107
[   12.523127] [drm:sdma_v4_0_early_init.cold [amdgpu]] *ERROR* sdma_v4_0: Failed to load firmware "/*(DEBLOBBED)*/"
[   12.523277] [drm:sdma_v4_0_early_init.cold [amdgpu]] *ERROR* Failed to load sdma firmware!
[   12.533887] amdgpu 0000:03:00.0: amdgpu: MEM ECC is active.
[   12.533889] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is active.
[   12.533893] amdgpu 0000:03:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[7fff] ras_mask[7fff]
[   12.533902] amdgpu 0000:03:00.0: amdgpu: VRAM: 32752M 0x0000008000000000 - 0x00000087FEFFFFFF (32752M used)
[   12.533904] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[   12.533905] amdgpu 0000:03:00.0: amdgpu: AGP: 267878400M 0x0000008800000000 - 0x0000FFFFFFFFFFFF
[   12.557543] [drm] amdgpu: 32752M of VRAM memory ready
[   12.557549] [drm] amdgpu: 24018M of GTT memory ready.
[   12.557775] amdgpu 0000:03:00.0: amdgpu: failed to init sos firmware
[   12.557777] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
[   12.557916] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <psp> failed -2
[   12.558042] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[   12.558044] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[   12.558047] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[   12.602981] amdgpu: probe of 0000:03:00.0 failed with error -2
[   12.603026] [drm] amdgpu: ttm finalized
So it seems to be partially working, partially not.  That "Fatal error
during GPU init" is pretty discouraging though...  With the way AMD
promoted ROCm as being so open, I was really under the impression that I
would be able to make this work on Guix, albeit with some work on my end,
but you sound skeptical.  Do you think it is possible?
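
For anyone triaging a similar log, here is a minimal sketch (not part of
the original exchange) that pulls the failing amdgpu lines out of saved
`dmesg` output.  The function name `firmware_errors`, the match pattern,
and the shortened sample are all illustrative assumptions; the pattern is
only tuned to the excerpt above and is not an exhaustive list of amdgpu
error messages.

```python
import re

# Assumption: a line counts as a failure if it carries the kernel's
# "*ERROR*" tag or the word "failed". Tuned to the excerpt above only.
FAILURE = re.compile(r"\*ERROR\*|failed", re.IGNORECASE)

def firmware_errors(log_text):
    """Return amdgpu log lines that report a failure, timestamps stripped."""
    hits = []
    for line in log_text.splitlines():
        if "amdgpu" not in line:
            continue
        if FAILURE.search(line):
            # Drop the leading "[   12.345678] " timestamp, if present.
            hits.append(re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line))
    return hits

# Shortened sample taken from the log quoted in this message.
SAMPLE = """\
[   12.446826] [drm] amdgpu kernel modesetting enabled.
[   12.557775] amdgpu 0000:03:00.0: amdgpu: failed to init sos firmware
[   12.602981] amdgpu: probe of 0000:03:00.0 failed with error -2
[   12.603026] [drm] amdgpu: ttm finalized
"""

for entry in firmware_errors(SAMPLE):
    print(entry)
```

To run it against a live system one could feed it the text of
`dmesg | grep amdgpu` instead of `SAMPLE`.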

Thanks for your kind response,
Zacchaeus

Thread overview: 4+ messages
2022-04-23  5:14 Freeing Machine Learning with ROCm Zacchaeus Scheffer
2022-04-25  6:31 ` Lars-Dominik Braun
2022-04-26  4:45   ` Zacchaeus Scheffer [this message]
2022-04-26  6:24     ` Lars-Dominik Braun