From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Zacchaeus Scheffer
Date: Tue, 26 Apr 2022 00:45:13 -0400
Subject: Re: Freeing Machine Learning with ROCm
To: Lars-Dominik Braun
Cc: guix-science@gnu.org
Content-Type: multipart/alternative; boundary="0000000000006da66f05dd875c03"

--0000000000006da66f05dd875c03
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

>
> > Based on the fact that many ROCm packages exist in guix, and
> > that I don't see people complain, it seems it must have worked in the
> > past.
> Indeed, I am using Guix’ darktable and rocm-opencl-runtime packages
> for OpenCL-accelerated photo editing. But I’m also doing this on a
> foreign distribution with a custom kernel (5.15) – not Guix System.
>
I tried kernel version 5.15 before I tried 5.4. Is there anything else special about your kernel version?

> > > ROCk module is loaded
> > > Unable to open /dev/kfd read-write: No such file or directory
> > > <my username here> is member of video group
> Which GPU are you using? Can you see it with `lspci` and does it have the
> `amdgpu` driver attached? Is the firmware loaded (`dmesg | grep amdgpu`,
> I’m guessing no, since you use linux-libre)?
>
I have an AMD Radeon Instinct MI60, one of the few officially supported GPUs. `lspci | grep -i amd` gives:

01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a0
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a1
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20

so it seems to be detected. `dmesg | grep amdgpu` gives:

[   12.446826] [drm] amdgpu kernel modesetting enabled.
[   12.485012] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[   12.522503] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[   12.522538] amdgpu: ATOM BIOS: 113-D1630600-107
[   12.523127] [drm:sdma_v4_0_early_init.cold [amdgpu]] *ERROR* sdma_v4_0: Failed to load firmware "/*(DEBLOBBED)*/"
[   12.523277] [drm:sdma_v4_0_early_init.cold [amdgpu]] *ERROR* Failed to load sdma firmware!
[   12.533887] amdgpu 0000:03:00.0: amdgpu: MEM ECC is active.
[   12.533889] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is active.
[   12.533893] amdgpu 0000:03:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[7fff] ras_mask[7fff]
[   12.533902] amdgpu 0000:03:00.0: amdgpu: VRAM: 32752M 0x0000008000000000 - 0x00000087FEFFFFFF (32752M used)
[   12.533904] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[   12.533905] amdgpu 0000:03:00.0: amdgpu: AGP: 267878400M 0x0000008800000000 - 0x0000FFFFFFFFFFFF
[   12.557543] [drm] amdgpu: 32752M of VRAM memory ready
[   12.557549] [drm] amdgpu: 24018M of GTT memory ready.
[   12.557775] amdgpu 0000:03:00.0: amdgpu: failed to init sos firmware
[   12.557777] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
[   12.557916] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <psp> failed -2
[   12.558042] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[   12.558044] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[   12.558047] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[   12.602981] amdgpu: probe of 0000:03:00.0 failed with error -2
[   12.603026] [drm] amdgpu: ttm finalized

So it seems to be partially working, partially not. That "Fatal error during GPU init" is pretty discouraging though... With the way AMD promoted ROCm as being so open, I was really under the impression that I would be able to make this work on Guix, albeit with some work on my end, but you sound skeptical. Do you think it is possible?
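
The firmware failures in the log above can also be picked out mechanically rather than by eye. The following is only a rough sketch in Python: the helper name `firmware_failures` and its regular expression are made up for illustration and are not part of ROCm, Guix, or the kernel, and the pattern only covers the two message shapes that appear in this particular log. The trailing `-2` in the log is the kernel's `-ENOENT` ("No such file or directory"), which would be consistent with the firmware files simply not being present.

```python
import re

# Hypothetical helper, not part of ROCm or Guix: it scans kernel log text
# for amdgpu firmware-load failures shaped like the ones quoted above.
# Assumption: only these two message shapes are of interest.
FIRMWARE_ERROR = re.compile(
    r"Failed to load (?P<block>\w+) firmware"
    r"|failed to init (?P<stage>\w+) firmware"
)


def firmware_failures(log_text):
    """Return firmware names from failure lines, in order, without duplicates."""
    seen = []
    for line in log_text.splitlines():
        # Ignore anything that is not an amdgpu message.
        if "amdgpu" not in line:
            continue
        match = FIRMWARE_ERROR.search(line)
        if match:
            name = match.group("block") or match.group("stage")
            if name not in seen:
                seen.append(name)
    return seen


# A few lines copied from the dmesg output in this message.
sample = """\
[   12.523277] [drm:sdma_v4_0_early_init.cold [amdgpu]] *ERROR* Failed to load sdma firmware!
[   12.533887] amdgpu 0000:03:00.0: amdgpu: MEM ECC is active.
[   12.557775] amdgpu 0000:03:00.0: amdgpu: failed to init sos firmware
[   12.557777] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
"""

print(firmware_failures(sample))  # ['sdma', 'sos', 'psp']
```

On a live system the same function could presumably be fed the text produced by `dmesg | grep amdgpu`; an empty list would suggest that no firmware failure of these two shapes was logged.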

Thanks for your kind response,
Zacchaeus

--0000000000006da66f05dd875c03--