From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zacchaeus Scheffer
Date: Sat, 23 Apr 2022 01:14:22 -0400
Subject: Freeing Machine Learning with ROCm
To: guix-science@gnu.org
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

Hi guix-science,

Basically all computer vision and machine learning research is done on GPUs with PyTorch or TensorFlow. It should be possible to do this with ROCm drivers on a supported AMD GPU, but I'm having trouble using my GPU with ROCm drivers. This seems to be due to problems in the current Guix version, as I was able to use the same GPU fine on a different OS. Since many ROCm packages exist in Guix and I don't see people complaining, I assume it must have worked in the past. While I am interested in helping fix this in the current Guix version (discussed more below), I also think it is important that people be able to use GPUs now. This brings me to my first question:

Has anyone been able to run a ROCm-compatible GPU on a Guix system using ROCm drivers? If so, could you share the files needed to reproduce your setup (channels.scm with a working Guix commit, system.scm, home.scm, manifest.scm, etc.)? Also, if you were able to get PyTorch or TensorFlow to play well with a GPU, info on that would be nice too.

So far, I have tried putting the results of "guix search rocm" (minus procmail) into a manifest (included below) and calling rocminfo (roughly the AMD equivalent of nvidia-smi). This gives me:

> ROCk module is loaded
> Unable to open /dev/kfd read-write: No such file or directory
> <my username here> is member of video group

Maybe there is a missing magic udev rule? I found a thread somewhere (I can't find it now) that suggested rolling back the kernel version. Cross-checking with the ROCm 4.3 install documentation (because the ROCm version in the Guix repo is 4.3), I saw that the supported Ubuntu version had kernel version 5.4.*, so I tried downgrading my kernel by adding:

  (kernel (specification->package "linux-libre@5.4.190"))

to my system.scm, reconfiguring, and rebooting. I also tried adding all the ROCm packages to my system.scm in the same way. In every instance, I tried running rocminfo both as a regular user (with the appropriate groups as indicated by the ROCm documentation) and as root. In all cases, I get the error printed above. While this problem would seem like a good question for upstream ROCm, they officially support only a few OSes, so here I am.

In retrospect, could it be that I can use the card without probing it with rocminfo? It would certainly be nice to be able to check the temperature (especially so I don't have to leave the fan on full blast), among other things, but maybe that isn't strictly necessary for doing machine learning on it?

Any suggestion for how to get closer to running GPU-accelerated (ROCm) PyTorch or TensorFlow on Guix is appreciated.

Thanks,
Zacchaeus

P.S.
The archive for guix-science@gnu.org is pretty sparse. Should I be posting this to bug-guix instead?

Contents of my manifest.scm mentioned above:

  (specifications->manifest
   '("rocm-cmake"
     "rocminfo"
     "rocm-opencl-runtime"
     "rocm-device-libs"
     "rocm-comgr"
     "rocm-bandwidth-test"
     "rocr-runtime"
     "roct-thunk-interface"
     "rocclr"))
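
For readers trying to reproduce the kernel downgrade described in the message, here is a minimal sketch of where that `kernel` line would sit in a system.scm. Only the `kernel` field and the "video" group come from the message. The host name, time zone, locale, devices, and user name are placeholder assumptions, not the poster's actual configuration, and the sketch says nothing about whether the downgrade resolves the /dev/kfd error.

```scheme
;; Sketch only: every value except the `kernel' field is a placeholder.
(use-modules (gnu))

(operating-system
  (host-name "rocm-box")                      ; hypothetical host name
  (timezone "Etc/UTC")                        ; placeholder
  (locale "en_US.utf8")                       ; placeholder

  ;; Pin the 5.4-series kernel, as attempted in the message.
  (kernel (specification->package "linux-libre@5.4.190"))

  (bootloader (bootloader-configuration
                (bootloader grub-bootloader)
                (targets '("/dev/sdX"))))     ; placeholder device

  (file-systems (cons (file-system
                        (mount-point "/")
                        (device "/dev/sdX1")  ; placeholder device
                        (type "ext4"))
                      %base-file-systems))

  ;; The message reports the user being in the "video" group.
  (users (cons (user-account
                 (name "alice")               ; hypothetical user
                 (group "users")
                 (supplementary-groups '("wheel" "video")))
               %base-user-accounts)))
```

After editing the file, the change takes effect through the same steps the message describes: reconfigure the system and reboot into the new kernel.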