From: David Elsing
To: Ludovic Courtès
Cc: guix-devel@gnu.org, rekado@elephly.net, Romain GARBAGE
Subject: Re: PyTorch with ROCm
Date: Wed, 03 Apr 2024 22:21:43 +0000

Hello,

Ludovic Courtès writes:

> Yeah, we could think about a transformation option.
> Maybe ‘--with-configure-flags=python-pytorch=-DAMDGPU_TARGETS=xyz’ would work,
> and if not, we can come up with a specific transformation and/or a
> procedure that takes a list of architectures and returns a package.

I think that would work for python-pytorch itself, but it would need to
be set for all ROCm dependencies as well. It would be good to make sure
that the targets for a package are a subset of the intersection of the
targets specified for its dependencies.

>>>> - Many tests assume a GPU to be present, so they need to be disabled.
>>>
>>> Yes.  I/we’d like to eventually support that.  (There’d need to be some
>>> annotation in derivations or packages specifying what hardware is
>>> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
>>> to honor that.)
>>
>> That sounds like a good idea, could this also include CPU ISA
>> extensions, such as AVX2 and AVX-512?
>
> That’d be great, yes.  Don’t hold your breath though as I/we haven’t
> scheduled work on this yet.  If you’re interested in working on it, we
> can discuss it of course.

I am definitely interested, but am not familiar with Cuirass. Would this
also require support by the build daemon to determine which hardware is
available?

>> I think the issue is simply that elf-file? just checks the magic bytes
>> and has-elf-header? checks for the entire header. If the former returns
>> #t and the latter #f, an error is raised by parse-elf in guix/elf.scm.
>> It seems some ROCm (or tensile?) ELF files have another header format.
>
> Uh, never came across such a situation.  What’s so special about those
> ELF files?  How are they created?

After checking again, I noticed that the error actually only occurs for
rocblas.
:) Here, the problematic ELF files are generated by Tensile [1] and are
installed in lib/rocblas/library (by library/src/CMakeLists.txt, which
calls a CMake function from the Tensile package). They are shared object
libraries for the GPU architecture(s) [2]. Tensile uses
`clang-offload-bundler` (from rocm-toolchain) to create the files, and it
seems to me that the "ELF" header comes from there, but I don't know why
it is special.

Thanks,
David

[1] https://github.com/ROCm/Tensile/blob/17df881bde80fc20f997dfb290f4bb4b0e05a7e9/Tensile/TensileCreateLibrary.py#L283
[2] https://github.com/ROCm/Tensile/wiki/TensileCreateLibrary#code-object-libraries
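
For illustration, the mismatch described above (a file passes a
magic-byte check but fails a check of the whole header) can be sketched
in Python. This is not the code from guix/elf.scm: the function names
are hypothetical, and the conditions in the strict check, in particular
requiring an ABI version byte of zero, are assumptions chosen only to
show how the two checks can disagree. It makes no claim about what the
rocblas files actually contain.

```python
ELF_MAGIC = b"\x7fELF"

# Offsets into e_ident, as defined by the ELF specification.
EI_CLASS, EI_DATA, EI_VERSION, EI_ABIVERSION = 4, 5, 6, 8

# Size in bytes of the ELF header for ELFCLASS32 (1) and ELFCLASS64 (2).
HEADER_SIZE = {1: 52, 2: 64}


def has_elf_magic(data: bytes) -> bool:
    """Loose check: only the four magic bytes are inspected."""
    return data[:4] == ELF_MAGIC


def has_strict_elf_header(data: bytes) -> bool:
    """Stricter check (hypothetical): magic plus several e_ident fields."""
    if len(data) < 16 or not has_elf_magic(data):
        return False
    size = HEADER_SIZE.get(data[EI_CLASS])
    if size is None or len(data) < size:
        return False
    return (
        data[EI_DATA] in (1, 2)          # little or big endian
        and data[EI_VERSION] == 1        # EV_CURRENT
        and data[EI_ABIVERSION] == 0     # assumed strictness, see above
    )


def make_header(abi_version: int) -> bytes:
    """Build a 64-byte, otherwise empty ELFCLASS64 header for testing."""
    ident = ELF_MAGIC + bytes([2, 1, 1, 0, abi_version]) + bytes(7)
    return ident + bytes(64 - len(ident))


if __name__ == "__main__":
    odd = make_header(abi_version=1)
    print(has_elf_magic(odd), has_strict_elf_header(odd))
```

A caller that uses the loose check to decide whether to parse a file,
and then treats a failed strict check as an error, will fail on any file
in the gap between the two predicates; skipping such files instead of
raising would be one way to handle them.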