From: Ludovic Courtès
To: David Elsing
Cc: guix-devel@gnu.org, rekado@elephly.net, Romain GARBAGE
Subject: Re: PyTorch with ROCm
Date: Tue, 02 Apr 2024 16:00:34 +0200
Message-ID: <874jcj6f0d.fsf@inria.fr>
In-Reply-To: <7ymsqe9h5l.fsf@posteo.net> (David Elsing's message of "Sun, 31 Mar 2024 22:21:26 +0000")
References: <86msqoeele.fsf@posteo.net> <87y1a2j8v4.fsf@gnu.org> <7ymsqe9h5l.fsf@posteo.net>

Hello!

(Cc’ing my colleague Romain who may work on related things soon.)

David Elsing wrote:

> It is the same as for other HIP/ROCm libraries, so the GPU architectures
> chosen at build time are all available at runtime and automatically
> picked. For reference, the Arch Linux package for PyTorch [1] enables 12
> architectures. I think the architectures which can be chosen at compile
> time also depend on the ROCm version.

Nice.  We’d have to check what the size and build time tradeoff is, but
it makes sense to enable a bunch of architectures.

>>> I'm not sure they can be combined however, as the GPU code is included
>>> in the shared libraries. Thus all dependent packages like
>>> python-pytorch-rocm would need to be built for each architecture as
>>> well, which is a large duplication for the non-GPU parts.
>>
>> Yeah, but maybe that’s OK if we keep the number of supported GPU
>> architectures to a minimum?
>
> If it's no issue for the build farm it would probably be good to include
> a set of default architectures (the officially supported ones?) like you
> suggested, and make it easy to recompile all dependent packages for
> other architectures. Maybe this can be done with a package
> transformation like for '--tune'?. IIRC, building composable-kernel for
> the default architectures with 16 threads exceeded 32 GB of memory
> before I cancelled the build and set it to only architecture.

Yeah, we could think about a transformation option.  Maybe
‘--with-configure-flags=python-pytorch=-DAMDGPU_TARGETS=xyz’ would work,
and if not, we can come up with a specific transformation and/or a
procedure that takes a list of architectures and returns a package.

>>> - Many tests assume a GPU to be present, so they need to be disabled.
>>
>> Yes.  I/we’d like to eventually support that.
>> (There’d need to be some
>> annotation in derivations or packages specifying what hardware is
>> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
>> to honor that.)
>
> That sounds like a good idea, could this also include CPU ISA
> extensions, such as AVX2 and AVX-512?

That’d be great, yes.  Don’t hold your breath though as I/we haven’t
scheduled work on this yet.  If you’re interested in working on it, we
can discuss it of course.

> I think the issue is simply that elf-file? just checks the magic bytes
> and has-elf-header? checks for the entire header. If the former returns
> #t and the latter #f, an error is raised by parse-elf in guix/elf.scm.
> It seems some ROCm (or tensile?) ELF files have another header format.

Uh, never came across such a situation.  What’s so special about those
ELF files?  How are they created?

>> Oh, just noticed your patch bring a lot of things beyond PyTorch itself!
>> I think there’s some overlap with
>> , we
>> should synchronize.

> Ah, I did not see this before, the overlap seems to be tensile,
> roctracer and rocblas. For rocblas, I saw that they set
> "-DAMDGPU_TARGETS=gfx1030;gfx90a", probably for testing?

Could be, we’ll see.

Thanks,
Ludo’.
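
On the elf-file? versus has-elf-header? point quoted above, here is a rough sketch of how a magic-bytes check and a stricter identification check can disagree on the same file. It is plain Python, not the code in guix/elf.scm: the function names, the OS ABI allow-list, and the sample headers are all assumptions made for illustration, and it says nothing about what the ROCm files actually contain.

```python
ELF_MAGIC = b"\x7fELF"

# Hypothetical allow-list for the strict check (an assumption, not taken
# from guix/elf.scm): 0 is ELFOSABI_NONE (System V), 3 is ELFOSABI_GNU.
ACCEPTED_OSABI = {0, 3}


def looks_like_elf(data: bytes) -> bool:
    """Loose check: only the four magic bytes are compared."""
    return data[:4] == ELF_MAGIC


def has_full_elf_ident(data: bytes) -> bool:
    """Stricter check: validate the 16-byte e_ident field as well."""
    if len(data) < 16 or not looks_like_elf(data):
        return False
    ei_class, ei_data, ei_version, ei_osabi = data[4], data[5], data[6], data[7]
    return (
        ei_class in (1, 2)        # ELFCLASS32 or ELFCLASS64
        and ei_data in (1, 2)     # little- or big-endian
        and ei_version == 1       # EV_CURRENT
        and ei_osabi in ACCEPTED_OSABI
    )


def make_ident(osabi: int) -> bytes:
    """Build a 16-byte e_ident: 64-bit, little-endian, given OS ABI."""
    return ELF_MAGIC + bytes([2, 1, 1, osabi]) + bytes(8)


ordinary = make_ident(0)
# Arbitrary OS ABI value outside the allow-list, chosen for illustration.
unusual = make_ident(0x40)

for name, blob in (("ordinary", ordinary), ("unusual", unusual)):
    print(name, looks_like_elf(blob), has_full_elf_ident(blob))
```

The second header passes the loose check and fails the strict one, which is the combination that would make a parser relying on the strict check raise an error after the loose check has let the file through.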