From: Simon Tournier <zimon.toutoune@gmail.com>
Date: Thu, 6 Apr 2023 16:53:51 +0200
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)
To: Kyle
Cc: guix-devel@gnu.org, Ryan Prior, Nicolas Graves
List-Id: "Development of GNU Guix and the GNU System distribution."
Hi,

On Thu, 6 Apr 2023 at 15:41, Kyle wrote:

> I have only seen situations where the optimization is "too entailed with randomness" when models are trained on proprietary GPUs with specific settings. Otherwise, pseudo-random seeds are perfectly sufficient to remove the indeterminism.

Feel free to pick a real-world model with 15 billion parameters and train it again. And if you succeed, feel free to train it yet again to get bit-to-bit reproducibility. The cost (CPU or GPU power and, in the end, electricity, so real money) would not be negligible, and I am far from convinced that paying this bill is worth it, reproducibility-wise.

> => https://discourse.julialang.org/t/flux-reproducibility-of-gpu-experiments/62092

Haha! I have to laugh, since Julia itself is already not reproducible.
https://issues.guix.gnu.org/22304
https://issues.guix.gnu.org/47354

And upstream does not care much, as you can see:

https://github.com/JuliaLang/julia/issues/25900
https://github.com/JuliaLang/julia/issues/34753

Well, years ago Nicolò made a patch to improve this, but it has not been merged yet.

For instance, some people are trying to build "reproducible" benchmarks of machine learning:

https://benchopt.github.io/

and last time I checked, they were having a good time and a lot of fun. ;-)

Well, I would be less confident than "pseudo-random seeds are perfectly sufficient to remove the indeterminism". :-)

> Many people think that "ultimate" reproducibility is not practical either. It's always going to be easier in the short term to take shortcuts which make conclusions dependent on secret sauce which few can understand.
>
> => https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/

Depending on the size of the model, training it again is not practical. Similarly, the computation behind a weather forecast is not practically reproducible, and no one is ready to put the amount of money on the table to make it so. Instead, people exchange datasets of pressure maps.

Bit-to-bit reproducibility is a means of verifying the correspondence between some claim and what has concretely been done. But it is not the only means. From a scientific-method point of view, it is false to think that everything can be reproduced. Consider the particle physics experiments at the LHC: there, confidence in the result comes not from independent bit-to-bit reproduction but from as much transparency as possible at every stage.

Moreover, what Ludo wrote in that blog post is his own point of view, and I, for one, do not share all of it. Anyway. :-)

For sure, bit-to-bit reproducibility is not an end for trusting a result but one means among many others.
It is possible to have bit-to-bit reproducible results that are wrong, and other results, impossible to reproduce bit-to-bit, that are correct.

Well, back to Julia: since part of Julia is not bit-to-bit reproducible, does that mean that the scientific outputs generated using Julia are not trustworthy?

All that said, if re-computing the weights is affordable because the size of the model is affordable, then yes, for sure, we could try. But from my point of view, re-computing the weights should not be blocking for inclusion. What should be blocking is the license of this data (the weights).

>> From my point of view, pre-trained weights should be considered as the output of a (numerical) experiment, similarly as we include other experimental data (from genomes to astronomy datasets).
>
> I think it's a stretch to consider a data compression as an experiment. In experiments I am always finding mistakes which confuse the interpretation hidden by prematurely compressing data, e.g. by taking inappropriate averages. Don't confuse the actual experimental results with dubious data processing steps.

I do not see where I spoke about data compression. Anyway. :-)

Well, I claim that data processing is an experiment. There is no "actual experiment" versus "data processing"; it is a continuum. Today, any instrument generating data does numerical processing internally. In other words, what you consider your raw inputs is considered output by someone else, so, following the problem recursively, the true original raw material is the physical samples, and that is what we should package, i.e., we should send these physical samples by post and then reproduce everything. Here, I am stretching. ;-)

The genomic references that we have already packaged are also the result of "data processing" that no one is redoing. I do not see any difference between the weights of machine learning models and these genomic references; both are generated data resulting from an experiment (in the broad sense).
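As a toy illustration of the point being debated (my own sketch, not anything from Guix, Julia, or Kyle's setup): fixing a seed does make a pure-CPU pseudo-random computation bit-reproducible, which is the easy part; the hard cases above come from GPU kernels, thread scheduling, and floating-point reduction order, which a seed does not control. A minimal Python sketch with a stand-in "training" loop:

```python
import hashlib
import random
import struct

def train_toy_model(seed, steps=1000):
    """Toy 'training' loop: a seeded random walk standing in for SGD noise."""
    rng = random.Random(seed)          # pseudo-random source, fully seeded
    weight = 0.0
    for _ in range(steps):
        weight += rng.gauss(0.0, 1.0)  # stand-in for a stochastic update
    return weight

def fingerprint(value):
    """Bit-to-bit fingerprint of the result: hash the exact float bytes."""
    return hashlib.sha256(struct.pack("<d", value)).hexdigest()

# Two runs with the same seed are bit-identical on a single CPU thread...
a = train_toy_model(seed=42)
b = train_toy_model(seed=42)
assert fingerprint(a) == fingerprint(b)

# ...while a different seed gives a different result and fingerprint.
c = train_toy_model(seed=7)
assert fingerprint(a) != fingerprint(c)
```

On real hardware the analogue of `train_toy_model` runs thousands of parallel reductions whose summation order is not fixed by any seed, which is why the fingerprints of two "identically seeded" GPU trainings can still differ.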
Cheers,
simon