Date: Tue, 4 Jul 2023 10:05:01 -0300 (BRT)
From: zamfofex
To: Simon Tournier, Ludovic Courtès
Cc: 宋文武, Ryan Prior, Nicolas Graves, guix-devel@gnu.org
Message-ID: <1353752735.686806.1688475901148@privateemail.com>
In-Reply-To: <87wmzh5o5v.fsf@gmail.com>
References: <868rf5e71j.fsf@gmail.com> <87ilcweumh.fsf@envs.net> <87v8gtzvu3.fsf@gmail.com> <87r0r3je82.fsf@gnu.org> <87wn0qrmdx.fsf@gmail.com> <87cz1aum5j.fsf@gnu.org> <87wmzh5o5v.fsf@gmail.com>
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)
List-Id: "Development of GNU Guix and the GNU System distribution."

> On 07/03/2023 6:39 AM -03 Simon Tournier wrote:
>
> Well, I do not see any difference between pre-trained weights and icons
> or sound or good fitted-parameters (e.g., the package
> python-scikit-learn has a lot ;-)). As I said elsewhere, I do not see
> the difference between pre-trained neural network weights and genomic
> references (e.g., the package r-bsgenome-hsapiens-1000genomes-hs37d5).

I feel like, although this might (arguably) not be the case for leela-zero or Lc0 specifically, for certain machine learning projects, a pretrained network can affect the program’s behavior so deeply that it might be considered a program itself! Such networks usually approximate an arbitrary function. The more complex the model is, the more complex the behavior of this function can be, and thus the closer it is to being an arbitrary program.

But this “program” has no source code; it is effectively created in a binary form that is difficult to analyse.

In any case, I feel like the issue Ludovic raised about “user autonomy” is fairly relevant (as I understand it). For icons, images, and other similar kinds of assets, it is easy enough for the user to replace them, or create their own if they want. But for pretrained networks, even if they are under a free license, the user might not be able to easily create their own network that suits their purposes.

For example, for image recognition software, there might be data provided by the maintainers of the program that is able to recognise a specific set of objects in input images, but the user might want to use it to recognise a different kind of object.
If it is too costly for the user to train a new network for their purposes (in terms of hardware and time required), the user is effectively entirely bound by the decisions of the maintainers of the software, and they can’t change it to suit their purposes.

In that sense, there *might* be room for the maintainers to intentionally and maliciously bind the user to the kinds of data they want to provide. And perhaps even more likely (and even more dangerously), when the data is opaque enough, there is room for the maintainers to bias the networks in obscure ways without telling the user. You can imagine this being used in the context of, say, text generation or translation, for the developers to embed a certain opinion they have into the network in order to bias people towards it.

But even when not done maliciously, this can still be limiting to the user if they are unable to easily train their own networks as a replacement.