From: Nicolas Graves via "Development of GNU Guix and the GNU System distribution."
Reply-to: Nicolas Graves
To: Ryan Prior, "licensing@fsf.org"
Cc: guix-devel@gnu.org
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)
Date: Mon, 03 Apr 2023 22:48:41 +0200
Message-ID: <87bkk4hety.fsf@ngraves.fr>
List-Id: "Development of GNU Guix and the GNU System distribution."

On 2023-04-03 18:07, Ryan Prior wrote:

> Hi there FSF Licensing! (CC: Guix devel, Nicholas Graves) This morning I read through the FSDG to see if it gives any guidance on when machine learning model weights are appropriate for inclusion in a free system. It does not seem to offer much.
>
> Many ML models are advertising themselves as "open source", including the llama model that Nicholas (quoted below) is interested in including into Guix. However, according to what I can find in Meta's announcement (https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) and the project's documentation (https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) the model itself is not covered by the GPLv3 but rather "a noncommercial license focused on research use cases." I cannot find the full text of this license anywhere in 20 minutes of searching, perhaps others have better ideas how to find it or perhaps the Meta team would provide a copy if we ask.

Just to be precise about llama: what I proposed was to include the C++
port of Facebook's code (llama.cpp, see ticket 62443 on guix-patches),
which itself has a license. The weights themselves indeed do not have a
license. You can only download them through torrents because they were
leaked. For this model in particular, we indeed can't include them in
Guix (also because of their sheer size).

The other case I mentioned, and a more mature one, is VOSK audio
recognition, whose model binaries have an Apache license (you can find
them here: https://alphacephei.com/vosk/models).

> Free systems will see incentive to include trained models in their distributions to support use cases like automatic live transcription of audio, recognition of objects in photos and video, and natural language-driven help and documentation features. I hope we can update the FSDG to help ensure that any such inclusion fully meets the requirements of freedom for all our users.

Thanks for this email and the question about these guidelines, Ryan. I
would be glad to help if I can.

> Cheers,
> Ryan
>
> ------- Original Message -------
> On Monday, April 3rd, 2023 at 4:48 PM, Nicolas Graves via "Development of GNU Guix and the GNU System distribution." wrote:
>
>> Hi Guix!
>>
>> I've recently contributed a few tools that make a few OSS machine
>> learning programs usable for Guix, namely nerd-dictation for dictation
>> and llama-cpp as a conversational bot.
>>
>> In the first case, I would also like to contribute parameters of some
>> localized models so that they can be used more easily through Guix. I've
>> already discussed this subject when submitting these patches, without a
>> clear answer.
>>
>> In the case of nerd-dictation, the model parameters that can be used
>> are listed here: https://alphacephei.com/vosk/models
>>
>> One caveat is that using all these models can take a lot of space on the
>> servers, a burden which is not useful because no build steps are really
>> needed (except an unzip step). In this case, we can use the
>> #:substitutable? #f flag. You can find an example of some of these
>> packages right here:
>> https://git.sr.ht/~ngraves/dotfiles/tree/main/item/packages.scm
>>
>> So my question is: Should we add this type of model in packages for
>> Guix? If yes, where should we put them? In machine-learning.scm? In a
>> new file machine-learning-models.scm (such a file would never need new
>> modules, and it might avoid some confusion between the tools and the
>> parameters needed to use the tools)?
>>
>> --
>> Best regards,
>> Nicolas Graves

-- 
Best regards,
Nicolas Graves