Date: Mon, 03 Apr 2023 19:12:01 +0000
From: Kyle
To: "Nicolas Graves via Development of GNU Guix and the GNU System distribution.", guix-devel@gnu.org
Subject: Re: Where should we put machine learning model parameters ?
In-Reply-To: <87jzyshpyr.fsf@ngraves.fr>
Message-ID: <298126E3-0137-4B39-BC48-C284D0464B68@posteo.net>

My view as a statistician and Guix user is that trained machine learning
models should at best be provided as substitutes. They are opaque binary
artifacts of purely digital compilation processes and should not be
treated any differently from other build artifacts.

It seems to me most consistent with the goals of the project to insist
on fully reproducible builds for machine learning models before they are
considered for inclusion in the main Guix distribution.

Full reproducibility would make the space requirements even larger than
shipping just the parameters, but it would ensure that the four freedoms
are preserved.

On April 3, 2023 12:48:12 PM EDT, "Nicolas Graves via Development of GNU Guix and the GNU System distribution." <guix-devel@gnu.org> wrote:
>
>Hi Guix!
>
>I've recently contributed a few tools that make a few OSS machine
>learning programs usable for Guix, namely nerd-dictation for dictation
>and llama-cpp as a conversational bot.
>
>In the first case, I would also like to contribute parameters of some
>localized models so that they can be used more easily through Guix. I've
>already discussed this subject when submitting these patches, without a
>clear answer.
>
>In the case of nerd-dictation, the model parameters that can be used
>are listed here: https://alphacephei.com/vosk/models
>
>One caveat is that using all these models can take a lot of space on the
>servers, a burden which is not useful because no build step is really
>needed (except an unzip step). In this case, we can use the
>#:substitutable? #f flag. You can find an example of some of these
>packages right here:
>https://git.sr.ht/~ngraves/dotfiles/tree/main/item/packages.scm
>
>So my question is: Should we add this type of model in packages for
>Guix? If yes, where should we put them? In machine-learning.scm? In a
>new file machine-learning-models.scm (such a file would never need new
>modules, and it might avoid some confusion between the tools and the
>parameters needed to use the tools)?
>
>
>-- 
>Best regards,
>Nicolas Graves