unofficial mirror of guix-devel@gnu.org 
From: Kyle <kyle@posteo.net>
To: "Nicolas Graves via Development of GNU Guix and the GNU System
	distribution." <guix-devel@gnu.org>,
	guix-devel@gnu.org
Subject: Re: Where should we put machine learning model parameters ?
Date: Mon, 03 Apr 2023 19:12:01 +0000
Message-ID: <298126E3-0137-4B39-BC48-C284D0464B68@posteo.net>
In-Reply-To: <87jzyshpyr.fsf@ngraves.fr>

My view as a statistician and Guix user is that trained machine learning models should, at best, be provided as substitutes. They are opaque binary artifacts of purely digital compilation processes and should not be treated any differently from other build artifacts.

It seems to me most consistent with the goals of the project to require fully reproducible builds of machine learning models before they are considered for inclusion in the main Guix distribution.

Full reproducibility would make the space requirements even larger than for the parameters alone, but it would ensure that the four freedoms are preserved.



On April 3, 2023 12:48:12 PM EDT, "Nicolas Graves via Development of GNU Guix and the GNU System distribution." <guix-devel@gnu.org> wrote:
>
>Hi Guix!
>
>I've recently contributed a few tools that make a few OSS machine
>learning programs usable for Guix, namely nerd-dictation for dictation
>and llama-cpp as a conversational bot.
>
>In the first case, I would also like to contribute parameters of some
>localized models so that they can be used more easily through Guix. I've
>already discussed this subject when submitting these patches, without a
>clear answer.
>
>In the case of nerd-dictation, the model parameters that can be used
>are listed here : https://alphacephei.com/vosk/models
>
>One caveat is that using all these models can take a lot of space on the
>servers, a burden which is not useful because no build step is really
>needed (except an unzip step). In this case, we can use the
>#:substitutable? #f flag. You can find an example of some of these
>packages right here :
>https://git.sr.ht/~ngraves/dotfiles/tree/main/item/packages.scm
>
>So my question is: Should we add this type of model in packages for
>Guix? If yes, where should we put them? In machine-learning.scm? In a
>new file machine-learning-models.scm (such a file would never need new
>modules, and it might avoid some confusion between the tools and the
>parameters needed to use the tools)?
>
>
>-- 
>Best regards,
>Nicolas Graves
>
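
For readers unfamiliar with the `#:substitutable? #f` flag mentioned in the quoted message, here is a minimal sketch of what such a model package might look like. It is not taken from the linked dotfiles: the module name, package name, URL, hash, install path and license are all placeholders, and it assumes that the Guix version in use lets `copy-build-system` accept `#:substitutable?` (as `gnu-build-system` does).

```scheme
;; Hypothetical module and package; every name, URL and hash is a placeholder.
(define-module (my-channel machine-learning-models)
  #:use-module (guix packages)
  #:use-module (guix download)
  #:use-module (guix gexp)
  #:use-module (guix build-system copy)
  #:use-module ((guix licenses) #:prefix license:)
  #:use-module (gnu packages compression))

(define-public vosk-model-example
  (package
    (name "vosk-model-example")
    (version "0.0")
    (source
     (origin
       (method url-fetch)
       ;; Placeholder URL, not a real model archive.
       (uri "https://example.org/models/vosk-model-example-0.0.zip")
       (sha256
        ;; Placeholder hash; replace with the output of `guix hash'.
        (base32 "0000000000000000000000000000000000000000000000000000"))))
    (build-system copy-build-system)
    (arguments
     (list
      ;; Ask that no substitute be offered or fetched for this package:
      ;; the only "build" is unpacking, so users unpack it locally.
      #:substitutable? #f
      ;; Assumed install location, chosen for illustration only.
      #:install-plan #~'(("." "share/vosk-models/example"))))
    ;; unzip is needed to unpack a .zip source.
    (native-inputs (list unzip))
    (home-page "https://alphacephei.com/vosk/models")
    (synopsis "Example speech model parameters (placeholder)")
    (description
     "Placeholder package illustrating how pre-trained model parameters
could be packaged without being offered as substitutes.")
    ;; Placeholder license; check the actual terms of each model.
    (license license:asl2.0)))
```

Note that this only addresses the storage concern raised in the quoted message. It does not make the model reproducible from its training inputs, which is the concern raised in the reply above.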


Thread overview: 3+ messages
2023-04-03 16:48 Where should we put machine learning model parameters ? Nicolas Graves via Development of GNU Guix and the GNU System distribution.
2023-04-03 19:12 ` Kyle [this message]
2023-04-06 18:55 ` how to deal with large dataset? (was Re: Where should we put machine learning model parameters ?) Simon Tournier
