unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Jim Porter <jporterbugs@gmail.com>
To: Daniel Fleischer <danflscr@gmail.com>, Richard Stallman <rms@gnu.org>
Cc: ahyatt@gmail.com, emacs-devel@gnu.org
Subject: Re: [NonGNU ELPA] New package: llm
Date: Sun, 20 Aug 2023 21:48:06 -0700	[thread overview]
Message-ID: <705ab838-142a-b3cc-8cc8-6f4d143c4341@gmail.com> (raw)
In-Reply-To: <m2wmxt38cr.fsf@gmail.com>

On 8/17/2023 10:08 AM, Daniel Fleischer wrote:
> That is not accurate; LLMs can definitely run locally on your machine.
> Models can be downloaded and ran using Python. Here is an LLM released
> under Apache 2 license [0]. There are "black-box" models, served in the
> cloud, but the revolution we're is precisely because many models are
> released freely and can be ran (and trained) locally, even on a laptop.
> 
> [0] https://huggingface.co/mosaicml/mpt-7b

The link says that this model has been pretrained, which is certainly 
useful for the average person who doesn't want (or doesn't have the 
resources) to perform the training themselves, but from the 
documentation, it's not clear how I *would* perform the training myself 
if I were so inclined. (I've only toyed with LLMs, so I'm not an expert 
at more "advanced" cases like this.)

I do see that the documentation mentions the training datasets used, but 
it also says that "great efforts have been taken to clean the 
pretraining data". Am I able to access the cleaned datasets? I looked 
over their blog post[1], but I didn't see anything describing this in 
detail.

While I certainly appreciate the effort people are making to produce 
LLMs that are more open than OpenAI (a low bar), I'm not sure if 
providing several gigabytes of model weights in binary format is really 
providing the *source*. It's true that you can still edit these models 
in a sense by fine-tuning them, but you could say the same thing about a 
project that only provided the generated output from GNU Bison, instead 
of the original input to Bison.

(Just to be clear, I don't mean any of the above to be leading 
questions. I really don't know the answers, and using analogies to 
previous cases like Bison can only get us so far. I truly hope there 
*is* a freedom-respecting way to interface with LLMs, but I also think 
it's worth taking some extra care at the beginning so we can choose the 
right path forward.)

[1] https://www.mosaicml.com/blog/mpt-7b



  parent reply	other threads:[~2023-08-21  4:48 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-07 23:54 [NonGNU ELPA] New package: llm Andrew Hyatt
2023-08-08  5:42 ` Philip Kaludercic
2023-08-08 15:08   ` Spencer Baugh
2023-08-08 15:09   ` Andrew Hyatt
2023-08-09  3:47 ` Richard Stallman
2023-08-09  4:37   ` Andrew Hyatt
2023-08-13  1:43     ` Richard Stallman
2023-08-13  1:43     ` Richard Stallman
2023-08-13  2:11       ` Emanuel Berg
2023-08-15  5:14       ` Andrew Hyatt
2023-08-15 17:12         ` Jim Porter
2023-08-17  2:02           ` Richard Stallman
2023-08-17  2:48             ` Andrew Hyatt
2023-08-19  1:51               ` Richard Stallman
2023-08-19  9:08                 ` Ihor Radchenko
2023-08-21  1:12                   ` Richard Stallman
2023-08-21  8:26                     ` Ihor Radchenko
2023-08-17 17:08             ` Daniel Fleischer
2023-08-19  1:49               ` Richard Stallman
2023-08-19  8:15                 ` Daniel Fleischer
2023-08-21  1:12                   ` Richard Stallman
2023-08-21  4:48               ` Jim Porter [this message]
2023-08-21  5:12                 ` Andrew Hyatt
2023-08-21  6:03                   ` Jim Porter
2023-08-21  6:36                 ` Daniel Fleischer
2023-08-22  1:06                 ` Richard Stallman
2023-08-16  2:30         ` Richard Stallman
2023-08-16  5:11           ` Tomas Hlavaty
2023-08-18  2:10             ` Richard Stallman
2023-08-27  1:07       ` Andrew Hyatt
2023-08-27 13:11         ` Philip Kaludercic
2023-08-28  1:31           ` Richard Stallman
2023-08-28  2:32             ` Andrew Hyatt
2023-08-28  2:59               ` Jim Porter
2023-08-28  4:54                 ` Andrew Hyatt
2023-08-31  2:10                 ` Richard Stallman
2023-08-31  9:06                   ` Ihor Radchenko
2023-09-04  1:27                     ` Richard Stallman
2023-09-06 12:25                       ` Ihor Radchenko
2023-09-04  1:27                     ` Richard Stallman
2023-08-27 18:36         ` Jim Porter
2023-08-28  0:19           ` Andrew Hyatt
2023-09-04  1:27           ` Richard Stallman
2023-09-04  5:18             ` Andrew Hyatt
2023-09-07  1:21               ` Richard Stallman
2023-09-12  4:54                 ` Andrew Hyatt
2023-09-12  9:57                   ` Philip Kaludercic
2023-09-12 15:05                   ` Stefan Kangas
2023-09-19 16:26                     ` Andrew Hyatt
2023-09-19 16:34                       ` Philip Kaludercic
2023-09-19 18:19                         ` Andrew Hyatt
2023-09-04  1:27         ` Richard Stallman
2023-08-09  3:47 ` Richard Stallman
2023-08-09  4:06   ` Andrew Hyatt
2023-08-12  2:44     ` Richard Stallman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=705ab838-142a-b3cc-8cc8-6f4d143c4341@gmail.com \
    --to=jporterbugs@gmail.com \
    --cc=ahyatt@gmail.com \
    --cc=danflscr@gmail.com \
    --cc=emacs-devel@gnu.org \
    --cc=rms@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).