From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Hyatt
Newsgroups: gmane.emacs.devel
Subject: Re: [NonGNU ELPA] New package: llm
Date: Mon, 28 Aug 2023 00:54:49 -0400
References: <87v8d0iqa5.fsf@posteo.net>
To: Jim Porter
Cc: rms@gnu.org, Philip Kaludercic, emacs-devel@gnu.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000002796380603f47f5e"
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:309400

--0000000000002796380603f47f5e
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, Aug 27, 2023 at 10:59 PM Jim Porter wrote:

> On 8/27/2023 7:32 PM, Andrew Hyatt wrote:
> > After following Jim Porter's suggestion above, here is the new function,
> > and you can see the advice we're giving in the docstring:
> >
> > (cl-defgeneric llm-nonfree-message-info (provider)
> >   "If PROVIDER is non-free, return info for a warning.
> > This should be a cons of the name of the LLM, and the URL of the
> > terms of service.
> >
> > If the LLM is free and has no restrictions on use, this should
> > return nil. Since this function already returns nil, there is no
> > need to override it."
> >   (ignore provider)
> >   nil)
>
> For what it's worth, I was thinking about having the default be the
> opposite: warn users by default, since we don't really know if an LLM
> provider is free unless the Elisp code indicates it. (Otherwise, it
> could simply mean the author of that provider forgot to override
> 'llm-nonfree-message-info'.) In other words, assume the worst by
> default. :) That said, if everyone else thinks this isn't an issue, I
> won't stamp my feet about it.
>

I agree that it'd be nice to have that property. That's the way I had it
initially, but you only need the extra info (the name / TOS) when the
provider is non-free, not when it is free, so the design where free was
the default was the simplest. The alternative was two methods: one
indicating whether the provider is free or non-free, and another, for
the non-free case, providing the additional information.

>
> As for the docstring, I see that many models use ordinary software
> licenses, such as the Apache license. That could make it easier for us
> to define the criteria for a libre provider: is the model used by the
> provider available under a license the FSF considers a free software
> license?[1] (For LLM providers that you use by making a web request, we
> could also expect that all the code for their web API is libre too.
> However, that code is comparatively uninteresting, and so long as you
> could get the model to use on a self-hosted system[2], I don't see a
> need to warn the user.)
>

I agree that it'd be nice to define this more clearly, but we can also
just wait until someone proposes a free LLM for inclusion and judge it
then. We can always bring it back to the emacs-devel list if there is
uncertainty.

The hosting code is not that relevant here. For these companies, there
would be restrictions on the use of the model even if there were no
other unfree software in the middle (kind of like how Llama 2 is).
Notably, no company is going to want the user to train competing models
with their model. This is the most common restriction on the freedoms
of the user.

>
> (Also, if you prefer to avoid having to say '(ignore provider)', you can
> also prefix 'provider' with an underscore. That'll make the byte
> compiler happy.)
>

TIL, that's a great tip, thanks!

>
> [1] https://www.gnu.org/licenses/license-list.en.html
>
> [2] At least, in theory. A user might not have enough computing power to
> use the model in practice, but I don't think that matters for this case.
>
--0000000000002796380603f47f5e--
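
For readers of the archive, here is a minimal sketch of how the two
points from the thread might fit together: the default method written
with an underscore-prefixed argument instead of '(ignore provider)', and
a non-free provider overriding it. The struct 'example-hosted-provider',
its display name, and the example.com URL are hypothetical and exist
only for illustration; they are not part of the llm package. The default
shown is the "free unless overridden" design, not the warn-by-default
alternative Jim raised.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Default method from the thread.  The leading underscore marks the
;; argument as intentionally unused, so no (ignore provider) is needed
;; to keep the byte compiler quiet.
(cl-defgeneric llm-nonfree-message-info (_provider)
  "If PROVIDER is non-free, return info for a warning.
This should be a cons of the name of the LLM, and the URL of the
terms of service.

If the LLM is free and has no restrictions on use, this should
return nil.  Since this function already returns nil, there is no
need to override it."
  nil)

;; HYPOTHETICAL provider type, defined here only for illustration.
(cl-defstruct example-hosted-provider)

;; A non-free provider overrides the default and returns
;; (NAME . TERMS-OF-SERVICE-URL).  Both values are placeholders.
(cl-defmethod llm-nonfree-message-info ((_provider example-hosted-provider))
  "Return the placeholder name and terms-of-service URL."
  (cons "Example Hosted LLM" "https://example.com/terms"))
```

A caller could then warn only when the result is non-nil, which is why a
provider author who forgets to override the method produces no warning
under this design.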