From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shane Mulligan
Newsgroups: gmane.emacs.tangents
Subject: Re: Help building Pen.el (GPT for emacs)
Date: Sat, 24 Jul 2021 14:10:43 +1200
References: <83im1948mj.fsf@gnu.org> <83lf642jeh.fsf@gnu.org> <83r1fp1es9.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000c5822105c7d507b0"
To: rms@gnu.org
Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents@gnu.org, Jean Louis
List-Id: Emacs news and miscellaneous discussions outside the scope of other Emacs mailing lists
Xref: news.gmane.io gmane.emacs.tangents:672

--000000000000c5822105c7d507b0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

It's a bit like whitewashing because it's
reconstructing generatively by finding
artificial/contrived associations between
different works that the author had not
intended but may have been part of their
inspiration, and it compresses the
information based on these associations.

It's a bit like running a lossy 'zip' on the
internet and then decompressing
probabilistically.

When run deterministically (set the temperature of GPT to 0), you may actually
see 'snippets' from various places, every time, with the same input generating
the same snippets.

So the source material is important.

What GitHub did was very, very bad but they did it anyway.

That doesn't mean GPT is bad; it just means
they zipped up content they should not have
and created this language 'index' (or 'codex',
as they call it).

What they really should do, if they are honest
people, is train models on subsets of
GitHub code separated by license and release
each model under the same license.

Shane Mulligan

How to contact me:
🇦🇺 00 61 421 641 250
🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
mullikine@gmail.com

On Sat, Jul 24, 2021 at 1:14 PM Richard Stallman <rms@gnu.org> wrote:

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > > That's not what happens with these services: they don't _copy_ code
>   > > from other software (that won't work, because the probability of the
>   > > variables being called by other names is 100%, and thus such code, if
>   > > pasted into your program, will not compile).  What they do, they
>   > > extract ideas and algorithms from those other places, and express them
>   > > in terms of your variables and your data types.  So licenses are not
>   > > relevant here.
>
>   > According to online reviews chunks of code is copied even verbatim and
>   > people find from where. Even if modified, it still requires licensing
>   > compliance.
>
> From what I have read, it seems that the behavior of copilot runs on a
> spectrum from the first description to the second description.  I
> expect that in many cases, nothing copyrightable has been copied, but
> in some cases copilot does copy a substantial amount from a
> copyrighted work.
>
> --
> Dr Richard Stallman (https://stallman.org)
> Chief GNUisance of the GNU Project (https://gnu.org)
> Founder, Free Software Foundation (https://fsf.org)
> Internet Hall-of-Famer (https://internethalloffame.org)

--000000000000c5822105c7d507b0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000c5822105c7d507b0--