From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Shane Mulligan Newsgroups: gmane.emacs.tangents Subject: Re: Help building Pen.el (GPT for emacs) Date: Sat, 24 Jul 2021 14:34:40 +1200 Message-ID: References: <83im1948mj.fsf@gnu.org> <83lf642jeh.fsf@gnu.org> <83r1fp1es9.fsf@gnu.org> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="0000000000005d174f05c7d55d23" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="34010"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Eli Zaretskii , Stefan Kangas , emacs-tangents@gnu.org, Jean Louis To: rms@gnu.org Original-X-From: emacs-tangents-bounces+get-emacs-tangents=m.gmane-mx.org@gnu.org Sat Jul 24 04:35:08 2021 Return-path: Envelope-to: get-emacs-tangents@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1m77VM-0008cp-1h for get-emacs-tangents@m.gmane-mx.org; Sat, 24 Jul 2021 04:35:08 +0200 Original-Received: from localhost ([::1]:39392 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m77VK-000304-8A for get-emacs-tangents@m.gmane-mx.org; Fri, 23 Jul 2021 22:35:06 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:58566) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m77V9-0002zw-P2 for emacs-tangents@gnu.org; Fri, 23 Jul 2021 22:34:55 -0400 Original-Received: from mail-yb1-xb2d.google.com ([2607:f8b0:4864:20::b2d]:46967) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1m77V7-00059m-Ml; Fri, 23 Jul 2021 22:34:55 -0400 Original-Received: by mail-yb1-xb2d.google.com with SMTP id k65so558264yba.13; Fri, 23 Jul 2021 19:34:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=TufbYfoj6tofurUxXLlfkGKnaC750iw1jTgUfMgvpRI=; b=PdZAkrJOnIpdaD1L9RPKiLLkbXAYU6NmaNPuGyAmxD7Gn+BjOHOEuOlu8STg+LVNi+ JynkjJie+V/fzNnkQhz9RTy83gZwBEeCvc9YgXs/TS+NcuamBNbboMF9GBumx/B1r4Bw l+AjvXGp8N/Vxd9tRVbhOAZGRfIKJD2+DJ2T8rpBofo0qjFYXvx8nInu6GwHe/yH/utR Q6OFHmHpizyaJaXGG2J0CotkjvfuWsCYpXr3BisQ+lloqMKAraa2ax8ifKuEJm+jbv6j 9yzvpiafzJkS6XOuGeQcPQ5oG/ntB2nRmHVbi7FK04euJRB1Qb8l1aqBRZ9aR/+vcQkH oTZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=TufbYfoj6tofurUxXLlfkGKnaC750iw1jTgUfMgvpRI=; b=uSWq5gT1oTouCFkV2ZHBv5ecPBzh6X+/LYFI/cU/eNb09XGqsm7jGMpfr8vClf7HYo 6MS3ETcDlpRMdlmvTmWsrD7YBAgrbdaE4C44ugS+QDbR3KGnj0aAvJ+DUzm2ziork4Pt 9ibMWL8k5OkCSfb9Gf5VrUnHb7ou4T5KOZFjlDaI2uSxC7K7RpzI4okeRAQ6rZUYturA i2ojfxMSg5FlsEq/Jz2lVjMgZSfHklIyb4Yrf6+FH5ZnEqL9lEzs6ltSlj2MdsNLeTRs y1xekgGEHupT4kDjCeWM6cmUaADyXKT7MNBeKJbPFwPZycLUaxKGG69ia6ABk4Ls/M1w pPlw== X-Gm-Message-State: AOAM5301hx3zOfp7JRD6vQoVNQoPoRMiwXacCAQ3XGfUxnTdgyewYsYP UyjYF8IJvqcPlmOp+l00cDDQ3f1RLs+tQITzK7cQZi+gjRiZTZgcpw== X-Google-Smtp-Source: ABdhPJzQevfPNUhTGWxgqFNngZLp2oE7w7IFMWbZwzoepQl8c6T1Or4FZhomp1Fm8SwBHXKjCR+h9kPJacOdg2FlXmw= X-Received: by 2002:a25:380c:: with SMTP id f12mr10746898yba.208.1627094091895; Fri, 23 Jul 2021 19:34:51 -0700 (PDT) In-Reply-To: Received-SPF: pass client-ip=2607:f8b0:4864:20::b2d; envelope-from=mullikine@gmail.com; helo=mail-yb1-xb2d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-tangents@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Emacs news 
and miscellaneous discussions outside the scope of other Emacs mailing lists List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-tangents-bounces+get-emacs-tangents=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-tangents" Xref: news.gmane.io gmane.emacs.tangents:673 Archived-At: --0000000000005d174f05c7d55d23 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

This is why the technology is a bit like a
personal Google search or Stack Overflow that
you can store offline: it's an index of the
internet that is capable of reconstruction.

But it's not limited to code generation. Codex
is nothing. Emacs + GPT would carve a large
piece out of M$.

Codex is a model trained for the purpose of
generating code, but GPT models will become
abundant for all tasks, including image and
audio synthesis and understanding.

Emacs is a complete operating system.
VSCode is geared towards programming.

Emacs can do infinitely more things with GPT
than VSCode can because it's holistic.

Even the 'eliza' in Emacs can pass the Turing
test with GPT. GPT can run sequences of
commands in Emacs to automate entire workflows
with natural language.

But the future is in collaborative GPT.

The basis/base truth would become versions of
LMs or ontologies.

Right now that's EleutherAI.

Shane Mulligan

How to contact me:
🇦🇺 00 61 421 641 250
🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
mullikine@gmail.com


On Sat, Jul 24, 2021 at 2:10 PM Shane Mulligan wrote:

> It's a bit like whitewashing because it's
> reconstructing generatively by finding
> artificial/contrived associations between
> different works that the author had not
> intended but may have been part of their
> inspiration, and it compresses the
> information based on these associations.
>
> It's a bit like running a lossy 'zip' on the
> internet and then decompressing
> probabilistically.
>
> When run deterministically (set the temperature of GPT to 0), you may
> actually see 'snippets' from various places, every time, with the same
> input generating the same snippets.
>
> So the source material is important.
>
> What GitHub did was very, very bad but they
> did it anyway.
>
> That doesn't mean GPT is bad, it just means
> they zipped up content they should not have
> and created this language 'index' (or 'codex',
> as they call it).
>
> What they really should do, if they are honest
> people, is train separate models on subsets of
> GitHub code grouped by license, and release
> each model under the same license as its
> training code.
>
> On Sat, Jul 24, 2021 at 1:14 PM Richard Stallman wrote:
>
>> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
>> [[[ whether defending the US Constitution against all enemies,     ]]]
>> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>>
>> > > That's not what happens with these services: they don't _copy_ code
>> > > from other software (that won't work, because the probability of the
>> > > variables being called by other names is 100%, and thus such code, if
>> > > pasted into your program, will not compile).  What they do, they
>> > > extract ideas and algorithms from those other places, and express them
>> > > in terms of your variables and your data types.  So licenses are not
>> > > relevant here.
>>
>> > According to online reviews chunks of code is copied even verbatim and
>> > people find from where. Even if modified, it still requires licensing
>> > compliance.
>>
>> From what I have read, it seems that the behavior of copilot runs on a
>> spectrum from the first description to the second description.  I
>> expect that in many cases, nothing copyrightable has been copied, but
>> in some cases copilot does copy a substantial amount from a
>> copyrighted work.
>>
>> --
>> Dr Richard Stallman (https://stallman.org)
>> Chief GNUisance of the GNU Project (https://gnu.org)
>> Founder, Free Software Foundation (https://fsf.org)
>> Internet Hall-of-Famer (https://internethalloffame.org)

--0000000000005d174f05c7d55d23 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
--0000000000005d174f05c7d55d23--