From: Shane Mulligan
Newsgroups: gmane.emacs.tangents
Subject: Re: Help building Pen.el (GPT for emacs)
Date: Sat, 24 Jul 2021 15:14:20 +1200
To: rms@gnu.org
Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents@gnu.org, Jean Louis

Proprietary code from within the M$ ecosystem is uninspired and bad
code by comparison. Open source code is the gold mine, so M$ will not
like being told they cannot use open source to compile Codex. It's a
complete r*pe of open source. GPT is trained on public language, and
language belongs to people generally, not some select group. It's not
meant to be a tool for controlling people. GPT is literally the soul
of a billion people and should be public domain, not feared by GNU but
instead rescued. Sorry for the rhetoric!

On Sat, Jul 24, 2021 at 2:34 PM Shane Mulligan wrote:

> This is why the technology is a bit like a personal Google search or
> Stack Overflow that you can store offline: it's an index of the
> internet that is capable of reconstruction.
>
> But it's not limited to code generation. Codex is nothing. Emacs +
> GPT would carve a large piece out of M$.
>
> Codex is a model trained for the purpose of generating code, but GPT
> models will become abundant for all tasks, including image and audio
> synthesis and understanding.
>
> Emacs is a complete operating system.
> VSCode is geared towards programming.
>
> Emacs can do infinitely more things with GPT than VSCode can because
> it's holistic.
>
> Even the 'eliza' in Emacs can pass the Turing test with GPT. GPT can
> run sequences of commands in Emacs to automate entire workflows with
> natural language.
>
> But the future is in collaborative GPT.
>
> The basis/base truth would become versions of LMs or ontologies.
>
> Right now that's EleutherAI.
>
> Shane Mulligan
>
> How to contact me:
> 🇦🇺 00 61 421 641 250
> 🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
> mullikine@gmail.com
>
>
> On Sat, Jul 24, 2021 at 2:10 PM Shane Mulligan wrote:
>
>> It's a bit like whitewashing because it's reconstructing
>> generatively by finding artificial/contrived associations between
>> different works that the author had not intended but that may have
>> been part of their inspiration, and it compresses the information
>> based on these associations.
>>
>> It's a bit like running a lossy 'zip' on the internet and then
>> decompressing probabilistically.
>>
>> When run deterministically (set the temperature of GPT to 0), you
>> may actually see 'snippets' from various places, every time, with
>> the same input generating the same snippets.
>>
>> So the source material is important.
>>
>> What GitHub did was very, very bad but they did it anyway.
>>
>> That doesn't mean GPT is bad, it just means they zipped up content
>> they should not have and created this language 'index' (or 'codex',
>> as they call it).
>>
>> What they really should do, if they are honest people, is train the
>> model on subsets of GitHub code, separated by license, and release
>> each model under the same license.
>>
>> Shane Mulligan
>>
>>
>> On Sat, Jul 24, 2021 at 1:14 PM Richard Stallman wrote:
>>
>>> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
>>> [[[ whether defending the US Constitution against all enemies,     ]]]
>>> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>>>
>>>   > > That's not what happens with these services: they don't _copy_ code
>>>   > > from other software (that won't work, because the probability of the
>>>   > > variables being called by other names is 100%, and thus such code, if
>>>   > > pasted into your program, will not compile).  What they do, they
>>>   > > extract ideas and algorithms from those other places, and express them
>>>   > > in terms of your variables and your data types.  So licenses are not
>>>   > > relevant here.
>>>
>>>   > According to online reviews chunks of code is copied even verbatim and
>>>   > people find from where. Even if modified, it still requires licensing
>>>   > compliance.
>>>
>>> From what I have read, it seems that the behavior of copilot runs on a
>>> spectrum from the first description to the second description.  I
>>> expect that in many cases, nothing copyrightable has been copied, but
>>> in some cases copilot does copy a substantial amount from a
>>> copyrighted work.
>>>
>>> --
>>> Dr Richard Stallman (https://stallman.org)
>>> Chief GNUisance of the GNU Project (https://gnu.org)
>>> Founder, Free Software Foundation (https://fsf.org)
>>> Internet Hall-of-Famer (https://internethalloffame.org)

-- 
Shane Mulligan
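
The remark in the thread about running GPT deterministically by setting the temperature to 0 can be illustrated with a toy sketch. This is not code from Pen.el, Codex, or any GPT implementation; the function name `sample_token` and the four scores in `logits` are made up for illustration. It only assumes the usual convention that temperature divides the scores before a softmax, and that temperature 0 is treated as always picking the top-scoring token.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from `logits` using temperature-scaled softmax sampling."""
    if temperature == 0:
        # Limit case: always take the highest-scoring token (greedy decoding).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for four candidate tokens; not taken from any real model.
logits = [2.0, 1.5, 0.3, -1.0]

# Temperature 0: the random seed is irrelevant, every run picks the same token.
greedy = [sample_token(logits, 0, random.Random(seed)) for seed in range(5)]

# Temperature 1: the choice depends on the random draw, so runs may differ.
sampled = [sample_token(logits, 1.0, random.Random(seed)) for seed in range(5)]

print("temperature 0:", greedy)
print("temperature 1:", sampled)
```

In this toy setting the temperature-0 runs always return index 0, which is the sense in which the same input keeps producing the same output, while the temperature-1 runs can land on different tokens from one seed to the next.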