Re: Help building Pen.el (GPT for emacs)

unofficial mirror of emacs-tangents@gnu.org

* Re: Help building Pen.el (GPT for emacs)
@ 2021-07-19 17:00                   ` Jean Louis
  2021-07-23  6:51                     ` Shane Mulligan
  0 siblings, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-19 17:00 UTC (permalink / raw)
  To: Shane Mulligan; +Cc: Eli Zaretskii, emacs-tangents, Stefan Kangas, rms

* Shane Mulligan <mullikine@gmail.com> [2021-07-18 11:01]:
> Pen.el stands for Prompt Engineering in emacs.
> Prompt Engineering is the art of describing what you would
> like a language model (transformer) to do. It is a new type of programming,
> example oriented;
> like literate programming, but manifested automatically.

Sounds like a replacement for a programmer's mind. 

> A transformer takes some text (called a prompt) and continues
> it. However, the continuation is the superset of all NLP tasks,

Where is the definition of the abbreviation NLP?

> as the generation can also be a classification, for instance. Those
> NLP tasks extend beyond world languages and into programming
> languages (whatever has been 'indexed' or 'learned') from these
> large LMs.

What is the definition of the abbreviation LM?

> Pen.el is an editing environment for designing 'prompts' to LMs. It
> is better than anything that exists, even at OpenAI or at
> Microsoft. I have been working on it and preparing for this for a
> long time.

Good, but is there a video to show what it really does?

> These prompts are example-based tasks. There are a number of design
> patterns which Pen.el is seeking to encode into a domain-specific
> language called 'examplary' for example-oriented programming.

Do you mean "exemplary" or "examplary"? Is it a spelling mistake?

I have to ask, as your description is still pretty abstract without a
particular example.

> Pen.el creates functions 1:1 for a prompt to an emacs lisp function.

The above does not tell me anything.

> Emacs is Grammarly, Google Translate, Copilot, Stack Overflow and
> infinitely many other services all rolled into one and allows you to
> have a private parallel to all these services that is completely
> private and open source -- that is if you have downloaded the
> EleutherAI model locally.

I understand that it is kind of fetching information, but that does
not solve licensing issues, it sounds like licensing hell.

> ** Response to Jean Louis
> - And I do not think it should be in GNU ELPA due to above reasons.
> 
> I am glad I have forewarned you guys. This is my current goal. Help
> in my project would be appreciated. I cannot do it alone and I
> cannot convince all of you.

Why don't you address the licensing issues? Taking code without proper
licensing compliance is, IMHO, not an option. It sounds like a problem
generator.

> > Why don't you simply make an Emacs package as a .tar, as described in the Emacs
> > Lisp manual?

> Thank you for taking a look at my emacs package. It's not ready yet
> for Melpa merge. I hope that I will be able to find some help in
> order to prepare it, but the rules are very strict and this may not
> happen.

I did not say to put it in Melpa. You can make a package for yourself
and your users, so that they can install it with M-x package-install-file.

That is really not related to any online Emacs package repository. It
is a way to install Emacs packages no matter where one gets them.
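For reference, the Emacs Lisp manual's "Multi-file Packages" section describes the layout meant here: a tar file named NAME-VERSION.tar that unpacks into a directory NAME-VERSION/ holding the Lisp files plus a NAME-pkg.el descriptor. A minimal descriptor might look like the following sketch; the version number, summary and Emacs requirement are placeholders, not Pen.el's actual metadata.

```emacs-lisp
;; pen-0.1/pen-pkg.el  (placeholder version, summary and requirements)
(define-package "pen" "0.1"
  "Prompt Engineering in Emacs"
  '((emacs "27.1")))
```

After `tar -cf pen-0.1.tar pen-0.1/`, the archive can be installed with M-x package-install-file RET pen-0.1.tar RET, without going through any online package archive.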

> > How does that solve the licensing problems?
> The current EleutherAI model which competes with GPT-3 is GPT-Neo.
> It is MIT licensed.

That is good.

But the code that is generated and injected requires proper
attribution.

> Also the data it has been trained on is MIT licensed.

Yes, and then the program should also handle proper attribution
automatically. You cannot just say "MIT licensed"; this has to be
proven, the source has to be found and proper attributions applied.

Why don't you implement proper licensing?

Please find ONE license from the code that is being used as the
database for generating future code, and provide a link to it. Then
show how that license is complied with.

> The current EleutherAI model which competes with Codex is GPT-J.
> It is licensed under the Apache-2.0 License.

That is good, but I am referring to the generated code.

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Help building Pen.el (GPT for emacs)
  2021-07-19 17:00                   ` Help building Pen.el (GPT for emacs) Jean Louis
@ 2021-07-23  6:51                     ` Shane Mulligan
  2021-07-23 10:12                       ` Jean Louis
  2021-07-25  1:06                       ` Richard Stallman
  0 siblings, 2 replies; 46+ messages in thread
From: Shane Mulligan @ 2021-07-23  6:51 UTC (permalink / raw)
  To: Jean Louis; +Cc: Eli Zaretskii, emacs-tangents, Stefan Kangas, rms


Hi Jean and GNU friends,

GPT is potentially the best thing to happen to emacs in a very long time.
It will bring back power from the corporations and save it to your
computer, open source and transparent, and offline.

Please consider including a collaborative, open source prompts repository
in the next version of emacs.

So far I have yet to see anything like it, but I see commercial products
everywhere that have full control over this new type of code.
I am trying to build up relationships in my project Pen.el with others who
value open source. gptprompts.org, for example.
This is to create a catalogue for pen.el. One thing we have just introduced
is a field to specify a licence for each prompt. However, I must say
that prompts are more like functions. *Soft prompts* are very granular
prompts: instead of words, they are short sequences of learned embedding
vectors, found by optimisation.

https://arxiv.org/abs/2104.08691

Therefore, in my opinion, there must be support for prompting at
the syntax level of emacs. It is also clear now that, since a prompt
looks more like binary code, this is a new type of function definition
and a new type of programming is emerging.

A prompt function is a function defined by a
version of a Language Model (LM) and a prompt (input).
As is the case in Haskell, every function may
be reduced to one that takes a single input
and returns a single output. Even so,
most prompt functions will be parameterised
and have an arity greater than one.
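The arity point can be made concrete with a small sketch. The function names are hypothetical and not part of Pen.el, and instead of querying a model the prompt function only returns the prompt text it would send. It shows a two-argument prompt function, the same function curried into one-argument steps in the Haskell style, and partial application with Emacs's built-in `apply-partially`.

```emacs-lisp
;;; -*- lexical-binding: t -*-

(defun my-translate-prompt (language text)
  "Hypothetical two-argument prompt function.
A real one would send this text to a language model; this sketch
only returns the prompt itself."
  (format "Translate from %s to English.\n%s: %s\nEnglish:"
          language language text))

(defun my-curry-2 (f)
  "Return the two-argument function F as a chain of one-argument functions."
  (lambda (a) (lambda (b) (funcall f a b))))

;; Haskell style: supply one input at a time.
(funcall (funcall (my-curry-2 #'my-translate-prompt) "French") "Bonjour")

;; Partial application: fix the first argument, leaving a one-argument function.
(defalias 'my-translate-french
  (apply-partially #'my-translate-prompt "French"))
(my-translate-french "Bonjour")
```

Both calls build the same prompt; currying only changes how the arguments are supplied, not how many the prompt needs.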

I am building a collaborative imaginary
programming environment in emacs. This is an
editing environment where people can integrate
LMs into emacs, extending emacs with prompt
functions. The power of this is profound and beyond belief.
I have coined the term "prompt functions", so don't expect to be able to
find it online if you go searching.

Here is a new corporation which is creating a prompt engineering
environment.
However, they do not have their own operating system to integrate prompting
into. That's why emacs is years ahead, potentially.
A prompt is merely a function with a language model as a parameter. Without
integration, it's quite useless.

https://gpt3demo.com/apps/mantiumai

I think a prompts database -- something like
Datomic or other RDF-like, immutable storage --
should be added to the GNU organisation to store
selected prompts and generations, and that a GPL or EleutherAI GPT model
should ultimately be integrated into core emacs via
some low-level syntax through partnership with EleutherAI.
I would expect in the future to download emacs
along with an open-source GPT model, and be
able to create prompt functions as easily as
creating macros.

A 1:1 prompt:function database of sorts is a
good starting point in my opinion, but
remembering the generations is also important.
But the scale is immense. This is why a p2p
database that can remember immutably is
important, in my opinion. If this seems too
grand in scale, then at the very least
consider a GNU prompts repository.
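A minimal sketch of the "remember the generations" idea, assuming an in-memory hash table as a stand-in for the immutable p2p database described above. The key follows the definition of a prompt function given earlier: model, model version, prompt and arguments. All names are hypothetical, and GENERATE is any function standing in for a real language model call. Because real models sample, caching like this pins the first generation for a given key.

```emacs-lisp
;;; -*- lexical-binding: t -*-

(defvar my-prompt-generations (make-hash-table :test #'equal)
  "Hypothetical store of generations keyed by (MODEL VERSION PROMPT ARGS).")

(defun my-prompt-call (model version prompt args generate)
  "Return the remembered generation for MODEL, VERSION, PROMPT and ARGS.
If there is none yet, call GENERATE with PROMPT and ARGS, store the
result and return it.  GENERATE stands in for a language model."
  (let ((key (list model version prompt args)))
    (or (gethash key my-prompt-generations)
        ;; `puthash' returns the value it stores.
        (puthash key (funcall generate prompt args) my-prompt-generations))))
```

Calling it twice with the same key runs GENERATE only once; changing the model or its version produces a separate entry.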

> Sounds like a replacement for a programmer's mind.
Yes it is. It trivialises the implementation and requires that programmers
now be more imaginative, and they will be supported by the language model.
Rather than writing an implementation, a function is defined by the
input types, a Language Model and the version of that language model.

> Where is definition of the abbreviation NLP?
NLP stands for Natural Language Processing. Until recently, code was not
considered part of that domain, but the truth is NLP algorithms are
extremely useful for code generation, code search and code understanding.

> What is definition of the abbreviation LM?
LM stands for Language Model. It is a statistical model of language, rather
than one built on formal grammars. Emacs Lisp functions and macros do not have a
syntax for stochastic/probabilistic programming.
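To illustrate what "a statistical model of language" means, here is a toy bigram model: it counts which word follows which in a training text, then continues a prompt by repeatedly choosing the most frequent successor. It is only an illustration of the idea; a transformer such as GPT-Neo learns a far richer distribution over tokens and samples from it rather than choosing greedily. The function names are hypothetical.

```emacs-lisp
;;; -*- lexical-binding: t -*-

(defun my-bigram-train (text)
  "Count which word follows which in TEXT.
Return a hash table mapping each word to an alist of (NEXT . COUNT)."
  (let ((table (make-hash-table :test #'equal))
        (words (split-string text)))
    (while (cdr words)
      (let* ((word (car words))
             (next (cadr words))
             (counts (gethash word table))
             (cell (assoc next counts)))
        (if cell
            (setcdr cell (1+ (cdr cell)))       ; seen before: bump its count
          (puthash word (cons (cons next 1) counts) table)))
      (setq words (cdr words)))
    table))

(defun my-bigram-continue (table prompt n)
  "Continue PROMPT by up to N words using the bigram counts in TABLE.
Each step greedily picks the most frequent successor of the last word;
a real language model would sample from a probability distribution."
  (let* ((words (split-string prompt))
         (word (car (last words)))
         (generated '()))
    (dotimes (_ n)
      (let ((counts (gethash word table)))
        (when counts
          ;; Sort a copy so the counts stored in TABLE are not destroyed.
          (setq word (car (car (sort (copy-sequence counts)
                                     (lambda (a b) (> (cdr a) (cdr b)))))))
          (push word generated))))
    (mapconcat #'identity (append words (nreverse generated)) " ")))
```

For example, trained on "the cat sat on the mat and the cat sat down", continuing "the" by two words yields "the cat sat", because "cat" follows "the" twice and "sat" follows "cat" twice.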

> Good, but is there a video to show what it really does?

Here is an online catalogue of GPT tools. Pen.el is among the developer
tools.
https://gpt3demo.com/category/developer-tools

=Pen.el= and emacs have the potential to do everything that the
products in =gpt3demo.com= do.

I would like to demonstrate Pen.el with this particular video, which I
have created to demonstrate a new type of programming -- collaborative
within a language model.
https://mullikine.github.io/posts/caching-and-saving-results-of-prompt-functions-in-pen-el/
https://asciinema.org/a/MhOU0eMnJsRpXf2Ak9YStPlz8

> Do you mean "exemplary" or "examplary", is it spelling mistake?
I am building a DSL for encoding prompt design
patterns to generate prompt functions for
emacs.

http://github.com/semiosis/pen.el/blob/master/src/pen-examplary.el

> Pen.el creates functions 1:1 for a prompt to an emacs lisp function.

What this means is that a prompt may be
parameterized to define a relation (i.e. a
function), and therefore code; I have chosen
to create one parameterized function per prompt.

The prompt text, once associated with an LM,
becomes a type of query (i.e. code), so
prompts should not be considered anything
less than code, and they qualify for the GPL3 license.

> I understand that it is kind of fetching information, but that does not
> solve licensing issues, it sounds like licensing hell.
This is exactly why a GPL LM or compatible LM
is absolutely crucial and needs to be
integrated, otherwise all imaginary code will
be violating and harvesting open source for
the foreseeable future as there is no
alternative.

Sincerely,
Shane

.

Shane Mulligan

How to contact me:
🇦🇺 00 61 421 641 250
🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
mullikine@gmail.com

On Tue, Jul 20, 2021 at 5:04 AM Jean Louis <bugs@gnu.support> wrote:

> [...]


* Re: Help building Pen.el (GPT for emacs)
  2021-07-23  6:51                     ` Shane Mulligan
@ 2021-07-23 10:12                       ` Jean Louis
  2021-07-23 10:54                         ` Eli Zaretskii
  2021-07-25  1:06                       ` Richard Stallman
  1 sibling, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-23 10:12 UTC (permalink / raw)
  To: Shane Mulligan; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, rms

* Shane Mulligan <mullikine@gmail.com> [2021-07-23 10:07]:
> Hi Jean and GNU friends,
> 
> GPT is potentially the best thing to happen to emacs in a very long time.
> It will bring back power from the corporations and save it to your
> computer, open source and transparent, and offline.

Description is too abstract.

Show me how it brought power from corporations and saved it to your
computer?

The term "open source" is vague, you know? So I do not know what is
meant by it. There is no clear licensing that you are proposing, and
so far you have not explained how licensing is solved. If you have not
read the licenses, please let me know, as then you most probably do not
know what I am referring to.

Example:

You are using artificial intelligence, but in fact pieces of codes are
in chunks copied from other sources without attribution and without
knowing if licenses are compatible. If that issue is not solved, I do
not see why anybody serious would use AI to create code, as that would
potentially generate so many legal problems. And it does so now. Many
people gave up on GitHub because GitHub does not comply with licenses
when using Copilot.

You are more or less proposing to bring the same conflict to Emacs, and
I do not see where your solution is.

> I understand that it is kind of fetching information, but that does
> not solve licensing issues, it sounds like licensing hell.  This is
> exactly why a GPL LM or compatible LM is absolutely crucial and
> needs to be integrated, otherwise all imaginary code will be
> violating and harvesting open source for the foreseeable future as
> there is no alternative.

So how?

That should come first, as one cannot even experiment in
public without it. Experimenting at home is fine, but as soon as
anything is published without compliance with licenses, it
generates problems.

-- 
Jean



* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 10:12                       ` Jean Louis
@ 2021-07-23 10:54                         ` Eli Zaretskii
  2021-07-23 11:32                           ` Jean Louis
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-23 10:54 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, emacs-tangents, mullikine, rms

> Date: Fri, 23 Jul 2021 13:12:12 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: Eli Zaretskii <eliz@gnu.org>, Stefan Kangas <stefan@marxist.se>,
>  emacs-tangents@gnu.org, rms@gnu.org
> 
> You are using artificial intelligence, but in fact pieces of codes are
> in chunks copied from other sources without attribution and without
> knowing if licenses are compatible.

That's not what happens with these services: they don't _copy_ code
from other software (that won't work, because the probability of the
variables being called by other names is 100%, and thus such code, if
pasted into your program, will not compile).  What they do, they
extract ideas and algorithms from those other places, and express them
in terms of your variables and your data types.  So licenses are not
relevant here.


* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 10:54                         ` Eli Zaretskii
@ 2021-07-23 11:32                           ` Jean Louis
  2021-07-23 11:51                             ` Eli Zaretskii
  2021-07-24  1:14                             ` Richard Stallman
  0 siblings, 2 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-23 11:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-23 13:55]:
> > Date: Fri, 23 Jul 2021 13:12:12 +0300
> > From: Jean Louis <bugs@gnu.support>
> > Cc: Eli Zaretskii <eliz@gnu.org>, Stefan Kangas <stefan@marxist.se>,
> >  emacs-tangents@gnu.org, rms@gnu.org
> > 
> > You are using artificial intelligence, but in fact pieces of codes are
> > in chunks copied from other sources without attribution and without
> > knowing if licenses are compatible.
> 
> That's not what happens with these services: they don't _copy_ code
> from other software (that won't work, because the probability of the
> variables being called by other names is 100%, and thus such code, if
> pasted into your program, will not compile).  What they do, they
> extract ideas and algorithms from those other places, and express them
> in terms of your variables and your data types.  So licenses are not
> relevant here.

According to online reviews, chunks of code are copied, even verbatim, and
people have found where they come from. Even if modified, it still requires
licensing compliance.

Whether code compiles or not is irrelevant. Whether one runs it or not is also
irrelevant; code need not even run.

I do not believe that any of the AIs so far "extract ideas". I have never
heard of it. Which algorithm on this planet can extract an
idea?

I also do not believe that AI extract algorithms, though AI has its
own algorithms to generate the relevant possible code.

If newly generated code is a modification of other code, which we know
now that it is, and is based on it, then that requires licensing
attributions.

That licenses are relevant one can see from online discussions related
to Github Copilot:

https://fossa.com/blog/analyzing-legal-implications-github-copilot/

https://medium.com/geekculture/githubs-ai-copilot-might-get-you-sued-if-you-use-it-c1cade1ea229

https://fosspost.org/github-copilot/

https://artificialintelligence-news.com/2021/07/06/experts-debate-github-latest-ai-tool-violates-copyright-law/

https://www.opensourceforu.com/2021/07/github-copilot-sparks-debates-around-open-source-licenses/

That is just a small excerpt from a very large debate online related to
licensing issues.

Jean



* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 11:32                           ` Jean Louis
@ 2021-07-23 11:51                             ` Eli Zaretskii
  2021-07-23 12:47                               ` Jean Louis
  2021-07-24  1:14                             ` Richard Stallman
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-23 11:51 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, emacs-tangents, mullikine, rms

> Date: Fri, 23 Jul 2021 14:32:00 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: mullikine@gmail.com, stefan@marxist.se, emacs-tangents@gnu.org,
>   rms@gnu.org
> 
> > That's not what happens with these services: they don't _copy_ code
> > from other software (that won't work, because the probability of the
> > variables being called by other names is 100%, and thus such code, if
> > pasted into your program, will not compile).  What they do, they
> > extract ideas and algorithms from those other places, and express them
> > in terms of your variables and your data types.  So licenses are not
> > relevant here.
> 
> According to online reviews, chunks of code are copied, even verbatim, and
> people have found where they come from.

That cannot be true.  It is nonsense to copy unrelated code into a
program and tell people this is what they should use.

> Whether code compiles or not is irrelevant. Whether one runs it or not is also
> irrelevant; code need not even run.

A feature or service that is based on this idea will never fly,
believe me.  Which programmer would want to have code pasted into his/her
program that would cause compilation errors or, worse, break it at run
time?

> I do not believe that any of the AIs so far "extract ideas". I have never
> heard of it. Which algorithm on this planet can extract an
> idea?

That's a very general question, it is impossible to answer it in a
post to a mailing list.  If you are really interested, you will have
to read up on that.  But you are wrong in your beliefs.

> If newly generated code is a modification of other code, which we know
> now that it is, and is based on it, then that requires licensing
> attributions.

Once again, your assumptions are all wrong, so your conclusions are
also wrong.  Why not try one of these services and see what they
actually do, before you pass your (quite harsh) judgment on them, and
on the modern state of AI in general?

> That licenses are relevant one can see from online discussions related
> to Github Copilot:

That people ask these questions and discuss this doesn't mean the
problem is real.  Many people don't really understand what copyright
means and how to apply it to program code.  People also ask questions
about the GPL, and there's a vociferous group of people who think the
copyright assignment of code to the FSF means you give up all your
rights in the code you've written.  None of that is true, but still
the rumors and the heated discussions go on and on.  Their existence
proves nothing, except that some people misunderstand something.


* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 11:51                             ` Eli Zaretskii
@ 2021-07-23 12:47                               ` Jean Louis
  2021-07-23 13:39                                 ` Shane Mulligan
  2021-07-23 19:33                                 ` Eli Zaretskii
  0 siblings, 2 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-23 12:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-23 14:51]:
> > According to online reviews, chunks of code are copied, even verbatim, and
> > people have found where they come from.
> 
> That cannot be true.  It is nonsense to copy unrelated code into a
> program and tell people this is what they should use.

I wonder how sure you are of that; did you do the online research? It
is not about related or unrelated; I do believe that AI finds and
generates related code. But

Here are references disputing how "it cannot be true":

https://hacker-news.news/post/27710287

https://mmacvicar.medium.com/it-is-best-if-copilot-copies-everything-d84506128e5a

https://loudlabs.nl/news/githubs-commercial-ai-tool-was-built-from-open-source-code/

> > Whether code compiles or not is irrelevant. Whether one runs it or not is also
> > irrelevant; code need not even run.
> 
> A feature or service that is based on this idea will never fly,
> believe me.  Which programmer would want to have code pasted into his/her
> program that would cause compilation errors or, worse, break it at run
> time?

Of course people want code to run. It is just that copyright law does
not deal with technical functionality. It is irrelevant whether a program
works or not. There are thousands of copyrighted programs that can no
longer work because the devices are not on the market any more; they are
still under copyright.

> > I do not believe that any of the AIs so far "extract ideas". I have never
> > heard of it. Which algorithm on this planet can extract an
> > idea?
> 
> That's a very general question, it is impossible to answer it in a
> post to a mailing list.  If you are really interested, you will have
> to read up on that.  But you are wrong in your beliefs.
> 
> > If newly generated code is a modification of other code, which we know
> > now that it is, and is based on it, then that requires licensing
> > attributions.
> 
> Once again, your assumptions are all wrong, so your conclusions are
> also wrong.  Why not try one of these services and see what they
> actually do, before you pass your (quite harsh) judgment on them, and
> on the modern state of AI in general?

I hear you saying that I am wrong and my conclusions are wrong, though I
gave you enough references to research on the Internet, which show that
there are possible serious licensing problems with such generated
code.

> > That licenses are relevant one can see from online discussions related
> > to Github Copilot:
> 
> That people ask these questions and discuss this doesn't mean the
> problem is real.  Many people don't really understand what copyright
> means and how to apply it to program code.

Well said! Though that is not relevant.

Question is very particular, specific and concrete:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

How do Pen.el and the background AI services ensure licensing
compliance?

I would appreciate it if you found a solution to that, or stayed on that
subject, as whether I am wrong or right is not relevant; what I wish is to
have assurance that it is free software. Prove me wrong by providing
exact references, not only to one country's law but also to other
countries' laws that make it legal, or show how otherwise the
legality of such code is justified and how users may get free
software.

For example, you may wish to mention "fair use", but then
similar laws must be found in other countries that would justify it
being free software.

As long as you don't tackle those subjects there is no legal solution
for Pen.el and background AI to be used with assurance that software
is truly free software.

Jean



* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 12:47                               ` Jean Louis
@ 2021-07-23 13:39                                 ` Shane Mulligan
  2021-07-23 14:39                                   ` Jean Louis
  2021-07-26  0:16                                   ` Richard Stallman
  2021-07-23 19:33                                 ` Eli Zaretskii
  1 sibling, 2 replies; 46+ messages in thread
From: Shane Mulligan @ 2021-07-23 13:39 UTC (permalink / raw)
  To: Jean Louis; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, rms


Hi Jean, Eli, GNU,

> "open source"
I am referring to free software in the spirit
of GNU. Free as in freedom, from oppression,
from an attack against creative and cognitive
intelligence.

> GPT is potentially the best thing to happen to emacs in a very long time.
> It will bring back power from the corporations and save it to your
> computer, open source and transparent, and offline.

The way this will work is you will download
the free GPT model, such as GPT-J, GPT-Neo or
GPT-NeoX, and then you will have an offline and
private alternative to many things previously
you would go online for.

I have been working on demonstrations for 5 months,
and this whole time I have kept you informed by email,
using specific demonstrations. I've even hand-picked
some for you.

Now you ask again and I'll give you another, but
you are missing the point by
focusing on one example when the possibilities
are infinite.

If I was a computing pioneer and I had to
convince you of the importance of AI and all I
had was the lambda calculus, would you see it?
And you ask me for another cherry-picked demo,
but it is very much beyond your current
understanding.

It's so much beyond what you believe is
possible that you ask for an example. I have
shown you 10+ already.

GPT turns emacs into something very powerful
beyond your current comprehension. It's so
profound that it will replace many of the
online and offline services you may have come
to take for granted. It goes way beyond that too.

This is the telos and purpose of emacs. It
can save free software by absorbing GPT.

GPT is not a toy.

Here is another demo.
The below instructions were given to me by the
tutor in Pen.el when I asked it for help.

    There are two ways to quit Emacs, the hard way, and the easy way.
    In the hard way, you type M-x kill-emacs, and press enter.
    In the easy way, you press C-x C-c.

The following is a prompt that created this interactive function.

#+BEGIN_SRC yaml
  prompt: |
    This is a conversation between a human and a brilliant AI.
    The topic is "<2>".

    Human: Hello, are you my <1> tutor?
    ###
    AI: Hi there.
    Yes I am.
    How can I help you?
    ###
    Human: Thanks. I have a question. <2>
    ###
    AI: I would be happy to answer your question.
#+END_SRC
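A sketch of how a prompt like the one above could become an Emacs Lisp command, assuming that <1>, <2>, ... are positional placeholders filled from the function's arguments (here <1> is the subject and <2> the question). The macro and function names are hypothetical, not Pen.el's own API, and `identity` stands in for the language model, so the generated command simply returns the filled prompt that a real backend such as GPT-J would continue.

```emacs-lisp
;;; -*- lexical-binding: t -*-

(defun my-pen-fill-template (template args)
  "Replace each <N> in TEMPLATE with the Nth element of ARGS (1-based).
Placeholders without a matching argument are left untouched."
  (replace-regexp-in-string
   "<\\([0-9]+\\)>"
   (lambda (match)
     ;; The match data here refers to MATCH itself, so group 1 is the digits.
     (let ((n (string-to-number (match-string 1 match))))
       (or (and (> n 0) (nth (1- n) args)) match)))
   template t t))

(defmacro my-define-prompt-function (name arglist template lm)
  "Define NAME as an interactive command taking ARGLIST.
It fills TEMPLATE with the arguments, in order, and passes the result
to LM, a function from a prompt string to generated text."
  `(defun ,name ,arglist
     (interactive
      ,(mapconcat (lambda (arg) (format "s%s: " arg)) arglist "\n"))
     (let ((result (funcall ,lm (my-pen-fill-template
                                 ,template (list ,@arglist)))))
       (when (called-interactively-p 'any)
         (message "%s" result))
       result)))

(defconst my-pen-tutor-template
  "This is a conversation between a human and a brilliant AI.
The topic is \"<2>\".

H: Hello, are you my <1> tutor?
###
AI: Hi there.
Yes I am.
How can I help you?
###
Human: Thanks. I have a question. <2>
###
AI: I would be happy to answer your question."
  "The tutor prompt above; <1> is the subject and <2> the question.")

;; One function per prompt; `identity' is a placeholder for a real model.
(my-define-prompt-function my-pen-tutor (subject question)
  my-pen-tutor-template #'identity)
```

With this, `(my-pen-tutor "emacs" "How do I quit emacs?")` returns the filled prompt, and M-x my-pen-tutor asks for the subject and the question in the minibuffer.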

Here is the recording of me doing that:

https://asciinema.org/a/SCUhm3l11N3w5eilUfewBDCiP

In the future the core prompts will be only a
few bytes and most prompts we use will be
defined using types alone, or subjective to
previously executed prompt functions,
collaboratively.

A prompt may be defined by type names alone,
plus the version of a LM; the rest is inferred
or subjective to peer to peer prompts:

#+BEGIN_SRC emacs-lisp
  (defprompt ("short lines of code" "regex"))
#+END_SRC
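`defprompt` is not shown in full here, so the following is only a hypothetical Python sketch of what "defined by type names alone, plus the version of a LM" could mean: a prompt object that stores an input type, an output type and a model name, and renders a bare prompt from them. The class name, the rendered wording and the default model string are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedPrompt:
    """Hypothetical prompt described only by its input and output type names."""
    input_type: str
    output_type: str
    model: str = "GPT-J"  # assumed: the LM version the prompt is pinned to

    def render(self, value):
        # Only the type names are spelled out; the LM must infer the rest.
        return (f"Convert {self.input_type} to {self.output_type}.\n\n"
                f"{self.input_type}:\n{value}\n\n"
                f"{self.output_type}:\n")


# A rough analogue of (defprompt ("short lines of code" "regex")).
code_to_regex = TypedPrompt("short lines of code", "regex")
text = code_to_regex.render('if line.startswith("#"): skip()')
```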

You haven't yet understood the profundity of GPT,
and you are doing a great disservice to free software by
stifling imaginary programming inside emacs.

Eli is right in that these LMs are not copy
pasting. They are indeed generating from a
statistical model. I know many people who are
creating their own models on themselves. There
will be infinitely many of these models in the
future. GNU needs one. LMs may be trained on
subsets of the code from GitHub to ensure
there is no possibility of derivative works
from code under conflicting licenses.
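One way to read "trained on subsets of the code" is to bucket a corpus by license before training, so that each model only sees code under a single license. Below is a minimal Python sketch. It assumes each file declares its license with an `SPDX-License-Identifier:` tag. Many files have no tag; those go into an `UNKNOWN` bucket that would be left out of training. The sketch reads only the first identifier, so compound expressions such as `MIT OR Apache-2.0` are not handled.

```python
import re
from collections import defaultdict

# Matches the first identifier after an SPDX tag, e.g. "GPL-3.0-or-later".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def partition_by_license(files):
    """Group (path, source) pairs by the SPDX identifier found in each file."""
    buckets = defaultdict(list)
    for path, source in files:
        match = SPDX_RE.search(source)
        license_id = match.group(1) if match else "UNKNOWN"
        buckets[license_id].append(path)
    return dict(buckets)
```

Each bucket other than `UNKNOWN` could then feed a separate training run, and the resulting model would be released under that bucket's license.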

> You are more or less proposing the same conflict to come to Emacs and
> I did not see where is your solution?
Emacs is dead without GPT. That's why I raised the issue.
It's dead because it can be imagined by LMs
and will lose its power. It'll be just another imagined environment.
Software is changing and Emacs can't miss out
because it represents freedom.

> So how?
As I have said countless times in the past, and
as is clearly visible from their website,
EleutherAI has MIT- and Apache-licensed
alternative models which you can freely
download.

Here is a completer you can freely use:
https://github.com/samrawal/emacs-secondmate

This will not replace everything from =gpt3demo.com=.

You're failing to see the full picture here.
It's absolutely vital for emacs' survival to
have GPT incorporated. Make it happen.

> As soon as anything is published in public
> without compliance to licenses it generates
> problems.
Prompts are licensed entirely at the discretion
of the person who created them, even if they are
queries to GPT-3.
Just as I can write SQL for Microsoft SQL
Server and license my SQL under GPL3.
Prompts are code and I'm talking about a new
type of programming that is nonexistent in
emacs.

My suggestions:
- Create a prompts database
  - So people can collaborate on open source prompts
  - So people can extend emacs with language models
  - So people know it's ok to use their imagination and emacs supports creative intelligence
- Integrate prompt functions into emacs somehow
  - defprompt
- Optionally ship GPT-neo and GPT-j with emacs
- Consider creating a prompting server
- Consider a database for saving generations
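As a concrete starting point for the "prompts database" and "database for saving generations" items, here is a minimal Python sketch using the standard `sqlite3` module. The schema is an assumption for illustration, not something Pen.el defines: a `prompts` table with a per-prompt license column, and a `generations` table recording which model produced which output.

```python
import sqlite3

# Assumed schema: prompts carry their own license; generations record
# which prompt and which model produced each output.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prompts (
    id       INTEGER PRIMARY KEY,
    name     TEXT UNIQUE NOT NULL,
    template TEXT NOT NULL,
    license  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS generations (
    id        INTEGER PRIMARY KEY,
    prompt_id INTEGER NOT NULL REFERENCES prompts(id),
    model     TEXT NOT NULL,
    input     TEXT NOT NULL,
    output    TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_prompt(conn, name, template, license_id):
    """Store a shareable prompt together with its license; return its id."""
    cur = conn.execute(
        "INSERT INTO prompts (name, template, license) VALUES (?, ?, ?)",
        (name, template, license_id),
    )
    conn.commit()
    return cur.lastrowid


def save_generation(conn, prompt_id, model, input_text, output_text):
    """Record one completion so it can be reviewed or reused later."""
    conn.execute(
        "INSERT INTO generations (prompt_id, model, input, output) VALUES (?, ?, ?, ?)",
        (prompt_id, model, input_text, output_text),
    )
    conn.commit()


def generations_for(conn, name):
    """Return (model, input, output) rows for the named prompt, oldest first."""
    return conn.execute(
        "SELECT g.model, g.input, g.output FROM generations AS g "
        "JOIN prompts AS p ON p.id = g.prompt_id "
        "WHERE p.name = ? ORDER BY g.id",
        (name,),
    ).fetchall()
```

A shared, collaborative version would need hosting and synchronisation on top of this. The sketch only shows the local data model.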

=Pen.el= is GPL3. There's nothing wrong with
typing on a keyboard so it's fully compliant
with licensing.

=Pen.el= allows you to select the completion
engine and you may use a libre completion
engine such as GPT-j, GPT-neo or GPT-neox.

> "Prove me wrong"
Do me a favour and do some research yourself.
I have too much to do.

Sincerely,
Shane

Shane Mulligan

How to contact me:
🇦🇺 00 61 421 641 250
🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
mullikine@gmail.com


On Sat, Jul 24, 2021 at 12:50 AM Jean Louis <bugs@gnu.support> wrote:

> * Eli Zaretskii <eliz@gnu.org> [2021-07-23 14:51]:
> > > According to online reviews, chunks of code are copied, even verbatim,
> > > and people have found where they come from.
> >
> > That cannot be true.  It is nonsense to copy unrelated code into a
> > program and tell people this is what they should use.
>
> I wonder how sure you are of that; did you do the online research? It
> is not about related or unrelated code; I do believe that AI finds and
> generates related code. But
>
> Here are references disputing how "it cannot be true":
>
> https://hacker-news.news/post/27710287
>
>
> https://mmacvicar.medium.com/it-is-best-if-copilot-copies-everything-d84506128e5a
>
>
> https://loudlabs.nl/news/githubs-commercial-ai-tool-was-built-from-open-source-code/
>
> > > Whether code compiles or not is irrelevant. Whether one runs it or not
> > > is also irrelevant; code need not even run.
> >
> > A feature or service that is based on this idea will never fly,
> > believe me.  Which programmer would want to have code pasted into his/her
> > program that would cause compilation errors or, worse, break it at run
> > time?
>
> Of course people want code to run. It is just that copyright laws don't
> handle technical functionality. It is irrelevant whether a program works or
> does not work. There are thousands of copyrighted programs that cannot
> work any more as devices are not on the market, they are still under
> copyright.
>
> > > I do not believe that any of the AIs so far "extract ideas". I have never
> > > heard of it. Which algorithm is there on this planet that can extract
> > > an idea?
> >
> > That's a very general question, it is impossible to answer it in a
> > post to a mailing list.  If you are really interested, you will have
> > to read up on that.  But you are wrong in your beliefs.
> >
> > > If newly generated code is a modification of other code, which we now
> > > know it is, and is based on it, then it requires licensing
> > > attribution.
> >
> > Once again, your assumptions are all wrong, so your conclusions are
> > also wrong.  Why not try one of these services and see what they
> > actually do, before you pass your (quite harsh) judgment on them, and
> > on the modern state of AI in general?
>
> I hear you saying that I am wrong and my conclusions are wrong, though I
> gave you enough references to research on the Internet, which show that
> there are possible serious licensing problems with such generated
> code.
>
> > > That licenses are relevant one can see from online discussions related
> > > to Github Copilot:
> >
> > That people ask these questions and discuss this doesn't mean the
> > problem is real.  Many people don't really understand what copyright
> > means and how to apply it to program code.
>
> Well said! Though that is not relevant.
>
> Question is very particular, specific and concrete:
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
>
> How do Pen.el and the background AI services ensure licensing
> compliance?
>
> I would appreciate it if you found a solution to that or stayed on that
> subject, as whether I am wrong or right is not relevant; what I wish is
> to have assurance that it is free software. Prove me wrong by providing
> exact references not only to one country's law but also to other
> countries' laws, the laws that make it legal, or explain how otherwise
> the legality of such code is justified and how users may get free
> software.
>
> For example, you may wish to mention "fair use"; on the other hand,
> similar laws would have to be found in other countries to justify it
> being free software.
>
> As long as you don't tackle those subjects there is no legal solution
> for Pen.el and background AI to be used with assurance that software
> is truly free software.
>
>
>
> Jean
>
> Take action in Free Software Foundation campaigns:
> https://www.fsf.org/campaigns
>
> In support of Richard M. Stallman
> https://stallmansupport.org/

* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 13:39                                 ` Shane Mulligan
@ 2021-07-23 14:39                                   ` Jean Louis
  2021-07-26  0:16                                   ` Richard Stallman
  1 sibling, 0 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-23 14:39 UTC (permalink / raw)
  To: Shane Mulligan; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, rms

* Shane Mulligan <mullikine@gmail.com> [2021-07-23 16:40]:
> Hi Jean, Eli, GNU,
> 
> > "open source"
> I am referring to free software in the spirit
> of GNU. Free as in freedom, from oppression,
> from an attack against creative and cognitive
> intelligence.

Alright, though in the GNU Project we don't use the term "open source",
as the project was never about that. The term is "free software". "Open
source" is vague today: people use the word "open" for things which are
not so open, including source code. There is "open source" from Microsoft
which is proprietary and yet called "open source", we now have OpenAI,
which is not free software, and so on.

See "Open":
https://www.gnu.org/philosophy/words-to-avoid.html#Open

> Now you ask again and I'll give you another, but you are missing the
> point by focusing on one example when the possibilities are
> infinite.

I hope that verifying generated code will not take longer than
writing it by hand.

Don't mistake me: if the databases are free as in the definition of
free software and your code is free, then I am definitely for that,
and I like AI; we have too little artificial intelligence in the
21st century. We are an underdeveloped civilization in that regard. The
movie 2001: A Space Odyssey was made in 1968, and the prediction was
already that we would have spaceships with HAL AI guiding us, but we have
not got much further than bedrooms with Amazon's spying "AI" devices.

It is very important to have all parts free as in free software
definition.

Review again:
https://www.gnu.org/philosophy/free-sw.html

> GPT turns emacs into something very powerful beyond your current
> comprehension. It's so profound that it will replace many of the
> online and offline services you may have come to take for
> granted. It goes way beyond that too.

Yes, I am asking for less abstract, more practical examples. I have no
use for the hype.

> Here is another demo.
> The below instructions were given to me by the
> tutor in Pen.el when I asked it for help.
> 
>     There are two ways to quit Emacs, the hard way, and the easy way.
>     In the hard way, you type M-x kill-emacs, and press enter.
>     In the easy way, you press C-x C-c.

I have to look at it realistically from my angle, and I do not see any
use in this example. I see that the AI guessed something and generated
text. I would not agree that M-x kill-emacs is the hard way and that C-x
C-c is the easy way. I would rather say that the easy way is to choose
the File and Quit menu options.

> The following is a prompt that created this interactive function.
> 
> #+BEGIN_SRC yaml
>   prompt: |
>     This is a conversation between a human and a brilliant AI.
>     The topic is "<2>".
> 
>     Human: Hello, are you my <1> tutor?
>     ###
>     AI: Hi there.
>     Yes I am.
>     How can I help you?
>     ###
>     Human: Thanks. I have a question. <2>
>     ###
>     AI: I would be happy to answer your question.
> #+END_SRC

I am sorry, I wish to see an example of usefulness. I will go over your
previous examples. It is definitely possible that I neglected them, but
you have known from the beginning that I am interested in this. I have
my reasons for being interested, as I generate a lot of text and I wish
to save myself some writing. In the above quote I do not see that
prompt, nor how it is relevant to quitting Emacs.

> Here is the recording of me doing that:
> 
> https://asciinema.org/a/SCUhm3l11N3w5eilUfewBDCiP

I have clicked on that link and could not find the exact reference. I
found "haskell lsp with HIE", something about "htop" and
"stackexchange".

> A prompt may be defined by type names alone,
> plus the version of a LM; the rest is inferred
> or subjective to peer to peer prompts:
> 
> #+BEGIN_SRC emacs-lisp
>   (defprompt ("short lines of code" "regex"))
> #+END_SRC
> 
> You haven't yet understood the profundity of GPT,
> and you are doing a great disservice to free software by
> stifling imaginary programming inside emacs.

The above hype paragraphs are suspiciously AI-looking.

> > You are more or less proposing the same conflict to come to Emacs and
> > I did not see where is your solution?

> Emacs is dead without GPT. That's why I raised the issue.  It's dead
> because it can be imagined by LMs and will lose its power. It'll be
> just another imagined environment.  Software is changing and Emacs
> can't miss out because it represents freedom.

That was not the context of my question. Did you read my last email to
Eli about it?

It seems like you either ignored my question or you keep using AI to
generate hype about it.

Julia Reda from Germany is at least trying to answer my question
related to licensing compliance here:
https://juliareda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/

So there are at least some avenues for understanding how it complies,
or could comply, with licenses, or be freed from them.

> You're failing to see the full picture here.
> It's absolutely vital for emacs' survival to
> have GPT incorporated. Make it happen.

I'm having a hard time following your proposal. I'm not sure what
you're asking. But I am too old to understand it. Though it is not my
age that matters. It is your experience, your knowledge, your
education, your understanding, your wisdom. In all of these you and me
have to be the best judge of what is good for you and me and what is
not.

The above paragraph was created by AI with small corrections. It says
nothing, just as the quoted paragraph above says nothing
essential. "full picture", "absolutely vital", "Emacs survival", "Make
it happen" -- that is a sales pitch. And I am a sales manager, btw.

That AI is useful in general, there is no need to convince me. The
question was about licensing, and I see that some activists like Julia
Reda, whom I have contacted previously about copyright issues in the
EU, have found legal justifications for licensing compliance.

That is what I wanted to know.

The issue is, however, not closed by the arguments of Julia Reda: she
may know EU law, but not all jurisdictions are aligned, so we still
have to be vigilant and follow up on that.

IMHO, you should incorporate the justifications used by Mrs. Reda, with
references, in your commentary or README files so that licensing
becomes clear for future readers.

> > As soon as anything is published in public
> > without compliance to licenses it generates
> > problems.
> Prompts are licensed entirely at the discretion
> of the person who created them, even if they are
> queries to GPT-3.

If prompts never appear in published works, they are not relevant. If
they do appear, their licensing compliance is assured by the programmer
or author. Prompts could even come from proprietary software.

> My suggestions:
> - Create a prompts database
>   - So people can collaborate on open source prompts
>   - So people can extend emacs with language models
>   - So people know it's ok to use their imagination and emacs supports creative intelligence
> - Integrate prompt functions into emacs somehow?
>   - defprompt
> - Optionally ship GPT-neo and GPT-j with emacs
> - Consider creating a prompting server
> - Consider a database for saving generations

I find them all good ideas; I just wish I could see more practical
examples:

- prompts database, should be on the website? Where? Hosted by which
  party? Is it centralized or decentralized?

- collaboration by which means? Email, chat? Website forum? How
  exactly?

- I understand your 3rd point.

- How big are GPT-neo and GPT-j?

-- 
Jean


* Re: Help building Pen.el (GPT for emacs)
       [not found]   ` <CACT87JpAcUfuRB01CcnfbL4yCTPyDoiG_WOzzxVvAW7rhj0=Mw@mail.gmail.com>
@ 2021-07-23 15:37     ` Jean Louis
  0 siblings, 0 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-23 15:37 UTC (permalink / raw)
  To: Shane Mulligan; +Cc: emacs-tangents

I have tried the following:

1. emacs -q

2. load src/init.el

3. then add the directory src to variable load-path

4. then load pen.el

I could see that it was asking for the s-join function, which it
cannot find, and Emacs looked like it was looping, beeping, totally
unresponsive. I had to kill it.

I am using the Emacs development version; are you also using it?

In general I was thinking that init.el and init-setup.el would set it
up for me.

The file installation.org is unclear: what do I need to do to start
using pen.el?


$ emacs -q
Connection lost to X server ':0'
When compiled with GTK, Emacs cannot recover from X disconnects.
This is a GTK bug: https://gitlab.gnome.org/GNOME/gtk/issues/221
For details, see etc/PROBLEMS.
Fatal error 6: Aborted
Backtrace:
emacs(+0x1516d4)[0x55ba843c36d4]
emacs(+0x463d8)[0x55ba842b83d8]
emacs(+0x468d3)[0x55ba842b88d3]
emacs(+0x45940)[0x55ba842b7940]
emacs(+0x459fe)[0x55ba842b79fe]
/usr/lib/libX11.so.6(_XIOError+0x5f)[0x7fac94d5bb9f]
/usr/lib/libX11.so.6(_XEventsQueued+0xa7)[0x7fac94d59217]
/usr/lib/libX11.so.6(XPending+0x62)[0x7fac94d4aa92]
/usr/lib/libgdk-3.so.0(+0x8d722)[0x7fac95530722]
/usr/lib/libglib-2.0.so.0(g_main_context_prepare+0x1b0)[0x7fac94ed5bc0]
/usr/lib/libglib-2.0.so.0(+0xa7a06)[0x7fac94f29a06]
/usr/lib/libglib-2.0.so.0(g_main_context_pending+0x2a)[0x7fac94ed371a]
/usr/lib/libgtk-3.so.0(gtk_events_pending+0x10)[0x7fac95779690]
emacs(+0x105c5d)[0x55ba84377c5d]
emacs(+0x13e932)[0x55ba843b0932]
emacs(+0x13f635)[0x55ba843b1635]
emacs(+0x228e9b)[0x55ba8449ae9b]
emacs(+0x993fa)[0x55ba8430b3fa]
emacs(+0x7d3c9)[0x55ba842ef3c9]
emacs(+0x82525)[0x55ba842f4525]
emacs(+0x83cf4)[0x55ba842f5cf4]
emacs(+0x686f2)[0x55ba842da6f2]
emacs(+0x943a9)[0x55ba843063a9]
emacs(+0x14729e)[0x55ba843b929e]
emacs(+0x1b4047)[0x55ba84426047]
emacs(+0x137404)[0x55ba843a9404]
emacs(+0x1b6773)[0x55ba84428773]
emacs(+0x1373ab)[0x55ba843a93ab]
emacs(+0x13cd76)[0x55ba843aed76]
emacs(+0x13d0a2)[0x55ba843af0a2]
emacs(+0x4e929)[0x55ba842c0929]
/usr/lib/libc.so.6(__libc_start_main+0xd5)[0x7fac938bfb25]
emacs(+0x4ef2e)[0x55ba842c0f2e]
Aborted
~/Programming/git/pen.el
$ 


-- 
Jean




* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 12:47                               ` Jean Louis
  2021-07-23 13:39                                 ` Shane Mulligan
@ 2021-07-23 19:33                                 ` Eli Zaretskii
  2021-07-24  3:07                                   ` Jean Louis
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-23 19:33 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, emacs-tangents, mullikine, rms

> Date: Fri, 23 Jul 2021 15:47:21 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: mullikine@gmail.com, stefan@marxist.se, emacs-tangents@gnu.org,
>   rms@gnu.org
> 
> * Eli Zaretskii <eliz@gnu.org> [2021-07-23 14:51]:
> > > According to online reviews, chunks of code are copied, even verbatim,
> > > and people have found where they come from.
> > 
> > That cannot be true.  It is nonsense to copy unrelated code into a
> > program and tell people this is what they should use.
> 
> I wonder how sure you are of that; did you do the online research? It
> is not about related or unrelated code; I do believe that AI finds and
> generates related code. But
> 
> Here are references disputing how "it cannot be true":

You take everything you read in these blogs for granted?  Did you
actually see the original code which these allude to?

> > > Whether code compiles or not is irrelevant. Whether one runs it or not
> > > is also irrelevant; code need not even run.
> > 
> > A feature or service that is based on this idea will never fly,
> > believe me.  Which programmer would want to have code pasted into his/her
> > program that would cause compilation errors or, worse, break it at run
> > time?
> 
> Of course people want code to run. It is just that copyright laws don't
> handle technical functionality. It is irrelevant whether a program works or
> does not work.

For copyright purposes, it doesn't.  But for the programmer who uses
the code it very much does.  So if these services give them code that
doesn't work they will not use it, and eventually will stop using the
services.

> > Once again, your assumptions are all wrong, so your conclusions are
> > also wrong.  Why not try one of these services and see what they
> > actually do, before you pass your (quite harsh) judgment on them, and
> > on the modern state of AI in general?
> 
> I hear you saying that I am wrong and my conclusions are wrong, though I
> gave you enough references to research on the Internet, which show that
> there are possible serious licensing problems with such generated
> code.

See above.

> Question is very particular, specific and concrete:
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> 
> How do Pen.el and the background AI services ensure licensing
> compliance?
> 
> I would appreciate it if you found a solution to that or stayed on that
> subject, as whether I am wrong or right is not relevant; what I wish is
> to have assurance that it is free software. Prove me wrong by providing
> exact references not only to one country's law but also to other
> countries' laws, the laws that make it legal, or explain how otherwise
> the legality of such code is justified and how users may get free
> software.

I'm sorry, but I don't work for you.  If you have problems with using
code from these services, then the onus is on you to do the research
and make up your own mind.  The discussion here is not about the code
these services give their users, it's whether and how Emacs can make
use of those services.  Emacs allows the user to write proprietary
code, and there's no legal issues when the user does that.  Emacs also
allows the user to copy someone else's code without permission, and
that's not a problem for Emacs when the user does that.

> As long as you don't tackle those subjects there is no legal solution
> for Pen.el and background AI to be used with assurance that software
> is truly free software.

You confuse "free software" with "software being used to write free
programs".  They are not the same.



* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 11:32                           ` Jean Louis
  2021-07-23 11:51                             ` Eli Zaretskii
@ 2021-07-24  1:14                             ` Richard Stallman
  2021-07-24  2:10                               ` Shane Mulligan
  2021-07-24  6:49                               ` Eli Zaretskii
  1 sibling, 2 replies; 46+ messages in thread
From: Richard Stallman @ 2021-07-24  1:14 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, eliz, mullikine, emacs-tangents

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > That's not what happens with these services: they don't _copy_ code
  > > from other software (that won't work, because the probability of the
  > > variables being called by other names is 100%, and thus such code, if
  > > pasted into your program, will not compile).  What they do, they
  > > extract ideas and algorithms from those other places, and express them
  > > in terms of your variables and your data types.  So licenses are not
  > > relevant here.

  > According to online reviews, chunks of code are copied, even verbatim,
  > and people have found where they come from. Even if modified, it still
  > requires licensing compliance.

From what I have read, it seems that the behavior of copilot runs on a
spectrum from the first description to the second description.  I
expect that in many cases, nothing copyrightable has been copied, but
in some cases copilot does copy a substantial amount from a
copyrighted work.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  1:14                             ` Richard Stallman
@ 2021-07-24  2:10                               ` Shane Mulligan
  2021-07-24  2:34                                 ` Shane Mulligan
  2021-07-24  6:49                               ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Shane Mulligan @ 2021-07-24  2:10 UTC (permalink / raw)
  To: rms; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, Jean Louis

It's a bit like whitewashing because it's
reconstructing generatively by finding
artificial/contrived associations between
different works that the author had not
intended but may have been part of their
inspiration, and it compresses the
information based on these associations.

It's a bit like running a lossy 'zip' on the
internet and then decompressing
probabilistically.

When run deterministically (set the temperature of GPT to 0), you may
actually see 'snippets' from various places, every time, with the same
input generating the same snippets.
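The determinism claim follows from how temperature is commonly applied when sampling the next token: logits are divided by the temperature before a softmax, and temperature 0 is conventionally treated as greedy decoding, which always takes the highest-scoring token. The Python sketch below is a toy over a hand-written list of logits, not any particular GPT implementation.

```python
import math
import random


def sample_next_token(logits, temperature, rng=None):
    """Toy next-token choice over a list of logits (one score per token)."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, so the same
        # input always produces the same continuation.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Softmax weights (shifting by the max avoids overflow in exp).
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.0, 3.0, 2.0]
greedy = [sample_next_token(logits, 0) for _ in range(5)]  # always index 1
```

At a non-zero temperature, output is reproducible only if the random seed is fixed. Without a fixed seed, repeated runs can differ.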

So the source material is important.

What GitHub did was very, very bad but they
did it anyway.

That doesn't mean GPT is bad, it just means
they zipped up content they should not have
and created this language 'index' (or 'codex',
as they call it).

What they really should do, if they are honest
people, is train separate models on subsets of
GitHub code grouped by licence, and release
each model under the same licence as its code.

Shane Mulligan



* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  2:10                               ` Shane Mulligan
@ 2021-07-24  2:34                                 ` Shane Mulligan
  2021-07-24  3:14                                   ` Shane Mulligan
  0 siblings, 1 reply; 46+ messages in thread
From: Shane Mulligan @ 2021-07-24  2:34 UTC (permalink / raw)
  To: rms; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, Jean Louis

This is why the technology is a bit like a
personal Google search or Stack Overflow that
you can store offline, because it's an index of
the internet that is capable of reconstruction.

But it's not limited to code generation. Codex
is nothing. Emacs + GPT would carve a large
piece out of M$.

Codex is a model trained for the purpose of
generating code, but GPT models will become
abundant for all tasks, including image and
audio synthesis and understanding.

Emacs is a complete operating system.
VSCode is geared towards programming.

Emacs can do infinitely more things with GPT
than VSCode can because it's holistic.

Even the 'eliza' in emacs can pass the Turing
test with GPT. GPT can run sequences of
commands in emacs to automate entire
workflows with natural language.

But the future is in collaborative GPT.

The basis/base truth would become versions of
LMs or ontologies.

Right now that's EleutherAI.

Shane Mulligan




* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 19:33                                 ` Eli Zaretskii
@ 2021-07-24  3:07                                   ` Jean Louis
  2021-07-24  7:32                                     ` Eli Zaretskii
  2021-07-25  1:09                                     ` Richard Stallman
  0 siblings, 2 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-24  3:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-23 22:34]:
> > Here are references disputing how "it cannot be true":
> 
> You take everything you read in these blogs for granted?  Did you
> actually see the original code which these allude to?

In the case of Copilot, GitHub admits:
https://docs.github.com/en/github/copilot/research-recitation

"This investigation demonstrates that GitHub Copilot can quote a body
of code verbatim, but that it rarely does so, and when it does, it
mostly quotes code that everybody quotes, and mostly at the beginning
of a file, as if to break the ice."

And the fact that it is "rare" does not make it less of a problem for
copyright purposes, as the new author cannot know which parts of the
code are the "rare" verbatim copies.

> > Question is very particular, specific and concrete:
> > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> > 
> > How do Pen.el and the background AI services ensure licensing
> > compliance?
> > 
> > I would appreciate it if you could find a solution to that or stay on
> > that subject; whether I am right or wrong is not relevant. What I wish
> > is to have assurance that it is free software. Prove me wrong by
> > providing exact references, not only to one country's law but also to
> > other countries' laws, the laws that make it legal, or otherwise show
> > how the legality of such code is justified and how users may get free
> > software.
> 
> I'm sorry, but I don't work for you.  If you have problems with using
> code from these services, then the onus is on you to do the research
> and make up your own mind.  The discussion here is not about the code
> these services give their users, it's whether and how Emacs can make
> use of those services.  Emacs allows the user to write proprietary
> code, and there's no legal issues when the user does that.  Emacs also
> allows the user to copy someone else's code without permission, and
> that's not a problem for Emacs when the user does that.

If you don't wish to correspond, don't, you are free. 

I have never said nor implied "you work for me", and I cannot see how
that is relevant to the question. If you participate in the discussion
and respond to my question about licensing compliance, then provide a
reference justifying its legality, or simply say you don't have one.
Your employment is neither the subject of my question nor relevant.

I am not a user of proprietary software, and I don't consider writing
proprietary software. Nor am I participating in this discussion to
foster the creation of proprietary software.

I am a free software user, and in this specific case I am interested in
how the licensing issue is solved.

However, my question has at least been answered by my own online
research, as I have already found these references:

1. Julia Reda's reference; and

2. OpenAI_RFC-84-FR-58141.pdf
https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf

Conclusions are:

- legal justifications exist for the US jurisdiction, as the companies
  providing the AI are powerful enough to find their way, and they are
  playing the card given in the references above; as somebody already
  said, I doubt they would invoke the "fair use" doctrine if the AI had
  been trained on proprietary software such as Windows;

- the conflict is serious, it is out there among the people, and it
  remains unsolved; the AI has been trained on GPL and other free
  software and is used by corporations to generate new code without
  attribution; people complain that this misuses the intentions of the
  authors;

- the overall international legal situation is thus unclear, especially
  considering that free software spans the whole world, not just the US
  jurisdiction, and what may work within the US does not necessarily
  work in all jurisdictions;

> > As long as you don't tackle those subjects, there is no legal solution
> > for Pen.el and the background AI to be used with assurance that the
> > software is truly free software.
> 
> You confuse "free software" with "software being used to write free
> programs".  They are not the same.

Maybe I expressed myself in a way that kept the point from being
understood. It must be so, as I have finally found the first legal
references myself.

I never connected Pen.el to the legality question I raised; I have the
pen.el repository here, and its license is clear. I never mentioned it.

I have not referred to "software being used to write free programs",
and I did not address the server-side service: it is most probably
proprietary software, though some versions could be free software. But
that is not relevant.

What is at hand is:
━━━━━━━━━━━━━━━━━━━

1. There is a pool of GPL and other free software whose authors expect
   compliance with their licenses;

2. A large corporation is trying to use the "fair use" doctrine on that
   pool of software to create a service;

3. The service generates new software, sometimes duplicating code
   verbatim;

The question that was, and still largely remains, unsolved is how
authors who use newly generated code can be sure that the generated
software is free software and complies with the GPL and other free
software licenses.

The conclusion as of 2021-07-24 is that authors cannot be sure, as
there are legal uncertainties.

Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/

* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  2:34                                 ` Shane Mulligan
@ 2021-07-24  3:14                                   ` Shane Mulligan
  0 siblings, 0 replies; 46+ messages in thread
From: Shane Mulligan @ 2021-07-24  3:14 UTC (permalink / raw)
  To: rms; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, Jean Louis

Proprietary code from within the M$ ecosystem is uninspired and bad code by
comparison. Open source code is the gold mine so M$ will not like being
told they cannot use open source to compile Codex. It's a complete r*pe of
open source. GPT is trained on public language and language belongs to
people generally, not some select group. It's not meant to be a tool for
controlling people. GPT is literally the soul of a billion people and
should be public domain and not feared by GNU but instead rescued. Sorry
for the rhetoric!

On Sat, Jul 24, 2021 at 2:34 PM Shane Mulligan <mullikine@gmail.com> wrote:

> This is why the technology is a bit like a
> personal Google search or Stack Overflow that
> you can store offline, because it's an index of the internet that is
> capable of reconstruction.
>
> But it's not limited to code generation. Codex
> is nothing. Emacs + GPT would carve a large
> piece out of M$.
>
> Codex is a model trained for the purpose of
> generating code, but GPT models will become
> abundant for all tasks, including image and
> audio synthesis and understanding.
>
> Emacs is a complete operating system.
> VSCode is geared towards programming.
>
> Emacs can do infinitely more things with GPT
> than VSCode can because it's holistic.
>
> Even the 'eliza' in emacs can pass the Turing
> test with GPT. GPT can run sequences of commands in emacs to automate
> entire workflows with natural language.
>
> But the future is in collaborative GPT.
>
> The basis/base truth would become versions of
> LMs or ontologies.
>
> Right now that's EleutherAI.
>
> Shane Mulligan
>
> How to contact me:
> 🇦🇺 00 61 421 641 250
> 🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
> mullikine@gmail.com
>
>
> On Sat, Jul 24, 2021 at 2:10 PM Shane Mulligan <mullikine@gmail.com> wrote:
>
>> It's a bit like whitewashing, because it's
>> reconstructing generatively by finding
>> artificial/contrived associations between
>> different works that the authors had not
>> intended but that may have been part of their
>> inspiration, and it compresses the
>> information based on these associations.
>>
>> It's a bit like running a lossy 'zip' on the
>> internet and then decompressing
>> probabilistically.
>>
>> When run deterministically (set the temperature of GPT to 0), you may
>> actually see 'snippets' from various places, every time, with the same
>> input generating the same snippets.
>>
>> So the source material is important.
>>
>> What GitHub did was very, very bad but they
>> did it anyway.
>>
>> That doesn't mean GPT is bad, it just means
>> they zipped up content they should not have
>> and created this language 'index' (or 'codex',
>> as they call it).
>>
>> What they really should do, if they are honest
>> people, is train the model on subsets of
>> GitHub code by separate licence and release
>> the models with the same license.
>>
>> Shane Mulligan
>>
>>
>> On Sat, Jul 24, 2021 at 1:14 PM Richard Stallman <rms@gnu.org> wrote:
>>
>>>   > > That's not what happens with these services: they don't _copy_ code
>>>   > > from other software (that won't work, because the probability of the
>>>   > > variables being called by other names is 100%, and thus such code, if
>>>   > > pasted into your program, will not compile).  What they do, they
>>>   > > extract ideas and algorithms from those other places, and express them
>>>   > > in terms of your variables and your data types.  So licenses are not
>>>   > > relevant here.
>>>
>>>   > According to online reviews, chunks of code are copied, even verbatim,
>>>   > and people find where they come from. Even if modified, such code still
>>>   > requires licensing compliance.
>>>
>>> From what I have read, it seems that the behavior of copilot runs on a
>>> spectrum from the first description to the second description.  I
>>> expect that in many cases, nothing copyrightable has been copied, but
>>> in some cases copilot does copy a substantial amount from a
>>> copyrighted work.

Shane Mulligan
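
Shane's point above about setting GPT's temperature to 0 can be illustrated with a toy decoder. This is a minimal sketch, not GPT or any real API: the next-token table `TOY_LOGITS` and the helper names are hypothetical, and a real model conditions on the whole prompt rather than only on the previous token. At temperature 0 the decoder always takes the most likely token, so the same input always yields the same output; at a positive temperature it samples from a softened distribution.

```python
import math
import random

# Hypothetical toy "model": next-token logits keyed only by the previous token.
# A real language model conditions on the whole prompt; this is illustrative only.
TOY_LOGITS = {
    "<s>": {"def": 2.0, "class": 1.0, "import": 0.5},
    "def": {"sort": 1.5, "main": 1.4, "<eos>": 0.1},
    "class": {"Node": 1.2, "<eos>": 0.3},
    "import": {"os": 1.0, "sys": 0.9},
    "sort": {"<eos>": 2.0, "(": 0.5},
    "main": {"<eos>": 2.0},
    "Node": {"<eos>": 2.0},
    "os": {"<eos>": 2.0},
    "sys": {"<eos>": 2.0},
    "(": {"<eos>": 2.0},
}


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the peak."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def next_token(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: always the most likely token (ties go to the first
        # key in dict order), so no randomness is involved at all.
        return max(logits, key=logits.get)
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


def generate(temperature, seed=None, max_len=10):
    """Decode from the start token until <eos> or max_len tokens."""
    rng = random.Random(seed)
    tok, out = "<s>", []
    for _ in range(max_len):
        tok = next_token(TOY_LOGITS[tok], temperature, rng)
        if tok == "<eos>":
            break
        out.append(tok)
    return out
```

Here `generate(0)` returns `['def', 'sort']` on every call regardless of the seed, while `generate(1.5, seed=s)` can differ from one seed to the next. Deterministic decoding reproduces the single most likely continuation every time, which is why a repeated prompt can keep surfacing the same snippet.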

* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  1:14                             ` Richard Stallman
  2021-07-24  2:10                               ` Shane Mulligan
@ 2021-07-24  6:49                               ` Eli Zaretskii
  2021-07-24  7:33                                 ` Jean Louis
  2021-07-24  7:41                                 ` Philip Kaludercic
  1 sibling, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24  6:49 UTC (permalink / raw)
  To: rms; +Cc: mullikine, emacs-tangents, stefan, bugs

> From: Richard Stallman <rms@gnu.org>
> Date: Fri, 23 Jul 2021 21:14:23 -0400
> Cc: stefan@marxist.se, eliz@gnu.org, mullikine@gmail.com,
>  emacs-tangents@gnu.org
> 
>   > > That's not what happens with these services: they don't _copy_ code
>   > > from other software (that won't work, because the probability of the
>   > > variables being called by other names is 100%, and thus such code, if
>   > > pasted into your program, will not compile).  What they do, they
>   > > extract ideas and algorithms from those other places, and express them
>   > > in terms of your variables and your data types.  So licenses are not
>   > > relevant here.
> 
>   > According to online reviews, chunks of code are copied, even verbatim,
>   > and people find where they come from. Even if modified, such code still
>   > requires licensing compliance.
> 
> From what I have read, it seems that the behavior of copilot runs on a
> spectrum from the first description to the second description.  I
> expect that in many cases, nothing copyrightable has been copied, but
> in some cases copilot does copy a substantial amount from a
> copyrighted work.

It cannot be a verbatim copy, because at least the variables, and
sometimes also the data types, need to be renamed.  Whether the result
is still under the original copyright cannot be established without
actually comparing the two versions of the code.  So any general
flat rejection of the idea of these services on these grounds is not
serious, IMO.

Of course, someone like Jean will not use any code until a bunch of
lawyers submit an official opinion about the legal implications, but
IMO that's a radical view that doesn't make a lot of sense, especially
since none of the code accessible openly via the net can be
proprietary, for obvious reasons.  Jean could do whatever he
personally likes, but his radical views don't necessarily bind the GNU
project in general and Emacs in particular.

Moreover, ironically Jean bases his views on opinions and issues
expressed by clear opponents of Free Software.  The strongest drive
behind many of these blogs' aversion to these services is the fear
that GPL-licensed code creeps into proprietary software produced by
enterprises and their software subcontractors, because that would
require them to make the sources available or at least put them at a
risk of lawsuits.  It is a well-known fact that most, if not all,
software contracts for proprietary software nowadays include explicit
prohibition of using GPL-licensed code in the product.  It is those
people that serve these contracts and enterprises who drive the
hoopla about licensing issues in code offered by these AI-based
services.  So before embracing their FUD and biased opinions, I really
suggest actually looking at the code, comparing it with the original,
and making an independent assessment both of whether it's a "copy" from
the copyright POV and of the licenses of the original code.

* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  3:07                                   ` Jean Louis
@ 2021-07-24  7:32                                     ` Eli Zaretskii
  2021-07-24  7:54                                       ` Jean Louis
  2021-07-25  1:09                                     ` Richard Stallman
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24  7:32 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, emacs-tangents, mullikine, rms

> Date: Sat, 24 Jul 2021 06:07:18 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: mullikine@gmail.com, stefan@marxist.se, emacs-tangents@gnu.org,
>   rms@gnu.org
> 
> * Eli Zaretskii <eliz@gnu.org> [2021-07-23 22:34]:
> > > Here are references disputing how "it cannot be true":
> > 
> > You take everything you read in these blogs for granted?  Did you
> > actually see the original code which these allude to?
> 
> In the case of Copilot, GitHub admits:
> https://docs.github.com/en/github/copilot/research-recitation

Yes, and there are cases of real code stealing out there.  The only
thing that proves is that some mistaken or dishonest operators can do
this.

> > > I would appreciate it if you could find a solution to that or stay on
> > > that subject; whether I am right or wrong is not relevant. What I wish
> > > is to have assurance that it is free software. Prove me wrong by
> > > providing exact references, not only to one country's law but also to
> > > other countries' laws, the laws that make it legal, or otherwise show
> > > how the legality of such code is justified and how users may get free
> > > software.
> > 
> > I'm sorry, but I don't work for you.  If you have problems with using
> > code from these services, then the onus is on you to do the research
> > and make up your own mind.  The discussion here is not about the code
> > these services give their users, it's whether and how Emacs can make
> > use of those services.  Emacs allows the user to write proprietary
> > code, and there's no legal issues when the user does that.  Emacs also
> > allows the user to copy someone else's code without permission, and
> > that's not a problem for Emacs when the user does that.
> 
> If you don't wish to correspond, don't, you are free. 
> 
> I have never said nor implied "you work for me", and I cannot see how
> that is relevant to the question.

You consistently take the stance that implies, and many times
explicitly states, that (a) you represent the views of the GNU
project, and (b) the GNU project should or should not do this and
that.  Then, when people like me object, you demand that they prove
something to you, or else.  But no one here is under any obligation to
prove anything to you, and your views and opinions (which are quite
radical, I must say) are your own and no one else's.  They are your
own responsibility, and if you want them to be proven or disproven,
you should do that yourself.

> If you participate in the discussion and respond to my question about
> licensing compliance, then provide a reference justifying its
> legality, or simply say you don't have one. Your employment is neither
> the subject of my question nor relevant.

I could ask you to do the same.  You never provided any reference
justifying the legality, just a lot of blogs that spread FUD (whose
motivation, which many times is struggle against Free Software, I
described in my previous message).  If you demand something of your
correspondents, please live up to the same high standards, or stop
demanding that others do.  Quoting a random selection of blog postings
is NOT research and does NOT justify anything, except that the issue
is being "discussed" by some people.  It doesn't even mean that those
discussions are serious, let alone that whoever posts those opinions
doesn't have an agenda.

> I am not a user of proprietary software, and I don't consider writing
> proprietary software. Nor am I participating in this discussion to
> foster the creation of proprietary software.
> 
> I am a free software user, and in this specific case I am interested in
> how the licensing issue is solved.

You are free to do whatever you like in your work; that is your
prerogative and no one else's.  But here we discuss what the Emacs
project should or should not do about this technology, not your
private decisions.

> 2. OpenAI_RFC-84-FR-58141.pdf
> https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf
> 
> Conclusions are:
> 
> - legal justifications exist for the US jurisdiction, as the companies
>   providing the AI are powerful enough to find their way, and they are
>   playing the card given in the references above; as somebody already
>   said, I doubt they would invoke the "fair use" doctrine if the AI had
>   been trained on proprietary software such as Windows;
> 
> - the conflict is serious, it is out there among the people, and it
>   remains unsolved; the AI has been trained on GPL and other free
>   software and is used by corporations to generate new code without
>   attribution; people complain that this misuses the intentions of the
>   authors;
> 
> - the overall international legal situation is thus unclear, especially
>   considering that free software spans the whole world, not just the US
>   jurisdiction, and what may work within the US does not necessarily
>   work in all jurisdictions;

That's not what the above document concludes.  Quote:

  Conclusion

  We submit that:

  I. Under current law, training AI systems constitutes fair use.

  II. Policy considerations underlying fair use doctrine support the
  finding that training AI systems constitute fair use.

  III. Nevertheless, legal uncertainty on the copyright implications
  of training AI systems imposes substantial costs on AI developers
  and so should be authoritatively resolved.

> The conclusion as of 2021-07-24 is that authors cannot be sure, as
> there are legal uncertainties.

Those are your personal conclusions.  They don't follow, and sometimes
directly contradict, the references you yourself posted (sometimes so
much so that I wonder whether we really read the same text).  My
opinions differ substantially, for the reasons I explained above and
in other messages.

(Full disclosure: part of my daytime job is development of
sophisticated AI-based algorithms that use machine learning
technologies for various practical purposes, including analysis of
"natural language" text.)

* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  6:49                               ` Eli Zaretskii
@ 2021-07-24  7:33                                 ` Jean Louis
  2021-07-24  8:10                                   ` Eli Zaretskii
  2021-07-24  7:41                                 ` Philip Kaludercic
  1 sibling, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-24  7:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mullikine, emacs-tangents, stefan, rms

Eli, I do take care of licensing when re-using somebody's software,
and when publishing software or distributing it. 

There is nothing "radical" about it. 

The concerns of other people are not radical either. The intention of
authors is not respected even if there is a legal circumvention in the
US, such as "fair use", which does not fly in other jurisdictions.

I do understand you have some unsolved issues or something you cannot
handle related to licensing, as you are more on the technical side, but
please don't call it "radical", as that does not teach people about GPL
licensing.

Jean

* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  6:49                               ` Eli Zaretskii
  2021-07-24  7:33                                 ` Jean Louis
@ 2021-07-24  7:41                                 ` Philip Kaludercic
  2021-07-24  7:59                                   ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Philip Kaludercic @ 2021-07-24  7:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Richard Stallman <rms@gnu.org>
>> Date: Fri, 23 Jul 2021 21:14:23 -0400
>> Cc: stefan@marxist.se, eliz@gnu.org, mullikine@gmail.com,
>>  emacs-tangents@gnu.org
>> 
>>   > > That's not what happens with these services: they don't _copy_ code
>>   > > from other software (that won't work, because the probability of the
>>   > > variables being called by other names is 100%, and thus such code, if
>>   > > pasted into your program, will not compile).  What they do, they
>>   > > extract ideas and algorithms from those other places, and express them
>>   > > in terms of your variables and your data types.  So licenses are not
>>   > > relevant here.
>> 
>>   > According to online reviews, chunks of code are copied, even verbatim,
>>   > and people find where they come from. Even if modified, such code still
>>   > requires licensing compliance.
>> 
>> From what I have read, it seems that the behavior of copilot runs on a
>> spectrum from the first description to the second description.  I
>> expect that in many cases, nothing copyrightable has been copied, but
>> in some cases copilot does copy a substantial amount from a
>> copyrighted work.
>
> It cannot be a verbatim copy, because at least the variables, and
> sometimes also the data types, need to be renamed.  Whether the result
> is still under the original copyright cannot be established without
> actually comparing the two versions of the code.  So any general
> flat rejection of the idea of these services on these grounds is not
> serious, IMO.

Not necessarily, if it generates a pure, top-level function. Someone
could type something like "Sort list of postcodes" and it generates a
Radix Sort function. And if this is part of some code that was copied a
lot, the model might be even more likely to generate it verbatim.

Or that is at least my understanding.

-- 
	Philip Kaludercic
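
To make Philip's example concrete, here is a sketch of the kind of self-contained, top-level function a prompt like "Sort list of postcodes" might yield. It is a hypothetical illustration, not output from Copilot or any other service. The name `radix_sort_postcodes` and the restriction to fixed-width, digit-only postcodes are assumptions made only for this example.

```python
from __future__ import annotations

DIGITS = "0123456789"


def radix_sort_postcodes(postcodes: list[str]) -> list[str]:
    """Sort fixed-width, digit-only postcodes with an LSD radix sort.

    Assumes every postcode has the same number of ASCII digits
    (e.g. four-digit codes); anything else raises ValueError.
    """
    if not postcodes:
        return []
    width = len(postcodes[0])
    for code in postcodes:
        if len(code) != width or any(ch not in DIGITS for ch in code):
            raise ValueError(f"unexpected postcode: {code!r}")

    result = list(postcodes)
    # Least-significant digit first: one stable bucketing pass per position.
    for pos in range(width - 1, -1, -1):
        buckets: list[list[str]] = [[] for _ in DIGITS]
        for code in result:
            buckets[int(code[pos])].append(code)  # keeps prior order: stable
        result = [code for bucket in buckets for code in bucket]
    return result


if __name__ == "__main__":
    print(radix_sort_postcodes(["6000", "2000", "3121", "2600", "0800"]))
```

Nothing in the body depends on identifiers from the caller's program, and the data type appears only in the signature.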



* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  7:32                                     ` Eli Zaretskii
@ 2021-07-24  7:54                                       ` Jean Louis
  2021-07-24  8:50                                         ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-24  7:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-24 10:33]:
> > I have never said nor implied "you work for me", and I cannot see how
> > that is relevant to the question.
> 
> You consistently take the stance that implies, and many times
> explicitly states, that (a) you represent the views of the GNU
> project

I have never in my life said so; please stop saying it.

> and (b) the GNU project should or should not do this and that.

I have always been following the GNU project as such, because I agree
with its principles and guidance as published on the GNU website. I
will definitely compare the issues at hand with the well-known GNU
guidelines, just as you and other people do.

For example, when proprietary software is recommended, I will say that
the GNU project does not endorse it.

> Then, when people like me object, you demand that they prove
> something to you, or else.

The issues are totally unrelated. I am sorry for your
misunderstandings. 

I was interested in finding out how legality is resolved when reusing
code generated by AI. Nothing beyond that. And I have found references
and drawn conclusions.

> > If you participate in the discussion and respond to my question about
> > licensing compliance, then provide a reference justifying its
> > legality, or simply say you don't have one. Your employment is neither
> > the subject of my question nor relevant.
> 
> I could ask you to do the same.  You never provided any reference
> justifying the legality, just a lot of blogs that spread FUD (whose
> motivation, which many times is struggle against Free Software, I
> described in my previous message).

Take it easy. 

I asked a simple question, refined it, found references, and concluded
that there is a legal justification in the US jurisdiction and that
there is an unsolved global problem.

There is no need to expand the discussion in directions that are
irrelevant to the question about licensing compliance.

> > 2. OpenAI_RFC-84-FR-58141.pdf
> > https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf
> > 
> > Conclusions are:
> > 
> > - legal justifications exist for the US jurisdiction, as the companies
> >   providing the AI are powerful enough to find their way, and they are
> >   playing the card given in the references above; as somebody already
> >   said, I doubt they would invoke the "fair use" doctrine if the AI had
> >   been trained on proprietary software such as Windows;
> > 
> > - the conflict is serious, it is out there among the people, and it
> >   remains unsolved; the AI has been trained on GPL and other free
> >   software and is used by corporations to generate new code without
> >   attribution; people complain that this misuses the intentions of the
> >   authors;
> > 
> > - the overall international legal situation is thus unclear, especially
> >   considering that free software spans the whole world, not just the US
> >   jurisdiction, and what may work within the US does not necessarily
> >   work in all jurisdictions;
> 
> That's not what the above document concludes.  Quote:

My conclusions are not what the document concludes; I never said they
were. The document relates exclusively to the US jurisdiction, and its
final determination is vague. It is their proposal, not a court
decision. They have openly said that they are financially strong
enough and will try to defend any cases by using the "fair use"
doctrine in the US.

This, however, does not settle the matter in all jurisdictions. The
"fair use" question is also not finally settled in the US.

>   Conclusion
> 
>   We submit that:
> 
>   I. Under current law, training AI systems constitutes fair use.

That is their opinion; it is a corporation that has the strength to
submit such a document, and of course it found some legal defense. If
anybody files a complaint, it will be a court case that gives the
final judgment.

>   II. Policy considerations underlying fair use doctrine support the
>   finding that training AI systems constitute fair use.
> 
>   III. Nevertheless, legal uncertainty on the copyright implications
>   of training AI systems imposes substantial costs on AI developers
>   and so should be authoritatively resolved.

Everything in that document relates to the US jurisdiction only. It is
a one-sided, thus biased, document, clearly opposing the views of many
GPL authors.

It is the corporation's defense argument for ripping off GPL software.
They found the way and wish to play that card.

It is a document of conflict, not of friendship or collaboration. It
is a document of one-sided defense, not a document that contributes to
free software.

It is a document that defends proprietary software, not one that
fosters free software.

Thus it does not resolve anything in the community. It serves one
party only.

If that corporation released all its software as free software, that
would be a new leap forward. Instead, they are taking one big step
backwards. One can see that from the number of aware free software
developers canceling their GitHub accounts.

> > The conclusion as of 2021-07-24 is that authors cannot be sure, as
> > there are legal uncertainties.
> 
> Those are your personal conclusions.

Personal, definitely, but I am not the only one with this opinion,
which should be clear from the references, which were probably left
unread.

The legality of free software around the world has been ensured by the
GPL. Maybe the license was never planned to be international, but it
does function well internationally.

The legality of AI-generated code under the US "fair use" doctrine is,
at this point of development, still far from functioning well
internationally.

> (Full disclosure: part of my daytime job is development of
> sophisticated AI-based algorithms that use machine learning
> technologies for various practical purposes, including analysis of
> "natural language" text.)

That is great; it is the technical part of software. Keep at it.

Jean

* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  7:41                                 ` Philip Kaludercic
@ 2021-07-24  7:59                                   ` Eli Zaretskii
  2021-07-24  9:31                                     ` Philip Kaludercic
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24  7:59 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

> From: Philip Kaludercic <philipk@posteo.net>
> Cc: rms@gnu.org,  mullikine@gmail.com,  emacs-tangents@gnu.org,
>   stefan@marxist.se,  bugs@gnu.support
> Date: Sat, 24 Jul 2021 07:41:21 +0000
> 
> > It cannot be a verbatim copy, because at least the variables, and
> > sometimes also the data types, need to be renamed.  Whether the result
> > is still under the original copyright cannot be established without
> > actually comparing the two versions of the code.  So any general
> > flat rejection of the idea of these services on these grounds is not
> > serious, IMO.
> 
> Not necessarily, if it generates a pure, top-level function. Someone
> could type something like "Sort list of postcodes" and it generates a
> Radix Sort function. And if this is part of some code that was copied a
> lot, the model might be even more likely to generate it verbatim.

A sort function must state at least the data type before it can be
compiled.  And if you are talking about pseudo-code that is data-type
agnostic, then that's an algorithm, and is not copyrightable, AFAIK.
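To make the "Sort list of postcodes" scenario concrete, here is a minimal Java sketch of an LSD radix sort, written for this note rather than taken from any model's output. It assumes, hypothetically, that every postcode is a fixed-width string of ASCII characters; the class and method names are made up. Even this short function has to commit to a concrete data representation before it compiles, which is the point about data types made above.

```java
import java.util.Arrays;

class PostcodeSort {
    // Hypothetical helper: least-significant-digit (LSD) radix sort for postcodes
    // that all have exactly `width` characters, each in the ASCII range 0..127.
    static void radixSort(String[] codes, int width) {
        final int radix = 128;                  // size of the ASCII alphabet
        String[] aux = new String[codes.length];
        // Sort on the last character first, then move left; each pass is a stable
        // key-indexed counting sort, so earlier passes break ties in later ones.
        for (int d = width - 1; d >= 0; d--) {
            int[] count = new int[radix + 1];
            for (String s : codes) count[s.charAt(d) + 1]++;          // frequencies
            for (int r = 0; r < radix; r++) count[r + 1] += count[r]; // start indices
            for (String s : codes) aux[count[s.charAt(d)]++] = s;     // stable distribute
            System.arraycopy(aux, 0, codes, 0, codes.length);
        }
    }

    public static void main(String[] args) {
        String[] codes = {"10115", "01067", "80331", "20095", "01069"};
        radixSort(codes, 5);
        System.out.println(Arrays.toString(codes)); // [01067, 01069, 10115, 20095, 80331]
    }
}
```

Postcodes of differing lengths or with non-ASCII characters would need padding or a larger alphabet; that choice is exactly the kind of detail a generated function would have to fix in advance.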




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  7:33                                 ` Jean Louis
@ 2021-07-24  8:10                                   ` Eli Zaretskii
  2021-07-24  8:21                                     ` Jean Louis
  2021-07-24  8:35                                     ` Jean Louis
  0 siblings, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24  8:10 UTC (permalink / raw)
  To: Jean Louis; +Cc: mullikine, emacs-tangents, stefan, rms

> Date: Sat, 24 Jul 2021 10:33:57 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: rms@gnu.org, stefan@marxist.se, mullikine@gmail.com,
>   emacs-tangents@gnu.org
> 
> 
> Eli, I do take care of licensing when re-using somebody's software,
> and when publishing software or distributing it. 
> 
> There is nothing "radical" about it. 

Considering the license of the code is not radical, indeed.  But the
criteria you personally apply when considering that _are_ radical.
You posted enough opinions about these matters to make that abundantly
clear.

There's nothing wrong with having such views, they are your personal
views, and are entirely legitimate.  All I'm saying is that the Emacs
project should not be guided by such views, for the reasons I
explained.

> Concerns of other people are also not radical.

No, but your interpretation of those "concerns" is.

> Intention of authors is not respected even if there is legal
> circumvention in the US such as "fair use", that does not fly in
> other jurisdictions.

So you agree that the problems you raised don't seem to exist at least
in the US?

> I do understand you have some unsolved issues or something you cannot
> handle related to licensing

No, I don't have any unsolved issues.

> as you are more for technical side

??? What is that supposed to mean?

> but please don't call it "radical" as that does not teach people
> about GPL licensing.

When I see a radical view, I call it "radical".  Promoting Free
Software requires healthy pragmatism, because we want the Free
Software to flourish and remain relevant by picking up the advances in
technology.  Rejecting such new technologies just because there are some
doubts expressed by someone in some blog is "radical", and IMO
eventually detrimental to Free Software development.  We should
instead carefully and independently assess the issues and make our own
judgment based on specific details of each such development.  We
cannot run away from every idea because some people say it might cause
trouble in some cases.


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  8:10                                   ` Eli Zaretskii
@ 2021-07-24  8:21                                     ` Jean Louis
  2021-07-24  8:35                                     ` Jean Louis
  1 sibling, 0 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-24  8:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mullikine, emacs-tangents, stefan, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-24 11:11]:
> There's nothing wrong with having such views, they are your personal
> views, and are entirely legitimate.  All I'm saying is that the Emacs
> project should not be guided by such views, for the reasons I
> explained.

My question has been resolved. It is over.

Did you read it?




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  8:10                                   ` Eli Zaretskii
  2021-07-24  8:21                                     ` Jean Louis
@ 2021-07-24  8:35                                     ` Jean Louis
  2021-07-24  8:59                                       ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-24  8:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mullikine, emacs-tangents, stefan, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-24 11:11]:
> When I see a radical view, I call it "radical".  Promoting Free
> Software requires healthy pragmatism, because we want the Free
> Software to flourish and remain relevant by picking up the advances in
> technology.  Rejecting such new technologies just because there are some
> doubts expressed by someone in some blog is "radical", and IMO
> eventually detrimental to Free Software development.  We should
> instead carefully and independently assess the issues and make our own
> judgment based on specific details of each such development.  We
> cannot run away from every idea because some people say it might cause
> trouble in some cases.

I am totally for advances in technology as long as we foster free
software and freedom in computing. We have too little AI today, in
the 21st century.

The question is definitely not as general as presented in your
paragraph above. It is very specific: how to solve the licensing issues.

There is no doubt that code may be copied verbatim, as GitHub's own
official documentation on Copilot shows:
https://docs.github.com/en/github/copilot/research-recitation

It is not related to various other AIs, etc. I am not sure if the same
AI is even used in Pen.el. It may not be relevant. In the Copilot
documentation it says: 

Quote:

This investigation demonstrates that GitHub Copilot can quote a body
of code verbatim, but that it rarely does so, and when it does, it
mostly quotes code that everybody quotes, and mostly at the beginning
of a file, as if to break the ice.

Additionally, I have been using OpenAI and found that verbatim
responses are not as rare as 0.1 percent: I could find the pages on the
Internet from which paragraphs were quoted verbatim. I am still in the
playground, and I get paragraphs from our competitors' websites as
responses. I have yet to discover the "I" in the AI in OpenAI's
playground.
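Checks like the one described above were done by hand. One rough way to quantify "verbatim" overlap between a generated text and a known source is the longest run of identical tokens the two share. The Java sketch below is a hypothetical illustration, not GitHub's or OpenAI's method; the class and method names are made up, and it splits on whitespace only, so renaming a single variable breaks a run.

```java
class VerbatimOverlap {
    // Hypothetical tokenizer: split on whitespace only. A real checker would
    // use a language-aware lexer and might normalise identifiers.
    static String[] tokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Length, in tokens, of the longest run that appears verbatim in both texts
    // (longest common substring over tokens, O(n*m) dynamic programming).
    static int longestCommonRun(String generated, String source) {
        String[] a = tokens(generated);
        String[] b = tokens(source);
        int best = 0;
        int[] prev = new int[b.length + 1];    // run lengths ending at a[i-2], b[j-1]
        for (int i = 1; i <= a.length; i++) {
            int[] cur = new int[b.length + 1]; // run lengths ending at a[i-1], b[j-1]
            for (int j = 1; j <= b.length; j++) {
                if (a[i - 1].equals(b[j - 1])) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    public static void main(String[] args) {
        String source = "int unit = si ? 1000 : 1024;\nif (bytes < unit) return bytes + \" B\";";
        String generated = "int unit = si ? 1000 : 1024; if (bytes < unit) return bytes + \" B\";";
        // Identical apart from line breaks, so all 17 tokens form one run.
        System.out.println(longestCommonRun(generated, source));
    }
}
```

Where to set the threshold for "too long a verbatim run" is a policy question, not something this sketch can answer.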

The licensing issues I have raised, and for which I have found a
partial solution, are in no way about rejecting this technology;
rather, they are about adopting it in free software.

My question was how we can adopt generated code into free software
(for example, code generated by using Pen.el), given that it generates
code from other GPL-licensed free software without attribution.

It is partially resolved in the US, though unproven and in great
conflict with authors. That gives no assurance. I am not sure whether
the code I generate is really "original" and infringement-free.

Does OpenAI give me any kind of guarantee that I will be held free of
liability if I use that code?

While those issues may be temporarily brushed off with "fair use" in
the US, they remain unsolved there until the first few court cases or
class action suits, and they are not resolved at the international
level at all.

Julia Reda's statement does not apply in all jurisdictions.

At this moment there is no verified legal statement by, let us say, FSF
attorneys, legal experts, or some other organization that confirms
the legal status of such generated code or text at the international
level.

Jean


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  7:54                                       ` Jean Louis
@ 2021-07-24  8:50                                         ` Eli Zaretskii
  2021-07-24 16:16                                           ` Jean Louis
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24  8:50 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, emacs-tangents, mullikine, rms

> Date: Sat, 24 Jul 2021 10:54:07 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: mullikine@gmail.com, stefan@marxist.se, emacs-tangents@gnu.org,
>   rms@gnu.org
> 
> * Eli Zaretskii <eliz@gnu.org> [2021-07-24 10:33]:
> > > I have never said nor implied "you work for me" and I cannot see how
> > > that is relevant to the question.
> > 
> > You consistently take the stance that implies, and many times
> > explicitly states, that (a) you represent the views of the GNU
> > project
> 
> I have never in my life said so; please stop it.

Please re-read your postings.  They say otherwise.  I realize that you
didn't intend that, but that's how your words sound, here and
elsewhere.  If you want to avoid such interpretation, please take
great care to tone down your categorical statements, use qualifiers
like IMO and AFAIK and "I think", and generally make sure your words
say that's your opinion, not an absolute truth, let alone something
the GNU project decided to do or is doing.

> For example, when proprietary software is recommended, I will say that
> the GNU project does not endorse it.

Please consider adding "AFAIK" or somesuch, otherwise this sounds like
you are speaking for the project.

> I was interested to find out how legality is resolved when reusing
> code generated by AI. Nothing else beyond that. And I have found
> references and drawn conclusions.

And I challenged your conclusions.  Nothing beyond that.

> > That's not what the above document concludes.  Quote:
> 
> My conclusions are not what the document concludes. I never said so.

You didn't say otherwise, either.  Someone who didn't have time to
read the document could think that it's what the document concluded.
That's why I posted the actual quotation, so that people could draw
their own conclusions, or decide they do want to read the document
itself.

> The document is related exclusively to US jurisdiction and its final
> determination is vague. It is their proposal, and not a court
> decision. They have openly said that they are financially strong
> enough and will try to defend any cases by using "fair use" doctrine
> in the US.

All true, but the same can be said about all the other posts and blogs
you quoted.  Which leaves us none the wiser about the problem.  We
still need to assess the original facts and data and make our own
conclusions.  I did, and my conclusions are starkly different from
yours.

> Everything in that document relates to US jurisdiction only. It is
> a one-sided, and thus biased, document, clearly opposing the views of
> many GPL authors.
> 
> It is the corporation's defense argument for ripping off GPL
> software. They found a way and wish to play that card.
> 
> It is a document of conflict, not a document of friendship or
> collaboration.  It is a document of one-sided defense, not a document
> that contributes to free software.
> 
> It is a document that defends proprietary software, not a document
> that fosters free software.
> 
> Thus it does not resolve anything in the community. It serves one
> party only.

And yet you draw conclusions from it about how GNU and Emacs should
behave about this technology?  How does that make sense?

> > > Conclusion as of 2021-07-24 is that authors cannot be sure as there
> > > are legal uncertainties.
> > 
> > Those are your personal conclusions.
> 
> Personal, definitely, but I am not the only one with this opinion,

How does that make any difference?  Should GNU and Emacs take the fact
that several people expressed this opinion as meaning it is the truth
for our purposes?  Especially since some (perhaps many) of them are
driven by motivation that is explicitly anti-Free Software?

> which should be clear from the references, which were probably left unread.

Please don't assume I didn't read those references.  You have no basis
for making such nasty assumptions about me, let alone expressing them
publicly here.

> The legality of free software worldwide was ensured by the GPL
> license. Maybe the license was never designed to be international, but
> it does function well internationally.
> 
> The legality of AI-generated code under the "fair use" doctrine in the
> US is, at this point of development, still far from functioning well
> internationally.

Which, to me, says that we should carefully examine this issue by
ourselves, not draw any premature conclusions from the hoop-la out
there.


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  8:35                                     ` Jean Louis
@ 2021-07-24  8:59                                       ` Eli Zaretskii
  2021-07-24 16:18                                         ` Jean Louis
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24  8:59 UTC (permalink / raw)
  To: Jean Louis; +Cc: mullikine, emacs-tangents, stefan, rms

> Date: Sat, 24 Jul 2021 11:35:41 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: rms@gnu.org, stefan@marxist.se, mullikine@gmail.com,
>   emacs-tangents@gnu.org
> 
> There is no doubt that code may be copied verbatim, as GitHub's own
> official documentation on Copilot shows:
> https://docs.github.com/en/github/copilot/research-recitation

So we won't use that Github stuff.  Which we won't anyway, because we
avoid Github in general.

How does this help us decide about general usability of this
technology?  It doesn't.

> At this moment there is no verified legal statement by, let us say, FSF
> attorneys, legal experts, or some other organization that confirms
> the legal status of such generated code or text at the international
> level.

I suggest leaving the legal stuff to the legal experts.  We should
assess the data and provide them with facts, not make the decisions
for them.  Which means this discussion, and your suggestions that we
should already stay away from this technology, are premature at best, if
not in the wrong place at the wrong time.

So please don't discourage independent assessment of this technology
by posting half-baked "legal" opinions from people with questionable
motivation (present company excluded, of course) representing that
this technology is legally incompatible with Free Software.  IMO, we
don't yet know enough for any such definitive opinions and
conclusions.


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  7:59                                   ` Eli Zaretskii
@ 2021-07-24  9:31                                     ` Philip Kaludercic
  2021-07-24 11:19                                       ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Philip Kaludercic @ 2021-07-24  9:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Philip Kaludercic <philipk@posteo.net>
>> Cc: rms@gnu.org,  mullikine@gmail.com,  emacs-tangents@gnu.org,
>>   stefan@marxist.se,  bugs@gnu.support
>> Date: Sat, 24 Jul 2021 07:41:21 +0000
>> 
>> > It cannot be a verbatim copy, because at least the variables, and
>> > sometimes also the data types, need to be renamed.  Whether the result
>> > is still under the original copyright cannot be established without
>> > actually comparing the two versions of the code.  So any general
>> > flat rejection of the idea of these services on these grounds is not
>> > serious, IMO.
>> 
>> Not necessarily, if it generates a pure, top-level function. Someone
>> could type something like "Sort list of postcodes" and it generates a
>> Radix Sort function. And if this is part of some code that was copied a
>> lot, the model might be even more likely to generate it verbatim.
>
> A sort function must state at least the data type before it can be
> compiled.  And if you are talking about pseudo-code that is data-type
> agnostic, then that's an algorithm, and is not copyrightable, AFAIK.

No, I was thinking about concrete code, that depending on the language
might even just rely on the standard library, especially if the language
has generics. Seeing how often SO code has been found in random
repositories[0], I don't think it is improbable that the trained models
might notice these patterns.

[0] For example https://programming.guide/worlds-most-copied-so-snippet.html

-- 
	Philip Kaludercic




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  9:31                                     ` Philip Kaludercic
@ 2021-07-24 11:19                                       ` Eli Zaretskii
  2021-07-24 14:16                                         ` Philip Kaludercic
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24 11:19 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

> From: Philip Kaludercic <philipk@posteo.net>
> Cc: rms@gnu.org,  mullikine@gmail.com,  emacs-tangents@gnu.org,
>   stefan@marxist.se,  bugs@gnu.support
> Date: Sat, 24 Jul 2021 09:31:38 +0000
> 
> >> Not necessarily, if it generates a pure, top-level function. Someone
> >> could type something like "Sort list of postcodes" and it generates a
> >> Radix Sort function. And if this is part of some code that was copied a
> >> lot, the model might be even more likely to generate it verbatim.
> >
> > A sort function must state at least the data type before it can be
> > compiled.  And if you are talking about pseudo-code that is data-type
> > agnostic, then that's an algorithm, and is not copyrightable, AFAIK.
> 
> No, I was thinking about concrete code, that depending on the language
> might even just rely on the standard library, especially if the language
> has generics. Seeing how often SO code has been found in random
> repositories[0], I don't think it is improbable that the trained models
> might notice these patterns.

Sorry, I don't understand what you have in mind.  Can you show an
example of useful code that could be copied verbatim into a program
without at least some renaming, without breaking the program?




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 11:19                                       ` Eli Zaretskii
@ 2021-07-24 14:16                                         ` Philip Kaludercic
  2021-07-24 14:37                                           ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Philip Kaludercic @ 2021-07-24 14:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

Eli Zaretskii <eliz@gnu.org> writes:

>> > A sort function must state at least the data type before it can be
>> > compiled.  And if you are talking about pseudo-code that is data-type
>> > agnostic, then that's an algorithm, and is not copyrightable, AFAIK.
>> 
>> No, I was thinking about concrete code, that depending on the language
>> might even just rely on the standard library, especially if the language
>> has generics. Seeing how often SO code has been found in random
>> repositories[0], I don't think it is improbable that the trained models
>> might notice these patterns.
>
> Sorry, I don't understand what you have in mind.  Can you show an
> example of useful code that could be copied verbatim into a program
> without at least some renaming, without breaking the program?

To take the example from the article I mentioned above

    public static String humanReadableByteCount(long bytes, boolean si) {
        int unit = si ? 1000 : 1024;
        if (bytes < unit) return bytes + " B";
        int exp = (int) (Math.log(bytes) / Math.log(unit));
        String pre = (si ? "kMGTPE" : "KMGTPE").charAt(exp-1) + (si ? "" : "i");
        return String.format("%.1f %sB", bytes / Math.pow(unit, exp), pre);
    }

can be copied into a Java program, and assuming that there is no other
method called humanReadableByteCount in the same class, it should
compile and run without renaming or re-typing. Copilot might generate
this from a comment like,

    // Convert a byte count to a human-readable string

since it is mentioned over 6000 times on GitHub (and this method even
has a bug, as the article explains -- but that is a totally different
issue).
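To see the kind of bug alluded to here, the sketch below runs the snippet exactly as quoted next to a hypothetical corrected variant (the name humanReadableByteCountFixed is made up). As far as I understand the article, the problem is that the prefix is chosen before rounding: 999,999 bytes in SI mode selects "k" and then rounds 999.999 up to "1000.0 kB" instead of "1.0 MB". The fixed variant is only a sketch and assumes a non-negative byte count.

```java
import java.util.Locale;

class HumanReadableBytes {
    // The widely copied snippet, verbatim as quoted in the thread.
    public static String humanReadableByteCount(long bytes, boolean si) {
        int unit = si ? 1000 : 1024;
        if (bytes < unit) return bytes + " B";
        int exp = (int) (Math.log(bytes) / Math.log(unit));
        String pre = (si ? "kMGTPE" : "KMGTPE").charAt(exp-1) + (si ? "" : "i");
        return String.format("%.1f %sB", bytes / Math.pow(unit, exp), pre);
    }

    // Hypothetical corrected variant: choose the prefix *after* accounting for
    // rounding to one decimal place. Assumes bytes >= 0.
    public static String humanReadableByteCountFixed(long bytes, boolean si) {
        int unit = si ? 1000 : 1024;
        if (bytes < unit) return bytes + " B";
        double value = (double) bytes / unit;
        int exp = 1;
        // Move to the next prefix while the value would print as >= unit
        // (999.999 would print as "1000.0"); stop at exa, the largest prefix.
        while (exp < 6 && Math.round(value * 10) >= unit * 10L) {
            value /= unit;
            exp++;
        }
        String pre = (si ? "kMGTPE" : "KMGTPE").charAt(exp - 1) + (si ? "" : "i");
        // Locale.ROOT keeps '.' as the decimal separator whatever the JVM locale.
        return String.format(Locale.ROOT, "%.1f %sB", value, pre);
    }

    public static void main(String[] args) {
        // Prefix chosen before rounding: 1000.0 kB (separator depends on the default locale).
        System.out.println(humanReadableByteCount(999_999L, true));
        System.out.println(humanReadableByteCountFixed(999_999L, true)); // 1.0 MB
    }
}
```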

-- 
	Philip Kaludercic




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 14:16                                         ` Philip Kaludercic
@ 2021-07-24 14:37                                           ` Eli Zaretskii
  2021-07-24 14:49                                             ` Philip Kaludercic
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24 14:37 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

> From: Philip Kaludercic <philipk@posteo.net>
> Cc: rms@gnu.org,  mullikine@gmail.com,  emacs-tangents@gnu.org,
>   stefan@marxist.se,  bugs@gnu.support
> Date: Sat, 24 Jul 2021 14:16:55 +0000
> 
> > Sorry, I don't understand what you have in mind.  Can you show an
> > example of useful code that could be copied verbatim into a program
> > without at least some renaming, without breaking the program?
> 
> To take the example from the article I mentioned above
> 
>     public static String humanReadableByteCount(long bytes, boolean si) {
>         int unit = si ? 1000 : 1024;
>         if (bytes < unit) return bytes + " B";
>         int exp = (int) (Math.log(bytes) / Math.log(unit));
>         String pre = (si ? "kMGTPE" : "KMGTPE").charAt(exp-1) + (si ? "" : "i");
>         return String.format("%.1f %sB", bytes / Math.pow(unit, exp), pre);
>     }
> 
> can be copied into a Java program, and assuming that there is no other
> method called humanReadableByteCount in the same class, it should
> compile and run without renaming or re-typing.

How would one know it's 'long' and not some other data type?

> Copilot might generate this from a comment like,
> 
>     // Convert a byte count to a human-readable string
> 
> since it is mentioned over 6000 times on GitHub (and this method even
> has a bug, as the article explains -- but that is a totally different
> issue).

That's not how AI works: it doesn't just count the number of times
something is mentioned.  That usually leads to unsatisfactory results.




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 14:37                                           ` Eli Zaretskii
@ 2021-07-24 14:49                                             ` Philip Kaludercic
  2021-07-24 15:13                                               ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Philip Kaludercic @ 2021-07-24 14:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Philip Kaludercic <philipk@posteo.net>
>> Cc: rms@gnu.org,  mullikine@gmail.com,  emacs-tangents@gnu.org,
>>   stefan@marxist.se,  bugs@gnu.support
>> Date: Sat, 24 Jul 2021 14:16:55 +0000
>> 
>> > Sorry, I don't understand what you have in mind.  Can you show an
>> > example of useful code that could be copied verbatim into a program
>> > without at least some renaming, without breaking the program?
>> 
>> To take the example from the article I mentioned above
>> 
>>     public static String humanReadableByteCount(long bytes, boolean si) {
>>         int unit = si ? 1000 : 1024;
>>         if (bytes < unit) return bytes + " B";
>>         int exp = (int) (Math.log(bytes) / Math.log(unit));
>>         String pre = (si ? "kMGTPE" : "KMGTPE").charAt(exp-1) + (si ? "" : "i");
>>         return String.format("%.1f %sB", bytes / Math.pow(unit, exp), pre);
>>     }
>> 
>> can be copied into a Java program, and assuming that there is no other
>> method called humanReadableByteCount in the same class, it should
>> compile and run without renaming or re-typing.
>
> How would one know it's 'long' and not some other data type?

I am not sure what you mean. "long" makes sense here because Java will
automatically up-cast any other type to fit.
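On the 'long' question: Java applies implicit widening primitive conversions at call sites, but only from the narrower integral types (byte, short, char, int) and, via unboxing, their wrapper classes; floating-point arguments need an explicit cast. A minimal sketch, with a made-up class and method name:

```java
class WideningDemo {
    // Same parameter type as the snippet: a primitive long.
    static long asLong(long bytes) {
        return bytes;
    }

    public static void main(String[] args) {
        byte b = 1;
        short s = 2;
        char c = 'A';        // widens to its UTF-16 code unit, 65
        int i = 3;
        Integer boxed = 4;   // unboxing to int, then widening to long
        long sum = asLong(b) + asLong(s) + asLong(c) + asLong(i) + asLong(boxed);
        System.out.println(sum);              // 1 + 2 + 65 + 3 + 4 = 75

        double d = 5.7;
        // asLong(d);                         // compile error: double -> long is narrowing
        System.out.println(asLong((long) d)); // explicit cast truncates toward zero: 5
    }
}
```

So the signature works unchanged for the integer types a byte count would normally use, which is the case in the example.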

>> Copilot might generate this from a comment like,
>> 
>>     // Convert a byte count to a human-readable string
>> 
>> since it is mentioned over 6000 times on GitHub (and this method even
>> has a bug, as the article explains -- but that is a totally different
>> issue).
>
> That's not how AI works: it doesn't just count the number of times
> something is mentioned.  That usually leads to unsatisfactory results.

Of course, that would be oversimplifying. At the same time, if the
training samples have common patterns, a model is more likely to
reproduce that behaviour. But since these are neural networks we are
talking about, it is hard to determine causality to begin with, which
probably makes the whole situation even more difficult (speaking as a
non-lawyer).

-- 
	Philip Kaludercic




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 14:49                                             ` Philip Kaludercic
@ 2021-07-24 15:13                                               ` Eli Zaretskii
  0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24 15:13 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: stefan, emacs-tangents, mullikine, rms, bugs

> From: Philip Kaludercic <philipk@posteo.net>
> Cc: rms@gnu.org,  mullikine@gmail.com,  emacs-tangents@gnu.org,
>   stefan@marxist.se,  bugs@gnu.support
> Date: Sat, 24 Jul 2021 14:49:02 +0000
> 
> > How would one know it's 'long' and not some other data type?
> 
> I am not sure what you mean. "long" makes sense here because Java will
> automatically up-cast any other type to fit.

So you came up with perhaps the single example that exists in the
whole world where the issues I mentioned _might_ not matter, and even
that only under some assumptions.  A feature that aspires to be
generally useful cannot possibly depend on such problematic
assumptions.

> >> since it is mentioned over 6000 times on GitHub (and this method even
> >> has a bug, as the article explains -- but that is a totally different
> >> issue).
> >
> > That's not how AI works: it doesn't just count the number of times
> > something is mentioned.  That usually leads to unsatisfactory results.
> 
> Of course, that would be oversimplifying. At the same time, if the
> training samples have common patterns, a model is more likely to
> reproduce that behaviour.

No, that's not it: a single example repeated in identical form many
times doesn't reinforce the learned pattern.  You need many similar,
but different code samples, and most probably in different languages.




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  8:50                                         ` Eli Zaretskii
@ 2021-07-24 16:16                                           ` Jean Louis
  2021-07-24 16:44                                             ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-24 16:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-24 11:51]:
> Please re-read your postings.  They say otherwise.  I realize that you
> didn't intend that, but that's how your words sound, here and
> elsewhere.  If you want to avoid such interpretation, please take
> great care to tone down your categorical statements, use qualifiers
> like IMO and AFAIK and "I think", and generally make sure your words
> say that's your opinion, not an absolute truth, let alone something
> the GNU project decided to do or is doing.
> 
> > For example, when proprietary software is recommended, I will say that
> > the GNU project does not endorse it.

> Please consider adding "AFAIK" or somesuch, otherwise this sounds like
> you are speaking for the project.

Thanks, but no, all opinions are private. What is official for GNU
project is on the GNU website, in the GNU manuals and other
documentation. 

Though, that is not the subject of this thread.

You have stated your opinion, but you did not mention even one legal
reference regarding the licensing compliance questions that I
raised. So I regard it as your differing opinion, and without
references to the legal aspects I do not find it relevant.

I find Julia Reda's opinion relevant, and I found it online. I wish I
had found it as an answer on this mailing list before finding it
online, but that was not the case.

> And yet you draw conclusions from it about how GNU and Emacs should
> behave about this technology?  How does that make sense?

I never said so; that is a misunderstanding. Once again, my question
about GPL licensing compliance is relevant to the GNU project and to
authors of GPL-licensed software.

Licensing is a legal matter. It is not related to the technological
parts. I have never mentioned technology or how GNU and Emacs should
behave.

In this Pen.el thread I wanted to find out how compliance with
licenses is resolved. Don't make a fuss about simple questions. Maybe
it is better to wait and let somebody else jump in and answer it.

Are you personally pursuing me so that I stop asking questions?

It does not feel welcoming; it seems as if I did something bad to
you and you are pushing hard to stop me from asking such a simple,
banal question.

We have been talking about GPL licensing compliance for years, in
various discussions both within and outside the GNU project. I asked
German companies about compliance with GNU GPL software, had a very
nice conversation with them, and they agreed to comply and provided
the sources.

Please take it easy, as I am asking a logical question, and don't call
me radical, as that has negative connotations.

> > The legality of AI-generated code under the "fair use" doctrine in the
> > US is, at this point of development, still far from functioning well
> > internationally.
> 
> Which, to me, says that we should carefully examine this issue by
> ourselves, not draw any premature conclusions from the hoop-la out
> there.

Remember that it was I who first responded to the original poster,
installed Pen.el and tried to run it. At that time I did not have an
OpenAI key, but now I have one.

From that, it should be obvious that I am interested in the
technology.

Without even looking online (due to my limited Internet access) I
asked about licensing compliance. There was no answer until I found
one today in online sources. That question is related to adopting the
technology, not to rejecting it.

If one wishes to adopt anything for private use, one has to have
permission, or in this case the "fair use" exemption in the US. It
should be obvious that I have referenced legal advisors, the attorneys
who wrote that document, including Julia Reda, known as an activist in
Germany, and it was I who found those references and listed them here.

Besides those really deficient expressions: if you have constructive
references on how each jurisdiction would accept "fair use", let me
know; otherwise, leave this discussion and me in peace.

Stay on topic, and don't call me names; you and I did not herd sheep
together.

Jean


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  8:59                                       ` Eli Zaretskii
@ 2021-07-24 16:18                                         ` Jean Louis
  2021-07-24 16:45                                           ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-24 16:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mullikine, emacs-tangents, stefan, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-24 12:00]:
> So please don't discourage independent assessment of this technology
> by posting half-baked "legal" opinions from people with questionable
> motivation (present company excluded, of course) representing that
> this technology is legally incompatible with Free Software.  IMO, we
> don't yet know enough for any such definitive opinions and
> conclusions.

Quite the contrary, the GPL licensing compliance question is related to
adoption and expansion of free software.

I have never stated it is incompatible with free software; I have
asked how GPL licensing compliance is solved.


Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/





* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 16:16                                           ` Jean Louis
@ 2021-07-24 16:44                                             ` Eli Zaretskii
  2021-07-24 18:01                                               ` Jean Louis
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24 16:44 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, emacs-tangents, mullikine, rms

> Date: Sat, 24 Jul 2021 19:16:34 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: mullikine@gmail.com, stefan@marxist.se, emacs-tangents@gnu.org,
>   rms@gnu.org
> 
> > > For example when there is recommendation of proprietary software I
> > > will say that GNU project does not endorse such.
> 
> > Please consider adding "AFAIK" or somesuch, otherwise this sounds like
> > you are speaking for the project.
> 
> Thanks, but no, all opinions are private.

Again, yours don't sound private, and many people will become confused
at best.

> You have stated your opinion, though you did not mention even one
> legal reference to the questions about licensing compliance that I
> have raised. So I take it as your differing opinion, though without
> references to legalities I do not find it relevant.

This is a public list.  It is important for me to state my differing
opinions so that people could make up their own minds.

> You seem to be personally chasing me so that I stop asking questions?

No.  I'm trying to correct the wrong impression your postings could
have made on people reading them.

> It does not really seem welcoming; it seems like I did something bad
> to you and you are pushing with force to stop me asking such a simple,
> banal question.

I said exactly what I think was wrong with your postings.  I wish you
would change the style and wording when you speak on these issues, but
I have no real control over what you will do.

> Please take it easy, as I am asking a logical question; don't call
> me radical, as that has negative connotations.

I explained in detail why I said that.  I'm sorry to conclude that you
disregard that.




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 16:18                                         ` Jean Louis
@ 2021-07-24 16:45                                           ` Eli Zaretskii
  2021-07-24 17:57                                             ` Jean Louis
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24 16:45 UTC (permalink / raw)
  To: Jean Louis; +Cc: mullikine, emacs-tangents, stefan, rms

> Date: Sat, 24 Jul 2021 19:18:02 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: rms@gnu.org, stefan@marxist.se, mullikine@gmail.com,
>   emacs-tangents@gnu.org
> 
> * Eli Zaretskii <eliz@gnu.org> [2021-07-24 12:00]:
> > So please don't discourage independent assessment of this technology
> > by posting half-baked "legal" opinions from people with questionable
> > motivation (present company excluded, of course) representing that
> > this technology is legally incompatible with Free Software.  IMO, we
> > don't yet know enough for any such definitive opinions and
> > conclusions.
> 
> Quite the contrary, the GPL licensing compliance question is related to
> adoption and expansion of free software.
> 
> I have never stated it is incompatible with free software; I have
> asked how GPL licensing compliance is solved.

No, you said we shouldn't use this.




* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 16:45                                           ` Eli Zaretskii
@ 2021-07-24 17:57                                             ` Jean Louis
  2021-07-24 18:15                                               ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Jean Louis @ 2021-07-24 17:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mullikine, emacs-tangents, stefan, rms

* Eli Zaretskii <eliz@gnu.org> [2021-07-24 19:46]:

> > I have never stated it is incompatible with free software; I have
> > asked how GPL licensing compliance is solved.
> 
> No, you said we shouldn't use this.

Sorry for the misunderstandings.

I don't remember ever saying we shouldn't use this. It does not seem
logical to me, as I remember my intention was to find out how
licensing compliance is solved, so that it becomes clear how it
works.

The question was directed to the author of Pen.el and there was no
clear answer from you either, so I found through my own online
research that, at least in the US for now, it is based on the "fair
use" doctrine.

When we take the word "fair" in its original definition, it should be
obvious from online comments that many GPL authors do not really find
it "fair". It is, however, the one defense that all present similar AI
models have in common: they rely on the "fair use" doctrine. We will
see how that turns out.

Also, AI as such is not related to this particular case of using GPL
and other free software; this is just one small application of
artificial intelligence technology overall, and there are many other
applications that are totally outside this context.

Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 16:44                                             ` Eli Zaretskii
@ 2021-07-24 18:01                                               ` Jean Louis
  0 siblings, 0 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-24 18:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stefan, emacs-tangents, mullikine, rms

Eli, my question is solved: the subject was "Pen.el" and I found the
legal justification from third parties. I would like to try Pen.el,
but I have technical problems here.

Thanks for the suggestions, I will think about them.

Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24 17:57                                             ` Jean Louis
@ 2021-07-24 18:15                                               ` Eli Zaretskii
  0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2021-07-24 18:15 UTC (permalink / raw)
  To: Jean Louis; +Cc: mullikine, emacs-tangents, stefan, rms

> Date: Sat, 24 Jul 2021 20:57:42 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: rms@gnu.org, stefan@marxist.se, mullikine@gmail.com,
>   emacs-tangents@gnu.org
> 
> * Eli Zaretskii <eliz@gnu.org> [2021-07-24 19:46]:
> 
> > > I have never stated it is incompatible with free software; I have
> > > asked how GPL licensing compliance is solved.
> > 
> > No, you said we shouldn't use this.
> 
> Sorry for the misunderstandings.

To avoid such misunderstandings, I suggest toning down your language
when you are talking about licensing issues associated with some
technologies or products, so that what you write couldn't be
interpreted as saying that there are legal problems which prevent our
use of those technologies and products.

> The question was directed to the author of Pen.el and there was no
> clear answer from you either, so I found through my own online
> research that, at least in the US for now, it is based on the "fair
> use" doctrine.
> 
> When we take the word "fair" in its original definition, it should be
> obvious from online comments that many GPL authors do not really find
> it "fair". It is, however, the one defense that all present similar AI
> models have in common: they rely on the "fair use" doctrine. We will
> see how that turns out.

That is a separate issue, which is IMO completely unrelated.  Emacs is
Free Software, and is distributed under GPL, so for Emacs it is OK to
allow users to use other GPL code out there in their programs.  That
there are producers of proprietary software who use pieces of GPL code
in their proprietary products without complying with GPL is completely
unrelated to what the Emacs project can do with this technology.


* Re: Help building Pen.el (GPT for emacs)
  2021-07-23  6:51                     ` Shane Mulligan
  2021-07-23 10:12                       ` Jean Louis
@ 2021-07-25  1:06                       ` Richard Stallman
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Stallman @ 2021-07-25  1:06 UTC (permalink / raw)
  To: Shane Mulligan; +Cc: eliz, stefan, emacs-tangents, bugs

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > GPT is potentially the best thing to happen to emacs in a very long time.

If GPT-3 were released as a free program, we might want to use it.
Perhaps it would be very useful.

Please correct me if I am mistaken, but I think GPT-3 is an unreleased
program which people can use only via SaaSS.  SaaSS stands for Service
as a Software Substitute.  It means that a "service" accepts your
data, does a specific computing job, and sends you back the results.
Using such a service is morally mostly equivalent to running a nonfree
program -- so we cannot suggest that anyone DO that.

See https://gnu.org/philosophy/who-does-that-server-really-serve.html for
more explanation of this issue.

  > The way this will work is you will download
  > the free GPT model, such as GPT-j, GPT-neo or
  > GPT-neox and then you will have an offline and
  > private alternative to many things previously
  > you would go online for.

Are you saying there is a free replacement for GPT-3 and we can
run these free models with it on our own computers?

That could be good news, because we could actually use it.

  > It will bring back power from the corporations and save it to your
  > computer,

That sounds exciting but it is not concrete enough to think about.

  > open source and transparent,

What follows is a side issue, but it's an important side issue.

"Open source" is the slogan of a campaign we don't advocate.
It is partly similar to the free software movement but discards
the moral foundation: the idea of freedom.

We don't use the slogan "open source" because we want to advocate
freedom, not forget it.

See https://gnu.org/philosophy/open-source-misses-the-point.html
for more explanation of the difference between free software and open
source.  See also https://thebaffler.com/salvos/the-meme-hustler for
Evgeny Morozov's article on the same point.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)


* Re: Help building Pen.el (GPT for emacs)
  2021-07-24  3:07                                   ` Jean Louis
  2021-07-24  7:32                                     ` Eli Zaretskii
@ 2021-07-25  1:09                                     ` Richard Stallman
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Stallman @ 2021-07-25  1:09 UTC (permalink / raw)
  To: Jean Louis; +Cc: stefan, eliz, mullikine, emacs-tangents

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > And the fact that it is "rare" does not make it less of a problem for
  > copyright purposes, as the new author cannot know which part of the
  > code is one of those "rare" verbatim copies.

I think that is correct.  Falling into this pit may be unusual, but
that doesn't make it painless.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Help building Pen.el (GPT for emacs)
  2021-07-23 13:39                                 ` Shane Mulligan
  2021-07-23 14:39                                   ` Jean Louis
@ 2021-07-26  0:16                                   ` Richard Stallman
  2021-07-26  0:28                                     ` Shane Mulligan
  1 sibling, 1 reply; 46+ messages in thread
From: Richard Stallman @ 2021-07-26  0:16 UTC (permalink / raw)
  To: Shane Mulligan; +Cc: eliz, stefan, emacs-tangents, bugs

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > GPT turns emacs into something very powerful
  > beyond your current comprehension. It's so
  > profound that it will replace many of the
  > online and offline services you may have come
  > to take for granted. It goes way beyond that too.

Unfortunately, telling me that something is "powerful beyond [my]
current comprehension" does not help me start to comprehend any of it.

Would you like to name some of the services that GPT would replace?
I might learn something concrete from that.

  > Here is the recording of me doing that:

  > https://asciinema.org/a/SCUhm3l11N3w5eilUfewBDCiP

I looked at that page, but I have no idea what it means.  The page
shows three boxes side by side.  Each seems to contain some code, or
maybe parameter specs, in a language I don't know.  I clicked on the
first box and it brought me to a similar page with three other boxes.

It talks about "asciicasts" but I don't know what that means.
If it is something to be viewed, how can I do so?

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)


* Re: Help building Pen.el (GPT for emacs)
  2021-07-26  0:16                                   ` Richard Stallman
@ 2021-07-26  0:28                                     ` Shane Mulligan
  2021-07-30  3:20                                       ` Shane Mulligan
  0 siblings, 1 reply; 46+ messages in thread
From: Shane Mulligan @ 2021-07-26  0:28 UTC (permalink / raw)
  To: rms; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, Jean Louis

[-- Attachment #1: Type: text/plain, Size: 2511 bytes --]

Hey Richard and all.

I have just participated in the Augment Minds unconference and have a
recorded demo of Pen.el.

I will also be presenting the demo to Nat Friedman. I have made some
references to the new Codex model and how it has stolen the inspiration
from free software.

The point I'm making is this: Pen.el, and software which integrates GPT
into the operating system, is the future, and I'm alerting GNU to this
first, but I'm also showing GitHub. This is for the following reasons:

- The Copilot/Codex model is a disgrace
- We need a free repository of prompts and prompt functions for emacs

I hope the demo, which I will send in the next day or two (or whenever it
becomes available), will be informative. It will be easier to follow than
the asciicast.

Thank you.

Shane Mulligan

How to contact me:
🇦🇺 00 61 421 641 250
🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
mullikine@gmail.com



[-- Attachment #2: Type: text/html, Size: 4961 bytes --]


* Re: Help building Pen.el (GPT for emacs)
  2021-07-26  0:28                                     ` Shane Mulligan
@ 2021-07-30  3:20                                       ` Shane Mulligan
  2021-07-30  6:55                                         ` Jean Louis
  0 siblings, 1 reply; 46+ messages in thread
From: Shane Mulligan @ 2021-07-30  3:20 UTC (permalink / raw)
  To: rms; +Cc: Eli Zaretskii, Stefan Kangas, emacs-tangents, Jean Louis

[-- Attachment #1.1: Type: text/plain, Size: 3705 bytes --]

Hey guys.

In the last week I have been writing a thesis on Imaginary Programming,
which aims to make all of this clear and formalised.

I am very sorry if I have sounded frustrated, but I think that this is
very important for free software, and a GPL-4 may be required to protect
people. I also think that Copilot and OpenAI's Codex and GPT-3 models
infringe upon the spirit of GPL-3 code.

I have attached the thesis to this email.

https://github.com/semiosis/imaginary-programming-thesis/blob/master/thesis.org

I am working around the clock to finish this thesis and have it published,
but it's really important to have these protections in place before the
huge suite of SaaS offerings and Microsoft apps hits the market, using
Copilot and Codex to generate derivative works and applications built
upon the backs of free software developers.

Thank you.

Shane Mulligan

How to contact me:
🇦🇺 00 61 421 641 250
🇳🇿 00 64 21 1462 759 <+64-21-1462-759>
mullikine@gmail.com


[-- Attachment #1.2: Type: text/html, Size: 8049 bytes --]

[-- Attachment #2: thesis.org --]
[-- Type: application/octet-stream, Size: 26754 bytes --]

* Imaginary programming is a new programming paradigm based on language models

** Abstract
Imaginary code is code whose behaviour is
influenced by LMs (language models). The side
effects or return values of imaginary code,
therefore, are imagined by an LM, but may also
be used to facilitate the imagination of the
programmer and may be considered a bicycle for
the imagination. This is very obvious when
interacting with an imaginary REPL. I will
attempt to formalise imaginary programming,
make some demonstrations of programming within
this paradigm and explore some useful data
structures and algorithms that are both
impurely and purely imaginary. I'll also give
an example of an imaginary programming
language that I have created (perhaps the
first of its kind), examplary. The motivation
for formalising imaginary programming is not
purely academic. Imaginary code needs to be
recognised as code so that it may be protected
by the GPL. I also posit that models of NL
(natural language), if trained on source code,
create a holographic representation of the
software, which I argue is a derivative work
and a reflection of the original code. I argue
that a holographic representation of software
both within (author inspiration) and without
(how it is used) is just another
representation of the software, alongside the
original source code, just as functions may be
represented differently.

** Introduction
The recently burgeoning and soon-to-diminish
programming paradigm of prompt engineering is
about to be superseded by prompt-tuning and
the fine-tuning of LMs, which will further
occlude the way that software works. Prompt
engineering has barely had its time in the
spotlight and as a result has not established
itself as a sovereign programming paradigm.

However, imaginary programming is a broader
term that encompasses all programming that
solicits LMs and uses their output to effect
change in a program's logic, and it will
outlast prompt engineering as a useful
concept.

In contrast with imaginary code, ordinary code
has not yet been contaminated by an LM, and we
say that it has no imaginary dimension to it.

Impure imaginary code is where ordinary code
intersects with pure imaginary code. An impure
imaginary function is a function that queries
an LM to directly affect its own logic or
output. We say that an impure imaginary
function is grounded [to reality] because it
connects base reality to an LM.

The output and behaviour of an impure
imaginary function is directly influenced by
base reality plus a query to an LM.
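A minimal sketch of such a function in Emacs Lisp follows, assuming
nothing about Pen.el's internals. The prompt builder is ordinary code
that grounds the query in base reality (a list of file names); the LM
is passed in as any function from a prompt string to a completion
string, so a stub can stand in for a real model. The names
=my-pick-file-prompt= and =my-pick-file= are hypothetical.

```emacs-lisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch of an impure imaginary function (not Pen.el code).
(require 'subr-x)                       ; string-join, string-trim

(defun my-pick-file-prompt (files question)
  "Ordinary code: ground QUESTION in the real list FILES."
  (format "Files: %s\nQuestion: %s\nAnswer:"
          (string-join files ", ") question))

(defun my-pick-file (files question lm)
  "Impure imaginary function: let LM answer QUESTION about FILES.
LM is any function taking a prompt string and returning a completion;
the completion's surrounding whitespace is trimmed."
  (string-trim (funcall lm (my-pick-file-prompt files question))))
```

With a stub in place of a real model,
=(my-pick-file '("a.txt" "b.txt") "Which is newest?" (lambda (_prompt) " b.txt\n"))=
returns ="b.txt"=. Passing the LM as an argument keeps the ordinary part
testable; only the imaginary part changes when a real LM is plugged in.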

The query [or prompt] to the LM may be
constructed in part manually through prompt
engineering, in part automatically via prompt
tuning, or in part constructed or eliminated
by the fine-tuning of an LM. Even after
fine-tuning, there is still a query to be
formulated to the LM, and that query may
indeed be the empty string.

Considering that large LMs such as GPT-3
can perform multiple tasks, the process of
refining a query through prompt-engineering,
prompt-tuning or fine-tuning also
characterises the expected output from the LM.
All that is left is to map a prompt along with
its associated LM to a function and then you
have a prompt function.

Prompt functions reconcile LMs with
programming languages. A prompt function is
just a function that prompts an LM. It may
optionally be parameterized with template
variables, which are substituted into the
prompt, and may also carry hyperparameters
that affect the LM's operation.

Such functions are the basis for services such
as GitHub Copilot.
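A prompt function as defined above can be sketched in Emacs Lisp as a
template, a plist of hyperparameters and an LM bundled into an ordinary
function. This is an illustration under stated assumptions, not
Pen.el's implementation: the =my-= names are hypothetical, the =<1>=,
=<2>= placeholders follow the style of the Pen.el prompt below, and the
LM is assumed to be any function taking the expanded prompt and the
hyperparameter plist.

```emacs-lisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch of a prompt function: template + hyperparameters
;; + LM, exposed as an ordinary Lisp function.  Not Pen.el's API.

(defun my-expand-template (template vals)
  "Replace each <N> in TEMPLATE with the Nth (1-based) string in VALS."
  (let ((i 0)
        (result template))
    (dolist (v vals result)
      (setq i (1+ i))
      ;; FIXEDCASE and LITERAL are t, so V is inserted verbatim.
      (setq result (replace-regexp-in-string
                    (regexp-quote (format "<%d>" i)) v result t t)))))

(defun my-run-prompt (template hyperparams lm &rest vals)
  "Expand TEMPLATE with VALS, then call LM on the prompt and HYPERPARAMS."
  (funcall lm (my-expand-template template vals) hyperparams))

(defun my-make-prompt-function (template hyperparams lm)
  "Return a function of the template variables that prompts LM."
  (apply-partially #'my-run-prompt template hyperparams lm))
```

=apply-partially= fixes the template, hyperparameters and LM, leaving a
function of the template variables alone, e.g. =(funcall pf "list files")=.
Values are substituted in order, so a value that itself contains a later
placeholder such as =<2>= would be expanded too; a fuller implementation
would guard against that.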

** Impure imaginary code is useful
Impure imaginary code is very obviously useful,
as such code utilises LMs that are trained to
perform useful tasks. GPT-3, for example, is a
generalist and only requires a tiny amount of
prompt design and/or fine-tuning to direct it
to perform the task you want.

Following are some demonstrations of using
impure imaginary code to construct part of an
imaginary programming environment, perform
code generation, transpile code and translate
world languages.

*** Useful impure imaginary functions
**** With the =Pen.el= system
The following prompt function definition
associates a prompt with an LM (OpenAI's GPT-3
davinci) and defines the parameters for a function in Emacs Lisp.

#+BEGIN_SRC yaml -n :async :results verbatim code
  title: bash one liner generator on OS from natural language
  doc: Get a bash one liner on OS from natural language
  notes:
  - "rlprompt is used here outside of pen.el"
  rlprompt: nlsh <1>
  prompt: |
      # List of one-liner shell commands for <1>.
      # Language: Shell
      # Operating System: <1>

      Input: Print the current directory
      Output: pwd
      ###
      Input: List files
      Output: ls -l
      ###
      Input: Change directory to /tmp
      Output: cd /tmp
      ###
  repeater: |
      Input: {}
      Output:
  lm-command: "openai-complete.sh"
  engine: davinci
  temperature: 0.8
  max-tokens: 60
  top-p: 1
  stop-sequences:
  - "###"
  vars:
  - Operating System
  - command
  examples:
  - Arch Linux
  - Install package
  postprocessor: 'sed ''s/^Output: //'''
  conversation-mode: true
#+END_SRC
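The =stop-sequences= and =postprocessor= fields describe how a raw
completion becomes the function's result. The sketch below imitates
that step in Emacs Lisp, assuming a raw completion string is already
in hand; the function name is hypothetical, and the =sed= expression
from the YAML is approximated with =replace-regexp-in-string= rather
than invoked.

```emacs-lisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch of the stop-sequences + postprocessor step.
(require 'subr-x)                       ; string-trim

(defun my-postprocess-completion (completion stop-sequences)
  "Cut COMPLETION at the earliest of STOP-SEQUENCES, remove a leading
\"Output: \" from every line (much as sed 's/^Output: //' does), and
trim surrounding whitespace."
  (let ((end (length completion)))
    ;; An LM API that supports stop sequences usually halts there
    ;; already; cutting locally again is a harmless safeguard.
    (dolist (stop stop-sequences)
      (let ((pos (string-match-p (regexp-quote stop) completion)))
        (when (and pos (< pos end))
          (setq end pos))))
    ;; "^" matches at the start of each line within the string.
    (string-trim
     (replace-regexp-in-string "^Output: " ""
                               (substring completion 0 end) t t))))
```

For example, a completion of ="Output: ls -l\n###\nInput: foo"= with the
stop sequence ="###"= yields ="ls -l"=.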

The following is the generated documentation
for the interactive prompt function in Emacs.

#+BEGIN_SRC text -n :async :results verbatim code
  pf-bash-one-liner-generator-from-natural-language is an interactive
  function defined in pen-example-config.el.

  Signature
  (pf-bash-one-liner-generator-from-natural-language &optional TASK-DESCRIPTION &key NO-SELECT-RESULT)

  Documentation
  bash one liner generator from natural language
  Get a bash one liner from natural language

  path:
  - /home/shane/source/git/spacemacs/prompts/prompts/bash-one-liner.prompt

  examples:
  - shift last argument

  Key Bindings
  This command is not in any keymaps.

  References
  pf-bash-one-liner-generator-from-natural-language is unused in pen-example-config.el.
#+END_SRC

Below is the generated interactive function in Emacs Lisp.

#+BEGIN_SRC emacs-lisp -n :async :results verbatim code
  (lambda
    (&optional task-description &rest --cl-rest--)
    "bash one liner generator from natural language\nGet a bash one liner from natural language\n\npath:\n- /home/shane/source/git/spacemacs/prompts/prompts/bash-one-liner.prompt\n\nexamples:\n- shift last argument\n\n(fn &optional TASK-DESCRIPTION &key NO-SELECT-RESULT)"
    (interactive
     (list
      (if mark-active
          (pen-selected-text)
        (if nil
            (etv "shift last argument")
          (read-string-hist "task-description: " "shift last argument")))))
    (let*
        ((no-select-result
          (car
           (cdr
            (plist-member --cl-rest-- ':no-select-result)))))
      (progn
        (let
            ((--cl-keys-- --cl-rest--))
          (while --cl-keys--
            (cond
             ((memq
               (car --cl-keys--)
               '(:no-select-result :allow-other-keys))
              (setq --cl-keys--
                    (cdr
                     (cdr --cl-keys--))))
             ((car
               (cdr
                (memq ':allow-other-keys --cl-rest--)))
              (setq --cl-keys-- nil))
             (t
              (error "Keyword argument %s not one of (:no-select-result)"
                     (car --cl-keys--))))))
        (cl-block pf-bash-one-liner-generator-from-natural-language
          (let*
              ((final-prompt "The following is a list of one-liners for the linux command-line:\n\n# get newest file in directory bash\n$ ls -t * | head -1\n###\n# Find with invert match - e.g. find every file that is not mp3\n$ find . -name '*' -type f -not -path '*.mp3'\n###\n# Recursively remove all \"node_modules\" folders\n$ find . -name \"node_modules\" -exec rm -rf '{}' +\n###\n# <1>\n$\n")
               (final-max-tokens
                (str
                 (if
                     (variable-p 'max-tokens)
                     (eval 'max-tokens)
                   60)))
               (final-stop-sequences
                (if
                    (variable-p 'stop-sequences)
                    (eval 'stop-sequences)
                  '("###")))
               (vals
                (mapcar 'str
                        (if
                            (not
                             (interactive-p))
                            (progn
                              (cl-loop for sym in
                                       '(task-description)
                                       for iarg in
                                       '((if mark-active
                                             (pen-selected-text)
                                           (if nil
                                               (etv "shift last argument")
                                             (read-string-hist "task-description: " "shift last argument"))))
                                       collect
                                       (let*
                                           ((initval
                                             (eval sym)))
                                         (if
                                             (and
                                              (not initval)
                                              iarg)
                                             (eval iarg)
                                           initval))))
                          (cl-loop for v in
                                   '(task-description)
                                   until
                                   (eq v '&key)
                                   collect
                                   (eval v)))))
               (vals
                (cl-loop for tp in
                         (-zip-fill nil vals 'nil)
                         collect
                         (let*
                             ((v
                               (car tp))
                              (pp
                               (cdr tp)))
                           (if pp
                               (pen-sn pp v)
                             v))))
               (i 1)
               (final-prompt
                (pen-expand-template final-prompt vals))
               (prompt-end-pos
                (or
                 (byte-string-search "<:pp>" "The following is a list of one-liners for the linux command-line:\n\n# get newest file in directory bash\n$ ls -t * | head -1\n###\n# Find with invert match - e.g. find every file that is not mp3\n$ find . -name '*' -type f -not -path '*.mp3'\n###\n# Recursively remove all \"node_modules\" folders\n$ find . -name \"node_modules\" -exec rm -rf '{}' +\n###\n# <1>\n$\n")
                 (string-bytes final-prompt)))
               (final-prompt
                (string-replace "<:pp>" "" final-prompt))
               (final-prompt
                (if nil
                    (sor
                     (pen-snc nil final-prompt)
                     (concat "prompt-filter " nil " failed."))
                  final-prompt))
               (pen-sh-update
                (or pen-sh-update
                    (>=
                     (prefix-numeric-value current-global-prefix-arg)
                     4)))
               (shcmd
                (pen-log
                 (concat
                  (sh-construct-envs
                   `(("PEN_PROMPT" ,(pen-encode-string final-prompt))
                     ("PEN_LM_COMMAND" ,"openai-complete.sh")
                     ("PEN_ENGINE" ,"davinci")
                     ("PEN_MAX_TOKENS" ,(pen-expand-template final-max-tokens vals))
                     ("PEN_TEMPERATURE" ,(pen-expand-template
                                          (str 0.8)
                                          vals))
                     ("PEN_STOP_SEQUENCE" ,(pen-encode-string
                                            (str
                                             (if
                                                 (variable-p 'stop-sequence)
                                                 (eval 'stop-sequence)
                                               "###"))))
                     ("PEN_TOP_P" ,1)
                     ("PEN_CACHE" ,nil)
                     ("PEN_N_COMPLETIONS" ,5)
                     ("PEN_END_POS" ,prompt-end-pos)))
                  " " "upd lm-complete")))
               (resultsdirs
                (cl-loop for i in
                         (number-sequence 1 1)
                         collect
                         (progn
                           (message
                            (concat "pf-bash-one-liner-generator-from-natural-language" " query "
                                    (int-to-string i)
                                    "..."))
                           (let
                               ((ret
                                 (pen-prompt-snc shcmd i)))
                             (message
                              (concat "pf-bash-one-liner-generator-from-natural-language" " done "
                                      (int-to-string i)))
                             ret))))
               (results
                (-uniq
                 (flatten-once
                  (cl-loop for rd in resultsdirs collect
                           (if
                               (sor rd)
                               (->>
                                   (glob
                                    (concat rd "/*"))
                                 (mapcar 'e/cat)
                                 (mapcar
                                  (lambda
                                    (r)
                                    (if
                                        (and nil
                                             (sor nil))
                                        (pen-sn nil r)
                                      r)))
                                 (mapcar
                                  (lambda
                                    (r)
                                    (if
                                        (and
                                         (variable-p 'prettify)
                                         prettify nil
                                         (sor nil))
                                        (pen-sn nil r)
                                      r)))
                                 (mapcar
                                  (lambda
                                    (r)
                                    (if
                                        (not nil)
                                        (s-trim-left r)
                                      r)))
                                 (mapcar
                                  (lambda
                                    (r)
                                    (if
                                        (not nil)
                                        (s-trim-right r)
                                      r)))
                                 (mapcar
                                  (lambda
                                    (r)
                                    (cl-loop for stsq in final-stop-sequences do
                                             (let
                                                 ((matchpos
                                                   (string-search stsq r)))
                                               (if matchpos
                                                   (setq r
                                                         (s-truncate matchpos r "")))))
                                    r)))
                             (list
                              (message "Try UPDATE=y or debugging")))))))
               (result
                (if no-select-result
                    (length results)
                  (cl-fz results :prompt
                         (concat "pf-bash-one-liner-generator-from-natural-language" ": ")
                         :select-only-match t))))
            (if no-select-result results
              (if
                  (interactive-p)
                  (cond
                   ((>=
                     (prefix-numeric-value current-prefix-arg)
                     4)
                    (etv result))
                   ((and nil mark-active)
                    (replace-region result))
                   ((or nil nil)
                    (insert result))
                   (t
                    (etv result)))
                result)))))))
#+END_SRC

The above function creates a natural-language
(NL) shell: it generates shell commands from an
NL description of the task. The
=pf-bash-one-liner-generator-on-os-from-natural-language=
variant is also parameterized so that you can
specify the operating system that the generated
commands should run on.
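
Before the prompt is sent, its numbered slots
(such as =<1>= in the prompt above) are filled
with the function's arguments; the generated code
calls =pen-expand-template= for this. Below is a
minimal sketch of that substitution, assuming each
=<N>= is simply replaced by the Nth argument.
=my-expand-template= is a hypothetical name, not
Pen.el's implementation.

#+BEGIN_SRC emacs-lisp
  (defun my-expand-template (template vals)
    "Return TEMPLATE with each <N> replaced by the Nth element of VALS.
  Hypothetical sketch; Pen.el's `pen-expand-template' may differ."
    (let ((result template)
          (i 1))
      (dolist (val vals result)
        ;; Replace the literal placeholder "<I>".  FIXEDCASE and LITERAL
        ;; are t so VAL is inserted verbatim, backslashes included.
        (setq result (replace-regexp-in-string
                      (regexp-quote (format "<%d>" i)) val result t t))
        (setq i (1+ i)))))

  ;; Fill a hypothetical two-slot prompt with an OS and a task.
  (my-expand-template "# <2> (on <1>)\n$" '("Arch Linux" "Disable firewall"))
  ;; => "# Disable firewall (on Arch Linux)\n$"
#+END_SRC

Because the closing =>= is part of the
placeholder, =<1>= does not accidentally match
inside =<10>=.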

#+BEGIN_SRC emacs-lisp -n :async :results raw
  (list2str (pf-bash-one-liner-generator-on-os-from-natural-language "Arch Linux" "Disable firewall" :no-select-result t))
#+END_SRC

Here is a list of suggestions generated from
the above prompt function.

#+BEGIN_SRC text -n :async :results verbatim code
  iptables -F
  iptables -P OUTPUT DROP
  sed -i 's/^[ \t]*firewall=.*$/firewall=0/' /etc/sysconfig/iptables
  systemctl stop iptables.service
  sudo systemctl stop iptables
  sudo ufw disable
#+END_SRC
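
Before suggestions like these are offered, the
generated function cleans each completion: it
trims surrounding whitespace, cuts the text at the
first stop sequence (=###= here) and drops
duplicates (the =s-trim-left=, =s-trim-right=,
=string-search= / =s-truncate= and =-uniq= calls
above). A rough equivalent using only built-in
functions follows; =my-clean-completions= is a
hypothetical helper, and unlike the generated code
it trims after truncating rather than before.

#+BEGIN_SRC emacs-lisp
  (require 'subr-x) ; for `string-trim' on older Emacsen

  (defun my-clean-completions (completions stop-sequences)
    "Cut each of COMPLETIONS at its first STOP-SEQUENCES match, trim it,
  and remove duplicates, keeping the first occurrence."
    (delete-dups
     (mapcar
      (lambda (r)
        (dolist (stop stop-sequences)
          ;; Truncate at the first literal occurrence of STOP, if any.
          (when (string-match (regexp-quote stop) r)
            (setq r (substring r 0 (match-beginning 0)))))
        (string-trim r))
      completions)))

  (my-clean-completions '("  sudo ufw disable\n###\niptables -F"
                          "sudo ufw disable"
                          " systemctl stop iptables ")
                        '("###"))
  ;; => ("sudo ufw disable" "systemctl stop iptables")
#+END_SRC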

You may also run it as a REPL.

https://semiosis.github.io/posts/imaginary-programming-with-gpt-3/

#+BEGIN_SRC yaml -n :async :results verbatim code
  title: Code interpreter kickstarter
  future-titles:
  - Code interpreter kickstarter
  doc: Given a line of code, infer the result of running that code
  prompt-version: 4
  prompt: |
    Code examples:

    Language: Python
    Input: print(random.randint(0,9))
    Output: 5
    ###
    Language: Bash
    Input: Str="Learn Linux from LinuxHint"; subStr=${Str:6:5}
    Output: Linux
    ###
  repeater: |
    Language: <1>
    Input: {}
    Output:
  issues: 
  engine: davinci
  temperature: 0.8
  max-tokens: 60
  top-p: 1
  stop-sequences:
  - "##"
  - "\n"
  vars:
  - language
  - code
  examples:
  - haskell
  - '"Hello" ++ " " ++ "World"'
  prefer-external: true
  external: iol
  similarity-test: string-equal
  quality-script: levenshtein -s
  conversation-mode: true
  n-test-runs: 5
#+END_SRC

#+BEGIN_SRC emacs-lisp -n :async :results raw
  (car (pf-code-interpreter-kickstarter "Haskell" "\"Hello\" ++ \" \" ++ \"World\"" :no-select-result t))
#+END_SRC

#+BEGIN_SRC text -n :async :results verbatim code
  Hello World
#+END_SRC

**** With two =Pen.el= systems
***** Using a common language model
Communications can be translated with a
world-language translation prompt function.

#+BEGIN_SRC yaml -n :async :results verbatim code
  title: Translate from world language X to Y
  prompt-version: 3
  doc: This prompt translates English text to any world language
  prompt: |
    ###
    # English: Hello
    # Russian: Zdravstvuyte
    # Italian: Salve
    # Japanese: Konnichiwa
    # German: Guten Tag
    # French: Bonjour
    # Spanish: Hola
    ###
    # English: Happy birthday!
    # French: Bon anniversaire !
    # German: Alles Gute zum Geburtstag!
    # Italian: Buon compleanno!
    # Indonesian: Selamat ulang tahun!
    ###
    # <1>: <3>
    # <2>:
  engine: davinci
  temperature: 0.5
  max-tokens: 200
  top-p: 1
  stop-sequences:
  - "#"
  vars:
  - from-language
  - to-language
  - phrase
  preprocessors:
  - cat
  - cat
  - pen-s onelineify
  postprocessor: pen-s unonelineify
  examples:
  - English
  - French
  - Goodnight
  var-defaults:
  - "(or (sor (nth 0 (pf-get-language (pen-selected-text) :no-select-result t))) (read-string-hist \"Pen From language: \"))"
  - "(read-string-hist \"Pen To language: \")"
  - "(pen-selected-text)"
  filter: on
#+END_SRC
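
The =pen-s onelineify= preprocessor on the phrase
and the =pen-s unonelineify= postprocessor let a
multi-line phrase, such as the two-line birthday
greeting below, fit the one-line-per-language
format of the prompt. A minimal sketch of such an
encoding, assuming newlines become a literal =\n=
and backslashes are doubled so that the round trip
is lossless. =my-onelineify= and =my-unonelineify=
are hypothetical; Pen.el's own scripts may encode
text differently.

#+BEGIN_SRC emacs-lisp
  (defun my-onelineify (s)
    "Encode S on one line: double each backslash, then turn newlines into \\n."
    (replace-regexp-in-string
     "\n" "\\n"
     ;; Escape backslashes first so an existing "\n" in S survives decoding.
     (replace-regexp-in-string "\\\\" "\\\\" s t t)
     t t))

  (defun my-unonelineify (s)
    "Invert `my-onelineify': decode \\n to a newline and \\\\ to a backslash."
    (replace-regexp-in-string
     "\\\\\\(.\\)"
     (lambda (m)
       ;; M is a backslash plus one character; group 1 is that character.
       (if (equal (match-string 1 m) "n") "\n" (match-string 1 m)))
     s t t))

  (my-onelineify "Happy birthday\nTo you")
  ;; => "Happy birthday\\nTo you"
#+END_SRC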

A demonstration of two people who understand
different world languages using a common LM to
understand one another.

#+NAME: fromenglish
#+BEGIN_SRC text -n :async :results verbatim code
  Happy birthday
  To you
#+END_SRC

#+BEGIN_SRC emacs-lisp -n :async :results code raw
  ;; Alice translates into French for Bob
  (car (pf-translate-from-world-language-x-to-y "English" "French" "Happy birthday\nTo you" :no-select-result t))
#+END_SRC

#+NAME: fromfrench
#+BEGIN_SRC text -n :async :results verbatim code
  Bon anniversaire
  A vous
#+END_SRC

#+BEGIN_SRC text -n :async :results verbatim code
  Merci
  beaucoup
#+END_SRC

#+BEGIN_SRC emacs-lisp -n :async :results code raw
  ;; Bob translates back into English for Alice
  (car (pf-translate-from-world-language-x-to-y "French" "English" "Merci\nbeaucoup" :no-select-result t))
#+END_SRC

#+BEGIN_SRC text -n :async :results verbatim code
  Thank you!
#+END_SRC

https://asciinema.org/a/7YnSnrrLgbiFlyMyYxBgaZYUb

#+BEGIN_EXPORT html
<!-- Play on asciinema.com -->
<!-- <a title="asciinema recording" href="https://asciinema.org/a/7YnSnrrLgbiFlyMyYxBgaZYUb" target="_blank"><img alt="asciinema recording" src="https://asciinema.org/a/7YnSnrrLgbiFlyMyYxBgaZYUb.svg" /></a> -->
<!-- Play on the blog -->
<script src="https://asciinema.org/a/7YnSnrrLgbiFlyMyYxBgaZYUb.js" id="asciicast-7YnSnrrLgbiFlyMyYxBgaZYUb" async></script>
#+END_EXPORT

***** With different language models
- GPT-neo and GPT-3?
- curie vs davinci?

- Generate a story about a meeting with one prompt
- Summarize with bullet points
  - meeting-bullets-to-summary.prompt

*** An impure imaginary data structure
**** With one =Pen.el= system
- Natural language database entry
**** With two =Pen.el= systems
- Database prompt
**** With three =Pen.el= systems
- Database prompt

*** TODO Find a useful impure imaginary algorithm
**** With one =Pen.el= system
- Translate from X to Y
- Backtranslate from Y to X

Find a better prompt?
**** With two =Pen.el= systems
**** With three =Pen.el= systems

** Pure imaginary code is useful
Pure imaginary programming is a type of programming where the original language
models may not even be known.

I demonstrate that collaborative pure imaginary programming is useful.

*** Translation between two =Pen.el= systems with different language models
A common library of pure imaginary functions.

#+BEGIN_SRC emacs-lisp -n :async :results verbatim code
  ("translate" "prose" "from" "to")
#+END_SRC

Pure imaginary functions can be composed.

#+BEGIN_SRC emacs-lisp -n :async :results verbatim code
  ("translate" ("make analogy about" "topic") "from" "to")
#+END_SRC
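
One way to read such a composition is as ordinary
function application: the inner call is answered
first and its output becomes an argument of the
outer call. Below is a minimal sketch of an
evaluator for these nested lists, assuming every
argument is either a string or another call.
=my-imaginary-eval= and =my-echo-lm= are
hypothetical, and the LM is passed in as a
function so that a stub can stand in for a real
model.

#+BEGIN_SRC emacs-lisp
  (defun my-imaginary-eval (form lm)
    "Evaluate FORM, a string or a list (NAME ARG...), innermost calls first.
  LM is a function called as (LM NAME ARGS) that returns a string."
    (if (stringp form)
        form
      (funcall lm (car form)
               ;; Evaluate nested calls before the outer one sees them.
               (mapcar (lambda (arg) (my-imaginary-eval arg lm))
                       (cdr form)))))

  ;; A stub "language model" that just echoes the call it receives.
  (defun my-echo-lm (name args)
    (format "%s(%s)" name (mapconcat #'identity args ", ")))

  (my-imaginary-eval '("translate" ("make analogy about" "topic") "from" "to")
                     #'my-echo-lm)
  ;; => "translate(make analogy about(topic), from, to)"
#+END_SRC

Swapping =my-echo-lm= for a function that queries
a real LM changes the answers but not the order of
evaluation.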

** Imaginary programming languages are required to work with language models
*** Examplary
- Part of it is task-oriented, which defers imagination to a language model to understand what it means.
- Part of it is example-oriented, which is pure-imaginary.

*** Example-oriented
#+BEGIN_SRC emacs-lisp -n :async :results verbatim code
  ;; Convert lines to regex.
  (xl-defprompt ("lines of code" regex)
                 ;; :task "Convert lines to regex"
                 ;; Generate input with this
                 ;; :gen "examplary-edit-generator shane"
                 :gen examplary-edit-generator
                 :filter "grex"
                 ;; The third argument (if supplied) should be incorrect output (a counterexample).
                 ;; If the 2nd argument is left out, it will be generated by the command specified by :external
                 :examples (("example 1\nexample2")
                            ("example 2\nexample3" "^example [23]$")
                            ("pi4\npi5" "^pi[45]$" "pi4\npi5"))
                 :lm-command "openai-complete.sh")
#+END_SRC
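
Whichever component produces the regex (the LM,
or the =grex= filter, a tool that derives regular
expressions from example strings), an example can
be checked mechanically: the regex should match
every line of its input. A minimal sketch;
=my-regex-fits-lines-p= is a hypothetical helper,
not part of examplary.

#+BEGIN_SRC emacs-lisp
  (require 'cl-lib)

  (defun my-regex-fits-lines-p (regex lines)
    "Return t if REGEX matches every newline-separated line of LINES."
    (let ((case-fold-search nil))         ; match case exactly
      (cl-every (lambda (line) (string-match-p regex line))
                (split-string lines "\n"))))

  (my-regex-fits-lines-p "^pi[45]$" "pi4\npi5")
  ;; => t
#+END_SRC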

*** Task oriented
#+BEGIN_SRC emacs-lisp -n :async :results verbatim code
  ("translate" ("make analogy about" "topic") "from" "to")
#+END_SRC

** Projecting the code back to the starting LM is possible
- Semantic search on existing documents
- Semantic search on existing functions in emacs

** Language models encode holographic representations of software
It's important to avoid mixing training data
of varying licenses when training LMs. 

One risk is that, as holographic representations
of software increasingly replace running the
original source code (i.e. as LMs are
increasingly used to simulate software), a
program's hologram becomes more likely to be used
in ways that violate the original license or its
spirit.

LMs bring with them an understanding of the way
software is used, and also an understanding of
the inspiration that went into designing that
software. The issue is that all of this is
automated, and new software companies are
currently staking their future on LMs and using
those models to the fullest.

Therefore, the inexorable conclusion is that
software used to train these models will be used
holographically, perhaps even more than the
original software itself. Its holographic
representation, which encodes the value of the
software (the way it is used, as opposed to the
way it is written), is what matters more, and
that is what is being exploited.

If the original code of a piece of free software
was part of a neural network's (NN's) training
data alongside software under conflicting
licenses, then that effectively relicenses the
software without consent from then on.

*** Generating parts of emacs with GPT-3
I am able to generate parts of GPL-protected
software using LMs and can query the LMs about
how those parts are used.

Therefore, the software exists now in the latent space of a language model in
the form of a hologram, within and without the source code. Language models
encode contrived associations made between different pieces of software in
order to create an accurate model that is useful for simulation, code
generation, code understanding and modelling the usage of software.

- The holographic representation

*** =0.9 / 1= is still stealing
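If =0.9 / 1= is read as a similarity score
between a generated program and its original,
one concrete way to compute such a number is a
normalised Levenshtein similarity, presumably the
kind of measure the =quality-script: levenshtein -s=
setting in the code-interpreter prompt refers to.
A minimal sketch using Emacs's built-in
=string-distance= (Emacs 27+); =my-similarity= is
hypothetical.

#+BEGIN_SRC emacs-lisp
  (defun my-similarity (a b)
    "Return 1 minus the Levenshtein distance of A and B over the longer length.
  1.0 means identical strings; 0.0 means every character must be edited."
    (let ((longest (max (length a) (length b))))
      (if (zerop longest)
          1.0                             ; two empty strings are identical
        (- 1.0 (/ (float (string-distance a b)) longest)))))

  ;; One substituted character out of ten gives a similarity of 0.9.
  (my-similarity "(car list)" "(car lisp)")
#+END_SRC
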

** Counter arguments
*** It's not imaginary, it's just... English? more like, stochastic programming?
Imaginary programming is more of an activity
and a style of programming and is not really
concerned with the amount of uncertainty.

Your code might take a trip through someone
else's LM along the way and be projected back
to your own.

That means that some of the logic is
completely obscured and you have to make
assumptions.

You may collaborate on a user interface or
program with others, and since that code can't
be fully understood by any one person because of
the veil, you are compelled to imagine in order
to create something useful.

A person must build their own interface from
the pure imaginary functions that are shared.

It's a completely made-up paradigm, so it is
useful only insofar as it proves useful.

All of this is based on the idea that we will
have many fine-tuned and completely different
transformer models, and we must learn to
communicate.

The NeverEnding Story also influenced my
thoughts.

Once everyone stops believing in Fantasia it
ceases to exist, as does the utility of
applications built in pure imaginary code.


* Re: Help building Pen.el (GPT for emacs)
  2021-07-30  3:20                                       ` Shane Mulligan
@ 2021-07-30  6:55                                         ` Jean Louis
  0 siblings, 0 replies; 46+ messages in thread
From: Jean Louis @ 2021-07-30  6:55 UTC (permalink / raw)
  To: Shane Mulligan; +Cc: Eli Zaretskii, emacs-tangents, Stefan Kangas, rms

* Shane Mulligan <mullikine@gmail.com> [2021-07-30 06:20]:
> Hey guys.
> 
> In the last week I have been writing a thesis for Imaginary Programming,
> which aims to make all of this clear and formalised.
> 
> I am very sorry if I have sounded frustrated, but I think that this is so
> important for free software and a GPL-4 may be required to protect people,
> but also that Copilot and OpenAI's Codex and GPT-3 models infringe upon the
> spirit of GPL-3 code.
> 
> I will attach the thesis into this email.
> 
> https://github.com/semiosis/imaginary-programming-thesis/blob/master/thesis.org

Please see if you can help with this:

FSF-funded call for white papers on philosophical and legal questions around Copilot
https://www.fsf.org/blogs/licensing/fsf-funded-call-for-white-papers-on-philosophical-and-legal-questions-around-copilot


-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/




Thread overview: 46+ messages
     [not found] <CACT87JohxuswvDcqGOiQR7BrHoqJFG252QD6XjEuAPU2HSuWOw@mail.gmail.com>
     [not found] ` <CADwFkm=cN4W0Mgo_hYgwWgddoe=cXj5+WYJWnAHZmmd+rd7gKw@mail.gmail.com>
     [not found]   ` <E1m4YYb-0005Ds-GJ@fencepost.gnu.org>
     [not found]     ` <CACT87JqMZ+pbVDQ-5gZHMsGcfm04CoeKZn6sY5yy+1rnxCimOQ@mail.gmail.com>
     [not found]       ` <83im1948mj.fsf@gnu.org>
     [not found]         ` <CACT87JrCAi3Umdke6gL+_W_7k2j+21jsuT=1hq5kyOx19L2x+A@mail.gmail.com>
     [not found]           ` <CACT87Jo41S2FJKxfPs0qP=qkXvwvcc0xnf1X6oEkjuhmAJ6w3A@mail.gmail.com>
     [not found]             ` <YPO+bAMpqMhxDBxU@protected.localdomain>
     [not found]               ` <83lf642jeh.fsf@gnu.org>
     [not found]                 ` <CACT87JriMaF1kFjEE_8=8FEQpAi6sxr3x3vZT3rafjY=4mQgZg@mail.gmail.com>
2021-07-19 17:00                   ` Help building Pen.el (GPT for emacs) Jean Louis
2021-07-23  6:51                     ` Shane Mulligan
2021-07-23 10:12                       ` Jean Louis
2021-07-23 10:54                         ` Eli Zaretskii
2021-07-23 11:32                           ` Jean Louis
2021-07-23 11:51                             ` Eli Zaretskii
2021-07-23 12:47                               ` Jean Louis
2021-07-23 13:39                                 ` Shane Mulligan
2021-07-23 14:39                                   ` Jean Louis
2021-07-26  0:16                                   ` Richard Stallman
2021-07-26  0:28                                     ` Shane Mulligan
2021-07-30  3:20                                       ` Shane Mulligan
2021-07-30  6:55                                         ` Jean Louis
2021-07-23 19:33                                 ` Eli Zaretskii
2021-07-24  3:07                                   ` Jean Louis
2021-07-24  7:32                                     ` Eli Zaretskii
2021-07-24  7:54                                       ` Jean Louis
2021-07-24  8:50                                         ` Eli Zaretskii
2021-07-24 16:16                                           ` Jean Louis
2021-07-24 16:44                                             ` Eli Zaretskii
2021-07-24 18:01                                               ` Jean Louis
2021-07-25  1:09                                     ` Richard Stallman
2021-07-24  1:14                             ` Richard Stallman
2021-07-24  2:10                               ` Shane Mulligan
2021-07-24  2:34                                 ` Shane Mulligan
2021-07-24  3:14                                   ` Shane Mulligan
2021-07-24  6:49                               ` Eli Zaretskii
2021-07-24  7:33                                 ` Jean Louis
2021-07-24  8:10                                   ` Eli Zaretskii
2021-07-24  8:21                                     ` Jean Louis
2021-07-24  8:35                                     ` Jean Louis
2021-07-24  8:59                                       ` Eli Zaretskii
2021-07-24 16:18                                         ` Jean Louis
2021-07-24 16:45                                           ` Eli Zaretskii
2021-07-24 17:57                                             ` Jean Louis
2021-07-24 18:15                                               ` Eli Zaretskii
2021-07-24  7:41                                 ` Philip Kaludercic
2021-07-24  7:59                                   ` Eli Zaretskii
2021-07-24  9:31                                     ` Philip Kaludercic
2021-07-24 11:19                                       ` Eli Zaretskii
2021-07-24 14:16                                         ` Philip Kaludercic
2021-07-24 14:37                                           ` Eli Zaretskii
2021-07-24 14:49                                             ` Philip Kaludercic
2021-07-24 15:13                                               ` Eli Zaretskii
2021-07-25  1:06                       ` Richard Stallman
     [not found] ` <YN8bZEJAkWyQwjrB@protected.localdomain>
     [not found]   ` <CACT87JpAcUfuRB01CcnfbL4yCTPyDoiG_WOzzxVvAW7rhj0=Mw@mail.gmail.com>
2021-07-23 15:37     ` Jean Louis
