* Next Steps For the Software Heritage Problem
From: MSavoritias @ 2024-06-18 8:37 UTC
To: guix-devel
Hello,
Context:
As you may already know, there have been discussions for a while now
around Software Heritage and the LLM they are collaborating on. The
model itself was announced at
https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/
As I have started writing some packages I became interested in how I
might actually stop my code from ever reaching Software Heritage or at
the very least said LLM model. Every single package in guix is added
there automatically.
I sent an email on Friday and got an answer back that no such consent
mechanism has been implemented, and I was shown the legal terms.
Instead, what I am supposed to do is:
After guix has my code, my code will automatically be in Software
Heritage and the LLM. So I am supposed to opt out separately with
both of them to ensure that my code won't be used for future versions.
This of course means that my code will stay forever in Software
Heritage and the LLM model (or some version of it at least).
The reasoning given was that code harvesting happens anyway and that
they offer an opt-out. I am guessing it's opt-out and not opt-in because
they would have less code otherwise, but this is speculation of course :)
This is against our desire to make it a welcoming space and also
against the spirit of our CoC. Specifically because authors do not know
this happens when they submit packages to Guix. So it is all done
without consent.
Next Steps:
So what can we do as a Guix community from here?
Communication/Writing wise:
1. Add a clear disclaimer/requirement that, for any new package added
to Guix, the submitter has to give consent or get consent from the
person who wrote the packaged software. This needs to be added in the
docs and in the email procedures.
2. Make a blog post about our stance towards Software Heritage and the
code harvesting they are doing. This post will explain, on environmental
and ethical grounds, why Guix is against this and mention Software
Heritage specifically. This is done to distance ourselves and state that
we do not like what is happening in case anyone comes asking, and
hopefully to put public pressure on Software Heritage.
3. Exclude all Software Heritage merch, stands, talks, people in
official capacity, logos, or anything else that participates in social
events of Guix, and write it into some rules we have. Also write in
channel rules that Software Heritage is off-topic the same way Non-Free
Software is off-topic.
4. There doesn't seem to be any movement on the side of Guix towards:
- Accountability in an official capacity of SH for the terrible
handling of the trans name incident and a plan to make it easier in
the future.
- The LLM problem that was mentioned in this email.
So with that said I urge anybody who has been in contact with them in
an official Guix capacity to come forward, otherwise I can volunteer to
be that. Idk if we have a community outreach thing I need to be in also
for that. (we should if not)
The above makes two assumptions:
1. That the Guix community is against LLM/"AI", which on environmental
and ethical grounds we should be.
2. That we are a consent culture.
Coding-wise, this has been talked about before. Some potential options
are:
- Communicate with Software Heritage to be able to give a "sign" of
whether the code that is sent should go into the code harvesting project.
- Remove all Software Heritage integration, since it's too hard to be
ethical about it, and build a better solution.
Conclusion:
To summarize from the steps I wrote above, it seems Software Heritage
makes it harder and harder for us to actually be the inclusive,
welcoming space we want to be. Idk where that leaves us; as I said, I am
not part of any "insider" discussions. But things do not seem to be
moving much, and it's time to start doing actionable things in another
direction.
MSavoritias
* Re: Next Steps For the Software Heritage Problem
From: Ian Eure @ 2024-06-18 14:19 UTC
To: guix-devel
Hi MSavoritias,
Thank you for the email.
I’m going to lay out this situation as clearly as I can, in the
hope that others will better understand, and hopefully treat it
with the seriousness it deserves.
1. Guix requests SWH to archive some source code. This is fine.
2. SWH archives the code. This is also fine.
3. SWH gives all their source to an AI company, HuggingFace. This
is questionable. While fine in theory, the company they gave it
to, HuggingFace, violates both the licenses of the code they’re
given, and SWH’s own policy on LLMs. Instead of terminating the
partnership, SWH has continued to tout it as "responsible AI" in
the face of these violations[1]. This makes me doubt whether
they’re acting in good faith.
4. HuggingFace trains an LLM out of all the code they’re given and
redistributes it. This is *not* fine. The LLM is a derivative
work of the source code it’s trained on, which violates the
licenses of many projects in its training set -- it’s akin to
compiling a gigantic .so file built from the SWH dataset.
5. HuggingFace uses its StarCoder2 LLM to generate source code.
This is *also* not fine. This output is also a derivative work of
the inputs, and it’s redistributed with no license or attribution
whatsoever. HuggingFace purports to include attribution in their
model, however, their own tools make no use of it and emit code
with no attribution. You can observe this behavior yourself:
https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
I understand Guix’s participation is several degrees removed from
where the core of the problem lies. However, the partnership with
SWH is indirectly enabling massive violations of the licenses of
the software it packages. Guix should stop doing that.
Thanks,
— Ian
[1]:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
* Re: Next Steps For the Software Heritage Problem
From: Greg Hogan @ 2024-06-18 16:21 UTC
To: MSavoritias; +Cc: guix-devel
On Tue, Jun 18, 2024 at 4:37 AM MSavoritias <email@msavoritias.me> wrote:
>
> 1. Add a clear disclaimer/requirement that, for any new package added
> to Guix, the submitter has to give consent or get consent from the
> person who wrote the packaged software. This needs to be added in the
> docs and in the email procedures.
You will be happy to know that Guix has always had this requirement
[1] by only packaging software licensed with the four essential
freedoms [2]. It's the first item on the Guix homepage.
[1] https://guix.gnu.org/manual/en/html_node/Software-Freedom.html
[2] https://www.gnu.org/philosophy/free-sw.en.html
* Re: Next Steps For the Software Heritage Problem
From: MSavoritias @ 2024-06-18 16:33 UTC
To: Greg Hogan; +Cc: MSavoritias, guix-devel
On Tue, 18 Jun 2024 12:21:33 -0400
Greg Hogan <code@greghogan.com> wrote:
> On Tue, Jun 18, 2024 at 4:37 AM MSavoritias <email@msavoritias.me>
> wrote:
> >
> > 1. Add a clear disclaimer/requirement that, for any new package
> > added to Guix, the submitter has to give consent or get consent from
> > the person who wrote the packaged software. This needs to be added
> > in the docs and in the email procedures.
>
> You will be happy to know that Guix has always had this requirement
> [1] by only packaging software licensed with the four essential
> freedoms [2]. It's the first item on the Guix homepage.
>
> [1] https://guix.gnu.org/manual/en/html_node/Software-Freedom.html
> [2] https://www.gnu.org/philosophy/free-sw.en.html
Ah it seems I wasn't clear enough.
I meant write something like:
By packaging a software project for Guix you are exposing said software
to a code harvesting project (also known as LLMs or "AI") run by
Software Heritage and/or their partners. Make sure you have gotten
fully informed consent and that the author of this package fully
understands what the implications are.
Something like that. To make it clear that the package that is about to
be added to Guix is going to be harvested for the LLM models Software
Heritage decided to share the code with.
Hope this is more clear.
MSavoritias
* Re: Next Steps For the Software Heritage Problem
From: Andy Tai @ 2024-06-18 17:12 UTC
To: guix-devel
What is the role of GNU Guix in this? If Guix is mainly a referral
mechanism, like web page links to the actual contents, isn't the real
problem not Guix but the use of free software to train LLMs, software
that can be obtained directly via other mechanisms anyway, even if Guix
is not in the loop?
* Re: Next Steps For the Software Heritage Problem
From: Greg Hogan @ 2024-06-18 17:31 UTC
To: MSavoritias; +Cc: guix-devel
On Tue, Jun 18, 2024 at 12:33 PM MSavoritias <email@msavoritias.me> wrote:
>
> Ah it seems I wasn't clear enough.
> I meant write something like:
>
> By packaging a software project for Guix you are exposing said software
> to a code harvesting project (also known as LLMs or "AI") run by
> Software Heritage and/or their partners. Make sure you have gotten
> fully informed consent and that the author of this package fully
> understands what the implications are.
>
> Something like that. To make it clear that the package that is about to
> be added to Guix is going to be harvested for the LLM models Software
> Heritage decided to share the code with.
>
> Hope this is more clear.
Free software licenses do not require bespoke consent "to run the
program, to study and change the program in source code form, to
redistribute exact copies, and to distribute modified versions" (and
"Being free to do these things means (among other things) that you do
not have to ask or pay for permission to do so.").
Your fear mongering against free software runs afoul of Guix project
guidelines ("In addition, the GNU distribution follow [sic] the free
software distribution guidelines. Among other things, these guidelines
reject non-free firmware, recommendations of non-free software, and
discuss ways to deal with trademarks and patents.").
If you feel that LLMs/AI are violating the terms of a license, then
feel free to pursue that through the legal system (potentially very
profitable given the monetary penalties for violations of copyright).
Otherwise, we should be celebrating the users and use of free
software. I'm old enough to remember "Only wimps use tape backup:
_real_ men just upload their important stuff on ftp, and let the rest
of the world mirror it ;)"
[https://lkml.iu.edu/hypermail/linux/kernel/9607.2/0292.html].
* Re: Next Steps For the Software Heritage Problem
From: Ian Eure @ 2024-06-18 17:57 UTC
To: guix-devel
Hi Greg,
Please read my earlier reply in this thread[1].
HuggingFace is demonstrably violating the licenses of the Free
Software used to train its StarCoder2 LLM.
Software Heritage is continuing to partner with HuggingFace in
spite of these violations.
Guix is continuing to partner with SWH in spite of their continued
support of these violations.
Guix is indirectly enabling the violation of the license for the
Free Software it packages. Guix has the power to stop doing that.
What is your specific rationale for continuing to enable these
clear license violations?
Thanks,
— Ian
[1]:
https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00195.html
* Re: Next Steps For the Software Heritage Problem
From: Ian Eure @ 2024-06-18 18:08 UTC
To: guix-devel
Guix sends archive requests to SWH. SWH gives that source code to
HuggingFace. HuggingFace demonstrably violates the licenses.
Guix could stop sending archive requests to SWH. This wouldn’t
*stop* the bad things from happening, but it would *stop
condoning* them. The same as how Guix not allowing non-free
software doesn’t stop people from running it, but doesn’t condone
it.
Please read my replies in this thread, and the earlier
"Concerns/questions around Software Heritage Archive" one. I have
outlined the situation, repeatedly, with references.
Thanks,
— Ian
* Re: Next Steps For the Software Heritage Problem
From: MSavoritias @ 2024-06-19 7:01 UTC
To: Greg Hogan; +Cc: guix-devel
On Tue, 18 Jun 2024 13:31:02 -0400
Greg Hogan <code@greghogan.com> wrote:
> On Tue, Jun 18, 2024 at 12:33 PM MSavoritias <email@msavoritias.me>
> wrote:
> >
> > Ah it seems I wasn't clear enough.
> > I meant write something like:
> >
> > By packaging a software project for Guix you are exposing said
> > software to a code harvesting project (also known as LLMs or "AI")
> > run by Software Heritage and/or their partners. Make sure you have
> > gotten fully informed consent and that the author of this package
> > fully understands what the implications are.
> >
> > Something like that. To make it clear that the package that is
> > about to be added to Guix is going to be harvested for the LLM
> > models Software Heritage decided to share the code with.
> >
> > Hope this is more clear.
>
> Free software licenses do not require bespoke consent "to run the
> program, to study and change the program in source code form, to
> redistribute exact copies, and to distribute modified versions" (and
> "Being free to do these things means (among other things) that you do
> not have to ask or pay for permission to do so.").
>
> Your fear mongering against free software runs afoul of Guix project
> guidelines ("In addition, the GNU distribution follow [sic] the free
> software distribution guidelines. Among other things, these guidelines
> reject non-free firmware, recommendations of non-free software, and
> discuss ways to deal with trademarks and patents.").
>
> If you feel that LLMs/AI are violating the terms of a license, then
> feel free to pursue that through the legal system (potentially very
> profitable given the monetary penalties for violations of copyright).
> Otherwise, we should be celebrating the users and use of free
> software. I'm old enough to remember "Only wimps use tape backup:
> _real_ men just upload their important stuff on ftp, and let the rest
> of the world mirror it ;)"
> [https://lkml.iu.edu/hypermail/linux/kernel/9607.2/0292.html].
Hey Greg,
You seem to be arguing on a different thread or against a point I never
made. I didn't talk about licenses or legal/state rules before you
mentioned them. What I have mentioned is that SH breaks our social rules
and expectations by feeding all code into an algorithm that will
endlessly output the same as the original.
I am not interested in what the states or licenses/copyrights allow or
don't allow in this case. What I care about is what we expect as a
community when we submit a package/code to guix and whether that violates
our social rules and expectations. And from what I have seen and from
talking with people, it does indeed.
PS. I am also not a man :P
Regards,
MSavoritias
* Re: Next Steps For the Software Heritage Problem
From: Simon Tournier @ 2024-06-19 7:52 UTC
To: Ian Eure, guix-devel
Hi Ian, all,
On Tue, 18 Jun 2024 at 10:57, Ian Eure <ian@retrospec.tv> wrote:
> Guix is continuing to partner with SWH in spite of their continued
> support of these violations.
Quickly because I am in the middle of a busy day. :-)
I think that LLMs raise ethical and legal questions to which even the
FSF, EFF or SFC do not provide clear answers. (And that is probably the
level where the discussion should happen.) That’s not a light topic and
we should not rush to one definitive conclusion.
Thank you for raising the concern some weeks ago. It appears to me
good that people have expressed their concerns, and still do.
Although I am reading an aggressive tone here and there; useless.
Again, people behind SWH are long-term free software activists and be
sure that they do not take this concern lightly. FYI, people of SWH are
in touch with some people from Guix to speak about all that.
1. Legal.
These license violations are your interpretation of the law and, to my
knowledge, nothing has been in court yet.
Today, it does not really matter if we (or I) share this opinion.
Because for now, it’s just an opinion.
However, no one is a lawyer here and drawing a clear line is not simple.
Thus, FWIW, I would not jump to hard conclusions based on my own opinion
because today I am not confident enough to state a definitive legal
position.
2. Ethical.
If we speak about ethical concerns, we need to be very cautious. We all
share the same core of values about free software. But we do not all
bound these values at the same point. Some of us extend them to some
topics, others restrict them a bit.
Here the issue is that values other than the ones about free software
are dragged into the picture to state a position. That’s where we need
to be cautious because we need to embrace diversity and not morally
judge what is outside our free software project.
About SWH, FWIW, here is my moral reasoning; as you see, it is far from
definitive.
I think that LLM/AI is morally bad in the climate change context; a moral
value outside free software, BTW. By extension, HuggingFace appears to
me morally bad.
Then, is SWH morally bad because they entered a partnership with
HuggingFace? Is it morally bad to help SWH in harvesting source code?
Well, the answers are not obvious to me.
An analogy could be: Am I morally bad when I use my Github account to
report bugs of free software there? Or when I contribute to free
software hosted on Github? Let us not drift; I am just trying to show
that moral questions are often more complex than yes or no.
Not everything is 0 or 1. There are tradeoffs and balance.
Back to SWH. I consider that free software source code is part of human
culture and it must be preserved. Preserving source code is morally
good.
Thus, I think the mission of SWH is morally good. Because their
partnership with UNESCO in order to collect and preserve this human
culture is morally good. Then, helping in that mission appears to me
morally good.
Moreover, being able to rescue source code is also morally good. For
example, in a scientific context where the trust in scientific knowledge
depends on software that vanishes. This trust appears to me vitally
important.
Therefore, it appears to me very harsh to jump to a definitive moral
conclusion about the SWH initiative.
All that said, back to my busy day. :-)
Cheers,
simon
* Re: Next Steps For the Software Heritage Problem
From: Dale Mellor @ 2024-06-19 8:36 UTC
To: Ian Eure, guix-devel
On Tue, 2024-06-18 at 07:19 -0700, Ian Eure wrote:
> Hi MSavoritias,
>
> Thank you for the email.
>
> I’m going to lay out this situation as clearly as I can, in the
> hope that others will better understand, and hopefully treat it
> with the seriousness it deserves.
>
> 1. Guix requests SWH to archive some source code. This is fine.
No, it's not. I use Guix as a tool to develop my own projects, private and
personal for reasons I'm keeping to myself. As part of that I write package
definitions for them, and use the Guix machinery to build and test. I *cannot*
have Guix just giving my code away to anybody, that is just fundamentally wrong.
We need to ask what is Guix? A free operating system, a framework for
developing free operating systems, or a more generic tool for software
development and deployment? If the latter it *cannot* do nefarious things
without explicit consent.
I think at least there should be a /restricted/ license type available to
package definitions, and the system absolutely should not give source code away
from packages which use this (of course, they won't get into the official
distribution, but that's fine).
More broadly, I think they should just stop inter-operating with SH. Just
walk away.
Dale
* Re: Next Steps For the Software Heritage Problem
From: MSavoritias @ 2024-06-19 9:13 UTC
On Wed, 19 Jun 2024 09:52:36 +0200
Simon Tournier <zimon.toutoune@gmail.com> wrote:
> Hi Ian, all,
>
> On Tue, 18 Jun 2024 at 10:57, Ian Eure <ian@retrospec.tv> wrote:
>
> > Guix is continuing to partner with SWH in spite of their continued
> > support of these violations.
>
> Quickly because I am in the middle of a busy day. :-)
Hey Simon,
>
> I think that LLMs raise ethical and legal questions to which even the
> FSF, EFF or SFC do not provide clear answers. (And that is probably the
> level where the discussion should happen.) That’s not a light topic and
> we should not rush to one definitive conclusion.
>
> Thank you for raising the concern some weeks ago. It appears to me
> good that people have expressed their concerns, and still do.
> Although I am reading an aggressive tone here and there; useless.
>
> Again, people behind SWH are long-term free software activists and be
> sure that they do not take this concern lightly. FYI, people of SWH
> are in touch with some people from Guix to speak about all that.
That is a very good point actually, and it is one I also raised in the
email I sent: we have been told there are some discussions, but we
haven't seen any results for over 6 months now. Hence me asking for
anybody that has approached SH in an official Guix capacity to step
forward. Otherwise, as I said, I can approach SH :)
>
> 1. Legal.
>
> These license violations are your interpretation of the law and, to my
> knowledge, nothing has been in court yet.
>
> Today, it does not really matter if we (or I) share this opinion.
> Because for now, it’s just an opinion.
>
> However, no one is a lawyer here and drawing a clear line is not
> simple.
>
> Thus, FWIW, I would not jump to hard conclusions based on my own
> opinion because today I am not confident enough to state a definitive
> legal position.
>
That is fair. I agree that copyright-wise and legal/state-wise the
answer is not clear at all. And, as you said, I don't think anybody on
this mailing list can answer that decidedly.
> 2. Ethical.
>
> If we speak about ethical concerns, we need to be very cautious. We
> all share the same core of values about free software. But we do not
> all bound these values at the same point. Some of us extend them to
> some topics, others restrict them a bit.
>
> Here the issue is that values other than the ones about free software
> are dragged into the picture to state a position. That’s where we need
> to be cautious because we need to embrace diversity and not morally
> judge what is outside our free software project.
>
> About SWH, FWIW, here is my moral reasoning; as you see, it is far
> from definitive.
I agree that we probably won't find any definitive answer on whether
LLMs are bad or not. But that is also not the question posed here tho.
The question posed here was that *all* code that is sent from Guix to
SH is automatically transferred without consent to be used in an LLM.
That is without said process being opt-in and without said
process being transparent.
The second one could be solved by adding the disclaimer and making the
changes to how packages are committed, as I said. I was told it can also
be done by just stopping guix from uploading any new code to SH from any
package, which I would also be in favor of.
The first one can be done with social pressure, which is what the
blog post, the talking, and potentially not including SH in Guix
go towards.
Whether LLMs are ethical or not has nothing to do with the question
posted above. Although personally I would push for not including LLMs
unless under strict criteria of environmental and ethical sourcing. but
that can come at a later time.
I would also like SH to see why opt-in should be the default at the
very least, and the process should be transparent to everybody putting
code into SH. Archiving source code is a good cause. This is why
I said to approach them in official Guix capacity :)
MSavoritias
* Re: Next Steps For the Software Heritage Problem
2024-06-19 9:13 ` MSavoritias
@ 2024-06-19 9:54 ` Efraim Flashner
2024-06-19 10:25 ` raingloom
2024-06-19 10:34 ` MSavoritias
2024-06-19 14:41 ` Simon Tournier
1 sibling, 2 replies; 70+ messages in thread
From: Efraim Flashner @ 2024-06-19 9:54 UTC (permalink / raw)
To: MSavoritias; +Cc: Simon Tournier, Ian Eure, guix-devel
On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote:
> On Wed, 19 Jun 2024 09:52:36 +0200
> Simon Tournier <zimon.toutoune@gmail.com> wrote:
>
> > Hi Ian, all,
> >
> > On Tue, 18 Jun 2024 at 10:57, Ian Eure <ian@retrospec.tv> wrote:
> >
> > > Guix is continuing to partner with SWH in spite of their continued
> > > support of these violations.
> >
> > Quickly because I am in the middle of a busy day. :-)
>
> Hey Simon,
>
> >
> > I think that LLM asks ethical and legal question that even FSF or EFF
> > or SFC does not provide clear answers. (And that probably the level
> > where the discussion should happen.) That’s not a light topic and we
> > should not rush in one definitive conclusion.
> >
> > Thank you for the rise of the concern some weeks ago. It appears to
> > me good that people had expressed their concerns. And still does.
> > Although I am reading there or overthere an aggressive tone; useless.
> >
> > Again, people behind SWH are long-term free software activists and be
> > sure that they do not take this concern lightly. FYI, people of SWH
> > are in touch with some people from Guix to speak about all that.
>
> That is a very good point actually and it is one I also raised in the
> email I sent. That we have been told there are some discussions but we
> haven't seen any results for over 6 months now. Hence me asking for
> anybody that has approached SH in an official Guix capacity to step
> forward. Otherwise as I said I can approach SH :)
The relationship between SWH and Hugging Face is (IMO) off-topic for the
Guix mailing lists. I'm not surprised that the discussions are
happening elsewhere.
> >
> > 1. Legal.
> >
> > These license violations are your interpretation of the law and to my
> > knowledge nothing have been in Court, yet.
> >
> > Today, it does not really matter if we (or I) share this opinion.
> > Because for now, it’s just an opinion.
> >
> > However, no one is a lawyer here and drawing a clear line is not
> > simple.
> >
> > Thus, FWIW, I would not jump in hard conclusions based on my own
> > opinion because today I am not confidant enough to emit a definitive
> > legal position.
> >
>
> That is fair, I agree that copyright wise and legal/state wise the
> answer is not clear at all. And I don't think anybody in this mailing
> list can definitively answer that, as you said.
>
> > 2. Ethical.
> >
> > If we speak about ethical concerns, we need to be very cautious. We
> > all share the same core of values about free software. Then we all
> > do not bound these values to the same point. Some of us extend them
> > to some topics, other restrict a bit.
> >
> > Here the issue is that other values than the ones about free software
> > are dragged in the picture to emit a position. That’s where we need
> > to be cautious because we need to embrace the diversity and do not
> > morally judge what is outside our free software project.
> >
> > About SWH, FWIW, here is my moral reasoning; as you see, it is far to
> > be definitive.
>
> I agree that we probably won't find any definitive answer if LLMs are
> bad or not. But that is also not the question posed here tho.
>
> The question posed here was that *all* code that is sent from Guix to
> SH is automatically transferred without consent to be used in an LLM
> model. That is without said process being opt-in and without said
> process being transparent.
I am not a lawyer, nor do I play one on TV.
Transferring the code is (legally) fine, using the code is (legally)
fine, distributing the result is (I think) legally questionable.
If your concern is the code being transferred to the LLM owners, IMO
that's already covered by the license of the code itself. As for what
the LLM owners do with the code, (again I am not a lawyer) it should not
make a difference if SWH gives them the code, they download it from
Guix's infrastructure or get it straight from upstream. Redistributing
the source code is allowed.
> The second one could be solved by adding the disclaimer and making the
> changes to commit packages as I said. It can also be done I was told
> by just stopping guix from uploading any new code to SH from any
> package. which I would also be in favor.
> The first one can be done with social pressure which is what the
> blogpost and the talking and potentially the not including SH into Guix
> go towards.
>
> Whether LLMs are ethical or not has nothing to do with the question
> posted above. Although personally I would push for not including LLMs
> unless under strict criteria of environmental and ethical sourcing. but
> that can come at a later time.
>
> I would also like SH to see why opt-in should be the default at the
> very least, and the process should be transparent to everybody putting
> code into SH. Archiving source code is a good cause. This is why
> I said to approach them in official Guix capacity :)
One of our packages, dbxfs, left Github a while ago and continued
development on a different forge. They adjusted their README to disallow
hosting of their code on Github. Based on this restriction we have
labeled later versions of the software as non-free and have not updated
the package. IMO saying that source code cannot be uploaded to SWH would
fall into the same category.
--
Efraim Flashner <efraim@flashner.co.il> רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: Next Steps For the Software Heritage Problem
2024-06-19 7:01 ` MSavoritias
@ 2024-06-19 9:57 ` Efraim Flashner
2024-06-20 2:56 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
1 sibling, 0 replies; 70+ messages in thread
From: Efraim Flashner @ 2024-06-19 9:57 UTC (permalink / raw)
To: MSavoritias; +Cc: Greg Hogan, guix-devel
On Wed, Jun 19, 2024 at 10:01:43AM +0300, MSavoritias wrote:
> On Tue, 18 Jun 2024 13:31:02 -0400
> Greg Hogan <code@greghogan.com> wrote:
>
> > On Tue, Jun 18, 2024 at 12:33 PM MSavoritias <email@msavoritias.me>
> > wrote:
> > >
<snip>
> >
> > If you feel that LLMs/AI are violating the terms of a license, then
> > feel free to pursue that through the legal system (potentially very
> > profitable given the monetary penalties for violations of copyright).
> > Otherwise, we should be celebrating the users and use of free
> > software. I'm old enough to remember "Only wimps use tape backup:
> > _real_ men just upload their important stuff on ftp, and let the rest
> > of the world mirror it ;)"
> > [https://lkml.iu.edu/hypermail/linux/kernel/9607.2/0292.html].
>
> Hey Greg,
>
<snip>
>
> PS. I am also not a man :P
To head off any potential misunderstanding, I followed the link above
and the line "Only wimps ..." is an old quote from Linus Torvalds, not
Greg assuming your gender :).
--
Efraim Flashner <efraim@flashner.co.il> רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: Next Steps For the Software Heritage Problem
2024-06-18 8:37 Next Steps For the Software Heritage Problem MSavoritias
2024-06-18 14:19 ` Ian Eure
2024-06-18 16:21 ` Greg Hogan
@ 2024-06-19 10:10 ` Efraim Flashner
2024-06-21 8:39 ` About SWH, let avoid the wrong discussion Simon Tournier
3 siblings, 0 replies; 70+ messages in thread
From: Efraim Flashner @ 2024-06-19 10:10 UTC (permalink / raw)
To: MSavoritias; +Cc: guix-devel
On Tue, Jun 18, 2024 at 11:37:17AM +0300, MSavoritias wrote:
> Hello,
<snip>
> So with that said I urge anybody who has been in contact with them in
> an official Guix capacity to come forward, otherwise I can volunteer to
> be that. Idk if we have a community outreach thing I need to be in also
> for that. (we should if not)
<snip>
Without addressing the rest of the email, I'd like to point out that if
the Guix project needs to interact with SWH (or Hugging Face) in an
official capacity then the maintainers will either do it or take care of
it. Thank you for your offer, we'll keep it in mind.
--
Efraim Flashner <efraim@flashner.co.il> רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: Next Steps For the Software Heritage Problem
2024-06-19 9:54 ` Efraim Flashner
@ 2024-06-19 10:25 ` raingloom
2024-06-19 15:46 ` Ekaitz Zarraga
2024-06-19 10:34 ` MSavoritias
1 sibling, 1 reply; 70+ messages in thread
From: raingloom @ 2024-06-19 10:25 UTC (permalink / raw)
To: MSavoritias, Simon Tournier, Ian Eure, guix-devel
On 2024-06-19 11:54, Efraim Flashner wrote:
> On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote:
> ...
> One of our packages, dbxfs, left Github a while ago and continued
> development on a different forge. They adjusted their README to disallow
> hosting of their code on Github. Based on this restriction we have
> labeled later versions of the software as non-free and have not updated
> the package. IMO saying that source code cannot be uploaded to SWH would
> fall into the same category.
No wonder more and more people are growing dissatisfied with the free
software movement.
* Re: Next Steps For the Software Heritage Problem
2024-06-18 18:08 ` Ian Eure
@ 2024-06-19 10:31 ` raingloom
2024-06-27 12:27 ` Ludovic Courtès
1 sibling, 0 replies; 70+ messages in thread
From: raingloom @ 2024-06-19 10:31 UTC (permalink / raw)
To: Ian Eure; +Cc: guix-devel
On 2024-06-18 20:08, Ian Eure wrote:
> Andy Tai <atai@atai.org> writes:
>
>> What is the role of GNU Guix in this? If Guix is mainly a referral
>> mechanism like web page links to the actual contents, the real problem
>> is not Guix but the use of free software which can be obtained via
>> other mechanisms directly anyway to train LLMs if Guix is not in the
>> loop?
> Guix sends archive requests to SWH. SWH gives that source code to HuggingFace. HuggingFace demonstrably violates the licenses.
>
> Guix could stop sending archive requests to SWH. This wouldn’t *stop* the bad things from happening, but it would *stop condoning* them. The same as how Guix not allowing non-free software doesn’t stop people from running it, but doesn’t condone it.
> ...
Guix doesn't just condone it in this case, it's actively helping SWH out
by submitting packages.
* Re: Next Steps For the Software Heritage Problem
2024-06-19 9:54 ` Efraim Flashner
2024-06-19 10:25 ` raingloom
@ 2024-06-19 10:34 ` MSavoritias
1 sibling, 0 replies; 70+ messages in thread
From: MSavoritias @ 2024-06-19 10:34 UTC (permalink / raw)
To: Efraim Flashner; +Cc: MSavoritias, Simon Tournier, Ian Eure, guix-devel
On Wed, 19 Jun 2024 12:54:30 +0300
Efraim Flashner <efraim@flashner.co.il> wrote:
> On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote:
> > On Wed, 19 Jun 2024 09:52:36 +0200
> > Simon Tournier <zimon.toutoune@gmail.com> wrote:
> >
> > > Hi Ian, all,
> > >
> > > On Tue, 18 Jun 2024 at 10:57, Ian Eure <ian@retrospec.tv> wrote:
>
> > > I think that LLM asks ethical and legal question that even FSF or
> > > EFF or SFC does not provide clear answers. (And that probably
> > > the level where the discussion should happen.) That’s not a
> > > light topic and we should not rush in one definitive conclusion.
> > >
> > > Thank you for the rise of the concern some weeks ago. It appears
> > > to me good that people had expressed their concerns. And still
> > > does. Although I am reading there or overthere an aggressive
> > > tone; useless.
> > >
> > > Again, people behind SWH are long-term free software activists
> > > and be sure that they do not take this concern lightly. FYI,
> > > people of SWH are in touch with some people from Guix to speak
> > > about all that.
> >
> > That is a very good point actually and it is one I also raised in
> > the email I sent. That we have been told there are some discussions
> > but we haven't seen any results for over 6 months now. Hence me
> > asking for anybody that has approached SH in an official Guix
> > capacity to step forward. Otherwise as I said I can approach SH :)
>
> The relationship between SWH and Hugging Face is (IMO) off-topic for
> the Guix mailing lists. I'm not surprised that the discussions are
> happening elsewhere.
Given that any code and package that is contributed to Guix goes to SWH
and Hugging Face I would disagree.
> > > 2. Ethical.
> > >
> > > If we speak about ethical concerns, we need to be very cautious.
> > > We all share the same core of values about free software. Then
> > > we all do not bound these values to the same point. Some of us
> > > extend them to some topics, other restrict a bit.
> > >
> > > Here the issue is that other values than the ones about free
> > > software are dragged in the picture to emit a position. That’s
> > > where we need to be cautious because we need to embrace the
> > > diversity and do not morally judge what is outside our free
> > > software project.
> > >
> > > About SWH, FWIW, here is my moral reasoning; as you see, it is
> > > far to be definitive.
> >
> > I agree that we probably won't find any definitive answer if LLMs
> > are bad or not. But that is also not the question posed here tho.
> >
> > The question posed here was that *all* code that is sent from Guix
> > to SH is automatically transferred without consent to be used in an
> > LLM model. That is without said process being opt-in and without
> > said process being transparent.
>
> I am not a lawyer, nor do I play one on TV.
>
> Transferring the code is (legally) fine, using the code is (legally)
> fine, distributing the result is (I think) legally questionable.
>
> If your concern is the code being transferred to the LLM owners, IMO
> that's already covered by the license of the code itself. As for what
> the LLM owners do with the code, (again I am not a lawyer) it should
> not make a difference if SWH gives them the code, they download it
> from Guix's infrastructure or get it straight from upstream.
> Redistributing the source code is allowed.
Idk if you read the email that was sent to Greg in the other thread.
Given that you replied there too I assume you did.
So given this context I am repeating again that this is not about the
legal side, and let me copy-paste my reply to the legal argument:
Quote:
You seem to be arguing on a different thread or a point I never made. I
didn't talk about licenses or legal/state rules before you mentioned
them. What I have mentioned is that SH breaks our social rules and
expectations by feeding all code into an algorithm that will endlessly
output the same as original.
I am not interested what the states or licenses/copyrights allow or
don't allow in this case. What I care about is what we expect as a
community when we submit a package/code to guix and if that violates
our social rules and expectations. And from what I have seen and talked
with people it does indeed.
> > The second one could be solved by adding the disclaimer and making
> > the changes to commit packages as I said. It can also be done I
> > was told by just stopping guix from uploading any new code to SH
> > from any package. which I would also be in favor.
> > The first one can be done with social pressure which is what the
> > blogpost and the talking and potentially the not including SH into
> > Guix go towards.
> >
> > Whether LLMs are ethical or not has nothing to do with the question
> > posted above. Although personally I would push for not including
> > LLMs unless under strict criteria of environmental and ethical
> > sourcing. but that can come at a later time.
> >
> > I would also like SH to see why opt-in should be the default at the
> > very least, and the process should be transparent to everybody
> > putting code into SH. Archiving source code is a good cause. This
> > is why I said to approach them in official Guix capacity :)
>
> One of our packages, dbxfs, left Github a while ago and continued
> development on a different forge. They adjusted their README to
> disallow hosting of their code on Github. Based on this restriction
> we have labeled later versions of the software as non-free and have
> not updated the package. IMO saying that source code cannot be
> uploaded to SWH would fall into the same category.
Good thing that is not what I suggested then. :)
Regards,
MSavoritias
* Re: Next Steps For the Software Heritage Problem
2024-06-19 9:13 ` MSavoritias
2024-06-19 9:54 ` Efraim Flashner
@ 2024-06-19 14:41 ` Simon Tournier
2024-06-20 6:51 ` MSavoritias
1 sibling, 1 reply; 70+ messages in thread
From: Simon Tournier @ 2024-06-19 14:41 UTC (permalink / raw)
To: MSavoritias; +Cc: Ian Eure, guix-devel
Hi MSavoritias, all,
Let me provide more context.
The concern started a couple of months ago, to my knowledge. And
discussion is still ongoing. So I think it’s incorrect to say that we
“haven't seen any results for over 6 months”.
Moreover, I feel you have a misunderstanding about the HuggingFace and SWH
partnership. From my reading of public information, HuggingFace and
BigCode train on a subset of the SWH source code archive. I mean, it is a
snapshot and to my knowledge, they provided the list of source code that
had been used for training.
Not to avoid the question, but from a pragmatic point of view, one might
ask whether the source code you write, and do not want included in the
training dataset, is concretely part of that training dataset.
HuggingFace is not training continuously with source code from SWH.
And technically, SWH is an archive, i.e., the code is not stored hot. I
do not know and I have not read all the details HuggingFace gives of their
method; i.e., which kind of data they process – independent unique
files, complete repository, etc. What I know is that the component used when
fetching from SWH is named SWH Vault; it needs to “cook” and prepare
all the files, which takes time, from minutes to days.
All that to say two key points:
1. People behind SWH are well aware of the various sides of the concerns.
As said, they are long-time free software supporters. Be sure they have
heard community concerns. Some discussions are still pending because, as
explained, all sides of the ethical questions need to be handled cautiously.
Please do not think it is ignored.
2. FWIW, I am in touch with SWH people – among other members of the Guix
community. For instance, in order to feed the discussion, Roberto from
SWH pointed me to this blog post by Bruce Perens:
https://perens.com/2019/10/12/invasion-of-the-ethical-licenses/
Well, I do not know if the outcome will be aligned with your current
opinion, but be sure that your concerns, like the others raised by Guix
community members, are being taken into account.
Cheers,
simon
* Re: Next Steps For the Software Heritage Problem
2024-06-19 10:25 ` raingloom
@ 2024-06-19 15:46 ` Ekaitz Zarraga
2024-06-20 6:36 ` MSavoritias
0 siblings, 1 reply; 70+ messages in thread
From: Ekaitz Zarraga @ 2024-06-19 15:46 UTC (permalink / raw)
To: raingloom, MSavoritias, Simon Tournier, Ian Eure, guix-devel
On 2024-06-19 12:25, raingloom@riseup.net wrote:
> On 2024-06-19 11:54, Efraim Flashner wrote:
>> On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote:
>> ...
>> One of our packages, dbxfs, left Github a while ago and continued
>> development on a different forge. They adjusted their README to disallow
>> hosting of their code on Github. Based on this restriction we have
>> labeled later versions of the software as non-free and have not updated
>> the package. IMO saying that source code cannot be uploaded to SWH would
>> fall into the same category.
>
> No wonder more and more people are growing dissatisfied with the free
> software movement.
>
There are many valid reasons why someone might criticize the Free
Software movement and people behind it, but making free software only
has 4 simple rules. If you don't comply with them you are not free
software anymore. It's as simple as that, and that simple it should be.
Free Software gives me the FREEDOM to print the code, make a roll with
it and shove it up my ass if I want to (and even distribute my modified
copies for other people to do so). The same freedom I have to upload it
to github. If you prevent me from doing one or the other you are
restricting my freedom and that's defeating the purpose of free software
and we cannot consider your code free software anymore. The line is
clear, and trying to pretend to be free software while restricting
people's freedoms (regardless of what they are) is absurd.
The Free Software movement can be labeled (and is often labeled) as a
political movement but I'd say it's more of an ethical movement. It's a
way to share *values* and the value we share here is freedom. We might
or might not share other values, politics, religion or anything, but as
long as we put the freedom in the first place we should agree that free
software is better than any other software model we have.
There are bad actors in the world (say thieves, killers or... GitHub and
AI), and we can discuss how we should deal with them but I don't
think the answer is putting our *values* aside but embracing them harder
(one value, freedom, in our case).
If people are not happy with the Free Software movement because it puts
the freedom first, I can only understand it as people being mad about
Free Software because it's about software.
For other values, we can start other initiatives I may or may not agree
more with, but if the value is freedom (in software), I don't think
there's any better way to push for it. But trying to disguise other
things inside of the Free Software is kind of dishonest.
I don't know, maybe I'm just a little bit tired.
* Re: Next Steps For the Software Heritage Problem
2024-06-19 7:01 ` MSavoritias
2024-06-19 9:57 ` Efraim Flashner
@ 2024-06-20 2:56 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-20 5:18 ` MSavoritias
1 sibling, 1 reply; 70+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-20 2:56 UTC (permalink / raw)
To: MSavoritias, Greg Hogan; +Cc: guix-devel
Hi MSavoritias,
On Wed, Jun 19 2024, MSavoritias wrote:
> I am not interested what the states or licenses/copyrights allow or
> don't allow in this case. What I care about is what we expect as a
> community when we submit a package/code to guix and if that violates
> our social rules and expectations.
Just in case the sweeping mention of our social rules and expectations
includes me, please know that licensing and copyright are a big part of
why I am a part of this community.
Kind regards
Felix
* Re: Next Steps For the Software Heritage Problem
2024-06-20 2:56 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
@ 2024-06-20 5:18 ` MSavoritias
0 siblings, 0 replies; 70+ messages in thread
From: MSavoritias @ 2024-06-20 5:18 UTC (permalink / raw)
To: Felix Lechner; +Cc: Greg Hogan, guix-devel
On Wed, 19 Jun 2024 19:56:26 -0700
Felix Lechner <felix.lechner@lease-up.com> wrote:
> Hi MSavoritias,
>
> On Wed, Jun 19 2024, MSavoritias wrote:
>
> > I am not interested what the states or licenses/copyrights allow or
> > don't allow in this case. What I care about is what we expect as a
> > community when we submit a package/code to guix and if that violates
> > our social rules and expectations.
>
> Just in case the sweeping mention of our social rules and expectations
> includes me, please know that licensing and copyright are a big part
> of why I am a part of this community.
>
> Kind regards
> Felix
Sure we all are.
But remember that we also have a CoC and social rules because building
a community can't be done on top of legal rules, i.e. copyright.
Just as social rules shouldn't be used for legal matters all the
time, copyright shouldn't be used for social rules. Which is what I am
saying here.
MSavoritias
* Re: Next Steps For the Software Heritage Problem
2024-06-19 15:46 ` Ekaitz Zarraga
@ 2024-06-20 6:36 ` MSavoritias
2024-06-20 14:35 ` Ekaitz Zarraga
0 siblings, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-20 6:36 UTC (permalink / raw)
To: Ekaitz Zarraga
Cc: raingloom, MSavoritias, Simon Tournier, Ian Eure, guix-devel
On Wed, 19 Jun 2024 17:46:08 +0200
Ekaitz Zarraga <ekaitz@elenq.tech> wrote:
> On 2024-06-19 12:25, raingloom@riseup.net wrote:
> > On 2024-06-19 11:54, Efraim Flashner wrote:
> >> On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote:
> >> ...
> >> One of our packages, dbxfs, left Github a while ago and continued
> >> development on a different forge. They adjusted their README to
> >> disallow hosting of their code on Github. Based on this
> >> restriction we have labeled later versions of the software as
> >> non-free and have not updated the package. IMO saying that source
> >> code cannot be uploaded to SWH would fall into the same category.
> >
> > No wonder more and more people are growing dissatisfied with the
> > free software movement.
> >
>
Hey Ekaitz,
Please remember two things in the context of all of this:
1. Guix is not a software entity but it is made of people that want a
safer, collaborative space to create things. These things may be code,
a blog post or anything else as part of guix. Even a social network
account. I am saying this because you only talked about Free Software
in your message and not about people or different contexts.
And we are talking about people here. Not code. Code is not alive.
2. You seem to imply that Free Software or code is apolitical (in the
sense of social or state politics), which it is not. Nothing is.
For example Free Software is explicitly pro-capitalist and
pro-Google/big companies. I am not saying I disagree, but it's good
to keep in mind that politics always exist. And in the case
> There are many valid reasons why someone might criticize the Free
> Software movement and people behind it, but making free software only
> has 4 simple rules. If you don't comply with them you are not free
> software anymore. It's as simple as that, and that simple it should
> be.
>
> Free Software gives me the FREEDOM to print the code, make a roll
> with it and shove it up my ass if I want to (and even distribute my
> modified copies for other people to do so). The same freedom I have
> to upload it to github. If you prevent me from doing one or the other
> you are restricting my freedom and that's defeating the purpose of
> free software and we cannot consider your code free software anymore.
> The line is clear, and trying to pretend to be free software while
> restricting people's freedoms (regardless of what they are) is absurd.
This is missing the context that the GPL does indeed restrict people's
freedom to license code as they see fit. Because it was written to
further the political goals of the FSF. It is on purpose. So we are already
restricting the freedom of people to do what they want on purpose.
And let's not forget
"your freedom ends where the other person's freedom begins"
and consent of course in the issue at hand.
>
> The Free Software movement can be labeled (and is often labeled) as a
> political movement but I'd say it's more of an ethical movement. It's
> a way to share *values* and the value we share here is freedom. We
> might or might not share other values, politics, religion or
> anything, but as long as we put the freedom in the first place we
> should agree that free software is better than any other software
> model we have.
>
> There are bad actors in the world (say thieves, killers or... GitHub
> and AI), and we can discuss about how we should deal with them but I
> don't think the answer is putting our *values* aside but embrace them
> harder (one value, freedom, in our case).
Definitely agree. The solution is not to embrace proprietary software or
restrict software. It's to write down some common social rules that are
rooted in consent.
> If people are not happy with the Free Software movement because it
> puts the freedom first, I can only understand it as people being mad
> about Free Software because it's about software.
>
> For other values, we can start other initiatives I may or may not
> agree more with, but if the value is freedom (in software), I don't
> think there's any better way to push for it. But trying to disguise
> other things inside of the Free Software is kind of dishonest.
Fair. I mean we already have CoC and channel descriptions. Idk if we
have event guidelines/CoC yet but we should.
> I don't know, maybe I'm just a little bit tired.
No worries. I think it was very well said.
MSavoritias
* Re: Next Steps For the Software Heritage Problem
2024-06-19 14:41 ` Simon Tournier
@ 2024-06-20 6:51 ` MSavoritias
2024-06-20 14:40 ` Simon Tournier
0 siblings, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-20 6:51 UTC (permalink / raw)
To: Simon Tournier; +Cc: Ian Eure, guix-devel
On Wed, 19 Jun 2024 16:41:33 +0200
Simon Tournier <zimon.toutoune@gmail.com> wrote:
> Hi MSavoritias, all,
>
> Let me provide more context.
>
> The concern started couple of months ago, to my knowledge. And
> discussion is still on going. So I think that’s incorrect to say “any
> result for over 6 months”.
Hey Simon,
I was talking about the perspective of a Guix person that is not part
of the maintainers or of any mailing lists where these discussions are
happening. So from my side there haven't been any updates from SWH or
from Guix either for the named issue or the LLM issue.
> Moreover, I feel you have a misunderstanding about HuggingFace and SWH
> partnership. From the reading of public information, HuggingFace and
> BigCode trains on a subset of SWH source code archive. I mean, it is
> a snapshot and to my knowledge, they provided the list of source code
> that had been used for training.
>
> Not to avoid the question but, from a pragmatic point of view, one
> might ask whether the source code you wrote, and do not want included
> in the training dataset, is concretely part of that training
> dataset.
>
> HuggingFace is not training continuously with source code from SWH.
>
> And technically, SWH is an archive i.e., the code is not stored hot.
> I do not know and I have not read all details by HuggingFace of their
> method; i.e., which kind of data they process – independent unique
> files, complete repository, etc. What I know is that the component
> used when fetching from SWH is named SWH Vault; it requires “cooking”
> and preparing all the files, which takes time, from minutes to days.
That's all fair and valid. Sadly though, SWH:
- Doesn't even mention on their website anything about what happens to
my code and where, so there is no provenance (unless I start searching
HuggingFace).
- The email from the director that was sent to me says explicitly that
they don't see an issue with it being opt-out after the fact and
embrace LLM usage. So it seems to me that it's already in there.
> All that to say two key points:
>
> 1. People behind SWH are well aware of the various sides of the
> concerns. As said, they are long-time free software supporters. Be
> sure they have heard the community's concerns. Some discussions are
> still pending because, as explained, all sides of ethical questions
> need to be handled cautiously.
>
> Please do not think it is ignored.
>
>
> 2. FWIW, I am in touch with SWH people – among other members from Guix
> community. For instance, in order to feed the discussion, Roberto
> from SWH pointed me to this blog post by Bruce Perens:
>
> https://perens.com/2019/10/12/invasion-of-the-ethical-licenses/
>
> Well, I do not know if the outcome will be aligned with your current
> opinion, but be sure that your concerns, like the others raised by Guix
> community members, are being taken into account.
Thank you for giving me an honest and detailed answer.
I wish I could say this was encouraging but as things currently stand I
would like much more transparency about what is actually happening from
Guix and SWH. Because currently:
- The director seemed completely oblivious to any issues with LLMs or
code harvesting without consent.
- Efraim seemed to suggest that there hasn't been any
communication and that it's even off-topic.
- Nothing has been written from Guix or SWH publicly about it and there
are no mechanisms in place in the short term even to mitigate some of
these things. (Which my next steps try to fix when I make the patches
in a few weeks)
Regards,
MSavoritias
> Cheers,
> simon
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 6:36 ` MSavoritias
@ 2024-06-20 14:35 ` Ekaitz Zarraga
2024-06-21 8:51 ` MSavoritias
0 siblings, 1 reply; 70+ messages in thread
From: Ekaitz Zarraga @ 2024-06-20 14:35 UTC (permalink / raw)
To: MSavoritias; +Cc: raingloom, Simon Tournier, Ian Eure, guix-devel
Hi,
On 2024-06-20 08:36, MSavoritias wrote:
> On Wed, 19 Jun 2024 17:46:08 +0200
> Ekaitz Zarraga <ekaitz@elenq.tech> wrote:
>
>> On 2024-06-19 12:25, raingloom@riseup.net wrote:
>>> On 2024-06-19 11:54, Efraim Flashner wrote:
>>>> On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote:
>>>> ...
>>>> One of our packages, dbxfs, left Github a while ago and continued
>>>> development on a different forge. They adjusted their README to
>>>> disallow hosting of their code on Github. Based on this
>>>> restriction we have labeled later versions of the software as
>>>> non-free and have not updated the package. IMO saying that source
>>>> code cannot be uploaded to SWH would fall into the same category.
>>>
>>> No wonder more and more people are growing dissatisfied with the
>>> free software movement.
>>>
>>
> Hey Ekaitz,
>
> Please remember two things in the context of all of this:
> 1. Guix is not a software entity but it is made of people that want a
> safer, collaborative space to create things. These things may be code,
> a blog post or anything else as part of guix. Even a social network
> account. I am saying this because you only talked about Free Software
> in your message and not about people or different contexts.
> And we are talking about people here. Not code. Code is not alive.
I was specifically talking about the Free Software issue raised by
Efraim and the message by Raingloom. And what you point out is exactly
what I wanted to separate, as you did very well. Now we are talking about
the people and about how things affect people, and that's a different
matter I'm going to tackle below.
> 2. You seem to imply that Free Software or code is apolitical. (in the
> sense of social or state politics not) Which it is not. Nothing is.
> For example Free Software is explicitly pro-capitalist and
> pro-Google/big companies. I am not saying I disagree, but its good
> to keep in mind that politics exist and do exist always. And in the case
I'm not one of those people that think everything is politics but that's
not a debate I want to open. Free Software can be understood in many
ways. I don't think it's pro-capitalist, but pro-freedom; that
freedom affects the capitalists too, and it's a *value* they have. But
freedom is also an anarchist value, and it can be an anti-capitalist
value too. It becomes more political when you put more things around it.
The issue I was trying to point out is that Free Software attracts many people
from many different backgrounds and politics, and trying to push for one
side defeats its purpose: making people stay together because they have
some shared value.
>> There are many valid reasons why someone might criticize the Free
>> Software movement and people behind it, but making free software only
>> has 4 simple rules. If you don't comply with them you are not free
>> software anymore. It's as simple as that, and that simple it should
>> be.
>>
>> Free Software gives me the FREEDOM to print the code, make a roll
>> with it and shove it up my ass if I want to (and even distribute my
>> modified copies for other people to do so). The same freedom I have
>> to upload it to github. If you prevent me from doing one or the other
>> you are restricting my freedom and that's defeating the purpose of
>> free software and we cannot consider your code free software anymore.
>> The line is clear, and trying to pretend to be free software while
>> restricting people's freedoms (regardless of what they are) is absurd.
>
> This is missing the context that GPL does indeed restrict people's
> freedom to license code as they see fit. Because it was written to
> further the political goals of FSF. It is on purpose. So we are already
> restricting the freedom of people to do what they want on purpose.
It does restrict your freedom but only if your goal is to restrict other
people's software freedom. I'd say the argument here was that GPL
provides more absolute freedom in the current world than other licenses
but I don't think the GPL was a very easy decision to make for the
radical freedom fighters. That's why some people don't like it.
> And lets not forget
> "your freedom ends where the other persons freedom begins"
> and consent of course in the issue at hand.
Yes, but I don't think this is a matter Free Software needs to deal
with. And my original message was around that.
Now, we should do something as a set of people that collaboratively work
in a project. Probably not under the Free Software label, because what
free software is is already pretty clear and well defined, but as
something else, may that be Guix users and contributors, if we wish.
>>
>> The Free Software movement can be labeled (and is often labeled) as a
>> political movement but I'd say it's more of an ethical movement. It's
>> a way to share *values* and the value we share here is freedom. We
>> might or might not share other values, politics, religion or
>> anything, but as long as we put the freedom in the first place we
>> should agree that free software is better than any other software
>> model we have.
>>
>> There are bad actors in the world (say thieves, killers or... GitHub
>> and AI), and we can discuss about how we should deal with them but I
>> don't think the answer is putting our *values* aside but embrace them
>> harder (one value, freedom, in our case).
>
> Definitely agree. The solution is not to embrace proprietary software or
> restrict software. It's to write down some common social rules that are
> rooted in consent.
>
>> If people are not happy with the Free Software movement because it
>> puts the freedom first, I can only understand it as people being mad
>> about Free Software because it's about software.
>>
>> For other values, we can start other initiatives I may or may not
>> agree more with, but if the value is freedom (in software), I don't
>> think there's any better way to push for it. But trying to disguise
>> other things inside of the Free Software is kind of dishonest.
>
> Fair. I mean we already have CoC and channel descriptions. Idk if we
> have event guidelines/CoC yet but we should.
>
>> I don't know, maybe I'm just a little bit tired.
>
> No worries. I think it was very well said.
>
> MSavoritias
That was just to clarify that my point wasn't against this discussion,
but to say that the decision Efraim took on dbxfs is not only correct but
the only possible decision, and that it should be.
Now in Guix, I don't feel comfortable with the fact that we are helping
people train AI that doesn't respect the licenses of our work. I'm
sick of it.
If they respected the licenses, I'd be ok with it. Since I accepted Free
Software's social contract I'm open for anyone to use my code for any
purpose (unless they don't respect people's freedom later).
Also, even if we don't do anything about it, Guix's codebase is public,
so they could do it anyway, regardless of SWH, so there's not much we
can do about that.
What we *can* do is raise our concerns to SWH, motivating them to be
more strict with their collaboration with companies or with the terms of
their collaboration. It's probably better that they are on our side in
this battle than if we are alone. I think they are sensitive to this
issue so it shouldn't be hard to have a proper conversation with them
and see if we can understand better what they do, how, in which terms
and so on.
Maybe it's better that these AI companies reach our code through SWH
with a well-written contract than letting them steal it from the
internet without them having to sign anything.
I'm kind of just guessing there, but we are probably stronger that way.
Also, if we could get other distros to take part in this it would be a
great way to be stronger.
In any case, I think SWH are more than sensitive to this issue and I
think their connections might be helpful not only to restrict
HuggingFace from doing shady things but to start pushing for regulation
for every AI company that uses our sweat for their purposes.
So, to come back to my original point: It's not the free software that
needs to change. It's the regulation of AI companies that should, and
the responsibility we demand from them. Legally and morally, they should
be accountable for what they do, and that's the direction I'd like to
approach this. Maybe it's not easy to change the regulation of the whole
world, but we can try to push for it in Europe (we pioneered some
related regulations before) first.
In summary, I don't think this is just a SWH is bad/good or Free
Software is bad/good issue.
Best,
Ekaitz
PS: If there's action I'm open and ready for it, but I wouldn't like this
discussion to become an exercise of ethical bragging with no goals.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 6:51 ` MSavoritias
@ 2024-06-20 14:40 ` Simon Tournier
2024-06-21 9:08 ` MSavoritias
0 siblings, 1 reply; 70+ messages in thread
From: Simon Tournier @ 2024-06-20 14:40 UTC (permalink / raw)
To: MSavoritias; +Cc: Ian Eure, guix-devel
Hi MSavoritias, all,
On Thu, 20 Jun 2024 at 09:51, MSavoritias <email@msavoritias.me> wrote:
>> Not to avoid the question but, from a pragmatic point of view, one
>> might ask whether the source code you wrote, and do not want included
>> in the training dataset, is concretely part of that training
>> dataset.
[...]
> That's all fair and valid. Sadly though, SWH:
> - Doesn't even mention on their website anything about what happens to
> my code and where, so there is no provenance (unless I start searching
> HuggingFace).
Being concrete and explicit, could you please share:
1. Which part of your code is included in the pretraining dataset?
It’s easy: you can copy/paste a snippet and it returns the location
it comes from.
https://huggingface.co/spaces/bigcode/search-v2a
2. What is your code that is included in SWH archive?
Again, it’s easy: checkout some commit of your repository, then
inside this repository, you can run:
echo "https://archive.softwareheritage.org/swh:1:dir:$(guix hash -S git -f hex -H sha1 .)"
Do not miss the ’.’ (dot) once you have entered the repository. This
command returns the SWHID. In other words, using this identifier, you
can find out whether the repository is stored by SWH. (Be careful with
temporary artifacts such as .go files.)
Or you can also check for one specific content:
$ echo "https://archive.softwareheritage.org/swh:1:cnt:$(guix hash -S git -f hex -H sha1 COPYING)"
https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
And the URL displays the content of the file COPYING. Here, the GPL 3
license for instance.
3. Where is such source code from #1 and #2 packaged by Guix?
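For readers without Guix at hand, here is a minimal sketch of how the content identifier from point 2 could be computed by hand. It assumes that the "cnt" identifier is the Git blob hash (SHA-1 over a "blob <size>" header, a NUL byte, then the file bytes) and that GNU sha1sum is installed; the function name swhid_cnt is made up for illustration.

```shell
# Hypothetical helper: print the swh:1:cnt: identifier of one file.
# Assumption: the identifier is the Git blob hash, i.e. the SHA-1 of
# "blob <size in bytes>" + NUL byte, followed by the file content.
swhid_cnt() {
    file=$1
    # wc -c may pad its output with spaces on some systems; strip them.
    size=$(wc -c < "$file" | tr -d ' ')
    hash=$(
        { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
    )
    printf 'swh:1:cnt:%s\n' "$hash"
}
```

If the assumption holds, running "swhid_cnt COPYING" should print the same identifier that the guix hash command above produces for the same file.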
That said, if the source is hosted on GitHub or GitLab.com or SourceHut
or CodeBerg or some other popular forges or even mirrored without your
consent on one of these, please consider that your code has been
ingested by ChatGPT without any means to verify. Obviously, that’s not
an argument to accept the situation with HuggingFace and I understand
that you do not want your publicly released copyleft source code
to be reused by any LLM.
However, as said several times, enforcing this wish for non-inclusion
involves more than your own will once you have publicly released such
source code under some copyleft license. I hope we agree on that.
Again, I am not trying to avoid something. And again, we all have heard
your points. Nothing is ignored. To my knowledge, the path forward is
not yet well-defined.
Since we are discussing at length with various different inputs, it
means that a common understanding and/or opinion does not seem obvious.
>> Well, I do not know if the outcome will be aligned with your current
>> opinion, but be sure that your concerns, like the others raised by Guix
>> community members, are being taken into account.
>
> Thank you for giving me an honest and detailed answer.
I feel you are pushy on the topic and for what my opinion is worth, it
is not helpful to raise again and again that you want a way to opt-out.
Yeah, people got it. :-) And you are probably not alone, I guess.
It would help if you could provide source code that you wrote and
answer the three criteria above: included in pretraining dataset,
included in SWH, packaged by Guix.
I do not have special information from SWH but I am sure SWH people are
working on the topic. And again, maybe the outcome will not be aligned
with your opinion. Another story.
Now, the other question you ask to Guix: do we continue to help SWH in
harvesting? You propose to stop, IIUC. Ok, we got it, too. :-) From my
point of view, the path forward is not to speak in the abstract but to
ground the discussion in concrete numbers; it would help in bounding
what we are speaking about.
Concretely, if you would like to be able to opt-out, could you point:
1. the piece of the Guix source code you are the author of?
2. source code you are the author of that is packaged by Guix?
Again, I am not trying to avoid the discussion. Instead, I would prefer
to root the discussion on concrete examples. Then it would appear to me
easier to make progress.
As Greg or Ekaitz also wrote: opting out has implications on the meaning
of freedom behind “free software”.
IMHO, wanting to opt out does not mean that we could, would be able to,
or would be allowed to. Therefore, instead of holding opinions in the
abstract, let's try to make progress and start with the concrete: which
piece of source code are we speaking about?
Cheers,
simon
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-19 8:36 ` Dale Mellor
@ 2024-06-20 17:00 ` Andreas Enge
2024-06-20 18:42 ` Dale Mellor
0 siblings, 1 reply; 70+ messages in thread
From: Andreas Enge @ 2024-06-20 17:00 UTC (permalink / raw)
To: Dale Mellor; +Cc: guix-devel
Am Wed, Jun 19, 2024 at 09:36:29AM +0100 schrieb Dale Mellor:
> No, it's not. I use Guix as a tool to develop my own projects, private and
> personal for reasons I'm keeping to myself. As part of that I write package
> definitions for them, and use the Guix machinery to build and test. I *cannot*
> have Guix just giving my code away to anybody, that is just fundamentally wrong.
>
> I think at least there should be a /restricted/ license type available to
> package definitions, and the system absolutely should not give source code away
> from packages which use this (of course, they won't get into the official
> distribution, but that's fine).
Is there a misunderstanding here? The Guix software framework does not
communicate software that you work on to outsiders. As I understand it,
SWH looks at the Guix packages that are publicly available in the Guix
git repo, and then archives the corresponding source code of these packages.
By definition, this is free software (otherwise we would not package it),
and available from elsewhere on the Internet (the "uri" part of the
"source" field). So I think Guix does not actually do anything in this
context, and all this discussion is moot. (Well, I suppose we may encourage
SWH to archive these sources, and am personally very much in favour of it;
but they do not need us for archiving the sources.)
The goal of SWH is to archive all free software in the world, and if you
want to prevent your software from appearing in their collection, the only
reliable solution is to not publish it as free software (which apparently
is your approach, Dale, for the software you are talking about).
Andreas
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 17:00 ` Andreas Enge
@ 2024-06-20 18:42 ` Dale Mellor
2024-06-20 20:54 ` Andreas Enge
2024-06-20 21:27 ` Next Steps For the Software Heritage Problem Simon Tournier
0 siblings, 2 replies; 70+ messages in thread
From: Dale Mellor @ 2024-06-20 18:42 UTC (permalink / raw)
To: Andreas Enge; +Cc: guix-devel
On Thu, 2024-06-20 at 19:00 +0200, Andreas Enge wrote:
> Am Wed, Jun 19, 2024 at 09:36:29AM +0100 schrieb Dale Mellor:
> > No, it's not. I use Guix as a tool to develop my own projects, private
> > and
> > personal for reasons I'm keeping to myself. As part of that I write package
> > definitions for them, and use the Guix machinery to build and test. I
> > *cannot*
> > have Guix just giving my code away to anybody, that is just fundamentally
> > wrong.
>
> Is there a misunderstanding here? The Guix software framework does not
> communicate software that you work on to outsiders.
I'm sure guix lint tried to push my code out to them the last time I tried.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 18:42 ` Dale Mellor
@ 2024-06-20 20:54 ` Andreas Enge
2024-06-20 20:59 ` Ekaitz Zarraga
2024-06-20 21:27 ` Next Steps For the Software Heritage Problem Simon Tournier
1 sibling, 1 reply; 70+ messages in thread
From: Andreas Enge @ 2024-06-20 20:54 UTC (permalink / raw)
To: Dale Mellor; +Cc: guix-devel
Am Thu, Jun 20, 2024 at 07:42:44PM +0100 schrieb Dale Mellor:
> I'm sure guix lint tried to push my code out to them the last time I tried.
Ah indeed, there is this in guix/lint.scm:
(define (check-archival package)
"Check whether PACKAGE's source code is archived on Software Heritage. If
it's not, and if its source code is a VCS snapshot, then send a \"save\"
request to Software Heritage.
It potentially calls this:
(define (save-package-source package)
"Attempt to save the source of PACKAGE on SWH. Return a list of warnings."
Which calls this from swh.scm:
(define* (save-origin url #:optional (type "git"))
"Request URL to be saved."
(call (swh-url "/api/1/origin/save" type "url" url) json->save-reply
http-post*))
So it does not push code, but a URL from which the code can be downloaded.
Thus it requires the code to be available from the Internet; local code
is "safe" from SWH.
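To make that concrete, here is a rough sketch of the address such a save request would target. The path layout is read off the swh-url call quoted above and the host name comes from the archive URL; the exact joining of segments and the trailing slash are assumptions, and swh_save_endpoint is a made-up name. The function only prints the address and sends nothing.

```shell
# Hypothetical helper: print the address a "save" request for ORIGIN
# would be POSTed to. Nothing is sent over the network.
# Assumption: path segments are joined with "/" and end with a slash.
swh_save_endpoint() {
    origin=$1
    type=${2:-git}   # same default as the optional TYPE argument above
    printf 'https://archive.softwareheritage.org/api/1/origin/save/%s/url/%s/\n' \
        "$type" "$origin"
}
```

Only the origin URL appears in the request; no file content from the local checkout is part of it.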
Now I do not know what will happen if you save your code as a git
repository at a hidden URL. For instance, does SWH check the license?
I would hope so.
There is documentation of this feature here:
https://archive.softwareheritage.org/api/1/origin/save/doc/
which says this:
Depending of the provided origin url, the save request can either be:
- immediately accepted, for well known code hosting providers like for instance GitHub or GitLab
- rejected, in case the url is blacklisted by Software Heritage
- put in pending state until a manual check is done in order to determine if it can be loaded or not
So I suppose that if you submit a hidden, but publicly available URL
pointing to non-free code, the request will be "put in pending state",
manually checked and rejected, and maybe the URL added to the blacklist.
Andreas
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 20:54 ` Andreas Enge
@ 2024-06-20 20:59 ` Ekaitz Zarraga
2024-06-20 21:12 ` Andreas Enge
2024-06-21 8:41 ` Dale Mellor
0 siblings, 2 replies; 70+ messages in thread
From: Ekaitz Zarraga @ 2024-06-20 20:59 UTC (permalink / raw)
To: Andreas Enge, Dale Mellor; +Cc: guix-devel
Hi,
On 2024-06-20 22:54, Andreas Enge wrote:
> Am Thu, Jun 20, 2024 at 07:42:44PM +0100 schrieb Dale Mellor:
>> I'm sure guix lint tried to push my code out to them the last time I tried.
>
> Ah indeed, there is this in guix/lint.scm:
>
> (define (check-archival package)
> "Check whether PACKAGE's source code is archived on Software Heritage. If
> it's not, and if its source code is a VCS snapshot, then send a \"save\"
> request to Software Heritage.
>
> It potentially calls this:
> (define (save-package-source package)
> "Attempt to save the source of PACKAGE on SWH. Return a list of warnings."
>
> Which calls this from swh.scm:
> (define* (save-origin url #:optional (type "git"))
> "Request URL to be saved."
> (call (swh-url "/api/1/origin/save" type "url" url) json->save-reply
> http-post*))
>
> So it does not push code, but a URL from which the code can be downloaded.
> Thus it requires the code to be available from the Internet; local code
> is "safe" from SWH.
>
> Now I do not know what will happen if you save your code as a git
> repository at a hidden URL. For instance, does SWH check the license?
> I would hope so.
>
> There is documentation of this feature here:
> https://archive.softwareheritage.org/api/1/origin/save/doc/
> which says this:
> Depending of the provided origin url, the save request can either be:
> - immediately accepted, for well known code hosting providers like for instance GitHub or GitLab
> - rejected, in case the url is blacklisted by Software Heritage
> - put in pending state until a manual check is done in order to determine if it can be loaded or not
>
> So I suppose that if you submit a hidden, but publicly available URL
> pointing to non-free code, the request will be "put in pending state",
> manually checked and rejected, and maybe the URL added to the blacklist.
>
> Andreas
>
>
For this specific case we could add some flag to the command line like
`--do-not-archive` or something like that.
WDYT?
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 20:59 ` Ekaitz Zarraga
@ 2024-06-20 21:12 ` Andreas Enge
2024-06-21 8:41 ` Dale Mellor
1 sibling, 0 replies; 70+ messages in thread
From: Andreas Enge @ 2024-06-20 21:12 UTC (permalink / raw)
To: Ekaitz Zarraga; +Cc: Dale Mellor, guix-devel
Am Thu, Jun 20, 2024 at 10:59:41PM +0200 schrieb Ekaitz Zarraga:
> For this specific case we could add some flag to the command line like
> `--do-not-archive` or something like that.
guix lint -x archival
if I understand "guix lint --help" correctly.
Andreas
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 18:42 ` Dale Mellor
2024-06-20 20:54 ` Andreas Enge
@ 2024-06-20 21:27 ` Simon Tournier
1 sibling, 0 replies; 70+ messages in thread
From: Simon Tournier @ 2024-06-20 21:27 UTC (permalink / raw)
To: Dale Mellor, Andreas Enge; +Cc: guix-devel
Hi,
On Thu, 20 Jun 2024 at 19:42, Dale Mellor <guix-devel-0brg6a@rdmp.org> wrote:
> I'm sure guix lint tried to push my code out to them the last time I
> tried.
Yes, it’s the checker ’archival’.
Therefore, running “guix lint -x archival” does not send any request to
SWH.
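To avoid typing the option each time, one possible workaround is a small shell wrapper that always passes it. This is only a sketch: the name guix_lint_private is made up, and it relies on nothing beyond the "-x archival" option mentioned above.

```shell
# Hypothetical wrapper: run "guix lint" with the archival checker
# always excluded, so that checker never contacts SWH.
guix_lint_private() {
    guix lint -x archival "$@"
}
```

It would then be used as "guix_lint_private my-package" in place of "guix lint my-package", where my-package stands for any package name.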
Cheers,
simon
^ permalink raw reply [flat|nested] 70+ messages in thread
* About SWH, let avoid the wrong discussion
2024-06-18 8:37 Next Steps For the Software Heritage Problem MSavoritias
` (2 preceding siblings ...)
2024-06-19 10:10 ` Efraim Flashner
@ 2024-06-21 8:39 ` Simon Tournier
2024-06-21 9:12 ` MSavoritias
3 siblings, 1 reply; 70+ messages in thread
From: Simon Tournier @ 2024-06-21 8:39 UTC (permalink / raw)
To: MSavoritias, Dale Mellor, Ian Eure, guix-devel
Hi all,
For the record, the Software Heritage initiative has been supportive of
the Guix project for years.
It means that members of the Guix community have, or have had,
interactions with Software Heritage (SWH) teams for years. For example,
the blog post
“Connecting reproducible deployment to a long-term source code archive”
[1] published in 2019. And more recently, the scientific communication
“Source Code Archiving to the Rescue of Reproducible Deployment” [2].
Almost 6 years of friendly interactions and shared values.
Could we avoid to express definitive opinions based on partial
considerations about multi-dimensional topics?
For years, several members of the Guix community have been helped in one
way or another by SWH team members in improving the free software ecosystem.
Well, I speak for myself: I have been invited to several events
organized by SWH and it’s up to you to trust me when I say: SWH team
works very hard to embrace all the diversity of FOSS communities. For
example, I recently attended a talk organized by SWH about Commons;
that talk had been a very good food for thought and maybe it could feed
our current discussion about governance/sociocracy via comments here or
there I could commit, I do not know, maybe.
Well, I am very grateful for the opportunity to interact with SWH teams.
For the record, SWH provided various supports for the organization of 10
Years of Guix, back in 2022. Please remember that SWH team members were
there and some stayed all the three days; probably because we are a nice
community? All the video streams and good videos of the 10 Years of Guix
event that you probably watched, or may watch again, exist because of the
tireless work of a multi-hat person (Debian Developer, Debian Video Team, …
and working at SWH) helped by Guix community members.
Please check the Copyright header for the subcommand “guix locate”.
Yes, it had been partly written by one SWH team member because, yes they
run Guix. Yes, their day-job is at SWH and they are also part of our
Guix community by contributing to Guix source code.
Now, you take it as it is: I am saddened by what people are concluding!
Yes I understand why people are angry. Yes discussions must happen.
However, I was expecting more benefit of the doubt considering history
and track record. Hum, even, maybe, I am asking myself if Guix
community is indeed nice or if this time the community is just harsh and
unfair.
Do we forget the track record and the common history?
Then, for what my opinion is worth, fighting against SWH while thinking
it’s fighting against LLM/AI is the wrong fight. Because 1. we are all
on the same team. And 2. because SWH could be a facilitator for helping in
some regulations, maybe, I do not know. Somehow, I agree with Ekaitz.
You take it as it is: I was expecting more humility from Guix community
members. Do you really think that a collective of people involved in
various FOSS communities with different roles, dedicating their free
time to free software or open source movements, do you think they are
the bad actors here?
My humility tells me, as I expressed several times, nothing is ignored.
Yes I also got the point about the lack of transparency. As I said
above, FWIW, I am in touch with SWH team. Well, I do not have special
information from SWH and I trust them to have listened, or to still be
listening, to various communities. So my understanding is: work is in
progress… Somehow, wait and see.
Yes I know we cannot wait forever. Again, do we forget the track record
and the common history? Do we consider that a multi-layers topic
involving legal or ethics questions is straightforward to articulate?
My humility tells me to wait to have clear and better understanding
about SWH motivations, their rationale, the measures and
counter-measures they maybe have in mind. Be patient and tolerant as I
am with my friends.
Long enough email and thread. That’s all from me! :-)
My last message. Not because I am bored but because one week of
holidays is starting now for me. ;-)
1: https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
2: https://hal.science/hal-04586520v1
Cheers,
simon
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 20:59 ` Ekaitz Zarraga
2024-06-20 21:12 ` Andreas Enge
@ 2024-06-21 8:41 ` Dale Mellor
2024-06-21 9:19 ` MSavoritias
2024-06-21 17:51 ` Exclude checker with package properties [draft PATCH] Simon Tournier
1 sibling, 2 replies; 70+ messages in thread
From: Dale Mellor @ 2024-06-21 8:41 UTC (permalink / raw)
To: Ekaitz Zarraga, Andreas Enge; +Cc: guix-devel
On Thu, 2024-06-20 at 22:59 +0200, Ekaitz Zarraga wrote:
> Hi,
>
> On 2024-06-20 22:54, Andreas Enge wrote:
> > Am Thu, Jun 20, 2024 at 07:42:44PM +0100 schrieb Dale Mellor:
> > > I'm sure guix lint tried to push my code out to them the last time I
> > > tried.
> >
> > Ah indeed, there is this in guix/lint.scm:
> >
> > So it does not push code, but a URL from which the code can be downloaded.
> > Thus it requires the code to be available from the Internet; local code
> > is "safe" from SWH.
But this is still leaking information.
> > Now I do not know what will happen if you save your code as a git
> > repository at a hidden URL. For instance, does SWH check the license?
> > I would hope so.
Hope is not really good enough, there needs to be certainty in this.
>
> For this specific case we could add some flag to the command line like
> `--do-not-archive` or something like that.
`-x archival` does it, but it is too easy to forget and once the cat is out
of the bag privacy is lost. I really think this should be default behaviour, or
at least there should be a flag in the package definition. I would still be
uncomfortable with the last option, as everyone would be relying on the
collective of Guix maintainers to not screw up and accidentally leak private
data.
Dale
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Next Steps For the Software Heritage Problem
2024-06-20 14:35 ` Ekaitz Zarraga
@ 2024-06-21 8:51 ` MSavoritias
0 siblings, 0 replies; 70+ messages in thread
From: MSavoritias @ 2024-06-21 8:51 UTC (permalink / raw)
To: Ekaitz Zarraga; +Cc: raingloom, Simon Tournier, Ian Eure, guix-devel
On Thu, 20 Jun 2024 16:35:10 +0200
Ekaitz Zarraga <ekaitz@elenq.tech> wrote:
> > 2. You seem to imply that Free Software or code is apolitical. (in the
> > sense of social or state politics not) Which it is not. Nothing is.
> > For example Free Software is explicitly pro-capitalist and
> > pro-Google/big companies. I am not saying I disagree, but its good
> > to keep in mind that politics exist and do exist always. And in the case
>
> I'm not one of those people that think everything is politics but that's
> not a debate I want to open. Free Software can be understood from many
> ways. I don't think it's pro-capitalist, but pro-freedom, but that
> freedom affects the capitalists too, and it's a *value* they have. But
> freedom is also an anarchist value, and it can be an anti-capitalist
> value too. It becomes more political when you put more things around it.
> The issue I was trying to point is Free Software attracts many people
> from many different backgrounds and politics, and trying to push for one
> side defeats its purpose: making people stay together because they have
> some shared value.
I agree up to a point. There are a lot of ifs and buts here, and the CoC covers some of them already.
Not every political opinion should be respected.
> >> There are many valid reasons why someone might criticize the Free
> >> Software movement and people behind it, but making free software only
> >> has 4 simple rules. If you don't comply with them you are not free
> >> software anymore. It's as simple as that, and that simple it should
> >> be.
> >>
> >> Free Software gives me the FREEDOM to print the code, make a roll
> >> with it and shove it up my ass if I want to (and even distribute my
> >> modified copies for other people to do so). The same freedom I have
> >> to upload it to github. If you prevent me from doing one or the other
> >> you are restricting my freedom and that's defeating the purpose of
> >> free software and we cannot consider your code free software anymore.
> >> The line is clear, and trying to pretend to be free software while
> >> restricting people's freedoms (regardless of what they are) is absurd.
> >
> > This is missing the context that GPL does indeed restrict people's
> > freedom to license code as the see fit. Because it was written to
> > further the political goals of FSF. It is on purpose. So we are already
> > restricting the freedom of people to do what they want on purpose.
>
> It does restrict your freedom but only if your goal is restrict other
> people's software freedom. I'd say the argument here was that GPL
> provides more absolute freedom in the current world than other licenses
> but I don't think the GPL was a very easy decision to make for the
> radical freedom fighters. That's why some people don't like it.
Sure, I agree. My point was more that we already restrict some things to make room for better things.
In the same way, the CoC restricts some people from participating so that our spaces can be safer for others to participate.
Those are the tradeoffs you have to make. By allowing everybody to do or say whatever they want, you end up losing everybody.
As you said yourself.
> > And lets not forget
> > "your freedom ends where the other persons freedom begins"
> > and consent of course in the issue at hand.
>
> Yes, but I don't think this is a matter Free Software needs to deal
> with. And my original message was around that.
>
> Now, we should do something as a set of people that collaboratively work
> in a project. Probably not under the Free Software label, because what
> free software is is already pretty clear and well defined, but as
> something else, may that be Guix users and contributors, if we wish.
Yep, I agree. And this is exactly what I wanted to do with my proposal in the first place :D
> >> The Free Software movement can be labeled (and is often labeled) as a
> >> political movement but I'd say it's more of an ethical movement. It's
> >> a way to share *values* and the value we share here is freedom. We
> >> might or might not share other values, politics, religion or
> >> anything, but as long as we put the freedom in the first place we
> >> should agree that free software is better than any other software
> >> model we have.
> >>
> >> There are bad actors in the world (say thieves, killers or... GitHub
> >> and AI), and we can discuss about how we should deal with them but I
> >> don't think the answer is putting our *values* aside but embrace them
> >> harder (one value, freedom, in our case).
> >
> > Definitely agree. The solution is not to embrace proprietary software or
> > restrict software. It's to write down some common social rules that are
> > rooted in consent.
> >
> >> If people are not happy with the Free Software movement because it
> >> puts the freedom first, I can only understand it as people being mad
> >> about Free Software because it's about software.
> >>
> >> For other values, we can start other initiatives I may or may not
> >> agree more with, but if the value is freedom (in software), I don't
> >> think there's any better way to push for it. But trying to disguise
> >> other things inside of the Free Software is kind of dishonest.
> >
> > Fair. I mean we already have CoC and channel descriptions. Idk if we
> > have event guidelines/CoC yet but we should.
> >
> >> I don't know, maybe I'm just a little bit tired.
> >
> > No worries. I think it was very well said.
> >
> > MSavoritias
>
> That was just for clarifying my point wasn't against this discussion but
> to say that the decision Efraim took on dbxfs is not only correct but
> the only possible decision, and that it should be.
I think our decisions should be a lot more based on context than dogma or some kind of immovable law. But that is just me and probably a discussion for another time.
> Now in Guix, I don't feel comfortable with the fact we are helping
> people use AI that doesn't respect the licenses of our work to be
> trained. I'm sick of it.
>
> If they respected the licenses, I'd be ok with it. Since I accepted Free
> Software's social contract I'm open for anyone to use my code with any
> purpose (unless they don't respect people's freedom later).
>
> Also, even if we don't do anything about it, Guix's codebase is public,
> so they could do it anyway, regardless of SWH, so there's not much we
> can do about that.
I mean, sure. But the problem is that Guix actively gives them the source code, which they then use for the wrong purposes.
I wouldn't have a problem if it were only about archiving. Just because somebody else is an asshole doesn't mean we have to be.
Also, a lot of people don't see the GPL as a Free Software social contract. They see it as a legal license.
We could probably define some kind of Free Software contract on top, but I am guessing that:
1. It would go against the GPL, because the GPL does not let just anybody use your code for any purpose. We would end up at public domain.
2. A lot of people probably couldn't accept it. See, for example, the hostile forks that have happened even inside GNU.
> What we *can* do is raise our concerns to SWH, motivating them to be
> more strict with their collaboration with companies or with the terms of
> their collaboration. It's probably better that they are in our side in
> this battle than if we are alone. I think they are sensible to this
> issue so it shouldn't be hard to have a proper conversation with them
> and see if we can understand better what they do, how, in which terms
> and so on.
I agree. I don't want to burn any bridges. Which is why I made the proposal that I did. To put social pressure on them to actually respect consent.
> Maybe it's better that these AI companies reach our code through SWH
> with a well-written contract than letting them steal it from the
> internet without having them to sign anything.
>
> I'm kind of just guessing there, but we are probably stronger that way.
> Also, if we could get other distros to take part in this it would be a
> great way to be stronger.
>
> In any case, I think SWH are more than sensible to this issue and I
> think their connections might be helpful to not only restrict this
> HuggingFace from doing shady things but to start pushing for regulation
> for every AI company that uses our sweat for their purposes.
>
> So, to come back to my original point: It's not the free software that
> needs to change. It's the regulation of AI companies that should, and
> the responsibility we demand from them. Legally and morally, they should
> be accountable of what they do, and that's the direction I'd like to
> approach this. Maybe it's not easy to change the regulation of the whole
> world, but we can try to push for it in Europe (we pioneered some
> related regulations before) first.
Maybe. Then again, this changes nothing for the current discussion:
a code-harvesting system like the one SWH has needs to be opt-in, with consent,
laws or not.
Then everybody can make the decision they think is best and give or not give their code to the LLM :)
> In summary, I don't think this is just a SWH is bad/good or Free
> Software is bad/good issue.
>
> Best,
> Ekaitz
>
> PS: If there's action I'm open and ready for it, but I wouldn't like this
> discussion to become an exercise in ethical bragging with no goals.
Please see my initial email in this thread for actionable goals :)
I plan to send a PR/MR/email for them soonish.
MSavoritias
* Re: Next Steps For the Software Heritage Problem
2024-06-20 14:40 ` Simon Tournier
@ 2024-06-21 9:08 ` MSavoritias
0 siblings, 0 replies; 70+ messages in thread
From: MSavoritias @ 2024-06-21 9:08 UTC (permalink / raw)
To: Simon Tournier; +Cc: Ian Eure, guix-devel
On Thu, 20 Jun 2024 16:40:57 +0200
Simon Tournier <zimon.toutoune@gmail.com> wrote:
> Being concrete and explicit, could you please share:
>
> 1. Which part of your code is included in the pretraining dataset?
>
> It’s easy, you can copy/paste a snippet and it returns the location
> from where it comes from.
>
> https://huggingface.co/spaces/bigcode/search-v2a
>
>
> 2. What is your code that is included in SWH archive?
>
> Again, it’s easy: checkout some commit of your repository, then
> inside this repository, you can run:
>
> echo "https://archive.softwareheritage.org/swh:1:dir:$(guix hash -S git -f hex -H sha1 .)"
>
> Do not miss the ’.’ (dot) after entering the repository. This
> command returns the SWHID. In other words, using this identifier, you
> can check whether the repository is stored by SWH. (Be careful with
> temporary artifacts such as .go files.)
>
> Or you can also check for one specific content:
>
> $ echo "https://archive.softwareheritage.org/swh:1:cnt:$(guix hash -S git -f hex -H sha1 COPYING)"
> https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
>
> And the URL displays the content of the file COPYING. Here it is the
> GPL 3 license, for instance.
>
>
> 3. Where is the source code from #1 and #2 packaged by Guix?
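
For readers without Guix at hand, the single-file check in point 2 can be approximated with standard tools. The sketch below assumes that a `swh:1:cnt:` identifier is the same SHA-1 that Git computes for a blob, that is, the hash of the header `blob <size>` followed by a NUL byte and then the file's bytes; the function name `swh_cnt_id` is made up for illustration, and the script relies on `sha1sum` from coreutils being installed.

```shell
#!/bin/sh
# Hypothetical helper: print the swh:1:cnt: identifier of one file.
# Assumption: the content identifier equals Git's blob hash,
# i.e. sha1("blob <size in bytes>" NUL <file bytes>).
swh_cnt_id() {
    file=$1
    # Byte count of the file; tr strips the padding some wc versions add.
    size=$(wc -c < "$file" | tr -d ' ')
    # Hash the Git blob header followed by the raw file content.
    hash=$( { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1 )
    printf 'swh:1:cnt:%s\n' "$hash"
}

# Example: swh_cnt_id COPYING
# Prefix the result with https://archive.softwareheritage.org/ to look it up.
```

This covers single files only; directory identifiers (`swh:1:dir:`) hash a whole tree and are not handled here.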
My code is not yet in Guix. The question and the actions I proposed came about because I want to commit my package to Guix,
but the minute I do, it is shared with SWH without my consent.
> That said, if the source is hosted on GitHub or GitLab.com or SourceHut
> or CodeBerg or some other popular forges or even mirrored without your
> consent on one of these, please consider that your code had been
> ingested by ChatGPT without any mean to verify. Obviously, that’s not
> an argument to accept the situation with HuggingFace and I understand
> that you do not want that your publicly release copyleft source code
> could be reused by any LLM.
>
> However, as said several times, rooting this willing of non-inclusion is
> larger than your own willing once you publicly released such source code
> under some copyleft license. I hope we agree on that.
>
> Again, I am not trying to avoid something. And again, we all have heard
> your points. Nothing is ignored. To my knowledge, the path forward is
> not yet well-defined.
>
> Since we are discussing at length with various different inputs, it
> means that a common understanding and/or opinion does not seem obvious.
Let me put it more clearly. I am NOT asking SWH to stop training the LLM, and I am NOT asking Guix to take a stance against LLMs.
And yes, I do know that my code is going to be harvested anyway.
What I DO ask is:
1. For SWH to make the sharing of code with the LLM strictly opt-in.
2. For Guix not to enable that behavior until that is fixed, because it is against our social rules and CoC.
For the second point, I already outlined in my first emails some steps we could take to protect our package authors and show our disagreement.
Also, in the XMPP chat it was suggested that Guix can simply stop sending new package code until an opt-in system is in place.
> >> Well, I do not know if the outcome will be aligned with your current
> >> opinion, but be sure that your concerns as the others raised by Guix
> >> community members are taking into account.
> >
> > Thank you for giving me an honest and detailed answer.
>
> I feel you are pushy on the topic and for what my opinion is worth, it
> is not helpful to raise again and again that you want a way to opt-out.
> Yeah, people got it. :-) And you are probably not alone, I guess.
Ah, I am not pushing for what I want, though; that is not how the thread started :)
The thread started with me saying what I am going to DO concretely about the SWH problem, that is all.
I already listed some practical steps, if you read it, and as I said I am going to start sending PRs/MRs/emails soonish to move it forward.
I just wanted to give the list a heads-up so it doesn't come out of nowhere.
> I do not have special information from SWH but I am sure SWH people are
> working on the topic. And again, maybe the outcome will not be aligned
> with your opinion. Another story.
>
> Now, the other question you ask to Guix: do we continue to help SWH in
> harvesting? You propose to stop, IIUC. Ok, we got it, too. :-) From my
> point of view, the path forward is not to speak on the abstract but to
> root on concrete numbers; it would help in bounding what we are speaking
> about.
>
> Concretely, if you would like to be able to opt-out, could you point:
>
> 1. the piece from the Guix source code you are the author?
>
> 2. source code you are the author that is packaged by Guix?
>
> Again, I am not trying to avoid the discussion. Instead, I would prefer
> to root the discussion on concrete examples. Then it would appear to me
> easier to make progress.
>
> As Greg or Ekaitz also wrote: opting out has implications on the meaning
> of freedom behind “free software“.
I mean, it does if you think that:
1. Guix doesn't have any social rules on top of the FSF definition (it does) and that it doesn't respect consent.
2. Context doesn't matter. For example, the GPL and our CoC restrict freedom so that people can be more free to express themselves :)
> IMHO, that’s not because we would like to opt-out that we could, would
> be able to or allowed to. Therefore, instead of holding opinions on the
> abstract, let try to make progress and start on the concrete: which
> piece of source code are we speaking about?
The software here -> https://sr.ht/~msavoritias/
The minute I add it to Guix, the code is going to be in SWH.
Not that this is only about my software, but it is the example you wanted.
MSavoritias
> Cheers,
> simon
>
>
* Re: About SWH, let avoid the wrong discussion
2024-06-21 8:39 ` About SWH, let avoid the wrong discussion Simon Tournier
@ 2024-06-21 9:12 ` MSavoritias
2024-06-21 9:46 ` Andreas Enge
0 siblings, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-21 9:12 UTC (permalink / raw)
To: Simon Tournier; +Cc: Dale Mellor, Ian Eure, guix-devel
On Fri, 21 Jun 2024 10:39:50 +0200
Simon Tournier <zimon.toutoune@gmail.com> wrote:
Hey,
Just wanted to send a quick reply: as I have mentioned elsewhere, I do not wish to see SWH go. I think they are doing great work.
As I mentioned in my first email, I want to apply social pressure and make it clear to package authors what is happening, so we can move to an opt-in model.
It was never my intent to make it seem like we need to burn all bridges with SWH. I do think they have made mistakes, but that is not a reason to break apart.
We definitely need something like SWH, and I do hope to see them come around to a consensual model.
MSavoritias
> Hi all,
>
> For the record, the Software Heritage initiative has been supportive of
> the Guix project for years.
>
> It means that members of the Guix community have had interactions with
> Software Heritage (SWH) teams for years. For example, the blog post
> “Connecting reproducible deployment to a long-term source code archive”
> [1] published in 2019. And more recently, the scientific communication
> “Source Code Archiving to the Rescue of Reproducible Deployment” [2].
>
> Almost 6 years of friendly interactions and shared values.
>
> Could we avoid expressing definitive opinions based on partial
> considerations about multi-dimensional topics?
>
> For years, several members of the Guix community have been helped in one
> way or another by SWH team members in improving the free software ecosystem.
>
> Well, I speak for myself: I have been invited to several events
> organized by SWH and it’s up to you to trust me when I say: SWH team
> works very hard to embrace all the diversity of FOSS communities. For
> example, I recently attended to a talk organized by SWH about Commons;
> that talk had been a very good food for thought and maybe it could feed
> our current discussion about governance/sociocracy via comments here or
> there I could commit, I do not know, maybe.
>
> Well, I am very grateful for the opportunity to interact with SWH teams.
>
> For the record, SWH provided various supports for the organization of 10
> Years of Guix, back in 2022. Please remember that SWH team members were
> there and some stayed all the three days; probably because we are a nice
> community? All the video stream and good videos of the 10 Years of Guix
> event you probably watched or maybe watch again is because the tireless
> work of multi-hats person (Debian Developer, Debian Video Team, … and
> working at SWH) helped by Guix community members.
>
> Please check the Copyright header for the subcommand “guix locate”.
> Yes, it had been partly written by one SWH team member because, yes they
> run Guix. Yes, their day-job is at SWH and they are also part of our
> Guix community by contributing to Guix source code.
>
> Now, you take it as it is: I am sad by what people are concluding!
>
> Yes I understand why people are angry. Yes discussions must happen.
>
> However, I was expecting more benefit of the doubt considering history
> and track record. Hum, even, maybe, I am asking myself if Guix
> community is indeed nice or if this time the community is just harsh and
> unfair.
>
> Do we forget the track record and the common history?
>
> Then, for what my opinion is worth, fighting against SWH while thinking
> it’s fighting against LLM/AI is the wrong fight. Because 1. we are all
> in the team. And 2. because SWH could be a facilitator for helping in
> some regulations, maybe, I do not know. Somehow, I agree with Ekaitz.
>
> You take it as it is: I was expecting more humility by Guix community
> members. Do you really think that a collective of people involved in
> various FOSS communities with different roles, dedicating their free
> time to free software or open source movements, do you think they are
> the bad actors here?
>
> My humility tells me, as I expressed several times, nothing is ignored.
>
> Yes I also got the point about the lack of transparency. As I said
> above, FWIW, I am in touch with SWH team. Well, I do not have special
> information from SWH and I trust them to have listened or are still
> listening various communities. So my understanding is: work is in
> progress… Somehow, wait and see.
>
> Yes I know we cannot wait forever. Again, do we forget the track record
> and the common history? Do we consider that a multi-layers topic
> involving legal or ethics questions is straightforward to articulate?
>
> My humility tells me to wait to have clear and better understanding
> about SWH motivations, their rationale, the measures and
> counter-measures they maybe have in mind. Be patient and tolerant as I
> am with my friends.
>
> Long enough email and thread. That’s all from me! :-)
>
> My last message. Not because I am bored but because one week of
> holidays is starting now for me. ;-)
>
> 1: https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
> 2: https://hal.science/hal-04586520v1
>
> Cheers,
> simon
* Re: Next Steps For the Software Heritage Problem
2024-06-21 8:41 ` Dale Mellor
@ 2024-06-21 9:19 ` MSavoritias
2024-06-21 13:33 ` Luis Felipe
2024-06-21 17:51 ` Exclude checker with package properties [draft PATCH] Simon Tournier
1 sibling, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-21 9:19 UTC (permalink / raw)
To: Dale Mellor; +Cc: Ekaitz Zarraga, Andreas Enge, guix-devel
On Fri, 21 Jun 2024 09:41:10 +0100
Dale Mellor <guix-devel-0brg6a@rdmp.org> wrote:
> On Thu, 2024-06-20 at 22:59 +0200, Ekaitz Zarraga wrote:
> > Hi,
> >
> > On 2024-06-20 22:54, Andreas Enge wrote:
> > > On Thu, Jun 20, 2024 at 07:42:44PM +0100, Dale Mellor wrote:
> > > > I'm sure guix lint tried to push my code out to them the last time I
> > > > tried.
> > >
> > > Ah indeed, there is this in guix/lint.scm:
> > >
> > > So it does not push code, but a URL from which the code can be downloaded.
> > > Thus it requires the code to be available from the Internet; local code
> > > is "safe" from SWH.
>
> But this is still leaking information.
>
> > > Now I do not know what will happen if you save your code as a git
> > > repository at a hidden URL. For instance, does SWH check the license?
> > > I would hope so.
>
> Hope is not really good enough, there needs to be certainty in this.
>
> >
> > For this specific case we could add some flag to the command line like
> > `--do-not-archive` or something like that.
>
> `-x archival` does it, but it is too easy to forget and once the cat is out
> of the bag privacy is lost. I really think this should be default behaviour, or
> at least there should be a flag in the package definition. I would still be
> uncomfortable with the last option, as everyone would be relying on the
> collective of Guix maintainers to not screw up and accidentally leak private
> data.
>
> Dale
Yeah, I very much agree this should be the default behavior. Archiving should be opt-in, to avoid any surprises for the person running it.
I am actually surprised it became the default.
MSavoritias
* Re: About SWH, let avoid the wrong discussion
2024-06-21 9:12 ` MSavoritias
@ 2024-06-21 9:46 ` Andreas Enge
2024-06-21 10:44 ` MSavoritias
0 siblings, 1 reply; 70+ messages in thread
From: Andreas Enge @ 2024-06-21 9:46 UTC (permalink / raw)
To: MSavoritias; +Cc: guix-devel
On Fri, Jun 21, 2024 at 12:12:13PM +0300, MSavoritias wrote:
> and as I mention in my first email I want to apply social pressure and make it clear to package authors what is happening so we can move to an opt-in model.
Well, the opt-in model is in place: As soon as I put my code under a free
license on the Internet, I opt in for it to be harvested by SWH (and anybody
else, including non-friendly companies and state actors).
Now the code may not be found by SWH, and the moment someone makes a Guix
package out of it and adds it to the Guix main channel, SWH will find and
archive it; but the opt-in has happened before at the moment I put the code
online with its license.
Maybe I misunderstood to what you want to apply the term "opt-in" (after
reading your other message in which you use the term, this seems to be
the case). If it is to source code of packages being used for AI training,
there is actually no need to have a separate opt-in. Either it is legal
under your license (and then you have effectively opted in), or it is
illegal (in which case explicit opt-in already is a requirement).
On Fri, Jun 21, 2024 at 11:14:18AM +0300, MSavoritias wrote:
> Aside from that even Guix uploading all code from the packages to
> SWH that basically feeds it to a LLM model is indeed not honoring consent of the author of the package.
Guix does not upload code to SWH. It gives them a pointer to a public git
repository that SWH then harvests or not according to their rules (see my
reply to Dale yesterday). These are not the same things at all.
Whether or not one agrees with the SWH policy on LLM training (and I have
not looked at it well enough to form my opinion), I do not think there
is anything we should change at the level of the Guix project. Maybe SWH
should put into place an opt-in procedure for feeding LLM; but I do not
think we in Guix should put into place an opt-in procedure for informing
SWH of the source code we package. (Which would be completely ineffective
anyway: One single person in the world would be enough to run the code in
"guix lint -c archival" on all Guix packages in all channels they have
access to. For instance, SWH themselves.)
Andreas
* Re: About SWH, let avoid the wrong discussion
2024-06-21 9:46 ` Andreas Enge
@ 2024-06-21 10:44 ` MSavoritias
2024-06-21 13:45 ` Luis Felipe
` (2 more replies)
0 siblings, 3 replies; 70+ messages in thread
From: MSavoritias @ 2024-06-21 10:44 UTC (permalink / raw)
To: Andreas Enge; +Cc: guix-devel
On Fri, 21 Jun 2024 11:46:56 +0200
Andreas Enge <andreas@enge.fr> wrote:
> On Fri, Jun 21, 2024 at 12:12:13PM +0300, MSavoritias wrote:
> > and as I mention in my first email I want to apply social pressure and make it clear to package authors what is happening so we can move to an opt-in model.
>
> Well, the opt-in model is in place: As soon as I put my code under a free
> license on the Internet, I opt in for it to be harvested by SWH (and anybody
> else, including non-friendly companies and state actors).
That may be how you have understood it, but that is not how most people understand it.
See, for example, people mirroring videos that creators have put online, or more recently some ActivityPub software harvesting posts for a search engine.
As I have been saying a lot in this thread (because a lot of people in the Guix community seem not to be familiar with the idea that legal rules are not the same as social rules):
- Just because you CAN do something doesn't mean you SHOULD. Yes, somebody can probably harvest all my posts from ActivityPub and post them somewhere else;
in practice, though, they are an asshole and are probably going to be defederated pretty fast for breaking the social rules of common human decency :)
The social rule of not harvesting stuff is by design in ActivityPub, by the way, the same way it is in XMPP. Not that assholes don't exist, of course, but nobody is exempt from common human decency and
from following the rules of a place. See also https://www.consentfultech.io/ for a good read. I hope it answers some questions.
- What you are saying, even if it were true, is not indicated anywhere in the manual or on the website. That is part of what I want to do: add a warning for package authors and committers, and a proper procedure.
Ultimately, we live in a society where we assume good faith by default, that everybody acts respectfully (not leaking messages that I sent you in private, for example). If they don't,
we take measures to not include them anymore. I am not saying this about SWH, mind you; it's just an example.
Saying that I can do whatever I want is a very reductionist point of view that I doubt would be acceptable inside Guix, or even the FSF, given that the GPL itself doesn't allow you to do whatever you want.
TBH it seems you are not the only one in this thread who doesn't know that laws (the legal rules of states), i.e. the FSF licenses and so on, are not the same as social rules.
But given that Guix has a CoC and social rules on top of that, I am hopeful :)
> Now the code may not be found by SWH, and the moment someone makes a Guix
> package out of it and adds it to the Guix main channel, SWH will find and
> archive it; but the opt-in has happened before at the moment I put the code
> online with its license.
>
> Maybe I misunderstood to what you want to apply the term "opt-in" (after
> reading your other message in which you use the term, this seems to be
> the case). If it is to source code of packages being used for AI training,
> there is actually no need to have a separate opt-in. Either it is legal
> under your license (and then you have effectively opted in), or it is
> illegal (in which case explicit opt-in already is a requirement).
Again, as I wrote above, legality has nothing to do with it, really. It's about our social rules and what we have as a common understanding in Guix.
If you do something just because you can, then that makes you an asshole in my book. See, for example, the hostile forks that have happened.
> On Fri, Jun 21, 2024 at 11:14:18AM +0300, MSavoritias wrote:
> > Aside from that even Guix uploading all code from the packages to
> > SWH that basically feeds it to a LLM model is indeed not honoring consent of the author of the package.
>
> Guix does not upload code to SWH. It gives them a pointer to a public git
> repository that SWH then harvests or not according to their rules (see my
> reply to Dale yesterday). These are not the same things at all.
This is bikeshedding and arguing over semantics. Guix gives them a URL to download the source code from, so ultimately we (the Guix project) are responsible for the code showing up in there.
Let's not argue over semantics like this. It is even posted on their website, in case you want to argue otherwise: https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/
> Whether or not one agrees with the SWH policy on LLM training (and I have
> not looked at it well enough to form my opinion), I do not think there
> is anything we should change at the level of the Guix project. Maybe SWH
> should put into place an opt-in procedure for feeding LLM; but I do not
> think we in Guix should put into place an opt-in procedure for informing
> SWH of the source code we package. (Which would be completely ineffective
> anyway: One single person in the world would be enough to run the code in
> "guix lint -c archival" on all Guix packages in all channels they have
> access to. For instance, SWH themselves.)
Sure they can. But it starts with us setting an example of how it is done. If we wait on others we might as well shut Guix down and go develop on Macs or something :P
Putting it in Guix is the optimal way to act in good faith towards our community, IMO. Is it harder? Sure. But it's always harder to care about consent, privacy and such than not to.
MSavoritias
>
> Andreas
>
* Re: Next Steps For the Software Heritage Problem
2024-06-21 9:19 ` MSavoritias
@ 2024-06-21 13:33 ` Luis Felipe
0 siblings, 0 replies; 70+ messages in thread
From: Luis Felipe @ 2024-06-21 13:33 UTC (permalink / raw)
To: MSavoritias, Dale Mellor; +Cc: guix-devel
Hi,
On 21/06/24 at 9:19, MSavoritias wrote:
> On Fri, 21 Jun 2024 09:41:10 +0100
> Dale Mellor <guix-devel-0brg6a@rdmp.org> wrote:
>
>> `-x archival` does it, but it is too easy to forget and once the cat is out
>> of the bag privacy is lost. I really think this should be default behaviour, or
>> at least there should be a flag in the package definition. I would still be
>> uncomfortable with the last option, as everyone would be relying on the
>> collective of Guix maintainers to not screw up and accidentally leak private
>> data.
>>
>> Dale
> Yeah very much agree this should be the default behavior. Archiving should be opt-in to avoid any surprises for the person running it.
> I am surprised it became default actually.
MSavoritias, Dale, I think this is one specific point you could report
as an issue (https://issues.guix.gnu.org/), track it with a number and
maybe provide patches if you are able to.
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 2881 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 10:44 ` MSavoritias
@ 2024-06-21 13:45 ` Luis Felipe
2024-06-21 14:15 ` MSavoritias
2024-06-21 16:51 ` Vagrant Cascadian
2024-06-22 13:06 ` Richard Sent
2 siblings, 1 reply; 70+ messages in thread
From: Luis Felipe @ 2024-06-21 13:45 UTC (permalink / raw)
To: MSavoritias, Andreas Enge; +Cc: guix-devel
[-- Attachment #1.1.1: Type: text/plain, Size: 1389 bytes --]
On 21/06/24 at 10:44, MSavoritias wrote:
> On Fri, 21 Jun 2024 11:46:56 +0200
> Andreas Enge <andreas@enge.fr> wrote:
>
>> On Fri, Jun 21, 2024 at 11:14:18AM +0300, MSavoritias wrote:
>>> Aside from that even Guix uploading all code from the packages to
>>> SWH that basically feeds it to a LLM model is indeed not honoring consent of the author of the package.
>> Guix does not upload code to SWH. It gives them a pointer to a public git
>> repository that SWH then harvests or not according to their rules (see my
>> reply to Dale yesterday). These are not the same things at all.
> This is bikeshedding and arguing over semantics. Guix gives them a URL to download the source code from, so ultimately we (the Guix project) are responsible for the code showing up there.
> Let's not argue over semantics like this. It is even posted on their website, in case you want to argue otherwise: https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/
I think the differentiation between sending code and sending a URL is
necessary. Saying that Guix sends your code or your source files to SWH
leads people to think that Guix *will* transmit those files from your
local machine over the Internet to SWH machines when you run "guix lint
YOUR_PRIVATE_PACKAGE". And that's not the case, is it?
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 2881 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 13:45 ` Luis Felipe
@ 2024-06-21 14:15 ` MSavoritias
2024-06-21 16:33 ` Luis Felipe
2024-06-21 16:34 ` Liliana Marie Prikler
0 siblings, 2 replies; 70+ messages in thread
From: MSavoritias @ 2024-06-21 14:15 UTC (permalink / raw)
To: Luis Felipe; +Cc: Andreas Enge, guix-devel
On Fri, 21 Jun 2024 13:45:04 +0000
Luis Felipe <sirgazil@zoho.com> wrote:
> On 21/06/24 at 10:44, MSavoritias wrote:
> > On Fri, 21 Jun 2024 11:46:56 +0200
> > Andreas Enge <andreas@enge.fr> wrote:
> >
> >> On Fri, Jun 21, 2024 at 11:14:18AM +0300, MSavoritias wrote:
> >>> Aside from that even Guix uploading all code from the packages to
> >>> SWH that basically feeds it to a LLM model is indeed not honoring consent of the author of the package.
> >> Guix does not upload code to SWH. It gives them a pointer to a public git
> >> repository that SWH then harvests or not according to their rules (see my
> >> reply to Dale yesterday). These are not the same things at all.
> > This is bikeshedding and arguing over semantics. Guix gives them a URL to download the source code from, so ultimately we (the Guix project) are responsible for the code showing up there.
> > Let's not argue over semantics like this. It is even posted on their website, in case you want to argue otherwise: https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/
>
> I think the differentiation between sending code and sending a URL is
> necessary. Saying that Guix sends your code or your source files to SWH
> leads people to think that Guix *will* transmit those files from your
> local machine over the Internet to SWH machines when you run "guix lint
> YOUR_PRIVATE_PACKAGE". And that's not the case, is it?
But I didn't say that though, did I? The context of the quote you are reading is Guix uploading all code from its packages to SWH.
Not any private repos. So I have no idea what you are referring to here, tbh.
MSavoritias
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 14:15 ` MSavoritias
@ 2024-06-21 16:33 ` Luis Felipe
2024-06-21 17:04 ` Msavoritias
2024-06-21 16:34 ` Liliana Marie Prikler
1 sibling, 1 reply; 70+ messages in thread
From: Luis Felipe @ 2024-06-21 16:33 UTC (permalink / raw)
To: MSavoritias; +Cc: Andreas Enge, guix-devel
[-- Attachment #1.1.1: Type: text/plain, Size: 2203 bytes --]
On 21/06/24 at 14:15, MSavoritias wrote:
> On Fri, 21 Jun 2024 13:45:04 +0000
> Luis Felipe <sirgazil@zoho.com> wrote:
>
>> On 21/06/24 at 10:44, MSavoritias wrote:
>>> On Fri, 21 Jun 2024 11:46:56 +0200
>>> Andreas Enge <andreas@enge.fr> wrote:
>>>
>>>> On Fri, Jun 21, 2024 at 11:14:18AM +0300, MSavoritias wrote:
>>>>> Aside from that even Guix uploading all code from the packages to
>>>>> SWH that basically feeds it to a LLM model is indeed not honoring consent of the author of the package.
>>>> Guix does not upload code to SWH. It gives them a pointer to a public git
>>>> repository that SWH then harvests or not according to their rules (see my
>>>> reply to Dale yesterday). These are not the same things at all.
>>> This is bikeshedding and arguing over semantics. Guix gives them a URL to download the source code from, so ultimately we (the Guix project) are responsible for the code showing up there.
>>> Let's not argue over semantics like this. It is even posted on their website, in case you want to argue otherwise: https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/
>> I think the differentiation between sending code and sending a URL is
>> necessary. Saying that Guix sends your code or your source files to SWH
>> leads people to think that Guix *will* transmit those files from your
>> local machine over the Internet to SWH machines when you run "guix lint
>> YOUR_PRIVATE_PACKAGE". And that's not the case, is it?
> But I didn't say that though, did I? The context of the quote you are reading is Guix uploading all code from its packages to SWH.
> Not any private repos. So I have no idea what you are referring to here, tbh.
No, you didn't.
What I'm trying to say is that I don't think specifying what Guix
sends/uploads to SWH is "bikeshedding". For example, when you say "Guix
uploading all code from its packages to SWH", it's ambiguous to me. I
don't understand whether you are referring to the package definitions or
to the source files those packages refer to. And, if I understand
correctly, Guix doesn't upload any of these to SWH.
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 2881 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 14:15 ` MSavoritias
2024-06-21 16:33 ` Luis Felipe
@ 2024-06-21 16:34 ` Liliana Marie Prikler
1 sibling, 0 replies; 70+ messages in thread
From: Liliana Marie Prikler @ 2024-06-21 16:34 UTC (permalink / raw)
To: MSavoritias, Luis Felipe; +Cc: Andreas Enge, guix-devel
Hi, MSavoritias,
On Friday, 21.06.2024 at 17:15 +0300, MSavoritias wrote:
> But I didn't say that though, did I? The context of the quote you are
> reading is Guix uploading all code from its packages to SWH.
> Not any private repos. So I have no idea what you are referring to
> here, tbh.
I hate to say that, but you kinda did. It was implicit on the mailing
list (at least in the OP), but very explicit in the XMPP room, where
you say
"it automatically sen[d]s your repo (and all your code) that is
reachable through the internet to Software Heritage […] with no way to
opt-out at any of the process and no flag with `guix lint` to disable
it"
Now, you stand corrected on both counts (the automatic sending of
code and the inability to disable it), but I'd like to poke at another
tangent.
Currently, the StarCoder LLM endorsed by SWH claims to only ingest
GitHub and to filter out both commercial and copyleft code, thus
training on non-copyleft "open source" software only [1]. So, at the
time of writing, you do have an "easy" opt-out by way of using the GPL.
Except that, of course, their script to detect licenses is buggy –
what else did you expect? Just search for GNOME using their tool.[2]
It will print out repos like the unlicensed releng [3] – although for
some reason, being unlicensed appears to be fair game to them anyway
[1] – or the GPL'd devhelp [4].
So, in my opinion, the collaboration between SWH and StarCoder should
trigger some side-eyeing, if only to exclude the archival lint for
the time being. We can still consider SWH as a software mirror if all
else fails, and they should probably be quick enough in updating as
well. Long term, we might want to look into options that do not openly
endorse tools which make such questionable decisions.
On the notion of consent, I do think that "I license my code under the
MIT license, because then companies will like me" ought to count as
consent here. [3] and [4] on the other hand very much don't. Also,
"sign up with GitHub, so that you can opt out" is not a great consent
model either – at the very least accept bleeping email.
As per Doctorow's law of enshittification, there is a good chance that
"ethical AI" to SWH will become "any AI" if we do nothing to
communicate that this is not what we as Guix expect.
Cheers
[1] https://arxiv.org/abs/2402.19173
[2] https://huggingface.co/spaces/bigcode/in-the-stack
[3] https://github.com/GNOME/releng
[4] https://github.com/GNOME/devhelp
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 10:44 ` MSavoritias
2024-06-21 13:45 ` Luis Felipe
@ 2024-06-21 16:51 ` Vagrant Cascadian
2024-06-21 17:22 ` MSavoritias
2024-06-21 17:25 ` About SWH, let avoid the wrong discussion Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-22 13:06 ` Richard Sent
2 siblings, 2 replies; 70+ messages in thread
From: Vagrant Cascadian @ 2024-06-21 16:51 UTC (permalink / raw)
To: guix-devel
[-- Attachment #1: Type: text/plain, Size: 5737 bytes --]
On 2024-06-21, MSavoritias wrote:
> On Fri, 21 Jun 2024 11:46:56 +0200
> Andreas Enge <andreas@enge.fr> wrote:
>> On Fri, Jun 21, 2024 at 12:12:13PM +0300, MSavoritias wrote:
>> > and as I mention in my first email I want to apply social pressure and make it clear to package authors what is happening so we can move to an opt-in model.
>>
>> Well, the opt-in model is in place: As soon as I put my code under a free
>> license on the Internet, I opt in for it to be harvested by SWH (and anybody
>> else, including non-friendly companies and state actors).
>
> That may be how you have understood it but that is not how most people understand it.
> See for example mirroring videos that creators have made online, or more recently some activitypub software harvesting posts for a search engine.
I think the fundamental difference is that such videos or activitypub
posts are not necessarily released under a license that *expressly*
permits sharing.
In most cases, those posts and videos are released without any
license at all, and the person retains the legal, social, moral and
ethical rights to decide how that content is shared, if at all. (I am
using those terms in the "plain" English sense, although they
may have specific legal meanings in some contexts.)
> As I have been saying a lot in this thread (because there seem to be a
> lot of people in the Guix community not familiar with the fact that legal
> rules are not the same as social rules):
> - Just because you CAN do something doesn't mean you SHOULD. In the sense that yes, somebody can probably harvest all my posts from activitypub and post them somewhere else,
> in practice they are an asshole though, and are probably going to be
> defederated pretty fast for breaking the social rules of common human
> decency :)
With something released under a Free Software license, calling someone
an "asshole" simply for using the permissions granted by that license,
by the very person who granted those permissions, starts to feel a bit
like a baited trap and honestly, maybe outright duplicitous. Certainly
rude, at the very least.
Again, that is different from some arbitrary post or video or cat
picture on the internet, which more likely than not has no explicit
permissions granted.
> TBH it seems you are not the only one in this thread not knowing that laws (legal rules of states) ie. the FSF licenses and work and whatever, are not the same as social rules.
> But given that Guix has a CoC and social rules on top of that I am hopeful :)
Well... free software ... is a bunch of social rules. Licenses are
social rules. Contracts are social rules. Laws are social
rules. Admittedly, a lot of the mechanics involved in law creation and
enforcement are dubious and suspect and weighted in favor of large,
wealthy and/or otherwise powerful entities...
I am not sure arguing about social vs. legal vs. whatever is even really
a useful direction... almost missing the point entirely.
I would rather ask... what is the intention of the Free Software
movement?
The licenses are merely imperfect tools to achieve those aims, and a
clever way to leverage some specific legal mechanisms, but the licenses
are not an end unto themselves.
For me personally, it is about creating a shared commons that can be
used to build healthy thriving local, regional, global and virtual
communities that do useful or interesting things... I dare dream that
some of those collaboration skills leak into other aspects of life too,
not just software!
I have a lot of doubts that the LLM training from SWH data is going to
further this vision for free software... while the overall work of SWH
most definitely does.
Given my crude understanding of how LLM training works, it seems hard to
imagine that it could actually produce models that comply with all of
the license terms of innumerable free software projects, some of which
have mutually incompatible terms. For just a handful of examples that
are incompatible with the GPL:
https://www.gnu.org/licenses/license-list.html#GPLIncompatibleLicenses
So unless they are very extremely exceedingly excruciatingly careful
about not including incompatible licenses... I have significant doubts.
The incentives are just not there.
I am a bit disappointed with the very optimistic take SWH has regarding
LLMs for code:
https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/
Even with all the identifiers to show which code a model was trained on,
the whole point of a large model is it is built from a huge
dataset... my guess is it takes significantly more effort to audit that
dataset than to create an LLM with it.
Which is to say license compliance, one of the few tools of the Free
Software movement, seems unlikely to be effective. It is barely
effective with more traditional software development.
In short, er, at length, I am really not sure what to do.
I find the opt-out/opt-in angle to be almost tangential.
I find all the hype, and more importantly, the active harm done with LLMs to
be a very serious threat to free software, various disadvantaged
communities, and possibly the literal liveability of our biggest commons
so far, dear planet earth... and I find it appalling.
If some social pressure from the Guix community could improve things, by
all means, though I worry that it might be at best performative rather
than effective, especially if the pressure is placed N parties removed
from the source of the actual problem (e.g. those irresponsibly training
LLMs without respecting the licenses).
Aaaaaand... I have to cut myself off now. :)
live well,
vagrant
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 16:33 ` Luis Felipe
@ 2024-06-21 17:04 ` Msavoritias
0 siblings, 0 replies; 70+ messages in thread
From: Msavoritias @ 2024-06-21 17:04 UTC (permalink / raw)
To: Luis Felipe; +Cc: Andreas Enge, guix-devel
On Fri, 21 Jun 2024 16:33:40 +0000
Luis Felipe <sirgazil@zoho.com> wrote:
> On 21/06/24 at 14:15, MSavoritias wrote:
> > On Fri, 21 Jun 2024 13:45:04 +0000
> > Luis Felipe <sirgazil@zoho.com> wrote:
> >
> >> On 21/06/24 at 10:44, MSavoritias wrote:
> >>> On Fri, 21 Jun 2024 11:46:56 +0200
> >>> Andreas Enge <andreas@enge.fr> wrote:
> >>>
> >>>> On Fri, Jun 21, 2024 at 11:14:18AM +0300, MSavoritias wrote:
> >>>>> Aside from that even Guix uploading all code from the packages to
> >>>>> SWH that basically feeds it to a LLM model is indeed not honoring consent of the author of the package.
> >>>> Guix does not upload code to SWH. It gives them a pointer to a public git
> >>>> repository that SWH then harvests or not according to their rules (see my
> >>>> reply to Dale yesterday). These are not the same things at all.
> >>> This is bikeshedding and arguing over semantics. Guix gives them a URL to download the source code from, so ultimately we (the Guix project) are responsible for the code showing up there.
> >>> Let's not argue over semantics like this. It is even posted on their website, in case you want to argue otherwise: https://www.softwareheritage.org/2019/04/18/software-heritage-and-gnu-guix-join-forces-to-enable-long-term-reproducibility/
> >> I think the differentiation between sending code and sending a URL is
> >> necessary. Saying that Guix sends your code or your source files to SWH
> >> leads people to think that Guix *will* transmit those files from your
> >> local machine over the Internet to SWH machines when you run "guix lint
> >> YOUR_PRIVATE_PACKAGE". And that's not the case, is it?
> > But I didn't say that though, did I? The context of the quote you are reading is Guix uploading all code from its packages to SWH.
> > Not any private repos. So I have no idea what you are referring to here, tbh.
>
> No, you didn't.
>
> What I'm trying to say is that I don't think specifying what Guix
> sends/uploads to SWH is "bikeshedding". For example, when you say "Guix
> uploading all code from its packages to SWH", it's ambiguous to me. I
> don't understand whether you are referring to the package definitions or
> to the source files those packages refer to. And, if I understand
> correctly, Guix doesn't upload any of these to SWH.
From the `guix lint` documentation:
archival
Checks whether the package’s source code is archived at Software Heritage.
When the source code that is not archived comes from a version-control system (VCS)—e.g., it’s obtained with git-fetch, send Software Heritage a “save” request so that it eventually archives it. This ensures that the source will remain available in the long term, and that Guix can fall back to Software Heritage should the source code disappear from its original host. The status of recent “save” requests can be viewed on-line.
When source code is a tarball obtained with url-fetch, simply print a message when it is not archived. As of this writing, Software Heritage does not allow requests to save arbitrary tarballs; we are working on ways to ensure that non-VCS source code is also archived.
Software Heritage limits the request rate per IP address. When the limit is reached, guix lint prints a message and the archival checker stops doing anything until that limit has been reset.
This is run for all packages in the Guix tree, in case you didn't know (and it is enabled by default in guix lint).
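To make the documented behaviour concrete, here is a rough sketch in Guile Scheme. It is only a model of the manual text quoted above, not the code in guix/lint.scm: the procedure name `archival-action`, its arguments and the symbols it returns are all made up for illustration.

```scheme
;; Simplified model of the 'archival' checker as the manual describes it.
;; Every name here is hypothetical; the real logic lives in guix/lint.scm.
(use-modules (ice-9 match))

(define (archival-action source-kind archived? rate-limited?)
  "Return a symbol naming what the checker would do for a package whose
source is of SOURCE-KIND (the symbol vcs or tarball)."
  (cond (rate-limited? 'skip)              ;limit reached: checker does nothing
        (archived?     'nothing)           ;already at Software Heritage
        (else
         (match source-kind
           ('vcs     'send-save-request)   ;e.g. obtained with git-fetch
           ('tarball 'print-message)       ;url-fetch: only a message is printed
           (_        'nothing)))))

;; A git-fetch source that is not yet archived and not rate-limited.
(display (archival-action 'vcs #f #f))     ;prints: send-save-request
(newline)
```

The point of the model is that a "save" request is only ever sent for version-control sources that are not already archived; tarballs only produce a message.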
MSavoritias
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 16:51 ` Vagrant Cascadian
@ 2024-06-21 17:22 ` MSavoritias
2024-06-21 20:51 ` Vagrant Cascadian
2024-06-21 17:25 ` About SWH, let avoid the wrong discussion Felix Lechner via Development of GNU Guix and the GNU System distribution.
1 sibling, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-21 17:22 UTC (permalink / raw)
To: Vagrant Cascadian; +Cc: guix-devel
On Fri, 21 Jun 2024 09:51:30 -0700
Vagrant Cascadian <vagrant@debian.org> wrote:
> On 2024-06-21, MSavoritias wrote:
> > On Fri, 21 Jun 2024 11:46:56 +0200
> > Andreas Enge <andreas@enge.fr> wrote:
> >> On Fri, Jun 21, 2024 at 12:12:13PM +0300, MSavoritias wrote:
> >> > and as I mention in my first email I want to apply social pressure and make it clear to package authors what is happening so we can move to an opt-in model.
> >>
> >> Well, the opt-in model is in place: As soon as I put my code under a free
> >> license on the Internet, I opt in for it to be harvested by SWH (and anybody
> >> else, including non-friendly companies and state actors).
> >
> > That may be how you have understood it but that is not how most people understand it.
> > See for example mirroring videos that creators have made online, or more recently some activitypub software harvesting posts for a search engine.
>
> I think the fundamental difference is that such videos or activitypub
> posts are not necessarily released under a license that *expressly*
> permits sharing.
>
> In most cases, those posts and videos are released without any
> license at all, and the person retains the legal, social, moral and
> ethical rights to decide how that content is shared, if at all. (I am
> using those terms in the "plain" English sense, although they
> may have specific legal meanings in some contexts.)
It's not, actually. Licenses don't matter to fediverse communities (I am talking about the ones that are part of the BadSpace here).
It is a social issue and is treated accordingly. As in, defederate (don't associate) with people who don't respect your community rules.
Laws and licenses have nothing to do with it.
Also bear in mind that the same communities opposed and blocked search engines that tried to make the posts searchable.
That is why it became opt-in in the end :D
> > As I have been saying a lot in this thread (because there seem to be a
> > lot of people in the Guix community not familiar with the fact that legal
> > rules are not the same as social rules):
>
> > - Just because you CAN do something doesn't mean you SHOULD. In the sense that yes, somebody can probably harvest all my posts from activitypub and post them somewhere else,
> > in practice they are an asshole though, and are probably going to be
> > defederated pretty fast for breaking the social rules of common human
> > decency :)
>
> With something released under a Free Software license, calling someone
> an "asshole" simply for using the permissions granted by that license,
> by the very person who granted those permissions, starts to feel a bit
> like a baited trap and honestly, maybe outright duplicitous. Certainly
> rude, at the very least.
>
> Again, that is different from some arbitrary post or video or cat
> picture on the internet, which more likely than not has no explicit
> permissions granted.
See the fediverse point again. It's understood socially to be a bad thing, not legally.
Because, after all, almost nobody has the time and money for state laws to work.
> > TBH it seems you are not the only one in this thread not knowing that laws (legal rules of states) ie. the FSF licenses and work and whatever, are not the same as social rules.
> > But given that Guix has a CoC and social rules on top of that I am hopeful :)
>
> Well... free software ... is a bunch of social rules. Licenses are
> social rules. Contracts are social rules. Laws are social
> rules. Admittedly, a lot of the mechanics involved in law creation and
> enforcement are dubious and suspect and weighted in favor of large,
> wealthy and/or otherwise powerful entities...
>
> I am not sure arguing about social vs. legal vs. whatever is even really
> a useful direction... almost missing the point entirely.
>
> I would rather ask... what is the intention of the Free Software
> movement?
>
> The licenses are merely imperfect tools to achieve those aims, and a
> clever way to leverage some specific legal mechanisms, but the licenses
> are not an end unto themselves.
>
> For me personally, it is about creating a shared commons that can be
> used to build healthy thriving local, regional, global and virtual
> communities that do useful or interesting things... I dare dream that
> some of those collaboration skills leak into other aspects of life too,
> not just software!
That is all well and good but sadly Free Software says nothing about social rules.
For example, what is Guix supposed to do when racists come into the chat?
Or what if there is a hostile fork with the same name that submits itself for inclusion in Guix?
Or what if, like a few months ago, a trans person says on the mailing list that you deadnamed them? Do we not change the software, even if FSF free software says we can do whatever we want?
I doubt the last case would go over well with a lot of people in the Guix community.
These are just some examples that Free Software can't solve for better or for worse. So it is up to social rules to decide what to do.
That is to say, I agree we need collaboration and shared commons and such. But to create said collaborations we need to create safe spaces, protect people, and value consent.
> If some social pressure from the Guix community could improve things, by
> all means, though I worry that it might be at best performative rather
> than effective, especially if the pressure is placed N parties removed
> from the source of the actual problem (e.g. those irresponsibly training
> LLMs without respecting the licenses).
Maybe, maybe not. It's not only about "changing" SWH though. That would indeed be nice by itself.
What it also accomplishes is that it signals that Guix cares about consent and about its community.
> Aaaaaand... I have to cut myself off now. :)
>
>
> live well,
> vagrant
Regards,
MSavoritias
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 16:51 ` Vagrant Cascadian
2024-06-21 17:22 ` MSavoritias
@ 2024-06-21 17:25 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
1 sibling, 0 replies; 70+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-21 17:25 UTC (permalink / raw)
To: Vagrant Cascadian, guix-devel
Hi Vagrant,
On Fri, Jun 21 2024, Vagrant Cascadian wrote:
> I have to cut myself off now.
Please feel free to keep going. Out of the dozens of comments here,
including my own, yours was the most valuable.
+1 to your fatigue with LLM hype; to the critique of the excess
expenditure of precious resources; to the legal/social observations; and
also to the balance of your message.
Thank you for saving me from having to write all that myself!
Kind regards
Felix
^ permalink raw reply [flat|nested] 70+ messages in thread
* Exclude checker with package properties [draft PATCH]
2024-06-21 8:41 ` Dale Mellor
2024-06-21 9:19 ` MSavoritias
@ 2024-06-21 17:51 ` Simon Tournier
2024-06-21 18:37 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
` (2 more replies)
1 sibling, 3 replies; 70+ messages in thread
From: Simon Tournier @ 2024-06-21 17:51 UTC (permalink / raw)
To: Dale Mellor, Ekaitz Zarraga, Andreas Enge; +Cc: guix-devel
[-- Attachment #1: Type: text/plain, Size: 250 bytes --]
On Fri, 21 Jun 2024 at 09:41, Dale Mellor <guix-devel-0brg6a@rdmp.org> wrote:
> `-x archival` does it, but it is too easy to forget
[...]
> at least there should be a flag in the package definition.
See attached the patch implementing that.
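
For readers who have not used package properties: they are an association list attached to the package, so a per-package flag boils down to an alist lookup. Below is a minimal sketch with plain alists and no Guix modules. The helper name `skip-archival?` is hypothetical and this is not the code from the attached patch. One detail worth keeping in mind: `assq` returns the whole pair, which counts as true even when the stored value is #f, whereas `assq-ref` returns the value itself, so the sketch uses the latter.

```scheme
;; Hypothetical helper working on a plain property alist; in Guix the alist
;; would come from (package-properties package).
(define (skip-archival? properties)
  "Return #t if PROPERTIES sets the no-archival? flag to a true value."
  (and (assq-ref properties 'no-archival?) #t))

(define marked     '((no-archival? . #t)))   ;flag set: skip the checker
(define unmarked   '())                      ;no flag: checker runs
(define explicit-f '((no-archival? . #f)))   ;flag present but false

;; Show the decision for each of the three cases.
(for-each (lambda (props)
            (format #t "~s -> ~a~%" props (skip-archival? props)))
          (list marked unmarked explicit-f))
```

With this reading of the flag, only the first alist causes the checker to be skipped; on the command line the per-run equivalent remains the `-x archival` option mentioned above.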
[-- Attachment #2: p.patch --]
[-- Type: text/x-diff, Size: 9894 bytes --]
From 8cb162bcde91d3b39453de576caadb9a6f8f8733 Mon Sep 17 00:00:00 2001
Message-ID: <8cb162bcde91d3b39453de576caadb9a6f8f8733.1718990517.git.zimon.toutoune@gmail.com>
From: Simon Tournier <zimon.toutoune@gmail.com>
Date: Fri, 21 Jun 2024 19:17:57 +0200
Subject: [PATCH] guix: lint: Honor 'no-archival?' package property.
* guix/lint.scm (check-archival): Skip the checker if the package is marked.
* doc/guix.texi: Document it.
Change-Id: I2e21b60ee4f02255f298740a2e9ebb1717e490ff
---
doc/guix.texi | 15 ++++-
guix/lint.scm | 154 ++++++++++++++++++++++++++------------------------
2 files changed, 93 insertions(+), 76 deletions(-)
diff --git a/doc/guix.texi b/doc/guix.texi
index 769ca1399f..5c1cb89686 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -71,7 +71,7 @@
Copyright @copyright{} 2019 Alex Griffin@*
Copyright @copyright{} 2019, 2020, 2021, 2022 Guillaume Le Vaillant@*
Copyright @copyright{} 2020 Liliana Marie Prikler@*
-Copyright @copyright{} 2019, 2020, 2021, 2022, 2023 Simon Tournier@*
+Copyright @copyright{} 2019, 2020, 2021, 2022, 2023, 2024 Simon Tournier@*
Copyright @copyright{} 2020 Wiktor Żelazny@*
Copyright @copyright{} 2020 Damien Cassou@*
Copyright @copyright{} 2020 Jakub Kądziołka@*
@@ -15380,6 +15380,19 @@ Invoking guix lint
prints a message and the @code{archival} checker stops doing anything until
that limit has been reset.
+Sometimes it is not desirable to send an archival request each time
+@command{guix lint} is run. A package can be marked so that the
+@code{archival} checker skips it, by setting the @code{no-archival?}
+property in its package definition:
+
+@lisp
+(define-public python-scikit-learn
+ (package
+ (name "python-scikit-learn")
+ ;; @dots{}
+ (properties '((no-archival? . #t)))))
+@end lisp
+
@item cve
@cindex security vulnerabilities
@cindex CVE, Common Vulnerabilities and Exposures
diff --git a/guix/lint.scm b/guix/lint.scm
index 68d532968d..4c33ec6598 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -1717,84 +1717,88 @@ (define (check-archival package)
(lookup-directory-by-nar-hash (content-hash-value hash)
(content-hash-algorithm hash)))
- (parameterize ((%allow-request? skip-when-limit-reached))
- (catch #t
- (lambda ()
- (match (package-source package)
- (#f ;no source
- '())
- ((and (? origin? origin)
- (= origin-uri (? git-reference? reference)))
- (define url
- (git-reference-url reference))
- (define commit
- (git-reference-commit reference))
- (define hash
- (origin-hash origin))
-
- (match (or (lookup-by-nar-hash hash)
- (if (commit-id? commit)
- (or (lookup-revision commit)
- (lookup-origin-revision url commit))
- (lookup-origin-revision url commit)))
- ((or (? string?) (? revision?))
- '())
- (#f
- ;; Revision is missing from the archive, attempt to save it.
- (save-package-source package))))
- ((? origin? origin)
- (if (and=> (origin-hash origin) ;XXX: for ungoogled-chromium
- content-hash-value) ;& icecat
- (let ((hash (origin-hash origin)))
- (match (or (lookup-by-nar-hash hash)
- (lookup-content (content-hash-value hash)
- (symbol->string
- (content-hash-algorithm hash))))
- (#f
- ;; If ORIGIN is a version-control checkout, save it now.
- ;; If not, check whether HASH is in the Disarchive
- ;; database ("Save Code Now" does not accept tarballs).
- (if (vcs-origin origin)
- (save-package-source package)
- (match (lookup-disarchive-spec hash)
- (#f
- (list (make-warning package
- (G_ "source not archived on Software \
+ (if (not (assq 'no-archival? (package-properties package)))
+ (parameterize ((%allow-request? skip-when-limit-reached))
+ (catch #t
+ (lambda ()
+ (match (package-source package)
+ (#f ;no source
+ '())
+ ((and (? origin? origin)
+ (= origin-uri (? git-reference? reference)))
+ (define url
+ (git-reference-url reference))
+ (define commit
+ (git-reference-commit reference))
+ (define hash
+ (origin-hash origin))
+
+ (match (or (lookup-by-nar-hash hash)
+ (if (commit-id? commit)
+ (or (lookup-revision commit)
+ (lookup-origin-revision url commit))
+ (lookup-origin-revision url commit)))
+ ((or (? string?) (? revision?))
+ '())
+ (#f
+ ;; Revision is missing from the archive, attempt to save it.
+ (save-package-source package))))
+ ((? origin? origin)
+ (if (and=> (origin-hash origin) ;XXX: for ungoogled-chromium
+ content-hash-value) ;& icecat
+ (let ((hash (origin-hash origin)))
+ (match (or (lookup-by-nar-hash hash)
+ (lookup-content (content-hash-value hash)
+ (symbol->string
+ (content-hash-algorithm hash))))
+ (#f
+ ;; If ORIGIN is a version-control checkout, save it now.
+ ;; If not, check whether HASH is in the Disarchive
+ ;; database ("Save Code Now" does not accept tarballs).
+ (if (vcs-origin origin)
+ (save-package-source package)
+ (match (lookup-disarchive-spec hash)
+ (#f
+ (list (make-warning package
+ (G_ "source not archived on Software \
Heritage and missing from the Disarchive database")
- #:field 'source)))
- (directory-ids
- (match (find (lambda (id)
- (not (lookup-directory id)))
- directory-ids)
- (#f '())
- (id
- (list (make-warning package
- (G_ "\
+ #:field 'source)))
+ (directory-ids
+ (match (find (lambda (id)
+ (not (lookup-directory id)))
+ directory-ids)
+ (#f '())
+ (id
+ (list (make-warning package
+ (G_ "\
Disarchive entry refers to non-existent SWH directory '~a'")
- (list id)
- #:field 'source))))))))
- ((? content?)
- '())
- ((? string? swhid)
- '())))
- '()))
- ((? local-file?)
- '())
- (_
- (list (make-warning package
- (G_ "\
+ (list id)
+ #:field 'source))))))))
+ ((? content?)
+ '())
+ ((? string? swhid)
+ '())))
+ '()))
+ ((? local-file?)
+ '())
+ (_
+ (list (make-warning package
+ (G_ "\
source is not an origin, it cannot be archived")
- #:field 'source)))))
- (match-lambda*
- (('swh-error url method response)
- (swh-response->warning package url method response))
- ((key . args)
- (if (eq? key skip-key)
- '()
- (with-networking-fail-safe
- (G_ "while connecting to Software Heritage")
- '()
- (apply throw key args))))))))
+ #:field 'source)))))
+ (match-lambda*
+ (('swh-error url method response)
+ (swh-response->warning package url method response))
+ ((key . args)
+ (if (eq? key skip-key)
+ '()
+ (with-networking-fail-safe
+ (G_ "while connecting to Software Heritage")
+ '()
+ (apply throw key args)))))))
+ (list
+ (make-warning package
+ (G_ "skip archiving as marked by package")))))
(define (check-haskell-stackage package)
"Check whether PACKAGE is a Haskell package ahead of the current
base-commit: bc8a41f4a8d9f1f0525d7bc97c67ed3c8aea3111
--
2.41.0
[-- Attachment #3: Type: text/plain, Size: 466 bytes --]
Well, thinking about it, it could indeed be helpful in some contexts to
specify the checkers to exclude at the package definition level. In other
words, this patch could be generalized. Work in progress… :-)
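To illustrate what such a generalization might look like, here is a
minimal, self-contained Guile sketch. It is not the code from the patch:
the property name 'excluded-checkers' and both procedures are hypothetical,
and checkers are represented by plain symbols rather than Guix's checker
records. It only shows the idea of reading a list of checkers to skip from
a package's properties alist.

```scheme
;; Hypothetical sketch, not taken from the patch: the property name
;; 'excluded-checkers' and the procedures below are illustrative only.
(use-modules (srfi srfi-1))               ;for 'remove'

;; Package properties are an association list, as in the patch above.
(define example-properties
  '((no-archival? . #t)
    (excluded-checkers . (archival cve)))) ;hypothetical property

(define (checker-excluded? properties checker)
  "Return #t if CHECKER (a symbol) is listed under the hypothetical
'excluded-checkers' key of PROPERTIES, #f otherwise."
  (let ((excluded (or (assq-ref properties 'excluded-checkers) '())))
    (and (memq checker excluded) #t)))

(define (applicable-checkers properties checkers)
  "Return the elements of CHECKERS that PROPERTIES does not exclude."
  (remove (lambda (checker)
            (checker-excluded? properties checker))
          checkers))

(display (applicable-checkers example-properties
                              '(archival cve description synopsis)))
(newline)
```

A design note: the sketch uses 'assq-ref', so it looks at the value stored
under the key. The draft above tests 'no-archival?' with 'assq', which only
checks that the key is present, so a package declaring
'(no-archival? . #f)' would presumably still be skipped.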
Cheers,
simon
PS: I am on the train and the network connection is poor… I have sent
the patch to guix-patches but it does not appear. Hmm, weird?! Is
debbugs.gnu.org having issues?
Because the issue rings a bell… but I cannot find the message.
^ permalink raw reply related [flat|nested] 70+ messages in thread
* Re: Exclude checker with package properties [draft PATCH]
2024-06-21 17:51 ` Exclude checker with package properties [draft PATCH] Simon Tournier
@ 2024-06-21 18:37 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-21 18:44 ` Simon Tournier
2024-06-21 18:42 ` Simon Tournier
2024-06-22 15:54 ` Draft: dry-run + Exclude checker with package properties Simon Tournier
2 siblings, 1 reply; 70+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-21 18:37 UTC (permalink / raw)
To: Simon Tournier, Dale Mellor, Ekaitz Zarraga, Andreas Enge; +Cc: guix-devel
Hi Simon,
On Fri, Jun 21 2024, Simon Tournier wrote:
> Is debbugs.gnu.org having issues?
Yes, the community0p server crashed this morning. Luckily, Debbugs
appears to be back online and added messages I sent during the outage.
Maybe yours will get there, too.
> See attached the patch implementing that.
Thank you! Do you see a chance we can amend the patch so I can block
such package definitions from being used by 'guix deploy', 'guix system
reconfigure' and 'guix home reconfigure'?
The new field looks to me like an amendment of the license terms,
especially if the field was added by the author pursuant to the
objections raised in this thread. I would rather not pollute my systems
with potentially unfree software.
Also, for all the controversy surrounding LLMs, which I read with great
interest, SWH still provides a valuable service by making sure the
sources I depend upon to configure my systems do not disappear. Due to
my custom patches, I regularly bootstrap Guix. I cannot be caught in a
situation from which I cannot recover.
Kind regards
Felix
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Exclude checker with package properties [draft PATCH]
2024-06-21 17:51 ` Exclude checker with package properties [draft PATCH] Simon Tournier
2024-06-21 18:37 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
@ 2024-06-21 18:42 ` Simon Tournier
2024-06-22 15:54 ` Draft: dry-run + Exclude checker with package properties Simon Tournier
2 siblings, 0 replies; 70+ messages in thread
From: Simon Tournier @ 2024-06-21 18:42 UTC (permalink / raw)
To: Dale Mellor, Ekaitz Zarraga, Andreas Enge; +Cc: guix-devel
On Fri, 21 Jun 2024 at 19:51, Simon Tournier <zimon.toutoune@gmail.com> wrote:
> Well, thinking about indeed it could helpful in some context to specify
> the checkers to exclude at the package definition level. Other said,
> this patch could be generalized. Work in progress… :-)
Done here: https://issues.guix.gnu.org/71697#1
Cheers,
simon
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Exclude checker with package properties [draft PATCH]
2024-06-21 18:37 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
@ 2024-06-21 18:44 ` Simon Tournier
0 siblings, 0 replies; 70+ messages in thread
From: Simon Tournier @ 2024-06-21 18:44 UTC (permalink / raw)
To: Felix Lechner; +Cc: Dale Mellor, Ekaitz Zarraga, Andreas Enge, guix-devel
Hi Felix,
On Fri, 21 Jun 2024 at 20:37, Felix Lechner <felix.lechner@lease-up.com> wrote:
> > Is debbugs.gnu.org having issues?
>
> Yes, the community0p server crashed this morning. Luckily, Debbugs
> appears to be back online and added messages I sent during the outage.
> Maybe yours will get there, too.
Thanks. Yeah the message reached issues.guix.gnu.org so I guess all
is fine. :-)
> > See attached the patch implementing that.
>
> Thank you! Do you see a chance we can amend the patch so I can block
> such package definitions from being used by 'guix deploy', 'guix system
> reconfigure' and 'guix home reconfigure'?
My input on this will wait until after my holidays. ;-)
Cheers,
simon
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 17:22 ` MSavoritias
@ 2024-06-21 20:51 ` Vagrant Cascadian
2024-06-22 15:46 ` MSavoritias
0 siblings, 1 reply; 70+ messages in thread
From: Vagrant Cascadian @ 2024-06-21 20:51 UTC (permalink / raw)
To: MSavoritias; +Cc: guix-devel
[-- Attachment #1: Type: text/plain, Size: 6790 bytes --]
On 2024-06-21, MSavoritias wrote:
> On Fri, 21 Jun 2024 09:51:30 -0700
> Vagrant Cascadian <vagrant@debian.org> wrote:
>
>> On 2024-06-21, MSavoritias wrote:
>> > On Fri, 21 Jun 2024 11:46:56 +0200
>> > Andreas Enge <andreas@enge.fr> wrote:
>> >> Am Fri, Jun 21, 2024 at 12:12:13PM +0300 schrieb MSavoritias:
>> >> > and as I mention in my first email I want to apply social pressure and make it clear to package authors what is happening so we can move to an opt-in model.
>> >>
>> >> Well, the opt-in model is in place: As soon as I put my code under a free
>> >> license on the Internet, I opt in for it to be harvested by SWH (and anybody
>> >> else, including non-friendly companies and state actors).
>> >
>> > That may be how you have understood it but that is not how most people understand it.
>> > See for example mirroring videos that creators have made online, or more recently some activitypub software harvesting posts for a search engine.
>>
>> I think the fundamental difference is that such videos or activitypub
>> posts are not necessarily released under a license that *expressly*
>> permits sharing.
>>
>> In most cases, those posts and videos are often released without any
>> license at all, and the person retains the legal, social, moral and
>> ethical rights to decide how that content is shared if at all. (I am
>> speaking with those terms in the "plain" english sense, although they
>> may have specific legal meanings in some contexts)
>
> Its not actually. License doesn't matter to fediverse communities (I am talking ones that are part of the BadSpace here)
> It is a social issue and treat accordinly. As in defederate (dont assosiate) with people who dont respect your community rules.
> Laws, and licenses have nothing to do with it.
What is a license other than an explicit set of community rules
pertaining to the community around which that license is relevant
(e.g. a specific piece of software)?
When people break community rules, there may be consequences... and
the relevant community figures out what to do about it, with
whatever explicit or ad-hoc process they have at hand... some of those
methods work out better than others.
I see no notable difference with the way the fediverse works; people or
communities choose to associate or disassociate from other people or
communities when a common set of norms cannot be established. If you
repeatedly or severely break the rules (a.k.a. laws) of a particular
community, you probably will no longer be welcome in that community.
>> With something released under a Free Software license, calling someone
>> an "asshole" simply for using the permissions granted by that license,
>> by the very person who granted those permissions, starts to feel a bit
>> like a baited trap and honestly, maybe outright duplicitous. Certainly
>> rude, at the very least.
>>
>> Again, that is different from some arbitrary post or video or cat
>> picture on the internet, which more likely than not has no explicit
>> permissions granted.
>
> See about fediverse again. Its understood socially to be a bad thing not legally.
> Because after all mostly nobody has the time and money for state laws to work.
If I tell you "go ahead and do X with this cool thing I made, as long as
you respect Y, forever, honest" and then you say "stop doing X now, I
take it back because Z" ... that might come across as socially
inappropriate whether there are laws involved or not; the law is
irrelevant as far as I am concerned.
Of course, context matters; maybe Z is something nobody had ever thought
of before, and it is a surprise to everyone... and maybe even pretty
undesirable. Maybe Z is a pretty arbitrary whim... and everything
in-between. Maybe, just maybe, there is a big ambiguous grey area or
even a gray area...
A license is just a social arrangement, a codified set of social rules,
promises and expectations, just because it has some codified legal
enforcement mechanism does not change that. Obviously, due to systematic
power imbalances, it is probably different than breaking a promise to
meet someone for a picnic tomorrow afternoon.
>> > TBH it seems you are not the only one in this thread not knowing that laws (legal rules of states) ie. the FSF licenses and work and whatever, are not the same as social rules.
>> > But given that Guix has a CoC and social rules on top of that I am hopeful :)
>>
>> Well... free software ... is a bunch of social rules. Licenses are
>> social rules. Contracts are social rules. Laws are social
>> rules. Admittedly, a lot of the mechanics involved in law creation and
>> enforcement are dubious and suspect and weighted in the favor large,
>> wealthy and/or otherwise powerful entities...
>>
>> I am not sure arguing about social vs. legal vs. whatever is even really
>> a useful direction... almost missing the point entirely.
>>
>> I would rather ask... what is the intention of the Free Software
>> movement?
>>
>> The licenses are merely imperfect tools to achieve those aims, and a
>> clever way to leverage some specific legal mechanisms, but the licenses
>> are not an end unto themselves.
>>
>> For me personally, it is about creating a shared commons that can be
>> used to build healthy thriving local, regional, global and virtual
>> communities that do useful or interesting things... I dare dream that
>> some of those collaboration skills leak into other aspects of life too,
>> not just software!
>
> That is all well and good but sadly Free Software says nothing about
> social rules. For example what is Guix supposed to do when racists
> come in the chat? or what if there is a hostile fork with the same
> name and submits itself for Guix inclusion? or what if like a few
> months ago you have a trans person saying in the mailing list that you
> deadnamed them? Do we not change the software even if FSF free
> software says we can do whatever we want?
>
> I doubt the last case would go well with a lot of people in the Guix
> community. These are just some examples that Free Software can't
> solve for better or for worse. So it is up to social rules to decide
> what to do.
Sure, this is why we have a whole toolbox with things like a code of
conduct, documentation, and mailing lists to discuss and hash these
things out when something unforeseen comes up...
> That is to say I agree we need collaboration and shared commons and
> such. But to create said collaborations we need to create safe spaces,
> protect people, value consent.
I agree, though still might come to different conclusions (or lack
thereof) about how exactly to achieve that.
live well,
vagrant
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 10:44 ` MSavoritias
2024-06-21 13:45 ` Luis Felipe
2024-06-21 16:51 ` Vagrant Cascadian
@ 2024-06-22 13:06 ` Richard Sent
2024-06-22 14:42 ` MSavoritias
2 siblings, 1 reply; 70+ messages in thread
From: Richard Sent @ 2024-06-22 13:06 UTC (permalink / raw)
To: MSavoritias; +Cc: Andreas Enge, guix-devel
Hi MSavoritias,
MSavoritias <email@msavoritias.me> writes:
>> Well, the opt-in model is in place: As soon as I put my code under a free
>> license on the Internet, I opt in for it to be harvested by SWH (and anybody
>> else, including non-friendly companies and state actors).
> That may be how you have understood it but that is not how most people
> understand it. See for example mirroring videos that creators have
> made online, or more recently some activitypub software harvesting
> posts for a search engine.
>
> As I have been saying a lot in this thread (because there seem to be a
> lot of people in the Guix community not familiar that legal are not
> the same as social rules):
I feel the need to jump in here because that first paragraph, to me,
implies that the silent members of the community agree with you. I do
not.
Mirroring/archiving code released under a free license is different than
copying videos or posts that were not licensed. The two are so different
that opposition to the latter can't be compared to opposition to the
former. And yes, I do mean from an ethical perspective. These are wildly
different issues.
> Saying that I can do whatever I want is a very reductionist point of
> view that I doubt would be acceptable inside Guix and FSF even. Given
> that GPL itself doesn't allow you to do whatever you want.
Restrictions for the purpose of maximizing freedom are different than
restrictions for the purpose of limiting freedom.
> Again as I wrote above legal has nothing to do with it really. Its
> about our social rules and what we have as common understanding in
> Guix.
To some people (myself included), ensuring software is and remains free
IS an ethical rule (along with the contents of Guix's Contributor
Covenant of course). I do not believe any rules in said code of conduct
are being violated here.
>> `-x archival` does it, but it is too easy to forget and once the cat is out
>> of the bag privacy is lost. I really think this should be default behaviour,
>> or
>> at least there should be a flag in the package definition. I would still be
>> uncomfortable with the last option, as everyone would be relying on the
>> collective of Guix maintainers to not screw up and accidentally leak private
>> data.
>>
>> Dale
> Yeah very much agree this should be the default behavior. Archiving
> should be opt-in to avoid any surprises for the person running it. I
> am surprised it became default actually.
It is not my responsibility to ensure publicly available code released
under a FOSS license is not archived. It is the developer's
responsibility to not release it under a FOSS license. (Perhaps nonfree
private channels would benefit from a change in the default behavior but
Guix should not tailor its defaults around such a use case.)
I am opposed to any theoretical change in Guix's packaging policy that
restricts software freedom. This would include a system that allows for
marking individual packages as "do not upload to software heritage".
To clarify. I am specifically opposed to a change in official Guix
packages that allows for this statement:
"Do not upload automatically to software heritage, and no one else can
either."
I have no objection to disabling archival for technical reasons. And of
course, 3rd party channels are free to do whatever they want.
As Felix said:
> The new field looks to me like an amendment of the license terms,
> especially if the field was added by the author pursuant to the
> objections raised in this thread. I would rather not pollute my
> systems with potentially unfree software.
Nonfree software does not belong in Guix proper.
I believe [1] is a relevant piece on this topic. It discusses some of
the issues with adding additional restrictions to a GPL license. Here's
a choice quote from the GPL:
> All other non-permissive additional terms are considered "further
> restrictions" within the meaning of section 10. If the Program as you
> received it, or any part of it, contains a notice stating that it is
> governed by this License along with a term that is a further
> restriction, you may remove that term.
And the rationale:
> Here we were particularly concerned to address the problem of program
> authors who purport to license their works in a misleading and
> possibly self-contradictory fashion, using the GPL together with
> unacceptable added restrictions that would make those works non-free
> software.
[1]: https://www.fsf.org/blogs/licensing/protecting-free-software-against-confusing-additional-restrictions
--
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-22 13:06 ` Richard Sent
@ 2024-06-22 14:42 ` MSavoritias
2024-06-22 19:53 ` Ricardo Wurmus
0 siblings, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-22 14:42 UTC (permalink / raw)
To: Richard Sent; +Cc: Andreas Enge, guix-devel
On Sat, 22 Jun 2024 09:06:20 -0400
Richard Sent <richard@freakingpenguin.com> wrote:
> Hi MSavoritias,
>
> MSavoritias <email@msavoritias.me> writes:
>
> >> Well, the opt-in model is in place: As soon as I put my code under a free
> >> license on the Internet, I opt in for it to be harvested by SWH (and anybody
> >> else, including non-friendly companies and state actors).
> > That may be how you have understood it but that is not how most people
> > understand it. See for example mirroring videos that creators have
> > made online, or more recently some activitypub software harvesting
> > posts for a search engine.
> >
> > As I have been saying a lot in this thread (because there seem to be a
> > lot of people in the Guix community not familiar that legal are not
> > the same as social rules):
>
> I feel the need to jump in here because that first paragraph, to me,
> implies that the silent members of the community agree with you. I do
> not.
>
> Mirroring/archiving code released under a free license is different then
> copying videos or posts that were not licensed. The two are so different
> that opposition to the latter can't be compared to opposition to the
> former. And yes, I do mean from a ethical perspective. These are wildly
> different issues.
>
> > Saying that I can do whatever I want is a very reductionist point of
> > view that I doubt would be acceptable inside Guix and FSF even. Given
> > that GPL itself doesn't allow you to do whatever you want.
>
> Restrictions for the purpose of maximizing freedom are different then
> restrictions for the purpose of limiting freedom.
Thank you for proving my point :)
That is, what "limits freedom" is very subjective. You have your opinion; other people have theirs.
After all, the GPL itself has been called bad for restricting freedom, in case you did not know.
> > Again as I wrote above legal has nothing to do with it really. Its
> > about our social rules and what we have as common understanding in
> > Guix.
>
> To some people (myself included), ensuring software is and remains free
> IS an ethical rule (along with the contents of Guix's Contributor
> Covenant of course). I do not believe any rules in said code of conduct
> are being violated here.
Do your ethics not include privacy and consent? Because mine do.
see -> https://www.consentfultech.io
> >> `-x archival` does it, but it is too easy to forget and once the cat is out
> >> of the bag privacy is lost. I really think this should be default behaviour,
> >> or
> >> at least there should be a flag in the package definition. I would still be
> >> uncomfortable with the last option, as everyone would be relying on the
> >> collective of Guix maintainers to not screw up and accidentally leak private
> >> data.
> >>
> >> Dale
> > Yeah very much agree this should be the default behavior. Archiving
> > should be opt-in to avoid any surprises for the person running it. I
> > am surprised it became default actually.
>
> It is not my responsibility to ensure publicly available code released
> under a FOSS license is not archived. It is the developers
> responsibility to not release it under a FOSS license. (Perhaps nonfree
> private channels would benefit from a change in the default behavior but
> Guix should not tailor its defaults around such a use case.)
>
> I am opposed to any theoretical change in Guix's packaging policy that
> restricts software freedom. This would include a system that allows for
> marking individual packages as "do not upload to software heritage".
>
> To clarify. I am specifically opposed to a change in official Guix
> packages that allows for this statement:
>
> "Do not upload automatically to software heritage, and no one else can
> either."
Let me put this more clearly, Richard: the statement above that archiving should be off by default means that Guix:
- Respects the consent of the person using guix lint and their expectations (that lint actually lints).
- Respects their privacy.
- Respects their autonomy.
Now if you want to disagree that people should have privacy or expectations then I fear we are becoming the next Google.
Personally I do not want Guix to become the next Google; I want it instead to respect privacy, autonomy and consent.
If you do not believe in these then I fear we have a fundamental disagreement here.
Regards,
MSavoritias
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: About SWH, let avoid the wrong discussion
2024-06-21 20:51 ` Vagrant Cascadian
@ 2024-06-22 15:46 ` MSavoritias
2024-06-22 17:55 ` Breath, let take a short break :-) Simon Tournier
0 siblings, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-22 15:46 UTC (permalink / raw)
To: Vagrant Cascadian; +Cc: MSavoritias, guix-devel
On Fri, 21 Jun 2024 13:51:17 -0700
Vagrant Cascadian <vagrant@debian.org> wrote:
Hey,
I am really tempted to just write this off as a bad faith argument (which it mostly is), but either way I replied to some things further down because I am trying to believe you are
arguing in good faith.
If it's not a bad faith argument, please consider the time, the place and the context of things before arguing next time.
> On 2024-06-21, MSavoritias wrote:
> > On Fri, 21 Jun 2024 09:51:30 -0700
> > Vagrant Cascadian <vagrant@debian.org> wrote:
> >
> >> On 2024-06-21, MSavoritias wrote:
> >> > On Fri, 21 Jun 2024 11:46:56 +0200
> >> > Andreas Enge <andreas@enge.fr> wrote:
> >> >> Am Fri, Jun 21, 2024 at 12:12:13PM +0300 schrieb MSavoritias:
> >> >> > and as I mention in my first email I want to apply social pressure and make it clear to package authors what is happening so we can move to an opt-in model.
> >> >>
> >> >> Well, the opt-in model is in place: As soon as I put my code under a free
> >> >> license on the Internet, I opt in for it to be harvested by SWH (and anybody
> >> >> else, including non-friendly companies and state actors).
> >> >
> >> > That may be how you have understood it but that is not how most people understand it.
> >> > See for example mirroring videos that creators have made online, or more recently some activitypub software harvesting posts for a search engine.
> >>
> >> I think the fundamental difference is that such videos or activitypub
> >> posts are not necessarily released under a license that *expressly*
> >> permits sharing.
> >>
> >> In most cases, those posts and videos are often released without any
> >> license at all, and the person retains the legal, social, moral and
> >> ethical rights to decide how that content is shared if at all. (I am
> >> speaking with those terms in the "plain" english sense, although they
> >> may have specific legal meanings in some contexts)
> >
> > Its not actually. License doesn't matter to fediverse communities (I am talking ones that are part of the BadSpace here)
> > It is a social issue and treat accordinly. As in defederate (dont assosiate) with people who dont respect your community rules.
> > Laws, and licenses have nothing to do with it.
>
> What is a license other than an explicit set of community rules
> pertaining to the community around which that license is relevent
> (e.g. a specific piece of software)?
A license is a state instrument that compels somebody to do something; otherwise they may be taken to state courts and have violence used against them by police.
> The simplest definition is "A license is a promise not to sue", because a license usually either permits the licensed party to engage in an illegal activity, and subject to prosecution, without the license
From https://en.wikipedia.org/wiki/License
You may equate licenses with social rules but outside of the FSF and/or GNU nobody else really does. I haven't seen the term used like this anywhere.
Also nobody is using licenses as social rules (not GNU, not Guix, not Debian), nobody really. And the GPL would make for a horrible community anyway because it doesn't say anything about racism or sexism, for example.
> >> With something released under a Free Software license, calling someone
> >> an "asshole" simply for using the permissions granted by that license,
> >> by the very person who granted those permissions, starts to feel a bit
> >> like a baited trap and honestly, maybe outright duplicitous. Certainly
> >> rude, at the very least.
> >>
> >> Again, that is different from some arbitrary post or video or cat
> >> picture on the internet, which more likely than not has no explicit
> >> permissions granted.
> >
> > See about fediverse again. Its understood socially to be a bad thing not legally.
> > Because after all mostly nobody has the time and money for state laws to work.
>
> If I tell you "go ahead and do X with this cool thing I made, as long as
> you respect Y, forever, honest" and then you say "stop doing X now, I
> take it back because Z" ... that might come across as socially
> inappropriate weather there are laws involved or not; the law is
> irrelevent as far as I am concerned.
What somebody "tells you" is not only the license. Feel free to reduce it to that to make your life easier.
But what "somebody told you" is literally that. Just ask the person :) Anything else is pretending to yourself that it is all good.
> Of course, context matters; maybe Z is something nobody had ever thought
> of before, and it is a surprise to everyone... and maybe even pretty
> undesireable. Maybe Z is a pretty arbitrary whim... and everything
> in-between. Maybe, just maybe, there is a big ambiguous grey area or
> even a gray area...
>
> A license is just a social arrangement, a codified set of social rules,
> promises and expectations, just because it has some codified legal
> enforcement mechanism does not change that. Obviously, due to systematic
> power imbalances, it is probably different than breaking a promise to
> meet someone for a picnic tomorrow afternoon.
It's a legal agreement on a specific thing, yes. Specifically, it deals with code.
But we don't deal with code every day. We deal with people writing code. And, surprise, everybody has their own wants and needs.
So no, you can't make your life easier by only following a legal document and ignoring the human factor in it.
And the human factor is talking, CoC, Community Guidelines, Community rules, social rules etc. SWH learned this the hard way with the trans incident recently.
> >
> > That is all well and good but sadly Free Software says nothing about
> > social rules. For example what is Guix supposed to do when racists
> > come in the chat? or what if there is a hostile fork with the same
> > name and submits itself for Guix inclusion? or what if like a few
> > months ago you have a trans person saying in the mailing list that you
> > deadnamed them? Do we not change the software even if FSF free
> > software says we can do whatever we want?
> >
> > I doubt the last case would go well with a lot of people in the Guix
> > community. These are just some examples that Free Software can't
> > solve for better or for worse. So it is up to social rules to decide
> > what to do.
>
> Sure, this is why we have a whole toolbox with things like a code of
> conduct, documentation, and mailing lists to discuss and hash these
> things out when something unforseen comes up...
Exactly, yes. You can't build a community on Free Software alone, after all :)
Community as in: how do people collaborate and coexist in a safe space.
> > That is to say I agree we need collaboration and shared commons and
> > such. But to create said collaborations we need to create safe spaces,
> > protect people, value consent.
>
> I agree, though still might come to different conclusions (or lack
> thereof) about how exactly to achieve that.
Different ways to achieve that are fine and more than welcome.
What doesn't help is questioning whether we need these, claiming that CoCs don't matter, or debating the definition of words.
I would have welcomed more of the former than the latter in this thread (which is not what I got).
MSavoritias
> live well,
> vagrant
^ permalink raw reply [flat|nested] 70+ messages in thread
* Draft: dry-run + Exclude checker with package properties
2024-06-21 17:51 ` Exclude checker with package properties [draft PATCH] Simon Tournier
2024-06-21 18:37 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-21 18:42 ` Simon Tournier
@ 2024-06-22 15:54 ` Simon Tournier
2 siblings, 0 replies; 70+ messages in thread
From: Simon Tournier @ 2024-06-22 15:54 UTC (permalink / raw)
To: Dale Mellor, Ekaitz Zarraga, Andreas Enge; +Cc: guix-devel
Hi,
Patch #71697 [1] introduces dry-run for the checkers and a way to
exclude some checkers directly in the package definition, in addition
to excluding checkers from the command line.
FWIW, I think it covers:
> but it is too easy to forget and once the cat is out
> of the bag privacy is lost
Well, the way it is displayed can be improved, IMHO.
1: https://issues.guix.gnu.org/71697#4
Cheers,
simon
* Breath, let take a short break :-)
2024-06-22 15:46 ` MSavoritias
@ 2024-06-22 17:55 ` Simon Tournier
2024-06-24 7:30 ` MSavoritias
0 siblings, 1 reply; 70+ messages in thread
From: Simon Tournier @ 2024-06-22 17:55 UTC (permalink / raw)
To: MSavoritias; +Cc: MSavoritias, guix-devel
Hi MSavoritias,
This message is not meant to cut off any discussion, but maybe it would
be helpful, or a bit saner, if you refrained from rehashing the same
points again and again across all messages, replying the same (or
almost) to each person expressing a different opinion.
No blame, and I also include myself: being very enthusiastic to defend
ideas and values. However, a storm of replies is maybe not the best
means to achieve such a defense. :-)
I think people got your points and your opinion, quickly summarized as:
1. SWH broke “implicit social rules”,
2. Because of that, Guix must make a clear public “pressure” against SWH.
Let's look at what the thread looks like:
https://yhetil.org/guix/87a5jfjoey.fsf@gmail.com/T/#rc72a0743026006ee9d4758cfa794df42a9964a55
(or this other one: https://yhetil.org/guix/87il1mupco.fsf@meson/#r)
Then, for what my humble point of view is worth here, I think that your
opinion is maybe not the consensus. Obviously, the discussion is still
open and your opinion is welcome – yeah obviously welcome! – but maybe
not by replying to all, each time.
You are advocating for a safe place, right? To my eyes, when I see
the structure of the thread, it does not generate a safe place where
collaboration is encouraged.
My feeling, when I take a step back and look at the structure of the
thread, is that some opinions are silent because it’s hard to have the
space to express them.
Sometimes, a breath is helpful. Somehow, FWIW, I suggest you set the
discussion aside, then some days later reread some messages, try to
understand differently what other peers are trying to express, and
comment on a few with a fresh mindset.
All opinions are very welcome. We are all here because we value Free
Software, community, people, etc., and not necessarily in that order. And
it’s very important to be able to express all that diversity.
Again, this message is not a means to cut off any discussion. Instead, this
message is a call to slow down. :-)
WDYT?
Cheers,
simon
* Re: About SWH, let avoid the wrong discussion
2024-06-22 14:42 ` MSavoritias
@ 2024-06-22 19:53 ` Ricardo Wurmus
2024-06-24 7:55 ` MSavoritias
0 siblings, 1 reply; 70+ messages in thread
From: Ricardo Wurmus @ 2024-06-22 19:53 UTC (permalink / raw)
To: MSavoritias; +Cc: Richard Sent, Andreas Enge, guix-devel
MSavoritias <email@msavoritias.me> writes:
>> To clarify. I am specifically opposed to a change in official Guix
>> packages that allows for this statement:
>>
>> "Do not upload automatically to software heritage, and no one else can
>> either."
>
> Let me put this more clear Richard, the statement above that archiving should be off by default means:
>
> - Guix respects the consent of the person using guix lint and their expectations. (that lint actually lints)
> - Respects their privacy
> - Respects their autonomy.
User autonomy is not curtailed by informing an aligned service's crawler
that an update has occurred. You have a first class option to disable
whatever checks you don't want to run. That's autonomy.
Since time immemorial "guix lint" has done more than strictly checking
that code is formatted correctly. "guix lint" is a contributor's tool.
Its features encode values that "we" want to preserve as new packages
are added. The intended purpose of "guix lint" is to encourage "high
quality" packages. We arrived at this meaning of "high quality" (as
approximated by the workings of "guix lint") through years of collective
work on packages. Since we've seen source code disappear, which negates
Guix reproducibility guarantees by robbing users of Guix of their
practical freedoms to the software, the modules of "guix lint" include
discouraging the use of volatile URLs (like generated tarballs),
suggesting the use of mirrors, and, relatedly, notifying SWH that the Guix
software collection is about to change to increase your chances of
getting identical source code years from now. All that because software
freedom is void without source code.
Here is a list of other checks that talk to the internet:
--8<---------------cut here---------------start------------->8---
- home-page: Validate home-page URLs
- source: Validate source URLs
...
- cve: Check the Common Vulnerabilities and Exposures (CVE) database
- refresh: Check the package for new upstream releases
- archival: Ensure source code archival on Software Heritage
--8<---------------cut here---------------end--------------->8---
Are these all privacy leaks? Are they in opposition to the goals of
"guix lint"? In opposition to the goals of those who use "guix lint"?
If so: why?
> Now if you want to disagree that people should have privacy or
> expectations then I fear we are becoming the next Google.
This is jumping the shark, and I think it is a statement that is
(unintentionally?) rather insulting to those of us who have been
contributing to Guix for a long time and have spent many excess calories
wringing their brains to make sure Guix is not your average tech bro
project.
It is disappointing to see the levity with which statements of this
severity are dropped here. The Guix community that I choose to remember
was less prone to making inflammatory statements when disagreements
became apparent.
--
Ricardo
* Re: Breath, let take a short break :-)
2024-06-22 17:55 ` Breath, let take a short break :-) Simon Tournier
@ 2024-06-24 7:30 ` MSavoritias
2024-06-24 10:23 ` Tomas Volf
2024-06-24 11:56 ` Lets cut this off Efraim Flashner
0 siblings, 2 replies; 70+ messages in thread
From: MSavoritias @ 2024-06-24 7:30 UTC (permalink / raw)
To: Simon Tournier; +Cc: guix-devel
On Sat, 22 Jun 2024 19:55:05 +0200
Simon Tournier <zimon.toutoune@gmail.com> wrote:
Hey Simon,
I would suggest taking a step back, as you said, and considering whether what you are doing here is in fact tone policing.
https://en.wikipedia.org/wiki/Tone_policing
> A tone argument (also called tone policing) is a type of ad hominem aimed at the tone of an argument instead of its factual or logical content in order to dismiss a person's argument. Ignoring the truth or falsity of a statement, a tone argument instead focuses on the emotion with which it is expressed. This is a logical fallacy because a person can be angry while still being rational. Nonetheless, a tone argument may be useful when responding to a statement that itself does not have rational content, such as an appeal to emotion.
I will elaborate below.
> Hi MSavoritias,
>
> This message is not meant to cut off any discussion, but maybe it would
> be helpful, or a bit saner, if you refrained from rehashing the same
> points again and again across all messages, replying the same (or
> almost) to each person expressing a different opinion.
It's not really different though, is it? Since the beginning of this thread there have been two opinions.
The one for consent was suppressed pretty fast with arguments appealing to "ethics" and "you just don't understand, free software is all the rules we have", both of them bad-faith arguments of course.
So if you actually look at the thread, it's the people who had already voiced their support who have stopped replying, and I am the one replying to one opinion.
I invite you to think about 3 things:
1. Why did you feel the need to point it out to me, rather than to the same bad-faith argument being written again and again?
2. What happened to the people who wanted consent but now don't reply anymore?
3. Does a culture like this stop more voices from coming forward?
> No blame, and I also include myself: being very enthusiastic to defend
> ideas and values. However, a storm of replies is maybe not the best
> means to achieve such a defense. :-)
>
> I think people got your points and your opinion, quickly summarized as:
>
> 1. SWH broke “implicit social rules”,
> 2. Because of that, Guix must make a clear public “pressure” against SWH.
>
>
> Let's look at what the thread looks like:
>
> https://yhetil.org/guix/87a5jfjoey.fsf@gmail.com/T/#rc72a0743026006ee9d4758cfa794df42a9964a55
> (or this other one: https://yhetil.org/guix/87il1mupco.fsf@meson/#r)
Again see above.
> Then, for what my humble point of view is worth here, I think that your
> opinion is maybe not the consensus. Obviously, the discussion is still
> open and your opinion is welcome – yeah obviously welcome! – but maybe
> not by replying to all, each time.
As mentioned above, you probably missed the first few replies before this thread was taken over, so please go and reread the first few hours :)
As another point, please don't gaslight me :) I know how many people have replied in support, both in this thread and in XMPP.
So this thread being flooded by people who don't think the CoC, consent, or privacy matter doesn't really make me question whether I am right.
It makes me think that of course nobody else is going to reply, only to get a storm of replies saying how "unethical" they are.
https://en.wikipedia.org/wiki/Gaslighting
> Gaslighting is a colloquialism, loosely defined as manipulating someone into questioning their own perception of reality.
> You are advocating for a safe place, right? To my eyes, when I see
> the structure of the thread, it does not generate a safe place where
> collaboration is encouraged.
>
> My feeling, when I take a step back and look at the structure of the
> thread, is that some opinions are silent because it’s hard to have the
> space to express them.
Yes exactly. So let's see what opinions were expressed in the first few hours of this thread, and what opinions have mostly been expressed after.
And let's see what kind of arguments were made against these initial points. (Hint: most of them are not good-faith arguments :) )
I do agree that the mailing list is not a safe space, for a host of reasons that I already knew going in (and believe me, it's not easy to try to write and persist), but that is an argument for another time.
> Sometimes, a breath is helpful. Somehow, FWIW, I suggest you set the
> discussion aside, then some days later reread some messages, try to
> understand differently what other peers are trying to express, and
> comment on a few with a fresh mindset.
>
> All opinions are very welcome. We are all here because we value Free
> Software, community, people, etc., and not necessarily in that order. And
> it’s very important to be able to express all that diversity.
If we value diversity then we need to ask:
Where are the different opinions really, and why did they leave? Have you asked yourself that, Simon?
This is not meant of course to say that it is your fault. It's meant to be a wider discussion of:
1. Why did the moderation fail in this thread?
2. Where is the diversity of voices?
3. Why was the piling on of a single viewpoint that is against the Guix CoC allowed in this thread?
MSavoritias
> Again, this message is not a means to cut off any discussion. Instead, this
> message is a call to slow down. :-)
>
> WDYT?
>
> Cheers,
> simon
* Re: About SWH, let avoid the wrong discussion
2024-06-22 19:53 ` Ricardo Wurmus
@ 2024-06-24 7:55 ` MSavoritias
2024-06-24 9:13 ` Ricardo Wurmus
0 siblings, 1 reply; 70+ messages in thread
From: MSavoritias @ 2024-06-24 7:55 UTC (permalink / raw)
To: Ricardo Wurmus; +Cc: Richard Sent, Andreas Enge, guix-devel
On Sat, 22 Jun 2024 21:53:27 +0200
Ricardo Wurmus <rekado@elephly.net> wrote:
> MSavoritias <email@msavoritias.me> writes:
>
> >> To clarify. I am specifically opposed to a change in official Guix
> >> packages that allows for this statement:
> >>
> >> "Do not upload automatically to software heritage, and no one else can
> >> either."
> >
> > Let me put this more clear Richard, the statement above that archiving should be off by default means:
> >
> > - Guix respects the consent of the person using guix lint and their expectations. (that lint actually lints)
> > - Respects their privacy
> > - Respects their autonomy.
>
> User autonomy is not curtailed by informing an aligned service's crawler
> that an update has occurred. You have a first class option to disable
> whatever checks you don't want to run. That's autonomy.
It is in the sense that you haven't gotten the consent of the person running the linter on something that happens outside the context of "linting code".
I have posted this elsewhere but see https://www.consentfultech.io/
It's about not assuming things on behalf of the person running the tool, specifically for things that are more "sensitive", like operations that don't involve linting code.
> Since time immemorial "guix lint" has done more than strictly checking
> that code is formatted correctly. "guix lint" is a contributor's tool.
> Its features encode values that "we" want to preserve as new packages
> are added. The intended purpose of "guix lint" is to encourage "high
> quality" packages. We arrived at this meaning of "high quality" (as
> approximated by the workings of "guix lint") through years of collective
> work on packages. Since we've seen source code disappear, which negates
> Guix reproducibility guarantees by robbing users of Guix of their
> practical freedoms to the software, the modules of "guix lint" include
> discouraging the use of volatile URLs (like generated tarballs),
> suggesting the use of mirrors, and, relatedly, notifying SWH that the Guix
> software collection is about to change to increase your chances of
> getting identical source code years from now. All that because software
> freedom is void without source code.
Maybe then the tool needs to be renamed? Or more ideally a new subcommand `guix lint contribute` should be added.
Because from the places I asked, in XMPP and here, it seems everybody who is not reading the docs or knee-deep in the Guix project assumes it just lints and is surprised it does more things.
> Here is a list of other checks that talk to the internet:
>
> --8<---------------cut here---------------start------------->8---
> - home-page: Validate home-page URLs
> - source: Validate source URLs
> ...
> - cve: Check the Common Vulnerabilities and Exposures (CVE) database
> - refresh: Check the package for new upstream releases
> - archival: Ensure source code archival on Software Heritage
> --8<---------------cut here---------------end--------------->8---
>
> Are these all privacy leaks? Are they in opposition to the goals of
> "guix lint"? In opposition to the goals of those who use "guix lint"?
> If so: why?
This has actually been mentioned, yeah. In the xmpp room I have, a lot of people were surprised that a linter was added and would like to see it be opt-in.
Let's be honest here: IRC is exclusively a tech place these days, so you will rarely find new arguments there. Maybe putting up a poll on ActivityPub/Mastodon would help :)
> > Now if you want to disagree that people should have privacy or
> > expectations then I fear we are becoming the next Google.
>
> This is jumping the shark, and I think it is a statement that is
> (unintentionally?) rather insulting to those of us who have been
> contributing to Guix for a long time and have spent many excess calories
> wringing their brains to make sure Guix is not your average tech bro
> project.
>
> It is disappointing to see the levity with which statements of this
> severity are dropped here. The Guix community that I choose to remember
> was less prone to making inflammatory statements when disagreements
> became apparent.
>
You are right, I did assume things about your opinions when I shouldn't have. I apologize.
I am glad that you and others have been trying to make this into a welcoming project; it's one of the reasons I joined after all :D
Of course that doesn't mean we can't do better, and this thread has made that pretty apparent. In a whole set of different terms, that is.
I would also say that as the Guix community becomes larger, it's going to be necessarily less homogeneous, especially if we (the Guix Project) are doing it right.
As a counterpoint, I know a lot of people who choose not to join the mailing lists specifically due to the culture, so to speak.
Seeing how this thread has devolved I am wondering what the next steps would be to address this. Seeing as diversity and a welcoming environment wasn't kept.
Open to suggestions of course :)
MSavoritias
* Re: About SWH, let avoid the wrong discussion
2024-06-24 7:55 ` MSavoritias
@ 2024-06-24 9:13 ` Ricardo Wurmus
0 siblings, 0 replies; 70+ messages in thread
From: Ricardo Wurmus @ 2024-06-24 9:13 UTC (permalink / raw)
To: MSavoritias; +Cc: Richard Sent, Andreas Enge, guix-devel
MSavoritias <email@msavoritias.me> writes:
>> > - Guix respects the consent of the person using guix lint and their expectations. (that lint actually lints)
>> > - Respects their privacy
>> > - Respects their autonomy.
>>
>> User autonomy is not curtailed by informing an aligned service's crawler
>> that an update has occurred. You have a first class option to disable
>> whatever checks you don't want to run. That's autonomy.
>
> It is in the sense that you haven't gotten the consent of the person
> running the linter on something that happens outside the context of
> "linting code".
But look: here you switch from "autonomy" to "consent". You mentioned
"autonomy" before, and that's what I responded to. Irrespective of
whether I agree with your assertion on consent here, I think it is
important not to conflate very different concepts when attempting to
build consensus in a community discussion (lopsided as it may be). It's
how we end up talking past each other as one word points to another, and
we're led in circles.
It's also why I think it was a valuable contribution to the discussion
to draw a distinction between sending a URL and sending code. It may
seem like nitpicking, but for me (in the role of the jaded observer
whose detachment is either the result of having attained enlightenment
or being uprooted by depression) it's a world of difference: I'm okay
with a notification containing a public URL being sent, but I'd be
furious if my bytes were siphoned off.
While I have my nit-picking hat on, allow me to but-ackshually: "Linting
code" is not really what this is about, because we're dealing with
*packages*, not arbitrary *code*. Within the context of Guix (which is
not, for example, a general purpose programming language where the unit
of interest is "code") I do think the assumption is a little too eagerly
impressed by prior experience with programming tools. I'm not saying
it's somebody's *fault* for having an assumption like this, I just think
it's an unfortunate conflation of related but distinct concepts.
> Because from the places I asked, in XMPP and here, it seems everybody
> who is not reading the docs or knee-deep in the Guix project assumes it
> just lints and is surprised it does more things.
Yes, we've had similar problems in the past where documentation is not
considered and individual assumptions (developed through the use of
other tools, because intuition is a lie) are used as the yard stick
against which the behavior of tools is judged. Examples include "guix
refresh", "guix package", "guix container", "guix archive", and even
"guix repl".
"Nuance" is an emergent property; no single word can be nuanced, so in
my opinion a command name cannot possibly carry enough information to
accurately represent the gamut of its behaviors. We can only hint at a
general direction and use the term as an index into documentation. We
have several layers of documentation; the first pointer would be into
the output of "guix help". Perhaps changing the short description shown
next to "guix lint" would reach those averse to documentation, to colour
the pointer in ways that better hint at the concepts it points to in the
manual?
> Seeing how this thread has devolved I am wondering what the next steps
> would be to address this. Seeing as diversity and a welcoming
> environment wasn't kept.
> Open to suggestions of course :)
I think it is very difficult to feel welcome when people don't
understand or disagree with you. I've been there myself, countless
times before. The very attempt to express myself clearly is intensely
uncomfortable; it's like walking on egg shells, but not because of a
community failing, but because any error in representing my view point
is going to make the waters more turbulent, confuse the issue, spawn
requests for clarification, or sub-threads on issues that really don't
matter to the originally intended point.
And yet, all the properties of a pleasant community are exemplified in
the process of untangling the knots of disagreement. I think it is
dangerous to label the attempts to argue an opposing point of view and
the attempts to define boundaries as "arguments in bad faith". This is
a sure fire way of sabotaging one's own goals. We're all operating
under very limited information about other people's points of view,
their amount of information, their values and the amount of overlap with
our own. For some of us, defining a topic's boundaries is a precondition
to understanding details within them.
Passionate people often run the risk of steam rolling a budding
discussion. [And this is my cue to disconnect from it again.] The sheer
volume of messages can intimidate people and keep them from making their
voice heard. (I, too, have been intimidated by this thread, even though
there is no reasonable threat to my standing in the community if/when I
make a fool of myself.) I read that in Sociocracy meetings, people
speak up one after the other, in turns, and not again before everyone
else has been heard. Here we don't even know who is in attendance, so
that's not easily modeled. Also, email with its ever-branching
sub-threads easily devolves into the average emacs-devel "discussion".
Simon's proposed RFC process (which I support) aims to improve this by
putting a consent-seeking process first. I think it would be a good
alternative to whatever this is :) This topic would benefit from a
declaration of statements (which members of the community can refute or
agree with) and an actionable proposal.
--
Ricardo
PS: Unless specifically addressed, the above is not directed at any one
person in particular. I'm only capable of seeing stories and themes,
but the actors and their actions are all a big blur to me. Such is
looking out from this here brain, smoothened by age and defeat.
* Re: Breath, let take a short break :-)
2024-06-24 7:30 ` MSavoritias
@ 2024-06-24 10:23 ` Tomas Volf
2024-06-24 11:56 ` Lets cut this off Efraim Flashner
1 sibling, 0 replies; 70+ messages in thread
From: Tomas Volf @ 2024-06-24 10:23 UTC (permalink / raw)
To: MSavoritias; +Cc: guix-devel
On 2024-06-24 10:30:05 +0300, MSavoritias wrote:
> [..]
> > You are advocating for a safe place, right? To my eyes, when I see
> > the structure of the thread, it does not generate a safe place where
> > collaboration is encouraged.
> >
> > My feeling, when I take a step back and look at the structure of the
> > thread, is that some opinions are silent because it’s hard to have the
> > space to express them.
>
> Yes exactly. So let's see what opinions were expressed in the first few hours of this thread, and what opinions have mostly been expressed after.
I do not think this is a fair test. People might have other things to do than
to respond to fairly heavy email during "the first few hours"...
> [..]
> If we value diversity then we need to ask:
> Where are the different opinions really, and why did they leave? Have you asked yourself that, Simon?
>
> This is not meant of course to say that it is your fault. It's meant to be a wider discussion of:
> 1. Why did the moderation fail in this thread?
Did it though? I feel like I could have expressed my opinion if I wanted to do
so.
Could you please describe how you would have envisioned this thread being handled
with regard to moderation? Ideally in specific, actionable steps.
> 2. Where is the diversity of voices?
Who knows. Maybe they said their piece and were satisfied with it. Maybe they
were convinced by (some) arguments of the other side to just wait a bit longer.
Maybe they were indeed scared away by people expressing a different opinion.
My point is that (afaik) you do *not* know where they are, so the way you put it
(implying their absence is caused solely by a failure of moderation) feels a bit
underhanded, at least without some actual investigation into the topic (which
you do not mention here, so I assume it was not performed).
> 3. Why was the piling on of a single viewpoint that is against the Guix CoC allowed in this thread?
I do not believe expressing an opinion different from yours is a CoC violation. The
"piling on" part can just be viewed as an expression of the fact that many people
disagree with you, not as harassment. If you believe any particular message
violated the CoC, you should report it according to the CoC. That will move it
from "in *your* opinion it was a violation" to "we know whether it was".
Have a nice day,
Tomas Volf
--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
* Lets cut this off
2024-06-24 7:30 ` MSavoritias
2024-06-24 10:23 ` Tomas Volf
@ 2024-06-24 11:56 ` Efraim Flashner
1 sibling, 0 replies; 70+ messages in thread
From: Efraim Flashner @ 2024-06-24 11:56 UTC (permalink / raw)
To: MSavoritias; +Cc: Simon Tournier, guix-devel
It seems to me that there are two assertions in this (long) thread:
* Consent should be required before letting SWH know there's new
source code in the wild.
* The license of the code gives SWH all the legal rights it needs to use
the code however they see fit.
As for the second one, I don't recall reading any arguments that SWH
doesn't have the *legal* right to slurp up all Free Software source code
they want and to use it how they want. I'm going to move right past this
one.
As far as the consent-required assertion, it seems to come down to "I
don't like what they're doing with the code so they should be required
to get my consent". This runs directly counter to the license.
Another reading could be "SWH may find the code later on their own, but
I don't want to make it easy for them because I disagree with how they
handle the code". I don't see this as running counter to the licenses in
question, but it does run counter to Guix's integration with the SWH.
The Software Heritage already acts as a fallback location to recreate
missing tarballs and this is something we want to continue to happen. I
see removing the SWH linter tie-in as shooting ourselves in the foot and
not likely to make any difference to the SWH.
On a personal level it is always possible to run 'guix lint' with the
'--no-network' flag or the '--exclude=archival' flag.
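For illustration, the two flags mentioned above could be used roughly as
follows. This is only a sketch: it assumes a working Guix installation,
and "hello" is a placeholder package name that does not come from this
thread.

```shell
# Sketch only: "hello" is a placeholder package name (an assumption).
pkg=hello
skip=archival   # the Software Heritage checker discussed in this thread

if command -v guix >/dev/null 2>&1; then
    # Run every checker except the archival one.
    guix lint --exclude="$skip" "$pkg"
    # Or run only the checkers that do not need Internet access.
    guix lint --no-network "$pkg"
else
    echo "guix not found; skipping lint" >&2
fi
```

Running 'guix lint --list-checkers' first shows the checker names that
are available, which helps confirm what can be passed to '--exclude'.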
--
Efraim Flashner <efraim@flashner.co.il> רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: Next Steps For the Software Heritage Problem
2024-06-18 18:08 ` Ian Eure
2024-06-19 10:31 ` raingloom
@ 2024-06-27 12:27 ` Ludovic Courtès
2024-06-27 15:30 ` Ian Eure
1 sibling, 1 reply; 70+ messages in thread
From: Ludovic Courtès @ 2024-06-27 12:27 UTC (permalink / raw)
To: Ian Eure; +Cc: guix-devel
Ian Eure <ian@retrospec.tv> skribis:
> Guix sends archive requests to SWH. SWH gives that source code to
> HuggingFace. HuggingFace demonstrably violates the licenses.
Which licenses? As has been said previously, and you can verify for
yourself, it does not ingest code under copyleft licenses.
Ludo’.
* Re: Next Steps For the Software Heritage Problem
2024-06-27 12:27 ` Ludovic Courtès
@ 2024-06-27 15:30 ` Ian Eure
2024-06-27 16:48 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-27 16:58 ` Ludovic Courtès
0 siblings, 2 replies; 70+ messages in thread
From: Ian Eure @ 2024-06-27 15:30 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Hi Ludo,
Ludovic Courtès <ludo@gnu.org> writes:
> Ian Eure <ian@retrospec.tv> skribis:
>
>> Guix sends archive requests to SWH. SWH gives that source code
>> to
>> HuggingFace. HuggingFace demonstrably violates the licenses.
>
> Which licenses? As has been said previously, and you can verify
> for
> yourself, it does not ingest code under copyleft licenses.
>
While this is what their paper claims[1], it doesn’t appear to be
true, since I can see my own GPL’d code in the training set. I’ve
since moved nearly all of my code off GitHub, but if you visit
their "Am I in The Stack?" page[2] and enter my old username
("ieure"), you will see pretty much every repository I ever hosted
there, including both unlicensed and GPL’d code. Some examples
are hyperspace-el, nssh-el, tl1-mode, etc. While there aren’t
LICENSE files in those repos, the file headers of all clearly
indicate that they’re GPL’d.
Unfortunately, there is no way to check for the presence of code
in the training set except by GitHub username.
What I don’t know for certain is whether these are in the training
set because they came from SWH, or because HuggingFace obtained
them through other means. Given that all the links for my GitHub
username on that "Am I in The Stack" link back to SWH, it seems
very likely that it came from them.
Thanks,
— Ian
[1]: https://arxiv.org/pdf/2402.19173 "We also exclude
copyleft-licensed code..."
[2]: https://huggingface.co/spaces/bigcode/in-the-stack
* Re: Next Steps For the Software Heritage Problem
2024-06-27 15:30 ` Ian Eure
@ 2024-06-27 16:48 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-27 16:58 ` Ludovic Courtès
1 sibling, 0 replies; 70+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-27 16:48 UTC (permalink / raw)
To: Ian Eure, Ludovic Courtès; +Cc: guix-devel
Hi Ian,
On Thu, Jun 27 2024, Ian Eure wrote:
> I’ve [...] moved nearly all of my code off GitHub
Me too. I think I closed it off from search crawlers. No one should be
using Github anymore except for merge requests. I left many years ago.
> if you visit their "Am I in The Stack?" page
Thank you for the link!
> pretty much every repository I ever hosted [is in] there, including
> both unlicensed and GPL’d code.
Mine too. My software likewise has valid headers but no LICENSE files.
> Unfortunately, there is no way to check for the presence of code
> in the training set except by GitHub username.
That's probably because you and I may eventually become part of a class
of copyright holders in a court action.
> What I don’t know for certain is whether these are in the training
> set because they came from SWH, or because HuggingFace obtained
> them through other means.
I can say for certain that none of my items (username "lechner") are in
Guix or elsewhere, so they probably did not originate via SWH.
Also, did you see the opt-out link at the bottom? I considered it but
would on balance prefer to be part of the settlement class.
Kind regards
Felix
* Re: Next Steps For the Software Heritage Problem
2024-06-27 15:30 ` Ian Eure
2024-06-27 16:48 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
@ 2024-06-27 16:58 ` Ludovic Courtès
1 sibling, 0 replies; 70+ messages in thread
From: Ludovic Courtès @ 2024-06-27 16:58 UTC (permalink / raw)
To: Ian Eure, MSavoritias; +Cc: guix-devel
Hi,
Ian Eure <ian@retrospec.tv> skribis:
> While this is what their paper claims[1], it doesn’t appear to be
> true, since I can see my own GPL’d code in the training set. I’ve
> since moved nearly all of my code off GitHub, but if you visit their
> "Am I in The Stack?" page[2] and enter my old username ("ieure"), you
> will see pretty much every repository I ever hosted there, including
> both unlicensed and GPL’d code.
That’s not my experience: I looked for Guix and Coreutils, both GPL’d,
both mirrored on GitHub, and none of it is there.
> Some examples are hyperspace-el,
> nssh-el, tl1-mode, etc. While there aren’t LICENSE files in those
> repos, the file headers of all clearly indicate that they’re GPL’d.
Well, not providing a COPYING/LICENSE file isn’t helping either: file
headers may not be all that clear to a parser.
At any rate, even though I’m watching this LLM trend with discontent
like many in the free software world, I believe this discussion is
missing the point and shooting the messenger(s).
One of the three missions of SWH is to share code—much like ftp.gnu.org.
That’s all they did. Anyone can access the archive of SWH, for any
purpose.
HuggingFace trained “BigCode” on source SWH harvested from GitHub (a
subset of the SWH archive) and chose to abide by the principles put
forward by SWH in its Oct. 2023 statement. HuggingFace didn’t have to
do that; they could have acted like Microsoft and all the “AI” companies
and just scraped everything without asking anyone—be it from SWH or from
other sources.
There is no “Software Heritage problem” and really, that very phrase and
the accusatory tone in this thread are unwelcome and below our standards
for communication in Guix. This has gone too far. This is not the
place to further discuss the impact of using LLMs on free software, and
definitely not the place to throw unfounded accusations.
Thanks,
Ludo’.
* Re: Next Steps For the Software Heritage Problem
@ 2024-06-28 18:01 Juliana Sims
0 siblings, 0 replies; 70+ messages in thread
From: Juliana Sims @ 2024-06-28 18:01 UTC (permalink / raw)
To: Ludovic Courtès, MSavoritias, ian; +Cc: guix-devel
Hey y'all,
I've avoided weighing in on this topic because I'm of two minds about
it. Still, when members of the community raise concerns, it's important
to take those concerns seriously. We must be careful how we address
them because the opinions and concerns of any community member are as
legitimate as those of any other.
This conversation has at times been contentious. People have not always
used the most diplomatic language. And yet, there has been a thorough
discussion of this topic. The conclusion appears to be that Guix cannot
make changes in relation to SWH. It's clear there is no more room for
productive conversation. I therefore echo Ludo's request to let this
topic drop.
I want to express my gratitude for a community where people are able to
express their concerns and have them taken seriously, regardless of who
they are. Let's not lose that. Let's not forget that, even when
passions are high, we all want Guix to succeed and have a healthy
community, and we all work to that end as best as we can with the
information and resources available to us.
Best,
Juli
end of thread, other threads:[~2024-06-28 18:03 UTC | newest]
Thread overview: 70+ messages
2024-06-18 8:37 Next Steps For the Software Heritage Problem MSavoritias
2024-06-18 14:19 ` Ian Eure
2024-06-19 8:36 ` Dale Mellor
2024-06-20 17:00 ` Andreas Enge
2024-06-20 18:42 ` Dale Mellor
2024-06-20 20:54 ` Andreas Enge
2024-06-20 20:59 ` Ekaitz Zarraga
2024-06-20 21:12 ` Andreas Enge
2024-06-21 8:41 ` Dale Mellor
2024-06-21 9:19 ` MSavoritias
2024-06-21 13:33 ` Luis Felipe
2024-06-21 17:51 ` Exclude checker with package properties [draft PATCH] Simon Tournier
2024-06-21 18:37 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-21 18:44 ` Simon Tournier
2024-06-21 18:42 ` Simon Tournier
2024-06-22 15:54 ` Draft: dry-run + Exclude checker with package properties Simon Tournier
2024-06-20 21:27 ` Next Steps For the Software Heritage Problem Simon Tournier
2024-06-18 16:21 ` Greg Hogan
2024-06-18 16:33 ` MSavoritias
2024-06-18 17:31 ` Greg Hogan
2024-06-18 17:57 ` Ian Eure
2024-06-19 7:01 ` MSavoritias
2024-06-19 9:57 ` Efraim Flashner
2024-06-20 2:56 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-20 5:18 ` MSavoritias
2024-06-19 10:10 ` Efraim Flashner
2024-06-21 8:39 ` About SWH, let avoid the wrong discussion Simon Tournier
2024-06-21 9:12 ` MSavoritias
2024-06-21 9:46 ` Andreas Enge
2024-06-21 10:44 ` MSavoritias
2024-06-21 13:45 ` Luis Felipe
2024-06-21 14:15 ` MSavoritias
2024-06-21 16:33 ` Luis Felipe
2024-06-21 17:04 ` Msavoritias
2024-06-21 16:34 ` Liliana Marie Prikler
2024-06-21 16:51 ` Vagrant Cascadian
2024-06-21 17:22 ` MSavoritias
2024-06-21 20:51 ` Vagrant Cascadian
2024-06-22 15:46 ` MSavoritias
2024-06-22 17:55 ` Breath, let take a short break :-) Simon Tournier
2024-06-24 7:30 ` MSavoritias
2024-06-24 10:23 ` Tomas Volf
2024-06-24 11:56 ` Lets cut this off Efraim Flashner
2024-06-21 17:25 ` About SWH, let avoid the wrong discussion Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-22 13:06 ` Richard Sent
2024-06-22 14:42 ` MSavoritias
2024-06-22 19:53 ` Ricardo Wurmus
2024-06-24 7:55 ` MSavoritias
2024-06-24 9:13 ` Ricardo Wurmus
-- strict thread matches above, loose matches on Subject: below --
2024-06-18 17:12 Next Steps For the Software Heritage Problem Andy Tai
2024-06-18 18:08 ` Ian Eure
2024-06-19 10:31 ` raingloom
2024-06-27 12:27 ` Ludovic Courtès
2024-06-27 15:30 ` Ian Eure
2024-06-27 16:48 ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-06-27 16:58 ` Ludovic Courtès
2024-06-19 7:52 Simon Tournier
2024-06-19 9:13 ` MSavoritias
2024-06-19 9:54 ` Efraim Flashner
2024-06-19 10:25 ` raingloom
2024-06-19 15:46 ` Ekaitz Zarraga
2024-06-20 6:36 ` MSavoritias
2024-06-20 14:35 ` Ekaitz Zarraga
2024-06-21 8:51 ` MSavoritias
2024-06-19 10:34 ` MSavoritias
2024-06-19 14:41 ` Simon Tournier
2024-06-20 6:51 ` MSavoritias
2024-06-20 14:40 ` Simon Tournier
2024-06-21 9:08 ` MSavoritias
2024-06-28 18:01 Juliana Sims