all messages for Guix-related lists mirrored at yhetil.org
* Concerns/questions around Software Heritage Archive
@ 2024-03-16 15:52 Ian Eure
  2024-03-16 17:50 ` Christopher Baines
                   ` (7 more replies)
  0 siblings, 8 replies; 61+ messages in thread
From: Ian Eure @ 2024-03-16 15:52 UTC (permalink / raw)
  To: guix-devel

Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, 
and it struck me as rather a good idea.  However, I’ve seen some 
things lately which have soured me on them.

They appear to be using the archive to build LLMs: 
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

I was also distressed to see how poorly they treated a developer 
who wished to update their name: 
https://cohost.org/arborelia/post/4968198-the-software-heritag 
https://cohost.org/arborelia/post/5052044-the-software-heritag

GPL’d software I’ve created has been packaged for Guix, which I 
assume means it’s been included in SWH.  While I’m dealing with 
their (IMO: unethical) opt-out process, I likely also need to stop 
new copies from being uploaded again in the future.

Is there a way to indicate, in a Guix package, that it should 
*never* be included in SWH?

Is there a way to tell Guix to never download source from SWH?

I want absolutely nothing to do with them.

Thanks,

  — Ian


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
@ 2024-03-16 17:50 ` Christopher Baines
  2024-03-16 18:24   ` MSavoritias
                     ` (2 more replies)
  2024-03-16 17:58 ` MSavoritias
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 61+ messages in thread
From: Christopher Baines @ 2024-03-16 17:50 UTC (permalink / raw)
  To: Ian Eure; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2352 bytes --]


Ian Eure <ian@retrospec.tv> writes:

> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last fall, and
> it struck me as rather a good idea.  However, I’ve seen some things
> lately which have soured me on them.
>
> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I assume
> means it’s been included in SWH.  While I’m dealing with their (IMO:
> unethical) opt-out process, I likely also need to stop new copies from
> being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should *never*
> be included in SWH?

Not currently, and I don't really see the point in such a mechanism. If
you really never want them to store your code, then you need to license
it accordingly (and not make it free software).

> Is there a way to tell Guix to never download source from SWH?

Also no, and it's probably best to do this at the network level on your
systems/network if you want this to be the case.

Skipping back to this though:

> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag

This is probably worth thinking about as Guix is in a similar situation
regarding publishing source code, and people potentially wanting to
change historical source code both in things Guix packages and Guix
itself.

Like Software Heritage, there are cryptographic implications for
rewriting the Git history and modifying source tarballs or nars that
contain source code.

We have 17TiB of compressed source code and built software stored for
bordeaux.guix.gnu.org now and we should probably work out how to handle
people asking for things to be removed or changed (for any and all
reasons).

It's probably worth working out our position on this in advance of
someone asking.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
  2024-03-16 17:50 ` Christopher Baines
@ 2024-03-16 17:58 ` MSavoritias
  2024-03-18  9:50   ` Please hold your horses Simon Tournier
  2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 61+ messages in thread
From: MSavoritias @ 2024-03-16 17:58 UTC (permalink / raw)
  To: Ian Eure, guix-devel

On 3/16/24 17:52, Ian Eure wrote:

> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last fall, and 
> it struck me as rather a good idea.  However, I’ve seen some things 
> lately which have soured me on them.
>
> They appear to be using the archive to build LLMs: 
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer who 
> wished to update their name: 
> https://cohost.org/arborelia/post/4968198-the-software-heritag 
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I assume 
> means it’s been included in SWH.  While I’m dealing with their (IMO: 
> unethical) opt-out process, I likely also need to stop new copies from 
> being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should *never* 
> be included in SWH?
>
> Is there a way to tell Guix to never download source from SWH?
>
> I want absolutely nothing to do with them.
>
> Thanks,
>
>  — Ian
>

Oh no.

Apparently they have A.I. and blockchain besides being also transphobic.

Thanks for the heads up. That's all I needed to know to never touch 
whatever they are doing.


MSavoritias




* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 17:50 ` Christopher Baines
@ 2024-03-16 18:24   ` MSavoritias
  2024-03-16 19:08     ` Christopher Baines
  2024-03-16 19:45     ` Tomas Volf
  2024-03-16 19:06   ` Ian Eure
  2024-03-16 23:16   ` Vivien Kraus
  2 siblings, 2 replies; 61+ messages in thread
From: MSavoritias @ 2024-03-16 18:24 UTC (permalink / raw)
  To: Christopher Baines, Ian Eure; +Cc: guix-devel


On 3/16/24 19:50, Christopher Baines wrote:
> Ian Eure <ian@retrospec.tv> writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last fall, and
>> it struck me as rather a good idea.  However, I’ve seen some things
>> lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I assume
>> means it’s been included in SWH.  While I’m dealing with their (IMO:
>> unethical) opt-out process, I likely also need to stop new copies from
>> being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should *never*
>> be included in SWH?
> Not currently, and I don't really see the point in such a mechanism. If
> you really never want them to store your code, then you need to license
> it accordingly (and not make it free software).

You are talking about legal tho. Yes legally they can copy the code.

But what can Guix do socially to give people the choice? For reasons of 
consent that is.

>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
> This is probably worth thinking about as Guix is in a similar situation
> regarding publishing source code, and people potentially wanting to
> change historical source code both in things Guix packages and Guix
> itself.
>
> Like Software Heritage, there's cryptographical implications for
> rewriting the Git history and modifying source tarballs or nars that
> contain source code.
>
> We have 17TiB of compressed source code and built software stored for
> bordeaux.guix.gnu.org now and we should probably work out how to handle
> people asking for things to be removed or changed (for any and all
> reasons).
>
> It's probably worth working out our position on this in advance of
> someone asking.

I would go a step further actually. Software Heritage is effectively 
breaking CoC of Guix now.

I'm obviously not proposing that we remove all code that connects to 
Software Heritage, but there should be some social action we can take.


For example until the matter is resolved and Software Heritage 
implements a process that respects trans rights Software Heritage should 
not be welcome in Guix Spaces.


MSavoritias




* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 17:50 ` Christopher Baines
  2024-03-16 18:24   ` MSavoritias
@ 2024-03-16 19:06   ` Ian Eure
  2024-03-16 19:49     ` Tomas Volf
  2024-03-16 23:16   ` Vivien Kraus
  2 siblings, 1 reply; 61+ messages in thread
From: Ian Eure @ 2024-03-16 19:06 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel


Christopher Baines <mail@cbaines.net> writes:

> [[PGP Signed Part:Undecided]]
>
> Ian Eure <ian@retrospec.tv> writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last 
>> fall, and
>> it struck me as rather a good idea.  However, I’ve seen some 
>> things
>> lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a 
>> developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I 
>> assume
>> means it’s been included in SWH.  While I’m dealing with their 
>> (IMO:
>> unethical) opt-out process, I likely also need to stop new 
>> copies from
>> being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should 
>> *never*
>> be included in SWH?
>
> Not currently, and I don't really see the point in such a 
> mechanism. If
> you really never want them to store your code, then you need to 
> license
> it accordingly (and not make it free software).
>

I don’t want my code in SWH *because* it’s free.  A primary use of 
LLMs is laundering freely licensed software into proprietary, 
commercial projects through "AI" code completion and generation. 
Any Free software in an LLM training set can and will be used in 
violation of its license, without a clear path for the author to 
seek recourse.  I deleted my code off Github and abandoned it 
completely for this exact reason, and am deeply irked to be going 
through this nonsense again.

A more salient question may be: Is there a process within Guix 
(either the program or the organization) which uploads source to 
SWH?  Or does it rely on SWH independently?

If the latter, my problem is likely solved by blocking SWH at my 
network edge and opting out of their archive (or trying to) and 
the downstream training models they’ve already put it in.  If the 
former, the only control I currently have to protect my license is 
removing packages from Guix which contain it.  I don’t want that 
outcome.

Noting also that the path here seems to be 
SWH->huggingface->bigcode training set, and the opt-out process 
for the training set appears to be a complete sham.  To opt-out, 
you must create a Github Issue; only one opt-out has *ever* been 
processed, and there are 200+ sitting there, many with no response 
for nearly a year[1].  I want no part of any of this.


>> Is there a way to tell Guix to never download source from SWH?
>
> Also no, and it's probably best to do this at the network level 
> on your
> systems/network if you want this to be the case.
>

I’ll investigate this, though I’d prefer if there was a way to 
configure source mirrors in the Guix daemon.


> Skipping back to this though:
>
>> I was also distressed to see how poorly they treated a 
>> developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> This is probably worth thinking about as Guix is in a similar 
> situation
> regarding publishing source code, and people potentially wanting 
> to
> change historical source code both in things Guix packages and 
> Guix
> itself.
>
> Like Software Heritage, there's cryptographical implications for
> rewriting the Git history and modifying source tarballs or nars 
> that
> contain source code.
>
> We have 17TiB of compressed source code and built software 
> stored for
> bordeaux.guix.gnu.org now and we should probably work out how to 
> handle
> people asking for things to be removed or changed (for any and 
> all
> reasons).
>
> It's probably worth working out our position on this in advance 
> of
> someone asking.
>

Yes, I agree that Guix needs a better solution for this.

Thanks,

  — Ian

[1]: https://github.com/bigcode-project/opt-out-v2/issues



* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 18:24   ` MSavoritias
@ 2024-03-16 19:08     ` Christopher Baines
  2024-03-16 19:45     ` Tomas Volf
  1 sibling, 0 replies; 61+ messages in thread
From: Christopher Baines @ 2024-03-16 19:08 UTC (permalink / raw)
  To: MSavoritias; +Cc: Ian Eure, guix-devel

[-- Attachment #1: Type: text/plain, Size: 3431 bytes --]


MSavoritias <email@msavoritias.me> writes:

> On 3/16/24 19:50, Christopher Baines wrote:
>> Ian Eure <ian@retrospec.tv> writes:
>>
>>> Hi Guixy people,
>>>
>>> I’d never heard of SWH before I started hacking on Guix last fall, and
>>> it struck me as rather a good idea.  However, I’ve seen some things
>>> lately which have soured me on them.
>>>
>>> They appear to be using the archive to build LLMs:
>>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>>
>>> I was also distressed to see how poorly they treated a developer who
>>> wished to update their name:
>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>>
>>> GPL’d software I’ve created has been packaged for Guix, which I assume
>>> means it’s been included in SWH.  While I’m dealing with their (IMO:
>>> unethical) opt-out process, I likely also need to stop new copies from
>>> being uploaded again in the future.
>>>
>>> Is there a way to indicate, in a Guix package, that it should *never*
>>> be included in SWH?
>> Not currently, and I don't really see the point in such a mechanism. If
>> you really never want them to store your code, then you need to license
>> it accordingly (and not make it free software).
>
> You are talking about legal tho. Yes legally they can copy the code.
>
> But what can Guix do socially to give people the choice? For reasons
> of consent that is.

...

>>> I was also distressed to see how poorly they treated a developer who
>>> wished to update their name:
>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>> This is probably worth thinking about as Guix is in a similar situation
>> regarding publishing source code, and people potentially wanting to
>> change historical source code both in things Guix packages and Guix
>> itself.
>>
>> Like Software Heritage, there's cryptographical implications for
>> rewriting the Git history and modifying source tarballs or nars that
>> contain source code.
>>
>> We have 17TiB of compressed source code and built software stored for
>> bordeaux.guix.gnu.org now and we should probably work out how to handle
>> people asking for things to be removed or changed (for any and all
>> reasons).
>>
>> It's probably worth working out our position on this in advance of
>> someone asking.
>
> I would go a step further actually. Software Heritage is effectively
> breaking CoC of Guix now.
>
> I'm not proposing removing all code or something obviously that
> connects to Software Heritage, but there should be some social action
> we can take.
>
>
> For example until the matter is resolved and Software Heritage
> implements a process that respects trans rights Software Heritage
> should not be welcome in Guix Spaces.

As I say, Guix is in a very similar situation as a project to Software
Heritage: we publish artefacts containing people's personal details and
there are technical implications in changing the personal details in
those artefacts.

The only difference as far as I'm aware is that no one is currently
asking Guix as a project to update their personal details in the
artefacts we store and publish.

As a project, we should sort out our stuff before jumping to judge
others.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 18:24   ` MSavoritias
  2024-03-16 19:08     ` Christopher Baines
@ 2024-03-16 19:45     ` Tomas Volf
  2024-03-17  7:06       ` MSavoritias
  1 sibling, 1 reply; 61+ messages in thread
From: Tomas Volf @ 2024-03-16 19:45 UTC (permalink / raw)
  To: MSavoritias; +Cc: Christopher Baines, Ian Eure, guix-devel

[-- Attachment #1: Type: text/plain, Size: 2191 bytes --]

On 2024-03-16 20:24:50 +0200, MSavoritias wrote:
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > This is probably worth thinking about as Guix is in a similar situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
> >
> > Like Software Heritage, there's cryptographical implications for
> > rewriting the Git history and modifying source tarballs or nars that
> > contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for
> > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > people asking for things to be removed or changed (for any and all
> > reasons).
> >
> > It's probably worth working out our position on this in advance of
> > someone asking.
>
> I would go a step further actually. Software Heritage is effectively
> breaking CoC of Guix now.
>
> I'm not proposing removing all code or something obviously that connects to
> Software Heritage, but there should be some social action we can take.
>
>
> For example until the matter is resolved and Software Heritage implements a
> process that respects trans rights Software Heritage should not be welcome
> in Guix Spaces.

I did skim the articles and I did not see any details on what the technical
solution should be.  SWH, among other things, archives the repositories and
allows fetching them by commit hash.  At least as far as I know.  Since that
commit hash does contain the author field, what is the proposed solution here to
change the author name without changing the commit hash?

While I am not a huge fan of the ability to map the "fake" author name over the
real one in the UI, what other solutions do you or the article author envision?
I am genuinely curious what you think can be done here.
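
To illustrate why the author name cannot change without the commit hash
changing, here is a minimal sketch of how Git derives a commit ID.  It
assumes a repository using the SHA-1 object format; the names, e-mail
addresses and timestamps are made up for illustration.

```python
import hashlib

# Git's well-known ID for the empty tree (SHA-1 object format).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WHEN = "1710000000 +0000"  # hypothetical timestamp and timezone


def git_commit_id(tree, parents, author, committer, message):
    """Hash a commit the way Git does: sha1(b"commit <size>\\0" + body)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def ident(name):
    # Hypothetical identity line; only the name differs between histories.
    return f"{name} <dev@example.org> {WHEN}"


# The same root commit, recorded under two different author names.
old_root = git_commit_id(EMPTY_TREE, [], ident("Old Name"), ident("Old Name"),
                         "Initial commit\n")
new_root = git_commit_id(EMPTY_TREE, [], ident("New Name"), ident("New Name"),
                         "Initial commit\n")

# A later commit by someone else: identical except for its parent ID.
other = f"Someone Else <other@example.org> {WHEN}"
old_child = git_commit_id(EMPTY_TREE, [old_root], other, other, "Second commit\n")
new_child = git_commit_id(EMPTY_TREE, [new_root], other, other, "Second commit\n")

print(old_root != new_root)    # the renamed commit gets a new ID
print(old_child != new_child)  # and so does every descendant
```

Because each commit embeds its parents' IDs, changing one author field
changes the ID of that commit and of everything built on top of it.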

Have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 19:06   ` Ian Eure
@ 2024-03-16 19:49     ` Tomas Volf
  0 siblings, 0 replies; 61+ messages in thread
From: Tomas Volf @ 2024-03-16 19:49 UTC (permalink / raw)
  To: Ian Eure; +Cc: Christopher Baines, guix-devel

[-- Attachment #1: Type: text/plain, Size: 4922 bytes --]

On 2024-03-16 12:06:27 -0700, Ian Eure wrote:
>
> Christopher Baines <mail@cbaines.net> writes:
>
> > [[PGP Signed Part:Undecided]]
> >
> > Ian Eure <ian@retrospec.tv> writes:
> >
> > > Hi Guixy people,
> > >
> > > I’d never heard of SWH before I started hacking on Guix last fall,
> > > and
> > > it struck me as rather a good idea.  However, I’ve seen some things
> > > lately which have soured me on them.
> > >
> > > They appear to be using the archive to build LLMs:
> > > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> > >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > >
> > > GPL’d software I’ve created has been packaged for Guix, which I
> > > assume
> > > means it’s been included in SWH.  While I’m dealing with their (IMO:
> > > unethical) opt-out process, I likely also need to stop new copies
> > > from
> > > being uploaded again in the future.
> > >
> > > Is there a way to indicate, in a Guix package, that it should
> > > *never*
> > > be included in SWH?
> >
> > Not currently, and I don't really see the point in such a mechanism. If
> > you really never want them to store your code, then you need to license
> > it accordingly (and not make it free software).
> >
>
> I don’t want my code in SWH *because* it’s free.  A primary use of LLMs is
> laundering freely licensed software into proprietary, commercial projects
> through "AI" code completion and generation. Any Free software in an LLM
> training set can and will be used in violation of its license, without a
> clear path for the author to seek recourse.  I deleted my code off Github
> and abandoned it completely for this exact reason, and am deeply irked to be
> going through this nonsense again.
>
> A more salient question may be: Is there a process within Guix (either the
> program or the organization) which uploads source to SWH?  Or does it rely
> on SWH independently?

`guix lint PKG-NAME' schedules SWH archival if possible.  No code is directly
uploaded (at least currently), so assuming you have an IP list of SWH, it should
be possible to block it.  At least AFAIK.
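
As a rough sketch of what matching against such a list could look like,
the Python snippet below checks whether a client address falls inside a
set of blocked networks.  The ranges shown are reserved documentation
blocks used as placeholders, not Software Heritage's real addresses, and
the function name is hypothetical.  In practice the enforcement would
live in a firewall or web server configuration rather than in a script.

```python
import ipaddress

# Placeholder ranges (RFC 5737 and RFC 3849 documentation blocks).
# These are NOT Software Heritage's addresses; substitute a real list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(address, networks=BLOCKED_NETWORKS):
    """Return True if `address` belongs to any of the blocked networks."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed lists are safe to scan in one pass.
    return any(ip in net for net in networks)


print(is_blocked("192.0.2.17"))
print(is_blocked("198.51.100.1"))
```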

If you have the list, or know how to get it, could you share it?  I would be
interested in blocking it as well from my git hosting.

>
> If the latter, my problem is likely solved by blocking SWH at my network
> edge and opting out of their archive (or trying to) and the downstream
> training models they’ve already put it in.  If the former, the only control
> I currently have to protect my license is removing packages from Guix which
> contain it.  I don’t want that outcome.
>
> Noting also that the path here seems to be SWH->huggingface->bigcode
> training set, and the opt-out process for the training set appears to be a
> complete sham.  To opt-out, you must create a Github Issue; only one opt-out
> has *ever* been processed, and there are 200+ sitting there, many with no
> response for nearly a year[1].  I want no part of any of this.
>
>
> > > Is there a way to tell Guix to never download source from SWH?
> >
> > Also no, and it's probably best to do this at the network level on your
> > systems/network if you want this to be the case.
> >
>
> I’ll investigate this, though I’d prefer if there was a way to configure
> source mirrors in the Guix daemon.
>
>
> > Skipping back to this though:
> >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> >
> > This is probably worth thinking about as Guix is in a similar situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
> >
> > Like Software Heritage, there's cryptographical implications for
> > rewriting the Git history and modifying source tarballs or nars that
> > contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for
> > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > people asking for things to be removed or changed (for any and all
> > reasons).
> >
> > It's probably worth working out our position on this in advance of
> > someone asking.
> >
>
> Yes, I agree that Guix needs a better solution for this.
>
> Thanks,
>
>  — Ian
>
> [1]: https://github.com/bigcode-project/opt-out-v2/issues
>

T.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
  2024-03-16 17:50 ` Christopher Baines
  2024-03-16 17:58 ` MSavoritias
@ 2024-03-16 21:37 ` Ryan Prior
  2024-03-17  9:39   ` Lars-Dominik Braun
  2024-03-17 13:03 ` Olivier Dion
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 61+ messages in thread
From: Ryan Prior @ 2024-03-16 21:37 UTC (permalink / raw)
  To: Ian Eure; +Cc: guix-devel

On Saturday, March 16th, 2024 at 10:52 AM, Ian Eure <ian@retrospec.tv> wrote:

> 
> 
> Hi Guixy people,
> [...]
> I was also distressed to see how poorly they treated a developer
> who wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag

I read these posts with interest. It is worth noting that the complained-about organization, Inria, supports Guix as well & has close historical ties to the project (although it does not have decision-making power here AFAIK). It is a shame that Inria have treated this matter with such apparent disregard.

I have heard folks in the Guix maintenance sphere claim that we never rewrite git history in Guix, as a matter of policy. I believe we should revisit that policy (is it actually written anywhere?) with an eye towards possible exceptions, and develop a mechanism for securely maintaining continuity of Guix installations after history has been rewritten so that we maintain this as a technical possibility in the future, even if we should choose to use it sparingly.

Ryan



* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 17:50 ` Christopher Baines
  2024-03-16 18:24   ` MSavoritias
  2024-03-16 19:06   ` Ian Eure
@ 2024-03-16 23:16   ` Vivien Kraus
  2024-03-16 23:27     ` Tomas Volf
       [not found]     ` <EoCuAq3N681mOIAh7ptCyXiyscM9R0iPDBWId1eS4EbTJ2-ARWNfGuqtXIvmqcJNBl1SQvMM4X6-GiC5LiUv4TJv6J4ritPA3uZ2JBwkAzQ=@protonmail.com>
  2 siblings, 2 replies; 61+ messages in thread
From: Vivien Kraus @ 2024-03-16 23:16 UTC (permalink / raw)
  To: Christopher Baines, Ian Eure; +Cc: guix-devel

Hello!

Le samedi 16 mars 2024 à 17:50 +0000, Christopher Baines a écrit :
> This is probably worth thinking about as Guix is in a similar
> situation
> regarding publishing source code, and people potentially wanting to
> change historical source code both in things Guix packages and Guix
> itself.

I see two problems:

1. providing packages;
2. developing Guix itself.

I am sure that 1. is not a real problem, we could just ask the
developer to release a new version incrementing the patch number,
upgrade it on our side, and forget the old version. Garbage collection
would ultimately get rid of the old tarballs.

2. is more difficult, because Guix contributors sometimes change their
names too, and a commit reading “update my name” is not the best
solution. If I understand correctly, rewriting the history would be
understood as a “downgrade attack”, contrary to the ftfy case where the
developer could rewrite the history without such consequences. Is my
understanding correct?
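
The "downgrade attack" concern can be sketched as an ancestry check: an
update is accepted only if the currently installed commit is an ancestor
of the new one.  The Python below is a toy model under that assumption,
with a hypothetical commit graph; it is not Guix's actual implementation.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


# Hypothetical graph: a -> b -> c is the original history, and
# a2 -> b2 -> c2 -> d2 is the same work after a history rewrite.
PARENTS = {
    "a": [], "b": ["a"], "c": ["b"],
    "a2": [], "b2": ["a2"], "c2": ["b2"], "d2": ["c2"],
}

installed = "c"
print(is_ancestor(PARENTS, installed, "d2"))  # rewritten tip: unrelated to "c"
print(is_ancestor(PARENTS, "b", installed))   # ordinary forward update from "b"
```

After a rewrite, the installed commit no longer appears among the
ancestors of the new tip, so a client applying this rule cannot tell a
legitimate rewrite apart from an attempt to move it onto another history.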



* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 23:16   ` Vivien Kraus
@ 2024-03-16 23:27     ` Tomas Volf
       [not found]     ` <EoCuAq3N681mOIAh7ptCyXiyscM9R0iPDBWId1eS4EbTJ2-ARWNfGuqtXIvmqcJNBl1SQvMM4X6-GiC5LiUv4TJv6J4ritPA3uZ2JBwkAzQ=@protonmail.com>
  1 sibling, 0 replies; 61+ messages in thread
From: Tomas Volf @ 2024-03-16 23:27 UTC (permalink / raw)
  To: Vivien Kraus; +Cc: Christopher Baines, Ian Eure, guix-devel

[-- Attachment #1: Type: text/plain, Size: 2255 bytes --]

On 2024-03-17 00:16:26 +0100, Vivien Kraus wrote:
> Hello!
>
> Le samedi 16 mars 2024 à 17:50 +0000, Christopher Baines a écrit :
> > This is probably worth thinking about as Guix is in a similar
> > situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
>
> I see two problems:
>
> 1. providing packages;
> 2. developing Guix itself.
>
> I am sure that 1. is not a real problem, we could just ask the
> developer to release a new version incrementing the patch number,
> upgrade it on our side, and forget the old version. Garbage collection
> would ultimately get rid of the old tarballs.

How would that approach interact with `guix time-machine'?  If developer takes
the approach of the package mentioned here (rewrite the history), would that not
cause the previous version to be no longer buildable, since the commit would no
longer exist?

I am not sure what the developer would do for old tarballs in this situation.
Re-release them from the re-written history or just drop them?  Either would be
a problem.  Or would they not care about dead name in the tarballs?

Currently SWH protects against the first (git commit), not sure if there is any
protection against the second (does SWH ingest tarballs as well?).

Either I am missing something, or this would actually be a problem for the
time-machine use case.

> 2. is more difficult, because Guix contributors sometimes change their
> names too, and a commit reading “update my name” is not the best
> solution. If I understand correctly, rewriting the history would be
> understood as a “downgrade attack”, contrary to the ftfy case where the
> developer could rewrite the history without such consequences. Is my
> understanding correct?

For my use case using .mailmap was enough, but that was not a dead name
situation.  However it is a solution that works today, and changes the name
visible in most git operations (afaict) without modifying the history.  So
something to consider.
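
As an illustration of the idea, the Python sketch below mimics what a
mailmap does: it maps the e-mail address recorded in a commit to the
name that should be displayed, leaving the commit itself untouched.  It
handles only the simplest entry form, "Proper Name <commit@email>"; the
helper names and sample identities are hypothetical, and Git's own
parser supports more forms than this.

```python
import re

# Matches only the simplest mailmap form: "Proper Name <commit@email>".
SIMPLE_ENTRY = re.compile(r"^(?P<name>[^<>]+?)\s*<(?P<email>[^<>]+)>\s*$")


def load_simple_mailmap(text):
    """Build an {email: display name} table from simple mailmap entries."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SIMPLE_ENTRY.match(line)
        if match:  # other entry forms are ignored in this sketch
            mapping[match.group("email").lower()] = match.group("name")
    return mapping


def display_name(mapping, commit_name, commit_email):
    """Return the mapped name, falling back to the name in the commit."""
    return mapping.get(commit_email.lower(), commit_name)


MAILMAP = """\
# Display the current name for commits recorded under an older one.
New Name <dev@example.org>
"""

table = load_simple_mailmap(MAILMAP)
print(display_name(table, "Old Name", "dev@example.org"))
print(display_name(table, "Someone Else", "other@example.org"))
```

The stored commits, and therefore their hashes, stay the same; only the
name shown by tools that honour the mapping changes.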


Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


* Fw: Re: Concerns/questions around Software Heritage Archive
       [not found]     ` <EoCuAq3N681mOIAh7ptCyXiyscM9R0iPDBWId1eS4EbTJ2-ARWNfGuqtXIvmqcJNBl1SQvMM4X6-GiC5LiUv4TJv6J4ritPA3uZ2JBwkAzQ=@protonmail.com>
@ 2024-03-16 23:40       ` Ryan Prior
  0 siblings, 0 replies; 61+ messages in thread
From: Ryan Prior @ 2024-03-16 23:40 UTC (permalink / raw)
  To: Guix Devel

[I intended to CC the following to guix-devel but forgot:]

------- Forwarded Message -------
From: Ryan Prior <rprior@protonmail.com>
Date: On Saturday, March 16th, 2024 at 6:36 PM
Subject: Re: Concerns/questions around Software Heritage Archive
To: Vivien Kraus <vivien@planete-kraus.eu>


> 
> 
> On Saturday, March 16th, 2024 at 6:13 PM, Vivien Kraus vivien@planete-kraus.eu wrote:
> 
> > 2. is more difficult, because Guix contributors sometimes change their
> > names too, and a commit reading “update my name” is not the best
> > solution. If I understand correctly, rewriting the history would be
> > understood as a “downgrade attack”, contrary to the ftfy case where the
> > developer could rewrite the history without such consequences. Is my
> > understanding correct?
> 
> 
> It's only a problem IMO because we make the decision to treat Guix as an append-only series of commits and treat any other outcome as a potential attack. One alternate solution would be to allow provision of an authenticated alternate-history data structure, which indicates a set of (old commit hash, new commit hash) tuples going back to the first rewritten commit in the history, and the whole thing would be signed by a Guix committer. That way, the updating Guix client can rewind history, apply the new commit(s), verify that the old chain and new chain match what's provided in the alternate-history structure & that its signature is valid. Thus verified, the Guix installation could continue without needing to allow a downgrade exception.
> 
> Perhaps there are much better ways of handling this, but I propose it in hopes of clarifying that there are technical solutions which preserve integrity while permitting history rewrites in situations where it is desirable.
> 
> I have requested previously that some commits I've provided be rewritten to update my name. In my case, it's because I've sometimes misconfigured my email software such that some commits by me are signed just "ryan" or "Ryan Prior via Protonmail" or similar, rather than my preference which is "Ryan Prior".
> 
> In my case this causes me no harm and is simply an annoyance, so when I encountered resistance to rewriting the offending commits, I dropped the matter, and I still consider it dropped and settled. Even if we developed the capability to securely present a rewritten history, I wouldn't demand that such be used to address small concerns like mine.
> 
> However, I know we have at least two trans Guix contributors. Do they have any commits with their deadnames on them? Not that this is an invitation to go look; they can tell us if this is a concern worth raising. I include the detail to clarify that this is not a distant concern. Perhaps they have been silent thus far for the same reason that I have, because the policy against rewrites presents too high a barrier? (Or it may not bother them, or maybe they used their initials which are the same etc?) In any case I think it would be courteous to develop a procedure by which we could remove deadnames from old commits, or otherwise remove harmful information from Guix's development history, should this become a necessity.
> 
> Ryan



* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 19:45     ` Tomas Volf
@ 2024-03-17  7:06       ` MSavoritias
  0 siblings, 0 replies; 61+ messages in thread
From: MSavoritias @ 2024-03-17  7:06 UTC (permalink / raw)
  To: Christopher Baines, Ian Eure, guix-devel


On 3/16/24 21:45, Tomas Volf wrote:
> On 2024-03-16 20:24:50 +0200, MSavoritias wrote:
>>>> I was also distressed to see how poorly they treated a developer who
>>>> wished to update their name:
>>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>> This is probably worth thinking about as Guix is in a similar situation
>>> regarding publishing source code, and people potentially wanting to
>>> change historical source code both in things Guix packages and Guix
>>> itself.
>>>
>>> Like Software Heritage, there's cryptographical implications for
>>> rewriting the Git history and modifying source tarballs or nars that
>>> contain source code.
>>>
>>> We have 17TiB of compressed source code and built software stored for
>>> bordeaux.guix.gnu.org now and we should probably work out how to handle
>>> people asking for things to be removed or changed (for any and all
>>> reasons).
>>>
>>> It's probably worth working out our position on this in advance of
>>> someone asking.
>> I would go a step further actually. Software Heritage is effectively
>> breaking the Guix CoC now.
>>
>> I'm not proposing removing all code that connects to Software Heritage
>> or anything like that, obviously, but there should be some social action
>> we can take.
>>
>>
>> For example until the matter is resolved and Software Heritage implements a
>> process that respects trans rights Software Heritage should not be welcome
>> in Guix Spaces.
> I did skim the articles and I did not see any details on what the technical
> solution should be.  SWH, among other things, archives the repositories and
> allows fetching them by commit hash.  At least as far as I know.  Since that
> commit hash does contain the author field, what is the proposed solution here to
> change the author name without changing the commit hash?
>
> While I am not a huge fan of the ability to map the "fake" author name over the
> real one in the UI, what other solutions do you or the article author envision?
> I am genuinely curious what you think can be done here.

I think you are arguing against something other than what I wrote? I didn't
say anything about technical solutions; that's up to Software Heritage to
figure out.

I did say that there should be social consequences, since Software
Heritage is breaking the CoC here.

And by breaking the CoC I mean that Software Heritage seems to have a
complete lack of empathy towards trans people.


Regarding what Guix could do personally the answer is clear: People are 
more important than machines and code.

So we should find a way that trans people feel safe in Guix.


MSavoritias

> Have a nice day,
> Tomas Volf



* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior
@ 2024-03-17  9:39   ` Lars-Dominik Braun
  2024-03-17  9:47     ` MSavoritias
  2024-03-18 14:04     ` pinoaffe
  0 siblings, 2 replies; 61+ messages in thread
From: Lars-Dominik Braun @ 2024-03-17  9:39 UTC (permalink / raw)
  To: Ryan Prior; +Cc: Ian Eure, guix-devel

Hey,

> I have heard folks in the Guix maintenance sphere claim that we never rewrite git history in Guix, as a matter of policy. I believe we should revisit that policy (is it actually written anywhere?) with an eye towards possible exceptions, and develop a mechanism for securely maintaining continuity of Guix installations after history has been rewritten so that we maintain this as a technical possibility in the future, even if we should choose to use it sparingly.

the fallout of rewriting Guix’ git history would be devastating. It
would break every single Guix installation, because

a) `guix pull` authenticates commits and we might lose our trust anchor
if we rewrite history earlier than the introduction of this feature,
b) `guix pull` outright rejects changes to the commit history to prevent
downgrade attacks.

Additionally it would break every single existing usage of the
time machine and thereby completely defeat the goal of providing
reproducible software environments since the commit hash is used to
identify the point in time to jump to.

I doubt developing “mechanisms” – whatever they look like – would
be worth the effort. Our contributors matter, but so do our users. Never
ever rewriting our git history is a tradeoff we should make for our users.
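
To make point b) concrete, here is a minimal Python sketch of a fast-forward
check.  It is only an illustration of the idea, not Guix's actual
implementation, and the commit graph and commit names are hypothetical.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def may_update(parents, current, target):
    """Accept an update only if the target descends from the current commit."""
    return is_ancestor(parents, current, target)


# Hypothetical histories: a <- b <- c, and a rewritten copy a2 <- b2 <- c2.
parents = {
    "a": [], "b": ["a"], "c": ["b"],
    "a2": [], "b2": ["a2"], "c2": ["b2"],
}

print(may_update(parents, "b", "c"))   # ordinary update
print(may_update(parents, "c", "b"))   # downgrade
print(may_update(parents, "c", "c2"))  # rewritten history
```

A rewritten history looks exactly like the downgrade case to such a check: the
commit currently installed is not an ancestor of the new one.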

Lars




* Re: Concerns/questions around Software Heritage Archive
  2024-03-17  9:39   ` Lars-Dominik Braun
@ 2024-03-17  9:47     ` MSavoritias
  2024-03-17 11:53       ` paul
  2024-03-17 16:20       ` Concerns/questions around Software Heritage Archive Ian Eure
  2024-03-18 14:04     ` pinoaffe
  1 sibling, 2 replies; 61+ messages in thread
From: MSavoritias @ 2024-03-17  9:47 UTC (permalink / raw)
  To: Lars-Dominik Braun, Ryan Prior; +Cc: Ian Eure, guix-devel


On 3/17/24 11:39, Lars-Dominik Braun wrote:
> Hey,
>
>> I have heard folks in the Guix maintenance sphere claim that we never rewrite git history in Guix, as a matter of policy. I believe we should revisit that policy (is it actually written anywhere?) with an eye towards possible exceptions, and develop a mechanism for securely maintaining continuity of Guix installations after history has been rewritten so that we maintain this as a technical possibility in the future, even if we should choose to use it sparingly.
> the fallout of rewriting Guix’ git history would be devastating. It
> would break every single Guix installation, because
>
> a) `guix pull` authenticates commits and we might lose our trust anchor
> if we rewrite history earlier than the introduction of this feature,
> b) `guix pull` outright rejects changes to the commit history to prevent
> downgrade attacks.
>
> Additionally it would break every single existing usage of the
> time machine and thereby completely defeat the goal of providing
> reproducible software environments since the commit hash is used to
> identify the point in time to jump to.
>
> I doubt developing “mechanisms” – whatever they look like – would
> be worth the effort. Our contributors matter, but so do our users. Never
> ever rewriting our git history is a tradeoff we should make for our users.
>
> Lars
>
>
That's a good point, in the sense that it's a tradeoff here, and I
absolutely agree.


But let me add some food for thought here:

1. Were the social aspects considered when the system came into place?

2. Is it more important for the system to stay as is than to welcome new 
contributors?

3. You mention "it's a tradeoff we should make for our users". How many
trans people were involved in that decision and how much did their
opinion matter in this?


I am saying this because giving power to people (what are called users) is
not only handing them code or making sure everything is free software.

It's also the hard part of making sure the voices of people who cannot
code are heard, participating, and taken into account.

I am not trying to say what we should do about commit history rewriting 
here. Personally the tradeoffs are probably worth it.

But I am trying to say what Guix should do as a culture over including 
people or excluding in the case of Software Heritage.


MSavoritias




* Re: Concerns/questions around Software Heritage Archive
  2024-03-17  9:47     ` MSavoritias
@ 2024-03-17 11:53       ` paul
  2024-03-17 11:57         ` MSavoritias
                           ` (2 more replies)
  2024-03-17 16:20       ` Concerns/questions around Software Heritage Archive Ian Eure
  1 sibling, 3 replies; 61+ messages in thread
From: paul @ 2024-03-17 11:53 UTC (permalink / raw)
  To: guix-devel

Hi all,

thank you MSavoritias for bringing up points that many of us share. It's
clearly a tradeoff what to do about the past. For the future, as
Christopher already stated, we need a serious solution that we can uphold
as a free software project that does not alienate users or contributors.

My opinion is that names are just wrong to be included, not only because 
of deadnames, but in general having a database with a column first_name 
and a column second_name is something only a 35 yrs old white cis boy 
could have thought was a good idea to model the spectrum of names humans 
use all over the world:

https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

If we really needed to identify contributors, and obviously Guix
doesn't, we could use a UUID/machine-readable identifier which can then
be mapped to a displayed name. I believe git can already be configured
to do so.


giacomo




* Re: Concerns/questions around Software Heritage Archive
  2024-03-17 11:53       ` paul
@ 2024-03-17 11:57         ` MSavoritias
  2024-03-17 14:57           ` Richard Sent
  2024-03-17 16:28           ` Ian Eure
  2024-03-17 12:51         ` Tomas Volf
  2024-03-20 15:25         ` contributor uuid (was Re: Concerns/questions around Software Heritage Archive) bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6
  2 siblings, 2 replies; 61+ messages in thread
From: MSavoritias @ 2024-03-17 11:57 UTC (permalink / raw)
  To: paul, guix-devel


On 3/17/24 13:53, paul wrote:
> Hi all,
>
> thank you MSavoritias for bringing up points that many of us share.
> It's clearly a tradeoff what to do about the past. For the future, as
> Christopher already stated, we need a serious solution that we can
> uphold as a free software project that does not alienate users or
> contributors.
>
> My opinion is that names are just wrong to be included, not only 
> because of deadnames, but in general having a database with a column 
> first_name and a column second_name is something only a 35 yrs old 
> white cis boy could have thought was a good idea to model the spectrum 
> of names humans use all over the world:
>
> https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ 
>
>
> If we really needed to identify contributors, and obviously Guix
> doesn't, we could use a UUID/machine-readable identifier which can
> then be mapped to a displayed name. I believe git can already be
> configured to do so.
>
>
> giacomo
>
>
The uuid sounds like a very interesting solution indeed.

I wonder how easy it could be to add it to git.


I agree that making some rules about names that are going to be wrong at 
some point or in some place is the wrong solution long term for sure.


MSavoritias




* Re: Concerns/questions around Software Heritage Archive
  2024-03-17 11:53       ` paul
  2024-03-17 11:57         ` MSavoritias
@ 2024-03-17 12:51         ` Tomas Volf
  2024-03-17 23:56           ` Attila Lendvai
  2024-03-20 15:25         ` contributor uuid (was Re: Concerns/questions around Software Heritage Archive) bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6
  2 siblings, 1 reply; 61+ messages in thread
From: Tomas Volf @ 2024-03-17 12:51 UTC (permalink / raw)
  To: paul; +Cc: guix-devel


On 2024-03-17 12:53:54 +0100, paul wrote:
> only a 35 yrs old white cis boy

Could you stop labeling people like this?  It makes me feel uncomfortable and
not welcomed...

T.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
                   ` (2 preceding siblings ...)
  2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior
@ 2024-03-17 13:03 ` Olivier Dion
  2024-03-17 17:57 ` Ludovic Courtès
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 61+ messages in thread
From: Olivier Dion @ 2024-03-17 13:03 UTC (permalink / raw)
  To: Ian Eure, guix-devel

On Sat, 16 Mar 2024, Ian Eure <ian@retrospec.tv> wrote:

[...]

> GPL’d software I’ve created has been packaged for Guix, which I assume
> means it’s been included in SWH.  While I’m dealing with their (IMO:
> unethical) opt-out process, I likely also need to stop new copies from
> being uploaded again in the future.

Even without Guix, SWH could upload your projects into their "database".
In fact, I believe anyone can ask to archive your project to SWH.  So
even if you ask Guix to not do the archiving, anyone contributing might
change that in the future.

I believe that preventing Guix from archiving your software is a
symbolic standpoint -- which I respect --, but would put more burden on
the Guix developers.  On the other hand, if enough people refuse to
archive to SWH, this might shift Guix in a new direction for long-term
source archiving.

I'm not a lawyer, but perhaps a first solution -- for the AI stuff --
would be to add an exception to the GPL that prevents AI from training
on it.  Alas, as usual, our legislators are late on that matter, so that
might not even work.

[...]

-- 
Olivier Dion
oldiob.ca



* Re: Concerns/questions around Software Heritage Archive
  2024-03-17 11:57         ` MSavoritias
@ 2024-03-17 14:57           ` Richard Sent
  2024-03-17 16:28           ` Ian Eure
  1 sibling, 0 replies; 61+ messages in thread
From: Richard Sent @ 2024-03-17 14:57 UTC (permalink / raw)
  To: MSavoritias; +Cc: paul, guix-devel

Regarding Guix development, if the decision is made to not change
existing policy or implement another authorship mechanism, I think some
text could be added to the manual explaining such.

Contributing to Guix is an intentional thing, unlike SWH. Updating the
manual means contributors will, at least, be making an informed decision
to contribute, knowing that names cannot be changed in the Guix repo's
history due to X, Y, and Z consequences in Guix's functionality.

I'm not suggesting that this solution is "the end-all-be-all" or
invalidates alternative avenues, but I feel it is an improvement over
the status quo with no negative tradeoffs. I would not support a
solution that obsoletes time-machine or requires regular manual
intervention during upgrades.

Personally as a new contributor I find it gratifying to see my name in
the commit history.

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.



* Re: Concerns/questions around Software Heritage Archive
  2024-03-17  9:47     ` MSavoritias
  2024-03-17 11:53       ` paul
@ 2024-03-17 16:20       ` Ian Eure
  2024-03-17 16:55         ` MSavoritias
  1 sibling, 1 reply; 61+ messages in thread
From: Ian Eure @ 2024-03-17 16:20 UTC (permalink / raw)
  To: MSavoritias; +Cc: Lars-Dominik Braun, Ryan Prior, guix-devel


MSavoritias <email@msavoritias.me> writes:

> On 3/17/24 11:39, Lars-Dominik Braun wrote:
>> Hey,
>>
>>> I have heard folks in the Guix maintenance sphere claim that we
>>> never rewrite git history in Guix, as a matter of policy. I believe
>>> we should revisit that policy (is it actually written anywhere?)
>>> with an eye towards possible exceptions, and develop a mechanism for
>>> securely maintaining continuity of Guix installations after history
>>> has been rewritten so that we maintain this as a technical
>>> possibility in the future, even if we should choose to use it
>>> sparingly.
>>
>> the fallout of rewriting Guix’ git history would be devastating. It
>> would break every single Guix installation, because
>>
>> a) `guix pull` authenticates commits and we might lose our trust anchor
>> if we rewrite history earlier than the introduction of this feature,
>> b) `guix pull` outright rejects changes to the commit history to prevent
>> downgrade attacks.
>>
>> Additionally it would break every single existing usage of the
>> time machine and thereby completely defeat the goal of providing
>> reproducible software environments since the commit hash is used to
>> identify the point in time to jump to.
>>
>> I doubt developing “mechanisms” – whatever they look like – would
>> be worth the effort. Our contributors matter, but so do our users. Never
>> ever rewriting our git history is a tradeoff we should make for our users.
>>
>> Lars
>>
>>
> That's a good point, in the sense that it's a tradeoff here, and I
> absolutely agree.
>
>
> But let me add some food for thought here:
>
> 1. Were the social aspects considered when the system came into place?
>
> 2. Is it more important for the system to stay as is than to welcome
> new contributors?
>
> 3. You mention "it's a tradeoff we should make for our users". How many
> trans people were involved in that decision and how much did their
> opinion matter in this?
>
>
> I am saying this because giving power to people (what are called users)
> is not only handing them code or making sure everything is free
> software.
>
> It's also the hard part of making sure the voices of people who cannot
> code are heard, participating, and taken into account.
>

Just want to say that I appreciate and agree with your thoughtful 
words.

I’d also note that name changes aren’t a concern limited to trans 
people, and framing this as "we have to upend everything Because 
Transgender" is both wrong and feels pretty bad to me.  Anyone can 
change their name at any time for any reason, or no reason at all, 
and may wish to update historical references to their previous 
names.  Having a mechanism to support this is, in my view, a 
matter of basic decency and respect for all humans.

Thanks,

  — Ian



* Re: Concerns/questions around Software Heritage Archive
  2024-03-17 11:57         ` MSavoritias
  2024-03-17 14:57           ` Richard Sent
@ 2024-03-17 16:28           ` Ian Eure
  1 sibling, 0 replies; 61+ messages in thread
From: Ian Eure @ 2024-03-17 16:28 UTC (permalink / raw)
  To: MSavoritias; +Cc: paul, guix-devel


MSavoritias <email@msavoritias.me> writes:

> On 3/17/24 13:53, paul wrote:
>> Hi all,
>>
>> thank you MSavoritias for bringing up points that many of us
>> share. It's clearly a tradeoff what to do about the past. For the
>> future, as Christopher already stated, we need a serious solution
>> that we can uphold as a free software project that does not alienate
>> users or contributors.
>>
>> My opinion is that names are just wrong to be included, not only
>> because of deadnames, but in general having a database with a column
>> first_name and a column second_name is something only a 35 yrs old
>> white cis boy could have thought was a good idea to model the
>> spectrum of names humans use all over the world:
>>
>> https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
>>
>> If we really needed to identify contributors, and obviously Guix
>> doesn't, we could use a UUID/machine-readable identifier which can
>> then be mapped to a displayed name. I believe git can already be
>> configured to do so.
>>
>>
>> giacomo
>>
>>
> The uuid sounds like a very interesting solution indeed.
>
> I wonder how easy it could be to add it to git.
>

This also seems like interesting territory to explore.  The 
concerns raised around rewriting history have valid points; I 
think it’s impractical to rewrite history any time a change needs 
to happen, as that would be an ongoing source of disruption.  But 
rewriting history *once*, to switch to a more general mechanism, 
seems like a reasonable trade to me.  This also presents an 
opportunity: we could combine this with a default branch switch 
from master to main.  A news entry left as the final commit in 
master could inform people of whatever steps may be needed to 
update (if that can’t be automated), and the main branch would 
contain the rewritten history.

It’s certainly not a perfect solution, but it seems pragmatic.

  — Ian



* Re: Concerns/questions around Software Heritage Archive
  2024-03-17 16:20       ` Concerns/questions around Software Heritage Archive Ian Eure
@ 2024-03-17 16:55         ` MSavoritias
  0 siblings, 0 replies; 61+ messages in thread
From: MSavoritias @ 2024-03-17 16:55 UTC (permalink / raw)
  To: Ian Eure; +Cc: Lars-Dominik Braun, Ryan Prior, guix-devel

On 3/17/24 18:20, Ian Eure wrote:

>
> MSavoritias <email@msavoritias.me> writes:
>
>> On 3/17/24 11:39, Lars-Dominik Braun wrote:
>>> Hey,
>>>
>>>> I have heard folks in the Guix maintenance sphere claim that we
>>>> never rewrite git history in Guix, as a matter of policy. I believe
>>>> we should revisit that policy (is it actually written anywhere?)
>>>> with an eye towards possible exceptions, and develop a mechanism for
>>>> securely maintaining continuity of Guix installations after history
>>>> has been rewritten so that we maintain this as a technical
>>>> possibility in the future, even if we should choose to use it
>>>> sparingly.
>>>
>>> the fallout of rewriting Guix’ git history would be devastating. It
>>> would break every single Guix installation, because
>>>
>>> a) `guix pull` authenticates commits and we might lose our trust anchor
>>> if we rewrite history earlier than the introduction of this feature,
>>> b) `guix pull` outright rejects changes to the commit history to prevent
>>> downgrade attacks.
>>>
>>> Additionally it would break every single existing usage of the
>>> time machine and thereby completely defeat the goal of providing
>>> reproducible software environments since the commit hash is used to
>>> identify the point in time to jump to.
>>>
>>> I doubt developing “mechanisms” – whatever they look like – would
>>> be worth the effort. Our contributors matter, but so do our users. Never
>>> ever rewriting our git history is a tradeoff we should make for our users.
>>>
>>> Lars
>>>
>>>
>> That's a good point, in the sense that it's a tradeoff here, and I
>> absolutely agree.
>>
>>
>> But let me add some food for thought here:
>>
>> 1. Were the social aspects considered when the system came into place?
>>
>> 2. Is it more important for the system to stay as is than to welcome
>> new contributors?
>>
>> 3. You mention "it's a tradeoff we should make for our users". How many
>> trans people were involved in that decision and how much did their
>> opinion matter in this?
>>
>>
>> I am saying this because giving power to people (what are called users)
>> is not only handing them code or making sure everything is free
>> software.
>>
>> It's also the hard part of making sure the voices of people who cannot
>> code are heard, participating, and taken into account.
>>
>
> Just want to say that I appreciate and agree with your thoughtful words.
>
> I’d also note that name changes aren’t a concern limited to trans 
> people, and framing this as "we have to upend everything Because 
> Transgender" is both wrong and feels pretty bad to me.  Anyone can 
> change their name at any time for any reason, or no reason at all, and 
> may wish to update historical references to their previous names.  
> Having a mechanism to support this is, in my view, a matter of basic 
> decency and respect for all humans.
>
> Thanks,
>
>  — Ian

You are right. I failed to see how it could be desirable for other 
people too.

I agree it should be done for everybody.


MSavoritias




* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
                   ` (3 preceding siblings ...)
  2024-03-17 13:03 ` Olivier Dion
@ 2024-03-17 17:57 ` Ludovic Courtès
  2024-03-20 17:22   ` the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) Giovanni Biscuolo
  2024-03-18  9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 61+ messages in thread
From: Ludovic Courtès @ 2024-03-17 17:57 UTC (permalink / raw)
  To: Ian Eure; +Cc: guix-devel

Hi,

Ian Eure <ian@retrospec.tv> skribis:

> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

To me, if the end result is that copyleft licenses are ignored, as is
the case with Microsoft’s Copilot, then we have a problem.

That’s no excuse, but the problem goes beyond SWH: people upload copies
of repositories to GitHub without one’s consent (nothing to blame them
for, it’s free software), and then code ends up being used as training
data for Copilot.

As you may have seen, this is being discussed on the Fediverse.  I’d
like to leave the SWH people time to reply to concerns that have been
raised.

> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag

That’s another concern, with append-only storage in general, starting
with Git.  We should look for solutions that work for both contributors
who change names and for users.  This has happened several times in Guix
and what people did was search/replace their name and adjust ‘.mailmap’.

Thanks,
Ludo’.



* Re: Concerns/questions around Software Heritage Archive
  2024-03-17 12:51         ` Tomas Volf
@ 2024-03-17 23:56           ` Attila Lendvai
  0 siblings, 0 replies; 61+ messages in thread
From: Attila Lendvai @ 2024-03-17 23:56 UTC (permalink / raw)
  To: Tomas Volf; +Cc: paul, guix-devel

> only a 35 yrs old white cis boy

you're judging a group of individuals, namely those who were handed the cis white male mix at the genetic lottery, as a uniform blob. and maybe even somewhat deplorable, if i'm reading you right.

does it make sense to judge an individual based on some coincidental properties? or really, based on anything else than their actions? does it make sense to discuss the actions/morality of a group of individuals that is formed based on some coincidental properties? e.g. what can we say about the morality of all the blond people?

and ultimately, is that an effective way of speaking up for human rights and welcoming environments -- of all things?

maybe it's time to take a thorough look at the book that you're preaching from?

if i may, let me attempt to inspire you:

“The world is changed by your example, not by your opinion.”
	— Paulo Coelho (1947–)
%
“Yesterday I was clever, so I wanted to change the world. Today I am wise, so I am changing myself.”
	— Rumi (1207–1273)
%
“If there is to be peace in the world,
There must be peace in the nations.
If there is to be peace in the nations,
There must be peace in the cities.
If there is to be peace in the cities,
There must be peace between neighbors.
If there is to be peace between neighbors,
There must be peace in the home.
If there is to be peace in the home,
There must be peace in the heart.”
	— Lao Tzu (sixth century BC)
%
“A man of humanity is one who, in seeking to establish himself, finds a foothold for others and who, in desiring attaining himself, helps others to attain.”
	— Confucius (551–479 BC)
%
“To put the world in order, we must first put the nation in order; to put the nation in order, we must first put the family in order; to put the family in order; we must first cultivate our personal life; we must first set our hearts right.”
	— Confucius (551–479 BC)
%
“Until we have met the monsters in ourselves, we keep trying to slay them in the outer world. And we find that we cannot. For all darkness in the world stems from darkness in the heart. And it is there that we must do our work.”
	— Marianne Williamson (1952–), 'Everyday Grace: Having Hope, Finding Forgiveness And Making Miracles' (2004)
%
“If things go wrong in the world, this is because something is wrong with the individual, because something is wrong with me. Therefore, if I am sensible, I shall put myself right first”
	— Carl Jung (1875–1961), 'The Meaning of Psychology for Modern Man'

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If liberty means anything at all, it means the right to tell people what they do not want to hear.”
	— George Orwell (1903–1950)



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
                   ` (4 preceding siblings ...)
  2024-03-17 17:57 ` Ludovic Courtès
@ 2024-03-18  9:28 ` Simon Tournier
  2024-03-18 11:47   ` MSavoritias
                     ` (2 more replies)
  2024-03-18 11:14 ` Content-Addressed system and history? Simon Tournier
  2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure
  7 siblings, 3 replies; 61+ messages in thread
From: Simon Tournier @ 2024-03-18  9:28 UTC (permalink / raw)
  To: Ian Eure, guix-devel

Hi,

On sam., 16 mars 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote:

> They appear to be using the archive to build LLMs: 
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

About LLM, Software Heritage made a clear statement:

    https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code

Quoting:

        We feel that the question is no longer whether LLMs for code
        should be built. They are already being built, independently of
        what we do, and there is no turning back.  The real question is
        how they should be built and whom they should benefit.

Principles:

        1. Knowledge derived from the Software Heritage archive must be
        given back to humanity, rather than monopolized for private
        gain. The resulting machine learning models must be made available
        under a suitable open license, together with the documentation and
        toolings needed to use them.

        2. The initial training data extracted from the Software Heritage
        archive must be fully and precisely identified by, for example,
        publishing the corresponding SWHID identifiers (note that, in the
        context of Software Heritage, public availability of the initial
        training data is a given: anyone can obtain it from the
        archive). This will enable use cases such as: studying biases
        (fairness), verifying if a code of interest was present in the
        training data (transparency), and providing appropriate attribution
        when generated code bears resemblance to training data (credit),
        among others.

        3. Mechanisms should be established, where possible, for authors to
        exclude their archived code from the training inputs before model
        training begins.

I hope it clarifies your concerns to some extent.


Moreover, you wrote: « I want absolutely nothing to do with them. »

Maybe there is a misunderstanding on your side about what “free
software” and GPL means because once “free software”, you cannot prevent
people to use “your” free software for any purposes you dislike.

If you want to bound the use cases of the software you create, you need
to explicitly specify that in the license.  And if you do, your software
will not be considered as “free software”.

That’s the double sword of “free software”. :-)

Cheers,
simon


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Please hold your horses
  2024-03-16 17:58 ` MSavoritias
@ 2024-03-18  9:50   ` Simon Tournier
  0 siblings, 0 replies; 61+ messages in thread
From: Simon Tournier @ 2024-03-18  9:50 UTC (permalink / raw)
  To: MSavoritias, Ian Eure, guix-devel

Hi MSavoritias,

Could you please stop propagating tangential or opinionated views?
Please hold your horses.

You wrote several times, about Software Heritage:

>                                                  being also transphobic.

[…]

> I would go a step further actually. Software Heritage is effectively 
> breaking CoC of Guix now.

[…]

>                                              Software Heritage 
> implements a process that respects trans rights Software Heritage should 
> not be welcome in Guix Spaces.

[…]

>                                                          Software 
> Heritage is breaking CoC here.

This language is not acceptable on Guix channel of communication.

It appears to me much better to stay open and give the benefit
of the doubt.  Let us avoid bold conclusions and prefer constructive
arguments.

For instance, I refrain from qualifying your opinion because it would
not be helpful… So I apply my own advice and give you the benefit of
the doubt.

Cheers,
simon


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Content-Addressed system and history?
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
                   ` (5 preceding siblings ...)
  2024-03-18  9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier
@ 2024-03-18 11:14 ` Simon Tournier
  2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure
  7 siblings, 0 replies; 61+ messages in thread
From: Simon Tournier @ 2024-03-18 11:14 UTC (permalink / raw)
  To: Ian Eure, guix-devel

Hi,

On sam., 16 mars 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote:

> I was also distressed to see how poorly they treated a developer 
> who wished to update their name: 
> https://cohost.org/arborelia/post/4968198-the-software-heritag 
> https://cohost.org/arborelia/post/5052044-the-software-heritag

This raises two questions, IMHO.

1. Can the future you decide who the past you was?

2. What is a Content-Addressed system?


About #1, that’s somehow a philosophical question. :-)

That’s what the question about changing the public identity asks: you
can act on who you are and who you want to be but, because time is
not reversible, sadly, you cannot change who you were.  It is not
possible to collectively rewrite history.

Allowing such a process leads to dangerous consequences, IMHO.  That’s
another story. :-)

Do not get me wrong.  That’s still an open question and the right to be
forgotten is a topic by itself, e.g., legal.  We will not address it in
the Guix project.



About #2, that’s a technical question.

By definition of a Content-Addressed system, the key associated with a
value is computed by a procedure depending only on the content itself.
Therefore, changing the content changes the key.

Git [1] is probably the tool that has popularized that.  Consider a
project using Git and you clone it.  Now, you have a complete copy of
many keys associated with many contents, and also many links between
the keys themselves.  For instance, the key of a ’Git commit’ object
depends on its content, which in turn includes the key of a ’Git tree’
object.

Now, if you rewrite any content, then its key changes.  As pointed out
above, this change might propagate to every key that depends on it.

The whole question then becomes one of authority.  Because I also have
another copy/clone with the initial set of keys and you now have
modified ones, how do we agree on which are the right ones?

Well, at the size [2] of the project in the linked posts, rewriting the
Git history is affordable.  Now, I am not convinced that the person
would try – or even think of – such a thing if this project had
hundreds of contributors and thousands of users.  That’s my opinion and
I agree it is not an argument. :-)

At the level of Guix, allowing a mutable history would make the
availability of binary substitutes unpredictable.

To be explicit, rewriting the Git history of Guix would break:

 + local Git repositories of Guix developers
 + regular Guix users and the trust mechanism
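
As a rough illustration of why regular users would be affected, here is
a toy model in Python of a client that only accepts an update when the
commit it currently has is an ancestor of the target commit.  This is
not the Guix implementation; the function names and the tiny commit
graphs are hypothetical and only show the general idea.

```python
def is_ancestor(parents: dict, ancestor: str, commit: str) -> bool:
    """Walk the parent links from `commit` looking for `ancestor`."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def accept_update(parents: dict, installed: str, target: str) -> bool:
    """Accept only targets that descend from the installed commit."""
    return is_ancestor(parents, installed, target)


# Original history: a <- b <- c <- d (each key maps to its parents).
original = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

# Rewritten history: same changes, but every key is new.
rewritten = {"a2": [], "b2": ["a2"], "c2": ["b2"], "d2": ["c2"]}

# A user who installed commit "c" can move to "d" ...
print(accept_update(original, "c", "d"))
# ... but no commit of the rewritten history descends from "c".
print(accept_update(rewritten, "c", "d2"))
```

In this model, a user who still has a commit from the original history
cannot reach any commit of the rewritten one, because none of the new
keys has the old key among its ancestors.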
 
Somehow, a Content-Addressed system is designed around immutable
content.  And if one know how to implement a Content-Addressed system
relying on mutable content, I would be very interested to know more
about it.


Cheers,
simon


1: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
2: https://github.com/rspeer/python-ftfy/graphs/contributors


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-18  9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier
@ 2024-03-18 11:47   ` MSavoritias
  2024-03-18 13:12     ` Simon Tournier
  2024-03-18 16:27   ` Kaelyn
  2024-03-18 19:38   ` Ian Eure
  2 siblings, 1 reply; 61+ messages in thread
From: MSavoritias @ 2024-03-18 11:47 UTC (permalink / raw)
  To: Simon Tournier, Ian Eure, guix-devel

On 3/18/24 11:28, Simon Tournier wrote:

> Hi,
>
> On sam., 16 mars 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote:
>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> About LLM, Software Heritage made a clear statement:
>
>      https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
>
> Quoting:
>
>          We feel that the question is no longer whether LLMs for code
>          should be built. They are already being built, independently of
>          what we do, and there is no turning back.  The real question is
>          how they should be built and whom they should benefit.
>
> Principles:
>
>          1. Knowledge derived from the Software Heritage archive must be
>          given back to humanity, rather than monopolized for private
>          gain. The resulting machine learning models must be made available
>          under a suitable open license, together with the documentation and
>          toolings needed to use them.
>
>          2. The initial training data extracted from the Software Heritage
>          archive must be fully and precisely identified by, for example,
>          publishing the corresponding SWHID identifiers (note that, in the
>          context of Software Heritage, public availability of the initial
>          training data is a given: anyone can obtain it from the
>          archive). This will enable use cases such as: studying biases
>          (fairness), verifying if a code of interest was present in the
>          training data (transparency), and providing appropriate attribution
>          when generated code bears resemblance to training data (credit),
>          among others.
>
>          3. Mechanisms should be established, where possible, for authors to
>          exclude their archived code from the training inputs before model
>          training begins.
>
> I hope it clarifies your concerns to some extent.
>
>
> Moreover, you wrote: « I want absolutely nothing to do with them. »
>
> Maybe there is a misunderstanding on your side about what “free
> software” and GPL means because once “free software”, you cannot prevent
> people to use “your” free software for any purposes you dislike.
>
> If you want to bound the use cases of the software you create, you need
> to explicitly specify that in the license.  And if you do, your software
> will not be considered as “free software”.
>
> That’s the double sword of “free software”. :-)

Simon,


1.

You seem to be misunderstanding the statement here that was said.

What you can do legally and what you can do socially are not always the 
same thing.

As advice for the future when somebody says a concern or wish they have, 
your first statement shouldn't be "but its legal" because that 
completely dismisses any constructive discussion that could be done.

And you seem to be talking about legality a lot here, so that's not a good look.


Yes, legally Ian probably can't get lawyers on you. But nobody is 
talking about legally here.

What is in question here is whether Software Heritage respects people 
enough to do the right thing and respect their wishes without getting 
lawyers/legal involved.


Besides with the way you are framing Free Software as not respecting any 
social rules then that makes Free Software not attractive which is the 
opposite of what we are trying to do here :)


2.

 > Somehow, a Content-Addressed system is designed around immutable 
content. And if one know how to implement a Content-Addressed system 
relying on mutable content, I would be very interested to know more 
about it.


Please refrain from doing such remarks. Nobody here suggested anything 
that you mention here and you effectively devalue the discussion by 
arguing like this and frame other people as stupid.


3.

It's not on the people who are not included to write the code. If Guix 
is to be an inclusive project, then Guix should do the work so that 
people feel included.

You may disagree with this sure, but shutting down the discussion 
because nobody wrote the code for you is very elitist of you.


4.

 > This language is not acceptable on Guix channel of communication.

Calling out transphobia it is very much accepted here actually :)

It's transphobic speech that is not accepted.


I welcome Software Heritage to make an announcement about this or some 
kind of official communication stating their stance.

Although I still wouldn't use them due to the LLMs and AI stuff that 
they are using. I hope that at some point they realize their mistake.


MSavoritias



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-18 11:47   ` MSavoritias
@ 2024-03-18 13:12     ` Simon Tournier
  2024-03-18 14:00       ` MSavoritias
  0 siblings, 1 reply; 61+ messages in thread
From: Simon Tournier @ 2024-03-18 13:12 UTC (permalink / raw)
  To: MSavoritias, Ian Eure, guix-devel

Hi MSavoritias,

On lun., 18 mars 2024 at 13:47, MSavoritias <email@msavoritias.me> wrote:

> 1.
>
> You seem to be misunderstanding the statement here that was said.
>
> What you can do legally and what you can do socially are not always the 
> same thing.

I do not see where I wrote something like that, but anyway.

A program is free software if the program's users have the four
essential freedoms: [1]

  0. The freedom to run the program as you wish, for any purpose.
  1. The freedom to study how the program works, and change it so it does
     your computing as you wish. Access to the source code is a precondition
     for this. 
  2. The freedom to redistribute copies so you can help others.
  3. The freedom to distribute copies of your modified versions to
     others. By doing this you can give the whole community a chance to
     benefit from your changes. Access to the source code is a precondition
     for this.

All is about the philosophy of “free software”.

1: https://www.gnu.org/philosophy/free-sw.en.html


> As advice for the future when somebody says a concern or wish they have, 
> your first statement shouldn't be "but its legal" because that 
> completely dismisses any constructive discussion that could be done.

Again, I am not arguing about “legal” something.  Instead, I am pointing
that this wish does not match the principles of “free software”.

If you accept that the software you create is “free software” then you
cannot complain if this “free software” is used in some contexts that
you consider unethical.

That’s the double sword of “free software”.

Do I consider LLMs as something unethical?  I think yes: most AI appears
to me unethical but that’s another story (rooting my arguments in
arguments about energy [2,3,4]).

2: https://social.sciences.re/@zimoun/112082437445032973
3: https://social.sciences.re/@zimoun/112039562095800532
4: https://social.sciences.re/@zimoun/112038609631116527


> What is in question here is whether Software Heritage respects people 
> enough to do the right thing and respect their wishes without getting 
> lawyers/legal involved.

Again, this is an incorrect frame, IMHO.  Software Heritage (SWH) do the
things you granted them to do.  SWH respects the “ethical” definition of
“free software”.

Again, do I think that feeding an LLM [5] after publishing a statement
on LLMs for code [6] is a good move?  I do not know…  Does it break my
ethical values?  Maybe…  Can I complain about my contributions to “free
software” being reused in a way that I might consider unethical?  No.

5: https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
6: https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/


> Besides with the way you are framing Free Software as not respecting any 
> social rules then that makes Free Software not attractive which is the 
> opposite of what we are trying to do here :)

I do not know what are the “social rules” of “free software”.  At best,
I understand the social rules of a community working on free software.

And this community is far to be an homogeneous whole with clear social
rules.  These social rules vary and the only shared denominator is the
“free software” principles defined by four freedoms.

The only question might be: by allowing ingested source code to be used
to train LLMs, is Software Heritage aligned with the values that the
Guix community promotes?

To be honest, I cannot answer that question in a hurry.


> 2.
>
>  > Somehow, a Content-Addressed system is designed around immutable 
> > content. And if one know how to implement a Content-Addressed system 
> > relying on mutable content, I would be very interested to know more 
> > about it.
>
> Please refrain from doing such remarks. Nobody here suggested anything 
> that you mention here and you effectively devalue the discussion by 
> arguing like this and frame other people as stupid.

I will not refrain from saying: talk is cheap!

A position on “rewriting history” cannot be a matter of opinion alone;
it needs to be rooted in how things technically work and in what a
Content-Addressed system means.


> 3.
>
> You may disagree with this sure, but shutting down the discussion 
> because nobody wrote the code for you is very elitist of you.

Which discussion are we speaking about?  I am lost.  About LLM or
about “rewrite history”?

About LLM, see point #1.

About “rewrite history”, see point #2.


> 4.
>
>  > This language is not acceptable on Guix channel of communication.
>
> Calling out transphobia it is very much accepted here actually :)

No it is not.  Because it is a bold conclusion.

I am asking that the Guix project rewrite its history right now:
changing my identity ’zimoun’ to my identity ’Simon Tournier’.  Since
the Guix project will take the time to check, I will then claim: the
Guix project is French-phobic!

I ask you again to stop such language.  I respect your opinion but
name-calling is not welcome on Guix channels of communication.

Cheers,
simon


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-18 13:12     ` Simon Tournier
@ 2024-03-18 14:00       ` MSavoritias
  2024-03-18 14:32         ` Simon Tournier
  0 siblings, 1 reply; 61+ messages in thread
From: MSavoritias @ 2024-03-18 14:00 UTC (permalink / raw)
  To: Simon Tournier, guix-devel


On 3/18/24 15:12, Simon Tournier wrote:
> Hi MSavoritias,
>
> On lun., 18 mars 2024 at 13:47, MSavoritias <email@msavoritias.me> wrote:
>
>
>> As advice for the future when somebody says a concern or wish they have,
>> your first statement shouldn't be "but its legal" because that
>> completely dismisses any constructive discussion that could be done.
> Again, I am not arguing about “legal” something.  Instead, I am pointing
> that this wish does not match the principles of “free software”.
>
> If you accept that the software you create is “free software” then you
> cannot complain if this “free software” is used in some contexts that
> you consider unethical.
>
> That’s the double sword of “free software”.
>
> Do I consider LLMs as something unethical?  I think yes: most AI appears
> to me unethical but that’s another story (rooting my arguments in
> arguments about energy [2,3,4]).
>
> 2: https://social.sciences.re/@zimoun/112082437445032973
> 3: https://social.sciences.re/@zimoun/112039562095800532
> 4: https://social.sciences.re/@zimoun/112038609631116527
>
Yes you are. The argument that you can do what you want with Free 
Software is based around a licence which is a legal construct of states.

I think you have misunderstood that here we are talking about the social 
rules of being a decent group of human beings and respect somebody 
else's wishes.

>> What is in question here is whether Software Heritage respects people
>> enough to do the right thing and respect their wishes without getting
>> lawyers/legal involved.
> Again, this is an incorrect frame, IMHO.  Software Heritage (SWH) do the
> things you granted them to do.  SWH respects the “ethical” definition of
> “free software”.

You are bringing the legal argument again. The argument that you can do 
what you want with Free Software is based around a licence which is a 
legal construct of states.

I think you have misunderstood that here we are talking about the social 
rules of being a decent group of human beings and respect somebody 
else's wishes.

In this case somebody asks for something so if SWH is a good member of 
our community they should do that. Otherwise they are not a good member 
of our community.

>
>> Besides with the way you are framing Free Software as not respecting any
>> social rules then that makes Free Software not attractive which is the
>> opposite of what we are trying to do here :)
> I do not know what are the “social rules” of “free software”.  At best,
> I understand the social rules of a community working on free software.
>
> And this community is far to be an homogeneous whole with clear social
> rules.  These social rules vary and the only shared denominator is the
> “free software” principles defined by four freedoms.

Guix has a CoC that's the common thing we have here. For social things 
that is. Plus some cultural things of course.


MSavoritias




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-17  9:39   ` Lars-Dominik Braun
  2024-03-17  9:47     ` MSavoritias
@ 2024-03-18 14:04     ` pinoaffe
  1 sibling, 0 replies; 61+ messages in thread
From: pinoaffe @ 2024-03-18 14:04 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: Ryan Prior, Ian Eure, guix-devel


Lars-Dominik Braun <lars@6xq.net> writes:
>> I have heard folks in the Guix maintenance sphere claim that we
>> never rewrite git history in Guix, as a matter of policy. I believe we
>> should revisit that policy (is it actually written anywhere?) with an
>> eye towards possible exceptions, and develop a mechanism for securely
>> maintaining continuity of Guix installations after history has been
>> rewritten so that we maintain this as a technical possibility in the
>> future, even if we should choose to use it sparingly.
>
> the fallout of rewriting Guix’ git history would be devastating. It
> would break every single Guix installation, because
>
> a) `guix pull` authenticates commits and we might lose our trust anchor
> if we rewrite history earlier than the introduction of this feature,
> b) `guix pull` outright rejects changes to the commit history to prevent
> downgrade attacks.
>
> Additionally it would break every single existing usage of the
> time machine and thereby completely defeat the goal of providing
> reproducible software environments since the commit hash is used to
> identify the point in time to jump to.
>
> I doubt developing “mechanisms” – whatever they look like – would
> be worth the effort. Our contributors matter, but so do our users. Never
> ever rewriting our git history is a tradeoff we should make for our users.

There may come a time where we don't really have another option but to
rewrite (part of) history (e.g., if someone vandalizes the repository
using incriminating/illegal files) - I hope that such vandalism would be
caught quickly so that most guix installations would not be infected,
but it may be a good idea to plan what to do in the unfortunate event that
it is necessary to rewrite guix history



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-18 14:00       ` MSavoritias
@ 2024-03-18 14:32         ` Simon Tournier
  0 siblings, 0 replies; 61+ messages in thread
From: Simon Tournier @ 2024-03-18 14:32 UTC (permalink / raw)
  To: MSavoritias, guix-devel

Hi MSavoritias,

On lun., 18 mars 2024 at 16:00, MSavoritias <email@msavoritias.me> wrote:

> I think you have misunderstood that here we are talking about

> I think you have misunderstood that here we are talking about

What if? Maybe it’s you.  Maybe you, “you have misunderstood that here
we are talking about […]”.

For what my opinion is worth here, I would prefer that you not make
assumptions about what I might have understood.  Similarly, I am not
assuming anything about your understanding of the various topics at hand.

That’s my last message in this thread.

Cheers,
simon


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Concerns/questions around Software Heritage Archive
  2024-03-18  9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier
  2024-03-18 11:47   ` MSavoritias
@ 2024-03-18 16:27   ` Kaelyn
  2024-03-18 17:39     ` Daniel Littlewood
  2024-03-18 20:38     ` Olivier Dion
  2024-03-18 19:38   ` Ian Eure
  2 siblings, 2 replies; 61+ messages in thread
From: Kaelyn @ 2024-03-18 16:27 UTC (permalink / raw)
  To: guix-devel

On Monday, March 18th, 2024 at 2:28 AM, Simon Tournier <zimon.toutoune@gmail.com> wrote:

> 
> Hi,
> 
> On sam., 16 mars 2024 at 08:52, Ian Eure ian@retrospec.tv wrote:
> 
> > They appear to be using the archive to build LLMs:
> > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> 
> 
> About LLM, Software Heritage made a clear statement:
> 
> https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
> 
> Quoting:
> 
> We feel that the question is no longer whether LLMs for code
> should be built. They are already being built, independently of
> what we do, and there is no turning back. The real question is
> how they should be built and whom they should benefit.
> 
> Principles:
> 
> 1. Knowledge derived from the Software Heritage archive must be
> given back to humanity, rather than monopolized for private
> gain. The resulting machine learning models must be made available
> under a suitable open license, together with the documentation and
> toolings needed to use them.
> 
> 2. The initial training data extracted from the Software Heritage
> archive must be fully and precisely identified by, for example,
> publishing the corresponding SWHID identifiers (note that, in the
> context of Software Heritage, public availability of the initial
> training data is a given: anyone can obtain it from the
> archive). This will enable use cases such as: studying biases
> (fairness), verifying if a code of interest was present in the
> training data (transparency), and providing appropriate attribution
> when generated code bears resemblance to training data (credit),
> among others.
> 
> 3. Mechanisms should be established, where possible, for authors to
> exclude their archived code from the training inputs before model
> training begins.
> 
> I hope it clarifies your concerns to some extent.
> 
> 
> Moreover, you wrote: « I want absolutely nothing to do with them. »
> 
> Maybe there is a misunderstanding on your side about what “free
> software” and GPL means because once “free software”, you cannot prevent
> people to use “your” free software for any purposes you dislike.
> 
> If you want to bound the use cases of the software you create, you need
> to explicitly specify that in the license. And if you do, your software
> will not be considered as “free software”.
> 
> That’s the double sword of “free software”. :-)

Hi,

I want to stress that I am not a lawyer, but my (possibly outdated) understanding of what machine learning models can and cannot do with regards to their training data, and a reading of parts of the GPL 2 and 3, suggest that at best the SWH's LLM is in a legal grey area and at worst directly violates the license of GPL code that it ingests for training. As such, I don't think it is accurate to say "you cannot prevent people to use “your” free software for any purposes you dislike" in response to concerns about automatic inclusion of free software into LLM training sets. Specifically, my understanding (as of a few years ago) is that LLMs have difficulty tracing and attributing various aspects of their training to specific inputs, which seems to be in violation of e.g. Sections 5 and 6 of the GPL. Specific quotes from those sections https://www.gnu.org/licenses/gpl-3.0.html:

From section 5:
> You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:
> 
>     a) The work must carry prominent notices stating that you modified it, and giving a relevant date.
>     b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to “keep intact all notices”.
>     c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.
>     d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so.

and from Section 6:
> You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:
> 
>     a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange.
>     b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge.
>     c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b.
>     d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.
>     e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d.

And from the GPL 2 text at https://www.gnu.org/licenses/old-licenses/gpl-2.0.html:

> 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
> 
>     a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. 
>     b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. 
>     c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) 
> 
> These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
> 
> Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.
> 
> In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
> 
> 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
> 
>     a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, 
>     b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, 
>     c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) 
> 
> The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
> 
> If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.
> 
> 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 

Again, I want to emphasize IANAL. As a layman, my understanding of ML model training is that it cannot maintain enough of a trace between GPLed input code and its (modified) use in the output to maintain the licensing and distribution requirements from either the GPL 3 sections above or the GPL 2 sections 2 and 3. I also believe that section 4 of the GPL 2 directly applies to these LLM code models.

There are also potential licensing issues in mixing (potentially) incompatible licenses in the training data sets, such as GPL and CDDL code, with no way to distinguish or separate the (arguably) modified sources from each other.

Just my $0.02 USD on the LLM side of the matter, as much of the discussion seems to be around the cost vs benefit of rewriting the git history for updating personally identifying information.

Cheers,
Kaelyn

> 
> Cheers,
> simon



* Re: Concerns/questions around Software Heritage Archive
  2024-03-18 16:27   ` Kaelyn
@ 2024-03-18 17:39     ` Daniel Littlewood
  2024-03-18 20:38     ` Olivier Dion
  1 sibling, 0 replies; 61+ messages in thread
From: Daniel Littlewood @ 2024-03-18 17:39 UTC (permalink / raw)
  To: Kaelyn; +Cc: guix-devel

Hi Kaelyn,

The legal question is unsettled, and there is ongoing litigation by
(at least) Matthew Butterick in the US, since at least 2022. The
reasonable positions I'm aware of are:

1. An LLM (or, more precisely, the set of weights that define it) is
not a derivative work of its training data, for the purposes of
copyright, and thus the license is irrelevant.
2. Producing an LLM from training data is a transformative fair use,
and thus the license is irrelevant.
3. Neither 1 nor 2 holds, and LLMs constitute copyright infringement
on a profound scale (of both copyrighted and copylefted works).

The FSF and CC have both commissioned white papers on the impact of
such considerations for Free works. I don't recall seeing anything
particularly insightful in them. Probably a waste of time to discuss
it here.

Best wishes,
Dan



* Re: Concerns/questions around Software Heritage Archive
  2024-03-18  9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier
  2024-03-18 11:47   ` MSavoritias
  2024-03-18 16:27   ` Kaelyn
@ 2024-03-18 19:38   ` Ian Eure
  2024-03-18 22:02     ` Ludovic Courtès
  2024-03-19 10:58     ` Simon Tournier
  2 siblings, 2 replies; 61+ messages in thread
From: Ian Eure @ 2024-03-18 19:38 UTC (permalink / raw)
  To: Simon Tournier; +Cc: guix-devel


Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On sam., 16 mars 2024 at 08:52, Ian Eure <ian@retrospec.tv> 
> wrote:
>
>> They appear to be using the archive to build LLMs: 
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> About LLM, Software Heritage made a clear statement:
>
>     https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
>
> Quoting:
>
>         We feel that the question is no longer whether LLMs for 
>         code
>         should be built. They are already being built, 
>         independently of
>         what we do, and there is no turning back.  The real 
>         question is
>         how they should be built and whom they should benefit.
>
> Principles:
>
>         1. Knowledge derived from the Software Heritage archive 
>         must be
>         given back to humanity, rather than monopolized for 
>         private
>         gain. The resulting machine learning models must be made 
>         available
>         under a suitable open license, together with the 
>         documentation and
>         toolings needed to use them.
>
>         2. The initial training data extracted from the Software 
>         Heritage
>         archive must be fully and precisely identified by, for 
>         example,
>         publishing the corresponding SWHID identifiers (note 
>         that, in the
>         context of Software Heritage, public availability of the 
>         initial
>         training data is a given: anyone can obtain it from the
>         archive). This will enable use cases such as: studying 
>         biases
>         (fairness), verifying if a code of interest was present 
>         in the
>         training data (transparency), and providing appropriate 
>         attribution
>         when generated code bears resemblance to training data 
>         (credit),
>         among others.
>
>         3. Mechanisms should be established, where possible, for 
>         authors to
>         exclude their archived code from the training inputs 
>         before model
>         training begins.
>
> I hope it clarifies your concerns to some extent.
>

It doesn’t clarify them, but it does illustrate them.

HuggingFace and the StarCoder2 model are in violation of principle 
2.  By their own admission, they are including code without clear 
licensing[1]:

    The main difference between the Stack v2 and the Stack v1 is 
    that we
    include both permissively licensed and unlicensed files.

HuggingFace’s StarChat2 Playground[2] also violates this 
principle, as it outputs code without any license or provenance 
information; I know, because I tried it.  While their own terms of 
use for StarCoder2 state:

    Any use of all or part of the code gathered in The Stack v2 
    must abide by
    the terms of the original licenses...

...their own playground makes this impossible.

HuggingFace is also in violation of the third principle, because 
they haven’t established a functioning opt-out model[3].  Opting 
out requires using non-free software; requests have been sitting 
for nearly a year with no action or response; and out of every 
request submitted, only a single one has *ever* been honored.

They appear to be violating free software licenses on a large scale. 
They are in violation of SWH’s own positions.


> Moreover, you wrote: « I want absolutely nothing to do with 
> them. »
>
> Maybe there is a misunderstanding on your side about what “free
> software” and GPL means because once “free software”, you cannot 
> prevent
> people to use “your” free software for any purposes you dislike.
>
> If you want to bound the use cases of the software you create, 
> you need
> to explicitly specify that in the license.  And if you do, your 
> software
> will not be considered as “free software”.
>
> That’s the double sword of “free software”. :-)
>

I am crystal clear on the meaning of free software.  I wish to 
remove it from these models *in order to* keep it free.

Thanks,

  — Ian

[1]: https://arxiv.org/html/2402.19173v1
[2]: 
https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
[3]: https://huggingface.co/datasets/bigcode/the-stack-v2
[4]: https://github.com/bigcode-project/opt-out-v2/issues



* Re: Concerns/questions around Software Heritage Archive
  2024-03-18 16:27   ` Kaelyn
  2024-03-18 17:39     ` Daniel Littlewood
@ 2024-03-18 20:38     ` Olivier Dion
  1 sibling, 0 replies; 61+ messages in thread
From: Olivier Dion @ 2024-03-18 20:38 UTC (permalink / raw)
  To: Kaelyn, guix-devel

On Mon, 18 Mar 2024, Kaelyn <kaelyn.alexi@protonmail.com> wrote:
> On Monday, March 18th, 2024 at 2:28 AM, Simon Tournier <zimon.toutoune@gmail.com> wrote:

[...]

>> That’s the double sword of “free software”. :-)
>
> Hi,
>
> I want to stress that I am not a lawyer, but my (possibly outdated)
> understanding of what machine learning models can and cannot do with
> regards to their training data, and a reading of parts of the GPL 2
> and 3, suggest that at best the SWH's LLM is in a legal grey area and
> at worst directly violates the license of GPL code that it ingests for
> training. As such, I don't think it is accurate to say "you cannot
> prevent people to use “your” free software for any purposes you
> dislike" in response to concerns about automatic inclusion of free
> software into LLM training sets. Specifically, my understanding (as of
> a few years ago) is that LLMs have difficulty tracing and attributing
> various aspects of their training to specific inputs, which seems to be
> in violation of e.g. Sections 5 and 6 of the GPL. Specific quotes
> from those sections https://www.gnu.org/licenses/gpl-3.0.html:

I think that the larger point here is that you do not get to choose who
uses your software and for what purpose.  That is the double-edged sword
of free software.

Putting aside LLMs for a moment, what if some package in Guix is used for
military purposes?  Will this software be removed from Guix because one
of its users uses it in some unethical way, even though it is also used
in an ethical way by others?  Will we penalize users for the sake of
the moral high ground?

This raises the question: what is considered ethical, and when do ethics
become political dogma?

[...]

-- 
Olivier Dion
oldiob.ca



* Re: Concerns/questions around Software Heritage Archive
  2024-03-18 19:38   ` Ian Eure
@ 2024-03-18 22:02     ` Ludovic Courtès
  2024-03-19 10:58     ` Simon Tournier
  1 sibling, 0 replies; 61+ messages in thread
From: Ludovic Courtès @ 2024-03-18 22:02 UTC (permalink / raw)
  To: Ian Eure; +Cc: Simon Tournier, guix-devel

Hello,

Ian Eure <ian@retrospec.tv> skribis:

> HuggingFace and the StarCoder2 model is in violation of principle 2.
> By their own admission, they are including code without clear
> licensing[1]:

[...]

> HuggingFace is also in violation of the third principle, because they
> haven’t established a functioning opt-out model[3].  Opting out
> requires using non-free software; requests have been sitting for
> nearly a year with no action or response; and out of every request
> submitted, only a single one has *ever* been honored.
>
> They appear to be violating free software licenses on large
> scale. They are in violation of SWH’s own positions.

You may be right, but again, I think we should all wait for SWH folks to
weigh in.

Many people working there are long-time free software activists; I think
we can trust them to take our concerns into consideration, but they may
also need more time to reply thoughtfully.

Besides, we should probably focus the discussion on what it means for Guix.

Ludo’.



* Re: Concerns/questions around Software Heritage Archive
  2024-03-18 19:38   ` Ian Eure
  2024-03-18 22:02     ` Ludovic Courtès
@ 2024-03-19 10:58     ` Simon Tournier
  2024-03-19 15:37       ` Ian Eure
  1 sibling, 1 reply; 61+ messages in thread
From: Simon Tournier @ 2024-03-19 10:58 UTC (permalink / raw)
  To: Ian Eure; +Cc: guix-devel

Hi,

On lun., 18 mars 2024 at 12:38, Ian Eure <ian@retrospec.tv> wrote:

> They appear to be violating free software licenses on large scale. 
> They are in violation of SWH’s own positions.

[...]

> [1]: https://arxiv.org/html/2402.19173v1
> [2]: 
> https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
> [3]: https://huggingface.co/datasets/bigcode/the-stack-v2
> [4]: https://github.com/bigcode-project/opt-out-v2/issues

Please note that the Software Heritage folks are not co-authors of all that,
unless I misread.  Do not get me wrong: this is not an attempt to dodge the
issue but a request to wait for feedback from SWH.

As Ludo said, SWH folks are, by the way, also long-time Free Software
activists.  For the record, the quality of the 10 Years of Guix [1] videos
is the result of tireless work (for free!) by a Debian video team member
(also working for SWH), and one of SWH's co-founders is a former Debian
Project Leader.  Let us give them the benefit of the doubt while waiting.

1: https://10years.guix.gnu.org

Cheers,
simon

PS: Thanks for the detailed explanations.  I will provide my reading
later, once some of the concerns have been separated out.



* Re: Concerns/questions around Software Heritage Archive
  2024-03-19 10:58     ` Simon Tournier
@ 2024-03-19 15:37       ` Ian Eure
  0 siblings, 0 replies; 61+ messages in thread
From: Ian Eure @ 2024-03-19 15:37 UTC (permalink / raw)
  To: Simon Tournier; +Cc: guix-devel


Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On lun., 18 mars 2024 at 12:38, Ian Eure <ian@retrospec.tv> 
> wrote:
>
>> They appear to be violating free software licenses on large 
>> scale. 
>> They are in violation of SWH’s own positions.
>
> [...]
>
>> [1]: https://arxiv.org/html/2402.19173v1
>> [2]: 
>> https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
>> [3]: https://huggingface.co/datasets/bigcode/the-stack-v2
>> [4]: https://github.com/bigcode-project/opt-out-v2/issues
>
> Please note that Software Heritage folks are not co-author of 
> all that;
> or I misread.  Do not take me wrong, this is not an attempt to 
> escape
> but a query for waiting the feedback of SWH.
>

Shit rolls downhill.  It’s the least surprising thing in the world 
to find that an "AI" company is violating licenses, because the 
entire technology is based on infringement at a massive scale. 
SWH’s partnership with, and promotion of, both the company and its 
license-violating model, in violation of their *own stated 
principles*, raises very legitimate questions.

There are multiple overlapping concerns here: personal, 
organizational, legal, ethical, and technical.

From a personal, legal standpoint, HuggingFace is almost certainly 
in violation of my code’s licenses.  I will, therefore, work to 
remove my code from their models.  From a personal, ethical 
standpoint, I believe that SWH has proven themselves untrustworthy 
by enabling *and promoting* this infringement in violation of 
their own stated policies, and will work to remove my code from 
their archive.  Personally, I cannot extend them the benefit of 
the doubt on this.  They blew it.

From an organizational ethical standpoint, Guix is IMO on the 
right track by waiting on SWH (and perhaps pressuring them to fix 
things).  From an organizational, technical perspective, I would 
like to see concrete measures to support my (and hundreds of 
others’) personal, ethical desires to exclude software from SWH, 
and by extension, HuggingFace’s models.


> As Ludo said, SWH folks are, by the way, also long time Free 
> Software
> activists.
>

In my view, this is not to their credit.  I’d expect people 
familiar with Free Software to be *more* sensitive to licensing 
concerns, thus less likely to partner with a company likely to 
violate them.


> PS: Thanks for the detailed explanations.  I will provide my 
> reading
> later, after some concerns will be separated, eventually.

You’re very welcome.

Thanks,

  — Ian



* contributor uuid (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-17 11:53       ` paul
  2024-03-17 11:57         ` MSavoritias
  2024-03-17 12:51         ` Tomas Volf
@ 2024-03-20 15:25         ` bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6
  2 siblings, 0 replies; 61+ messages in thread
From: bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6 @ 2024-03-20 15:25 UTC (permalink / raw)
  To: guix-devel


paul <goodoldpaul@autistici.org> writes:

[...]

> If we'd really need to identify contributors, and obviously Guix 
> doesn't, we could use an UUID/machine readable identifier which can then 
> be mapped to a displayed name. I believe git can already be configured 
> to do so.

every contributor wishing to do so can already choose to use the
preferred uuid/email metadata they wish and ask some person with commit
access to add a uuid/display-name mapping via git .mailmap

unfortunately this does not resolve the problem with rewriting history
with git, because Guix artifacts also contain source code that usually
contains information about the author, including names that potentially
could become "deadnames" in the future
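
for illustration, here is a minimal sketch of how such a mapping behaves.
it is a simplified model written for this message, not git's actual
implementation: the helper names (`parse_mailmap`, `display_identity`)
and the example identities are hypothetical, and only the common
.mailmap line forms are handled.  the point it shows is that the
mapping changes what is *displayed*, while the stored commit data is
left untouched

```python
import re

# One "Name <email>" chunk; the name part may be empty (e.g. "<a@b>").
_IDENT = re.compile(r"\s*([^<>]*?)\s*<([^<>]+)>")


def parse_mailmap(text):
    """Parse a simplified .mailmap.

    Returns a dict mapping (commit_name_or_None, commit_email) to
    (proper_name_or_None, proper_email_or_None).
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        idents = _IDENT.findall(line)
        if len(idents) == 1:
            # Form: "Proper Name <commit@email>"
            name, email = idents[0]
            mapping[(None, email.lower())] = (name or None, None)
        elif len(idents) == 2:
            # Forms: "[Proper Name] <proper@email> [Commit Name] <commit@email>"
            (pname, pemail), (cname, cemail) = idents
            mapping[(cname or None, cemail.lower())] = (pname or None, pemail)
    return mapping


def display_identity(mapping, name, email):
    """Return the (name, email) to display for a stored commit identity."""
    key_email = email.lower()
    # Prefer an exact (name, email) entry, then fall back to email-only.
    for key in ((name, key_email), (None, key_email)):
        if key in mapping:
            pname, pemail = mapping[key]
            return (pname or name, pemail or email)
    return (name, email)


# Hypothetical example: the stored commit keeps the old identity.
MAILMAP = "New Name <new@example.org> Old Name <old@example.org>\n"
stored_commit = {"author_name": "Old Name", "author_email": "old@example.org"}

shown = display_identity(
    parse_mailmap(MAILMAP),
    stored_commit["author_name"],
    stored_commit["author_email"],
)
```

in this sketch `shown` holds the new identity, while `stored_commit`
still carries the old one, which is exactly the limitation described
above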

happy hacking!

--
bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6



* the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-17 17:57 ` Ludovic Courtès
@ 2024-03-20 17:22   ` Giovanni Biscuolo
  2024-03-21  6:12     ` MSavoritias
  0 siblings, 1 reply; 61+ messages in thread
From: Giovanni Biscuolo @ 2024-03-20 17:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel


Hello Ludovic and Guix devel community!

Disclaimer: I've still not read all the relevant threads [3] [4], so
please forgive me if I repeat some information already provided.

What rights are we talking about?

As a *free software* user do I have the right to redistribute /old/
copies of the source code and documentation I got in the past from the
copyright holder, in any form (e.g. print)?... or to use old sources or
documentation to develop derived work, with _attribution_, without
asking for consent from the original authors and/or contact the original
authors to ask them what is their current name?

If yes, I would like to exercise all my rights without being harassed.

Also, SWH and other organizations (re)distributing free software have
their rights and should exercise them without being harassed.

Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:

[1] https://cohost.org/arborelia/post/4968198-the-software-heritag

[2] https://cohost.org/arborelia/post/5052044-the-software-heritag

> That’s another concern, with append-only storage in general, starting
> with Git.  We should look for solutions that work for both contributors
> who change names and for users.  This has happened several times in Guix
> and what people did was search/replace their name and adjust
> ‘.mailmap’.

This is a good solution but unfortunately this is not what the author of
the blog posts above [1] [2] and some people in this and other threads
[3] [4] are asking SWH - and Guix and potentially all other people
distributing copies of copyrighted works (e.g. documentation) - to do.

They are asking to "rewrite history" [1] (of git... why not of other
archives?):

--8<---------------cut here---------------start------------->8---

I already fixed my name in my code. I updated the README and the
copyright notice, and I ran git-filter-repo to rewrite the git history
so it had always said my correct name, including in commits. This is a
thing you can do.

--8<---------------cut here---------------end--------------->8---

The author explicitly invokes the "right to rectification" (of the
GDPR) [2]:

--8<---------------cut here---------------start------------->8---

I give zero shits about the integrity of their data structures. I had
already sent them a second email invoking the Right to Rectification,
which it seemed like they ignored again, so it was time to get more
formal.

[...] Pursuant to Article 21.1 of the General Data Protection Regulation
(GDPR), I object to the processing of my personal data by your
organization, the Software Heritage archive.

[...] Accordingly, please:

* delete my data from your files and notify the organizations to which
 you may have disclosed it of my request (Articles 17.1.c and 19 of
 the GDPR);

* if you are legally required to do so, tell me the retention period
 for my data in your archive databases;

* inform me of these points as soon as possible, and at the latest
 within one month of receiving this letter (Article 12.3 of the GDPR).

--8<---------------cut here---------------end--------------->8---

People asking to rectify information /they/ _published_ on their own are
obviously misinterpreting the relevant section of the GDPR (more on this
later)... and in fact, the SWH DPO reply is [2]:

--8<---------------cut here---------------start------------->8---

Unfortunately, the deletion or modification of the software repositories
you requested cannot be performed, for several reasons:

* On the one hand, these developments involve several authors and are
 made available under open source licenses, which explicitly allow
 copying and redistribution

* On the other hand, the mission of Software Heritage archive is to
 guarantee the availability of all versions of all publicly available
 source codes, and to ensure the integrity of these codes

We understand the concern about the display of outdated identities, and
for this reason a mechanism has been put in place to display a preferred
identity across all the Software Heritage archive.

--8<---------------cut here---------------end--------------->8---

But the author is still not satisfied with the solution proposed by SWH
(and used by Guix for its contributors):

--8<---------------cut here---------------start------------->8---

* I was not asking them to develop such a mechanism. I don't just want
 them to cosmetically change what they display, I want them to change
 the data. I can't trust the organization that contains the transphobe
 who had written their previous content policy to hold on to a
 substitution rule involving my deadname forever.

--8<---------------cut here---------------end--------------->8---

«I want them to change the data», that is: rewrite history (of /all/ the
copies of the repository archived by SWH, **fork** included?)

The CNIL (the French data regulator) has been involved, but the author
does not trust the CNIL:

--8<---------------cut here---------------start------------->8---

The explanation I can come up with is that CNIL and Inria are friends,
and CNIL will never take action against Inria.

--8<---------------cut here---------------end--------------->8---

Last but NOT least: what is this "right to rectification"?
...simple:

--8<---------------cut here---------------start------------->8---

Art. 16 GDPR Right to rectification

1 The data subject shall have the right to obtain from the controller
without undue delay the rectification of inaccurate personal data
concerning him or her. 2 Taking into account the purposes of the
processing, the data subject shall have the right to have incomplete
personal data completed, including by means of providing a supplementary
statement.

--8<---------------cut here---------------end--------------->8---
(https://gdpr-info.eu/art-16-gdpr/)

Simple... really?!?

First question: is the "deadname" of the author "inaccurate personal
data concerning him or her", or is it "just" the /accurate/ name the
person had before he or she changed it?

...but the most interesting part is the "suitable recital" n. 65:

--8<---------------cut here---------------start------------->8---

1 A data subject should have the right to have personal data concerning
him or her rectified and a ‘right to be forgotten’ where the retention
of such data infringes this Regulation or Union or Member State law to
which the controller is subject.

[...]

5 However, the further retention of the personal data should be lawful
where it is necessary, for exercising the right of freedom of expression
and information, for compliance with a legal obligation, for the
performance of a task carried out in the public interest or in the
exercise of official authority vested in the controller, on the grounds
of public interest in the area of public health, for archiving purposes
in the public interest, scientific or historical research purposes or
statistical purposes, or for the establishment, exercise or defence of
legal claims.

--8<---------------cut here---------------end--------------->8---
(https://gdpr-info.eu/recitals/no-65/)

Is SWH (and Guix, and... *me*) exercising its rights of /archiving/ and
/scientific or (and!) historical research/?  I say yes.

Last question: do SWH (and Guix, and *me*) have the right to archive and
redistribute free software for historical purposes?

But also: is the retention of the "deadname" even necessary to exercise
or defend legal claims about _copyright_ issues?

And also: do I have the right to retain the integrity of data structures I
obtained from copyright holders, or do I have to throw it away if one of the
copyright holders asks me to retroactively rewrite all occurrences of his
or her name under his or her asserted "right to rectification"?

All in all: what rights are we talking about, please?!?

Loving, Giovanni



[3]
https://yhetil.org/guix/iytrYuvr9BcPdWG17PDP5SXyjrZzwBGx1sbh0BVcDZ8PAifSIMdPXPbuhhDu-2woPlaWmEWnSt09h4OravmRRBrMB5uDlXYtKtI0egEQX_k=@lendvai.name/#r

[4]
https://yhetil.org/guix/86d01304cc8957a2508e1d1732421b5e0f9ceeb5.camel@planete-kraus.eu/



P.S.: I am DPO and copyright advisor at my tiny company, but IANAL :-D

-- 
Giovanni Biscuolo

Xelera IT Infrastructures



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-20 17:22   ` the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) Giovanni Biscuolo
@ 2024-03-21  6:12     ` MSavoritias
  2024-03-21 10:49       ` Attila Lendvai
                         ` (3 more replies)
  0 siblings, 4 replies; 61+ messages in thread
From: MSavoritias @ 2024-03-21  6:12 UTC (permalink / raw)
  To: Giovanni Biscuolo; +Cc: guix-devel

On 3/20/24 19:22, Giovanni Biscuolo wrote:

> Hello Ludovic and Guix devel community!
>
> Disclaimer: I've still not read all the relevant threads [3] [4], so
> please forgive me if I repeat some information already provided.
>
> What rights are we talking about?

You have the same misconception as some other people in this thread.

We are talking about social rules that we have here in the Guix 
community not legal/state rules.


Specifically, the social rule that we support trans people and we want 
to include them. Any person, really, who wants to change their name at 
some point for some reason.

To that end we listen to their concerns/wishes and we accommodate them.

>
> As a *free software* user do I have the right to redistribute /old/
> copies of the source code and documentation I got in the past from the
> copyright holder, in any form (e.g. print)?... or to use old sources or
> documentation to develop derived work, with _attribution_, without
> asking for consent from the original authors and/or contact the original
> authors to ask them what is their current name?

Copyright is not consent. When we are talking about consent we are 
talking about it in social rules.

See also 
https://www.consentfultech.io/wp-content/uploads/2019/10/Building-Consentful-Tech.pdf 
as a nice paper for consent in tech.

> If yes, I would like to exercise all my rights without being harassed.

Again this has nothing to do with rights granted by states. This is 
about including people and making them feel safe and respected.


MSavoritias

>
> Also, SHW and other organizations (re)distributing free software have
> their rights and should excercise them without being harassed.
>
> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21  6:12     ` MSavoritias
@ 2024-03-21 10:49       ` Attila Lendvai
  2024-03-21 11:51       ` pelzflorian (Florian Pelz)
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Attila Lendvai @ 2024-03-21 10:49 UTC (permalink / raw)
  To: MSavoritias; +Cc: Giovanni Biscuolo, guix-devel

> We are talking about social rules that we have here in the Guix
> community not legal/state rules.


ethics, i.e. the discussion of rights, is a branch of philosophy.

ideally, it should inform the people who are writing and enforcing state laws, but these days -- sadly -- it has precious little to do with state laws. and i think you're the one here who conflates the two.


> Specifically the social rules that we support trans people and we want
> to include them. Any person really that want to change their name at
> some point for some reason.
>
> To that end we listen to their concerns/wishes and we accommodate them.


i've asked you this before, and i'll keep asking it: sure, accommodate, but to what extent? what is a reasonable cost i can incur on others? (see the discussion of negative vs. positive rights in this context)

what if i declare that i only feel accommodated here if everyone attaches the local weather forecast to each mail they send to guix-devel?

the limit of your demands begins where it starts to constrain the freedom of others. considering this is an essential part of respectful behavior towards others.

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“I am not what happened to me, I am what I choose to become.”
	— Carl Jung (1875–1961)




* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21  6:12     ` MSavoritias
  2024-03-21 10:49       ` Attila Lendvai
@ 2024-03-21 11:51       ` pelzflorian (Florian Pelz)
  2024-03-21 11:52       ` pinoaffe
  2024-03-21 15:23       ` Hartmut Goebel
  3 siblings, 0 replies; 61+ messages in thread
From: pelzflorian (Florian Pelz) @ 2024-03-21 11:51 UTC (permalink / raw)
  To: MSavoritias; +Cc: Giovanni Biscuolo, guix-devel

Hello all.  I object to this argument:

MSavoritias <email@msavoritias.me> writes:
> We are talking about social rules that we have here in the Guix
> community not legal/state rules.

No, legal rules come from deliberation of social arguments.

CoC-wise, it seems to me that SWH was unfriendly and this is important
to Guix.

But SWH’s legal arguments are also social arguments and cannot be
dismissed.  I do not know if SWH really is an archive in the sense of
the law, but certainly we are facing a trade-off.

It would be nice if Guix could handle harmless deletion or
rectifications.  Whether that is possible shapes laws.  I believe it is
possible, but “show me how” is a valid response.

Regards,
Florian



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21  6:12     ` MSavoritias
  2024-03-21 10:49       ` Attila Lendvai
  2024-03-21 11:51       ` pelzflorian (Florian Pelz)
@ 2024-03-21 11:52       ` pinoaffe
  2024-03-21 15:08         ` Giovanni Biscuolo
  2024-03-21 15:23       ` Hartmut Goebel
  3 siblings, 1 reply; 61+ messages in thread
From: pinoaffe @ 2024-03-21 11:52 UTC (permalink / raw)
  To: MSavoritias; +Cc: Giovanni Biscuolo, guix-devel

Hi!

MSavoritias <email@msavoritias.me> writes:

> On 3/20/24 19:22, Giovanni Biscuolo wrote:
>> Disclaimer: I've still not read all the relevant threads [3] [4], so
>> please forgive me if I repeat some information already provided.
>>
>> What rights are we talking about?
>
> You are making the same misconception as some other people in the
> thread here.
>
> We are talking about social rules that we have here in the Guix
> community not legal/state rules.

Arborelia is clearly talking about legal/state rules in part of her
blogposts.  You can argue that the state rules aren't relevant here
(IMO, Giovanni's observations support this argument), but it's not a
"misconception" to think that the current discussion is at least
partially about the legal aspects.

> Specifically the social rules that we support trans people and we want
> to include them. Any person really that want to change their name at
> some point for some reason.
>
> To that end we listen to their concerns/wishes and we accommodate
> them.

I agree that we should listen to people's concerns/wishes and accommodate
them out of basic respect, but we can only accommodate people's wishes
when those wishes fall within what is technologically feasible and reasonable.

When a person publishes books under a certain identity, it is not
feasible for *every* mention in every copy to retroactively be updated
to reflect a new name.  In a similar manner, it is (currently) not
always feasible to rewrite git history to change historic names.

I think we, as Guix,
- should examine if/how it is currently feasible to rewrite our git history,
- should examine possible workarounds going forward,
- should move towards something like UUIDs and petnames in the long run.

(see https://spritelyproject.org/news/petname-systems.html).

>> As a *free software* user do I have the right to redistribute /old/
>> copies of the source code and documentation I got in the past from the
>> copyright holder, in any form (e.g. print)?... or to use old sources or
>> documentation to develop derived work, with _attribution_, without
>> asking for consent from the original authors and/or contact the original
>> authors to ask them what is their current name?
>
> Copyright is not consent. When we are talking about consent we are
> talking about it in social rules.
>
> See also
> https://www.consentfultech.io/wp-content/uploads/2019/10/Building-Consentful-Tech.pdf
> as a nice paper for consent in tech.
>
>> If yes, I would like to exercise all my rights without being harassed.
>
> Again this has nothing to do with rights granted by states. This is
> about including people and making them feel safe and respected.

I fully agree with you here, rights such as the right to free speech and
copyleft don't mean that any action that falls within those rights
should be free of consequences, especially when such an action excludes
others, disrespects them or makes them feel unsafe.

>> [...]

kind regards,
pinoaffe



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 11:52       ` pinoaffe
@ 2024-03-21 15:08         ` Giovanni Biscuolo
  2024-03-21 15:11           ` MSavoritias
  2024-03-21 16:17           ` pinoaffe
  0 siblings, 2 replies; 61+ messages in thread
From: Giovanni Biscuolo @ 2024-03-21 15:08 UTC (permalink / raw)
  To: pinoaffe; +Cc: guix-devel


Hello pinoaffe,

pinoaffe <pinoaffe@gmail.com> writes:

[...]

> I think we, as Guix,
> - should examine if/how it is currently feasible to rewrite our git
> history,

it's not, see also:
https://guix.gnu.org/en/blog/2020/securing-updates/

> - should examine possible workarounds going forward,
> - should move towards something like UUIDs and petnames in the long run.
>
> (see https://spritelyproject.org/news/petname-systems.html).

I don't understand how using petnames, uuids or even a re:claimID
identity (see below) could solve the problem with "rewriting history" in
case a person wishes to change his or her previous _published_ name
(petname, uuid...) in an archived content-addressable storage system.

As a side note, other than the "petname system" please also consider
re:claimID from GNUnet:
https://www.gnunet.org/en/reclaim/index.html
https://www.gnunet.org/en/reclaim/motivation.html

[...]

Regards, Giovanni.


[1] https://guix.gnu.org/en/blog/2020/securing-updates/


-- 
Giovanni Biscuolo

Xelera IT Infrastructures


* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:08         ` Giovanni Biscuolo
@ 2024-03-21 15:11           ` MSavoritias
  2024-03-21 22:11             ` Philip McGrath
  2024-03-21 16:17           ` pinoaffe
  1 sibling, 1 reply; 61+ messages in thread
From: MSavoritias @ 2024-03-21 15:11 UTC (permalink / raw)
  To: Giovanni Biscuolo, pinoaffe; +Cc: guix-devel


On 3/21/24 17:08, Giovanni Biscuolo wrote:
> Hello pinoaffe,
>
> pinoaffe <pinoaffe@gmail.com> writes:
>
> [...]
>
>> I think we, as Guix,
>> - should examine if/how it is currently feasible to rewrite our git
>> history,
> it's not, see also:
> https://guix.gnu.org/en/blog/2020/securing-updates/
>
>> - should examine possible workarounds going forward,
>> - should move towards something like UUIDs and petnames in the long run.
>>
>> (see https://spritelyproject.org/news/petname-systems.html).
> I don't understand how using petnames, uuids or even a re:claimID
> identity (see below) could solve the problem with "rewriting history" in
> case a person wishes to change his or her previous _published_ name
> (petname, uuid...) in an archived content-addressable storage system.

It doesn't solve the problem of rewriting history. It solves the bug of 
having names be part of the git history.

see also https://gitlab.com/gitlab-org/gitlab/-/issues/20960 for Gitlab 
doing the same thing.


MSavoritias

>
> As a side note, other than the "petname system" please also consider
> re:claimID from GNUnet:
> https://www.gnunet.org/en/reclaim/index.html
> https://www.gnunet.org/en/reclaim/motivation.html
>
> [...]
>
> Regards, Giovanni.
>
>
> [1] https://guix.gnu.org/en/blog/2020/securing-updates/
>
>



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21  6:12     ` MSavoritias
                         ` (2 preceding siblings ...)
  2024-03-21 11:52       ` pinoaffe
@ 2024-03-21 15:23       ` Hartmut Goebel
  2024-03-21 15:27         ` MSavoritias
                           ` (2 more replies)
  3 siblings, 3 replies; 61+ messages in thread
From: Hartmut Goebel @ 2024-03-21 15:23 UTC (permalink / raw)
  To: guix-devel

Am 21.03.24 um 07:12 schrieb MSavoritias:
> Specifically the social rules that we support trans people and we want 
> to include them. Any person really that want to change their name at 
> some point for some reason. 

Interestingly, you are asking for the right to get the old name rewritten 
for trans people only.

To be frank: IMHO This is a quiet egocentric point of view.

In many cultures all over the world women are required to change their 
name when they marry. And you are not asking for women's rights. But only 
for the rights of the small but loud minority of trans people.

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |




* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:23       ` Hartmut Goebel
@ 2024-03-21 15:27         ` MSavoritias
  2024-03-21 15:54           ` Ekaitz Zarraga
  2024-03-22  4:33           ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  2024-03-21 16:18         ` Efraim Flashner
  2024-03-21 16:23         ` pinoaffe
  2 siblings, 2 replies; 61+ messages in thread
From: MSavoritias @ 2024-03-21 15:27 UTC (permalink / raw)
  To: Hartmut Goebel, guix-devel

On 3/21/24 17:23, Hartmut Goebel wrote:

> Am 21.03.24 um 07:12 schrieb MSavoritias:
>> Specifically the social rules that we support trans people and we 
>> want to include them. Any person really that want to change their 
>> name at some point for some reason. 
>
> Interestingly you are asking the right to get the old name rewritten 
> for trans people only.
>
> To be frank: IMHO This is a quiet egocentric point of view.
>
> In many cultures all over the world women are required to change their 
> name when they merry. And you are not asking for women's right. But 
> only for right for the small but loud minority of trans people.


What are you implying with the "loud" minority here?


MSavoritias




* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:27         ` MSavoritias
@ 2024-03-21 15:54           ` Ekaitz Zarraga
  2024-03-22  4:33           ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  1 sibling, 0 replies; 61+ messages in thread
From: Ekaitz Zarraga @ 2024-03-21 15:54 UTC (permalink / raw)
  To: MSavoritias, Hartmut Goebel, guix-devel

Hi,

> What are you implying with the "loud" minority here?
> 
> 
> MSavoritias

He's probably talking about the same thing that kept you heated after 
you were told to calm down, and that has you taking every opportunity 
to answer every single email in this thread and in all the subthreads 
that keep appearing.

I don't want to look insensitive but I think we are revolving around the 
same issue over and over again and honestly it's bothering me.

Not the discussion itself, which has a profound meaning and it's a deep 
issue, but the way it is taking place and where it is taking place.

It's also extremely sad to me to see many unanswered questions in the 
help-guix mailing list, which might or might not include questions from 
trans people that are willing to use the fantastic software we all 
collectively maintain and which would help them have a better life, and 
yet we are talking about the detail of the detail here for no real 
reason: this conversation does not have any practical purpose.

Also there are hundreds of issues open in guix, which don't happen to 
deserve the attention this discussion has.

I don't think this conversation is going to get anywhere, and I would 
like to encourage people to spend their energy somewhere else until we 
really start having a different mindset on the issue, as was suggested 
to us.

I don't think this is a topic for `guix-devel` mailing list. If it is, 
please let me know and change my expectations accordingly.

My suggestion is: if this is an actual problem with guix's software, we 
should open an issue for this, for those who are interested in actually 
trying to improve the situation. If it's not a problem with guix, then 
this conversation is just an exercise in ethical and intellectual 
bragging that is just uninteresting to me and more appropriate for 
social media.

Best,
Ekaitz




* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:08         ` Giovanni Biscuolo
  2024-03-21 15:11           ` MSavoritias
@ 2024-03-21 16:17           ` pinoaffe
  1 sibling, 0 replies; 61+ messages in thread
From: pinoaffe @ 2024-03-21 16:17 UTC (permalink / raw)
  To: Giovanni Biscuolo; +Cc: guix-devel


Giovanni Biscuolo <g@xelera.eu> writes:
> [...]
> pinoaffe <pinoaffe@gmail.com> writes:
>> - should examine possible workarounds going forward,
>> - should move towards something like UUIDs and petnames in the long run.
>>
>> (see https://spritelyproject.org/news/petname-systems.html).
>
> I don't understand how using petnames, uuids or even a re:claimID
> identity (see below) could solve the problem with "rewriting history" in
> case a person wishes to change his or her previous _published_ name
> (petname, uuid...) in an archived content-addressable storage system.
It would decouple "name" from "identity as represented in the git merkle
tree", thus allowing name changes to occur without affecting hashes and
the like.  I see no possible reason for UUID changes, as UUIDs (by
themselves) are not personally identifying.  This of course would not
allow retroactive splitting/merging of identities, but I feel like
permitting that is incompatible with the idea of identities anyhow.
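
To illustrate the decoupling, here is a minimal Python sketch.  It
assumes a toy commit record hashed with SHA-256, not Git's real object
format; `commit_id`, `display_names` and the example UUID are
hypothetical names made up for this illustration.

```python
import hashlib
import uuid


def commit_id(author: str, message: str, parent: str = "") -> str:
    """Hash a toy commit record; the author field is part of the hashed bytes."""
    record = f"parent {parent}\nauthor {author}\n\n{message}\n"
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


# Case 1: the name is embedded in the record, so a rename changes the identifier.
by_old_name = commit_id("Old Name", "Add package foo")
by_new_name = commit_id("New Name", "Add package foo")

# Case 2: an opaque identifier is embedded instead (hypothetical scheme);
# the human-readable name lives in a mutable side table outside the history.
author_uuid = str(uuid.UUID("12345678-1234-5678-1234-567812345678"))
display_names = {author_uuid: "Old Name"}
before = commit_id(author_uuid, "Add package foo")
display_names[author_uuid] = "New Name"  # the rename touches only the side table
after = commit_id(author_uuid, "Add package foo")

print(by_old_name != by_new_name)  # True
print(before == after)             # True
```

The rename only touches the side table, so identifiers derived from the
history stay valid.  How such a table would be stored and trusted is
left open here.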

> As a side note, other than the "petname system" please also consider
> re:claimID from GNUnet:
> https://www.gnunet.org/en/reclaim/index.html
> https://www.gnunet.org/en/reclaim/motivation.html

Sure, I'll take a look

kind regards,
pinoaffe



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:23       ` Hartmut Goebel
  2024-03-21 15:27         ` MSavoritias
@ 2024-03-21 16:18         ` Efraim Flashner
  2024-03-21 16:23         ` pinoaffe
  2 siblings, 0 replies; 61+ messages in thread
From: Efraim Flashner @ 2024-03-21 16:18 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel


On Thu, Mar 21, 2024 at 04:23:01PM +0100, Hartmut Goebel wrote:
> Am 21.03.24 um 07:12 schrieb MSavoritias:
> > Specifically the social rules that we support trans people and we want
> > to include them. Any person really that want to change their name at
> > some point for some reason.
> 
> Interestingly you are asking the right to get the old name rewritten for
> trans people only.
> 
> To be frank: IMHO This is a quiet egocentric point of view.

I took it in as though we were discussing the recent activity, not that
it was ONLY this instance that we care about.  I have a number of
friends who have more than 1 set of names and specifically wish to go by
one set over the other.  The point is that there is a vocal portion of
people in the world who insist on deadnaming people, and that is not
okay.

> In many cultures all over the world women are required to change their name
> when they merry. And you are not asking for women's right. But only for
> right for the small but loud minority of trans people.

As a project, we support people by addressing them by their preferred
name, and honoring their wishes as to name, gender, honorifics, etc. For
all people. If a person chooses to go by their "maiden name" or their
"married name" or a pseudonym, that's their prerogative.

> 
> -- 
> Regards
> Hartmut Goebel
> 
> | Hartmut Goebel          | h.goebel@crazy-compilers.com               |
> | www.crazy-compilers.com | compilers which you thought are impossible |
> 
> 

-- 
Efraim Flashner   <efraim@flashner.co.il>   רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted


* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:23       ` Hartmut Goebel
  2024-03-21 15:27         ` MSavoritias
  2024-03-21 16:18         ` Efraim Flashner
@ 2024-03-21 16:23         ` pinoaffe
  2 siblings, 0 replies; 61+ messages in thread
From: pinoaffe @ 2024-03-21 16:23 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel


Hartmut Goebel <h.goebel@crazy-compilers.com> writes:

> Am 21.03.24 um 07:12 schrieb MSavoritias:
>> Specifically the social rules that we support trans people and we
>> want to include them. Any person really that want to change their
>> name at some point for some reason. 
>
> Interestingly you are asking the right to get the old name rewritten
> for trans people only.

This discussion arose because of the experiences of someone who's trans,
and is relevant to many trans folks, so of course this will remain a
major focus of the discussion.

> To be frank: IMHO This is a quiet egocentric point of view.
You're wrong and it ain't

> In many cultures all over the world women are required to change their
> name when they merry. And you are not asking for women's right. But
> only for right for the small but loud minority of trans people.
I am not aware of any women who want/have wanted to retroactively change
historic occurrences of their maiden name, so your mail reeks of concern
trolling to me.

There are (of course) instances where people may want to replace
historic use of a name with another name for reasons other than
transitioning, but that should make you rejoice in the fact that
protecting trans people's rights also protects cis people's rights.
This should not at all be surprising, as trans rights are human rights.

Kind regards,
pinoaffe



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:11           ` MSavoritias
@ 2024-03-21 22:11             ` Philip McGrath
  0 siblings, 0 replies; 61+ messages in thread
From: Philip McGrath @ 2024-03-21 22:11 UTC (permalink / raw)
  To: MSavoritias, Giovanni Biscuolo, pinoaffe; +Cc: guix-devel


On Thu, Mar 21, 2024, at 11:11 AM, MSavoritias wrote:
> On 3/21/24 17:08, Giovanni Biscuolo wrote:
>> […]
>> I don't understand how using petnames, uuids or even a re:claimID
>> identity (see below) could solve the problem with "rewriting history" in
>> case a person wishes to change his or her previous _published_ name
>> (petname, uuid...) in an archived content-addressable storage system.
>
> It doesnt solve the problem of rewriting history. It solves the bug of 
> having names part of the git history.
>
> see also https://gitlab.com/gitlab-org/gitlab/-/issues/20960 for Gitlab 
> doing the same thing.
>

Unless I’m missing something, the linked Gitlab issue seems to be a proposal by someone in February 2018 that Gitlab adopt some system of using UUIDs instead of author information. There was fairly limited discussion, with the last comment in May 2018. There does not seem to have been a consensus supporting the proposal, and I’m not seeing any indication that Gitlab plans to implement the proposal.

Furthermore, the author and committer metadata are not the only places where people’s names appear in Guix. For example, I know some font packages that mention the name of the font designer in the package’s description. More broadly, Guix also refers to package sources by their content hashes: most sources probably contain some people’s names, and any of these could face the same problems as names directly included in the Guix Git repository.
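
As a rough illustration of why a name inside a source archive is tied to its content hash, here is a minimal Python sketch. It assumes a plain SHA-256 hex digest as a stand-in for the hash recorded in a package definition; `content_hash`, `verify` and the sample bytes are hypothetical and are not Guix APIs.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given source bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Check downloaded bytes against the hash recorded for them."""
    return content_hash(data) == expected


# Made-up source file whose header carries a person's name.
original = b"Copyright (C) 2020 Old Name\nprint('hello')\n"
recorded = content_hash(original)  # what a package definition would pin

# The same file after a name rectification somewhere upstream.
rectified = original.replace(b"Old Name", b"New Name")

print(verify(original, recorded))   # True
print(verify(rectified, recorded))  # False
```

Any change to the bytes, including a corrected name, yields a different hash, so older package definitions would no longer match the rectified source.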

I strongly believe in the importance of protecting trans people from harassment. I don’t know how to solve the tension with long-term bit-for-bit reproducibility. 

Philip



* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)
  2024-03-21 15:27         ` MSavoritias
  2024-03-21 15:54           ` Ekaitz Zarraga
@ 2024-03-22  4:33           ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
  1 sibling, 0 replies; 61+ messages in thread
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-03-22  4:33 UTC (permalink / raw)
  To: guix-devel

> IMHO This is a quiet egocentric point of view.
> What are you implying with the "loud" minority here?

Hi,

"Quiet" is a funny typo here.

Also, "peace on Earth and goodwill toward [all]." [1]

Please

[1] https://www.youtube.com/watch?v=74ocbvwam7c



* Re: Concerns/questions around Software Heritage Archive
  2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
                   ` (6 preceding siblings ...)
  2024-03-18 11:14 ` Content-Addressed system and history? Simon Tournier
@ 2024-04-20 18:48 ` Ian Eure
  2024-05-01 15:29   ` Ian Eure
  2024-05-02 10:28   ` Ludovic Courtès
  7 siblings, 2 replies; 61+ messages in thread
From: Ian Eure @ 2024-04-20 18:48 UTC (permalink / raw)
  To: guix-devel

Hello,

I’m following up on this discussion since it’s been a month 
and I haven’t heard any updates.

Summarizing the situation:

- SHF has an opaque, difficult, and undocumented process for 
  handling name changes.  I’d like to stress again that this is 
  *not* strictly a transgender issue (though it likely affects 
  them more, or in worse/different ways) -- it is a human respect 
  issue.  Many, many more cisgender people change their name than 
  transgender people.

- SHF gave their archive to HuggingFace, an "AI" company which is 
  generating derived works with no attribution or provenance, in 
  ways which violate both the licenses of the projects used to 
  train their model and the SHF principles for LLMs.

- HuggingFace wasn’t respecting requests to opt-out of their 
  model.


On the first point, it sounds like SHF has made concrete progress 
to improve[1], which is very good to hear.  If SHF continues on 
this course, I think the concern is resolved.

On the third point, HuggingFace has begun honoring opt-out 
requests, but is still very far behind.  Also, they don’t remove 
code from the older versions of their model -- it remains there 
forever.  This is progress, but still, not great.

On the second point, I have not seen any public statements 
indicating that either SHF or HuggingFace even acknowledges the 
problem.  SHF’s most recent newsletter[2], published in April 2024 
(after these concerns came to light), continues to tout that 
StarCoder2 is "the first AI model aligned with our principles," 
which appears to be false.  StarCoder2 includes both licensed and 
unlicensed code, and HuggingFace’s own StarChat2 playground 
produces works derivative of this code, with no attribution or 
licensing information.  There is also no statement or position on 
the SHF news blog.  Nor has HuggingFace either fixed their tools 
or made a statement.  This is still very much a live concern.

I have a few questions:

- Has Guix reached out to SHF to express these concerns / get a 
  response?
- Whether a public or private response, what would Guix consider 
  to be an acceptable response?  An unacceptable response?
- How long is Guix willing to wait for a response?

Thanks,

  — Ian

[1]: 
https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: 
https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf

Ian Eure <ian@retrospec.tv> writes:

> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last 
> fall, and
> it struck me as rather a good idea.  However, I’ve seen some 
> things
> lately which have soured me on them.
>
> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer 
> who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I 
> assume
> means it’s been included in SWH.  While I’m dealing with their 
> (IMO:
> unethical) opt-out process, I likely also need to stop new 
> copies from
> being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should 
> *never*
> be included in SWH?
>
> Is there a way to tell Guix to never download source from SWH?
>
> I want absolutely nothing to do with them.
>
> Thanks,
>
>  — Ian
>



* Re: Concerns/questions around Software Heritage Archive
  2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure
@ 2024-05-01 15:29   ` Ian Eure
  2024-05-01 15:41     ` Tomas Volf
  2024-05-02 10:28   ` Ludovic Courtès
  1 sibling, 1 reply; 61+ messages in thread
From: Ian Eure @ 2024-05-01 15:29 UTC (permalink / raw)
  To: guix-devel

Hello Guixers,

It’s been another week with no response or movement on this.  I’m 
disappointed that this situation seems to be getting treated so 
lightly.  Adhering to the terms of software licenses is 
fundamental to the operation of the free software ecosystem; there 
is no software freedom without it.  It’s surprising that a pretty 
clear-cut situation of creating derivative works of free software 
in violation of their licenses would be shrugged off so easily.

Whatever the Guix organization’s position is, I’m reaching my 
personal limit, and need to see some kind of positive movement on 
this[1].  If Guix is going to continue to facilitate license 
violations, I will have no choice but to remove my software from 
it to defend them.

  — Ian

[1]: Personally, I would be satisfied with a per-package setting 
which disables scheduling source for archiving by SWH.  Seeing 
this, or a commitment to build this within a reasonable 
timeframe, would allay my concerns.

Ian Eure <ian@retrospec.tv> writes:

> Hello,
>
> I’m following up on this since discussion since it’s been a 
> month and
> I haven’t heard any updates.
>
> Summarizing the situation:
>
> - SHF has an opaque, difficult, and undocumented process for
>   handling name changes.  I’s like to stress again that this is
>   *not* strictly a transgender issue (though it likely affects 
>   them
>   more, or in worse/different ways) -- it is a human respect 
>   issue.
>   Many, many more cisgender people change their name than
>   transgender people.
>
> - SHF gave their archive to HuggingFace, an "AI" company which 
> is
>   generating derived works with no attribution or provenance, in
>   ways which violate the both licenses of the projects used to 
>   train
>  their model, and the SHF principles for LLMs.
>
> - HuggingFace wasn’t respecting requests to opt-out of their 
> model.
>
>
> On the first point, it sounds like SHF has made concrete 
> progress to
> improve[1], which is very good to hear.  If SHF continues on 
> this
> course, I think the concern is resolved.
>
> On the third point, HuggingFace has begun honoring opt-out 
> requests,
> but is still very far behind.  Also, they don’t remove code from 
> the
> older versions of their model -- it remains there forever.  This 
> is
> progress, but still, not great.
>
> On the second point, I have not seen any public statements 
> indicating
> that either SHF or HuggingFace even acknowledges the problem. 
> SHF’s
> most recent newsletter[2], published in April 2024 (after these
> concerns came to light), continues to tout that StarCoder2 is 
> "the
> first AI model aligned with our principles," which appears to be
> false.  StarCoder2 includes both licensed and unlicensed code, 
> and
> HuggingFace’s own StarChat2 playground produces works derivative 
> of
> this code, with no attribution or licensing information.  There 
> is
> also no statement or position on the SHF news blog.  Nor has
> HuggingFace either fixed their tools, or made a statement.  This 
> is
> still very much a live concern.
>
> I have a few questions:
>
> - Has Guix reached out to SHF to express these concerns / get a
>   response?
> - Whether a public or private response, what would Guix consider 
> to
>  be an acceptable response?  An unacceptable response?
> - How long is Guix willing to wait for a response?
>
> Thanks,
>
>  — Ian
>
> [1]: 
> https://cohost.org/arborelia/post/5273879-they-are-fixing-some
> [2]:
> https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf
>
> Ian Eure <ian@retrospec.tv> writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last 
>> fall,
>> and
>> it struck me as rather a good idea.  However, I’ve seen some 
>> things
>> lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a 
>> developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I
>> assume
>> means it’s been included in SWH.  While I’m dealing with their 
>> (IMO:
>> unethical) opt-out process, I likely also need to stop new 
>> copies
>> from
>> being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should
>> *never*
>> be included in SWH?
>>
>> Is there a way to tell Guix to never download source from SWH?
>>
>> I want absolutely nothing to do with them.
>>
>> Thanks,
>>
>>  — Ian
>>
>



* Re: Concerns/questions around Software Heritage Archive
  2024-05-01 15:29   ` Ian Eure
@ 2024-05-01 15:41     ` Tomas Volf
  0 siblings, 0 replies; 61+ messages in thread
From: Tomas Volf @ 2024-05-01 15:41 UTC (permalink / raw)
  To: Ian Eure; +Cc: guix-devel


On 2024-05-01 08:29:29 -0700, Ian Eure wrote:
>  If Guix is going to continue to facilitate license violations, I will have no
> choice but to remove my software from it to defend them.

Purely hypothetically, if it would come to this, how would you go about it?
Assuming the software is under free license (requirement for inclusion into
Guix), I am unsure based on what would the removal be demanded.  Do you have
some specific approach in mind?

Have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


* Re: Concerns/questions around Software Heritage Archive
  2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure
  2024-05-01 15:29   ` Ian Eure
@ 2024-05-02 10:28   ` Ludovic Courtès
  2024-05-09 16:00     ` Maxim Cournoyer
  1 sibling, 1 reply; 61+ messages in thread
From: Ludovic Courtès @ 2024-05-02 10:28 UTC (permalink / raw)
  To: Ian Eure; +Cc: guix-devel

Hi Ian,

Ian Eure <ian@retrospec.tv> skribis:

> Summarizing the situation:
>
> - SHF has an opaque, difficult, and undocumented process for
>   handling name changes.  I’d like to stress again that this is
>   *not* strictly a transgender issue (though it likely affects them
>   more, or in worse/different ways) -- it is a human respect issue.
>   Many, many more cisgender people change their name than
>   transgender people.

It is also not strictly an SWH issue: how does Internet Archive handle
name changes?  What about append-only storage in general?  We’ve
discussed this already.

> - SHF gave their archive to HuggingFace, an "AI" company which is
>   generating derived works with no attribution or provenance, in
>   ways which violate both the licenses of the projects used to train
>  their model, and the SHF principles for LLMs.

[...]

> - Has Guix reached out to SHF to express these concerns / get a
>   response?

I’ve seen and participated in informal discussions, but that’s all I
know.  Maintainers?

> - Whether a public or private response, what would Guix consider to
>  be an acceptable response?  An unacceptable response?
> - How long is Guix willing to wait for a response?

Free software people, myself included, have expressed disappointment
regarding the use of code harvested by SWH for HuggingFace’s training.
Stefano Zacchiroli of SWH responded to these concerns on Mastodon back
in March, as you probably saw.

One important point is that copyleft code is excluded from the training
dataset; I was able to anecdotally check that for GPL code such as Guix
using their interface (there was a thread on Mastodon but I can’t find
it): <https://huggingface.co/spaces/bigcode/in-the-stack>.  That
addresses my main concern.

Remaining concerns include the weak wording of the principles put
forward by SWH in its statement on LLMs:
<https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/>.
I think this is something worth discussing further with them (it’s
already been brought up notably on Mastodon).  It’s not clear to me
whether this is a task for Guix as a project.

(I do not forget that, in the meantime, Microsoft ingests everything
that’s on GitHub, including copyleft code, and including clones of repos
that were not initially hosted there.)

I’m not sure this is the kind of answer you expected, but I hope it
makes sense!

Ludo’.



* Re: Concerns/questions around Software Heritage Archive
  2024-05-02 10:28   ` Ludovic Courtès
@ 2024-05-09 16:00     ` Maxim Cournoyer
  0 siblings, 0 replies; 61+ messages in thread
From: Maxim Cournoyer @ 2024-05-09 16:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Ian Eure, guix-devel

Hi Ian, Ludovic.

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Ian,
>
> Ian Eure <ian@retrospec.tv> skribis:
>
>> Summarizing the situation:
>>
>> - SHF has an opaque, difficult, and undocumented process for
>>   handling name changes.  I’d like to stress again that this is
>>   *not* strictly a transgender issue (though it likely affects them
>>   more, or in worse/different ways) -- it is a human respect issue.
>>   Many, many more cisgender people change their name than
>>   transgender people.
>
> It is also not strictly an SWH issue: how does Internet Archive handle
> name changes?  What about append-only storage in general?  We’ve
> discussed this already.

>> - SHF gave their archive to HuggingFace, an "AI" company which is
>>   generating derived works with no attribution or provenance, in
>>   ways which violate both the licenses of the projects used to train
>>  their model, and the SHF principles for LLMs.
>
> [...]
>
>> - Has Guix reached out to SHF to express these concerns / get a
>>   response?
>
> I’ve seen and participated in informal discussions, but that’s all I
> know.  Maintainers?

We haven't.  Given some improvements were apparently already made by SWH
in response to concerns raised, it seems the dialogue should continue.

>> - Whether a public or private response, what would Guix consider to
>>  be an acceptable response?  An unacceptable response?
>> - How long is Guix willing to wait for a response?
>
> Free software people, myself included, have expressed disappointment
> regarding the use of code harvested by SWH for HuggingFace’s training.
> Stefano Zacchiroli of SWH responded to these concerns on Mastodon back
> in March, as you probably saw.
>
> One important point is that copyleft code is excluded from the training
> dataset; I was able to anecdotally check that for GPL code such as Guix
> using their interface (there was a thread on Mastodon but I can’t find
> it): <https://huggingface.co/spaces/bigcode/in-the-stack>.  That
> addresses my main concern.
>
> Remaining concerns include the weak wording of the principles put
> forward by SWH in its statement on LLMs:
> <https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/>.
> I think this is something worth discussing further with them (it’s
> already been brought up notably on Mastodon).  It’s not clear to me
> whether this is a task for Guix as a project.

I don't think it is a task for Guix specifically, but rather for all
users of SWH or interested parties.

-- 
Thanks,
Maxim



end of thread, other threads:[~2024-05-09 16:02 UTC | newest]

Thread overview: 61+ messages
2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
2024-03-16 17:50 ` Christopher Baines
2024-03-16 18:24   ` MSavoritias
2024-03-16 19:08     ` Christopher Baines
2024-03-16 19:45     ` Tomas Volf
2024-03-17  7:06       ` MSavoritias
2024-03-16 19:06   ` Ian Eure
2024-03-16 19:49     ` Tomas Volf
2024-03-16 23:16   ` Vivien Kraus
2024-03-16 23:27     ` Tomas Volf
     [not found]     ` <EoCuAq3N681mOIAh7ptCyXiyscM9R0iPDBWId1eS4EbTJ2-ARWNfGuqtXIvmqcJNBl1SQvMM4X6-GiC5LiUv4TJv6J4ritPA3uZ2JBwkAzQ=@protonmail.com>
2024-03-16 23:40       ` Fw: " Ryan Prior
2024-03-16 17:58 ` MSavoritias
2024-03-18  9:50   ` Please hold your horses Simon Tournier
2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior
2024-03-17  9:39   ` Lars-Dominik Braun
2024-03-17  9:47     ` MSavoritias
2024-03-17 11:53       ` paul
2024-03-17 11:57         ` MSavoritias
2024-03-17 14:57           ` Richard Sent
2024-03-17 16:28           ` Ian Eure
2024-03-17 12:51         ` Tomas Volf
2024-03-17 23:56           ` Attila Lendvai
2024-03-20 15:25         ` contributor uuid (was Re: Concerns/questions around Software Heritage Archive) bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6
2024-03-17 16:20       ` Concerns/questions around Software Heritage Archive Ian Eure
2024-03-17 16:55         ` MSavoritias
2024-03-18 14:04     ` pinoaffe
2024-03-17 13:03 ` Olivier Dion
2024-03-17 17:57 ` Ludovic Courtès
2024-03-20 17:22   ` the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) Giovanni Biscuolo
2024-03-21  6:12     ` MSavoritias
2024-03-21 10:49       ` Attila Lendvai
2024-03-21 11:51       ` pelzflorian (Florian Pelz)
2024-03-21 11:52       ` pinoaffe
2024-03-21 15:08         ` Giovanni Biscuolo
2024-03-21 15:11           ` MSavoritias
2024-03-21 22:11             ` Philip McGrath
2024-03-21 16:17           ` pinoaffe
2024-03-21 15:23       ` Hartmut Goebel
2024-03-21 15:27         ` MSavoritias
2024-03-21 15:54           ` Ekaitz Zarraga
2024-03-22  4:33           ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-03-21 16:18         ` Efraim Flashner
2024-03-21 16:23         ` pinoaffe
2024-03-18  9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier
2024-03-18 11:47   ` MSavoritias
2024-03-18 13:12     ` Simon Tournier
2024-03-18 14:00       ` MSavoritias
2024-03-18 14:32         ` Simon Tournier
2024-03-18 16:27   ` Kaelyn
2024-03-18 17:39     ` Daniel Littlewood
2024-03-18 20:38     ` Olivier Dion
2024-03-18 19:38   ` Ian Eure
2024-03-18 22:02     ` Ludovic Courtès
2024-03-19 10:58     ` Simon Tournier
2024-03-19 15:37       ` Ian Eure
2024-03-18 11:14 ` Content-Addressed system and history? Simon Tournier
2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure
2024-05-01 15:29   ` Ian Eure
2024-05-01 15:41     ` Tomas Volf
2024-05-02 10:28   ` Ludovic Courtès
2024-05-09 16:00     ` Maxim Cournoyer
