* Concerns/questions around Software Heritage Archive
From: Ian Eure @ 2024-03-16 15:52 UTC
To: guix-devel

Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things lately which have soured me on them.

They appear to be using the archive to build LLMs:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

I was also distressed to see how poorly they treated a developer who wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag

GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from being uploaded again in the future.

Is there a way to indicate, in a Guix package, that it should *never* be included in SWH?

Is there a way to tell Guix to never download source from SWH?

I want absolutely nothing to do with them.

Thanks,

— Ian
* Re: Concerns/questions around Software Heritage Archive
From: Christopher Baines @ 2024-03-16 17:50 UTC
To: Ian Eure; +Cc: guix-devel

Ian Eure <ian@retrospec.tv> writes:

> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things lately which have soured me on them.
>
> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer who wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should *never* be included in SWH?

Not currently, and I don't really see the point in such a mechanism. If you really never want them to store your code, then you need to license it accordingly (and not make it free software).

> Is there a way to tell Guix to never download source from SWH?

Also no, and it's probably best to do this at the network level on your systems/network if you want this to be the case.
Skipping back to this though:

> I was also distressed to see how poorly they treated a developer who wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag

This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.

Like Software Heritage, there's cryptographical implications for rewriting the Git history and modifying source tarballs or nars that contain source code.

We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all reasons).

It's probably worth working out our position on this in advance of someone asking.
* Re: Concerns/questions around Software Heritage Archive
From: MSavoritias @ 2024-03-16 18:24 UTC
To: Christopher Baines, Ian Eure; +Cc: guix-devel

On 3/16/24 19:50, Christopher Baines wrote:
> Ian Eure <ian@retrospec.tv> writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a developer who wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should *never* be included in SWH?
> Not currently, and I don't really see the point in such a mechanism. If you really never want them to store your code, then you need to license it accordingly (and not make it free software).

You are talking about legal tho. Yes legally they can copy the code.

But what can Guix do socially to give people the choice? For reasons of consent that is.
>> I was also distressed to see how poorly they treated a developer who wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
> This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.
>
> Like Software Heritage, there's cryptographical implications for rewriting the Git history and modifying source tarballs or nars that contain source code.
>
> We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all reasons).
>
> It's probably worth working out our position on this in advance of someone asking.

I would go a step further actually. Software Heritage is effectively breaking the Guix CoC now.

Obviously I'm not proposing removing all code that connects to Software Heritage or something like that, but there should be some social action we can take.

For example, until the matter is resolved and Software Heritage implements a process that respects trans rights, Software Heritage should not be welcome in Guix spaces.

MSavoritias
* Re: Concerns/questions around Software Heritage Archive
From: Christopher Baines @ 2024-03-16 19:08 UTC
To: MSavoritias; +Cc: Ian Eure, guix-devel

MSavoritias <email@msavoritias.me> writes:

> On 3/16/24 19:50, Christopher Baines wrote:
>> Ian Eure <ian@retrospec.tv> writes:
>>
>>> Hi Guixy people,
>>>
>>> I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things lately which have soured me on them.
>>>
>>> They appear to be using the archive to build LLMs:
>>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>>
>>> I was also distressed to see how poorly they treated a developer who wished to update their name:
>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>>
>>> GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from being uploaded again in the future.
>>>
>>> Is there a way to indicate, in a Guix package, that it should *never* be included in SWH?
>> Not currently, and I don't really see the point in such a mechanism. If you really never want them to store your code, then you need to license it accordingly (and not make it free software).
>
> You are talking about legal tho. Yes legally they can copy the code.
>
> But what can Guix do socially to give people the choice? For reasons of consent that is.

...
>>> I was also distressed to see how poorly they treated a developer who wished to update their name:
>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>> This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.
>>
>> Like Software Heritage, there's cryptographical implications for rewriting the Git history and modifying source tarballs or nars that contain source code.
>>
>> We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all reasons).
>>
>> It's probably worth working out our position on this in advance of someone asking.
>
> I would go a step further actually. Software Heritage is effectively breaking the Guix CoC now.
>
> Obviously I'm not proposing removing all code that connects to Software Heritage or something like that, but there should be some social action we can take.
>
> For example, until the matter is resolved and Software Heritage implements a process that respects trans rights, Software Heritage should not be welcome in Guix spaces.

As I say, Guix is in a very similar situation as a project to Software Heritage: we publish artefacts containing people's personal details, and there are technical implications in changing the personal details in those artefacts.

The only difference, as far as I'm aware, is that no one is currently asking Guix as a project to update their personal details in the artefacts we store and publish.

As a project, we should sort out our stuff before jumping to judge others.
* Re: Concerns/questions around Software Heritage Archive
From: Tomas Volf @ 2024-03-16 19:45 UTC
To: MSavoritias; +Cc: Christopher Baines, Ian Eure, guix-devel

On 2024-03-16 20:24:50 +0200, MSavoritias wrote:
> > > I was also distressed to see how poorly they treated a developer who wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.
> >
> > Like Software Heritage, there's cryptographical implications for rewriting the Git history and modifying source tarballs or nars that contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all reasons).
> >
> > It's probably worth working out our position on this in advance of someone asking.
>
> I would go a step further actually. Software Heritage is effectively breaking the Guix CoC now.
>
> Obviously I'm not proposing removing all code that connects to Software Heritage or something like that, but there should be some social action we can take.
>
> For example, until the matter is resolved and Software Heritage implements a process that respects trans rights, Software Heritage should not be welcome in Guix spaces.

I did skim the articles and I did not see any details on what the technical solution should be. SWH, among other things, archives the repositories and allows fetching them by commit hash. At least as far as I know. Since that commit hash does contain the author field, what is the proposed solution here to change the author name without changing the commit hash?

While I am not a huge fan of the ability to map the "fake" author name over the real one in the UI, what other solutions do you or the article author envision? I am genuinely curious what you think can be done here.

Have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.
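Tomas's point, that a Git commit ID covers the author line and that changing one commit changes the ID of every commit built on top of it, can be checked with a short Python sketch. It assumes Git's default SHA-1 object format (repositories using SHA-256 hash differently) and builds commit objects by hand; the names, e-mail address and timestamp are made up for illustration.

```python
import hashlib
from typing import Optional

# Hash of the empty tree in Git's SHA-1 object format.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
STAMP = "1710600000 +0000"  # hypothetical Unix timestamp and time zone


def commit_id(author: str, message: str, parent: Optional[str] = None) -> str:
    """Return the SHA-1 ID Git would give this hand-built commit object."""
    ident = f"{author} <dev@example.org> {STAMP}"  # hypothetical identity
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        # The parent's ID is part of the hashed content.
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = "\n".join(lines).encode("utf-8")
    # Git hashes a "commit <size>\0" header followed by the object body.
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# Same tree, same message, same date: only the author name differs.
old_root = commit_id("Old Name", "Initial commit\n")
new_root = commit_id("New Name", "Initial commit\n")

# A child commit whose own author never changed still gets a new ID,
# because its parent line now points at a different commit.
old_child = commit_id("Someone Else", "Second commit\n", parent=old_root)
new_child = commit_id("Someone Else", "Second commit\n", parent=new_root)

print(old_root != new_root)    # True
print(old_child != new_child)  # True
```

This is why any tool that pins an exact commit ID sees a renamed history as a different history, not an edited one.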
* Re: Concerns/questions around Software Heritage Archive
From: MSavoritias @ 2024-03-17 7:06 UTC
To: Christopher Baines, Ian Eure, guix-devel

On 3/16/24 21:45, Tomas Volf wrote:
> On 2024-03-16 20:24:50 +0200, MSavoritias wrote:
>>>> I was also distressed to see how poorly they treated a developer who wished to update their name:
>>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>> This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.
>>>
>>> Like Software Heritage, there's cryptographical implications for rewriting the Git history and modifying source tarballs or nars that contain source code.
>>>
>>> We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all reasons).
>>>
>>> It's probably worth working out our position on this in advance of someone asking.
>> I would go a step further actually. Software Heritage is effectively breaking the Guix CoC now.
>>
>> Obviously I'm not proposing removing all code that connects to Software Heritage or something like that, but there should be some social action we can take.
>>
>> For example, until the matter is resolved and Software Heritage implements a process that respects trans rights, Software Heritage should not be welcome in Guix spaces.
> I did skim the articles and I did not see any details on what the technical solution should be. SWH, among other things, archives the repositories and allows fetching them by commit hash. At least as far as I know. Since that commit hash does contain the author field, what is the proposed solution here to change the author name without changing the commit hash?
>
> While I am not a huge fan of the ability to map the "fake" author name over the real one in the UI, what other solutions do you or the article author envision? I am genuinely curious what you think can be done here.

I think you are arguing for something other than what I wrote? I didn't say anything about technical solutions; that's up to Software Heritage to figure out. I did say that there should be social consequences, since Software Heritage is breaking the CoC here.

And by breaking the CoC I mean that Software Heritage seems to have a complete lack of empathy towards trans people.

Regarding what Guix could do personally, the answer is clear: people are more important than machines and code. So we should find a way for trans people to feel safe in Guix.

MSavoritias

> Have a nice day,
> Tomas Volf
* Re: Concerns/questions around Software Heritage Archive
From: Ian Eure @ 2024-03-16 19:06 UTC
To: Christopher Baines; +Cc: guix-devel

Christopher Baines <mail@cbaines.net> writes:

> Ian Eure <ian@retrospec.tv> writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a developer who wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should *never* be included in SWH?
>
> Not currently, and I don't really see the point in such a mechanism. If you really never want them to store your code, then you need to license it accordingly (and not make it free software).

I don’t want my code in SWH *because* it’s free. A primary use of LLMs is laundering freely licensed software into proprietary, commercial projects through "AI" code completion and generation. Any Free software in an LLM training set can and will be used in violation of its license, without a clear path for the author to seek recourse. I deleted my code off Github and abandoned it completely for this exact reason, and am deeply irked to be going through this nonsense again.

A more salient question may be: Is there a process within Guix (either the program or the organization) which uploads source to SWH? Or does it rely on SWH independently?

If the latter, my problem is likely solved by blocking SWH at my network edge and opting out of their archive (or trying to) and the downstream training models they’ve already put it in. If the former, the only control I currently have to protect my license is removing packages from Guix which contain it. I don’t want that outcome.

Noting also that the path here seems to be SWH->huggingface->bigcode training set, and the opt-out process for the training set appears to be a complete sham. To opt out, you must create a Github Issue; only one opt-out has *ever* been processed, and there are 200+ sitting there, many with no response for nearly a year[1]. I want no part of any of this.

>> Is there a way to tell Guix to never download source from SWH?
>
> Also no, and it's probably best to do this at the network level on your systems/network if you want this to be the case.

I’ll investigate this, though I’d prefer if there was a way to configure source mirrors in the Guix daemon.

> Skipping back to this though:
>
>> I was also distressed to see how poorly they treated a developer who wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.
>
> Like Software Heritage, there's cryptographical implications for rewriting the Git history and modifying source tarballs or nars that contain source code.
>
> We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all reasons).
>
> It's probably worth working out our position on this in advance of someone asking.

Yes, I agree that Guix needs a better solution for this.

Thanks,

— Ian

[1]: https://github.com/bigcode-project/opt-out-v2/issues
* Re: Concerns/questions around Software Heritage Archive
From: Tomas Volf @ 2024-03-16 19:49 UTC
To: Ian Eure; +Cc: Christopher Baines, guix-devel

On 2024-03-16 12:06:27 -0700, Ian Eure wrote:
>
> Christopher Baines <mail@cbaines.net> writes:
>
> > Ian Eure <ian@retrospec.tv> writes:
> >
> > > Hi Guixy people,
> > >
> > > I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things lately which have soured me on them.
> > >
> > > They appear to be using the archive to build LLMs:
> > > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> > >
> > > I was also distressed to see how poorly they treated a developer who wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > >
> > > GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from being uploaded again in the future.
> > >
> > > Is there a way to indicate, in a Guix package, that it should *never* be included in SWH?
> >
> > Not currently, and I don't really see the point in such a mechanism. If you really never want them to store your code, then you need to license it accordingly (and not make it free software).
>
> I don’t want my code in SWH *because* it’s free. A primary use of LLMs is laundering freely licensed software into proprietary, commercial projects through "AI" code completion and generation. Any Free software in an LLM training set can and will be used in violation of its license, without a clear path for the author to seek recourse. I deleted my code off Github and abandoned it completely for this exact reason, and am deeply irked to be going through this nonsense again.
>
> A more salient question may be: Is there a process within Guix (either the program or the organization) which uploads source to SWH? Or does it rely on SWH independently?

`guix lint PKG-NAME' schedules SWH archival if possible. No code is directly uploaded (at least currently), so assuming you have an IP list of SWH, it should be possible to block it. At least AFAIK.

If you have the list, or know how to get it, could you share it? I would be interested in blocking it as well from my git hosting.

>
> If the latter, my problem is likely solved by blocking SWH at my network edge and opting out of their archive (or trying to) and the downstream training models they’ve already put it in. If the former, the only control I currently have to protect my license is removing packages from Guix which contain it. I don’t want that outcome.
>
> Noting also that the path here seems to be SWH->huggingface->bigcode training set, and the opt-out process for the training set appears to be a complete sham. To opt out, you must create a Github Issue; only one opt-out has *ever* been processed, and there are 200+ sitting there, many with no response for nearly a year[1]. I want no part of any of this.
>
> > > Is there a way to tell Guix to never download source from SWH?
> >
> > Also no, and it's probably best to do this at the network level on your systems/network if you want this to be the case.
>
> I’ll investigate this, though I’d prefer if there was a way to configure source mirrors in the Guix daemon.
>
> > Skipping back to this though:
> >
> > > I was also distressed to see how poorly they treated a developer who wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> >
> > This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.
> >
> > Like Software Heritage, there's cryptographical implications for rewriting the Git history and modifying source tarballs or nars that contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all reasons).
> >
> > It's probably worth working out our position on this in advance of someone asking.
>
> Yes, I agree that Guix needs a better solution for this.
>
> Thanks,
>
> — Ian
>
> [1]: https://github.com/bigcode-project/opt-out-v2/issues

T.
* Re: Concerns/questions around Software Heritage Archive
From: Vivien Kraus @ 2024-03-16 23:16 UTC
To: Christopher Baines, Ian Eure; +Cc: guix-devel

Hello!

On Saturday 16 March 2024 at 17:50 +0000, Christopher Baines wrote:
> This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.

I see two problems:

1. providing packages;
2. developing Guix itself.

I am sure that 1. is not a real problem: we could just ask the developer to release a new version incrementing the patch number, upgrade it on our side, and forget the old version. Garbage collection would ultimately get rid of the old tarballs.

2. is more difficult, because Guix contributors sometimes change their names too, and a commit reading “update my name” is not the best solution. If I understand correctly, rewriting the history would be understood as a “downgrade attack”, contrary to the ftfy case where the developer could rewrite the history without such consequences. Is my understanding correct?
* Re: Concerns/questions around Software Heritage Archive
From: Tomas Volf @ 2024-03-16 23:27 UTC
To: Vivien Kraus; +Cc: Christopher Baines, Ian Eure, guix-devel

On 2024-03-17 00:16:26 +0100, Vivien Kraus wrote:
> Hello!
>
> On Saturday 16 March 2024 at 17:50 +0000, Christopher Baines wrote:
> > This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix itself.
>
> I see two problems:
>
> 1. providing packages;
> 2. developing Guix itself.
>
> I am sure that 1. is not a real problem: we could just ask the developer to release a new version incrementing the patch number, upgrade it on our side, and forget the old version. Garbage collection would ultimately get rid of the old tarballs.

How would that approach interact with `guix time-machine'? If a developer takes the approach of the package mentioned here (rewriting the history), would that not cause the previous version to no longer be buildable, since the commit would no longer exist?

I am not sure what the developer would do for old tarballs in this situation. Re-release them from the rewritten history or just drop them? Either would be a problem. Or would they not care about the dead name in the tarballs?

Currently SWH protects against the first (git commit); I am not sure if there is any protection against the second (does SWH ingest tarballs as well?).

Either I am missing something, or this would actually be a problem for the time-machine use case.

> 2. is more difficult, because Guix contributors sometimes change their names too, and a commit reading “update my name” is not the best solution. If I understand correctly, rewriting the history would be understood as a “downgrade attack”, contrary to the ftfy case where the developer could rewrite the history without such consequences. Is my understanding correct?

For my use case, using .mailmap was enough, but that was not a dead name situation. However, it is a solution that works today and changes the name visible in most git operations (afaict) without modifying the history. So it is something to consider.

Tomas
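The `.mailmap` approach Tomas mentions maps an identity recorded in commits to a preferred one when Git displays history, leaving the commits (and so their hashes) untouched. A minimal sketch of such a file at the root of a repository, with made-up names and addresses:

```
# Replace only the name on commits made with this e-mail address.
New Name <dev@example.org>
# Replace name and address on commits matching both the old name and old address.
New Name <new@example.org> Old Name <old@example.org>
```

`git shortlog` applies the mapping, `git log` does too in recent Git versions (the `log.mailmap` setting, or `--use-mailmap`), and `git check-mailmap "Old Name <old@example.org>"` prints the mapped identity. The first, address-only form avoids writing the old name into the file, although it of course remains in the commit objects themselves.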
* Fw: Re: Concerns/questions around Software Heritage Archive
From: Ryan Prior @ 2024-03-16 23:40 UTC
To: Guix Devel

[I intended to CC the following to guix-devel but forgot:]

------- Forwarded Message -------
From: Ryan Prior <rprior@protonmail.com>
Date: On Saturday, March 16th, 2024 at 6:36 PM
Subject: Re: Concerns/questions around Software Heritage Archive
To: Vivien Kraus <vivien@planete-kraus.eu>

> On Saturday, March 16th, 2024 at 6:13 PM, Vivien Kraus vivien@planete-kraus.eu wrote:
>
> > 2. is more difficult, because Guix contributors sometimes change their names too, and a commit reading “update my name” is not the best solution. If I understand correctly, rewriting the history would be understood as a “downgrade attack”, contrary to the ftfy case where the developer could rewrite the history without such consequences. Is my understanding correct?
>
> It's only a problem IMO because we make the decision to treat Guix as an append-only series of commits and treat any other outcome as a potential attack. One alternate solution would be to allow provision of an authenticated alternate-history data structure, which indicates a set of (old commit hash, new commit hash) tuples going back to the first rewritten commit in the history, and the whole thing would be signed by a Guix committer. That way, the updating Guix client can rewind history, apply the new commit(s), verify that the old chain and new chain match what's provided in the alternate-history structure & that its signature is valid. Thus verified, the Guix installation could continue without needing to allow a downgrade exception.
> > Perhaps there are much better ways of handling this, but I propose it in hopes of clarifying that there are technical solutions which preserve integrity while permitting history rewrites in situations where it is desirable. > > I have requested previously that some commits I've provided be rewritten to update my name. In my case, it's because I've sometimes misconfigured my email software such that some commits by me are signed just "ryan" or "Ryan Prior via Protonmail" or similar, rather than my preference which is "Ryan Prior". > > In my case this causes me no harm and is simply an annoyance, so when I encountered resistance to rewriting the offending commits, I dropped the matter, and I still consider it dropped and settled. Even if we developed the capability to securely present a rewritten history, I wouldn't demand that such be used to address small concerns like mine. > > However, I know we have at least two trans Guix contributors. Do they have any commits with their deadnames on them? Not that this is an invitation to go look; they can tell us if this is a concern worth raising. I include the detail to clarify that this is not a distant concern. Perhaps they have been silent thus far for the same reason that I have, because the policy against rewrites presents too high a barrier? (Or it may not bother them, or maybe they used their initials which are the same etc?) In any case I think it would be courteous to develop a procedure by which we could remove deadnames from old commits, or otherwise remove harmful information from Guix's development history, should this become a necessity. > > Ryan ^ permalink raw reply [flat|nested] 61+ messages in thread
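Ryan's alternate-history proposal can be sketched roughly as follows. This is for illustration only and is not part of Guix: `AlternateHistory`, `sign`, and `verify_rewrite` are hypothetical names, commit chains are modeled as plain lists of hashes, and an HMAC with a shared key stands in for the public-key (OpenPGP) signature a Guix committer would actually have to provide.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class AlternateHistory:
    """Hypothetical signed record of a history rewrite."""
    pairs: tuple  # ((old_hash, new_hash), ...), oldest rewritten commit first
    signature: bytes


def payload(pairs):
    # Canonical bytes that the committer signs.
    return "\n".join(f"{old} {new}" for old, new in pairs).encode()


def sign(pairs, key):
    # Stand-in for a committer's OpenPGP signature: HMAC-SHA256 with a shared key.
    return hmac.new(key, payload(pairs), hashlib.sha256).digest()


def verify_rewrite(record, local_chain, fetched_chain, key):
    """Accept the fetched history only if the signed record maps the
    client's commits, from the first rewritten one on, onto it."""
    if not record.pairs:
        return False
    if not hmac.compare_digest(sign(record.pairs, key), record.signature):
        return False
    olds = [old for old, _ in record.pairs]
    news = [new for _, new in record.pairs]
    try:
        start = local_chain.index(olds[0])  # first rewritten commit
    except ValueError:
        return False
    # Commits before the rewrite point must be unchanged.
    if fetched_chain[:start] != local_chain[:start]:
        return False
    # The client's commits from that point on must appear, in order, in the record.
    tail = local_chain[start:]
    if olds[:len(tail)] != tail:
        return False
    # The fetched chain must hold the replacement commits at the same positions;
    # anything after them is ordinary new history.
    return fetched_chain[start:start + len(news)] == news


# Hypothetical example: commits a1 and a2 were rewritten as b1 and b2.
key = b"committer-key"
pairs = (("a1", "b1"), ("a2", "b2"))
record = AlternateHistory(pairs, sign(pairs, key))
local = ["r0", "a1", "a2"]
fetched = ["r0", "b1", "b2", "c3"]
print(verify_rewrite(record, local, fetched, key))  # True
```

A record signed with any other key, or a fetched chain whose replacement commits differ from the record, is rejected, so the client never has to fall back on a blanket downgrade exception.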
* Re: Concerns/questions around Software Heritage Archive 2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure 2024-03-16 17:50 ` Christopher Baines @ 2024-03-16 17:58 ` MSavoritias 2024-03-18 9:50 ` Please hold your horses Simon Tournier 2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior ` (5 subsequent siblings) 7 siblings, 1 reply; 61+ messages in thread From: MSavoritias @ 2024-03-16 17:58 UTC (permalink / raw) To: Ian Eure, guix-devel On 3/16/24 17:52, Ian Eure wrote: > Hi Guixy people, > > I’d never heard of SWH before I started hacking on Guix last fall, and > it struck me as rather a good idea. However, I’ve seen some things > lately which have soured me on them. > > They appear to be using the archive to build LLMs: > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ > > I was also distressed to see how poorly they treated a developer who > wished to update their name: > https://cohost.org/arborelia/post/4968198-the-software-heritag > https://cohost.org/arborelia/post/5052044-the-software-heritag > > GPL’d software I’ve created has been packaged for Guix, which I assume > means it’s been included in SWH. While I’m dealing with their (IMO: > unethical) opt-out process, I likely also need to stop new copies from > being uploaded again in the future. > > Is there a way to indicate, in a Guix package, that it should *never* > be included in SWH? > > Is there a way to tell Guix to never download source from SWH? > > I want absolutely nothing to do with them. > > Thanks, > > — Ian > Oh no. Apparently they have A.I. and blockchain besides being also transphobic. Thanks for the heads up. That's all I needed to know to never touch whatever they are doing. MSavoritias ^ permalink raw reply [flat|nested] 61+ messages in thread
* Please hold your horses 2024-03-16 17:58 ` MSavoritias @ 2024-03-18 9:50 ` Simon Tournier 0 siblings, 0 replies; 61+ messages in thread From: Simon Tournier @ 2024-03-18 9:50 UTC (permalink / raw) To: MSavoritias, Ian Eure, guix-devel Hi MSavoritias, Could you please stop propagating tangential or opinionated views? Please hold your horses. You wrote several times about Software Heritage: > being also transphobic. […] > I would go a step further actually. Software Heritage is effectively > breaking CoC of Guix now. […] > Software Heritage > implements a process that respects trans rights Software Heritage should > not be welcome in Guix Spaces. […] > Software > Heritage is breaking CoC here. This language is not acceptable on Guix channels of communication. It seems much better to me to stay open and give the benefit of the doubt. Let's avoid bold conclusions and prefer constructive arguments. For instance, I refrain from characterizing your opinion because it would not be helpful… So I apply my own advice and give you the benefit of the doubt. Cheers, simon ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure 2024-03-16 17:50 ` Christopher Baines 2024-03-16 17:58 ` MSavoritias @ 2024-03-16 21:37 ` Ryan Prior 2024-03-17 9:39 ` Lars-Dominik Braun 2024-03-17 13:03 ` Olivier Dion ` (4 subsequent siblings) 7 siblings, 1 reply; 61+ messages in thread From: Ryan Prior @ 2024-03-16 21:37 UTC (permalink / raw) To: Ian Eure; +Cc: guix-devel On Saturday, March 16th, 2024 at 10:52 AM, Ian Eure <ian@retrospec.tv> wrote: > > > Hi Guixy people, > [...] > I was also distressed to see how poorly they treated a developer > who wished to update their name: > https://cohost.org/arborelia/post/4968198-the-software-heritag > https://cohost.org/arborelia/post/5052044-the-software-heritag I read these posts with interest. It is worth noting that the complained-about organization, Inria, supports Guix as well & has close historical ties to the project (although it does not have decision-making power here, AFAIK). It is a shame that Inria have treated this matter with such apparent disregard. I have heard folks in the Guix maintenance sphere claim that we never rewrite git history in Guix, as a matter of policy. I believe we should revisit that policy (is it actually written anywhere?) with an eye towards possible exceptions, and develop a mechanism for securely maintaining continuity of Guix installations after history has been rewritten so that we maintain this as a technical possibility in the future, even if we should choose to use it sparingly. Ryan ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior @ 2024-03-17 9:39 ` Lars-Dominik Braun 2024-03-17 9:47 ` MSavoritias 2024-03-18 14:04 ` pinoaffe 0 siblings, 2 replies; 61+ messages in thread From: Lars-Dominik Braun @ 2024-03-17 9:39 UTC (permalink / raw) To: Ryan Prior; +Cc: Ian Eure, guix-devel Hey, > I have heard folks in the Guix maintenance sphere claim that we never rewrite git history in Guix, as a matter of policy. I believe we should revisit that policy (is it actually written anywhere?) with an eye towards possible exceptions, and develop a mechanism for securely maintaining continuity of Guix installations after history has been rewritten so that we maintain this as a technical possibility in the future, even if we should choose to use it sparingly. the fallout of rewriting Guix’ git history would be devastating. It would break every single Guix installation, because a) `guix pull` authenticates commits and we might lose our trust anchor if we rewrite history earlier than the introduction of this feature, b) `guix pull` outright rejects changes to the commit history to prevent downgrade attacks. Additionally it would break every single existing usage of the time machine and thereby completely defeat the goal of providing reproducible software environments since the commit hash is used to identify the point in time to jump to. I doubt developing “mechanisms” – whatever they look like – would be worth the effort. Our contributors matter, but so do our users. Never ever rewriting our git history is a tradeoff we should make for our users. Lars ^ permalink raw reply [flat|nested] 61+ messages in thread
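The downgrade protection Lars mentions in b) amounts to accepting an update only when the new commit descends from the one already in use. A minimal sketch, assuming a hypothetical in-memory map from each commit to its parents (Guix's real check is written in Scheme and also authenticates commit signatures), shows why a rewritten history is refused:

```python
def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def accept_update(parents, current, candidate):
    # Only fast-forwards are allowed: the commit we already have must be an
    # ancestor of the one we are asked to move to.
    return is_ancestor(parents, current, candidate)


# Hypothetical history: c1 <- c2 <- c3, plus a rewritten branch c1 <- c2x <- c3x.
PARENTS = {"c2": ["c1"], "c3": ["c2"], "c2x": ["c1"], "c3x": ["c2x"]}

print(accept_update(PARENTS, "c2", "c3"))   # True: ordinary update
print(accept_update(PARENTS, "c3", "c2"))   # False: downgrade
print(accept_update(PARENTS, "c3", "c3x"))  # False: rewritten history
```

Lars's time-machine point is related: a recorded commit hash such as `c3` no longer names any commit in the rewritten repository, so anything pinned to it stops resolving there.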
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 9:39 ` Lars-Dominik Braun @ 2024-03-17 9:47 ` MSavoritias 2024-03-17 11:53 ` paul 2024-03-17 16:20 ` Concerns/questions around Software Heritage Archive Ian Eure 2024-03-18 14:04 ` pinoaffe 1 sibling, 2 replies; 61+ messages in thread From: MSavoritias @ 2024-03-17 9:47 UTC (permalink / raw) To: Lars-Dominik Braun, Ryan Prior; +Cc: Ian Eure, guix-devel On 3/17/24 11:39, Lars-Dominik Braun wrote: > Hey, > >> I have heard folks in the Guix maintenance sphere claim that we never rewrite git history in Guix, as a matter of policy. I believe we should revisit that policy (is it actually written anywhere?) with an eye towards possible exceptions, and develop a mechanism for securely maintaining continuity of Guix installations after history has been rewritten so that we maintain this as a technical possibility in the future, even if we should choose to use it sparingly. > the fallout of rewriting Guix’ git history would be devastating. It > would break every single Guix installation, because > > a) `guix pull` authenticates commits and we might lose our trust anchor > if we rewrite history earlier than the introduction of this feature, > b) `guix pull` outright rejects changes to the commit history to prevent > downgrade attacks. > > Additionally it would break every single existing usage of the > time machine and thereby completely defeat the goal of providing > reproducible software environments since the commit hash is used to > identify the point in time to jump to. > > I doubt developing “mechanisms” – whatever they look like – would > be worth the effort. Our contributors matter, but so do our users. Never > ever rewriting our git history is a tradeoff we should make for our users. > > Lars > > That's a good point, in the sense that it's a tradeoff here, and I absolutely agree. But let me add some food for thought: 1. Were the social aspects considered when the system came into place?
2. Is it more important for the system to stay as is than to welcome new contributors? 3. You mention "it's a tradeoff we should make for our users". How many trans people were involved in that decision, and how much did their opinion matter in it? I am saying this because giving power to people (what are called users) is not only handing them code or making sure everything is free software. It's also the hard part of making sure that the voices of people who cannot code are heard, participate, and are taken into account. I am not trying to say what we should do about commit history rewriting here. Personally, I think the tradeoffs are probably worth it. But I am trying to say what Guix should do, as a culture, about including or excluding people in the case of Software Heritage. MSavoritias ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 9:47 ` MSavoritias @ 2024-03-17 11:53 ` paul 2024-03-17 11:57 ` MSavoritias ` (2 more replies) 2024-03-17 16:20 ` Concerns/questions around Software Heritage Archive Ian Eure 1 sibling, 3 replies; 61+ messages in thread From: paul @ 2024-03-17 11:53 UTC (permalink / raw) To: guix-devel Hi all, thank you MSavoritias for bringing up points that many of us share. It's clearly a tradeoff what to do about the past. For the future, as Christopher already stated, we need a serious solution that we can uphold as a free software project that does not alienate users or contributors. My opinion is that including names is just wrong, not only because of deadnames, but because in general having a database with a column first_name and a column second_name is something only a 35 yrs old white cis boy could have thought was a good idea to model the spectrum of names humans use all over the world: https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ If we really needed to identify contributors, and obviously Guix doesn't, we could use a UUID/machine-readable identifier which can then be mapped to a displayed name. I believe git can already be configured to do so. giacomo ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 11:53 ` paul @ 2024-03-17 11:57 ` MSavoritias 2024-03-17 14:57 ` Richard Sent 2024-03-17 16:28 ` Ian Eure 2024-03-17 12:51 ` Tomas Volf 2024-03-20 15:25 ` contributor uuid (was Re: Concerns/questions around Software Heritage Archive) bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6 2 siblings, 2 replies; 61+ messages in thread From: MSavoritias @ 2024-03-17 11:57 UTC (permalink / raw) To: paul, guix-devel On 3/17/24 13:53, paul wrote: > Hi all , > > thank you MSavoritias for bringing up points that many of us share. > It's clearly a tradeoff what to do about the past. For the future, as > Christpher already stated, we need a serious solution that we can > uphold as a free software project that does not alienate users or > contributors. > > My opinion is that names are just wrong to be included, not only > because of deadnames, but in general having a database with a column > first_name and a column second_name is something only a 35 yrs old > white cis boy could have thought was a good idea to model the spectrum > of names humans use all over the world: > > https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ > > > If we'd really need to identify contributors, and obviously Guix > doesn't, we could use an UUID/machine readable identifier which can > then be mapped to a displayed name. I believe git can already be > configured to do so. > > > giacomo > > The uuid sounds like a very interesting solution indeed. I wonder how easy it could be to add it to git. I agree that making some rules about names that are going to be wrong at some point or in some place is the wrong solution long term for sure. MSavoritias ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 11:57 ` MSavoritias @ 2024-03-17 14:57 ` Richard Sent 2024-03-17 16:28 ` Ian Eure 1 sibling, 0 replies; 61+ messages in thread From: Richard Sent @ 2024-03-17 14:57 UTC (permalink / raw) To: MSavoritias; +Cc: paul, guix-devel Regarding Guix development, if the decision is made to not change existing policy or implement another authorship mechanism, I think some text could be added to the manual explaining such. Contributing to Guix is an intentional thing, unlike SWH. Updating the manual means contributors will, at least, be making an informed decision to contribute, knowing that names cannot be changed in the Guix repo's history due to X, Y, and Z consequences in Guix's functionality. I'm not suggesting that this solution is "the end-all-be-all" or invalidates alternative avenues, but I feel it is an improvement over the status quo with no negative tradeoffs. I would not support a solution that obsoletes time-machine or requires regular manual intervention during upgrades. Personally as a new contributor I find it gratifying to see my name in the commit history. -- Take it easy, Richard Sent Making my computer weirder one commit at a time. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 11:57 ` MSavoritias 2024-03-17 14:57 ` Richard Sent @ 2024-03-17 16:28 ` Ian Eure 1 sibling, 0 replies; 61+ messages in thread From: Ian Eure @ 2024-03-17 16:28 UTC (permalink / raw) To: MSavoritias; +Cc: paul, guix-devel MSavoritias <email@msavoritias.me> writes: > On 3/17/24 13:53, paul wrote: >> Hi all , >> >> thank you MSavoritias for bringing up points that many of us >> share. It's clearly a tradeoff what to do about the past. For >> the >> future, as Christpher already stated, we need a serious >> solution >> that we can uphold as a free software project that does not >> alienate >> users or contributors. >> >> My opinion is that names are just wrong to be included, not >> only >> because of deadnames, but in general having a database with a >> column >> first_name and a column second_name is something only a 35 yrs >> old >> white cis boy could have thought was a good idea to model the >> spectrum of names humans use all over the world: >> >> https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ >> If we'd really need to identify contributors, and obviously >> Guix >> doesn't, we could use an UUID/machine readable identifier which >> can >> then be mapped to a displayed name. I believe git can already >> be >> configured to do so. >> >> >> giacomo >> >> > The uuid sounds like a very interesting solution indeed. > > I wonder how easy it could be to add it to git. > This also seems like interesting territory to explore. The concerns raised around rewriting history have valid points; I think it’s impractical to rewrite history any time a change needs to happen, as that would be an ongoing source of disruption. But rewriting history *once*, to switch to a more general mechanism, seems like a reasonable trade to me. This also presents an opportunity: we could combine this with a default branch switch from master to main. 
A news entry left as the final commit in master could inform people of whatever steps may be needed to update (if that can’t be automated), and the main branch would contain the rewritten history. It’s certainly not a perfect solution, but it seems pragmatic. — Ian ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 11:53 ` paul 2024-03-17 11:57 ` MSavoritias @ 2024-03-17 12:51 ` Tomas Volf 2024-03-17 23:56 ` Attila Lendvai 2024-03-20 15:25 ` contributor uuid (was Re: Concerns/questions around Software Heritage Archive) bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6 2 siblings, 1 reply; 61+ messages in thread From: Tomas Volf @ 2024-03-17 12:51 UTC (permalink / raw) To: paul; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 288 bytes --] On 2024-03-17 12:53:54 +0100, paul wrote: > only a 35 yrs old white cis boy Could you stop labeling people like this? It makes me feel uncomfortable and not welcomed... T. -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 12:51 ` Tomas Volf @ 2024-03-17 23:56 ` Attila Lendvai 0 siblings, 0 replies; 61+ messages in thread From: Attila Lendvai @ 2024-03-17 23:56 UTC (permalink / raw) To: Tomas Volf; +Cc: paul, guix-devel > only a 35 yrs old white cis boy you're judging a group of individuals, namely those who were handed the cis white male mix at the genetic lottery, as a uniform blob. and maybe even somewhat deplorable, if i'm reading you right. does it make sense to judge an individual based on some coincidental properties? or really, based on anything other than their actions? does it make sense to discuss the actions/morality of a group of individuals that is formed based on some coincidental properties? e.g. what can we say about the morality of all the blond people? and ultimately, is that an effective way of speaking up for human rights and welcoming environments -- of all things? maybe it's time to take a thorough look at the book that you're preaching from? if i may, let me attempt to inspire you: “The world is changed by your example, not by your opinion.” — Paulo Coelho (1947–) % “Yesterday I was clever, so I wanted to change the world. Today I am wise, so I am changing myself.” — Rumi (1207–1273) % “If there is to be peace in the world, There must be peace in the nations. If there is to be peace in the nations, There must be peace in the cities. If there is to be peace in the cities, There must be peace between neighbors. If there is to be peace between neighbors, There must be peace in the home. 
If there is to be peace in the home, There must be peace in the heart.” — Lao Tzu (sixth century BC) % “A man of humanity is one who, in seeking to establish himself, finds a foothold for others and who, in desiring attaining himself, helps others to attain.” — Confucius (551–479 BC) % “To put the world in order, we must first put the nation in order; to put the nation in order, we must first put the family in order; to put the family in order; we must first cultivate our personal life; we must first set our hearts right.” — Confucius (551–479 BC) % “Until we have met the monsters in ourselves, we keep trying to slay them in the outer world. And we find that we cannot. For all darkness in the world stems from darkness in the heart. And it is there that we must do our work.” — Marianne Williamson (1952–), 'Everyday Grace: Having Hope, Finding Forgiveness And Making Miracles' (2004) % “If things go wrong in the world, this is because something is wrong with the individual, because something is wrong with me. Therefore, if I am sensible, I shall put myself right first” — Carl Jung (1875–1961), 'The Meaning of Psychology for Modern Man' -- • attila lendvai • PGP: 963F 5D5F 45C7 DFCD 0A39 -- “If liberty means anything at all, it means the right to tell people what they do not want to hear.” — George Orwell (1903–1950) ^ permalink raw reply [flat|nested] 61+ messages in thread
* contributor uuid (was Re: Concerns/questions around Software Heritage Archive) 2024-03-17 11:53 ` paul 2024-03-17 11:57 ` MSavoritias 2024-03-17 12:51 ` Tomas Volf @ 2024-03-20 15:25 ` bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6 2 siblings, 0 replies; 61+ messages in thread From: bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6 @ 2024-03-20 15:25 UTC (permalink / raw) To: guix-devel [-- Attachment #1: Type: text/plain, Size: 838 bytes --] paul <goodoldpaul@autistici.org> writes: [...] > If we'd really need to identify contributors, and obviously Guix > doesn't, we could use an UUID/machine readable identifier which can then > be mapped to a displayed name. I believe git can already be configured > to do so. every contributor wishing to do so can already choose to use the preferred uuid/email metadata they wish and ask some person with commit access to add a uuid/display-name mapping via git .mailmap. unfortunately, this does not resolve the problem of rewriting history with git, because Guix artifacts also contain source code that usually contains information about the author, including names that could potentially become "deadnames" in the future happy hacking! -- bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 849 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
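The `.mailmap` approach described above can be illustrated with a small resolver. This is a minimal sketch, assuming only the `Name <email>` and `Name <proper-email> <commit-email>` entry forms (git supports more) and a hypothetical UUID-style address; in practice git applies the mapping itself, for example through `git log --use-mailmap` or `git check-mailmap`. As the message notes, the mapping only changes how authors are displayed: the commits, and any names inside source files, stay as they are.

```python
import re

# "Display Name <email>" optionally followed by "<commit-email>".
ENTRY = re.compile(r"^\s*([^<]*?)\s*<([^>]+)>(?:\s*<([^>]+)>)?\s*$")


def parse_mailmap(text):
    """Map commit email -> (display name, canonical email)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = ENTRY.match(line)
        if not match or not match.group(1):
            continue  # blank line, or an entry form this sketch ignores
        name, first, second = match.groups()
        commit_email = second or first
        # Emails are compared case-insensitively in this sketch.
        mapping[commit_email.lower()] = (name, first)
    return mapping


def display(mapping, name, email):
    """Return the (name, email) pair to show for a commit's author."""
    return mapping.get(email.lower(), (name, email))


MAILMAP = """
# Hypothetical entries.
Alex Example <1b4e28ba-2fa1-11d2-883f-0016d3cca427@contributors.invalid>
Alex Example <alex@example.org> <old-address@example.org>
"""

mapping = parse_mailmap(MAILMAP)
print(display(mapping, "A. Previous", "old-address@example.org"))
# ('Alex Example', 'alex@example.org')
```

Authors with no matching entry are shown unchanged, which is why the mapping has to be added by someone with commit access for each contributor who wants it.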
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 9:47 ` MSavoritias 2024-03-17 11:53 ` paul @ 2024-03-17 16:20 ` Ian Eure 2024-03-17 16:55 ` MSavoritias 1 sibling, 1 reply; 61+ messages in thread From: Ian Eure @ 2024-03-17 16:20 UTC (permalink / raw) To: MSavoritias; +Cc: Lars-Dominik Braun, Ryan Prior, guix-devel MSavoritias <email@msavoritias.me> writes: > On 3/17/24 11:39, Lars-Dominik Braun wrote: >> Hey, >> >>> I have heard folks in the Guix maintenance sphere claim that >>> we >> never rewrite git history in Guix, as a matter of policy. I >> believe >> we should revisit that policy (is it actually written >> anywhere?) >> with an eye towards possible exceptions, and develop a >> mechanism for >> securely maintaining continuity of Guix installations after >> history >> has been rewritten so that we maintain this as a technical >> possibility in the future, even if we should choose to use it >> sparingly. >> the fallout of rewriting Guix’ git history would be >> devastating. It >> would break every single Guix installation, because >> >> a) `guix pull` authenticates commits and we might lose our >> trust anchor >> if we rewrite history earlier than the introduction of this >> feature, >> b) `guix pull` outright rejects changes to the commit history >> to prevent >> downgrade attacks. >> >> Additionally it would break every single existing usage of the >> time machine and thereby completely defeat the goal of >> providing >> reproducible software environments since the commit hash is >> used to >> identify the point in time to jump to. >> >> I doubt developing “mechanisms” – whatever they look like – >> would >> be worth the effort. Our contributors matter, but so do our >> users. Never >> ever rewriting our git history is a tradeoff we should make for >> our users. >> >> Lars >> >> > Thats a good point. in the sense that its a tradeoff here and I > absolutely agree. > > > But let me add some food for thought here: > > 1. 
Were the social aspects considered when the system came into > place? > > 2. Is it more important for the system to stay as is than to > welcome > new contributors? > > 3. You mention "its a tradeoff we should make for our > users". How many > trans people where involved in that decision and how much did > their > opinion matter in this? > > > I am saying this because giving power to people(what is called > users) > is not only handling them code or make sure everything is free > software. > > Its also the hard part of making sure the voices of people that > can > not code is heard and is participating and taking in mind. > Just want to say that I appreciate and agree with your thoughtful words. I’d also note that name changes aren’t a concern limited to trans people, and framing this as "we have to upend everything Because Transgender" is both wrong and feels pretty bad to me. Anyone can change their name at any time for any reason, or no reason at all, and may wish to update historical references to their previous names. Having a mechanism to support this is, in my view, a matter of basic decency and respect for all humans. Thanks, — Ian ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 16:20 ` Concerns/questions around Software Heritage Archive Ian Eure @ 2024-03-17 16:55 ` MSavoritias 0 siblings, 0 replies; 61+ messages in thread From: MSavoritias @ 2024-03-17 16:55 UTC (permalink / raw) To: Ian Eure; +Cc: Lars-Dominik Braun, Ryan Prior, guix-devel On 3/17/24 18:20, Ian Eure wrote: > > MSavoritias <email@msavoritias.me> writes: > >> On 3/17/24 11:39, Lars-Dominik Braun wrote: >>> Hey, >>> >>>> I have heard folks in the Guix maintenance sphere claim that we >>> never rewrite git history in Guix, as a matter of policy. I believe >>> we should revisit that policy (is it actually written anywhere?) >>> with an eye towards possible exceptions, and develop a mechanism for >>> securely maintaining continuity of Guix installations after history >>> has been rewritten so that we maintain this as a technical >>> possibility in the future, even if we should choose to use it >>> sparingly. >>> the fallout of rewriting Guix’ git history would be devastating. It >>> would break every single Guix installation, because >>> >>> a) `guix pull` authenticates commits and we might lose our trust anchor >>> if we rewrite history earlier than the introduction of this feature, >>> b) `guix pull` outright rejects changes to the commit history to >>> prevent >>> downgrade attacks. >>> >>> Additionally it would break every single existing usage of the >>> time machine and thereby completely defeat the goal of providing >>> reproducible software environments since the commit hash is used to >>> identify the point in time to jump to. >>> >>> I doubt developing “mechanisms” – whatever they look like – would >>> be worth the effort. Our contributors matter, but so do our users. >>> Never >>> ever rewriting our git history is a tradeoff we should make for our >>> users. >>> >>> Lars >>> >>> >> Thats a good point. in the sense that its a tradeoff here and I >> absolutely agree. 
>> >> >> But let me add some food for thought here: >> >> 1. Were the social aspects considered when the system came into place? >> >> 2. Is it more important for the system to stay as is than to welcome >> new contributors? >> >> 3. You mention "its a tradeoff we should make for our users". How many >> trans people where involved in that decision and how much did their >> opinion matter in this? >> >> >> I am saying this because giving power to people(what is called users) >> is not only handling them code or make sure everything is free >> software. >> >> Its also the hard part of making sure the voices of people that can >> not code is heard and is participating and taking in mind. >> > > Just want to say that I appreciate and agree with your thoughtful words. > > I’d also note that name changes aren’t a concern limited to trans > people, and framing this as "we have to upend everything Because > Transgender" is both wrong and feels pretty bad to me. Anyone can > change their name at any time for any reason, or no reason at all, and > may wish to update historical references to their previous names. > Having a mechanism to support this is, in my view, a matter of basic > decency and respect for all humans. > > Thanks, > > — Ian You are right. I failed to see how it could be desirable for other people too. I agree it should be done for everybody. MSavoritias ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-17 9:39 ` Lars-Dominik Braun 2024-03-17 9:47 ` MSavoritias @ 2024-03-18 14:04 ` pinoaffe 1 sibling, 0 replies; 61+ messages in thread From: pinoaffe @ 2024-03-18 14:04 UTC (permalink / raw) To: Lars-Dominik Braun; +Cc: Ryan Prior, Ian Eure, guix-devel Lars-Dominik Braun <lars@6xq.net> writes: >> I have heard folks in the Guix maintenance sphere claim that we >> never rewrite git history in Guix, as a matter of policy. I believe we >> should revisit that policy (is it actually written anywhere?) with an >> eye towards possible exceptions, and develop a mechanism for securely >> maintaining continuity of Guix installations after history has been >> rewritten so that we maintain this as a technical possibility in the >> future, even if we should choose to use it sparingly. > > the fallout of rewriting Guix’ git history would be devastating. It > would break every single Guix installation, because > > a) `guix pull` authenticates commits and we might lose our trust anchor > if we rewrite history earlier than the introduction of this feature, > b) `guix pull` outright rejects changes to the commit history to prevent > downgrade attacks. > > Additionally it would break every single existing usage of the > time machine and thereby completely defeat the goal of providing > reproducible software environments since the commit hash is used to > identify the point in time to jump to. > > I doubt developing “mechanisms” – whatever they look like – would > be worth the effort. Our contributors matter, but so do our users. Never > ever rewriting our git history is a tradeoff we should make for our users. 
There may come a time when we don't really have another option but to rewrite (part of) history (e.g., if someone vandalizes the repository with incriminating or illegal files). I hope that such vandalism would be caught quickly, so that most Guix installations would not be infected, but it may be a good idea to plan what to do in the unfortunate event that it becomes necessary to rewrite Guix history. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure ` (2 preceding siblings ...) 2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior @ 2024-03-17 13:03 ` Olivier Dion 2024-03-17 17:57 ` Ludovic Courtès ` (3 subsequent siblings) 7 siblings, 0 replies; 61+ messages in thread From: Olivier Dion @ 2024-03-17 13:03 UTC (permalink / raw) To: Ian Eure, guix-devel On Sat, 16 Mar 2024, Ian Eure <ian@retrospec.tv> wrote: [...] > GPL’d software I’ve created has been packaged for Guix, which I assume > means it’s been included in SWH. While I’m dealing with their (IMO: > unethical) opt-out process, I likely also need to stop new copies from > being uploaded again in the future. Even without Guix, SWH could upload your projects into their "database". In fact, I believe anyone can ask SWH to archive your project. So even if you ask Guix not to do the archiving, anyone contributing might change that in the future. I believe that preventing Guix from archiving your software is a symbolic stance -- which I respect -- but it would put more burden on the Guix developers. On the other hand, if enough people refuse to have their software archived by SWH, this might push Guix in a new direction for long-term source archiving. I'm not a lawyer, but perhaps a first solution -- for the AI stuff -- would be to add an exception to the GPL that prevents AI from training on the code. Alas, as usual, our legislators are late on that matter, so that might not even work. [...] -- Olivier Dion oldiob.ca ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Concerns/questions around Software Heritage Archive 2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure ` (3 preceding siblings ...) 2024-03-17 13:03 ` Olivier Dion @ 2024-03-17 17:57 ` Ludovic Courtès 2024-03-20 17:22 ` the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) Giovanni Biscuolo 2024-03-18 9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier ` (2 subsequent siblings) 7 siblings, 1 reply; 61+ messages in thread From: Ludovic Courtès @ 2024-03-17 17:57 UTC (permalink / raw) To: Ian Eure; +Cc: guix-devel Hi, Ian Eure <ian@retrospec.tv> skribis: > They appear to be using the archive to build LLMs: > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ To me, if the end result is that copyleft licenses are ignored, as is the case with Microsoft’s CoPilot, then we have a problem. That’s no excuse, but the problem goes beyond SWH: people upload copies of repositories to GitHub without one’s consent (nothing to blame them for, it’s free software), and then code ends up being used as training data for CoPilot. As you may have seen, this is being discussed on the Fediverse. I’d like to leave the SWH people time to reply to concerns that have been raised. > I was also distressed to see how poorly they treated a developer who > wished to update their name: > https://cohost.org/arborelia/post/4968198-the-software-heritag > https://cohost.org/arborelia/post/5052044-the-software-heritag That’s another concern, with append-only storage in general, starting with Git. We should look for solutions that work for both contributors who change names and for users. This has happened several times in Guix and what people did was search/replace their name and adjust ‘.mailmap’. Thanks, Ludo’. ^ permalink raw reply [flat|nested] 61+ messages in thread
* the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-17 17:57 ` Ludovic Courtès @ 2024-03-20 17:22 ` Giovanni Biscuolo 2024-03-21 6:12 ` MSavoritias 0 siblings, 1 reply; 61+ messages in thread From: Giovanni Biscuolo @ 2024-03-20 17:22 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 8582 bytes --] Hello Ludovic and Guix devel community! Disclaimer: I've still not read all the relevant threads [3] [4], so please forgive me if I repeat some information already provided. What rights are we talking about? As a *free software* user do I have the right to redistribute /old/ copies of the source code and documentation I got in the past from the copyright holder, in any form (e.g. print)?... or to use old sources or documentation to develop derived work, with _attribution_, without asking for consent from the original authors and/or contact the original authors to ask them what is their current name? If yes, I would like to exercise all my rights without being harassed. Also, SHW and other organizations (re)distributing free software have their rights and should excercise them without being harassed. Ludovic Courtès <ludo@gnu.org> writes: [...] >> I was also distressed to see how poorly they treated a developer who >> wished to update their name: [1] https://cohost.org/arborelia/post/4968198-the-software-heritag [2] https://cohost.org/arborelia/post/5052044-the-software-heritag > That’s another concern, with append-only storage in general, starting > with Git. We should look for solutions that work for both contributors > who change names and for users. This has happened several times in Guix > and what people did was search/replace their name and adjust > ‘.mailmap’. 
This is a good solution, but unfortunately it is not what the author of the blog posts above [1] [2] and some people in this and other threads [3] [4] are asking SWH - and Guix and potentially all other people distributing copies of copyrighted works (e.g. documentation) - to do. They are asking to "rewrite history" [1] (of git... why not of other archives?): --8<---------------cut here---------------start------------->8--- I already fixed my name in my code. I updated the README and the copyright notice, and I ran git-filter-repo to rewrite the git history so it had always said my correct name, including in commits. This is a thing you can do. --8<---------------cut here---------------end--------------->8--- The author explicitly invokes the "right to rectification" (of the GDPR) [2]: --8<---------------cut here---------------start------------->8--- I give zero shits about the integrity of their data structures. I had already sent them a second email invoking the Right to Rectification, which it seemed like they ignored again, so it was time to get more formal. [...] Pursuant to Article 21.1 of the General Data Protection Regulation (GDPR), I object to the processing of my personal data by your organization, the Software Héritage archive. [...] Accordingly, please: * delete my data from your files and notify my request to the organizations to which you may have communicated it (Articles 17.1.c. and 19 of the GDPR); * if you are legally required to, tell me how long my data will be retained in your archive databases; * inform me of these matters as soon as possible, and at the latest within one month of receipt of this letter (Article 12.3 of the GDPR). --8<---------------cut here---------------end--------------->8--- People asking to rectify information /they/ _published_ on their own are obviously misinterpreting the relevant section of the GDPR (more on this later)...
and in fact, the SWH DPO reply is [2]: --8<---------------cut here---------------start------------->8--- Unfortunately, the deletion or modification of the software repositories you requested cannot be performed, for several reasons: * On the one hand, these developments involve several authors and are made available under open source licenses, which explicitly allow copying and redistribution * On the other hand, the mission of Software Heritage archive is to guarantee the availability of all versions of all publicly available source codes, and to ensure the integrity of these codes We understand the concern about the display of outdated identities, and for this reason a mechanism has been put in place to display a preferred identity across all the Software Heritage archive. --8<---------------cut here---------------end--------------->8--- But the author is still not satisfied with the solution proposed by SWH (and used by Guix for its contributors): --8<---------------cut here---------------start------------->8--- * I was not asking them to develop such a mechanism. I don't just want them to cosmetically change what they display, I want them to change the data. I can't trust the organization that contains the transphobe who had written their previous content policy to hold on to a substitution rule involving my deadname forever. --8<---------------cut here---------------end--------------->8--- «I want them to change the data», that is: rewrite history (of /all/ the copies of the repository archived by SWH, **fork** included?) The CNIL (the French data regulator) has been involved, but the author does not trust CNIL: --8<---------------cut here---------------start------------->8--- The explanation I can come up with is that CNIL and Inria are friends, and CNIL will never take action against Inria. --8<---------------cut here---------------end--------------->8--- Last but NOT least: what is this "right to rectification"?
...simple: --8<---------------cut here---------------start------------->8--- Art. 16 GDPR Right to rectification 1 The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her. 2 Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement. --8<---------------cut here---------------end--------------->8--- (https://gdpr-info.eu/art-16-gdpr/) Simple... really?!? The first question is: is the "deadname" of the author "inaccurate personal data concerning him or her", or is it "just" the /accurate/ name the person had before he or she changed it? ...but the most interesting part is the "suitable recital" n. 65: --8<---------------cut here---------------start------------->8--- 1 A data subject should have the right to have personal data concerning him or her rectified and a ‘right to be forgotten’ where the retention of such data infringes this Regulation or Union or Member State law to which the controller is subject. [...] 5 However, the further retention of the personal data should be lawful where it is necessary, for exercising the right of freedom of expression and information, for compliance with a legal obligation, for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller, on the grounds of public interest in the area of public health, for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, or for the establishment, exercise or defence of legal claims. --8<---------------cut here---------------end--------------->8--- (https://gdpr-info.eu/recitals/no-65/) Is SWH (and Guix, and... *me*) exercising its rights of /archiving/ and /scientific or (and!) historical research/? I say yes.
Last question: does SWH (and Guix, and *me*) have the right to archive and redistribute free software for historical purposes? But also: is the retention of the "deadname" even necessary to exercise or defend legal claims about _copyright_ issues? And also: do I have the right to retain the integrity of data structures I obtained from copyright holders, or do I have to throw them away if one of the copyright holders asks me to retroactively rewrite all occurrences of his or her name under his or her asserted "right to rectification"? All in all: what rights are we talking about, please?!? Loving, Giovanni [3] https://yhetil.org/guix/iytrYuvr9BcPdWG17PDP5SXyjrZzwBGx1sbh0BVcDZ8PAifSIMdPXPbuhhDu-2woPlaWmEWnSt09h4OravmRRBrMB5uDlXYtKtI0egEQX_k=@lendvai.name/#r [4] https://yhetil.org/guix/86d01304cc8957a2508e1d1732421b5e0f9ceeb5.camel@planete-kraus.eu/ P.S.: I am DPO and copyright advisor at my tiny company, but IANAL :-D -- Giovanni Biscuolo Xelera IT Infrastructures [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 849 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-20 17:22 ` the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) Giovanni Biscuolo @ 2024-03-21 6:12 ` MSavoritias 2024-03-21 10:49 ` Attila Lendvai ` (3 more replies) 0 siblings, 4 replies; 61+ messages in thread From: MSavoritias @ 2024-03-21 6:12 UTC (permalink / raw) To: Giovanni Biscuolo; +Cc: guix-devel On 3/20/24 19:22, Giovanni Biscuolo wrote: > Hello Ludovic and Guix devel community! > > Disclaimer: I've still not read all the relevant threads [3] [4], so > please forgive me if I repeat some information already provided. > > What rights are we talking about? You are making the same misconception as some other people in the thread here. We are talking about social rules that we have here in the Guix community not legal/state rules. Specifically the social rules that we support trans people and we want to include them. Any person really that want to change their name at some point for some reason. To that end we listen to their concerns/wishes and we accommodate them. > > As a *free software* user do I have the right to redistribute /old/ > copies of the source code and documentation I got in the past from the > copyright holder, in any form (e.g. print)?... or to use old sources or > documentation to develop derived work, with _attribution_, without > asking for consent from the original authors and/or contact the original > authors to ask them what is their current name? Copyright is not consent. When we are talking about consent we are talking about it in social rules. See also https://www.consentfultech.io/wp-content/uploads/2019/10/Building-Consentful-Tech.pdf as a nice paper for consent in tech. > If yes, I would like to exercise all my rights without being harassed. Again this has nothing to do with rights granted by states. 
This is about including people and making them feel safe and respected. MSavoritias > > Also, SHW and other organizations (re)distributing free software have > their rights and should excercise them without being harassed. > > Ludovic Courtès <ludo@gnu.org> writes: > > [...] > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 6:12 ` MSavoritias @ 2024-03-21 10:49 ` Attila Lendvai 2024-03-21 11:51 ` pelzflorian (Florian Pelz) ` (2 subsequent siblings) 3 siblings, 0 replies; 61+ messages in thread From: Attila Lendvai @ 2024-03-21 10:49 UTC (permalink / raw) To: MSavoritias; +Cc: Giovanni Biscuolo, guix-devel > We are talking about social rules that we have here in the Guix > community not legal/state rules. ethics, i.e. the discussion of rights, is a branch of philosophy. ideally, it should inform the people who are writing and enforcing state laws, but these days -- sadly -- it has precious little to do with state laws. and i think you're the one here who conflates the two. > Specifically the social rules that we support trans people and we want > to include them. Any person really that want to change their name at > some point for some reason. > > To that end we listen to their concerns/wishes and we accommodate them. i've asked you this before, and i'll keep asking it: sure, accommodate, but to what extent? what is a reasonable cost i can impose on others? (see the discussion of negative vs. positive rights in this context) what if i declare that i only feel accommodated here if everyone attaches the local weather forecast to each mail they send to guix-devel? the limit of your demands lies where they start to constrain the freedom of others. considering this is an essential part of respectful behavior towards others. -- • attila lendvai • PGP: 963F 5D5F 45C7 DFCD 0A39 -- “I am not what happened to me, I am what I choose to become.” — Carl Jung (1875–1961) ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 6:12 ` MSavoritias 2024-03-21 10:49 ` Attila Lendvai @ 2024-03-21 11:51 ` pelzflorian (Florian Pelz) 2024-03-21 11:52 ` pinoaffe 2024-03-21 15:23 ` Hartmut Goebel 3 siblings, 0 replies; 61+ messages in thread From: pelzflorian (Florian Pelz) @ 2024-03-21 11:51 UTC (permalink / raw) To: MSavoritias; +Cc: Giovanni Biscuolo, guix-devel Hello all. I object to this argument: MSavoritias <email@msavoritias.me> writes: > We are talking about social rules that we have here in the Guix > community not legal/state rules. No, legal rules come from deliberation of social arguments. CoC-wise, it seems to me that SWH was unfriendly and this is important to Guix. But SWH’s legal arguments are also social arguments and cannot be dismissed. I do not know if SWH really is an archive in the sense of the law, but certainly we are facing a trade-off. It would be nice if Guix could handle harmless deletion or rectifications. Whether that is possible shapes laws. I believe it is possible, but “show me how” is a valid response. Regards, Florian ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 6:12 ` MSavoritias 2024-03-21 10:49 ` Attila Lendvai 2024-03-21 11:51 ` pelzflorian (Florian Pelz) @ 2024-03-21 11:52 ` pinoaffe 2024-03-21 15:08 ` Giovanni Biscuolo 2024-03-21 15:23 ` Hartmut Goebel 3 siblings, 1 reply; 61+ messages in thread From: pinoaffe @ 2024-03-21 11:52 UTC (permalink / raw) To: MSavoritias; +Cc: Giovanni Biscuolo, guix-devel Hi! MSavoritias <email@msavoritias.me> writes: > On 3/20/24 19:22, Giovanni Biscuolo wrote: >> Disclaimer: I've still not read all the relevant threads [3] [4], so >> please forgive me if I repeat some information already provided. >> >> What rights are we talking about? > > You are making the same misconception as some other people in the > thread here. > > We are talking about social rules that we have here in the Guix > community not legal/state rules. Arborelia is clearly talking about legal/state rules in part of her blog posts. You can argue that the state rules aren't relevant here (IMO, Giovanni's observations support this argument), but it's not a "misconception" to think that the current discussion is at least partially about the legal aspects. > Specifically the social rules that we support trans people and we want > to include them. Any person really that want to change their name at > some point for some reason. > > To that end we listen to their concerns/wishes and we accommodate > them. I agree that we should listen to people's concerns/wishes and accommodate them out of basic respect, but we can only accommodate people's wishes when those wishes fall within what is technologically feasible and reasonable. When a person publishes books under a certain identity, it is not feasible for *every* mention in every copy to retroactively be updated to reflect a new name. In a similar manner, it is (currently) not always feasible to rewrite git history to change historic names.
I think we, as Guix, - should examine if/how it is currently feasible to rewrite our git history, - should examine possible workarounds going forward, - should move towards something like UUIDs and petnames in the long run. (see https://spritelyproject.org/news/petname-systems.html). >> As a *free software* user do I have the right to redistribute /old/ >> copies of the source code and documentation I got in the past from the >> copyright holder, in any form (e.g. print)?... or to use old sources or >> documentation to develop derived work, with _attribution_, without >> asking for consent from the original authors and/or contact the original >> authors to ask them what is their current name? > > Copyright is not consent. When we are talking about consent we are > talking about it in social rules. > > See also > https://www.consentfultech.io/wp-content/uploads/2019/10/Building-Consentful-Tech.pdf > as a nice paper for consent in tech. > >> If yes, I would like to exercise all my rights without being harassed. > > Again this has nothing to do with rights granted by states. This is > about including people and making them feel safe and respected. I fully agree with you here, rights such as the right to free speech and copyleft don't mean that any action that falls within those rights should be free of consequences, especially when such an action excludes others, disrespects them or makes them feel unsafe. >> [...] kind regards, pinoaffe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 11:52 ` pinoaffe @ 2024-03-21 15:08 ` Giovanni Biscuolo 2024-03-21 15:11 ` MSavoritias 2024-03-21 16:17 ` pinoaffe 0 siblings, 2 replies; 61+ messages in thread From: Giovanni Biscuolo @ 2024-03-21 15:08 UTC (permalink / raw) To: pinoaffe; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 1079 bytes --] Hello pinoaffe, pinoaffe <pinoaffe@gmail.com> writes: [...] > I think we, as Guix, > - should examine if/how it is currently feasible to rewrite our git > history, it's not, see also: https://guix.gnu.org/en/blog/2020/securing-updates/ > - should examine possible workarounds going forward, > - should move towards something like UUIDs and petnames in the long run. > > (see https://spritelyproject.org/news/petname-systems.html). I don't understand how using petnames, uuids or even a re:claimID identity (see below) could solve the problem with "rewriting history" in case a person wishes to change his or her previous _published_ name (petname, uuid...) in an archived content-addressable storage system. As a side note, other than the "petname system" please also consider re:claimID from GNUnet: https://www.gnunet.org/en/reclaim/index.html https://www.gnunet.org/en/reclaim/motivation.html [...] Regards, Giovanni. [1] https://guix.gnu.org/en/blog/2020/securing-updates/ -- Giovanni Biscuolo Xelera IT Infrastructures [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 849 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
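Giovanni's answer that rewriting Guix's Git history is not feasible can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not Git's actual object format: each hypothetical commit ID is a SHA-256 hash over the parent's ID, the author string and the message (real Git commit objects also contain the tree, committer and timestamps, and Git hashes with SHA-1 by default). Because every ID covers its parent's ID, rewriting the author of one early commit changes the ID of that commit and of every descendant. As the linked "securing updates" blog post describes, Guix authenticates its history starting from a pinned commit, so IDs that silently change under users are exactly what that mechanism is meant to reject.

```python
import hashlib


def commit_id(parent_id: str, author: str, message: str) -> str:
    """Hypothetical simplified commit ID: a hash over the parent ID,
    the author string and the message. Real Git objects hash more fields."""
    data = f"parent {parent_id}\nauthor {author}\n\n{message}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def build_history(commits):
    """Chain (author, message) pairs; each ID depends on its parent's ID."""
    ids = []
    parent = ""  # the root commit has no parent
    for author, message in commits:
        parent = commit_id(parent, author, message)
        ids.append(parent)
    return ids


original = [("Old Name", "Initial commit"),
            ("Someone Else", "Add package"),
            ("Someone Else", "Fix build")]
# The same history, with only the first commit's author rewritten.
rewritten = [("New Name", "Initial commit")] + original[1:]

before = build_history(original)
after = build_history(rewritten)

# Every ID differs, even for commits whose own fields are unchanged,
# because each ID transitively covers all of its ancestors.
for b, a in zip(before, after):
    print(b[:12], "->", a[:12], "changed" if a != b else "same")
```

Anyone holding an old ID (a signed tag, a pinned starting commit, an archive entry) will no longer find it in the rewritten history.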
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:08 ` Giovanni Biscuolo @ 2024-03-21 15:11 ` MSavoritias 2024-03-21 22:11 ` Philip McGrath 2024-03-21 16:17 ` pinoaffe 1 sibling, 1 reply; 61+ messages in thread From: MSavoritias @ 2024-03-21 15:11 UTC (permalink / raw) To: Giovanni Biscuolo, pinoaffe; +Cc: guix-devel On 3/21/24 17:08, Giovanni Biscuolo wrote: > Hello pinoaffe, > > pinoaffe <pinoaffe@gmail.com> writes: > > [...] > >> I think we, as Guix, >> - should examine if/how it is currently feasible to rewrite our git >> history, > it's not, see also: > https://guix.gnu.org/en/blog/2020/securing-updates/ > >> - should examine possible workarounds going forward, >> - should move towards something like UUIDs and petnames in the long run. >> >> (see https://spritelyproject.org/news/petname-systems.html). > I don't understand how using petnames, uuids or even a re:claimID > identity (see below) could solve the problem with "rewriting history" in > case a person wishes to change his or her previous _published_ name > (petname, uuid...) in an archived content-addressable storage system. It doesnt solve the problem of rewriting history. It solves the bug of having names part of the git history. see also https://gitlab.com/gitlab-org/gitlab/-/issues/20960 for Gitlab doing the same thing. MSavoritias > > As a side note, other than the "petname system" please also consider > re:claimID from GNUnet: > https://www.gnunet.org/en/reclaim/index.html > https://www.gnunet.org/en/reclaim/motivation.html > > [...] > > Regards, Giovanni. > > > [1] https://guix.gnu.org/en/blog/2020/securing-updates/ > > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:11 ` MSavoritias @ 2024-03-21 22:11 ` Philip McGrath 0 siblings, 0 replies; 61+ messages in thread From: Philip McGrath @ 2024-03-21 22:11 UTC (permalink / raw) To: MSavoritias, Giovanni Biscuolo, pinoaffe; +Cc: guix-devel On Thu, Mar 21, 2024, at 11:11 AM, MSavoritias wrote: > On 3/21/24 17:08, Giovanni Biscuolo wrote: >> […] >> I don't understand how using petnames, uuids or even a re:claimID >> identity (see below) could solve the problem with "rewriting history" in >> case a person wishes to change his or her previous _published_ name >> (petname, uuid...) in an archived content-addressable storage system. > > It doesnt solve the problem of rewriting history. It solves the bug of > having names part of the git history. > > see also https://gitlab.com/gitlab-org/gitlab/-/issues/20960 for Gitlab > doing the same thing. > Unless I’m missing something, the linked Gitlab issue seems to be a proposal by someone in February 2018 that Gitlab adopt some system of using UUIDs instead of author information. There was fairly limited discussion, with the last comment in May 2018. There does not seem to have been a consensus supporting the proposal, and I’m not seeing any indication that Gitlab plans to implement the proposal. Furthermore, the author and committer metadata are not the only places where people’s names appear in Guix. For example, I know some font packages that mention the name of the font designer in the package’s description. More broadly, Guix also refers to package sources by their content hashes: most sources probably contain some people’s names, and any of these could face the same problems as names directly included in the Guix Git repository. I strongly believe in the importance of protecting trans people from harassment. I don’t know how to solve the tension with long-term bit-for-bit reproducibility. 
Philip ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:08 ` Giovanni Biscuolo 2024-03-21 15:11 ` MSavoritias @ 2024-03-21 16:17 ` pinoaffe 1 sibling, 0 replies; 61+ messages in thread From: pinoaffe @ 2024-03-21 16:17 UTC (permalink / raw) To: Giovanni Biscuolo; +Cc: guix-devel Giovanni Biscuolo <g@xelera.eu> writes: > [...] > pinoaffe <pinoaffe@gmail.com> writes: >> - should examine possible workarounds going forward, >> - should move towards something like UUIDs and petnames in the long run. >> >> (see https://spritelyproject.org/news/petname-systems.html). > > I don't understand how using petnames, uuids or even a re:claimID > identity (see below) could solve the problem with "rewriting history" in > case a person wishes to change his or her previous _published_ name > (petname, uuid...) in an archived content-addressable storage system. It would decouple "name" from "identity as represented in the git merkle tree", thus allowing name changes to occur without affecting hashes and the like. I see no possible reason for UUID changes, as UUIDs (by themselves) are not personally identifying. This of course would not allow retroactive splitting/merging of identities, but I feel like permitting that is incompatible with the idea of identities anyhow. > As a side note, other than the "petname system" please also consider > re:claimID from GNUnet: > https://www.gnunet.org/en/reclaim/index.html > https://www.gnunet.org/en/reclaim/motivation.html Sure, I'll take a look. kind regards, pinoaffe ^ permalink raw reply [flat|nested] 61+ messages in thread
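The decoupling pinoaffe describes can be sketched as follows. This assumes a hypothetical design that Git does not implement today (Git stores the author's name and email inside each commit object): commits record only an opaque UUID for the author, and display names live in a separate, mutable petname table that is not part of the hashed data. The names `commit_id`, `petnames` and `display_author` are illustrative only, not an existing Git or Guix API. Renaming someone edits the table, and no commit ID changes.

```python
import hashlib
import uuid


def commit_id(parent_id: str, author_uuid: str, message: str) -> str:
    """Hypothetical commit ID that hashes an opaque author UUID, not a name."""
    data = f"parent {parent_id}\nauthor {author_uuid}\n\n{message}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Hypothetical mutable side table mapping UUIDs to current display names.
# It is NOT part of the hashed commit data.
alice = str(uuid.uuid4())
petnames = {alice: "Old Name"}

root = commit_id("", alice, "Initial commit")
child = commit_id(root, alice, "Add package")


def display_author(author_uuid: str) -> str:
    # Fall back to the raw UUID when no petname is known.
    return petnames.get(author_uuid, author_uuid)


print(display_author(alice))   # Old Name
petnames[alice] = "New Name"   # the rename touches only the side table
print(display_author(alice))   # New Name

# Recomputing the IDs from the same inputs yields the same values:
# the rename did not affect any hash.
assert commit_id("", alice, "Initial commit") == root
assert commit_id(root, alice, "Add package") == child
```

As Philip points out, this only covers the author field; names that appear inside file contents, package descriptions or upstream sources are still part of the hashed data.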
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 6:12 ` MSavoritias ` (2 preceding siblings ...) 2024-03-21 11:52 ` pinoaffe @ 2024-03-21 15:23 ` Hartmut Goebel 2024-03-21 15:27 ` MSavoritias ` (2 more replies) 3 siblings, 3 replies; 61+ messages in thread From: Hartmut Goebel @ 2024-03-21 15:23 UTC (permalink / raw) To: guix-devel Am 21.03.24 um 07:12 schrieb MSavoritias: > Specifically the social rules that we support trans people and we want > to include them. Any person really that want to change their name at > some point for some reason. Interestingly you are asking the right to get the old name rewritten for trans people only. To be frank: IMHO This is a quiet egocentric point of view. In many cultures all over the world women are required to change their name when they merry. And you are not asking for women's right. But only for right for the small but loud minority of trans people. -- Regards Hartmut Goebel | Hartmut Goebel | h.goebel@crazy-compilers.com | | www.crazy-compilers.com | compilers which you thought are impossible | ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:23 ` Hartmut Goebel @ 2024-03-21 15:27 ` MSavoritias 2024-03-21 15:54 ` Ekaitz Zarraga 2024-03-22 4:33 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 2024-03-21 16:18 ` Efraim Flashner 2024-03-21 16:23 ` pinoaffe 2 siblings, 2 replies; 61+ messages in thread From: MSavoritias @ 2024-03-21 15:27 UTC (permalink / raw) To: Hartmut Goebel, guix-devel On 3/21/24 17:23, Hartmut Goebel wrote: > Am 21.03.24 um 07:12 schrieb MSavoritias: >> Specifically the social rules that we support trans people and we >> want to include them. Any person really that want to change their >> name at some point for some reason. > > Interestingly you are asking the right to get the old name rewritten > for trans people only. > > To be frank: IMHO This is a quiet egocentric point of view. > > In many cultures all over the world women are required to change their > name when they merry. And you are not asking for women's right. But > only for right for the small but loud minority of trans people. What are you implying with the "loud" minority here? MSavoritias ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:27 ` MSavoritias @ 2024-03-21 15:54 ` Ekaitz Zarraga 2024-03-22 4:33 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 1 sibling, 0 replies; 61+ messages in thread From: Ekaitz Zarraga @ 2024-03-21 15:54 UTC (permalink / raw) To: MSavoritias, Hartmut Goebel, guix-devel Hi, > What are you implying with the "loud" minority here? > > > MSavoritias He's probably talking about the same thing that made you continue being heated after the fact you were told to calm down and you are not wasting any single opportunity to continue answering every single email in this thread and all the subthreads that continue to appear. I don't want to look insensitive but I think we are revolving around the same issue over and over again and honestly it's bothering me. Not the discussion itself, which has a profound meaning and it's a deep issue, but the way it is taking place and where it is taking place. It's also extremely sad to me to see many unanswered questions in the help-guix mailing list, which might or might not include questions from trans people that are willing to use the fantastic software we all collectively maintain and which would help them have a better life, and yet we are talking about the detail of the detail here for no real reason: this conversation does not have any practical purpose. Also there are hundreds of issues open in guix, which don't happen to deserve the attention this discussion has. I don't think this conversation is going to reach anywhere, and I would like to encourage people to spend their energy somewhere else until we really start having a different mindset on the issue. As we were suggested to do. I don't think this is a topic for `guix-devel` mailing list. If it is, please let me know and change my expectations accordingly. 
My suggestion is: if this is an actual problem with guix's software, we should open an issue for this, for those who are interested in actually trying to improve the situation. If it's not a problem with guix, then this conversation is just an exercise of ethical and intellectual bragging that is just uninteresting to me and more appropriate for social media. Best, Ekaitz ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:27 ` MSavoritias 2024-03-21 15:54 ` Ekaitz Zarraga @ 2024-03-22 4:33 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 1 sibling, 0 replies; 61+ messages in thread From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-03-22 4:33 UTC (permalink / raw) To: guix-devel > IMHO This is a quiet egocentric point of view. > What are you implying with the "loud" minority here? Hi, "Quiet" is a funny typo here. Also, "peace on Earth and goodwill toward [all]." [1] Please [1] https://www.youtube.com/watch?v=74ocbvwam7c ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:23 ` Hartmut Goebel 2024-03-21 15:27 ` MSavoritias @ 2024-03-21 16:18 ` Efraim Flashner 2024-03-21 16:23 ` pinoaffe 2 siblings, 0 replies; 61+ messages in thread From: Efraim Flashner @ 2024-03-21 16:18 UTC To: Hartmut Goebel; +Cc: guix-devel On Thu, Mar 21, 2024 at 04:23:01PM +0100, Hartmut Goebel wrote: > On 21.03.24 at 07:12, MSavoritias wrote: > > Specifically the social rules that we support trans people and we want > > to include them. Any person really that want to change their name at > > some point for some reason. > > Interestingly you are asking the right to get the old name rewritten for > trans people only. > > To be frank: IMHO This is a quiet egocentric point of view. I took it in as though we were discussing the recent activity, not that it was ONLY this instance that we care about. I have a number of friends who have more than 1 set of names and specifically wish to go by one set over the other. The point is that there is a vocal portion of people in the world who insist on deadnaming people, and that is not okay. > In many cultures all over the world women are required to change their name > when they merry. And you are not asking for women's right. But only for > right for the small but loud minority of trans people. As a project, we support people by addressing them by their preferred name, and honoring their wishes as to name, gender, honorifics, etc. For all people. If a person chooses to go by their "maiden name" or their "married name" or a pseudonym, that's their prerogative.
> > -- > Regards > Hartmut Goebel > > | Hartmut Goebel | h.goebel@crazy-compilers.com | > | www.crazy-compilers.com | compilers which you thought are impossible | > > -- Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351 Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) 2024-03-21 15:23 ` Hartmut Goebel 2024-03-21 15:27 ` MSavoritias 2024-03-21 16:18 ` Efraim Flashner @ 2024-03-21 16:23 ` pinoaffe 2 siblings, 0 replies; 61+ messages in thread From: pinoaffe @ 2024-03-21 16:23 UTC To: Hartmut Goebel; +Cc: guix-devel Hartmut Goebel <h.goebel@crazy-compilers.com> writes: > On 21.03.24 at 07:12, MSavoritias wrote: >> Specifically the social rules that we support trans people and we >> want to include them. Any person really that want to change their >> name at some point for some reason. > > Interestingly you are asking the right to get the old name rewritten > for trans people only. This discussion arose because of the experiences of someone who's trans, and is relevant to many trans folks, so of course this will remain a major focus of the discussion. > To be frank: IMHO This is a quiet egocentric point of view. You're wrong and it ain't > In many cultures all over the world women are required to change their > name when they merry. And you are not asking for women's right. But > only for right for the small but loud minority of trans people. I am not aware of any women who want/have wanted to retroactively change historic occurrences of their maiden name, so your mail reeks of concern trolling to me. There are (of course) instances where people may want to replace historic use of a name with another name for reasons other than transitioning, but that should make you rejoice in the fact that protecting trans people's rights also protects cis people's rights. This should not at all be surprising, as trans rights are human rights. Kind regards, pinoaffe
* Re: Concerns/questions around Software Heritage Archive 2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure ` (4 preceding siblings ...) 2024-03-17 17:57 ` Ludovic Courtès @ 2024-03-18 9:28 ` Simon Tournier 2024-03-18 11:47 ` MSavoritias ` (2 more replies) 2024-03-18 11:14 ` Content-Addressed system and history? Simon Tournier 2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure 7 siblings, 3 replies; 61+ messages in thread From: Simon Tournier @ 2024-03-18 9:28 UTC To: Ian Eure, guix-devel Hi, On Sat, 16 Mar 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote: > They appear to be using the archive to build LLMs: > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ About LLM, Software Heritage made a clear statement: https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code Quoting: We feel that the question is no longer whether LLMs for code should be built. They are already being built, independently of what we do, and there is no turning back. The real question is how they should be built and whom they should benefit. Principles: 1. Knowledge derived from the Software Heritage archive must be given back to humanity, rather than monopolized for private gain. The resulting machine learning models must be made available under a suitable open license, together with the documentation and toolings needed to use them. 2. The initial training data extracted from the Software Heritage archive must be fully and precisely identified by, for example, publishing the corresponding SWHID identifiers (note that, in the context of Software Heritage, public availability of the initial training data is a given: anyone can obtain it from the archive).
This will enable use cases such as: studying biases (fairness), verifying if a code of interest was present in the training data (transparency), and providing appropriate attribution when generated code bears resemblance to training data (credit), among others. 3. Mechanisms should be established, where possible, for authors to exclude their archived code from the training inputs before model training begins. I hope it clarifies your concerns to some extent. Moreover, you wrote: « I want absolutely nothing to do with them. » Maybe there is a misunderstanding on your side about what “free software” and GPL means because once “free software”, you cannot prevent people to use “your” free software for any purposes you dislike. If you want to bound the use cases of the software you create, you need to explicitly specify that in the license. And if you do, your software will not be considered as “free software”. That’s the double sword of “free software”. :-) Cheers, simon
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier @ 2024-03-18 11:47 ` MSavoritias 2024-03-18 13:12 ` Simon Tournier 2024-03-18 16:27 ` Kaelyn 2024-03-18 19:38 ` Ian Eure 2 siblings, 1 reply; 61+ messages in thread From: MSavoritias @ 2024-03-18 11:47 UTC To: Simon Tournier, Ian Eure, guix-devel On 3/18/24 11:28, Simon Tournier wrote: > Hi, > > On Sat, 16 Mar 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote: > >> They appear to be using the archive to build LLMs: >> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ > About LLM, Software Heritage made a clear statement: > > https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code > > Quoting: > > We feel that the question is no longer whether LLMs for code > should be built. They are already being built, independently of > what we do, and there is no turning back. The real question is > how they should be built and whom they should benefit. > > Principles: > > 1. Knowledge derived from the Software Heritage archive must be > given back to humanity, rather than monopolized for private > gain. The resulting machine learning models must be made available > under a suitable open license, together with the documentation and > toolings needed to use them. > > 2. The initial training data extracted from the Software Heritage > archive must be fully and precisely identified by, for example, > publishing the corresponding SWHID identifiers (note that, in the > context of Software Heritage, public availability of the initial > training data is a given: anyone can obtain it from the > archive). This will enable use cases such as: studying biases > (fairness), verifying if a code of interest was present in the > training data (transparency), and providing appropriate attribution > when generated code bears resemblance to training data (credit), > among others. > > 3.
Mechanisms should be established, where possible, for authors to > exclude their archived code from the training inputs before model > training begins. > > I hope it clarifies your concerns to some extent. > > > Moreover, you wrote: « I want absolutely nothing to do with them. » > > Maybe there is a misunderstanding on your side about what “free > software” and GPL means because once “free software”, you cannot prevent > people to use “your” free software for any purposes you dislike. > > If you want to bound the use cases of the software you create, you need > to explicitly specify that in the license. And if you do, your software > will not be considered as “free software”. > > That’s the double sword of “free software”. :-) Simon, 1. You seem to be misunderstanding the statement here that was said. What you can do legally and what you can do socially are not always the same thing. As advice for the future when somebody says a concern or wish they have, your first statement shouldn't be "but its legal" because that completely dismisses any constructive discussion that could be done. And you seem to be talking about legal a lot here so thats not a good look. Yes, legally Ian probably can't get lawyers on you. But nobody is talking about legally here. What is in question here is whether Software Heritage respects people enough to do the right thing and respect their wishes without getting lawyers/legal involved. Besides with the way you are framing Free Software as not respecting any social rules then that makes Free Software not attractive which is the opposite of what we are trying to do here :) 2. > Somehow, a Content-Addressed system is designed around immutable content. And if one know how to implement a Content-Addressed system relying on mutable content, I would be very interested to know more about it. Please refrain from doing such remarks. 
Nobody here suggested anything that you mention here and you effectively devalue the discussion by arguing like this and frame other people as stupid. 3. It's not on people that are not included to write the code. If Guix is to be an inclusive project, then Guix should do the work so that people feel included. You may disagree with this sure, but shutting down the discussion because nobody wrote the code for you is very elitist of you. 4. > This language is not acceptable on Guix channel of communication. Calling out transphobia it is very much accepted here actually :) It's transphobic speech that is not accepted. I welcome Software Heritage to make an announcement about this or some kind of official communication saying their stance. Although I still wouldn't use them due to the LLMs and AI stuff that they are using. Which I hope at some point realize their mistake. MSavoritias
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 11:47 ` MSavoritias @ 2024-03-18 13:12 ` Simon Tournier 2024-03-18 14:00 ` MSavoritias 0 siblings, 1 reply; 61+ messages in thread From: Simon Tournier @ 2024-03-18 13:12 UTC To: MSavoritias, Ian Eure, guix-devel Hi MSavoritias, On Mon, 18 Mar 2024 at 13:47, MSavoritias <email@msavoritias.me> wrote: > 1. > > You seem to be misunderstanding the statement here that was said. > > What you can do legally and what you can do socially are not always the > same thing. I do not read where I wrote something like that but anyway. A program is free software if the program's users have the four essential freedoms: [1] 0. The freedom to run the program as you wish, for any purpose. 1. The freedom to study how the program works, and change it so it does your computing as you wish. Access to the source code is a precondition for this. 2. The freedom to redistribute copies so you can help others. 3. The freedom to distribute copies of your modified versions to others. By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this. All is about the philosophy of “free software”. 1: https://www.gnu.org/philosophy/free-sw.en.html > As advice for the future when somebody says a concern or wish they have, > your first statement shouldn't be "but its legal" because that > completely dismisses any constructive discussion that could be done. Again, I am not arguing about “legal” something. Instead, I am pointing that this wish does not match the principles of “free software”. If you accept that the software you create is “free software” then you cannot complain if this “free software” is used in some contexts that you consider unethical. That’s the double sword of “free software”. Do I consider LLMs as something unethical?
I think yes: most AI appears to me unethical but that’s another story (rooting my arguments in arguments about energy [2,3,4]). 2: https://social.sciences.re/@zimoun/112082437445032973 3: https://social.sciences.re/@zimoun/112039562095800532 4: https://social.sciences.re/@zimoun/112038609631116527 > What is in question here is whether Software Heritage respects people > enough to do the right thing and respect their wishes without getting > lawyers/legal involved. Again, this is an incorrect frame, IMHO. Software Heritage (SWH) do the things you granted them to do. SWH respects the “ethical” definition of “free software”. Again, do I think that feeding LLM after publishing a statement for LLM code is a good move? I do not know… Does it break my ethical values? Maybe… Can I complain about my contributions to “free software” reused in a way that I might consider unethical? No. 5: https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ 6: https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/ > Besides with the way you are framing Free Software as not respecting any > social rules then that makes Free Software not attractive which is the > opposite of what we are trying to do here :) I do not know what are the “social rules” of “free software”. At best, I understand the social rules of a community working on free software. And this community is far to be an homogeneous whole with clear social rules. These social rules vary and the only shared denominator is the “free software” principles defined by four freedoms. The only question might be: by allowing ingested source code to be used to train LLM, is Software Heritage aligned with the values that the Guix community promote? To be honest, I cannot answer to that question in a hurry. > 2. > > > Somehow, a Content-Addressed system is designed around immutable > > content. 
And if one know how to implement a Content-Addressed system > > relying on mutable content, I would be very interested to know more > > about it. > > Please refrain from doing such remarks. Nobody here suggested anything > that you mention here and you effectively devalue the discussion by > arguing like this and frame other people as stupid. I will not refrain to say: Talk is cheap! Positions about the situation with “rewrite history” cannot be a discussion about opinions but it needs to be rooted in how it technically works and what does it mean Content-addressed system. > 3. > > You may disagree with this sure, but shutting down the discussion > because nobody wrote the code for you is very elitist of you. We are speaking about which discussion because I am lost. About LLM or about “rewrite history”? About LLM, see point #1. About “rewrite history”, see point #2 > 4. > > > This language is not acceptable on Guix channel of communication. > > Calling out transphobia it is very much accepted here actually :) No it is not. Because it is a bold conclusion. I am asking that the Guix project rewrite right now its history: changing my identity ’zimoun’ to my identity ’Simon Tournier’. Since the Guix project will take the time to check, then I will claim: the Guix project is French-phobic! I ask you again to stop such language. I respect your opinion but name calling is not welcoming on Guix channels of communication. Cheers, simon
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 13:12 ` Simon Tournier @ 2024-03-18 14:00 ` MSavoritias 2024-03-18 14:32 ` Simon Tournier 0 siblings, 1 reply; 61+ messages in thread From: MSavoritias @ 2024-03-18 14:00 UTC To: Simon Tournier, guix-devel On 3/18/24 15:12, Simon Tournier wrote: > Hi MSavoritias, > > On Mon, 18 Mar 2024 at 13:47, MSavoritias <email@msavoritias.me> wrote: > > >> As advice for the future when somebody says a concern or wish they have, >> your first statement shouldn't be "but its legal" because that >> completely dismisses any constructive discussion that could be done. > Again, I am not arguing about “legal” something. Instead, I am pointing > that this wish does not match the principles of “free software”. > > If you accept that the software you create is “free software” then you > cannot complain if this “free software” is used in some contexts that > you consider unethical. > > That’s the double sword of “free software”. > > Do I consider LLMs as something unethical? I think yes: most AI appears > to me unethical but that’s another story (rooting my arguments in > arguments about energy [2,3,4]). > > 2: https://social.sciences.re/@zimoun/112082437445032973 > 3: https://social.sciences.re/@zimoun/112039562095800532 > 4: https://social.sciences.re/@zimoun/112038609631116527 > Yes you are. The argument that you can do what you want with Free Software is based around a licence which is a legal construct of states. I think you have misunderstood that here we are talking about the social rules of being a decent group of human beings and respect somebody else's wishes. >> What is in question here is whether Software Heritage respects people >> enough to do the right thing and respect their wishes without getting >> lawyers/legal involved. > Again, this is an incorrect frame, IMHO. Software Heritage (SWH) do the > things you granted them to do.
SWH respects the “ethical” definition of > “free software”. You are bringing the legal argument again. The argument that you can do what you want with Free Software is based around a licence which is a legal construct of states. I think you have misunderstood that here we are talking about the social rules of being a decent group of human beings and respect somebody else's wishes. In this case somebody asks for something so if SWH is a good member of our community they should do that. Otherwise they are not a good member of our community. > >> Besides with the way you are framing Free Software as not respecting any >> social rules then that makes Free Software not attractive which is the >> opposite of what we are trying to do here :) > I do not know what are the “social rules” of “free software”. At best, > I understand the social rules of a community working on free software. > > And this community is far to be an homogeneous whole with clear social > rules. These social rules vary and the only shared denominator is the > “free software” principles defined by four freedoms. Guix has a CoC that's the common thing we have here. For social things that is. Plus some cultural things of course. MSavoritias
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 14:00 ` MSavoritias @ 2024-03-18 14:32 ` Simon Tournier 0 siblings, 0 replies; 61+ messages in thread From: Simon Tournier @ 2024-03-18 14:32 UTC To: MSavoritias, guix-devel Hi MSavoritias, On Mon, 18 Mar 2024 at 16:00, MSavoritias <email@msavoritias.me> wrote: > I think you have misunderstood that here we are talking about > I think you have misunderstood that here we are talking about What if? Maybe it’s you. Maybe you, “you have misunderstood that here we are talking about […]”. For what my opinion is worth here, I would prefer that you do not assume on what I might have understood. Similarly, I am not assuming anything about your understanding of the various topics at hand. That’s my last message in this thread. Cheers, simon
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier 2024-03-18 11:47 ` MSavoritias @ 2024-03-18 16:27 ` Kaelyn 2024-03-18 17:39 ` Daniel Littlewood 2024-03-18 20:38 ` Olivier Dion 2024-03-18 19:38 ` Ian Eure 2 siblings, 2 replies; 61+ messages in thread From: Kaelyn @ 2024-03-18 16:27 UTC To: guix-devel On Monday, March 18th, 2024 at 2:28 AM, Simon Tournier <zimon.toutoune@gmail.com> wrote: > > Hi, > > On Sat, 16 Mar 2024 at 08:52, Ian Eure ian@retrospec.tv wrote: > > > They appear to be using the archive to build LLMs: > > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ > > > About LLM, Software Heritage made a clear statement: > > https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code > > Quoting: > > We feel that the question is no longer whether LLMs for code > should be built. They are already being built, independently of > what we do, and there is no turning back. The real question is > how they should be built and whom they should benefit. > > Principles: > > 1. Knowledge derived from the Software Heritage archive must be > given back to humanity, rather than monopolized for private > gain. The resulting machine learning models must be made available > under a suitable open license, together with the documentation and > toolings needed to use them. > > 2. The initial training data extracted from the Software Heritage > archive must be fully and precisely identified by, for example, > publishing the corresponding SWHID identifiers (note that, in the > context of Software Heritage, public availability of the initial > training data is a given: anyone can obtain it from the > archive).
This will enable use cases such as: studying biases > (fairness), verifying if a code of interest was present in the > training data (transparency), and providing appropriate attribution > when generated code bears resemblance to training data (credit), > among others. > > 3. Mechanisms should be established, where possible, for authors to > exclude their archived code from the training inputs before model > training begins. > > I hope it clarifies your concerns to some extent. > > > Moreover, you wrote: « I want absolutely nothing to do with them. » > > Maybe there is a misunderstanding on your side about what “free > software” and GPL means because once “free software”, you cannot prevent > people to use “your” free software for any purposes you dislike. > > If you want to bound the use cases of the software you create, you need > to explicitly specify that in the license. And if you do, your software > will not be considered as “free software”. > > That’s the double sword of “free software”. :-) Hi, I want to stress that I am not a lawyer, but my (possibly outdated) understanding of what machine learning models can and cannot do with regards to their training data, and a reading of parts of the GPL 2 and 3, suggest that at best the SWH's LLM is in a legal grey area and at worst directly violates the license of GPL code that it ingests for training. As such, I don't think it is accurate to say "you cannot prevent people to use “your” free software for any purposes you dislike" in response to concerns about automatic inclusion of free software into LLM training sets. Specifically, my understanding (as of a few years ago) is that LLMs have difficulty tracing and attributing various aspects of their training to specific inputs, which seems to be in violation of e.g. Sections 5 and 6 of the GPL.
Specific quotes from those sections https://www.gnu.org/licenses/gpl-3.0.html: From section 5: > You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: > > a) The work must carry prominent notices stating that you modified it, and giving a relevant date. > b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to “keep intact all notices”. > c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. > d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. and from Section 6: > You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: > > a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. 
> b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. > c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. > d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. > e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. 
And from the GPL 2 text at https://www.gnu.org/licenses/old-licenses/gpl-2.0.html: > 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: > > a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. > b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. > c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) > > These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
> > Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. > > In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. > > 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: > > a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, > b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, > c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) > > The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. 
However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. > > If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. > > 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. Again, I want to emphasize IANAL. As a layman, my understanding of ML model training is that it cannot maintain enough of a trace between GPLed input code and its (modified) use in the output to satisfy the licensing and distribution requirements of either the GPL 3 sections above or sections 2 and 3 of the GPL 2. I also believe that section 4 of the GPL 2 directly applies to these LLM code models. There are also potential licensing issues in mixing (potentially) incompatible licenses in the training data sets, such as GPL and CDDL code, with no way to distinguish or separate the (arguably) modified sources from each other. Just my $0.02 USD on the LLM side of the matter, as much of the discussion seems to be around the cost vs. benefit of rewriting the Git history to update personally identifying information. Cheers, Kaelyn > > Cheers, > simon
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 16:27 ` Kaelyn @ 2024-03-18 17:39 ` Daniel Littlewood 2024-03-18 20:38 ` Olivier Dion 1 sibling, 0 replies; 61+ messages in thread From: Daniel Littlewood @ 2024-03-18 17:39 UTC (permalink / raw) To: Kaelyn; +Cc: guix-devel Hi Kaelyn, The legal question is unsettled, and there is ongoing litigation by (at least) Matthew Butterick in the US, since at least 2022. The reasonable positions I'm aware of are: 1. An LLM (or, more precisely, the set of weights that define it) is not a derivative work of its training data, for the purposes of copyright, and thus the license is irrelevant. 2. Producing an LLM from training data is a transformative fair use, and thus the license is irrelevant. 3. Neither 1 nor 2 holds, and LLMs constitute copyright infringement on a profound scale (of both copyrighted and copylefted works). The FSF and CC have both commissioned white papers on the impact of such considerations for Free works. I don't recall seeing anything particularly insightful in them. Probably a waste of time to discuss it here. Best wishes, Dan
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 16:27 ` Kaelyn 2024-03-18 17:39 ` Daniel Littlewood @ 2024-03-18 20:38 ` Olivier Dion 1 sibling, 0 replies; 61+ messages in thread From: Olivier Dion @ 2024-03-18 20:38 UTC (permalink / raw) To: Kaelyn, guix-devel On Mon, 18 Mar 2024, Kaelyn <kaelyn.alexi@protonmail.com> wrote: > On Monday, March 18th, 2024 at 2:28 AM, Simon Tournier <zimon.toutoune@gmail.com> wrote: [...] >> That’s the double sword of “free software”. :-) > > Hi, > > I want to stress that I am not a lawyer, but my (possibly outdated) > understanding of what machine learning models can and cannot do with > regards to their training data, and a reading of parts of the GPL 2 > and 3, suggest that at best the SWH's LLM is in a legal grey area and > at worst directly violates the license of GPL code that it ingests for > training. As such, I don't think it is accurate to say "you cannot > prevent people to use “your” free software for any purposes you > dislike" in response to concerns about automatic inclusion of free > software into LLM training sets. Specifically, my understanding (as of > a few years ago) is that LLMs have difficulty tracing and attributing > various aspects of its training to specific inputs, which seems to be > in violation of e.g. Sections 5 and 6 of the GPL. Specific quotes > from those sections https://www.gnu.org/licenses/gpl-3.0.html: I think that the larger point here is that you do not get to choose who uses your software and for what purpose. That is the double-edged sword of free software. Putting aside LLMs for a moment, what if some package in Guix is used for military purposes? Will this software be removed from Guix because one of its users uses it in some unethical way, even though it is also used in an ethical way by others? Will we penalize users for the sake of the moral high ground? This raises the question: what is considered ethical, and when does ethics become political dogma? [...]
-- Olivier Dion oldiob.ca
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier 2024-03-18 11:47 ` MSavoritias 2024-03-18 16:27 ` Kaelyn @ 2024-03-18 19:38 ` Ian Eure 2024-03-18 22:02 ` Ludovic Courtès 2024-03-19 10:58 ` Simon Tournier 2 siblings, 2 replies; 61+ messages in thread From: Ian Eure @ 2024-03-18 19:38 UTC (permalink / raw) To: Simon Tournier; +Cc: guix-devel Simon Tournier <zimon.toutoune@gmail.com> writes: > Hi, > > On sam., 16 mars 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote: > >> They appear to be using the archive to build LLMs: >> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ > > About LLM, Software Heritage made a clear statement: > > https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code > > Quoting: > > We feel that the question is no longer whether LLMs for code should be built. They are already being built, independently of what we do, and there is no turning back. The real question is how they should be built and whom they should benefit. > > Principles: > > 1. Knowledge derived from the Software Heritage archive must be given back to humanity, rather than monopolized for private gain. The resulting machine learning models must be made available under a suitable open license, together with the documentation and toolings needed to use them. > > 2. The initial training data extracted from the Software Heritage archive must be fully and precisely identified by, for example, publishing the corresponding SWHID identifiers (note that, in the context of Software Heritage, public availability of the initial training data is a given: anyone can obtain it from the archive).
This will enable use cases such as: studying biases (fairness), verifying if a code of interest was present in the training data (transparency), and providing appropriate attribution when generated code bears resemblance to training data (credit), among others. > > 3. Mechanisms should be established, where possible, for authors to exclude their archived code from the training inputs before model training begins. > > I hope it clarifies your concerns to some extent. > It doesn’t clarify them, but it does illustrate them. HuggingFace and the StarCoder2 model are in violation of principle 2. By their own admission, they are including code without clear licensing[1]: The main difference between the Stack v2 and the Stack v1 is that we include both permissively licensed and unlicensed files. HuggingFace’s StarChat2 Playground[2] also violates this principle, as it outputs code without any license or provenance information; I know, because I tried it. While their own terms of use for StarCoder2 state: Any use of all or part of the code gathered in The Stack v2 must abide by the terms of the original licenses... ...their own playground makes this impossible. HuggingFace is also in violation of the third principle, because they haven’t established a functioning opt-out model[3]. Opting out requires using non-free software; requests have been sitting for nearly a year with no action or response; and out of every request submitted, only a single one has *ever* been honored. They appear to be violating free software licenses on a large scale. They are in violation of SWH’s own positions. > Moreover, you wrote: « I want absolutely nothing to do with them. » > > Maybe there is a misunderstanding on your side about what “free software” and GPL means because once “free software”, you cannot prevent people to use “your” free software for any purposes you dislike.
> > If you want to bound the use cases of the software you create, you need to explicitly specify that in the license. And if you do, your software will not be considered as “free software”. > > That’s the double sword of “free software”. :-) > I am crystal clear on the meaning of free software. I wish to remove my software from these models *in order to* keep it free. Thanks, — Ian [1]: https://arxiv.org/html/2402.19173v1 [2]: https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground [3]: https://huggingface.co/datasets/bigcode/the-stack-v2 [4]: https://github.com/bigcode-project/opt-out-v2/issues
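Principle 2 of the SWH statement quoted in this message relies on SWHIDs so that anyone can check whether a given file was part of a training set. As a minimal sketch, assuming the training inputs were published as a plain set of content SWHIDs (the `published_swhids` set below is hypothetical and is not an actual SWH or HuggingFace artifact), the check can be done locally: a content SWHID is `swh:1:cnt:` followed by the same SHA-1 that Git computes for a blob. No SWH web API is queried here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the content SWHID of a file's bytes.

    Content SWHIDs reuse Git's blob hash: SHA-1 over the ASCII header
    "blob <size>", a NUL byte, and then the raw file bytes.
    """
    header = b"blob " + str(len(data)).encode() + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


# Hypothetical published list of training inputs (illustration only).
published_swhids = {content_swhid(b"hello world\n")}

print(content_swhid(b"hello world\n"))  # swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(content_swhid(b"hello world\n") in published_swhids)   # True
print(content_swhid(b"hello world!\n") in published_swhids)  # False
```

Because the identifier depends only on the bytes, any edit to a file yields a different SWHID, which is what makes such a published list precise but also means it only matches exact copies.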
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 19:38 ` Ian Eure @ 2024-03-18 22:02 ` Ludovic Courtès 2024-03-19 10:58 ` Simon Tournier 1 sibling, 0 replies; 61+ messages in thread From: Ludovic Courtès @ 2024-03-18 22:02 UTC (permalink / raw) To: Ian Eure; +Cc: Simon Tournier, guix-devel Hello, Ian Eure <ian@retrospec.tv> skribis: > HuggingFace and the StarCoder2 model are in violation of principle 2. > By their own admission, they are including code without clear > licensing[1]: [...] > HuggingFace is also in violation of the third principle, because they > haven’t established a functioning opt-out model[3]. Opting out > requires using non-free software; requests have been sitting for > nearly a year with no action or response; and out of every request > submitted, only a single one has *ever* been honored. > > They appear to be violating free software licenses on a large > scale. They are in violation of SWH’s own positions. You may be right, but again, I think we should all wait for SWH folks to weigh in. Many people working there are long-time free software activists; I think we can trust them to take our concerns into consideration, but they may also need more time to reply thoughtfully. Besides, we should probably focus the discussion on what it means for Guix. Ludo’.
* Re: Concerns/questions around Software Heritage Archive 2024-03-18 19:38 ` Ian Eure 2024-03-18 22:02 ` Ludovic Courtès @ 2024-03-19 10:58 ` Simon Tournier 2024-03-19 15:37 ` Ian Eure 1 sibling, 1 reply; 61+ messages in thread From: Simon Tournier @ 2024-03-19 10:58 UTC (permalink / raw) To: Ian Eure; +Cc: guix-devel Hi, On lun., 18 mars 2024 at 12:38, Ian Eure <ian@retrospec.tv> wrote: > They appear to be violating free software licenses on a large scale. > They are in violation of SWH’s own positions. [...] > [1]: https://arxiv.org/html/2402.19173v1 > [2]: https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground > [3]: https://huggingface.co/datasets/bigcode/the-stack-v2 > [4]: https://github.com/bigcode-project/opt-out-v2/issues Please note that Software Heritage folks are not co-authors of any of that, unless I misread. Don’t get me wrong: this is not an attempt to dodge the issue, but a request to wait for feedback from SWH. As Ludo said, SWH folks are, by the way, also long-time Free Software activists. For the record, the quality of the 10 Years of Guix [1] videos is the result of tireless work (for free!) by a Debian video team member (also working for SWH), and one of the SWH co-founders is a former Debian project leader. Let’s give them the benefit of the doubt while waiting. 1: https://10years.guix.gnu.org Cheers, simon PS: Thanks for the detailed explanations. I will provide my reading later, once some of the concerns have been separated.
* Re: Concerns/questions around Software Heritage Archive 2024-03-19 10:58 ` Simon Tournier @ 2024-03-19 15:37 ` Ian Eure 0 siblings, 0 replies; 61+ messages in thread From: Ian Eure @ 2024-03-19 15:37 UTC (permalink / raw) To: Simon Tournier; +Cc: guix-devel Simon Tournier <zimon.toutoune@gmail.com> writes: > Hi, > > On lun., 18 mars 2024 at 12:38, Ian Eure <ian@retrospec.tv> wrote: > >> They appear to be violating free software licenses on a large scale. >> They are in violation of SWH’s own positions. > > [...] > >> [1]: https://arxiv.org/html/2402.19173v1 >> [2]: https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground >> [3]: https://huggingface.co/datasets/bigcode/the-stack-v2 >> [4]: https://github.com/bigcode-project/opt-out-v2/issues > > Please note that Software Heritage folks are not co-authors of any of that, unless I misread. Don’t get me wrong: this is not an attempt to dodge the issue, but a request to wait for feedback from SWH. > Shit rolls downhill. It’s the least surprising thing in the world to find that an "AI" company is violating licenses, because the entire technology is based on infringement at a massive scale. SWH’s partnership with, and promotion of, both the company and its license-violating model, in violation of their *own stated principles*, raises very legitimate questions. There are multiple overlapping concerns here: personal, organizational, legal, ethical, and technical. From a personal, legal standpoint, HuggingFace is almost certainly in violation of my code’s licenses. I will, therefore, work to remove my code from their models. From a personal, ethical standpoint, I believe that SWH has proven themselves untrustworthy by enabling *and promoting* this infringement in violation of their own stated policies, and will work to remove my code from their archive. Personally, I cannot extend them the benefit of the doubt on this. They blew it.
From an organizational, ethical standpoint, Guix is IMO on the right track by waiting on SWH (and perhaps pressuring them to fix things). From an organizational, technical perspective, I would like to see concrete measures to support my (and hundreds of others’) personal, ethical desires to exclude software from SWH, and by extension, HuggingFace’s models. > As Ludo said, SWH folks are, by the way, also long-time Free Software activists. > In my view, this is not to their credit. I’d expect people familiar with Free Software to be *more* sensitive to licensing concerns, thus less likely to partner with a company likely to violate them. > PS: Thanks for the detailed explanations. I will provide my reading later, once some of the concerns have been separated. You’re very welcome. Thanks, — Ian
* Content-Addressed system and history? 2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure ` (5 preceding siblings ...) 2024-03-18 9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier @ 2024-03-18 11:14 ` Simon Tournier 2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure 7 siblings, 0 replies; 61+ messages in thread From: Simon Tournier @ 2024-03-18 11:14 UTC (permalink / raw) To: Ian Eure, guix-devel Hi, On sam., 16 mars 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote: > I was also distressed to see how poorly they treated a developer > who wished to update their name: > https://cohost.org/arborelia/post/4968198-the-software-heritag > https://cohost.org/arborelia/post/5052044-the-software-heritag This raises two questions, IMHO. 1. Can the future you decide who the past you was? 2. What is a content-addressed system? About #1, that’s somewhat of a philosophical question. :-) That’s what the question about changing one’s public identity asks: you can act on who you are and who you want to be, but because time is not reversible, sadly, you cannot change who you were. It is not possible to collectively rewrite history. Allowing such a process leads to dangerous consequences, IMHO. That’s another story. :-) Don’t get me wrong. That’s still an open question, and the right to be forgotten is a topic by itself, e.g., a legal one. We will not address it in the Guix project. About #2, that’s a technical question. By definition of a content-addressed system, the key associated with a value is computed by a procedure that depends only on the content itself. Therefore, if the content changes, the key changes. Git [1] is probably the tool that popularized this. Consider a project using Git, and clone it. Now you have a complete copy of many keys associated with many contents, and also of many links between the keys themselves.
For instance, the key of a ’Git commit’ object depends on its content, which includes the key of a ’Git tree’ object (and the key of its parent commit). Now, if you rewrite any content, its key is rewritten too, and, as pointed out, this change propagates. The whole question then becomes one of authority: since I also have another copy/clone with the initial set of keys and you now have modified ones, how do we agree on which are the right ones? Well, at the size [2] of the project in the linked posts, rewriting the Git history is affordable. Now, I am not convinced that the person would try, or even think of, such a thing if the project had hundreds of contributors and thousands of users. That’s my opinion, and I agree it is not an argument. :-) At the level of Guix, allowing a mutable history implies an unpredictable availability of binary substitutes. To be explicit, rewriting the Git history of Guix would break: + local Git repositories of Guix developers + regular Guix users and the trust mechanism. Somehow, a content-addressed system is designed around immutable content. And if anyone knows how to implement a content-addressed system relying on mutable content, I would be very interested to know more about it. Cheers, simon 1: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects 2: https://github.com/rspeer/python-ftfy/graphs/contributors
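Simon's point that a commit's key depends on its tree's key and its parent's key, so that a rewrite propagates, can be made concrete. The following is a minimal sketch that recomputes Git object IDs by hand with Python's `hashlib`, following the object format described in the Git Internals chapter linked above. It only handles flat trees of regular files, and the names, e-mail addresses and timestamps are made up for illustration. Changing only the author name of the first commit changes its ID, and because the second commit (by someone else) records that ID in its `parent` line, the second commit's ID changes too.

```python
import hashlib
from typing import Dict, Optional


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, then the body, as Git does."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


def tree_id(files: Dict[str, bytes]) -> str:
    """Tree of regular files only (mode 100644), entries sorted by ASCII name."""
    body = b""
    for name in sorted(files):
        entry = f"100644 {name}".encode() + b"\0"
        body += entry + bytes.fromhex(blob_id(files[name]))  # raw 20-byte SHA-1
    return git_object_id("tree", body)


def commit_id(tree: str, parent: Optional[str], author: str,
              timestamp: int, message: str) -> str:
    """Commit whose author and committer are the same person, in UTC."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # the link that makes rewrites propagate
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode())


tree = tree_id({"README": b"hello world\n"})

# Original history (hypothetical people and dates).
c1 = commit_id(tree, None, "Old Name <dev@example.org>", 1700000000, "First\n")
c2 = commit_id(tree, c1, "Other Dev <other@example.org>", 1700000100, "Second\n")

# Rewritten history: only the first commit's author name changes.
c1_new = commit_id(tree, None, "New Name <dev@example.org>", 1700000000, "First\n")
c2_new = commit_id(tree, c1_new, "Other Dev <other@example.org>", 1700000100, "Second\n")

print(blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(c1 != c1_new)  # True: the rewritten commit gets a new key
print(c2 != c2_new)  # True: its untouched descendant gets a new key as well
```

In a real repository the same cascade reaches every later commit, which is why clones made before the rewrite no longer agree with clones made after it.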
* Re: Concerns/questions around Software Heritage Archive 2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure ` (6 preceding siblings ...) 2024-03-18 11:14 ` Content-Addressed system and history? Simon Tournier @ 2024-04-20 18:48 ` Ian Eure 2024-05-01 15:29 ` Ian Eure 2024-05-02 10:28 ` Ludovic Courtès 7 siblings, 2 replies; 61+ messages in thread From: Ian Eure @ 2024-04-20 18:48 UTC (permalink / raw) To: guix-devel Hello, I’m following up on this discussion since it’s been a month and I haven’t heard any updates. Summarizing the situation: - SHF has an opaque, difficult, and undocumented process for handling name changes. I’d like to stress again that this is *not* strictly a transgender issue (though it likely affects them more, or in worse/different ways) -- it is a human respect issue. Many, many more cisgender people change their name than transgender people. - SHF gave their archive to HuggingFace, an "AI" company which is generating derived works with no attribution or provenance, in ways which violate both the licenses of the projects used to train their model and the SHF principles for LLMs. - HuggingFace wasn’t respecting requests to opt out of their model. On the first point, it sounds like SHF has made concrete progress to improve[1], which is very good to hear. If SHF continues on this course, I think the concern is resolved. On the third point, HuggingFace has begun honoring opt-out requests, but is still very far behind. Also, they don’t remove code from the older versions of their model -- it remains there forever. This is progress, but still, not great. On the second point, I have not seen any public statements indicating that either SHF or HuggingFace even acknowledges the problem. SHF’s most recent newsletter[2], published in April 2024 (after these concerns came to light), continues to tout that StarCoder2 is "the first AI model aligned with our principles," which appears to be false.
StarCoder2 includes both licensed and unlicensed code, and HuggingFace’s own StarChat2 playground produces works derivative of this code, with no attribution or licensing information. There is also no statement or position on the SHF news blog. Nor has HuggingFace fixed its tools or made a statement. This is still very much a live concern. I have a few questions: - Has Guix reached out to SHF to express these concerns / get a response? - Whether a public or private response, what would Guix consider to be an acceptable response? An unacceptable response? - How long is Guix willing to wait for a response? Thanks, — Ian [1]: https://cohost.org/arborelia/post/5273879-they-are-fixing-some [2]: https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf
* Re: Concerns/questions around Software Heritage Archive 2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure @ 2024-05-01 15:29 ` Ian Eure 2024-05-01 15:41 ` Tomas Volf 2024-05-02 10:28 ` Ludovic Courtès 1 sibling, 1 reply; 61+ messages in thread From: Ian Eure @ 2024-05-01 15:29 UTC (permalink / raw) To: guix-devel Hello Guixers, It’s been another week with no response or movement on this. I’m disappointed that this situation seems to be getting treated so lightly. Adhering to the terms of software licenses is fundamental to the operation of the free software ecosystem; there is no software freedom without it. It’s surprising that a pretty clear-cut situation of creating derivative works of free software in violation of their licenses would be shrugged off so easily. Whatever the Guix organization’s position is, I’m reaching my personal limit, and need to see some kind of positive movement on this[1]. If Guix is going to continue to facilitate license violations, I will have no choice but to remove my software from it to defend them. — Ian [1]: Personally, I would be satisfied with a per-package setting which disables scheduling source for archiving by SWH. Seeing this, or a commitment to build this within a reasonable timeframe, would allay my concerns.
* Re: Concerns/questions around Software Heritage Archive 2024-05-01 15:29 ` Ian Eure @ 2024-05-01 15:41 ` Tomas Volf 0 siblings, 0 replies; 61+ messages in thread From: Tomas Volf @ 2024-05-01 15:41 UTC (permalink / raw) To: Ian Eure; +Cc: guix-devel On 2024-05-01 08:29:29 -0700, Ian Eure wrote: > If Guix is going to continue to facilitate license violations, I will have no > choice but to remove my software from it to defend them. Purely hypothetically, if it came to this, how would you go about it? Assuming the software is under a free license (a requirement for inclusion into Guix), I am unsure on what grounds the removal could be demanded. Do you have a specific approach in mind? Have a nice day, Tomas Volf -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.
* Re: Concerns/questions around Software Heritage Archive 2024-04-20 18:48 ` Concerns/questions around Software Heritage Archive Ian Eure 2024-05-01 15:29 ` Ian Eure @ 2024-05-02 10:28 ` Ludovic Courtès 2024-05-09 16:00 ` Maxim Cournoyer 1 sibling, 1 reply; 61+ messages in thread From: Ludovic Courtès @ 2024-05-02 10:28 UTC (permalink / raw) To: Ian Eure; +Cc: guix-devel Hi Ian, Ian Eure <ian@retrospec.tv> skribis: > Summarizing the situation: > > - SHF has an opaque, difficult, and undocumented process for > handling name changes. I’d like to stress again that this is > *not* strictly a transgender issue (though it likely affects them > more, or in worse/different ways) -- it is a human respect issue. > Many, many more cisgender people change their name than > transgender people. It is also not strictly an SWH issue: how does the Internet Archive handle name changes? What about append-only storage in general? We’ve discussed this already. > - SHF gave their archive to HuggingFace, an "AI" company which is > generating derived works with no attribution or provenance, in > ways which violate both the licenses of the projects used to train > their model and the SHF principles for LLMs. [...] > - Has Guix reached out to SHF to express these concerns / get a > response? I’ve seen and participated in informal discussions, but that’s all I know. Maintainers? > - Whether a public or private response, what would Guix consider to > be an acceptable response? An unacceptable response? > - How long is Guix willing to wait for a response? Free software people, myself included, have expressed disappointment regarding the use of code harvested by SWH for HuggingFace’s training. Stefano Zacchiroli of SWH responded to these concerns on Mastodon back in March, as you probably saw.
One important point is that copyleft code is excluded from the training dataset; I was able to anecdotally check that for GPL code such as Guix using their interface (there was a thread on Mastodon but I can’t find it): <https://huggingface.co/spaces/bigcode/in-the-stack>. That addresses my main concern. Remaining concerns include the weak wording of the principles put forward by SWH in its statement on LLMs: <https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/>. I think this is something worth discussing further with them (it’s already been brought up notably on Mastodon). It’s not clear to me whether this is a task for Guix as a project. (I am not forgetting that, in the meantime, Microsoft ingests everything that’s on GitHub, including copyleft code, and including clones of repos that were not initially hosted there.) I’m not sure this is the kind of answer you expected, but I hope it makes sense! Ludo’.
* Re: Concerns/questions around Software Heritage Archive 2024-05-02 10:28 ` Ludovic Courtès @ 2024-05-09 16:00 ` Maxim Cournoyer 0 siblings, 0 replies; 61+ messages in thread From: Maxim Cournoyer @ 2024-05-09 16:00 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Ian Eure, guix-devel Hi Ian, Ludovic. Ludovic Courtès <ludo@gnu.org> writes: > Hi Ian, > > Ian Eure <ian@retrospec.tv> skribis: > >> Summarizing the situation: >> >> - SHF has an opaque, difficult, and undocumented process for >> handling name changes. I’d like to stress again that this is >> *not* strictly a transgender issue (though it likely affects them >> more, or in worse/different ways) -- it is a human respect issue. >> Many, many more cisgender people change their name than >> transgender people. > > It is also not strictly an SWH issue: how does the Internet Archive handle > name changes? What about append-only storage in general? We’ve > discussed this already. >> - SHF gave their archive to HuggingFace, an "AI" company which is >> generating derived works with no attribution or provenance, in >> ways which violate both the licenses of the projects used to train >> their model and the SHF principles for LLMs. > > [...] > >> - Has Guix reached out to SHF to express these concerns / get a >> response? > > I’ve seen and participated in informal discussions, but that’s all I > know. Maintainers? We haven't. Given that some improvements were apparently already made by SWH in response to the concerns raised, it seems the dialogue should continue. >> - Whether a public or private response, what would Guix consider to >> be an acceptable response? An unacceptable response? >> - How long is Guix willing to wait for a response? > > Free software people, myself included, have expressed disappointment > regarding the use of code harvested by SWH for HuggingFace’s training. > Stefano Zacchiroli of SWH responded to these concerns on Mastodon back > in March, as you probably saw.
> > One important point is that copyleft code is excluded from the training > dataset; I was able to anecdotally check that for GPL code such as Guix > using their interface (there was a thread on Mastodon but I can’t find > it): <https://huggingface.co/spaces/bigcode/in-the-stack>. That > addresses my main concern. > > Remaining concerns include the weak wording of the principles put > forward by SWH in its statement on LLMs: > <https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/>. > I think this is something worth discussing further with them (it’s > already been brought up notably on Mastodon). It’s not clear to me > whether this is a task for Guix as a project. I don't think it is a task for Guix specifically, but rather for all users of SWH or interested parties. -- Thanks, Maxim ^ permalink raw reply [flat|nested] 61+ messages in thread
Thread overview: 61+ messages

2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
2024-03-16 17:50 Christopher Baines
2024-03-16 18:24 MSavoritias
2024-03-16 19:08 Christopher Baines
2024-03-16 19:45 Tomas Volf
2024-03-17 07:06 MSavoritias
2024-03-16 19:06 Ian Eure
2024-03-16 19:49 Tomas Volf
2024-03-16 23:16 Vivien Kraus
2024-03-16 23:27 Tomas Volf
2024-03-16 23:40 Fw: Concerns/questions around Software Heritage Archive Ryan Prior
2024-03-16 17:58 MSavoritias
2024-03-18 09:50 Please hold your horses Simon Tournier
2024-03-16 21:37 Concerns/questions around Software Heritage Archive Ryan Prior
2024-03-17 09:39 Lars-Dominik Braun
2024-03-17 09:47 MSavoritias
2024-03-17 11:53 paul
2024-03-17 11:57 MSavoritias
2024-03-17 14:57 Richard Sent
2024-03-17 16:28 Ian Eure
2024-03-17 12:51 Tomas Volf
2024-03-17 23:56 Attila Lendvai
2024-03-20 15:25 contributor uuid (was Re: Concerns/questions around Software Heritage Archive) bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6
2024-03-17 16:20 Concerns/questions around Software Heritage Archive Ian Eure
2024-03-17 16:55 MSavoritias
2024-03-18 14:04 pinoaffe
2024-03-17 13:03 Olivier Dion
2024-03-17 17:57 Ludovic Courtès
2024-03-20 17:22 the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) Giovanni Biscuolo
2024-03-21 06:12 MSavoritias
2024-03-21 10:49 Attila Lendvai
2024-03-21 11:51 pelzflorian (Florian Pelz)
2024-03-21 11:52 pinoaffe
2024-03-21 15:08 Giovanni Biscuolo
2024-03-21 15:11 MSavoritias
2024-03-21 22:11 Philip McGrath
2024-03-21 16:17 pinoaffe
2024-03-21 15:23 Hartmut Goebel
2024-03-21 15:27 MSavoritias
2024-03-21 15:54 Ekaitz Zarraga
2024-03-22 04:33 Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-03-21 16:18 Efraim Flashner
2024-03-21 16:23 pinoaffe
2024-03-18 09:28 Concerns/questions around Software Heritage Archive Simon Tournier
2024-03-18 11:47 MSavoritias
2024-03-18 13:12 Simon Tournier
2024-03-18 14:00 MSavoritias
2024-03-18 14:32 Simon Tournier
2024-03-18 16:27 Kaelyn
2024-03-18 17:39 Daniel Littlewood
2024-03-18 20:38 Olivier Dion
2024-03-18 19:38 Ian Eure
2024-03-18 22:02 Ludovic Courtès
2024-03-19 10:58 Simon Tournier
2024-03-19 15:37 Ian Eure
2024-03-18 11:14 Content-Addressed system and history? Simon Tournier
2024-04-20 18:48 Concerns/questions around Software Heritage Archive Ian Eure
2024-05-01 15:29 Ian Eure
2024-05-01 15:41 Tomas Volf
2024-05-02 10:28 Ludovic Courtès
2024-05-09 16:00 Maxim Cournoyer