From: Ian Eure <ian@retrospec.tv>
To: Christopher Baines <mail@cbaines.net>
Cc: guix-devel@gnu.org
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Sat, 16 Mar 2024 12:06:27 -0700
Message-ID: <87edcaug07.fsf@meson>
In-Reply-To: <87cyruqcfe.fsf@cbaines.net>

Christopher Baines <mail@cbaines.net> writes:
>
> Ian Eure <ian@retrospec.tv> writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last fall, and
>> it struck me as rather a good idea. However, I’ve seen some things
>> lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I assume
>> means it’s been included in SWH. While I’m dealing with their (IMO:
>> unethical) opt-out process, I likely also need to stop new copies from
>> being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should *never*
>> be included in SWH?
>
> Not currently, and I don't really see the point in such a mechanism. If
> you really never want them to store your code, then you need to license
> it accordingly (and not make it free software).
>
I don’t want my code in SWH *because* it’s free. A primary use of
LLMs is laundering freely licensed software into proprietary,
commercial projects through "AI" code completion and generation.
Any Free software in an LLM training set can and will be used in
violation of its license, without a clear path for the author to
seek recourse. I deleted my code off GitHub and abandoned it
completely for this exact reason, and am deeply irked to be going
through this nonsense again.

A more salient question may be: Is there a process within Guix
(either the program or the organization) which uploads source to
SWH? Or does it rely on SWH to fetch the source independently?

If the latter, my problem is likely solved by blocking SWH at my
network edge and opting out of their archive (or trying to) and
the downstream training models they’ve already put it in. If the
former, the only control I currently have to protect my license is
removing packages from Guix which contain it. I don’t want that
outcome.

Noting also that the path here seems to be
SWH->huggingface->bigcode training set, and the opt-out process
for the training set appears to be a complete sham. To opt out,
you must create a GitHub Issue; only one opt-out has *ever* been
processed, and there are 200+ sitting there, many with no response
for nearly a year[1]. I want no part of any of this.

>> Is there a way to tell Guix to never download source from SWH?
>
> Also no, and it's probably best to do this at the network level on your
> systems/network if you want this to be the case.
>
I’ll investigate this, though I’d prefer if there was a way to
configure source mirrors in the Guix daemon.

> Skipping back to this though:
>
>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> This is probably worth thinking about as Guix is in a similar situation
> regarding publishing source code, and people potentially wanting to
> change historical source code both in things Guix packages and Guix
> itself.
>
> Like Software Heritage, there's cryptographical implications for
> rewriting the Git history and modifying source tarballs or nars that
> contain source code.
>
> We have 17TiB of compressed source code and built software stored for
> bordeaux.guix.gnu.org now and we should probably work out how to handle
> people asking for things to be removed or changed (for any and all
> reasons).
>
> It's probably worth working out our position on this in advance of
> someone asking.
>
Yes, I agree that Guix needs a better solution for this.

Thanks,
— Ian
[1]: https://github.com/bigcode-project/opt-out-v2/issues