From: Tomas Volf <~@wolfsden.cz>
To: Ian Eure <ian@retrospec.tv>
Cc: Christopher Baines <mail@cbaines.net>, guix-devel@gnu.org
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Sat, 16 Mar 2024 20:49:44 +0100
Message-ID: <ZfX32Ejd9g0QO5ri@ws>
In-Reply-To: <87edcaug07.fsf@meson>
On 2024-03-16 12:06:27 -0700, Ian Eure wrote:
>
> Christopher Baines <mail@cbaines.net> writes:
>
> > [[PGP Signed Part:Undecided]]
> >
> > Ian Eure <ian@retrospec.tv> writes:
> >
> > > Hi Guixy people,
> > >
> > > I’d never heard of SWH before I started hacking on Guix last fall,
> > > and
> > > it struck me as rather a good idea. However, I’ve seen some things
> > > lately which have soured me on them.
> > >
> > > They appear to be using the archive to build LLMs:
> > > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> > >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > >
> > > GPL’d software I’ve created has been packaged for Guix, which I
> > > assume
> > > means it’s been included in SWH. While I’m dealing with their (IMO:
> > > unethical) opt-out process, I likely also need to stop new copies
> > > from
> > > being uploaded again in the future.
> > >
> > > Is there a way to indicate, in a Guix package, that it should
> > > *never*
> > > be included in SWH?
> >
> > Not currently, and I don't really see the point in such a mechanism. If
> > you really never want them to store your code, then you need to license
> > it accordingly (and not make it free software).
> >
>
> I don’t want my code in SWH *because* it’s free. A primary use of LLMs is
> laundering freely licensed software into proprietary, commercial projects
> through "AI" code completion and generation. Any Free software in an LLM
> training set can and will be used in violation of its license, without a
> clear path for the author to seek recourse. I deleted my code off Github
> and abandoned it completely for this exact reason, and am deeply irked to be
> going through this nonsense again.
>
> A more salient question may be: Is there a process within Guix (either the
> program or the organization) which uploads source to SWH? Or does it rely
> on SWH independently?

`guix lint PKG-NAME' schedules SWH archival if possible. No code is directly
uploaded (at least currently), so assuming you have a list of SWH's IP
addresses, it should be possible to block them. At least AFAIK.

If you have the list, or know how to get it, could you share it? I would be
interested in blocking them from my git hosting as well.
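
To illustrate the blocking idea, here is a minimal sketch of checking a client
address against a block list. It assumes you already have SWH's address ranges,
which I do not: the networks below are reserved documentation prefixes used as
placeholders, and the names BLOCKED_NETWORKS and is_blocked are made up for
this example. In practice the same list would more likely go into a firewall
or web server rule.

```python
import ipaddress

# Placeholder ranges only (RFC 5737 and RFC 3849 documentation prefixes).
# These are NOT Software Heritage addresses; substitute the real list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network versions differ,
    # so IPv4 and IPv6 entries can share one list.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A git frontend could call is_blocked() on the remote address of each request
and refuse to serve matching clients.
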
>
> If the latter, my problem is likely solved by blocking SWH at my network
> edge and opting out of their archive (or trying to) and the downstream
> training models they’ve already put it in. If the former, the only control
> I currently have to protect my license is removing packages from Guix which
> contain it. I don’t want that outcome.
>
> Noting also that the path here seems to be SWH->huggingface->bigcode
> training set, and the opt-out process for the training set appears to be a
> complete sham. To opt-out, you must create a Github Issue; only one opt-out
> has *ever* been processed, and there are 200+ sitting there, many with no
> response for nearly a year[1]. I want no part of any of this.
>
>
> > > Is there a way to tell Guix to never download source from SWH?
> >
> > Also no, and it's probably best to do this at the network level on your
> > systems/network if you want this to be the case.
> >
>
> I’ll investigate this, though I’d prefer if there was a way to configure
> source mirrors in the Guix daemon.
>
>
> > Skipping back to this though:
> >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> >
> > This is probably worth thinking about as Guix is in a similar situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
> >
> > Like Software Heritage, there's cryptographical implications for
> > rewriting the Git history and modifying source tarballs or nars that
> > contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for
> > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > people asking for things to be removed or changed (for any and all
> > reasons).
> >
> > It's probably worth working out our position on this in advance of
> > someone asking.
> >
>
> Yes, I agree that Guix needs a better solution for this.
>
> Thanks,
>
> — Ian
>
> [1]: https://github.com/bigcode-project/opt-out-v2/issues
>
T.
--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.