unofficial mirror of guix-devel@gnu.org 
From: Tomas Volf <~@wolfsden.cz>
To: Ian Eure <ian@retrospec.tv>
Cc: Christopher Baines <mail@cbaines.net>, guix-devel@gnu.org
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Sat, 16 Mar 2024 20:49:44 +0100
Message-ID: <ZfX32Ejd9g0QO5ri@ws>
In-Reply-To: <87edcaug07.fsf@meson>

On 2024-03-16 12:06:27 -0700, Ian Eure wrote:
>
> Christopher Baines <mail@cbaines.net> writes:
>
> > Ian Eure <ian@retrospec.tv> writes:
> >
> > > Hi Guixy people,
> > >
> > > I’d never heard of SWH before I started hacking on Guix last fall, and
> > > it struck me as rather a good idea.  However, I’ve seen some things
> > > lately which have soured me on them.
> > >
> > > They appear to be using the archive to build LLMs:
> > > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> > >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > >
> > > GPL’d software I’ve created has been packaged for Guix, which I assume
> > > means it’s been included in SWH.  While I’m dealing with their (IMO:
> > > unethical) opt-out process, I likely also need to stop new copies from
> > > being uploaded again in the future.
> > >
> > > Is there a way to indicate, in a Guix package, that it should *never*
> > > be included in SWH?
> >
> > Not currently, and I don't really see the point in such a mechanism. If
> > you really never want them to store your code, then you need to license
> > it accordingly (and not make it free software).
> >
>
> I don’t want my code in SWH *because* it’s free.  A primary use of LLMs is
> laundering freely licensed software into proprietary, commercial projects
> through "AI" code completion and generation. Any Free software in an LLM
> training set can and will be used in violation of its license, without a
> clear path for the author to seek recourse.  I deleted my code off Github
> and abandoned it completely for this exact reason, and am deeply irked to be
> going through this nonsense again.
>
> A more salient question may be: Is there a process within Guix (either the
> program or the organization) which uploads source to SWH?  Or does it rely
> on SWH independently?

`guix lint PKG-NAME' schedules SWH archival if possible.  No code is directly
uploaded (at least currently), so assuming you have a list of SWH's IP
addresses, it should be possible to block them.  At least AFAIK.

If you have the list, or know how to get it, could you share it?  I would be
interested in blocking them from my git hosting as well.
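
To illustrate the kind of check such a block would come down to, here is a
minimal sketch in Python.  The ranges in it are placeholders taken from the
reserved documentation blocks (RFC 5737 and RFC 3849), not real SWH addresses,
since I do not have that list; the names BLOCKED_RANGES, parse_ranges and
is_blocked are made up for this example.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks.
# These are NOT Software Heritage addresses; substitute the real list
# once it is known.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def parse_ranges(ranges):
    """Turn CIDR strings into network objects (IPv4 or IPv6)."""
    return [ipaddress.ip_network(r) for r in ranges]


def is_blocked(client_ip, networks):
    """Return True if client_ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(client_ip)
    # Compare only against networks of the same IP version.
    return any(
        addr.version == net.version and addr in net
        for net in networks
    )


if __name__ == "__main__":
    networks = parse_ranges(BLOCKED_RANGES)
    for ip in ("192.0.2.17", "203.0.113.5", "2001:db8::1"):
        print(ip, "blocked" if is_blocked(ip, networks) else "allowed")
```

In practice the same list would more likely go into firewall or web server
rules in front of the git hosting; the function above only shows the matching
logic.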

>
> If the latter, my problem is likely solved by blocking SWH at my network
> edge and opting out of their archive (or trying to) and the downstream
> training models they’ve already put it in.  If the former, the only control
> I currently have to protect my license is removing packages from Guix which
> contain it.  I don’t want that outcome.
>
> Noting also that the path here seems to be SWH->huggingface->bigcode
> training set, and the opt-out process for the training set appears to be a
> complete sham.  To opt-out, you must create a Github Issue; only one opt-out
> has *ever* been processed, and there are 200+ sitting there, many with no
> response for nearly a year[1].  I want no part of any of this.
>
>
> > > Is there a way to tell Guix to never download source from SWH?
> >
> > Also no, and it's probably best to do this at the network level on your
> > systems/network if you want this to be the case.
> >
>
> I’ll investigate this, though I’d prefer if there was a way to configure
> source mirrors in the Guix daemon.
>
>
> > Skipping back to this though:
> >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> >
> > This is probably worth thinking about as Guix is in a similar situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
> >
> > Like Software Heritage, there's cryptographical implications for
> > rewriting the Git history and modifying source tarballs or nars that
> > contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for
> > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > people asking for things to be removed or changed (for any and all
> > reasons).
> >
> > It's probably worth working out our position on this in advance of
> > someone asking.
> >
>
> Yes, I agree that Guix needs a better solution for this.
>
> Thanks,
>
>  — Ian
>
> [1]: https://github.com/bigcode-project/opt-out-v2/issues
>

T.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

