all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Ian Eure <ian@retrospec.tv>
To: guix-devel <guix-devel@gnu.org>
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Sat, 20 Apr 2024 11:48:20 -0700	[thread overview]
Message-ID: <87frvfan0r.fsf@retrospec.tv> (raw)
In-Reply-To: <87il1mupco.fsf@meson> (Ian Eure's message of "Sat, 16 Mar 2024 08:52:53 -0700")

Hello,

I’m following up on this since discussion since it’s been a month 
and I haven’t heard any updates.

Summarizing the situation:

- SHF has an opaque, difficult, and undocumented process for 
  handling name changes.  I’s like to stress again that this is 
  *not* strictly a transgender issue (though it likely affects 
  them more, or in worse/different ways) -- it is a human respect 
  issue.  Many, many more cisgender people change their name than 
  transgender people.

- SHF gave their archive to HuggingFace, an "AI" company which is 
  generating derived works with no attribution or provenance, in 
  ways which violate the both licenses of the projects used to 
  train their model, and the SHF principles for LLMs.

- HuggingFace wasn’t respecting requests to opt-out of their 
  model.


On the first point, it sounds like SHF has made concrete progress 
to improve[1], which is very good to hear.  If SHF continues on 
this course, I think the concern is resolved.

On the third point, HuggingFace has begun honoring opt-out 
requests, but is still very far behind.  Also, they don’t remove 
code from the older versions of their model -- it remains there 
forever.  This is progress, but still, not great.

On the second point, I have not seen any public statements 
indicating that either SHF or HuggingFace even acknowledges the 
problem.  SHF’s most recent newsletter[2], published in April 2024 
(after these concerns came to light), continues to tout that 
StarCoder2 is "the first AI model aligned with our principles," 
which appears to be false.  StarCoder2 includes both licensed and 
unlicensed code, and HuggingFace’s own StarChat2 playground 
produces works derivative of this code, with no attribution or 
licensing information.  There is also no statement or position on 
the SHF news blog.  Nor hsa HuggingFace either fixed their tools, 
or made a statement.  This is still very much a live concern.

I have a few questions:

- Has Guix reached out to SHF to express these concerns / get a 
  response?
- Whether a public or private response, what would Guix consider 
  to be an acceptable response?  An unacceptable respoinse?
- How long is Guix willing to wait for a response?

Thanks,

  — Ian

[1]: 
https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: 
https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf

Ian Eure <ian@retrospec.tv> writes:

> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last 
> fall, and
> it struck me as rather a good idea.  However, I’ve seen some 
> things
> lately which have soured me on them.
>
> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer 
> who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I 
> assume
> means it’s been included in SWH.  While I’m dealing with their 
> (IMO:
> unethical) opt-out process, I likely also need to stop new 
> copies from
> being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should 
> *never*
> be included in SWH?
>
> Is there a way to tell Guix to never download source from SWH?
>
> I want absolutely nothing to do with them.
>
> Thanks,
>
>  — Ian
>


  parent reply	other threads:[~2024-04-20 18:49 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-16 15:52 Concerns/questions around Software Heritage Archive Ian Eure
2024-03-16 17:50 ` Christopher Baines
2024-03-16 18:24   ` MSavoritias
2024-03-16 19:08     ` Christopher Baines
2024-03-16 19:45     ` Tomas Volf
2024-03-17  7:06       ` MSavoritias
2024-03-16 19:06   ` Ian Eure
2024-03-16 19:49     ` Tomas Volf
2024-03-16 23:16   ` Vivien Kraus
2024-03-16 23:27     ` Tomas Volf
     [not found]     ` <EoCuAq3N681mOIAh7ptCyXiyscM9R0iPDBWId1eS4EbTJ2-ARWNfGuqtXIvmqcJNBl1SQvMM4X6-GiC5LiUv4TJv6J4ritPA3uZ2JBwkAzQ=@protonmail.com>
2024-03-16 23:40       ` Fw: " Ryan Prior
2024-03-16 17:58 ` MSavoritias
2024-03-18  9:50   ` Please hold your horses Simon Tournier
2024-03-16 21:37 ` Concerns/questions around Software Heritage Archive Ryan Prior
2024-03-17  9:39   ` Lars-Dominik Braun
2024-03-17  9:47     ` MSavoritias
2024-03-17 11:53       ` paul
2024-03-17 11:57         ` MSavoritias
2024-03-17 14:57           ` Richard Sent
2024-03-17 16:28           ` Ian Eure
2024-03-17 12:51         ` Tomas Volf
2024-03-17 23:56           ` Attila Lendvai
2024-03-20 15:25         ` contributor uuid (was Re: Concerns/questions around Software Heritage Archive) bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6
2024-03-17 16:20       ` Concerns/questions around Software Heritage Archive Ian Eure
2024-03-17 16:55         ` MSavoritias
2024-03-18 14:04     ` pinoaffe
2024-03-17 13:03 ` Olivier Dion
2024-03-17 17:57 ` Ludovic Courtès
2024-03-20 17:22   ` the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive) Giovanni Biscuolo
2024-03-21  6:12     ` MSavoritias
2024-03-21 10:49       ` Attila Lendvai
2024-03-21 11:51       ` pelzflorian (Florian Pelz)
2024-03-21 11:52       ` pinoaffe
2024-03-21 15:08         ` Giovanni Biscuolo
2024-03-21 15:11           ` MSavoritias
2024-03-21 22:11             ` Philip McGrath
2024-03-21 16:17           ` pinoaffe
2024-03-21 15:23       ` Hartmut Goebel
2024-03-21 15:27         ` MSavoritias
2024-03-21 15:54           ` Ekaitz Zarraga
2024-03-22  4:33           ` Felix Lechner via Development of GNU Guix and the GNU System distribution.
2024-03-21 16:18         ` Efraim Flashner
2024-03-21 16:23         ` pinoaffe
2024-03-18  9:28 ` Concerns/questions around Software Heritage Archive Simon Tournier
2024-03-18 11:47   ` MSavoritias
2024-03-18 13:12     ` Simon Tournier
2024-03-18 14:00       ` MSavoritias
2024-03-18 14:32         ` Simon Tournier
2024-03-18 16:27   ` Kaelyn
2024-03-18 17:39     ` Daniel Littlewood
2024-03-18 20:38     ` Olivier Dion
2024-03-18 19:38   ` Ian Eure
2024-03-18 22:02     ` Ludovic Courtès
2024-03-19 10:58     ` Simon Tournier
2024-03-19 15:37       ` Ian Eure
2024-03-18 11:14 ` Content-Addressed system and history? Simon Tournier
2024-04-20 18:48 ` Ian Eure [this message]
2024-05-01 15:29   ` Concerns/questions around Software Heritage Archive Ian Eure
2024-05-01 15:41     ` Tomas Volf
2024-05-02 10:28   ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87frvfan0r.fsf@retrospec.tv \
    --to=ian@retrospec.tv \
    --cc=guix-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.