From: Ian Eure <ian@retrospec.tv>
To: guix-devel <guix-devel@gnu.org>
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Sat, 20 Apr 2024 11:48:20 -0700
Message-ID: <87frvfan0r.fsf@retrospec.tv>
In-Reply-To: <87il1mupco.fsf@meson> (Ian Eure's message of "Sat, 16 Mar 2024 08:52:53 -0700")
Hello,
I’m following up on this discussion since it’s been a month
and I haven’t heard any updates.
Summarizing the situation:
- SHF has an opaque, difficult, and undocumented process for
handling name changes. I’d like to stress again that this is
*not* strictly a transgender issue (though it likely affects
them more, or in worse/different ways) -- it is a human respect
issue. Many, many more cisgender people change their name than
transgender people.
- SHF gave their archive to HuggingFace, an "AI" company which is
generating derived works with no attribution or provenance, in
ways which violate both the licenses of the projects used to
train their model, and the SHF principles for LLMs.
- HuggingFace wasn’t respecting requests to opt-out of their
model.
On the first point, it sounds like SHF has made concrete progress
to improve[1], which is very good to hear. If SHF continues on
this course, I think the concern is resolved.
On the third point, HuggingFace has begun honoring opt-out
requests, but is still very far behind. Also, they don’t remove
code from the older versions of their model -- it remains there
forever. This is progress, but still, not great.
On the second point, I have not seen any public statements
indicating that either SHF or HuggingFace even acknowledges the
problem. SHF’s most recent newsletter[2], published in April 2024
(after these concerns came to light), continues to tout that
StarCoder2 is "the first AI model aligned with our principles,"
which appears to be false. StarCoder2 includes both licensed and
unlicensed code, and HuggingFace’s own StarChat2 playground
produces works derivative of this code, with no attribution or
licensing information. There is also no statement or position on
the SHF news blog. Nor has HuggingFace either fixed their tools
or made a statement. This is still very much a live concern.
I have a few questions:
- Has Guix reached out to SHF to express these concerns / get a
response?
- Whether public or private, what would Guix consider to be an
acceptable response? An unacceptable response?
- How long is Guix willing to wait for a response?
Thanks,
— Ian
[1]: https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf
Ian Eure <ian@retrospec.tv> writes:
> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last fall, and
> it struck me as rather a good idea. However, I’ve seen some things
> lately which have soured me on them.
>
> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I assume
> means it’s been included in SWH. While I’m dealing with their (IMO:
> unethical) opt-out process, I likely also need to stop new copies from
> being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should *never*
> be included in SWH?
>
> Is there a way to tell Guix to never download source from SWH?
>
> I want absolutely nothing to do with them.
>
> Thanks,
>
> — Ian
>