unofficial mirror of guix-devel@gnu.org 
From: MSavoritias <email@msavoritias.me>
To: Simon Tournier <zimon.toutoune@gmail.com>,
	Ian Eure <ian@retrospec.tv>, guix-devel <guix-devel@gnu.org>
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Mon, 18 Mar 2024 13:47:42 +0200
Message-ID: <7881988a-6a95-f0d8-8d6b-6794651c9d2c@fannys.me>
In-Reply-To: <87a5mvyjl4.fsf@gmail.com>

On 3/18/24 11:28, Simon Tournier wrote:

> Hi,
>
> On sam., 16 mars 2024 at 08:52, Ian Eure <ian@retrospec.tv> wrote:
>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> About LLMs, Software Heritage made a clear statement:
>
>      https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
>
> Quoting:
>
>          We feel that the question is no longer whether LLMs for code
>          should be built. They are already being built, independently of
>          what we do, and there is no turning back.  The real question is
>          how they should be built and whom they should benefit.
>
> Principles:
>
>          1. Knowledge derived from the Software Heritage archive must be
>          given back to humanity, rather than monopolized for private
>          gain. The resulting machine learning models must be made available
>          under a suitable open license, together with the documentation and
>          toolings needed to use them.
>
>          2. The initial training data extracted from the Software Heritage
>          archive must be fully and precisely identified by, for example,
>          publishing the corresponding SWHID identifiers (note that, in the
>          context of Software Heritage, public availability of the initial
>          training data is a given: anyone can obtain it from the
>          archive). This will enable use cases such as: studying biases
>          (fairness), verifying if a code of interest was present in the
>          training data (transparency), and providing appropriate attribution
>          when generated code bears resemblance to training data (credit),
>          among others.
>
>          3. Mechanisms should be established, where possible, for authors to
>          exclude their archived code from the training inputs before model
>          training begins.
>
> I hope it clarifies your concerns to some extent.
>
>
> Moreover, you wrote: « I want absolutely nothing to do with them. »
>
> Maybe there is a misunderstanding on your side about what “free
> software” and the GPL mean, because once software is “free software”,
> you cannot prevent people from using “your” free software for purposes
> you dislike.
>
> If you want to restrict the use cases of the software you create, you need
> to explicitly specify that in the license.  And if you do, your software
> will not be considered “free software”.
>
> That’s the double-edged sword of “free software”. :-)

Simon,


1.

You seem to have misunderstood the statement that was made.

What you can do legally and what you can do socially are not always the 
same thing.

As advice for the future: when somebody raises a concern or a wish,
your first response shouldn't be "but it's legal", because that
shuts down any constructive discussion that could be had.

And you seem to be talking about legality a lot here, so that's not a good look.


Yes, legally Ian probably can't set lawyers on you. But nobody is
talking about legality here.

What is in question here is whether Software Heritage respects people 
enough to do the right thing and respect their wishes without getting 
lawyers/legal involved.


Besides, framing Free Software as not respecting any social rules
makes Free Software unattractive, which is the opposite of what we
are trying to do here :)


2.

 > Somehow, a Content-Addressed system is designed around immutable
 > content. And if one knows how to implement a Content-Addressed system
 > relying on mutable content, I would be very interested to know more
 > about it.


Please refrain from making such remarks. Nobody here suggested anything
like what you mention, and arguing like this devalues the discussion
and frames other people as stupid.


3.

It's not on the people who are not included to write the code. If Guix
is to be an inclusive project, then Guix should do the work so that
people feel included.

You may disagree with this, sure, but shutting down the discussion
because nobody wrote the code for you is very elitist of you.


4.

 > This language is not acceptable on Guix channel of communication.

Calling out transphobia is very much accepted here, actually :)

It's transphobic speech that is not accepted.


I would welcome an announcement or some other official communication
from Software Heritage stating their stance on this.

Although I still wouldn't use them, due to the LLM and AI work they
are involved in. I hope that at some point they realize their mistake.


MSavoritias



